Jeff Rasley | dbd08236a6 formatting | 2 years ago |
Jeff Rasley | 0fc11fa0e6 [squash] zero-ckpt-cpu-issue (#1673) | 2 years ago |
Olatunji Ruwase | 4354c3cc67 Fix largest param numel calculation (#1623) | 2 years ago |
Jeff Rasley | 1d295ff5f8 Refactor ZeRO naming to reduce confusion (#1607) | 2 years ago |
Alex Hedges | fc2f378ece Improve pre-commit hooks (#1602) | 2 years ago |
Jeff Rasley | a10e4811fe force set lf instead of crlf (https://github.com/pre-commit/pre-commit-hooks#mixed-line-ending) (#1598) | 2 years ago |
Mikhail Druzhinin | d14baad940 allreduce_always_fp16 (#1487) | 2 years ago |
Jeff Rasley | 2332cb31a7 Enables ZeRO-3 inference (#1514) | 2 years ago |
Olatunji Ruwase | 7567c76c05 Update offload parameter names (#1536) | 2 years ago |
Olatunji Ruwase | 488105ebd2 Fix zinf none swapper (#1550) | 2 years ago |
Zhen Zhang | c0eeb69dfb ZeRO3, improved parameter all-gather operation (#1188) | 3 years ago |
Olatunji Ruwase | 58a8e13ccd Ensure single zero3 context (#1462) | 3 years ago |
Alex Hedges | be789b1665 Fix many typos (#1423) | 3 years ago |
Jeff Rasley | e2fdd254ed Big science related changes (#1407) | 3 years ago |
Stas Bekman | 3fa24208c4 [zero3] fix reference counting in backward over multiple forwards (#1227) | 3 years ago |
Stas Bekman | 2a921069d7 [model weights] zero_to_fp32 multiple improvements (#1181) | 3 years ago |
Stas Bekman | 5127b2fa25 improve debug (#1215) | 3 years ago |
Stas Bekman | 91f58c068c [zero3] params_to_reduce isn't always there (#1214) | 3 years ago |
Stas Bekman | a029239812 clean up logging (#1190) | 3 years ago |
Stas Bekman | bc019a5339 undo noise (#1191) | 3 years ago |
Stas Bekman | c0c4ebf143 introduce debug utils (#1136) | 3 years ago |
Stas Bekman | 0c1802cc8b ZeRO 2+3 memory estimators (#965) | 3 years ago |
Samyam Rajbhandari | 4eaf910616 Samyamr/largest partitioned params calculation fix (#1150) | 3 years ago |
Olatunji Ruwase | e9e9d5b825 ZeRO-Infinity: Swap into unaligned fp16 buffer (#1086) | 3 years ago |
Olatunji Ruwase | d88d927995 ZeRO-Infinity: support swapping misaligned sized fp16 tensors (#1076) | 3 years ago |
Olatunji Ruwase | 6b49b60ec8 Get correct fp16 reuse buffer size (#1071) | 3 years ago |
Sean Naren | b3870363e0 [Stage][Fix] Add additional conditions when checking types of output from the model (#1026) | 3 years ago |
Olatunji Ruwase | 429dfa6c3d Handle Norm allreduce when no mp (#1021) | 3 years ago |
Samyam Rajbhandari | dad26428e3 Samyamr/full precision for ZeRO Stage2 and Stage3 (#1004) | 3 years ago |
William Buchwalter | a711878996 Fix issue where gradient_predivide_factor was called as a func. (#996) | 3 years ago |