Jeff Rasley | 91b4a93db0 | pytest skips for tests requiring certain ops (#411) | 4 years ago
Jeff Rasley | b1d4bd734b | fix for 16GB v100 nodes (#393) | 4 years ago
Jeff Rasley | 41db1c2f03 | ZeRO-Offload release (#391) | 4 years ago
Jeff Rasley | 376818ef9d | Empty grad fix (#291) | 4 years ago
Jeff Rasley | f545312416 | Support amp deepspeed backend (#286) | 4 years ago
Olatunji Ruwase | 4a3234e0ab | ZeRO-2: Handle gradients of empty partitions (#275) | 4 years ago
Olatunji Ruwase | 88c319aad6 | Handle parameter groups smaller than DP (#273) | 4 years ago
Jeff Rasley | abe2204ddd | Support fp32 grad clipping and fix max_grad_norm confusion (#232) | 4 years ago
Jeff Rasley | f2ac7eafd5 | ZeRO-2 (#217) | 4 years ago
Olatunji Ruwase | 512a0d4de1 | Fix index out of range error when parameter count is not multiple of ranks (#202) | 4 years ago
Calogero Zarbo | 43f27332c2 | Add "zero_allow_untested_optimizer" option in conf file (#173) | 4 years ago
Shaden Smith | a76572dc7c | Adding static loss scaling for ZeRO. (#166) | 4 years ago
Olatunji Ruwase | 1c0b326e77 | Make lr schedulers support fp16 optimizers (#124) | 4 years ago
Jeff Rasley | 5aa58b3878 | Init distributed torch only if needed (#108) | 4 years ago
Jeff Rasley | 001abe2362 | Refactor simple model test, fix pythonpath issue (#96) | 4 years ago
Jeff Rasley | 807480a04b | Fix issue with empty grads for non-fused optimizers (#83) | 4 years ago