Ammar Ahmad Awan
|
36ad3119d5
DeepSpeed comm backend v1 (#1985)
|
2 years ago |
Ammar Ahmad Awan
|
c0af6d90f7
Refactor MoE and Groups API to simplify model creation and management (#1798)
|
2 years ago |
Jeff Rasley
|
3293cf72a0
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
|
2 years ago |
Jeff Rasley
|
2332cb31a7
Enables ZeRO-3 inference (#1514)
|
2 years ago |
Jeff Rasley
|
6996bb0159
Sparse attn triton v1.0 support + torch1.8 test runner (#1374)
|
3 years ago |
Hari Prasad
|
c0b27fb019
Added drop_last to DeepSpeedDataLoader (#1321)
|
3 years ago |
Ammar Ahmad Awan
|
f28432441b
DeepSpeed MoE (#1310)
|
3 years ago |
Conglong Li
|
b2b34ae342
Curriculum learning (#1307)
|
3 years ago |
Jeff Rasley
|
adc21a4dfd
ZeRO-1 empty grads fix + tests (#1273)
|
3 years ago |
hamlet
|
d0b61f1810
Add find_unused_parameters option to DeepSpeedEngine (#945)
|
3 years ago |
Jeff Rasley
|
2e2dd861f3
Dist testing backend fixes, etc. (#708)
|
3 years ago |
Olatunji Ruwase
|
e6ac731136
Support initialization with dict configuration (#632)
|
3 years ago |
Olatunji Ruwase
|
6021b70288
Support non-tensor state in checkpoint (#548)
|
3 years ago |
Olatunji Ruwase
|
0178e6cc22
Fix unbalanced gradients bug in ZeRO-2 gradient accumulation (#545)
|
3 years ago |
Olatunji Ruwase
|
be1147c08a
PLD release (#513)
|
4 years ago |
Shaden Smith
|
65c2f974d8
Pipeline parallel training engine. (#392)
|
4 years ago |
Jeff Rasley
|
376818ef9d
Empty grad fix (#291)
|
4 years ago |
Olatunji Ruwase
|
607814feb9
Fix bug in fp32 optimizer state loading (#289)
|
4 years ago |
Calogero Zarbo
|
43f27332c2
Add "zero_allow_untested_optimizer" option in conf file (#173)
|
4 years ago |
Jeff Rasley
|
001abe2362
Refactor simple model test, fix pythonpath issue (#96)
|
4 years ago |