Ma, Guokai
|
c08e69f212
Make op builder detection adapt to accelerator change (#5206)
|
7 months ago |
Zhihao Lin
|
01af3e1ddf
Enhance the robustness of `module_state_dict` (#4587)
|
11 months ago |
Alexander Jipa
|
2ded2ff0be
checking process_group before merging bucket ranges (#3521) (#3577)
|
1 year ago |
Olatunji Ruwase
|
dd8df20fe0
zero3 checkpoint frozen params (#3205)
|
1 year ago |
Michael Wyatt
|
b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
|
1 year ago |
Jeff Rasley
|
91d63e0228
update formatter version and style settings (#3098)
|
1 year ago |
Ma, Guokai
|
0acf7e9c48
[RFC] add device abstraction to allow other device than CUDA be used (#2221)
|
1 year ago |
Jeff Rasley
|
da84e60d98
add missing license info to top of all source code (#2889)
|
1 year ago |
Alexander Jipa
|
cfead55132
fixes #2389 (#2411)
|
2 years ago |
Michael Wyatt
|
ff42743865
Refactor remaining distributed tests (#2216)
|
2 years ago |
Ammar Ahmad Awan
|
36ad3119d5
DeepSpeed comm backend v1 (#1985)
|
2 years ago |
Ammar Ahmad Awan
|
c0af6d90f7
Refactor MoE and Groups API to simplify model creation and mangement (#1798)
|
2 years ago |
Jeff Rasley
|
3293cf72a0
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
|
2 years ago |
Jeff Rasley
|
2332cb31a7
Enables ZeRO-3 inference (#1514)
|
2 years ago |
Jeff Rasley
|
6996bb0159
Sparse attn triton v1.0 support + torch1.8 test runner (#1374)
|
3 years ago |
Hari Prasad
|
c0b27fb019
Added drop_last to DeepSpeedDataLoader (#1321)
|
3 years ago |
Ammar Ahmad Awan
|
f28432441b
DeepSpeed MoE (#1310)
|
3 years ago |
Conglong Li
|
b2b34ae342
Curriculum learning (#1307)
|
3 years ago |
Jeff Rasley
|
adc21a4dfd
ZeRO-1 empty grads fix + tests (#1273)
|
3 years ago |
hamlet
|
d0b61f1810
Add find_unused_parameters option to DeepSpeedEngine (#945)
|
3 years ago |
Jeff Rasley
|
2e2dd861f3
Dist testing backend fixes, etc. (#708)
|
3 years ago |
Olatunji Ruwase
|
e6ac731136
Support initialization with dict configuration (#632)
|
3 years ago |
Olatunji Ruwase
|
6021b70288
Support non-tensor state in checkpoint (#548)
|
3 years ago |
Olatunji Ruwase
|
0178e6cc22
Fix unbalanced gradients bug in ZeRO-2 gradient accumulation (#545)
|
3 years ago |
Olatunji Ruwase
|
be1147c08a
PLD release (#513)
|
4 years ago |
Shaden Smith
|
65c2f974d8
Pipeline parallel training engine. (#392)
|
4 years ago |
Jeff Rasley
|
376818ef9d
Empty grad fix (#291)
|
4 years ago |
Olatunji Ruwase
|
607814feb9
Fix bug in fp32 optimizer state loading (#289)
|
4 years ago |
Calogero Zarbo
|
43f27332c2
Add "zero_allow_untested_optimizer" option in conf file (#173)
|
4 years ago |
Jeff Rasley
|
001abe2362
Refactor simple model test, fix pythonpath issue (#96)
|
4 years ago |