Olatunji Ruwase
|
541e423ae6
Enable tensor fragments for zero 2 & 3 (#2727)
|
1 year ago |
Jeff Rasley
|
da84e60d98
add missing license info to top of all source code (#2889)
|
1 year ago |
Olatunji Ruwase
|
f4a92a19a6
Checkpoint backwards-compatibility workaround (#2384)
|
2 years ago |
Olatunji Ruwase
|
53182531ed
Refactor universal checkpointing and tensor fragments (#2253)
|
2 years ago |
Quentin Anthony
|
5349347bb6
DeepSpeed Communication Profiling and Logging (#2012)
|
2 years ago |
Reza Yazdani
|
aa88137b8d
Add Inference support for running the BigScience-BLOOM Architecture (#2083)
|
2 years ago |
Ammar Ahmad Awan
|
36ad3119d5
DeepSpeed comm backend v1 (#1985)
|
2 years ago |
Justin Chiu
|
4912e0ad7e
Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453)
|
2 years ago |
Ammar Ahmad Awan
|
f28432441b
DeepSpeed MoE (#1310)
|
3 years ago |
Jeff Rasley
|
7435b2f10a
Ability to initialize distributed backend outside deepspeed runtime (#608)
|
3 years ago |
Shaden Smith
|
65c2f974d8
Pipeline parallel training engine. (#392)
|
4 years ago |
Jeff Rasley
|
e5bbc2e559
Sparse attn + ops/runtime refactor + v0.3.0 (#343)
|
4 years ago |