Author | Commit | Message | Date
Max Kovalenko | 3c0bd31288 | BF16 optimizer: Improve device utilization by immediate grad update (#4975) | 8 months ago
inkcherry | d5a7c1e0b4 | Capture short kernel sequences to graph (#4318) | 10 months ago
Reza Yazdani | 2afa1c7f2f | Communication Optimization for Large-Scale Training (#4695) | 11 months ago
Reza Yazdani | ec029e7625 | Fix the sequence-parallelism for the dense model architecture (#4530) | 1 year ago
Quentin Anthony | 0411a9f871 | Expose Consecutive Hysteresis to Users (#3553) | 1 year ago
Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago
Jeff Rasley | 457850dc5a | [zero] prevent poor configs from running w. zero-offload (#2971) | 1 year ago
Jeff Rasley | da84e60d98 | add missing license info to top of all source code (#2889) | 1 year ago
Stas Bekman | f30a030861 | [fp16] lower initial_scale_power (#2663) | 1 year ago
Conglong Li | ef869377e9 | DeepSpeed Data Efficiency Library (#2585) | 1 year ago
Joe Mayer | 21c2802964 | Adding Gradient Accumulation Data Type Config (#2512) | 1 year ago
Joe Mayer | 7d113633e4 | Fix Bug #2319 (#2438) | 2 years ago
Adam Moody | b8fb9c3f1a | parallelize writing of layer checkpoint files across data parallel instances (#1419) | 2 years ago
Jeff Rasley | a039e2261a | enable fp16 input autocasting (#2158) | 2 years ago
Olatunji Ruwase | 80d0a32f0b | Checkpoint reshaping (#1953) | 2 years ago
Zhewei Yao | 0f4f2f982c | Adding DeepSpeed Compression Composer (#2105) | 2 years ago
Quentin Anthony | c87f6ee209 | DeepSpeed Monitor Module (Master) (#2013) | 2 years ago
Olatunji Ruwase | 56c5223868 | bf16+pipeline parallelism (#1801) | 2 years ago
Olatunji Ruwase | 135a625619 | Move param_shapes to model files (#1732) | 2 years ago
Justin Chiu | 4912e0ad7e | Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453) | 2 years ago
Jeff Rasley | 3293cf72a0 | [ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525) | 2 years ago
Mikhail Druzhinin | d14baad940 | allreduce_always_fp16 (#1487) | 2 years ago
Rana Ali Amjad | 648f7bfa50 | Bfloat16 zero2 (#1398) | 3 years ago
Hari Prasad | c0b27fb019 | Added drop_last to DeepSpeedDataLoader (#1321) | 3 years ago
Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago
Conglong Li | b2b34ae342 | Curriculum learning (#1307) | 3 years ago
Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago
Jeff Rasley | f032e56f8a | Validate consistent ckpt tags across ranks (#667) | 3 years ago
Jeff Rasley | 7435b2f10a | Ability to initialize distributed backend outside deepspeed runtime (#608) | 3 years ago