Jeff Rasley
|
da84e60d98
add missing license info to top of all source code (#2889)
|
1 year ago |
Stas Bekman
|
f30a030861
[fp16] lower initial_scale_power (#2663)
|
1 year ago |
Conglong Li
|
ef869377e9
DeepSpeed Data Efficiency Library (#2585)
|
1 year ago |
Joe Mayer
|
21c2802964
Adding Gradient Accumulation Data Type Config (#2512)
|
1 year ago |
Joe Mayer
|
7d113633e4
Fix Bug #2319 (#2438)
|
2 years ago |
Adam Moody
|
b8fb9c3f1a
parallelize writing of layer checkpoint files across data parallel instances (#1419)
|
2 years ago |
Jeff Rasley
|
a039e2261a
enable fp16 input autocasting (#2158)
|
2 years ago |
Olatunji Ruwase
|
80d0a32f0b
Checkpoint reshaping (#1953)
|
2 years ago |
Zhewei Yao
|
0f4f2f982c
Adding DeepSpeed Compression Composer (#2105)
|
2 years ago |
Quentin Anthony
|
c87f6ee209
DeepSpeed Monitor Module (Master) (#2013)
|
2 years ago |
Olatunji Ruwase
|
56c5223868
bf16+pipeline parallelism (#1801)
|
2 years ago |
Olatunji Ruwase
|
135a625619
Move param_shapes to model files (#1732)
|
2 years ago |
Justin Chiu
|
4912e0ad7e
Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453)
|
2 years ago |
Jeff Rasley
|
3293cf72a0
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
|
2 years ago |
Mikhail Druzhinin
|
d14baad940
allreduce_always_fp16 (#1487)
|
2 years ago |
Rana Ali Amjad
|
648f7bfa50
Bfloat16 zero2 (#1398)
|
3 years ago |
Hari Prasad
|
c0b27fb019
Added drop_last to DeepSpeedDataLoader (#1321)
|
3 years ago |
Ammar Ahmad Awan
|
f28432441b
DeepSpeed MoE (#1310)
|
3 years ago |
Conglong Li
|
b2b34ae342
Curriculum learning (#1307)
|
3 years ago |
Reza Yazdani
|
ed3de0c21b
Quantization + inference release (#1091)
|
3 years ago |
Jeff Rasley
|
f032e56f8a
Validate consistent ckpt tags across ranks (#667)
|
3 years ago |
Jeff Rasley
|
7435b2f10a
Ability to initialize distributed backend outside deepspeed runtime (#608)
|
3 years ago |
Olatunji Ruwase
|
be1147c08a
PLD release (#513)
|
4 years ago |
Jeff Rasley
|
41db1c2f03
ZeRO-Offload release (#391)
|
4 years ago |
Arash Ashari
|
a64b0abbcc
fixed a typo; this was fixed before but seems like it has been lost in the refactor (#364)
|
4 years ago |
Jeff Rasley
|
e5bbc2e559
Sparse attn + ops/runtime refactor + v0.3.0 (#343)
|
4 years ago |