Author | Commit | Message | Date
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
loadams | 34a11688c4 | Change zero_grad() argument to match pytorch (#2741) | 1 year ago
JackieWu | 323c266cfe | [Bug Fixed] use torch.cuda.is_available() (#2661) | 1 year ago
Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago
Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago
Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago
Jeff Rasley | 50893458d6 | Fairseq support (#1915) | 2 years ago
Olatunji Ruwase | 56c5223868 | bf16+pipeline parallelism (#1801) | 2 years ago
Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and mangement (#1798) | 2 years ago
Olatunji Ruwase | 135a625619 | Move param_shapes to model files (#1732) | 2 years ago
Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago
Jeff Rasley | 3293cf72a0 | [ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525) | 2 years ago
Jeff Rasley | e2fdd254ed | Big science related changes (#1407) | 3 years ago
Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago
Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago
Conglong Li | 67a48aaa89 | 1-bit LAMB optimizer (#970) | 3 years ago
Stas Bekman | 29853c3eed | less scary overflow notice (#833) | 3 years ago
Shaden Smith | f5cce75e70 | Overflow fix (#416) | 4 years ago
Shaden Smith | 65c2f974d8 | Pipeline parallel training engine. (#392) | 4 years ago
Ammar Ahmad Awan | 01726ce2b8 | Add 1-bit Adam support to DeepSpeed (#380) | 4 years ago
Jeff Rasley | e5bbc2e559 | Sparse attn + ops/runtime refactor + v0.3.0 (#343) | 4 years ago