Adam Moody | 43d58d99eb | ckpt: create directories in checkpoint_engine (#2988) | 1 year ago
Jeff Rasley | da84e60d98 | add missing license info to top of all source code (#2889) | 1 year ago
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
Joe Mayer | 18713c6838 | Updating API docs (#2586) | 1 year ago
iLeGend | 06e00f61ce | Fix typos: deepseed -> deepspeed (#2499) | 1 year ago
Adam Moody | b8fb9c3f1a | parallelize writing of layer checkpoint files across data parallel instances (#1419) | 2 years ago
trajep | e669aaf55b | Trajepl/nebula ckpt engine (#2085) | 2 years ago
Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago
Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago
Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago
Jeff Rasley | b4fcd98ff0 | Inference PP changes for neox (#1899) | 2 years ago
Olatunji Ruwase | 56c5223868 | bf16+pipeline parallelism (#1801) | 2 years ago
James Reed | fafc827d64 | Render docs for pipe.ProcessTopology (#1505) | 2 years ago
Stas Bekman | bf1725bb57 | [code readability] pipe (#1510) | 3 years ago
Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago
Hyunwoong Ko | 30965ea734 | Add flexibility of pipeline parallel module and engine (#1399) | 3 years ago
Jeff Rasley | e2fdd254ed | Big science related changes (#1407) | 3 years ago
Adam Moody | 4ad8019cdf | fix: support three digit layer numbers (#1377) | 3 years ago
Olatunji Ruwase | 336dd089e5 | Use clone to avoid checkpoint bloat (#1326) | 3 years ago
Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago
Shaden Smith | 46f4573b1a | Seeded unit tests (#1072) | 3 years ago
sdtblck | 669028f0fd | Fix all Pipeline Module Parameters being sent to cuda:0 (#687) | 3 years ago
Shaden Smith | fbece50b21 | assert no Z2/Z3 with pipeline and fix some docs links (#980) | 3 years ago
Shaden Smith | c82756cd15 | readthedocs upgrade (#402) | 4 years ago
Shaden Smith | 65c2f974d8 | Pipeline parallel training engine. (#392) | 4 years ago