Stas Bekman
|
e3c2d7b16f
[launcher/runner] respect CUDA_VISIBLE_DEVICES for a single node (#960)
|
2 years ago |
Cheng Li
|
9caa74e577
Autotuning (#1554)
|
2 years ago |
Chunyang Wen
|
df5b0884c7
Unify use f str (#1511)
|
3 years ago |
Alex Hedges
|
be789b1665
Fix many typos (#1423)
|
3 years ago |
Jeff Rasley
|
9e0dab402d
add option to force multi-node launcher mode (#977)
|
3 years ago |
Jeff Rasley
|
72a30c1eab
revert zero-inf change to launcher
|
3 years ago |
Jeff Rasley
|
0d4a54a04d
ZeRO-Infinity (#976)
|
3 years ago |
Takuya Makino
|
e6999ebd16
Delete check of pdsh (#941)
|
3 years ago |
Takuya Makino
|
ce14cf1af6
Add space in help string (#926)
|
3 years ago |
Stas Bekman
|
24335d49ce
[runner/launch] propagate the error (#854)
|
3 years ago |
Jeff Rasley
|
2e6692c8ad
Fix regression in runner (#843)
|
3 years ago |
Samyam Rajbhandari
|
599258f979
ZeRO 3 Offload (#834)
|
3 years ago |
Jeff Rasley
|
6217a6c243
skip empty lines in hostfile (#669)
|
3 years ago |
Jeff Rasley
|
7435b2f10a
Ability to initialize distributed backend outside deepspeed runtime (#608)
|
3 years ago |
Ammar Ahmad Awan
|
01726ce2b8
Add 1-bit Adam support to DeepSpeed (#380)
|
4 years ago |
Jeff Rasley
|
e5bbc2e559
Sparse attn + ops/runtime refactor + v0.3.0 (#343)
|
4 years ago |