Yudi Zhang | e75c285ad7 | fix user args parsing of string with spaces on runner (#4265) | 1 year ago
Hiromasa | 8145b5e41f | added port argument for ssh (#4117) | 1 year ago
Logan Adams | 1a29573946 | Handling for SIGTERM as well (#4160) | 1 year ago
Michael Wyatt | 0cc2d6ff25 | Fix user arg parsing in single node deployment (#4007) | 1 year ago
Logan Adams | 6580a2db17 | Allow user to select name of .deepspeed_env (#4006) | 1 year ago
digger yu | ce535945e6 | fix: change ==NONE to is (#3923) | 1 year ago
Abhilash Majumder | 26b3e73298 | single node pdsh sigkill (#3730) | 1 year ago
Logan Adams | d8aaa58122 | Fix incorrectly formatted f string (#3698) | 1 year ago
Jeff Rasley | 49a73549b9 | AISC launcher fixes (#3637) | 1 year ago
Ma, Guokai | 1f72082fc0 | [CPU] Support Intel CPU inference (#3041) | 1 year ago
Michael Wyatt | 2f8d384e8b | print default values (#3347) | 1 year ago
Ma, Guokai | 0b5252bbd3 | [CPU support] Optionally bind each rank to different cores on host (#2881) | 1 year ago
Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago
mzl | 8d53ac0cd3 | Add MPICH Multinode Runner (#2839) | 1 year ago
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
Jeff Rasley | a091bc223c | [launcher] fail gracefully if hostname -i doesn't work as expected (#2631) | 1 year ago
mzl | 11f5daba5e | add enable_each_rank_log to deepspeed/launcher/runner.py (#2571) | 1 year ago
Jeff Rasley | 8c56c25d84 | [launcher] parse hostfile via regex and added error checks (#2626) | 1 year ago
savitamittal1 | ffb6d98762 | Added MLFLOW environment variables for logging metrics within trainig… (#2477) | 1 year ago
Cheng Li | 8da0238b7a | rollback ds config changes (#2395) | 2 years ago
Dashiell Stander | 3db0b5e2de | Add SLURM Multinode Runner (#2404) | 2 years ago
Arpan Jain | 1ed5aa96a8 | Elastic Training support in DeepSpeed (#2153) (#2156) | 2 years ago
trajep | e669aaf55b | Trajepl/nebula ckpt engine (#2085) | 2 years ago
liamcli | 380d32f980 | [launcher] add option to bypass ssh check (#1957) | 2 years ago
Jeff Rasley | a773996d97 | [launcher] validate passwordless-ssh works when using hostfile launching (#1832) | 2 years ago
liamcli | dac9056e13 | Improve how runner parses env var file (#1747) | 2 years ago
Jeff Rasley | 9351266f78 | Multi-node save pid support + allow sparse-attn extra (#1728) | 2 years ago
Jeff Rasley | 171316fc83 | launcher save pid + require manual triton install for sparse-attn (#1727) | 2 years ago
Jeff Rasley | 2d51f6171b | preserve cuda visible devices order (#1712) | 2 years ago