Commit history

Author SHA1 Message Commit date
  Yudi Zhang e75c285ad7 fix user args parsing of string with spaces on runner (#4265) 1 year ago
  Hiromasa 8145b5e41f added port argument for ssh (#4117) 1 year ago
  Logan Adams 1a29573946 Handling for SIGTERM as well (#4160) 1 year ago
  Michael Wyatt 0cc2d6ff25 Fix user arg parsing in single node deployment (#4007) 1 year ago
  Logan Adams 6580a2db17 Allow user to select name of .deepspeed_env (#4006) 1 year ago
  digger yu ce535945e6 fix: change ==NONE to is (#3923) 1 year ago
  Abhilash Majumder 26b3e73298 single node pdsh sigkill (#3730) 1 year ago
  Logan Adams d8aaa58122 Fix incorrectly formatted f string (#3698) 1 year ago
  Jeff Rasley 49a73549b9 AISC launcher fixes (#3637) 1 year ago
  Ma, Guokai 1f72082fc0 [CPU] Support Intel CPU inference (#3041) 1 year ago
  Michael Wyatt 2f8d384e8b print default values (#3347) 1 year ago
  Ma, Guokai 0b5252bbd3 [CPU support] Optionally bind each rank to different cores on host (#2881) 1 year ago
  Michael Wyatt b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
  Jeff Rasley 91d63e0228 update formatter version and style settings (#3098) 1 year ago
  mzl 8d53ac0cd3 Add MPICH Multinode Runner (#2839) 1 year ago
  Ma, Guokai 98cc35b6a8 Abstract accelerator (step 3) (#2677) 1 year ago
  Jeff Rasley a091bc223c [launcher] fail gracefully if hostname -i doesn't work as expected (#2631) 1 year ago
  mzl 11f5daba5e add enable_each_rank_log to deepspeed/launcher/runner.py (#2571) 1 year ago
  Jeff Rasley 8c56c25d84 [launcher] parse hostfile via regex and added error checks (#2626) 1 year ago
  savitamittal1 ffb6d98762 Added MLFLOW environment variables for logging metrics within trainig… (#2477) 1 year ago
  Cheng Li 8da0238b7a rollback ds config changes (#2395) 2 years ago
  Dashiell Stander 3db0b5e2de Add SLURM Multinode Runner (#2404) 2 years ago
  Arpan Jain 1ed5aa96a8 Elastic Training support in DeepSpeed (#2153) (#2156) 2 years ago
  trajep e669aaf55b Trajepl/nebula ckpt engine (#2085) 2 years ago
  liamcli 380d32f980 [launcher] add option to bypass ssh check (#1957) 2 years ago
  Jeff Rasley a773996d97 [launcher] validate passwordless-ssh works when using hostfile launching (#1832) 2 years ago
  liamcli dac9056e13 Improve how runner parses env var file (#1747) 2 years ago
  Jeff Rasley 9351266f78 Multi-node save pid support + allow sparse-attn extra (#1728) 2 years ago
  Jeff Rasley 171316fc83 launcher save pid + require manual triton install for sparse-attn (#1727) 2 years ago
  Jeff Rasley 2d51f6171b preserve cuda visible devices order (#1712) 2 years ago