Commit History

Author SHA1 Message Date
  harygo2 0fc19b6a32 Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() (#5464) 5 months ago
  inkcherry 0896503e2f Fix a convergence issues in TP topology caused by incorrect grad_norm. (#5411) 6 months ago
  Logan Adams 6dcced1d5c Cleanup required_torch_version code and references. (#5370) 6 months ago
  Masahiro Tanaka c56a4b9e0d Improve universal checkpoint (#5289) 6 months ago
  inkcherry e5dd5501c1 support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix (#5259) 6 months ago
  Moshe Island 8ad187d84f Universal ckp fixes (#4588) 11 months ago
  Jackmin801 2f73b834b5 change default set_to_none in zero_grad methods (#4438) 1 year ago
  Alexander Jipa b354c28b76 polishing timers and log_dist (#3996) 1 year ago
  Logan Adams 6b2365e4fa Re-enable elastic training for torch 2+ (#4010) 1 year ago
  Michael Wyatt b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
  Jeff Rasley 91d63e0228 update formatter version and style settings (#3098) 1 year ago
  Ma, Guokai 98cc35b6a8 Abstract accelerator (step 3) (#2677) 1 year ago
  loadams 34a11688c4 Change zero_grad() argument to match pytorch (#2741) 1 year ago
  JackieWu 323c266cfe [Bug Fixed] use torch.cuda.is_available() (#2661) 1 year ago
  Alex Hedges 316c4a43e0 Add flake8 to pre-commit checks (#2051) 2 years ago
  Karim Foda 735406e536 fix import errors (#2026) 2 years ago
  Ammar Ahmad Awan 36ad3119d5 DeepSpeed comm backend v1 (#1985) 2 years ago
  Jeff Rasley 50893458d6 Fairseq support (#1915) 2 years ago
  Olatunji Ruwase 56c5223868 bf16+pipeline parallelism (#1801) 2 years ago
  Ammar Ahmad Awan c0af6d90f7 Refactor MoE and Groups API to simplify model creation and mangement (#1798) 2 years ago
  Olatunji Ruwase 135a625619 Move param_shapes to model files (#1732) 2 years ago
  Jeff Rasley e46d808a1b MoE inference + PR-MoE model support (#1705) 2 years ago
  Jeff Rasley 3293cf72a0 [ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525) 2 years ago
  Jeff Rasley e2fdd254ed Big science related changes (#1407) 3 years ago
  Ammar Ahmad Awan f28432441b DeepSpeed MoE (#1310) 3 years ago
  Reza Yazdani ed3de0c21b Quantization + inference release (#1091) 3 years ago
  Conglong Li 67a48aaa89 1-bit LAMB optimizer (#970) 3 years ago
  Stas Bekman 29853c3eed less scary overflow notice (#833) 3 years ago
  Shaden Smith f5cce75e70 Overflow fix (#416) 4 years ago
  Shaden Smith 65c2f974d8 Pipeline parallel training engine. (#392) 4 years ago