Commit History

Author SHA1 Message Commit Date
  Tunji Ruwase 3d5ea1430b Respect memory pinning config 1 year ago
  Joe Mayer 8a8683d343 Fix Issue 4083 (#4084) 1 year ago
  leiwen83 1e0c39c6bf enable pipeline checkpoint loading mode (#3629) 1 year ago
  Zhen Zhang 8a63754bce save_non_zero_checkpoint on first partition group (#3787) 1 year ago
  Olatunji Ruwase 7f90ef4bdd Multiple zero stage 3 related fixes (#3886) 1 year ago
  Alexander Jipa b354c28b76 polishing timers and log_dist (#3996) 1 year ago
  Joe Mayer eeab613ab8 Monitored Loss Calculations (#4030) 1 year ago
  Olatunji Ruwase 0a0819b785 Option to exclude frozen weights for checkpoint save (#3953) 1 year ago
  Joe Mayer 8afcda2ac9 ZeRO Gradient Accumulation Dtype. (#2847) 1 year ago
  Adrian Wälchli fb9aebbf25 Fix checkpoint conversion when model layers share weights (#3825) 1 year ago
  digger yu ce535945e6 fix: change ==NONE to is (#3923) 1 year ago
  Xingjian Shi d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
  kisseternity 1b888399dc Add an api in deepspeed engine for adjusting micro batch size during training (#3773) 1 year ago
  Joe Mayer 5eb2598623 Requires grad checking. (#3789) 1 year ago
  Cheng Li c80855b543 Bug Fixes for autotuner and flops profiler (#1880) 1 year ago
  Heyang Qin d18aa2c79c ZeRO++ (#3784) 1 year ago
  Jeff Rasley 80ccaf9c7a revert PR #3611 (#3786) 1 year ago
  mzl 5a5340d03b remove UtilsBuilder load, use torch (un)flatten ops (#3728) 1 year ago
  Zhen Zhang c88af21432 [MiCS] [Fix] saving and loading model checkpoint logic for MiCS sharding (#3440) 1 year ago
  Guo Yejun 460bec4679 flops_profiler: add option recompute_fwd_factor for the case of activation recompute (#3362) 1 year ago
  digger yu cd4e473ee6 fix typo with deepspeed/ (#3547) 1 year ago
  Olatunji Ruwase d39c311fc6 DS init should not broadcast or move zero.Init models (#3611) 1 year ago
  Joe Mayer 4d269c6e4d Changing monitor loss to aggregate loss over gradient accumulation steps (#3428) 1 year ago
  digger-yu 254663a28c fix spelling error with deepspeed/runtime/ (#3509) 1 year ago
  Tian, Feng 6938c449de Add snip_momentum structured pruning which can support higher sparse ratio with minor accuracy loss (#3300) 1 year ago
  Joe Mayer d3550dc88a Adagrad support in ZeRO (#3401) 1 year ago
  Stas Bekman 77ebf760f3 [zero_to_fp32] fix shared param recovery (#3407) 1 year ago
  Zhen Zhang 2e99f6edf6 [DRAFT] Tentative implementation of MiCS (#2964) 1 year ago
  Alexander Jipa d56268f375 fixing default communication_data_type for bfloat16_enabled and docs (#3370) 1 year ago
  Michael Wyatt ad168a6954 Fix for dist not being initialized when constructing main config (#3324) 1 year ago