Commit History

Author | SHA1 | Message | Date
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
Alexander Jipa | 0f0e38c520 | fixes #2498 (#2603) | 1 year ago
ShijieZZZZ | 340fc0cf19 | Report progress at gradient accumulation boundary (#2553) | 1 year ago
Jeff Rasley | 5bd09a8f83 | Allow turning off loss scaling wrt GAS + update tput calculator (#2140) | 2 years ago
Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago
Quentin Anthony | 5349347bb6 | DeepSpeed Communication Profiling and Logging (#2012) | 2 years ago
Zeyu | b05237876e | fixed "None type has no len()" (#2091) | 2 years ago
Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago
Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago
Quentin Anthony | 0d36893281 | Fix timer typo (#1964) | 2 years ago
Olatunji Ruwase | fee7313598 | Use cuda events to improve timing for multi-stream execution (#1881) | 2 years ago
Justin Chiu | 4912e0ad7e | Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453) | 2 years ago
Cheng Li | 9caa74e577 | Autotuning (#1554) | 2 years ago
Cheng Li | 4544b7d2f1 | Improve flops profiler functionality (#1065) | 3 years ago
Sean Naren | 6fb16100ba | Replace timer print rank 0 with logging (#732) | 3 years ago
Jeff Rasley | 0dc8420042 | Dependency pruning (#528) | 3 years ago
Shaden Smith | 65c2f974d8 | Pipeline parallel training engine. (#392) | 4 years ago
Jeff Rasley | e5bbc2e559 | Sparse attn + ops/runtime refactor + v0.3.0 (#343) | 4 years ago