Quentin Anthony ac2c9ffae4 Improve loss overflow logs (#3008) 1 year ago
autotuning 6379defaef bug fix for skipping mbs (#2171) 1 year ago
checkpoint da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
comm 50a49e42fb [logger] implement warning_once (#3021) 1 year ago
compression da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
elasticity da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
inference 87eaf8f99a Check for local CUDA graphs when enable_cuda_graph=True (#2941) 1 year ago
launcher 8d53ac0cd3 Add MPICH Multinode Runner (#2839) 1 year ago
model_implementations 87eaf8f99a Check for local CUDA graphs when enable_cuda_graph=True (#2941) 1 year ago
module_inject 3798e60519 Fix Meta Tensor checkpoint load for OPT models (#2990) 1 year ago
moe da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
monitor 91d7090e47 Fixes `AttributeError` in #2853 (#2854) 1 year ago
nebula da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
ops 17fa0876ad Always convert input mask to half (#2851) 1 year ago
pipe da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
profiling da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
runtime ac2c9ffae4 Improve loss overflow logs (#3008) 1 year ago
utils 50a49e42fb [logger] implement warning_once (#3021) 1 year ago
__init__.py 4abf637f96 Remove mutable default parameter in init_inference() (#2540) 1 year ago
accelerator 9548d48f48 Abstract accelerator (step 2) (#2560) 1 year ago
constants.py 1ed5aa96a8 Elastic Training support in DeepSpeed (#2153) (#2156) 2 years ago
env_report.py da84e60d98 add missing license info to top of all source code (#2889) 1 year ago
git_version_info.py da84e60d98 add missing license info to top of all source code (#2889) 1 year ago