ranzhejiang 7260890452 reduce cpu host overhead when using moe (#5578) 2 months ago
__init__.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
experts.py 971d82b573 MoE type hints (#5043) 8 months ago
layer.py 08e0733e4a Support MoE for pipeline models (#5338) 6 months ago
mappings.py f5d6c6311e reduce all-to-all communication volume when both expert and non-expert are tensor-parallel (#5626) 3 months ago
sharded_moe.py 7260890452 reduce cpu host overhead when using moe (#5578) 2 months ago
utils.py 42a8eaa705 Auto convert moe param groups (#5354) 6 months ago
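Of the files above, layer.py provides the user-facing MoE wrapper, while sharded_moe.py implements the gating and all-to-all token dispatch across the expert-parallel group. A minimal usage sketch follows, based on the public deepspeed.moe.layer.MoE API; the hidden size, expert count, parallelism degree, and top-k values are illustrative assumptions, not taken from this listing, and the script is assumed to run under a torch/DeepSpeed distributed launcher.

```python
import torch
import torch.nn as nn
import deepspeed

from deepspeed.moe.layer import MoE

# MoE needs a (possibly single-process) distributed group; assumes the
# script is launched with the usual distributed environment variables set.
deepspeed.init_distributed()

# A plain feed-forward block used as the per-expert network; layer.py
# replicates it num_experts times internally.
hidden_size = 512  # illustrative value
expert = nn.Sequential(
    nn.Linear(hidden_size, 4 * hidden_size),
    nn.ReLU(),
    nn.Linear(4 * hidden_size, hidden_size),
)

moe_layer = MoE(
    hidden_size=hidden_size,
    expert=expert,
    num_experts=8,  # total experts across the expert-parallel group (illustrative)
    ep_size=1,      # expert-parallel world size (illustrative)
    k=1,            # top-1 gating
)

x = torch.randn(4, 16, hidden_size)
# Forward returns the combined expert output, the auxiliary
# load-balancing loss, and per-expert token counts.
output, l_aux, exp_counts = moe_layer(x)
```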