Latest commit: ranzhejiang 7260890452 reduce cpu host overhead when using moe (#5578), 2 months ago

| File | Last commit | Last updated |
| --- | --- | --- |
| __init__.py | b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago |
| experts.py | 971d82b573 MoE type hints (#5043) | 8 months ago |
| layer.py | 08e0733e4a Support MoE for pipeline models (#5338) | 6 months ago |
| mappings.py | f5d6c6311e reduce all-to-all communication volume when both expert and non-expert are tensor-parallel (#5626) | 3 months ago |
| sharded_moe.py | 7260890452 reduce cpu host overhead when using moe (#5578) | 2 months ago |
| utils.py | 42a8eaa705 Auto convert moe param groups (#5354) | 6 months ago |
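
For context on how these modules fit together: layer.py exposes the `MoE` wrapper module, experts.py holds the replicated expert parameters, and sharded_moe.py implements the gating and all-to-all token dispatch behind it. Below is a minimal sketch of instantiating the layer, assuming a standard PyTorch feed-forward block as the expert and a distributed process group already initialized; the hidden sizes and gating settings are illustrative, not taken from this repo.

```python
import torch
import deepspeed
from deepspeed.moe.layer import MoE  # defined in layer.py above

# Assumes a process group is available (single process works with
# the usual MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE env vars set).
deepspeed.init_distributed()

# A plain feed-forward block used as the expert; experts.py replicates
# it num_experts times internally.
expert = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
)

# Illustrative settings: top-1 gating over 8 experts. The gating and
# all-to-all dispatch live in sharded_moe.py.
moe = MoE(hidden_size=512, expert=expert, num_experts=8, k=1)

x = torch.randn(4, 16, 512)            # (batch, seq, hidden)
output, l_aux, exp_counts = moe(x)     # output, aux load-balancing loss,
                                       # and per-expert token counts
```

The auxiliary loss `l_aux` is what utils.py's parameter-group handling and the gating code in sharded_moe.py support; it is typically added to the main training loss to keep tokens balanced across experts.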