gyou2021 474a3288cd Enabled Qwen2-MoE Tensor Parallelism (TP) inference (#6551) 1 week ago
..
containers 9fa4c42443 fix: quantization with DeepSpeed HE (#5624) 3 months ago
__init__.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
auto_tp.py 474a3288cd Enabled Qwen2-MoE Tensor Parallelism (TP) inference (#6551) 1 week ago
auto_tp_model_utils.py c20f6fa4e0 support baichuan model: (#4721) 10 months ago
fusedqkv_utils.py 0d3bb77b33 Add chatglm2 & chatglm3 autotp (#5540) 3 months ago
inject.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
layers.py 8ea995ee1f enable yuan autotp & add conv tp (#5428) 4 months ago
load_checkpoint.py 567f97b264 load linear layer weight with given dtype (#4044) 8 months ago
module_quantize.py 430510bfce Checks for user injection policy (#3052) 1 year ago
policy.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
replace_module.py e97b453645 Add llama3.2 vision autotp (#6577) 1 week ago
replace_policy.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
tp_shard.py 3dd7ccff81 enable phi3_mini autotp (#5501) 5 months ago
utils.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago