| File | Commit | Last commit message | Last updated |
|---|---|---|---|
| containers | 9fa4c42443 | fix: quantization with DeepSpeed HE (#5624) | 3 months ago |
| __init__.py | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago |
| auto_tp.py | 474a3288cd | Enabled Qwen2-MoE Tensor Parallelism (TP) inference (#6551) | 1 week ago |
| auto_tp_model_utils.py | c20f6fa4e0 | support baichuan model: (#4721) | 10 months ago |
| fusedqkv_utils.py | 0d3bb77b33 | Add chatglm2 & chatglm3 autotp (#5540) | 3 months ago |
| inject.py | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago |
| layers.py | 8ea995ee1f | enable yuan autotp & add conv tp (#5428) | 4 months ago |
| load_checkpoint.py | 567f97b264 | load linear layer weight with given dtype (#4044) | 8 months ago |
| module_quantize.py | 430510bfce | Checks for user injection policy (#3052) | 1 year ago |
| policy.py | 0a61d5d664 | Hybrid Engine Refactor and Llama Inference Support (#3425) | 1 year ago |
| replace_module.py | e97b453645 | Add llama3.2 vision autotp (#6577) | 1 week ago |
| replace_policy.py | 468882fb68 | Add the policy to run llama model from the official repo (#4313) | 1 year ago |
| tp_shard.py | 3dd7ccff81 | enable phi3_mini autotp (#5501) | 5 months ago |
| utils.py | 468882fb68 | Add the policy to run llama model from the official repo (#4313) | 1 year ago |