containers | d9e12d3a68 | Fix attention mask handling in the Hybrid Engine Bloom flow (#5101) | 7 months ago
__init__.py | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
auto_tp.py | 8ea995ee1f | enable yuan autotp & add conv tp (#5428) | 4 months ago
auto_tp_model_utils.py | c20f6fa4e0 | support baichuan model: (#4721) | 10 months ago
fusedqkv_utils.py | 8ea995ee1f | enable yuan autotp & add conv tp (#5428) | 4 months ago
inject.py | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
layers.py | 8ea995ee1f | enable yuan autotp & add conv tp (#5428) | 4 months ago
load_checkpoint.py | 567f97b264 | load linear layer weight with given dtype (#4044) | 8 months ago
module_quantize.py | 430510bfce | Checks for user injection policy (#3052) | 1 year ago
policy.py | 0a61d5d664 | Hybrid Engine Refactor and Llama Inference Support (#3425) | 1 year ago
replace_module.py | 8ea995ee1f | enable yuan autotp & add conv tp (#5428) | 4 months ago
replace_policy.py | 468882fb68 | Add the policy to run llama model from the official repo (#4313) | 1 year ago
tp_shard.py | 3dd7ccff81 | enable phi3_mini autotp (#5501) | 5 months ago
utils.py | 468882fb68 | Add the policy to run llama model from the official repo (#4313) | 1 year ago