Michael Wyatt
|
4c35880b16
Allow multiple inference engines in single script (#4384)
|
1 year ago |
Michael Wyatt
|
85334238da
added check to avoid undefined behavior when the input_id length is greater than max_tokens (#4349)
|
1 year ago |
Satpal Singh Rathore
|
430510bfce
Checks for user injection policy (#3052)
|
1 year ago |
Wang, Yi
|
5e16eb2c93
enable autoTP for mpt in huggingface model hub without trust_remote_code (#4062)
|
1 year ago |
Lev Kurilenko
|
1ba4098918
Fix Stable Diffusion Injection (#4078)
|
1 year ago |
Wang, Yi
|
0bafeac491
enable autoTP for MPT (#3861)
|
1 year ago |
stephen youn
|
69d1b9f978
DeepSpeed-Triton for Inference (#3748)
|
1 year ago |
Danny Semiat
|
d755b9d616
Align InferenceEngine to store ms in _model_times (#3501)
|
1 year ago |
Ma, Guokai
|
1f72082fc0
[CPU] Support Intel CPU inference (#3041)
|
1 year ago |
Molly Smith
|
5979ece8a2
Skip autoTP if tp_size is 1 (#3449)
|
1 year ago |
Wang, Yi
|
b31b46c0d1
fix regression in shard checkpoint loading in AutoTP Path caused by qkv_copy() is deleted and add UT case for shard checkpoint loading in AutoTP (#3457)
|
1 year ago |
Lev Kurilenko
|
db26f8b413
Update Inference Engine checkpoint loading + meta tensor assertions (#2940)
|
1 year ago |
Wang, Yi
|
d10b8ca011
add sharded checkpoint loading for AutoTP path to reduce the peak mem… (#3102)
|
1 year ago |
Connor Holmes
|
0a61d5d664
Hybrid Engine Refactor and Llama Inference Support (#3425)
|
1 year ago |
Wang, Yi
|
6ba0024d54
Enable autoTP for bloom (#3035)
|
1 year ago |
Michael Wyatt
|
b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
|
1 year ago |
Jeff Rasley
|
91d63e0228
update formatter version and style settings (#3098)
|
1 year ago |
Lev Kurilenko
|
87eaf8f99a
Check for local CUDA graphs when enable_cuda_graph=True (#2941)
|
1 year ago |
Ammar Ahmad Awan
|
e4b3b610ba
Refactor DS inference API. No longer need replace_method. (#2831)
|
1 year ago |
Reza Yazdani
|
9f41ffe4a6
Reset KV-cache at the beginning of text-generation (#2669)
|
1 year ago |
Molly Smith
|
c5b983e92e
Fix broken kernel inject bug (#2776)
|
1 year ago |
Ma, Guokai
|
98cc35b6a8
Abstract accelerator (step 3) (#2677)
|
1 year ago |
Molly Smith
|
d59b572911
Automatic tensor parallelism v2 (#2670)
|
1 year ago |
Ammar Ahmad Awan
|
867da307d0
Inference Refactor (replace_with_policy, model_implementations) (#2554)
|
1 year ago |
Reza Yazdani
|
95d9a1b6c3
Fix Opt injection (#2541)
|
1 year ago |
Jeff Rasley
|
5676f5ec9c
[inference] check for unsupported model generate args (#2627)
|
1 year ago |
Lev Kurilenko
|
503706ac44
Remove GatheredParameters context from replace_with_policy (#2591)
|
1 year ago |
Ammar Ahmad Awan
|
90ae688442
Pass down the new DS inference config to replace_transformer_layer. (#2539)
|
1 year ago |
Connor Holmes
|
57e0a55066
Ensure is initialized for SD (#2534)
|
1 year ago |
Ammar Ahmad Awan
|
b5d18a6ab3
DeepSpeed inference config. (#2459) (#2472)
|
1 year ago |