Heyang Qin | dc01cee5ca using container when loading inference checkpoints (#2875) | 1 year ago |
Jeff Rasley | da84e60d98 add missing license info to top of all source code (#2889) | 1 year ago |
Lev Kurilenko | fd1449c766 Port Reza's INT8-quantization fix to container architecture (#2725) | 1 year ago |
Molly Smith | 46784cb58e Fix auto TP for duplicate modules with different gems (#2784) | 1 year ago |
Lev Kurilenko | 10f3c301a0 Add container load checkpoint error reporting + refactor (#2792) | 1 year ago |
Lev Kurilenko | 0a73e6e613 Container param cleanup + remove qkv_merging (#2780) | 1 year ago |
Reza Yazdani | 9f41ffe4a6 Reset KV-cache at the beginning of text-generation (#2669) | 1 year ago |
Ma, Guokai | 98cc35b6a8 Abstract accelerator (step 3) (#2677) | 1 year ago |
Ammar Ahmad Awan | 867da307d0 Inference Refactor (replace_with_policy, model_implementations) (#2554) | 1 year ago |
Jeff Rasley | d9b788d773 tweaks to ds-attn, distilbert policy, and mup (#2649) | 1 year ago |
Jeff Rasley | e0aa84c5b5 Fix issue w. bloom when changing tp size (#2645) | 1 year ago |
Lev Kurilenko | 503706ac44 Remove GatheredParameters context from replace_with_policy (#2591) | 1 year ago |
Jeff Rasley | 35eabb0a33 Fix issues w. python 3.6 + add py-version checks to CI (#2589) | 1 year ago |
Michael Wyatt | ccb8eb81fb Add checkpoint sharding unit tests (#2561) | 1 year ago |
Reza Yazdani | 35b350b28c Fix quantized-inference & Add generic support of checkpoint loading (#2547) | 1 year ago |
Ammar Ahmad Awan | 90ae688442 Pass down the new DS inference config to replace_transformer_layer. (#2539) | 1 year ago |
Ammar Ahmad Awan | b5d18a6ab3 DeepSpeed inference config. (#2459) (#2472) | 1 year ago |
lokoppakmsft | f2710bbe1d Make data contiguous before the inplace reshape-copy_ function (#2489) | 1 year ago |
Connor Holmes | e7e7595502 Stable Diffusion Enhancements (#2491) | 1 year ago |
Kevin Ko | 6f77da1bae Add `scale_attn_by_inverse_layer_idx` feature (#2486) | 1 year ago |
Reza Yazdani | 9cfcf7431a Add correct memory-allocation at DeepSpeed-Attention (#2474) | 1 year ago |
Connor Holmes | 10e9d04c23 Cache Allocation and Softmax Fixes (#2433) | 2 years ago |
Jeff Rasley | ec13da6ba7 add SD injection policy (#2381) | 2 years ago |
Andrey Chernykh | cd3a70953a Fix GPT Neo-X multi-gpu inference (#2401) | 2 years ago |
lekurile | 46a886c068 Change type to tuple in replace_wo_policy isinstance check (#2387) | 2 years ago |
Ammar Ahmad Awan | 993264388d Inference profiling updates/fixes (#2348) (#2349) | 2 years ago |
Stas Bekman | b146aa3523 [ds-inference] fix progress bar (#2286) | 2 years ago |
Reza Yazdani | afdc72879f Ds-inference Int8 support through ZeroQuant technology (#2217) | 2 years ago |
Molly Smith | a7ee688a6f Update replace_module.py, test-gptj.py related fix (#2269) | 2 years ago |
Reza Yazdani | c35bfe89f6 fix ds-inference without policy (#2247) | 2 years ago |