Lev Kurilenko
|
87eaf8f99a
Check for local CUDA graphs when enable_cuda_graph=True (#2941)
|
1 year ago |
Ammar Ahmad Awan
|
e4b3b610ba
Refactor DS inference API. No longer need replace_method. (#2831)
|
1 year ago |
Reza Yazdani
|
9f41ffe4a6
Reset KV-cache at the beginning of text-generation (#2669)
|
1 year ago |
Molly Smith
|
c5b983e92e
Fix broken kernel inject bug (#2776)
|
1 year ago |
Ma, Guokai
|
98cc35b6a8
Abstract accelerator (step 3) (#2677)
|
1 year ago |
Molly Smith
|
d59b572911
Automatic tensor parallelism v2 (#2670)
|
1 year ago |
Ammar Ahmad Awan
|
867da307d0
Inference Refactor (replace_with_policy, model_implementations) (#2554)
|
1 year ago |
Reza Yazdani
|
95d9a1b6c3
Fix Opt injection (#2541)
|
1 year ago |
Jeff Rasley
|
5676f5ec9c
[inference] check for unsupported model generate args (#2627)
|
1 year ago |
Lev Kurilenko
|
503706ac44
Remove GatheredParameters context from replace_with_policy (#2591)
|
1 year ago |
Ammar Ahmad Awan
|
90ae688442
Pass down the new DS inference config to replace_transformer_layer. (#2539)
|
1 year ago |
Connor Holmes
|
57e0a55066
Ensure is initialized for SD (#2534)
|
1 year ago |
Ammar Ahmad Awan
|
b5d18a6ab3
DeepSpeed inference config. (#2459) (#2472)
|
1 year ago |
Connor Holmes
|
e7e7595502
Stable Diffusion Enhancements (#2491)
|
1 year ago |
Reza Yazdani
|
39bdc14195
fixing the checkpoint loading at inference-engine (#2429)
|
2 years ago |
Connor Holmes
|
10e9d04c23
Cache Allocation and Softmax Fixes (#2433)
|
2 years ago |
Michael Wyatt
|
e772f16665
Use CUDA events for inference model profiling (#2371)
|
2 years ago |
Reza Yazdani
|
537e8581fe
fix checkpoint loading when it is a dictionary (#2425)
|
2 years ago |
Jeff Rasley
|
ec13da6ba7
add SD injection policy (#2381)
|
2 years ago |
Ammar Ahmad Awan
|
993264388d
Inference profiling updates/fixes (#2348) (#2349)
|
2 years ago |
Jeff Rasley
|
cf638be998
only override forward if using cuda-graph (#2291)
|
2 years ago |
Reza Yazdani
|
afdc72879f
Ds-inference Int8 support through ZeroQuant technology (#2217)
|
2 years ago |
Arash Bakhtiari
|
8b2a63717a
Add support of OPT models (#2205)
|
2 years ago |
Reza Yazdani
|
556f005152
Fix random token-generation issue + MP-checkpoint loading/saving (#2132)
|
2 years ago |
trajep
|
e669aaf55b
Trajepl/nebula ckpt engine (#2085)
|
2 years ago |
Alex Hedges
|
316c4a43e0
Add flake8 to pre-commit checks (#2051)
|
2 years ago |
Jeff Rasley
|
844d9f31a9
reduce ds-inference log verbosity (#2111)
|
2 years ago |
Reza Yazdani
|
aa88137b8d
Add Inference support for running the BigScience-BLOOM Architecture (#2083)
|
2 years ago |
Karim Foda
|
735406e536
fix import errors (#2026)
|
2 years ago |
Jeff Rasley
|
b666d5cd73
[inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) (#1992)
|
2 years ago |