| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago |
| Reza Yazdani | a5adb90d72 | Enabling CUDA-graph for the bert-type models (#1952) | 2 years ago |
| Jeff Rasley | b4fcd98ff0 | Inference PP changes for neox (#1899) | 2 years ago |
| Reza Yazdani | 60fc06c610 | Synchronize the GPUs for the text-generation inference test (#1805) | 2 years ago |
| Reza Yazdani | 841f99d162 | Load MoE checkpint at deepspeed inference-engine (#1759) | 2 years ago |
| Reza Yazdani | 94de0229fb | Fix inference api & add more description on inference engine tutorial (#1711) | 2 years ago |
| Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago |
| Reza Yazdani | 8e891aa568 | Transformer kernel/fix layer norm (#1587) | 2 years ago |
| Reza Yazdani | 9ce00a2171 | Tensor-Parallelism general support (#1512) | 2 years ago |
| Chunyang Wen | f0122007df | Modify inference engine (#1520) | 2 years ago |
| Reza Yazdani | ee6a92c066 | Fixing the transformer APIs to return tuple as the output (if needed) (#1491) | 3 years ago |
| Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago |
| Reza Yazdani | 9f17087fdd | Save the model parallel group at inference engine statically (#1411) | 3 years ago |
| Reza Yazdani | 0ec11daa02 | Add more synchronizations and barriers for the multi-gpu inference case (#1309) | 3 years ago |
| Reza Yazdani | 49b6a63251 | Reducing the memory-overhead of creating model for multi-GPU run (#1244) | 3 years ago |
| Hyunwoong Ko | 429cbc89af | Fix bugs about non-contiguous tensor broadcasting (#1168) | 3 years ago |
| Reza Yazdani | aca7fc549a | Add local attention for GPT-Neo model architecture (#1114) | 3 years ago |
| Reza Yazdani | d2cf66a668 | release inference quantized kernels (#1104) | 3 years ago |
| Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago |