Reza Yazdani | 9ce00a2171 | Tensor-Parallelism general support (#1512) | 2 years ago
Chunyang Wen | f0122007df | Modify inference engine (#1520) | 2 years ago
Reza Yazdani | ee6a92c066 | Fixing the transformer APIs to return tuple as the output (if needed) (#1491) | 3 years ago
Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago
Reza Yazdani | 9f17087fdd | Save the model parallel group at inference engine statically (#1411) | 3 years ago
Reza Yazdani | 0ec11daa02 | Add more synchronizations and barriers for the multi-gpu inference case (#1309) | 3 years ago
Reza Yazdani | 49b6a63251 | Reducing the memory-overhead of creating model for multi-GPU run (#1244) | 3 years ago
Hyunwoong Ko | 429cbc89af | Fix bugs about non-contiguous tensor broadcasting (#1168) | 3 years ago
Reza Yazdani | aca7fc549a | Add local attention for GPT-Neo model architecture (#1114) | 3 years ago
Reza Yazdani | d2cf66a668 | release inference quantized kernels (#1104) | 3 years ago
Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago