Commit History

Author | SHA1 | Message | Date
Reza Yazdani | 9ce00a2171 | Tensor-Parallelism general support (#1512) | 2 years ago
Chunyang Wen | f0122007df | Modify inference engine (#1520) | 2 years ago
Reza Yazdani | ee6a92c066 | Fixing the transformer APIs to return tuple as the output (if needed) (#1491) | 3 years ago
Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago
Reza Yazdani | 9f17087fdd | Save the model parallel group at inference engine statically (#1411) | 3 years ago
Reza Yazdani | 0ec11daa02 | Add more synchronizations and barriers for the multi-gpu inference case (#1309) | 3 years ago
Reza Yazdani | 49b6a63251 | Reducing the memory-overhead of creating model for multi-GPU run (#1244) | 3 years ago
Hyunwoong Ko | 429cbc89af | Fix bugs about non-contiguous tensor broadcasting (#1168) | 3 years ago
Reza Yazdani | aca7fc549a | Add local attention for GPT-Neo model architecture (#1114) | 3 years ago
Reza Yazdani | d2cf66a668 | release inference quantized kernels (#1104) | 3 years ago
Reza Yazdani | ed3de0c21b | Quantization + inference release (#1091) | 3 years ago