Commit History

Author SHA1 Message Date
  Ammar Ahmad Awan 36ad3119d5 DeepSpeed comm backend v1 (#1985) 2 years ago
  Reza Yazdani a5adb90d72 Enabling CUDA-graph for the bert-type models (#1952) 2 years ago
  Jeff Rasley b4fcd98ff0 Inference PP changes for neox (#1899) 2 years ago
  Reza Yazdani 60fc06c610 Synchronize the GPUs for the text-generation inference test (#1805) 2 years ago
  Reza Yazdani 841f99d162 Load MoE checkpint at deepspeed inference-engine (#1759) 2 years ago
  Reza Yazdani 94de0229fb Fix inference api & add more description on inference engine tutorial (#1711) 2 years ago
  Jeff Rasley e46d808a1b MoE inference + PR-MoE model support (#1705) 2 years ago
  Reza Yazdani 8e891aa568 Transformer kernel/fix layer norm (#1587) 2 years ago
  Reza Yazdani 9ce00a2171 Tensor-Parallelism general support (#1512) 2 years ago
  Chunyang Wen f0122007df Modify inference engine (#1520) 2 years ago
  Reza Yazdani ee6a92c066 Fixing the transformer APIs to return tuple as the output (if needed) (#1491) 3 years ago
  Alex Hedges be789b1665 Fix many typos (#1423) 3 years ago
  Reza Yazdani 9f17087fdd Save the model parallel group at inference engine statically (#1411) 3 years ago
  Reza Yazdani 0ec11daa02 Add more synchronizations and barriers for the multi-gpu inference case (#1309) 3 years ago
  Reza Yazdani 49b6a63251 Reducing the memory-overhead of creating model for multi-GPU run (#1244) 3 years ago
  Hyunwoong Ko 429cbc89af Fix bugs about non-contiguous tensor broadcasting (#1168) 3 years ago
  Reza Yazdani aca7fc549a Add local attention for GPT-Neo model architecture (#1114) 3 years ago
  Reza Yazdani d2cf66a668 release inference quantized kernels (#1104) 3 years ago
  Reza Yazdani ed3de0c21b Quantization + inference release (#1091) 3 years ago