Commit History

Author SHA1 Message Commit Date
  Michael Wyatt 4c35880b16 Allow multiple inference engines in single script (#4384) 1 year ago
  Michael Wyatt 85334238da added check to avoid undefined behavior when the input_id length is greater than max_tokens (#4349) 1 year ago
  Satpal Singh Rathore 430510bfce Checks for user injection policy (#3052) 1 year ago
  Wang, Yi 5e16eb2c93 enable autoTP for mpt in huggingface model hub without trust_remote_code (#4062) 1 year ago
  Lev Kurilenko 1ba4098918 Fix Stable Diffusion Injection (#4078) 1 year ago
  Wang, Yi 0bafeac491 enable autoTP for MPT (#3861) 1 year ago
  stephen youn 69d1b9f978 DeepSpeed-Triton for Inference (#3748) 1 year ago
  Danny Semiat d755b9d616 Align InferenceEngine to store ms in _model_times (#3501) 1 year ago
  Ma, Guokai 1f72082fc0 [CPU] Support Intel CPU inference (#3041) 1 year ago
  Molly Smith 5979ece8a2 Skip autoTP if tp_size is 1 (#3449) 1 year ago
  Wang, Yi b31b46c0d1 fix regression in shard checkpoint loading in AutoTP Path caused by qkv_copy() is deleted and add UT case for shard checkpoint loading in AutoTP (#3457) 1 year ago
  Lev Kurilenko db26f8b413 Update Inference Engine checkpoint loading + meta tensor assertions (#2940) 1 year ago
  Wang, Yi d10b8ca011 add sharded checkpoint loading for AutoTP path to reduce the peak mem… (#3102) 1 year ago
  Connor Holmes 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
  Wang, Yi 6ba0024d54 Enable autoTP for bloom (#3035) 1 year ago
  Michael Wyatt b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
  Jeff Rasley 91d63e0228 update formatter version and style settings (#3098) 1 year ago
  Lev Kurilenko 87eaf8f99a Check for local CUDA graphs when enable_cuda_graph=True (#2941) 1 year ago
  Ammar Ahmad Awan e4b3b610ba Refactor DS inference API. No longer need replace_method. (#2831) 1 year ago
  Reza Yazdani 9f41ffe4a6 Reset KV-cache at the beginning of text-generation (#2669) 1 year ago
  Molly Smith c5b983e92e Fix broken kernel inject bug (#2776) 1 year ago
  Ma, Guokai 98cc35b6a8 Abstract accelerator (step 3) (#2677) 1 year ago
  Molly Smith d59b572911 Automatic tensor parallelism v2 (#2670) 1 year ago
  Ammar Ahmad Awan 867da307d0 Inference Refactor (replace_with_policy, model_implementations) (#2554) 1 year ago
  Reza Yazdani 95d9a1b6c3 Fix Opt injection (#2541) 1 year ago
  Jeff Rasley 5676f5ec9c [inference] check for unsupported model generate args (#2627) 1 year ago
  Lev Kurilenko 503706ac44 Remove GatheredParameters context from replace_with_policy (#2591) 1 year ago
  Ammar Ahmad Awan 90ae688442 Pass down the new DS inference config to replace_transformer_layer. (#2539) 1 year ago
  Connor Holmes 57e0a55066 Ensure is initialized for SD (#2534) 1 year ago
  Ammar Ahmad Awan b5d18a6ab3 DeepSpeed inference config. (#2459) (#2472) 1 year ago