Max Kovalenko d9e12d3a68 Fix attention mask handling in the Hybrid Engine Bloom flow (#5101) 7 months ago
..
features 389bf69319 fix: Remove duplicate word the (#4051) 1 year ago
__init__.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
base.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
base_moe.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
bert.py 69d1b9f978 DeepSpeed-Triton for Inference (#3748) 1 year ago
bloom.py d9e12d3a68 Fix attention mask handling in the Hybrid Engine Bloom flow (#5101) 7 months ago
clip.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
distil_bert.py 69d1b9f978 DeepSpeed-Triton for Inference (#3748) 1 year ago
gpt2.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
gptj.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
gptneo.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
gptneox.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
internlm.py 367d6f9cec Support InternLM (#4137) 1 year ago
llama.py beed962c25 [Bug fix] Add rope_theta for llama config (#4480) 1 year ago
llama2.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
megatron_gpt.py 6cbf666131 fix MegatronLayerPolicy to be compatible with the newest ParallelTransformerLayer (#4236) 1 year ago
megatron_gpt_moe.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
opt.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
unet.py a049370c0c Update import for changes to latest diffusers (#5065) 8 months ago
vae.py e212845e39 Add backwards compatibility w/ older versions of diffusers (<0.25.0) (#5083) 8 months ago