Elsa Granger 4fc2c8e7d5 Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608) 1 year ago
features 389bf69319 fix: Remove duplicate word the (#4051) 1 year ago
__init__.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
base.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
base_moe.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
bert.py 69d1b9f978 DeepSpeed-Triton for Inference (#3748) 1 year ago
bloom.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
clip.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
distil_bert.py 69d1b9f978 DeepSpeed-Triton for Inference (#3748) 1 year ago
gpt2.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
gptj.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
gptneo.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
gptneox.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
internlm.py 367d6f9cec Support InternLM (#4137) 1 year ago
llama.py 4fc2c8e7d5 Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608) 1 year ago
llama2.py 468882fb68 Add the policy to run llama model from the official repo (#4313) 1 year ago
megatron_gpt.py 6cbf666131 fix MegatronLayerPolicy to be compatible with the newest ParallelTransformerLayer (#4236) 1 year ago
megatron_gpt_moe.py 0a61d5d664 Hybrid Engine Refactor and Llama Inference Support (#3425) 1 year ago
opt.py d81dfdabcc Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) 1 year ago
unet.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
vae.py 1ba4098918 Fix Stable Diffusion Injection (#4078) 1 year ago