Costin Eseanu b3767d01d4 Fixed Windows inference build. (#5609) 4 months ago
checkpoint a37e59b590 Update deprecated HuggingFace function (#5144) 8 months ago
kernels b3767d01d4 Fixed Windows inference build. (#5609) 4 months ago
model_implementations ccfdb84e2a FP6 quantization end-to-end. (#5234) 7 months ago
modules e3d873a00e Fix the FP6 kernels compilation problem on non-Ampere GPUs. (#5333) 6 months ago
ragged 49359d0bc7 Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD (#5264) 7 months ago
__init__.py 5411030529 Inference Checkpoints in V2 (#4664) 11 months ago
allocator.py 3c811c966b 47% FastGen speedup for low workload - refactor allocator (#5090) 8 months ago
config_v2.py ccfdb84e2a FP6 quantization end-to-end. (#5234) 7 months ago
engine_factory.py bcc617a000 Add fp16 support of Qwen1.5 models (0.5B to 72B) to DeepSpeed-FastGen (#5219) 7 months ago
engine_v2.py 5dea776a84 Enhance query APIs for text generation (#4965) 9 months ago
inference_parameter.py 5411030529 Inference Checkpoints in V2 (#4664) 11 months ago
inference_utils.py 38b41dffa1 DeepSpeed-FastGen (#4604) 11 months ago
logging.py 38b41dffa1 DeepSpeed-FastGen (#4604) 11 months ago
scheduling_utils.py 38b41dffa1 DeepSpeed-FastGen (#4604) 11 months ago