Ma, Guokai f15cccfa0c [AutoTP] Make AutoTP work when num_heads not divisible by number of workers (#4011) 1 year ago
quantization 6c86ff393f adding 8bit dequantization kernel for asym fine-grained block quantization in zero-inference (#4450) 1 year ago
test_checkpoint_sharding.py 76953a37b7 fix opt-350m shard loading issue in AutoTP (#3600) 1 year ago
test_inference.py f15cccfa0c [AutoTP] Make AutoTP work when num_heads not divisible by number of workers (#4011) 1 year ago
test_inference_config.py b361c72761 Update DeepSpeed copyright license to Apache 2.0 (#3111) 1 year ago
test_model_profiling.py 7290aace9b [CPU] Skip CPU support unimplemented error (#3633) 1 year ago