| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Siddharth Singh | 5fe9d61065 | Tensor parallelism for Mixture of Experts (#2074) | 2 years ago |
| Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago |
| Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago |
| Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago |
| kisseternity | 89e37ef360 | spell err (#1929) | 2 years ago |
| shjwudp | 1e61c7a860 | fix: Fix undefined variable in _create_expert_data_and_model_parallel and make it easier to understand (#1826) | 2 years ago |
| Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and mangement (#1798) | 2 years ago |
| Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago |
| Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago |
| Jeff Rasley | 9cb64a1fc5 | MoE read the docs update (#1312) | 3 years ago |
| Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago |