| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Reza Yazdani | 2afa1c7f2f | Communication Optimization for Large-Scale Training (#4695) | 11 months ago |
| Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago |
| Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago |
| Joe Mayer | 18713c6838 | Updating API docs (#2586) | 1 year ago |
| Siddharth Singh | 5fe9d61065 | Tensor parallelism for Mixture of Experts (#2074) | 2 years ago |
| Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago |
| Jianfeng Liu | b4513f6310 | fix softmax dim of Residual MoE in moe/layer.py (#2110) | 2 years ago |
| Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago |
| Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago |
| shjwudp | 1e61c7a860 | fix: Fix undefined variable in _create_expert_data_and_model_parallel and make it easier to understand (#1826) | 2 years ago |
| Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and mangement (#1798) | 2 years ago |
| Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago |
| alexandremuzio | 2887349cd4 | Adding Tutel to MoE layer (#1528) | 3 years ago |
| Ammar Ahmad Awan | 56635d5b6c | enable/disable moe token dropping. (#1492) | 3 years ago |
| Ammar Ahmad Awan | 9f5939d2a7 | Remove dropout as client code can do it independently. (#1354) | 3 years ago |
| Jeff Rasley | 9cb64a1fc5 | MoE read the docs update (#1312) | 3 years ago |
| Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago |