| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Joe Mayer | 18713c6838 | Updating API docs (#2586) | 1 year ago |
| Siddharth Singh | 5fe9d61065 | Tensor parallelism for Mixture of Experts (#2074) | 2 years ago |
| Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago |
| Jianfeng Liu | b4513f6310 | fix softmax dim of Residual MoE in moe/layer.py (#2110) | 2 years ago |
| Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago |
| Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago |
| shjwudp | 1e61c7a860 | fix: Fix undefined variable in _create_expert_data_and_model_parallel and make it easier to understand (#1826) | 2 years ago |
| Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and mangement (#1798) | 2 years ago |
| Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago |
| alexandremuzio | 2887349cd4 | Adding Tutel to MoE layer (#1528) | 3 years ago |
| Ammar Ahmad Awan | 56635d5b6c | enable/disable moe token dropping. (#1492) | 3 years ago |
| Ammar Ahmad Awan | 9f5939d2a7 | Remove dropout as client code can do it independently. (#1354) | 3 years ago |
| Jeff Rasley | 9cb64a1fc5 | MoE read the docs update (#1312) | 3 years ago |
| Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago |