| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Alexander Jipa | b354c28b76 | polishing timers and log_dist (#3996) | 1 year ago |
| Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago |
| Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago |
| 叶志晟 | 80f94c10c5 | fix #2240: wrong time unit in flops_profiler (#2241) | 2 years ago |
| Siddharth Singh | 5fe9d61065 | Tensor parallelism for Mixture of Experts (#2074) | 2 years ago |
| TIAN Ye | 31582d7728 | Fix conflict between Tutel and top-2 gate in MoE layer (#2053) | 2 years ago |
| Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago |
| Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago |
| Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago |
| shjwudp | 5fb4256a7a | fix: fix undefined variable in MoE top2gating (#1827) | 2 years ago |
| Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and management (#1798) | 2 years ago |
| Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago |
| Reza Yazdani | 289c3f9ba4 | GPT-J inference support (#1670) | 2 years ago |
| Gary Miguel | 07887f6630 | sharded_moe: make top1gating ONNX-exportable (#1578) | 2 years ago |
| alexandremuzio | 1bc13fe83f | Removing `ImportError` from tutel import try/except (#1583) | 2 years ago |
| alexandremuzio | 2887349cd4 | Adding Tutel to MoE layer (#1528) | 3 years ago |
| Ammar Ahmad Awan | 56635d5b6c | enable/disable moe token dropping. (#1492) | 3 years ago |
| Gani Nazirov | 20bf1cc120 | Switch to use or not einsum op. Needed for ORT (#1456) | 3 years ago |
| Ammar Ahmad Awan | 1fc74cb9c8 | Add basic MoE timing breakdown (#1428) | 3 years ago |
| Alex Hedges | be789b1665 | Fix many typos (#1423) | 3 years ago |
| Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago |