Logan Adams | 6b2365e4fa | Re-enable elastic training for torch 2+ (#4010) | 1 year ago
Michael Wyatt | 7b850d3d04 | Re-enable skipped unit tests (#3939) | 1 year ago
Yizhou Wang | 9f4a876360 | Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 (#2999) | 1 year ago
Jeff Rasley | a094c9763d | remove megatron-lm, no longer pip installable (#3389) | 1 year ago
Logan Adams | ab1f32de32 | Update skip on torch in tests (#3136) | 1 year ago
Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago
Logan Adams | 4e0686233a | Several fixes to unblock CI (#3047) | 1 year ago
Ma, Guokai | 0acf7e9c48 | [RFC] add device abstraction to allow other device than CUDA be used (#2221) | 1 year ago
Jeff Rasley | da84e60d98 | add missing license info to top of all source code (#2889) | 1 year ago
Michael Wyatt | ff42743865 | Refactor remaining distributed tests (#2216) | 2 years ago