inkcherry
|
0896503e2f
Fix a convergence issues in TP topology caused by incorrect grad_norm. (#5411)
|
6 months ago |
Logan Adams
|
6dcced1d5c
Cleanup required_torch_version code and references. (#5370)
|
6 months ago |
Moshe Island
|
08e0733e4a
Support MoE for pipeline models (#5338)
|
6 months ago |
BacharL
|
9f0e21363b
compute global norm on device (#5125)
|
6 months ago |
inkcherry
|
e5dd5501c1
support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix (#5259)
|
6 months ago |
Masahiro Tanaka
|
005afe124f
Fix gradient clipping (#5150)
|
8 months ago |
mmhab
|
961bc85624
optimize clip_grad_norm_ function (#4915)
|
8 months ago |
inkcherry
|
d5a7c1e0b4
Capture short kernel sequences to graph (#4318)
|
10 months ago |
Hang
|
2bdf061f4d
[BUG] partition_balanced return wrong result. (#4312)
|
10 months ago |
taozhiwei
|
fd0a52c1ac
use all_gather_into_tensor instead of all_gather (#4705)
|
10 months ago |
Reza Yazdani
|
2afa1c7f2f
Communication Optimization for Large-Scale Training (#4695)
|
11 months ago |
Jackmin801
|
58a206059f
Small docstring fix (#4431)
|
1 year ago |
mzl
|
7f3e82fe09
do allgather only in shared optimizer states groups (#4167)
|
1 year ago |
marcobellagente93
|
e8318634b4
Spread layers more uniformly when using partition_uniform (#4053)
|
1 year ago |
Ma, Guokai
|
0f5406323c
[CPU] FusedAdam and CPU training support (#3991)
|
1 year ago |
Logan Adams
|
6b2365e4fa
Re-enable elastic training for torch 2+ (#4010)
|
1 year ago |
digger-yu
|
254663a28c
fix spelling error with deepspeed/runtime/ (#3509)
|
1 year ago |
Olatunji Ruwase
|
47f9f13bd3
DeepSpeed Chat (#3186)
|
1 year ago |
Guo Yejun
|
6eca037ce0
deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache (#2803)
|
1 year ago |
Michael Wyatt
|
b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
|
1 year ago |
Mayank Mishra
|
a6317eb509
♻️ replace deprecated functions for communication (#2995)
|
1 year ago |
Jeff Rasley
|
91d63e0228
update formatter version and style settings (#3098)
|
1 year ago |
Yasyf Mohamedali
|
d3de737550
Remove deprecated `torch._six` imports (#2863)
|
1 year ago |
Ma, Guokai
|
98cc35b6a8
Abstract accelerator (step 3) (#2677)
|
1 year ago |
Guo Yejun
|
d0dbc95a90
call empty_cache to really free up GPU memory as described in comment (#2620)
|
1 year ago |
Alex Hedges
|
316c4a43e0
Add flake8 to pre-commit checks (#2051)
|
2 years ago |
Karim Foda
|
735406e536
fix import errors (#2026)
|
2 years ago |
Ammar Ahmad Awan
|
36ad3119d5
DeepSpeed comm backend v1 (#1985)
|
2 years ago |
Olatunji Ruwase
|
56c5223868
bf16+pipeline parallelism (#1801)
|
2 years ago |
Ammar Ahmad Awan
|
c0af6d90f7
Refactor MoE and Groups API to simplify model creation and mangement (#1798)
|
2 years ago |