mzl | 7f3e82fe09 | do allgather only in shared optimizer states groups (#4167) | 1 year ago
marcobellagente93 | e8318634b4 | Spread layers more uniformly when using partition_uniform (#4053) | 1 year ago
Ma, Guokai | 0f5406323c | [CPU] FusedAdam and CPU training support (#3991) | 1 year ago
Logan Adams | 6b2365e4fa | Re-enable elastic training for torch 2+ (#4010) | 1 year ago
digger-yu | 254663a28c | fix spelling error with deepspeed/runtime/ (#3509) | 1 year ago
Olatunji Ruwase | 47f9f13bd3 | DeepSpeed Chat (#3186) | 1 year ago
Guo Yejun | 6eca037ce0 | deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache (#2803) | 1 year ago
Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
Mayank Mishra | a6317eb509 | ♻️ replace deprecated functions for communication (#2995) | 1 year ago
Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago
Yasyf Mohamedali | d3de737550 | Remove deprecated `torch._six` imports (#2863) | 1 year ago
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
Guo Yejun | d0dbc95a90 | call empty_cache to really free up GPU memory as described in comment (#2620) | 1 year ago
Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago
Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago
Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago
Olatunji Ruwase | 56c5223868 | bf16+pipeline parallelism (#1801) | 2 years ago
Ammar Ahmad Awan | c0af6d90f7 | Refactor MoE and Groups API to simplify model creation and mangement (#1798) | 2 years ago
Justin Chiu | 4912e0ad7e | Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453) | 2 years ago
Jeff Rasley | e46d808a1b | MoE inference + PR-MoE model support (#1705) | 2 years ago
Alex Hedges | fc2f378ece | Improve pre-commit hooks (#1602) | 2 years ago
Jeff Rasley | 2332cb31a7 | Enables ZeRO-3 inference (#1514) | 2 years ago
Cheng Li | 9caa74e577 | Autotuning (#1554) | 2 years ago
Jeff Rasley | e2fdd254ed | Big science related changes (#1407) | 3 years ago
Ammar Ahmad Awan | ddffbae021 | Remove duplicate clip grad function in deepspeed (#1333) | 3 years ago
Olatunji Ruwase | 85acf14c58 | Activation checkpointing improvements (#1254) | 3 years ago
Ammar Ahmad Awan | f28432441b | DeepSpeed MoE (#1310) | 3 years ago
Stas Bekman | 32e85eda58 | [see_memory_usage] fix deprecation (#1234) | 3 years ago
Shaden Smith | 46f4573b1a | Seeded unit tests (#1072) | 3 years ago
Olatunji Ruwase | e88ebbcfc9 | Use amp autocast in ZeRO3 linear (#990) | 3 years ago