Alexander Jipa | b354c28b76 | polishing timers and log_dist (#3996) | 1 year ago
digger yu | fc8de76f1d | Simplify chain comparisons, remove redundant parentheses (#3912) | 1 year ago
Ma, Guokai | 1f72082fc0 | [CPU] Support Intel CPU inference (#3041) | 1 year ago
Michael Wyatt | b361c72761 | Update DeepSpeed copyright license to Apache 2.0 (#3111) | 1 year ago
Jeff Rasley | 91d63e0228 | update formatter version and style settings (#3098) | 1 year ago
Ma, Guokai | 98cc35b6a8 | Abstract accelerator (step 3) (#2677) | 1 year ago
Alexander Jipa | 0f0e38c520 | fixes #2498 (#2603) | 1 year ago
ShijieZZZZ | 340fc0cf19 | Report progress at gradient accumulation boundary (#2553) | 1 year ago
Jeff Rasley | 5bd09a8f83 | Allow turning off loss scaling wrt GAS + update tput calculator (#2140) | 2 years ago
Alex Hedges | 316c4a43e0 | Add flake8 to pre-commit checks (#2051) | 2 years ago
Quentin Anthony | 5349347bb6 | DeepSpeed Communication Profiling and Logging (#2012) | 2 years ago
Zeyu | b05237876e | fixed "None type has no len()" (#2091) | 2 years ago
Karim Foda | 735406e536 | fix import errors (#2026) | 2 years ago
Ammar Ahmad Awan | 36ad3119d5 | DeepSpeed comm backend v1 (#1985) | 2 years ago
Quentin Anthony | 0d36893281 | Fix timer typo (#1964) | 2 years ago
Olatunji Ruwase | fee7313598 | Use cuda events to improve timing for multi-stream execution (#1881) | 2 years ago
Justin Chiu | 4912e0ad7e | Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453) | 2 years ago
Cheng Li | 9caa74e577 | Autotuning (#1554) | 2 years ago
Cheng Li | 4544b7d2f1 | Improve flops profiler functionality (#1065) | 3 years ago
Sean Naren | 6fb16100ba | Replace timer print rank 0 with logging (#732) | 3 years ago
Jeff Rasley | 0dc8420042 | Dependency pruning (#528) | 3 years ago
Shaden Smith | 65c2f974d8 | Pipeline parallel training engine. (#392) | 4 years ago
Jeff Rasley | e5bbc2e559 | Sparse attn + ops/runtime refactor + v0.3.0 (#343) | 4 years ago