hamlet
|
c69bd1f7b7
Fix pipline dataloader when batch elements contain tuple (#565)
|
1 year ago |
Björn Plüster
|
0b7a760c35
Fixes timer error referenced in #4212 (#4213)
|
1 year ago |
Olatunji Ruwase
|
6df158733d
Load z3 checkpoints for inference (#4171)
|
1 year ago |
Alexander Jipa
|
b354c28b76
polishing timers and log_dist (#3996)
|
1 year ago |
mzl
|
e8600b7f7d
remove duplicate check for pp and zero stage (#4033)
|
1 year ago |
Olatunji Ruwase
|
0a0819b785
Option to exclude frozen weights for checkpoint save (#3953)
|
1 year ago |
mzl
|
05a6cee1f4
do bcast only pp_group_size>1 (#3915)
|
1 year ago |
YiSheng5
|
195563a2c5
fix a small type error on bf16+pp (#3441)
|
1 year ago |
Joe Mayer
|
dcb4a7d664
Add ZeRO 1 support to PP for BF16. (#3399)
|
1 year ago |
Nr Wu
|
b0d9c4d052
Fix `PipelineEngine.eval_batch` result (#3316)
|
1 year ago |
hablb
|
7ddc3b01dd
Fix pipeline module evaluation when contiguous activation checkpointing is enabled (#3005)
|
1 year ago |
Olatunji Ruwase
|
dd8df20fe0
zero3 checkpoint frozen params (#3205)
|
1 year ago |
Michael Wyatt
|
b361c72761
Update DeepSpeed copyright license to Apache 2.0 (#3111)
|
1 year ago |
Jeff Rasley
|
91d63e0228
update formatter version and style settings (#3098)
|
1 year ago |
Satpal Singh Rathore
|
5c2a81c2c1
allow list (#3042)
|
1 year ago |
Ma, Guokai
|
98cc35b6a8
Abstract accelerator (step 3) (#2677)
|
1 year ago |
Alexander Jipa
|
0f0e38c520
fixes #2498 (#2603)
|
1 year ago |
Conglong Li
|
ef869377e9
DeepSpeed Data Efficiency Library (#2585)
|
1 year ago |
Adam Moody
|
b8fb9c3f1a
parallelize writing of layer checkpoint files across data parallel instances (#1419)
|
2 years ago |
Arpan Jain
|
1ed5aa96a8
Elastic Training support in DeepSpeed (#2153) (#2156)
|
2 years ago |
trajep
|
e669aaf55b
Trajepl/nebula ckpt engine (#2085)
|
2 years ago |
Alex Hedges
|
316c4a43e0
Add flake8 to pre-commit checks (#2051)
|
2 years ago |
Quentin Anthony
|
5349347bb6
DeepSpeed Communication Profiling and Logging (#2012)
|
2 years ago |
Karim Foda
|
735406e536
fix import errors (#2026)
|
2 years ago |
Quentin Anthony
|
c87f6ee209
DeepSpeed Monitor Module (Master) (#2013)
|
2 years ago |
Ammar Ahmad Awan
|
36ad3119d5
DeepSpeed comm backend v1 (#1985)
|
2 years ago |
Jeff Rasley
|
50893458d6
Fairseq support (#1915)
|
2 years ago |
Stas Bekman
|
dbeadf16b5
[pipe] prevent deadlock with multiple evals sequence (#1944)
|
2 years ago |
Zhengqiang Yin
|
a3b90030fd
Fix time error (#1934)
|
2 years ago |
Jeff Rasley
|
b4fcd98ff0
Inference PP changes for neox (#1899)
|
2 years ago |