Jeff Rasley
|
50893458d6
Fairseq support (#1915)
|
2 years ago |
Stas Bekman
|
dbeadf16b5
[pipe] prevent deadlock with multiple evals sequence (#1944)
|
2 years ago |
Shuai Zheng
|
de88718501
fix step in adam (#1823)
|
2 years ago |
Manuel R. Ciosici
|
ae43ba1270
Update partition_parameters.py (#1943)
|
2 years ago |
Stas Bekman
|
73c0798bd7
GatheredParameters - accept a tuple of params (#1941)
|
2 years ago |
Jeff Rasley
|
ff908ed793
bump to 0.6.5
|
2 years ago |
Olatunji Ruwase
|
673cb60808
Improve z3 trace management (#1916)
|
2 years ago |
Zhengqiang Yin
|
a3b90030fd
Fix time error (#1934)
|
2 years ago |
Jeff Rasley
|
a8d26d6ab5
[ZeRO-3] Rename confusing log message (#1932)
|
2 years ago |
kisseternity
|
89e37ef360
spell err (#1929)
|
2 years ago |
Olatunji Ruwase
|
af58f63dde
bf16 inference (#1917)
|
2 years ago |
Ramya Ramineni
|
96c8bf32aa
Enable DeepSpeed inference on ROCm (#1922)
|
2 years ago |
Jeff Rasley
|
a160d95778
explictly add op_builder to manifest (#1920)
|
2 years ago |
Michael Wyatt
|
b7fc00dfd0
lazy import fcntl (#1921)
|
2 years ago |
Jeff Rasley
|
eae1889ebc
bump to 0.6.4
|
2 years ago |
Michael Wyatt
|
dda03361a7
bumped to v0.6.3
|
2 years ago |
Jeff Rasley
|
a52cbf80f7
[zero-3] add bwd support for list/dict types returned in fwd (#1857)
|
2 years ago |
Jeff Rasley
|
b4fcd98ff0
Inference PP changes for neox (#1899)
|
2 years ago |
Olatunji Ruwase
|
32d97976ce
Fix OOM and type mismatch (#1884)
|
2 years ago |
Shuai Zheng
|
4575b2b792
fix launcher for reading env vars (#1907)
|
2 years ago |
Michael Wyatt
|
8cc8c003cb
Improve ds_report output for HIP/ROCm (#1906)
|
2 years ago |
Olatunji Ruwase
|
ef17c89570
Fix multiple zero 3 tracing errors (#1901)
|
2 years ago |
Olatunji Ruwase
|
fee7313598
Use cuda events to improve timing for multi-stream execution (#1881)
|
2 years ago |
Conglong Li
|
66aae13d18
cast bool when not supported by torch2cupy (#1894)
|
2 years ago |
Stas Bekman
|
fb00e6a1db
[partition_parameters.py] better diagnostics (#1887)
|
2 years ago |
Manuel R. Ciosici
|
3b5b92cbb8
Use f-strings where possible (#1900)
|
2 years ago |
Shuai Zheng
|
801c172345
fix file ordering (#1822)
|
2 years ago |
Olatunji Ruwase
|
56c5223868
bf16+pipeline parallelism (#1801)
|
2 years ago |
Jeff Rasley
|
9bf1e9af3a
[docs] fix commonmarker security issue (#1892)
|
2 years ago |
dependabot[bot]
|
ea9c057582
Bump nokogiri from 1.13.3 to 1.13.4 in /docs (#1889)
|
2 years ago |