Mikhail Druzhinin
|
b62e0cc5a8
Add gradient_average flag support for sparse grads (#2188)
|
2 years ago |
Reza Yazdani
|
54a9e1b924
Fix the layer-past for GPT based models (#2196)
|
2 years ago |
Tiago De Gaspari
|
0f5c2012ce
Update README.md (#2192)
|
2 years ago |
Jeff Rasley
|
1223b13c9a
Update nv-lightning-v100.yml (#2190)
|
2 years ago |
Jeff Rasley
|
33667e0e54
[docs] add more models to adoption (#2189)
|
2 years ago |
Rahil Bathwal
|
ee5ce52460
fix missing import (#2175)
|
2 years ago |
Hanlin Tang
|
80b5b9259b
Update README to latest Composer version (#2177)
|
2 years ago |
Ramya Ramineni
|
2e3769a1f4
Enable fused_lamb_cuda_kernel on ROCm (#2148)
|
2 years ago |
Olatunji Ruwase
|
e419f7cbcd
Match compute and reduce dtype (#2145)
|
2 years ago |
Reza Yazdani
|
e7d9959540
fixing model partitioning without injection (#2179)
|
2 years ago |
Jeff Rasley
|
fad0a4106d
update offload docs to include stage 1 (#2178)
|
2 years ago |
Michael Wyatt
|
d1cd18e5fb
Update for AMD CI workflow (#2172)
|
2 years ago |
Jeff Rasley
|
bb49dc73f5
[docs] adoption updates (#2173)
|
2 years ago |
Zion Wu
|
6bfcf3c694
Fix wrong unit of latency in flops-profiler (#2090) (#2095)
|
2 years ago |
Jeff Rasley
|
776e36988d
delay torch import for inference compatability check (#2167)
|
2 years ago |
Michael Wyatt
|
1a71e77dc2
Fix for distributed tests on pytorch>=1.12 (#2141)
|
2 years ago |
Jeff Rasley
|
b005db86fc
bump to 0.7.1
|
2 years ago |
Siddharth Singh
|
5fe9d61065
Tensor parallelism for Mixture of Experts (#2074)
|
2 years ago |
Olatunji Ruwase
|
2210ebe70f
Release swap buffers for persisted params (#2089)
|
2 years ago |
Jeff Rasley
|
a039e2261a
enable fp16 input autocasting (#2158)
|
2 years ago |
Jeff Rasley
|
46401b3884
[zero-3] shutdown zero.Init from within ds.init (#2150)
|
2 years ago |
Jeff Rasley
|
63f470eeb6
prevent cuda 10 builds of inference kernels on ampere (#2157)
|
2 years ago |
Arpan Jain
|
1ed5aa96a8
Elastic Training support in DeepSpeed (#2153) (#2156)
|
2 years ago |
Nicholas Cilfone
|
ba67bd9a14
Added retain_graph as a kwarg to the main engine backward function (#1149)
|
2 years ago |
Reza Yazdani
|
556f005152
Fix random token-generation issue + MP-checkpoint loading/saving (#2132)
|
2 years ago |
shjwudp
|
57140e8e95
fix: fix BF16_Optimizer compatibility issue with optimizer state 0-dim tensor (#2152)
|
2 years ago |
Jerry Mannil
|
66d29b0a6c
Graceful exit on failures for multi-node runs (#2008)
|
2 years ago |
trajep
|
e669aaf55b
Trajepl/nebula ckpt engine (#2085)
|
2 years ago |
Jeff Rasley
|
a54661a06f
force newer datasets version (#2147)
|
2 years ago |
Jeff Rasley
|
b442264dc9
formatting fix for #1962
|
2 years ago |