Reza Yazdani
|
fa0735760a
Fix the tensor-slicing with multi-GPU inference and kernel-injection (#1724)
|
2 years ago |
Stas Bekman
|
bea701a1fc
[debug log] disable (#1736)
|
2 years ago |
Stas Bekman
|
e26de20b4e
[distributed] print dist init only on rank 0 (#1737)
|
2 years ago |
Stas Bekman
|
fdd59ca2b5
[zero3] remove debug print (#1733)
|
2 years ago |
Stas Bekman
|
ed4bbe08d6
[config] fix assert message (#1734)
|
2 years ago |
Jeff Rasley
|
9351266f78
Multi-node save pid support + allow sparse-attn extra (#1728)
|
2 years ago |
Jeff Rasley
|
171316fc83
launcher save pid + require manual triton install for sparse-attn (#1727)
|
2 years ago |
Sean Naren
|
df724e71e9
Add a very simple PyTorch Lightning test! (#1726)
|
2 years ago |
Alex Hedges
|
4cf970e6bb
Add codespell to pre-commit checks (#1717)
|
2 years ago |
Manuel R. Ciosici
|
09c065b4c3
Align bfloat16 docs (#1715)
|
2 years ago |
Olatunji Ruwase
|
e40558ded2
Fix checkpoint api (#1714)
|
2 years ago |
Justin Chiu
|
4912e0ad7e
Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453)
|
2 years ago |
Jeff Rasley
|
2d51f6171b
preserve cuda visible devices order (#1712)
|
2 years ago |
Reza Yazdani
|
94de0229fb
Fix inference api & add more description on inference engine tutorial (#1711)
|
2 years ago |
Jeff Rasley
|
2662fded2d
add logo and move news (#1709)
|
2 years ago |
Ammar Ahmad Awan
|
af074de349
Reorganize MoE news and tutorials. (#1708)
|
2 years ago |
Reza Yazdani
|
e27a60a879
Add more context for the MoE Inference tutorial (#1707)
|
2 years ago |
Zhewei Yao
|
53fdadfb9a
pr moe tutorial creation (#1704)
|
2 years ago |
Reza Yazdani
|
38e16c696d
add moe-inference tutorial (#1706)
|
2 years ago |
Jeff Rasley
|
e46d808a1b
MoE inference + PR-MoE model support (#1705)
|
2 years ago |
Jeff Rasley
|
3293cf72a0
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
|
2 years ago |
Jeff Rasley
|
e4cf40d617
force clear stashed tensors (#1698)
|
2 years ago |
liamcli
|
fead387f78
support module and no python args for launcher (#1690)
|
2 years ago |
Jeff Rasley
|
a85dce0728
add -lcurand to fix torch-nightly issue w. JIT (#1688)
|
2 years ago |
Jeff Rasley
|
3a4cb04243
[docs] switch to transparent dark logo
|
2 years ago |
Reza Yazdani
|
762e697a03
fix the half-precision version of rotary_pos_emb kernel (#1683)
|
2 years ago |
Reza Yazdani
|
289c3f9ba4
GPT-J inference support (#1670)
|
2 years ago |
Jeff Rasley
|
7e857aab9a
[docs] add gh-dark-mode logo
|
2 years ago |
Jeff Rasley
|
9c5cf3a5d4
[docs] add light-mode logo
|
2 years ago |
Jeff Rasley
|
2422ec4885
add segfault guard for cpu-adam/adagrad (#1681)
|
2 years ago |