Michael Wyatt
|
825f9d488b
Fixes for various CI problems (#2457)
|
2 years ago |
Guo Yejun
|
3432c740e9
deepspeed/launcher/launch.py: add option '--enable_each_rank_log logdir' (#2409)
|
2 years ago |
Connor Holmes
|
be4ffb82ad
Reduction Kernel Utility (#2436)
|
2 years ago |
Joe Mayer
|
3b3ba3c256
Fixing a mismatch in basic adam test. (#2447)
|
2 years ago |
Michael Wyatt
|
e772f16665
Use CUDA events for inference model profiling (#2371)
|
2 years ago |
Cheng Li
|
8da0238b7a
rollback ds config changes (#2395)
|
2 years ago |
eltonzheng
|
b85eb3b979
Fix build issues on Windows (#2428)
|
2 years ago |
Cheng Li
|
5d1f595c94
update pytorch pool operator function signature (#2443)
|
2 years ago |
Joe Mayer
|
7d113633e4
Fix Bug #2319 (#2438)
|
2 years ago |
Jeff Rasley
|
a524864357
bump to 0.7.5
|
2 years ago |
lekurile
|
877a8818a7
Fix broken link to DeepSpeed Megatron fork (#2440)
|
2 years ago |
Adam Moody
|
b8fb9c3f1a
parallelize writing of layer checkpoint files across data parallel instances (#1419)
|
2 years ago |
Stas Bekman
|
99fde3b7a5
[memory estimators] new config args sync (#2431)
|
2 years ago |
lekurile
|
b2a724e257
Add TestInjectionPolicy inference unittest class for testing custom injection policies (#2426)
|
2 years ago |
Jeff Rasley
|
1b7c6791d5
only add deps if extra is explicitly called (#2432)
|
2 years ago |
Olatunji Ruwase
|
799120e7e4
Universal checkpoint for zero stage 1 (#2284)
|
2 years ago |
Joe Mayer
|
906b4a025f
Fixing bug 2361 (#2410)
|
2 years ago |
Michael Wyatt
|
34fb6d1980
Fix for inference gpt-j test (#2430)
|
2 years ago |
Alexander Jipa
|
cfead55132
fixes #2389 (#2411)
|
2 years ago |
Molly Smith
|
5eafb8c78d
Make error regex more generic in collect_results.py (#2415)
|
2 years ago |
Reza Yazdani
|
537e8581fe
fix checkpoint loading when it is a dictionary (#2425)
|
2 years ago |
Jeff Rasley
|
ec13da6ba7
add SD injection policy (#2381)
|
2 years ago |
Jeff Rasley
|
d716537413
[docs] update mii blog title (#2423)
|
2 years ago |
Jeff Rasley
|
1c39fe13c1
CI fixes related to triton (#2422)
|
2 years ago |
Samyam Rajbhandari
|
1ffcfebf1a
MII-Public and MII-Azure subheading in mii post
|
2 years ago |
Andrey Chernykh
|
cd3a70953a
Fix GPT Neo-X multi-gpu inference (#2401)
|
2 years ago |
Samyam Rajbhandari
|
843fd4c19d
DeepSpeed-MII title change in website
|
2 years ago |
Samyam Rajbhandari
|
3789dc5c59
MII blog title update on Readme
|
2 years ago |
Andrey Chernykh
|
d5d10b0ce8
Fix issue with corrupted output on long generation for GPT (#2359)
|
2 years ago |
Dashiell Stander
|
3db0b5e2de
Add SLURM Multinode Runner (#2404)
|
2 years ago |