Reza Yazdani
|
d154cc0f55
Ds inference/fix mp2 (#2270)
|
2 years ago |
Mikhail Druzhinin
|
4671cce558
Fix OrderedDict import for python3.6 (#2267)
|
2 years ago |
Molly Smith
|
a7ee688a6f
Update replace_module.py, test-gptj.py related fix (#2269)
|
2 years ago |
Michael Wyatt
|
55b7b9e008
Add blob storage to CI runners (#2260)
|
2 years ago |
Jeff Rasley
|
220e508876
bump to 0.7.3
|
2 years ago |
叶志晟
|
80f94c10c5
fix #2240: wrong time unit in flops_profiler (#2241)
|
2 years ago |
Connor Holmes
|
2a64448830
Update half precision header guards (#2261)
|
2 years ago |
Olatunji Ruwase
|
cb5e05fe55
Correctly detect CPU optimizer usage (#2257)
|
2 years ago |
Siddharth Singh
|
b288cf1b9b
Enable contiguous gradients with Z1+MoE (#2250)
|
2 years ago |
Jeff Rasley
|
ebed51df78
bump to 0.7.2
|
2 years ago |
Reza Yazdani
|
c35bfe89f6
fix ds-inference without policy (#2247)
|
2 years ago |
Arash Bakhtiari
|
fae896ef60
Make OPT policy backward compatible with pre-OPT transformers versions (#2254)
|
2 years ago |
Olatunji Ruwase
|
217338beb6
Refactor dist tests: Checkpointing (#2202)
|
2 years ago |
Jeff Rasley
|
86164c487e
update videos (#2249)
|
2 years ago |
Olatunji Ruwase
|
a9b3bfa2a8
Correctly detect zero_offload (#2213)
|
2 years ago |
Jeff Rasley
|
dce3acaac7
allow saving ckpt w/o ckpt json + bloom copy fix (#2237)
|
2 years ago |
Reza Yazdani
|
fda63432ba
Remove the random-generator from context during inference (#2228)
|
2 years ago |
Conglong Li
|
5e42cc8be2
add doc for new bert example (#2224)
|
2 years ago |
Jeff Rasley
|
7d8ad45d6a
Fix regression w. dist_init_required (#2225)
|
2 years ago |
Zhihong Chen
|
9b418c1e1c
fix typos in readme. (#2218)
|
2 years ago |
Arash Bakhtiari
|
8b2a63717a
Add support of OPT models (#2205)
|
2 years ago |
Ammar Ahmad Awan
|
b5ac0d542d
[zero-3] print warning once and support torch parameter (#2127)
|
2 years ago |
Jeff Rasley
|
f0054691ca
use torch 1.9 (#2215)
|
2 years ago |
Jeff Rasley
|
a84f9da8a2
add cuda 11.7 (#2211)
|
2 years ago |
Olatunji Ruwase
|
5870f36c58
Correctly detect offload configuration (#2208)
|
2 years ago |
Kamal Raj
|
87b201330f
fix table syntax (#2204)
|
2 years ago |
Michael Wyatt
|
ac9951985f
Refactor Distributed Tests (#2180)
|
2 years ago |
Reza Yazdani
|
8920308c66
Fix the tensor-slicing copy for qkv parameters (#2198)
|
2 years ago |
Olatunji Ruwase
|
28dfca8a13
Log user config exactly (#2201)
|
2 years ago |
Minjia Zhang
|
f82846d7df
Adding additional instructiosn in the compression tutorial on pre-training distillation and quantization for GPT (#2197)
|
2 years ago |