Logan Adams | 101fff1da7 | Update to latest pydantic (2.0+) | 1 year ago
Logan Adams | 3c89576105 | Merge branch 'master' into loadams/update-pydantic | 1 year ago
Jinzhen Lin | 7e8bcc07d6 | fix "undefined symbol: curandCreateGenerator" for quantizer op (#3846) | 1 year ago
YiSheng5 | c520d47679 | [profiling][mics]Fix some issues for log_summary(). (#3899) | 1 year ago
Ramya Ramineni | d24629f4fd | [ROCm] Enable TestCUDABackward::test_backward unit tests (#3849) | 1 year ago
hipudding | c5e55d3d14 | Fix a typo of global variable in comm.py(#3852) (#3852) | 1 year ago
Michael Wyatt | b58e0fa92a | avoid init for deepspeed backend first (#3893) | 1 year ago
Alexander Grund | 9aeba94a8e | Avoid deprecation warnings in `CHECK_CUDA` (#3854) | 1 year ago
Michael Wyatt | 52844f4956 | Update workflows for merge queue (#3892) | 1 year ago
Pinstripe Potoroo | 3491e32d72 | fix rnn flop profiler to compute flops instead of macs (#3833) | 1 year ago
digger yu | 97e7b8410c | fix error :Dictionary expression not allowed in type annotation Pylance (#3708) | 1 year ago
Lev Kurilenko | cc3a7c9cba | Fix Meta Tensor checkpoint load for BLOOM models (#3885) | 1 year ago
Yejing-Lai | d6f622176d | Add GPTNeoX AutoTP support (#3778) | 1 year ago
Heyang Qin | 9377921a3f | Separate ZeRO3 InflightParamRegistry for train and eval (#3884) | 1 year ago
Ammar Ahmad Awan | db4638d157 | Extend HE-Lora test with Z3 support + Fix/add guard in HE for Z3 (#3883) | 1 year ago
Logan Adams | 59c9b0914f | Update apex installation to resolve apex's pyproject.toml issues. (#3745) | 1 year ago
Reza Yazdani | f3c93b056d | Add FALCON Auto-TP Support (#3640) | 1 year ago
Mashiro | 385e89d4a8 | update MMEngine support in README.md (#3879) | 1 year ago
Michael Wyatt | 7c126f431c | update lightning version in CI (#3882) | 1 year ago
Xingjian Shi | d81dfdabcc | Fix LoRA Fuse/Unfuse in Hybrid Engine (#3563) | 1 year ago
Pinstripe Potoroo | c1c1d2496f | fix retrieval of out_channels in _conv_trans_flops_compute (#3834) | 1 year ago
Jeff Rasley | 691d246e02 | [zero] revert PR #3166, it disabled grad clip for bf16 (#3790) | 1 year ago
hablb | d229ff175e | Zero3 Fix allreduce optimization for extra large tensor (#3832) | 1 year ago
Guo Yejun | 807d1b5dfc | scripts/check-torchcuda.py: add checking for tensor.is_cuda (#3843) | 1 year ago
Alexander Jipa | 2ded2ff0be | checking process_group before merging bucket ranges (#3521) (#3577) | 1 year ago
Ma, Guokai | 5d1124f2aa | [profiling]add show_straggler argument to log_summary() (#3579) | 1 year ago
Michael Wyatt | fd1d2c6447 | Reduce Unit Test Time (Part 2) (#3838) | 1 year ago
Logan Adams | c973e15711 | Disable AMD test flows in YML (#3847) | 1 year ago
Guo Yejun | b4626194e4 | zero/mics.py: use on_accelerator instead of cuda only (#3806) | 1 year ago
Heyang Qin | f8551b439e | Fix racing condition in GatheredParameters (#3819) | 1 year ago