Logan Adams
|
239b83a77e
Cleanup CODEOWNERS file to be valid (#6603)
|
2 weeks ago |
Jagadish Krishnamoorthy
|
b93c7a20c8
[ROCm] Fix subprocess error (#6587)
|
2 weeks ago |
Logan Adams
|
8cded575a9
Fix torch include in `op_builder/mlu/fused_adam.py` and update no-torch workflow triggers (#6584)
|
3 weeks ago |
Logan Adams
|
828ddfbbda
Fixes on the accelerate side mean we do not need to skip this test (#6583)
|
3 weeks ago |
Yizhou Wang
|
d4e1895076
[COMPILE] workflow for deepspeed + torch.compile (#6570)
|
3 weeks ago |
Nadav Elyahu
|
1caf6e8107
add bfloat16 to inference support dtypes (#6528)
|
3 weeks ago |
Masahiro Tanaka
|
047bcf6af6
Add APIs to offload states of model, optimizer, and engine (#6011)
|
3 weeks ago |
Liangliang Ma
|
d45cfd3455
[XPU] Support DeepNVMe new code structure (#6532)
|
3 weeks ago |
Nir Sonnenschein
|
ba58682a13
fix errors when setting zero3 leaf modules with torch.compile (#6564)
|
3 weeks ago |
Masahiro Tanaka
|
c85c8703bc
Fix gradient accumulation for Z2+offload (#6550)
|
3 weeks ago |
andyG
|
0fbe96a502
[Accelerator] Cambricon MLU support (#6472)
|
3 weeks ago |
Olatunji Ruwase
|
a5400974df
DeepNVMe perf tuning (#6560)
|
3 weeks ago |
Masahiro Tanaka
|
7622cd9e68
Use msgpack for p2p comm (#6547)
|
3 weeks ago |
Logan Adams
|
61de017176
Skip failing newly added tests in accelerate (#6574)
|
3 weeks ago |
ShifaAbu
|
2a56f53395
Added Intel Gaudi to Accelerator Setup Guide (#6543)
|
1 month ago |
Logan Adams
|
170b46e8b1
Add conditional on torch version for scaled_dot_product_attention (#6517)
|
1 month ago |
Olatunji Ruwase
|
659f6be105
Avoid security issues of subprocess shell (#6498)
|
1 month ago |
Omar Elayan
|
c27483933d
wrap include cuda_bf16.h with ifdef BF16_AVAILABLE (#6520)
|
1 month ago |
Nadav Elyahu
|
8fa6b50bfe
Revert "BF16 optimizer: Clear lp grads after updating hp grads in hook" (#6508)
|
1 month ago |
Geary.Z
|
fc22d9602d
fix environment variable export bug for MultiNodeRunner (#5878)
|
1 month ago |
Roger Feng
|
2a647c51d4
Fix the broken url link (#6500)
|
1 month ago |
Nadav Elyahu
|
3b09d945ea
fix pipeline eval_batch micro_batches argument for schedule (#6484)
|
1 month ago |
Jinxing Pan
|
4f803852ac
Op_builder->is_compatible quite warning (#6093)
|
1 month ago |
Nadav Elyahu
|
857780a85a
HPU: add required ENV vars to acccelerator init (#6495)
|
1 month ago |
Logan Adams
|
c210e601e3
Update version.txt after 0.15.1 release (#6493)
|
1 month ago |
Alex Morehead
|
10ba3dde84
Handle an edge case where `CUDA_HOME` is not defined on ROCm systems (#6488)
|
1 month ago |
Olatunji Ruwase
|
662a421b05
Safe usage of popen (#6490)
|
1 month ago |
Jiancheng Liu
|
ddd3571823
Add default value to "checkpoint_folder" in "load_state_dict" of bf16_optimizer (#6446)
|
1 month ago |
Olatunji Ruwase
|
5d1a30c033
DS_BUILD_OPS should build only compatible ops (#6489)
|
1 month ago |
Masahiro Tanaka
|
ddeb0c19a0
Fix patch for parameter partitioning in zero.Init() (#6388)
|
1 month ago |