Max H. Gerlach
|
f3af98649a
Let hvd.alltoall return the received splits if non-uniform splits are sent (TensorFlow, PyTorch, MXNet) (#2631)
|
3 years ago |
Travis Addair
|
6916985c9d
Bumped version to v0.21.3 (#2667)
|
3 years ago |
Travis Addair
|
c64b1d60c6
Bumped version to v0.21.2 (#2655)
|
3 years ago |
Richard Liaw
|
4a06f38438
[ray] provide a remote execution call (#2649)
|
3 years ago |
yuduber
|
8773a0cb94
Fix Wrong default for horovod.tensorflow.keras.allreduce average #2627 (#2651)
|
3 years ago |
Richard Liaw
|
9ec2117946
[ray] support driver logging (#2639)
|
3 years ago |
Peng Zhang
|
ea692ade19
Fix DL estimators for getting the output df schema (#2611)
|
3 years ago |
Yana Shchyokotova
|
b72722b6b0
Add Intel(R) MPI support for horovodrun (#2374)
|
3 years ago |
Richard Liaw
|
96dd0adbb6
[ray] fix local_rank issue (#2596)
|
3 years ago |
Travis Addair
|
a9dea74abc
Bump version to v0.21.1 (#2577)
|
3 years ago |
Richard Liaw
|
3565d5fbe4
[ray] update test, add blocking calls (#2575)
|
3 years ago |
Mikhail Shiryaev
|
c61af32680
Add cache hint for CCL allreduce (#2560)
|
3 years ago |
chongxiaoc
|
6fd953db9b
elastic: avoid sync if workers are only shrinked (#1876) (#2514)
|
3 years ago |
chongxiaoc
|
994913a2ab
add support to check supported frameworks versions (#2518) (#2529)
|
3 years ago |
Liang Zhang
|
a1fe93fd0d
avoid overwriting methods in the super class (#2526)
|
3 years ago |
Richard Liaw
|
16f405f417
[ray] remove ssh reliance for elastic training (#2528)
|
3 years ago |
Yana Shchyokotova
|
336f4d4da9
Migrate to the new oneCCL API (#2513)
|
3 years ago |
Richard Liaw
|
d8bd98679c
[ray] fix executable cls (#2510)
|
3 years ago |
Xu Ning
|
2fdea15bc6
Set model.eval() in TorchEstimator before return (#2517)
|
3 years ago |
Jeff Daily
|
ee64d41428
[ROCm] update for new hipify logic for PyTorch builds (#2360)
|
3 years ago |
Travis Addair
|
125115583b
Revert "Updated oneCCL to use the new API (#2433)" (#2508)
|
3 years ago |
Yana Shchyokotova
|
3d2021e083
Updated oneCCL to use the new API (#2433)
|
3 years ago |
chongxiaoc
|
635d010058
Fix the search order of Eigen/Flatbuffers path (#2429) (#2473)
|
3 years ago |
Travis Addair
|
d3c93d7b97
Updated to 0.21.1.dev0
|
3 years ago |
Travis Addair
|
7d71874258
Bumped version to 0.21.0 and updated changelog (#2475)
|
3 years ago |
Travis Addair
|
45b37585a2
Require initial_lr parameter to LearningRateScheduleCallback (#2459)
|
3 years ago |
Travis Addair
|
3ffe479ebb
Changed default cycle time from 5ms to 1ms and fusion threshold from 64MB to 128MB (#2468)
|
3 years ago |
Aaron Harlap
|
108d3a65e2
Update tf.keras hvd.allreduce() API to match tensorflow allreduce() (#2423)
|
4 years ago |
Aaron Harlap
|
d93887da74
Add support for backward_passes_per_step > 1 for LegacyOptimizers (TF) in Graph Mode. (#2401)
|
4 years ago |
Aaron Harlap
|
5f9df541db
Support backward_passes_per_step > 1 for TF Keras Eager Execution (#2371)
|
4 years ago |