Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
[Unreleased] - YYYY-MM-DD
Added
Changed
Deprecated
Removed
Fixed
[v0.28.1] - 2023-06-12
Fixed
- Fixed build with gcc 12. (#3925)
- PyTorch: Fixed build on ROCm. (#3928)
- TensorFlow: Fixed local_rank_op. (#3940)
[v0.28.0] - 2023-05-10
Added
- TensorFlow: Added new
get_local_and_global_gradients
to PartialDistributedGradientTape to retrieve local and non-local gradients separately. (#3859)
Changed
- Improved reducescatter performance by allocating output tensors before enqueuing the operation. (#3824)
- TensorFlow: Ensured that
tf.logical_and
within allreduce tf.cond
runs on CPU. (#3885)
- TensorFlow: Added support for Keras 2.11+ optimizers. (#3860)
CUDA_VISIBLE_DEVICES
environment variable is no longer passed to remote nodes. (#3865)
Fixed
- Fixed build with ROCm. (#3839, #3848)
- Fixed build of Docker image horovod-nvtabular. (#3851)
- Fixed linking recent NCCL by defaulting CUDA runtime library linkage to static and ensuring that weak symbols are overridden. (#3867, #3846)
- Fixed compatibility with TensorFlow 2.12 and recent nightly versions. (#3864, #3894, #3906, #3907)
- Fixed missing arguments of Keras allreduce function. (#3905)
- Updated with_device functions in MXNet and PyTorch to skip unnecessary cudaSetDevice calls. (#3912)
[v0.27.0] - 2023-02-01
Added
- Keras: Added
PartialDistributedOptimizer
API. (#3738)
- Added
HOROVOD_SPARK_USE_LOCAL_RANK_GPU_INDEX
environment variable to ignore GPU device indices assigned by Spark and always use local rank GPU device in Spark estimators. (#3737)
- Added support for reducescatter arguments
prescale_factor
and postscale_factor
and moved averaging into Horovod backend. (#3815)
- Spark Estimator: Added support for custom data loaders in TorchEstimator. (#3787)
- Spark Estimator: Added NVTabular data loader for TorchEstimator. (#3787)
Changed
- Improved NCCL performance for fused allgather operations through padding for better memory alignment. (#3727)
- Improved look-ahead tensor fusion buffer size estimates when allgather and other operations are mixed. (#3727)
Fixed
- ROCm: Fixed GPU MPI operations support in build. (#3746)
- PyTorch: Fixed linking order to avoid using Gloo from PyTorch dynamic libraries. (#3750)
- Fixed memory leak in
MPI_GPUAllgather
. (#3727)
- TensorFlow: Fixed deprecation warnings when building with TensorFlow 2.11. (#3767)
- Keras: Added support for additional arguments to
SyncBatchNormalization._moments()
. (#3775)
- Fixed version number parsing with pypa/packaging 22.0. (#3794)
- TensorFlow: Fixed linking with nightly versions leading up to TensorFlow 2.12. (#3755)
- TensorFlow: Fixed handling of
tf.IndexedSlices
types when scaling local gradients. (#3786)
- Added missing
MEMCPY_IN_FUSION_BUFFER
timeline event for reducescatter. (#3808)
- Fixed build of Docker image horovod-nvtabular. (#3817)
- TensorFlow: Several fixes for allreduce and grouped allreduce handling of
tf.IndexedSlices
. (#3813)
- Spark: Restricted PyArrow to versions < 11.0. (#3830)
- TensorFlow: Resolved conflicts between multiple optimizer wrappers reusing the same gradient accumulation counter. (#3783)
- TensorFlow/Keras: Fixed
DistributedOptimizer
with Keras 2.11+. (#3822)
- PyTorch, ROCm: Fixed allreduce average on process sets. (#3815)
[v0.26.1] - 2022-10-14
Fixed
- Fixed packaging import during install to occur after install_requires. (#3741)
[v0.26.0] - 2022-10-13
Added
- Spark Estimator: Added support for custom data loaders in KerasEstimator. (#3603)
- Spark Estimator: Added NVTabular data loader for KerasEstimator. (#3603)
- Spark Estimator: Added gradient accumulation support to Spark torch estimator. (#3681)
- TensorFlow: Added
register_local_var
functionality to distributed optimizers and local gradient aggregators. (#3695)
- TensorFlow: Added support for local variables for
BroadcastGlobalVariablesCallback
. (#3703)
- Enabled use of native
ncclAvg
op for NCCL allreduces. (#3646)
- Added support for additional reduction operations for
allreduce
(min, max, product). (#3660)
- Added 2D torus
allreduce
using NCCL. (#3608)
- Added support for Petastorm reader level parallel shuffling. (#3665)
- Added random seed support for Lightning datamodule to generate reproducible data loading outputs. (#3665)
- Added support for
int8
and uint8
allreduce
and grouped_allreduce
in TensorFlow. (#3649)
- Added support for batched memory copies in
GPUAllgather
. (#3590)
- Added support for batched memory copies in
GPUReducescatter
. (#3621)
- Added
hvd.grouped_allgather()
and hvd.grouped_reducescatter()
operations. (#3594)
- Added warning messages if output tensor memory allocations fail. (#3594)
- Added
register_local_source
and use_generic_names
funtionality to DistributedGradientTape
. (#3628)
- Added
PartialDistributedGradientTape()
API for model parallel use cases. (#3643)
- Spark/Lightning: Added
reader_worker_count
and reader_pool_type
. (#3612)
- Spark/Lightning: Added
transformation_edit_fields
and transformation_removed_fields
param for EstimatorParams
. (#3651)
- TensorFlow: Added doc string for
hvd.grouped_allreduce()
. (#3594)
- ROCm: Enabled
alltoall
. (#3654)
Changed
- Default Petastorm reader pool is changed from
process
to thread
for lower memory usage. (#3665)
- Keras: Support only legacy optimizers in Keras 2.11+. (#3725)
- Gloo: When negotiating, use
gather
rather than allgather
. (#3633)
- Use
packaging.version
instead of distutils
version classes. (#3700)
Deprecated
- Deprecated field
shuffle_buffer_size
from EstimatorParams
. Use shuffle
to enable shuffle or not. (#3665)
Removed
- Build: Removed std::regex use for better cxxabi11 compatibility. (#3584)
Fixed
- TensorFlow: Fixed the optimizer iteration increments when
backward_passes_per_step > 1
. (#3631)
- Fixed
FuseResponses()
on BATCHED_D2D_PADDING
edge cases for Reducescatter and/or ROCm. (#3621)
- PyTorch: Fixed Reducescatter functions to raise
HorovodInternalError
rather than RuntimeError
. (#3594)
- PyTorch on GPUs without GPU operations: Fixed grouped allreduce to set CPU device in tensor table. (#3594)
- Fixed race condition in PyTorch allocation handling. (#3639)
- Build: Fixed finding
nvcc
(if not in $PATH
) with older versions of CMake. (#3682)
- Fixed
reducescatter()
and grouped_reducescatter()
to raise clean exceptions for scalar inputs. (#3699)
- Updated Eigen submodule to fix build on macOS with aarch64. (#3619)
- Build: Correctly select files in
torch/
directory to be hipified. (#3588)
- Build: Modify regex match for CUDA|ROCm in
FindPytorch.cmake
. (#3593)
- Build: Fixed ROCm-specific build failure. (#3630)
[v0.25.0] - 2022-06-20
Added
- Added
hvd.reducescatter()
operation with implementations in NCCL, MPI, and Gloo. (#3299, #3574)
- Added AMD GPU XLA Op Implementation. (#3486)
- Added Horovod job to spin up distributed TensorFlow Data Service. (#3525)
- Spark: Expose random seed as an optional parameter. (#3517)
- Add Helm Chart. (#3546)
- Elastic: Add elastic run API. (#3503)
- Spark Estimator: Expose random seed for model training reproducibility. (#3517)
- Spark Estimator: Add option whether to use GPUs at all. (#3526)
- Spark Estimator: Expose parameter to set start method for
multiprocessing
. (#3580)
Changed
- MXNet: Updated allreduce functions to newer
op
API. (#3299)
- TensorFlow: Make TensorFlow output allocations asynchronous when using NCCL backend. (#3464)
- TensorFlow: Clear locally accumulated gradient by assigning with
zeros_like
to avoid infinite gradient not correctly cleared up. (#3505)
- Make
HorovodVersionMismatchError
subclass ImportError
instead of just a standard Exception
. (#3549)
- Elastic: Catch any exception to prevent the discovery thread from silently dying. (#3436)
- Horovodrun: Exit check_build (
--check-build
) via sys.exit
to flush stdout. (#3272)
- Spark: Use
env
to set environment vars in remote shell. (#3489)
- Build: Avoid redundant ptx generation for maximum specified compute capability. (#3509)
Deprecated
- MXNet: Deprecated
average
argument of allreduce functions. (#3299)
- Public and internal APIs: deprecate use of np, min_np, max_np. Use num_proc, min_num_proc, and max_num_proc, respectively, instead. (#3409)
- Horovodrun: Providing multiple NICS as comma-separated string via
--network-interface
is deprecated,
use --network-interface
multiple times or --network-interfaces
instead. (#3506)
- horovod.run: Argument
network_interface
with comma-separated string is deprecated,
use network_interfaces
with Iterable[str]
instead. (#3506)
Fixed
- Fallback to NCCL shared lib if static one is not found. (#3500
- Spark/Lightning: Added missing
tranform_spec
for Petastorm datamodule. (#3543)
- Spark/Lightning: Fixed PTL Spark example with checkpoint usage by calling
save_hyperparameters()
. (#3527)
- Elastic: Fixed empty hostname returned from
HostDiscoveryScript
. (#3490)
- TensorFlow 2.9: Fixed build for API change related to
tensorflow_accelerator_device_info
. (#3513)
- TensorFlow 2.10: Bumped build partially to C++17. (#3558)
- TensorFlow: Fixed gradient update timing in TF
AggregationHelperEager
. (#3496)
- TensorFlow: Fixed resource
NotFoundError
in TF AggregationHelper
. (#3499)
[v0.24.3] - 2022-04-21
Fixed
- Make DBFSLocalStore support "file:/dbfs/...", implement get_localized_path. (#3510)
[v0.24.2] - 2022-03-10
Fixed
- Setup: Require fsspec >= 2010.07.0 (#3451)
- Fix ignored cuda arch flags (#3462
[v0.24.1] - 2022-03-03
Fixed
- Extended CMake build script to often find CUDA even if
nvcc
is not in $PATH
. (#3444)
[v0.24.0] - 2022-03-01
Added
- Ray: Added elastic keyword parameters to RayExecutor API: This API supports both static (non-elastic) and elastic Horovod jobs. (#3190)
- TensorFlow: Added in-place broadcasting of variables. (#3128)
- Elastic: Added support for resurrecting blacklisted hosts. (#3319)
- MXNet: Added support for MXNet async dependency engine. (#3242, #2963)
- Spark/Lightning: Added history to lightning estimator. (#3214)
Changed
- Moved to CMake version 3.13 with first-class CUDA language support and re-enabled parallelized builds. Uses a temporary installation of CMake if CMake 3.13 is not found. (#3261, #3371)
- Moved released Docker image
horovod
and horovod-cpu
to Ubuntu 20.04 and Python 3.8. (#3393)
- Spark Estimator: Don't shuffle row groups if training data requires non-shuffle (#3369)
- Spark/Lightning: Reduced memory footprint of async dataloader. (#3239)
- Elastic: Improved handling NCCL errors under elastic scenario. (#3112)
- Spark/Lightning: Do not overwrite model with checkpoint by default. (#3201)
- Make checkpoint name optional so that user can save to h5 format. (#3411)
Deprecated
- Deprecated ElasticRayExecutor APIs in favor of the new RayExecutor API. (#3190)
Removed
- Spark: Removed
h5py<3
constraint as this is not needed anymore for Tensorflow >2.5.0. (#3301)
Fixed
- Elastic Spark: Fixed indices in initial task-to-task registration. (#3410)
- PyTorch: Fixed GIL-related deadlock with PyTorch 1.10.1. (#3352)
- PyTorch: Fixed finalization of ProcessSetTable. (#3351)
- Fixed remote trainers to point to the correct shared lib path. (#3258)
- Fixed imports from
tensorflow.python.keras
with tensorflow 2.6.0+. (#3403)
- Fixed Adasum communicator init logic. (#3379)
- Lightning: Fixed resume logger. (#3375)
- Fixed the checkpoint directory structure for pytorch and pytorch lightning. (#3362)
- Fixed possible integer overflow in multiplication. (#3368)
- Fixed the
pytorch_lightning_mnist.py
example. (#3245, #3290)
- Fixed barrier segmentation fault. (#3313)
- Fixed
hvd.barrier()
tensor queue management. (#3300)
- Fixed PyArrow "list index out of range" IndexError. (#3274)
- Elastic: Fixed all workers sometimes failing on elastic Horovod failure. (#3264)
- Spark/Lightning: Fixed setting
limit_train_batches
and limit_val_batches
. (#3237)
- Elastic: Fixed ElasticSampler and
hvd.elastic.state
losing some indices of processed samples when nodes dropped. (#3143)
- Spark/Lightning: Fixed history metrics for estimator serialization. (#3216)
- Ray: Fixed RayExecutor to fail when
num_workers=0
and num_hosts=None
. (#3210)
- Spark/Lightning: Fixed checkpoint callback
dirpath
typo. (#3204)
[v0.23.0] - 2021-10-06
Added
- Added process sets to concurrently run collective operations on subsets of Horovod processes in TensorFlow, PyTorch, and MXNet. (#2839, #3042, #3043, #3054, #3083, #3090)
- Added XLA support for Allreduce via
tf.function(jit_compile=True)
. (#3053)
- Added fused buffer scaling and unpack/pack kernels on GPU. (#2973)
- Added support for NCCL on CUDA 11.4. (#3182)
- Added fp16 compression for MXNet. (#2987)
- Added terminate_on_nan flag to Spark Lightning estimator. (#3088)
- Added barrier() API to torch module to support simple synchronization among ranks and to achieve parity with PyTorch DDP and similar frameworks. #3139
- Added params for customizing Tensorboard callback. (#3153)
- Added
hvd.cross_rank()
for keras. (#3008)
- Added barrier() API to torch module to support simple synchronization among ranks and to achieve parity with PyTorch DDP and similar frameworks. #3139
Changed
- Implemented more asynchronous dependency handling on GPU. (#2963)
- Ray: RayExecutor will now use the current placement group instead of always creating a new one. (#3134)
- Lightning: turned off shuffling for validation dataset. (#2974)
- Ray: RayExecutor will use the current placement group if one exists. (#3134)
- Extended
hvd.join()
to return the last rank that joined. (#3097
Deprecated
Removed
- Spark/Keras: remove bare Keras support. (#3191)
Fixed
- Fix Horovod develop/editable install mode and incremental builds. (#3074)
- Estimator/Lightning: use lightning datamodule. (#3084)
- Fix Horovod Spark StringType and numpy type mapping issue. (#3146)
- Fixed error in Keras LearningRateScheduler. (#3135)
- Fixed bug in Lightning Profiler on Ray. (#3122)
- Fixed torch op lazy release to prevent OOM in elastic training. (#3110)
- Lightning: Fixed usage of the checkpoint callback. (#3186)
- Fixed MPICH support to use Intel MPI's implementation. (#3148)
- Fixed race condition in PyTorch async dataloader. (#3120)
- Keras: Fixed learning rate scheduler. (#3142, #3135)
[v0.22.1] - 2021-06-10
Added
- Estimator: added support for loading data from S3, GCS, ADLS, and other remote filesystems. (#2927)
- Estimator: added custom Spark data loader interface. (#2938)
- LightningEstimator: added support to supply a logger and associated parameter to control the frequency of logging. (#2926)
- Estimator: added check to ensure all ranks have the same device type. (#2942)
Changed
- Changed behavior from using TensorBoardLogger to now using it as a fallback if a logger is not supplied. (#2926)
- Ray: disabled capturing child tasks in placement group. (#2920)
Fixed
- Fixed
hvd.tensorflow.keras.Compression
, accidentally removed in v0.22.0. (#2945)
- TorchEstimator: fixed usage of
validation_steps
in place of validation_steps_per_epoch
. (#2918)
- TensorFlow: fixed C++ API for TF v2.6.0. (#2932)
- PyTorch: fixed
sparse_allreduce_async
for PyTorch v0.10.0. (#2965)
[v0.22.0] - 2021-05-18
Added
- Added pytorch_lightning spark estimator which enables training pytorch_lightning models. (#2713)
- Added NVTX tracing hooks for profiling with Nsight Systems. (#2723)
- Added a generic
num_workers
API for RayExecutor
(#2870)
- Supports Ray Client without code changes. (#2882)
- Supports inmemory cache option for Keras Estimator. (#2896)
- Added FP16 support for GPU tensor in mxnet. (#2915)
- Added response caching for allgather operations. (#2872)
- Estimator: add petastorm reader_pool_type into constructor (#2903)
Changed
- Changed
alltoall
to return the received splits as a second return value if non-uniform splits are sent. (#2631)
- Changed
RayExecutor
to use Ray Placement Groups for worker colocation. (#2824)
- Changed
Inmemory dataloader
usage for Torch Estimator with petastorm v0.11.0 release. (#2896)
Fixed
- Changed RayExecutor to use Ray node ID to enable multi-container:single-host setups. (#2883)
- Support sparse gradients aggregation in TF1 Keras. (#2879)
- Respect
global_step
parameter for LegacyOptimizers when aggregating gradients. (#2879)
- Fixed compatibility with PyTorch 1.9.0. (#2829)
[v0.21.3] - 2021-02-15
Added
- Add
groups
parameter in DistributedOptimizer
for custom allreduce groups. (#2523)
Removed
- Removed
num_groups
parameter in DistributedOptimizer
, replaced with groups
. (#2523)
Fixed
- Fixed worker desynchronization deadlock issue in TensorFlow 2.4. (#2647)
- Deduped Keras
LearningRateWarmupCallback
log after gradual learning rate warmup. (#2661)
[v0.21.2] - 2021-02-08
Added
- Added support for Intel(R) MPI in horovodrun. (#2374)
- Add support for callbacks in Ray Elastic Executor. (#2639)
- Added forwarding of stdout/stderr captured to driver over Gloo. (#2646)
Fixed
- Fixed broadcast_optimizer_state to handle NoneType params for PyTorch 1.8. (#2624)
- Fixed
local_rank
support for Ray. (#2596)
- Fixed DL estimators to obtain the output df schema without sampling the input. (#2611)
- Fixed wrong default for horovod.tensorflow.keras.allreduce average (#2627)
[v0.21.1] - 2021-01-06
Added
- Added in-memory dataset caching param to
TorchEstimator
. (#2434)
- Added
val_batch_size
param to the Estimator API. (#2505)
- Added support for TorchScript modules when using
TorchEstimator
. (#2494)
Changed
- Migrated to oneCCL aligned with oneAPI specification v1.0. (#2513)
- Added knob to set cache hint for oneCCL allreduce. (#2560)
- Renamed
horovodrun
arg --ccl-bgt-affinity
to --thread-affinity
. (#2562)
- Changed default build parallelism from
-j8
to -j1
to address potential race condition. (#2572)
Fixed
- Fixed building Horovod for ROCm PyTorch with newer hipify script. (#2360)
- Fixed "Executable class" support for Ray. (#2510)
- Fixed TorchEstimator returning model without switching to eval mode. (#2517)
- Remove ssh reliance for Ray elastic training. (#2528)
- Fixed error handling for changing framework without reinstalling horovod. (#2529)
- Fixed "Intermediate path does not exist" error with DBFSLocalStore. (#2526)
- Avoid synchronization if workers are only shrinked in elastic mode. (#2514)
- Fixed Ray resource test. (#2575)
- Fixed usage of env variable
HOROVOD_GLOO_TIMEOUT_SECONDS
with horovodrun
. (#2571)
[v0.21.0] - 2020-11-23
Added
- Added support for backward_passes_per_step > 1 for TF Keras graph mode. (#2346)
- Added support for backward_passes_per_step > 1 for TF Keras eager execution. (#2371)
- Added support for backward_passes_per_step > 1 for TF LegacyOptimizer in graph mode. (#2401)
- Added grouped allreduce to enable more efficient tensor fusion and deterministic training. (#2453)
- Add support for specifying
op
and compression
in horovod.tensorflow.keras.allreduce()
. (#2423)
- Adding support for batched D2D memcopy kernel on GPU. (#2435)
- Added schema inference in Spark Estimator without sampling. (#2373)
- Added
Store.create("dbfs:/")
mapping to DBFSLocalStore("/dbfs/...")
. (#2376)
Changed
- Changed Keras callbacks to require parameter
initial_lr
of LearningRateScheduleCallback
and LearningRateWarmupCallback
. (#2459)
- Changed default cycle time from 5ms to 1ms and fusion threshold from 64MB to 128MB. (#2468)
Fixed
- Fixed support for TensorFlow v2.4.0. (#2381)
- Fixed averaging using CUDA half2 implementation one element half buffers. (#2375)
- Fixed
HOROVOD_THREAD_AFFINITY
when using oneCCL. (#2350)
- Added timeout to SSH check in horovodrun to prevent hanging. (#2448)
- Added
HOROVOD_GLOO_TIMEOUT_SECONDS
value to error messages. (#2436)
- Fixed race condition in dynamic timeline API. (#2341)
- Fixed --log-hide-timestamp to apply to driver logs with Gloo. (#2388)
- Fixed the search order of Eigen and Flatbuffers paths. (#2473)
- Fixed type checks in
TorchEstimator
to correctly use isinstance()
. (#2480)
[0.20.3] - 2020-10-01
Added
- Added Elastic Ray integration. (#2291)
Changed
- Removed dependency on SSH access for Ray. (#2275)
[0.20.2] - 2020-09-25
Fixed
- Fixed building Horovod without HOROVOD_WITHOUT_MXNET when MXNet is not installed. (#2334)
[0.20.1] - 2020-09-25
Added
- Added Databricks storage
DBFSLocalStore
and support for GPU-aware scheduling to horovod.spark Estimator. (#2234)
- Added ElasticSampler and PyTorch Elastic ImageNet example. (#2297)
- Added ability to dynamically start and stop timeline programmatically. (#2215)
- Added support for Gloo on macOS. (#2254)
- Exposed name argument to TensorFlow allreduce operation. (#2325)
- Added option to strip outer name scope from Horovod ops in TensorFlow. (#2328)
Fixed
- Fixed usage of VERBOSE=1 when setting custom MAKEFLAGS. (#2239)
- Fixed bugs in Keras Elastic Callback classes. (#2289)
- Fixed RelWithDebInfo build and made it the default with -03 optimizations. (#2305)
- Fixed usage of tf.cond in TensorFlow alltoall gradient. (#2327)
- Fixed allreduce averaging for TF IndexedSlices in ROCm path. (#2279)
- Include stdexcept to handle certain compiler / frameworks that don't include it already. (#2238)
- Fixed Debug builds by setting compiler options based on CMake build type. (#2263)
- Skipped launching zero-sized send/recvs for NCCLAlltoall. (#2273)
- Fixed missing run in tf keras elastic mode. (#2272)
- Fixed loss function in TensorFlow2 elastic synthetic benchmark. (#2265)
- Fixed usage of HOROVOD_MIXED_INSTALL env var in alltoall tests. (#2266)
- Removed keras requirement from Ray example. (#2262)
[0.20.0] - 2020-09-02
Added
- Added bare-metal elastic mode implementation to enable auto-scaling and fault tolerance. (#1849)
- Added Elastic Horovod support for Spark auto-scaling. (#1956)
- Added All-to-All operation for TensorFlow, PyTorch, and MXNet. (#2143)
- Added support for
gradient_predivide_factor
and averaging in Horovod backend. (#1949)
- Added NCCL implementation of the allgather operation. (#1952)
- Added
HOROVOD_GPU_OPERATIONS
installation variable to simplify enabling NCCL support for all GPU operations. (#1960)
- Added TensorFlow implementation of
SyncBatchNormalization
layer. (#2075)
- Added
hvd.is_initialized()
method. (#2020)
- Added
hvd.allgather_object
function for TensorFlow, PyTorch, and MXNet. (#2166)
- Added
hvd.broadcast_object
function for MXNet. (#2122)
- Added
label_shapes
parameter to KerasEstimator and TorchEstimator. (#2140)
- Added optional
modelCheckPoint
callback to KerasEstimator params. (#2124)
- Added
ssh_identity_file
argument to horovodrun
. (#2201)
- Added support for
horovodrun
on kubeflow/mpi-job
. (#2199)
- Added Ray integration. (#2218)
Changed
- Moved
horovod.run.runner.run
to horovod.run
. (#2099)
- HOROVOD_THREAD_AFFINITY accepts multiple values, one for every Horovod rank (#2131)
- Migrated build system for native libraries to CMake (#2009)
Deprecated
- HOROVOD_CCL_BGT_AFFINITY is deprected. Use HOROVOD_THREAD_AFFINITY instead (#2131)
Removed
- Dropped support for Python 2. (#1954)
- Dropped support for TensorFlow < 1.15. (#2169)
- Dropped support for PyTorch < 1.2. (#2086)
Fixed
- Fixed MXNet allgather implementation to correctly handle resizing the output buffer. (#2092)
- Fixed Keras Spark Estimator incompatibility with TensorFlow 1.15 due to
tf.autograph
. (#2069)
- Fixed API compatibility with PyTorch 1.6. (#2051)
- Fixed Keras API compatibility with TensorFlow 2.4.0. (#2178)
- Fixed allgather gradient for TensorFlow 2 in cases where the tensor shape is not known during graph construction. (#2121)
- Fixed running using Gloo with an imbalanced number of workers per host. (#2212)