Commit History

Author SHA1 Message Date
  Cuong Nguyen 783da640a2 [ci][release] repeated run for release tests (#43472) 7 months ago
  Justin Yu aed14c8134 [train] Simplify `ray.train.xgboost/lightgbm` (5/n): Remove `xgboost_ray` and `lightgbm_ray` dependencies (for release tests) (#43425) 7 months ago
  Balaji Veeramani 006c83f6cd [Data] Move batch inference release tests to Data team (#43389) 8 months ago
  Balaji Veeramani 9ddc08c144 [Data] Remove "inference" release test (#43365) 8 months ago
  Scott Lee 54cd4ca9a4 [Data] Add heterogeneous Ray Data + Train release test (#42618) 8 months ago
  Scott Lee 681976584b [Data] Increase timeout for `iter_tensor_batches_benchmark_multi_node` release test (#43286) 8 months ago
  can c6094a96aa [ci] mark serve_autoscaling_multi_deployment.aws as unstable 8 months ago
  Justin Yu 71d37ff204 [ci][train] Remove unnecessary `xgboost_ray`/`lightgbm_ray` reinstalls for release tests (#43176) 8 months ago
  matthewdeng 6eb1814ecc [train] remove DEFAULT_NCCL_SOCKET_IFNAME (#42808) 8 months ago
  Stephanie Wang 851d154a81 [core] Make microbenchmark stable again (#42813) 8 months ago
  Jiajun Yao b05e38be6f Mark many_pgs as stable (#42687) 9 months ago
  Cuong Nguyen 6f7b66b687 [ci] fix workspace_template_serving_stable_diffusion (#42397) 9 months ago
  Ricky Xu f76081f0f4 [core][ci] Run dag microbenchmark seperately as unstable (#42360) 9 months ago
  Cuong Nguyen 3ddaef2b5d [ci] upgrade release tests to py39 (#42102) 9 months ago
  Artur Niederfahrenhorst 7634169bdb [Workspace template] Fix version conflict with torch (#42094) 10 months ago
  Cuong Nguyen 405735828a [ci] disable finetuning tests (#42038) 10 months ago
  Cuong Nguyen 2daf1fc4a7 [ci] mark some rllib as non release blocking (#41698) 10 months ago
  Hao Chen 27b794cd1d [data][train] default ingest resource limits should exclude resources used by training (#41603) 10 months ago
  Andrew Xue d4baa3f9dc [data] add task filter for node killer (#41099) 10 months ago
  Scott Lee d1a1e7bbd2 [Data] Fix timeout on `read_images_train_4_gpu_chaos` release test (#41368) 11 months ago
  Gene Der Su a9a17aed08 [CI][Serve] unjail serve_handle_wide_ensemble.aws (#41345) 11 months ago
  Gene Der Su d3ac31ec80 [CI][Serve] unjail serve_handle_long_chain.aws (#41344) 11 months ago
  Gene Der Su 636234f94a [Serve] fix long_running_serve.aws (#41322) 11 months ago
  Archit Kulkarni a1a9a48e5b [CI] [Cluster] Fix example GCP GPU/docker example cluster YAML file (#41134) 11 months ago
  Scott Lee e9d5ac9eae [Data] e2e multi-node train benchmark (#41034) 11 months ago
  Andrew Xue 80018ffaf4 [data] add data worker killer (#41112) 11 months ago
  Balaji Veeramani 29aea3ddd6 [Data] Add fault tolerance to remote tasks (#41084) 11 months ago
  Balaji Veeramani 9d137f9fd4 [Data] Add multi-node `read_images` benchmark (#40683) 11 months ago
  Balaji Veeramani daf4f62697 [Data] Remove AIR data bulk benchmark (#40801) 11 months ago
  Yunxuan Xiao 1d15058488 [2.8][Train] Fix GPT-J Deepspeed Fine-tuning Release Test (#40648) 1 year ago