zhuwenxing a08000cfbf enhance:[skip e2e]update one pod mode resource (#34196) 3 months ago
..
chaos_objects 29b3c35b9e [skip e2e]Add mixcoord component for chaos test (#18712) 2 years ago
config 7d4cb79645 [skip e2e]Update multi replicas chaos test (#17739) 2 years ago
scripts 8c71e2bd64 Update milvs helm repo for ci (#28042) 11 months ago
testcases 9a269f1489 test: add import checker to chaos test (#32908) 5 months ago
README.md e045f35440 [skip e2e]Update README for chaos test (#14546) 2 years ago
chaos_commons.py 0f4475e5e3 [test]Refine health checker in test (#26920) 1 year ago
chaos_test.sh b745b6f707 [skip e2e]Add all pods kill chaos test (#15761) 2 years ago
checker.py 3336b91ce6 test: add channel exclusive balance test and resource group test (#33093) 4 months ago
cluster-values.yaml 88d426f390 enhance:[skip e2e]add some custom deploy config back (#31716) 6 months ago
conftest.py 9a269f1489 test: add import checker to chaos test (#32908) 5 months ago
constants.py f0bff1e1a8 test: add hybrid search in checker for test (#30341) 8 months ago
nats-standalone-values.yaml 99721c8dd2 [skip e2e]Disable debug mode in etcd image (#27212) 1 year ago
one-pod-standalone-values.yaml a08000cfbf enhance:[skip e2e]update one pod mode resource (#34196) 3 months ago
requirements.txt c91254f762 test: update pyarrow version (#29992) 9 months ago
run.sh 2c5e911d86 [skip e2e]Update script to run all chaos test (#14004) 2 years ago
standalone-values.yaml 88d426f390 enhance:[skip e2e]add some custom deploy config back (#31716) 6 months ago
test_chaos.py 6efb7afd3f test: add more request type checker for test (#29210) 10 months ago
test_chaos_apply.py 1bad19a121 [test]Fix apply chaos condition (#27625) 1 year ago
test_chaos_apply_to_coord.py 2bcd1bb0d8 [test]Add standby test and adapt to different schemas (#24781) 1 year ago
test_chaos_apply_to_determined_pod.py 0a2655dba0 test: fix chaos apply time (#31076) 7 months ago
test_chaos_bulk_insert.py 1a4c0fa2e8 [test]Fix bulk insert chaos test (#20341) 1 year ago
test_chaos_data_consist.py d199d00f81 [skip e2e]Add wait pod ready function for chaos data consist test (#15898) 2 years ago
test_chaos_memory_stress.py 6efb7afd3f test: add more request type checker for test (#29210) 10 months ago
test_chaos_multi_replicas.py 32c70b7f86 [skip e2e]Get the number of replicas needed to load via get_replicas (#17872) 2 years ago
test_load_with_checker.py 6efb7afd3f test: add more request type checker for test (#29210) 10 months ago

README.md

Chaos Tests

Goal

Chaos tests are designed to check the reliability of Milvus.

For instance, if one pod is killed:

  • verify that it restarts automatically
  • verify that the related operation fails, while the other operations keep working while the pod is absent
  • verify that all operations work again after the pod returns to the running state
  • verify that no data is lost
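
As a rough illustration of the checks listed above, the sketch below compares recorded operation outcomes against an expectation for one phase of a test. The `OpStats` class, the operation names, and the counts are hypothetical and are not taken from the Milvus test code.

```python
from dataclasses import dataclass


@dataclass
class OpStats:
    """Hypothetical success/failure counters for one operation in one phase."""
    succeeded: int
    failed: int


def meets_expectation(stats: OpStats, expect_success: bool) -> bool:
    """Return True when the observed outcome matches the expectation."""
    if expect_success:
        # The operation must have run at least once and never failed.
        return stats.succeeded > 0 and stats.failed == 0
    # The operation is expected to be disrupted: at least one failure.
    return stats.failed > 0


# Made-up numbers for a querynode kill: search is disrupted, insert is not.
during_chaos = {"search": OpStats(3, 17), "insert": OpStats(20, 0)}
expected_during_chaos = {"search": False, "insert": True}

all_ok = all(
    meets_expectation(during_chaos[op], expected_during_chaos[op])
    for op in during_chaos
)
```

The same comparison could be repeated for the phase after the pod recovers, with every operation expected to succeed.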

Prerequisite

Chaos tests run in the pytest framework, the same as the e2e tests.

Please refer to Run E2E Tests

Flow Chart

Chaos Test Flow Chart

Test Scenarios

Milvus in cluster mode

pod kill

Kill a pod every 5s
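
For orientation, a pod-kill chaos object of this kind might look roughly like the sketch below, written as a Python dict and dumped to JSON. The field names assume the Chaos Mesh 1.x `PodChaos` resource (newer Chaos Mesh versions express repetition differently), and the name, namespace, and label are placeholders. The actual files in chaos_objects may differ.

```python
import json

# Assumed shape of a Chaos Mesh 1.x PodChaos object; all values are placeholders.
pod_kill_chaos = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {
        "name": "example-querynode-podkill",  # hypothetical name
        "namespace": "chaos-testing",         # hypothetical namespace
    },
    "spec": {
        "action": "pod-kill",
        "mode": "one",  # act on one matching pod at a time
        "selector": {
            "namespaces": ["chaos-testing"],
            "labelSelectors": {"component": "querynode"},  # hypothetical label
        },
        # Assumption: 1.x-style scheduler that repeats the kill every 5 seconds.
        "scheduler": {"cron": "@every 5s"},
    },
}

manifest = json.dumps(pod_kill_chaos, indent=2)
print(manifest)
```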

pod network partition

Two-direction (to and from) network isolation between a pod and the rest of the pods

pod failure

Deploy the pods (querynode, indexnode and datanode) with multiple replicas, make one of the replicas fail, and test Milvus's functionality

pod memory stress

Limit the memory resources of the pods and generate heavy memory stress over a group of pods

Milvus in standalone mode

  1. standalone pod is killed

  2. minio pod is killed

How it works

  • Test scenarios are defined by different chaos objects
  • Every chaos object is defined in one yaml file located in the chaos_objects folder
  • Every chaos yaml file specified by ALL_CHAOS_YAMLS in constants.py is parsed as a parameter and passed into test_chaos.py
  • All expectations for every scenario are defined in testcases.yaml, located in the chaos_objects folder
  • Chaos Mesh is used to inject chaos into Milvus in test_chaos.py
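
The way a glob-style ALL_CHAOS_YAMLS value could select chaos files can be sketched as follows. This is a minimal illustration using `fnmatch` from the standard library, not the actual logic in test_chaos.py. The helper `select_chaos_yamls` and the two network partition file names are hypothetical.

```python
import fnmatch


def select_chaos_yamls(pattern, available):
    """Return the file names that match a glob-style pattern, sorted."""
    return sorted(name for name in available if fnmatch.fnmatch(name, pattern))


# Stand-in for the contents of the chaos_objects folder (partly hypothetical).
available = [
    "chaos_querynode_podkill.yaml",
    "chaos_datanode_network_partition.yaml",
    "chaos_querynode_network_partition.yaml",
]

# A single file name selects exactly one scenario.
single = select_chaos_yamls("chaos_querynode_podkill.yaml", available)

# A wildcard selects every scenario in a category.
category = select_chaos_yamls("chaos_*_network_partition.yaml", available)
```

Each selected file name would then become one test parameter, so a wildcard pattern runs one test per matching chaos object.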

Run

Manually

Run a single test scenario manually (taking the query node pod kill as an example):

  1. update ALL_CHAOS_YAMLS = 'chaos_querynode_podkill.yaml' in constants.py

  2. run the commands below:

```bash
cd /milvus/tests/python_client/chaos

pytest test_chaos.py --host ${Milvus_IP} -v
```

Run multiple test scenarios in a category manually (taking network partition chaos for all pods as an example):

  1. update ALL_CHAOS_YAMLS = 'chaos_*_network_partition.yaml' in constants.py

  2. run the commands below:

```bash
cd /milvus/tests/python_client/chaos

pytest test_chaos.py --host ${Milvus_IP} -v
```

Automation Scripts

Run a test scenario automatically:

  1. update the chaos type and pod in chaos_test.sh
  2. run the commands below:

    cd /milvus/tests/python_client/chaos
    # in this step, the script installs Milvus with replicas_num replicas and runs the testcase
    bash chaos_test.sh ${pod} ${chaos_type} ${chaos_task} ${replicas_num}
    # example: bash chaos_test.sh querynode pod_kill chaos-test 2
    

GitHub Action

Nightly

Still in planning

Todo

  • network attack
  • clock skew
  • IO injection

How to contribute

  • Get familiar with chaos engineering and Chaos Mesh
  • Design chaos scenarios, preferably picking from the todo list
  • Generate a yaml file for your chaos scenario. You can create a chaos experiment in chaos-dashboard, then download its yaml file.
  • Add the yaml file to the chaos_objects dir and rename it to chaos_${component_name}_${chaos_type}.yaml. Make sure kubectl apply -f ${your_chaos_yaml_file} takes effect
  • Add a testcase in testcases.yaml. You should figure out what is expected of Milvus during the chaos
  • Run your added testcase as described in Manually above and check whether it meets your expectation
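
The naming convention from the contribution steps above can be checked with a small helper before adding a file. This is a sketch only. `chaos_yaml_name` and `is_selected` are hypothetical helpers, not part of the repository.

```python
import fnmatch


def chaos_yaml_name(component_name, chaos_type):
    """Build a file name following chaos_${component_name}_${chaos_type}.yaml."""
    return f"chaos_{component_name}_{chaos_type}.yaml"


def is_selected(file_name, all_chaos_yamls):
    """Check whether a glob-style ALL_CHAOS_YAMLS value would match the file."""
    return fnmatch.fnmatch(file_name, all_chaos_yamls)


name = chaos_yaml_name("querynode", "podkill")
picked_up = is_selected(name, "chaos_*_podkill.yaml")
```

Here `name` is `chaos_querynode_podkill.yaml`, the file used in the manual run example, and `picked_up` is true because the name matches the wildcard pattern.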