milvus/tests/python_client/chaos
zhuwenxing dc328679a1
[skip ci]Update chaos readme for running multi cases (#8252)
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2021-09-18 20:09:57 +08:00

Chaos Tests

Goal

Chaos tests are designed to check the reliability of Milvus.

For instance, if one pod is killed:

  • verify that it restarts automatically
  • verify that the operations depending on it fail while the pod is down, and that all other operations keep succeeding
  • verify that all operations succeed again once the pod is back in the running state
  • verify that no data is lost
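The checks above can be pictured as comparing observed operation results with per-phase expectations. The following is a minimal sketch only: the operation names, the `succ`/`fail` labels, and the `mismatches` helper are hypothetical and chosen for illustration; they are not taken from the test framework or from the actual expectations for any pod.

```python
def mismatches(expected, observed):
    """Return {operation: (expected, observed)} for every operation that deviates."""
    return {
        op: (want, observed.get(op))
        for op, want in expected.items()
        if observed.get(op) != want
    }


# Hypothetical expectations for an illustrative scenario (not the real ones).
during_chaos_expected = {
    "insert": "succ",
    "flush": "succ",
    "search": "fail",
    "query": "fail",
}
# After the pod is running again, every operation is expected to succeed.
after_recovery_expected = {op: "succ" for op in during_chaos_expected}

# Made-up observation: search is still failing after recovery.
observed_after_recovery = {
    "insert": "succ",
    "flush": "succ",
    "search": "fail",
    "query": "succ",
}
print(mismatches(after_recovery_expected, observed_after_recovery))
```

An empty result means the phase behaved as expected; any entry points at an operation that needs investigation.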

Prerequisite

Chaos tests run in the pytest framework, the same as the e2e tests.

For setup instructions, please refer to Run E2E Tests

Test Scenarios

Milvus in cluster mode

  1. root coordinator pod is killed

  2. proxy pod is killed

  3. data coordinator pod is killed

  4. data node pod is killed

  5. index coordinator pod is killed

  6. index node pod is killed

  7. query coordinator pod is killed

  8. query node pod is killed

  9. minio pod is killed

Milvus in standalone mode

  1. standalone pod is killed

  2. minio pod is killed

How it works

  • Each test scenario is defined by a chaos object
  • Every chaos object is defined in its own yaml file, located in the chaos_objects folder
  • Every chaos yaml file matched by ALL_CHAOS_YAMLS in constants.py is parsed as a parameter and passed into test_chaos.py
  • The expectations for every scenario are defined in testcases.yaml, located in the chaos_objects folder
  • Chaos Mesh is used to inject chaos into Milvus in test_chaos.py
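To illustrate how a single value of ALL_CHAOS_YAMLS can select one or several scenarios, here is a minimal sketch that assumes shell-style wildcard matching. The `select_chaos_yamls` helper is hypothetical, and every file name in the list other than chaos_querynode_podkill.yaml is made up for the example; the real framework may resolve the pattern differently.

```python
import fnmatch


def select_chaos_yamls(pattern, available):
    """Return the file names in `available` that match the wildcard `pattern`."""
    return sorted(fnmatch.filter(available, pattern))


# Illustrative folder content; only the first name appears in this README.
available = [
    "chaos_querynode_podkill.yaml",
    "chaos_datanode_podkill.yaml",
    "chaos_querynode_network_partition.yaml",
    "chaos_proxy_network_partition.yaml",
    "testcases.yaml",
]

# A literal name selects a single scenario.
print(select_chaos_yamls("chaos_querynode_podkill.yaml", available))
# A wildcard selects every scenario in a category.
print(select_chaos_yamls("chaos_*_network_partition.yaml", available))
```

Each selected file would then become one test parameter, so one pytest invocation covers all matching scenarios.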

Run

Manually

Run a single test scenario manually (taking "query node pod is killed" as an example):

  1. set ALL_CHAOS_YAMLS = 'chaos_querynode_podkill.yaml' in constants.py

  2. run the commands below:

    cd /milvus/tests/python_client/chaos
    
    pytest test_chaos.py --host ${Milvus_IP} -v
    

Run multiple test scenarios in one category manually (taking network partition chaos for all pods as an example):

  1. set ALL_CHAOS_YAMLS = 'chaos_*_network_partition.yaml' in constants.py

  2. run the commands below:

    cd /milvus/tests/python_client/chaos
    
    pytest test_chaos.py --host ${Milvus_IP} -v
    

Nightly

Still in planning

Todo

  • pod_failure
  • container_kill
  • network attack

How to contribute

  • Get familiar with chaos engineering and Chaos Mesh
  • Design chaos scenarios, preferably picking them from the todo list
  • Generate a yaml file for your chaos scenario. You can create a chaos experiment in chaos-dashboard and then download its yaml file.
  • Add the yaml file to the chaos_objects directory and name it chaos_${component_name}_${chaos_type}.yaml. Make sure kubectl apply -f ${your_chaos_yaml_file} takes effect
  • Add a test case to testcases.yaml. You need to work out how Milvus is expected to behave during the chaos
  • Run your new test case as described under Manually above and check whether it behaves as you expect
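Before committing a new chaos object, it can help to confirm that the file name follows the chaos_${component_name}_${chaos_type}.yaml convention. The sketch below is only an illustration: the `parse_chaos_filename` helper is hypothetical, and it assumes that the component name is lowercase and contains no underscore, while the chaos type may contain underscores (as in network_partition).

```python
import re

# Assumed shape: chaos_<component>_<chaos_type>.yaml
_CHAOS_NAME = re.compile(
    r"chaos_(?P<component>[a-z0-9]+)_(?P<chaos_type>[a-z0-9_]+)\.yaml"
)


def parse_chaos_filename(name):
    """Return (component, chaos_type), or None if the name breaks the convention."""
    match = _CHAOS_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group("component"), match.group("chaos_type")


print(parse_chaos_filename("chaos_querynode_podkill.yaml"))
print(parse_chaos_filename("testcases.yaml"))
```

A name that parses cleanly is also picked up by wildcard values of ALL_CHAOS_YAMLS such as 'chaos_*_network_partition.yaml'.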