Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
Chaos Tests
Goal
Chaos tests are designed to check the reliability of Milvus.
For instance, if one pod is killed:
- verify that it restarts automatically
- verify that the related operations fail, while the other operations keep working during the absence of the pod
- verify that all operations work again once the pod is back in the running state
- verify that no data is lost
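To make the second and third checks concrete, a test can track the success rate of each operation over a chaos window and compare it with the expected outcome. The sketch below is a minimal, hypothetical illustration: `OperationChecker`, `meets_expectation`, the `"succ"`/`"fail"` labels and the 0.98 threshold are assumptions made for this example, not the suite's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class OperationChecker:
    """Hypothetical helper: records pass/fail results for one operation type
    (e.g. insert or search) so its success rate can be compared across the
    before/during/after phases of a chaos experiment."""
    name: str
    results: list = field(default_factory=list)

    def record(self, succeeded: bool) -> None:
        self.results.append(succeeded)

    def success_rate(self) -> float:
        # No attempts yet counts as 0.0 instead of dividing by zero.
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def reset(self) -> None:
        # Called between phases, e.g. after the killed pod is running again.
        self.results.clear()


def meets_expectation(checker: OperationChecker, expectation: str,
                      threshold: float = 0.98) -> bool:
    """Assumed convention: 'succ' means the operation should keep working
    during the chaos, 'fail' means it is expected to be disrupted while the
    pod is absent. The threshold value is illustrative only."""
    rate = checker.success_rate()
    if expectation == "succ":
        return rate >= threshold
    if expectation == "fail":
        return rate < threshold
    raise ValueError(f"unknown expectation: {expectation!r}")
```

For example, recording `[True, True, False, True]` for a search checker gives `success_rate() == 0.75`, which satisfies a `"fail"` expectation but not a `"succ"` one. Resetting the checker after the pod recovers lets the same object verify the "works again" check.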
Prerequisite
Chaos tests run in the pytest framework, the same as the e2e tests.
Please refer to Run E2E Tests.
Flow Chart
Test Scenarios
Milvus in cluster mode
pod kill
- root coordinator pod is killed
- proxy pod is killed
- data coordinator pod is killed
- data node pod is killed
- index coordinator pod is killed
- index node pod is killed
- query coordinator pod is killed
- query node pod is killed
- minio pod is killed
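Each pod-kill scenario corresponds to one Chaos Mesh PodChaos object. The sketch below builds such an object as a Python dict. This is a hypothetical helper, and the label keys (`app.kubernetes.io/instance`, `component`) are assumptions about how the Milvus Helm chart labels its pods, so check them with `kubectl get pods --show-labels` before relying on them.

```python
import json


def pod_chaos_manifest(component: str, release: str, namespace: str = "default",
                       action: str = "pod-kill") -> dict:
    """Hypothetical helper: a Chaos Mesh PodChaos object that targets one pod
    of a Milvus component. Label keys below are assumptions, not verified
    against the Milvus Helm chart."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {
            "name": f"test-{component}-{action}",
            "namespace": namespace,
        },
        "spec": {
            # "pod-kill" by default; PodChaos also supports "pod-failure"
            # and "container-kill".
            "action": action,
            "mode": "one",  # inject into a single randomly chosen matching pod
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {
                    "app.kubernetes.io/instance": release,  # assumed label
                    "component": component,                 # assumed label
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl apply -f accepts JSON as well as YAML, so this output can be saved and applied.
    print(json.dumps(pod_chaos_manifest("querynode", "milvus-chaos"), indent=2))
```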
pod network partition
Two-direction (to and from) network isolation between a pod and the rest of the pods.
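A two-direction partition maps to a Chaos Mesh NetworkChaos object with `action: partition` and `direction: both`. The following is a hedged sketch, not one of the suite's yaml files. The helper name and label keys are assumptions, and the `NotIn` expression selector keeps the component's own pods out of the "rest of the pods" side.

```python
def network_partition_manifest(component: str, release: str,
                               namespace: str = "default") -> dict:
    """Hypothetical helper: isolate the pods of one Milvus component from the
    other pods of the same release, in both directions. The label keys are
    assumptions about the Milvus Helm chart."""
    release_label = {"app.kubernetes.io/instance": release}  # assumed label
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {
            "name": f"test-{component}-network-partition",
            "namespace": namespace,
        },
        "spec": {
            "action": "partition",
            "direction": "both",  # block traffic to and from the selected pods
            "mode": "all",
            "selector": {  # side 1: the component under test
                "namespaces": [namespace],
                "labelSelectors": {**release_label, "component": component},
            },
            "target": {  # side 2: every other pod of the release
                "mode": "all",
                "selector": {
                    "namespaces": [namespace],
                    "labelSelectors": release_label,
                    "expressionSelectors": [
                        {"key": "component", "operator": "NotIn",
                         "values": [component]},
                    ],
                },
            },
        },
    }
```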
pod failure
Run the query node, index node and data node pods with multiple replicas, make one replica fail, and test Milvus's functionality.
Milvus in standalone mode
- standalone pod is killed
- minio pod is killed
How it works
- Test scenarios are defined by different chaos objects
- Every chaos object is defined in one yaml file located in the folder chaos_objects
- Every chaos yaml file matched by ALL_CHAOS_YAMLS in constants.py is parsed as a parameter and passed into test_chaos.py
- The expectations of every scenario are defined in testcases.yaml, located in the folder chaos_objects
- Chaos Mesh is used to inject chaos into Milvus in test_chaos.py
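ALL_CHAOS_YAMLS can hold either one file name or a wildcard pattern. The sketch below shows how such a value could be expanded into the list of yaml files that becomes the test parameters. It is a minimal illustration, and `collect_chaos_yamls` is a hypothetical name rather than a function from test_chaos.py.

```python
import glob
import os


def collect_chaos_yamls(pattern: str, folder: str = "chaos_objects") -> list:
    """Hypothetical helper: expand an ALL_CHAOS_YAMLS-style value (a file name
    or a shell-style wildcard) into the sorted list of matching files in the
    chaos objects folder. An empty list usually means a typo in the pattern."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


if __name__ == "__main__":
    import tempfile

    # Demo against a throwaway folder instead of the real chaos_objects dir.
    with tempfile.TemporaryDirectory() as d:
        for name in ["chaos_querynode_podkill.yaml",
                     "chaos_querynode_network_partition.yaml",
                     "chaos_datanode_network_partition.yaml"]:
            open(os.path.join(d, name), "w").close()
        matched = collect_chaos_yamls("chaos_*_network_partition.yaml", d)
        print([os.path.basename(p) for p in matched])
```

The resulting list is the kind of value that can be handed to `pytest.mark.parametrize`, so each matched yaml file runs as its own test case. Sorting keeps the test order stable between runs.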
Run
Manually
Run a single test scenario manually (taking the query node pod kill scenario as an example):
- update ALL_CHAOS_YAMLS = 'chaos_querynode_podkill.yaml' in constants.py
- run the commands below:
    cd /milvus/tests/python_client/chaos
    pytest test_chaos.py --host ${Milvus_IP} -v
Run multiple test scenarios in one category manually (taking network partition chaos for all pods as an example):
- update ALL_CHAOS_YAMLS = 'chaos_*_network_partition.yaml' in constants.py
- run the commands below:
    cd /milvus/tests/python_client/chaos
    pytest test_chaos.py --host ${Milvus_IP} -v
Automation Scripts
Run test scenarios automatically:
- update the chaos type and pod in chaos_test.sh
- run the commands below:
    cd /milvus/tests/python_client/chaos
    # in this step, the script installs Milvus and runs the test cases
    bash chaos_test.sh
GitHub Action
Nightly
Still in planning.
Todo
- pod_failure
- container_kill
- network attack
- memory stress
How to contribute
- Get familiar with chaos engineering and Chaos Mesh
- Design chaos scenarios, preferably picking them from the todo list
- Generate a yaml file for your chaos scenario. You can create a chaos experiment in Chaos Dashboard and then download its yaml file.
- Add the yaml file to the chaos_objects dir and rename it as chaos_${component_name}_${chaos_type}.yaml. Make sure kubectl apply -f ${your_chaos_yaml_file} takes effect.
- Add a test case in testcases.yaml. You should work out the expected behavior of Milvus during the chaos.
- Run your added test case as described in Manually above and check whether it behaves as expected.
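A quick way to catch misnamed files before running anything is to check them against the chaos_${component_name}_${chaos_type}.yaml convention. The sketch below is a hypothetical check, not part of the suite. It assumes component names contain no underscores (e.g. `querynode`), so everything after the component is treated as the chaos type, which allows types such as `network_partition`.

```python
import re

# Assumption: component names are lowercase alphanumerics without underscores,
# so the chaos type is everything between the component and ".yaml".
CHAOS_FILE_RE = re.compile(
    r"^chaos_(?P<component>[a-z0-9]+)_(?P<chaos_type>[a-z0-9_]+)\.yaml$"
)


def parse_chaos_filename(filename: str) -> tuple:
    """Hypothetical helper: split chaos_${component_name}_${chaos_type}.yaml
    into (component, chaos_type), raising ValueError for names that do not
    follow the convention."""
    match = CHAOS_FILE_RE.match(filename)
    if match is None:
        raise ValueError(
            f"{filename!r} does not match chaos_<component>_<chaos_type>.yaml"
        )
    return match.group("component"), match.group("chaos_type")
```

For instance, `parse_chaos_filename("chaos_datanode_network_partition.yaml")` returns `("datanode", "network_partition")`, while a `.yml` extension or a missing `chaos_` prefix raises `ValueError`.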