mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-01 03:18:29 +08:00

chaos_objects	[skip ci] Test chaos memory stress indexnode (#10852 )	2021-10-28 20:30:49 +08:00
scripts	[skip ci]Update install script (#10848 )	2021-10-28 20:06:58 +08:00
chao_test.sh	[skip ci]Add parameter passing to shell scripts (#10600 )	2021-10-25 20:34:23 +08:00
chaos_commons.py	[skip ci]Add comments in chaos commons (#9538 )	2021-10-09 18:09:55 +08:00
checker.py	[skip ci]Add comments in checker (#9345 )	2021-10-06 20:53:03 +08:00
cluster-values.yaml	[skip ci]Update helm deploy param in chaos test (#9574 )	2021-10-09 18:18:57 +08:00
constants.py	[skip ci]Fix pod failure chaos github action (#10208 )	2021-10-19 21:08:44 +08:00
README.md	[skip ci]Update readme of chaos test (#10783 )	2021-10-27 21:10:25 +08:00
test_chaos_data_consist.py	[skip ci]Add comments for chaos_data_consist test (#9380 )	2021-10-07 20:35:14 +08:00
test_chaos_memory_stress.py	[skip ci] Test chaos memory stress indexnode (#10852 )	2021-10-28 20:30:49 +08:00
test_chaos.py	[skip ci]Add log info after chaos applied (#10851 )	2021-10-28 20:26:54 +08:00

README.md

Chaos Tests

Goal

Chaos tests are designed to check the reliability of Milvus.

For instance, if one pod is killed:

verify that it restarts automatically
verify that the related operation fails while the other operations keep working during the absence of the pod
verify that all operations work successfully after the pod returns to the running state
verify that no data is lost
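
The checks above can be pictured as comparing observed operation results against an expectation for each phase. The following is a minimal sketch only: the operation names and the observed results are hypothetical, and the real checkers in checker.py may work differently.

```python
from typing import Dict, Optional

def unexpected_results(
    expected: Dict[str, bool], observed: Dict[str, bool]
) -> Dict[str, Optional[bool]]:
    """Return the operations whose observed success differs from the expectation."""
    return {
        op: observed.get(op)
        for op, ok in expected.items()
        if observed.get(op) != ok
    }

# Hypothetical expectation while the pod is absent:
# only the operation related to the killed pod may fail.
during_chaos_expected = {"create": True, "insert": True, "search": False}
# Once the pod is running again, every operation must succeed.
after_recovery_expected = {op: True for op in during_chaos_expected}

# Hypothetical observations, hard-coded for illustration.
observed_during = {"create": True, "insert": True, "search": False}
observed_after = {"create": True, "insert": True, "search": False}

# An empty dict means the phase behaved as expected.
print(unexpected_results(during_chaos_expected, observed_during))
# A non-empty dict lists the operations that violate the expectation.
print(unexpected_results(after_recovery_expected, observed_after))
```

In this made-up run the chaos phase matches the expectation, while the recovery phase reports search as still failing, which is the kind of result a chaos test should flag.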

Prerequisite

Chaos tests run in the pytest framework, the same as the e2e tests.

Please refer to Run E2E Tests

Test Scenarios

Milvus in cluster mode

pod kill

root coordinator pod is killed
proxy pod is killed
data coordinator pod is killed
data node pod is killed
index coordinator pod is killed
index node pod is killed
query coordinator pod is killed
query node pod is killed
minio pod is killed
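
As an illustration of what a pod kill scenario might look like as a Chaos Mesh object, the sketch below builds a PodChaos manifest as a Python dict and prints it as JSON, which kubectl also accepts. The object name, the namespace and the component label selector are assumptions made for the example and are not taken from the files in chaos_objects.

```python
import json

def pod_kill_manifest(component: str, namespace: str = "chaos-testing") -> dict:
    """Build a Chaos Mesh PodChaos object that kills one pod of a component."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {
            # Hypothetical object name.
            "name": f"test-{component}-podkill",
            "namespace": namespace,
        },
        "spec": {
            "action": "pod-kill",
            # Kill a single pod among those matched by the selector.
            "mode": "one",
            "selector": {
                "namespaces": [namespace],
                # Assumed label; use whatever labels your deployment sets.
                "labelSelectors": {"component": component},
            },
        },
    }

print(json.dumps(pod_kill_manifest("querynode"), indent=2))
```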

pod network partition

two-direction (to and from) network isolation between a pod and the rest of the pods
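
A two-direction partition could be expressed as a Chaos Mesh NetworkChaos object along the following lines. This is a sketch under assumptions: the object name, namespace and label are hypothetical, and the target selector is simplified to the whole namespace instead of strictly "the rest of the pods".

```python
import json

def network_partition_manifest(component: str, namespace: str = "chaos-testing") -> dict:
    """Build a NetworkChaos object that isolates one component in both directions."""
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {
            # Hypothetical object name.
            "name": f"test-{component}-network-partition",
            "namespace": namespace,
        },
        "spec": {
            "action": "partition",
            "mode": "all",
            # Source side of the partition: the pods of one component (assumed label).
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"component": component},
            },
            # "both" blocks traffic to and from the selected pods.
            "direction": "both",
            # Target side, simplified here to every pod in the namespace.
            "target": {
                "mode": "all",
                "selector": {"namespaces": [namespace]},
            },
        },
    }

print(json.dumps(network_partition_manifest("datanode"), indent=2))
```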

pod failure

Deploy the pods (querynode, indexnode and datanode) with multiple replicas, make one replica fail, and test Milvus's functionality

Milvus in standalone mode

standalone pod is killed
minio pod is killed

How it works

Test scenarios are defined by different chaos objects
Every chaos object is defined in one yaml file located in the chaos_objects folder
Every chaos yaml file specified by ALL_CHAOS_YAMLS in constants.py is parsed as a parameter and passed into test_chaos.py
All expectations for every scenario are defined in testcases.yaml, located in the chaos_objects folder
Chaos Mesh is used to inject chaos into Milvus in test_chaos.py
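
To illustrate how a value such as ALL_CHAOS_YAMLS might turn into a list of test parameters, the sketch below expands a file name or wildcard pattern with the standard glob module. The helper name expand_chaos_yamls and the demo file names other than chaos_querynode_podkill.yaml are hypothetical, and the real parsing code in the repository may differ.

```python
import glob
import os
import tempfile

def expand_chaos_yamls(pattern: str, chaos_dir: str) -> list:
    """Return the sorted chaos yaml paths in chaos_dir that match pattern."""
    return sorted(glob.glob(os.path.join(chaos_dir, pattern)))

# Demo on a throwaway directory standing in for chaos_objects.
demo_files = (
    "chaos_querynode_podkill.yaml",
    "chaos_datanode_network_partition.yaml",  # hypothetical file name
    "chaos_proxy_network_partition.yaml",     # hypothetical file name
    "testcases.yaml",
)
with tempfile.TemporaryDirectory() as chaos_dir:
    for name in demo_files:
        open(os.path.join(chaos_dir, name), "w").close()

    # A plain file name selects exactly one scenario.
    single = [
        os.path.basename(p)
        for p in expand_chaos_yamls("chaos_querynode_podkill.yaml", chaos_dir)
    ]
    # A wildcard selects every scenario in one category.
    category = [
        os.path.basename(p)
        for p in expand_chaos_yamls("chaos_*_network_partition.yaml", chaos_dir)
    ]

print(single)
print(category)
```

A list like this could then be handed to pytest.mark.parametrize so that each yaml file becomes one test case.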

Run

Manually

Run a single test scenario manually (taking "query node pod is killed" as an example):

update ALL_CHAOS_YAMLS = 'chaos_querynode_podkill.yaml' in constants.py

run the commands below:

cd /milvus/tests/python_client/chaos

pytest test_chaos.py --host ${Milvus_IP} -v

Run multiple test scenarios in a category manually (taking network partition chaos for all pods as an example):

update ALL_CHAOS_YAMLS = 'chaos_*_network_partition.yaml' in constants.py

run the commands below:

cd /milvus/tests/python_client/chaos

pytest test_chaos.py --host ${Milvus_IP} -v

Automation Scripts

Run test scenario automatically:

update chaos type and pod in chaos_test.sh

run the commands below:

cd /milvus/tests/python_client/chaos
# in this step, script will install milvus and run testcase
bash chaos_test.sh

GitHub Action

Nightly

still in planning

Todo

pod_failure
container_kill
network attack
memory stress

How to contribute

Get familiar with chaos engineering and Chaos Mesh
Design chaos scenarios, preferably picked from the Todo list
Generate a yaml file for your chaos scenario. You can create a chaos experiment in chaos-dashboard, then download its yaml file.
Add the yaml file to the chaos_objects directory and name it chaos_${component_name}_${chaos_type}.yaml. Make sure kubectl apply -f ${your_chaos_yaml_file} takes effect
Add a testcase in testcases.yaml. You need to work out how Milvus is expected to behave during the chaos
Run your added testcase as described in the Manually section above and check whether it behaves as you expect
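
The naming rule for new chaos files can be checked with a small helper before opening a pull request. This is a sketch only: both function names are hypothetical, and the allowed character set in the regular expression is an assumption, since the README states only the chaos_${component_name}_${chaos_type}.yaml shape.

```python
import re

# Assumed character set: lowercase letters, digits and underscores.
NAME_PATTERN = re.compile(r"^chaos_[a-z0-9]+_[a-z0-9_]+\.yaml$")

def chaos_yaml_name(component_name: str, chaos_type: str) -> str:
    """Build a file name of the form chaos_${component_name}_${chaos_type}.yaml."""
    return f"chaos_{component_name}_{chaos_type}.yaml"

def follows_convention(filename: str) -> bool:
    """Check whether a file name matches the assumed naming pattern."""
    return NAME_PATTERN.match(filename) is not None

print(chaos_yaml_name("indexnode", "memory_stress"))
print(follows_convention("chaos_querynode_podkill.yaml"))
print(follows_convention("querynode_podkill.yaml"))
```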
