milvus/tests/benchmark/README.md

`milvus_benchmark` is a non-functional testing tool (or service) that lets users run tests on a k8s cluster or locally. Its primary use cases are performance, load, and stability testing, and its objective is to expose problems in the milvus project.

## Quick start

### Description

- Test cases in `milvus_benchmark` are organized in `yaml` files
- Tests can run in local mode or helm mode
   - local: install and start your local server, and pass the host/port params when starting the tests
   - helm: install the server with helm, which manages milvus in a k8s cluster; you can integrate the test stage into an argo workflow or a jenkins pipeline
   
### Usage

-  Using jenkins:
   Use `ci/main_jenkinsfile` as the jenkins pipeline file
-  Using argo:
   Example argo workflow yaml configuration: `ci/argo.yaml`

   The client environment can be found in file `Dockerfile`

-  Local test:

   1. Set PYTHONPATH:
   
      ```bash
      $ export PYTHONPATH=/your/project/path/milvus_benchmark
      ```
   
   2. Prepare data: 
   
      If you need the sift/deep dataset as the raw data input, mount the NAS and update `RAW_DATA_DIR` in `config.py`. An example mount command:
   
      ```bash
      $ sudo mount -t cifs -o username=test,vers=1.0 //172.16.70.249/test /test
      ```
   
   3. Install requirements:
   
      ```bash
      $ pip install -r requirements.txt
      ```
   
   4. Install the [Python-SDK for milvus](https://github.com/milvus-io/pymilvus).
   
   5. Write a test yaml and pass it with the `--suite` param:
   
      ```bash
      $ cd milvus_benchmark/ && python main.py --local --host=* --port=19530 --suite=suites/2_insert_data.yaml
      ```

### Test suite

#### Description

A test suite yaml defines the test process. To add a customized test to the current test framework, add a new test suite yaml.

#### Example

Take the test file `2_insert_data.yaml` as an example:
```yaml
insert_performance:
  collections:
     -
       milvus:
         db_config.primary_path: /test/milvus/db_data_2/cluster/sift_1m_128_l2
         wal_enable: true
       collection_name: sift_1m_128_l2
       ni_per: 50000
       build_index: false
       index_type: ivf_sq8
       index_param:
         nlist: 1024
```
- `insert_performance`

   The top level is the runner type. Other test types include `search_performance/build_performance/insert_performance/accuracy/locust_insert/...`; each test type corresponds to a different runner component defined in the `runners` directory

- other fields under runner type

   The other parts of the test yaml are the params passed to the runner, such as:
   - The field `collection_name` specifies which kind of collection will be created in milvus
   - The field `ni_per` specifies the batch size
   - The field `build_index` specifies whether to create an index during inserting
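
As a rough illustration of how the top-level runner type and the params under it relate, the sketch below mirrors `2_insert_data.yaml` as a plain Python dict (so no YAML library is needed) and splits it into the runner type and the per-collection params. The helper name `split_suite` is hypothetical and is not part of the benchmark code.

```python
# Hypothetical sketch: the suite is written as a Python dict that mirrors
# 2_insert_data.yaml; the real benchmark reads the yaml file instead.
suite = {
    "insert_performance": {
        "collections": [
            {
                "milvus": {
                    "db_config.primary_path": "/test/milvus/db_data_2/cluster/sift_1m_128_l2",
                    "wal_enable": True,
                },
                "collection_name": "sift_1m_128_l2",
                "ni_per": 50000,
                "build_index": False,
                "index_type": "ivf_sq8",
                "index_param": {"nlist": 1024},
            }
        ]
    }
}


def split_suite(suite):
    """Return (runner_type, list of per-collection params)."""
    if len(suite) != 1:
        raise ValueError("expected exactly one top-level runner type")
    runner_type, body = next(iter(suite.items()))
    return runner_type, body["collections"]


runner_type, collections = split_suite(suite)
for params in collections:
    # The runner type selects the runner; the remaining fields are its params.
    print(runner_type, params["collection_name"], params["ni_per"], params["build_index"])
```
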

When argo workflow is used as the benchmark pipeline, the test suite is made up of both a `client` and a `server` configmap. An example:

`server`
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: server-cluster-8c16m
  namespace: qa
  uid: 3752f85c-c840-40c6-a5db-ae44146ad8b5
  resourceVersion: '42213135'
  creationTimestamp: '2021-05-14T07:00:53Z'
  managedFields:
    - manager: dashboard
      operation: Update
      apiVersion: v1
      time: '2021-05-14T07:00:53Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:config.yaml': {}
data:
  config.yaml: |
    server:
      server_tag: "8c16m"
    milvus:
      deploy_mode: "cluster"
```
`client`
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: client-insert-batch-1000
  namespace: qa
  uid: 8604c277-f00f-47c7-8fcb-9b3bc97efa74
  resourceVersion: '42988547'
  creationTimestamp: '2021-07-09T08:33:02Z'
  managedFields:
    - manager: dashboard
      operation: Update
      apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:config.yaml': {}
data:
  config.yaml: |
    insert_performance:
      collections:
        - 
          milvus:
            wal_enable: true
          collection_name: sift_1m_128_l2
          ni_per: 1000
          build_index: false
          index_type: ivf_sq8
          index_param:
            nlist: 1024
```

### How to prepare data

#### Source data

The benchmark provides several kinds of source data:
1. Insert from `local`: randomly generated vectors
2. Insert from a file: other data types such as `sift/deep`. The following table shows where the source data comes from. Make sure to convert it to the `.npy` file format, which can be loaded by `numpy`, and update the value of `RAW_DATA_DIR` in `config.py` to your own data path

| data type | sift                           | deep                                        |
| --------- | ------------------------------ | ------------------------------------------- |
| url       | http://corpus-texmex.irisa.fr/ | https://github.com/erikbern/ann-benchmarks/ |

Many other optional datasets can also be used to test milvus; see http://big-ann-benchmarks.com/index.html

If the first few characters of the `collection_name` in the test suite yaml match one of the above types, the corresponding data will be created when inserting entities into milvus

Also, if running with the `ann_accuracy` runner type, you should provide the source data file path in the field `source_file`. The source datasets can be found at https://github.com/erikbern/ann-benchmarks/; `SIFT/Kosarak/GloVe-200` are the datasets frequently used in regression testing for milvus
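
The prefix matching on `collection_name` described above could look roughly like the following. This is only a sketch: the function name `data_source_for` is hypothetical, and falling back to `local` (randomly generated vectors) for any unmatched name is an assumption, not confirmed behavior of the benchmark.

```python
# Data types that are read from files under RAW_DATA_DIR, per the table above.
FILE_DATA_TYPES = ("sift", "deep")


def data_source_for(collection_name):
    """Pick the data source from the leading characters of the collection name."""
    for data_type in FILE_DATA_TYPES:
        if collection_name.startswith(data_type):
            return data_type
    # Assumption: anything else is treated as locally generated random vectors.
    return "local"


for name in ("sift_1m_128_l2", "deep_10m_96_ip", "local_1m_128_l2"):
    print(name, "->", data_source_for(name))
```
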

## Overview of the benchmark

### Components

- `main.py`
  
   The entry file: parses the input params and initializes the other components: `metric`, `env`, `runner`

- `metric`

   The test results can be used to analyze regressions or improvements in the milvus system, so we upload the metrics of the test result when a test suite run finishes, and then use `redash` to make sense of the data

- `db`

   Currently we use `mongodb` to store the test results

- `env`

   The `env` component defines the server environment and its management; the `env` instance corresponds to the run mode of the benchmark
   
   - `local`: Only defines the host and port for testing

   - `helm/docker`: Installs and uninstalls the server in the benchmark stage

- `runner`

   The actual executor in the benchmark. Each test type defined in a test suite generates the corresponding runner instance. There are three stages in `runner`:
   
   - `extract_cases`: Several test cases are defined in each test suite yaml. The cases share the same server environment and the same `prepare` stage, but the `metric` for each case is different, so the cases need to be extracted from the test suite before they run

   - `prepare`: Prepares the data and operations; for example, before running a search, the index needs to be created and the data needs to be loaded

   - `run_case`: Performs the core operation and sets the `metric` value

- `suites`: There are two ways to pass the content to be tested as input parameters:
   - Test suite files under the `suites` directory
   - Test suite configmap names, including `server_config_map` and `client_config_map`, if using argo workflow

- `update.py`: When argo workflow is used as the benchmark pipeline, the workflow template has two steps: `install-milvus` and `client-test`
   - In stage `install-milvus`, `update.py` generates a new `values.yaml`, which is passed as a param to the `helm install` operation
   - In stage `client-test`, it runs `main.py` with the run mode `local` and receives the milvus host and port as the cmd params
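
To make the three `runner` stages described above more concrete, here is a minimal sketch. The class `RunnerSketch`, its method signatures, the `batches` metric, and the row total of 1,000,000 are all hypothetical and chosen only for illustration; the real runners live in the `runners` directory and talk to a milvus server.

```python
class RunnerSketch:
    """Hypothetical runner showing extract_cases -> prepare -> run_case."""

    def __init__(self, runner_type):
        self.runner_type = runner_type
        self.prepared = False

    def extract_cases(self, suite_body):
        """Turn each entry under `collections` into one case with its own metric."""
        cases = []
        for params in suite_body["collections"]:
            metric = {"type": self.runner_type, "value": {}}
            cases.append((params, metric))
        return cases

    def prepare(self, params):
        """Shared setup for all cases (placeholder: no server is contacted)."""
        self.prepared = True

    def run_case(self, params, metric):
        """Do the core operation and set the metric value."""
        if not self.prepared:
            raise RuntimeError("prepare() must run before run_case()")
        total_rows = 1_000_000  # assumed dataset size, for illustration only
        # Ceiling division: number of insert batches needed for total_rows.
        metric["value"]["batches"] = -(-total_rows // params["ni_per"])
        return metric


runner = RunnerSketch("insert_performance")
body = {
    "collections": [
        {"collection_name": "sift_1m_128_l2", "ni_per": 50000},
        {"collection_name": "sift_1m_128_l2", "ni_per": 1000},
    ]
}
cases = runner.extract_cases(body)
runner.prepare(cases[0][0])  # the cases share one prepare stage
results = [runner.run_case(params, metric) for params, metric in cases]
print(results)
```
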

### Conceptual overview 

   The following diagram shows the runtime execution graph of the benchmark (local mode based on argo workflow)

   <img src="assets/uml.jpg" />

## Test report

### Metrics

As mentioned in the section above, we collect the test metrics after a test case run finishes. Here are the main metric fields:
```
run_id      : each test suite will generate a run_id
mode        : run mode such as local
server      : describe server resource and server version
hardware    : server host
env         : server config
status      : run result
err_message : error msg when run failed
collection  : collection info
index       : index type and index params
search      : search params
run_params  : extra run params
metrics     : metric type and metric value
```
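
One way to picture such a metric record is as a small Python data class that is converted to a dict before being stored. The class name `MetricSketch`, the field types, the default values, and the status strings below are assumptions made for illustration; only the field names come from the list above.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MetricSketch:
    """Hypothetical container mirroring the metric fields listed above."""

    run_id: str                                     # one run_id per test suite
    mode: str = "local"                             # run mode
    server: dict = field(default_factory=dict)      # server resource and version
    hardware: str = ""                              # server host
    env: dict = field(default_factory=dict)         # server config
    status: str = "INIT"                            # run result (value names are assumed)
    err_message: str = ""                           # error msg when the run failed
    collection: dict = field(default_factory=dict)  # collection info
    index: dict = field(default_factory=dict)       # index type and index params
    search: dict = field(default_factory=dict)      # search params
    run_params: dict = field(default_factory=dict)  # extra run params
    metrics: dict = field(default_factory=dict)     # metric type and metric value


record = MetricSketch(run_id="example-run")
record.collection = {"dataset_name": "sift_1m_128_l2", "ni_per": 50000}
record.index = {"index_type": "ivf_sq8", "index_param": {"nlist": 1024}}
record.status = "FAILED"
record.err_message = "connection refused"

document = asdict(record)  # plain dict, ready to be stored as a document
print(sorted(document))
```
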

### How to visualize test result

Since the metrics are uploaded to the db (we currently use MongoDB), we suggest using Redash (https://redash.io/) to visualize the test results.

For example, to find the most suitable insert batch size when preparing data with milvus, a benchmark test suite of type `bp_insert_performance` runs regularly. The different `ni_per` values in this suite yaml are executed, and the average response time and TPS (number of rows inserted per second) are collected.

The query expression:
```json
{
    "collection": "doc",
    "query": {
        "metrics.type": "bp_insert_performance",
        "collection.dataset_name": "sift_1m_128_l2",
        "_type": "case",
        "server.value.mode": "single"
    },
    "fields": {
        "metrics.value.rps": 1,
        "datetime": 4,
        "run_id": 5,
        "server.value.mode": 6,
        "collection.ni_per": 7,
        "metrics.value.ni_time": 8
    },
    "sort": [{
        "name": "run_id",
        "direction": -1
    }],
    "limit": 28
}
```

After executing the above query, we get its chart:

 <img src="assets/dash.png" />

In this chart, we can see an improvement from 2.0.0-RC3 to 2.0.0-RC5.
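
Outside Redash, the same kind of comparison can be done in a few lines of Python once the query rows are exported. The sketch below averages `rps` per `ni_per` and picks the batch size with the highest average. The row layout is simplified, the function name `best_batch_size` is hypothetical, and the numbers are made-up placeholders, not real benchmark results.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder rows, NOT real benchmark results.
rows = [
    {"ni_per": 1000, "rps": 20000.0},
    {"ni_per": 1000, "rps": 22000.0},
    {"ni_per": 50000, "rps": 60000.0},
    {"ni_per": 50000, "rps": 64000.0},
]


def best_batch_size(rows):
    """Return (ni_per with the highest average rps, average rps per ni_per)."""
    by_ni_per = defaultdict(list)
    for row in rows:
        by_ni_per[row["ni_per"]].append(row["rps"])
    averages = {ni_per: mean(values) for ni_per, values in by_ni_per.items()}
    best = max(averages, key=averages.get)
    return best, averages


best, averages = best_batch_size(rows)
print(best, averages)
```
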
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								The milvus_benchmark is a non-functional testing tool or service which allows users to run tests on k8s cluster or at local, the primary use case is performance/load/stability testing, the objective is to expose problems in milvus project.
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
 								## Quick start
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								### Description
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
+								- Test cases in `milvus_benchmark` can be organized with `yaml`
 								- Test can run with local mode or helm mode
 								   - local: install and start your local server, and pass the host/port param when start the tests
-												[skip ci] Fix wrong spelling (#10176)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-19 16:42:54 +08:00
+								   - helm: install the server by helm, which will manage the milvus in k8s cluster, and you can integrate the test stage into argo workflow or jenkins pipeline
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
 								### Usage
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								-  Using jenkins:
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
+								   Use `ci/main_jenkinsfile` as the jenkins pipeline file
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								-  Using argo：
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+								   Example argo workflow yaml configuration: `ci/argo.yaml`
-												[skip ci] Update readme and requirements.txt in milvus_benchmark (#7205)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-21 10:02:12 +08:00
 								   The client environment can be found in file `Dockerfile`
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								-  Local test：
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+. Set PYTHONPATH:
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								      ```bash
 								      $ export PYTHONPATH=/your/project/path/milvus_benchmark
 								      ```
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+. Prepare data:
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] Refine benchmark readme (#10434)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-22 16:29:13 +08:00
+								      if we need to use the sift/deep dataset as the raw data input, then mount NAS and update `RAW_DATA_DIR` in `config.py`, the example mount command:
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
-												[skip ci]Update README (#11820)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-11-15 19:42:08 +08:00
+								      ```bash
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								      $ sudo mount -t cifs -o username=test,vers=1.0 //172.16.70.249/test /test
-												[skip ci]Update README (#11820)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-11-15 19:42:08 +08:00
+								      ```
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+. Install requirements:
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
 								      ```bash
 								      $ pip install -r requirements.txt
 								      ```
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+. Install the [Python-SDK for milvus](https://github.com/milvus-io/pymilvus).
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
-												Add link to python-sdk (#8055)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-09-17 09:07:49 +08:00
+. Write test yaml and run with the yaml param:
-												[skip ci] Update benchmark readme (#6765)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-07-23 15:36:12 +08:00
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								      ```bash
 								      $ cd milvus_benchmark/ && python main.py --local --host=* --port=19530 --suite=suites/2_insert_data.yaml
 								      ```
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								### Test suite
 								#### Description
 								Test suite yaml defines the test process, users need to add test suite yaml if adding a customized test into the current test framework.
 								#### Example
 								Take the test file `2_insert_data.yaml` as an example
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								```yaml
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								insert_performance:
 								  collections:
 								     -
 								       milvus:
 								         db_config.primary_path: /test/milvus/db_data_2/cluster/sift_1m_128_l2
 								         wal_enable: true
 								       collection_name: sift_1m_128_l2
 								       ni_per: 50000
 								       build_index: false
 								       index_type: ivf_sq8
 								       index_param:
 								         nlist: 1024
 								```
 								- `insert_performance`
-												[skip ci] Fix wrong spelling (#10176)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-19 16:42:54 +08:00
+								   The top level is the runner type: the other test types including: `search_performance/build_performance/insert_performance/accuracy/locust_insert/...`, each test type corresponds to the different runner component defined in directory `runnners`
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
 								- other fields under runner type
 								   The other parts in the test yaml is the params pass to the runner, such as:
 								   - The field `collection_name` means which kind of collection will be created in milvus
 								   - The field `ni_per` means the batch size
 								   - The filed `build_index` means that whether to create index during inserting
-												[skip ci] Improve readme description (#10931)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-29 21:48:41 +08:00
+								While using argo workflow as benchmark pipeline, the test suite is made of both `client` and `server` configmap, an example:
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
 								`server`
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								```yaml
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								kind: ConfigMap
 								apiVersion: v1
 								metadata:
 								  name: server-cluster-8c16m
 								  namespace: qa
 								  uid: 3752f85c-c840-40c6-a5db-ae44146ad8b5
 								  resourceVersion: '42213135'
 								  creationTimestamp: '2021-05-14T07:00:53Z'
 								  managedFields:
 								    - manager: dashboard
 								      operation: Update
 								      apiVersion: v1
 								      time: '2021-05-14T07:00:53Z'
 								      fieldsType: FieldsV1
 								      fieldsV1:
 								        'f:data':
 								          .: {}
 								          'f:config.yaml': {}
 								data:
 								  config.yaml: |
 								    server:
 								      server_tag: "8c16m"
 								    milvus:
 								      deploy_mode: "cluster"
 								```
 								`client`
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								```yaml
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								kind: ConfigMap
 								apiVersion: v1
 								metadata:
 								  name: client-insert-batch-1000
 								  namespace: qa
 								  uid: 8604c277-f00f-47c7-8fcb-9b3bc97efa74
 								  resourceVersion: '42988547'
 								  creationTimestamp: '2021-07-09T08:33:02Z'
 								  managedFields:
 								    - manager: dashboard
 								      operation: Update
 								      apiVersion: v1
 								      fieldsType: FieldsV1
 								      fieldsV1:
 								        'f:data':
 								          .: {}
 								          'f:config.yaml': {}
 								data:
 								  config.yaml: |
 								    insert_performance:
 								      collections:
 								        -
 								          milvus:
 								            wal_enable: true
 								          collection_name: sift_1m_128_l2
 								          ni_per: 1000
 								          build_index: false
 								          index_type: ivf_sq8
 								          index_param:
 								            nlist: 1024
 								```
-												[skip ci] Update readme and requirements.txt in milvus_benchmark (#7205)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-21 10:02:12 +08:00
+								### How to prepare data
 								#### Source data
 								There are several kinds of data types provided in benchmark:
 . Insert from `local`: random generated vectors
 . Insert from the file: the other data type such as `sift/deep`, the following list shows where the source data comes from, make sure to convert to `.npy` file format that can be loaded by `numpy`, and update the value of `RAW_DATA_DIR` in `config.py` to your own data path
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
+								| data type | sift                           | deep                                        |
 								| --------- | ------------------------------ | ------------------------------------------- |
 								| url       | http://corpus-texmex.irisa.fr/ | https://github.com/erikbern/ann-benchmarks/ |
-												[skip ci] Update readme and requirements.txt in milvus_benchmark (#7205)

Signed-off-by: zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-21 10:02:12 +08:00
 								There are also many optional datasets could be used to test milvus, here is the reference: http://big-ann-benchmarks.com/index.html
 								If the first few characters in the `collection_name` in test suite yaml are matched with the above type, the corresponding data will be created during inserting entities in milvus
 								Also, you should provide the field value of the source data file path `source_file` if running with `ann_accuracy` runner type, the source datasets could be found from https://github.com/erikbern/ann-benchmarks/, `SIFT/Kosarak/GloVe-200` are the datasets which are frequently used in regression testing for milvus
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								## Overview of the benchmark
-												[skip ci] Fix wrong spelling (#10176)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-19 16:42:54 +08:00
+								### Components
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
 								- `main.py`
-												[skip ci]Update the content in markdowm format (#10070)

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
											
										
										
											2021-10-18 18:12:47 +08:00
-												[skip ci] Refine benchmark readme (#10434)

Signed-off-by: Binbin Lv <binbin.lv@zilliz.com>
											
										
										
											2021-10-22 16:29:13 +08:00
+								   The entry file: parse the input params and initialize the other components: `metric`, `env`, `runner`
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
 								- `metric`
 								   The test result can be used to analyze the regression or improvement of the milvus system, so we upload the metrics of the test result when a test suite run finished, and then use `redash` to make sense of our data
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								- `db`
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								   Currently we use the `mongodb` to store the test result
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00
+								- `env`
-												Bench scripts for 2.0 (#6263)

* [skip ci] update benchmark scripts for 2.0

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update README.md

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>

* [skip ci] Update mergify.yml for bench scripts

Signed-off-by: zhenwu <zhenwu@milvus.io>

Co-authored-by: zhenwu <zhenwu@milvus.io>
											
										
										
											2021-07-02 11:40:16 +08:00
-												[skip ci] add argo.yaml and update readme for benchmark (#7094)

Signed-off-by: del-zhenwu <zhenxiang.li@zilliz.com>
											
										
										
											2021-08-16 11:06:10 +08:00

   The `env` component defines the server environment and how it is managed. Each `env` instance corresponds to a run mode of the benchmark:

   - `local`: Only defines the host and port used for testing
   - `helm/docker`: Installs and uninstalls the server during the benchmark stage

- `runner`

   The actual executor of the benchmark. Each test type defined in a test suite generates a corresponding runner instance. A `runner` has three stages:

   - `extract_cases`: Each test suite yaml defines several test cases. They share the same server environment and the same `prepare` stage, but the `metric` of each case is different, so the cases must be extracted from the test suite before they run
   - `prepare`: Prepares the data and operations. For example, before running a search, the index must be created and the data must be loaded
   - `run_case`: Performs the core operation and sets the `metric` value

- `suites`: There are two ways to pass the content to be tested as input parameters:

   - Test suite files under the `suites` directory
   - Test suite configmap names, including `server_config_map` and `client_config_map`, when using argo workflow

- `update.py`: When argo workflow is used as the benchmark pipeline, the workflow template has two steps: `install-milvus` and `client-test`

   - In the `install-milvus` stage, `update.py` generates a new `values.yaml`, which is passed as a param to the `helm install` operation
   - In the `client-test` stage, `main.py` runs in `local` mode and receives the milvus host and port as cmd params
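
The three `runner` stages described above can be illustrated with a minimal sketch. This is not the actual `milvus_benchmark` code: the class `RunnerSketch`, the `Case` type, the suite keys `collection_name` and `ni_per`, and the `elapsed_s` metric name are all assumptions made for illustration, and the core operation is passed in as a plain callable instead of a real client call.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One test case extracted from a suite (field names are illustrative)."""
    collection_name: str
    ni_per: int


class RunnerSketch:
    """Hypothetical runner showing the three stages; not the project's code."""

    def __init__(self, metric_type: str):
        self.metric_type = metric_type
        self.prepared = False

    def extract_cases(self, suite: dict) -> list:
        # Every entry under "collections" becomes one case.
        return [
            Case(entry["collection_name"], entry["ni_per"])
            for entry in suite["collections"]
        ]

    def prepare(self, suite: dict) -> None:
        # Shared by all cases. A real runner would load data or build
        # indexes here; the sketch only records that the stage ran.
        self.prepared = True

    def run_case(self, case: Case, operation: Callable[[Case], None]) -> dict:
        # Time the core operation and store the result as the metric value.
        start = time.perf_counter()
        operation(case)
        elapsed = time.perf_counter() - start
        return {"type": self.metric_type, "value": {"elapsed_s": elapsed}}


def run_suite(suite: dict, operation: Callable[[Case], None]) -> list:
    runner = RunnerSketch("insert_performance")
    cases = runner.extract_cases(suite)
    runner.prepare(suite)
    return [runner.run_case(case, operation) for case in cases]


# Usage with a no-op operation standing in for a real insert call.
suite = {"collections": [
    {"collection_name": "sift_1m_128_l2", "ni_per": 50000},
    {"collection_name": "sift_1m_128_l2", "ni_per": 100000},
]}
results = run_suite(suite, operation=lambda case: None)
```

Each case gets its own metric, while `prepare` runs once for the whole suite, which matches the description of cases sharing one `prepare` stage.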

### Conceptual overview

The following diagram shows the runtime execution graph of the benchmark (local mode based on argo workflow):

<img src="assets/uml.jpg" />

## Test report

### Metrics

As mentioned in the section above, the test metrics are collected after each test case finishes. The main metric fields are:

```
run_id      : each test suite will generate a run_id
mode        : run mode such as local
server      : describe server resource and server version
hardware    : server host
env         : server config
status      : run result
err_message : error msg when run failed
collection  : collection info
index       : index type and index params
search      : search params
run_params  : extra run params
metrics     : metric type and metric value
```
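
To make the field list concrete, here is a minimal sketch of a result document carrying these fields. The helper names `new_metric_document` and `finish_case`, the status strings, and the use of `uuid` for `run_id` are assumptions for illustration only; the real metric model in `milvus_benchmark` may differ.

```python
import uuid

# Field names taken from the list above.
METRIC_FIELDS = (
    "run_id", "mode", "server", "hardware", "env", "status",
    "err_message", "collection", "index", "search", "run_params", "metrics",
)


def new_metric_document(mode: str) -> dict:
    """Return an empty result document for one test case."""
    return {
        "run_id": uuid.uuid4().hex,  # assumption: any id unique per suite run
        "mode": mode,
        "server": {},
        "hardware": "",
        "env": {},
        "status": "init",            # assumption: status names are placeholders
        "err_message": "",
        "collection": {},
        "index": {},
        "search": {},
        "run_params": {},
        "metrics": {"type": "", "value": {}},
    }


def finish_case(doc: dict, metric_type: str, value: dict, error: str = "") -> dict:
    """Fill in the metric and the final status once a case has run."""
    doc["metrics"] = {"type": metric_type, "value": value}
    doc["status"] = "failed" if error else "success"
    doc["err_message"] = error
    return doc


# Usage: one successful case and one failed case.
ok_doc = finish_case(new_metric_document("local"), "bp_insert_performance", {})
bad_doc = finish_case(new_metric_document("local"), "bp_insert_performance", {},
                      error="connection refused")
```

Keeping `metrics` as a `type` plus `value` pair mirrors the "metric type and metric value" description, so a dashboard can filter on the type and plot fields of the value.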

### How to visualize test result

Since the metrics are uploaded to the database (currently MongoDB), we suggest using Redash (https://redash.io/) to visualize the test results.

For example, to find the most suitable insert batch size when preparing data with milvus, a benchmark test suite type named `bp_insert_performance` runs regularly. The suite yaml is executed with different `ni_per` values, and the average response time and TPS (number of rows inserted per second) are collected.

The query expression:

```json
{
    "collection": "doc",
    "query": {
        "metrics.type": "bp_insert_performance",
        "collection.dataset_name": "sift_1m_128_l2",
        "_type": "case",
        "server.value.mode": "single"
    },
    "fields": {
        "metrics.value.rps": 1,
        "datetime": 4,
        "run_id": 5,
        "server.value.mode": 6,
        "collection.ni_per": 7,
        "metrics.value.ni_time": 8
    },
    "sort": [{
        "name": "run_id",
        "direction": -1
    }],
    "limit": 28
}
```
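
The dotted keys in the query, such as `metrics.value.rps`, address nested fields of the stored result documents. The following sketch evaluates the same kind of filter, sort, and limit against in-memory dictionaries, purely to show how the keys resolve. The functions `get_path` and `run_query` are hypothetical helpers, not part of Redash or MongoDB, and the sample documents contain placeholder values, not real benchmark results.

```python
def get_path(doc: dict, dotted: str):
    """Follow a dotted key such as 'metrics.value.rps' through nested dicts."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def run_query(docs: list, query: dict, fields: list, sort_key: str, limit: int) -> list:
    """Equality filter, descending sort (direction -1), then limit."""
    matched = [d for d in docs
               if all(get_path(d, k) == v for k, v in query.items())]
    # Assumes every matched document has a value for sort_key.
    matched.sort(key=lambda d: get_path(d, sort_key), reverse=True)
    return [{f: get_path(d, f) for f in fields} for d in matched[:limit]]


def make_doc(run_id: int, mode: str, ni_per: int) -> dict:
    # Placeholder document shaped like the fields used in the query.
    return {
        "_type": "case",
        "run_id": run_id,
        "server": {"value": {"mode": mode}},
        "collection": {"dataset_name": "sift_1m_128_l2", "ni_per": ni_per},
        "metrics": {"type": "bp_insert_performance", "value": {"rps": 0.0}},
    }


docs = [make_doc(1, "single", 1000), make_doc(2, "cluster", 1000),
        make_doc(3, "single", 5000)]
query = {
    "metrics.type": "bp_insert_performance",
    "collection.dataset_name": "sift_1m_128_l2",
    "_type": "case",
    "server.value.mode": "single",
}
rows = run_query(docs, query, ["run_id", "collection.ni_per"], "run_id", 28)
```

Only the documents whose `server.value.mode` is `single` survive the filter, and they come back with the highest `run_id` first.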

After executing the query expression above, we get the following chart:

<img src="assets/dash.png" />

In this chart, we can see an improvement from 2.0.0-RC3 to 2.0.0-RC5.