milvus/tests/milvus_groundtruth
shiyu22 a7d57d7a82
add Tanimoto ground truth (#1138)
2020-02-14 12:35:50 +08:00

Quick Start

  • This script calculates the ground truth for feature vectors using a chosen distance metric such as L2 or IP.
  • In the table below, the last five parameters do not need to be changed.
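As an illustration of what computing a ground truth involves, here is a minimal sketch of an exhaustive top-k search under L2 or IP. It is not taken from milvus_ground_truth.py: the function names (`l2_distance`, `inner_product`, `brute_force_topk`) and the toy data are hypothetical, and it uses plain Python lists rather than the vector files the script reads.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product (IP); larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_topk(query, base, k, metric="L2"):
    """Return the indices of the k base vectors most similar to query.

    Hypothetical helper: every base vector is compared with the query,
    so the result is exact rather than approximate.
    """
    if metric == "L2":
        scored = ((l2_distance(query, v), i) for i, v in enumerate(base))
        best = heapq.nsmallest(k, scored)  # smallest distances first
    elif metric == "IP":
        scored = ((inner_product(query, v), i) for i, v in enumerate(base))
        best = heapq.nlargest(k, scored)  # largest products first
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return [i for _, i in best]


# Toy data, made up for illustration only.
base = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [0.5, 0.5]]
query = [1.0, 0.0]
l2_top2 = brute_force_topk(query, base, k=2, metric="L2")
ip_top2 = brute_force_topk(query, base, k=2, metric="IP")
```

Note that the two metrics rank in opposite directions: L2 keeps the smallest values, IP the largest, which is why the same query can return different neighbors under each.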

Parameter Description

| Parameter | Description | Default setting |
|---|---|---|
| PROCESS_NUM | number of processes | 12 |
| GET_VEC | whether to save feature vectors | False |
| CSV | whether the query vector file format is csv | False |
| UINT8 | whether the query vector data format is uint8 | False |
| BASE_FOLDER_NAME | path to the source vector dataset | '/data/milvus/base' |
| NQ_FOLDER_NAME | path to the query vector dataset | '/data/milvus/query' |
| GT_ALL_FOLDER_NAME | intermediate filename | 'ground_truth_all' |
| GT_FOLDER_NAME | path where the ground truth results are saved | 'ground_truth' |
| LOC_FILE_NAME | file that stores the location info of the ground truth results | 'ground_truth.txt' |
| FLOC_FILE_NAME | file that stores the filename info of the ground truth results | 'file_ground_truth.txt' |
| VEC_FILE_NAME | file that stores the feature vectors of the ground truth results | 'vectors.npy' |

Usage

$ python3 milvus_ground_truth.py [-q <nq_num>] -k <topk_num> -m <metric_type> -l

# -q or --nq specifies the number of vectors taken from the query vector set. This parameter is optional; if it is omitted, all vectors in the query set are used.

# -k or --topk specifies how many of the most similar vectors (the top k) to calculate for each query vector.

# -m or --metric specifies the metric used to compare vector distances in Milvus, such as IP/L2/Tan.

# -l generates the ground truth results and saves them in GT_FOLDER_NAME. In that folder, LOC_FILE_NAME stores the location info of each result, such as "8002005210": the first digit (8) is meaningless, digits 2-4 give the position of the result file in the folder, and digits 5-10 give the position of the result vector in that file. FLOC_FILE_NAME stores the result filename and vector location, such as "binary_128d_00000.npy 81759", and VEC_FILE_NAME stores the result vectors.
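The location format described above can be decoded with a few lines of Python. This is a sketch only: `decode_location` is a hypothetical helper, not part of milvus_ground_truth.py, and it assumes each entry is read from LOC_FILE_NAME as a 10-character string of digits laid out exactly as in the "8002005210" example.

```python
def decode_location(entry):
    """Split a 10-digit location entry into (file_index, vector_index).

    Assumed layout, following the README example "8002005210":
      digit 1      meaningless prefix
      digits 2-4   position of the result file in the folder
      digits 5-10  position of the result vector in that file
    """
    if len(entry) != 10 or not entry.isdigit():
        raise ValueError(f"expected 10 digits, got {entry!r}")
    file_index = int(entry[1:4])     # digits 2-4
    vector_index = int(entry[4:10])  # digits 5-10
    return file_index, vector_index


file_index, vector_index = decode_location("8002005210")
```

For the README example, `file_index` is 2 and `vector_index` is 5210. Whether these positions are counted from zero is not stated here, so treat that as something to check against the script.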