# milvus_ivfsq8h_test_report_detailed_version
## Summary
This document contains the test report of the IVF_SQ8H index on a Milvus single server.
## Test objectives
Measure the time cost and recall when searching with different parameters.
## Test method
### Hardware/Software requirements
- Operating system: CentOS Linux release 7.6.1810 (Core)
- CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
- GPU0: GeForce GTX 1080
- GPU1: GeForce GTX 1080
- Memory: 503 GB
- Docker version: 18.09
- NVIDIA driver version: 430.34
- Milvus version: 0.5.3
- SDK interface: Python 3.6.8
- pymilvus version: 0.2.5
### Data model
The data used in the tests are:
- Data source: sift1b
- Data type: hdf5
For details on this dataset, see http://corpus-texmex.irisa.fr/.
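The benchmark script itself is not part of this report. Below is a minimal sketch of loading the hdf5 data; it assumes an ann-benchmarks-style layout with `train`, `test`, and `neighbors` datasets, and the file name and keys are assumptions rather than values taken from the original test.

```python
# Hedged sketch: load the benchmark hdf5 file, assuming an
# ann-benchmarks-style layout ("train", "test", "neighbors").
# The file name and dataset keys are assumptions.
import h5py
import numpy as np

def load_hdf5(path="sift-1b.hdf5"):
    with h5py.File(path, "r") as f:
        base = np.array(f["train"])              # base vectors to insert into Milvus
        queries = np.array(f["test"])            # 128-dimension query vectors
        ground_truth = np.array(f["neighbors"])  # ids of the true nearest neighbours
    return base, queries, ground_truth

# In practice the 1B base vectors would be read and inserted in batches
# rather than materialized in memory at once.
```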
### Measures
- Query Elapsed Time: time cost (in seconds) to run a query. Variables that affect Query Elapsed Time:
  - nq (number of query vectors)
> Note: In the query elapsed time test, the following parameter values are tested:
>
> nq: [1, 5, 10, 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800].
>
- Recall: the fraction of the relevant instances that are actually retrieved. Variables that affect Recall:
  - nq (number of query vectors)
  - topk (top k results of a query)
> Note: In the recall test, the following parameter values are tested:
>
> nq: [10, 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800],
>
> topk: [1, 10, 100].
## Test reports
### Test environment
Dataset: sift1b, 1,000,000,000 vectors, 128 dimensions
Table attributes
- nlist: 16384
- metric_type: L2
Query configuration
- nprobe: 32
Milvus configuration
- cpu_cache_capacity: 150
- gpu_cache_capacity: 6
- use_blas_threshold: 1100
- gpu_search_threshold: 1200
- search_resources: cpu, gpu0, gpu1
The definitions of the Milvus configuration parameters are available at https://milvus.io/docs/en/reference/milvus_config/.
Test method
Test the query elapsed time and recall with several parameters, changing only one parameter at a time (a sketch of the measurement loop follows below).
- Whether to restart Milvus after each query: no
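As a rough illustration of the test method, the following is a hedged sketch of the elapsed-time measurement with pymilvus 0.2.x. The table name, connection details, and the exact `search_vectors` keyword names (including passing `nprobe` directly) are assumptions and may differ slightly in pymilvus 0.2.5.

```python
# Hedged sketch of the query-elapsed-time sweep: for each nq, time one
# search with top_k=100 and nprobe=32 against the IVF_SQ8H-indexed table.
# Table name, host/port and exact pymilvus 0.2.5 signatures are assumptions.
import time
from milvus import Milvus

NQ_VALUES = [1, 5, 10, 50, 100, 200, 400, 600, 800,
             1000, 1200, 1400, 1600, 1800]

client = Milvus()
client.connect(host="127.0.0.1", port="19530")

def time_queries(table_name, query_vectors):
    for nq in NQ_VALUES:
        records = query_vectors[:nq].tolist()
        start = time.time()
        status, results = client.search_vectors(
            table_name=table_name,
            query_records=records,
            top_k=100,
            nprobe=32)
        print("nq=%-5d elapsed=%.2f s" % (nq, time.time() - start))
```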
### Performance test
#### Data query
**Test result**
Query Elapsed Time (in seconds)
topk = 100
| nq/topk | topk=100 |
| :-----: | :------: |
| nq=1 | 0.34 |
| nq=5 | 0.72 |
| nq=10 | 0.91 |
| nq=50 | 1.51 |
| nq=100 | 2.49 |
| nq=200 | 4.09 |
| nq=400 | 7.32 |
| nq=600 | 10.63 |
| nq=800 | 13.84 |
| nq=1000 | 16.83 |
| nq=1200 | 18.20 |
| nq=1400 | 20.1 |
| nq=1600 | 20.0 |
| nq=1800 | 19.86 |
When nq is 1800, the average query time per 128-dimension vector is around 11 ms (19.86 s / 1800 ≈ 11 ms).
**Conclusion**
When nq < 1200, the query elapsed time increases quickly with nq; when nq > 1200, it increases much more slowly. This is because gpu_search_threshold is set to 1200: when nq < 1200 the CPU is chosen for the query, otherwise the GPU is chosen. Compared with the CPU, the GPU has many more cores and stronger computing capability, and its advantage on computation shows better when nq is large.
The query elapsed time consists of two parts: (1) the index CPU-to-GPU copy time; (2) the search time over the nprobe buckets. When nq is large enough, the index CPU-to-GPU copy time can be amortized efficiently, so Milvus performs well when gpu_search_threshold is set appropriately.
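To make the routing behaviour concrete, here is a simplified sketch of the decision described above. It is an illustration of the described behaviour and cost model, not Milvus's actual scheduler code; the function names and the cost estimate are assumptions.

```python
# Illustrative sketch (not Milvus source): below gpu_search_threshold the
# query runs on CPU; at or above it, the GPU is used and the one-off index
# CPU-to-GPU copy cost is amortized over the nq queries in the batch.
GPU_SEARCH_THRESHOLD = 1200  # value from the Milvus configuration above

def choose_search_resource(nq, threshold=GPU_SEARCH_THRESHOLD):
    return "cpu" if nq < threshold else "gpu"

def gpu_batch_cost(nq, copy_time_s, per_query_search_s):
    # Total GPU cost = fixed index copy + per-query bucket search time;
    # the per-query share of the copy cost shrinks as nq grows, matching
    # the slower growth of elapsed time beyond nq = 1200 in the table above.
    return copy_time_s + nq * per_query_search_s
```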
### Recall test
**Test result**
topk = 1: recall@1
topk = 10: recall@10
topk = 100: recall@100
We use the ground truth provided with the sift1b dataset to calculate the recall of the query results (a sketch of this computation follows the conclusion below).
| nq/topk | topk=1 | topk=10 | topk=100 |
| :-----: | :----: | :-----: | :------: |
| nq=10 | 0.900 | 0.910 | 0.939 |
| nq=50 | 0.980 | 0.950 | 0.941 |
| nq=100 | 0.970 | 0.937 | 0.931 |
| nq=200 | 0.955 | 0.941 | 0.929 |
| nq=400 | 0.958 | 0.944 | 0.932 |
| nq=600 | 0.952 | 0.946 | 0.934 |
| nq=800 | 0.941 | 0.943 | 0.930 |
| nq=1000 | 0.938 | 0.942 | 0.930 |
| nq=1200 | 0.937 | 0.943 | 0.931 |
| nq=1400 | 0.939 | 0.945 | 0.931 |
| nq=1600 | 0.936 | 0.945 | 0.931 |
| nq=1800 | 0.937 | 0.946 | 0.932 |
**Conclusion**
As nq increases, the recall gradually stabilizes at over 93%. Whether the CPU or the GPU is used, and the value of topk, have no noticeable effect on recall.
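The recall computation script is not included in this report. The following is a minimal sketch of recall@k as defined above, assuming `ground_truth[i]` holds the ids of the true nearest neighbours of query i and `results[i]` holds the ids returned by Milvus.

```python
# Hedged sketch of recall@k: the fraction of returned ids that appear in
# the true top-k neighbours, averaged over all queries. Assumes
# ground_truth[i] and results[i] are id sequences for query i.
def recall_at_k(results, ground_truth, k):
    hits, total = 0, 0
    for returned, truth in zip(results, ground_truth):
        truth_set = set(truth[:k])
        hits += sum(1 for vec_id in returned[:k] if vec_id in truth_set)
        total += k
    return hits / total if total else 0.0
```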
# milvus_ivfsq8h_test_report_detailed_version_cn
## Summary
This document describes the test results of the IVF_SQ8H index on a single-server Milvus deployment.
## Test objectives
Measure the query time and recall with different parameters.
## Test method
### Hardware/Software requirements
- Operating system: CentOS Linux release 7.6.1810 (Core)
- CPU: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
- GPU0: GeForce GTX 1080
- GPU1: GeForce GTX 1080
- Memory: 503 GB
- Docker version: 18.09
- NVIDIA driver version: 430.34
- Milvus version: 0.5.3
- SDK interface: Python 3.6.8
- pymilvus version: 0.2.5
### Data model
The main data used in this test:
- Data source: sift1b
- Data type: hdf5
For details on this dataset, see http://corpus-texmex.irisa.fr/.
### Measures
- Query Elapsed Time: the time (in seconds) the database takes to query all the vectors. Variables that affect Query Elapsed Time:
  - nq (number of query vectors)
> Note: In the query elapsed time test, the following parameter values are tested:
>
> nq (number of query vectors): [1, 5, 10, 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800].
>
- Recall: the fraction of correct results actually returned. Variables that affect Recall:
  - nq (number of query vectors)
  - topk (the k most similar results of a single query)
> Note: In the recall test, the following parameter values are tested:
>
> nq (number of query vectors): [10, 50, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800],
>
> topk (the k most similar results of a single query): [1, 10, 100].
## Test reports
### Test environment
Dataset: sift1b, 1,000,000,000 vectors, 128 dimensions
Table attributes
- nlist: 16384
- metric_type: L2
Query configuration
- nprobe: 32
Milvus configuration
- cpu_cache_capacity: 150
- gpu_cache_capacity: 6
- use_blas_threshold: 1100
- gpu_search_threshold: 1200
- search_resources: cpu, gpu0, gpu1
The definitions of the Milvus configuration parameters are available at https://milvus.io/docs/en/reference/milvus_config/.
Test method
Test the query elapsed time and recall by changing only one parameter at a time.
- Whether to restart Milvus after each query: no
### Performance test
#### Data query
**Test result**
Query Elapsed Time (in seconds)
topk = 100
| nq/topk | topk=100 |
| :-----: | :------: |
| nq=1 | 0.34 |
| nq=5 | 0.72 |
| nq=10 | 0.91 |
| nq=50 | 1.51 |
| nq=100 | 2.49 |
| nq=200 | 4.09 |
| nq=400 | 7.32 |
| nq=600 | 10.63 |
| nq=800 | 13.84 |
| nq=1000 | 16.83 |
| nq=1200 | 18.20 |
| nq=1400 | 20.1 |
| nq=1600 | 20.0 |
| nq=1800 | 19.86 |
When nq is 1800, the average query time per 128-dimension vector is around 11 ms.
**Conclusion**
When nq is smaller than 1200, the query elapsed time increases quickly with nq; when nq is greater than 1200, it increases much more slowly. This is because gpu_search_threshold is set to 1200: when nq < 1200 the CPU is chosen for the query, otherwise the GPU is chosen.
In GPU mode the query elapsed time consists of two parts: (1) the index CPU-to-GPU copy time; (2) the search time over the nprobe buckets. When nq is smaller than 500, the index CPU-to-GPU copy time cannot be amortized efficiently, so CPU mode is the better choice; when nq is greater than 500, GPU mode is more reasonable. Compared with the CPU, the GPU has many more cores and stronger computing capability, and its advantage on computation shows better when nq is large.
### Recall test
**Test result**
topk = 1: recall@1
topk = 10: recall@10
topk = 100: recall@100
We use the ground truth provided with the sift1b dataset to calculate the recall of the query results.
| nq/topk | topk=1 | topk=10 | topk=100 |
| :-----: | :----: | :-----: | :------: |
| nq=10 | 0.900 | 0.910 | 0.939 |
| nq=50 | 0.980 | 0.950 | 0.941 |
| nq=100 | 0.970 | 0.937 | 0.931 |
| nq=200 | 0.955 | 0.941 | 0.929 |
| nq=400 | 0.958 | 0.944 | 0.932 |
| nq=600 | 0.952 | 0.946 | 0.934 |
| nq=800 | 0.941 | 0.943 | 0.930 |
| nq=1000 | 0.938 | 0.942 | 0.930 |
| nq=1200 | 0.937 | 0.943 | 0.931 |
| nq=1400 | 0.939 | 0.945 | 0.931 |
| nq=1600 | 0.936 | 0.945 | 0.931 |
| nq=1800 | 0.937 | 0.946 | 0.932 |
**Conclusion**
As nq increases, the recall gradually stabilizes at over 93%. Whether the CPU or the GPU is used, and the value of topk, have no noticeable effect on recall.