Commit ad7ddf8a, authored by qingen

[wip][vec] add search result description, test=doc #1543

Parent 508f2f5b
@@ -11,7 +11,7 @@ Audio retrieval (speech, music, speaker, etc.) enables querying and finding simi
In this demo, you will learn how to build an audio retrieval system that finds similar sound snippets. Uploaded audio clips are converted into vectors by PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector; the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL, which completes library construction. During retrieval, the user uploads a test audio clip, which is converted into a vector and used for a vector similarity search in Milvus. Milvus returns vector IDs, and the corresponding audio information is looked up in MySQL by ID.
![Workflow of an audio searching system](./img/audio_searching.png)
Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset, with at least 650,000 audio entries and 3000 speakers, to build the audio vector library, which is then searched using a preset distance metric. Other datasets can be used as needed, e.g. Librispeech, VoxCeleb, UrbanSound, etc.
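The store-then-search flow described above can be sketched without any running services. This is a minimal illustration under stated assumptions, not the demo's implementation: one dict stands in for Milvus, another for MySQL, and random vectors stand in for PaddleSpeech embeddings. The names `AudioLibrary`, `insert`, and `search` are hypothetical.

```python
import math
import random


class AudioLibrary:
    """Toy in-memory stand-in for the Milvus + MySQL pair (hypothetical)."""

    def __init__(self):
        self.vectors = {}   # vector ID -> embedding (the role Milvus plays)
        self.metadata = {}  # vector ID -> audio info (the role MySQL plays)
        self._next_id = 0

    def insert(self, vector, audio_info):
        """Store one embedding, assign it a unique ID, and record its audio info."""
        vector_id = self._next_id
        self._next_id += 1
        self.vectors[vector_id] = list(vector)
        self.metadata[vector_id] = audio_info
        return vector_id

    def search(self, query, top_k=3):
        """Return (audio_info, L2 distance) for the top_k nearest stored vectors."""
        scored = sorted(
            (math.dist(query, vec), vec_id) for vec_id, vec in self.vectors.items()
        )
        # Step 2 of retrieval: map the returned vector IDs back to audio info.
        return [(self.metadata[vec_id], dist) for dist, vec_id in scored[:top_k]]


rng = random.Random(0)
library = AudioLibrary()
embeddings = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
for n, emb in enumerate(embeddings):
    library.insert(emb, {"audio_path": f"clip_{n}.wav"})

# Querying with a stored embedding returns that clip first, at distance 0.
results = library.search(embeddings[2], top_k=2)
print(results[0])
```

In the real system the brute-force scan would be replaced by a Milvus index and the metadata dict by a MySQL table keyed on the vector ID.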
@@ -31,6 +31,7 @@ Creating milvus-minio ... done
Creating milvus-etcd ... done
Creating audio-mysql ... done
Creating milvus-standalone ... done
Creating audio-webclient ... done
```
Then list all containers with `docker ps`; you can use `docker logs audio-mysql` to view the logs of the server container.
@@ -41,7 +42,7 @@ b2bcf279e599 milvusdb/milvus:v2.0.1 "/tini -- milvus run…" 22 hours ago Up
d8ef4c84e25c mysql:5.7 "docker-entrypoint.s…" 22 hours ago Up 22 hours 0.0.0.0:3306->3306/tcp, 33060/tcp audio-mysql
8fb501edb4f3 quay.io/coreos/etcd:v3.5.0 "etcd -advertise-cli…" 22 hours ago Up 22 hours 2379-2380/tcp milvus-etcd
ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…" 22 hours ago Up 22 hours (healthy) 9000/tcp milvus-minio
15c84a506754 iregistry.baidu-int.com/paddlespeech/audio-search-client:1.0 "/bin/bash -c '/usr/…" 22 hours ago Up 22 hours (healthy) 0.0.0.0:8068->80/tcp audio-webclient
```
### 2. Start API Server
@@ -49,79 +50,112 @@ Then start the system server, which provides HTTP backend services.
- Install the Python packages
```bash
pip install -r requirements.txt
```
- Set configuration
```bash
vim src/config.py
```
Modify the parameters according to your environment. The parameters that need to be set are listed below; for more information, refer to [config.py](./src/config.py).
| **Parameter** | **Description** | **Default setting** |
| ---------------- | ----------------------------------------------------- | ------------------- |
| MILVUS_HOST | The IP address of Milvus; you can get it with `ifconfig`. If everything runs on one machine, it is most likely 127.0.0.1. | 127.0.0.1 |
| MILVUS_PORT | Port of Milvus. | 19530 |
| VECTOR_DIMENSION | Dimension of the vectors. | 2048 |
| MYSQL_HOST | The IP address of MySQL. | 127.0.0.1 |
| MYSQL_PORT | Port of MySQL. | 3306 |
| DEFAULT_TABLE | The default Milvus collection and MySQL table name. | audio_table |
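
As an illustration of how these parameters might be overridden without editing the file, here is a minimal sketch that falls back to the defaults in the table above. Reading from environment variables and the `load_config` helper are assumptions made for this example; the actual `src/config.py` may be organized differently.

```python
import os


def load_config(env=None):
    """Build the settings dict, using the table's defaults when a key is unset.

    `env` is any mapping of strings; it defaults to the process environment.
    (Hypothetical helper, not part of the demo's code.)
    """
    env = os.environ if env is None else env
    return {
        "MILVUS_HOST": env.get("MILVUS_HOST", "127.0.0.1"),
        "MILVUS_PORT": int(env.get("MILVUS_PORT", "19530")),
        "VECTOR_DIMENSION": int(env.get("VECTOR_DIMENSION", "2048")),
        "MYSQL_HOST": env.get("MYSQL_HOST", "127.0.0.1"),
        "MYSQL_PORT": int(env.get("MYSQL_PORT", "3306")),
        "DEFAULT_TABLE": env.get("DEFAULT_TABLE", "audio_table"),
    }


# An empty mapping yields exactly the defaults from the table.
defaults = load_config({})
# Overriding only the Milvus address leaves every other value untouched.
remote = load_config({"MILVUS_HOST": "10.0.0.5", "MILVUS_PORT": "19531"})
print(defaults["MILVUS_PORT"], remote["MILVUS_HOST"], remote["MILVUS_PORT"])
```

Whatever mechanism is used, `VECTOR_DIMENSION` has to match the output size of the embedding model that produces the vectors.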
- Run the code
Then start the server with FastAPI.
```bash
python src/main.py
```
Then you will see that the application has started:
```bash
INFO: Started server process [3949]
2022-03-07 17:39:14,864 | INFO | server.py | serve | 75 | Started server process [3949]
INFO: Waiting for application startup.
2022-03-07 17:39:14,865 | INFO | on.py | startup | 45 | Waiting for application startup.
INFO: Application startup complete.
2022-03-07 17:39:14,866 | INFO | on.py | startup | 59 | Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
2022-03-07 17:39:14,867 | INFO | server.py | _log_started_message | 206 | Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
```
### 3. Usage
- Prepare data
```bash
wget -c https://www.openslr.org/resources/82/cn-celeb_v2.tar.gz && tar -xvf cn-celeb_v2.tar.gz
```
Note: To build a quick demo, you can use the `download_audio_data` function in ./src/test_main.py, which downloads 20 audio files. The results shown below use this collection as an example.
- Script test (recommended)
The script runs the whole process: downloading data, loading the PaddleSpeech model, extracting embeddings, building the library, retrieving, and dropping the library.
```bash
python ./src/test_main.py
```

Output:
```bash
Checkpoint path: %your model path%
Extracting feature from audio No. 1 , 20 audios in total
Extracting feature from audio No. 2 , 20 audios in total
...
2022-03-09 17:22:13,870 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
2022-03-09 17:22:13,898 | INFO | main.py | count_audio | 147 | Successfully count the number of data!
2022-03-09 17:22:13,918 | INFO | main.py | audio_path | 57 | Successfully load audio: ./example_audio/test.wav
...
2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_flesh.wav, distance 0.052762262523174286
...
2022-03-09 17:22:32,582 | INFO | main.py | search_local_audio | 135 | Successfully searched similar audio!
2022-03-09 17:22:33,658 | INFO | main.py | drop_tables | 159 | Successfully drop tables in Milvus and MySQL!
```
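
If you want to post-process the search results from this log output, the `search result` lines can be pulled apart with a regular expression. The sketch below assumes the log format stays exactly as shown above; `parse_search_results` is a hypothetical helper, not part of the demo.

```python
import re

# Matches the tail of a "search result" log line as printed above.
RESULT_PATTERN = re.compile(
    r"search result \S*\?audio_path=(?P<path>\S+), distance (?P<distance>[0-9.eE+-]+)"
)


def parse_search_results(log_text):
    """Return a list of (audio_path, distance) tuples found in the log text."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_PATTERN.search(line)
        if match:
            results.append((match.group("path"), float(match.group("distance"))))
    return results


sample_log = """\
2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
2022-03-09 17:22:32,582 | INFO | main.py | search_local_audio | 135 | Successfully searched similar audio!
"""

parsed = parse_search_results(sample_log)
for path, distance in parsed:
    print(f"{distance:.4f}  {path}")
```

The query file itself comes back first with distance 0.0, which is a quick sanity check that insertion and search are using the same embeddings.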
- GUI test (optional)
Navigate to 127.0.0.1:8068 in your browser to access the front-end interface.
- Insert data
Download the data and decompress it to a directory, for example /home/speech/data. Then enter /home/speech/data in the address bar of the upload page to upload the data.
![](./img/insert.png)
- Search for similar audio
Select the magnifying glass icon on the left side of the interface. Then press the "Default Target Audio File" button and upload a .wav file you would like to search for. The results will be displayed.
![](./img/search.png)
### 4. Result
Machine configuration:
- OS: CentOS release 7.6
- Kernel: 4.17.11-1.el7.elrepo.x86_64
- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- Memory: 132G
Dataset:
- CN-Celeb, train size 650,000, test size 10,000, dimension 256, distance L2
Recall and elapsed-time statistics are shown in the following figure:
![](./img/result.png)
Compared with other algorithms, the Milvus-based retrieval framework ranks in the middle in terms of speed and performance. At a 90% recall rate, the retrieval time is about 2.9 milliseconds, which meets the needs of most application scenarios.
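
The text does not say how recall was measured. One common definition, assumed here, compares the IDs returned by the index with the exact nearest neighbours found by a brute-force L2 scan. The following sketch uses small random vectors and a simulated approximate result; the function names are hypothetical.

```python
import math
import random


def exact_top_k(query, database, k):
    """Brute-force ground truth: indices of the k nearest vectors by L2 distance."""
    order = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return order[:k]


def recall_at_k(retrieved_ids, exact_ids):
    """Fraction of the exact nearest neighbours that appear in the retrieved list."""
    return len(set(retrieved_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
database = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(16)]

truth = exact_top_k(query, database, k=10)
# Simulate an approximate index that misses one true neighbour.
# ID 200 lies outside the database, so it can never count as a hit.
approximate = truth[:9] + [200]
print(recall_at_k(approximate, truth))  # 0.9
```

In a benchmark, this value would be averaged over all test queries and reported alongside the per-query search time.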
### 5. Pretrained Models
Here is a list of pretrained models released by PaddleSpeech:
......
@@ -12,9 +12,9 @@
In this demo, you will learn how to build an audio retrieval system that finds similar sound snippets. Uploaded audio clips are converted into vectors by PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector; the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL, which completes library construction. During retrieval, the user uploads a test audio clip, which is converted into a vector and used for a vector similarity search in Milvus. Milvus returns vector IDs, and the corresponding audio information is looked up in MySQL by ID
![Audio retrieval workflow](./img/audio_searching.png)
Note: this demo uses the [CN-Celeb](http://openslr.org/82/) dataset, which contains at least 650,000 audio clips from 3000 speakers, to build the audio vector library (audio features or speaker features); audio (or speaker) retrieval is then performed with a preset distance metric. Other datasets can be used instead as needed, such as Librispeech, VoxCeleb, UrbanSound, GloVe, MNIST
## Usage
### 1. Install MySQL and Milvus
@@ -129,13 +129,34 @@ ffce340b3790 minio/minio:RELEASE.2020-12-03T00-03-10Z "/usr/bin/docker-ent…"
Enter 127.0.0.1:8068 in your browser to access the front-end page
- Upload audio
Download the data and decompress it to a folder, for example /home/speech/data, then enter /home/speech/data in the address bar of the upload page to upload the data
![](./img/insert.png)
- Search for similar audio
Select the magnifying glass in the upper-left corner, click the "Default Target Audio File" button, and upload a test audio clip; the search results will then be displayed
![](./img/search.png)
### 4. Result
Machine configuration:
- OS: CentOS release 7.6
- Kernel: 4.17.11-1.el7.elrepo.x86_64
- CPU: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- Memory: 132G
Dataset:
- CN-Celeb, training set 650,000, test set 10,000, vector dimension 256, distance L2
Recall and elapsed-time statistics are shown in the following figure:
![](./img/result.png)
Compared with other algorithms, the Milvus-based retrieval framework ranks in the middle in terms of speed and performance. At a 90% recall rate, retrieval takes about 2.9 milliseconds, which meets the needs of most application scenarios
### 5. Pretrained Models
Here is the list of pretrained models provided by PaddleSpeech:
......