diff --git a/demos/audio_searching/README.md b/demos/audio_searching/README.md
index 0ee781ad26a140b63421fd51dc37500ead44ec84..dc34b61bca8094dc52f774cfd20e4840c6104b68 100644
--- a/demos/audio_searching/README.md
+++ b/demos/audio_searching/README.md
@@ -3,11 +3,19 @@
 # Audio Searching
 ## Introduction
 
-This demo uses ECAPA-TDNN(or other models) for Speaker Recognition base on MySQL to store user-info/id and Milvus to search vectors.
+As the Internet continues to evolve, unstructured data such as emails, social media photos, live videos, and customer service voice calls has become increasingly common. To process this data on a computer, we need to use embedding technology to transform it into vectors, and then store, index, and query them.
+
+However, when the amount of data is large, such as hundreds of millions of audio tracks, similarity search becomes difficult. Exhaustive search is feasible but very time-consuming. For this scenario, this demo shows how to build an audio similarity retrieval system using the open-source vector database Milvus.
+
+Audio retrieval (of speech, music, speakers, etc.) enables querying and finding similar sounds (or the same speaker) in a large amount of audio data. An audio similarity retrieval system can be used to identify similar sound effects, minimize intellectual property infringement, quickly search voiceprint libraries, and help enterprises control fraud and identity theft. Audio retrieval also plays an important role in the classification and statistical analysis of audio data.
+
+In this demo, you will learn how to build an audio retrieval system to retrieve similar sound snippets. The uploaded audio clips are converted into vector data using PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector; the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL, which completes the construction of the library. During retrieval, users upload a test audio clip, which is converted into a vector, and a vector similarity search is then run in Milvus. Milvus returns the IDs of the matching vectors, and the corresponding audio information can be looked up in MySQL by ID.
+
+The demo uses the [CN-Celeb](http://openslr.org/82/) dataset, which contains at least 650,000 audio entries from 3,000 speakers, to build the audio vector library, which is then searched using a preset distance metric. Other datasets, such as Librispeech, VoxCeleb, or UrbanSound, can be used instead as needed.
 
 ## Usage
 ### 1. Prepare MySQL and Milvus services by docker-compose
-The molecular similarity search system requires Milvus, MySQL services. We can start these containers with one click through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have [installed Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. then
+The audio similarity search system requires the Milvus and MySQL services. We can start these containers with one command through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have installed [Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. Then:
 
 ```bash
 docker-compose -f docker-compose.yaml up -d
@@ -81,15 +89,43 @@ INFO: Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
 ```
 
 ### 3. Usage
-
+- Prepare data
+  ```bash
+  wget -c https://www.openslr.org/resources/82/cn-celeb_v2.tar.gz && tar -xvf cn-celeb_v2.tar.gz
+  ```
+  Note: If you want to build a quick demo, you can use the `download_audio_data` function in ./src/test_main.py, which downloads 20 audio files. The results shown below use this collection as an example.
+
+ - Run
+  The script downloads the data, loads the PaddleSpeech model, extracts embeddings, builds the library, runs a search, and deletes the library.
 ```bash
 python ./src/test_main.py
 ```
 
+  Output:
+  ```bash
+  Checkpoint path: %your model path%
+  Extracting feature from audio No. 1 , 20 audios in total
+  Extracting feature from audio No. 2 , 20 audios in total
+  ...
+  2022-03-09 17:22:13,870 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
+  2022-03-09 17:22:13,898 | INFO | main.py | count_audio | 147 | Successfully count the number of data!
+  2022-03-09 17:22:13,918 | INFO | main.py | audio_path | 57 | Successfully load audio: ./example_audio/test.wav
+  ...
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_flesh.wav, distance 0.052762262523174286
+  ...
+  2022-03-09 17:22:32,582 | INFO | main.py | search_local_audio | 135 | Successfully searched similar audio!
+  2022-03-09 17:22:33,658 | INFO | main.py | drop_tables | 159 | Successfully drop tables in Milvus and MySQL!
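+  ```
+
 ### 4.Pretrained Models
 
-Here is a list of pretrained models released by PaddleSpeech that can be used by command and python API:
+Here is a list of pretrained models released by PaddleSpeech:
 
 | Model | Sample Rate
 | :--- | :---:
 | ecapa_tdnn | 16000
+| panns_cnn6| 32000
+| panns_cnn10| 32000
+| panns_cnn14| 32000
diff --git a/demos/audio_searching/README_cn.md b/demos/audio_searching/README_cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..a76c1118f98e7f1b965603a83afc6ecc872a1660
--- /dev/null
+++ b/demos/audio_searching/README_cn.md
@@ -0,0 +1,132 @@
+
+(Simplified Chinese|[English](./README.md))
+
+# Audio Similarity Retrieval
+## Introduction
+
+As the Internet continues to develop, unstructured data such as emails, social media photos, live videos, and customer service voice calls has become increasingly common. To process this data with a computer, we need to use embedding technology to convert it into vectors, and then store, index, and query them.
+
+However, when the amount of data is very large, for example when hundreds of millions of audio tracks must be searched by similarity, the task becomes difficult. Exhaustive search is feasible but very time-consuming. For this scenario, this demo shows how to build an audio similarity retrieval system using the open-source vector database Milvus.
+
+Audio retrieval (of speech, music, speakers, etc.) makes it possible to query a massive amount of audio data and find clips with similar sounds (or the same speaker). An audio similarity retrieval system can be used to identify similar sound effects and minimize intellectual property infringement; it can also quickly search voiceprint libraries and help enterprises control fraud and identity theft. Audio retrieval also plays an important role in the classification and statistical analysis of audio data.
+
+In this demo, you will learn how to build an audio retrieval system for retrieving similar sound clips. Uploaded audio clips are converted into vector data using PaddleSpeech pretrained models (audio classification model, speaker recognition model, etc.) and stored in Milvus. Milvus automatically generates a unique ID for each vector; the ID and the corresponding audio information (audio ID, speaker ID, etc.) are then stored in MySQL, which completes the construction of the library. During retrieval, users upload a test audio clip, which is converted into a vector, and a vector similarity search is then run in Milvus. Milvus returns the IDs of the matching vectors, and the corresponding audio information can be looked up in MySQL by ID.
+
+This demo uses the [CN-Celeb](http://openslr.org/82/) dataset, which contains at least 650,000 audio entries from 3,000 speakers, to build the audio vector library, which is then searched using a preset distance metric. Other datasets, such as Librispeech, VoxCeleb, or UrbanSound, can be used instead as needed.
+
+## Usage
+### 1. Install MySQL and Milvus
+The audio similarity search system requires the Milvus and MySQL services. We can start these containers with one command through [docker-compose.yaml](./docker-compose.yaml), so please make sure you have installed [Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) before running. Then:
+
+```bash
+docker-compose -f docker-compose.yaml up -d
+```
+
+Then you will see that all the containers have been created:
+
+```bash
+Creating network "quick_deploy_app_net" with driver "bridge"
+Creating milvus-minio ... done
+Creating milvus-etcd ... done
+Creating audio-mysql ... done
+Creating milvus-standalone ... done
+```
+
+You can use `docker ps` to list all the containers, and `docker logs audio-mysql` to get the logs of the server container:
+
+```bash
+CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
+b2bcf279e599  milvusdb/milvus:v2.0.1  "/tini -- milvus run…"  22 hours ago  Up 22 hours  0.0.0.0:19530->19530/tcp  milvus-standalone
+d8ef4c84e25c  mysql:5.7  "docker-entrypoint.s…"  22 hours ago  Up 22 hours  0.0.0.0:3306->3306/tcp, 33060/tcp  audio-mysql
+8fb501edb4f3  quay.io/coreos/etcd:v3.5.0  "etcd -advertise-cli…"  22 hours ago  Up 22 hours  2379-2380/tcp  milvus-etcd
+ffce340b3790  minio/minio:RELEASE.2020-12-03T00-03-10Z  "/usr/bin/docker-ent…"  22 hours ago  Up 22 hours (healthy)  9000/tcp  milvus-minio
+
+```
+
+### 2. Configure and start the API service
+Start the system server, which provides HTTP backend services.
+
+- Install the Python packages the service depends on
+
+```bash
+pip install -r requirements.txt
+```
+- Modify the configuration
+
+```bash
+vim src/config.py
+```
+
+Modify it according to your environment. Some of the parameters that need to be set are listed here; for more information, see [config.py](./src/config.py).
+
+| **Parameter** | **Description** | **Default setting** |
+| ---------------- | ----------------------------------------------------- | ------------------- |
+| MILVUS_HOST | The IP address of Milvus; you can get it with ifconfig. If everything runs on one machine, it is most likely 127.0.0.1. | 127.0.0.1 |
+| MILVUS_PORT | Port of Milvus. | 19530 |
+| VECTOR_DIMENSION | Dimension of the vectors. | 2048 |
+| MYSQL_HOST | The IP address of MySQL. | 127.0.0.1 |
+| MYSQL_PORT | Port of MySQL. | 3306 |
+| DEFAULT_TABLE | Default collection name in Milvus and MySQL. | audio_table |
+
+- Run the program
+
+Start the service built with FastAPI
+
+```bash
+python src/main.py
+```
+
+Then you will see the application start:
+
+```bash
+INFO: Started server process [3949]
+2022-03-07 17:39:14,864 | INFO | server.py | serve | 75 | Started server process [3949]
+INFO: Waiting for application startup.
+2022-03-07 17:39:14,865 | INFO | on.py | startup | 45 | Waiting for application startup.
+INFO: Application startup complete.
+2022-03-07 17:39:14,866 | INFO | on.py | startup | 59 | Application startup complete.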
+INFO: Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
+2022-03-07 17:39:14,867 | INFO | server.py | _log_started_message | 206 | Uvicorn running on http://127.0.0.1:8002 (Press CTRL+C to quit)
+```
+
+### 3. Usage
+- Prepare data
+  ```bash
+  wget -c https://www.openslr.org/resources/82/cn-celeb_v2.tar.gz && tar -xvf cn-celeb_v2.tar.gz
+  ```
+  Note: To build a quick demo, you can use the 20 audio files fetched by the `download_audio_data` function in ./src/test_main.py. The results shown below use this collection as an example.
+
+ - Run the test program
+  The script downloads the data, loads the PaddleSpeech model, extracts embeddings, builds the library, runs a search, and deletes the library, in that order.
+  ```bash
+  python ./src/test_main.py
+  ```
+
+  Output:
+  ```bash
+  Checkpoint path: %your model path%
+  Extracting feature from audio No. 1 , 20 audios in total
+  Extracting feature from audio No. 2 , 20 audios in total
+  ...
+  2022-03-09 17:22:13,870 | INFO | main.py | load_audios | 85 | Successfully loaded data, total count: 20
+  2022-03-09 17:22:13,898 | INFO | main.py | count_audio | 147 | Successfully count the number of data!
+  2022-03-09 17:22:13,918 | INFO | main.py | audio_path | 57 | Successfully load audio: ./example_audio/test.wav
+  ...
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/test.wav, distance 0.0
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_chopping.wav, distance 0.021805256605148315
+  2022-03-09 17:22:32,580 | INFO | main.py | search_local_audio | 131 | search result http://testserver/data?audio_path=./example_audio/knife_cut_into_flesh.wav, distance 0.052762262523174286
+  ...
+  2022-03-09 17:22:32,582 | INFO | main.py | search_local_audio | 135 | Successfully searched similar audio!
+  2022-03-09 17:22:33,658 | INFO | main.py | drop_tables | 159 | Successfully drop tables in Milvus and MySQL!
+  ```
+
+### 4. Pretrained Models
+
+Here is a list of pretrained models provided by PaddleSpeech:
+
+| Model | Sample Rate
+| :--- | :---:
+| ecapa_tdnn| 16000
+| panns_cnn6| 32000
+| panns_cnn10| 32000
+| panns_cnn14| 32000
diff --git a/demos/audio_searching/src/main.py b/demos/audio_searching/src/main.py
index 89c037a0e6adc056c44d3127b9dee5f36b8dc368..082fc65123cd92648c28f306d817df84c4ca7644 100644
--- a/demos/audio_searching/src/main.py
+++ b/demos/audio_searching/src/main.py
@@ -126,8 +126,9 @@ async def search_local_audio(request: Request,
         _, paths, distances = do_search(host, table_name, query_audio_path,
                                         MILVUS_CLI, MYSQL_CLI)
         names = []
-        for i in paths:
-            names.append(os.path.basename(i))
+        for path, dist in zip(paths, distances):
+            names.append(os.path.basename(path))
+            LOGGER.info(f"search result {path}, distance {dist}")
         res = dict(zip(paths, zip(names, distances)))
         # Sort results by distance metric, closest distances first
         res = sorted(res.items(), key=lambda item: item[1][1])
diff --git a/demos/audio_searching/src/milvus_helpers.py b/demos/audio_searching/src/milvus_helpers.py
index 8ba3776be8871af60bd7f91225b218ff30713c4b..1699e892ede9f6889b00759a517ed62d74bc00da 100644
--- a/demos/audio_searching/src/milvus_helpers.py
+++ b/demos/audio_searching/src/milvus_helpers.py
@@ -59,7 +59,7 @@ class MilvusHelper:
                 raise Exception(
                     f"There is no collection named:{collection_name}")
         except Exception as e:
-            LOGGER.error(f"Failed to load data to Milvus: {e}")
+            LOGGER.error(f"Failed to set collection in Milvus: {e}")
             sys.exit(1)
 
     def has_collection(self, collection_name):
@@ -67,7 +67,7 @@ class MilvusHelper:
         try:
             return utility.has_collection(collection_name)
         except Exception as e:
-            LOGGER.error(f"Failed to load data to Milvus: {e}")
+            LOGGER.error(f"Failed to check state of collection in Milvus: {e}")
             sys.exit(1)
 
     def create_collection(self, collection_name):
@@ -95,7 +95,7 @@ class MilvusHelper:
             self.set_collection(collection_name)
             return "OK"
         except Exception as e:
-            LOGGER.error(f"Failed to load data to Milvus: {e}")
+            LOGGER.error(f"Failed to create collection in Milvus: {e}")
             sys.exit(1)
 
     def insert(self, collection_name, vectors):
@@ -112,7 +112,7 @@ class MilvusHelper:
             )
             return ids
         except Exception as e:
-            LOGGER.error(f"Failed to load data to Milvus: {e}")
+            LOGGER.error(f"Failed to insert data to Milvus: {e}")
             sys.exit(1)
 
     def create_index(self, collection_name):
@@ -160,7 +160,6 @@ class MilvusHelper:
                     "nprobe": 16
                 }
             }
-            # data = [vectors]
             res = self.collection.search(
                 vectors,
                 anns_field="embedding",
diff --git a/demos/audio_searching/src/test_main.py b/demos/audio_searching/src/test_main.py
index 24405f38826ee6a471c8b38e8d7911578f7dbaf2..331208ff159bf95f07c51307854d44334b999e0c 100644
--- a/demos/audio_searching/src/test_main.py
+++ b/demos/audio_searching/src/test_main.py
@@ -89,7 +89,6 @@ def test_data():
 
 if __name__ == "__main__":
     download_audio_data()
-    test_drop()
     test_load()
     test_count()
     test_search()
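
Review note: the logging-and-sorting step changed in `search_local_audio` (src/main.py) can be exercised in isolation, without Milvus or MySQL. The following is a minimal sketch under stated assumptions, not the demo's actual code path: `rank_results` and the logger name are hypothetical, and the sample paths and distances are copied from the README output rather than produced by a live search.

```python
import logging
import os

# Hypothetical logger name; the demo uses its own LOGGER object.
LOGGER = logging.getLogger("audio_search_sketch")


def rank_results(paths, distances):
    """Pair each path with its basename and distance, closest first.

    Mirrors the loop in the diff: one log line per (path, distance) pair,
    then a sort on the distance stored at item[1][1].
    """
    names = []
    for path, dist in zip(paths, distances):
        names.append(os.path.basename(path))
        LOGGER.info(f"search result {path}, distance {dist}")
    res = dict(zip(paths, zip(names, distances)))
    # Sort results by distance metric, closest distances first
    return sorted(res.items(), key=lambda item: item[1][1])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Sample values taken from the README output, deliberately out of order.
    ranked = rank_results(
        ["./example_audio/knife_chopping.wav", "./example_audio/test.wav"],
        [0.021805256605148315, 0.0],
    )
    for path, (name, dist) in ranked:
        print(name, dist)
```

Because the result is built through a `dict` keyed by path, duplicate paths would collapse into a single entry; this is also true of the code in the diff.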