Commit fdf1af8d, authored by: C chenjian

Merge branch 'develop' of https://github.com/rainyfly/PaddleHub into face_parse

@@ -231,3 +231,4 @@ We welcome you to contribute code to PaddleHub, and thank you for your feedback.
* Many thanks to [zl1271](https://github.com/zl1271) for fixing serving docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing quick start docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of Photo2Cartoon model in Hugging Face spaces
@@ -247,3 +247,4 @@ print(results)
* Many thanks to [zl1271](https://github.com/zl1271) for fixing a typo in the serving docs
* Many thanks to [AK391](https://github.com/AK391) for adding the web demos of the UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing a typo in the quick start docs
* Many thanks to [AK391](https://github.com/AK391) for adding the web demo of the Photo2Cartoon model in Hugging Face spaces
@@ -50,6 +50,8 @@
**UGATIT Selfie2anime Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/U-GAT-IT-selfie2anime)
**Photo2Cartoon Huggingface Web Demo**: Integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/photo2cartoon)
### Object Detection
- Pedestrian detection, vehicle detection, and more industrial-grade ultra-large-scale pretrained models are provided.
# u2_conformer_wenetspeech

|Model Name|u2_conformer_wenetspeech|
| :--- | :---: |
|Category|Speech - Automatic Speech Recognition|
|Network|Conformer|
|Dataset|WenetSpeech|
|Fine-tuning supported|No|
|Model Size|494MB|
|Latest update date|2021-12-10|
|Data metrics|Chinese CER 0.087|

## I. Basic Information

### Module Introduction

The U2 Conformer model is an end-to-end speech recognition model that supports both English and Chinese. u2_conformer_wenetspeech combines a Conformer encoder with a Transformer decoder. Decoding is done in two passes: a CTC prefix beam search produces first-pass hypotheses, which the attention decoder then rescores to obtain the final result (a minimal illustrative sketch of this two-pass idea follows the reference list below).

u2_conformer_wenetspeech is pretrained on [WenetSpeech](https://wenet-e2e.github.io/WenetSpeech/), an open-source Mandarin speech dataset, and achieves a CER of 0.087 on its DEV set.
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/conformer.png" hspace='10'/> <br />
</p>
<p align="center">
<img src="https://paddlehub.bj.bcebos.com/paddlehub-img/u2_conformer.png" hspace='10'/> <br />
</p>
For more details, please refer to:
- [Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition](https://arxiv.org/abs/2012.05481)
- [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100)
- [WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition](https://arxiv.org/abs/2110.03370)
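
The following is a minimal, illustrative sketch of the two-pass idea described above (it is not the module's actual decoding code, and the function and variable names are made up for illustration): the CTC branch proposes an n-best list with prefix beam search scores, and the attention decoder rescores each hypothesis; the final transcript maximizes a weighted combination of the two scores.

```python
def two_pass_decode(ctc_nbest, attention_score, ctc_weight=0.5):
    """Pick the best hypothesis from a CTC n-best list after attention rescoring.

    ctc_nbest: list of (hypothesis, ctc_log_score) pairs from CTC prefix beam search.
    attention_score: callable mapping a hypothesis to its attention-decoder log score.
    """
    best_hyp, best_score = None, float('-inf')
    for hyp, ctc_score in ctc_nbest:
        # Second pass: combine the first-pass CTC score with the attention rescoring.
        score = ctc_weight * ctc_score + (1.0 - ctc_weight) * attention_score(hyp)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp
```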
## II. Installation

- ### 1. System dependencies

  - libsndfile
    - Linux
      ```shell
      $ sudo apt-get install libsndfile1
      # or
      $ sudo yum install libsndfile
      ```
    - macOS
      ```
      $ brew install libsndfile
      ```

- ### 2. Environment dependencies

  - paddlepaddle >= 2.2.0
  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install u2_conformer_wenetspeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API Prediction

- ### 1. Prediction code example

  ```python
  import paddlehub as hub

  # Path to a Chinese speech audio file in wav format, sampled at 16 kHz
  wav_file = '/PATH/TO/AUDIO'

  model = hub.Module(
      name='u2_conformer_wenetspeech',
      version='1.0.0')
  text = model.speech_recognize(wav_file)

  print(text)
  ```
- ### 2. API

  - ```python
    def check_audio(audio_file)
    ```
    - Checks whether the input audio is a wav file with a 16000 Hz sample rate; if not, it is resampled to 16000 Hz and the new audio file is saved to the same directory (see the sketch below).

    - **Parameters**
      - `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
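
  - For illustration only, a minimal sketch of calling `check_audio` directly (the audio path is a placeholder; the resampled-copy behavior follows the module source included later in this commit):

  - ```python
    import paddlehub as hub

    model = hub.Module(name='u2_conformer_wenetspeech', version='1.0.0')

    # If '/PATH/TO/AUDIO.wav' is not sampled at 16000 Hz, a resampled copy such as
    # '/PATH/TO/AUDIO_16k.wav' is written next to it and its path is returned;
    # otherwise the original path is returned unchanged.
    wav_16k = model.check_audio('/PATH/TO/AUDIO.wav')
    print(wav_16k)
    ```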
  - ```python
    def speech_recognize(
        audio_file,
        device='cpu',
    )
    ```
    - Transcribes the input audio into text.

    - **Parameters**
      - `audio_file`: path to a local audio file (*.wav), e.g. `/path/to/input.wav`
      - `device`: device used for prediction, `cpu` by default; set it to `gpu` to predict on GPU.

    - **Returns**
      - `text`: str, the recognized text of the input audio.
## IV. Server Deployment

- PaddleHub Serving can deploy an online speech recognition service.

- ### Step 1: Start PaddleHub Serving

  - ```shell
    $ hub serving start -m u2_conformer_wenetspeech
    ```
  - This deploys the speech recognition API service; the default port is 8866.

  - **NOTE:** To predict on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines below send a prediction request and fetch the result

  - ```python
    import requests
    import json

    # Path of the audio to be recognized; make sure it is accessible from the machine serving the model
    file = '/path/to/input.wav'

    # Pass the parameters of the prediction method as keys; here the key is "audio_file"
    data = {"audio_file": file}

    # Send a POST request; the content type should be JSON, and the IP in the URL should be the serving machine's IP
    url = "http://127.0.0.1:8866/predict/u2_conformer_wenetspeech"

    # Set the POST request headers to application/json
    headers = {"Content-Type": "application/json"}

    r = requests.post(url=url, headers=headers, data=json.dumps(data))
    print(r.json())
    ```
## V. Release Note

* 1.0.0

  First release
```shell
$ hub install u2_conformer_wenetspeech
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import paddle
from paddleaudio import load, save_wav
from paddlespeech.cli import ASRExecutor
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
@moduleinfo(
name="u2_conformer_wenetspeech", version="1.0.0", summary="", author="Wenet", author_email="", type="audio/asr")
class U2Conformer(paddle.nn.Layer):
def __init__(self):
super(U2Conformer, self).__init__()
self.asr_executor = ASRExecutor()
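        # ASRExecutor (from paddlespeech.cli) performs the actual recognition; the
        # keyword arguments below are forwarded to it on every call.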
self.asr_kw_args = {
'model': 'conformer_wenetspeech',
'lang': 'zh',
'sample_rate': 16000,
'config': None, # Set `config` and `ckpt_path` to None to use pretrained model.
'ckpt_path': None,
}
@staticmethod
def check_audio(audio_file):
assert audio_file.endswith('.wav'), 'Input file must be a wave file `*.wav`.'
sig, sample_rate = load(audio_file)
if sample_rate != 16000:
sig, _ = load(audio_file, 16000)
audio_file_16k = audio_file[:audio_file.rindex('.')] + '_16k.wav'
logger.info('Resampling to 16000 sample rate to new audio file: {}'.format(audio_file_16k))
save_wav(sig, 16000, audio_file_16k)
return audio_file_16k
else:
return audio_file
@serving
def speech_recognize(self, audio_file, device='cpu'):
assert os.path.isfile(audio_file), 'File not exists: {}'.format(audio_file)
audio_file = self.check_audio(audio_file)
text = self.asr_executor(audio_file=audio_file, device=device, **self.asr_kw_args)
return text
# deepvoice3_ljspeech

|Model Name|deepvoice3_ljspeech|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|DeepVoice3|
|Dataset|LJSpeech-1.1|
|Fine-tuning supported|No|
|Model Size|58MB|
|Latest update date|2020-10-27|
|Data metrics|-|

## I. Basic Information

### Module Introduction

Deep Voice 3 is an end-to-end TTS model released by Baidu Research in 2017 (the paper was accepted at ICLR 2018). It is a seq2seq model based on convolutional neural networks and attention; since it contains no recurrent networks, it can be trained in parallel and is much faster than RNN-based models. Deep Voice 3 can learn the characteristics of multiple speakers and can be paired with several vocoders. deepvoice3_ljspeech is an English TTS model pretrained on the LJSpeech English speech dataset; it supports prediction only.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/Parakeet/release/v0.1/examples/deepvoice3/images/model_architecture.png" hspace='10'/> <br/>
</p>

For more details, please refer to the paper [Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning](https://arxiv.org/abs/1710.07654); the implementation is available in [Parakeet](https://github.com/PaddlePaddle/Parakeet).

## II. Installation

- ### 1. System dependencies

  - libsndfile

    For Ubuntu users, run:
    ```
    sudo apt-get install libsndfile1
    ```
    For CentOS users, run:
    ```
    sudo yum install libsndfile
    ```

- ### 2. Environment dependencies

  - 2.0.0 > paddlepaddle >= 1.8.2

  - 2.0.0 > paddlehub >= 1.7.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install deepvoice3_ljspeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Model API Prediction

- ### 1. Command-line prediction

  - ```shell
    $ hub run deepvoice3_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
    ```
  - This invokes the text-to-speech model from the command line; for more details see [PaddleHub command-line instructions](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction code example

  - ```python
    import paddlehub as hub
    import soundfile as sf

    # Load deepvoice3_ljspeech module.
    module = hub.Module(name="deepvoice3_ljspeech")

    # Synthesize audio for the input texts.
    test_texts = ['Simple as this proposition is, it is necessary to be stated',
                  'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
    wavs, sample_rate = module.synthesize(texts=test_texts)
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

- ### 3. API

  - ```python
    def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
    ```
    - Prediction API that synthesizes audio waveforms from the input texts.

    - **Parameters**
      - texts (list\[str\]): texts to be synthesized;
      - use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA\_VISIBLE\_DEVICES environment variable first**;
      - vocoder: vocoder to use, either "griffin-lim" or "waveflow"

    - **Returns**
      - wavs (list): synthesis results; each element is the audio waveform of the corresponding input text and can be further processed or saved with `soundfile.write`.
      - sample\_rate (int): sample rate of the synthesized audio.

## IV. Server Deployment

- PaddleHub Serving can deploy an online text-to-speech service, and the API can be used by online web applications.

- ### Step 1: Start PaddleHub Serving

  - Run the start command:
  - ```shell
    $ hub serving start -m deepvoice3_ljspeech
    ```
  - This deploys the text-to-speech API service; the default port is 8866.

  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines below send a prediction request and fetch the result

  - ```python
    import requests
    import json

    import soundfile as sf

    # Send an HTTP request
    data = {'texts':['Simple as this proposition is, it is necessary to be stated',
                     'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
            'use_gpu':False}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/deepvoice3_ljspeech"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Save the results
    result = r.json()["results"]
    wavs = result["wavs"]
    sample_rate = result["sample_rate"]
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

## V. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install deepvoice3_ljspeech
  ```
# fastspeech_ljspeech

|Model Name|fastspeech_ljspeech|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|FastSpeech|
|Dataset|LJSpeech-1.1|
|Fine-tuning supported|No|
|Model Size|320MB|
|Latest update date|2020-10-27|
|Data metrics|-|

## I. Basic Information

### Module Introduction

FastSpeech is a feed-forward network based on the Transformer. The authors extract attention alignments from an encoder-decoder teacher model to predict phoneme durations, and a length regulator expands the text sequence to match the length of the target mel-spectrogram, so that mel-spectrograms can be generated in parallel. The model largely eliminates word skipping and repetition in hard cases, allows the speaking speed to be adjusted smoothly and, most importantly, greatly speeds up mel-spectrogram generation. fastspeech_ljspeech is an English TTS model pretrained on the LJSpeech English speech dataset; it supports prediction only.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/Parakeet/release/v0.1/examples/fastspeech/images/model_architecture.png" hspace='10'/> <br/>
</p>

For more details, please refer to the paper [FastSpeech: Fast, Robust and Controllable Text to Speech](https://arxiv.org/abs/1905.09263); the implementation is available in [Parakeet](https://github.com/PaddlePaddle/Parakeet).

## II. Installation

- ### 1. System dependencies

  - libsndfile

    For Ubuntu users, run:
    ```
    sudo apt-get install libsndfile1
    ```
    For CentOS users, run:
    ```
    sudo yum install libsndfile
    ```

- ### 2. Environment dependencies

  - 2.0.0 > paddlepaddle >= 1.8.2

  - 2.0.0 > paddlehub >= 1.7.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install fastspeech_ljspeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Model API Prediction

- ### 1. Command-line prediction

  - ```shell
    $ hub run fastspeech_ljspeech --input_text='Simple as this proposition is, it is necessary to be stated' --use_gpu True --vocoder griffin-lim
    ```
  - This invokes the text-to-speech model from the command line; for more details see [PaddleHub command-line instructions](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction code example

  - ```python
    import paddlehub as hub
    import soundfile as sf

    # Load fastspeech_ljspeech module.
    module = hub.Module(name="fastspeech_ljspeech")

    # Synthesize audio for the input texts.
    test_texts = ['Simple as this proposition is, it is necessary to be stated',
                  'Parakeet stands for Paddle PARAllel text-to-speech toolkit']
    wavs, sample_rate = module.synthesize(texts=test_texts)
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

- ### 3. API

  - ```python
    def synthesize(texts, use_gpu=False, speed=1.0, vocoder="griffin-lim"):
    ```
    - Prediction API that synthesizes audio waveforms from the input texts.

    - **Parameters**
      - texts (list\[str\]): texts to be synthesized;
      - use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA\_VISIBLE\_DEVICES environment variable first**;
      - speed (float): speech speed; 1.0 means the original speed.
      - vocoder: vocoder to use, either "griffin-lim" or "waveflow"

    - **Returns**
      - wavs (list): synthesis results; each element is the audio waveform of the corresponding input text and can be further processed or saved with `soundfile.write`.
      - sample\_rate (int): sample rate of the synthesized audio.

## IV. Server Deployment

- PaddleHub Serving can deploy an online text-to-speech service, and the API can be used by online web applications.

- ### Step 1: Start PaddleHub Serving

  - Run the start command:
  - ```shell
    $ hub serving start -m fastspeech_ljspeech
    ```
  - This deploys the text-to-speech API service; the default port is 8866.

  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines below send a prediction request and fetch the result

  - ```python
    import requests
    import json

    import soundfile as sf

    # Send an HTTP request
    data = {'texts':['Simple as this proposition is, it is necessary to be stated',
                     'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
            'use_gpu':False}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/fastspeech_ljspeech"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Save the results
    result = r.json()["results"]
    wavs = result["wavs"]
    sample_rate = result["sample_rate"]
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

## V. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install fastspeech_ljspeech
  ```
# transformer_tts_ljspeech

|Model Name|transformer_tts_ljspeech|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|Transformer|
|Dataset|LJSpeech-1.1|
|Fine-tuning supported|No|
|Model Size|54MB|
|Latest update date|2020-10-27|
|Data metrics|-|

## I. Basic Information

### Module Introduction

TransformerTTS is an end-to-end text-to-speech model built on the Transformer architecture; it combines ideas from the Transformer and Tacotron2 and achieves satisfying results. Since the recurrent connections of RNNs are removed, the decoder inputs can be provided in parallel for parallel training, which greatly speeds up model training. transformer_tts_ljspeech is an English TTS model pretrained on the LJSpeech English speech dataset; it supports prediction only.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/Parakeet/release/v0.1/examples/transformer_tts/images/model_architecture.jpg" hspace='10'/> <br/>
</p>

For more details, please refer to the paper [Neural Speech Synthesis with Transformer Network](https://arxiv.org/abs/1809.08895); the implementation is available in [Parakeet](https://github.com/PaddlePaddle/Parakeet).

## II. Installation

- ### 1. System dependencies

  - libsndfile

    For Ubuntu users, run:
    ```
    sudo apt-get install libsndfile1
    ```
    For CentOS users, run:
    ```
    sudo yum install libsndfile
    ```

- ### 2. Environment dependencies

  - 2.0.0 > paddlepaddle >= 1.8.2

  - 2.0.0 > paddlehub >= 1.7.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 3. Installation

  - ```shell
    $ hub install transformer_tts_ljspeech
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Model API Prediction

- ### 1. Command-line prediction

  - ```shell
    $ hub run transformer_tts_ljspeech --input_text="Life was like a box of chocolates, you never know what you're gonna get." --use_gpu True --vocoder griffin-lim
    ```
  - This invokes the text-to-speech model from the command line; for more details see [PaddleHub command-line instructions](https://github.com/shinichiye/PaddleHub/blob/release/v2.1/docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction code example

  - ```python
    import paddlehub as hub
    import soundfile as sf

    # Load transformer_tts_ljspeech module.
    module = hub.Module(name="transformer_tts_ljspeech")

    # Synthesize audio for the input texts.
    test_texts = ["Life was like a box of chocolates, you never know what you're gonna get."]
    wavs, sample_rate = module.synthesize(texts=test_texts, use_gpu=True, vocoder="waveflow")
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

- ### 3. API

  - ```python
    def synthesize(texts, use_gpu=False, vocoder="griffin-lim"):
    ```
    - Prediction API that synthesizes audio waveforms from the input texts.

    - **Parameters**
      - texts (list\[str\]): texts to be synthesized;
      - use\_gpu (bool): whether to use GPU; **if you use GPU, set the CUDA\_VISIBLE\_DEVICES environment variable first**;
      - vocoder: vocoder to use, either "griffin-lim" or "waveflow"

    - **Returns**
      - wavs (list): synthesis results; each element is the audio waveform of the corresponding input text and can be further processed or saved with `soundfile.write`.
      - sample\_rate (int): sample rate of the synthesized audio.

## IV. Server Deployment

- PaddleHub Serving can deploy an online text-to-speech service, and the API can be used by online web applications.

- ### Step 1: Start PaddleHub Serving

  - Run the start command:
  - ```shell
    $ hub serving start -m transformer_tts_ljspeech
    ```
  - This deploys the text-to-speech API service; the default port is 8866.

  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines below send a prediction request and fetch the result

  - ```python
    import requests
    import json

    import soundfile as sf

    # Send an HTTP request
    data = {'texts':['Simple as this proposition is, it is necessary to be stated',
                     'Parakeet stands for Paddle PARAllel text-to-speech toolkit'],
            'use_gpu':False}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/transformer_tts_ljspeech"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Save the results
    result = r.json()["results"]
    wavs = result["wavs"]
    sample_rate = result["sample_rate"]
    for index, wav in enumerate(wavs):
        sf.write(f"{index}.wav", wav, sample_rate)
    ```

## V. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install transformer_tts_ljspeech
  ```
# ge2e_fastspeech2_pwgan

|Model Name|ge2e_fastspeech2_pwgan|
| :--- | :---: |
|Category|Speech - Voice Cloning|
|Network|FastSpeech2|
|Dataset|AISHELL-3|
|Fine-tuning supported|No|
|Model Size|462MB|
|Latest update date|2021-12-17|
|Data metrics|-|

## I. Basic Information

### Module Introduction

Voice cloning synthesizes speech for a given text with a specific timbre, so that the generated audio carries the characteristics of the target speaker.

When training a voice cloning model, the target-timbre recording is fed to a Speaker Encoder, which extracts the speaker characteristics (timbre) of the recording as a Speaker Embedding. When the model then learns to re-synthesize speech with this timbre, the speaker embedding is added as an extra condition alongside the input text.

At prediction time, a new recording of the target timbre is fed to the Speaker Encoder to extract its speaker embedding, so that, given a piece of text and a target-timbre recording, the model generates a speech clip of the target voice reading that text.

![](https://ai-studio-static-online.cdn.bcebos.com/982ab955b87244d3bae3b003aff8e28d9ec159ff0d6246a79757339076dfe7d4)

`ge2e_fastspeech2_pwgan` is a voice cloning model for Chinese. It uses LSTMSpeakerEncoder for speaker feature extraction, FastSpeech2 for target audio feature synthesis, and PWGan for waveform generation.

For details about the model, see [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
## II. Installation

- ### 1. Environment dependencies

  - paddlepaddle >= 2.2.0
  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 2. Installation

  - ```shell
    $ hub install ge2e_fastspeech2_pwgan
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API Prediction

- ### 1. Prediction code example

  - ```python
    import paddlehub as hub

    model = hub.Module(name='ge2e_fastspeech2_pwgan', output_dir='./', speaker_audio='/data/man.wav')  # Audio file that provides the target timbre

    texts = [
        '语音的表现形式在未来将变得越来越重要$',
        '今天的天气怎么样$', ]
    wavs = model.generate(texts, use_gpu=True)

    for text, wav in zip(texts, wavs):
        print('='*30)
        print(f'Text: {text}')
        print(f'Wav: {wav}')
    ```
- ### 2. API

  - ```python
    def __init__(speaker_audio: str = None,
                 output_dir: str = './')
    ```
    - Initializes the module; the target-timbre audio file and the output directory are configurable.

    - **Parameters**
      - `speaker_audio`(str): path to the target speaker's audio file (*.wav); defaults to None (a default female voice is used as the target timbre).
      - `output_dir`(str): output directory for the synthesized audio, the current directory by default.

  - ```python
    def get_speaker_embedding()
    ```
    - Gets the target speaker embedding of the model.

    - **Returns**
      - `results`(numpy.ndarray): a numpy array of length 256 representing the target speaker's characteristics.

  - ```python
    def set_speaker_embedding(speaker_audio: str)
    ```
    - Sets the target speaker embedding of the model.

    - **Parameters**
      - `speaker_audio`(str): required, path to the target speaker's audio file (*.wav).

  - ```python
    def generate(data: Union[str, List[str]], use_gpu: bool = False):
    ```
    - Synthesizes audio files of the target speaker reading the input texts (see the sketch below).

    - **Parameters**
      - `data`(Union[str, List[str]]): required, content texts of the target audio; currently only Chinese is supported and punctuation is not supported.
      - `use_gpu`(bool): whether to run the computation on GPU, False by default.
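
  - A small illustrative sketch that switches the target timbre after the module has been created and then clones a new sentence (the wav path is a placeholder; the method names and return values follow the module source included later in this commit):

  - ```python
    import paddlehub as hub

    model = hub.Module(name='ge2e_fastspeech2_pwgan', output_dir='./')

    # Replace the default timbre with a new reference recording.
    model.set_speaker_embedding('/PATH/TO/SPEAKER.wav')
    print(model.get_speaker_embedding().shape)  # (256,)

    # generate() returns the paths of the synthesized wav files, one per input text.
    wav_files = model.generate(['今天的天气怎么样$'], use_gpu=False)
    print(wav_files)
    ```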
## IV. Release Note

* 1.0.0

  First release
```shell
$ hub install ge2e_fastspeech2_pwgan
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
from typing import List, Union
import numpy as np
import paddle
import soundfile as sf
import yaml
from yacs.config import CfgNode
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from paddlehub.utils.log import logger
from paddlespeech.t2s.frontend.zh_frontend import Frontend
from paddlespeech.t2s.models.fastspeech2 import FastSpeech2
from paddlespeech.t2s.models.fastspeech2 import FastSpeech2Inference
from paddlespeech.t2s.models.parallel_wavegan import PWGGenerator
from paddlespeech.t2s.models.parallel_wavegan import PWGInference
from paddlespeech.t2s.modules.normalizer import ZScore
from paddlespeech.vector.exps.ge2e.audio_processor import SpeakerVerificationPreprocessor
from paddlespeech.vector.models.lstm_speaker_encoder import LSTMSpeakerEncoder
@moduleinfo(
name="ge2e_fastspeech2_pwgan",
version="1.0.0",
summary="",
author="paddlepaddle",
author_email="",
type="audio/voice_cloning",
)
class VoiceCloner(paddle.nn.Layer):
def __init__(self, speaker_audio: str = None, output_dir: str = './'):
super(VoiceCloner, self).__init__()
speaker_encoder_ckpt = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets',
'ge2e_ckpt_0.3/step-3000000.pdparams')
synthesizer_res_dir = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets',
'fastspeech2_nosil_aishell3_vc1_ckpt_0.5')
vocoder_res_dir = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets', 'pwg_aishell3_ckpt_0.5')
# Speaker encoder
self.speaker_processor = SpeakerVerificationPreprocessor(
sampling_rate=16000,
audio_norm_target_dBFS=-30,
vad_window_length=30,
vad_moving_average_width=8,
vad_max_silence_length=6,
mel_window_length=25,
mel_window_step=10,
n_mels=40,
partial_n_frames=160,
min_pad_coverage=0.75,
partial_overlap_ratio=0.5)
self.speaker_encoder = LSTMSpeakerEncoder(n_mels=40, num_layers=3, hidden_size=256, output_size=256)
self.speaker_encoder.set_state_dict(paddle.load(speaker_encoder_ckpt))
self.speaker_encoder.eval()
# Voice synthesizer
with open(os.path.join(synthesizer_res_dir, 'default.yaml'), 'r') as f:
fastspeech2_config = CfgNode(yaml.safe_load(f))
with open(os.path.join(synthesizer_res_dir, 'phone_id_map.txt'), 'r') as f:
phn_id = [line.strip().split() for line in f.readlines()]
model = FastSpeech2(idim=len(phn_id), odim=fastspeech2_config.n_mels, **fastspeech2_config["model"])
model.set_state_dict(paddle.load(os.path.join(synthesizer_res_dir, 'snapshot_iter_96400.pdz'))["main_params"])
model.eval()
stat = np.load(os.path.join(synthesizer_res_dir, 'speech_stats.npy'))
mu, std = stat
mu = paddle.to_tensor(mu)
std = paddle.to_tensor(std)
fastspeech2_normalizer = ZScore(mu, std)
self.sample_rate = fastspeech2_config.fs
self.fastspeech2_inference = FastSpeech2Inference(fastspeech2_normalizer, model)
self.fastspeech2_inference.eval()
# Vocoder
with open(os.path.join(vocoder_res_dir, 'default.yaml')) as f:
pwg_config = CfgNode(yaml.safe_load(f))
vocoder = PWGGenerator(**pwg_config["generator_params"])
vocoder.set_state_dict(
paddle.load(os.path.join(vocoder_res_dir, 'snapshot_iter_1000000.pdz'))["generator_params"])
vocoder.remove_weight_norm()
vocoder.eval()
stat = np.load(os.path.join(vocoder_res_dir, 'feats_stats.npy'))
mu, std = stat
mu = paddle.to_tensor(mu)
std = paddle.to_tensor(std)
pwg_normalizer = ZScore(mu, std)
self.pwg_inference = PWGInference(pwg_normalizer, vocoder)
self.pwg_inference.eval()
# Text frontend
self.frontend = Frontend(phone_vocab_path=os.path.join(synthesizer_res_dir, 'phone_id_map.txt'))
# Speaking embedding
self._speaker_embedding = None
if speaker_audio is None or not os.path.isfile(speaker_audio):
speaker_audio = os.path.join(MODULE_HOME, 'ge2e_fastspeech2_pwgan', 'assets', 'voice_cloning.wav')
            logger.warning(f'Since no speaker audio is specified, the speaker encoder will use the default '
                           f'waveform ({speaker_audio}) to extract the speaker embedding. You can use the '
                           '"set_speaker_embedding()" method to set a new speaker audio for voice cloning.')
self.set_speaker_embedding(speaker_audio)
self.output_dir = os.path.abspath(output_dir)
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
def get_speaker_embedding(self):
return self._speaker_embedding.numpy()
@paddle.no_grad()
def set_speaker_embedding(self, speaker_audio: str):
assert os.path.exists(speaker_audio), f'Speaker audio file: {speaker_audio} does not exists.'
mel_sequences = self.speaker_processor.extract_mel_partials(
self.speaker_processor.preprocess_wav(speaker_audio))
self._speaker_embedding = self.speaker_encoder.embed_utterance(paddle.to_tensor(mel_sequences))
logger.info(f'Speaker embedding has been set from file: {speaker_audio}')
@paddle.no_grad()
def generate(self, data: Union[str, List[str]], use_gpu: bool = False):
assert self._speaker_embedding is not None, f'Set speaker embedding before voice cloning.'
if isinstance(data, str):
data = [data]
elif isinstance(data, list):
            assert len(data) > 0 and isinstance(data[0],
                                                str) and len(data[0]) > 0, 'Input data should be str or List[str].'
        else:
            raise Exception('Input data should be str or List[str].')
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
files = []
for idx, text in enumerate(data):
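            # Text -> phone ids -> mel-spectrogram (FastSpeech2 conditioned on the speaker
            # embedding) -> waveform (Parallel WaveGAN vocoder), then write the wav to disk.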
phone_ids = self.frontend.get_input_ids(text, merge_sentences=True)["phone_ids"][0]
wav = self.pwg_inference(self.fastspeech2_inference(phone_ids, spk_emb=self._speaker_embedding))
output_wav = os.path.join(self.output_dir, f'{idx+1}.wav')
sf.write(output_wav, wav.numpy(), samplerate=self.sample_rate)
files.append(output_wav)
return files
# lstm_tacotron2

|Model Name|lstm_tacotron2|
| :--- | :---: |
|Category|Speech - Text-to-Speech|
|Network|LSTM, Tacotron2, WaveFlow|
|Dataset|AISHELL-3|
|Fine-tuning supported|No|
|Model Size|327MB|
|Latest update date|2021-06-15|
|Data metrics|-|

## I. Basic Information

### Module Introduction

Voice cloning synthesizes speech for a given text with a specific timbre, so that the generated audio carries the characteristics of the target speaker.

When training a voice cloning model, the target-timbre recording is fed to a Speaker Encoder, which extracts the speaker characteristics (timbre) of the recording as a Speaker Embedding. When the model then learns to re-synthesize speech with this timbre, the speaker embedding is added as an extra condition alongside the input text.

At prediction time, a new recording of the target timbre is fed to the Speaker Encoder to extract its speaker embedding, so that, given a piece of text and a target-timbre recording, the model generates a speech clip of the target voice reading that text.

<p align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/982ab955b87244d3bae3b003aff8e28d9ec159ff0d6246a79757339076dfe7d4" hspace='10'/> <br/>
</p>

`lstm_tacotron2` is a voice cloning model for Chinese. It uses LSTMSpeakerEncoder for speaker feature extraction, Tacotron2 for target audio feature synthesis, and WaveFlow for waveform generation.

For more details, please refer to:
- [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf)
- [Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/release/v0.3/parakeet/models)

## II. Installation

- ### 1. Environment dependencies

  - paddlepaddle >= 2.0.0
  - paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)

- ### 2. Installation

  - ```shell
    $ hub install lstm_tacotron2
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)

## III. Model API Prediction

- ### 1. Prediction code example

  - ```python
    import paddlehub as hub

    model = hub.Module(name='lstm_tacotron2', output_dir='/data', speaker_audio='/data/man.wav')  # Audio file that provides the target timbre
    texts = [
        '语音的表现形式在未来将变得越来越重要$',
        '今天的天气怎么样$', ]
    wavs = model.generate(texts, use_gpu=True)

    for text, wav in zip(texts, wavs):
        print('='*30)
        print(f'Text: {text}')
        print(f'Wav: {wav}')
    ```

    Output
    ```
    ==============================
    Text: 语音的表现形式在未来将变得越来越重要$
    Wav: /data/1.wav
    ==============================
    Text: 今天的天气怎么样$
    Wav: /data/2.wav
    ```

- ### 2. API

  - ```python
    def __init__(speaker_audio: str = None,
                 output_dir: str = './')
    ```
    - Initializes the module; the target-timbre audio file and the output directory are configurable.

    - **Parameters**
      - `speaker_audio`(str): path to the target speaker's audio file (*.wav); defaults to None (a default female voice is used as the target timbre).
      - `output_dir`(str): output directory for the synthesized audio, the current directory by default.

  - ```python
    def get_speaker_embedding()
    ```
    - Gets the target speaker embedding of the model.

    - **Returns**
      - `results`(numpy.ndarray): a numpy array of length 256 representing the target speaker's characteristics.

  - ```python
    def set_speaker_embedding(speaker_audio: str)
    ```
    - Sets the target speaker embedding of the model.

    - **Parameters**
      - `speaker_audio`(str): required, path to the target speaker's audio file (*.wav).

  - ```python
    def generate(data: List[str], batch_size: int = 1, use_gpu: bool = False):
    ```
    - Synthesizes audio files of the target speaker reading the input texts.

    - **Parameters**
      - `data`(List[str]): required, content texts of the target audio; currently only Chinese is supported and punctuation is not supported.
      - `batch_size`(int): optional, batch size used when synthesizing speech, 1 by default.
      - `use_gpu`(bool): whether to run the computation on GPU, False by default.

## IV. Release Note

* 1.0.0

  First release

  ```shell
  $ hub install lstm_tacotron2==1.0.0
  ```
# styleganv2_editing

|Model Name|styleganv2_editing|
| :--- | :---: |
|Category|Image - Image Generation|
|Network|StyleGAN V2|
|Dataset|-|
|Fine-tuning supported|No|
|Model Size|190MB|
|Latest update date|2021-12-15|
|Data metrics|-|
## I. Basic Information

- ### Application effect display

  - Sample results:

    <p align="center">
    <img src="https://user-images.githubusercontent.com/22424850/146483720-fb0ea3c0-b259-4ad6-b176-966675b9b164.png" width = "40%" hspace='10'/>
    <br />
    Input image
    <br />
    <img src="https://user-images.githubusercontent.com/22424850/146483730-3104795e-4ee6-43de-b4dc-b7760d502b50.png" width = "40%" hspace='10'/>
    <br />
    Output image (age edited)
    <br />
    </p>

- ### Module Introduction

  - StyleGAN V2 generates images from style vectors; the Editing module manipulates attributes of the generated image using attribute direction vectors obtained beforehand by classifying and regressing the style vectors of many images.
## II. Installation

- ### 1. Environment dependencies

  - ppgan

- ### 2. Installation

  - ```shell
    $ hub install styleganv2_editing
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API Prediction

- ### 1. Command-line prediction

  - ```shell
    # Read from a file
    $ hub run styleganv2_editing --input_path "/PATH/TO/IMAGE" --direction_name age --direction_offset 5
    ```
  - This invokes the face editing model from the command line; for more details see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction code example

  - ```python
    import paddlehub as hub

    module = hub.Module(name="styleganv2_editing")
    input_path = ["/PATH/TO/IMAGE"]
    # Read from a file
    module.generate(paths=input_path, direction_name = 'age', direction_offset = 5, output_dir='./editing_result/', use_gpu=True)
    ```
- ### 3. API

  - ```python
    generate(self, images=None, paths=None, direction_name = 'age', direction_offset = 0.0, output_dir='./editing_result/', use_gpu=False, visualization=True)
    ```
    - Face editing generation API (see the sketch below).

    - **Parameters**
      - images (list\[numpy.ndarray\]): image data <br/>
      - paths (list\[str\]): image paths;<br/>
      - direction_name (str): name of the attribute to edit; for ffhq-conf-f the prepared attributes are: age, eyes_open, eye_distance, eye_eyebrow_distance, eye_ratio, gender, lip_ratio, mouth_open, mouth_ratio, nose_mouth_distance, nose_ratio, nose_tip, pitch, roll, smile, yaw <br/>
      - direction_offset (float): offset strength of the attribute <br/>
      - output\_dir (str): directory for saving the results; <br/>
      - use\_gpu (bool): whether to use GPU;<br/>
      - visualization(bool): whether to save the results to a local folder
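
  - Judging from the module source included later in this commit, each element of the returned list is a `(src_img, dst_img, dst_latent)` tuple with RGB image arrays; a minimal sketch of consuming it directly (the image path is a placeholder):

  - ```python
    import cv2
    import paddlehub as hub

    module = hub.Module(name="styleganv2_editing")
    results = module.generate(paths=["/PATH/TO/IMAGE"], direction_name='age', direction_offset=5,
                              use_gpu=False, visualization=False)

    # Images come back in RGB order, so flip the channels before writing with OpenCV.
    src_img, dst_img, dst_latent = results[0]
    cv2.imwrite('edited.png', dst_img[:, :, ::-1])
    print(dst_latent.shape)
    ```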
## IV. Server Deployment

- PaddleHub Serving can deploy an online face editing service.

- ### Step 1: Start PaddleHub Serving

  - Run the start command:
  - ```shell
    $ hub serving start -m styleganv2_editing
    ```
  - This deploys the online face editing API service; the default port is 8866.

  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.

- ### Step 2: Send a prediction request

  - With the server configured, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64


    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tostring()).decode('utf8')

    # Send an HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/styleganv2_editing"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```

## V. Release Note

* 1.0.0

  First release
- ```shell
$ hub install styleganv2_editing==1.0.0
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import random
import numpy as np
import paddle
from ppgan.models.generators import StyleGANv2Generator
from ppgan.utils.download import get_path_from_url
from ppgan.utils.visual import make_grid, tensor2img, save_image
model_cfgs = {
'ffhq-config-f': {
'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f.pdparams',
'size': 1024,
'style_dim': 512,
'n_mlp': 8,
'channel_multiplier': 2
},
'animeface-512': {
'model_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-animeface-512.pdparams',
'size': 512,
'style_dim': 512,
'n_mlp': 8,
'channel_multiplier': 2
}
}
@paddle.no_grad()
def get_mean_style(generator):
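    # Average the generator's mean latent over 10 random draws; the result is used
    # as the truncation target when sampling and style-mixing below.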
mean_style = None
for i in range(10):
style = generator.mean_latent(1024)
if mean_style is None:
mean_style = style
else:
mean_style += style
mean_style /= 10
return mean_style
@paddle.no_grad()
def sample(generator, mean_style, n_sample):
image = generator(
[paddle.randn([n_sample, generator.style_dim])],
truncation=0.7,
truncation_latent=mean_style,
)[0]
return image
@paddle.no_grad()
def style_mixing(generator, mean_style, n_source, n_target):
source_code = paddle.randn([n_source, generator.style_dim])
target_code = paddle.randn([n_target, generator.style_dim])
resolution = 2**((generator.n_latent + 2) // 2)
images = [paddle.ones([1, 3, resolution, resolution]) * -1]
source_image = generator([source_code], truncation_latent=mean_style, truncation=0.7)[0]
target_image = generator([target_code], truncation_latent=mean_style, truncation=0.7)[0]
images.append(source_image)
for i in range(n_target):
image = generator(
[target_code[i].unsqueeze(0).tile([n_source, 1]), source_code],
truncation_latent=mean_style,
truncation=0.7,
)[0]
images.append(target_image[i].unsqueeze(0))
images.append(image)
images = paddle.concat(images, 0)
return images
class StyleGANv2Predictor:
def __init__(self,
output_path='output_dir',
weight_path=None,
model_type=None,
seed=None,
size=1024,
style_dim=512,
n_mlp=8,
channel_multiplier=2):
self.output_path = output_path
if weight_path is None:
if model_type in model_cfgs.keys():
weight_path = get_path_from_url(model_cfgs[model_type]['model_urls'])
size = model_cfgs[model_type].get('size', size)
style_dim = model_cfgs[model_type].get('style_dim', style_dim)
n_mlp = model_cfgs[model_type].get('n_mlp', n_mlp)
channel_multiplier = model_cfgs[model_type].get('channel_multiplier', channel_multiplier)
checkpoint = paddle.load(weight_path)
else:
raise ValueError('Predictor need a weight path or a pretrained model type')
else:
checkpoint = paddle.load(weight_path)
self.generator = StyleGANv2Generator(size, style_dim, n_mlp, channel_multiplier)
self.generator.set_state_dict(checkpoint)
self.generator.eval()
if seed is not None:
paddle.seed(seed)
random.seed(seed)
np.random.seed(seed)
def run(self, n_row=3, n_col=5):
os.makedirs(self.output_path, exist_ok=True)
mean_style = get_mean_style(self.generator)
img = sample(self.generator, mean_style, n_row * n_col)
save_image(tensor2img(make_grid(img, nrow=n_col)), f'{self.output_path}/sample.png')
for j in range(2):
img = style_mixing(self.generator, mean_style, n_col, n_row)
save_image(tensor2img(make_grid(img, nrow=n_col + 1)), f'{self.output_path}/sample_mixing_{j}.png')
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import cv2
import numpy as np
import paddle
from ppgan.utils.download import get_path_from_url
from .basemodel import StyleGANv2Predictor
model_cfgs = {
'ffhq-config-f': {
'direction_urls': 'https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-directions.pdparams'
}
}
def make_image(tensor):
return (((tensor.detach() + 1) / 2 * 255).clip(min=0, max=255).transpose((0, 2, 3, 1)).numpy().astype('uint8'))
class StyleGANv2EditingPredictor(StyleGANv2Predictor):
def __init__(self, model_type=None, direction_path=None, **kwargs):
super().__init__(model_type=model_type, **kwargs)
if direction_path is None and model_type is not None:
assert model_type in model_cfgs, f'There is not any pretrained direction file for {model_type} model.'
direction_path = get_path_from_url(model_cfgs[model_type]['direction_urls'])
self.directions = paddle.load(direction_path)
@paddle.no_grad()
def run(self, latent, direction, offset):
latent = paddle.to_tensor(latent).unsqueeze(0).astype('float32')
direction = self.directions[direction].unsqueeze(0).astype('float32')
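        # Batch the original latent with the offset latent so a single generator pass
        # returns both the source image and the edited image.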
latent_n = paddle.concat([latent, latent + offset * direction], 0)
generator = self.generator
img_gen, _ = generator([latent_n], input_is_latent=True, randomize_noise=False)
imgs = make_image(img_gen)
src_img = imgs[0]
dst_img = imgs[1]
dst_latent = (latent + offset * direction)[0].numpy().astype('float32')
return src_img, dst_img, dst_latent
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import copy
import paddle
import paddlehub as hub
from paddlehub.module.module import moduleinfo, runnable, serving
import numpy as np
import cv2
from skimage.io import imread
from skimage.transform import rescale, resize
from .model import StyleGANv2EditingPredictor
from .util import base64_to_cv2
@moduleinfo(
name="styleganv2_editing",
type="CV/style_transfer",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class styleganv2_editing:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "stylegan2-ffhq-config-f-directions.pdparams")
self.network = StyleGANv2EditingPredictor(direction_path=self.pretrained_model, model_type='ffhq-config-f')
self.pixel2style2pixel_module = hub.Module(name='pixel2style2pixel')
def generate(self,
images=None,
paths=None,
direction_name='age',
direction_offset=0.0,
output_dir='./editing_result/',
use_gpu=False,
visualization=True):
'''
images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR(read by cv2).
paths (list[str]): paths to image.
direction_name(str): Attribute to be manipulated,For ffhq-conf-f, we have: age, eyes_open, eye_distance, eye_eyebrow_distance, eye_ratio, gender, lip_ratio, mouth_open, mouth_ratio, nose_mouth_distance, nose_ratio, nose_tip, pitch, roll, smile, yaw.
direction_offset(float): Offset strength of the attribute.
output_dir: the dir to save the results
use_gpu: if True, use gpu to perform the computation, otherwise cpu.
visualization: if True, save results in output_dir.
'''
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
print('No image provided. Please input an image or a image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
_, latent = self.pixel2style2pixel_module.network.run(image)
out = self.network.run(latent, direction_name, direction_offset)
results.append(out)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
_, latent = self.pixel2style2pixel_module.network.run(image)
out = self.network.run(latent, direction_name, direction_offset)
results.append(out)
if visualization == True:
if not os.path.exists(output_dir):
os.makedirs(output_dir, exist_ok=True)
for i, out in enumerate(results):
if out is not None:
cv2.imwrite(os.path.join(output_dir, 'src_{}.png'.format(i)), out[0][:, :, ::-1])
cv2.imwrite(os.path.join(output_dir, 'dst_{}.png'.format(i)), out[1][:, :, ::-1])
np.save(os.path.join(output_dir, 'dst_{}.npy'.format(i)), out[2])
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.generate(
paths=[self.args.input_path],
direction_name=self.args.direction_name,
direction_offset=self.args.direction_offset,
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.generate(images=images_decode, **kwargs)
tolist = [result.tolist() for result in results]
return tolist
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='editing_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
self.arg_input_group.add_argument(
'--direction_name',
type=str,
default='age',
help=
"Attribute to be manipulated,For ffhq-conf-f, we have: age, eyes_open, eye_distance, eye_eyebrow_distance, eye_ratio, gender, lip_ratio, mouth_open, mouth_ratio, nose_mouth_distance, nose_ratio, nose_tip, pitch, roll, smile, yaw."
)
self.arg_input_group.add_argument('--direction_offset', type=float, help="Offset strength of the attribute.")
import base64
import cv2
import numpy as np
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# wav2lip

|Model Name|wav2lip|
| :--- | :---: |
|Category|Image - Video Generation|
|Network|Wav2Lip|
|Dataset|LRS2|
|Fine-tuning supported|No|
|Model Size|139MB|
|Latest update date|2021-12-14|
|Data metrics|-|
## I. Basic Information

- ### Application effect display

  - Sample results:

    <p align="center">
    <img src="https://user-images.githubusercontent.com/22424850/146481773-4ec50285-3b13-4a86-84a2-b105787b63d1.png" width = "40%" hspace='10'/>
    <br />
    Input image
    <br />
    <img src="https://user-images.githubusercontent.com/22424850/146482210-5f309fc3-7582-452d-bcf5-f2c54b5c8dc8.gif" width = "40%" hspace='10'/>
    <br />
    Output video
    <br />
    </p>

- ### Module Introduction

  - Wav2Lip generates lip motion for a person in a video that is synchronized with an input audio track, so that the mouth movements in the generated video match the speech. It can produce a lip-synced video from a single static image as well as convert the lips of an existing video to match a target speech. The key to its accurate lip-audio synchronization is a lip-sync discriminator that forces the generator to keep producing accurate and realistic lip motion. In addition, it improves visual quality by using several consecutive frames instead of a single frame in the discriminator and by adding a visual quality loss (rather than only a contrastive loss) to account for temporal correlation. Wav2Lip works for any face and any language, achieves high accuracy on arbitrary videos, blends seamlessly with the original video, and can also be used on animated faces.
## II. Installation

- ### 1. Environment dependencies

  - ffmpeg
  - libsndfile

- ### 2. Installation

  - ```shell
    $ hub install wav2lip
    ```
  - If you run into problems during installation, see: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API Prediction

- ### 1. Command-line prediction

  - ```shell
    # Read from a file
    $ hub run wav2lip --face "/PATH/TO/VIDEO or IMAGE" --audio "/PATH/TO/AUDIO"
    ```
  - This invokes the lip-sync generation model from the command line; for more details see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)

- ### 2. Prediction code example

  - ```python
    import paddlehub as hub

    module = hub.Module(name="wav2lip")
    face_input_path = "/PATH/TO/VIDEO or IMAGE"
    audio_input_path = "/PATH/TO/AUDIO"
    module.wav2lip_transfer(face=face_input_path, audio=audio_input_path, output_dir='./transfer_result/', use_gpu=True)
    ```
- ### 3. API

  - ```python
    def wav2lip_transfer(face, audio, output_dir ='./output_result/', use_gpu=False, visualization=True):
    ```
    - Lip-sync generation API (see the sketch below).

    - **Parameters**
      - face (str): path to a video or image file<br/>
      - audio (str): path to an audio file<br/>
      - output\_dir (str): directory for saving the results; <br/>
      - use\_gpu (bool): whether to use GPU;<br/>
      - visualization(bool): whether to save the results to a local folder
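
  - A small illustrative sketch of the still-image case (the paths are placeholders): judging from the predictor source included later in this commit, when `face` points to a jpg/png/jpeg image the predictor switches to static mode and repeats that frame for the whole duration of the audio.

  - ```python
    import paddlehub as hub

    module = hub.Module(name="wav2lip")

    # A single portrait image plus a speech track; the lip region of the repeated
    # frame is re-generated to match the audio.
    module.wav2lip_transfer(face="/PATH/TO/IMAGE.png",
                            audio="/PATH/TO/AUDIO.wav",
                            output_dir='./transfer_result/',
                            use_gpu=False,
                            visualization=True)
    ```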
## IV. Release Note

* 1.0.0

  First release
- ```shell
$ hub install wav2lip==1.0.0
```
from os import listdir, path, makedirs
import platform
import numpy as np
import scipy, cv2, os, sys, argparse
import json, subprocess, random, string
from tqdm import tqdm
from glob import glob
import paddle
from paddle.utils.download import get_weights_path_from_url
from ppgan.faceutils import face_detection
from ppgan.utils import audio
from ppgan.models.generators.wav2lip import Wav2Lip
WAV2LIP_WEIGHT_URL = 'https://paddlegan.bj.bcebos.com/models/wav2lip_hq.pdparams'
mel_step_size = 16
class Wav2LipPredictor:
def __init__(self,
checkpoint_path=None,
static=False,
fps=25,
pads=[0, 10, 0, 0],
face_det_batch_size=16,
wav2lip_batch_size=128,
resize_factor=1,
crop=[0, -1, 0, -1],
box=[-1, -1, -1, -1],
rotate=False,
nosmooth=False,
face_detector='sfd',
face_enhancement=False):
self.img_size = 96
self.checkpoint_path = checkpoint_path
self.static = static
self.fps = fps
self.pads = pads
self.face_det_batch_size = face_det_batch_size
self.wav2lip_batch_size = wav2lip_batch_size
self.resize_factor = resize_factor
self.crop = crop
self.box = box
self.rotate = rotate
self.nosmooth = nosmooth
self.face_detector = face_detector
self.face_enhancement = face_enhancement
if face_enhancement:
from ppgan.faceutils.face_enhancement import FaceEnhancement
self.faceenhancer = FaceEnhancement()
makedirs('./temp', exist_ok=True)
def get_smoothened_boxes(self, boxes, T):
for i in range(len(boxes)):
if i + T > len(boxes):
window = boxes[len(boxes) - T:]
else:
window = boxes[i:i + T]
boxes[i] = np.mean(window, axis=0)
return boxes
def face_detect(self, images):
detector = face_detection.FaceAlignment(
face_detection.LandmarksType._2D, flip_input=False, face_detector=self.face_detector)
batch_size = self.face_det_batch_size
while 1:
predictions = []
try:
for i in tqdm(range(0, len(images), batch_size)):
predictions.extend(detector.get_detections_for_batch(np.array(images[i:i + batch_size])))
except RuntimeError:
if batch_size == 1:
raise RuntimeError(
'Image too big to run face detection on GPU. Please use the --resize_factor argument')
batch_size //= 2
print('Recovering from OOM error; New batch size: {}'.format(batch_size))
continue
break
results = []
pady1, pady2, padx1, padx2 = self.pads
for rect, image in zip(predictions, images):
if rect is None:
cv2.imwrite('temp/faulty_frame.jpg', image) # check this frame where the face was not detected.
raise ValueError('Face not detected! Ensure the video contains a face in all the frames.')
y1 = max(0, rect[1] - pady1)
y2 = min(image.shape[0], rect[3] + pady2)
x1 = max(0, rect[0] - padx1)
x2 = min(image.shape[1], rect[2] + padx2)
results.append([x1, y1, x2, y2])
boxes = np.array(results)
if not self.nosmooth: boxes = self.get_smoothened_boxes(boxes, T=5)
results = [[image[y1:y2, x1:x2], (y1, y2, x1, x2)] for image, (x1, y1, x2, y2) in zip(images, boxes)]
del detector
return results
def datagen(self, frames, mels):
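        # Yield model-ready batches: face crops (lower half masked and stacked with the
        # originals along the channel axis), mel chunks, original frames and face coordinates.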
img_batch, mel_batch, frame_batch, coords_batch = [], [], [], []
if self.box[0] == -1:
if not self.static:
face_det_results = self.face_detect(frames) # BGR2RGB for CNN face detection
else:
face_det_results = self.face_detect([frames[0]])
else:
print('Using the specified bounding box instead of face detection...')
y1, y2, x1, x2 = self.box
face_det_results = [[f[y1:y2, x1:x2], (y1, y2, x1, x2)] for f in frames]
for i, m in enumerate(mels):
idx = 0 if self.static else i % len(frames)
frame_to_save = frames[idx].copy()
face, coords = face_det_results[idx].copy()
face = cv2.resize(face, (self.img_size, self.img_size))
img_batch.append(face)
mel_batch.append(m)
frame_batch.append(frame_to_save)
coords_batch.append(coords)
if len(img_batch) >= self.wav2lip_batch_size:
img_batch, mel_batch = np.asarray(img_batch), np.asarray(mel_batch)
img_masked = img_batch.copy()
img_masked[:, self.img_size // 2:] = 0
img_batch = np.concatenate((img_masked, img_batch), axis=3) / 255.
mel_batch = np.reshape(mel_batch, [len(mel_batch), mel_batch.shape[1], mel_batch.shape[2], 1])
yield img_batch, mel_batch, frame_batch, coords_batch
img_batch, mel_batch, frame_batch, coords_batch = [], [], [], []
if len(img_batch) > 0:
img_batch, mel_batch = np.asarray(img_batch), np.asarray(mel_batch)
img_masked = img_batch.copy()
img_masked[:, self.img_size // 2:] = 0
img_batch = np.concatenate((img_masked, img_batch), axis=3) / 255.
mel_batch = np.reshape(mel_batch, [len(mel_batch), mel_batch.shape[1], mel_batch.shape[2], 1])
yield img_batch, mel_batch, frame_batch, coords_batch
def run(self, face, audio_seq, output_dir, visualization=True):
if os.path.isfile(face) and path.basename(face).split('.')[-1].lower() in ['jpg', 'png', 'jpeg']:
self.static = True
if not os.path.isfile(face):
raise ValueError('--face argument must be a valid path to video/image file')
elif path.basename(face).split('.')[-1].lower() in ['jpg', 'png', 'jpeg']:
full_frames = [cv2.imread(face)]
fps = self.fps
else:
video_stream = cv2.VideoCapture(face)
fps = video_stream.get(cv2.CAP_PROP_FPS)
print('Reading video frames...')
full_frames = []
while 1:
still_reading, frame = video_stream.read()
if not still_reading:
video_stream.release()
break
if self.resize_factor > 1:
frame = cv2.resize(frame,
(frame.shape[1] // self.resize_factor, frame.shape[0] // self.resize_factor))
if self.rotate:
frame = cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
y1, y2, x1, x2 = self.crop
if x2 == -1: x2 = frame.shape[1]
if y2 == -1: y2 = frame.shape[0]
frame = frame[y1:y2, x1:x2]
full_frames.append(frame)
print("Number of frames available for inference: " + str(len(full_frames)))
if not audio_seq.endswith('.wav'):
print('Extracting raw audio...')
command = 'ffmpeg -y -i {} -strict -2 {}'.format(audio_seq, 'temp/temp.wav')
subprocess.call(command, shell=True)
audio_seq = 'temp/temp.wav'
wav = audio.load_wav(audio_seq, 16000)
mel = audio.melspectrogram(wav)
if np.isnan(mel.reshape(-1)).sum() > 0:
raise ValueError(
'Mel contains nan! Using a TTS voice? Add a small epsilon noise to the wav file and try again')
mel_chunks = []
mel_idx_multiplier = 80. / fps
i = 0
while 1:
start_idx = int(i * mel_idx_multiplier)
if start_idx + mel_step_size > len(mel[0]):
mel_chunks.append(mel[:, len(mel[0]) - mel_step_size:])
break
mel_chunks.append(mel[:, start_idx:start_idx + mel_step_size])
i += 1
print("Length of mel chunks: {}".format(len(mel_chunks)))
full_frames = full_frames[:len(mel_chunks)]
batch_size = self.wav2lip_batch_size
gen = self.datagen(full_frames.copy(), mel_chunks)
model = Wav2Lip()
if self.checkpoint_path is None:
model_weights_path = get_weights_path_from_url(WAV2LIP_WEIGHT_URL)
weights = paddle.load(model_weights_path)
else:
weights = paddle.load(self.checkpoint_path)
model.load_dict(weights)
model.eval()
print("Model loaded")
for i, (img_batch, mel_batch, frames, coords) in enumerate(
tqdm(gen, total=int(np.ceil(float(len(mel_chunks)) / batch_size)))):
if i == 0:
frame_h, frame_w = full_frames[0].shape[:-1]
out = cv2.VideoWriter('temp/result.avi', cv2.VideoWriter_fourcc(*'DIVX'), fps, (frame_w, frame_h))
img_batch = paddle.to_tensor(np.transpose(img_batch, (0, 3, 1, 2))).astype('float32')
mel_batch = paddle.to_tensor(np.transpose(mel_batch, (0, 3, 1, 2))).astype('float32')
with paddle.no_grad():
pred = model(mel_batch, img_batch)
pred = pred.numpy().transpose(0, 2, 3, 1) * 255.
for p, f, c in zip(pred, frames, coords):
y1, y2, x1, x2 = c
if self.face_enhancement:
p = self.faceenhancer.enhance_from_image(p)
p = cv2.resize(p.astype(np.uint8), (x2 - x1, y2 - y1))
f[y1:y2, x1:x2] = p
out.write(f)
out.release()
os.makedirs(output_dir, exist_ok=True)
if visualization:
command = 'ffmpeg -y -i {} -i {} -strict -2 -q:v 1 {}'.format(audio_seq, 'temp/result.avi',
os.path.join(output_dir, 'result.avi'))
subprocess.call(command, shell=platform.system() != 'Windows')
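A minimal usage sketch for the Wav2LipPredictor defined above; the media paths are placeholders, and the pretrained weights are downloaded automatically when checkpoint_path is None:

```python
# Placeholder paths; ./temp is created by the predictor for intermediate files.
predictor = Wav2LipPredictor(face_det_batch_size=8, wav2lip_batch_size=64)
predictor.run(face='/PATH/TO/face_video.mp4',
              audio_seq='/PATH/TO/speech.wav',
              output_dir='./wav2lip_output',
              visualization=True)
```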
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import copy
import paddle
import paddlehub as hub
from paddlehub.module.module import moduleinfo, runnable, serving
import numpy as np
import cv2
from .model import Wav2LipPredictor
@moduleinfo(name="wav2lip", type="CV/generation", author="paddlepaddle", author_email="", summary="", version="1.0.0")
class wav2lip:
def __init__(self):
self.pretrained_model = os.path.join(self.directory, "wav2lip_hq.pdparams")
self.network = Wav2LipPredictor(
checkpoint_path=self.pretrained_model,
static=False,
fps=25,
pads=[0, 10, 0, 0],
face_det_batch_size=16,
wav2lip_batch_size=128,
resize_factor=1,
crop=[0, -1, 0, -1],
box=[-1, -1, -1, -1],
rotate=False,
nosmooth=False,
face_detector='sfd',
face_enhancement=True)
def wav2lip_transfer(self, face, audio, output_dir='./output_result/', use_gpu=False, visualization=True):
'''
face (str): path to video/image that contains faces to use.
audio (str): path to input audio.
output_dir: the dir to save the results
use_gpu: if True, use gpu to perform the computation, otherwise cpu.
visualization: if True, save results in output_dir.
'''
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
self.network.run(face, audio, output_dir, visualization)
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(
description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
self.wav2lip_transfer(
face=self.args.face,
audio=self.args.audio,
output_dir=self.args.output_dir,
use_gpu=self.args.use_gpu,
visualization=self.args.visualization)
return
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument(
'--output_dir', type=str, default='output_result', help='output directory for saving result.')
self.arg_config_group.add_argument('--visualization', type=bool, default=False, help='save results or not.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--audio', type=str, help="path to input audio.")
self.arg_input_group.add_argument('--face', type=str, help="path to video/image that contains faces to use.")
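A minimal usage sketch for the wav2lip module defined above, assuming it has been installed with `hub install wav2lip`; the file paths are placeholders:

```python
import paddlehub as hub

# Placeholder paths; the result video is written to output_dir when visualization=True.
module = hub.Module(name="wav2lip")
module.wav2lip_transfer(face='/PATH/TO/video.mp4',
                        audio='/PATH/TO/audio.wav',
                        output_dir='./transfer_result',
                        use_gpu=True,
                        visualization=True)
```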
......@@ -50,16 +50,16 @@
## 三、模型API预测
- ### 1、代码示例
- ### 1、预测代码示例
- ```python
import paddlehub as hub
import cv2
model = hub.Module(name="U2Net_Portrait")
result = model.Cartoon_GEN(images=[cv2.imread('/PATH/TO/IMAGE')])
result = model.Portrait_GEN(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = model.Cartoon_GEN(paths=['/PATH/TO/IMAGE'])
# result = model.Portrait_GEN(paths=['/PATH/TO/IMAGE'])
```
- ### 2、API
......
# arabic_ocr_db_crnn_mobile
|模型名称|arabic_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- arabic_ocr_db_crnn_mobile Module用于识别图片当中的阿拉伯文字,包括阿拉伯文、波斯文、维吾尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的阿拉伯文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别阿拉伯文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.0.2
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install arabic_ocr_db_crnn_mobile
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="arabic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- 构造ArabicOCRDBCRNNMobile对象
- **参数**
- det(bool): 是否开启文字检测。默认为True。
- rec(bool): 是否开启文字识别。默认为True。
- use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
- enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
- box\_thresh (float): 检测文本框置信度的阈值;
- angle_classification_thresh(float): 文本方向分类置信度的阈值
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- 预测API,检测输入图片中的所有文本的位置和识别文本结果。
- **参数**
- paths (list\[str\]): 图片的路径;
- images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
- visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
- **返回**
- res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
- data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
- text(str): 识别得到的文本
- confidence(float): 识别文本结果置信度
- text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
- orientation(str): 分类的方向,仅在只有方向分类开启时输出
- score(float): 分类的得分,仅在只有方向分类开启时输出
- save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
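- 返回结果的解析示例(以下代码仅作参考,图片路径为占位符):
- ```python
  import paddlehub as hub
  import cv2
  ocr = hub.Module(name="arabic_ocr_db_crnn_mobile")
  results = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
  for res in results:
      # data 中每个元素对应一个文本框;save_path 仅在保存可视化结果时非空
      for item in res['data']:
          print(item['text'], item['confidence'], item['text_box_position'])
  ```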
## 四、服务部署
- PaddleHub Serving 可以部署一个文字识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m arabic_ocr_db_crnn_mobile
```
- 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;如使用CPU预测,则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/arabic_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install arabic_ocr_db_crnn_mobile==1.0.0
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="arabic_ocr_db_crnn_mobile",
version="1.1.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ArabicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="arabic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
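A hedged sketch of calling the export_onnx_model method above; the output directory and dynamic input shape are illustrative assumptions, not values mandated by the module:

```python
import paddlehub as hub

# Illustrative only: the directory name and dynamic shape below are assumptions.
ocr = hub.Module(name="arabic_ocr_db_crnn_mobile")
ocr.export_onnx_model(dirname='./onnx_arabic_ocr',
                      input_shape_dict={'x': [-1, 3, -1, -1]})
```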
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# chinese_cht_ocr_db_crnn_mobile
|模型名称|chinese_cht_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- chinese_cht_ocr_db_crnn_mobile Module用于识别图片当中的繁体中文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的繁体中文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别繁体中文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.0.2
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install chinese_cht_ocr_db_crnn_mobile
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- 构造ChineseChtOCRDBCRNNMobile对象
- **参数**
- det(bool): 是否开启文字检测。默认为True。
- rec(bool): 是否开启文字识别。默认为True。
- use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
- enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
- box\_thresh (float): 检测文本框置信度的阈值;
- angle_classification_thresh(float): 文本方向分类置信度的阈值
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- 预测API,检测输入图片中的所有文本的位置和识别文本结果。
- **参数**
- paths (list\[str\]): 图片的路径;
- images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
- visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
- **返回**
- res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
- data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
- text(str): 识别得到的文本
- confidence(float): 识别文本结果置信度
- text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
- orientation(str): 分类的方向,仅在只有方向分类开启时输出
- score(float): 分类的得分,仅在只有方向分类开启时输出
- save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
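- 方向分类结果的参考示例(以下配置与路径仅作示意,假设仅开启方向分类时会返回orientation与score字段):
- ```python
  import paddlehub as hub
  import cv2
  # 仅作示意:只开启方向分类器,不做检测与识别
  cls_only = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", det=False, rec=False, use_angle_cls=True)
  results = cls_only.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
  for res in results:
      for item in res['data']:
          print(item['orientation'], item['score'])
  ```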
## 四、服务部署
- PaddleHub Serving 可以部署一个文字识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m chinese_cht_ocr_db_crnn_mobile
```
- 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;如使用CPU预测,则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/chinese_cht_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install chinese_cht_ocr_db_crnn_mobile==1.0.0
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="chinese_cht_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ChineseChtOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="chinese_cht",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
# cyrillic_ocr_db_crnn_mobile
|模型名称|cyrillic_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- cyrillic_ocr_db_crnn_mobile Module用于识别图片当中的斯拉夫文,包括俄罗斯文、塞尔维亚文、白俄罗斯文、保加利亚文、乌克兰文、蒙古文、阿迪赫文、阿瓦尔文、达尔瓦文、因古什文、拉克文、莱兹甘文、塔巴萨兰文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的斯拉夫文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别斯拉夫文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.0.2
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install cyrillic_ocr_db_crnn_mobile
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- 构造CyrillicOCRDBCRNNMobile对象
- **参数**
- det(bool): 是否开启文字检测。默认为True。
- rec(bool): 是否开启文字识别。默认为True。
- use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
- enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
- box\_thresh (float): 检测文本框置信度的阈值;
- angle_classification_thresh(float): 文本方向分类置信度的阈值
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- 预测API,检测输入图片中的所有文本的位置和识别文本结果。
- **参数**
- paths (list\[str\]): 图片的路径;
- images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
- visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
- **返回**
- res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
- data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
- text(str): 识别得到的文本
- confidence(float): 识别文本结果置信度
- text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
- orientation(str): 分类的方向,仅在只有方向分类开启时输出
- score(float): 分类的得分,仅在只有方向分类开启时输出
- save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
## 四、服务部署
- PaddleHub Serving 可以部署一个文字识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m cyrillic_ocr_db_crnn_mobile
```
- 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;如使用CPU预测,则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/cyrillic_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install cyrillic_ocr_db_crnn_mobile==1.0.0
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="cyrillic_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class CyrillicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="cyrillic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# devanagari_ocr_db_crnn_mobile
|模型名称|devanagari_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- devanagari_ocr_db_crnn_mobile Module用于识别图片当中的梵文,包括印地文、马拉地文、尼泊尔文、比尔哈文、迈蒂利文、昂加文、孟加拉文、摩揭陀文、那格浦尔文、尼瓦尔文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的梵文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别梵文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.0.2
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install devanagari_ocr_db_crnn_mobile
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- 构造DevanagariOCRDBCRNNMobile对象
- **参数**
- det(bool): 是否开启文字检测。默认为True。
- rec(bool): 是否开启文字识别。默认为True。
- use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
- enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
- box\_thresh (float): 检测文本框置信度的阈值;
- angle_classification_thresh(float): 文本方向分类置信度的阈值
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- 预测API,检测输入图片中的所有文本的位置和识别文本结果。
- **参数**
- paths (list\[str\]): 图片的路径;
- images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
- visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
- **返回**
- res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
- data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
- text(str): 识别得到的文本
- confidence(float): 识别文本结果置信度
- text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
- orientation(str): 分类的方向,仅在只有方向分类开启时输出
- score(float): 分类的得分,仅在只有方向分类开启时输出
- save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
## 四、服务部署
- PaddleHub Serving 可以部署一个文字识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m devanagari_ocr_db_crnn_mobile
```
- 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;如使用CPU预测,则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/devanagari_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install devanagari_ocr_db_crnn_mobile==1.0.0
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="devanagari_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class DevanagariOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="devanagari",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# french_ocr_db_crnn_mobile
|模型名称|french_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- french_ocr_db_crnn_mobile Module用于识别图片当中的法文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的法文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别法文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.0.2
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install french_ocr_db_crnn_mobile
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="french_ocr_db_crnn_mobile", enable_mkldnn=True) # mkldnn加速仅在CPU下有效
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- 构造FrechOCRDBCRNNMobile对象
- **参数**
- det(bool): 是否开启文字检测。默认为True。
- rec(bool): 是否开启文字识别。默认为True。
- use_angle_cls(bool): 是否开启方向分类, 用于设置使用方向分类器识别180度旋转文字。默认为False。
- enable_mkldnn(bool): 是否开启mkldnn加速CPU计算。该参数仅在CPU运行下设置有效。默认为False。
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量**
- box\_thresh (float): 检测文本框置信度的阈值;
- angle_classification_thresh(float): 文本方向分类置信度的阈值
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- 预测API,检测输入图片中的所有文本的位置和识别文本结果。
- **参数**
- paths (list\[str\]): 图片的路径;
- images (list\[numpy.ndarray\]): 图片数据,ndarray.shape 为 \[H, W, C\],BGR格式;
- output\_dir (str): 图片的保存路径,默认设为 ocr\_result;
- visualization (bool): 是否将识别结果保存为图片文件, 仅有检测开启时有效, 默认为False;
- **返回**
- res (list\[dict\]): 识别结果的列表,列表中每一个元素为 dict,各字段为:
- data (list\[dict\]): 识别文本结果,列表中每一个元素为 dict,各字段为:
- text(str): 识别得到的文本
- confidence(float): 识别文本结果置信度
- text_box_position(list): 文本框在原图中的像素坐标,4*2的矩阵,依次表示文本框左下、右下、右上、左上顶点的坐标,如果无识别结果则data为\[\]
- orientation(str): 分类的方向,仅在只有方向分类开启时输出
- score(float): 分类的得分,仅在只有方向分类开启时输出
- save_path (str, optional): 识别结果的保存路径,如不保存图片则save_path为''
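- 批量识别的参考示例(目录路径为占位符,仅作示意,假设返回结果与输入路径顺序一致):
- ```python
  import glob
  import paddlehub as hub
  ocr = hub.Module(name="french_ocr_db_crnn_mobile")
  image_paths = glob.glob('/PATH/TO/IMAGES/*.jpg')
  results = ocr.recognize_text(paths=image_paths, output_dir='ocr_result', visualization=True)
  for path, res in zip(image_paths, results):
      print(path, [item['text'] for item in res['data']])
  ```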
## 四、服务部署
- PaddleHub Serving 可以部署一个文字识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m french_ocr_db_crnn_mobile
```
- 这样就完成了一个文字识别的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA\_VISIBLE\_DEVICES环境变量;如使用CPU预测,则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/french_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
* 1.1.0
优化模型
- ```shell
$ hub install french_ocr_db_crnn_mobile==1.1.0
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="french_ocr_db_crnn_mobile",
version="1.1.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class FrechOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="fr",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
......@@ -27,18 +27,9 @@
- ### 1、环境依赖
- paddlepaddle >= 1.8.0
- paddlepaddle >= 2.0.2
- paddlehub >= 1.8.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- shapely
- pyclipper
- ```shell
$ pip install shapely pyclipper
```
- **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
......@@ -58,7 +49,7 @@
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、代码示例
- ### 2、预测代码示例
- ```python
import paddlehub as hub
......@@ -159,13 +150,15 @@
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
* 1.1.0
优化模型
- ```shell
$ hub install german_ocr_db_crnn_mobile==1.0.0
$ hub install german_ocr_db_crnn_mobile==1.1.0
```
!
"
$
%
&
'
(
)
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
£
§
­
²
´
µ
·
º
¼
½
¿
À
Á
Ä
Å
Ç
É
Í
Ï
Ô
Ö
Ø
Ù
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ï
ñ
ò
ó
ô
ö
ø
ù
ú
û
ü
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german'
]:
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
assert self.character_str is not None, \
"Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str]
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
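A small usage sketch for the CharacterOps class above, using the built-in English charset with CTC-style decoding; the config values are illustrative:

```python
# Illustrative config: built-in English charset (0-9a-z), CTC-style indexing.
char_ops = CharacterOps({
    'character_type': 'en',
    'loss_type': 'ctc',
    'max_text_length': 25,
})
indices = char_ops.encode("Hello")   # lower-cased and mapped to indices
print(indices)                       # [17 14 21 21 24]
print(char_ops.decode(indices))      # hello
```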
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def cal_predicts_accuracy_srn(char_ops,
preds,
labels,
max_text_len,
is_debug=False):
acc_num = 0
img_num = 0
char_num = char_ops.get_char_num()
total_len = preds.shape[0]
img_num = int(total_len / max_text_len)
for i in range(img_num):
cur_label = []
cur_pred = []
for j in range(max_text_len):
if labels[j + i * max_text_len] != int(char_num - 1): #0
cur_label.append(labels[j + i * max_text_len][0])
else:
break
for j in range(max_text_len + 1):
if j < len(cur_label) and preds[j + i * max_text_len][
0] != cur_label[j]:
break
elif j == len(cur_label) and j == max_text_len:
acc_num += 1
break
elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
char_num - 1):
acc_num += 1
break
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
scores(list): corresponding scores of the texts
draw_txt(bool): whether draw text or not
drop_score(float): only scores greater than drop_threshold will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
texts(list): the texts to draw
scores(list|None): corresponding score of each text
font_file(str): path to the font file used for drawing
img_h(int): the height of the blank image
img_w(int): the width of the blank image
threshold(float): only texts with scores greater than threshold are drawn
return(array):
the visualized text image
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
blank_img = np.ones(shape=[img_h, img_w], dtype=np.int8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the display width of a string in Chinese-character units:
a single English letter or digit counts as half a Chinese character.
args:
s(string): the input string
return(int):
the approximate display width of the string
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array): detected text boxes with shape [N, 4, 2]
return:
sorted boxes(list): N boxes sorted top to bottom, left to right, each with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
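A short sketch exercising the helpers above (relying on the numpy import at the top of this file); the box coordinates are made-up values:

```python
# Two boxes on the same text line, listed right-to-left on purpose;
# sorted_boxes reorders them top-to-bottom, left-to-right.
boxes = np.array([
    [[60, 12], [120, 12], [120, 30], [60, 30]],
    [[5, 10], [55, 10], [55, 28], [5, 28]],
])
ordered = sorted_boxes(boxes)
print([b[0].tolist() for b in ordered])  # [[5, 10], [60, 12]]
```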
......@@ -27,18 +27,9 @@
- ### 1、环境依赖
- paddlepaddle >= 1.8.0
- paddlepaddle >= 2.0.2
- paddlehub >= 1.8.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- shapely
- pyclipper
- ```shell
$ pip install shapely pyclipper
```
- **该Module依赖于第三方库shapely和pyclipper,使用该Module之前,请先安装shapely和pyclipper。**
- paddlehub >= 2.0.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
......@@ -58,7 +49,7 @@
```
- 通过命令行方式实现文字识别模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、代码示例
- ### 2、预测代码示例
- ```python
import paddlehub as hub
......@@ -160,13 +151,15 @@
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
* 1.1.0
优化模型
- ```shell
$ hub install japan_ocr_db_crnn_mobile==1.0.0
$ hub install japan_ocr_db_crnn_mobile==1.1.0
```
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="kannada_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class KannadaOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="ka",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list[numpy.ndarray]): image data, each with shape [H, W, C] in BGR order; used when paths is not provided.
paths (list[str]): paths of the images; used when images is not provided.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper