Commit f0c2925d authored by W wuzewu

Merge branch 'release/v1.3' into develop

# PaddleHub
[![Build Status](https://travis-ci.org/PaddlePaddle/PaddleHub.svg?branch=release/v1.3)](https://travis-ci.org/PaddlePaddle/PaddleHub)
[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
[![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleHub.svg)](https://github.com/PaddlePaddle/PaddleHub/releases)
@@ -9,19 +9,21 @@ PaddleHub is a pre-trained model management and transfer learning toolkit based on the PaddlePaddle ecosystem
* Conveniently obtain all pre-trained models in the PaddlePaddle ecosystem, covering mainstream models for image classification, object detection, lexical analysis, semantic models, sentiment analysis, language models, video classification, image generation, image segmentation, and more.
    * For more details, visit the official site: https://www.paddlepaddle.org.cn/hub
* With the PaddleHub Fine-tune API, transfer learning on **large-scale pre-trained models** takes only a small amount of code; see the following demos:
    * [Text classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/text-classification)
    * [Sequence labeling](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/sequence-labeling)
    * [Multi-label classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/multi-label-classification)
    * [Image classification](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/image-classification)
    * [Retrieval-based question answering](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/qa_classification)
    * [Regression](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/sentence_similarity)
    * [Sentence semantic similarity](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/sentence_similarity)
    * [Reading comprehension](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/reading-comprehension)
* Hyperparameter optimization (AutoDL Finetuner) tunes hyperparameters automatically and reports a well-performing combination.
    * [Example of PaddleHub's AutoDL Finetuner](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/autofinetune)
* Built on the "Model as Software" design philosophy: one-line prediction through the Python API or the command line (see the sketch below) makes the PaddlePaddle model library easier to apply.
    * [Introduction to the PaddleHub command-line tool](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%B7%A5%E5%85%B7)
* One-click Module serving - HubServing
    * [PaddleHub-Serving one-click service deployment](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub-Serving%E4%B8%80%E9%94%AE%E6%9C%8D%E5%8A%A1%E9%83%A8%E7%BD%B2)
    * [Usage examples](https://github.com/PaddlePaddle/PaddleHub/tree/release/v1.3/demo/serving)
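
A minimal sketch of one-line prediction through the Python API, assuming the `lac` module and the v1.x `hub.Module` interface (the exact method name and signature may differ across versions):

```python
import paddlehub as hub

# Load a pre-trained module by name; it is downloaded on first use
lac = hub.Module(name="lac")

# Run lexical analysis on a batch of sentences
results = lac.lexical_analysis(data={"text": ["今天是个好日子"]})
print(results)
```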
## Table of Contents
@@ -37,7 +39,7 @@ PaddleHub is a pre-trained model management and transfer learning toolkit based on the PaddlePaddle ecosystem
### Environment Requirements
* Python==2.7 or Python>=3.5
* PaddlePaddle>=1.6.1
Besides the dependencies above, PaddleHub's pre-trained models and built-in datasets are downloaded from the server, so make sure the machine can access the network. If the datasets and pre-trained models already exist locally, PaddleHub can run offline.
@@ -76,26 +78,20 @@ $ hub run ssd_mobilenet_v1_pascal --input_path test_object_detection.jpg
$ hub run yolov3_coco2017 --input_path test_object_detection.jpg
$ hub run faster_rcnn_coco2017 --input_path test_object_detection.jpg
```
![SSD detection results](https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.3/docs/imgs/object_detection_result.png)
Besides the three categories above, PaddleHub also releases mainstream models for language modeling, semantic modeling, image classification, generative models, video classification, and more. For all released models, visit https://www.paddlepaddle.org.cn/hub
We also provide IPython Notebook demos on AI Studio that you can try online:
|Category|AIStudio link|
|-|-|
|ERNIE text classification|[Try it](https://aistudio.baidu.com/aistudio/projectDetail/79380)|
|ERNIE sequence labeling|[Try it](https://aistudio.baidu.com/aistudio/projectDetail/79377)|
|ELMo text classification|[Try it](https://aistudio.baidu.com/aistudio/projectDetail/79400)|
|senta sentiment classification|[Try it](https://aistudio.baidu.com/aistudio/projectDetail/79398)|
|Image classification|[Try it](https://aistudio.baidu.com/aistudio/projectDetail/79378)|
## Tutorials
@@ -105,9 +101,9 @@ For how PaddleHub performs transfer learning, see the [wiki tutorial](https://github.com/
For how to customize a transfer task in PaddleHub, see the [wiki tutorial](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub:-%E8%87%AA%E5%AE%9A%E4%B9%89Task)
For automatic hyperparameter optimization, see the [AutoDL Finetuner tutorial](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.3/tutorial/autofinetune.md)
For fine-tuning pre-trained models with the ULMFiT strategy, see [PaddleHub Transfer Learning and the ULMFiT Fine-tuning Strategy](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.3/tutorial/strategy_exp.md)
## FAQ
@@ -153,4 +149,4 @@ print(res)
## Changelog
See the [changelog](https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.3/RELEASE.md) for details
# `v1.3.0`
* Added PaddleHub Serving service deployment
    * Added the [hub serving](https://github.com/PaddlePaddle/PaddleHub/wiki/PaddleHub-Serving%E4%B8%80%E9%94%AE%E6%9C%8D%E5%8A%A1%E9%83%A8%E7%BD%B2) command for one-click deployment of Module prediction services
* Added pre-trained models:
    * roberta_wwm_ext_chinese_L-24_H-1024_A-16
    * roberta_wwm_ext_chinese_L-12_H-768_A-12
    * bert_wwm_ext_chinese_L-12_H-768_A-12
    * bert_wwm_chinese_L-12_H-768_A-12
* Improved the AutoDL Finetuner user experience
    * Model performance can now be reported back through an API
    * Improved visualization, with support for displaying multiple trials
# `v1.2.1`
* Added **hyperparameter optimization Auto Fine-tune**: given a hyperparameter search space, PaddleHub automatically finds a well-performing combination
...
## Data Format
input: {files: {"image": [file_1, file_2, ...]}}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m vgg11_imagenet
```
## Python Script
```shell
$ python vgg11_imagenet_serving_demo.py
```
## Sample Results
```python
{
    "results": "[[{'Egyptian cat': 0.540287435054779}], [{'daisy': 0.9976677298545837}]]"
}
```
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the list of images to predict: [("image", img_1), ("image", img_2), ...]
    file_list = ["../img/cat.jpg", "../img/flower.jpg"]
    files = [("image", open(item, "rb")) for item in file_list]
    # Target the vgg11_imagenet endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/image/vgg11_imagenet"
    r = requests.post(url=url, files=files)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
## Data Format
#### Parameters required by the model can be passed as a nested dictionary
input: {files: {"image": [file_1, file_2, ...]}, data: {...}}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m stgan_celeba
```
## Python Script
```shell
$ python stgan_celeba_serving_demo.py
```
## Sample Results
The returned items carry the path of the input image and the generated image as base64 (fields as produced by the script below; values illustrative):
```python
[
    {
        "path": "woman.png",
        "base64": "data:image/png;base64,..."
    }
]
```
The results contain the generated images as base64, from which the images can be recovered; the sample Python script saves them to the output folder under the current directory.
# coding: utf8
import requests
import json
import base64
import os

if __name__ == "__main__":
    # Build the list of images to use: [("image", img_1), ("image", img_2), ...]
    file_list = ["../img/woman.png"]
    files = [("image", open(item, "rb")) for item in file_list]
    # Specify the info and style for each image
    data = {"info": ["Female,Brown_Hair"], "style": ["Aged"]}
    # Target the stgan_celeba endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/image/stgan_celeba"
    r = requests.post(url=url, data=data, files=files)

    # The server returns a stringified Python literal; parse it
    results = eval(r.json()["results"])
    # Save the generated images to the output folder and print the results
    if not os.path.exists("output"):
        os.mkdir("output")
    for item in results:
        output_path = os.path.join("output", item["path"].split("/")[-1])
        with open(output_path, "wb") as fp:
            fp.write(base64.b64decode(item["base64"].split(',')[-1]))
        item.pop("base64")
    print(json.dumps(results, indent=4, ensure_ascii=False))
## Data Format
input: {"text": [text_1, text_2, ...]}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m lm_lstm
```
## Python Script
```shell
$ python lm_lstm_serving_demo.py
```
## Sample Results
```python
{
"results": [
{
"perplexity": 4.584166451916099,
"text": "the plant which is owned by <unk> & <unk> co. was under contract with <unk> to make the cigarette filter"
},
{
"perplexity": 6.038358983397484,
"text": "more common <unk> fibers are <unk> and are more easily rejected by the body dr. <unk> explained"
}
]
}
```
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the dict of texts to predict: {"text": [text_1, text_2, ...]}
    text_list = [
        "the plant which is owned by <unk> & <unk> co. was under contract with <unk> to make the cigarette filter",
        "more common <unk> fibers are <unk> and are more easily rejected by the body dr. <unk> explained"
    ]
    text = {"text": text_list}
    # Target the lm_lstm endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/text/lm_lstm"
    r = requests.post(url=url, data=text)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
## Data Format
input: {"text": [text_1, text_2, ...]}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m lac
```
## Python Script
#### Without a user-defined dictionary
```shell
$ python lac_no_dict_serving_demo.py
```
#### With a user-defined dictionary
```shell
$ python lac_with_dict_serving_demo.py
```
## Sample Results
```python
{
"results": [
{
"tag": [
"TIME",
"v",
"q",
"n"
],
"word": [
"今天",
"是",
"个",
"好日子"
]
},
{
"tag": [
"n",
"v",
"TIME",
"v",
"v"
],
"word": [
"天气预报",
"说",
"今天",
"要",
"下雨"
]
}
]
}
```
Sample user-defined dictionary (dict.txt):
天气 n 400000
经 v 1000
常 d 1000
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the dict of texts to predict: {"text": [text_1, text_2, ...]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Target the lac endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/text/lac"
    r = requests.post(url=url, data=text)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the dict of texts to predict: {"text": [text_1, text_2, ...]}
    text_list = ["今天是个好日子", "天气预报说今天要下雨"]
    text = {"text": text_list}
    # Attach the user-defined dictionary: {"user_dict": dict.txt}
    file = {"user_dict": open("dict.txt", "rb")}
    # Target the lac endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/text/lac"
    r = requests.post(url=url, files=file, data=text)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
## Data Format
input: {files: {"image": [file_1, file_2, ...]}}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m yolov3_coco2017
```
## Python Script
```shell
$ python yolov3_coco2017_serving_demo.py
```
## Sample Results
```python
[
{
"path": "cat.jpg",
"data": [
{
"left": 322.2323,
"right": 1420.4119,
"top": 208.81363,
"bottom": 996.04395,
"label": "cat",
"confidence": 0.9289875
}
]
},
{
"path": "dog.jpg",
"data": [
{
"left": 204.74722,
"right": 746.02637,
"top": 122.793274,
"bottom": 566.6292,
"label": "dog",
"confidence": 0.86698055
}
]
}
]
```
The results contain the generated images as base64, from which the images can be recovered; the sample Python script saves them to the output folder under the current directory.
# coding: utf8
import requests
import json
import base64
import os

if __name__ == "__main__":
    # Build the list of images to detect: [("image", img_1), ("image", img_2), ...]
    file_list = ["../img/cat.jpg", "../img/dog.jpg"]
    files = [("image", open(item, "rb")) for item in file_list]
    # Target the yolov3_coco2017 endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/image/yolov3_coco2017"
    r = requests.post(url=url, files=files)

    # The server returns a stringified Python literal; parse it
    results = eval(r.json()["results"])
    # Save the rendered detection images to the output folder and print the results
    if not os.path.exists("output"):
        os.mkdir("output")
    for item in results:
        with open(os.path.join("output", item["path"]), "wb") as fp:
            fp.write(base64.b64decode(item["base64"].split(',')[-1]))
        item.pop("base64")
    print(json.dumps(results, indent=4, ensure_ascii=False))
## Data Format
input: {"text_1": [text_a1, text_a2, ...], "text_2": [text_b1, text_b2, ...]}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m simnet_bow
```
## Python Script
```shell
$ python simnet_bow_serving_demo.py
```
## Sample Results
```python
{
"results": [
{
"similarity": 0.8445,
"text_1": "这道题太难了",
"text_2": "这道题是上一年的考题"
},
{
"similarity": 0.9275,
"text_1": "这道题太难了",
"text_2": "这道题不简单"
},
{
"similarity": 0.9083,
"text_1": "这道题太难了",
"text_2": "这道题很有意思"
}
]
}
```
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the dict of text pairs to match:
    # {"text_1": [text_a1, text_a2, ...], "text_2": [text_b1, text_b2, ...]}
    text = {
        "text_1": ["这道题太难了", "这道题太难了", "这道题太难了"],
        "text_2": ["这道题是上一年的考题", "这道题不简单", "这道题很有意思"]
    }
    # Target the simnet_bow endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/text/simnet_bow"
    r = requests.post(url=url, data=text)
    # Print the matching results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
## Data Format
input: {files: {"image": [file_1, file_2, ...]}}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m deeplabv3p_xception65_humanseg
```
## Python Script
```shell
$ python deeplabv3p_xception65_humanseg_serving_demo.py
```
## Sample Results
```python
[
{
"origin": "girl.jpg",
"processed": "humanseg_output/girl_2.png"
}
]
```
The results contain the generated images as base64, from which the images can be recovered; the sample Python script saves them to the output folder under the current directory.
# coding: utf8
import requests
import json
import base64
import os

if __name__ == "__main__":
    # Build the list of images to segment: [("image", img_1), ("image", img_2), ...]
    file_list = ["../img/girl.jpg"]
    files = [("image", open(item, "rb")) for item in file_list]
    # Target the deeplabv3p_xception65_humanseg endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/image/deeplabv3p_xception65_humanseg"
    r = requests.post(url=url, files=files)

    # The server returns a stringified Python literal; parse it
    results = eval(r.json()["results"])
    # Save the segmented images to the output folder and print the results
    if not os.path.exists("output"):
        os.mkdir("output")
    for item in results:
        with open(
                os.path.join("output", item["processed"].split("/")[-1]),
                "wb") as fp:
            fp.write(base64.b64decode(item["base64"].split(',')[-1]))
        item.pop("base64")
    print(json.dumps(results, indent=4, ensure_ascii=False))
## Data Format
input: {"text": [text_1, text_2, ...]}
output: {"results":[result_1, result_2, ...]}
## Serving Quick Start Command
```shell
$ hub serving start -m senta_lstm
```
## Python Script
```shell
$ python senta_lstm_serving_demo.py
```
## Sample Results
```python
{
"results": [
{
"negative_probs": 0.7079,
"positive_probs": 0.2921,
"sentiment_key": "negative",
"sentiment_label": 0,
"text": "我不爱吃甜食"
},
{
"negative_probs": 0.0149,
"positive_probs": 0.9851,
"sentiment_key": "positive",
"sentiment_label": 1,
"text": "我喜欢躺在床上看电影"
}
]
}
```
# coding: utf8
import requests
import json

if __name__ == "__main__":
    # Build the dict of texts to predict: {"text": [text_1, text_2, ...]}
    text_list = ["我不爱吃甜食", "我喜欢躺在床上看电影"]
    text = {"text": text_list}
    # Target the senta_lstm endpoint and send a POST request
    url = "http://127.0.0.1:8866/predict/text/senta_lstm"
    r = requests.post(url=url, data=text)
    # Print the prediction results
    print(json.dumps(r.json(), indent=4, ensure_ascii=False))
@@ -61,3 +61,5 @@ from .finetune.strategy import ULMFiTStrategy
from .finetune.strategy import CombinedStrategy
from .autofinetune.evaluator import report_final_result
from .common.hub_server import server_check
@@ -79,6 +79,9 @@ class DownloadCommand(BaseCommand):
            url = search_result.get('url', None)
            except_md5_value = search_result.get('md5', None)
            if not url:
                if default_hub_server._server_check() is False:
                    tips = "Request Hub-Server unsuccessfully, please check your network."
                else:
                    tips = "PaddleHub can't find model/module named %s" % mod_name
                    if mod_version:
                        tips += " with version %s" % mod_version
...
@@ -53,6 +53,11 @@ class SearchCommand(BaseCommand):
        tp = TablePrinter(
            titles=["ResourceName", "Type", "Version", "Summary"],
            placeholders=placeholders)
        if len(resource_list) == 0:
            if default_hub_server._server_check() is False:
                print(
                    "Request Hub-Server unsuccessfully, please check your network."
                )
        for resource_name, resource_type, resource_version, resource_summary in resource_list:
            if resource_type == "Module":
                colors = ["yellow", None, None, None]
...
@@ -24,6 +24,7 @@ import requests
import json
import yaml
import random
import threading
from random import randint

from paddlehub.common import utils, srv_utils
@@ -31,7 +32,7 @@ from paddlehub.common.downloader import default_downloader
from paddlehub.common.server_config import default_server_config
from paddlehub.io.parser import yaml_parser
from paddlehub.common.lock import lock
from paddlehub.common.dir import CONF_HOME, CACHE_HOME

RESOURCE_LIST_FILE = "resource_list_file.yml"
CACHE_TIME = 60 * 10
@@ -40,9 +41,9 @@ CACHE_TIME = 60 * 10
class HubServer(object):
    def __init__(self, config_file_path=None):
        if not config_file_path:
            config_file_path = os.path.join(CONF_HOME, 'config.json')
        if not os.path.exists(CONF_HOME):
            utils.mkdir(CONF_HOME)
        if not os.path.exists(config_file_path):
            with open(config_file_path, 'w+') as fp:
                lock.flock(fp, lock.LOCK_EX)
@@ -69,7 +70,7 @@ class HubServer(object):
        return self.server_url[random.randint(0, len(self.server_url) - 1)]

    def resource_list_file_path(self):
        return os.path.join(CACHE_HOME, RESOURCE_LIST_FILE)

    def _load_resource_list_file_if_valid(self):
        self.resource_list_file = {}
@@ -227,12 +228,12 @@
            extra=extra)

    def request(self):
        if not os.path.exists(CACHE_HOME):
            utils.mkdir(CACHE_HOME)
        try:
            r = requests.get(self.get_server_url() + '/' + 'search')
            data = json.loads(r.text)
            cache_path = os.path.join(CACHE_HOME, RESOURCE_LIST_FILE)
            with open(cache_path, 'w+') as fp:
                yaml.safe_dump({'resource_list': data['data']}, fp)
            return True
@@ -245,12 +246,61 @@
                file_url = self.config[
                    'resource_storage_server_url'] + RESOURCE_LIST_FILE
                result, tips, self.resource_list_file = default_downloader.download_file(
                    file_url, save_path=CACHE_HOME, replace=True)
                if not result:
                    return False
        except:
            return False
        return True
    def _server_check(self):
        try:
            r = requests.get(self.get_server_url() + '/search')
            if r.status_code == 200:
                return True
            else:
                return False
        except:
            return False

    def server_check(self):
        if self._server_check() is True:
            print("Request Hub-Server successfully.")
        else:
            print("Request Hub-Server unsuccessfully.")
class CacheUpdater(threading.Thread):
    def __init__(self, module, version=None):
        threading.Thread.__init__(self)
        self.module = module
        self.version = version

    def update_resource_list_file(self, module, version=None):
        payload = {'word': module}
        if version:
            payload['version'] = version
        api_url = srv_utils.uri_path(default_hub_server.get_server_url(),
                                     'search')
        cache_path = os.path.join(CACHE_HOME, RESOURCE_LIST_FILE)
        extra = {
            "command": "update_cache",
            "mtime": os.stat(cache_path).st_mtime
        }
        try:
            r = srv_utils.hub_request(api_url, payload, extra)
        except Exception:
            # if the request fails, r would be undefined below; bail out early
            return
        if r.get("update_cache", 0) == 1:
            with open(cache_path, 'w+') as fp:
                yaml.safe_dump({'resource_list': r['data']}, fp)

    def run(self):
        self.update_resource_list_file(self.module, self.version)
def server_check():
    default_hub_server.server_check()
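
# Usage sketch (assuming the export added to paddlehub/__init__.py above):
#     import paddlehub as hub
#     hub.server_check()  # prints whether PaddleHub can reach Hub-Server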
default_hub_server = HubServer()
@@ -142,12 +142,7 @@ def from_module_attr_to_param(module_attr):
    return param


def _copy_vars_and_ops_in_blocks(from_block, to_block):
    for var in from_block.vars:
        var = from_block.var(var)
        var_info = copy.deepcopy(get_variable_info(var))
@@ -160,17 +155,24 @@ def connect_program(pre_program,
        op_info = {
            'type': op.type,
            'inputs': {
                input: [to_block.var(var) for var in op.input(input)]
                for input in op.input_names
            },
            'outputs': {
                output: [to_block.var(var) for var in op.output(output)]
                for output in op.output_names
            },
            'attrs': copy.deepcopy(op.all_attrs())
        }
        to_block.append_op(**op_info)


def connect_program(pre_program,
                    next_program,
                    input_dict=None,
                    inplace=True,
                    need_log=True):
    if not isinstance(pre_program, fluid.Program):
        raise TypeError("pre_program should be an instance of fluid.Program")
@@ -268,7 +270,10 @@ def set_op_attr(program, is_test=False):
def clone_program(origin_program, for_test=False):
    dest_program = fluid.Program()
    _copy_vars_and_ops_in_blocks(origin_program.global_block(),
                                 dest_program.global_block())
    dest_program = dest_program.clone(for_test=for_test)
    if not for_test:
        for name, var in origin_program.global_block().vars.items():
            dest_program.global_block(
...
@@ -23,6 +23,7 @@ import shutil
from paddlehub.common import utils
from paddlehub.common import srv_utils
from paddlehub.common.downloader import default_downloader
from paddlehub.common.hub_server import default_hub_server
from paddlehub.common.dir import MODULE_HOME
from paddlehub.module import module_desc_pb2
import paddlehub as hub
@@ -100,6 +101,9 @@ class LocalModuleManager(object):
            installed_module_version = search_result.get('version', None)
            if not url or (module_version is not None and installed_module_version
                           != module_version) or (name != module_name):
                if default_hub_server._server_check() is False:
                    tips = "Request Hub-Server unsuccessfully, please check your network."
                else:
                    tips = "Can't find module %s" % module_name
                    if module_version:
                        tips += " with version %s" % module_version
...
@@ -34,6 +34,7 @@ from paddlehub.common.downloader import default_downloader
from paddlehub.module import module_desc_pb2
from paddlehub.common.dir import CONF_HOME
from paddlehub.module import check_info_pb2
from paddlehub.common.hub_server import CacheUpdater
from paddlehub.module.signature import Signature, create_signature
from paddlehub.module.checker import ModuleChecker
from paddlehub.module.manager import default_module_manager
@@ -127,6 +128,8 @@ class Module(object):
        elif module_dir:
            self._init_with_module_file(module_dir=module_dir[0])
            lock.flock(fp_lock, lock.LOCK_UN)
            name = module_dir[0].split("/")[-1]
            version = module_dir[1]
        elif signatures:
            if processor:
                if not issubclass(processor, BaseProcessor):
@@ -144,6 +147,7 @@ class Module(object):
        else:
            lock.flock(fp_lock, lock.LOCK_UN)
            raise ValueError("Module initialized parameter is empty")
        CacheUpdater(name, version).start()

    def _init_with_name(self, name, version=None):
        log_msg = "Installing %s module" % name
...
@@ -22,17 +22,6 @@ import os
import base64
import logging

cv_module_method = {
    "vgg19_imagenet": "predict_classification",
    "vgg16_imagenet": "predict_classification",
@@ -65,63 +54,33 @@ cv_module_method = {
}
def predict_nlp(module, input_text, req_id, batch_size, extra=None):
    method_name = module.desc.attr.map.data['default_signature'].s
    predict_method = getattr(module, method_name)
    try:
        data = input_text
        if module.name == "lac" and extra.get("user_dict", []) != []:
            res = predict_method(
                data=data,
                user_dict=extra.get("user_dict", [])[0],
                use_gpu=use_gpu,
                batch_size=batch_size)
        else:
            res = predict_method(
                data=data, use_gpu=use_gpu, batch_size=batch_size)
    except Exception as err:
        curr = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
        print(curr, " - ", err)
        return {"results": "Please check data format!"}
    finally:
        user_dict = extra.get("user_dict", [])
        for item in user_dict:
            if os.path.exists(item):
                os.remove(item)
    return {"results": res}
def predict_classification(module, input_img, id, batch_size, extra={}):
    global use_gpu
    method_name = module.desc.attr.map.data['default_signature'].s
    predict_method = getattr(module, method_name)
@@ -133,31 +92,35 @@ def predict_classification(module, input_img, batch_size):
        curr = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
        print(curr, " - ", err)
        return {"result": "Please check data format!"}
    finally:
        for item in input_img["image"]:
            if os.path.exists(item):
                os.remove(item)
    return results
def predict_gan(module, input_img, id, batch_size, extra={}):
    output_folder = module.name.split("_")[0] + "_" + "output"
    global use_gpu
    method_name = module.desc.attr.map.data['default_signature'].s
    predict_method = getattr(module, method_name)
    try:
        extra.update({"image": input_img})
        input_img = {"image": input_img}
        results = predict_method(
            data=extra, use_gpu=use_gpu, batch_size=batch_size)
    except Exception as err:
        curr = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
        print(curr, " - ", err)
        return {"result": "Please check data format!"}
    finally:
        base64_list = []
        results_pack = []
        input_img = input_img.get("image", [])
        for index in range(len(input_img)):
            item = input_img[index]
            output_file = results[index].split(" ")[-1]
            with open(output_file, "rb") as fp:
                b_head = "data:image/" + item.split(".")[-1] + ";base64"
                b_body = base64.b64encode(fp.read())
                b_body = str(b_body).replace("b'", "").replace("'", "")
@@ -168,11 +131,11 @@ def predict_gan(module, input_img, id, batch_size, extra={}):
                results[index].update({"base64": b_img})
                results_pack.append(results[index])
                os.remove(item)
                os.remove(output_file)
    return results_pack
def predict_object_detection(module, input_img, id, batch_size, extra={}):
    output_folder = "output"
    global use_gpu
    method_name = module.desc.attr.map.data['default_signature'].s
@@ -185,6 +148,7 @@ def predict_object_detection(module, input_img, id, batch_size):
        curr = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
        print(curr, " - ", err)
        return {"result": "Please check data format!"}
    finally:
        base64_list = []
        results_pack = []
        input_img = input_img.get("image", [])
@@ -205,8 +169,7 @@ def predict_object_detection(module, input_img, id, batch_size):
    return results_pack
def predict_semantic_segmentation(module, input_img, id, batch_size, extra={}):
    output_folder = module.name.split("_")[-1] + "_" + "output"
    global use_gpu
    method_name = module.desc.attr.map.data['default_signature'].s
@@ -219,6 +182,7 @@ def predict_semantic_segmentation(module, input_img, id, batch_size):
        curr = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time.time()))
        print(curr, " - ", err)
        return {"result": "Please check data format!"}
    finally:
        base64_list = []
        results_pack = []
        input_img = input_img.get("image", [])
@@ -227,7 +191,6 @@ def predict_semantic_segmentation(module, input_img, id, batch_size):
            item = input_img[index]
            output_file_path = ""
            with open(results[index]["processed"], "rb") as fp:
                b_head = "data:image/png;base64"
                b_body = base64.b64encode(fp.read())
                b_body = str(b_body).replace("b'", "").replace("'", "")
@@ -236,8 +199,8 @@ def predict_semantic_segmentation(module, input_img, id, batch_size):
                output_file_path = results[index]["processed"]
                results[index]["origin"] = results[index]["origin"].replace(
                    id + "_", "")
                results[index]["processed"] = results[index][
                    "processed"].replace(id + "_", "")
                results[index].update({"base64": b_img})
                results_pack.append(results[index])
                os.remove(item)
@@ -274,14 +237,18 @@ def create_app():
        module_info.update({"cv_module": [{"Choose...": "Choose..."}]})
        for item in cv_module:
            module_info["cv_module"].append({item: item})
        module_info.update({"Choose...": [{"请先选择分类": "Choose..."}]})
        return {"module_info": module_info}

    @app_instance.route("/predict/image/<module_name>", methods=["POST"])
    def predict_image(module_name):
        if request.path.split("/")[-1] not in cv_module:
            return {"error": "Module {} is not available.".format(module_name)}
        req_id = request.data.get("id")
        global use_gpu, batch_size_dict
        img_base64 = request.form.getlist("image")
        extra_info = {}
        for item in list(request.form.keys()):
            extra_info.update({item: request.form.getlist(item)})
        file_name_list = []
        if img_base64 != []:
            for item in img_base64:
@@ -310,36 +277,34 @@ def create_app():
        module_type = module.type.split("/")[-1].replace("-", "_").lower()
        predict_func = eval("predict_" + module_type)
        batch_size = batch_size_dict.get(module_name, 1)
        results = predict_func(module, file_name_list, req_id, batch_size,
                               extra_info)
        r = {"results": str(results)}
        return r
@app_instance.route("/predict/text/<module_name>", methods=["POST"]) @app_instance.route("/predict/text/<module_name>", methods=["POST"])
def predict_text(module_name): def predict_text(module_name):
if request.path.split("/")[-1] not in nlp_module:
return {"error": "Module {} is not available.".format(module_name)}
req_id = request.data.get("id") req_id = request.data.get("id")
global use_gpu inputs = {}
if module_name == "simnet_bow": for item in list(request.form.keys()):
text_1 = request.form.getlist("text_1") inputs.update({item: request.form.getlist(item)})
text_2 = request.form.getlist("text_2") files = {}
data = [{"text_1": text_1}, {"text_2": text_2}] for file_key in list(request.files.keys()):
else: files[file_key] = []
data = request.form.getlist("text") for file in request.files.getlist(file_key):
file = request.files.getlist("user_dict") file_name = req_id + "_" + file.filename
files[file_key].append(file_name)
file.save(file_name)
module = TextModelService.get_module(module_name) module = TextModelService.get_module(module_name)
predict_func_name = nlp_module_method.get(module_name, "") results = predict_nlp(
if predict_func_name != "": module=module,
predict_func = eval(predict_func_name) input_text=inputs,
else: req_id=req_id,
module_type = module.type.split("/")[-1].replace("-", "_").lower() batch_size=batch_size_dict.get(module_name, 1),
predict_func = eval("predict_" + module_type) extra=files)
file_list = [] return results
for item in file:
file_path = req_id + "_" + item.filename
file_list.append(file_path)
item.save(file_path)
batch_size = batch_size_dict.get(module_name, 1)
results = predict_func(module, data, batch_size, file_list)
return {"results": results}
return app_instance return app_instance
......
@@ -113,7 +113,7 @@ def finetune(args):
        shutil.rmtree(config.checkpoint_dir)
    # acc on dev will be used by auto finetune
    hub.report_final_result(eval_avg_score["acc"])


if __name__ == "__main__":
...
@@ -142,5 +142,5 @@ if __name__ == '__main__':
        shutil.rmtree(config.checkpoint_dir)
    # acc on dev will be used by auto finetune
    hub.report_final_result(eval_avg_score["acc"])
```
@@ -14,14 +14,14 @@ PaddleHub AutoDL Finetuner provides two hyperparameter optimization algorithms:
The basic idea is to adjust the hyperparameters so that the probability of producing a better solution keeps increasing. The optimization process is illustrated below:
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.3/docs/imgs/bayesian_optimization.gif" hspace='10'/> <br />
</p>
*Image from https://www.kaggle.com/clair14/tutorial-bayesian-optimization*
* PSHE2: uses a Hamiltonian dynamical system to search the parameter space for the point with the lowest "potential energy", which corresponds to the optimal hyperparameter combination. Finding the optimum thus becomes the question of how to update the hyperparameter combination so that the algorithm converges faster and better. PSHE2 decides the next update direction from each hyperparameter's own historical best, under a certain amount of random perturbation.
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.3/docs/imgs/thermodynamics.gif" hspace='10'/> <br />
</p>
To evaluate how well the searched hyperparameters fit the task, PaddleHub AutoDL Finetuner provides two evaluation strategies:
@@ -59,7 +59,7 @@ hparam specifies the names of the hyperparameters to search, their types (int or float), and the search range
train.py accepts one set of hyperparameters found by PaddleHub, runs one optimization pass, and returns the resulting score
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/PaddleHub/release/v1.3/docs/imgs/demo.png" hspace='10'/> <br />
</p>
**NOTE**:
@@ -70,9 +70,9 @@ train.py accepts one set of hyperparameters found by PaddleHub, runs one optimization pass, and returns the resulting score
* When the evaluation strategy is PopulationBased, train.py must include the option parameter model_path and restore the model automatically from the path it specifies
* train.py must report the model's evaluation score (scores on the validation or test set are recommended) by calling the `report_final_result` API, e.g.
```python
hub.report_final_result(eval_avg_score["acc"])
```
* The reported score should lie in `(-∞, 1]`; higher means better.
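
A minimal train.py skeleton consistent with these requirements might look like the following (the hyperparameter names and the `--saved_params_dir` option are illustrative assumptions, not a fixed interface):

```python
# train.py - one AutoDL Finetuner trial (illustrative sketch)
import argparse
import paddlehub as hub

parser = argparse.ArgumentParser()
# hyperparameters injected by AutoDL Finetuner; names must match hparam.yaml (assumed here)
parser.add_argument("--learning_rate", type=float, default=5e-5)
parser.add_argument("--batch_size", type=int, default=32)
# required with --evaluator=populationbased: restore the model from this path
parser.add_argument("--model_path", type=str, default="")
# where this trial saves its model (assumed option name)
parser.add_argument("--saved_params_dir", type=str, default="")
args = parser.parse_args()

# ... build the task with args.learning_rate / args.batch_size, finetune, evaluate ...
eval_acc = 0.9  # placeholder: score on the dev set, must lie in (-inf, 1]

# report the final score back to AutoDL Finetuner
hub.report_final_result(eval_acc)
```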
@@ -85,20 +85,20 @@ train.py accepts one set of hyperparameters found by PaddleHub, runs one optimization pass, and returns the resulting score
## 3. How to Launch
**Make sure PaddleHub >= 1.3.0 is installed; the AutoDL Finetuner feature also requires at least one available GPU.**
Launch it with the following command:
```shell
$ OUTPUT=result/
$ hub autofinetune train.py --param_file=hparam.yaml --gpu=0,1 --popsize=5 --round=10
 --output_dir=${OUTPUT} --evaluator=fulltrail --tuning_strategy=pshe2
```
Options:
> `--param_file`: required; the yaml file describing the hyperparameters to optimize, i.e. [hparam.yaml](#hparam.yaml) above.
> `--gpu`: required; the GPU card IDs available to the program, comma-separated without spaces
> `--popsize`: optional; the number of hyperparameter combinations generated per round, default 5
@@ -106,7 +106,7 @@
> `--output_dir`: optional; the directory for the program's output; if unspecified, a folder is created under the current path to hold the output
> `--evaluator`: optional; the evaluation strategy for the hyperparameter search, fulltrail or populationbased, default populationbased
> `--tuning_strategy`: optional; the optimization algorithm, hazero or pshe2, default pshe2
@@ -114,7 +114,7 @@
* The search runs for n rounds (set by --round), each producing m hyperparameter combinations (set by --popsize). Each round's results determine how the next round's hyperparameters are adjusted
* When the specified GPUs cannot run a whole round at once, AutoDL Finetuner queues jobs automatically. To improve GPU utilization, choose a card count that divides popsize evenly. For example, with popsize=6 and gpu=0,1,2,3, AutoDL Finetuner starts four training processes per round, so the 5th/6th combinations must queue once, leaving two cards idle while they run; with 3 available cards this situation is avoided.
## 4. Directory Structure
@@ -122,6 +122,7 @@
```
./output_dir/
├── log_file.txt
├── best_model
├── visualization
├── round0
├── round1
@@ -140,6 +141,8 @@
* log_file.txt records every hyperparameter of each round and the best hyperparameters found over the whole run
* best_model stores the best model parameters obtained over the whole search and training process
* visualization records the log files used for visualization
* round0 ~ roundn record each round's data; each round directory also contains the following files:
@@ -165,8 +168,8 @@ PaddleHub AutoDL Finetuner supports passing along the args of train.py that do not need to be searched
```shell
$ OUTPUT=result/
$ hub autofinetune train.py --param_file=hparam.yaml --gpu=0,1 --popsize=5 --round=10
 --output_dir=${OUTPUT} --evaluator=fulltrail --tuning_strategy=pshe2 max_seq_len 128
```
## 7. Others
...