diff --git a/.gitmodules b/.gitmodules index 3cb413099e7443961d3ff3049eee0723c0912c67..100d42ff0fcc9c935dea6fd5108ab9171b9018c5 100644 --- a/.gitmodules +++ b/.gitmodules @@ -1,15 +1,15 @@ -[submodule "fluid/PaddleNLP/LAC"] - path = fluid/PaddleNLP/LAC +[submodule "PaddleNLP/LAC"] + path = PaddleNLP/LAC url = https://github.com/baidu/lac.git -[submodule "fluid/PaddleNLP/SimNet"] - path = fluid/PaddleNLP/SimNet +[submodule "PaddleNLP/SimNet"] + path = PaddleNLP/SimNet url = https://github.com/baidu/AnyQ.git -[submodule "fluid/PaddleNLP/Senta"] - path = fluid/PaddleNLP/Senta +[submodule "PaddleNLP/Senta"] + path = PaddleNLP/Senta url = https://github.com/baidu/Senta.git -[submodule "fluid/PaddleNLP/LARK"] - path = fluid/PaddleNLP/LARK - url = https://github.com/PaddlePaddle/LARK -[submodule "fluid/PaddleNLP/knowledge-driven-dialogue"] - path = fluid/PaddleNLP/knowledge-driven-dialogue +[submodule "PaddleNLP/LARK"] + path = PaddleNLP/LARK + url = https://github.com/PaddlePaddle/LARK.git +[submodule "PaddleNLP/knowledge-driven-dialogue"] + path = PaddleNLP/knowledge-driven-dialogue url = https://github.com/baidu/knowledge-driven-dialogue diff --git a/AutoDL/HiNAS_models/README.md b/AutoDL/HiNAS_models/README.md new file mode 100755 index 0000000000000000000000000000000000000000..9c67736aa30643baf72ce42ed2ca3321d4e22165 --- /dev/null +++ b/AutoDL/HiNAS_models/README.md @@ -0,0 +1,76 @@

# Image Classification Models

This directory contains six image classification models discovered automatically by Baidu Big Data Lab's (BDL) Hierarchical Neural Architecture Search (HiNAS) project, achieving up to 96.1% accuracy on the CIFAR-10 dataset. The models fall into two categories: the first three contain no skip links and are named HiNAS 0-2; the last three contain skip links, similar to the shortcut connections in ResNet, and are named HiNAS 3-5.

---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model](#training-a-model)
- [Model performances](#model-performances)

## Installation
Running the trainer in the current directory requires:

- PaddlePaddle Fluid >= v0.15.0
- cuDNN >= 6.0

If the PaddlePaddle and cuDNN versions in your runtime environment do not meet these requirements, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update them.

## Data preparation

When you run the sample code for the first time, the trainer automatically downloads the CIFAR-10 dataset. Please make sure your environment has an internet connection.

The dataset is downloaded to `dataset/cifar/cifar-10-python.tar.gz` in the same directory as the trainer. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and place it at the location above.

## Training a model

Once the environment is ready, you can train a model. There are two entry points: `train_hinas.py` trains models 0-2 (without skip links), and `train_hinas_res.py` trains models 3-5 (with skip links).

Train models 0-2 (without skip links):
```
python train_hinas.py --model=m_id # m_id can be 0, 1 or 2.
```
Train models 3-5 (with skip links):
```
python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2.
```
In addition, both `train_hinas.py` and `train_hinas_res.py` support the following parameters:

- **random_flip_left_right**: Randomly flip the image horizontally. (Default: True)
- **random_flip_up_down**: Randomly flip the image vertically. (Default: False)
- **cutout**: Apply cutout to the image. (Default: True)
- **standardize_image**: Standardize the image. (Default: True)
- **pad_and_cut_image**: Randomly pad the image, then crop it back to the original size. (Default: True)
- **shuffle_image**: Shuffle the order of the input images during training. (Default: True)
- **lr_max**: Learning rate at the beginning of training. (Default: 0.1)
- **lr_min**: Learning rate at the end of training. (Default: 0.0001)
- **batch_size**: Training batch size. (Default: 128)
- **num_epochs**: Total number of training epochs. (Default: 200)
- **weight_decay**: L2 regularization value. (Default: 0.0004)
- **momentum**: The momentum parameter of the momentum optimizer. (Default: 0.9)
- **dropout_rate**: Dropout rate of the dropout layer. (Default: 0.5)
- **bn_decay**: The decay/momentum parameter (also called moving average decay) of the batch norm layer. (Default: 0.9)
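Both trainers anneal the learning rate from `lr_max` to `lr_min` with a cosine schedule (see the hyperparameters under "Model performances" below). A minimal sketch of that schedule, assuming a per-epoch update; the helper function is illustrative and not part of the trainer:

```python
import math

def cosine_annealed_lr(epoch, lr_max=0.1, lr_min=0.0001, num_epochs=200):
    """Cosine annealing from lr_max (first epoch) to lr_min (last epoch)."""
    progress = epoch / float(num_epochs - 1)   # 0.0 -> 1.0 over training
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Learning rate at the start, middle and end of training.
print(cosine_annealed_lr(0), cosine_annealed_lr(100), cosine_annealed_lr(199))
```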
## Model performances

All six models are trained with the same hyperparameters:

- learning rate: 0.1 -> 0.0001 with cosine annealing
- total epochs: 200
- batch size: 128
- L2 decay: 0.0004
- optimizer: momentum optimizer with m=0.9 and Nesterov momentum
- preprocessing: random horizontal flip + image standardization + cutout

And below is the accuracy on the CIFAR-10 dataset:

| model | round 1 | round 2 | round 3 | max | avg |
|----------|---------|---------|---------|--------|--------|
| HiNAS-0 | 0.9548 | 0.9520 | 0.9513 | 0.9548 | 0.9527 |
| HiNAS-1 | 0.9452 | 0.9462 | 0.9420 | 0.9462 | 0.9445 |
| HiNAS-2 | 0.9508 | 0.9506 | 0.9483 | 0.9508 | 0.9499 |
| HiNAS-3 | 0.9607 | 0.9623 | 0.9601 | 0.9623 | 0.9611 |
| HiNAS-4 | 0.9611 | 0.9584 | 0.9586 | 0.9611 | 0.9594 |
| HiNAS-5 | 0.9578 | 0.9588 | 0.9594 | 0.9594 | 0.9586 |

diff --git a/AutoDL/HiNAS_models/README_cn.md b/AutoDL/HiNAS_models/README_cn.md new file mode 100755 index 0000000000000000000000000000000000000000..8ca3bcbfb8d1ea1a15f969c1a1db22ff2ec854f1 --- /dev/null +++ b/AutoDL/HiNAS_models/README_cn.md @@ -0,0 +1,78 @@

# Image Classification Models

This directory contains six image classification models, all discovered automatically by Baidu Big Data Lab's Hierarchical Neural Architecture Search (HiNAS) project, achieving up to 96.1% accuracy on the CIFAR-10 dataset. The six models fall into two categories: the first three have no skip links and are named HiNAS 0-2; the last three contain skip links, which act like the shortcut connections in ResNet, and are named HiNAS 3-5.

---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model](#training-a-model)
- [Model performances](#model-performances)

## Installation
Minimum environment requirements:

- PaddlePaddle Fluid >= v0.15.0
- cuDNN >= 6.0

If your runtime environment does not meet these requirements, you can upgrade PaddlePaddle following the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).

## Data preparation

The first time you train a model, the trainer automatically downloads the CIFAR-10 dataset; please make sure your environment has an internet connection.

The dataset is downloaded to `dataset/cifar/cifar-10-python.tar.gz` in the same directory as the trainer. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and place it at the location above.

## Training a model
Once the environment is ready, you can train a model. There are two entry points, `train_hinas.py` and `train_hinas_res.py`: the former trains models 0-2 (without skip links), the latter models 3-5 (with skip links).

Train models 0-2 (without skip links):
```
python train_hinas.py --model=m_id # m_id can be 0, 1 or 2.
```
Train models 3-5 (with skip links):
```
python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2.
```

In addition, `train_hinas.py` and `train_hinas_res.py` support the following parameters:

- random_flip_left_right: randomly flip the image horizontally (Default: True)
- random_flip_up_down: randomly flip the image vertically (Default: False)
- cutout: randomly mask part of the image (Default: True)
- standardize_image: standardize every pixel of the image (Default: True)
- pad_and_cut_image: randomly pad the image and crop it back to the original size (Default: True)
- shuffle_image: shuffle the order of input images during training (Default: True)
- lr_max: learning rate at the beginning of training (Default: 0.1)
- lr_min: learning rate at the end of training (Default: 0.0001)
- batch_size: training batch size (Default: 128)
- num_epochs: total number of training epochs (Default: 200)
- weight_decay: L2 regularization value used in training (Default: 0.0004)
- momentum: momentum coefficient of the momentum optimizer (Default: 0.9)
- dropout_rate: dropout rate of the dropout layer (Default: 0.5)
- bn_decay: decay/momentum coefficient (i.e. moving average decay) of the batch norm layer (Default: 0.9)

## Model performances
The six models are trained with the same hyperparameters:

- learning rate: 0.1 -> 0.0001 with cosine annealing
- total epochs: 200
- batch size: 128
- L2 decay: 0.0004
- optimizer: momentum optimizer with m=0.9 and Nesterov momentum
- preprocessing: random horizontal flip + image standardization + cutout

The accuracy of the six models on the CIFAR-10 dataset:

| model | round 1 | round 2 | round 3 | max | avg |
|----------|---------|---------|---------|--------|--------|
| HiNAS-0 | 0.9548 | 0.9520 | 0.9513 | 0.9548 | 0.9527 |
| HiNAS-1 | 0.9452 | 0.9462 | 0.9420 | 0.9462 | 0.9445 |
| HiNAS-2 | 0.9508 | 0.9506 | 0.9483 | 0.9508 | 0.9499 |
| HiNAS-3 | 0.9607 | 0.9623 | 0.9601 | 0.9623 | 0.9611 |
| HiNAS-4 | 0.9611 | 0.9584 | 0.9586 | 0.9611 | 0.9594 |
| HiNAS-5 | 0.9578 | 0.9588 | 0.9594 | 0.9594 | 0.9586 |

diff --git a/fluid/DeepASR/data_utils/__init__.py b/AutoDL/HiNAS_models/build/__init__.py old mode 100644 new mode 100755 similarity index 100% rename from fluid/DeepASR/data_utils/__init__.py rename to AutoDL/HiNAS_models/build/__init__.py diff --git a/fluid/PaddleCV/HiNAS_models/build/layers.py b/AutoDL/HiNAS_models/build/layers.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/build/layers.py rename to AutoDL/HiNAS_models/build/layers.py diff --git a/fluid/PaddleCV/HiNAS_models/build/ops.py b/AutoDL/HiNAS_models/build/ops.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/build/ops.py rename to AutoDL/HiNAS_models/build/ops.py diff --git a/fluid/PaddleCV/HiNAS_models/build/resnet_base.py b/AutoDL/HiNAS_models/build/resnet_base.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/build/resnet_base.py rename to AutoDL/HiNAS_models/build/resnet_base.py diff --git a/fluid/PaddleCV/HiNAS_models/build/vgg_base.py b/AutoDL/HiNAS_models/build/vgg_base.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/build/vgg_base.py rename to AutoDL/HiNAS_models/build/vgg_base.py diff --git a/fluid/PaddleCV/HiNAS_models/nn_paddle.py b/AutoDL/HiNAS_models/nn_paddle.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/nn_paddle.py rename to AutoDL/HiNAS_models/nn_paddle.py diff --git a/fluid/PaddleCV/HiNAS_models/reader.py b/AutoDL/HiNAS_models/reader.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/reader.py rename to AutoDL/HiNAS_models/reader.py diff --git a/fluid/PaddleCV/HiNAS_models/tokens/15113.pkl b/AutoDL/HiNAS_models/tokens/15113.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/15113.pkl rename to AutoDL/HiNAS_models/tokens/15113.pkl diff --git a/fluid/PaddleCV/HiNAS_models/tokens/15383.pkl b/AutoDL/HiNAS_models/tokens/15383.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/15383.pkl rename to AutoDL/HiNAS_models/tokens/15383.pkl diff --git
a/fluid/PaddleCV/HiNAS_models/tokens/15613.pkl b/AutoDL/HiNAS_models/tokens/15613.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/15613.pkl rename to AutoDL/HiNAS_models/tokens/15613.pkl diff --git a/fluid/PaddleCV/HiNAS_models/tokens/17754.pkl b/AutoDL/HiNAS_models/tokens/17754.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/17754.pkl rename to AutoDL/HiNAS_models/tokens/17754.pkl diff --git a/fluid/PaddleCV/HiNAS_models/tokens/17925.pkl b/AutoDL/HiNAS_models/tokens/17925.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/17925.pkl rename to AutoDL/HiNAS_models/tokens/17925.pkl diff --git a/fluid/PaddleCV/HiNAS_models/tokens/18089.pkl b/AutoDL/HiNAS_models/tokens/18089.pkl similarity index 100% rename from fluid/PaddleCV/HiNAS_models/tokens/18089.pkl rename to AutoDL/HiNAS_models/tokens/18089.pkl diff --git a/fluid/PaddleCV/HiNAS_models/train_hinas.py b/AutoDL/HiNAS_models/train_hinas.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/train_hinas.py rename to AutoDL/HiNAS_models/train_hinas.py diff --git a/fluid/PaddleCV/HiNAS_models/train_hinas_res.py b/AutoDL/HiNAS_models/train_hinas_res.py similarity index 100% rename from fluid/PaddleCV/HiNAS_models/train_hinas_res.py rename to AutoDL/HiNAS_models/train_hinas_res.py diff --git a/AutoDL/LRC/README.md b/AutoDL/LRC/README.md new file mode 100644 index 0000000000000000000000000000000000000000..df9af47d4a3876371673cbbfef0ad2553768b9a5 --- /dev/null +++ b/AutoDL/LRC/README.md @@ -0,0 +1,74 @@

# LRC: Local Rademacher Complexity Regularization

Regularizing deep neural networks (DNNs) to improve their generalization capability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. This model combines LRC regularization with DARTS on the CIFAR-10 dataset. Code accompanying the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.

---
# Table of Contents

- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) and update it.

## Data preparation

When you use the CIFAR-10 dataset for the first time, you can download it with:

    sh ./dataset/download.sh

Please make sure your environment has an internet connection.

The dataset is downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location mentioned above.

## Training

After data preparation, start training with:

    python -u train_mixup.py \
        --batch_size=80 \
        --auxiliary \
        --weight_decay=0.0003 \
        --learning_rate=0.025 \
        --lrc_loss_lambda=0.7 \
        --cutout

- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for training.
- For more help on arguments:

    python train_mixup.py --help

**data reader introduction:**

* The data reader is defined in `reader.py`.
* Images are reshaped to 32 * 32.
* During training, images are padded to 40 * 40 and randomly cropped back to the original size.
* During training, images are randomly flipped horizontally.
* Images are standardized to (0, 1).
* During training, cutout is applied to images at random positions.
* The order of the input images is shuffled during training.
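The cutout step above masks a random square patch of the image with zeros. A minimal NumPy sketch of the idea, assuming a 32 * 32 RGB input and a hypothetical 16-pixel patch; it is illustrative, not the exact code in `reader.py`:

```python
import numpy as np

def cutout(img, size=16):
    """Zero out a randomly positioned size x size square of an HWC image."""
    h, w = img.shape[:2]
    cy, cx = np.random.randint(h), np.random.randint(w)    # patch center
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = img.copy()
    out[y0:y1, x0:x1, :] = 0.0
    return out

augmented = cutout(np.random.rand(32, 32, 3))
```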
**model configuration:**

* Use the auxiliary loss with auxiliary\_weight=0.4.
* Use dropout with drop\_path\_prob=0.2.
* Set lrc\_loss\_lambda=0.7.

**training strategy:**

* Use the momentum optimizer with momentum=0.9.
* Weight decay is 0.0003.
* Use cosine decay with init\_lr=0.025.
* Train for 600 epochs in total.
* Use the Xavier initializer for conv2d weights, the Constant initializer for batch norm weights and the Normal initializer for fc weights.
* Initialize biases in batch norm and fc to zero, and do not add a bias to conv2d.


## Reference

  - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
  - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)

diff --git a/AutoDL/LRC/README_cn.md b/AutoDL/LRC/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..06dc937074de199af31db97ee200e7690443b1b0 --- /dev/null +++ b/AutoDL/LRC/README_cn.md @@ -0,0 +1,71 @@

# LRC: Local Rademacher Complexity Regularization

Choosing a regularizer that improves the generalization ability of deep neural networks is important and challenging. This directory contains an image classification model with a novel regularizer based on local Rademacher complexity (LRC). We are very grateful to [DARTS](https://arxiv.org/abs/1806.09055) for its help to this research. The model combines LRC regularization with the DARTS network and achieves excellent results on the CIFAR-10 dataset. Code released together with the paper
> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
> Yingzhen Yang, Xingjian Li, Jun Huan.\
> _arXiv:1902.00873_.
---
# Table of Contents

- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training](#training)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle in your runtime environment is lower than this version, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle).

## Data preparation

The first time you use the CIFAR-10 dataset, you can download it with the following command:

    sh ./dataset/download.sh

Please make sure your environment has an internet connection. The data is downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and decompress it to the location above.

## Training

Once the data is ready, start training with:

    python -u train_mixup.py \
        --batch_size=80 \
        --auxiliary \
        --weight_decay=0.0003 \
        --learning_rate=0.025 \
        --lrc_loss_lambda=0.7 \
        --cutout

- Set ```export CUDA_VISIBLE_DEVICES=0``` to train on a single GPU.
- For the optional arguments, see:

    python train_mixup.py --help

**Data reader notes:**

* The data reader is defined in `reader.py`.
* Input images are resized to 32 * 32.
* During training, images are padded to 40 * 40 and randomly cropped back to the original input size.
* During training, images are randomly flipped horizontally.
* Every pixel of the image is normalized.
* During training, cutout is applied to images at random positions.
* The order of the input images is shuffled during training.

**Model configuration:**

* Use the auxiliary loss with an auxiliary loss weight of 0.4.
* Use dropout with a drop rate of 0.2.
* Set lrc\_loss\_lambda to 0.7.

**Training strategy:**

* Train with the momentum optimizer, momentum=0.9.
* The weight decay coefficient is 0.0003.
* Use cosine learning rate decay with an initial learning rate of 0.025.
* Train for 600 epochs in total.
* Use the Xavier initializer for convolution weights, a constant initializer for batch norm weights and a Gaussian initializer for fully connected weights.
* Initialize batch norm and fully connected biases to constants, and add no bias to convolutions.


## References

  - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
  - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)

diff --git a/fluid/AutoDL/LRC/dataset/download.sh b/AutoDL/LRC/dataset/download.sh similarity index 100% rename from fluid/AutoDL/LRC/dataset/download.sh rename to AutoDL/LRC/dataset/download.sh diff --git a/fluid/AutoDL/LRC/genotypes.py b/AutoDL/LRC/genotypes.py similarity index 100% rename from fluid/AutoDL/LRC/genotypes.py rename to AutoDL/LRC/genotypes.py diff --git a/fluid/AutoDL/LRC/learning_rate.py b/AutoDL/LRC/learning_rate.py similarity index 100% rename from fluid/AutoDL/LRC/learning_rate.py rename to AutoDL/LRC/learning_rate.py diff --git a/fluid/AutoDL/LRC/model.py b/AutoDL/LRC/model.py similarity index 100% rename from fluid/AutoDL/LRC/model.py rename to AutoDL/LRC/model.py diff --git a/fluid/AutoDL/LRC/operations.py b/AutoDL/LRC/operations.py similarity index 100% rename from fluid/AutoDL/LRC/operations.py rename to AutoDL/LRC/operations.py diff --git a/fluid/AutoDL/LRC/reader.py b/AutoDL/LRC/reader.py similarity index 100% rename from fluid/AutoDL/LRC/reader.py rename to AutoDL/LRC/reader.py diff --git a/fluid/AutoDL/LRC/run.sh b/AutoDL/LRC/run.sh similarity index 100% rename from fluid/AutoDL/LRC/run.sh rename to AutoDL/LRC/run.sh diff --git a/fluid/AutoDL/LRC/train_mixup.py b/AutoDL/LRC/train_mixup.py similarity index 100% rename from fluid/AutoDL/LRC/train_mixup.py rename to AutoDL/LRC/train_mixup.py diff --git a/fluid/AutoDL/LRC/utils.py b/AutoDL/LRC/utils.py similarity index 100% rename from fluid/AutoDL/LRC/utils.py rename to AutoDL/LRC/utils.py diff --git a/PaddleCV/README.md b/PaddleCV/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bec8ec938ca01469d79ca75c8781d112839e07d2 --- /dev/null +++ b/PaddleCV/README.md @@ -0,0 +1,87 @@

PaddleCV
========

Image Classification
--------

Image classification distinguishes images of different categories according to their semantic content. It is a fundamental problem in computer vision and the basis of higher-level vision tasks such as object detection, image segmentation, object tracking, action analysis and face recognition, with wide applications in many fields: face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic album categorization on the web, and image recognition in medicine.
In the deep learning era, the accuracy of image classification has improved dramatically. For the image classification task, we show how to train commonly used models on the classic ImageNet dataset, including AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, MobileNet, DPN (Dual Path Network) and SE-ResNeXt, and we release the [trained models](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/image_classification/README_cn.md#已有模型及其性能) for users to download and use. We also provide a tool that converts Caffe models into PaddlePaddle Fluid model configuration and parameter files.

- [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [VGG](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [MobileNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [Dual Path Network](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)
- [Tool for converting Caffe models into Paddle Fluid configuration and model files](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/caffe2fluid)

Object Detection
--------

The goal of object detection is, given an image or a video frame, to have the computer locate every object in it and assign each one a category. For humans, object detection is a very simple task; a computer, however, only "sees" the numbers an image is encoded into, so it is hard for it to grasp high-level semantic concepts such as a person or an object appearing in an image or video frame, and harder still to localize the region in which the object appears. At the same time, objects can appear anywhere in an image or video frame, their shapes vary enormously and so do the backgrounds, all of which makes object detection a challenging problem for computers.

For the object detection task, we show how to train general-purpose object detection models on the [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [MS COCO](http://cocodataset.org/#home) datasets. We currently cover SSD (Single Shot MultiBox Detector), one of the newer and better-performing detection algorithms in the field, notable for both detection speed and accuracy.

Detecting faces in unconstrained environments, especially small, blurry and partially occluded faces, is also a challenging task. We also show how to train PyramidBox, Baidu's self-developed face detection model, on the [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) dataset; in March 2018 this algorithm took [first place](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) in multiple WIDER FACE evaluations.

Faster RCNN is a classic two-stage object detector. Compared with traditional region-proposal methods, its RPN network shares convolutional parameters, which greatly improves the efficiency of region proposal and yields high-quality candidate regions.

Mask RCNN is a classic instance segmentation model built on Faster RCNN. It adds a segmentation branch to the original Faster RCNN to produce mask results, decoupling mask and class prediction.

- [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/README_cn.md)
- [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/face_detection/README_cn.md)
- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/rcnn/README_cn.md)
- [Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/rcnn/README_cn.md)

Semantic Segmentation
------------

Semantic image segmentation, as the name suggests, groups/segments image pixels according to their semantic meaning. Image semantics refers to the understanding of image content, e.g. describing what object is doing what where; segmentation means labeling every pixel in an image with the category it belongs to. In recent years it has been used in autonomous driving to segment street scenes for avoiding pedestrians and vehicles, and in medical image analysis to assist diagnosis.

For the semantic segmentation task, we show how to segment with the Image Cascade Network (ICNet), which balances accuracy and speed better than other segmentation algorithms.

- [ICNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/icnet)

Image Generation
-----------

Image generation produces a target image from an input vector, which can be random noise or a user-specified conditional vector. Typical applications include handwriting generation, face synthesis, style transfer and image inpainting. Current image generation is mainly carried out with generative adversarial networks (GANs). A GAN consists of two sub-networks: a generator and a discriminator. The generator takes random noise or a conditional vector as input and outputs a target image; the discriminator is a classifier that takes an image as input and predicts whether it is real. During training, the generator and the discriminator improve their abilities by continually competing with each other.

For the image generation task, we show how to generate handwritten digits with DCGAN and ConditionalGAN, and also introduce CycleGAN for style transfer.
- [DCGAN & ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/gan/c_gan)
- [CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/gan/cycle_gan)

Scene Text Recognition
------------

Many scene images contain rich text information, which plays an important role in understanding the image and greatly helps people perceive and understand scene content. Scene text recognition converts image information into text sequences despite complex backgrounds, low resolution, diverse fonts and arbitrary layouts; it can be viewed as a special kind of translation: from image input to natural language output. The development of scene text recognition has also spawned new applications, such as automatically recognizing the text on road signs to help street view applications obtain more accurate address information.

For the scene text recognition task, we show how to combine CNN-based image feature extraction with RNN-based sequence translation, eliminating hand-crafted features and character segmentation and using automatically learned image features to recognize characters. We currently cover the CRNN-CTC model and an attention-based sequence-to-sequence model.

- [CRNN-CTC model](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/ocr_recognition)
- [Attention model](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/ocr_recognition)


Metric Learning
-------

Metric learning, also known as distance metric learning or similarity learning, learns distances between objects, which can be used to analyze the relations between objects for comparison. It is broadly applicable in practice: it can assist classification and clustering, and it is widely used in image retrieval, face recognition and related areas. Traditionally, one had to select suitable features and construct a distance function by hand for each task, whereas metric learning can learn a task-specific distance metric from data. Combined with deep learning, metric learning has achieved good performance in face recognition/verification, person re-identification (human Re-ID), image retrieval and other areas. For this task we provide deep metric learning models based on Fluid, including triplet and quadruplet loss functions.

- [Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning)


Video Classification
-------

Video classification is the foundation of video understanding tasks. Unlike image classification, the object being classified is no longer a still image but a video composed of multiple frames, containing audio and motion information, so understanding video requires more context: not only what each frame is and contains, but also the relations across frames. Video classification methods are based on convolutional neural networks, recurrent neural networks, or a combination of both. For this task we provide Fluid-based video classification models, currently the Temporal Segment Network (TSN) model, and more models will be added over time.


- [TSN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/video_classification)

diff --git a/PaddleCV/adversarial/README.md b/PaddleCV/adversarial/README.md new file mode 100644 index 0000000000000000000000000000000000000000..91661f7e1675d59c7d38c4c09bc67d5b9339573d --- /dev/null +++ b/PaddleCV/adversarial/README.md @@ -0,0 +1,112 @@

The minimum PaddlePaddle version needed for the code samples in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

# Advbox

Advbox is a toolbox for generating adversarial examples that fool neural networks, and it can benchmark the robustness of machine learning models.

Advbox is based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) Fluid and is under continual development, always welcoming contributions of the latest adversarial attack and defense methods.


## Overview
[Szegedy et al.](https://arxiv.org/abs/1312.6199) first discovered an intriguing property of deep neural networks in the context of image classification: despite their state-of-the-art performance, deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations to images that remain (almost) imperceptible to the human visual system. Such perturbations are found by optimizing the input to maximize the prediction error, and the images modified by them are called `adversarial examples`. The profound implications of these results triggered wide interest in adversarial attacks and their defenses for deep learning in general.

Advbox is similar to [Foolbox](https://github.com/bethgelab/foolbox) and [CleverHans](https://github.com/tensorflow/cleverhans). CleverHans only supports the TensorFlow framework, while Foolbox interfaces with many popular machine learning frameworks such as PyTorch, Keras, TensorFlow, Theano, Lasagne and MXNet. However, neither of these two great libraries supports PaddlePaddle, an easy-to-use, efficient, flexible and scalable deep learning platform originally developed by Baidu scientists and engineers to apply deep learning to many products at Baidu.

## Usage
Advbox provides many stable reference implementations of modern methods for generating adversarial examples, such as FGSM, DeepFool and JSMA. When you want to benchmark the robustness of your neural networks, you can use Advbox to generate adversarial examples and evaluate the networks against them. Some tips for using Advbox:

1. Train a model and save its parameters.
2. Load the trained parameters and reconstruct the model.
3. Use Advbox to generate the adversarial samples.
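Putting the three steps together, a minimal sketch of the intended workflow is shown below. The module paths follow the structure listing in the next section, but the constructor and call signatures (`PaddleModel(...)`, `FGSM(model)`, `attack(adversary)`) are illustrative assumptions rather than the verbatim Advbox API; see the scripts under `./tutorials/` for the real entry points.

```python
import numpy as np
import paddle.fluid as fluid

from advbox.adversary import Adversary
from advbox.attacks.gradient_method import FGSM
from advbox.models.paddle import PaddleModel

# Step 2: wrap the trained network so Advbox can query predictions and
# gradients. In practice the program and the input/logits variable names
# come from the model you trained and saved in step 1; `bounds` is the
# valid input range of the model.
model = PaddleModel(fluid.default_main_program(), "img", "fc",
                    bounds=(-1.0, 1.0))

# Step 3: ask an attack for a misclassified neighbor of one input.
original = np.zeros((1, 28, 28), dtype="float32")  # placeholder MNIST image
adversary = Adversary(original, 7)                 # 7 = the true label
attack = FGSM(model)
adversary = attack(adversary)
if adversary.is_successful():
    adv_example = adversary.adversarial_example
```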
#### Dependencies
* PaddlePaddle: [the latest develop branch](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)
* Python 2.x

#### Structure

Network models, implementations of attack methods and the criterion that defines adversarial examples are the three essential elements for generating adversarial examples. For brevity, Advbox adopts misclassification as the adversarial criterion.

The structure of the Advbox module is as follows:

    .
    ├── advbox
    |   ├── __init__.py
    |   ├── attacks
    |   |   ├── __init__.py
    |   |   ├── base.py
    |   |   ├── deepfool.py
    |   |   ├── gradient_method.py
    |   |   ├── lbfgs.py
    |   |   └── saliency.py
    |   ├── models
    |   |   ├── __init__.py
    |   |   ├── base.py
    |   |   └── paddle.py
    |   └── adversary.py
    ├── tutorials
    |   ├── __init__.py
    |   ├── mnist_model.py
    |   ├── mnist_tutorial_lbfgs.py
    |   ├── mnist_tutorial_fgsm.py
    |   ├── mnist_tutorial_bim.py
    |   ├── mnist_tutorial_ilcm.py
    |   ├── mnist_tutorial_mifgsm.py
    |   ├── mnist_tutorial_jsma.py
    |   └── mnist_tutorial_deepfool.py
    └── README.md

**advbox.attack**

Advbox implements several popular adversarial attacks that search for adversarial examples. Each attack method uses a distance measure (L1, L2, etc.) to quantify the size of the adversarial perturbation. Advbox makes it easy to craft adversarial examples, as some attack methods perform internal hyperparameter tuning to find the minimum perturbation.

**advbox.model**

Advbox implements interfaces to PaddlePaddle. Additionally, interfaces to other deep learning frameworks such as TensorFlow can be defined and employed. This module is used to compute predictions and gradients for given inputs within a specific framework.

**advbox.adversary**

Adversary contains the original object, the target and the adversarial examples. It uses misclassification as the criterion for accepting an adversarial example.

## Tutorials
The `./tutorials/` folder provides some tutorials for generating adversarial examples on the MNIST dataset. You can slightly modify the code to apply it to other datasets. The following attack methods are supported in Advbox:

* [L-BFGS](https://arxiv.org/abs/1312.6199)
* [FGSM](https://arxiv.org/abs/1412.6572)
* [BIM](https://arxiv.org/abs/1607.02533)
* [ILCM](https://arxiv.org/abs/1607.02533)
* [MI-FGSM](https://arxiv.org/pdf/1710.06081.pdf)
* [JSMA](https://arxiv.org/pdf/1511.07528)
* [DeepFool](https://arxiv.org/abs/1511.04599)

## Testing
Benchmarks on a vanilla CNN model.
> MNIST

| adversarial attack | fooling rate (non-targeted) | fooling rate (targeted) | max_epsilon | iterations | strength |
|:-----:| :----: | :---: | :----: | :----: | :----: |
|L-BFGS| --- | 89.2% | --- | One shot | *** |
|FGSM| 57.8% | 26.55% | 0.3 | One shot| *** |
|BIM| 97.4% | --- | 0.1 | 100 | **** |
|ILCM| --- | 100.0% | 0.1 | 100 | **** |
|MI-FGSM| 94.4% | 100.0% | 0.1 | 100 | **** |
|JSMA| 96.8% | 90.4%| 0.1 | 2000 | *** |
|DeepFool| 97.7% | 51.3% | --- | 100 | **** |

* The strength (more asterisks means stronger) is based on the impression from the reviewed literature.

---
## References
* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199), C. Szegedy et al., arXiv 2014
* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572), I. Goodfellow et al., ICLR 2015
* [Adversarial Examples in the Physical World](https://arxiv.org/pdf/1607.02533v3.pdf), A. Kurakin et al., ICLR workshop 2017
* [Boosting Adversarial Attacks with Momentum](https://arxiv.org/abs/1710.06081), Yinpeng Dong et al., arXiv 2018
* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528), N. Papernot et al., EuroS&P 2016
* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/abs/1511.04599), S. Moosavi-Dezfooli et al., CVPR 2016
* [Foolbox: A Python toolbox to benchmark the robustness of machine learning models](https://arxiv.org/abs/1707.04131), Jonas Rauber et al., arXiv 2018
* [CleverHans: An adversarial example library for constructing attacks, building defenses, and benchmarking both](https://github.com/tensorflow/cleverhans#setting-up-cleverhans)
* [Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey](https://arxiv.org/abs/1801.00553), Naveed Akhtar, Ajmal Mian, arXiv 2018

diff --git a/fluid/adversarial/advbox/__init__.py b/PaddleCV/adversarial/advbox/__init__.py similarity index 100% rename from fluid/adversarial/advbox/__init__.py rename to PaddleCV/adversarial/advbox/__init__.py diff --git a/fluid/adversarial/advbox/adversary.py b/PaddleCV/adversarial/advbox/adversary.py similarity index 100% rename from fluid/adversarial/advbox/adversary.py rename to PaddleCV/adversarial/advbox/adversary.py diff --git a/fluid/adversarial/advbox/attacks/__init__.py b/PaddleCV/adversarial/advbox/attacks/__init__.py similarity index 100% rename from fluid/adversarial/advbox/attacks/__init__.py rename to PaddleCV/adversarial/advbox/attacks/__init__.py diff --git a/fluid/adversarial/advbox/attacks/base.py b/PaddleCV/adversarial/advbox/attacks/base.py similarity index 100% rename from fluid/adversarial/advbox/attacks/base.py rename to PaddleCV/adversarial/advbox/attacks/base.py diff --git a/fluid/adversarial/advbox/attacks/deepfool.py b/PaddleCV/adversarial/advbox/attacks/deepfool.py similarity index 100% rename from fluid/adversarial/advbox/attacks/deepfool.py rename to PaddleCV/adversarial/advbox/attacks/deepfool.py diff --git a/fluid/adversarial/advbox/attacks/gradient_method.py b/PaddleCV/adversarial/advbox/attacks/gradient_method.py similarity index 100% rename from fluid/adversarial/advbox/attacks/gradient_method.py rename to PaddleCV/adversarial/advbox/attacks/gradient_method.py diff --git a/fluid/adversarial/advbox/attacks/lbfgs.py b/PaddleCV/adversarial/advbox/attacks/lbfgs.py similarity index 100% rename from fluid/adversarial/advbox/attacks/lbfgs.py rename to PaddleCV/adversarial/advbox/attacks/lbfgs.py diff --git
a/fluid/adversarial/advbox/attacks/saliency.py b/PaddleCV/adversarial/advbox/attacks/saliency.py similarity index 100% rename from fluid/adversarial/advbox/attacks/saliency.py rename to PaddleCV/adversarial/advbox/attacks/saliency.py diff --git a/fluid/adversarial/advbox/models/__init__.py b/PaddleCV/adversarial/advbox/models/__init__.py similarity index 100% rename from fluid/adversarial/advbox/models/__init__.py rename to PaddleCV/adversarial/advbox/models/__init__.py diff --git a/fluid/adversarial/advbox/models/base.py b/PaddleCV/adversarial/advbox/models/base.py similarity index 100% rename from fluid/adversarial/advbox/models/base.py rename to PaddleCV/adversarial/advbox/models/base.py diff --git a/fluid/adversarial/advbox/models/paddle.py b/PaddleCV/adversarial/advbox/models/paddle.py similarity index 100% rename from fluid/adversarial/advbox/models/paddle.py rename to PaddleCV/adversarial/advbox/models/paddle.py diff --git a/fluid/adversarial/tutorials/__init__.py b/PaddleCV/adversarial/tutorials/__init__.py similarity index 100% rename from fluid/adversarial/tutorials/__init__.py rename to PaddleCV/adversarial/tutorials/__init__.py diff --git a/fluid/adversarial/tutorials/mnist_model.py b/PaddleCV/adversarial/tutorials/mnist_model.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_model.py rename to PaddleCV/adversarial/tutorials/mnist_model.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_bim.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_bim.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_bim.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_bim.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_deepfool.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_deepfool.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_deepfool.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_fgsm.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_fgsm.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_fgsm.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_ilcm.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_ilcm.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_ilcm.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_jsma.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_jsma.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_jsma.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_jsma.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_lbfgs.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_lbfgs.py diff --git a/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py b/PaddleCV/adversarial/tutorials/mnist_tutorial_mifgsm.py similarity index 100% rename from fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py rename to PaddleCV/adversarial/tutorials/mnist_tutorial_mifgsm.py diff --git a/fluid/PaddleCV/caffe2fluid/.gitignore b/PaddleCV/caffe2fluid/.gitignore similarity index 100% rename from fluid/PaddleCV/caffe2fluid/.gitignore rename to PaddleCV/caffe2fluid/.gitignore diff --git a/PaddleCV/caffe2fluid/README.md 
b/PaddleCV/caffe2fluid/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8520342325a1ef4e08d8f9669969acd5b6b57851 --- /dev/null +++ b/PaddleCV/caffe2fluid/README.md @@ -0,0 +1,87 @@

### Caffe2Fluid
This tool is used to convert a Caffe model to a Fluid model.

### Key Features
1. Converts a Caffe model to a Fluid model together with the code that defines the network (useful for re-training).

2. pycaffe is not necessary if you only want to convert the model without running Caffe inference.

3. Conversion of Caffe's customized layers is also supported by extending this tool.

4. A bunch of tools in `examples/imagenet/tools` are provided to compare the differences between Caffe and Fluid results.

### HowTo
1. Prepare `caffepb.py` in `./proto` if your Python has no `pycaffe` module; two options are provided here:
   - Generate pycaffe from caffe.proto
     ```
     bash ./proto/compile.sh
     ```

   - Download one from GitHub directly
     ```
     cd proto/ && wget https://raw.githubusercontent.com/ethereon/caffe-tensorflow/master/kaffe/caffe/caffepb.py
     ```

2. Convert the Caffe model to a Fluid model
   - Generate the Fluid code and weight file
     ```
     python convert.py alexnet.prototxt \
         --caffemodel alexnet.caffemodel \
         --data-output-path alexnet.npy \
         --code-output-path alexnet.py
     ```

   - Save the weights as a Fluid model file
     ```
     # only infer the last layer's result
     python alexnet.py alexnet.npy ./fluid
     # infer these 2 layers' results
     python alexnet.py alexnet.npy ./fluid fc8,prob
     ```

3. Use the converted model for inference
   - See more details in `examples/imagenet/tools/run.sh`

4. Compare the inference results with Caffe
   - See more details in `examples/imagenet/tools/diff.sh`

### How to convert a custom layer
1. Implement your custom layer in a file under `kaffe/custom_layers`, e.g. `mylayer.py`
   - Implement ```shape_func(input_shape, [other_caffe_params])``` to calculate the output shape
   - Implement ```layer_func(inputs, name, [other_caffe_params])``` to construct a Fluid layer
   - Register these two functions: ```register(kind='MyType', shape=shape_func, layer=layer_func)```
   - Note: more examples can be found in `kaffe/custom_layers`

2. Add ```import mylayer``` to `kaffe/custom_layers/__init__.py`

3. Prepare your pycaffe as your customized version (same as the environment preparation above)
   - (option1) replace `proto/caffe.proto` with your own caffe.proto and compile it
   - (option2) change your `pycaffe` to the customized version

4. Convert the Caffe model to a Fluid model

5. Set the environment variable $CAFFE2FLUID_CUSTOM_LAYERS to the parent directory of `custom_layers`
   ```
   export CAFFE2FLUID_CUSTOM_LAYERS=/path/to/caffe2fluid/kaffe
   ```

6. Use the converted model when loading the model from `xxxnet.py` and `xxxnet.npy` (not needed if the model is already in `fluid/model` and `fluid/params`)
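As a concrete illustration of step 1, a hypothetical `kaffe/custom_layers/mylayer.py` might look like the sketch below. The layer itself (an element-wise scale) and its Caffe parameter are made up for the example; only the `shape_func`/`layer_func`/`register` contract comes from this README.

```python
# kaffe/custom_layers/mylayer.py -- a made-up 'MyType' layer, for illustration.
import paddle.fluid as fluid

from .register import register

def mylayer_shape(input_shape, scale=1.0):
    """Output shape calculation: an element-wise op keeps the input shape."""
    return input_shape

def mylayer_layer(inputs, name, scale=1.0):
    """Build the equivalent Fluid op for one Caffe 'MyType' layer."""
    return fluid.layers.scale(inputs, scale=scale, name=name)

# Tell caffe2fluid how to handle Caffe layers of kind 'MyType'.
register(kind='MyType', shape=mylayer_shape, layer=mylayer_layer)
```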
### Tested models
- LeNet:
[model addr](https://github.com/ethereon/caffe-tensorflow/blob/master/examples/mnist)

- ResNets (ResNet-50, ResNet-101, ResNet-152):
[model addr](https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777)

- GoogleNet:
[model addr](https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034)

- VGG:
[model addr](https://gist.github.com/ksimonyan/211839e770f7b538e2d8)

- AlexNet:
[model addr](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet)

### Notes
Some of this code comes from [caffe-tensorflow](https://github.com/ethereon/caffe-tensorflow).

diff --git a/fluid/PaddleCV/caffe2fluid/convert.py b/PaddleCV/caffe2fluid/convert.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/convert.py rename to PaddleCV/caffe2fluid/convert.py diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/README.md b/PaddleCV/caffe2fluid/examples/imagenet/README.md similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/README.md rename to PaddleCV/caffe2fluid/examples/imagenet/README.md diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/compare.py b/PaddleCV/caffe2fluid/examples/imagenet/compare.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/compare.py rename to PaddleCV/caffe2fluid/examples/imagenet/compare.py diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/data/65.jpeg b/PaddleCV/caffe2fluid/examples/imagenet/data/65.jpeg similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/data/65.jpeg rename to PaddleCV/caffe2fluid/examples/imagenet/data/65.jpeg diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/infer.py b/PaddleCV/caffe2fluid/examples/imagenet/infer.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/infer.py rename to PaddleCV/caffe2fluid/examples/imagenet/infer.py diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp.sh b/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp.sh rename to PaddleCV/caffe2fluid/examples/imagenet/tools/cmp.sh diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp_layers.sh b/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp_layers.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/cmp_layers.sh rename to PaddleCV/caffe2fluid/examples/imagenet/tools/cmp_layers.sh diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/diff.sh b/PaddleCV/caffe2fluid/examples/imagenet/tools/diff.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/diff.sh rename to PaddleCV/caffe2fluid/examples/imagenet/tools/diff.sh diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/run.sh b/PaddleCV/caffe2fluid/examples/imagenet/tools/run.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/run.sh rename to PaddleCV/caffe2fluid/examples/imagenet/tools/run.sh diff --git a/fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/test.sh b/PaddleCV/caffe2fluid/examples/imagenet/tools/test.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/imagenet/tools/test.sh rename to PaddleCV/caffe2fluid/examples/imagenet/tools/test.sh diff --git
a/fluid/PaddleCV/caffe2fluid/examples/mnist/README.md b/PaddleCV/caffe2fluid/examples/mnist/README.md similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/mnist/README.md rename to PaddleCV/caffe2fluid/examples/mnist/README.md diff --git a/fluid/PaddleCV/caffe2fluid/examples/mnist/evaluate.py b/PaddleCV/caffe2fluid/examples/mnist/evaluate.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/mnist/evaluate.py rename to PaddleCV/caffe2fluid/examples/mnist/evaluate.py diff --git a/fluid/PaddleCV/caffe2fluid/examples/mnist/run.sh b/PaddleCV/caffe2fluid/examples/mnist/run.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/examples/mnist/run.sh rename to PaddleCV/caffe2fluid/examples/mnist/run.sh diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/__init__.py b/PaddleCV/caffe2fluid/kaffe/__init__.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/__init__.py rename to PaddleCV/caffe2fluid/kaffe/__init__.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/caffe/__init__.py b/PaddleCV/caffe2fluid/kaffe/caffe/__init__.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/caffe/__init__.py rename to PaddleCV/caffe2fluid/kaffe/caffe/__init__.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/caffe/resolver.py b/PaddleCV/caffe2fluid/kaffe/caffe/resolver.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/caffe/resolver.py rename to PaddleCV/caffe2fluid/kaffe/caffe/resolver.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/__init__.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/__init__.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/__init__.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/__init__.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/argmax.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/argmax.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/argmax.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/argmax.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/axpy.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/axpy.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/axpy.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/axpy.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/crop.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/crop.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/crop.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/crop.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/detection_out.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/detection_out.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/detection_out.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/detection_out.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/flatten.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/flatten.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/flatten.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/flatten.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/normalize.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/normalize.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/normalize.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/normalize.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/permute.py 
b/PaddleCV/caffe2fluid/kaffe/custom_layers/permute.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/permute.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/permute.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/power.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/power.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/power.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/power.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/priorbox.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/priorbox.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/priorbox.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/priorbox.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/reduction.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/reduction.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/reduction.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/reduction.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/register.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/register.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/register.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/register.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/reshape.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/reshape.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/reshape.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/reshape.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/roipooling.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/roipooling.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/roipooling.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/roipooling.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/select.py b/PaddleCV/caffe2fluid/kaffe/custom_layers/select.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/custom_layers/select.py rename to PaddleCV/caffe2fluid/kaffe/custom_layers/select.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/errors.py b/PaddleCV/caffe2fluid/kaffe/errors.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/errors.py rename to PaddleCV/caffe2fluid/kaffe/errors.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/graph.py b/PaddleCV/caffe2fluid/kaffe/graph.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/graph.py rename to PaddleCV/caffe2fluid/kaffe/graph.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/layers.py b/PaddleCV/caffe2fluid/kaffe/layers.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/layers.py rename to PaddleCV/caffe2fluid/kaffe/layers.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/net_template.py b/PaddleCV/caffe2fluid/kaffe/net_template.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/net_template.py rename to PaddleCV/caffe2fluid/kaffe/net_template.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/paddle/__init__.py b/PaddleCV/caffe2fluid/kaffe/paddle/__init__.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/paddle/__init__.py rename to PaddleCV/caffe2fluid/kaffe/paddle/__init__.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/paddle/network.py b/PaddleCV/caffe2fluid/kaffe/paddle/network.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/paddle/network.py rename to 
PaddleCV/caffe2fluid/kaffe/paddle/network.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/paddle/transformer.py b/PaddleCV/caffe2fluid/kaffe/paddle/transformer.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/paddle/transformer.py rename to PaddleCV/caffe2fluid/kaffe/paddle/transformer.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/protobuf_to_dict.py b/PaddleCV/caffe2fluid/kaffe/protobuf_to_dict.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/protobuf_to_dict.py rename to PaddleCV/caffe2fluid/kaffe/protobuf_to_dict.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/shapes.py b/PaddleCV/caffe2fluid/kaffe/shapes.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/shapes.py rename to PaddleCV/caffe2fluid/kaffe/shapes.py diff --git a/fluid/PaddleCV/caffe2fluid/kaffe/transformers.py b/PaddleCV/caffe2fluid/kaffe/transformers.py similarity index 100% rename from fluid/PaddleCV/caffe2fluid/kaffe/transformers.py rename to PaddleCV/caffe2fluid/kaffe/transformers.py diff --git a/fluid/PaddleCV/caffe2fluid/proto/caffe.proto b/PaddleCV/caffe2fluid/proto/caffe.proto similarity index 100% rename from fluid/PaddleCV/caffe2fluid/proto/caffe.proto rename to PaddleCV/caffe2fluid/proto/caffe.proto diff --git a/fluid/PaddleCV/caffe2fluid/proto/compile.sh b/PaddleCV/caffe2fluid/proto/compile.sh similarity index 100% rename from fluid/PaddleCV/caffe2fluid/proto/compile.sh rename to PaddleCV/caffe2fluid/proto/compile.sh diff --git a/fluid/PaddleCV/deeplabv3+/.gitignore b/PaddleCV/deeplabv3+/.gitignore similarity index 100% rename from fluid/PaddleCV/deeplabv3+/.gitignore rename to PaddleCV/deeplabv3+/.gitignore diff --git a/fluid/PaddleCV/deeplabv3+/.run_ce.sh b/PaddleCV/deeplabv3+/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/deeplabv3+/.run_ce.sh rename to PaddleCV/deeplabv3+/.run_ce.sh diff --git a/PaddleCV/deeplabv3+/README.md b/PaddleCV/deeplabv3+/README.md new file mode 100644 index 0000000000000000000000000000000000000000..eff83fee192d6a34cb338f5c705fb7ec1f59fd07 --- /dev/null +++ b/PaddleCV/deeplabv3+/README.md @@ -0,0 +1,116 @@

DeepLab

Running the example programs in this directory requires PaddlePaddle Fluid v1.3.0 or above. If your installed PaddlePaddle is below this requirement, please update it following the instructions in the installation document. When using a GPU, the program requires cuDNN v7.


## Code structure
```
├── models.py # network definition
├── train.py # training script
├── eval.py # evaluation script
└── reader.py # common functions and data preprocessing
```

## Introduction

DeepLabv3+ is the latest work in the DeepLab series of semantic segmentation networks, following DeepLabv1, DeepLabv2 and DeepLabv3. In this latest work, the DeepLab authors fuse multi-scale information through an encoder-decoder structure while retaining the original atrous convolutions and the ASPP layer, and use an Xception backbone, improving the robustness and speed of semantic segmentation. It reached a new state-of-the-art performance of 89.0 mIOU on the PASCAL VOC 2012 dataset.

![](./imgs/model.png)


## Data preparation

This example uses the Cityscapes dataset; please register at the [Cityscapes website](https://www.cityscapes-dataset.com) to download it. After downloading, the data directory looks like:
```
data/cityscape/
|-- gtFine
|   |-- test
|   |-- train
|   `-- val
|-- leftImg8bit
    |-- test
    |-- train
    `-- val
```

## Pretrained model preparation

To save more GPU memory, we use Group Norm as the normalization method here. If you want to train the model from scratch, download our initialization model:
```
wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz
tar -xf deeplabv3plus_gn_init.tgz && rm deeplabv3plus_gn_init.tgz
```
If you want the final trained model for fine-tuning or direct inference, download:
```
wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz
tar -xf deeplabv3plus_gn.tgz && rm deeplabv3plus_gn.tgz
```


## Training and inference

### Training
Run the following command to train, specifying the weight save path, the initialization path and the data location:
```
python ./train.py \
    --batch_size=1 \
    --train_crop_size=769 \
    --total_step=50 \
    --norm_type=gn \
    --init_weights_path=$INIT_WEIGHTS_PATH \
    --save_weights_path=$SAVE_WEIGHTS_PATH \
    --dataset_path=$DATASET_PATH
```
Use the following command for more usage instructions:
```
python train.py --help
```
The command above only checks that the training process runs: it iterates just 50 steps with a batch size of 1. To reproduce the experiments of the original paper, use the following settings:
```
CUDA_VISIBLE_DEVICES=0 \
python ./train.py \
    --batch_size=4 \
    --parallel=True \
    --norm_type=gn \
    --train_crop_size=769 \
    --total_step=500000 \
    --base_lr=0.001 \
    --init_weights_path=deeplabv3plus_gn_init \
    --save_weights_path=output \
    --dataset_path=$DATASET_PATH
```
If you run out of GPU memory, try reducing `batch_size` while scaling `total_step` up and `base_lr` down by the same factor, keeping the products unchanged. Thanks to the properties of Group Norm, changing `batch_size` does not significantly affect the result while saving memory; for example, you can set `--batch_size=2 --total_step=1000000 --base_lr=0.0005`.
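The scaling rule in the last paragraph keeps `batch_size * total_step` and `base_lr / batch_size` constant. A small helper illustrating that arithmetic (the function is illustrative and not part of the repo):

```python
def scale_config(batch_size, base=None):
    """Rescale total_step and base_lr when batch_size changes, keeping
    batch_size * total_step and base_lr / batch_size constant."""
    base = base or {"batch_size": 4, "total_step": 500000, "base_lr": 0.001}
    factor = batch_size / float(base["batch_size"])
    return {"batch_size": batch_size,
            "total_step": int(base["total_step"] / factor),
            "base_lr": base["base_lr"] * factor}

# Reproduces the example in the README:
print(scale_config(2))  # {'batch_size': 2, 'total_step': 1000000, 'base_lr': 0.0005}
```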
### Evaluation
Run the following command to evaluate on the `Cityscape` test dataset:
```
python ./eval.py \
    --init_weights_path=deeplabv3plus_gn \
    --norm_type=gn \
    --dataset_path=$DATASET_PATH
```
The model file must be specified with the `--init_weights_path` option. The evaluation metric reported by the script is mean IoU.


## Experimental results
After training, run `eval.py` on the validation set to obtain:
```
load from: ../models/deeplabv3plus_gn
total number 500
step: 500, mIoU: 0.7881
```

## Other information

| dataset | norm type | pretrained model | trained model | mean IoU |
|---|---|---|---|---|
| CityScape | group norm | [deeplabv3plus_gn_init.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz) | [deeplabv3plus_gn.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz) | 0.7881 |

## References

- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611)

diff --git a/fluid/DeepASR/data_utils/augmentor/__init__.py b/PaddleCV/deeplabv3+/__init__.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/__init__.py rename to PaddleCV/deeplabv3+/__init__.py diff --git a/fluid/PaddleCV/deeplabv3+/_ce.py b/PaddleCV/deeplabv3+/_ce.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/_ce.py rename to PaddleCV/deeplabv3+/_ce.py diff --git a/fluid/PaddleCV/deeplabv3+/eval.py b/PaddleCV/deeplabv3+/eval.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/eval.py rename to PaddleCV/deeplabv3+/eval.py diff --git a/fluid/PaddleCV/deeplabv3+/imgs/model.png b/PaddleCV/deeplabv3+/imgs/model.png similarity index 100% rename from fluid/PaddleCV/deeplabv3+/imgs/model.png rename to PaddleCV/deeplabv3+/imgs/model.png diff --git a/fluid/PaddleCV/deeplabv3+/models.py b/PaddleCV/deeplabv3+/models.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/models.py rename to PaddleCV/deeplabv3+/models.py diff --git a/fluid/PaddleCV/deeplabv3+/reader.py b/PaddleCV/deeplabv3+/reader.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/reader.py rename to PaddleCV/deeplabv3+/reader.py diff --git a/fluid/PaddleCV/deeplabv3+/train.py b/PaddleCV/deeplabv3+/train.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/train.py rename to PaddleCV/deeplabv3+/train.py diff --git a/fluid/PaddleCV/deeplabv3+/utility.py b/PaddleCV/deeplabv3+/utility.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/utility.py rename to PaddleCV/deeplabv3+/utility.py diff --git a/fluid/PaddleCV/face_detection/.gitignore b/PaddleCV/face_detection/.gitignore similarity index 100% rename from fluid/PaddleCV/face_detection/.gitignore rename to PaddleCV/face_detection/.gitignore diff --git a/fluid/PaddleCV/face_detection/.run_ce.sh b/PaddleCV/face_detection/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/face_detection/.run_ce.sh rename to PaddleCV/face_detection/.run_ce.sh diff --git a/PaddleCV/face_detection/README.md b/PaddleCV/face_detection/README.md new
file mode 120000 index 0000000000000000000000000000000000000000..4015683cfa5969297febc12e7ca1264afabbc0b5 --- /dev/null +++ b/PaddleCV/face_detection/README.md @@ -0,0 +1 @@ +README_cn.md \ No newline at end of file diff --git a/PaddleCV/face_detection/README_cn.md b/PaddleCV/face_detection/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..f63fbed02ab34520d79b2d2b000e31f5eb22e7f8 --- /dev/null +++ b/PaddleCV/face_detection/README_cn.md @@ -0,0 +1,185 @@ +## Pyramidbox 人脸检测 + +## Table of Contents +- [简介](#简介) +- [数据准备](#数据准备) +- [模型训练](#模型训练) +- [模型评估](#模型评估) +- [模型发布](#模型发布) + +### 简介 + +人脸检测是经典的计算机视觉任务,非受控场景中的小脸、模糊和遮挡的人脸检测是这个方向上最有挑战的问题。[PyramidBox](https://arxiv.org/pdf/1803.07737.pdf) 是一种基于SSD的单阶段人脸检测器,它利用上下文信息解决困难人脸的检测问题。如下图所示,PyramidBox在六个尺度的特征图上进行不同层级的预测。该工作主要包括以下模块:LFPN、Pyramid Anchors、CPM、Data-anchor-sampling。具体可以参考该方法对应的论文 https://arxiv.org/pdf/1803.07737.pdf ,下面进行简要的介绍。 + +

+
+Pyramidbox 人脸检测模型 +

**LFPN**: LFPN stands for Low-level Feature Pyramid Networks. For detection, LFPN fully combines high-level features, which carry more context, with low-level features, which carry more texture. High-level features are used to detect large faces, while low-level features are used to detect small faces. To integrate high-level features into the high-resolution low-level features, a top-down fusion is built starting from an intermediate layer, forming the low-level FPN.

**Pyramid Anchors**: The algorithm uses a semi-supervised scheme to generate approximate, semantically meaningful labels related to face detection, and proposes an anchor-based context-assisted method that introduces supervision for learning the contextual features of small, blurred, and partially occluded faces. Starting from the annotated face boxes, the labels can be expanded by fixed ratios to obtain head labels (expanded by 1/2 on each side) and body labels (with a user-defined expansion ratio); a toy sketch of this expansion appears after the demo figure below.

**CPM**: CPM stands for Context-sensitive Predict Module. The method designs a context-sensitive structure (CPM) to increase the expressive capacity of the prediction network.

**Data-anchor-sampling**: A new sampling method, called Data-anchor-sampling, increases the scale diversity of training samples. It reshapes the distribution of the training data to focus on smaller faces.

PyramidBox shows robust detection performance on the example image below, which contains one thousand faces, of which the model detects 880.

+
+Pyramidbox 人脸检测性能展示 +

### Data Preparation

This tutorial uses the [WIDER FACE dataset](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) for training and testing; the official site documents the data in detail.

WIDER FACE contains 32,203 images with 393,703 annotated faces that vary considerably in scale, pose, and occlusion. The dataset is organized into 61 scene categories; within each scene, 40% of the images are randomly selected for training, 10% for validation, and 50% for testing.

First, download the training and validation sets from the official site (Google Drive and Baidu Cloud download links are provided) and place them in the `data` directory. Then download the annotations for the training and validation sets:

```bash
./data/download.sh
```

Once the data is ready, the `data` directory looks like this:

```
data
|-- download.sh
|-- wider_face_split
|   |-- readme.txt
|   |-- wider_face_train_bbx_gt.txt
|   |-- wider_face_val_bbx_gt.txt
|   `-- ...
|-- WIDER_train
|   `-- images
|       |-- 0--Parade
|       ...
|       `-- 9--Press_Conference
`-- WIDER_val
    `-- images
        |-- 0--Parade
        ...
        `-- 9--Press_Conference
```


### Model Training

#### Download a Pretrained Model

We provide a pretrained model with a VGGNet backbone. Download it with:


```bash
wget http://paddlemodels.bj.bcebos.com/vgg_ilsvrc_16_fc_reduced.tar.gz
tar -xf vgg_ilsvrc_16_fc_reduced.tar.gz && rm -f vgg_ilsvrc_16_fc_reduced.tar.gz
```

Note: this pretrained model was converted from [Caffe](http://cs.unc.edu/~wliu/projects/ParseNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel). We will release our own pretrained model soon.


#### Start Training


`train.py` is the main entry point for training. Example invocation:

```bash
python -u train.py --batch_size=16 --pretrained_model=vgg_ilsvrc_16_fc_reduced
```
  - Set `export CUDA_VISIBLE_DEVICES=0,1,2,3` to choose the GPUs to use; `batch_size` is set to 12 or 16 by default.
  - See more optional arguments with:
  ```bash
  python train.py --help
  ```
  - The model converges after about 150 epochs of training. With 4 Nvidia Tesla P40 GPUs in parallel and `batch_size=16`, each epoch takes roughly 40 minutes, about 100 hours of training in total.

**Data augmentation**: Data loading is defined in `reader.py`, and all images are resized to 640x640. During training, images are additionally augmented with random distortion, flipping, and cropping, similar to the augmentation in the [SSD object detection example](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README.md), plus the Data-anchor-sampling introduced above:

  **Scale transformation (Data-anchor-sampling)**: Randomly rescales the image within a certain range of scales, greatly increasing the scale variation of the faces. Concretely, given a randomly selected face with height `height` and width `width`, compute $v=\sqrt{width \times height}$ and determine which of the anchor scales $[16,32,64,128,256,512]$ the value $v$ falls between; for example, if $v=45$, the interval $32 < v < 64$ is selected.
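To make the scale-selection step concrete, here is a small, hedged Python sketch of the idea; the function and the final choice of resize target are illustrative assumptions, not the actual logic in `reader.py`:

```python
import math
import random

ANCHORS = [16, 32, 64, 128, 256, 512]

def sample_resize_factor(width, height):
    """Illustrative Data-anchor-sampling: rescale a random face toward an anchor."""
    v = math.sqrt(width * height)  # equivalent face size
    # index of the largest anchor scale not exceeding v (clamped to the list)
    i = max(0, min(len(ANCHORS) - 1, int(math.floor(math.log(v / 16.0, 2)))))
    # pick uniformly among the anchors up to one step above the matched interval
    target = random.choice(ANCHORS[:min(i + 2, len(ANCHORS))])
    return target / v  # factor by which to resize the whole image

print(sample_resize_factor(40, 50))  # v ~= 44.7 falls in the interval [32, 64)
```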
*Figure: PyramidBox prediction visualization.*
### Released Models

| Model | Pretrained Model | Training Data | Test Data | mAP |
|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
|[PyramidBox-v1-SSD 640x640](http://paddlemodels.bj.bcebos.com/PyramidBox_WiderFace.tar.gz) | [VGGNet](http://paddlemodels.bj.bcebos.com/vgg_ilsvrc_16_fc_reduced.tar.gz) | WIDER FACE train | WIDER FACE val | 96.0% / 94.8% / 88.8% |

#### Performance Curves

+ + +
+WIDER FACE Easy/Medium/Hard set +

diff --git a/fluid/DeepASR/model_utils/__init__.py b/PaddleCV/face_detection/__init__.py similarity index 100% rename from fluid/DeepASR/model_utils/__init__.py rename to PaddleCV/face_detection/__init__.py diff --git a/fluid/PaddleCV/face_detection/_ce.py b/PaddleCV/face_detection/_ce.py similarity index 100% rename from fluid/PaddleCV/face_detection/_ce.py rename to PaddleCV/face_detection/_ce.py diff --git a/fluid/PaddleCV/face_detection/data/download.sh b/PaddleCV/face_detection/data/download.sh similarity index 100% rename from fluid/PaddleCV/face_detection/data/download.sh rename to PaddleCV/face_detection/data/download.sh diff --git a/fluid/PaddleCV/face_detection/image_util.py b/PaddleCV/face_detection/image_util.py similarity index 100% rename from fluid/PaddleCV/face_detection/image_util.py rename to PaddleCV/face_detection/image_util.py diff --git a/fluid/PaddleCV/face_detection/images/0_Parade_marchingband_1_356.jpg b/PaddleCV/face_detection/images/0_Parade_marchingband_1_356.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/0_Parade_marchingband_1_356.jpg rename to PaddleCV/face_detection/images/0_Parade_marchingband_1_356.jpg diff --git a/fluid/PaddleCV/face_detection/images/12_Group_Group_12_Group_Group_12_935.jpg b/PaddleCV/face_detection/images/12_Group_Group_12_Group_Group_12_935.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/12_Group_Group_12_Group_Group_12_935.jpg rename to PaddleCV/face_detection/images/12_Group_Group_12_Group_Group_12_935.jpg diff --git a/fluid/PaddleCV/face_detection/images/28_Sports_Fan_Sports_Fan_28_770.jpg b/PaddleCV/face_detection/images/28_Sports_Fan_Sports_Fan_28_770.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/28_Sports_Fan_Sports_Fan_28_770.jpg rename to PaddleCV/face_detection/images/28_Sports_Fan_Sports_Fan_28_770.jpg diff --git a/fluid/PaddleCV/face_detection/images/4_Dancing_Dancing_4_194.jpg b/PaddleCV/face_detection/images/4_Dancing_Dancing_4_194.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/4_Dancing_Dancing_4_194.jpg rename to PaddleCV/face_detection/images/4_Dancing_Dancing_4_194.jpg diff --git a/fluid/PaddleCV/face_detection/images/architecture_of_pyramidbox.jpg b/PaddleCV/face_detection/images/architecture_of_pyramidbox.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/architecture_of_pyramidbox.jpg rename to PaddleCV/face_detection/images/architecture_of_pyramidbox.jpg diff --git a/fluid/PaddleCV/face_detection/images/demo_img.jpg b/PaddleCV/face_detection/images/demo_img.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/demo_img.jpg rename to PaddleCV/face_detection/images/demo_img.jpg diff --git a/fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_easy_val.jpg b/PaddleCV/face_detection/images/wider_pr_cruve_int_easy_val.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_easy_val.jpg rename to PaddleCV/face_detection/images/wider_pr_cruve_int_easy_val.jpg diff --git a/fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_hard_val.jpg b/PaddleCV/face_detection/images/wider_pr_cruve_int_hard_val.jpg similarity index 100% rename from fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_hard_val.jpg rename to PaddleCV/face_detection/images/wider_pr_cruve_int_hard_val.jpg diff --git a/fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_medium_val.jpg b/PaddleCV/face_detection/images/wider_pr_cruve_int_medium_val.jpg 
similarity index 100% rename from fluid/PaddleCV/face_detection/images/wider_pr_cruve_int_medium_val.jpg rename to PaddleCV/face_detection/images/wider_pr_cruve_int_medium_val.jpg diff --git a/fluid/PaddleCV/face_detection/profile.py b/PaddleCV/face_detection/profile.py similarity index 100% rename from fluid/PaddleCV/face_detection/profile.py rename to PaddleCV/face_detection/profile.py diff --git a/fluid/PaddleCV/face_detection/pyramidbox.py b/PaddleCV/face_detection/pyramidbox.py similarity index 100% rename from fluid/PaddleCV/face_detection/pyramidbox.py rename to PaddleCV/face_detection/pyramidbox.py diff --git a/fluid/PaddleCV/face_detection/reader.py b/PaddleCV/face_detection/reader.py similarity index 100% rename from fluid/PaddleCV/face_detection/reader.py rename to PaddleCV/face_detection/reader.py diff --git a/fluid/PaddleCV/face_detection/train.py b/PaddleCV/face_detection/train.py similarity index 100% rename from fluid/PaddleCV/face_detection/train.py rename to PaddleCV/face_detection/train.py diff --git a/fluid/PaddleCV/face_detection/utility.py b/PaddleCV/face_detection/utility.py similarity index 100% rename from fluid/PaddleCV/face_detection/utility.py rename to PaddleCV/face_detection/utility.py diff --git a/fluid/PaddleCV/face_detection/visualize.py b/PaddleCV/face_detection/visualize.py similarity index 100% rename from fluid/PaddleCV/face_detection/visualize.py rename to PaddleCV/face_detection/visualize.py diff --git a/fluid/PaddleCV/face_detection/widerface_eval.py b/PaddleCV/face_detection/widerface_eval.py similarity index 100% rename from fluid/PaddleCV/face_detection/widerface_eval.py rename to PaddleCV/face_detection/widerface_eval.py diff --git a/fluid/PaddleCV/gan/c_gan/.run_ce.sh b/PaddleCV/gan/c_gan/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/gan/c_gan/.run_ce.sh rename to PaddleCV/gan/c_gan/.run_ce.sh diff --git a/PaddleCV/gan/c_gan/README.md b/PaddleCV/gan/c_gan/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9f3c18fd0fb9a943f728548f655d3dd3cef73288 --- /dev/null +++ b/PaddleCV/gan/c_gan/README.md @@ -0,0 +1,76 @@

Running the examples in this directory requires the latest develop version of PaddlePaddle. If your installed PaddlePaddle is older, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.

## Code Structure
```
├── network.py # Defines the basic generator and discriminator networks.
├── utility.py # Common utility functions.
├── dc_gan.py  # DCGAN training script.
└── c_gan.py   # Conditional GAN training script.
```

## Introduction
TODO

## Data Preparation

This tutorial uses the MNIST dataset for training and testing; it is downloaded automatically through the `paddle.dataset` module.

## Training and Testing the Conditional GAN

Train the conditional GAN on a single GPU:

```
env CUDA_VISIBLE_DEVICES=0 python c_gan.py --output="./result"
```

During training, a batch of test data is run every fixed number of epochs, and the results are saved as images under the path specified by `--output`.

Run `python c_gan.py --help` for more usage information and detailed parameter descriptions.

Figure 1 shows the conditional GAN training losses, where the horizontal axis is the training epoch and the vertical axis is the loss on the training set; 'G_loss' and 'D_loss' are the training losses of the generator and discriminator networks, respectively. Figure 2 shows predictions of the model after 19 training epochs; a small sketch of the label-conditioning idea follows the figures.

+ + + + + + + + + + + +
+ + + +
+ 图 1 + + 图 2 +
+

## Training and Testing DCGAN

Train DCGAN on a single GPU:

```
env CUDA_VISIBLE_DEVICES=0 python dc_gan.py --output="./result"
```

During training, a batch of test data is run every fixed number of epochs, and the results are saved as images under the path specified by `--output`.

Run `python dc_gan.py --help` for more usage information and detailed parameter descriptions.


Figure 3 shows predictions of the DCGAN model after 10 training epochs (a short sketch of the G_loss/D_loss bookkeeping that both scripts report follows the figure):

+
+图 3 +

diff --git a/fluid/PaddleCV/gan/c_gan/_ce.py b/PaddleCV/gan/c_gan/_ce.py similarity index 100% rename from fluid/PaddleCV/gan/c_gan/_ce.py rename to PaddleCV/gan/c_gan/_ce.py diff --git a/fluid/PaddleCV/gan/c_gan/c_gan.py b/PaddleCV/gan/c_gan/c_gan.py similarity index 100% rename from fluid/PaddleCV/gan/c_gan/c_gan.py rename to PaddleCV/gan/c_gan/c_gan.py diff --git a/fluid/PaddleCV/gan/c_gan/dc_gan.py b/PaddleCV/gan/c_gan/dc_gan.py similarity index 100% rename from fluid/PaddleCV/gan/c_gan/dc_gan.py rename to PaddleCV/gan/c_gan/dc_gan.py diff --git a/fluid/PaddleCV/gan/c_gan/images/DCGAN_demo.png b/PaddleCV/gan/c_gan/images/DCGAN_demo.png similarity index 100% rename from fluid/PaddleCV/gan/c_gan/images/DCGAN_demo.png rename to PaddleCV/gan/c_gan/images/DCGAN_demo.png diff --git a/fluid/PaddleCV/gan/c_gan/images/conditionalGAN_demo.png b/PaddleCV/gan/c_gan/images/conditionalGAN_demo.png similarity index 100% rename from fluid/PaddleCV/gan/c_gan/images/conditionalGAN_demo.png rename to PaddleCV/gan/c_gan/images/conditionalGAN_demo.png diff --git a/fluid/PaddleCV/gan/c_gan/images/conditionalGAN_loss.png b/PaddleCV/gan/c_gan/images/conditionalGAN_loss.png similarity index 100% rename from fluid/PaddleCV/gan/c_gan/images/conditionalGAN_loss.png rename to PaddleCV/gan/c_gan/images/conditionalGAN_loss.png diff --git a/fluid/PaddleCV/gan/c_gan/network.py b/PaddleCV/gan/c_gan/network.py similarity index 100% rename from fluid/PaddleCV/gan/c_gan/network.py rename to PaddleCV/gan/c_gan/network.py diff --git a/fluid/PaddleCV/gan/c_gan/utility.py b/PaddleCV/gan/c_gan/utility.py similarity index 100% rename from fluid/PaddleCV/gan/c_gan/utility.py rename to PaddleCV/gan/c_gan/utility.py diff --git a/fluid/PaddleCV/gan/cycle_gan/.run_ce.sh b/PaddleCV/gan/cycle_gan/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/.run_ce.sh rename to PaddleCV/gan/cycle_gan/.run_ce.sh diff --git a/PaddleCV/gan/cycle_gan/README.md b/PaddleCV/gan/cycle_gan/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0a9be53c783a557b7c2306f65377e4cafa8cfd90 --- /dev/null +++ b/PaddleCV/gan/cycle_gan/README.md @@ -0,0 +1,91 @@

Running the examples in this directory requires the latest develop version of PaddlePaddle. If your installed PaddlePaddle is older, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.

## Code Structure
```
├── data_reader.py # Reads and preprocesses the data.
├── layers.py      # Wraps and defines the basic layers.
├── model.py       # Defines the basic generator and discriminator networks.
├── trainer.py     # Builds the losses and the training networks.
├── train.py       # Training script.
└── infer.py       # Inference script.
```

## Introduction
TODO

## Data Preparation

This tutorial trains and tests on the horse2zebra dataset, which was obtained by filtering the [ImageNet](http://www.image-net.org/) dataset with the keywords 'wild horse' and 'zebra' and downloading the results.

The horse2zebra training set contains 1,069 wild-horse images and 1,336 zebra images; the test set contains 121 wild-horse images and 141 zebra images.

After the data is downloaded and processed, it is organized with the following layout:

```
data
|-- horse2zebra
|   |-- testA
|   |-- testA.txt
|   |-- testB
|   |-- testB.txt
|   |-- trainA
|   |-- trainA.txt
|   |-- trainB
|   `-- trainB.txt

```

The `data` folder must sit in the same directory as the training script `train.py`. `testA` holds the wild-horse test images and `testB` the zebra test images; `testA.txt` and `testB.txt` are the path-list files for the wild-horse and zebra test images, formatted as:

```
testA/n02381460_9243.jpg
testA/n02381460_9244.jpg
testA/n02381460_9245.jpg
```

The training data is organized in the same way as the test data.


## Training and Inference

### Training

Train on a single GPU:

```
env CUDA_VISIBLE_DEVICES=0 python train.py
```

Run `python train.py --help` for more usage information and detailed parameter descriptions.

Figure 1 shows the training losses over 152 epochs, where the horizontal axis is the training epoch and the vertical axis is the loss on the training set; 'g_A_loss', 'g_B_loss', 'd_A_loss', and 'd_B_loss' are the training losses of generator A, generator B, discriminator A, and discriminator B, respectively. A toy sketch of the cycle-consistency term that complements these adversarial losses follows the figure.

+
+图 1 +

### Inference

Run the following command to read a set of images and generate predictions:

```
env CUDA_VISIBLE_DEVICES=0 python infer.py \
    --init_model="checkpoints/1" --input="./data/inputA/*" \
    --input_style A --output="./output"
```

Figures 2 and 3 show predictions of the model after 150 training epochs:

+
+图 2 +

+ +

+
+图 3 +

diff --git a/fluid/PaddleCV/gan/cycle_gan/_ce.py b/PaddleCV/gan/cycle_gan/_ce.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/_ce.py rename to PaddleCV/gan/cycle_gan/_ce.py diff --git a/fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA.txt b/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA.txt similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA.txt rename to PaddleCV/gan/cycle_gan/data/horse2zebra/trainA.txt diff --git a/fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA/n02381460_1001.jpg b/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA/n02381460_1001.jpg similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainA/n02381460_1001.jpg rename to PaddleCV/gan/cycle_gan/data/horse2zebra/trainA/n02381460_1001.jpg diff --git a/fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB.txt b/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB.txt similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB.txt rename to PaddleCV/gan/cycle_gan/data/horse2zebra/trainB.txt diff --git a/fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB/n02391049_10007.jpg b/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB/n02391049_10007.jpg similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/data/horse2zebra/trainB/n02391049_10007.jpg rename to PaddleCV/gan/cycle_gan/data/horse2zebra/trainB/n02391049_10007.jpg diff --git a/fluid/PaddleCV/gan/cycle_gan/data_reader.py b/PaddleCV/gan/cycle_gan/data_reader.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/data_reader.py rename to PaddleCV/gan/cycle_gan/data_reader.py diff --git a/fluid/PaddleCV/gan/cycle_gan/images/A2B.jpg b/PaddleCV/gan/cycle_gan/images/A2B.jpg similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/images/A2B.jpg rename to PaddleCV/gan/cycle_gan/images/A2B.jpg diff --git a/fluid/PaddleCV/gan/cycle_gan/images/B2A.jpg b/PaddleCV/gan/cycle_gan/images/B2A.jpg similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/images/B2A.jpg rename to PaddleCV/gan/cycle_gan/images/B2A.jpg diff --git a/fluid/PaddleCV/gan/cycle_gan/images/cycleGAN_loss.png b/PaddleCV/gan/cycle_gan/images/cycleGAN_loss.png similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/images/cycleGAN_loss.png rename to PaddleCV/gan/cycle_gan/images/cycleGAN_loss.png diff --git a/fluid/PaddleCV/gan/cycle_gan/infer.py b/PaddleCV/gan/cycle_gan/infer.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/infer.py rename to PaddleCV/gan/cycle_gan/infer.py diff --git a/fluid/PaddleCV/gan/cycle_gan/layers.py b/PaddleCV/gan/cycle_gan/layers.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/layers.py rename to PaddleCV/gan/cycle_gan/layers.py diff --git a/fluid/PaddleCV/gan/cycle_gan/model.py b/PaddleCV/gan/cycle_gan/model.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/model.py rename to PaddleCV/gan/cycle_gan/model.py diff --git a/fluid/PaddleCV/gan/cycle_gan/train.py b/PaddleCV/gan/cycle_gan/train.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/train.py rename to PaddleCV/gan/cycle_gan/train.py diff --git a/fluid/PaddleCV/gan/cycle_gan/trainer.py b/PaddleCV/gan/cycle_gan/trainer.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/trainer.py rename to PaddleCV/gan/cycle_gan/trainer.py diff --git a/fluid/PaddleCV/gan/cycle_gan/utility.py b/PaddleCV/gan/cycle_gan/utility.py similarity index 100% rename from fluid/PaddleCV/gan/cycle_gan/utility.py rename to 
PaddleCV/gan/cycle_gan/utility.py diff --git a/PaddleCV/human_pose_estimation/README.md b/PaddleCV/human_pose_estimation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d629c6b7c31ea05329b591a7ee5f6ca929573ec3 --- /dev/null +++ b/PaddleCV/human_pose_estimation/README.md @@ -0,0 +1,115 @@

# Simple Baselines for Human Pose Estimation in Fluid

## Introduction
This is a simple re-implementation in [PaddlePaddle.Fluid](http://www.paddlepaddle.org/en) of the paper [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18) from MSRA.

![demo](demo.gif)

> **Video in Demo**: *Bruno Mars - That’s What I Like [Official Video]*.

## Requirements

  - Python == 2.7
  - PaddlePaddle >= 1.1.0
  - opencv-python >= 3.3

## Environment

The code was developed and tested on 4 Tesla K40/P40 GPU cards on CentOS, with CUDA 9.2/8.0 and cuDNN 7.1 installed.

## Results on MPII Val
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models |
| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:|
| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - |
| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) |
| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - |
| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) |

## Results on COCO val2017 (with a person detector scoring 56.4 AP on COCO val2017)
| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models |
| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:|
| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - |
| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) |
| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - |
| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) |

### Notes:

  - Flip test is used.
  - We did not search extensively for the best model; validation simply uses the last saved checkpoint.

## Getting Started

### Prepare Datasets and Pretrained Models

  - Follow the [instructions](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) to prepare the datasets.
  - Download the ResNet-50 model pretrained on ImageNet in PaddlePaddle.Fluid from the [Model Zoo](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#supported-models-and-performances).
```bash
wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar
```

Then put it in the `pretrained` folder under the root of this repo, so the layout looks like:

```
${THIS REPO ROOT}
 `-- pretrained
 `-- resnet_50
 |-- 115
 `-- data
 `-- coco
 |-- annotations
 |-- images
 `-- mpii
 |-- annot
 |-- images
```

### Install [COCOAPI](https://github.com/cocodataset/cocoapi)

```bash
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```

### Perform Validating

Download the checkpoint of Pose-ResNet-50 trained on the MPII dataset from [here](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz), extract it into the `checkpoints` folder under the root of this repo, and then run

```bash
python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384'
```

### Perform Training

```bash
python train.py --dataset 'mpii' # or coco
```

**Note**: Configurations for training are aggregated in `lib/mpii_reader.py` and `lib/coco_reader.py`.

### Perform Test on Images

Put the images into the `test` folder under the root of this repo, then run

```bash
python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
```

Since this simple baseline for human pose estimation is a top-down method, if an image contains multiple persons, a detector such as [Faster R-CNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn) or [SSD](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/object_detection) should first be used to crop them out.

## Reference

  - Simple Baselines for Human Pose Estimation and Tracking in PyTorch [`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation)

## License

This code is released under the Apache License 2.0. diff --git a/PaddleCV/human_pose_estimation/README_cn.md b/PaddleCV/human_pose_estimation/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..08c772018a91b830d7865f78abf7d4604d1173fb --- /dev/null +++ b/PaddleCV/human_pose_estimation/README_cn.md @@ -0,0 +1,107 @@

# Keypoint Detection (Simple Baselines for Human Pose Estimation)

## Introduction
This directory contains a re-implementation of the paper [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18).

![demo](demo.gif)

> **Demo video**: *Bruno Mars - That’s What I Like [Official Video]*.
## Requirements

The code has been tested and runs correctly on 4 Tesla K40/P40 GPU cards under CentOS with CUDA 9.2/8.0 and cuDNN 7.1.

  - Python == 2.7
  - PaddlePaddle >= 1.1.0
  - opencv-python >= 3.3

## Results on MPII Val
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models |
| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:|
| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - |
| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) |
| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - |
| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) |

## Results on COCO val2017 (with a person detector scoring 56.4 AP on COCO val2017)
| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models |
| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:|
| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - |
| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) |
| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - |
| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) |

### Notes

  - Flip test is used.
  - No hyperparameter search was performed for these results; with the training configuration below, the model saved after the last epoch is taken as the final model and reproduces the numbers above.

## Getting Started

### Data Preparation and Pretrained Models

  - Follow the [instructions](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) to prepare the data.
  - Download the pretrained ResNet-50 model:

```bash
wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar
```

After downloading, extract the model into the 'pretrained' folder under the root directory; the default layout is:

```
${ROOT}
 `-- pretrained
 `-- resnet_50
 |-- 115
 `-- data
 `-- coco
 |-- annotations
 |-- images
 `-- mpii
 |-- annot
 |-- images
```

### Install [COCOAPI](https://github.com/cocodataset/cocoapi)

```bash
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```

### Model Validation (COCO or MPII)

Download the COCO/MPII pretrained models (see the links in the last column of the tables above), save them into the 'checkpoints' folder under the root directory, and run:

```bash
python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384'
```

### Model Training

```bash
python train.py --dataset 'mpii' # or coco
```

**Note**: Detailed parameter configurations are kept in `lib/mpii_reader.py` and `lib/coco_reader.py`; set `dataset` to choose which configuration is used.

### Model Testing (arbitrary images, using the COCO or MPII pretrained models above)

Put the test images into the 'test' folder under the root directory and run

```bash
python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
```

## Reference

- Simple Baselines for Human Pose Estimation and Tracking in PyTorch
[`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) diff --git a/fluid/PaddleCV/human_pose_estimation/demo.gif b/PaddleCV/human_pose_estimation/demo.gif similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/demo.gif rename to PaddleCV/human_pose_estimation/demo.gif diff --git a/fluid/PaddleCV/HiNAS_models/build/__init__.py b/PaddleCV/human_pose_estimation/lib/__init__.py old mode 100755 new mode 100644 similarity index 100% rename from fluid/PaddleCV/HiNAS_models/build/__init__.py rename to PaddleCV/human_pose_estimation/lib/__init__.py diff --git a/fluid/PaddleCV/human_pose_estimation/lib/base_reader.py b/PaddleCV/human_pose_estimation/lib/base_reader.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/lib/base_reader.py rename to PaddleCV/human_pose_estimation/lib/base_reader.py diff --git a/fluid/PaddleCV/human_pose_estimation/lib/coco_reader.py b/PaddleCV/human_pose_estimation/lib/coco_reader.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/lib/coco_reader.py rename to PaddleCV/human_pose_estimation/lib/coco_reader.py diff --git a/fluid/PaddleCV/human_pose_estimation/lib/mpii_reader.py b/PaddleCV/human_pose_estimation/lib/mpii_reader.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/lib/mpii_reader.py rename to PaddleCV/human_pose_estimation/lib/mpii_reader.py diff --git a/fluid/PaddleCV/human_pose_estimation/lib/pose_resnet.py b/PaddleCV/human_pose_estimation/lib/pose_resnet.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/lib/pose_resnet.py rename to PaddleCV/human_pose_estimation/lib/pose_resnet.py diff --git a/fluid/PaddleCV/human_pose_estimation/test.py b/PaddleCV/human_pose_estimation/test.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/test.py rename to PaddleCV/human_pose_estimation/test.py diff --git a/fluid/PaddleCV/human_pose_estimation/train.py b/PaddleCV/human_pose_estimation/train.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/train.py rename to PaddleCV/human_pose_estimation/train.py diff --git a/fluid/PaddleCV/deeplabv3+/__init__.py b/PaddleCV/human_pose_estimation/utils/__init__.py similarity index 100% rename from fluid/PaddleCV/deeplabv3+/__init__.py rename to PaddleCV/human_pose_estimation/utils/__init__.py diff --git a/fluid/PaddleCV/human_pose_estimation/utils/base_evaluator.py b/PaddleCV/human_pose_estimation/utils/base_evaluator.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/base_evaluator.py rename to PaddleCV/human_pose_estimation/utils/base_evaluator.py diff --git a/fluid/PaddleCV/human_pose_estimation/utils/coco_evaluator.py b/PaddleCV/human_pose_estimation/utils/coco_evaluator.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/coco_evaluator.py rename to PaddleCV/human_pose_estimation/utils/coco_evaluator.py diff --git a/fluid/PaddleCV/human_pose_estimation/utils/evaluator_builder.py b/PaddleCV/human_pose_estimation/utils/evaluator_builder.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/evaluator_builder.py rename to PaddleCV/human_pose_estimation/utils/evaluator_builder.py diff --git a/fluid/PaddleCV/human_pose_estimation/utils/mpii_evaluator.py b/PaddleCV/human_pose_estimation/utils/mpii_evaluator.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/mpii_evaluator.py rename to PaddleCV/human_pose_estimation/utils/mpii_evaluator.py diff 
--git a/fluid/PaddleCV/human_pose_estimation/utils/transforms.py b/PaddleCV/human_pose_estimation/utils/transforms.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/transforms.py rename to PaddleCV/human_pose_estimation/utils/transforms.py diff --git a/fluid/PaddleCV/human_pose_estimation/utils/utility.py b/PaddleCV/human_pose_estimation/utils/utility.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/utility.py rename to PaddleCV/human_pose_estimation/utils/utility.py diff --git a/fluid/PaddleCV/human_pose_estimation/val.py b/PaddleCV/human_pose_estimation/val.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/val.py rename to PaddleCV/human_pose_estimation/val.py diff --git a/fluid/PaddleCV/icnet/.run_ce.sh b/PaddleCV/icnet/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/icnet/.run_ce.sh rename to PaddleCV/icnet/.run_ce.sh diff --git a/PaddleCV/icnet/README.md b/PaddleCV/icnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..84e067ab081f648a4107ece906bad9a52ae13bbc --- /dev/null +++ b/PaddleCV/icnet/README.md @@ -0,0 +1,110 @@

Running the examples in this directory requires the latest develop version of PaddlePaddle. If your installed PaddlePaddle is older, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.


## Code Structure
```
├── network.py   # Network definition.
├── train.py     # Training script.
├── eval.py      # Evaluation script.
├── infer.py     # Inference script.
├── cityscape.py # Data preprocessing script.
└── utils.py     # Common utility functions.
```

## Introduction

Image Cascade Network (ICNet) is designed for real-time semantic segmentation of images. Compared with other approaches that reduce computation, ICNet takes both speed and accuracy into account. Its main idea is to transform the input image to several resolutions, process each resolution with a sub-network of matching computational complexity, and then merge the results. ICNet consists of three sub-networks: the computationally heavy network processes the low-resolution input, while the computationally light network processes the high-resolution input. This strikes a balance between the accuracy obtainable from high-resolution images and the efficiency of low-complexity networks.

The overall architecture is shown below; a toy sketch of the multi-resolution idea follows the figure.

+
+图 1 +

## Data Preparation



This example uses the Cityscapes dataset; please register at the [Cityscapes website](https://www.cityscapes-dataset.com) and download it. After downloading, process the data with the instructions and tools [here](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py#L3). The processed data is laid out as:
```
data/cityscape/
|-- gtFine
|   |-- test
|   |-- train
|   `-- val
|-- leftImg8bit
|   |-- test
|   |-- train
|   `-- val
|-- train.list
`-- val.list
```
Here train.list and val.list are the list files for training and testing: the first column is the input image and the second column the annotation, separated by a space. For example:
```
leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png
leftImg8bit/train/stuttgart/stuttgart_000072_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000072_000019_gtFine_labelTrainIds.png
```
After the data is downloaded and prepared, update the corresponding data paths in the `cityscape.py` script.

## Training and Inference

### Training
Run the following command to train, specifying the checkpoint save path:
```
python train.py --batch_size=16 --use_gpu=True --checkpoint_path="./chkpnt/"
```
Run the following command for more usage information:
```
python train.py --help
```
During training, the `loss` of each network branch on the training set is printed according to the user's settings, for example:
```
Iter[0]; train loss: 2.338; sub4_loss: 3.367; sub24_loss: 4.120; sub124_loss: 0.151
```
### Evaluation
Run the following command to evaluate on the Cityscapes test data:
```
python eval.py --model_path="./model/" --use_gpu=True
```
The model files must be specified with the `--model_path` option.
The evaluation script reports mean IoU.

### Inference
Run the following command to run inference on the specified data:
```
python infer.py \
--model_path="./model" \
--images_path="./data/cityscape/" \
--images_list="./data/cityscape/infer.list"
```
The `--images_list` option specifies a list file with one image path per line.
Results are saved to the `output` folder under the current path by default.

## Results
Figure 2 shows the training loss curve on the Cityscapes training set:

+
+图 2 +

Training on the training set and validating on the validation set gives mean_IoU = 67.0% (67.7% in the paper); a minimal sketch of the metric follows Figure 3.

Figure 3 shows example results produced with the `infer.py` script: the first row is the original input image, the second row the human annotation, and the third row the output computed by our model.

+
+图 3 +

## Additional Information
| Dataset | pretrained model |
|---|---|
| CityScape | [pretrained_model](https://paddle-icnet-models.bj.bcebos.com/model_1000.tar.gz) |

## References

- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)

diff --git a/fluid/PaddleCV/icnet/_ce.py b/PaddleCV/icnet/_ce.py similarity index 100% rename from fluid/PaddleCV/icnet/_ce.py rename to PaddleCV/icnet/_ce.py diff --git a/fluid/PaddleCV/icnet/cityscape.py b/PaddleCV/icnet/cityscape.py similarity index 100% rename from fluid/PaddleCV/icnet/cityscape.py rename to PaddleCV/icnet/cityscape.py diff --git a/fluid/PaddleCV/icnet/data/cityscape/gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png b/PaddleCV/icnet/data/cityscape/gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png similarity index 100% rename from fluid/PaddleCV/icnet/data/cityscape/gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png rename to PaddleCV/icnet/data/cityscape/gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png diff --git a/fluid/PaddleCV/icnet/data/cityscape/leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png b/PaddleCV/icnet/data/cityscape/leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png similarity index 100% rename from fluid/PaddleCV/icnet/data/cityscape/leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png rename to PaddleCV/icnet/data/cityscape/leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png diff --git a/fluid/PaddleCV/icnet/data/cityscape/train.list b/PaddleCV/icnet/data/cityscape/train.list similarity index 100% rename from fluid/PaddleCV/icnet/data/cityscape/train.list rename to PaddleCV/icnet/data/cityscape/train.list diff --git a/fluid/PaddleCV/icnet/eval.py b/PaddleCV/icnet/eval.py similarity index 100% rename from fluid/PaddleCV/icnet/eval.py rename to PaddleCV/icnet/eval.py diff --git a/fluid/PaddleCV/icnet/icnet.py b/PaddleCV/icnet/icnet.py similarity index 100% rename from fluid/PaddleCV/icnet/icnet.py rename to PaddleCV/icnet/icnet.py diff --git a/fluid/PaddleCV/icnet/images/icnet.png b/PaddleCV/icnet/images/icnet.png similarity index 100% rename from fluid/PaddleCV/icnet/images/icnet.png rename to PaddleCV/icnet/images/icnet.png diff --git a/fluid/PaddleCV/icnet/images/result.png b/PaddleCV/icnet/images/result.png similarity index 100% rename from fluid/PaddleCV/icnet/images/result.png rename to PaddleCV/icnet/images/result.png diff --git a/fluid/PaddleCV/icnet/images/train_loss.png b/PaddleCV/icnet/images/train_loss.png similarity index 100% rename from fluid/PaddleCV/icnet/images/train_loss.png rename to PaddleCV/icnet/images/train_loss.png diff --git a/fluid/PaddleCV/icnet/infer.py b/PaddleCV/icnet/infer.py similarity index 100% rename from fluid/PaddleCV/icnet/infer.py rename to PaddleCV/icnet/infer.py diff --git a/fluid/PaddleCV/icnet/train.py b/PaddleCV/icnet/train.py similarity index 100% rename from fluid/PaddleCV/icnet/train.py rename to PaddleCV/icnet/train.py diff --git a/fluid/PaddleCV/icnet/utils.py b/PaddleCV/icnet/utils.py similarity index 100% rename from fluid/PaddleCV/icnet/utils.py rename to PaddleCV/icnet/utils.py diff --git a/fluid/PaddleCV/image_classification/.gitignore b/PaddleCV/image_classification/.gitignore similarity index 100% rename from fluid/PaddleCV/image_classification/.gitignore rename to PaddleCV/image_classification/.gitignore diff --git a/fluid/PaddleCV/image_classification/.run_ce.sh
b/PaddleCV/image_classification/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/image_classification/.run_ce.sh rename to PaddleCV/image_classification/.run_ce.sh diff --git a/PaddleCV/image_classification/README.md b/PaddleCV/image_classification/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9ae4482f5412f7fac72a54d45898022458227ae8 --- /dev/null +++ b/PaddleCV/image_classification/README.md @@ -0,0 +1,168 @@

# Image Classification and Model Zoo
Image classification, an important field of computer vision, assigns an image to one of a set of pre-defined labels. In recent years researchers have developed many kinds of neural networks and greatly improved classification performance. This page introduces how to do image classification with PaddlePaddle Fluid.

---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model with flexible parameters](#training-a-model-with-flexible-parameters)
- [Using Mixed-Precision Training](#using-mixed-precision-training)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Supported models and performances](#supported-models-and-performances)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later; the latest release version is recommended. If the PaddlePaddle on your device is older than v0.13.0, please follow the instructions in the [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) to update it.

## Data preparation

An example for ImageNet classification is as follows. First of all, the ImageNet data can be prepared as:
```
cd data/ILSVRC2012/
sh download_imagenet2012.sh
```

In the shell script ```download_imagenet2012.sh```, there are three steps to prepare the data:

**step-1:** Register at ```image-net.org``` first in order to get a ```Username``` and ```AccessKey```, which are used to download the ImageNet data.

**step-2:** Download the ImageNet-2012 dataset from the website. The training and validation data will be downloaded into the "train" and "val" folders respectively. Please note that the data is larger than 40 GB and will take a long time to download. Users who have already downloaded the ImageNet data can organize it into ```data/ILSVRC2012``` directly.

**step-3:** Download the training and validation label files. There are two label files, containing the train and validation image labels respectively:

* *train_list.txt*: label file of the imagenet-2012 training set, with the fields on each line separated by a ```SPACE```, like:
```
train/n02483708/n02483708_2436.jpeg 369
train/n03998194/n03998194_7015.jpeg 741
train/n04523525/n04523525_38118.jpeg 884
...
```
* *val_list.txt*: label file of the imagenet-2012 validation set, with the fields on each line separated by a ```SPACE```, like:
```
val/ILSVRC2012_val_00000001.jpeg 65
val/ILSVRC2012_val_00000002.jpeg 970
val/ILSVRC2012_val_00000003.jpeg 230
...
```

You may need to modify the path in reader.py to load the data correctly.

## Training a model with flexible parameters

After data preparation, one can start training by:

```
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=32 \
       --total_images=1281167 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --model_save_dir=output/ \
       --with_mem_opt=False \
       --lr_strategy=piecewise_decay \
       --lr=0.1
```
**parameter introduction:**
* **model**: name of the model to use.
Default: "SE_ResNeXt50_32x4d".
* **num_epochs**: the number of epochs. Default: 120.
* **batch_size**: the size of each mini-batch. Default: 256.
* **use_gpu**: whether to use GPU or not. Default: True.
* **total_images**: total number of images in the training set. Default: 1281167.
* **class_dim**: the number of classes in the classification task. Default: 1000.
* **image_shape**: input size of the network. Default: "3,224,224".
* **model_save_dir**: the directory to save the trained model. Default: "output".
* **with_mem_opt**: whether to use memory optimization or not. Default: True.
* **lr_strategy**: learning rate schedule. Default: "piecewise_decay".
* **lr**: initial learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume from. Default: None.
* **data_dir**: the data path. Default: "./data/ILSVRC2012".
* **fp16**: whether to enable half-precision training with fp16. Default: False.
* **scale_loss**: loss scale for fp16. Default: 1.0.
* **l2_decay**: L2 decay parameter. Default: 1e-4.
* **momentum_rate**: momentum rate. Default: 0.9.

Alternatively, you can start training by running ```run.sh```.

**data reader introduction:** The data reader is defined in ```reader.py``` and ```reader_cv2.py```; using the CV2 reader can improve reading speed. In the [training stage](#training-a-model-with-flexible-parameters), random cropping and flipping are used, while center cropping is used in the [Evaluation](#evaluation) and [Inference](#inference) stages. Supported data augmentation includes:
* rotation
* color jitter
* random crop
* center crop
* resize
* flipping

## Using Mixed-Precision Training

You may add `--fp16=1` to start training with mixed precision, in which case the training process uses float16 and the output model ("master" parameters) is saved as float32. You may also need to pass `--scale_loss` to overcome accuracy issues; usually `--scale_loss=8.0` will do.

Note that currently `--fp16` cannot be used together with `--with_mem_opt`, so pass `--with_mem_opt=0` to disable the memory optimization pass.

## Finetuning

Finetuning means refining the model weights on a specific task starting from pretrained weights. After initializing ```path_to_pretrain_model```, one can finetune a model as:
```
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --pretrained_model=${path_to_pretrain_model} \
       --batch_size=32 \
       --total_images=1281167 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --model_save_dir=output/ \
       --with_mem_opt=True \
       --lr_strategy=piecewise_decay \
       --lr=0.1
```

## Evaluation
Evaluation measures the performance of a trained model. One can download [pretrained models](#supported-models-and-performances) and set ```path_to_pretrain_model``` to the model's path. Then top1/top5 accuracy can be obtained by running the following command:
```
python eval.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=32 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --with_mem_opt=True \
       --pretrained_model=${path_to_pretrain_model}
```

## Inference
Inference is used to get prediction scores or image features from trained models:
```
python infer.py \
       --model=SE_ResNeXt50_32x4d \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --with_mem_opt=True \
       --pretrained_model=${path_to_pretrain_model}
```

## Supported models and performances

The available top-1/top-5 validation accuracy on ImageNet 2012 is listed in the table below.
Pretrained models can be downloaded by clicking the related model names.

- Released models:

|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[AlexNet](http://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.zip) | 56.71%/79.18% | 55.88%/78.65% |
|[VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.zip) | 69.22%/89.09% | 69.01%/88.90% |
|[VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.zip) | 70.14%/89.48% | 69.83%/89.13% |
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) | 70.85%/89.89% | 70.65%/89.89% |
|[ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) | 74.41%/92.03% | 74.13%/91.97% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
|[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNext50_32x4d_pretrained.zip) | 78.50%/94.01% | 78.44%/93.96% |
|[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.zip) | 79.26%/94.22% | 79.12%/94.20% |
|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.50%/89.59% | 70.27%/89.58% |
|[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNet_pretrained.tar) | | 69.48%/88.99% |

diff --git a/PaddleCV/image_classification/README_cn.md b/PaddleCV/image_classification/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..34b0ac158e16616177957f45c4c89fc600008e9d --- /dev/null +++ b/PaddleCV/image_classification/README_cn.md @@ -0,0 +1,163 @@

# Image Classification and Model Zoo
Image classification is an important field of computer vision; its goal is to classify an image into one of a set of predefined labels. Recently researchers have proposed many different kinds of neural networks that greatly improve classification performance. This page introduces how to do image classification with PaddlePaddle.

---
## Table of Contents
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Model Training](#model-training)
- [Mixed-Precision Training](#mixed-precision-training)
- [Finetuning](#finetuning)
- [Model Evaluation](#model-evaluation)
- [Model Inference](#model-inference)
- [Available Models and Performance](#available-models-and-performance)

## Installation

Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle in your environment is older, please update it following the instructions in the [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html).

## Data Preparation

The following is an example for the ImageNet classification task. First, prepare the data as follows:
```
cd data/ILSVRC2012/
sh download_imagenet2012.sh
```
The ```download_imagenet2012.sh``` script prepares the data in three steps:

**Step 1:** Register at ```image-net.org``` to obtain a ```Username``` and ```AccessKey```, which are used to download the ImageNet data.

**Step 2:** Download the ImageNet-2012 image data from the official site. The training and validation sets are downloaded into the "train" and "val" directories, respectively. Note that the ImageNet data exceeds 40 GB and takes a long time to download; users who already have ImageNet can place the data directly under ```data/ILSVRC2012```.

**Step 3:** Download the label files for the training and validation sets. The following two files contain the labels of the training and validation images:

* *train_list.txt*: label file of the ImageNet-2012 training set, with the image path and annotation on each line separated by a space, for example:
```
train/n02483708/n02483708_2436.jpeg 369
train/n03998194/n03998194_7015.jpeg 741
train/n04523525/n04523525_38118.jpeg 884
...
```
* *val_list.txt*: label file of the ImageNet-2012 validation set, with the image path and annotation on each line separated by a space, for example:
```
val/ILSVRC2012_val_00000001.jpeg 65
val/ILSVRC2012_val_00000002.jpeg 970
val/ILSVRC2012_val_00000003.jpeg 230
...
```
Note: you may need to adjust the related paths in reader.py for your local environment so the data loads correctly.

## Model Training

Once the data is ready, start training as follows:
```
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=32 \
       --total_images=1281167 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --model_save_dir=output/ \
       --with_mem_opt=False \
       --lr_strategy=piecewise_decay \
       --lr=0.1
```
**Parameters:**
* **model**: model name. Default: "SE_ResNeXt50_32x4d".
* **num_epochs**: number of training epochs. Default: 120.
* **batch_size**: mini-batch size. Default: 256.
* **use_gpu**: whether to run on GPU. Default: True.
* **total_images**: number of training images; for ImageNet 2012 the default is 1281167.
* **class_dim**: number of classes. Default: 1000.
* **image_shape**: input image size. Default: "3,224,224".
* **model_save_dir**: directory for saving models. Default: "output/".
* **with_mem_opt**: whether to enable memory optimization. Default: False.
* **lr_strategy**: learning-rate schedule. Default: "piecewise_decay".
* **lr**: initial learning rate. Default: 0.1.
* **pretrained_model**: path to a pretrained model. Default: None.
* **checkpoint**: checkpoint to resume training from (a concrete model directory such as "output/SE_ResNeXt50_32x4d/100/"). Default: None.
* **fp16**: whether to enable mixed-precision training. Default: False.
* **scale_loss**: loss-scale value for mixed-precision training. Default: 1.0.
* **l2_decay**: l2_decay value. Default: 1e-4.
* **momentum_rate**: momentum_rate value. Default: 0.9.

A training script is provided in ```run.sh```.

**Data reader:** The data readers are defined in ```reader.py``` and ```reader_cv2.py```. In general, the CV2 reader improves reading speed, while the PIL reader yields slightly higher accuracy; the PIL-based reader is currently the default. In the [training](#model-training) stage the default augmentation is random cropping and horizontal flipping, while [model evaluation](#model-evaluation) and [model inference](#model-inference) default to center cropping. Currently supported augmentations:
* rotation
* color jitter
* random crop
* center crop
* resize
* horizontal flip

## Mixed-Precision Training

Mixed-precision training can be enabled with `--fp16=True`: training then uses float16 data and outputs float32 model parameters ("master" parameters). You may also need to pass `--scale_loss` to address fp16 accuracy issues; `--scale_loss=8.0` is usually sufficient.

Note that mixed-precision training currently cannot be used together with memory optimization, so pass `--with_mem_opt=False` to disable that feature.

## Finetuning

Finetuning means tuning the weights of a trained model on a specific task. After setting ```path_to_pretrain_model```, finetune a model with:
```
python train.py \
       --model=SE_ResNeXt50_32x4d \
       --pretrained_model=${path_to_pretrain_model} \
       --batch_size=32 \
       --total_images=1281167 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --model_save_dir=output/ \
       --with_mem_opt=True \
       --lr_strategy=piecewise_decay \
       --lr=0.1
```

## Model Evaluation
Model evaluation computes performance metrics for a trained model. You can download a model from [Available Models and Performance](#available-models-and-performance) and set ```path_to_pretrain_model``` to its path. Run the following command to obtain top-1/top-5 accuracy:
```
python eval.py \
       --model=SE_ResNeXt50_32x4d \
       --batch_size=32 \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --with_mem_opt=True \
       --pretrained_model=${path_to_pretrain_model}
```

## Model Inference
Model inference produces prediction scores or image features from a trained model:
```
python infer.py \
       --model=SE_ResNeXt50_32x4d \
       --class_dim=1000 \
       --image_shape=3,224,224 \
       --with_mem_opt=True \
       --pretrained_model=${path_to_pretrain_model}
```

## Available Models and Performance
The table lists the image classification models supported under the ```models``` directory, with the top-1/top-5 accuracy of the trained models on the ImageNet-2012 validation set. Click a model name to download the corresponding pretrained model.

- Released models:

|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) |
|- |:-: |:-:|
|[AlexNet](http://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.zip) | 56.71%/79.18% | 55.88%/78.65% |
|[VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.zip) | 69.22%/89.09% | 69.01%/88.90% |
|[VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.zip) | 70.14%/89.48% | 69.83%/89.13% |
|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% |
|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% |
|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% |
|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% |
|[ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) | 70.85%/89.89% | 70.65%/89.89% |
|[ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) | 74.41%/92.03% | 74.13%/91.97% |
|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% |
|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% |
|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% |
|[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNext50_32x4d_pretrained.zip) | 78.50%/94.01% | 78.44%/93.96% |
|[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.zip) | 79.26%/94.22% | 79.12%/94.20% |
|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.50%/89.59% | 70.27%/89.58% |
|[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNet_pretrained.tar) | | 69.48%/88.99% |

diff --git a/PaddleCV/image_classification/README_ngraph.md b/PaddleCV/image_classification/README_ngraph.md new file mode 100644 index 0000000000000000000000000000000000000000..bb8190758d876244df931a090134f8410b6d38b3 --- /dev/null +++ b/PaddleCV/image_classification/README_ngraph.md @@ -0,0 +1,43 @@

# PaddlePaddle inference and training script
This directory contains the configuration and instructions to run PaddlePaddle + nGraph for local training and inference.

# How to build the PaddlePaddle framework with the nGraph engine
To build the PaddlePaddle + nGraph engine and run the proper scripts, follow these steps:
1. Install PaddlePaddle.
2. Set environment exports for nGraph and OpenMP.
3. Run the inference/training script.

Currently supported models:
* ResNet50 (inference and training).

Only the Adam optimizer is supported so far.

Short description of the aforementioned steps:

## 1. Install PaddlePaddle
Follow the PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you [build from source](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/beginners_guide/install/compile/compile_Ubuntu_en.md), please use the following cmake arguments and make sure to set `-DWITH_NGRAPH=ON`.
```
cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_NGRAPH=ON
```
Note: MKLDNN and MKL are required.

## 2. Set env exports for nGraph and OMP
Set the following exports needed for running nGraph:
```
export FLAGS_use_ngraph=true
export OMP_NUM_THREADS=
```

If multiple threads are used, you may export the following for better performance:
```
export KMP_AFFINITY=granularity=fine,compact,1,0
```

## 3. Run the benchmark script
If everything built successfully, you can run the commands in the ResNet50 nGraph section of the script [run.sh](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/image_classification/run.sh) to start the benchmark job locally; you will need to uncomment the `#ResNet50 nGraph` part of the script.

The above runs the training job with nGraph. To run the inference job with nGraph, please download the pre-trained ResNet-50 model from [supported models](https://github.com/PaddlePaddle/models/tree/72dcc7c1a8d5de9d19fbd65b4143bd0d661eee2c/fluid/PaddleCV/image_classification#supported-models-and-performances) for the inference script.

diff --git a/fluid/PaddleCV/face_detection/__init__.py b/PaddleCV/image_classification/__init__.py similarity index 100% rename from fluid/PaddleCV/face_detection/__init__.py rename to PaddleCV/image_classification/__init__.py diff --git a/fluid/PaddleCV/image_classification/_ce.py b/PaddleCV/image_classification/_ce.py similarity index 100% rename from fluid/PaddleCV/image_classification/_ce.py rename to PaddleCV/image_classification/_ce.py diff --git a/fluid/PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh b/PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh similarity index 100% rename from fluid/PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh rename to PaddleCV/image_classification/data/ILSVRC2012/download_imagenet2012.sh diff --git a/fluid/PaddleCV/image_classification/dist_train/README.md b/PaddleCV/image_classification/dist_train/README.md similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/README.md rename to PaddleCV/image_classification/dist_train/README.md diff --git a/fluid/PaddleCV/human_pose_estimation/lib/__init__.py b/PaddleCV/image_classification/dist_train/__init__.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/lib/__init__.py rename to PaddleCV/image_classification/dist_train/__init__.py diff --git a/fluid/PaddleCV/image_classification/dist_train/batch_merge.py b/PaddleCV/image_classification/dist_train/batch_merge.py similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/batch_merge.py rename to PaddleCV/image_classification/dist_train/batch_merge.py diff --git a/fluid/PaddleCV/image_classification/dist_train/dist_train.py b/PaddleCV/image_classification/dist_train/dist_train.py similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/dist_train.py rename to PaddleCV/image_classification/dist_train/dist_train.py diff --git a/fluid/PaddleCV/image_classification/dist_train/dist_utils.py b/PaddleCV/image_classification/dist_train/dist_utils.py similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/dist_utils.py rename to PaddleCV/image_classification/dist_train/dist_utils.py diff --git a/fluid/PaddleCV/image_classification/dist_train/env.py b/PaddleCV/image_classification/dist_train/env.py similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/env.py rename to PaddleCV/image_classification/dist_train/env.py diff --git a/fluid/PaddleCV/image_classification/dist_train/run_mp_mode.sh b/PaddleCV/image_classification/dist_train/run_mp_mode.sh similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/run_mp_mode.sh rename to PaddleCV/image_classification/dist_train/run_mp_mode.sh diff --git a/fluid/PaddleCV/image_classification/dist_train/run_nccl2_mode.sh
b/PaddleCV/image_classification/dist_train/run_nccl2_mode.sh similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/run_nccl2_mode.sh rename to PaddleCV/image_classification/dist_train/run_nccl2_mode.sh diff --git a/fluid/PaddleCV/image_classification/dist_train/run_ps_mode.sh b/PaddleCV/image_classification/dist_train/run_ps_mode.sh similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/run_ps_mode.sh rename to PaddleCV/image_classification/dist_train/run_ps_mode.sh diff --git a/fluid/PaddleCV/image_classification/eval.py b/PaddleCV/image_classification/eval.py similarity index 100% rename from fluid/PaddleCV/image_classification/eval.py rename to PaddleCV/image_classification/eval.py diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/README.md b/PaddleCV/image_classification/fast_imagenet/README.md similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/README.md rename to PaddleCV/image_classification/fast_imagenet/README.md diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/requirements.txt b/PaddleCV/image_classification/fast_imagenet/requirements.txt similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/requirements.txt rename to PaddleCV/image_classification/fast_imagenet/requirements.txt diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/src/acc_curve.png b/PaddleCV/image_classification/fast_imagenet/src/acc_curve.png similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/src/acc_curve.png rename to PaddleCV/image_classification/fast_imagenet/src/acc_curve.png diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/tools/resize.py b/PaddleCV/image_classification/fast_imagenet/tools/resize.py similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/tools/resize.py rename to PaddleCV/image_classification/fast_imagenet/tools/resize.py diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/tools/valprep.sh b/PaddleCV/image_classification/fast_imagenet/tools/valprep.sh similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/tools/valprep.sh rename to PaddleCV/image_classification/fast_imagenet/tools/valprep.sh diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/torchvision_reader.py b/PaddleCV/image_classification/fast_imagenet/torchvision_reader.py similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/torchvision_reader.py rename to PaddleCV/image_classification/fast_imagenet/torchvision_reader.py diff --git a/fluid/PaddleCV/image_classification/fast_imagenet/train.py b/PaddleCV/image_classification/fast_imagenet/train.py similarity index 100% rename from fluid/PaddleCV/image_classification/fast_imagenet/train.py rename to PaddleCV/image_classification/fast_imagenet/train.py diff --git a/fluid/PaddleCV/image_classification/images/alexnet_imagenet1k_acc1.png b/PaddleCV/image_classification/images/alexnet_imagenet1k_acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/alexnet_imagenet1k_acc1.png rename to PaddleCV/image_classification/images/alexnet_imagenet1k_acc1.png diff --git a/fluid/PaddleCV/image_classification/images/curve.jpg b/PaddleCV/image_classification/images/curve.jpg similarity index 100% rename from fluid/PaddleCV/image_classification/images/curve.jpg rename to PaddleCV/image_classification/images/curve.jpg diff --git 
a/fluid/PaddleCV/image_classification/images/imagenet_dist_performance.png b/PaddleCV/image_classification/images/imagenet_dist_performance.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/imagenet_dist_performance.png rename to PaddleCV/image_classification/images/imagenet_dist_performance.png diff --git a/fluid/PaddleCV/image_classification/images/imagenet_dist_speedup.png b/PaddleCV/image_classification/images/imagenet_dist_speedup.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/imagenet_dist_speedup.png rename to PaddleCV/image_classification/images/imagenet_dist_speedup.png diff --git a/fluid/PaddleCV/image_classification/images/mobielenetv1_imagenet1k_acc1.png b/PaddleCV/image_classification/images/mobielenetv1_imagenet1k_acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/mobielenetv1_imagenet1k_acc1.png rename to PaddleCV/image_classification/images/mobielenetv1_imagenet1k_acc1.png diff --git a/fluid/PaddleCV/image_classification/images/resnet101_imagenet1k_acc1.png b/PaddleCV/image_classification/images/resnet101_imagenet1k_acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/resnet101_imagenet1k_acc1.png rename to PaddleCV/image_classification/images/resnet101_imagenet1k_acc1.png diff --git a/fluid/PaddleCV/image_classification/images/resnet50_32gpus-acc1.png b/PaddleCV/image_classification/images/resnet50_32gpus-acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/resnet50_32gpus-acc1.png rename to PaddleCV/image_classification/images/resnet50_32gpus-acc1.png diff --git a/fluid/PaddleCV/image_classification/images/resnet50_imagenet1k_acc1.png b/PaddleCV/image_classification/images/resnet50_imagenet1k_acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/resnet50_imagenet1k_acc1.png rename to PaddleCV/image_classification/images/resnet50_imagenet1k_acc1.png diff --git a/fluid/PaddleCV/image_classification/images/vgg11_imagenet1k_acc1.png b/PaddleCV/image_classification/images/vgg11_imagenet1k_acc1.png similarity index 100% rename from fluid/PaddleCV/image_classification/images/vgg11_imagenet1k_acc1.png rename to PaddleCV/image_classification/images/vgg11_imagenet1k_acc1.png diff --git a/fluid/PaddleCV/image_classification/infer.py b/PaddleCV/image_classification/infer.py similarity index 100% rename from fluid/PaddleCV/image_classification/infer.py rename to PaddleCV/image_classification/infer.py diff --git a/fluid/PaddleCV/image_classification/legacy/README.md b/PaddleCV/image_classification/legacy/README.md similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/README.md rename to PaddleCV/image_classification/legacy/README.md diff --git a/fluid/PaddleCV/image_classification/legacy/models/__init__.py b/PaddleCV/image_classification/legacy/models/__init__.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/__init__.py rename to PaddleCV/image_classification/legacy/models/__init__.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/alexnet.py b/PaddleCV/image_classification/legacy/models/alexnet.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/alexnet.py rename to PaddleCV/image_classification/legacy/models/alexnet.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/dpn.py b/PaddleCV/image_classification/legacy/models/dpn.py similarity index 100% rename from 
fluid/PaddleCV/image_classification/legacy/models/dpn.py rename to PaddleCV/image_classification/legacy/models/dpn.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/googlenet.py b/PaddleCV/image_classification/legacy/models/googlenet.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/googlenet.py rename to PaddleCV/image_classification/legacy/models/googlenet.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/inception_v4.py b/PaddleCV/image_classification/legacy/models/inception_v4.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/inception_v4.py rename to PaddleCV/image_classification/legacy/models/inception_v4.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/mobilenet.py b/PaddleCV/image_classification/legacy/models/mobilenet.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/mobilenet.py rename to PaddleCV/image_classification/legacy/models/mobilenet.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/mobilenet_v2.py b/PaddleCV/image_classification/legacy/models/mobilenet_v2.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/mobilenet_v2.py rename to PaddleCV/image_classification/legacy/models/mobilenet_v2.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/resnet.py b/PaddleCV/image_classification/legacy/models/resnet.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/resnet.py rename to PaddleCV/image_classification/legacy/models/resnet.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/se_resnext.py b/PaddleCV/image_classification/legacy/models/se_resnext.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/se_resnext.py rename to PaddleCV/image_classification/legacy/models/se_resnext.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/shufflenet_v2.py b/PaddleCV/image_classification/legacy/models/shufflenet_v2.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/shufflenet_v2.py rename to PaddleCV/image_classification/legacy/models/shufflenet_v2.py diff --git a/fluid/PaddleCV/image_classification/legacy/models/vgg.py b/PaddleCV/image_classification/legacy/models/vgg.py similarity index 100% rename from fluid/PaddleCV/image_classification/legacy/models/vgg.py rename to PaddleCV/image_classification/legacy/models/vgg.py diff --git a/fluid/PaddleCV/image_classification/models/__init__.py b/PaddleCV/image_classification/models/__init__.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/__init__.py rename to PaddleCV/image_classification/models/__init__.py diff --git a/fluid/PaddleCV/image_classification/models/alexnet.py b/PaddleCV/image_classification/models/alexnet.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/alexnet.py rename to PaddleCV/image_classification/models/alexnet.py diff --git a/fluid/PaddleCV/image_classification/models/dpn.py b/PaddleCV/image_classification/models/dpn.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/dpn.py rename to PaddleCV/image_classification/models/dpn.py diff --git a/fluid/PaddleCV/image_classification/models/fast_imagenet.py b/PaddleCV/image_classification/models/fast_imagenet.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/fast_imagenet.py rename to 
PaddleCV/image_classification/models/fast_imagenet.py diff --git a/fluid/PaddleCV/image_classification/models/googlenet.py b/PaddleCV/image_classification/models/googlenet.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/googlenet.py rename to PaddleCV/image_classification/models/googlenet.py diff --git a/fluid/PaddleCV/image_classification/models/inception_v4.py b/PaddleCV/image_classification/models/inception_v4.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/inception_v4.py rename to PaddleCV/image_classification/models/inception_v4.py diff --git a/fluid/PaddleCV/image_classification/models/mobilenet.py b/PaddleCV/image_classification/models/mobilenet.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/mobilenet.py rename to PaddleCV/image_classification/models/mobilenet.py diff --git a/fluid/PaddleCV/image_classification/models/mobilenet_v2.py b/PaddleCV/image_classification/models/mobilenet_v2.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/mobilenet_v2.py rename to PaddleCV/image_classification/models/mobilenet_v2.py diff --git a/fluid/PaddleCV/image_classification/models/resnet.py b/PaddleCV/image_classification/models/resnet.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/resnet.py rename to PaddleCV/image_classification/models/resnet.py diff --git a/fluid/PaddleCV/image_classification/models/resnet_dist.py b/PaddleCV/image_classification/models/resnet_dist.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/resnet_dist.py rename to PaddleCV/image_classification/models/resnet_dist.py diff --git a/fluid/PaddleCV/image_classification/models/se_resnext.py b/PaddleCV/image_classification/models/se_resnext.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/se_resnext.py rename to PaddleCV/image_classification/models/se_resnext.py diff --git a/fluid/PaddleCV/image_classification/models/shufflenet_v2.py b/PaddleCV/image_classification/models/shufflenet_v2.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/shufflenet_v2.py rename to PaddleCV/image_classification/models/shufflenet_v2.py diff --git a/fluid/PaddleCV/image_classification/models/vgg.py b/PaddleCV/image_classification/models/vgg.py similarity index 100% rename from fluid/PaddleCV/image_classification/models/vgg.py rename to PaddleCV/image_classification/models/vgg.py diff --git a/fluid/PaddleCV/image_classification/reader.py b/PaddleCV/image_classification/reader.py similarity index 100% rename from fluid/PaddleCV/image_classification/reader.py rename to PaddleCV/image_classification/reader.py diff --git a/fluid/PaddleCV/image_classification/reader_cv2.py b/PaddleCV/image_classification/reader_cv2.py similarity index 100% rename from fluid/PaddleCV/image_classification/reader_cv2.py rename to PaddleCV/image_classification/reader_cv2.py diff --git a/fluid/PaddleCV/image_classification/run.sh b/PaddleCV/image_classification/run.sh similarity index 100% rename from fluid/PaddleCV/image_classification/run.sh rename to PaddleCV/image_classification/run.sh diff --git a/fluid/PaddleCV/image_classification/train.py b/PaddleCV/image_classification/train.py similarity index 100% rename from fluid/PaddleCV/image_classification/train.py rename to PaddleCV/image_classification/train.py diff --git a/PaddleCV/image_classification/utility.py b/PaddleCV/image_classification/utility.py new file mode 100644 
index 0000000000000000000000000000000000000000..5b10a179ac2231cb26ab42993b7300d5e99f44bc --- /dev/null +++ b/PaddleCV/image_classification/utility.py @@ -0,0 +1,63 @@
+"""Contains common utility functions."""
+# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import distutils.util
+import numpy as np
+import six
+from paddle.fluid import core
+
+
+def print_arguments(args):
+    """Print argparse's arguments.
+
+    Usage:
+
+    .. code-block:: python
+
+        parser = argparse.ArgumentParser()
+        parser.add_argument("name", default="John", type=str, help="User name.")
+        args = parser.parse_args()
+        print_arguments(args)
+
+    :param args: Input argparse.Namespace for printing.
+    :type args: argparse.Namespace
+    """
+    print("----------- Configuration Arguments -----------")
+    for arg, value in sorted(six.iteritems(vars(args))):
+        print("%s: %s" % (arg, value))
+    print("------------------------------------------------")
+
+
+def add_arguments(argname, type, default, help, argparser, **kwargs):
+    """Add argparse's argument.
+
+    Usage:
+
+    .. code-block:: python
+
+        parser = argparse.ArgumentParser()
+        add_arguments("name", str, "John", "User name.", parser)
+        args = parser.parse_args()
+    """
+    # map bool to strtobool so that --flag True/False works on the command line
+    type = distutils.util.strtobool if type == bool else type
+    argparser.add_argument(
+        "--" + argname,
+        default=default,
+        type=type,
+        help=help + ' Default: %(default)s.',
+        **kwargs)
diff --git a/fluid/PaddleCV/image_classification/utils/__init__.py b/PaddleCV/image_classification/utils/__init__.py similarity index 100% rename from fluid/PaddleCV/image_classification/utils/__init__.py rename to PaddleCV/image_classification/utils/__init__.py diff --git a/fluid/PaddleCV/image_classification/utils/fp16_utils.py b/PaddleCV/image_classification/utils/fp16_utils.py similarity index 100% rename from fluid/PaddleCV/image_classification/utils/fp16_utils.py rename to PaddleCV/image_classification/utils/fp16_utils.py diff --git a/fluid/PaddleCV/image_classification/utils/learning_rate.py b/PaddleCV/image_classification/utils/learning_rate.py similarity index 100% rename from fluid/PaddleCV/image_classification/utils/learning_rate.py rename to PaddleCV/image_classification/utils/learning_rate.py diff --git a/fluid/PaddleCV/image_classification/utils/utility.py b/PaddleCV/image_classification/utils/utility.py similarity index 100% rename from fluid/PaddleCV/image_classification/utils/utility.py rename to PaddleCV/image_classification/utils/utility.py diff --git a/PaddleCV/metric_learning/README.md b/PaddleCV/metric_learning/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d72a2505d7650963a562f306f0bbf85ec3bbc759 --- /dev/null +++ b/PaddleCV/metric_learning/README.md @@ -0,0 +1,113 @@
+# Deep Metric Learning
+Metric learning is a family of methods that learn discriminative features for each sample, so that intra-class samples
have smaller distances while inter-class samples have larger distances in the learned space. With the development of deep learning techniques, metric learning methods have been combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, person re-identification and image retrieval. In this page, we introduce how to implement deep metric learning with PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-metric-learning-models), [finetuning](#finetuning), [evaluation](#evaluation), [inference](#inference) and [performances](#performances).
+
+---
+## Table of Contents
+- [Installation](#installation)
+- [Data preparation](#data-preparation)
+- [Training metric learning models](#training-metric-learning-models)
+- [Finetuning](#finetuning)
+- [Evaluation](#evaluation)
+- [Inference](#inference)
+- [Performances](#performances)
+
+## Installation
+
+Running the sample code in this directory requires PaddlePaddle Fluid v0.14.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update.
+
+## Data preparation
+
+The Stanford Online Products (SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,551 images of 11,318 classes are used, and 11,316 classes (60,502 images) are held out for testing. First of all, the SOP data can be prepared as follows:
+```
+cd data/
+sh download_sop.sh
+```
+
+## Training metric learning models
+
+To train a metric learning model, one needs to choose a neural network as the backbone and a metric loss function to optimize. We first train the model with softmax or arcmargin loss, and then fine-tune it with another metric learning loss, such as triplet, quadruplet or eml loss. One example of training with arcmargin loss is shown below:
+
+
+```
+python train_elem.py \
+    --model=ResNet50 \
+    --train_batch_size=256 \
+    --test_batch_size=50 \
+    --lr=0.01 \
+    --total_iter_num=30000 \
+    --use_gpu=True \
+    --pretrained_model=${path_to_pretrain_imagenet_model} \
+    --model_save_dir=${output_model_path} \
+    --loss_name=arcmargin \
+    --arc_scale=80.0 \
+    --arc_margin=0.15 \
+    --arc_easy_margin=False
+```
+**parameter introduction:**
+* **model**: name of the model to use. Default: "ResNet50".
+* **train_batch_size**: the size of each training mini-batch. Default: 256.
+* **test_batch_size**: the size of each testing mini-batch. Default: 50.
+* **lr**: initial learning rate. Default: 0.01.
+* **total_iter_num**: total number of training iterations. Default: 30000.
+* **use_gpu**: whether to use GPU or not. Default: True.
+* **pretrained_model**: path of the pretrained model to initialize from. Default: None.
+* **model_save_dir**: the directory to save the trained model. Default: "output".
+* **loss_name**: loss used to train the model. Default: "softmax".
+* **arc_scale**: parameter of arcmargin loss. Default: 80.0.
+* **arc_margin**: parameter of arcmargin loss. Default: 0.15.
+* **arc_easy_margin**: parameter of arcmargin loss. Default: False.
+
+## Finetuning
+
+Finetuning loads pretrained weights and adapts the model to a specific task. After training with softmax or arcmargin loss, one can finetune the model with triplet, quadruplet or eml loss.
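+Before looking at the full command, a toy sketch may help clarify what the pairwise losses mentioned above optimize. The snippet below is a plain NumPy triplet loss, illustrative only; the repo's actual Fluid implementations live under `losses/`, and `margin=0.2` is an assumed value:
+
+```python
+import numpy as np
+
+def triplet_loss(anchor, positive, negative, margin=0.2):
+    # squared Euclidean distances to a same-class and a different-class sample
+    d_pos = np.sum((anchor - positive) ** 2, axis=1)
+    d_neg = np.sum((anchor - negative) ** 2, axis=1)
+    # hinge: the positive pair should be closer than the negative pair by `margin`
+    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
+
+embeddings = np.random.rand(3, 8, 128)  # anchor/positive/negative, batch 8, dim 128
+print(triplet_loss(*embeddings))
+```
+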
+One example of fine-tuning with eml loss is shown below:
+
+```
+python train_pair.py \
+    --model=ResNet50 \
+    --train_batch_size=160 \
+    --test_batch_size=50 \
+    --lr=0.0001 \
+    --total_iter_num=100000 \
+    --use_gpu=True \
+    --pretrained_model=${path_to_pretrain_arcmargin_model} \
+    --model_save_dir=${output_model_path} \
+    --loss_name=eml \
+    --samples_each_class=2
+```
+
+## Evaluation
+Evaluation measures the performance of a trained model. You should set the model path to ```path_to_pretrain_model```. Then Recall@Rank-1 can be obtained by running the following command:
+```
+python eval.py \
+    --model=ResNet50 \
+    --batch_size=50 \
+    --pretrained_model=${path_to_pretrain_model}
+```
+
+## Inference
+Inference is used to get prediction scores or image features based on trained models.
+```
+python infer.py \
+    --model=ResNet50 \
+    --batch_size=1 \
+    --pretrained_model=${path_to_pretrain_model}
+```
+
+## Performances
+
+For comparison, metric learning models with different backbone networks and loss functions are trained with the corresponding empirical parameters. Recall@Rank-1 is used as the evaluation metric, and the performances are listed in the table below.
+
+|pretrain model | softmax | arcmargin
+|- | - | -:
+|without finetuning | 77.42% | 78.11%
+|fine-tuned with triplet | 78.37% | 79.21%
+|fine-tuned with quadruplet | 78.10% | 79.59%
+|fine-tuned with eml | 79.32% | 80.11%
+|fine-tuned with npairs | - | 79.81%
+
+## Reference
+
+- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698)
+- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478)
+- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094)
+- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [link](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf)
diff --git a/PaddleCV/metric_learning/README_cn.md b/PaddleCV/metric_learning/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..1b9efda881fc9045d70527fe7dad3d99a48eaf19 --- /dev/null +++ b/PaddleCV/metric_learning/README_cn.md @@ -0,0 +1,113 @@
+# 深度度量学习
+度量学习是一种为样本对学习具有区分性特征的方法,目的是在特征空间中,让同一个类别的样本具有较小的特征距离,不同类的样本具有较大的特征距离。随着深度学习技术的发展,基于深度神经网络的度量学习方法已经在许多视觉任务上提升了很大的性能,例如:人脸识别、人脸校验、行人重识别和图像检索等等。在本章节,介绍在PaddlePaddle Fluid里实现的几种度量学习方法和使用方法,具体包括[数据准备](#数据准备),[模型训练](#模型训练),[模型微调](#模型微调),[模型评估](#模型评估),[模型预测](#模型预测)。
+
+---
+## 目录
+- [安装](#安装)
+- [数据准备](#数据准备)
+- [模型训练](#模型训练)
+- [模型微调](#模型微调)
+- [模型评估](#模型评估)
+- [模型预测](#模型预测)
+- [模型性能](#模型性能)
+
+## 安装
+
+运行本章节代码需要PaddlePaddle Fluid v0.14.0 或更高版本。如果你的设备上的PaddlePaddle版本低于v0.14.0,请按照此[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)进行安装和更新。
+
+## 数据准备
+
+Stanford Online Products(SOP) 数据集下载自eBay,包含120053张商品图片,有22634个类别。我们使用该数据集进行实验。训练时,使用59551张图片,11318个类别的数据;测试时,使用60502张图片,11316个类别。首先,SOP数据集可以使用以下脚本下载:
+```
+cd data/
+sh download_sop.sh
+```
+
+## 模型训练
+
+为了训练度量学习模型,我们需要一个神经网络模型作为骨架模型(如ResNet50)和度量学习代价函数来进行优化。我们首先使用 softmax 或者 arcmargin 来进行训练,然后使用其它的代价函数来进行微调,例如:triplet,quadruplet和eml。下面是一个使用arcmargin训练的例子:
+
+
+```
+python train_elem.py \
+    --model=ResNet50 \
+    --train_batch_size=256 \
+    --test_batch_size=50 \
+    --lr=0.01 \
+    --total_iter_num=30000 \
+    --use_gpu=True \
+    --pretrained_model=${path_to_pretrain_imagenet_model} \
--model_save_dir=${output_model_path} \ + --loss_name=arcmargin \ + --arc_scale=80.0 \ + --arc_margin=0.15 \ + --arc_easy_margin=False +``` +**参数介绍:** +* **model**: 使用的模型名字. 默认: "ResNet50". +* **train_batch_size**: 训练的 mini-batch大小. 默认: 256. +* **test_batch_size**: 测试的 mini-batch大小. 默认: 50. +* **lr**: 初始学习率. 默认: 0.01. +* **total_iter_num**: 总的训练迭代轮数. 默认: 30000. +* **use_gpu**: 是否使用GPU. 默认: True. +* **pretrained_model**: 预训练模型的路径. 默认: None. +* **model_save_dir**: 保存模型的路径. 默认: "output". +* **loss_name**: 优化的代价函数. 默认: "softmax". +* **arc_scale**: arcmargin的参数. 默认: 80.0. +* **arc_margin**: arcmargin的参数. 默认: 0.15. +* **arc_easy_margin**: arcmargin的参数. 默认: False. + +## 模型微调 + +网络微调是在指定的任务上加载已有的模型来微调网络。在用softmax和arcmargin训完网络后,可以继续使用triplet,quadruplet或eml来微调网络。下面是一个使用eml来微调网络的例子: + +``` +python train_pair.py \ + --model=ResNet50 \ + --train_batch_size=160 \ + --test_batch_size=50 \ + --lr=0.0001 \ + --total_iter_num=100000 \ + --use_gpu=True \ + --pretrained_model=${path_to_pretrain_arcmargin_model} \ + --model_save_dir=${output_model_path} \ + --loss_name=eml \ + --samples_each_class=2 +``` + +## 模型评估 +模型评估主要是评估模型的检索性能。这里需要设置```path_to_pretrain_model```。可以使用下面命令来计算Recall@Rank-1。 +``` +python eval.py \ + --model=ResNet50 \ + --batch_size=50 \ + --pretrained_model=${path_to_pretrain_model} \ +``` + +## 模型预测 +模型预测主要是基于训练好的网络来获取图像数据的特征,下面是模型预测的例子: +``` +python infer.py \ + --model=ResNet50 \ + --batch_size=1 \ + --pretrained_model=${path_to_pretrain_model} +``` + +## 模型性能 + +下面列举了几种度量学习的代价函数在SOP数据集上的检索效果,这里使用Recall@Rank-1来进行评估。 + +|预训练模型 | softmax | arcmargin +|- | - | -: +|未微调 | 77.42% | 78.11% +|使用triplet微调 | 78.37% | 79.21% +|使用quadruplet微调 | 78.10% | 79.59% +|使用eml微调 | 79.32% | 80.11% +|使用npairs微调 | - | 79.81% + +## 引用 + +- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [链接](https://arxiv.org/abs/1801.07698) +- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [链接](https://arxiv.org/abs/1710.00478) +- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [链接](https://arxiv.org/abs/1212.6094) +- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [链接](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf) diff --git a/fluid/PaddleCV/metric_learning/_ce.py b/PaddleCV/metric_learning/_ce.py similarity index 100% rename from fluid/PaddleCV/metric_learning/_ce.py rename to PaddleCV/metric_learning/_ce.py diff --git a/fluid/PaddleCV/metric_learning/data/download_sop.sh b/PaddleCV/metric_learning/data/download_sop.sh similarity index 100% rename from fluid/PaddleCV/metric_learning/data/download_sop.sh rename to PaddleCV/metric_learning/data/download_sop.sh diff --git a/fluid/PaddleCV/metric_learning/eval.py b/PaddleCV/metric_learning/eval.py similarity index 100% rename from fluid/PaddleCV/metric_learning/eval.py rename to PaddleCV/metric_learning/eval.py diff --git a/fluid/PaddleCV/metric_learning/imgtool.py b/PaddleCV/metric_learning/imgtool.py similarity index 100% rename from fluid/PaddleCV/metric_learning/imgtool.py rename to PaddleCV/metric_learning/imgtool.py diff --git a/fluid/PaddleCV/metric_learning/infer.py b/PaddleCV/metric_learning/infer.py similarity index 100% rename from fluid/PaddleCV/metric_learning/infer.py rename to PaddleCV/metric_learning/infer.py diff --git a/fluid/PaddleCV/metric_learning/losses/__init__.py b/PaddleCV/metric_learning/losses/__init__.py similarity index 100% rename from 
fluid/PaddleCV/metric_learning/losses/__init__.py rename to PaddleCV/metric_learning/losses/__init__.py diff --git a/fluid/PaddleCV/metric_learning/losses/arcmarginloss.py b/PaddleCV/metric_learning/losses/arcmarginloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/arcmarginloss.py rename to PaddleCV/metric_learning/losses/arcmarginloss.py diff --git a/fluid/PaddleCV/metric_learning/losses/commonfunc.py b/PaddleCV/metric_learning/losses/commonfunc.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/commonfunc.py rename to PaddleCV/metric_learning/losses/commonfunc.py diff --git a/fluid/PaddleCV/metric_learning/losses/emlloss.py b/PaddleCV/metric_learning/losses/emlloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/emlloss.py rename to PaddleCV/metric_learning/losses/emlloss.py diff --git a/fluid/PaddleCV/metric_learning/losses/npairsloss.py b/PaddleCV/metric_learning/losses/npairsloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/npairsloss.py rename to PaddleCV/metric_learning/losses/npairsloss.py diff --git a/fluid/PaddleCV/metric_learning/losses/quadrupletloss.py b/PaddleCV/metric_learning/losses/quadrupletloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/quadrupletloss.py rename to PaddleCV/metric_learning/losses/quadrupletloss.py diff --git a/fluid/PaddleCV/metric_learning/losses/softmaxloss.py b/PaddleCV/metric_learning/losses/softmaxloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/softmaxloss.py rename to PaddleCV/metric_learning/losses/softmaxloss.py diff --git a/fluid/PaddleCV/metric_learning/losses/tripletloss.py b/PaddleCV/metric_learning/losses/tripletloss.py similarity index 100% rename from fluid/PaddleCV/metric_learning/losses/tripletloss.py rename to PaddleCV/metric_learning/losses/tripletloss.py diff --git a/fluid/PaddleCV/metric_learning/models/__init__.py b/PaddleCV/metric_learning/models/__init__.py similarity index 100% rename from fluid/PaddleCV/metric_learning/models/__init__.py rename to PaddleCV/metric_learning/models/__init__.py diff --git a/fluid/PaddleCV/metric_learning/models/resnet_embedding.py b/PaddleCV/metric_learning/models/resnet_embedding.py similarity index 100% rename from fluid/PaddleCV/metric_learning/models/resnet_embedding.py rename to PaddleCV/metric_learning/models/resnet_embedding.py diff --git a/fluid/PaddleCV/metric_learning/reader.py b/PaddleCV/metric_learning/reader.py similarity index 100% rename from fluid/PaddleCV/metric_learning/reader.py rename to PaddleCV/metric_learning/reader.py diff --git a/fluid/PaddleCV/metric_learning/train_elem.py b/PaddleCV/metric_learning/train_elem.py similarity index 100% rename from fluid/PaddleCV/metric_learning/train_elem.py rename to PaddleCV/metric_learning/train_elem.py diff --git a/fluid/PaddleCV/metric_learning/train_pair.py b/PaddleCV/metric_learning/train_pair.py similarity index 100% rename from fluid/PaddleCV/metric_learning/train_pair.py rename to PaddleCV/metric_learning/train_pair.py diff --git a/fluid/PaddleCV/metric_learning/utility.py b/PaddleCV/metric_learning/utility.py similarity index 100% rename from fluid/PaddleCV/metric_learning/utility.py rename to PaddleCV/metric_learning/utility.py diff --git a/fluid/PaddleCV/object_detection/.gitignore b/PaddleCV/object_detection/.gitignore similarity index 100% rename from fluid/PaddleCV/object_detection/.gitignore rename to PaddleCV/object_detection/.gitignore diff --git 
a/fluid/PaddleCV/object_detection/.run_ce.sh b/PaddleCV/object_detection/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/object_detection/.run_ce.sh rename to PaddleCV/object_detection/.run_ce.sh diff --git a/PaddleCV/object_detection/README.md b/PaddleCV/object_detection/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2466ba96577c7cb1e2bb335a0b8b5c74edbb92fd --- /dev/null +++ b/PaddleCV/object_detection/README.md @@ -0,0 +1,96 @@
+## SSD Object Detection
+
+## Table of Contents
+- [Introduction](#introduction)
+- [Data Preparation](#data-preparation)
+- [Train](#train)
+- [Evaluate](#evaluate)
+- [Infer and Visualize](#infer-and-visualize)
+- [Released Model](#released-model)
+
+### Introduction
+
+The [Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) framework for object detection can be categorized as a single-stage detector. A single-stage detector simplifies object detection into a regression problem, directly predicting bounding boxes and class probabilities without region proposals. SSD further improves on this by producing predictions at different scales from different layers, as shown below. Predictions are made at six levels, on feature maps of six different scales, and each feature map carries two 3x3 convolutional layers that predict the category and the shape offset relative to the prior box (also called anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
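+
+The arithmetic above can be checked in a couple of lines (feature-map sizes and priors per location taken from the paragraph):
+
+```python
+# Detection count per class for SSD300: (feature-map size, priors per location).
+feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
+print(sum(s * s * k for s, k in feature_maps))  # 8732
+```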

+*Figure: The Single Shot MultiBox Detector (SSD)*
+SSD is readily pluggable into a wide variety of standard convolutional networks, such as VGG, ResNet, or MobileNet; this network is also called the base network or backbone. In this tutorial we use [MobileNet](https://arxiv.org/abs/1704.04861).
+
+
+### Data Preparation
+
+Please download the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) first; skip this step if you already have it.
+
+```bash
+cd data/pascalvoc
+./download.sh
+```
+
+The `download.sh` script will also create the training and testing file lists.
+
+### Train
+
+#### Download the Pre-trained Model
+
+We provide two pre-trained models. One is a MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed. This model can be used to initialize models when training on other datasets, like PASCAL VOC. The other is a MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last fully-connected layer removed. Download the MobileNet-v1 SSD:
+
+  ```bash
+  ./pretrained/download_coco.sh
+  ```
+
+Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md).
+
+
+#### Train on PASCAL VOC
+
+`train.py` is the main caller of the training module. Examples of usage are shown below.
+  ```bash
+  python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
+  ```
+  - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify the GPUs you want to use.
+  - For more help on arguments:
+
+  ```bash
+  python train.py --help
+  ```
+
+The data reader is defined in `reader.py`. All images will be resized to 300x300. During training, images are randomly distorted, expanded, cropped and flipped:
+  - distort: distort brightness, contrast, saturation, and hue.
+  - expand: put the original image into a larger expanded image which is initialized using the image mean.
+  - crop: crop the image with respect to different scales, aspect ratios, and overlaps.
+  - flip: flip horizontally.
+
+We use the RMSProp optimizer with a mini-batch size of 64 to train the MobileNet-SSD. The initial learning rate is 0.001, decayed at epochs 40, 60, 80 and 100 with multipliers 0.5, 0.25, 0.1 and 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under the 11point metric.
+
+### Evaluate
+
+You can evaluate your trained model with different metrics, such as 11point and integral, on both the PASCAL VOC and COCO datasets. Note that we set the default test list to the dataset's test/val list; you can use your own test list by setting the ```--test_list``` argument.
+
+`eval.py` is the main caller of the evaluating module. Examples of usage are shown below.
+```bash
+python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
+```
+
+### Infer and Visualize
+`infer.py` is the main caller of the inferring module. Examples of usage are shown below.
+```bash
+python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
+```
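+
+For intuition about the `--nms_threshold` flag used in the commands above, here is a minimal NumPy sketch of IoU-based non-maximum suppression. It is illustrative only; the actual suppression is performed inside Fluid's SSD output layers:
+
+```python
+import numpy as np
+
+def nms(boxes, scores, iou_threshold=0.45):
+    """boxes: [N, 4] as [x1, y1, x2, y2]; returns indices of kept boxes."""
+    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
+    order = scores.argsort()[::-1]  # highest score first
+    keep = []
+    while order.size > 0:
+        i, rest = order[0], order[1:]
+        keep.append(int(i))
+        # IoU of the current best box with all remaining boxes
+        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
+        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
+        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
+        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
+        inter = np.maximum(x2 - x1, 0) * np.maximum(y2 - y1, 0)
+        iou = inter / (areas[i] + areas[rest] - inter)
+        order = rest[iou <= iou_threshold]  # drop boxes that overlap too much
+    return keep
+```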

+Below are examples of running inference and visualizing the model results.
+
+*Figure: MobileNet-v1-SSD 300x300 Visualization Examples*
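+
+For reference, the `11point` metric reported below is 11-point interpolated average precision. A compact sketch, with made-up precision/recall values:
+
+```python
+import numpy as np
+
+def ap_11point(precision, recall):
+    # average the best precision attainable at recall >= t, t = 0.0, 0.1, ..., 1.0
+    ap = 0.0
+    for t in np.linspace(0.0, 1.0, 11):
+        mask = recall >= t
+        ap += precision[mask].max() if mask.any() else 0.0
+    return ap / 11.0
+
+precision = np.array([1.00, 0.80, 0.66, 0.50])  # assumed values
+recall = np.array([0.20, 0.40, 0.60, 0.80])
+print(ap_11point(precision, recall))
+```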
+ + +### Released Model + + +| Model | Pre-trained Model | Training data | Test data | mAP | +|:------------------------:|:------------------:|:----------------:|:------------:|:----:| +|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% | diff --git a/PaddleCV/object_detection/README_cn.md b/PaddleCV/object_detection/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..8c4cecab28e49c10820e092d3a521facf4be68ea --- /dev/null +++ b/PaddleCV/object_detection/README_cn.md @@ -0,0 +1,99 @@ +## SSD 目标检测 + +## Table of Contents +- [简介](#简介) +- [数据准备](#数据准备) +- [模型训练](#模型训练) +- [模型评估](#模型评估) +- [模型预测以及可视化](#模型预测以及可视化) +- [模型发布](#模型发布) + +### 简介 + +[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) 是一种单阶段的目标检测器。与两阶段的检测方法不同,单阶段目标检测并不进行区域推荐,而是直接从特征图回归出目标的边界框和分类概率。SSD 运用了这种单阶段检测的思想,并且对其进行改进:在不同尺度的特征图上检测对应尺度的目标。如下图所示,SSD 在六个尺度的特征图上进行了不同层级的预测。每个层级由两个3x3卷积分别对目标类别和边界框偏移进行回归。因此对于每个类别,SSD 的六个层级一共会产生 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 个检测结果。 +

+*图:SSD 目标检测模型*
+ +SSD 可以方便地插入到任何一种标准卷积网络中,比如 VGG、ResNet 或者 MobileNet,这些网络被称作检测器的基网络。在这个示例中我们使用 [MobileNet](https://arxiv.org/abs/1704.04861)。 + + +### 数据准备 + + +请先使用下面的命令下载 [PASCAL VOC 数据集](http://host.robots.ox.ac.uk/pascal/VOC/): + +```bash +cd data/pascalvoc +./download.sh +``` + +`download.sh` 命令会自动创建训练和测试用的列表文件。 + + +### 模型训练 + +#### 下载预训练模型 + +我们提供了两个预训练模型。第一个模型是在 COCO 数据集上预训练的 MobileNet-v1 SSD,我们将它的预测头移除了以便在 COCO 以外的数据集上进行训练。第二个模型是在 ImageNet 2012 数据集上预训练的 MobileNet-v1,我们也将最后的全连接层移除以便进行目标检测训练。下载 MobileNet-v1 SSD: + + ```bash + ./pretrained/download_coco.sh + ``` + +声明:MobileNet-v1 SSD 模型转换自[TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md)。MobileNet-v1 模型转换自[Caffe](https://github.com/shicai/MobileNet-Caffe)。 + + +#### 训练 + +`train.py` 是训练模块的主要执行程序,调用示例如下: + ```bash + python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/' + ``` + - 可以通过设置 ```export CUDA_VISIBLE_DEVICES=0,1``` 指定想要使用的GPU数量。 + - 更多的可选参数见: + + ```bash + python train.py --help + ``` + +数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到300x300。在训练时还会对图片进行数据增强,包括随机扰动、扩张、翻转和裁剪: + - 扰动: 扰动图片亮度、对比度、饱和度和色相。 + - 扩张: 将原始图片放进一张使用像素均值填充(随后会在减均值操作中减掉)的扩张图中,再对此图进行裁剪、缩放和翻转。 + - 翻转: 水平翻转。 + - 裁剪: 根据缩放比例、长宽比例两个参数生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果。 + +我们使用了 RMSProp 优化算法来训练 MobileNet-SSD,batch大小为64,权重衰减系数为0.00005,初始学习率为 0.001,并且在第40、60、80、100 轮时使用 0.5, 0.25, 0.1, 0.01乘子进行学习率衰减。在120轮训练后,11point评价标准下的mAP为73.32%。 + +### 模型评估 + +你可以使用11point、integral等指标在PASCAL VOC 数据集上评估训练好的模型。不失一般性,我们采用相应数据集的测试列表作为样例代码的默认列表,你也可以通过设置```--test_list```来指定自己的测试样本列表。 + +`eval.py`是评估模块的主要执行程序,调用示例如下: +```bash +python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45 +``` + +### 模型预测以及可视化 + +`infer.py`是预测及可视化模块的主要执行程序,调用示例如下: +```bash +python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg' +``` +下图可视化了模型的预测结果: +

+*图:MobileNet-v1-SSD 300x300 预测可视化*
+
+
+### 模型发布
+
+
+| 模型 | 预训练模型 | 训练数据 | 测试数据 | mAP |
+|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
+|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% |
diff --git a/PaddleCV/object_detection/README_quant.md b/PaddleCV/object_detection/README_quant.md new file mode 100644 index 0000000000000000000000000000000000000000..7ea7f7bd79d21ba34c84d1a1b48a5298837939ac --- /dev/null +++ b/PaddleCV/object_detection/README_quant.md @@ -0,0 +1,146 @@
+## Quantization-aware training for SSD
+
+### Introduction
+
+The quantization-aware training used in these experiments is introduced in the [fixed-point quantization design](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/quantization/fixed_point_quantization.md). Since quantization-aware training is still an active area of research and experimentation,
+here we just give a simple quantization-training example in Fluid based on the MobileNet-SSD model. Further experiments are still needed, such as quantization-aware training that fuses batch normalization with convolution/fully-connected layers, channel-wise quantization of weights, and so on.
+
+
+A Python transpiler is used to rewrite the Fluid training program or evaluation program for quantization-aware training:
+
+```python
+
+    #startup_prog = fluid.Program()
+    #train_prog = fluid.Program()
+    #loss = build_program(
+    #    main_prog=train_prog,
+    #    startup_prog=startup_prog,
+    #    is_train=True)
+    #build_program(
+    #    main_prog=test_prog,
+    #    startup_prog=startup_prog,
+    #    is_train=False)
+    #test_prog = test_prog.clone(for_test=True)
+    # the above is pseudo code
+
+    transpiler = fluid.contrib.QuantizeTranspiler(
+        weight_bits=8,
+        activation_bits=8,
+        activation_quantize_type='abs_max',  # or 'range_abs_max'
+        weight_quantize_type='abs_max')
+    # note: transpiler.training_transpile rewrites train_prog in place;
+    # startup_prog is needed since the transpiler inserts and initializes
+    # some state variables
+    transpiler.training_transpile(train_prog, startup_prog)
+    transpiler.training_transpile(test_prog, startup_prog)
+```
+
+According to the above design, this transpiler inserts fake quantization and de-quantization operations for each convolution operation (including depthwise convolution) and fully-connected operation. These quantizations take effect on weights and activations.
+
+In this design, we introduce dynamic and static quantization strategies for different activation quantization methods. In the experiments, setting `activation_quantize_type` to `abs_max` gives dynamic quantization; that is, the quantization scale (maximum of the absolute value) of an activation is calculated for each mini-batch during inference. Setting `activation_quantize_type` to `range_abs_max` instead calculates a quantization scale for the inference period during training. The following part introduces how to train.
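+
+To make the `abs_max` strategy concrete, here is a toy NumPy sketch of the fake-quantization round-trip (8-bit, symmetric). It is illustrative only; the real pass consists of the fake quantize/de-quantize operations inserted by the transpiler:
+
+```python
+import numpy as np
+
+def fake_quant_dequant(x, bits=8):
+    scale = np.abs(x).max()      # abs_max scale of this tensor (dynamic)
+    q_max = 2 ** (bits - 1) - 1  # 127 for 8 bits
+    q = np.round(x / scale * q_max)  # quantize to integers in [-127, 127]
+    return q / q_max * scale         # de-quantize back to float
+
+x = np.random.randn(4, 4).astype("float32")
+print(np.abs(x - fake_quant_dequant(x)).max())  # small round-trip error
+```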
+### Quantization-aware training
+
+The training is fine-tuned on the well-trained MobileNet-SSD model, so download the model first:
+
+  ```
+  wget http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz
+  ```
+
+- dynamic quantization:
+
+  ```bash
+  python main_quant.py \
+      --data_dir=$PascalVOC_DIR$ \
+      --mode='train' \
+      --init_model=ssd_mobilenet_v1_pascalvoc \
+      --act_quant_type='abs_max' \
+      --epoc_num=20 \
+      --learning_rate=0.0001 \
+      --batch_size=64 \
+      --model_save_dir=$OUTPUT_DIR$
+  ```
+  Since we fine-tune from a well-trained model, we use a small initial learning rate of 0.0001 and train for 20 epochs.
+
+- static quantization:
+  ```bash
+  python main_quant.py \
+      --data_dir=$PascalVOC_DIR$ \
+      --mode='train' \
+      --init_model=ssd_mobilenet_v1_pascalvoc \
+      --act_quant_type='range_abs_max' \
+      --epoc_num=80 \
+      --learning_rate=0.001 \
+      --lr_epochs=30,60 \
+      --lr_decay_rates=1,0.1,0.01 \
+      --batch_size=64 \
+      --model_save_dir=$OUTPUT_DIR$
+  ```
+  Here we train for 80 epochs, and the learning rate decays by 0.1 at epochs 30 and 60. Users can adjust these hyper-parameters.
+
+### Convert to inference model
+
+As described in the design documentation, the inference graph is a little different from the training graph: the difference is whether the de-quantization operation comes before or after conv/fc. The two are equivalent during training, because conv/fc and de-quantization are linear operations and therefore commute. For inference, however, the graph needs to be converted; `fluid.contrib.QuantizeTranspiler.freeze_program` is used to do this:
+
+  ```python
+  #startup_prog = fluid.Program()
+  #test_prog = fluid.Program()
+  #test_py_reader, map_eval, nmsed_out, image = build_program(
+  #    main_prog=test_prog,
+  #    startup_prog=startup_prog,
+  #    train_params=configs,
+  #    is_train=False)
+  #test_prog = test_prog.clone(for_test=True)
+  #transpiler = fluid.contrib.QuantizeTranspiler(weight_bits=8,
+  #    activation_bits=8,
+  #    activation_quantize_type=act_quant_type,
+  #    weight_quantize_type='abs_max')
+  #transpiler.training_transpile(test_prog, startup_prog)
+  #place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
+  #exe = fluid.Executor(place)
+  #exe.run(startup_prog)
+
+  def if_exist(var):
+      return os.path.exists(os.path.join(init_model, var.name))
+  fluid.io.load_vars(exe, init_model, main_program=test_prog,
+                     predicate=if_exist)
+  # freeze the rewritten training program
+  # freeze after loading parameters; this quantizes the weights
+  transpiler.freeze_program(test_prog, place)
+  ```
+
+Users can evaluate the converted model by:
+
+  ```
+  python main_quant.py \
+      --data_dir=$PascalVOC_DIR$ \
+      --mode='test' \
+      --init_model=$MODEL_DIR$ \
+      --model_save_dir=$MobileNet_SSD_8BIT_MODEL$
+  ```
+
+You can also check the 8-bit model with the inference script:
+
+  ```
+  python main_quant.py \
+      --mode='infer' \
+      --init_model=$MobileNet_SSD_8BIT_MODEL$ \
+      --confs_threshold=0.5 \
+      --image_path='/data/PascalVOC/VOCdevkit/VOC2007/JPEGImages/002271.jpg'
+  ```
+See 002271.jpg for the visualized image with bounding boxes.
+
+**Note:** if you want to convert the model to 8-bit, call `fluid.contrib.QuantizeTranspiler.convert_to_int8`. However, Paddle currently cannot load an 8-bit model for inference.
+
+### Results
+
+Results of the MobileNet-v1-SSD 300x300 model on the PascalVOC dataset.
+
+| Model | mAP |
+|:---------------------------------------:|:------------------:|
+|Floating point: 32bit | 73.32% |
+|Fixed point: 8bit, dynamic quantization | 72.77% |
+|Fixed point: 8bit, static quantization | 72.45% |
+
+As mentioned above, other experiments are still needed, such as quantization-aware training that fuses batch normalization with convolution/fully-connected layers, channel-wise quantization of weights, and quantized weights stored as uint8 instead of int8.
diff --git a/fluid/PaddleCV/object_detection/_ce.py b/PaddleCV/object_detection/_ce.py similarity index 100% rename from fluid/PaddleCV/object_detection/_ce.py rename to PaddleCV/object_detection/_ce.py diff --git a/fluid/PaddleCV/object_detection/data/coco/download.sh b/PaddleCV/object_detection/data/coco/download.sh similarity index 100% rename from fluid/PaddleCV/object_detection/data/coco/download.sh rename to PaddleCV/object_detection/data/coco/download.sh diff --git a/fluid/PaddleCV/object_detection/data/pascalvoc/create_list.py b/PaddleCV/object_detection/data/pascalvoc/create_list.py similarity index 100% rename from fluid/PaddleCV/object_detection/data/pascalvoc/create_list.py rename to PaddleCV/object_detection/data/pascalvoc/create_list.py diff --git a/fluid/PaddleCV/object_detection/data/pascalvoc/download.sh b/PaddleCV/object_detection/data/pascalvoc/download.sh similarity index 100% rename from fluid/PaddleCV/object_detection/data/pascalvoc/download.sh rename to PaddleCV/object_detection/data/pascalvoc/download.sh diff --git a/fluid/PaddleCV/object_detection/data/pascalvoc/label_list b/PaddleCV/object_detection/data/pascalvoc/label_list similarity index 100% rename from fluid/PaddleCV/object_detection/data/pascalvoc/label_list rename to PaddleCV/object_detection/data/pascalvoc/label_list diff --git a/fluid/PaddleCV/object_detection/eval.py b/PaddleCV/object_detection/eval.py similarity index 100% rename from fluid/PaddleCV/object_detection/eval.py rename to PaddleCV/object_detection/eval.py diff --git a/fluid/PaddleCV/object_detection/eval_coco_map.py b/PaddleCV/object_detection/eval_coco_map.py similarity index 100% rename from fluid/PaddleCV/object_detection/eval_coco_map.py rename to PaddleCV/object_detection/eval_coco_map.py diff --git a/fluid/PaddleCV/object_detection/image_util.py b/PaddleCV/object_detection/image_util.py similarity index 100% rename from fluid/PaddleCV/object_detection/image_util.py rename to PaddleCV/object_detection/image_util.py diff --git a/fluid/PaddleCV/object_detection/images/009943.jpg b/PaddleCV/object_detection/images/009943.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/009943.jpg rename to PaddleCV/object_detection/images/009943.jpg diff --git a/fluid/PaddleCV/object_detection/images/009956.jpg b/PaddleCV/object_detection/images/009956.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/009956.jpg rename to PaddleCV/object_detection/images/009956.jpg diff --git a/fluid/PaddleCV/object_detection/images/009960.jpg b/PaddleCV/object_detection/images/009960.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/009960.jpg rename to PaddleCV/object_detection/images/009960.jpg diff --git a/fluid/PaddleCV/object_detection/images/009962.jpg b/PaddleCV/object_detection/images/009962.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/009962.jpg rename to PaddleCV/object_detection/images/009962.jpg diff --git
a/fluid/PaddleCV/object_detection/images/COCO_val2014_000000000139.jpg b/PaddleCV/object_detection/images/COCO_val2014_000000000139.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/COCO_val2014_000000000139.jpg rename to PaddleCV/object_detection/images/COCO_val2014_000000000139.jpg diff --git a/fluid/PaddleCV/object_detection/images/COCO_val2014_000000000785.jpg b/PaddleCV/object_detection/images/COCO_val2014_000000000785.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/COCO_val2014_000000000785.jpg rename to PaddleCV/object_detection/images/COCO_val2014_000000000785.jpg diff --git a/fluid/PaddleCV/object_detection/images/COCO_val2014_000000000885.jpg b/PaddleCV/object_detection/images/COCO_val2014_000000000885.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/COCO_val2014_000000000885.jpg rename to PaddleCV/object_detection/images/COCO_val2014_000000000885.jpg diff --git a/fluid/PaddleCV/object_detection/images/COCO_val2014_000000142324.jpg b/PaddleCV/object_detection/images/COCO_val2014_000000142324.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/COCO_val2014_000000142324.jpg rename to PaddleCV/object_detection/images/COCO_val2014_000000142324.jpg diff --git a/fluid/PaddleCV/object_detection/images/COCO_val2014_000000144003.jpg b/PaddleCV/object_detection/images/COCO_val2014_000000144003.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/COCO_val2014_000000144003.jpg rename to PaddleCV/object_detection/images/COCO_val2014_000000144003.jpg diff --git a/fluid/PaddleCV/object_detection/images/SSD_paper_figure.jpg b/PaddleCV/object_detection/images/SSD_paper_figure.jpg similarity index 100% rename from fluid/PaddleCV/object_detection/images/SSD_paper_figure.jpg rename to PaddleCV/object_detection/images/SSD_paper_figure.jpg diff --git a/fluid/PaddleCV/object_detection/infer.py b/PaddleCV/object_detection/infer.py similarity index 100% rename from fluid/PaddleCV/object_detection/infer.py rename to PaddleCV/object_detection/infer.py diff --git a/fluid/PaddleCV/object_detection/main_quant.py b/PaddleCV/object_detection/main_quant.py similarity index 100% rename from fluid/PaddleCV/object_detection/main_quant.py rename to PaddleCV/object_detection/main_quant.py diff --git a/fluid/PaddleCV/object_detection/mobilenet_ssd.py b/PaddleCV/object_detection/mobilenet_ssd.py similarity index 100% rename from fluid/PaddleCV/object_detection/mobilenet_ssd.py rename to PaddleCV/object_detection/mobilenet_ssd.py diff --git a/fluid/PaddleCV/object_detection/pretrained/download_coco.sh b/PaddleCV/object_detection/pretrained/download_coco.sh similarity index 100% rename from fluid/PaddleCV/object_detection/pretrained/download_coco.sh rename to PaddleCV/object_detection/pretrained/download_coco.sh diff --git a/fluid/PaddleCV/object_detection/pretrained/download_imagenet.sh b/PaddleCV/object_detection/pretrained/download_imagenet.sh similarity index 100% rename from fluid/PaddleCV/object_detection/pretrained/download_imagenet.sh rename to PaddleCV/object_detection/pretrained/download_imagenet.sh diff --git a/fluid/PaddleCV/object_detection/reader.py b/PaddleCV/object_detection/reader.py similarity index 100% rename from fluid/PaddleCV/object_detection/reader.py rename to PaddleCV/object_detection/reader.py diff --git a/fluid/PaddleCV/object_detection/train.py b/PaddleCV/object_detection/train.py similarity index 100% rename from fluid/PaddleCV/object_detection/train.py rename 
to PaddleCV/object_detection/train.py diff --git a/fluid/PaddleCV/object_detection/utility.py b/PaddleCV/object_detection/utility.py similarity index 100% rename from fluid/PaddleCV/object_detection/utility.py rename to PaddleCV/object_detection/utility.py diff --git a/fluid/PaddleCV/ocr_recognition/.run_ce.sh b/PaddleCV/ocr_recognition/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/ocr_recognition/.run_ce.sh rename to PaddleCV/ocr_recognition/.run_ce.sh diff --git a/PaddleCV/ocr_recognition/README.md b/PaddleCV/ocr_recognition/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1c9553993e84d10376441407704088ec4dd66c0c --- /dev/null +++ b/PaddleCV/ocr_recognition/README.md @@ -0,0 +1,206 @@
+
+
+运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
+
+## 代码结构
+```
+├── data_reader.py # 下载、读取、处理数据。
+├── crnn_ctc_model.py # 定义了OCR CTC model的网络结构。
+├── attention_model.py # 定义了OCR attention model的网络结构。
+├── train.py # 用于模型的训练。
+├── infer.py # 加载训练好的模型文件,对新数据进行预测。
+├── eval.py # 评估模型在指定数据集上的效果。
+└── utils.py # 定义通用的函数。
+```
+
+
+## 简介
+
+本章的任务是识别图片中单行英文字符,这里我们分别使用CTC model和attention model两种不同的模型来完成该任务。
+
+这两种模型有相同的编码部分:首先采用卷积将图片转为特征图,然后使用`im2sequence op`将特征图转为序列,再通过`双向GRU`学习到序列特征。
+
+两种模型的解码部分和使用的损失函数区别如下:
+
+- CTC model: 训练过程选用的损失函数为CTC(Connectionist Temporal Classification) loss,预测阶段采用的是贪婪策略和CTC解码策略。
+- Attention model: 训练过程选用的是带注意力机制的解码策略和交叉信息熵损失函数,预测阶段采用的是柱搜索策略。
+
+训练以上两种模型的评估指标为样本级别的错误率。
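+
+下面用几行代码对上述 CTC 贪婪解码策略做一个最小示意(仅为说明原理,实际实现见`infer.py`与`crnn_ctc_model.py`;时间步数与类别数均为假设值):
+
+```python
+import numpy as np
+
+def ctc_greedy_decode(probs, blank=0):
+    """probs: [T, C],每个时间步各类别的概率。"""
+    best = probs.argmax(axis=1)  # 每个时间步取概率最大的类别
+    result, prev = [], -1
+    for idx in best:
+        if idx != prev and idx != blank:  # 合并相邻重复,并去掉 blank
+            result.append(int(idx))
+        prev = idx
+    return result
+
+probs = np.random.rand(10, 95)  # 假设 10 个时间步、95 个类别(含 blank)
+print(ctc_greedy_decode(probs))
+```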

+## 数据
+
+数据的下载和简单预处理都在`data_reader.py`中实现。
+
+### 数据示例
+
+我们使用的训练和测试数据如`图1`所示,每张图片包含单行不定长的英文字符串,这些图片都是经过检测算法进行预框选处理的。
+
+*图 1*
+在训练集中,每张图片对应的label是图片中所含字符在词典中的索引。`图1` 对应的label如下所示:
+```
+80,84,68,82,83,72,78,77,68,67
+```
+在上边这个label中,`80` 表示字符`Q`的索引,`67` 表示英文字符`D`的索引。
+
+
+### 数据准备
+
+**A. 训练集**
+
+我们需要把所有参与训练的图片放入同一个文件夹,暂且记为`train_images`。然后用一个list文件存放每张图片的信息,包括图片大小、图片名称和对应的label,这里暂记该list文件为`train_list`,其格式如下所示:
+
+```
+185 48 00508_0215.jpg 7740,5332,2369,3201,4162
+48 48 00197_1893.jpg 6569
+338 48 00007_0219.jpg 4590,4788,3015,1994,3402,999,4553
+150 48 00107_4517.jpg 5936,3382,1437,3382
+...
+157 48 00387_0622.jpg 2397,1707,5919,1278
+```
+*文件 train_list*
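+
+下面的几行 Python 演示了如何解析这种 list 文件(仅为示意,各列含义见下文说明;实际的读取逻辑在`data_reader.py`中实现):
+
+```python
+def parse_list(path):
+    """解析 list 文件:每行依次为 宽、高、图片名、逗号分隔的 label 索引。"""
+    samples = []
+    with open(path) as f:
+        for line in f:
+            width, height, name, label = line.split()
+            samples.append((int(width), int(height), name,
+                            [int(i) for i in label.split(",")]))
+    return samples
+
+# samples = parse_list("train_data/train_list")
+```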
+上述文件中的每一行表示一张图片,每行被空格分为四列,前两列分别表示图片的宽和高,第三列表示图片的名称,第四列表示该图片对应的sequence label。
+最终我们应有以下类似文件结构:
+
+```
+|-train_data
+  |- train_list
+  |- train_images
+    |- 00508_0215.jpg
+    |- 00197_1893.jpg
+    |- 00007_0219.jpg
+    | ...
+```
+
+在训练时,我们通过选项`--train_images` 和 `--train_list` 分别设置准备好的`train_images` 和`train_list`。
+
+
+>**注:** 如果`--train_images` 和 `--train_list`都未设置或设置为None,`data_reader.py`会自动下载使用[示例数据](http://paddle-ocr-data.bj.bcebos.com/data.tar.gz),并将其缓存到`$HOME/.cache/paddle/dataset/ctc_data/data/` 路径下。
+
+
+**B. 测试集和评估集**
+
+测试集、评估集的准备方式与训练集相同。
+在训练阶段,测试集的路径通过train.py的选项`--test_images` 和 `--test_list` 来设置。
+在评估时,评估集的路径通过eval.py的选项`--input_images_dir` 和`--input_images_list` 来设置。
+
+**C. 待预测数据集**
+
+预测支持三种形式的输入:
+
+第一种:设置`--input_images_dir`和`--input_images_list`,与训练集类似,只不过list文件中的最后一列可以放任意占位字符或字符串,如下所示:
+
+```
+185 48 00508_0215.jpg s
+48 48 00197_1893.jpg s
+338 48 00007_0219.jpg s
+...
+```
+
+第二种:仅设置`--input_images_list`,其中list文件中只需放图片的完整路径,如下所示:
+
+```
+data/test_images/00000.jpg
+data/test_images/00001.jpg
+data/test_images/00003.jpg
+```
+
+第三种:从stdin读入一张图片的path,然后进行一次inference。
+
+## 模型训练与预测
+
+### 训练
+
+使用默认数据在GPU单卡上训练:
+
+```
+env CUDA_VISIBLE_DEVICES=0 python train.py
+```
+使用默认数据在CPU上训练:
+```
+env OMP_NUM_THREADS=<num_of_physical_cores> python train.py --use_gpu False --parallel=False
+```
+
+使用默认数据在GPU多卡上训练:
+
+```
+env CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --parallel=True
+```
+
+默认使用的是`CTC model`,可以通过选项`--model="attention"`切换为`attention model`。
+
+执行`python train.py --help`可查看更多使用方式和参数详细说明。
+
+图2为使用默认参数在默认数据集上训练`CTC model`的收敛曲线,其中横坐标轴为训练迭代次数,纵轴为样本级错误率。其中,蓝线为训练集上的样本错误率,红线为测试集上的样本错误率。测试集上最低错误率为22.0%。

+图 2
+ +图3为使用默认参数在默认数据集上训练`attention model`的收敛曲线,其中横坐标轴为训练迭代次数,纵轴为样本级错误率。其中,蓝线为训练集上的样本错误率,红线为测试集上的样本错误率。测试集上最低错误率为16.25%. + +

+图 3
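+
+作为补充说明,上文收敛曲线纵轴使用的样本级错误率可以用如下最小示意计算(假设预测结果与标签均为索引序列;函数名仅作示意,并非本目录代码中的实现):
+
+```python
+def sample_error_rate(predictions, labels):
+    """样本级错误率:预测序列与标签序列完全一致才算预测正确。
+
+    predictions, labels: 等长列表,每个元素为一个索引序列(list of int)。
+    """
+    assert len(predictions) == len(labels)
+    wrong = sum(1 for pred, gt in zip(predictions, labels)
+                if list(pred) != list(gt))
+    return float(wrong) / len(labels)
+
+
+# 用法示意:两个样本中有一个预测出错,错误率为 0.5
+print(sample_error_rate([[80, 84], [68, 82]], [[80, 84], [68, 83]]))
+```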
+ + +## 测试 + +通过以下命令调用评估脚本用指定数据集对模型进行评估: + +``` +env CUDA_VISIBLE_DEVICE=0 python eval.py \ + --model_path="./models/model_0" \ + --input_images_dir="./eval_data/images/" \ + --input_images_list="./eval_data/eval_list\" \ +``` + +执行`python train.py --help`可查看参数详细说明。 + + +### 预测 + +从标准输入读取一张图片的路径,并对齐进行预测: + +``` +env CUDA_VISIBLE_DEVICE=0 python infer.py \ + --model_path="models/model_00044_15000" +``` + +执行上述命令进行预测的效果如下: + +``` +----------- Configuration Arguments ----------- +use_gpu: True +input_images_dir: None +input_images_list: None +model_path: /home/work/models/fluid/ocr_recognition/models/model_00052_15000 +------------------------------------------------ +Init model from: ./models/model_00052_15000. +Please input the path of image: ./test_images/00001_0060.jpg +result: [3298 2371 4233 6514 2378 3298 2363] +Please input the path of image: ./test_images/00001_0429.jpg +result: [2067 2067 8187 8477 5027 7191 2431 1462] +``` + +从文件中批量读取图片路径,并对其进行预测: + +``` +env CUDA_VISIBLE_DEVICE=0 python infer.py \ + --model_path="models/model_00044_15000" \ + --input_images_list="data/test.list" +``` + +## 预训练模型 + +|模型| 错误率| +|- |:-: | +|[ocr_ctc_params](https://paddle-ocr-models.bj.bcebos.com/ocr_ctc.zip) | 22.3% | +|[ocr_attention_params](https://paddle-ocr-models.bj.bcebos.com/ocr_attention.zip) | 15.8%| diff --git a/fluid/PaddleCV/ocr_recognition/_ce.py b/PaddleCV/ocr_recognition/_ce.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/_ce.py rename to PaddleCV/ocr_recognition/_ce.py diff --git a/fluid/PaddleCV/ocr_recognition/attention_model.py b/PaddleCV/ocr_recognition/attention_model.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/attention_model.py rename to PaddleCV/ocr_recognition/attention_model.py diff --git a/fluid/PaddleCV/ocr_recognition/crnn_ctc_model.py b/PaddleCV/ocr_recognition/crnn_ctc_model.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/crnn_ctc_model.py rename to PaddleCV/ocr_recognition/crnn_ctc_model.py diff --git a/fluid/PaddleCV/ocr_recognition/data_reader.py b/PaddleCV/ocr_recognition/data_reader.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/data_reader.py rename to PaddleCV/ocr_recognition/data_reader.py diff --git a/fluid/PaddleCV/ocr_recognition/eval.py b/PaddleCV/ocr_recognition/eval.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/eval.py rename to PaddleCV/ocr_recognition/eval.py diff --git a/fluid/PaddleCV/ocr_recognition/images/demo.jpg b/PaddleCV/ocr_recognition/images/demo.jpg similarity index 100% rename from fluid/PaddleCV/ocr_recognition/images/demo.jpg rename to PaddleCV/ocr_recognition/images/demo.jpg diff --git a/fluid/PaddleCV/ocr_recognition/images/train.jpg b/PaddleCV/ocr_recognition/images/train.jpg similarity index 100% rename from fluid/PaddleCV/ocr_recognition/images/train.jpg rename to PaddleCV/ocr_recognition/images/train.jpg diff --git a/fluid/PaddleCV/ocr_recognition/images/train_attention.jpg b/PaddleCV/ocr_recognition/images/train_attention.jpg similarity index 100% rename from fluid/PaddleCV/ocr_recognition/images/train_attention.jpg rename to PaddleCV/ocr_recognition/images/train_attention.jpg diff --git a/fluid/PaddleCV/ocr_recognition/infer.py b/PaddleCV/ocr_recognition/infer.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/infer.py rename to PaddleCV/ocr_recognition/infer.py diff --git a/fluid/PaddleCV/ocr_recognition/scripts/README.md b/PaddleCV/ocr_recognition/scripts/README.md similarity index 100% rename from 
fluid/PaddleCV/ocr_recognition/scripts/README.md rename to PaddleCV/ocr_recognition/scripts/README.md diff --git a/fluid/PaddleCV/ocr_recognition/scripts/infer.sh b/PaddleCV/ocr_recognition/scripts/infer.sh similarity index 100% rename from fluid/PaddleCV/ocr_recognition/scripts/infer.sh rename to PaddleCV/ocr_recognition/scripts/infer.sh diff --git a/fluid/PaddleCV/ocr_recognition/scripts/train.sh b/PaddleCV/ocr_recognition/scripts/train.sh similarity index 100% rename from fluid/PaddleCV/ocr_recognition/scripts/train.sh rename to PaddleCV/ocr_recognition/scripts/train.sh diff --git a/fluid/PaddleCV/ocr_recognition/train.py b/PaddleCV/ocr_recognition/train.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/train.py rename to PaddleCV/ocr_recognition/train.py diff --git a/fluid/PaddleCV/ocr_recognition/utility.py b/PaddleCV/ocr_recognition/utility.py similarity index 100% rename from fluid/PaddleCV/ocr_recognition/utility.py rename to PaddleCV/ocr_recognition/utility.py diff --git a/fluid/PaddleCV/rcnn/.gitignore b/PaddleCV/rcnn/.gitignore similarity index 100% rename from fluid/PaddleCV/rcnn/.gitignore rename to PaddleCV/rcnn/.gitignore diff --git a/fluid/PaddleCV/rcnn/.run_ce.sh b/PaddleCV/rcnn/.run_ce.sh similarity index 100% rename from fluid/PaddleCV/rcnn/.run_ce.sh rename to PaddleCV/rcnn/.run_ce.sh diff --git a/PaddleCV/rcnn/README.md b/PaddleCV/rcnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..97d1736f2b25bd6baa5ab5142be18544c9d63b85 --- /dev/null +++ b/PaddleCV/rcnn/README.md @@ -0,0 +1,209 @@ +# RCNN Objective Detection + +--- +## Table of Contents + +- [Installation](#installation) +- [Introduction](#introduction) +- [Data preparation](#data-preparation) +- [Training](#training) +- [Evaluation](#evaluation) +- [Inference and Visualization](#inference-and-visualization) + +## Installation + +Running sample code in this directory requires PaddelPaddle Fluid v.1.3.0 and later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in [installation document](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/install/index_en.html) and make an update. + +## Introduction + +Region Convolutional Neural Network (RCNN) models are two stages detector. According to proposals and feature extraction, obtain class and more precise proposals. +Now RCNN model contains two typical models: Faster RCNN and Mask RCNN. + +[Faster RCNN](https://arxiv.org/abs/1506.01497), The total framework of network can be divided into four parts: + +1. Base conv layer. As a CNN objective dection, Faster RCNN extract feature maps using a basic convolutional network. The feature maps then can be shared by RPN and fc layers. This sampel uses [ResNet-50](https://arxiv.org/abs/1512.03385) as base conv layer. +2. Region Proposal Network (RPN). RPN generates proposals for detection。This block generates anchors by a set of size and ratio and classifies anchors into fore-ground and back-ground by softmax. Then refine anchors to obtain more precise proposals using box regression. +3. RoI Align. This layer takes feature maps and proposals as input. The proposals are mapped to feature maps and pooled to the same size. The output are sent to fc layers for classification and regression. RoIPool and RoIAlign are used separately to this layer and it can be set in roi\_func in config.py. +4. Detection layer. Using the output of roi pooling to compute the class and locatoin of each proposal in two fc layers. 
+ +[Mask RCNN](https://arxiv.org/abs/1703.06870) is a classical instance segmentation model and an extension of Faster RCNN + +Mask RCNN is a two stage model as well. At the first stage, it generates proposals from input images. At the second stage, it obtains class result, bbox and mask which is the result from segmentation branch on original Faster RCNN model. It decouples the relation between mask and classification. + +## Data preparation + +Train the model on [MS-COCO dataset](http://cocodataset.org/#download), download dataset as below: + + cd dataset/coco + ./download.sh + +The data catalog structure is as follows: + + ``` + data/coco/ + ├── annotations + │   ├── instances_train2014.json + │   ├── instances_train2017.json + │   ├── instances_val2014.json + │   ├── instances_val2017.json + | ... + ├── train2017 + │   ├── 000000000009.jpg + │   ├── 000000580008.jpg + | ... + ├── val2017 + │   ├── 000000000139.jpg + │   ├── 000000000285.jpg + | ... + ``` + +## Training + +**download the pre-trained model:** This sample provides Resnet-50 pre-trained model which is converted from Caffe. The model fuses the parameters in batch normalization layer. One can download pre-trained model as: + + sh ./pretrained/download.sh + +Set `pretrained_model` to load pre-trained model. In addition, this parameter is used to load trained model when finetuning as well. +Please make sure that pretrained_model is downloaded and loaded correctly, otherwise, the loss may be NAN during training. + +**Install the [cocoapi](https://github.com/cocodataset/cocoapi):** + +To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi: + + git clone https://github.com/cocodataset/cocoapi.git + cd cocoapi/PythonAPI + # if cython is not installed + pip install Cython + # Install into global site-packages + make install + # Alternatively, if you do not have permissions or prefer + # not to install the COCO API into global site-packages + python2 setup.py install --user + +After data preparation, one can start the training step by: + +- Faster RCNN + + ``` + python train.py \ + --model_save_dir=output/ \ + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=False + ``` + +- Mask RCNN + + ``` + python train.py \ + --model_save_dir=output/ \ + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=True + ``` + + - Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specifiy 8 GPU to train. + - Set ```MASK_ON``` to choose Faster RCNN or Mask RCNN model. + - For more help on arguments: + + python train.py --help + +**data reader introduction:** + +* Data reader is defined in `reader.py`. +* Scaling the short side of all images to `scales`. If the long side is larger than `max_size`, then scaling the long side to `max_size`. +* In training stage, images are horizontally flipped. +* Images in the same batch can be padding to the same size. + +**model configuration:** + +* Use RoIAlign and RoIPool separately. +* NMS threshold=0.7. During training, pre\_nms=12000, post\_nms=2000; during test, pre\_nms=6000, post\_nms=1000. +* In generating proposal lables, fg\_fraction=0.25, fg\_thresh=0.5, bg\_thresh_hi=0.5, bg\_thresh\_lo=0.0. +* In rpn target assignment, rpn\_fg\_fraction=0.5, rpn\_positive\_overlap=0.7, rpn\_negative\_overlap=0.3. + +**training strategy:** + +* Use momentum optimizer with momentum=0.9. +* Weight decay is 0.0001. 
+* In first 500 iteration, the learning rate increases linearly from 0.00333 to 0.01. Then lr is decayed at 120000, 160000 iteration with multiplier 0.1, 0.01. The maximum iteration is 180000. Also, we released a 2x model which has 360000 iterations and lr is decayed at 240000, 320000. These configuration can be set by max_iter and lr_steps in config.py. +* Set the learning rate of bias to two times as global lr in non basic convolutional layers. +* In basic convolutional layers, parameters of affine layers and res body do not update. + +## Evaluation + +Evaluation is to evaluate the performance of a trained model. This sample provides `eval_coco_map.py` which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval). + +`eval_coco_map.py` is the main executor for evalution, one can start evalution step by: + +- Faster RCNN + + ``` + python eval_coco_map.py \ + --dataset=coco2017 \ + --pretrained_model=${path_to_trained_model} \ + --MASK_ON=False + ``` + +- Mask RCNN + + ``` + python eval_coco_map.py \ + --dataset=coco2017 \ + --pretrained_model=${path_to_trainde_model} \ + --MASK_ON=True + ``` + + - Set ```--pretrained_model=${path_to_trained_model}``` to specifiy the trained model, not the initialized model. + - Set ```export CUDA_VISIBLE_DEVICES=0``` to specifiy one GPU to eval. + - Set ```MASK_ON``` to choose Faster RCNN or Mask RCNN model. + +Evalutaion result is shown as below: + +Faster RCNN: + +| Model | RoI function | Batch size | Max iteration | mAP | +| :--------------- | :--------: | :------------: | :------------------: |------: | +| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 | +| [Fluid RoIPool no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_no_padding.tar.gz) | RoIPool | 8 | 180000 | 0.318 | +| [Fluid RoIAlign no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding.tar.gz) | RoIAlign | 8 | 180000 | 0.348 | +| [Fluid RoIAlign no padding 2x](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding_2x.tar.gz) | RoIAlign | 8 | 360000 | 0.367 | + +* Fluid RoIPool minibatch padding: Use RoIPool. Images in one batch padding to the same size. This method is same as detectron. +* Fluid RoIPool no padding: Images without padding. +* Fluid RoIAlign no padding: Images without padding. +* Fluid RoIAlign no padding 2x: Images without padding, train for 360000 iterations, learning rate is decayed at 240000, 320000. + +Mask RCNN: + +| Model | Batch size | Max iteration | box mAP | mask mAP | +| :--------------- | :--------: | :------------: | :--------: |------: | +| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 | + +* Fluid mask no padding: Use RoIAlign. Images without padding. + +## Inference and Visualization + +Inference is used to get prediction score or image features based on trained models. `infer.py` is the main executor for inference, one can start infer step by: + +``` +python infer.py \ + --pretrained_model=${path_to_trained_model} \ + --image_path=dataset/coco/val2017/000000000139.jpg \ + --draw_threshold=0.6 +``` + +Please set the model path and image path correctly. GPU device is used by default, you can set `--use_gpu=False` to switch to CPU device. And you can set `draw_threshold` to tune score threshold to control the number of output detection boxes. 
+
+Visualization of the inference results is shown below:

+Faster RCNN Visualization Examples

+Mask RCNN Visualization Examples
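+
+As a reference for the training strategy above, the following is a minimal sketch of the described warmup-and-step-decay schedule under the 1x setting (illustrative only, not the implementation in this directory):
+
+```python
+def learning_rate(iteration,
+                  base_lr=0.01,
+                  warmup_start_lr=0.00333,
+                  warmup_iters=500,
+                  decay_steps=(120000, 160000),
+                  decay_factor=0.1):
+    """Linear warmup over the first 500 iterations, then step decay."""
+    if iteration < warmup_iters:
+        # Increase linearly from warmup_start_lr to base_lr.
+        alpha = float(iteration) / warmup_iters
+        return warmup_start_lr + (base_lr - warmup_start_lr) * alpha
+    lr = base_lr
+    for step in decay_steps:
+        if iteration >= step:
+            lr *= decay_factor  # x0.1 at 120000, x0.01 at 160000
+    return lr
+
+
+# learning_rate(0) -> 0.00333, learning_rate(500) -> 0.01,
+# learning_rate(130000) -> 0.001, learning_rate(170000) -> 0.0001
+```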
diff --git a/PaddleCV/rcnn/README_cn.md b/PaddleCV/rcnn/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..3d45de1c5845727d0b942f9ba5a4bf5216985af9 --- /dev/null +++ b/PaddleCV/rcnn/README_cn.md @@ -0,0 +1,207 @@ +# RCNN 系列目标检测 + +--- +## 内容 + +- [安装](#安装) +- [简介](#简介) +- [数据准备](#数据准备) +- [模型训练](#模型训练) +- [模型评估](#模型评估) +- [模型推断及可视化](#模型推断及可视化) + +## 安装 + +在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.3.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/)中的说明来更新PaddlePaddle。 + +## 简介 +区域卷积神经网络(RCNN)系列模型为两阶段目标检测器。通过对图像生成候选区域,提取特征,判别特征类别并修正候选框位置。 +RCNN系列目前包含两个代表模型:Faster RCNN,Mask RCNN + +[Faster RCNN](https://arxiv.org/abs/1506.01497) 整体网络可以分为4个主要内容: + +1. 基础卷积层。作为一种卷积神经网络目标检测方法,Faster RCNN首先使用一组基础的卷积网络提取图像的特征图。特征图被后续RPN层和全连接层共享。本示例采用[ResNet-50](https://arxiv.org/abs/1512.03385)作为基础卷积层。 +2. 区域生成网络(RPN)。RPN网络用于生成候选区域(proposals)。该层通过一组固定的尺寸和比例得到一组锚点(anchors), 通过softmax判断锚点属于前景或者背景,再利用区域回归修正锚点从而获得精确的候选区域。 +3. RoI Align。该层收集输入的特征图和候选区域,将候选区域映射到特征图中并池化为统一大小的区域特征图,送入全连接层判定目标类别, 该层可选用RoIPool和RoIAlign两种方式,在config.py中设置roi\_func。 +4. 检测层。利用区域特征图计算候选区域的类别,同时再次通过区域回归获得检测框最终的精确位置。 + +[Mask RCNN](https://arxiv.org/abs/1703.06870) 扩展自Faster RCNN,是经典的实例分割模型。 + +Mask RCNN同样为两阶段框架,第一阶段扫描图像生成候选框;第二阶段根据候选框得到分类结果,边界框,同时在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。 + + +## 数据准备 + +在[MS-COCO数据集](http://cocodataset.org/#download)上进行训练,通过如下方式下载数据集。 + + cd dataset/coco + ./download.sh + +数据目录结构如下: + +``` +data/coco/ +├── annotations +│   ├── instances_train2014.json +│   ├── instances_train2017.json +│   ├── instances_val2014.json +│   ├── instances_val2017.json +| ... +├── train2017 +│   ├── 000000000009.jpg +│   ├── 000000580008.jpg +| ... +├── val2017 +│   ├── 000000000139.jpg +│   ├── 000000000285.jpg +| ... + +``` + +## 模型训练 + +**下载预训练模型:** 本示例提供Resnet-50预训练模型,该模性转换自Caffe,并对批标准化层(Batch Normalization Layer)进行参数融合。采用如下命令下载预训练模型: + + sh ./pretrained/download.sh + +通过初始化`pretrained_model` 加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。 +请在训练前确认预训练模型下载与加载正确,否则训练过程中损失可能会出现NAN。 + +**安装[cocoapi](https://github.com/cocodataset/cocoapi):** + +训练前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi): + + git clone https://github.com/cocodataset/cocoapi.git + cd cocoapi/PythonAPI + # if cython is not installed + pip install Cython + # Install into global site-packages + make install + # Alternatively, if you do not have permissions or prefer + # not to install the COCO API into global site-packages + python2 setup.py install --user + +数据准备完毕后,可以通过如下的方式启动训练: + +- Faster RCNN + + ``` + python train.py \ + --model_save_dir=output/ \ + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=False + ``` + +- Mask RCNN + + ``` + python train.py \ + --model_save_dir=output/ \ + --pretrained_model=${path_to_pretrain_model} \ + --data_dir=${path_to_data} \ + --MASK_ON=True + ``` + + - 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。 + - 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 + - 可选参数见: + + python train.py --help + +**数据读取器说明:** 数据读取器定义在reader.py中。所有图像将短边等比例缩放至`scales`,若长边大于`max_size`, 则再次将长边等比例缩放至`max_size`。在训练阶段,对图像采用水平翻转。支持将同一个batch内的图像padding为相同尺寸。 + +**模型设置:** + +* 分别使用RoIAlign和RoIPool两种方法。 +* 训练过程pre\_nms=12000, post\_nms=2000,测试过程pre\_nms=6000, post\_nms=1000。nms阈值为0.7。 +* RPN网络得到labels的过程中,fg\_fraction=0.25,fg\_thresh=0.5,bg\_thresh_hi=0.5,bg\_thresh\_lo=0.0 +* RPN选择anchor时,rpn\_fg\_fraction=0.5,rpn\_positive\_overlap=0.7,rpn\_negative\_overlap=0.3 + + +**训练策略:** + +* 采用momentum优化算法训练,momentum=0.9。 +* 
权重衰减系数为0.0001,前500轮学习率从0.00333线性增加至0.01。在120000,160000轮时使用0.1,0.01乘子进行学习率衰减,最大训练180000轮。同时我们也提供了2x模型,该模型采用更多的迭代轮数进行训练,训练360000轮,学习率在240000,320000轮衰减,其他参数不变,训练最大轮数和学习率策略可以在config.py中对max_iter和lr_steps进行设置。 +* 非基础卷积层卷积bias学习率为整体学习率2倍。 +* 基础卷积层中,affine_layers参数不更新,res2层参数不更新。 + +## 模型评估 + +模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估](http://cocodataset.org/#detections-eval) + +`eval_coco_map.py`是评估模块的主要执行程序,调用示例如下: + +- Faster RCNN + + ``` + python eval_coco_map.py \ + --dataset=coco2017 \ + --pretrained_model=${path_to_trained_model} \ + --MASK_ON=False + ``` + +- Mask RCNN + + ``` + python eval_coco_map.py \ + --dataset=coco2017 \ + --pretrained_model=${path_to_trained_model} \ + --MASK_ON=True + ``` + + - 通过设置`--pretrained_model=${path_to_trained_model}`指定训练好的模型,注意不是初始化的模型。 + - 通过设置`export CUDA\_VISIBLE\_DEVICES=0`指定单卡GPU评估。 + - 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 + +下表为模型评估结果: + +Faster RCNN + +| 模型 | RoI处理方式 | 批量大小 | 迭代次数 | mAP | +| :--------------- | :--------: | :------------: | :------------------: |------: | +| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 | +| [Fluid RoIPool no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_no_padding.tar.gz) | RoIPool | 8 | 180000 | 0.318 | +| [Fluid RoIAlign no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding.tar.gz) | RoIAlign | 8 | 180000 | 0.348 | +| [Fluid RoIAlign no padding 2x](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding_2x.tar.gz) | RoIAlign | 8 | 360000 | 0.367 | + + + +* Fluid RoIPool minibatch padding: 使用RoIPool,同一个batch内的图像填充为相同尺寸。该方法与detectron处理相同。 +* Fluid RoIPool no padding: 使用RoIPool,不对图像做填充处理。 +* Fluid RoIAlign no padding: 使用RoIAlign,不对图像做填充处理。 +* Fluid RoIAlign no padding 2x: 使用RoIAlign,不对图像做填充处理。训练360000轮,学习率在240000,320000轮衰减。 + +Mask RCNN: + +| 模型 | 批量大小 | 迭代次数 | box mAP | mask mAP | +| :--------------- | :--------: | :------------: | :--------: |------: | +| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 | + +* Fluid mask no padding: 使用RoIAlign,不对图像做填充处理 + +## 模型推断及可视化 + +模型推断可以获取图像中的物体及其对应的类别,`infer.py`是主要执行程序,调用示例如下: + +``` +python infer.py \ + --pretrained_model=${path_to_trained_model} \ + --image_path=dataset/coco/val2017/000000000139.jpg \ + --draw_threshold=0.6 +``` + +注意,请正确设置模型路径`${path_to_trained_model}`和预测图片路径。默认使用GPU设备,也可通过设置`--use_gpu=False`使用CPU设备。可通过设置`draw_threshold`调节得分阈值控制检测框的个数。 + +下图为模型可视化预测结果: +

+Faster RCNN 预测可视化

+Mask RCNN 预测可视化
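+
+作为补充,下面用一个最小示意说明`draw_threshold`的作用(仅用于说明概念,并非`infer.py`中的实现):得分低于该阈值的检测框会被过滤,不参与绘制。
+
+```python
+def filter_boxes(detections, draw_threshold=0.6):
+    """detections: [(xmin, ymin, xmax, ymax, score, class_id), ...]
+    仅保留得分不低于 draw_threshold 的检测框用于可视化。"""
+    return [det for det in detections if det[4] >= draw_threshold]
+
+
+# 用法示意:阈值越高,保留并绘制的检测框越少
+dets = [(10, 10, 50, 80, 0.92, 1), (20, 30, 60, 90, 0.41, 3)]
+print(filter_boxes(dets, draw_threshold=0.6))  # 只保留得分为 0.92 的框
+```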
diff --git a/fluid/PaddleCV/human_pose_estimation/utils/__init__.py b/PaddleCV/rcnn/__init__.py similarity index 100% rename from fluid/PaddleCV/human_pose_estimation/utils/__init__.py rename to PaddleCV/rcnn/__init__.py diff --git a/fluid/PaddleCV/rcnn/_ce.py b/PaddleCV/rcnn/_ce.py similarity index 100% rename from fluid/PaddleCV/rcnn/_ce.py rename to PaddleCV/rcnn/_ce.py diff --git a/fluid/PaddleCV/rcnn/box_utils.py b/PaddleCV/rcnn/box_utils.py similarity index 100% rename from fluid/PaddleCV/rcnn/box_utils.py rename to PaddleCV/rcnn/box_utils.py diff --git a/fluid/PaddleCV/rcnn/colormap.py b/PaddleCV/rcnn/colormap.py similarity index 100% rename from fluid/PaddleCV/rcnn/colormap.py rename to PaddleCV/rcnn/colormap.py diff --git a/fluid/PaddleCV/rcnn/config.py b/PaddleCV/rcnn/config.py similarity index 100% rename from fluid/PaddleCV/rcnn/config.py rename to PaddleCV/rcnn/config.py diff --git a/fluid/PaddleCV/rcnn/data_utils.py b/PaddleCV/rcnn/data_utils.py similarity index 100% rename from fluid/PaddleCV/rcnn/data_utils.py rename to PaddleCV/rcnn/data_utils.py diff --git a/fluid/PaddleCV/rcnn/dataset/coco/download.sh b/PaddleCV/rcnn/dataset/coco/download.sh similarity index 100% rename from fluid/PaddleCV/rcnn/dataset/coco/download.sh rename to PaddleCV/rcnn/dataset/coco/download.sh diff --git a/fluid/PaddleCV/rcnn/edict.py b/PaddleCV/rcnn/edict.py similarity index 100% rename from fluid/PaddleCV/rcnn/edict.py rename to PaddleCV/rcnn/edict.py diff --git a/fluid/PaddleCV/rcnn/eval_coco_map.py b/PaddleCV/rcnn/eval_coco_map.py similarity index 100% rename from fluid/PaddleCV/rcnn/eval_coco_map.py rename to PaddleCV/rcnn/eval_coco_map.py diff --git a/fluid/PaddleCV/rcnn/eval_helper.py b/PaddleCV/rcnn/eval_helper.py similarity index 100% rename from fluid/PaddleCV/rcnn/eval_helper.py rename to PaddleCV/rcnn/eval_helper.py diff --git a/fluid/PaddleCV/rcnn/image/000000000139.jpg b/PaddleCV/rcnn/image/000000000139.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000000139.jpg rename to PaddleCV/rcnn/image/000000000139.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000000139_mask.jpg b/PaddleCV/rcnn/image/000000000139_mask.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000000139_mask.jpg rename to PaddleCV/rcnn/image/000000000139_mask.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000127517.jpg b/PaddleCV/rcnn/image/000000127517.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000127517.jpg rename to PaddleCV/rcnn/image/000000127517.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000127517_mask.jpg b/PaddleCV/rcnn/image/000000127517_mask.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000127517_mask.jpg rename to PaddleCV/rcnn/image/000000127517_mask.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000203864.jpg b/PaddleCV/rcnn/image/000000203864.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000203864.jpg rename to PaddleCV/rcnn/image/000000203864.jpg diff --git a/fluid/PaddleCV/rcnn/image/000000515077.jpg b/PaddleCV/rcnn/image/000000515077.jpg similarity index 100% rename from fluid/PaddleCV/rcnn/image/000000515077.jpg rename to PaddleCV/rcnn/image/000000515077.jpg diff --git a/fluid/PaddleCV/rcnn/infer.py b/PaddleCV/rcnn/infer.py similarity index 100% rename from fluid/PaddleCV/rcnn/infer.py rename to PaddleCV/rcnn/infer.py diff --git a/fluid/PaddleCV/rcnn/learning_rate.py b/PaddleCV/rcnn/learning_rate.py similarity index 100% rename from fluid/PaddleCV/rcnn/learning_rate.py rename to 
PaddleCV/rcnn/learning_rate.py diff --git a/fluid/PaddleCV/image_classification/__init__.py b/PaddleCV/rcnn/models/__init__.py similarity index 100% rename from fluid/PaddleCV/image_classification/__init__.py rename to PaddleCV/rcnn/models/__init__.py diff --git a/fluid/PaddleCV/rcnn/models/model_builder.py b/PaddleCV/rcnn/models/model_builder.py similarity index 100% rename from fluid/PaddleCV/rcnn/models/model_builder.py rename to PaddleCV/rcnn/models/model_builder.py diff --git a/fluid/PaddleCV/rcnn/models/resnet.py b/PaddleCV/rcnn/models/resnet.py similarity index 100% rename from fluid/PaddleCV/rcnn/models/resnet.py rename to PaddleCV/rcnn/models/resnet.py diff --git a/fluid/PaddleCV/rcnn/pretrained/download.sh b/PaddleCV/rcnn/pretrained/download.sh similarity index 100% rename from fluid/PaddleCV/rcnn/pretrained/download.sh rename to PaddleCV/rcnn/pretrained/download.sh diff --git a/fluid/PaddleCV/rcnn/profile.py b/PaddleCV/rcnn/profile.py similarity index 100% rename from fluid/PaddleCV/rcnn/profile.py rename to PaddleCV/rcnn/profile.py diff --git a/fluid/PaddleCV/rcnn/reader.py b/PaddleCV/rcnn/reader.py similarity index 100% rename from fluid/PaddleCV/rcnn/reader.py rename to PaddleCV/rcnn/reader.py diff --git a/fluid/PaddleCV/rcnn/roidbs.py b/PaddleCV/rcnn/roidbs.py similarity index 100% rename from fluid/PaddleCV/rcnn/roidbs.py rename to PaddleCV/rcnn/roidbs.py diff --git a/fluid/PaddleCV/rcnn/scripts/eval.sh b/PaddleCV/rcnn/scripts/eval.sh similarity index 100% rename from fluid/PaddleCV/rcnn/scripts/eval.sh rename to PaddleCV/rcnn/scripts/eval.sh diff --git a/fluid/PaddleCV/rcnn/scripts/infer.sh b/PaddleCV/rcnn/scripts/infer.sh similarity index 100% rename from fluid/PaddleCV/rcnn/scripts/infer.sh rename to PaddleCV/rcnn/scripts/infer.sh diff --git a/fluid/PaddleCV/rcnn/scripts/train.sh b/PaddleCV/rcnn/scripts/train.sh similarity index 100% rename from fluid/PaddleCV/rcnn/scripts/train.sh rename to PaddleCV/rcnn/scripts/train.sh diff --git a/fluid/PaddleCV/rcnn/segm_utils.py b/PaddleCV/rcnn/segm_utils.py similarity index 100% rename from fluid/PaddleCV/rcnn/segm_utils.py rename to PaddleCV/rcnn/segm_utils.py diff --git a/fluid/PaddleCV/rcnn/train.py b/PaddleCV/rcnn/train.py similarity index 100% rename from fluid/PaddleCV/rcnn/train.py rename to PaddleCV/rcnn/train.py diff --git a/fluid/PaddleCV/rcnn/utility.py b/PaddleCV/rcnn/utility.py similarity index 100% rename from fluid/PaddleCV/rcnn/utility.py rename to PaddleCV/rcnn/utility.py diff --git a/fluid/PaddleCV/video/.gitignore b/PaddleCV/video/.gitignore similarity index 100% rename from fluid/PaddleCV/video/.gitignore rename to PaddleCV/video/.gitignore diff --git a/PaddleCV/video/README.md b/PaddleCV/video/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b6b6cdd2dd817268b2fe42f79da8e9e952f96f74 --- /dev/null +++ b/PaddleCV/video/README.md @@ -0,0 +1,130 @@ + +## 简介 +本教程期望给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型。目前包含视频分类模型,后续会不断的扩展到其他更多场景。 + +目前视频分类模型包括: + +| 模型 | 类别 | 描述 | +| :--------------- | :--------: | :------------: | +| [Attention Cluster](./models/attention_cluster/README.md) | 视频分类| CVPR'18提出的视频多模态特征注意力聚簇融合方法 | +| [Attention LSTM](./models/attention_lstm/README.md) | 视频分类| 常用模型,速度快精度高 | +| [NeXtVLAD](./models/nextvlad/README.md) | 视频分类| 2nd-Youtube-8M最优单模型 | +| [StNet](./models/stnet/README.md) | 视频分类| AAAI'19提出的视频联合时空建模方法 | +| [TSN](./models/tsn/README.md) | 视频分类| ECCV'16提出的基于2D-CNN经典解决方案 | + +### 主要特点 + +- 包含视频分类方向的多个主流领先模型,其中Attention LSTM,Attention 
Cluster和NeXtVLAD是比较流行的特征序列模型,TSN和StNet是两个End-to-End的视频分类模型。Attention LSTM模型速度快精度高,NeXtVLAD是2nd-Youtube-8M比赛中最好的单模型, TSN是基于2D-CNN的经典解决方案。Attention Cluster和StNet是百度自研模型,分别发表于CVPR2018和AAAI2019,是Kinetics600比赛第一名中使用到的模型。 + +- 提供了适合视频分类任务的通用骨架代码,用户可一键式高效配置模型完成训练和评测。 + +## 安装 + +在当前模型库运行样例代码需要PadddlePaddle Fluid v.1.2.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html)中的说明来更新PaddlePaddle。 + +## 数据准备 + +视频模型库使用Youtube-8M和Kinetics数据集, 具体使用方法请参考[数据说明](./dataset/README.md) + +## 快速使用 + +视频模型库提供通用的train/test/infer框架,通过`train.py/test.py/infer.py`指定模型名、模型配置参数等可一键式进行训练和预测。 + +以StNet模型为例: + +单卡训练: + +``` bash +export CUDA_VISIBLE_DEVICES=0 +python train.py --model-name=STNET + --config=./configs/stnet.txt + --save-dir=checkpoints +``` + +多卡训练: + +``` bash +export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 +python train.py --model-name=STNET + --config=./configs/stnet.txt + --save-dir=checkpoints +``` + +视频模型库同时提供了快速训练脚本,脚本位于`scripts/train`目录下,可通过如下命令启动训练: + +``` bash +bash scripts/train/train_stnet.sh +``` + +- 请根据`CUDA_VISIBLE_DEVICES`指定卡数修改`config`文件中的`num_gpus`和`batch_size`配置。 + +## 模型库结构 + +### 代码结构 + +``` +configs/ + stnet.txt + tsn.txt + ... +dataset/ + youtube/ + kinetics/ +datareader/ + feature_readeer.py + kinetics_reader.py + ... +metrics/ + kinetics/ + youtube8m/ + ... +models/ + stnet/ + tsn/ + ... +scripts/ + train/ + test/ +train.py +test.py +infer.py +``` + +- `configs`: 各模型配置文件模板 +- `datareader`: 提供Youtube-8M,Kinetics数据集reader +- `metrics`: Youtube-8,Kinetics数据集评估脚本 +- `models`: 各模型网络结构构建脚本 +- `scripts`: 各模型快速训练评估脚本 +- `train.py`: 一键式训练脚本,可通过指定模型名,配置文件等一键式启动训练 +- `test.py`: 一键式评估脚本,可通过指定模型名,配置文件,模型权重等一键式启动评估 +- `infer.py`: 一键式推断脚本,可通过指定模型名,配置文件,模型权重,待推断文件列表等一键式启动推断 + +## Model Zoo + +- 基于Youtube-8M数据集模型: + +| 模型 | Batch Size | 环境配置 | cuDNN版本 | GAP | 下载链接 | +| :-------: | :---: | :---------: | :-----: | :----: | :----------: | +| Attention Cluster | 2048 | 8卡P40 | 7.1 | 0.84 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) | +| Attention LSTM | 1024 | 8卡P40 | 7.1 | 0.86 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) | +| NeXtVLAD | 160 | 4卡P40 | 7.1 | 0.87 | [model](https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz) | + +- 基于Kinetics数据集模型: + +| 模型 | Batch Size | 环境配置 | cuDNN版本 | Top-1 | 下载链接 | +| :-------: | :---: | :---------: | :----: | :----: | :----------: | +| StNet | 128 | 8卡P40 | 5.1 | 0.69 | [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) | +| TSN | 256 | 8卡P40 | 7.1 | 0.67 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) | + +## 参考文献 + +- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen +- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici +- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan +- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, 
Xiao Liu, Yandong Li, Limin Wang, Shilei Wen +- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool + +## 版本更新 + +- 3/2019: 新增模型库,发布Attention Cluster,Attention LSTM,NeXtVLAD,StNet,TSN五个视频分类模型。 + diff --git a/fluid/PaddleCV/video/config.py b/PaddleCV/video/config.py similarity index 100% rename from fluid/PaddleCV/video/config.py rename to PaddleCV/video/config.py diff --git a/fluid/PaddleCV/video/configs/attention_cluster.txt b/PaddleCV/video/configs/attention_cluster.txt similarity index 100% rename from fluid/PaddleCV/video/configs/attention_cluster.txt rename to PaddleCV/video/configs/attention_cluster.txt diff --git a/fluid/PaddleCV/video/configs/attention_lstm.txt b/PaddleCV/video/configs/attention_lstm.txt similarity index 100% rename from fluid/PaddleCV/video/configs/attention_lstm.txt rename to PaddleCV/video/configs/attention_lstm.txt diff --git a/fluid/PaddleCV/video/configs/nextvlad.txt b/PaddleCV/video/configs/nextvlad.txt similarity index 100% rename from fluid/PaddleCV/video/configs/nextvlad.txt rename to PaddleCV/video/configs/nextvlad.txt diff --git a/fluid/PaddleCV/video/configs/stnet.txt b/PaddleCV/video/configs/stnet.txt similarity index 100% rename from fluid/PaddleCV/video/configs/stnet.txt rename to PaddleCV/video/configs/stnet.txt diff --git a/fluid/PaddleCV/video/configs/tsn.txt b/PaddleCV/video/configs/tsn.txt similarity index 100% rename from fluid/PaddleCV/video/configs/tsn.txt rename to PaddleCV/video/configs/tsn.txt diff --git a/fluid/PaddleCV/video/datareader/__init__.py b/PaddleCV/video/datareader/__init__.py similarity index 100% rename from fluid/PaddleCV/video/datareader/__init__.py rename to PaddleCV/video/datareader/__init__.py diff --git a/fluid/PaddleCV/video/datareader/feature_reader.py b/PaddleCV/video/datareader/feature_reader.py similarity index 100% rename from fluid/PaddleCV/video/datareader/feature_reader.py rename to PaddleCV/video/datareader/feature_reader.py diff --git a/fluid/PaddleCV/video/datareader/kinetics_reader.py b/PaddleCV/video/datareader/kinetics_reader.py similarity index 100% rename from fluid/PaddleCV/video/datareader/kinetics_reader.py rename to PaddleCV/video/datareader/kinetics_reader.py diff --git a/fluid/PaddleCV/video/datareader/nonlocal_reader.py b/PaddleCV/video/datareader/nonlocal_reader.py similarity index 100% rename from fluid/PaddleCV/video/datareader/nonlocal_reader.py rename to PaddleCV/video/datareader/nonlocal_reader.py diff --git a/fluid/PaddleCV/video/datareader/reader_utils.py b/PaddleCV/video/datareader/reader_utils.py similarity index 100% rename from fluid/PaddleCV/video/datareader/reader_utils.py rename to PaddleCV/video/datareader/reader_utils.py diff --git a/fluid/PaddleCV/video/dataset/README.md b/PaddleCV/video/dataset/README.md similarity index 100% rename from fluid/PaddleCV/video/dataset/README.md rename to PaddleCV/video/dataset/README.md diff --git a/fluid/PaddleCV/video/dataset/kinetics/generate_label.py b/PaddleCV/video/dataset/kinetics/generate_label.py similarity index 100% rename from fluid/PaddleCV/video/dataset/kinetics/generate_label.py rename to PaddleCV/video/dataset/kinetics/generate_label.py diff --git a/fluid/PaddleCV/video/dataset/kinetics/video2pkl.py b/PaddleCV/video/dataset/kinetics/video2pkl.py similarity index 100% rename from fluid/PaddleCV/video/dataset/kinetics/video2pkl.py rename to PaddleCV/video/dataset/kinetics/video2pkl.py 
diff --git a/fluid/PaddleCV/video/dataset/youtube8m/tf2pkl.py b/PaddleCV/video/dataset/youtube8m/tf2pkl.py similarity index 100% rename from fluid/PaddleCV/video/dataset/youtube8m/tf2pkl.py rename to PaddleCV/video/dataset/youtube8m/tf2pkl.py diff --git a/fluid/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy b/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy similarity index 100% rename from fluid/PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy rename to PaddleCV/video/dataset/youtube8m/yt8m_pca/eigenvals.npy diff --git a/fluid/PaddleCV/video/images/StNet.png b/PaddleCV/video/images/StNet.png similarity index 100% rename from fluid/PaddleCV/video/images/StNet.png rename to PaddleCV/video/images/StNet.png diff --git a/fluid/PaddleCV/video/images/attention_cluster.png b/PaddleCV/video/images/attention_cluster.png similarity index 100% rename from fluid/PaddleCV/video/images/attention_cluster.png rename to PaddleCV/video/images/attention_cluster.png diff --git a/fluid/PaddleCV/video/infer.py b/PaddleCV/video/infer.py similarity index 100% rename from fluid/PaddleCV/video/infer.py rename to PaddleCV/video/infer.py diff --git a/fluid/PaddleCV/video/metrics/__init__.py b/PaddleCV/video/metrics/__init__.py similarity index 100% rename from fluid/PaddleCV/video/metrics/__init__.py rename to PaddleCV/video/metrics/__init__.py diff --git a/fluid/PaddleCV/image_classification/dist_train/__init__.py b/PaddleCV/video/metrics/kinetics/__init__.py similarity index 100% rename from fluid/PaddleCV/image_classification/dist_train/__init__.py rename to PaddleCV/video/metrics/kinetics/__init__.py diff --git a/fluid/PaddleCV/video/metrics/kinetics/accuracy_metrics.py b/PaddleCV/video/metrics/kinetics/accuracy_metrics.py similarity index 100% rename from fluid/PaddleCV/video/metrics/kinetics/accuracy_metrics.py rename to PaddleCV/video/metrics/kinetics/accuracy_metrics.py diff --git a/fluid/PaddleCV/video/metrics/metrics_util.py b/PaddleCV/video/metrics/metrics_util.py similarity index 100% rename from fluid/PaddleCV/video/metrics/metrics_util.py rename to PaddleCV/video/metrics/metrics_util.py diff --git a/fluid/PaddleCV/rcnn/__init__.py b/PaddleCV/video/metrics/multicrop_test/__init__.py similarity index 100% rename from fluid/PaddleCV/rcnn/__init__.py rename to PaddleCV/video/metrics/multicrop_test/__init__.py diff --git a/fluid/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py b/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py similarity index 100% rename from fluid/PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py rename to PaddleCV/video/metrics/multicrop_test/multicrop_test_metrics.py diff --git a/fluid/PaddleCV/rcnn/models/__init__.py b/PaddleCV/video/metrics/youtube8m/__init__.py similarity index 100% rename from fluid/PaddleCV/rcnn/models/__init__.py rename to PaddleCV/video/metrics/youtube8m/__init__.py diff --git a/fluid/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py b/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py similarity index 100% rename from fluid/PaddleCV/video/metrics/youtube8m/average_precision_calculator.py rename to PaddleCV/video/metrics/youtube8m/average_precision_calculator.py diff --git a/fluid/PaddleCV/video/metrics/youtube8m/eval_util.py b/PaddleCV/video/metrics/youtube8m/eval_util.py similarity index 100% rename from fluid/PaddleCV/video/metrics/youtube8m/eval_util.py rename to PaddleCV/video/metrics/youtube8m/eval_util.py diff --git 
a/fluid/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py b/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py similarity index 100% rename from fluid/PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py rename to PaddleCV/video/metrics/youtube8m/mean_average_precision_calculator.py diff --git a/fluid/PaddleCV/video/models/__init__.py b/PaddleCV/video/models/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/__init__.py rename to PaddleCV/video/models/__init__.py diff --git a/fluid/PaddleCV/video/models/attention_cluster/README.md b/PaddleCV/video/models/attention_cluster/README.md similarity index 100% rename from fluid/PaddleCV/video/models/attention_cluster/README.md rename to PaddleCV/video/models/attention_cluster/README.md diff --git a/fluid/PaddleCV/video/models/attention_cluster/__init__.py b/PaddleCV/video/models/attention_cluster/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_cluster/__init__.py rename to PaddleCV/video/models/attention_cluster/__init__.py diff --git a/fluid/PaddleCV/video/models/attention_cluster/attention_cluster.py b/PaddleCV/video/models/attention_cluster/attention_cluster.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_cluster/attention_cluster.py rename to PaddleCV/video/models/attention_cluster/attention_cluster.py diff --git a/fluid/PaddleCV/video/models/attention_cluster/logistic_model.py b/PaddleCV/video/models/attention_cluster/logistic_model.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_cluster/logistic_model.py rename to PaddleCV/video/models/attention_cluster/logistic_model.py diff --git a/fluid/PaddleCV/video/models/attention_cluster/shifting_attention.py b/PaddleCV/video/models/attention_cluster/shifting_attention.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_cluster/shifting_attention.py rename to PaddleCV/video/models/attention_cluster/shifting_attention.py diff --git a/fluid/PaddleCV/video/models/attention_lstm/README.md b/PaddleCV/video/models/attention_lstm/README.md similarity index 100% rename from fluid/PaddleCV/video/models/attention_lstm/README.md rename to PaddleCV/video/models/attention_lstm/README.md diff --git a/fluid/PaddleCV/video/models/attention_lstm/__init__.py b/PaddleCV/video/models/attention_lstm/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_lstm/__init__.py rename to PaddleCV/video/models/attention_lstm/__init__.py diff --git a/fluid/PaddleCV/video/models/attention_lstm/attention_lstm.py b/PaddleCV/video/models/attention_lstm/attention_lstm.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_lstm/attention_lstm.py rename to PaddleCV/video/models/attention_lstm/attention_lstm.py diff --git a/fluid/PaddleCV/video/models/attention_lstm/lstm_attention.py b/PaddleCV/video/models/attention_lstm/lstm_attention.py similarity index 100% rename from fluid/PaddleCV/video/models/attention_lstm/lstm_attention.py rename to PaddleCV/video/models/attention_lstm/lstm_attention.py diff --git a/fluid/PaddleCV/video/models/model.py b/PaddleCV/video/models/model.py similarity index 100% rename from fluid/PaddleCV/video/models/model.py rename to PaddleCV/video/models/model.py diff --git a/fluid/PaddleCV/video/models/nextvlad/README.md b/PaddleCV/video/models/nextvlad/README.md similarity index 100% rename from fluid/PaddleCV/video/models/nextvlad/README.md rename to 
PaddleCV/video/models/nextvlad/README.md diff --git a/fluid/PaddleCV/video/models/nextvlad/__init__.py b/PaddleCV/video/models/nextvlad/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/nextvlad/__init__.py rename to PaddleCV/video/models/nextvlad/__init__.py diff --git a/fluid/PaddleCV/video/models/nextvlad/clf_model.py b/PaddleCV/video/models/nextvlad/clf_model.py similarity index 100% rename from fluid/PaddleCV/video/models/nextvlad/clf_model.py rename to PaddleCV/video/models/nextvlad/clf_model.py diff --git a/fluid/PaddleCV/video/models/nextvlad/nextvlad.py b/PaddleCV/video/models/nextvlad/nextvlad.py similarity index 100% rename from fluid/PaddleCV/video/models/nextvlad/nextvlad.py rename to PaddleCV/video/models/nextvlad/nextvlad.py diff --git a/fluid/PaddleCV/video/models/nextvlad/nextvlad_model.py b/PaddleCV/video/models/nextvlad/nextvlad_model.py similarity index 100% rename from fluid/PaddleCV/video/models/nextvlad/nextvlad_model.py rename to PaddleCV/video/models/nextvlad/nextvlad_model.py diff --git a/fluid/PaddleCV/video/models/stnet/README.md b/PaddleCV/video/models/stnet/README.md similarity index 100% rename from fluid/PaddleCV/video/models/stnet/README.md rename to PaddleCV/video/models/stnet/README.md diff --git a/fluid/PaddleCV/video/models/stnet/__init__.py b/PaddleCV/video/models/stnet/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/stnet/__init__.py rename to PaddleCV/video/models/stnet/__init__.py diff --git a/fluid/PaddleCV/video/models/stnet/stnet.py b/PaddleCV/video/models/stnet/stnet.py similarity index 100% rename from fluid/PaddleCV/video/models/stnet/stnet.py rename to PaddleCV/video/models/stnet/stnet.py diff --git a/fluid/PaddleCV/video/models/stnet/stnet_res_model.py b/PaddleCV/video/models/stnet/stnet_res_model.py similarity index 100% rename from fluid/PaddleCV/video/models/stnet/stnet_res_model.py rename to PaddleCV/video/models/stnet/stnet_res_model.py diff --git a/fluid/PaddleCV/video/models/tsn/README.md b/PaddleCV/video/models/tsn/README.md similarity index 100% rename from fluid/PaddleCV/video/models/tsn/README.md rename to PaddleCV/video/models/tsn/README.md diff --git a/fluid/PaddleCV/video/models/tsn/__init__.py b/PaddleCV/video/models/tsn/__init__.py similarity index 100% rename from fluid/PaddleCV/video/models/tsn/__init__.py rename to PaddleCV/video/models/tsn/__init__.py diff --git a/fluid/PaddleCV/video/models/tsn/tsn.py b/PaddleCV/video/models/tsn/tsn.py similarity index 100% rename from fluid/PaddleCV/video/models/tsn/tsn.py rename to PaddleCV/video/models/tsn/tsn.py diff --git a/fluid/PaddleCV/video/models/tsn/tsn_res_model.py b/PaddleCV/video/models/tsn/tsn_res_model.py similarity index 100% rename from fluid/PaddleCV/video/models/tsn/tsn_res_model.py rename to PaddleCV/video/models/tsn/tsn_res_model.py diff --git a/fluid/PaddleCV/video/models/utils.py b/PaddleCV/video/models/utils.py similarity index 100% rename from fluid/PaddleCV/video/models/utils.py rename to PaddleCV/video/models/utils.py diff --git a/fluid/PaddleCV/video/scripts/infer/infer_attention_cluster.sh b/PaddleCV/video/scripts/infer/infer_attention_cluster.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/infer/infer_attention_cluster.sh rename to PaddleCV/video/scripts/infer/infer_attention_cluster.sh diff --git a/fluid/PaddleCV/video/scripts/infer/infer_attention_lstm.sh b/PaddleCV/video/scripts/infer/infer_attention_lstm.sh similarity index 100% rename from 
fluid/PaddleCV/video/scripts/infer/infer_attention_lstm.sh rename to PaddleCV/video/scripts/infer/infer_attention_lstm.sh diff --git a/fluid/PaddleCV/video/scripts/infer/infer_nextvlad.sh b/PaddleCV/video/scripts/infer/infer_nextvlad.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/infer/infer_nextvlad.sh rename to PaddleCV/video/scripts/infer/infer_nextvlad.sh diff --git a/fluid/PaddleCV/video/scripts/infer/infer_stnet.sh b/PaddleCV/video/scripts/infer/infer_stnet.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/infer/infer_stnet.sh rename to PaddleCV/video/scripts/infer/infer_stnet.sh diff --git a/fluid/PaddleCV/video/scripts/infer/infer_tsn.sh b/PaddleCV/video/scripts/infer/infer_tsn.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/infer/infer_tsn.sh rename to PaddleCV/video/scripts/infer/infer_tsn.sh diff --git a/fluid/PaddleCV/video/scripts/test/test_attention_cluster.sh b/PaddleCV/video/scripts/test/test_attention_cluster.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/test/test_attention_cluster.sh rename to PaddleCV/video/scripts/test/test_attention_cluster.sh diff --git a/fluid/PaddleCV/video/scripts/test/test_attention_lstm.sh b/PaddleCV/video/scripts/test/test_attention_lstm.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/test/test_attention_lstm.sh rename to PaddleCV/video/scripts/test/test_attention_lstm.sh diff --git a/fluid/PaddleCV/video/scripts/test/test_nextvlad.sh b/PaddleCV/video/scripts/test/test_nextvlad.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/test/test_nextvlad.sh rename to PaddleCV/video/scripts/test/test_nextvlad.sh diff --git a/fluid/PaddleCV/video/scripts/test/test_stnet.sh b/PaddleCV/video/scripts/test/test_stnet.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/test/test_stnet.sh rename to PaddleCV/video/scripts/test/test_stnet.sh diff --git a/fluid/PaddleCV/video/scripts/test/test_tsn.sh b/PaddleCV/video/scripts/test/test_tsn.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/test/test_tsn.sh rename to PaddleCV/video/scripts/test/test_tsn.sh diff --git a/fluid/PaddleCV/video/scripts/train/train_attention_cluster.sh b/PaddleCV/video/scripts/train/train_attention_cluster.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/train/train_attention_cluster.sh rename to PaddleCV/video/scripts/train/train_attention_cluster.sh diff --git a/fluid/PaddleCV/video/scripts/train/train_attention_lstm.sh b/PaddleCV/video/scripts/train/train_attention_lstm.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/train/train_attention_lstm.sh rename to PaddleCV/video/scripts/train/train_attention_lstm.sh diff --git a/fluid/PaddleCV/video/scripts/train/train_nextvlad.sh b/PaddleCV/video/scripts/train/train_nextvlad.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/train/train_nextvlad.sh rename to PaddleCV/video/scripts/train/train_nextvlad.sh diff --git a/fluid/PaddleCV/video/scripts/train/train_stnet.sh b/PaddleCV/video/scripts/train/train_stnet.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/train/train_stnet.sh rename to PaddleCV/video/scripts/train/train_stnet.sh diff --git a/fluid/PaddleCV/video/scripts/train/train_tsn.sh b/PaddleCV/video/scripts/train/train_tsn.sh similarity index 100% rename from fluid/PaddleCV/video/scripts/train/train_tsn.sh rename to PaddleCV/video/scripts/train/train_tsn.sh diff --git a/fluid/PaddleCV/video/test.py b/PaddleCV/video/test.py similarity 
index 100% rename from fluid/PaddleCV/video/test.py rename to PaddleCV/video/test.py diff --git a/fluid/PaddleCV/video/metrics/kinetics/__init__.py b/PaddleCV/video/tools/__init__.py similarity index 100% rename from fluid/PaddleCV/video/metrics/kinetics/__init__.py rename to PaddleCV/video/tools/__init__.py diff --git a/fluid/PaddleCV/video/tools/train_utils.py b/PaddleCV/video/tools/train_utils.py similarity index 100% rename from fluid/PaddleCV/video/tools/train_utils.py rename to PaddleCV/video/tools/train_utils.py diff --git a/fluid/PaddleCV/video/train.py b/PaddleCV/video/train.py similarity index 100% rename from fluid/PaddleCV/video/train.py rename to PaddleCV/video/train.py diff --git a/fluid/PaddleCV/video/utils.py b/PaddleCV/video/utils.py similarity index 100% rename from fluid/PaddleCV/video/utils.py rename to PaddleCV/video/utils.py diff --git a/PaddleCV/video_classification/README.md b/PaddleCV/video_classification/README.md new file mode 100644 index 0000000000000000000000000000000000000000..822c3ccf64cb1c5567e574425229974524a34471 --- /dev/null +++ b/PaddleCV/video_classification/README.md @@ -0,0 +1,140 @@ +# Video Classification Based on Temporal Segment Network + +Video classification has drawn a significant amount of attentions in the past few years. This page introduces how to perform video classification with PaddlePaddle Fluid, on the public UCF-101 dataset, based on the state-of-the-art Temporal Segment Network (TSN) method. + +______________________________________________________________________________ + +## Table of Contents +
+- [Installation](#installation)
+- [Data preparation](#data-preparation)
+- [Training](#training)
+- [Evaluation](#evaluation)
+- [Inference](#inference)
+- [Performance](#performance)
+
+### Installation
+Running sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the installation document and make an update.
+
+### Data preparation
+
+#### Download the UCF-101 dataset
+Users can download the UCF-101 dataset with the provided script data/download.sh.
+
+#### Decode videos into frames
+To avoid decoding videos during network training, we decode them into frames offline and save them in the pickle format, which is easily readable in Python.
+
+Users can refer to the script data/video_decode.py for video decoding.
+
+#### Split data into train and test sets
+We follow split 1 of the UCF-101 dataset. After splitting, users get 9537 videos for training and 3783 videos for validation. The reference script is data/split_data.py.
+
+#### Save pickles for training
+As stated above, we save all data in the pickle format for training. All information of each video is saved into one pickle, including the video id, frame binaries and label. Please refer to the script data/generate_train_data.py.
+After this operation, one gets two directories containing training and testing data in pickle format, and two files, train.list and test.list, with fields on each line separated by spaces.
+
+### Training
+After data preparation, users can start PaddlePaddle Fluid training by:
+```
+python train.py \
+    --batch_size=128 \
+    --total_videos=9537 \
+    --class_dim=101 \
+    --num_epochs=60 \
+    --image_shape=3,224,224 \
+    --model_save_dir=output/ \
+    --with_mem_opt=True \
+    --lr_init=0.01 \
+    --num_layers=50 \
+    --seg_num=7 \
+    --pretrained_model={path_to_pretrained_model}
+```
+
+parameter introduction:
+- **batch_size**: the size of each mini-batch.
+- **total_videos**: the total number of videos in the training set.
+- **class_dim**: the number of classes in the classification task.
+- **num_epochs**: the number of training epochs.
+- **image_shape**: the input size of the network.
+- **model_save_dir**: the directory for saving the trained model.
+- **with_mem_opt**: whether to use memory optimization.
+- **lr_init**: the initial learning rate.
+- **num_layers**: the number of layers of the ResNet backbone.
+- **seg_num**: the number of segments in TSN; see the sampling sketch below.
+- **pretrained_model**: the path of the pretrained model.
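+
+To illustrate what `seg_num` controls, below is a minimal sketch of TSN-style segment sampling, which picks one frame per segment (names are illustrative and num_frames >= seg_num is assumed; this is not the `reader.py` implementation):
+
+```python
+import random
+
+
+def sample_segments(num_frames, seg_num=7, training=True):
+    """Split a video into seg_num equal segments and pick one frame index
+    per segment: a random one when training, the center one otherwise."""
+    assert num_frames >= seg_num
+    seg_len = num_frames // seg_num
+    indices = []
+    for i in range(seg_num):
+        start = i * seg_len
+        offset = random.randrange(seg_len) if training else seg_len // 2
+        indices.append(start + offset)
+    return indices
+
+
+# A 70-frame video with seg_num=7 samples one index per 10-frame segment:
+print(sample_segments(70, seg_num=7, training=False))  # [5, 15, 25, 35, 45, 55, 65]
+```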
    + +data reader introduction: +Data reader is defined in reader.py. Note that we use group operation for all frames in one video. + + +training: +The training log is like: +``` +[TRAIN] Pass: 0 trainbatch: 0 loss: 4.630959 acc1: 0.0 acc5: 0.0390625 time: 3.09 sec +[TRAIN] Pass: 0 trainbatch: 10 loss: 4.559069 acc1: 0.0546875 acc5: 0.1171875 time: 3.91 sec +[TRAIN] Pass: 0 trainbatch: 20 loss: 4.040092 acc1: 0.09375 acc5: 0.3515625 time: 3.88 sec +[TRAIN] Pass: 0 trainbatch: 30 loss: 3.478214 acc1: 0.3203125 acc5: 0.5546875 time: 3.32 sec +[TRAIN] Pass: 0 trainbatch: 40 loss: 3.005404 acc1: 0.3515625 acc5: 0.6796875 time: 3.33 sec +[TRAIN] Pass: 0 trainbatch: 50 loss: 2.585245 acc1: 0.4609375 acc5: 0.7265625 time: 3.13 sec +[TRAIN] Pass: 0 trainbatch: 60 loss: 2.151489 acc1: 0.4921875 acc5: 0.8203125 time: 3.35 sec +[TRAIN] Pass: 0 trainbatch: 70 loss: 1.981680 acc1: 0.578125 acc5: 0.8359375 time: 3.30 sec +``` + +### Evaluation +Evaluation is to evaluate the performance of a trained model. One can download pretrained models and set its path to path_to_pretrain_model. Then top1/top5 accuracy can be obtained by running the following command: +``` +python eval.py \ + --batch_size=128 \ + --class_dim=101 \ + --image_shape=3,224,224 \ + --with_mem_opt=True \ + --num_layers=50 \ + --seg_num=7 \ + --test_model={path_to_pretrained_model} +``` + +According to the congfiguration of evaluation, the output log is like: +``` +[TEST] Pass: 0 testbatch: 0 loss: 0.011551 acc1: 1.0 acc5: 1.0 time: 0.48 sec +[TEST] Pass: 0 testbatch: 10 loss: 0.710330 acc1: 0.75 acc5: 1.0 time: 0.49 sec +[TEST] Pass: 0 testbatch: 20 loss: 0.000547 acc1: 1.0 acc5: 1.0 time: 0.48 sec +[TEST] Pass: 0 testbatch: 30 loss: 0.036623 acc1: 1.0 acc5: 1.0 time: 0.48 sec +[TEST] Pass: 0 testbatch: 40 loss: 0.138705 acc1: 1.0 acc5: 1.0 time: 0.48 sec +[TEST] Pass: 0 testbatch: 50 loss: 0.056909 acc1: 1.0 acc5: 1.0 time: 0.49 sec +[TEST] Pass: 0 testbatch: 60 loss: 0.742937 acc1: 0.75 acc5: 1.0 time: 0.49 sec +[TEST] Pass: 0 testbatch: 70 loss: 1.720186 acc1: 0.5 acc5: 0.875 time: 0.48 sec +[TEST] Pass: 0 testbatch: 80 loss: 0.199669 acc1: 0.875 acc5: 1.0 time: 0.48 sec +[TEST] Pass: 0 testbatch: 90 loss: 0.195510 acc1: 1.0 acc5: 1.0 time: 0.48 sec +``` + +### Inference +Inference is used to get prediction score or video features based on trained models. +``` +python infer.py \ + --class_dim=101 \ + --image_shape=3,224,224 \ + --with_mem_opt=True \ + --num_layers=50 \ + --seg_num=7 \ + --test_model={path_to_pretrained_model} +``` + +The output contains predication results, including maximum score (before softmax) and corresponding predicted label. 
+``` +Test sample: PlayingGuitar_g01_c03, score: [21.418629], class [62] +Test sample: SalsaSpin_g05_c06, score: [13.238657], class [76] +Test sample: TrampolineJumping_g04_c01, score: [21.722862], class [93] +Test sample: JavelinThrow_g01_c04, score: [16.27892], class [44] +Test sample: PlayingTabla_g01_c01, score: [15.366951], class [65] +Test sample: ParallelBars_g04_c07, score: [18.42596], class [56] +Test sample: PlayingCello_g05_c05, score: [18.795723], class [58] +Test sample: LongJump_g03_c04, score: [7.100088], class [50] +Test sample: SkyDiving_g06_c03, score: [15.144707], class [82] +Test sample: UnevenBars_g07_c04, score: [22.114838], class [95] +``` + +### Performance +Configuration | Top-1 acc +------------- | ---------------: +seg=7, size=224 | 0.859 +seg=10, size=224 | 0.863 diff --git a/fluid/PaddleCV/video_classification/data/download.sh b/PaddleCV/video_classification/data/download.sh similarity index 100% rename from fluid/PaddleCV/video_classification/data/download.sh rename to PaddleCV/video_classification/data/download.sh diff --git a/fluid/PaddleCV/video_classification/data/generate_train_data.py b/PaddleCV/video_classification/data/generate_train_data.py similarity index 100% rename from fluid/PaddleCV/video_classification/data/generate_train_data.py rename to PaddleCV/video_classification/data/generate_train_data.py diff --git a/fluid/PaddleCV/video_classification/data/split_data.py b/PaddleCV/video_classification/data/split_data.py similarity index 100% rename from fluid/PaddleCV/video_classification/data/split_data.py rename to PaddleCV/video_classification/data/split_data.py diff --git a/fluid/PaddleCV/video_classification/data/video_decode.py b/PaddleCV/video_classification/data/video_decode.py similarity index 100% rename from fluid/PaddleCV/video_classification/data/video_decode.py rename to PaddleCV/video_classification/data/video_decode.py diff --git a/fluid/PaddleCV/video_classification/eval.py b/PaddleCV/video_classification/eval.py similarity index 100% rename from fluid/PaddleCV/video_classification/eval.py rename to PaddleCV/video_classification/eval.py diff --git a/fluid/PaddleCV/video_classification/infer.py b/PaddleCV/video_classification/infer.py similarity index 100% rename from fluid/PaddleCV/video_classification/infer.py rename to PaddleCV/video_classification/infer.py diff --git a/fluid/PaddleCV/video_classification/reader.py b/PaddleCV/video_classification/reader.py similarity index 100% rename from fluid/PaddleCV/video_classification/reader.py rename to PaddleCV/video_classification/reader.py diff --git a/fluid/PaddleCV/video_classification/resnet.py b/PaddleCV/video_classification/resnet.py similarity index 100% rename from fluid/PaddleCV/video_classification/resnet.py rename to PaddleCV/video_classification/resnet.py diff --git a/fluid/PaddleCV/video_classification/train.py b/PaddleCV/video_classification/train.py similarity index 100% rename from fluid/PaddleCV/video_classification/train.py rename to PaddleCV/video_classification/train.py diff --git a/fluid/PaddleCV/video_classification/utility.py b/PaddleCV/video_classification/utility.py similarity index 100% rename from fluid/PaddleCV/video_classification/utility.py rename to PaddleCV/video_classification/utility.py diff --git a/fluid/PaddleCV/yolov3/.gitignore b/PaddleCV/yolov3/.gitignore similarity index 100% rename from fluid/PaddleCV/yolov3/.gitignore rename to PaddleCV/yolov3/.gitignore diff --git a/PaddleCV/yolov3/README.md b/PaddleCV/yolov3/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..8b37aded4c21d7e9f50f1d79965bce4bb567b15c --- /dev/null +++ b/PaddleCV/yolov3/README.md @@ -0,0 +1,152 @@
+# YOLO V3 Object Detection
+
+---
+## Table of Contents
+
+- [Installation](#installation)
+- [Introduction](#introduction)
+- [Data preparation](#data-preparation)
+- [Training](#training)
+- [Evaluation](#evaluation)
+- [Inference and Visualization](#inference-and-visualization)
+- [Appendix](#appendix)
+
+## Installation
+
+Running the sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle) and make an update.
+
+## Introduction
+
+[YOLOv3](https://arxiv.org/abs/1804.02767) is a one-stage, end-to-end object detector. The detection principle of YOLOv3 is as follows:

+[Figure: YOLOv3 detection principle]

+
+YOLOv3 divides the input image into an S\*S grid and predicts B bounding boxes in each grid cell. Each box prediction includes the location (x, y, w, h), a confidence score and the probabilities of C classes, so the YOLOv3 output layer has S\*S\*B\*(5 + C) channels. The YOLOv3 loss consists of three parts: location loss, confidence loss and classification loss.
+The backbone network of YOLOv3 is DarkNet53; the structure of YOLOv3 is as follows:

+[Figure: YOLOv3 structure]
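+As a quick numeric check of the output-layer arithmetic above (B boxes per grid cell, each carrying 4 coordinates, 1 confidence score and C class probabilities), here is a small standalone Python sketch using the COCO settings quoted in this README; it is illustrative only:
+
+```python
+# Output channels per detection scale: B * (4 + 1 + C).
+B, C = 3, 80  # boxes per grid cell and number of COCO classes
+
+channels = B * (4 + 1 + C)
+print(channels)  # 255, the kernel count of the last convolutional layer
+
+# The three detection scales described below produce these output shapes:
+for s in (13, 26, 52):
+    print("%dx%d feature map -> output tensor %dx%dx%d" % (s, s, s, s, channels))
+```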

+
+YOLOv3 networks are composed of a base feature extraction network, multi-scale feature fusion layers, and output layers.
+
+1. Feature extraction network: YOLOv3 uses [DarkNet53](https://arxiv.org/abs/1612.08242) for feature extraction. DarkNet53 uses a fully convolutional structure, replacing pooling layers with stride-2 convolutions, and adds residual blocks to avoid vanishing gradients when the network becomes very deep.
+
+2. Feature fusion layer: To address the insensitivity of previous YOLO versions to small objects, YOLOv3 uses feature maps at three different scales, 13\*13, 26\*26 and 52\*52, for detecting large, medium and small objects, respectively. The feature fusion layer takes the three feature maps produced by DarkNet53 as input and, drawing on the idea of FPN (feature pyramid networks), fuses the feature maps across scales through a series of convolutional layers and upsampling.
+
+3. Output layer: The output layer also uses a fully convolutional structure. The last convolutional layer has 255 kernels: 3\*(80+4+1)=255, where 3 is the number of bounding boxes predicted per grid cell, 4 is the number of box coordinates, 1 is the confidence score, and 80 is the number of categories in the COCO dataset.
+
+## Data preparation
+
+Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as below:
+
+    cd dataset/coco
+    ./download.sh
+
+
+## Training
+
+After data preparation, one can start training by:
+
+    python train.py \
+       --model_save_dir=output/ \
+       --pretrain=${path_to_pretrain_model} \
+       --data_dir=${path_to_data}
+
+- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training.
+- For more help on arguments:
+
+    python train.py --help
+
+**Download the pre-trained model:** This sample provides a DarkNet53 pre-trained model, converted from the DarkNet53 weights pre-trained on ImageNet released by the author. One can download the pre-trained model with:
+
+    sh ./weights/download.sh
+
+Set `pretrain` to load the pre-trained model. This parameter is also used to load a trained model when finetuning.
+Please make sure that the pre-trained model is downloaded and loaded correctly, otherwise the loss may be NaN during training.
+
+**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**
+
+To train the model, the [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install it as follows:
+
+    git clone https://github.com/cocodataset/cocoapi.git
+    cd cocoapi/PythonAPI
+    # if cython is not installed
+    pip install Cython
+    # Install into global site-packages
+    make install
+    # Alternatively, if you do not have permissions or prefer
+    # not to install the COCO API into global site-packages
+    python2 setup.py install --user
+
+**Data reader introduction:**
+
+* The data reader is defined in `reader.py`.
+
+**Model configuration:**
+
+* The model uses 9 anchors generated from the COCO dataset: 10x13, 16x30, 33x23, 30x61, 62x45, 59x119, 116x90, 156x198, 373x326.
+
+* NMS settings: score threshold (valid) = 0.005, NMS threshold = 0.45, nms_topk = 400, nms_posk = 100 (a minimal sketch of this NMS step follows the visualization examples at the end of this README).
+
+**Training strategy:**
+
+* Use the momentum optimizer with momentum=0.9.
+
+* In the first 4000 iterations, the learning rate increases linearly from 0.0 to 0.001. It is then decayed at iterations 400000 and 450000 with multipliers 0.1 and 0.01, respectively; training runs for 500000 iterations in total (a sketch of this schedule follows the training loss curve below).
+
+The training result is shown below:

+[Figure: Train Loss]
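+The warmup-plus-piecewise-decay schedule described in the training strategy above can be sketched as follows. This is a standalone illustration of the arithmetic, not the repository's `learning_rate.py` implementation:
+
+```python
+def learning_rate(it, base_lr=0.001, warmup=4000,
+                  bounds=(400000, 450000), multipliers=(0.1, 0.01)):
+    # Linear warmup from 0 to base_lr over the first 4000 iterations,
+    # then decay by 0.1 at iteration 400000 and by 0.01 at 450000.
+    if it < warmup:
+        return base_lr * it / warmup
+    if it < bounds[0]:
+        return base_lr
+    return base_lr * (multipliers[0] if it < bounds[1] else multipliers[1])
+
+for it in (0, 2000, 4000, 100000, 400000, 450000):
+    print(it, learning_rate(it))
+```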

+
+## Evaluation
+
+Evaluation measures the performance of a trained model. This sample provides `eval.py`, which uses the COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval).
+
+`eval.py` is the main executor for evaluation; one can start evaluation by:
+
+    python eval.py \
+        --dataset=coco2017 \
+        --weights=${path_to_weights}
+
+- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation.
+
+Evaluation results are shown below:
+
+| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
+| :------: | :------: | :------: | :------: |
+| 608x608 | 37.7 | 59.8 | 40.8 |
+| 416x416 | 36.5 | 58.2 | 39.1 |
+| 320x320 | 34.1 | 55.4 | 36.3 |
+
+## Inference and Visualization
+
+Inference is used to get prediction scores or image features from trained models. `infer.py` is the main executor for inference; one can start inference by:
+
+    python infer.py \
+        --dataset=coco2017 \
+        --weights=${path_to_weights} \
+        --image_path=data/COCO17/val2017/ \
+        --image_name=000000000139.jpg \
+        --draw_threshold=0.5
+
+Inference speed:
+
+| input size | 608x608 | 416x416 | 320x320 |
+|:-------------:| :-----: | :-----: | :-----: |
+| infer speed | 50 ms/frame | 29 ms/frame | 24 ms/frame |
+
+The visualization of inference results is shown below:

+[Figure: YOLOv3 Visualization Examples]
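+The NMS step listed in the model configuration above (score threshold 0.005, NMS threshold 0.45, nms_topk=400, nms_posk=100) works as in the following pure-Python sketch. It is illustrative only; the model applies NMS inside the Paddle network:
+
+```python
+def iou(a, b):
+    # a and b are boxes given as (x1, y1, x2, y2)
+    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
+    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
+    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
+    area_a = (a[2] - a[0]) * (a[3] - a[1])
+    area_b = (b[2] - b[0]) * (b[3] - b[1])
+    return inter / (area_a + area_b - inter + 1e-9)
+
+def nms(boxes, scores, score_thresh=0.005, nms_thresh=0.45, topk=400, posk=100):
+    # Keep the topk highest-scoring boxes above score_thresh, then greedily
+    # drop any box whose IoU with an already-kept box exceeds nms_thresh.
+    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
+                   key=lambda i: scores[i], reverse=True)[:topk]
+    keep = []
+    for i in order:
+        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
+            keep.append(i)
+            if len(keep) == posk:
+                break
+    return keep
+
+boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
+scores = [0.9, 0.8, 0.7]
+print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
+```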

+
diff --git a/PaddleCV/yolov3/README_cn.md b/PaddleCV/yolov3/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..16f7c452a6c5346e1a40469a675afeb1a1961401 --- /dev/null +++ b/PaddleCV/yolov3/README_cn.md @@ -0,0 +1,154 @@
+# YOLO V3 目标检测
+
+---
+## 内容
+
+- [安装](#安装)
+- [简介](#简介)
+- [数据准备](#数据准备)
+- [模型训练](#模型训练)
+- [模型评估](#模型评估)
+- [模型推断及可视化](#模型推断及可视化)
+- [附录](#附录)
+
+## 安装
+
+在当前目录下运行样例代码需要PaddlePaddle Fluid v1.4或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle)中的说明来更新PaddlePaddle。
+
+## 简介
+
+[YOLOv3](https://arxiv.org/abs/1804.02767) 是一阶段End2End的目标检测器。其目标检测原理如下图所示:

+[图:YOLOv3检测原理]

+
+YOLOv3将输入图像分成S\*S个格子,每个格子预测B个bounding box,每个bounding box的预测内容包括:Location(x, y, w, h)、Confidence Score和C个类别的概率,因此YOLOv3输出层的channel数为S\*S\*B\*(5 + C)。YOLOv3的loss函数也由三部分组成:Location误差、Confidence误差和分类误差。
+
+YOLOv3的网络结构如下图所示:

+[图:YOLOv3网络结构]

+
+YOLOv3 的网络结构由基础特征提取网络、multi-scale特征融合层和输出层组成。
+
+1. 特征提取网络。YOLOv3使用 [DarkNet53](https://arxiv.org/abs/1612.08242) 作为特征提取网络:DarkNet53 基本采用了全卷积网络,用步长为2的卷积操作替代了池化层,同时添加了 Residual 单元,避免在网络层数过深时发生梯度弥散。
+
+2. 特征融合层。为了解决之前YOLO版本对小目标不敏感的问题,YOLOv3采用了3个不同尺度的特征图来进行目标检测,分别为13\*13,26\*26,52\*52,用来检测大、中、小三种目标。特征融合层选取 DarkNet 产出的三种尺度特征图作为输入,借鉴了FPN(feature pyramid networks)的思想,通过一系列的卷积层和上采样对各尺度的特征图进行融合。
+
+3. 输出层。同样使用了全卷积结构,其中最后一个卷积层的卷积核个数是255:3\*(80+4+1)=255,3表示一个grid cell包含3个bounding box,4表示框的4个坐标信息,1表示Confidence Score,80表示COCO数据集中80个类别的概率。
+
+
+## 数据准备
+
+在[MS-COCO数据集](http://cocodataset.org/#download)上进行训练,通过如下方式下载数据集。
+
+    cd dataset/coco
+    ./download.sh
+
+
+## 模型训练
+
+数据准备完毕后,可以通过如下的方式启动训练:
+
+    python train.py \
+       --model_save_dir=output/ \
+       --pretrain=${path_to_pretrain_model} \
+       --data_dir=${path_to_data}
+
+- 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。
+- 可选参数见:
+
+    python train.py --help
+
+**下载预训练模型:** 本示例提供darknet53预训练模型,该模型转换自作者提供的darknet53在ImageNet上预训练的权重,采用如下命令下载预训练模型:
+
+    sh ./weights/download.sh
+
+通过初始化`pretrain`加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。
+请在训练前确认预训练模型下载与加载正确,否则训练过程中损失可能会出现NaN。
+
+**安装[cocoapi](https://github.com/cocodataset/cocoapi):**
+
+训练前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi):
+
+    git clone https://github.com/cocodataset/cocoapi.git
+    cd cocoapi/PythonAPI
+    # if cython is not installed
+    pip install Cython
+    # Install into global site-packages
+    make install
+    # Alternatively, if you do not have permissions or prefer
+    # not to install the COCO API into global site-packages
+    python2 setup.py install --user
+
+**数据读取器说明:**
+
+* 数据读取器定义在reader.py中。
+
+**模型设置:**
+
+* 模型使用了基于COCO数据集生成的9个先验框:10x13,16x30,33x23,30x61,62x45,59x119,116x90,156x198,373x326(损失曲线图后附有按尺度划分先验框的极简示意)。
+* 检测过程中,nms_topk=400,nms_posk=100,nms_thresh=0.45。
+
+**训练策略:**
+
+* 采用momentum优化算法训练YOLOv3,momentum=0.9。
+* 学习率采用warmup算法,前4000轮学习率从0.0线性增加至0.001。在400000,450000轮时使用0.1,0.01乘子进行学习率衰减,最大训练500000轮。
+
+下图为模型训练结果:

+[图:Train Loss]
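+上面模型设置中列出的9个先验框按检测尺度划分的方式可用下面的Python片段示意。按照YOLOv3的通常设定,每个尺度使用3个先验框,最大的先验框分配给最粗的13\*13特征图(此处仅为说明,并非本目录的实现):
+
+```python
+# 模型设置中列出的9个先验框,按 (宽, 高) 表示
+anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
+           (116, 90), (156, 198), (373, 326)]
+
+# 每个尺度使用3个先验框,最大的先验框用于最粗的特征图
+anchor_masks = [(13, [6, 7, 8]), (26, [3, 4, 5]), (52, [0, 1, 2])]
+
+for scale, mask in anchor_masks:
+    print(scale, [anchors[i] for i in mask])
+```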

+
+## 模型评估
+
+模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估方法](http://cocodataset.org/#detections-eval)。
+
+`eval.py`是评估模块的主要执行程序,调用示例如下:
+
+    python eval.py \
+        --dataset=coco2017 \
+        --weights=${path_to_weights}
+
+- 通过设置export CUDA\_VISIBLE\_DEVICES=0指定单卡GPU评估。
+
+模型评估结果:
+
+| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) |
+| :------: | :------: | :------: | :------: |
+| 608x608 | 37.7 | 59.8 | 40.8 |
+| 416x416 | 36.5 | 58.2 | 39.1 |
+| 320x320 | 34.1 | 55.4 | 36.3 |
+
+
+
+## 模型推断及可视化
+
+模型推断可以获取图像中的物体及其对应的类别,`infer.py`是主要执行程序,调用示例如下:
+
+    python infer.py \
+        --dataset=coco2017 \
+        --weights=${path_to_weights} \
+        --image_path=data/COCO17/val2017/ \
+        --image_name=000000000139.jpg \
+        --draw_threshold=0.5
+
+模型预测速度:
+
+| input size | 608x608 | 416x416 | 320x320 |
+|:-------------:| :-----: | :-----: | :-----: |
+| infer speed | 50 ms/frame | 29 ms/frame | 24 ms/frame |
+
+下图为模型可视化预测结果:

+[图:YOLOv3 预测可视化]
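+作为补充,下面用一个极简的Python片段示意`--draw_threshold`参数的作用:可视化时仅保留置信度不低于该阈值的检测结果(检测数据为虚构,仅作说明):
+
+```python
+# 虚构的检测结果:(类别, 置信度, x1, y1, x2, y2)
+detections = [
+    ("person", 0.92, 10, 20, 110, 220),
+    ("dog",    0.31, 50, 60, 90, 120),
+]
+
+draw_threshold = 0.5  # 对应 infer.py 的 --draw_threshold 参数
+kept = [d for d in detections if d[1] >= draw_threshold]
+print(kept)  # 仅保留置信度为 0.92 的检测结果
+```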

    + diff --git a/fluid/PaddleCV/yolov3/box_utils.py b/PaddleCV/yolov3/box_utils.py similarity index 100% rename from fluid/PaddleCV/yolov3/box_utils.py rename to PaddleCV/yolov3/box_utils.py diff --git a/fluid/PaddleCV/yolov3/config.py b/PaddleCV/yolov3/config.py similarity index 100% rename from fluid/PaddleCV/yolov3/config.py rename to PaddleCV/yolov3/config.py diff --git a/fluid/PaddleCV/yolov3/data_utils.py b/PaddleCV/yolov3/data_utils.py similarity index 100% rename from fluid/PaddleCV/yolov3/data_utils.py rename to PaddleCV/yolov3/data_utils.py diff --git a/fluid/PaddleCV/yolov3/dataset/coco/download.sh b/PaddleCV/yolov3/dataset/coco/download.sh similarity index 100% rename from fluid/PaddleCV/yolov3/dataset/coco/download.sh rename to PaddleCV/yolov3/dataset/coco/download.sh diff --git a/fluid/PaddleCV/yolov3/edict.py b/PaddleCV/yolov3/edict.py similarity index 100% rename from fluid/PaddleCV/yolov3/edict.py rename to PaddleCV/yolov3/edict.py diff --git a/fluid/PaddleCV/yolov3/eval.py b/PaddleCV/yolov3/eval.py similarity index 100% rename from fluid/PaddleCV/yolov3/eval.py rename to PaddleCV/yolov3/eval.py diff --git a/fluid/PaddleCV/yolov3/image/000000000139.png b/PaddleCV/yolov3/image/000000000139.png similarity index 100% rename from fluid/PaddleCV/yolov3/image/000000000139.png rename to PaddleCV/yolov3/image/000000000139.png diff --git a/fluid/PaddleCV/yolov3/image/000000127517.png b/PaddleCV/yolov3/image/000000127517.png similarity index 100% rename from fluid/PaddleCV/yolov3/image/000000127517.png rename to PaddleCV/yolov3/image/000000127517.png diff --git a/fluid/PaddleCV/yolov3/image/000000203864.png b/PaddleCV/yolov3/image/000000203864.png similarity index 100% rename from fluid/PaddleCV/yolov3/image/000000203864.png rename to PaddleCV/yolov3/image/000000203864.png diff --git a/fluid/PaddleCV/yolov3/image/000000515077.png b/PaddleCV/yolov3/image/000000515077.png similarity index 100% rename from fluid/PaddleCV/yolov3/image/000000515077.png rename to PaddleCV/yolov3/image/000000515077.png diff --git a/fluid/PaddleCV/yolov3/image/YOLOv3.jpg b/PaddleCV/yolov3/image/YOLOv3.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/YOLOv3.jpg rename to PaddleCV/yolov3/image/YOLOv3.jpg diff --git a/fluid/PaddleCV/yolov3/image/YOLOv3_structure.jpg b/PaddleCV/yolov3/image/YOLOv3_structure.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/YOLOv3_structure.jpg rename to PaddleCV/yolov3/image/YOLOv3_structure.jpg diff --git a/fluid/PaddleCV/yolov3/image/dog.jpg b/PaddleCV/yolov3/image/dog.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/dog.jpg rename to PaddleCV/yolov3/image/dog.jpg diff --git a/fluid/PaddleCV/yolov3/image/eagle.jpg b/PaddleCV/yolov3/image/eagle.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/eagle.jpg rename to PaddleCV/yolov3/image/eagle.jpg diff --git a/fluid/PaddleCV/yolov3/image/giraffe.jpg b/PaddleCV/yolov3/image/giraffe.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/giraffe.jpg rename to PaddleCV/yolov3/image/giraffe.jpg diff --git a/fluid/PaddleCV/yolov3/image/horses.jpg b/PaddleCV/yolov3/image/horses.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/horses.jpg rename to PaddleCV/yolov3/image/horses.jpg diff --git a/fluid/PaddleCV/yolov3/image/kite.jpg b/PaddleCV/yolov3/image/kite.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/kite.jpg rename to PaddleCV/yolov3/image/kite.jpg diff --git a/fluid/PaddleCV/yolov3/image/person.jpg 
b/PaddleCV/yolov3/image/person.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/person.jpg rename to PaddleCV/yolov3/image/person.jpg diff --git a/fluid/PaddleCV/yolov3/image/scream.jpg b/PaddleCV/yolov3/image/scream.jpg similarity index 100% rename from fluid/PaddleCV/yolov3/image/scream.jpg rename to PaddleCV/yolov3/image/scream.jpg diff --git a/fluid/PaddleCV/yolov3/image/train_loss.png b/PaddleCV/yolov3/image/train_loss.png similarity index 100% rename from fluid/PaddleCV/yolov3/image/train_loss.png rename to PaddleCV/yolov3/image/train_loss.png diff --git a/fluid/PaddleCV/yolov3/image_utils.py b/PaddleCV/yolov3/image_utils.py similarity index 100% rename from fluid/PaddleCV/yolov3/image_utils.py rename to PaddleCV/yolov3/image_utils.py diff --git a/fluid/PaddleCV/yolov3/infer.py b/PaddleCV/yolov3/infer.py similarity index 100% rename from fluid/PaddleCV/yolov3/infer.py rename to PaddleCV/yolov3/infer.py diff --git a/fluid/PaddleCV/yolov3/learning_rate.py b/PaddleCV/yolov3/learning_rate.py similarity index 100% rename from fluid/PaddleCV/yolov3/learning_rate.py rename to PaddleCV/yolov3/learning_rate.py diff --git a/fluid/PaddleCV/video/metrics/multicrop_test/__init__.py b/PaddleCV/yolov3/models/__init__.py similarity index 100% rename from fluid/PaddleCV/video/metrics/multicrop_test/__init__.py rename to PaddleCV/yolov3/models/__init__.py diff --git a/fluid/PaddleCV/yolov3/models/darknet.py b/PaddleCV/yolov3/models/darknet.py similarity index 100% rename from fluid/PaddleCV/yolov3/models/darknet.py rename to PaddleCV/yolov3/models/darknet.py diff --git a/fluid/PaddleCV/yolov3/models/yolov3.py b/PaddleCV/yolov3/models/yolov3.py similarity index 100% rename from fluid/PaddleCV/yolov3/models/yolov3.py rename to PaddleCV/yolov3/models/yolov3.py diff --git a/fluid/PaddleCV/yolov3/reader.py b/PaddleCV/yolov3/reader.py similarity index 100% rename from fluid/PaddleCV/yolov3/reader.py rename to PaddleCV/yolov3/reader.py diff --git a/fluid/PaddleCV/yolov3/train.py b/PaddleCV/yolov3/train.py similarity index 100% rename from fluid/PaddleCV/yolov3/train.py rename to PaddleCV/yolov3/train.py diff --git a/fluid/PaddleCV/yolov3/utility.py b/PaddleCV/yolov3/utility.py similarity index 100% rename from fluid/PaddleCV/yolov3/utility.py rename to PaddleCV/yolov3/utility.py diff --git a/fluid/PaddleCV/yolov3/weights/download.sh b/PaddleCV/yolov3/weights/download.sh similarity index 100% rename from fluid/PaddleCV/yolov3/weights/download.sh rename to PaddleCV/yolov3/weights/download.sh diff --git a/fluid/PaddleNLP/LAC b/PaddleNLP/LAC similarity index 100% rename from fluid/PaddleNLP/LAC rename to PaddleNLP/LAC diff --git a/PaddleNLP/LARK b/PaddleNLP/LARK new file mode 160000 index 0000000000000000000000000000000000000000..77ab80a7061024c4b28f0b41fdd6ba42d5e6d9e1 --- /dev/null +++ b/PaddleNLP/LARK @@ -0,0 +1 @@ +Subproject commit 77ab80a7061024c4b28f0b41fdd6ba42d5e6d9e1 diff --git a/PaddleNLP/README.md b/PaddleNLP/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fa81f6a27df879e68e704491225c1fee10930c93 --- /dev/null +++ b/PaddleNLP/README.md @@ -0,0 +1,56 @@
+PaddleNLP
+=========
+
+机器翻译
+--------
+
+机器翻译(Machine Translation)将一种自然语言(源语言)转换成一种自然语言(目标语言),是自然语言处理中非常基础和重要的研究方向。在全球化的浪潮中,机器翻译在促进跨语言文明的交流中所起的重要作用是不言而喻的。其发展经历了统计机器翻译和基于神经网络的神经机器翻译(Neural Machine Translation, NMT)等阶段。在 NMT 成熟后,机器翻译才真正得以大规模应用。而早期的 NMT 主要是基于循环神经网络 RNN 的,其训练过程中当前时间步依赖于前一个时间步的计算,时间步之间难以并行化以提高训练速度。因此,非 RNN 结构的 NMT 得以应运而生,例如基于卷积神经网络 CNN 的结构和基于自注意力机制(Self-Attention)的结构。
+
+本实例所实现的
Transformer 就是一个基于自注意力机制的机器翻译模型,其中不再有RNN或CNN结构,而是完全利用 Attention 学习语言中的上下文依赖。相较于RNN/CNN, 这种结构在单层内计算复杂度更低、易于并行化、对长程依赖更易建模,最终在多种语言之间取得了最好的翻译效果。 + +- [Transformer](https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/neural_machine_translation/transformer/README_cn.md) + + +中文词法分析 +------------ + +中文分词(Word Segmentation)是将连续的自然语言文本,切分出具有语义合理性和完整性的词汇序列的过程。因为在汉语中,词是承担语义的最基本单位,切词是文本分类、情感分析、信息检索等众多自然语言处理任务的基础。 词性标注(Part-of-speech Tagging)是为自然语言文本中的每一个词汇赋予一个词性的过程,这里的词性包括名词、动词、形容词、副词等等。 命名实体识别(Named Entity Recognition,NER)又称作“专名识别”,是指识别自然语言文本中具有特定意义的实体,主要包括人名、地名、机构名、专有名词等。 我们将这三个任务统一成一个联合任务,称为词法分析任务,基于深度神经网络,利用海量标注语料进行训练,提供了一个端到端的解决方案。 + +我们把这个联合的中文词法分析解决方案命名为LAC。LAC既可以认为是Lexical Analysis of Chinese的首字母缩写,也可以认为是LAC Analyzes Chinese的递归缩写。 + +- [LAC](https://github.com/baidu/lac/blob/master/README.md) + +情感倾向分析 +------------ + +情感倾向分析针对带有主观描述的中文文本,可自动判断该文本的情感极性类别并给出相应的置信度。情感类型分为积极、消极、中性。情感倾向分析能够帮助企业理解用户消费习惯、分析热点话题和危机舆情监控,为企业提供有力的决策支持。本次我们开放 AI 开放平台中情感倾向分析采用的[模型](http://ai.baidu.com/tech/nlp/sentiment_classify),提供给用户使用。 + +- [Senta](https://github.com/baidu/Senta/blob/master/README.md) + +语义匹配 +-------- + +在自然语言处理很多场景中,需要度量两个文本在语义上的相似度,这类任务通常被称为语义匹配。例如在搜索中根据查询与候选文档的相似度对搜索结果进行排序,文本去重中文本与文本相似度的计算,自动问答中候选答案与问题的匹配等。 + +本例所开放的DAM (Deep Attention Matching Network)为百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择。DAM受Transformer的启发,其网络结构完全基于注意力(attention)机制,利用栈式的self-attention结构分别学习不同粒度下应答和语境的语义表示,然后利用cross-attention获取应答与语境之间的相关性,在两个大规模多轮对话数据集上的表现均好于其它模型。 + +- [Deep Attention Matching Network](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/deep_attention_matching_net) + +AnyQ +---- + +[AnyQ](https://github.com/baidu/AnyQ)(ANswer Your Questions) 开源项目主要包含面向FAQ集合的问答系统框架、文本语义匹配工具SimNet。 问答系统框架采用了配置化、插件化的设计,各功能均通过插件形式加入,当前共开放了20+种插件。开发者可以使用AnyQ系统快速构建和定制适用于特定业务场景的FAQ问答系统,并加速迭代和升级。 + +SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架,该框架在百度各产品上广泛应用,主要包括BOW、CNN、RNN、MM-DNN等核心网络结构形式,同时基于该框架也集成了学术界主流的语义匹配模型,如MatchPyramid、MV-LSTM、K-NRM等模型。使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力。 + +- [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md) + +机器阅读理解 +---------- + +机器阅读理解(MRC)是自然语言处理(NLP)中的核心任务之一,最终目标是让机器像人类一样阅读文本,提炼文本信息并回答相关问题。深度学习近年来在NLP中得到广泛使用,也使得机器阅读理解能力在近年有了大幅提高,但是目前研究的机器阅读理解都采用人工构造的数据集,以及回答一些相对简单的问题,和人类处理的数据还有明显差距,因此亟需大规模真实训练数据推动MRC的进一步发展。 + +百度阅读理解数据集是由百度自然语言处理部开源的一个真实世界数据集,所有的问题、原文都来源于实际数据(百度搜索引擎数据和百度知道问答社区),答案是由人类回答的。每个问题都对应多个答案,数据集包含200k问题、1000k原文和420k答案,是目前最大的中文MRC数据集。百度同时开源了对应的阅读理解模型,称为DuReader,采用当前通用的网络分层结构,通过双向attention机制捕捉问题和原文之间的交互关系,生成query-aware的原文表示,最终基于query-aware的原文表示通过point network预测答案范围。 + +- [DuReader in PaddlePaddle Fluid](https://github.com/PaddlePaddle/models/blob/develop/PaddleNLP/machine_reading_comprehension/README.md) diff --git a/fluid/PaddleNLP/Senta b/PaddleNLP/Senta similarity index 100% rename from fluid/PaddleNLP/Senta rename to PaddleNLP/Senta diff --git a/PaddleNLP/SimNet b/PaddleNLP/SimNet new file mode 160000 index 0000000000000000000000000000000000000000..b3e096b92f26720f6e3b020b374e11aa0748c032 --- /dev/null +++ b/PaddleNLP/SimNet @@ -0,0 +1 @@ +Subproject commit b3e096b92f26720f6e3b020b374e11aa0748c032 diff --git a/fluid/PaddleNLP/chinese_ner/.run_ce.sh b/PaddleNLP/chinese_ner/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/chinese_ner/.run_ce.sh rename to PaddleNLP/chinese_ner/.run_ce.sh diff --git a/PaddleNLP/chinese_ner/README.md b/PaddleNLP/chinese_ner/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a458c83b5f1ad9c007d35ddfb7a6578fb14bbf2a --- /dev/null +++ 
b/PaddleNLP/chinese_ner/README.md @@ -0,0 +1,62 @@ +# 使用ParallelExecutor的中文命名实体识别示例 + +以下是本例的简要目录结构及说明: + +```text +. +├── data # 存储运行本例所依赖的数据,从外部获取 +├── reader.py # 数据读取接口, 从外部获取 +├── README.md # 文档 +├── train.py # 训练脚本 +├── infer.py # 预测脚本 +``` + +## 数据 +在data目录下,有两个文件夹,train_files中保存的是训练数据,test_files中保存的是测试数据,作为示例,在目录下我们各放置了两个文件,实际训练时,根据自己的实际需要将数据放置在对应目录,并根据数据格式,修改reader.py中的数据读取函数。 + +## 训练 + +通过运行 + +``` +python train.py --help +``` + +来获取命令行参数的帮助,设置正确的数据路径等参数后,运行`train.py`开始训练。 + +训练记录形如 +```txt +pass_id:0, time_cost:4.92960214615s +[Train] precision:0.000862136531076, recall:0.0059880239521, f1:0.00150726226363 +[Test] precision:0.000796178343949, recall:0.00335758254057, f1:0.00128713933283 +pass_id:1, time_cost:0.715255975723s +[Train] precision:0.00474094141551, recall:0.00762112139358, f1:0.00584551148225 +[Test] precision:0.0228873239437, recall:0.00727476217124, f1:0.0110403397028 +pass_id:2, time_cost:0.740842103958s +[Train] precision:0.0120967741935, recall:0.00163309744148, f1:0.00287769784173 +[Test] precision:0, recall:0.0, f1:0 +``` + +## 预测 +类似于训练过程,预测时指定需要测试模型的路径、测试数据、预测标记文件的路径,运行`infer.py`开始预测。 + +预测结果如下 +```txt +152804 O O +130048 O O +38862 10-B O +784 O O +1540 O O +4145 O O +2255 O O +0 O O +1279 O O +7793 O O +373 O O +1621 O O +815 O O +2 O O +247 24-B O +401 24-I O +``` +输出分为三列,以"\t"分割,第一列是输入的词语的序号,第二列是标准结果,第三列为标记结果。多条输入序列之间以空行分隔。 diff --git a/fluid/PaddleCV/video/metrics/youtube8m/__init__.py b/PaddleNLP/chinese_ner/__init__.py similarity index 100% rename from fluid/PaddleCV/video/metrics/youtube8m/__init__.py rename to PaddleNLP/chinese_ner/__init__.py diff --git a/fluid/PaddleNLP/chinese_ner/_ce.py b/PaddleNLP/chinese_ner/_ce.py similarity index 100% rename from fluid/PaddleNLP/chinese_ner/_ce.py rename to PaddleNLP/chinese_ner/_ce.py diff --git a/fluid/PaddleNLP/chinese_ner/data/label_dict b/PaddleNLP/chinese_ner/data/label_dict similarity index 100% rename from fluid/PaddleNLP/chinese_ner/data/label_dict rename to PaddleNLP/chinese_ner/data/label_dict diff --git a/fluid/PaddleNLP/chinese_ner/data/test_files/test_part_1 b/PaddleNLP/chinese_ner/data/test_files/test_part_1 similarity index 100% rename from fluid/PaddleNLP/chinese_ner/data/test_files/test_part_1 rename to PaddleNLP/chinese_ner/data/test_files/test_part_1 diff --git a/fluid/PaddleNLP/chinese_ner/data/test_files/test_part_2 b/PaddleNLP/chinese_ner/data/test_files/test_part_2 similarity index 100% rename from fluid/PaddleNLP/chinese_ner/data/test_files/test_part_2 rename to PaddleNLP/chinese_ner/data/test_files/test_part_2 diff --git a/fluid/PaddleNLP/chinese_ner/data/train_files/train_part_1 b/PaddleNLP/chinese_ner/data/train_files/train_part_1 similarity index 100% rename from fluid/PaddleNLP/chinese_ner/data/train_files/train_part_1 rename to PaddleNLP/chinese_ner/data/train_files/train_part_1 diff --git a/fluid/PaddleNLP/chinese_ner/data/train_files/train_part_2 b/PaddleNLP/chinese_ner/data/train_files/train_part_2 similarity index 100% rename from fluid/PaddleNLP/chinese_ner/data/train_files/train_part_2 rename to PaddleNLP/chinese_ner/data/train_files/train_part_2 diff --git a/fluid/PaddleNLP/chinese_ner/infer.py b/PaddleNLP/chinese_ner/infer.py similarity index 100% rename from fluid/PaddleNLP/chinese_ner/infer.py rename to PaddleNLP/chinese_ner/infer.py diff --git a/fluid/PaddleNLP/chinese_ner/reader.py b/PaddleNLP/chinese_ner/reader.py similarity index 100% rename from fluid/PaddleNLP/chinese_ner/reader.py rename to PaddleNLP/chinese_ner/reader.py diff --git 
a/fluid/PaddleNLP/chinese_ner/scripts/README.md b/PaddleNLP/chinese_ner/scripts/README.md similarity index 100% rename from fluid/PaddleNLP/chinese_ner/scripts/README.md rename to PaddleNLP/chinese_ner/scripts/README.md diff --git a/fluid/PaddleNLP/chinese_ner/scripts/infer.sh b/PaddleNLP/chinese_ner/scripts/infer.sh similarity index 100% rename from fluid/PaddleNLP/chinese_ner/scripts/infer.sh rename to PaddleNLP/chinese_ner/scripts/infer.sh diff --git a/fluid/PaddleNLP/chinese_ner/scripts/train.sh b/PaddleNLP/chinese_ner/scripts/train.sh similarity index 100% rename from fluid/PaddleNLP/chinese_ner/scripts/train.sh rename to PaddleNLP/chinese_ner/scripts/train.sh diff --git a/fluid/PaddleNLP/chinese_ner/train.py b/PaddleNLP/chinese_ner/train.py similarity index 100% rename from fluid/PaddleNLP/chinese_ner/train.py rename to PaddleNLP/chinese_ner/train.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/.run_ce.sh b/PaddleNLP/deep_attention_matching_net/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/.run_ce.sh rename to PaddleNLP/deep_attention_matching_net/.run_ce.sh diff --git a/PaddleNLP/deep_attention_matching_net/README.md b/PaddleNLP/deep_attention_matching_net/README.md new file mode 100644 index 0000000000000000000000000000000000000000..37085fe46ee6774b3e553a35d840eb11395da8a0 --- /dev/null +++ b/PaddleNLP/deep_attention_matching_net/README.md @@ -0,0 +1,87 @@
+# __Deep Attention Matching Network__
+
+This is the source code of the Deep Attention Matching network (DAM), which is proposed for multi-turn response selection in retrieval-based chatbots.
+
+DAM is a neural matching network based entirely on the attention mechanism. The motivation of DAM is to capture the semantic dependencies among dialogue elements at different levels of granularity in a multi-turn conversation as matching evidence, in order to better match a response candidate with its multi-turn context. DAM was published at ACL 2018; please find our paper at [http://aclweb.org/anthology/P18-1103](http://aclweb.org/anthology/P18-1103).
+
+
+## __Network__
+
+DAM is inspired by the Transformer in Machine Translation (Vaswani et al., 2017): we extend the key attention mechanism of the Transformer from two perspectives and introduce the two kinds of attention in one unified neural network.
+
+- **self-attention**, to gradually capture semantic representations at different granularities by stacking attention over word-level embeddings. These multi-grained semantic representations facilitate exploring segmental dependencies between context and response.
+
+- **cross-attention**, i.e., attention across context and response, which can generally capture the relevance in dependency between segment pairs, providing complementary information to textual relevance for matching the response with its multi-turn context.

+[Figure: Overview of Deep Attention Matching Network]
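+Both kinds of attention share the same scaled dot-product primitive. The following NumPy sketch is illustrative only; the repository's Paddle implementation (see `utils/layers.py`) differs in detail:
+
+```python
+import numpy as np
+
+def attention(Q, K, V):
+    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
+    scores = Q @ K.T / np.sqrt(Q.shape[-1])
+    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
+    w /= w.sum(axis=-1, keepdims=True)
+    return w @ V
+
+response = np.random.rand(5, 16)  # 5 words, 16-dim embeddings
+context = np.random.rand(7, 16)   # 7 words from the multi-turn context
+
+self_att = attention(response, response, response)  # self-attention
+cross_att = attention(response, context, context)   # cross-attention
+print(self_att.shape, cross_att.shape)  # (5, 16) (5, 16)
+```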

+
+## __Results__
+
+We test DAM on two large-scale multi-turn response selection tasks, i.e., the Ubuntu Corpus v1 and the Douban Conversation Corpus; the experimental results are below:

+[Figures: results on the Ubuntu Corpus v1 and the Douban Conversation Corpus]

+
+## __Usage__
+
+Take the experiment on the Ubuntu Corpus v1 as an example.
+
+1) Go to the `ubuntu` directory
+
+```
+cd ubuntu
+```
+2) Download the well-preprocessed data for training
+
+```
+sh download_data.sh
+```
+3) Execute model training and evaluation by
+
+```
+sh train.sh
+```
+For a more detailed explanation of the arguments, please run
+
+```
+python ../train_and_evaluate.py --help
+```
+
+By default, training is executed on a single GPU; it can be switched to multi-GPU mode simply by resetting the visible devices in `train.sh`, e.g.,
+
+```
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+```
+
+4) Run the test by
+
+```
+sh test.sh
+```
+and run the test for different saved models by passing a different `--model_path` argument.
+
+Similarly, one can carry out the experiment on the Douban Conversation Corpus by going to the directory `douban` and following the same procedure.
+
+## __Dependencies__
+
+- Python >= 2.7.3
+- PaddlePaddle latest develop branch
+
+## __Citation__
+
+The following article describes DAM in detail. We recommend citing this article by default.
+
+```
+@inproceedings{ ,
+  title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network},
+  author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu},
+  booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+  volume={1},
+  pages={ -- },
+  year={2018}
+}
+``` diff --git a/fluid/PaddleNLP/deep_attention_matching_net/_ce.py b/PaddleNLP/deep_attention_matching_net/_ce.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/_ce.py rename to PaddleNLP/deep_attention_matching_net/_ce.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/douban/download_data.sh b/PaddleNLP/deep_attention_matching_net/douban/download_data.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/douban/download_data.sh rename to PaddleNLP/deep_attention_matching_net/douban/download_data.sh diff --git a/fluid/PaddleNLP/deep_attention_matching_net/douban/test.sh b/PaddleNLP/deep_attention_matching_net/douban/test.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/douban/test.sh rename to PaddleNLP/deep_attention_matching_net/douban/test.sh diff --git a/fluid/PaddleNLP/deep_attention_matching_net/douban/train.sh b/PaddleNLP/deep_attention_matching_net/douban/train.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/douban/train.sh rename to PaddleNLP/deep_attention_matching_net/douban/train.sh diff --git a/fluid/PaddleNLP/deep_attention_matching_net/images/Figure1.png b/PaddleNLP/deep_attention_matching_net/images/Figure1.png similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/images/Figure1.png rename to PaddleNLP/deep_attention_matching_net/images/Figure1.png diff --git a/fluid/PaddleNLP/deep_attention_matching_net/images/Figure2.png b/PaddleNLP/deep_attention_matching_net/images/Figure2.png similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/images/Figure2.png rename to PaddleNLP/deep_attention_matching_net/images/Figure2.png diff --git a/fluid/PaddleNLP/deep_attention_matching_net/model.py b/PaddleNLP/deep_attention_matching_net/model.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/model.py rename to PaddleNLP/deep_attention_matching_net/model.py diff --git
a/fluid/PaddleNLP/deep_attention_matching_net/test_and_evaluate.py b/PaddleNLP/deep_attention_matching_net/test_and_evaluate.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/test_and_evaluate.py rename to PaddleNLP/deep_attention_matching_net/test_and_evaluate.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/train_and_evaluate.py b/PaddleNLP/deep_attention_matching_net/train_and_evaluate.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/train_and_evaluate.py rename to PaddleNLP/deep_attention_matching_net/train_and_evaluate.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/ubuntu/download_data.sh b/PaddleNLP/deep_attention_matching_net/ubuntu/download_data.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/ubuntu/download_data.sh rename to PaddleNLP/deep_attention_matching_net/ubuntu/download_data.sh diff --git a/fluid/PaddleNLP/deep_attention_matching_net/ubuntu/test.sh b/PaddleNLP/deep_attention_matching_net/ubuntu/test.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/ubuntu/test.sh rename to PaddleNLP/deep_attention_matching_net/ubuntu/test.sh diff --git a/fluid/PaddleNLP/deep_attention_matching_net/ubuntu/train.sh b/PaddleNLP/deep_attention_matching_net/ubuntu/train.sh similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/ubuntu/train.sh rename to PaddleNLP/deep_attention_matching_net/ubuntu/train.sh diff --git a/fluid/PaddleCV/video/tools/__init__.py b/PaddleNLP/deep_attention_matching_net/utils/__init__.py similarity index 100% rename from fluid/PaddleCV/video/tools/__init__.py rename to PaddleNLP/deep_attention_matching_net/utils/__init__.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/douban_evaluation.py b/PaddleNLP/deep_attention_matching_net/utils/douban_evaluation.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/utils/douban_evaluation.py rename to PaddleNLP/deep_attention_matching_net/utils/douban_evaluation.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/evaluation.py b/PaddleNLP/deep_attention_matching_net/utils/evaluation.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/utils/evaluation.py rename to PaddleNLP/deep_attention_matching_net/utils/evaluation.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/layers.py b/PaddleNLP/deep_attention_matching_net/utils/layers.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/utils/layers.py rename to PaddleNLP/deep_attention_matching_net/utils/layers.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/reader.py b/PaddleNLP/deep_attention_matching_net/utils/reader.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/utils/reader.py rename to PaddleNLP/deep_attention_matching_net/utils/reader.py diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/util.py b/PaddleNLP/deep_attention_matching_net/utils/util.py similarity index 100% rename from fluid/PaddleNLP/deep_attention_matching_net/utils/util.py rename to PaddleNLP/deep_attention_matching_net/utils/util.py diff --git a/fluid/PaddleNLP/knowledge-driven-dialogue b/PaddleNLP/knowledge-driven-dialogue similarity index 100% rename from fluid/PaddleNLP/knowledge-driven-dialogue rename to PaddleNLP/knowledge-driven-dialogue diff --git a/fluid/PaddleNLP/language_model/gru/.run_ce.sh b/PaddleNLP/language_model/gru/.run_ce.sh 
similarity index 100% rename from fluid/PaddleNLP/language_model/gru/.run_ce.sh rename to PaddleNLP/language_model/gru/.run_ce.sh diff --git a/PaddleNLP/language_model/gru/README.md b/PaddleNLP/language_model/gru/README.md new file mode 100644 index 0000000000000000000000000000000000000000..91ce2d7f58085b56da2ac2dec03af2a05985ab8f --- /dev/null +++ b/PaddleNLP/language_model/gru/README.md @@ -0,0 +1,148 @@ +# 语言模型 + +以下是本例的简要目录结构及说明: + +```text +. +├── README.md # 文档 +├── train.py # 训练脚本 +├── infer.py # 预测脚本 +└── utils.py # 通用函数 +``` + + +## 简介 + +循环神经网络语言模型的介绍可以参阅论文[Recurrent Neural Network Regularization](https://arxiv.org/abs/1409.2329),在本例中,我们实现了GRU-RNN语言模型。 + +## 训练 + +运行命令 `python train.py` 开始训练模型。 +```python +python train.py +``` + +当前支持的参数可参见[train.py](./train.py) `train_net` 函数 +```python +vocab, train_reader, test_reader = utils.prepare_data( + batch_size=20, # batch size + buffer_size=1000, # buffer size, default value is OK + word_freq_threshold=0) # vocabulary related parameter, and words with frequency below this value will be filtered + +train(train_reader=train_reader, + vocab=vocab, + network=network, + hid_size=200, # embedding and hidden size + base_lr=1.0, # base learning rate + batch_size=20, # batch size, the same as that in prepare_data + pass_num=12, # the number of passes for training + use_cuda=True, # whether to use GPU card + parallel=False, # whether to be parallel + model_dir="model", # directory to save model + init_low_bound=-0.1, # uniform parameter initialization lower bound + init_high_bound=0.1) # uniform parameter initialization upper bound +``` + +## 自定义网络结构 + +可在[train.py](./train.py) `network` 函数中调整网络结构,当前的网络结构如下: +```python +emb = fluid.layers.embedding(input=src, size=[vocab_size, hid_size], + param_attr=fluid.ParamAttr( + initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), + learning_rate=emb_lr_x), + is_sparse=True) + +fc0 = fluid.layers.fc(input=emb, size=hid_size * 3, + param_attr=fluid.ParamAttr( + initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), + learning_rate=gru_lr_x)) +gru_h0 = fluid.layers.dynamic_gru(input=fc0, size=hid_size, + param_attr=fluid.ParamAttr( + initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), + learning_rate=gru_lr_x)) + +fc = fluid.layers.fc(input=gru_h0, size=vocab_size, act='softmax', + param_attr=fluid.ParamAttr( + initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), + learning_rate=fc_lr_x)) + +cost = fluid.layers.cross_entropy(input=fc, label=dst) +``` + +## 训练结果示例 + +我们在Tesla K40m单GPU卡上训练的日志如下所示 +```text +epoch_1 start +step:100 ppl:771.053 +step:200 ppl:449.597 +step:300 ppl:642.654 +step:400 ppl:458.128 +step:500 ppl:510.912 +step:600 ppl:451.545 +step:700 ppl:364.404 +step:800 ppl:324.272 +step:900 ppl:360.797 +step:1000 ppl:275.761 +step:1100 ppl:294.599 +step:1200 ppl:335.877 +step:1300 ppl:185.262 +step:1400 ppl:241.744 +step:1500 ppl:211.507 +step:1600 ppl:233.431 +step:1700 ppl:298.767 +step:1800 ppl:203.403 +step:1900 ppl:158.828 +step:2000 ppl:171.148 +step:2100 ppl:280.884 +epoch:1 num_steps:2104 time_cost(s):47.478780 +model saved in model/epoch_1 +epoch_2 start +step:100 ppl:238.099 +step:200 ppl:136.527 +step:300 ppl:204.184 +step:400 ppl:252.886 +step:500 ppl:177.377 +step:600 ppl:197.688 +step:700 ppl:131.650 +step:800 ppl:223.906 +step:900 ppl:144.785 +step:1000 ppl:176.286 +step:1100 ppl:148.158 +step:1200 ppl:203.581 +step:1300 ppl:168.208 +step:1400 ppl:159.412 +step:1500 
ppl:114.032 +step:1600 ppl:157.985 +step:1700 ppl:147.743 +step:1800 ppl:88.676 +step:1900 ppl:141.962 +step:2000 ppl:106.087 +step:2100 ppl:122.709 +epoch:2 num_steps:2104 time_cost(s):47.583789 +model saved in model/epoch_2 +... +``` + +## 预测 +运行命令 `python infer.py model_dir start_epoch last_epoch(inclusive)` 开始预测,其中,start_epoch指定开始预测的轮次,last_epoch指定结束的轮次,例如 +```python +python infer.py model 1 12 # prediction from epoch 1 to epoch 12 +``` + +## 预测结果示例 +```text +model:model/epoch_1 ppl:254.540 time_cost(s):3.29 +model:model/epoch_2 ppl:177.671 time_cost(s):3.27 +model:model/epoch_3 ppl:156.251 time_cost(s):3.27 +model:model/epoch_4 ppl:139.036 time_cost(s):3.27 +model:model/epoch_5 ppl:132.661 time_cost(s):3.27 +model:model/epoch_6 ppl:130.092 time_cost(s):3.28 +model:model/epoch_7 ppl:128.751 time_cost(s):3.27 +model:model/epoch_8 ppl:125.411 time_cost(s):3.27 +model:model/epoch_9 ppl:124.604 time_cost(s):3.28 +model:model/epoch_10 ppl:124.754 time_cost(s):3.29 +model:model/epoch_11 ppl:125.421 time_cost(s):3.27 +model:model/epoch_12 ppl:125.676 time_cost(s):3.27 +``` diff --git a/fluid/PaddleNLP/language_model/gru/_ce.py b/PaddleNLP/language_model/gru/_ce.py similarity index 100% rename from fluid/PaddleNLP/language_model/gru/_ce.py rename to PaddleNLP/language_model/gru/_ce.py diff --git a/fluid/PaddleNLP/language_model/gru/infer.py b/PaddleNLP/language_model/gru/infer.py similarity index 100% rename from fluid/PaddleNLP/language_model/gru/infer.py rename to PaddleNLP/language_model/gru/infer.py diff --git a/fluid/PaddleNLP/language_model/gru/train.py b/PaddleNLP/language_model/gru/train.py similarity index 100% rename from fluid/PaddleNLP/language_model/gru/train.py rename to PaddleNLP/language_model/gru/train.py diff --git a/fluid/PaddleNLP/language_model/gru/train_on_cloud.py b/PaddleNLP/language_model/gru/train_on_cloud.py similarity index 100% rename from fluid/PaddleNLP/language_model/gru/train_on_cloud.py rename to PaddleNLP/language_model/gru/train_on_cloud.py diff --git a/fluid/PaddleNLP/language_model/gru/utils.py b/PaddleNLP/language_model/gru/utils.py similarity index 100% rename from fluid/PaddleNLP/language_model/gru/utils.py rename to PaddleNLP/language_model/gru/utils.py diff --git a/fluid/PaddleNLP/language_model/lstm/.run_ce.sh b/PaddleNLP/language_model/lstm/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/.run_ce.sh rename to PaddleNLP/language_model/lstm/.run_ce.sh diff --git a/PaddleNLP/language_model/lstm/README.md b/PaddleNLP/language_model/lstm/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f6d1250ff66a066c8634eca9c3f74312f00a7749 --- /dev/null +++ b/PaddleNLP/language_model/lstm/README.md @@ -0,0 +1,76 @@ +# lstm lm + +以下是本例的简要目录结构及说明: + +```text +. 
+├── README.md # 文档 +├── train.py # 训练脚本 +├── reader.py # 数据读取 +└── lm_model.py # 模型定义文件 +``` + + +## 简介 + +循环神经网络语言模型的介绍可以参阅论文[Recurrent Neural Network Regularization](https://arxiv.org/abs/1409.2329),本文主要是说明基于lstm的语言的模型的实现,数据是采用ptb dataset,下载地址为 +http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz + +## 数据下载 +用户可以自行下载数据,并解压, 也可以利用目录中的脚本 + +cd data; sh download_data.sh + +## 训练 + +运行命令 +`CUDA_VISIBLE_DEVICES=0 python train.py --data_path data/simple-examples/data/ --model_type small --use_gpu True` + 开始训练模型。 + +model_type 为模型配置的大小,目前支持 small,medium, large 三种配置形式 + +实现采用双层的lstm,具体的参数和网络配置 可以参考 train.py, lm_model.py 文件中的设置 + + +## 训练结果示例 + +p40中训练日志如下(small config), test 测试集仅在最后一个epoch完成后进行测试 +```text +epoch id 0 +ppl 232 865.86505 1.0 +ppl 464 632.76526 1.0 +ppl 696 510.47153 1.0 +ppl 928 437.60617 1.0 +ppl 1160 393.38422 1.0 +ppl 1392 353.05365 1.0 +ppl 1624 325.73267 1.0 +ppl 1856 305.488 1.0 +ppl 2088 286.3128 1.0 +ppl 2320 270.91504 1.0 +train ppl 270.86246 +valid ppl 181.867964379 +... +ppl 2320 40.975872 0.001953125 +train ppl 40.974102 +valid ppl 117.85741214 +test ppl 113.939103843 +``` +## 与tf结果对比 + +tf采用的版本是1.6 +```text +small config + train valid test +fluid 1.0 40.962 118.111 112.617 +tf 1.6 40.492 118.329 113.788 + +medium config + train valid test +fluid 1.0 45.620 87.398 83.682 +tf 1.6 45.594 87.363 84.015 + +large config + train valid test +fluid 1.0 37.221 82.358 78.137 +tf 1.6 38.342 82.311 78.121 +``` diff --git a/fluid/PaddleNLP/language_model/lstm/_ce.py b/PaddleNLP/language_model/lstm/_ce.py similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/_ce.py rename to PaddleNLP/language_model/lstm/_ce.py diff --git a/fluid/PaddleNLP/language_model/lstm/args.py b/PaddleNLP/language_model/lstm/args.py similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/args.py rename to PaddleNLP/language_model/lstm/args.py diff --git a/fluid/PaddleNLP/language_model/lstm/data/download_data.sh b/PaddleNLP/language_model/lstm/data/download_data.sh similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/data/download_data.sh rename to PaddleNLP/language_model/lstm/data/download_data.sh diff --git a/fluid/PaddleNLP/language_model/lstm/lm_model.py b/PaddleNLP/language_model/lstm/lm_model.py similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/lm_model.py rename to PaddleNLP/language_model/lstm/lm_model.py diff --git a/fluid/PaddleNLP/language_model/lstm/reader.py b/PaddleNLP/language_model/lstm/reader.py similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/reader.py rename to PaddleNLP/language_model/lstm/reader.py diff --git a/fluid/PaddleNLP/language_model/lstm/train.py b/PaddleNLP/language_model/lstm/train.py similarity index 100% rename from fluid/PaddleNLP/language_model/lstm/train.py rename to PaddleNLP/language_model/lstm/train.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/.run_ce.sh b/PaddleNLP/machine_reading_comprehension/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/.run_ce.sh rename to PaddleNLP/machine_reading_comprehension/.run_ce.sh diff --git a/PaddleNLP/machine_reading_comprehension/README.md b/PaddleNLP/machine_reading_comprehension/README.md new file mode 100644 index 0000000000000000000000000000000000000000..884c15058e9b5601c7754e27d1b106fd41e2ac27 --- /dev/null +++ b/PaddleNLP/machine_reading_comprehension/README.md @@ -0,0 +1,69 @@ +# Abstract +Dureader is an end-to-end neural network model for machine reading comprehension 
style question answering, which aims to answer questions from given passages. We first match the question and passages with a bidirectional attention flow network to obtain the question-aware passage representation. Then we employ a pointer network to locate the positions of answers in the passages. Our experimental evaluations show that the DuReader model achieves state-of-the-art results on the DuReader dataset.
+# Dataset
+DuReader Dataset is a new large-scale real-world and human-sourced MRC dataset in Chinese. DuReader focuses on real-world open-domain question answering. The advantages of DuReader over existing datasets are summarized as follows:
+ - Real question
+ - Real article
+ - Real answer
+ - Real application scenario
+ - Rich annotation
+
+# Network
+The DuReader model is inspired by 3 classic reading comprehension models ([BiDAF](https://arxiv.org/abs/1611.01603), [Match-LSTM](https://arxiv.org/abs/1608.07905), [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)).
+
+The DuReader model is a hierarchical multi-stage process and consists of five layers:
+
+- **Word Embedding Layer** maps each word to a vector using a pre-trained word embedding model.
+- **Encoding Layer** extracts context information for each position in the question and passages with a bi-directional LSTM network.
+- **Attention Flow Layer** couples the query and context vectors and produces a set of query-aware feature vectors for each word in the context. Please refer to [BiDAF](https://arxiv.org/abs/1611.01603) for more details.
+- **Fusion Layer** employs a layer of bi-directional LSTM to capture the interaction among context words independent of the query.
+- **Decode Layer** employs an answer pointer network with attention pooling of the question to locate the positions of answers in the passages (a minimal sketch of this span-selection step follows the Training section below). Please refer to [Match-LSTM](https://arxiv.org/abs/1608.07905) and [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf) for more details.
+
+## How to Run
+### Download the Dataset
+To download the DuReader dataset:
+```
+cd data && bash download.sh
+```
+For more details about the DuReader dataset please refer to the [DuReader Dataset Homepage](https://ai.baidu.com//broad/subordinate?dataset=dureader).
+
+### Download Thirdparty Dependencies
+We use Bleu and Rouge as evaluation metrics; the calculation of these metrics relies on the scoring scripts under [coco-caption](https://github.com/tylin/coco-caption). To download them, run:
+
+```
+cd utils && bash download_thirdparty.sh
+```
+### Environment Requirements
+For now we have only tested on PaddlePaddle v1.0; to install PaddlePaddle and for more details about PaddlePaddle, see the [PaddlePaddle Homepage](http://paddlepaddle.org).
+
+### Preparation
+Before training the model, we have to make sure that the data is ready. For preparation, we will check the data files, make directories and extract a vocabulary for later use. You can run the following command to do this with a specified task name:
+
+```
+sh run.sh --prepare
+```
+You can specify the files for train/dev/test by setting `trainset`/`devset`/`testset`.
+### Training
+To train the model, you can also set hyper-parameters such as the learning rate by using `--learning_rate NUM`. For example, to train the model for 10 passes, you can run:
+
+```
+sh run.sh --train --pass_num 10
+```
+
+The training process includes an evaluation on the dev set after each training epoch. By default, the model with the best Bleu-4 score on the dev set will be saved.
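+As referenced in the Network section above, the decode layer turns two pointer-network distributions (answer-start and answer-end probabilities over passage positions) into an answer span. Below is a minimal sketch with hypothetical probabilities; see `rc_model.py` for the actual Paddle implementation:
+
+```python
+import numpy as np
+
+def best_span(p_start, p_end, max_len=50):
+    # Pick the span (i, j), i <= j, maximizing p_start[i] * p_end[j].
+    best, best_score = (0, 0), 0.0
+    for i, ps in enumerate(p_start):
+        for j in range(i, min(i + max_len, len(p_end))):
+            if ps * p_end[j] > best_score:
+                best_score, best = ps * p_end[j], (i, j)
+    return best, best_score
+
+p_start = np.array([0.1, 0.6, 0.2, 0.1])
+p_end = np.array([0.05, 0.15, 0.7, 0.1])
+print(best_span(p_start, p_end))  # ((1, 2), ~0.42)
+```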
+
+### Evaluation
+To conduct a single evaluation on the dev set with a model already trained, you can run the following command:
+
+```
+sh run.sh --evaluate --load_dir models/1
+```
+
+### Prediction
+You can also predict answers for the samples in some files using the following command:
+
+```
+sh run.sh --predict --load_dir models/1 --testset ../data/preprocessed/testset/search.dev.json
+```
+
+By default, the results are saved in the `../data/results/` folder. You can change this by specifying `--result_dir DIR_PATH`. diff --git a/fluid/PaddleNLP/machine_reading_comprehension/_ce.py b/PaddleNLP/machine_reading_comprehension/_ce.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/_ce.py rename to PaddleNLP/machine_reading_comprehension/_ce.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/args.py b/PaddleNLP/machine_reading_comprehension/args.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/args.py rename to PaddleNLP/machine_reading_comprehension/args.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/data/download.sh b/PaddleNLP/machine_reading_comprehension/data/download.sh similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/data/download.sh rename to PaddleNLP/machine_reading_comprehension/data/download.sh diff --git a/fluid/PaddleNLP/machine_reading_comprehension/data/md5sum.txt b/PaddleNLP/machine_reading_comprehension/data/md5sum.txt similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/data/md5sum.txt rename to PaddleNLP/machine_reading_comprehension/data/md5sum.txt diff --git a/fluid/PaddleNLP/machine_reading_comprehension/dataset.py b/PaddleNLP/machine_reading_comprehension/dataset.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/dataset.py rename to PaddleNLP/machine_reading_comprehension/dataset.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/rc_model.py b/PaddleNLP/machine_reading_comprehension/rc_model.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/rc_model.py rename to PaddleNLP/machine_reading_comprehension/rc_model.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/run.py b/PaddleNLP/machine_reading_comprehension/run.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/run.py rename to PaddleNLP/machine_reading_comprehension/run.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/run.sh b/PaddleNLP/machine_reading_comprehension/run.sh similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/run.sh rename to PaddleNLP/machine_reading_comprehension/run.sh diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/__init__.py b/PaddleNLP/machine_reading_comprehension/utils/__init__.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/__init__.py rename to PaddleNLP/machine_reading_comprehension/utils/__init__.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/download_thirdparty.sh b/PaddleNLP/machine_reading_comprehension/utils/download_thirdparty.sh similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/download_thirdparty.sh rename to PaddleNLP/machine_reading_comprehension/utils/download_thirdparty.sh diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/dureader_eval.py b/PaddleNLP/machine_reading_comprehension/utils/dureader_eval.py similarity index 100% rename from
fluid/PaddleNLP/machine_reading_comprehension/utils/dureader_eval.py rename to PaddleNLP/machine_reading_comprehension/utils/dureader_eval.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/get_vocab.py b/PaddleNLP/machine_reading_comprehension/utils/get_vocab.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/get_vocab.py rename to PaddleNLP/machine_reading_comprehension/utils/get_vocab.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/marco_tokenize_data.py b/PaddleNLP/machine_reading_comprehension/utils/marco_tokenize_data.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/marco_tokenize_data.py rename to PaddleNLP/machine_reading_comprehension/utils/marco_tokenize_data.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/marcov1_to_dureader.py b/PaddleNLP/machine_reading_comprehension/utils/marcov1_to_dureader.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/marcov1_to_dureader.py rename to PaddleNLP/machine_reading_comprehension/utils/marcov1_to_dureader.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py b/PaddleNLP/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py rename to PaddleNLP/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/preprocess.py b/PaddleNLP/machine_reading_comprehension/utils/preprocess.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/preprocess.py rename to PaddleNLP/machine_reading_comprehension/utils/preprocess.py diff --git a/fluid/PaddleNLP/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh b/PaddleNLP/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh rename to PaddleNLP/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh diff --git a/fluid/PaddleNLP/machine_reading_comprehension/vocab.py b/PaddleNLP/machine_reading_comprehension/vocab.py similarity index 100% rename from fluid/PaddleNLP/machine_reading_comprehension/vocab.py rename to PaddleNLP/machine_reading_comprehension/vocab.py diff --git a/PaddleNLP/neural_machine_translation/README.md b/PaddleNLP/neural_machine_translation/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a0271ad42e62490282ccc154f6a3c50029b6d13d --- /dev/null +++ b/PaddleNLP/neural_machine_translation/README.md @@ -0,0 +1,9 @@
+The minimum PaddlePaddle version needed for the code samples in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+
+---
+
+This is a collection of example models for neural machine translation and neural sequence modeling.
+
+### TODO
+
+This project is still under active development.
diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/.run_ce.sh b/PaddleNLP/neural_machine_translation/rnn_search/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/.run_ce.sh rename to PaddleNLP/neural_machine_translation/rnn_search/.run_ce.sh diff --git a/PaddleNLP/neural_machine_translation/rnn_search/README.md b/PaddleNLP/neural_machine_translation/rnn_search/README.md new file mode 100644 index 0000000000000000000000000000000000000000..86d4a021baf11e04a9fd07c05dbf50425451efab --- /dev/null +++ b/PaddleNLP/neural_machine_translation/rnn_search/README.md @@ -0,0 +1,134 @@ +运行本目录下的范例模型需要安装PaddlePaddle Fluid 1.0版。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)中的说明更新 PaddlePaddle 安装版本。 + +# 机器翻译:RNN Search + +以下是本范例模型的简要目录结构及说明: + +```text +. +├── README.md # 文档,本文件 +├── args.py # 训练、预测以及模型参数 +├── train.py # 训练主程序 +├── infer.py # 预测主程序 +├── attention_model.py # 带注意力机制的翻译模型配置 +└── no_attention_model.py # 无注意力机制的翻译模型配置 +``` + +## 简介 +机器翻译(machine translation, MT)是用计算机来实现不同语言之间翻译的技术。被翻译的语言通常称为源语言(source language),翻译成的结果语言称为目标语言(target language)。机器翻译即实现从源语言到目标语言转换的过程,是自然语言处理的重要研究领域之一。 + +近年来,深度学习技术的发展不断为机器翻译任务带来新的突破。直接用神经网络将源语言映射到目标语言,即端到端的神经网络机器翻译(End-to-End Neural Machine Translation, End-to-End NMT)模型逐渐成为主流,此类模型一般简称为NMT模型。 + +本目录包含一个经典的机器翻译模型[RNN Search](https://arxiv.org/pdf/1409.0473.pdf)的Paddle Fluid实现。事实上,RNN search是一个较为传统的NMT模型,在现阶段,其表现已被很多新模型(如[Transformer](https://arxiv.org/abs/1706.03762))超越。但除机器翻译外,该模型是许多序列到序列(sequence to sequence, 以下简称Seq2Seq)类模型的基础,很多解决其他NLP问题的模型均以此模型为基础;因此其在NLP领域具有重要意义,并被广泛用作Baseline. + +本目录下此范例模型的实现,旨在展示如何用Paddle Fluid实现一个带有注意力机制(Attention)的RNN模型来解决Seq2Seq类问题,以及如何使用带有Beam Search算法的解码器。如果您仅仅只是需要在机器翻译方面有着较好翻译效果的模型,则建议您参考[Transformer的Paddle Fluid实现](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer)。 + +## 模型概览 +RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框架结构来解决Seq2Seq类问题。这种方法先用编码器将源序列编码成vector,再用解码器将该vector解码为目标序列。这其实模拟了人类在进行翻译类任务时的行为:先解析源语言,理解其含义,再根据该含义来写出目标语言的语句。编码器和解码器往往都使用RNN来实现。关于此方法的具体原理和数学表达式,可以参考[深度学习101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html). + +本模型中,在编码器方面,我们的实现使用了双向循环神经网络(Bi-directional Recurrent Neural Network);在解码器方面,我们使用了带注意力(Attention)机制的RNN解码器,并同时提供了一个不带注意力机制的解码器实现作为对比;而在预测方面我们使用柱搜索(beam search)算法来生成翻译的目标语句。以下将分别介绍用到的这些方法。 + +### 双向循环神经网络 +这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的一种双向循环网络结构。该结构的目的是输入一个序列,得到其在每个时刻的特征表示,即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。 +具体来说,该双向循环神经网络分别在时间维以顺序和逆序——即前向(forward)和后向(backward)——依次处理输入序列,并将每个时间步RNN的输出拼接成为最终的输出层。这样每个时间步的输出节点,都包含了输入序列中当前时刻完整的过去和未来的上下文信息。下图展示的是一个按时间步展开的双向循环神经网络。该网络包含一个前向和一个后向RNN,其中有六个权重矩阵:输入到前向隐层和后向隐层的权重矩阵($W_1, W_3$),隐层到隐层自己的权重矩阵($W_2,W_5$),前向隐层和后向隐层到输出层的权重矩阵($W_4, W_6$)。注意,该网络的前向隐层和后向隐层之间没有连接。 + +
+
+图1. 按时间步展开的双向循环神经网络(原图:images/bi_rnn.png)
+
+图2. 使用双向LSTM的编码器(原图:images/encoder_attention.png)
+
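+
+下面用 Fluid 接口给出双向 LSTM 编码器的一个极简示意(仅为帮助理解上述结构,并非本目录 attention_model.py 的原实现;dict_size、emb_dim、hidden_dim 等参数均为假设):
+
+```python
+import paddle.fluid as fluid
+
+def bi_lstm_encoder(word_ids, dict_size, emb_dim=512, hidden_dim=512):
+    """双向LSTM编码器示意:前向、后向各一个LSTM,输出按时间步拼接。"""
+    emb = fluid.layers.embedding(input=word_ids, size=[dict_size, emb_dim])
+    # dynamic_lstm 要求输入宽度为 4 * hidden_dim(对应三个门和candidate)
+    fwd_in = fluid.layers.fc(input=emb, size=hidden_dim * 4)
+    fwd, _ = fluid.layers.dynamic_lstm(input=fwd_in, size=hidden_dim * 4)
+    bwd_in = fluid.layers.fc(input=emb, size=hidden_dim * 4)
+    # is_reverse=True 表示按逆序(后向)处理输入序列
+    bwd, _ = fluid.layers.dynamic_lstm(
+        input=bwd_in, size=hidden_dim * 4, is_reverse=True)
+    # 每个时间步拼接前向和后向的输出,作为该时刻的上下文表示
+    return fluid.layers.concat(input=[fwd, bwd], axis=1)
+```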
    + +### 注意力机制 +如果编码阶段的输出是一个固定维度的向量,会带来以下两个问题:1)不论源语言序列的长度是5个词还是50个词,如果都用固定维度的向量去编码其中的语义和句法结构信息,对模型来说是一个非常高的要求,特别是对长句子序列而言;2)直觉上,当人类翻译一句话时,会对与当前译文更相关的源语言片段上给予更多关注,且关注点会随着翻译的进行而改变。而固定维度的向量则相当于,任何时刻都对源语言所有信息给予了同等程度的关注,这是不合理的。因此,Bahdanau等人\[[4](#参考文献)\]引入注意力(attention)机制,可以对编码后的上下文片段进行解码,以此来解决长句子的特征学习问题。下面介绍在注意力机制下的解码器结构。 + +与简单的解码器不同,这里$z_i$的计算公式为 (由于Github原生不支持LaTeX公式,请您移步[这里](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)查看): + +$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$ + +可见,源语言句子的编码向量表示为第$i$个词的上下文片段$c_i$,即针对每一个目标语言中的词$u_i$,都有一个特定的$c_i$与之对应。$c_i$的计算公式如下: + +$$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$ + +从公式中可以看出,注意力机制是通过对编码器中各时刻的RNN状态$h_j$进行加权平均实现的。权重$a_{ij}$表示目标语言中第$i$个词对源语言中第$j$个词的注意力大小,$a_{ij}$的计算公式如下: + +$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$ +$$e_{ij} = {align(z_i, h_j)}$$ + +其中,$align$可以看作是一个对齐模型,用来衡量目标语言中第$i$个词和源语言中第$j$个词的匹配程度。具体而言,这个程度是通过解码RNN的第$i$个隐层状态$z_i$和源语言句子的第$j$个上下文片段$h_j$计算得到的。传统的对齐模型中,目标语言的每个词明确对应源语言的一个或多个词(hard alignment);而在注意力模型中采用的是soft alignment,即任何两个目标语言和源语言词间均存在一定的关联,且这个关联强度是由模型计算得到的实数,因此可以融入整个NMT框架,并通过反向传播算法进行训练。 + +
+
+图3. 基于注意力机制的解码器(原图:images/decoder_attention.png)
+
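+
+为便于对照上述公式,下面用 NumPy 给出注意力权重 $a_{ij}$ 与上下文向量 $c_i$ 计算的示意(其中 $align$ 采用加性对齐的一种常见参数化;W_z、W_h、v 均为假设的参数,并非本目录代码):
+
+```python
+import numpy as np
+
+def attention_context(z_i, h, W_z, W_h, v):
+    """z_i: 解码器当前隐状态 (d_z,); h: 编码器各时刻状态 (T, d_h)。"""
+    # e_ij = align(z_i, h_j),这里取 v^T tanh(W_z z_i + W_h h_j)
+    e = np.tanh(z_i.dot(W_z) + h.dot(W_h)).dot(v)      # 形状 (T,)
+    # a_ij = exp(e_ij) / sum_k exp(e_ik),即对 e 做 softmax
+    a = np.exp(e - e.max())
+    a /= a.sum()
+    # c_i = sum_j a_ij * h_j,对编码器各时刻状态加权平均
+    c_i = (a[:, None] * h).sum(axis=0)
+    return c_i, a
+```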
    + +### 柱搜索算法 + +柱搜索([beam search](http://en.wikipedia.org/wiki/Beam_search))是一种启发式图搜索算法,用于在图或树中搜索有限集合中的最优扩展节点,通常用在解空间非常大的系统(如机器翻译、语音识别)中,原因是内存无法装下图或树中所有展开的解。如在机器翻译任务中希望翻译“`你好`”,就算目标语言字典中只有3个词(``, ``, `hello`),也可能生成无限句话(`hello`循环出现的次数不定),为了找到其中较好的翻译结果,我们可采用柱搜索算法。 + +柱搜索算法使用广度优先策略建立搜索树,在树的每一层,按照启发代价(heuristic cost)(本教程中,为生成词的log概率之和)对节点进行排序,然后仅留下预先确定的个数(文献中通常称为beam width、beam size、柱宽度等)的节点。只有这些节点会在下一层继续扩展,其他节点就被剪掉了,也就是说保留了质量较高的节点,剪枝了质量较差的节点。因此,搜索所占用的空间和时间大幅减少,但缺点是无法保证一定获得最优解。 + +使用柱搜索算法的解码阶段,目标是最大化生成序列的概率。思路是: + +1. 每一个时刻,根据源语言句子的编码信息$c$、生成的第$i$个目标语言序列单词$u_i$和$i$时刻RNN的隐层状态$z_i$,计算出下一个隐层状态$z_{i+1}$。 +2. 将$z_{i+1}$通过`softmax`归一化,得到目标语言序列的第$i+1$个单词的概率分布$p_{i+1}$。 +3. 根据$p_{i+1}$采样出单词$u_{i+1}$。 +4. 重复步骤1~3,直到获得句子结束标记``或超过句子的最大生成长度为止。 + +注意:$z_{i+1}$和$p_{i+1}$的计算公式同解码器中的一样。且由于生成时的每一步都是通过贪心法实现的,因此并不能保证得到全局最优解。 + +## 数据介绍 + +本教程使用[WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/)数据集中的[bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz)作为训练集,[dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz)作为测试集和生成集。 + +### 数据预处理 + +我们的预处理流程包括两步: +- 将每个源语言到目标语言的平行语料库文件合并为一个文件: + - 合并每个`XXX.src`和`XXX.trg`文件为`XXX`。 + - `XXX`中的第$i$行内容为`XXX.src`中的第$i$行和`XXX.trg`中的第$i$行连接,用'\t'分隔。 +- 创建训练数据的“源字典”和“目标字典”。每个字典都有**DICTSIZE**个单词,包括:语料中词频最高的(DICTSIZE - 3)个单词,和3个特殊符号``(序列的开始)、``(序列的结束)和``(未登录词)。 + +### 示例数据 + +因为完整的数据集数据量较大,为了验证训练流程,PaddlePaddle接口paddle.dataset.wmt14中默认提供了一个经过预处理的[较小规模的数据集](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz)。 + +该数据集有193319条训练数据,6003条测试数据,词典长度为30000。因为数据规模限制,使用该数据集训练出来的模型效果无法保证。 + +## 训练模型 + +`train.py`包含训练程序的主函数,要使用默认参数开始训练,只需要简单地执行: +```sh +python train.py +``` +您可以使用命令行参数来设置模型训练时的参数。要显示所有可用的命令行参数,执行: +```sh +python train.py -h +``` +这样会显示所有的命令行参数的描述,以及其默认值。默认的模型是带有注意力机制的。您也可以尝试运行无注意力机制的模型,命令如下: +```sh +python train.py --no_attention +``` +训练好的模型默认会被保存到`./models`路径下。您可以用命令行参数`--save_dir`来指定模型的保存路径。默认每个pass结束时会保存一个模型。 + +## 生成预测结果 + +在模型训练好后,可以用`infer.py`来生成预测结果。同样的,使用默认参数,只需要执行: +```sh +python infer.py +``` +您也可以同样用命令行来指定各参数。注意,预测时的参数设置必须与训练时完全一致,否则载入模型会失败。您可以用`--pass_num`参数来选择读取哪个pass结束时保存的模型。同时您可以使用`--beam_width`参数来选择beam search宽度。 + +## 参考文献 + +1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009. +2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734. +3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014. +4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015. +5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318. + +
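+
+附:柱搜索单步扩展的极简示意,帮助理解上文“排序后仅保留 beam_width 个节点”的剪枝过程(与本目录 infer.py 的实现无关,数据结构均为假设):
+
+```python
+import numpy as np
+
+def beam_search_step(beams, next_probs, beam_width):
+    """beams: [(词id序列, 累计log概率)]; next_probs: 各beam的下一词概率分布。"""
+    candidates = []
+    for (tokens, logp), probs in zip(beams, next_probs):
+        # 对每个beam,仅取概率最高的 beam_width 个词进行扩展
+        for wid in np.argsort(probs)[-beam_width:]:
+            candidates.append((tokens + [int(wid)], logp + np.log(probs[wid])))
+    # 按启发代价(生成词的log概率之和)排序,保留前 beam_width 个,其余剪枝
+    candidates.sort(key=lambda c: c[1], reverse=True)
+    return candidates[:beam_width]
+```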
+知识共享许可协议
    本教程PaddlePaddle 创作,采用 知识共享 署名-相同方式共享 4.0 国际 许可协议进行许可。 diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/_ce.py b/PaddleNLP/neural_machine_translation/rnn_search/_ce.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/_ce.py rename to PaddleNLP/neural_machine_translation/rnn_search/_ce.py diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/args.py b/PaddleNLP/neural_machine_translation/rnn_search/args.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/args.py rename to PaddleNLP/neural_machine_translation/rnn_search/args.py diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/attention_model.py b/PaddleNLP/neural_machine_translation/rnn_search/attention_model.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/attention_model.py rename to PaddleNLP/neural_machine_translation/rnn_search/attention_model.py diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/images/bi_rnn.png b/PaddleNLP/neural_machine_translation/rnn_search/images/bi_rnn.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/images/bi_rnn.png rename to PaddleNLP/neural_machine_translation/rnn_search/images/bi_rnn.png diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/images/decoder_attention.png b/PaddleNLP/neural_machine_translation/rnn_search/images/decoder_attention.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/images/decoder_attention.png rename to PaddleNLP/neural_machine_translation/rnn_search/images/decoder_attention.png diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/images/encoder_attention.png b/PaddleNLP/neural_machine_translation/rnn_search/images/encoder_attention.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/images/encoder_attention.png rename to PaddleNLP/neural_machine_translation/rnn_search/images/encoder_attention.png diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/infer.py b/PaddleNLP/neural_machine_translation/rnn_search/infer.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/infer.py rename to PaddleNLP/neural_machine_translation/rnn_search/infer.py diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/no_attention_model.py b/PaddleNLP/neural_machine_translation/rnn_search/no_attention_model.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/no_attention_model.py rename to PaddleNLP/neural_machine_translation/rnn_search/no_attention_model.py diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/train.py b/PaddleNLP/neural_machine_translation/rnn_search/train.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/rnn_search/train.py rename to PaddleNLP/neural_machine_translation/rnn_search/train.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/.gitignore b/PaddleNLP/neural_machine_translation/transformer/.gitignore similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/.gitignore rename to PaddleNLP/neural_machine_translation/transformer/.gitignore diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/.run_ce.sh b/PaddleNLP/neural_machine_translation/transformer/.run_ce.sh similarity index 100% rename from 
fluid/PaddleNLP/neural_machine_translation/transformer/.run_ce.sh rename to PaddleNLP/neural_machine_translation/transformer/.run_ce.sh diff --git a/PaddleNLP/neural_machine_translation/transformer/README.md b/PaddleNLP/neural_machine_translation/transformer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6fea167b5e7c3e9dd759ef30d9225b451350e889 --- /dev/null +++ b/PaddleNLP/neural_machine_translation/transformer/README.md @@ -0,0 +1,23 @@ +The minimum PaddlePaddle version needed for the code samples in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + +# Attention is All You Need: A Paddle Fluid implementation + +This is a Paddle Fluid implementation of the Transformer model in [Attention is All You Need](https://arxiv.org/abs/1706.03762) (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arXiv, 2017). + +If you use the dataset/code in your research, please cite the paper: + +```text +@inproceedings{vaswani2017attention, + title={Attention is all you need}, + author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, + booktitle={Advances in Neural Information Processing Systems}, + pages={6000--6010}, + year={2017} +} +``` + +### TODO + +This project is still under active development. diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md b/PaddleNLP/neural_machine_translation/transformer/README_cn.md similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md rename to PaddleNLP/neural_machine_translation/transformer/README_cn.md diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/_ce.py b/PaddleNLP/neural_machine_translation/transformer/_ce.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/_ce.py rename to PaddleNLP/neural_machine_translation/transformer/_ce.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/config.py b/PaddleNLP/neural_machine_translation/transformer/config.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/config.py rename to PaddleNLP/neural_machine_translation/transformer/config.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/gen_data.sh b/PaddleNLP/neural_machine_translation/transformer/gen_data.sh similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/gen_data.sh rename to PaddleNLP/neural_machine_translation/transformer/gen_data.sh diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png b/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png rename to PaddleNLP/neural_machine_translation/transformer/images/attention_formula.png diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/images/multi_head_attention.png b/PaddleNLP/neural_machine_translation/transformer/images/multi_head_attention.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/images/multi_head_attention.png rename to
PaddleNLP/neural_machine_translation/transformer/images/multi_head_attention.png diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/images/transformer_network.png b/PaddleNLP/neural_machine_translation/transformer/images/transformer_network.png similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/images/transformer_network.png rename to PaddleNLP/neural_machine_translation/transformer/images/transformer_network.png diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/infer.py b/PaddleNLP/neural_machine_translation/transformer/infer.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/infer.py rename to PaddleNLP/neural_machine_translation/transformer/infer.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/local_dist.sh b/PaddleNLP/neural_machine_translation/transformer/local_dist.sh similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/local_dist.sh rename to PaddleNLP/neural_machine_translation/transformer/local_dist.sh diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/model.py b/PaddleNLP/neural_machine_translation/transformer/model.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/model.py rename to PaddleNLP/neural_machine_translation/transformer/model.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/optim.py b/PaddleNLP/neural_machine_translation/transformer/optim.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/optim.py rename to PaddleNLP/neural_machine_translation/transformer/optim.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/profile.py b/PaddleNLP/neural_machine_translation/transformer/profile.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/profile.py rename to PaddleNLP/neural_machine_translation/transformer/profile.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/reader.py b/PaddleNLP/neural_machine_translation/transformer/reader.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/reader.py rename to PaddleNLP/neural_machine_translation/transformer/reader.py diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/train.py b/PaddleNLP/neural_machine_translation/transformer/train.py similarity index 100% rename from fluid/PaddleNLP/neural_machine_translation/transformer/train.py rename to PaddleNLP/neural_machine_translation/transformer/train.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/.run_ce.sh b/PaddleNLP/sequence_tagging_for_ner/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/.run_ce.sh rename to PaddleNLP/sequence_tagging_for_ner/.run_ce.sh diff --git a/PaddleNLP/sequence_tagging_for_ner/README.md b/PaddleNLP/sequence_tagging_for_ner/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6d4efa9eb19dd708a87d4883dccef5ecb5e11666 --- /dev/null +++ b/PaddleNLP/sequence_tagging_for_ner/README.md @@ -0,0 +1,116 @@ +# 命名实体识别 + +以下是本例的简要目录结构及说明: + +```text +. 
+├── data # 存储运行本例所依赖的数据,从外部获取 +├── network_conf.py # 模型定义 +├── reader.py # 数据读取接口, 从外部获取 +├── README.md # 文档 +├── train.py # 训练脚本 +├── infer.py # 预测脚本 +├── utils.py # 定义通用的函数, 从外部获取 +└── utils_extend.py # 对utils.py的拓展 +``` + + +## 简介,模型详解 + +在PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/legacy/sequence_tagging_for_ner/README.md)中对于命名实体识别任务有较详细的介绍,在本例中不再重复介绍。 +在模型上,我们沿用了v2版本的模型结构,唯一区别是我们使用LSTM代替原始的RNN。 + +## 数据获取 + +完整数据的获取请参考PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/legacy/sequence_tagging_for_ner/README.md) 一节中的方式。本例的示例数据同样可以通过运行data/download.sh来获取。 + +## 训练 + +1. 运行 `sh data/download.sh` +2. 修改 `train.py` 的 `main` 函数,指定数据路径 + + ```python + main( + train_data_file="data/train", + test_data_file="data/test", + vocab_file="data/vocab.txt", + target_file="data/target.txt", + emb_file="data/wordVectors.txt", + model_save_dir="models", + num_passes=1000, + use_gpu=False, + parallel=False) + ``` + +3. 运行命令 `python train.py` ,**需要注意:直接运行使用的是示例数据,请替换真实的标记数据。** + + ```text + Pass 127, Batch 9525, Cost 4.0867705, Precision 0.3954984, Recall 0.37846154, F1_score0.38679245 + Pass 127, Batch 9530, Cost 3.137265, Precision 0.42971888, Recall 0.38351256, F1_score0.405303 + Pass 127, Batch 9535, Cost 3.6240938, Precision 0.4272152, Recall 0.41795665, F1_score0.4225352 + Pass 127, Batch 9540, Cost 3.5352352, Precision 0.48464164, Recall 0.4536741, F1_score0.46864685 + Pass 127, Batch 9545, Cost 4.1130385, Precision 0.40131578, Recall 0.3836478, F1_score0.39228293 + Pass 127, Batch 9550, Cost 3.6826708, Precision 0.43333334, Recall 0.43730888, F1_score0.43531203 + Pass 127, Batch 9555, Cost 3.6363933, Precision 0.42424244, Recall 0.3962264, F1_score0.4097561 + Pass 127, Batch 9560, Cost 3.6101768, Precision 0.51363635, Recall 0.353125, F1_score0.41851854 + Pass 127, Batch 9565, Cost 3.5935276, Precision 0.5152439, Recall 0.5, F1_score0.5075075 + Pass 127, Batch 9570, Cost 3.4987144, Precision 0.5, Recall 0.4330218, F1_score0.46410686 + Pass 127, Batch 9575, Cost 3.4659843, Precision 0.39864865, Recall 0.38064516, F1_score0.38943896 + Pass 127, Batch 9580, Cost 3.1702557, Precision 0.5, Recall 0.4490446, F1_score0.47315437 + Pass 127, Batch 9585, Cost 3.1587276, Precision 0.49377593, Recall 0.4089347, F1_score0.4473684 + Pass 127, Batch 9590, Cost 3.5043538, Precision 0.4556962, Recall 0.4600639, F1_score0.45786962 + Pass 127, Batch 9595, Cost 2.981989, Precision 0.44981414, Recall 0.45149255, F1_score0.4506518 + [TrainSet] pass_id:127 pass_precision:[0.46023396] pass_recall:[0.43197003] pass_f1_score:[0.44565433] + [TestSet] pass_id:127 pass_precision:[0.4708409] pass_recall:[0.47971722] pass_f1_score:[0.4752376] + ``` +## 预测 +1. 修改 [infer.py](./infer.py) 的 `infer` 函数,指定:需要测试的模型的路径、测试数据、字典文件,预测标记文件的路径,默认参数如下: + + ```python + infer( + model_path="models/params_pass_0", + batch_size=6, + test_data_file="data/test", + vocab_file="data/vocab.txt", + target_file="data/target.txt", + use_gpu=False + ) + ``` + +2. 在终端运行 `python infer.py`,开始测试,会看到如下预测结果(以下为训练70个pass所得模型的部分预测结果): + + ```text + leicestershire B-ORG B-LOC + extended O O + their O O + first O O + innings O O + by O O + DGDG O O + runs O O + before O O + being O O + bowled O O + out O O + for O O + 296 O O + with O O + england B-LOC B-LOC + discard O O + andy B-PER B-PER + caddick I-PER I-PER + taking O O + three O O + for O O + DGDG O O + . O O + ``` + + 输出分为三列,以“\t” 分隔,第一列是输入的词语,第二列是标准结果,第三列为生成的标记结果。多条输入序列之间以空行分隔。 + +## 结果示例 + +
+
+图1. 学习曲线,横轴表示训练轮数,纵轴表示F1值(原图:imgs/convergence_curve.png)
+
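+
+附:预测输出为上述“词\t标准标签\t预测标签”三列、空行分隔的格式。下面是一个解析该输出的小脚本示意(假设结果已保存到 pred.txt;该脚本并非本目录代码):
+
+```python
+def load_predictions(path="pred.txt"):
+    """按空行切分序列,每行拆为 (词, 标准标签, 预测标签) 三元组。"""
+    sentences, cur = [], []
+    for line in open(path):
+        line = line.rstrip("\n")
+        if not line:                 # 空行分隔不同的输入序列
+            if cur:
+                sentences.append(cur)
+                cur = []
+            continue
+        word, gold, pred = line.split("\t")
+        cur.append((word, gold, pred))
+    if cur:
+        sentences.append(cur)
+    return sentences
+```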
    diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/_ce.py b/PaddleNLP/sequence_tagging_for_ner/_ce.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/_ce.py rename to PaddleNLP/sequence_tagging_for_ner/_ce.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/data/download.sh b/PaddleNLP/sequence_tagging_for_ner/data/download.sh similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/data/download.sh rename to PaddleNLP/sequence_tagging_for_ner/data/download.sh diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/data/target.txt b/PaddleNLP/sequence_tagging_for_ner/data/target.txt similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/data/target.txt rename to PaddleNLP/sequence_tagging_for_ner/data/target.txt diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/data/test b/PaddleNLP/sequence_tagging_for_ner/data/test similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/data/test rename to PaddleNLP/sequence_tagging_for_ner/data/test diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/data/train b/PaddleNLP/sequence_tagging_for_ner/data/train similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/data/train rename to PaddleNLP/sequence_tagging_for_ner/data/train diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/imgs/convergence_curve.png b/PaddleNLP/sequence_tagging_for_ner/imgs/convergence_curve.png similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/imgs/convergence_curve.png rename to PaddleNLP/sequence_tagging_for_ner/imgs/convergence_curve.png diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/infer.py b/PaddleNLP/sequence_tagging_for_ner/infer.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/infer.py rename to PaddleNLP/sequence_tagging_for_ner/infer.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/network_conf.py b/PaddleNLP/sequence_tagging_for_ner/network_conf.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/network_conf.py rename to PaddleNLP/sequence_tagging_for_ner/network_conf.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/reader.py b/PaddleNLP/sequence_tagging_for_ner/reader.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/reader.py rename to PaddleNLP/sequence_tagging_for_ner/reader.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/train.py b/PaddleNLP/sequence_tagging_for_ner/train.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/train.py rename to PaddleNLP/sequence_tagging_for_ner/train.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/utils.py b/PaddleNLP/sequence_tagging_for_ner/utils.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/utils.py rename to PaddleNLP/sequence_tagging_for_ner/utils.py diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/utils_extend.py b/PaddleNLP/sequence_tagging_for_ner/utils_extend.py similarity index 100% rename from fluid/PaddleNLP/sequence_tagging_for_ner/utils_extend.py rename to PaddleNLP/sequence_tagging_for_ner/utils_extend.py diff --git a/fluid/PaddleNLP/text_classification/.run_ce.sh b/PaddleNLP/text_classification/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/text_classification/.run_ce.sh rename to PaddleNLP/text_classification/.run_ce.sh diff --git a/PaddleNLP/text_classification/README.md b/PaddleNLP/text_classification/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..669774bac04fe906cc5bffafa1f60de60323c806 --- /dev/null +++ b/PaddleNLP/text_classification/README.md @@ -0,0 +1,112 @@ +# 文本分类 + +以下是本例的简要目录结构及说明: + +```text +. +├── nets.py # 模型定义 +├── README.md # 文档 +├── train.py # 训练脚本 +├── infer.py # 预测脚本 +└── utils.py # 定义通用函数,从外部获取 +``` + + +## 简介,模型详解 + +在PaddlePaddle v2版本[文本分类](https://github.com/PaddlePaddle/models/blob/develop/legacy/text_classification/README.md)中对于文本分类任务有较详细的介绍,在本例中不再重复介绍。 +在模型上,我们采用了bow, cnn, lstm, gru四种常见的文本分类模型。 + +## 训练 + +1. 运行命令 `python train.py bow` 开始训练模型。 + ```python + python train.py bow # bow指定网络结构,可替换成cnn, lstm, gru + ``` + +2. (可选)想自定义网络结构,需在[nets.py](./nets.py)中自行添加,并设置[train.py](./train.py)中的相应参数。 + ```python + def train(train_reader, # 训练数据 + word_dict, # 数据字典 + network, # 模型配置 + use_cuda, # 是否用GPU + parallel, # 是否并行 + save_dirname, # 保存模型路径 + lr=0.2, # 学习率大小 + batch_size=128, # 每个batch的样本数 + pass_num=30): # 训练的轮数 + ``` + +## 训练结果示例 +```text + pass_id: 0, avg_acc: 0.848040, avg_cost: 0.354073 + pass_id: 1, avg_acc: 0.914200, avg_cost: 0.217945 + pass_id: 2, avg_acc: 0.929800, avg_cost: 0.184302 + pass_id: 3, avg_acc: 0.938680, avg_cost: 0.164240 + pass_id: 4, avg_acc: 0.945120, avg_cost: 0.149150 + pass_id: 5, avg_acc: 0.951280, avg_cost: 0.137117 + pass_id: 6, avg_acc: 0.955360, avg_cost: 0.126434 + pass_id: 7, avg_acc: 0.961400, avg_cost: 0.117405 + pass_id: 8, avg_acc: 0.963560, avg_cost: 0.110070 + pass_id: 9, avg_acc: 0.965840, avg_cost: 0.103273 + pass_id: 10, avg_acc: 0.969800, avg_cost: 0.096314 + pass_id: 11, avg_acc: 0.971720, avg_cost: 0.090206 + pass_id: 12, avg_acc: 0.974800, avg_cost: 0.084970 + pass_id: 13, avg_acc: 0.977400, avg_cost: 0.078981 + pass_id: 14, avg_acc: 0.980000, avg_cost: 0.073685 + pass_id: 15, avg_acc: 0.981080, avg_cost: 0.069898 + pass_id: 16, avg_acc: 0.982080, avg_cost: 0.064923 + pass_id: 17, avg_acc: 0.984680, avg_cost: 0.060861 + pass_id: 18, avg_acc: 0.985840, avg_cost: 0.057095 + pass_id: 19, avg_acc: 0.988080, avg_cost: 0.052424 + pass_id: 20, avg_acc: 0.989160, avg_cost: 0.049059 + pass_id: 21, avg_acc: 0.990120, avg_cost: 0.045882 + pass_id: 22, avg_acc: 0.992080, avg_cost: 0.042140 + pass_id: 23, avg_acc: 0.992280, avg_cost: 0.039722 + pass_id: 24, avg_acc: 0.992840, avg_cost: 0.036607 + pass_id: 25, avg_acc: 0.994440, avg_cost: 0.034040 + pass_id: 26, avg_acc: 0.995000, avg_cost: 0.031501 + pass_id: 27, avg_acc: 0.995440, avg_cost: 0.028988 + pass_id: 28, avg_acc: 0.996240, avg_cost: 0.026639 + pass_id: 29, avg_acc: 0.996960, avg_cost: 0.024186 +``` + +## 预测 +1. 
运行命令 `python infer.py bow_model`, 开始预测。 + ```python + python infer.py bow_model # bow_model指定需要导入的模型 + ``` + +## 预测结果示例 +```text + model_path: bow_model/epoch0, avg_acc: 0.882800 + model_path: bow_model/epoch1, avg_acc: 0.882360 + model_path: bow_model/epoch2, avg_acc: 0.881400 + model_path: bow_model/epoch3, avg_acc: 0.877800 + model_path: bow_model/epoch4, avg_acc: 0.872920 + model_path: bow_model/epoch5, avg_acc: 0.872640 + model_path: bow_model/epoch6, avg_acc: 0.869960 + model_path: bow_model/epoch7, avg_acc: 0.865160 + model_path: bow_model/epoch8, avg_acc: 0.863680 + model_path: bow_model/epoch9, avg_acc: 0.861200 + model_path: bow_model/epoch10, avg_acc: 0.853520 + model_path: bow_model/epoch11, avg_acc: 0.850400 + model_path: bow_model/epoch12, avg_acc: 0.855960 + model_path: bow_model/epoch13, avg_acc: 0.853480 + model_path: bow_model/epoch14, avg_acc: 0.855960 + model_path: bow_model/epoch15, avg_acc: 0.854120 + model_path: bow_model/epoch16, avg_acc: 0.854160 + model_path: bow_model/epoch17, avg_acc: 0.852240 + model_path: bow_model/epoch18, avg_acc: 0.852320 + model_path: bow_model/epoch19, avg_acc: 0.850280 + model_path: bow_model/epoch20, avg_acc: 0.849760 + model_path: bow_model/epoch21, avg_acc: 0.850160 + model_path: bow_model/epoch22, avg_acc: 0.846800 + model_path: bow_model/epoch23, avg_acc: 0.845440 + model_path: bow_model/epoch24, avg_acc: 0.845640 + model_path: bow_model/epoch25, avg_acc: 0.846200 + model_path: bow_model/epoch26, avg_acc: 0.845880 + model_path: bow_model/epoch27, avg_acc: 0.844880 + model_path: bow_model/epoch28, avg_acc: 0.844680 + model_path: bow_model/epoch29, avg_acc: 0.844960 +``` +注:由于模型在训练集上过拟合,acc会持续下降,属正常现象,可以忽略。 diff --git a/fluid/PaddleNLP/text_classification/_ce.py b/PaddleNLP/text_classification/_ce.py similarity index 100% rename from fluid/PaddleNLP/text_classification/_ce.py rename to PaddleNLP/text_classification/_ce.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/README.md b/PaddleNLP/text_classification/async_executor/README.md similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/README.md rename to PaddleNLP/text_classification/async_executor/README.md diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator.sh b/PaddleNLP/text_classification/async_executor/data_generator.sh similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_generator.sh rename to PaddleNLP/text_classification/async_executor/data_generator.sh diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator/IMDB.py b/PaddleNLP/text_classification/async_executor/data_generator/IMDB.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_generator/IMDB.py rename to PaddleNLP/text_classification/async_executor/data_generator/IMDB.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py b/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py rename to PaddleNLP/text_classification/async_executor/data_generator/build_raw_data.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator/data_generator.py b/PaddleNLP/text_classification/async_executor/data_generator/data_generator.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_generator/data_generator.py rename to
PaddleNLP/text_classification/async_executor/data_generator/data_generator.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_generator/splitfile.py b/PaddleNLP/text_classification/async_executor/data_generator/splitfile.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_generator/splitfile.py rename to PaddleNLP/text_classification/async_executor/data_generator/splitfile.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/data_reader.py b/PaddleNLP/text_classification/async_executor/data_reader.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/data_reader.py rename to PaddleNLP/text_classification/async_executor/data_reader.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/infer.py b/PaddleNLP/text_classification/async_executor/infer.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/infer.py rename to PaddleNLP/text_classification/async_executor/infer.py diff --git a/fluid/PaddleNLP/text_classification/async_executor/train.py b/PaddleNLP/text_classification/async_executor/train.py similarity index 100% rename from fluid/PaddleNLP/text_classification/async_executor/train.py rename to PaddleNLP/text_classification/async_executor/train.py diff --git a/fluid/PaddleNLP/text_classification/clouds/scdb_parallel_executor.py b/PaddleNLP/text_classification/clouds/scdb_parallel_executor.py similarity index 100% rename from fluid/PaddleNLP/text_classification/clouds/scdb_parallel_executor.py rename to PaddleNLP/text_classification/clouds/scdb_parallel_executor.py diff --git a/fluid/PaddleNLP/text_classification/clouds/scdb_single_card.py b/PaddleNLP/text_classification/clouds/scdb_single_card.py similarity index 100% rename from fluid/PaddleNLP/text_classification/clouds/scdb_single_card.py rename to PaddleNLP/text_classification/clouds/scdb_single_card.py diff --git a/fluid/PaddleNLP/text_classification/infer.py b/PaddleNLP/text_classification/infer.py similarity index 100% rename from fluid/PaddleNLP/text_classification/infer.py rename to PaddleNLP/text_classification/infer.py diff --git a/fluid/PaddleNLP/text_classification/nets.py b/PaddleNLP/text_classification/nets.py similarity index 100% rename from fluid/PaddleNLP/text_classification/nets.py rename to PaddleNLP/text_classification/nets.py diff --git a/fluid/PaddleNLP/text_classification/train.py b/PaddleNLP/text_classification/train.py similarity index 100% rename from fluid/PaddleNLP/text_classification/train.py rename to PaddleNLP/text_classification/train.py diff --git a/fluid/PaddleNLP/text_classification/utils.py b/PaddleNLP/text_classification/utils.py similarity index 100% rename from fluid/PaddleNLP/text_classification/utils.py rename to PaddleNLP/text_classification/utils.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/.run_ce.sh b/PaddleNLP/text_matching_on_quora/.run_ce.sh similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/.run_ce.sh rename to PaddleNLP/text_matching_on_quora/.run_ce.sh diff --git a/PaddleNLP/text_matching_on_quora/README.md b/PaddleNLP/text_matching_on_quora/README.md new file mode 100644 index 0000000000000000000000000000000000000000..77d93943ae7dcbe775e60307b74430f320dbaab1 --- /dev/null +++ b/PaddleNLP/text_matching_on_quora/README.md @@ -0,0 +1,177 @@ +# Text matching on Quora qestion-answer pair dataset + +## contents + +* [Introduction](#introduction) + * [a brief review of the Quora Question Pair (QQP) 
Task](#a-brief-review-of-the-quora-question-pair-qqp-task) + * [Our Work](#our-work) +* [Environment Preparation](#environment-preparation) + * [Install Fluid release 1.0](#install-fluid-release-10) + * [cpu version](#cpu-version) + * [gpu version](#gpu-version) + * [Have I installed Fluid successfully?](#have-i-installed-fluid-successfully) +* [Prepare Data](#prepare-data) +* [Train and evaluate](#train-and-evaluate) +* [Models](#models) +* [Results](#results) + + +## Introduction + +### a brief review of the Quora Question Pair (QQP) Task + +The [Quora Question Pair](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) dataset contains 400,000 question pairs from [Quora](https://www.quora.com/), where people ask and answer questions related to specific areas. Each sample in the dataset consists of two questions (both English) and a label that represents whether the questions are duplicate. The dataset is well annotated by human. + +Below are two samples from the dataset. The last column indicates whether the two questions are duplicate (1) or not (0). + +|id | qid1 | qid2| question1| question2| is_duplicate +|:---:|:---:|:---:|:---:|:---:|:---:| +|0 |1 |2 |What is the step by step guide to invest in share market in india? |What is the step by step guide to invest in share market? |0| +|1 |3 |4 |What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? |0| + + A [kaggle competition](https://www.kaggle.com/c/quora-question-pairs#description) was held based on this dataset in 2017. The kagglers were given a training dataset (with labels), and requested to make predictions on a test dataset (without labels). The predictions were evaluated by the log-likelihood loss on the test data. + +The kaggle competition has inspired much effective work. However, most of these models are rule-based and difficult to be transferred to new tasks. Researchers are seeking for more general models that work well on this task and other natual language processing (NLP) tasks. + +[Wang _et al._](https://arxiv.org/abs/1702.03814) proposed a bilateral multi-perspective matching (BIMPM) model based on the Quora Question Pair dataset. They splitted the original dataset to [3 parts](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing): _train.tsv_ (384,348 samples), _dev.tsv_ (10,000 samples) and _test.tsv_ (10,000 samples). The class distribution of _train.tsv_ is unbalanced (37% positive and 63% negative), while those of _dev.tsv_ and _test.tsv_ are balanced(50% positive and 50% negetive). We used the same splitting method in our experiments. + +### Our Work + +Based on the Quora Question Pair Dataset, we implemented some classic models in the area of neural language understanding (NLU). The accuracy of prediction results are evaluated on the _test.tsv_ from [Wang _et al._](https://arxiv.org/abs/1702.03814). + +## Environment Preparation + +### Install Fluid release 1.0 + +Please follow the [official document in English](http://www.paddlepaddle.org/documentation/docs/en/1.0/build_and_install/pip_install_en.html) or [official document in Chinese](http://www.paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/install/Start.html) to install the Fluid deep learning framework. + +#### Have I installed Fluid successfully? + +Run the following script from your command line: + +```shell +python -c "import paddle" +``` + +If Fluid is installed successfully you should see no error message. 
Feel free to open issues under the [PaddlePaddle repository](https://github.com/PaddlePaddle/Paddle/issues) for support. + +## Prepare Data + +Please download the Quora dataset from [Google drive](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing) and unzip to $HOME/.cache/paddle/dataset. + +Then run _data/prepare_quora_data.sh_ to download the pre-trained _word2vec_ embedding file -- _glove.840B.300d.zip_: + +```shell +sh data/prepare_quora_data.sh +``` + +At this point the dataset directory ($HOME/.cache/paddle/dataset) structure should be: + +```shell + +$HOME/.cache/paddle/dataset + |- Quora_question_pair_partition + |- train.tsv + |- test.tsv + |- dev.tsv + |- readme.txt + |- wordvec.txt + |- glove.840B.300d.txt +``` + +## Train and evaluate + +We provide multiple models and configurations. Details are shown in `models` and `configs` directories. For a quick start, please run the _cdssmNet_ model with the corresponding configuration: + +```shell +python train_and_evaluate.py \ + --model_name=cdssmNet \ + --config=cdssm_base +``` + +Logs will be output to the console. If everything works well, the logging information will have the same formats as the content in _cdssm_base.log_. + +All configurations used in our experiments are as follows: + +|Model|Config|command +|:----:|:----:|:----:| +|cdssmNet|cdssm_base|python train_and_evaluate.py --model_name=cdssmNet --config=cdssm_base +|DecAttNet|decatt_glove|python train_and_evaluate.py --model_name=DecAttNet --config=decatt_glove +|InferSentNet|infer_sent_v1|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v1 +|InferSentNet|infer_sent_v2|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v2 +|SSENet|sse_base|python train_and_evaluate.py --model_name=SSENet --config=sse_base + +## Models + +We implemeted 4 models for now: the convolutional deep-structured semantic model (CDSSM, CNN-based), the InferSent model (RNN-based), the shortcut-stacked encoder (SSE, RNN-based), and the decomposed attention model (DecAtt, attention-based). + +|Model|features|Context Encoder|Match Layer|Classification Layer +|:----:|:----:|:----:|:----:|:----:| +|CDSSM|word|1 layer conv1d|concatenation|MLP +|DecAtt|word|Attention|concatenation|MLP +|InferSent|word|1 layer Bi-LSTM|concatenation/element-wise product/
    absolute element-wise difference|MLP +|SSE|word|3 layer Bi-LSTM|concatenation/element-wise product/
    absolute element-wise difference|MLP + +### CDSSM + +``` +@inproceedings{shen2014learning, + title={Learning semantic representations using convolutional neural networks for web search}, + author={Shen, Yelong and He, Xiaodong and Gao, Jianfeng and Deng, Li and Mesnil, Gr{\'e}goire}, + booktitle={Proceedings of the 23rd International Conference on World Wide Web}, + pages={373--374}, + year={2014}, + organization={ACM} +} +``` + +### InferSent + +``` +@article{conneau2017supervised, + title={Supervised learning of universal sentence representations from natural language inference data}, + author={Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine}, + journal={arXiv preprint arXiv:1705.02364}, + year={2017} +} +``` + +### SSE + +``` +@article{nie2017shortcut, + title={Shortcut-stacked sentence encoders for multi-domain inference}, + author={Nie, Yixin and Bansal, Mohit}, + journal={arXiv preprint arXiv:1708.02312}, + year={2017} +} +``` + +### DecAtt + +``` +@article{tomar2017neural, + title={Neural paraphrase identification of questions with noisy pretraining}, + author={Tomar, Gaurav Singh and Duque, Thyago and T{\"a}ckstr{\"o}m, Oscar and Uszkoreit, Jakob and Das, Dipanjan}, + journal={arXiv preprint arXiv:1704.04565}, + year={2017} +} +``` + +## Results + +|Model|Config|dev accuracy| test accuracy +|:----:|:----:|:----:|:----:| +|cdssmNet|cdssm_base|83.56%|82.83%| +|DecAttNet|decatt_glove|86.31%|86.22%| +|InferSentNet|infer_sent_v1|87.15%|86.62%| +|InferSentNet|infer_sent_v2|88.55%|88.43%| +|SSENet|sse_base|88.35%|88.25%| + +In our experiment, we found that LSTM-based models outperformed convolution-based models. The DecAtt model has fewer parameters than LSTM-based models, but is sensitive to hyper-parameters. + +
+
+Figure: test accuracy of the models (imgs/models_test_acc.png, alt: test_acc)
+
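+
+As a quick illustration of the match layer column in the models table above, the InferSent/SSE-style matching features can be sketched in NumPy as follows (a minimal sketch, not the code in models/match_layers.py):
+
+```python
+import numpy as np
+
+def match_features(u, v):
+    # Combine two sentence vectors by concatenation, element-wise
+    # product and absolute element-wise difference; the result is
+    # what the MLP classification layer consumes.
+    return np.concatenate([u, v, u * v, np.abs(u - v)], axis=-1)
+```
+
+For two 300-d sentence vectors this yields a 1200-d matching vector.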
    diff --git a/fluid/PaddleCV/yolov3/models/__init__.py b/PaddleNLP/text_matching_on_quora/__init__.py similarity index 100% rename from fluid/PaddleCV/yolov3/models/__init__.py rename to PaddleNLP/text_matching_on_quora/__init__.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/_ce.py b/PaddleNLP/text_matching_on_quora/_ce.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/_ce.py rename to PaddleNLP/text_matching_on_quora/_ce.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/cdssm_base.log b/PaddleNLP/text_matching_on_quora/cdssm_base.log similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/cdssm_base.log rename to PaddleNLP/text_matching_on_quora/cdssm_base.log diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/__init__.py b/PaddleNLP/text_matching_on_quora/configs/__init__.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/__init__.py rename to PaddleNLP/text_matching_on_quora/configs/__init__.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/basic_config.py b/PaddleNLP/text_matching_on_quora/configs/basic_config.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/basic_config.py rename to PaddleNLP/text_matching_on_quora/configs/basic_config.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/cdssm.py b/PaddleNLP/text_matching_on_quora/configs/cdssm.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/cdssm.py rename to PaddleNLP/text_matching_on_quora/configs/cdssm.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/dec_att.py b/PaddleNLP/text_matching_on_quora/configs/dec_att.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/dec_att.py rename to PaddleNLP/text_matching_on_quora/configs/dec_att.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/infer_sent.py b/PaddleNLP/text_matching_on_quora/configs/infer_sent.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/infer_sent.py rename to PaddleNLP/text_matching_on_quora/configs/infer_sent.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/configs/sse.py b/PaddleNLP/text_matching_on_quora/configs/sse.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/configs/sse.py rename to PaddleNLP/text_matching_on_quora/configs/sse.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/data/prepare_quora_data.sh b/PaddleNLP/text_matching_on_quora/data/prepare_quora_data.sh similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/data/prepare_quora_data.sh rename to PaddleNLP/text_matching_on_quora/data/prepare_quora_data.sh diff --git a/fluid/PaddleNLP/text_matching_on_quora/imgs/README.md b/PaddleNLP/text_matching_on_quora/imgs/README.md similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/imgs/README.md rename to PaddleNLP/text_matching_on_quora/imgs/README.md diff --git a/fluid/PaddleNLP/text_matching_on_quora/imgs/models_test_acc.png b/PaddleNLP/text_matching_on_quora/imgs/models_test_acc.png similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/imgs/models_test_acc.png rename to PaddleNLP/text_matching_on_quora/imgs/models_test_acc.png diff --git a/fluid/PaddleNLP/text_matching_on_quora/metric.py b/PaddleNLP/text_matching_on_quora/metric.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/metric.py rename to PaddleNLP/text_matching_on_quora/metric.py diff --git 
a/fluid/PaddleNLP/text_matching_on_quora/models/__init__.py b/PaddleNLP/text_matching_on_quora/models/__init__.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/__init__.py rename to PaddleNLP/text_matching_on_quora/models/__init__.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/cdssm.py b/PaddleNLP/text_matching_on_quora/models/cdssm.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/cdssm.py rename to PaddleNLP/text_matching_on_quora/models/cdssm.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/dec_att.py b/PaddleNLP/text_matching_on_quora/models/dec_att.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/dec_att.py rename to PaddleNLP/text_matching_on_quora/models/dec_att.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/infer_sent.py b/PaddleNLP/text_matching_on_quora/models/infer_sent.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/infer_sent.py rename to PaddleNLP/text_matching_on_quora/models/infer_sent.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/match_layers.py b/PaddleNLP/text_matching_on_quora/models/match_layers.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/match_layers.py rename to PaddleNLP/text_matching_on_quora/models/match_layers.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/my_layers.py b/PaddleNLP/text_matching_on_quora/models/my_layers.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/my_layers.py rename to PaddleNLP/text_matching_on_quora/models/my_layers.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/pwim.py b/PaddleNLP/text_matching_on_quora/models/pwim.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/pwim.py rename to PaddleNLP/text_matching_on_quora/models/pwim.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/sse.py b/PaddleNLP/text_matching_on_quora/models/sse.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/sse.py rename to PaddleNLP/text_matching_on_quora/models/sse.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/models/test.py b/PaddleNLP/text_matching_on_quora/models/test.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/models/test.py rename to PaddleNLP/text_matching_on_quora/models/test.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/pretrained_word2vec.py b/PaddleNLP/text_matching_on_quora/pretrained_word2vec.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/pretrained_word2vec.py rename to PaddleNLP/text_matching_on_quora/pretrained_word2vec.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/quora_question_pairs.py b/PaddleNLP/text_matching_on_quora/quora_question_pairs.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/quora_question_pairs.py rename to PaddleNLP/text_matching_on_quora/quora_question_pairs.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/train_and_evaluate.py b/PaddleNLP/text_matching_on_quora/train_and_evaluate.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/train_and_evaluate.py rename to PaddleNLP/text_matching_on_quora/train_and_evaluate.py diff --git a/fluid/PaddleNLP/text_matching_on_quora/utils.py b/PaddleNLP/text_matching_on_quora/utils.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/utils.py rename to 
PaddleNLP/text_matching_on_quora/utils.py diff --git a/fluid/DeepQNetwork/DQN_agent.py b/PaddleRL/DeepQNetwork/DQN_agent.py similarity index 100% rename from fluid/DeepQNetwork/DQN_agent.py rename to PaddleRL/DeepQNetwork/DQN_agent.py diff --git a/fluid/DeepQNetwork/DoubleDQN_agent.py b/PaddleRL/DeepQNetwork/DoubleDQN_agent.py similarity index 100% rename from fluid/DeepQNetwork/DoubleDQN_agent.py rename to PaddleRL/DeepQNetwork/DoubleDQN_agent.py diff --git a/fluid/DeepQNetwork/DuelingDQN_agent.py b/PaddleRL/DeepQNetwork/DuelingDQN_agent.py similarity index 100% rename from fluid/DeepQNetwork/DuelingDQN_agent.py rename to PaddleRL/DeepQNetwork/DuelingDQN_agent.py diff --git a/PaddleRL/DeepQNetwork/README.md b/PaddleRL/DeepQNetwork/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1edeaaa884318ec3a530ec4fdb7d031d07411b56 --- /dev/null +++ b/PaddleRL/DeepQNetwork/README.md @@ -0,0 +1,67 @@ +[中文版](README_cn.md) + +## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle +Based on PaddlePaddle's next-generation API Fluid, the DQN model of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game. The model receives the image of the game as input, and uses the end-to-end model to directly predict the next step. The repository contains the following three types of models: ++ DQN in +[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) ++ DoubleDQN in: +[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) ++ DuelingDQN in: +[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) + +## Atari benchmark & performance + +### Atari games introduction + +Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari game. + +### Pong game result + +The average game rewards that can be obtained for the three models as the number of training steps changes during the training are as follows(about 3 hours/1 Million steps): + +
+DQN result (assets/dqn.png)
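+
+As a side note, the difference between DQN and DoubleDQN is easiest to see in the bootstrap target (a hedged sketch based on the cited papers, not this repository's code; DuelingDQN instead changes the network into value and advantage streams):
+
+```python
+import numpy as np
+
+def dqn_target(reward, gamma, q_next_target):
+    # DQN: the target network both selects and evaluates the next action.
+    return reward + gamma * np.max(q_next_target)
+
+def double_dqn_target(reward, gamma, q_next_online, q_next_target):
+    # DoubleDQN: the online network selects the action while the target
+    # network evaluates it, which reduces over-estimation of Q values.
+    best = int(np.argmax(q_next_online))
+    return reward + gamma * q_next_target[best]
+```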
    + +## How to use +### Dependencies: ++ python2.7 ++ gym ++ tqdm ++ opencv-python ++ paddlepaddle-gpu>=1.0.0 ++ ale_python_interface + +### Install Dependencies: ++ Install PaddlePaddle: + recommended to compile and install PaddlePaddle from source code ++ Install other dependencies: + ``` + pip install -r requirement.txt + pip install gym[atari] + ``` + Install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment). + +### Start Training: +``` +# To train a model for Pong game with gpu (use DQN model as default) +python train.py --rom ./rom_files/pong.bin --use_cuda + +# To train a model for Pong with DoubleDQN +python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN + +# To train a model for Pong with DuelingDQN +python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN +``` + +To train more games, you can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms). + +### Start Testing: +``` +# Play the game with saved best model and calculate the average rewards +python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong + +# Play the game with visualization +python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 +``` +[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) is saved models for Pong and Breakout games. You can use it to play the game directly. diff --git a/PaddleRL/DeepQNetwork/README_cn.md b/PaddleRL/DeepQNetwork/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..640d775ad8fed2be360d308b6c5df41c86d77c04 --- /dev/null +++ b/PaddleRL/DeepQNetwork/README_cn.md @@ -0,0 +1,71 @@ +## 基于PaddlePaddle的Fluid版本复现DQN, DoubleDQN, DuelingDQN三个模型 + +基于PaddlePaddle下一代API Fluid复现了深度强化学习领域的DQN模型,在经典的Atari 游戏上复现了论文同等水平的指标,模型接收游戏的图像作为输入,采用端到端的模型直接预测下一步要执行的控制信号,本仓库一共包含以下3类模型: ++ DQN模型: +[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) ++ DoubleDQN模型: +[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) ++ DuelingDQN模型: +[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) + +## 模型效果:Atari游戏表现 + +### Atari游戏介绍 + +请点击[这里](https://gym.openai.com/envs/#atari)了解Atari游戏。 + +### Pong游戏训练结果 +三个模型在训练过程中随着训练步数的变化,能得到的平均游戏奖励如下图所示(大概3小时每1百万步): + +
+DQN result (assets/dqn.png)
    + +## 使用教程 + +### 依赖: ++ python2.7 ++ gym ++ tqdm ++ opencv-python ++ paddlepaddle-gpu>=1.0.0 ++ ale_python_interface + +### 下载依赖: + ++ 安装PaddlePaddle: + 建议通过PaddlePaddle源码进行编译安装 ++ 下载其它依赖: + ``` + pip install -r requirement.txt + pip install gym[atari] + ``` + 安装ale_python_interface可以参考[这里](https://github.com/mgbellemare/Arcade-Learning-Environment) + +### 训练模型: + +``` +# 使用GPU训练Pong游戏(默认使用DQN模型) +python train.py --rom ./rom_files/pong.bin --use_cuda + +# 训练DoubleDQN模型 +python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN + +# 训练DuelingDQN模型 +python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN +``` + +训练更多游戏,可以从[这里](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)下载游戏rom + +### 测试模型: + +``` +# Play the game with saved model and calculate the average rewards +# 使用训练过程中保存的最好模型玩游戏,以及计算平均奖励(rewards) +python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong + +# 以可视化的形式来玩游戏 +python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 +``` + +[这里](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA)是Pong和Breakout游戏训练好的模型,可以直接用来测试。 diff --git a/fluid/DeepQNetwork/assets/dqn.png b/PaddleRL/DeepQNetwork/assets/dqn.png similarity index 100% rename from fluid/DeepQNetwork/assets/dqn.png rename to PaddleRL/DeepQNetwork/assets/dqn.png diff --git a/fluid/DeepQNetwork/atari.py b/PaddleRL/DeepQNetwork/atari.py similarity index 100% rename from fluid/DeepQNetwork/atari.py rename to PaddleRL/DeepQNetwork/atari.py diff --git a/fluid/DeepQNetwork/atari_wrapper.py b/PaddleRL/DeepQNetwork/atari_wrapper.py similarity index 100% rename from fluid/DeepQNetwork/atari_wrapper.py rename to PaddleRL/DeepQNetwork/atari_wrapper.py diff --git a/fluid/DeepQNetwork/expreplay.py b/PaddleRL/DeepQNetwork/expreplay.py similarity index 100% rename from fluid/DeepQNetwork/expreplay.py rename to PaddleRL/DeepQNetwork/expreplay.py diff --git a/fluid/DeepQNetwork/play.py b/PaddleRL/DeepQNetwork/play.py similarity index 100% rename from fluid/DeepQNetwork/play.py rename to PaddleRL/DeepQNetwork/play.py diff --git a/fluid/DeepQNetwork/requirement.txt b/PaddleRL/DeepQNetwork/requirement.txt similarity index 100% rename from fluid/DeepQNetwork/requirement.txt rename to PaddleRL/DeepQNetwork/requirement.txt diff --git a/fluid/DeepQNetwork/rom_files/breakout.bin b/PaddleRL/DeepQNetwork/rom_files/breakout.bin similarity index 100% rename from fluid/DeepQNetwork/rom_files/breakout.bin rename to PaddleRL/DeepQNetwork/rom_files/breakout.bin diff --git a/fluid/DeepQNetwork/rom_files/pong.bin b/PaddleRL/DeepQNetwork/rom_files/pong.bin similarity index 100% rename from fluid/DeepQNetwork/rom_files/pong.bin rename to PaddleRL/DeepQNetwork/rom_files/pong.bin diff --git a/fluid/DeepQNetwork/train.py b/PaddleRL/DeepQNetwork/train.py similarity index 100% rename from fluid/DeepQNetwork/train.py rename to PaddleRL/DeepQNetwork/train.py diff --git a/PaddleRL/README.md b/PaddleRL/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5b8d2caf78d426a14b96f7d842eb88ed37bab233 --- /dev/null +++ b/PaddleRL/README.md @@ -0,0 +1,11 @@ +PaddleRL +============ + +强化学习 +-------- + +强化学习是近年来一个愈发重要的机器学习方向,特别是与深度学习相结合而形成的深度强化学习(Deep Reinforcement Learning, DRL),取得了很多令人惊异的成就。人们所熟知的战胜人类顶级围棋职业选手的 AlphaGo 就是 DRL 应用的一个典型例子,除游戏领域外,其它的应用还包括机器人、自然语言处理等。 + +深度强化学习的开山之作是在Atari视频游戏中的成功应用, 其可直接接受视频帧这种高维输入并根据图像内容端到端地预测下一步的动作,所用到的模型被称为深度Q网络(Deep Q-Network, DQN)。本实例就是利用PaddlePaddle Fluid这个灵活的框架,实现了 DQN 及其变体,并测试了它们在 Atari 游戏中的表现。 
+ +- [DeepQNetwork](https://github.com/PaddlePaddle/models/blob/develop/PaddleRL/DeepQNetwork/README_cn.md) diff --git a/PaddleRL/policy_gradient/README.md b/PaddleRL/policy_gradient/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b813aa124466597adfb80261bee7c2de22b95e67 --- /dev/null +++ b/PaddleRL/policy_gradient/README.md @@ -0,0 +1,171 @@ +运行本目录下的程序示例需要使用PaddlePaddle的最新develop分支。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + +# Policy Gradient RL by PaddlePaddle +本文介绍了如何使用PaddlePaddle通过policy-based的强化学习方法来训练一个player(actor model), 我们希望这个player可以完成简单的走阶梯任务。 + + 内容分为: + + - 任务描述 + - 模型 + - 策略(目标函数) + - 算法(Gradient ascent) + - PaddlePaddle实现 + + +## 1. 任务描述 +假设有一个阶梯,连接A、B点,player从A点出发,每一步只能向前走一步或向后走一步,到达B点即为完成任务。我们希望训练一个聪明的player,它知道怎么最快地从A点到达B点。 +我们在命令行以下边的形式模拟任务: +``` +A - O - - - - - B +``` +一个‘-'代表一个阶梯,A点在行头,B点在行末,O代表player当前所在的位置。 + +## 2. Policy Gradient +### 2.1 模型 +#### input layer +模型的输入是player观察到的当前阶梯的状态$S$, 要包含阶梯的长度和player当前的位置信息。 +在命令行模拟的情况下,player的位置和阶梯长度两个变量足以表示当前的状态,但是为了便于将这个demo推广到更复杂的任务场景,我们这里用一个向量来表示游戏状态$S$. +向量$S$的长度为阶梯的长度,每一维代表一个阶梯,player所在的位置为1,其它位置为0. +下边是一个例子: +``` +S = [0, 1, 0, 0] // 阶梯长度为4,player在第二个阶梯上。 +``` +#### hidden layer +隐藏层采用两个全连接layer `FC_1`和`FC_2`, 其中`FC_1` 的size为10, `FC_2`的size为2. + +#### output layer +我们使用softmax将`FC_2`的output映射为所有可能的动作(前进或后退)的概率分布(Probability of taking the action),即为一个二维向量`act_probs`, 其中,`act_probs[0]` 为后退的概率,`act_probs[1]`为前进的概率。 + +#### 模型表示 +我们将player模型(actor)形式化表示如下: +$$a = \pi_\theta(s)$$ +其中$\theta$表示模型的参数,$s$是输入状态。 + + +### 2.2 策略(目标函数) +我们怎么评估一个player(模型)的好坏呢?首先我们定义几个术语: +我们让$\pi_\theta(s)$来玩一局游戏,$s_t$表示第$t$时刻的状态,$a_t$表示在状态$s_t$做出的动作,$r_t$表示做过动作$a_t$后得到的奖赏。 +一局游戏的过程可以表示如下: +$$\tau = [s_1, a_1, r_1, s_2, a_2, r_2 ... 
s_T, a_T, r_T] \tag{1}$$ + +一局游戏的奖励表示如下: +$$R(\tau) = \sum_{t=1}^T r_t$$ + +player玩一局游戏,可能会出现多种操作序列$\tau$ ,某个$\tau$出现的概率是依赖于player model的$\theta$, 记做: +$$P(\tau | \theta)$$ +那么,给定一个$\theta$(player model), 玩一局游戏,期望得到的奖励是: +$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta)$$ +大多数情况,我们无法穷举出所有的$\tau$,所以我们就抽取N个$\tau$来计算近似的期望: +$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^N R(\tau^n)$$ + +$\overline {R}_\theta$就是我们需要的目标函数,它表示了一个参数为$\theta$的player玩一局游戏得分的期望,这个期望越大,代表这个player能力越强。 +### 2.3 算法(Gradient ascent) +我们的目标函数是$\overline {R}_\theta$,我们训练的任务就是: +$$\theta^* = \arg\max_\theta \overline {R}_\theta \tag{5}$$ + +为了找到理想的$\theta$,我们使用Gradient ascent方法不断在$\overline {R}_\theta$的梯度方向更新$\theta$,可表示如下: +$$\theta' = \theta + \eta * \bigtriangledown \overline {R}_\theta \tag{6}$$ + +$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) \bigtriangledown P(\tau|\theta)\\ += \sum_\tau R(\tau) P(\tau|\theta) \frac{\bigtriangledown P(\tau|\theta)}{P(\tau|\theta)} \\ += \sum_\tau R(\tau) P(\tau|\theta) {\bigtriangledown \log P(\tau|\theta)} $$ + + +$$P(\tau|\theta) = P(s_1)P(a_1|s_1,\theta)P(s_2, r_1|s_1,a_1)P(a_2|s_2,\theta)P(s_3,r_2|s_2,a_2)...P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)\\ +=P(s_1) \prod_{t=1}^T P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)$$ + +$$\log P(\tau|\theta) = \log P(s_1) + \sum_{t=1}^T [\log P(a_t|s_t,\theta) + \log P(s_{t+1}, r_t|s_t,a_t)]$$ + +$$ \bigtriangledown \log P(\tau|\theta) = \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)$$ + +$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) {\bigtriangledown \log P(\tau|\theta)} \\ +\approx \frac{1}{N} \sum_{n=1}^N R(\tau^n) {\bigtriangledown \log P(\tau^n|\theta)} \\ += \frac{1}{N} \sum_{n=1}^N R(\tau^n) {\sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)} \\ += \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) { \bigtriangledown \log P(a_t|s_t,\theta)} \tag{11}$$ + +#### 2.3.2 导数解释 + +在使用深度学习框架进行训练求解时,一般用梯度下降方法,所以我们把Gradient ascent转为Gradient +descent, 重写等式$(5)(6)$为: + +$$\theta^* = \arg\min_\theta (-\overline {R}_\theta) \tag{13}$$ +$$\theta' = \theta - \eta * \bigtriangledown (-\overline {R}_\theta) \tag{14}$$ + +根据上一节的推导,$ (-\bigtriangledown \overline {R}_\theta) $结果如下: + +$$ -\bigtriangledown \overline {R}_\theta += \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) {\bigtriangledown [-\log P(a_t|s_t,\theta)]} \tag{15}$$ + +根据等式(14), 我们的player的模型可以设计为: +

    +
    +图 1 +

+
+用户在一局游戏中的一次操作可以用元组$(s_t, a_t)$表示,即在状态$s_t$下做了动作$a_t$。我们通过图(1)中的前向网络计算出来的cross entropy cost为$-\log P(a_t|s_t,\theta)$,恰好是等式(15)中我们需要微分的一项。
+图1是我们需要的player模型,用这个网络的前向计算可以预测任何状态下该做什么动作。但是怎么去训练学习这个网络呢?在等式(15)中还有一项$R(\tau^n)$,做反向梯度传播的时候要乘上这一项,所以我们需要在图1基础上再加上$R(\tau^n)$,如 图2 所示:
+
+<p align="center">
+<img src="images/PG_2.svg"><br/>
+图 2
+</p>
+
+图2就是我们最终的网络结构。
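下面用一段简化的Fluid代码示意图2的结构:先用cross entropy计算$-\log P(a_t|s_t,\theta)$,再乘上$R(\tau)$作为最终的cost,对应等式(15)。这里假设阶梯长度为4,隐层大小沿用2.1节的设定;隐层激活函数、变量名与学习率均为示例,具体实现请以本目录下的brain.py为准。

```python
import paddle.fluid as fluid

# 输入:游戏状态向量、实际采取的动作、该局游戏的奖励R(tau)
state = fluid.layers.data(name='state', shape=[4], dtype='float32')
action = fluid.layers.data(name='action', shape=[1], dtype='int64')
reward = fluid.layers.data(name='reward', shape=[1], dtype='float32')

# 两层全连接(FC_1 size=10,FC_2 size=2)+ softmax,输出动作概率分布act_probs
fc1 = fluid.layers.fc(input=state, size=10, act='tanh')
act_probs = fluid.layers.fc(input=fc1, size=2, act='softmax')

# cross entropy即-log P(a_t|s_t, theta);乘上R(tau)后取均值作为cost
neg_log_prob = fluid.layers.cross_entropy(input=act_probs, label=action)
cost = fluid.layers.reduce_mean(
    fluid.layers.elementwise_mul(neg_log_prob, reward))
fluid.optimizer.Adam(learning_rate=0.01).minimize(cost)
```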

+
+#### 2.3.3 直观理解
+对于等式(15),我们只看游戏中的一步操作,也就是这一项:$R(\tau^n) { \bigtriangledown [-\log P(a_t|s_t,\theta)]}$。我们可以简单地认为:训练的目的是让$R(\tau^n) {[ -\log P(a_t|s_t,\theta)]}$尽可能小,也就是让$R(\tau^n) \log P(a_t|s_t,\theta)$尽可能大。
+
+- 如果我们当前游戏局的奖励$R(\tau^n)$为正,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能大。
+- 如果我们当前游戏局的奖励$R(\tau^n)$为负,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能小。
+
+#### 2.3.4 一个问题
+
+一人犯错,株连九族;一人得道,鸡犬升天。如果一局游戏得到奖励,我们希望帮助获得奖励的每一次操作都被重视;反之,导致惩罚的每一次操作都要被冷落。
+是不是很有道理的样子?但是,如果有些游戏场景只有奖励,没有惩罚,怎么办?也就是所有的$R(\tau^n)$都为正。
+针对不同的游戏场景,我们有不同的解决方案:
+
+1. 每局游戏得分不一样:将每局的得分减去一个bias,结果就有正有负了。
+2. 每局游戏得分一样:把完成一局的时间作为计分因素,并减去一个bias。
+
+我们在第一章描述的游戏场景需要用第二种方案:player每次到达终点都会收到1分的奖励,我们可以按完成任务所用的步数来定义奖励R。
+更进一步,我们认为一局游戏中每步动作对结局的贡献是不同的,有聪明的动作,也有愚蠢的操作。直观的理解是,一般靠前的动作是愚蠢的,靠后的动作是聪明的。既然有了这个价值观,那么我们拿到1分的奖励,就不能平均分给每个动作了。
+如图3所示,让所有动作按先后排队,从后往前衰减地给每个动作奖励,然后每个动作的奖励再减去所有动作奖励的平均值:
+
+<p align="center">
+<img src="images/PG_3.svg"><br/>
+图 3
+</p>
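图3中"从后往前衰减、再减去均值"的奖励计算,可以用下面的示意代码实现(衰减系数gamma的取值为假设,具体逻辑请以brain.py中的实现为准):

```python
import numpy as np

def discount_and_norm_rewards(step_rewards, gamma=0.95):
    """从后往前衰减地累积每一步的奖励,再减去所有动作奖励的平均值。"""
    discounted = np.zeros(len(step_rewards), dtype='float32')
    running_add = 0.0
    for t in reversed(range(len(step_rewards))):
        running_add = running_add * gamma + step_rewards[t]
        discounted[t] = running_add
    # 去均值,使奖励有正有负:靠后(更接近终点)的动作得到更大的权重
    return discounted - discounted.mean()
```

例如,一局共4步、只在最后一步获得1分的游戏,`discount_and_norm_rewards([0, 0, 0, 1])`会给越靠后的动作分配越大的权重,且整体均值为0。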

+
+## 3. 训练效果
+
+demo运行训练效果如下,经过1000轮尝试,我们的player就学会了如何有效地完成任务:
+
+```
+---------O epoch: 0; steps: 42
+---------O epoch: 1; steps: 77
+---------O epoch: 2; steps: 82
+---------O epoch: 3; steps: 64
+---------O epoch: 4; steps: 79
+---------O epoch: 501; steps: 19
+---------O epoch: 1001; steps: 9
+---------O epoch: 1501; steps: 9
+---------O epoch: 2001; steps: 11
+---------O epoch: 2501; steps: 9
+---------O epoch: 3001; steps: 9
+---------O epoch: 3002; steps: 9
+---------O epoch: 3003; steps: 9
+---------O epoch: 3004; steps: 9
+---------O epoch: 3005; steps: 9
+---------O epoch: 3006; steps: 9
+---------O epoch: 3007; steps: 9
+---------O epoch: 3008; steps: 9
+---------O epoch: 3009; steps: 9
+---------O epoch: 3010; steps: 11
+---------O epoch: 3011; steps: 9
+---------O epoch: 3012; steps: 9
+---------O epoch: 3013; steps: 9
+---------O epoch: 3014; steps: 9
+```
diff --git a/fluid/policy_gradient/brain.py b/PaddleRL/policy_gradient/brain.py
similarity index 100%
rename from fluid/policy_gradient/brain.py
rename to PaddleRL/policy_gradient/brain.py
diff --git a/fluid/policy_gradient/env.py b/PaddleRL/policy_gradient/env.py
similarity index 100%
rename from fluid/policy_gradient/env.py
rename to PaddleRL/policy_gradient/env.py
diff --git a/fluid/policy_gradient/images/PG_1.svg b/PaddleRL/policy_gradient/images/PG_1.svg
similarity index 100%
rename from fluid/policy_gradient/images/PG_1.svg
rename to PaddleRL/policy_gradient/images/PG_1.svg
diff --git a/fluid/policy_gradient/images/PG_2.svg b/PaddleRL/policy_gradient/images/PG_2.svg
similarity index 100%
rename from fluid/policy_gradient/images/PG_2.svg
rename to PaddleRL/policy_gradient/images/PG_2.svg
diff --git a/fluid/policy_gradient/images/PG_3.svg b/PaddleRL/policy_gradient/images/PG_3.svg
similarity index 100%
rename from fluid/policy_gradient/images/PG_3.svg
rename to PaddleRL/policy_gradient/images/PG_3.svg
diff --git a/fluid/policy_gradient/run.py b/PaddleRL/policy_gradient/run.py
similarity index 100%
rename from fluid/policy_gradient/run.py
rename to PaddleRL/policy_gradient/run.py
diff --git a/PaddleRec/README.md b/PaddleRec/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2cc6efa204d7d4dbc079f414b4722c53823dbe8b
--- /dev/null
+++ b/PaddleRec/README.md
@@ -0,0 +1,15 @@
+PaddleRec
+=========
+
+个性化推荐
+-------
+
+推荐系统在当前的互联网服务中正在发挥越来越大的作用,目前大部分电子商务系统、社交网络、广告推荐、搜索引擎,都不同程度地使用了各种形式的个性化推荐技术,帮助用户快速找到他们想要的信息。
+
+在工业可用的推荐系统中,推荐策略一般会被划分为多个模块串联执行。以新闻推荐系统为例,存在多个可以使用深度学习技术的环节,例如新闻的自动化标注、个性化新闻召回、个性化匹配与排序等。PaddlePaddle对推荐算法的训练提供了完整的支持,并提供了多种模型配置供用户选择。
+
+- [TagSpace](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/tagspace)
+- [GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec)
+- [SequenceSemanticRetrieval](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/ssr)
+- [DeepCTR](https://github.com/PaddlePaddle/models/blob/develop/PaddleRec/ctr/README.cn.md)
+- [Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/multiview_simnet)
diff --git a/fluid/PaddleNLP/chinese_ner/__init__.py b/PaddleRec/__init__.py
similarity index 100%
rename from fluid/PaddleNLP/chinese_ner/__init__.py
rename to PaddleRec/__init__.py
diff --git a/fluid/PaddleRec/ctr/.run_ce.sh b/PaddleRec/ctr/.run_ce.sh
similarity index 100%
rename from fluid/PaddleRec/ctr/.run_ce.sh
rename to PaddleRec/ctr/.run_ce.sh
diff --git a/PaddleRec/ctr/README.cn.md b/PaddleRec/ctr/README.cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..05d1653e52c1db36e9690c64166283afc26df429
--- /dev/null
+++ b/PaddleRec/ctr/README.cn.md
@@ -0,0 +1,79 @@
+
+# 基于DNN模型的点击率预估模型
+
+## 介绍
+本模型实现了下述论文中提出的DNN模型:
+
+```text
+@inproceedings{guo2017deepfm,
+  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
+  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
+  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
+  pages={1725--1731},
+  year={2017}
+}
+```
+
+## 运行环境
+需要先安装PaddlePaddle Fluid,然后运行:
+
+```shell
+pip install -r requirements.txt
+```
+
+## 数据集
+本文使用的是Kaggle公司举办的[展示广告竞赛](https://www.kaggle.com/c/criteo-display-ad-challenge/)中所使用的Criteo数据集。
+
+每一行是一次广告展示的特征,第一列是一个标签,表示这次广告展示是否被点击。总共有39个特征,其中13个特征采用整型值,另外26个特征是类别型特征。测试集中是没有标签的。
+
+下载数据集:
+```bash
+cd data && ./download.sh && cd ..
+```
+
+## 模型
+本例子只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出。
+
+
+## 数据准备
+处理原始数据集,整型特征使用min-max归一化方法规范到[0, 1],类别型特征使用了one-hot编码。原始数据集分割成两部分:90%用于训练,其他10%用于训练过程中的验证。
+
+## 训练
+训练的命令行选项可以通过`python train.py -h`列出。
+
+### 单机训练:
+```bash
+python train.py \
+        --train_data_path data/raw/train.txt \
+        2>&1 | tee train.log
+```
+
+训练到第1轮的第40000个batch后,测试的AUC为0.801178,误差(cost)为0.445196。
+
+### 分布式训练
+
+本地启动一个2 trainer 2 pserver的分布式训练任务。分布式场景下训练数据会按照trainer的id进行切分,保证trainer之间的训练数据不会重叠,提高训练效率。
+
+```bash
+sh cluster_train.sh
+```
+
+## 预测
+预测的命令行选项可以通过`python infer.py -h`列出。
+
+对测试集进行预测:
+```bash
+python infer.py \
+        --model_path models/pass-0/ \
+        --data_path data/raw/valid.txt
+```
+注意:infer.py跑完最后输出的AUC才是整个预测文件的整体AUC。
+
+## 在百度云上运行集群训练
+1. 参考文档 [在百度云上启动Fluid分布式训练](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst) 在百度云上部署一个CPU集群。
+1. 用preprocess.py处理训练数据生成train.txt。
+1. 将train.txt按集群机器数切分,放到每台机器上。
+1. 用上面的 `分布式训练` 中的命令行启动分布式训练任务。
+
+## 在PaddleCloud上运行集群训练
+如果你正在使用PaddleCloud做集群训练,你可以使用```cloud.py```这个文件来帮助你提交任务,```train.py```中所需要的参数可以通过PaddleCloud的环境变量来提交。
\ No newline at end of file
diff --git a/PaddleRec/ctr/README.md b/PaddleRec/ctr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e29e2e1eb5493fc52b8bb38a6bf49bd397bb8455
--- /dev/null
+++ b/PaddleRec/ctr/README.md
@@ -0,0 +1,96 @@
+
+# DNN for Click-Through Rate prediction
+
+## Introduction
+This model implements the DNN part proposed in the following paper:
+
+```text
+@inproceedings{guo2017deepfm,
+  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
+  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
+  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
+  pages={1725--1731},
+  year={2017}
+}
+```
+
+DeepFM combines factorization machines and deep neural networks to model
+both low-order and high-order feature interactions. For details of the
+factorization machines, please refer to the paper [factorization
+machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
+
+## Environment
+You should install PaddlePaddle Fluid first, and run:
+
+```shell
+pip install -r requirements.txt
+```
+
+## Dataset
+This example uses the Criteo dataset, which was used for the [Display Advertising
+Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/)
+hosted by Kaggle.
+
+Each row is the features for an ad display and the first column is a label
+indicating whether this ad has been clicked or not. There are 39 features in
+total. 13 features take integer values and the other 26 features are
+categorical features. For the test dataset, the labels are omitted.
+
+Download dataset:
+```bash
+cd data && ./download.sh && cd ..
+```
+
+## Model
+This demo only implements the DNN part of the model described in the DeepFM
+paper. The full DeepFM model will be provided in another example.
+
+
+## Data preprocessing
+To preprocess the raw dataset, the integer features are clipped then min-max
+normalized to [0, 1], and the categorical features are one-hot encoded. The raw
+training dataset is split such that 90% is used for training and the other
+10% for validation during training. In reader.py, the training data is the first
+90% of the data in train.txt, and the validation data is the rest.
+
+## Train
+The command line options for training can be listed by `python train.py -h`.
+
+### Local Train:
+```bash
+python train.py \
+        --train_data_path data/raw/train.txt \
+        2>&1 | tee train.log
+```
+
+After training to batch 40000 of pass 1, the testing AUC is `0.801178` and the testing
+cost is `0.445196`.
+
+### Distributed Train
+Run a distributed training job with 2 pservers and 2 trainers on a single machine.
+In the distributed setting, the training data is split by trainer_id, so that the
+training data does not overlap among trainers.
+
+```bash
+sh cluster_train.sh
+```
+
+## Infer
+The command line options for inference can be listed by `python infer.py -h`.
+
+To make inference for the test dataset:
+```bash
+python infer.py \
+        --model_path models/ \
+        --data_path data/raw/train.txt
+```
+Note: the AUC value in the last log line is the overall AUC for the whole test dataset. Here, train.txt is split inside reader.py so that the validation data does not overlap with the training data.
+
+## Train on Baidu Cloud
+1. Please prepare some CPU machines on Baidu Cloud following the steps in [train_on_baidu_cloud](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst)
+1. Prepare the dataset using preprocess.py.
+1. Split train.txt into trainer_num parts and put them on the machines.
+1. Run the cluster training using the command in `Distributed Train` above.
+
+## Train on Paddle Cloud
+If you want to run this training on PaddleCloud, you can use the script ```cloud.py```; the arguments needed by ```train.py``` can be passed through PaddleCloud environment variables.
\ No newline at end of file
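As a rough illustration of the preprocessing described above, the sketch below min-max normalizes an integer feature and one-hot encodes a categorical feature. The helper names, the clipping bound, and the toy vocabulary are assumptions for illustration only; the project's actual logic lives in preprocess.py and reader.py.

```python
# Illustrative sketch only; not the project's actual preprocessing code.

def min_max_normalize(value, min_val, max_val, clip_upper=20):
    """Clip an integer feature, then scale it into [0, 1]."""
    value = min(value, clip_upper)
    if max_val == min_val:
        return 0.0
    return (value - min_val) / float(max_val - min_val)

def one_hot(category, vocab):
    """Encode a categorical feature as a one-hot vector over its vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab[category]] = 1
    return vec

print(min_max_normalize(7, min_val=0, max_val=20))  # 0.35
print(one_hot('slot-c', {'slot-a': 0, 'slot-b': 1, 'slot-c': 2}))  # [0, 0, 1]
```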
diff --git a/fluid/PaddleNLP/deep_attention_matching_net/utils/__init__.py b/PaddleRec/ctr/__init__.py
similarity index 100%
rename from fluid/PaddleNLP/deep_attention_matching_net/utils/__init__.py
rename to PaddleRec/ctr/__init__.py
diff --git a/fluid/PaddleRec/ctr/_ce.py b/PaddleRec/ctr/_ce.py
similarity index 100%
rename from fluid/PaddleRec/ctr/_ce.py
rename to PaddleRec/ctr/_ce.py
diff --git a/fluid/PaddleRec/ctr/cloud.py b/PaddleRec/ctr/cloud.py
similarity index 100%
rename from fluid/PaddleRec/ctr/cloud.py
rename to PaddleRec/ctr/cloud.py
diff --git a/fluid/PaddleRec/ctr/cluster_train.sh b/PaddleRec/ctr/cluster_train.sh
similarity index 100%
rename from fluid/PaddleRec/ctr/cluster_train.sh
rename to PaddleRec/ctr/cluster_train.sh
diff --git a/fluid/PaddleRec/ctr/data/download.sh b/PaddleRec/ctr/data/download.sh
similarity index 100%
rename from fluid/PaddleRec/ctr/data/download.sh
rename to PaddleRec/ctr/data/download.sh
diff --git a/fluid/PaddleRec/ctr/infer.py b/PaddleRec/ctr/infer.py
similarity index 100%
rename from fluid/PaddleRec/ctr/infer.py
rename to PaddleRec/ctr/infer.py
diff --git a/fluid/PaddleRec/ctr/network_conf.py b/PaddleRec/ctr/network_conf.py
similarity index 100%
rename from fluid/PaddleRec/ctr/network_conf.py
rename to PaddleRec/ctr/network_conf.py
diff --git a/fluid/PaddleRec/ctr/preprocess.py b/PaddleRec/ctr/preprocess.py
similarity index 100%
rename from fluid/PaddleRec/ctr/preprocess.py
rename to PaddleRec/ctr/preprocess.py
diff --git a/fluid/PaddleRec/ctr/reader.py b/PaddleRec/ctr/reader.py
similarity index 100%
rename from fluid/PaddleRec/ctr/reader.py
rename to PaddleRec/ctr/reader.py
diff --git a/fluid/PaddleRec/ctr/requirements.txt b/PaddleRec/ctr/requirements.txt
similarity index 100%
rename from fluid/PaddleRec/ctr/requirements.txt
rename to PaddleRec/ctr/requirements.txt
diff --git a/fluid/PaddleRec/ctr/train.py b/PaddleRec/ctr/train.py
similarity index 100%
rename from fluid/PaddleRec/ctr/train.py
rename to PaddleRec/ctr/train.py
diff --git a/PaddleRec/din/README.md b/PaddleRec/din/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3538ba760ff9b80807a6a56aed4b75400c97ae03
--- /dev/null
+++ b/PaddleRec/din/README.md
@@ -0,0 +1,137 @@
+# DIN
+
+以下是本例的简要目录结构及说明:
+
+```text
+.
+├── README.md # 文档
+├── train.py # 训练脚本
+├── infer.py # 预测脚本
+├── network.py # 网络结构
+├── cluster_train.py # 多机训练
+├── cluster_train.sh # 多机训练脚本
+├── reader.py # 和读取数据相关的函数
+├── data/
+    ├── build_dataset.py # 文本数据转化为paddle数据
+    ├── convert_pd.py # 将原始数据转化为pandas的dataframe
+    ├── data_process.sh # 数据预处理脚本
+    ├── remap_id.py # remap类别id
+
+```
+
+## 简介
+
+DIN模型的介绍可以参阅论文[Deep Interest Network for Click-Through Rate Prediction](https://arxiv.org/abs/1706.06978)。
+
+DIN通过一个兴趣激活模块(Activation Unit),用预估目标Candidate ADs的信息去激活用户的历史点击商品,以此提取用户与当前预估目标相关的兴趣。
+
+权重高的历史行为表明这部分兴趣和当前广告相关,权重低的则是和广告无关的“兴趣噪声”。我们通过将激活的商品和激活权重相乘,然后累加起来作为当前预估目标ADs相关的兴趣状态表达。
+
+最后我们将这些相关的用户兴趣表达、用户静态特征和上下文相关特征,以及ad相关的特征拼接起来,输入到后续的多层DNN网络,最后预测得到用户对当前目标ADs的点击概率。
+
+
+## 数据下载及预处理
+
+* Step 1: 运行如下命令,下载[Amazon Product数据集](http://jmcauley.ucsd.edu/data/amazon/)并进行预处理
+```
+cd data && sh data_process.sh && cd ..
+```
+如果执行过程中遇到找不到某个包(例如pandas包)的报错,使用如下命令安装对应的包即可。
+```
+pip install pandas
+```
+
+* Step 2: 产生训练集、测试集和config文件
+```
+python build_dataset.py
+```
+运行之后在data文件夹下会产生config.txt、paddle_test.txt、paddle_train.txt三个文件
+
+数据格式例子如下:
+```
+3737 19450;288 196;18486;674;1
+3647 4342 6855 3805;281 463 558 674;4206;463;1
+1805 4309;87 87;21354;556;1
+18209 20753;649 241;51924;610;0
+13150;351;41455;792;1
+35120 40418;157 714;52035;724;0
+```
+
+其中每一行是一个Sample,由分号分隔的5个域组成。前两个域是历史交互的item序列和item对应的类别,第三、四个域是待预测的item及其类别,最后一个域是label,表示点击与否。
+
+
+## 训练
+
+具体的参数配置说明可通过运行下列命令查看:
+```
+python train.py -h
+```
+
+gpu 单机单卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=1 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 1 > log.txt 2>&1 &
+```
+
+cpu 单机训练
+``` bash
+python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 0 > log.txt 2>&1 &
+```
+
+值得注意的是,上述单卡训练可以通过加--parallel 1参数使用Parallel Executor来进行加速。
+
+gpu 单机多卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=0,1 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 1 --parallel 1 --num_devices 2 > log.txt 2>&1 &
+```
+
+cpu 单机多卡训练
+``` bash
+CPU_NUM=10 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 0 --parallel 1 --num_devices 10 > log.txt 2>&1 &
+```
+
+
+## 训练结果示例
+
+我们在Tesla K40m单GPU卡上训练的日志如下所示(以实际输出为准)
+```text
+2019-02-22 09:31:51,578 - INFO - reading data begins
+2019-02-22 09:32:22,407 - INFO - reading data completes
+W0222 09:32:24.151955 7221 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0
+W0222 09:32:24.152046 7221 device_context.cc:271] device: 0, cuDNN Version: 7.0.
+2019-02-22 09:32:27,797 - INFO - train begins
+epoch: 1 global_step: 1000 train_loss: 0.6950 time: 14.64
+epoch: 1 global_step: 2000 train_loss: 0.6854 time: 15.41
+epoch: 1 global_step: 3000 train_loss: 0.6799 time: 14.84
+...
+model saved in din_amazon/global_step_50000
+...
+```
+
+提示:
+
+* 在单机条件下,使用代码中默认的超参数运行时,产生最优auc的global step大致在440000到500000之间
+
+* 训练超出一定的epoch后会稍稍出现过拟合
+
+## 预测
+参考如下命令,开始预测。
+
+其中model_path为模型的路径,test_path为测试数据路径。
+
+```
+CUDA_VISIBLE_DEVICES=3 python infer.py --model_path 'din_amazon/global_step_400000' --test_path 'data/paddle_test.txt' --use_cuda 1
+```
+
+## 预测结果示例
+```text
+2019-02-22 11:22:58,804 - INFO - TEST --> loss: [0.47005194] auc:0.863794952818
+```
+
+
+## 多机训练
+可参考cluster_train.py配置多机环境。
+
+运行如下命令,本地模拟多机场景:
+```
+sh cluster_train.sh
+```
diff --git a/fluid/PaddleRec/din/cluster_train.py b/PaddleRec/din/cluster_train.py
similarity index 100%
rename from fluid/PaddleRec/din/cluster_train.py
rename to PaddleRec/din/cluster_train.py
diff --git a/fluid/PaddleRec/din/cluster_train.sh b/PaddleRec/din/cluster_train.sh
similarity index 100%
rename from fluid/PaddleRec/din/cluster_train.sh
rename to PaddleRec/din/cluster_train.sh
diff --git a/fluid/PaddleRec/din/data/build_dataset.py b/PaddleRec/din/data/build_dataset.py
similarity index 100%
rename from fluid/PaddleRec/din/data/build_dataset.py
rename to PaddleRec/din/data/build_dataset.py
diff --git a/fluid/PaddleRec/din/data/convert_pd.py b/PaddleRec/din/data/convert_pd.py
similarity index 100%
rename from fluid/PaddleRec/din/data/convert_pd.py
rename to PaddleRec/din/data/convert_pd.py
diff --git a/fluid/PaddleRec/din/data/data_process.sh b/PaddleRec/din/data/data_process.sh
similarity index 100%
rename from fluid/PaddleRec/din/data/data_process.sh
rename to PaddleRec/din/data/data_process.sh
diff --git a/fluid/PaddleRec/din/data/remap_id.py b/PaddleRec/din/data/remap_id.py
similarity index 100%
rename from fluid/PaddleRec/din/data/remap_id.py
rename to PaddleRec/din/data/remap_id.py
diff --git a/fluid/PaddleRec/din/infer.py b/PaddleRec/din/infer.py
similarity index 100%
rename from fluid/PaddleRec/din/infer.py
rename to PaddleRec/din/infer.py
diff --git a/fluid/PaddleRec/din/network.py b/PaddleRec/din/network.py
similarity index 100%
rename from fluid/PaddleRec/din/network.py
rename to PaddleRec/din/network.py
diff --git a/fluid/PaddleRec/din/reader.py b/PaddleRec/din/reader.py
similarity index 100%
rename from fluid/PaddleRec/din/reader.py
rename to PaddleRec/din/reader.py
diff --git a/fluid/PaddleRec/din/train.py b/PaddleRec/din/train.py
similarity index 100%
rename from fluid/PaddleRec/din/train.py
rename to PaddleRec/din/train.py
diff --git a/PaddleRec/gnn/README.md b/PaddleRec/gnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..29e3f721c5a81b64e21abb5242adccdc46b3d0f8
--- /dev/null
+++ b/PaddleRec/gnn/README.md
@@ -0,0 +1,118 @@
+# SR-GNN
+
+以下是本例的简要目录结构及说明:
+
+```text
+.
+├── README.md # 文档
+├── train.py # 训练脚本
+├── infer.py # 预测脚本
+├── network.py # 网络结构
+├── reader.py # 和读取数据相关的函数
+├── data/
+    ├── download.sh # 下载数据的脚本
+    ├── preprocess.py # 数据预处理
+
+```
+
+## 简介
+
+SR-GNN模型的介绍可以参阅论文[Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855)。
+
+本文解决的是Session-based Recommendation这一问题,过程大致分为以下四步:
+
+首先,对所有的session序列通过有向图进行建模;
+
+然后,通过GNN学习每个node(item)的隐向量表示;
+
+接着,通过一个attention架构模型得到每个session的embedding;
+
+最后,通过一个softmax层进行全表预测。
+
+我们复现了论文效果,在DIGINETICA数据集上P@20可以达到50.7。
+
+
+## 数据下载及预处理
+
+使用[DIGINETICA](http://cikm2016.cs.iupui.edu/cikm-cup)数据集。可以按照下述过程操作获得数据集以及进行简单的数据预处理。
+
+* Step 1: 运行如下命令,下载DIGINETICA数据集并进行预处理
+```
+cd data && sh download.sh
+```
+
+* Step 2: 产生训练集、测试集和config文件
+```
+python preprocess.py --dataset diginetica
+cd ..
+```
+运行之后在data文件夹下会产生diginetica文件夹,里面包含config.txt、test.txt、train.txt三个文件。
+
+生成的数据格式为:(session_list, label_list)。
+
+其中session_list是一个session的列表,其中每个元素都是一个list,代表不同的session。label_list是一个列表,每个位置的元素是session_list中对应session的label。
+
+例子:session_list=[[1,2,3], [4], [7,9]]。代表这个session_list包含3个session,第一个session包含的item序列是1,2,3,第二个session只有1个item 4,第三个session包含的item序列是7,9。
+
+label_list = [6, 9, 1]。代表[1,2,3]这个session的预测label值应该为6,后两个以此类推。
+
+提示:
+
+* 如果您想使用自己业务场景下的数据,只要令数据满足上述格式要求即可
+* 本例中的train.txt和test.txt两个文件均为二进制文件
+
+
+## 训练
+
+可以参考下面不同场景下的运行命令进行训练,还可以指定诸如batch_size、lr(learning rate)等参数,具体的配置说明可通过运行下列命令查看:
+```
+python train.py -h
+```
+
+gpu 单机单卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=1 python -u train.py --use_cuda 1 > log.txt 2>&1 &
+```
+
+cpu 单机训练
+``` bash
+python -u train.py --use_cuda 0 > log.txt 2>&1 &
+```
+
+值得注意的是,上述单卡训练可以通过加--parallel 1参数使用Parallel Executor来进行加速。
+
+
+## 训练结果示例
+
+我们在Tesla K40m单GPU卡上训练的日志如下所示(以实际输出为准)
+```text
+W0308 16:08:24.249840 1785 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0
+W0308 16:08:24.249974 1785 device_context.cc:271] device: 0, cuDNN Version: 7.0.
+2019-03-08 16:08:38,079 - INFO - load data complete
+2019-03-08 16:08:38,080 - INFO - begin train
+2019-03-08 16:09:07,605 - INFO - step: 500, loss: 10.2052, train_acc: 0.0088
+2019-03-08 16:09:36,940 - INFO - step: 1000, loss: 9.7192, train_acc: 0.0320
+2019-03-08 16:10:08,617 - INFO - step: 1500, loss: 8.9290, train_acc: 0.1350
+...
+2019-03-08 16:16:01,151 - INFO - model saved in ./saved_model/epoch_0
+...
+```
+
+## 预测
+运行如下命令即可开始预测。可以通过参数指定开始和结束的epoch轮次。
+
+```
+CUDA_VISIBLE_DEVICES=3 python infer.py
+```
+
+## 预测结果示例
+```text
+W0308 16:41:56.847339 31709 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0
+W0308 16:41:56.847705 31709 device_context.cc:271] device: 0, cuDNN Version: 7.0.
+2019-03-08 16:42:20,420 - INFO - TEST --> loss: 5.8865, Recall@20: 0.4525
+2019-03-08 16:42:45,153 - INFO - TEST --> loss: 5.5314, Recall@20: 0.5010
+2019-03-08 16:43:10,233 - INFO - TEST --> loss: 5.5128, Recall@20: 0.5047
+...
+```
diff --git a/fluid/PaddleRec/gnn/data/download.sh b/PaddleRec/gnn/data/download.sh
similarity index 100%
rename from fluid/PaddleRec/gnn/data/download.sh
rename to PaddleRec/gnn/data/download.sh
diff --git a/fluid/PaddleRec/gnn/data/gdown.pl b/PaddleRec/gnn/data/gdown.pl
similarity index 100%
rename from fluid/PaddleRec/gnn/data/gdown.pl
rename to PaddleRec/gnn/data/gdown.pl
diff --git a/fluid/PaddleRec/gnn/data/preprocess.py b/PaddleRec/gnn/data/preprocess.py
similarity index 100%
rename from fluid/PaddleRec/gnn/data/preprocess.py
rename to PaddleRec/gnn/data/preprocess.py
diff --git a/fluid/PaddleRec/gnn/infer.py b/PaddleRec/gnn/infer.py
similarity index 100%
rename from fluid/PaddleRec/gnn/infer.py
rename to PaddleRec/gnn/infer.py
diff --git a/fluid/PaddleRec/gnn/network.py b/PaddleRec/gnn/network.py
similarity index 100%
rename from fluid/PaddleRec/gnn/network.py
rename to PaddleRec/gnn/network.py
diff --git a/fluid/PaddleRec/gnn/reader.py b/PaddleRec/gnn/reader.py
similarity index 100%
rename from fluid/PaddleRec/gnn/reader.py
rename to PaddleRec/gnn/reader.py
diff --git a/fluid/PaddleRec/gnn/train.py b/PaddleRec/gnn/train.py
similarity index 100%
rename from fluid/PaddleRec/gnn/train.py
rename to PaddleRec/gnn/train.py
diff --git a/fluid/PaddleRec/gru4rec/.run_ce.sh b/PaddleRec/gru4rec/.run_ce.sh
similarity index 100%
rename from fluid/PaddleRec/gru4rec/.run_ce.sh
rename to PaddleRec/gru4rec/.run_ce.sh
diff --git a/PaddleRec/gru4rec/README.md b/PaddleRec/gru4rec/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..353781567f7012996199e51169233b306cd18722
--- /dev/null
+++ b/PaddleRec/gru4rec/README.md
@@ -0,0 +1,283 @@
+# GRU4REC
+
+以下是本例的简要目录结构及说明:
+
+```text
+.
+├── README.md # 文档
+├── train.py # 训练脚本 全词表 cross-entropy
+├── train_sample_neg.py # 训练脚本 sample负例 包含bpr loss 和cross-entropy
+├── infer.py # 预测脚本 全词表
+├── infer_sample_neg.py # 预测脚本 sample负例
+├── net.py # 网络结构
+├── text2paddle.py # 文本数据转paddle数据
+├── cluster_train.py # 多机训练
+├── cluster_train.sh # 多机训练脚本
+├── utils.py # 通用函数
+├── convert_format.py # 转换数据格式
+├── vocab.txt # 小样本字典
+├── train_data # 小样本训练目录
+└── test_data # 小样本测试目录
+
+```
+
+
+## 简介
+
+GRU4REC模型的介绍可以参阅论文[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)。
+
+论文的贡献在于首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升。
+
+论文的核心思想是:将用户在一个session中点击一系列item的行为看做一个序列,用来训练RNN模型。预测阶段,给定已知的点击序列作为输入,预测下一个可能点击的item。
+
+session-based推荐应用场景非常广泛,比如用户的商品浏览、新闻点击、地点签到等序列数据。
+
+本例支持三种形式的损失函数,分别是全词表的cross-entropy、负采样的Bayesian Pairwise Ranking(BPR)和负采样的cross-entropy。
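其中负采样BPR损失的核心计算可以用下面的示意代码说明(函数与变量名均为示例,具体实现请以train_sample_neg.py与net.py为准):

```python
import paddle.fluid as fluid

def bpr_loss_demo(pos_score, neg_score):
    """BPR损失示意:最小化 -log(sigmoid(正例得分 - 负例得分))。"""
    diff = fluid.layers.elementwise_sub(pos_score, neg_score)
    loss = fluid.layers.scale(
        fluid.layers.log(fluid.layers.sigmoid(diff)), scale=-1.0)
    return fluid.layers.reduce_mean(loss)
```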
+
+我们基本复现了论文效果,recall@20的效果分别为:
+
+- 全词表 cross entropy:0.67
+- 负采样 bpr:0.606
+- 负采样 cross entropy:0.605
+
+
+运行样例程序可跳过'RSC15 数据下载及预处理'部分
+## RSC15 数据下载及预处理
+
+运行如下命令,下载RSC15官网数据集:
+```
+curl -Lo yoochoose-data.7z https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
+7z x yoochoose-data.7z
+```
+
+GRU4REC的数据过滤脚本可从[https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py](https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py)下载,注意修改其中的文件路径:
+
+line12: PATH_TO_ORIGINAL_DATA = './'
+
+line13: PATH_TO_PROCESSED_DATA = './'
+
+注意使用python3执行脚本:
+```
+python preprocess.py
+```
+生成的数据格式如下:
+
+```
+SessionId ItemId Time
+1 214536502 1396839069.277
+1 214536500 1396839249.868
+1 214536506 1396839286.998
+1 214577561 1396839420.306
+2 214662742 1396850197.614
+2 214662742 1396850239.373
+2 214825110 1396850317.446
+2 214757390 1396850390.71
+2 214757407 1396850438.247
+```
+
+数据格式需要转换,运行脚本如下:
+```
+python convert_format.py
+```
+
+模型的训练及测试数据如下,一行表示一个用户按照时间顺序的序列:
+
+```
+214536502 214536500 214536506 214577561
+214662742 214662742 214825110 214757390 214757407 214551617
+214716935 214774687 214832672
+214836765 214706482
+214701242 214826623
+214826835 214826715
+214838855 214838855
+214576500 214576500 214576500
+214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762
+214717867 214717867
+```
+
+根据训练和测试文件生成字典和对应的paddle输入文件:
+
+需要将训练文件放到目录raw_train_data下,测试文件放到目录raw_test_data下,并生成对应的train_data、test_data和vocab.txt文件
+```
+python text2paddle.py raw_train_data/ raw_test_data/ train_data test_data vocab.txt
+```
+
+转化后生成的格式如下,可参考train_data/small_train.txt:
+```
+197 196 198 236
+93 93 384 362 363 43
+336 364 407
+421 322
+314 388
+128 58
+138 138
+46 46 46
+34 34 57 57 57 342 228 321 346 357 59 376
+110 110
+```
+
+## 训练
+
+具体的参数配置可运行
+```
+python train.py -h
+```
+全词表cross entropy 训练代码:
+
+gpu 单机单卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=0 python train.py --train_dir train_data --use_cuda 1 --batch_size 50 --model_dir model_output
+```
+
+cpu 单机训练
+``` bash
+python train.py --train_dir train_data --use_cuda 0 --batch_size 50 --model_dir model_output
+```
+
+gpu 单机多卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=0,1 python train.py --train_dir train_data --use_cuda 1 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 2
+```
+
+cpu 单机多卡训练
+``` bash
+CPU_NUM=10 python train.py --train_dir train_data --use_cuda 0 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 10
+```
+
+负采样 bayesian pairwise ranking loss(bpr loss) 训练
+```
+CUDA_VISIBLE_DEVICES=0 python train_sample_neg.py --loss bpr --use_cuda 1
+```
+
+负采样 cross entropy 训练
+```
+CUDA_VISIBLE_DEVICES=0 python train_sample_neg.py --loss ce --use_cuda 1
+```
+
+## 自定义网络结构
+
+可在[net.py](./net.py) `network` 函数中调整网络结构,当前的网络结构如下:
+```python
+# 将item id序列映射为embedding向量(稀疏参数,使用稀疏更新)
+emb = fluid.layers.embedding(
+    input=src,
+    size=[vocab_size, hid_size],
+    param_attr=fluid.ParamAttr(
+        initializer=fluid.initializer.Uniform(
+            low=init_low_bound, high=init_high_bound),
+        learning_rate=emb_lr_x),
+    is_sparse=True)
+
+# 全连接到3倍隐层大小:dynamic_gru要求输入维度为隐层的3倍(对应更新门、重置门和候选状态)
+fc0 = fluid.layers.fc(input=emb,
+                      size=hid_size * 3,
+                      param_attr=fluid.ParamAttr(
+                          initializer=fluid.initializer.Uniform(
+                              low=init_low_bound, high=init_high_bound),
+                          learning_rate=gru_lr_x))
+# 动态GRU,对变长的点击序列建模
+gru_h0 = fluid.layers.dynamic_gru(
+    input=fc0,
+    size=hid_size,
+    param_attr=fluid.ParamAttr(
+        initializer=fluid.initializer.Uniform(
+            low=init_low_bound, high=init_high_bound),
+        learning_rate=gru_lr_x))
+
+# softmax输出层:预测下一个item的概率分布
+fc = fluid.layers.fc(input=gru_h0,
+                     size=vocab_size,
act='softmax', + param_attr=fluid.ParamAttr( + initializer=fluid.initializer.Uniform( + low=init_low_bound, high=init_high_bound), + learning_rate=fc_lr_x)) + +cost = fluid.layers.cross_entropy(input=fc, label=dst) +acc = fluid.layers.accuracy(input=fc, label=dst, k=20) +``` + +## 训练结果示例 + +我们在Tesla K40m单GPU卡上训练的日志如下所示 +```text +epoch_1 start +step:100 ppl:441.468 +step:200 ppl:311.043 +step:300 ppl:218.952 +step:400 ppl:186.172 +step:500 ppl:188.600 +step:600 ppl:131.213 +step:700 ppl:165.770 +step:800 ppl:164.414 +step:900 ppl:156.470 +step:1000 ppl:174.201 +step:1100 ppl:118.619 +step:1200 ppl:122.635 +step:1300 ppl:118.220 +step:1400 ppl:90.372 +step:1500 ppl:135.018 +step:1600 ppl:114.327 +step:1700 ppl:141.806 +step:1800 ppl:93.416 +step:1900 ppl:92.897 +step:2000 ppl:121.703 +step:2100 ppl:96.288 +step:2200 ppl:88.355 +step:2300 ppl:101.737 +step:2400 ppl:95.934 +step:2500 ppl:86.158 +step:2600 ppl:80.925 +step:2700 ppl:202.219 +step:2800 ppl:106.828 +step:2900 ppl:91.458 +step:3000 ppl:105.988 +step:3100 ppl:87.067 +step:3200 ppl:92.651 +step:3300 ppl:101.145 +step:3400 ppl:91.247 +step:3500 ppl:107.656 +step:3600 ppl:89.410 +... +... +step:15700 ppl:76.819 +step:15800 ppl:62.257 +step:15900 ppl:81.735 +epoch:1 num_steps:15907 time_cost(s):4154.096032 +model saved in model_recall20/epoch_1 +... +``` + +## 预测 +运行命令 全词表运行infer.py, 负采样运行infer_sample_neg.py。 + +``` +CUDA_VISIBLE_DEVICES=0 python infer.py --test_dir test_data/ --model_dir model_output/ --start_index 1 --last_index 10 --use_cuda 1 +``` + +## 预测结果示例 +```text +model:model_r@20/epoch_1 recall@20:0.613 time_cost(s):12.23 +model:model_r@20/epoch_2 recall@20:0.647 time_cost(s):12.33 +model:model_r@20/epoch_3 recall@20:0.662 time_cost(s):12.38 +model:model_r@20/epoch_4 recall@20:0.669 time_cost(s):12.21 +model:model_r@20/epoch_5 recall@20:0.673 time_cost(s):12.17 +model:model_r@20/epoch_6 recall@20:0.675 time_cost(s):12.26 +model:model_r@20/epoch_7 recall@20:0.677 time_cost(s):12.25 +model:model_r@20/epoch_8 recall@20:0.679 time_cost(s):12.37 +model:model_r@20/epoch_9 recall@20:0.680 time_cost(s):12.22 +model:model_r@20/epoch_10 recall@20:0.681 time_cost(s):12.2 +``` + + +## 多机训练 +厂内用户可以参考[wiki](http://wiki.baidu.com/pages/viewpage.action?pageId=628300529)利用paddlecloud 配置多机环境 + +可参考cluster_train.py 配置其他多机环境 + +运行命令本地模拟多机场景 +``` +sh cluster_train.sh +``` + +注意本地模拟需要关闭代理 diff --git a/fluid/PaddleNLP/text_matching_on_quora/__init__.py b/PaddleRec/gru4rec/__init__.py similarity index 100% rename from fluid/PaddleNLP/text_matching_on_quora/__init__.py rename to PaddleRec/gru4rec/__init__.py diff --git a/fluid/PaddleRec/gru4rec/_ce.py b/PaddleRec/gru4rec/_ce.py similarity index 100% rename from fluid/PaddleRec/gru4rec/_ce.py rename to PaddleRec/gru4rec/_ce.py diff --git a/fluid/PaddleRec/gru4rec/cluster_train.py b/PaddleRec/gru4rec/cluster_train.py similarity index 100% rename from fluid/PaddleRec/gru4rec/cluster_train.py rename to PaddleRec/gru4rec/cluster_train.py diff --git a/fluid/PaddleRec/gru4rec/cluster_train.sh b/PaddleRec/gru4rec/cluster_train.sh similarity index 100% rename from fluid/PaddleRec/gru4rec/cluster_train.sh rename to PaddleRec/gru4rec/cluster_train.sh diff --git a/fluid/PaddleRec/gru4rec/convert_format.py b/PaddleRec/gru4rec/convert_format.py similarity index 100% rename from fluid/PaddleRec/gru4rec/convert_format.py rename to PaddleRec/gru4rec/convert_format.py diff --git a/fluid/PaddleRec/gru4rec/infer.py b/PaddleRec/gru4rec/infer.py similarity index 100% rename from fluid/PaddleRec/gru4rec/infer.py rename to 
PaddleRec/gru4rec/infer.py
diff --git a/fluid/PaddleRec/gru4rec/infer_sample_neg.py b/PaddleRec/gru4rec/infer_sample_neg.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/infer_sample_neg.py
rename to PaddleRec/gru4rec/infer_sample_neg.py
diff --git a/fluid/PaddleRec/gru4rec/net.py b/PaddleRec/gru4rec/net.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/net.py
rename to PaddleRec/gru4rec/net.py
diff --git a/fluid/PaddleRec/gru4rec/test_data/small_test.txt b/PaddleRec/gru4rec/test_data/small_test.txt
similarity index 100%
rename from fluid/PaddleRec/gru4rec/test_data/small_test.txt
rename to PaddleRec/gru4rec/test_data/small_test.txt
diff --git a/fluid/PaddleRec/gru4rec/text2paddle.py b/PaddleRec/gru4rec/text2paddle.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/text2paddle.py
rename to PaddleRec/gru4rec/text2paddle.py
diff --git a/fluid/PaddleRec/gru4rec/train.py b/PaddleRec/gru4rec/train.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/train.py
rename to PaddleRec/gru4rec/train.py
diff --git a/fluid/PaddleRec/gru4rec/train_data/small_train.txt b/PaddleRec/gru4rec/train_data/small_train.txt
similarity index 100%
rename from fluid/PaddleRec/gru4rec/train_data/small_train.txt
rename to PaddleRec/gru4rec/train_data/small_train.txt
diff --git a/fluid/PaddleRec/gru4rec/train_sample_neg.py b/PaddleRec/gru4rec/train_sample_neg.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/train_sample_neg.py
rename to PaddleRec/gru4rec/train_sample_neg.py
diff --git a/fluid/PaddleRec/gru4rec/utils.py b/PaddleRec/gru4rec/utils.py
similarity index 100%
rename from fluid/PaddleRec/gru4rec/utils.py
rename to PaddleRec/gru4rec/utils.py
diff --git a/fluid/PaddleRec/gru4rec/vocab.txt b/PaddleRec/gru4rec/vocab.txt
similarity index 100%
rename from fluid/PaddleRec/gru4rec/vocab.txt
rename to PaddleRec/gru4rec/vocab.txt
diff --git a/fluid/PaddleRec/multiview_simnet/.pre-commit-config.yaml b/PaddleRec/multiview_simnet/.pre-commit-config.yaml
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/.pre-commit-config.yaml
rename to PaddleRec/multiview_simnet/.pre-commit-config.yaml
diff --git a/fluid/PaddleRec/multiview_simnet/.run_ce.sh b/PaddleRec/multiview_simnet/.run_ce.sh
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/.run_ce.sh
rename to PaddleRec/multiview_simnet/.run_ce.sh
diff --git a/PaddleRec/multiview_simnet/README.cn.md b/PaddleRec/multiview_simnet/README.cn.md
new file mode 100644
index 0000000000000000000000000000000000000000..06df3c32c7996f5003bd7b9c1eb749f32c28b752
--- /dev/null
+++ b/PaddleRec/multiview_simnet/README.cn.md
@@ -0,0 +1,27 @@
+# 个性化推荐中的多视角Simnet模型
+
+## 介绍
+在个性化推荐场景中,推荐系统给用户提供的项目(Item)列表通常是通过个性化的匹配模型计算出来的。在现实世界中,一个用户可能有很多个视角的特征,比如用户Id、年龄、项目的点击历史等。一个项目,举例来说,新闻资讯,也会有多种视角的特征,比如新闻标题、新闻类别等。多视角Simnet模型是可以融合用户以及推荐项目的多个视角的特征并进行个性化匹配学习的一体化模型。这类模型在很多工业化的场景中都会被使用到,比如百度的Feed产品中。
+
+## 数据集
+目前,本项目使用机器生成的数据集来介绍多视角Simnet模型的概念,未来我们会逐渐加入真实世界中的数据集并在这个模型上进行效果验证。
+
+## 模型
+本项目的目标是提供一个在个性化匹配场景下利用Paddle搭建的模型。多视角Simnet模型包括多个编码器模块,每个编码器被用在不同的特征视角上。当前,项目中提供Bag-of-Embedding编码器、Temporal-Convolutional编码器和Gated-Recurrent-Unit编码器。我们会逐渐加入稀疏特征场景下比较实用的编码器到这个项目中。模型的训练方法当前采用的是Pairwise ranking模式,即针对一对具有关联的User-Item组合,随机使用一个Item作为负例进行排序学习。
+
+## 训练
+运行`python train.py -h`可以获得训练工具的具体选项说明。
+```bash
+python train.py
+```
+## 预测
+运行`python infer.py -h`可以获得预测工具的具体选项说明。
+```bash
+python infer.py
+```
+## 未来的工作
+- 多种pairwise的损失函数会被加入到这个项目中。对于不同视角的特征,用户-项目之间的匹配关系可以使用不同的损失函数进行联合优化。整个模型会在真实数据中进行验证。
+- Parallel Executor选项会被加入
+- 分布式训练能力会被加入
diff --git a/PaddleRec/multiview_simnet/README.md b/PaddleRec/multiview_simnet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..525946e612592b97e10707cadf35e5252230c2bd
--- /dev/null
+++ b/PaddleRec/multiview_simnet/README.md
@@ -0,0 +1,27 @@
+# Multi-view Simnet for Personalized recommendation
+
+## Introduction
+In a personalized recommendation scenario, a user is often provided with several items from a personalized interest matching model. In real-world applications, a user may have multiple views of features, say user-id, age, click history of items, and search queries. An item, e.g. a news article, may also have multiple views of features like news title, news category, images in the news and so on. Multi-view Simnet is a matching model that combines users' and items' multiple views of features in one unified model. The model can be used in many industrial products like Baidu's news feed. The model is adapted from the paper A Multi-View Deep Learning (MV-DNN) Approach for Cross Domain User Modeling in Recommendation Systems, WWW 2015. The difference between our model and MV-DNN is that we also consider multiple feature views of users.
+
+## Dataset
+Currently, a synthetic dataset is provided for proof of concept and we aim to add more real-world datasets to this project in the future.
+
+## Model
+This project aims to provide practical usage of Paddle in a personalized matching scenario. The model provides several encoder modules for different views of features. Currently, Bag-of-Embedding, Temporal-Convolutional, and Gated-Recurrent-Unit encoders are provided. We will add more practical encoders for sparse features commonly used in recommender systems. The training algorithm used in this model is pairwise ranking: a negative item with multiple views is sampled given a positive user-item pair.
+
+## Train
+The command line options for training can be listed by `python train.py -h`
+```bash
+python train.py
+```
+
+## Infer
+The command line options for inference can be listed by `python infer.py -h`
+```bash
+python infer.py
+```
+
+## Future work
+- Multiple types of pairwise loss will be added in this project. For different views of features between a user and an item, multiple losses will be supported. The model will be verified on real-world datasets.
+- Parallel Executor will be added in this project
+- Distributed Training will be added
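The pairwise ranking training described in the Model section can be sketched as a hinge loss on the similarity gap between the positive user-item pair and the sampled negative item. The function name and the margin value below are assumptions for illustration; see nets.py for the project's actual loss.

```python
import paddle.fluid as fluid

def pairwise_hinge_loss(pos_sim, neg_sim, margin=0.8):
    """Minimize max(0, margin - pos_sim + neg_sim), averaged over the batch."""
    gap = fluid.layers.elementwise_sub(pos_sim, neg_sim)
    # margin - gap, clipped at zero by relu
    loss = fluid.layers.relu(fluid.layers.scale(gap, scale=-1.0, bias=margin))
    return fluid.layers.reduce_mean(loss)
```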
diff --git a/fluid/PaddleRec/__init__.py b/PaddleRec/multiview_simnet/__init__.py
similarity index 100%
rename from fluid/PaddleRec/__init__.py
rename to PaddleRec/multiview_simnet/__init__.py
diff --git a/fluid/PaddleRec/multiview_simnet/_ce.py b/PaddleRec/multiview_simnet/_ce.py
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/_ce.py
rename to PaddleRec/multiview_simnet/_ce.py
diff --git a/fluid/PaddleRec/multiview_simnet/infer.py b/PaddleRec/multiview_simnet/infer.py
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/infer.py
rename to PaddleRec/multiview_simnet/infer.py
diff --git a/fluid/PaddleRec/multiview_simnet/nets.py b/PaddleRec/multiview_simnet/nets.py
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/nets.py
rename to PaddleRec/multiview_simnet/nets.py
diff --git a/fluid/PaddleRec/multiview_simnet/reader.py b/PaddleRec/multiview_simnet/reader.py
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/reader.py
rename to PaddleRec/multiview_simnet/reader.py
diff --git a/fluid/PaddleRec/multiview_simnet/train.py b/PaddleRec/multiview_simnet/train.py
similarity index 100%
rename from fluid/PaddleRec/multiview_simnet/train.py
rename to PaddleRec/multiview_simnet/train.py
diff --git a/fluid/PaddleRec/ssr/.run_ce.sh b/PaddleRec/ssr/.run_ce.sh
similarity index 100%
rename from fluid/PaddleRec/ssr/.run_ce.sh
rename to PaddleRec/ssr/.run_ce.sh
diff --git a/PaddleRec/ssr/README.md b/PaddleRec/ssr/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d0b4dfb41b4cea19efa42c4a233c9544349d1770
--- /dev/null
+++ b/PaddleRec/ssr/README.md
@@ -0,0 +1,52 @@
+# Sequence Semantic Retrieval Model
+
+## Introduction
+In news recommendation scenarios, different from traditional systems that recommend entertainment items such as movies or music, there are several new problems to solve.
+- User profile features are very sparse: a user may log in to a news recommendation app anonymously, and a user is likely to read fresh news items.
+- News items are generated and disappear very fast compared with movies or music. Usually, there will be thousands of news items generated in a news recommendation app. The consumption of news is also fast, since users care about newly happened things.
+- User interests may change frequently in the news recommendation setting. The content of a news item will affect users' reading behaviors a lot, even if the category of the news does not belong to users' long-term interests. In news recommendation, reading behaviors are determined by both the short-term and the long-term interests of users.
+
+[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec) models a user's short-term and long-term interests by applying a gated recurrent unit on the user's reading history. The generalization ability of recurrent neural networks captures the similarity of users' reading sequences, which alleviates the user profile sparsity problem. However, GRU4Rec operates on a closed domain of items: the model predicts which item a user will be interested in through a classification method. In news recommendation, news items change over time, so the GRU4Rec model cannot predict items that do not exist in the training dataset.
+
+The Sequence Semantic Retrieval (SSR) model shares a similar idea with Multi-Rate Deep Learning for Temporal Recommendation, SIGIR 2016.
+Sequence Semantic Retrieval Model has two components: one is the matching model part, and the other is the retrieval part.
+- The idea of SSR is to model a user's personalized interest in an item through a matching model structure, and the representation of a news item can be computed online even if the news item does not exist in the training dataset.
+- With the representations of news items, we are able to build a vector indexing service online for news prediction, and this is the retrieval part of SSR.
+
+## Dataset
+Dataset preprocessing follows the method of the [GRU4Rec Project](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec). Note that you should reuse scripts from the GRU4Rec project for data preprocessing.
+
+## Training
+
+The command line options for training can be listed by `python train.py -h`
+
+gpu 单机单卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=0 python train.py --train_dir train_data --use_cuda 1 --batch_size 50 --model_dir model_output
+```
+
+cpu 单机训练
+``` bash
+python train.py --train_dir train_data --use_cuda 0 --batch_size 50 --model_dir model_output
+```
+
+gpu 单机多卡训练
+``` bash
+CUDA_VISIBLE_DEVICES=0,1 python train.py --train_dir train_data --use_cuda 1 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 2
+```
+
+cpu 单机多卡训练
+``` bash
+CPU_NUM=10 python train.py --train_dir train_data --use_cuda 0 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 10
+```
+
+本地模拟多机训练
+``` bash
+sh cluster_train.sh
+```
+
+## Inference
+
+gpu 预测
+``` bash
+CUDA_VISIBLE_DEVICES=0 python infer.py --test_dir test_data --use_cuda 1 --batch_size 50 --model_dir model_output
+```
diff --git a/fluid/PaddleRec/ctr/__init__.py b/PaddleRec/ssr/__init__.py
similarity index 100%
rename from fluid/PaddleRec/ctr/__init__.py
rename to PaddleRec/ssr/__init__.py
diff --git a/fluid/PaddleRec/ssr/_ce.py b/PaddleRec/ssr/_ce.py
similarity index 100%
rename from fluid/PaddleRec/ssr/_ce.py
rename to PaddleRec/ssr/_ce.py
diff --git a/fluid/PaddleRec/ssr/cluster_train.py b/PaddleRec/ssr/cluster_train.py
similarity index 100%
rename from fluid/PaddleRec/ssr/cluster_train.py
rename to PaddleRec/ssr/cluster_train.py
diff --git a/fluid/PaddleRec/ssr/cluster_train.sh b/PaddleRec/ssr/cluster_train.sh
similarity index 100%
rename from fluid/PaddleRec/ssr/cluster_train.sh
rename to PaddleRec/ssr/cluster_train.sh
diff --git a/fluid/PaddleRec/ssr/infer.py b/PaddleRec/ssr/infer.py
similarity index 100%
rename from fluid/PaddleRec/ssr/infer.py
rename to PaddleRec/ssr/infer.py
diff --git a/fluid/PaddleRec/ssr/nets.py b/PaddleRec/ssr/nets.py
similarity index 100%
rename from fluid/PaddleRec/ssr/nets.py
rename to PaddleRec/ssr/nets.py
diff --git a/fluid/PaddleRec/ssr/reader.py b/PaddleRec/ssr/reader.py
similarity index 100%
rename from fluid/PaddleRec/ssr/reader.py
rename to PaddleRec/ssr/reader.py
diff --git a/fluid/PaddleRec/ssr/test_data/small_test.txt b/PaddleRec/ssr/test_data/small_test.txt
similarity index 100%
rename from fluid/PaddleRec/ssr/test_data/small_test.txt
rename to PaddleRec/ssr/test_data/small_test.txt
diff --git a/fluid/PaddleRec/ssr/train.py b/PaddleRec/ssr/train.py
similarity index 100%
rename from fluid/PaddleRec/ssr/train.py
rename to PaddleRec/ssr/train.py
diff --git a/fluid/PaddleRec/ssr/train_data/small_train.txt b/PaddleRec/ssr/train_data/small_train.txt
similarity index 100%
rename from fluid/PaddleRec/ssr/train_data/small_train.txt
rename to PaddleRec/ssr/train_data/small_train.txt
diff --git a/fluid/PaddleRec/ssr/utils.py b/PaddleRec/ssr/utils.py
similarity index 100%
rename from fluid/PaddleRec/ssr/utils.py
rename to PaddleRec/ssr/utils.py
diff --git a/fluid/PaddleRec/ssr/vocab.txt b/PaddleRec/ssr/vocab.txt
similarity index 100%
rename from fluid/PaddleRec/ssr/vocab.txt
rename to PaddleRec/ssr/vocab.txt
diff --git a/fluid/PaddleRec/tagspace/.run_ce.sh b/PaddleRec/tagspace/.run_ce.sh
similarity index 100%
rename from fluid/PaddleRec/tagspace/.run_ce.sh
rename to PaddleRec/tagspace/.run_ce.sh
diff --git a/PaddleRec/tagspace/README.md b/PaddleRec/tagspace/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4263065bee2c5492684147f532e92c7c8083e16f
--- /dev/null
+++ b/PaddleRec/tagspace/README.md
@@ -0,0 +1,92 @@
+# TagSpace
+
+以下是本例的简要目录结构及说明:
+
+```text
+.
+├── README.md # 文档
+├── train.py # 训练脚本
+├── infer.py # 预测脚本
+├── net.py # 网络结构
+├── text2paddle.py # 文本数据转paddle数据
+├── cluster_train.py # 多机训练
+├── cluster_train.sh # 多机训练脚本
+├── utils.py # 通用函数
+├── vocab_text.txt # 小样本文本字典
+├── vocab_tag.txt # 小样本类别字典
+├── train_data # 小样本训练目录
+└── test_data # 小样本测试目录
+
+```
+
+
+## 简介
+
+TagSpace模型的介绍可以参阅论文[#TagSpace: Semantic Embeddings from Hashtags](https://research.fb.com/publications/tagspace-semantic-embeddings-from-hashtags/)。
+
+TagSpace模型学习文本及标签的embedding表示,应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐。
+
+
+## 数据下载及预处理
+
+数据地址:[ag news dataset](https://github.com/mhjabreel/CharCNN/tree/master/data/)
+
+备份数据地址:[ag news dataset](https://paddle-tagspace.bj.bcebos.com/data.tar)
+
+数据格式如下:
+
+```
+"3","Wall St. Bears Claw Back Into the Black (Reuters)","Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again."
+```
+
+备份数据解压后,将文本数据转为paddle数据,先将数据放到训练数据目录和测试数据目录:
+```
+mv train.csv raw_big_train_data
+mv test.csv raw_big_test_data
+```
+
+运行脚本text2paddle.py,生成paddle输入格式:
+```
+python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test_big_data big_vocab_text.txt big_vocab_tag.txt
+```
+
+## 单机训练
+'--use_cuda 1' 表示使用gpu,0表示使用cpu;'--parallel 1' 表示使用多卡。
+
+小数据训练(样例中的数据已经准备,可跳过上一节的数据准备,直接运行命令):
+
+GPU 环境
+```
+CUDA_VISIBLE_DEVICES=0 python train.py --use_cuda 1
+```
+CPU 环境
+```
+python train.py
+```
+
+全量数据单机单卡训练
+```
+CUDA_VISIBLE_DEVICES=0 python train.py --use_cuda 1 --train_dir train_big_data/ --vocab_text_path big_vocab_text.txt --vocab_tag_path big_vocab_tag.txt --model_dir big_model --batch_size 500
+```
+全量数据单机多卡训练
+
+```
+python train.py --train_dir train_big_data/ --vocab_text_path big_vocab_text.txt --vocab_tag_path big_vocab_tag.txt --model_dir big_model --batch_size 500 --parallel 1
+```
+
+## 预测
+小数据预测
+```
+python infer.py
+```
+
+全量数据预测
+```
+python infer.py --model_dir big_model --vocab_tag_path big_vocab_tag.txt --test_dir test_big_data/
+```
+
+## 本地模拟多机
+运行命令
+```
+sh cluster_train.sh
+```
diff --git a/fluid/PaddleRec/tagspace/__init.py__ b/PaddleRec/tagspace/__init.py__
similarity index 100%
rename from fluid/PaddleRec/tagspace/__init.py__
rename to PaddleRec/tagspace/__init.py__
diff --git a/fluid/PaddleRec/tagspace/_ce.py b/PaddleRec/tagspace/_ce.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/_ce.py
rename to PaddleRec/tagspace/_ce.py
diff --git a/fluid/PaddleRec/tagspace/cluster_train.py b/PaddleRec/tagspace/cluster_train.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/cluster_train.py
rename to PaddleRec/tagspace/cluster_train.py
diff --git a/fluid/PaddleRec/tagspace/cluster_train.sh b/PaddleRec/tagspace/cluster_train.sh
similarity index 100%
rename from fluid/PaddleRec/tagspace/cluster_train.sh
rename to PaddleRec/tagspace/cluster_train.sh
diff --git a/fluid/PaddleRec/tagspace/infer.py b/PaddleRec/tagspace/infer.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/infer.py
rename to PaddleRec/tagspace/infer.py
diff --git a/fluid/PaddleRec/tagspace/net.py b/PaddleRec/tagspace/net.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/net.py
rename to PaddleRec/tagspace/net.py
diff --git a/fluid/PaddleRec/tagspace/test_data/small_test.csv b/PaddleRec/tagspace/test_data/small_test.csv
similarity index 100%
rename from fluid/PaddleRec/tagspace/test_data/small_test.csv
rename to PaddleRec/tagspace/test_data/small_test.csv
diff --git a/fluid/PaddleRec/tagspace/text2paddle.py b/PaddleRec/tagspace/text2paddle.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/text2paddle.py
rename to PaddleRec/tagspace/text2paddle.py
diff --git a/fluid/PaddleRec/tagspace/train.py b/PaddleRec/tagspace/train.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/train.py
rename to PaddleRec/tagspace/train.py
diff --git a/fluid/PaddleRec/tagspace/train_data/small_train.csv b/PaddleRec/tagspace/train_data/small_train.csv
similarity index 100%
rename from fluid/PaddleRec/tagspace/train_data/small_train.csv
rename to PaddleRec/tagspace/train_data/small_train.csv
diff --git a/fluid/PaddleRec/tagspace/utils.py b/PaddleRec/tagspace/utils.py
similarity index 100%
rename from fluid/PaddleRec/tagspace/utils.py
rename to PaddleRec/tagspace/utils.py
diff --git a/fluid/PaddleRec/tagspace/vocab_tag.txt b/PaddleRec/tagspace/vocab_tag.txt
similarity index 100%
rename from fluid/PaddleRec/tagspace/vocab_tag.txt
rename to PaddleRec/tagspace/vocab_tag.txt
diff --git a/fluid/PaddleRec/tagspace/vocab_text.txt b/PaddleRec/tagspace/vocab_text.txt
similarity index 100%
rename from fluid/PaddleRec/tagspace/vocab_text.txt
rename to PaddleRec/tagspace/vocab_text.txt
diff --git a/PaddleRec/word2vec/README.md b/PaddleRec/word2vec/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..936d9fac5860f7adf9fcc587334ecb2aebce1991
--- /dev/null
+++ b/PaddleRec/word2vec/README.md
@@ -0,0 +1,113 @@
+# 基于skip-gram的word2vec模型
+
+以下是本例的简要目录结构及说明:
+
+```text
+.
+├── cluster_train.py # 分布式训练函数
+├── cluster_train.sh # 本地模拟多机脚本
+├── train.py # 训练函数
+├── infer.py # 预测脚本
+├── net.py # 网络结构
+├── preprocess.py # 预处理脚本,包括构建词典和预处理文本
+├── reader.py # 训练阶段的文本读写
+├── README.md # 使用说明
+└── utils.py # 通用函数
+
+```
+
+## 介绍
+本例实现了skip-gram模式的word2vec模型。
+
+
+## 数据下载
+全量数据集使用的是来自 1 Billion Word Language Model Benchmark (http://www.statmt.org/lm-benchmark) 的数据集。
+
+```bash
+wget http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
+tar xzvf 1-billion-word-language-modeling-benchmark-r13output.tar.gz
+mv 1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/ data/
+```
+
+备用数据地址下载命令如下:
+
+```bash
+wget https://paddlerec.bj.bcebos.com/word2vec/1-billion-word-language-modeling-benchmark-r13output.tar
+tar xvf 1-billion-word-language-modeling-benchmark-r13output.tar
+mv 1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/ data/
+```
+
+为了方便快速验证,我们也提供了经典的text8样例数据集,包含1700w个词。下载命令如下:
+
+```bash
+wget https://paddlerec.bj.bcebos.com/word2vec/text.tar
+tar xvf text.tar
+mv text data/
+```
+
+
+## 数据预处理
+以样例数据集为例进行预处理。注意全量数据集解压后,以training-monolingual.tokenized.shuffled 目录为预处理目录,和样例数据集的text目录并列。
+
+词典格式:词<空格>词频。注意低频词用'UNK'表示。
+
+可以按此格式自建词典;如果自建词典,可跳过第一步。
+```
+the 1061396
+of 593677
+and 416629
+one 411764
+in 372201
+a 325873
+<s> 324608
+to 316376
+zero 264975
+nine 250430
+```
+
+第一步根据英文语料生成词典,中文语料可以通过修改text_strip方法自定义处理方法。
+
+```bash
+python preprocess.py --build_dict --build_dict_corpus_dir data/text/ --dict_path data/test_build_dict
+```
+
+第二步根据词典将文本转成id,同时进行downsample,按照概率过滤常见词。
+
+```bash
+python preprocess.py --filter_corpus --dict_path data/test_build_dict --input_corpus_dir data/text/ --output_corpus_dir data/convert_text8 --min_count 5 --downsample 0.001
+```
+
+## 训练
+具体的参数配置可运行
+
+
+```bash
+python train.py -h
+```
+
+单机多线程训练
+```bash
+OPENBLAS_NUM_THREADS=1 CPU_NUM=5 python train.py --train_data_dir data/convert_text8 --dict_path data/test_build_dict --num_passes 10 --batch_size 100 --model_output_dir v1_cpu5_b100_lr1dir --base_lr 1.0 --print_batch 1000 --with_speed --is_sparse
+```
+
+本地单机模拟多机训练
+
+```bash
+sh cluster_train.sh
+```
+
+## 预测
+测试集下载命令如下:
+
+```bash
+#全量数据集测试集
+wget https://paddlerec.bj.bcebos.com/word2vec/test_dir.tar
+#样本数据集测试集
+wget https://paddlerec.bj.bcebos.com/word2vec/test_mid_dir.tar
+```
+
+预测命令,注意词典名称需要加后缀"_word_to_id_",此文件是训练阶段生成的。
+```bash
+python infer.py --infer_epoch --test_dir data/test_mid_dir/ --dict_path data/test_build_dict_word_to_id_ --batch_size 20000 --model_dir v1_cpu5_b100_lr1dir/ --start_index 0
+```
diff --git a/fluid/PaddleRec/word2vec/cluster_train.py b/PaddleRec/word2vec/cluster_train.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/cluster_train.py
rename to PaddleRec/word2vec/cluster_train.py
diff --git a/fluid/PaddleRec/word2vec/cluster_train.sh b/PaddleRec/word2vec/cluster_train.sh
similarity index 100%
rename from fluid/PaddleRec/word2vec/cluster_train.sh
rename to PaddleRec/word2vec/cluster_train.sh
diff --git a/fluid/PaddleRec/word2vec/infer.py b/PaddleRec/word2vec/infer.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/infer.py
rename to PaddleRec/word2vec/infer.py
diff --git a/fluid/PaddleRec/word2vec/net.py b/PaddleRec/word2vec/net.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/net.py
rename to PaddleRec/word2vec/net.py
diff --git a/fluid/PaddleRec/word2vec/preprocess.py b/PaddleRec/word2vec/preprocess.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/preprocess.py
rename to PaddleRec/word2vec/preprocess.py
diff --git a/fluid/PaddleRec/word2vec/reader.py b/PaddleRec/word2vec/reader.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/reader.py
rename to PaddleRec/word2vec/reader.py
diff --git a/fluid/PaddleRec/word2vec/train.py b/PaddleRec/word2vec/train.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/train.py
rename to PaddleRec/word2vec/train.py
diff --git a/fluid/PaddleRec/word2vec/utils.py b/PaddleRec/word2vec/utils.py
similarity index 100%
rename from fluid/PaddleRec/word2vec/utils.py
rename to PaddleRec/word2vec/utils.py
diff --git a/fluid/PaddleSlim/compress.py b/PaddleSlim/compress.py
similarity index 100%
rename from fluid/PaddleSlim/compress.py
rename to PaddleSlim/compress.py
diff --git a/fluid/PaddleSlim/configs/filter_pruning_sen.yaml b/PaddleSlim/configs/filter_pruning_sen.yaml
similarity index 100%
rename from fluid/PaddleSlim/configs/filter_pruning_sen.yaml
rename to PaddleSlim/configs/filter_pruning_sen.yaml
diff --git a/fluid/PaddleSlim/configs/filter_pruning_uniform.yaml b/PaddleSlim/configs/filter_pruning_uniform.yaml
similarity index 100%
rename from fluid/PaddleSlim/configs/filter_pruning_uniform.yaml
rename to PaddleSlim/configs/filter_pruning_uniform.yaml
diff --git a/fluid/PaddleSlim/configs/mobilenetv1_resnet50_distillation.yaml b/PaddleSlim/configs/mobilenetv1_resnet50_distillation.yaml
similarity index 100%
rename from fluid/PaddleSlim/configs/mobilenetv1_resnet50_distillation.yaml
rename to PaddleSlim/configs/mobilenetv1_resnet50_distillation.yaml
diff --git a/fluid/PaddleSlim/configs/quantization.yaml b/PaddleSlim/configs/quantization.yaml
similarity index 100%
rename from fluid/PaddleSlim/configs/quantization.yaml
rename to PaddleSlim/configs/quantization.yaml
diff --git a/fluid/PaddleSlim/models/__init__.py b/PaddleSlim/models/__init__.py
similarity index 100%
rename from fluid/PaddleSlim/models/__init__.py
rename to PaddleSlim/models/__init__.py
diff --git a/fluid/PaddleSlim/models/mobilenet.py b/PaddleSlim/models/mobilenet.py
similarity index 100%
rename from fluid/PaddleSlim/models/mobilenet.py
rename to PaddleSlim/models/mobilenet.py
diff --git a/fluid/PaddleSlim/models/resnet.py b/PaddleSlim/models/resnet.py
similarity index 100%
rename from fluid/PaddleSlim/models/resnet.py
rename to PaddleSlim/models/resnet.py
diff --git a/fluid/PaddleSlim/quant_low_level_api/quant.py b/PaddleSlim/quant_low_level_api/quant.py
similarity index 100%
rename from fluid/PaddleSlim/quant_low_level_api/quant.py
rename to PaddleSlim/quant_low_level_api/quant.py
diff --git a/fluid/PaddleSlim/quant_low_level_api/run_quant.sh b/PaddleSlim/quant_low_level_api/run_quant.sh
similarity index 100%
rename from fluid/PaddleSlim/quant_low_level_api/run_quant.sh
rename to PaddleSlim/quant_low_level_api/run_quant.sh
diff --git a/fluid/PaddleSlim/reader.py b/PaddleSlim/reader.py
similarity index 100%
rename from fluid/PaddleSlim/reader.py
rename to PaddleSlim/reader.py
diff --git a/fluid/PaddleSlim/run.sh b/PaddleSlim/run.sh
similarity index 100%
rename from fluid/PaddleSlim/run.sh
rename to PaddleSlim/run.sh
diff --git a/fluid/PaddleSlim/utility.py b/PaddleSlim/utility.py
similarity index 100%
rename from fluid/PaddleSlim/utility.py
rename to PaddleSlim/utility.py
diff --git a/fluid/DeepASR/.gitignore b/PaddleSpeech/DeepASR/.gitignore
similarity index 100%
rename from fluid/DeepASR/.gitignore
rename to PaddleSpeech/DeepASR/.gitignore
diff --git a/PaddleSpeech/DeepASR/README.md b/PaddleSpeech/DeepASR/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..6b9913fd30a56ef2328bc62e9b36e496f6763430
--- /dev/null
+++ b/PaddleSpeech/DeepASR/README.md
@@ -0,0 +1,36 @@
+The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+
+## Deep Automatic Speech Recognition
+
+### Introduction
+TBD
+
+### Installation
+
+#### Kaldi
+The decoder depends on [kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. Then
+
+```shell
+export KALDI_ROOT=
+```
+
+#### Decoder
+
+```shell
+git clone https://github.com/PaddlePaddle/models.git
+cd models/PaddleSpeech/DeepASR/decoder
+sh setup.sh
+```
+
+### Data preprocessing
+TBD
+
+### Training
+TBD
+
+
+### Inference & Decoding
+TBD
+
+### Question and Contribution
+TBD diff --git a/PaddleSpeech/DeepASR/README_cn.md b/PaddleSpeech/DeepASR/README_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..be78a048701a621bd90942bdfe30ef4d7c7f082f --- /dev/null +++ b/PaddleSpeech/DeepASR/README_cn.md @@ -0,0 +1,186 @@ +运行本目录下的程序示例需要使用 PaddlePaddle v0.14及以上版本。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新 PaddlePaddle 安装版本。
+
+---
+
+DeepASR (Deep Automatic Speech Recognition) 是一个基于PaddlePaddle Fluid与[Kaldi](http://www.kaldi-asr.org)的语音识别系统,利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器,旨在方便对 Kaldi 较为熟悉的用户实现声学模型的快速、大规模训练,并利用 Kaldi 完成复杂的语音数据预处理和最终的解码过程。
+
+### 目录
+- [模型概览](#model-overview)
+- [安装](#installation)
+- [数据预处理](#data-reprocessing)
+- [模型训练](#training)
+- [训练过程中的时间分析](#perf-profiling)
+- [预测和解码](#infer-decoding)
+- [评估错误率](#scoring-error-rate)
+- [Aishell 实例](#aishell-example)
+- [欢迎贡献更多的实例](#how-to-contrib)
+
+### 模型概览
+
+DeepASR的声学模型是一个单卷积层加多层层叠LSTMP 的结构,利用卷积来进行初步的特征提取,并用多层的LSTMP来对时序关系进行建模,所用到的损失函数是交叉熵。[LSTMP](https://arxiv.org/abs/1402.1128)(LSTM with recurrent projection layer)是传统 LSTM 的拓展,在 LSTM 的基础上增加了一个映射层,将隐含层映射到较低的维度并输入下一个时间步,这种结构在大为减小 LSTM 的参数规模和计算复杂度的同时还提升了 LSTM 的性能表现。图1之后附有单步 LSTMP 计算的一个极简示意。
+
+<p align="center">
+<img src="images/lstmp.png" /><br/>
+图1 LSTMP 的拓扑结构
+</p>
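(补充示意,非本目录源码)为帮助理解 LSTMP 的映射层,下面用 NumPy 给出单个时间步的极简实现;其中权重名称与维度均为假设,仅用于说明"先计算完整隐状态、再映射到低维并作为下一时间步循环输入"的结构:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstmp_step(x_t, r_prev, c_prev, W, U, b, W_proj):
    """单步 LSTMP(形状均为假设):x_t: (d_in,), r_prev: (d_proj,),
    c_prev: (d_hid,), W: (4*d_hid, d_in), U: (4*d_hid, d_proj),
    b: (4*d_hid,), W_proj: (d_proj, d_hid),其中 d_proj < d_hid。"""
    z = W @ x_t + U @ r_prev + b              # 循环输入是低维映射 r,而非完整隐状态
    i, f, o, g = np.split(z, 4)               # 输入门、遗忘门、输出门、候选状态
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)           # 完整隐状态,维度 d_hid
    r_t = W_proj @ h_t                        # 映射层:压缩到 d_proj 维,传入下一时间步
    return r_t, c_t
```

循环权重 U 的参数量由 4×d_hid×d_hid 降为 4×d_hid×d_proj,这正是 LSTMP 减小参数规模与计算量的来源。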
+
+### 安装
+
+
+#### kaldi的安装与设置
+
+
+DeepASR解码过程中所用的解码器依赖于[Kaldi的安装](https://github.com/kaldi-asr/kaldi),如环境中无Kaldi,请`git clone`其源代码,并按给定的命令安装好Kaldi,最后设置环境变量`KALDI_ROOT`:
+
+```shell
+export KALDI_ROOT=
+
+```
+#### 解码器的安装
+进入解码器源码所在的目录
+
+```shell
+cd models/PaddleSpeech/DeepASR/decoder
+```
+运行安装脚本
+
+```shell
+sh setup.sh
+```
+编译过程完成,即成功地安装了解码器。
+
+### 数据预处理
+
+参考[Kaldi的数据准备流程](http://kaldi-asr.org/doc/data_prep.html)完成音频数据的特征提取和标签对齐。
+
+### 声学模型的训练
+
+可以选择在CPU或GPU模式下进行声学模型的训练,例如在GPU模式下的训练:
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py \
+    --train_feature_lst train_feature.lst \
+    --train_label_lst train_label.lst \
+    --val_feature_lst val_feature.lst \
+    --val_label_lst val_label.lst \
+    --mean_var global_mean_var \
+    --parallel
+```
+其中`train_feature.lst`和`train_label.lst`分别是训练数据集的特征列表文件和标注列表文件;类似地,`val_feature.lst`和`val_label.lst`对应的则是验证集的列表文件。实际训练过程中要正确指定建模单元大小、学习率等重要参数。关于这些参数的说明,请运行
+
+```shell
+python train.py --help
+```
+获取更多信息。
+
+### 训练过程中的时间分析
+
+利用Fluid提供的性能分析工具profiler,可对训练过程进行性能分析,获取网络中operator级别的执行时间:
+
+```shell
+CUDA_VISIBLE_DEVICES=0 python -u tools/profile.py \
+    --train_feature_lst train_feature.lst \
+    --train_label_lst train_label.lst \
+    --val_feature_lst val_feature.lst \
+    --val_label_lst val_label.lst \
+    --mean_var global_mean_var
+```
+
+
+### 预测和解码
+
+在充分训练好声学模型之后,利用训练过程中保存下来的模型checkpoint,可对输入的音频数据进行解码输出,得到声音到文字的识别结果:
+
+```
+CUDA_VISIBLE_DEVICES=0,1,2,3 python -u infer_by_ckpt.py \
+    --batch_size 96 \
+    --checkpoint deep_asr.pass_1.checkpoint \
+    --infer_feature_lst test_feature.lst \
+    --infer_label_lst test_label.lst \
+    --mean_var global_mean_var \
+    --parallel
+```
+
+### 评估错误率
+
+对语音识别系统的评价常用的指标有词错误率(Word Error Rate, WER)和字错误率(Character Error Rate, CER),在DeepASR中也实现了相关的度量工具,其运行方式为
+
+```
+python score_error_rate.py --error_rate_type cer --ref ref.txt --hyp decoding.txt
+```
+参数`error_rate_type`表示测量错误率的类型,即 WER 或 CER;`ref.txt` 和 `decoding.txt` 分别表示参考文本和实际解码出的文本(错误率的核心是两者间的编辑距离,图2之后附有一个极简的示意实现),它们有着同样的格式:
+
+```
+key1 text1
+key2 text2
+key3 text3
+...
+
+```
+
+
+### Aishell 实例
+
+本节以[Aishell数据集](http://www.aishelltech.com/kysjcp)为例,展示如何完成从数据预处理到解码输出的全流程。Aishell是由北京希尔贝壳科技有限公司开放的中文普通话语音数据集,时长178小时,包含了400名来自不同口音区域录制者的语音,原始数据可由[openslr](http://www.openslr.org/33)获取。为简化流程,这里提供了已完成预处理的数据集供下载:
+
+```
+cd examples/aishell
+sh prepare_data.sh
+```
+
+其中包括了声学模型的训练数据以及解码过程中所用到的辅助文件等。下载数据完成后,在开始训练之前可对训练过程进行分析:
+
+```
+sh profile.sh
+```
+
+执行训练:
+
+```
+sh train.sh
+```
+默认是用4卡GPU进行训练,在实际过程中可根据可用GPU的数目和显存大小对`batch_size`、学习率等参数进行动态调整。训练过程中典型的损失函数和精度的变化趋势如图2所示:
+
+<p align="center">
+<img src="images/learning_curve.png" /><br/>
+图2 在Aishell数据集上训练声学模型的学习曲线
+</p>
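(补充示意,非 `tools/error_rate.py` 的源码)上文"评估错误率"一节所述的 WER/CER,核心都是参考序列与解码序列之间的编辑距离:按字符切分统计得到 CER,按词切分统计得到 WER。下面给出一个极简的 Python 示意,其中解码文本为假设的例子:

```python
def edit_distance(ref, hyp):
    """经典动态规划求编辑距离;ref/hyp 为 token 序列。"""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                                  # 全部删除
    for j in range(n + 1):
        dp[0][j] = j                                  # 全部插入
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # 删除
                           dp[i][j - 1] + 1,          # 插入
                           dp[i - 1][j - 1] + cost)   # 替换
    return dp[m][n]

ref = list("由浦东商店作为掩护")    # 参考文本,取自上文解码输出样例(去掉分词空格)
hyp = list("由浦东商铺作为掩护")    # 假设的解码结果,错了一个字
print(edit_distance(ref, hyp) / len(ref))  # CER = 1/9 ≈ 0.1111
```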
    + +完成模型训练后,即可执行预测识别测试集语音中的文字: + +``` +sh infer_by_ckpt.sh +``` + +其中包括了声学模型的预测和解码器的解码输出两个重要的过程。以下是解码输出的样例: + +``` +... +BAC009S0764W0239 十一 五 期间 我 国 累计 境外 投资 七千亿 美元 +BAC009S0765W0140 在 了解 送 方 的 资产 情况 与 需求 之后 +BAC009S0915W0291 这 对 苹果 来说 不 是 件 容易 的 事 儿 +BAC009S0769W0159 今年 土地 收入 预计 近 四万亿 元 +BAC009S0907W0451 由 浦东 商店 作为 掩护 +BAC009S0768W0128 土地 交易 可能 随着 供应 淡季 的 到来 而 降温 +... +``` + +每行对应一个输出,均以音频样本的关键字开头,随后是按词分隔的解码出的中文文本。解码完成后运行脚本评估字错误率(CER) + +``` +sh score_cer.sh +``` + +其输出类似于如下所示 + +``` +Error rate[cer] = 0.101971 (10683/104765), +total 7176 sentences in hyp, 0 not presented in ref. +``` + +利用经过20轮左右训练的声学模型,可以在Aishell的测试集上得到CER约10%的识别结果。 + + +### 欢迎贡献更多的实例 + +DeepASR目前只开放了Aishell实例,我们欢迎用户在更多的数据集上测试完整的训练流程并贡献到这个项目中。 diff --git a/fluid/PaddleRec/gru4rec/__init__.py b/PaddleSpeech/DeepASR/data_utils/__init__.py similarity index 100% rename from fluid/PaddleRec/gru4rec/__init__.py rename to PaddleSpeech/DeepASR/data_utils/__init__.py diff --git a/fluid/DeepASR/data_utils/async_data_reader.py b/PaddleSpeech/DeepASR/data_utils/async_data_reader.py similarity index 100% rename from fluid/DeepASR/data_utils/async_data_reader.py rename to PaddleSpeech/DeepASR/data_utils/async_data_reader.py diff --git a/fluid/PaddleRec/multiview_simnet/__init__.py b/PaddleSpeech/DeepASR/data_utils/augmentor/__init__.py similarity index 100% rename from fluid/PaddleRec/multiview_simnet/__init__.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/__init__.py diff --git a/fluid/DeepASR/data_utils/augmentor/tests/__init__.py b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/__init__.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/tests/__init__.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/tests/__init__.py diff --git a/fluid/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr rename to PaddleSpeech/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr diff --git a/fluid/DeepASR/data_utils/augmentor/tests/test_data_trans.py b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/test_data_trans.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/tests/test_data_trans.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/tests/test_data_trans.py diff --git a/fluid/DeepASR/data_utils/augmentor/trans_add_delta.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_add_delta.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/trans_add_delta.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/trans_add_delta.py diff --git a/fluid/DeepASR/data_utils/augmentor/trans_delay.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_delay.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/trans_delay.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/trans_delay.py diff --git a/fluid/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py rename to PaddleSpeech/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py diff --git a/fluid/DeepASR/data_utils/augmentor/trans_splice.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_splice.py similarity index 100% rename from fluid/DeepASR/data_utils/augmentor/trans_splice.py rename to 
PaddleSpeech/DeepASR/data_utils/augmentor/trans_splice.py diff --git a/fluid/DeepASR/data_utils/util.py b/PaddleSpeech/DeepASR/data_utils/util.py similarity index 100% rename from fluid/DeepASR/data_utils/util.py rename to PaddleSpeech/DeepASR/data_utils/util.py diff --git a/fluid/DeepASR/decoder/.gitignore b/PaddleSpeech/DeepASR/decoder/.gitignore similarity index 100% rename from fluid/DeepASR/decoder/.gitignore rename to PaddleSpeech/DeepASR/decoder/.gitignore diff --git a/fluid/DeepASR/decoder/post_latgen_faster_mapped.cc b/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.cc similarity index 100% rename from fluid/DeepASR/decoder/post_latgen_faster_mapped.cc rename to PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.cc diff --git a/fluid/DeepASR/decoder/post_latgen_faster_mapped.h b/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.h similarity index 100% rename from fluid/DeepASR/decoder/post_latgen_faster_mapped.h rename to PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.h diff --git a/fluid/DeepASR/decoder/pybind.cc b/PaddleSpeech/DeepASR/decoder/pybind.cc similarity index 100% rename from fluid/DeepASR/decoder/pybind.cc rename to PaddleSpeech/DeepASR/decoder/pybind.cc diff --git a/fluid/DeepASR/decoder/setup.py b/PaddleSpeech/DeepASR/decoder/setup.py similarity index 100% rename from fluid/DeepASR/decoder/setup.py rename to PaddleSpeech/DeepASR/decoder/setup.py diff --git a/fluid/DeepASR/decoder/setup.sh b/PaddleSpeech/DeepASR/decoder/setup.sh similarity index 100% rename from fluid/DeepASR/decoder/setup.sh rename to PaddleSpeech/DeepASR/decoder/setup.sh diff --git a/fluid/DeepASR/examples/aishell/.gitignore b/PaddleSpeech/DeepASR/examples/aishell/.gitignore similarity index 100% rename from fluid/DeepASR/examples/aishell/.gitignore rename to PaddleSpeech/DeepASR/examples/aishell/.gitignore diff --git a/fluid/DeepASR/examples/aishell/download_pretrained_model.sh b/PaddleSpeech/DeepASR/examples/aishell/download_pretrained_model.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/download_pretrained_model.sh rename to PaddleSpeech/DeepASR/examples/aishell/download_pretrained_model.sh diff --git a/fluid/DeepASR/examples/aishell/infer_by_ckpt.sh b/PaddleSpeech/DeepASR/examples/aishell/infer_by_ckpt.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/infer_by_ckpt.sh rename to PaddleSpeech/DeepASR/examples/aishell/infer_by_ckpt.sh diff --git a/fluid/DeepASR/examples/aishell/prepare_data.sh b/PaddleSpeech/DeepASR/examples/aishell/prepare_data.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/prepare_data.sh rename to PaddleSpeech/DeepASR/examples/aishell/prepare_data.sh diff --git a/fluid/DeepASR/examples/aishell/profile.sh b/PaddleSpeech/DeepASR/examples/aishell/profile.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/profile.sh rename to PaddleSpeech/DeepASR/examples/aishell/profile.sh diff --git a/fluid/DeepASR/examples/aishell/score_cer.sh b/PaddleSpeech/DeepASR/examples/aishell/score_cer.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/score_cer.sh rename to PaddleSpeech/DeepASR/examples/aishell/score_cer.sh diff --git a/fluid/DeepASR/examples/aishell/train.sh b/PaddleSpeech/DeepASR/examples/aishell/train.sh similarity index 100% rename from fluid/DeepASR/examples/aishell/train.sh rename to PaddleSpeech/DeepASR/examples/aishell/train.sh diff --git a/fluid/DeepASR/images/learning_curve.png b/PaddleSpeech/DeepASR/images/learning_curve.png similarity index 100% rename from 
fluid/DeepASR/images/learning_curve.png rename to PaddleSpeech/DeepASR/images/learning_curve.png diff --git a/fluid/DeepASR/images/lstmp.png b/PaddleSpeech/DeepASR/images/lstmp.png similarity index 100% rename from fluid/DeepASR/images/lstmp.png rename to PaddleSpeech/DeepASR/images/lstmp.png diff --git a/fluid/DeepASR/infer.py b/PaddleSpeech/DeepASR/infer.py similarity index 100% rename from fluid/DeepASR/infer.py rename to PaddleSpeech/DeepASR/infer.py diff --git a/fluid/DeepASR/infer_by_ckpt.py b/PaddleSpeech/DeepASR/infer_by_ckpt.py similarity index 100% rename from fluid/DeepASR/infer_by_ckpt.py rename to PaddleSpeech/DeepASR/infer_by_ckpt.py diff --git a/fluid/PaddleRec/ssr/__init__.py b/PaddleSpeech/DeepASR/model_utils/__init__.py similarity index 100% rename from fluid/PaddleRec/ssr/__init__.py rename to PaddleSpeech/DeepASR/model_utils/__init__.py diff --git a/fluid/DeepASR/model_utils/model.py b/PaddleSpeech/DeepASR/model_utils/model.py similarity index 100% rename from fluid/DeepASR/model_utils/model.py rename to PaddleSpeech/DeepASR/model_utils/model.py diff --git a/fluid/DeepASR/score_error_rate.py b/PaddleSpeech/DeepASR/score_error_rate.py similarity index 100% rename from fluid/DeepASR/score_error_rate.py rename to PaddleSpeech/DeepASR/score_error_rate.py diff --git a/fluid/DeepASR/tools/_init_paths.py b/PaddleSpeech/DeepASR/tools/_init_paths.py similarity index 100% rename from fluid/DeepASR/tools/_init_paths.py rename to PaddleSpeech/DeepASR/tools/_init_paths.py diff --git a/fluid/DeepASR/tools/error_rate.py b/PaddleSpeech/DeepASR/tools/error_rate.py similarity index 100% rename from fluid/DeepASR/tools/error_rate.py rename to PaddleSpeech/DeepASR/tools/error_rate.py diff --git a/fluid/DeepASR/tools/profile.py b/PaddleSpeech/DeepASR/tools/profile.py similarity index 100% rename from fluid/DeepASR/tools/profile.py rename to PaddleSpeech/DeepASR/tools/profile.py diff --git a/fluid/DeepASR/train.py b/PaddleSpeech/DeepASR/train.py similarity index 100% rename from fluid/DeepASR/train.py rename to PaddleSpeech/DeepASR/train.py diff --git a/PaddleSpeech/README.md b/PaddleSpeech/README.md new file mode 100644 index 0000000000000000000000000000000000000000..39f91c26bd90fdd0e8fa81a395d14c2d3826f7cd --- /dev/null +++ b/PaddleSpeech/README.md @@ -0,0 +1,12 @@ +Fluid 模型库
+============
+
+语音识别
+--------
+
+自动语音识别(Automatic Speech Recognition, ASR)是将人类声音中的词汇内容转录成计算机可输入的文字的技术。语音识别的相关研究经历了漫长的探索过程,在HMM/GMM模型之后其发展一直较为缓慢,随着深度学习的兴起,其迎来了春天。在多种语音识别任务中,将深度神经网络(DNN)作为声学模型,取得了比GMM更好的性能,使得 ASR 成为深度学习应用最为成功的领域之一。而由于识别准确率的不断提高,有越来越多的语音技术产品得以落地,例如语音输入法、以智能音箱为代表的智能家居设备等,基于语音的交互方式正在深刻地改变人类的生活。
+
+与 [DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech) 中深度学习模型端到端直接预测字词的分布不同,本实例更接近传统的语音识别流程,以音素为建模单元,关注语音识别中声学模型的训练,利用[kaldi](http://www.kaldi-asr.org) 进行音频数据的特征提取和标签对齐,并集成 kaldi 的解码器完成解码。
+
+- [DeepASR](https://github.com/PaddlePaddle/models/blob/develop/PaddleSpeech/DeepASR/README_cn.md)
+ diff --git a/README.md b/README.md index 182e625239e206294bf1d1ce0a8a18aaec7871ad..0ef29415f046b0c424dcf305ddc002626814a41e 100644 --- a/README.md +++ b/README.md @@ -16,57 +16,57 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化
 ## PaddleCV
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[AlexNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks) 
-[VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)|图像分类经典模型|在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力|[Very Deep ConvNets for Large-Scale Inage Recognition](https://arxiv.org/pdf/1409.1556.pdf) -[GoogleNet](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594) -[ResNet](./fluid/PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) -[Inception-v4](./fluid/PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261) -[MobileNet](./fluid/PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) -[DPN](./fluid/PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629) -[SE-ResNeXt](./fluid/PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507) -[SSD](./fluid/PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) -[Face Detector: PyramidBox](./fluid/PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf) -[Faster RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497) -[Mask RCNN](./fluid/PaddleCV/rcnn/README_cn.md)|基于Faster RCNN模型的经典实例分割模型|在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。|[Mask R-CNN](https://arxiv.org/abs/1703.06870) -[ICNet](./fluid/PaddleCV/icnet)|图像实时语义分割模型|即考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545) -[DCGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf) -[ConditionalGAN](./fluid/PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784) -[CycleGAN](./fluid/PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593) -[CRNN-CTC模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_'networks) -[Attention模型](./fluid/PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247) -[Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning)|度量学习模型|能够用于分析对象时间的关联、比较关系,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域|- 
-[TSN](./fluid/PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
-[视频模型库](./fluid/PaddleCV/video)|视频模型库|给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型||
-[caffe2fluid](./fluid/PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-
+[AlexNet](./PaddleCV/image_classification/models)|图像分类经典模型|首次在CNN中成功地应用了ReLU、Dropout和LRN,并使用GPU进行运算加速|[ImageNet Classification with Deep Convolutional Neural Networks](https://www.researchgate.net/publication/267960550_ImageNet_Classification_with_Deep_Convolutional_Neural_Networks)
+[VGG](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification/models)|图像分类经典模型|在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力|[Very Deep ConvNets for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556.pdf)
+[GoogleNet](./PaddleCV/image_classification/models)|图像分类经典模型|在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越|[Going deeper with convolutions](https://ieeexplore.ieee.org/document/7298594)
+[ResNet](./PaddleCV/image_classification/models)|残差网络|引入了新的残差结构,解决了随着网络加深,准确率下降的问题|[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
+[Inception-v4](./PaddleCV/image_classification/models)|图像分类经典模型|更加deeper和wider的inception结构|[Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261)
+[MobileNet](./PaddleCV/image_classification/models)|轻量级网络模型|为移动和嵌入式设备提出的高效模型|[MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
+[DPN](./PaddleCV/image_classification/models)|图像分类模型|结合了DenseNet和ResNeXt的网络结构,对图像分类效果有所提升|[Dual Path Networks](https://arxiv.org/abs/1707.01629)
+[SE-ResNeXt](./PaddleCV/image_classification/models)|图像分类模型|ResNeXt中加入了SE block,提高了模型准确率|[Squeeze-and-excitation networks](https://arxiv.org/abs/1709.01507)
+[SSD](./PaddleCV/object_detection/README_cn.md)|单阶段目标检测器|在不同尺度的特征图上检测对应尺度的目标,可以方便地插入到任何一种标准卷积网络中|[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325)
+[Face Detector: PyramidBox](./PaddleCV/face_detection/README_cn.md)|基于SSD的单阶段人脸检测器|利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强|[PyramidBox: A Context-assisted Single Shot Face Detector](https://arxiv.org/pdf/1803.07737.pdf)
+[Faster RCNN](./PaddleCV/rcnn/README_cn.md)|典型的两阶段目标检测器|创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高|[Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/abs/1506.01497)
+[Mask RCNN](./PaddleCV/rcnn/README_cn.md)|基于Faster RCNN模型的经典实例分割模型|在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解耦。|[Mask R-CNN](https://arxiv.org/abs/1703.06870)
+[ICNet](./PaddleCV/icnet)|图像实时语义分割模型|既考虑了速度,也考虑了准确性,在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡|[ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
+[DCGAN](./PaddleCV/gan/c_gan)|图像生成模型|深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题|[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/pdf/1511.06434.pdf)
+[ConditionalGAN](./PaddleCV/gan/c_gan)|图像生成模型|条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程|[Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
+[CycleGAN](./PaddleCV/gan/cycle_gan)|图片转化模型|自动将某一类图片转换成另外一类图片,可用于风格迁移|[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593)
+[CRNN-CTC模型](./PaddleCV/ocr_recognition)|场景文字识别模型|使用CTC model识别图片中单行英文字符|[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks](https://www.researchgate.net/publication/221346365_Connectionist_temporal_classification_Labelling_unsegmented_sequence_data_with_recurrent_neural_networks)
+[Attention模型](./PaddleCV/ocr_recognition)|场景文字识别模型|使用attention 识别图片中单行英文字符|[Recurrent Models of Visual Attention](https://arxiv.org/abs/1406.6247)
+[Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning)|度量学习模型|能够用于分析对象之间的关联、比较关系,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域|-
+[TSN](./PaddleCV/video_classification)|视频分类模型|基于长范围时间结构建模,结合了稀疏时间采样策略和视频级监督来保证使用整段视频时学习得有效和高效|[Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
+[视频模型库](./PaddleCV/video)|视频模型库|给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型||
+[caffe2fluid](./PaddleCV/caffe2fluid)|将Caffe模型转换为Paddle Fluid配置和模型文件工具|-|-

 ## PaddleNLP
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[Transformer](./fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+[Transformer](./PaddleNLP/neural_machine_translation/transformer/README_cn.md)|机器翻译模型|基于self-attention,计算复杂度小,并行度高,容易学习长程依赖,翻译效果更好|[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
 [BERT](https://github.com/PaddlePaddle/LARK/tree/develop/BERT)|语义表示模型|在多个 NLP 任务上取得 SOTA 效果,支持多卡多机训练,支持混合精度训练|[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
 [LAC](https://github.com/baidu/lac/blob/master/README.md)|联合的词法分析模型|能够整体性地完成中文分词、词性标注、专名识别任务|[Chinese Lexical Analysis with Deep Bi-GRU-CRF Network](https://arxiv.org/abs/1807.01882)
 [Senta](https://github.com/baidu/Senta/blob/master/README.md)|情感倾向分析模型集|百度AI开放平台中情感倾向分析模型|-
-[DAM](./fluid/PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
+[DAM](./PaddleNLP/deep_attention_matching_net)|语义匹配模型|百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择|[Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](http://aclweb.org/anthology/P18-1103)
 [SimNet](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)|语义匹配框架|使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力|-
-[DuReader](./fluid/PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
+[DuReader](./PaddleNLP/machine_reading_comprehension/README.md)|阅读理解模型|百度MRC数据集上的机器阅读理解模型|-
-[Bi-GRU-CRF](./fluid/PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-
+[Bi-GRU-CRF](./PaddleNLP/sequence_tagging_for_ner/README.md)|命名实体识别|结合了CRF和双向GRU的命名实体识别模型|-

 ## PaddleRec
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[TagSpace](./fluid/PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
-[GRU4Rec](./fluid/PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
-[SSR](./fluid/PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726) 
-[DeepCTR](./fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
-[Multiview-Simnet](./fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
+[TagSpace](./PaddleRec/tagspace)|文本及标签的embedding表示学习模型|应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等|[#TagSpace: Semantic embeddings from hashtags](https://www.bibsonomy.org/bibtex/0ed4314916f8e7c90d066db45c293462)
+[GRU4Rec](./PaddleRec/gru4rec)|个性化推荐模型|首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升|[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)
+[SSR](./PaddleRec/ssr)|序列语义检索推荐模型|使用参考论文中的思想,使用多种时间粒度进行用户行为预测|[Multi-Rate Deep Learning for Temporal Recommendation](https://dl.acm.org/citation.cfm?id=2914726)
+[DeepCTR](./PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
+[Multiview-Simnet](./PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)

 ## Other Models
 模型|简介|模型优势|参考论文
 --|:--:|:--:|:--:
-[DeepASR](./fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
-[DQN](./fluid/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
-[DoubleDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过优化问题|[Font Size: Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
-[DuelingDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
+[DeepASR](./PaddleSpeech/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
+[DQN](./PaddleRL/DeepQNetwork/README_cn.md)|深度Q网络|value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
+[DoubleDQN](./PaddleRL/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,解决过估计问题|[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+[DuelingDQN](./PaddleRL/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)

 ## License
 This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE). 
diff --git a/fluid/__init__.py b/contrib/README.md similarity index 100% rename from fluid/__init__.py rename to contrib/README.md diff --git a/fluid/AutoDL/LRC/README.md b/fluid/AutoDL/LRC/README.md index df9af47d4a3876371673cbbfef0ad2553768b9a5..546cb19169b965af5a3d0d41c903e318d4dfc64a 100644 --- a/fluid/AutoDL/LRC/README.md +++ b/fluid/AutoDL/LRC/README.md @@ -1,74 +1,6 @@ -# LRC Local Rademachar Complexity Regularization -Regularization of Deep Neural Networks(DNNs) for the sake of improving their generalization capability is important and chllenging. This directory contains image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution by [DARTS](https://arxiv.org/abs/1806.09055) for our research. The regularization by LRC and DARTS are combined in this model on CIFAR-10 dataset. Code accompanying the paper -> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\ -> Yingzhen Yang, Xingjian Li, Jun Huan.\ -> _arXiv:1902.00873_. ---- -# Table of Contents +Hi! -- [Installation](#installation) -- [Data preparation](#data-preparation) -- [Training](#training) +This directory has been deprecated. -## Installation - -Running sample code in this directory requires PaddelPaddle Fluid v.1.2.0 and later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) and make an update. - -## Data preparation - -When you want to use the cifar-10 dataset for the first time, you can download the dataset as: - - sh ./dataset/download.sh - -Please make sure your environment has an internet connection. - -The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as the `train.py`. If automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location mentioned above. - - -## Training - -After data preparation, one can start the training step by: - - python -u train_mixup.py \ - --batch_size=80 \ - --auxiliary \ - --weight_decay=0.0003 \ - --learning_rate=0.025 \ - --lrc_loss_lambda=0.7 \ - --cutout -- Set ```export CUDA_VISIBLE_DEVICES=0``` to specifiy one GPU to train. -- For more help on arguments: - - python train_mixup.py --help - -**data reader introduction:** - -* Data reader is defined in `reader.py`. -* Reshape the images to 32 * 32. -* In training stage, images are padding to 40 * 40 and cropped randomly to the original size. -* In training stage, images are horizontally random flipped. -* Images are standardized to (0, 1). -* In training stage, cutout images randomly. -* Shuffle the order of the input images during training. - -**model configuration:** - -* Use auxiliary loss and auxiliary\_weight=0.4. -* Use dropout and drop\_path\_prob=0.2. -* Set lrc\_loss\_lambda=0.7. - -**training strategy:** - -* Use momentum optimizer with momentum=0.9. -* Weight decay is 0.0003. -* Use cosine decay with init\_lr=0.025. -* Total epoch is 600. -* Use Xaiver initalizer to weight in conv2d, Constant initalizer to weight in batch norm and Normal initalizer to weight in fc. -* Initalize bias in batch norm and fc to zero constant and do not add bias to conv2d. 
- - -## Reference - - - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055) - - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts) +Please visit the project at [AutoDL/LRC](../../../AutoDL/LRC). diff --git a/fluid/AutoDL/LRC/README_cn.md b/fluid/AutoDL/LRC/README_cn.md index 06dc937074de199af31db97ee200e7690443b1b0..6c87fd2d1cb5f6f4d187d665548ed7c74746bf10 100644 --- a/fluid/AutoDL/LRC/README_cn.md +++ b/fluid/AutoDL/LRC/README_cn.md @@ -1,71 +1,2 @@ -# LRC 局部Rademachar复杂度正则化 -为了在深度神经网络中提升泛化能力,正则化的选择十分重要也具有挑战性。本目录包括了一种基于局部rademacher复杂度的新型正则(LRC)的图像分类模型。十分感谢[DARTS](https://arxiv.org/abs/1806.09055)模型对本研究提供的帮助。该模型将LRC正则和DARTS网络相结合,在CIFAR-10数据集中得到了很出色的效果。代码和文章一同发布 -> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\ -> Yingzhen Yang, Xingjian Li, Jun Huan.\ -> _arXiv:1902.00873_. ---- -# 内容 - -- [安装](#安装) -- [数据准备](#数据准备) -- [模型训练](#模型训练) - -## 安装 - -在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.2.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle)中的说明来更新PaddlePaddle。 - -## 数据准备 - -第一次使用CIFAR-10数据集时,您可以通过如果命令下载: - - sh ./dataset/download.sh - -请确保您的环境有互联网连接。数据会下载到`train.py`同目录下的`dataset/cifar/cifar-10-batches-py`。如果下载失败,您可以自行从https://www.cs.toronto.edu/~kriz/cifar.html上下载cifar-10-python.tar.gz并解压到上述位置。 - -## 模型训练 - -数据准备好后,可以通过如下命令开始训练: - - python -u train_mixup.py \ - --batch_size=80 \ - --auxiliary \ - --weight_decay=0.0003 \ - --learning_rate=0.025 \ - --lrc_loss_lambda=0.7 \ - --cutout -- 通过设置 ```export CUDA_VISIBLE_DEVICES=0```指定单张GPU训练。 -- 可选参数见: - - python train_mixup.py --help - -**数据读取器说明:** - -* 数据读取器定义在`reader.py`中 -* 输入图像尺寸统一变换为32 * 32 -* 训练时将图像填充为40 * 40然后随机剪裁为原输入图像大小 -* 训练时图像随机水平翻转 -* 对图像每个像素做归一化处理 -* 训练时对图像做随机遮挡 -* 训练时对输入图像做随机洗牌 - -**模型配置:** - -* 使用辅助损失,辅助损失权重为0.4 -* 使用dropout,随机丢弃率为0.2 -* 设置lrc\_loss\_lambda为0.7 - -**训练策略:** - -* 采用momentum优化算法训练,momentum=0.9 -* 权重衰减系数为0.0001 -* 采用正弦学习率衰减,初始学习率为0.025 -* 总共训练600轮 -* 对卷积权重采用Xaiver初始化,对batch norm权重采用固定初始化,对全连接层权重采用高斯初始化 -* 对batch norm和全连接层偏差采用固定初始化,不对卷积设置偏差 - - -## 引用 - - - DARTS: Differentiable Architecture Search [`论文`](https://arxiv.org/abs/1806.09055) - - Differentiable Architecture Search in PyTorch [`代码`](https://github.com/quark0/darts) +您好,该项目已被迁移,请移步到 [AutoDL/LRC](../../../AutoDL/LRC) 目录下浏览本项目。 diff --git a/fluid/DeepASR/README.md b/fluid/DeepASR/README.md index 6b9913fd30a56ef2328bc62e9b36e496f6763430..b7d916c58649790055b2ddbdd32e914d02f14ebf 100644 --- a/fluid/DeepASR/README.md +++ b/fluid/DeepASR/README.md @@ -1,36 +1,6 @@ -The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). -## Deep Automatic Speech Recognition +Hi! -### Introduction -TBD +This directory has been deprecated. -### Installation - -#### Kaldi -The decoder depends on [kaldi](https://github.com/kaldi-asr/kaldi), install it by flowing its instructions. 
Then - -```shell -export KALDI_ROOT= -``` - -#### Decoder - -```shell -git clone https://github.com/PaddlePaddle/models.git -cd models/fluid/DeepASR/decoder -sh setup.sh -``` - -### Data reprocessing -TBD - -### Training -TBD - - -### Inference & Decoding -TBD - -### Question and Contribution -TBD +Please visit the project at [PaddleSpeech/DeepASR](../../PaddleSpeech/DeepASR). diff --git a/fluid/DeepASR/README_cn.md b/fluid/DeepASR/README_cn.md index be78a048701a621bd90942bdfe30ef4d7c7f082f..51b0e724c810165810154915f41159d478398234 100644 --- a/fluid/DeepASR/README_cn.md +++ b/fluid/DeepASR/README_cn.md @@ -1,186 +1,2 @@ -运行本目录下的程序示例需要使用 PaddlePaddle v0.14及以上版本。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新 PaddlePaddle 安装版本。 ---- - -DeepASR (Deep Automatic Speech Recognition) 是一个基于PaddlePaddle FLuid与[Kaldi](http://www.kaldi-asr.org)的语音识别系统。其利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器。旨在方便已对 Kaldi 的较为熟悉的用户实现中声学模型的快速、大规模训练,并利用kaldi完成复杂的语音数据预处理和最终的解码过程。 - -### 目录 -- [模型概览](#model-overview) -- [安装](#installation) -- [数据预处理](#data-reprocessing) -- [模型训练](#training) -- [训练过程中的时间分析](#perf-profiling) -- [预测和解码](#infer-decoding) -- [评估错误率](#scoring-error-rate) -- [Aishell 实例](#aishell-example) -- [欢迎贡献更多的实例](#how-to-contrib) - -### 模型概览 - -DeepASR的声学模型是一个单卷积层加多层层叠LSTMP 的结构,利用卷积来进行初步的特征提取,并用多层的LSTMP来对时序关系进行建模,所用到的损失函数是交叉熵。[LSTMP](https://arxiv.org/abs/1402.1128)(LSTM with recurrent projection layer)是传统 LSTM 的拓展,在 LSTM 的基础上增加了一个映射层,将隐含层映射到较低的维度并输入下一个时间步,这种结构在大为减小 LSTM 的参数规模和计算复杂度的同时还提升了 LSTM 的性能表现。 - -
-<p align="center">
-<img src="images/lstmp.png" /><br/>
-图1 LSTMP 的拓扑结构
-</p>
    - -### 安装 - - -#### kaldi的安装与设置 - - -DeepASR解码过程中所用的解码器依赖于[Kaldi的安装](https://github.com/kaldi-asr/kaldi),如环境中无Kaldi, 请`git clone`其源代码,并按给定的命令安装好kaldi,最后设置环境变量`KALDI_ROOT`: - -```shell -export KALDI_ROOT= - -``` -#### 解码器的安装 -进入解码器源码所在的目录 - -```shell -cd models/fluid/DeepASR/decoder -``` -运行安装脚本 - -```shell -sh setup.sh -``` - 编译过程完成即成功地安转了解码器。 - -### 数据预处理 - -参考[Kaldi的数据准备流程](http://kaldi-asr.org/doc/data_prep.html)完成音频数据的特征提取和标签对齐 - -### 声学模型的训练 - -可以选择在CPU或GPU模式下进行声学模型的训练,例如在GPU模式下的训练 - -```shell -CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py \ - --train_feature_lst train_feature.lst \ - --train_label_lst train_label.lst \ - --val_feature_lst val_feature.lst \ - --val_label_lst val_label.lst \ - --mean_var global_mean_var \ - --parallel -``` -其中`train_feature.lst`和`train_label.lst`分别是训练数据集的特征列表文件和标注列表文件,类似的,`val_feature.lst`和`val_label.lst`对应的则是验证集的列表文件。实际训练过程中要正确指定建模单元大小、学习率等重要参数。关于这些参数的说明,请运行 - -```shell -python train.py --help -``` -获取更多信息。 - -### 训练过程中的时间分析 - -利用Fluid提供的性能分析工具profiler,可对训练过程进行性能分析,获取网络中operator级别的执行时间 - -```shell -CUDA_VISIBLE_DEVICES=0 python -u tools/profile.py \ - --train_feature_lst train_feature.lst \ - --train_label_lst train_label.lst \ - --val_feature_lst val_feature.lst \ - --val_label_lst val_label.lst \ - --mean_var global_mean_var -``` - - -### 预测和解码 - -在充分训练好声学模型之后,利用训练过程中保存下来的模型checkpoint,可对输入的音频数据进行解码输出,得到声音到文字的识别结果 - -``` -CUDA_VISIBLE_DEVICES=0,1,2,3 python -u infer_by_ckpt.py \ - --batch_size 96 \ - --checkpoint deep_asr.pass_1.checkpoint \ - --infer_feature_lst test_feature.lst \ - --infer_label_lst test_label.lst \ - --mean_var global_mean_var \ - --parallel -``` - -### 评估错误率 - -对语音识别系统的评价常用的指标有词错误率(Word Error Rate, WER)和字错误率(Character Error Rate, CER), 在DeepASR中也实现了相关的度量工具,其运行方式为 - -``` -python score_error_rate.py --error_rate_type cer --ref ref.txt --hyp decoding.txt -``` -参数`error_rate_type`表示测量错误率的类型,即 WER 或 CER;`ref.txt` 和 `decoding.txt` 分别表示参考文本和实际解码出的文本,它们有着同样的格式: - -``` -key1 text1 -key2 text2 -key3 text3 -... - -``` - - -### Aishell 实例 - -本节以[Aishell数据集](http://www.aishelltech.com/kysjcp)为例,展示如何完成从数据预处理到解码输出。Aishell是由北京希尔贝克公司所开放的中文普通话语音数据集,时长178小时,包含了400名来自不同口音区域录制者的语音,原始数据可由[openslr](http://www.openslr.org/33)获取。为简化流程,这里提供了已完成预处理的数据集供下载: - -``` -cd examples/aishell -sh prepare_data.sh -``` - -其中包括了声学模型的训练数据以及解码过程中所用到的辅助文件等。下载数据完成后,在开始训练之前可对训练过程进行分析 - -``` -sh profile.sh -``` - -执行训练 - -``` -sh train.sh -``` -默认是用4卡GPU进行训练,在实际过程中可根据可用GPU的数目和显存大小对`batch_size`、学习率等参数进行动态调整。训练过程中典型的损失函数和精度的变化趋势如图2所示 - -
-<p align="center">
-<img src="images/learning_curve.png" /><br/>
-图2 在Aishell数据集上训练声学模型的学习曲线
-</p>
    - -完成模型训练后,即可执行预测识别测试集语音中的文字: - -``` -sh infer_by_ckpt.sh -``` - -其中包括了声学模型的预测和解码器的解码输出两个重要的过程。以下是解码输出的样例: - -``` -... -BAC009S0764W0239 十一 五 期间 我 国 累计 境外 投资 七千亿 美元 -BAC009S0765W0140 在 了解 送 方 的 资产 情况 与 需求 之后 -BAC009S0915W0291 这 对 苹果 来说 不 是 件 容易 的 事 儿 -BAC009S0769W0159 今年 土地 收入 预计 近 四万亿 元 -BAC009S0907W0451 由 浦东 商店 作为 掩护 -BAC009S0768W0128 土地 交易 可能 随着 供应 淡季 的 到来 而 降温 -... -``` - -每行对应一个输出,均以音频样本的关键字开头,随后是按词分隔的解码出的中文文本。解码完成后运行脚本评估字错误率(CER) - -``` -sh score_cer.sh -``` - -其输出类似于如下所示 - -``` -Error rate[cer] = 0.101971 (10683/104765), -total 7176 sentences in hyp, 0 not presented in ref. -``` - -利用经过20轮左右训练的声学模型,可以在Aishell的测试集上得到CER约10%的识别结果。 - - -### 欢迎贡献更多的实例 - -DeepASR目前只开放了Aishell实例,我们欢迎用户在更多的数据集上测试完整的训练流程并贡献到这个项目中。 +您好,该项目已被迁移,请移步到 [PaddleSpeech/DeepASR](../../PaddleSpeech/DeepASR) 目录下浏览本项目。 diff --git a/fluid/DeepQNetwork/README.md b/fluid/DeepQNetwork/README.md index 1edeaaa884318ec3a530ec4fdb7d031d07411b56..f82d57f12cc4e97dae99d5a711ee495a9895aa91 100644 --- a/fluid/DeepQNetwork/README.md +++ b/fluid/DeepQNetwork/README.md @@ -1,67 +1,6 @@ -[中文版](README_cn.md) -## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle -Based on PaddlePaddle's next-generation API Fluid, the DQN model of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game. The model receives the image of the game as input, and uses the end-to-end model to directly predict the next step. The repository contains the following three types of models: -+ DQN in -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN in: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN in: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) +Hi! -## Atari benchmark & performance +This directory has been deprecated. -### Atari games introduction - -Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari game. - -### Pong game result - -The average game rewards that can be obtained for the three models as the number of training steps changes during the training are as follows(about 3 hours/1 Million steps): - -
-[figure: DQN result]
-
    - -## How to use -### Dependencies: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### Install Dependencies: -+ Install PaddlePaddle: - recommended to compile and install PaddlePaddle from source code -+ Install other dependencies: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - Install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment). - -### Start Training: -``` -# To train a model for Pong game with gpu (use DQN model as default) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# To train a model for Pong with DoubleDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# To train a model for Pong with DuelingDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -To train more games, you can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms). - -### Start Testing: -``` -# Play the game with saved best model and calculate the average rewards -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# Play the game with visualization -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` -[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) is saved models for Pong and Breakout games. You can use it to play the game directly. +Please visit the project at [PaddleRL/DeepQNetwork](../../PaddleRL/DeepQNetwork). diff --git a/fluid/DeepQNetwork/README_cn.md b/fluid/DeepQNetwork/README_cn.md index 640d775ad8fed2be360d308b6c5df41c86d77c04..b90f215b2d8e0734db5a41b00ab02260021c8cf6 100644 --- a/fluid/DeepQNetwork/README_cn.md +++ b/fluid/DeepQNetwork/README_cn.md @@ -1,71 +1,2 @@ -## 基于PaddlePaddle的Fluid版本复现DQN, DoubleDQN, DuelingDQN三个模型 -基于PaddlePaddle下一代API Fluid复现了深度强化学习领域的DQN模型,在经典的Atari 游戏上复现了论文同等水平的指标,模型接收游戏的图像作为输入,采用端到端的模型直接预测下一步要执行的控制信号,本仓库一共包含以下3类模型: -+ DQN模型: -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN模型: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN模型: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) - -## 模型效果:Atari游戏表现 - -### Atari游戏介绍 - -请点击[这里](https://gym.openai.com/envs/#atari)了解Atari游戏。 - -### Pong游戏训练结果 -三个模型在训练过程中随着训练步数的变化,能得到的平均游戏奖励如下图所示(大概3小时每1百万步): - -
-[图:DQN result]
-
    - -## 使用教程 - -### 依赖: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### 下载依赖: - -+ 安装PaddlePaddle: - 建议通过PaddlePaddle源码进行编译安装 -+ 下载其它依赖: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - 安装ale_python_interface可以参考[这里](https://github.com/mgbellemare/Arcade-Learning-Environment) - -### 训练模型: - -``` -# 使用GPU训练Pong游戏(默认使用DQN模型) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# 训练DoubleDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# 训练DuelingDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -训练更多游戏,可以从[这里](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)下载游戏rom - -### 测试模型: - -``` -# Play the game with saved model and calculate the average rewards -# 使用训练过程中保存的最好模型玩游戏,以及计算平均奖励(rewards) -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# 以可视化的形式来玩游戏 -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` - -[这里](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA)是Pong和Breakout游戏训练好的模型,可以直接用来测试。 +您好,该项目已被迁移,请移步到 [PaddleRL/DeepQNetwork](../../PaddleRL/DeepQNetwork) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/HiNAS_models/README.md b/fluid/PaddleCV/HiNAS_models/README.md old mode 100755 new mode 100644 index 9c67736aa30643baf72ce42ed2ca3321d4e22165..1e33fea89e2d4e3a9b9ef2cad81012d082ccc504 --- a/fluid/PaddleCV/HiNAS_models/README.md +++ b/fluid/PaddleCV/HiNAS_models/README.md @@ -1,76 +1,6 @@ -# Image Classification Models -This directory contains six image classification models, which are models automatically discovered by Baidu Big Data Lab (BDL) Hierarchical Neural Architecture Search project (HiNAS), achieving 96.1% accuracy on CIFAR-10 dataset. These models are divided into two categories. The first three have no skip link, named HiNAS 0-2, and the last three networks contain skip links, which are similar to the shortcut connections in Resnet, named HiNAS 3-5. ---- -## Table of Contents -- [Installation](#installation) -- [Data preparation](#data-preparation) -- [Training a model](#training-a-model) -- [Model performances](#model-performances) +Hi! -## Installation -Running the trainer in current directory requires: +This directory has been deprecated. -- PadddlePaddle Fluid >= v0.15.0 -- CuDNN >=6.0 - -If PaddlePaddle and CuDNN in your runtime environment do not meet the requirements, please follow the instructions in [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update. - -## Data preparation - -When you run the sample code for the first time, the trainer will automatically download the cifar-10 dataset. Please make sure your environment has an internet connection. - -The dataset will be downloaded to `dataset/cifar/cifar-10-python.tar.gz` in the same directory as the Trainer. If automatic download fails, you can go to https://www.cs.toronto.edu/~kriz/cifar.html and download cifar-10-python.tar.gz to the location mentioned above. - -## Training a model - -After the environment is ready, you can train the model. There are two entrances: `train_hinas.py` and `train_hinas_res.py`. The former is used to train Model 0-2 (without skip link), and the latter is used to train Model 3-5 (contains skip link). - -Train Model 0~2 (without skip link): -``` -python train_hinas.py --model=m_id # m_id can be 0, 1 or 2. 
-``` -Train Model 3~5 (with skip link): -``` -python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2. -``` - -In addition, both `train_hinas.py` and `train_hinas_res.py` support the following parameters: - -- **random_flip_left_right**: Random flip image horizontally. (Default: True) -- **random_flip_up_down**: Randomly flip image vertically. (Default: False) -- **cutout**: Add cutout action to image. (Default: True) -- **standardize_image**: Image standardize. (Default: True) -- **pad_and_cut_image**: Random padding image and then crop back to the original size. (Default: True) -- **shuffle_image**: Shuffle the order of the input images during training. (Default: True) -- **lr_max**: Learning rate at the begin of training. (Default: 0.1) -- **lr_min**: Learning rate at the end of training. (Default: 0.0001) -- **batch_size**: Training batch size (Default: 128) -- **num_epochs**: Total training epoch (Default: 200) -- **weight_decay**: L2 Regularization value (Default: 0.0004) -- **momentum**: The momentum parameter in momentum optimizer (Default: 0.9) -- **dropout_rate**: Dropout rate of the dropout layer (Default: 0.5) -- **bn_decay**: The decay/momentum parameter (or called moving average decay) in batch norm layer (Default: 0.9) - - -## Model performances - -Train all six models using same hyperparameters: - -- learning rate: 0.1 -> 0.0001 with cosine annealing -- total epoch: 200 -- batch size: 128 -- L2 decay: 0.000400 -- optimizer: momentum optimizer with m=0.9 and use nesterov -- preprocess: random horizontal flip + image standardization + cutout - -And below is the accuracy on CIFAR-10 dataset: - -| model | round 1 | round 2 | round 3 | max | avg | -|----------|---------|---------|---------|--------|--------| -| HiNAS-0 | 0.9548 | 0.9520 | 0.9513 | 0.9548 | 0.9527 | -| HiNAS-1 | 0.9452 | 0.9462 | 0.9420 | 0.9462 | 0.9445 | -| HiNAS-2 | 0.9508 | 0.9506 | 0.9483 | 0.9508 | 0.9499 | -| HiNAS-3 | 0.9607 | 0.9623 | 0.9601 | 0.9623 | 0.9611 | -| HiNAS-4 | 0.9611 | 0.9584 | 0.9586 | 0.9611 | 0.9594 | -| HiNAS-5 | 0.9578 | 0.9588 | 0.9594 | 0.9594 | 0.9586 | +Please visit the project at [AutoDL/HiNAS_models](../../../AutoDL/HiNAS_models). 
diff --git a/fluid/PaddleCV/HiNAS_models/README_cn.md b/fluid/PaddleCV/HiNAS_models/README_cn.md old mode 100755 new mode 100644 index 8ca3bcbfb8d1ea1a15f969c1a1db22ff2ec854f1..8ab7149b0aaef04c226aff0302e4282b0172c113 --- a/fluid/PaddleCV/HiNAS_models/README_cn.md +++ b/fluid/PaddleCV/HiNAS_models/README_cn.md @@ -1,78 +1,2 @@ -# Image Classification Models -本目录下包含6个图像分类模型,都是百度大数据实验室 Hierarchical Neural Architecture Search (HiNAS) 项目通过机器自动发现的模型,在CIFAR-10数据集上达到96.1%的准确率。这6个模型分为两类,前3个没有skip link,分别命名为 HiNAS 0-2号,后三个网络带有skip link,功能类似于Resnet中的shortcut connection,分别命名 HiNAS 3-5号。 ---- -## Table of Contents -- [Installation](#installation) -- [Data preparation](#data-preparation) -- [Training a model](#training-a-model) -- [Model performances](#model-performances) - -## Installation -最低环境要求: - -- PadddlePaddle Fluid >= v0.15.0 -- Cudnn >=6.0 - -如果您的运行环境无法满足要求,可以参考此文档升级PaddlePaddle:[installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) - -## Data preparation - -第一次训练模型的时候,Trainer会自动下载CIFAR-10数据集,请确保您的环境有互联网连接。 - -数据集会被下载到Trainer同目录下的`dataset/cifar/cifar-10-python.tar.gz`,如果自动下载失败,您可以自行从 https://www.cs.toronto.edu/~kriz/cifar.html 下载cifar-10-python.tar.gz,然后放到上述位置。 - - -## Training a model -准备好环境后,可以训练模型,训练有2个入口,`train_hinas.py`和`train_hinas_res.py`,前者用来训练0-2号不含skip link的模型,后者用来训练3-5号包含skip link的模型。 - -训练0~2号不含skip link的模型: -``` -python train_hinas.py --model=m_id # m_id can be 0, 1 or 2. -``` -训练3~5号包含skip link的模型: -``` -python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2. -``` - -此外,`train_hinas.py`和`train_hinas_res.py` 都支持以下参数: - -初始化部分: - -- random_flip_left_right:图片随机水平翻转(Default:True) -- random_flip_up_down:图片随机垂直翻转(Default:False) -- cutout:图片随机遮挡(Default:True) -- standardize_image:对图片每个像素做 standardize(Default:True) -- pad_and_cut_image:图片随机padding,并裁剪回原大小(Default:True) -- shuffle_image:训练时对输入图片的顺序做shuffle(Default:True) -- lr_max:训练开始时的learning rate(Default:0.1) -- lr_min:训练结束时的learning rate(Default:0.0001) -- batch_size:训练的batch size(Default:128) -- num_epochs:训练总的epoch(Default:200) -- weight_decay:训练时L2 Regularization大小(Default:0.0004) -- momentum:momentum优化器中的momentum系数(Default:0.9) -- dropout_rate:dropout层的dropout_rate(Default:0.5) -- bn_decay:batch norm层的decay/momentum系数(即moving average decay)大小(Default:0.9) - - - -## Model performances -6个模型使用相同的参数训练: - -- learning rate: 0.1 -> 0.0001 with cosine annealing -- total epoch: 200 -- batch size: 128 -- L2 decay: 0.000400 -- optimizer: momentum optimizer with m=0.9 and use nesterov -- preprocess: random horizontal flip + image standardization + cutout - -以下是6个模型在CIFAR-10数据集上的准确率: - -| model | round 1 | round 2 | round 3 | max | avg | -|----------|---------|---------|---------|--------|--------| -| HiNAS-0 | 0.9548 | 0.9520 | 0.9513 | 0.9548 | 0.9527 | -| HiNAS-1 | 0.9452 | 0.9462 | 0.9420 | 0.9462 | 0.9445 | -| HiNAS-2 | 0.9508 | 0.9506 | 0.9483 | 0.9508 | 0.9499 | -| HiNAS-3 | 0.9607 | 0.9623 | 0.9601 | 0.9623 | 0.9611 | -| HiNAS-4 | 0.9611 | 0.9584 | 0.9586 | 0.9611 | 0.9594 | -| HiNAS-5 | 0.9578 | 0.9588 | 0.9594 | 0.9594 | 0.9586 | +您好,该项目已被迁移,请移步到 [AutoDL/HiNAS_models](../../../AutoDL/HiNAS_models) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/caffe2fluid/README.md b/fluid/PaddleCV/caffe2fluid/README.md index 8520342325a1ef4e08d8f9669969acd5b6b57851..8241e980ec9c05cacb0121387780371029625537 100644 --- a/fluid/PaddleCV/caffe2fluid/README.md +++ b/fluid/PaddleCV/caffe2fluid/README.md @@ -1,87 +1,6 @@ -### Caffe2Fluid -This tool is used to convert a Caffe model to a 
Fluid model -### Key Features -1. Convert caffe model to fluid model with codes of defining a network(useful for re-training) +Hi! -2. Pycaffe is not necessary when just want convert model without do caffe-inference +This directory has been deprecated. -3. Caffe's customized layers convertion also be supported by extending this tool - -4. A bunch of tools in `examples/imagenet/tools` are provided to compare the difference - -### HowTo -1. Prepare `caffepb.py` in `./proto` if your python has no `pycaffe` module, two options provided here: - - Generate pycaffe from caffe.proto - ``` - bash ./proto/compile.sh - ``` - - - Download one from github directly - ``` - cd proto/ && wget https://raw.githubusercontent.com/ethereon/caffe-tensorflow/master/kaffe/caffe/caffepb.py - ``` - -2. Convert the Caffe model to Fluid model - - Generate fluid code and weight file - ``` - python convert.py alexnet.prototxt \ - --caffemodel alexnet.caffemodel \ - --data-output-path alexnet.npy \ - --code-output-path alexnet.py - ``` - - - Save weights as fluid model file - ``` - # only infer the last layer's result - python alexnet.py alexnet.npy ./fluid - # infer these 2 layer's result - python alexnet.py alexnet.npy ./fluid fc8,prob - ``` - -3. Use the converted model to infer - - See more details in `examples/imagenet/tools/run.sh` - -4. Compare the inference results with caffe - - See more details in `examples/imagenet/tools/diff.sh` - -### How to convert custom layer -1. Implement your custom layer in a file under `kaffe/custom_layers`, eg: mylayer.py - - Implement ```shape_func(input_shape, [other_caffe_params])``` to calculate the output shape - - Implement ```layer_func(inputs, name, [other_caffe_params])``` to construct a fluid layer - - Register these two functions ```register(kind='MyType', shape=shape_func, layer=layer_func)``` - - Notes: more examples can be found in `kaffe/custom_layers` - -2. Add ```import mylayer``` to `kaffe/custom_layers/\_\_init__.py` - -3. Prepare your pycaffe as your customized version(same as previous env prepare) - - (option1) replace `proto/caffe.proto` with your own caffe.proto and compile it - - (option2) change your `pycaffe` to the customized version - -4. Convert the Caffe model to Fluid model - -5. Set env $CAFFE2FLUID_CUSTOM_LAYERS to the parent directory of 'custom_layers' - ``` - export CAFFE2FLUID_CUSTOM_LAYERS=/path/to/caffe2fluid/kaffe - ``` - -6. Use the converted model when loading model in `xxxnet.py` and `xxxnet.npy`(no need if model is already in `fluid/model` and `fluid/params`) - -### Tested models -- Lenet: -[model addr](https://github.com/ethereon/caffe-tensorflow/blob/master/examples/mnist) - -- ResNets:(ResNet-50, ResNet-101, ResNet-152) -[model addr](https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777) - -- GoogleNet: -[model addr](https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034) - -- VGG: -[model addr](https://gist.github.com/ksimonyan/211839e770f7b538e2d8) - -- AlexNet: -[model addr](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet) - -### Notes -Some of this code come from here: [caffe-tensorflow](https://github.com/ethereon/caffe-tensorflow) +Please visit the project at [PaddleCV/caffe2fluid](../../../PaddleCV/caffe2fluid). 
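Although this copy of the caffe2fluid documentation is being retired (the project now lives at [PaddleCV/caffe2fluid](../../../PaddleCV/caffe2fluid)), the custom-layer contract it describes is easy to illustrate. Below is a minimal, hypothetical sketch of a file under `kaffe/custom_layers/`; the layer kind `MyScale`, its `scale` parameter, and the `register` import path are assumptions for illustration, while the shape_func/layer_func/register contract itself comes from the text above.

```python
# mylayer.py -- hypothetical caffe2fluid custom layer (illustrative sketch only).
import paddle.fluid as fluid
from .register import register  # assumed import path inside kaffe/custom_layers

def myscale_shape(input_shape, scale=1.0):
    # Elementwise scaling leaves the input shape unchanged.
    return input_shape

def myscale_layer(inputs, name, scale=1.0):
    # Build the equivalent fluid op for the caffe layer.
    return fluid.layers.scale(x=inputs, scale=scale, name=name)

# Make layers of kind 'MyScale' resolvable by the converter.
register(kind='MyScale', shape=myscale_shape, layer=myscale_layer)
```

After adding `import mylayer` to `kaffe/custom_layers/__init__.py` and exporting `CAFFE2FLUID_CUSTOM_LAYERS` as described above, `convert.py` should be able to resolve layers of this kind.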
diff --git a/fluid/PaddleCV/deeplabv3+/README.md b/fluid/PaddleCV/deeplabv3+/README.md index eff83fee192d6a34cb338f5c705fb7ec1f59fd07..94f81a780a21bda7e230bf513be427b08a6eaca2 100644 --- a/fluid/PaddleCV/deeplabv3+/README.md +++ b/fluid/PaddleCV/deeplabv3+/README.md @@ -1,116 +1,2 @@ -DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.3.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。 - -## 代码结构 -``` -├── models.py # 网络结构定义脚本 -├── train.py # 训练任务脚本 -├── eval.py # 评估脚本 -└── reader.py # 定义通用的函数以及数据预处理脚本 -``` - -## 简介 - -DeepLabv3+ 是DeepLab语义分割系列网络的最新作,其前作有 DeepLabv1,DeepLabv2, DeepLabv3, -在最新作中,DeepLab的作者通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASSP层, -其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率,在 PASCAL VOC 2012 dataset取得新的state-of-art performance,89.0mIOU。 - -![](./imgs/model.png) - - -## 数据准备 - - - -本文采用Cityscape数据集,请前往[Cityscape官网](https://www.cityscapes-dataset.com)注册下载。 -下载以后的数据目录结构如下 -``` -data/cityscape/ -|-- gtFine -| |-- test -| |-- train -| `-- val -|-- leftImg8bit - |-- test - |-- train - `-- val -``` - -# 预训练模型准备 - -我们为了节约更多的显存,在这里我们使用Group Norm作为我们的归一化手段。 -如果需要从头开始训练模型,用户需要下载我们的初始化模型 -``` -wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz -tar -xf deeplabv3plus_gn_init.tgz && rm deeplabv3plus_gn_init.tgz -``` -如果需要最终训练模型进行fine tune或者直接用于预测,请下载我们的最终模型 -``` -wget https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz -tar -xf deeplabv3plus_gn.tgz && rm deeplabv3plus_gn.tgz -``` - - -## 模型训练与预测 - -### 训练 -执行以下命令进行训练,同时指定weights的保存路径,初始化路径,以及数据存放位置: -``` -python ./train.py \ - --batch_size=1 \ - --train_crop_size=769 \ - --total_step=50 \ - --norm_type=gn \ - --init_weights_path=$INIT_WEIGHTS_PATH \ - --save_weights_path=$SAVE_WEIGHTS_PATH \ - --dataset_path=$DATASET_PATH -``` -使用以下命令获得更多使用说明: -``` -python train.py --help -``` -以上命令用于测试训练过程是否正常,仅仅迭代了50次并且使用了1的batch size,如果需要复现 -原论文的实验,请使用以下设置: -``` -CUDA_VISIBLE_DEVICES=0 \ -python ./train.py \ - --batch_size=4 \ - --parallel=True \ - --norm_type=gn \ - --train_crop_size=769 \ - --total_step=500000 \ - --base_lr=0.001 \ - --init_weights_path=deeplabv3plus_gn_init \ - --save_weights_path=output \ - --dataset_path=$DATASET_PATH -``` -如果您的显存不足,可以尝试减小`batch_size`,同时等比例放大`total_step`, 缩小`base_lr`, 保证相乘的值不变,这得益于Group Norm的特性,改变 `batch_size` 并不会显著影响结果,而且能够节约更多显存, 比如您可以设置`--batch_size=2 --total_step=1000000 --base_lr=0.0005`。 - -### 测试 -执行以下命令在`Cityscape`测试数据集上进行测试: -``` -python ./eval.py \ - --init_weights_path=deeplabv3plus_gn \ - --norm_type=gn \ - --dataset_path=$DATASET_PATH -``` -需要通过选项`--init_weights_path`指定模型文件。测试脚本的输出的评估指标为mean IoU。 - - -## 实验结果 -训练完成以后,使用`eval.py`在验证集上进行测试,得到以下结果: -``` -load from: ../models/deeplabv3plus_gn -total number 500 -step: 500, mIoU: 0.7881 -``` - -## 其他信息 - -|数据集 | norm type | pretrained model | trained model | mean IoU -|---|---|---|---|---| -|CityScape | group norm | [deeplabv3plus_gn_init.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn_init.tgz) | [deeplabv3plus_gn.tgz](https://paddle-deeplab.bj.bcebos.com/deeplabv3plus_gn.tgz) | 0.7881 | - -## 参考 - -- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611) +您好,该项目已被迁移,请移步到 [PaddleCV/deeplabv3+](../../../PaddleCV/deeplabv3+) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/face_detection/README.md b/fluid/PaddleCV/face_detection/README.md deleted file mode 120000 index 4015683cfa5969297febc12e7ca1264afabbc0b5..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/face_detection/README.md +++ /dev/null @@ -1 +0,0 @@ -README_cn.md \ No 
newline at end of file diff --git a/fluid/PaddleCV/face_detection/README.md b/fluid/PaddleCV/face_detection/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e9319716f4f660ff75b571337575d8cd53c03a13 --- /dev/null +++ b/fluid/PaddleCV/face_detection/README.md @@ -0,0 +1,2 @@ + +您好,该项目已被迁移,请移步到 [PaddleCV/face_detection](../../../PaddleCV/face_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/face_detection/README_cn.md b/fluid/PaddleCV/face_detection/README_cn.md index f63fbed02ab34520d79b2d2b000e31f5eb22e7f8..e9319716f4f660ff75b571337575d8cd53c03a13 100644 --- a/fluid/PaddleCV/face_detection/README_cn.md +++ b/fluid/PaddleCV/face_detection/README_cn.md @@ -1,185 +1,2 @@ -## Pyramidbox 人脸检测 -## Table of Contents -- [简介](#简介) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [模型评估](#模型评估) -- [模型发布](#模型发布) - -### 简介 - -人脸检测是经典的计算机视觉任务,非受控场景中的小脸、模糊和遮挡的人脸检测是这个方向上最有挑战的问题。[PyramidBox](https://arxiv.org/pdf/1803.07737.pdf) 是一种基于SSD的单阶段人脸检测器,它利用上下文信息解决困难人脸的检测问题。如下图所示,PyramidBox在六个尺度的特征图上进行不同层级的预测。该工作主要包括以下模块:LFPN、Pyramid Anchors、CPM、Data-anchor-sampling。具体可以参考该方法对应的论文 https://arxiv.org/pdf/1803.07737.pdf ,下面进行简要的介绍。 - -

-[图:Pyramidbox 人脸检测模型]
    - -**LFPN**: LFPN全称Low-level Feature Pyramid Networks, 在检测任务中,LFPN可以充分结合高层次的包含更多上下文的特征和低层次的包含更多纹理的特征。高层级特征被用于检测尺寸较大的人脸,而低层级特征被用于检测尺寸较小的人脸。为了将高层级特征整合到高分辨率的低层级特征上,我们从中间层开始做自上而下的融合,构建Low-level FPN。 - -**Pyramid Anchors**: 该算法使用半监督解决方案来生成与人脸检测相关的具有语义的近似标签,提出基于anchor的语境辅助方法,它引入有监督的信息来学习较小的、模糊的和部分遮挡的人脸的语境特征。使用者可以根据标注的人脸标签,按照一定的比例扩充,得到头部的标签(上下左右各扩充1/2)和人体的标签(可自定义扩充比例)。 - -**CPM**: CPM全称Context-sensitive Predict Module, 本方法设计了一种上下文敏感结构(CPM)来提高预测网络的表达能力。 - -**Data-anchor-sampling**: 设计了一种新的采样方法,称作Data-anchor-sampling,该方法可以增加训练样本在不同尺度上的多样性。该方法改变训练样本的分布,重点关注较小的人脸。 - -Pyramidbox模型可以在以下示例图片上展示鲁棒的检测性能,该图有一千张人脸,该模型检测出其中的880张人脸。 -
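A minimal sketch of the Pyramid Anchors label expansion just described: a face box is grown symmetrically on every side to approximate head and body labels. The 1/2 head margin comes from the text above; the body margin is user-defined, so the value below is only an example:

```python
def expand_box(box, ratio):
    """Expand an (xmin, ymin, xmax, ymax) box by `ratio` on every side."""
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    return (xmin - w * ratio, ymin - h * ratio,
            xmax + w * ratio, ymax + h * ratio)

face = (100.0, 120.0, 140.0, 170.0)   # an annotated face box
head = expand_box(face, 0.5)          # expand by 1/2 on each side, per the text above
body = expand_box(face, 1.5)          # body ratio is user-defined; 1.5 is only an example
```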

-[图:Pyramidbox 人脸检测性能展示]
    - - - -### 数据准备 - -本教程使用 [WIDER FACE 数据集](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) 来进行模型的训练测试工作,官网给出了详尽的数据介绍。 - -WIDER FACE数据集包含32,203张图片,其中包含393,703个人脸,数据集的人脸在尺度、姿态、遮挡方面有较大的差异性。另外WIDER FACE数据集是基于61个场景归类的,然后针对每个场景,随机的挑选40%作为训练集,10%作为验证集,50%作为测试集。 - -首先,从官网训练集和验证集,放在`data`目录,官网提供了谷歌云和百度云下载地址,请依据情况自行下载。并下载训练集和验证集的标注信息: - -```bash -./data/download.sh -``` - -准备好数据之后,`data`目录如下: - -``` -data -|-- download.sh -|-- wider_face_split -| |-- readme.txt -| |-- wider_face_train_bbx_gt.txt -| |-- wider_face_val_bbx_gt.txt -| `-- ... -|-- WIDER_train -| `-- images -| |-- 0--Parade -| ... -| `-- 9--Press_Conference -`-- WIDER_val - `-- images - |-- 0--Parade - ... - `-- 9--Press_Conference -``` - - -### 模型训练 - -#### 下载预训练模型 - -我们提供了预训练模型,模型是基于VGGNet的主干网络,使用如下命令下载: - - -```bash -wget http://paddlemodels.bj.bcebos.com/vgg_ilsvrc_16_fc_reduced.tar.gz -tar -xf vgg_ilsvrc_16_fc_reduced.tar.gz && rm -f vgg_ilsvrc_16_fc_reduced.tar.gz -``` - -声明:该预训练模型转换自[Caffe](http://cs.unc.edu/~wliu/projects/ParseNet/VGG_ILSVRC_16_layers_fc_reduced.caffemodel)。不久,我们会发布自己预训练的模型。 - - -#### 开始训练 - - -`train.py` 是训练模块的主要执行程序,调用示例如下: - -```bash -python -u train.py --batch_size=16 --pretrained_model=vgg_ilsvrc_16_fc_reduced -``` - - 可以通过设置 `export CUDA_VISIBLE_DEVICES=0,1,2,3` 指定想要使用的GPU数量,`batch_size`默认设置为12或16。 - - 更多的可选参数见: - ```bash - python train.py --help - ``` - - 模型训练150轮以上可以收敛。用Nvidia Tesla P40 GPU 4卡并行,`batch_size=16`的配置,每轮训练大约40分钟,总共训练时长大约100小时 - -模型训练所采用的数据增强: - -**数据增强**:数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到640x640。在训练时还会对图片进行数据增强,包括随机扰动、翻转、裁剪等,和[物体检测SSD算法](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README.md)中数据增强类似,除此之外,增加了上面提到的Data-anchor-sampling: - - **尺度变换(Data-anchor-sampling)**:随机将图片尺度变换到一定范围的尺度,大大增强人脸的尺度变化。具体操作为根据随机选择的人脸高(height)和宽(width),得到$v=\\sqrt{width * height}$,判断$v$的值位于缩放区间$[16,32,64,128,256,512]$中的的哪一个。假设$v=45$,则选定$32 - - - -
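As a worked example of the scale-interval lookup described above: for a 45x45 face, $v=\sqrt{45\times45}=45$, which falls in the $[32, 64)$ interval. Only the interval selection spelled out above is sketched here; the subsequent resampling step is not:

```python
import math

ANCHOR_SCALES = [16, 32, 64, 128, 256, 512]

def anchor_interval(width, height):
    """Return the neighbouring anchor scales that bracket v = sqrt(w * h)."""
    v = math.sqrt(width * height)
    if v < ANCHOR_SCALES[0]:
        return None, ANCHOR_SCALES[0]
    for lo, hi in zip(ANCHOR_SCALES, ANCHOR_SCALES[1:]):
        if lo <= v < hi:
            return lo, hi
    return ANCHOR_SCALES[-1], None

print(anchor_interval(45, 45))  # v = 45 -> (32, 64), matching the example above
```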
-[图:Pyramidbox 预测可视化]
    - - -### 模型发布 - - - -| 模型 | 预训练模型 | 训练数据 | 测试数据 | mAP | -|:------------------------:|:------------------:|:----------------:|:------------:|:----:| -|[Pyramidbox-v1-SSD 640x640](http://paddlemodels.bj.bcebos.com/PyramidBox_WiderFace.tar.gz) | [VGGNet](http://paddlemodels.bj.bcebos.com/vgg_ilsvrc_16_fc_reduced.tar.gz) | WIDER FACE train | WIDER FACE Val | 96.0%/ 94.8%/ 88.8% | - -#### 性能曲线 -

-[图:WIDER FACE Easy/Medium/Hard set]
+您好,该项目已被迁移,请移步到 [PaddleCV/face_detection](../../../PaddleCV/face_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/gan/c_gan/README.md b/fluid/PaddleCV/gan/c_gan/README.md index 9f3c18fd0fb9a943f728548f655d3dd3cef73288..b36f7084c0a67ce35cc7e7a73333443919a98775 100644 --- a/fluid/PaddleCV/gan/c_gan/README.md +++ b/fluid/PaddleCV/gan/c_gan/README.md @@ -1,76 +1,2 @@ - -运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - -## 代码结构 -``` -├── network.py # 定义基础生成网络和判别网络。 -├── utility.py # 定义通用工具方法。 -├── dc_gan.py # DCGAN训练脚本。 -└── c_gan.py # conditionalGAN训练脚本。 -``` - -## 简介 -TODO - -## 数据准备 - -本教程使用 mnist 数据集来进行模型的训练测试工作,该数据集通过`paddle.dataset`模块自动下载到本地。 - -## 训练测试conditionalGAN - -在GPU单卡上训练conditionalGAN: - -``` -env CUDA_VISIBLE_DEVICES=0 python c_gan.py --output="./result" -``` - -训练过程中,每隔固定的训练轮数,会取一个batch的数据进行测试,测试结果以图片的形式保存至`--output`选项指定的路径。 - -执行`python c_gan.py --help`可查看更多使用方式和参数详细说明。 - -图1为conditionalGAN训练损失示意图,其中横坐标轴为训练轮数,纵轴为在训练集上的损失。其中,'G_loss'和'D_loss'分别为生成网络和判别器网络的训练损失。conditionalGAN训练19轮的模型预测效果如图2所示。

-[图 1:conditionalGAN 训练损失]
-[图 2:conditionalGAN 预测效果]
    - - -## 训练测试DCGAN - -在GPU单卡上训练DCGAN: - -``` -env CUDA_VISIBLE_DEVICES=0 python dc_gan.py --output="./result" -``` - -训练过程中,每隔固定的训练轮数,会取一个batch的数据进行测试,测试结果以图片的形式保存至`--output`选项指定的路径。 - -执行`python dc_gan.py --help`可查看更多使用方式和参数详细说明。 - - -DCGAN训练10轮的模型预测效果如图3所示: - -

-[图 3:DCGAN 预测效果]
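Since the 简介 above is still a TODO, one line of background on the conditioning itself may help. This is standard cGAN practice, not necessarily the exact wiring in `c_gan.py`: the class label is one-hot encoded and concatenated with the generator's noise input (and, analogously, with the discriminator's input features). A minimal numpy sketch:

```python
import numpy as np

def one_hot(label, num_classes=10):
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec

noise = np.random.normal(size=(100,)).astype(np.float32)  # 100-d noise, an assumed size
g_input = np.concatenate([noise, one_hot(3)])              # generator sees noise + label
print(g_input.shape)                                       # (110,)
```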

    +您好,该项目已被迁移,请移步到 [PaddleCV/gan/c_gan](../../../../PaddleCV/gan/c_gan) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/gan/cycle_gan/README.md b/fluid/PaddleCV/gan/cycle_gan/README.md index 0a9be53c783a557b7c2306f65377e4cafa8cfd90..5db6d49b2cbdaa6af4224bc0707593908a05352d 100644 --- a/fluid/PaddleCV/gan/cycle_gan/README.md +++ b/fluid/PaddleCV/gan/cycle_gan/README.md @@ -1,91 +1,2 @@ - -运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - -## 代码结构 -``` -├── data_reader.py # 读取、处理数据。 -├── layers.py # 封装定义基础的layers。 -├── model.py # 定义基础生成网络和判别网络。 -├── trainer.py # 构造loss和训练网络。 -├── train.py # 训练脚本。 -└── infer.py # 预测脚本。 -``` - -## 简介 -TODO - -## 数据准备 - -本教程使用 horse2zebra 数据集 来进行模型的训练测试工作,该数据集是用关键字'wild horse'和'zebra'过滤[ImageNet](http://www.image-net.org/)数据集并下载得到的。 - -horse2zebra训练集包含1069张野马图片,1336张斑马图片。测试集包含121张野马图片和141张斑马图片。 - -数据下载处理完毕后,并组织为以下路径结构: - -``` -data -|-- horse2zebra -| |-- testA -| |-- testA.txt -| |-- testB -| |-- testB.txt -| |-- trainA -| |-- trainA.txt -| |-- trainB -| `-- trainB.txt - -``` - -以上数据文件中,`data`文件夹需要放在训练脚本`train.py`同级目录下。`testA`为存放野马测试图片的文件夹,`testB`为存放斑马测试图片的文件夹,`testA.txt`和`testB.txt`分别为野马和斑马测试图片路径列表文件,格式如下: - -``` -testA/n02381460_9243.jpg -testA/n02381460_9244.jpg -testA/n02381460_9245.jpg -``` - -训练数据组织方式与测试数据相同。 - - -## 模型训练与预测 - -### 训练 - -在GPU单卡上训练: - -``` -env CUDA_VISIBLE_DEVICES=0 python train.py -``` - -执行`python train.py --help`可查看更多使用方式和参数详细说明。 - -图1为训练152轮的训练损失示意图,其中横坐标轴为训练轮数,纵轴为在训练集上的损失。其中,'g_A_loss','g_B_loss','d_A_loss'和'd_B_loss'分别为生成器A、生成器B、判别器A和判别器B的训练损失。 - -

-[图 1:CycleGAN 训练损失]
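For reference, the `trainA.txt` / `testA.txt` list format described in the data preparation section above (one relative image path per line) can be consumed with a reader as small as this sketch (a hypothetical helper, not code from `data_reader.py`):

```python
import os

def read_image_list(data_dir, list_name):
    """Yield full image paths from a list file such as testA.txt."""
    with open(os.path.join(data_dir, list_name)) as f:
        for line in f:
            line = line.strip()
            if line:
                yield os.path.join(data_dir, line)

paths = list(read_image_list("data/horse2zebra", "testA.txt"))
```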

- - -### 预测 - -执行以下命令读取多张图片进行预测: - -``` -env CUDA_VISIBLE_DEVICES=0 python infer.py \ - --init_model="checkpoints/1" --input="./data/inputA/*" \ - --input_style A --output="./output" -``` - -训练150轮的模型预测效果如图2和图3所示:

-[图 2:CycleGAN 预测效果]
-[图 3:CycleGAN 预测效果]
    +您好,该项目已被迁移,请移步到 [PaddleCV/gan/cycle_gan](../../../../PaddleCV/gan/cycle_gan) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/human_pose_estimation/README.md b/fluid/PaddleCV/human_pose_estimation/README.md index d629c6b7c31ea05329b591a7ee5f6ca929573ec3..6ced2b3b2cd19d413f2c8f2b139725c2e5ea14fc 100644 --- a/fluid/PaddleCV/human_pose_estimation/README.md +++ b/fluid/PaddleCV/human_pose_estimation/README.md @@ -1,115 +1,6 @@ -# Simple Baselines for Human Pose Estimation in Fluid -## Introduction -This is a simple demonstration of re-implementation in [PaddlePaddle.Fluid](http://www.paddlepaddle.org/en) for the paper [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18) from MSRA. +Hi! -![demo](demo.gif) +This directory has been deprecated. -> **Video in Demo**: *Bruno Mars - That’s What I Like [Official Video]*. - -## Requirements - - - Python == 2.7 - - PaddlePaddle >= 1.1.0 - - opencv-python >= 3.3 - -## Environment - -The code is developed and tested under 4 Tesla K40/P40 GPUS cards on CentOS with installed CUDA-9.2/8.0 and cuDNN-7.1. - -## Results on MPII Val -| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models | -| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:| -| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - | -| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) | -| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - | -| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) | - -## Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset -| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models | -| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:| -| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - | -| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) | -| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - | -| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) | - -### Notes: - - - Flip test is used. - - We do not hardly search the best model, just use the last saved model to make validation. - -## Getting Start - -### Prepare Datasets and Pretrained Models - - - Following the [instruction](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) to prepare datasets. - - Download the pretrained ResNet-50 model in PaddlePaddle.Fluid on ImageNet from [Model Zoo](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#supported-models-and-performances). 
- -```bash -wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar -``` - -Then, put them in the folder `pretrained` under the directory root of this repo, make them look like: - -``` -${THIS REPO ROOT} - `-- pretrained - `-- resnet_50 - |-- 115 - `-- data - `-- coco - |-- annotations - |-- images - `-- mpii - |-- annot - |-- images -``` - -### Install [COCOAPI](https://github.com/cocodataset/cocoapi) - -```bash -# COCOAPI=/path/to/clone/cocoapi -git clone https://github.com/cocodataset/cocoapi.git $COCOAPI -cd $COCOAPI/PythonAPI -# if cython is not installed -pip install Cython -# Install into global site-packages -make install -# Alternatively, if you do not have permissions or prefer -# not to install the COCO API into global site-packages -python2 setup.py install --user -``` - -### Perform Validating - -Downloading the checkpoints of Pose-ResNet-50 trained on MPII dataset from [here](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz). Extract it into the folder `checkpoints` under the directory root of this repo. Then run - -```bash -python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384' -``` - -### Perform Training - -```bash -python train.py --dataset 'mpii' # or coco -``` - -**Note**: Configurations for training are aggregated in the `lib/mpii_reader.py` and `lib/coco_reader.py`. - -### Perform Test on Images - -Put the images into the folder `test` under the directory root of this repo. Then run - -```bash -python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii' -``` - -If there are multiple persons in images, detectors such as [Faster R-CNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn), [SSD](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/object_detection) or others should be used first to crop them out. Because the simple baseline for human pose estimation is a top-down method. - -## Reference - - - Simple Baselines for Human Pose Estimation and Tracking in PyTorch [`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) - -## License - -This code is released under the Apache License 2.0. +Please visit the project at [PaddleCV/human_pose_estimation](../../../PaddleCV/human_pose_estimation). diff --git a/fluid/PaddleCV/human_pose_estimation/README_cn.md b/fluid/PaddleCV/human_pose_estimation/README_cn.md index 08c772018a91b830d7865f78abf7d4604d1173fb..84120d0c568b13bfbccead92cd7f9211193f7669 100644 --- a/fluid/PaddleCV/human_pose_estimation/README_cn.md +++ b/fluid/PaddleCV/human_pose_estimation/README_cn.md @@ -1,107 +1,2 @@ -# 关键点检测(Simple Baselines for Human Pose Estimation) -## 介绍 -本目录包含了对论文[Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18)的复现. - -![demo](demo.gif) - -> **演示视频**: *Bruno Mars - That’s What I Like [官方视频]*. 
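As the note in the English README above points out, Simple Baselines is a top-down method, so multi-person images go through a person detector first. Schematically, with `detect_people` and `estimate_pose` as hypothetical placeholders rather than APIs from this repo:

```python
def poses_for_image(image, detect_people, estimate_pose):
    """Top-down pipeline: detect person boxes, then run pose on each crop."""
    poses = []
    for xmin, ymin, xmax, ymax in detect_people(image):
        crop = image[int(ymin):int(ymax), int(xmin):int(xmax)]
        poses.append(estimate_pose(crop))
    return poses
```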
- -## 环境依赖 - -本目录下的代码均在4卡Tesla K40/P40 GPU,CentOS系统,CUDA-9.2/8.0,cuDNN-7.1环境下测试运行无误 - - - Python == 2.7 - - PaddlePaddle >= 1.1.0 - - opencv-python >= 3.3 - -## MPII Val结果 -| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models | -| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:| -| 256x256\_pose\_resnet\_50 in PyTorch | 96.351 | 95.329 | 88.989 | 83.176 | 88.420 | 83.960 | 79.594 | 88.532 | 33.911 | - | -| 256x256\_pose\_resnet\_50 in Fluid | 96.385 | 95.363 | 89.211 | 84.084 | 88.454 | 84.182 | 79.546 | 88.748 | 33.750 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-256x256.tar.gz) | -| 384x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - | -| 384x384\_pose\_resnet\_50 in Fluid | 96.862 | 95.635 | 90.046 | 85.557 | 88.818 | 84.948 | 78.484 | 89.235 | 38.093 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-mpii-384x384.tar.gz) | - -## COCO val2017结果(使用的检测器在COCO val2017数据集上AP为56.4) -| Arch | AP | Ap .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | Models | -| ---- |:--:|:-----:|:------:|:------:|:------:|:--:|:-----:|:------:|:------:|:------:|:------:| -| 256x192\_pose\_resnet\_50 in PyTorch | 0.704 | 0.886 | 0.783 | 0.671 | 0.772 | 0.763 | 0.929 | 0.834 | 0.721 | 0.824 | - | -| 256x192\_pose\_resnet\_50 in Fluid | 0.712 | 0.897 | 0.786 | 0.683 | 0.756 | 0.741 | 0.906 | 0.806 | 0.709 | 0.790 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-256x192.tar.gz) | -| 384x288\_pose\_resnet\_50 in PyTorch | 0.722 | 0.893 | 0.789 | 0.681 | 0.797 | 0.776 | 0.932 | 0.838 | 0.728 | 0.846 | - | -| 384x288\_pose\_resnet\_50 in Fluid | 0.727 | 0.897 | 0.796 | 0.690 | 0.783 | 0.754 | 0.907 | 0.813 | 0.714 | 0.814 | [`link`](https://paddlemodels.bj.bcebos.com/pose/pose-resnet50-coco-384x288.tar.gz) | - -### 说明 - - - 使用Flip test - - 对当前模型结果并没有进行调参选择,使用下面相关实验配置训练后,取最后一个epoch后的模型作为最终模型,即可得到上述实验结果 - -## 开始 - -### 数据准备和预训练模型 - - - 安照[提示](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation)进行数据准备 - - 下载预训练好的ResNet-50 - -```bash -wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar -``` - -下载完成后,将模型解压、放入到根目录下的'pretrained'文件夹中,默认文件路径树为: - -``` -${根目录} - `-- pretrained - `-- resnet_50 - |-- 115 - `-- data - `-- coco - |-- annotations - |-- images - `-- mpii - |-- annot - |-- images -``` - -### 安装 [COCOAPI](https://github.com/cocodataset/cocoapi) - -```bash -# COCOAPI=/path/to/clone/cocoapi -git clone https://github.com/cocodataset/cocoapi.git $COCOAPI -cd $COCOAPI/PythonAPI -# if cython is not installed -pip install Cython -# Install into global site-packages -make install -# Alternatively, if you do not have permissions or prefer -# not to install the COCO API into global site-packages -python2 setup.py install --user -``` - -### 模型验证(COCO或MPII) - -下载COCO/MPII预训练模型(见上表最后一列所附链接),保存到根目录下的'checkpoints'文件夹中,运行: - -```bash -python val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet50-mpii-384x384' -``` - -### 模型训练 - -```bash -python train.py --dataset 'mpii' # or coco -``` - -**说明** 详细参数配置已保存到`lib/mpii_reader.py` 和 `lib/coco_reader.py`文件中,通过设置dataset来选择使用具体的参数配置 - -### 模型测试(任意图片,使用上述COCO或MPII预训练好的模型) - -将测试图片放入根目录下的'test'文件夹中,执行 - -```bash -python test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii' -``` - -## 引用 - -- Simple Baselines for Human Pose Estimation and Tracking in PyTorch 
[`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) +您好,该项目已被迁移,请移步到 [PaddleCV/human_pose_estimation](../../../PaddleCV/human_pose_estimation) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/icnet/README.md b/fluid/PaddleCV/icnet/README.md index 84e067ab081f648a4107ece906bad9a52ae13bbc..72a3a91b0ae52894c641e61b489ff7a04c6f8106 100644 --- a/fluid/PaddleCV/icnet/README.md +++ b/fluid/PaddleCV/icnet/README.md @@ -1,110 +1,2 @@ -运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - -## 代码结构 -``` -├── network.py # 网络结构定义脚本 -├── train.py # 训练任务脚本 -├── eval.py # 评估脚本 -├── infer.py # 预测脚本 -├── cityscape.py # 数据预处理脚本 -└── utils.py # 定义通用的函数 -``` - -## 简介 - -Image Cascade Network(ICNet)主要用于图像实时语义分割。相较于其它压缩计算的方法,ICNet既考虑了速度,也考虑了准确性。 -ICNet的主要思想是将输入图像变换为不同的分辨率,用不同计算复杂度的子网络处理不同分辨率的输入,最后将结果合并。ICNet由三个子网络组成,计算复杂度高的网络处理低分辨率输入,计算复杂度低的网络处理高分辨率输入,通过这种方式在高分辨率图像的准确性和低复杂度网络的效率之间获得平衡。 - -整个网络结构如下:

-[图 1:ICNet 网络结构]
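The three cascaded branches are trained jointly; their per-branch losses show up in the training log quoted below as `sub4_loss`, `sub24_loss` and `sub124_loss`. A sketch of how such branch losses are typically combined into the reported `train loss`; the weights here are illustrative assumptions, not values read out of `train.py`:

```python
def combine_branch_losses(sub4, sub24, sub124, w4=0.16, w24=0.4, w124=1.0):
    """Weighted sum of the per-resolution branch losses (weights assumed)."""
    return w4 * sub4 + w24 * sub24 + w124 * sub124
```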

    - - -## 数据准备 - - - -本文采用Cityscape数据集,请前往[Cityscape官网](https://www.cityscapes-dataset.com)注册下载。下载数据之后,按照[这里](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py#L3)的说明和工具处理数据。 -处理之后的数据 -``` -data/cityscape/ -|-- gtFine -| |-- test -| |-- train -| `-- val -|-- leftImg8bit -| |-- test -| |-- train -| `-- val -|-- train.list -`-- val.list -``` -其中,train.list和val.list分别是用于训练和测试的列表文件,第一列为输入图像数据,第二列为标注数据,两列用空格分开。示例如下: -``` -leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png -leftImg8bit/train/stuttgart/stuttgart_000072_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000072_000019_gtFine_labelTrainIds.png -``` -完成数据下载和准备后,需要修改`cityscape.py`脚本中对应的数据地址。 - -## 模型训练与预测 - -### 训练 -执行以下命令进行训练,同时指定checkpoint保存路径: -``` -python train.py --batch_size=16 --use_gpu=True --checkpoint_path="./chkpnt/" -``` -使用以下命令获得更多使用说明: -``` -python train.py --help -``` -训练过程中会根据用户的设置,输出训练集上每个网络分支的`loss`, 示例如下: -``` -Iter[0]; train loss: 2.338; sub4_loss: 3.367; sub24_loss: 4.120; sub124_loss: 0.151 -``` -### 测试 -执行以下命令在`Cityscape`测试数据集上进行测试: -``` -python eval.py --model_path="./model/" --use_gpu=True -``` -需要通过选项`--model_path`指定模型文件。 -测试脚本的输出的评估指标为[mean IoU]()。 - -### 预测 -执行以下命令对指定的数据进行预测: -``` -python infer.py \ ---model_path="./model" \ ---images_path="./data/cityscape/" \ ---images_list="./data/cityscape/infer.list" -``` -通过选项`--images_list`指定列表文件,列表文件中每一行为一个要预测的图片的路径。 -预测结果默认保存到当前路径下的`output`文件夹下。 - -## 实验结果 -图2为在`CityScape`训练集上的训练的Loss曲线: - -

-[图 2:CityScape 训练集上的 Loss 曲线]
    - -在训练集上训练,在validation数据集上验证的结果为:mean_IoU=67.0%(论文67.7%) - -图3是使用`infer.py`脚本预测产生的结果示例,其中,第一行为输入的原始图片,第二行为人工的标注,第三行为我们模型计算的结果。 -

-[图 3:infer.py 预测结果示例]
    - -## 其他信息 -|数据集 | pretrained model | -|---|---| -|CityScape | [pretrained_model](https://paddle-icnet-models.bj.bcebos.com/model_1000.tar.gz) | - -## 参考 - -- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545) +您好,该项目已被迁移,请移步到 [PaddleCV/icnet](../../../PaddleCV/icnet) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/image_classification/README.md b/fluid/PaddleCV/image_classification/README.md index 4f37d8f5b57aed073e1e9522380bb4a1e9181d61..55392b8ac91e4a8c24d2f2d6ac63d695cb58e146 100644 --- a/fluid/PaddleCV/image_classification/README.md +++ b/fluid/PaddleCV/image_classification/README.md @@ -1,167 +1,6 @@ -# Image Classification and Model Zoo -Image classification, which is an important field of computer vision, is to classify an image into pre-defined labels. Recently, many researchers developed different kinds of neural networks and highly improve the classification performance. This page introduces how to do image classification with PaddlePaddle Fluid. ---- -## Table of Contents -- [Installation](#installation) -- [Data preparation](#data-preparation) -- [Training a model with flexible parameters](#training-a-model-with-flexible-parameters) -- [Using Mixed-Precision Training](#using-mixed-precision-training) -- [Finetuning](#finetuning) -- [Evaluation](#evaluation) -- [Inference](#inference) -- [Supported models and performances](#supported-models-and-performances) +Hi! -## Installation +This directory has been deprecated. -Running sample code in this directory requires PaddelPaddle Fluid v0.13.0 and later, the latest release version is recommended, If the PaddlePaddle on your device is lower than v0.13.0, please follow the instructions in [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) and make an update. - -## Data preparation - -An example for ImageNet classification is as follows. First of all, preparation of imagenet data can be done as: -``` -cd data/ILSVRC2012/ -sh download_imagenet2012.sh -``` - -In the shell script ```download_imagenet2012.sh```, there are three steps to prepare data: - -**step-1:** Register at ```image-net.org``` first in order to get a pair of ```Username``` and ```AccessKey```, which are used to download ImageNet data. - -**step-2:** Download ImageNet-2012 dataset from website. The training and validation data will be downloaded into folder "train" and "val" respectively. Please note that the size of data is more than 40 GB, it will take much time to download. Users who have downloaded the ImageNet data can organize it into ```data/ILSVRC2012``` directly. - -**step-3:** Download training and validation label files. There are two label files which contain train and validation image labels respectively: - -* *train_list.txt*: label file of imagenet-2012 training set, with each line seperated by ```SPACE```, like: -``` -train/n02483708/n02483708_2436.jpeg 369 -train/n03998194/n03998194_7015.jpeg 741 -train/n04523525/n04523525_38118.jpeg 884 -... -``` -* *val_list.txt*: label file of imagenet-2012 validation set, with each line seperated by ```SPACE```, like. -``` -val/ILSVRC2012_val_00000001.jpeg 65 -val/ILSVRC2012_val_00000002.jpeg 970 -val/ILSVRC2012_val_00000003.jpeg 230 -... -``` - -You may need to modify the path in reader.py to load data correctly. 
- -## Training a model with flexible parameters - -After data preparation, one can start the training step by: - -``` -python train.py \ - --model=SE_ResNeXt50_32x4d \ - --batch_size=32 \ - --total_images=1281167 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --model_save_dir=output/ \ - --with_mem_opt=False \ - --lr_strategy=piecewise_decay \ - --lr=0.1 -``` -**parameter introduction:** -* **model**: name of the model to use. Default: "SE_ResNeXt50_32x4d". -* **num_epochs**: the number of epochs. Default: 120. -* **batch_size**: the size of each mini-batch. Default: 256. -* **use_gpu**: whether to use GPU or not. Default: True. -* **total_images**: total number of images in the training set. Default: 1281167. -* **class_dim**: the class number of the classification task. Default: 1000. -* **image_shape**: input size of the network. Default: "3,224,224". -* **model_save_dir**: the directory to save the trained model. Default: "output". -* **with_mem_opt**: whether to use memory optimization or not. Default: True. -* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay". -* **lr**: initialized learning rate. Default: 0.1. -* **pretrained_model**: model path for pretraining. Default: None. -* **checkpoint**: the checkpoint path to resume. Default: None. -* **data_dir**: the data path. Default: "./data/ILSVRC2012". -* **fp16**: whether to enable half precision training with fp16. Default: False. -* **scale_loss**: scale loss for fp16. Default: 1.0. -* **l2_decay**: L2_decay parameter. Default: 1e-4. -* **momentum_rate**: momentum_rate. Default: 0.9. - -Alternatively, one can start the training step by running ```run.sh```. - -**data reader introduction:** The data reader is defined in ```reader.py``` and ```reader_cv2.py```. Using the CV2 reader can improve the reading speed. In the [training stage](#training-a-model-with-flexible-parameters), random crop and flipping are used, while center crop is used in the [Evaluation](#evaluation) and [Inference](#inference) stages. Supported data augmentation includes: -* rotation -* color jitter -* random crop -* center crop -* resize -* flipping - -## Using Mixed-Precision Training - -You may add `--fp16=1` to start training with mixed precision, in which case the training process uses float16 while the output model (the "master" parameters) is saved as float32. You may also need to pass `--scale_loss` to overcome accuracy issues; usually `--scale_loss=8.0` will do. - -Note that currently `--fp16` cannot be used together with `--with_mem_opt`, so pass `--with_mem_opt=0` to disable the memory optimization pass. - -## Finetuning - -Finetuning is to finetune model weights in a specific task by loading pretrained weights. After initializing ```path_to_pretrain_model```, one can finetune a model as: -``` -python train.py \ - --model=SE_ResNeXt50_32x4d \ - --pretrained_model=${path_to_pretrain_model} \ - --batch_size=32 \ - --total_images=1281167 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --model_save_dir=output/ \ - --with_mem_opt=True \ - --lr_strategy=piecewise_decay \ - --lr=0.1 -``` - -## Evaluation -Evaluation is to evaluate the performance of a trained model. One can download [pretrained models](#supported-models-and-performances) and set their path to ```path_to_pretrain_model```.
Then top1/top5 accuracy can be obtained by running the following command: -``` -python eval.py \ - --model=SE_ResNeXt50_32x4d \ - --batch_size=32 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --pretrained_model=${path_to_pretrain_model} -``` - -## Inference -Inference is used to get prediction score or image features based on trained models. -``` -python infer.py \ - --model=SE_ResNeXt50_32x4d \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --pretrained_model=${path_to_pretrain_model} -``` - -## Supported models and performances - -Available top-1/top-5 validation accuracy on ImageNet 2012 are listed in table. Pretrained models can be downloaded by clicking related model names. - -- Released models: specify parameter names - -|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) | -|- |:-: |:-:| -|[AlexNet](http://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.zip) | 56.71%/79.18% | 55.88%/78.65% | -|[VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.zip) | 69.22%/89.09% | 69.01%/88.90% | -|[VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.zip) | 70.14%/89.48% | 69.83%/89.13% | -|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% | -|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% | -|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% | -|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% | -|[ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) | 70.85%/89.89% | 70.65%/89.89% | -|[ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) | 74.41%/92.03% | 74.13%/91.97% | -|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% | -|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% | -|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% | -|[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNext50_32x4d_pretrained.zip) | 78.50%/94.01% | 78.44%/93.96% | -|[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.zip) | 79.26%/94.22% | 79.12%/94.20% | -|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.50%/89.59% | 70.27%/89.58% | -|[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNet_pretrained.tar) | | 69.48%/88.99% | +Please visit the project at [PaddleCV/image_classification](../../../PaddleCV/image_classification). 
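The `--scale_loss` flag mentioned above is standard loss scaling for fp16: the loss is multiplied by a constant so that small gradients do not underflow in float16, and the gradients are divided by the same constant before the float32 "master" weights are updated. A minimal sketch of the arithmetic (illustrative only; the real logic lives in the training script):

```python
SCALE_LOSS = 8.0  # the value the README suggests passing as --scale_loss

def scaled_gradients(loss, compute_gradients):
    grads = compute_gradients(loss * SCALE_LOSS)   # fp16-safe backward pass
    return [g / SCALE_LOSS for g in grads]         # unscale before the fp32 update
```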
diff --git a/fluid/PaddleCV/image_classification/README_cn.md b/fluid/PaddleCV/image_classification/README_cn.md index 367aa5f8e204de4be0152ddc473394a405ba905c..bb8850cff5fbd658addaba488301783d0e510a6c 100644 --- a/fluid/PaddleCV/image_classification/README_cn.md +++ b/fluid/PaddleCV/image_classification/README_cn.md @@ -1,164 +1,2 @@ -# 图像分类以及模型库 -图像分类是计算机视觉的重要领域,它的目标是将图像分类到预定义的标签。近期,许多研究者提出很多不同种类的神经网络,并且极大的提升了分类算法的性能。本页将介绍如何使用PaddlePaddle进行图像分类。 - ---- -## 内容 -- [安装](#安装) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [混合精度训练](#混合精度训练) -- [参数微调](#参数微调) -- [模型评估](#模型评估) -- [模型预测](#模型预测) -- [已有模型及其性能](#已有模型及其性能) - -## 安装 - -在当前目录下运行样例代码需要PadddlePaddle Fluid的v0.13.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据 [installation document](http://paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html) 中的说明来更新PaddlePaddle。 - -## 数据准备 - -下面给出了ImageNet分类任务的样例,首先,通过如下的方式进行数据的准备: -``` -cd data/ILSVRC2012/ -sh download_imagenet2012.sh -``` -在```download_imagenet2012.sh```脚本中,通过下面三步来准备数据: - -**步骤一:** 首先在```image-net.org```网站上完成注册,用于获得一对```Username```和```AccessKey```。 - -**步骤二:** 从ImageNet官网下载ImageNet-2012的图像数据。训练以及验证数据集会分别被下载到"train" 和 "val" 目录中。请注意,ImaegNet数据的大小超过40GB,下载非常耗时;已经自行下载ImageNet的用户可以直接将数据组织放置到```data/ILSVRC2012```。 - -**步骤三:** 下载训练与验证集合对应的标签文件。下面两个文件分别包含了训练集合与验证集合中图像的标签: - -* *train_list.txt*: ImageNet-2012训练集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如: -``` -train/n02483708/n02483708_2436.jpeg 369 -train/n03998194/n03998194_7015.jpeg 741 -train/n04523525/n04523525_38118.jpeg 884 -... -``` -* *val_list.txt*: ImageNet-2012验证集合的标签文件,每一行采用"空格"分隔图像路径与标注,例如: -``` -val/ILSVRC2012_val_00000001.jpeg 65 -val/ILSVRC2012_val_00000002.jpeg 970 -val/ILSVRC2012_val_00000003.jpeg 230 -... -``` -注意:需要根据本地环境调整reader.py相关路径来正确读取数据。 - -## 模型训练 - -数据准备完毕后,可以通过如下的方式启动训练: -``` -python train.py \ - --model=SE_ResNeXt50_32x4d \ - --batch_size=32 \ - --total_images=1281167 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --model_save_dir=output/ \ - --with_mem_opt=False \ - --lr_strategy=piecewise_decay \ - --lr=0.1 -``` -**参数说明:** -* **model**: 模型名称, 默认值: "SE_ResNeXt50_32x4d" -* **num_epochs**: 训练回合数,默认值: 120 -* **batch_size**: 批大小,默认值: 256 -* **use_gpu**: 是否在GPU上运行,默认值: True -* **total_images**: 图片数,ImageNet2012默认值: 1281167. -* **class_dim**: 类别数,默认值: 1000 -* **image_shape**: 图片大小,默认值: "3,224,224" -* **model_save_dir**: 模型存储路径,默认值: "output/" -* **with_mem_opt**: 是否开启显存优化,默认值: False -* **lr_strategy**: 学习率变化策略,默认值: "piecewise_decay" -* **lr**: 初始学习率,默认值: 0.1 -* **pretrained_model**: 预训练模型路径,默认值: None -* **checkpoint**: 用于继续训练的检查点(指定具体模型存储路径,如"output/SE_ResNeXt50_32x4d/100/"),默认值: None -* **fp16**: 是否开启混合精度训练,默认值: False -* **scale_loss**: 调整混合训练的loss scale值,默认值: 1.0 -* **l2_decay**: l2_decay值,默认值: 1e-4 -* **momentum_rate**: momentum_rate值,默认值: 0.9 - -在```run.sh```中有用于训练的脚本. 
- -**数据读取器说明:** 数据读取器定义在```reader.py```和```reader_cv2.py```中。一般, CV2可以提高数据读取速度, PIL reader可以得到相对更高的精度, 我们现在默认基于PIL的数据读取器, 在[训练阶段](#模型训练), 默认采用的增广方式是随机裁剪与水平翻转, 而在[模型评估](#模型评估)与[模型预测](#模型预测)阶段用的默认方式是中心裁剪。当前支持的数据增广方式有: -* 旋转 -* 颜色抖动 -* 随机裁剪 -* 中心裁剪 -* 长宽调整 -* 水平翻转 - -## 混合精度训练 - -可以通过开启`--fp16=True`启动混合精度训练,这样训练过程会使用float16数据,并输出float32的模型参数("master"参数)。您可能需要同时传入`--scale_loss`来解决fp16训练的精度问题,通常传入`--scale_loss=8.0`即可。 - -注意,目前混合精度训练不能和内存优化功能同时使用,所以需要传`--with_mem_opt=False`这个参数来禁用内存优化功能。 - -## 参数微调 - -参数微调是指在特定任务上微调已训练模型的参数。通过初始化```path_to_pretrain_model```,微调一个模型可以采用如下的命令: -``` -python train.py - --model=SE_ResNeXt50_32x4d \ - --pretrained_model=${path_to_pretrain_model} \ - --batch_size=32 \ - --total_images=1281167 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --model_save_dir=output/ \ - --with_mem_opt=True \ - --lr_strategy=piecewise_decay \ - --lr=0.1 -``` - -## 模型评估 -模型评估是指对训练完毕的模型评估各类性能指标。用户可以下载[已有模型及其性能](#已有模型及其性能)并且设置```path_to_pretrain_model```为模型所在路径。运行如下的命令,可以获得一个模型top-1/top-5精度: -``` -python eval.py \ - --model=SE_ResNeXt50_32x4d \ - --batch_size=32 \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --pretrained_model=${path_to_pretrain_model} -``` - -## 模型预测 -模型预测可以获取一个模型的预测分数或者图像的特征: -``` -python infer.py \ - --model=SE_ResNeXt50_32x4d \ - --class_dim=1000 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --pretrained_model=${path_to_pretrain_model} -``` - -## 已有模型及其性能 -表格中列出了在```models```目录下支持的图像分类模型,并且给出了已完成训练的模型在ImageNet-2012验证集合上的top-1/top-5精度, -可以通过点击相应模型的名称下载相应预训练模型。 - -- Released models: - -|model | top-1/top-5 accuracy(PIL)| top-1/top-5 accuracy(CV2) | -|- |:-: |:-:| -|[AlexNet](http://paddle-imagenet-models-name.bj.bcebos.com/AlexNet_pretrained.zip) | 56.71%/79.18% | 55.88%/78.65% | -|[VGG11](https://paddle-imagenet-models-name.bj.bcebos.com/VGG11_pretrained.zip) | 69.22%/89.09% | 69.01%/88.90% | -|[VGG13](https://paddle-imagenet-models-name.bj.bcebos.com/VGG13_pretrained.zip) | 70.14%/89.48% | 69.83%/89.13% | -|[VGG16](https://paddle-imagenet-models-name.bj.bcebos.com/VGG16_pretrained.zip) | 72.08%/90.63% | 71.65%/90.57% | -|[VGG19](https://paddle-imagenet-models-name.bj.bcebos.com/VGG19_pretrained.zip) | 72.56%/90.83% | 72.32%/90.98% | -|[MobileNetV1](http://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV1_pretrained.zip) | 70.91%/89.54% | 70.51%/89.35% | -|[MobileNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV2_pretrained.zip) | 71.90%/90.55% | 71.53%/90.41% | -|[ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar) | 70.85%/89.89% | 70.65%/89.89% | -|[ResNet34](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet34_pretrained.tar) | 74.41%/92.03% | 74.13%/91.97% | -|[ResNet50](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip) | 76.35%/92.80% | 76.22%/92.92% | -|[ResNet101](http://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_pretrained.zip) | 77.49%/93.57% | 77.56%/93.64% | -|[ResNet152](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet152_pretrained.zip) | 78.12%/93.93% | 77.92%/93.87% | -|[SE_ResNeXt50_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNext50_32x4d_pretrained.zip) | 78.50%/94.01% | 78.44%/93.96% | -|[SE_ResNeXt101_32x4d](https://paddle-imagenet-models-name.bj.bcebos.com/SE_ResNeXt101_32x4d_pretrained.zip) | 79.26%/94.22% | 79.12%/94.20% | -|[GoogleNet](https://paddle-imagenet-models-name.bj.bcebos.com/GoogleNet_pretrained.tar) | 70.50%/89.59% | 70.27%/89.58% | 
-|[ShuffleNetV2](https://paddle-imagenet-models-name.bj.bcebos.com/ShuffleNet_pretrained.tar) | | 69.48%/88.99% | +您好,该项目已被迁移,请移步到 [PaddleCV/image_classification](../../../PaddleCV/image_classification) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/image_classification/README_ngraph.md b/fluid/PaddleCV/image_classification/README_ngraph.md index bb8190758d876244df931a090134f8410b6d38b3..55392b8ac91e4a8c24d2f2d6ac63d695cb58e146 100644 --- a/fluid/PaddleCV/image_classification/README_ngraph.md +++ b/fluid/PaddleCV/image_classification/README_ngraph.md @@ -1,43 +1,6 @@ -# PaddlePaddle inference and training script -This directory contains configuration and instructions to run the PaddlePaddle + nGraph for a local training and inference. +Hi! -# How to build PaddlePaddle framework with NGraph engine -In order to build the PaddlePaddle + nGraph engine and run proper script, follow up a few steps: -1. Install PaddlePaddle project -2. set env exports for nGraph and OpenMP -3. run the inference/training script - -Currently supported models: -* ResNet50 (inference and training). - -Only support Adam optimizer yet. - -Short description of aforementioned steps: - -## 1. Install PaddlePaddle -Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you [build from source](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/beginners_guide/install/compile/compile_Ubuntu_en.md), please use the following cmake arguments and ensure to set `-DWITH_NGRAPH=ON`. -``` -cmake .. -DCMAKE_BUILD_TYPE=Release -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_NGRAPH=ON -``` -Note: MKLDNN and MKL are required. - -## 2. Set env exports for nGraph and OMP -Set the following exports needed for running nGraph: -``` -export FLAGS_use_ngraph=true -export OMP_NUM_THREADS= -``` - -If multiple threads are used, you may export the following for better performance: -``` -export KMP_AFFINITY=granularity=fine,compact,1,0 -``` - -## 3. How the benchmark script might be run. -If everything built successfully, you can run command in ResNet50 nGraph session in script [run.sh](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/image_classification/run.sh) to start the benchmark job locally. You will need to uncomment the `#ResNet50 nGraph` part of script. - -Above is training job using the nGraph, to run the inference job using the nGraph: - -Please download the pre-trained resnet50 model from [supported models](https://github.com/PaddlePaddle/models/tree/72dcc7c1a8d5de9d19fbd65b4143bd0d661eee2c/fluid/PaddleCV/image_classification#supported-models-and-performances) for inference script. +This directory has been deprecated. +Please visit the project at [PaddleCV/image_classification](../../../PaddleCV/image_classification). diff --git a/fluid/PaddleCV/metric_learning/README.md b/fluid/PaddleCV/metric_learning/README.md index d72a2505d7650963a562f306f0bbf85ec3bbc759..6afd28a457c639af25337cc02a6b5b64658845ff 100644 --- a/fluid/PaddleCV/metric_learning/README.md +++ b/fluid/PaddleCV/metric_learning/README.md @@ -1,113 +1,6 @@ -# Deep Metric Learning -Metric learning is a kind of methods to learn discriminative features for each sample, with the purpose that intra-class samples have smaller distances while inter-class samples have larger distances in the learned space. 
With the development of deep learning techniques, metric learning methods are combined with deep neural networks to boost the performance of traditional tasks, such as face recognition/verification, human re-identification, image retrieval and so on. On this page, we introduce the way to implement deep metric learning using PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-metric-learning-models), [finetuning](#finetuning), [evaluation](#evaluation), [inference](#inference) and [Performances](#performances). ---- -## Table of Contents -- [Installation](#installation) -- [Data preparation](#data-preparation) -- [Training metric learning models](#training-metric-learning-models) -- [Finetuning](#finetuning) -- [Evaluation](#evaluation) -- [Inference](#inference) -- [Performances](#performances) +Hi! -## Installation +This directory has been deprecated. -Running the sample code in this directory requires PaddlePaddle Fluid v0.14.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update. - -## Data preparation - -The Stanford Online Products (SOP) dataset contains 120,053 images of 22,634 products downloaded from eBay.com. We use it to conduct the metric learning experiments. For training, 59,551 images of 11,318 classes are used, and 11,316 classes (60,502 images) are held out for testing. First of all, preparation of the SOP data can be done as: -``` -cd data/ -sh download_sop.sh -``` - -## Training metric learning models - -To train a metric learning model, one needs to set the neural network as the backbone and the metric loss function to optimize. We first train the metric learning model using softmax or arcmargin loss, and then fine-tune it using other metric learning losses, such as triplet, quadruplet and eml loss. One example of training using arcmargin loss is shown below: - - -``` -python train_elem.py \ - --model=ResNet50 \ - --train_batch_size=256 \ - --test_batch_size=50 \ - --lr=0.01 \ - --total_iter_num=30000 \ - --use_gpu=True \ - --pretrained_model=${path_to_pretrain_imagenet_model} \ - --model_save_dir=${output_model_path} \ - --loss_name=arcmargin \ - --arc_scale=80.0 \ - --arc_margin=0.15 \ - --arc_easy_margin=False -``` -**parameter introduction:** -* **model**: name of the model to use. Default: "ResNet50". -* **train_batch_size**: the size of each training mini-batch. Default: 256. -* **test_batch_size**: the size of each testing mini-batch. Default: 50. -* **lr**: initialized learning rate. Default: 0.01. -* **total_iter_num**: total number of training iterations. Default: 30000. -* **use_gpu**: whether to use GPU or not. Default: True. -* **pretrained_model**: model path for pretraining. Default: None. -* **model_save_dir**: the directory to save the trained model. Default: "output". -* **loss_name**: loss for training the model. Default: "softmax". -* **arc_scale**: parameter of arcmargin loss. Default: 80.0. -* **arc_margin**: parameter of arcmargin loss. Default: 0.15. -* **arc_easy_margin**: parameter of arcmargin loss. Default: False. - -## Finetuning - -Finetuning is to finetune model weights in a specific task by loading pretrained weights. After training the model using softmax or arcmargin loss, one can finetune it using triplet, quadruplet or eml loss. One example of fine-tuning using eml loss is shown below: - -``` -python train_pair.py \ - --model=ResNet50 \ - --train_batch_size=160 \ - --test_batch_size=50 \ - --lr=0.0001 \ - --total_iter_num=100000 \ - --use_gpu=True \ - --pretrained_model=${path_to_pretrain_arcmargin_model} \ - --model_save_dir=${output_model_path} \ - --loss_name=eml \ - --samples_each_class=2 -``` - -## Evaluation -Evaluation is to evaluate the performance of a trained model. You should set the model path to ```path_to_pretrain_model```. Then Recall@Rank-1 can be obtained by running the following command: -``` -python eval.py \ - --model=ResNet50 \ - --batch_size=50 \ - --pretrained_model=${path_to_pretrain_model} -``` - -## Inference -Inference is used to get prediction scores or image features based on trained models. -``` -python infer.py \ - --model=ResNet50 \ - --batch_size=1 \ - --pretrained_model=${path_to_pretrain_model} -``` - -## Performances - -For comparison, many metric learning models with different neural networks and loss functions are trained using the corresponding empirical parameters. Recall@Rank-1 is used as the evaluation metric and the performance is listed in the table below. - -|pretrain model | softmax | arcmargin -|- | - | -: -|without fine-tuning | 77.42% | 78.11% -|fine-tuned with triplet | 78.37% | 79.21% -|fine-tuned with quadruplet | 78.10% | 79.59% -|fine-tuned with eml | 79.32% | 80.11% -|fine-tuned with npairs | - | 79.81% - -## Reference - -- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [link](https://arxiv.org/abs/1801.07698) -- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [link](https://arxiv.org/abs/1710.00478) -- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [link](https://arxiv.org/abs/1212.6094) -- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [link](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf) +Please visit the project at [PaddleCV/metric_learning](../../../PaddleCV/metric_learning).
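For reference, the `--arc_scale` / `--arc_margin` parameters above correspond to the ArcFace-style margin from the first reference: assuming the standard formulation, the target-class logit $\cos\theta$ becomes $s\cdot\cos(\theta+m)$. A numpy sketch:

```python
import numpy as np

def arcmargin_target_logit(cos_theta, scale=80.0, margin=0.15):
    """ArcFace-style transform of the target-class logit (standard form assumed)."""
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return scale * np.cos(theta + margin)

print(arcmargin_target_logit(0.9))  # ~66.0, penalized relative to 80 * 0.9 = 72.0
```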
diff --git a/fluid/PaddleCV/metric_learning/README_cn.md b/fluid/PaddleCV/metric_learning/README_cn.md index 1b9efda881fc9045d70527fe7dad3d99a48eaf19..72417ed9badfc4858f314f143dd069d4ff6a0e6a 100644 --- a/fluid/PaddleCV/metric_learning/README_cn.md +++ b/fluid/PaddleCV/metric_learning/README_cn.md @@ -1,113 +1,2 @@ -# 深度度量学习 -度量学习是一种为样本对学习具有区分性特征的方法,目的是在特征空间中,让同一个类别的样本具有较小的特征距离,不同类的样本具有较大的特征距离。随着深度学习技术的发展,基于深度神经网络的度量学习方法已经在许多视觉任务上提升了很大的性能,例如:人脸识别、人脸校验、行人重识别和图像检索等等。在本章节,介绍在PaddlePaddle Fluid里实现的几种度量学习方法和使用方法,具体包括[数据准备](#数据准备),[模型训练](#模型训练),[模型微调](#模型微调),[模型评估](#模型评估),[模型预测](#模型预测)。 ---- -## 简介 -- [安装](#安装) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [模型微调](#模型微调) -- [模型评估](#模型评估) -- [模型预测](#模型预测) -- [模型性能](#模型性能) - -## 安装 - -运行本章节代码需要在PaddlePaddle Fluid v0.14.0 或更高的版本环境。如果你的设备上的PaddlePaddle版本低于v0.14.0,请按照此[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)进行安装和跟新。 - -## 数据准备 - -Stanford Online Product(SOP) 数据集下载自eBay,包含120053张商品图片,有22634个类别。我们使用该数据集进行实验。训练时,使用59551张图片,11318个类别的数据;测试时,使用60502张图片,11316个类别。首先,SOP数据集可以使用以下脚本下载: -``` -cd data/ -sh download_sop.sh -``` - -## 模型训练 - -为了训练度量学习模型,我们需要一个神经网络模型作为骨架模型(如ResNet50)和度量学习代价函数来进行优化。我们首先使用 softmax 或者 arcmargin 来进行训练,然后使用其它的代价函数来进行微调,例如:triplet,quadruplet和eml。下面是一个使用arcmargin训练的例子: - - -``` -python train_elem.py \ - --model=ResNet50 \ - --train_batch_size=256 \ - --test_batch_size=50 \ - --lr=0.01 \ - --total_iter_num=30000 \ - --use_gpu=True \ - --pretrained_model=${path_to_pretrain_imagenet_model} \ - --model_save_dir=${output_model_path} \ - --loss_name=arcmargin \ - --arc_scale=80.0 \ - --arc_margin=0.15 \ - --arc_easy_margin=False -``` -**参数介绍:** -* **model**: 使用的模型名字. 默认: "ResNet50". -* **train_batch_size**: 训练的 mini-batch大小. 默认: 256. -* **test_batch_size**: 测试的 mini-batch大小. 默认: 50. -* **lr**: 初始学习率. 默认: 0.01. -* **total_iter_num**: 总的训练迭代轮数. 默认: 30000. -* **use_gpu**: 是否使用GPU. 默认: True. -* **pretrained_model**: 预训练模型的路径. 默认: None. -* **model_save_dir**: 保存模型的路径. 默认: "output". -* **loss_name**: 优化的代价函数. 默认: "softmax". -* **arc_scale**: arcmargin的参数. 默认: 80.0. -* **arc_margin**: arcmargin的参数. 默认: 0.15. -* **arc_easy_margin**: arcmargin的参数. 默认: False. 
- -## 模型微调 - -网络微调是在指定的任务上加载已有的模型来微调网络。在用softmax和arcmargin训完网络后,可以继续使用triplet,quadruplet或eml来微调网络。下面是一个使用eml来微调网络的例子: - -``` -python train_pair.py \ - --model=ResNet50 \ - --train_batch_size=160 \ - --test_batch_size=50 \ - --lr=0.0001 \ - --total_iter_num=100000 \ - --use_gpu=True \ - --pretrained_model=${path_to_pretrain_arcmargin_model} \ - --model_save_dir=${output_model_path} \ - --loss_name=eml \ - --samples_each_class=2 -``` - -## 模型评估 -模型评估主要是评估模型的检索性能。这里需要设置```path_to_pretrain_model```。可以使用下面命令来计算Recall@Rank-1。 -``` -python eval.py \ - --model=ResNet50 \ - --batch_size=50 \ - --pretrained_model=${path_to_pretrain_model} \ -``` - -## 模型预测 -模型预测主要是基于训练好的网络来获取图像数据的特征,下面是模型预测的例子: -``` -python infer.py \ - --model=ResNet50 \ - --batch_size=1 \ - --pretrained_model=${path_to_pretrain_model} -``` - -## 模型性能 - -下面列举了几种度量学习的代价函数在SOP数据集上的检索效果,这里使用Recall@Rank-1来进行评估。 - -|预训练模型 | softmax | arcmargin -|- | - | -: -|未微调 | 77.42% | 78.11% -|使用triplet微调 | 78.37% | 79.21% -|使用quadruplet微调 | 78.10% | 79.59% -|使用eml微调 | 79.32% | 80.11% -|使用npairs微调 | - | 79.81% - -## 引用 - -- ArcFace: Additive Angular Margin Loss for Deep Face Recognition [链接](https://arxiv.org/abs/1801.07698) -- Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification [链接](https://arxiv.org/abs/1710.00478) -- Large Scale Strongly Supervised Ensemble Metric Learning, with Applications to Face Verification and Retrieval [链接](https://arxiv.org/abs/1212.6094) -- Improved Deep Metric Learning with Multi-class N-pair Loss Objective [链接](http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf) +您好,该项目已被迁移,请移步到 [PaddleCV/metric_learning](../../../PaddleCV/metric_learning) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/object_detection/README.md b/fluid/PaddleCV/object_detection/README.md index 2466ba96577c7cb1e2bb335a0b8b5c74edbb92fd..99b0f8db58cc8e2ef130c0054b40bf746b5ac2c8 100644 --- a/fluid/PaddleCV/object_detection/README.md +++ b/fluid/PaddleCV/object_detection/README.md @@ -1,96 +1,6 @@ -## SSD Object Detection -## Table of Contents -- [Introduction](#introduction) -- [Data Preparation](#data-preparation) -- [Train](#train) -- [Evaluate](#evaluate) -- [Infer and Visualize](#infer-and-visualize) -- [Released Model](#released-model) +Hi! -### Introduction +This directory has been deprecated. -[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) framework for object detection can be categorized as a single stage detector. A single stage detector simplifies object detection as a regression problem, which directly predicts the bounding boxes and class probabilities without region proposal. SSD further makes improves by producing these predictions of different scales from different layers, as shown below. Six levels predictions are made in six different scale feature maps. And there are two 3x3 convolutional layers in each feature map, which predict category or a shape offset relative to the prior box(also called anchor), respectively. Thus, we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class. -

-[Figure: The Single Shot MultiBox Detector (SSD)]
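The 8732-detections-per-class figure quoted above is just the sum over the six prediction levels; as a quick check:

```python
# (feature map size, prior boxes per location) for the six SSD levels above
levels = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
print(sum(s * s * k for s, k in levels))  # 8732
```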

- -SSD is readily pluggable into a wide variety of standard convolutional networks, such as VGG, ResNet, or MobileNet, which is also called the base network or backbone. In this tutorial we used [MobileNet](https://arxiv.org/abs/1704.04861). - - -### Data Preparation - -Please download the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) first; skip this step if you already have one. - -```bash -cd data/pascalvoc -./download.sh -``` - -The command `download.sh` will also create the training and testing file lists. - -### Train - -#### Download the Pre-trained Model - -We provide two pre-trained models. The first is MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed. This model can be used to initialize the models when training other datasets, like PASCAL VOC. The other pre-trained model is MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last Fully-Connected layer removed. Download MobileNet-v1 SSD: - - ```bash - ./pretrained/download_coco.sh - ``` - -Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). - - -#### Train on PASCAL VOC - -`train.py` is the main caller of the training module. Examples of usage are shown below. - ```bash - python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/' - ``` - - Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify the GPUs you want to use. - - For more help on arguments: - - ```bash - python train.py --help - ``` - -The data reader is defined in `reader.py`. All images will be resized to 300x300. In the training stage, images are randomly distorted, expanded, cropped and flipped: - - distort: distort brightness, contrast, saturation, and hue. - - expand: put the original image into a larger expanded image which is initialized using the image mean. - - crop: crop the image with respect to different scales, aspect ratios, and overlaps. - - flip: flip horizontally. - -We used the RMSProp optimizer with mini-batch size 64 to train the MobileNet-SSD. The initial learning rate is 0.001, and was decayed at 40, 60, 80, 100 epochs with multipliers 0.5, 0.25, 0.1, 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under the 11point metric. - -### Evaluate - -You can evaluate your trained model with different metrics, such as 11point and integral, on both the PASCAL VOC and COCO datasets. Note that we set the default test list to the dataset's test/val list; you can use your own test list by setting the ```--test_list``` argument. - -`eval.py` is the main caller of the evaluating module. Examples of usage are shown below. -```bash -python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45 -``` - -### Infer and Visualize -`infer.py` is the main caller of the inferring module. Examples of usage are shown below. -```bash -python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg' -``` -Below are examples of running the inference and visualizing the model results.
-[Figure: MobileNet-v1-SSD 300x300 Visualization Examples]
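Referring back to the training strategy above (initial lr 0.001, multipliers 0.5/0.25/0.1/0.01 at epochs 40/60/80/100), here is a hedged sketch of that piecewise schedule — illustrative only, not the repo's implementation, and `ssd_learning_rate` is a hypothetical helper name:

```python
def ssd_learning_rate(epoch, base_lr=0.001):
    # Piecewise schedule from the training section above: the multiplier of
    # the highest epoch boundary reached applies.
    schedule = [(100, 0.01), (80, 0.1), (60, 0.25), (40, 0.5)]
    for boundary, multiplier in schedule:
        if epoch >= boundary:
            return base_lr * multiplier
    return base_lr

print(ssd_learning_rate(0), ssd_learning_rate(45), ssd_learning_rate(120))
# approximately 0.001, 0.0005, 1e-05
```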
    - - -### Released Model - - -| Model | Pre-trained Model | Training data | Test data | mAP | -|:------------------------:|:------------------:|:----------------:|:------------:|:----:| -|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% | +Please visit the project at [PaddleCV/object_detection](../../../PaddleCV/object_detection). diff --git a/fluid/PaddleCV/object_detection/README_cn.md b/fluid/PaddleCV/object_detection/README_cn.md index 8c4cecab28e49c10820e092d3a521facf4be68ea..d3af497b9aecf23db4976970fbe16bc6c99bf6ff 100644 --- a/fluid/PaddleCV/object_detection/README_cn.md +++ b/fluid/PaddleCV/object_detection/README_cn.md @@ -1,99 +1,2 @@ -## SSD 目标检测 -## Table of Contents -- [简介](#简介) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [模型评估](#模型评估) -- [模型预测以及可视化](#模型预测以及可视化) -- [模型发布](#模型发布) - -### 简介 - -[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) 是一种单阶段的目标检测器。与两阶段的检测方法不同,单阶段目标检测并不进行区域推荐,而是直接从特征图回归出目标的边界框和分类概率。SSD 运用了这种单阶段检测的思想,并且对其进行改进:在不同尺度的特征图上检测对应尺度的目标。如下图所示,SSD 在六个尺度的特征图上进行了不同层级的预测。每个层级由两个3x3卷积分别对目标类别和边界框偏移进行回归。因此对于每个类别,SSD 的六个层级一共会产生 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 个检测结果。 -
-[图: SSD 目标检测模型]
    - -SSD 可以方便地插入到任何一种标准卷积网络中,比如 VGG、ResNet 或者 MobileNet,这些网络被称作检测器的基网络。在这个示例中我们使用 [MobileNet](https://arxiv.org/abs/1704.04861)。 - - -### 数据准备 - - -请先使用下面的命令下载 [PASCAL VOC 数据集](http://host.robots.ox.ac.uk/pascal/VOC/): - -```bash -cd data/pascalvoc -./download.sh -``` - -`download.sh` 命令会自动创建训练和测试用的列表文件。 - - -### 模型训练 - -#### 下载预训练模型 - -我们提供了两个预训练模型。第一个模型是在 COCO 数据集上预训练的 MobileNet-v1 SSD,我们将它的预测头移除了以便在 COCO 以外的数据集上进行训练。第二个模型是在 ImageNet 2012 数据集上预训练的 MobileNet-v1,我们也将最后的全连接层移除以便进行目标检测训练。下载 MobileNet-v1 SSD: - - ```bash - ./pretrained/download_coco.sh - ``` - -声明:MobileNet-v1 SSD 模型转换自[TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md)。MobileNet-v1 模型转换自[Caffe](https://github.com/shicai/MobileNet-Caffe)。 - - -#### 训练 - -`train.py` 是训练模块的主要执行程序,调用示例如下: - ```bash - python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/' - ``` - - 可以通过设置 ```export CUDA_VISIBLE_DEVICES=0,1``` 指定想要使用的GPU数量。 - - 更多的可选参数见: - - ```bash - python train.py --help - ``` - -数据的读取行为定义在 `reader.py` 中,所有的图片都会被缩放到300x300。在训练时还会对图片进行数据增强,包括随机扰动、扩张、翻转和裁剪: - - 扰动: 扰动图片亮度、对比度、饱和度和色相。 - - 扩张: 将原始图片放进一张使用像素均值填充(随后会在减均值操作中减掉)的扩张图中,再对此图进行裁剪、缩放和翻转。 - - 翻转: 水平翻转。 - - 裁剪: 根据缩放比例、长宽比例两个参数生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果。 - -我们使用了 RMSProp 优化算法来训练 MobileNet-SSD,batch大小为64,权重衰减系数为0.00005,初始学习率为 0.001,并且在第40、60、80、100 轮时使用 0.5, 0.25, 0.1, 0.01乘子进行学习率衰减。在120轮训练后,11point评价标准下的mAP为73.32%。 - -### 模型评估 - -你可以使用11point、integral等指标在PASCAL VOC 数据集上评估训练好的模型。不失一般性,我们采用相应数据集的测试列表作为样例代码的默认列表,你也可以通过设置```--test_list```来指定自己的测试样本列表。 - -`eval.py`是评估模块的主要执行程序,调用示例如下: -```bash -python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45 -``` - -### 模型预测以及可视化 - -`infer.py`是预测及可视化模块的主要执行程序,调用示例如下: -```bash -python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg' -``` -下图可视化了模型的预测结果: -
-[图: MobileNet-v1-SSD 300x300 预测可视化]
    - - -### 模型发布 - - -| 模型 | 预训练模型 | 训练数据 | 测试数据 | mAP | -|:------------------------:|:------------------:|:----------------:|:------------:|:----:| -|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% | +您好,该项目已被迁移,请移步到 [PaddleCV/object_detection](../../../PaddleCV/object_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/object_detection/README_quant.md b/fluid/PaddleCV/object_detection/README_quant.md index 7ea7f7bd79d21ba34c84d1a1b48a5298837939ac..99b0f8db58cc8e2ef130c0054b40bf746b5ac2c8 100644 --- a/fluid/PaddleCV/object_detection/README_quant.md +++ b/fluid/PaddleCV/object_detection/README_quant.md @@ -1,146 +1,6 @@ -## Quantization-aware training for SSD -### Introduction +Hi! -The quantization-aware training used in these experiments is introduced in the [fixed-point quantization design](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/design/quantization/fixed_point_quantization.md). Since quantization-aware training is still an active area of research and experimentation, -here we just give a simple quantization training example in Fluid based on the MobileNet-SSD model; more experiments are still needed, such as quantization training that fuses batch normalization and convolution/fully-connected layers, channel-wise quantization of weights, and so on. +This directory has been deprecated. - -A Python transpiler is used to rewrite the Fluid training or evaluation program for quantization-aware training: - -```python - - #startup_prog = fluid.Program() - #train_prog = fluid.Program() - #loss = build_program( - # main_prog=train_prog, - # startup_prog=startup_prog, - # is_train=True) - #build_program( - # main_prog=test_prog, - # startup_prog=startup_prog, - # is_train=False) - #test_prog = test_prog.clone(for_test=True) - # above is pseudo code - - transpiler = fluid.contrib.QuantizeTranspiler( - weight_bits=8, - activation_bits=8, - activation_quantize_type='abs_max', # or 'range_abs_max' - weight_quantize_type='abs_max') - # note: transpiler.training_transpile will rewrite train_prog - # startup_prog is needed since it needs to insert and initialize - # some state variables - transpiler.training_transpile(train_prog, startup_prog) - transpiler.training_transpile(test_prog, startup_prog) -``` - - According to the above design, this transpiler inserts fake quantization and de-quantization operations for each convolution operation (including depthwise convolution) and each fully-connected operation. These quantizations take effect on weights and activations. - - In the design, we introduce dynamic and static quantization strategies for different activation quantization methods. In the experiments, setting `activation_quantize_type` to `abs_max` gives dynamic quantization; that is, the quantization scale (maximum of the absolute value) of activations is calculated for each mini-batch during inference. Setting `activation_quantize_type` to `range_abs_max` means a quantization scale for the inference period is calculated during training. The following part introduces how to train. - -### Quantization-aware training - - The training is fine-tuned on the well-trained MobileNet-SSD model.
So download the model first: - - ``` - wget http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz - ``` - -- dynamic quantization: - - ```python - python main_quant.py \ - --data_dir=$PascalVOC_DIR$ \ - --mode='train' \ - --init_model=ssd_mobilenet_v1_pascalvoc \ - --act_quant_type='abs_max' \ - --epoc_num=20 \ - --learning_rate=0.0001 \ - --batch_size=64 \ - --model_save_dir=$OUTPUT_DIR$ - ``` - Since we fine-tune from a well-trained model, we use a small initial learning rate of 0.0001 and train for 20 epochs. - -- static quantization: - ```python - python main_quant.py \ - --data_dir=$PascalVOC_DIR$ \ - --mode='train' \ - --init_model=ssd_mobilenet_v1_pascalvoc \ - --act_quant_type='range_abs_max' \ - --epoc_num=80 \ - --learning_rate=0.001 \ - --lr_epochs=30,60 \ - --lr_decay_rates=1,0.1,0.01 \ - --batch_size=64 \ - --model_save_dir=$OUTPUT_DIR$ - ``` - Here we train for 80 epochs; the learning rate decays by 0.1 at epochs 30 and 60. Users can adjust these hyper-parameters. - -### Convert to inference model - - As described in the design documentation, the inference graph is a little different from the training graph: the difference is whether the de-quantization operation is placed before or after conv/fc. The two are equivalent in training because conv/fc and de-quantization are linear operations and commute. But for inference the graph needs to be converted; `fluid.contrib.QuantizeTranspiler.freeze_program` is used to do this: - - ```python - #startup_prog = fluid.Program() - #test_prog = fluid.Program() - #test_py_reader, map_eval, nmsed_out, image = build_program( - # main_prog=test_prog, - # startup_prog=startup_prog, - # train_params=configs, - # is_train=False) - #test_prog = test_prog.clone(for_test=True) - #transpiler = fluid.contrib.QuantizeTranspiler(weight_bits=8, - # activation_bits=8, - # activation_quantize_type=act_quant_type, - # weight_quantize_type='abs_max') - #transpiler.training_transpile(test_prog, startup_prog) - #place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace() - #exe = fluid.Executor(place) - #exe.run(startup_prog) - - def if_exist(var): - return os.path.exists(os.path.join(init_model, var.name)) - fluid.io.load_vars(exe, init_model, main_program=test_prog, - predicate=if_exist) - # freeze the rewritten training program - # freeze after loading parameters; it will quantize the weights - transpiler.freeze_program(test_prog, place) - ``` - - Users can evaluate the converted model by: - - ``` - python main_quant.py \ - --data_dir=$PascalVOC_DIR$ \ - --mode='test' \ - --init_model=$MODEL_DIR$ \ - --model_save_dir=$MobileNet_SSD_8BIT_MODEL$ - ``` - - You can also check the 8-bit model with the inference script: - - ``` - python main_quant.py \ - --mode='infer' \ - --init_model=$MobileNet_SSD_8BIT_MODEL$ \ - --confs_threshold=0.5 \ - --image_path='/data/PascalVOC/VOCdevkit/VOC2007/JPEGImages/002271.jpg' - ``` - See 002271.jpg for the visualized image with bounding boxes. - - **Note**: if you want to convert the model to 8-bit, you should call `fluid.contrib.QuantizeTranspiler.convert_to_int8`. However, Paddle currently cannot load an 8-bit model for inference. - -### Results - -Results of the MobileNet-v1-SSD 300x300 model on the PascalVOC dataset.
- -| Model | mAP | -|:---------------------------------------:|:------------------:| -|Floating point: 32bit | 73.32% | -|Fixed point: 8bit, dynamic quantization | 72.77% | -|Fixed point: 8bit, static quantization | 72.45% | - - As mentioned above, other experiments are still needed, such as quantization training that fuses batch normalization and convolution/fully-connected layers, channel-wise quantization of weights, and quantized weights stored as uint8 instead of int8. +Please visit the project at [PaddleCV/object_detection](../../../PaddleCV/object_detection). diff --git a/fluid/PaddleCV/ocr_recognition/README.md b/fluid/PaddleCV/ocr_recognition/README.md index 1c9553993e84d10376441407704088ec4dd66c0c..aa675d6048ecdb025ef2273ee755354152adc32e 100644 --- a/fluid/PaddleCV/ocr_recognition/README.md +++ b/fluid/PaddleCV/ocr_recognition/README.md @@ -1,206 +1,2 @@ - -运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - -## 代码结构 -``` -├── data_reader.py # 下载、读取、处理数据。 -├── crnn_ctc_model.py # 定义了OCR CTC model的网络结构。 -├── attention_model.py # 定义了OCR attention model的网络结构。 -├── train.py # 用于模型的训练。 -├── infer.py # 加载训练好的模型文件,对新数据进行预测。 -├── eval.py # 评估模型在指定数据集上的效果。 -└── utils.py # 定义通用的函数。 -``` - - -## 简介 - -本章的任务是识别图片中单行英文字符,这里我们分别使用CTC model和attention model两种不同的模型来完成该任务。 - -这两种模型有相同的编码部分,首先采用卷积将图片转为特征图, 然后使用`im2sequence op`将特征图转为序列,通过`双向GRU`学习到序列特征。 - -两种模型的解码部分和使用的损失函数区别如下: - -- CTC model: 训练过程选用的损失函数为CTC(Connectionist Temporal Classification) loss, 预测阶段采用的是贪婪策略和CTC解码策略。 -- Attention model: 训练过程选用的是带注意力机制的解码策略和交叉信息熵损失函数,预测阶段采用的是柱搜索策略。 - -训练以上两种模型的评估指标为样本级别的错误率。 - -## 数据 - -数据的下载和简单预处理都在`data_reader.py`中实现。 - -### 数据示例 - -我们使用的训练和测试数据如`图1`所示,每张图片包含单行不定长的英文字符串,这些图片都是经过检测算法进行预框选处理的。 - -
-[图 1]
    - -在训练集中,每张图片对应的label是字符在词典中的索引。 `图1` 对应的label如下所示: -``` -80,84,68,82,83,72,78,77,68,67 -``` -在上边这个label中,`80` 表示字符`Q`的索引,`67` 表示字符`D`的索引。 - - -### 数据准备 - -**A. 训练集** - -我们需要把所有参与训练的图片放入同一个文件夹,暂且记为`train_images`。然后用一个list文件存放每张图片的信息,包括图片大小、图片名称和对应的label,这里暂记该list文件为`train_list`,其格式如下所示: - -``` -185 48 00508_0215.jpg 7740,5332,2369,3201,4162 -48 48 00197_1893.jpg 6569 -338 48 00007_0219.jpg 4590,4788,3015,1994,3402,999,4553 -150 48 00107_4517.jpg 5936,3382,1437,3382 -... -157 48 00387_0622.jpg 2397,1707,5919,1278 -``` - -
    文件train_list
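结合上面的 train_list 示例,下面给出一个解析单行记录的示意代码(仅为示意,`parse_train_list_line` 为假设的函数名,并非本项目源码;各列含义见下文说明):

```python
# 示例:解析 train_list 中的一行(四列:宽 高 图片名 逗号分隔的label索引)。
def parse_train_list_line(line):
    width, height, name, label = line.split()
    return int(width), int(height), name, [int(i) for i in label.split(",")]

print(parse_train_list_line("185 48 00508_0215.jpg 7740,5332,2369,3201,4162"))
# (185, 48, '00508_0215.jpg', [7740, 5332, 2369, 3201, 4162])
```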
    - -上述文件中的每一行表示一张图片,每行被空格分为四列,前两列分别表示图片的宽和高,第三列表示图片的名称,第四列表示该图片对应的sequence label。 -最终我们应有以下类似文件结构: - -``` -|-train_data - |- train_list - |- train_images - |- 00508_0215.jpg - |- 00197_1893.jpg - |- 00007_0219.jpg - | ... -``` - -在训练时,我们通过选项`--train_images` 和 `--train_list` 分别设置准备好的`train_images` 和`train_list`。 - - ->**注:** 如果`--train_images` 和 `--train_list`都未设置或设置为None, reader.py会自动下载使用[示例数据](http://paddle-ocr-data.bj.bcebos.com/data.tar.gz),并将其缓存到`$HOME/.cache/paddle/dataset/ctc_data/data/` 路径下。 - - -**B. 测试集和评估集** - -测试集、评估集的准备方式与训练集相同。 -在训练阶段,测试集的路径通过train.py的选项`--test_images` 和 `--test_list` 来设置。 -在评估时,评估集的路径通过eval.py的选项`--input_images_dir` 和`--input_images_list` 来设置。 - -**C. 待预测数据集** - -预测支持三种形式的输入: - -第一种:设置`--input_images_dir`和`--input_images_list`, 与训练集类似, 只不过list文件中的最后一列可以放任意占位字符或字符串,如下所示: - -``` -185 48 00508_0215.jpg s -48 48 00197_1893.jpg s -338 48 00007_0219.jpg s -... -``` - -第二种:仅设置`--input_images_list`, 其中list文件中只需放图片的完整路径,如下所示: - -``` -data/test_images/00000.jpg -data/test_images/00001.jpg -data/test_images/00003.jpg -``` - -第三种:从stdin读入一张图片的path,然后进行一次inference. - -## 模型训练与预测 - -### 训练 - -使用默认数据在GPU单卡上训练: - -``` -env CUDA_VISIBLE_DEVICES=0 python train.py -``` -使用默认数据在CPU上训练: -``` -env OMP_NUM_THREADS=<num_of_physical_cores> python train.py --use_gpu False --parallel=False -``` - -使用默认数据在GPU多卡上训练: - -``` -env CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --parallel=True -``` - -默认使用的是`CTC model`, 可以通过选项`--model="attention"`切换为`attention model`。 - -执行`python train.py --help`可查看更多使用方式和参数详细说明。 - -图2为使用默认参数在默认数据集上训练`CTC model`的收敛曲线,其中横坐标轴为训练迭代次数,纵轴为样本级错误率。其中,蓝线为训练集上的样本错误率,红线为测试集上的样本错误率。测试集上最低错误率为22.0%. -
-[图 2]
    - -图3为使用默认参数在默认数据集上训练`attention model`的收敛曲线,其中横坐标轴为训练迭代次数,纵轴为样本级错误率。其中,蓝线为训练集上的样本错误率,红线为测试集上的样本错误率。测试集上最低错误率为16.25%. - -
-[图 3]
    - - -## 测试 - -通过以下命令调用评估脚本用指定数据集对模型进行评估: - -``` -env CUDA_VISIBLE_DEVICES=0 python eval.py \ - --model_path="./models/model_0" \ - --input_images_dir="./eval_data/images/" \ - --input_images_list="./eval_data/eval_list" -``` - -执行`python eval.py --help`可查看参数详细说明。 - - -### 预测 - -从标准输入读取一张图片的路径,并对其进行预测: - -``` -env CUDA_VISIBLE_DEVICES=0 python infer.py \ - --model_path="models/model_00044_15000" -``` - -执行上述命令进行预测的效果如下: - -``` ------------ Configuration Arguments ----------- -use_gpu: True -input_images_dir: None -input_images_list: None -model_path: /home/work/models/fluid/ocr_recognition/models/model_00052_15000 ------------------------------------------------- -Init model from: ./models/model_00052_15000. -Please input the path of image: ./test_images/00001_0060.jpg -result: [3298 2371 4233 6514 2378 3298 2363] -Please input the path of image: ./test_images/00001_0429.jpg -result: [2067 2067 8187 8477 5027 7191 2431 1462] -``` - -从文件中批量读取图片路径,并对其进行预测: - -``` -env CUDA_VISIBLE_DEVICES=0 python infer.py \ - --model_path="models/model_00044_15000" \ - --input_images_list="data/test.list" -``` - -## 预训练模型 - -|模型| 错误率| -|- |:-: | -|[ocr_ctc_params](https://paddle-ocr-models.bj.bcebos.com/ocr_ctc.zip) | 22.3% | -|[ocr_attention_params](https://paddle-ocr-models.bj.bcebos.com/ocr_attention.zip) | 15.8%| +您好,该项目已被迁移,请移步到 [PaddleCV/ocr_recognition](../../../PaddleCV/ocr_recognition) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/rcnn/README.md b/fluid/PaddleCV/rcnn/README.md index 97d1736f2b25bd6baa5ab5142be18544c9d63b85..1e96b373a0ad13424691921dd17e8f251b9cdfc7 100644 --- a/fluid/PaddleCV/rcnn/README.md +++ b/fluid/PaddleCV/rcnn/README.md @@ -1,209 +1,6 @@ -# RCNN Object Detection ---- -## Table of Contents +Hi! - -- [Installation](#installation) -- [Introduction](#introduction) -- [Data preparation](#data-preparation) -- [Training](#training) -- [Evaluation](#evaluation) -- [Inference and Visualization](#inference-and-visualization) +This directory has been deprecated. - -## Installation - -Running sample code in this directory requires PaddlePaddle Fluid v1.3.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the [installation document](http://paddlepaddle.org/documentation/docs/en/1.3/beginners_guide/install/index_en.html) and make an update. - -## Introduction - -Region Convolutional Neural Network (RCNN) models are two-stage detectors: they first generate region proposals, then extract features from them to obtain classes and more precise boxes. -The RCNN series currently contains two typical models: Faster RCNN and Mask RCNN. - -The overall framework of [Faster RCNN](https://arxiv.org/abs/1506.01497) can be divided into four parts: - -1. Base conv layer. As a CNN-based object detector, Faster RCNN extracts feature maps using a basic convolutional network. The feature maps can then be shared by the RPN and fc layers. This sample uses [ResNet-50](https://arxiv.org/abs/1512.03385) as the base conv layer. -2. Region Proposal Network (RPN). RPN generates proposals for detection. This block generates anchors with a set of sizes and ratios and classifies anchors into foreground and background by softmax, then refines the anchors with box regression to obtain more precise proposals. -3. RoI Align. This layer takes feature maps and proposals as input. The proposals are mapped to the feature maps and pooled to the same size. The outputs are sent to fc layers for classification and regression. Either RoIPool or RoIAlign can be used for this layer, set via roi\_func in config.py. -4.
Detection layer. Using the output of RoI pooling, it computes the class and location of each proposal in two fc layers. - -[Mask RCNN](https://arxiv.org/abs/1703.06870) is a classical instance segmentation model and an extension of Faster RCNN. - -Mask RCNN is a two-stage model as well. At the first stage, it generates proposals from input images. At the second stage, it obtains the class results, bboxes, and masks, where the masks come from the segmentation branch added to the original Faster RCNN model. It decouples the relation between mask and classification. - -## Data preparation - -Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as below: - - cd dataset/coco - ./download.sh - -The data directory structure is as follows: - - ``` - data/coco/ - ├── annotations - │   ├── instances_train2014.json - │   ├── instances_train2017.json - │   ├── instances_val2014.json - │   ├── instances_val2017.json - | ... - ├── train2017 - │   ├── 000000000009.jpg - │   ├── 000000580008.jpg - | ... - ├── val2017 - │   ├── 000000000139.jpg - │   ├── 000000000285.jpg - | ... - ``` - -## Training - -**download the pre-trained model:** This sample provides a ResNet-50 pre-trained model which is converted from Caffe. The model fuses the parameters of the batch normalization layers. One can download the pre-trained model as: - - sh ./pretrained/download.sh - -Set `pretrained_model` to load the pre-trained model. In addition, this parameter is used to load the trained model when finetuning as well. -Please make sure that pretrained_model is downloaded and loaded correctly, otherwise the loss may be NAN during training. - -**Install the [cocoapi](https://github.com/cocodataset/cocoapi):** - -To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi: - - git clone https://github.com/cocodataset/cocoapi.git - cd cocoapi/PythonAPI - # if cython is not installed - pip install Cython - # Install into global site-packages - make install - # Alternatively, if you do not have permissions or prefer - # not to install the COCO API into global site-packages - python2 setup.py install --user - -After data preparation, one can start the training step by: - -- Faster RCNN - - ``` - python train.py \ - --model_save_dir=output/ \ - --pretrained_model=${path_to_pretrain_model} \ - --data_dir=${path_to_data} \ - --MASK_ON=False - ``` - -- Mask RCNN - - ``` - python train.py \ - --model_save_dir=output/ \ - --pretrained_model=${path_to_pretrain_model} \ - --data_dir=${path_to_data} \ - --MASK_ON=True - ``` - - - Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training. - - Set ```MASK_ON``` to choose the Faster RCNN or Mask RCNN model. - - For more help on arguments: - - python train.py --help - -**data reader introduction:** - -* The data reader is defined in `reader.py`. -* The short side of all images is scaled to `scales`. If the long side is larger than `max_size`, the long side is scaled to `max_size`. -* In the training stage, images are horizontally flipped. -* Images in the same batch can be padded to the same size. - -**model configuration:** - -* RoIAlign and RoIPool can be used separately. -* NMS threshold=0.7. During training, pre\_nms=12000, post\_nms=2000; during test, pre\_nms=6000, post\_nms=1000. -* In generating proposal labels, fg\_fraction=0.25, fg\_thresh=0.5, bg\_thresh_hi=0.5, bg\_thresh\_lo=0.0. -* In rpn target assignment, rpn\_fg\_fraction=0.5, rpn\_positive\_overlap=0.7, rpn\_negative\_overlap=0.3.
- -**training strategy:** - -* Use the momentum optimizer with momentum=0.9. -* Weight decay is 0.0001. -* In the first 500 iterations, the learning rate increases linearly from 0.00333 to 0.01. Then the lr is decayed at iterations 120000 and 160000 with multipliers 0.1 and 0.01. The maximum iteration is 180000. Also, we released a 2x model which has 360000 iterations, with the lr decayed at 240000 and 320000. These configurations can be set by max_iter and lr_steps in config.py. (A short sketch of this warmup schedule appears at the end of this README.) -* In non-base convolutional layers, the learning rate of bias is set to twice the global lr. -* In the base convolutional layers, parameters of the affine layers and the res body are not updated. - -## Evaluation - -Evaluation measures the performance of a trained model. This sample provides `eval_coco_map.py`, which uses a COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval). - -`eval_coco_map.py` is the main executor for evaluation; one can start the evaluation step by: - -- Faster RCNN - - ``` - python eval_coco_map.py \ - --dataset=coco2017 \ - --pretrained_model=${path_to_trained_model} \ - --MASK_ON=False - ``` - -- Mask RCNN - - ``` - python eval_coco_map.py \ - --dataset=coco2017 \ - --pretrained_model=${path_to_trained_model} \ - --MASK_ON=True - ``` - - - Set ```--pretrained_model=${path_to_trained_model}``` to specify the trained model, not the initialized model. - - Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation. - - Set ```MASK_ON``` to choose the Faster RCNN or Mask RCNN model. - -Evaluation results are shown below: - -Faster RCNN: - -| Model | RoI function | Batch size | Max iteration | mAP | -| :--------------- | :--------: | :------------: | :------------------: |------: | -| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 | -| [Fluid RoIPool no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_no_padding.tar.gz) | RoIPool | 8 | 180000 | 0.318 | -| [Fluid RoIAlign no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding.tar.gz) | RoIAlign | 8 | 180000 | 0.348 | -| [Fluid RoIAlign no padding 2x](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding_2x.tar.gz) | RoIAlign | 8 | 360000 | 0.367 | - -* Fluid RoIPool minibatch padding: Use RoIPool. Images in one batch are padded to the same size. This method is the same as Detectron's. -* Fluid RoIPool no padding: Images without padding. -* Fluid RoIAlign no padding: Images without padding. -* Fluid RoIAlign no padding 2x: Images without padding, trained for 360000 iterations, with the learning rate decayed at 240000 and 320000. - -Mask RCNN: - -| Model | Batch size | Max iteration | box mAP | mask mAP | -| :--------------- | :--------: | :------------: | :--------: |------: | -| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 | - -* Fluid mask no padding: Use RoIAlign. Images without padding. - -## Inference and Visualization - -Inference is used to get prediction scores or image features based on trained models. `infer.py` is the main executor for inference; one can start the inference step by: - -``` -python infer.py \ - --pretrained_model=${path_to_trained_model} \ - --image_path=dataset/coco/val2017/000000000139.jpg \ - --draw_threshold=0.6 -``` - -Please set the model path and image path correctly. The GPU device is used by default; you can set `--use_gpu=False` to switch to the CPU device.
You can also set `draw_threshold` to tune the score threshold that controls the number of output detection boxes. - -Visualization of the infer results is shown below: -
-[Figure: Faster RCNN Visualization Examples]
-[Figure: Mask RCNN Visualization Examples]
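As a hedged illustration of the warmup schedule described in the training strategy above — a sketch only, not the project's code, and `rcnn_learning_rate` is a hypothetical helper name:

```python
def rcnn_learning_rate(iteration, base_lr=0.01, warmup_start=0.00333,
                       warmup_iters=500, steps=(120000, 160000)):
    # Linear warmup over the first 500 iterations, then x0.1 at 120000 and
    # x0.01 at 160000 iterations, as stated above.
    if iteration < warmup_iters:
        return warmup_start + (base_lr - warmup_start) * iteration / warmup_iters
    if iteration < steps[0]:
        return base_lr
    return base_lr * (0.1 if iteration < steps[1] else 0.01)

print(rcnn_learning_rate(0), rcnn_learning_rate(130000), rcnn_learning_rate(170000))
# approximately 0.00333, 0.001, 0.0001
```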
    +Please visit the project at [PaddleCV/rcnn](../../../PaddleCV/rcnn). diff --git a/fluid/PaddleCV/rcnn/README_cn.md b/fluid/PaddleCV/rcnn/README_cn.md index 3d45de1c5845727d0b942f9ba5a4bf5216985af9..83d5e0fc06448086e8807587798e804e3c634f97 100644 --- a/fluid/PaddleCV/rcnn/README_cn.md +++ b/fluid/PaddleCV/rcnn/README_cn.md @@ -1,207 +1,2 @@ -# RCNN 系列目标检测 ---- -## 内容 - -- [安装](#安装) -- [简介](#简介) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [模型评估](#模型评估) -- [模型推断及可视化](#模型推断及可视化) - -## 安装 - -在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.3.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/)中的说明来更新PaddlePaddle。 - -## 简介 -区域卷积神经网络(RCNN)系列模型为两阶段目标检测器。通过对图像生成候选区域,提取特征,判别特征类别并修正候选框位置。 -RCNN系列目前包含两个代表模型:Faster RCNN,Mask RCNN - -[Faster RCNN](https://arxiv.org/abs/1506.01497) 整体网络可以分为4个主要内容: - -1. 基础卷积层。作为一种卷积神经网络目标检测方法,Faster RCNN首先使用一组基础的卷积网络提取图像的特征图。特征图被后续RPN层和全连接层共享。本示例采用[ResNet-50](https://arxiv.org/abs/1512.03385)作为基础卷积层。 -2. 区域生成网络(RPN)。RPN网络用于生成候选区域(proposals)。该层通过一组固定的尺寸和比例得到一组锚点(anchors), 通过softmax判断锚点属于前景或者背景,再利用区域回归修正锚点从而获得精确的候选区域。 -3. RoI Align。该层收集输入的特征图和候选区域,将候选区域映射到特征图中并池化为统一大小的区域特征图,送入全连接层判定目标类别, 该层可选用RoIPool和RoIAlign两种方式,在config.py中设置roi\_func。 -4. 检测层。利用区域特征图计算候选区域的类别,同时再次通过区域回归获得检测框最终的精确位置。 - -[Mask RCNN](https://arxiv.org/abs/1703.06870) 扩展自Faster RCNN,是经典的实例分割模型。 - -Mask RCNN同样为两阶段框架,第一阶段扫描图像生成候选框;第二阶段根据候选框得到分类结果,边界框,同时在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。 - - -## 数据准备 - -在[MS-COCO数据集](http://cocodataset.org/#download)上进行训练,通过如下方式下载数据集。 - - cd dataset/coco - ./download.sh - -数据目录结构如下: - -``` -data/coco/ -├── annotations -│   ├── instances_train2014.json -│   ├── instances_train2017.json -│   ├── instances_val2014.json -│   ├── instances_val2017.json -| ... -├── train2017 -│   ├── 000000000009.jpg -│   ├── 000000580008.jpg -| ... -├── val2017 -│   ├── 000000000139.jpg -│   ├── 000000000285.jpg -| ... 
- -``` - -## 模型训练 - -**下载预训练模型:** 本示例提供Resnet-50预训练模型,该模性转换自Caffe,并对批标准化层(Batch Normalization Layer)进行参数融合。采用如下命令下载预训练模型: - - sh ./pretrained/download.sh - -通过初始化`pretrained_model` 加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。 -请在训练前确认预训练模型下载与加载正确,否则训练过程中损失可能会出现NAN。 - -**安装[cocoapi](https://github.com/cocodataset/cocoapi):** - -训练前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi): - - git clone https://github.com/cocodataset/cocoapi.git - cd cocoapi/PythonAPI - # if cython is not installed - pip install Cython - # Install into global site-packages - make install - # Alternatively, if you do not have permissions or prefer - # not to install the COCO API into global site-packages - python2 setup.py install --user - -数据准备完毕后,可以通过如下的方式启动训练: - -- Faster RCNN - - ``` - python train.py \ - --model_save_dir=output/ \ - --pretrained_model=${path_to_pretrain_model} \ - --data_dir=${path_to_data} \ - --MASK_ON=False - ``` - -- Mask RCNN - - ``` - python train.py \ - --model_save_dir=output/ \ - --pretrained_model=${path_to_pretrain_model} \ - --data_dir=${path_to_data} \ - --MASK_ON=True - ``` - - - 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。 - - 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 - - 可选参数见: - - python train.py --help - -**数据读取器说明:** 数据读取器定义在reader.py中。所有图像将短边等比例缩放至`scales`,若长边大于`max_size`, 则再次将长边等比例缩放至`max_size`。在训练阶段,对图像采用水平翻转。支持将同一个batch内的图像padding为相同尺寸。 - -**模型设置:** - -* 分别使用RoIAlign和RoIPool两种方法。 -* 训练过程pre\_nms=12000, post\_nms=2000,测试过程pre\_nms=6000, post\_nms=1000。nms阈值为0.7。 -* RPN网络得到labels的过程中,fg\_fraction=0.25,fg\_thresh=0.5,bg\_thresh_hi=0.5,bg\_thresh\_lo=0.0 -* RPN选择anchor时,rpn\_fg\_fraction=0.5,rpn\_positive\_overlap=0.7,rpn\_negative\_overlap=0.3 - - -**训练策略:** - -* 采用momentum优化算法训练,momentum=0.9。 -* 权重衰减系数为0.0001,前500轮学习率从0.00333线性增加至0.01。在120000,160000轮时使用0.1,0.01乘子进行学习率衰减,最大训练180000轮。同时我们也提供了2x模型,该模型采用更多的迭代轮数进行训练,训练360000轮,学习率在240000,320000轮衰减,其他参数不变,训练最大轮数和学习率策略可以在config.py中对max_iter和lr_steps进行设置。 -* 非基础卷积层卷积bias学习率为整体学习率2倍。 -* 基础卷积层中,affine_layers参数不更新,res2层参数不更新。 - -## 模型评估 - -模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估](http://cocodataset.org/#detections-eval) - -`eval_coco_map.py`是评估模块的主要执行程序,调用示例如下: - -- Faster RCNN - - ``` - python eval_coco_map.py \ - --dataset=coco2017 \ - --pretrained_model=${path_to_trained_model} \ - --MASK_ON=False - ``` - -- Mask RCNN - - ``` - python eval_coco_map.py \ - --dataset=coco2017 \ - --pretrained_model=${path_to_trained_model} \ - --MASK_ON=True - ``` - - - 通过设置`--pretrained_model=${path_to_trained_model}`指定训练好的模型,注意不是初始化的模型。 - - 通过设置`export CUDA\_VISIBLE\_DEVICES=0`指定单卡GPU评估。 - - 通过设置```MASK_ON```选择Faster RCNN和Mask RCNN模型。 - -下表为模型评估结果: - -Faster RCNN - -| 模型 | RoI处理方式 | 批量大小 | 迭代次数 | mAP | -| :--------------- | :--------: | :------------: | :------------------: |------: | -| [Fluid RoIPool minibatch padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_minibatch_padding.tar.gz) | RoIPool | 8 | 180000 | 0.316 | -| [Fluid RoIPool no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_pool_no_padding.tar.gz) | RoIPool | 8 | 180000 | 0.318 | -| [Fluid RoIAlign no padding](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding.tar.gz) | RoIAlign | 8 | 180000 | 0.348 | -| [Fluid RoIAlign no padding 2x](http://paddlemodels.bj.bcebos.com/faster_rcnn/model_align_no_padding_2x.tar.gz) | RoIAlign | 8 | 360000 | 0.367 | - - - -* Fluid RoIPool minibatch padding: 使用RoIPool,同一个batch内的图像填充为相同尺寸。该方法与detectron处理相同。 -* Fluid RoIPool no padding: 使用RoIPool,不对图像做填充处理。 -* Fluid RoIAlign no padding: 
使用RoIAlign,不对图像做填充处理。 -* Fluid RoIAlign no padding 2x: 使用RoIAlign,不对图像做填充处理。训练360000轮,学习率在240000,320000轮衰减。 - -Mask RCNN: - -| 模型 | 批量大小 | 迭代次数 | box mAP | mask mAP | -| :--------------- | :--------: | :------------: | :--------: |------: | -| [Fluid mask no padding](https://paddlemodels.bj.bcebos.com/faster_rcnn/Fluid_mask_no_padding.tar.gz) | 8 | 180000 | 0.359 | 0.314 | - -* Fluid mask no padding: 使用RoIAlign,不对图像做填充处理 - -## 模型推断及可视化 - -模型推断可以获取图像中的物体及其对应的类别,`infer.py`是主要执行程序,调用示例如下: - -``` -python infer.py \ - --pretrained_model=${path_to_trained_model} \ - --image_path=dataset/coco/val2017/000000000139.jpg \ - --draw_threshold=0.6 -``` - -注意,请正确设置模型路径`${path_to_trained_model}`和预测图片路径。默认使用GPU设备,也可通过设置`--use_gpu=False`使用CPU设备。可通过设置`draw_threshold`调节得分阈值控制检测框的个数。 - -下图为模型可视化预测结果: -
-[图: Faster RCNN 预测可视化]
-[图: Mask RCNN 预测可视化]
    +您好,该项目已被迁移,请移步到 [PaddleCV/rcnn](../../../PaddleCV/rcnn) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/README.md b/fluid/PaddleCV/video/README.md index b6b6cdd2dd817268b2fe42f79da8e9e952f96f74..bbef3af1c6f6715e4415041939e046d66f02f58d 100644 --- a/fluid/PaddleCV/video/README.md +++ b/fluid/PaddleCV/video/README.md @@ -1,130 +1,2 @@ -## 简介 -本教程期望给开发者提供基于PaddlePaddle的便捷、高效的使用深度学习算法解决视频理解、视频编辑、视频生成等一系列模型。目前包含视频分类模型,后续会不断的扩展到其他更多场景。 - -目前视频分类模型包括: - -| 模型 | 类别 | 描述 | -| :--------------- | :--------: | :------------: | -| [Attention Cluster](./models/attention_cluster/README.md) | 视频分类| CVPR'18提出的视频多模态特征注意力聚簇融合方法 | -| [Attention LSTM](./models/attention_lstm/README.md) | 视频分类| 常用模型,速度快精度高 | -| [NeXtVLAD](./models/nextvlad/README.md) | 视频分类| 2nd-Youtube-8M最优单模型 | -| [StNet](./models/stnet/README.md) | 视频分类| AAAI'19提出的视频联合时空建模方法 | -| [TSN](./models/tsn/README.md) | 视频分类| ECCV'16提出的基于2D-CNN经典解决方案 | - -### 主要特点 - -- 包含视频分类方向的多个主流领先模型,其中Attention LSTM,Attention Cluster和NeXtVLAD是比较流行的特征序列模型,TSN和StNet是两个End-to-End的视频分类模型。Attention LSTM模型速度快精度高,NeXtVLAD是2nd-Youtube-8M比赛中最好的单模型, TSN是基于2D-CNN的经典解决方案。Attention Cluster和StNet是百度自研模型,分别发表于CVPR2018和AAAI2019,是Kinetics600比赛第一名中使用到的模型。 - -- 提供了适合视频分类任务的通用骨架代码,用户可一键式高效配置模型完成训练和评测。 - -## 安装 - -在当前模型库运行样例代码需要PadddlePaddle Fluid v.1.2.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.3/beginners_guide/install/index_cn.html)中的说明来更新PaddlePaddle。 - -## 数据准备 - -视频模型库使用Youtube-8M和Kinetics数据集, 具体使用方法请参考[数据说明](./dataset/README.md) - -## 快速使用 - -视频模型库提供通用的train/test/infer框架,通过`train.py/test.py/infer.py`指定模型名、模型配置参数等可一键式进行训练和预测。 - -以StNet模型为例: - -单卡训练: - -``` bash -export CUDA_VISIBLE_DEVICES=0 -python train.py --model-name=STNET - --config=./configs/stnet.txt - --save-dir=checkpoints -``` - -多卡训练: - -``` bash -export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 -python train.py --model-name=STNET - --config=./configs/stnet.txt - --save-dir=checkpoints -``` - -视频模型库同时提供了快速训练脚本,脚本位于`scripts/train`目录下,可通过如下命令启动训练: - -``` bash -bash scripts/train/train_stnet.sh -``` - -- 请根据`CUDA_VISIBLE_DEVICES`指定卡数修改`config`文件中的`num_gpus`和`batch_size`配置。 - -## 模型库结构 - -### 代码结构 - -``` -configs/ - stnet.txt - tsn.txt - ... -dataset/ - youtube/ - kinetics/ -datareader/ - feature_readeer.py - kinetics_reader.py - ... -metrics/ - kinetics/ - youtube8m/ - ... -models/ - stnet/ - tsn/ - ... 
-scripts/ - train/ - test/ -train.py -test.py -infer.py -``` - -- `configs`: 各模型配置文件模板 -- `datareader`: 提供Youtube-8M,Kinetics数据集reader -- `metrics`: Youtube-8,Kinetics数据集评估脚本 -- `models`: 各模型网络结构构建脚本 -- `scripts`: 各模型快速训练评估脚本 -- `train.py`: 一键式训练脚本,可通过指定模型名,配置文件等一键式启动训练 -- `test.py`: 一键式评估脚本,可通过指定模型名,配置文件,模型权重等一键式启动评估 -- `infer.py`: 一键式推断脚本,可通过指定模型名,配置文件,模型权重,待推断文件列表等一键式启动推断 - -## Model Zoo - -- 基于Youtube-8M数据集模型: - -| 模型 | Batch Size | 环境配置 | cuDNN版本 | GAP | 下载链接 | -| :-------: | :---: | :---------: | :-----: | :----: | :----------: | -| Attention Cluster | 2048 | 8卡P40 | 7.1 | 0.84 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) | -| Attention LSTM | 1024 | 8卡P40 | 7.1 | 0.86 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) | -| NeXtVLAD | 160 | 4卡P40 | 7.1 | 0.87 | [model](https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz) | - -- 基于Kinetics数据集模型: - -| 模型 | Batch Size | 环境配置 | cuDNN版本 | Top-1 | 下载链接 | -| :-------: | :---: | :---------: | :----: | :----: | :----------: | -| StNet | 128 | 8卡P40 | 5.1 | 0.69 | [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) | -| TSN | 256 | 8卡P40 | 7.1 | 0.67 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) | - -## 参考文献 - -- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen -- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici -- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan -- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen -- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool - -## 版本更新 - -- 3/2019: 新增模型库,发布Attention Cluster,Attention LSTM,NeXtVLAD,StNet,TSN五个视频分类模型。 - +您好,该项目已被迁移,请移步到 [PaddleCV/video](../../../PaddleCV/video) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video_classification/README.md b/fluid/PaddleCV/video_classification/README.md index 822c3ccf64cb1c5567e574425229974524a34471..bb145d1e7d4538f8b1a6df5cf547d9c5ef5ae8c5 100644 --- a/fluid/PaddleCV/video_classification/README.md +++ b/fluid/PaddleCV/video_classification/README.md @@ -1,140 +1,6 @@ -# Video Classification Based on Temporal Segment Network -Video classification has drawn a significant amount of attentions in the past few years. This page introduces how to perform video classification with PaddlePaddle Fluid, on the public UCF-101 dataset, based on the state-of-the-art Temporal Segment Network (TSN) method. +Hi! -______________________________________________________________________________ +This directory has been deprecated. -## Table of Contents -
  • Installation
  • Data preparation
  • Training
  • Evaluation
  • Inference
  • Performance
-### Installation -Running sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the installation document and make an update. - -### Data preparation - -#### download UCF-101 dataset -Users can download the UCF-101 dataset with the provided script data/download.sh. - -#### decode video into frames -To avoid decoding videos during network training, we decode them into frames offline and save them in pickle format, which is easily readable in Python. - -Users can refer to the script data/video_decode.py for video decoding. - -#### split data into train and test -We follow split 1 of the UCF-101 dataset. After data splitting, users get 9537 videos for training and 3783 videos for validation. The reference script is data/split_data.py. - -#### save pickles for training -As stated above, we save all data in pickle format for training. All information of each video is saved into one pickle, including the video id, frame binaries and label. Please refer to the script data/generate_train_data.py. -After this operation, one gets two directories containing training and testing data in pickle format, and two files train.list and test.list, with each line separated by SPACE. - -### Training -After data preparation, users can start PaddlePaddle Fluid training by: -``` -python train.py \ - --batch_size=128 \ - --total_videos=9537 \ - --class_dim=101 \ - --num_epochs=60 \ - --image_shape=3,224,224 \ - --model_save_dir=output/ \ - --with_mem_opt=True \ - --lr_init=0.01 \ - --num_layers=50 \ - --seg_num=7 \ - --pretrained_model={path_to_pretrained_model} -``` - -parameter introduction: -
  • batch_size: the size of each mini-batch.
  • total_videos: total number of videos in the training set.
  • class_dim: the class number of the classification task.
  • num_epochs: the number of epochs.
  • image_shape: input size of the network.
  • model_save_dir: the directory to save trained model.
  • with_mem_opt: whether to use memory optimization or not.
  • lr_init: initialized learning rate.
  • num_layers: the number of layers for ResNet.
  • seg_num: the number of segments in TSN (see the sampling sketch below).
  • pretrained_model: model path for pretraining.
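As an illustration of how `seg_num` is typically used in TSN-style training — a hedged sketch assuming uniform segment sampling, not this repo's reader code:

```python
import random

def sample_tsn_frames(num_frames, seg_num=7):
    # Split the video's frame indices into seg_num equal segments and draw one
    # frame from each, so a sample spans the whole video
    # (assumes num_frames >= seg_num).
    seg_len = num_frames // seg_num
    return [i * seg_len + random.randrange(seg_len) for i in range(seg_num)]

print(sample_tsn_frames(num_frames=140, seg_num=7))
```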
    - -data reader introduction: -Data reader is defined in reader.py. Note that we use group operation for all frames in one video. - - -training: -The training log is like: -``` -[TRAIN] Pass: 0 trainbatch: 0 loss: 4.630959 acc1: 0.0 acc5: 0.0390625 time: 3.09 sec -[TRAIN] Pass: 0 trainbatch: 10 loss: 4.559069 acc1: 0.0546875 acc5: 0.1171875 time: 3.91 sec -[TRAIN] Pass: 0 trainbatch: 20 loss: 4.040092 acc1: 0.09375 acc5: 0.3515625 time: 3.88 sec -[TRAIN] Pass: 0 trainbatch: 30 loss: 3.478214 acc1: 0.3203125 acc5: 0.5546875 time: 3.32 sec -[TRAIN] Pass: 0 trainbatch: 40 loss: 3.005404 acc1: 0.3515625 acc5: 0.6796875 time: 3.33 sec -[TRAIN] Pass: 0 trainbatch: 50 loss: 2.585245 acc1: 0.4609375 acc5: 0.7265625 time: 3.13 sec -[TRAIN] Pass: 0 trainbatch: 60 loss: 2.151489 acc1: 0.4921875 acc5: 0.8203125 time: 3.35 sec -[TRAIN] Pass: 0 trainbatch: 70 loss: 1.981680 acc1: 0.578125 acc5: 0.8359375 time: 3.30 sec -``` - -### Evaluation -Evaluation is to evaluate the performance of a trained model. One can download pretrained models and set its path to path_to_pretrain_model. Then top1/top5 accuracy can be obtained by running the following command: -``` -python eval.py \ - --batch_size=128 \ - --class_dim=101 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --num_layers=50 \ - --seg_num=7 \ - --test_model={path_to_pretrained_model} -``` - -According to the congfiguration of evaluation, the output log is like: -``` -[TEST] Pass: 0 testbatch: 0 loss: 0.011551 acc1: 1.0 acc5: 1.0 time: 0.48 sec -[TEST] Pass: 0 testbatch: 10 loss: 0.710330 acc1: 0.75 acc5: 1.0 time: 0.49 sec -[TEST] Pass: 0 testbatch: 20 loss: 0.000547 acc1: 1.0 acc5: 1.0 time: 0.48 sec -[TEST] Pass: 0 testbatch: 30 loss: 0.036623 acc1: 1.0 acc5: 1.0 time: 0.48 sec -[TEST] Pass: 0 testbatch: 40 loss: 0.138705 acc1: 1.0 acc5: 1.0 time: 0.48 sec -[TEST] Pass: 0 testbatch: 50 loss: 0.056909 acc1: 1.0 acc5: 1.0 time: 0.49 sec -[TEST] Pass: 0 testbatch: 60 loss: 0.742937 acc1: 0.75 acc5: 1.0 time: 0.49 sec -[TEST] Pass: 0 testbatch: 70 loss: 1.720186 acc1: 0.5 acc5: 0.875 time: 0.48 sec -[TEST] Pass: 0 testbatch: 80 loss: 0.199669 acc1: 0.875 acc5: 1.0 time: 0.48 sec -[TEST] Pass: 0 testbatch: 90 loss: 0.195510 acc1: 1.0 acc5: 1.0 time: 0.48 sec -``` - -### Inference -Inference is used to get prediction score or video features based on trained models. -``` -python infer.py \ - --class_dim=101 \ - --image_shape=3,224,224 \ - --with_mem_opt=True \ - --num_layers=50 \ - --seg_num=7 \ - --test_model={path_to_pretrained_model} -``` - -The output contains predication results, including maximum score (before softmax) and corresponding predicted label. -``` -Test sample: PlayingGuitar_g01_c03, score: [21.418629], class [62] -Test sample: SalsaSpin_g05_c06, score: [13.238657], class [76] -Test sample: TrampolineJumping_g04_c01, score: [21.722862], class [93] -Test sample: JavelinThrow_g01_c04, score: [16.27892], class [44] -Test sample: PlayingTabla_g01_c01, score: [15.366951], class [65] -Test sample: ParallelBars_g04_c07, score: [18.42596], class [56] -Test sample: PlayingCello_g05_c05, score: [18.795723], class [58] -Test sample: LongJump_g03_c04, score: [7.100088], class [50] -Test sample: SkyDiving_g06_c03, score: [15.144707], class [82] -Test sample: UnevenBars_g07_c04, score: [22.114838], class [95] -``` - -### Performance -Configuration | Top-1 acc -------------- | ---------------: -seg=7, size=224 | 0.859 -seg=10, size=224 | 0.863 +Please visit the project at [PaddleCV/video_classification](../../../PaddleCV/video_classification). 
diff --git a/fluid/PaddleCV/yolov3/README.md b/fluid/PaddleCV/yolov3/README.md index 8b37aded4c21d7e9f50f1d79965bce4bb567b15c..d05d89ce182a23b2f74e2633f7ada32fc6390477 100644 --- a/fluid/PaddleCV/yolov3/README.md +++ b/fluid/PaddleCV/yolov3/README.md @@ -1,152 +1,6 @@ -# YOLO V3 Object Detection ---- -## Table of Contents +Hi! - -- [Installation](#installation) -- [Introduction](#introduction) -- [Data preparation](#data-preparation) -- [Training](#training) -- [Evaluation](#evaluation) -- [Inference and Visualization](#inference-and-visualization) -- [Appendix](#appendix) - -## Installation - -Running sample code in this directory requires PaddlePaddle Fluid v1.4 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle) and make an update. - -## Introduction - -[YOLOv3](https://arxiv.org/abs/1804.02767) is a one-stage end-to-end detector. The detection principle of YOLOv3 is as follows: -
-[Figure: YOLOv3 detection principle]
    - -YOLOv3 divides the input image into S\*S grids and predicts B bounding boxes in each grid cell. The prediction for each box includes the location (x, y, w, h), a confidence score and the probabilities of C classes, so the YOLOv3 output layer has S\*S\*B\*(5 + C) channels (see the quick check after the structure figure below). The YOLOv3 loss consists of three parts: location loss, confidence loss and classification loss. -The backbone network of YOLOv3 is DarkNet53; the structure of YOLOv3 is as follows: -
-[Figure: YOLOv3 structure]
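A quick check of the output-layer arithmetic above (illustrative only, not part of the original README):

```python
S, B, C = 13, 3, 80          # coarsest scale, boxes per cell, COCO classes
print(B * (5 + C))           # 255 filters in the last conv layer, as stated below
print(S * S * B * (5 + C))   # total values predicted at the 13x13 scale
```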
    - -YOLOv3 networks are composed of a base feature extraction network, multi-scale feature fusion layers, and output layers. - -1. Feature extraction network: YOLOv3 uses [DarkNet53](https://arxiv.org/abs/1612.08242) for feature extraction. DarkNet53 uses a fully convolutional structure, replacing pooling layers with convolutions of stride 2, and adds residual blocks to avoid vanishing gradients when the network gets too deep. - -2. Feature fusion layer. To solve the problem that previous YOLO versions are not sensitive to small objects, YOLOv3 uses feature maps at three different scales for detection, namely 13\*13, 26\*26 and 52\*52, for detecting large, medium and small objects respectively. The feature fusion layer takes the three-scale feature maps produced by DarkNet as input and, borrowing the idea of FPN (feature pyramid networks), fuses the feature maps of each scale through a series of convolutional layers and upsampling. - -3. Output layer: The output layer also uses a fully convolutional structure. The number of convolution kernels in the last convolutional layer is 255: 3\*(80+4+1)=255, where 3 indicates that a grid cell contains 3 bounding boxes, 4 represents the four coordinates of a box, 1 represents the confidence score, and 80 represents the probabilities of the 80 categories in the COCO dataset. - -## Data preparation - -Train the model on the [MS-COCO dataset](http://cocodataset.org/#download); download the dataset as below: - - cd dataset/coco - ./download.sh - - -## Training - -After data preparation, one can start the training step by: - - python train.py \ - --model_save_dir=output/ \ - --pretrain=${path_to_pretrain_model} \ - --data_dir=${path_to_data} - -- Set ```export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7``` to specify 8 GPUs for training. -- For more help on arguments: - - python train.py --help - -**download the pre-trained model:** This sample provides a DarkNet53 pre-trained model, converted from the author's DarkNet53 weights pre-trained on ImageNet. One can download the pre-trained model as: - - sh ./weights/download.sh - -Set `pretrain` to load the pre-trained model. In addition, this parameter is used to load the trained model when finetuning as well. -Please make sure that the pre-trained model is downloaded and loaded correctly, otherwise the loss may be NAN during training. - -**Install the [cocoapi](https://github.com/cocodataset/cocoapi):** - -To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi: - - git clone https://github.com/cocodataset/cocoapi.git - cd PythonAPI - # if cython is not installed - pip install Cython - # Install into global site-packages - make install - # Alternatively, if you do not have permissions or prefer - # not to install the COCO API into global site-packages - python2 setup.py install --user - -**data reader introduction:** - -* The data reader is defined in `reader.py`. - -**model configuration:** - -* The model uses 9 anchors generated based on the COCO dataset, which are 10x13, 16x30, 33x23, 30x61, 62x45, 59x119, 116x90, 156x198, 373x326. - -* NMS threshold=0.45, NMS valid=0.005, nms_topk=400, nms_posk=100 (a short NMS sketch appears at the end of this README). - -**training strategy:** - -* Use the momentum optimizer with momentum=0.9. -* In the first 4000 iterations, the learning rate increases linearly from 0.0 to 0.001. Then the lr is decayed at iterations 400000 and 450000 with multipliers 0.1 and 0.01. The maximum iteration is 500000. - -The training results are shown below: -
-[Figure: Train Loss]
    - -## Evaluation - -Evaluation measures the performance of a trained model. This sample provides `eval.py`, which uses a COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval). - -`eval.py` is the main executor for evaluation; one can start the evaluation step by: - - python eval.py \ - --dataset=coco2017 \ - --weights=${path_to_weights} \ - -- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for evaluation. - -Evaluation results are shown below: - -| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) | -| :------: | :------: | :------: | :------: | -| 608x608| 37.7 | 59.8 | 40.8 | -| 416x416 | 36.5 | 58.2 | 39.1 | -| 320x320 | 34.1 | 55.4 | 36.3 | - -## Inference and Visualization - -Inference is used to get prediction scores or image features based on trained models. `infer.py` is the main executor for inference; one can start the inference step by: - - python infer.py \ - --dataset=coco2017 \ - --weights=${path_to_weights} \ - --image_path=data/COCO17/val2017/ \ - --image_name=000000000139.jpg \ - --draw_threshold=0.5 - -Inference speed: - - -| input size | 608x608 | 416x416 | 320x320 | -|:-------------:| :-----: | :-----: | :-----: | -| infer speed | 50 ms/frame | 29 ms/frame |24 ms/frame | - - -Visualization of the infer results is shown below: -
-[Figure: YOLOv3 Visualization Examples]
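Referring back to the NMS settings in the model configuration above (nms_thresh=0.45), here is a hedged sketch of the greedy NMS step — illustrative only, not the repo's implementation, and both function names are hypothetical:

```python
def iou(a, b):
    # a, b: boxes as [x1, y1, x2, y2]
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.45):
    # Greedily keep the highest-scoring box and drop boxes that overlap it
    # more than the threshold.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

print(nms([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], [0.9, 0.8, 0.7]))
# [0, 2]
```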
    +This directory has been deprecated. +Please visit the project at [PaddleCV/yolov3](../../../PaddleCV/yolov3). diff --git a/fluid/PaddleCV/yolov3/README_cn.md b/fluid/PaddleCV/yolov3/README_cn.md index 518bc36fafd2bce03b6eb416e2efb05bb8b67475..89080d674df265d37a3601b579622adf1829c747 100644 --- a/fluid/PaddleCV/yolov3/README_cn.md +++ b/fluid/PaddleCV/yolov3/README_cn.md @@ -1,155 +1,2 @@ -# YOLO V3 目标检测 - ---- -## 内容 - -- [安装](#安装) -- [简介](#简介) -- [数据准备](#数据准备) -- [模型训练](#模型训练) -- [模型评估](#模型评估) -- [模型推断及可视化](#模型推断及可视化) -- [附录](#附录) - -## 安装 - -在当前目录下运行样例代码需要PadddlePaddle Fluid的v.1.4或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.4/beginners_guide/install/install_doc.html#paddlepaddle)中的说明来更新PaddlePaddle。 - -## 简介 - -[YOLOv3](https://arxiv.org/abs/1804.02767) 是一阶段End2End的目标检测器。其目标检测原理如下图所示: -
-[图: YOLOv3检测原理]
    - -YOLOv3将输入图像分成S\*S个格子,每个格子预测B个bounding box,每个bounding box预测内容包括: Location(x, y, w, h)、Confidence Score和C个类别的概率,因此YOLOv3输出层的channel数为S\*S\*B\*(5 + C)。YOLOv3的loss函数也有三部分组成:Location误差,Confidence误差和分类误差。 - -YOLOv3的网络结构如下图所示: -
-[图: YOLOv3网络结构]
    - -YOLOv3 的网络结构由基础特征提取网络、multi-scale特征融合层和输出层组成。 - -1. 特征提取网络。YOLOv3使用 [DarkNet53](https://arxiv.org/abs/1612.08242)作为特征提取网络:DarkNet53 基本采用了全卷积网络,用步长为2的卷积操作替代了池化层,同时添加了 Residual 单元,避免在网络层数过深时发生梯度弥散。 - -2. 特征融合层。为了解决之前YOLO版本对小目标不敏感的问题,YOLOv3采用了3个不同尺度的特征图来进行目标检测,分别为13\*13,26\*26,52\*52,用来检测大、中、小三种目标。特征融合层选取 DarkNet 产出的三种尺度特征图作为输入,借鉴了FPN(feature pyramid networks)的思想,通过一系列的卷积层和上采样对各尺度的特征图进行融合。 - -3. 输出层。同样使用了全卷积结构,其中最后一个卷积层的卷积核个数是255:3\*(80+4+1)=255,3表示一个grid cell包含3个bounding box,4表示框的4个坐标信息,1表示Confidence Score,80表示COCO数据集中80个类别的概率。 - - -## 数据准备 - -在[MS-COCO数据集](http://cocodataset.org/#download)上进行训练,通过如下方式下载数据集。 - - cd dataset/coco - ./download.sh - - -## 模型训练 - -数据准备完毕后,可以通过如下的方式启动训练: - - python train.py \ - --model_save_dir=output/ \ - --pretrain=${path_to_pretrain_model} - --data_dir=${path_to_data} - -- 通过设置export CUDA\_VISIBLE\_DEVICES=0,1,2,3,4,5,6,7指定8卡GPU训练。 -- 可选参数见: - - python train.py --help - -**下载预训练模型:** 本示例提供darknet53预训练模型,该模型转换自作者提供的darknet53在ImageNet上预训练的权重,采用如下命令下载预训练模型: - - sh ./weights/download_pretrained_weight.sh - -通过初始化`pretrain` 加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。 -请在训练前确认预训练模型下载与加载正确,否则训练过程中损失可能会出现NAN。 - -**安装[cocoapi](https://github.com/cocodataset/cocoapi):** - -训练前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi): - - git clone https://github.com/cocodataset/cocoapi.git - cd PythonAPI - # if cython is not installed - pip install Cython - # Install into global site-packages - make install - # Alternatively, if you do not have permissions or prefer - # not to install the COCO API into global site-packages - python2 setup.py install --user - -**数据读取器说明:** - -* 数据读取器定义在reader.py中。 - -**模型设置:** - -* 模型使用了基于COCO数据集生成的9个先验框:10x13,16x30,33x23,30x61,62x45,59x119,116x90,156x198,373x326 -* 检测过程中,nms_topk=400, nms_posk=100,nms_thresh=0.45 - -**训练策略:** - -* 采用momentum优化算法训练YOLOv3,momentum=0.9。 -* 学习率采用warmup算法,前4000轮学习率从0.0线性增加至0.001。在400000,450000轮时使用0.1,0.01乘子进行学习率衰减,最大训练500000轮。 - -下图为模型训练结果: -
-[图: Train Loss]
    - -## 模型评估 - -模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估](http://cocodataset.org/#detections-eval) - -`eval.py`是评估模块的主要执行程序,调用示例如下: - - python eval.py \ - --dataset=coco2017 \ - --weights=${path_to_weights} \ - -- 通过设置export CUDA\_VISIBLE\_DEVICES=0指定单卡GPU评估。 - -模型评估结果: - -| input size | mAP(IoU=0.50:0.95) | mAP(IoU=0.50) | mAP(IoU=0.75) | -| :------: | :------: | :------: | :------: | -| 608x608| 37.7 | 59.8 | 40.8 | -| 416x416 | 36.5 | 58.2 | 39.1 | -| 320x320 | 34.1 | 55.4 | 36.3 | - - - -## 模型推断及可视化 - -模型推断可以获取图像中的物体及其对应的类别,`infer.py`是主要执行程序,调用示例如下: - - python infer.py \ - --dataset=coco2017 \ - --weights=${path_to_weights} \ - --image_path=data/COCO17/val2017/ \ - --image_name=000000000139.jpg \ - --draw_threshold=0.5 - -模型预测速度: - - -| input size | 608x608 | 416x416 | 320x320 | -|:-------------:| :-----: | :-----: | :-----: | -| infer speed | 50 ms/frame | 29 ms/frame |24 ms/frame | - -下图为模型可视化预测结果: -
-[图: YOLOv3 预测可视化]
    - +您好,该项目已被迁移,请移步到 [PaddleCV/yolov3](../../../PaddleCV/yolov3) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/LARK b/fluid/PaddleNLP/LARK deleted file mode 160000 index 8dbdf4892a9c22a39a20537fd8584b760f41d963..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/LARK +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 8dbdf4892a9c22a39a20537fd8584b760f41d963 diff --git a/fluid/PaddleNLP/SimNet b/fluid/PaddleNLP/SimNet deleted file mode 160000 index 57b93859aa070ae6d96f10a470b1bdf2cfaea052..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/SimNet +++ /dev/null @@ -1 +0,0 @@ -Subproject commit 57b93859aa070ae6d96f10a470b1bdf2cfaea052 diff --git a/fluid/PaddleNLP/chinese_ner/README.md b/fluid/PaddleNLP/chinese_ner/README.md index a458c83b5f1ad9c007d35ddfb7a6578fb14bbf2a..d4497b248da56eb8936147c2d1d7c8444f98cb5c 100644 --- a/fluid/PaddleNLP/chinese_ner/README.md +++ b/fluid/PaddleNLP/chinese_ner/README.md @@ -1,62 +1,3 @@ -# 使用ParallelExecutor的中文命名实体识别示例 -以下是本例的简要目录结构及说明: -```text -. -├── data # 存储运行本例所依赖的数据,从外部获取 -├── reader.py # 数据读取接口, 从外部获取 -├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -``` - -## 数据 -在data目录下,有两个文件夹,train_files中保存的是训练数据,test_files中保存的是测试数据,作为示例,在目录下我们各放置了两个文件,实际训练时,根据自己的实际需要将数据放置在对应目录,并根据数据格式,修改reader.py中的数据读取函数。 - -## 训练 - -通过运行 - -``` -python train.py --help -``` - -来获取命令行参数的帮助,设置正确的数据路径等参数后,运行`train.py`开始训练。 - -训练记录形如 -```txt -pass_id:0, time_cost:4.92960214615s -[Train] precision:0.000862136531076, recall:0.0059880239521, f1:0.00150726226363 -[Test] precision:0.000796178343949, recall:0.00335758254057, f1:0.00128713933283 -pass_id:1, time_cost:0.715255975723s -[Train] precision:0.00474094141551, recall:0.00762112139358, f1:0.00584551148225 -[Test] precision:0.0228873239437, recall:0.00727476217124, f1:0.0110403397028 -pass_id:2, time_cost:0.740842103958s -[Train] precision:0.0120967741935, recall:0.00163309744148, f1:0.00287769784173 -[Test] precision:0, recall:0.0, f1:0 -``` - -## 预测 -类似于训练过程,预测时指定需要测试模型的路径、测试数据、预测标记文件的路径,运行`infer.py`开始预测。 - -预测结果如下 -```txt -152804 O O -130048 O O -38862 10-B O -784 O O -1540 O O -4145 O O -2255 O O -0 O O -1279 O O -7793 O O -373 O O -1621 O O -815 O O -2 O O -247 24-B O -401 24-I O -``` -输出分为三列,以"\t"分割,第一列是输入的词语的序号,第二列是标准结果,第三列为标记结果。多条输入序列之间以空行分隔。 +您好,该项目已被迁移,请移步到 [PaddleNLP/chinese_ner](../../../PaddleNLP/chinese_ner/) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/deep_attention_matching_net/README.md b/fluid/PaddleNLP/deep_attention_matching_net/README.md index 37085fe46ee6774b3e553a35d840eb11395da8a0..5208812ce239a901d6acef714cfaed56ccfff628 100644 --- a/fluid/PaddleNLP/deep_attention_matching_net/README.md +++ b/fluid/PaddleNLP/deep_attention_matching_net/README.md @@ -1,87 +1,6 @@ -# __Deep Attention Matching Network__ -This is the source code of Deep Attention Matching network (DAM), that is proposed for multi-turn response selection in the retrieval-based chatbot. +Hi! -DAM is a neural matching network that entirely based on attention mechanism. The motivation of DAM is to capture those semantic dependencies, among dialogue elements at different level of granularities, in multi-turn conversation as matching evidences, in order to better match response candidate with its multi-turn context. DAM appears on ACL-2018, please find our paper at [http://aclweb.org/anthology/P18-1103](http://aclweb.org/anthology/P18-1103). +This directory has been deprecated. 
- -## __Network__ - -DAM is inspired by the Transformer in machine translation (Vaswani et al., 2017); we extend the Transformer's key attention mechanism from two perspectives and introduce the two resulting kinds of attention in one uniform neural network. - -- **self-attention** gradually captures semantic representations at different granularities by stacking attention over word-level embeddings. These multi-grained semantic representations facilitate exploring segmental dependencies between context and response. - -- **cross-attention**, applied across context and response, captures the relevance in dependency between segment pairs, providing complementary information to textual relevance for matching the response with its multi-turn context. -

    -
    -Overview of Deep Attention Matching Network -

- -## __Results__ - -We test DAM on two large-scale multi-turn response selection tasks, i.e., the Ubuntu Corpus v1 and the Douban Conversation Corpus; the experimental results are below: -

    -
    -

- -## __Usage__ - -Take the experiment on the Ubuntu Corpus v1 as an example. - -1) Go to the `ubuntu` directory - -``` -cd ubuntu -``` -2) Download the well-preprocessed data for training - -``` -sh download_data.sh -``` -3) Execute the model training and evaluation by - -``` -sh train.sh -``` -For a more detailed explanation of the arguments, please run - -``` -python ../train_and_evaluate.py --help -``` - -By default, the training is executed on a single GPU, which can be switched to multi-GPU mode simply by resetting the visible devices in `train.sh`, e.g., - -``` -export CUDA_VISIBLE_DEVICES=0,1,2,3 -``` - -4) Run the test by - -``` -sh test.sh -``` -and run the test for different saved models by passing different values of the argument `--model_path`. - -Similarly, one can carry out the experiment on the Douban Conversation Corpus by going to the directory `douban` and following the same procedure. - -## __Dependencies__ - -- Python >= 2.7.3 -- PaddlePaddle latest develop branch - -## __Citation__ - -The following article describes DAM in detail. We recommend citing it by default. - -``` -@inproceedings{ , - title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network}, - author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu}, - booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, - volume={1}, - pages={ -- }, - year={2018} -} -``` +Please visit the project at [PaddleNLP/deep_attention_matching_net](../../../PaddleNLP/deep_attention_matching_net). diff --git a/fluid/PaddleNLP/language_model/gru/README.md b/fluid/PaddleNLP/language_model/gru/README.md index 91ce2d7f58085b56da2ac2dec03af2a05985ab8f..f508e7061c12d6a7f053fa7c533f8350144d67cc 100644 --- a/fluid/PaddleNLP/language_model/gru/README.md +++ b/fluid/PaddleNLP/language_model/gru/README.md @@ -1,148 +1,2 @@ -# 语言模型 -以下是本例的简要目录结构及说明: - -```text -.
-├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -└── utils.py # 通用函数 -``` - - -## 简介 - -循环神经网络语言模型的介绍可以参阅论文[Recurrent Neural Network Regularization](https://arxiv.org/abs/1409.2329),在本例中,我们实现了GRU-RNN语言模型。 - -## 训练 - -运行命令 `python train.py` 开始训练模型。 -```python -python train.py -``` - -当前支持的参数可参见[train.py](./train.py) `train_net` 函数 -```python -vocab, train_reader, test_reader = utils.prepare_data( - batch_size=20, # batch size - buffer_size=1000, # buffer size, default value is OK - word_freq_threshold=0) # vocabulary related parameter, and words with frequency below this value will be filtered - -train(train_reader=train_reader, - vocab=vocab, - network=network, - hid_size=200, # embedding and hidden size - base_lr=1.0, # base learning rate - batch_size=20, # batch size, the same as that in prepare_data - pass_num=12, # the number of passes for training - use_cuda=True, # whether to use GPU card - parallel=False, # whether to be parallel - model_dir="model", # directory to save model - init_low_bound=-0.1, # uniform parameter initialization lower bound - init_high_bound=0.1) # uniform parameter initialization upper bound -``` - -## 自定义网络结构 - -可在[train.py](./train.py) `network` 函数中调整网络结构,当前的网络结构如下: -```python -emb = fluid.layers.embedding(input=src, size=[vocab_size, hid_size], - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), - learning_rate=emb_lr_x), - is_sparse=True) - -fc0 = fluid.layers.fc(input=emb, size=hid_size * 3, - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), - learning_rate=gru_lr_x)) -gru_h0 = fluid.layers.dynamic_gru(input=fc0, size=hid_size, - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), - learning_rate=gru_lr_x)) - -fc = fluid.layers.fc(input=gru_h0, size=vocab_size, act='softmax', - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform(low=init_low_bound, high=init_high_bound), - learning_rate=fc_lr_x)) - -cost = fluid.layers.cross_entropy(input=fc, label=dst) -``` - -## 训练结果示例 - -我们在Tesla K40m单GPU卡上训练的日志如下所示 -```text -epoch_1 start -step:100 ppl:771.053 -step:200 ppl:449.597 -step:300 ppl:642.654 -step:400 ppl:458.128 -step:500 ppl:510.912 -step:600 ppl:451.545 -step:700 ppl:364.404 -step:800 ppl:324.272 -step:900 ppl:360.797 -step:1000 ppl:275.761 -step:1100 ppl:294.599 -step:1200 ppl:335.877 -step:1300 ppl:185.262 -step:1400 ppl:241.744 -step:1500 ppl:211.507 -step:1600 ppl:233.431 -step:1700 ppl:298.767 -step:1800 ppl:203.403 -step:1900 ppl:158.828 -step:2000 ppl:171.148 -step:2100 ppl:280.884 -epoch:1 num_steps:2104 time_cost(s):47.478780 -model saved in model/epoch_1 -epoch_2 start -step:100 ppl:238.099 -step:200 ppl:136.527 -step:300 ppl:204.184 -step:400 ppl:252.886 -step:500 ppl:177.377 -step:600 ppl:197.688 -step:700 ppl:131.650 -step:800 ppl:223.906 -step:900 ppl:144.785 -step:1000 ppl:176.286 -step:1100 ppl:148.158 -step:1200 ppl:203.581 -step:1300 ppl:168.208 -step:1400 ppl:159.412 -step:1500 ppl:114.032 -step:1600 ppl:157.985 -step:1700 ppl:147.743 -step:1800 ppl:88.676 -step:1900 ppl:141.962 -step:2000 ppl:106.087 -step:2100 ppl:122.709 -epoch:2 num_steps:2104 time_cost(s):47.583789 -model saved in model/epoch_2 -... 
-``` - -## 预测 -运行命令 `python infer.py model_dir start_epoch last_epoch(inclusive)` 开始预测,其中,start_epoch指定开始预测的轮次,last_epoch指定结束的轮次,例如 -```python -python infer.py model 1 12 # prediction from epoch 1 to epoch 12 -``` - -## 预测结果示例 -```text -model:model/epoch_1 ppl:254.540 time_cost(s):3.29 -model:model/epoch_2 ppl:177.671 time_cost(s):3.27 -model:model/epoch_3 ppl:156.251 time_cost(s):3.27 -model:model/epoch_4 ppl:139.036 time_cost(s):3.27 -model:model/epoch_5 ppl:132.661 time_cost(s):3.27 -model:model/epoch_6 ppl:130.092 time_cost(s):3.28 -model:model/epoch_7 ppl:128.751 time_cost(s):3.27 -model:model/epoch_8 ppl:125.411 time_cost(s):3.27 -model:model/epoch_9 ppl:124.604 time_cost(s):3.28 -model:model/epoch_10 ppl:124.754 time_cost(s):3.29 -model:model/epoch_11 ppl:125.421 time_cost(s):3.27 -model:model/epoch_12 ppl:125.676 time_cost(s):3.27 -``` +您好,该项目已被迁移,请移步到 [PaddleNLP/language_model/gru](../../../../PaddleNLP/language_model/gru) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/language_model/lstm/README.md b/fluid/PaddleNLP/language_model/lstm/README.md index f6d1250ff66a066c8634eca9c3f74312f00a7749..9aa66dd7eb87b16ab503342014c4dabdfbe4906d 100644 --- a/fluid/PaddleNLP/language_model/lstm/README.md +++ b/fluid/PaddleNLP/language_model/lstm/README.md @@ -1,76 +1,2 @@ -# lstm lm -以下是本例的简要目录结构及说明: - -```text -. -├── README.md # 文档 -├── train.py # 训练脚本 -├── reader.py # 数据读取 -└── lm_model.py # 模型定义文件 -``` - - -## 简介 - -循环神经网络语言模型的介绍可以参阅论文[Recurrent Neural Network Regularization](https://arxiv.org/abs/1409.2329),本文主要是说明基于lstm的语言的模型的实现,数据是采用ptb dataset,下载地址为 -http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz - -## 数据下载 -用户可以自行下载数据,并解压, 也可以利用目录中的脚本 - -cd data; sh download_data.sh - -## 训练 - -运行命令 -`CUDA_VISIBLE_DEVICES=0 python train.py --data_path data/simple-examples/data/ --model_type small --use_gpu True` - 开始训练模型。 - -model_type 为模型配置的大小,目前支持 small,medium, large 三种配置形式 - -实现采用双层的lstm,具体的参数和网络配置 可以参考 train.py, lm_model.py 文件中的设置 - - -## 训练结果示例 - -p40中训练日志如下(small config), test 测试集仅在最后一个epoch完成后进行测试 -```text -epoch id 0 -ppl 232 865.86505 1.0 -ppl 464 632.76526 1.0 -ppl 696 510.47153 1.0 -ppl 928 437.60617 1.0 -ppl 1160 393.38422 1.0 -ppl 1392 353.05365 1.0 -ppl 1624 325.73267 1.0 -ppl 1856 305.488 1.0 -ppl 2088 286.3128 1.0 -ppl 2320 270.91504 1.0 -train ppl 270.86246 -valid ppl 181.867964379 -... -ppl 2320 40.975872 0.001953125 -train ppl 40.974102 -valid ppl 117.85741214 -test ppl 113.939103843 -``` -## 与tf结果对比 - -tf采用的版本是1.6 -```text -small config - train valid test -fluid 1.0 40.962 118.111 112.617 -tf 1.6 40.492 118.329 113.788 - -medium config - train valid test -fluid 1.0 45.620 87.398 83.682 -tf 1.6 45.594 87.363 84.015 - -large config - train valid test -fluid 1.0 37.221 82.358 78.137 -tf 1.6 38.342 82.311 78.121 -``` +您好,该项目已被迁移,请移步到 [PaddleNLP/language_model/lstm](../../../../PaddleNLP/language_model/lstm) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/machine_reading_comprehension/README.md b/fluid/PaddleNLP/machine_reading_comprehension/README.md index 884c15058e9b5601c7754e27d1b106fd41e2ac27..eb527349298e0b804ccc11d179cc9b78f1715b49 100644 --- a/fluid/PaddleNLP/machine_reading_comprehension/README.md +++ b/fluid/PaddleNLP/machine_reading_comprehension/README.md @@ -1,69 +1,6 @@ -# Abstract -Dureader is an end-to-end neural network model for machine reading comprehension style question answering, which aims to answer questions from given passages. We first match the question and passages with a bidireactional attention flow network to obtrain the question-aware passages represenation. 
Then we employ a pointer network to locate the positions of answers from the passages. Our experimental evaluations show that the DuReader model achieves state-of-the-art results on the DuReader dataset. -# Dataset -DuReader Dataset is a new large-scale real-world and human-sourced MRC dataset in Chinese. DuReader focuses on real-world open-domain question answering. The advantages of DuReader over existing datasets are summarized as follows: - - Real question - - Real article - - Real answer - - Real application scenario - - Rich annotation -# Network -DuReader model is inspired by 3 classic reading comprehension models ([BiDAF](https://arxiv.org/abs/1611.01603), [Match-LSTM](https://arxiv.org/abs/1608.07905), [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)). +Hi! -DuReader model is a hierarchical multi-stage process and consists of five layers: +This directory has been deprecated. - -- **Word Embedding Layer** maps each word to a vector using a pre-trained word embedding model. -- **Encoding Layer** extracts context information for each position in the question and passages with a bi-directional LSTM network. -- **Attention Flow Layer** couples the query and context vectors and produces a set of query-aware feature vectors for each word in the context. Please refer to [BiDAF](https://arxiv.org/abs/1611.01603) for more details. -- **Fusion Layer** employs a layer of bi-directional LSTM to capture the interaction among context words independent of the query. -- **Decode Layer** employs an answer pointer network with attention pooling of the question to locate the positions of answers in the passages. Please refer to [Match-LSTM](https://arxiv.org/abs/1608.07905) and [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf) for more details. - -## How to Run -### Download the Dataset -To download the DuReader dataset: -``` -cd data && bash download.sh -``` -For more details about the DuReader dataset, please refer to the [DuReader Dataset Homepage](https://ai.baidu.com//broad/subordinate?dataset=dureader). - -### Download Thirdparty Dependencies -We use Bleu and Rouge as evaluation metrics; the calculation of these metrics relies on the scoring scripts under [coco-caption](https://github.com/tylin/coco-caption). To download them, run: - -``` -cd utils && bash download_thirdparty.sh -``` -### Environment Requirements -For now we've only tested on PaddlePaddle v1.0; to install PaddlePaddle and for more details about it, see the [PaddlePaddle Homepage](http://paddlepaddle.org). - -### Preparation -Before training the model, we have to make sure that the data is ready. For preparation, we will check the data files, make directories and extract a vocabulary for later use. You can run the following command to do this with a specified task name: - -``` -sh run.sh --prepare -``` -You can specify the files for train/dev/test by setting `trainset`/`devset`/`testset`. -### Training -When training the model, you can set hyper-parameters such as the learning rate by using `--learning_rate NUM`. For example, to train the model for 10 passes, you can run: - -``` -sh run.sh --train --pass_num 10 -``` - -The training process includes an evaluation on the dev set after each training epoch. By default, the model with the best Bleu-4 score on the dev set will be saved.
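Going back to the Attention Flow Layer described in the Network section, here is a small illustrative sketch (not the repo code) of context-to-query attention, with a plain dot-product similarity standing in for BiDAF's trilinear similarity and made-up shapes:

```python
# Context-to-query attention, simplified; illustration only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

H = np.random.rand(30, 8)  # passage encodings [T, d], hypothetical sizes
U = np.random.rand(10, 8)  # question encodings [J, d]

S = H @ U.T                   # similarity matrix [T, J] (simplified form)
c2q = softmax(S, axis=1) @ U  # query-aware passage representation [T, d]
G = np.concatenate([H, c2q, H * c2q], axis=1)  # fused features, simplified
```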
- -### Evaluation -To conduct a single evaluation on the dev set with the the model already trained, you can run the following command: - -``` -sh run.sh --evaluate --load_dir models/1 -``` - -### Prediction -You can also predict answers for the samples in some files using the following command: - -``` -sh run.sh --predict --load_dir models/1 --testset ../data/preprocessed/testset/search.dev.json -``` - -By default, the results are saved at `../data/results/` folder. You can change this by specifying `--result_dir DIR_PATH`. +Please visit the project at [PaddleNLP/machine_reading_comprehension](../../../PaddleNLP/machine_reading_comprehension). diff --git a/fluid/PaddleNLP/neural_machine_translation/README.md b/fluid/PaddleNLP/neural_machine_translation/README.md index a0271ad42e62490282ccc154f6a3c50029b6d13d..08446669df7617a2364c455d5093d086f1fd1a6e 100644 --- a/fluid/PaddleNLP/neural_machine_translation/README.md +++ b/fluid/PaddleNLP/neural_machine_translation/README.md @@ -1,9 +1,6 @@ -The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). ---- +Hi! -This is a collection of example models for neural machine translation and neural sequence modeling. +This directory has been deprecated. -### TODO - -This project is still under active development. +Please visit the project at [PaddleNLP/neural_machine_translation](../../../PaddleNLP/neural_machine_translation). diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md b/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md index 86d4a021baf11e04a9fd07c05dbf50425451efab..005fb7e2e56c19583bfbeb7997c25fbef5f77578 100644 --- a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md +++ b/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md @@ -1,134 +1,2 @@ -运行本目录下的范例模型需要安装PaddlePaddle Fluid 1.0版。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html)中的说明更新 PaddlePaddle 安装版本。 -# 机器翻译:RNN Search - -以下是本范例模型的简要目录结构及说明: - -```text -. -├── README.md # 文档,本文件 -├── args.py # 训练、预测以及模型参数 -├── train.py # 训练主程序 -├── infer.py # 预测主程序 -├── attention_model.py # 带注意力机制的翻译模型配置 -└── no_attention_model.py # 无注意力机制的翻译模型配置 -``` - -## 简介 -机器翻译(machine translation, MT)是用计算机来实现不同语言之间翻译的技术。被翻译的语言通常称为源语言(source language),翻译成的结果语言称为目标语言(target language)。机器翻译即实现从源语言到目标语言转换的过程,是自然语言处理的重要研究领域之一。 - -近年来,深度学习技术的发展不断为机器翻译任务带来新的突破。直接用神经网络将源语言映射到目标语言,即端到端的神经网络机器翻译(End-to-End Neural Machine Translation, End-to-End NMT)模型逐渐成为主流,此类模型一般简称为NMT模型。 - -本目录包含一个经典的机器翻译模型[RNN Search](https://arxiv.org/pdf/1409.0473.pdf)的Paddle Fluid实现。事实上,RNN search是一个较为传统的NMT模型,在现阶段,其表现已被很多新模型(如[Transformer](https://arxiv.org/abs/1706.03762))超越。但除机器翻译外,该模型是许多序列到序列(sequence to sequence, 以下简称Seq2Seq)类模型的基础,很多解决其他NLP问题的模型均以此模型为基础;因此其在NLP领域具有重要意义,并被广泛用作Baseline. 
- -本目录下此范例模型的实现,旨在展示如何用Paddle Fluid实现一个带有注意力机制(Attention)的RNN模型来解决Seq2Seq类问题,以及如何使用带有Beam Search算法的解码器。如果您仅仅只是需要在机器翻译方面有着较好翻译效果的模型,则建议您参考[Transformer的Paddle Fluid实现](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer)。 - -## 模型概览 -RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框架结构来解决Seq2Seq类问题。这种方法先用编码器将源序列编码成vector,再用解码器将该vector解码为目标序列。这其实模拟了人类在进行翻译类任务时的行为:先解析源语言,理解其含义,再根据该含义来写出目标语言的语句。编码器和解码器往往都使用RNN来实现。关于此方法的具体原理和数学表达式,可以参考[深度学习101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html). - -本模型中,在编码器方面,我们的实现使用了双向循环神经网络(Bi-directional Recurrent Neural Network);在解码器方面,我们使用了带注意力(Attention)机制的RNN解码器,并同时提供了一个不带注意力机制的解码器实现作为对比;而在预测方面我们使用柱搜索(beam search)算法来生成翻译的目标语句。以下将分别介绍用到的这些方法。 - -### 双向循环神经网络 -这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的一种双向循环网络结构。该结构的目的是输入一个序列,得到其在每个时刻的特征表示,即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。 -具体来说,该双向循环神经网络分别在时间维以顺序和逆序——即前向(forward)和后向(backward)——依次处理输入序列,并将每个时间步RNN的输出拼接成为最终的输出层。这样每个时间步的输出节点,都包含了输入序列中当前时刻完整的过去和未来的上下文信息。下图展示的是一个按时间步展开的双向循环神经网络。该网络包含一个前向和一个后向RNN,其中有六个权重矩阵:输入到前向隐层和后向隐层的权重矩阵($W_1, W_3$),隐层到隐层自己的权重矩阵($W_2,W_5$),前向隐层和后向隐层到输出层的权重矩阵($W_4, W_6$)。注意,该网络的前向隐层和后向隐层之间没有连接。 - -

    -
    -图1. 按时间步展开的双向循环神经网络 -

    - -

    -
    -图2. 使用双向LSTM的编码器 -

    - -### 注意力机制 -如果编码阶段的输出是一个固定维度的向量,会带来以下两个问题:1)不论源语言序列的长度是5个词还是50个词,如果都用固定维度的向量去编码其中的语义和句法结构信息,对模型来说是一个非常高的要求,特别是对长句子序列而言;2)直觉上,当人类翻译一句话时,会对与当前译文更相关的源语言片段上给予更多关注,且关注点会随着翻译的进行而改变。而固定维度的向量则相当于,任何时刻都对源语言所有信息给予了同等程度的关注,这是不合理的。因此,Bahdanau等人\[[4](#参考文献)\]引入注意力(attention)机制,可以对编码后的上下文片段进行解码,以此来解决长句子的特征学习问题。下面介绍在注意力机制下的解码器结构。 - -与简单的解码器不同,这里$z_i$的计算公式为 (由于Github原生不支持LaTeX公式,请您移步[这里](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)查看): - -$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$ - -可见,源语言句子的编码向量表示为第$i$个词的上下文片段$c_i$,即针对每一个目标语言中的词$u_i$,都有一个特定的$c_i$与之对应。$c_i$的计算公式如下: - -$$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$ - -从公式中可以看出,注意力机制是通过对编码器中各时刻的RNN状态$h_j$进行加权平均实现的。权重$a_{ij}$表示目标语言中第$i$个词对源语言中第$j$个词的注意力大小,$a_{ij}$的计算公式如下: - -$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$ -$$e_{ij} = {align(z_i, h_j)}$$ - -其中,$align$可以看作是一个对齐模型,用来衡量目标语言中第$i$个词和源语言中第$j$个词的匹配程度。具体而言,这个程度是通过解码RNN的第$i$个隐层状态$z_i$和源语言句子的第$j$个上下文片段$h_j$计算得到的。传统的对齐模型中,目标语言的每个词明确对应源语言的一个或多个词(hard alignment);而在注意力模型中采用的是soft alignment,即任何两个目标语言和源语言词间均存在一定的关联,且这个关联强度是由模型计算得到的实数,因此可以融入整个NMT框架,并通过反向传播算法进行训练。 - -

    -
    -图3. 基于注意力机制的解码器 -

    - -### 柱搜索算法 - -柱搜索([beam search](http://en.wikipedia.org/wiki/Beam_search))是一种启发式图搜索算法,用于在图或树中搜索有限集合中的最优扩展节点,通常用在解空间非常大的系统(如机器翻译、语音识别)中,原因是内存无法装下图或树中所有展开的解。如在机器翻译任务中希望翻译“`你好`”,就算目标语言字典中只有3个词(``, ``, `hello`),也可能生成无限句话(`hello`循环出现的次数不定),为了找到其中较好的翻译结果,我们可采用柱搜索算法。 - -柱搜索算法使用广度优先策略建立搜索树,在树的每一层,按照启发代价(heuristic cost)(本教程中,为生成词的log概率之和)对节点进行排序,然后仅留下预先确定的个数(文献中通常称为beam width、beam size、柱宽度等)的节点。只有这些节点会在下一层继续扩展,其他节点就被剪掉了,也就是说保留了质量较高的节点,剪枝了质量较差的节点。因此,搜索所占用的空间和时间大幅减少,但缺点是无法保证一定获得最优解。 - -使用柱搜索算法的解码阶段,目标是最大化生成序列的概率。思路是: - -1. 每一个时刻,根据源语言句子的编码信息$c$、生成的第$i$个目标语言序列单词$u_i$和$i$时刻RNN的隐层状态$z_i$,计算出下一个隐层状态$z_{i+1}$。 -2. 将$z_{i+1}$通过`softmax`归一化,得到目标语言序列的第$i+1$个单词的概率分布$p_{i+1}$。 -3. 根据$p_{i+1}$采样出单词$u_{i+1}$。 -4. 重复步骤1~3,直到获得句子结束标记``或超过句子的最大生成长度为止。 - -注意:$z_{i+1}$和$p_{i+1}$的计算公式同解码器中的一样。且由于生成时的每一步都是通过贪心法实现的,因此并不能保证得到全局最优解。 - -## 数据介绍 - -本教程使用[WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/)数据集中的[bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz)作为训练集,[dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz)作为测试集和生成集。 - -### 数据预处理 - -我们的预处理流程包括两步: -- 将每个源语言到目标语言的平行语料库文件合并为一个文件: - - 合并每个`XXX.src`和`XXX.trg`文件为`XXX`。 - - `XXX`中的第$i$行内容为`XXX.src`中的第$i$行和`XXX.trg`中的第$i$行连接,用'\t'分隔。 -- 创建训练数据的“源字典”和“目标字典”。每个字典都有**DICTSIZE**个单词,包括:语料中词频最高的(DICTSIZE - 3)个单词,和3个特殊符号``(序列的开始)、``(序列的结束)和``(未登录词)。 - -### 示例数据 - -因为完整的数据集数据量较大,为了验证训练流程,PaddlePaddle接口paddle.dataset.wmt14中默认提供了一个经过预处理的[较小规模的数据集](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz)。 - -该数据集有193319条训练数据,6003条测试数据,词典长度为30000。因为数据规模限制,使用该数据集训练出来的模型效果无法保证。 - -## 训练模型 - -`train.py`包含训练程序的主函数,要使用默认参数开始训练,只需要简单地执行: -```sh -python train.py -``` -您可以使用命令行参数来设置模型训练时的参数。要显示所有可用的命令行参数,执行: -```sh -python train.py -h -``` -这样会显示所有的命令行参数的描述,以及其默认值。默认的模型是带有注意力机制的。您也可以尝试运行无注意力机制的模型,命令如下: -```sh -python train.py --no_attention -``` -训练好的模型默认会被保存到`./models`路径下。您可以用命令行参数`--save_dir`来指定模型的保存路径。默认每个pass结束时会保存一个模型。 - -## 生成预测结果 - -在模型训练好后,可以用`infer.py`来生成预测结果。同样的,使用默认参数,只需要执行: -```sh -python infer.py -``` -您也可以同样用命令行来指定各参数。注意,预测时的参数设置必须与训练时完全一致,否则载入模型会失败。您可以用`--pass_num`参数来选择读取哪个pass结束时保存的模型。同时您可以使用`--beam_width`参数来选择beam search宽度。 - -## 参考文献 - -1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009. -2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734. -3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014. -4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015. -5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318. - -
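作为对上文"柱搜索算法"一节的补充，下面给出柱搜索过程的一个玩具级示意：每步保留累计 log 概率最高的 beam_width 条部分序列，已生成结束标记的序列不再扩展。词表与概率分布均为虚构，并非本目录 infer.py 的实现：

```python
# 柱搜索的玩具示意；仅为演示算法流程。
import numpy as np

def beam_search(next_probs, bos, eos, beam_width=3, max_len=10):
    beams = [([bos], 0.0)]  # (部分序列, 累计 log 概率)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:  # 已结束的序列原样保留
                candidates.append((seq, score))
                continue
            for w, p in enumerate(next_probs(seq)):
                candidates.append((seq + [w], score + np.log(p + 1e-12)))
        beams = sorted(candidates, key=lambda x: x[1], reverse=True)[:beam_width]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return beams

# 3 个词的玩具词表：0 为开始标记，1 为结束标记，2 为 hello（对应上文"你好"的例子）
toy = lambda seq: np.array([0.05, 0.35, 0.60])
print(beam_search(toy, bos=0, eos=1)[0])  # 概率最高的完整序列及其得分
```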
    本教程PaddlePaddle 创作,采用 知识共享 署名-相同方式共享 4.0 国际 许可协议进行许可。 +您好,该项目已被迁移,请移步到 [PaddleNLP/neural_machine_translation/rnn_search](../../../../PaddleNLP/neural_machine_translation/rnn_search) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/README.md b/fluid/PaddleNLP/neural_machine_translation/transformer/README.md index 6fea167b5e7c3e9dd759ef30d9225b451350e889..47a4f78bbb1e18e55442807b0701aef08f370fc0 100644 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/README.md +++ b/fluid/PaddleNLP/neural_machine_translation/transformer/README.md @@ -1,23 +1,2 @@ -The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). ---- - -# Attention is All You Need: A Paddle Fluid implementation - -This is a Paddle Fluid implementation of the Transformer model in [Attention is All You Need]() (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). - -If you use the dataset/code in your research, please cite the paper: - -```text -@inproceedings{vaswani2017attention, - title={Attention is all you need}, - author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, - booktitle={Advances in Neural Information Processing Systems}, - pages={6000--6010}, - year={2017} -} -``` - -### TODO - -This project is still under active development. +您好,该项目已被迁移,请移步到 [PaddleNLP/neural_machine_translation/transformer](../../../../PaddleNLP/neural_machine_translation/transformer) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/README.md b/fluid/PaddleNLP/sequence_tagging_for_ner/README.md index 6d4efa9eb19dd708a87d4883dccef5ecb5e11666..d9650941eb9469a557dac879cd5cc52a3b0a03d3 100644 --- a/fluid/PaddleNLP/sequence_tagging_for_ner/README.md +++ b/fluid/PaddleNLP/sequence_tagging_for_ner/README.md @@ -1,116 +1,2 @@ -# 命名实体识别 -以下是本例的简要目录结构及说明: - -```text -. -├── data # 存储运行本例所依赖的数据,从外部获取 -├── network_conf.py # 模型定义 -├── reader.py # 数据读取接口, 从外部获取 -├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -├── utils.py # 定义通用的函数, 从外部获取 -└── utils_extend.py # 对utils.py的拓展 -``` - - -## 简介,模型详解 - -在PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/legacy/sequence_tagging_for_ner/README.md)中对于命名实体识别任务有较详细的介绍,在本例中不再重复介绍。 -在模型上,我们沿用了v2版本的模型结构,唯一区别是我们使用LSTM代替原始的RNN。 - -## 数据获取 - -完整数据的获取请参考PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/legacy/sequence_tagging_for_ner/README.md) 一节中的方式。本例的示例数据同样可以通过运行data/download.sh来获取。 - -## 训练 - -1. 运行 `sh data/download.sh` -2. 修改 `train.py` 的 `main` 函数,指定数据路径 - - ```python - main( - train_data_file="data/train", - test_data_file="data/test", - vocab_file="data/vocab.txt", - target_file="data/target.txt", - emb_file="data/wordVectors.txt", - model_save_dir="models", - num_passes=1000, - use_gpu=False, - parallel=False) - ``` - -3. 
运行命令 `python train.py` ,**需要注意:直接运行使用的是示例数据,请替换真实的标记数据。** - - ```text - Pass 127, Batch 9525, Cost 4.0867705, Precision 0.3954984, Recall 0.37846154, F1_score0.38679245 - Pass 127, Batch 9530, Cost 3.137265, Precision 0.42971888, Recall 0.38351256, F1_score0.405303 - Pass 127, Batch 9535, Cost 3.6240938, Precision 0.4272152, Recall 0.41795665, F1_score0.4225352 - Pass 127, Batch 9540, Cost 3.5352352, Precision 0.48464164, Recall 0.4536741, F1_score0.46864685 - Pass 127, Batch 9545, Cost 4.1130385, Precision 0.40131578, Recall 0.3836478, F1_score0.39228293 - Pass 127, Batch 9550, Cost 3.6826708, Precision 0.43333334, Recall 0.43730888, F1_score0.43531203 - Pass 127, Batch 9555, Cost 3.6363933, Precision 0.42424244, Recall 0.3962264, F1_score0.4097561 - Pass 127, Batch 9560, Cost 3.6101768, Precision 0.51363635, Recall 0.353125, F1_score0.41851854 - Pass 127, Batch 9565, Cost 3.5935276, Precision 0.5152439, Recall 0.5, F1_score0.5075075 - Pass 127, Batch 9570, Cost 3.4987144, Precision 0.5, Recall 0.4330218, F1_score0.46410686 - Pass 127, Batch 9575, Cost 3.4659843, Precision 0.39864865, Recall 0.38064516, F1_score0.38943896 - Pass 127, Batch 9580, Cost 3.1702557, Precision 0.5, Recall 0.4490446, F1_score0.47315437 - Pass 127, Batch 9585, Cost 3.1587276, Precision 0.49377593, Recall 0.4089347, F1_score0.4473684 - Pass 127, Batch 9590, Cost 3.5043538, Precision 0.4556962, Recall 0.4600639, F1_score0.45786962 - Pass 127, Batch 9595, Cost 2.981989, Precision 0.44981414, Recall 0.45149255, F1_score0.4506518 - [TrainSet] pass_id:127 pass_precision:[0.46023396] pass_recall:[0.43197003] pass_f1_score:[0.44565433] - [TestSet] pass_id:127 pass_precision:[0.4708409] pass_recall:[0.47971722] pass_f1_score:[0.4752376] - ``` -## 预测 -1. 修改 [infer.py](./infer.py) 的 `infer` 函数,指定:需要测试的模型的路径、测试数据、字典文件,预测标记文件的路径,默认参数如下: - - ```python - infer( - model_path="models/params_pass_0", - batch_size=6, - test_data_file="data/test", - vocab_file="data/vocab.txt", - target_file="data/target.txt", - use_gpu=False - ) - ``` - -2. 在终端运行 `python infer.py`,开始测试,会看到如下预测结果(以下为训练70个pass所得模型的部分预测结果): - - ```text - leicestershire B-ORG B-LOC - extended O O - their O O - first O O - innings O O - by O O - DGDG O O - runs O O - before O O - being O O - bowled O O - out O O - for O O - 296 O O - with O O - england B-LOC B-LOC - discard O O - andy B-PER B-PER - caddick I-PER I-PER - taking O O - three O O - for O O - DGDG O O - . O O - ``` - - 输出分为三列,以“\t” 分隔,第一列是输入的词语,第二列是标准结果,第三列为生成的标记结果。多条输入序列之间以空行分隔。 - -## 结果示例 - -

    -
    -图1. 学习曲线, 横轴表示训练轮数,纵轴表示F1值 -

    +您好,该项目已被迁移,请移步到 [PaddleNLP/sequence_tagging_for_ner](../../../PaddleNLP/sequence_tagging_for_ner) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/text_classification/README.md b/fluid/PaddleNLP/text_classification/README.md index 669774bac04fe906cc5bffafa1f60de60323c806..28351d6a035babea1ee1ff6a8e4b0c69780657de 100644 --- a/fluid/PaddleNLP/text_classification/README.md +++ b/fluid/PaddleNLP/text_classification/README.md @@ -1,112 +1,2 @@ -# 文本分类 -以下是本例的简要目录结构及说明: - -```text -. -├── nets.py # 模型定义 -├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -└── utils.py # 定义通用函数,从外部获取 -``` - - -## 简介,模型详解 - -在PaddlePaddle v2版本[文本分类](https://github.com/PaddlePaddle/models/blob/develop/legacy/text_classification/README.md)中对于文本分类任务有较详细的介绍,在本例中不再重复介绍。 -在模型上,我们采用了bow, cnn, lstm, gru四种常见的文本分类模型。 - -## 训练 - -1. 运行命令 `python train.py bow` 开始训练模型。 - ```python - python train.py bow # bow指定网络结构,可替换成cnn, lstm, gru - ``` - -2. (可选)想自定义网络结构,需在[nets.py](./nets.py)中自行添加,并设置[train.py](./train.py)中的相应参数。 - ```python - def train(train_reader, # 训练数据 - word_dict, # 数据字典 - network, # 模型配置 - use_cuda, # 是否用GPU - parallel, # 是否并行 - save_dirname, # 保存模型路径 - lr=0.2, # 学习率大小 - batch_size=128, # 每个batch的样本数 - pass_num=30): # 训练的轮数 - ``` - -## 训练结果示例 -```text - pass_id: 0, avg_acc: 0.848040, avg_cost: 0.354073 - pass_id: 1, avg_acc: 0.914200, avg_cost: 0.217945 - pass_id: 2, avg_acc: 0.929800, avg_cost: 0.184302 - pass_id: 3, avg_acc: 0.938680, avg_cost: 0.164240 - pass_id: 4, avg_acc: 0.945120, avg_cost: 0.149150 - pass_id: 5, avg_acc: 0.951280, avg_cost: 0.137117 - pass_id: 6, avg_acc: 0.955360, avg_cost: 0.126434 - pass_id: 7, avg_acc: 0.961400, avg_cost: 0.117405 - pass_id: 8, avg_acc: 0.963560, avg_cost: 0.110070 - pass_id: 9, avg_acc: 0.965840, avg_cost: 0.103273 - pass_id: 10, avg_acc: 0.969800, avg_cost: 0.096314 - pass_id: 11, avg_acc: 0.971720, avg_cost: 0.090206 - pass_id: 12, avg_acc: 0.974800, avg_cost: 0.084970 - pass_id: 13, avg_acc: 0.977400, avg_cost: 0.078981 - pass_id: 14, avg_acc: 0.980000, avg_cost: 0.073685 - pass_id: 15, avg_acc: 0.981080, avg_cost: 0.069898 - pass_id: 16, avg_acc: 0.982080, avg_cost: 0.064923 - pass_id: 17, avg_acc: 0.984680, avg_cost: 0.060861 - pass_id: 18, avg_acc: 0.985840, avg_cost: 0.057095 - pass_id: 19, avg_acc: 0.988080, avg_cost: 0.052424 - pass_id: 20, avg_acc: 0.989160, avg_cost: 0.049059 - pass_id: 21, avg_acc: 0.990120, avg_cost: 0.045882 - pass_id: 22, avg_acc: 0.992080, avg_cost: 0.042140 - pass_id: 23, avg_acc: 0.992280, avg_cost: 0.039722 - pass_id: 24, avg_acc: 0.992840, avg_cost: 0.036607 - pass_id: 25, avg_acc: 0.994440, avg_cost: 0.034040 - pass_id: 26, avg_acc: 0.995000, avg_cost: 0.031501 - pass_id: 27, avg_acc: 0.995440, avg_cost: 0.028988 - pass_id: 28, avg_acc: 0.996240, avg_cost: 0.026639 - pass_id: 29, avg_acc: 0.996960, avg_cost: 0.024186 -``` - -## 预测 -1. 
运行命令 `python infer.py bow_model`, 开始预测。 - ```python - python infer.py bow_model # bow_model指定需要导入的模型 - -## 预测结果示例 -```text - model_path: bow_model/epoch0, avg_acc: 0.882800 - model_path: bow_model/epoch1, avg_acc: 0.882360 - model_path: bow_model/epoch2, avg_acc: 0.881400 - model_path: bow_model/epoch3, avg_acc: 0.877800 - model_path: bow_model/epoch4, avg_acc: 0.872920 - model_path: bow_model/epoch5, avg_acc: 0.872640 - model_path: bow_model/epoch6, avg_acc: 0.869960 - model_path: bow_model/epoch7, avg_acc: 0.865160 - model_path: bow_model/epoch8, avg_acc: 0.863680 - model_path: bow_model/epoch9, avg_acc: 0.861200 - model_path: bow_model/epoch10, avg_acc: 0.853520 - model_path: bow_model/epoch11, avg_acc: 0.850400 - model_path: bow_model/epoch12, avg_acc: 0.855960 - model_path: bow_model/epoch13, avg_acc: 0.853480 - model_path: bow_model/epoch14, avg_acc: 0.855960 - model_path: bow_model/epoch15, avg_acc: 0.854120 - model_path: bow_model/epoch16, avg_acc: 0.854160 - model_path: bow_model/epoch17, avg_acc: 0.852240 - model_path: bow_model/epoch18, avg_acc: 0.852320 - model_path: bow_model/epoch19, avg_acc: 0.850280 - model_path: bow_model/epoch20, avg_acc: 0.849760 - model_path: bow_model/epoch21, avg_acc: 0.850160 - model_path: bow_model/epoch22, avg_acc: 0.846800 - model_path: bow_model/epoch23, avg_acc: 0.845440 - model_path: bow_model/epoch24, avg_acc: 0.845640 - model_path: bow_model/epoch25, avg_acc: 0.846200 - model_path: bow_model/epoch26, avg_acc: 0.845880 - model_path: bow_model/epoch27, avg_acc: 0.844880 - model_path: bow_model/epoch28, avg_acc: 0.844680 - model_path: bow_model/epoch29, avg_acc: 0.844960 -``` -注:过拟合导致acc持续下降,请忽略 +您好,该项目已被迁移,请移步到 [PaddleNLP/text_classification](../../../PaddleNLP/text_classification) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/text_matching_on_quora/README.md b/fluid/PaddleNLP/text_matching_on_quora/README.md index 77d93943ae7dcbe775e60307b74430f320dbaab1..2b268a809b14a9d88a13a12ec4b73d02b7bccf78 100644 --- a/fluid/PaddleNLP/text_matching_on_quora/README.md +++ b/fluid/PaddleNLP/text_matching_on_quora/README.md @@ -1,177 +1,6 @@ -# Text matching on Quora qestion-answer pair dataset -## contents +Hi! -* [Introduction](#introduction) - * [a brief review of the Quora Question Pair (QQP) Task](#a-brief-review-of-the-quora-question-pair-qqp-task) - * [Our Work](#our-work) -* [Environment Preparation](#environment-preparation) - * [Install Fluid release 1.0](#install-fluid-release-10) - * [cpu version](#cpu-version) - * [gpu version](#gpu-version) - * [Have I installed Fluid successfully?](#have-i-installed-fluid-successfully) -* [Prepare Data](#prepare-data) -* [Train and evaluate](#train-and-evaluate) -* [Models](#models) -* [Results](#results) +This directory has been deprecated. - -## Introduction - -### a brief review of the Quora Question Pair (QQP) Task - -The [Quora Question Pair](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) dataset contains 400,000 question pairs from [Quora](https://www.quora.com/), where people ask and answer questions related to specific areas. Each sample in the dataset consists of two questions (both English) and a label that represents whether the questions are duplicate. The dataset is well annotated by human. - -Below are two samples from the dataset. The last column indicates whether the two questions are duplicate (1) or not (0). - -|id | qid1 | qid2| question1| question2| is_duplicate -|:---:|:---:|:---:|:---:|:---:|:---:| -|0 |1 |2 |What is the step by step guide to invest in share market in india? 
|What is the step by step guide to invest in share market? |0| -|1 |3 |4 |What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? |0| - - A [kaggle competition](https://www.kaggle.com/c/quora-question-pairs#description) was held based on this dataset in 2017. The kagglers were given a training dataset (with labels), and requested to make predictions on a test dataset (without labels). The predictions were evaluated by the log-likelihood loss on the test data. - -The kaggle competition has inspired much effective work. However, most of these models are rule-based and difficult to be transferred to new tasks. Researchers are seeking for more general models that work well on this task and other natual language processing (NLP) tasks. - -[Wang _et al._](https://arxiv.org/abs/1702.03814) proposed a bilateral multi-perspective matching (BIMPM) model based on the Quora Question Pair dataset. They splitted the original dataset to [3 parts](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing): _train.tsv_ (384,348 samples), _dev.tsv_ (10,000 samples) and _test.tsv_ (10,000 samples). The class distribution of _train.tsv_ is unbalanced (37% positive and 63% negative), while those of _dev.tsv_ and _test.tsv_ are balanced(50% positive and 50% negetive). We used the same splitting method in our experiments. - -### Our Work - -Based on the Quora Question Pair Dataset, we implemented some classic models in the area of neural language understanding (NLU). The accuracy of prediction results are evaluated on the _test.tsv_ from [Wang _et al._](https://arxiv.org/abs/1702.03814). - -## Environment Preparation - -### Install Fluid release 1.0 - -Please follow the [official document in English](http://www.paddlepaddle.org/documentation/docs/en/1.0/build_and_install/pip_install_en.html) or [official document in Chinese](http://www.paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/install/Start.html) to install the Fluid deep learning framework. - -#### Have I installed Fluid successfully? - -Run the following script from your command line: - -```shell -python -c "import paddle" -``` - -If Fluid is installed successfully you should see no error message. Feel free to open issues under the [PaddlePaddle repository](https://github.com/PaddlePaddle/Paddle/issues) for support. - -## Prepare Data - -Please download the Quora dataset from [Google drive](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing) and unzip to $HOME/.cache/paddle/dataset. - -Then run _data/prepare_quora_data.sh_ to download the pre-trained _word2vec_ embedding file -- _glove.840B.300d.zip_: - -```shell -sh data/prepare_quora_data.sh -``` - -At this point the dataset directory ($HOME/.cache/paddle/dataset) structure should be: - -```shell - -$HOME/.cache/paddle/dataset - |- Quora_question_pair_partition - |- train.tsv - |- test.tsv - |- dev.tsv - |- readme.txt - |- wordvec.txt - |- glove.840B.300d.txt -``` - -## Train and evaluate - -We provide multiple models and configurations. Details are shown in `models` and `configs` directories. For a quick start, please run the _cdssmNet_ model with the corresponding configuration: - -```shell -python train_and_evaluate.py \ - --model_name=cdssmNet \ - --config=cdssm_base -``` - -Logs will be output to the console. If everything works well, the logging information will have the same formats as the content in _cdssm_base.log_. 
- -All configurations used in our experiments are as follows: - -|Model|Config|command -|:----:|:----:|:----:| -|cdssmNet|cdssm_base|python train_and_evaluate.py --model_name=cdssmNet --config=cdssm_base -|DecAttNet|decatt_glove|python train_and_evaluate.py --model_name=DecAttNet --config=decatt_glove -|InferSentNet|infer_sent_v1|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v1 -|InferSentNet|infer_sent_v2|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v2 -|SSENet|sse_base|python train_and_evaluate.py --model_name=SSENet --config=sse_base - -## Models - -We implemeted 4 models for now: the convolutional deep-structured semantic model (CDSSM, CNN-based), the InferSent model (RNN-based), the shortcut-stacked encoder (SSE, RNN-based), and the decomposed attention model (DecAtt, attention-based). - -|Model|features|Context Encoder|Match Layer|Classification Layer -|:----:|:----:|:----:|:----:|:----:| -|CDSSM|word|1 layer conv1d|concatenation|MLP -|DecAtt|word|Attention|concatenation|MLP -|InferSent|word|1 layer Bi-LSTM|concatenation/element-wise product/
    absolute element-wise difference|MLP -|SSE|word|3 layer Bi-LSTM|concatenation/element-wise product/
    absolute element-wise difference|MLP - -### CDSSM - -``` -@inproceedings{shen2014learning, - title={Learning semantic representations using convolutional neural networks for web search}, - author={Shen, Yelong and He, Xiaodong and Gao, Jianfeng and Deng, Li and Mesnil, Gr{\'e}goire}, - booktitle={Proceedings of the 23rd International Conference on World Wide Web}, - pages={373--374}, - year={2014}, - organization={ACM} -} -``` - -### InferSent - -``` -@article{conneau2017supervised, - title={Supervised learning of universal sentence representations from natural language inference data}, - author={Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine}, - journal={arXiv preprint arXiv:1705.02364}, - year={2017} -} -``` - -### SSE - -``` -@article{nie2017shortcut, - title={Shortcut-stacked sentence encoders for multi-domain inference}, - author={Nie, Yixin and Bansal, Mohit}, - journal={arXiv preprint arXiv:1708.02312}, - year={2017} -} -``` - -### DecAtt - -``` -@article{tomar2017neural, - title={Neural paraphrase identification of questions with noisy pretraining}, - author={Tomar, Gaurav Singh and Duque, Thyago and T{\"a}ckstr{\"o}m, Oscar and Uszkoreit, Jakob and Das, Dipanjan}, - journal={arXiv preprint arXiv:1704.04565}, - year={2017} -} -``` - -## Results - -|Model|Config|dev accuracy| test accuracy -|:----:|:----:|:----:|:----:| -|cdssmNet|cdssm_base|83.56%|82.83%| -|DecAttNet|decatt_glove|86.31%|86.22%| -|InferSentNet|infer_sent_v1|87.15%|86.62%| -|InferSentNet|infer_sent_v2|88.55%|88.43%| -|SSENet|sse_base|88.35%|88.25%| - -In our experiment, we found that LSTM-based models outperformed convolution-based models. The DecAtt model has fewer parameters than LSTM-based models, but is sensitive to hyper-parameters. - -

    - - test_acc - -

    +Please visit the project at [PaddleNLP/text_matching_on_quora](../../../PaddleNLP/text_matching_on_quora). diff --git a/fluid/PaddleRec/ctr/README.cn.md b/fluid/PaddleRec/ctr/README.cn.md index 05d1653e52c1db36e9690c64166283afc26df429..81cd20625701c13fce3a3f8ad119663a6e5c162c 100644 --- a/fluid/PaddleRec/ctr/README.cn.md +++ b/fluid/PaddleRec/ctr/README.cn.md @@ -1,79 +1,2 @@ -# 基于DNN模型的点击率预估模型 - -## 介绍 -本模型实现了下述论文中提出的DNN模型: - -```text -@inproceedings{guo2017deepfm, - title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction}, - author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He}, - booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)}, - pages={1725--1731}, - year={2017} -} -``` - -## 运行环境 -需要先安装PaddlePaddle Fluid,然后运行: - -```shell -pip install -r requirements.txt -``` - -## 数据集 -本文使用的是Kaggle公司举办的[展示广告竞赛](https://www.kaggle.com/c/criteo-display-ad-challenge/)中所使用的Criteo数据集。 - -每一行是一次广告展示的特征,第一列是一个标签,表示这次广告展示是否被点击。总共有39个特征,其中13个特征采用整型值,另外26个特征是类别类特征。测试集中是没有标签的。 - -下载数据集: -```bash -cd data && ./download.sh && cd .. -``` - -## 模型 -本例子只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出。 - - -## 数据准备 -处理原始数据集,整型特征使用min-max归一化方法规范到[0, 1],类别类特征使用了one-hot编码。原始数据集分割成两部分:90%用于训练,其他10%用于训练过程中的验证。 - -## 训练 -训练的命令行选项可以通过`python train.py -h`列出。 - -### 单机训练: -```bash -python train.py \ - --train_data_path data/raw/train.txt \ - 2>&1 | tee train.log -``` - -训练到第1轮的第40000个batch后,测试的AUC为0.801178,误差(cost)为0.445196。 - -### 分布式训练 - -本地启动一个2 trainer 2 pserver的分布式训练任务,分布式场景下训练数据会按照trainer的id进行切分,保证trainer之间的训练数据不会重叠,提高训练效率 - -```bash -sh cluster_train.sh -``` - -## 预测 -预测的命令行选项可以通过`python infer.py -h`列出。 - -对测试集进行预测: -```bash -python infer.py \ - --model_path models/pass-0/ \ - --data_path data/raw/valid.txt -``` -注意:infer.py跑完最后输出的AUC才是整个预测文件的整体AUC。 - -## 在百度云上运行集群训练 -1. 参考文档 [在百度云上启动Fluid分布式训练](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst) 在百度云上部署一个CPU集群。 -1. 用preprocess.py处理训练数据生成train.txt。 -1. 将train.txt切分成集群机器份,放到每台机器上。 -1. 用上面的 `分布式训练` 中的命令行启动分布式训练任务. - -## 在PaddleCloud上运行集群训练 -如果你正在使用PaddleCloud做集群训练,你可以使用```cloud.py```这个文件来帮助你提交任务,```trian.py```中所需要的参数可以通过PaddleCloud的环境变量来提交。 \ No newline at end of file +您好,该项目已被迁移,请移步到 [PaddleRec/ctr](../../../PaddleRec/ctr) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/ctr/README.md b/fluid/PaddleRec/ctr/README.md index e29e2e1eb5493fc52b8bb38a6bf49bd397bb8455..1aceff1350c2c28b13ec92ccf82e321bb3ddda04 100644 --- a/fluid/PaddleRec/ctr/README.md +++ b/fluid/PaddleRec/ctr/README.md @@ -1,96 +1,6 @@ -# DNN for Click-Through Rate prediction +Hi! -## Introduction -This model implements the DNN part proposed in the following paper: +This directory has been deprecated. -```text -@inproceedings{guo2017deepfm, - title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction}, - author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He}, - booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)}, - pages={1725--1731}, - year={2017} -} -``` - -The DeepFm combines factorization machine and deep neural networks to model -both low order and high order feature interactions. 
For details of the -factorization machines, please refer to the paper [factorization -machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) - -## Environment -You should install PaddlePaddle Fluid first, and run: - -```shell -pip install -r requirements.txt -``` - -## Dataset -This example uses Criteo dataset which was used for the [Display Advertising -Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/) -hosted by Kaggle. - -Each row is the features for an ad display and the first column is a label -indicating whether this ad has been clicked or not. There are 39 features in -total. 13 features take integer values and the other 26 features are -categorical features. For the test dataset, the labels are omitted. - -Download dataset: -```bash -cd data && ./download.sh && cd .. -``` - -## Model -This Demo only implement the DNN part of the model described in DeepFM paper. -DeepFM model will be provided in other model. - - -## Data Preprocessing method -To preprocess the raw dataset, the integer features are clipped then min-max -normalized to [0, 1] and the categorical features are one-hot encoded. The raw -training dataset are splited such that 90% are used for training and the other -10% are used for validation during training. In reader.py, training data is the first -90% of data in train.txt, and validation data is the left. - -## Train -The command line options for training can be listed by `python train.py -h`. - -### Local Train: -```bash -python train.py \ - --train_data_path data/raw/train.txt \ - 2>&1 | tee train.log -``` - -After training pass 1 batch 40000, the testing AUC is `0.801178` and the testing -cost is `0.445196`. - -### Distributed Train -Run a 2 pserver 2 trainer distribute training on a single machine. -In distributed training setting, training data is splited by trainer_id, so that training data - do not overlap among trainers - -```bash -sh cluster_train.sh -``` - -## Infer -The command line options for infering can be listed by `python infer.py -h`. - -To make inference for the test dataset: -```bash -python infer.py \ - --model_path models/ \ - --data_path data/raw/train.txt -``` -Note: The AUC value in the last log info is the total AUC for all test dataset. Here, train.txt is splited inside the reader.py so that validation data does not have overlap with training data. - -## Train on Baidu Cloud -1. Please prepare some CPU machines on Baidu Cloud following the steps in [train_on_baidu_cloud](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst) -1. Prepare dataset using preprocess.py. -1. Split the train.txt to trainer_num parts and put them on the machines. -1. Run training with the cluster train using the command in `Distributed Train` above. - -## Train on Paddle Cloud -If you want to run this training on PaddleCloud, you can use the script ```cloud.py```, you can change the arguments in ```trian.py``` through environments in PaddleCloud. \ No newline at end of file +Please visit the project at [PaddleRec/ctr](../../../PaddleRec/ctr). diff --git a/fluid/PaddleRec/din/README.md b/fluid/PaddleRec/din/README.md index 3538ba760ff9b80807a6a56aed4b75400c97ae03..6e2df0301cf20434dc3479da8c93644f764c5c42 100644 --- a/fluid/PaddleRec/din/README.md +++ b/fluid/PaddleRec/din/README.md @@ -1,137 +1,2 @@ -# DIN -以下是本例的简要目录结构及说明: - -```text -. 
-├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -├── network.py # 网络结构 -├── cluster_train.py # 多机训练 -├── cluster_train.sh # 多机训练脚本 -├── reader.py # 和读取数据相关的函数 -├── data/ - ├── build_dataset.py # 文本数据转化为paddle数据 - ├── convert_pd.py # 将原始数据转化为pandas的dataframe - ├── data_process.sh # 数据预处理脚本 - ├── remap_id.py # remap类别id - -``` - -## 简介 - -DIN模型的介绍可以参阅论文[Deep Interest Network for Click-Through Rate Prediction](https://arxiv.org/abs/1706.06978)。 - -DIN通过一个兴趣激活模块(Activation Unit),用预估目标Candidate ADs的信息去激活用户的历史点击商品,以此提取用户与当前预估目标相关的兴趣。 - -权重高的历史行为表明这部分兴趣和当前广告相关,权重低的则是和广告无关的”兴趣噪声“。我们通过将激活的商品和激活权重相乘,然后累加起来作为当前预估目标ADs相关的兴趣状态表达。 - -最后我们将这相关的用户兴趣表达、用户静态特征和上下文相关特征,以及ad相关的特征拼接起来,输入到后续的多层DNN网络,最后预测得到用户对当前目标ADs的点击概率。 - - -## 数据下载及预处理 - -* Step 1: 运行如下命令 下载[Amazon Product数据集](http://jmcauley.ucsd.edu/data/amazon/)并进行预处理 -``` -cd data && sh data_process.sh && cd .. -``` -如果执行过程中遇到找不到某个包(例如pandas包)的报错,使用如下命令安装对应的包即可。 -``` -pip install pandas -``` - -* Step 2: 产生训练集、测试集和config文件 -``` -python build_dataset.py -``` -运行之后在data文件夹下会产生config.txt、paddle_test.txt、paddle_train.txt三个文件 - -数据格式例子如下: -``` -3737 19450;288 196;18486;674;1 -3647 4342 6855 3805;281 463 558 674;4206;463;1 -1805 4309;87 87;21354;556;1 -18209 20753;649 241;51924;610;0 -13150;351;41455;792;1 -35120 40418;157 714;52035;724;0 -``` - -其中每一行是一个Sample,由分号分隔的5个域组成。前两个域是历史交互的item序列和item对应的类别,第三、四个域是待预测的item和其类别,最后一个域是label,表示点击与否。 - - -## 训练 - -具体的参数配置说明可通过运行下列代码查看 -``` -python train.py -h -``` - -gpu 单机单卡训练 -``` bash -CUDA_VISIBLE_DEVICES=1 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 1 > log.txt 2>&1 & -``` - -cpu 单机训练 -``` bash -python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 0 > log.txt 2>&1 & -``` - -值得注意的是上述单卡训练可以通过加--parallel 1参数使用Parallel Executor来进行加速 - -gpu 单机多卡训练 -``` bash -CUDA_VISIBLE_DEVICES=0,1 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 1 --parallel 1 --num_devices 2 > log.txt 2>&1 & -``` - -cpu 单机多卡训练 -``` bash -CPU_NUM=10 python -u train.py --config_path 'data/config.txt' --train_dir 'data/paddle_train.txt' --batch_size 32 --epoch_num 100 --use_cuda 0 --parallel 1 --num_devices 10 > log.txt 2>&1 & -``` - - -## 训练结果示例 - -我们在Tesla K40m单GPU卡上训练的日志如下所示(以实际输出为准) -```text -2019-02-22 09:31:51,578 - INFO - reading data begins -2019-02-22 09:32:22,407 - INFO - reading data completes -W0222 09:32:24.151955 7221 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0 -W0222 09:32:24.152046 7221 device_context.cc:271] device: 0, cuDNN Version: 7.0. -2019-02-22 09:32:27,797 - INFO - train begins -epoch: 1 global_step: 1000 train_loss: 0.6950 time: 14.64 -epoch: 1 global_step: 2000 train_loss: 0.6854 time: 15.41 -epoch: 1 global_step: 3000 train_loss: 0.6799 time: 14.84 -... -model saved in din_amazon/global_step_50000 -... -``` - -提示: - -* 在单机条件下,使用代码中默认的超参数运行时,产生最优auc的global step大致在440000到500000之间 - -* 训练超出一定的epoch后会稍稍出现过拟合 - -## 预测 -参考如下命令,开始预测. 
- -其中model_path为模型的路径,test_path为测试数据路径。 - -``` -CUDA_VISIBLE_DEVICES=3 python infer.py --model_path 'din_amazon/global_step_400000' --test_path 'data/paddle_test.txt' --use_cuda 1 -``` - -## 预测结果示例 -```text -2019-02-22 11:22:58,804 - INFO - TEST --> loss: [0.47005194] auc:0.863794952818 -``` - - -## 多机训练 -可参考cluster_train.py 配置多机环境 - -运行命令本地模拟多机场景 -``` -sh cluster_train.sh -``` +您好,该项目已被迁移,请移步到 [PaddleRec/din](../../../PaddleRec/din) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/gnn/README.md b/fluid/PaddleRec/gnn/README.md index 29e3f721c5a81b64e21abb5242adccdc46b3d0f8..1ac21f3ee4712ead33f44322447d30fe5aa45918 100644 --- a/fluid/PaddleRec/gnn/README.md +++ b/fluid/PaddleRec/gnn/README.md @@ -1,118 +1,2 @@ -# SR-GNN -以下是本例的简要目录结构及说明: - -```text -. -├── README.md # 文档 -├── train.py # 训练脚本 -├── infer.py # 预测脚本 -├── network.py # 网络结构 -├── reader.py # 和读取数据相关的函数 -├── data/ - ├── download.sh # 下载数据的脚本 - ├── preprocess.py # 数据预处理 - -``` - -## 简介 - -SR-GNN模型的介绍可以参阅论文[Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855)。 - -本文解决的是Session-based Recommendation这一问题,过程大致分为以下四步: - -是对所有的session序列通过有向图进行建模。 - -然后通过GNN,学习每个node(item)的隐向量表示 - -然后通过一个attention架构模型得到每个session的embedding - -最后通过一个softmax层进行全表预测 - -我们复现了论文效果,在DIGINETICA数据集上P@20可以达到50.7 - - -## 数据下载及预处理 - -使用[DIGINETICA](http://cikm2016.cs.iupui.edu/cikm-cup)数据集。可以按照下述过程操作获得数据集以及进行简单的数据预处理。 - -* Step 1: 运行如下命令,下载DIGINETICA数据集并进行预处理 -``` -cd data && sh download.sh -``` - -* Step 2: 产生训练集、测试集和config文件 -``` -python preprocess.py --dataset diginetica -cd .. -``` -运行之后在data文件夹下会产生diginetica文件夹,里面包含config.txt、test.txt train.txt三个文件 - -生成的数据格式为:(session_list, -label_list)。 - -其中session_list是一个session的列表,其中每个元素都是一个list,代表不同的session。label_list是一个列表,每个位置的元素是session_list中对应session的label。 - -例子:session_list=[[1,2,3], [4], [7,9]]。代表这个session_list包含3个session,第一个session包含的item序列是1,2,3,第二个session只有1个item 4,第三个session包含的item序列是7,9。 - -label_list = [6, 9, -1]。代表[1,2,3]这个session的预测label值应该为6,后两个以此类推。 - -提示: - -* 如果您想使用自己业务场景下的数据,只要令数据满足上述格式要求即可 -* 本例中的train.txt和test.txt两个文件均为二进制文件 - - -## 训练 - -可以参考下面不同场景下的运行命令进行训练,还可以指定诸如batch_size,lr(learning rate)等参数,具体的配置说明可通过运行下列代码查看 -``` -python train.py -h -``` - -gpu 单机单卡训练 -``` bash -CUDA_VISIBLE_DEVICES=1 python -u train.py --use_cuda 1 > log.txt 2>&1 & -``` - -cpu 单机训练 -``` bash -python -u train.py --use_cuda 0 > log.txt 2>&1 & -``` - -值得注意的是上述单卡训练可以通过加--parallel 1参数使用Parallel Executor来进行加速 - - -## 训练结果示例 - -我们在Tesla K40m单GPU卡上训练的日志如下所示(以实际输出为准) -```text -W0308 16:08:24.249840 1785 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0 -W0308 16:08:24.249974 1785 device_context.cc:271] device: 0, cuDNN Version: 7.0. -2019-03-08 16:08:38,079 - INFO - load data complete -2019-03-08 16:08:38,080 - INFO - begin train -2019-03-08 16:09:07,605 - INFO - step: 500, loss: 10.2052, train_acc: 0.0088 -2019-03-08 16:09:36,940 - INFO - step: 1000, loss: 9.7192, train_acc: 0.0320 -2019-03-08 16:10:08,617 - INFO - step: 1500, loss: 8.9290, train_acc: 0.1350 -... -2019-03-08 16:16:01,151 - INFO - model saved in ./saved_model/epoch_0 -... -``` - -## 预测 -运行如下命令即可开始预测。可以通过参数指定开始和结束的epoch轮次。 - -``` -CUDA_VISIBLE_DEVICES=3 python infer.py -``` - -## 预测结果示例 -```text -W0308 16:41:56.847339 31709 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 35, Driver API Version: 9.0, Runtime API Version: 8.0 -W0308 16:41:56.847705 31709 device_context.cc:271] device: 0, cuDNN Version: 7.0. 
-2019-03-08 16:42:20,420 - INFO - TEST --> loss: 5.8865, Recall@20: 0.4525 -2019-03-08 16:42:45,153 - INFO - TEST --> loss: 5.5314, Recall@20: 0.5010 -2019-03-08 16:43:10,233 - INFO - TEST --> loss: 5.5128, Recall@20: 0.5047 -... -``` +您好,该项目已被迁移,请移步到 [PaddleRec/gnn](../../../PaddleRec/gnn) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/gru4rec/README.md b/fluid/PaddleRec/gru4rec/README.md index 353781567f7012996199e51169233b306cd18722..9fe28eba00760b67c532e4624a5722cfd62feb57 100644 --- a/fluid/PaddleRec/gru4rec/README.md +++ b/fluid/PaddleRec/gru4rec/README.md @@ -1,283 +1,2 @@ -# GRU4REC -以下是本例的简要目录结构及说明: - -```text -. -├── README.md # 文档 -├── train.py # 训练脚本 全词表 cross-entropy -├── train_sample_neg.py # 训练脚本 sample负例 包含bpr loss 和cross-entropy -├── infer.py # 预测脚本 全词表 -├── infer_sample_neg.py # 预测脚本 sample负例 -├── net.py # 网络结构 -├── text2paddle.py # 文本数据转paddle数据 -├── cluster_train.py # 多机训练 -├── cluster_train.sh # 多机训练脚本 -├── utils # 通用函数 -├── convert_format.py # 转换数据格式 -├── vocab.txt # 小样本字典 -├── train_data # 小样本训练目录 -└── test_data # 小样本测试目录 - -``` - - -## 简介 - -GRU4REC模型的介绍可以参阅论文[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)。 - -论文的贡献在于首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升。 - -论文的核心思想是在一个session中,用户点击一系列item的行为看做一个序列,用来训练RNN模型。预测阶段,给定已知的点击序列作为输入,预测下一个可能点击的item。 - -session-based推荐应用场景非常广泛,比如用户的商品浏览、新闻点击、地点签到等序列数据。 - -支持三种形式的损失函数, 分别是全词表的cross-entropy, 负采样的Bayesian Pairwise Ranking和负采样的Cross-entropy. - -我们基本复现了论文效果,recall@20的效果分别为 - -全词表 cross entropy : 0.67 - -负采样 bpr : 0.606 - -负采样 cross entropy : 0.605 - - -运行样例程序可跳过'RSC15 数据下载及预处理'部分 -## RSC15 数据下载及预处理 - -运行命令 下载RSC15官网数据集 -``` -curl -Lo yoochoose-data.7z https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z -7z x yoochoose-data.7z -``` - -GRU4REC的数据过滤,下载脚本[https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py](https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py), - -注意修改文件路径 - -line12: PATH_TO_ORIGINAL_DATA = './' - -line13:PATH_TO_PROCESSED_DATA = './' - -注意使用python3 执行脚本 -``` -python preprocess.py -``` -生成的数据格式如下 - -``` -SessionId ItemId Time -1 214536502 1396839069.277 -1 214536500 1396839249.868 -1 214536506 1396839286.998 -1 214577561 1396839420.306 -2 214662742 1396850197.614 -2 214662742 1396850239.373 -2 214825110 1396850317.446 -2 214757390 1396850390.71 -2 214757407 1396850438.247 -``` - -数据格式需要转换, 运行脚本如下 -``` -python convert_format.py -``` - -模型的训练及测试数据如下,一行表示一个用户按照时间顺序的序列 - -``` -214536502 214536500 214536506 214577561 -214662742 214662742 214825110 214757390 214757407 214551617 -214716935 214774687 214832672 -214836765 214706482 -214701242 214826623 -214826835 214826715 -214838855 214838855 -214576500 214576500 214576500 -214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762 -214717867 214717867 -``` - -根据训练和测试文件生成字典和对应的paddle输入文件 - -需要将训练文件放到目录raw_train_data下,测试文件放到目录raw_test_data下,并生成对应的train_data,test_data和vocab.txt文件 -``` -python text2paddle.py raw_train_data/ raw_test_data/ train_data test_data vocab.txt -``` - -转化后生成的格式如下,可参考train_data/small_train.txt -``` -197 196 198 236 -93 93 384 362 363 43 -336 364 407 -421 322 -314 388 -128 58 -138 138 -46 46 46 -34 34 57 57 57 342 228 321 346 357 59 376 -110 110 -``` - -## 训练 - -具体的参数配置可运行 -``` -python train.py -h -``` -全词表cross entropy 训练代码 - -gpu 单机单卡训练 -``` bash -CUDA_VISIBLE_DEVICES=0 python train.py --train_dir train_data --use_cuda 1 --batch_size 50 --model_dir model_output -``` - -cpu 单机训练 
-``` bash -python train.py --train_dir train_data --use_cuda 0 --batch_size 50 --model_dir model_output -``` - -gpu 单机多卡训练 -``` bash -CUDA_VISIBLE_DEVICES=0,1 python train.py --train_dir train_data --use_cuda 1 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 2 -``` - -cpu 单机多卡训练 -``` bash -CPU_NUM=10 python train.py --train_dir train_data --use_cuda 0 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 10 -``` - -负采样 bayesian pairwise ranking loss(bpr loss) 训练 -``` -CUDA_VISIBLE_DEVICES=0 python train_sample_neg.py --loss bpr --use_cuda 1 -``` - -负采样 cross entropy 训练 -``` -CUDA_VISIBLE_DEVICES=0 python train_sample_neg.py --loss ce --use_cuda 1 -``` - -## 自定义网络结构 - -可在[net.py](./net.py) `network` 函数中调整网络结构,当前的网络结构如下: -```python -emb = fluid.layers.embedding( - input=src, - size=[vocab_size, hid_size], - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform( - low=init_low_bound, high=init_high_bound), - learning_rate=emb_lr_x), - is_sparse=True) - -fc0 = fluid.layers.fc(input=emb, - size=hid_size * 3, - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform( - low=init_low_bound, high=init_high_bound), - learning_rate=gru_lr_x)) -gru_h0 = fluid.layers.dynamic_gru( - input=fc0, - size=hid_size, - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform( - low=init_low_bound, high=init_high_bound), - learning_rate=gru_lr_x)) - -fc = fluid.layers.fc(input=gru_h0, - size=vocab_size, - act='softmax', - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Uniform( - low=init_low_bound, high=init_high_bound), - learning_rate=fc_lr_x)) - -cost = fluid.layers.cross_entropy(input=fc, label=dst) -acc = fluid.layers.accuracy(input=fc, label=dst, k=20) -``` - -## 训练结果示例 - -我们在Tesla K40m单GPU卡上训练的日志如下所示 -```text -epoch_1 start -step:100 ppl:441.468 -step:200 ppl:311.043 -step:300 ppl:218.952 -step:400 ppl:186.172 -step:500 ppl:188.600 -step:600 ppl:131.213 -step:700 ppl:165.770 -step:800 ppl:164.414 -step:900 ppl:156.470 -step:1000 ppl:174.201 -step:1100 ppl:118.619 -step:1200 ppl:122.635 -step:1300 ppl:118.220 -step:1400 ppl:90.372 -step:1500 ppl:135.018 -step:1600 ppl:114.327 -step:1700 ppl:141.806 -step:1800 ppl:93.416 -step:1900 ppl:92.897 -step:2000 ppl:121.703 -step:2100 ppl:96.288 -step:2200 ppl:88.355 -step:2300 ppl:101.737 -step:2400 ppl:95.934 -step:2500 ppl:86.158 -step:2600 ppl:80.925 -step:2700 ppl:202.219 -step:2800 ppl:106.828 -step:2900 ppl:91.458 -step:3000 ppl:105.988 -step:3100 ppl:87.067 -step:3200 ppl:92.651 -step:3300 ppl:101.145 -step:3400 ppl:91.247 -step:3500 ppl:107.656 -step:3600 ppl:89.410 -... -... -step:15700 ppl:76.819 -step:15800 ppl:62.257 -step:15900 ppl:81.735 -epoch:1 num_steps:15907 time_cost(s):4154.096032 -model saved in model_recall20/epoch_1 -... 
-```
-
-## 预测
-运行如下命令进行预测:全词表模型运行infer.py,负采样模型运行infer_sample_neg.py。
-
-```
-CUDA_VISIBLE_DEVICES=0 python infer.py --test_dir test_data/ --model_dir model_output/ --start_index 1 --last_index 10 --use_cuda 1
-```
-
-## 预测结果示例
-```text
-model:model_r@20/epoch_1 recall@20:0.613 time_cost(s):12.23
-model:model_r@20/epoch_2 recall@20:0.647 time_cost(s):12.33
-model:model_r@20/epoch_3 recall@20:0.662 time_cost(s):12.38
-model:model_r@20/epoch_4 recall@20:0.669 time_cost(s):12.21
-model:model_r@20/epoch_5 recall@20:0.673 time_cost(s):12.17
-model:model_r@20/epoch_6 recall@20:0.675 time_cost(s):12.26
-model:model_r@20/epoch_7 recall@20:0.677 time_cost(s):12.25
-model:model_r@20/epoch_8 recall@20:0.679 time_cost(s):12.37
-model:model_r@20/epoch_9 recall@20:0.680 time_cost(s):12.22
-model:model_r@20/epoch_10 recall@20:0.681 time_cost(s):12.2
-```
-
-
-## 多机训练
-厂内用户可以参考[wiki](http://wiki.baidu.com/pages/viewpage.action?pageId=628300529)利用paddlecloud配置多机环境。
-
-其他多机环境可参考cluster_train.py进行配置。
-
-运行如下命令在本地模拟多机场景:
-```
-sh cluster_train.sh
-```
-
-注意:本地模拟多机训练需要关闭代理。
+您好,该项目已被迁移,请移步到 [PaddleRec/gru4rec](../../../PaddleRec/gru4rec) 目录下浏览本项目。
diff --git a/fluid/PaddleRec/multiview_simnet/README.cn.md b/fluid/PaddleRec/multiview_simnet/README.cn.md
index 06df3c32c7996f5003bd7b9c1eb749f32c28b752..9cf8e27bba4775800498c25b550f7bb19479f074 100644
--- a/fluid/PaddleRec/multiview_simnet/README.cn.md
+++ b/fluid/PaddleRec/multiview_simnet/README.cn.md
@@ -1,27 +1,2 @@
-# 个性化推荐中的多视角Simnet模型
-## 介绍
-在个性化推荐场景中,推荐系统给用户提供的项目(Item)列表通常是通过个性化的匹配模型计算出来的。在现实世界中,一个用户可能有很多个视角的特征,比如用户Id、年龄、项目的点击历史等。一个项目,举例来说,新闻资讯,也会有多种视角的特征,比如新闻标题、新闻类别等。多视角Simnet模型是可以融合用户以及推荐项目的多个视角的特征并进行个性化匹配学习的一体化模型。这类模型在很多工业化的场景中都会被使用到,比如百度的Feed产品中。
-
-## 数据集
-目前,本项目使用机器生成的数据集来介绍多视角Simnet模型的概念,未来我们会逐渐加入真实世界中的数据集并在这个模型上进行效果验证。
-
-## 模型
-本项目的目标是提供一个在个性化匹配场景下利用Paddle搭建的模型。多视角Simnet模型包括多个编码器模块,每个编码器被用在不同的特征视角上。当前,项目中提供Bag-of-Embedding编码器、Temporal-Convolutional编码器和Gated-Recurrent-Unit编码器。我们会逐渐加入稀疏特征场景下比较实用的编码器到这个项目中。模型的训练当前采用的是Pairwise ranking模式,即针对一对具有关联的User-Item组合,随机选取一个Item作为负例进行排序学习。
-
-## 训练
-如下命令行可以获得训练工具的具体选项,`python train.py -h`内容可以参考说明
-```bash
-python train.py
-```
-## 预测
-如下命令行可以获得预测工具的具体选项,`python infer.py -h`内容可以参考说明
-```bash
-python infer.py
-```
-## 未来的工作
-- 多种pairwise的损失函数会被加入到这个项目中。对于不同视角的特征,用户-项目之间的匹配关系可以使用不同的损失函数进行联合优化。整个模型会在真实数据中进行验证。
-- Parallel Executor选项会被加入
-- 分布式训练能力会被加入
+您好,该项目已被迁移,请移步到 [PaddleRec/multiview_simnet](../../../PaddleRec/multiview_simnet) 目录下浏览本项目。
diff --git a/fluid/PaddleRec/multiview_simnet/README.md b/fluid/PaddleRec/multiview_simnet/README.md
index 525946e612592b97e10707cadf35e5252230c2bd..8fba8e606256ad7ad65ec429b68e967809bc6a51 100644
--- a/fluid/PaddleRec/multiview_simnet/README.md
+++ b/fluid/PaddleRec/multiview_simnet/README.md
@@ -1,27 +1,6 @@
-# Multi-view Simnet for Personalized recommendation
-## Introduction
-In personalized recommendation scenarios, a user is often provided with several items by a personalized interest matching model. In real-world applications, a user may have multiple views of features, such as user-id, age, click history of items, and search queries. An item, e.g. a news article, may also have multiple views of features such as news title, news category, and images in the news. Multi-view Simnet is a matching model that combines users' and items' multiple views of features into one unified model. The model can be used in many industrial products such as Baidu's feed news. The model is adapted from the paper A Multi-View Deep Learning (MV-DNN) Approach for Cross Domain User Modeling in Recommendation Systems, WWW 2015. The difference between our model and the MV-DNN is that we also consider multiple feature views of users.
+Hi!
-
-## Dataset
-Currently, a synthetic dataset is provided as a proof of concept, and we aim to add more real-world datasets to this project in the future.
+This directory has been deprecated.
-
-## Model
-This project aims to provide practical usage of Paddle in a personalized matching scenario. The model provides several encoder modules for different views of features. Currently, a Bag-of-Embedding encoder, a Temporal-Convolutional encoder, and a Gated-Recurrent-Unit encoder are provided. We will add more practical encoders for sparse features commonly used in recommender systems. The training algorithm used in this model is pairwise ranking, in which a negative item with multiple views is sampled for each positive user-item pair.
-
-## Train
-The command line options for training can be listed by `python train.py -h`
-```bash
-python train.py
-```
-
-## Infer
-The command line options for inference can be listed by `python infer.py -h`
-```bash
-python infer.py
-```
-
-## Future work
-- Multiple types of pairwise loss will be added to this project. For different views of features between a user and an item, multiple losses will be supported. The model will be verified on real-world datasets.
-- Parallel Executor will be added to this project
-- Distributed Training will be added
+Please visit the project at [PaddleRec/multiview_simnet](../../../PaddleRec/multiview_simnet).
diff --git a/fluid/PaddleRec/ssr/README.md b/fluid/PaddleRec/ssr/README.md
index d0b4dfb41b4cea19efa42c4a233c9544349d1770..15111907ccc21942c134a2a614ad341c37710272 100644
--- a/fluid/PaddleRec/ssr/README.md
+++ b/fluid/PaddleRec/ssr/README.md
@@ -1,52 +1,2 @@
-# Sequence Semantic Retrieval Model
-## Introduction
-In news recommendation scenarios, different from traditional systems that recommend entertainment items such as movies or music, there are several new problems to solve.
-- User profile features are very sparse: a user may log in to a news recommendation app anonymously, and a user is likely to read fresh news items.
-- News items are generated and disappear very fast compared with movies or music. Usually, there will be thousands of news items generated in a news recommendation app, and the consumption of news is also fast since users care about newly happened things.
-- User interests may change frequently in the news recommendation setting. The content of news will affect users' reading behaviors a lot, even if the category of the news does not belong to the user's long-term interests. In news recommendation, reading behaviors are determined by both the short-term and the long-term interests of users.
-
-[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec) models a user's short-term and long-term interests by applying a gated recurrent unit to the user's reading history. The generalization ability of the recurrent neural network captures the similarity of users' reading sequences, which alleviates the user profile sparsity problem. However, GRU4Rec operates on a closed domain of items: the model predicts which item a user will be interested in through a classification method. In news recommendation, news items change dynamically through time, so the GRU4Rec model cannot predict items that do not exist in the training dataset.
-
-The Sequence Semantic Retrieval (SSR) model shares a similar idea with Multi-Rate Deep Learning for Temporal Recommendation, SIGIR 2016. The SSR model has two components: one is the matching model part, and the other is the retrieval part.
-- The idea of SSR is to model a user's personalized interest in an item through a matching model structure, so that the representation of a news item can be computed online, even if the news item does not exist in the training dataset.
-- With the representations of news items, we are able to build a vector indexing service online for news prediction, and this is the retrieval part of SSR.
-
-## Dataset
-Dataset preprocessing follows the method of the [GRU4Rec Project](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec). Note that you should reuse the scripts from the GRU4Rec project for data preprocessing.
-
-## Training
-
-The command line options for training can be listed by `python train.py -h`
-
-GPU single-machine single-card training:
-``` bash
-CUDA_VISIBLE_DEVICES=0 python train.py --train_dir train_data --use_cuda 1 --batch_size 50 --model_dir model_output
-```
-
-CPU single-machine training:
-``` bash
-python train.py --train_dir train_data --use_cuda 0 --batch_size 50 --model_dir model_output
-```
-
-GPU single-machine multi-card training:
-``` bash
-CUDA_VISIBLE_DEVICES=0,1 python train.py --train_dir train_data --use_cuda 1 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 2
-```
-
-CPU single-machine multi-card training:
-``` bash
-CPU_NUM=10 python train.py --train_dir train_data --use_cuda 0 --parallel 1 --batch_size 50 --model_dir model_output --num_devices 10
-```
-
-Local simulation of distributed training:
-``` bash
-sh cluster_train.sh
-```
-
-## Inference
-
-GPU inference:
-``` bash
-CUDA_VISIBLE_DEVICES=0 python infer.py --test_dir test_data --use_cuda 1 --batch_size 50 --model_dir model_output
-```
+您好,该项目已被迁移,请移步到 [PaddleRec/ssr](../../../PaddleRec/ssr) 目录下浏览本项目。
diff --git a/fluid/PaddleRec/tagspace/README.md b/fluid/PaddleRec/tagspace/README.md
index 4263065bee2c5492684147f532e92c7c8083e16f..67e3f88f7a2245829d0efbfe23a6566a0745fe41 100644
--- a/fluid/PaddleRec/tagspace/README.md
+++ b/fluid/PaddleRec/tagspace/README.md
@@ -1,92 +1,2 @@
-# TagSpace
-以下是本例的简要目录结构及说明:
-
-```text
-.
-├── README.md            # 文档
-├── train.py             # 训练脚本
-├── infer.py             # 预测脚本
-├── net.py               # 网络结构
-├── text2paddle.py       # 文本数据转paddle数据
-├── cluster_train.py     # 多机训练
-├── cluster_train.sh     # 多机训练脚本
-├── utils                # 通用函数
-├── vocab_text.txt       # 小样本文本字典
-├── vocab_tag.txt        # 小样本类别字典
-├── train_data           # 小样本训练目录
-└── test_data            # 小样本测试目录
-
-```
-
-
-## 简介
-
-TagSpace模型的介绍可以参阅论文[#TagSpace: Semantic Embeddings from Hashtags](https://research.fb.com/publications/tagspace-semantic-embeddings-from-hashtags/)。
-
-TagSpace模型学习文本及标签的embedding表示,应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等。
-
-
-## 数据下载及预处理
-
-数据地址:[ag news dataset](https://github.com/mhjabreel/CharCNN/tree/master/data/)
-
-备份数据地址:[ag news dataset](https://paddle-tagspace.bj.bcebos.com/data.tar)
-
-数据格式如下:
-
-```
-"3","Wall St. Bears Claw Back Into the Black (Reuters)","Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again."
-```
-
-备份数据解压后,将文本数据转为paddle数据。先将数据分别放到训练数据目录和测试数据目录:
-```
-mv train.csv raw_big_train_data
-mv test.csv raw_big_test_data
-```
-
-再运行text2paddle.py脚本,生成paddle输入格式:
-```
-python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test_big_data big_vocab_text.txt big_vocab_tag.txt
-```
-
-## 单机训练
-'--use_cuda 1' 表示使用gpu,0表示使用cpu;'--parallel 1' 表示使用多卡。
-
-小数据训练(样例中的数据已经准备好,可跳过上一节的数据准备,直接运行命令)
-
-GPU 环境
-```
-CUDA_VISIBLE_DEVICES=0 python train.py --use_cuda 1
-```
-CPU 环境
-```
-python train.py
-```
-
-全量数据单机单卡训练
-```
-CUDA_VISIBLE_DEVICES=0 python train.py --use_cuda 1 --train_dir train_big_data/ --vocab_text_path big_vocab_text.txt --vocab_tag_path big_vocab_tag.txt --model_dir big_model --batch_size 500
-```
-全量数据单机多卡训练
-
-```
-python train.py --train_dir train_big_data/ --vocab_text_path big_vocab_text.txt --vocab_tag_path big_vocab_tag.txt --model_dir big_model --batch_size 500 --parallel 1
-```
-
-## 预测
-小数据预测
-```
-python infer.py
-```
-
-全量数据预测
-```
-python infer.py --model_dir big_model --vocab_tag_path big_vocab_tag.txt --test_dir test_big_data/
-```
-
-## 本地模拟多机
-运行命令
-```
-sh cluster_train.sh
-```
+您好,该项目已被迁移,请移步到 [PaddleRec/tagspace](../../../PaddleRec/tagspace) 目录下浏览本项目。
diff --git a/fluid/PaddleRec/word2vec/README.md b/fluid/PaddleRec/word2vec/README.md
index 936d9fac5860f7adf9fcc587334ecb2aebce1991..7504ff9c332bf86f606d6d8770cefb325fc29ce0 100644
--- a/fluid/PaddleRec/word2vec/README.md
+++ b/fluid/PaddleRec/word2vec/README.md
@@ -1,113 +1,2 @@
-# 基于skip-gram的word2vec模型
-以下是本例的简要目录结构及说明:
-
-```text
-.
-├── cluster_train.py    # 分布式训练函数
-├── cluster_train.sh    # 本地模拟多机脚本
-├── train.py            # 训练函数
-├── infer.py            # 预测脚本
-├── net.py              # 网络结构
-├── preprocess.py       # 预处理脚本,包括构建词典和预处理文本
-├── reader.py           # 训练阶段的文本读写
-├── README.md           # 使用说明
-└── utils.py            # 通用函数
-
-```
-
-## 介绍
-本例实现了skip-gram模式的word2vec模型。
-
-
-## 数据下载
-全量数据集使用的是来自1 Billion Word Language Model Benchmark(http://www.statmt.org/lm-benchmark) 的数据集。
-
-```bash
-wget http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
-tar xzvf 1-billion-word-language-modeling-benchmark-r13output.tar.gz
-mv 1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/ data/
-```
-
-备用数据地址下载命令如下
-
-```bash
-wget https://paddlerec.bj.bcebos.com/word2vec/1-billion-word-language-modeling-benchmark-r13output.tar
-tar xvf 1-billion-word-language-modeling-benchmark-r13output.tar
-mv 1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/ data/
-```
-
-为了方便快速验证,我们也提供了经典的text8样例数据集,包含1700w个词。下载命令如下
-
-```bash
-wget https://paddlerec.bj.bcebos.com/word2vec/text.tar
-tar xvf text.tar
-mv text data/
-```
-
-
-## 数据预处理
-以样例数据集为例进行预处理。全量数据集注意解压后以training-monolingual.tokenized.shuffled目录为预处理目录,和样例数据集的text目录并列。
-
-词典格式:词<空格>词频。注意低频词用'UNK'表示。
-
-可以按格式自建词典,如果自建词典则跳过第一步。
-```
-the 1061396
-of 593677
-and 416629
-one 411764
-in 372201
-a 325873
-UNK 324608
-to 316376
-zero 264975
-nine 250430
-```
-
-第一步根据英文语料生成词典,中文语料可以通过修改text_strip方法自定义处理方法。
-
-```bash
-python preprocess.py --build_dict --build_dict_corpus_dir data/text/ --dict_path data/test_build_dict
-```
-
-第二步根据词典将文本转成id,同时进行downsample,按照概率过滤常见词。
-
-```bash
-python preprocess.py --filter_corpus --dict_path data/test_build_dict --input_corpus_dir data/text/ --output_corpus_dir data/convert_text8 --min_count 5 --downsample 0.001
-```
-
-## 训练
-具体的参数配置说明可运行如下命令查看:
-
-
-```bash
-python train.py -h
-```
-
-单机多线程训练
-```bash
-OPENBLAS_NUM_THREADS=1 CPU_NUM=5 python train.py --train_data_dir data/convert_text8 --dict_path data/test_build_dict --num_passes 10 --batch_size 100 --model_output_dir v1_cpu5_b100_lr1dir --base_lr 1.0 --print_batch 1000 --with_speed --is_sparse
-```
-
-本地单机模拟多机训练
-
-```bash
-sh cluster_train.sh
-```
-
-## 预测
-测试集下载命令如下
-
-```bash
-#全量数据集测试集
-wget https://paddlerec.bj.bcebos.com/word2vec/test_dir.tar
-#样本数据集测试集
-wget https://paddlerec.bj.bcebos.com/word2vec/test_mid_dir.tar
-```
-
-预测命令,注意词典名称需要加后缀"_word_to_id_",此文件是训练阶段生成的。
-```bash
-python infer.py --infer_epoch --test_dir data/test_mid_dir/ --dict_path data/test_build_dict_word_to_id_ --batch_size 20000 --model_dir v1_cpu5_b100_lr1dir/ --start_index 0
-```
+您好,该项目已被迁移,请移步到 [PaddleRec/word2vec](../../../PaddleRec/word2vec) 目录下浏览本项目。
diff --git a/fluid/README.cn.rst b/fluid/README.cn.rst
deleted file mode 100644
index f75d7d0baa0a50cb8e7b5ab8325722a06d3c7ff0..0000000000000000000000000000000000000000
--- a/fluid/README.cn.rst
+++ /dev/null
@@ -1,204 +0,0 @@
-`Fluid 模型库 `__
-============
-
-图像分类
---------
-
-图像分类是根据图像的语义信息对不同类别图像进行区分,是计算机视觉中重要的基础问题,是物体检测、图像分割、物体跟踪、行为分析、人脸识别等其他高层视觉任务的基础,在许多领域都有着广泛的应用。如:安防领域的人脸识别和智能视频分析等,交通领域的交通场景识别,互联网领域基于内容的图像检索和相册自动归类,医学领域的图像识别等。
-
-在深度学习时代,图像分类的准确率大幅度提升,在图像分类任务中,我们向大家介绍了如何在经典的数据集ImageNet上,训练常用的模型,包括AlexNet、VGG、GoogLeNet、ResNet、Inception-v4、MobileNet、DPN(Dual
-Path
-Network)、SE-ResNeXt模型,也开源了\ `训练的模型 `__\ 方便用户下载使用。同时提供了能够将Caffe模型转换为PaddlePaddle
-Fluid模型配置和参数文件的工具。
-
-- `AlexNet `__
-- `VGG `__
-- `GoogleNet `__
-- `Residual
-  Network `__
-- `Inception-v4 `__
-- `MobileNet `__
-- `Dual Path
-  Network `__
-- `SE-ResNeXt `__
-- `Caffe模型转换为Paddle
-  Fluid配置和模型文件工具 `__
-
-目标检测
---------
-
-目标检测任务的目标是给定一张图像或是一个视频帧,让计算机找出其中所有目标的位置,并给出每个目标的具体类别。对于人类来说,目标检测是一个非常简单的任务。然而,计算机能够“看到”的是图像被编码之后的数字,很难理解图像或是视频帧中出现了人或是物体这样的高层语义概念,也就更加难以定位目标出现在图像中哪个区域。与此同时,由于目标会出现在图像或是视频帧中的任何位置,目标的形态千变万化,图像或是视频帧的背景千差万别,诸多因素都使得目标检测对计算机来说是一个具有挑战性的问题。
-
-在目标检测任务中,我们介绍了如何基于\ `PASCAL
-VOC `__\ 、\ `MS
-COCO `__\ 数据训练通用物体检测模型,当前介绍了SSD算法,SSD全称Single Shot
MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。 - -开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 `WIDER FACE `_ 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 `第一名 `_ 。 - -RCNN系列模型是典型的两阶段目标检测器,相较于传统提取区域的方法,RCNN中RPN网络通过共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。其中典型模型包括Faster RCNN和Mask RCNN。 - -- `Single Shot MultiBox - Detector `__ -- `Face Detector: PyramidBox `_ -- `RCNN `_ - -图像语义分割 ------------- - -图像语意分割顾名思义是将图像像素按照表达的语义含义的不同进行分组/分割,图像语义是指对图像内容的理解,例如,能够描绘出什么物体在哪里做了什么事情等,分割是指对图片中的每个像素点进行标注,标注属于哪一类别。近年来用在无人车驾驶技术中分割街景来避让行人和车辆、医疗影像分析中辅助诊断等。 - -在图像语义分割任务中,我们介绍如何基于图像级联网络(Image Cascade -Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准确率和速度。 - -- `ICNet `__ - -图像生成 ------------ - -图像生成是指根据输入向量,生成目标图像。这里的输入向量可以是随机的噪声或用户指定的条件向量。具体的应用场景有:手写体生成、人脸合成、风格迁移、图像修复等。当前的图像生成任务主要是借助生成对抗网络(GAN)来实现。 -生成对抗网络(GAN)由两种子网络组成:生成器和识别器。生成器的输入是随机噪声或条件向量,输出是目标图像。识别器是一个分类器,输入是一张图像,输出是该图像是否是真实的图像。在训练过程中,生成器和识别器通过不断的相互博弈提升自己的能力。 - -在图像生成任务中,我们介绍了如何使用DCGAN和ConditioanlGAN来进行手写数字的生成,另外还介绍了用于风格迁移的CycleGAN. - -- `DCGAN & ConditionalGAN `__ -- `CycleGAN `__ - -场景文字识别 ------------- - -许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 - -在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成字符识别。当前,介绍了CRNN-CTC模型和基于注意力机制的序列到序列模型。 - -- `CRNN-CTC模型 `__ -- `Attention模型 `__ - - -度量学习 -------- - - -度量学习也称作距离度量学习、相似度学习,通过学习对象之间的距离,度量学习能够用于分析对象时间的关联、比较关系,在实际问题中应用较为广泛,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域。以往,针对不同的任务,需要选择合适的特征并手动构建距离函数,而度量学习可根据不同的任务来自主学习出针对特定任务的度量距离函数。度量学习和深度学习的结合,在人脸识别/验证、行人再识别(human Re-ID)、图像检索等领域均取得较好的性能,在这个任务中我们主要介绍了基于Fluid的深度度量学习模型,包含了三元组、四元组等损失函数。 - -- `Metric Learning `__ - - -视频分类 -------- - -视频分类是视频理解任务的基础,与图像分类不同的是,分类的对象不再是静止的图像,而是一个由多帧图像构成的、包含语音数据、包含运动信息等的视频对象,因此理解视频需要获得更多的上下文信息,不仅要理解每帧图像是什么、包含什么,还需要结合不同帧,知道上下文的关联信息。视频分类方法主要包含基于卷积神经网络、基于循环神经网络、或将这两者结合的方法。该任务中我们介绍基于Fluid的视频分类模型,目前包含Temporal Segment Network(TSN)模型,后续会持续增加更多模型。 - - -- `TSN `__ - - - -语音识别 --------- - -自动语音识别(Automatic Speech Recognition, -ASR)是将人类声音中的词汇内容转录成计算机可输入的文字的技术。语音识别的相关研究经历了漫长的探索过程,在HMM/GMM模型之后其发展一直较为缓慢,随着深度学习的兴起,其迎来了春天。在多种语言识别任务中,将深度神经网络(DNN)作为声学模型,取得了比GMM更好的性能,使得 -ASR -成为深度学习应用最为成功的领域之一。而由于识别准确率的不断提高,有越来越多的语言技术产品得以落地,例如语言输入法、以智能音箱为代表的智能家居设备等 -—— 基于语言的交互方式正在深刻的改变人类的生活。 - -与 `DeepSpeech `__ -中深度学习模型端到端直接预测字词的分布不同,本实例更接近传统的语言识别流程,以音素为建模单元,关注语言识别中声学模型的训练,利用\ `kaldi `__\ 进行音频数据的特征提取和标签对齐,并集成 -kaldi 的解码器完成解码。 - -- `DeepASR `__ - -机器翻译 --------- - -机器翻译(Machine -Translation)将一种自然语言(源语言)转换成一种自然语言(目标语言),是自然语言处理中非常基础和重要的研究方向。在全球化的浪潮中,机器翻译在促进跨语言文明的交流中所起的重要作用是不言而喻的。其发展经历了统计机器翻译和基于神经网络的神经机器翻译(Nueural -Machine Translation, NMT)等阶段。在 NMT -成熟后,机器翻译才真正得以大规模应用。而早阶段的 NMT -主要是基于循环神经网络 RNN -的,其训练过程中当前时间步依赖于前一个时间步的计算,时间步之间难以并行化以提高训练速度。因此,非 -RNN 结构的 NMT 得以应运而生,例如基于卷积神经网络 CNN -的结构和基于自注意力机制(Self-Attention)的结构。 - -本实例所实现的 Transformer -就是一个基于自注意力机制的机器翻译模型,其中不再有RNN或CNN结构,而是完全利用 -Attention 学习语言中的上下文依赖。相较于RNN/CNN, -这种结构在单层内计算复杂度更低、易于并行化、对长程依赖更易建模,最终在多种语言之间取得了最好的翻译效果。 - -- `Transformer `__ - -强化学习 --------- - -强化学习是近年来一个愈发重要的机器学习方向,特别是与深度学习相结合而形成的深度强化学习(Deep -Reinforcement Learning, -DRL),取得了很多令人惊异的成就。人们所熟知的战胜人类顶级围棋职业选手的 -AlphaGo 就是 DRL -应用的一个典型例子,除游戏领域外,其它的应用还包括机器人、自然语言处理等。 - -深度强化学习的开山之作是在Atari视频游戏中的成功应用, -其可直接接受视频帧这种高维输入并根据图像内容端到端地预测下一步的动作,所用到的模型被称为深度Q网络(Deep -Q-Network, DQN)。本实例就是利用PaddlePaddle Fluid这个灵活的框架,实现了 -DQN 及其变体,并测试了它们在 Atari 游戏中的表现。 - -- `DeepQNetwork `__ - -中文词法分析 ------------- - -中文分词(Word 
Segmentation)是将连续的自然语言文本,切分出具有语义合理性和完整性的词汇序列的过程。因为在汉语中,词是承担语义的最基本单位,切词是文本分类、情感分析、信息检索等众多自然语言处理任务的基础。 词性标注(Part-of-speech Tagging)是为自然语言文本中的每一个词汇赋予一个词性的过程,这里的词性包括名词、动词、形容词、副词等等。 命名实体识别(Named Entity Recognition,NER)又称作“专名识别”,是指识别自然语言文本中具有特定意义的实体,主要包括人名、地名、机构名、专有名词等。 我们将这三个任务统一成一个联合任务,称为词法分析任务,基于深度神经网络,利用海量标注语料进行训练,提供了一个端到端的解决方案。 - -我们把这个联合的中文词法分析解决方案命名为LAC。LAC既可以认为是Lexical Analysis of Chinese的首字母缩写,也可以认为是LAC Analyzes Chinese的递归缩写。 - -- `LAC `__ - -情感倾向分析 ------------- - -情感倾向分析针对带有主观描述的中文文本,可自动判断该文本的情感极性类别并给出相应的置信度。情感类型分为积极、消极、 中性。情感倾向分析能够帮助企业理解用户消费习惯、分析热点话题和危机舆情监控,为企业提供有力的决策支持。本次我们开放 AI开放平台中情感倾向分析采用的\ `模型 `__\, 提供给用户使用。 - -- `Senta `__ - -语义匹配 --------- - -在自然语言处理很多场景中,需要度量两个文本在语义上的相似度,这类任务通常被称为语义匹配。例如在搜索中根据查询与候选文档的相似度对搜索结果进行排序,文本去重中文本与文本相似度的计算,自动问答中候选答案与问题的匹配等。 - -本例所开放的DAM (Deep Attention Matching Network)为百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择。DAM受Transformer的启发,其网络结构完全基于注意力(attention)机制,利用栈式的self-attention结构分别学习不同粒度下应答和语境的语义表示,然后利用cross-attention获取应答与语境之间的相关性,在两个大规模多轮对话数据集上的表现均好于其它模型。 - -- `Deep Attention Matching Network `__ - -AnyQ ----- - -`AnyQ `__\ (ANswer Your Questions) -开源项目主要包含面向FAQ集合的问答系统框架、文本语义匹配工具SimNet。 -问答系统框架采用了配置化、插件化的设计,各功能均通过插件形式加入,当前共开放了20+种插件。开发者可以使用AnyQ系统快速构建和定制适用于特定业务场景的FAQ问答系统,并加速迭代和升级。 - -SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架,该框架在百度各产品上广泛应用,主要包括BOW、CNN、RNN、MM-DNN等核心网络结构形式,同时基于该框架也集成了学术界主流的语义匹配模型,如MatchPyramid、MV-LSTM、K-NRM等模型。使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力。 - -- `SimNet in PaddlePaddle - Fluid `__ - -机器阅读理解 ----- - -机器阅读理解(MRC)是自然语言处理(NLP)中的核心任务之一,最终目标是让机器像人类一样阅读文本,提炼文本信息并回答相关问题。深度学习近年来在NLP中得到广泛使用,也使得机器阅读理解能力在近年有了大幅提高,但是目前研究的机器阅读理解都采用人工构造的数据集,以及回答一些相对简单的问题,和人类处理的数据还有明显差距,因此亟需大规模真实训练数据推动MRC的进一步发展。 - -百度阅读理解数据集是由百度自然语言处理部开源的一个真实世界数据集,所有的问题、原文都来源于实际数据(百度搜索引擎数据和百度知道问答社区),答案是由人类回答的。每个问题都对应多个答案,数据集包含200k问题、1000k原文和420k答案,是目前最大的中文MRC数据集。百度同时开源了对应的阅读理解模型,称为DuReader,采用当前通用的网络分层结构,通过双向attention机制捕捉问题和原文之间的交互关系,生成query-aware的原文表示,最终基于query-aware的原文表示通过point network预测答案范围。 - -- `DuReader in PaddlePaddle Fluid `__ - - -个性化推荐 -------- - -推荐系统在当前的互联网服务中正在发挥越来越大的作用,目前大部分电子商务系统、社交网络,广告推荐,搜索引擎,都不同程度的使用了各种形式的个性化推荐技术,帮助用户快速找到他们想要的信息。 - -在工业可用的推荐系统中,推荐策略一般会被划分为多个模块串联执行。以新闻推荐系统为例,存在多个可以使用深度学习技术的环节,例如新闻的自动化标注,个性化新闻召回,个性化匹配与排序等。PaddlePaddle对推荐算法的训练提供了完整的支持,并提供了多种模型配置供用户选择。 - -- `TagSpace `_ -- `GRU4Rec `_ -- `SequenceSemanticRetrieval `_ -- `DeepCTR `_ -- `Multiview-Simnet `_ diff --git a/fluid/README.md b/fluid/README.md deleted file mode 100644 index 788ec1c109150a9359c8637090b30011d6485af8..0000000000000000000000000000000000000000 --- a/fluid/README.md +++ /dev/null @@ -1,172 +0,0 @@ -Fluid 模型库 -============ - -图像分类 --------- - -图像分类是根据图像的语义信息对不同类别图像进行区分,是计算机视觉中重要的基础问题,是物体检测、图像分割、物体跟踪、行为分析、人脸识别等其他高层视觉任务的基础,在许多领域都有着广泛的应用。如:安防领域的人脸识别和智能视频分析等,交通领域的交通场景识别,互联网领域基于内容的图像检索和相册自动归类,医学领域的图像识别等。 - -在深度学习时代,图像分类的准确率大幅度提升,在图像分类任务中,我们向大家介绍了如何在经典的数据集ImageNet上,训练常用的模型,包括AlexNet、VGG、GoogLeNet、ResNet、Inception-v4、MobileNet、DPN(Dual Path Network)、SE-ResNeXt模型,也开源了[训练的模型](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/image_classification/README_cn.md#已有模型及其性能) 方便用户下载使用。同时提供了能够将Caffe模型转换为PaddlePaddle -Fluid模型配置和参数文件的工具。 - -- [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [VGG](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- 
[Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [MobileNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [Dual Path Network](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [SE-ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models) -- [Caffe模型转换为Paddle Fluid配置和模型文件工具](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/caffe2fluid) - -目标检测 --------- - -目标检测任务的目标是给定一张图像或是一个视频帧,让计算机找出其中所有目标的位置,并给出每个目标的具体类别。对于人类来说,目标检测是一个非常简单的任务。然而,计算机能够“看到”的是图像被编码之后的数字,很难解图像或是视频帧中出现了人或是物体这样的高层语义概念,也就更加难以定位目标出现在图像中哪个区域。与此同时,由于目标会出现在图像或是视频帧中的任何位置,目标的形态千变万化,图像或是视频帧的背景千差万别,诸多因素都使得目标检测对计算机来说是一个具有挑战性的问题。 - -在目标检测任务中,我们介绍了如何基于[PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 、[MS COCO](http://cocodataset.org/#home) 数据训练通用物体检测模型,当前介绍了SSD算法,SSD全称Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。 - -开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 [第一名](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html)。 - -Faster RCNN模型是典型的两阶段目标检测器,相较于传统提取区域的方法,通过RPN网络共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。 - -Mask RCNN模型是基于Faster RCNN模型的经典实例分割模型,在原有Faster RCNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕。 - -- [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/object_detection/README_cn.md) -- [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/face_detection/README_cn.md) -- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md) -- [Mask RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/rcnn/README_cn.md) - -图像语义分割 ------------- - -图像语意分割顾名思义是将图像像素按照表达的语义含义的不同进行分组/分割,图像语义是指对图像内容的理解,例如,能够描绘出什么物体在哪里做了什么事情等,分割是指对图片中的每个像素点进行标注,标注属于哪一类别。近年来用在无人车驾驶技术中分割街景来避让行人和车辆、医疗影像分析中辅助诊断等。 - -在图像语义分割任务中,我们介绍如何基于图像级联网络(Image Cascade -Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准确率和速度。 - -- [ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/icnet) - -图像生成 ------------ - -图像生成是指根据输入向量,生成目标图像。这里的输入向量可以是随机的噪声或用户指定的条件向量。具体的应用场景有:手写体生成、人脸合成、风格迁移、图像修复等。当前的图像生成任务主要是借助生成对抗网络(GAN)来实现。 -生成对抗网络(GAN)由两种子网络组成:生成器和识别器。生成器的输入是随机噪声或条件向量,输出是目标图像。识别器是一个分类器,输入是一张图像,输出是该图像是否是真实的图像。在训练过程中,生成器和识别器通过不断的相互博弈提升自己的能力。 - -在图像生成任务中,我们介绍了如何使用DCGAN和ConditioanlGAN来进行手写数字的生成,另外还介绍了用于风格迁移的CycleGAN. 
- -- [DCGAN & ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/c_gan) -- [CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/gan/cycle_gan) - -场景文字识别 ------------- - -许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 - -在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成字符识别。当前,介绍了CRNN-CTC模型和基于注意力机制的序列到序列模型。 - -- [CRNN-CTC模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition) -- [Attention模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/ocr_recognition) - - -度量学习 -------- - - -度量学习也称作距离度量学习、相似度学习,通过学习对象之间的距离,度量学习能够用于分析对象时间的关联、比较关系,在实际问题中应用较为广泛,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域。以往,针对不同的任务,需要选择合适的特征并手动构建距离函数,而度量学习可根据不同的任务来自主学习出针对特定任务的度量距离函数。度量学习和深度学习的结合,在人脸识别/验证、行人再识别(human Re-ID)、图像检索等领域均取得较好的性能,在这个任务中我们主要介绍了基于Fluid的深度度量学习模型,包含了三元组、四元组等损失函数。 - -- [Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/metric_learning) - - -视频分类 -------- - -视频分类是视频理解任务的基础,与图像分类不同的是,分类的对象不再是静止的图像,而是一个由多帧图像构成的、包含语音数据、包含运动信息等的视频对象,因此理解视频需要获得更多的上下文信息,不仅要理解每帧图像是什么、包含什么,还需要结合不同帧,知道上下文的关联信息。视频分类方法主要包含基于卷积神经网络、基于循环神经网络、或将这两者结合的方法。该任务中我们介绍基于Fluid的视频分类模型,目前包含Temporal Segment Network(TSN)模型,后续会持续增加更多模型。 - - -- [TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/video_classification) - - -语音识别 --------- - -自动语音识别(Automatic Speech Recognition, ASR)是将人类声音中的词汇内容转录成计算机可输入的文字的技术。语音识别的相关研究经历了漫长的探索过程,在HMM/GMM模型之后其发展一直较为缓慢,随着深度学习的兴起,其迎来了春天。在多种语言识别任务中,将深度神经网络(DNN)作为声学模型,取得了比GMM更好的性能,使得 ASR 成为深度学习应用最为成功的领域之一。而由于识别准确率的不断提高,有越来越多的语言技术产品得以落地,例如语言输入法、以智能音箱为代表的智能家居设备等 — 基于语言的交互方式正在深刻的改变人类的生活。 - -与 [DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech) 中深度学习模型端到端直接预测字词的分布不同,本实例更接近传统的语言识别流程,以音素为建模单元,关注语言识别中声学模型的训练,利用[kaldi](http://www.kaldi-asr.org) 进行音频数据的特征提取和标签对齐,并集成 kaldi 的解码器完成解码。 - -- [DeepASR](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepASR/README_cn.md) - -机器翻译 --------- - -机器翻译(Machine Translation)将一种自然语言(源语言)转换成一种自然语言(目标语言),是自然语言处理中非常基础和重要的研究方向。在全球化的浪潮中,机器翻译在促进跨语言文明的交流中所起的重要作用是不言而喻的。其发展经历了统计机器翻译和基于神经网络的神经机器翻译(Nueural -Machine Translation, NMT)等阶段。在 NMT 成熟后,机器翻译才真正得以大规模应用。而早阶段的 NMT 主要是基于循环神经网络 RNN 的,其训练过程中当前时间步依赖于前一个时间步的计算,时间步之间难以并行化以提高训练速度。因此,非 RNN 结构的 NMT 得以应运而生,例如基 卷积神经网络 CNN 的结构和基于自注意力机制(Self-Attention)的结构。 - -本实例所实现的 Transformer 就是一个基于自注意力机制的机器翻译模型,其中不再有RNN或CNN结构,而是完全利用 Attention 学习语言中的上下文依赖。相较于RNN/CNN, 这种结构在单层内计算复杂度更低、易于并行化、对长程依赖更易建模,最终在多种语言之间取得了最好的翻译效果。 - -- [Transformer](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/neural_machine_translation/transformer/README_cn.md) - -强化学习 --------- - -强化学习是近年来一个愈发重要的机器学习方向,特别是与深度学习相结合而形成的深度强化学习(Deep Reinforcement Learning, DRL),取得了很多令人惊异的成就。人们所熟知的战胜人类顶级围棋职业选手的 AlphaGo 就是 DRL 应用的一个典型例子,除游戏领域外,其它的应用还包括机器人、自然语言处理等。 - -深度强化学习的开山之作是在Atari视频游戏中的成功应用, 其可直接接受视频帧这种高维输入并根据图像内容端到端地预测下一步的动作,所用到的模型被称为深度Q网络(Deep Q-Network, DQN)。本实例就是利用PaddlePaddle Fluid这个灵活的框架,实现了 DQN 及其变体,并测试了它们在 Atari 游戏中的表现。 - -- [DeepQNetwork](https://github.com/PaddlePaddle/models/blob/develop/fluid/DeepQNetwork/README_cn.md) - -中文词法分析 ------------- - -中文分词(Word Segmentation)是将连续的自然语言文本,切分出具有语义合理性和完整性的词汇序列的过程。因为在汉语中,词是承担语义的最基本单位,切词是文本分类、情感分析、信息检索等众多自然语言处理任务的基础。 词性标注(Part-of-speech Tagging)是为自然语言文本中的每一个词汇赋予一个词性的过程,这里的词性包括名词、动词、形容词、副词等等。 命名实体识别(Named Entity Recognition,NER)又称作“专名识别”,是指识别自然语言文本中具有特定意义的实体,主要包括人名、地名、机构名、专有名词等。 
我们将这三个任务统一成一个联合任务,称为词法分析任务,基于深度神经网络,利用海量标注语料进行训练,提供了一个端到端的解决方案。 - -我们把这个联合的中文词法分析解决方案命名为LAC。LAC既可以认为是Lexical Analysis of Chinese的首字母缩写,也可以认为是LAC Analyzes Chinese的递归缩写。 - -- [LAC](https://github.com/baidu/lac/blob/master/README.md) - -情感倾向分析 ------------- - -情感倾向分析针对带有主观描述的中文文本,可自动判断该文本的情感极性类别并给出相应的置信度。情感类型分为积极、消极、中性。情感倾向分析能够帮助企业理解用户消费习惯、分析热点话题和危机舆情监控,为企业提供有力的决策支持。本次我们开放 AI 开放平台中情感倾向分析采用的[模型](http://ai.baidu.com/tech/nlp/sentiment_classify),提供给用户使用。 - -- [Senta](https://github.com/baidu/Senta/blob/master/README.md) - -语义匹配 --------- - -在自然语言处理很多场景中,需要度量两个文本在语义上的相似度,这类任务通常被称为语义匹配。例如在搜索中根据查询与候选文档的相似度对搜索结果进行排序,文本去重中文本与文本相似度的计算,自动问答中候选答案与问题的匹配等。 - -本例所开放的DAM (Deep Attention Matching Network)为百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择。DAM受Transformer的启发,其网络结构完全基于注意力(attention)机制,利用栈式的self-attention结构分别学习不同粒度下应答和语境的语义表示,然后利用cross-attention获取应答与语境之间的相关性,在两个大规模多轮对话数据集上的表现均好于其它模型。 - -- [Deep Attention Matching Network](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleNLP/deep_attention_matching_net) - -AnyQ ----- - -[AnyQ](https://github.com/baidu/AnyQ)(ANswer Your Questions) 开源项目主要包含面向FAQ集合的问答系统框架、文本语义匹配工具SimNet。 问答系统框架采用了配置化、插件化的设计,各功能均通过插件形式加入,当前共开放了20+种插件。开发者可以使用AnyQ系统快速构建和定制适用于特定业务场景的FAQ问答系统,并加速迭代和升级。 - -SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架,该框架在百度各产品上广泛应用,主要包括BOW、CNN、RNN、MM-DNN等核心网络结构形式,同时基于该框架也集成了学术界主流的语义匹配模型,如MatchPyramid、MV-LSTM、K-NRM等模型。使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力。 - -- [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md) - -机器阅读理解 ----------- - -机器阅读理解(MRC)是自然语言处理(NLP)中的核心任务之一,最终目标是让机器像人类一样阅读文本,提炼文本信息并回答相关问题。深度学习近年来在NLP中得到广泛使用,也使得机器阅读理解能力在近年有了大幅提高,但是目前研究的机器阅读理解都采用人工构造的数据集,以及回答一些相对简单的问题,和人类处理的数据还有明显差距,因此亟需大规模真实训练数据推动MRC的进一步发展。 - -百度阅读理解数据集是由百度自然语言处理部开源的一个真实世界数据集,所有的问题、原文都来源于实际数据(百度搜索引擎数据和百度知道问答社区),答案是由人类回答的。每个问题都对应多个答案,数据集包含200k问题、1000k原文和420k答案,是目前最大的中文MRC数据集。百度同时开源了对应的阅读理解模型,称为DuReader,采用当前通用的网络分层结构,通过双向attention机制捕捉问题和原文之间的交互关系,生成query-aware的原文表示,最终基于query-aware的原文表示通过point network预测答案范围。 - -- [DuReader in PaddlePaddle Fluid](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleNLP/machine_reading_comprehension/README.md) - -个性化推荐 -------- - -推荐系统在当前的互联网服务中正在发挥越来越大的作用,目前大部分电子商务系统、社交网络,广告推荐,搜索引擎,都不同程度的使用了各种形式的个性化推荐技术,帮助用户快速找到他们想要的信息。 - -在工业可用的推荐系统中,推荐策略一般会被划分为多个模块串联执行。以新闻推荐系统为例,存在多个可以使用深度学习技术的环节,例如新闻的自动化标注,个性化新闻召回,个性化匹配与排序等。PaddlePaddle对推荐算法的训练提供了完整的支持,并提供了多种模型配置供用户选择。 - -- [TagSpace](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/tagspace) -- [GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec) -- [SequenceSemanticRetrieval](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/ssr) -- [DeepCTR](https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleRec/ctr/README.cn.md) -- [Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/multiview_simnet) diff --git a/fluid/adversarial/README.md b/fluid/adversarial/README.md index 91661f7e1675d59c7d38c4c09bc67d5b9339573d..b43046d174c6fa7cc9517c043601d5a86e53604a 100644 --- a/fluid/adversarial/README.md +++ b/fluid/adversarial/README.md @@ -1,112 +1,6 @@ -The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). ---- +Hi! -# Advbox +This directory has been deprecated. 
-
-Advbox is a toolbox to generate adversarial examples that fool neural networks, and it can benchmark the robustness of machine learning models.
-
-Advbox is based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) Fluid and is under continual development, always welcoming contributions of the latest methods of adversarial attacks and defenses.
-
-
-## Overview
-[Szegedy et al.](https://arxiv.org/abs/1312.6199) first discovered an intriguing property of deep neural networks in the context of image classification. They showed that, despite their state-of-the-art performance, deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations to images that remain (almost) imperceptible to the human vision system. Such perturbations are found by optimizing the input to maximize the prediction error, and the images modified by these perturbations are called `adversarial examples`. The profound implications of these results triggered wide interest among researchers in adversarial attacks and their defenses for deep learning in general.
-
-Advbox is similar to [Foolbox](https://github.com/bethgelab/foolbox) and [CleverHans](https://github.com/tensorflow/cleverhans). CleverHans only supports the TensorFlow framework, while Foolbox interfaces with many popular machine learning frameworks such as PyTorch, Keras, TensorFlow, Theano, Lasagne and MXNet. However, these two great libraries don't support PaddlePaddle, an easy-to-use, efficient, flexible and scalable deep learning platform originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
-
-## Usage
-Advbox provides many stable reference implementations of modern methods to generate adversarial examples, such as FGSM, DeepFool and JSMA. When you want to benchmark the robustness of your neural networks, you can use Advbox to generate adversarial examples and benchmark the networks. Some tips for using Advbox:
-
-1. Train a model and save the parameters.
-2. Load the trained parameters, then reconstruct the model.
-3. Use Advbox to generate the adversarial samples.
-
-
-#### Dependencies
-* PaddlePaddle: [the latest develop branch](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)
-* Python 2.x
-
-#### Structure
-
-Network models, implementations of attack methods, and the criterion that defines adversarial examples are the three essential elements needed to generate adversarial examples. For brevity, misclassification is adopted as the adversarial criterion in Advbox.
-
-The structure of the Advbox module is as follows:
-
-    .
-    ├── advbox
-    |   ├── __init__.py
-    |   ├── attack
-    |        ├── __init__.py
-    |        ├── base.py
-    |        ├── deepfool.py
-    |        ├── gradient_method.py
-    |        ├── lbfgs.py
-    |        └── saliency.py
-    |   ├── models
-    |        ├── __init__.py
-    |        ├── base.py
-    |        └── paddle.py
-    |   └── adversary.py
-    ├── tutorials
-    |   ├── __init__.py
-    |   ├── mnist_model.py
-    |   ├── mnist_tutorial_lbfgs.py
-    |   ├── mnist_tutorial_fgsm.py
-    |   ├── mnist_tutorial_bim.py
-    |   ├── mnist_tutorial_ilcm.py
-    |   ├── mnist_tutorial_mifgsm.py
-    |   ├── mnist_tutorial_jsma.py
-    |   └── mnist_tutorial_deepfool.py
-    └── README.md
-
-**advbox.attack**
-
-Advbox implements several popular adversarial attacks which search for adversarial examples. Each attack method uses a distance measure (L1, L2, etc.) to quantify the size of adversarial perturbations. It is easy to craft an adversarial example with Advbox, as some attack methods perform internal hyperparameter tuning to find the minimum perturbation.
-
-**advbox.model**
-
-Advbox implements interfaces to PaddlePaddle. Additionally, other deep learning frameworks such as TensorFlow can also be defined and employed. The module is used to compute predictions and gradients for given inputs in a specific framework.
-
-**advbox.adversary**
-
-Adversary contains the original object, the target and the adversarial examples. It provides misclassification as the criterion to accept an adversarial example.
-
-## Tutorials
-The `./tutorials/` folder provides some tutorials to generate adversarial examples on the MNIST dataset. You can slightly modify the code to apply it to other datasets. These attack methods are supported in Advbox:
-
-* [L-BFGS](https://arxiv.org/abs/1312.6199)
-* [FGSM](https://arxiv.org/abs/1412.6572)
-* [BIM](https://arxiv.org/abs/1607.02533)
-* [ILCM](https://arxiv.org/abs/1607.02533)
-* [MI-FGSM](https://arxiv.org/pdf/1710.06081.pdf)
-* [JSMA](https://arxiv.org/pdf/1511.07528)
-* [DeepFool](https://arxiv.org/abs/1511.04599)
-
-## Testing
-Benchmarks on a vanilla CNN model.
-
-> MNIST
-
-| adversarial attacks | fooling rate (non-targeted) | fooling rate (targeted) | max_epsilon | iterations | Strength |
-|:-----:| :----: | :---: | :----: | :----: | :----: |
-|L-BFGS| --- | 89.2% | --- | One shot | *** |
-|FGSM| 57.8% | 26.55% | 0.3 | One shot| *** |
-|BIM| 97.4% | --- | 0.1 | 100 | **** |
-|ILCM| --- | 100.0% | 0.1 | 100 | **** |
-|MI-FGSM| 94.4% | 100.0% | 0.1 | 100 | **** |
-|JSMA| 96.8% | 90.4%| 0.1 | 2000 | *** |
-|DeepFool| 97.7% | 51.3% | --- | 100 | **** |
-
-* The strength (higher for more asterisks) is based on the impression from the reviewed literature.
-
----
-## References
-* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199), C. Szegedy et al., arxiv 2014
-* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572), I. Goodfellow et al., ICLR 2015
-* [Adversarial Examples In The Physical World](https://arxiv.org/pdf/1607.02533v3.pdf), A. Kurakin et al., ICLR workshop 2017
-* [Boosting Adversarial Attacks with Momentum](https://arxiv.org/abs/1710.06081), Yinpeng Dong et al., arxiv 2018
-* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528), N. Papernot et al., ESSP 2016
-* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/abs/1511.04599), S. Moosavi-Dezfooli et al., CVPR 2016
-* [Foolbox: A Python toolbox to benchmark the robustness of machine learning models](https://arxiv.org/abs/1707.04131), Jonas Rauber et al., arxiv 2018
-* [CleverHans: An adversarial example library for constructing attacks, building defenses, and benchmarking both](https://github.com/tensorflow/cleverhans#setting-up-cleverhans)
-* [Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey](https://arxiv.org/abs/1801.00553), Naveed Akhtar, Ajmal Mian, arxiv 2018
+Please visit the project at [PaddleCV/adversarial](../../PaddleCV/adversarial).
diff --git a/fluid/mnist/.run_ce.sh b/fluid/mnist/.run_ce.sh
deleted file mode 100755
index d6ccf429b52da1ff26ac02df5af287461a823a98..0000000000000000000000000000000000000000
--- a/fluid/mnist/.run_ce.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/bash
-
-# This file is only used for continuous evaluation.
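-# It cleans up stale KPI files, runs a short CPU training job with model.py,
-# and pipes the "kpis ..." log lines into _ce.py for KPI tracking.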
- -rm -rf *_factor.txt -model_file='model.py' -python $model_file --batch_size 128 --pass_num 5 --device CPU | python _ce.py diff --git a/fluid/mnist/_ce.py b/fluid/mnist/_ce.py deleted file mode 100644 index 9c2dba53526d2e976252fce05c7ff7f0f44b39b2..0000000000000000000000000000000000000000 --- a/fluid/mnist/_ce.py +++ /dev/null @@ -1,61 +0,0 @@ -# this file is only used for continuous evaluation test! - -import os -import sys -sys.path.append(os.environ['ceroot']) -from kpi import CostKpi, DurationKpi, AccKpi - -# NOTE kpi.py should shared in models in some way!!!! - -train_cost_kpi = CostKpi('train_cost', 0.02, actived=True) -test_acc_kpi = AccKpi('test_acc', 0.005, actived=True) -train_duration_kpi = DurationKpi('train_duration', 0.06, actived=True) -train_acc_kpi = AccKpi('train_acc', 0.005, actived=True) - -tracking_kpis = [ - train_acc_kpi, - train_cost_kpi, - test_acc_kpi, - train_duration_kpi, -] - - -def parse_log(log): - ''' - This method should be implemented by model developers. - - The suggestion: - - each line in the log should be key, value, for example: - - " - train_cost\t1.0 - test_cost\t1.0 - train_cost\t1.0 - train_cost\t1.0 - train_acc\t1.2 - " - ''' - for line in log.split('\n'): - fs = line.strip().split('\t') - print(fs) - if len(fs) == 3 and fs[0] == 'kpis': - kpi_name = fs[1] - kpi_value = float(fs[2]) - yield kpi_name, kpi_value - - -def log_to_ce(log): - kpi_tracker = {} - for kpi in tracking_kpis: - kpi_tracker[kpi.name] = kpi - - for (kpi_name, kpi_value) in parse_log(log): - print(kpi_name, kpi_value) - kpi_tracker[kpi_name].add_record(kpi_value) - kpi_tracker[kpi_name].persist() - - -if __name__ == '__main__': - log = sys.stdin.read() - log_to_ce(log) diff --git a/fluid/mnist/model.py b/fluid/mnist/model.py deleted file mode 100644 index a66353c2239fd78eb1fdf9f08690994a9a7d1c08..0000000000000000000000000000000000000000 --- a/fluid/mnist/model.py +++ /dev/null @@ -1,198 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import argparse -import time - -import paddle -import paddle.fluid as fluid -import paddle.fluid.profiler as profiler -import six - -SEED = 90 -DTYPE = "float32" - -# random seed must set before configuring the network. 
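-# Seeding the startup program fixes the random parameter initialization,
-# so repeated CE runs start from comparable weights.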
-fluid.default_startup_program().random_seed = SEED - - -def parse_args(): - parser = argparse.ArgumentParser("mnist model benchmark.") - parser.add_argument( - '--batch_size', type=int, default=128, help='The minibatch size.') - parser.add_argument( - '--iterations', type=int, default=35, help='The number of minibatches.') - parser.add_argument( - '--pass_num', type=int, default=5, help='The number of passes.') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type.') - parser.add_argument( - '--infer_only', action='store_true', help='If set, run forward only.') - parser.add_argument( - '--use_cprof', action='store_true', help='If set, use cProfile.') - parser.add_argument( - '--use_nvprof', - action='store_true', - help='If set, use nvprof for CUDA.') - args = parser.parse_args() - return args - - -def print_arguments(args): - vars(args)['use_nvprof'] = (vars(args)['use_nvprof'] and - vars(args)['device'] == 'GPU') - print('----------- Configuration Arguments -----------') - for arg, value in sorted(six.iteritems(vars(args))): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def cnn_model(data): - conv_pool_1 = fluid.nets.simple_img_conv_pool( - input=data, - filter_size=5, - num_filters=20, - pool_size=2, - pool_stride=2, - act="relu") - conv_pool_2 = fluid.nets.simple_img_conv_pool( - input=conv_pool_1, - filter_size=5, - num_filters=50, - pool_size=2, - pool_stride=2, - act="relu") - - # TODO(dzhwinter) : refine the initializer and random seed settting - SIZE = 10 - input_shape = conv_pool_2.shape - param_shape = [six.moves.reduce(lambda a, b: a * b, input_shape[1:], 1) - ] + [SIZE] - scale = (2.0 / (param_shape[0]**2 * SIZE))**0.5 - - predict = fluid.layers.fc( - input=conv_pool_2, - size=SIZE, - act="softmax", - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.NormalInitializer( - loc=0.0, scale=scale))) - return predict - - -def eval_test(exe, batch_acc, batch_size_tensor, inference_program): - test_reader = paddle.batch( - paddle.dataset.mnist.test(), batch_size=args.batch_size) - test_pass_acc = fluid.average.WeightedAverage() - for batch_id, data in enumerate(test_reader()): - img_data = np.array( - [x[0].reshape([1, 28, 28]) for x in data]).astype(DTYPE) - y_data = np.array([x[1] for x in data]).astype("int64") - y_data = y_data.reshape([len(y_data), 1]) - - acc, weight = exe.run(inference_program, - feed={"pixel": img_data, - "label": y_data}, - fetch_list=[batch_acc, batch_size_tensor]) - test_pass_acc.add(value=acc, weight=weight) - pass_acc = test_pass_acc.eval() - return pass_acc - - -def run_benchmark(model, args): - if args.use_cprof: - pr = cProfile.Profile() - pr.enable() - start_time = time.time() - # Input data - images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype=DTYPE) - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - - # Train program - predict = model(images) - cost = fluid.layers.cross_entropy(input=predict, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # Evaluator - batch_size_tensor = fluid.layers.create_tensor(dtype='int64') - batch_acc = fluid.layers.accuracy( - input=predict, label=label, total=batch_size_tensor) - - # inference program - inference_program = fluid.default_main_program().clone(for_test=True) - - # Optimization - opt = fluid.optimizer.AdamOptimizer( - learning_rate=0.001, beta1=0.9, beta2=0.999) - opt.minimize(avg_cost) - - 
fluid.memory_optimize(fluid.default_main_program())
-
-    # Initialize executor
-    place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
-    exe = fluid.Executor(place)
-
-    # Parameter initialization
-    exe.run(fluid.default_startup_program())
-
-    # Reader
-    train_reader = paddle.batch(
-        paddle.dataset.mnist.train(), batch_size=args.batch_size)
-
-    accuracy = fluid.average.WeightedAverage()
-    for pass_id in range(args.pass_num):
-        accuracy.reset()
-        pass_start = time.time()
-        every_pass_loss = []
-        for batch_id, data in enumerate(train_reader()):
-            img_data = np.array(
-                [x[0].reshape([1, 28, 28]) for x in data]).astype(DTYPE)
-            y_data = np.array([x[1] for x in data]).astype("int64")
-            y_data = y_data.reshape([len(y_data), 1])
-
-            start = time.time()
-            loss, acc, weight = exe.run(
-                fluid.default_main_program(),
-                feed={"pixel": img_data,
-                      "label": y_data},
-                fetch_list=[avg_cost, batch_acc, batch_size_tensor]
-            )  # The accuracy is the accumulation of batches, but not the current batch.
-            end = time.time()
-            accuracy.add(value=acc, weight=weight)
-            every_pass_loss.append(loss)
-            print("Pass = %d, Iter = %d, Loss = %f, Accuracy = %f" %
-                  (pass_id, batch_id, loss, acc))
-
-        pass_end = time.time()
-
-        train_avg_acc = accuracy.eval()
-        train_avg_loss = np.mean(every_pass_loss)
-        test_avg_acc = eval_test(exe, batch_acc, batch_size_tensor,
-                                 inference_program)
-
-        print(
-            "pass=%d, train_avg_acc=%f, train_avg_loss=%f, test_avg_acc=%f, elapse=%f"
-            % (pass_id, train_avg_acc, train_avg_loss, test_avg_acc,
-               (pass_end - pass_start)))
-        # Note: The following logs are special for CE monitoring.
-        # Other situations do not need to care about these logs.
-        # The fields are tab-separated, as expected by the parser in _ce.py.
-        print("kpis\ttrain_acc\t%f" % train_avg_acc)
-        print("kpis\ttrain_cost\t%f" % train_avg_loss)
-        print("kpis\ttest_acc\t%f" % test_avg_acc)
-        print("kpis\ttrain_duration\t%f" % (pass_end - pass_start))
-
-
-if __name__ == '__main__':
-    args = parse_args()
-    print_arguments(args)
-    if args.use_nvprof and args.device == 'GPU':
-        with profiler.cuda_profiler("cuda_profiler.txt", 'csv') as nvprof:
-            run_benchmark(cnn_model, args)
-    else:
-        run_benchmark(cnn_model, args)
diff --git a/fluid/policy_gradient/README.md b/fluid/policy_gradient/README.md
index b813aa124466597adfb80261bee7c2de22b95e67..b6ac95d0fba6bbb7552671fbc6e80d052a648045 100644
--- a/fluid/policy_gradient/README.md
+++ b/fluid/policy_gradient/README.md
@@ -1,171 +1,2 @@
-运行本目录下的程序示例需要使用PaddlePaddle的最新develop分支。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
----
-
-# Policy Gradient RL by PaddlePaddle
-本文介绍了如何使用PaddlePaddle通过policy-based的强化学习方法来训练一个player(actor model),我们希望这个player可以完成简单的走阶梯任务。
-
- 内容分为:
-
- - 任务描述
- - 模型
- - 策略(目标函数)
- - 算法(Gradient ascent)
- - PaddlePaddle实现
-
-
-## 1. 任务描述
-假设有一个阶梯,连接A、B点,player从A点出发,每一步只能向前走一步或向后走一步,到达B点即为完成任务。我们希望训练一个聪明的player,它知道怎么最快地从A点到达B点。
-我们在命令行以下边的形式模拟任务:
-```
-A - O - - - - - B
-```
-一个‘-’代表一个阶梯,A点在行头,B点在行末,O代表player当前所在的位置。
-
-## 2. Policy Gradient
-### 2.1 模型
-#### input layer
-模型的输入是player观察到的当前阶梯的状态$S$,要包含阶梯的长度和player当前的位置信息。
-在命令行模拟的情况下,player的位置和阶梯长度两个变量足以表示当前的状态,但是为了便于将这个demo推广到更复杂的任务场景,我们这里用一个向量来表示游戏状态$S$。
-向量$S$的长度为阶梯的长度,每一维代表一个阶梯,player所在的位置为1,其它位置为0。
-下边是一个例子:
-```
-S = [0, 1, 0, 0] // 阶梯长度为4,player在第二个阶梯上。
-```
-#### hidden layer
-隐藏层采用两个全连接layer `FC_1`和`FC_2`,其中`FC_1`的size为10,`FC_2`的size为2。
-
-#### output layer
-我们使用softmax将`FC_2`的output映射为所有可能的动作(前进或后退)的概率分布(Probability of taking the action),即为一个二维向量`act_probs`,其中,`act_probs[0]`为后退的概率,`act_probs[1]`为前进的概率。
-
-#### 模型表示
-我们将player模型(actor)形式化表示如下:
-$$a = \pi_\theta(s)$$
-其中$\theta$表示模型的参数,$s$是输入状态。
-
-
-### 2.2 策略(目标函数)
-我们怎么评估一个player(模型)的好坏呢?首先我们定义几个术语:
-我们让$\pi_\theta(s)$来玩一局游戏,$s_t$表示第$t$时刻的状态,$a_t$表示在状态$s_t$做出的动作,$r_t$表示做过动作$a_t$后得到的奖赏。
-一局游戏的过程可以表示如下:
-$$\tau = [s_1, a_1, r_1, s_2, a_2, r_2 ... s_T, a_T, r_T] \tag{1}$$
-
-一局游戏的奖励表示如下:
-$$R(\tau) = \sum_{t=1}^T r_t$$
-
-player玩一局游戏,可能会出现多种操作序列$\tau$,某个$\tau$出现的概率依赖于player model的$\theta$,记作:
-$$P(\tau | \theta)$$
-那么,给定一个$\theta$(player model),玩一局游戏,期望得到的奖励是:
-$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta)$$
-大多数情况,我们无法穷举出所有的$\tau$,所以我们就抽取N个$\tau$来计算近似的期望:
-$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^N R(\tau^n)$$
-
-$\overline {R}_\theta$就是我们需要的目标函数,它表示了一个参数为$\theta$的player玩一局游戏得分的期望,这个期望越大,代表这个player能力越强。
-### 2.3 算法(Gradient ascent)
-我们的目标函数是$\overline {R}_\theta$,我们训练的任务就是:
-$$\theta^* = \arg\max_\theta \overline {R}_\theta$$
-
-为了找到理想的$\theta$,我们使用Gradient ascent方法不断在$\overline {R}_\theta$的梯度方向更新$\theta$,可表示如下:
-$$\theta' = \theta + \eta * \bigtriangledown \overline {R}_\theta$$
-
-$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) \bigtriangledown P(\tau|\theta)\\
-= \sum_\tau R(\tau) P(\tau|\theta) \frac{\bigtriangledown P(\tau|\theta)}{P(\tau|\theta)} \\
-= \sum_\tau R(\tau) P(\tau|\theta) \bigtriangledown \log P(\tau|\theta) $$
-
-
-$$P(\tau|\theta) = P(s_1)P(a_1|s_1,\theta)P(s_2, r_1|s_1,a_1)P(a_2|s_2,\theta)P(s_3,r_2|s_2,a_2)...P(a_T|s_T,\theta)P(s_{T+1}, r_T|s_T,a_T)\\
-= P(s_1) \prod_{t=1}^T P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)$$
-
-$$\log P(\tau|\theta) = \log P(s_1) + \sum_{t=1}^T [\log P(a_t|s_t,\theta) + \log P(s_{t+1}, r_t|s_t,a_t)]$$
-
-$$ \bigtriangledown \log P(\tau|\theta) = \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)$$
-
-$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \bigtriangledown \log P(\tau|\theta) \\
-\approx \frac{1}{N} \sum_{n=1}^N R(\tau^n) \bigtriangledown \log P(\tau^n|\theta) \\
-= \frac{1}{N} \sum_{n=1}^N R(\tau^n) \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta) \\
-= \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) \bigtriangledown \log P(a_t|s_t,\theta) \tag{11}$$
-
-#### 2.3.2 导数解释
-
-在使用深度学习框架进行训练求解时,一般用梯度下降方法,所以我们把Gradient ascent转为Gradient descent,将上面的优化目标和更新规则重写为:
-
-$$\theta^* = \arg\min_\theta (-\overline {R}_\theta) \tag{13}$$
-$$\theta' = \theta - \eta * \bigtriangledown (-\overline {R}_\theta) \tag{14}$$
-
-根据上一节的推导,$-\bigtriangledown \overline {R}_\theta$结果如下:
-
-$$ -\bigtriangledown \overline {R}_\theta
-= \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) \bigtriangledown
-[-\log P(a_t|s_t,\theta)] \tag{15}$$
-
-根据等式(14),我们的player的模型可以设计为:
-
-(图 1:player模型的前向网络结构示意图,输出为cross entropy cost)
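-下面再给出一个与图1对应的前向网络示意代码(仅为草图,并非本目录的原始实现;`STATE_LEN`、激活函数等均为假设):
-
-```python
-import paddle.fluid as fluid
-
-STATE_LEN = 10  # 假设的阶梯长度,即状态向量S的维度
-
-# 状态向量S:player所在的位置为1,其余位置为0
-state = fluid.layers.data(name='state', shape=[STATE_LEN], dtype='float32')
-# 实际执行的动作a_t,作为cross entropy的label
-action = fluid.layers.data(name='action', shape=[1], dtype='int64')
-
-fc_1 = fluid.layers.fc(input=state, size=10, act='tanh')  # FC_1,size为10
-# FC_2,size为2;softmax将其输出映射为动作概率分布act_probs
-act_probs = fluid.layers.fc(input=fc_1, size=2, act='softmax')
-# cross entropy cost即为等式(15)中的 -log P(a_t|s_t, theta)
-neg_log_prob = fluid.layers.cross_entropy(input=act_probs, label=action)
-```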
-
-用户在一局游戏中的一次操作可以用元组$(s_t, a_t)$表示,即在状态$s_t$下做了动作$a_t$。我们通过图(1)中的前向网络计算出来的cross entropy cost为$-\log P(a_t|s_t,\theta)$,恰好是等式(15)中我们需要微分的一项。
-图1是我们需要的player模型,用这个网络的前向计算可以预测任何状态下该做什么动作。但是怎么去训练学习这个网络呢?在等式(15)中还有一项$R(\tau^n)$,做反向梯度传播的时候要加上这一项,所以我们需要在图1基础上再加上$R(\tau^n)$,如图2所示:
-
-(图 2:在图1的前向网络上,用奖励$R(\tau^n)$对cross entropy cost加权后的训练网络示意图)
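-与图2对应的改动也可以用如下草图表示(沿用上面草图中的`neg_log_prob`;`reward`即按局展开到每一步的$R(\tau^n)$,名称均为假设):
-
-```python
-# 每一步动作对应的整局奖励R(tau^n),训练时由外部计算后喂入
-reward = fluid.layers.data(name='reward', shape=[1], dtype='float32')
-
-# 等式(15):用R(tau^n)对 -log P(a_t|s_t, theta) 加权,再取均值作为loss
-weighted_cost = fluid.layers.elementwise_mul(neg_log_prob, reward)
-loss = fluid.layers.mean(x=weighted_cost)
-```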
-
-图2就是我们最终的网络结构。
-
-#### 2.3.3 直观理解
-对于等式(15),我们只看游戏中的一步操作,也就是这一项:$R(\tau^n) \bigtriangledown [-\log P(a_t|s_t,\theta)]$。我们可以简单地认为训练的目的是让 $R(\tau^n) [-\log P(a_t|s_t,\theta)]$ 尽可能小,也就是让 $R(\tau^n) \log P(a_t|s_t,\theta)$ 尽可能大。
-
-- 如果我们当前游戏局的奖励$R(\tau^n)$为正,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能大。
-- 如果我们当前游戏局的奖励$R(\tau^n)$为负,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能小。
-
-#### 2.3.4 一个问题
-
-一人犯错,株连九族;一人得道,鸡犬升天。如果一局游戏得到奖励,我们希望帮助获得奖励的每一次操作都被重视;否则,导致惩罚的操作都要被冷落一次。
-是不是很有道理的样子?但是,如果有些游戏场景只有奖励,没有惩罚,怎么办?也就是所有的$R(\tau^n)$都为正。
-针对不同的游戏场景,我们有不同的解决方案:
-
-1. 每局游戏得分不一样:将每局的得分减去一个bias,结果就有正有负了。
-2. 每局游戏得分一样:把完成一局的时间作为计分因素,并减去一个bias。
-
-我们在第一章描述的游戏场景需要用第二种方案:player每次到达终点都会收到1分的奖励,我们可以按完成任务所用的步数来定义奖励R。
-更进一步,我们认为一局游戏中每步动作对结局的贡献是不同的,有聪明的动作,也有愚蠢的操作。直观地理解,一般靠前的动作是愚蠢的,靠后的动作是聪明的。既然有了这个价值观,那么我们拿到的1分奖励,就不能平均分给每个动作了。
-如图3所示,让所有动作按先后排队,从后往前衰减地给每个动作分配奖励,然后将每个动作的奖励再减去所有动作奖励的平均值:
-
-(图 3:让所有动作按先后排队,从后往前衰减地分配奖励并减去均值的示意图)
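-图3的奖励处理可以用如下NumPy草图实现(`gamma`为假设的衰减系数,并非本目录的原始实现):
-
-```python
-import numpy as np
-
-def process_rewards(step_count, gamma=0.99):
-    """从后往前衰减地分配到达终点的1分奖励,并减去均值作为baseline。"""
-    rewards = np.zeros(step_count, dtype='float32')
-    rewards[-1] = 1.0  # 最后一步到达终点,得到1分
-    for t in reversed(range(step_count - 1)):
-        rewards[t] = gamma * rewards[t + 1]  # 越靠前的动作分到的奖励越小
-    return rewards - rewards.mean()
-```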
-
-## 3. 训练效果
-
-demo运行训练效果如下,经过1000轮尝试,我们的player就学会了如何有效地完成任务:
-
-```
----------O epoch: 0; steps: 42
----------O epoch: 1; steps: 77
----------O epoch: 2; steps: 82
----------O epoch: 3; steps: 64
----------O epoch: 4; steps: 79
----------O epoch: 501; steps: 19
----------O epoch: 1001; steps: 9
----------O epoch: 1501; steps: 9
----------O epoch: 2001; steps: 11
----------O epoch: 2501; steps: 9
----------O epoch: 3001; steps: 9
----------O epoch: 3002; steps: 9
----------O epoch: 3003; steps: 9
----------O epoch: 3004; steps: 9
----------O epoch: 3005; steps: 9
----------O epoch: 3006; steps: 9
----------O epoch: 3007; steps: 9
----------O epoch: 3008; steps: 9
----------O epoch: 3009; steps: 9
----------O epoch: 3010; steps: 11
----------O epoch: 3011; steps: 9
----------O epoch: 3012; steps: 9
----------O epoch: 3013; steps: 9
----------O epoch: 3014; steps: 9
-```
+您好,该项目已被迁移,请移步到 [PaddleRL/policy_gradient](../../PaddleRL/policy_gradient) 目录下浏览本项目。