diff --git a/README.md b/README.md
index 98cef358dae468f9b16209299de2776210970a99..2c68e44f586e9db902d97d25eb7c080d74505c80 100644
--- a/README.md
+++ b/README.md
@@ -59,14 +59,6 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化
 [DeepCTR](./fluid/PaddleRec/ctr/README.cn.md)|点击率预估模型|只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出|[DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/abs/1703.04247)
 [Multiview-Simnet](./fluid/PaddleRec/multiview_simnet)|个性化推荐模型|基于多元视图,将用户和项目的多个功能视图合并为一个统一模型|[A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](http://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)
 
-## Other Models
-模型|简介|模型优势|参考论文
---|:--:|:--:|:--:
-[DeepASR](./fluid/DeepASR/README_cn.md)|语音识别系统|利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器|-
-[DQN](./fluid/DeepQNetwork/README_cn.md)|深度Q网络|value-based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型|[Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236)
-[DoubleDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|将Double Q的想法应用在DQN上,缓解Q值过估计问题|[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
-[DuelingDQN](./fluid/DeepQNetwork/README_cn.md)|DQN的变体|改进了DQN模型,提高了模型的性能|[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
-
 ## License
 This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
diff --git a/fluid/AutoDL/LRC/README.md b/fluid/AutoDL/LRC/README.md
deleted file mode 100644
index df9af47d4a3876371673cbbfef0ad2553768b9a5..0000000000000000000000000000000000000000
--- a/fluid/AutoDL/LRC/README.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# LRC Local Rademacher Complexity Regularization
-Regularizing Deep Neural Networks (DNNs) to improve their generalization capability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. This model combines LRC regularization with DARTS on the CIFAR-10 dataset. Code accompanying the paper
-> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
-> Yingzhen Yang, Xingjian Li, Jun Huan.\
-> _arXiv:1902.00873_.
-
----
-# Table of Contents
-
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training](#training)
-
-## Installation
-
-Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) to update it.
-
-## Data preparation
-
-The first time you use the CIFAR-10 dataset, you can download it as follows:
-
-    sh ./dataset/download.sh
-
-Please make sure your environment has an internet connection.
-
-The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and decompress it to the location mentioned above.
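To sanity-check the download before training, something like the following works (a sketch: it assumes the default `dataset/cifar/cifar-10-batches-py` location created above, and Python 2, since `reader.py` in this diff reads the batches with `cPickle`; the `path` value is illustrative):

```python
# Quick check of one downloaded CIFAR-10 batch (Python 2, matching reader.py).
from __future__ import print_function
import cPickle

# Assumed default path created by dataset/download.sh, relative to train.py.
path = 'dataset/cifar/cifar-10-batches-py/data_batch_1'
with open(path, 'rb') as f:
    batch = cPickle.load(f)

data = batch['data']      # numpy array, shape (10000, 3072): flattened 3x32x32
labels = batch['labels']  # list of 10000 integers in [0, 9]
print(data.shape, len(labels), min(labels), max(labels))
```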
-
-
-## Training
-
-After data preparation, one can start training with:
-
-    python -u train_mixup.py \
-        --batch_size=80 \
-        --auxiliary \
-        --weight_decay=0.0003 \
-        --learning_rate=0.025 \
-        --lrc_loss_lambda=0.7 \
-        --cutout
-- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify a single GPU for training.
-- For more help on arguments:
-
-    python train_mixup.py --help
-
-**data reader introduction:**
-
-* The data reader is defined in `reader.py`.
-* Images are reshaped to 32 * 32.
-* In the training stage, images are padded to 40 * 40 and randomly cropped back to the original size.
-* In the training stage, images are randomly flipped horizontally.
-* Pixel values are scaled to [0, 1] and normalized with the CIFAR-10 per-channel mean and standard deviation.
-* In the training stage, cutout is applied to images at random locations.
-* The order of the input images is shuffled during training.
-
-**model configuration:**
-
-* Use auxiliary loss with auxiliary\_weight=0.4.
-* Use dropout with drop\_path\_prob=0.2.
-* Set lrc\_loss\_lambda to 0.7.
-
-**training strategy:**
-
-* Use the momentum optimizer with momentum=0.9.
-* Weight decay is 0.0003.
-* Use cosine decay with init\_lr=0.025 (a numeric sketch of this schedule appears below).
-* Train for 600 epochs in total.
-* Use the Xavier initializer for conv2d weights, the Constant initializer for batch norm weights and the Normal initializer for fc weights.
-* Initialize biases in batch norm and fc to a zero constant, and add no bias to conv2d.
-
-
-## Reference
-
-  - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
-  - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)
diff --git a/fluid/AutoDL/LRC/README_cn.md b/fluid/AutoDL/LRC/README_cn.md
deleted file mode 100644
index 06dc937074de199af31db97ee200e7690443b1b0..0000000000000000000000000000000000000000
--- a/fluid/AutoDL/LRC/README_cn.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# LRC 局部Rademacher复杂度正则化
-为了在深度神经网络中提升泛化能力,正则化的选择十分重要也具有挑战性。本目录包括了一种基于局部Rademacher复杂度的新型正则(LRC)的图像分类模型。十分感谢[DARTS](https://arxiv.org/abs/1806.09055)模型对本研究提供的帮助。该模型将LRC正则和DARTS网络相结合,在CIFAR-10数据集中得到了很出色的效果。代码和文章一同发布
-> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
-> Yingzhen Yang, Xingjian Li, Jun Huan.\
-> _arXiv:1902.00873_.
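An aside on the cosine schedule listed in the training strategy above: following the formula in `learning_rate.py` (included later in this diff), the learning rate at a given global step is `lr * (cos(global_step / steps_one_epoch * pi / num_epoch) + 1) / 2`. A plain-Python sketch of the resulting curve (the `steps_one_epoch` value of 625 is an assumption: 50000 CIFAR-10 training images divided by batch size 80):

```python
from __future__ import print_function
import math

def cosine_decay_value(lr, num_epoch, steps_one_epoch, global_step):
    # Same formula as cosine_decay in learning_rate.py, evaluated eagerly.
    epoch_float = global_step / float(steps_one_epoch)
    return lr * (math.cos(epoch_float * math.pi / num_epoch) + 1) / 2

steps_one_epoch = 625  # assumed: 50000 images / batch_size 80
for epoch in (0, 150, 300, 450, 600):
    print(epoch, round(cosine_decay_value(0.025, 600, steps_one_epoch,
                                          epoch * steps_one_epoch), 5))
# 0 -> 0.025, 150 -> 0.02134, 300 -> 0.0125, 450 -> 0.00366, 600 -> 0.0
```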
-
----
-# 内容
-
-- [安装](#安装)
-- [数据准备](#数据准备)
-- [模型训练](#模型训练)
-
-## 安装
-
-在当前目录下运行样例代码需要PaddlePaddle Fluid v1.2.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle)中的说明来更新PaddlePaddle。
-
-## 数据准备
-
-第一次使用CIFAR-10数据集时,您可以通过如下命令下载:
-
-    sh ./dataset/download.sh
-
-请确保您的环境有互联网连接。数据会下载到`train.py`同目录下的`dataset/cifar/cifar-10-batches-py`。如果下载失败,您可以自行从https://www.cs.toronto.edu/~kriz/cifar.html上下载cifar-10-python.tar.gz并解压到上述位置。
-
-## 模型训练
-
-数据准备好后,可以通过如下命令开始训练:
-
-    python -u train_mixup.py \
-        --batch_size=80 \
-        --auxiliary \
-        --weight_decay=0.0003 \
-        --learning_rate=0.025 \
-        --lrc_loss_lambda=0.7 \
-        --cutout
-- 通过设置 ```export CUDA_VISIBLE_DEVICES=0``` 指定单张GPU训练。
-- 可选参数见:
-
-    python train_mixup.py --help
-
-**数据读取器说明:**
-
-* 数据读取器定义在`reader.py`中
-* 输入图像尺寸统一变换为32 * 32
-* 训练时将图像填充为40 * 40然后随机剪裁为原输入图像大小
-* 训练时图像随机水平翻转
-* 对图像每个像素做归一化处理
-* 训练时对图像做随机遮挡
-* 训练时对输入图像做随机洗牌
-
-**模型配置:**
-
-* 使用辅助损失,辅助损失权重为0.4
-* 使用dropout,随机丢弃率为0.2
-* 设置lrc\_loss\_lambda为0.7
-
-**训练策略:**
-
-* 采用momentum优化算法训练,momentum=0.9
-* 权重衰减系数为0.0003
-* 采用余弦学习率衰减,初始学习率为0.025
-* 总共训练600轮
-* 对卷积权重采用Xavier初始化,对batch norm权重采用固定初始化,对全连接层权重采用高斯初始化
-* 对batch norm和全连接层偏差采用固定初始化,不对卷积设置偏差
-
-
-## 引用
-
-  - DARTS: Differentiable Architecture Search [`论文`](https://arxiv.org/abs/1806.09055)
-  - Differentiable Architecture Search in PyTorch [`代码`](https://github.com/quark0/darts)
diff --git a/fluid/AutoDL/LRC/dataset/download.sh b/fluid/AutoDL/LRC/dataset/download.sh
deleted file mode 100644
index 0981c3b6878421f80d392f314fd0ae836644a63c..0000000000000000000000000000000000000000
--- a/fluid/AutoDL/LRC/dataset/download.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-DIR="$( cd "$(dirname "$0")" ; pwd -P )"
-cd "$DIR"
-mkdir cifar
-cd cifar
-# Download the data.
-echo "Downloading..."
-wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
-# Extract the data.
-echo "Extracting..."
-tar zvxf cifar-10-python.tar.gz
diff --git a/fluid/AutoDL/LRC/genotypes.py b/fluid/AutoDL/LRC/genotypes.py
deleted file mode 100644
index 349fbd2478a7c2d1bb4cc3dd901b470de3c8b906..0000000000000000000000000000000000000000
--- a/fluid/AutoDL/LRC/genotypes.py
+++ /dev/null
@@ -1,116 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# Based on:
-# --------------------------------------------------------
-# DARTS
-# Copyright (c) 2018, Hanxiao Liu.
-# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from collections import namedtuple - -Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat') - -PRIMITIVES = [ - 'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', - 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5' -] - -NASNet = Genotype( - normal=[ - ('sep_conv_5x5', 1), - ('sep_conv_3x3', 0), - ('sep_conv_5x5', 0), - ('sep_conv_3x3', 0), - ('avg_pool_3x3', 1), - ('skip_connect', 0), - ('avg_pool_3x3', 0), - ('avg_pool_3x3', 0), - ('sep_conv_3x3', 1), - ('skip_connect', 1), - ], - normal_concat=[2, 3, 4, 5, 6], - reduce=[ - ('sep_conv_5x5', 1), - ('sep_conv_7x7', 0), - ('max_pool_3x3', 1), - ('sep_conv_7x7', 0), - ('avg_pool_3x3', 1), - ('sep_conv_5x5', 0), - ('skip_connect', 3), - ('avg_pool_3x3', 2), - ('sep_conv_3x3', 2), - ('max_pool_3x3', 1), - ], - reduce_concat=[4, 5, 6], ) - -AmoebaNet = Genotype( - normal=[ - ('avg_pool_3x3', 0), - ('max_pool_3x3', 1), - ('sep_conv_3x3', 0), - ('sep_conv_5x5', 2), - ('sep_conv_3x3', 0), - ('avg_pool_3x3', 3), - ('sep_conv_3x3', 1), - ('skip_connect', 1), - ('skip_connect', 0), - ('avg_pool_3x3', 1), - ], - normal_concat=[4, 5, 6], - reduce=[ - ('avg_pool_3x3', 0), - ('sep_conv_3x3', 1), - ('max_pool_3x3', 0), - ('sep_conv_7x7', 2), - ('sep_conv_7x7', 0), - ('avg_pool_3x3', 1), - ('max_pool_3x3', 0), - ('max_pool_3x3', 1), - ('conv_7x1_1x7', 0), - ('sep_conv_3x3', 5), - ], - reduce_concat=[3, 4, 6]) - -DARTS_V1 = Genotype( - normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('skip_connect', 0), - ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), - ('sep_conv_3x3', 0), ('skip_connect', 2)], - normal_concat=[2, 3, 4, 5], - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), - ('max_pool_3x3', 0), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('avg_pool_3x3', 0)], - reduce_concat=[2, 3, 4, 5]) -DARTS_V2 = Genotype( - normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), - ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('skip_connect', 0), - ('skip_connect', 0), ('dil_conv_3x3', 2)], - normal_concat=[2, 3, 4, 5], - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), - ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('max_pool_3x3', 1)], - reduce_concat=[2, 3, 4, 5]) - -MY_DARTS = Genotype( - normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0), - ('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), - ('skip_connect', 0), ('sep_conv_3x3', 1)], - normal_concat=range(2, 6), - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0), - ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('skip_connect', 3)], - reduce_concat=range(2, 6)) - -DARTS = MY_DARTS diff --git a/fluid/AutoDL/LRC/learning_rate.py b/fluid/AutoDL/LRC/learning_rate.py deleted file mode 100644 index 3965171b487884d36e4a7447f10f312204803bf8..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/learning_rate.py +++ /dev/null @@ -1,43 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import paddle -import paddle.fluid as fluid -import paddle.fluid.layers.ops as ops -from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter -import math -from paddle.fluid.initializer import init_on_cpu - - -def cosine_decay(learning_rate, num_epoch, steps_one_epoch): - """Applies cosine decay to the learning rate. - lr = 0.5 * (math.cos(epoch * (math.pi / 120)) + 1) - """ - global_step = _decay_step_counter() - - with init_on_cpu(): - decayed_lr = learning_rate * \ - (ops.cos((global_step / steps_one_epoch) \ - * math.pi / num_epoch) + 1)/2 - return decayed_lr diff --git a/fluid/AutoDL/LRC/model.py b/fluid/AutoDL/LRC/model.py deleted file mode 100644 index 45a403495ecc0b7cc0ac3b541d75702adbef31b2..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/model.py +++ /dev/null @@ -1,313 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. -#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. 
-# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import os -import sys -import numpy as np -import time -import functools -import paddle -import paddle.fluid as fluid -from operations import * - - -class Cell(): - def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, - reduction_prev): - print(C_prev_prev, C_prev, C) - - if reduction_prev: - self.preprocess0 = functools.partial(FactorizedReduce, C_out=C) - else: - self.preprocess0 = functools.partial( - ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) - self.preprocess1 = functools.partial( - ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) - if reduction: - op_names, indices = zip(*genotype.reduce) - concat = genotype.reduce_concat - else: - op_names, indices = zip(*genotype.normal) - concat = genotype.normal_concat - print(op_names, indices, concat, reduction) - self._compile(C, op_names, indices, concat, reduction) - - def _compile(self, C, op_names, indices, concat, reduction): - assert len(op_names) == len(indices) - self._steps = len(op_names) // 2 - self._concat = concat - self.multiplier = len(concat) - - self._ops = [] - for name, index in zip(op_names, indices): - stride = 2 if reduction and index < 2 else 1 - op = functools.partial(OPS[name], C=C, stride=stride, affine=True) - self._ops += [op] - self._indices = indices - - def forward(self, s0, s1, drop_prob, is_train, name): - self.training = is_train - preprocess0_name = name + 'preprocess0.' - preprocess1_name = name + 'preprocess1.' - s0 = self.preprocess0(s0, name=preprocess0_name) - s1 = self.preprocess1(s1, name=preprocess1_name) - out = [s0, s1] - for i in range(self._steps): - h1 = out[self._indices[2 * i]] - h2 = out[self._indices[2 * i + 1]] - op1 = self._ops[2 * i] - op2 = self._ops[2 * i + 1] - h3 = op1(h1, name=name + '_ops.' + str(2 * i) + '.') - h4 = op2(h2, name=name + '_ops.' 
+ str(2 * i + 1) + '.') - if self.training and drop_prob > 0.: - if h3 != h1: - h3 = fluid.layers.dropout( - h3, - drop_prob, - dropout_implementation='upscale_in_train') - if h4 != h2: - h4 = fluid.layers.dropout( - h4, - drop_prob, - dropout_implementation='upscale_in_train') - s = h3 + h4 - out += [s] - return fluid.layers.concat([out[i] for i in self._concat], axis=1) - - -def AuxiliaryHeadCIFAR(input, num_classes, aux_name='auxiliary_head'): - relu_a = fluid.layers.relu(input) - pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', 3) - conv2d_a = fluid.layers.conv2d( - pool_a, - 128, - 1, - name=aux_name + '.features.2', - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=aux_name + '.features.2.weight'), - bias_attr=False) - bn_a_name = aux_name + '.features.3' - bn_a = fluid.layers.batch_norm( - conv2d_a, - act='relu', - name=bn_a_name, - param_attr=ParamAttr( - initializer=Constant(1.), name=bn_a_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=bn_a_name + '.bias'), - moving_mean_name=bn_a_name + '.running_mean', - moving_variance_name=bn_a_name + '.running_var') - conv2d_b = fluid.layers.conv2d( - bn_a, - 768, - 2, - name=aux_name + '.features.5', - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=aux_name + '.features.5.weight'), - bias_attr=False) - bn_b_name = aux_name + '.features.6' - bn_b = fluid.layers.batch_norm( - conv2d_b, - act='relu', - name=bn_b_name, - param_attr=ParamAttr( - initializer=Constant(1.), name=bn_b_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=bn_b_name + '.bias'), - moving_mean_name=bn_b_name + '.running_mean', - moving_variance_name=bn_b_name + '.running_var') - fc_name = aux_name + '.classifier' - fc = fluid.layers.fc(bn_b, - num_classes, - name=fc_name, - param_attr=ParamAttr( - initializer=Normal(scale=1e-3), - name=fc_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=fc_name + '.bias')) - return fc - - -def StemConv(input, C_out, kernel_size, padding): - conv_a = fluid.layers.conv2d( - input, - C_out, - kernel_size, - padding=padding, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), name='stem.0.weight'), - bias_attr=False) - bn_a = fluid.layers.batch_norm( - conv_a, - param_attr=ParamAttr( - initializer=Constant(1.), name='stem.1.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name='stem.1.bias'), - moving_mean_name='stem.1.running_mean', - moving_variance_name='stem.1.running_var') - return bn_a - - -class NetworkCIFAR(object): - def __init__(self, C, class_num, layers, auxiliary, genotype): - self.class_num = class_num - self._layers = layers - self._auxiliary = auxiliary - - stem_multiplier = 3 - self.drop_path_prob = 0 - C_curr = stem_multiplier * C - - C_prev_prev, C_prev, C_curr = C_curr, C_curr, C - self.cells = [] - reduction_prev = False - for i in range(layers): - if i in [layers // 3, 2 * layers // 3]: - C_curr *= 2 - reduction = True - else: - reduction = False - cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, - reduction_prev) - reduction_prev = reduction - self.cells += [cell] - C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr - if i == 2 * layers // 3: - C_to_auxiliary = C_prev - - def forward(self, init_channel, is_train): - self.training = is_train - self.logits_aux = None - num_channel = init_channel * 3 - s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1) - s1 = s0 - for i, cell in enumerate(self.cells): - name = 'cells.' 
+ str(i) + '.' - s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train, - name) - if i == int(2 * self._layers // 3): - if self._auxiliary and self.training: - self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num) - out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg") - self.logits = fluid.layers.fc(out, - size=self.class_num, - param_attr=ParamAttr( - initializer=Normal(scale=1e-3), - name='classifier.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - name='classifier.bias')) - return self.logits, self.logits_aux - - def build_input(self, image_shape, batch_size, is_train): - if is_train: - py_reader = fluid.layers.py_reader( - capacity=64, - shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1], - [-1, 1], [-1, batch_size, self.class_num - 1]], - lod_levels=[0, 0, 0, 0, 0, 0, 0], - dtypes=[ - "float32", "int64", "int64", "float32", "int32", "int32", - "float32" - ], - use_double_buffer=True, - name='train_reader') - else: - py_reader = fluid.layers.py_reader( - capacity=64, - shapes=[[-1] + image_shape, [-1, 1]], - lod_levels=[0, 0], - dtypes=["float32", "int64"], - use_double_buffer=True, - name='test_reader') - return py_reader - - def train_model(self, py_reader, init_channels, aux, aux_w, batch_size, - loss_lambda): - self.image, self.ya, self.yb, self.lam, self.label_reshape,\ - self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader) - self.logits, self.logits_aux = self.forward(init_channels, True) - self.mixup_loss = self.mixup_loss(aux, aux_w) - self.lrc_loss = self.lrc_loss(batch_size) - return self.mixup_loss + loss_lambda * self.lrc_loss - - def test_model(self, py_reader, init_channels): - self.image, self.ya = fluid.layers.read_file(py_reader) - self.logits, _ = self.forward(init_channels, False) - prob = fluid.layers.softmax(self.logits, use_cudnn=False) - loss = fluid.layers.cross_entropy(prob, self.ya) - acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1) - acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5) - return loss, acc_1, acc_5 - - def mixup_loss(self, auxiliary, auxiliary_weight): - prob = fluid.layers.softmax(self.logits, use_cudnn=False) - loss_a = fluid.layers.cross_entropy(prob, self.ya) - loss_b = fluid.layers.cross_entropy(prob, self.yb) - loss_a_mean = fluid.layers.reduce_mean(loss_a) - loss_b_mean = fluid.layers.reduce_mean(loss_b) - loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean - if auxiliary: - prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False) - loss_a_aux = fluid.layers.cross_entropy(prob_aux, self.ya) - loss_b_aux = fluid.layers.cross_entropy(prob_aux, self.yb) - loss_a_aux_mean = fluid.layers.reduce_mean(loss_a_aux) - loss_b_aux_mean = fluid.layers.reduce_mean(loss_b_aux) - loss_aux = self.lam * loss_a_aux_mean + (1 - self.lam - ) * loss_b_aux_mean - return loss + auxiliary_weight * loss_aux - - def lrc_loss(self, batch_size): - y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1)) - label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1]) - non_label_reshape = fluid.layers.squeeze( - self.non_label_reshape, axes=[1]) - label_reshape.stop_gradient = True - non_label_reshape.stop_graident = True - - y_diff_label_reshape = fluid.layers.gather(y_diff_reshape, - label_reshape) - y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape, - non_label_reshape) - y_diff_label = fluid.layers.reshape( - y_diff_label_reshape, shape=(-1, batch_size, 1)) - y_diff_non_label = fluid.layers.reshape( - y_diff_non_label_reshape, - shape=(-1, 
batch_size, self.class_num - 1)) - y_diff_ = y_diff_non_label - y_diff_label - - y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0]) - rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0]) - rad_y_diff_trans = rad_var_trans * y_diff_ - lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1]) - lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size * - (self.class_num - 1)) - lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_) - - return lrc_loss_mean diff --git a/fluid/AutoDL/LRC/operations.py b/fluid/AutoDL/LRC/operations.py deleted file mode 100644 index b015722a1bc5dbf682c90812a971f3dbb2cd8c9a..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/operations.py +++ /dev/null @@ -1,349 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. -#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import os -import sys -import numpy as np -import time -import paddle -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from paddle.fluid.initializer import Xavier -from paddle.fluid.initializer import Normal -from paddle.fluid.initializer import Constant - -OPS = { - 'none' : lambda input, C, stride, name, affine: Zero(input, stride, name), - 'avg_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'avg', pool_stride=stride, pool_padding=1, name=name), - 'max_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'max', pool_stride=stride, pool_padding=1, name=name), - 'skip_connect' : lambda input,C, stride, name, affine: Identity(input, name) if stride == 1 else FactorizedReduce(input, C, name=name, affine=affine), - 'sep_conv_3x3' : lambda input,C, stride, name, affine: SepConv(input, C, C, 3, stride, 1, name=name, affine=affine), - 'sep_conv_5x5' : lambda input,C, stride, name, affine: SepConv(input, C, C, 5, stride, 2, name=name, affine=affine), - 'sep_conv_7x7' : lambda input,C, stride, name, affine: SepConv(input, C, C, 7, stride, 3, name=name, affine=affine), - 'dil_conv_3x3' : lambda input,C, stride, name, affine: DilConv(input, C, C, 3, stride, 2, 2, name=name, affine=affine), - 'dil_conv_5x5' : lambda input,C, stride, name, affine: DilConv(input, C, C, 5, stride, 4, 2, name=name, affine=affine), - 'conv_7x1_1x7' : lambda input,C, stride, name, affine: SevenConv(input, C, name=name, affine=affine) -} - - -def ReLUConvBN(input, C_out, kernel_size, stride, padding, name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out, - kernel_size, - stride, - padding, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - 
bias_attr=False) - if affine: - reluconvbn_out = fluid.layers.batch_norm( - conv2d_a, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.2.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.2.bias'), - moving_mean_name=name + 'op.2.running_mean', - moving_variance_name=name + 'op.2.running_var') - else: - reluconvbn_out = fluid.layers.batch_norm( - conv2d_a, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.2.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.2.bias'), - moving_mean_name=name + 'op.2.running_mean', - moving_variance_name=name + 'op.2.running_var') - return reluconvbn_out - - -def DilConv(input, - C_in, - C_out, - kernel_size, - stride, - padding, - dilation, - name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_in, - kernel_size, - stride, - padding, - dilation, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_out, - 1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - dilconv_out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - dilconv_out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - return dilconv_out - - -def SepConv(input, - C_in, - C_out, - kernel_size, - stride, - padding, - name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_in, - kernel_size, - stride, - padding, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_in, - 1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - bn_a = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - bn_a = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - - relu_b = fluid.layers.relu(bn_a) - conv2d_d = fluid.layers.conv2d( - relu_b, - C_in, - kernel_size, - 1, - padding, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.5.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_e = 
fluid.layers.conv2d( - conv2d_d, - C_out, - 1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.6.weight'), - bias_attr=False) - if affine: - sepconv_out = fluid.layers.batch_norm( - conv2d_e, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.7.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.7.bias'), - moving_mean_name=name + 'op.7.running_mean', - moving_variance_name=name + 'op.7.running_var') - else: - sepconv_out = fluid.layers.batch_norm( - conv2d_e, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.7.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.7.bias'), - moving_mean_name=name + 'op.7.running_mean', - moving_variance_name=name + 'op.7.running_var') - return sepconv_out - - -def SevenConv(input, C_out, stride, name='', affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out, (1, 7), (1, stride), (0, 3), - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_out, (7, 1), (stride, 1), (3, 0), - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - - -def Identity(input, name=''): - return input - - -def Zero(input, stride, name=''): - ones = np.ones(input.shape[-2:]) - ones[::stride, ::stride] = 0 - ones = fluid.layers.assign(ones) - return input * ones - - -def FactorizedReduce(input, C_out, name='', affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out // 2, - 1, - 2, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'conv_1.weight'), - bias_attr=False) - h_end = relu_a.shape[2] - w_end = relu_a.shape[3] - slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end]) - conv2d_b = fluid.layers.conv2d( - slice_a, - C_out // 2, - 1, - 2, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'conv_2.weight'), - bias_attr=False) - out = fluid.layers.concat([conv2d_a, conv2d_b], axis=1) - if affine: - out = fluid.layers.batch_norm( - out, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'bn.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'bn.bias'), - moving_mean_name=name + 'bn.running_mean', - moving_variance_name=name + 'bn.running_var') - else: - out = fluid.layers.batch_norm( - out, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'bn.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'bn.bias'), - moving_mean_name=name + 'bn.running_mean', - moving_variance_name=name + 
'bn.running_var') - return out diff --git a/fluid/AutoDL/LRC/reader.py b/fluid/AutoDL/LRC/reader.py deleted file mode 100644 index 20b32b504e9245c4ff3892f08736d800080daab4..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/reader.py +++ /dev/null @@ -1,187 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rig hts Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- -""" -CIFAR-10 dataset. -This module will download dataset from -https://www.cs.toronto.edu/~kriz/cifar.html and parse train/test set into -paddle reader creators. -The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, -with 6000 images per class. There are 50000 training images and 10000 test images. -""" - -from PIL import Image -from PIL import ImageOps -import numpy as np - -import cPickle -import random -import utils -import paddle.fluid as fluid -import time -import os -import functools -import paddle.reader - -__all__ = ['train10', 'test10'] - -image_size = 32 -image_depth = 3 -half_length = 8 - -CIFAR_MEAN = [0.4914, 0.4822, 0.4465] -CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] - - -def generate_reshape_label(label, batch_size, CIFAR_CLASSES=10): - reshape_label = np.zeros((batch_size, 1), dtype='int32') - reshape_non_label = np.zeros( - (batch_size * (CIFAR_CLASSES - 1), 1), dtype='int32') - num = 0 - for i in range(batch_size): - label_i = label[i] - reshape_label[i] = label_i + i * CIFAR_CLASSES - for j in range(CIFAR_CLASSES): - if label_i != j: - reshape_non_label[num] = \ - j + i * CIFAR_CLASSES - num += 1 - return reshape_label, reshape_non_label - - -def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10): - rcc_iters = 50 - rad_var = np.zeros((rcc_iters, batch_size, CIFAR_CLASSES - 1)) - for i in range(rcc_iters): - bernoulli_num = np.random.binomial(size=batch_size, n=1, p=0.5) - bernoulli_map = np.array([]) - ones = np.ones((CIFAR_CLASSES - 1, 1)) - for batch_id in range(batch_size): - num = bernoulli_num[batch_id] - var_id = 2 * ones * num - 1 - bernoulli_map = np.append(bernoulli_map, var_id) - rad_var[i] = bernoulli_map.reshape((batch_size, CIFAR_CLASSES - 1)) - return rad_var.astype('float32') - - -def preprocess(sample, is_training, args): - image_array = sample.reshape(3, image_size, image_size) - rgb_array = np.transpose(image_array, (1, 2, 0)) - img = Image.fromarray(rgb_array, 'RGB') - - if is_training: - # pad and ramdom crop - img = ImageOps.expand(img, (4, 4, 4, 4), fill=0) # pad to 40 * 40 * 3 - left_top = np.random.randint(9, size=2) # rand 0 - 8 - img = img.crop((left_top[0], left_top[1], left_top[0] + image_size, - left_top[1] + image_size)) - if np.random.randint(2): - img = img.transpose(Image.FLIP_LEFT_RIGHT) - - img = np.array(img).astype(np.float32) - - # per_image_standardization - img_float = img / 255.0 - img = 
(img_float - CIFAR_MEAN) / CIFAR_STD - - if is_training and args.cutout: - center = np.random.randint(image_size, size=2) - offset_width = max(0, center[0] - half_length) - offset_height = max(0, center[1] - half_length) - target_width = min(center[0] + half_length, image_size) - target_height = min(center[1] + half_length, image_size) - - for i in range(offset_height, target_height): - for j in range(offset_width, target_width): - img[i][j][:] = 0.0 - - img = np.transpose(img, (2, 0, 1)) - return img - - -def reader_creator_filepath(filename, sub_name, is_training, args): - files = os.listdir(filename) - names = [each_item for each_item in files if sub_name in each_item] - names.sort() - datasets = [] - for name in names: - print("Reading file " + name) - batch = cPickle.load(open(filename + name, 'rb')) - data = batch['data'] - labels = batch.get('labels', batch.get('fine_labels', None)) - assert labels is not None - dataset = zip(data, labels) - datasets.extend(dataset) - random.shuffle(datasets) - - def read_batch(datasets, args): - for sample, label in datasets: - im = preprocess(sample, is_training, args) - yield im, [int(label)] - - def reader(): - batch_data = [] - batch_label = [] - for data, label in read_batch(datasets, args): - batch_data.append(data) - batch_label.append(label) - if len(batch_data) == args.batch_size: - batch_data = np.array(batch_data, dtype='float32') - batch_label = np.array(batch_label, dtype='int64') - if is_training: - flatten_label, flatten_non_label = \ - generate_reshape_label(batch_label, args.batch_size) - rad_var = generate_bernoulli_number(args.batch_size) - mixed_x, y_a, y_b, lam = utils.mixup_data( - batch_data, batch_label, args.batch_size, - args.mix_alpha) - batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \ - flatten_non_label, rad_var]] - yield batch_out - else: - batch_out = [[batch_data, batch_label]] - yield batch_out - batch_data = [] - batch_label = [] - - return reader - - -def train10(args): - """ - CIFAR-10 training set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Training reader creator - :rtype: callable - """ - - return reader_creator_filepath(args.data, 'data_batch', True, args) - - -def test10(args): - """ - CIFAR-10 test set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Test reader creator. - :rtype: callable - """ - return reader_creator_filepath(args.data, 'test_batch', False, args) diff --git a/fluid/AutoDL/LRC/run.sh b/fluid/AutoDL/LRC/run.sh deleted file mode 100644 index 9f1a045d49789c3e9aebbc2a73b84b11da471b5a..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/run.sh +++ /dev/null @@ -1,8 +0,0 @@ -CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \ ---batch_size=80 \ ---auxiliary \ ---weight_decay=0.0003 \ ---learning_rate=0.025 \ ---lrc_loss_lambda=0.7 \ ---cutout - diff --git a/fluid/AutoDL/LRC/train_mixup.py b/fluid/AutoDL/LRC/train_mixup.py deleted file mode 100644 index de752c84bcf9276aa83540d60370517e66c0704f..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/train_mixup.py +++ /dev/null @@ -1,247 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. 
-#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from learning_rate import cosine_decay -import numpy as np -import argparse -from model import NetworkCIFAR as Network -import reader -import sys -import os -import time -import logging -import genotypes -import paddle.fluid as fluid -import shutil -import utils -import cPickle as cp - -parser = argparse.ArgumentParser("cifar") -parser.add_argument( - '--data', - type=str, - default='./dataset/cifar/cifar-10-batches-py/', - help='location of the data corpus') -parser.add_argument('--batch_size', type=int, default=96, help='batch size') -parser.add_argument( - '--learning_rate', type=float, default=0.025, help='init learning rate') -parser.add_argument('--momentum', type=float, default=0.9, help='momentum') -parser.add_argument( - '--weight_decay', type=float, default=3e-4, help='weight decay') -parser.add_argument( - '--report_freq', type=float, default=50, help='report frequency') -parser.add_argument( - '--epochs', type=int, default=600, help='num of training epochs') -parser.add_argument( - '--init_channels', type=int, default=36, help='num of init channels') -parser.add_argument( - '--layers', type=int, default=20, help='total number of layers') -parser.add_argument( - '--model_path', - type=str, - default='saved_models', - help='path to save the model') -parser.add_argument( - '--auxiliary', - action='store_true', - default=False, - help='use auxiliary tower') -parser.add_argument( - '--auxiliary_weight', - type=float, - default=0.4, - help='weight for auxiliary loss') -parser.add_argument( - '--cutout', action='store_true', default=False, help='use cutout') -parser.add_argument( - '--cutout_length', type=int, default=16, help='cutout length') -parser.add_argument( - '--drop_path_prob', type=float, default=0.2, help='drop path probability') -parser.add_argument('--save', type=str, default='EXP', help='experiment name') -parser.add_argument( - '--arch', type=str, default='DARTS', help='which architecture to use') -parser.add_argument( - '--grad_clip', type=float, default=5, help='gradient clipping') -parser.add_argument( - '--lr_exp_decay', - action='store_true', - default=False, - help='use exponential_decay learning_rate') -parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha') -parser.add_argument( - '--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda') -parser.add_argument( - '--loss_type', - default=1, - type=float, - help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss') - -args = parser.parse_args() - -CIFAR_CLASSES = 10 -dataset_train_size = 50000 -image_size = 32 - - -def main(): - image_shape = [3, image_size, image_size] - devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" - devices_num = len(devices.split(",")) - logging.info("args = %s", args) - genotype = 
eval("genotypes.%s" % args.arch) - model = Network(args.init_channels, CIFAR_CLASSES, args.layers, - args.auxiliary, genotype) - steps_one_epoch = dataset_train_size / (devices_num * args.batch_size) - train(model, args, image_shape, steps_one_epoch) - - -def build_program(main_prog, startup_prog, args, is_train, model, im_shape, - steps_one_epoch): - out = [] - with fluid.program_guard(main_prog, startup_prog): - py_reader = model.build_input(im_shape, args.batch_size, is_train) - if is_train: - with fluid.unique_name.guard(): - loss = model.train_model(py_reader, args.init_channels, - args.auxiliary, args.auxiliary_weight, - args.batch_size, args.lrc_loss_lambda) - optimizer = fluid.optimizer.Momentum( - learning_rate=cosine_decay(args.learning_rate, \ - args.epochs, steps_one_epoch), - regularization=fluid.regularizer.L2Decay(\ - args.weight_decay), - momentum=args.momentum) - optimizer.minimize(loss) - out = [py_reader, loss] - else: - with fluid.unique_name.guard(): - loss, acc_1, acc_5 = model.test_model(py_reader, - args.init_channels) - out = [py_reader, loss, acc_1, acc_5] - return out - - -def train(model, args, im_shape, steps_one_epoch): - train_startup_prog = fluid.Program() - test_startup_prog = fluid.Program() - train_prog = fluid.Program() - test_prog = fluid.Program() - - train_py_reader, loss_train = build_program(train_prog, train_startup_prog, - args, True, model, im_shape, - steps_one_epoch) - - test_py_reader, loss_test, acc_1, acc_5 = build_program( - test_prog, test_startup_prog, args, False, model, im_shape, - steps_one_epoch) - - test_prog = test_prog.clone(for_test=True) - - place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(train_startup_prog) - exe.run(test_startup_prog) - - exec_strategy = fluid.ExecutionStrategy() - exec_strategy.num_threads = 1 - train_exe = fluid.ParallelExecutor( - main_program=train_prog, - use_cuda=True, - loss_name=loss_train.name, - exec_strategy=exec_strategy) - train_reader = reader.train10(args) - test_reader = reader.test10(args) - train_py_reader.decorate_paddle_reader(train_reader) - test_py_reader.decorate_paddle_reader(test_reader) - - fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip)) - fluid.memory_optimize(fluid.default_main_program()) - - def save_model(postfix, main_prog): - model_path = os.path.join(args.model_path, postfix) - if os.path.isdir(model_path): - shutil.rmtree(model_path) - fluid.io.save_persistables(exe, model_path, main_program=main_prog) - - def test(epoch_id): - test_fetch_list = [loss_test, acc_1, acc_5] - objs = utils.AvgrageMeter() - top1 = utils.AvgrageMeter() - top5 = utils.AvgrageMeter() - test_py_reader.start() - test_start_time = time.time() - step_id = 0 - try: - while True: - prev_test_start_time = test_start_time - test_start_time = time.time() - loss_test_v, acc_1_v, acc_5_v = exe.run( - test_prog, fetch_list=test_fetch_list) - objs.update(np.array(loss_test_v), args.batch_size) - top1.update(np.array(acc_1_v), args.batch_size) - top5.update(np.array(acc_5_v), args.batch_size) - if step_id % args.report_freq == 0: - print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}". 
- format(epoch_id, step_id, - np.array(acc_1_v), - np.array(acc_5_v), test_start_time - - prev_test_start_time)) - step_id += 1 - except fluid.core.EOFException: - test_py_reader.reset() - print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg, - top5.avg)) - - train_fetch_list = [loss_train] - epoch_start_time = time.time() - for epoch_id in range(args.epochs): - model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs - train_py_reader.start() - epoch_end_time = time.time() - if epoch_id > 0: - print("Epoch {}, total time {}".format(epoch_id - 1, epoch_end_time - - epoch_start_time)) - epoch_start_time = epoch_end_time - epoch_end_time - start_time = time.time() - step_id = 0 - try: - while True: - prev_start_time = start_time - start_time = time.time() - loss_v, = train_exe.run( - fetch_list=[v.name for v in train_fetch_list]) - print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \ - np.array(loss_v).mean(), start_time-prev_start_time)) - step_id += 1 - sys.stdout.flush() - except fluid.core.EOFException: - train_py_reader.reset() - if epoch_id % 50 == 0 or epoch_id == args.epochs - 1: - save_model(str(epoch_id), train_prog) - test(epoch_id) - - -if __name__ == '__main__': - main() diff --git a/fluid/AutoDL/LRC/utils.py b/fluid/AutoDL/LRC/utils.py deleted file mode 100644 index 4002b57c6e91f9a4f7992156c4fa07f9e55d628c..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/utils.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -import os -import sys -import time -import math -import numpy as np - - -def mixup_data(x, y, batch_size, alpha=1.0): - '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda''' - if alpha > 0.: - lam = np.random.beta(alpha, alpha) - else: - lam = 1. 
-    index = np.random.permutation(batch_size)
-
-    mixed_x = lam * x + (1 - lam) * x[index, :]
-    y_a, y_b = y, y[index]
-    return mixed_x.astype('float32'), y_a.astype('int64'),\
-            y_b.astype('int64'), np.array(lam, dtype='float32')
-
-
-class AvgrageMeter(object):
-    def __init__(self):
-        self.reset()
-
-    def reset(self):
-        self.avg = 0
-        self.sum = 0
-        self.cnt = 0
-
-    def update(self, val, n=1):
-        self.sum += val * n
-        self.cnt += n
-        self.avg = self.sum / self.cnt
diff --git a/fluid/DeepASR/.gitignore b/fluid/DeepASR/.gitignore
deleted file mode 100644
index 485dee64bcfb48793379b200a1afd14e85a8aaf4..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-.idea
diff --git a/fluid/DeepASR/README.md b/fluid/DeepASR/README.md
deleted file mode 100644
index 6b9913fd30a56ef2328bc62e9b36e496f6763430..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/README.md
+++ /dev/null
@@ -1,36 +0,0 @@
-The code sample in this directory requires the latest develop branch of PaddlePaddle. If you are on an earlier version of PaddlePaddle, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
-## Deep Automatic Speech Recognition
-
-### Introduction
-TBD
-
-### Installation
-
-#### Kaldi
-The decoder depends on [Kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. Then
-
-```shell
-export KALDI_ROOT=
-```
-
-#### Decoder
-
-```shell
-git clone https://github.com/PaddlePaddle/models.git
-cd models/fluid/DeepASR/decoder
-sh setup.sh
-```
-
-### Data preprocessing
-TBD
-
-### Training
-TBD
-
-
-### Inference & Decoding
-TBD
-
-### Question and Contribution
-TBD
diff --git a/fluid/DeepASR/README_cn.md b/fluid/DeepASR/README_cn.md
deleted file mode 100644
index be78a048701a621bd90942bdfe30ef4d7c7f082f..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/README_cn.md
+++ /dev/null
@@ -1,186 +0,0 @@
-运行本目录下的程序示例需要使用 PaddlePaddle v0.14及以上版本。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新 PaddlePaddle 安装版本。
-
----
-
-DeepASR (Deep Automatic Speech Recognition) 是一个基于PaddlePaddle Fluid与[Kaldi](http://www.kaldi-asr.org)的语音识别系统。它利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器,旨在方便已对 Kaldi 较为熟悉的用户实现声学模型的快速、大规模训练,并利用 Kaldi 完成复杂的语音数据预处理和最终的解码过程。
-
-### 目录
-- [模型概览](#模型概览)
-- [安装](#安装)
-- [数据预处理](#数据预处理)
-- [模型训练](#声学模型的训练)
-- [训练过程中的时间分析](#训练过程中的时间分析)
-- [预测和解码](#预测和解码)
-- [评估错误率](#评估错误率)
-- [Aishell 实例](#aishell-实例)
-- [欢迎贡献更多的实例](#欢迎贡献更多的实例)
-
-### 模型概览
-
-DeepASR的声学模型是一个单卷积层加多层层叠LSTMP 的结构,利用卷积来进行初步的特征提取,并用多层的LSTMP来对时序关系进行建模,所用到的损失函数是交叉熵。[LSTMP](https://arxiv.org/abs/1402.1128)(LSTM with recurrent projection layer)是传统 LSTM 的拓展,在 LSTM 的基础上增加了一个映射层,将隐含层映射到较低的维度并输入下一个时间步,这种结构在大为减小 LSTM 的参数规模和计算复杂度的同时还提升了 LSTM 的性能表现。
-
-

-图1 LSTMP 的拓扑结构
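For reference, the LSTMP recurrence from the cited paper (arXiv:1402.1128) can be sketched as follows, with notation following that paper: $\sigma$ is the sigmoid, $\odot$ the elementwise product, and $g$, $h$ the cell input and output activations:

$$
\begin{aligned}
i_t &= \sigma(W_{ix} x_t + W_{ir} r_{t-1} + W_{ic} c_{t-1} + b_i)\\
f_t &= \sigma(W_{fx} x_t + W_{fr} r_{t-1} + W_{fc} c_{t-1} + b_f)\\
c_t &= f_t \odot c_{t-1} + i_t \odot g(W_{cx} x_t + W_{cr} r_{t-1} + b_c)\\
o_t &= \sigma(W_{ox} x_t + W_{or} r_{t-1} + W_{oc} c_{t-1} + b_o)\\
m_t &= o_t \odot h(c_t)\\
r_t &= W_{rm} m_t\\
y_t &= W_{yr} r_t + b_y
\end{aligned}
$$

Projecting $m_t$ down to the lower-dimensional $r_t$ that enters the next time step is what shrinks the recurrent weight matrices, which is the parameter and computation saving described above.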
-
-### 安装
-
-
-#### kaldi的安装与设置
-
-
-DeepASR解码过程中所用的解码器依赖于[Kaldi的安装](https://github.com/kaldi-asr/kaldi),如环境中无Kaldi,请`git clone`其源代码,并按给定的命令安装好Kaldi,最后设置环境变量`KALDI_ROOT`:
-
-```shell
-export KALDI_ROOT=
-
-```
-#### 解码器的安装
-进入解码器源码所在的目录
-
-```shell
-cd models/fluid/DeepASR/decoder
-```
-运行安装脚本
-
-```shell
-sh setup.sh
-```
-编译过程完成后,即成功地安装了解码器。
-
-### 数据预处理
-
-参考[Kaldi的数据准备流程](http://kaldi-asr.org/doc/data_prep.html)完成音频数据的特征提取和标签对齐。
-
-### 声学模型的训练
-
-可以选择在CPU或GPU模式下进行声学模型的训练,例如在GPU模式下训练:
-
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py \
-    --train_feature_lst train_feature.lst \
-    --train_label_lst train_label.lst \
-    --val_feature_lst val_feature.lst \
-    --val_label_lst val_label.lst \
-    --mean_var global_mean_var \
-    --parallel
-```
-其中`train_feature.lst`和`train_label.lst`分别是训练数据集的特征列表文件和标注列表文件,类似的,`val_feature.lst`和`val_label.lst`对应的则是验证集的列表文件。实际训练过程中要正确指定建模单元大小、学习率等重要参数。关于这些参数的说明,请运行
-
-```shell
-python train.py --help
-```
-获取更多信息。
-
-### 训练过程中的时间分析
-
-利用Fluid提供的性能分析工具profiler,可对训练过程进行性能分析,获取网络中operator级别的执行时间。
-
-```shell
-CUDA_VISIBLE_DEVICES=0 python -u tools/profile.py \
-    --train_feature_lst train_feature.lst \
-    --train_label_lst train_label.lst \
-    --val_feature_lst val_feature.lst \
-    --val_label_lst val_label.lst \
-    --mean_var global_mean_var
-```
-
-
-### 预测和解码
-
-在充分训练好声学模型之后,利用训练过程中保存下来的模型checkpoint,可对输入的音频数据进行解码输出,得到声音到文字的识别结果。
-
-```
-CUDA_VISIBLE_DEVICES=0,1,2,3 python -u infer_by_ckpt.py \
-    --batch_size 96  \
-    --checkpoint deep_asr.pass_1.checkpoint \
-    --infer_feature_lst test_feature.lst  \
-    --infer_label_lst test_label.lst  \
-    --mean_var global_mean_var \
-    --parallel
-```
-
-### 评估错误率
-
-对语音识别系统的评价常用的指标有词错误率(Word Error Rate, WER)和字错误率(Character Error Rate, CER),在DeepASR中也实现了相关的度量工具,其运行方式为
-
-```
-python score_error_rate.py --error_rate_type cer --ref ref.txt --hyp decoding.txt
-```
-参数`error_rate_type`表示测量错误率的类型,即 WER 或 CER;`ref.txt` 和 `decoding.txt` 分别表示参考文本和实际解码出的文本,它们有着同样的格式:
-
-```
-key1 text1
-key2 text2
-key3 text3
-...
-
-```
-
-
-### Aishell 实例
-
-本节以[Aishell数据集](http://www.aishelltech.com/kysjcp)为例,展示如何完成从数据预处理到解码输出的全过程。Aishell是由北京希尔贝壳科技有限公司所开放的中文普通话语音数据集,时长178小时,包含了400名来自不同口音区域录制者的语音,原始数据可由[openslr](http://www.openslr.org/33)获取。为简化流程,这里提供了已完成预处理的数据集供下载:
-
-```
-cd examples/aishell
-sh prepare_data.sh
-```
-
-其中包括了声学模型的训练数据以及解码过程中所用到的辅助文件等。下载数据完成后,在开始训练之前可对训练过程进行分析
-
-```
-sh profile.sh
-```
-
-执行训练
-
-```
-sh train.sh
-```
-默认是用4卡GPU进行训练,在实际过程中可根据可用GPU的数目和显存大小对`batch_size`、学习率等参数进行动态调整。训练过程中典型的损失函数和精度的变化趋势如图2所示
-

-图2 在Aishell数据集上训练声学模型的学习曲线
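For intuition before moving on to inference: the CER reported by `score_error_rate.py` above (and by `score_cer.sh` below) is essentially character-level edit distance divided by reference length. A minimal sketch follows; it is illustrative only, not the tool's actual implementation, and it omits the per-utterance aggregation over keyed lines. The example sentence is taken from the decoding sample below, with one character deliberately changed:

```python
# -*- coding: utf-8 -*-
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (cost 0 if equal)
            prev = cur
    return dp[-1]

ref = u'今年土地收入预计近四万亿元'
hyp = u'今年土地收入预计进四万亿元'  # one substituted character
dist = edit_distance(ref, hyp)
print('Error rate[cer] = %f (%d/%d)' % (float(dist) / len(ref), dist, len(ref)))
# -> Error rate[cer] = 0.076923 (1/13)
```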
- -完成模型训练后,即可执行预测识别测试集语音中的文字: - -``` -sh infer_by_ckpt.sh -``` - -其中包括了声学模型的预测和解码器的解码输出两个重要的过程。以下是解码输出的样例: - -``` -... -BAC009S0764W0239 十一 五 期间 我 国 累计 境外 投资 七千亿 美元 -BAC009S0765W0140 在 了解 送 方 的 资产 情况 与 需求 之后 -BAC009S0915W0291 这 对 苹果 来说 不 是 件 容易 的 事 儿 -BAC009S0769W0159 今年 土地 收入 预计 近 四万亿 元 -BAC009S0907W0451 由 浦东 商店 作为 掩护 -BAC009S0768W0128 土地 交易 可能 随着 供应 淡季 的 到来 而 降温 -... -``` - -每行对应一个输出,均以音频样本的关键字开头,随后是按词分隔的解码出的中文文本。解码完成后运行脚本评估字错误率(CER) - -``` -sh score_cer.sh -``` - -其输出类似于如下所示 - -``` -Error rate[cer] = 0.101971 (10683/104765), -total 7176 sentences in hyp, 0 not presented in ref. -``` - -利用经过20轮左右训练的声学模型,可以在Aishell的测试集上得到CER约10%的识别结果。 - - -### 欢迎贡献更多的实例 - -DeepASR目前只开放了Aishell实例,我们欢迎用户在更多的数据集上测试完整的训练流程并贡献到这个项目中。 diff --git a/fluid/DeepASR/data_utils/__init__.py b/fluid/DeepASR/data_utils/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/fluid/DeepASR/data_utils/async_data_reader.py b/fluid/DeepASR/data_utils/async_data_reader.py deleted file mode 100644 index edface051129b248bad85978118daec6f8660adc..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/async_data_reader.py +++ /dev/null @@ -1,465 +0,0 @@ -"""This module contains data processing related logic. -""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import random -import struct -import Queue -import time -import numpy as np -from threading import Thread -import signal -from multiprocessing import Manager, Process -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -from data_utils.util import suppress_complaints, suppress_signal -from data_utils.util import CriticalException, ForceExitWrapper - - -class SampleInfo(object): - """SampleInfo holds the necessary information to load a sample from disk. - - Args: - feature_bin_path (str): File containing the feature data. - feature_start (int): Start position of the sample's feature data. - feature_size (int): Byte count of the sample's feature data. - feature_frame_num (int): Time length of the sample. - feature_dim (int): Feature dimension of one frame. - label_bin_path (str): File containing the label data. - label_size (int): Byte count of the sample's label data. - label_frame_num (int): Label number of the sample. - sample_name (str): Key of the sample - """ - - def __init__(self, feature_bin_path, feature_start, feature_size, - feature_frame_num, feature_dim, label_bin_path, label_start, - label_size, label_frame_num, sample_name): - self.feature_bin_path = feature_bin_path - self.feature_start = feature_start - self.feature_size = feature_size - self.feature_frame_num = feature_frame_num - self.feature_dim = feature_dim - - self.label_bin_path = label_bin_path - self.label_start = label_start - self.label_size = label_size - self.label_frame_num = label_frame_num - self.sample_name = sample_name - - -class SampleInfoBucket(object): - """SampleInfoBucket contains paths of several description files. Feature - description file contains necessary information (including path of binary - data, sample start position, sample byte number etc.) to access samples' - feature data and the same with the label description file. SampleInfoBucket - is the minimum unit to do shuffle. - - Args: - feature_bin_paths (list|tuple): Files containing the binary feature - data. 
- feature_desc_paths (list|tuple): Files containing the description of - samples' feature data. - label_bin_paths (list|tuple): Files containing the binary label data. - label_desc_paths (list|tuple): Files containing the description of - samples' label data. - split_perturb(int): Maximum perturbation value for length of - sub-sentence when splitting long sentence. - split_sentence_threshold(int): Sentence whose length larger than - the value will trigger split operation. - split_sub_sentence_len(int): sub-sentence length is equal to - (split_sub_sentence_len - + rand() % split_perturb). - """ - - def __init__(self, - feature_bin_paths, - feature_desc_paths, - label_bin_paths, - label_desc_paths, - split_perturb=50, - split_sentence_threshold=512, - split_sub_sentence_len=256): - block_num = len(label_bin_paths) - assert len(label_desc_paths) == block_num - assert len(feature_bin_paths) == block_num - assert len(feature_desc_paths) == block_num - self._block_num = block_num - - self._feature_bin_paths = feature_bin_paths - self._feature_desc_paths = feature_desc_paths - self._label_bin_paths = label_bin_paths - self._label_desc_paths = label_desc_paths - self._split_perturb = split_perturb - self._split_sentence_threshold = split_sentence_threshold - self._split_sub_sentence_len = split_sub_sentence_len - self._rng = random.Random(0) - - def generate_sample_info_list(self): - sample_info_list = [] - for block_idx in xrange(self._block_num): - label_bin_path = self._label_bin_paths[block_idx] - label_desc_path = self._label_desc_paths[block_idx] - feature_bin_path = self._feature_bin_paths[block_idx] - feature_desc_path = self._feature_desc_paths[block_idx] - - feature_desc_lines = open(feature_desc_path).readlines() - - label_desc_lines = [] - if label_desc_path != "": - label_desc_lines = open(label_desc_path).readlines() - sample_num = int(feature_desc_lines[0].split()[1]) - - if label_desc_path != "": - assert sample_num == int(label_desc_lines[0].split()[1]) - - for i in xrange(sample_num): - feature_desc_split = feature_desc_lines[i + 1].split() - sample_name = feature_desc_split[0] - feature_start = int(feature_desc_split[2]) - feature_size = int(feature_desc_split[3]) - feature_frame_num = int(feature_desc_split[4]) - feature_dim = int(feature_desc_split[5]) - - label_start = -1 - label_size = -1 - label_frame_num = feature_frame_num - if label_desc_path != "": - label_desc_split = label_desc_lines[i + 1].split() - label_start = int(label_desc_split[2]) - label_size = int(label_desc_split[3]) - label_frame_num = int(label_desc_split[4]) - assert feature_frame_num == label_frame_num - - if self._split_sentence_threshold == -1 or \ - self._split_perturb == -1 or \ - self._split_sub_sentence_len == -1 \ - or self._split_sentence_threshold >= feature_frame_num: - sample_info_list.append( - SampleInfo(feature_bin_path, feature_start, - feature_size, feature_frame_num, feature_dim, - label_bin_path, label_start, label_size, - label_frame_num, sample_name)) - #split sentence - else: - cur_frame_pos = 0 - cur_frame_len = 0 - remain_frame_num = feature_frame_num - while True: - if remain_frame_num > self._split_sentence_threshold: - cur_frame_len = self._split_sub_sentence_len + \ - self._rng.randint(0, self._split_perturb) - if cur_frame_len > remain_frame_num: - cur_frame_len = remain_frame_num - else: - cur_frame_len = remain_frame_num - - sample_info_list.append( - SampleInfo( - feature_bin_path, feature_start + cur_frame_pos - * feature_dim * 4, cur_frame_len * feature_dim * - 4, 
cur_frame_len, feature_dim, label_bin_path, - label_start + cur_frame_pos * 4, cur_frame_len * - 4, cur_frame_len, sample_name)) - - remain_frame_num -= cur_frame_len - cur_frame_pos += cur_frame_len - if remain_frame_num <= 0: - break - return sample_info_list - - -class EpochEndSignal(): - pass - - -class AsyncDataReader(object): - """DataReader provides basic audio sample preprocessing pipeline including - data loading and data augmentation. - - Args: - feature_file_list (str): File containing paths of feature data file and - corresponding description file. - label_file_list (str): File containing paths of label data file and - corresponding description file. - drop_frame_len (int): Samples whose label length above the value will be - dropped.(Using '-1' to disable the policy) - split_sentence_threshold(int): Sentence whose length larger than - the value will trigger split operation. - (Assign -1 to disable split) - proc_num (int): Number of processes for processing data. - sample_buffer_size (int): Buffer size to indicate the maximum samples - cached. - sample_info_buffer_size (int): Buffer size to indicate the maximum - sample information cached. - batch_buffer_size (int): Buffer size to indicate the maximum batch - cached. - shuffle_block_num (int): Block number indicating the minimum unit to do - shuffle. - random_seed (int): Random seed. - verbose (int): If set to 0, complaints including exceptions and signal - traceback from sub-process will be suppressed. If set - to 1, all complaints will be printed. - """ - - def __init__(self, - feature_file_list, - label_file_list="", - drop_frame_len=512, - split_sentence_threshold=1024, - proc_num=10, - sample_buffer_size=1024, - sample_info_buffer_size=1024, - batch_buffer_size=10, - shuffle_block_num=10, - random_seed=0, - verbose=0): - self._feature_file_list = feature_file_list - self._label_file_list = label_file_list - self._drop_frame_len = drop_frame_len - self._split_sentence_threshold = split_sentence_threshold - self._shuffle_block_num = shuffle_block_num - self._block_info_list = None - self._rng = random.Random(random_seed) - self._bucket_list = None - self.generate_bucket_list(True) - self._order_id = 0 - self._manager = Manager() - self._sample_buffer_size = sample_buffer_size - self._sample_info_buffer_size = sample_info_buffer_size - self._batch_buffer_size = batch_buffer_size - self._proc_num = proc_num - self._verbose = verbose - self._force_exit = ForceExitWrapper(self._manager.Value('b', False)) - - def generate_bucket_list(self, is_shuffle): - if self._block_info_list is None: - block_feature_info_lines = open(self._feature_file_list).readlines() - self._block_info_list = [] - if self._label_file_list != "": - block_label_info_lines = open(self._label_file_list).readlines() - assert len(block_feature_info_lines) == len( - block_label_info_lines) - for i in xrange(0, len(block_feature_info_lines), 2): - block_info = (block_feature_info_lines[i], - block_feature_info_lines[i + 1], - block_label_info_lines[i], - block_label_info_lines[i + 1]) - self._block_info_list.append( - map(lambda line: line.strip(), block_info)) - else: - for i in xrange(0, len(block_feature_info_lines), 2): - block_info = (block_feature_info_lines[i], - block_feature_info_lines[i + 1], "", "") - self._block_info_list.append( - map(lambda line: line.strip(), block_info)) - - if is_shuffle: - self._rng.shuffle(self._block_info_list) - - self._bucket_list = [] - for i in xrange(0, len(self._block_info_list), self._shuffle_block_num): - 
bucket_block_info = self._block_info_list[i:i + - self._shuffle_block_num] - self._bucket_list.append( - SampleInfoBucket( - map(lambda info: info[0], bucket_block_info), - map(lambda info: info[1], bucket_block_info), - map(lambda info: info[2], bucket_block_info), - map(lambda info: info[3], bucket_block_info), - split_sentence_threshold=self._split_sentence_threshold)) - - # @TODO make this configurable - def set_transformers(self, transformers): - self._transformers = transformers - - def _sample_generator(self): - sample_info_queue = self._manager.Queue(self._sample_info_buffer_size) - sample_queue = self._manager.Queue(self._sample_buffer_size) - self._order_id = 0 - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def ordered_feeding_task(sample_info_queue): - for sample_info_bucket in self._bucket_list: - try: - sample_info_list = \ - sample_info_bucket.generate_sample_info_list() - except Exception as e: - raise CriticalException(e) - else: - self._rng.shuffle(sample_info_list) # do shuffle here - for sample_info in sample_info_list: - sample_info_queue.put((sample_info, self._order_id)) - self._order_id += 1 - - for i in xrange(self._proc_num): - sample_info_queue.put(EpochEndSignal()) - - feeding_thread = Thread( - target=ordered_feeding_task, args=(sample_info_queue, )) - feeding_thread.daemon = True - feeding_thread.start() - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def ordered_processing_task(sample_info_queue, sample_queue, out_order): - if self._verbose == 0: - signal.signal(signal.SIGTERM, suppress_signal) - signal.signal(signal.SIGINT, suppress_signal) - - def read_bytes(fpath, start, size): - try: - f = open(fpath, 'r') - f.seek(start, 0) - binary_bytes = f.read(size) - f.close() - return binary_bytes - except Exception as e: - raise CriticalException(e) - - ins = sample_info_queue.get() - - while not isinstance(ins, EpochEndSignal): - sample_info, order_id = ins - - feature_bytes = read_bytes(sample_info.feature_bin_path, - sample_info.feature_start, - sample_info.feature_size) - - assert sample_info.feature_frame_num \ - * sample_info.feature_dim * 4 \ - == len(feature_bytes), \ - (sample_info.feature_bin_path, - sample_info.feature_frame_num, - sample_info.feature_dim, - len(feature_bytes)) - - label_data = None - if sample_info.label_bin_path != "": - label_bytes = read_bytes(sample_info.label_bin_path, - sample_info.label_start, - sample_info.label_size) - - assert sample_info.label_frame_num * 4 == len( - label_bytes), (sample_info.label_bin_path, - sample_info.label_array, - len(label_bytes)) - - label_array = struct.unpack( - 'I' * sample_info.label_frame_num, label_bytes) - label_data = np.array( - label_array, dtype='int64').reshape( - (sample_info.label_frame_num, 1)) - else: - label_data = np.zeros( - (sample_info.label_frame_num, 1), dtype='int64') - - feature_frame_num = sample_info.feature_frame_num - feature_dim = sample_info.feature_dim - assert feature_frame_num * feature_dim * 4 == len(feature_bytes) - feature_array = struct.unpack('f' * feature_frame_num * - feature_dim, feature_bytes) - feature_data = np.array( - feature_array, dtype='float32').reshape(( - sample_info.feature_frame_num, sample_info.feature_dim)) - sample_data = (feature_data, label_data, - sample_info.sample_name) - for transformer in self._transformers: - # @TODO(pkuyym) to make transfomer only accept feature_data - sample_data = transformer.perform_trans(sample_data) - while order_id != out_order[0]: - time.sleep(0.001) - - # 
drop long sentence - if self._drop_frame_len == -1 or \ - self._drop_frame_len >= sample_data[0].shape[0]: - sample_queue.put(sample_data) - - out_order[0] += 1 - ins = sample_info_queue.get() - - sample_queue.put(EpochEndSignal()) - - out_order = self._manager.list([0]) - args = (sample_info_queue, sample_queue, out_order) - workers = [ - Process( - target=ordered_processing_task, args=args) - for _ in xrange(self._proc_num) - ] - - for w in workers: - w.daemon = True - w.start() - - finished_proc_num = 0 - - while self._force_exit == False: - try: - sample = sample_queue.get_nowait() - except Queue.Empty: - time.sleep(0.001) - else: - if isinstance(sample, EpochEndSignal): - finished_proc_num += 1 - if finished_proc_num >= self._proc_num: - break - else: - continue - - yield sample - - def batch_iterator(self, batch_size, minimum_batch_size): - def batch_to_ndarray(batch_samples, lod): - assert len(batch_samples) - frame_dim = batch_samples[0][0].shape[1] - batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32") - batch_label = np.zeros((lod[-1], 1), dtype="int64") - start = 0 - name_lst = [] - for sample in batch_samples: - frame_num = sample[0].shape[0] - batch_feature[start:start + frame_num, :] = sample[0] - batch_label[start:start + frame_num, :] = sample[1] - start += frame_num - name_lst.append(sample[2]) - return (batch_feature, batch_label, name_lst) - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def batch_assembling_task(sample_generator, batch_queue): - batch_samples = [] - lod = [0] - for sample in sample_generator(): - batch_samples.append(sample) - lod.append(lod[-1] + sample[0].shape[0]) - if len(batch_samples) == batch_size: - (batch_feature, batch_label, name_lst) = batch_to_ndarray( - batch_samples, lod) - batch_queue.put((batch_feature, batch_label, lod, name_lst)) - batch_samples = [] - lod = [0] - - if len(batch_samples) >= minimum_batch_size: - (batch_feature, batch_label, name_lst) = batch_to_ndarray( - batch_samples, lod) - batch_queue.put((batch_feature, batch_label, lod, name_lst)) - - batch_queue.put(EpochEndSignal()) - - batch_queue = Queue.Queue(self._batch_buffer_size) - - assembling_thread = Thread( - target=batch_assembling_task, - args=(self._sample_generator, batch_queue)) - assembling_thread.daemon = True - assembling_thread.start() - - while self._force_exit == False: - try: - batch_data = batch_queue.get_nowait() - except Queue.Empty: - time.sleep(0.001) - else: - if isinstance(batch_data, EpochEndSignal): - break - yield batch_data diff --git a/fluid/DeepASR/data_utils/augmentor/__init__.py b/fluid/DeepASR/data_utils/augmentor/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/fluid/DeepASR/data_utils/augmentor/tests/__init__.py b/fluid/DeepASR/data_utils/augmentor/tests/__init__.py deleted file mode 100644 index 90856dc44374211453f7de128c08c8004ffda912..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/tests/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice diff --git a/fluid/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr 
b/fluid/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr deleted file mode 100644 index 7fabadc789bbd7aaad4e9ac59aba95b080c68b22..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr +++ /dev/null @@ -1,120 +0,0 @@ -16.2845556399 11.6891798673 -17.21509949 12.3788567902 -18.1143704548 14.9912618017 -19.2335963752 18.5419556172 -19.9266772451 21.2768220522 -19.8245737202 21.2347210705 -19.5432940972 20.2784036567 -19.4631271754 20.2934452329 -19.3929919324 20.457971868 -19.2924788362 20.3626439234 -18.9207244502 19.9196569759 -18.7202605641 19.5920276899 -18.4844279398 19.2068349019 -18.2670948624 18.8716893824 -18.0929628855 18.5439666541 -17.8428896026 18.0255891747 -17.6646850635 17.473764296 -17.4955705896 16.8966859471 -17.3706720293 16.4294027467 -17.2530867792 16.0514717623 -17.1304341172 15.7234699057 -17.0038353287 15.4344471514 -16.902550309 15.1603287337 -16.8375590047 14.9304337826 -16.816287853 14.9119310513 -16.828838265 15.0930023024 -16.8602209498 15.3771992423 -16.9101763812 15.6897991789 -16.9466065143 15.9364556489 -16.9486061956 16.0699417826 -16.9041374104 16.0796970272 -16.8410093699 16.0111444599 -16.7045718836 15.7991985601 -16.51128489 15.5208920129 -16.3253910608 15.2603181921 -16.1297317333 14.9499965958 -15.903428372 14.5958280409 -15.6131718105 14.2709618 -15.1395035533 13.9993939893 -14.4298229999 13.3841189151 -0.0034970565424 0.246184766149 -0.00501284154705 0.238484972472 -0.00605942680019 0.269064381708 -0.00687266156243 0.319479238011 -0.00734065019253 0.371947383205 -0.00718807218417 0.384426479694 -0.00652195540212 0.384676838281 -0.00660416525951 0.395543910317 -0.00680202057642 0.400803979681 -0.00659144183007 0.393228973031 -0.00605294530423 0.385021118038 -0.00590452969394 0.361763039625 -0.00612315374687 0.346777773373 -0.00582354093973 0.335802403976 -0.00574556002554 0.320733728218 -0.00612254485891 0.310153103033 -0.00626733043219 0.299854747445 -0.00567398408041 0.293353685493 -0.00519236700706 0.287668810947 -0.00529581474367 0.281479660772 -0.00479019484082 0.27451415777 -0.00486381039428 0.266294391154 -0.00491126372868 0.258105116126 -0.00452105305011 0.252926328298 -0.00531483334271 0.250910887373 -0.00546572110469 0.253302256977 -0.00479544857908 0.258484183394 -0.00422106426297 0.264582900173 -0.00401824135188 0.268467945623 -0.0041705465252 0.269699480291 -0.00405239564143 0.270406162975 -0.0040059737566 0.270407601782 -0.00406426729317 0.267951582656 -0.00416613791013 0.264543833042 -0.00427847607653 0.26247798891 -0.00428050903034 0.259635263243 -0.00454842971786 0.255829377617 -0.00393747552387 0.253802307025 -0.00374143688909 0.251011478787 -0.00335475310258 0.236543650856 -0.000373194755312 0.0419494800709 -0.000230909648678 0.0394102370205 -0.000150840015851 0.0414956922398 -8.44401840771e-05 0.0460502231327 --6.24759314572e-06 0.0528049937739 --8.82957758148e-05 0.055711244886 -1.16795791952e-05 0.0563188428833 --1.68716267856e-05 0.0575232763711 --0.000112625308645 0.057979929947 --0.000122619090002 0.0564126233493 -1.73569637319e-05 0.05522573909 -6.49872782342e-05 0.0507353361334 -4.17746389178e-05 0.0479568131253 -5.13884475653e-05 0.0461253238047 -1.8860115143e-05 0.0436860476919 --5.64317701105e-05 0.042516381059 --0.000136859948115 0.0413574820205 --7.00847019726e-05 0.0409516370727 --5.39392223336e-05 0.040441504085 --9.24897162815e-05 0.0397800398173 -4.7104970622e-05 0.039046286243 -6.24805896165e-06 0.0380185986602 --2.35272813418e-05 
0.036851063786 -5.88344154127e-05 0.0361640489242 --8.39162076993e-05 0.0357639427311 --0.000108702805776 0.0358774639538 -3.22013961834e-06 0.0363644530435 -9.43501518394e-05 0.0370309934774 -0.000134406229423 0.0374972993343 -3.84007008533e-05 0.037676222515 -3.05989328157e-05 0.0379111939182 -9.52201629091e-05 0.0380927209106 -0.000102126083729 0.0379925358499 -6.98628072264e-05 0.0377276252241 -4.55782256339e-05 0.0375165468654 -4.76370987786e-05 0.0371482526345 --2.24128832709e-05 0.0366810742947 -0.000125621306953 0.036628355271 -0.000134568666093 0.0364860461759 -0.000159858844464 0.0345583593149 diff --git a/fluid/DeepASR/data_utils/augmentor/tests/test_data_trans.py b/fluid/DeepASR/data_utils/augmentor/tests/test_data_trans.py deleted file mode 100644 index 6b18f3fa5958a9e44899b39b1f583311f186f72e..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/tests/test_data_trans.py +++ /dev/null @@ -1,136 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import unittest -import numpy as np -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay - - -class TestTransMeanVarianceNorm(unittest.TestCase): - """unit test for TransMeanVarianceNorm - """ - - def setUp(self): - self._file_path = "./data_utils/augmentor/tests/data/" \ - "global_mean_var_search26kHr" - - def test(self): - feature = np.zeros((2, 120), dtype="float32") - feature.fill(1) - trans = trans_mean_variance_norm.TransMeanVarianceNorm(self._file_path) - (feature1, label1, name) = trans.perform_trans((feature, None, None)) - (mean, var) = trans.get_mean_var() - feature_flat1 = feature1.flatten() - feature_flat = feature.flatten() - one = np.ones((1), dtype="float32") - for idx, val in enumerate(feature_flat1): - cur_idx = idx % 120 - self.assertAlmostEqual(val, (one[0] - mean[cur_idx]) * var[cur_idx]) - - -class TestTransAddDelta(unittest.TestCase): - """unit test TestTransAddDelta - """ - - def test_regress(self): - """test regress - """ - feature = np.zeros((14, 120), dtype="float32") - feature[0:5, 0:40].fill(1) - feature[0 + 5, 0:40].fill(1) - feature[1 + 5, 0:40].fill(2) - feature[2 + 5, 0:40].fill(3) - feature[3 + 5, 0:40].fill(4) - feature[8:14, 0:40].fill(4) - trans = trans_add_delta.TransAddDelta() - feature = feature.reshape((14 * 120)) - trans._regress(feature, 5 * 120, feature, 5 * 120 + 40, 40, 4, 120) - trans._regress(feature, 5 * 120 + 40, feature, 5 * 120 + 80, 40, 4, 120) - feature = feature.reshape((14, 120)) - tmp_feature = feature[5:5 + 4, :] - self.assertAlmostEqual(1.0, tmp_feature[0][0]) - self.assertAlmostEqual(0.24, tmp_feature[0][119]) - self.assertAlmostEqual(2.0, tmp_feature[1][0]) - self.assertAlmostEqual(0.13, tmp_feature[1][119]) - self.assertAlmostEqual(3.0, tmp_feature[2][0]) - self.assertAlmostEqual(-0.13, tmp_feature[2][119]) - self.assertAlmostEqual(4.0, tmp_feature[3][0]) - self.assertAlmostEqual(-0.24, tmp_feature[3][119]) - - def test_perform(self): - """test perform - """ - feature = np.zeros((4, 40), dtype="float32") - feature[0, 0:40].fill(1) - feature[1, 0:40].fill(2) - feature[2, 0:40].fill(3) - feature[3, 0:40].fill(4) - trans = trans_add_delta.TransAddDelta() - (feature, label, name) = trans.perform_trans((feature, None, None)) - self.assertAlmostEqual(feature.shape[0], 
4) - self.assertAlmostEqual(feature.shape[1], 120) - self.assertAlmostEqual(1.0, feature[0][0]) - self.assertAlmostEqual(0.24, feature[0][119]) - self.assertAlmostEqual(2.0, feature[1][0]) - self.assertAlmostEqual(0.13, feature[1][119]) - self.assertAlmostEqual(3.0, feature[2][0]) - self.assertAlmostEqual(-0.13, feature[2][119]) - self.assertAlmostEqual(4.0, feature[3][0]) - self.assertAlmostEqual(-0.24, feature[3][119]) - - -class TestTransSplict(unittest.TestCase): - """unit test Test TransSplict - """ - - def test_perfrom(self): - feature = np.zeros((8, 10), dtype="float32") - for i in xrange(feature.shape[0]): - feature[i, :].fill(i) - - trans = trans_splice.TransSplice() - (feature, label, name) = trans.perform_trans((feature, None, None)) - self.assertEqual(feature.shape[1], 110) - - for i in xrange(8): - nzero_num = 5 - i - cur_val = 0.0 - if nzero_num < 0: - cur_val = i - 5 - 1 - for j in xrange(11): - if j <= nzero_num: - for k in xrange(10): - self.assertAlmostEqual(feature[i][j * 10 + k], cur_val) - else: - if cur_val < 7: - cur_val += 1.0 - for k in xrange(10): - self.assertAlmostEqual(feature[i][j * 10 + k], cur_val) - - -class TestTransDelay(unittest.TestCase): - """unittest TransDelay - """ - - def test_perform(self): - label = np.zeros((10, 1), dtype="int64") - for i in xrange(10): - label[i][0] = i - - trans = trans_delay.TransDelay(5) - (_, label, _) = trans.perform_trans((None, label, None)) - - for i in xrange(5): - self.assertAlmostEqual(label[i + 5][0], i) - - for i in xrange(5): - self.assertAlmostEqual(label[i][0], 0) - - -if __name__ == '__main__': - unittest.main() diff --git a/fluid/DeepASR/data_utils/augmentor/trans_add_delta.py b/fluid/DeepASR/data_utils/augmentor/trans_add_delta.py deleted file mode 100644 index aa8062f87c932b76dd8a79db825d07e8be273857..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/trans_add_delta.py +++ /dev/null @@ -1,104 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math -import copy - - -class TransAddDelta(object): - """ add delta of feature data - trans feature for shape(a, b) to shape(a, b * 3) - - Attributes: - _norder(int): - _window(int): - """ - - def __init__(self, norder=2, nwindow=2): - """ init construction - Args: - norder: default 2 - nwindow: default 2 - """ - self._norder = norder - self._nwindow = nwindow - - def perform_trans(self, sample): - """ add delta for feature - trans feature shape from (a,b) to (a, b * 3) - - Args: - sample(object,tuple): contain feature numpy and label numpy - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - frame_dim = feature.shape[1] - d_frame_dim = frame_dim * 3 - head_filled = 5 - tail_filled = 5 - mat = np.zeros( - (feature.shape[0] + head_filled + tail_filled, d_frame_dim), - dtype="float32") - #copy first frame - for i in xrange(head_filled): - np.copyto(mat[i, 0:frame_dim], feature[0, :]) - - np.copyto(mat[head_filled:head_filled + feature.shape[0], 0:frame_dim], - feature[:, :]) - - # copy last frame - for i in xrange(head_filled + feature.shape[0], mat.shape[0], 1): - np.copyto(mat[i, 0:frame_dim], feature[feature.shape[0] - 1, :]) - - nframe = feature.shape[0] - start = head_filled - tmp_shape = mat.shape - mat = mat.reshape((tmp_shape[0] * tmp_shape[1])) - self._regress(mat, start * d_frame_dim, mat, - start * d_frame_dim + frame_dim, frame_dim, nframe, - d_frame_dim) - self._regress(mat, start * d_frame_dim + 
frame_dim, mat, - start * d_frame_dim + 2 * frame_dim, frame_dim, nframe, - d_frame_dim) - mat.shape = tmp_shape - return (mat[head_filled:mat.shape[0] - tail_filled, :], label, name) - - def _regress(self, data_in, start_in, data_out, start_out, size, n, step): - """ regress - Args: - data_in: in data - start_in: start index of data_in - data_out: out data - start_out: start index of data_out - size: frame dimentional - n: frame num - step: 3 * (frame num) - Returns: - None - """ - sigma_t2 = 0.0 - delta_window = self._nwindow - for t in xrange(1, delta_window + 1): - sigma_t2 += t * t - - sigma_t2 *= 2.0 - for i in xrange(n): - fp1 = start_in - fp2 = start_out - for j in xrange(size): - back = fp1 - forw = fp1 - sum = 0.0 - for t in xrange(1, delta_window + 1): - back -= step - forw += step - sum += t * (data_in[forw] - data_in[back]) - - data_out[fp2] = sum / sigma_t2 - fp1 += 1 - fp2 += 1 - start_in += step - start_out += step diff --git a/fluid/DeepASR/data_utils/augmentor/trans_delay.py b/fluid/DeepASR/data_utils/augmentor/trans_delay.py deleted file mode 100644 index b782498edfd5443806a6c80e3b4fe91b8e2b1cc9..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/trans_delay.py +++ /dev/null @@ -1,37 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransDelay(object): - """ Delay label, and copy first label value in the front. - Attributes: - _delay_time : the delay frame num of label - """ - - def __init__(self, delay_time): - """init construction - Args: - delay_time : the delay frame num of label - """ - self._delay_time = delay_time - - def perform_trans(self, sample): - """ - Args: - sample(object):input sample, contain feature numpy and label numpy, sample name list - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - - shape = label.shape - assert len(shape) == 2 - label[self._delay_time:shape[0]] = label[0:shape[0] - self._delay_time] - for i in xrange(self._delay_time): - label[i][0] = label[self._delay_time][0] - - return (feature, label, name) diff --git a/fluid/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py b/fluid/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py deleted file mode 100644 index 9f91b726ea2bcd432340cd06a3cb9006cd5f83f4..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py +++ /dev/null @@ -1,71 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransMeanVarianceNorm(object): - """ normalization of mean variance for feature data - Attributes: - _mean(numpy.array): the feature mean vector - _var(numpy.array): the feature variance - """ - - def __init__(self, snorm_path): - """init construction - Args: - snorm_path: the path of mean and variance - """ - self._mean = None - self._var = None - self._load_norm(snorm_path) - - def _load_norm(self, snorm_path): - """ load mean var file - Args: - snorm_path(str):the file path - """ - lLines = open(snorm_path).readlines() - nLen = len(lLines) - self._mean = np.zeros((nLen), dtype="float32") - self._var = np.zeros((nLen), dtype="float32") - self._nLen = nLen - for nidx, l in enumerate(lLines): - s = l.split() - assert len(s) == 2 - self._mean[nidx] = float(s[0]) - self._var[nidx] = 1.0 / math.sqrt(float(s[1])) - if self._var[nidx] > 100000.0: - self._var[nidx] = 
100000.0 - - def get_mean_var(self): - """ get mean and var - Args: - Returns: - (mean, var) - """ - return (self._mean, self._var) - - def perform_trans(self, sample): - """ feature = (feature - mean) * var - Args: - sample(object):input sample, contain feature numpy and label numpy - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - shape = feature.shape - assert len(shape) == 2 - nfeature_len = shape[0] * shape[1] - assert nfeature_len % self._nLen == 0 - ncur_idx = 0 - feature = feature.reshape((nfeature_len)) - while ncur_idx < nfeature_len: - block = feature[ncur_idx:ncur_idx + self._nLen] - block = (block - self._mean) * self._var - feature[ncur_idx:ncur_idx + self._nLen] = block - ncur_idx += self._nLen - feature = feature.reshape(shape) - return (feature, label, name) diff --git a/fluid/DeepASR/data_utils/augmentor/trans_splice.py b/fluid/DeepASR/data_utils/augmentor/trans_splice.py deleted file mode 100644 index 1fab3d6b442c1613f18d16fd0b0ee89464dbeb2c..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/augmentor/trans_splice.py +++ /dev/null @@ -1,64 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransSplice(object): - """ copy feature context to construct new feature - expand feature data from shape (frame_num, frame_dim) - to shape (frame_num, frame_dim * 11) - - Attributes: - _nleft_context(int): copy left context number - _nright_context(int): copy right context number - """ - - def __init__(self, nleft_context=5, nright_context=5): - """ init construction - Args: - nleft_context(int): - nright_context(int): - """ - self._nleft_context = nleft_context - self._nright_context = nright_context - - def perform_trans(self, sample): - """ copy feature context - Args: - sample(object): input sample(feature, label) - Return: - (feature, label, name) - """ - (feature, label, name) = sample - nframe_num = feature.shape[0] - nframe_dim = feature.shape[1] - nnew_frame_dim = nframe_dim * ( - self._nleft_context + self._nright_context + 1) - mat = np.zeros( - (nframe_num + self._nleft_context + self._nright_context, - nframe_dim), - dtype="float32") - ret = np.zeros((nframe_num, nnew_frame_dim), dtype="float32") - - #copy left - for i in xrange(self._nleft_context): - mat[i, :] = feature[0, :] - - #copy middle - mat[self._nleft_context:self._nleft_context + - nframe_num, :] = feature[:, :] - - #copy right - for i in xrange(self._nright_context): - mat[i + self._nleft_context + nframe_num, :] = feature[-1, :] - - mat = mat.reshape(mat.shape[0] * mat.shape[1]) - ret = ret.reshape(ret.shape[0] * ret.shape[1]) - for i in xrange(nframe_num): - np.copyto(ret[i * nnew_frame_dim:(i + 1) * nnew_frame_dim], - mat[i * nframe_dim:i * nframe_dim + nnew_frame_dim]) - ret = ret.reshape((nframe_num, nnew_frame_dim)) - return (ret, label, name) diff --git a/fluid/DeepASR/data_utils/util.py b/fluid/DeepASR/data_utils/util.py deleted file mode 100644 index 4a5a8a3f1dad1c46ed773fd48d713e276717d5e5..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/data_utils/util.py +++ /dev/null @@ -1,71 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import sys -from six import reraise -from tblib import Traceback - -import numpy as np - - -def to_lodtensor(data, place): - """convert tensor to lodtensor - """ - seq_lens = [len(seq) for seq in data] - cur_len = 0 - lod = 
[cur_len]
-    for l in seq_lens:
-        cur_len += l
-        lod.append(cur_len)
-    flattened_data = np.concatenate(data, axis=0).astype("int64")
-    flattened_data = flattened_data.reshape([len(flattened_data), 1])
-    # Deferred import: fluid is not imported at the top of this module.
-    import paddle.fluid as fluid
-    res = fluid.LoDTensor()
-    res.set(flattened_data, place)
-    res.set_lod([lod])
-    return res
-
-
-def split_infer_result(infer_seq, lod):
-    infer_batch = []
-    for i in xrange(0, len(lod[0]) - 1):
-        infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
-    return infer_batch
-
-
-class CriticalException(Exception):
-    pass
-
-
-def suppress_signal(signo, stack_frame):
-    pass
-
-
-def suppress_complaints(verbose, notify=None):
-    # Swallow exceptions raised in worker threads/processes, re-raising them
-    # only in verbose mode or when they are critical.
-    def decorator_maker(func):
-        def suppress_wrapper(*args, **kwargs):
-            try:
-                func(*args, **kwargs)
-            except:
-                et, ev, tb = sys.exc_info()
-
-                if notify is not None:
-                    notify(except_type=et, except_value=ev, traceback=tb)
-
-                if verbose == 1 or isinstance(ev, CriticalException):
-                    reraise(et, ev, Traceback(tb).as_traceback())
-
-        return suppress_wrapper
-
-    return decorator_maker
-
-
-class ForceExitWrapper(object):
-    def __init__(self, exit_flag):
-        self._exit_flag = exit_flag
-
-    @suppress_complaints(verbose=0)
-    def __call__(self, *args, **kwargs):
-        self._exit_flag.value = True
-
-    def __eq__(self, flag):
-        return self._exit_flag.value == flag
diff --git a/fluid/DeepASR/decoder/.gitignore b/fluid/DeepASR/decoder/.gitignore
deleted file mode 100644
index ef5c97cfb5c06f3308980ca65c87e9c4b9440171..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/decoder/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-ThreadPool
-build
-post_latgen_faster_mapped.so
-pybind11
diff --git a/fluid/DeepASR/decoder/post_latgen_faster_mapped.cc b/fluid/DeepASR/decoder/post_latgen_faster_mapped.cc
deleted file mode 100644
index ad8aaa84803d61bbce3d76757954e47f8585ed8b..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/decoder/post_latgen_faster_mapped.cc
+++ /dev/null
@@ -1,305 +0,0 @@
-/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
*/ - -#include "post_latgen_faster_mapped.h" -#include -#include "ThreadPool.h" - -using namespace kaldi; -typedef kaldi::int32 int32; -using fst::SymbolTable; -using fst::Fst; -using fst::StdArc; - -Decoder::Decoder(std::string trans_model_in_filename, - std::string word_syms_filename, - std::string fst_in_filename, - std::string logprior_in_filename, - size_t beam_size, - kaldi::BaseFloat acoustic_scale) { - const char *usage = - "Generate lattices using neural net model.\n" - "Usage: post-latgen-faster-mapped [options] " - " " - " [ [] " - "]\n"; - ParseOptions po(usage); - allow_partial = false; - this->acoustic_scale = acoustic_scale; - - config.Register(&po); - int32 beam = 11; - po.Register("acoustic-scale", - &acoustic_scale, - "Scaling factor for acoustic likelihoods"); - po.Register("word-symbol-table", - &word_syms_filename, - "Symbol table for words [for debug output]"); - po.Register("allow-partial", - &allow_partial, - "If true, produce output even if end state was not reached."); - - int argc = 2; - char *argv[] = {(char *)"post-latgen-faster-mapped", - (char *)("--beam=" + std::to_string(beam_size)).c_str()}; - - po.Read(argc, argv); - - std::ifstream is_logprior(logprior_in_filename); - logprior.Read(is_logprior, false); - - { - bool binary; - Input ki(trans_model_in_filename, &binary); - this->trans_model.Read(ki.Stream(), binary); - } - - this->determinize = config.determinize_lattice; - - this->word_syms = NULL; - if (word_syms_filename != "") { - if (!(word_syms = fst::SymbolTable::ReadText(word_syms_filename))) { - KALDI_ERR << "Could not read symbol table from file " - << word_syms_filename; - } - } - - // Input FST is just one FST, not a table of FSTs. - this->decode_fst = fst::ReadFstKaldiGeneric(fst_in_filename); - - kaldi::LatticeFasterDecoder *decoder = - new LatticeFasterDecoder(*decode_fst, config); - decoder_pool.emplace_back(decoder); - - std::string lattice_wspecifier = - "ark:|gzip -c > mapped_decoder_data/lat.JOB.gz"; - if (!(determinize ? 
compact_lattice_writer.Open(lattice_wspecifier)
-                     : lattice_writer.Open(lattice_wspecifier)))
-    KALDI_ERR << "Could not open table for writing lattices: "
-              << lattice_wspecifier;
-
-  words_writer = new Int32VectorWriter("");
-  alignment_writer = new Int32VectorWriter("");
-}
-
-Decoder::~Decoder() {
-  if (!this->word_syms) delete this->word_syms;
-  delete this->decode_fst;
-  for (size_t i = 0; i < decoder_pool.size(); ++i) {
-    delete decoder_pool[i];
-  }
-  delete words_writer;
-  delete alignment_writer;
-}
-
-
-void Decoder::decode_from_file(std::string posterior_rspecifier,
-                               size_t num_processes) {
-  try {
-    double tot_like = 0.0;
-    kaldi::int64 frame_count = 0;
-    // int num_success = 0, num_fail = 0;
-
-    KALDI_ASSERT(ClassifyRspecifier(fst_in_filename, NULL, NULL) ==
-                 kNoRspecifier);
-    SequentialBaseFloatMatrixReader posterior_reader("ark:" +
-                                                     posterior_rspecifier);
-
-    Timer timer;
-    timer.Reset();
-    double elapsed = 0.0;
-
-    for (size_t n = decoder_pool.size(); n < num_processes; ++n) {
-      kaldi::LatticeFasterDecoder *decoder =
-          new LatticeFasterDecoder(*decode_fst, config);
-      decoder_pool.emplace_back(decoder);
-    }
-    elapsed = timer.Elapsed();
-    ThreadPool thread_pool(num_processes);
-
-    while (!posterior_reader.Done()) {
-      timer.Reset();
-      std::vector<std::future<std::string>> que;
-      for (size_t i = 0; i < num_processes && !posterior_reader.Done(); ++i) {
-        std::string utt = posterior_reader.Key();
-        Matrix<BaseFloat> &loglikes(posterior_reader.Value());
-        que.emplace_back(thread_pool.enqueue(std::bind(
-            &Decoder::decode_internal, this, decoder_pool[i], utt, loglikes)));
-        posterior_reader.Next();
-      }
-      timer.Reset();
-      for (size_t i = 0; i < que.size(); ++i) {
-        std::cout << que[i].get() << std::endl;
-      }
-    }
-
-  } catch (const std::exception &e) {
-    std::cerr << e.what();
-  }
-}
-
-inline kaldi::Matrix<kaldi::BaseFloat> vector2kaldi_mat(
-    const std::vector<std::vector<kaldi::BaseFloat>> &log_probs) {
-  size_t num_frames = log_probs.size();
-  size_t dim_label = log_probs[0].size();
-  kaldi::Matrix<kaldi::BaseFloat> loglikes(
-      num_frames, dim_label, kaldi::kSetZero, kaldi::kStrideEqualNumCols);
-  for (size_t i = 0; i < num_frames; ++i) {
-    memcpy(loglikes.Data() + i * dim_label,
-           log_probs[i].data(),
-           sizeof(kaldi::BaseFloat) * dim_label);
-  }
-  return loglikes;
-}
-
-std::vector<std::string> Decoder::decode_batch(
-    std::vector<std::string> keys,
-    const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>
-        &log_probs_batch,
-    size_t num_processes) {
-  ThreadPool thread_pool(num_processes);
-  std::vector<std::string> decoding_results;  //(keys.size(), "");
-
-  for (size_t n = decoder_pool.size(); n < num_processes; ++n) {
-    kaldi::LatticeFasterDecoder *decoder =
-        new LatticeFasterDecoder(*decode_fst, config);
-    decoder_pool.emplace_back(decoder);
-  }
-
-  size_t index = 0;
-  while (index < keys.size()) {
-    std::vector<std::future<std::string>> res_in_que;
-    for (size_t t = 0; t < num_processes && index < keys.size(); ++t) {
-      kaldi::Matrix<kaldi::BaseFloat> loglikes =
-          vector2kaldi_mat(log_probs_batch[index]);
-      res_in_que.emplace_back(
-          thread_pool.enqueue(std::bind(&Decoder::decode_internal,
-                                        this,
-                                        decoder_pool[t],
-                                        keys[index],
-                                        loglikes)));
-      index++;
-    }
-    for (size_t i = 0; i < res_in_que.size(); ++i) {
-      decoding_results.emplace_back(res_in_que[i].get());
-    }
-  }
-  return decoding_results;
-}
-
-std::string Decoder::decode(
-    std::string key,
-    const std::vector<std::vector<kaldi::BaseFloat>> &log_probs) {
-  kaldi::Matrix<kaldi::BaseFloat> loglikes = vector2kaldi_mat(log_probs);
-  return decode_internal(decoder_pool[0], key, loglikes);
-}
-
-
-std::string Decoder::decode_internal(
-    LatticeFasterDecoder *decoder,
-    std::string key,
-    kaldi::Matrix<kaldi::BaseFloat> &loglikes) {
-  if (loglikes.NumRows() == 0) {
-    KALDI_WARN << "Zero-length
utterance: " << key; - // num_fail++; - } - KALDI_ASSERT(loglikes.NumCols() == logprior.Dim()); - - loglikes.ApplyLog(); - loglikes.AddVecToRows(-1.0, logprior); - - DecodableMatrixScaledMapped matrix_decodable( - trans_model, loglikes, acoustic_scale); - double like; - return this->DecodeUtteranceLatticeFaster( - decoder, matrix_decodable, key, &like); -} - - -std::string Decoder::DecodeUtteranceLatticeFaster( - LatticeFasterDecoder *decoder, - DecodableInterface &decodable, // not const but is really an input. - std::string utt, - double *like_ptr) { // puts utterance's like in like_ptr on success. - using fst::VectorFst; - std::string ret = utt + ' '; - - if (!decoder->Decode(&decodable)) { - KALDI_WARN << "Failed to decode file " << utt; - return ret; - } - if (!decoder->ReachedFinal()) { - if (allow_partial) { - KALDI_WARN << "Outputting partial output for utterance " << utt - << " since no final-state reached\n"; - } else { - KALDI_WARN << "Not producing output for utterance " << utt - << " since no final-state reached and " - << "--allow-partial=false.\n"; - return ret; - } - } - - double likelihood; - LatticeWeight weight; - int32 num_frames; - { // First do some stuff with word-level traceback... - VectorFst decoded; - if (!decoder->GetBestPath(&decoded)) - // Shouldn't really reach this point as already checked success. - KALDI_ERR << "Failed to get traceback for utterance " << utt; - - std::vector alignment; - std::vector words; - GetLinearSymbolSequence(decoded, &alignment, &words, &weight); - num_frames = alignment.size(); - // if (alignment_writer->IsOpen()) alignment_writer->Write(utt, alignment); - if (word_syms != NULL) { - for (size_t i = 0; i < words.size(); i++) { - std::string s = word_syms->Find(words[i]); - ret += s + ' '; - } - } - likelihood = -(weight.Value1() + weight.Value2()); - } - - // Get lattice, and do determinization if requested. - Lattice lat; - decoder->GetRawLattice(&lat); - if (lat.NumStates() == 0) - KALDI_ERR << "Unexpected problem getting lattice for utterance " << utt; - fst::Connect(&lat); - if (determinize) { - CompactLattice clat; - if (!DeterminizeLatticePhonePrunedWrapper( - trans_model, - &lat, - decoder->GetOptions().lattice_beam, - &clat, - decoder->GetOptions().det_opts)) - KALDI_WARN << "Determinization finished earlier than the beam for " - << "utterance " << utt; - // We'll write the lattice without acoustic scaling. - if (acoustic_scale != 0.0) - fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &clat); - // disable output lattice temporarily - // compact_lattice_writer.Write(utt, clat); - } else { - // We'll write the lattice without acoustic scaling. - if (acoustic_scale != 0.0) - fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &lat); - // lattice_writer.Write(utt, lat); - } - return ret; -} diff --git a/fluid/DeepASR/decoder/post_latgen_faster_mapped.h b/fluid/DeepASR/decoder/post_latgen_faster_mapped.h deleted file mode 100644 index 9c234b8681690b9f1e3d30b61ac3b97b7055887f..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/decoder/post_latgen_faster_mapped.h +++ /dev/null @@ -1,80 +0,0 @@ -/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. */ - -#include -#include -#include "base/kaldi-common.h" -#include "base/timer.h" -#include "decoder/decodable-matrix.h" -#include "decoder/decoder-wrappers.h" -#include "fstext/kaldi-fst-io.h" -#include "hmm/transition-model.h" -#include "tree/context-dep.h" -#include "util/common-utils.h" - -class Decoder { -public: - Decoder(std::string trans_model_in_filename, - std::string word_syms_filename, - std::string fst_in_filename, - std::string logprior_in_filename, - size_t beam_size, - kaldi::BaseFloat acoustic_scale); - ~Decoder(); - - // Interface to accept the scores read from specifier and print - // the decoding results directly - void decode_from_file(std::string posterior_rspecifier, - size_t num_processes = 1); - - // Accept the scores of one utterance and return the decoding result - std::string decode( - std::string key, - const std::vector> &log_probs); - - // Accept the scores of utterances in batch and return the decoding results - std::vector decode_batch( - std::vector key, - const std::vector>> - &log_probs_batch, - size_t num_processes = 1); - -private: - // For decoding one utterance - std::string decode_internal(kaldi::LatticeFasterDecoder *decoder, - std::string key, - kaldi::Matrix &loglikes); - - std::string DecodeUtteranceLatticeFaster(kaldi::LatticeFasterDecoder *decoder, - kaldi::DecodableInterface &decodable, - std::string utt, - double *like_ptr); - - fst::SymbolTable *word_syms; - fst::Fst *decode_fst; - std::vector decoder_pool; - kaldi::Vector logprior; - kaldi::TransitionModel trans_model; - kaldi::LatticeFasterDecoderConfig config; - - kaldi::CompactLatticeWriter compact_lattice_writer; - kaldi::LatticeWriter lattice_writer; - kaldi::Int32VectorWriter *words_writer; - kaldi::Int32VectorWriter *alignment_writer; - - bool binary; - bool determinize; - kaldi::BaseFloat acoustic_scale; - bool allow_partial; -}; diff --git a/fluid/DeepASR/decoder/pybind.cc b/fluid/DeepASR/decoder/pybind.cc deleted file mode 100644 index 4a9b27d4cf862e5c1492875512fdeba3e95ecb15..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/decoder/pybind.cc +++ /dev/null @@ -1,51 +0,0 @@ -/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
*/
-
-#include <pybind11/pybind11.h>
-#include <pybind11/stl.h>
-
-#include "post_latgen_faster_mapped.h"
-
-namespace py = pybind11;
-
-PYBIND11_MODULE(post_latgen_faster_mapped, m) {
-  m.doc() = "Decoder for Deep ASR model";
-
-  py::class_<Decoder>(m, "Decoder")
-      .def(py::init<std::string,
-                    std::string,
-                    std::string,
-                    std::string,
-                    size_t,
-                    kaldi::BaseFloat>())
-      .def("decode_from_file",
-           (void (Decoder::*)(std::string, size_t)) & Decoder::decode_from_file,
-           "Decode for the probability matrices in specifier "
-           "and print the transcriptions.")
-      .def(
-          "decode",
-          (std::string (Decoder::*)(
-              std::string,
-              const std::vector<std::vector<kaldi::BaseFloat>>&)) &
-              Decoder::decode,
-          "Decode one input probability matrix "
-          "and return the transcription.")
-      .def("decode_batch",
-           (std::vector<std::string> (Decoder::*)(
-               std::vector<std::string>,
-               const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>&,
-               size_t num_processes)) &
-               Decoder::decode_batch,
-           "Decode one batch of probability matrices "
-           "and return the transcriptions.");
-}
diff --git a/fluid/DeepASR/decoder/setup.py b/fluid/DeepASR/decoder/setup.py
deleted file mode 100644
index 81fc857cce5b57af5bce7b34a1f4243fb853c0b6..0000000000000000000000000000000000000000
--- a/fluid/DeepASR/decoder/setup.py
+++ /dev/null
@@ -1,71 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import glob
-from distutils.core import setup, Extension
-from distutils.sysconfig import get_config_vars
-
-try:
-    kaldi_root = os.environ['KALDI_ROOT']
-except:
-    raise ValueError("Environment variable 'KALDI_ROOT' is not defined.
Please " - "install kaldi and export KALDI_ROOT= .") - -args = [ - '-std=c++11', '-fopenmp', '-Wno-sign-compare', '-Wno-unused-variable', - '-Wno-unused-local-typedefs', '-Wno-unused-but-set-variable', - '-Wno-deprecated-declarations', '-Wno-unused-function' -] - -# remove warning about -Wstrict-prototypes -(opt, ) = get_config_vars('OPT') -os.environ['OPT'] = " ".join(flag for flag in opt.split() - if flag != '-Wstrict-prototypes') -os.environ['CC'] = 'g++' - -LIBS = [ - 'fst', 'kaldi-base', 'kaldi-util', 'kaldi-matrix', 'kaldi-tree', - 'kaldi-hmm', 'kaldi-fstext', 'kaldi-decoder', 'kaldi-lat' -] - -LIB_DIRS = [ - 'tools/openfst/lib', 'src/base', 'src/matrix', 'src/util', 'src/tree', - 'src/hmm', 'src/fstext', 'src/decoder', 'src/lat' -] -LIB_DIRS = [os.path.join(kaldi_root, path) for path in LIB_DIRS] -LIB_DIRS = [os.path.abspath(path) for path in LIB_DIRS] - -ext_modules = [ - Extension( - 'post_latgen_faster_mapped', - ['pybind.cc', 'post_latgen_faster_mapped.cc'], - include_dirs=[ - 'pybind11/include', '.', os.path.join(kaldi_root, 'src'), - os.path.join(kaldi_root, 'tools/openfst/src/include'), 'ThreadPool' - ], - language='c++', - libraries=LIBS, - library_dirs=LIB_DIRS, - runtime_library_dirs=LIB_DIRS, - extra_compile_args=args, ), -] - -setup( - name='post_latgen_faster_mapped', - version='0.1.0', - author='Paddle', - author_email='', - description='Decoder for Deep ASR model', - ext_modules=ext_modules, ) diff --git a/fluid/DeepASR/decoder/setup.sh b/fluid/DeepASR/decoder/setup.sh deleted file mode 100644 index 238cc64986900bae6fa0bb403d8134981212b8ea..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/decoder/setup.sh +++ /dev/null @@ -1,12 +0,0 @@ -set -e - -if [ ! -d pybind11 ]; then - git clone https://github.com/pybind/pybind11.git -fi - -if [ ! -d ThreadPool ]; then - git clone https://github.com/progschj/ThreadPool.git - echo -e "\n" -fi - -python setup.py build_ext -i diff --git a/fluid/DeepASR/examples/aishell/.gitignore b/fluid/DeepASR/examples/aishell/.gitignore deleted file mode 100644 index c173dd880ae9e06c16989800e06d4d3d7a1a7d5f..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -aux.tar.gz -aux -data -checkpoints diff --git a/fluid/DeepASR/examples/aishell/download_pretrained_model.sh b/fluid/DeepASR/examples/aishell/download_pretrained_model.sh deleted file mode 100644 index a8813e241c4f6e40392dff6f173160d2bbd77175..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/download_pretrained_model.sh +++ /dev/null @@ -1,15 +0,0 @@ -url=http://deep-asr-data.gz.bcebos.com/aishell_pretrained_model.tar.gz -md5=7b51bde64e884f43901b7a3461ccbfa3 - -wget -c $url - -echo "Checking md5 sum ..." -md5sum_tmp=`md5sum aishell_pretrained_model.tar.gz | cut -d ' ' -f1` - -if [ $md5sum_tmp != $md5 ]; then - echo "Md5sum check failed, please remove and redownload " - "aishell_pretrained_model.tar.gz." 
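-    # Abort before extracting so a corrupted or incomplete archive is never unpacked.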
- exit 1 -fi - -tar xvf aishell_pretrained_model.tar.gz diff --git a/fluid/DeepASR/examples/aishell/infer_by_ckpt.sh b/fluid/DeepASR/examples/aishell/infer_by_ckpt.sh deleted file mode 100644 index 2d31757451849afc1412421376484d2ad41962bc..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/infer_by_ckpt.sh +++ /dev/null @@ -1,18 +0,0 @@ -decode_to_path=./decoding_result.txt - -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -u ../../infer_by_ckpt.py --batch_size 96 \ - --checkpoint checkpoints/deep_asr.latest.checkpoint \ - --infer_feature_lst data/test_feature.lst \ - --mean_var data/global_mean_var \ - --frame_dim 80 \ - --class_num 3040 \ - --num_threads 24 \ - --beam_size 11 \ - --decode_to_path $decode_to_path \ - --trans_model aux/final.mdl \ - --log_prior aux/logprior \ - --vocabulary aux/graph/words.txt \ - --graphs aux/graph/HCLG.fst \ - --acoustic_scale 0.059 \ - --parallel diff --git a/fluid/DeepASR/examples/aishell/prepare_data.sh b/fluid/DeepASR/examples/aishell/prepare_data.sh deleted file mode 100644 index 8bb7ac5cccb2ba72fd6351fc1e6755f5135740d8..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/prepare_data.sh +++ /dev/null @@ -1,43 +0,0 @@ -data_dir=~/.cache/paddle/dataset/speech/deep_asr_data/aishell -data_url='http://deep-asr-data.gz.bcebos.com/aishell_data.tar.gz' -lst_url='http://deep-asr-data.gz.bcebos.com/aishell_lst.tar.gz' -aux_url='http://deep-asr-data.gz.bcebos.com/aux.tar.gz' -md5=17669b8d63331c9326f4a9393d289bfb -aux_md5=50e3125eba1e3a2768a6f2e499cc1749 - -if [ ! -e $data_dir ]; then - mkdir -p $data_dir -fi - -if [ ! -e $data_dir/aishell_data.tar.gz ]; then - echo "Download $data_dir/aishell_data.tar.gz ..." - wget -c -P $data_dir $data_url -else - echo "Skip downloading for $data_dir/aishell_data.tar.gz has already existed!" -fi - -echo "Checking md5 sum ..." -md5sum_tmp=`md5sum $data_dir/aishell_data.tar.gz | cut -d ' ' -f1` - -if [ $md5sum_tmp != $md5 ]; then - echo "Md5sum check failed, please remove and redownload " - "$data_dir/aishell_data.tar.gz" - exit 1 -fi - -echo "Untar aishell_data.tar.gz ..." -tar xzf $data_dir/aishell_data.tar.gz -C $data_dir - -if [ ! -e data ]; then - mkdir data -fi - -echo "Download and untar lst files ..." -wget -c -P data $lst_url -tar xvf data/aishell_lst.tar.gz -C data - -ln -s $data_dir data/aishell - -echo "Download and untar aux files ..." 
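-# The aux archive supplies the decoder-side inputs consumed by infer_by_ckpt.sh:
-# the transition model (aux/final.mdl), the log prior (aux/logprior), and the
-# decoding graph with its vocabulary (aux/graph/HCLG.fst, aux/graph/words.txt).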
-wget -c $aux_url -tar xvf aux.tar.gz diff --git a/fluid/DeepASR/examples/aishell/profile.sh b/fluid/DeepASR/examples/aishell/profile.sh deleted file mode 100644 index e7df868b9ea26db3d91be0c01d0b7ecb63c374de..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/profile.sh +++ /dev/null @@ -1,7 +0,0 @@ -export CUDA_VISIBLE_DEVICES=0 -python -u ../../tools/profile.py --feature_lst data/train_feature.lst \ - --label_lst data/train_label.lst \ - --mean_var data/global_mean_var \ - --frame_dim 80 \ - --class_num 3040 \ - --batch_size 16 diff --git a/fluid/DeepASR/examples/aishell/score_cer.sh b/fluid/DeepASR/examples/aishell/score_cer.sh deleted file mode 100644 index 70dfcbad4a8427adcc1149fbab02ec674dacde0c..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/score_cer.sh +++ /dev/null @@ -1,4 +0,0 @@ -ref_txt=aux/test.ref.txt -hyp_txt=decoding_result.txt - -python ../../score_error_rate.py --error_rate_type cer --ref $ref_txt --hyp $hyp_txt diff --git a/fluid/DeepASR/examples/aishell/train.sh b/fluid/DeepASR/examples/aishell/train.sh deleted file mode 100644 index 168581c0ee579ef62f138bb0d8f5bb8886beb90b..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/examples/aishell/train.sh +++ /dev/null @@ -1,14 +0,0 @@ -export CUDA_VISIBLE_DEVICES=4,5,6,7 -python -u ../../train.py --train_feature_lst data/train_feature.lst \ - --train_label_lst data/train_label.lst \ - --val_feature_lst data/val_feature.lst \ - --val_label_lst data/val_label.lst \ - --mean_var data/global_mean_var \ - --checkpoints checkpoints \ - --frame_dim 80 \ - --class_num 3040 \ - --print_per_batches 100 \ - --infer_models '' \ - --batch_size 16 \ - --learning_rate 6.4e-5 \ - --parallel diff --git a/fluid/DeepASR/images/learning_curve.png b/fluid/DeepASR/images/learning_curve.png deleted file mode 100644 index f09e8514e16fa09c8c32f3b455a5515f270df27a..0000000000000000000000000000000000000000 Binary files a/fluid/DeepASR/images/learning_curve.png and /dev/null differ diff --git a/fluid/DeepASR/images/lstmp.png b/fluid/DeepASR/images/lstmp.png deleted file mode 100644 index 72c2fc28998b09218f5dfd9d4c4d09a773b4f503..0000000000000000000000000000000000000000 Binary files a/fluid/DeepASR/images/lstmp.png and /dev/null differ diff --git a/fluid/DeepASR/infer.py b/fluid/DeepASR/infer.py deleted file mode 100644 index 84269261a95c381a9be21425abf43b98006f0886..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/infer.py +++ /dev/null @@ -1,108 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import argparse -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.async_data_reader as reader -from data_utils.util import lodtensor_to_ndarray -from data_utils.util import split_infer_result - - -def parse_args(): - parser = argparse.ArgumentParser("Inference for stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. 
(default: %(default)s)') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--infer_feature_lst', - type=str, - default='data/infer_feature.lst', - help='The feature list path for inference. (default: %(default)s)') - parser.add_argument( - '--infer_label_lst', - type=str, - default='data/infer_label.lst', - help='The label list path for inference. (default: %(default)s)') - parser.add_argument( - '--infer_model_path', - type=str, - default='./infer_models/deep_asr.pass_0.infer.model/', - help='The directory for loading inference model. ' - '(default: %(default)s)') - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def infer(args): - """ Gets one batch of feature data and predicts labels for each sample. - """ - - if not os.path.exists(args.infer_model_path): - raise IOError("Invalid inference model path!") - - place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace() - exe = fluid.Executor(place) - - # load model - [infer_program, feed_dict, - fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice() - ] - - infer_data_reader = reader.AsyncDataReader(args.infer_feature_lst, - args.infer_label_lst) - infer_data_reader.set_transformers(ltrans) - - feature_t = fluid.LoDTensor() - one_batch = infer_data_reader.batch_iterator(args.batch_size, 1).next() - - (features, labels, lod) = one_batch - feature_t.set(features, place) - feature_t.set_lod([lod]) - - results = exe.run(infer_program, - feed={feed_dict[0]: feature_t}, - fetch_list=fetch_targets, - return_numpy=False) - - probs, lod = lodtensor_to_ndarray(results[0]) - preds = probs.argmax(axis=1) - infer_batch = split_infer_result(preds, lod) - for index, sample in enumerate(infer_batch): - print("result %d: " % index, sample, '\n') - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - infer(args) diff --git a/fluid/DeepASR/infer_by_ckpt.py b/fluid/DeepASR/infer_by_ckpt.py deleted file mode 100644 index 1e0fb15c6d6f05aa1e054b37333b0fa0cb5cd8d9..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/infer_by_ckpt.py +++ /dev/null @@ -1,273 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import os -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from data_utils.util import lodtensor_to_ndarray, split_infer_result -from model_utils.model import stacked_lstmp_model -from decoder.post_latgen_faster_mapped import Decoder -from tools.error_rate import char_errors - - -def parse_args(): - parser = argparse.ArgumentParser("Run inference by using checkpoint.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The 
sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--beam_size', - type=int, - default=11, - help='The beam size for decoding. (default: %(default)d)') - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=80, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=1749, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--num_threads', - type=int, - default=10, - help='The number of threads for decoding. (default: %(default)d)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--infer_feature_lst', - type=str, - default='data/infer_feature.lst', - help='The feature list path for inference. (default: %(default)s)') - parser.add_argument( - '--checkpoint', - type=str, - default='./checkpoint', - help="The checkpoint path to init model. (default: %(default)s)") - parser.add_argument( - '--trans_model', - type=str, - default='./graph/trans_model', - help="The path to vocabulary. (default: %(default)s)") - parser.add_argument( - '--vocabulary', - type=str, - default='./graph/words.txt', - help="The path to vocabulary. (default: %(default)s)") - parser.add_argument( - '--graphs', - type=str, - default='./graph/TLG.fst', - help="The path to TLG graphs for decoding. (default: %(default)s)") - parser.add_argument( - '--log_prior', - type=str, - default="./logprior", - help="The log prior probs for training data. (default: %(default)s)") - parser.add_argument( - '--acoustic_scale', - type=float, - default=0.2, - help="Scaling factor for acoustic likelihoods. (default: %(default)f)") - parser.add_argument( - '--post_matrix_path', - type=str, - default=None, - help="The path to output post prob matrix. (default: %(default)s)") - parser.add_argument( - '--decode_to_path', - type=str, - default='./decoding_result.txt', - required=True, - help="The path to output the decoding result. 
(default: %(default)s)") - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -class PostMatrixWriter: - """ The writer for outputing the post probability matrix - """ - - def __init__(self, to_path): - self._to_path = to_path - with open(self._to_path, "w") as post_matrix: - post_matrix.seek(0) - post_matrix.truncate() - - def write(self, keys, probs): - with open(self._to_path, "a") as post_matrix: - if isinstance(keys, str): - keys, probs = [keys], [probs] - - for key, prob in zip(keys, probs): - post_matrix.write(key + " [\n") - for i in range(prob.shape[0]): - for j in range(prob.shape[1]): - post_matrix.write(str(prob[i][j]) + " ") - post_matrix.write("\n") - post_matrix.write("]\n") - - -class DecodingResultWriter: - """ The writer for writing out decoding results - """ - - def __init__(self, to_path): - self._to_path = to_path - with open(self._to_path, "w") as decoding_result: - decoding_result.seek(0) - decoding_result.truncate() - - def write(self, results): - with open(self._to_path, "a") as decoding_result: - if isinstance(results, str): - decoding_result.write(results.encode("utf8") + "\n") - else: - for result in results: - decoding_result.write(result.encode("utf8") + "\n") - - -def infer_from_ckpt(args): - """Inference by using checkpoint.""" - - if not os.path.exists(args.checkpoint): - raise IOError("Invalid checkpoint!") - - prediction, avg_cost, accuracy = stacked_lstmp_model( - frame_dim=args.frame_dim, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num, - parallel=args.parallel) - - infer_program = fluid.default_main_program().clone() - - # optimizer, placeholder - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=0.0001, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - # load checkpoint. 
- fluid.io.load_persistables(exe, args.checkpoint) - - # init decoder - decoder = Decoder(args.trans_model, args.vocabulary, args.graphs, - args.log_prior, args.beam_size, args.acoustic_scale) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - - # infer data reader - infer_data_reader = reader.AsyncDataReader( - args.infer_feature_lst, drop_frame_len=-1, split_sentence_threshold=-1) - infer_data_reader.set_transformers(ltrans) - - decoding_result_writer = DecodingResultWriter(args.decode_to_path) - post_matrix_writer = None if args.post_matrix_path is None \ - else PostMatrixWriter(args.post_matrix_path) - - for batch_id, batch_data in enumerate( - infer_data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size)): - # load_data - (features, labels, lod, name_lst) = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - - results = exe.run(infer_program, - feed={"feature": feature_t, - "label": label_t}, - fetch_list=[prediction, avg_cost, accuracy], - return_numpy=False) - - probs, lod = lodtensor_to_ndarray(results[0]) - infer_batch = split_infer_result(probs, lod) - - print("Decoding batch %d ..." % batch_id) - decoded = decoder.decode_batch(name_lst, infer_batch, args.num_threads) - - decoding_result_writer.write(decoded) - - if args.post_matrix_path is not None: - post_matrix_writer.write(name_lst, infer_batch) - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - - infer_from_ckpt(args) diff --git a/fluid/DeepASR/model_utils/__init__.py b/fluid/DeepASR/model_utils/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/fluid/DeepASR/model_utils/model.py b/fluid/DeepASR/model_utils/model.py deleted file mode 100644 index 0b086b55a898a0a29f57132b438684a655e30caf..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/model_utils/model.py +++ /dev/null @@ -1,74 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import paddle.fluid as fluid - - -def stacked_lstmp_model(feature, - label, - hidden_dim, - proj_dim, - stacked_num, - class_num, - parallel=False, - is_train=True): - """ - The model for DeepASR. The main structure is composed of stacked - identical LSTMP (LSTM with recurrent projection) layers. - - When running in training and validation phase, the feeding dictionary - is {'feature', 'label'}, fed by the LodTensor for feature data and - label data respectively. And in inference, only `feature` is needed. - - Args: - frame_dim(int): The frame dimension of feature data. - hidden_dim(int): The hidden state's dimension of the LSTMP layer. - proj_dim(int): The projection size of the LSTMP layer. - stacked_num(int): The number of stacked LSTMP layers. - parallel(bool): Run in parallel or not, default `False`. - is_train(bool): Run in training phase or not, default `True`. - class_dim(int): The number of output classes. 
- """ - conv1 = fluid.layers.conv2d( - input=feature, - num_filters=32, - filter_size=3, - stride=1, - padding=1, - bias_attr=True, - act="relu") - - pool1 = fluid.layers.pool2d( - conv1, pool_size=3, pool_type="max", pool_stride=2, pool_padding=0) - - stack_input = pool1 - for i in range(stacked_num): - fc = fluid.layers.fc(input=stack_input, - size=hidden_dim * 4, - bias_attr=None) - proj, cell = fluid.layers.dynamic_lstmp( - input=fc, - size=hidden_dim * 4, - proj_size=proj_dim, - bias_attr=True, - use_peepholes=True, - is_reverse=False, - cell_activation="tanh", - proj_activation="tanh") - bn = fluid.layers.batch_norm( - input=proj, - is_test=not is_train, - momentum=0.9, - epsilon=1e-05, - data_layout='NCHW') - stack_input = bn - - prediction = fluid.layers.fc(input=stack_input, - size=class_num, - act='softmax') - - cost = fluid.layers.cross_entropy(input=prediction, label=label) - avg_cost = fluid.layers.mean(x=cost) - acc = fluid.layers.accuracy(input=prediction, label=label) - return prediction, avg_cost, acc diff --git a/fluid/DeepASR/score_error_rate.py b/fluid/DeepASR/score_error_rate.py deleted file mode 100644 index dde5a2448afffcae61c4d033159a5b081e6c79e8..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/score_error_rate.py +++ /dev/null @@ -1,80 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import argparse -from tools.error_rate import char_errors, word_errors - - -def parse_args(): - parser = argparse.ArgumentParser( - "Score word/character error rate (WER/CER) " - "for decoding result.") - parser.add_argument( - '--error_rate_type', - type=str, - default='cer', - choices=['cer', 'wer'], - help="Error rate type. (default: %(default)s)") - parser.add_argument( - '--special_tokens', - type=str, - default='', - help="Special tokens in scoring CER, seperated by space. " - "They shouldn't be splitted and should be treated as one special " - "character. Example: ' ' " - "(default: %(default)s)") - parser.add_argument( - '--ref', type=str, required=True, help="The ground truth text.") - parser.add_argument( - '--hyp', type=str, required=True, help="The decoding result text.") - args = parser.parse_args() - return args - - -if __name__ == '__main__': - - args = parse_args() - ref_dict = {} - sum_errors, sum_ref_len = 0.0, 0 - sent_cnt, not_in_ref_cnt = 0, 0 - - special_tokens = args.special_tokens.split(" ") - - with open(args.ref, "r") as ref_txt: - line = ref_txt.readline() - while line: - del_pos = line.find(" ") - key, sent = line[0:del_pos], line[del_pos + 1:-1].strip() - ref_dict[key] = sent - line = ref_txt.readline() - - with open(args.hyp, "r") as hyp_txt: - line = hyp_txt.readline() - while line: - del_pos = line.find(" ") - key, sent = line[0:del_pos], line[del_pos + 1:-1].strip() - sent_cnt += 1 - line = hyp_txt.readline() - if key not in ref_dict: - not_in_ref_cnt += 1 - continue - - if args.error_rate_type == 'cer': - for sp_tok in special_tokens: - sent = sent.replace(sp_tok, '\0') - errors, ref_len = char_errors( - ref_dict[key].decode("utf8"), - sent.decode("utf8"), - remove_space=True) - else: - errors, ref_len = word_errors(ref_dict[key].decode("utf8"), - sent.decode("utf8")) - sum_errors += errors - sum_ref_len += ref_len - - print("Error rate[%s] = %f (%d/%d)," % - (args.error_rate_type, sum_errors / sum_ref_len, int(sum_errors), - sum_ref_len)) - print("total %d sentences in hyp, %d not presented in ref." 
% - (sent_cnt, not_in_ref_cnt)) diff --git a/fluid/DeepASR/tools/_init_paths.py b/fluid/DeepASR/tools/_init_paths.py deleted file mode 100644 index 228dbae6bf95231030c1858c4d30b49f162f46e2..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/tools/_init_paths.py +++ /dev/null @@ -1,19 +0,0 @@ -"""Add the parent directory to $PYTHONPATH""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os.path -import sys - - -def add_path(path): - if path not in sys.path: - sys.path.insert(0, path) - - -this_dir = os.path.dirname(__file__) - -# Add project path to PYTHONPATH -proj_path = os.path.join(this_dir, '..') -add_path(proj_path) diff --git a/fluid/DeepASR/tools/error_rate.py b/fluid/DeepASR/tools/error_rate.py deleted file mode 100644 index 215ad39d24a551879d0fd8d4c8892161a0708370..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/tools/error_rate.py +++ /dev/null @@ -1,182 +0,0 @@ -# -*- coding: utf-8 -*- -"""This module provides functions to calculate error rate in different level. -e.g. wer for word-level, cer for char-level. -""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - - -def _levenshtein_distance(ref, hyp): - """Levenshtein distance is a string metric for measuring the difference - between two sequences. Informally, the levenshtein disctance is defined as - the minimum number of single-character edits (substitutions, insertions or - deletions) required to change one word into the other. We can naturally - extend the edits to word level when calculate levenshtein disctance for - two sentences. - """ - m = len(ref) - n = len(hyp) - - # special case - if ref == hyp: - return 0 - if m == 0: - return n - if n == 0: - return m - - if m < n: - ref, hyp = hyp, ref - m, n = n, m - - # use O(min(m, n)) space - distance = np.zeros((2, n + 1), dtype=np.int32) - - # initialize distance matrix - for j in xrange(n + 1): - distance[0][j] = j - - # calculate levenshtein distance - for i in xrange(1, m + 1): - prev_row_idx = (i - 1) % 2 - cur_row_idx = i % 2 - distance[cur_row_idx][0] = i - for j in xrange(1, n + 1): - if ref[i - 1] == hyp[j - 1]: - distance[cur_row_idx][j] = distance[prev_row_idx][j - 1] - else: - s_num = distance[prev_row_idx][j - 1] + 1 - i_num = distance[cur_row_idx][j - 1] + 1 - d_num = distance[prev_row_idx][j] + 1 - distance[cur_row_idx][j] = min(s_num, i_num, d_num) - - return distance[m % 2][n] - - -def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '): - """Compute the levenshtein distance between reference sequence and - hypothesis sequence in word-level. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param delimiter: Delimiter of input sentences. - :type delimiter: char - :return: Levenshtein distance and word number of reference sentence. 
- :rtype: list - """ - if ignore_case == True: - reference = reference.lower() - hypothesis = hypothesis.lower() - - ref_words = filter(None, reference.split(delimiter)) - hyp_words = filter(None, hypothesis.split(delimiter)) - - edit_distance = _levenshtein_distance(ref_words, hyp_words) - return float(edit_distance), len(ref_words) - - -def char_errors(reference, hypothesis, ignore_case=False, remove_space=False): - """Compute the levenshtein distance between reference sequence and - hypothesis sequence in char-level. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param remove_space: Whether remove internal space characters - :type remove_space: bool - :return: Levenshtein distance and length of reference sentence. - :rtype: list - """ - if ignore_case == True: - reference = reference.lower() - hypothesis = hypothesis.lower() - - join_char = ' ' - if remove_space == True: - join_char = '' - - reference = join_char.join(filter(None, reference.split(' '))) - hypothesis = join_char.join(filter(None, hypothesis.split(' '))) - - edit_distance = _levenshtein_distance(reference, hypothesis) - return float(edit_distance), len(reference) - - -def wer(reference, hypothesis, ignore_case=False, delimiter=' '): - """Calculate word error rate (WER). WER compares reference text and - hypothesis text in word-level. WER is defined as: - .. math:: - WER = (Sw + Dw + Iw) / Nw - where - .. code-block:: text - Sw is the number of words subsituted, - Dw is the number of words deleted, - Iw is the number of words inserted, - Nw is the number of words in the reference - We can use levenshtein distance to calculate WER. Please draw an attention - that empty items will be removed when splitting sentences by delimiter. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param delimiter: Delimiter of input sentences. - :type delimiter: char - :return: Word error rate. - :rtype: float - :raises ValueError: If word number of reference is zero. - """ - edit_distance, ref_len = word_errors(reference, hypothesis, ignore_case, - delimiter) - - if ref_len == 0: - raise ValueError("Reference's word number should be greater than 0.") - - wer = float(edit_distance) / ref_len - return wer - - -def cer(reference, hypothesis, ignore_case=False, remove_space=False): - """Calculate charactor error rate (CER). CER compares reference text and - hypothesis text in char-level. CER is defined as: - .. math:: - CER = (Sc + Dc + Ic) / Nc - where - .. code-block:: text - Sc is the number of characters substituted, - Dc is the number of characters deleted, - Ic is the number of characters inserted - Nc is the number of characters in the reference - We can use levenshtein distance to calculate CER. Chinese input should be - encoded to unicode. Please draw an attention that the leading and tailing - space characters will be truncated and multiple consecutive space - characters in a sentence will be replaced by one space character. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. 
- :type ignore_case: bool - :param remove_space: Whether remove internal space characters - :type remove_space: bool - :return: Character error rate. - :rtype: float - :raises ValueError: If the reference length is zero. - """ - edit_distance, ref_len = char_errors(reference, hypothesis, ignore_case, - remove_space) - - if ref_len == 0: - raise ValueError("Length of reference should be greater than 0.") - - cer = float(edit_distance) / ref_len - return cer diff --git a/fluid/DeepASR/tools/profile.py b/fluid/DeepASR/tools/profile.py deleted file mode 100644 index d25e18f7db0111acf76e66478f8230aab1d5f760..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/tools/profile.py +++ /dev/null @@ -1,210 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import paddle.fluid.profiler as profiler -import _init_paths -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from model_utils.model import stacked_lstmp_model -from data_utils.util import lodtensor_to_ndarray - - -def parse_args(): - parser = argparse.ArgumentParser("Profiling for the stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=120 * 11, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=1749, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--learning_rate', - type=float, - default=0.00016, - help='Learning rate used to train. (default: %(default)f)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help='mean var path') - parser.add_argument( - '--feature_lst', - type=str, - default='data/feature.lst', - help='feature list path.') - parser.add_argument( - '--label_lst', - type=str, - default='data/label.lst', - help='label list path.') - parser.add_argument( - '--max_batch_num', - type=int, - default=11, - help='Maximum number of batches for profiling. (default: %(default)d)') - parser.add_argument( - '--first_batches_to_skip', - type=int, - default=1, - help='Number of first batches to skip for profiling. 
' - '(default: %(default)d)') - parser.add_argument( - '--print_train_acc', - action='store_true', - help='If set, output training accuray.') - parser.add_argument( - '--sorted_key', - type=str, - default='total', - choices=['None', 'total', 'calls', 'min', 'max', 'ave'], - help='Different types of time to sort the profiling report. ' - '(default: %(default)s)') - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def profile(args): - """profile the training process. - """ - - if not args.first_batches_to_skip < args.max_batch_num: - raise ValueError("arg 'first_batches_to_skip' must be smaller than " - "'max_batch_num'.") - if not args.first_batches_to_skip >= 0: - raise ValueError( - "arg 'first_batches_to_skip' must not be smaller than 0.") - - _, avg_cost, accuracy = stacked_lstmp_model( - frame_dim=args.frame_dim, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num, - parallel=args.parallel) - - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=args.learning_rate, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - data_reader = reader.AsyncDataReader( - args.feature_lst, args.label_lst, -1, split_sentence_threshold=1024) - data_reader.set_transformers(ltrans) - - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - - sorted_key = None if args.sorted_key is 'None' else args.sorted_key - with profiler.profiler(args.device, sorted_key) as prof: - frames_seen, start_time = 0, 0.0 - for batch_id, batch_data in enumerate( - data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size)): - if batch_id >= args.max_batch_num: - break - if args.first_batches_to_skip == batch_id: - profiler.reset_profiler() - start_time = time.time() - frames_seen = 0 - # load_data - (features, labels, lod, _) = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - - frames_seen += lod[-1] - - outs = exe.run(fluid.default_main_program(), - feed={"feature": feature_t, - "label": label_t}, - fetch_list=[avg_cost, accuracy] - if args.print_train_acc else [], - return_numpy=False) - - if args.print_train_acc: - print("Batch %d acc: %f" % - (batch_id, lodtensor_to_ndarray(outs[1])[0])) - else: - sys.stdout.write('.') - sys.stdout.flush() - time_consumed = time.time() - start_time - frames_per_sec = frames_seen / time_consumed - print("\nTime consumed: %f s, performance: %f frames/s." 
% - (time_consumed, frames_per_sec)) - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - profile(args) diff --git a/fluid/DeepASR/train.py b/fluid/DeepASR/train.py deleted file mode 100644 index 1a1dd6cf9ea33bb546cc3bdf65c36be0441832cb..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/train.py +++ /dev/null @@ -1,372 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import os -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from model_utils.model import stacked_lstmp_model - - -def parse_args(): - parser = argparse.ArgumentParser("Training for stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. Batch size per GPU. (default: %(default)d)' - ) - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=80, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=3040, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--pass_num', - type=int, - default=100, - help='Epoch number to train. (default: %(default)d)') - parser.add_argument( - '--print_per_batches', - type=int, - default=100, - help='Interval to print training accuracy. (default: %(default)d)') - parser.add_argument( - '--learning_rate', - type=float, - default=0.00016, - help='Learning rate used to train. (default: %(default)f)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--train_feature_lst', - type=str, - default='data/feature.lst', - help='The feature list path for training. (default: %(default)s)') - parser.add_argument( - '--train_label_lst', - type=str, - default='data/label.lst', - help='The label list path for training. (default: %(default)s)') - parser.add_argument( - '--val_feature_lst', - type=str, - default='data/val_feature.lst', - help='The feature list path for validation. (default: %(default)s)') - parser.add_argument( - '--val_label_lst', - type=str, - default='data/val_label.lst', - help='The label list path for validation. 
(default: %(default)s)') - parser.add_argument( - '--init_model_path', - type=str, - default=None, - help="The model (checkpoint) path which the training resumes from. " - "If None, train the model from scratch. (default: %(default)s)") - parser.add_argument( - '--checkpoints', - type=str, - default='./checkpoints', - help="The directory for saving checkpoints. Do not save checkpoints " - "if set to ''. (default: %(default)s)") - parser.add_argument( - '--infer_models', - type=str, - default='./infer_models', - help="The directory for saving inference models. Do not save inference " - "models if set to ''. (default: %(default)s)") - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def train(args): - """train in loop. - """ - - # paths check - if args.init_model_path is not None and \ - not os.path.exists(args.init_model_path): - raise IOError("Invalid initial model path!") - if args.checkpoints != '' and not os.path.exists(args.checkpoints): - os.mkdir(args.checkpoints) - if args.infer_models != '' and not os.path.exists(args.infer_models): - os.mkdir(args.infer_models) - - train_program = fluid.Program() - train_startup = fluid.Program() - - with fluid.program_guard(train_program, train_startup): - with fluid.unique_name.guard(): - py_train_reader = fluid.layers.py_reader( - capacity=10, - shapes=([-1, 3, 11, args.frame_dim], [-1, 1]), - dtypes=['float32', 'int64'], - lod_levels=[1, 1], - name='train_reader') - feature, label = fluid.layers.read_file(py_train_reader) - prediction, avg_cost, accuracy = stacked_lstmp_model( - feature=feature, - label=label, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num) - # optimizer = fluid.optimizer.Momentum(learning_rate=args.learning_rate, momentum=0.9) - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=args.learning_rate, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - fluid.memory_optimize(train_program) - - test_program = fluid.Program() - test_startup = fluid.Program() - with fluid.program_guard(test_program, test_startup): - with fluid.unique_name.guard(): - py_test_reader = fluid.layers.py_reader( - capacity=10, - shapes=([-1, 3, 11, args.frame_dim], [-1, 1]), - dtypes=['float32', 'int64'], - lod_levels=[1, 1], - name='test_reader') - feature, label = fluid.layers.read_file(py_test_reader) - prediction, avg_cost, accuracy = stacked_lstmp_model( - feature=feature, - label=label, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num) - test_program = test_program.clone(for_test=True) - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(train_startup) - exe.run(test_startup) - - if args.parallel: - exec_strategy = fluid.ExecutionStrategy() - exec_strategy.num_iteration_per_drop_scope = 10 - train_exe = fluid.ParallelExecutor( - use_cuda=(args.device == 'GPU'), - loss_name=avg_cost.name, - exec_strategy=exec_strategy, - main_program=train_program) - test_exe = fluid.ParallelExecutor( - use_cuda=(args.device == 'GPU'), - main_program=test_program, - exec_strategy=exec_strategy, - share_vars_from=train_exe) - - # resume training if initial 
model provided. - if args.init_model_path is not None: - fluid.io.load_persistables(exe, args.init_model_path) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - # bind train_reader - train_data_reader = reader.AsyncDataReader( - args.train_feature_lst, - args.train_label_lst, - -1, - split_sentence_threshold=1024) - - train_data_reader.set_transformers(ltrans) - - def train_data_provider(): - for data in train_data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size): - yield batch_data_to_lod_tensors(args, data, fluid.CPUPlace()) - - py_train_reader.decorate_tensor_provider(train_data_provider) - - if (os.path.exists(args.val_feature_lst) and - os.path.exists(args.val_label_lst)): - # test data reader - test_data_reader = reader.AsyncDataReader( - args.val_feature_lst, - args.val_label_lst, - -1, - split_sentence_threshold=1024) - test_data_reader.set_transformers(ltrans) - - def test_data_provider(): - for data in test_data_reader.batch_iterator( - args.batch_size, args.minimum_batch_size): - yield batch_data_to_lod_tensors(args, data, fluid.CPUPlace()) - - py_test_reader.decorate_tensor_provider(test_data_provider) - - # validation - def test(exe): - # If test data not found, return invalid cost and accuracy - if not (os.path.exists(args.val_feature_lst) and - os.path.exists(args.val_label_lst)): - return -1.0, -1.0 - batch_id = 0 - test_costs = [] - test_accs = [] - while True: - if batch_id == 0: - py_test_reader.start() - try: - if args.parallel: - cost, acc = exe.run( - fetch_list=[avg_cost.name, accuracy.name], - return_numpy=False) - else: - cost, acc = exe.run(program=test_program, - fetch_list=[avg_cost, accuracy], - return_numpy=False) - sys.stdout.write('.') - sys.stdout.flush() - test_costs.append(np.array(cost)[0]) - test_accs.append(np.array(acc)[0]) - batch_id += 1 - except fluid.core.EOFException: - py_test_reader.reset() - break - return np.mean(test_costs), np.mean(test_accs) - - # train - for pass_id in xrange(args.pass_num): - pass_start_time = time.time() - batch_id = 0 - while True: - if batch_id == 0: - py_train_reader.start() - to_print = batch_id > 0 and (batch_id % args.print_per_batches == 0) - try: - if args.parallel: - outs = train_exe.run( - fetch_list=[avg_cost.name, accuracy.name] - if to_print else [], - return_numpy=False) - else: - outs = exe.run(program=train_program, - fetch_list=[avg_cost, accuracy] - if to_print else [], - return_numpy=False) - except fluid.core.EOFException: - py_train_reader.reset() - break - - if to_print: - if args.parallel: - print("\nBatch %d, train cost: %f, train acc: %f" % - (batch_id, np.mean(outs[0]), np.mean(outs[1]))) - else: - print("\nBatch %d, train cost: %f, train acc: %f" % ( - batch_id, np.array(outs[0])[0], np.array(outs[1])[0])) - # save the latest checkpoint - if args.checkpoints != '': - model_path = os.path.join(args.checkpoints, - "deep_asr.latest.checkpoint") - fluid.io.save_persistables(exe, model_path, train_program) - else: - sys.stdout.write('.') - sys.stdout.flush() - - batch_id += 1 - # run test - val_cost, val_acc = test(test_exe if args.parallel else exe) - - # save checkpoint per pass - if args.checkpoints != '': - model_path = os.path.join( - args.checkpoints, - "deep_asr.pass_" + str(pass_id) + ".checkpoint") - fluid.io.save_persistables(exe, model_path, train_program) - # save inference model - if args.infer_models != '': - model_path = 
os.path.join( - args.infer_models, - "deep_asr.pass_" + str(pass_id) + ".infer.model") - fluid.io.save_inference_model(model_path, ["feature"], - [prediction], exe, train_program) - # cal pass time - pass_end_time = time.time() - time_consumed = pass_end_time - pass_start_time - # print info at pass end - print("\nPass %d, time consumed: %f s, val cost: %f, val acc: %f\n" % - (pass_id, time_consumed, val_cost, val_acc)) - - -def batch_data_to_lod_tensors(args, batch_data, place): - features, labels, lod, name_lst = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - return feature_t, label_t - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - - train(args) diff --git a/fluid/DeepQNetwork/DQN_agent.py b/fluid/DeepQNetwork/DQN_agent.py deleted file mode 100644 index 5b474325f656533b91965fd59d70c2d421e16fc3..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/DQN_agent.py +++ /dev/null @@ -1,187 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not 
in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - out = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_fc1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) - return out - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/fluid/DeepQNetwork/DoubleDQN_agent.py b/fluid/DeepQNetwork/DoubleDQN_agent.py deleted file mode 100644 index c95ae5632fd2e904a625f680f4a9147d5615b765..0000000000000000000000000000000000000000 
--- a/fluid/DeepQNetwork/DoubleDQN_agent.py +++ /dev/null @@ -1,195 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DoubleDQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - - next_s_predcit_value = self.get_DQN_prediction(next_s) - greedy_action = fluid.layers.argmax(next_s_predcit_value, axis=1) - greedy_action = fluid.layers.unsqueeze(greedy_action, axes=[1]) - - predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim) - best_v = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value), - dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - 
stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - out = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_fc1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) - return out - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/fluid/DeepQNetwork/DuelingDQN_agent.py b/fluid/DeepQNetwork/DuelingDQN_agent.py deleted file mode 100644 index cf2ff71bb811e5dce62be78beab1f0afb05d31f9..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/DuelingDQN_agent.py +++ /dev/null @@ -1,197 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DuelingDQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - 
fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - policy_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - 
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - value = fluid.layers.fc( - input=flatten, - size=1, - param_attr=ParamAttr(name='{}_value_fc'.format(variable_field)), - bias_attr=ParamAttr(name='{}_value_fc_b'.format(variable_field))) - - advantage = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_advantage_fc'.format(variable_field)), - bias_attr=ParamAttr( - name='{}_advantage_fc_b'.format(variable_field))) - - Q = advantage + (value - fluid.layers.reduce_mean( - advantage, dim=1, keep_dim=True)) - return Q - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/fluid/DeepQNetwork/README.md b/fluid/DeepQNetwork/README.md deleted file mode 100644 index 1edeaaa884318ec3a530ec4fdb7d031d07411b56..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/README.md +++ /dev/null @@ -1,67 +0,0 @@ -[中文版](README_cn.md) - -## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle -Based on PaddlePaddle's next-generation API Fluid, the DQN model of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game. The model receives the image of the game as input, and uses the end-to-end model to directly predict the next step. The repository contains the following three types of models: -+ DQN in -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN in: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN in: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) - -## Atari benchmark & performance - -### Atari games introduction - -Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari game. - -### Pong game result - -The average game rewards that can be obtained for the three models as the number of training steps changes during the training are as follows(about 3 hours/1 Million steps): - -
-<img src="assets/dqn.png" alt="DQN result"/>
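-
-The three agents above differ mainly in how they form the Q-learning target. A minimal NumPy sketch of the idea (illustrative only, not the repository code; `q_policy_next` / `q_target_next` stand for the next-state outputs of the policy and target networks):
-
-```python
-import numpy as np
-
-def dqn_target(r, is_over, q_target_next, gamma=0.99):
-    # DQN: bootstrap from the best action according to the *target* network
-    return r + (1.0 - is_over) * gamma * q_target_next.max(axis=1)
-
-def double_dqn_target(r, is_over, q_policy_next, q_target_next, gamma=0.99):
-    # DoubleDQN: the policy network selects the action, the target network evaluates it
-    a_star = q_policy_next.argmax(axis=1)
-    return r + (1.0 - is_over) * gamma * q_target_next[np.arange(len(r)), a_star]
-
-def dueling_q(value, advantage):
-    # DuelingDQN: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
-    return value + (advantage - advantage.mean(axis=1, keepdims=True))
-```
-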
- -## How to use -### Dependencies: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### Install Dependencies: -+ Install PaddlePaddle: - recommended to compile and install PaddlePaddle from source code -+ Install other dependencies: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - Install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment). - -### Start Training: -``` -# To train a model for Pong game with gpu (use DQN model as default) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# To train a model for Pong with DoubleDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# To train a model for Pong with DuelingDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -To train more games, you can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms). - -### Start Testing: -``` -# Play the game with saved best model and calculate the average rewards -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# Play the game with visualization -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` -[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) is saved models for Pong and Breakout games. You can use it to play the game directly. diff --git a/fluid/DeepQNetwork/README_cn.md b/fluid/DeepQNetwork/README_cn.md deleted file mode 100644 index 640d775ad8fed2be360d308b6c5df41c86d77c04..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/README_cn.md +++ /dev/null @@ -1,71 +0,0 @@ -## 基于PaddlePaddle的Fluid版本复现DQN, DoubleDQN, DuelingDQN三个模型 - -基于PaddlePaddle下一代API Fluid复现了深度强化学习领域的DQN模型,在经典的Atari 游戏上复现了论文同等水平的指标,模型接收游戏的图像作为输入,采用端到端的模型直接预测下一步要执行的控制信号,本仓库一共包含以下3类模型: -+ DQN模型: -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN模型: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN模型: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) - -## 模型效果:Atari游戏表现 - -### Atari游戏介绍 - -请点击[这里](https://gym.openai.com/envs/#atari)了解Atari游戏。 - -### Pong游戏训练结果 -三个模型在训练过程中随着训练步数的变化,能得到的平均游戏奖励如下图所示(大概3小时每1百万步): - -
-<img src="assets/dqn.png" alt="DQN result"/>
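-
-三个 agent 的 `act()` 都采用 ε-greedy 策略:训练时以 `exploration` 的概率随机选动作(`exploration` 从 1.1 开始,每个训练步减 1e-6,线性衰减到下限 0.1),评估时仍保留 1% 的随机动作。下面是一个仅作示意的最简写法(`q_values` 假定为网络对当前状态的前向输出,并非本仓库源码):
-
-```python
-import numpy as np
-
-def choose_action(q_values, exploration, train=True):
-    if train and np.random.random() < exploration:
-        return np.random.randint(len(q_values))  # 训练期:随机探索
-    if np.random.random() < 0.01:
-        return np.random.randint(len(q_values))  # 保留 1% 随机性
-    return int(np.argmax(q_values))              # 贪心:取 Q 值最大的动作
-```
-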
- -## 使用教程 - -### 依赖: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### 下载依赖: - -+ 安装PaddlePaddle: - 建议通过PaddlePaddle源码进行编译安装 -+ 下载其它依赖: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - 安装ale_python_interface可以参考[这里](https://github.com/mgbellemare/Arcade-Learning-Environment) - -### 训练模型: - -``` -# 使用GPU训练Pong游戏(默认使用DQN模型) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# 训练DoubleDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# 训练DuelingDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -训练更多游戏,可以从[这里](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)下载游戏rom - -### 测试模型: - -``` -# Play the game with saved model and calculate the average rewards -# 使用训练过程中保存的最好模型玩游戏,以及计算平均奖励(rewards) -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# 以可视化的形式来玩游戏 -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` - -[这里](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA)是Pong和Breakout游戏训练好的模型,可以直接用来测试。 diff --git a/fluid/DeepQNetwork/assets/dqn.png b/fluid/DeepQNetwork/assets/dqn.png deleted file mode 100644 index f8f8d12f9887cdab62f09b52597ec187a4c8107c..0000000000000000000000000000000000000000 Binary files a/fluid/DeepQNetwork/assets/dqn.png and /dev/null differ diff --git a/fluid/DeepQNetwork/atari.py b/fluid/DeepQNetwork/atari.py deleted file mode 100644 index ec793cba15ddc1c42986689eaad5773875a4ffde..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/atari.py +++ /dev/null @@ -1,160 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -import os -import cv2 -import threading - -import gym -from gym import spaces -from gym.envs.atari.atari_env import ACTION_MEANING - -from atari_py import ALEInterface - -__all__ = ['AtariPlayer'] - -ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms" -_ALE_LOCK = threading.Lock() -""" -The following AtariPlayer are copied or modified from tensorpack/tensorpack: - https://github.com/tensorpack/tensorpack/blob/master/examples/DeepQNetwork/atari.py -""" - - -class AtariPlayer(gym.Env): - """ - A wrapper for ALE emulator, with configurations to mimic DeepMind DQN settings. - Info: - score: the accumulated reward in the current game - gameOver: True when the current game is Over - """ - - def __init__(self, - rom_file, - viz=0, - frame_skip=4, - nullop_start=30, - live_lost_as_eoe=True, - max_num_frames=0): - """ - Args: - rom_file: path to the rom - frame_skip: skip every k frames and repeat the action - viz: visualization to be done. - Set to 0 to disable. - Set to a positive number to be the delay between frames to show. - Set to a string to be a directory to store frames. - nullop_start: start with random number of null ops. - live_losts_as_eoe: consider lost of lives as end of episode. Useful for training. - max_num_frames: maximum number of frames per episode. - """ - super(AtariPlayer, self).__init__() - assert os.path.isfile(rom_file), \ - "rom {} not found. 
Please download at {}".format(rom_file, ROM_URL) - - try: - ALEInterface.setLoggerMode(ALEInterface.Logger.Error) - except AttributeError: - print("You're not using latest ALE") - - # avoid simulator bugs: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/86 - with _ALE_LOCK: - self.ale = ALEInterface() - self.ale.setInt(b"random_seed", np.random.randint(0, 30000)) - self.ale.setInt(b"max_num_frames_per_episode", max_num_frames) - self.ale.setBool(b"showinfo", False) - - self.ale.setInt(b"frame_skip", 1) - self.ale.setBool(b'color_averaging', False) - # manual.pdf suggests otherwise. - self.ale.setFloat(b'repeat_action_probability', 0.0) - - # viz setup - if isinstance(viz, str): - assert os.path.isdir(viz), viz - self.ale.setString(b'record_screen_dir', viz) - viz = 0 - if isinstance(viz, int): - viz = float(viz) - self.viz = viz - if self.viz and isinstance(self.viz, float): - self.windowname = os.path.basename(rom_file) - cv2.startWindowThread() - cv2.namedWindow(self.windowname) - - self.ale.loadROM(rom_file.encode('utf-8')) - self.width, self.height = self.ale.getScreenDims() - self.actions = self.ale.getMinimalActionSet() - - self.live_lost_as_eoe = live_lost_as_eoe - self.frame_skip = frame_skip - self.nullop_start = nullop_start - - self.action_space = spaces.Discrete(len(self.actions)) - self.observation_space = spaces.Box(low=0, - high=255, - shape=(self.height, self.width), - dtype=np.uint8) - self._restart_episode() - - def get_action_meanings(self): - return [ACTION_MEANING[i] for i in self.actions] - - def _grab_raw_image(self): - """ - :returns: the current 3-channel image - """ - m = self.ale.getScreenRGB() - return m.reshape((self.height, self.width, 3)) - - def _current_state(self): - """ - returns: a gray-scale (h, w) uint8 image - """ - ret = self._grab_raw_image() - # avoid missing frame issue: max-pooled over the last screen - ret = np.maximum(ret, self.last_raw_screen) - if self.viz: - if isinstance(self.viz, float): - cv2.imshow(self.windowname, ret) - cv2.waitKey(int(self.viz * 1000)) - ret = ret.astype('float32') - # 0.299,0.587.0.114. 
same as rgb2y in torch/image - ret = cv2.cvtColor(ret, cv2.COLOR_RGB2GRAY) - return ret.astype('uint8') # to save some memory - - def _restart_episode(self): - with _ALE_LOCK: - self.ale.reset_game() - - # random null-ops start - n = np.random.randint(self.nullop_start) - self.last_raw_screen = self._grab_raw_image() - for k in range(n): - if k == n - 1: - self.last_raw_screen = self._grab_raw_image() - self.ale.act(0) - - def reset(self): - if self.ale.game_over(): - self._restart_episode() - return self._current_state() - - def step(self, act): - oldlives = self.ale.lives() - r = 0 - for k in range(self.frame_skip): - if k == self.frame_skip - 1: - self.last_raw_screen = self._grab_raw_image() - r += self.ale.act(self.actions[act]) - newlives = self.ale.lives() - if self.ale.game_over() or \ - (self.live_lost_as_eoe and newlives < oldlives): - break - - isOver = self.ale.game_over() - if self.live_lost_as_eoe: - isOver = isOver or newlives < oldlives - - info = {'ale.lives': newlives} - return self._current_state(), r, isOver, info diff --git a/fluid/DeepQNetwork/atari_wrapper.py b/fluid/DeepQNetwork/atari_wrapper.py deleted file mode 100644 index 81ec7e0ba0ee191f70591c16bfff560a62d3d395..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/atari_wrapper.py +++ /dev/null @@ -1,106 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -from collections import deque - -import gym -from gym import spaces - -_v0, _v1 = gym.__version__.split('.')[:2] -assert int(_v0) > 0 or int(_v1) >= 10, gym.__version__ -""" -The following wrappers are copied or modified from openai/baselines: -https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py -""" - - -class MapState(gym.ObservationWrapper): - def __init__(self, env, map_func): - gym.ObservationWrapper.__init__(self, env) - self._func = map_func - - def observation(self, obs): - return self._func(obs) - - -class FrameStack(gym.Wrapper): - def __init__(self, env, k): - """Buffer observations and stack across channels (last axis).""" - gym.Wrapper.__init__(self, env) - self.k = k - self.frames = deque([], maxlen=k) - shp = env.observation_space.shape - chan = 1 if len(shp) == 2 else shp[2] - self.observation_space = spaces.Box(low=0, - high=255, - shape=(shp[0], shp[1], chan * k), - dtype=np.uint8) - - def reset(self): - """Clear buffer and re-fill by duplicating the first observation.""" - ob = self.env.reset() - for _ in range(self.k - 1): - self.frames.append(np.zeros_like(ob)) - self.frames.append(ob) - return self.observation() - - def step(self, action): - ob, reward, done, info = self.env.step(action) - self.frames.append(ob) - return self.observation(), reward, done, info - - def observation(self): - assert len(self.frames) == self.k - return np.stack(self.frames, axis=0) - - -class _FireResetEnv(gym.Wrapper): - def __init__(self, env): - """Take action on reset for environments that are fixed until firing.""" - gym.Wrapper.__init__(self, env) - assert env.unwrapped.get_action_meanings()[1] == 'FIRE' - assert len(env.unwrapped.get_action_meanings()) >= 3 - - def reset(self): - self.env.reset() - obs, _, done, _ = self.env.step(1) - if done: - self.env.reset() - obs, _, done, _ = self.env.step(2) - if done: - self.env.reset() - return obs - - def step(self, action): - return self.env.step(action) - - -def FireResetEnv(env): - if isinstance(env, gym.Wrapper): - baseenv = env.unwrapped - else: - baseenv = env - if 'FIRE' in baseenv.get_action_meanings(): - return _FireResetEnv(env) - return env - - -class 
LimitLength(gym.Wrapper): - def __init__(self, env, k): - gym.Wrapper.__init__(self, env) - self.k = k - - def reset(self): - # This assumes that reset() will really reset the env. - # If the underlying env tries to be smart about reset - # (e.g. end-of-life), the assumption doesn't hold. - ob = self.env.reset() - self.cnt = 0 - return ob - - def step(self, action): - ob, r, done, info = self.env.step(action) - self.cnt += 1 - if self.cnt == self.k: - done = True - return ob, r, done, info diff --git a/fluid/DeepQNetwork/expreplay.py b/fluid/DeepQNetwork/expreplay.py deleted file mode 100644 index 5f27ca7286b5db7ac963bc25236be416fad50eb0..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/expreplay.py +++ /dev/null @@ -1,98 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -import copy -from collections import deque, namedtuple - -Experience = namedtuple('Experience', ['state', 'action', 'reward', 'isOver']) - - -class ReplayMemory(object): - def __init__(self, max_size, state_shape, context_len): - self.max_size = int(max_size) - self.state_shape = state_shape - self.context_len = int(context_len) - - self.state = np.zeros((self.max_size, ) + state_shape, dtype='uint8') - self.action = np.zeros((self.max_size, ), dtype='int32') - self.reward = np.zeros((self.max_size, ), dtype='float32') - self.isOver = np.zeros((self.max_size, ), dtype='bool') - - self._curr_size = 0 - self._curr_pos = 0 - self._context = deque(maxlen=context_len - 1) - - def append(self, exp): - """append a new experience into replay memory - """ - if self._curr_size < self.max_size: - self._assign(self._curr_pos, exp) - self._curr_size += 1 - else: - self._assign(self._curr_pos, exp) - self._curr_pos = (self._curr_pos + 1) % self.max_size - if exp.isOver: - self._context.clear() - else: - self._context.append(exp) - - def recent_state(self): - """ maintain recent state for training""" - lst = list(self._context) - states = [np.zeros(self.state_shape, dtype='uint8')] * \ - (self._context.maxlen - len(lst)) - states.extend([k.state for k in lst]) - return states - - def sample(self, idx): - """ return state, action, reward, isOver, - note that some frames in state may be generated from last episode, - they should be removed from state - """ - state = np.zeros( - (self.context_len + 1, ) + self.state_shape, dtype=np.uint8) - state_idx = np.arange(idx, idx + self.context_len + 1) % self._curr_size - - # confirm that no frame was generated from last episode - has_last_episode = False - for k in range(self.context_len - 2, -1, -1): - to_check_idx = state_idx[k] - if self.isOver[to_check_idx]: - has_last_episode = True - state_idx = state_idx[k + 1:] - state[k + 1:] = self.state[state_idx] - break - - if not has_last_episode: - state = self.state[state_idx] - - real_idx = (idx + self.context_len - 1) % self._curr_size - action = self.action[real_idx] - reward = self.reward[real_idx] - isOver = self.isOver[real_idx] - return state, reward, action, isOver - - def __len__(self): - return self._curr_size - - def _assign(self, pos, exp): - self.state[pos] = exp.state - self.reward[pos] = exp.reward - self.action[pos] = exp.action - self.isOver[pos] = exp.isOver - - def sample_batch(self, batch_size): - """sample a batch from replay memory for training - """ - batch_idx = np.random.randint( - self._curr_size - self.context_len - 1, size=batch_size) - batch_idx = (self._curr_pos + batch_idx) % self._curr_size - batch_exp = [self.sample(i) for i in batch_idx] - return self._process_batch(batch_exp) - - def 
_process_batch(self, batch_exp): - state = np.asarray([e[0] for e in batch_exp], dtype='uint8') - reward = np.asarray([e[1] for e in batch_exp], dtype='float32') - action = np.asarray([e[2] for e in batch_exp], dtype='int8') - isOver = np.asarray([e[3] for e in batch_exp], dtype='bool') - return [state, action, reward, isOver] diff --git a/fluid/DeepQNetwork/play.py b/fluid/DeepQNetwork/play.py deleted file mode 100644 index b956343f3e78543ad702461175e859d3bef2af88..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/play.py +++ /dev/null @@ -1,65 +0,0 @@ -#-*- coding: utf-8 -*- - -import argparse -import os -import numpy as np -import paddle.fluid as fluid - -from train import get_player -from tqdm import tqdm - - -def predict_action(exe, state, predict_program, feed_names, fetch_targets, - action_dim): - if np.random.random() < 0.01: - act = np.random.randint(action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = exe.run(predict_program, - feed={feed_names[0]: state.astype('float32')}, - fetch_list=fetch_targets)[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - return act - - -if __name__ == '__main__': - parser = argparse.ArgumentParser() - parser.add_argument( - '--use_cuda', action='store_true', help='if set, use cuda') - parser.add_argument('--rom', type=str, required=True, help='atari rom') - parser.add_argument( - '--model_path', type=str, required=True, help='dirname to load model') - parser.add_argument( - '--viz', - type=float, - default=0, - help='''viz: visualization setting: - Set to 0 to disable; - Set to a positive number to be the delay between frames to show. - ''') - args = parser.parse_args() - - env = get_player(args.rom, viz=args.viz) - - place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace() - exe = fluid.Executor(place) - inference_scope = fluid.core.Scope() - with fluid.scope_guard(inference_scope): - [predict_program, feed_names, - fetch_targets] = fluid.io.load_inference_model(args.model_path, exe) - - episode_reward = [] - for _ in tqdm(xrange(30), desc='eval agent'): - state = env.reset() - total_reward = 0 - while True: - action = predict_action(exe, state, predict_program, feed_names, - fetch_targets, env.action_space.n) - state, reward, isOver, info = env.step(action) - total_reward += reward - if isOver: - break - episode_reward.append(total_reward) - eval_reward = np.mean(episode_reward) - print('Average reward of 30 epidose: {}'.format(eval_reward)) diff --git a/fluid/DeepQNetwork/requirement.txt b/fluid/DeepQNetwork/requirement.txt deleted file mode 100644 index be84b259f066e9a26dd207fb5e4e6f66ea9fba03..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/requirement.txt +++ /dev/null @@ -1,5 +0,0 @@ -numpy -gym -tqdm -opencv-python -paddlepaddle-gpu==0.12.0 diff --git a/fluid/DeepQNetwork/rom_files/breakout.bin b/fluid/DeepQNetwork/rom_files/breakout.bin deleted file mode 100644 index abab5a8c0a1890461a11b78d4265f1b794327793..0000000000000000000000000000000000000000 Binary files a/fluid/DeepQNetwork/rom_files/breakout.bin and /dev/null differ diff --git a/fluid/DeepQNetwork/rom_files/pong.bin b/fluid/DeepQNetwork/rom_files/pong.bin deleted file mode 100644 index 14a5bdfc72548613c059938bdf712efdbb5d3806..0000000000000000000000000000000000000000 Binary files a/fluid/DeepQNetwork/rom_files/pong.bin and /dev/null differ diff --git a/fluid/DeepQNetwork/train.py b/fluid/DeepQNetwork/train.py deleted file mode 100644 index 
614823c52d7e8c29b8e4565fc58f52cfa11b9640..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/train.py +++ /dev/null @@ -1,182 +0,0 @@ -#-*- coding: utf-8 -*- - -from DQN_agent import DQNModel -from DoubleDQN_agent import DoubleDQNModel -from DuelingDQN_agent import DuelingDQNModel -from atari import AtariPlayer -import paddle.fluid as fluid -import gym -import argparse -import cv2 -from tqdm import tqdm -from expreplay import ReplayMemory, Experience -import numpy as np -import os - -from datetime import datetime -from atari_wrapper import FrameStack, MapState, FireResetEnv, LimitLength -from collections import deque - -UPDATE_FREQ = 4 - -#MEMORY_WARMUP_SIZE = 2000 -MEMORY_SIZE = 1e6 -MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20 -IMAGE_SIZE = (84, 84) -CONTEXT_LEN = 4 -ACTION_REPEAT = 4 # aka FRAME_SKIP -UPDATE_FREQ = 4 - - -def run_train_episode(agent, env, exp): - total_reward = 0 - state = env.reset() - step = 0 - while True: - step += 1 - context = exp.recent_state() - context.append(state) - context = np.stack(context, axis=0) - action = agent.act(context, train_or_test='train') - next_state, reward, isOver, _ = env.step(action) - exp.append(Experience(state, action, reward, isOver)) - # train model - # start training - if len(exp) > MEMORY_WARMUP_SIZE: - if step % UPDATE_FREQ == 0: - batch_all_state, batch_action, batch_reward, batch_isOver = exp.sample_batch( - args.batch_size) - batch_state = batch_all_state[:, :CONTEXT_LEN, :, :] - batch_next_state = batch_all_state[:, 1:, :, :] - agent.train(batch_state, batch_action, batch_reward, - batch_next_state, batch_isOver) - total_reward += reward - state = next_state - if isOver: - break - return total_reward, step - - -def get_player(rom, viz=False, train=False): - env = AtariPlayer( - rom, - frame_skip=ACTION_REPEAT, - viz=viz, - live_lost_as_eoe=train, - max_num_frames=60000) - env = FireResetEnv(env) - env = MapState(env, lambda im: cv2.resize(im, IMAGE_SIZE)) - if not train: - # in training, context is taken care of in expreplay buffer - env = FrameStack(env, CONTEXT_LEN) - return env - - -def eval_agent(agent, env): - episode_reward = [] - for _ in tqdm(range(30), desc='eval agent'): - state = env.reset() - total_reward = 0 - step = 0 - while True: - step += 1 - action = agent.act(state, train_or_test='test') - state, reward, isOver, info = env.step(action) - total_reward += reward - if isOver: - break - episode_reward.append(total_reward) - eval_reward = np.mean(episode_reward) - return eval_reward - - -def train_agent(): - env = get_player(args.rom, train=True) - test_env = get_player(args.rom) - exp = ReplayMemory(args.mem_size, IMAGE_SIZE, CONTEXT_LEN) - action_dim = env.action_space.n - - if args.alg == 'DQN': - agent = DQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN, - args.use_cuda) - elif args.alg == 'DoubleDQN': - agent = DoubleDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN, - args.use_cuda) - elif args.alg == 'DuelingDQN': - agent = DuelingDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN, - args.use_cuda) - else: - print('Input algorithm name error!') - return - - with tqdm(total=MEMORY_WARMUP_SIZE) as pbar: - while len(exp) < MEMORY_WARMUP_SIZE: - total_reward, step = run_train_episode(agent, env, exp) - pbar.update(step) - - # train - test_flag = 0 - save_flag = 0 - pbar = tqdm(total=1e8) - recent_100_reward = [] - total_step = 0 - max_reward = None - save_path = os.path.join(args.model_dirname, '{}-{}'.format( - args.alg, os.path.basename(args.rom).split('.')[0])) - while True: - # 
start epoch - total_reward, step = run_train_episode(agent, env, exp) - total_step += step - pbar.set_description('[train]exploration:{}'.format(agent.exploration)) - pbar.update(step) - - if total_step // args.test_every_steps == test_flag: - pbar.write("testing") - eval_reward = eval_agent(agent, test_env) - test_flag += 1 - print("eval_agent done, (steps, eval_reward): ({}, {})".format( - total_step, eval_reward)) - - if max_reward is None or eval_reward > max_reward: - max_reward = eval_reward - fluid.io.save_inference_model(save_path, ['state'], - agent.pred_value, agent.exe, - agent.predict_program) - pbar.close() - - -if __name__ == '__main__': - parser = argparse.ArgumentParser() - parser.add_argument( - '--alg', - type=str, - default='DQN', - help='Reinforcement learning algorithm, support: DQN, DoubleDQN, DuelingDQN' - ) - parser.add_argument( - '--use_cuda', action='store_true', help='if set, use cuda') - parser.add_argument( - '--gamma', - type=float, - default=0.99, - help='discount factor for accumulated reward computation') - parser.add_argument( - '--mem_size', - type=int, - default=1000000, - help='memory size for experience replay') - parser.add_argument( - '--batch_size', type=int, default=64, help='batch size for training') - parser.add_argument('--rom', help='atari rom', required=True) - parser.add_argument( - '--model_dirname', - type=str, - default='saved_model', - help='dirname to save model') - parser.add_argument( - '--test_every_steps', - type=int, - default=100000, - help='every steps number to run test') - args = parser.parse_args() - train_agent() diff --git a/fluid/PaddleCV/human_pose_estimation/README.md b/fluid/PaddleCV/human_pose_estimation/README.md deleted file mode 100644 index 25dc66500c2bfa6ef10c9dea6b150908d1f1418b..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/README.md +++ /dev/null @@ -1,110 +0,0 @@ -# Simple Baselines for Human Pose Estimation in Fluid - -## Introduction -This is a simple demonstration of re-implementation in [PaddlePaddle.Fluid](http://www.paddlepaddle.org/en) for the paper [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/abs/1804.06208) (ECCV'18) from MSRA. - -![demo](demo.gif) - -> **Video in Demo**: *Bruno Mars - That’s What I Like [Official Video]*. - -## Requirements - - - Python == 2.7 - - PaddlePaddle >= 1.0 - - opencv-python >= 3.3 - - tqdm >= 4.25 - -## Environment - -The code is developed and tested under 4 Tesla K40 GPUS cards on CentOS with installed CUDA-9.2/8.0 and cuDNN-7.1. - -## Known Issues - - - The model does not converge with large batch\_size (e.g. = 32) on Tesla P40 / V100 / P100 GPUS cards, because PaddlePaddle uses the batch normalization function of cuDNN. Changing batch\_size into 1 image on each card during training will ease this problem, but not sure the performance. The issue can be tracked at [here](https://github.com/PaddlePaddle/Paddle/issues/14580). - -## Results on MPII Val -| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1| Models | -| ---- |:----:|:--------:|:-----:|:-----:|:---:|:----:|:-----:|:----:|:-------:|:------:| -| 383x384\_pose\_resnet\_50 in PyTorch | 96.658 | 95.754 | 89.790 | 84.614 | 88.523 | 84.666 | 79.287 | 89.066 | 38.046 | - | -| 383x384\_pose\_resnet\_50 in Fluid | 96.248 | 95.346 | 89.807 | 84.873 | 88.298 | 83.679 | 78.649 | 88.767 | 37.374 | [`link`](http://paddlemodels.bj.bcebos.com/pose/pose-resnet-50-384x384-mpii.tar.gz) | - -### Notes: - - - Flip test is used. 
 - - We did not search hard for the best model; we simply use the last saved checkpoint for validation.
-
-## Getting Started
-
-### Prepare Datasets and Pretrained Models
-
- - Follow the [instruction](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation) to prepare the datasets.
- - Download the ResNet-50 model pretrained on ImageNet in PaddlePaddle.Fluid from the [Model Zoo](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#supported-models-and-performances).
-
-```bash
-wget http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar
-```
-
-Then put them in the folder `pretrained` under the root directory of this repo, so that the layout looks like:
-
-```
-${THIS REPO ROOT}
-  `-- pretrained
-        `-- resnet_50
-              |-- 115
-  `-- data
-        `-- coco
-              |-- annotations
-              |-- images
-        `-- mpii
-              |-- annot
-              |-- images
-```
-
-### Install [COCOAPI](https://github.com/cocodataset/cocoapi)
-
-```bash
-# COCOAPI=/path/to/clone/cocoapi
-git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
-cd $COCOAPI/PythonAPI
-# if cython is not installed
-pip install Cython
-# Install into global site-packages
-make install
-# Alternatively, if you do not have permissions or prefer
-# not to install the COCO API into global site-packages
-python2 setup.py install --user
-```
-
-### Perform Validation
-
-Download the checkpoint of Pose-ResNet-50 trained on the MPII dataset from [here](http://paddlemodels.bj.bcebos.com/pose/pose-resnet-50-384x384-mpii.tar.gz), extract it into the folder `checkpoints` under the root directory of this repo, and then run
-
-```bash
-python2 val.py --dataset 'mpii' --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
-```
-
-### Perform Training
-
-```bash
-python2 train.py --dataset 'mpii' # or coco
-```
-
-**Note**: Configurations for training are aggregated in `lib/mpii_reader.py` and `lib/coco_reader.py`.
-
-### Perform Test on Images
-
-Put the images into the folder `test` under the root directory of this repo, then run
-
-```bash
-python2 test.py --checkpoint 'checkpoints/pose-resnet-50-384x384-mpii'
-```
-
-Because this simple baseline for human pose estimation is a top-down method, if an image contains multiple persons, a detector such as [Faster R-CNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/faster_rcnn) or [SSD](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/object_detection) should be applied first to crop each person out (a sketch of this top-down flow follows at the end of this README).
-
-## Reference
-
-  - Simple Baselines for Human Pose Estimation and Tracking in PyTorch [`code`](https://github.com/Microsoft/human-pose-estimation.pytorch#data-preparation)
-
-## License
-
-This code is released under the Apache License 2.0.
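-
-A rough sketch of the top-down inference flow referenced above (detect persons, crop each box, then run the pose network per crop); `detect_persons` and `pose_net` are hypothetical placeholders, not this repository's code:
-
-```python
-def estimate_poses(image, detect_persons, pose_net):
-    """Top-down pipeline: one pose estimate per detected person."""
-    poses = []
-    for (x0, y0, x1, y1) in detect_persons(image):  # person boxes from a detector
-        crop = image[y0:y1, x0:x1]                  # crop one person (numpy HWC image)
-        poses.append(pose_net(crop))                # keypoint heatmaps for this crop
-    return poses
-```
-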
diff --git a/fluid/PaddleCV/human_pose_estimation/demo.gif b/fluid/PaddleCV/human_pose_estimation/demo.gif deleted file mode 100644 index 1b66e367fae865f9129db00fbf9125bfd73bd784..0000000000000000000000000000000000000000 Binary files a/fluid/PaddleCV/human_pose_estimation/demo.gif and /dev/null differ diff --git a/fluid/PaddleCV/human_pose_estimation/lib/__init__.py b/fluid/PaddleCV/human_pose_estimation/lib/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/fluid/PaddleCV/human_pose_estimation/lib/base_reader.py b/fluid/PaddleCV/human_pose_estimation/lib/base_reader.py deleted file mode 100644 index 52f5168bc234a870ca2b4a2682c5b320ec59ac41..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/lib/base_reader.py +++ /dev/null @@ -1,116 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -############################################################################## - -"""Libs for data reader.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import shutil -import cv2 -import numpy as np - -def visualize(cfg, filename, data_numpy, input, joints, target): - """ - :param cfg: global configurations for dataset - :param filename: the name of image file - :param data_numpy: original numpy image data - :param input: input tensor [b, c, h, w] - :param joints: [num_joints, 3] - :param target: target tensor [b, c, h, w] - """ - TMPDIR = cfg.TMPDIR - NUM_JOINTS = cfg.NUM_JOINTS - - if os.path.exists(TMPDIR): - shutil.rmtree(TMPDIR) - os.mkdir(TMPDIR) - else: - os.mkdir(TMPDIR) - - f = open(os.path.join(TMPDIR, filename), 'w') - f.close() - - cv2.imwrite(os.path.join(TMPDIR, 'flip.jpg'), data_numpy) - cv2.imwrite(os.path.join(TMPDIR, 'input.jpg'), input) - for i in range(NUM_JOINTS): - cv2.imwrite(os.path.join(TMPDIR, 'target_{}.jpg'.format(i)), cv2.applyColorMap( - np.uint8(np.expand_dims(target[i], 2)*255.), cv2.COLORMAP_JET)) - cv2.circle(input, (int(joints[i, 0]), int(joints[i, 1])), 5, [170, 255, 0], -1) - cv2.imwrite(os.path.join(TMPDIR, 'input_kps.jpg'), input) - -def generate_target(cfg, joints, joints_vis): - """ - :param joints: [num_joints, 3] - :param joints_vis: [num_joints, 3] - :return: target, target_weight(1: visible, 0: invisible) - """ - NUM_JOINTS = cfg.NUM_JOINTS - TARGET_TYPE = cfg.TARGET_TYPE - HEATMAP_SIZE = cfg.HEATMAP_SIZE - IMAGE_SIZE = cfg.IMAGE_SIZE - SIGMA = cfg.SIGMA - - target_weight = np.ones((NUM_JOINTS, 1), dtype=np.float32) - target_weight[:, 0] = joints_vis[:, 0] - - assert TARGET_TYPE == 'gaussian', \ - 'Only support gaussian map now!' 
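-    # NOTE: each joint is rendered as an unnormalized 2D Gaussian (peak value 1.0)
-    # centered at its location mapped from IMAGE_SIZE down to HEATMAP_SIZE; a joint
-    # whose Gaussian lies entirely outside the heatmap has its target_weight zeroed.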
- - if TARGET_TYPE == 'gaussian': - target = np.zeros((NUM_JOINTS, - HEATMAP_SIZE[1], - HEATMAP_SIZE[0]), - dtype=np.float32) - - tmp_size = SIGMA * 3 - - for joint_id in range(NUM_JOINTS): - feat_stride = np.array(IMAGE_SIZE) / np.array(HEATMAP_SIZE) - mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5) - mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5) - - # Check that any part of the gaussian is in-bounds - ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] - br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] - if ul[0] >= HEATMAP_SIZE[0] or ul[1] >= HEATMAP_SIZE[1] \ - or br[0] < 0 or br[1] < 0: - # If not, just return the image as is - target_weight[joint_id] = 0 - continue - - # Generate gaussian - size = 2 * tmp_size + 1 - x = np.arange(0, size, 1, np.float32) - y = x[:, np.newaxis] - x0 = y0 = size // 2 - # The gaussian is not normalized, we want the center value to equal 1 - g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * SIGMA ** 2)) - - # Usable gaussian range - g_x = max(0, -ul[0]), min(br[0], HEATMAP_SIZE[0]) - ul[0] - g_y = max(0, -ul[1]), min(br[1], HEATMAP_SIZE[1]) - ul[1] - # Image range - img_x = max(0, ul[0]), min(br[0], HEATMAP_SIZE[0]) - img_y = max(0, ul[1]), min(br[1], HEATMAP_SIZE[1]) - - v = target_weight[joint_id] - if v > 0.5: - target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \ - g[g_y[0]:g_y[1], g_x[0]:g_x[1]] - - return target, target_weight diff --git a/fluid/PaddleCV/human_pose_estimation/lib/coco_reader.py b/fluid/PaddleCV/human_pose_estimation/lib/coco_reader.py deleted file mode 100644 index e955bee6f3efa1c507bec7d2dc83a93d741159b4..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/lib/coco_reader.py +++ /dev/null @@ -1,323 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -############################################################################## - -"""Data reader for COCO dataset.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import functools -import numpy as np -import cv2 -import random - -from utils.transforms import fliplr_joints -from utils.transforms import get_affine_transform -from utils.transforms import affine_transform -from lib.base_reader import visualize, generate_target -from pycocotools.coco import COCO - -# NOTE -# -- COCO Datatset -- -# "keypoints": -# { -# 0: "nose", -# 1: "left_eye", -# 2: "right_eye", -# 3: "left_ear", -# 4: "right_ear", -# 5: "left_shoulder", -# 6: "right_shoulder", -# 7: "left_elbow", -# 8: "right_elbow", -# 9: "left_wrist", -# 10: "right_wrist", -# 11: "left_hip", -# 12: "right_hip", -# 13: "left_knee", -# 14: "right_knee", -# 15: "left_ankle", -# 16: "right_ankle" -# }, -# -# "skeleton": -# [ -# [16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13], [6,7],[6,8], -# [7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7] -# ] - -class Config: - """Configurations for COCO dataset. 
- """ - DEBUG = False - TMPDIR = 'tmp_fold_for_debug' - - # For reader - BUF_SIZE = 102400 - THREAD = 1 if DEBUG else 8 # have to be larger than 0 - - # Fixed infos of dataset - DATAROOT = 'data/coco' - IMAGEDIR = 'images' - NUM_JOINTS = 17 - FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] - PARENT_IDS = None - - # CFGS - SCALE_FACTOR = 0.3 - ROT_FACTOR = 40 - FLIP = True - TARGET_TYPE = 'gaussian' - SIGMA = 3 - IMAGE_SIZE = [288, 384] - HEATMAP_SIZE = [72, 96] - ASPECT_RATIO = IMAGE_SIZE[0] * 1.0 / IMAGE_SIZE[1] - MEAN = [0.485, 0.456, 0.406] - STD = [0.229, 0.224, 0.225] - PIXEL_STD = 200 - -cfg = Config() - -def _box2cs(box): - x, y, w, h = box[:4] - return _xywh2cs(x, y, w, h) - -def _xywh2cs(x, y, w, h): - center = np.zeros((2), dtype=np.float32) - center[0] = x + w * 0.5 - center[1] = y + h * 0.5 - - if w > cfg.ASPECT_RATIO * h: - h = w * 1.0 / cfg.ASPECT_RATIO - elif w < cfg.ASPECT_RATIO * h: - w = h * cfg.ASPECT_RATIO - scale = np.array( - [w * 1.0 / cfg.PIXEL_STD, h * 1.0 / cfg.PIXEL_STD], - dtype=np.float32) - if center[0] != -1: - scale = scale * 1.25 - - return center, scale - -def _select_data(db): - db_selected = [] - for rec in db: - num_vis = 0 - joints_x = 0.0 - joints_y = 0.0 - for joint, joint_vis in zip( - rec['joints_3d'], rec['joints_3d_vis']): - if joint_vis[0] <= 0: - continue - num_vis += 1 - - joints_x += joint[0] - joints_y += joint[1] - if num_vis == 0: - continue - - joints_x, joints_y = joints_x / num_vis, joints_y / num_vis - - area = rec['scale'][0] * rec['scale'][1] * (cfg.PIXEL_STD**2) - joints_center = np.array([joints_x, joints_y]) - bbox_center = np.array(rec['center']) - diff_norm2 = np.linalg.norm((joints_center-bbox_center), 2) - ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area)) - - metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16 - if ks > metric: - db_selected.append(rec) - - print('=> num db: {}'.format(len(db))) - print('=> num selected db: {}'.format(len(db_selected))) - return db_selected - -def _load_coco_keypoint_annotation(image_set_index, coco, _coco_ind_to_class_ind, image_set): - """Ground truth bbox and keypoints. 
- """ - print('generating coco gt_db...') - gt_db = [] - for index in image_set_index: - im_ann = coco.loadImgs(index)[0] - width = im_ann['width'] - height = im_ann['height'] - - annIds = coco.getAnnIds(imgIds=index, iscrowd=False) - objs = coco.loadAnns(annIds) - - # Sanitize bboxes - valid_objs = [] - for obj in objs: - x, y, w, h = obj['bbox'] - x1 = np.max((0, x)) - y1 = np.max((0, y)) - x2 = np.min((width - 1, x1 + np.max((0, w - 1)))) - y2 = np.min((height - 1, y1 + np.max((0, h - 1)))) - if obj['area'] > 0 and x2 >= x1 and y2 >= y1: - obj['clean_bbox'] = [x1, y1, x2-x1, y2-y1] - valid_objs.append(obj) - objs = valid_objs - - rec = [] - for obj in objs: - cls = _coco_ind_to_class_ind[obj['category_id']] - if cls != 1: - continue - - # Ignore objs without keypoints annotation - if max(obj['keypoints']) == 0: - continue - - joints_3d = np.zeros((cfg.NUM_JOINTS, 3), dtype=np.float) - joints_3d_vis = np.zeros((cfg.NUM_JOINTS, 3), dtype=np.float) - for ipt in range(cfg.NUM_JOINTS): - joints_3d[ipt, 0] = obj['keypoints'][ipt * 3 + 0] - joints_3d[ipt, 1] = obj['keypoints'][ipt * 3 + 1] - joints_3d[ipt, 2] = 0 - t_vis = obj['keypoints'][ipt * 3 + 2] - if t_vis > 1: - t_vis = 1 - joints_3d_vis[ipt, 0] = t_vis - joints_3d_vis[ipt, 1] = t_vis - joints_3d_vis[ipt, 2] = 0 - - center, scale = _box2cs(obj['clean_bbox'][:4]) - rec.append({ - 'image': os.path.join(cfg.DATAROOT, cfg.IMAGEDIR, image_set+'2017', '%012d.jpg' % index), - 'center': center, - 'scale': scale, - 'joints_3d': joints_3d, - 'joints_3d_vis': joints_3d_vis, - 'filename': '%012d.jpg' % index, - 'imgnum': 0, - }) - - gt_db.extend(rec) - return gt_db - -def data_augmentation(sample, is_train): - image_file = sample['image'] - filename = sample['filename'] if 'filename' in sample else '' - joints = sample['joints_3d'] - joints_vis = sample['joints_3d_vis'] - c = sample['center'] - s = sample['scale'] - # score = sample['score'] if 'score' in sample else 1 - # imgnum = sample['imgnum'] if 'imgnum' in sample else '' - r = 0 - - data_numpy = cv2.imread( - image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) - - if is_train: - sf = cfg.SCALE_FACTOR - rf = cfg.ROT_FACTOR - s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf) - r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \ - if random.random() <= 0.6 else 0 - - if cfg.FLIP and random.random() <= 0.5: - data_numpy = data_numpy[:, ::-1, :] - joints, joints_vis = fliplr_joints( - joints, joints_vis, data_numpy.shape[1], cfg.FLIP_PAIRS) - c[0] = data_numpy.shape[1] - c[0] - 1 - - trans = get_affine_transform(c, s, r, cfg.IMAGE_SIZE) - input = cv2.warpAffine( - data_numpy, - trans, - (int(cfg.IMAGE_SIZE[0]), int(cfg.IMAGE_SIZE[1])), - flags=cv2.INTER_LINEAR) - - for i in range(cfg.NUM_JOINTS): - if joints_vis[i, 0] > 0.0: - joints[i, 0:2] = affine_transform(joints[i, 0:2], trans) - - # Numpy target - target, target_weight = generate_target(cfg, joints, joints_vis) - - if cfg.DEBUG: - visualize(cfg, filename, data_numpy, input.copy(), joints, target) - - # Normalization - input = input.astype('float32').transpose((2, 0, 1)) / 255 - input -= np.array(cfg.MEAN).reshape((3, 1, 1)) - input /= np.array(cfg.STD).reshape((3, 1, 1)) - - if is_train: - return input, target, target_weight - else: - return input, target, target_weight, c, s - -# Create a reader -def _reader_creator(root, image_set, shuffle=False, is_train=False, use_gt_bbox=False): - - def reader(): - if image_set in ['train', 'val']: - file_name = os.path.join(root, 'annotations', 
'person_keypoints_'+image_set+'2017.json') - elif image_set in ['test', 'test-dev']: - file_name = os.path.join(root, 'annotations', 'image_info_'+image_set+'2017.json') - else: - raise ValueError("The dataset '{}' is not supported".format(image_set)) - - # Load annotations - coco = COCO(file_name) - - # Deal with class names - cats = [cat['name'] - for cat in coco.loadCats(coco.getCatIds())] - classes = ['__background__'] + cats - print('=> classes: {}'.format(classes)) - num_classes = len(classes) - _class_to_ind = dict(zip(classes, range(num_classes))) - _class_to_coco_ind = dict(zip(cats, coco.getCatIds())) - _coco_ind_to_class_ind = dict([(_class_to_coco_ind[cls], - _class_to_ind[cls]) - for cls in classes[1:]]) - - # Load image file names - image_set_index = coco.getImgIds() - num_images = len(image_set_index) - print('=> num_images: {}'.format(num_images)) - - if is_train or use_gt_bbox: - gt_db = _load_coco_keypoint_annotation( - image_set_index, coco, _coco_ind_to_class_ind, image_set) - gt_db = _select_data(gt_db) - - if shuffle: - random.shuffle(gt_db) - - for db in gt_db: - yield db - - mapper = functools.partial(data_augmentation, is_train=is_train) - return reader, mapper - -def train(): - reader, mapper = _reader_creator(cfg.DATAROOT, 'train', shuffle=True, is_train=True) - def pop(): - for i, x in enumerate(reader()): - yield mapper(x) - return pop - -def valid(): - reader, mapper = _reader_creator(cfg.DATAROOT, 'val', shuffle=False, is_train=False) - def pop(): - for i, x in enumerate(reader()): - yield mapper(x) - return pop diff --git a/fluid/PaddleCV/human_pose_estimation/lib/mpii_reader.py b/fluid/PaddleCV/human_pose_estimation/lib/mpii_reader.py deleted file mode 100644 index f1f6fb38b9dcbfc6402a29fc26e44188c3cfe746..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/lib/mpii_reader.py +++ /dev/null @@ -1,216 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -############################################################################## - -"""Data reader for MPII.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import random -import functools -import json -import numpy as np -import cv2 - -from utils.transforms import fliplr_joints -from utils.transforms import get_affine_transform -from utils.transforms import affine_transform -from lib.base_reader import visualize, generate_target - -class Config: - """Configurations for MPII dataset. 
- """ - DEBUG = False - TMPDIR = 'tmp_fold_for_debug' - - # For reader - BUF_SIZE = 102400 - THREAD = 1 if DEBUG else 8 # have to be larger than 0 - - # Fixed infos of dataset - DATAROOT = 'data/mpii' - IMAGEDIR = 'images' - NUM_JOINTS = 16 - FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]] - PARENT_IDS = [1, 2, 6, 6, 3, 4, 6, 6, 7, 8, 11, 12, 7, 7, 13, 14] - - # CFGS - SCALE_FACTOR = 0.3 - ROT_FACTOR = 40 - FLIP = True - TARGET_TYPE = 'gaussian' - SIGMA = 3 - IMAGE_SIZE = [384, 384] - HEATMAP_SIZE = [96, 96] - MEAN = [0.485, 0.456, 0.406] - STD = [0.229, 0.224, 0.225] - -cfg = Config() - -def data_augmentation(sample, is_train): - image_file = sample['image'] - filename = sample['filename'] if 'filename' in sample else '' - joints = sample['joints_3d'] - joints_vis = sample['joints_3d_vis'] - c = sample['center'] - s = sample['scale'] - score = sample['score'] if 'score' in sample else 1 - # imgnum = sample['imgnum'] if 'imgnum' in sample else '' - r = 0 - - data_numpy = cv2.imread( - image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) - - if is_train: - sf = cfg.SCALE_FACTOR - rf = cfg.ROT_FACTOR - s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf) - r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \ - if random.random() <= 0.6 else 0 - - if cfg.FLIP and random.random() <= 0.5: - data_numpy = data_numpy[:, ::-1, :] - joints, joints_vis = fliplr_joints( - joints, joints_vis, data_numpy.shape[1], cfg.FLIP_PAIRS) - c[0] = data_numpy.shape[1] - c[0] - 1 - - trans = get_affine_transform(c, s, r, cfg.IMAGE_SIZE) - input = cv2.warpAffine( - data_numpy, - trans, - (int(cfg.IMAGE_SIZE[0]), int(cfg.IMAGE_SIZE[1])), - flags=cv2.INTER_LINEAR) - - for i in range(cfg.NUM_JOINTS): - if joints_vis[i, 0] > 0.0: - joints[i, 0:2] = affine_transform(joints[i, 0:2], trans) - - # Numpy target - target, target_weight = generate_target(cfg, joints, joints_vis) - - if cfg.DEBUG: - visualize(cfg, filename, data_numpy, input.copy(), joints, target) - - # Normalization - input = input.astype('float32').transpose((2, 0, 1)) / 255 - input -= np.array(cfg.MEAN).reshape((3, 1, 1)) - input /= np.array(cfg.STD).reshape((3, 1, 1)) - - if is_train: - return input, target, target_weight - else: - return input, target, target_weight, c, s, score - -def test_data_augmentation(sample): - image_file = sample['image'] - filename = sample['filename'] if 'filename' in sample else '' - - file_id = int(filename.split('.')[0].split('_')[1]) - - input = cv2.imread( - image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) - - input = cv2.resize(input, (int(cfg.IMAGE_SIZE[0]), int(cfg.IMAGE_SIZE[1]))) - - # Normalization - input = input.astype('float32').transpose((2, 0, 1)) / 255 - input -= np.array(cfg.MEAN).reshape((3, 1, 1)) - input /= np.array(cfg.STD).reshape((3, 1, 1)) - - return input, file_id - -# Create a reader -def _reader_creator(root, image_set, shuffle=False, is_train=False): - def reader(): - if image_set != 'test': - file_name = os.path.join(root, 'annot', image_set+'.json') - with open(file_name) as anno_file: - anno = json.load(anno_file) - print('=> load {} samples of {} dataset'.format(len(anno), image_set)) - - if shuffle: - random.shuffle(anno) - - for a in anno: - image_name = a['image'] - - c = np.array(a['center'], dtype=np.float) - s = np.array([a['scale'], a['scale']], dtype=np.float) - - # Adjust center/scale slightly to avoid cropping limbs - if c[0] != -1: - c[1] = c[1] + 15 * s[1] - s = s * 1.25 - - # MPII uses matlab format, index is based 1, - # we should first 
convert to 0-based index - c = c - 1 - - joints_3d = np.zeros((cfg.NUM_JOINTS, 3), dtype=np.float) - joints_3d_vis = np.zeros((cfg.NUM_JOINTS, 3), dtype=np.float) - - joints = np.array(a['joints']) - joints[:, 0:2] = joints[:, 0:2] - 1 - joints_vis = np.array(a['joints_vis']) - assert len(joints) == cfg.NUM_JOINTS, \ - 'joint num diff: {} vs {}'.format(len(joints), cfg.NUM_JOINTS) - - joints_3d[:, 0:2] = joints[:, 0:2] - joints_3d_vis[:, 0] = joints_vis[:] - joints_3d_vis[:, 1] = joints_vis[:] - - yield dict( - image = os.path.join(cfg.DATAROOT, cfg.IMAGEDIR, image_name), - center = c, - scale = s, - joints_3d = joints_3d, - joints_3d_vis = joints_3d_vis, - filename = image_name, - test_mode = False, - imagenum = 0) - else: - fold = 'test' - for img_name in os.listdir(fold): - yield dict(image = os.path.join(fold, img_name), - filename = img_name) - - if not image_set == 'test': - mapper = functools.partial(data_augmentation, is_train=is_train) - else: - mapper = functools.partial(test_data_augmentation) - return reader, mapper - -def train(): - reader, mapper = _reader_creator(cfg.DATAROOT, 'train', shuffle=True, is_train=True) - def pop(): - for i, x in enumerate(reader()): - yield mapper(x) - return pop - -def valid(): - reader, mapper = _reader_creator(cfg.DATAROOT, 'valid', shuffle=False, is_train=False) - def pop(): - for i, x in enumerate(reader()): - yield mapper(x) - return pop - -def test(): - reader, mapper = _reader_creator(cfg.DATAROOT, 'test') - def pop(): - for i, x in enumerate(reader()): - yield mapper(x) - return pop diff --git a/fluid/PaddleCV/human_pose_estimation/lib/pose_resnet.py b/fluid/PaddleCV/human_pose_estimation/lib/pose_resnet.py deleted file mode 100644 index aab9eb551705a40e148e0af591fbbecd15ebcbed..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/lib/pose_resnet.py +++ /dev/null @@ -1,192 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-############################################################################## - -"""Functions for building network.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import paddle.fluid as fluid - -__all__ = ["ResNet", "ResNet50", "ResNet101", "ResNet152"] - -# Global parameters -BN_MOMENTUM = 0.1 - -class ResNet(): - def __init__(self, layers=50, kps_num=16, test_mode=False): - """ - :param layers: int, the layers number which is used here - :param kps_num: int, the number of keypoints in accord with the dataset - :param test_mode: bool, if True, only return output heatmaps, no loss - - :return: loss, output heatmaps - """ - self.k = kps_num - self.layers = layers - self.test_mode = test_mode - - def net(self, input, target=None, target_weight=None): - layers = self.layers - supported_layers = [50, 101, 152] - assert layers in supported_layers, \ - "supported layers are {} but input layer is {}".format(supported_layers, layers) - - if layers == 50: - depth = [3, 4, 6, 3] - elif layers == 101: - depth = [3, 4, 23, 3] - elif layers == 152: - depth = [3, 8, 36, 3] - num_filters = [64, 128, 256, 512] - - conv = self.conv_bn_layer( - input=input, num_filters=64, filter_size=7, stride=2, act='relu') - conv = fluid.layers.pool2d( - input=conv, - pool_size=3, - pool_stride=2, - pool_padding=1, - pool_type='max') - - for block in range(len(depth)): - for i in range(depth[block]): - conv = self.bottleneck_block( - input=conv, - num_filters=num_filters[block], - stride=2 if i == 0 and block != 0 else 1) - - conv = fluid.layers.conv2d_transpose( - input=conv, num_filters=256, - filter_size=4, - padding=1, - stride=2, - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Normal(0., 0.001)), - act=None, - bias_attr=False) - conv = fluid.layers.batch_norm(input=conv, act='relu', momentum=BN_MOMENTUM) - conv = fluid.layers.conv2d_transpose( - input=conv, num_filters=256, - filter_size=4, - padding=1, - stride=2, - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Normal(0., 0.001)), - act=None, - bias_attr=False) - conv = fluid.layers.batch_norm(input=conv, act='relu', momentum=BN_MOMENTUM) - conv = fluid.layers.conv2d_transpose( - input=conv, num_filters=256, - filter_size=4, - padding=1, - stride=2, - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Normal(0., 0.001)), - act=None, - bias_attr=False) - conv = fluid.layers.batch_norm(input=conv, act='relu', momentum=BN_MOMENTUM) - - out = fluid.layers.conv2d( - input=conv, - num_filters=self.k, - filter_size=1, - stride=1, - padding=0, - act=None, - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Normal(0., 0.001))) - - if self.test_mode: - return out - else: - loss = self.calc_loss(out, target, target_weight) - return loss, out - - def conv_bn_layer(self, - input, - num_filters, - filter_size, - stride=1, - groups=1, - act=None): - conv = fluid.layers.conv2d( - input=input, - num_filters=num_filters, - filter_size=filter_size, - stride=stride, - padding=(filter_size - 1) // 2, - groups=groups, - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Normal(0., 0.001)), - act=None, - bias_attr=False) - return fluid.layers.batch_norm(input=conv, act=act, momentum=BN_MOMENTUM) - - def shortcut(self, input, ch_out, stride): - ch_in = input.shape[1] - if ch_in != ch_out or stride != 1: - return self.conv_bn_layer(input, ch_out, 1, stride) - else: - return input - - def calc_loss(self, 
heatmap, target, target_weight): - _, c, h, w = heatmap.shape - x = fluid.layers.reshape(heatmap, (-1, self.k, h*w)) - y = fluid.layers.reshape(target, (-1, self.k, h*w)) - w = fluid.layers.reshape(target_weight, (-1, self.k)) - - x = fluid.layers.split(x, num_or_sections=self.k, dim=1) - y = fluid.layers.split(y, num_or_sections=self.k, dim=1) - w = fluid.layers.split(w, num_or_sections=self.k, dim=1) - - _list = [] - for idx in range(self.k): - _tmp = fluid.layers.scale(x=x[idx] - y[idx], scale=1.) - _tmp = _tmp * _tmp - _tmp = fluid.layers.reduce_mean(_tmp, dim=2) - _list.append(_tmp * w[idx]) - - _loss = fluid.layers.concat(_list, axis=0) - _loss = fluid.layers.reduce_mean(_loss) - return 0.5 * _loss - - def bottleneck_block(self, input, num_filters, stride): - conv0 = self.conv_bn_layer( - input=input, num_filters=num_filters, filter_size=1, act='relu') - conv1 = self.conv_bn_layer( - input=conv0, - num_filters=num_filters, - filter_size=3, - stride=stride, - act='relu') - conv2 = self.conv_bn_layer( - input=conv1, num_filters=num_filters * 4, filter_size=1, act=None) - - short = self.shortcut(input, num_filters * 4, stride) - - return fluid.layers.elementwise_add(x=short, y=conv2, act='relu') - -def ResNet50(): - model = ResNet(layers=50) - return model - -def ResNet101(): - model = ResNet(layers=101) - return model - -def ResNet152(): - model = ResNet(layers=152) - return model diff --git a/fluid/PaddleCV/human_pose_estimation/test.py b/fluid/PaddleCV/human_pose_estimation/test.py deleted file mode 100644 index 4c56f29b1d126b4c5141bab24ca4ccb26aacb264..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/test.py +++ /dev/null @@ -1,159 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-##############################################################################
-
-"""Functions for inference."""
-
-import os
-import argparse
-import functools
-import paddle
-import paddle.fluid as fluid
-import paddle.fluid.layers as layers
-
-from tqdm import tqdm
-from lib import pose_resnet
-from utils.transforms import flip_back
-from utils.utility import *
-
-parser = argparse.ArgumentParser(description=__doc__)
-add_arg = functools.partial(add_arguments, argparser=parser)
-
-# yapf: disable
-add_arg('batch_size',       int,   32,                "Minibatch size.")
-add_arg('dataset',          str,   'mpii',            "Dataset name, 'coco' or 'mpii'.")
-add_arg('use_gpu',          bool,  True,              "Whether to use GPU or not.")
-add_arg('num_epochs',       int,   140,               "Number of epochs.")
-add_arg('total_images',     int,   144406,            "Training image number.")
-add_arg('kp_dim',           int,   16,                "Class number.")
-add_arg('model_save_dir',   str,   "output",          "Model save directory.")
-add_arg('with_mem_opt',     bool,  True,              "Whether to use memory optimization or not.")
-add_arg('pretrained_model', str,   None,              "Path to the pretrained model.")
-add_arg('checkpoint',       str,   None,              "Path of the checkpoint to resume from.")
-add_arg('lr',               float, 0.001,             "Set learning rate.")
-add_arg('lr_strategy',      str,   "piecewise_decay", "Set the learning rate decay strategy.")
-add_arg('flip_test',        bool,  True,              "Whether to flip images for testing.")
-add_arg('shift_heatmap',    bool,  True,              "Whether to shift the flipped heatmap by one pixel.")
-add_arg('post_process',     bool,  False,             "Whether to post-process the heatmaps.")
-# yapf: enable
-
-FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
-
-def test(args):
-    if args.dataset == 'coco':
-        import lib.coco_reader as reader
-        IMAGE_SIZE = [288, 384]
-        # HEATMAP_SIZE = [72, 96]
-        FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
-        args.kp_dim = 17
-        args.total_images = 144406  # 149813
-    elif args.dataset == 'mpii':
-        import lib.mpii_reader as reader
-        IMAGE_SIZE = [384, 384]
-        # HEATMAP_SIZE = [96, 96]
-        FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]]
-        args.kp_dim = 16
-        args.total_images = 2958  # validation
-    else:
-        raise ValueError('The dataset {} is not supported yet.'.format(args.dataset))
-
-    print_arguments(args)
-
-    # Image and target
-    image = layers.data(name='image', shape=[3, IMAGE_SIZE[1], IMAGE_SIZE[0]], dtype='float32')
-    file_id = layers.data(name='file_id', shape=[1,], dtype='int')
-
-    # Build model
-    model = pose_resnet.ResNet(layers=50, kps_num=args.kp_dim, test_mode=True)
-
-    # Output
-    output = model.net(input=image, target=None, target_weight=None)
-
-    # Parameters from model and arguments
-    params = {}
-    params["total_images"] = args.total_images
-    params["lr"] = args.lr
-    params["num_epochs"] = args.num_epochs
-    params["learning_strategy"] = {}
-    params["learning_strategy"]["batch_size"] = args.batch_size
-    params["learning_strategy"]["name"] = args.lr_strategy
-
-    if args.with_mem_opt:
-        fluid.memory_optimize(fluid.default_main_program(),
-                              skip_opt_set=[output.name])
-
-    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-
-    # NOTE: this hard-coded path overrides any --pretrained_model value
-    # passed on the command line.
-    args.pretrained_model = './pretrained/resnet_50/115'
-    if args.pretrained_model:
-        def if_exist(var):
-            exist_flag = os.path.exists(os.path.join(args.pretrained_model, var.name))
-            return exist_flag
-        fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist)
-
-    if args.checkpoint is not None:
-        fluid.io.load_persistables(exe, args.checkpoint)
-
-    # Dataloader
-    test_reader = paddle.batch(reader.test(), batch_size=args.batch_size)
-    feeder = fluid.DataFeeder(place=place, feed_list=[image, file_id])
-
-    test_exe = fluid.ParallelExecutor(
-        use_cuda=True if args.use_gpu else False,
-        main_program=fluid.default_main_program().clone(for_test=False),
-        loss_name=None)
-
-    fetch_list = [image.name, output.name]
-
-    for batch_id, data in tqdm(enumerate(test_reader())):
-        num_images = len(data)
-
-        file_ids = []
-        for i in range(num_images):
-            file_ids.append(data[i][1])
-
-        input_image, out_heatmaps = test_exe.run(
-            fetch_list=fetch_list,
-            feed=feeder.feed(data))
-
-        if args.flip_test:
-            # Horizontally flip all the images in the same batch
-            data_flipped = []
-            for i in range(num_images):
-                data_flipped.append((
-                    data[i][0][:, :, ::-1],
-                    data[i][1]))
-
-            # Inference again
-            _, output_flipped = test_exe.run(
-                fetch_list=fetch_list,
-                feed=feeder.feed(data_flipped))
-
-            # Flip back
-            output_flipped = flip_back(output_flipped, FLIP_PAIRS)
-
-            # Feature is not aligned, shift the flipped heatmap for higher accuracy
-            if args.shift_heatmap:
-                output_flipped[:, :, :, 1:] = \
-                    output_flipped.copy()[:, :, :, 0:-1]
-
-            # Aggregate
-            out_heatmaps = (out_heatmaps + output_flipped) * 0.5
-        save_predict_results(input_image, out_heatmaps, file_ids, fold_name='results')
-
-if __name__ == '__main__':
-    args = parser.parse_args()
-    test(args)
diff --git a/fluid/PaddleCV/human_pose_estimation/train.py b/fluid/PaddleCV/human_pose_estimation/train.py
deleted file mode 100644
index 20926eb3078f1eb21fe48785d0edb12a5d35e595..0000000000000000000000000000000000000000
--- a/fluid/PaddleCV/human_pose_estimation/train.py
+++ /dev/null
@@ -1,172 +0,0 @@
-# Copyright (c) 2018-present, Baidu, Inc.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-############################################################################## - -"""Functions for training.""" - -import os -import numpy as np -import cv2 -import paddle -import paddle.fluid as fluid -import paddle.fluid.layers as layers -import argparse -import functools - -from lib import pose_resnet -from utils.utility import * - -parser = argparse.ArgumentParser(description=__doc__) -add_arg = functools.partial(add_arguments, argparser=parser) -# yapf: disable -add_arg('batch_size', int, 32, "Minibatch size.") -add_arg('dataset', str, 'mpii', "Dataset") -add_arg('use_gpu', bool, True, "Whether to use GPU or not.") -add_arg('num_epochs', int, 140, "Number of epochs.") -add_arg('total_images', int, 144406, "Training image number.") -add_arg('kp_dim', int, 16, "Class number.") -add_arg('model_save_dir', str, "output", "Model save directory") -add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") -add_arg('pretrained_model', str, None, "Whether to use pretrained model.") -add_arg('checkpoint', str, None, "Whether to resume checkpoint.") -add_arg('lr', float, 0.001, "Set learning rate.") -add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.") -# yapf: enable - -def optimizer_setting(args, params): - lr_drop_ratio = 0.1 - - ls = params["learning_strategy"] - - if ls["name"] == "piecewise_decay": - total_images = params["total_images"] - batch_size = ls["batch_size"] - step = int(total_images / batch_size + 1) - - ls['epochs'] = [91, 121] - print('=> LR will be dropped at the epoch of {}'.format(ls['epochs'])) - - bd = [step * e for e in ls["epochs"]] - base_lr = params["lr"] - lr = [] - lr = [base_lr * (lr_drop_ratio**i) for i in range(len(bd) + 1)] - - # AdamOptimizer - optimizer = paddle.fluid.optimizer.AdamOptimizer( - learning_rate=fluid.layers.piecewise_decay( - boundaries=bd, values=lr)) - else: - lr = params["lr"] - optimizer = fluid.optimizer.Momentum( - learning_rate=lr, - momentum=0.9, - regularization=fluid.regularizer.L2Decay(0.0005)) - - return optimizer - -def train(args): - if args.dataset == 'coco': - import lib.coco_reader as reader - IMAGE_SIZE = [288, 384] - HEATMAP_SIZE = [72, 96] - args.kp_dim = 17 - args.total_images = 144406 # 149813 - elif args.dataset == 'mpii': - import lib.mpii_reader as reader - IMAGE_SIZE = [384, 384] - HEATMAP_SIZE = [96, 96] - args.kp_dim = 16 - args.total_images = 22246 - else: - raise ValueError('The dataset {} is not supported yet.'.format(args.dataset)) - - print_arguments(args) - - # Image and target - image = layers.data(name='image', shape=[3, IMAGE_SIZE[1], IMAGE_SIZE[0]], dtype='float32') - target = layers.data(name='target', shape=[args.kp_dim, HEATMAP_SIZE[1], HEATMAP_SIZE[0]], dtype='float32') - target_weight = layers.data(name='target_weight', shape=[args.kp_dim, 1], dtype='float32') - - # Build model - model = pose_resnet.ResNet(layers=50, kps_num=args.kp_dim) - - # Output - loss, output = model.net(input=image, target=target, target_weight=target_weight) - - # Parameters from model and arguments - params = {} - params["total_images"] = args.total_images - params["lr"] = args.lr - params["num_epochs"] = args.num_epochs - params["learning_strategy"] = {} - params["learning_strategy"]["batch_size"] = args.batch_size - params["learning_strategy"]["name"] = args.lr_strategy - - # Initialize optimizer - optimizer = optimizer_setting(args, params) - optimizer.minimize(loss) - - if args.with_mem_opt: - fluid.memory_optimize(fluid.default_main_program(), - 
skip_opt_set=[loss.name, output.name, target.name]) - - place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - args.pretrained_model = './pretrained/resnet_50/115' - if args.pretrained_model: - def if_exist(var): - exist_flag = os.path.exists(os.path.join(args.pretrained_model, var.name)) - return exist_flag - fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist) - - if args.checkpoint is not None: - fluid.io.load_persistables(exe, args.checkpoint) - - # Dataloader - train_reader = paddle.batch(reader.train(), batch_size=args.batch_size) - feeder = fluid.DataFeeder(place=place, feed_list=[image, target, target_weight]) - - train_exe = fluid.ParallelExecutor( - use_cuda=True if args.use_gpu else False, loss_name=loss.name) - fetch_list = [image.name, loss.name, output.name] - - for pass_id in range(params["num_epochs"]): - for batch_id, data in enumerate(train_reader()): - current_lr = np.array(paddle.fluid.global_scope().find_var('learning_rate').get_tensor()) - - input_image, loss, out_heatmaps = train_exe.run( - fetch_list, feed=feeder.feed(data)) - - loss = np.mean(np.array(loss)) - - print('Epoch [{:4d}/{:3d}] LR: {:.10f} ' - 'Loss = {:.5f}'.format( - batch_id, pass_id, current_lr[0], loss)) - - if batch_id % 10 == 0: - save_batch_heatmaps(input_image, out_heatmaps, file_name='visualization@train.jpg', normalize=True) - - model_path = os.path.join(args.model_save_dir + '/' + 'simplebase-{}'.format(args.dataset), - str(pass_id)) - if not os.path.isdir(model_path): - os.makedirs(model_path) - fluid.io.save_persistables(exe, model_path) - - -if __name__ == '__main__': - args = parser.parse_args() - train(args) - diff --git a/fluid/PaddleCV/human_pose_estimation/utils/__init__.py b/fluid/PaddleCV/human_pose_estimation/utils/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/fluid/PaddleCV/human_pose_estimation/utils/transforms.py b/fluid/PaddleCV/human_pose_estimation/utils/transforms.py deleted file mode 100644 index 6f22203637f72e9d01877e25e2a01adf5df1e181..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/utils/transforms.py +++ /dev/null @@ -1,141 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -############################################################################## -# -# Based on -# ------------------------------------------------------------------------------ -# https://github.com/Microsoft/human-pose-estimation.pytorch -# Copyright (c) Microsoft -# Licensed under the MIT License. 
-# Written by Bin Xiao (Bin.Xiao@microsoft.com) -# ------------------------------------------------------------------------------ - -"""Transforms functions.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import cv2 - - -def flip_back(output_flipped, matched_parts): - """ - :param ouput_flipped: numpy.ndarray(batch_size, num_joints, height, width) - """ - assert output_flipped.ndim == 4,\ - 'output_flipped should be [batch_size, num_joints, height, width]' - - output_flipped = output_flipped[:, :, :, ::-1] - - for pair in matched_parts: - tmp = output_flipped[:, pair[0], :, :].copy() - output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] - output_flipped[:, pair[1], :, :] = tmp - - return output_flipped - - -def fliplr_joints(joints, joints_vis, width, matched_parts): - """Flip coords. - """ - # Flip horizontal - joints[:, 0] = width - joints[:, 0] - 1 - - # Change left-right parts - for pair in matched_parts: - joints[pair[0], :], joints[pair[1], :] = \ - joints[pair[1], :], joints[pair[0], :].copy() - joints_vis[pair[0], :], joints_vis[pair[1], :] = \ - joints_vis[pair[1], :], joints_vis[pair[0], :].copy() - - return joints*joints_vis, joints_vis - - -def transform_preds(coords, center, scale, output_size): - target_coords = np.zeros(coords.shape) - trans = get_affine_transform(center, scale, 0, output_size, inv=1) - for p in range(coords.shape[0]): - target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) - return target_coords - - -def get_affine_transform(center, - scale, - rot, - output_size, - shift=np.array([0, 0], dtype=np.float32), - inv=0): - if not isinstance(scale, np.ndarray) and not isinstance(scale, list): - print(scale) - scale = np.array([scale, scale]) - - scale_tmp = scale * 200.0 - src_w = scale_tmp[0] - dst_w = output_size[0] - dst_h = output_size[1] - - rot_rad = np.pi * rot / 180 - src_dir = get_dir([0, src_w * -0.5], rot_rad) - dst_dir = np.array([0, dst_w * -0.5], np.float32) - - src = np.zeros((3, 2), dtype=np.float32) - dst = np.zeros((3, 2), dtype=np.float32) - src[0, :] = center + scale_tmp * shift - src[1, :] = center + src_dir + scale_tmp * shift - dst[0, :] = [dst_w * 0.5, dst_h * 0.5] - dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir - - src[2:, :] = get_3rd_point(src[0, :], src[1, :]) - dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) - - if inv: - trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) - else: - trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) - - return trans - - -def affine_transform(pt, t): - new_pt = np.array([pt[0], pt[1], 1.]).T - new_pt = np.dot(t, new_pt) - return new_pt[:2] - - -def get_3rd_point(a, b): - direct = a - b - return b + np.array([-direct[1], direct[0]], dtype=np.float32) - - -def get_dir(src_point, rot_rad): - sn, cs = np.sin(rot_rad), np.cos(rot_rad) - - src_result = [0, 0] - src_result[0] = src_point[0] * cs - src_point[1] * sn - src_result[1] = src_point[0] * sn + src_point[1] * cs - - return src_result - - -def crop(img, center, scale, output_size, rot=0): - trans = get_affine_transform(center, scale, rot, output_size) - - dst_img = cv2.warpAffine(img, - trans, - (int(output_size[0]), int(output_size[1])), - flags=cv2.INTER_LINEAR) - - return dst_img diff --git a/fluid/PaddleCV/human_pose_estimation/utils/utility.py b/fluid/PaddleCV/human_pose_estimation/utils/utility.py deleted file mode 100644 index 
9c3cd61fbbd41ea2262802beb548d80f13fd1a38..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/utils/utility.py +++ /dev/null @@ -1,397 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -############################################################################## - -"""Utility functions.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import math -import distutils.util -import numpy as np -import cv2 -from pathlib import Path - - -def print_arguments(args): - """Print argparse's arguments. - - Usage: - - .. code-block:: python - - parser = argparse.ArgumentParser() - parser.add_argument("name", default="Jonh", type=str, help="User name.") - args = parser.parse_args() - print_arguments(args) - - :param args: Input argparse.Namespace for printing. - :type args: argparse.Namespace - """ - print("----------- Configuration Arguments -----------") - for arg, value in sorted(vars(args).iteritems()): - print("%s: %s" % (arg, value)) - print("------------------------------------------------") - - -def add_arguments(argname, type, default, help, argparser, **kwargs): - """Add argparse's argument. - - Usage: - - .. code-block:: python - - parser = argparse.ArgumentParser() - add_argument("name", str, "Jonh", "User name.", parser) - args = parser.parse_args() - """ - type = distutils.util.strtobool if type == bool else type - argparser.add_argument( - "--" + argname, - default=default, - type=type, - help=help + ' Default: %(default)s.', - **kwargs) - - -def get_max_preds(batch_heatmaps): - """Get predictions from score maps. 
- heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) - """ - assert isinstance(batch_heatmaps, np.ndarray), \ - 'batch_heatmaps should be numpy.ndarray' - assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' - - batch_size = batch_heatmaps.shape[0] - num_joints = batch_heatmaps.shape[1] - width = batch_heatmaps.shape[3] - heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) - idx = np.argmax(heatmaps_reshaped, 2) - maxvals = np.amax(heatmaps_reshaped, 2) - - maxvals = maxvals.reshape((batch_size, num_joints, 1)) - idx = idx.reshape((batch_size, num_joints, 1)) - - preds = np.tile(idx, (1, 1, 2)).astype(np.float32) - - preds[:, :, 0] = (preds[:, :, 0]) % width - preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) - - pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) - pred_mask = pred_mask.astype(np.float32) - - preds *= pred_mask - return preds, maxvals - - -def affine_transform(pt, t): - new_pt = np.array([pt[0], pt[1], 1.]).T - new_pt = np.dot(t, new_pt) - return new_pt[:2] - - -def get_3rd_point(a, b): - direct = a - b - return b + np.array([-direct[1], direct[0]], dtype=np.float32) - - -def get_dir(src_point, rot_rad): - sn, cs = np.sin(rot_rad), np.cos(rot_rad) - - src_result = [0, 0] - src_result[0] = src_point[0] * cs - src_point[1] * sn - src_result[1] = src_point[0] * sn + src_point[1] * cs - - return src_result - - -def crop(img, center, scale, output_size, rot=0): - trans = get_affine_transform(center, scale, rot, output_size) - - dst_img = cv2.warpAffine(img, - trans, - (int(output_size[0]), int(output_size[1])), - flags=cv2.INTER_LINEAR) - - return dst_img - - -def get_affine_transform(center, - scale, - rot, - output_size, - shift=np.array([0, 0], dtype=np.float32), - inv=0): - if not isinstance(scale, np.ndarray) and not isinstance(scale, list): - print(scale) - scale = np.array([scale, scale]) - - scale_tmp = scale * 200.0 - src_w = scale_tmp[0] - dst_w = output_size[0] - dst_h = output_size[1] - - rot_rad = np.pi * rot / 180 - src_dir = get_dir([0, src_w * -0.5], rot_rad) - dst_dir = np.array([0, dst_w * -0.5], np.float32) - - src = np.zeros((3, 2), dtype=np.float32) - dst = np.zeros((3, 2), dtype=np.float32) - src[0, :] = center + scale_tmp * shift - src[1, :] = center + src_dir + scale_tmp * shift - dst[0, :] = [dst_w * 0.5, dst_h * 0.5] - dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir - - src[2:, :] = get_3rd_point(src[0, :], src[1, :]) - dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) - - if inv: - trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) - else: - trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) - - return trans - - -def transform_preds(coords, center, scale, output_size): - target_coords = np.zeros(coords.shape) - trans = get_affine_transform(center, scale, 0, output_size, inv=1) - for p in range(coords.shape[0]): - target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) - return target_coords - - -def get_final_preds(args, batch_heatmaps, center, scale): - coords, maxvals = get_max_preds(batch_heatmaps) - - heatmap_height = batch_heatmaps.shape[2] - heatmap_width = batch_heatmaps.shape[3] - - # Post-processing - if args.post_process: - for n in range(coords.shape[0]): - for p in range(coords.shape[1]): - hm = batch_heatmaps[n][p] - px = int(math.floor(coords[n][p][0] + 0.5)) - py = int(math.floor(coords[n][p][1] + 0.5)) - if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1: - diff = np.array([hm[py][px+1] - hm[py][px-1], - 
hm[py+1][px]-hm[py-1][px]]) - coords[n][p] += np.sign(diff) * .25 - - preds = coords.copy() - - # Transform back - for i in range(coords.shape[0]): - preds[i] = transform_preds(coords[i], center[i], scale[i], - [heatmap_width, heatmap_height]) - return preds, maxvals - - -def calc_dists(preds, target, normalize): - preds = preds.astype(np.float32) - target = target.astype(np.float32) - dists = np.zeros((preds.shape[1], preds.shape[0])) - for n in range(preds.shape[0]): - for c in range(preds.shape[1]): - if target[n, c, 0] > 1 and target[n, c, 1] > 1: - normed_preds = preds[n, c, :] / normalize[n] - normed_targets = target[n, c, :] / normalize[n] - dists[c, n] = np.linalg.norm(normed_preds - normed_targets) - else: - dists[c, n] = -1 - return dists - - -def dist_acc(dists, thr=0.5): - """Return percentage below threshold while ignoring values with a -1. - """ - dist_cal = np.not_equal(dists, -1) - num_dist_cal = dist_cal.sum() - if num_dist_cal > 0: - return np.less(dists[dist_cal], thr).sum() * 1.0 / num_dist_cal - else: - return -1 - - -def accuracy(output, target, hm_type='gaussian', thr=0.5): - """ - Calculate accuracy according to PCK, - but uses ground truth heatmap rather than x,y locations - First value to be returned is average accuracy across 'idxs', - followed by individual accuracies - """ - idx = list(range(output.shape[1])) - norm = 1.0 - if hm_type == 'gaussian': - pred, _ = get_max_preds(output) - target, _ = get_max_preds(target) - h = output.shape[2] - w = output.shape[3] - norm = np.ones((pred.shape[0], 2)) * np.array([h, w]) / 10 - dists = calc_dists(pred, target, norm) - - acc = np.zeros((len(idx) + 1)) - avg_acc = 0 - cnt = 0 - - for i in range(len(idx)): - acc[i + 1] = dist_acc(dists[idx[i]]) - if acc[i + 1] >= 0: - avg_acc = avg_acc + acc[i + 1] - cnt += 1 - - avg_acc = avg_acc / cnt if cnt != 0 else 0 - if cnt != 0: - acc[0] = avg_acc - return acc, avg_acc, cnt, pred - - -def save_batch_heatmaps(batch_image, batch_heatmaps, file_name, normalize=True): - """ - :param batch_image: [batch_size, channel, height, width] - :param batch_heatmaps: ['batch_size, num_joints, height, width] - :param file_name: saved file name - """ - if normalize: - min = np.array(batch_image.min(), dtype=np.float) - max = np.array(batch_image.max(), dtype=np.float) - - batch_image = np.add(batch_image, -min) - batch_image = np.divide(batch_image, max - min + 1e-5) - - batch_size, num_joints, \ - heatmap_height, heatmap_width = batch_heatmaps.shape - - grid_image = np.zeros((batch_size*heatmap_height, - (num_joints+1)*heatmap_width, - 3), - dtype=np.uint8) - - preds, maxvals = get_max_preds(batch_heatmaps) - - for i in range(batch_size): - image = batch_image[i] * 255 - image = image.clip(0, 255).astype(np.uint8) - image = image.transpose(1, 2, 0) - - heatmaps = batch_heatmaps[i] * 255 - heatmaps = heatmaps.clip(0, 255).astype(np.uint8) - - resized_image = cv2.resize(image, - (int(heatmap_width), int(heatmap_height))) - height_begin = heatmap_height * i - height_end = heatmap_height * (i + 1) - for j in range(num_joints): - cv2.circle(resized_image, - (int(preds[i][j][0]), int(preds[i][j][1])), - 1, [0, 0, 255], 1) - heatmap = heatmaps[j, :, :] - colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET) - masked_image = colored_heatmap*0.7 + resized_image*0.3 - cv2.circle(masked_image, - (int(preds[i][j][0]), int(preds[i][j][1])), - 1, [0, 0, 255], 1) - - width_begin = heatmap_width * (j+1) - width_end = heatmap_width * (j+2) - grid_image[height_begin:height_end, width_begin:width_end, 
:] = \ - masked_image - grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image - - cv2.imwrite(file_name, grid_image) - - -def save_predict_results(batch_image, batch_heatmaps, file_ids, fold_name, normalize=True): - """ - :param batch_image: [batch_size, channel, height, width] - :param batch_heatmaps: ['batch_size, num_joints, height, width] - :param fold_name: saved files in this folder - """ - save_dir = Path('./{}'.format(fold_name)) - try: - save_dir.mkdir() - except OSError: - pass - - if normalize: - min = np.array(batch_image.min(), dtype=np.float) - max = np.array(batch_image.max(), dtype=np.float) - - batch_image = np.add(batch_image, -min) - batch_image = np.divide(batch_image, max - min + 1e-5) - - batch_size, num_joints, \ - heatmap_height, heatmap_width = batch_heatmaps.shape - - # (32, 16, 2), (32, 16, 1)) - preds, maxvals = get_max_preds(batch_heatmaps) - - # Blue - icolor = (255, 137, 0) - ocolor = (138, 255, 0) - - for i in range(batch_size): - image = batch_image[i] * 255 - image = image.clip(0, 255).astype(np.uint8) - image = image.transpose(1, 2, 0) - image = cv2.resize(image, (384, 384)) - - file_id = file_ids[i] - imgname = save_dir.joinpath('rendered_{}.png'.format(str(file_id).zfill(7))) - - for j in range(num_joints): - x, y = preds[i][j] - cv2.circle(image, (int(x * 4), int(y * 4)), 3, icolor, -1, 16) - cv2.circle(image, (int(x * 4), int(y * 4)), 6, ocolor, 1, 16) - - cv2.imwrite(str(imgname), image) - -# Clean format output -def print_name_value(name_value, full_arch_name): - names = name_value.keys() - values = name_value.values() - num_values = len(name_value) - - results = [] - for value in values: - results.append('| {:.3f}'.format(value)) - - print( - '| Arch ' + - ' '.join(['| {}'.format(name) for name in names]) + - ' |' - ) - print('|---' * (num_values+1) + '|') - print('| ' + 'SIMPLEBASE RESNET50 ' + ' '.join(results) + ' |') - - -class AverageMeter(object): - """Computes and stores the average and current value. - """ - def __init__(self): - self.reset() - - def reset(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def update(self, val, n=1): - self.val = val - self.sum += val * n - self.count += n - self.avg = self.sum / self.count if self.count != 0 else 0 diff --git a/fluid/PaddleCV/human_pose_estimation/val.py b/fluid/PaddleCV/human_pose_estimation/val.py deleted file mode 100644 index d73113e5c2b2cfb017b98c856b5f455708a69dc6..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/val.py +++ /dev/null @@ -1,316 +0,0 @@ -# Copyright (c) 2018-present, Baidu, Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-############################################################################## - -"""Functions for validation.""" - -import os -import argparse -import functools -import numpy as np -import paddle -import paddle.fluid as fluid -import paddle.fluid.layers as layers - -from collections import OrderedDict -from scipy.io import loadmat, savemat -from lib import pose_resnet -from utils.transforms import flip_back -from utils.utility import * - -parser = argparse.ArgumentParser(description=__doc__) -add_arg = functools.partial(add_arguments, argparser=parser) - -# yapf: disable -add_arg('batch_size', int, 32, "Minibatch size.") -add_arg('dataset', str, 'mpii', "Dataset") -add_arg('use_gpu', bool, True, "Whether to use GPU or not.") -add_arg('num_epochs', int, 140, "Number of epochs.") -add_arg('total_images', int, 144406, "Training image number.") -add_arg('kp_dim', int, 16, "Class number.") -add_arg('model_save_dir', str, "output", "Model save directory") -add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") -add_arg('pretrained_model', str, None, "Whether to use pretrained model.") -add_arg('checkpoint', str, None, "Whether to resume checkpoint.") -add_arg('lr', float, 0.001, "Set learning rate.") -add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.") -add_arg('flip_test', bool, True, "Flip test") -add_arg('shift_heatmap', bool, True, "Shift heatmap") -add_arg('post_process', bool, True, "Post process") -# yapf: enable - -def valid(args): - if args.dataset == 'coco': - import lib.coco_reader as reader - IMAGE_SIZE = [288, 384] - HEATMAP_SIZE = [72, 96] - FLIP_PAIRS = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]] - args.kp_dim = 17 - args.total_images = 144406 # 149813 - elif args.dataset == 'mpii': - import lib.mpii_reader as reader - IMAGE_SIZE = [384, 384] - HEATMAP_SIZE = [96, 96] - FLIP_PAIRS = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]] - args.kp_dim = 16 - args.total_images = 2958 # validation - else: - raise ValueError('The dataset {} is not supported yet.'.format(args.dataset)) - - print_arguments(args) - - # Image and target - image = layers.data(name='image', shape=[3, IMAGE_SIZE[1], IMAGE_SIZE[0]], dtype='float32') - target = layers.data(name='target', shape=[args.kp_dim, HEATMAP_SIZE[1], HEATMAP_SIZE[0]], dtype='float32') - target_weight = layers.data(name='target_weight', shape=[args.kp_dim, 1], dtype='float32') - center = layers.data(name='center', shape=[2,], dtype='float32') - scale = layers.data(name='scale', shape=[2,], dtype='float32') - score = layers.data(name='score', shape=[1,], dtype='float32') - - # Build model - model = pose_resnet.ResNet(layers=50, kps_num=args.kp_dim) - - # Output - loss, output = model.net(input=image, target=target, target_weight=target_weight) - - # Parameters from model and arguments - params = {} - params["total_images"] = args.total_images - params["lr"] = args.lr - params["num_epochs"] = args.num_epochs - params["learning_strategy"] = {} - params["learning_strategy"]["batch_size"] = args.batch_size - params["learning_strategy"]["name"] = args.lr_strategy - - if args.with_mem_opt: - fluid.memory_optimize(fluid.default_main_program(), - skip_opt_set=[loss.name, output.name, target.name]) - - place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - args.pretrained_model = './pretrained/resnet_50/115' - if args.pretrained_model: - def if_exist(var): - exist_flag = 
os.path.exists(os.path.join(args.pretrained_model, var.name)) - return exist_flag - fluid.io.load_vars(exe, args.pretrained_model, predicate=if_exist) - - if args.checkpoint is not None: - fluid.io.load_persistables(exe, args.checkpoint) - - # Dataloader - valid_reader = paddle.batch(reader.valid(), batch_size=args.batch_size) - feeder = fluid.DataFeeder(place=place, feed_list=[image, target, target_weight, center, scale, score]) - - valid_exe = fluid.ParallelExecutor( - use_cuda=True if args.use_gpu else False, - main_program=fluid.default_main_program().clone(for_test=False), - loss_name=loss.name) - - fetch_list = [image.name, loss.name, output.name, target.name] - - # For validation - acc = AverageMeter() - idx = 0 - - num_samples = args.total_images - all_preds = np.zeros((num_samples, args.kp_dim, 3), - dtype=np.float32) - all_boxes = np.zeros((num_samples, 6)) - - for batch_id, data in enumerate(valid_reader()): - num_images = len(data) - - centers = [] - scales = [] - scores = [] - for i in range(num_images): - centers.append(data[i][3]) - scales.append(data[i][4]) - scores.append(data[i][5]) - - input_image, loss, out_heatmaps, target_heatmaps = valid_exe.run( - fetch_list=fetch_list, - feed=feeder.feed(data)) - - if args.flip_test: - # Flip all the images in a same batch - data_fliped = [] - for i in range(num_images): - # Input, target, target_weight, c, s, score - data_fliped.append(( - # np.flip(input_image, 3)[i], - data[i][0][:, :, ::-1], - data[i][1], - data[i][2], - data[i][3], - data[i][4], - data[i][5])) - - # Inference again - _, _, output_flipped, _ = valid_exe.run( - fetch_list=fetch_list, - feed=feeder.feed(data_fliped)) - - # Flip back - output_flipped = flip_back(output_flipped, FLIP_PAIRS) - - # Feature is not aligned, shift flipped heatmap for higher accuracy - if args.shift_heatmap: - output_flipped[:, :, :, 1:] = \ - output_flipped.copy()[:, :, :, 0:-1] - - # Aggregate - # out_heatmaps.shape: size[b, args.kp_dim, 96, 96] - out_heatmaps = (out_heatmaps + output_flipped) * 0.5 - - loss = np.mean(np.array(loss)) - - # Accuracy - _, avg_acc, cnt, pred = accuracy(out_heatmaps, target_heatmaps) - acc.update(avg_acc, cnt) - - # Current center, scale, score - centers = np.array(centers) - scales = np.array(scales) - scores = np.array(scores) - - preds, maxvals = get_final_preds( - args, out_heatmaps, centers, scales) - - all_preds[idx:idx + num_images, :, 0:2] = preds[:, :, 0:2] - all_preds[idx:idx + num_images, :, 2:3] = maxvals - # Double check this all_boxes parts - all_boxes[idx:idx + num_images, 0:2] = centers[:, 0:2] - all_boxes[idx:idx + num_images, 2:4] = scales[:, 0:2] - all_boxes[idx:idx + num_images, 4] = np.prod(scales*200, 1) - all_boxes[idx:idx + num_images, 5] = scores - # image_path.extend(meta['image']) - - idx += num_images - - print('Epoch [{:4d}] ' - 'Loss = {:.5f} ' - 'Acc = {:.5f}'.format(batch_id, loss, acc.avg)) - - if batch_id % 10 == 0: - save_batch_heatmaps(input_image, out_heatmaps, file_name='visualization@val.jpg', normalize=True) - - # Evaluate - args.DATAROOT = 'data/mpii' - args.TEST_SET = 'valid' - output_dir = '' - filenames = [] - imgnums = [] - image_path = [] - name_values, perf_indicator = mpii_evaluate( - args, all_preds, output_dir, all_boxes, image_path, - filenames, imgnums) - - print_name_value(name_values, perf_indicator) - -def mpii_evaluate(cfg, preds, output_dir, *args, **kwargs): - # Convert 0-based index to 1-based index - preds = preds[:, :, 0:2] + 1.0 - - if output_dir: - pred_file = os.path.join(output_dir, 
'pred.mat') - savemat(pred_file, mdict={'preds': preds}) - - if 'test' in cfg.TEST_SET: - return {'Null': 0.0}, 0.0 - - SC_BIAS = 0.6 - threshold = 0.5 - - gt_file = os.path.join(cfg.DATAROOT, - 'annot', - 'gt_{}.mat'.format(cfg.TEST_SET)) - gt_dict = loadmat(gt_file) - dataset_joints = gt_dict['dataset_joints'] - jnt_missing = gt_dict['jnt_missing'] - pos_gt_src = gt_dict['pos_gt_src'] - headboxes_src = gt_dict['headboxes_src'] - - pos_pred_src = np.transpose(preds, [1, 2, 0]) - - head = np.where(dataset_joints == 'head')[1][0] - lsho = np.where(dataset_joints == 'lsho')[1][0] - lelb = np.where(dataset_joints == 'lelb')[1][0] - lwri = np.where(dataset_joints == 'lwri')[1][0] - lhip = np.where(dataset_joints == 'lhip')[1][0] - lkne = np.where(dataset_joints == 'lkne')[1][0] - lank = np.where(dataset_joints == 'lank')[1][0] - - rsho = np.where(dataset_joints == 'rsho')[1][0] - relb = np.where(dataset_joints == 'relb')[1][0] - rwri = np.where(dataset_joints == 'rwri')[1][0] - rkne = np.where(dataset_joints == 'rkne')[1][0] - rank = np.where(dataset_joints == 'rank')[1][0] - rhip = np.where(dataset_joints == 'rhip')[1][0] - - jnt_visible = 1 - jnt_missing - uv_error = pos_pred_src - pos_gt_src - uv_err = np.linalg.norm(uv_error, axis=1) - headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :] - headsizes = np.linalg.norm(headsizes, axis=0) - headsizes *= SC_BIAS - scale = np.multiply(headsizes, np.ones((len(uv_err), 1))) - scaled_uv_err = np.divide(uv_err, scale) - scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible) - jnt_count = np.sum(jnt_visible, axis=1) - less_than_threshold = np.multiply((scaled_uv_err <= threshold), - jnt_visible) - PCKh = np.divide(100.*np.sum(less_than_threshold, axis=1), jnt_count) - - # Save - rng = np.arange(0, 0.5+0.01, 0.01) - pckAll = np.zeros((len(rng), cfg.kp_dim)) - - for r in range(len(rng)): - threshold = rng[r] - less_than_threshold = np.multiply(scaled_uv_err <= threshold, - jnt_visible) - pckAll[r, :] = np.divide(100.*np.sum(less_than_threshold, axis=1), - jnt_count) - - PCKh = np.ma.array(PCKh, mask=False) - PCKh.mask[6:8] = True - - jnt_count = np.ma.array(jnt_count, mask=False) - jnt_count.mask[6:8] = True - jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64) - - name_value = [ - ('Head', PCKh[head]), - ('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])), - ('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])), - ('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])), - ('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])), - ('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])), - ('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])), - ('Mean', np.sum(PCKh * jnt_ratio)), - ('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio)) - ] - name_value = OrderedDict(name_value) - - return name_value, name_value['Mean'] - -# TODO: coco_evaluate() - -if __name__ == '__main__': - args = parser.parse_args() - valid(args) diff --git a/fluid/PaddleCV/video_classification/README.md b/fluid/PaddleCV/video_classification/README.md deleted file mode 100644 index 822c3ccf64cb1c5567e574425229974524a34471..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/README.md +++ /dev/null @@ -1,140 +0,0 @@ -# Video Classification Based on Temporal Segment Network - -Video classification has drawn a significant amount of attentions in the past few years. This page introduces how to perform video classification with PaddlePaddle Fluid, on the public UCF-101 dataset, based on the state-of-the-art Temporal Segment Network (TSN) method. 
-
-______________________________________________________________________________
-
-## Table of Contents
-
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training](#training)
-- [Evaluation](#evaluation)
-- [Inference](#inference)
-- [Performance](#performance)
-
-### Installation
-Running sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the installation document and make an update.
-
-### Data preparation
-
-#### download UCF-101 dataset
-Users can download the UCF-101 dataset with the provided script data/download.sh.
-
-#### decode video into frames
-To avoid decoding videos during network training, we decode them offline into frames and save them in the pickle format, which is easily readable in Python.
-
-Users can refer to the script data/video_decode.py for video decoding.
-
-#### split data into train and test
-We follow split 1 of the UCF-101 dataset. After data splitting, users get 9537 videos for training and 3783 videos for validation. The reference script is data/split_data.py.
-
-#### save pickle for training
-As stated above, we save all data in pickle format for training. All information in each video is saved into one pickle, including the video id, frame binaries and label. Please refer to the script data/generate_train_data.py.
-After this operation, one gets two directories containing training and testing data in pickle format, and two files, train.list and test.list, with each line separated by SPACE.
-
-### Training
-After data preparation, users can start the PaddlePaddle Fluid training by:
-```
-python train.py \
-    --batch_size=128 \
-    --total_videos=9537 \
-    --class_dim=101 \
-    --num_epochs=60 \
-    --image_shape=3,224,224 \
-    --model_save_dir=output/ \
-    --with_mem_opt=True \
-    --lr_init=0.01 \
-    --num_layers=50 \
-    --seg_num=7 \
-    --pretrained_model={path_to_pretrained_model}
-```
-
-parameter introduction:
-* batch_size: the size of each mini-batch.
-* total_videos: the total number of videos in the training set.
-* class_dim: the number of classes in the classification task.
-* num_epochs: the number of epochs.
-* image_shape: the input size of the network.
-* model_save_dir: the directory to save the trained model.
-* with_mem_opt: whether to use memory optimization or not.
-* lr_init: the initial learning rate.
-* num_layers: the number of layers of the ResNet model.
-* seg_num: the number of segments in TSN (see the sampling sketch after this list).
-* pretrained_model: the path of the pretrained model.
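-
-As referenced in the parameter list above, TSN splits every video into seg_num equal segments and samples one frame from each. The following minimal sketch (an illustrative helper, not part of the repository code; it mirrors the video_loader logic in reader.py further below in this diff) shows the sampling rule:
-```
-import random
-
-def sample_segment_indices(num_frames, seg_num, train=True):
-    """Pick one frame index from each of seg_num equal-length segments."""
-    average_dur = num_frames // seg_num
-    indices = []
-    for i in range(seg_num):
-        if average_dur >= 1:
-            # Random offset within the segment at training time (temporal
-            # jittering); the segment centre at test time (deterministic).
-            offset = (random.randint(0, average_dur - 1)
-                      if train else (average_dur - 1) // 2)
-            idx = i * average_dur + offset
-        else:
-            idx = i
-        indices.append(idx % num_frames)
-    return indices
-```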
-
-data reader introduction:
-Data reader is defined in reader.py. Note that we use a group operation for all frames in one video.
-
-
-training:
-The training log is like:
-```
-[TRAIN] Pass: 0 trainbatch: 0 loss: 4.630959 acc1: 0.0 acc5: 0.0390625 time: 3.09 sec
-[TRAIN] Pass: 0 trainbatch: 10 loss: 4.559069 acc1: 0.0546875 acc5: 0.1171875 time: 3.91 sec
-[TRAIN] Pass: 0 trainbatch: 20 loss: 4.040092 acc1: 0.09375 acc5: 0.3515625 time: 3.88 sec
-[TRAIN] Pass: 0 trainbatch: 30 loss: 3.478214 acc1: 0.3203125 acc5: 0.5546875 time: 3.32 sec
-[TRAIN] Pass: 0 trainbatch: 40 loss: 3.005404 acc1: 0.3515625 acc5: 0.6796875 time: 3.33 sec
-[TRAIN] Pass: 0 trainbatch: 50 loss: 2.585245 acc1: 0.4609375 acc5: 0.7265625 time: 3.13 sec
-[TRAIN] Pass: 0 trainbatch: 60 loss: 2.151489 acc1: 0.4921875 acc5: 0.8203125 time: 3.35 sec
-[TRAIN] Pass: 0 trainbatch: 70 loss: 1.981680 acc1: 0.578125 acc5: 0.8359375 time: 3.30 sec
-```
-
-### Evaluation
-Evaluation measures the performance of a trained model. One can download pretrained models and set their path to {path_to_pretrained_model}. Then top1/top5 accuracy can be obtained by running the following command:
-```
-python eval.py \
-    --batch_size=128 \
-    --class_dim=101 \
-    --image_shape=3,224,224 \
-    --with_mem_opt=True \
-    --num_layers=50 \
-    --seg_num=7 \
-    --test_model={path_to_pretrained_model}
-```
-
-According to the configuration of evaluation, the output log is like:
-```
-[TEST] Pass: 0 testbatch: 0 loss: 0.011551 acc1: 1.0 acc5: 1.0 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 10 loss: 0.710330 acc1: 0.75 acc5: 1.0 time: 0.49 sec
-[TEST] Pass: 0 testbatch: 20 loss: 0.000547 acc1: 1.0 acc5: 1.0 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 30 loss: 0.036623 acc1: 1.0 acc5: 1.0 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 40 loss: 0.138705 acc1: 1.0 acc5: 1.0 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 50 loss: 0.056909 acc1: 1.0 acc5: 1.0 time: 0.49 sec
-[TEST] Pass: 0 testbatch: 60 loss: 0.742937 acc1: 0.75 acc5: 1.0 time: 0.49 sec
-[TEST] Pass: 0 testbatch: 70 loss: 1.720186 acc1: 0.5 acc5: 0.875 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 80 loss: 0.199669 acc1: 0.875 acc5: 1.0 time: 0.48 sec
-[TEST] Pass: 0 testbatch: 90 loss: 0.195510 acc1: 1.0 acc5: 1.0 time: 0.48 sec
-```
-
-### Inference
-Inference is used to get prediction scores or video features based on trained models.
-```
-python infer.py \
-    --class_dim=101 \
-    --image_shape=3,224,224 \
-    --with_mem_opt=True \
-    --num_layers=50 \
-    --seg_num=7 \
-    --test_model={path_to_pretrained_model}
-```
-
-The output contains prediction results, including the maximum score (before softmax) and the corresponding predicted label.
-``` -Test sample: PlayingGuitar_g01_c03, score: [21.418629], class [62] -Test sample: SalsaSpin_g05_c06, score: [13.238657], class [76] -Test sample: TrampolineJumping_g04_c01, score: [21.722862], class [93] -Test sample: JavelinThrow_g01_c04, score: [16.27892], class [44] -Test sample: PlayingTabla_g01_c01, score: [15.366951], class [65] -Test sample: ParallelBars_g04_c07, score: [18.42596], class [56] -Test sample: PlayingCello_g05_c05, score: [18.795723], class [58] -Test sample: LongJump_g03_c04, score: [7.100088], class [50] -Test sample: SkyDiving_g06_c03, score: [15.144707], class [82] -Test sample: UnevenBars_g07_c04, score: [22.114838], class [95] -``` - -### Performance -Configuration | Top-1 acc -------------- | ---------------: -seg=7, size=224 | 0.859 -seg=10, size=224 | 0.863 diff --git a/fluid/PaddleCV/video_classification/data/download.sh b/fluid/PaddleCV/video_classification/data/download.sh deleted file mode 100644 index f7a8045d19907824249575b8538692f59325aa28..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/data/download.sh +++ /dev/null @@ -1,9 +0,0 @@ -# Download the dataset -echo "Downloading..." -wget http://crcv.ucf.edu/data/UCF101/UCF101.rar -wget http://crcv.ucf.edu/data/UCF101/UCF101TrainTestSplits-RecognitionTask.zip - -# Extract the data. -echo "Extracting..." -unrar x UCF101.rar -unzip UCF101TrainTestSplits-RecognitionTask.zip diff --git a/fluid/PaddleCV/video_classification/data/generate_train_data.py b/fluid/PaddleCV/video_classification/data/generate_train_data.py deleted file mode 100644 index 1a5fa2edee7353d978d0329fcdc5af1f85b50645..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/data/generate_train_data.py +++ /dev/null @@ -1,41 +0,0 @@ -import os -import cPickle - -# read class file -dd = {} -f = open('ucfTrainTestlist/classInd.txt') -for line in f.readlines(): - label, name = line.split() - dd[name.lower()] = int(label) - 1 -f.close() - - -def generate_pkl(mode): - # generate pkl - path = '%s/' % mode - savepath = '%s_pkl/' % mode - if not os.path.exists(savepath): - os.makedirs(savepath) - - fw = open('%s.list' % mode, 'w') - for folder in os.listdir(path): - vidid = folder.split('_', 1)[1] - this_label = dd[folder.split('_')[1].lower()] - this_feat = [] - for img in sorted(os.listdir(path + folder)): - fout = open(path + folder + '/' + img, 'rb') - this_feat.append(fout.read()) - fout.close() - - res = [vidid, this_label, this_feat] - - outp = open(savepath + vidid + '.pkl', 'wb') - cPickle.dump(res, outp, protocol=cPickle.HIGHEST_PROTOCOL) - outp.close() - - fw.write('data/%s/%s.pkl\n' % (savepath, vidid)) - fw.close() - - -generate_pkl('train') -generate_pkl('test') diff --git a/fluid/PaddleCV/video_classification/data/split_data.py b/fluid/PaddleCV/video_classification/data/split_data.py deleted file mode 100644 index 79eb3b4340b4b6fa36c09adf0653f2fbaa76fbf5..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/data/split_data.py +++ /dev/null @@ -1,29 +0,0 @@ -import os -import shutil - -# set path -train_path = 'train/' -if not os.path.exists(train_path): - os.makedirs(train_path) - -test_path = 'test/' -if not os.path.exists(test_path): - os.makedirs(test_path) - -# move data -frame_dir = 'frame/' -f = open('ucfTrainTestlist/trainlist01.txt') -for line in f.readlines(): - folder = line.split('.')[0] - vidid = folder.split('/')[-1] - - shutil.move(frame_dir + folder, train_path + vidid) -f.close() - -f = open('ucfTrainTestlist/testlist01.txt') 
-for line in f.readlines(): - folder = line.split('.')[0] - vidid = folder.split('/')[-1] - - shutil.move(frame_dir + folder, test_path + vidid) -f.close() diff --git a/fluid/PaddleCV/video_classification/data/video_decode.py b/fluid/PaddleCV/video_classification/data/video_decode.py deleted file mode 100644 index b4892530cd4ecd88271cf9ccf89c558dc105c65b..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/data/video_decode.py +++ /dev/null @@ -1,20 +0,0 @@ -import os, sys -import shutil - - -def decode(): - path = './UCF-101/' - for folder in os.listdir(path): - for vid in os.listdir(path + folder): - print vid - video_path = path + folder + '/' + vid - image_folder = './frame/' + folder + '/' + vid.split('.')[0] + '/' - if not os.path.exists(image_folder): - os.makedirs(image_folder) - - os.system('./ffmpeg -i ' + video_path + ' -q 0 ' + image_folder + - '/%06d.jpg') - - -if __name__ == '__main__': - decode() diff --git a/fluid/PaddleCV/video_classification/eval.py b/fluid/PaddleCV/video_classification/eval.py deleted file mode 100644 index 130e682c1b03e8203dd0d36124496bb82c81564c..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/eval.py +++ /dev/null @@ -1,121 +0,0 @@ -import os -import numpy as np -import time -import sys -import paddle -import paddle.fluid as fluid -from resnet import TSN_ResNet -import reader - -import argparse -import functools -from paddle.fluid.framework import Parameter -from utility import add_arguments, print_arguments - -parser = argparse.ArgumentParser(description=__doc__) -add_arg = functools.partial(add_arguments, argparser=parser) -# yapf: disable -add_arg('batch_size', int, 128, "Minibatch size.") -add_arg('num_layers', int, 50, "How many layers for ResNet model.") -add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") -add_arg('class_dim', int, 101, "Number of class.") -add_arg('seg_num', int, 7, "Number of segments.") -add_arg('image_shape', str, "3,224,224", "Input image size.") -add_arg('test_model', str, None, "Test model path.") -# yapf: enable - - -def eval(args): - # parameters from arguments - seg_num = args.seg_num - class_dim = args.class_dim - num_layers = args.num_layers - batch_size = args.batch_size - test_model = args.test_model - - if test_model == None: - print('Please specify the test model ...') - return - - image_shape = [int(m) for m in args.image_shape.split(",")] - image_shape = [seg_num] + image_shape - - # model definition - model = TSN_ResNet(layers=num_layers, seg_num=seg_num) - image = fluid.layers.data(name='image', shape=image_shape, dtype='float32') - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - - out = model.net(input=image, class_dim=class_dim) - cost = fluid.layers.cross_entropy(input=out, label=label) - - avg_cost = fluid.layers.mean(x=cost) - acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1) - acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5) - - # for test - inference_program = fluid.default_main_program().clone(for_test=True) - - if args.with_mem_opt: - fluid.memory_optimize(fluid.default_main_program()) - - place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - def is_parameter(var): - if isinstance(var, Parameter): - return isinstance(var, Parameter) - - if test_model is not None: - vars = filter(is_parameter, inference_program.list_vars()) - fluid.io.load_vars(exe, test_model, vars=vars) - - # reader - test_reader = 
paddle.batch(reader.test(seg_num), batch_size=batch_size / 16) - feeder = fluid.DataFeeder(place=place, feed_list=[image, label]) - - fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name] - - # test - cnt = 0 - pass_id = 0 - test_info = [[], [], []] - for batch_id, data in enumerate(test_reader()): - t1 = time.time() - loss, acc1, acc5 = exe.run(inference_program, - fetch_list=fetch_list, - feed=feeder.feed(data)) - t2 = time.time() - period = t2 - t1 - loss = np.mean(loss) - acc1 = np.mean(acc1) - acc5 = np.mean(acc5) - test_info[0].append(loss * len(data)) - test_info[1].append(acc1 * len(data)) - test_info[2].append(acc5 * len(data)) - cnt += len(data) - if batch_id % 10 == 0: - print( - "[TEST] Pass: {0}\ttestbatch: {1}\tloss: {2}\tacc1: {3}\tacc5: {4}\ttime: {5}" - .format(pass_id, batch_id, '%.6f' % loss, acc1, acc5, - "%2.2f sec" % period)) - sys.stdout.flush() - - test_loss = np.sum(test_info[0]) / cnt - test_acc1 = np.sum(test_info[1]) / cnt - test_acc5 = np.sum(test_info[2]) / cnt - - print("+ End pass: {0}, test_loss: {1}, test_acc1: {2}, test_acc5: {3}" - .format(pass_id, '%.3f' % test_loss, '%.3f' % test_acc1, '%.3f' % - test_acc5)) - sys.stdout.flush() - - -def main(): - args = parser.parse_args() - print_arguments(args) - eval(args) - - -if __name__ == '__main__': - main() diff --git a/fluid/PaddleCV/video_classification/infer.py b/fluid/PaddleCV/video_classification/infer.py deleted file mode 100644 index 15cc2b53d918f70acf43da7eb1e095c1c03e0c4e..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/infer.py +++ /dev/null @@ -1,93 +0,0 @@ -import os -import numpy as np -import time -import sys -import paddle -import paddle.fluid as fluid -from resnet import TSN_ResNet -import reader - -import argparse -import functools -from paddle.fluid.framework import Parameter -from utility import add_arguments, print_arguments - -parser = argparse.ArgumentParser(description=__doc__) -add_arg = functools.partial(add_arguments, argparser=parser) -# yapf: disable -add_arg('num_layers', int, 50, "How many layers for ResNet model.") -add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") -add_arg('class_dim', int, 101, "Number of class.") -add_arg('seg_num', int, 7, "Number of segments.") -add_arg('image_shape', str, "3,224,224", "Input image size.") -add_arg('test_model', str, None, "Test model path.") -# yapf: enable - - -def infer(args): - # parameters from arguments - seg_num = args.seg_num - class_dim = args.class_dim - num_layers = args.num_layers - test_model = args.test_model - - if test_model == None: - print('Please specify the test model ...') - return - - image_shape = [int(m) for m in args.image_shape.split(",")] - image_shape = [seg_num] + image_shape - - # model definition - model = TSN_ResNet(layers=num_layers, seg_num=seg_num) - image = fluid.layers.data(name='image', shape=image_shape, dtype='float32') - - out = model.net(input=image, class_dim=class_dim) - - # for test - inference_program = fluid.default_main_program().clone(for_test=True) - - if args.with_mem_opt: - fluid.memory_optimize(fluid.default_main_program()) - - place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - def is_parameter(var): - if isinstance(var, Parameter): - return isinstance(var, Parameter) - - if test_model is not None: - vars = filter(is_parameter, inference_program.list_vars()) - fluid.io.load_vars(exe, test_model, vars=vars) - - # reader - test_reader = 
paddle.batch(reader.infer(seg_num), batch_size=1) - feeder = fluid.DataFeeder(place=place, feed_list=[image]) - - fetch_list = [out.name] - - # test - TOPK = 1 - for batch_id, data in enumerate(test_reader()): - data, vid = data[0] - data = [[data]] - result = exe.run(inference_program, - fetch_list=fetch_list, - feed=feeder.feed(data)) - result = result[0][0] - pred_label = np.argsort(result)[::-1][:TOPK] - print("Test sample: {0}, score: {1}, class {2}".format(vid, result[ - pred_label], pred_label)) - sys.stdout.flush() - - -def main(): - args = parser.parse_args() - print_arguments(args) - infer(args) - - -if __name__ == '__main__': - main() diff --git a/fluid/PaddleCV/video_classification/reader.py b/fluid/PaddleCV/video_classification/reader.py deleted file mode 100644 index 9c4fd812a63cd1e2b581dadaf6eaf70373e76b3c..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/reader.py +++ /dev/null @@ -1,210 +0,0 @@ -import os -import sys -import math -import random -import functools -try: - import cPickle as pickle - from cStringIO import StringIO -except ImportError: - import pickle - from io import BytesIO -import numpy as np -import paddle -from PIL import Image, ImageEnhance - -random.seed(0) - -THREAD = 8 -BUF_SIZE = 1024 - -TRAIN_LIST = 'data/train.list' -TEST_LIST = 'data/test.list' -INFER_LIST = 'data/test.list' - -img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) -img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) - -python_ver = sys.version_info - -def imageloader(buf): - if isinstance(buf, str): - img = Image.open(StringIO(buf)) - else: - img = Image.open(BytesIO(buf)) - - return img.convert('RGB') - - -def group_scale(imgs, target_size): - resized_imgs = [] - for i in range(len(imgs)): - img = imgs[i] - w, h = img.size - if (w <= h and w == target_size) or (h <= w and h == target_size): - resized_imgs.append(img) - continue - - if w < h: - ow = target_size - oh = int(target_size * 4.0 / 3.0) - resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) - else: - oh = target_size - ow = int(target_size * 4.0 / 3.0) - resized_imgs.append(img.resize((ow, oh), Image.BILINEAR)) - - return resized_imgs - - -def group_random_crop(img_group, target_size): - w, h = img_group[0].size - th, tw = target_size, target_size - - out_images = [] - x1 = random.randint(0, w - tw) - y1 = random.randint(0, h - th) - - for img in img_group: - if w == tw and h == th: - out_images.append(img) - else: - out_images.append(img.crop((x1, y1, x1 + tw, y1 + th))) - - return out_images - - -def group_random_flip(img_group): - v = random.random() - if v < 0.5: - ret = [img.transpose(Image.FLIP_LEFT_RIGHT) for img in img_group] - return ret - else: - return img_group - - -def group_center_crop(img_group, target_size): - img_crop = [] - for img in img_group: - w, h = img.size - th, tw = target_size, target_size - x1 = int(round((w - tw) / 2.)) - y1 = int(round((h - th) / 2.)) - img_crop.append(img.crop((x1, y1, x1 + tw, y1 + th))) - - return img_crop - - -def video_loader(frames, nsample, mode): - videolen = len(frames) - average_dur = videolen // nsample - - imgs = [] - for i in range(nsample): - idx = 0 - if mode == 'train': - if average_dur >= 1: - idx = random.randint(0, average_dur - 1) - idx += i * average_dur - else: - idx = i - else: - if average_dur >= 1: - idx = (average_dur - 1) // 2 - idx += i * average_dur - else: - idx = i - - imgbuf = frames[int(idx % videolen)] - img = imageloader(imgbuf) - imgs.append(img) - - return imgs - - -def 
decode_pickle(sample, mode, seg_num, short_size, target_size): - pickle_path = sample[0] - if python_ver < (3, 0): - data_loaded = pickle.load(open(pickle_path, 'rb')) - else: - data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes') - vid, label, frames = data_loaded - - imgs = video_loader(frames, seg_num, mode) - imgs = group_scale(imgs, short_size) - - if mode == 'train': - imgs = group_random_crop(imgs, target_size) - imgs = group_random_flip(imgs) - else: - imgs = group_center_crop(imgs, target_size) - - np_imgs = (np.array(imgs[0]).astype('float32').transpose( - (2, 0, 1))).reshape(1, 3, 224, 224) / 255 - for i in range(len(imgs) - 1): - img = (np.array(imgs[i + 1]).astype('float32').transpose( - (2, 0, 1))).reshape(1, 3, 224, 224) / 255 - np_imgs = np.concatenate((np_imgs, img)) - imgs = np_imgs - imgs -= img_mean - imgs /= img_std - - if mode == 'train' or mode == 'test': - return imgs, label - elif mode == 'infer': - return imgs, vid - - -def _reader_creator(pickle_list, - mode, - seg_num, - short_size, - target_size, - shuffle=False): - def reader(): - with open(pickle_list) as flist: - lines = [line.strip() for line in flist] - if shuffle: - random.shuffle(lines) - for line in lines: - pickle_path = line.strip() - yield [pickle_path] - - mapper = functools.partial( - decode_pickle, - mode=mode, - seg_num=seg_num, - short_size=short_size, - target_size=target_size) - - return paddle.reader.xmap_readers(mapper, reader, THREAD, BUF_SIZE) - - -def train(seg_num): - return _reader_creator( - TRAIN_LIST, - 'train', - shuffle=True, - seg_num=seg_num, - short_size=256, - target_size=224) - - -def test(seg_num): - return _reader_creator( - TEST_LIST, - 'test', - shuffle=False, - seg_num=seg_num, - short_size=256, - target_size=224) - - -def infer(seg_num): - return _reader_creator( - INFER_LIST, - 'infer', - shuffle=False, - seg_num=seg_num, - short_size=256, - target_size=224) diff --git a/fluid/PaddleCV/video_classification/resnet.py b/fluid/PaddleCV/video_classification/resnet.py deleted file mode 100644 index 494235469a37939d67f4239d2b47f8d9461264f4..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/resnet.py +++ /dev/null @@ -1,106 +0,0 @@ -import os -import time -import sys -import paddle.fluid as fluid -import math - - -class TSN_ResNet(): - def __init__(self, layers=50, seg_num=7): - self.layers = layers - self.seg_num = seg_num - - def conv_bn_layer(self, - input, - num_filters, - filter_size, - stride=1, - groups=1, - act=None): - conv = fluid.layers.conv2d( - input=input, - num_filters=num_filters, - filter_size=filter_size, - stride=stride, - padding=(filter_size - 1) // 2, - groups=groups, - act=None, - bias_attr=False) - return fluid.layers.batch_norm(input=conv, act=act) - - def shortcut(self, input, ch_out, stride): - ch_in = input.shape[1] - if ch_in != ch_out or stride != 1: - return self.conv_bn_layer(input, ch_out, 1, stride) - else: - return input - - def bottleneck_block(self, input, num_filters, stride): - conv0 = self.conv_bn_layer( - input=input, num_filters=num_filters, filter_size=1, act='relu') - conv1 = self.conv_bn_layer( - input=conv0, - num_filters=num_filters, - filter_size=3, - stride=stride, - act='relu') - conv2 = self.conv_bn_layer( - input=conv1, num_filters=num_filters * 4, filter_size=1, act=None) - - short = self.shortcut(input, num_filters * 4, stride) - - return fluid.layers.elementwise_add(x=short, y=conv2, act='relu') - - def net(self, input, class_dim=101): - layers = self.layers - seg_num = 
self.seg_num - supported_layers = [50, 101, 152] - if layers not in supported_layers: - print("supported layers are", supported_layers, \ - "but input layer is ", layers) - exit() - - # reshape input - channels = input.shape[2] - short_size = input.shape[3] - input = fluid.layers.reshape( - x=input, shape=[-1, channels, short_size, short_size]) - - if layers == 50: - depth = [3, 4, 6, 3] - elif layers == 101: - depth = [3, 4, 23, 3] - elif layers == 152: - depth = [3, 8, 36, 3] - num_filters = [64, 128, 256, 512] - - conv = self.conv_bn_layer( - input=input, num_filters=64, filter_size=7, stride=2, act='relu') - conv = fluid.layers.pool2d( - input=conv, - pool_size=3, - pool_stride=2, - pool_padding=1, - pool_type='max') - - for block in range(len(depth)): - for i in range(depth[block]): - conv = self.bottleneck_block( - input=conv, - num_filters=num_filters[block], - stride=2 if i == 0 and block != 0 else 1) - pool = fluid.layers.pool2d( - input=conv, pool_size=7, pool_type='avg', global_pooling=True) - - feature = fluid.layers.reshape( - x=pool, shape=[-1, seg_num, pool.shape[1]]) - out = fluid.layers.reduce_mean(feature, dim=1) - - stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) - out = fluid.layers.fc(input=out, - size=class_dim, - act='softmax', - param_attr=fluid.param_attr.ParamAttr( - initializer=fluid.initializer.Uniform(-stdv, - stdv))) - return out diff --git a/fluid/PaddleCV/video_classification/train.py b/fluid/PaddleCV/video_classification/train.py deleted file mode 100644 index e873cdb608ccfd83a8600e77b4837e2e52872549..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/train.py +++ /dev/null @@ -1,180 +0,0 @@ -import os -import numpy as np -import time -import sys -import paddle -import paddle.fluid as fluid -from resnet import TSN_ResNet -import reader - -import argparse -import functools -from paddle.fluid.framework import Parameter -from utility import add_arguments, print_arguments - -parser = argparse.ArgumentParser(description=__doc__) -add_arg = functools.partial(add_arguments, argparser=parser) -# yapf: disable -add_arg('batch_size', int, 128, "Minibatch size.") -add_arg('num_layers', int, 50, "How many layers for ResNet model.") -add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.") -add_arg('num_epochs', int, 60, "Number of epochs.") -add_arg('class_dim', int, 101, "Number of class.") -add_arg('seg_num', int, 7, "Number of segments.") -add_arg('image_shape', str, "3,224,224", "Input image size.") -add_arg('model_save_dir', str, "output", "Model save directory.") -add_arg('pretrained_model', str, None, "Whether to use pretrained model.") -add_arg('total_videos', int, 9537, "Training video number.") -add_arg('lr_init', float, 0.01, "Set initial learning rate.") -# yapf: enable - - -def train(args): - # parameters from arguments - seg_num = args.seg_num - class_dim = args.class_dim - num_layers = args.num_layers - num_epochs = args.num_epochs - batch_size = args.batch_size - pretrained_model = args.pretrained_model - model_save_dir = args.model_save_dir - - image_shape = [int(m) for m in args.image_shape.split(",")] - image_shape = [seg_num] + image_shape - - # model definition - model = TSN_ResNet(layers=num_layers, seg_num=seg_num) - - image = fluid.layers.data(name='image', shape=image_shape, dtype='float32') - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - - out = model.net(input=image, class_dim=class_dim) - cost = fluid.layers.cross_entropy(input=out, label=label) - - avg_cost = 
fluid.layers.mean(x=cost) - acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1) - acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5) - - # for test - inference_program = fluid.default_main_program().clone(for_test=True) - - # learning rate strategy - epoch_points = [num_epochs / 3, num_epochs * 2 / 3] - total_videos = args.total_videos - step = int(total_videos / batch_size + 1) - bd = [e * step for e in epoch_points] - - lr_init = args.lr_init - lr = [lr_init, lr_init / 10, lr_init / 100] - - # initialize optimizer - optimizer = fluid.optimizer.Momentum( - learning_rate=fluid.layers.piecewise_decay( - boundaries=bd, values=lr), - momentum=0.9, - regularization=fluid.regularizer.L2Decay(1e-4)) - - opts = optimizer.minimize(avg_cost) - if args.with_mem_opt: - fluid.memory_optimize(fluid.default_main_program()) - - place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - def is_parameter(var): - if isinstance(var, Parameter): - return isinstance(var, Parameter) and (not ("fc_0" in var.name)) - - if pretrained_model is not None: - vars = filter(is_parameter, inference_program.list_vars()) - fluid.io.load_vars(exe, pretrained_model, vars=vars) - - # reader - train_reader = paddle.batch(reader.train(seg_num), batch_size=batch_size, drop_last=True) - # test in single GPU - test_reader = paddle.batch(reader.test(seg_num), batch_size=batch_size / 16) - feeder = fluid.DataFeeder(place=place, feed_list=[image, label]) - - train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_cost.name) - - fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name] - - # train - for pass_id in range(num_epochs): - train_info = [[], [], []] - test_info = [[], [], []] - for batch_id, data in enumerate(train_reader()): - t1 = time.time() - loss, acc1, acc5 = train_exe.run(fetch_list, feed=feeder.feed(data)) - t2 = time.time() - period = t2 - t1 - loss = np.mean(np.array(loss)) - acc1 = np.mean(np.array(acc1)) - acc5 = np.mean(np.array(acc5)) - train_info[0].append(loss) - train_info[1].append(acc1) - train_info[2].append(acc5) - - if batch_id % 10 == 0: - print( - "[TRAIN] Pass: {0}\ttrainbatch: {1}\tloss: {2}\tacc1: {3}\tacc5: {4}\ttime: {5}" - .format(pass_id, batch_id, '%.6f' % loss, acc1, acc5, - "%2.2f sec" % period)) - sys.stdout.flush() - - train_loss = np.array(train_info[0]).mean() - train_acc1 = np.array(train_info[1]).mean() - train_acc5 = np.array(train_info[2]).mean() - - # test - cnt = 0 - for batch_id, data in enumerate(test_reader()): - t1 = time.time() - loss, acc1, acc5 = exe.run(inference_program, - fetch_list=fetch_list, - feed=feeder.feed(data)) - t2 = time.time() - period = t2 - t1 - loss = np.mean(loss) - acc1 = np.mean(acc1) - acc5 = np.mean(acc5) - test_info[0].append(loss * len(data)) - test_info[1].append(acc1 * len(data)) - test_info[2].append(acc5 * len(data)) - cnt += len(data) - if batch_id % 10 == 0: - print( - "[TEST] Pass: {0}\ttestbatch: {1}\tloss: {2}\tacc1: {3}\tacc5: {4}\ttime: {5}" - .format(pass_id, batch_id, '%.6f' % loss, acc1, acc5, - "%2.2f sec" % period)) - sys.stdout.flush() - - test_loss = np.sum(test_info[0]) / cnt - test_acc1 = np.sum(test_info[1]) / cnt - test_acc5 = np.sum(test_info[2]) / cnt - - print( - "+ End pass: {0}, train_loss: {1}, train_acc1: {2}, train_acc5: {3}" - .format(pass_id, '%.3f' % train_loss, '%.3f' % train_acc1, '%.3f' % - train_acc5)) - print("+ End pass: {0}, test_loss: {1}, test_acc1: {2}, test_acc5: {3}" - .format(pass_id, '%.3f' % test_loss, '%.3f' % test_acc1, '%.3f' 
%
- test_acc5))
- sys.stdout.flush()
-
- # save model
- model_path = os.path.join(model_save_dir, str(pass_id))
- if not os.path.isdir(model_path):
- os.makedirs(model_path)
- fluid.io.save_persistables(exe, model_path)
-
-
-def main():
- args = parser.parse_args()
- print_arguments(args)
- train(args)
-
-
-if __name__ == '__main__':
- main()
diff --git a/fluid/PaddleCV/video_classification/utility.py b/fluid/PaddleCV/video_classification/utility.py deleted file mode 100644 index 20b4141f7fb24b1617a1ef0f1d4a3c2536213b14..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/utility.py +++ /dev/null @@ -1,62 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
-"""Contains common utility functions."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import distutils.util
-import numpy as np
-import six
-from paddle.fluid import core
-
-
-def print_arguments(args):
- """Print argparse's arguments.
-
- Usage:
-
- .. code-block:: python
-
- parser = argparse.ArgumentParser()
- parser.add_argument("name", default="John", type=str, help="User name.")
- args = parser.parse_args()
- print_arguments(args)
-
- :param args: Input argparse.Namespace for printing.
- :type args: argparse.Namespace
- """
- print("----------- Configuration Arguments -----------")
- for arg, value in sorted(six.iteritems(vars(args))):
- print("%s: %s" % (arg, value))
- print("------------------------------------------------")
-
-
-def add_arguments(argname, type, default, help, argparser, **kwargs):
- """Add argparse's argument.
-
- Usage:
-
- .. code-block:: python
-
- parser = argparse.ArgumentParser()
- add_arguments("name", str, "John", "User name.", parser)
- args = parser.parse_args()
- """
- type = distutils.util.strtobool if type == bool else type
- argparser.add_argument(
- "--" + argname,
- default=default,
- type=type,
- help=help + ' Default: %(default)s.',
- **kwargs)
diff --git a/fluid/adversarial/README.md b/fluid/adversarial/README.md deleted file mode 100644 index 91661f7e1675d59c7d38c4c09bc67d5b9339573d..0000000000000000000000000000000000000000 --- a/fluid/adversarial/README.md +++ /dev/null @@ -1,112 +0,0 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
----
-
-# Advbox
-
-Advbox is a toolbox for generating adversarial examples that fool neural networks and for benchmarking the robustness of machine learning models.
-
-Advbox is based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) Fluid and is under continual development, always welcoming contributions of the latest adversarial attack and defense methods.
-
-
-## Overview
-[Szegedy et al.](https://arxiv.org/abs/1312.6199) were the first to discover an intriguing property of deep neural networks in the context of image classification: state-of-the-art deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations of images that remain (almost) imperceptible to the human visual system. Such perturbations are found by optimizing the input to maximize the prediction error, and the modified images are called `adversarial examples`. The profound implications of these results triggered wide interest among researchers in adversarial attacks, and in defenses against them, for deep learning in general.
-
-Advbox is similar to [Foolbox](https://github.com/bethgelab/foolbox) and [CleverHans](https://github.com/tensorflow/cleverhans). CleverHans only supports the TensorFlow framework, while Foolbox interfaces with many popular machine learning frameworks such as PyTorch, Keras, TensorFlow, Theano, Lasagne and MXNet. However, neither of these two great libraries supports PaddlePaddle, an easy-to-use, efficient, flexible and scalable deep learning platform originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
-
-## Usage
-Advbox provides many stable reference implementations of modern methods for generating adversarial examples, such as FGSM, DeepFool and JSMA. When you want to benchmark the robustness of your neural networks, you can use Advbox to generate adversarial examples and benchmark the networks. A condensed end-to-end sketch is given at the end of this section. Tips for using Advbox:
-
-1. Train a model and save its parameters.
-2. Load the trained parameters, then reconstruct the model.
-3. Use Advbox to generate adversarial examples.
-
-
-#### Dependencies
-* PaddlePaddle: [the latest develop branch](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)
-* Python 2.x
-
-#### Structure
-
-Network models, attack method implementations, and the criterion that defines adversarial examples are the three essential elements for generating adversarial examples. For brevity, Advbox adopts misclassification as the adversarial criterion.
-
-The structure of the Advbox module is as follows:
-
-    .
-    ├── advbox
-    |   ├── __init__.py
-    |   ├── attacks
-    |   |   ├── __init__.py
-    |   |   ├── base.py
-    |   |   ├── deepfool.py
-    |   |   ├── gradient_method.py
-    |   |   ├── lbfgs.py
-    |   |   └── saliency.py
-    |   ├── models
-    |   |   ├── __init__.py
-    |   |   ├── base.py
-    |   |   └── paddle.py
-    |   └── adversary.py
-    ├── tutorials
-    |   ├── __init__.py
-    |   ├── mnist_model.py
-    |   ├── mnist_tutorial_lbfgs.py
-    |   ├── mnist_tutorial_fgsm.py
-    |   ├── mnist_tutorial_bim.py
-    |   ├── mnist_tutorial_ilcm.py
-    |   ├── mnist_tutorial_mifgsm.py
-    |   ├── mnist_tutorial_jsma.py
-    |   └── mnist_tutorial_deepfool.py
-    └── README.md
-
-**advbox.attack**
-
-Advbox implements several popular adversarial attack methods, each of which searches for adversarial examples. Each attack method uses a distance measure (L1, L2, etc.) to quantify the size of the adversarial perturbation. Advbox makes it easy to craft adversarial examples, as some attack methods perform internal hyperparameter tuning to find the minimum perturbation.
-
-**advbox.model**
-
-Advbox implements an interface to PaddlePaddle. Additionally, interfaces to other deep learning frameworks such as TensorFlow can also be defined and employed. This module is used to compute predictions and gradients for given inputs in a specific framework.
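-
-A minimal end-to-end sketch of how these pieces fit together, condensed from the MNIST tutorials shipped in `tutorials/` (the network `mnist_cnn_model`, the `./mnist/` parameter directory, and the sample `image_data`/`true_label` pair are placeholders borrowed from those tutorials):
-
-    import paddle.fluid as fluid
-    from advbox.adversary import Adversary
-    from advbox.attacks.gradient_method import FGSM
-    from advbox.models.paddle import PaddleModel
-    from tutorials.mnist_model import mnist_cnn_model
-
-    # 1. Define the network to be attacked; gradients must flow to the input.
-    img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
-    img.stop_gradient = False
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    logits = mnist_cnn_model(img)
-    cost = fluid.layers.cross_entropy(input=logits, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-
-    # 2. Load the previously trained parameters.
-    exe = fluid.Executor(fluid.CPUPlace())
-    fluid.io.load_params(
-        exe, "./mnist/", main_program=fluid.default_main_program())
-
-    # 3. Wrap the program so attacks can query predictions and gradients,
-    #    then run a non-targeted FGSM attack on one (image, label) pair.
-    m = PaddleModel(fluid.default_main_program(), 'img', 'label',
-                    logits.name, avg_cost.name, (-1, 1), channel_axis=1)
-    attack = FGSM(m)
-    adversary = attack(Adversary(image_data, true_label), epsilons=0.1)
-    if adversary.is_successful():
-        print('adversarial label: %d' % adversary.adversarial_label)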
-
-**advbox.adversary**
-
-Adversary contains the original object, the target and the adversarial examples. It provides misclassification as the criterion for accepting an adversarial example.
-
-## Tutorials
-The `./tutorials/` folder provides some tutorials for generating adversarial examples on the MNIST dataset. You can slightly modify the code to apply it to other datasets. These attack methods are supported in Advbox:
-
-* [L-BFGS](https://arxiv.org/abs/1312.6199)
-* [FGSM](https://arxiv.org/abs/1412.6572)
-* [BIM](https://arxiv.org/abs/1607.02533)
-* [ILCM](https://arxiv.org/abs/1607.02533)
-* [MI-FGSM](https://arxiv.org/pdf/1710.06081.pdf)
-* [JSMA](https://arxiv.org/pdf/1511.07528)
-* [DeepFool](https://arxiv.org/abs/1511.04599)
-
-## Testing
-Benchmarks on a vanilla CNN model.
-
-> MNIST
-
-| adversarial attacks | fooling rate (non-targeted) | fooling rate (targeted) | max_epsilon | iterations | Strength |
-|:-----:| :----: | :---: | :----: | :----: | :----: |
-|L-BFGS| --- | 89.2% | --- | One shot | *** |
-|FGSM| 57.8% | 26.55% | 0.3 | One shot | *** |
-|BIM| 97.4% | --- | 0.1 | 100 | **** |
-|ILCM| --- | 100.0% | 0.1 | 100 | **** |
-|MI-FGSM| 94.4% | 100.0% | 0.1 | 100 | **** |
-|JSMA| 96.8% | 90.4% | 0.1 | 2000 | *** |
-|DeepFool| 97.7% | 51.3% | --- | 100 | **** |
-
-* The strength (more asterisks means stronger) is based on impressions from the reviewed literature.
-
----
-## References
-* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199), C. Szegedy et al., arXiv 2014
-* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572), I. Goodfellow et al., ICLR 2015
-* [Adversarial Examples in the Physical World](https://arxiv.org/pdf/1607.02533v3.pdf), A. Kurakin et al., ICLR workshop 2017
-* [Boosting Adversarial Attacks with Momentum](https://arxiv.org/abs/1710.06081), Yinpeng Dong et al., arXiv 2018
-* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528), N. Papernot et al., EuroS&P 2016
-* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/abs/1511.04599), S. Moosavi-Dezfooli et al., CVPR 2016
-* [Foolbox: A Python toolbox to benchmark the robustness of machine learning models](https://arxiv.org/abs/1707.04131), Jonas Rauber et al., arXiv 2018
-* [CleverHans: An adversarial example library for constructing attacks, building defenses, and benchmarking both](https://github.com/tensorflow/cleverhans#setting-up-cleverhans)
-* [Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey](https://arxiv.org/abs/1801.00553), Naveed Akhtar, Ajmal Mian, arXiv 2018
diff --git a/fluid/adversarial/advbox/__init__.py b/fluid/adversarial/advbox/__init__.py deleted file mode 100644 index e68b585ef98d12d147da43468aa0b4be667137b2..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/__init__.py +++ /dev/null @@ -1,3 +0,0 @@
-"""
-A set of tools for generating adversarial examples on the PaddlePaddle platform.
-"""
diff --git a/fluid/adversarial/advbox/adversary.py b/fluid/adversarial/advbox/adversary.py deleted file mode 100644 index 14b8517e336affc4752b53fa586f30f1ec5926be..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/adversary.py +++ /dev/null @@ -1,154 +0,0 @@
-"""
-Defines a class that contains the original object, the target and the
-adversarial example.
-"""
-
-
-class Adversary(object):
- """
- Adversary contains the original object, the target and the adversarial
- example.
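- Failed attempts are recorded as bad_adversarial_example, so the most
- recent unsuccessful perturbation can still be inspected.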
- """
-
- def __init__(self, original, original_label=None):
- """
- :param original: The original instance, such as an image.
- :param original_label: The original instance's label.
- """
- assert original is not None
-
- self.original_label = original_label
- self.target_label = None
- self.adversarial_label = None
-
- self.__original = original
- self.__target = None
- self.__is_targeted_attack = False
- self.__adversarial_example = None
- self.__bad_adversarial_example = None
-
- def set_target(self, is_targeted_attack, target=None, target_label=None):
- """
- Set whether the attack is targeted and, if so, its target.
-
- :param is_targeted_attack: bool
- :param target: The target.
- :param target_label: If is_targeted_attack is true and target_label is
- None, self.target_label will be set by the Attack class.
- If is_targeted_attack is false, target_label must be None.
- """
- assert (target_label is None) or is_targeted_attack
- self.__is_targeted_attack = is_targeted_attack
- self.target_label = target_label
- self.__target = target
- if not is_targeted_attack:
- self.target_label = None
- self.__target = None
-
- def set_original(self, original, original_label=None):
- """
- Reset the original.
-
- :param original: Original instance.
- :param original_label: Original instance's label.
- """
- # Identity comparison: elementwise '!=' on numpy arrays is ambiguous
- # in a boolean context.
- if original is not self.__original:
- self.__original = original
- self.original_label = original_label
- self.__adversarial_example = None
- self.__bad_adversarial_example = None
- if original is None:
- self.original_label = None
-
- def _is_successful(self, adversarial_label):
- """
- Check whether adversarial_label is the expected adversarial label.
-
- :param adversarial_label: adversarial label.
- :return: bool
- """
- if self.target_label is not None:
- return adversarial_label == self.target_label
- else:
- return (adversarial_label is not None) and \
- (adversarial_label != self.original_label)
-
- def is_successful(self):
- """
- Check whether an adversarial example has been found.
-
- :return: bool
- """
- return self._is_successful(self.adversarial_label)
-
- def try_accept_the_example(self, adversarial_example, adversarial_label):
- """
- If adversarial_label is the label we are looking for, the
- adversarial_example and adversarial_label are accepted and True is
- returned; otherwise the example is kept as bad_adversarial_example
- and False is returned.
-
- :return: bool
- """
- assert adversarial_example is not None
- assert self.__original.shape == adversarial_example.shape
-
- ok = self._is_successful(adversarial_label)
- if ok:
- self.__adversarial_example = adversarial_example
- self.adversarial_label = adversarial_label
- else:
- self.__bad_adversarial_example = adversarial_example
- return ok
-
- def perturbation(self, multiplying_factor=1.0):
- """
- The perturbation that was added to produce the adversarial example.
-
- :param multiplying_factor: float.
- :return: The perturbation that is multiplied by multiplying_factor.
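- Computed as multiplying_factor * (adversarial_example - original),
- falling back to bad_adversarial_example when no successful example
- has been accepted.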
- """
- assert self.__original is not None
- assert (self.__adversarial_example is not None) or \
- (self.__bad_adversarial_example is not None)
- if self.__adversarial_example is not None:
- return multiplying_factor * (
- self.__adversarial_example - self.__original)
- else:
- return multiplying_factor * (
- self.__bad_adversarial_example - self.__original)
-
- @property
- def is_targeted_attack(self):
- """
- :property: is_targeted_attack
- """
- return self.__is_targeted_attack
-
- @property
- def target(self):
- """
- :property: target
- """
- return self.__target
-
- @property
- def original(self):
- """
- :property: original
- """
- return self.__original
-
- @property
- def adversarial_example(self):
- """
- :property: adversarial_example
- """
- return self.__adversarial_example
-
- @property
- def bad_adversarial_example(self):
- """
- :property: bad_adversarial_example
- """
- return self.__bad_adversarial_example
diff --git a/fluid/adversarial/advbox/attacks/__init__.py b/fluid/adversarial/advbox/attacks/__init__.py deleted file mode 100644 index 3893b769f3ad62ada135b55d9367352532feb490..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/__init__.py +++ /dev/null @@ -1,3 +0,0 @@
-"""
-Attack methods __init__.py
-"""
diff --git a/fluid/adversarial/advbox/attacks/base.py b/fluid/adversarial/advbox/attacks/base.py deleted file mode 100644 index af2eae5e41ab2618602a2d82a5151363a35c2378..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/base.py +++ /dev/null @@ -1,74 +0,0 @@
-"""
-The base class of attack methods.
-"""
-import logging
-from abc import ABCMeta
-from abc import abstractmethod
-
-import numpy as np
-
-
-class Attack(object):
- """
- Abstract base class for adversarial attacks. `Attack` represents an
- adversarial attack which searches for an adversarial example. Subclasses
- should implement the _apply() method.
-
- Args:
- model(Model): an instance of the class advbox.models.base.Model.
-
- """
- __metaclass__ = ABCMeta
-
- def __init__(self, model):
- self.model = model
-
- def __call__(self, adversary, **kwargs):
- """
- Generate the adversarial sample.
-
- Args:
- adversary(object): The adversary object.
- **kwargs: Other named arguments.
- """
- self._preprocess(adversary)
- return self._apply(adversary, **kwargs)
-
- @abstractmethod
- def _apply(self, adversary, **kwargs):
- """
- Search for an adversarial example.
-
- Args:
- adversary(object): The adversary object.
- **kwargs: Other named arguments.
- """
- raise NotImplementedError
-
- def _preprocess(self, adversary):
- """
- Preprocess the adversary object.
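- If original_label is unset, it is filled in here from the argmax of
- the model's predict() on the original; for a targeted attack,
- target_label is likewise derived from the target when needed.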
- - :param adversary: adversary - :return: None - """ - assert self.model.channel_axis() == adversary.original.ndim - - if adversary.original_label is None: - adversary.original_label = np.argmax( - self.model.predict(adversary.original)) - if adversary.is_targeted_attack and adversary.target_label is None: - if adversary.target is None: - raise ValueError( - 'When adversary.is_targeted_attack is true, ' - 'adversary.target_label or adversary.target must be set.') - else: - adversary.target_label = np.argmax( - self.model.predict(adversary.target)) - - logging.info('adversary:' - '\n original_label: {}' - '\n target_label: {}' - '\n is_targeted_attack: {}' - ''.format(adversary.original_label, adversary.target_label, - adversary.is_targeted_attack)) diff --git a/fluid/adversarial/advbox/attacks/deepfool.py b/fluid/adversarial/advbox/attacks/deepfool.py deleted file mode 100644 index abf2292cf30ffedcb8b8056de7237d2e120e3485..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/deepfool.py +++ /dev/null @@ -1,84 +0,0 @@ -""" -This module provide the attack method for deepfool. Deepfool is a simple and -accurate adversarial attack. -""" -from __future__ import division - -import logging - -import numpy as np - -from .base import Attack - -__all__ = ['DeepFoolAttack'] - - -class DeepFoolAttack(Attack): - """ - DeepFool: a simple and accurate method to fool deep neural networks", - Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard, - https://arxiv.org/abs/1511.04599 - """ - - def _apply(self, adversary, iterations=100, overshoot=0.02): - """ - Apply the deep fool attack. - - Args: - adversary(Adversary): The Adversary object. - iterations(int): The iterations. - overshoot(float): We add (1+overshoot)*pert every iteration. - Return: - adversary: The Adversary object. - """ - assert adversary is not None - - pre_label = adversary.original_label - min_, max_ = self.model.bounds() - f = self.model.predict(adversary.original) - if adversary.is_targeted_attack: - labels = [adversary.target_label] - else: - max_class_count = 10 - class_count = self.model.num_classes() - if class_count > max_class_count: - labels = np.argsort(f)[-(max_class_count + 1):-1] - else: - labels = np.arange(class_count) - - gradient = self.model.gradient(adversary.original, pre_label) - x = adversary.original - for iteration in xrange(iterations): - w = np.inf - w_norm = np.inf - pert = np.inf - for k in labels: - if k == pre_label: - continue - gradient_k = self.model.gradient(x, k) - w_k = gradient_k - gradient - f_k = f[k] - f[pre_label] - w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8 - pert_k = (np.abs(f_k) + 1e-8) / w_k_norm - if pert_k < pert: - pert = pert_k - w = w_k - w_norm = w_k_norm - - r_i = -w * pert / w_norm # The gradient is -gradient in the paper. 
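- # r_i is the minimal step to cross the nearest linearized class
- # boundary; (1 + overshoot) pushes slightly past it so the label flips.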
- x = x + (1 + overshoot) * r_i
- x = np.clip(x, min_, max_)
-
- f = self.model.predict(x)
- gradient = self.model.gradient(x, pre_label)
- adv_label = np.argmax(f)
- logging.info('iteration={}, f[pre_label]={}, f[target_label]={}'
- ', f[adv_label]={}, pre_label={}, adv_label={}'
- ''.format(iteration, f[pre_label], (
- f[adversary.target_label]
- if adversary.is_targeted_attack else 'NaN'), f[
- adv_label], pre_label, adv_label))
- if adversary.try_accept_the_example(x, adv_label):
- return adversary
-
- return adversary
diff --git a/fluid/adversarial/advbox/attacks/gradient_method.py b/fluid/adversarial/advbox/attacks/gradient_method.py deleted file mode 100644 index 146b650c21464279f5527eb4a8bf44593e9dce29..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/gradient_method.py +++ /dev/null @@ -1,278 +0,0 @@
-"""
-This module provides implementations of iterative FGSM-style attack methods.
-"""
-from __future__ import division
-
-import logging
-from collections import Iterable
-
-import numpy as np
-
-from .base import Attack
-
-__all__ = [
- 'GradientMethodAttack', 'FastGradientSignMethodAttack', 'FGSM',
- 'FastGradientSignMethodTargetedAttack', 'FGSMT',
- 'BasicIterativeMethodAttack', 'BIM',
- 'IterativeLeastLikelyClassMethodAttack', 'ILCM', 'MomentumIteratorAttack',
- 'MIFGSM'
-]
-
-
-class GradientMethodAttack(Attack):
- """
- This class implements gradient-based attack methods, and is the base of
- FGSM, BIM, ILCM, etc.
- """
-
- def __init__(self, model, support_targeted=True):
- """
- :param model(model): The model to be attacked.
- :param support_targeted(bool): Whether this attack method supports
- targeted attacks.
- """
- super(GradientMethodAttack, self).__init__(model)
- self.support_targeted = support_targeted
-
- def _apply(self,
- adversary,
- norm_ord=np.inf,
- epsilons=0.01,
- steps=1,
- epsilon_steps=100):
- """
- Apply the gradient attack method.
- :param adversary(Adversary):
- The Adversary object.
- :param norm_ord(int):
- Order of the norm, such as np.inf, 1, 2, etc. It can't be 0.
- :param epsilons(list|tuple|int):
- Attack step size (input variation).
- Largest step size if epsilons is not iterable.
- :param steps:
- The number of attack iterations.
- :param epsilon_steps:
- The number of epsilon values sampled between 0 and the largest
- epsilon when epsilons is not iterable.
- :return:
- adversary(Adversary): The Adversary object.
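- For norm_ord=np.inf each step applies the FGSM-style update
- x <- clip(x + epsilon * sign(grad) * (max - min)); for other norms
- the gradient is normalized by its norm instead of taking the sign.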
- """ - if norm_ord == 0: - raise ValueError("L0 norm is not supported!") - - if not self.support_targeted: - if adversary.is_targeted_attack: - raise ValueError( - "This attack method doesn't support targeted attack!") - - if not isinstance(epsilons, Iterable): - epsilons = np.linspace(0, epsilons, num=epsilon_steps) - - pre_label = adversary.original_label - min_, max_ = self.model.bounds() - - assert self.model.channel_axis() == adversary.original.ndim - assert (self.model.channel_axis() == 1 or - self.model.channel_axis() == adversary.original.shape[0] or - self.model.channel_axis() == adversary.original.shape[-1]) - - for epsilon in epsilons[:]: - step = 1 - adv_img = adversary.original - if epsilon == 0.0: - continue - for i in range(steps): - if adversary.is_targeted_attack: - gradient = -self.model.gradient(adv_img, - adversary.target_label) - else: - gradient = self.model.gradient(adv_img, - adversary.original_label) - if norm_ord == np.inf: - gradient_norm = np.sign(gradient) - else: - gradient_norm = gradient / self._norm( - gradient, ord=norm_ord) - - adv_img = adv_img + epsilon * gradient_norm * (max_ - min_) - adv_img = np.clip(adv_img, min_, max_) - adv_label = np.argmax(self.model.predict(adv_img)) - logging.info('step={}, epsilon = {:.5f}, pre_label = {}, ' - 'adv_label={}'.format(step, epsilon, pre_label, - adv_label)) - if adversary.try_accept_the_example(adv_img, adv_label): - return adversary - step += 1 - return adversary - - @staticmethod - def _norm(a, ord): - if a.ndim == 1: - return np.linalg.norm(a, ord=ord) - if a.ndim == a.shape[0]: - norm_shape = (a.ndim, reduce(np.dot, a.shape[1:])) - norm_axis = 1 - else: - norm_shape = (reduce(np.dot, a.shape[:-1]), a.ndim) - norm_axis = 0 - return np.linalg.norm(a.reshape(norm_shape), ord=ord, axis=norm_axis) - - -class FastGradientSignMethodTargetedAttack(GradientMethodAttack): - """ - "Fast Gradient Sign Method" is extended to support targeted attack. - "Fast Gradient Sign Method" was originally implemented by Goodfellow et - al. (2015) with the infinity norm. - - Paper link: https://arxiv.org/abs/1412.6572 - """ - - def _apply(self, adversary, epsilons=0.01): - return GradientMethodAttack._apply( - self, - adversary=adversary, - norm_ord=np.inf, - epsilons=epsilons, - steps=1) - - -class FastGradientSignMethodAttack(FastGradientSignMethodTargetedAttack): - """ - This attack was originally implemented by Goodfellow et al. (2015) with the - infinity norm, and is known as the "Fast Gradient Sign Method". - - Paper link: https://arxiv.org/abs/1412.6572 - """ - - def __init__(self, model): - super(FastGradientSignMethodAttack, self).__init__(model, False) - - -class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack): - """ - "Iterative Least-likely Class Method (ILCM)" extends "BIM" to support - targeted attack. - "The Basic Iterative Method (BIM)" is to extend "FSGM". "BIM" iteratively - take multiple small steps while adjusting the direction after each step. - - Paper link: https://arxiv.org/abs/1607.02533 - """ - - def _apply(self, adversary, epsilons=0.01, steps=1000): - return GradientMethodAttack._apply( - self, - adversary=adversary, - norm_ord=np.inf, - epsilons=epsilons, - steps=steps) - - -class BasicIterativeMethodAttack(IterativeLeastLikelyClassMethodAttack): - """ - FGSM is a one-step method. "The Basic Iterative Method (BIM)" iteratively - take multiple small steps while adjusting the direction after each step. 
- Paper link: https://arxiv.org/abs/1607.02533 - """ - - def __init__(self, model): - super(BasicIterativeMethodAttack, self).__init__(model, False) - - -class MomentumIteratorAttack(GradientMethodAttack): - """ - The Momentum Iterative Fast Gradient Sign Method (Dong et al. 2017). - This method won the first places in NIPS 2017 Non-targeted Adversarial - Attacks and Targeted Adversarial Attacks. The original paper used - hard labels for this attack; no label smoothing. inf norm. - Paper link: https://arxiv.org/pdf/1710.06081.pdf - """ - - def __init__(self, model, support_targeted=True): - """ - :param model(model): The model to be attacked. - :param support_targeted(bool): Does this attack method support targeted. - """ - super(MomentumIteratorAttack, self).__init__(model) - self.support_targeted = support_targeted - - def _apply(self, - adversary, - norm_ord=np.inf, - epsilons=0.1, - steps=100, - epsilon_steps=100, - decay_factor=1): - """ - Apply the momentum iterative gradient attack method. - :param adversary(Adversary): - The Adversary object. - :param norm_ord(int): - Order of the norm, such as np.inf, 1, 2, etc. It can't be 0. - :param epsilons(list|tuple|float): - Attack step size (input variation). - Largest step size if epsilons is not iterable. - :param epsilon_steps: - The number of Epsilons' iteration for each attack iteration. - :param steps: - The number of attack iteration. - :param decay_factor: - The decay factor for the momentum term. - :return: - adversary(Adversary): The Adversary object. - """ - if norm_ord == 0: - raise ValueError("L0 norm is not supported!") - - if not self.support_targeted: - if adversary.is_targeted_attack: - raise ValueError( - "This attack method doesn't support targeted attack!") - - assert self.model.channel_axis() == adversary.original.ndim - assert (self.model.channel_axis() == 1 or - self.model.channel_axis() == adversary.original.shape[0] or - self.model.channel_axis() == adversary.original.shape[-1]) - - if not isinstance(epsilons, Iterable): - epsilons = np.linspace(0, epsilons, num=epsilon_steps) - - min_, max_ = self.model.bounds() - pre_label = adversary.original_label - - for epsilon in epsilons[:]: - if epsilon == 0.0: - continue - step = 1 - adv_img = adversary.original - momentum = 0 - for i in range(steps): - if adversary.is_targeted_attack: - gradient = -self.model.gradient(adv_img, - adversary.target_label) - else: - gradient = self.model.gradient(adv_img, pre_label) - - # normalize gradient - velocity = gradient / self._norm(gradient, ord=1) - momentum = decay_factor * momentum + velocity - if norm_ord == np.inf: - normalized_grad = np.sign(momentum) - else: - normalized_grad = self._norm(momentum, ord=norm_ord) - perturbation = epsilon * normalized_grad - adv_img = adv_img + perturbation - adv_img = np.clip(adv_img, min_, max_) - adv_label = np.argmax(self.model.predict(adv_img)) - logging.info( - 'step={}, epsilon = {:.5f}, pre_label = {}, adv_label={}' - .format(step, epsilon, pre_label, adv_label)) - if adversary.try_accept_the_example(adv_img, adv_label): - return adversary - step += 1 - - return adversary - - -FGSM = FastGradientSignMethodAttack -FGSMT = FastGradientSignMethodTargetedAttack -BIM = BasicIterativeMethodAttack -ILCM = IterativeLeastLikelyClassMethodAttack -MIFGSM = MomentumIteratorAttack diff --git a/fluid/adversarial/advbox/attacks/lbfgs.py b/fluid/adversarial/advbox/attacks/lbfgs.py deleted file mode 100644 index b427df1d9770c25b4ad68609dffc890f8c232e36..0000000000000000000000000000000000000000 --- 
a/fluid/adversarial/advbox/attacks/lbfgs.py +++ /dev/null @@ -1,138 +0,0 @@ -""" -This module provide the attack method of "LBFGS". -""" -from __future__ import division - -import logging - -import numpy as np -from scipy.optimize import fmin_l_bfgs_b - -from .base import Attack - -__all__ = ['LBFGSAttack', 'LBFGS'] - - -class LBFGSAttack(Attack): - """ - Uses L-BFGS-B to minimize the cross-entropy and the distance between the - original and the adversary. - - Paper link: https://arxiv.org/abs/1510.05328 - """ - - def __init__(self, model): - super(LBFGSAttack, self).__init__(model) - self._predicts_normalized = None - self._adversary = None # type: Adversary - - def _apply(self, adversary, epsilon=0.001, steps=10): - self._adversary = adversary - - if not adversary.is_targeted_attack: - raise ValueError("This attack method only support targeted attack!") - - # finding initial c - logging.info('finding initial c...') - c = epsilon - x0 = adversary.original.flatten() - for i in range(30): - c = 2 * c - logging.info('c={}'.format(c)) - is_adversary = self._lbfgsb(x0, c, steps) - if is_adversary: - break - if not is_adversary: - logging.info('Failed!') - return adversary - - # binary search c - logging.info('binary search c...') - c_low = 0 - c_high = c - while c_high - c_low >= epsilon: - logging.info('c_high={}, c_low={}, diff={}, epsilon={}' - .format(c_high, c_low, c_high - c_low, epsilon)) - c_half = (c_low + c_high) / 2 - is_adversary = self._lbfgsb(x0, c_half, steps) - if is_adversary: - c_high = c_half - else: - c_low = c_half - - return adversary - - def _is_predicts_normalized(self, predicts): - """ - To determine the predicts is normalized. - :param predicts(np.array): the output of the model. - :return: bool - """ - if self._predicts_normalized is None: - if self.model.predict_name().lower() in [ - 'softmax', 'probabilities', 'probs' - ]: - self._predicts_normalized = True - else: - if np.any(predicts < 0.0): - self._predicts_normalized = False - else: - s = np.sum(predicts.flatten()) - if 0.999 <= s <= 1.001: - self._predicts_normalized = True - else: - self._predicts_normalized = False - assert self._predicts_normalized is not None - return self._predicts_normalized - - def _loss(self, adv_x, c): - """ - To get the loss and gradient. 
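- The objective being minimized is c * cross_entropy(adv_x) plus the
- normalized L2 distance between adv_x and the original input
- (Szegedy et al., 2014).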
- :param adv_x: the candidate adversarial example - :param c: parameter 'C' in the paper - :return: (loss, gradient) - """ - x = adv_x.reshape(self._adversary.original.shape) - - # cross_entropy - logits = self.model.predict(x) - if not self._is_predicts_normalized(logits): # to softmax - e = np.exp(logits) - logits = e / np.sum(e) - e = np.exp(logits) - s = np.sum(e) - ce = np.log(s) - logits[self._adversary.target_label] - - # L2 distance - min_, max_ = self.model.bounds() - d = np.sum((x - self._adversary.original).flatten() ** 2) \ - / ((max_ - min_) ** 2) / len(adv_x) - - # gradient - gradient = self.model.gradient(x, self._adversary.target_label) - - result = (c * ce + d).astype(float), gradient.flatten().astype(float) - return result - - def _lbfgsb(self, x0, c, maxiter): - min_, max_ = self.model.bounds() - bounds = [(min_, max_)] * len(x0) - approx_grad_eps = (max_ - min_) / 100.0 - x, f, d = fmin_l_bfgs_b( - self._loss, - x0, - args=(c, ), - bounds=bounds, - maxiter=maxiter, - epsilon=approx_grad_eps) - if np.amax(x) > max_ or np.amin(x) < min_: - x = np.clip(x, min_, max_) - shape = self._adversary.original.shape - adv_label = np.argmax(self.model.predict(x.reshape(shape))) - logging.info('pre_label = {}, adv_label={}'.format( - self._adversary.target_label, adv_label)) - return self._adversary.try_accept_the_example( - x.reshape(shape), adv_label) - - -LBFGS = LBFGSAttack diff --git a/fluid/adversarial/advbox/attacks/saliency.py b/fluid/adversarial/advbox/attacks/saliency.py deleted file mode 100644 index 3179f0ffe626c63424063645690f131702c3650c..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/saliency.py +++ /dev/null @@ -1,146 +0,0 @@ -""" -This module provide the attack method for JSMA's implement. -""" -from __future__ import division - -import logging -import random -import numpy as np - -from .base import Attack - - -class SaliencyMapAttack(Attack): - """ - Implements the Saliency Map Attack. - The Jacobian-based Saliency Map Approach (Papernot et al. 2016). - Paper link: https://arxiv.org/pdf/1511.07528.pdf - """ - - def _apply(self, - adversary, - max_iter=2000, - fast=True, - theta=0.1, - max_perturbations_per_pixel=7): - """ - Apply the JSMA attack. - Args: - adversary(Adversary): The Adversary object. - max_iter(int): The max iterations. - fast(bool): Whether evaluate the pixel influence on sum of residual classes. - theta(float): Perturbation per pixel relative to [min, max] range. - max_perturbations_per_pixel(int): The max count of perturbation per pixel. - Return: - adversary: The Adversary object. 
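- Each iteration perturbs the single pixel with the highest saliency
- by theta * (max - min); a pixel leaves the search domain once it
- hits a bound or has been changed max_perturbations_per_pixel times.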
- """ - assert adversary is not None - - if not adversary.is_targeted_attack or (adversary.target_label is None): - target_labels = self._generate_random_target( - adversary.original_label) - else: - target_labels = [adversary.target_label] - - for target in target_labels: - original_image = adversary.original - - # the mask defines the search domain - # each modified pixel with border value is set to zero in mask - mask = np.ones_like(original_image) - - # count tracks how often each pixel was changed - counts = np.zeros_like(original_image) - - labels = range(self.model.num_classes()) - adv_img = original_image.copy() - min_, max_ = self.model.bounds() - - for step in range(max_iter): - adv_img = np.clip(adv_img, min_, max_) - adv_label = np.argmax(self.model.predict(adv_img)) - if adversary.try_accept_the_example(adv_img, adv_label): - return adversary - - # stop if mask is all zero - if not any(mask.flatten()): - return adversary - - logging.info('step = {}, original_label = {}, adv_label={}'. - format(step, adversary.original_label, adv_label)) - - # get pixel location with highest influence on class - idx, p_sign = self._saliency_map( - adv_img, target, labels, mask, fast=fast) - - # apply perturbation - adv_img[idx] += -p_sign * theta * (max_ - min_) - - # tracks number of updates for each pixel - counts[idx] += 1 - - # remove pixel from search domain if it hits the bound - if adv_img[idx] <= min_ or adv_img[idx] >= max_: - mask[idx] = 0 - - # remove pixel if it was changed too often - if counts[idx] >= max_perturbations_per_pixel: - mask[idx] = 0 - - adv_img = np.clip(adv_img, min_, max_) - - def _generate_random_target(self, original_label): - """ - Draw random target labels all of which are different and not the original label. - Args: - original_label(int): Original label. - Return: - target_labels(list): random target labels - """ - num_random_target = 1 - num_classes = self.model.num_classes() - assert num_random_target <= num_classes - 1 - - target_labels = random.sample(range(num_classes), num_random_target + 1) - target_labels = [t for t in target_labels if t != original_label] - target_labels = target_labels[:num_random_target] - - return target_labels - - def _saliency_map(self, image, target, labels, mask, fast=False): - """ - Get pixel location with highest influence on class. - Args: - image(numpy.ndarray): Image with shape (height, width, channels). - target(int): The target label. - labels(int): The number of classes of the output label. - mask(list): Each modified pixel with border value is set to zero in mask. - fast(bool): Whether evaluate the pixel influence on sum of residual classes. - Return: - idx: The index of optimal pixel. - pix_sign: The direction of perturbation - """ - # pixel influence on target class - alphas = self.model.gradient(image, target) * mask - - # pixel influence on sum of residual classes(don't evaluate if fast == True) - if fast: - betas = -np.ones_like(alphas) - else: - betas = np.sum([ - self.model.gradient(image, label) * mask - alphas - for label in labels - ], 0) - - # compute saliency map (take into account both pos. & neg. 
perturbations) - sal_map = np.abs(alphas) * np.abs(betas) * np.sign(alphas * betas) - - # find optimal pixel & direction of perturbation - idx = np.argmin(sal_map) - idx = np.unravel_index(idx, mask.shape) - pix_sign = np.sign(alphas)[idx] - - return idx, pix_sign - - -JSMA = SaliencyMapAttack diff --git a/fluid/adversarial/advbox/models/__init__.py b/fluid/adversarial/advbox/models/__init__.py deleted file mode 100644 index de6d2a9feeb4a3ffc3b8bfb11e87f600a6951487..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/models/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -""" -Models __init__.py -""" \ No newline at end of file diff --git a/fluid/adversarial/advbox/models/base.py b/fluid/adversarial/advbox/models/base.py deleted file mode 100644 index f25d4e305d4772b1b2876beef670823a393b7089..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/models/base.py +++ /dev/null @@ -1,116 +0,0 @@ -""" -The base model of the model. -""" -from abc import ABCMeta -from abc import abstractmethod - -import numpy as np - - -class Model(object): - """ - Base class of model to provide attack. - - Args: - bounds(tuple): The lower and upper bound for the image pixel. - channel_axis(int): The index of the axis that represents the color - channel. - preprocess(tuple): Two element tuple used to preprocess the input. - First substract the first element, then divide the second element. - """ - __metaclass__ = ABCMeta - - def __init__(self, bounds, channel_axis, preprocess=None): - assert len(bounds) == 2 - assert channel_axis in [0, 1, 2, 3] - - self._bounds = bounds - self._channel_axis = channel_axis - - # Make self._preprocess to be (0,1) if possible, so that don't need - # to do substract or divide. - if preprocess is not None: - sub, div = np.array(preprocess) - if not np.any(sub): - sub = 0 - if np.all(div == 1): - div = 1 - assert (div is None) or np.all(div) - self._preprocess = (sub, div) - else: - self._preprocess = (0, 1) - - def bounds(self): - """ - Return the upper and lower bounds of the model. - """ - return self._bounds - - def channel_axis(self): - """ - Return the channel axis of the model. - """ - return self._channel_axis - - def _process_input(self, input_): - res = None - sub, div = self._preprocess - if np.any(sub != 0): - res = input_ - sub - if not np.all(sub == 1): - if res is None: # "res = input_ - sub" is not executed! - res = input_ / div - else: - res /= div - if res is None: # "res = (input_ - sub)/ div" is not executed! - return input_ - return res - - @abstractmethod - def predict(self, data): - """ - Calculate the prediction of the data. - - Args: - data(numpy.ndarray): input data with shape (size, - height, width, channels). - - Return: - numpy.ndarray: predictions of the data with shape (batch_size, - num_of_classes). - """ - raise NotImplementedError - - @abstractmethod - def num_classes(self): - """ - Determine the number of the classes - - Return: - int: the number of the classes - """ - raise NotImplementedError - - @abstractmethod - def gradient(self, data, label): - """ - Calculate the gradient of the cross-entropy loss w.r.t the image. - - Args: - data(numpy.ndarray): input data with shape (size, height, width, - channels). - label(int): Label used to calculate the gradient. - - Return: - numpy.ndarray: gradient of the cross-entropy loss w.r.t the image - with the shape (height, width, channel). - """ - raise NotImplementedError - - @abstractmethod - def predict_name(self): - """ - Get the predict name, such as "softmax",etc. 
- :return: string - """ - raise NotImplementedError diff --git a/fluid/adversarial/advbox/models/paddle.py b/fluid/adversarial/advbox/models/paddle.py deleted file mode 100644 index 73439d2a4e616899dca6c1a017e1f75b4fb1971f..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/models/paddle.py +++ /dev/null @@ -1,123 +0,0 @@ -""" -Paddle model -""" -from __future__ import absolute_import - -import numpy as np -import paddle.fluid as fluid - -from .base import Model - - -class PaddleModel(Model): - """ - Create a PaddleModel instance. - When you need to generate a adversarial sample, you should construct an - instance of PaddleModel. - - Args: - program(paddle.fluid.framework.Program): The program of the model - which generate the adversarial sample. - input_name(string): The name of the input. - logits_name(string): The name of the logits. - predict_name(string): The name of the predict. - cost_name(string): The name of the loss in the program. - """ - - def __init__(self, - program, - input_name, - logits_name, - predict_name, - cost_name, - bounds, - channel_axis=3, - preprocess=None): - if preprocess is None: - preprocess = (0, 1) - - super(PaddleModel, self).__init__( - bounds=bounds, channel_axis=channel_axis, preprocess=preprocess) - - self._program = program - self._place = fluid.CPUPlace() - self._exe = fluid.Executor(self._place) - - self._input_name = input_name - self._logits_name = logits_name - self._predict_name = predict_name - self._cost_name = cost_name - - # gradient - loss = self._program.block(0).var(self._cost_name) - param_grads = fluid.backward.append_backward( - loss, parameter_list=[self._input_name]) - self._gradient = filter(lambda p: p[0].name == self._input_name, - param_grads)[0][1] - - def predict(self, data): - """ - Calculate the prediction of the data. - - Args: - data(numpy.ndarray): input data with shape (size, - height, width, channels). - - Return: - numpy.ndarray: predictions of the data with shape (batch_size, - num_of_classes). - """ - scaled_data = self._process_input(data) - feeder = fluid.DataFeeder( - feed_list=[self._input_name, self._logits_name], - place=self._place, - program=self._program) - predict_var = self._program.block(0).var(self._predict_name) - predict = self._exe.run(self._program, - feed=feeder.feed([(scaled_data, 0)]), - fetch_list=[predict_var]) - predict = np.squeeze(predict, axis=0) - return predict - - def num_classes(self): - """ - Calculate the number of classes of the output label. - - Return: - int: the number of classes - """ - predict_var = self._program.block(0).var(self._predict_name) - assert len(predict_var.shape) == 2 - return predict_var.shape[1] - - def gradient(self, data, label): - """ - Calculate the gradient of the cross-entropy loss w.r.t the image. - - Args: - data(numpy.ndarray): input data with shape (size, height, width, - channels). - label(int): Label used to calculate the gradient. - - Return: - numpy.ndarray: gradient of the cross-entropy loss w.r.t the image - with the shape (height, width, channel). - """ - scaled_data = self._process_input(data) - - feeder = fluid.DataFeeder( - feed_list=[self._input_name, self._logits_name], - place=self._place, - program=self._program) - - grad, = self._exe.run(self._program, - feed=feeder.feed([(scaled_data, label)]), - fetch_list=[self._gradient]) - return grad.reshape(data.shape) - - def predict_name(self): - """ - Get the predict name, such as "softmax",etc. 
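- (In PaddleModel this is the type of the operator that produced the
- predict variable, e.g. "softmax".)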
- :return: string - """ - return self._program.block(0).var(self._predict_name).op.type diff --git a/fluid/adversarial/tutorials/__init__.py b/fluid/adversarial/tutorials/__init__.py deleted file mode 100644 index 822d1f6f037ec1f3e4e41498172ebcf67342e3e0..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -""" - A set of tutorials for generating adversarial examples with advbox. -""" \ No newline at end of file diff --git a/fluid/adversarial/tutorials/mnist_model.py b/fluid/adversarial/tutorials/mnist_model.py deleted file mode 100644 index b1ebb0f88752df4c18ddd4ad96725f636bf261fc..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_model.py +++ /dev/null @@ -1,96 +0,0 @@ -""" -CNN on mnist data using fluid api of paddlepaddle -""" -import paddle -import paddle.fluid as fluid - - -def mnist_cnn_model(img): - """ - Mnist cnn model - - Args: - img(Varaible): the input image to be recognized - - Returns: - Variable: the label prediction - """ - conv_pool_1 = fluid.nets.simple_img_conv_pool( - input=img, - num_filters=20, - filter_size=5, - pool_size=2, - pool_stride=2, - act='relu') - - conv_pool_2 = fluid.nets.simple_img_conv_pool( - input=conv_pool_1, - num_filters=50, - filter_size=5, - pool_size=2, - pool_stride=2, - act='relu') - fc = fluid.layers.fc(input=conv_pool_2, size=50, act='relu') - - logits = fluid.layers.fc(input=fc, size=10, act='softmax') - return logits - - -def main(): - """ - Train the cnn model on mnist datasets - """ - img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32') - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - optimizer = fluid.optimizer.Adam(learning_rate=0.01) - optimizer.minimize(avg_cost) - - batch_size = fluid.layers.create_tensor(dtype='int64') - batch_acc = fluid.layers.accuracy( - input=logits, label=label, total=batch_size) - - BATCH_SIZE = 50 - PASS_NUM = 3 - ACC_THRESHOLD = 0.98 - LOSS_THRESHOLD = 10.0 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=500), - batch_size=BATCH_SIZE) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - feeder = fluid.DataFeeder(feed_list=[img, label], place=place) - exe.run(fluid.default_startup_program()) - - pass_acc = fluid.average.WeightedAverage() - for pass_id in range(PASS_NUM): - pass_acc.reset() - for data in train_reader(): - loss, acc, b_size = exe.run( - fluid.default_main_program(), - feed=feeder.feed(data), - fetch_list=[avg_cost, batch_acc, batch_size]) - pass_acc.add(value=acc, weight=b_size) - pass_acc_val = pass_acc.eval()[0] - print("pass_id=" + str(pass_id) + " acc=" + str(acc[0]) + - " pass_acc=" + str(pass_acc_val)) - if loss < LOSS_THRESHOLD and pass_acc_val > ACC_THRESHOLD: - # early stop - break - - print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc.eval()[ - 0])) - fluid.io.save_params( - exe, dirname='./mnist', main_program=fluid.default_main_program()) - print('train mnist done') - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_bim.py b/fluid/adversarial/tutorials/mnist_tutorial_bim.py deleted file mode 100644 index 0524b908ea9ed028cf03aa9621c08fb8ef0cfc79..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_bim.py +++ 
/dev/null @@ -1,127 +0,0 @@ -""" -BIM tutorial on mnist using advbox tool. -BIM method iteratively take multiple small steps while adjusting the direction after each step. -It only supports non-targeted attack. -""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.gradient_method import BIM -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. - """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = BIM(m) - attack_config = {"epsilons": 0.1, "steps": 100} - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # BIM non-targeted attack - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # BIM non-targeted attack - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("bim attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py 
b/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py deleted file mode 100644 index 74ab5e8040022f4df96dd97fc77da9dc920d8f2b..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py +++ /dev/null @@ -1,137 +0,0 @@ -""" -DeepFool tutorial on mnist using advbox tool. -Deepfool is a simple and accurate adversarial attack method. -It supports both targeted attack and non-targeted attack. -""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.deepfool import DeepFoolAttack -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. - """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = DeepFoolAttack(m) - attack_config = {"iterations": 100, "overshoot": 9} - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # DeepFool non-targeted attack - adversary = attack(adversary, **attack_config) - - # DeepFool targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # DeepFool non-targeted attack - adversary = attack(adversary, **attack_config) - - # DeepFool targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % 
(data[0][1], adversary.adversarial_label, total_count))
-            # plt.imshow(adversary.target, cmap='Greys_r')
-            # plt.show()
-            # np.save('adv_img', adversary.target)
-        else:
-            print('attack failed, original_label=%d, count=%d' %
-                  (data[0][1], total_count))
-
-        if total_count >= TOTAL_NUM:
-            print(
-                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
-                % (fooling_count, total_count,
-                   float(fooling_count) / total_count))
-            break
-    print("deepfool attack done")
-
-
-if __name__ == '__main__':
-    main()
diff --git a/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py b/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
deleted file mode 100644
index 178fc146dd636dce8fa2f82552a996dca239c55a..0000000000000000000000000000000000000000
--- a/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
+++ /dev/null
@@ -1,139 +0,0 @@
-"""
-FGSM tutorial on mnist using advbox tool.
-FGSM is a non-targeted attack, while FGSMT is its targeted variant.
-"""
-import sys
-sys.path.append("..")
-
-import matplotlib.pyplot as plt
-import numpy as np
-import paddle.fluid as fluid
-import paddle
-
-from advbox.adversary import Adversary
-from advbox.attacks.gradient_method import FGSM
-from advbox.attacks.gradient_method import FGSMT
-from advbox.models.paddle import PaddleModel
-from tutorials.mnist_model import mnist_cnn_model
-
-
-def main():
-    """
-    Advbox demo which demonstrates how to use advbox.
-    """
-    TOTAL_NUM = 500
-    IMG_NAME = 'img'
-    LABEL_NAME = 'label'
-
-    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
-    # gradient should flow
-    img.stop_gradient = False
-    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
-    logits = mnist_cnn_model(img)
-    cost = fluid.layers.cross_entropy(input=logits, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-
-    # use CPU
-    place = fluid.CPUPlace()
-    # use GPU
-    # place = fluid.CUDAPlace(0)
-    exe = fluid.Executor(place)
-
-    BATCH_SIZE = 1
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=128 * 10),
-        batch_size=BATCH_SIZE)
-
-    test_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.test(), buf_size=128 * 10),
-        batch_size=BATCH_SIZE)
-
-    fluid.io.load_params(
-        exe, "./mnist/", main_program=fluid.default_main_program())
-
-    # advbox demo
-    m = PaddleModel(
-        fluid.default_main_program(),
-        IMG_NAME,
-        LABEL_NAME,
-        logits.name,
-        avg_cost.name, (-1, 1),
-        channel_axis=1)
-    attack = FGSM(m)
-    # attack = FGSMT(m)
-    attack_config = {"epsilons": 0.3}
-
-    # use train data to generate adversarial examples
-    total_count = 0
-    fooling_count = 0
-    for data in train_reader():
-        total_count += 1
-        adversary = Adversary(data[0][0], data[0][1])
-
-        # FGSM non-targeted attack
-        adversary = attack(adversary, **attack_config)
-
-        # FGSMT targeted attack
-        # tlabel = 0
-        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
-        # adversary = attack(adversary, **attack_config)
-
-        if adversary.is_successful():
-            fooling_count += 1
-            print(
-                'attack success, original_label=%d, adversarial_label=%d, count=%d'
-                % (data[0][1], adversary.adversarial_label, total_count))
-            # plt.imshow(adversary.target, cmap='Greys_r')
-            # plt.show()
-            # np.save('adv_img', adversary.target)
-        else:
-            print('attack failed, original_label=%d, count=%d' %
-                  (data[0][1], total_count))
-
-        if total_count >= TOTAL_NUM:
-            print(
-                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
-                % (fooling_count, total_count,
-                   float(fooling_count) / total_count))
-            break
-
-    # use 
test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # FGSM non-targeted attack - adversary = attack(adversary, **attack_config) - - # FGSMT targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("fgsm attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py b/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py deleted file mode 100644 index b12ffaab0367769d9bf9d58ec7396c8edd2487e9..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py +++ /dev/null @@ -1,130 +0,0 @@ -""" -ILCM tutorial on mnist using advbox tool. -ILCM method extends "BIM" to support targeted attack. -""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.gradient_method import ILCM -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. 
- """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = ILCM(m) - attack_config = {"epsilons": 0.1, "steps": 100} - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - tlabel = 0 - adversary.set_target(is_targeted_attack=True, target_label=tlabel) - - # ILCM targeted attack - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - tlabel = 0 - adversary.set_target(is_targeted_attack=True, target_label=tlabel) - - # ILCM targeted attack - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("ilcm attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_jsma.py b/fluid/adversarial/tutorials/mnist_tutorial_jsma.py deleted file mode 100644 index 98829ec33afa1abc7646ac9297ec82c3de9b9eff..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_jsma.py +++ /dev/null @@ -1,142 +0,0 @@ -""" -JSMA tutorial on mnist using advbox tool. -JSMA method supports both targeted attack and non-targeted attack. 
-""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.saliency import JSMA -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. - """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = JSMA(m) - attack_config = { - "max_iter": 2000, - "theta": 0.1, - "max_perturbations_per_pixel": 7 - } - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # JSMA non-targeted attack - adversary = attack(adversary, **attack_config) - - # JSMA targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - # JSMA may return None - if adversary is not None and adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # JSMA non-targeted attack - adversary = attack(adversary, **attack_config) - - # JSMA targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - # JSMA may return None - if adversary is not None and adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, 
total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("jsma attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py b/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py deleted file mode 100644 index ba120d9d151573878372e394d2a03d93efccb4e9..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py +++ /dev/null @@ -1,130 +0,0 @@ -""" -LBFGS tutorial on mnist using advbox tool. -LBFGS method only supports targeted attack. -""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.lbfgs import LBFGS -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. - """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = LBFGS(m) - attack_config = {"epsilon": 0.001, } - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # LBFGS targeted attack - tlabel = 0 - adversary.set_target(is_targeted_attack=True, target_label=tlabel) - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # LBFGS targeted attack - tlabel = 0 - adversary.set_target(is_targeted_attack=True, target_label=tlabel) - adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # 
plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("lbfgs attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py b/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py deleted file mode 100644 index 8fc84db8f673c8da8eebf8b9d96f41a8712146c8..0000000000000000000000000000000000000000 --- a/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py +++ /dev/null @@ -1,143 +0,0 @@ -""" -MIFGSM tutorial on mnist using advbox tool. -MIFGSM is a broad class of momentum iterative gradient-based methods based on FSGM. -It supports non-targeted attack and targeted attack. -""" -import sys -sys.path.append("..") - -import matplotlib.pyplot as plt -import numpy as np -import paddle.fluid as fluid -import paddle - -from advbox.adversary import Adversary -from advbox.attacks.gradient_method import MIFGSM -from advbox.models.paddle import PaddleModel -from tutorials.mnist_model import mnist_cnn_model - - -def main(): - """ - Advbox demo which demonstrate how to use advbox. - """ - TOTAL_NUM = 500 - IMG_NAME = 'img' - LABEL_NAME = 'label' - - img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32') - # gradient should flow - img.stop_gradient = False - label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64') - logits = mnist_cnn_model(img) - cost = fluid.layers.cross_entropy(input=logits, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # use CPU - place = fluid.CPUPlace() - # use GPU - # place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - BATCH_SIZE = 1 - train_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.train(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - test_reader = paddle.batch( - paddle.reader.shuffle( - paddle.dataset.mnist.test(), buf_size=128 * 10), - batch_size=BATCH_SIZE) - - fluid.io.load_params( - exe, "./mnist/", main_program=fluid.default_main_program()) - - # advbox demo - m = PaddleModel( - fluid.default_main_program(), - IMG_NAME, - LABEL_NAME, - logits.name, - avg_cost.name, (-1, 1), - channel_axis=1) - attack = MIFGSM(m) - attack_config = { - "norm_ord": np.inf, - "epsilons": 0.1, - "steps": 100, - "decay_factor": 1 - } - - # use train data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in train_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # MIFGSM non-targeted attack - adversary = attack(adversary, **attack_config) - - # MIFGSM targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - 
break - - # use test data to generate adversarial examples - total_count = 0 - fooling_count = 0 - for data in test_reader(): - total_count += 1 - adversary = Adversary(data[0][0], data[0][1]) - - # MIFGSM non-targeted attack - adversary = attack(adversary, **attack_config) - - # MIFGSM targeted attack - # tlabel = 0 - # adversary.set_target(is_targeted_attack=True, target_label=tlabel) - # adversary = attack(adversary, **attack_config) - - if adversary.is_successful(): - fooling_count += 1 - print( - 'attack success, original_label=%d, adversarial_label=%d, count=%d' - % (data[0][1], adversary.adversarial_label, total_count)) - # plt.imshow(adversary.target, cmap='Greys_r') - # plt.show() - # np.save('adv_img', adversary.target) - else: - print('attack failed, original_label=%d, count=%d' % - (data[0][1], total_count)) - - if total_count >= TOTAL_NUM: - print( - "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" - % (fooling_count, total_count, - float(fooling_count) / total_count)) - break - print("mifgsm attack done") - - -if __name__ == '__main__': - main() diff --git a/fluid/mnist/.run_ce.sh b/fluid/mnist/.run_ce.sh deleted file mode 100755 index d6ccf429b52da1ff26ac02df5af287461a823a98..0000000000000000000000000000000000000000 --- a/fluid/mnist/.run_ce.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/bin/bash - -# This file is only used for continuous evaluation. - -rm -rf *_factor.txt -model_file='model.py' -python $model_file --batch_size 128 --pass_num 5 --device CPU | python _ce.py diff --git a/fluid/mnist/_ce.py b/fluid/mnist/_ce.py deleted file mode 100644 index 9c2dba53526d2e976252fce05c7ff7f0f44b39b2..0000000000000000000000000000000000000000 --- a/fluid/mnist/_ce.py +++ /dev/null @@ -1,61 +0,0 @@ -# this file is only used for continuous evaluation test! - -import os -import sys -sys.path.append(os.environ['ceroot']) -from kpi import CostKpi, DurationKpi, AccKpi - -# NOTE kpi.py should shared in models in some way!!!! - -train_cost_kpi = CostKpi('train_cost', 0.02, actived=True) -test_acc_kpi = AccKpi('test_acc', 0.005, actived=True) -train_duration_kpi = DurationKpi('train_duration', 0.06, actived=True) -train_acc_kpi = AccKpi('train_acc', 0.005, actived=True) - -tracking_kpis = [ - train_acc_kpi, - train_cost_kpi, - test_acc_kpi, - train_duration_kpi, -] - - -def parse_log(log): - ''' - This method should be implemented by model developers. 
-
-    The suggestion:
-
-    each line in the log should be a tab-separated record of the form
-    "kpis<TAB>name<TAB>value", for example:
-
-    "
-    kpis\ttrain_cost\t1.0
-    kpis\ttest_acc\t0.9
-    kpis\ttrain_acc\t0.8
-    "
-    '''
-    for line in log.split('\n'):
-        fs = line.strip().split('\t')
-        print(fs)
-        if len(fs) == 3 and fs[0] == 'kpis':
-            kpi_name = fs[1]
-            kpi_value = float(fs[2])
-            yield kpi_name, kpi_value
-
-
-def log_to_ce(log):
-    kpi_tracker = {}
-    for kpi in tracking_kpis:
-        kpi_tracker[kpi.name] = kpi
-
-    for (kpi_name, kpi_value) in parse_log(log):
-        print(kpi_name, kpi_value)
-        kpi_tracker[kpi_name].add_record(kpi_value)
-        kpi_tracker[kpi_name].persist()
-
-
-if __name__ == '__main__':
-    log = sys.stdin.read()
-    log_to_ce(log)
diff --git a/fluid/mnist/model.py b/fluid/mnist/model.py
deleted file mode 100644
index a66353c2239fd78eb1fdf9f08690994a9a7d1c08..0000000000000000000000000000000000000000
--- a/fluid/mnist/model.py
+++ /dev/null
@@ -1,198 +0,0 @@
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import argparse
-import cProfile
-import time
-
-import paddle
-import paddle.fluid as fluid
-import paddle.fluid.profiler as profiler
-import six
-
-SEED = 90
-DTYPE = "float32"
-
-# The random seed must be set before configuring the network.
-fluid.default_startup_program().random_seed = SEED
-
-
-def parse_args():
-    parser = argparse.ArgumentParser("mnist model benchmark.")
-    parser.add_argument(
-        '--batch_size', type=int, default=128, help='The minibatch size.')
-    parser.add_argument(
-        '--iterations', type=int, default=35, help='The number of minibatches.')
-    parser.add_argument(
-        '--pass_num', type=int, default=5, help='The number of passes.')
-    parser.add_argument(
-        '--device',
-        type=str,
-        default='GPU',
-        choices=['CPU', 'GPU'],
-        help='The device type.')
-    parser.add_argument(
-        '--infer_only', action='store_true', help='If set, run forward only.')
-    parser.add_argument(
-        '--use_cprof', action='store_true', help='If set, use cProfile.')
-    parser.add_argument(
-        '--use_nvprof',
-        action='store_true',
-        help='If set, use nvprof for CUDA.')
-    args = parser.parse_args()
-    return args
-
-
-def print_arguments(args):
-    vars(args)['use_nvprof'] = (vars(args)['use_nvprof'] and
-                                vars(args)['device'] == 'GPU')
-    print('----------- Configuration Arguments -----------')
-    for arg, value in sorted(six.iteritems(vars(args))):
-        print('%s: %s' % (arg, value))
-    print('------------------------------------------------')
-
-
-def cnn_model(data):
-    conv_pool_1 = fluid.nets.simple_img_conv_pool(
-        input=data,
-        filter_size=5,
-        num_filters=20,
-        pool_size=2,
-        pool_stride=2,
-        act="relu")
-    conv_pool_2 = fluid.nets.simple_img_conv_pool(
-        input=conv_pool_1,
-        filter_size=5,
-        num_filters=50,
-        pool_size=2,
-        pool_stride=2,
-        act="relu")
-
-    # TODO(dzhwinter): refine the initializer and random seed setting
-    SIZE = 10
-    input_shape = conv_pool_2.shape
-    param_shape = [six.moves.reduce(lambda a, b: a * b, input_shape[1:], 1)
-                   ] + [SIZE]
-    scale = (2.0 / (param_shape[0]**2 * SIZE))**0.5
-
-    predict = fluid.layers.fc(
-        input=conv_pool_2,
-        size=SIZE,
-        act="softmax",
-        param_attr=fluid.param_attr.ParamAttr(
-            initializer=fluid.initializer.NormalInitializer(
-                loc=0.0, scale=scale)))
-    return predict
-
-
-def eval_test(exe, batch_acc, batch_size_tensor, inference_program):
-    test_reader = paddle.batch(
-        paddle.dataset.mnist.test(), batch_size=args.batch_size)
-    test_pass_acc = fluid.average.WeightedAverage()
-    for batch_id, data in enumerate(test_reader()):
img_data = np.array( - [x[0].reshape([1, 28, 28]) for x in data]).astype(DTYPE) - y_data = np.array([x[1] for x in data]).astype("int64") - y_data = y_data.reshape([len(y_data), 1]) - - acc, weight = exe.run(inference_program, - feed={"pixel": img_data, - "label": y_data}, - fetch_list=[batch_acc, batch_size_tensor]) - test_pass_acc.add(value=acc, weight=weight) - pass_acc = test_pass_acc.eval() - return pass_acc - - -def run_benchmark(model, args): - if args.use_cprof: - pr = cProfile.Profile() - pr.enable() - start_time = time.time() - # Input data - images = fluid.layers.data(name='pixel', shape=[1, 28, 28], dtype=DTYPE) - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - - # Train program - predict = model(images) - cost = fluid.layers.cross_entropy(input=predict, label=label) - avg_cost = fluid.layers.mean(x=cost) - - # Evaluator - batch_size_tensor = fluid.layers.create_tensor(dtype='int64') - batch_acc = fluid.layers.accuracy( - input=predict, label=label, total=batch_size_tensor) - - # inference program - inference_program = fluid.default_main_program().clone(for_test=True) - - # Optimization - opt = fluid.optimizer.AdamOptimizer( - learning_rate=0.001, beta1=0.9, beta2=0.999) - opt.minimize(avg_cost) - - fluid.memory_optimize(fluid.default_main_program()) - - # Initialize executor - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - - # Parameter initialization - exe.run(fluid.default_startup_program()) - - # Reader - train_reader = paddle.batch( - paddle.dataset.mnist.train(), batch_size=args.batch_size) - - accuracy = fluid.average.WeightedAverage() - for pass_id in range(args.pass_num): - accuracy.reset() - pass_start = time.time() - every_pass_loss = [] - for batch_id, data in enumerate(train_reader()): - img_data = np.array( - [x[0].reshape([1, 28, 28]) for x in data]).astype(DTYPE) - y_data = np.array([x[1] for x in data]).astype("int64") - y_data = y_data.reshape([len(y_data), 1]) - - start = time.time() - loss, acc, weight = exe.run( - fluid.default_main_program(), - feed={"pixel": img_data, - "label": y_data}, - fetch_list=[avg_cost, batch_acc, batch_size_tensor] - ) # The accuracy is the accumulation of batches, but not the current batch. - end = time.time() - accuracy.add(value=acc, weight=weight) - every_pass_loss.append(loss) - print("Pass = %d, Iter = %d, Loss = %f, Accuracy = %f" % - (pass_id, batch_id, loss, acc)) - - pass_end = time.time() - - train_avg_acc = accuracy.eval() - train_avg_loss = np.mean(every_pass_loss) - test_avg_acc = eval_test(exe, batch_acc, batch_size_tensor, - inference_program) - - print( - "pass=%d, train_avg_acc=%f,train_avg_loss=%f, test_avg_acc=%f, elapse=%f" - % (pass_id, train_avg_acc, train_avg_loss, test_avg_acc, - (pass_end - pass_start))) - #Note: The following logs are special for CE monitoring. - #Other situations do not need to care about these logs. 
-        print("kpis\ttrain_acc\t%f" % train_avg_acc)
-        print("kpis\ttrain_cost\t%f" % train_avg_loss)
-        print("kpis\ttest_acc\t%f" % test_avg_acc)
-        print("kpis\ttrain_duration\t%f" % (pass_end - pass_start))
-
-
-if __name__ == '__main__':
-    args = parse_args()
-    print_arguments(args)
-    if args.use_nvprof and args.device == 'GPU':
-        with profiler.cuda_profiler("cuda_profiler.txt", 'csv') as nvprof:
-            run_benchmark(cnn_model, args)
-    else:
-        run_benchmark(cnn_model, args)
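The `kpis`-prefixed lines printed above are what `.run_ce.sh` pipes into `_ce.py`, whose `parse_log` keeps only tab-separated lines of the form `kpis<TAB>name<TAB>value`. A quick, hypothetical sanity check of that contract (it assumes `parse_log` from `_ce.py` above is in scope; note that importing `_ce` itself requires the CE environment variables):

```python
# Toy input for parse_log (illustrative log text, not real training output).
sample_log = "\n".join([
    "Pass = 4, Iter = 34, Loss = 0.05, Accuracy = 0.98",  # ignored: no kpis prefix
    "kpis\ttrain_acc\t0.9812",
    "kpis\ttrain_cost\t0.0487",
    "kpis\ttest_acc\t0.9790",
])
for kpi_name, kpi_value in parse_log(sample_log):
    print(kpi_name, kpi_value)
# besides parse_log's own debug prints, this yields:
#   train_acc 0.9812, train_cost 0.0487, test_acc 0.979
```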
diff --git a/fluid/policy_gradient/README.md b/fluid/policy_gradient/README.md
deleted file mode 100644
index b813aa124466597adfb80261bee7c2de22b95e67..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/README.md
+++ /dev/null
@@ -1,171 +0,0 @@
-运行本目录下的程序示例需要使用PaddlePaddle的最新develop分支。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
-
----
-
-# Policy Gradient RL by PaddlePaddle
-本文介绍了如何使用PaddlePaddle,通过policy-based的强化学习方法来训练一个player(actor model),我们希望这个player可以完成简单的走阶梯任务。
-
-内容分为:
-
-- 任务描述
-- 模型
-- 策略(目标函数)
-- 算法(Gradient ascent)
-- PaddlePaddle实现
-
-
-## 1. 任务描述
-假设有一个阶梯,连接A、B点,player从A点出发,每一步只能向前走一步或向后走一步,到达B点即为完成任务。我们希望训练一个聪明的player,它知道怎么最快地从A点到达B点。
-我们在命令行以下边的形式模拟任务:
-```
-A - O - - - - - B
-```
-一个‘-’代表一个阶梯,A点在行头,B点在行末,O代表player当前所在的位置。
-
-## 2. Policy Gradient
-### 2.1 模型
-#### input layer
-模型的输入是player观察到的当前阶梯的状态$S$,要包含阶梯的长度和player当前的位置信息。
-在命令行模拟的情况下,player的位置和阶梯长度两个变量足以表示当前的状态,但是为了便于将这个demo推广到更复杂的任务场景,我们这里用一个向量来表示游戏状态$S$。
-向量$S$的长度为阶梯的长度,每一维代表一个阶梯,player所在的位置为1,其它位置为0。
-下边是一个例子:
-```
-S = [0, 1, 0, 0] // 阶梯长度为4,player在第二个阶梯上。
-```
-#### hidden layer
-隐藏层采用两个全连接layer `FC_1`和`FC_2`,其中`FC_1`的size为10,`FC_2`的size为2。
-
-#### output layer
-我们使用softmax将`FC_2`的output映射为所有可能的动作(前进或后退)的概率分布(Probability of taking the action),即为一个二维向量`act_probs`,其中,`act_probs[0]`为后退的概率,`act_probs[1]`为前进的概率。
-
-#### 模型表示
-我们将player模型(actor)形式化表示如下:
-$$a = \pi_\theta(s)$$
-其中$\theta$表示模型的参数,$s$是输入状态。
-
-
-### 2.2 策略(目标函数)
-我们怎么评估一个player(模型)的好坏呢?首先我们定义几个术语:
-我们让$\pi_\theta(s)$来玩一局游戏,$s_t$表示第$t$时刻的状态,$a_t$表示在状态$s_t$做出的动作,$r_t$表示做过动作$a_t$后得到的奖赏。
-一局游戏的过程可以表示如下:
-$$\tau = [s_1, a_1, r_1, s_2, a_2, r_2 ... s_T, a_T, r_T] \tag{1}$$
-
-一局游戏的奖励表示如下:
-$$R(\tau) = \sum_{t=1}^T r_t$$
-
-player玩一局游戏,可能会出现多种操作序列$\tau$,某个$\tau$出现的概率依赖于player model的参数$\theta$,记做:
-$$P(\tau | \theta)$$
-那么,给定一个$\theta$(player model),玩一局游戏,期望得到的奖励是:
-$$\overline{R}_\theta = \sum_\tau R(\tau) P(\tau|\theta)$$
-大多数情况下,我们无法穷举出所有的$\tau$,所以我们就抽取$N$个$\tau$来计算近似的期望:
-$$\overline{R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^N R(\tau^n)$$
-
-$\overline{R}_\theta$就是我们需要的目标函数,它表示了一个参数为$\theta$的player玩一局游戏得分的期望,这个期望越大,代表这个player能力越强。
-### 2.3 算法(Gradient ascent)
-我们的目标函数是$\overline{R}_\theta$,我们训练的任务就是:
-$$\theta^* = \arg\max_\theta \overline{R}_\theta$$
-
-为了找到理想的$\theta$,我们使用Gradient ascent方法不断在$\overline{R}_\theta$的梯度方向更新$\theta$,可表示如下:
-$$\theta' = \theta + \eta * \bigtriangledown \overline{R}_\theta$$
-
-$$\bigtriangledown \overline{R}_\theta = \sum_\tau R(\tau) \bigtriangledown P(\tau|\theta)\\
-= \sum_\tau R(\tau) P(\tau|\theta) \frac{\bigtriangledown P(\tau|\theta)}{P(\tau|\theta)} \\
-= \sum_\tau R(\tau) P(\tau|\theta) \bigtriangledown \log P(\tau|\theta)$$
-
-$$P(\tau|\theta) = P(s_1)P(a_1|s_1,\theta)P(s_2, r_1|s_1,a_1)P(a_2|s_2,\theta)P(s_3,r_2|s_2,a_2)...P(a_T|s_T,\theta)P(s_{T+1}, r_T|s_T,a_T)\\
-= P(s_1) \prod_{t=1}^T P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)$$
-
-$$\log P(\tau|\theta) = \log P(s_1) + \sum_{t=1}^T [\log P(a_t|s_t,\theta) + \log P(s_{t+1}, r_t|s_t,a_t)]$$
-
-$$\bigtriangledown \log P(\tau|\theta) = \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)$$
-
-$$\bigtriangledown \overline{R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \bigtriangledown \log P(\tau|\theta) \\
-\approx \frac{1}{N} \sum_{n=1}^N R(\tau^n) \bigtriangledown \log P(\tau^n|\theta) \\
-= \frac{1}{N} \sum_{n=1}^N R(\tau^n) \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta) \\
-= \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) \bigtriangledown \log P(a_t|s_t,\theta) \tag{11}$$
-
-#### 2.3.2 导数解释
-
-在使用深度学习框架进行训练求解时,一般用梯度下降方法,所以我们把Gradient ascent转为Gradient descent,将上面的最优化目标和更新规则重写为:
-
-$$\theta^* = \arg\min_\theta (-\overline{R}_\theta) \tag{13}$$
-$$\theta' = \theta - \eta * \bigtriangledown (-\overline{R}_\theta) \tag{14}$$
-
-根据上一节的推导,$-\bigtriangledown \overline{R}_\theta$结果如下:
-
-$$-\bigtriangledown \overline{R}_\theta = \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) \bigtriangledown [-\log P(a_t|s_t,\theta)] \tag{15}$$
-
-根据等式(14),我们的player模型可以设计为:
-
-（图 1:前向网络以状态 $s_t$ 为输入,经全连接层和softmax输出各动作概率 $y_t = P(a_t|s_t,\theta)$,cross entropy给出 $-\log P(a_t|s_t,\theta)$）
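下面用 numpy 给出图1前向计算的一个最小数值示意(权重为随机初始化,仅用于说明,并非本目录下的实现):

```python
import numpy as np

s_t = np.array([0., 1., 0., 0.])  # 阶梯长度为4,player在第二个阶梯上(见2.1节)

rng = np.random.RandomState(0)
W1, b1 = rng.randn(4, 10), np.zeros(10)  # FC_1,size=10,tanh
W2, b2 = rng.randn(10, 2), np.zeros(2)   # FC_2,size=2

h = np.tanh(s_t.dot(W1) + b1)
logits = h.dot(W2) + b2
act_probs = np.exp(logits) / np.exp(logits).sum()  # softmax:后退/前进的概率
a_t = 1                                  # 假设本步选择"前进"
cost = -np.log(act_probs[a_t])           # 即等式(15)中的 -log P(a_t|s_t, θ)
print(act_probs, cost)
```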
-用户在一局游戏中的一次操作可以用元组$(s_t, a_t)$表示,即在状态$s_t$下做了动作$a_t$。我们通过图1中的前向网络计算出来的cross entropy cost为$-\log P(a_t|s_t,\theta)$,恰好是等式(15)中我们需要微分的一项。
-图1是我们需要的player模型,用这个网络的前向计算可以预测任何状态下该做什么动作。但是怎么去训练学习这个网络呢?在等式(15)中还有一项$R(\tau^n)$,做反向梯度传播的时候要乘上这一项,所以我们需要在图1基础上再加上$R(\tau^n)$,如图2所示:
-（图 2:在图1的前向网络之后,用Mul将cross entropy与$R(\tau^n)$相乘,得到$-R(\tau^n)\log P(a_t|s_t,\theta)$作为最终的loss）
-
-图2就是我们最终的网络结构。
-
-#### 2.3.3 直观理解
-对于等式(15),我们只看游戏中的一步操作,也就是这一项:$R(\tau^n) \bigtriangledown [-\log P(a_t|s_t,\theta)]$。我们可以简单地认为训练的目的是让$R(\tau^n)[-\log P(a_t|s_t,\theta)]$尽可能小,也就是让$R(\tau^n) \log P(a_t|s_t,\theta)$尽可能大。
-
-- 如果我们当前游戏局的奖励$R(\tau^n)$为正,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能大。
-- 如果我们当前游戏局的奖励$R(\tau^n)$为负,那么我们希望当前操作出现的概率$P(a_t|s_t,\theta)$尽可能小。
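用一个极简的数值例子体会上面两条规则(数值仅为示意):

```python
import numpy as np

p = 0.8  # 当前操作的概率 P(a_t|s_t, θ)
for R in (1.0, -1.0):        # 一局游戏的奖励为正 / 为负
    loss = R * (-np.log(p))  # 等式(15)中单步的 R(τ^n)·[-log P(a_t|s_t, θ)]
    print(R, round(loss, 4))
# R=+1 时 loss≈0.2231,梯度下降会让 p 继续增大;
# R=-1 时 loss≈-0.2231,最小化它反而会让 p 减小。
```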
-#### 2.3.4 一个问题
-
-一人犯错,株连九族;一人得道,鸡犬升天。如果一局游戏得到奖励,我们希望帮助获得奖励的每一次操作都被重视;否则,导致惩罚的操作都要被冷落一次。
-是不是很有道理的样子?但是,如果有些游戏场景只有奖励,没有惩罚,怎么办?也就是所有的$R(\tau^n)$都为正。
-针对不同的游戏场景,我们有不同的解决方案:
-
-1. 每局游戏得分不一样:将每局的得分减去一个bias,结果就有正有负了。
-2. 每局游戏得分一样:把完成一局的时间作为计分因素,并减去一个bias。
-
-我们在第一章描述的游戏场景,需要用第二种方案:player每次到达终点都会收到1分的奖励,我们可以按完成任务所用的步数来定义奖励R。
-更进一步,我们认为一局游戏中每步动作对结局的贡献是不同的,有聪明的动作,也有愚蠢的操作。直观地理解,一般靠前的动作是愚蠢的,靠后的动作是聪明的。既然有了这个价值观,那么我们拿到1分的奖励,就不能平均分给每个动作了。
-如图3所示,让所有动作按先后排队,从后往前衰减地给每个动作奖励,然后每个动作的奖励再减去所有动作奖励的平均值:
-
-（图 3:按时间从后往前,以衰减系数0.9为每个动作分配奖励 $a_{t-k} = 0.9^k \cdot R$,再减去所有动作奖励的均值）
-
-## 3. 训练效果
-
-demo运行训练效果如下,经过1000轮尝试,我们的player就学会了如何有效地完成任务:
-
-```
----------O epoch: 0; steps: 42
----------O epoch: 1; steps: 77
----------O epoch: 2; steps: 82
----------O epoch: 3; steps: 64
----------O epoch: 4; steps: 79
----------O epoch: 501; steps: 19
----------O epoch: 1001; steps: 9
----------O epoch: 1501; steps: 9
----------O epoch: 2001; steps: 11
----------O epoch: 2501; steps: 9
----------O epoch: 3001; steps: 9
----------O epoch: 3002; steps: 9
----------O epoch: 3003; steps: 9
----------O epoch: 3004; steps: 9
----------O epoch: 3005; steps: 9
----------O epoch: 3006; steps: 9
----------O epoch: 3007; steps: 9
----------O epoch: 3008; steps: 9
----------O epoch: 3009; steps: 9
----------O epoch: 3010; steps: 11
----------O epoch: 3011; steps: 9
----------O epoch: 3012; steps: 9
----------O epoch: 3013; steps: 9
----------O epoch: 3014; steps: 9
-```
diff --git a/fluid/policy_gradient/brain.py b/fluid/policy_gradient/brain.py
deleted file mode 100644
index 27a2da28563e5063213100d34c1b88d5fe2f91b0..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/brain.py
+++ /dev/null
@@ -1,94 +0,0 @@
-import numpy as np
-import paddle.fluid as fluid
-# reproducible
-np.random.seed(1)
-
-
-class PolicyGradient:
-    def __init__(
-            self,
-            n_actions,
-            n_features,
-            learning_rate=0.01,
-            reward_decay=0.95,
-            output_graph=False, ):
-        self.n_actions = n_actions
-        self.n_features = n_features
-        self.lr = learning_rate
-        self.gamma = reward_decay
-
-        self.ep_obs, self.ep_as, self.ep_rs = [], [], []
-
-        self.place = fluid.CPUPlace()
-        self.exe = fluid.Executor(self.place)
-
-    def build_net(self):
-
-        obs = fluid.layers.data(
-            name='obs', shape=[self.n_features], dtype='float32')
-        acts = fluid.layers.data(name='acts', shape=[1], dtype='int64')
-        vt = fluid.layers.data(name='vt', shape=[1], dtype='float32')
-        # fc1
-        fc1 = fluid.layers.fc(input=obs, size=10, act="tanh")  # tanh activation
-        # fc2: kept on self so that choose_action can fetch it later
-        self.all_act_prob = fluid.layers.fc(input=fc1,
-                                            size=self.n_actions,
-                                            act="softmax")
-        self.inference_program = fluid.default_main_program().clone()
-        # to maximize total reward (log_p * R) is to minimize -(log_p * R)
-        neg_log_prob = fluid.layers.cross_entropy(
-            input=self.all_act_prob,
-            label=acts)  # this is negative log of chosen action
-        neg_log_prob_weight = fluid.layers.elementwise_mul(x=neg_log_prob, y=vt)
-        loss = fluid.layers.reduce_mean(
-            neg_log_prob_weight)  # reward guided loss
-
-        sgd_optimizer = fluid.optimizer.SGD(self.lr)
-        sgd_optimizer.minimize(loss)
-        self.exe.run(fluid.default_startup_program())
-
-    def choose_action(self, observation):
-        prob_weights = self.exe.run(self.inference_program,
-                                    feed={"obs": observation[np.newaxis, :]},
-                                    fetch_list=[self.all_act_prob])
-        prob_weights = np.array(prob_weights[0])
-        # select action w.r.t the actions prob
-        action = np.random.choice(
-            range(prob_weights.shape[1]), p=prob_weights.ravel())
-        return action
-
-    def store_transition(self, s, a, r):
-        self.ep_obs.append(s)
-        self.ep_as.append(a)
-        self.ep_rs.append(r)
-
-    def learn(self):
-        # discount and normalize episode reward
-        discounted_ep_rs_norm = self._discount_and_norm_rewards()
-        tensor_obs = np.vstack(self.ep_obs).astype("float32")
-        tensor_as = np.array(self.ep_as).astype("int64")
-        tensor_as = tensor_as.reshape([tensor_as.shape[0], 1])
-        tensor_vt = discounted_ep_rs_norm.astype("float32")[:, np.newaxis]
-        # train on episode
-        self.exe.run(
-            fluid.default_main_program(),
-            feed={
-                "obs": tensor_obs,  # shape=[None, n_obs]
-                "acts": tensor_as,  # shape=[None, 1]
-                "vt": tensor_vt  # shape=[None, 1]
-            })
-        self.ep_obs, self.ep_as, self.ep_rs = [], [], []  # empty episode data
-        return discounted_ep_rs_norm
-
-    def _discount_and_norm_rewards(self):
-        # discount episode rewards
-        discounted_ep_rs = np.zeros_like(self.ep_rs)
-        running_add = 0
-        for t in reversed(range(0, len(self.ep_rs))):
-            running_add = running_add * self.gamma + self.ep_rs[t]
-            discounted_ep_rs[t] = running_add
-
-        # normalize episode rewards
-        discounted_ep_rs -= np.mean(discounted_ep_rs)
-        discounted_ep_rs /= np.std(discounted_ep_rs)
-        return discounted_ep_rs
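A worked example of `_discount_and_norm_rewards` above, under the assumption of `gamma = 0.95` and a three-step episode that only pays 1.0 at the end (numbers rounded):

```python
import numpy as np

ep_rs, gamma = [0.0, 0.0, 1.0], 0.95
discounted = np.zeros(len(ep_rs))
running_add = 0.0
for t in reversed(range(len(ep_rs))):  # same backward recurrence as above
    running_add = running_add * gamma + ep_rs[t]
    discounted[t] = running_add
print(discounted)       # [0.9025 0.95   1.    ]
normed = (discounted - discounted.mean()) / discounted.std()
print(normed.round(3))  # [-1.214 -0.021  1.235]
```

Later actions receive larger normalized weights `vt`, which matches the intuition in section 2.3.4 of the README above.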
diff --git a/fluid/policy_gradient/env.py b/fluid/policy_gradient/env.py
deleted file mode 100644
index e2cd972dbc9a3943aceb9763b9dabcd50a1e6df1..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/env.py
+++ /dev/null
@@ -1,56 +0,0 @@
-import time
-import sys
-import numpy as np
-
-
-class Env():
-    def __init__(self, stage_len, interval):
-        self.stage_len = stage_len
-        self.end = self.stage_len - 1
-        self.position = 0
-        self.interval = interval
-        self.step = 0
-        self.epoch = -1
-        self.render = False
-
-    def reset(self):
-        self.end = self.stage_len - 1
-        self.position = 0
-        self.epoch += 1
-        self.step = 0
-        if self.render:
-            self.draw(True)
-
-    def status(self):
-        s = np.zeros([self.stage_len]).astype("float32")
-        s[self.position] = 1
-        return s
-
-    def move(self, action):
-        self.step += 1
-        reward = 0.0
-        done = False
-        if action == 0:
-            self.position = max(0, self.position - 1)
-        else:
-            self.position = min(self.end, self.position + 1)
-        if self.render:
-            self.draw()
-        if self.position == self.end:
-            reward = 1.0
-            done = True
-        return reward, done, self.status()
-
-    def draw(self, new_line=False):
-        if new_line:
-            sys.stdout.write("\n")
-        else:
-            sys.stdout.write("\r")
-        for i in range(self.stage_len):
-            if i == self.position:
-                sys.stdout.write("O")
-            else:
-                sys.stdout.write("-")
-        sys.stdout.write(" epoch: %d; steps: %d" % (self.epoch, self.step))
-        sys.stdout.flush()
-        time.sleep(self.interval)
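A minimal interaction sketch for `Env` above (assuming `env.py` is on the import path; the hand-written agent here simply always steps forward):

```python
from env import Env

e = Env(stage_len=10, interval=0.0)  # interval=0.0: don't sleep between frames
e.reset()
done, reward = False, 0.0
while not done:
    reward, done, state = e.move(1)  # action 1 = step towards B
print(e.step, reward)                # 9 steps from A to B, final reward 1.0
```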
diff --git a/fluid/policy_gradient/images/PG_1.svg b/fluid/policy_gradient/images/PG_1.svg
deleted file mode 100644
index e2352ff57ceb70bdba013c55c35eb1dc1cabe275..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/images/PG_1.svg
+++ /dev/null
@@ -1,3 +0,0 @@
-（矢量图源码省略。图1:前向网络,softmax输出 y_t = P(a_t|s_t,θ),cross entropy 计算 -log P(a_t|s_t,θ)）
diff --git a/fluid/policy_gradient/images/PG_2.svg b/fluid/policy_gradient/images/PG_2.svg
deleted file mode 100644
index 3697bf9feca0861c9c0b2da29980ba4c86a3f4d7..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/images/PG_2.svg
+++ /dev/null
@@ -1,3 +0,0 @@
-（矢量图源码省略。图2:在前向网络之后用 Mul 将 cross entropy 与 R(τ^n) 相乘,得到 -R(τ^n)log P(a_t|s_t,θ)）
diff --git a/fluid/policy_gradient/images/PG_3.svg b/fluid/policy_gradient/images/PG_3.svg
deleted file mode 100644
index 97b56c3fe1188e603a3bf5f6eabf7ea0ea3072c7..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/images/PG_3.svg
+++ /dev/null
@@ -1,3 +0,0 @@
-（矢量图源码省略。图3:按时间从后往前以0.9为衰减系数分配奖励,再减去均值 mean(a_1, a_2 ... a_t)）
diff --git a/fluid/policy_gradient/run.py b/fluid/policy_gradient/run.py
deleted file mode 100644
index 6f2f8c381a9d6452c5d7dfefb41f05eb4551d73a..0000000000000000000000000000000000000000
--- a/fluid/policy_gradient/run.py
+++ /dev/null
@@ -1,29 +0,0 @@
-from brain import PolicyGradient
-from env import Env
-import numpy as np
-
-n_actions = 2
-interval = 0.01
-stage_len = 10
-epochs = 10000
-
-if __name__ == "__main__":
-
-    brain = PolicyGradient(n_actions, stage_len)
-    e = Env(stage_len, interval)
-    brain.build_net()
-    done = False
-
-    for epoch in range(epochs):
-        if (epoch % 500 == 1) or epoch < 5 or epoch > 3000:
-            e.render = True
-        else:
-            e.render = False
-        e.reset()
-        while not done:
-            s = e.status()
-            action = brain.choose_action(s)
-            r, done, _ = e.move(action)
-            brain.store_transition(s, action, r)
-        done = False
-        brain.learn()
diff --git a/legacy/README.cn.md b/legacy/README.cn.md
deleted file mode 100644
index 72fb35ff3b239d8fa5e226f84aa09f084f593697..0000000000000000000000000000000000000000
--- a/legacy/README.cn.md
+++ /dev/null
@@ -1,136 +0,0 @@
-# models 简介
-
-[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/PaddlePaddle/models)
-[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/PaddlePaddle/models)
-[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
-
-PaddlePaddle提供了丰富的运算单元,帮助大家以模块化的方式构建起千变万化的深度学习模型来解决不同的应用问题。这里,我们针对常见的机器学习任务,提供了不同的神经网络模型供大家学习和使用。
-
-
-## 1. 词向量
-
-词向量用一个实向量表示词语,向量的每个维都表示文本的某种潜在语法或语义特征,是深度学习应用于自然语言处理领域最成功的概念和成果之一。广义地,词向量也可以应用于普通离散特征。词向量的学习通常都是一个无监督的学习过程,因此,可以充分利用海量的无标记数据以捕获特征之间的关系,也可以有效地解决特征稀疏、标签数据缺失、数据噪声等问题。然而,在常见词向量学习方法中,模型最后一层往往会遇到一个超大规模的分类问题,是计算性能的瓶颈。
-
-在词向量任务中,我们向大家展示如何使用Hierarchical-Sigmoid和噪声对比估计(Noise Contrastive Estimation,NCE)来加速词向量的学习。
-
-- 1.1 [Hsigmoid加速词向量训练](https://github.com/PaddlePaddle/models/tree/develop/hsigmoid)
-- 1.2 [噪声对比估计加速词向量训练](https://github.com/PaddlePaddle/models/tree/develop/nce_cost)
-
-
-## 2. RNN 语言模型
-
-语言模型是自然语言处理领域里一个重要的基础模型,除了得到词向量(语言模型训练的副产物),还可以帮助我们生成文本。给定若干个词,语言模型可以帮助我们预测下一个最可能出现的词。
-
-在利用语言模型生成文本的任务中,我们重点介绍循环神经网络语言模型,大家可以通过文档中的使用说明快速适配到自己的训练语料,完成自动写诗、自动写散文等有趣的模型。
-
-- 2.1 [使用循环神经网络语言模型生成文本](https://github.com/PaddlePaddle/models/tree/develop/generate_sequence_by_rnn_lm)
-
-## 3. 
点击率预估 - -点击率预估模型预判用户对一条广告点击的概率,对每次广告的点击情况做出预测,是广告技术的核心算法之一。逻谛斯克回归对大规模稀疏特征有着很好的学习能力,在点击率预估任务发展的早期一统天下。近年来,DNN 模型由于其强大的学习能力逐渐接过点击率预估任务的大旗。 - -在点击率预估任务中,我们首先给出谷歌提出的 Wide & Deep 模型。这一模型融合了适用于学习抽象特征的DNN和适用于大规模稀疏特征的逻谛斯克回归两者的优点,可以作为一种相对成熟的模型框架使用,在工业界也有一定的应用。同时,我们提供基于因子分解机的深度神经网络模型,该模型融合了因子分解机和深度神经网络,分别建模输入属性之间的低阶交互和高阶交互。 - -- 3.1 [Wide & deep 点击率预估模型](https://github.com/PaddlePaddle/models/tree/develop/ctr/README.cn.md) -- 3.2 [基于深度因子分解机的点击率预估模型](https://github.com/PaddlePaddle/models/tree/develop/deep_fm) - -## 4. 文本分类 - -文本分类是自然语言处理领域最基础的任务之一,深度学习方法能够免除复杂的特征工程,直接使用原始文本作为输入,数据驱动地最优化分类准确率。 - -在文本分类任务中,我们以情感分类任务为例,提供了基于DNN的非序列文本分类模型,以及基于CNN的序列模型供大家学习和使用(基于LSTM的模型见PaddleBook中[情感分类](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.cn.html)一课)。 - -- 4.1 [基于DNN/CNN的情感分类](https://github.com/PaddlePaddle/models/tree/develop/text_classification) -- 4.2 [基于双层序列的文本分类模型](https://github.com/PaddlePaddle/models/tree/develop/nested_sequence/text_classification) - -## 5. 排序学习 - -排序学习(Learning to Rank, LTR)是信息检索和搜索引擎研究的核心问题之一,通过机器学习方法学习一个分值函数对待排序的候选进行打分,再根据分值的高低确定序关系。深度神经网络可以用来建模分值函数,构成各类基于深度学习的LTR模型。 - -在排序学习任务中,我们介绍基于RankLoss损失函数Pairwise排序模型和基于LambdaRank损失函数的Listwise排序模型(Pointwise学习策略见PaddleBook中[推荐系统](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.cn.html)一课)。 - -- 5.1 [基于Pairwise和Listwise的排序学习](https://github.com/PaddlePaddle/models/tree/develop/ltr) - -## 6. 结构化语义模型 - -深度结构化语义模型是一种基于神经网络的语义匹配模型框架,可以用于学习两路信息实体或是文本之间的语义相似性。DSSM使用DNN、CNN或是RNN将两路信息实体或是文本映射到同一个连续的低纬度语义空间中。在这个语义空间中,两路实体或是文本可以同时进行表示,然后,通过定义距离度量和匹配函数来刻画并学习不同实体或是文本在同一个语义空间内的语义相似性。 - -在结构化语义模型任务中,我们演示如何建模两个字符串之间的语义相似度。模型支持DNN(全连接前馈网络)、CNN(卷积网络)、RNN(递归神经网络)等不同的网络结构,以及分类、回归、排序等不同损失函数。本例采用最简单的文本数据作为输入,通过替换自己的训练和预测数据,便可以在真实场景中使用。 - -- 6.1 [深度结构化语义模型](https://github.com/PaddlePaddle/models/tree/develop/dssm/README.cn.md) - -## 7. 命名实体识别 - -给定输入序列,序列标注模型为序列中每一个元素贴上一个类别标签,是自然语言处理领域最基础的任务之一。随着深度学习方法的不断发展,利用循环神经网络学习输入序列的特征表示,条件随机场(Conditional Random Field, CRF)在特征基础上完成序列标注任务,逐渐成为解决序列标注问题的标配解决方案。 - -在序列标注任务中,我们以命名实体识别(Named Entity Recognition,NER)任务为例,介绍如何训练一个端到端的序列标注模型。 - -- 7.1 [命名实体识别](https://github.com/PaddlePaddle/models/tree/develop/sequence_tagging_for_ner) - -## 8. 序列到序列学习 - -序列到序列学习实现两个甚至是多个不定长模型之间的映射,有着广泛的应用,包括:机器翻译、智能对话与问答、广告创意语料生成、自动编码(如金融画像编码)、判断多个文本串之间的语义相关性等。 - -在序列到序列学习任务中,我们首先以机器翻译任务为例,提供了多种改进模型供大家学习和使用。包括:不带注意力机制的序列到序列映射模型,这一模型是所有序列到序列学习模型的基础;使用Scheduled Sampling改善RNN模型在生成任务中的错误累积问题;带外部记忆机制的神经机器翻译,通过增强神经网络的记忆能力,来完成复杂的序列到序列学习任务。除机器翻译任务之外,我们也提供了一个基于深层LSTM网络生成古诗词,实现同语言生成的模型。 - -- 8.1 [无注意力机制的神经机器翻译](https://github.com/PaddlePaddle/models/tree/develop/nmt_without_attention/README.cn.md) -- 8.2 [使用Scheduled Sampling改善翻译质量](https://github.com/PaddlePaddle/models/tree/develop/scheduled_sampling) -- 8.3 [带外部记忆机制的神经机器翻译](https://github.com/PaddlePaddle/models/tree/develop/mt_with_external_memory) -- 8.4 [生成古诗词](https://github.com/PaddlePaddle/models/tree/develop/generate_chinese_poetry) - -## 9. 阅读理解 - -当深度学习以及各类新技术不断推动自然语言处理领域向前发展时,我们不禁会问:应该如何确认模型真正理解了人类特有的自然语言,具备一定的理解和推理能力?纵观NLP领域的各类经典问题:词法分析、句法分析、情感分类、写诗等,这些问题的经典解决方案,从技术原理上距离“语言理解”仍有一定距离。为了衡量现有NLP技术到“语言理解”这一终极目标之间的差距,我们需要一个有足够难度且可量化可复现的任务,这也是阅读理解问题提出的初衷。尽管目前的研究现状表明在现有阅读理解数据集上表现良好的模型,依然没有做到真正的语言理解,但机器阅读理解依然被视为是检验模型向理解语言迈进的一个重要任务。 - -阅读理解本质上也是自动问答的一种,模型“阅读”一段文字后回答给定的问题,在这一任务中,我们介绍使用Learning to Search 方法,将阅读理解转化为从段落中寻找答案所在句子,答案在句子中的起始位置,以及答案在句子中的结束位置,这样一个多步决策过程。 - -- 9.1 [Globally Normalized Reader](https://github.com/PaddlePaddle/models/tree/develop/globally_normalized_reader) - -## 10. 
自动问答 - -自动问答(Question Answering)系统利用计算机自动回答用户提出的问题,是验证机器是否具备自然语言理解能力的重要任务之一,其研究历史可以追溯到人工智能的原点。与检索系统相比,自动问答系统是信息服务的一种高级形式,系统返回给用户的不再是排序后的基于关键字匹配的检索结果,而是精准的自然语言答案。 - -在自动问答任务中,我们介绍基于深度学习的端到端问答系统,将自动问答转化为一个序列标注问题。端对端问答系统试图通过从高质量的"问题-证据(Evidence)-答案"数据中学习,建立一个联合学习模型,同时学习语料库、知识库、问句语义表示之间的语义映射关系,将传统的问句语义解析、文本检索、答案抽取与生成的复杂步骤转变为一个可学习过程。 - -- 10.1 [基于序列标注的事实型自动问答模型](https://github.com/PaddlePaddle/models/tree/develop/neural_qa) - -## 11. 图像分类 - -图像相比文字能够提供更加生动、容易理解及更具艺术感的信息,是人们转递与交换信息的重要来源。图像分类是根据图像的语义信息对不同类别图像进行区分,是计算机视觉中重要的基础问题,也是图像检测、图像分割、物体跟踪、行为分析等其他高层视觉任务的基础,在许多领域都有着广泛的应用。如:安防领域的人脸识别和智能视频分析等,交通领域的交通场景识别,互联网领域基于内容的图像检索和相册自动归类,医学领域的图像识别等。 - -在图像分类任务中,我们向大家介绍如何训练AlexNet、VGG、GoogLeNet、ResNet、Inception-v4、Inception-Resnet-V2和Xception模型。同时提供了能够将Caffe或TensorFlow训练好的模型文件转换为PaddlePaddle模型文件的模型转换工具。 - -- 11.1 [将Caffe模型文件转换为PaddlePaddle模型文件](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle) -- 11.2 [将TensorFlow模型文件转换为PaddlePaddle模型文件](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle) -- 11.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/image_classification) - -## 12. 目标检测 - -目标检测任务的目标是给定一张图像或是视频帧,让计算机找出其中所有目标的位置,并给出每个目标的具体类别。对于人类来说,目标检测是一个非常简单的任务。然而,计算机能够“看到”的仅有一些值为0 ~ 255的矩阵,很难解图像或是视频帧中出现了人或是物体这样的高层语义概念,也就更加难以定位目标出现在图像中哪个区域。与此同时,由于目标会出现在图像或是视频帧中的任何位置,目标的形态千变万化,图像或是视频帧的背景千差万别,诸多因素都使得目标检测对计算机来说是一个具有挑战性的问题。 - -在目标检测任务中,我们介绍利用SSD方法完成目标检测。SSD全称:Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。 - -- 12.1 [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/tree/develop/ssd/README.cn.md) - -## 13. 场景文字识别 - -许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 - -在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成端到端地无约束字符定位和识别。 - -- 13.1 [场景文字识别](https://github.com/PaddlePaddle/models/tree/develop/scene_text_recognition) - -## 14. 
语音识别 - -语音识别技术(Auto Speech Recognize,简称ASR)将人类语音中的词汇内容转化为计算机可读的输入,让机器能够“听懂”人类的语音,在语音助手、语音输入、语音交互等应用中发挥着重要作用。深度学习在语音识别领域取得了瞩目的成绩,端到端的深度学习方法将传统的声学模型、词典、语言模型等模块融为一个整体,不再依赖隐马尔可夫模型中的各种条件独立性假设,令模型变得更加简洁,一个神经网络模型以语音特征为输入,直接输出识别出的文本,目前已经成为语音识别最重要的手段。 - -在语音识别任务中,我们提供了基于 DeepSpeech2 模型的完整流水线,包括:特征提取、数据增强、模型训练、语言模型、解码模块等,并提供一个训练好的模型和体验实例,大家能够使用自己的声音来体验语音识别的乐趣。 - -14.1 [语音识别: DeepSpeech2](https://github.com/PaddlePaddle/DeepSpeech) - -本教程由[PaddlePaddle](https://github.com/PaddlePaddle/Paddle)创作,采用[Apache-2.0](LICENSE) 许可协议进行许可。 diff --git a/legacy/README.md b/legacy/README.md deleted file mode 100644 index f0719c1a26c04341e8de327143dc826248bb3607..0000000000000000000000000000000000000000 --- a/legacy/README.md +++ /dev/null @@ -1,89 +0,0 @@ - -# 该目录的模型已经不再维护,不推荐使用。建议使用Fluid目录下的模型。 - -# Introduction to models - -[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/PaddlePaddle/models) -[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/PaddlePaddle/models) -[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) - -PaddlePaddle provides a rich set of computational units to enable users to adopt a modular approach to solving various learning problems. In this repo, we demonstrate how to use PaddlePaddle to solve common machine learning tasks, providing several different neural network model that anyone can easily learn and use. - -## 1. Word Embedding - -The word embedding expresses words with a real vector. Each dimension of the vector represents some of the latent grammatical or semantic features of the text and is one of the most successful concepts in the field of natural language processing. The generalized word vector can also be applied to discrete features. The study of word vector is usually an unsupervised learning. Therefore, it is possible to take full advantage of massive unmarked data to capture the relationship between features and to solve the problem of sparse features, missing tag data, and data noise. However, in the common word vector learning method, the last layer of the model often encounters a large-scale classification problem, which is the bottleneck of computing performance. - -In the example of word vectors, we show how to use Hierarchical-Sigmoid and Noise Contrastive Estimation (NCE) to accelerate word-vector learning. - -- 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/hsigmoid) -- 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/nce_cost) - - -## 2. RNN language model - -The language model is important in the field of natural language processing. In addition to getting the word vector (a by-product of language model training), it can also help us to generate text. Given a number of words, the language model can help us predict the next most likely word. In the example of using the language model to generate text, we focus on the recurrent neural network language model. We can use the instructions in the document quickly adapt to their training corpus, complete automatic writing poetry, automatic writing prose and other interesting models. - -- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/legacy/generate_sequence_by_rnn_lm) - -## 3. Click-Through Rate prediction -The click-through rate model predicts the probability that a user will click on an ad. 
## 2. RNN language model

The language model is important in the field of natural language processing. In addition to producing word vectors (a by-product of language-model training), it can also help us generate text: given a few words, the language model can predict the next most likely word. In the text-generation example, we focus on the recurrent neural network language model. Following the instructions in the document, you can quickly adapt the model to your own training corpus and build interesting applications such as automatic poetry or prose writing.

- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/legacy/generate_sequence_by_rnn_lm)

## 3. Click-Through Rate prediction
The click-through rate model predicts the probability that a user will click on an ad, and is widely used in advertising technology. In the early stages of click-through rate prediction, Logistic Regression performed well on large-scale sparse features. In recent years, DNN models have gradually taken over the task thanks to their strong learning ability.

In the click-through rate example, we first present Google's Wide & Deep model, which combines the strengths of DNNs on abstract features with those of logistic regression on large-scale sparse features. We then provide the deep factorization machine for click-through rate prediction, which combines a factorization machine with deep neural networks to model both low-order and high-order interactions of the input features.

- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/legacy/ctr)
- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/legacy/deep_fm)

## 4. Text classification

Text classification is one of the most basic tasks in natural language processing. Deep learning methods can eliminate complex feature engineering and take the raw text as input, optimizing classification accuracy directly.

For text classification, we provide a non-sequential text classification model based on DNN and CNN. (For an LSTM-based model, please refer to PaddleBook [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html).)

- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/legacy/text_classification)

## 5. Learning to rank

Learning to rank (LTR) is one of the core problems in information retrieval and search engine research. A learning algorithm uses training data to produce a ranking model that computes the relevance of documents to actual queries. Deep neural networks can be used to model the scoring function, yielding various deep-learning-based LTR models.

LTR algorithms are usually categorized into three groups by their input representation and loss function: pointwise, pairwise, and listwise approaches. Here we demonstrate the RankLoss loss function (a pairwise approach) and the LambdaRank loss function (a listwise approach). (For pointwise approaches, please refer to the [Recommended System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html) chapter.)

- 5.1 [Learning to rank based on Pairwise and Listwise approaches](https://github.com/PaddlePaddle/models/tree/develop/legacy/ltr)

## 6. Semantic model
The deep structured semantic model (DSSM) uses a DNN to learn low-dimensional vector representations in a continuous semantic space, and then models the semantic similarity between two sentences (a simplified sketch follows this section).

In this example, we demonstrate how to use PaddlePaddle to implement a generic deep structured semantic model that measures the semantic similarity between two strings. The model supports different network structures such as CNN (Convolutional Network), FC (Fully Connected Network), and RNN (Recurrent Neural Network), and different loss functions for classification, regression, and ranking.

- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/legacy/dssm)
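As a rough, self-contained illustration (our own sketch with made-up dimensions, not the repository implementation), the core of a DSSM-style model can be reduced to encoding each string into a low-dimensional vector with a small network and scoring the pair by cosine similarity:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 5000, 64
W = rng.normal(0.0, 0.05, (vocab_size, hidden))  # one shared FC tower for brevity

def encode(word_ids):
    v = np.zeros(vocab_size)
    v[word_ids] = 1.0                 # bag-of-words input vector
    h = np.tanh(v @ W)                # project into the semantic space
    return h / np.linalg.norm(h)      # L2-normalize so dot product = cosine

query = encode([10, 233, 4078])
doc = encode([10, 233, 999])
print("cosine similarity:", float(query @ doc))
```

In the actual model, the two sides can use different structures (CNN, FC, RNN), and the similarity score feeds a classification, regression, or ranking loss.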
## 7. Sequence tagging

Sequence tagging, which assigns a category tag to each element of an input sequence, is one of the most basic tasks in natural language processing. Recurrent neural network models combined with a Conditional Random Field (CRF) layer are commonly used for sequence tagging tasks.

In the sequence tagging example, we describe how to train an end-to-end sequence tagging model, using the Named Entity Recognition (NER) task as an example.

- 7.1 [Named Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner)

## 8. Sequence to sequence learning

Sequence-to-sequence models have a wide range of applications, including machine translation, dialogue systems, and parse tree generation.

As an example of sequence-to-sequence learning, we take the machine translation task. We demonstrate a sequence-to-sequence mapping model without an attention mechanism, which is the basis of all sequence-to-sequence learning models. We use scheduled sampling to mitigate error accumulation in the RNN model (see the sketch after this section), and also cover machine translation with an external memory mechanism.

- 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/legacy/nmt_without_attention)
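The sketch below illustrates scheduled sampling in isolation (our toy code with a stand-in decoder, not the repository implementation): during training, the decoder input at each step is the previous ground-truth token with probability 1 - eps, and the model's own previous prediction with probability eps, so the model learns to recover from its own mistakes at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.25                        # sampling probability, typically grown over epochs

def decoder_step(token_id):
    # Stand-in for a real decoder step; returns a fake "predicted" token id.
    return int(rng.integers(0, 100))

target = [5, 17, 3, 8]            # ground-truth output sequence
prev_gt, prev_pred = 0, 0         # both start from a start-of-sequence id (0 here)
for t in range(len(target)):
    # With probability eps, feed back the model's previous prediction
    # instead of the previous ground-truth token (teacher forcing).
    x = prev_pred if rng.random() < eps else prev_gt
    prev_pred = decoder_step(x)
    prev_gt = target[t]
    print(t, x, prev_pred)
```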
## 9. Image classification

For the image classification example, we show how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. It also provides model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files.

- 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/caffe2paddle)
- 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/tf2paddle)
- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)

This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).

diff --git a/legacy/conv_seq2seq/README.md b/legacy/conv_seq2seq/README.md deleted file mode 100644 index 5b22c2c17ea2ff3588e93219e86d81a831242211..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/README.md +++ /dev/null @@ -1,70 +0,0 @@

The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

# Convolutional Sequence to Sequence Learning
This model implements the work in the following paper:

Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. Association for Computational Linguistics (ACL), 2017

# Data Preparation
- The data used in this tutorial can be downloaded by running:

  ```bash
  sh download.sh
  ```

- Each line in the data file contains one sample, which consists of a source sentence and a target sentence separated by '\t'. To use your own data, organize it as follows:

  ```
  <source sentence>\t<target sentence>
  ```

# Training a Model
- Modify the following script if needed and then run:

  ```bash
  python train.py \
    --train_data_path ./data/train \
    --test_data_path ./data/test \
    --src_dict_path ./data/src_dict \
    --trg_dict_path ./data/trg_dict \
    --enc_blocks "[(256, 3)] * 5" \
    --dec_blocks "[(256, 3)] * 3" \
    --emb_size 256 \
    --pos_size 200 \
    --drop_rate 0.2 \
    --use_bn False \
    --use_gpu False \
    --trainer_count 1 \
    --batch_size 32 \
    --num_passes 20 \
    >train.log 2>&1
  ```

# Inferring by a Trained Model
- Infer by a trained model by running:

  ```bash
  python infer.py \
    --infer_data_path ./data/dev \
    --src_dict_path ./data/src_dict \
    --trg_dict_path ./data/trg_dict \
    --enc_blocks "[(256, 3)] * 5" \
    --dec_blocks "[(256, 3)] * 3" \
    --emb_size 256 \
    --pos_size 200 \
    --drop_rate 0.2 \
    --use_bn False \
    --use_gpu False \
    --trainer_count 1 \
    --max_len 100 \
    --batch_size 256 \
    --beam_size 1 \
    --is_show_attention False \
    --model_path ./params.pass-0.tar.gz \
    1>infer_result 2>infer.log
  ```

# Notes
Since the current version of PaddlePaddle doesn't support weight normalization, we use batch normalization instead to ensure convergence when the network is deep.

diff --git a/legacy/conv_seq2seq/beamsearch.py b/legacy/conv_seq2seq/beamsearch.py deleted file mode 100644 index dd8562f018c803d4f0d7bbba4a2a006ece904851..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/beamsearch.py +++ /dev/null @@ -1,197 +0,0 @@

#coding=utf-8

import sys
import time
import math
import numpy as np

import reader


class BeamSearch(object):
    """
    Generate sequence by beam search
    """

    def __init__(self,
                 inferer,
                 trg_dict,
                 pos_size,
                 padding_num,
                 batch_size=1,
                 beam_size=1,
                 max_len=100):
        self.inferer = inferer
        self.trg_dict = trg_dict
        self.reverse_trg_dict = reader.get_reverse_dict(trg_dict)
        self.word_padding = trg_dict.__len__()
        self.pos_size = pos_size
        self.pos_padding = pos_size
        self.padding_num = padding_num
        self.win_len = padding_num + 1
        self.max_len = max_len
        self.batch_size = batch_size
        self.beam_size = beam_size

    def get_beam_input(self, batch, sample_list):
        """
        Get input for generation at the current iteration.
        """
        beam_input = []

        for sample_id in sample_list:
            for path in self.candidate_path[sample_id]:
                if len(path['seq']) < self.win_len:
                    cur_trg = [self.word_padding] * (
                        self.win_len - len(path['seq']) - 1
                    ) + [self.trg_dict['<s>']] + path['seq']
                    cur_trg_pos = [self.pos_padding] * (
                        self.win_len - len(path['seq']) - 1) + [0] + range(
                            1, len(path['seq']) + 1)
                else:
                    cur_trg = path['seq'][-self.win_len:]
                    cur_trg_pos = range(
                        len(path['seq']) + 1 - self.win_len,
                        len(path['seq']) + 1)

                beam_input.append(batch[sample_id] + [cur_trg] + [cur_trg_pos])

        return beam_input

    def get_prob(self, beam_input):
        """
        Get the probabilities of all possible tokens.
        """
        row_list = [j * self.win_len for j in range(len(beam_input))]
        prob = self.inferer.infer(beam_input, field='value')[row_list, :]
        return prob

    def _top_k(self, prob, k):
        """
        Get indices of the words with k highest probabilities.
        """
        return prob.argsort()[-k:][::-1]

    def beam_expand(self, prob, sample_list):
        """
        In every iteration step, the model predicts the possible next words.
        For each input sentence, the top beam_size words are selected as candidates.
        """
        top_words = np.apply_along_axis(self._top_k, 1, prob, self.beam_size)

        candidate_words = [[]] * len(self.candidate_path)
        idx = 0

        for sample_id in sample_list:
            for seq_id, path in enumerate(self.candidate_path[sample_id]):
                for w in top_words[idx, :]:
                    score = path['score'] + math.log(prob[idx, w])
                    candidate_words[sample_id] = candidate_words[sample_id] + [{
                        'word': w,
                        'score': score,
                        'seq_id': seq_id
                    }]
                idx = idx + 1

        return candidate_words

    def beam_shrink(self, candidate_words, sample_list):
        """
        Pruning process of the beam search. During the process, the beam_size
        most probable sequences are selected for the beam in the next generation.
        """
        new_path = [[]] * len(self.candidate_path)

        for sample_id in sample_list:
            beam_words = sorted(
                candidate_words[sample_id],
                key=lambda x: x['score'],
                reverse=True)[:self.beam_size]

            complete_seq_min_score = None
            complete_path_num = len(self.complete_path[sample_id])

            if complete_path_num > 0:
                complete_seq_min_score = min(self.complete_path[sample_id],
                                             key=lambda x: x['score'])['score']
                if complete_path_num >= self.beam_size:
                    beam_words_max_score = beam_words[0]['score']
                    if beam_words_max_score < complete_seq_min_score:
                        continue

            for w in beam_words:

                if w['word'] == self.trg_dict['<e>']:
                    if complete_path_num < self.beam_size or complete_seq_min_score <= w[
                            'score']:

                        seq = self.candidate_path[sample_id][w['seq_id']]['seq']
                        self.complete_path[sample_id] = self.complete_path[
                            sample_id] + [{
                                'seq': seq,
                                'score': w['score']
                            }]

                        if complete_seq_min_score is None or complete_seq_min_score > w[
                                'score']:
                            complete_seq_min_score = w['score']
                else:
                    seq = self.candidate_path[sample_id][w['seq_id']]['seq'] + [
                        w['word']
                    ]
                    new_path[sample_id] = new_path[sample_id] + [{
                        'seq': seq,
                        'score': w['score']
                    }]

        return new_path

    def search_one_batch(self, batch):
        """
        Perform beam search on one mini-batch.
        """
        real_size = len(batch)
        self.candidate_path = [[{'seq': [], 'score': 0.}]] * real_size
        self.complete_path = [[]] * real_size
        sample_list = range(real_size)

        for i in xrange(self.max_len):
            beam_input = self.get_beam_input(batch, sample_list)
            prob = self.get_prob(beam_input)

            candidate_words = self.beam_expand(prob, sample_list)
            new_path = self.beam_shrink(candidate_words, sample_list)
            self.candidate_path = new_path
            sample_list = [
                sample_id for sample_id in sample_list
                if len(new_path[sample_id]) > 0
            ]

            if len(sample_list) == 0:
                break

        final_path = []
        for i in xrange(real_size):
            top_path = sorted(
                self.complete_path[i] + self.candidate_path[i],
                key=lambda x: x['score'],
                reverse=True)[:self.beam_size]
            final_path.append(top_path)
        return final_path

    def search(self, infer_data):
        """
        Perform beam search on all data.
- """ - - def _to_sentence(seq): - raw_sentence = [self.reverse_trg_dict[id] for id in seq] - sentence = " ".join(raw_sentence) - return sentence - - for pos in xrange(0, len(infer_data), self.batch_size): - batch = infer_data[pos:min(pos + self.batch_size, len(infer_data))] - self.final_path = self.search_one_batch(batch) - for top_path in self.final_path: - print _to_sentence(top_path[0]['seq']) - sys.stdout.flush() diff --git a/legacy/conv_seq2seq/download.sh b/legacy/conv_seq2seq/download.sh deleted file mode 100644 index b1a924d25b1a10ade9f4be8b504933d1efa01905..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/download.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/usr/bin/env bash - -CUR_PATH=`pwd` -git clone https://github.com/moses-smt/mosesdecoder.git -git clone https://github.com/rizar/actor-critic-public - -export MOSES=`pwd`/mosesdecoder -export LVSR=`pwd`/actor-critic-public - -cd actor-critic-public/exp/ted -sh create_dataset.sh - -cd $CUR_PATH -mkdir data -cp actor-critic-public/exp/ted/prep/*-* data/ -cp actor-critic-public/exp/ted/vocab.* data/ - -cd data -python ../preprocess.py - -cd .. -rm -rf actor-critic-public mosesdecoder diff --git a/legacy/conv_seq2seq/infer.py b/legacy/conv_seq2seq/infer.py deleted file mode 100644 index c804a84e71ffe920b72064cb05461d72c444ac73..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/infer.py +++ /dev/null @@ -1,236 +0,0 @@ -#coding=utf-8 - -import sys -import argparse -import distutils.util -import gzip - -import paddle.v2 as paddle -from model import conv_seq2seq -from beamsearch import BeamSearch -import reader - - -def parse_args(): - parser = argparse.ArgumentParser( - description="PaddlePaddle Convolutional Seq2Seq") - parser.add_argument( - '--infer_data_path', - type=str, - required=True, - help="Path of the dataset for inference") - parser.add_argument( - '--src_dict_path', - type=str, - required=True, - help='Path of the source dictionary') - parser.add_argument( - '--trg_dict_path', - type=str, - required=True, - help='path of the target dictionary') - parser.add_argument( - '--enc_blocks', type=str, help='Convolution blocks of the encoder') - parser.add_argument( - '--dec_blocks', type=str, help='Convolution blocks of the decoder') - parser.add_argument( - '--emb_size', - type=int, - default=256, - help='Dimension of word embedding. (default: %(default)s)') - parser.add_argument( - '--pos_size', - type=int, - default=200, - help='Total number of the position indexes. (default: %(default)s)') - parser.add_argument( - '--drop_rate', - type=float, - default=0., - help='Dropout rate. (default: %(default)s)') - parser.add_argument( - "--use_bn", - default=False, - type=distutils.util.strtobool, - help="Use batch normalization or not. (default: %(default)s)") - parser.add_argument( - "--use_gpu", - default=False, - type=distutils.util.strtobool, - help="Use gpu or not. (default: %(default)s)") - parser.add_argument( - "--trainer_count", - default=1, - type=int, - help="Trainer number. (default: %(default)s)") - parser.add_argument( - '--max_len', - type=int, - default=100, - help="The maximum length of the sentence to be generated. (default: %(default)s)" - ) - parser.add_argument( - "--batch_size", - default=1, - type=int, - help="Size of a mini-batch. (default: %(default)s)") - parser.add_argument( - "--beam_size", - default=1, - type=int, - help="The width of beam expansion. 
(default: %(default)s)") - parser.add_argument( - "--model_path", - type=str, - required=True, - help="The path of trained model. (default: %(default)s)") - parser.add_argument( - "--is_show_attention", - default=False, - type=distutils.util.strtobool, - help="Whether to show attention weight or not. (default: %(default)s)") - return parser.parse_args() - - -def infer(infer_data_path, - src_dict_path, - trg_dict_path, - model_path, - enc_conv_blocks, - dec_conv_blocks, - emb_dim=256, - pos_size=200, - drop_rate=0., - use_bn=False, - max_len=100, - batch_size=1, - beam_size=1, - is_show_attention=False): - """ - Inference. - - :param infer_data_path: The path of the data for inference. - :type infer_data_path: str - :param src_dict_path: The path of the source dictionary. - :type src_dict_path: str - :param trg_dict_path: The path of the target dictionary. - :type trg_dict_path: str - :param model_path: The path of a trained model. - :type model_path: str - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type dec_conv_blocks: list of tuple - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param use_bn: Whether to use batch normalization or not. False is the default value. - :type use_bn: bool - :param max_len: The maximum length of the sentence to be generated. - :type max_len: int - :param beam_size: The width of beam expansion. - :type beam_size: int - :param is_show_attention: Whether to show attention weight or not. False is the default value. 
- :type is_show_attention: bool - """ - # load dict - src_dict = reader.load_dict(src_dict_path) - trg_dict = reader.load_dict(trg_dict_path) - src_dict_size = src_dict.__len__() - trg_dict_size = trg_dict.__len__() - - prob, weight = conv_seq2seq( - src_dict_size=src_dict_size, - trg_dict_size=trg_dict_size, - pos_size=pos_size, - emb_dim=emb_dim, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - drop_rate=drop_rate, - with_bn=use_bn, - is_infer=True) - - # load parameters - parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_path)) - - padding_list = [context_len - 1 for (size, context_len) in dec_conv_blocks] - padding_num = reduce(lambda x, y: x + y, padding_list) - infer_reader = reader.data_reader( - data_file=infer_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num) - - if is_show_attention: - attention_inferer = paddle.inference.Inference( - output_layer=weight, parameters=parameters) - for i, data in enumerate(infer_reader()): - src_len = len(data[0]) - trg_len = len(data[2]) - attention_weight = attention_inferer.infer( - [data], field='value', flatten_result=False) - attention_weight = [ - weight.reshape((trg_len, src_len)) - for weight in attention_weight - ] - print attention_weight - break - return - - infer_data = [] - for i, raw_data in enumerate(infer_reader()): - infer_data.append([raw_data[0], raw_data[1]]) - - inferer = paddle.inference.Inference( - output_layer=prob, parameters=parameters) - - searcher = BeamSearch( - inferer=inferer, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num, - max_len=max_len, - batch_size=batch_size, - beam_size=beam_size) - - searcher.search(infer_data) - return - - -def main(): - args = parse_args() - enc_conv_blocks = eval(args.enc_blocks) - dec_conv_blocks = eval(args.dec_blocks) - - sys.setrecursionlimit(10000) - - paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count) - - infer( - infer_data_path=args.infer_data_path, - src_dict_path=args.src_dict_path, - trg_dict_path=args.trg_dict_path, - model_path=args.model_path, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - emb_dim=args.emb_size, - pos_size=args.pos_size, - drop_rate=args.drop_rate, - use_bn=args.use_bn, - max_len=args.max_len, - batch_size=args.batch_size, - beam_size=args.beam_size, - is_show_attention=args.is_show_attention) - - -if __name__ == '__main__': - main() diff --git a/legacy/conv_seq2seq/model.py b/legacy/conv_seq2seq/model.py deleted file mode 100644 index c31238f83172fdc3d6240095279d1c953ab272ae..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/model.py +++ /dev/null @@ -1,440 +0,0 @@ -#coding=utf-8 - -import math - -import paddle.v2 as paddle - -__all__ = ["conv_seq2seq"] - - -def gated_conv_with_batchnorm(input, - size, - context_len, - context_start=None, - learning_rate=1.0, - drop_rate=0., - with_bn=False): - """ - Definition of the convolution block. - - :param input: The input of this block. - :type input: LayerOutput - :param size: The dimension of the block's output. - :type size: int - :param context_len: The context length of the convolution. - :type context_len: int - :param context_start: The start position of the context. - :type context_start: int - :param learning_rate: The learning rate factor of the parameters in the block. - The actual learning rate is the product of the global - learning rate and this factor. - :type learning_rate: float - :param drop_rate: Dropout rate. 
- :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The output of the convolution block. - :rtype: LayerOutput - """ - input = paddle.layer.dropout(input=input, dropout_rate=drop_rate) - - context = paddle.layer.mixed( - size=input.size * context_len, - input=paddle.layer.context_projection( - input=input, context_len=context_len, context_start=context_start)) - - raw_conv = paddle.layer.fc( - input=context, - size=size * 2, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt(4.0 * (1.0 - drop_rate) / context.size), - learning_rate=learning_rate), - bias_attr=False) - - if with_bn: - raw_conv = paddle.layer.batch_norm( - input=raw_conv, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=learning_rate)) - - with paddle.layer.mixed(size=size) as conv: - conv += paddle.layer.identity_projection(raw_conv, size=size, offset=0) - - with paddle.layer.mixed(size=size, act=paddle.activation.Sigmoid()) as gate: - gate += paddle.layer.identity_projection( - raw_conv, size=size, offset=size) - - with paddle.layer.mixed(size=size) as gated_conv: - gated_conv += paddle.layer.dotmul_operator(conv, gate) - - return gated_conv - - -def encoder(token_emb, - pos_emb, - conv_blocks=[(256, 3)] * 5, - num_attention=3, - drop_rate=0., - with_bn=False): - """ - Definition of the encoder. - - :param token_emb: The embedding vector of the input token. - :type token_emb: LayerOutput - :param pos_emb: The embedding vector of the input token's position. - :type pos_emb: LayerOutput - :param conv_blocks: The scale list of the convolution blocks. Each element of - the list contains output dimension and context length of - the corresponding convolution block. - :type conv_blocks: list of tuple - :param num_attention: The total number of the attention modules used in the decoder. - :type num_attention: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The input token encoding. 
- :rtype: LayerOutput - """ - embedding = paddle.layer.addto( - input=[token_emb, pos_emb], - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - proj_size = conv_blocks[0][0] - block_input = paddle.layer.fc( - input=embedding, - size=proj_size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / embedding.size), - learning_rate=1.0 / (2.0 * num_attention)), - bias_attr=True, ) - - for (size, context_len) in conv_blocks: - if block_input.size == size: - residual = block_input - else: - residual = paddle.layer.fc( - input=block_input, - size=size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=1.0 / - (2.0 * num_attention)), - bias_attr=True) - - gated_conv = gated_conv_with_batchnorm( - input=block_input, - size=size, - context_len=context_len, - learning_rate=1.0 / (2.0 * num_attention), - drop_rate=drop_rate, - with_bn=with_bn) - - with paddle.layer.mixed(size=size) as block_output: - block_output += paddle.layer.identity_projection(residual) - block_output += paddle.layer.identity_projection(gated_conv) - - # halve the variance of the sum - block_output = paddle.layer.slope_intercept( - input=block_output, slope=math.sqrt(0.5)) - - block_input = block_output - - emb_dim = embedding.size - encoded_vec = paddle.layer.fc( - input=block_output, - size=emb_dim, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=1.0 / (2.0 * num_attention)), - bias_attr=True) - - encoded_sum = paddle.layer.addto(input=[encoded_vec, embedding]) - - # halve the variance of the sum - encoded_sum = paddle.layer.slope_intercept( - input=encoded_sum, slope=math.sqrt(0.5)) - - return encoded_vec, encoded_sum - - -def attention(decoder_state, cur_embedding, encoded_vec, encoded_sum): - """ - Definition of the attention. - - :param decoder_state: The hidden state of the decoder. - :type decoder_state: LayerOutput - :param cur_embedding: The embedding vector of the current token. - :type cur_embedding: LayerOutput - :param encoded_vec: The source token encoding. - :type encoded_vec: LayerOutput - :param encoded_sum: The sum of the source token's encoding and embedding. - :type encoded_sum: LayerOutput - :return: A context vector and the attention weight. 
- :rtype: LayerOutput - """ - residual = decoder_state - - state_size = decoder_state.size - emb_dim = cur_embedding.size - with paddle.layer.mixed(size=emb_dim, bias_attr=True) as state_summary: - state_summary += paddle.layer.full_matrix_projection(decoder_state) - state_summary += paddle.layer.identity_projection(cur_embedding) - - # halve the variance of the sum - state_summary = paddle.layer.slope_intercept( - input=state_summary, slope=math.sqrt(0.5)) - - expanded = paddle.layer.expand(input=state_summary, expand_as=encoded_vec) - - m = paddle.layer.dot_prod(input1=expanded, input2=encoded_vec) - - attention_weight = paddle.layer.fc(input=m, - size=1, - act=paddle.activation.SequenceSoftmax(), - bias_attr=False) - - scaled = paddle.layer.scaling(weight=attention_weight, input=encoded_sum) - - attended = paddle.layer.pooling( - input=scaled, pooling_type=paddle.pooling.Sum()) - - attended_proj = paddle.layer.fc(input=attended, - size=state_size, - act=paddle.activation.Linear(), - bias_attr=True) - - attention_result = paddle.layer.addto(input=[attended_proj, residual]) - - # halve the variance of the sum - attention_result = paddle.layer.slope_intercept( - input=attention_result, slope=math.sqrt(0.5)) - return attention_result, attention_weight - - -def decoder(token_emb, - pos_emb, - encoded_vec, - encoded_sum, - dict_size, - conv_blocks=[(256, 3)] * 3, - drop_rate=0., - with_bn=False): - """ - Definition of the decoder. - - :param token_emb: The embedding vector of the input token. - :type token_emb: LayerOutput - :param pos_emb: The embedding vector of the input token's position. - :type pos_emb: LayerOutput - :param encoded_vec: The source token encoding. - :type encoded_vec: LayerOutput - :param encoded_sum: The sum of the source token's encoding and embedding. - :type encoded_sum: LayerOutput - :param dict_size: The size of the target dictionary. - :type dict_size: int - :param conv_blocks: The scale list of the convolution blocks. Each element - of the list contains output dimension and context length - of the corresponding convolution block. - :type conv_blocks: list of tuple - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The probability of the predicted token and the attention weights. 
- :rtype: LayerOutput - """ - - def attention_step(decoder_state, cur_embedding, encoded_vec, encoded_sum): - conditional = attention( - decoder_state=decoder_state, - cur_embedding=cur_embedding, - encoded_vec=encoded_vec, - encoded_sum=encoded_sum) - return conditional - - embedding = paddle.layer.addto( - input=[token_emb, pos_emb], - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - proj_size = conv_blocks[0][0] - block_input = paddle.layer.fc( - input=embedding, - size=proj_size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / embedding.size)), - bias_attr=True, ) - - weight = [] - for (size, context_len) in conv_blocks: - if block_input.size == size: - residual = block_input - else: - residual = paddle.layer.fc(input=block_input, - size=size, - act=paddle.activation.Linear(), - bias_attr=True) - - decoder_state = gated_conv_with_batchnorm( - input=block_input, - size=size, - context_len=context_len, - context_start=0, - drop_rate=drop_rate, - with_bn=with_bn) - - group_inputs = [ - decoder_state, - embedding, - paddle.layer.StaticInput(input=encoded_vec), - paddle.layer.StaticInput(input=encoded_sum), - ] - - conditional, attention_weight = paddle.layer.recurrent_group( - step=attention_step, input=group_inputs) - weight.append(attention_weight) - - block_output = paddle.layer.addto(input=[conditional, residual]) - - # halve the variance of the sum - block_output = paddle.layer.slope_intercept( - input=block_output, slope=math.sqrt(0.5)) - - block_input = block_output - - out_emb_dim = embedding.size - block_output = paddle.layer.fc( - input=block_output, - size=out_emb_dim, - act=paddle.activation.Linear(), - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - decoder_out = paddle.layer.fc( - input=block_output, - size=dict_size, - act=paddle.activation.Softmax(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / block_output.size)), - bias_attr=True) - - return decoder_out, weight - - -def conv_seq2seq(src_dict_size, - trg_dict_size, - pos_size, - emb_dim, - enc_conv_blocks=[(256, 3)] * 5, - dec_conv_blocks=[(256, 3)] * 3, - drop_rate=0., - with_bn=False, - is_infer=False): - """ - Definition of convolutional sequence-to-sequence network. - - :param src_dict_size: The size of the source dictionary. - :type src_dict_size: int - :param trg_dict_size: The size of the target dictionary. - :type trg_dict_size: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. Each element - of the list contains output dimension and context length of the - corresponding convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. Each element - of the list contains output dimension and context length of the - corresponding convolution block. - :type dec_conv_blocks: list of tuple - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default value. - :type with_bn: bool - :param is_infer: Whether infer or not. - :type is_infer: bool - :return: Cost or output layer. 
- :rtype: LayerOutput - """ - src = paddle.layer.data( - name='src_word', - type=paddle.data_type.integer_value_sequence(src_dict_size)) - src_pos = paddle.layer.data( - name='src_word_pos', - type=paddle.data_type.integer_value_sequence(pos_size + - 1)) # one for padding - - src_emb = paddle.layer.embedding( - input=src, - size=emb_dim, - name='src_word_emb', - param_attr=paddle.attr.Param( - initial_mean=0., initial_std=0.1)) - src_pos_emb = paddle.layer.embedding( - input=src_pos, - size=emb_dim, - name='src_pos_emb', - param_attr=paddle.attr.Param( - initial_mean=0., initial_std=0.1)) - - num_attention = len(dec_conv_blocks) - encoded_vec, encoded_sum = encoder( - token_emb=src_emb, - pos_emb=src_pos_emb, - conv_blocks=enc_conv_blocks, - num_attention=num_attention, - drop_rate=drop_rate, - with_bn=with_bn) - - trg = paddle.layer.data( - name='trg_word', - type=paddle.data_type.integer_value_sequence(trg_dict_size + - 1)) # one for padding - trg_pos = paddle.layer.data( - name='trg_word_pos', - type=paddle.data_type.integer_value_sequence(pos_size + - 1)) # one for padding - - trg_emb = paddle.layer.embedding( - input=trg, - size=emb_dim, - name='trg_word_emb', - param_attr=paddle.attr.Param( - initial_mean=0., initial_std=0.1)) - trg_pos_emb = paddle.layer.embedding( - input=trg_pos, - size=emb_dim, - name='trg_pos_emb', - param_attr=paddle.attr.Param( - initial_mean=0., initial_std=0.1)) - - decoder_out, weight = decoder( - token_emb=trg_emb, - pos_emb=trg_pos_emb, - encoded_vec=encoded_vec, - encoded_sum=encoded_sum, - dict_size=trg_dict_size, - conv_blocks=dec_conv_blocks, - drop_rate=drop_rate, - with_bn=with_bn) - - if is_infer: - return decoder_out, weight - - trg_next_word = paddle.layer.data( - name='trg_next_word', - type=paddle.data_type.integer_value_sequence(trg_dict_size)) - cost = paddle.layer.classification_cost( - input=decoder_out, label=trg_next_word) - - return cost diff --git a/legacy/conv_seq2seq/preprocess.py b/legacy/conv_seq2seq/preprocess.py deleted file mode 100644 index 1d5c7cdd7b5cc91e28854fa0bbeeffc9dcbe4e5c..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/preprocess.py +++ /dev/null @@ -1,30 +0,0 @@ -#coding=utf-8 - -import cPickle - - -def concat_file(file1, file2, dst_file): - with open(dst_file, 'w') as dst: - with open(file1) as f1: - with open(file2) as f2: - for i, (line1, line2) in enumerate(zip(f1, f2)): - line1 = line1.strip() - line = line1 + '\t' + line2 - dst.write(line) - - -if __name__ == '__main__': - concat_file('dev.de-en.de', 'dev.de-en.en', 'dev') - concat_file('test.de-en.de', 'test.de-en.en', 'test') - concat_file('train.de-en.de', 'train.de-en.en', 'train') - - src_dict = cPickle.load(open('vocab.de')) - trg_dict = cPickle.load(open('vocab.en')) - - with open('src_dict', 'w') as f: - f.write('\n\nUNK\n') - f.writelines('\n'.join(src_dict.keys())) - - with open('trg_dict', 'w') as f: - f.write('\n\nUNK\n') - f.writelines('\n'.join(trg_dict.keys())) diff --git a/legacy/conv_seq2seq/reader.py b/legacy/conv_seq2seq/reader.py deleted file mode 100644 index ad420af5faade1cd5ee7ef947f7f8920ce6a8bdb..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/reader.py +++ /dev/null @@ -1,67 +0,0 @@ -#coding=utf-8 - -import random - - -def load_dict(dict_file): - word_dict = dict() - with open(dict_file, 'r') as f: - for i, line in enumerate(f): - w = line.strip().split()[0] - word_dict[w] = i - return word_dict - - -def get_reverse_dict(dictionary): - reverse_dict = {dictionary[k]: k for k in dictionary.keys()} - 
return reverse_dict - - -def load_data(data_file, src_dict, trg_dict): - UNK_IDX = src_dict['UNK'] - with open(data_file, 'r') as f: - for line in f: - line_split = line.strip().split('\t') - if len(line_split) < 2: - continue - src, trg = line_split - src_words = src.strip().split() - trg_words = trg.strip().split() - src_seq = [src_dict.get(w, UNK_IDX) for w in src_words] - trg_seq = [trg_dict.get(w, UNK_IDX) for w in trg_words] - yield src_seq, trg_seq - - -def data_reader(data_file, src_dict, trg_dict, pos_size, padding_num): - def reader(): - UNK_IDX = src_dict['UNK'] - word_padding = trg_dict.__len__() - pos_padding = pos_size - - def _get_pos(pos_list, pos_size, pos_padding): - return [pos if pos < pos_size else pos_padding for pos in pos_list] - - with open(data_file, 'r') as f: - for line in f: - line_split = line.strip().split('\t') - if len(line_split) != 2: - continue - src, trg = line_split - src = src.strip().split() - src_word = [src_dict.get(w, UNK_IDX) for w in src] - src_word_pos = range(len(src_word)) - src_word_pos = _get_pos(src_word_pos, pos_size, pos_padding) - - trg = trg.strip().split() - trg_word = [trg_dict[''] - ] + [trg_dict.get(w, UNK_IDX) for w in trg] - trg_word_pos = range(len(trg_word)) - trg_word_pos = _get_pos(trg_word_pos, pos_size, pos_padding) - - trg_next_word = trg_word[1:] + [trg_dict['']] - trg_word = [word_padding] * padding_num + trg_word - trg_word_pos = [pos_padding] * padding_num + trg_word_pos - trg_next_word = trg_next_word + [trg_dict['']] * padding_num - yield src_word, src_word_pos, trg_word, trg_word_pos, trg_next_word - - return reader diff --git a/legacy/conv_seq2seq/train.py b/legacy/conv_seq2seq/train.py deleted file mode 100644 index 4bd9a1af675ada5820bb375938a4675e6e71fbe1..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/train.py +++ /dev/null @@ -1,263 +0,0 @@ -#coding=utf-8 - -import os -import sys -import time -import argparse -import distutils.util -import gzip -import numpy as np - -import paddle.v2 as paddle -from model import conv_seq2seq -import reader - - -def parse_args(): - parser = argparse.ArgumentParser( - description="PaddlePaddle Convolutional Seq2Seq") - parser.add_argument( - '--train_data_path', - type=str, - required=True, - help="Path of the training set") - parser.add_argument( - '--test_data_path', type=str, help='Path of the test set') - parser.add_argument( - '--src_dict_path', - type=str, - required=True, - help='Path of source dictionary') - parser.add_argument( - '--trg_dict_path', - type=str, - required=True, - help='Path of target dictionary') - parser.add_argument( - '--enc_blocks', type=str, help='Convolution blocks of the encoder') - parser.add_argument( - '--dec_blocks', type=str, help='Convolution blocks of the decoder') - parser.add_argument( - '--emb_size', - type=int, - default=256, - help='Dimension of word embedding. (default: %(default)s)') - parser.add_argument( - '--pos_size', - type=int, - default=200, - help='Total number of the position indexes. (default: %(default)s)') - parser.add_argument( - '--drop_rate', - type=float, - default=0., - help='Dropout rate. (default: %(default)s)') - parser.add_argument( - "--use_bn", - default=False, - type=distutils.util.strtobool, - help="Use batch normalization or not. (default: %(default)s)") - parser.add_argument( - "--use_gpu", - default=False, - type=distutils.util.strtobool, - help="Use gpu or not. (default: %(default)s)") - parser.add_argument( - "--trainer_count", - default=1, - type=int, - help="Trainer number. 
(default: %(default)s)") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help="Size of a mini-batch. (default: %(default)s)") - parser.add_argument( - '--num_passes', - type=int, - default=15, - help="Number of passes to train. (default: %(default)s)") - return parser.parse_args() - - -def create_reader(padding_num, - train_data_path, - test_data_path=None, - src_dict=None, - trg_dict=None, - pos_size=200, - batch_size=32): - - train_reader = paddle.batch( - reader=paddle.reader.shuffle( - reader=reader.data_reader( - data_file=train_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num), - buf_size=10240), - batch_size=batch_size) - - test_reader = None - if test_data_path: - test_reader = paddle.batch( - reader=paddle.reader.shuffle( - reader=reader.data_reader( - data_file=test_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num), - buf_size=10240), - batch_size=batch_size) - - return train_reader, test_reader - - -def train(train_data_path, - test_data_path, - src_dict_path, - trg_dict_path, - enc_conv_blocks, - dec_conv_blocks, - emb_dim=256, - pos_size=200, - drop_rate=0., - use_bn=False, - batch_size=32, - num_passes=15): - """ - Train the convolution sequence-to-sequence model. - - :param train_data_path: The path of the training set. - :type train_data_path: str - :param test_data_path: The path of the test set. - :type test_data_path: str - :param src_dict_path: The path of the source dictionary. - :type src_dict_path: str - :param trg_dict_path: The path of the target dictionary. - :type trg_dict_path: str - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type dec_conv_blocks: list of tuple - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param use_bn: Whether to use batch normalization or not. False is the default value. - :type use_bn: bool - :param batch_size: The size of a mini-batch. - :type batch_size: int - :param num_passes: The total number of the passes to train. 
- :type num_passes: int - """ - # load dict - src_dict = reader.load_dict(src_dict_path) - trg_dict = reader.load_dict(trg_dict_path) - src_dict_size = src_dict.__len__() - trg_dict_size = trg_dict.__len__() - - optimizer = paddle.optimizer.Adam(learning_rate=1e-3, ) - - cost = conv_seq2seq( - src_dict_size=src_dict_size, - trg_dict_size=trg_dict_size, - pos_size=pos_size, - emb_dim=emb_dim, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - drop_rate=drop_rate, - with_bn=use_bn, - is_infer=False) - - # create parameters and trainer - parameters = paddle.parameters.create(cost) - trainer = paddle.trainer.SGD(cost=cost, - parameters=parameters, - update_equation=optimizer) - - padding_list = [context_len - 1 for (size, context_len) in dec_conv_blocks] - padding_num = reduce(lambda x, y: x + y, padding_list) - train_reader, test_reader = create_reader( - padding_num=padding_num, - train_data_path=train_data_path, - test_data_path=test_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - batch_size=batch_size) - - feeding = { - 'src_word': 0, - 'src_word_pos': 1, - 'trg_word': 2, - 'trg_word_pos': 3, - 'trg_next_word': 4 - } - - # create event handler - def event_handler(event): - if isinstance(event, paddle.event.EndIteration): - if event.batch_id % 20 == 0: - cur_time = time.strftime('%Y.%m.%d %H:%M:%S', time.localtime()) - print "[%s]: Pass: %d, Batch: %d, TrainCost: %f, %s" % ( - cur_time, event.pass_id, event.batch_id, event.cost, - event.metrics) - sys.stdout.flush() - - if isinstance(event, paddle.event.EndPass): - if test_reader is not None: - cur_time = time.strftime('%Y.%m.%d %H:%M:%S', time.localtime()) - result = trainer.test(reader=test_reader, feeding=feeding) - print "[%s]: Pass: %d, TestCost: %f, %s" % ( - cur_time, event.pass_id, result.cost, result.metrics) - sys.stdout.flush() - with gzip.open("output/params.pass-%d.tar.gz" % event.pass_id, - 'w') as f: - trainer.save_parameter_to_tar(f) - - if not os.path.exists('output'): - os.mkdir('output') - - trainer.train( - reader=train_reader, - event_handler=event_handler, - num_passes=num_passes, - feeding=feeding) - - -def main(): - args = parse_args() - enc_conv_blocks = eval(args.enc_blocks) - dec_conv_blocks = eval(args.dec_blocks) - - sys.setrecursionlimit(10000) - - paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count) - - train( - train_data_path=args.train_data_path, - test_data_path=args.test_data_path, - src_dict_path=args.src_dict_path, - trg_dict_path=args.trg_dict_path, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - emb_dim=args.emb_size, - pos_size=args.pos_size, - drop_rate=args.drop_rate, - use_bn=args.use_bn, - batch_size=args.batch_size, - num_passes=args.num_passes) - - -if __name__ == '__main__': - main() diff --git a/legacy/ctr/README.cn.md b/legacy/ctr/README.cn.md deleted file mode 100644 index d717264c46529c4ca3be6500983558b0384a7d77..0000000000000000000000000000000000000000 --- a/legacy/ctr/README.cn.md +++ /dev/null @@ -1,369 +0,0 @@ -运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - ---- - -# 点击率预估 - -以下是本例目录包含的文件以及对应说明: - -``` -├── README.md # 本教程markdown 文档 -├── dataset.md # 数据集处理教程 -├── images # 本教程图片目录 -│   ├── lr_vs_dnn.jpg -│   └── wide_deep.png -├── infer.py # 预测脚本 -├── network_conf.py # 模型网络配置 -├── reader.py # data reader -├── train.py # 训练脚本 -└── utils.py # helper 
functions -└── avazu_data_processer.py # 示例数据预处理脚本 -``` - -## 背景介绍 - -CTR(Click-Through Rate,点击率预估)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] -是对用户点击一个特定链接的概率做出预测,是广告投放过程中的一个重要环节。精准的点击率预估对在线广告系统收益最大化具有重要意义。 - -当有多个广告位时,CTR 预估一般会作为排序的基准,比如在搜索引擎的广告系统里,当用户输入一个带商业价值的搜索词(query)时,系统大体上会执行下列步骤来展示广告: - -1. 获取与用户搜索词相关的广告集合 -2. 业务规则和相关性过滤 -3. 根据拍卖机制和 CTR 排序 -4. 展出广告 - -可以看到,CTR 在最终排序中起到了很重要的作用。 - -### 发展阶段 -在业内,CTR 模型经历了如下的发展阶段: - -- Logistic Regression(LR) / GBDT + 特征工程 -- LR + DNN 特征 -- DNN + 特征工程 - -在发展早期时 LR 一统天下,但最近 DNN 模型由于其强大的学习能力和逐渐成熟的性能优化, -逐渐地接过 CTR 预估任务的大旗。 - - -### LR vs DNN - -下图展示了 LR 和一个 \(3x2\) 的 DNN 模型的结构: - -

![LR vs DNN](images/lr_vs_dnn.jpg)

Figure 1. LR 和 DNN 模型结构对比
LR 的蓝色箭头部分可以直接类比到 DNN 中对应的结构,可以看到 LR 和 DNN 有一些共通之处(比如权重累加);但前者的模型复杂度在相同输入维度下比后者低很多(从某方面讲,模型越复杂,越有潜力学习到更复杂的信息);如果 LR 要达到匹敌 DNN 的学习能力,必须增加输入的维度,也就是增加特征的数量,这也就是为何 LR 和大规模的特征工程必须绑定在一起的原因。

LR 对于 DNN 模型的优势是对大规模稀疏特征的容纳能力,包括内存和计算量等方面,工业界都有非常成熟的优化方法;而 DNN 模型具有自己学习新特征的能力,一定程度上能够提升特征使用的效率,这使得 DNN 模型在同样规模特征的情况下,更有可能达到更好的学习效果。

本文后面的章节会演示如何使用 PaddlePaddle 编写一个结合两者优点的模型。

## 数据和任务抽象

我们可以将 `click` 作为学习目标,任务可以有以下几种方案:

1. 直接学习 click,0,1 作二元分类
2. Learning to rank,具体用 pairwise rank(标签 1>0)或者 listwise rank
3. 统计每个广告的点击率,将同一个 query 下的广告两两组合,点击率高的>点击率低的,做 rank 或者分类

我们直接使用第一种方法做分类任务。

我们使用 Kaggle 上 `Click-through rate prediction` 任务的数据集\[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\] 来演示本例中的模型。

具体的特征处理方法参看 [data process](./dataset.md)。

本教程中演示模型的输入格式如下:

```
# <dnn input ids> \t <lr input sparse values> \t click
1 23 190 \t 230:0.12 3421:0.9 23451:0.12 \t 0
23 231 \t 1230:0.12 13421:0.9 \t 1
```

详细的格式描述如下:

- `dnn input ids` 采用 one-hot 表示,只需要填写值为1的ID(注意这里不是变长输入)
- `lr input sparse values` 使用了 `ID:VALUE` 的表示,值部分最好规约到值域 `[-1, 1]`。

此外,模型训练时需要传入一个文件描述 dnn 和 lr 两个子模型的输入维度,文件的格式如下:

```
dnn_input_dim: <int>
lr_input_dim: <int>
```

其中,`<int>` 表示一个整型数值。

本目录下的 `avazu_data_processer.py` 可以对下载的演示数据集\[[2](#参考文献)\] 进行处理,具体使用方法参考如下说明:

```
usage: avazu_data_processer.py [-h] --data_path DATA_PATH --output_dir
                               OUTPUT_DIR
                               [--num_lines_to_detect NUM_LINES_TO_DETECT]
                               [--test_set_size TEST_SET_SIZE]
                               [--train_size TRAIN_SIZE]

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --data_path DATA_PATH
                        path of the Avazu dataset
  --output_dir OUTPUT_DIR
                        directory to output
  --num_lines_to_detect NUM_LINES_TO_DETECT
                        number of records to detect dataset's meta info
  --test_set_size TEST_SET_SIZE
                        size of the validation dataset(default: 10000)
  --train_size TRAIN_SIZE
                        size of the trainset (default: 100000)
```

- `data_path` 是待处理的数据路径
- `output_dir` 生成数据的输出路径
- `num_lines_to_detect` 预先扫描数据生成ID的个数,这里是扫描的文件行数
- `test_set_size` 生成测试集的行数
- `train_size` 生成训练集的行数

## Wide & Deep Learning Model

谷歌在 16 年提出了 Wide & Deep Learning 的模型框架,用于融合适合学习抽象特征的 DNN 和适用于大规模稀疏特征的 LR 两种模型的优点。

### 模型简介

Wide & Deep Learning Model\[[3](#参考文献)\] 可以作为一种相对成熟的模型框架使用,在 CTR 预估的任务中工业界也有一定的应用,因此本文将演示使用此模型来完成 CTR 预估的任务。

模型结构如下:

![Wide & Deep](images/wide_deep.png)

Figure 2. Wide & Deep Model
模型上边的 Wide 部分,可以容纳大规模稀疏特征,并且对一些特定的信息(比如 ID)有一定的记忆能力;而模型下边的 Deep 部分,能够学习特征间的隐含关系,在相同数量的特征下有更好的学习和推导能力。

### 编写模型输入

模型只接受 3 个输入,分别是

- `dnn_input`,也就是 Deep 部分的输入
- `lr_input`,也就是 Wide 部分的输入
- `click`,点击与否,作为二分类模型学习的标签

```python
dnn_merged_input = layer.data(
    name='dnn_input',
    type=paddle.data_type.sparse_binary_vector(data_meta_info['dnn_input']))

lr_merged_input = layer.data(
    name='lr_input',
    type=paddle.data_type.sparse_binary_vector(data_meta_info['lr_input']))

click = paddle.layer.data(name='click', type=dtype.dense_vector(1))
```

### 编写 Wide 部分

Wide 部分直接使用了 LR 模型,但激活函数改成了 `RELU` 来加速。

```python
def build_lr_submodel():
    fc = layer.fc(
        input=lr_merged_input, size=1, name='lr', act=paddle.activation.Relu())
    return fc
```

### 编写 Deep 部分

Deep 部分使用了标准的多层前向传导的 DNN 模型。

```python
def build_dnn_submodel(dnn_layer_dims):
    dnn_embedding = layer.fc(input=dnn_merged_input, size=dnn_layer_dims[0])
    _input_layer = dnn_embedding
    for i, dim in enumerate(dnn_layer_dims[1:]):
        fc = layer.fc(
            input=_input_layer,
            size=dim,
            act=paddle.activation.Relu(),
            name='dnn-fc-%d' % i)
        _input_layer = fc
    return _input_layer
```

### 两者融合

两个 submodel 的最上层输出加权求和得到整个模型的输出,输出部分使用 `sigmoid` 作为激活函数,得到区间 (0,1) 的预测值,来逼近训练数据中二元类别的分布,并最终作为 CTR 预估的值使用。

```python
# combine DNN and LR submodels
def combine_submodels(dnn, lr):
    merge_layer = layer.concat(input=[dnn, lr])
    fc = layer.fc(
        input=merge_layer,
        size=1,
        name='output',
        # use sigmoid function to approximate ctr, which is a float value between 0 and 1.
        act=paddle.activation.Sigmoid())
    return fc
```

### 训练任务的定义
```python
dnn = build_dnn_submodel(dnn_layer_dims)
lr = build_lr_submodel()
output = combine_submodels(dnn, lr)

# ==============================================================================
# cost and train period
# ==============================================================================
classification_cost = paddle.layer.multi_binary_label_cross_entropy_cost(
    input=output, label=click)


paddle.init(use_gpu=False, trainer_count=11)

params = paddle.parameters.create(classification_cost)

optimizer = paddle.optimizer.Momentum(momentum=0)

trainer = paddle.trainer.SGD(
    cost=classification_cost, parameters=params, update_equation=optimizer)

dataset = AvazuDataset(train_data_path, n_records_as_test=test_set_size)

def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
            logging.warning("Pass %d, Samples %d, Cost %f" % (
                event.pass_id, event.batch_id * batch_size, event.cost))

        if event.batch_id % 1000 == 0:
            result = trainer.test(
                reader=paddle.batch(dataset.test, batch_size=1000),
                feeding=field_index)
            logging.warning("Test %d-%d, Cost %f" % (event.pass_id, event.batch_id,
                                                     result.cost))


trainer.train(
    reader=paddle.batch(
        paddle.reader.shuffle(dataset.train, buf_size=500),
        batch_size=batch_size),
    feeding=field_index,
    event_handler=event_handler,
    num_passes=100)
```
## 运行训练和测试
训练模型需要如下步骤:

1. 准备训练数据
    1. 从 [Kaggle CTR](https://www.kaggle.com/c/avazu-ctr-prediction/data) 下载 train.gz
    2. 解压 train.gz 得到 train.txt
    3. `mkdir -p output; python avazu_data_processer.py --data_path train.txt --output_dir output --num_lines_to_detect 1000 --test_set_size 100` 生成演示数据
2. 执行 `python train.py --train_data_path ./output/train.txt --test_data_path ./output/test.txt --data_meta_file ./output/data.meta.txt --model_type=0` 开始训练

上面第2个步骤可以为 `train.py` 填充命令行参数来定制模型的训练过程,具体的命令行参数及用法如下

```
usage: train.py [-h] --train_data_path TRAIN_DATA_PATH
                [--test_data_path TEST_DATA_PATH] [--batch_size BATCH_SIZE]
                [--num_passes NUM_PASSES]
                [--model_output_prefix MODEL_OUTPUT_PREFIX] --data_meta_file
                DATA_META_FILE --model_type MODEL_TYPE

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --train_data_path TRAIN_DATA_PATH
                        path of training dataset
  --test_data_path TEST_DATA_PATH
                        path of testing dataset
  --batch_size BATCH_SIZE
                        size of mini-batch (default:10000)
  --num_passes NUM_PASSES
                        number of passes to train
  --model_output_prefix MODEL_OUTPUT_PREFIX
                        prefix of path for model to store (default:
                        ./ctr_models)
  --data_meta_file DATA_META_FILE
                        path of data meta info file
  --model_type MODEL_TYPE
                        model type, classification: 0, regression 1 (default
                        classification)
```

- `train_data_path`:训练集的路径
- `test_data_path`:测试集的路径
- `num_passes`:模型训练多少轮
- `data_meta_file`:参考[数据和任务抽象](#数据和任务抽象)的描述。
- `model_type`:模型分类或回归

## 用训好的模型做预测
训好的模型可以用来预测新的数据,预测数据的格式为

```
# <dnn input ids> \t <lr input sparse values>
1 23 190 \t 230:0.12 3421:0.9 23451:0.12
23 231 \t 1230:0.12 13421:0.9
```

这里与训练数据的格式唯一不同的地方,就是没有标签,也就是训练数据中第3列 `click` 对应的数值。

`infer.py` 的使用方法如下

```
usage: infer.py [-h] --model_gz_path MODEL_GZ_PATH --data_path DATA_PATH
                --prediction_output_path PREDICTION_OUTPUT_PATH
                [--data_meta_path DATA_META_PATH] --model_type MODEL_TYPE

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --model_gz_path MODEL_GZ_PATH
                        path of model parameters gz file
  --data_path DATA_PATH
                        path of the dataset to infer
  --prediction_output_path PREDICTION_OUTPUT_PATH
                        path to output the prediction
  --data_meta_path DATA_META_PATH
                        path of trainset's meta info, default is ./data.meta
  --model_type MODEL_TYPE
                        model type, classification: 0, regression 1 (default
                        classification)
```

- `model_gz_path`:用 `gz` 压缩过的模型路径
- `data_path`:需要预测的数据路径
- `prediction_output_path`:预测输出的路径
- `data_meta_file`:参考[数据和任务抽象](#数据和任务抽象)的描述。
- `model_type`:分类或回归

示例数据可以用如下命令预测

```
python infer.py --model_gz_path <model_path> --data_path output/infer.txt --prediction_output_path predictions.txt --data_meta_path data.meta.txt
```

最终的预测结果位于 `predictions.txt`。

## 参考文献
1. <https://en.wikipedia.org/wiki/Click-through_rate>
2. <https://www.kaggle.com/c/avazu-ctr-prediction/data>
3. Cheng H T, Koc L, Harmsen J, et al. [Wide & deep learning for recommender systems](https://arxiv.org/pdf/1606.07792.pdf)[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10.

diff --git a/legacy/ctr/README.md b/legacy/ctr/README.md deleted file mode 100644 index 9ace483be6126b31e064ce3014cea1b08664f8cf..0000000000000000000000000000000000000000 --- a/legacy/ctr/README.md +++ /dev/null @@ -1,343 +0,0 @@

The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

# Click-Through Rate Prediction

## Introduction

CTR (Click-Through Rate)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] is a prediction of the probability that a user clicks on an advertisement. This model is widely used in the advertising industry.
Accurate click-through rate estimates are important for maximizing online advertising revenue.

When there are multiple ad slots, CTR estimates are generally used as a baseline for ranking. For example, in a search engine's ad system, when the user enters a query, the system typically performs the following steps to show relevant ads:

1. Get the ad collection associated with the user's search term.
2. Filter by business rules and relevance.
3. Rank by auction mechanism and CTR.
4. Show the ads.

Here, CTR plays a crucial role.

### Brief history
Historically, the CTR prediction model has evolved as follows:

- Logistic Regression (LR) / Gradient Boosting Decision Trees (GBDT) + feature engineering
- LR + Deep Neural Network (DNN)
- DNN + feature engineering

In the early stages of development LR dominated, but in recent years DNN-based models have become dominant.

### LR vs DNN

The following figure shows the structures of the LR and DNN models:

-<p align="center">
-<img src="images/lr_vs_dnn.jpg"/>
-<br/>
-Figure 1. LR and DNN model structure comparison
-</p>
-
-We can see that LR and DNN share some common structure. However, by adding activation units and more layers, a DNN can model non-linear relations between input and output, which enables it to achieve better results in CTR estimation.
-
-In the following, we demonstrate how to use PaddlePaddle to learn to predict CTR.
-
-## Data and Model formation
-
-Here `click` is the learning objective. There are several ways to learn it.
-
-1. Learn clicks directly: binary classification with 0/1 labels.
-2. Learning to rank: pairwise rank or listwise rank.
-3. Estimate the click rate of each ad, then rank by the estimated rate.
-
-In this example, we use the first method.
-
-We use the Kaggle `Click-through rate prediction` task \[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\].
-
-Please see the [data process](./dataset.md) for pre-processing details.
-
-The input data format for the demo model in this tutorial is as follows:
-
-```
-# <dnn input ids> \t <lr input sparse values> \t click
-1 23 190 \t 230:0.12 3421:0.9 23451:0.12 \t 0
-23 231 \t 1230:0.12 13421:0.9 \t 1
-```
-
-Description:
-
-- `dnn input ids`: one-hot coded ids.
-- `lr input sparse values`: `ID:VALUE` pairs; values are preferably scaled to the range `[-1, 1]`.
-
-In addition, model training needs a meta file that describes the input dimensions of the dnn and lr submodels. Its format is as follows:
-
-```
-dnn_input_dim: <int>
-lr_input_dim: <int>
-```
-
-`<int>` represents an integer value.
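-
-For instance, a `data.meta.txt` for a dataset whose concatenated dnn features span 80 dimensions and whose lr features span 1,040,001 dimensions would read as follows (the two numbers here are purely illustrative; see [dataset.md](./dataset.md) for how the real dimensions are derived):
-
-```
-dnn_input_dim: 80
-lr_input_dim: 1040001
-```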

-`avazu_data_processer.py` can be used to pre-process the data set \[[2](#references)\].
-
-```
-usage: avazu_data_processer.py [-h] --data_path DATA_PATH --output_dir
-                               OUTPUT_DIR
-                               [--num_lines_to_detect NUM_LINES_TO_DETECT]
-                               [--test_set_size TEST_SET_SIZE]
-                               [--train_size TRAIN_SIZE]
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --data_path DATA_PATH
-                        path of the Avazu dataset
-  --output_dir OUTPUT_DIR
-                        directory to output
-  --num_lines_to_detect NUM_LINES_TO_DETECT
-                        number of records to detect dataset's meta info
-  --test_set_size TEST_SET_SIZE
-                        size of the validation dataset(default: 10000)
-  --train_size TRAIN_SIZE
-                        size of the trainset (default: 100000)
-```
-
-- `data_path`: the path of the raw dataset to process
-- `output_dir`: the directory to write the processed data to
-- `num_lines_to_detect`: the number of leading records to scan when detecting the dataset's meta info
-- `test_set_size`: the number of rows for the test set
-- `train_size`: the number of rows for the training set
-
-## Wide & Deep Learning Model
-
-Google proposed the Wide & Deep Learning framework to integrate the advantages of DNNs, which are suited to learning abstract features, with those of LR models, which handle large-scale sparse features well.
-
-
-### Introduction to the model
-
-The Wide & Deep Learning Model\[[3](#references)\] is a relatively mature model and is still widely used for CTR prediction. Here we demonstrate how to use it for the CTR prediction task.
-
-The model structure is as follows:
-
-<p align="center">
-<img src="images/wide_deep.png"/>
-<br/>
-Figure 2. Wide & Deep Model
-</p>
-
-The wide part at the top of the model can accommodate large-scale sparse features and can memorize some specific information (such as IDs); the Deep part at the bottom of the model can learn the implicit relationships between features.
-
-
-### Model Input
-
-The model has three inputs as follows.
-
-- `dnn_input`, the input of the Deep part
-- `lr_input`, the input of the Wide part
-- `click`, whether the ad is clicked or not
-
-```python
-dnn_merged_input = layer.data(
-    name='dnn_input',
-    type=paddle.data_type.sparse_binary_vector(dnn_input_dim))
-
-lr_merged_input = layer.data(
-    name='lr_input',
-    type=paddle.data_type.sparse_float_vector(lr_input_dim))
-
-click = paddle.layer.data(name='click', type=dtype.dense_vector(1))
-```
-
-### Wide part
-
-The Wide part uses the LR model, but the activation function is changed to `ReLU` for speed.
-
-```python
-def build_lr_submodel():
-    fc = layer.fc(
-        input=lr_merged_input, size=1, name='lr', act=paddle.activation.Relu())
-    return fc
-```
-
-### Deep part
-
-The Deep part uses a standard multi-layer DNN.
-
-```python
-def build_dnn_submodel(dnn_layer_dims):
-    dnn_embedding = layer.fc(input=dnn_merged_input, size=dnn_layer_dims[0])
-    _input_layer = dnn_embedding
-    for i, dim in enumerate(dnn_layer_dims[1:]):
-        fc = layer.fc(
-            input=_input_layer,
-            size=dim,
-            act=paddle.activation.Relu(),
-            name='dnn-fc-%d' % i)
-        _input_layer = fc
-    return _input_layer
-```
-
-### Combine
-
-The output section uses the `sigmoid` function to produce a prediction value in (0, 1).
-
-```python
-# combine DNN and LR submodels
-def combine_submodels(dnn, lr):
-    merge_layer = layer.concat(input=[dnn, lr])
-    fc = layer.fc(
-        input=merge_layer,
-        size=1,
-        name='output',
-        # use the sigmoid function to approximate the ctr, which is a float value between 0 and 1.
-        act=paddle.activation.Sigmoid())
-    return fc
-```
-
-### Training
-```python
-dnn = build_dnn_submodel(dnn_layer_dims)
-lr = build_lr_submodel()
-output = combine_submodels(dnn, lr)
-
-# ==============================================================================
-#                   cost and train period
-# ==============================================================================
-classification_cost = paddle.layer.multi_binary_label_cross_entropy_cost(
-    input=output, label=click)
-
-paddle.init(use_gpu=False, trainer_count=1)
-
-params = paddle.parameters.create(classification_cost)
-
-optimizer = paddle.optimizer.Momentum(momentum=0)
-
-trainer = paddle.trainer.SGD(
-    cost=classification_cost, parameters=params, update_equation=optimizer)
-
-dataset = AvazuDataset(train_data_path, n_records_as_test=test_set_size)
-
-def event_handler(event):
-    if isinstance(event, paddle.event.EndIteration):
-        if event.batch_id % 100 == 0:
-            logging.warning("Pass %d, Samples %d, Cost %f" % (
-                event.pass_id, event.batch_id * batch_size, event.cost))
-
-        if event.batch_id % 1000 == 0:
-            result = trainer.test(
-                reader=paddle.batch(dataset.test, batch_size=1000),
-                feeding=field_index)
-            logging.warning("Test %d-%d, Cost %f" % (
-                event.pass_id, event.batch_id, result.cost))
-
-trainer.train(
-    reader=paddle.batch(
-        paddle.reader.shuffle(dataset.train, buf_size=500),
-        batch_size=batch_size),
-    feeding=field_index,
-    event_handler=event_handler,
-    num_passes=100)
-```
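-
-Putting the submodels together, the score the combined network produces for one impression is, in effect (this is a sketch of the structure above, not a formula quoted from the paper):
-
-$$\hat{y} = \sigma\left(w^\top [\, d;\, l \,] + b\right), \qquad d = \mathrm{DNN}(x_{dnn}),\quad l = \mathrm{LR}(x_{lr})$$
-
-where $[\, d;\, l \,]$ denotes the concatenation performed by `combine_submodels`, and $\sigma$ is the sigmoid activation of the final fully connected layer.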
-
-## Run training and testing
-The model goes through the following steps:
-
-1. Prepare training data
-    1. Download train.gz from [Kaggle CTR](https://www.kaggle.com/c/avazu-ctr-prediction/data).
-    2. Unzip train.gz to get train.txt.
-    3. Run `mkdir -p output; python avazu_data_processer.py --data_path train.txt --output_dir output --num_lines_to_detect 1000 --test_set_size 100` to generate the demo data.
-2. Execute `python train.py --train_data_path ./output/train.txt --test_data_path ./output/test.txt --data_meta_file ./output/data.meta.txt --model_type=0` to start training.
-
-The argument options for `train.py` are as follows.
-
-```
-usage: train.py [-h] --train_data_path TRAIN_DATA_PATH
-                [--test_data_path TEST_DATA_PATH] [--batch_size BATCH_SIZE]
-                [--num_passes NUM_PASSES]
-                [--model_output_prefix MODEL_OUTPUT_PREFIX] --data_meta_file
-                DATA_META_FILE --model_type MODEL_TYPE
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --train_data_path TRAIN_DATA_PATH
-                        path of training dataset
-  --test_data_path TEST_DATA_PATH
-                        path of testing dataset
-  --batch_size BATCH_SIZE
-                        size of mini-batch (default:10000)
-  --num_passes NUM_PASSES
-                        number of passes to train
-  --model_output_prefix MODEL_OUTPUT_PREFIX
-                        prefix of path for model to store (default:
-                        ./ctr_models)
-  --data_meta_file DATA_META_FILE
-                        path of data meta info file
-  --model_type MODEL_TYPE
-                        model type, classification: 0, regression 1 (default
-                        classification)
-```
-
-- `train_data_path`: the path of the training set
-- `test_data_path`: the path of the testing set
-- `num_passes`: the number of passes (rounds) to train
-- `data_meta_file`: the path of the data meta info file; see the "Data and Model formation" section above.
-- `model_type`: model type, classification or regression
-
-
-## Use the trained model for prediction
-The trained model can be used to predict new data. The format of the prediction data is as follows.
-
-```
-# <dnn input ids> \t <lr input sparse values>
-1 23 190 \t 230:0.12 3421:0.9 23451:0.12
-23 231 \t 1230:0.12 13421:0.9
-```
-
-Here the only difference from the training data is that there are no labels (i.e. no `click` values).
-
-We can now use `infer.py` to perform inference.
-
-```
-usage: infer.py [-h] --model_gz_path MODEL_GZ_PATH --data_path DATA_PATH
-                --prediction_output_path PREDICTION_OUTPUT_PATH
-                [--data_meta_path DATA_META_PATH] --model_type MODEL_TYPE
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --model_gz_path MODEL_GZ_PATH
-                        path of model parameters gz file
-  --data_path DATA_PATH
-                        path of the dataset to infer
-  --prediction_output_path PREDICTION_OUTPUT_PATH
-                        path to output the prediction
-  --data_meta_path DATA_META_PATH
-                        path of trainset's meta info, default is ./data.meta
-  --model_type MODEL_TYPE
-                        model type, classification: 0, regression 1 (default
-                        classification)
-```
-
-- `model_gz_path`: the path of the `gz` compressed model file
-- `data_path`: the path of the dataset to predict on
-- `prediction_output_path`: the path to write the predictions to
-- `data_meta_path`: see the "Data and Model formation" section above.
-- `model_type`: classification or regression
-
-The sample data can be predicted with the following command
-
-```
-python infer.py --model_gz_path <model_path> --data_path output/infer.txt --prediction_output_path predictions.txt --data_meta_path data.meta.txt
-```
-
-The final prediction is written to `predictions.txt`.
-
-## References
-1. <https://en.wikipedia.org/wiki/Click-through_rate>
-2. <https://www.kaggle.com/c/avazu-ctr-prediction/data>
-3. Cheng H T, Koc L, Harmsen J, et al. [Wide & deep learning for recommender systems](https://arxiv.org/pdf/1606.07792.pdf)[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10.
diff --git a/legacy/ctr/avazu_data_processer.py b/legacy/ctr/avazu_data_processer.py deleted file mode 100644 index dd3c1441f8f8b26473d15889198abb3593edfa51..0000000000000000000000000000000000000000 --- a/legacy/ctr/avazu_data_processer.py +++ /dev/null @@ -1,414 +0,0 @@ -import sys -import csv -import cPickle -import argparse -import os -import numpy as np - -from utils import logger, TaskMode - -parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") -parser.add_argument( - '--data_path', type=str, required=True, help="path of the Avazu dataset") -parser.add_argument( - '--output_dir', type=str, required=True, help="directory to output") -parser.add_argument( - '--num_lines_to_detect', - type=int, - default=500000, - help="number of records to detect dataset's meta info") -parser.add_argument( - '--test_set_size', - type=int, - default=10000, - help="size of the validation dataset(default: 10000)") -parser.add_argument( - '--train_size', - type=int, - default=100000, - help="size of the trainset (default: 100000)") -args = parser.parse_args() -''' -The fields of the dataset are: - - 0. id: ad identifier - 1. click: 0/1 for non-click/click - 2. hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC. - 3. C1 -- anonymized categorical variable - 4. banner_pos - 5. site_id - 6. site_domain - 7. site_category - 8. app_id - 9. app_domain - 10. app_category - 11. device_id - 12. device_ip - 13. device_model - 14. device_type - 15. device_conn_type - 16. C14-C21 -- anonymized categorical variables - -We will treat the following fields as categorical features: - - - C1 - - banner_pos - - site_category - - app_category - - device_type - - device_conn_type - -and some other features as id features: - - - id - - site_id - - app_id - - device_id - -The `hour` field will be treated as a continuous feature and will be transformed -to one-hot representation which has 24 bits. - -This script will output 3 files: - -1. train.txt -2. test.txt -3. infer.txt - -all the files are for demo. -''' - -feature_dims = {} - -categorial_features = ( - 'C1 banner_pos site_category app_category ' + 'device_type device_conn_type' -).split() - -id_features = 'id site_id app_id device_id _device_id_cross_site_id'.split() - - -def get_all_field_names(mode=0): - ''' - @mode: int - 0 for train, 1 for test - @return: list of str - ''' - return categorial_features + ['hour'] + id_features + ['click'] \ - if mode == 0 else [] - - -class CategoryFeatureGenerator(object): - ''' - Generator category features. - - Register all records by calling `register` first, then call `gen` to generate - one-hot representation for a record. - ''' - - def __init__(self): - self.dic = {'unk': 0} - self.counter = 1 - - def register(self, key): - ''' - Register record. - ''' - if key not in self.dic: - self.dic[key] = self.counter - self.counter += 1 - - def size(self): - return len(self.dic) - - def gen(self, key): - ''' - Generate one-hot representation for a record. 
- ''' - if key not in self.dic: - res = self.dic['unk'] - else: - res = self.dic[key] - return [res] - - def __repr__(self): - return '' % len(self.dic) - - -class IDfeatureGenerator(object): - def __init__(self, max_dim, cross_fea0=None, cross_fea1=None): - ''' - @max_dim: int - Size of the id elements' space - ''' - self.max_dim = max_dim - self.cross_fea0 = cross_fea0 - self.cross_fea1 = cross_fea1 - - def gen(self, key): - ''' - Generate one-hot representation for records - ''' - return [hash(key) % self.max_dim] - - def gen_cross_fea(self, fea1, fea2): - key = str(fea1) + str(fea2) - return self.gen(key) - - def size(self): - return self.max_dim - - -class ContinuousFeatureGenerator(object): - def __init__(self, n_intervals): - self.min = sys.maxint - self.max = sys.minint - self.n_intervals = n_intervals - - def register(self, val): - self.min = min(self.minint, val) - self.max = max(self.maxint, val) - - def gen(self, val): - self.len_part = (self.max - self.min) / self.n_intervals - return (val - self.min) / self.len_part - - -# init all feature generators -fields = {} -for key in categorial_features: - fields[key] = CategoryFeatureGenerator() -for key in id_features: - # for cross features - if 'cross' in key: - feas = key[1:].split('_cross_') - fields[key] = IDfeatureGenerator(10000000, *feas) - # for normal ID features - else: - fields[key] = IDfeatureGenerator(10000) - -# used as feed_dict in PaddlePaddle -field_index = dict((key, id) - for id, key in enumerate(['dnn_input', 'lr_input', 'click'])) - - -def detect_dataset(path, topn, id_fea_space=10000): - ''' - Parse the first `topn` records to collect meta information of this dataset. - - NOTE the records should be randomly shuffled first. - ''' - # create categorical statis objects. - logger.warning('detecting dataset') - - with open(path, 'rb') as csvfile: - reader = csv.DictReader(csvfile) - for row_id, row in enumerate(reader): - if row_id > topn: - break - - for key in categorial_features: - fields[key].register(row[key]) - - for key, item in fields.items(): - feature_dims[key] = item.size() - - feature_dims['hour'] = 24 - feature_dims['click'] = 1 - - feature_dims['dnn_input'] = np.sum( - feature_dims[key] for key in categorial_features + ['hour']) + 1 - feature_dims['lr_input'] = np.sum(feature_dims[key] - for key in id_features) + 1 - return feature_dims - - -def load_data_meta(meta_path): - ''' - Load dataset's meta infomation. - ''' - feature_dims, fields = cPickle.load(open(meta_path, 'rb')) - return feature_dims, fields - - -def concat_sparse_vectors(inputs, dims): - ''' - Concaterate more than one sparse vectors into one. - - @inputs: list - list of sparse vector - @dims: list of int - dimention of each sparse vector - ''' - res = [] - assert len(inputs) == len(dims) - start = 0 - for no, vec in enumerate(inputs): - for v in vec: - res.append(v + start) - start += dims[no] - return res - - -class AvazuDataset(object): - ''' - Load AVAZU dataset as train set. - ''' - - def __init__(self, - train_path, - n_records_as_test=-1, - fields=None, - feature_dims=None): - self.train_path = train_path - self.n_records_as_test = n_records_as_test - self.fields = fields - # default is train mode. - self.mode = TaskMode.create_train() - - self.categorial_dims = [ - feature_dims[key] for key in categorial_features + ['hour'] - ] - self.id_dims = [feature_dims[key] for key in id_features] - - def train(self): - ''' - Load trainset. 
- ''' - logger.info("load trainset from %s" % self.train_path) - self.mode = TaskMode.create_train() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - # skip top n lines - if self.n_records_as_test > 0 and row_id < self.n_records_as_test: - continue - - rcd = self._parse_record(row) - if rcd: - yield rcd - - def test(self): - ''' - Load testset. - ''' - logger.info("load testset from %s" % self.train_path) - self.mode = TaskMode.create_test() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - # skip top n lines - if self.n_records_as_test > 0 and row_id > self.n_records_as_test: - break - - rcd = self._parse_record(row) - if rcd: - yield rcd - - def infer(self): - ''' - Load inferset. - ''' - logger.info("load inferset from %s" % self.train_path) - self.mode = TaskMode.create_infer() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - rcd = self._parse_record(row) - if rcd: - yield rcd - - def _parse_record(self, row): - ''' - Parse a CSV row and get a record. - ''' - record = [] - for key in categorial_features: - record.append(self.fields[key].gen(row[key])) - record.append([int(row['hour'][-2:])]) - dense_input = concat_sparse_vectors(record, self.categorial_dims) - - record = [] - for key in id_features: - if 'cross' not in key: - record.append(self.fields[key].gen(row[key])) - else: - fea0 = self.fields[key].cross_fea0 - fea1 = self.fields[key].cross_fea1 - record.append(self.fields[key].gen_cross_fea(row[fea0], row[ - fea1])) - - sparse_input = concat_sparse_vectors(record, self.id_dims) - - record = [dense_input, sparse_input] - - if not self.mode.is_infer(): - record.append(list((int(row['click']), ))) - return record - - -def ids2dense(vec, dim): - return vec - - -def ids2sparse(vec): - return ["%d:1" % x for x in vec] - - -detect_dataset(args.data_path, args.num_lines_to_detect) -dataset = AvazuDataset( - args.data_path, - args.test_set_size, - fields=fields, - feature_dims=feature_dims) - -output_trainset_path = os.path.join(args.output_dir, 'train.txt') -output_testset_path = os.path.join(args.output_dir, 'test.txt') -output_infer_path = os.path.join(args.output_dir, 'infer.txt') -output_meta_path = os.path.join(args.output_dir, 'data.meta.txt') - -with open(output_trainset_path, 'w') as f: - for id, record in enumerate(dataset.train()): - if id and id % 10000 == 0: - logger.info("load %d records" % id) - if id > args.train_size: - break - dnn_input, lr_input, click = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\t%d\n" % (' '.join(map(str, dnn_input)), - ' '.join(map(str, lr_input)), click[0]) - f.write(line) - logger.info('write to %s' % output_trainset_path) - -with open(output_testset_path, 'w') as f: - for id, record in enumerate(dataset.test()): - dnn_input, lr_input, click = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\t%d\n" % (' '.join(map(str, dnn_input)), - ' '.join(map(str, lr_input)), click[0]) - f.write(line) - logger.info('write to %s' % output_testset_path) - -with open(output_infer_path, 'w') as f: - for id, record in enumerate(dataset.infer()): - dnn_input, lr_input = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\n" % ( - ' '.join(map(str, dnn_input)), - ' '.join(map(str, 
lr_input)), ) - f.write(line) - if id > args.test_set_size: - break - logger.info('write to %s' % output_infer_path) - -with open(output_meta_path, 'w') as f: - lines = [ - "dnn_input_dim: %d" % feature_dims['dnn_input'], - "lr_input_dim: %d" % feature_dims['lr_input'] - ] - f.write('\n'.join(lines)) - logger.info('write data meta into %s' % output_meta_path) diff --git a/legacy/ctr/dataset.md b/legacy/ctr/dataset.md deleted file mode 100644 index 16c0f9784bf3409ac5bbe704f932a9b28680fbf8..0000000000000000000000000000000000000000 --- a/legacy/ctr/dataset.md +++ /dev/null @@ -1,296 +0,0 @@ -# 数据及处理 -## 数据集介绍 - -本教程演示使用Kaggle上CTR任务的数据集\[[3](#参考文献)\]的预处理方法,最终产生本模型需要的格式,详细的数据格式参考[README.md](./README.md)。 - -Wide && Deep Model\[[2](#参考文献)\]的优势是融合稠密特征和大规模稀疏特征, -因此特征处理方面也针对稠密和稀疏两种特征作处理, -其中Deep部分的稠密值全部转化为ID类特征, -通过embedding 来转化为稠密的向量输入;Wide部分主要通过ID的叉乘提升维度。 - -数据集使用 `csv` 格式存储,其中各个字段内容如下: - -- `id` : ad identifier -- `click` : 0/1 for non-click/click -- `hour` : format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC. -- `C1` : anonymized categorical variable -- `banner_pos` -- `site_id` -- `site_domain` -- `site_category` -- `app_id` -- `app_domain` -- `app_category` -- `device_id` -- `device_ip` -- `device_model` -- `device_type` -- `device_conn_type` -- `C14-C21` : anonymized categorical variables - - -## 特征提取 - -下面我们会简单演示几种特征的提取方式。 - -原始数据中的特征可以分为以下几类: - -1. ID 类特征(稀疏,数量多) -- `id` -- `site_id` -- `app_id` -- `device_id` - -2. 类别类特征(稀疏,但数量有限) - -- `C1` -- `site_category` -- `device_type` -- `C14-C21` - -3. 数值型特征转化为类别型特征 - -- hour (可以转化成数值,也可以按小时为单位转化为类别) - -### 类别类特征 - -类别类特征的提取方法有以下两种: - -1. One-hot 表示作为特征 -2. 类似词向量,用一个 Embedding 将每个类别映射到对应的向量 - - -### ID 类特征 - -ID 类特征的特点是稀疏数据,但量比较大,直接使用 One-hot 表示时维度过大。 - -一般会作如下处理: - -1. 确定表示的最大维度 N -2. newid = id % N -3. 用 newid 作为类别类特征使用 - -上面的方法尽管存在一定的碰撞概率,但能够处理任意数量的 ID 特征,并保留一定的效果\[[2](#参考文献)\]。 - -### 数值型特征 - -一般会做如下处理: - -- 归一化,直接作为特征输入模型 -- 用区间分割处理成类别类特征,稀疏化表示,模糊细微上的差别 - -## 特征处理 - - -### 类别型特征 - -类别型特征有有限多种值,在模型中,我们一般使用 Embedding将每种值映射为连续值的向量。 - -这种特征在输入到模型时,一般使用 One-hot 表示,相关处理方法如下: - -```python -class CategoryFeatureGenerator(object): - ''' - Generator category features. - - Register all records by calling ~register~ first, then call ~gen~ to generate - one-hot representation for a record. - ''' - - def __init__(self): - self.dic = {'unk': 0} - self.counter = 1 - - def register(self, key): - ''' - Register record. - ''' - if key not in self.dic: - self.dic[key] = self.counter - self.counter += 1 - - def size(self): - return len(self.dic) - - def gen(self, key): - ''' - Generate one-hot representation for a record. - ''' - if key not in self.dic: - res = self.dic['unk'] - else: - res = self.dic[key] - return [res] - - def __repr__(self): - return '' % len(self.dic) -``` - -`CategoryFeatureGenerator` 需要先扫描数据集,得到该类别对应的项集合,之后才能开始生成特征。 - -我们的实验数据集\[[3](https://www.kaggle.com/c/avazu-ctr-prediction/data)\]已经经过shuffle,可以扫描前面一定数目的记录来近似总的类别项集合(等价于随机抽样), -对于没有抽样上的低频类别项,可以用一个 UNK 的特殊值表示。 - -```python -fields = {} -for key in categorial_features: - fields[key] = CategoryFeatureGenerator() - -def detect_dataset(path, topn, id_fea_space=10000): - ''' - Parse the first `topn` records to collect meta information of this dataset. - - NOTE the records should be randomly shuffled first. - ''' - # create categorical statis objects. 
- - with open(path, 'rb') as csvfile: - reader = csv.DictReader(csvfile) - for row_id, row in enumerate(reader): - if row_id > topn: - break - - for key in categorial_features: - fields[key].register(row[key]) -``` - -`CategoryFeatureGenerator` 在注册得到数据集中对应类别信息后,可以对相应记录生成对应的特征表示: - -```python -record = [] -for key in categorial_features: - record.append(fields[key].gen(row[key])) -``` - -本任务中,类别类特征会输入到 DNN 中使用。 - -### ID 类特征 - -ID 类特征代稀疏值,且值的空间很大的情况,一般用模操作规约到一个有限空间, -之后可以当成类别类特征使用,这里我们会将 ID 类特征输入到 LR 模型中使用。 - -```python -class IDfeatureGenerator(object): - def __init__(self, max_dim): - ''' - @max_dim: int - Size of the id elements' space - ''' - self.max_dim = max_dim - - def gen(self, key): - ''' - Generate one-hot representation for records - ''' - return [hash(key) % self.max_dim] - - def size(self): - return self.max_dim -``` - -`IDfeatureGenerator` 不需要预先初始化,可以直接生成特征,比如 - -```python -record = [] -for key in id_features: - if 'cross' not in key: - record.append(fields[key].gen(row[key])) -``` - -### 交叉类特征 - -LR 模型作为 Wide & Deep model 的 `wide` 部分,可以输入很 wide 的数据(特征空间的维度很大), -为了充分利用这个优势,我们将演示交叉组合特征构建成更大维度特征的情况,之后塞入到模型中训练。 - -这里我们依旧使用模操作来约束最终组合出的特征空间的大小,具体实现是直接在 `IDfeatureGenerator` 中添加一个 `gen_cross_feature` 的方法: - -```python -def gen_cross_fea(self, fea1, fea2): - key = str(fea1) + str(fea2) - return self.gen(key) -``` - -比如,我们觉得原始数据中, `device_id` 和 `site_id` 有一些关联(比如某个 device 倾向于浏览特定 site), -我们通过组合出两者组合来捕捉这类信息。 - -```python -fea0 = fields[key].cross_fea0 -fea1 = fields[key].cross_fea1 -record.append( - fields[key].gen_cross_fea(row[fea0], row[fea1])) -``` - -### 特征维度 -#### Deep submodel(DNN)特征 -| feature | dimention | -|------------------|-----------| -| app_category | 21 | -| site_category | 22 | -| device_conn_type | 5 | -| hour | 24 | -| banner_pos | 7 | -| **Total** | 79 | - -#### Wide submodel(LR)特征 -| Feature | Dimention | -|---------------------|-----------| -| id | 10000 | -| site_id | 10000 | -| app_id | 10000 | -| device_id | 10000 | -| device_id X site_id | 1000000 | -| **Total** | 1,040,000 | - -## 输入到 PaddlePaddle 中 - -Deep 和 Wide 两部分均以 `sparse_binary_vector` 的格式 \[[1](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/api/v1/data_provider/pydataprovider2_en.rst)\] 输入,输入前需要将相关特征拼合,模型最终只接受 3 个 input, -分别是 - -1. `dnn input` ,DNN 的输入 -2. `lr input` , LR 的输入 -3. `click` , 标签 - -拼合特征的方法: - -```python -def concat_sparse_vectors(inputs, dims): - ''' - concaterate sparse vectors into one - - @inputs: list - list of sparse vector - @dims: list of int - dimention of each sparse vector - ''' - res = [] - assert len(inputs) == len(dims) - start = 0 - for no, vec in enumerate(inputs): - for v in vec: - res.append(v + start) - start += dims[no] - return res -``` - -生成最终特征的代码如下: - -```python -# dimentions of the features -categorial_dims = [ - feature_dims[key] for key in categorial_features + ['hour'] -] -id_dims = [feature_dims[key] for key in id_features] - -dense_input = concat_sparse_vectors(record, categorial_dims) -sparse_input = concat_sparse_vectors(record, id_dims) - -record = [dense_input, sparse_input] -record.append(list((int(row['click']), ))) -yield record -``` - -## 参考文献 - -1. -2. Mikolov T, Deoras A, Povey D, et al. [Strategies for training large scale neural network language models](https://www.researchgate.net/profile/Lukas_Burget/publication/241637478_Strategies_for_training_large_scale_neural_network_language_models/links/542c14960cf27e39fa922ed3.pdf)[C]//Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011: 196-201. -3. 
diff --git a/legacy/ctr/images/lr_vs_dnn.jpg b/legacy/ctr/images/lr_vs_dnn.jpg deleted file mode 100644 index 50a0db583cd9b6e1a5bc0f83a28ab6e22d649931..0000000000000000000000000000000000000000 Binary files a/legacy/ctr/images/lr_vs_dnn.jpg and /dev/null differ diff --git a/legacy/ctr/images/wide_deep.png b/legacy/ctr/images/wide_deep.png deleted file mode 100644 index 616f88cb22607c1c6bcbe4312644f632ef284e8e..0000000000000000000000000000000000000000 Binary files a/legacy/ctr/images/wide_deep.png and /dev/null differ diff --git a/legacy/ctr/infer.py b/legacy/ctr/infer.py deleted file mode 100644 index 6541c74638df63a9304989c2ccaff0ff4c00463a..0000000000000000000000000000000000000000 --- a/legacy/ctr/infer.py +++ /dev/null @@ -1,79 +0,0 @@ -import gzip -import argparse -import itertools - -import paddle.v2 as paddle -import network_conf -from train import dnn_layer_dims -import reader -from utils import logger, ModelType - -parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") -parser.add_argument( - '--model_gz_path', - type=str, - required=True, - help="path of model parameters gz file") -parser.add_argument( - '--data_path', type=str, required=True, help="path of the dataset to infer") -parser.add_argument( - '--prediction_output_path', - type=str, - required=True, - help="path to output the prediction") -parser.add_argument( - '--data_meta_path', - type=str, - default="./data.meta", - help="path of trainset's meta info, default is ./data.meta") -parser.add_argument( - '--model_type', - type=int, - required=True, - default=ModelType.CLASSIFICATION, - help='model type, classification: %d, regression %d (default classification)' - % (ModelType.CLASSIFICATION, ModelType.REGRESSION)) - -args = parser.parse_args() - -paddle.init(use_gpu=False, trainer_count=1) - - -class CTRInferer(object): - def __init__(self, param_path): - logger.info("create CTR model") - dnn_input_dim, lr_input_dim = reader.load_data_meta(args.data_meta_path) - # create the mdoel - self.ctr_model = network_conf.CTRmodel( - dnn_layer_dims, - dnn_input_dim, - lr_input_dim, - model_type=ModelType(args.model_type), - is_infer=True) - # load parameter - logger.info("load model parameters from %s" % param_path) - self.parameters = paddle.parameters.Parameters.from_tar( - gzip.open(param_path, 'r')) - self.inferer = paddle.inference.Inference( - output_layer=self.ctr_model.model, - parameters=self.parameters, ) - - def infer(self, data_path): - logger.info("infer data...") - dataset = reader.Dataset() - infer_reader = paddle.batch( - dataset.infer(args.data_path), batch_size=1000) - logger.warning('write predictions to %s' % args.prediction_output_path) - output_f = open(args.prediction_output_path, 'w') - for id, batch in enumerate(infer_reader()): - res = self.inferer.infer(input=batch) - predictions = [x for x in itertools.chain.from_iterable(res)] - assert len(batch) == len( - predictions), "predict error, %d inputs, but %d predictions" % ( - len(batch), len(predictions)) - output_f.write('\n'.join(map(str, predictions)) + '\n') - - -if __name__ == '__main__': - ctr_inferer = CTRInferer(args.model_gz_path) - ctr_inferer.infer(args.data_path) diff --git a/legacy/ctr/network_conf.py b/legacy/ctr/network_conf.py deleted file mode 100644 index bcff49ee05e1d8cc80e2fdd28a771bf9bf9502e3..0000000000000000000000000000000000000000 --- a/legacy/ctr/network_conf.py +++ /dev/null @@ -1,104 +0,0 @@ -import paddle.v2 as paddle -from paddle.v2 import layer -from paddle.v2 import data_type as dtype -from utils import 
logger, ModelType - - -class CTRmodel(object): - ''' - A CTR model which implements wide && deep learning model. - ''' - - def __init__(self, - dnn_layer_dims, - dnn_input_dim, - lr_input_dim, - model_type=ModelType.create_classification(), - is_infer=False): - ''' - @dnn_layer_dims: list of integer - dims of each layer in dnn - @dnn_input_dim: int - size of dnn's input layer - @lr_input_dim: int - size of lr's input layer - @is_infer: bool - whether to build a infer model - ''' - self.dnn_layer_dims = dnn_layer_dims - self.dnn_input_dim = dnn_input_dim - self.lr_input_dim = lr_input_dim - self.model_type = model_type - self.is_infer = is_infer - - self._declare_input_layers() - - self.dnn = self._build_dnn_submodel_(self.dnn_layer_dims) - self.lr = self._build_lr_submodel_() - - # model's prediction - # TODO(superjom) rename it to prediction - if self.model_type.is_classification(): - self.model = self._build_classification_model(self.dnn, self.lr) - if self.model_type.is_regression(): - self.model = self._build_regression_model(self.dnn, self.lr) - - def _declare_input_layers(self): - self.dnn_merged_input = layer.data( - name='dnn_input', - type=paddle.data_type.sparse_binary_vector(self.dnn_input_dim)) - - self.lr_merged_input = layer.data( - name='lr_input', - type=paddle.data_type.sparse_float_vector(self.lr_input_dim)) - - if not self.is_infer: - self.click = paddle.layer.data( - name='click', type=dtype.dense_vector(1)) - - def _build_dnn_submodel_(self, dnn_layer_dims): - ''' - build DNN submodel. - ''' - dnn_embedding = layer.fc(input=self.dnn_merged_input, - size=dnn_layer_dims[0]) - _input_layer = dnn_embedding - for i, dim in enumerate(dnn_layer_dims[1:]): - fc = layer.fc(input=_input_layer, - size=dim, - act=paddle.activation.Relu(), - name='dnn-fc-%d' % i) - _input_layer = fc - return _input_layer - - def _build_lr_submodel_(self): - ''' - config LR submodel - ''' - fc = layer.fc(input=self.lr_merged_input, - size=1, - act=paddle.activation.Relu()) - return fc - - def _build_classification_model(self, dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - self.output = layer.fc( - input=merge_layer, - size=1, - # use sigmoid function to approximate ctr rate, a float value between 0 and 1. - act=paddle.activation.Sigmoid()) - - if not self.is_infer: - self.train_cost = paddle.layer.multi_binary_label_cross_entropy_cost( - input=self.output, label=self.click) - return self.output - - def _build_regression_model(self, dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - self.output = layer.fc(input=merge_layer, - size=1, - act=paddle.activation.Sigmoid()) - if not self.is_infer: - self.train_cost = paddle.layer.square_error_cost( - input=self.output, label=self.click) - return self.output diff --git a/legacy/ctr/reader.py b/legacy/ctr/reader.py deleted file mode 100644 index cafa2349ed0e51a8de65dbeeea8b345edcf0a879..0000000000000000000000000000000000000000 --- a/legacy/ctr/reader.py +++ /dev/null @@ -1,64 +0,0 @@ -from utils import logger, TaskMode, load_dnn_input_record, load_lr_input_record - -feeding_index = {'dnn_input': 0, 'lr_input': 1, 'click': 2} - - -class Dataset(object): - def train(self, path): - ''' - Load trainset. - ''' - logger.info("load trainset from %s" % path) - mode = TaskMode.create_train() - return self._parse_creator(path, mode) - - def test(self, path): - ''' - Load testset. 
- ''' - logger.info("load testset from %s" % path) - mode = TaskMode.create_test() - return self._parse_creator(path, mode) - - def infer(self, path): - ''' - Load infer set. - ''' - logger.info("load inferset from %s" % path) - mode = TaskMode.create_infer() - return self._parse_creator(path, mode) - - def _parse_creator(self, path, mode): - ''' - Parse dataset. - ''' - - def _parse(): - with open(path) as f: - for line_id, line in enumerate(f): - fs = line.strip().split('\t') - dnn_input = load_dnn_input_record(fs[0]) - lr_input = load_lr_input_record(fs[1]) - if not mode.is_infer(): - click = [int(fs[2])] - yield dnn_input, lr_input, click - else: - yield dnn_input, lr_input - - return _parse - - -def load_data_meta(path): - ''' - load data meta info from path, return (dnn_input_dim, lr_input_dim) - ''' - with open(path) as f: - lines = f.read().split('\n') - err_info = "wrong meta format" - assert len(lines) == 2, err_info - assert 'dnn_input_dim:' in lines[0] and 'lr_input_dim:' in lines[ - 1], err_info - res = map(int, [_.split(':')[1] for _ in lines]) - logger.info('dnn input dim: %d' % res[0]) - logger.info('lr input dim: %d' % res[1]) - return res diff --git a/legacy/ctr/train.py b/legacy/ctr/train.py deleted file mode 100644 index de7add61d65aba363cc17bed49d32c9054600108..0000000000000000000000000000000000000000 --- a/legacy/ctr/train.py +++ /dev/null @@ -1,112 +0,0 @@ -import argparse -import gzip - -import reader -import paddle.v2 as paddle -from utils import logger, ModelType -from network_conf import CTRmodel - - -def parse_args(): - parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") - parser.add_argument( - '--train_data_path', - type=str, - required=True, - help="path of training dataset") - parser.add_argument( - '--test_data_path', type=str, help='path of testing dataset') - parser.add_argument( - '--batch_size', - type=int, - default=10000, - help="size of mini-batch (default:10000)") - parser.add_argument( - '--num_passes', type=int, default=10, help="number of passes to train") - parser.add_argument( - '--model_output_prefix', - type=str, - default='./ctr_models', - help='prefix of path for model to store (default: ./ctr_models)') - parser.add_argument( - '--data_meta_file', - type=str, - required=True, - help='path of data meta info file', ) - parser.add_argument( - '--model_type', - type=int, - required=True, - default=ModelType.CLASSIFICATION, - help='model type, classification: %d, regression %d (default classification)' - % (ModelType.CLASSIFICATION, ModelType.REGRESSION)) - - return parser.parse_args() - - -dnn_layer_dims = [128, 64, 32, 1] - -# ============================================================================== -# cost and train period -# ============================================================================== - - -def train(): - args = parse_args() - args.model_type = ModelType(args.model_type) - paddle.init(use_gpu=False, trainer_count=1) - dnn_input_dim, lr_input_dim = reader.load_data_meta(args.data_meta_file) - - # create ctr model. 
- model = CTRmodel( - dnn_layer_dims, - dnn_input_dim, - lr_input_dim, - model_type=args.model_type, - is_infer=False) - - params = paddle.parameters.create(model.train_cost) - optimizer = paddle.optimizer.AdaGrad() - - trainer = paddle.trainer.SGD(cost=model.train_cost, - parameters=params, - update_equation=optimizer) - - dataset = reader.Dataset() - - def __event_handler__(event): - if isinstance(event, paddle.event.EndIteration): - num_samples = event.batch_id * args.batch_size - if event.batch_id % 100 == 0: - logger.warning("Pass %d, Samples %d, Cost %f, %s" % ( - event.pass_id, num_samples, event.cost, event.metrics)) - - if event.batch_id % 1000 == 0: - if args.test_data_path: - result = trainer.test( - reader=paddle.batch( - dataset.test(args.test_data_path), - batch_size=args.batch_size), - feeding=reader.feeding_index) - logger.warning("Test %d-%d, Cost %f, %s" % - (event.pass_id, event.batch_id, result.cost, - result.metrics)) - - path = "{}-pass-{}-batch-{}-test-{}.tar.gz".format( - args.model_output_prefix, event.pass_id, event.batch_id, - result.cost) - with gzip.open(path, 'w') as f: - trainer.save_parameter_to_tar(f) - - trainer.train( - reader=paddle.batch( - paddle.reader.shuffle( - dataset.train(args.train_data_path), buf_size=500), - batch_size=args.batch_size), - feeding=reader.feeding_index, - event_handler=__event_handler__, - num_passes=args.num_passes) - - -if __name__ == '__main__': - train() diff --git a/legacy/ctr/utils.py b/legacy/ctr/utils.py deleted file mode 100644 index 437554c3c291d5a74cc0b3844c8684c73b189a19..0000000000000000000000000000000000000000 --- a/legacy/ctr/utils.py +++ /dev/null @@ -1,70 +0,0 @@ -import logging - -logging.basicConfig() -logger = logging.getLogger("paddle") -logger.setLevel(logging.INFO) - - -class TaskMode: - TRAIN_MODE = 0 - TEST_MODE = 1 - INFER_MODE = 2 - - def __init__(self, mode): - self.mode = mode - - def is_train(self): - return self.mode == self.TRAIN_MODE - - def is_test(self): - return self.mode == self.TEST_MODE - - def is_infer(self): - return self.mode == self.INFER_MODE - - @staticmethod - def create_train(): - return TaskMode(TaskMode.TRAIN_MODE) - - @staticmethod - def create_test(): - return TaskMode(TaskMode.TEST_MODE) - - @staticmethod - def create_infer(): - return TaskMode(TaskMode.INFER_MODE) - - -class ModelType: - CLASSIFICATION = 0 - REGRESSION = 1 - - def __init__(self, mode): - self.mode = mode - - def is_classification(self): - return self.mode == self.CLASSIFICATION - - def is_regression(self): - return self.mode == self.REGRESSION - - @staticmethod - def create_classification(): - return ModelType(ModelType.CLASSIFICATION) - - @staticmethod - def create_regression(): - return ModelType(ModelType.REGRESSION) - - -def load_dnn_input_record(sent): - return map(int, sent.split()) - - -def load_lr_input_record(sent): - res = [] - for _ in [x.split(':') for x in sent.split()]: - res.append(( - int(_[0]), - float(_[1]), )) - return res diff --git a/legacy/deep_fm/README.cn.md b/legacy/deep_fm/README.cn.md deleted file mode 100644 index 1f651acbde0078340dab06c551f583ca2b1dd86c..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/README.cn.md +++ /dev/null @@ -1,76 +0,0 @@ -运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)中的说明更新PaddlePaddle安装版本。 - ---- - -# 基于深度因子分解机的点击率预估模型 - -## 介绍 -本模型实现了下述论文中提出的DeepFM模型: - -```text -@inproceedings{guo2017deepfm, - title={DeepFM: A 
Factorization-Machine based Neural Network for CTR Prediction},
-  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
-  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
-  pages={1725--1731},
-  year={2017}
-}
-```
-
-DeepFM模型将因子分解机和深度神经网络结合起来,对低阶和高阶特征交互进行建模。有关因子分解机的详细信息,请参考论文[因子分解机](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf)。
-
-## 数据集
-本文使用的是Kaggle举办的[展示广告竞赛](https://www.kaggle.com/c/criteo-display-ad-challenge/)中所使用的Criteo数据集。
-
-每一行是一次广告展示的特征,第一列是一个标签,表示这次广告展示是否被点击。总共有39个特征,其中13个特征采用整型值,另外26个特征是类别类特征。测试集中是没有标签的。
-
-下载数据集:
-```bash
-cd data && ./download.sh && cd ..
-```
-
-## 模型
-DeepFM模型是由因子分解机(FM)和深度神经网络(DNN)组成的。所有的输入特征都会同时输入FM和DNN,最后把FM和DNN的输出结合在一起形成最终的输出。DNN中稀疏特征生成的嵌入层与FM层中的隐含向量(因子)共享参数。
-
-PaddlePaddle中的因子分解机层负责计算二阶组合特征的相互关系。以下的代码示例结合了因子分解机层和全连接层,形成了完整的因子分解机:
-
-```python
-def fm_layer(input, factor_size):
-    first_order = paddle.layer.fc(input=input, size=1, act=paddle.activation.Linear())
-    second_order = paddle.layer.factorization_machine(input=input, factor_size=factor_size)
-    fm = paddle.layer.addto(input=[first_order, second_order],
-                            act=paddle.activation.Linear(),
-                            bias_attr=False)
-    return fm
-```
-
-## 数据准备
-处理原始数据集,整型特征使用min-max归一化方法规范到[0, 1],类别类特征使用了one-hot编码。原始数据集分割成两部分:90%用于训练,其他10%用于训练过程中的验证。
-
-```bash
-python preprocess.py --datadir ./data/raw --outdir ./data
-```
-
-## 训练
-训练的命令行选项可以通过`python train.py -h`列出。
-
-训练模型:
-```bash
-python train.py \
-        --train_data_path data/train.txt \
-        --test_data_path data/valid.txt \
-        2>&1 | tee train.log
-```
-
-训练到第9轮的第40000个batch后,测试的AUC为0.807178,误差(cost)为0.445196。
-
-## 预测
-预测的命令行选项可以通过`python infer.py -h`列出。
-
-对测试集进行预测:
-```bash
-python infer.py \
-    --model_gz_path models/model-pass-9-batch-10000.tar.gz \
-    --data_path data/test.txt \
-    --prediction_output_path ./predict.txt
-```
diff --git a/legacy/deep_fm/README.md b/legacy/deep_fm/README.md
deleted file mode 100644
index 6e2c6fad38d2e9e9db8d17c4967196b4f1cc5a36..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/README.md
+++ /dev/null
@@ -1,95 +0,0 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
----
-
-# Deep Factorization Machine for Click-Through Rate prediction
-
-## Introduction
-This model implements the DeepFM proposed in the following paper:
-
-```text
-@inproceedings{guo2017deepfm,
-  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
-  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
-  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
-  pages={1725--1731},
-  year={2017}
-}
-```
-
-DeepFM combines factorization machines and deep neural networks to model
-both low order and high order feature interactions. For details of the
-factorization machines, please refer to the paper [factorization
-machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
-
-## Dataset
-This example uses the Criteo dataset, which was used for the [Display Advertising
-Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/)
-hosted by Kaggle.
-
-Each row contains the features for one ad display, and the first column is a label
-indicating whether this ad has been clicked or not. There are 39 features in
-total.
13 features take integer values and the other 26 features are
-categorical features. For the test dataset, the labels are omitted.
-
-Download dataset:
-```bash
-cd data && ./download.sh && cd ..
-```
-
-## Model
-The DeepFM model is composed of the factorization machine layer (FM) and deep
-neural networks (DNN). All the input features are fed to both FM and DNN.
-The outputs from FM and DNN are combined to form the final output. The embedding
-layer for sparse features in the DNN shares the parameters with the latent
-vectors (factors) of the FM layer.
-
-The factorization machine layer in PaddlePaddle computes the second order
-interactions. The following code example combines the factorization machine
-layer and a fully connected layer to form the full version of a factorization
-machine:
-
-```python
-def fm_layer(input, factor_size):
-    first_order = paddle.layer.fc(input=input, size=1, act=paddle.activation.Linear())
-    second_order = paddle.layer.factorization_machine(input=input, factor_size=factor_size)
-    fm = paddle.layer.addto(input=[first_order, second_order],
-                            act=paddle.activation.Linear(),
-                            bias_attr=False)
-    return fm
-```
-
-## Data preparation
-To preprocess the raw dataset, the integer features are clipped and then min-max
-normalized to [0, 1], and the categorical features are one-hot encoded. The raw
-training dataset is split such that 90% is used for training and the other
-10% is used for validation during training.
-
-```bash
-python preprocess.py --datadir ./data/raw --outdir ./data
-```
-
-## Train
-The command line options for training can be listed by `python train.py -h`.
-
-To train the model:
-```bash
-python train.py \
-        --train_data_path data/train.txt \
-        --test_data_path data/valid.txt \
-        2>&1 | tee train.log
-```
-
-After training pass 9 batch 40000, the testing AUC is `0.807178` and the testing
-cost is `0.445196`.
-
-## Infer
-The command line options for inference can be listed by `python infer.py -h`.
- -To make inference for the test dataset: -```bash -python infer.py \ - --model_gz_path models/model-pass-9-batch-10000.tar.gz \ - --data_path data/test.txt \ - --prediction_output_path ./predict.txt -``` diff --git a/legacy/deep_fm/data/download.sh b/legacy/deep_fm/data/download.sh deleted file mode 100755 index 466a22f2c6cc885cea0a1468f3043cb59c611b59..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/data/download.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/bin/bash - -wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/criteo-labs/dac.tar.gz -tar zxf dac.tar.gz -rm -f dac.tar.gz - -mkdir raw -mv ./*.txt raw/ diff --git a/legacy/deep_fm/infer.py b/legacy/deep_fm/infer.py deleted file mode 100755 index 40a5929780090d403b8b905f8e949f1f8a020eb3..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/infer.py +++ /dev/null @@ -1,63 +0,0 @@ -import os -import gzip -import argparse -import itertools - -import paddle.v2 as paddle - -from network_conf import DeepFM -import reader - - -def parse_args(): - parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example") - parser.add_argument( - '--model_gz_path', - type=str, - required=True, - help="The path of model parameters gz file") - parser.add_argument( - '--data_path', - type=str, - required=True, - help="The path of the dataset to infer") - parser.add_argument( - '--prediction_output_path', - type=str, - required=True, - help="The path to output the prediction") - parser.add_argument( - '--factor_size', - type=int, - default=10, - help="The factor size for the factorization machine (default:10)") - - return parser.parse_args() - - -def infer(): - args = parse_args() - - paddle.init(use_gpu=False, trainer_count=1) - - model = DeepFM(args.factor_size, infer=True) - - parameters = paddle.parameters.Parameters.from_tar( - gzip.open(args.model_gz_path, 'r')) - - inferer = paddle.inference.Inference( - output_layer=model, parameters=parameters) - - dataset = reader.Dataset() - - infer_reader = paddle.batch(dataset.infer(args.data_path), batch_size=1000) - - with open(args.prediction_output_path, 'w') as out: - for id, batch in enumerate(infer_reader()): - res = inferer.infer(input=batch) - predictions = [x for x in itertools.chain.from_iterable(res)] - out.write('\n'.join(map(str, predictions)) + '\n') - - -if __name__ == '__main__': - infer() diff --git a/legacy/deep_fm/network_conf.py b/legacy/deep_fm/network_conf.py deleted file mode 100644 index 545fe07b8197e3379eb5a6f34c3134b813a4684e..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/network_conf.py +++ /dev/null @@ -1,75 +0,0 @@ -import paddle.v2 as paddle - -dense_feature_dim = 13 -sparse_feature_dim = 117568 - - -def fm_layer(input, factor_size, fm_param_attr): - first_order = paddle.layer.fc(input=input, - size=1, - act=paddle.activation.Linear()) - second_order = paddle.layer.factorization_machine( - input=input, - factor_size=factor_size, - act=paddle.activation.Linear(), - param_attr=fm_param_attr) - out = paddle.layer.addto( - input=[first_order, second_order], - act=paddle.activation.Linear(), - bias_attr=False) - return out - - -def DeepFM(factor_size, infer=False): - dense_input = paddle.layer.data( - name="dense_input", - type=paddle.data_type.dense_vector(dense_feature_dim)) - sparse_input = paddle.layer.data( - name="sparse_input", - type=paddle.data_type.sparse_binary_vector(sparse_feature_dim)) - sparse_input_ids = [ - paddle.layer.data( - name="C" + str(i), - type=paddle.data_type.integer_value(sparse_feature_dim)) - for i 
in range(1, 27) - ] - - dense_fm = fm_layer( - dense_input, - factor_size, - fm_param_attr=paddle.attr.Param(name="DenseFeatFactors")) - sparse_fm = fm_layer( - sparse_input, - factor_size, - fm_param_attr=paddle.attr.Param(name="SparseFeatFactors")) - - def embedding_layer(input): - return paddle.layer.embedding( - input=input, - size=factor_size, - param_attr=paddle.attr.Param(name="SparseFeatFactors")) - - sparse_embed_seq = map(embedding_layer, sparse_input_ids) - sparse_embed = paddle.layer.concat(sparse_embed_seq) - - fc1 = paddle.layer.fc(input=[sparse_embed, dense_input], - size=400, - act=paddle.activation.Relu()) - fc2 = paddle.layer.fc(input=fc1, size=400, act=paddle.activation.Relu()) - fc3 = paddle.layer.fc(input=fc2, size=400, act=paddle.activation.Relu()) - - predict = paddle.layer.fc(input=[dense_fm, sparse_fm, fc3], - size=1, - act=paddle.activation.Sigmoid()) - - if not infer: - label = paddle.layer.data( - name="label", type=paddle.data_type.dense_vector(1)) - cost = paddle.layer.multi_binary_label_cross_entropy_cost( - input=predict, label=label) - paddle.evaluator.classification_error( - name="classification_error", input=predict, label=label) - paddle.evaluator.auc(name="auc", input=predict, label=label) - return cost - else: - return predict diff --git a/legacy/deep_fm/preprocess.py b/legacy/deep_fm/preprocess.py deleted file mode 100755 index 36ffea16637c19dee9352d17ed51a67edf582167..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/preprocess.py +++ /dev/null @@ -1,164 +0,0 @@ -""" -Preprocess Criteo dataset. This dataset was used for the Display Advertising -Challenge (https://www.kaggle.com/c/criteo-display-ad-challenge). -""" -import os -import sys -import click -import random -import collections - -# There are 13 integer features and 26 categorical features -continous_features = range(1, 14) -categorial_features = range(14, 40) - -# Clip integer features. 
The clip point for each integer feature -# is derived from the 95% quantile of the total values in each feature -continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50] - - -class CategoryDictGenerator: - """ - Generate dictionary for each of the categorical features - """ - - def __init__(self, num_feature): - self.dicts = [] - self.num_feature = num_feature - for i in range(0, num_feature): - self.dicts.append(collections.defaultdict(int)) - - def build(self, datafile, categorial_features, cutoff=0): - with open(datafile, 'r') as f: - for line in f: - features = line.rstrip('\n').split('\t') - for i in range(0, self.num_feature): - if features[categorial_features[i]] != '': - self.dicts[i][features[categorial_features[i]]] += 1 - for i in range(0, self.num_feature): - self.dicts[i] = filter(lambda x: x[1] >= cutoff, - self.dicts[i].items()) - self.dicts[i] = sorted(self.dicts[i], key=lambda x: (-x[1], x[0])) - vocabs, _ = list(zip(*self.dicts[i])) - self.dicts[i] = dict(zip(vocabs, range(1, len(vocabs) + 1))) - self.dicts[i][''] = 0 - - def gen(self, idx, key): - if key not in self.dicts[idx]: - res = self.dicts[idx][''] - else: - res = self.dicts[idx][key] - return res - - def dicts_sizes(self): - return map(len, self.dicts) - - -class ContinuousFeatureGenerator: - """ - Normalize the integer features to [0, 1] by min-max normalization - """ - - def __init__(self, num_feature): - self.num_feature = num_feature - self.min = [sys.maxint] * num_feature - self.max = [-sys.maxint] * num_feature - - def build(self, datafile, continous_features): - with open(datafile, 'r') as f: - for line in f: - features = line.rstrip('\n').split('\t') - for i in range(0, self.num_feature): - val = features[continous_features[i]] - if val != '': - val = int(val) - if val > continous_clip[i]: - val = continous_clip[i] - self.min[i] = min(self.min[i], val) - self.max[i] = max(self.max[i], val) - - def gen(self, idx, val): - if val == '': - return 0.0 - val = float(val) - return (val - self.min[idx]) / (self.max[idx] - self.min[idx]) - - -@click.command("preprocess") -@click.option("--datadir", type=str, help="Path to raw criteo dataset") -@click.option("--outdir", type=str, help="Path to save the processed data") -def preprocess(datadir, outdir): - """ - All the 13 integer features are normalzied to continous values and these - continous features are combined into one vecotr with dimension 13. - - Each of the 26 categorical features are one-hot encoded and all the one-hot - vectors are combined into one sparse binary vector. - """ - dists = ContinuousFeatureGenerator(len(continous_features)) - dists.build(os.path.join(datadir, 'train.txt'), continous_features) - - dicts = CategoryDictGenerator(len(categorial_features)) - dicts.build( - os.path.join(datadir, 'train.txt'), categorial_features, cutoff=200) - - dict_sizes = dicts.dicts_sizes() - categorial_feature_offset = [0] - for i in range(1, len(categorial_features)): - offset = categorial_feature_offset[i - 1] + dict_sizes[i - 1] - categorial_feature_offset.append(offset) - - random.seed(0) - - # 90% of the data are used for training, and 10% of the data are used - # for validation. 
- with open(os.path.join(outdir, 'train.txt'), 'w') as out_train: - with open(os.path.join(outdir, 'valid.txt'), 'w') as out_valid: - with open(os.path.join(datadir, 'train.txt'), 'r') as f: - for line in f: - features = line.rstrip('\n').split('\t') - - continous_vals = [] - for i in range(0, len(continous_features)): - val = dists.gen(i, features[continous_features[i]]) - continous_vals.append("{0:.6f}".format(val).rstrip('0') - .rstrip('.')) - categorial_vals = [] - for i in range(0, len(categorial_features)): - val = dicts.gen(i, features[categorial_features[ - i]]) + categorial_feature_offset[i] - categorial_vals.append(str(val)) - - continous_vals = ','.join(continous_vals) - categorial_vals = ','.join(categorial_vals) - label = features[0] - if random.randint(0, 9999) % 10 != 0: - out_train.write('\t'.join( - [continous_vals, categorial_vals, label]) + '\n') - else: - out_valid.write('\t'.join( - [continous_vals, categorial_vals, label]) + '\n') - - with open(os.path.join(outdir, 'test.txt'), 'w') as out: - with open(os.path.join(datadir, 'test.txt'), 'r') as f: - for line in f: - features = line.rstrip('\n').split('\t') - - continous_vals = [] - for i in range(0, len(continous_features)): - val = dists.gen(i, features[continous_features[i] - 1]) - continous_vals.append("{0:.6f}".format(val).rstrip('0') - .rstrip('.')) - categorial_vals = [] - for i in range(0, len(categorial_features)): - val = dicts.gen(i, features[categorial_features[ - i] - 1]) + categorial_feature_offset[i] - categorial_vals.append(str(val)) - - continous_vals = ','.join(continous_vals) - categorial_vals = ','.join(categorial_vals) - out.write('\t'.join([continous_vals, categorial_vals]) + '\n') - - -if __name__ == "__main__": - preprocess() diff --git a/legacy/deep_fm/reader.py b/legacy/deep_fm/reader.py deleted file mode 100644 index 1098ce423c9071864671be91dea81972e47fbc98..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/reader.py +++ /dev/null @@ -1,58 +0,0 @@ -class Dataset: - def _reader_creator(self, path, is_infer): - def reader(): - with open(path, 'r') as f: - for line in f: - features = line.rstrip('\n').split('\t') - dense_feature = map(float, features[0].split(',')) - sparse_feature = map(int, features[1].split(',')) - if not is_infer: - label = [float(features[2])] - yield [dense_feature, sparse_feature - ] + sparse_feature + [label] - else: - yield [dense_feature, sparse_feature] + sparse_feature - - return reader - - def train(self, path): - return self._reader_creator(path, False) - - def test(self, path): - return self._reader_creator(path, False) - - def infer(self, path): - return self._reader_creator(path, True) - - -feeding = { - 'dense_input': 0, - 'sparse_input': 1, - 'C1': 2, - 'C2': 3, - 'C3': 4, - 'C4': 5, - 'C5': 6, - 'C6': 7, - 'C7': 8, - 'C8': 9, - 'C9': 10, - 'C10': 11, - 'C11': 12, - 'C12': 13, - 'C13': 14, - 'C14': 15, - 'C15': 16, - 'C16': 17, - 'C17': 18, - 'C18': 19, - 'C19': 20, - 'C20': 21, - 'C21': 22, - 'C22': 23, - 'C23': 24, - 'C24': 25, - 'C25': 26, - 'C26': 27, - 'label': 28 -} diff --git a/legacy/deep_fm/train.py b/legacy/deep_fm/train.py deleted file mode 100755 index 92d48696d8845ac13b714b66f7810acdd35fe164..0000000000000000000000000000000000000000 --- a/legacy/deep_fm/train.py +++ /dev/null @@ -1,108 +0,0 @@ -import os -import gzip -import logging -import argparse - -import paddle.v2 as paddle - -from network_conf import DeepFM -import reader - -logging.basicConfig() -logger = logging.getLogger("paddle") -logger.setLevel(logging.INFO) - - -def 
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
-    parser.add_argument(
-        '--train_data_path',
-        type=str,
-        required=True,
-        help="The path of the training dataset")
-    parser.add_argument(
-        '--test_data_path',
-        type=str,
-        required=True,
-        help="The path of the testing dataset")
-    parser.add_argument(
-        '--batch_size',
-        type=int,
-        default=1000,
-        help="The size of a mini-batch (default: 1000)")
-    parser.add_argument(
-        '--num_passes',
-        type=int,
-        default=10,
-        help="The number of passes to train (default: 10)")
-    parser.add_argument(
-        '--factor_size',
-        type=int,
-        default=10,
-        help="The factor size for the factorization machine (default: 10)")
-    parser.add_argument(
-        '--model_output_dir',
-        type=str,
-        default='models',
-        help='The path in which to store the model (default: models)')
-
-    return parser.parse_args()
-
-
-def train():
-    args = parse_args()
-
-    if not os.path.isdir(args.model_output_dir):
-        os.mkdir(args.model_output_dir)
-
-    paddle.init(use_gpu=False, trainer_count=1)
-
-    optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
-
-    model = DeepFM(args.factor_size)
-
-    params = paddle.parameters.create(model)
-
-    trainer = paddle.trainer.SGD(cost=model,
-                                 parameters=params,
-                                 update_equation=optimizer)
-
-    dataset = reader.Dataset()
-
-    def __event_handler__(event):
-        if isinstance(event, paddle.event.EndIteration):
-            num_samples = event.batch_id * args.batch_size
-            if event.batch_id % 100 == 0:
-                logger.warning("Pass %d, Batch %d, Samples %d, Cost %f, %s" %
-                               (event.pass_id, event.batch_id, num_samples,
-                                event.cost, event.metrics))
-
-            if event.batch_id % 10000 == 0:
-                if args.test_data_path:
-                    result = trainer.test(
-                        reader=paddle.batch(
-                            dataset.test(args.test_data_path),
-                            batch_size=args.batch_size),
-                        feeding=reader.feeding)
-                    logger.warning("Test %d-%d, Cost %f, %s" %
-                                   (event.pass_id, event.batch_id, result.cost,
-                                    result.metrics))
-
-                path = "{}/model-pass-{}-batch-{}.tar.gz".format(
-                    args.model_output_dir, event.pass_id, event.batch_id)
-                with gzip.open(path, 'w') as f:
-                    trainer.save_parameter_to_tar(f)
-
-    trainer.train(
-        reader=paddle.batch(
-            paddle.reader.shuffle(
-                dataset.train(args.train_data_path),
-                buf_size=args.batch_size * 10000),
-            batch_size=args.batch_size),
-        feeding=reader.feeding,
-        event_handler=__event_handler__,
-        num_passes=args.num_passes)
-
-
-if __name__ == '__main__':
-    train()
diff --git a/legacy/dssm/README.cn.md b/legacy/dssm/README.cn.md
deleted file mode 100644
index 140446ad2e071e8bc185d7788dcf33651a370d69..0000000000000000000000000000000000000000
--- a/legacy/dssm/README.cn.md
+++ /dev/null
@@ -1,294 +0,0 @@
-The program examples in this directory require PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is lower than this requirement, please update it by following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
-
----
-
-# Deep Structured Semantic Models (DSSM)
-DSSM uses a DNN to learn low-dimensional representation vectors for text in a continuous semantic space and to model the semantic similarity between two sentences. This example demonstrates how to implement a generic DSSM model with PaddlePaddle for modeling the semantic similarity between two strings; the implementation supports a generic data format, so users can substitute their own data and apply the model to real-world scenarios.
-
-## Background
-DSSM \[[1](#参考文献)\] is a classic semantic model proposed by Microsoft Research in 2013 to learn the semantic distance between two texts. More broadly, the model also generalizes to scenarios such as:
-
-1. CTR prediction, measuring the relevance between a user query and a set of candidate web pages (documents).
-2. Text relevance, measuring the semantic similarity between two strings.
-3. Recommendation, measuring the affinity between a user and a recommended item.
-
-DSSM has since grown into a framework that naturally models the distance between two records. For text relevance, cosine similarity can characterize the semantic distance; for ranking search-engine results, a rank loss can be stacked on top of DSSM to train a ranking model.
-
-## Model overview
-In the original paper \[[1](#参考文献)\], the DSSM model measures the latent semantic relationship between a user query and a set of documents. The model structure is shown below.
-
-*Figure 1. The original DSSM structure*
-
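-For reference, the top-layer objective sketched in Figure 1 can be written out as in the original paper: with $R(Q, D)$ the cosine similarity between the query and document semantic vectors and $\gamma$ a smoothing factor, the posterior of a document given a query is
-
-$$P(D|Q) = \frac{\exp(\gamma R(Q, D))}{\sum_{D' \in \mathbf{D}} \exp(\gamma R(Q, D'))}$$
-
-where $\mathbf{D}$ contains the clicked document and the sampled negative documents, and training maximizes the log-likelihood of the clicked documents.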
-The idea it carries through is to **use a DNN to map high-dimensional feature vectors into continuous vectors in a low-dimensional space (the red boxes in the figure)** and to **measure the semantic relevance between the user query and each candidate document with cosine similarity at the top layer**.
-
-For the loss at the very top, the original model uses a negative-sampling scheme similar to Word2Vec: for each query, one positive example $D^+$ and four negative examples $D^-$ are drawn, a conditional probability is computed over them, and the log-likelihood serves as the loss. This is the $P(D_1|Q)$-style structure in Figure 1; see the original paper for details.
-
-Follow-up work simplified the DSSM structure \[[3](#参考文献)\], which evolved into:
-
-*Figure 2. The generic DSSM structure*
-
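-Before looking at the task-specific variants, the following minimal numpy sketch (an illustration only, not the `network_conf.py` implementation) shows the generic twin-tower computation of Figure 2: two encoders map the two inputs to low-dimensional semantic vectors, and cosine similarity scores the pair.
-
-```python
-import numpy as np
-
-
-def cosine_similarity(u, v, eps=1e-8):
-    # Cosine of the angle between two semantic vectors, in [-1, 1].
-    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))
-
-
-def dssm_score(left_input, right_input, left_encoder, right_encoder):
-    # Each blank box in Figure 2 is an arbitrary encoder (FC/CNN/RNN)
-    # mapping an input such as a word-id sequence to a semantic vector.
-    return cosine_similarity(left_encoder(left_input),
-                             right_encoder(right_input))
-
-
-# Toy usage: a random projection stands in for a real FC/CNN/RNN tower.
-rng = np.random.RandomState(0)
-proj = rng.randn(16, 8)
-encoder = lambda ids: np.tanh(np.mean([proj[i] for i in ids], axis=0))
-print(dssm_score([1, 3, 5], [1, 3, 7], encoder, encoder))
-```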
-The blank boxes in the figure can be filled with any model, e.g. fully connected (FC) layers, a convolutional network (CNN), or an RNN. The structure is designed specifically to measure the semantic distance between two elements (such as two strings). In real tasks, DSSM serves as a basic building block that is combined with different loss functions to implement a concrete capability, for example:
-
-- In learning to rank, adding a pairwise rank loss to the structure in Figure 2 turns it into a ranking model.
-- In CTR prediction, treating clicked/not-clicked as 0/1 binary classification and adding a cross-entropy loss turns it into a classification model.
-- When a pair of strings needs a relevance score, cosine similarity can be used directly, turning it into a regression model.
-
-This example provides a fairly generic solution that supports the following task types:
-
-- Classification
-- Regression within the range [-1, 1]
-- Pairwise rank
-
-For the structure that produces the low-dimensional semantic vectors, three options are supported:
-
-- FC, a multi-layer fully connected network
-- CNN, a convolutional neural network
-- RNN, a recurrent neural network
-
-## Model implementation
-A DSSM model can be split into three parts: the DNNs on the left and right, and the loss function on top. In complex tasks the two DNNs may have different structures. In the original paper the left and right networks learn semantic vectors for the query and the documents respectively; since the data on the two sides differ, customizing each DNN accordingly is recommended.
-
-**For simplicity and generality, this example gives the two DNNs the same structure, so only the three options FC, CNN and RNN are provided.**
-
-Three types of loss functions are supported: classification, regression, and rank. For the regression and rank losses, the degree of match between the two sides is computed with cosine similarity; for classification, the predicted class distribution is computed with softmax.
-
-Many of the topics above are covered in detail in other tutorials, for example:
-
-- Extracting text features with CNN and FC: see [text classification](https://github.com/PaddlePaddle/models/blob/develop/text_classification/README.md#模型详解)
-- RNN/GRU: see [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#gated-recurrent-unit-gru)
-- Pairwise rank, i.e. learning to rank: see [learn to rank](https://github.com/PaddlePaddle/models/blob/develop/ltr/README.md)
-
-We will not repeat those fundamentals here; the rest of this document focuses on implementing these structures with PaddlePaddle.
-
-As shown in Figure 3, the regression and classification models have similar structures:
-
-*Figure 3. DSSM for REGRESSION or CLASSIFICATION*
-
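-As a rough sketch of how the two heads in Figure 3 differ (plain numpy, illustrative only; the real code lives in `_build_classification_or_regression_model`): regression uses the cosine similarity itself as the score, while classification feeds a combined pair representation through softmax.
-
-```python
-import numpy as np
-
-
-def softmax(z):
-    e = np.exp(z - np.max(z))
-    return e / e.sum()
-
-
-def regression_head(left_vec, right_vec):
-    # Regression: the cosine similarity is the prediction in [-1, 1].
-    return float(np.dot(left_vec, right_vec) /
-                 (np.linalg.norm(left_vec) * np.linalg.norm(right_vec) + 1e-8))
-
-
-def classification_head(left_vec, right_vec, w, b):
-    # Classification: combine the two semantic vectors and predict a class
-    # distribution; w and b are a toy output layer.
-    return softmax(w.dot(np.concatenate([left_vec, right_vec])) + b)
-```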
-The most important components are the word embeddings, the two low-dimensional vector learners `(1)` and `(2)` in the figure (each can be implemented with any of RNN/CNN/FC), and the loss function at the top.
-
-The pairwise rank structure is more complex: the structure in Figure 4 appears twice, together with a corresponding rank loss. The overall idea is as follows (a minimal sketch of the resulting loss follows Figure 4):
-
-- Given one and the same source, score the left and right targets separately as `(a)` and `(b)`; the learning target is the order relation between (a) and (b).
-- `(a)` and `(b)` have structures like the one in Figure 3 and score a (source, target) pair.
-- `(1)` and `(2)` actually share one structure and both represent the same source; the figure unfolds them into two only for presentation.
-
-*Figure 4. DSSM for Pairwise Rank*
-
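-The scoring scheme above can be paired with a rank loss on the two scores. Below is a minimal hinge-style sketch for illustration only; see `_build_rank_model` in `network_conf.py` for the loss actually used.
-
-```python
-def pairwise_rank_loss(score_left, score_right, label, margin=1.0):
-    # label is 1 if the left target should outrank the right one, else 0.
-    # The loss is zero once the preferred side wins by at least `margin`.
-    diff = score_left - score_right
-    if label == 1:
-        return max(0.0, margin - diff)
-    return max(0.0, margin + diff)
-
-
-# Toy usage with the two pair scores (a) and (b) from Figure 4:
-print(pairwise_rank_loss(0.8, 0.3, label=1))  # 0.5: still inside the margin
-print(pairwise_rank_loss(0.8, 0.3, label=0))  # 1.5: wrong order, penalized
-```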
-Below is the concrete implementation of each part; all the relevant code is contained in `./network_conf.py`.
-
-
-### Creating the word embedding table
-
-```python
-def create_embedding(self, input, prefix=''):
-    """
-    Create word embedding. The `prefix` is added in front of the name of
-    the embedding's learnable parameter.
-    """
-    logger.info("Create embedding table [%s] whose dimension is %d" %
-                (prefix, self.dnn_dims[0]))
-    emb = paddle.layer.embedding(
-        input=input,
-        size=self.dnn_dims[0],
-        param_attr=ParamAttr(name='%s_emb.w' % prefix))
-    return emb
-```
-
-Since the input to the embedding table is the list of word IDs of a sentence, the embedding table outputs a sequence of word vectors.
-
-### CNN structure
-
-```python
-def create_cnn(self, emb, prefix=''):
-    """
-    A multi-layer CNN.
-    :param emb: The word embedding.
-    :type emb: paddle.layer
-    :param prefix: The prefix will be added to the layers' names.
-    :type prefix: str
-    """
-
-    def create_conv(context_len, hidden_size, prefix):
-        key = "%s_%d_%d" % (prefix, context_len, hidden_size)
-        conv = paddle.networks.sequence_conv_pool(
-            input=emb,
-            context_len=context_len,
-            hidden_size=hidden_size,
-            # set parameter attr for parameter sharing
-            context_proj_param_attr=ParamAttr(name=key + "contex_proj.w"),
-            fc_param_attr=ParamAttr(name=key + "_fc.w"),
-            fc_bias_attr=ParamAttr(name=key + "_fc.b"),
-            pool_bias_attr=ParamAttr(name=key + "_pool.b"))
-        return conv
-
-    conv_3 = create_conv(3, self.dnn_dims[1], "cnn")
-    conv_4 = create_conv(4, self.dnn_dims[1], "cnn")
-    return paddle.layer.concat(input=[conv_3, conv_4])
-```
-
-The CNN takes the sequence of word vectors, captures the key information of the original sentence through convolution and pooling, and finally outputs a semantic vector (which can be regarded as a sentence vector).
-
-In this implementation, the sentence vectors learned by CNNs with window sizes 3 and 4 are concatenated into the final sentence vector.
-
-### RNN structure
-
-An RNN is well suited to learning from variable-length sequences, and encoding sentences with an RNN is close to standard practice in natural language processing.
-
-```python
-def create_rnn(self, emb, prefix=''):
-    """
-    A GRU sentence vector learner.
-    """
-    gru = paddle.networks.simple_gru(
-        input=emb,
-        size=self.dnn_dims[1],
-        mixed_param_attr=ParamAttr(name='%s_gru_mixed.w' % prefix),
-        mixed_bias_param_attr=ParamAttr(name="%s_gru_mixed.b" % prefix),
-        gru_param_attr=ParamAttr(name='%s_gru.w' % prefix),
-        gru_bias_attr=ParamAttr(name="%s_gru.b" % prefix))
-    sent_vec = paddle.layer.last_seq(gru)
-    return sent_vec
-```
-
-### Multi-layer fully connected network (FC)
-
-```python
-def create_fc(self, emb, prefix=''):
-    """
-    A multi-layer fully connected neural network.
-    :param emb: The output of the embedding layer
-    :type emb: paddle.layer
-    :param prefix: A prefix will be added to the layers' names.
-    :type prefix: str
-    """
-    _input_layer = paddle.layer.pooling(
-        input=emb, pooling_type=paddle.pooling.Max())
-    fc = paddle.layer.fc(
-        input=_input_layer,
-        size=self.dnn_dims[1],
-        param_attr=ParamAttr(name='%s_fc.w' % prefix),
-        bias_attr=ParamAttr(name="%s_fc.b" % prefix))
-    return fc
-```
-
-When building the fully connected network, `paddle.layer.pooling` first applies max pooling to the sequence of word vectors, turning the variable-length sequence into a vector of fixed dimension that serves as the semantic representation of the whole sentence. Max pooling reduces the influence of sentence length on the sentence vector.
-
-### Multi-layer DNN
-After the CNN/RNN/FC encoder has extracted a semantic vector, further FC layers can be stacked on top to form a deep DNN.
-
-```python
-def create_dnn(self, sent_vec, prefix):
-    # Fall back to the sentence vector itself when no extra layers are set.
-    _input_layer = sent_vec
-    if len(self.dnn_dims) > 1:
-        for id, dim in enumerate(self.dnn_dims[1:]):
-            name = "%s_fc_%d_%d" % (prefix, id, dim)
-            fc = paddle.layer.fc(
-                input=_input_layer,
-                size=dim,
-                act=paddle.activation.Tanh(),
-                param_attr=ParamAttr(name='%s.w' % name),
-                bias_attr=ParamAttr(name='%s.b' % name))
-            _input_layer = fc
-    return _input_layer
-```
-
-### Classification and regression
-The classification and regression structures are quite similar; for the concrete implementation see the
-`_build_classification_or_regression_model` function in [network_conf.py](https://github.com/PaddlePaddle/models/blob/develop/dssm/network_conf.py).
-
-### Pairwise Rank
-Pairwise rank reuses the DNN structures above: the same source is scored against two targets, and the prediction is 1 if the left target scores higher, otherwise 0. For the implementation see the `_build_rank_model` function in [network_conf.py](https://github.com/PaddlePaddle/models/blob/develop/dssm/network_conf.py).
-
-## Data format
-Simple sample data is provided in `./data`.
-
-### Regression data format
-```
-# 3 fields each line:
-#   - source word list
-#   - target word list
-#   - target
-<source word list> \t <target word list> \t <target>
-```
-
-For example:
-
-```
-苹果 六 袋    苹果 6s    0.1
-新手 汽车 驾驶    驾校 培训    0.9
-```
-### Classification data format
-```
-# 3 fields each line:
-#   - source word list
-#   - target word list
-#   - target
-<source word list> \t <target word list> \t <target>