diff --git a/AutoDL/HiNAS_models/README.md b/AutoDL/HiNAS_models/README.md
deleted file mode 100755
index 9c67736aa30643baf72ce42ed2ca3321d4e22165..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/README.md
+++ /dev/null
@@ -1,76 +0,0 @@
-# Image Classification Models
-This directory contains six image classification models discovered automatically by the Baidu Big Data Lab (BDL) Hierarchical Neural Architecture Search project (HiNAS), achieving 96.1% accuracy on the CIFAR-10 dataset. The models fall into two categories: the first three have no skip links and are named HiNAS 0-2; the last three contain skip links, similar to the shortcut connections in ResNet, and are named HiNAS 3-5.
-
----
-## Table of Contents
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training a model](#training-a-model)
-- [Model performances](#model-performances)
-
-## Installation
-Running the trainer in the current directory requires:
-
-- PaddlePaddle Fluid >= v0.15.0
-- CuDNN >= 6.0
-
-If the PaddlePaddle or CuDNN version in your runtime environment does not meet these requirements, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update.
-
-## Data preparation
-
-When you run the sample code for the first time, the trainer automatically downloads the CIFAR-10 dataset. Please make sure your environment has an internet connection.
-
-The dataset will be downloaded to `dataset/cifar/cifar-10-python.tar.gz` in the same directory as the trainer. If the automatic download fails, you can fetch cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html yourself and place it at the location mentioned above.
-
-## Training a model
-
-After the environment is ready, you can train a model. There are two entry points: `train_hinas.py`, which trains models 0-2 (without skip links), and `train_hinas_res.py`, which trains models 3-5 (with skip links).
-
-Train models 0~2 (without skip links):
-```
-python train_hinas.py --model=m_id # m_id can be 0, 1 or 2.
-```
-Train models 3~5 (with skip links):
-```
-python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2.
-```
-
-In addition, both `train_hinas.py` and `train_hinas_res.py` support the following parameters (a combined usage example follows the list):
-
-- **random_flip_left_right**: Randomly flip images horizontally. (Default: True)
-- **random_flip_up_down**: Randomly flip images vertically. (Default: False)
-- **cutout**: Apply cutout to images. (Default: True)
-- **standardize_image**: Standardize each image. (Default: True)
-- **pad_and_cut_image**: Randomly pad images, then crop them back to the original size. (Default: True)
-- **shuffle_image**: Shuffle the order of the input images during training. (Default: True)
-- **lr_max**: Learning rate at the beginning of training. (Default: 0.1)
-- **lr_min**: Learning rate at the end of training. (Default: 0.0001)
-- **batch_size**: Training batch size. (Default: 128)
-- **num_epochs**: Total number of training epochs. (Default: 200)
-- **weight_decay**: L2 regularization value. (Default: 0.0004)
-- **momentum**: The momentum coefficient of the momentum optimizer. (Default: 0.9)
-- **dropout_rate**: Dropout rate of the dropout layer. (Default: 0.5)
-- **bn_decay**: The decay/momentum coefficient (also called moving average decay) of the batch norm layer. (Default: 0.9)
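-
-For example, to train HiNAS-3 (the first model with skip links) with a smaller batch size and a shorter schedule, the flags can be combined on one command line (these values are only an illustration, not the settings used for the reported results):
-```
-python train_hinas_res.py --model=0 --batch_size=64 --num_epochs=100
-```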
-
-
-## Model performances
-
-All six models are trained with the same hyperparameters:
-
-- learning rate: 0.1 -> 0.0001 with cosine annealing (see the sketch after the table)
-- total epochs: 200
-- batch size: 128
-- L2 decay: 0.0004
-- optimizer: momentum optimizer with m=0.9 and Nesterov momentum
-- preprocessing: random horizontal flip + image standardization + cutout
-
-Below is the accuracy on the CIFAR-10 dataset:
-
-| model    | round 1 | round 2 | round 3 | max    | avg    |
-|----------|---------|---------|---------|--------|--------|
-| HiNAS-0  | 0.9548  | 0.9520  | 0.9513  | 0.9548 | 0.9527 |
-| HiNAS-1  | 0.9452  | 0.9462  | 0.9420  | 0.9462 | 0.9445 |
-| HiNAS-2  | 0.9508  | 0.9506  | 0.9483  | 0.9508 | 0.9499 |
-| HiNAS-3  | 0.9607  | 0.9623  | 0.9601  | 0.9623 | 0.9611 |
-| HiNAS-4  | 0.9611  | 0.9584  | 0.9586  | 0.9611 | 0.9594 |
-| HiNAS-5  | 0.9578  | 0.9588  | 0.9594  | 0.9594 | 0.9586 |
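The cosine-annealing schedule above is the one implemented by `Model.cosine_annealing` in `nn_paddle.py` later in this diff. A minimal sketch of the per-step learning rate, assuming `max_step = 50000 * num_epochs // batch_size` as in the trainer:

```
import math

def cosine_annealing_lr(step, max_step, lr_max=0.1, lr_min=0.0001):
    # decays smoothly from lr_max at step 0 down to lr_min at max_step
    return lr_min + (lr_max - lr_min) / 2 * (1.0 + math.cos(step / max_step * math.pi))

max_step = 50000 * 200 // 128  # CIFAR-10 train size * num_epochs // batch_size
print(cosine_annealing_lr(0, max_step))         # -> 0.1
print(cosine_annealing_lr(max_step, max_step))  # -> 0.0001
```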
diff --git a/AutoDL/HiNAS_models/README_cn.md b/AutoDL/HiNAS_models/README_cn.md
deleted file mode 100755
index 8ca3bcbfb8d1ea1a15f969c1a1db22ff2ec854f1..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/README_cn.md
+++ /dev/null
@@ -1,78 +0,0 @@
-# Image Classification Models
-This directory contains six image classification models, all discovered automatically by Baidu Big Data Lab's Hierarchical Neural Architecture Search (HiNAS) project, reaching 96.1% accuracy on the CIFAR-10 dataset. The six models fall into two categories: the first three have no skip links and are named HiNAS 0-2; the last three contain skip links, which act like the shortcut connections in ResNet, and are named HiNAS 3-5.
-
----
-## Table of Contents
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training a model](#training-a-model)
-- [Model performances](#model-performances)
-
-## Installation
-Minimum environment requirements:
-
-- PaddlePaddle Fluid >= v0.15.0
-- CuDNN >= 6.0
-
-If your runtime environment does not meet these requirements, you can upgrade PaddlePaddle by following the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
-
-## Data preparation
-
-The first time you train a model, the trainer automatically downloads the CIFAR-10 dataset; please make sure your environment has an internet connection.
-
-The dataset is downloaded to `dataset/cifar/cifar-10-python.tar.gz` in the same directory as the trainer. If the automatic download fails, you can download cifar-10-python.tar.gz yourself from https://www.cs.toronto.edu/~kriz/cifar.html and place it at the location above.
-
-
-## Training a model
-Once the environment is ready, you can train a model. There are two entry points, `train_hinas.py` and `train_hinas_res.py`: the former trains models 0-2 (without skip links), the latter trains models 3-5 (with skip links).
-
-Train models 0~2 (without skip links):
-```
-python train_hinas.py --model=m_id # m_id can be 0, 1 or 2.
-```
-Train models 3~5 (with skip links):
-```
-python train_hinas_res.py --model=m_id # m_id can be 0, 1 or 2.
-```
-
-In addition, both `train_hinas.py` and `train_hinas_res.py` support the following parameters:
-
-Initialization options:
-
-- random_flip_left_right: randomly flip images horizontally (Default: True)
-- random_flip_up_down: randomly flip images vertically (Default: False)
-- cutout: randomly mask out part of each image (Default: True)
-- standardize_image: standardize every pixel of each image (Default: True)
-- pad_and_cut_image: randomly pad images and crop back to the original size (Default: True)
-- shuffle_image: shuffle the order of input images during training (Default: True)
-- lr_max: learning rate at the beginning of training (Default: 0.1)
-- lr_min: learning rate at the end of training (Default: 0.0001)
-- batch_size: training batch size (Default: 128)
-- num_epochs: total number of training epochs (Default: 200)
-- weight_decay: L2 regularization strength (Default: 0.0004)
-- momentum: momentum coefficient of the momentum optimizer (Default: 0.9)
-- dropout_rate: dropout rate of the dropout layer (Default: 0.5)
-- bn_decay: decay/momentum coefficient (i.e. moving average decay) of the batch norm layer (Default: 0.9)
-
-
-
-## Model performances
-The six models are trained with the same hyperparameters:
-
-- learning rate: 0.1 -> 0.0001 with cosine annealing
-- total epochs: 200
-- batch size: 128
-- L2 decay: 0.0004
-- optimizer: momentum optimizer with m=0.9 and Nesterov momentum
-- preprocessing: random horizontal flip + image standardization + cutout
-
-Below is the accuracy of the six models on the CIFAR-10 dataset:
-
-| model    | round 1 | round 2 | round 3 | max    | avg    |
-|----------|---------|---------|---------|--------|--------|
-| HiNAS-0  | 0.9548  | 0.9520  | 0.9513  | 0.9548 | 0.9527 |
-| HiNAS-1  | 0.9452  | 0.9462  | 0.9420  | 0.9462 | 0.9445 |
-| HiNAS-2  | 0.9508  | 0.9506  | 0.9483  | 0.9508 | 0.9499 |
-| HiNAS-3  | 0.9607  | 0.9623  | 0.9601  | 0.9623 | 0.9611 |
-| HiNAS-4  | 0.9611  | 0.9584  | 0.9586  | 0.9611 | 0.9594 |
-| HiNAS-5  | 0.9578  | 0.9588  | 0.9594  | 0.9594 | 0.9586 |
diff --git a/AutoDL/HiNAS_models/build/__init__.py b/AutoDL/HiNAS_models/build/__init__.py
deleted file mode 100755
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/AutoDL/HiNAS_models/build/layers.py b/AutoDL/HiNAS_models/build/layers.py
deleted file mode 100755
index 5bd67fb837bb21434f8628e339c3ef541b8c5a90..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/build/layers.py
+++ /dev/null
@@ -1,214 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import operator
-
-import numpy as np
-import paddle.fluid as fluid
-from absl import flags
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_float("bn_decay", 0.9, "batch norm decay")
-flags.DEFINE_float("dropout_rate", 0.5, "dropout rate")
-
-
-def calc_padding(img_width, stride, dilation, filter_width):
-    """ calculate how many pixels to pad so that the input and output sizes stay the same.
""" - - filter_width = dilation * (filter_width - 1) + 1 - if img_width % stride == 0: - pad_along_width = max(filter_width - stride, 0) - else: - pad_along_width = max(filter_width - (img_width % stride), 0) - return pad_along_width // 2, pad_along_width - pad_along_width // 2 - - -def conv(inputs, - filters, - kernel, - strides=(1, 1), - dilation=(1, 1), - num_groups=1, - conv_param=None): - """ normal conv layer """ - - if isinstance(kernel, (tuple, list)): - n = operator.mul(*kernel) * inputs.shape[1] - else: - n = kernel * kernel * inputs.shape[1] - - # pad input - padding = (0, 0, 0, 0) \ - + calc_padding(inputs.shape[2], strides[0], dilation[0], kernel[0]) \ - + calc_padding(inputs.shape[3], strides[1], dilation[1], kernel[1]) - if sum(padding) > 0: - inputs = fluid.layers.pad(inputs, padding, 0) - - param_attr = fluid.param_attr.ParamAttr( - initializer=fluid.initializer.NormalInitializer( - 0.0, scale=np.sqrt(2.0 / n)), - regularizer=fluid.regularizer.L2Decay(FLAGS.weight_decay)) - - bias_attr = fluid.param_attr.ParamAttr( - regularizer=fluid.regularizer.L2Decay(0.)) - - return fluid.layers.conv2d( - inputs, - filters, - kernel, - stride=strides, - padding=0, - dilation=dilation, - groups=num_groups, - param_attr=param_attr if conv_param is None else conv_param, - use_cudnn=False if num_groups == inputs.shape[1] == filters else True, - bias_attr=bias_attr, - act=None) - - -def sep(inputs, filters, kernel, strides=(1, 1), dilation=(1, 1)): - """ Separable convolution layer """ - - if isinstance(kernel, (tuple, list)): - n_depth = operator.mul(*kernel) - else: - n_depth = kernel * kernel - n_point = inputs.shape[1] - - if isinstance(strides, (tuple, list)): - multiplier = strides[0] - else: - multiplier = strides - - depthwise_param = fluid.param_attr.ParamAttr( - initializer=fluid.initializer.NormalInitializer( - 0.0, scale=np.sqrt(2.0 / n_depth)), - regularizer=fluid.regularizer.L2Decay(FLAGS.weight_decay)) - - pointwise_param = fluid.param_attr.ParamAttr( - initializer=fluid.initializer.NormalInitializer( - 0.0, scale=np.sqrt(2.0 / n_point)), - regularizer=fluid.regularizer.L2Decay(FLAGS.weight_decay)) - - depthwise_conv = conv( - inputs=inputs, - kernel=kernel, - filters=int(filters * multiplier), - strides=strides, - dilation=dilation, - num_groups=int(filters * multiplier), - conv_param=depthwise_param) - - return conv( - inputs=depthwise_conv, - kernel=(1, 1), - filters=int(filters * multiplier), - strides=(1, 1), - dilation=dilation, - conv_param=pointwise_param) - - -def maxpool(inputs, kernel, strides=(1, 1)): - padding = (0, 0, 0, 0) \ - + calc_padding(inputs.shape[2], strides[0], 1, kernel[0]) \ - + calc_padding(inputs.shape[3], strides[1], 1, kernel[1]) - if sum(padding) > 0: - inputs = fluid.layers.pad(inputs, padding, 0) - - return fluid.layers.pool2d( - inputs, kernel, 'max', strides, pool_padding=0, ceil_mode=False) - - -def avgpool(inputs, kernel, strides=(1, 1)): - padding_pixel = (0, 0, 0, 0) - padding_pixel += calc_padding(inputs.shape[2], strides[0], 1, kernel[0]) - padding_pixel += calc_padding(inputs.shape[3], strides[1], 1, kernel[1]) - - if padding_pixel[4] == padding_pixel[5] and padding_pixel[ - 6] == padding_pixel[7]: - # same padding pixel num on all sides. 
-        return fluid.layers.pool2d(
-            inputs,
-            kernel,
-            'avg',
-            strides,
-            pool_padding=(padding_pixel[4], padding_pixel[6]),
-            ceil_mode=False)
-    elif padding_pixel[4] + 1 == padding_pixel[5] and padding_pixel[6] + 1 == padding_pixel[7] \
-            and strides == (1, 1):
-        # asymmetric padding: pad with the larger size, pool, then crop the result.
-        x = fluid.layers.pool2d(
-            inputs,
-            kernel,
-            'avg',
-            strides,
-            pool_padding=(padding_pixel[5], padding_pixel[7]),
-            ceil_mode=False)
-        x_shape = x.shape
-        return fluid.layers.crop(
-            x,
-            shape=(-1, x_shape[1], x_shape[2] - 1, x_shape[3] - 1),
-            offsets=(0, 0, 1, 1))
-    else:
-        # unsupported padding pattern: fall back to zero-padding plus pool2d.
-        print("Warning: use zero-padding in avgpool")
-        outputs = fluid.layers.pad(inputs, padding_pixel, 0)
-        return fluid.layers.pool2d(
-            outputs, kernel, 'avg', strides, pool_padding=0, ceil_mode=False)
-
-
-def global_avgpool(inputs):
-    return fluid.layers.pool2d(
-        inputs,
-        1,
-        'avg',
-        1,
-        pool_padding=0,
-        global_pooling=True,
-        ceil_mode=True)
-
-
-def fully_connected(inputs, units):
-    n = inputs.shape[1]
-    param_attr = fluid.param_attr.ParamAttr(
-        initializer=fluid.initializer.NormalInitializer(
-            0.0, scale=np.sqrt(2.0 / n)),
-        regularizer=fluid.regularizer.L2Decay(FLAGS.weight_decay))
-
-    bias_attr = fluid.param_attr.ParamAttr(
-        regularizer=fluid.regularizer.L2Decay(0.))
-
-    return fluid.layers.fc(inputs,
-                           units,
-                           param_attr=param_attr,
-                           bias_attr=bias_attr)
-
-
-def bn_relu(inputs):
-    """ batch norm + relu layer """
-
-    output = fluid.layers.batch_norm(
-        inputs, momentum=FLAGS.bn_decay, epsilon=0.001, data_layout="NCHW")
-    return fluid.layers.relu(output)
-
-
-def dropout(inputs):
-    """ dropout layer """
-
-    return fluid.layers.dropout(inputs, dropout_prob=FLAGS.dropout_rate)
diff --git a/AutoDL/HiNAS_models/build/ops.py b/AutoDL/HiNAS_models/build/ops.py
deleted file mode 100755
index 359f62852fe193cabaad73d7361ed6db57cf6d8c..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/build/ops.py
+++ /dev/null
@@ -1,117 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import build.layers as layers - - -def conv_1x1(inputs, downsample=False): - return conv_base(inputs, (1, 1), downsample=downsample) - - -def conv_2x2(inputs, downsample=False): - return conv_base(inputs, (2, 2), downsample=downsample) - - -def conv_3x3(inputs, downsample=False): - return conv_base(inputs, (3, 3), downsample=downsample) - - -def dilated_2x2(inputs, downsample=False): - return conv_base(inputs, (2, 2), (2, 2), downsample) - - -def conv_1x2_2x1(inputs, downsample=False): - return pair_base(inputs, 2, downsample) - - -def conv_1x3_3x1(inputs, downsample=False): - return pair_base(inputs, 3, downsample) - - -def sep_2x2(inputs, downsample=False): - return sep_base(inputs, (2, 2), downsample=downsample) - - -def sep_3x3(inputs, downsample=False): - return sep_base(inputs, (3, 3), downsample=downsample) - - -def maxpool_2x2(inputs, downsample=False): - return maxpool_base(inputs, (2, 2), downsample) - - -def maxpool_3x3(inputs, downsample=False): - return maxpool_base(inputs, (3, 3), downsample) - - -def avgpool_2x2(inputs, downsample=False): - return avgpool_base(inputs, (2, 2), downsample) - - -def avgpool_3x3(inputs, downsample=False): - return avgpool_base(inputs, (3, 3), downsample) - - -def conv_base(inputs, kernel, dilation=(1, 1), downsample=False): - filters = inputs.shape[1] - if downsample: - output = layers.conv(inputs, filters * 2, kernel, (2, 2)) - else: - output = layers.conv(inputs, filters, kernel, dilation=dilation) - return output - - -def pair_base(inputs, kernel, downsample=False): - filters = inputs.shape[1] - if downsample: - output = layers.conv(inputs, filters, (1, kernel), (1, 2)) - output = layers.conv(output, filters, (kernel, 1), (2, 1)) - output = layers.conv(output, filters * 2, (1, 1)) - else: - output = layers.conv(inputs, filters, (1, kernel)) - output = layers.conv(output, filters, (kernel, 1)) - return output - - -def sep_base(inputs, kernel, dilation=(1, 1), downsample=False): - filters = inputs.shape[1] - if downsample: - output = layers.sep(inputs, filters * 2, kernel, (2, 2)) - else: - output = layers.sep(inputs, filters, kernel, dilation=dilation) - return output - - -def maxpool_base(inputs, kernel, downsample=False): - if downsample: - filters = inputs.shape[1] - output = layers.maxpool(inputs, kernel, (2, 2)) - output = layers.conv(output, filters * 2, (1, 1)) - else: - output = layers.maxpool(inputs, kernel) - return output - - -def avgpool_base(inputs, kernel, downsample=False): - if downsample: - filters = inputs.shape[1] - output = layers.avgpool(inputs, kernel, (2, 2)) - output = layers.conv(output, filters * 2, (1, 1)) - else: - output = layers.avgpool(inputs, kernel) - return output diff --git a/AutoDL/HiNAS_models/build/resnet_base.py b/AutoDL/HiNAS_models/build/resnet_base.py deleted file mode 100755 index 76c870de3bed9641622ed5722dff9e58b76fddff..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/build/resnet_base.py +++ /dev/null @@ -1,109 +0,0 @@ -# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import paddle.fluid as fluid
-from absl import flags
-
-import build.layers as layers
-import build.ops as _ops
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_integer("num_stages", 3, "number of stages")
-flags.DEFINE_integer("num_blocks", 5, "number of blocks per stage")
-flags.DEFINE_integer("num_ops", 2, "number of operations per block")
-flags.DEFINE_integer("width", 64, "network width")
-flags.DEFINE_string("downsample", "pool", "conv or pool")
-
-num_classes = 10
-
-ops = [
-    _ops.conv_1x1,
-    _ops.conv_2x2,
-    _ops.conv_3x3,
-    _ops.dilated_2x2,
-    _ops.conv_1x2_2x1,
-    _ops.conv_1x3_3x1,
-    _ops.sep_2x2,
-    _ops.sep_3x3,
-    _ops.maxpool_2x2,
-    _ops.maxpool_3x3,
-    _ops.avgpool_2x2,
-    _ops.avgpool_3x3,
-]
-
-
-def net(inputs, tokens):
-    """ build network with skip links """
-
-    x = layers.conv(inputs, FLAGS.width, (3, 3))
-
-    num_ops = FLAGS.num_blocks * FLAGS.num_ops
-    x = stage(x, tokens[:num_ops], pre_activation=True)
-    for i in range(1, FLAGS.num_stages):
-        x = stage(x, tokens[i * num_ops:(i + 1) * num_ops], downsample=True)
-
-    x = layers.bn_relu(x)
-    x = layers.global_avgpool(x)
-    x = layers.dropout(x)
-    logits = layers.fully_connected(x, num_classes)
-
-    return fluid.layers.softmax(logits)
-
-
-def stage(x, tokens, pre_activation=False, downsample=False):
-    """ build a stage of the network; a stage consists of blocks """
-
-    x = block(x, tokens[:FLAGS.num_ops], pre_activation, downsample)
-    for i in range(1, FLAGS.num_blocks):
-        print("-" * 12)
-        x = block(x, tokens[i * FLAGS.num_ops:(i + 1) * FLAGS.num_ops])
-    print("=" * 12)
-
-    return x
-
-
-def block(x, tokens, pre_activation=False, downsample=False):
-    """ build a block. """
-
-    if pre_activation:
-        x = layers.bn_relu(x)
-        res = x
-    else:
-        res = x
-        x = layers.bn_relu(x)
-
-    x = ops[tokens[0]](x, downsample)
-    # report the op actually selected by the first token
-    print("%s \t-> shape %s" % (ops[tokens[0]].__name__, x.shape))
-    for token in tokens[1:]:
-        x = layers.bn_relu(x)
-        x = ops[token](x)
-        print("%s \t-> shape %s" % (ops[token].__name__, x.shape))
-
-    if downsample:
-        filters = res.shape[1]
-        if FLAGS.downsample == "conv":
-            res = layers.conv(res, filters * 2, (1, 1), (2, 2))
-        elif FLAGS.downsample == "pool":
-            res = layers.avgpool(res, (2, 2), (2, 2))
-            res = fluid.layers.pad(res, (0, 0, filters // 2, filters // 2,
-                                         0, 0, 0, 0))
-        else:
-            raise NotImplementedError
-
-    return x + res
diff --git a/AutoDL/HiNAS_models/build/vgg_base.py b/AutoDL/HiNAS_models/build/vgg_base.py
deleted file mode 100755
index d7506a7ec4617a4c1017911a763084f754c6b1f0..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/build/vgg_base.py
+++ /dev/null
@@ -1,70 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import paddle.fluid as fluid -from absl import flags - -import build.layers as layers -import build.ops as _ops - -FLAGS = flags.FLAGS -flags.DEFINE_integer("num_stages", 5, "number of stages") -flags.DEFINE_integer("width", 64, "network width") - -num_classes = 10 - -ops = [ - _ops.conv_1x1, #0 - _ops.conv_2x2, #1 - _ops.conv_3x3, #2 - _ops.dilated_2x2, #3 - _ops.conv_1x2_2x1, #4 - _ops.conv_1x3_3x1, #5 - _ops.sep_2x2, #6 - _ops.sep_3x3, #7 - _ops.maxpool_2x2, #8 - _ops.maxpool_3x3, - _ops.avgpool_2x2, #10 - _ops.avgpool_3x3, -] - - -def net(inputs, tokens): - depth = len(tokens) - q, r = divmod(depth + 1, FLAGS.num_stages) - downsample_steps = [ - i * q + max(0, i + r - FLAGS.num_stages + 1) - 2 - for i in range(1, FLAGS.num_stages) - ] - - x = layers.conv(inputs, FLAGS.width, (3, 3)) - x = layers.bn_relu(x) - - for i, token in enumerate(tokens): - downsample = i in downsample_steps - x = ops[token](x, downsample) - print("%s \t-> shape %s" % (ops[token].__name__, x.shape)) - if downsample: - print("=" * 12) - x = layers.bn_relu(x) - - x = layers.global_avgpool(x) - x = layers.dropout(x) - logits = layers.fully_connected(x, num_classes) - - return fluid.layers.softmax(logits) diff --git a/AutoDL/HiNAS_models/nn_paddle.py b/AutoDL/HiNAS_models/nn_paddle.py deleted file mode 100755 index d3a3ddd60cf3e5e114de322f3eea763e5a2e6018..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/nn_paddle.py +++ /dev/null @@ -1,139 +0,0 @@ -# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -import numpy as np -import paddle -import paddle.fluid as fluid -from paddle.fluid.contrib.trainer import * -from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter -import reader - -from absl import flags - -# import preprocess - -FLAGS = flags.FLAGS - -flags.DEFINE_float("lr_max", 0.1, "initial learning rate") -flags.DEFINE_float("lr_min", 0.0001, "limiting learning rate") - -flags.DEFINE_integer("batch_size", 128, "batch size") -flags.DEFINE_integer("num_epochs", 200, "total epochs to train") -flags.DEFINE_float("weight_decay", 0.0004, "weight decay") - -flags.DEFINE_float("momentum", 0.9, "momentum") - -flags.DEFINE_boolean("shuffle_image", True, "shuffle input images on training") - -dataset_train_size = 50000 - - -class Model(object): - def __init__(self, build_fn, tokens): - print("learning rate: %f -> %f, cosine annealing" % - (FLAGS.lr_max, FLAGS.lr_min)) - print("epoch: %d" % FLAGS.num_epochs) - print("batch size: %d" % FLAGS.batch_size) - print("L2 decay: %f" % FLAGS.weight_decay) - - self.max_step = dataset_train_size * FLAGS.num_epochs // FLAGS.batch_size - - self.build_fn = build_fn - self.tokens = tokens - print("Token is %s" % ",".join(map(str, tokens))) - - def cosine_annealing(self): - step = _decay_step_counter() - lr = FLAGS.lr_min + (FLAGS.lr_max - FLAGS.lr_min) / 2 \ - * (1.0 + fluid.layers.ops.cos(step / self.max_step * math.pi)) - return lr - - def optimizer_program(self): - return fluid.optimizer.Momentum( - learning_rate=self.cosine_annealing(), - momentum=FLAGS.momentum, - use_nesterov=True, - regularization=fluid.regularizer.L2DecayRegularizer( - FLAGS.weight_decay)) - - def inference_network(self): - images = fluid.layers.data( - name='pixel', shape=[3, 32, 32], dtype='float32') - return self.build_fn(images, self.tokens) - - def train_network(self): - predict = self.inference_network() - label = fluid.layers.data(name='label', shape=[1], dtype='int64') - cost = fluid.layers.cross_entropy(input=predict, label=label) - avg_cost = fluid.layers.mean(cost) - accuracy = fluid.layers.accuracy(input=predict, label=label) - # self.parameters = fluid.parameters.create(avg_cost) - return [avg_cost, accuracy] - - def run(self): - train_files = reader.train10() - test_files = reader.test10() - - if FLAGS.shuffle_image: - train_reader = paddle.batch( - paddle.reader.shuffle(train_files, dataset_train_size), - batch_size=FLAGS.batch_size) - else: - train_reader = paddle.batch( - train_files, batch_size=FLAGS.batch_size) - - test_reader = paddle.batch(test_files, batch_size=FLAGS.batch_size) - - costs = [] - accs = [] - - def event_handler(event): - if isinstance(event, EndStepEvent): - costs.append(event.metrics[0]) - accs.append(event.metrics[1]) - if event.step % 20 == 0: - print("Epoch %d, Step %d, Loss %f, Acc %f" % ( - event.epoch, event.step, np.mean(costs), np.mean(accs))) - del costs[:] - del accs[:] - - if isinstance(event, EndEpochEvent): - if event.epoch % 3 == 0 or event.epoch == FLAGS.num_epochs - 1: - avg_cost, accuracy = trainer.test( - reader=test_reader, feed_order=['pixel', 'label']) - - event_handler.best_acc = max(event_handler.best_acc, - accuracy) - print("Test with epoch %d, Loss %f, Acc %f" % - (event.epoch, avg_cost, accuracy)) - print("Best acc %f" % event_handler.best_acc) - - event_handler.best_acc = 0.0 - place = fluid.CUDAPlace(0) - trainer = Trainer( - train_func=self.train_network, - 
optimizer_func=self.optimizer_program,
-            place=place)
-
-        trainer.train(
-            reader=train_reader,
-            num_epochs=FLAGS.num_epochs,
-            event_handler=event_handler,
-            feed_order=['pixel', 'label'])
diff --git a/AutoDL/HiNAS_models/reader.py b/AutoDL/HiNAS_models/reader.py
deleted file mode 100755
index e30725b0c171376029d8c51dc38ac01350740c4a..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/reader.py
+++ /dev/null
@@ -1,157 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-"""
-CIFAR-10 dataset.
-This module will download the dataset from
-https://www.cs.toronto.edu/~kriz/cifar.html and parse the train/test set into
-paddle reader creators.
-The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes,
-with 6000 images per class. There are 50000 training images and 10000 test images.
-"""
-
-from PIL import Image
-from PIL import ImageOps
-import numpy as np
-
-import cPickle
-import itertools
-import paddle.dataset.common
-import tarfile
-from absl import flags
-
-FLAGS = flags.FLAGS
-
-flags.DEFINE_boolean("random_flip_left_right", True,
-                     "random flip left and right")
-flags.DEFINE_boolean("random_flip_up_down", False, "random flip up and down")
-flags.DEFINE_boolean("cutout", True, "cutout")
-flags.DEFINE_boolean("standardize_image", True, "standardize input images")
-flags.DEFINE_boolean("pad_and_cut_image", True, "pad and cut input images")
-
-__all__ = ['train10', 'test10', 'convert']
-
-URL_PREFIX = 'https://www.cs.toronto.edu/~kriz/'
-CIFAR10_URL = URL_PREFIX + 'cifar-10-python.tar.gz'
-CIFAR10_MD5 = 'c58f30108f718f92721af3b95e74349a'
-
-paddle.dataset.common.DATA_HOME = "dataset/"
-
-image_size = 32
-image_depth = 3
-half_length = 8
-
-
-def preprocess(sample, is_training):
-    image_array = sample.reshape(3, image_size, image_size)
-    rgb_array = np.transpose(image_array, (1, 2, 0))
-    img = Image.fromarray(rgb_array, 'RGB')
-
-    if is_training:
-        if FLAGS.pad_and_cut_image:
-            # pad to 36 * 36 * 3, then random crop back to 32 * 32
-            img = ImageOps.expand(
-                img, (2, 2, 2, 2), fill=0)
-            left_top = np.random.randint(5, size=2)  # random offsets in [0, 4]
-            img = img.crop((left_top[0], left_top[1], left_top[0] + image_size,
-                            left_top[1] + image_size))
-
-        if FLAGS.random_flip_left_right and np.random.randint(2):
-            img = img.transpose(Image.FLIP_LEFT_RIGHT)
-        if FLAGS.random_flip_up_down and np.random.randint(2):
-            img = img.transpose(Image.FLIP_TOP_BOTTOM)
-
-    img = np.array(img).astype(np.float32)
-
-    if FLAGS.standardize_image:
-        # per-image standardization
-        img_float = img / 255.0
-        mean = np.mean(img_float)
-        std = max(np.std(img_float), 1.0 / np.sqrt(3 * image_size * image_size))
-        img = (img_float - mean) / std
-
-    if is_training and FLAGS.cutout:
-        # zero out a square of side 2 * half_length around a random center
-        center = np.random.randint(image_size, size=2)
-        offset_width = max(0, center[0] - half_length)
-        offset_height = max(0, center[1] - half_length)
-        target_width = min(center[0] + half_length, image_size)
-        target_height = min(center[1] + half_length, image_size)
-
-        for i in 
range(offset_height, target_height): - for j in range(offset_width, target_width): - img[i][j][:] = 0.0 - - img = np.transpose(img, (2, 0, 1)) - return img.reshape(3 * image_size * image_size) - - -def reader_creator(filename, sub_name, is_training): - def read_batch(batch): - data = batch['data'] - labels = batch.get('labels', batch.get('fine_labels', None)) - assert labels is not None - for sample, label in itertools.izip(data, labels): - yield preprocess(sample, is_training), int(label) - - def reader(): - with tarfile.open(filename, mode='r') as f: - names = [ - each_item.name for each_item in f if sub_name in each_item.name - ] - names.sort() - - for name in names: - print("Reading file " + name) - batch = cPickle.load(f.extractfile(name)) - for item in read_batch(batch): - yield item - - return reader - - -def train10(): - """ - CIFAR-10 training set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Training reader creator - :rtype: callable - """ - return reader_creator( - paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5), - 'data_batch', True) - - -def test10(): - """ - CIFAR-10 test set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Test reader creator. - :rtype: callable - """ - return reader_creator( - paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5), - 'test_batch', False) - - -def fetch(): - paddle.dataset.common.download(CIFAR10_URL, 'cifar', CIFAR10_MD5) - - -def convert(path): - """ - Converts dataset to recordio format - """ - paddle.dataset.common.convert(path, train10(), 1000, "cifar_train10") - paddle.dataset.common.convert(path, test10(), 1000, "cifar_test10") diff --git a/AutoDL/HiNAS_models/tokens/15113.pkl b/AutoDL/HiNAS_models/tokens/15113.pkl deleted file mode 100755 index a36c7d322311ccceff93b13ddb5bc73058bb4bb7..0000000000000000000000000000000000000000 Binary files a/AutoDL/HiNAS_models/tokens/15113.pkl and /dev/null differ diff --git a/AutoDL/HiNAS_models/tokens/15383.pkl b/AutoDL/HiNAS_models/tokens/15383.pkl deleted file mode 100755 index 9f05c39bb408af893d7e19c0349a279b27ac4bc6..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/tokens/15383.pkl +++ /dev/null @@ -1,36 +0,0 @@ -cnumpy.core.multiarray -_reconstruct -p0 -(cnumpy -ndarray -p1 -(I0 -tp2 -S'b' -p3 -tp4 -Rp5 -(I1 -(I21 -tp6 -cnumpy -dtype -p7 -(S'i4' -p8 -I0 -I1 -tp9 -Rp10 -(I3 -S'<' -p11 -NNNI-1 -I-1 -I0 -tp12 -bI00 -S'\x05\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x05\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00\x08\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x01\x00\x00\x00\n\x00\x00\x00\t\x00\x00\x00\x03\x00\x00\x00\x08\x00\x00\x00\x0b\x00\x00\x00\x03\x00\x00\x00\t\x00\x00\x00\x02\x00\x00\x00\x06\x00\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00' -p13 -tp14 -b. 
\ No newline at end of file diff --git a/AutoDL/HiNAS_models/tokens/15613.pkl b/AutoDL/HiNAS_models/tokens/15613.pkl deleted file mode 100755 index 332564be14020af6118ad578092d7e68f1447596..0000000000000000000000000000000000000000 Binary files a/AutoDL/HiNAS_models/tokens/15613.pkl and /dev/null differ diff --git a/AutoDL/HiNAS_models/tokens/17754.pkl b/AutoDL/HiNAS_models/tokens/17754.pkl deleted file mode 100755 index 4844119fdbee64f86e457d70ce9e7259ced7b15f..0000000000000000000000000000000000000000 Binary files a/AutoDL/HiNAS_models/tokens/17754.pkl and /dev/null differ diff --git a/AutoDL/HiNAS_models/tokens/17925.pkl b/AutoDL/HiNAS_models/tokens/17925.pkl deleted file mode 100755 index 841412252339dfa63d44430eef3a95eed255379b..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/tokens/17925.pkl +++ /dev/null @@ -1,36 +0,0 @@ -cnumpy.core.multiarray -_reconstruct -p0 -(cnumpy -ndarray -p1 -(I0 -tp2 -S'b' -p3 -tp4 -Rp5 -(I1 -(I21 -tp6 -cnumpy -dtype -p7 -(S'i4' -p8 -I0 -I1 -tp9 -Rp10 -(I3 -S'<' -p11 -NNNI-1 -I-1 -I0 -tp12 -bI00 -S'\x07\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x05\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x02\x00\x00\x00\n\x00\x00\x00\x08\x00\x00\x00\x02\x00\x00\x00\t\x00\x00\x00\x0b\x00\x00\x00\t\x00\x00\x00\x06\x00\x00\x00\x04\x00\x00\x00\x04\x00\x00\x00\n\x00\x00\x00' -p13 -tp14 -b. \ No newline at end of file diff --git a/AutoDL/HiNAS_models/tokens/18089.pkl b/AutoDL/HiNAS_models/tokens/18089.pkl deleted file mode 100755 index a466a6c91d7f664ca85351c8e6eed1046f4a2152..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/tokens/18089.pkl +++ /dev/null @@ -1,36 +0,0 @@ -cnumpy.core.multiarray -_reconstruct -p0 -(cnumpy -ndarray -p1 -(I0 -tp2 -S'b' -p3 -tp4 -Rp5 -(I1 -(I21 -tp6 -cnumpy -dtype -p7 -(S'i4' -p8 -I0 -I1 -tp9 -Rp10 -(I3 -S'<' -p11 -NNNI-1 -I-1 -I0 -tp12 -bI00 -S'\x07\x00\x00\x00\x05\x00\x00\x00\x08\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\n\x00\x00\x00\t\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x08\x00\x00\x00\x02\x00\x00\x00\t\x00\x00\x00\x04\x00\x00\x00\t\x00\x00\x00\x0b\x00\x00\x00\x07\x00\x00\x00\x04\x00\x00\x00\x03\x00\x00\x00' -p13 -tp14 -b. \ No newline at end of file diff --git a/AutoDL/HiNAS_models/train_hinas.py b/AutoDL/HiNAS_models/train_hinas.py deleted file mode 100755 index 8e4a0f855a15545b71b98802f74e467cb22e06b7..0000000000000000000000000000000000000000 --- a/AutoDL/HiNAS_models/train_hinas.py +++ /dev/null @@ -1,44 +0,0 @@ -# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import pickle
-
-from absl import app
-from absl import flags
-
-import nn_paddle as nn
-from build import vgg_base
-
-FLAGS = flags.FLAGS
-flags.DEFINE_string("tokdir", "tokens/", "token directory")
-flags.DEFINE_integer("model", 0, "model")
-
-mid = [17925, 18089, 15383]
-
-
-def main(_):
-    f = os.path.join(FLAGS.tokdir, str(mid[FLAGS.model]) + ".pkl")
-    tokens = pickle.load(open(f, "rb"))
-
-    model = nn.Model(vgg_base.net, tokens)
-    model.run()
-
-
-if __name__ == "__main__":
-    app.run(main)
diff --git a/AutoDL/HiNAS_models/train_hinas_res.py b/AutoDL/HiNAS_models/train_hinas_res.py
deleted file mode 100755
index 4809042274d5a9b3660a66153c95e25e67ad988f..0000000000000000000000000000000000000000
--- a/AutoDL/HiNAS_models/train_hinas_res.py
+++ /dev/null
@@ -1,44 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-import pickle
-
-from absl import app
-from absl import flags
-
-import nn_paddle as nn
-from build import resnet_base
-
-FLAGS = flags.FLAGS
-flags.DEFINE_string("tokdir", "tokens/", "token directory")
-flags.DEFINE_integer("model", 0, "model")
-
-mid = [17754, 15113, 15613]
-
-
-def main(_):
-    f = os.path.join(FLAGS.tokdir, str(mid[FLAGS.model]) + ".pkl")
-    tokens = pickle.load(open(f, "rb"))
-
-    model = nn.Model(resnet_base.net, tokens)
-    model.run()
-
-
-if __name__ == "__main__":
-    app.run(main)
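The token files loaded above are plain pickled NumPy arrays (21 int32 entries for these models, as the ASCII pickle dumps earlier in this diff show), and each entry indexes the `ops` table of `build/vgg_base.py` or `build/resnet_base.py`. A minimal inspection sketch (illustrative only, not part of the original scripts):

```
import pickle

from build import vgg_base  # its `ops` list maps token values to layer builders

with open("tokens/15383.pkl", "rb") as f:
    tokens = pickle.load(f)

print(len(tokens))  # 21 operations for this model
print([vgg_base.ops[t].__name__ for t in tokens])
```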
diff --git a/AutoDL/LRC/README.md b/AutoDL/LRC/README.md
deleted file mode 100644
index df9af47d4a3876371673cbbfef0ad2553768b9a5..0000000000000000000000000000000000000000
--- a/AutoDL/LRC/README.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# LRC: Local Rademacher Complexity Regularization
-Regularization of Deep Neural Networks (DNNs) to improve their generalization capability is important and challenging. This directory contains an image classification model based on a novel regularizer rooted in Local Rademacher Complexity (LRC). We appreciate the contribution of [DARTS](https://arxiv.org/abs/1806.09055) to our research. LRC regularization and DARTS are combined in this model on the CIFAR-10 dataset. Code accompanying the paper
-> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
-> Yingzhen Yang, Xingjian Li, Jun Huan.\
-> _arXiv:1902.00873_.
-
----
-# Table of Contents
-
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training](#training)
-
-## Installation
-
-Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle version on your device is lower than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle) and update.
-
-## Data preparation
-
-When you want to use the CIFAR-10 dataset for the first time, you can download it as follows:
-
-    sh ./dataset/download.sh
-
-Please make sure your environment has an internet connection.
-
-The dataset will be downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the automatic download fails, you can download cifar-10-python.tar.gz from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location mentioned above.
-
-
-## Training
-
-After data preparation, one can start the training step with:
-
-    python -u train_mixup.py \
-        --batch_size=80 \
-        --auxiliary \
-        --weight_decay=0.0003 \
-        --learning_rate=0.025 \
-        --lrc_loss_lambda=0.7 \
-        --cutout
-- Set ```export CUDA_VISIBLE_DEVICES=0``` to specify one GPU for training.
-- For more help on arguments:
-
-    python train_mixup.py --help
-
-**data reader introduction:**
-
-* The data reader is defined in `reader.py`.
-* Images are reshaped to 32 * 32.
-* During training, images are padded to 40 * 40 and randomly cropped back to the original size.
-* During training, images are randomly flipped horizontally.
-* Image pixels are standardized.
-* During training, cutout is applied to images at random.
-* The order of the input images is shuffled during training.
-
-**model configuration:**
-
-* Use the auxiliary loss with auxiliary\_weight=0.4.
-* Use dropout with drop\_path\_prob=0.2.
-* Set lrc\_loss\_lambda=0.7.
-
-**training strategy:**
-
-* Use the momentum optimizer with momentum=0.9.
-* Weight decay is 0.0003.
-* Use cosine decay with init\_lr=0.025.
-* The total number of epochs is 600.
-* Use the Xavier initializer for conv2d weights, a constant initializer for batch norm weights, and a normal initializer for fc weights.
-* Initialize biases in batch norm and fc to zero, and do not add a bias to conv2d.
-
-
-## Reference
-
-  - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
-  - Differentiable architecture search in PyTorch [`code`](https://github.com/quark0/darts)
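`train_mixup.py` itself is not part of this diff, but the mixup objective it optimizes is visible in `model.py`'s `mixup_loss` below: cross-entropy against two label vectors `ya` and `yb`, mixed with a coefficient `lam`. A small NumPy sketch of the standard mixup recipe under that reading (the helper names and the beta-distributed `lam` are assumptions, not code from this repository):

```
import numpy as np

def mixup_batch(images, labels, alpha=1.0, rng=np.random):
    # blend each image with a shuffled partner; keep both label sets and lam
    lam = rng.beta(alpha, alpha)
    index = rng.permutation(len(images))
    mixed = lam * images + (1 - lam) * images[index]
    return mixed, labels, labels[index], lam

def mixup_loss(ce_a, ce_b, lam):
    # as in model.py: lam * CE(logits, ya) + (1 - lam) * CE(logits, yb)
    return lam * ce_a + (1 - lam) * ce_b
```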
diff --git a/AutoDL/LRC/README_cn.md b/AutoDL/LRC/README_cn.md
deleted file mode 100644
index 06dc937074de199af31db97ee200e7690443b1b0..0000000000000000000000000000000000000000
--- a/AutoDL/LRC/README_cn.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# LRC: Local Rademacher Complexity Regularization
-Choosing a good regularizer to improve generalization in deep neural networks is important and challenging. This directory contains an image classification model with a novel regularizer based on Local Rademacher Complexity (LRC). We are very grateful to [DARTS](https://arxiv.org/abs/1806.09055) for its help with this research. The model combines LRC regularization with the DARTS network and achieves excellent results on the CIFAR-10 dataset. The code is released together with the paper
-> [An Empirical Study on Regularization of Deep Neural Networks by Local Rademacher Complexity](https://arxiv.org/abs/1902.00873)\
-> Yingzhen Yang, Xingjian Li, Jun Huan.\
-> _arXiv:1902.00873_.
-
----
-# Table of Contents
-
-- [Installation](#installation)
-- [Data preparation](#data-preparation)
-- [Training](#training)
-
-## Installation
-
-Running the sample code in this directory requires PaddlePaddle Fluid v1.2.0 or later. If the PaddlePaddle version in your environment is lower than this, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/install/index_cn.html#paddlepaddle).
-
-## Data preparation
-
-The first time you use the CIFAR-10 dataset, you can download it with the following command:
-
-    sh ./dataset/download.sh
-
-Please make sure your environment has an internet connection. The data is downloaded to `dataset/cifar/cifar-10-batches-py` in the same directory as `train.py`. If the download fails, you can download cifar-10-python.tar.gz yourself from https://www.cs.toronto.edu/~kriz/cifar.html and decompress it to the location above.
-
-## Training
-
-Once the data is ready, training can be started with:
-
-    python -u train_mixup.py \
-        --batch_size=80 \
-        --auxiliary \
-        --weight_decay=0.0003 \
-        --learning_rate=0.025 \
-        --lrc_loss_lambda=0.7 \
-        --cutout
-- Set ```export CUDA_VISIBLE_DEVICES=0``` to train on a single GPU.
-- For the optional arguments, see:
-
-    python train_mixup.py --help
-
-**Data reader notes:**
-
-* The data reader is defined in `reader.py`.
-* Input images are reshaped to 32 * 32.
-* During training, images are padded to 40 * 40 and then randomly cropped back to the original input size.
-* During training, images are randomly flipped horizontally.
-* Every pixel of each image is normalized.
-* During training, random cutout is applied to images.
-* The order of input images is shuffled during training.
-
-**Model configuration:**
-
-* Use the auxiliary loss with an auxiliary loss weight of 0.4.
-* Use dropout with a drop rate of 0.2.
-* Set lrc\_loss\_lambda to 0.7.
-
-**Training strategy:**
-
-* Train with the momentum optimizer, momentum=0.9.
-* The weight decay coefficient is 0.0003.
-* Use cosine learning rate decay with an initial learning rate of 0.025.
-* Train for 600 epochs in total.
-* Use the Xavier initializer for convolution weights, a constant initializer for batch norm weights, and a Gaussian initializer for fully connected weights.
-* Initialize batch norm and fully connected biases to constants; do not use a bias for convolutions.
-
-
-## Reference
-
-  - DARTS: Differentiable Architecture Search [`paper`](https://arxiv.org/abs/1806.09055)
-  - Differentiable Architecture Search in PyTorch [`code`](https://github.com/quark0/darts)
diff --git a/AutoDL/LRC/dataset/download.sh b/AutoDL/LRC/dataset/download.sh
deleted file mode 100644
index 0981c3b6878421f80d392f314fd0ae836644a63c..0000000000000000000000000000000000000000
--- a/AutoDL/LRC/dataset/download.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-DIR="$( cd "$(dirname "$0")" ; pwd -P )"
-cd "$DIR"
-mkdir cifar
-cd cifar
-# Download the data.
-echo "Downloading..."
-wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
-# Extract the data.
-echo "Extracting..."
-tar zvxf cifar-10-python.tar.gz
diff --git a/AutoDL/LRC/genotypes.py b/AutoDL/LRC/genotypes.py
deleted file mode 100644
index 349fbd2478a7c2d1bb4cc3dd901b470de3c8b906..0000000000000000000000000000000000000000
--- a/AutoDL/LRC/genotypes.py
+++ /dev/null
@@ -1,116 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# Based on:
-# --------------------------------------------------------
-# DARTS
-# Copyright (c) 2018, Hanxiao Liu.
-# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from collections import namedtuple - -Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat') - -PRIMITIVES = [ - 'none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', - 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5' -] - -NASNet = Genotype( - normal=[ - ('sep_conv_5x5', 1), - ('sep_conv_3x3', 0), - ('sep_conv_5x5', 0), - ('sep_conv_3x3', 0), - ('avg_pool_3x3', 1), - ('skip_connect', 0), - ('avg_pool_3x3', 0), - ('avg_pool_3x3', 0), - ('sep_conv_3x3', 1), - ('skip_connect', 1), - ], - normal_concat=[2, 3, 4, 5, 6], - reduce=[ - ('sep_conv_5x5', 1), - ('sep_conv_7x7', 0), - ('max_pool_3x3', 1), - ('sep_conv_7x7', 0), - ('avg_pool_3x3', 1), - ('sep_conv_5x5', 0), - ('skip_connect', 3), - ('avg_pool_3x3', 2), - ('sep_conv_3x3', 2), - ('max_pool_3x3', 1), - ], - reduce_concat=[4, 5, 6], ) - -AmoebaNet = Genotype( - normal=[ - ('avg_pool_3x3', 0), - ('max_pool_3x3', 1), - ('sep_conv_3x3', 0), - ('sep_conv_5x5', 2), - ('sep_conv_3x3', 0), - ('avg_pool_3x3', 3), - ('sep_conv_3x3', 1), - ('skip_connect', 1), - ('skip_connect', 0), - ('avg_pool_3x3', 1), - ], - normal_concat=[4, 5, 6], - reduce=[ - ('avg_pool_3x3', 0), - ('sep_conv_3x3', 1), - ('max_pool_3x3', 0), - ('sep_conv_7x7', 2), - ('sep_conv_7x7', 0), - ('avg_pool_3x3', 1), - ('max_pool_3x3', 0), - ('max_pool_3x3', 1), - ('conv_7x1_1x7', 0), - ('sep_conv_3x3', 5), - ], - reduce_concat=[3, 4, 6]) - -DARTS_V1 = Genotype( - normal=[('sep_conv_3x3', 1), ('sep_conv_3x3', 0), ('skip_connect', 0), - ('sep_conv_3x3', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), - ('sep_conv_3x3', 0), ('skip_connect', 2)], - normal_concat=[2, 3, 4, 5], - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), - ('max_pool_3x3', 0), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('avg_pool_3x3', 0)], - reduce_concat=[2, 3, 4, 5]) -DARTS_V2 = Genotype( - normal=[('sep_conv_3x3', 0), ('sep_conv_3x3', 1), ('sep_conv_3x3', 0), - ('sep_conv_3x3', 1), ('sep_conv_3x3', 1), ('skip_connect', 0), - ('skip_connect', 0), ('dil_conv_3x3', 2)], - normal_concat=[2, 3, 4, 5], - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('skip_connect', 2), - ('max_pool_3x3', 1), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('max_pool_3x3', 1)], - reduce_concat=[2, 3, 4, 5]) - -MY_DARTS = Genotype( - normal=[('sep_conv_3x3', 0), ('skip_connect', 1), ('skip_connect', 0), - ('dil_conv_5x5', 1), ('skip_connect', 0), ('sep_conv_3x3', 1), - ('skip_connect', 0), ('sep_conv_3x3', 1)], - normal_concat=range(2, 6), - reduce=[('max_pool_3x3', 0), ('max_pool_3x3', 1), ('max_pool_3x3', 0), - ('skip_connect', 2), ('max_pool_3x3', 0), ('skip_connect', 2), - ('skip_connect', 2), ('skip_connect', 3)], - reduce_concat=range(2, 6)) - -DARTS = MY_DARTS diff --git a/AutoDL/LRC/learning_rate.py b/AutoDL/LRC/learning_rate.py deleted file mode 100644 index 3965171b487884d36e4a7447f10f312204803bf8..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/learning_rate.py +++ /dev/null @@ -1,43 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-# Based on:
-# --------------------------------------------------------
-# DARTS
-# Copyright (c) 2018, Hanxiao Liu.
-# Licensed under the Apache License, Version 2.0;
-# --------------------------------------------------------
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-import paddle
-import paddle.fluid as fluid
-import paddle.fluid.layers.ops as ops
-from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
-import math
-from paddle.fluid.initializer import init_on_cpu
-
-
-def cosine_decay(learning_rate, num_epoch, steps_one_epoch):
-    """Applies cosine decay to the learning rate:
-    decayed_lr = learning_rate * (cos(global_step * pi / (num_epoch * steps_one_epoch)) + 1) / 2
-    """
-    global_step = _decay_step_counter()
-
-    with init_on_cpu():
-        decayed_lr = learning_rate * \
-            (ops.cos((global_step / steps_one_epoch) \
-                     * math.pi / num_epoch) + 1) / 2
-    return decayed_lr
diff --git a/AutoDL/LRC/model.py b/AutoDL/LRC/model.py
deleted file mode 100644
index 45a403495ecc0b7cc0ac3b541d75702adbef31b2..0000000000000000000000000000000000000000
--- a/AutoDL/LRC/model.py
+++ /dev/null
@@ -1,313 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
-#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
-#
-# Based on:
-# --------------------------------------------------------
-# DARTS
-# Copyright (c) 2018, Hanxiao Liu.
-# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import os -import sys -import numpy as np -import time -import functools -import paddle -import paddle.fluid as fluid -from operations import * - - -class Cell(): - def __init__(self, genotype, C_prev_prev, C_prev, C, reduction, - reduction_prev): - print(C_prev_prev, C_prev, C) - - if reduction_prev: - self.preprocess0 = functools.partial(FactorizedReduce, C_out=C) - else: - self.preprocess0 = functools.partial( - ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) - self.preprocess1 = functools.partial( - ReLUConvBN, C_out=C, kernel_size=1, stride=1, padding=0) - if reduction: - op_names, indices = zip(*genotype.reduce) - concat = genotype.reduce_concat - else: - op_names, indices = zip(*genotype.normal) - concat = genotype.normal_concat - print(op_names, indices, concat, reduction) - self._compile(C, op_names, indices, concat, reduction) - - def _compile(self, C, op_names, indices, concat, reduction): - assert len(op_names) == len(indices) - self._steps = len(op_names) // 2 - self._concat = concat - self.multiplier = len(concat) - - self._ops = [] - for name, index in zip(op_names, indices): - stride = 2 if reduction and index < 2 else 1 - op = functools.partial(OPS[name], C=C, stride=stride, affine=True) - self._ops += [op] - self._indices = indices - - def forward(self, s0, s1, drop_prob, is_train, name): - self.training = is_train - preprocess0_name = name + 'preprocess0.' - preprocess1_name = name + 'preprocess1.' - s0 = self.preprocess0(s0, name=preprocess0_name) - s1 = self.preprocess1(s1, name=preprocess1_name) - out = [s0, s1] - for i in range(self._steps): - h1 = out[self._indices[2 * i]] - h2 = out[self._indices[2 * i + 1]] - op1 = self._ops[2 * i] - op2 = self._ops[2 * i + 1] - h3 = op1(h1, name=name + '_ops.' + str(2 * i) + '.') - h4 = op2(h2, name=name + '_ops.' 
+ str(2 * i + 1) + '.') - if self.training and drop_prob > 0.: - if h3 != h1: - h3 = fluid.layers.dropout( - h3, - drop_prob, - dropout_implementation='upscale_in_train') - if h4 != h2: - h4 = fluid.layers.dropout( - h4, - drop_prob, - dropout_implementation='upscale_in_train') - s = h3 + h4 - out += [s] - return fluid.layers.concat([out[i] for i in self._concat], axis=1) - - -def AuxiliaryHeadCIFAR(input, num_classes, aux_name='auxiliary_head'): - relu_a = fluid.layers.relu(input) - pool_a = fluid.layers.pool2d(relu_a, 5, 'avg', 3) - conv2d_a = fluid.layers.conv2d( - pool_a, - 128, - 1, - name=aux_name + '.features.2', - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=aux_name + '.features.2.weight'), - bias_attr=False) - bn_a_name = aux_name + '.features.3' - bn_a = fluid.layers.batch_norm( - conv2d_a, - act='relu', - name=bn_a_name, - param_attr=ParamAttr( - initializer=Constant(1.), name=bn_a_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=bn_a_name + '.bias'), - moving_mean_name=bn_a_name + '.running_mean', - moving_variance_name=bn_a_name + '.running_var') - conv2d_b = fluid.layers.conv2d( - bn_a, - 768, - 2, - name=aux_name + '.features.5', - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=aux_name + '.features.5.weight'), - bias_attr=False) - bn_b_name = aux_name + '.features.6' - bn_b = fluid.layers.batch_norm( - conv2d_b, - act='relu', - name=bn_b_name, - param_attr=ParamAttr( - initializer=Constant(1.), name=bn_b_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=bn_b_name + '.bias'), - moving_mean_name=bn_b_name + '.running_mean', - moving_variance_name=bn_b_name + '.running_var') - fc_name = aux_name + '.classifier' - fc = fluid.layers.fc(bn_b, - num_classes, - name=fc_name, - param_attr=ParamAttr( - initializer=Normal(scale=1e-3), - name=fc_name + '.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=fc_name + '.bias')) - return fc - - -def StemConv(input, C_out, kernel_size, padding): - conv_a = fluid.layers.conv2d( - input, - C_out, - kernel_size, - padding=padding, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), name='stem.0.weight'), - bias_attr=False) - bn_a = fluid.layers.batch_norm( - conv_a, - param_attr=ParamAttr( - initializer=Constant(1.), name='stem.1.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name='stem.1.bias'), - moving_mean_name='stem.1.running_mean', - moving_variance_name='stem.1.running_var') - return bn_a - - -class NetworkCIFAR(object): - def __init__(self, C, class_num, layers, auxiliary, genotype): - self.class_num = class_num - self._layers = layers - self._auxiliary = auxiliary - - stem_multiplier = 3 - self.drop_path_prob = 0 - C_curr = stem_multiplier * C - - C_prev_prev, C_prev, C_curr = C_curr, C_curr, C - self.cells = [] - reduction_prev = False - for i in range(layers): - if i in [layers // 3, 2 * layers // 3]: - C_curr *= 2 - reduction = True - else: - reduction = False - cell = Cell(genotype, C_prev_prev, C_prev, C_curr, reduction, - reduction_prev) - reduction_prev = reduction - self.cells += [cell] - C_prev_prev, C_prev = C_prev, cell.multiplier * C_curr - if i == 2 * layers // 3: - C_to_auxiliary = C_prev - - def forward(self, init_channel, is_train): - self.training = is_train - self.logits_aux = None - num_channel = init_channel * 3 - s0 = StemConv(self.image, num_channel, kernel_size=3, padding=1) - s1 = s0 - for i, cell in enumerate(self.cells): - name = 'cells.' 
+ str(i) + '.' - s0, s1 = s1, cell.forward(s0, s1, self.drop_path_prob, is_train, - name) - if i == int(2 * self._layers // 3): - if self._auxiliary and self.training: - self.logits_aux = AuxiliaryHeadCIFAR(s1, self.class_num) - out = fluid.layers.adaptive_pool2d(s1, (1, 1), "avg") - self.logits = fluid.layers.fc(out, - size=self.class_num, - param_attr=ParamAttr( - initializer=Normal(scale=1e-3), - name='classifier.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - name='classifier.bias')) - return self.logits, self.logits_aux - - def build_input(self, image_shape, batch_size, is_train): - if is_train: - py_reader = fluid.layers.py_reader( - capacity=64, - shapes=[[-1] + image_shape, [-1, 1], [-1, 1], [-1, 1], [-1, 1], - [-1, 1], [-1, batch_size, self.class_num - 1]], - lod_levels=[0, 0, 0, 0, 0, 0, 0], - dtypes=[ - "float32", "int64", "int64", "float32", "int32", "int32", - "float32" - ], - use_double_buffer=True, - name='train_reader') - else: - py_reader = fluid.layers.py_reader( - capacity=64, - shapes=[[-1] + image_shape, [-1, 1]], - lod_levels=[0, 0], - dtypes=["float32", "int64"], - use_double_buffer=True, - name='test_reader') - return py_reader - - def train_model(self, py_reader, init_channels, aux, aux_w, batch_size, - loss_lambda): - self.image, self.ya, self.yb, self.lam, self.label_reshape,\ - self.non_label_reshape, self.rad_var = fluid.layers.read_file(py_reader) - self.logits, self.logits_aux = self.forward(init_channels, True) - self.mixup_loss = self.mixup_loss(aux, aux_w) - self.lrc_loss = self.lrc_loss(batch_size) - return self.mixup_loss + loss_lambda * self.lrc_loss - - def test_model(self, py_reader, init_channels): - self.image, self.ya = fluid.layers.read_file(py_reader) - self.logits, _ = self.forward(init_channels, False) - prob = fluid.layers.softmax(self.logits, use_cudnn=False) - loss = fluid.layers.cross_entropy(prob, self.ya) - acc_1 = fluid.layers.accuracy(self.logits, self.ya, k=1) - acc_5 = fluid.layers.accuracy(self.logits, self.ya, k=5) - return loss, acc_1, acc_5 - - def mixup_loss(self, auxiliary, auxiliary_weight): - prob = fluid.layers.softmax(self.logits, use_cudnn=False) - loss_a = fluid.layers.cross_entropy(prob, self.ya) - loss_b = fluid.layers.cross_entropy(prob, self.yb) - loss_a_mean = fluid.layers.reduce_mean(loss_a) - loss_b_mean = fluid.layers.reduce_mean(loss_b) - loss = self.lam * loss_a_mean + (1 - self.lam) * loss_b_mean - if auxiliary: - prob_aux = fluid.layers.softmax(self.logits_aux, use_cudnn=False) - loss_a_aux = fluid.layers.cross_entropy(prob_aux, self.ya) - loss_b_aux = fluid.layers.cross_entropy(prob_aux, self.yb) - loss_a_aux_mean = fluid.layers.reduce_mean(loss_a_aux) - loss_b_aux_mean = fluid.layers.reduce_mean(loss_b_aux) - loss_aux = self.lam * loss_a_aux_mean + (1 - self.lam - ) * loss_b_aux_mean - return loss + auxiliary_weight * loss_aux - - def lrc_loss(self, batch_size): - y_diff_reshape = fluid.layers.reshape(self.logits, shape=(-1, 1)) - label_reshape = fluid.layers.squeeze(self.label_reshape, axes=[1]) - non_label_reshape = fluid.layers.squeeze( - self.non_label_reshape, axes=[1]) - label_reshape.stop_gradient = True - non_label_reshape.stop_gradient = True - - y_diff_label_reshape = fluid.layers.gather(y_diff_reshape, - label_reshape) - y_diff_non_label_reshape = fluid.layers.gather(y_diff_reshape, - non_label_reshape) - y_diff_label = fluid.layers.reshape( - y_diff_label_reshape, shape=(-1, batch_size, 1)) - y_diff_non_label = fluid.layers.reshape( - y_diff_non_label_reshape, - shape=(-1, 
batch_size, self.class_num - 1)) - y_diff_ = y_diff_non_label - y_diff_label - - y_diff_ = fluid.layers.transpose(y_diff_, perm=[1, 2, 0]) - rad_var_trans = fluid.layers.transpose(self.rad_var, perm=[1, 2, 0]) - rad_y_diff_trans = rad_var_trans * y_diff_ - lrc_loss_sum = fluid.layers.reduce_sum(rad_y_diff_trans, dim=[0, 1]) - lrc_loss_ = fluid.layers.abs(lrc_loss_sum) / (batch_size * - (self.class_num - 1)) - lrc_loss_mean = fluid.layers.reduce_mean(lrc_loss_) - - return lrc_loss_mean diff --git a/AutoDL/LRC/operations.py b/AutoDL/LRC/operations.py deleted file mode 100644 index b015722a1bc5dbf682c90812a971f3dbb2cd8c9a..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/operations.py +++ /dev/null @@ -1,349 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. -#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import os -import sys -import numpy as np -import time -import paddle -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from paddle.fluid.initializer import Xavier -from paddle.fluid.initializer import Normal -from paddle.fluid.initializer import Constant - -OPS = { - 'none' : lambda input, C, stride, name, affine: Zero(input, stride, name), - 'avg_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'avg', pool_stride=stride, pool_padding=1, name=name), - 'max_pool_3x3' : lambda input, C, stride, name, affine: fluid.layers.pool2d(input, 3, 'max', pool_stride=stride, pool_padding=1, name=name), - 'skip_connect' : lambda input,C, stride, name, affine: Identity(input, name) if stride == 1 else FactorizedReduce(input, C, name=name, affine=affine), - 'sep_conv_3x3' : lambda input,C, stride, name, affine: SepConv(input, C, C, 3, stride, 1, name=name, affine=affine), - 'sep_conv_5x5' : lambda input,C, stride, name, affine: SepConv(input, C, C, 5, stride, 2, name=name, affine=affine), - 'sep_conv_7x7' : lambda input,C, stride, name, affine: SepConv(input, C, C, 7, stride, 3, name=name, affine=affine), - 'dil_conv_3x3' : lambda input,C, stride, name, affine: DilConv(input, C, C, 3, stride, 2, 2, name=name, affine=affine), - 'dil_conv_5x5' : lambda input,C, stride, name, affine: DilConv(input, C, C, 5, stride, 4, 2, name=name, affine=affine), - 'conv_7x1_1x7' : lambda input,C, stride, name, affine: SevenConv(input, C, name=name, affine=affine) -} - - -def ReLUConvBN(input, C_out, kernel_size, stride, padding, name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out, - kernel_size, - stride, - padding, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False) - if affine: 
- reluconvbn_out = fluid.layers.batch_norm( - conv2d_a, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.2.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.2.bias'), - moving_mean_name=name + 'op.2.running_mean', - moving_variance_name=name + 'op.2.running_var') - else: - reluconvbn_out = fluid.layers.batch_norm( - conv2d_a, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.2.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.2.bias'), - moving_mean_name=name + 'op.2.running_mean', - moving_variance_name=name + 'op.2.running_var') - return reluconvbn_out - - -def DilConv(input, - C_in, - C_out, - kernel_size, - stride, - padding, - dilation, - name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_in, - kernel_size, - stride, - padding, - dilation, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_out, - 1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - dilconv_out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - dilconv_out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - return dilconv_out - - -def SepConv(input, - C_in, - C_out, - kernel_size, - stride, - padding, - name='', - affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_in, - kernel_size, - stride, - padding, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_in, - 1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - bn_a = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - bn_a = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - - relu_b = fluid.layers.relu(bn_a) - conv2d_d = fluid.layers.conv2d( - relu_b, - C_in, - kernel_size, - 1, - padding, - groups=C_in, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.5.weight'), - bias_attr=False, - use_cudnn=False) - conv2d_e = fluid.layers.conv2d( - conv2d_d, - C_out, - 
1, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.6.weight'), - bias_attr=False) - if affine: - sepconv_out = fluid.layers.batch_norm( - conv2d_e, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.7.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.7.bias'), - moving_mean_name=name + 'op.7.running_mean', - moving_variance_name=name + 'op.7.running_var') - else: - sepconv_out = fluid.layers.batch_norm( - conv2d_e, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.7.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.7.bias'), - moving_mean_name=name + 'op.7.running_mean', - moving_variance_name=name + 'op.7.running_var') - return sepconv_out - - -def SevenConv(input, C_out, stride, name='', affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out, (1, 7), (1, stride), (0, 3), - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.1.weight'), - bias_attr=False) - conv2d_b = fluid.layers.conv2d( - conv2d_a, - C_out, (7, 1), (stride, 1), (3, 0), - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'op.2.weight'), - bias_attr=False) - if affine: - out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - else: - out = fluid.layers.batch_norm( - conv2d_b, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'op.3.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'op.3.bias'), - moving_mean_name=name + 'op.3.running_mean', - moving_variance_name=name + 'op.3.running_var') - - -def Identity(input, name=''): - return input - - -def Zero(input, stride, name=''): - ones = np.ones(input.shape[-2:]) - ones[::stride, ::stride] = 0 - ones = fluid.layers.assign(ones) - return input * ones - - -def FactorizedReduce(input, C_out, name='', affine=True): - relu_a = fluid.layers.relu(input) - conv2d_a = fluid.layers.conv2d( - relu_a, - C_out // 2, - 1, - 2, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'conv_1.weight'), - bias_attr=False) - h_end = relu_a.shape[2] - w_end = relu_a.shape[3] - slice_a = fluid.layers.slice(relu_a, [2, 3], [1, 1], [h_end, w_end]) - conv2d_b = fluid.layers.conv2d( - slice_a, - C_out // 2, - 1, - 2, - param_attr=ParamAttr( - initializer=Xavier( - uniform=False, fan_in=0), - name=name + 'conv_2.weight'), - bias_attr=False) - out = fluid.layers.concat([conv2d_a, conv2d_b], axis=1) - if affine: - out = fluid.layers.batch_norm( - out, - param_attr=ParamAttr( - initializer=Constant(1.), name=name + 'bn.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), name=name + 'bn.bias'), - moving_mean_name=name + 'bn.running_mean', - moving_variance_name=name + 'bn.running_var') - else: - out = fluid.layers.batch_norm( - out, - param_attr=ParamAttr( - initializer=Constant(1.), - learning_rate=0., - name=name + 'bn.weight'), - bias_attr=ParamAttr( - initializer=Constant(0.), - learning_rate=0., - name=name + 'bn.bias'), - moving_mean_name=name + 'bn.running_mean', - moving_variance_name=name + 'bn.running_var') - return out diff --git 
a/AutoDL/LRC/reader.py b/AutoDL/LRC/reader.py deleted file mode 100644 index 20b32b504e9245c4ff3892f08736d800080daab4..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/reader.py +++ /dev/null @@ -1,187 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- -""" -CIFAR-10 dataset. -This module will download dataset from -https://www.cs.toronto.edu/~kriz/cifar.html and parse train/test set into -paddle reader creators. -The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, -with 6000 images per class. There are 50000 training images and 10000 test images. -""" - -from PIL import Image -from PIL import ImageOps -import numpy as np - -import cPickle -import random -import utils -import paddle.fluid as fluid -import time -import os -import functools -import paddle.reader - -__all__ = ['train10', 'test10'] - -image_size = 32 -image_depth = 3 -half_length = 8 - -CIFAR_MEAN = [0.4914, 0.4822, 0.4465] -CIFAR_STD = [0.24703233, 0.24348505, 0.26158768] - - -def generate_reshape_label(label, batch_size, CIFAR_CLASSES=10): - reshape_label = np.zeros((batch_size, 1), dtype='int32') - reshape_non_label = np.zeros( - (batch_size * (CIFAR_CLASSES - 1), 1), dtype='int32') - num = 0 - for i in range(batch_size): - label_i = label[i] - reshape_label[i] = label_i + i * CIFAR_CLASSES - for j in range(CIFAR_CLASSES): - if label_i != j: - reshape_non_label[num] = \ - j + i * CIFAR_CLASSES - num += 1 - return reshape_label, reshape_non_label - - -def generate_bernoulli_number(batch_size, CIFAR_CLASSES=10): - rcc_iters = 50 - rad_var = np.zeros((rcc_iters, batch_size, CIFAR_CLASSES - 1)) - for i in range(rcc_iters): - bernoulli_num = np.random.binomial(size=batch_size, n=1, p=0.5) - bernoulli_map = np.array([]) - ones = np.ones((CIFAR_CLASSES - 1, 1)) - for batch_id in range(batch_size): - num = bernoulli_num[batch_id] - var_id = 2 * ones * num - 1 - bernoulli_map = np.append(bernoulli_map, var_id) - rad_var[i] = bernoulli_map.reshape((batch_size, CIFAR_CLASSES - 1)) - return rad_var.astype('float32') - - -def preprocess(sample, is_training, args): - image_array = sample.reshape(3, image_size, image_size) - rgb_array = np.transpose(image_array, (1, 2, 0)) - img = Image.fromarray(rgb_array, 'RGB') - - if is_training: - # pad and random crop - img = ImageOps.expand(img, (4, 4, 4, 4), fill=0) # pad to 40 * 40 * 3 - left_top = np.random.randint(9, size=2) # rand 0 - 8 - img = img.crop((left_top[0], left_top[1], left_top[0] + image_size, - left_top[1] + image_size)) - if np.random.randint(2): - img = img.transpose(Image.FLIP_LEFT_RIGHT) - - img = np.array(img).astype(np.float32) - - # per_image_standardization - img_float = img / 255.0 - img = (img_float - CIFAR_MEAN) / CIFAR_STD - - if is_training and 
args.cutout: - center = np.random.randint(image_size, size=2) - offset_width = max(0, center[0] - half_length) - offset_height = max(0, center[1] - half_length) - target_width = min(center[0] + half_length, image_size) - target_height = min(center[1] + half_length, image_size) - - for i in range(offset_height, target_height): - for j in range(offset_width, target_width): - img[i][j][:] = 0.0 - - img = np.transpose(img, (2, 0, 1)) - return img - - -def reader_creator_filepath(filename, sub_name, is_training, args): - files = os.listdir(filename) - names = [each_item for each_item in files if sub_name in each_item] - names.sort() - datasets = [] - for name in names: - print("Reading file " + name) - batch = cPickle.load(open(filename + name, 'rb')) - data = batch['data'] - labels = batch.get('labels', batch.get('fine_labels', None)) - assert labels is not None - dataset = zip(data, labels) - datasets.extend(dataset) - random.shuffle(datasets) - - def read_batch(datasets, args): - for sample, label in datasets: - im = preprocess(sample, is_training, args) - yield im, [int(label)] - - def reader(): - batch_data = [] - batch_label = [] - for data, label in read_batch(datasets, args): - batch_data.append(data) - batch_label.append(label) - if len(batch_data) == args.batch_size: - batch_data = np.array(batch_data, dtype='float32') - batch_label = np.array(batch_label, dtype='int64') - if is_training: - flatten_label, flatten_non_label = \ - generate_reshape_label(batch_label, args.batch_size) - rad_var = generate_bernoulli_number(args.batch_size) - mixed_x, y_a, y_b, lam = utils.mixup_data( - batch_data, batch_label, args.batch_size, - args.mix_alpha) - batch_out = [[mixed_x, y_a, y_b, lam, flatten_label, \ - flatten_non_label, rad_var]] - yield batch_out - else: - batch_out = [[batch_data, batch_label]] - yield batch_out - batch_data = [] - batch_label = [] - - return reader - - -def train10(args): - """ - CIFAR-10 training set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Training reader creator - :rtype: callable - """ - - return reader_creator_filepath(args.data, 'data_batch', True, args) - - -def test10(args): - """ - CIFAR-10 test set creator. - It returns a reader creator, each sample in the reader is image pixels in - [0, 1] and label in [0, 9]. - :return: Test reader creator. - :rtype: callable - """ - return reader_creator_filepath(args.data, 'test_batch', False, args) diff --git a/AutoDL/LRC/run.sh b/AutoDL/LRC/run.sh deleted file mode 100644 index 9f1a045d49789c3e9aebbc2a73b84b11da471b5a..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/run.sh +++ /dev/null @@ -1,8 +0,0 @@ -CUDA_VISIBLE_DEVICES=0 python -u train_mixup.py \ ---batch_size=80 \ ---auxiliary \ ---weight_decay=0.0003 \ ---learning_rate=0.025 \ ---lrc_loss_lambda=0.7 \ ---cutout - diff --git a/AutoDL/LRC/train_mixup.py b/AutoDL/LRC/train_mixup.py deleted file mode 100644 index de752c84bcf9276aa83540d60370517e66c0704f..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/train_mixup.py +++ /dev/null @@ -1,247 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve. -# -#Licensed under the Apache License, Version 2.0 (the "License"); -#you may not use this file except in compliance with the License. 
-#You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -#Unless required by applicable law or agreed to in writing, software -#distributed under the License is distributed on an "AS IS" BASIS, -#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -#See the License for the specific language governing permissions and -#limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from learning_rate import cosine_decay -import numpy as np -import argparse -from model import NetworkCIFAR as Network -import reader -import sys -import os -import time -import logging -import genotypes -import paddle.fluid as fluid -import shutil -import utils -import cPickle as cp - -parser = argparse.ArgumentParser("cifar") -parser.add_argument( - '--data', - type=str, - default='./dataset/cifar/cifar-10-batches-py/', - help='location of the data corpus') -parser.add_argument('--batch_size', type=int, default=96, help='batch size') -parser.add_argument( - '--learning_rate', type=float, default=0.025, help='init learning rate') -parser.add_argument('--momentum', type=float, default=0.9, help='momentum') -parser.add_argument( - '--weight_decay', type=float, default=3e-4, help='weight decay') -parser.add_argument( - '--report_freq', type=float, default=50, help='report frequency') -parser.add_argument( - '--epochs', type=int, default=600, help='num of training epochs') -parser.add_argument( - '--init_channels', type=int, default=36, help='num of init channels') -parser.add_argument( - '--layers', type=int, default=20, help='total number of layers') -parser.add_argument( - '--model_path', - type=str, - default='saved_models', - help='path to save the model') -parser.add_argument( - '--auxiliary', - action='store_true', - default=False, - help='use auxiliary tower') -parser.add_argument( - '--auxiliary_weight', - type=float, - default=0.4, - help='weight for auxiliary loss') -parser.add_argument( - '--cutout', action='store_true', default=False, help='use cutout') -parser.add_argument( - '--cutout_length', type=int, default=16, help='cutout length') -parser.add_argument( - '--drop_path_prob', type=float, default=0.2, help='drop path probability') -parser.add_argument('--save', type=str, default='EXP', help='experiment name') -parser.add_argument( - '--arch', type=str, default='DARTS', help='which architecture to use') -parser.add_argument( - '--grad_clip', type=float, default=5, help='gradient clipping') -parser.add_argument( - '--lr_exp_decay', - action='store_true', - default=False, - help='use exponential_decay learning_rate') -parser.add_argument('--mix_alpha', type=float, default=0.5, help='mixup alpha') -parser.add_argument( - '--lrc_loss_lambda', default=0, type=float, help='lrc_loss_lambda') -parser.add_argument( - '--loss_type', - default=1, - type=float, - help='loss_type 0: cross entropy 1: multi margin loss 2: max margin loss') - -args = parser.parse_args() - -CIFAR_CLASSES = 10 -dataset_train_size = 50000 -image_size = 32 - - -def main(): - image_shape = [3, image_size, image_size] - devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" - devices_num = len(devices.split(",")) - logging.info("args = %s", args) - genotype = 
eval("genotypes.%s" % args.arch) - model = Network(args.init_channels, CIFAR_CLASSES, args.layers, - args.auxiliary, genotype) - steps_one_epoch = dataset_train_size / (devices_num * args.batch_size) - train(model, args, image_shape, steps_one_epoch) - - -def build_program(main_prog, startup_prog, args, is_train, model, im_shape, - steps_one_epoch): - out = [] - with fluid.program_guard(main_prog, startup_prog): - py_reader = model.build_input(im_shape, args.batch_size, is_train) - if is_train: - with fluid.unique_name.guard(): - loss = model.train_model(py_reader, args.init_channels, - args.auxiliary, args.auxiliary_weight, - args.batch_size, args.lrc_loss_lambda) - optimizer = fluid.optimizer.Momentum( - learning_rate=cosine_decay(args.learning_rate, \ - args.epochs, steps_one_epoch), - regularization=fluid.regularizer.L2Decay(\ - args.weight_decay), - momentum=args.momentum) - optimizer.minimize(loss) - out = [py_reader, loss] - else: - with fluid.unique_name.guard(): - loss, acc_1, acc_5 = model.test_model(py_reader, - args.init_channels) - out = [py_reader, loss, acc_1, acc_5] - return out - - -def train(model, args, im_shape, steps_one_epoch): - train_startup_prog = fluid.Program() - test_startup_prog = fluid.Program() - train_prog = fluid.Program() - test_prog = fluid.Program() - - train_py_reader, loss_train = build_program(train_prog, train_startup_prog, - args, True, model, im_shape, - steps_one_epoch) - - test_py_reader, loss_test, acc_1, acc_5 = build_program( - test_prog, test_startup_prog, args, False, model, im_shape, - steps_one_epoch) - - test_prog = test_prog.clone(for_test=True) - - place = fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(train_startup_prog) - exe.run(test_startup_prog) - - exec_strategy = fluid.ExecutionStrategy() - exec_strategy.num_threads = 1 - train_exe = fluid.ParallelExecutor( - main_program=train_prog, - use_cuda=True, - loss_name=loss_train.name, - exec_strategy=exec_strategy) - train_reader = reader.train10(args) - test_reader = reader.test10(args) - train_py_reader.decorate_paddle_reader(train_reader) - test_py_reader.decorate_paddle_reader(test_reader) - - fluid.clip.set_gradient_clip(fluid.clip.GradientClipByNorm(args.grad_clip)) - fluid.memory_optimize(fluid.default_main_program()) - - def save_model(postfix, main_prog): - model_path = os.path.join(args.model_path, postfix) - if os.path.isdir(model_path): - shutil.rmtree(model_path) - fluid.io.save_persistables(exe, model_path, main_program=main_prog) - - def test(epoch_id): - test_fetch_list = [loss_test, acc_1, acc_5] - objs = utils.AvgrageMeter() - top1 = utils.AvgrageMeter() - top5 = utils.AvgrageMeter() - test_py_reader.start() - test_start_time = time.time() - step_id = 0 - try: - while True: - prev_test_start_time = test_start_time - test_start_time = time.time() - loss_test_v, acc_1_v, acc_5_v = exe.run( - test_prog, fetch_list=test_fetch_list) - objs.update(np.array(loss_test_v), args.batch_size) - top1.update(np.array(acc_1_v), args.batch_size) - top5.update(np.array(acc_5_v), args.batch_size) - if step_id % args.report_freq == 0: - print("Epoch {}, Step {}, acc_1 {}, acc_5 {}, time {}". 
- format(epoch_id, step_id, - np.array(acc_1_v), - np.array(acc_5_v), test_start_time - - prev_test_start_time)) - step_id += 1 - except fluid.core.EOFException: - test_py_reader.reset() - print("Epoch {0}, top1 {1}, top5 {2}".format(epoch_id, top1.avg, - top5.avg)) - - train_fetch_list = [loss_train] - epoch_start_time = time.time() - for epoch_id in range(args.epochs): - model.drop_path_prob = args.drop_path_prob * epoch_id / args.epochs - train_py_reader.start() - epoch_end_time = time.time() - if epoch_id > 0: - print("Epoch {}, total time {}".format(epoch_id - 1, epoch_end_time - - epoch_start_time)) - epoch_start_time = epoch_end_time - epoch_end_time - start_time = time.time() - step_id = 0 - try: - while True: - prev_start_time = start_time - start_time = time.time() - loss_v, = train_exe.run( - fetch_list=[v.name for v in train_fetch_list]) - print("Epoch {}, Step {}, loss {}, time {}".format(epoch_id, step_id, \ - np.array(loss_v).mean(), start_time-prev_start_time)) - step_id += 1 - sys.stdout.flush() - except fluid.core.EOFException: - train_py_reader.reset() - if epoch_id % 50 == 0 or epoch_id == args.epochs - 1: - save_model(str(epoch_id), train_prog) - test(epoch_id) - - -if __name__ == '__main__': - main() diff --git a/AutoDL/LRC/utils.py b/AutoDL/LRC/utils.py deleted file mode 100644 index 4002b57c6e91f9a4f7992156c4fa07f9e55d628c..0000000000000000000000000000000000000000 --- a/AutoDL/LRC/utils.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# -# Based on: -# -------------------------------------------------------- -# DARTS -# Copyright (c) 2018, Hanxiao Liu. -# Licensed under the Apache License, Version 2.0; -# -------------------------------------------------------- - -import os -import sys -import time -import math -import numpy as np - - -def mixup_data(x, y, batch_size, alpha=1.0): - '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda''' - if alpha > 0.: - lam = np.random.beta(alpha, alpha) - else: - lam = 1. 
- index = np.random.permutation(batch_size) - - mixed_x = lam * x + (1 - lam) * x[index, :] - y_a, y_b = y, y[index] - return mixed_x.astype('float32'), y_a.astype('int64'),\ - y_b.astype('int64'), np.array(lam, dtype='float32') - - -class AvgrageMeter(object): - def __init__(self): - self.reset() - - def reset(self): - self.avg = 0 - self.sum = 0 - self.cnt = 0 - - def update(self, val, n=1): - self.sum += val * n - self.cnt += n - self.avg = self.sum / self.cnt diff --git a/PaddleRL/DeepQNetwork/DQN_agent.py b/PaddleRL/DeepQNetwork/DQN_agent.py deleted file mode 100644 index 1b27051a1a4793ee99fd6d735eb876d483eece34..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/DQN_agent.py +++ /dev/null @@ -1,191 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - - policy_vars_name = [ - x.name.replace('target', 'policy') for x in target_vars] - policy_vars = list(filter( - lambda x: x.name in policy_vars_name, vars)) - - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - 
sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - out = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_fc1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) - return out - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/PaddleRL/DeepQNetwork/DoubleDQN_agent.py b/PaddleRL/DeepQNetwork/DoubleDQN_agent.py deleted file mode 100644 index ecd94abd459e728ac7c845ebee09adcbd6bbdd22..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/DoubleDQN_agent.py +++ /dev/null @@ -1,199 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DoubleDQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = 
state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - - next_s_predict_value = self.get_DQN_prediction(next_s) - greedy_action = fluid.layers.argmax(next_s_predict_value, axis=1) - greedy_action = fluid.layers.unsqueeze(greedy_action, axes=[1]) - - predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim) - best_v = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value), - dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - - policy_vars_name = [ - x.name.replace('target', 'policy') for x in target_vars] - policy_vars = list(filter( - lambda x: x.name in policy_vars_name, vars)) - - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - 
input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - out = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_fc1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field))) - return out - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/PaddleRL/DeepQNetwork/DuelingDQN_agent.py b/PaddleRL/DeepQNetwork/DuelingDQN_agent.py deleted file mode 100644 index 4c6dbbfb79b4e4a069b32297c9a48b737ec7145f..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/DuelingDQN_agent.py +++ /dev/null @@ -1,201 +0,0 @@ -#-*- coding: utf-8 -*- - -import math -import numpy as np -import paddle.fluid as fluid -from paddle.fluid.param_attr import ParamAttr -from tqdm import tqdm - - -class DuelingDQNModel(object): - def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False): - self.img_height = state_dim[0] - self.img_width = state_dim[1] - self.action_dim = action_dim - self.gamma = gamma - self.exploration = 1.1 - self.update_target_steps = 10000 // 4 - self.hist_len = hist_len - self.use_cuda = use_cuda - - self.global_step = 0 - self._build_net() - - def _get_inputs(self): - return fluid.layers.data( - name='state', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='action', shape=[1], dtype='int32'), \ - fluid.layers.data( - name='reward', shape=[], dtype='float32'), \ - fluid.layers.data( - name='next_s', - shape=[self.hist_len, self.img_height, self.img_width], - dtype='float32'), \ - fluid.layers.data( - name='isOver', shape=[], 
dtype='bool') - - def _build_net(self): - self.predict_program = fluid.Program() - self.train_program = fluid.Program() - self._sync_program = fluid.Program() - - with fluid.program_guard(self.predict_program): - state, action, reward, next_s, isOver = self._get_inputs() - self.pred_value = self.get_DQN_prediction(state) - - with fluid.program_guard(self.train_program): - state, action, reward, next_s, isOver = self._get_inputs() - pred_value = self.get_DQN_prediction(state) - - reward = fluid.layers.clip(reward, min=-1.0, max=1.0) - - action_onehot = fluid.layers.one_hot(action, self.action_dim) - action_onehot = fluid.layers.cast(action_onehot, dtype='float32') - - pred_action_value = fluid.layers.reduce_sum( - fluid.layers.elementwise_mul(action_onehot, pred_value), dim=1) - - targetQ_predict_value = self.get_DQN_prediction(next_s, target=True) - best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1) - best_v.stop_gradient = True - - target = reward + (1.0 - fluid.layers.cast( - isOver, dtype='float32')) * self.gamma * best_v - cost = fluid.layers.square_error_cost(pred_action_value, target) - cost = fluid.layers.reduce_mean(cost) - - optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3) - optimizer.minimize(cost) - - vars = list(self.train_program.list_vars()) - target_vars = list(filter( - lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)) - - policy_vars_name = [ - x.name.replace('target', 'policy') for x in target_vars] - policy_vars = list(filter( - lambda x: x.name in policy_vars_name, vars)) - - policy_vars.sort(key=lambda x: x.name) - target_vars.sort(key=lambda x: x.name) - - with fluid.program_guard(self._sync_program): - sync_ops = [] - for i, var in enumerate(policy_vars): - sync_op = fluid.layers.assign(policy_vars[i], target_vars[i]) - sync_ops.append(sync_op) - - # fluid exe - place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace() - self.exe = fluid.Executor(place) - self.exe.run(fluid.default_startup_program()) - - def get_DQN_prediction(self, image, target=False): - image = image / 255.0 - - variable_field = 'target' if target else 'policy' - - conv1 = fluid.layers.conv2d( - input=image, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv1'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field))) - max_pool1 = fluid.layers.pool2d( - input=conv1, pool_size=2, pool_stride=2, pool_type='max') - - conv2 = fluid.layers.conv2d( - input=max_pool1, - num_filters=32, - filter_size=5, - stride=1, - padding=2, - act='relu', - param_attr=ParamAttr(name='{}_conv2'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field))) - max_pool2 = fluid.layers.pool2d( - input=conv2, pool_size=2, pool_stride=2, pool_type='max') - - conv3 = fluid.layers.conv2d( - input=max_pool2, - num_filters=64, - filter_size=4, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv3'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field))) - max_pool3 = fluid.layers.pool2d( - input=conv3, pool_size=2, pool_stride=2, pool_type='max') - - conv4 = fluid.layers.conv2d( - input=max_pool3, - num_filters=64, - filter_size=3, - stride=1, - padding=1, - act='relu', - param_attr=ParamAttr(name='{}_conv4'.format(variable_field)), - bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field))) - - flatten = fluid.layers.flatten(conv4, axis=1) - - value = fluid.layers.fc( - input=flatten, - size=1, - 
param_attr=ParamAttr(name='{}_value_fc'.format(variable_field)), - bias_attr=ParamAttr(name='{}_value_fc_b'.format(variable_field))) - - advantage = fluid.layers.fc( - input=flatten, - size=self.action_dim, - param_attr=ParamAttr(name='{}_advantage_fc'.format(variable_field)), - bias_attr=ParamAttr( - name='{}_advantage_fc_b'.format(variable_field))) - - Q = advantage + (value - fluid.layers.reduce_mean( - advantage, dim=1, keep_dim=True)) - return Q - - - def act(self, state, train_or_test): - sample = np.random.random() - if train_or_test == 'train' and sample < self.exploration: - act = np.random.randint(self.action_dim) - else: - if np.random.random() < 0.01: - act = np.random.randint(self.action_dim) - else: - state = np.expand_dims(state, axis=0) - pred_Q = self.exe.run(self.predict_program, - feed={'state': state.astype('float32')}, - fetch_list=[self.pred_value])[0] - pred_Q = np.squeeze(pred_Q, axis=0) - act = np.argmax(pred_Q) - if train_or_test == 'train': - self.exploration = max(0.1, self.exploration - 1e-6) - return act - - def train(self, state, action, reward, next_state, isOver): - if self.global_step % self.update_target_steps == 0: - self.sync_target_network() - self.global_step += 1 - - action = np.expand_dims(action, -1) - self.exe.run(self.train_program, - feed={ - 'state': state.astype('float32'), - 'action': action.astype('int32'), - 'reward': reward, - 'next_s': next_state.astype('float32'), - 'isOver': isOver - }) - - def sync_target_network(self): - self.exe.run(self._sync_program) diff --git a/PaddleRL/DeepQNetwork/README.md b/PaddleRL/DeepQNetwork/README.md deleted file mode 100644 index 1edeaaa884318ec3a530ec4fdb7d031d07411b56..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/README.md +++ /dev/null @@ -1,67 +0,0 @@ -[中文版](README_cn.md) - -## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle -Based on PaddlePaddle's next-generation API Fluid, the DQN model of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game. The model receives the image of the game as input, and uses the end-to-end model to directly predict the next step. The repository contains the following three types of models: -+ DQN in -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN in: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN in: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) - -## Atari benchmark & performance - -### Atari games introduction - -Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari game. - -### Pong game result - -The average game rewards that can be obtained for the three models as the number of training steps changes during the training are as follows(about 3 hours/1 Million steps): - -
-<img src="assets/dqn.png" alt="DQN result"/> -
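
The three agents in this diff differ mainly in the Bellman target their `_build_net` graphs construct. Below is a minimal NumPy sketch of those targets, illustrative only and not code from this repository; `q_target_next` and `q_policy_next` stand in for the target network's and policy network's Q-values on the next state (`targetQ_predict_value` and `next_s_predict_value` in the agent files above), and `is_over` is a 0/1 float array.

```python
import numpy as np

def dqn_target(reward, is_over, q_target_next, gamma=0.99):
    # DQN: the target network both selects and evaluates the greedy next action.
    # Reward clipping to [-1, 1] and the (1 - is_over) mask mirror the fluid graph.
    best_v = q_target_next.max(axis=1)
    return np.clip(reward, -1.0, 1.0) + (1.0 - is_over) * gamma * best_v

def double_dqn_target(reward, is_over, q_policy_next, q_target_next, gamma=0.99):
    # Double DQN: the policy network selects the action, the target network
    # evaluates it, which reduces DQN's overestimation bias.
    greedy = q_policy_next.argmax(axis=1)
    best_v = q_target_next[np.arange(q_target_next.shape[0]), greedy]
    return np.clip(reward, -1.0, 1.0) + (1.0 - is_over) * gamma * best_v
```

In both cases the target is treated as a constant during the update (`best_v.stop_gradient = True` in the fluid code), so only the online Q-network receives gradients.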
- -## How to use -### Dependencies: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### Install Dependencies: -+ Install PaddlePaddle: - recommended to compile and install PaddlePaddle from source code -+ Install other dependencies: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - Install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment). - -### Start Training: -``` -# To train a model for Pong game with gpu (use DQN model as default) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# To train a model for Pong with DoubleDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# To train a model for Pong with DuelingDQN -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -To train more games, you can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms). - -### Start Testing: -``` -# Play the game with saved best model and calculate the average rewards -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# Play the game with visualization -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` -[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) is saved models for Pong and Breakout games. You can use it to play the game directly. diff --git a/PaddleRL/DeepQNetwork/README_cn.md b/PaddleRL/DeepQNetwork/README_cn.md deleted file mode 100644 index 640d775ad8fed2be360d308b6c5df41c86d77c04..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/README_cn.md +++ /dev/null @@ -1,71 +0,0 @@ -## 基于PaddlePaddle的Fluid版本复现DQN, DoubleDQN, DuelingDQN三个模型 - -基于PaddlePaddle下一代API Fluid复现了深度强化学习领域的DQN模型,在经典的Atari 游戏上复现了论文同等水平的指标,模型接收游戏的图像作为输入,采用端到端的模型直接预测下一步要执行的控制信号,本仓库一共包含以下3类模型: -+ DQN模型: -[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) -+ DoubleDQN模型: -[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389) -+ DuelingDQN模型: -[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html) - -## 模型效果:Atari游戏表现 - -### Atari游戏介绍 - -请点击[这里](https://gym.openai.com/envs/#atari)了解Atari游戏。 - -### Pong游戏训练结果 -三个模型在训练过程中随着训练步数的变化,能得到的平均游戏奖励如下图所示(大概3小时每1百万步): - -
-<img src="assets/dqn.png" alt="DQN result"/> -
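
As a companion to the sketch in the English README above, the dueling head removed in `DuelingDQN_agent.py` combines its two branches as `Q = A + (V - mean(A))`. A minimal NumPy rendering of that single line (illustrative only, not code from this repository):

```python
import numpy as np

def dueling_q(value, advantage):
    # value: (batch, 1) state-value head; advantage: (batch, num_actions) head.
    # Centering the advantages keeps the V/A decomposition identifiable,
    # matching the reduce_mean(..., keep_dim=True) term in get_DQN_prediction.
    return advantage + (value - advantage.mean(axis=1, keepdims=True))
```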
- -## 使用教程 - -### 依赖: -+ python2.7 -+ gym -+ tqdm -+ opencv-python -+ paddlepaddle-gpu>=1.0.0 -+ ale_python_interface - -### 下载依赖: - -+ 安装PaddlePaddle: - 建议通过PaddlePaddle源码进行编译安装 -+ 下载其它依赖: - ``` - pip install -r requirement.txt - pip install gym[atari] - ``` - 安装ale_python_interface可以参考[这里](https://github.com/mgbellemare/Arcade-Learning-Environment) - -### 训练模型: - -``` -# 使用GPU训练Pong游戏(默认使用DQN模型) -python train.py --rom ./rom_files/pong.bin --use_cuda - -# 训练DoubleDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN - -# 训练DuelingDQN模型 -python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN -``` - -训练更多游戏,可以从[这里](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)下载游戏rom - -### 测试模型: - -``` -# Play the game with saved model and calculate the average rewards -# 使用训练过程中保存的最好模型玩游戏,以及计算平均奖励(rewards) -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong - -# 以可视化的形式来玩游戏 -python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01 -``` - -[这里](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA)是Pong和Breakout游戏训练好的模型,可以直接用来测试。 diff --git a/PaddleRL/DeepQNetwork/assets/dqn.png b/PaddleRL/DeepQNetwork/assets/dqn.png deleted file mode 100644 index f8f8d12f9887cdab62f09b52597ec187a4c8107c..0000000000000000000000000000000000000000 Binary files a/PaddleRL/DeepQNetwork/assets/dqn.png and /dev/null differ diff --git a/PaddleRL/DeepQNetwork/atari.py b/PaddleRL/DeepQNetwork/atari.py deleted file mode 100644 index ec793cba15ddc1c42986689eaad5773875a4ffde..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/atari.py +++ /dev/null @@ -1,160 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -import os -import cv2 -import threading - -import gym -from gym import spaces -from gym.envs.atari.atari_env import ACTION_MEANING - -from atari_py import ALEInterface - -__all__ = ['AtariPlayer'] - -ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms" -_ALE_LOCK = threading.Lock() -""" -The following AtariPlayer are copied or modified from tensorpack/tensorpack: - https://github.com/tensorpack/tensorpack/blob/master/examples/DeepQNetwork/atari.py -""" - - -class AtariPlayer(gym.Env): - """ - A wrapper for ALE emulator, with configurations to mimic DeepMind DQN settings. - Info: - score: the accumulated reward in the current game - gameOver: True when the current game is Over - """ - - def __init__(self, - rom_file, - viz=0, - frame_skip=4, - nullop_start=30, - live_lost_as_eoe=True, - max_num_frames=0): - """ - Args: - rom_file: path to the rom - frame_skip: skip every k frames and repeat the action - viz: visualization to be done. - Set to 0 to disable. - Set to a positive number to be the delay between frames to show. - Set to a string to be a directory to store frames. - nullop_start: start with random number of null ops. - live_losts_as_eoe: consider lost of lives as end of episode. Useful for training. - max_num_frames: maximum number of frames per episode. - """ - super(AtariPlayer, self).__init__() - assert os.path.isfile(rom_file), \ - "rom {} not found. 
Please download at {}".format(rom_file, ROM_URL) - - try: - ALEInterface.setLoggerMode(ALEInterface.Logger.Error) - except AttributeError: - print("You're not using latest ALE") - - # avoid simulator bugs: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/86 - with _ALE_LOCK: - self.ale = ALEInterface() - self.ale.setInt(b"random_seed", np.random.randint(0, 30000)) - self.ale.setInt(b"max_num_frames_per_episode", max_num_frames) - self.ale.setBool(b"showinfo", False) - - self.ale.setInt(b"frame_skip", 1) - self.ale.setBool(b'color_averaging', False) - # manual.pdf suggests otherwise. - self.ale.setFloat(b'repeat_action_probability', 0.0) - - # viz setup - if isinstance(viz, str): - assert os.path.isdir(viz), viz - self.ale.setString(b'record_screen_dir', viz) - viz = 0 - if isinstance(viz, int): - viz = float(viz) - self.viz = viz - if self.viz and isinstance(self.viz, float): - self.windowname = os.path.basename(rom_file) - cv2.startWindowThread() - cv2.namedWindow(self.windowname) - - self.ale.loadROM(rom_file.encode('utf-8')) - self.width, self.height = self.ale.getScreenDims() - self.actions = self.ale.getMinimalActionSet() - - self.live_lost_as_eoe = live_lost_as_eoe - self.frame_skip = frame_skip - self.nullop_start = nullop_start - - self.action_space = spaces.Discrete(len(self.actions)) - self.observation_space = spaces.Box(low=0, - high=255, - shape=(self.height, self.width), - dtype=np.uint8) - self._restart_episode() - - def get_action_meanings(self): - return [ACTION_MEANING[i] for i in self.actions] - - def _grab_raw_image(self): - """ - :returns: the current 3-channel image - """ - m = self.ale.getScreenRGB() - return m.reshape((self.height, self.width, 3)) - - def _current_state(self): - """ - returns: a gray-scale (h, w) uint8 image - """ - ret = self._grab_raw_image() - # avoid missing frame issue: max-pooled over the last screen - ret = np.maximum(ret, self.last_raw_screen) - if self.viz: - if isinstance(self.viz, float): - cv2.imshow(self.windowname, ret) - cv2.waitKey(int(self.viz * 1000)) - ret = ret.astype('float32') - # 0.299,0.587.0.114. 
same as rgb2y in torch/image - ret = cv2.cvtColor(ret, cv2.COLOR_RGB2GRAY) - return ret.astype('uint8') # to save some memory - - def _restart_episode(self): - with _ALE_LOCK: - self.ale.reset_game() - - # random null-ops start - n = np.random.randint(self.nullop_start) - self.last_raw_screen = self._grab_raw_image() - for k in range(n): - if k == n - 1: - self.last_raw_screen = self._grab_raw_image() - self.ale.act(0) - - def reset(self): - if self.ale.game_over(): - self._restart_episode() - return self._current_state() - - def step(self, act): - oldlives = self.ale.lives() - r = 0 - for k in range(self.frame_skip): - if k == self.frame_skip - 1: - self.last_raw_screen = self._grab_raw_image() - r += self.ale.act(self.actions[act]) - newlives = self.ale.lives() - if self.ale.game_over() or \ - (self.live_lost_as_eoe and newlives < oldlives): - break - - isOver = self.ale.game_over() - if self.live_lost_as_eoe: - isOver = isOver or newlives < oldlives - - info = {'ale.lives': newlives} - return self._current_state(), r, isOver, info diff --git a/PaddleRL/DeepQNetwork/atari_wrapper.py b/PaddleRL/DeepQNetwork/atari_wrapper.py deleted file mode 100644 index 81ec7e0ba0ee191f70591c16bfff560a62d3d395..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/atari_wrapper.py +++ /dev/null @@ -1,106 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -from collections import deque - -import gym -from gym import spaces - -_v0, _v1 = gym.__version__.split('.')[:2] -assert int(_v0) > 0 or int(_v1) >= 10, gym.__version__ -""" -The following wrappers are copied or modified from openai/baselines: -https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py -""" - - -class MapState(gym.ObservationWrapper): - def __init__(self, env, map_func): - gym.ObservationWrapper.__init__(self, env) - self._func = map_func - - def observation(self, obs): - return self._func(obs) - - -class FrameStack(gym.Wrapper): - def __init__(self, env, k): - """Buffer observations and stack across channels (last axis).""" - gym.Wrapper.__init__(self, env) - self.k = k - self.frames = deque([], maxlen=k) - shp = env.observation_space.shape - chan = 1 if len(shp) == 2 else shp[2] - self.observation_space = spaces.Box(low=0, - high=255, - shape=(shp[0], shp[1], chan * k), - dtype=np.uint8) - - def reset(self): - """Clear buffer and re-fill by duplicating the first observation.""" - ob = self.env.reset() - for _ in range(self.k - 1): - self.frames.append(np.zeros_like(ob)) - self.frames.append(ob) - return self.observation() - - def step(self, action): - ob, reward, done, info = self.env.step(action) - self.frames.append(ob) - return self.observation(), reward, done, info - - def observation(self): - assert len(self.frames) == self.k - return np.stack(self.frames, axis=0) - - -class _FireResetEnv(gym.Wrapper): - def __init__(self, env): - """Take action on reset for environments that are fixed until firing.""" - gym.Wrapper.__init__(self, env) - assert env.unwrapped.get_action_meanings()[1] == 'FIRE' - assert len(env.unwrapped.get_action_meanings()) >= 3 - - def reset(self): - self.env.reset() - obs, _, done, _ = self.env.step(1) - if done: - self.env.reset() - obs, _, done, _ = self.env.step(2) - if done: - self.env.reset() - return obs - - def step(self, action): - return self.env.step(action) - - -def FireResetEnv(env): - if isinstance(env, gym.Wrapper): - baseenv = env.unwrapped - else: - baseenv = env - if 'FIRE' in baseenv.get_action_meanings(): - return _FireResetEnv(env) - return env - - 
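# A minimal sketch of how these wrappers compose (cf. the get_player() helper
# in train.py further down in this diff); the 84x84 resize and the 4-frame
# context are the values that file uses:
#
#     env = AtariPlayer('rom_files/pong.bin', frame_skip=4, live_lost_as_eoe=True)
#     env = FireResetEnv(env)                                   # press FIRE on reset when the game needs it
#     env = MapState(env, lambda im: cv2.resize(im, (84, 84)))  # resize each grayscale frame
#     env = FrameStack(env, 4)                                  # the agent sees the last 4 frames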
-class LimitLength(gym.Wrapper): - def __init__(self, env, k): - gym.Wrapper.__init__(self, env) - self.k = k - - def reset(self): - # This assumes that reset() will really reset the env. - # If the underlying env tries to be smart about reset - # (e.g. end-of-life), the assumption doesn't hold. - ob = self.env.reset() - self.cnt = 0 - return ob - - def step(self, action): - ob, r, done, info = self.env.step(action) - self.cnt += 1 - if self.cnt == self.k: - done = True - return ob, r, done, info diff --git a/PaddleRL/DeepQNetwork/expreplay.py b/PaddleRL/DeepQNetwork/expreplay.py deleted file mode 100644 index 5f27ca7286b5db7ac963bc25236be416fad50eb0..0000000000000000000000000000000000000000 --- a/PaddleRL/DeepQNetwork/expreplay.py +++ /dev/null @@ -1,98 +0,0 @@ -# -*- coding: utf-8 -*- - -import numpy as np -import copy -from collections import deque, namedtuple - -Experience = namedtuple('Experience', ['state', 'action', 'reward', 'isOver']) - - -class ReplayMemory(object): - def __init__(self, max_size, state_shape, context_len): - self.max_size = int(max_size) - self.state_shape = state_shape - self.context_len = int(context_len) - - self.state = np.zeros((self.max_size, ) + state_shape, dtype='uint8') - self.action = np.zeros((self.max_size, ), dtype='int32') - self.reward = np.zeros((self.max_size, ), dtype='float32') - self.isOver = np.zeros((self.max_size, ), dtype='bool') - - self._curr_size = 0 - self._curr_pos = 0 - self._context = deque(maxlen=context_len - 1) - - def append(self, exp): - """append a new experience into replay memory - """ - if self._curr_size < self.max_size: - self._assign(self._curr_pos, exp) - self._curr_size += 1 - else: - self._assign(self._curr_pos, exp) - self._curr_pos = (self._curr_pos + 1) % self.max_size - if exp.isOver: - self._context.clear() - else: - self._context.append(exp) - - def recent_state(self): - """ maintain recent state for training""" - lst = list(self._context) - states = [np.zeros(self.state_shape, dtype='uint8')] * \ - (self._context.maxlen - len(lst)) - states.extend([k.state for k in lst]) - return states - - def sample(self, idx): - """ return state, action, reward, isOver, - note that some frames in state may be generated from last episode, - they should be removed from state - """ - state = np.zeros( - (self.context_len + 1, ) + self.state_shape, dtype=np.uint8) - state_idx = np.arange(idx, idx + self.context_len + 1) % self._curr_size - - # confirm that no frame was generated from last episode - has_last_episode = False - for k in range(self.context_len - 2, -1, -1): - to_check_idx = state_idx[k] - if self.isOver[to_check_idx]: - has_last_episode = True - state_idx = state_idx[k + 1:] - state[k + 1:] = self.state[state_idx] - break - - if not has_last_episode: - state = self.state[state_idx] - - real_idx = (idx + self.context_len - 1) % self._curr_size - action = self.action[real_idx] - reward = self.reward[real_idx] - isOver = self.isOver[real_idx] - return state, reward, action, isOver - - def __len__(self): - return self._curr_size - - def _assign(self, pos, exp): - self.state[pos] = exp.state - self.reward[pos] = exp.reward - self.action[pos] = exp.action - self.isOver[pos] = exp.isOver - - def sample_batch(self, batch_size): - """sample a batch from replay memory for training - """ - batch_idx = np.random.randint( - self._curr_size - self.context_len - 1, size=batch_size) - batch_idx = (self._curr_pos + batch_idx) % self._curr_size - batch_exp = [self.sample(i) for i in batch_idx] - return self._process_batch(batch_exp) 
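    # Each sampled entry carries context_len + 1 stacked frames: the training
    # loop in train.py slices them into two overlapping windows,
    #     batch_state      = batch_all_state[:, :CONTEXT_LEN, ...]
    #     batch_next_state = batch_all_state[:, 1:, ...]
    # so state and next_state share CONTEXT_LEN - 1 frames and no frame is
    # stored in the buffer twice.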
-
-    def _process_batch(self, batch_exp):
-        state = np.asarray([e[0] for e in batch_exp], dtype='uint8')
-        reward = np.asarray([e[1] for e in batch_exp], dtype='float32')
-        action = np.asarray([e[2] for e in batch_exp], dtype='int8')
-        isOver = np.asarray([e[3] for e in batch_exp], dtype='bool')
-        return [state, action, reward, isOver]
diff --git a/PaddleRL/DeepQNetwork/play.py b/PaddleRL/DeepQNetwork/play.py
deleted file mode 100644
index 2c93da509d7cccb81d713c7aefd45a11ee28e8fb..0000000000000000000000000000000000000000
--- a/PaddleRL/DeepQNetwork/play.py
+++ /dev/null
@@ -1,65 +0,0 @@
-#-*- coding: utf-8 -*-
-
-import argparse
-import os
-import numpy as np
-import paddle.fluid as fluid
-
-from train import get_player
-from tqdm import tqdm
-
-
-def predict_action(exe, state, predict_program, feed_names, fetch_targets,
-                   action_dim):
-    # epsilon-greedy evaluation: act randomly 1% of the time
-    if np.random.random() < 0.01:
-        act = np.random.randint(action_dim)
-    else:
-        state = np.expand_dims(state, axis=0)
-        pred_Q = exe.run(predict_program,
-                         feed={feed_names[0]: state.astype('float32')},
-                         fetch_list=fetch_targets)[0]
-        pred_Q = np.squeeze(pred_Q, axis=0)
-        act = np.argmax(pred_Q)
-    return act
-
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        '--use_cuda', action='store_true', help='if set, use cuda')
-    parser.add_argument('--rom', type=str, required=True, help='atari rom')
-    parser.add_argument(
-        '--model_path', type=str, required=True, help='dirname to load model')
-    parser.add_argument(
-        '--viz',
-        type=float,
-        default=0,
-        help='''viz: visualization setting:
-                Set to 0 to disable;
-                Set to a positive number to be the delay between frames to show.
-             ''')
-    args = parser.parse_args()
-
-    env = get_player(args.rom, viz=args.viz)
-
-    place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    inference_scope = fluid.Scope()
-    with fluid.scope_guard(inference_scope):
-        [predict_program, feed_names,
-         fetch_targets] = fluid.io.load_inference_model(args.model_path, exe)
-
-        episode_reward = []
-        for _ in tqdm(xrange(30), desc='eval agent'):
-            state = env.reset()
-            total_reward = 0
-            while True:
-                action = predict_action(exe, state, predict_program, feed_names,
-                                        fetch_targets, env.action_space.n)
-                state, reward, isOver, info = env.step(action)
-                total_reward += reward
-                if isOver:
-                    break
-            episode_reward.append(total_reward)
-        eval_reward = np.mean(episode_reward)
-        print('Average reward over 30 episodes: {}'.format(eval_reward))
diff --git a/PaddleRL/DeepQNetwork/requirement.txt b/PaddleRL/DeepQNetwork/requirement.txt
deleted file mode 100644
index 689eb324e6bd65aabbe44ca041ff7b3ddacb1943..0000000000000000000000000000000000000000
--- a/PaddleRL/DeepQNetwork/requirement.txt
+++ /dev/null
@@ -1,5 +0,0 @@
-numpy
-gym
-tqdm
-opencv-python
-paddlepaddle-gpu>=1.0.0
diff --git a/PaddleRL/DeepQNetwork/rom_files/breakout.bin b/PaddleRL/DeepQNetwork/rom_files/breakout.bin
deleted file mode 100644
index abab5a8c0a1890461a11b78d4265f1b794327793..0000000000000000000000000000000000000000
Binary files a/PaddleRL/DeepQNetwork/rom_files/breakout.bin and /dev/null differ
diff --git a/PaddleRL/DeepQNetwork/rom_files/pong.bin b/PaddleRL/DeepQNetwork/rom_files/pong.bin
deleted file mode 100644
index 14a5bdfc72548613c059938bdf712efdbb5d3806..0000000000000000000000000000000000000000
Binary files a/PaddleRL/DeepQNetwork/rom_files/pong.bin and /dev/null differ
diff --git a/PaddleRL/DeepQNetwork/train.py b/PaddleRL/DeepQNetwork/train.py
deleted file mode 100644
index 
dd7986d704aec0c0948f81ca7ddd69bbbd3ea239..0000000000000000000000000000000000000000
--- a/PaddleRL/DeepQNetwork/train.py
+++ /dev/null
@@ -1,181 +0,0 @@
-#-*- coding: utf-8 -*-
-
-from DQN_agent import DQNModel
-from DoubleDQN_agent import DoubleDQNModel
-from DuelingDQN_agent import DuelingDQNModel
-from atari import AtariPlayer
-import paddle.fluid as fluid
-import gym
-import argparse
-import cv2
-from tqdm import tqdm
-from expreplay import ReplayMemory, Experience
-import numpy as np
-import os
-
-from datetime import datetime
-from atari_wrapper import FrameStack, MapState, FireResetEnv, LimitLength
-from collections import deque
-
-MEMORY_SIZE = 1e6
-MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20
-IMAGE_SIZE = (84, 84)
-CONTEXT_LEN = 4
-ACTION_REPEAT = 4  # aka FRAME_SKIP
-UPDATE_FREQ = 4
-
-
-def run_train_episode(agent, env, exp):
-    total_reward = 0
-    state = env.reset()
-    step = 0
-    while True:
-        step += 1
-        context = exp.recent_state()
-        context.append(state)
-        context = np.stack(context, axis=0)
-        action = agent.act(context, train_or_test='train')
-        next_state, reward, isOver, _ = env.step(action)
-        exp.append(Experience(state, action, reward, isOver))
-        # start training once the replay memory has warmed up
-        if len(exp) > MEMORY_WARMUP_SIZE:
-            if step % UPDATE_FREQ == 0:
-                batch_all_state, batch_action, batch_reward, batch_isOver = exp.sample_batch(
-                    args.batch_size)
-                batch_state = batch_all_state[:, :CONTEXT_LEN, :, :]
-                batch_next_state = batch_all_state[:, 1:, :, :]
-                agent.train(batch_state, batch_action, batch_reward,
-                            batch_next_state, batch_isOver)
-        total_reward += reward
-        state = next_state
-        if isOver:
-            break
-    return total_reward, step
-
-
-def get_player(rom, viz=False, train=False):
-    env = AtariPlayer(
-        rom,
-        frame_skip=ACTION_REPEAT,
-        viz=viz,
-        live_lost_as_eoe=train,
-        max_num_frames=60000)
-    env = FireResetEnv(env)
-    env = MapState(env, lambda im: cv2.resize(im, IMAGE_SIZE))
-    if not train:
-        # in training, context is taken care of in expreplay buffer
-        env = FrameStack(env, CONTEXT_LEN)
-    return env
-
-
-def eval_agent(agent, env):
-    episode_reward = []
-    for _ in tqdm(range(30), desc='eval agent'):
-        state = env.reset()
-        total_reward = 0
-        step = 0
-        while True:
-            step += 1
-            action = agent.act(state, train_or_test='test')
-            state, reward, isOver, info = env.step(action)
-            total_reward += reward
-            if isOver:
-                break
-        episode_reward.append(total_reward)
-    eval_reward = np.mean(episode_reward)
-    return eval_reward
-
-
-def train_agent():
-    env = get_player(args.rom, train=True)
-    test_env = get_player(args.rom)
-    exp = ReplayMemory(args.mem_size, IMAGE_SIZE, CONTEXT_LEN)
-    action_dim = env.action_space.n
-
-    if args.alg == 'DQN':
-        agent = DQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
-                         args.use_cuda)
-    elif args.alg == 'DoubleDQN':
-        agent = DoubleDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
-                               args.use_cuda)
-    elif args.alg == 'DuelingDQN':
-        agent = DuelingDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
-                                args.use_cuda)
-    else:
-        print('Input algorithm name error!')
-        return
-
-    with tqdm(total=MEMORY_WARMUP_SIZE, desc='Memory warmup') as pbar:
-        while len(exp) < MEMORY_WARMUP_SIZE:
-            total_reward, step = run_train_episode(agent, env, exp)
-            pbar.update(step)
-
-    # train
-    test_flag = 0
-    save_flag = 0
-    pbar = tqdm(total=1e8)
-    recent_100_reward = []
-    total_step = 0
-    max_reward = None
-    save_path = os.path.join(args.model_dirname, '{}-{}'.format(
-        args.alg, os.path.basename(args.rom).split('.')[0]))
-    while True:
-        # 
start a new training episode
-        total_reward, step = run_train_episode(agent, env, exp)
-        total_step += step
-        pbar.set_description('[train]exploration:{}'.format(agent.exploration))
-        pbar.update(step)
-
-        if total_step // args.test_every_steps == test_flag:
-            pbar.write("testing")
-            eval_reward = eval_agent(agent, test_env)
-            test_flag += 1
-            print("eval_agent done, (steps, eval_reward): ({}, {})".format(
-                total_step, eval_reward))
-
-            if max_reward is None or eval_reward > max_reward:
-                max_reward = eval_reward
-                fluid.io.save_inference_model(save_path, ['state'],
-                                              agent.pred_value, agent.exe,
-                                              agent.predict_program)
-    pbar.close()
-
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser()
-    parser.add_argument(
-        '--alg',
-        type=str,
-        default='DQN',
-        help='Reinforcement learning algorithm, support: DQN, DoubleDQN, DuelingDQN'
-    )
-    parser.add_argument(
-        '--use_cuda', action='store_true', help='if set, use cuda')
-    parser.add_argument(
-        '--gamma',
-        type=float,
-        default=0.99,
-        help='discount factor for accumulated reward computation')
-    parser.add_argument(
-        '--mem_size',
-        type=int,
-        default=1000000,
-        help='memory size for experience replay')
-    parser.add_argument(
-        '--batch_size', type=int, default=64, help='batch size for training')
-    parser.add_argument('--rom', help='atari rom', required=True)
-    parser.add_argument(
-        '--model_dirname',
-        type=str,
-        default='saved_model',
-        help='dirname to save model')
-    parser.add_argument(
-        '--test_every_steps',
-        type=int,
-        default=100000,
-        help='every steps number to run test')
-    args = parser.parse_args()
-    train_agent()
diff --git a/PaddleRL/README.md b/PaddleRL/README.md
deleted file mode 100644
index 5b8d2caf78d426a14b96f7d842eb88ed37bab233..0000000000000000000000000000000000000000
--- a/PaddleRL/README.md
+++ /dev/null
@@ -1,11 +0,0 @@
-PaddleRL
-============
-
-Reinforcement learning
--------
-
-Reinforcement learning has become an increasingly important branch of machine learning in recent years, and combined with deep learning it forms deep reinforcement learning (DRL), which has produced many striking results. AlphaGo, famous for defeating top professional Go players, is a typical DRL application; beyond games, DRL is also applied to robotics, natural language processing and other fields.
-
-The seminal work in deep reinforcement learning was its successful application to Atari video games: a model takes raw video frames as high-dimensional input and predicts the next action end-to-end from the image content. That model is known as the Deep Q-Network (DQN). This example uses the flexible PaddlePaddle Fluid framework to implement DQN and its variants and evaluates them on Atari games.
-
-- [DeepQNetwork](https://github.com/PaddlePaddle/models/blob/develop/PaddleRL/DeepQNetwork/README_cn.md)
diff --git a/PaddleRL/policy_gradient/README.md b/PaddleRL/policy_gradient/README.md
deleted file mode 100644
index b813aa124466597adfb80261bee7c2de22b95e67..0000000000000000000000000000000000000000
--- a/PaddleRL/policy_gradient/README.md
+++ /dev/null
@@ -1,171 +0,0 @@
-The program examples in this directory require the latest develop branch of PaddlePaddle. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.
-
----
-
-# Policy Gradient RL by PaddlePaddle
-This article describes how to use PaddlePaddle to train a player (an actor model) with a policy-based reinforcement learning method; we want this player to learn a simple stair-climbing task.
-
- The content covers:
-
- - Task description
- - Model
- - Policy (objective function)
- - Algorithm (gradient ascent)
- - PaddlePaddle implementation
-
-
-## 1. Task description
-Suppose there is a staircase connecting points A and B. The player starts from A, can move only one step forward or one step backward at a time, and completes the task upon reaching B. We want to train a smart player that knows the fastest way from A to B.
-We simulate the task on the command line as follows:
-```
-A - O - - - - - B
-```
-Each '-' is a stair, A is at the head of the line, B is at the end, and O marks the player's current position.

-## 2. Policy Gradient
-### 2.1 Model
-#### Input layer
-The input to the model is the current state $S$ of the staircase as observed by the player; it must encode the length of the staircase and the player's current position.
-In the command-line simulation, the player's position and the staircase length are two variables that fully describe the state, but to make this demo easier to generalize to more complex tasks we represent the game state $S$ as a vector.
-The length of $S$ equals the length of the staircase; each dimension corresponds to one stair, with a 1 at the player's position and 0 elsewhere. 
-Here is an example:
-```
-S = [0, 1, 0, 0]  // the staircase has length 4 and the player is on the second stair.
-```
-#### Hidden layers
-The hidden part consists of two fully connected layers, `FC_1` and `FC_2`, where `FC_1` has size 10 and `FC_2` has size 2.
-
-#### Output layer
-We apply softmax to the output of `FC_2` to map it to a probability distribution over all possible actions (forward or backward), i.e. a two-dimensional vector `act_probs`, where `act_probs[0]` is the probability of moving backward and `act_probs[1]` that of moving forward.
-
-#### Model formulation
-We formalize our player model (the actor) as:
-$$a = \pi_\theta(s)$$
-where $\theta$ denotes the model parameters and $s$ is the input state.
-
-
-### 2.2 Policy (objective function)
-How do we judge whether a player (model) is good or bad? First, a few definitions:
-we let $\pi_\theta(s)$ play one episode of the game; $s_t$ is the state at time $t$, $a_t$ is the action taken in state $s_t$, and $r_t$ is the reward received after taking action $a_t$.
-One episode of the game can then be written as:
-$$\tau = [s_1, a_1, r_1, s_2, a_2, r_2 ... s_T, a_T, r_T] \tag{1}$$
-
-The total reward of one episode is:
-$$R(\tau) = \sum_{t=1}^Tr_t$$
-
-When the player plays an episode, many different trajectories $\tau$ may occur, and the probability of a particular $\tau$ depends on the player model's $\theta$; we write it as:
-$$P(\tau | \theta)$$
-Given a $\theta$ (a player model), the expected reward of playing one episode is therefore:
-$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta)$$
-In most cases we cannot enumerate all $\tau$, so we sample N trajectories $\tau$ to approximate the expectation:
-$$\overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^N R(\tau^n)$$
-
-$\overline {R}_\theta$ is the objective function we need: the expected score of one episode played by a player with parameters $\theta$. The larger this expectation, the stronger the player.
-### 2.3 Algorithm (gradient ascent)
-Our objective function is $\overline {R}_\theta$, so the training task is:
-$$\theta^* = \arg\max_\theta \overline {R}_\theta$$
-
-To find the ideal $\theta$, we use gradient ascent and repeatedly update $\theta$ along the gradient of $\overline {R}_\theta$:
-$$\theta' = \theta + \eta * \bigtriangledown \overline {R}_\theta$$
-
-$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) \bigtriangledown P(\tau|\theta)\\
-= \sum_\tau R(\tau) P(\tau|\theta) \frac{\bigtriangledown P(\tau|\theta)}{P(\tau|\theta)} \\
-=\sum_\tau R(\tau) P(\tau|\theta) {\bigtriangledown \log P(\tau|\theta)} $$
-
-
-$$P(\tau|\theta) = P(s_1)P(a_1|s_1,\theta)P(s_2, r_1|s_1,a_1)P(a_2|s_2,\theta)P(s_3,r_2|s_2,a_2)...P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)\\
-=P(s_1) \prod_{t=1}^T P(a_t|s_t,\theta)P(s_{t+1}, r_t|s_t,a_t)$$
-
-$$\log P(\tau|\theta) = \log P(s_1) + \sum_{t=1}^T [\log P(a_t|s_t,\theta) + \log P(s_{t+1}, r_t|s_t,a_t)]$$
-
-$$ \bigtriangledown \log P(\tau|\theta) = \sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)$$
-
-$$ \bigtriangledown \overline {R}_\theta = \sum_\tau R(\tau) P(\tau|\theta) {\bigtriangledown \log P(\tau|\theta)} \\
-\approx \frac{1}{N} \sum_{n=1}^N R(\tau^n) {\bigtriangledown \log P(\tau^n|\theta)} \\
-= \frac{1}{N} \sum_{n=1}^N R(\tau^n) {\sum_{t=1}^T \bigtriangledown \log P(a_t|s_t,\theta)} \\
-= \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) { \bigtriangledown \log P(a_t|s_t,\theta)} \tag{11}$$
-
-#### 2.3.2 Interpreting the gradient
-
-When training with a deep learning framework we usually solve by gradient descent, so we turn gradient ascent into gradient descent and rewrite the objective and the update rule as:
-
-$$\theta^* = \arg\min_\theta (-\overline {R}_\theta) \tag{13}$$
-$$\theta' = \theta - \eta * \bigtriangledown (-\overline {R}_\theta) \tag{14}$$
-
-From the derivation in the previous section, $ (-\bigtriangledown \overline {R}_\theta) $ is:
-
-$$ -\bigtriangledown \overline {R}_\theta
-= \frac{1}{N} \sum_{n=1}^N \sum_{t=1}^T R(\tau^n) { \bigtriangledown
-[-\log P(a_t|s_t,\theta)]} \tag{15}$$
-
-According to Eq. (14), our player model can be designed as:
-
-[Figure 1 — the policy network; see images/PG_1.svg]
-
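Concretely, the forward pass in Figure 1 takes only a few lines of Fluid. The sketch below mirrors `build_net()` in `brain.py` further down in this diff; the sizes follow the stair game (a length-4 state, two actions), and the variable names are illustrative:

```python
import paddle.fluid as fluid

obs = fluid.layers.data(name='obs', shape=[4], dtype='float32')  # state vector S
acts = fluid.layers.data(name='acts', shape=[1], dtype='int64')  # the action a_t actually taken
fc1 = fluid.layers.fc(input=obs, size=10, act='tanh')            # FC_1
act_probs = fluid.layers.fc(input=fc1, size=2, act='softmax')    # FC_2 + softmax: P(a_t|s_t, theta)
# cross entropy against the chosen action is exactly -log P(a_t|s_t, theta)
neg_log_prob = fluid.layers.cross_entropy(input=act_probs, label=acts)
```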
-
-A single move in one episode can be written as the tuple $(s_t, a_t)$: in state $s_t$ the player took action $a_t$. The forward network in Figure 1 computes the cross-entropy cost $-\log P(a_t|s_t,\theta)$, which is exactly the term to be differentiated in Eq. (15).
-Figure 1 is the player model we need; its forward pass can predict what action to take in any state. But how do we train this network? Eq. (15) still contains the factor $R(\tau^n)$, which has to be applied during back-propagation, so we extend Figure 1 with $R(\tau^n)$, as shown in Figure 2:
-
-[Figure 2 — the policy network weighted by $R(\tau^n)$; see images/PG_2.svg]
-
-
-Figure 2 is our final network structure.
-
-#### 2.3.3 An intuitive view
-Looking at a single step in Eq. (15), namely the term $R(\tau^n) { \bigtriangledown [-\log P(a_t|s_t,\theta)]}$, we can loosely say that training tries to make $R(\tau^n) {[ -\log P(a_t|s_t,\theta)]}$ as small as possible, i.e. to make $R(\tau^n) \log P(a_t|s_t,\theta)$ as large as possible. Hence:
-
-- If the reward $R(\tau^n)$ of the current episode is positive, we want the probability $P(a_t|s_t,\theta)$ of the actions taken to become as large as possible.
-- If the reward $R(\tau^n)$ of the current episode is negative, we want the probability $P(a_t|s_t,\theta)$ of the actions taken to become as small as possible.
-
-#### 2.3.4 A problem
-
-When one person errs, the whole clan is punished; when one attains the Way, even the chickens and dogs ascend to heaven. If an episode earns a reward, we want every action that helped obtain it to be reinforced; otherwise, every action that led to the penalty gets discouraged.
-Sounds reasonable, doesn't it? But what if some game settings only hand out rewards and never penalties, so that every $R(\tau^n)$ is positive?
-For different game settings we have different remedies:
-
-1. Episode scores differ: subtract a bias from each episode's score, so the results become both positive and negative.
-2. Episode scores are identical: make the time needed to finish an episode part of the score, and subtract a bias.
-
-The game described in Chapter 1 needs the second remedy: the player receives a reward of 1 every time it reaches the goal, so we define the reward R in terms of the number of steps used to complete the task.
-Going a step further, we believe that within one episode not every action contributes equally to the outcome: there are smart moves and foolish ones. Intuitively, earlier actions tend to be the foolish ones and later actions the smart ones. With this value judgment, the reward of 1 can no longer be split evenly across all actions.
-As shown in Figure 3, we order the actions by time, give each action a reward that decays from the last action backwards, and then subtract the mean of all action rewards from each action's reward:
-
-[Figure 3 — per-action rewards decaying backwards in time, minus their mean; see images/PG_3.svg]
-
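In NumPy, the scheme of Figure 3 is a backward pass over the episode's rewards followed by mean subtraction; this is what `_discount_and_norm_rewards()` in `brain.py` below implements (that file additionally divides by the standard deviation):

```python
import numpy as np

def discount_and_norm_rewards(ep_rs, gamma=0.95):
    # decay rewards from the last action backwards: r_t + gamma * r_{t+1} + ...
    discounted = np.zeros_like(ep_rs, dtype='float32')
    running_add = 0.0
    for t in reversed(range(len(ep_rs))):
        running_add = running_add * gamma + ep_rs[t]
        discounted[t] = running_add
    # subtract the mean (the bias), so some actions end up with negative reward
    discounted -= np.mean(discounted)
    discounted /= np.std(discounted)
    return discounted
```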
-
-## 3. Training results
-
-The demo trains as follows; after about 1000 rounds of attempts, our player has learned how to complete the task efficiently:
-
-```
----------O epoch: 0; steps: 42
----------O epoch: 1; steps: 77
----------O epoch: 2; steps: 82
----------O epoch: 3; steps: 64
----------O epoch: 4; steps: 79
----------O epoch: 501; steps: 19
----------O epoch: 1001; steps: 9
----------O epoch: 1501; steps: 9
----------O epoch: 2001; steps: 11
----------O epoch: 2501; steps: 9
----------O epoch: 3001; steps: 9
----------O epoch: 3002; steps: 9
----------O epoch: 3003; steps: 9
----------O epoch: 3004; steps: 9
----------O epoch: 3005; steps: 9
----------O epoch: 3006; steps: 9
----------O epoch: 3007; steps: 9
----------O epoch: 3008; steps: 9
----------O epoch: 3009; steps: 9
----------O epoch: 3010; steps: 11
----------O epoch: 3011; steps: 9
----------O epoch: 3012; steps: 9
----------O epoch: 3013; steps: 9
----------O epoch: 3014; steps: 9
-```
diff --git a/PaddleRL/policy_gradient/brain.py b/PaddleRL/policy_gradient/brain.py
deleted file mode 100644
index 27a2da28563e5063213100d34c1b88d5fe2f91b0..0000000000000000000000000000000000000000
--- a/PaddleRL/policy_gradient/brain.py
+++ /dev/null
@@ -1,94 +0,0 @@
-import numpy as np
-import paddle.fluid as fluid
-# reproducible
-np.random.seed(1)
-
-
-class PolicyGradient:
-    def __init__(
-            self,
-            n_actions,
-            n_features,
-            learning_rate=0.01,
-            reward_decay=0.95,
-            output_graph=False, ):
-        self.n_actions = n_actions
-        self.n_features = n_features
-        self.lr = learning_rate
-        self.gamma = reward_decay
-
-        self.ep_obs, self.ep_as, self.ep_rs = [], [], []
-
-        self.place = fluid.CPUPlace()
-        self.exe = fluid.Executor(self.place)
-
-    def build_net(self):
-
-        obs = fluid.layers.data(
-            name='obs', shape=[self.n_features], dtype='float32')
-        acts = fluid.layers.data(name='acts', shape=[1], dtype='int64')
-        vt = fluid.layers.data(name='vt', shape=[1], dtype='float32')
-        # fc1
-        fc1 = fluid.layers.fc(input=obs, size=10, act="tanh")  # tanh activation
-        # fc2: action probabilities P(a|s, theta)
-        self.all_act_prob = fluid.layers.fc(input=fc1,
-                                            size=self.n_actions,
-                                            act="softmax")
-        # clone the program for inference before the loss is appended
-        self.inference_program = fluid.default_main_program().clone()
-        # to maximize total reward (log_p * R) is to minimize -(log_p * R)
-        neg_log_prob = fluid.layers.cross_entropy(
-            input=self.all_act_prob,
-            label=acts)  # this is negative log of chosen action
-        neg_log_prob_weight = fluid.layers.elementwise_mul(x=neg_log_prob, y=vt)
-        loss = fluid.layers.reduce_mean(
-            neg_log_prob_weight)  # reward guided loss
-
-        sgd_optimizer = fluid.optimizer.SGD(self.lr)
-        sgd_optimizer.minimize(loss)
-        self.exe.run(fluid.default_startup_program())
-
-    def choose_action(self, observation):
-        prob_weights = self.exe.run(self.inference_program,
-                                    feed={"obs": observation[np.newaxis, :]},
-                                    fetch_list=[self.all_act_prob])
-        prob_weights = np.array(prob_weights[0])
-        # select action w.r.t the actions prob
-        action = np.random.choice(
-            range(prob_weights.shape[1]), p=prob_weights.ravel())
-        return action
-
-    def store_transition(self, s, a, r):
-        self.ep_obs.append(s)
-        self.ep_as.append(a)
-        self.ep_rs.append(r)
-
-    def learn(self):
-        # discount and normalize episode reward
-        discounted_ep_rs_norm = self._discount_and_norm_rewards()
-        tensor_obs = np.vstack(self.ep_obs).astype("float32")
-        tensor_as = np.array(self.ep_as).astype("int64")
-        tensor_as = tensor_as.reshape([tensor_as.shape[0], 1])
-        tensor_vt = discounted_ep_rs_norm.astype("float32")[:, np.newaxis]
-        # train on episode
-        self.exe.run(
-            fluid.default_main_program(),
-            feed={
-                "obs": tensor_obs,  # shape=[None, n_obs]
-                "acts": tensor_as,  # 
shape=[None, ] - "vt": tensor_vt # shape=[None, ] - }) - self.ep_obs, self.ep_as, self.ep_rs = [], [], [] # empty episode data - return discounted_ep_rs_norm - - def _discount_and_norm_rewards(self): - # discount episode rewards - discounted_ep_rs = np.zeros_like(self.ep_rs) - running_add = 0 - for t in reversed(range(0, len(self.ep_rs))): - running_add = running_add * self.gamma + self.ep_rs[t] - discounted_ep_rs[t] = running_add - - # normalize episode rewards - discounted_ep_rs -= np.mean(discounted_ep_rs) - discounted_ep_rs /= np.std(discounted_ep_rs) - return discounted_ep_rs diff --git a/PaddleRL/policy_gradient/env.py b/PaddleRL/policy_gradient/env.py deleted file mode 100644 index e2cd972dbc9a3943aceb9763b9dabcd50a1e6df1..0000000000000000000000000000000000000000 --- a/PaddleRL/policy_gradient/env.py +++ /dev/null @@ -1,56 +0,0 @@ -import time -import sys -import numpy as np - - -class Env(): - def __init__(self, stage_len, interval): - self.stage_len = stage_len - self.end = self.stage_len - 1 - self.position = 0 - self.interval = interval - self.step = 0 - self.epoch = -1 - self.render = False - - def reset(self): - self.end = self.stage_len - 1 - self.position = 0 - self.epoch += 1 - self.step = 0 - if self.render: - self.draw(True) - - def status(self): - s = np.zeros([self.stage_len]).astype("float32") - s[self.position] = 1 - return s - - def move(self, action): - self.step += 1 - reward = 0.0 - done = False - if action == 0: - self.position = max(0, self.position - 1) - else: - self.position = min(self.end, self.position + 1) - if self.render: - self.draw() - if self.position == self.end: - reward = 1.0 - done = True - return reward, done, self.status() - - def draw(self, new_line=False): - if new_line: - print "" - else: - print "\r", - for i in range(self.stage_len): - if i == self.position: - sys.stdout.write("O") - else: - sys.stdout.write("-") - sys.stdout.write(" epoch: %d; steps: %d" % (self.epoch, self.step)) - sys.stdout.flush() - time.sleep(self.interval) diff --git a/PaddleRL/policy_gradient/images/PG_1.svg b/PaddleRL/policy_gradient/images/PG_1.svg deleted file mode 100644 index e2352ff57ceb70bdba013c55c35eb1dc1cabe275..0000000000000000000000000000000000000000 --- a/PaddleRL/policy_gradient/images/PG_1.svg +++ /dev/null @@ -1,3 +0,0 @@ - - - Produced by OmniGraffle 6.0.5 2017-12-01 08:39Z神经网络Layer 1x_2y_2y_0x_1x_nSoftmaxy_ma_2a_0a_m. . .. . .. . .. . 
.s_tθy_t = P(a_t | s_t, θ)-log(y_t) = -logP(a_t | s_t, θ)CROSS ENTROPY = 
diff --git a/PaddleRL/policy_gradient/images/PG_2.svg b/PaddleRL/policy_gradient/images/PG_2.svg
deleted file mode 100644
index 3697bf9feca0861c9c0b2da29980ba4c86a3f4d7..0000000000000000000000000000000000000000
--- a/PaddleRL/policy_gradient/images/PG_2.svg
+++ /dev/null
@@ -1,3 +0,0 @@
-
-
- Produced by OmniGraffle 6.0.5 2017-12-01 08:39Z神经网络 2Layer 1s_tYFCa_t-logP(a_t | s_t, θ)SoftmaxR(τ^n)Cross EntropyMul-R(τ^n)logP(a_t | s_t, θ)θ
diff --git a/PaddleRL/policy_gradient/images/PG_3.svg b/PaddleRL/policy_gradient/images/PG_3.svg
deleted file mode 100644
index 97b56c3fe1188e603a3bf5f6eabf7ea0ea3072c7..0000000000000000000000000000000000000000
--- a/PaddleRL/policy_gradient/images/PG_3.svg
+++ /dev/null
@@ -1,3 +0,0 @@
-
-
- Produced by OmniGraffle 6.0.5 2017-12-01 09:42Z神经网络 3Layer 1Ra_2= 0.9 * a_1a_(t-1)a_t= 0.9^2 *= 0.9^t * -= mean(a_1, a_2 … a_t)
diff --git a/PaddleRL/policy_gradient/run.py b/PaddleRL/policy_gradient/run.py
deleted file mode 100644
index 6f2f8c381a9d6452c5d7dfefb41f05eb4551d73a..0000000000000000000000000000000000000000
--- a/PaddleRL/policy_gradient/run.py
+++ /dev/null
@@ -1,29 +0,0 @@
-from brain import PolicyGradient
-from env import Env
-import numpy as np
-
-n_actions = 2
-interval = 0.01
-stage_len = 10
-epochs = 10000
-
-if __name__ == "__main__":
-
-    brain = PolicyGradient(n_actions, stage_len)
-    e = Env(stage_len, interval)
-    brain.build_net()
-    done = False
-
-    for epoch in range(epochs):
-        # render the first few epochs, then every 500th, then all late epochs
-        if (epoch % 500 == 1) or epoch < 5 or epoch > 3000:
-            e.render = True
-        else:
-            e.render = False
-        e.reset()
-        while not done:
-            s = e.status()
-            action = brain.choose_action(s)
-            r, done, _ = e.move(action)
-            brain.store_transition(s, action, r)
-        done = False
-        brain.learn()
diff --git a/PaddleSpeech/DeepASR/.gitignore b/PaddleSpeech/DeepASR/.gitignore
deleted file mode 100644
index 485dee64bcfb48793379b200a1afd14e85a8aaf4..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-.idea
diff --git a/PaddleSpeech/DeepASR/README.md b/PaddleSpeech/DeepASR/README.md
deleted file mode 100644
index 6b9913fd30a56ef2328bc62e9b36e496f6763430..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/README.md
+++ /dev/null
@@ -1,36 +0,0 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
-## Deep Automatic Speech Recognition
-
-### Introduction
-TBD
-
-### Installation
-
-#### Kaldi
-The decoder depends on [kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. 
Then
-
-```shell
-export KALDI_ROOT= 
-```
-
-#### Decoder
-
-```shell
-git clone https://github.com/PaddlePaddle/models.git
-cd models/fluid/DeepASR/decoder
-sh setup.sh
-```
-
-### Data preprocessing
-TBD
-
-### Training
-TBD
-
-
-### Inference & Decoding
-TBD
-
-### Question and Contribution
-TBD
diff --git a/PaddleSpeech/DeepASR/README_cn.md b/PaddleSpeech/DeepASR/README_cn.md
deleted file mode 100644
index be78a048701a621bd90942bdfe30ef4d7c7f082f..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/README_cn.md
+++ /dev/null
@@ -1,186 +0,0 @@
-The program examples in this directory require PaddlePaddle v0.14 or above. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.
-
----
-
-DeepASR (Deep Automatic Speech Recognition) is a speech recognition system based on PaddlePaddle Fluid and [Kaldi](http://www.kaldi-asr.org). It uses the Fluid framework to configure and train the acoustic model of a speech recognizer and integrates Kaldi's decoder. It aims to let users already familiar with Kaldi train acoustic models quickly and at scale, while relying on Kaldi for the complex preprocessing of speech data and for the final decoding step.
-
-### Contents
-- [Model overview](#model-overview)
-- [Installation](#installation)
-- [Data preprocessing](#data-reprocessing)
-- [Training](#training)
-- [Profiling the training process](#perf-profiling)
-- [Inference and decoding](#infer-decoding)
-- [Scoring the error rate](#scoring-error-rate)
-- [The Aishell example](#aishell-example)
-- [More examples welcome](#how-to-contrib)
-
-### Model overview
-
-The acoustic model of DeepASR is a single convolution layer followed by a stack of LSTMP layers: the convolution performs preliminary feature extraction, the stacked LSTMP layers model the temporal dependencies, and the loss function is the cross entropy. [LSTMP](https://arxiv.org/abs/1402.1128) (LSTM with recurrent projection layer) is an extension of the classic LSTM: it adds a projection layer that maps the hidden state to a lower dimension and feeds it into the next time step. This structure greatly reduces the parameter count and computational complexity of the LSTM while also improving its performance.
-
-[Figure 1 — Topology of LSTMP]
-
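For readers unfamiliar with LSTMP, Fluid exposes this projected variant directly as `dynamic_lstmp`; the sketch below is a single LSTMP layer under assumed sizes (a 120-dimensional input frame, hidden size 1024 projected down to 512 — illustrative values, not necessarily DeepASR's configuration):

```python
import paddle.fluid as fluid

# variable-length utterances, one frame per time step
feat = fluid.layers.data(
    name='feat', shape=[120], dtype='float32', lod_level=1)
# dynamic_lstmp expects its input already projected to 4 * hidden_size
proj_in = fluid.layers.fc(input=feat, size=4 * 1024, bias_attr=False)
# the hidden state (1024) is mapped to proj_size (512) before it recurs,
# which is what shrinks the parameters relative to a plain LSTM
hidden, cell = fluid.layers.dynamic_lstmp(
    input=proj_in, size=4 * 1024, proj_size=512)
```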
-
-### Installation
-
-
-#### Installing and configuring Kaldi
-
-
-The decoder used by DeepASR depends on [an installation of Kaldi](https://github.com/kaldi-asr/kaldi). If Kaldi is not available in your environment, please `git clone` its source code, install it following the given instructions, and finally set the environment variable `KALDI_ROOT`:
-
-```shell
-export KALDI_ROOT= 
-
-```
-#### Installing the decoder
-Enter the directory containing the decoder's source code
-
-```shell
-cd models/fluid/DeepASR/decoder
-```
-and run the setup script
-
-```shell
-sh setup.sh
-```
- Once compilation completes, the decoder has been installed successfully.
-
-### Data preprocessing
-
-Follow [Kaldi's data preparation process](http://kaldi-asr.org/doc/data_prep.html) to extract features from the audio data and align the labels.
-
-### Training the acoustic model
-
-The acoustic model can be trained in either CPU or GPU mode; for example, in GPU mode:
-
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py \
-                --train_feature_lst train_feature.lst \
-                --train_label_lst train_label.lst \
-                --val_feature_lst val_feature.lst \
-                --val_label_lst val_label.lst \
-                --mean_var global_mean_var \
-                --parallel
-```
-Here `train_feature.lst` and `train_label.lst` are the feature list file and the label list file of the training set; similarly, `val_feature.lst` and `val_label.lst` are the list files of the validation set. In a real training run, important parameters such as the size of the modeling units and the learning rate must be set correctly. For a description of these parameters, run
-
-```shell
-python train.py --help
-```
-to get more information.
-
-### Profiling the training process
-
-Using profiler, the performance analysis tool provided by Fluid, the training process can be profiled to obtain operator-level execution times of the network
-
-```shell
-CUDA_VISIBLE_DEVICES=0 python -u tools/profile.py \
-                --train_feature_lst train_feature.lst \
-                --train_label_lst train_label.lst \
-                --val_feature_lst val_feature.lst \
-                --val_label_lst val_label.lst \
-                --mean_var global_mean_var
-```
-
-
-### Inference and decoding
-
-Once the acoustic model is sufficiently trained, a model checkpoint saved during training can be used to decode input audio and produce the speech-to-text recognition result
-
-```
-CUDA_VISIBLE_DEVICES=0,1,2,3 python -u infer_by_ckpt.py \
-                --batch_size 96  \
-                --checkpoint deep_asr.pass_1.checkpoint \
-                --infer_feature_lst test_feature.lst \
-                --infer_label_lst test_label.lst \
-                --mean_var global_mean_var \
-                --parallel
-```
-
-### Scoring the error rate
-
-The word error rate (WER) and the character error rate (CER) are commonly used metrics for evaluating a speech recognition system. DeepASR also implements the corresponding measurement tools, run as
-
-```
-python score_error_rate.py --error_rate_type cer --ref ref.txt --hyp decoding.txt
-```
-The parameter `error_rate_type` selects the type of error rate to measure, i.e. WER or CER; `ref.txt` and `decoding.txt` are the reference text and the actually decoded text respectively, and they share the same format:
-
-```
-key1 text1
-key2 text2
-key3 text3
-...
-
-```
-
-
-### The Aishell example
-
-This section uses the [Aishell dataset](http://www.aishelltech.com/kysjcp) as an example to show the complete pipeline from data preprocessing to decoding output. Aishell is an open Mandarin Chinese speech dataset released by Beijing Shell Shell Technology; it contains 178 hours of speech recorded by 400 speakers from different accent regions, and the raw data can be obtained from [openslr](http://www.openslr.org/33). To simplify the workflow, a preprocessed version of the dataset is provided for download:
-
-```
-cd examples/aishell
-sh prepare_data.sh
-```
-This includes the training data of the acoustic model as well as the auxiliary files used during decoding. After the download completes, the training process can be profiled before training starts
-
-```
-sh profile.sh
-```
-
-and training is launched with
-
-```
-sh train.sh
-```
-By default training uses 4 GPUs; in practice, parameters such as `batch_size` and the learning rate can be adjusted dynamically according to the number of available GPUs and their memory size. Figure 2 shows the typical evolution of the loss function and the accuracy while training the acoustic model on the Aishell dataset
-
-[Figure 2 — Learning curves of the acoustic model on the Aishell dataset]
-
-
-After training completes, inference can be run to recognize the text in the test-set speech:
-
-```
-sh infer_by_ckpt.sh
-```
-
-This covers two important steps: prediction with the acoustic model and decoding output by the decoder. Below is a sample of the decoding output:
-
-```
-...
-BAC009S0764W0239 十一 五 期间 我 国 累计 境外 投资 七千亿 美元
-BAC009S0765W0140 在 了解 送 方 的 资产 情况 与 需求 之后
-BAC009S0915W0291 这 对 苹果 来说 不 是 件 容易 的 事 儿
-BAC009S0769W0159 今年 土地 收入 预计 近 四万亿 元
-BAC009S0907W0451 由 浦东 商店 作为 掩护
-BAC009S0768W0128 土地 交易 可能 随着 供应 淡季 的 到来 而 降温
-...
-```
-
-Each line is one output; it starts with the key of the audio sample, followed by the decoded Chinese text separated by words. After decoding finishes, run the script to evaluate the character error rate (CER)
-
-```
-sh score_cer.sh
-```
-
-Its output is similar to
-
-```
-Error rate[cer] = 0.101971 (10683/104765),
-total 7176 sentences in hyp, 0 not presented in ref.
-```
-
-With an acoustic model trained for about 20 passes, recognition results with a CER of about 10% can be obtained on the Aishell test set.
-
-
-### More examples welcome
-
-DeepASR currently only provides the Aishell example. We welcome users to test the complete training pipeline on more datasets and contribute them to this project.
diff --git a/PaddleSpeech/DeepASR/data_utils/__init__.py b/PaddleSpeech/DeepASR/data_utils/__init__.py
deleted file mode 100644
index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000
diff --git a/PaddleSpeech/DeepASR/data_utils/async_data_reader.py b/PaddleSpeech/DeepASR/data_utils/async_data_reader.py
deleted file mode 100644
index edface051129b248bad85978118daec6f8660adc..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/data_utils/async_data_reader.py
+++ /dev/null
@@ -1,465 +0,0 @@
-"""This module contains data processing related logic.
-"""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import random
-import struct
-import Queue
-import time
-import numpy as np
-from threading import Thread
-import signal
-from multiprocessing import Manager, Process
-import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
-import data_utils.augmentor.trans_add_delta as trans_add_delta
-from data_utils.util import suppress_complaints, suppress_signal
-from data_utils.util import CriticalException, ForceExitWrapper
-
-
-class SampleInfo(object):
-    """SampleInfo holds the necessary information to load a sample from disk.
-
-    Args:
-        feature_bin_path (str): File containing the feature data.
-        feature_start (int): Start position of the sample's feature data.
-        feature_size (int): Byte count of the sample's feature data.
-        feature_frame_num (int): Time length of the sample.
-        feature_dim (int): Feature dimension of one frame.
-        label_bin_path (str): File containing the label data.
-        label_start (int): Start position of the sample's label data.
-        label_size (int): Byte count of the sample's label data.
-        label_frame_num (int): Label number of the sample.
-        sample_name (str): Key of the sample
-    """
-
-    def __init__(self, feature_bin_path, feature_start, feature_size,
-                 feature_frame_num, feature_dim, label_bin_path, label_start,
-                 label_size, label_frame_num, sample_name):
-        self.feature_bin_path = feature_bin_path
-        self.feature_start = feature_start
-        self.feature_size = feature_size
-        self.feature_frame_num = feature_frame_num
-        self.feature_dim = feature_dim
-
-        self.label_bin_path = label_bin_path
-        self.label_start = label_start
-        self.label_size = label_size
-        self.label_frame_num = label_frame_num
-        self.sample_name = sample_name
-
-
-class SampleInfoBucket(object):
-    """SampleInfoBucket contains paths of several description files. Feature
-    description file contains necessary information (including path of binary
-    data, sample start position, sample byte number etc.) to access samples'
-    feature data and the same with the label description file. SampleInfoBucket
-    is the minimum unit to do shuffle.
-
-    Args:
-        feature_bin_paths (list|tuple): Files containing the binary feature
-                                        data. 
- feature_desc_paths (list|tuple): Files containing the description of - samples' feature data. - label_bin_paths (list|tuple): Files containing the binary label data. - label_desc_paths (list|tuple): Files containing the description of - samples' label data. - split_perturb(int): Maximum perturbation value for length of - sub-sentence when splitting long sentence. - split_sentence_threshold(int): Sentence whose length larger than - the value will trigger split operation. - split_sub_sentence_len(int): sub-sentence length is equal to - (split_sub_sentence_len - + rand() % split_perturb). - """ - - def __init__(self, - feature_bin_paths, - feature_desc_paths, - label_bin_paths, - label_desc_paths, - split_perturb=50, - split_sentence_threshold=512, - split_sub_sentence_len=256): - block_num = len(label_bin_paths) - assert len(label_desc_paths) == block_num - assert len(feature_bin_paths) == block_num - assert len(feature_desc_paths) == block_num - self._block_num = block_num - - self._feature_bin_paths = feature_bin_paths - self._feature_desc_paths = feature_desc_paths - self._label_bin_paths = label_bin_paths - self._label_desc_paths = label_desc_paths - self._split_perturb = split_perturb - self._split_sentence_threshold = split_sentence_threshold - self._split_sub_sentence_len = split_sub_sentence_len - self._rng = random.Random(0) - - def generate_sample_info_list(self): - sample_info_list = [] - for block_idx in xrange(self._block_num): - label_bin_path = self._label_bin_paths[block_idx] - label_desc_path = self._label_desc_paths[block_idx] - feature_bin_path = self._feature_bin_paths[block_idx] - feature_desc_path = self._feature_desc_paths[block_idx] - - feature_desc_lines = open(feature_desc_path).readlines() - - label_desc_lines = [] - if label_desc_path != "": - label_desc_lines = open(label_desc_path).readlines() - sample_num = int(feature_desc_lines[0].split()[1]) - - if label_desc_path != "": - assert sample_num == int(label_desc_lines[0].split()[1]) - - for i in xrange(sample_num): - feature_desc_split = feature_desc_lines[i + 1].split() - sample_name = feature_desc_split[0] - feature_start = int(feature_desc_split[2]) - feature_size = int(feature_desc_split[3]) - feature_frame_num = int(feature_desc_split[4]) - feature_dim = int(feature_desc_split[5]) - - label_start = -1 - label_size = -1 - label_frame_num = feature_frame_num - if label_desc_path != "": - label_desc_split = label_desc_lines[i + 1].split() - label_start = int(label_desc_split[2]) - label_size = int(label_desc_split[3]) - label_frame_num = int(label_desc_split[4]) - assert feature_frame_num == label_frame_num - - if self._split_sentence_threshold == -1 or \ - self._split_perturb == -1 or \ - self._split_sub_sentence_len == -1 \ - or self._split_sentence_threshold >= feature_frame_num: - sample_info_list.append( - SampleInfo(feature_bin_path, feature_start, - feature_size, feature_frame_num, feature_dim, - label_bin_path, label_start, label_size, - label_frame_num, sample_name)) - #split sentence - else: - cur_frame_pos = 0 - cur_frame_len = 0 - remain_frame_num = feature_frame_num - while True: - if remain_frame_num > self._split_sentence_threshold: - cur_frame_len = self._split_sub_sentence_len + \ - self._rng.randint(0, self._split_perturb) - if cur_frame_len > remain_frame_num: - cur_frame_len = remain_frame_num - else: - cur_frame_len = remain_frame_num - - sample_info_list.append( - SampleInfo( - feature_bin_path, feature_start + cur_frame_pos - * feature_dim * 4, cur_frame_len * feature_dim * - 4, 
cur_frame_len, feature_dim, label_bin_path, - label_start + cur_frame_pos * 4, cur_frame_len * - 4, cur_frame_len, sample_name)) - - remain_frame_num -= cur_frame_len - cur_frame_pos += cur_frame_len - if remain_frame_num <= 0: - break - return sample_info_list - - -class EpochEndSignal(): - pass - - -class AsyncDataReader(object): - """DataReader provides basic audio sample preprocessing pipeline including - data loading and data augmentation. - - Args: - feature_file_list (str): File containing paths of feature data file and - corresponding description file. - label_file_list (str): File containing paths of label data file and - corresponding description file. - drop_frame_len (int): Samples whose label length above the value will be - dropped.(Using '-1' to disable the policy) - split_sentence_threshold(int): Sentence whose length larger than - the value will trigger split operation. - (Assign -1 to disable split) - proc_num (int): Number of processes for processing data. - sample_buffer_size (int): Buffer size to indicate the maximum samples - cached. - sample_info_buffer_size (int): Buffer size to indicate the maximum - sample information cached. - batch_buffer_size (int): Buffer size to indicate the maximum batch - cached. - shuffle_block_num (int): Block number indicating the minimum unit to do - shuffle. - random_seed (int): Random seed. - verbose (int): If set to 0, complaints including exceptions and signal - traceback from sub-process will be suppressed. If set - to 1, all complaints will be printed. - """ - - def __init__(self, - feature_file_list, - label_file_list="", - drop_frame_len=512, - split_sentence_threshold=1024, - proc_num=10, - sample_buffer_size=1024, - sample_info_buffer_size=1024, - batch_buffer_size=10, - shuffle_block_num=10, - random_seed=0, - verbose=0): - self._feature_file_list = feature_file_list - self._label_file_list = label_file_list - self._drop_frame_len = drop_frame_len - self._split_sentence_threshold = split_sentence_threshold - self._shuffle_block_num = shuffle_block_num - self._block_info_list = None - self._rng = random.Random(random_seed) - self._bucket_list = None - self.generate_bucket_list(True) - self._order_id = 0 - self._manager = Manager() - self._sample_buffer_size = sample_buffer_size - self._sample_info_buffer_size = sample_info_buffer_size - self._batch_buffer_size = batch_buffer_size - self._proc_num = proc_num - self._verbose = verbose - self._force_exit = ForceExitWrapper(self._manager.Value('b', False)) - - def generate_bucket_list(self, is_shuffle): - if self._block_info_list is None: - block_feature_info_lines = open(self._feature_file_list).readlines() - self._block_info_list = [] - if self._label_file_list != "": - block_label_info_lines = open(self._label_file_list).readlines() - assert len(block_feature_info_lines) == len( - block_label_info_lines) - for i in xrange(0, len(block_feature_info_lines), 2): - block_info = (block_feature_info_lines[i], - block_feature_info_lines[i + 1], - block_label_info_lines[i], - block_label_info_lines[i + 1]) - self._block_info_list.append( - map(lambda line: line.strip(), block_info)) - else: - for i in xrange(0, len(block_feature_info_lines), 2): - block_info = (block_feature_info_lines[i], - block_feature_info_lines[i + 1], "", "") - self._block_info_list.append( - map(lambda line: line.strip(), block_info)) - - if is_shuffle: - self._rng.shuffle(self._block_info_list) - - self._bucket_list = [] - for i in xrange(0, len(self._block_info_list), self._shuffle_block_num): - 
bucket_block_info = self._block_info_list[i:i + - self._shuffle_block_num] - self._bucket_list.append( - SampleInfoBucket( - map(lambda info: info[0], bucket_block_info), - map(lambda info: info[1], bucket_block_info), - map(lambda info: info[2], bucket_block_info), - map(lambda info: info[3], bucket_block_info), - split_sentence_threshold=self._split_sentence_threshold)) - - # @TODO make this configurable - def set_transformers(self, transformers): - self._transformers = transformers - - def _sample_generator(self): - sample_info_queue = self._manager.Queue(self._sample_info_buffer_size) - sample_queue = self._manager.Queue(self._sample_buffer_size) - self._order_id = 0 - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def ordered_feeding_task(sample_info_queue): - for sample_info_bucket in self._bucket_list: - try: - sample_info_list = \ - sample_info_bucket.generate_sample_info_list() - except Exception as e: - raise CriticalException(e) - else: - self._rng.shuffle(sample_info_list) # do shuffle here - for sample_info in sample_info_list: - sample_info_queue.put((sample_info, self._order_id)) - self._order_id += 1 - - for i in xrange(self._proc_num): - sample_info_queue.put(EpochEndSignal()) - - feeding_thread = Thread( - target=ordered_feeding_task, args=(sample_info_queue, )) - feeding_thread.daemon = True - feeding_thread.start() - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def ordered_processing_task(sample_info_queue, sample_queue, out_order): - if self._verbose == 0: - signal.signal(signal.SIGTERM, suppress_signal) - signal.signal(signal.SIGINT, suppress_signal) - - def read_bytes(fpath, start, size): - try: - f = open(fpath, 'r') - f.seek(start, 0) - binary_bytes = f.read(size) - f.close() - return binary_bytes - except Exception as e: - raise CriticalException(e) - - ins = sample_info_queue.get() - - while not isinstance(ins, EpochEndSignal): - sample_info, order_id = ins - - feature_bytes = read_bytes(sample_info.feature_bin_path, - sample_info.feature_start, - sample_info.feature_size) - - assert sample_info.feature_frame_num \ - * sample_info.feature_dim * 4 \ - == len(feature_bytes), \ - (sample_info.feature_bin_path, - sample_info.feature_frame_num, - sample_info.feature_dim, - len(feature_bytes)) - - label_data = None - if sample_info.label_bin_path != "": - label_bytes = read_bytes(sample_info.label_bin_path, - sample_info.label_start, - sample_info.label_size) - - assert sample_info.label_frame_num * 4 == len( - label_bytes), (sample_info.label_bin_path, - sample_info.label_array, - len(label_bytes)) - - label_array = struct.unpack( - 'I' * sample_info.label_frame_num, label_bytes) - label_data = np.array( - label_array, dtype='int64').reshape( - (sample_info.label_frame_num, 1)) - else: - label_data = np.zeros( - (sample_info.label_frame_num, 1), dtype='int64') - - feature_frame_num = sample_info.feature_frame_num - feature_dim = sample_info.feature_dim - assert feature_frame_num * feature_dim * 4 == len(feature_bytes) - feature_array = struct.unpack('f' * feature_frame_num * - feature_dim, feature_bytes) - feature_data = np.array( - feature_array, dtype='float32').reshape(( - sample_info.feature_frame_num, sample_info.feature_dim)) - sample_data = (feature_data, label_data, - sample_info.sample_name) - for transformer in self._transformers: - # @TODO(pkuyym) to make transfomer only accept feature_data - sample_data = transformer.perform_trans(sample_data) - while order_id != out_order[0]: - time.sleep(0.001) - - # 
drop long sentence - if self._drop_frame_len == -1 or \ - self._drop_frame_len >= sample_data[0].shape[0]: - sample_queue.put(sample_data) - - out_order[0] += 1 - ins = sample_info_queue.get() - - sample_queue.put(EpochEndSignal()) - - out_order = self._manager.list([0]) - args = (sample_info_queue, sample_queue, out_order) - workers = [ - Process( - target=ordered_processing_task, args=args) - for _ in xrange(self._proc_num) - ] - - for w in workers: - w.daemon = True - w.start() - - finished_proc_num = 0 - - while self._force_exit == False: - try: - sample = sample_queue.get_nowait() - except Queue.Empty: - time.sleep(0.001) - else: - if isinstance(sample, EpochEndSignal): - finished_proc_num += 1 - if finished_proc_num >= self._proc_num: - break - else: - continue - - yield sample - - def batch_iterator(self, batch_size, minimum_batch_size): - def batch_to_ndarray(batch_samples, lod): - assert len(batch_samples) - frame_dim = batch_samples[0][0].shape[1] - batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32") - batch_label = np.zeros((lod[-1], 1), dtype="int64") - start = 0 - name_lst = [] - for sample in batch_samples: - frame_num = sample[0].shape[0] - batch_feature[start:start + frame_num, :] = sample[0] - batch_label[start:start + frame_num, :] = sample[1] - start += frame_num - name_lst.append(sample[2]) - return (batch_feature, batch_label, name_lst) - - @suppress_complaints(verbose=self._verbose, notify=self._force_exit) - def batch_assembling_task(sample_generator, batch_queue): - batch_samples = [] - lod = [0] - for sample in sample_generator(): - batch_samples.append(sample) - lod.append(lod[-1] + sample[0].shape[0]) - if len(batch_samples) == batch_size: - (batch_feature, batch_label, name_lst) = batch_to_ndarray( - batch_samples, lod) - batch_queue.put((batch_feature, batch_label, lod, name_lst)) - batch_samples = [] - lod = [0] - - if len(batch_samples) >= minimum_batch_size: - (batch_feature, batch_label, name_lst) = batch_to_ndarray( - batch_samples, lod) - batch_queue.put((batch_feature, batch_label, lod, name_lst)) - - batch_queue.put(EpochEndSignal()) - - batch_queue = Queue.Queue(self._batch_buffer_size) - - assembling_thread = Thread( - target=batch_assembling_task, - args=(self._sample_generator, batch_queue)) - assembling_thread.daemon = True - assembling_thread.start() - - while self._force_exit == False: - try: - batch_data = batch_queue.get_nowait() - except Queue.Empty: - time.sleep(0.001) - else: - if isinstance(batch_data, EpochEndSignal): - break - yield batch_data diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/__init__.py b/PaddleSpeech/DeepASR/data_utils/augmentor/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/__init__.py b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/__init__.py deleted file mode 100644 index 90856dc44374211453f7de128c08c8004ffda912..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr 
b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr deleted file mode 100644 index 7fabadc789bbd7aaad4e9ac59aba95b080c68b22..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/data/global_mean_var_search26kHr +++ /dev/null @@ -1,120 +0,0 @@ -16.2845556399 11.6891798673 -17.21509949 12.3788567902 -18.1143704548 14.9912618017 -19.2335963752 18.5419556172 -19.9266772451 21.2768220522 -19.8245737202 21.2347210705 -19.5432940972 20.2784036567 -19.4631271754 20.2934452329 -19.3929919324 20.457971868 -19.2924788362 20.3626439234 -18.9207244502 19.9196569759 -18.7202605641 19.5920276899 -18.4844279398 19.2068349019 -18.2670948624 18.8716893824 -18.0929628855 18.5439666541 -17.8428896026 18.0255891747 -17.6646850635 17.473764296 -17.4955705896 16.8966859471 -17.3706720293 16.4294027467 -17.2530867792 16.0514717623 -17.1304341172 15.7234699057 -17.0038353287 15.4344471514 -16.902550309 15.1603287337 -16.8375590047 14.9304337826 -16.816287853 14.9119310513 -16.828838265 15.0930023024 -16.8602209498 15.3771992423 -16.9101763812 15.6897991789 -16.9466065143 15.9364556489 -16.9486061956 16.0699417826 -16.9041374104 16.0796970272 -16.8410093699 16.0111444599 -16.7045718836 15.7991985601 -16.51128489 15.5208920129 -16.3253910608 15.2603181921 -16.1297317333 14.9499965958 -15.903428372 14.5958280409 -15.6131718105 14.2709618 -15.1395035533 13.9993939893 -14.4298229999 13.3841189151 -0.0034970565424 0.246184766149 -0.00501284154705 0.238484972472 -0.00605942680019 0.269064381708 -0.00687266156243 0.319479238011 -0.00734065019253 0.371947383205 -0.00718807218417 0.384426479694 -0.00652195540212 0.384676838281 -0.00660416525951 0.395543910317 -0.00680202057642 0.400803979681 -0.00659144183007 0.393228973031 -0.00605294530423 0.385021118038 -0.00590452969394 0.361763039625 -0.00612315374687 0.346777773373 -0.00582354093973 0.335802403976 -0.00574556002554 0.320733728218 -0.00612254485891 0.310153103033 -0.00626733043219 0.299854747445 -0.00567398408041 0.293353685493 -0.00519236700706 0.287668810947 -0.00529581474367 0.281479660772 -0.00479019484082 0.27451415777 -0.00486381039428 0.266294391154 -0.00491126372868 0.258105116126 -0.00452105305011 0.252926328298 -0.00531483334271 0.250910887373 -0.00546572110469 0.253302256977 -0.00479544857908 0.258484183394 -0.00422106426297 0.264582900173 -0.00401824135188 0.268467945623 -0.0041705465252 0.269699480291 -0.00405239564143 0.270406162975 -0.0040059737566 0.270407601782 -0.00406426729317 0.267951582656 -0.00416613791013 0.264543833042 -0.00427847607653 0.26247798891 -0.00428050903034 0.259635263243 -0.00454842971786 0.255829377617 -0.00393747552387 0.253802307025 -0.00374143688909 0.251011478787 -0.00335475310258 0.236543650856 -0.000373194755312 0.0419494800709 -0.000230909648678 0.0394102370205 -0.000150840015851 0.0414956922398 -8.44401840771e-05 0.0460502231327 --6.24759314572e-06 0.0528049937739 --8.82957758148e-05 0.055711244886 -1.16795791952e-05 0.0563188428833 --1.68716267856e-05 0.0575232763711 --0.000112625308645 0.057979929947 --0.000122619090002 0.0564126233493 -1.73569637319e-05 0.05522573909 -6.49872782342e-05 0.0507353361334 -4.17746389178e-05 0.0479568131253 -5.13884475653e-05 0.0461253238047 -1.8860115143e-05 0.0436860476919 --5.64317701105e-05 0.042516381059 --0.000136859948115 0.0413574820205 --7.00847019726e-05 0.0409516370727 --5.39392223336e-05 0.040441504085 --9.24897162815e-05 0.0397800398173 -4.7104970622e-05 0.039046286243 -6.24805896165e-06 0.0380185986602 
--2.35272813418e-05 0.036851063786 -5.88344154127e-05 0.0361640489242 --8.39162076993e-05 0.0357639427311 --0.000108702805776 0.0358774639538 -3.22013961834e-06 0.0363644530435 -9.43501518394e-05 0.0370309934774 -0.000134406229423 0.0374972993343 -3.84007008533e-05 0.037676222515 -3.05989328157e-05 0.0379111939182 -9.52201629091e-05 0.0380927209106 -0.000102126083729 0.0379925358499 -6.98628072264e-05 0.0377276252241 -4.55782256339e-05 0.0375165468654 -4.76370987786e-05 0.0371482526345 --2.24128832709e-05 0.0366810742947 -0.000125621306953 0.036628355271 -0.000134568666093 0.0364860461759 -0.000159858844464 0.0345583593149 diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/test_data_trans.py b/PaddleSpeech/DeepASR/data_utils/augmentor/tests/test_data_trans.py deleted file mode 100644 index 6b18f3fa5958a9e44899b39b1f583311f186f72e..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/tests/test_data_trans.py +++ /dev/null @@ -1,136 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import unittest -import numpy as np -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay - - -class TestTransMeanVarianceNorm(unittest.TestCase): - """unit test for TransMeanVarianceNorm - """ - - def setUp(self): - self._file_path = "./data_utils/augmentor/tests/data/" \ - "global_mean_var_search26kHr" - - def test(self): - feature = np.zeros((2, 120), dtype="float32") - feature.fill(1) - trans = trans_mean_variance_norm.TransMeanVarianceNorm(self._file_path) - (feature1, label1, name) = trans.perform_trans((feature, None, None)) - (mean, var) = trans.get_mean_var() - feature_flat1 = feature1.flatten() - feature_flat = feature.flatten() - one = np.ones((1), dtype="float32") - for idx, val in enumerate(feature_flat1): - cur_idx = idx % 120 - self.assertAlmostEqual(val, (one[0] - mean[cur_idx]) * var[cur_idx]) - - -class TestTransAddDelta(unittest.TestCase): - """unit test TestTransAddDelta - """ - - def test_regress(self): - """test regress - """ - feature = np.zeros((14, 120), dtype="float32") - feature[0:5, 0:40].fill(1) - feature[0 + 5, 0:40].fill(1) - feature[1 + 5, 0:40].fill(2) - feature[2 + 5, 0:40].fill(3) - feature[3 + 5, 0:40].fill(4) - feature[8:14, 0:40].fill(4) - trans = trans_add_delta.TransAddDelta() - feature = feature.reshape((14 * 120)) - trans._regress(feature, 5 * 120, feature, 5 * 120 + 40, 40, 4, 120) - trans._regress(feature, 5 * 120 + 40, feature, 5 * 120 + 80, 40, 4, 120) - feature = feature.reshape((14, 120)) - tmp_feature = feature[5:5 + 4, :] - self.assertAlmostEqual(1.0, tmp_feature[0][0]) - self.assertAlmostEqual(0.24, tmp_feature[0][119]) - self.assertAlmostEqual(2.0, tmp_feature[1][0]) - self.assertAlmostEqual(0.13, tmp_feature[1][119]) - self.assertAlmostEqual(3.0, tmp_feature[2][0]) - self.assertAlmostEqual(-0.13, tmp_feature[2][119]) - self.assertAlmostEqual(4.0, tmp_feature[3][0]) - self.assertAlmostEqual(-0.24, tmp_feature[3][119]) - - def test_perform(self): - """test perform - """ - feature = np.zeros((4, 40), dtype="float32") - feature[0, 0:40].fill(1) - feature[1, 0:40].fill(2) - feature[2, 0:40].fill(3) - feature[3, 0:40].fill(4) - trans = trans_add_delta.TransAddDelta() - (feature, label, name) = trans.perform_trans((feature, None, None)) - 
self.assertAlmostEqual(feature.shape[0], 4) - self.assertAlmostEqual(feature.shape[1], 120) - self.assertAlmostEqual(1.0, feature[0][0]) - self.assertAlmostEqual(0.24, feature[0][119]) - self.assertAlmostEqual(2.0, feature[1][0]) - self.assertAlmostEqual(0.13, feature[1][119]) - self.assertAlmostEqual(3.0, feature[2][0]) - self.assertAlmostEqual(-0.13, feature[2][119]) - self.assertAlmostEqual(4.0, feature[3][0]) - self.assertAlmostEqual(-0.24, feature[3][119]) - - -class TestTransSplict(unittest.TestCase): - """unit test Test TransSplict - """ - - def test_perfrom(self): - feature = np.zeros((8, 10), dtype="float32") - for i in xrange(feature.shape[0]): - feature[i, :].fill(i) - - trans = trans_splice.TransSplice() - (feature, label, name) = trans.perform_trans((feature, None, None)) - self.assertEqual(feature.shape[1], 110) - - for i in xrange(8): - nzero_num = 5 - i - cur_val = 0.0 - if nzero_num < 0: - cur_val = i - 5 - 1 - for j in xrange(11): - if j <= nzero_num: - for k in xrange(10): - self.assertAlmostEqual(feature[i][j * 10 + k], cur_val) - else: - if cur_val < 7: - cur_val += 1.0 - for k in xrange(10): - self.assertAlmostEqual(feature[i][j * 10 + k], cur_val) - - -class TestTransDelay(unittest.TestCase): - """unittest TransDelay - """ - - def test_perform(self): - label = np.zeros((10, 1), dtype="int64") - for i in xrange(10): - label[i][0] = i - - trans = trans_delay.TransDelay(5) - (_, label, _) = trans.perform_trans((None, label, None)) - - for i in xrange(5): - self.assertAlmostEqual(label[i + 5][0], i) - - for i in xrange(5): - self.assertAlmostEqual(label[i][0], 0) - - -if __name__ == '__main__': - unittest.main() diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_add_delta.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_add_delta.py deleted file mode 100644 index aa8062f87c932b76dd8a79db825d07e8be273857..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_add_delta.py +++ /dev/null @@ -1,104 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math -import copy - - -class TransAddDelta(object): - """ add delta of feature data - trans feature for shape(a, b) to shape(a, b * 3) - - Attributes: - _norder(int): - _window(int): - """ - - def __init__(self, norder=2, nwindow=2): - """ init construction - Args: - norder: default 2 - nwindow: default 2 - """ - self._norder = norder - self._nwindow = nwindow - - def perform_trans(self, sample): - """ add delta for feature - trans feature shape from (a,b) to (a, b * 3) - - Args: - sample(object,tuple): contain feature numpy and label numpy - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - frame_dim = feature.shape[1] - d_frame_dim = frame_dim * 3 - head_filled = 5 - tail_filled = 5 - mat = np.zeros( - (feature.shape[0] + head_filled + tail_filled, d_frame_dim), - dtype="float32") - #copy first frame - for i in xrange(head_filled): - np.copyto(mat[i, 0:frame_dim], feature[0, :]) - - np.copyto(mat[head_filled:head_filled + feature.shape[0], 0:frame_dim], - feature[:, :]) - - # copy last frame - for i in xrange(head_filled + feature.shape[0], mat.shape[0], 1): - np.copyto(mat[i, 0:frame_dim], feature[feature.shape[0] - 1, :]) - - nframe = feature.shape[0] - start = head_filled - tmp_shape = mat.shape - mat = mat.reshape((tmp_shape[0] * tmp_shape[1])) - self._regress(mat, start * d_frame_dim, mat, - start * d_frame_dim + frame_dim, frame_dim, nframe, - 
d_frame_dim) - self._regress(mat, start * d_frame_dim + frame_dim, mat, - start * d_frame_dim + 2 * frame_dim, frame_dim, nframe, - d_frame_dim) - mat.shape = tmp_shape - return (mat[head_filled:mat.shape[0] - tail_filled, :], label, name) - - def _regress(self, data_in, start_in, data_out, start_out, size, n, step): - """ regress - Args: - data_in: in data - start_in: start index of data_in - data_out: out data - start_out: start index of data_out - size: frame dimentional - n: frame num - step: 3 * (frame num) - Returns: - None - """ - sigma_t2 = 0.0 - delta_window = self._nwindow - for t in xrange(1, delta_window + 1): - sigma_t2 += t * t - - sigma_t2 *= 2.0 - for i in xrange(n): - fp1 = start_in - fp2 = start_out - for j in xrange(size): - back = fp1 - forw = fp1 - sum = 0.0 - for t in xrange(1, delta_window + 1): - back -= step - forw += step - sum += t * (data_in[forw] - data_in[back]) - - data_out[fp2] = sum / sigma_t2 - fp1 += 1 - fp2 += 1 - start_in += step - start_out += step diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_delay.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_delay.py deleted file mode 100644 index b782498edfd5443806a6c80e3b4fe91b8e2b1cc9..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_delay.py +++ /dev/null @@ -1,37 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransDelay(object): - """ Delay label, and copy first label value in the front. - Attributes: - _delay_time : the delay frame num of label - """ - - def __init__(self, delay_time): - """init construction - Args: - delay_time : the delay frame num of label - """ - self._delay_time = delay_time - - def perform_trans(self, sample): - """ - Args: - sample(object):input sample, contain feature numpy and label numpy, sample name list - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - - shape = label.shape - assert len(shape) == 2 - label[self._delay_time:shape[0]] = label[0:shape[0] - self._delay_time] - for i in xrange(self._delay_time): - label[i][0] = label[self._delay_time][0] - - return (feature, label, name) diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py deleted file mode 100644 index 9f91b726ea2bcd432340cd06a3cb9006cd5f83f4..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_mean_variance_norm.py +++ /dev/null @@ -1,71 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransMeanVarianceNorm(object): - """ normalization of mean variance for feature data - Attributes: - _mean(numpy.array): the feature mean vector - _var(numpy.array): the feature variance - """ - - def __init__(self, snorm_path): - """init construction - Args: - snorm_path: the path of mean and variance - """ - self._mean = None - self._var = None - self._load_norm(snorm_path) - - def _load_norm(self, snorm_path): - """ load mean var file - Args: - snorm_path(str):the file path - """ - lLines = open(snorm_path).readlines() - nLen = len(lLines) - self._mean = np.zeros((nLen), dtype="float32") - self._var = np.zeros((nLen), dtype="float32") - self._nLen = nLen - for nidx, l in enumerate(lLines): - s = l.split() - assert len(s) == 2 - self._mean[nidx] = float(s[0]) - 
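-            # Clarifying note (added; not in the original file): each line of
-            # the mean/var file holds "<mean> <variance>" for one feature
-            # dimension, matching the two-column data file deleted above.
-            # What gets stored in self._var is the inverse standard deviation
-            # 1.0 / sqrt(variance), clipped to 1e5 so that a near-zero
-            # variance cannot blow up the (x - mean) * var normalization done
-            # in perform_trans().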
self._var[nidx] = 1.0 / math.sqrt(float(s[1])) - if self._var[nidx] > 100000.0: - self._var[nidx] = 100000.0 - - def get_mean_var(self): - """ get mean and var - Args: - Returns: - (mean, var) - """ - return (self._mean, self._var) - - def perform_trans(self, sample): - """ feature = (feature - mean) * var - Args: - sample(object):input sample, contain feature numpy and label numpy - Returns: - (feature, label, name) - """ - (feature, label, name) = sample - shape = feature.shape - assert len(shape) == 2 - nfeature_len = shape[0] * shape[1] - assert nfeature_len % self._nLen == 0 - ncur_idx = 0 - feature = feature.reshape((nfeature_len)) - while ncur_idx < nfeature_len: - block = feature[ncur_idx:ncur_idx + self._nLen] - block = (block - self._mean) * self._var - feature[ncur_idx:ncur_idx + self._nLen] = block - ncur_idx += self._nLen - feature = feature.reshape(shape) - return (feature, label, name) diff --git a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_splice.py b/PaddleSpeech/DeepASR/data_utils/augmentor/trans_splice.py deleted file mode 100644 index 1fab3d6b442c1613f18d16fd0b0ee89464dbeb2c..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/augmentor/trans_splice.py +++ /dev/null @@ -1,64 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import math - - -class TransSplice(object): - """ copy feature context to construct new feature - expand feature data from shape (frame_num, frame_dim) - to shape (frame_num, frame_dim * 11) - - Attributes: - _nleft_context(int): copy left context number - _nright_context(int): copy right context number - """ - - def __init__(self, nleft_context=5, nright_context=5): - """ init construction - Args: - nleft_context(int): - nright_context(int): - """ - self._nleft_context = nleft_context - self._nright_context = nright_context - - def perform_trans(self, sample): - """ copy feature context - Args: - sample(object): input sample(feature, label) - Return: - (feature, label, name) - """ - (feature, label, name) = sample - nframe_num = feature.shape[0] - nframe_dim = feature.shape[1] - nnew_frame_dim = nframe_dim * ( - self._nleft_context + self._nright_context + 1) - mat = np.zeros( - (nframe_num + self._nleft_context + self._nright_context, - nframe_dim), - dtype="float32") - ret = np.zeros((nframe_num, nnew_frame_dim), dtype="float32") - - #copy left - for i in xrange(self._nleft_context): - mat[i, :] = feature[0, :] - - #copy middle - mat[self._nleft_context:self._nleft_context + - nframe_num, :] = feature[:, :] - - #copy right - for i in xrange(self._nright_context): - mat[i + self._nleft_context + nframe_num, :] = feature[-1, :] - - mat = mat.reshape(mat.shape[0] * mat.shape[1]) - ret = ret.reshape(ret.shape[0] * ret.shape[1]) - for i in xrange(nframe_num): - np.copyto(ret[i * nnew_frame_dim:(i + 1) * nnew_frame_dim], - mat[i * nframe_dim:i * nframe_dim + nnew_frame_dim]) - ret = ret.reshape((nframe_num, nnew_frame_dim)) - return (ret, label, name) diff --git a/PaddleSpeech/DeepASR/data_utils/util.py b/PaddleSpeech/DeepASR/data_utils/util.py deleted file mode 100644 index 4a5a8a3f1dad1c46ed773fd48d713e276717d5e5..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/data_utils/util.py +++ /dev/null @@ -1,71 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -import sys -from six import reraise -from tblib import Traceback - -import numpy as np - 
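-# Illustrative note (added; not part of the original util.py): a LoDTensor
-# packs a batch of variable-length sequences into one flat tensor plus a
-# list of cumulative offsets. For example, three sequences of lengths
-# [3, 2, 4] yield lod = [0, 3, 5, 9], and rows lod[i]:lod[i + 1] of the
-# flat data form sequence i; this is the invariant assumed by
-# to_lodtensor() and split_infer_result() below.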
-
-import paddle.fluid as fluid
-
-
-def to_lodtensor(data, place):
-    """convert a batch of variable-length sequences to a LoDTensor
-    """
-    seq_lens = [len(seq) for seq in data]
-    cur_len = 0
-    lod = [cur_len]
-    for l in seq_lens:
-        cur_len += l
-        lod.append(cur_len)
-    flattened_data = np.concatenate(data, axis=0).astype("int64")
-    flattened_data = flattened_data.reshape([len(flattened_data), 1])
-    res = fluid.LoDTensor()
-    res.set(flattened_data, place)
-    res.set_lod([lod])
-    return res
-
-
-def split_infer_result(infer_seq, lod):
-    infer_batch = []
-    for i in xrange(0, len(lod[0]) - 1):
-        infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
-    return infer_batch
-
-
-class CriticalException(Exception):
-    pass
-
-
-def suppress_signal(signo, stack_frame):
-    pass
-
-
-def suppress_complaints(verbose, notify=None):
-    def decorator_maker(func):
-        def suppress_wrapper(*args, **kwargs):
-            try:
-                func(*args, **kwargs)
-            except:
-                et, ev, tb = sys.exc_info()
-
-                if notify is not None:
-                    notify(except_type=et, except_value=ev, traceback=tb)
-
-                if verbose == 1 or isinstance(ev, CriticalException):
-                    reraise(et, ev, Traceback(tb).as_traceback())
-
-        return suppress_wrapper
-
-    return decorator_maker
-
-
-class ForceExitWrapper(object):
-    def __init__(self, exit_flag):
-        self._exit_flag = exit_flag
-
-    @suppress_complaints(verbose=0)
-    def __call__(self, *args, **kwargs):
-        self._exit_flag.value = True
-
-    def __eq__(self, flag):
-        return self._exit_flag.value == flag
diff --git a/PaddleSpeech/DeepASR/decoder/.gitignore b/PaddleSpeech/DeepASR/decoder/.gitignore
deleted file mode 100644
index ef5c97cfb5c06f3308980ca65c87e9c4b9440171..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/decoder/.gitignore
+++ /dev/null
@@ -1,4 +0,0 @@
-ThreadPool
-build
-post_latgen_faster_mapped.so
-pybind11
diff --git a/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.cc b/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.cc
deleted file mode 100644
index ad8aaa84803d61bbce3d76757954e47f8585ed8b..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.cc
+++ /dev/null
@@ -1,305 +0,0 @@
-/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
*/ - -#include "post_latgen_faster_mapped.h" -#include -#include "ThreadPool.h" - -using namespace kaldi; -typedef kaldi::int32 int32; -using fst::SymbolTable; -using fst::Fst; -using fst::StdArc; - -Decoder::Decoder(std::string trans_model_in_filename, - std::string word_syms_filename, - std::string fst_in_filename, - std::string logprior_in_filename, - size_t beam_size, - kaldi::BaseFloat acoustic_scale) { - const char *usage = - "Generate lattices using neural net model.\n" - "Usage: post-latgen-faster-mapped [options] " - " " - " [ [] " - "]\n"; - ParseOptions po(usage); - allow_partial = false; - this->acoustic_scale = acoustic_scale; - - config.Register(&po); - int32 beam = 11; - po.Register("acoustic-scale", - &acoustic_scale, - "Scaling factor for acoustic likelihoods"); - po.Register("word-symbol-table", - &word_syms_filename, - "Symbol table for words [for debug output]"); - po.Register("allow-partial", - &allow_partial, - "If true, produce output even if end state was not reached."); - - int argc = 2; - char *argv[] = {(char *)"post-latgen-faster-mapped", - (char *)("--beam=" + std::to_string(beam_size)).c_str()}; - - po.Read(argc, argv); - - std::ifstream is_logprior(logprior_in_filename); - logprior.Read(is_logprior, false); - - { - bool binary; - Input ki(trans_model_in_filename, &binary); - this->trans_model.Read(ki.Stream(), binary); - } - - this->determinize = config.determinize_lattice; - - this->word_syms = NULL; - if (word_syms_filename != "") { - if (!(word_syms = fst::SymbolTable::ReadText(word_syms_filename))) { - KALDI_ERR << "Could not read symbol table from file " - << word_syms_filename; - } - } - - // Input FST is just one FST, not a table of FSTs. - this->decode_fst = fst::ReadFstKaldiGeneric(fst_in_filename); - - kaldi::LatticeFasterDecoder *decoder = - new LatticeFasterDecoder(*decode_fst, config); - decoder_pool.emplace_back(decoder); - - std::string lattice_wspecifier = - "ark:|gzip -c > mapped_decoder_data/lat.JOB.gz"; - if (!(determinize ? 
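-          // Note added for clarity: when config.determinize_lattice is set,
-          // the decoder opens a writer for determinized CompactLattices,
-          // otherwise one for raw state-level Lattices; either way the output
-          // is piped through the gzip'ed Kaldi table named by
-          // lattice_wspecifier above.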
compact_lattice_writer.Open(lattice_wspecifier) - : lattice_writer.Open(lattice_wspecifier))) - KALDI_ERR << "Could not open table for writing lattices: " - << lattice_wspecifier; - - words_writer = new Int32VectorWriter(""); - alignment_writer = new Int32VectorWriter(""); -} - -Decoder::~Decoder() { - if (!this->word_syms) delete this->word_syms; - delete this->decode_fst; - for (size_t i = 0; i < decoder_pool.size(); ++i) { - delete decoder_pool[i]; - } - delete words_writer; - delete alignment_writer; -} - - -void Decoder::decode_from_file(std::string posterior_rspecifier, - size_t num_processes) { - try { - double tot_like = 0.0; - kaldi::int64 frame_count = 0; - // int num_success = 0, num_fail = 0; - - KALDI_ASSERT(ClassifyRspecifier(fst_in_filename, NULL, NULL) == - kNoRspecifier); - SequentialBaseFloatMatrixReader posterior_reader("ark:" + - posterior_rspecifier); - - Timer timer; - timer.Reset(); - double elapsed = 0.0; - - for (size_t n = decoder_pool.size(); n < num_processes; ++n) { - kaldi::LatticeFasterDecoder *decoder = - new LatticeFasterDecoder(*decode_fst, config); - decoder_pool.emplace_back(decoder); - } - elapsed = timer.Elapsed(); - ThreadPool thread_pool(num_processes); - - while (!posterior_reader.Done()) { - timer.Reset(); - std::vector> que; - for (size_t i = 0; i < num_processes && !posterior_reader.Done(); ++i) { - std::string utt = posterior_reader.Key(); - Matrix &loglikes(posterior_reader.Value()); - que.emplace_back(thread_pool.enqueue(std::bind( - &Decoder::decode_internal, this, decoder_pool[i], utt, loglikes))); - posterior_reader.Next(); - } - timer.Reset(); - for (size_t i = 0; i < que.size(); ++i) { - std::cout << que[i].get() << std::endl; - } - } - - } catch (const std::exception &e) { - std::cerr << e.what(); - } -} - -inline kaldi::Matrix vector2kaldi_mat( - const std::vector> &log_probs) { - size_t num_frames = log_probs.size(); - size_t dim_label = log_probs[0].size(); - kaldi::Matrix loglikes( - num_frames, dim_label, kaldi::kSetZero, kaldi::kStrideEqualNumCols); - for (size_t i = 0; i < num_frames; ++i) { - memcpy(loglikes.Data() + i * dim_label, - log_probs[i].data(), - sizeof(kaldi::BaseFloat) * dim_label); - } - return loglikes; -} - -std::vector Decoder::decode_batch( - std::vector keys, - const std::vector>> - &log_probs_batch, - size_t num_processes) { - ThreadPool thread_pool(num_processes); - std::vector decoding_results; //(keys.size(), ""); - - for (size_t n = decoder_pool.size(); n < num_processes; ++n) { - kaldi::LatticeFasterDecoder *decoder = - new LatticeFasterDecoder(*decode_fst, config); - decoder_pool.emplace_back(decoder); - } - - size_t index = 0; - while (index < keys.size()) { - std::vector> res_in_que; - for (size_t t = 0; t < num_processes && index < keys.size(); ++t) { - kaldi::Matrix loglikes = - vector2kaldi_mat(log_probs_batch[index]); - res_in_que.emplace_back( - thread_pool.enqueue(std::bind(&Decoder::decode_internal, - this, - decoder_pool[t], - keys[index], - loglikes))); - index++; - } - for (size_t i = 0; i < res_in_que.size(); ++i) { - decoding_results.emplace_back(res_in_que[i].get()); - } - } - return decoding_results; -} - -std::string Decoder::decode( - std::string key, - const std::vector> &log_probs) { - kaldi::Matrix loglikes = vector2kaldi_mat(log_probs); - return decode_internal(decoder_pool[0], key, loglikes); -} - - -std::string Decoder::decode_internal( - LatticeFasterDecoder *decoder, - std::string key, - kaldi::Matrix &loglikes) { - if (loglikes.NumRows() == 0) { - KALDI_WARN << "Zero-length 
utterance: " << key; - // num_fail++; - } - KALDI_ASSERT(loglikes.NumCols() == logprior.Dim()); - - loglikes.ApplyLog(); - loglikes.AddVecToRows(-1.0, logprior); - - DecodableMatrixScaledMapped matrix_decodable( - trans_model, loglikes, acoustic_scale); - double like; - return this->DecodeUtteranceLatticeFaster( - decoder, matrix_decodable, key, &like); -} - - -std::string Decoder::DecodeUtteranceLatticeFaster( - LatticeFasterDecoder *decoder, - DecodableInterface &decodable, // not const but is really an input. - std::string utt, - double *like_ptr) { // puts utterance's like in like_ptr on success. - using fst::VectorFst; - std::string ret = utt + ' '; - - if (!decoder->Decode(&decodable)) { - KALDI_WARN << "Failed to decode file " << utt; - return ret; - } - if (!decoder->ReachedFinal()) { - if (allow_partial) { - KALDI_WARN << "Outputting partial output for utterance " << utt - << " since no final-state reached\n"; - } else { - KALDI_WARN << "Not producing output for utterance " << utt - << " since no final-state reached and " - << "--allow-partial=false.\n"; - return ret; - } - } - - double likelihood; - LatticeWeight weight; - int32 num_frames; - { // First do some stuff with word-level traceback... - VectorFst decoded; - if (!decoder->GetBestPath(&decoded)) - // Shouldn't really reach this point as already checked success. - KALDI_ERR << "Failed to get traceback for utterance " << utt; - - std::vector alignment; - std::vector words; - GetLinearSymbolSequence(decoded, &alignment, &words, &weight); - num_frames = alignment.size(); - // if (alignment_writer->IsOpen()) alignment_writer->Write(utt, alignment); - if (word_syms != NULL) { - for (size_t i = 0; i < words.size(); i++) { - std::string s = word_syms->Find(words[i]); - ret += s + ' '; - } - } - likelihood = -(weight.Value1() + weight.Value2()); - } - - // Get lattice, and do determinization if requested. - Lattice lat; - decoder->GetRawLattice(&lat); - if (lat.NumStates() == 0) - KALDI_ERR << "Unexpected problem getting lattice for utterance " << utt; - fst::Connect(&lat); - if (determinize) { - CompactLattice clat; - if (!DeterminizeLatticePhonePrunedWrapper( - trans_model, - &lat, - decoder->GetOptions().lattice_beam, - &clat, - decoder->GetOptions().det_opts)) - KALDI_WARN << "Determinization finished earlier than the beam for " - << "utterance " << utt; - // We'll write the lattice without acoustic scaling. - if (acoustic_scale != 0.0) - fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &clat); - // disable output lattice temporarily - // compact_lattice_writer.Write(utt, clat); - } else { - // We'll write the lattice without acoustic scaling. - if (acoustic_scale != 0.0) - fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &lat); - // lattice_writer.Write(utt, lat); - } - return ret; -} diff --git a/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.h b/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.h deleted file mode 100644 index 9c234b8681690b9f1e3d30b61ac3b97b7055887f..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/decoder/post_latgen_faster_mapped.h +++ /dev/null @@ -1,80 +0,0 @@ -/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. 
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include <string>
-#include <vector>
-#include "base/kaldi-common.h"
-#include "base/timer.h"
-#include "decoder/decodable-matrix.h"
-#include "decoder/decoder-wrappers.h"
-#include "fstext/kaldi-fst-io.h"
-#include "hmm/transition-model.h"
-#include "tree/context-dep.h"
-#include "util/common-utils.h"
-
-class Decoder {
-public:
-  Decoder(std::string trans_model_in_filename,
-          std::string word_syms_filename,
-          std::string fst_in_filename,
-          std::string logprior_in_filename,
-          size_t beam_size,
-          kaldi::BaseFloat acoustic_scale);
-  ~Decoder();
-
-  // Interface to accept the scores read from specifier and print
-  // the decoding results directly
-  void decode_from_file(std::string posterior_rspecifier,
-                        size_t num_processes = 1);
-
-  // Accept the scores of one utterance and return the decoding result
-  std::string decode(
-      std::string key,
-      const std::vector<std::vector<kaldi::BaseFloat>> &log_probs);
-
-  // Accept the scores of utterances in batch and return the decoding results
-  std::vector<std::string> decode_batch(
-      std::vector<std::string> key,
-      const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>
-          &log_probs_batch,
-      size_t num_processes = 1);
-
-private:
-  // For decoding one utterance
-  std::string decode_internal(kaldi::LatticeFasterDecoder *decoder,
-                              std::string key,
-                              kaldi::Matrix<kaldi::BaseFloat> &loglikes);
-
-  std::string DecodeUtteranceLatticeFaster(kaldi::LatticeFasterDecoder *decoder,
-                                           kaldi::DecodableInterface &decodable,
-                                           std::string utt,
-                                           double *like_ptr);
-
-  fst::SymbolTable *word_syms;
-  fst::Fst<fst::StdArc> *decode_fst;
-  std::vector<kaldi::LatticeFasterDecoder *> decoder_pool;
-  kaldi::Vector<kaldi::BaseFloat> logprior;
-  kaldi::TransitionModel trans_model;
-  kaldi::LatticeFasterDecoderConfig config;
-
-  kaldi::CompactLatticeWriter compact_lattice_writer;
-  kaldi::LatticeWriter lattice_writer;
-  kaldi::Int32VectorWriter *words_writer;
-  kaldi::Int32VectorWriter *alignment_writer;
-
-  bool binary;
-  bool determinize;
-  kaldi::BaseFloat acoustic_scale;
-  bool allow_partial;
-};
diff --git a/PaddleSpeech/DeepASR/decoder/pybind.cc b/PaddleSpeech/DeepASR/decoder/pybind.cc
deleted file mode 100644
index 4a9b27d4cf862e5c1492875512fdeba3e95ecb15..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/decoder/pybind.cc
+++ /dev/null
@@ -1,51 +0,0 @@
-/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. */
-
-#include <pybind11/pybind11.h>
-#include <pybind11/stl.h>
-
-#include "post_latgen_faster_mapped.h"
-
-namespace py = pybind11;
-
-PYBIND11_MODULE(post_latgen_faster_mapped, m) {
-  m.doc() = "Decoder for Deep ASR model";
-
-  py::class_<Decoder>(m, "Decoder")
-      .def(py::init<std::string,
-                    std::string,
-                    std::string,
-                    std::string,
-                    size_t,
-                    kaldi::BaseFloat>())
-      .def("decode_from_file",
-           (void (Decoder::*)(std::string, size_t)) & Decoder::decode_from_file,
-           "Decode for the probability matrices in specifier "
-           "and print the transcriptions.")
-      .def(
-          "decode",
-          (std::string (Decoder::*)(
-              std::string,
-              const std::vector<std::vector<kaldi::BaseFloat>>&)) &
-              Decoder::decode,
-          "Decode one input probability matrix "
-          "and return the transcription.")
-      .def("decode_batch",
-           (std::vector<std::string> (Decoder::*)(
-               std::vector<std::string>,
-               const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>&,
-               size_t num_processes)) &
-               Decoder::decode_batch,
-           "Decode one batch of probability matrices "
-           "and return the transcriptions.");
-}
diff --git a/PaddleSpeech/DeepASR/decoder/setup.py b/PaddleSpeech/DeepASR/decoder/setup.py
deleted file mode 100644
index 81fc857cce5b57af5bce7b34a1f4243fb853c0b6..0000000000000000000000000000000000000000
--- a/PaddleSpeech/DeepASR/decoder/setup.py
+++ /dev/null
@@ -1,71 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import glob
-from distutils.core import setup, Extension
-from distutils.sysconfig import get_config_vars
-
-try:
-    kaldi_root = os.environ['KALDI_ROOT']
-except KeyError:
-    raise ValueError("Environment variable 'KALDI_ROOT' is not defined.
Please " - "install kaldi and export KALDI_ROOT= .") - -args = [ - '-std=c++11', '-fopenmp', '-Wno-sign-compare', '-Wno-unused-variable', - '-Wno-unused-local-typedefs', '-Wno-unused-but-set-variable', - '-Wno-deprecated-declarations', '-Wno-unused-function' -] - -# remove warning about -Wstrict-prototypes -(opt, ) = get_config_vars('OPT') -os.environ['OPT'] = " ".join(flag for flag in opt.split() - if flag != '-Wstrict-prototypes') -os.environ['CC'] = 'g++' - -LIBS = [ - 'fst', 'kaldi-base', 'kaldi-util', 'kaldi-matrix', 'kaldi-tree', - 'kaldi-hmm', 'kaldi-fstext', 'kaldi-decoder', 'kaldi-lat' -] - -LIB_DIRS = [ - 'tools/openfst/lib', 'src/base', 'src/matrix', 'src/util', 'src/tree', - 'src/hmm', 'src/fstext', 'src/decoder', 'src/lat' -] -LIB_DIRS = [os.path.join(kaldi_root, path) for path in LIB_DIRS] -LIB_DIRS = [os.path.abspath(path) for path in LIB_DIRS] - -ext_modules = [ - Extension( - 'post_latgen_faster_mapped', - ['pybind.cc', 'post_latgen_faster_mapped.cc'], - include_dirs=[ - 'pybind11/include', '.', os.path.join(kaldi_root, 'src'), - os.path.join(kaldi_root, 'tools/openfst/src/include'), 'ThreadPool' - ], - language='c++', - libraries=LIBS, - library_dirs=LIB_DIRS, - runtime_library_dirs=LIB_DIRS, - extra_compile_args=args, ), -] - -setup( - name='post_latgen_faster_mapped', - version='0.1.0', - author='Paddle', - author_email='', - description='Decoder for Deep ASR model', - ext_modules=ext_modules, ) diff --git a/PaddleSpeech/DeepASR/decoder/setup.sh b/PaddleSpeech/DeepASR/decoder/setup.sh deleted file mode 100644 index 238cc64986900bae6fa0bb403d8134981212b8ea..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/decoder/setup.sh +++ /dev/null @@ -1,12 +0,0 @@ -set -e - -if [ ! -d pybind11 ]; then - git clone https://github.com/pybind/pybind11.git -fi - -if [ ! -d ThreadPool ]; then - git clone https://github.com/progschj/ThreadPool.git - echo -e "\n" -fi - -python setup.py build_ext -i diff --git a/PaddleSpeech/DeepASR/examples/aishell/.gitignore b/PaddleSpeech/DeepASR/examples/aishell/.gitignore deleted file mode 100644 index c173dd880ae9e06c16989800e06d4d3d7a1a7d5f..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/.gitignore +++ /dev/null @@ -1,4 +0,0 @@ -aux.tar.gz -aux -data -checkpoints diff --git a/PaddleSpeech/DeepASR/examples/aishell/download_pretrained_model.sh b/PaddleSpeech/DeepASR/examples/aishell/download_pretrained_model.sh deleted file mode 100644 index a8813e241c4f6e40392dff6f173160d2bbd77175..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/download_pretrained_model.sh +++ /dev/null @@ -1,15 +0,0 @@ -url=http://deep-asr-data.gz.bcebos.com/aishell_pretrained_model.tar.gz -md5=7b51bde64e884f43901b7a3461ccbfa3 - -wget -c $url - -echo "Checking md5 sum ..." -md5sum_tmp=`md5sum aishell_pretrained_model.tar.gz | cut -d ' ' -f1` - -if [ $md5sum_tmp != $md5 ]; then - echo "Md5sum check failed, please remove and redownload " - "aishell_pretrained_model.tar.gz." 
- exit 1 -fi - -tar xvf aishell_pretrained_model.tar.gz diff --git a/PaddleSpeech/DeepASR/examples/aishell/infer_by_ckpt.sh b/PaddleSpeech/DeepASR/examples/aishell/infer_by_ckpt.sh deleted file mode 100644 index 2d31757451849afc1412421376484d2ad41962bc..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/infer_by_ckpt.sh +++ /dev/null @@ -1,18 +0,0 @@ -decode_to_path=./decoding_result.txt - -export CUDA_VISIBLE_DEVICES=0,1,2,3 -python -u ../../infer_by_ckpt.py --batch_size 96 \ - --checkpoint checkpoints/deep_asr.latest.checkpoint \ - --infer_feature_lst data/test_feature.lst \ - --mean_var data/global_mean_var \ - --frame_dim 80 \ - --class_num 3040 \ - --num_threads 24 \ - --beam_size 11 \ - --decode_to_path $decode_to_path \ - --trans_model aux/final.mdl \ - --log_prior aux/logprior \ - --vocabulary aux/graph/words.txt \ - --graphs aux/graph/HCLG.fst \ - --acoustic_scale 0.059 \ - --parallel diff --git a/PaddleSpeech/DeepASR/examples/aishell/prepare_data.sh b/PaddleSpeech/DeepASR/examples/aishell/prepare_data.sh deleted file mode 100644 index 8bb7ac5cccb2ba72fd6351fc1e6755f5135740d8..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/prepare_data.sh +++ /dev/null @@ -1,43 +0,0 @@ -data_dir=~/.cache/paddle/dataset/speech/deep_asr_data/aishell -data_url='http://deep-asr-data.gz.bcebos.com/aishell_data.tar.gz' -lst_url='http://deep-asr-data.gz.bcebos.com/aishell_lst.tar.gz' -aux_url='http://deep-asr-data.gz.bcebos.com/aux.tar.gz' -md5=17669b8d63331c9326f4a9393d289bfb -aux_md5=50e3125eba1e3a2768a6f2e499cc1749 - -if [ ! -e $data_dir ]; then - mkdir -p $data_dir -fi - -if [ ! -e $data_dir/aishell_data.tar.gz ]; then - echo "Download $data_dir/aishell_data.tar.gz ..." - wget -c -P $data_dir $data_url -else - echo "Skip downloading for $data_dir/aishell_data.tar.gz has already existed!" -fi - -echo "Checking md5 sum ..." -md5sum_tmp=`md5sum $data_dir/aishell_data.tar.gz | cut -d ' ' -f1` - -if [ $md5sum_tmp != $md5 ]; then - echo "Md5sum check failed, please remove and redownload " - "$data_dir/aishell_data.tar.gz" - exit 1 -fi - -echo "Untar aishell_data.tar.gz ..." -tar xzf $data_dir/aishell_data.tar.gz -C $data_dir - -if [ ! -e data ]; then - mkdir data -fi - -echo "Download and untar lst files ..." -wget -c -P data $lst_url -tar xvf data/aishell_lst.tar.gz -C data - -ln -s $data_dir data/aishell - -echo "Download and untar aux files ..." 
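-# Layout note (added for clarity): once this script finishes, ./data holds
-# the downloaded feature/label lists plus an 'aishell' symlink to the cached
-# dataset directory, and ./aux provides the decoding resources expected by
-# infer_by_ckpt.sh (final.mdl, logprior, graph/words.txt and graph/HCLG.fst).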
-wget -c $aux_url -tar xvf aux.tar.gz diff --git a/PaddleSpeech/DeepASR/examples/aishell/profile.sh b/PaddleSpeech/DeepASR/examples/aishell/profile.sh deleted file mode 100644 index e7df868b9ea26db3d91be0c01d0b7ecb63c374de..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/profile.sh +++ /dev/null @@ -1,7 +0,0 @@ -export CUDA_VISIBLE_DEVICES=0 -python -u ../../tools/profile.py --feature_lst data/train_feature.lst \ - --label_lst data/train_label.lst \ - --mean_var data/global_mean_var \ - --frame_dim 80 \ - --class_num 3040 \ - --batch_size 16 diff --git a/PaddleSpeech/DeepASR/examples/aishell/score_cer.sh b/PaddleSpeech/DeepASR/examples/aishell/score_cer.sh deleted file mode 100644 index 70dfcbad4a8427adcc1149fbab02ec674dacde0c..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/score_cer.sh +++ /dev/null @@ -1,4 +0,0 @@ -ref_txt=aux/test.ref.txt -hyp_txt=decoding_result.txt - -python ../../score_error_rate.py --error_rate_type cer --ref $ref_txt --hyp $hyp_txt diff --git a/PaddleSpeech/DeepASR/examples/aishell/train.sh b/PaddleSpeech/DeepASR/examples/aishell/train.sh deleted file mode 100644 index 168581c0ee579ef62f138bb0d8f5bb8886beb90b..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/examples/aishell/train.sh +++ /dev/null @@ -1,14 +0,0 @@ -export CUDA_VISIBLE_DEVICES=4,5,6,7 -python -u ../../train.py --train_feature_lst data/train_feature.lst \ - --train_label_lst data/train_label.lst \ - --val_feature_lst data/val_feature.lst \ - --val_label_lst data/val_label.lst \ - --mean_var data/global_mean_var \ - --checkpoints checkpoints \ - --frame_dim 80 \ - --class_num 3040 \ - --print_per_batches 100 \ - --infer_models '' \ - --batch_size 16 \ - --learning_rate 6.4e-5 \ - --parallel diff --git a/PaddleSpeech/DeepASR/images/learning_curve.png b/PaddleSpeech/DeepASR/images/learning_curve.png deleted file mode 100644 index f09e8514e16fa09c8c32f3b455a5515f270df27a..0000000000000000000000000000000000000000 Binary files a/PaddleSpeech/DeepASR/images/learning_curve.png and /dev/null differ diff --git a/PaddleSpeech/DeepASR/images/lstmp.png b/PaddleSpeech/DeepASR/images/lstmp.png deleted file mode 100644 index 72c2fc28998b09218f5dfd9d4c4d09a773b4f503..0000000000000000000000000000000000000000 Binary files a/PaddleSpeech/DeepASR/images/lstmp.png and /dev/null differ diff --git a/PaddleSpeech/DeepASR/infer.py b/PaddleSpeech/DeepASR/infer.py deleted file mode 100644 index 84269261a95c381a9be21425abf43b98006f0886..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/infer.py +++ /dev/null @@ -1,108 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import argparse -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.async_data_reader as reader -from data_utils.util import lodtensor_to_ndarray -from data_utils.util import split_infer_result - - -def parse_args(): - parser = argparse.ArgumentParser("Inference for stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. 
(default: %(default)s)') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--infer_feature_lst', - type=str, - default='data/infer_feature.lst', - help='The feature list path for inference. (default: %(default)s)') - parser.add_argument( - '--infer_label_lst', - type=str, - default='data/infer_label.lst', - help='The label list path for inference. (default: %(default)s)') - parser.add_argument( - '--infer_model_path', - type=str, - default='./infer_models/deep_asr.pass_0.infer.model/', - help='The directory for loading inference model. ' - '(default: %(default)s)') - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def infer(args): - """ Gets one batch of feature data and predicts labels for each sample. - """ - - if not os.path.exists(args.infer_model_path): - raise IOError("Invalid inference model path!") - - place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace() - exe = fluid.Executor(place) - - # load model - [infer_program, feed_dict, - fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice() - ] - - infer_data_reader = reader.AsyncDataReader(args.infer_feature_lst, - args.infer_label_lst) - infer_data_reader.set_transformers(ltrans) - - feature_t = fluid.LoDTensor() - one_batch = infer_data_reader.batch_iterator(args.batch_size, 1).next() - - (features, labels, lod) = one_batch - feature_t.set(features, place) - feature_t.set_lod([lod]) - - results = exe.run(infer_program, - feed={feed_dict[0]: feature_t}, - fetch_list=fetch_targets, - return_numpy=False) - - probs, lod = lodtensor_to_ndarray(results[0]) - preds = probs.argmax(axis=1) - infer_batch = split_infer_result(preds, lod) - for index, sample in enumerate(infer_batch): - print("result %d: " % index, sample, '\n') - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - infer(args) diff --git a/PaddleSpeech/DeepASR/infer_by_ckpt.py b/PaddleSpeech/DeepASR/infer_by_ckpt.py deleted file mode 100644 index 1e0fb15c6d6f05aa1e054b37333b0fa0cb5cd8d9..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/infer_by_ckpt.py +++ /dev/null @@ -1,273 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import os -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from data_utils.util import lodtensor_to_ndarray, split_infer_result -from model_utils.model import stacked_lstmp_model -from decoder.post_latgen_faster_mapped import Decoder -from tools.error_rate import char_errors - - -def parse_args(): - parser = argparse.ArgumentParser("Run inference by using checkpoint.") - parser.add_argument( - '--batch_size', - type=int, - 
default=32, - help='The sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--beam_size', - type=int, - default=11, - help='The beam size for decoding. (default: %(default)d)') - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=80, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=1749, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--num_threads', - type=int, - default=10, - help='The number of threads for decoding. (default: %(default)d)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--infer_feature_lst', - type=str, - default='data/infer_feature.lst', - help='The feature list path for inference. (default: %(default)s)') - parser.add_argument( - '--checkpoint', - type=str, - default='./checkpoint', - help="The checkpoint path to init model. (default: %(default)s)") - parser.add_argument( - '--trans_model', - type=str, - default='./graph/trans_model', - help="The path to vocabulary. (default: %(default)s)") - parser.add_argument( - '--vocabulary', - type=str, - default='./graph/words.txt', - help="The path to vocabulary. (default: %(default)s)") - parser.add_argument( - '--graphs', - type=str, - default='./graph/TLG.fst', - help="The path to TLG graphs for decoding. (default: %(default)s)") - parser.add_argument( - '--log_prior', - type=str, - default="./logprior", - help="The log prior probs for training data. (default: %(default)s)") - parser.add_argument( - '--acoustic_scale', - type=float, - default=0.2, - help="Scaling factor for acoustic likelihoods. (default: %(default)f)") - parser.add_argument( - '--post_matrix_path', - type=str, - default=None, - help="The path to output post prob matrix. (default: %(default)s)") - parser.add_argument( - '--decode_to_path', - type=str, - default='./decoding_result.txt', - required=True, - help="The path to output the decoding result. 
(default: %(default)s)") - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -class PostMatrixWriter: - """ The writer for outputing the post probability matrix - """ - - def __init__(self, to_path): - self._to_path = to_path - with open(self._to_path, "w") as post_matrix: - post_matrix.seek(0) - post_matrix.truncate() - - def write(self, keys, probs): - with open(self._to_path, "a") as post_matrix: - if isinstance(keys, str): - keys, probs = [keys], [probs] - - for key, prob in zip(keys, probs): - post_matrix.write(key + " [\n") - for i in range(prob.shape[0]): - for j in range(prob.shape[1]): - post_matrix.write(str(prob[i][j]) + " ") - post_matrix.write("\n") - post_matrix.write("]\n") - - -class DecodingResultWriter: - """ The writer for writing out decoding results - """ - - def __init__(self, to_path): - self._to_path = to_path - with open(self._to_path, "w") as decoding_result: - decoding_result.seek(0) - decoding_result.truncate() - - def write(self, results): - with open(self._to_path, "a") as decoding_result: - if isinstance(results, str): - decoding_result.write(results.encode("utf8") + "\n") - else: - for result in results: - decoding_result.write(result.encode("utf8") + "\n") - - -def infer_from_ckpt(args): - """Inference by using checkpoint.""" - - if not os.path.exists(args.checkpoint): - raise IOError("Invalid checkpoint!") - - prediction, avg_cost, accuracy = stacked_lstmp_model( - frame_dim=args.frame_dim, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num, - parallel=args.parallel) - - infer_program = fluid.default_main_program().clone() - - # optimizer, placeholder - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=0.0001, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - # load checkpoint. 
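-    # Note (added; speculative): although infer_from_ckpt() never runs a
-    # training step, the Adam optimizer above is still constructed. A
-    # plausible reason is that fluid.io.load_persistables() below restores
-    # every persistable variable in the program from the checkpoint,
-    # including optimizer state, so those variables must already exist.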
- fluid.io.load_persistables(exe, args.checkpoint) - - # init decoder - decoder = Decoder(args.trans_model, args.vocabulary, args.graphs, - args.log_prior, args.beam_size, args.acoustic_scale) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - - # infer data reader - infer_data_reader = reader.AsyncDataReader( - args.infer_feature_lst, drop_frame_len=-1, split_sentence_threshold=-1) - infer_data_reader.set_transformers(ltrans) - - decoding_result_writer = DecodingResultWriter(args.decode_to_path) - post_matrix_writer = None if args.post_matrix_path is None \ - else PostMatrixWriter(args.post_matrix_path) - - for batch_id, batch_data in enumerate( - infer_data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size)): - # load_data - (features, labels, lod, name_lst) = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - - results = exe.run(infer_program, - feed={"feature": feature_t, - "label": label_t}, - fetch_list=[prediction, avg_cost, accuracy], - return_numpy=False) - - probs, lod = lodtensor_to_ndarray(results[0]) - infer_batch = split_infer_result(probs, lod) - - print("Decoding batch %d ..." % batch_id) - decoded = decoder.decode_batch(name_lst, infer_batch, args.num_threads) - - decoding_result_writer.write(decoded) - - if args.post_matrix_path is not None: - post_matrix_writer.write(name_lst, infer_batch) - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - - infer_from_ckpt(args) diff --git a/PaddleSpeech/DeepASR/model_utils/__init__.py b/PaddleSpeech/DeepASR/model_utils/__init__.py deleted file mode 100644 index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..0000000000000000000000000000000000000000 diff --git a/PaddleSpeech/DeepASR/model_utils/model.py b/PaddleSpeech/DeepASR/model_utils/model.py deleted file mode 100644 index 0b086b55a898a0a29f57132b438684a655e30caf..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/model_utils/model.py +++ /dev/null @@ -1,74 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import paddle.fluid as fluid - - -def stacked_lstmp_model(feature, - label, - hidden_dim, - proj_dim, - stacked_num, - class_num, - parallel=False, - is_train=True): - """ - The model for DeepASR. The main structure is composed of stacked - identical LSTMP (LSTM with recurrent projection) layers. - - When running in training and validation phase, the feeding dictionary - is {'feature', 'label'}, fed by the LodTensor for feature data and - label data respectively. And in inference, only `feature` is needed. - - Args: - frame_dim(int): The frame dimension of feature data. - hidden_dim(int): The hidden state's dimension of the LSTMP layer. - proj_dim(int): The projection size of the LSTMP layer. - stacked_num(int): The number of stacked LSTMP layers. - parallel(bool): Run in parallel or not, default `False`. - is_train(bool): Run in training phase or not, default `True`. - class_dim(int): The number of output classes. 
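-
-    Returns:
-        (prediction, avg_cost, acc): the softmax output of the network, the
-            mean cross-entropy cost and the batch accuracy, as returned at
-            the end of this function.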
- """ - conv1 = fluid.layers.conv2d( - input=feature, - num_filters=32, - filter_size=3, - stride=1, - padding=1, - bias_attr=True, - act="relu") - - pool1 = fluid.layers.pool2d( - conv1, pool_size=3, pool_type="max", pool_stride=2, pool_padding=0) - - stack_input = pool1 - for i in range(stacked_num): - fc = fluid.layers.fc(input=stack_input, - size=hidden_dim * 4, - bias_attr=None) - proj, cell = fluid.layers.dynamic_lstmp( - input=fc, - size=hidden_dim * 4, - proj_size=proj_dim, - bias_attr=True, - use_peepholes=True, - is_reverse=False, - cell_activation="tanh", - proj_activation="tanh") - bn = fluid.layers.batch_norm( - input=proj, - is_test=not is_train, - momentum=0.9, - epsilon=1e-05, - data_layout='NCHW') - stack_input = bn - - prediction = fluid.layers.fc(input=stack_input, - size=class_num, - act='softmax') - - cost = fluid.layers.cross_entropy(input=prediction, label=label) - avg_cost = fluid.layers.mean(x=cost) - acc = fluid.layers.accuracy(input=prediction, label=label) - return prediction, avg_cost, acc diff --git a/PaddleSpeech/DeepASR/score_error_rate.py b/PaddleSpeech/DeepASR/score_error_rate.py deleted file mode 100644 index dde5a2448afffcae61c4d033159a5b081e6c79e8..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/score_error_rate.py +++ /dev/null @@ -1,80 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import argparse -from tools.error_rate import char_errors, word_errors - - -def parse_args(): - parser = argparse.ArgumentParser( - "Score word/character error rate (WER/CER) " - "for decoding result.") - parser.add_argument( - '--error_rate_type', - type=str, - default='cer', - choices=['cer', 'wer'], - help="Error rate type. (default: %(default)s)") - parser.add_argument( - '--special_tokens', - type=str, - default='', - help="Special tokens in scoring CER, seperated by space. " - "They shouldn't be splitted and should be treated as one special " - "character. Example: ' ' " - "(default: %(default)s)") - parser.add_argument( - '--ref', type=str, required=True, help="The ground truth text.") - parser.add_argument( - '--hyp', type=str, required=True, help="The decoding result text.") - args = parser.parse_args() - return args - - -if __name__ == '__main__': - - args = parse_args() - ref_dict = {} - sum_errors, sum_ref_len = 0.0, 0 - sent_cnt, not_in_ref_cnt = 0, 0 - - special_tokens = args.special_tokens.split(" ") - - with open(args.ref, "r") as ref_txt: - line = ref_txt.readline() - while line: - del_pos = line.find(" ") - key, sent = line[0:del_pos], line[del_pos + 1:-1].strip() - ref_dict[key] = sent - line = ref_txt.readline() - - with open(args.hyp, "r") as hyp_txt: - line = hyp_txt.readline() - while line: - del_pos = line.find(" ") - key, sent = line[0:del_pos], line[del_pos + 1:-1].strip() - sent_cnt += 1 - line = hyp_txt.readline() - if key not in ref_dict: - not_in_ref_cnt += 1 - continue - - if args.error_rate_type == 'cer': - for sp_tok in special_tokens: - sent = sent.replace(sp_tok, '\0') - errors, ref_len = char_errors( - ref_dict[key].decode("utf8"), - sent.decode("utf8"), - remove_space=True) - else: - errors, ref_len = word_errors(ref_dict[key].decode("utf8"), - sent.decode("utf8")) - sum_errors += errors - sum_ref_len += ref_len - - print("Error rate[%s] = %f (%d/%d)," % - (args.error_rate_type, sum_errors / sum_ref_len, int(sum_errors), - sum_ref_len)) - print("total %d sentences in hyp, %d not presented in ref." 
% - (sent_cnt, not_in_ref_cnt)) diff --git a/PaddleSpeech/DeepASR/tools/_init_paths.py b/PaddleSpeech/DeepASR/tools/_init_paths.py deleted file mode 100644 index 228dbae6bf95231030c1858c4d30b49f162f46e2..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/tools/_init_paths.py +++ /dev/null @@ -1,19 +0,0 @@ -"""Add the parent directory to $PYTHONPATH""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os.path -import sys - - -def add_path(path): - if path not in sys.path: - sys.path.insert(0, path) - - -this_dir = os.path.dirname(__file__) - -# Add project path to PYTHONPATH -proj_path = os.path.join(this_dir, '..') -add_path(proj_path) diff --git a/PaddleSpeech/DeepASR/tools/error_rate.py b/PaddleSpeech/DeepASR/tools/error_rate.py deleted file mode 100644 index 215ad39d24a551879d0fd8d4c8892161a0708370..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/tools/error_rate.py +++ /dev/null @@ -1,182 +0,0 @@ -# -*- coding: utf-8 -*- -"""This module provides functions to calculate error rate in different level. -e.g. wer for word-level, cer for char-level. -""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np - - -def _levenshtein_distance(ref, hyp): - """Levenshtein distance is a string metric for measuring the difference - between two sequences. Informally, the levenshtein disctance is defined as - the minimum number of single-character edits (substitutions, insertions or - deletions) required to change one word into the other. We can naturally - extend the edits to word level when calculate levenshtein disctance for - two sentences. - """ - m = len(ref) - n = len(hyp) - - # special case - if ref == hyp: - return 0 - if m == 0: - return n - if n == 0: - return m - - if m < n: - ref, hyp = hyp, ref - m, n = n, m - - # use O(min(m, n)) space - distance = np.zeros((2, n + 1), dtype=np.int32) - - # initialize distance matrix - for j in xrange(n + 1): - distance[0][j] = j - - # calculate levenshtein distance - for i in xrange(1, m + 1): - prev_row_idx = (i - 1) % 2 - cur_row_idx = i % 2 - distance[cur_row_idx][0] = i - for j in xrange(1, n + 1): - if ref[i - 1] == hyp[j - 1]: - distance[cur_row_idx][j] = distance[prev_row_idx][j - 1] - else: - s_num = distance[prev_row_idx][j - 1] + 1 - i_num = distance[cur_row_idx][j - 1] + 1 - d_num = distance[prev_row_idx][j] + 1 - distance[cur_row_idx][j] = min(s_num, i_num, d_num) - - return distance[m % 2][n] - - -def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '): - """Compute the levenshtein distance between reference sequence and - hypothesis sequence in word-level. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param delimiter: Delimiter of input sentences. - :type delimiter: char - :return: Levenshtein distance and word number of reference sentence. 
- :rtype: list - """ - if ignore_case == True: - reference = reference.lower() - hypothesis = hypothesis.lower() - - ref_words = filter(None, reference.split(delimiter)) - hyp_words = filter(None, hypothesis.split(delimiter)) - - edit_distance = _levenshtein_distance(ref_words, hyp_words) - return float(edit_distance), len(ref_words) - - -def char_errors(reference, hypothesis, ignore_case=False, remove_space=False): - """Compute the levenshtein distance between reference sequence and - hypothesis sequence in char-level. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param remove_space: Whether remove internal space characters - :type remove_space: bool - :return: Levenshtein distance and length of reference sentence. - :rtype: list - """ - if ignore_case == True: - reference = reference.lower() - hypothesis = hypothesis.lower() - - join_char = ' ' - if remove_space == True: - join_char = '' - - reference = join_char.join(filter(None, reference.split(' '))) - hypothesis = join_char.join(filter(None, hypothesis.split(' '))) - - edit_distance = _levenshtein_distance(reference, hypothesis) - return float(edit_distance), len(reference) - - -def wer(reference, hypothesis, ignore_case=False, delimiter=' '): - """Calculate word error rate (WER). WER compares reference text and - hypothesis text in word-level. WER is defined as: - .. math:: - WER = (Sw + Dw + Iw) / Nw - where - .. code-block:: text - Sw is the number of words subsituted, - Dw is the number of words deleted, - Iw is the number of words inserted, - Nw is the number of words in the reference - We can use levenshtein distance to calculate WER. Please draw an attention - that empty items will be removed when splitting sentences by delimiter. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. - :type ignore_case: bool - :param delimiter: Delimiter of input sentences. - :type delimiter: char - :return: Word error rate. - :rtype: float - :raises ValueError: If word number of reference is zero. - """ - edit_distance, ref_len = word_errors(reference, hypothesis, ignore_case, - delimiter) - - if ref_len == 0: - raise ValueError("Reference's word number should be greater than 0.") - - wer = float(edit_distance) / ref_len - return wer - - -def cer(reference, hypothesis, ignore_case=False, remove_space=False): - """Calculate charactor error rate (CER). CER compares reference text and - hypothesis text in char-level. CER is defined as: - .. math:: - CER = (Sc + Dc + Ic) / Nc - where - .. code-block:: text - Sc is the number of characters substituted, - Dc is the number of characters deleted, - Ic is the number of characters inserted - Nc is the number of characters in the reference - We can use levenshtein distance to calculate CER. Chinese input should be - encoded to unicode. Please draw an attention that the leading and tailing - space characters will be truncated and multiple consecutive space - characters in a sentence will be replaced by one space character. - :param reference: The reference sentence. - :type reference: basestring - :param hypothesis: The hypothesis sentence. - :type hypothesis: basestring - :param ignore_case: Whether case-sensitive or not. 
- :type ignore_case: bool - :param remove_space: Whether remove internal space characters - :type remove_space: bool - :return: Character error rate. - :rtype: float - :raises ValueError: If the reference length is zero. - """ - edit_distance, ref_len = char_errors(reference, hypothesis, ignore_case, - remove_space) - - if ref_len == 0: - raise ValueError("Length of reference should be greater than 0.") - - cer = float(edit_distance) / ref_len - return cer diff --git a/PaddleSpeech/DeepASR/tools/profile.py b/PaddleSpeech/DeepASR/tools/profile.py deleted file mode 100644 index d25e18f7db0111acf76e66478f8230aab1d5f760..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/tools/profile.py +++ /dev/null @@ -1,210 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import paddle.fluid.profiler as profiler -import _init_paths -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from model_utils.model import stacked_lstmp_model -from data_utils.util import lodtensor_to_ndarray - - -def parse_args(): - parser = argparse.ArgumentParser("Profiling for the stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. (default: %(default)d)') - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=120 * 11, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=1749, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--learning_rate', - type=float, - default=0.00016, - help='Learning rate used to train. (default: %(default)f)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help='mean var path') - parser.add_argument( - '--feature_lst', - type=str, - default='data/feature.lst', - help='feature list path.') - parser.add_argument( - '--label_lst', - type=str, - default='data/label.lst', - help='label list path.') - parser.add_argument( - '--max_batch_num', - type=int, - default=11, - help='Maximum number of batches for profiling. (default: %(default)d)') - parser.add_argument( - '--first_batches_to_skip', - type=int, - default=1, - help='Number of first batches to skip for profiling. 
' - '(default: %(default)d)') - parser.add_argument( - '--print_train_acc', - action='store_true', - help='If set, output training accuray.') - parser.add_argument( - '--sorted_key', - type=str, - default='total', - choices=['None', 'total', 'calls', 'min', 'max', 'ave'], - help='Different types of time to sort the profiling report. ' - '(default: %(default)s)') - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def profile(args): - """profile the training process. - """ - - if not args.first_batches_to_skip < args.max_batch_num: - raise ValueError("arg 'first_batches_to_skip' must be smaller than " - "'max_batch_num'.") - if not args.first_batches_to_skip >= 0: - raise ValueError( - "arg 'first_batches_to_skip' must not be smaller than 0.") - - _, avg_cost, accuracy = stacked_lstmp_model( - frame_dim=args.frame_dim, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num, - parallel=args.parallel) - - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=args.learning_rate, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(fluid.default_startup_program()) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - data_reader = reader.AsyncDataReader( - args.feature_lst, args.label_lst, -1, split_sentence_threshold=1024) - data_reader.set_transformers(ltrans) - - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - - sorted_key = None if args.sorted_key is 'None' else args.sorted_key - with profiler.profiler(args.device, sorted_key) as prof: - frames_seen, start_time = 0, 0.0 - for batch_id, batch_data in enumerate( - data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size)): - if batch_id >= args.max_batch_num: - break - if args.first_batches_to_skip == batch_id: - profiler.reset_profiler() - start_time = time.time() - frames_seen = 0 - # load_data - (features, labels, lod, _) = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - - frames_seen += lod[-1] - - outs = exe.run(fluid.default_main_program(), - feed={"feature": feature_t, - "label": label_t}, - fetch_list=[avg_cost, accuracy] - if args.print_train_acc else [], - return_numpy=False) - - if args.print_train_acc: - print("Batch %d acc: %f" % - (batch_id, lodtensor_to_ndarray(outs[1])[0])) - else: - sys.stdout.write('.') - sys.stdout.flush() - time_consumed = time.time() - start_time - frames_per_sec = frames_seen / time_consumed - print("\nTime consumed: %f s, performance: %f frames/s." 
% - (time_consumed, frames_per_sec)) - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - profile(args) diff --git a/PaddleSpeech/DeepASR/train.py b/PaddleSpeech/DeepASR/train.py deleted file mode 100644 index 1a1dd6cf9ea33bb546cc3bdf65c36be0441832cb..0000000000000000000000000000000000000000 --- a/PaddleSpeech/DeepASR/train.py +++ /dev/null @@ -1,372 +0,0 @@ -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import sys -import os -import numpy as np -import argparse -import time - -import paddle.fluid as fluid -import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm -import data_utils.augmentor.trans_add_delta as trans_add_delta -import data_utils.augmentor.trans_splice as trans_splice -import data_utils.augmentor.trans_delay as trans_delay -import data_utils.async_data_reader as reader -from model_utils.model import stacked_lstmp_model - - -def parse_args(): - parser = argparse.ArgumentParser("Training for stacked LSTMP model.") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help='The sequence number of a batch data. Batch size per GPU. (default: %(default)d)' - ) - parser.add_argument( - '--minimum_batch_size', - type=int, - default=1, - help='The minimum sequence number of a batch data. ' - '(default: %(default)d)') - parser.add_argument( - '--frame_dim', - type=int, - default=80, - help='Frame dimension of feature data. (default: %(default)d)') - parser.add_argument( - '--stacked_num', - type=int, - default=5, - help='Number of lstmp layers to stack. (default: %(default)d)') - parser.add_argument( - '--proj_dim', - type=int, - default=512, - help='Project size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--hidden_dim', - type=int, - default=1024, - help='Hidden size of lstmp unit. (default: %(default)d)') - parser.add_argument( - '--class_num', - type=int, - default=3040, - help='Number of classes in label. (default: %(default)d)') - parser.add_argument( - '--pass_num', - type=int, - default=100, - help='Epoch number to train. (default: %(default)d)') - parser.add_argument( - '--print_per_batches', - type=int, - default=100, - help='Interval to print training accuracy. (default: %(default)d)') - parser.add_argument( - '--learning_rate', - type=float, - default=0.00016, - help='Learning rate used to train. (default: %(default)f)') - parser.add_argument( - '--device', - type=str, - default='GPU', - choices=['CPU', 'GPU'], - help='The device type. (default: %(default)s)') - parser.add_argument( - '--parallel', action='store_true', help='If set, run in parallel.') - parser.add_argument( - '--mean_var', - type=str, - default='data/global_mean_var_search26kHr', - help="The path for feature's global mean and variance. " - "(default: %(default)s)") - parser.add_argument( - '--train_feature_lst', - type=str, - default='data/feature.lst', - help='The feature list path for training. (default: %(default)s)') - parser.add_argument( - '--train_label_lst', - type=str, - default='data/label.lst', - help='The label list path for training. (default: %(default)s)') - parser.add_argument( - '--val_feature_lst', - type=str, - default='data/val_feature.lst', - help='The feature list path for validation. (default: %(default)s)') - parser.add_argument( - '--val_label_lst', - type=str, - default='data/val_label.lst', - help='The label list path for validation. 
(default: %(default)s)') - parser.add_argument( - '--init_model_path', - type=str, - default=None, - help="The model (checkpoint) path which the training resumes from. " - "If None, train the model from scratch. (default: %(default)s)") - parser.add_argument( - '--checkpoints', - type=str, - default='./checkpoints', - help="The directory for saving checkpoints. Do not save checkpoints " - "if set to ''. (default: %(default)s)") - parser.add_argument( - '--infer_models', - type=str, - default='./infer_models', - help="The directory for saving inference models. Do not save inference " - "models if set to ''. (default: %(default)s)") - args = parser.parse_args() - return args - - -def print_arguments(args): - print('----------- Configuration Arguments -----------') - for arg, value in sorted(vars(args).iteritems()): - print('%s: %s' % (arg, value)) - print('------------------------------------------------') - - -def train(args): - """train in loop. - """ - - # paths check - if args.init_model_path is not None and \ - not os.path.exists(args.init_model_path): - raise IOError("Invalid initial model path!") - if args.checkpoints != '' and not os.path.exists(args.checkpoints): - os.mkdir(args.checkpoints) - if args.infer_models != '' and not os.path.exists(args.infer_models): - os.mkdir(args.infer_models) - - train_program = fluid.Program() - train_startup = fluid.Program() - - with fluid.program_guard(train_program, train_startup): - with fluid.unique_name.guard(): - py_train_reader = fluid.layers.py_reader( - capacity=10, - shapes=([-1, 3, 11, args.frame_dim], [-1, 1]), - dtypes=['float32', 'int64'], - lod_levels=[1, 1], - name='train_reader') - feature, label = fluid.layers.read_file(py_train_reader) - prediction, avg_cost, accuracy = stacked_lstmp_model( - feature=feature, - label=label, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num) - # optimizer = fluid.optimizer.Momentum(learning_rate=args.learning_rate, momentum=0.9) - optimizer = fluid.optimizer.Adam( - learning_rate=fluid.layers.exponential_decay( - learning_rate=args.learning_rate, - decay_steps=1879, - decay_rate=1 / 1.2, - staircase=True)) - optimizer.minimize(avg_cost) - fluid.memory_optimize(train_program) - - test_program = fluid.Program() - test_startup = fluid.Program() - with fluid.program_guard(test_program, test_startup): - with fluid.unique_name.guard(): - py_test_reader = fluid.layers.py_reader( - capacity=10, - shapes=([-1, 3, 11, args.frame_dim], [-1, 1]), - dtypes=['float32', 'int64'], - lod_levels=[1, 1], - name='test_reader') - feature, label = fluid.layers.read_file(py_test_reader) - prediction, avg_cost, accuracy = stacked_lstmp_model( - feature=feature, - label=label, - hidden_dim=args.hidden_dim, - proj_dim=args.proj_dim, - stacked_num=args.stacked_num, - class_num=args.class_num) - test_program = test_program.clone(for_test=True) - place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) - exe = fluid.Executor(place) - exe.run(train_startup) - exe.run(test_startup) - - if args.parallel: - exec_strategy = fluid.ExecutionStrategy() - exec_strategy.num_iteration_per_drop_scope = 10 - train_exe = fluid.ParallelExecutor( - use_cuda=(args.device == 'GPU'), - loss_name=avg_cost.name, - exec_strategy=exec_strategy, - main_program=train_program) - test_exe = fluid.ParallelExecutor( - use_cuda=(args.device == 'GPU'), - main_program=test_program, - exec_strategy=exec_strategy, - share_vars_from=train_exe) - - # resume training if initial 
model provided. - if args.init_model_path is not None: - fluid.io.load_persistables(exe, args.init_model_path) - - ltrans = [ - trans_add_delta.TransAddDelta(2, 2), - trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), - trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5) - ] - - # bind train_reader - train_data_reader = reader.AsyncDataReader( - args.train_feature_lst, - args.train_label_lst, - -1, - split_sentence_threshold=1024) - - train_data_reader.set_transformers(ltrans) - - def train_data_provider(): - for data in train_data_reader.batch_iterator(args.batch_size, - args.minimum_batch_size): - yield batch_data_to_lod_tensors(args, data, fluid.CPUPlace()) - - py_train_reader.decorate_tensor_provider(train_data_provider) - - if (os.path.exists(args.val_feature_lst) and - os.path.exists(args.val_label_lst)): - # test data reader - test_data_reader = reader.AsyncDataReader( - args.val_feature_lst, - args.val_label_lst, - -1, - split_sentence_threshold=1024) - test_data_reader.set_transformers(ltrans) - - def test_data_provider(): - for data in test_data_reader.batch_iterator( - args.batch_size, args.minimum_batch_size): - yield batch_data_to_lod_tensors(args, data, fluid.CPUPlace()) - - py_test_reader.decorate_tensor_provider(test_data_provider) - - # validation - def test(exe): - # If test data not found, return invalid cost and accuracy - if not (os.path.exists(args.val_feature_lst) and - os.path.exists(args.val_label_lst)): - return -1.0, -1.0 - batch_id = 0 - test_costs = [] - test_accs = [] - while True: - if batch_id == 0: - py_test_reader.start() - try: - if args.parallel: - cost, acc = exe.run( - fetch_list=[avg_cost.name, accuracy.name], - return_numpy=False) - else: - cost, acc = exe.run(program=test_program, - fetch_list=[avg_cost, accuracy], - return_numpy=False) - sys.stdout.write('.') - sys.stdout.flush() - test_costs.append(np.array(cost)[0]) - test_accs.append(np.array(acc)[0]) - batch_id += 1 - except fluid.core.EOFException: - py_test_reader.reset() - break - return np.mean(test_costs), np.mean(test_accs) - - # train - for pass_id in xrange(args.pass_num): - pass_start_time = time.time() - batch_id = 0 - while True: - if batch_id == 0: - py_train_reader.start() - to_print = batch_id > 0 and (batch_id % args.print_per_batches == 0) - try: - if args.parallel: - outs = train_exe.run( - fetch_list=[avg_cost.name, accuracy.name] - if to_print else [], - return_numpy=False) - else: - outs = exe.run(program=train_program, - fetch_list=[avg_cost, accuracy] - if to_print else [], - return_numpy=False) - except fluid.core.EOFException: - py_train_reader.reset() - break - - if to_print: - if args.parallel: - print("\nBatch %d, train cost: %f, train acc: %f" % - (batch_id, np.mean(outs[0]), np.mean(outs[1]))) - else: - print("\nBatch %d, train cost: %f, train acc: %f" % ( - batch_id, np.array(outs[0])[0], np.array(outs[1])[0])) - # save the latest checkpoint - if args.checkpoints != '': - model_path = os.path.join(args.checkpoints, - "deep_asr.latest.checkpoint") - fluid.io.save_persistables(exe, model_path, train_program) - else: - sys.stdout.write('.') - sys.stdout.flush() - - batch_id += 1 - # run test - val_cost, val_acc = test(test_exe if args.parallel else exe) - - # save checkpoint per pass - if args.checkpoints != '': - model_path = os.path.join( - args.checkpoints, - "deep_asr.pass_" + str(pass_id) + ".checkpoint") - fluid.io.save_persistables(exe, model_path, train_program) - # save inference model - if args.infer_models != '': - model_path = 
os.path.join( - args.infer_models, - "deep_asr.pass_" + str(pass_id) + ".infer.model") - fluid.io.save_inference_model(model_path, ["feature"], - [prediction], exe, train_program) - # cal pass time - pass_end_time = time.time() - time_consumed = pass_end_time - pass_start_time - # print info at pass end - print("\nPass %d, time consumed: %f s, val cost: %f, val acc: %f\n" % - (pass_id, time_consumed, val_cost, val_acc)) - - -def batch_data_to_lod_tensors(args, batch_data, place): - features, labels, lod, name_lst = batch_data - features = np.reshape(features, (-1, 11, 3, args.frame_dim)) - features = np.transpose(features, (0, 2, 1, 3)) - feature_t = fluid.LoDTensor() - label_t = fluid.LoDTensor() - feature_t.set(features, place) - feature_t.set_lod([lod]) - label_t.set(labels, place) - label_t.set_lod([lod]) - return feature_t, label_t - - -if __name__ == '__main__': - args = parse_args() - print_arguments(args) - - train(args) diff --git a/PaddleSpeech/README.md b/PaddleSpeech/README.md deleted file mode 100644 index 39f91c26bd90fdd0e8fa81a395d14c2d3826f7cd..0000000000000000000000000000000000000000 --- a/PaddleSpeech/README.md +++ /dev/null @@ -1,12 +0,0 @@ -Fluid 模型库 -============ - -语音识别 --------- - -自动语音识别(Automatic Speech Recognition, ASR)是将人类声音中的词汇内容转录成计算机可输入的文字的技术。语音识别的相关研究经历了漫长的探索过程,在HMM/GMM模型之后其发展一直较为缓慢,随着深度学习的兴起,其迎来了春天。在多种语言识别任务中,将深度神经网络(DNN)作为声学模型,取得了比GMM更好的性能,使得 ASR 成为深度学习应用最为成功的领域之一。而由于识别准确率的不断提高,有越来越多的语言技术产品得以落地,例如语言输入法、以智能音箱为代表的智能家居设备等 — 基于语言的交互方式正在深刻的改变人类的生活。 - -与 [DeepSpeech](https://github.com/PaddlePaddle/DeepSpeech) 中深度学习模型端到端直接预测字词的分布不同,本实例更接近传统的语言识别流程,以音素为建模单元,关注语言识别中声学模型的训练,利用[kaldi](http://www.kaldi-asr.org) 进行音频数据的特征提取和标签对齐,并集成 kaldi 的解码器完成解码。 - -- [DeepASR](https://github.com/PaddlePaddle/models/blob/develop/PaddleSpeech/DeepASR/README_cn.md) - diff --git a/README.md b/README.md index d1b7db180ddacbe497c0495d97ce6d70393758ab..1ecf77cdf7a5ff8218546b3a65798dd1c1d7bb2e 100644 --- a/README.md +++ b/README.md @@ -32,16 +32,16 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | **模型名称** | **模型简介** | **数据集** | **评估指标** **top-1/top-5 accuracy(CV2)** | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------- | ------------------------------------------------ | -| [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速 | ImageNet-2012验证集 | 56.72%/79.17% | -| [VGG](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力 | ImageNet-2012验证集 | 72.56%/90.93% | -| [GoogleNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越 | ImageNet-2012验证集 | 70.70%/89.66% | -| [ResNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | Residual Network,引入了新的残差结构,解决了随着网络加深,准确率下降的问题 | ImageNet-2012验证集 | 80.93%/95.33% | -| [ResNet-D](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 融合最新多种对ResNet改进策略,ResNet50_vd的top1准确率达到79.84% | ImageNet-2012验证集 | 79.84%/94.93% | -| [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 将Inception模块与Residual Connection进行结合,通过ResNet的结构极大地加速训练并获得性能的提升 | ImageNet-2012验证集 | 80.77%/95.26% | -| [MobileNet v1](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 将传统的卷积结构改造成两层卷积结构的网络,在基本不影响准确率的前提下大大减少计算时间,更适合移动端和嵌入式视觉应用 | 
ImageNet-2012验证集 | 70.99%/89.68% | -| [MobileNet v2](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | MobileNet结构的微调,直接在thinner的bottleneck层上进行skip learning连接以及对bottleneck layer不进行ReLu非线性处理可取得更好的结果 | ImageNet-2012验证集 | 72.15%/90.65% | -| [SE_ResNeXt](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | 在ResNeXt 基础、上加入了SE(Sequeeze-and-Excitation) 模块,提高了识别准确率,在ILSVRC 2017 的分类项目中取得了第一名 | ImageNet-2012验证集 | 81.40%/95.48% | -| [ShuffleNet v2](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification) | ECCV2018,轻量级CNN网络,在速度和准确度之间做了很好地平衡。在同等复杂度下,比ShuffleNet和MobileNetv2更准确,更适合移动端以及无人车领域 | ImageNet-2012验证集 | 70.03%/89.17% | +| [AlexNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 首次在CNN中成功的应用了ReLU、Dropout和LRN,并使用GPU进行运算加速 | ImageNet-2012验证集 | 56.72%/79.17% | +| [VGG](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 在AlexNet的基础上使用3*3小卷积核,增加网络深度,具有很好的泛化能力 | ImageNet-2012验证集 | 72.56%/90.93% | +| [GoogleNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 在不增加计算负载的前提下增加了网络的深度和宽度,性能更加优越 | ImageNet-2012验证集 | 70.70%/89.66% | +| [ResNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | Residual Network,引入了新的残差结构,解决了随着网络加深,准确率下降的问题 | ImageNet-2012验证集 | 80.93%/95.33% | +| [ResNet-D](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 融合最新多种对ResNet改进策略,ResNet50_vd的top1准确率达到79.84% | ImageNet-2012验证集 | 79.84%/94.93% | +| [Inception-v4](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 将Inception模块与Residual Connection进行结合,通过ResNet的结构极大地加速训练并获得性能的提升 | ImageNet-2012验证集 | 80.77%/95.26% | +| [MobileNet v1](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 将传统的卷积结构改造成两层卷积结构的网络,在基本不影响准确率的前提下大大减少计算时间,更适合移动端和嵌入式视觉应用 | ImageNet-2012验证集 | 70.99%/89.68% | +| [MobileNet v2](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | MobileNet结构的微调,直接在thinner的bottleneck层上进行skip learning连接以及对bottleneck layer不进行ReLu非线性处理可取得更好的结果 | ImageNet-2012验证集 | 72.15%/90.65% | +| [SE_ResNeXt](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | 在ResNeXt 基础、上加入了SE(Sequeeze-and-Excitation) 模块,提高了识别准确率,在ILSVRC 2017 的分类项目中取得了第一名 | ImageNet-2012验证集 | 81.40%/95.48% | +| [ShuffleNet v2](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/image_classification) | ECCV2018,轻量级CNN网络,在速度和准确度之间做了很好地平衡。在同等复杂度下,比ShuffleNet和MobileNetv2更准确,更适合移动端以及无人车领域 | ImageNet-2012验证集 | 70.03%/89.17% | ### 目标检测 @@ -49,12 +49,12 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 mAP | | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------- | ------------------------------------------------------- | -| [SSD](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleDetection) | 很好的继承了MobileNet预测速度快,易于部署的特点,能够很好的在多种设备上完成图像目标检测任务 | VOC07 test | mAP = 73.32% | -| [Faster-RCNN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleDetection) | 创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高 | MS-COCO | 基于ResNet 50 mAP(0.50:0.95) = 36.7% | -| [Mask-RCNN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleDetection) | 经典的两阶段框架,在Faster R-CNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕,可得到像素级别的检测结果。 | MS-COCO | 
基于ResNet 50 Mask mAP(0.50:0.95) = 31.4% | -| [RetinaNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleDetection) | 经典的一阶段框架,由ResNet主干网络、FPN结构、和两个分别用于回归物体位置和预测物体类别的子网络组成。在训练过程中使用Focal Loss,解决了传统一阶段检测器存在前景背景类别不平衡的问题,进一步提高了一阶段检测器的精度。 | MS-COCO | 基于ResNet mAP (500.50:0.95) = 36% | -| [YOLOv3](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleDetection) | 速度和精度均衡的目标检测网络,相比于原作者darknet中的YOLO v3实现,PaddlePaddle实现参考了论文[Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/pdf/1812.01187.pdf) 增加了mixup,label_smooth等处理,精度(mAP(0.5:0.95))相比于原作者提高了4.7个绝对百分点,在此基础上加入synchronize batch normalization, 最终精度相比原作者提高5.9个绝对百分点。 | MS-COCO | 基于DarkNet mAP(0.50:0.95)= 38.9% | -| [PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/face_detection) | **PyramidBox** **模型是百度自主研发的人脸检测模型**,利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强。于18年3月份在WIDER Face数据集上取得第一名 | WIDER FACE | mAP (Easy/Medium/Hard set)= 96.0%/ 94.8%/ 88.8% | +| [SSD](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleDetection) | 很好的继承了MobileNet预测速度快,易于部署的特点,能够很好的在多种设备上完成图像目标检测任务 | VOC07 test | mAP = 73.32% | +| [Faster-RCNN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleDetection) | 创造性地采用卷积网络自行产生建议框,并且和目标检测网络共享卷积网络,建议框数目减少,质量提高 | MS-COCO | 基于ResNet 50 mAP(0.50:0.95) = 36.7% | +| [Mask-RCNN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleDetection) | 经典的两阶段框架,在Faster R-CNN模型基础上添加分割分支,得到掩码结果,实现了掩码和类别预测关系的解藕,可得到像素级别的检测结果。 | MS-COCO | 基于ResNet 50 Mask mAP(0.50:0.95) = 31.4% | +| [RetinaNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleDetection) | 经典的一阶段框架,由ResNet主干网络、FPN结构、和两个分别用于回归物体位置和预测物体类别的子网络组成。在训练过程中使用Focal Loss,解决了传统一阶段检测器存在前景背景类别不平衡的问题,进一步提高了一阶段检测器的精度。 | MS-COCO | 基于ResNet mAP (500.50:0.95) = 36% | +| [YOLOv3](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleDetection) | 速度和精度均衡的目标检测网络,相比于原作者darknet中的YOLO v3实现,PaddlePaddle实现参考了论文[Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/pdf/1812.01187.pdf) 增加了mixup,label_smooth等处理,精度(mAP(0.5:0.95))相比于原作者提高了4.7个绝对百分点,在此基础上加入synchronize batch normalization, 最终精度相比原作者提高5.9个绝对百分点。 | MS-COCO | 基于DarkNet mAP(0.50:0.95)= 38.9% | +| [PyramidBox](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/face_detection) | **PyramidBox** **模型是百度自主研发的人脸检测模型**,利用上下文信息解决困难人脸的检测问题,网络表达能力高,鲁棒性强。于18年3月份在WIDER Face数据集上取得第一名 | WIDER FACE | mAP (Easy/Medium/Hard set)= 96.0%/ 94.8%/ 88.8% | ### 图像分割 @@ -62,8 +62,8 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 | | ------------------------------------------------------------ | ------------------------------------------------------------ | --------- | --------------- | -| [ICNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/icnet) | 主要用于图像实时语义分割,能够兼顾速度和准确性,易于线上部署 | Cityscape | Mean IoU=67.0% | -| [DeepLab V3+](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/deeplabv3%2B) | 通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASSP层, 其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率 | Cityscape | Mean IoU=78.81% | +| [ICNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/icnet) | 主要用于图像实时语义分割,能够兼顾速度和准确性,易于线上部署 | Cityscape | Mean IoU=67.0% | +| [DeepLab V3+](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/deeplabv3%2B) | 通过encoder-decoder进行多尺度信息的融合,同时保留了原来的空洞卷积和ASSP层, 其骨干网络使用了Xception模型,提高了语义分割的健壮性和运行速率 | Cityscape | Mean IoU=78.81% | ### 关键点检测 @@ -71,7 +71,7 @@ PaddlePaddle 
提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------ | ------------ | -| [Simple Baselines](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/human_pose_estimation) | coco2018关键点检测项目亚军方案,网络结构非常简单,效果达到state of the art | COCO val2017 | AP = 72.7% | +| [Simple Baselines](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/human_pose_estimation) | coco2018关键点检测项目亚军方案,网络结构非常简单,效果达到state of the art | COCO val2017 | AP = 72.7% | ### 图像生成 @@ -79,13 +79,13 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | | ------------------------------------------------------------ | ------------------------------------------------------------ | ---------- | -| [CGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程 | Mnist | -| [DCGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题 | Mnist | -| [Pix2Pix](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 图像翻译,通过成对图片将某一类图片转换成另外一类图片,可用于风格迁移 | Cityscapes | -| [CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 图像翻译,可以通过非成对的图片将某一类图片转换成另外一类图片,可用于风格迁移 | Cityscapes | -| [StarGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 多领域属性迁移,引入辅助分类帮助单个判别器判断多个属性,可用于人脸属性转换 | Celeba | -| [AttGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 利用分类损失和重构损失来保证改变特定的属性,可用于人脸特定属性转换 | Celeba | -| [STGAN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleGAN) | 人脸特定属性转换,只输入有变化的标签,引入GRU结构,更好的选择变化的属性 | Celeba | +| [CGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 条件生成对抗网络,一种带条件约束的GAN,使用额外信息对模型增加条件,可以指导数据生成过程 | Mnist | +| [DCGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 深度卷积生成对抗网络,将GAN和卷积网络结合起来,以解决GAN训练不稳定的问题 | Mnist | +| [Pix2Pix](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 图像翻译,通过成对图片将某一类图片转换成另外一类图片,可用于风格迁移 | Cityscapes | +| [CycleGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 图像翻译,可以通过非成对的图片将某一类图片转换成另外一类图片,可用于风格迁移 | Cityscapes | +| [StarGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 多领域属性迁移,引入辅助分类帮助单个判别器判断多个属性,可用于人脸属性转换 | Celeba | +| [AttGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 利用分类损失和重构损失来保证改变特定的属性,可用于人脸特定属性转换 | Celeba | +| [STGAN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleGAN) | 人脸特定属性转换,只输入有变化的标签,引入GRU结构,更好的选择变化的属性 | Celeba | ### 场景文字识别 @@ -93,8 +93,8 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 | | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- | -------------- | -| [CRNN-CTC](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/ocr_recognition) | 使用CTC model识别图片中单行英文字符,用于端到端的文本行图片识别方法 | 单行不定长的英文字符串图片 | 错误率= 22.3% | -| [OCR Attention](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/ocr_recognition) | 使用attention 识别图片中单行英文字符,用于端到端的自然场景文本识别, | 单行不定长的英文字符串图片 | 错误率 = 15.8% | +| [CRNN-CTC](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/ocr_recognition) | 使用CTC model识别图片中单行英文字符,用于端到端的文本行图片识别方法 | 单行不定长的英文字符串图片 | 错误率= 22.3% | +| [OCR 
Attention](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/ocr_recognition) | 使用attention 识别图片中单行英文字符,用于端到端的自然场景文本识别, | 单行不定长的英文字符串图片 | 错误率 = 15.8% | ### 度量学习 @@ -102,11 +102,11 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 Recall@Rank-1(使用arcmargin训练) | | ------------------------------------------------------------ | --------------------------------------------------------- | ------------------------------ | --------------------------------------------- | -| [ResNet50未微调](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning) | 使用arcmargin loss训练的特征模型 | Stanford Online Product(SOP) | 78.11% | -| [ResNet50使用triplet微调](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用triplet loss微调的特征模型 | Stanford Online Product(SOP) | 79.21% | -| [ResNet50使用quadruplet微调](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用quadruplet loss微调的特征模型 | Stanford Online Product(SOP) | 79.59% | -| [ResNet50使用eml微调](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用eml loss微调的特征模型 | Stanford Online Product(SOP) | 80.11% | -| [ResNet50使用npairs微调](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用npairs loss微调的特征模型 | Stanford Online Product(SOP) | 79.81% | +| [ResNet50未微调](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/metric_learning) | 使用arcmargin loss训练的特征模型 | Stanford Online Product(SOP) | 78.11% | +| [ResNet50使用triplet微调](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用triplet loss微调的特征模型 | Stanford Online Product(SOP) | 79.21% | +| [ResNet50使用quadruplet微调](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用quadruplet loss微调的特征模型 | Stanford Online Product(SOP) | 79.59% | +| [ResNet50使用eml微调](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用eml loss微调的特征模型 | Stanford Online Product(SOP) | 80.11% | +| [ResNet50使用npairs微调](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/metric_learning) | 在arcmargin loss基础上,使用npairs loss微调的特征模型 | Stanford Online Product(SOP) | 79.81% | ### 视频分类和动作定位 @@ -114,14 +114,14 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | 数据集 | 评估指标 | | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- | ----------- | -| [TSN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | ECCV'16提出的基于2D-CNN经典解决方案 | Kinetics-400 | Top-1 = 67% | -| [Non-Local](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | 视频非局部关联建模模型 | Kinetics-400 | Top-1 = 74% | -| [stNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | AAAI'19提出的视频联合时空建模方法 | Kinetics-400 | Top-1 = 69% | -| [TSM](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | 基于时序移位的简单高效视频时空建模方法 | Kinetics-400 | Top-1 = 70% | -| [Attention LSTM](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | 常用模型,速度快精度高 | Youtube-8M | GAP = 86% | -| [Attention Cluster](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | CVPR'18提出的视频多模态特征注意力聚簇融合方法 | Youtube-8M | GAP = 84% | -| [NeXtVlad](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | 
2nd-Youtube-8M最优单模型 | Youtube-8M | GAP = 87% | -| [C-TCN](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo) | 2018年ActivityNet夺冠方案 | ActivityNet1.3 | MAP=31% | +| [TSN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | ECCV'16提出的基于2D-CNN经典解决方案 | Kinetics-400 | Top-1 = 67% | +| [Non-Local](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | 视频非局部关联建模模型 | Kinetics-400 | Top-1 = 74% | +| [stNet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | AAAI'19提出的视频联合时空建模方法 | Kinetics-400 | Top-1 = 69% | +| [TSM](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | 基于时序移位的简单高效视频时空建模方法 | Kinetics-400 | Top-1 = 70% | +| [Attention LSTM](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | 常用模型,速度快精度高 | Youtube-8M | GAP = 86% | +| [Attention Cluster](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | CVPR'18提出的视频多模态特征注意力聚簇融合方法 | Youtube-8M | GAP = 84% | +| [NeXtVlad](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | 2nd-Youtube-8M最优单模型 | Youtube-8M | GAP = 87% | +| [C-TCN](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleCV/PaddleVideo) | 2018年ActivityNet夺冠方案 | ActivityNet1.3 | MAP=31% | ## PaddleNLP @@ -129,7 +129,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 词法分析 -[LAC(Lexical Analysis of Chinese)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis)百度自主研发中文特色模型词法分析任务,**输入是一个字符串,而输出是句子中的词边界和词性、实体类别。 +[LAC(Lexical Analysis of Chinese)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/lexical_analysis)百度自主研发中文特色模型词法分析任务,**输入是一个字符串,而输出是句子中的词边界和词性、实体类别。 | **模型** | **Precision** | **Recall** | **F1-score** | | ---------------- | ------------- | ---------- | ------------ | @@ -139,7 +139,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 语言模型 -[基于LSTM的语言模型任务](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/language_model),给定一个输入词序列(中文分词、英文tokenize),计算其PPL(语言模型困惑度,用户表示句子的流利程度)。 +[基于LSTM的语言模型任务](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/language_model),给定一个输入词序列(中文分词、英文tokenize),计算其PPL(语言模型困惑度,用户表示句子的流利程度)。 | **large config** | **train** | **valid** | **test** | | ---------------- | --------- | --------- | -------- | @@ -150,7 +150,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 情感分析 -[Senta(Sentiment Classification)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification)百度AI开放平台中情感倾向分析模型、百度自主研发的中文特色模型,是目前最好的中文情感分析模型。 +[Senta(Sentiment Classification)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/sentiment_classification)百度AI开放平台中情感倾向分析模型、百度自主研发的中文特色模型,是目前最好的中文情感分析模型。 | **模型** | **dev** | **test** | **模型(****finetune****)** | **dev** | **test** | | ------------- | ------- | -------- | ---------------------------- | ------- | -------- | @@ -164,7 +164,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 对话情绪识别 -[EmoTect(Emotion Detection)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/emotion_detection)专注于识别智能对话场景中用户的情绪识别,并开源基于百度海量数据训练好的预训练模型。 +[EmoTect(Emotion Detection)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/emotion_detection)专注于识别智能对话场景中用户的情绪识别,并开源基于百度海量数据训练好的预训练模型。 | **模型** | **闲聊** | **客服** | **微博** | | -------- | -------- | -------- | -------- | @@ -178,7 +178,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 阅读理解 -[MRC(Machine Reading 
Comprehension)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/reading_comprehension)机器阅读理解(MRC)是自然语言处理(NLP)中的关键任务之一,开源的DuReader升级了经典的阅读理解BiDAF模型,去掉了char级别的embedding,在预测层中使用了[pointer network](https://arxiv.org/abs/1506.03134),并且参考了[R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)中的一些网络结构,效果上有了大幅提升 +[MRC(Machine Reading Comprehension)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/reading_comprehension)机器阅读理解(MRC)是自然语言处理(NLP)中的关键任务之一,开源的DuReader升级了经典的阅读理解BiDAF模型,去掉了char级别的embedding,在预测层中使用了[pointer network](https://arxiv.org/abs/1506.03134),并且参考了[R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)中的一些网络结构,效果上有了大幅提升 | **Model** | **Dev ROUGE-L** | **Test ROUGE-L** | | -------------------------------------------------------- | --------------- | ---------------- | @@ -266,7 +266,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### SimNet -[SimNet(Similarity Net)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net)一个计算短文本相似度的框架,可以根据用户输入的两个文本,计算出相似度得分。 +[SimNet(Similarity Net)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/similarity_net)一个计算短文本相似度的框架,可以根据用户输入的两个文本,计算出相似度得分。 | **模型** | **百度知道** | **ECOM** | **QQSIM** | **UNICOM** | **LCQMC** | | ------------ | ------------ | -------- | --------- | ---------- | --------- | @@ -277,7 +277,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 机器翻译 -[MT(machine translation)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/neural_machine_translation/transformer)机器翻译是利用计算机将一种自然语言(源语言)转换为另一种自然语言(目标语言)的过程,输入为源语言句子,输出为相应的目标语言的句子。 +[MT(machine translation)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/neural_machine_translation/transformer)机器翻译是利用计算机将一种自然语言(源语言)转换为另一种自然语言(目标语言)的过程,输入为源语言句子,输出为相应的目标语言的句子。 | **测试集** | **newstest2014** | **newstest2015** | **newstest2016** | | ---------- | ---------------- | ---------------- | ---------------- | @@ -286,7 +286,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 对话自动评估 -[对话自动评估(Auto Dialogue Evaluation)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_model_toolkit/auto_dialogue_evaluation)主要用于评估开放领域对话系统的回复质量,能够帮助企业或个人快速评估对话系统的回复质量,减少人工评估成本。 +[对话自动评估(Auto Dialogue Evaluation)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/dialogue_model_toolkit/auto_dialogue_evaluation)主要用于评估开放领域对话系统的回复质量,能够帮助企业或个人快速评估对话系统的回复质量,减少人工评估成本。 利用少量标注数据微调后,自动评估打分和人工打分spearman相关系数,如下表。 @@ -296,7 +296,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### 对话通用理解 -[DGU(Dialogue General Understanding)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_model_toolkit/dialogue_general_understanding)对话通用理解针对数据集开发了相关的模型训练过程,支持分类,多标签分类,序列标注等任务,用户可针对自己的数据集,进行相关的模型定制 +[DGU(Dialogue General Understanding)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/dialogue_model_toolkit/dialogue_general_understanding)对话通用理解针对数据集开发了相关的模型训练过程,支持分类,多标签分类,序列标注等任务,用户可针对自己的数据集,进行相关的模型定制 | **ask_name** | **udc** | **udc** | **udc** | **atis_slot** | **dstc2** | **atis_intent** | **swda** | **mrda** | | ------------ | ------- | ------- | ------- | ------------- | ---------- | --------------- | -------- | -------- | @@ -309,7 +309,7 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 #### DAM -[深度注意力机制模型(Deep Attention Maching)](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_model_toolkit/deep_attention_matching)是开放领域多轮对话匹配模型。根据多轮对话历史和候选回复内容,排序出最合适的回复。 +[深度注意力机制模型(Deep Attention 
Maching)](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleNLP/dialogue_model_toolkit/deep_attention_matching)是开放领域多轮对话匹配模型。根据多轮对话历史和候选回复内容,排序出最合适的回复。 | | Ubuntu Corpus | Douban Conversation Corpus | | | | | | | | | | ---- | ------------- | -------------------------- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | @@ -331,25 +331,16 @@ PaddlePaddle 提供了丰富的计算单元,使得用户可以采用模块化 | 模型名称 | 模型简介 | | ------------------------------------------------------------ | ------------------------------------------------------------ | -| [TagSpace](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等 | -| [GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升 | -| [SequenceSemanticRetrieval](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 使用参考论文中的思想,使用多种时间粒度进行用户行为预测 | -| [DeepCTR](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出 | -| [Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 基于多元视图,将用户和项目的多个功能视图合并为一个统一模型 | -| [Word2Vec](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | skip-gram模式的word2vector模型 | -| [GraphNeuralNetwork](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | 基于会话的图神经网络模型的推荐系统,可以更好的挖掘item中丰富的转换特性以及生成准确的潜在的用户向量表示 | -| [DeepInterestNetwork](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec) | DIN通过一个兴趣激活模块(Activation Unit),用预估目标Candidate ADs的信息去激活用户的历史点击商品,以此提取用户与当前预估目标相关的兴趣。 | +| [TagSpace](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 应用于工业级的标签推荐,具体应用场景有feed新闻标签推荐等 | +| [GRU4Rec](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 首次将RNN(GRU)运用于session-based推荐,相比传统的KNN和矩阵分解,效果有明显的提升 | +| [SequenceSemanticRetrieval](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 使用参考论文中的思想,使用多种时间粒度进行用户行为预测 | +| [DeepCTR](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 只实现了DeepFM论文中介绍的模型的DNN部分,DeepFM会在其他例子中给出 | +| [Multiview-Simnet](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 基于多元视图,将用户和项目的多个功能视图合并为一个统一模型 | +| [Word2Vec](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | skip-gram模式的word2vector模型 | +| [GraphNeuralNetwork](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | 基于会话的图神经网络模型的推荐系统,可以更好的挖掘item中丰富的转换特性以及生成准确的潜在的用户向量表示 | +| [DeepInterestNetwork](https://github.com/PaddlePaddle/models/tree/release/1.5/PaddleRec) | DIN通过一个兴趣激活模块(Activation Unit),用预估目标Candidate ADs的信息去激活用户的历史点击商品,以此提取用户与当前预估目标相关的兴趣。 | -## 其他模型 - -| 模型名称 | 模型简介 | -| ------------------------------------------------------------ | ------------------------------------------------------------ | -| [DeepASR](https://github.com/PaddlePaddle/models/blob/develop/PaddleSpeech/DeepASR/README_cn.md) | 利用Fluid框架完成语音识别中声学模型的配置和训练,并集成 Kaldi 的解码器 | -| [DQN](https://github.com/PaddlePaddle/models/blob/develop/PaddleRL/DeepQNetwork/README_cn.md) | value based强化学习算法,第一个成功地将深度学习和强化学习结合起来的模型 | -| [DoubleDQN](https://github.com/PaddlePaddle/models/blob/develop/PaddleRL/DeepQNetwork/README_cn.md) | 将Double Q的想法应用在DQN上,解决过优化问题 | -| [DuelingDQN](https://github.com/PaddlePaddle/models/blob/develop/PaddleRL/DeepQNetwork/README_cn.md) | 改进了DQN模型,提高了模型的性能 | - ## License This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE). 
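The README.md hunk above repoints the model links from the moving `develop` branch to the pinned `release/1.5` tree (and drops the final "其他模型" table). A bulk retarget like this is easy to script rather than edit by hand; the sketch below is one hypothetical way to do it — the two URL prefixes are taken from the diff, while the helper itself and its CLI are illustrative and assume the target file exists:

```
import sys
from pathlib import Path

PAIRS = [
    ("https://github.com/PaddlePaddle/models/tree/develop/",
     "https://github.com/PaddlePaddle/models/tree/release/1.5/"),
    # some entries link with /blob/ instead of /tree/
    ("https://github.com/PaddlePaddle/models/blob/develop/",
     "https://github.com/PaddlePaddle/models/blob/release/1.5/"),
]

def retarget_links(path):
    """Rewrite develop-branch model links to release/1.5 in one file."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    for old, new in PAIRS:
        text = text.replace(old, new)
    p.write_text(text, encoding="utf-8")

if __name__ == "__main__":
    retarget_links(sys.argv[1] if len(sys.argv) > 1 else "README.md")
```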
diff --git a/fluid/AutoDL/LRC/README.md b/fluid/AutoDL/LRC/README.md deleted file mode 100644 index 546cb19169b965af5a3d0d41c903e318d4dfc64a..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [AutoDL/LRC](../../../AutoDL/LRC). diff --git a/fluid/AutoDL/LRC/README_cn.md b/fluid/AutoDL/LRC/README_cn.md deleted file mode 100644 index 6c87fd2d1cb5f6f4d187d665548ed7c74746bf10..0000000000000000000000000000000000000000 --- a/fluid/AutoDL/LRC/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [AutoDL/LRC](../../../AutoDL/LRC) 目录下浏览本项目。 diff --git a/fluid/DeepASR/README.md b/fluid/DeepASR/README.md deleted file mode 100644 index b7d916c58649790055b2ddbdd32e914d02f14ebf..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleSpeech/DeepASR](../../PaddleSpeech/DeepASR). diff --git a/fluid/DeepASR/README_cn.md b/fluid/DeepASR/README_cn.md deleted file mode 100644 index 51b0e724c810165810154915f41159d478398234..0000000000000000000000000000000000000000 --- a/fluid/DeepASR/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleSpeech/DeepASR](../../PaddleSpeech/DeepASR) 目录下浏览本项目。 diff --git a/fluid/DeepQNetwork/README.md b/fluid/DeepQNetwork/README.md deleted file mode 100644 index f82d57f12cc4e97dae99d5a711ee495a9895aa91..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleRL/DeepQNetwork](../../PaddleRL/DeepQNetwork). diff --git a/fluid/DeepQNetwork/README_cn.md b/fluid/DeepQNetwork/README_cn.md deleted file mode 100644 index b90f215b2d8e0734db5a41b00ab02260021c8cf6..0000000000000000000000000000000000000000 --- a/fluid/DeepQNetwork/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRL/DeepQNetwork](../../PaddleRL/DeepQNetwork) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/HiNAS_models/README.md b/fluid/PaddleCV/HiNAS_models/README.md deleted file mode 100644 index 1e33fea89e2d4e3a9b9ef2cad81012d082ccc504..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/HiNAS_models/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [AutoDL/HiNAS_models](../../../AutoDL/HiNAS_models). diff --git a/fluid/PaddleCV/HiNAS_models/README_cn.md b/fluid/PaddleCV/HiNAS_models/README_cn.md deleted file mode 100644 index 8ab7149b0aaef04c226aff0302e4282b0172c113..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/HiNAS_models/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [AutoDL/HiNAS_models](../../../AutoDL/HiNAS_models) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/caffe2fluid/README.md b/fluid/PaddleCV/caffe2fluid/README.md deleted file mode 100644 index 78702204ba32ffa63bcab4aef999267a5d7c1078..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/caffe2fluid/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [X2Paddle](https://github.com/PaddlePaddle/X2Paddle). 
diff --git a/fluid/PaddleCV/deeplabv3+/README.md b/fluid/PaddleCV/deeplabv3+/README.md deleted file mode 100644 index 94f81a780a21bda7e230bf513be427b08a6eaca2..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/deeplabv3+/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/deeplabv3+](../../../PaddleCV/deeplabv3+) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/face_detection/README.md b/fluid/PaddleCV/face_detection/README.md deleted file mode 100644 index e9319716f4f660ff75b571337575d8cd53c03a13..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/face_detection/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/face_detection](../../../PaddleCV/face_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/face_detection/README_cn.md b/fluid/PaddleCV/face_detection/README_cn.md deleted file mode 100644 index e9319716f4f660ff75b571337575d8cd53c03a13..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/face_detection/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/face_detection](../../../PaddleCV/face_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/gan/c_gan/README.md b/fluid/PaddleCV/gan/c_gan/README.md deleted file mode 100644 index b36f7084c0a67ce35cc7e7a73333443919a98775..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/gan/c_gan/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/gan/c_gan](../../../../PaddleCV/gan/c_gan) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/gan/cycle_gan/README.md b/fluid/PaddleCV/gan/cycle_gan/README.md deleted file mode 100644 index 5db6d49b2cbdaa6af4224bc0707593908a05352d..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/gan/cycle_gan/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/gan/cycle_gan](../../../../PaddleCV/gan/cycle_gan) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/human_pose_estimation/README.md b/fluid/PaddleCV/human_pose_estimation/README.md deleted file mode 100644 index 6ced2b3b2cd19d413f2c8f2b139725c2e5ea14fc..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/human_pose_estimation](../../../PaddleCV/human_pose_estimation). diff --git a/fluid/PaddleCV/human_pose_estimation/README_cn.md b/fluid/PaddleCV/human_pose_estimation/README_cn.md deleted file mode 100644 index 84120d0c568b13bfbccead92cd7f9211193f7669..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/human_pose_estimation/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/human_pose_estimation](../../../PaddleCV/human_pose_estimation) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/icnet/README.md b/fluid/PaddleCV/icnet/README.md deleted file mode 100644 index 72a3a91b0ae52894c641e61b489ff7a04c6f8106..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/icnet/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/icnet](../../../PaddleCV/icnet) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/image_classification/README.md b/fluid/PaddleCV/image_classification/README.md deleted file mode 100644 index 55392b8ac91e4a8c24d2f2d6ac63d695cb58e146..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/image_classification/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/image_classification](../../../PaddleCV/image_classification). 
diff --git a/fluid/PaddleCV/image_classification/README_cn.md b/fluid/PaddleCV/image_classification/README_cn.md deleted file mode 100644 index bb8850cff5fbd658addaba488301783d0e510a6c..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/image_classification/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/image_classification](../../../PaddleCV/image_classification) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/image_classification/README_ngraph.md b/fluid/PaddleCV/image_classification/README_ngraph.md deleted file mode 100644 index 55392b8ac91e4a8c24d2f2d6ac63d695cb58e146..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/image_classification/README_ngraph.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/image_classification](../../../PaddleCV/image_classification). diff --git a/fluid/PaddleCV/metric_learning/README.md b/fluid/PaddleCV/metric_learning/README.md deleted file mode 100644 index 6afd28a457c639af25337cc02a6b5b64658845ff..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/metric_learning/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/metric_learning](../../../PaddleCV/metric_learning). diff --git a/fluid/PaddleCV/metric_learning/README_cn.md b/fluid/PaddleCV/metric_learning/README_cn.md deleted file mode 100644 index 72417ed9badfc4858f314f143dd069d4ff6a0e6a..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/metric_learning/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/metric_learning](../../../PaddleCV/metric_learning) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/object_detection/README.md b/fluid/PaddleCV/object_detection/README.md deleted file mode 100644 index 99b0f8db58cc8e2ef130c0054b40bf746b5ac2c8..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/object_detection/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/object_detection](../../../PaddleCV/object_detection). diff --git a/fluid/PaddleCV/object_detection/README_cn.md b/fluid/PaddleCV/object_detection/README_cn.md deleted file mode 100644 index d3af497b9aecf23db4976970fbe16bc6c99bf6ff..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/object_detection/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/object_detection](../../../PaddleCV/object_detection) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/object_detection/README_quant.md b/fluid/PaddleCV/object_detection/README_quant.md deleted file mode 100644 index 99b0f8db58cc8e2ef130c0054b40bf746b5ac2c8..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/object_detection/README_quant.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/object_detection](../../../PaddleCV/object_detection). 
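The long run of deleted stub READMEs above and below all follow the same two templates: an English "This directory has been deprecated" notice and a Chinese one-liner pointing at the new path. Stubs like these are less error-prone to generate than to write by hand; a hypothetical sketch, where only the two template strings are taken verbatim from the diff and the helper, file names, and example paths are illustrative:

```
from pathlib import Path

EN_STUB = """
Hi!

This directory has been deprecated.

Please visit the project at [{name}]({path}).
"""

# "Hello, this project has been migrated; please browse it at [{name}]({path})."
CN_STUB = "\n您好,该项目已被迁移,请移步到 [{name}]({path}) 目录下浏览本项目。\n"

def write_stub(old_dir, name, relpath, chinese=False):
    """Drop a redirect README into a deprecated directory."""
    old_dir = Path(old_dir)
    old_dir.mkdir(parents=True, exist_ok=True)
    template = CN_STUB if chinese else EN_STUB
    fname = "README_cn.md" if chinese else "README.md"
    (old_dir / fname).write_text(
        template.format(name=name, path=relpath), encoding="utf-8")

# e.g. the two rcnn stubs deleted just below:
write_stub("fluid/PaddleCV/rcnn", "PaddleCV/rcnn", "../../../PaddleCV/rcnn")
write_stub("fluid/PaddleCV/rcnn", "PaddleCV/rcnn", "../../../PaddleCV/rcnn",
           chinese=True)
```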
diff --git a/fluid/PaddleCV/ocr_recognition/README.md b/fluid/PaddleCV/ocr_recognition/README.md deleted file mode 100644 index aa675d6048ecdb025ef2273ee755354152adc32e..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/ocr_recognition/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/ocr_recognition](../../../PaddleCV/ocr_recognition) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/rcnn/README.md b/fluid/PaddleCV/rcnn/README.md deleted file mode 100644 index 1e96b373a0ad13424691921dd17e8f251b9cdfc7..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/rcnn/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/rcnn](../../../PaddleCV/rcnn). diff --git a/fluid/PaddleCV/rcnn/README_cn.md b/fluid/PaddleCV/rcnn/README_cn.md deleted file mode 100644 index 83d5e0fc06448086e8807587798e804e3c634f97..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/rcnn/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/rcnn](../../../PaddleCV/rcnn) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/README.md b/fluid/PaddleCV/video/README.md deleted file mode 100644 index bbef3af1c6f6715e4415041939e046d66f02f58d..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video](../../../PaddleCV/video) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/attention_cluster/README.md b/fluid/PaddleCV/video/models/attention_cluster/README.md deleted file mode 100644 index 95056a71cb34304788168e15479a4aa1e2ecf3af..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/attention_cluster/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/attention_cluster](../../../../../PaddleCV/video/models/attention_cluster/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/attention_lstm/README.md b/fluid/PaddleCV/video/models/attention_lstm/README.md deleted file mode 100644 index 044c88cbecafdc880ae0cd213f6df77a8ce1715f..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/attention_lstm/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/attention_lstm](../../../../../PaddleCV/video/models/attention_lstm/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/nextvlad/README.md b/fluid/PaddleCV/video/models/nextvlad/README.md deleted file mode 100644 index ad3a926dd83c8d8825224c404dda76fff5238cbe..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/nextvlad/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/nextvlad](../../../../../PaddleCV/video/models/nextvlad/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/nonlocal_model/README.md b/fluid/PaddleCV/video/models/nonlocal_model/README.md deleted file mode 100644 index 4f72316b5e761c7e2e421f76fc3f743ab4ac12fb..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/nonlocal_model/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/nonlocal_model](../../../../../PaddleCV/video/models/nonlocal_model/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/stnet/README.md b/fluid/PaddleCV/video/models/stnet/README.md deleted file mode 100644 index 15cff5af0909a93c8cf244629878582aa6c2d12f..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/stnet/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 
[PaddleCV/video/models/stnet](../../../../../PaddleCV/video/models/stnet/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/tsm/README.md b/fluid/PaddleCV/video/models/tsm/README.md deleted file mode 100644 index c93c56618aff1cfd331b2c1bd9fccfbb8a4c7a08..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/tsm/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/tsm](../../../../../PaddleCV/video/models/tsm/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video/models/tsn/README.md b/fluid/PaddleCV/video/models/tsn/README.md deleted file mode 100644 index 8b4a986a63ea7746a3c7a648cd9d535803784ca3..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video/models/tsn/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/video/models/tsn](../../../../../PaddleCV/video/models/tsn/) 目录下浏览本项目。 diff --git a/fluid/PaddleCV/video_classification/README.md b/fluid/PaddleCV/video_classification/README.md deleted file mode 100644 index bb145d1e7d4538f8b1a6df5cf547d9c5ef5ae8c5..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/video_classification/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/video_classification](../../../PaddleCV/video_classification). diff --git a/fluid/PaddleCV/yolov3/README.md b/fluid/PaddleCV/yolov3/README.md deleted file mode 100644 index d05d89ce182a23b2f74e2633f7ada32fc6390477..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/yolov3/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/yolov3](../../../PaddleCV/yolov3). diff --git a/fluid/PaddleCV/yolov3/README_cn.md b/fluid/PaddleCV/yolov3/README_cn.md deleted file mode 100644 index 89080d674df265d37a3601b579622adf1829c747..0000000000000000000000000000000000000000 --- a/fluid/PaddleCV/yolov3/README_cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleCV/yolov3](../../../PaddleCV/yolov3) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/chinese_ner/README.md b/fluid/PaddleNLP/chinese_ner/README.md deleted file mode 100644 index 06398c9164e5a39dbd444c78ebddb3ae46093574..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/chinese_ner/README.md +++ /dev/null @@ -1,3 +0,0 @@ - - -您好,该项目已被迁移,请移步到 [PaddleNLP/unarchived/chinese_ner](../../../PaddleNLP/unarchived/chinese_ner/) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/deep_attention_matching_net/README.md b/fluid/PaddleNLP/deep_attention_matching_net/README.md deleted file mode 100644 index 7f4995ff102baadd095d31560c274ba9d57eea9c..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/deep_attention_matching_net/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleNLP/unarchived/deep_attention_matching_net](../../../PaddleNLP/unarchived/deep_attention_matching_net). 
diff --git a/fluid/PaddleNLP/language_model/gru/README.md b/fluid/PaddleNLP/language_model/gru/README.md deleted file mode 100644 index 15176770b1e2df48790386ab0137fcbe7c5d4200..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/language_model/gru/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/unarchived/language_model/gru](../../../../PaddleNLP/unarchived/language_model/gru) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/language_model/lstm/README.md b/fluid/PaddleNLP/language_model/lstm/README.md deleted file mode 100644 index 8358bea7d81494c81c14a833e653a36d5eceadfb..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/language_model/lstm/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/unarchived/language_model/lstm](../../../../PaddleNLP/unarchived/language_model/lstm) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/machine_reading_comprehension/README.md b/fluid/PaddleNLP/machine_reading_comprehension/README.md deleted file mode 100644 index e9642bc36abc0cfa9b3f0fed27c044b706eb0074..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/machine_reading_comprehension/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleNLP/unarchived/machine_reading_comprehension](../../../PaddleNLP/unarchived/machine_reading_comprehension). diff --git a/fluid/PaddleNLP/neural_machine_translation/README.md b/fluid/PaddleNLP/neural_machine_translation/README.md deleted file mode 100644 index 0117e6214f596b87baf097724526b23db23820f8..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/neural_machine_translation/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleNLP/unarchived/neural_machine_translation](../../../PaddleNLP/unarchived/neural_machine_translation). 
diff --git a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md b/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md deleted file mode 100644 index 005fb7e2e56c19583bfbeb7997c25fbef5f77578..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/neural_machine_translation/rnn_search/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/neural_machine_translation/rnn_search](../../../../PaddleNLP/neural_machine_translation/rnn_search) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/neural_machine_translation/transformer/README.md b/fluid/PaddleNLP/neural_machine_translation/transformer/README.md deleted file mode 100644 index 47a4f78bbb1e18e55442807b0701aef08f370fc0..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/neural_machine_translation/transformer/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/neural_machine_translation/transformer](../../../../PaddleNLP/neural_machine_translation/transformer) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/sequence_tagging_for_ner/README.md b/fluid/PaddleNLP/sequence_tagging_for_ner/README.md deleted file mode 100644 index 772c4249c2c635ed9a6070b72028ce3a78a6d548..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/sequence_tagging_for_ner/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/unarchived/sequence_tagging_for_ner](../../../PaddleNLP/unarchived/sequence_tagging_for_ner) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/text_classification/README.md b/fluid/PaddleNLP/text_classification/README.md deleted file mode 100644 index 48b09ed2c4245a2efe8f97a08a3c80f573e94336..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/text_classification/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleNLP/unarchived/text_classification](../../../PaddleNLP/unarchived/text_classification) 目录下浏览本项目。 diff --git a/fluid/PaddleNLP/text_matching_on_quora/README.md b/fluid/PaddleNLP/text_matching_on_quora/README.md deleted file mode 100644 index b735660f56cb775a582f393cb06eb725b8ad36e7..0000000000000000000000000000000000000000 --- a/fluid/PaddleNLP/text_matching_on_quora/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleNLP/unarchived/text_matching_on_quora](../../../PaddleNLP/unarchived/text_matching_on_quora). diff --git a/fluid/PaddleRec/ctr/README.cn.md b/fluid/PaddleRec/ctr/README.cn.md deleted file mode 100644 index 81cd20625701c13fce3a3f8ad119663a6e5c162c..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/ctr/README.cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/ctr](../../../PaddleRec/ctr) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/ctr/README.md b/fluid/PaddleRec/ctr/README.md deleted file mode 100644 index 1aceff1350c2c28b13ec92ccf82e321bb3ddda04..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/ctr/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleRec/ctr](../../../PaddleRec/ctr). 
diff --git a/fluid/PaddleRec/din/README.md b/fluid/PaddleRec/din/README.md deleted file mode 100644 index 6e2df0301cf20434dc3479da8c93644f764c5c42..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/din/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/din](../../../PaddleRec/din) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/gnn/README.md b/fluid/PaddleRec/gnn/README.md deleted file mode 100644 index 1ac21f3ee4712ead33f44322447d30fe5aa45918..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/gnn/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/gnn](../../../PaddleRec/gnn) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/gru4rec/README.md b/fluid/PaddleRec/gru4rec/README.md deleted file mode 100644 index 9fe28eba00760b67c532e4624a5722cfd62feb57..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/gru4rec/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/gru4rec](../../../PaddleRec/gru4rec) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/multiview_simnet/README.cn.md b/fluid/PaddleRec/multiview_simnet/README.cn.md deleted file mode 100644 index 9cf8e27bba4775800498c25b550f7bb19479f074..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/multiview_simnet/README.cn.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/multiview_simnet](../../../PaddleRec/multiview_simnet) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/multiview_simnet/README.md b/fluid/PaddleRec/multiview_simnet/README.md deleted file mode 100644 index 8fba8e606256ad7ad65ec429b68e967809bc6a51..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/multiview_simnet/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleRec/multiview_simnet](../../../PaddleRec/multiview_simnet). diff --git a/fluid/PaddleRec/ssr/README.md b/fluid/PaddleRec/ssr/README.md deleted file mode 100644 index 15111907ccc21942c134a2a614ad341c37710272..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/ssr/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/ssr](../../../PaddleRec/ssr) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/tagspace/README.md b/fluid/PaddleRec/tagspace/README.md deleted file mode 100644 index 67e3f88f7a2245829d0efbfe23a6566a0745fe41..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/tagspace/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/tagspace](../../../PaddleRec/tagspace) 目录下浏览本项目。 diff --git a/fluid/PaddleRec/word2vec/README.md b/fluid/PaddleRec/word2vec/README.md deleted file mode 100644 index 7504ff9c332bf86f606d6d8770cefb325fc29ce0..0000000000000000000000000000000000000000 --- a/fluid/PaddleRec/word2vec/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRec/word2vec](../../../PaddleRec/word2vec) 目录下浏览本项目。 diff --git a/fluid/adversarial/README.md b/fluid/adversarial/README.md deleted file mode 100644 index b43046d174c6fa7cc9517c043601d5a86e53604a..0000000000000000000000000000000000000000 --- a/fluid/adversarial/README.md +++ /dev/null @@ -1,6 +0,0 @@ - -Hi! - -This directory has been deprecated. - -Please visit the project at [PaddleCV/adversarial](../../PaddleCV/adversarial). 
diff --git a/fluid/policy_gradient/README.md b/fluid/policy_gradient/README.md deleted file mode 100644 index b6ac95d0fba6bbb7552671fbc6e80d052a648045..0000000000000000000000000000000000000000 --- a/fluid/policy_gradient/README.md +++ /dev/null @@ -1,2 +0,0 @@ - -您好,该项目已被迁移,请移步到 [PaddleRL/policy_gradient](../../PaddleRL/policy_gradient) 目录下浏览本项目。 diff --git a/legacy/README.cn.md b/legacy/README.cn.md deleted file mode 100644 index 72fb35ff3b239d8fa5e226f84aa09f084f593697..0000000000000000000000000000000000000000 --- a/legacy/README.cn.md +++ /dev/null @@ -1,136 +0,0 @@ -# models 简介 - -[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/PaddlePaddle/models) -[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/PaddlePaddle/models) -[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) - -PaddlePaddle提供了丰富的运算单元,帮助大家以模块化的方式构建起千变万化的深度学习模型来解决不同的应用问题。这里,我们针对常见的机器学习任务,提供了不同的神经网络模型供大家学习和使用。 - - -## 1. 词向量 - -词向量用一个实向量表示词语,向量的每个维都表示文本的某种潜在语法或语义特征,是深度学习应用于自然语言处理领域最成功的概念和成果之一。广义的,词向量也可以应用于普通离散特征。词向量的学习通常都是一个无监督的学习过程,因此,可以充分利用海量的无标记数据以捕获特征之间的关系,也可以有效地解决特征稀疏、标签数据缺失、数据噪声等问题。然而,在常见词向量学习方法中,模型最后一层往往会遇到一个超大规模的分类问题,是计算性能的瓶颈。 - -在词向量任务中,我们向大家展示如何使用Hierarchical-Sigmoid 和噪声对比估计(Noise Contrastive Estimation,NCE)来加速词向量的学习。 - -- 1.1 [Hsigmoid加速词向量训练](https://github.com/PaddlePaddle/models/tree/develop/hsigmoid) -- 1.2 [噪声对比估计加速词向量训练](https://github.com/PaddlePaddle/models/tree/develop/nce_cost) - - -## 2. RNN 语言模型 - -语言模型是自然语言处理领域里一个重要的基础模型,除了得到词向量(语言模型训练的副产物),还可以帮助我们生成文本。给定若干个词,语言模型可以帮助我们预测下一个最可能出现的词。 - -在利用语言模型生成文本的任务中,我们重点介绍循环神经网络语言模型,大家可以通过文档中的使用说明快速适配到自己的训练语料,完成自动写诗、自动写散文等有趣的模型。 - -- 2.1 [使用循环神经网络语言模型生成文本](https://github.com/PaddlePaddle/models/tree/develop/generate_sequence_by_rnn_lm) - -## 3. 点击率预估 - -点击率预估模型预判用户对一条广告点击的概率,对每次广告的点击情况做出预测,是广告技术的核心算法之一。逻谛斯克回归对大规模稀疏特征有着很好的学习能力,在点击率预估任务发展的早期一统天下。近年来,DNN 模型由于其强大的学习能力逐渐接过点击率预估任务的大旗。 - -在点击率预估任务中,我们首先给出谷歌提出的 Wide & Deep 模型。这一模型融合了适用于学习抽象特征的DNN和适用于大规模稀疏特征的逻谛斯克回归两者的优点,可以作为一种相对成熟的模型框架使用,在工业界也有一定的应用。同时,我们提供基于因子分解机的深度神经网络模型,该模型融合了因子分解机和深度神经网络,分别建模输入属性之间的低阶交互和高阶交互。 - -- 3.1 [Wide & deep 点击率预估模型](https://github.com/PaddlePaddle/models/tree/develop/ctr/README.cn.md) -- 3.2 [基于深度因子分解机的点击率预估模型](https://github.com/PaddlePaddle/models/tree/develop/deep_fm) - -## 4. 文本分类 - -文本分类是自然语言处理领域最基础的任务之一,深度学习方法能够免除复杂的特征工程,直接使用原始文本作为输入,数据驱动地最优化分类准确率。 - -在文本分类任务中,我们以情感分类任务为例,提供了基于DNN的非序列文本分类模型,以及基于CNN的序列模型供大家学习和使用(基于LSTM的模型见PaddleBook中[情感分类](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.cn.html)一课)。 - -- 4.1 [基于DNN/CNN的情感分类](https://github.com/PaddlePaddle/models/tree/develop/text_classification) -- 4.2 [基于双层序列的文本分类模型](https://github.com/PaddlePaddle/models/tree/develop/nested_sequence/text_classification) - -## 5. 排序学习 - -排序学习(Learning to Rank, LTR)是信息检索和搜索引擎研究的核心问题之一,通过机器学习方法学习一个分值函数对待排序的候选进行打分,再根据分值的高低确定序关系。深度神经网络可以用来建模分值函数,构成各类基于深度学习的LTR模型。 - -在排序学习任务中,我们介绍基于RankLoss损失函数Pairwise排序模型和基于LambdaRank损失函数的Listwise排序模型(Pointwise学习策略见PaddleBook中[推荐系统](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.cn.html)一课)。 - -- 5.1 [基于Pairwise和Listwise的排序学习](https://github.com/PaddlePaddle/models/tree/develop/ltr) - -## 6. 
结构化语义模型 - -深度结构化语义模型是一种基于神经网络的语义匹配模型框架,可以用于学习两路信息实体或是文本之间的语义相似性。DSSM使用DNN、CNN或是RNN将两路信息实体或是文本映射到同一个连续的低纬度语义空间中。在这个语义空间中,两路实体或是文本可以同时进行表示,然后,通过定义距离度量和匹配函数来刻画并学习不同实体或是文本在同一个语义空间内的语义相似性。 - -在结构化语义模型任务中,我们演示如何建模两个字符串之间的语义相似度。模型支持DNN(全连接前馈网络)、CNN(卷积网络)、RNN(递归神经网络)等不同的网络结构,以及分类、回归、排序等不同损失函数。本例采用最简单的文本数据作为输入,通过替换自己的训练和预测数据,便可以在真实场景中使用。 - -- 6.1 [深度结构化语义模型](https://github.com/PaddlePaddle/models/tree/develop/dssm/README.cn.md) - -## 7. 命名实体识别 - -给定输入序列,序列标注模型为序列中每一个元素贴上一个类别标签,是自然语言处理领域最基础的任务之一。随着深度学习方法的不断发展,利用循环神经网络学习输入序列的特征表示,条件随机场(Conditional Random Field, CRF)在特征基础上完成序列标注任务,逐渐成为解决序列标注问题的标配解决方案。 - -在序列标注任务中,我们以命名实体识别(Named Entity Recognition,NER)任务为例,介绍如何训练一个端到端的序列标注模型。 - -- 7.1 [命名实体识别](https://github.com/PaddlePaddle/models/tree/develop/sequence_tagging_for_ner) - -## 8. 序列到序列学习 - -序列到序列学习实现两个甚至是多个不定长模型之间的映射,有着广泛的应用,包括:机器翻译、智能对话与问答、广告创意语料生成、自动编码(如金融画像编码)、判断多个文本串之间的语义相关性等。 - -在序列到序列学习任务中,我们首先以机器翻译任务为例,提供了多种改进模型供大家学习和使用。包括:不带注意力机制的序列到序列映射模型,这一模型是所有序列到序列学习模型的基础;使用Scheduled Sampling改善RNN模型在生成任务中的错误累积问题;带外部记忆机制的神经机器翻译,通过增强神经网络的记忆能力,来完成复杂的序列到序列学习任务。除机器翻译任务之外,我们也提供了一个基于深层LSTM网络生成古诗词,实现同语言生成的模型。 - -- 8.1 [无注意力机制的神经机器翻译](https://github.com/PaddlePaddle/models/tree/develop/nmt_without_attention/README.cn.md) -- 8.2 [使用Scheduled Sampling改善翻译质量](https://github.com/PaddlePaddle/models/tree/develop/scheduled_sampling) -- 8.3 [带外部记忆机制的神经机器翻译](https://github.com/PaddlePaddle/models/tree/develop/mt_with_external_memory) -- 8.4 [生成古诗词](https://github.com/PaddlePaddle/models/tree/develop/generate_chinese_poetry) - -## 9. 阅读理解 - -当深度学习以及各类新技术不断推动自然语言处理领域向前发展时,我们不禁会问:应该如何确认模型真正理解了人类特有的自然语言,具备一定的理解和推理能力?纵观NLP领域的各类经典问题:词法分析、句法分析、情感分类、写诗等,这些问题的经典解决方案,从技术原理上距离“语言理解”仍有一定距离。为了衡量现有NLP技术到“语言理解”这一终极目标之间的差距,我们需要一个有足够难度且可量化可复现的任务,这也是阅读理解问题提出的初衷。尽管目前的研究现状表明在现有阅读理解数据集上表现良好的模型,依然没有做到真正的语言理解,但机器阅读理解依然被视为是检验模型向理解语言迈进的一个重要任务。 - -阅读理解本质上也是自动问答的一种,模型“阅读”一段文字后回答给定的问题,在这一任务中,我们介绍使用Learning to Search 方法,将阅读理解转化为从段落中寻找答案所在句子,答案在句子中的起始位置,以及答案在句子中的结束位置,这样一个多步决策过程。 - -- 9.1 [Globally Normalized Reader](https://github.com/PaddlePaddle/models/tree/develop/globally_normalized_reader) - -## 10. 自动问答 - -自动问答(Question Answering)系统利用计算机自动回答用户提出的问题,是验证机器是否具备自然语言理解能力的重要任务之一,其研究历史可以追溯到人工智能的原点。与检索系统相比,自动问答系统是信息服务的一种高级形式,系统返回给用户的不再是排序后的基于关键字匹配的检索结果,而是精准的自然语言答案。 - -在自动问答任务中,我们介绍基于深度学习的端到端问答系统,将自动问答转化为一个序列标注问题。端对端问答系统试图通过从高质量的"问题-证据(Evidence)-答案"数据中学习,建立一个联合学习模型,同时学习语料库、知识库、问句语义表示之间的语义映射关系,将传统的问句语义解析、文本检索、答案抽取与生成的复杂步骤转变为一个可学习过程。 - -- 10.1 [基于序列标注的事实型自动问答模型](https://github.com/PaddlePaddle/models/tree/develop/neural_qa) - -## 11. 
图像分类 - -图像相比文字能够提供更加生动、容易理解及更具艺术感的信息,是人们转递与交换信息的重要来源。图像分类是根据图像的语义信息对不同类别图像进行区分,是计算机视觉中重要的基础问题,也是图像检测、图像分割、物体跟踪、行为分析等其他高层视觉任务的基础,在许多领域都有着广泛的应用。如:安防领域的人脸识别和智能视频分析等,交通领域的交通场景识别,互联网领域基于内容的图像检索和相册自动归类,医学领域的图像识别等。 - -在图像分类任务中,我们向大家介绍如何训练AlexNet、VGG、GoogLeNet、ResNet、Inception-v4、Inception-Resnet-V2和Xception模型。同时提供了能够将Caffe或TensorFlow训练好的模型文件转换为PaddlePaddle模型文件的模型转换工具。 - -- 11.1 [将Caffe模型文件转换为PaddlePaddle模型文件](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle) -- 11.2 [将TensorFlow模型文件转换为PaddlePaddle模型文件](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle) -- 11.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification) -- 11.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/image_classification) - -## 12. 目标检测 - -目标检测任务的目标是给定一张图像或是视频帧,让计算机找出其中所有目标的位置,并给出每个目标的具体类别。对于人类来说,目标检测是一个非常简单的任务。然而,计算机能够“看到”的仅有一些值为0 ~ 255的矩阵,很难解图像或是视频帧中出现了人或是物体这样的高层语义概念,也就更加难以定位目标出现在图像中哪个区域。与此同时,由于目标会出现在图像或是视频帧中的任何位置,目标的形态千变万化,图像或是视频帧的背景千差万别,诸多因素都使得目标检测对计算机来说是一个具有挑战性的问题。 - -在目标检测任务中,我们介绍利用SSD方法完成目标检测。SSD全称:Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。 - -- 12.1 [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/tree/develop/ssd/README.cn.md) - -## 13. 场景文字识别 - -许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 - -在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成端到端地无约束字符定位和识别。 - -- 13.1 [场景文字识别](https://github.com/PaddlePaddle/models/tree/develop/scene_text_recognition) - -## 14. 语音识别 - -语音识别技术(Auto Speech Recognize,简称ASR)将人类语音中的词汇内容转化为计算机可读的输入,让机器能够“听懂”人类的语音,在语音助手、语音输入、语音交互等应用中发挥着重要作用。深度学习在语音识别领域取得了瞩目的成绩,端到端的深度学习方法将传统的声学模型、词典、语言模型等模块融为一个整体,不再依赖隐马尔可夫模型中的各种条件独立性假设,令模型变得更加简洁,一个神经网络模型以语音特征为输入,直接输出识别出的文本,目前已经成为语音识别最重要的手段。 - -在语音识别任务中,我们提供了基于 DeepSpeech2 模型的完整流水线,包括:特征提取、数据增强、模型训练、语言模型、解码模块等,并提供一个训练好的模型和体验实例,大家能够使用自己的声音来体验语音识别的乐趣。 - -14.1 [语音识别: DeepSpeech2](https://github.com/PaddlePaddle/DeepSpeech) - -本教程由[PaddlePaddle](https://github.com/PaddlePaddle/Paddle)创作,采用[Apache-2.0](LICENSE) 许可协议进行许可。 diff --git a/legacy/README.md b/legacy/README.md deleted file mode 100644 index f0719c1a26c04341e8de327143dc826248bb3607..0000000000000000000000000000000000000000 --- a/legacy/README.md +++ /dev/null @@ -1,89 +0,0 @@ - -# 该目录的模型已经不再维护,不推荐使用。建议使用Fluid目录下的模型。 - -# Introduction to models - -[![Documentation Status](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://github.com/PaddlePaddle/models) -[![Documentation Status](https://img.shields.io/badge/中文文档-最新-brightgreen.svg)](https://github.com/PaddlePaddle/models) -[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) - -PaddlePaddle provides a rich set of computational units to enable users to adopt a modular approach to solving various learning problems. In this repo, we demonstrate how to use PaddlePaddle to solve common machine learning tasks, providing several different neural network model that anyone can easily learn and use. 
-
-## 1. Word Embedding
-
-Word embeddings represent each word with a real-valued vector. Each dimension of the vector captures some latent grammatical or semantic feature of the text, and word embeddings are among the most successful concepts in natural language processing. More generally, word vectors can also be applied to ordinary discrete features. Word vectors are usually learned in an unsupervised fashion, which makes it possible to exploit massive amounts of unlabeled data to capture the relationships between features and to alleviate problems such as feature sparsity, missing labels, and data noise. However, in common word-vector learning methods, the last layer of the model often faces a very large classification problem, which becomes the bottleneck of computing performance.
-
-In the word-vector examples, we show how to use Hierarchical-Sigmoid and Noise Contrastive Estimation (NCE) to accelerate word-vector learning.
-
-- 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/hsigmoid)
-- 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/nce_cost)
-
-
-## 2. RNN language model
-
-The language model is an important building block in natural language processing. Besides producing word vectors (a by-product of language-model training), it can also help us generate text: given a number of words, a language model predicts the next most likely word. In the text-generation example we focus on the recurrent neural network language model; following the instructions in the documentation, you can quickly adapt it to your own training corpus and build interesting applications such as automatic poetry or prose writing.
-
-- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/legacy/generate_sequence_by_rnn_lm)
-
-## 3. Click-Through Rate prediction
-The click-through rate (CTR) model predicts the probability that a user will click on an ad and is one of the core algorithms in advertising technology. Logistic regression, with its good learning performance on large-scale sparse features, dominated the early stage of CTR prediction; in recent years, DNN models have gradually taken over the task thanks to their stronger learning ability.
-
-In the CTR example, we first present Google's Wide & Deep model, which combines the strengths of a DNN (good at learning abstract features) and logistic regression (well suited to large-scale sparse features), and is a relatively mature model framework with real industrial use. We then provide the deep factorization machine (DeepFM), which combines a factorization machine and a deep neural network to model the low-order and high-order interactions of the input features, respectively. A minimal sketch of the Wide & Deep scoring function follows the links below.
-
-- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/legacy/ctr)
-- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/legacy/deep_fm)
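-
-The sketch below is NumPy-only and illustrative: the shapes, variable names, and two-layer "deep" part are our own assumptions, not the layer structure used in the repository. It shows the essential idea that one logistic output sums a linear "wide" score over sparse cross features and a "deep" MLP score over dense embedded features.
-
-```python
-import numpy as np
-
-def sigmoid(x):
-    return 1.0 / (1.0 + np.exp(-x))
-
-def wide_deep_score(x_wide, w_wide, x_deep, W1, W2, b):
-    # wide part: linear model over (large, sparse) cross features
-    wide = x_wide.dot(w_wide)
-    # deep part: small MLP over dense embedded features
-    hidden = np.maximum(0.0, x_deep.dot(W1))  # ReLU
-    deep = hidden.dot(W2)
-    # joint prediction: both parts feed a single logistic output
-    return sigmoid(wide + deep + b)
-
-rng = np.random.default_rng(0)
-p = wide_deep_score(rng.random(100), rng.normal(size=100),
-                    rng.random(16), rng.normal(size=(16, 8)),
-                    rng.normal(size=8), 0.0)
-print(p)  # a click probability in (0, 1)
-```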
-
-## 4. Text classification
-
-Text classification is one of the most basic tasks in natural language processing. Deep learning methods can do away with complex feature engineering and use the raw text as input, optimizing classification accuracy in a data-driven way.
-
-For text classification, we provide a non-sequential text classification model based on DNN and CNN. (For an LSTM-based model, please refer to the PaddleBook chapter [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html).)
-
-- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/legacy/text_classification)
-
-## 5. Learning to rank
-
-Learning to rank (LTR) is one of the core problems in information retrieval and search-engine research. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Deep neural networks can be used to model the scoring function, giving rise to various deep-learning-based LTR models.
-
-The algorithms for learning to rank are usually categorized into three groups by their input representation and loss function: pointwise, pairwise, and listwise approaches. Here we demonstrate the RankLoss loss function (a pairwise approach) and the LambdaRank loss function (a listwise approach); a minimal sketch of the pairwise loss follows the links below. (For pointwise approaches, please refer to [Recommender System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html).)
-
-- 5.1 [Learning to rank based on Pairwise and Listwise approaches](https://github.com/PaddlePaddle/models/tree/develop/legacy/ltr)
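-
-A minimal NumPy sketch of the pairwise RankLoss idea (the function name is ours, not the repository's API): given the scores of a more-relevant and a less-relevant document, the loss is a logistic loss on their score difference, pushing the model to score the relevant document higher.
-
-```python
-import numpy as np
-
-def rank_loss(score_pos, score_neg):
-    # logistic (cross-entropy) loss on the score margin; small when
-    # score_pos exceeds score_neg by a comfortable margin
-    return np.log1p(np.exp(-(score_pos - score_neg)))
-
-print(rank_loss(2.0, 0.5))   # small loss: correct order
-print(rank_loss(0.5, 2.0))   # large loss: inverted order
-```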
-
-## 6. Semantic model
-The deep structured semantic model (DSSM) uses a DNN to learn low-dimensional vector representations in a continuous semantic space and, on top of them, models the semantic similarity between two sentences.
-
-In this example, we demonstrate how to use PaddlePaddle to implement a generic deep structured semantic model to model the semantic similarity between two strings. The model supports different network structures such as CNN (Convolutional Network), FC (Fully Connected Network), and RNN (Recurrent Neural Network), as well as different loss functions for classification, regression, and ranking.
-
-- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/legacy/dssm)
-
-## 7. Sequence tagging
-
-Given an input sequence, a sequence tagging model assigns a category tag to each element of the sequence; this is one of the most basic tasks in natural language processing. Recurrent neural network models combined with a Conditional Random Field (CRF) are commonly used for sequence tagging tasks.
-
-In the sequence tagging example, we describe how to train an end-to-end sequence tagging model, taking the Named Entity Recognition (NER) task as an example.
-
-- 7.1 [Named Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner)
-
-## 8. Sequence to sequence learning
-
-Sequence-to-sequence models have a wide range of applications, including machine translation, dialogue systems, and parse-tree generation.
-
-As an example of sequence-to-sequence learning, we take the machine translation task. We demonstrate the sequence-to-sequence mapping model without an attention mechanism, which is the basis for all sequence-to-sequence learning models; we use scheduled sampling to mitigate error accumulation in the RNN model; and we show machine translation with an external memory mechanism.
-
-- 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/legacy/nmt_without_attention)
-
-## 9. Image classification
-
-In the image classification example, we show how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. We also provide model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files.
-
-- 9.1 [Convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/caffe2paddle)
-- 9.2 [Convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/tf2paddle)
-- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
-
-This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
diff --git a/legacy/conv_seq2seq/README.md b/legacy/conv_seq2seq/README.md deleted file mode 100644 index 5b22c2c17ea2ff3588e93219e86d81a831242211..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/README.md +++ /dev/null @@ -1,70 +0,0 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
----
-
-# Convolutional Sequence to Sequence Learning
-This model implements the work in the following paper:
-
-Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. Association for Computational Linguistics (ACL), 2017
-
-# Data Preparation
-- The data used in this tutorial can be downloaded by running:
-
-  ```bash
-  sh download.sh
-  ```
-
-- Each line in the data file contains one sample, and each sample consists of a source sentence and a target sentence separated by '\t'. So, to use your own data, each line should be organized as follows (the bracketed placeholders stand for the actual sentences):
-
-  ```
-  <source sentence>\t<target sentence>
-  ```
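-
-For instance, a line is consumed roughly as follows (this mirrors the `reader.py` shown later in this diff; the sample sentence pair is made up):
-
-  ```python
-  line = "wir sehen uns\tsee you soon\n"
-  src, trg = line.strip().split('\t')
-  src_words = src.split()   # ['wir', 'sehen', 'uns']
-  trg_words = trg.split()   # ['see', 'you', 'soon']
-  # each word is then mapped to its dictionary index, with unknown
-  # words falling back to the UNK entry of the dictionary
-  ```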
-
-# Training a Model
-- Modify the following script if needed and then run:
-
-  ```bash
-  python train.py \
-    --train_data_path ./data/train \
-    --test_data_path ./data/test \
-    --src_dict_path ./data/src_dict \
-    --trg_dict_path ./data/trg_dict \
-    --enc_blocks "[(256, 3)] * 5" \
-    --dec_blocks "[(256, 3)] * 3" \
-    --emb_size 256 \
-    --pos_size 200 \
-    --drop_rate 0.2 \
-    --use_bn False \
-    --use_gpu False \
-    --trainer_count 1 \
-    --batch_size 32 \
-    --num_passes 20 \
-    >train.log 2>&1
-  ```
-
-# Inferring by a Trained Model
-- Infer by a trained model by running:
-
-  ```bash
-  python infer.py \
-    --infer_data_path ./data/dev \
-    --src_dict_path ./data/src_dict \
-    --trg_dict_path ./data/trg_dict \
-    --enc_blocks "[(256, 3)] * 5" \
-    --dec_blocks "[(256, 3)] * 3" \
-    --emb_size 256 \
-    --pos_size 200 \
-    --drop_rate 0.2 \
-    --use_bn False \
-    --use_gpu False \
-    --trainer_count 1 \
-    --max_len 100 \
-    --batch_size 256 \
-    --beam_size 1 \
-    --is_show_attention False \
-    --model_path ./params.pass-0.tar.gz \
-    1>infer_result 2>infer.log
-  ```
-
-# Notes
-The current version of PaddlePaddle doesn't support weight normalization, so we use batch normalization instead to ensure convergence when the network is deep.
diff --git a/legacy/conv_seq2seq/beamsearch.py b/legacy/conv_seq2seq/beamsearch.py deleted file mode 100644 index dd8562f018c803d4f0d7bbba4a2a006ece904851..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/beamsearch.py +++ /dev/null @@ -1,197 +0,0 @@
-#coding=utf-8
-
-import sys
-import time
-import math
-import numpy as np
-
-import reader
-
-
-class BeamSearch(object):
-    """
-    Generate sequences by beam search.
-    """
-
-    def __init__(self,
-                 inferer,
-                 trg_dict,
-                 pos_size,
-                 padding_num,
-                 batch_size=1,
-                 beam_size=1,
-                 max_len=100):
-        self.inferer = inferer
-        self.trg_dict = trg_dict
-        self.reverse_trg_dict = reader.get_reverse_dict(trg_dict)
-        self.word_padding = len(trg_dict)
-        self.pos_size = pos_size
-        self.pos_padding = pos_size
-        self.padding_num = padding_num
-        self.win_len = padding_num + 1
-        self.max_len = max_len
-        self.batch_size = batch_size
-        self.beam_size = beam_size
-
-    def get_beam_input(self, batch, sample_list):
-        """
-        Get input for generation at the current iteration.
-        """
-        beam_input = []
-
-        for sample_id in sample_list:
-            for path in self.candidate_path[sample_id]:
-                if len(path['seq']) < self.win_len:
-                    # pad short prefixes and prepend the start token <s>
-                    cur_trg = [self.word_padding] * (
-                        self.win_len - len(path['seq']) - 1
-                    ) + [self.trg_dict['<s>']] + path['seq']
-                    cur_trg_pos = [self.pos_padding] * (
-                        self.win_len - len(path['seq']) - 1) + [0] + range(
-                            1, len(path['seq']) + 1)
-                else:
-                    cur_trg = path['seq'][-self.win_len:]
-                    cur_trg_pos = range(
-                        len(path['seq']) + 1 - self.win_len,
-                        len(path['seq']) + 1)
-
-                beam_input.append(batch[sample_id] + [cur_trg] + [cur_trg_pos])
-
-        return beam_input
-
-    def get_prob(self, beam_input):
-        """
-        Get the probabilities of all possible tokens.
-        """
-        row_list = [j * self.win_len for j in range(len(beam_input))]
-        prob = self.inferer.infer(beam_input, field='value')[row_list, :]
-        return prob
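-
-    # Note: each candidate path carries a cumulative score,
-    #   score(seq + w) = score(seq) + log p(w | prefix),
-    # so beams are compared by the total log-probability of the tokens
-    # generated so far (computed in beam_expand, pruned in beam_shrink).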
- """ - top_words = np.apply_along_axis(self._top_k, 1, prob, self.beam_size) - - candidate_words = [[]] * len(self.candidate_path) - idx = 0 - - for sample_id in sample_list: - for seq_id, path in enumerate(self.candidate_path[sample_id]): - for w in top_words[idx, :]: - score = path['score'] + math.log(prob[idx, w]) - candidate_words[sample_id] = candidate_words[sample_id] + [{ - 'word': w, - 'score': score, - 'seq_id': seq_id - }] - idx = idx + 1 - - return candidate_words - - def beam_shrink(self, candidate_words, sample_list): - """ - Pruning process of the beam search. During the process, beam_size most post possible - sequences are selected for the beam in the next generation. - """ - new_path = [[]] * len(self.candidate_path) - - for sample_id in sample_list: - beam_words = sorted( - candidate_words[sample_id], - key=lambda x: x['score'], - reverse=True)[:self.beam_size] - - complete_seq_min_score = None - complete_path_num = len(self.complete_path[sample_id]) - - if complete_path_num > 0: - complete_seq_min_score = min(self.complete_path[sample_id], - key=lambda x: x['score'])['score'] - if complete_path_num >= self.beam_size: - beam_words_max_score = beam_words[0]['score'] - if beam_words_max_score < complete_seq_min_score: - continue - - for w in beam_words: - - if w['word'] == self.trg_dict['']: - if complete_path_num < self.beam_size or complete_seq_min_score <= w[ - 'score']: - - seq = self.candidate_path[sample_id][w['seq_id']]['seq'] - self.complete_path[sample_id] = self.complete_path[ - sample_id] + [{ - 'seq': seq, - 'score': w['score'] - }] - - if complete_seq_min_score is None or complete_seq_min_score > w[ - 'score']: - complete_seq_min_score = w['score'] - else: - seq = self.candidate_path[sample_id][w['seq_id']]['seq'] + [ - w['word'] - ] - new_path[sample_id] = new_path[sample_id] + [{ - 'seq': seq, - 'score': w['score'] - }] - - return new_path - - def search_one_batch(self, batch): - """ - Perform beam search on one mini-batch. - """ - real_size = len(batch) - self.candidate_path = [[{'seq': [], 'score': 0.}]] * real_size - self.complete_path = [[]] * real_size - sample_list = range(real_size) - - for i in xrange(self.max_len): - beam_input = self.get_beam_input(batch, sample_list) - prob = self.get_prob(beam_input) - - candidate_words = self.beam_expand(prob, sample_list) - new_path = self.beam_shrink(candidate_words, sample_list) - self.candidate_path = new_path - sample_list = [ - sample_id for sample_id in sample_list - if len(new_path[sample_id]) > 0 - ] - - if len(sample_list) == 0: - break - - final_path = [] - for i in xrange(real_size): - top_path = sorted( - self.complete_path[i] + self.candidate_path[i], - key=lambda x: x['score'], - reverse=True)[:self.beam_size] - final_path.append(top_path) - return final_path - - def search(self, infer_data): - """ - Perform beam search on all data. 
- """ - - def _to_sentence(seq): - raw_sentence = [self.reverse_trg_dict[id] for id in seq] - sentence = " ".join(raw_sentence) - return sentence - - for pos in xrange(0, len(infer_data), self.batch_size): - batch = infer_data[pos:min(pos + self.batch_size, len(infer_data))] - self.final_path = self.search_one_batch(batch) - for top_path in self.final_path: - print _to_sentence(top_path[0]['seq']) - sys.stdout.flush() diff --git a/legacy/conv_seq2seq/download.sh b/legacy/conv_seq2seq/download.sh deleted file mode 100644 index b1a924d25b1a10ade9f4be8b504933d1efa01905..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/download.sh +++ /dev/null @@ -1,22 +0,0 @@ -#!/usr/bin/env bash - -CUR_PATH=`pwd` -git clone https://github.com/moses-smt/mosesdecoder.git -git clone https://github.com/rizar/actor-critic-public - -export MOSES=`pwd`/mosesdecoder -export LVSR=`pwd`/actor-critic-public - -cd actor-critic-public/exp/ted -sh create_dataset.sh - -cd $CUR_PATH -mkdir data -cp actor-critic-public/exp/ted/prep/*-* data/ -cp actor-critic-public/exp/ted/vocab.* data/ - -cd data -python ../preprocess.py - -cd .. -rm -rf actor-critic-public mosesdecoder diff --git a/legacy/conv_seq2seq/infer.py b/legacy/conv_seq2seq/infer.py deleted file mode 100644 index c804a84e71ffe920b72064cb05461d72c444ac73..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/infer.py +++ /dev/null @@ -1,236 +0,0 @@ -#coding=utf-8 - -import sys -import argparse -import distutils.util -import gzip - -import paddle.v2 as paddle -from model import conv_seq2seq -from beamsearch import BeamSearch -import reader - - -def parse_args(): - parser = argparse.ArgumentParser( - description="PaddlePaddle Convolutional Seq2Seq") - parser.add_argument( - '--infer_data_path', - type=str, - required=True, - help="Path of the dataset for inference") - parser.add_argument( - '--src_dict_path', - type=str, - required=True, - help='Path of the source dictionary') - parser.add_argument( - '--trg_dict_path', - type=str, - required=True, - help='path of the target dictionary') - parser.add_argument( - '--enc_blocks', type=str, help='Convolution blocks of the encoder') - parser.add_argument( - '--dec_blocks', type=str, help='Convolution blocks of the decoder') - parser.add_argument( - '--emb_size', - type=int, - default=256, - help='Dimension of word embedding. (default: %(default)s)') - parser.add_argument( - '--pos_size', - type=int, - default=200, - help='Total number of the position indexes. (default: %(default)s)') - parser.add_argument( - '--drop_rate', - type=float, - default=0., - help='Dropout rate. (default: %(default)s)') - parser.add_argument( - "--use_bn", - default=False, - type=distutils.util.strtobool, - help="Use batch normalization or not. (default: %(default)s)") - parser.add_argument( - "--use_gpu", - default=False, - type=distutils.util.strtobool, - help="Use gpu or not. (default: %(default)s)") - parser.add_argument( - "--trainer_count", - default=1, - type=int, - help="Trainer number. (default: %(default)s)") - parser.add_argument( - '--max_len', - type=int, - default=100, - help="The maximum length of the sentence to be generated. (default: %(default)s)" - ) - parser.add_argument( - "--batch_size", - default=1, - type=int, - help="Size of a mini-batch. (default: %(default)s)") - parser.add_argument( - "--beam_size", - default=1, - type=int, - help="The width of beam expansion. 
(default: %(default)s)") - parser.add_argument( - "--model_path", - type=str, - required=True, - help="The path of trained model. (default: %(default)s)") - parser.add_argument( - "--is_show_attention", - default=False, - type=distutils.util.strtobool, - help="Whether to show attention weight or not. (default: %(default)s)") - return parser.parse_args() - - -def infer(infer_data_path, - src_dict_path, - trg_dict_path, - model_path, - enc_conv_blocks, - dec_conv_blocks, - emb_dim=256, - pos_size=200, - drop_rate=0., - use_bn=False, - max_len=100, - batch_size=1, - beam_size=1, - is_show_attention=False): - """ - Inference. - - :param infer_data_path: The path of the data for inference. - :type infer_data_path: str - :param src_dict_path: The path of the source dictionary. - :type src_dict_path: str - :param trg_dict_path: The path of the target dictionary. - :type trg_dict_path: str - :param model_path: The path of a trained model. - :type model_path: str - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type dec_conv_blocks: list of tuple - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param use_bn: Whether to use batch normalization or not. False is the default value. - :type use_bn: bool - :param max_len: The maximum length of the sentence to be generated. - :type max_len: int - :param beam_size: The width of beam expansion. - :type beam_size: int - :param is_show_attention: Whether to show attention weight or not. False is the default value. 
- :type is_show_attention: bool - """ - # load dict - src_dict = reader.load_dict(src_dict_path) - trg_dict = reader.load_dict(trg_dict_path) - src_dict_size = src_dict.__len__() - trg_dict_size = trg_dict.__len__() - - prob, weight = conv_seq2seq( - src_dict_size=src_dict_size, - trg_dict_size=trg_dict_size, - pos_size=pos_size, - emb_dim=emb_dim, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - drop_rate=drop_rate, - with_bn=use_bn, - is_infer=True) - - # load parameters - parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_path)) - - padding_list = [context_len - 1 for (size, context_len) in dec_conv_blocks] - padding_num = reduce(lambda x, y: x + y, padding_list) - infer_reader = reader.data_reader( - data_file=infer_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num) - - if is_show_attention: - attention_inferer = paddle.inference.Inference( - output_layer=weight, parameters=parameters) - for i, data in enumerate(infer_reader()): - src_len = len(data[0]) - trg_len = len(data[2]) - attention_weight = attention_inferer.infer( - [data], field='value', flatten_result=False) - attention_weight = [ - weight.reshape((trg_len, src_len)) - for weight in attention_weight - ] - print attention_weight - break - return - - infer_data = [] - for i, raw_data in enumerate(infer_reader()): - infer_data.append([raw_data[0], raw_data[1]]) - - inferer = paddle.inference.Inference( - output_layer=prob, parameters=parameters) - - searcher = BeamSearch( - inferer=inferer, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num, - max_len=max_len, - batch_size=batch_size, - beam_size=beam_size) - - searcher.search(infer_data) - return - - -def main(): - args = parse_args() - enc_conv_blocks = eval(args.enc_blocks) - dec_conv_blocks = eval(args.dec_blocks) - - sys.setrecursionlimit(10000) - - paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count) - - infer( - infer_data_path=args.infer_data_path, - src_dict_path=args.src_dict_path, - trg_dict_path=args.trg_dict_path, - model_path=args.model_path, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - emb_dim=args.emb_size, - pos_size=args.pos_size, - drop_rate=args.drop_rate, - use_bn=args.use_bn, - max_len=args.max_len, - batch_size=args.batch_size, - beam_size=args.beam_size, - is_show_attention=args.is_show_attention) - - -if __name__ == '__main__': - main() diff --git a/legacy/conv_seq2seq/model.py b/legacy/conv_seq2seq/model.py deleted file mode 100644 index c31238f83172fdc3d6240095279d1c953ab272ae..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/model.py +++ /dev/null @@ -1,440 +0,0 @@ -#coding=utf-8 - -import math - -import paddle.v2 as paddle - -__all__ = ["conv_seq2seq"] - - -def gated_conv_with_batchnorm(input, - size, - context_len, - context_start=None, - learning_rate=1.0, - drop_rate=0., - with_bn=False): - """ - Definition of the convolution block. - - :param input: The input of this block. - :type input: LayerOutput - :param size: The dimension of the block's output. - :type size: int - :param context_len: The context length of the convolution. - :type context_len: int - :param context_start: The start position of the context. - :type context_start: int - :param learning_rate: The learning rate factor of the parameters in the block. - The actual learning rate is the product of the global - learning rate and this factor. - :type learning_rate: float - :param drop_rate: Dropout rate. 
- :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The output of the convolution block. - :rtype: LayerOutput - """ - input = paddle.layer.dropout(input=input, dropout_rate=drop_rate) - - context = paddle.layer.mixed( - size=input.size * context_len, - input=paddle.layer.context_projection( - input=input, context_len=context_len, context_start=context_start)) - - raw_conv = paddle.layer.fc( - input=context, - size=size * 2, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt(4.0 * (1.0 - drop_rate) / context.size), - learning_rate=learning_rate), - bias_attr=False) - - if with_bn: - raw_conv = paddle.layer.batch_norm( - input=raw_conv, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=learning_rate)) - - with paddle.layer.mixed(size=size) as conv: - conv += paddle.layer.identity_projection(raw_conv, size=size, offset=0) - - with paddle.layer.mixed(size=size, act=paddle.activation.Sigmoid()) as gate: - gate += paddle.layer.identity_projection( - raw_conv, size=size, offset=size) - - with paddle.layer.mixed(size=size) as gated_conv: - gated_conv += paddle.layer.dotmul_operator(conv, gate) - - return gated_conv - - -def encoder(token_emb, - pos_emb, - conv_blocks=[(256, 3)] * 5, - num_attention=3, - drop_rate=0., - with_bn=False): - """ - Definition of the encoder. - - :param token_emb: The embedding vector of the input token. - :type token_emb: LayerOutput - :param pos_emb: The embedding vector of the input token's position. - :type pos_emb: LayerOutput - :param conv_blocks: The scale list of the convolution blocks. Each element of - the list contains output dimension and context length of - the corresponding convolution block. - :type conv_blocks: list of tuple - :param num_attention: The total number of the attention modules used in the decoder. - :type num_attention: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The input token encoding. 
- :rtype: LayerOutput - """ - embedding = paddle.layer.addto( - input=[token_emb, pos_emb], - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - proj_size = conv_blocks[0][0] - block_input = paddle.layer.fc( - input=embedding, - size=proj_size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / embedding.size), - learning_rate=1.0 / (2.0 * num_attention)), - bias_attr=True, ) - - for (size, context_len) in conv_blocks: - if block_input.size == size: - residual = block_input - else: - residual = paddle.layer.fc( - input=block_input, - size=size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=1.0 / - (2.0 * num_attention)), - bias_attr=True) - - gated_conv = gated_conv_with_batchnorm( - input=block_input, - size=size, - context_len=context_len, - learning_rate=1.0 / (2.0 * num_attention), - drop_rate=drop_rate, - with_bn=with_bn) - - with paddle.layer.mixed(size=size) as block_output: - block_output += paddle.layer.identity_projection(residual) - block_output += paddle.layer.identity_projection(gated_conv) - - # halve the variance of the sum - block_output = paddle.layer.slope_intercept( - input=block_output, slope=math.sqrt(0.5)) - - block_input = block_output - - emb_dim = embedding.size - encoded_vec = paddle.layer.fc( - input=block_output, - size=emb_dim, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param(learning_rate=1.0 / (2.0 * num_attention)), - bias_attr=True) - - encoded_sum = paddle.layer.addto(input=[encoded_vec, embedding]) - - # halve the variance of the sum - encoded_sum = paddle.layer.slope_intercept( - input=encoded_sum, slope=math.sqrt(0.5)) - - return encoded_vec, encoded_sum - - -def attention(decoder_state, cur_embedding, encoded_vec, encoded_sum): - """ - Definition of the attention. - - :param decoder_state: The hidden state of the decoder. - :type decoder_state: LayerOutput - :param cur_embedding: The embedding vector of the current token. - :type cur_embedding: LayerOutput - :param encoded_vec: The source token encoding. - :type encoded_vec: LayerOutput - :param encoded_sum: The sum of the source token's encoding and embedding. - :type encoded_sum: LayerOutput - :return: A context vector and the attention weight. 
- :rtype: LayerOutput - """ - residual = decoder_state - - state_size = decoder_state.size - emb_dim = cur_embedding.size - with paddle.layer.mixed(size=emb_dim, bias_attr=True) as state_summary: - state_summary += paddle.layer.full_matrix_projection(decoder_state) - state_summary += paddle.layer.identity_projection(cur_embedding) - - # halve the variance of the sum - state_summary = paddle.layer.slope_intercept( - input=state_summary, slope=math.sqrt(0.5)) - - expanded = paddle.layer.expand(input=state_summary, expand_as=encoded_vec) - - m = paddle.layer.dot_prod(input1=expanded, input2=encoded_vec) - - attention_weight = paddle.layer.fc(input=m, - size=1, - act=paddle.activation.SequenceSoftmax(), - bias_attr=False) - - scaled = paddle.layer.scaling(weight=attention_weight, input=encoded_sum) - - attended = paddle.layer.pooling( - input=scaled, pooling_type=paddle.pooling.Sum()) - - attended_proj = paddle.layer.fc(input=attended, - size=state_size, - act=paddle.activation.Linear(), - bias_attr=True) - - attention_result = paddle.layer.addto(input=[attended_proj, residual]) - - # halve the variance of the sum - attention_result = paddle.layer.slope_intercept( - input=attention_result, slope=math.sqrt(0.5)) - return attention_result, attention_weight - - -def decoder(token_emb, - pos_emb, - encoded_vec, - encoded_sum, - dict_size, - conv_blocks=[(256, 3)] * 3, - drop_rate=0., - with_bn=False): - """ - Definition of the decoder. - - :param token_emb: The embedding vector of the input token. - :type token_emb: LayerOutput - :param pos_emb: The embedding vector of the input token's position. - :type pos_emb: LayerOutput - :param encoded_vec: The source token encoding. - :type encoded_vec: LayerOutput - :param encoded_sum: The sum of the source token's encoding and embedding. - :type encoded_sum: LayerOutput - :param dict_size: The size of the target dictionary. - :type dict_size: int - :param conv_blocks: The scale list of the convolution blocks. Each element - of the list contains output dimension and context length - of the corresponding convolution block. - :type conv_blocks: list of tuple - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default - value. - :type with_bn: bool - :return: The probability of the predicted token and the attention weights. 
- :rtype: LayerOutput - """ - - def attention_step(decoder_state, cur_embedding, encoded_vec, encoded_sum): - conditional = attention( - decoder_state=decoder_state, - cur_embedding=cur_embedding, - encoded_vec=encoded_vec, - encoded_sum=encoded_sum) - return conditional - - embedding = paddle.layer.addto( - input=[token_emb, pos_emb], - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - proj_size = conv_blocks[0][0] - block_input = paddle.layer.fc( - input=embedding, - size=proj_size, - act=paddle.activation.Linear(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / embedding.size)), - bias_attr=True, ) - - weight = [] - for (size, context_len) in conv_blocks: - if block_input.size == size: - residual = block_input - else: - residual = paddle.layer.fc(input=block_input, - size=size, - act=paddle.activation.Linear(), - bias_attr=True) - - decoder_state = gated_conv_with_batchnorm( - input=block_input, - size=size, - context_len=context_len, - context_start=0, - drop_rate=drop_rate, - with_bn=with_bn) - - group_inputs = [ - decoder_state, - embedding, - paddle.layer.StaticInput(input=encoded_vec), - paddle.layer.StaticInput(input=encoded_sum), - ] - - conditional, attention_weight = paddle.layer.recurrent_group( - step=attention_step, input=group_inputs) - weight.append(attention_weight) - - block_output = paddle.layer.addto(input=[conditional, residual]) - - # halve the variance of the sum - block_output = paddle.layer.slope_intercept( - input=block_output, slope=math.sqrt(0.5)) - - block_input = block_output - - out_emb_dim = embedding.size - block_output = paddle.layer.fc( - input=block_output, - size=out_emb_dim, - act=paddle.activation.Linear(), - layer_attr=paddle.attr.Extra(drop_rate=drop_rate)) - - decoder_out = paddle.layer.fc( - input=block_output, - size=dict_size, - act=paddle.activation.Softmax(), - param_attr=paddle.attr.Param( - initial_mean=0., - initial_std=math.sqrt((1.0 - drop_rate) / block_output.size)), - bias_attr=True) - - return decoder_out, weight - - -def conv_seq2seq(src_dict_size, - trg_dict_size, - pos_size, - emb_dim, - enc_conv_blocks=[(256, 3)] * 5, - dec_conv_blocks=[(256, 3)] * 3, - drop_rate=0., - with_bn=False, - is_infer=False): - """ - Definition of convolutional sequence-to-sequence network. - - :param src_dict_size: The size of the source dictionary. - :type src_dict_size: int - :param trg_dict_size: The size of the target dictionary. - :type trg_dict_size: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. Each element - of the list contains output dimension and context length of the - corresponding convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. Each element - of the list contains output dimension and context length of the - corresponding convolution block. - :type dec_conv_blocks: list of tuple - :param drop_rate: Dropout rate. - :type drop_rate: float - :param with_bn: Whether to use batch normalization or not. False is the default value. - :type with_bn: bool - :param is_infer: Whether infer or not. - :type is_infer: bool - :return: Cost or output layer. 
-        :rtype: LayerOutput
-    """
-    src = paddle.layer.data(
-        name='src_word',
-        type=paddle.data_type.integer_value_sequence(src_dict_size))
-    src_pos = paddle.layer.data(
-        name='src_word_pos',
-        type=paddle.data_type.integer_value_sequence(pos_size +
-                                                     1))  # one for padding
-
-    src_emb = paddle.layer.embedding(
-        input=src,
-        size=emb_dim,
-        name='src_word_emb',
-        param_attr=paddle.attr.Param(
-            initial_mean=0., initial_std=0.1))
-    src_pos_emb = paddle.layer.embedding(
-        input=src_pos,
-        size=emb_dim,
-        name='src_pos_emb',
-        param_attr=paddle.attr.Param(
-            initial_mean=0., initial_std=0.1))
-
-    num_attention = len(dec_conv_blocks)
-    encoded_vec, encoded_sum = encoder(
-        token_emb=src_emb,
-        pos_emb=src_pos_emb,
-        conv_blocks=enc_conv_blocks,
-        num_attention=num_attention,
-        drop_rate=drop_rate,
-        with_bn=with_bn)
-
-    trg = paddle.layer.data(
-        name='trg_word',
-        type=paddle.data_type.integer_value_sequence(trg_dict_size +
-                                                     1))  # one for padding
-    trg_pos = paddle.layer.data(
-        name='trg_word_pos',
-        type=paddle.data_type.integer_value_sequence(pos_size +
-                                                     1))  # one for padding
-
-    trg_emb = paddle.layer.embedding(
-        input=trg,
-        size=emb_dim,
-        name='trg_word_emb',
-        param_attr=paddle.attr.Param(
-            initial_mean=0., initial_std=0.1))
-    trg_pos_emb = paddle.layer.embedding(
-        input=trg_pos,
-        size=emb_dim,
-        name='trg_pos_emb',
-        param_attr=paddle.attr.Param(
-            initial_mean=0., initial_std=0.1))
-
-    decoder_out, weight = decoder(
-        token_emb=trg_emb,
-        pos_emb=trg_pos_emb,
-        encoded_vec=encoded_vec,
-        encoded_sum=encoded_sum,
-        dict_size=trg_dict_size,
-        conv_blocks=dec_conv_blocks,
-        drop_rate=drop_rate,
-        with_bn=with_bn)
-
-    if is_infer:
-        return decoder_out, weight
-
-    trg_next_word = paddle.layer.data(
-        name='trg_next_word',
-        type=paddle.data_type.integer_value_sequence(trg_dict_size))
-    cost = paddle.layer.classification_cost(
-        input=decoder_out, label=trg_next_word)
-
-    return cost
diff --git a/legacy/conv_seq2seq/preprocess.py b/legacy/conv_seq2seq/preprocess.py deleted file mode 100644 index 1d5c7cdd7b5cc91e28854fa0bbeeffc9dcbe4e5c..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/preprocess.py +++ /dev/null @@ -1,30 +0,0 @@
-#coding=utf-8
-
-import cPickle
-
-
-def concat_file(file1, file2, dst_file):
-    with open(dst_file, 'w') as dst:
-        with open(file1) as f1:
-            with open(file2) as f2:
-                for i, (line1, line2) in enumerate(zip(f1, f2)):
-                    line1 = line1.strip()
-                    line = line1 + '\t' + line2
-                    dst.write(line)
-
-
-if __name__ == '__main__':
-    concat_file('dev.de-en.de', 'dev.de-en.en', 'dev')
-    concat_file('test.de-en.de', 'test.de-en.en', 'test')
-    concat_file('train.de-en.de', 'train.de-en.en', 'train')
-
-    src_dict = cPickle.load(open('vocab.de'))
-    trg_dict = cPickle.load(open('vocab.en'))
-
-    # reserve the first three ids for the start token <s>, the end
-    # token <e> and the unknown-word token UNK
-    with open('src_dict', 'w') as f:
-        f.write('<s>\n<e>\nUNK\n')
-        f.writelines('\n'.join(src_dict.keys()))
-
-    with open('trg_dict', 'w') as f:
-        f.write('<s>\n<e>\nUNK\n')
-        f.writelines('\n'.join(trg_dict.keys()))
diff --git a/legacy/conv_seq2seq/reader.py b/legacy/conv_seq2seq/reader.py deleted file mode 100644 index ad420af5faade1cd5ee7ef947f7f8920ce6a8bdb..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/reader.py +++ /dev/null @@ -1,67 +0,0 @@
-#coding=utf-8
-
-import random
-
-
-def load_dict(dict_file):
-    word_dict = dict()
-    with open(dict_file, 'r') as f:
-        for i, line in enumerate(f):
-            w = line.strip().split()[0]
-            word_dict[w] = i
-    return word_dict
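-
-
-# Expected dictionary file layout (one token per line; the line number
-# becomes the word id), as produced by preprocess.py above:
-#
-#     <s>
-#     <e>
-#     UNK
-#     ...
-#
-# so load_dict('src_dict')['<s>'] == 0 and load_dict('src_dict')['UNK'] == 2.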
return reverse_dict - - -def load_data(data_file, src_dict, trg_dict): - UNK_IDX = src_dict['UNK'] - with open(data_file, 'r') as f: - for line in f: - line_split = line.strip().split('\t') - if len(line_split) < 2: - continue - src, trg = line_split - src_words = src.strip().split() - trg_words = trg.strip().split() - src_seq = [src_dict.get(w, UNK_IDX) for w in src_words] - trg_seq = [trg_dict.get(w, UNK_IDX) for w in trg_words] - yield src_seq, trg_seq - - -def data_reader(data_file, src_dict, trg_dict, pos_size, padding_num): - def reader(): - UNK_IDX = src_dict['UNK'] - word_padding = trg_dict.__len__() - pos_padding = pos_size - - def _get_pos(pos_list, pos_size, pos_padding): - return [pos if pos < pos_size else pos_padding for pos in pos_list] - - with open(data_file, 'r') as f: - for line in f: - line_split = line.strip().split('\t') - if len(line_split) != 2: - continue - src, trg = line_split - src = src.strip().split() - src_word = [src_dict.get(w, UNK_IDX) for w in src] - src_word_pos = range(len(src_word)) - src_word_pos = _get_pos(src_word_pos, pos_size, pos_padding) - - trg = trg.strip().split() - trg_word = [trg_dict[''] - ] + [trg_dict.get(w, UNK_IDX) for w in trg] - trg_word_pos = range(len(trg_word)) - trg_word_pos = _get_pos(trg_word_pos, pos_size, pos_padding) - - trg_next_word = trg_word[1:] + [trg_dict['']] - trg_word = [word_padding] * padding_num + trg_word - trg_word_pos = [pos_padding] * padding_num + trg_word_pos - trg_next_word = trg_next_word + [trg_dict['']] * padding_num - yield src_word, src_word_pos, trg_word, trg_word_pos, trg_next_word - - return reader diff --git a/legacy/conv_seq2seq/train.py b/legacy/conv_seq2seq/train.py deleted file mode 100644 index 4bd9a1af675ada5820bb375938a4675e6e71fbe1..0000000000000000000000000000000000000000 --- a/legacy/conv_seq2seq/train.py +++ /dev/null @@ -1,263 +0,0 @@ -#coding=utf-8 - -import os -import sys -import time -import argparse -import distutils.util -import gzip -import numpy as np - -import paddle.v2 as paddle -from model import conv_seq2seq -import reader - - -def parse_args(): - parser = argparse.ArgumentParser( - description="PaddlePaddle Convolutional Seq2Seq") - parser.add_argument( - '--train_data_path', - type=str, - required=True, - help="Path of the training set") - parser.add_argument( - '--test_data_path', type=str, help='Path of the test set') - parser.add_argument( - '--src_dict_path', - type=str, - required=True, - help='Path of source dictionary') - parser.add_argument( - '--trg_dict_path', - type=str, - required=True, - help='Path of target dictionary') - parser.add_argument( - '--enc_blocks', type=str, help='Convolution blocks of the encoder') - parser.add_argument( - '--dec_blocks', type=str, help='Convolution blocks of the decoder') - parser.add_argument( - '--emb_size', - type=int, - default=256, - help='Dimension of word embedding. (default: %(default)s)') - parser.add_argument( - '--pos_size', - type=int, - default=200, - help='Total number of the position indexes. (default: %(default)s)') - parser.add_argument( - '--drop_rate', - type=float, - default=0., - help='Dropout rate. (default: %(default)s)') - parser.add_argument( - "--use_bn", - default=False, - type=distutils.util.strtobool, - help="Use batch normalization or not. (default: %(default)s)") - parser.add_argument( - "--use_gpu", - default=False, - type=distutils.util.strtobool, - help="Use gpu or not. (default: %(default)s)") - parser.add_argument( - "--trainer_count", - default=1, - type=int, - help="Trainer number. 
(default: %(default)s)") - parser.add_argument( - '--batch_size', - type=int, - default=32, - help="Size of a mini-batch. (default: %(default)s)") - parser.add_argument( - '--num_passes', - type=int, - default=15, - help="Number of passes to train. (default: %(default)s)") - return parser.parse_args() - - -def create_reader(padding_num, - train_data_path, - test_data_path=None, - src_dict=None, - trg_dict=None, - pos_size=200, - batch_size=32): - - train_reader = paddle.batch( - reader=paddle.reader.shuffle( - reader=reader.data_reader( - data_file=train_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num), - buf_size=10240), - batch_size=batch_size) - - test_reader = None - if test_data_path: - test_reader = paddle.batch( - reader=paddle.reader.shuffle( - reader=reader.data_reader( - data_file=test_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - padding_num=padding_num), - buf_size=10240), - batch_size=batch_size) - - return train_reader, test_reader - - -def train(train_data_path, - test_data_path, - src_dict_path, - trg_dict_path, - enc_conv_blocks, - dec_conv_blocks, - emb_dim=256, - pos_size=200, - drop_rate=0., - use_bn=False, - batch_size=32, - num_passes=15): - """ - Train the convolution sequence-to-sequence model. - - :param train_data_path: The path of the training set. - :type train_data_path: str - :param test_data_path: The path of the test set. - :type test_data_path: str - :param src_dict_path: The path of the source dictionary. - :type src_dict_path: str - :param trg_dict_path: The path of the target dictionary. - :type trg_dict_path: str - :param enc_conv_blocks: The scale list of the encoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type enc_conv_blocks: list of tuple - :param dec_conv_blocks: The scale list of the decoder's convolution blocks. And each element of - the list contains output dimension and context length of the corresponding - convolution block. - :type dec_conv_blocks: list of tuple - :param emb_dim: The dimension of the embedding vector. - :type emb_dim: int - :param pos_size: The total number of the position indexes, which means - the maximum value of the index is pos_size - 1. - :type pos_size: int - :param drop_rate: Dropout rate. - :type drop_rate: float - :param use_bn: Whether to use batch normalization or not. False is the default value. - :type use_bn: bool - :param batch_size: The size of a mini-batch. - :type batch_size: int - :param num_passes: The total number of the passes to train. 
- :type num_passes: int - """ - # load dict - src_dict = reader.load_dict(src_dict_path) - trg_dict = reader.load_dict(trg_dict_path) - src_dict_size = src_dict.__len__() - trg_dict_size = trg_dict.__len__() - - optimizer = paddle.optimizer.Adam(learning_rate=1e-3, ) - - cost = conv_seq2seq( - src_dict_size=src_dict_size, - trg_dict_size=trg_dict_size, - pos_size=pos_size, - emb_dim=emb_dim, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - drop_rate=drop_rate, - with_bn=use_bn, - is_infer=False) - - # create parameters and trainer - parameters = paddle.parameters.create(cost) - trainer = paddle.trainer.SGD(cost=cost, - parameters=parameters, - update_equation=optimizer) - - padding_list = [context_len - 1 for (size, context_len) in dec_conv_blocks] - padding_num = reduce(lambda x, y: x + y, padding_list) - train_reader, test_reader = create_reader( - padding_num=padding_num, - train_data_path=train_data_path, - test_data_path=test_data_path, - src_dict=src_dict, - trg_dict=trg_dict, - pos_size=pos_size, - batch_size=batch_size) - - feeding = { - 'src_word': 0, - 'src_word_pos': 1, - 'trg_word': 2, - 'trg_word_pos': 3, - 'trg_next_word': 4 - } - - # create event handler - def event_handler(event): - if isinstance(event, paddle.event.EndIteration): - if event.batch_id % 20 == 0: - cur_time = time.strftime('%Y.%m.%d %H:%M:%S', time.localtime()) - print "[%s]: Pass: %d, Batch: %d, TrainCost: %f, %s" % ( - cur_time, event.pass_id, event.batch_id, event.cost, - event.metrics) - sys.stdout.flush() - - if isinstance(event, paddle.event.EndPass): - if test_reader is not None: - cur_time = time.strftime('%Y.%m.%d %H:%M:%S', time.localtime()) - result = trainer.test(reader=test_reader, feeding=feeding) - print "[%s]: Pass: %d, TestCost: %f, %s" % ( - cur_time, event.pass_id, result.cost, result.metrics) - sys.stdout.flush() - with gzip.open("output/params.pass-%d.tar.gz" % event.pass_id, - 'w') as f: - trainer.save_parameter_to_tar(f) - - if not os.path.exists('output'): - os.mkdir('output') - - trainer.train( - reader=train_reader, - event_handler=event_handler, - num_passes=num_passes, - feeding=feeding) - - -def main(): - args = parse_args() - enc_conv_blocks = eval(args.enc_blocks) - dec_conv_blocks = eval(args.dec_blocks) - - sys.setrecursionlimit(10000) - - paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count) - - train( - train_data_path=args.train_data_path, - test_data_path=args.test_data_path, - src_dict_path=args.src_dict_path, - trg_dict_path=args.trg_dict_path, - enc_conv_blocks=enc_conv_blocks, - dec_conv_blocks=dec_conv_blocks, - emb_dim=args.emb_size, - pos_size=args.pos_size, - drop_rate=args.drop_rate, - use_bn=args.use_bn, - batch_size=args.batch_size, - num_passes=args.num_passes) - - -if __name__ == '__main__': - main() diff --git a/legacy/ctr/README.cn.md b/legacy/ctr/README.cn.md deleted file mode 100644 index d717264c46529c4ca3be6500983558b0384a7d77..0000000000000000000000000000000000000000 --- a/legacy/ctr/README.cn.md +++ /dev/null @@ -1,369 +0,0 @@ -运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 - ---- - -# 点击率预估 - -以下是本例目录包含的文件以及对应说明: - -``` -├── README.md # 本教程markdown 文档 -├── dataset.md # 数据集处理教程 -├── images # 本教程图片目录 -│   ├── lr_vs_dnn.jpg -│   └── wide_deep.png -├── infer.py # 预测脚本 -├── network_conf.py # 模型网络配置 -├── reader.py # data reader -├── train.py # 训练脚本 -└── utils.py # helper 
functions -└── avazu_data_processer.py # 示例数据预处理脚本 -``` - -## 背景介绍 - -CTR(Click-Through Rate,点击率预估)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] -是对用户点击一个特定链接的概率做出预测,是广告投放过程中的一个重要环节。精准的点击率预估对在线广告系统收益最大化具有重要意义。 - -当有多个广告位时,CTR 预估一般会作为排序的基准,比如在搜索引擎的广告系统里,当用户输入一个带商业价值的搜索词(query)时,系统大体上会执行下列步骤来展示广告: - -1. 获取与用户搜索词相关的广告集合 -2. 业务规则和相关性过滤 -3. 根据拍卖机制和 CTR 排序 -4. 展出广告 - -可以看到,CTR 在最终排序中起到了很重要的作用。 - -### 发展阶段 -在业内,CTR 模型经历了如下的发展阶段: - -- Logistic Regression(LR) / GBDT + 特征工程 -- LR + DNN 特征 -- DNN + 特征工程 - -在发展早期时 LR 一统天下,但最近 DNN 模型由于其强大的学习能力和逐渐成熟的性能优化, -逐渐地接过 CTR 预估任务的大旗。 - - -### LR vs DNN - -下图展示了 LR 和一个 \(3x2\) 的 DNN 模型的结构: - -

-Figure 1. LR 和 DNN 模型结构对比
- -LR 的蓝色箭头部分可以直接类比到 DNN 中对应的结构,可以看到 LR 和 DNN 有一些共通之处(比如权重累加), -但前者的模型复杂度在相同输入维度下比后者可能低很多(从某方面讲,模型越复杂,越有潜力学习到更复杂的信息); -如果 LR 要达到匹敌 DNN 的学习能力,必须增加输入的维度,也就是增加特征的数量, -这也就是为何 LR 和大规模的特征工程必须绑定在一起的原因。 - -LR 对于 DNN 模型的优势是对大规模稀疏特征的容纳能力,包括内存和计算量等方面,工业界都有非常成熟的优化方法; -而 DNN 模型具有自己学习新特征的能力,一定程度上能够提升特征使用的效率, -这使得 DNN 模型在同样规模特征的情况下,更有可能达到更好的学习效果。 - -本文后面的章节会演示如何使用 PaddlePaddle 编写一个结合两者优点的模型。 - - -## 数据和任务抽象 - -我们可以将 `click` 作为学习目标,任务可以有以下几种方案: - -1. 直接学习 click,0,1 作二元分类 -2. Learning to rank, 具体用 pairwise rank(标签 1>0)或者 listwise rank -3. 统计每个广告的点击率,将同一个 query 下的广告两两组合,点击率高的>点击率低的,做 rank 或者分类 - -我们直接使用第一种方法做分类任务。 - -我们使用 Kaggle 上 `Click-through rate prediction` 任务的数据集\[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\] 来演示本例中的模型。 - -具体的特征处理方法参看 [data process](./dataset.md)。 - -本教程中演示模型的输入格式如下: - -``` -# \t \t click -1 23 190 \t 230:0.12 3421:0.9 23451:0.12 \t 0 -23 231 \t 1230:0.12 13421:0.9 \t 1 -``` - -详细的格式描述如下: - -- `dnn input ids` 采用 one-hot 表示,只需要填写值为1的ID(注意这里不是变长输入) -- `lr input sparse values` 使用了 `ID:VALUE` 的表示,值部分最好规约到值域 `[-1, 1]`。 - -此外,模型训练时需要传入一个文件描述 dnn 和 lr两个子模型的输入维度,文件的格式如下: - -``` -dnn_input_dim: -lr_input_dim: -``` - -其中, `` 表示一个整型数值。 - -本目录下的 `avazu_data_processor.py` 可以对下载的演示数据集\[[2](#参考文档)\] 进行处理,具体使用方法参考如下说明: - -``` -usage: avazu_data_processer.py [-h] --data_path DATA_PATH --output_dir - OUTPUT_DIR - [--num_lines_to_detect NUM_LINES_TO_DETECT] - [--test_set_size TEST_SET_SIZE] - [--train_size TRAIN_SIZE] - -PaddlePaddle CTR example - -optional arguments: - -h, --help show this help message and exit - --data_path DATA_PATH - path of the Avazu dataset - --output_dir OUTPUT_DIR - directory to output - --num_lines_to_detect NUM_LINES_TO_DETECT - number of records to detect dataset's meta info - --test_set_size TEST_SET_SIZE - size of the validation dataset(default: 10000) - --train_size TRAIN_SIZE - size of the trainset (default: 100000) -``` - -- `data_path` 是待处理的数据路径 -- `output_dir` 生成数据的输出路径 -- `num_lines_to_detect` 预先扫描数据生成ID的个数,这里是扫描的文件行数 -- `test_set_size` 生成测试集的行数 -- `train_size` 生成训练姐的行数 - -## Wide & Deep Learning Model - -谷歌在 16 年提出了 Wide & Deep Learning 的模型框架,用于融合适合学习抽象特征的 DNN 和 适用于大规模稀疏特征的 LR 两种模型的优点。 - - -### 模型简介 - -Wide & Deep Learning Model\[[3](#参考文献)\] 可以作为一种相对成熟的模型框架使用, -在 CTR 预估的任务中工业界也有一定的应用,因此本文将演示使用此模型来完成 CTR 预估的任务。 - -模型结构如下: - -

-Figure 2. Wide & Deep Model
- -模型上边的 Wide 部分,可以容纳大规模系数特征,并且对一些特定的信息(比如 ID)有一定的记忆能力; -而模型下边的 Deep 部分,能够学习特征间的隐含关系,在相同数量的特征下有更好的学习和推导能力。 - - -### 编写模型输入 - -模型只接受 3 个输入,分别是 - -- `dnn_input` ,也就是 Deep 部分的输入 -- `lr_input` ,也就是 Wide 部分的输入 -- `click` , 点击与否,作为二分类模型学习的标签 - -```python -dnn_merged_input = layer.data( - name='dnn_input', - type=paddle.data_type.sparse_binary_vector(data_meta_info['dnn_input'])) - -lr_merged_input = layer.data( - name='lr_input', - type=paddle.data_type.sparse_binary_vector(data_meta_info['lr_input'])) - -click = paddle.layer.data(name='click', type=dtype.dense_vector(1)) -``` - -### 编写 Wide 部分 - -Wide 部分直接使用了 LR 模型,但激活函数改成了 `RELU` 来加速 - -```python -def build_lr_submodel(): - fc = layer.fc( - input=lr_merged_input, size=1, name='lr', act=paddle.activation.Relu()) - return fc -``` - -### 编写 Deep 部分 - -Deep 部分使用了标准的多层前向传导的 DNN 模型 - -```python -def build_dnn_submodel(dnn_layer_dims): - dnn_embedding = layer.fc(input=dnn_merged_input, size=dnn_layer_dims[0]) - _input_layer = dnn_embedding - for i, dim in enumerate(dnn_layer_dims[1:]): - fc = layer.fc( - input=_input_layer, - size=dim, - act=paddle.activation.Relu(), - name='dnn-fc-%d' % i) - _input_layer = fc - return _input_layer -``` - -### 两者融合 - -两个 submodel 的最上层输出加权求和得到整个模型的输出,输出部分使用 `sigmoid` 作为激活函数,得到区间 (0,1) 的预测值, -来逼近训练数据中二元类别的分布,并最终作为 CTR 预估的值使用。 - -```python -# conbine DNN and LR submodels -def combine_submodels(dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - fc = layer.fc( - input=merge_layer, - size=1, - name='output', - # use sigmoid function to approximate ctr, wihch is a float value between 0 and 1. - act=paddle.activation.Sigmoid()) - return fc -``` - -### 训练任务的定义 -```python -dnn = build_dnn_submodel(dnn_layer_dims) -lr = build_lr_submodel() -output = combine_submodels(dnn, lr) - -# ============================================================================== -# cost and train period -# ============================================================================== -classification_cost = paddle.layer.multi_binary_label_cross_entropy_cost( - input=output, label=click) - - -paddle.init(use_gpu=False, trainer_count=11) - -params = paddle.parameters.create(classification_cost) - -optimizer = paddle.optimizer.Momentum(momentum=0) - -trainer = paddle.trainer.SGD( - cost=classification_cost, parameters=params, update_equation=optimizer) - -dataset = AvazuDataset(train_data_path, n_records_as_test=test_set_size) - -def event_handler(event): - if isinstance(event, paddle.event.EndIteration): - if event.batch_id % 100 == 0: - logging.warning("Pass %d, Samples %d, Cost %f" % ( - event.pass_id, event.batch_id * batch_size, event.cost)) - - if event.batch_id % 1000 == 0: - result = trainer.test( - reader=paddle.batch(dataset.test, batch_size=1000), - feeding=field_index) - logging.warning("Test %d-%d, Cost %f" % (event.pass_id, event.batch_id, - result.cost)) - - -trainer.train( - reader=paddle.batch( - paddle.reader.shuffle(dataset.train, buf_size=500), - batch_size=batch_size), - feeding=field_index, - event_handler=event_handler, - num_passes=100) -``` -## 运行训练和测试 -训练模型需要如下步骤: - -1. 准备训练数据 - 1. 从 [Kaggle CTR](https://www.kaggle.com/c/avazu-ctr-prediction/data) 下载 train.gz - 2. 解压 train.gz 得到 train.txt - 3. `mkdir -p output; python avazu_data_processer.py --data_path train.txt --output_dir output --num_lines_to_detect 1000 --test_set_size 100` 生成演示数据 -2. 
执行 `python train.py --train_data_path ./output/train.txt --test_data_path ./output/test.txt --data_meta_file ./output/data.meta.txt --model_type=0` 开始训练 - -上面第2个步骤可以为 `train.py` 填充命令行参数来定制模型的训练过程,具体的命令行参数及用法如下 - -``` -usage: train.py [-h] --train_data_path TRAIN_DATA_PATH - [--test_data_path TEST_DATA_PATH] [--batch_size BATCH_SIZE] - [--num_passes NUM_PASSES] - [--model_output_prefix MODEL_OUTPUT_PREFIX] --data_meta_file - DATA_META_FILE --model_type MODEL_TYPE - -PaddlePaddle CTR example - -optional arguments: - -h, --help show this help message and exit - --train_data_path TRAIN_DATA_PATH - path of training dataset - --test_data_path TEST_DATA_PATH - path of testing dataset - --batch_size BATCH_SIZE - size of mini-batch (default:10000) - --num_passes NUM_PASSES - number of passes to train - --model_output_prefix MODEL_OUTPUT_PREFIX - prefix of path for model to store (default: - ./ctr_models) - --data_meta_file DATA_META_FILE - path of data meta info file - --model_type MODEL_TYPE - model type, classification: 0, regression 1 (default - classification) -``` - -- `train_data_path` : 训练集的路径 -- `test_data_path` : 测试集的路径 -- `num_passes`: 模型训练多少轮 -- `data_meta_file`: 参考[数据和任务抽象](### 数据和任务抽象)的描述。 -- `model_type`: 模型分类或回归 - - -## 用训好的模型做预测 -训好的模型可以用来预测新的数据, 预测数据的格式为 - -``` -# \t -1 23 190 \t 230:0.12 3421:0.9 23451:0.12 -23 231 \t 1230:0.12 13421:0.9 -``` - -这里与训练数据的格式唯一不同的地方,就是没有标签,也就是训练数据中第3列 `click` 对应的数值。 - -`infer.py` 的使用方法如下 - -``` -usage: infer.py [-h] --model_gz_path MODEL_GZ_PATH --data_path DATA_PATH - --prediction_output_path PREDICTION_OUTPUT_PATH - [--data_meta_path DATA_META_PATH] --model_type MODEL_TYPE - -PaddlePaddle CTR example - -optional arguments: - -h, --help show this help message and exit - --model_gz_path MODEL_GZ_PATH - path of model parameters gz file - --data_path DATA_PATH - path of the dataset to infer - --prediction_output_path PREDICTION_OUTPUT_PATH - path to output the prediction - --data_meta_path DATA_META_PATH - path of trainset's meta info, default is ./data.meta - --model_type MODEL_TYPE - model type, classification: 0, regression 1 (default - classification) -``` - -- `model_gz_path_model`:用 `gz` 压缩过的模型路径 -- `data_path` : 需要预测的数据路径 -- `prediction_output_paht`:预测输出的路径 -- `data_meta_file` :参考[数据和任务抽象](### 数据和任务抽象)的描述。 -- `model_type` :分类或回归 - -示例数据可以用如下命令预测 - -``` -python infer.py --model_gz_path --data_path output/infer.txt --prediction_output_path predictions.txt --data_meta_path data.meta.txt -``` - -最终的预测结果位于 `predictions.txt`。 - -## 参考文献 -1. -2. -3. Cheng H T, Koc L, Harmsen J, et al. [Wide & deep learning for recommender systems](https://arxiv.org/pdf/1606.07792.pdf)[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10. diff --git a/legacy/ctr/README.md b/legacy/ctr/README.md deleted file mode 100644 index 9ace483be6126b31e064ce3014cea1b08664f8cf..0000000000000000000000000000000000000000 --- a/legacy/ctr/README.md +++ /dev/null @@ -1,343 +0,0 @@ -The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). - ---- - -# Click-Through Rate Prediction - -## Introduction - -CTR(Click-Through Rate)\[[1](https://en.wikipedia.org/wiki/Click-through_rate)\] -is a prediction of the probability that a user clicks on an advertisement. This model is widely used in the advertisement industry. 
Accurate click-through rate estimates are important for maximizing online advertising revenue.
-
-When there are multiple ad slots, CTR estimates are generally used as a basis for ranking. For example, in a search engine's ad system, when a user enters a query, the system typically performs the following steps to show relevant ads.
-
-1. Retrieve the set of ads associated with the user's search term.
-2. Filter by business rules and relevance.
-3. Rank by auction mechanism and CTR.
-4. Show the ads.
-
-Here, CTR plays a crucial role.
-
-### Brief history
-Historically, CTR prediction models have evolved as follows.
-
-- Logistic Regression (LR) / Gradient Boosting Decision Trees (GBDT) + feature engineering
-- LR + Deep Neural Network (DNN) features
-- DNN + feature engineering
-
-In the early stages LR dominated, but in recent years DNN-based models, with their strong learning capacity, have largely taken over CTR prediction.
-
-
-### LR vs DNN
-
-The following figure shows the structures of the LR model and a DNN model:

-Figure 1. LR and DNN model structure comparison
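-To make the comparison concrete, here is a minimal sketch of the two structures in the `paddle.v2` layer API used throughout this example. The input dimension and the hidden-layer sizes are illustrative assumptions, not values taken from this repository's code.
-
-```python
-import paddle.v2 as paddle
-from paddle.v2 import layer
-
-# the v2 API requires initialization before layers are built
-paddle.init(use_gpu=False, trainer_count=1)
-
-# a shared sparse binary input; the dimension 1000 is only for illustration
-x = layer.data(
-    name='x', type=paddle.data_type.sparse_binary_vector(1000))
-
-# LR: a single linear map squashed by a sigmoid
-lr_out = layer.fc(input=x, size=1, act=paddle.activation.Sigmoid())
-
-# a small DNN: two hidden ReLU layers feeding the same kind of sigmoid output
-hidden1 = layer.fc(input=x, size=3, act=paddle.activation.Relu())
-hidden2 = layer.fc(input=hidden1, size=3, act=paddle.activation.Relu())
-dnn_out = layer.fc(input=hidden2, size=1, act=paddle.activation.Sigmoid())
-```
-
-Both end in the same sigmoid output; the difference is only the hidden non-linear layers, which is what the figure illustrates.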
-
-As the figure shows, LR and DNN share some common structure. However, a DNN can model non-linear relations between inputs and outputs by adding activation units and further layers. This enables a DNN to achieve better learning results for CTR estimation.
-
-In the following, we demonstrate how to use PaddlePaddle to learn to predict CTR.
-
-## Data and Model formation
-
-Here `click` is the learning objective. There are several ways to learn this objective:
-
-1. Learn `click` directly, as 0/1 binary classification.
-2. Learning to rank, with pairwise or listwise ranking.
-3. Measure the click rate of each ad, then rank ads by their click rates.
-
-In this example, we use the first method.
-
-We use the Kaggle `Click-through rate prediction` task \[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\].
-
-Please see the [data process](./dataset.md) for pre-processing data.
-
-The input data format for the demo model in this tutorial is as follows:
-
-```
-# <dnn input ids> \t <lr input sparse values> \t click
-1 23 190 \t 230:0.12 3421:0.9 23451:0.12 \t 0
-23 231 \t 1230:0.12 13421:0.9 \t 1
-```
-
-Description:
-
-- `dnn input ids` use one-hot coding; only the IDs whose value is 1 are listed.
-- `lr input sparse values` use the `ID:VALUE` representation; values are preferably scaled to the range `[-1, 1]`.
-
-In addition, training requires a meta file that describes the input dimensions of the dnn and lr submodels, in the following format:
-
-```
-dnn_input_dim: <int>
-lr_input_dim: <int>
-```
-
-where `<int>` represents an integer value.
-
-`avazu_data_processer.py` can be used to download the data set \[[2](#References)\] and pre-process the data.
-
-```
-usage: avazu_data_processer.py [-h] --data_path DATA_PATH --output_dir
-                               OUTPUT_DIR
-                               [--num_lines_to_detect NUM_LINES_TO_DETECT]
-                               [--test_set_size TEST_SET_SIZE]
-                               [--train_size TRAIN_SIZE]
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --data_path DATA_PATH
-                        path of the Avazu dataset
-  --output_dir OUTPUT_DIR
-                        directory to output
-  --num_lines_to_detect NUM_LINES_TO_DETECT
-                        number of records to detect dataset's meta info
-  --test_set_size TEST_SET_SIZE
-                        size of the validation dataset(default: 10000)
-  --train_size TRAIN_SIZE
-                        size of the trainset (default: 100000)
-```
-
-- `data_path` : the path of the data to be processed
-- `output_dir` : the output directory for the generated data
-- `num_lines_to_detect` : the number of rows scanned in advance to collect the dataset's ID meta info
-- `test_set_size` : the number of rows for the test set
-- `train_size` : the number of rows for the training set
-
-## Wide & Deep Learning Model
-
-Google proposed the Wide & Deep Learning framework in 2016 to integrate the advantages of DNNs, which are good at learning abstract features, and LR models, which can handle large-scale sparse features.
-
-
-### Introduction to the model
-
-The Wide & Deep Learning Model\[[3](#References)\] is a relatively mature model framework that still sees industrial use in CTR prediction tasks, so we demonstrate it here to complete the CTR prediction task.
-
-The model structure is as follows:

-Figure 2. Wide & Deep Model
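-Following the cited paper \[[3](#References)\], the wide (LR) part and the deep (DNN) part shown in the figure are combined into a single prediction. As a sketch, with $x$ the sparse wide input (any cross-product transformations folded in) and $a^{(l_f)}$ the final activation of the deep part:
-
-$$P(click = 1 \mid x) = \sigma\left(w_{wide}^{T} x + w_{deep}^{T} a^{(l_f)} + b\right)$$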
- -The wide part of the top side of the model can accommodate large-scale coefficient features and has some memory for some specific information (such as ID); and the Deep part of the bottom side of the model can learn the implicit relationship between features. - - -### Model Input - -The model has three inputs as follows. - -- `dnn_input` ,the Deep part of the input -- `lr_input` ,the wide part of the input -- `click` , click on or not - -```python -dnn_merged_input = layer.data( - name='dnn_input', - type=paddle.data_type.sparse_binary_vector(self.dnn_input_dim)) - -lr_merged_input = layer.data( - name='lr_input', - type=paddle.data_type.sparse_vector(self.lr_input_dim)) - -click = paddle.layer.data(name='click', type=dtype.dense_vector(1)) -``` - -### Wide part - -Wide part uses of the LR model, but the activation function changed to `RELU` for speed. - -```python -def build_lr_submodel(): - fc = layer.fc( - input=lr_merged_input, size=1, name='lr', act=paddle.activation.Relu()) - return fc -``` - -### Deep part - -The Deep part uses a standard multi-layer DNN. - -```python -def build_dnn_submodel(dnn_layer_dims): - dnn_embedding = layer.fc(input=dnn_merged_input, size=dnn_layer_dims[0]) - _input_layer = dnn_embedding - for i, dim in enumerate(dnn_layer_dims[1:]): - fc = layer.fc( - input=_input_layer, - size=dim, - act=paddle.activation.Relu(), - name='dnn-fc-%d' % i) - _input_layer = fc - return _input_layer -``` - -### Combine - -The output section uses `sigmoid` function to output (0,1) as the prediction value. - -```python -# conbine DNN and LR submodels -def combine_submodels(dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - fc = layer.fc( - input=merge_layer, - size=1, - name='output', - # use sigmoid function to approximate ctr, wihch is a float value between 0 and 1. - act=paddle.activation.Sigmoid()) - return fc -``` - -### Training -```python -dnn = build_dnn_submodel(dnn_layer_dims) -lr = build_lr_submodel() -output = combine_submodels(dnn, lr) - -# ============================================================================== -# cost and train period -# ============================================================================== -classification_cost = paddle.layer.multi_binary_label_cross_entropy_cost( - input=output, label=click) - - -paddle.init(use_gpu=False, trainer_count=11) - -params = paddle.parameters.create(classification_cost) - -optimizer = paddle.optimizer.Momentum(momentum=0) - -trainer = paddle.trainer.SGD( - cost=classification_cost, parameters=params, update_equation=optimizer) - -dataset = AvazuDataset(train_data_path, n_records_as_test=test_set_size) - -def event_handler(event): - if isinstance(event, paddle.event.EndIteration): - if event.batch_id % 100 == 0: - logging.warning("Pass %d, Samples %d, Cost %f" % ( - event.pass_id, event.batch_id * batch_size, event.cost)) - - if event.batch_id % 1000 == 0: - result = trainer.test( - reader=paddle.batch(dataset.test, batch_size=1000), - feeding=field_index) - logging.warning("Test %d-%d, Cost %f" % (event.pass_id, event.batch_id, - result.cost)) - - -trainer.train( - reader=paddle.batch( - paddle.reader.shuffle(dataset.train, buf_size=500), - batch_size=batch_size), - feeding=field_index, - event_handler=event_handler, - num_passes=100) -``` - -## Run training and testing -The model go through the following steps: - -1. Prepare training data - 1. Download train.gz from [Kaggle CTR](https://www.kaggle.com/c/avazu-ctr-prediction/data) . - 2. Unzip train.gz to get train.txt - 3. 
`mkdir -p output; python avazu_data_processer.py --data_path train.txt --output_dir output --num_lines_to_detect 1000 --test_set_size 100` to generate the demo data.
-2. Execute `python train.py --train_data_path ./output/train.txt --test_data_path ./output/test.txt --data_meta_file ./output/data.meta.txt --model_type=0` to start training.
-
-The argument options for `train.py` are as follows.
-
-```
-usage: train.py [-h] --train_data_path TRAIN_DATA_PATH
-                [--test_data_path TEST_DATA_PATH] [--batch_size BATCH_SIZE]
-                [--num_passes NUM_PASSES]
-                [--model_output_prefix MODEL_OUTPUT_PREFIX] --data_meta_file
-                DATA_META_FILE --model_type MODEL_TYPE
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --train_data_path TRAIN_DATA_PATH
-                        path of training dataset
-  --test_data_path TEST_DATA_PATH
-                        path of testing dataset
-  --batch_size BATCH_SIZE
-                        size of mini-batch (default:10000)
-  --num_passes NUM_PASSES
-                        number of passes to train
-  --model_output_prefix MODEL_OUTPUT_PREFIX
-                        prefix of path for model to store (default:
-                        ./ctr_models)
-  --data_meta_file DATA_META_FILE
-                        path of data meta info file
-  --model_type MODEL_TYPE
-                        model type, classification: 0, regression 1 (default
-                        classification)
-```
-
-- `train_data_path` : the path of the training set
-- `test_data_path` : the path of the testing set
-- `num_passes` : the number of passes (rounds) to train
-- `data_meta_file` : please refer to the [Data and Model formation](#data-and-model-formation) section.
-- `model_type` : classification or regression
-
-
-## Use the trained model for prediction
-The trained model can be used to predict new data. The format of the prediction data is as follows.
-
-
-```
-# <dnn input ids> \t <lr input sparse values>
-1 23 190 \t 230:0.12 3421:0.9 23451:0.12
-23 231 \t 1230:0.12 13421:0.9
-```
-
-The only difference from the training data is that there is no label (i.e., the third `click` column).
-
-We now can use `infer.py` to perform inference.
-
-```
-usage: infer.py [-h] --model_gz_path MODEL_GZ_PATH --data_path DATA_PATH
-                --prediction_output_path PREDICTION_OUTPUT_PATH
-                [--data_meta_path DATA_META_PATH] --model_type MODEL_TYPE
-
-PaddlePaddle CTR example
-
-optional arguments:
-  -h, --help            show this help message and exit
-  --model_gz_path MODEL_GZ_PATH
-                        path of model parameters gz file
-  --data_path DATA_PATH
-                        path of the dataset to infer
-  --prediction_output_path PREDICTION_OUTPUT_PATH
-                        path to output the prediction
-  --data_meta_path DATA_META_PATH
-                        path of trainset's meta info, default is ./data.meta
-  --model_type MODEL_TYPE
-                        model type, classification: 0, regression 1 (default
-                        classification)
-```
-
-- `model_gz_path` : path of the `gz` compressed model parameters file
-- `data_path` : path of the dataset to run inference on
-- `prediction_output_path` : path to write the predictions to
-- `data_meta_path` : please refer to the [Data and Model formation](#data-and-model-formation) section.
-- `model_type` : classification or regression
-
-The sample data can be predicted with the following command
-
-```
-python infer.py --model_gz_path <model_path> --data_path output/infer.txt --prediction_output_path predictions.txt --data_meta_path data.meta.txt
-```
-
-The final prediction is written to `predictions.txt`.
-
-## References
-1. <https://en.wikipedia.org/wiki/Click-through_rate>
-2. <https://www.kaggle.com/c/avazu-ctr-prediction/data>
-3. Cheng H T, Koc L, Harmsen J, et al. [Wide & deep learning for recommender systems](https://arxiv.org/pdf/1606.07792.pdf)[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10.
diff --git a/legacy/ctr/avazu_data_processer.py b/legacy/ctr/avazu_data_processer.py deleted file mode 100644 index dd3c1441f8f8b26473d15889198abb3593edfa51..0000000000000000000000000000000000000000 --- a/legacy/ctr/avazu_data_processer.py +++ /dev/null @@ -1,414 +0,0 @@ -import sys -import csv -import cPickle -import argparse -import os -import numpy as np - -from utils import logger, TaskMode - -parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") -parser.add_argument( - '--data_path', type=str, required=True, help="path of the Avazu dataset") -parser.add_argument( - '--output_dir', type=str, required=True, help="directory to output") -parser.add_argument( - '--num_lines_to_detect', - type=int, - default=500000, - help="number of records to detect dataset's meta info") -parser.add_argument( - '--test_set_size', - type=int, - default=10000, - help="size of the validation dataset(default: 10000)") -parser.add_argument( - '--train_size', - type=int, - default=100000, - help="size of the trainset (default: 100000)") -args = parser.parse_args() -''' -The fields of the dataset are: - - 0. id: ad identifier - 1. click: 0/1 for non-click/click - 2. hour: format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC. - 3. C1 -- anonymized categorical variable - 4. banner_pos - 5. site_id - 6. site_domain - 7. site_category - 8. app_id - 9. app_domain - 10. app_category - 11. device_id - 12. device_ip - 13. device_model - 14. device_type - 15. device_conn_type - 16. C14-C21 -- anonymized categorical variables - -We will treat the following fields as categorical features: - - - C1 - - banner_pos - - site_category - - app_category - - device_type - - device_conn_type - -and some other features as id features: - - - id - - site_id - - app_id - - device_id - -The `hour` field will be treated as a continuous feature and will be transformed -to one-hot representation which has 24 bits. - -This script will output 3 files: - -1. train.txt -2. test.txt -3. infer.txt - -all the files are for demo. -''' - -feature_dims = {} - -categorial_features = ( - 'C1 banner_pos site_category app_category ' + 'device_type device_conn_type' -).split() - -id_features = 'id site_id app_id device_id _device_id_cross_site_id'.split() - - -def get_all_field_names(mode=0): - ''' - @mode: int - 0 for train, 1 for test - @return: list of str - ''' - return categorial_features + ['hour'] + id_features + ['click'] \ - if mode == 0 else [] - - -class CategoryFeatureGenerator(object): - ''' - Generator category features. - - Register all records by calling `register` first, then call `gen` to generate - one-hot representation for a record. - ''' - - def __init__(self): - self.dic = {'unk': 0} - self.counter = 1 - - def register(self, key): - ''' - Register record. - ''' - if key not in self.dic: - self.dic[key] = self.counter - self.counter += 1 - - def size(self): - return len(self.dic) - - def gen(self, key): - ''' - Generate one-hot representation for a record. 
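-        Unregistered keys fall back to the reserved 'unk' slot at index 0.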
- ''' - if key not in self.dic: - res = self.dic['unk'] - else: - res = self.dic[key] - return [res] - - def __repr__(self): - return '' % len(self.dic) - - -class IDfeatureGenerator(object): - def __init__(self, max_dim, cross_fea0=None, cross_fea1=None): - ''' - @max_dim: int - Size of the id elements' space - ''' - self.max_dim = max_dim - self.cross_fea0 = cross_fea0 - self.cross_fea1 = cross_fea1 - - def gen(self, key): - ''' - Generate one-hot representation for records - ''' - return [hash(key) % self.max_dim] - - def gen_cross_fea(self, fea1, fea2): - key = str(fea1) + str(fea2) - return self.gen(key) - - def size(self): - return self.max_dim - - -class ContinuousFeatureGenerator(object): - def __init__(self, n_intervals): - self.min = sys.maxint - self.max = sys.minint - self.n_intervals = n_intervals - - def register(self, val): - self.min = min(self.minint, val) - self.max = max(self.maxint, val) - - def gen(self, val): - self.len_part = (self.max - self.min) / self.n_intervals - return (val - self.min) / self.len_part - - -# init all feature generators -fields = {} -for key in categorial_features: - fields[key] = CategoryFeatureGenerator() -for key in id_features: - # for cross features - if 'cross' in key: - feas = key[1:].split('_cross_') - fields[key] = IDfeatureGenerator(10000000, *feas) - # for normal ID features - else: - fields[key] = IDfeatureGenerator(10000) - -# used as feed_dict in PaddlePaddle -field_index = dict((key, id) - for id, key in enumerate(['dnn_input', 'lr_input', 'click'])) - - -def detect_dataset(path, topn, id_fea_space=10000): - ''' - Parse the first `topn` records to collect meta information of this dataset. - - NOTE the records should be randomly shuffled first. - ''' - # create categorical statis objects. - logger.warning('detecting dataset') - - with open(path, 'rb') as csvfile: - reader = csv.DictReader(csvfile) - for row_id, row in enumerate(reader): - if row_id > topn: - break - - for key in categorial_features: - fields[key].register(row[key]) - - for key, item in fields.items(): - feature_dims[key] = item.size() - - feature_dims['hour'] = 24 - feature_dims['click'] = 1 - - feature_dims['dnn_input'] = np.sum( - feature_dims[key] for key in categorial_features + ['hour']) + 1 - feature_dims['lr_input'] = np.sum(feature_dims[key] - for key in id_features) + 1 - return feature_dims - - -def load_data_meta(meta_path): - ''' - Load dataset's meta infomation. - ''' - feature_dims, fields = cPickle.load(open(meta_path, 'rb')) - return feature_dims, fields - - -def concat_sparse_vectors(inputs, dims): - ''' - Concaterate more than one sparse vectors into one. - - @inputs: list - list of sparse vector - @dims: list of int - dimention of each sparse vector - ''' - res = [] - assert len(inputs) == len(dims) - start = 0 - for no, vec in enumerate(inputs): - for v in vec: - res.append(v + start) - start += dims[no] - return res - - -class AvazuDataset(object): - ''' - Load AVAZU dataset as train set. - ''' - - def __init__(self, - train_path, - n_records_as_test=-1, - fields=None, - feature_dims=None): - self.train_path = train_path - self.n_records_as_test = n_records_as_test - self.fields = fields - # default is train mode. - self.mode = TaskMode.create_train() - - self.categorial_dims = [ - feature_dims[key] for key in categorial_features + ['hour'] - ] - self.id_dims = [feature_dims[key] for key in id_features] - - def train(self): - ''' - Load trainset. 
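-        The first `n_records_as_test` rows are reserved for the test split
-        and skipped here.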
- ''' - logger.info("load trainset from %s" % self.train_path) - self.mode = TaskMode.create_train() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - # skip top n lines - if self.n_records_as_test > 0 and row_id < self.n_records_as_test: - continue - - rcd = self._parse_record(row) - if rcd: - yield rcd - - def test(self): - ''' - Load testset. - ''' - logger.info("load testset from %s" % self.train_path) - self.mode = TaskMode.create_test() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - # skip top n lines - if self.n_records_as_test > 0 and row_id > self.n_records_as_test: - break - - rcd = self._parse_record(row) - if rcd: - yield rcd - - def infer(self): - ''' - Load inferset. - ''' - logger.info("load inferset from %s" % self.train_path) - self.mode = TaskMode.create_infer() - with open(self.train_path) as f: - reader = csv.DictReader(f) - - for row_id, row in enumerate(reader): - rcd = self._parse_record(row) - if rcd: - yield rcd - - def _parse_record(self, row): - ''' - Parse a CSV row and get a record. - ''' - record = [] - for key in categorial_features: - record.append(self.fields[key].gen(row[key])) - record.append([int(row['hour'][-2:])]) - dense_input = concat_sparse_vectors(record, self.categorial_dims) - - record = [] - for key in id_features: - if 'cross' not in key: - record.append(self.fields[key].gen(row[key])) - else: - fea0 = self.fields[key].cross_fea0 - fea1 = self.fields[key].cross_fea1 - record.append(self.fields[key].gen_cross_fea(row[fea0], row[ - fea1])) - - sparse_input = concat_sparse_vectors(record, self.id_dims) - - record = [dense_input, sparse_input] - - if not self.mode.is_infer(): - record.append(list((int(row['click']), ))) - return record - - -def ids2dense(vec, dim): - return vec - - -def ids2sparse(vec): - return ["%d:1" % x for x in vec] - - -detect_dataset(args.data_path, args.num_lines_to_detect) -dataset = AvazuDataset( - args.data_path, - args.test_set_size, - fields=fields, - feature_dims=feature_dims) - -output_trainset_path = os.path.join(args.output_dir, 'train.txt') -output_testset_path = os.path.join(args.output_dir, 'test.txt') -output_infer_path = os.path.join(args.output_dir, 'infer.txt') -output_meta_path = os.path.join(args.output_dir, 'data.meta.txt') - -with open(output_trainset_path, 'w') as f: - for id, record in enumerate(dataset.train()): - if id and id % 10000 == 0: - logger.info("load %d records" % id) - if id > args.train_size: - break - dnn_input, lr_input, click = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\t%d\n" % (' '.join(map(str, dnn_input)), - ' '.join(map(str, lr_input)), click[0]) - f.write(line) - logger.info('write to %s' % output_trainset_path) - -with open(output_testset_path, 'w') as f: - for id, record in enumerate(dataset.test()): - dnn_input, lr_input, click = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\t%d\n" % (' '.join(map(str, dnn_input)), - ' '.join(map(str, lr_input)), click[0]) - f.write(line) - logger.info('write to %s' % output_testset_path) - -with open(output_infer_path, 'w') as f: - for id, record in enumerate(dataset.infer()): - dnn_input, lr_input = record - dnn_input = ids2dense(dnn_input, feature_dims['dnn_input']) - lr_input = ids2sparse(lr_input) - line = "%s\t%s\n" % ( - ' '.join(map(str, dnn_input)), - ' '.join(map(str, 
lr_input)), ) - f.write(line) - if id > args.test_set_size: - break - logger.info('write to %s' % output_infer_path) - -with open(output_meta_path, 'w') as f: - lines = [ - "dnn_input_dim: %d" % feature_dims['dnn_input'], - "lr_input_dim: %d" % feature_dims['lr_input'] - ] - f.write('\n'.join(lines)) - logger.info('write data meta into %s' % output_meta_path) diff --git a/legacy/ctr/dataset.md b/legacy/ctr/dataset.md deleted file mode 100644 index 16c0f9784bf3409ac5bbe704f932a9b28680fbf8..0000000000000000000000000000000000000000 --- a/legacy/ctr/dataset.md +++ /dev/null @@ -1,296 +0,0 @@ -# 数据及处理 -## 数据集介绍 - -本教程演示使用Kaggle上CTR任务的数据集\[[3](#参考文献)\]的预处理方法,最终产生本模型需要的格式,详细的数据格式参考[README.md](./README.md)。 - -Wide && Deep Model\[[2](#参考文献)\]的优势是融合稠密特征和大规模稀疏特征, -因此特征处理方面也针对稠密和稀疏两种特征作处理, -其中Deep部分的稠密值全部转化为ID类特征, -通过embedding 来转化为稠密的向量输入;Wide部分主要通过ID的叉乘提升维度。 - -数据集使用 `csv` 格式存储,其中各个字段内容如下: - -- `id` : ad identifier -- `click` : 0/1 for non-click/click -- `hour` : format is YYMMDDHH, so 14091123 means 23:00 on Sept. 11, 2014 UTC. -- `C1` : anonymized categorical variable -- `banner_pos` -- `site_id` -- `site_domain` -- `site_category` -- `app_id` -- `app_domain` -- `app_category` -- `device_id` -- `device_ip` -- `device_model` -- `device_type` -- `device_conn_type` -- `C14-C21` : anonymized categorical variables - - -## 特征提取 - -下面我们会简单演示几种特征的提取方式。 - -原始数据中的特征可以分为以下几类: - -1. ID 类特征(稀疏,数量多) -- `id` -- `site_id` -- `app_id` -- `device_id` - -2. 类别类特征(稀疏,但数量有限) - -- `C1` -- `site_category` -- `device_type` -- `C14-C21` - -3. 数值型特征转化为类别型特征 - -- hour (可以转化成数值,也可以按小时为单位转化为类别) - -### 类别类特征 - -类别类特征的提取方法有以下两种: - -1. One-hot 表示作为特征 -2. 类似词向量,用一个 Embedding 将每个类别映射到对应的向量 - - -### ID 类特征 - -ID 类特征的特点是稀疏数据,但量比较大,直接使用 One-hot 表示时维度过大。 - -一般会作如下处理: - -1. 确定表示的最大维度 N -2. newid = id % N -3. 用 newid 作为类别类特征使用 - -上面的方法尽管存在一定的碰撞概率,但能够处理任意数量的 ID 特征,并保留一定的效果\[[2](#参考文献)\]。 - -### 数值型特征 - -一般会做如下处理: - -- 归一化,直接作为特征输入模型 -- 用区间分割处理成类别类特征,稀疏化表示,模糊细微上的差别 - -## 特征处理 - - -### 类别型特征 - -类别型特征有有限多种值,在模型中,我们一般使用 Embedding将每种值映射为连续值的向量。 - -这种特征在输入到模型时,一般使用 One-hot 表示,相关处理方法如下: - -```python -class CategoryFeatureGenerator(object): - ''' - Generator category features. - - Register all records by calling ~register~ first, then call ~gen~ to generate - one-hot representation for a record. - ''' - - def __init__(self): - self.dic = {'unk': 0} - self.counter = 1 - - def register(self, key): - ''' - Register record. - ''' - if key not in self.dic: - self.dic[key] = self.counter - self.counter += 1 - - def size(self): - return len(self.dic) - - def gen(self, key): - ''' - Generate one-hot representation for a record. - ''' - if key not in self.dic: - res = self.dic['unk'] - else: - res = self.dic[key] - return [res] - - def __repr__(self): - return '' % len(self.dic) -``` - -`CategoryFeatureGenerator` 需要先扫描数据集,得到该类别对应的项集合,之后才能开始生成特征。 - -我们的实验数据集\[[3](https://www.kaggle.com/c/avazu-ctr-prediction/data)\]已经经过shuffle,可以扫描前面一定数目的记录来近似总的类别项集合(等价于随机抽样), -对于没有抽样上的低频类别项,可以用一个 UNK 的特殊值表示。 - -```python -fields = {} -for key in categorial_features: - fields[key] = CategoryFeatureGenerator() - -def detect_dataset(path, topn, id_fea_space=10000): - ''' - Parse the first `topn` records to collect meta information of this dataset. - - NOTE the records should be randomly shuffled first. - ''' - # create categorical statis objects. 
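-    # scan only the first `topn` rows; since the file is assumed to be
-    # pre-shuffled, this prefix approximates a random sample of the categories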
- - with open(path, 'rb') as csvfile: - reader = csv.DictReader(csvfile) - for row_id, row in enumerate(reader): - if row_id > topn: - break - - for key in categorial_features: - fields[key].register(row[key]) -``` - -`CategoryFeatureGenerator` 在注册得到数据集中对应类别信息后,可以对相应记录生成对应的特征表示: - -```python -record = [] -for key in categorial_features: - record.append(fields[key].gen(row[key])) -``` - -本任务中,类别类特征会输入到 DNN 中使用。 - -### ID 类特征 - -ID 类特征代稀疏值,且值的空间很大的情况,一般用模操作规约到一个有限空间, -之后可以当成类别类特征使用,这里我们会将 ID 类特征输入到 LR 模型中使用。 - -```python -class IDfeatureGenerator(object): - def __init__(self, max_dim): - ''' - @max_dim: int - Size of the id elements' space - ''' - self.max_dim = max_dim - - def gen(self, key): - ''' - Generate one-hot representation for records - ''' - return [hash(key) % self.max_dim] - - def size(self): - return self.max_dim -``` - -`IDfeatureGenerator` 不需要预先初始化,可以直接生成特征,比如 - -```python -record = [] -for key in id_features: - if 'cross' not in key: - record.append(fields[key].gen(row[key])) -``` - -### 交叉类特征 - -LR 模型作为 Wide & Deep model 的 `wide` 部分,可以输入很 wide 的数据(特征空间的维度很大), -为了充分利用这个优势,我们将演示交叉组合特征构建成更大维度特征的情况,之后塞入到模型中训练。 - -这里我们依旧使用模操作来约束最终组合出的特征空间的大小,具体实现是直接在 `IDfeatureGenerator` 中添加一个 `gen_cross_feature` 的方法: - -```python -def gen_cross_fea(self, fea1, fea2): - key = str(fea1) + str(fea2) - return self.gen(key) -``` - -比如,我们觉得原始数据中, `device_id` 和 `site_id` 有一些关联(比如某个 device 倾向于浏览特定 site), -我们通过组合出两者组合来捕捉这类信息。 - -```python -fea0 = fields[key].cross_fea0 -fea1 = fields[key].cross_fea1 -record.append( - fields[key].gen_cross_fea(row[fea0], row[fea1])) -``` - -### 特征维度 -#### Deep submodel(DNN)特征 -| feature | dimention | -|------------------|-----------| -| app_category | 21 | -| site_category | 22 | -| device_conn_type | 5 | -| hour | 24 | -| banner_pos | 7 | -| **Total** | 79 | - -#### Wide submodel(LR)特征 -| Feature | Dimention | -|---------------------|-----------| -| id | 10000 | -| site_id | 10000 | -| app_id | 10000 | -| device_id | 10000 | -| device_id X site_id | 1000000 | -| **Total** | 1,040,000 | - -## 输入到 PaddlePaddle 中 - -Deep 和 Wide 两部分均以 `sparse_binary_vector` 的格式 \[[1](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/api/v1/data_provider/pydataprovider2_en.rst)\] 输入,输入前需要将相关特征拼合,模型最终只接受 3 个 input, -分别是 - -1. `dnn input` ,DNN 的输入 -2. `lr input` , LR 的输入 -3. `click` , 标签 - -拼合特征的方法: - -```python -def concat_sparse_vectors(inputs, dims): - ''' - concaterate sparse vectors into one - - @inputs: list - list of sparse vector - @dims: list of int - dimention of each sparse vector - ''' - res = [] - assert len(inputs) == len(dims) - start = 0 - for no, vec in enumerate(inputs): - for v in vec: - res.append(v + start) - start += dims[no] - return res -``` - -生成最终特征的代码如下: - -```python -# dimentions of the features -categorial_dims = [ - feature_dims[key] for key in categorial_features + ['hour'] -] -id_dims = [feature_dims[key] for key in id_features] - -dense_input = concat_sparse_vectors(record, categorial_dims) -sparse_input = concat_sparse_vectors(record, id_dims) - -record = [dense_input, sparse_input] -record.append(list((int(row['click']), ))) -yield record -``` - -## 参考文献 - -1. -2. Mikolov T, Deoras A, Povey D, et al. [Strategies for training large scale neural network language models](https://www.researchgate.net/profile/Lukas_Burget/publication/241637478_Strategies_for_training_large_scale_neural_network_language_models/links/542c14960cf27e39fa922ed3.pdf)[C]//Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011: 196-201. -3. 
diff --git a/legacy/ctr/images/lr_vs_dnn.jpg b/legacy/ctr/images/lr_vs_dnn.jpg deleted file mode 100644 index 50a0db583cd9b6e1a5bc0f83a28ab6e22d649931..0000000000000000000000000000000000000000 Binary files a/legacy/ctr/images/lr_vs_dnn.jpg and /dev/null differ diff --git a/legacy/ctr/images/wide_deep.png b/legacy/ctr/images/wide_deep.png deleted file mode 100644 index 616f88cb22607c1c6bcbe4312644f632ef284e8e..0000000000000000000000000000000000000000 Binary files a/legacy/ctr/images/wide_deep.png and /dev/null differ diff --git a/legacy/ctr/infer.py b/legacy/ctr/infer.py deleted file mode 100644 index 6541c74638df63a9304989c2ccaff0ff4c00463a..0000000000000000000000000000000000000000 --- a/legacy/ctr/infer.py +++ /dev/null @@ -1,79 +0,0 @@ -import gzip -import argparse -import itertools - -import paddle.v2 as paddle -import network_conf -from train import dnn_layer_dims -import reader -from utils import logger, ModelType - -parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") -parser.add_argument( - '--model_gz_path', - type=str, - required=True, - help="path of model parameters gz file") -parser.add_argument( - '--data_path', type=str, required=True, help="path of the dataset to infer") -parser.add_argument( - '--prediction_output_path', - type=str, - required=True, - help="path to output the prediction") -parser.add_argument( - '--data_meta_path', - type=str, - default="./data.meta", - help="path of trainset's meta info, default is ./data.meta") -parser.add_argument( - '--model_type', - type=int, - required=True, - default=ModelType.CLASSIFICATION, - help='model type, classification: %d, regression %d (default classification)' - % (ModelType.CLASSIFICATION, ModelType.REGRESSION)) - -args = parser.parse_args() - -paddle.init(use_gpu=False, trainer_count=1) - - -class CTRInferer(object): - def __init__(self, param_path): - logger.info("create CTR model") - dnn_input_dim, lr_input_dim = reader.load_data_meta(args.data_meta_path) - # create the mdoel - self.ctr_model = network_conf.CTRmodel( - dnn_layer_dims, - dnn_input_dim, - lr_input_dim, - model_type=ModelType(args.model_type), - is_infer=True) - # load parameter - logger.info("load model parameters from %s" % param_path) - self.parameters = paddle.parameters.Parameters.from_tar( - gzip.open(param_path, 'r')) - self.inferer = paddle.inference.Inference( - output_layer=self.ctr_model.model, - parameters=self.parameters, ) - - def infer(self, data_path): - logger.info("infer data...") - dataset = reader.Dataset() - infer_reader = paddle.batch( - dataset.infer(args.data_path), batch_size=1000) - logger.warning('write predictions to %s' % args.prediction_output_path) - output_f = open(args.prediction_output_path, 'w') - for id, batch in enumerate(infer_reader()): - res = self.inferer.infer(input=batch) - predictions = [x for x in itertools.chain.from_iterable(res)] - assert len(batch) == len( - predictions), "predict error, %d inputs, but %d predictions" % ( - len(batch), len(predictions)) - output_f.write('\n'.join(map(str, predictions)) + '\n') - - -if __name__ == '__main__': - ctr_inferer = CTRInferer(args.model_gz_path) - ctr_inferer.infer(args.data_path) diff --git a/legacy/ctr/network_conf.py b/legacy/ctr/network_conf.py deleted file mode 100644 index bcff49ee05e1d8cc80e2fdd28a771bf9bf9502e3..0000000000000000000000000000000000000000 --- a/legacy/ctr/network_conf.py +++ /dev/null @@ -1,104 +0,0 @@ -import paddle.v2 as paddle -from paddle.v2 import layer -from paddle.v2 import data_type as dtype -from utils import 
logger, ModelType - - -class CTRmodel(object): - ''' - A CTR model which implements wide && deep learning model. - ''' - - def __init__(self, - dnn_layer_dims, - dnn_input_dim, - lr_input_dim, - model_type=ModelType.create_classification(), - is_infer=False): - ''' - @dnn_layer_dims: list of integer - dims of each layer in dnn - @dnn_input_dim: int - size of dnn's input layer - @lr_input_dim: int - size of lr's input layer - @is_infer: bool - whether to build a infer model - ''' - self.dnn_layer_dims = dnn_layer_dims - self.dnn_input_dim = dnn_input_dim - self.lr_input_dim = lr_input_dim - self.model_type = model_type - self.is_infer = is_infer - - self._declare_input_layers() - - self.dnn = self._build_dnn_submodel_(self.dnn_layer_dims) - self.lr = self._build_lr_submodel_() - - # model's prediction - # TODO(superjom) rename it to prediction - if self.model_type.is_classification(): - self.model = self._build_classification_model(self.dnn, self.lr) - if self.model_type.is_regression(): - self.model = self._build_regression_model(self.dnn, self.lr) - - def _declare_input_layers(self): - self.dnn_merged_input = layer.data( - name='dnn_input', - type=paddle.data_type.sparse_binary_vector(self.dnn_input_dim)) - - self.lr_merged_input = layer.data( - name='lr_input', - type=paddle.data_type.sparse_float_vector(self.lr_input_dim)) - - if not self.is_infer: - self.click = paddle.layer.data( - name='click', type=dtype.dense_vector(1)) - - def _build_dnn_submodel_(self, dnn_layer_dims): - ''' - build DNN submodel. - ''' - dnn_embedding = layer.fc(input=self.dnn_merged_input, - size=dnn_layer_dims[0]) - _input_layer = dnn_embedding - for i, dim in enumerate(dnn_layer_dims[1:]): - fc = layer.fc(input=_input_layer, - size=dim, - act=paddle.activation.Relu(), - name='dnn-fc-%d' % i) - _input_layer = fc - return _input_layer - - def _build_lr_submodel_(self): - ''' - config LR submodel - ''' - fc = layer.fc(input=self.lr_merged_input, - size=1, - act=paddle.activation.Relu()) - return fc - - def _build_classification_model(self, dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - self.output = layer.fc( - input=merge_layer, - size=1, - # use sigmoid function to approximate ctr rate, a float value between 0 and 1. - act=paddle.activation.Sigmoid()) - - if not self.is_infer: - self.train_cost = paddle.layer.multi_binary_label_cross_entropy_cost( - input=self.output, label=self.click) - return self.output - - def _build_regression_model(self, dnn, lr): - merge_layer = layer.concat(input=[dnn, lr]) - self.output = layer.fc(input=merge_layer, - size=1, - act=paddle.activation.Sigmoid()) - if not self.is_infer: - self.train_cost = paddle.layer.square_error_cost( - input=self.output, label=self.click) - return self.output diff --git a/legacy/ctr/reader.py b/legacy/ctr/reader.py deleted file mode 100644 index cafa2349ed0e51a8de65dbeeea8b345edcf0a879..0000000000000000000000000000000000000000 --- a/legacy/ctr/reader.py +++ /dev/null @@ -1,64 +0,0 @@ -from utils import logger, TaskMode, load_dnn_input_record, load_lr_input_record - -feeding_index = {'dnn_input': 0, 'lr_input': 1, 'click': 2} - - -class Dataset(object): - def train(self, path): - ''' - Load trainset. - ''' - logger.info("load trainset from %s" % path) - mode = TaskMode.create_train() - return self._parse_creator(path, mode) - - def test(self, path): - ''' - Load testset. 
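-        Identical to train() except that the task mode is set to test.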
- ''' - logger.info("load testset from %s" % path) - mode = TaskMode.create_test() - return self._parse_creator(path, mode) - - def infer(self, path): - ''' - Load infer set. - ''' - logger.info("load inferset from %s" % path) - mode = TaskMode.create_infer() - return self._parse_creator(path, mode) - - def _parse_creator(self, path, mode): - ''' - Parse dataset. - ''' - - def _parse(): - with open(path) as f: - for line_id, line in enumerate(f): - fs = line.strip().split('\t') - dnn_input = load_dnn_input_record(fs[0]) - lr_input = load_lr_input_record(fs[1]) - if not mode.is_infer(): - click = [int(fs[2])] - yield dnn_input, lr_input, click - else: - yield dnn_input, lr_input - - return _parse - - -def load_data_meta(path): - ''' - load data meta info from path, return (dnn_input_dim, lr_input_dim) - ''' - with open(path) as f: - lines = f.read().split('\n') - err_info = "wrong meta format" - assert len(lines) == 2, err_info - assert 'dnn_input_dim:' in lines[0] and 'lr_input_dim:' in lines[ - 1], err_info - res = map(int, [_.split(':')[1] for _ in lines]) - logger.info('dnn input dim: %d' % res[0]) - logger.info('lr input dim: %d' % res[1]) - return res diff --git a/legacy/ctr/train.py b/legacy/ctr/train.py deleted file mode 100644 index de7add61d65aba363cc17bed49d32c9054600108..0000000000000000000000000000000000000000 --- a/legacy/ctr/train.py +++ /dev/null @@ -1,112 +0,0 @@ -import argparse -import gzip - -import reader -import paddle.v2 as paddle -from utils import logger, ModelType -from network_conf import CTRmodel - - -def parse_args(): - parser = argparse.ArgumentParser(description="PaddlePaddle CTR example") - parser.add_argument( - '--train_data_path', - type=str, - required=True, - help="path of training dataset") - parser.add_argument( - '--test_data_path', type=str, help='path of testing dataset') - parser.add_argument( - '--batch_size', - type=int, - default=10000, - help="size of mini-batch (default:10000)") - parser.add_argument( - '--num_passes', type=int, default=10, help="number of passes to train") - parser.add_argument( - '--model_output_prefix', - type=str, - default='./ctr_models', - help='prefix of path for model to store (default: ./ctr_models)') - parser.add_argument( - '--data_meta_file', - type=str, - required=True, - help='path of data meta info file', ) - parser.add_argument( - '--model_type', - type=int, - required=True, - default=ModelType.CLASSIFICATION, - help='model type, classification: %d, regression %d (default classification)' - % (ModelType.CLASSIFICATION, ModelType.REGRESSION)) - - return parser.parse_args() - - -dnn_layer_dims = [128, 64, 32, 1] - -# ============================================================================== -# cost and train period -# ============================================================================== - - -def train(): - args = parse_args() - args.model_type = ModelType(args.model_type) - paddle.init(use_gpu=False, trainer_count=1) - dnn_input_dim, lr_input_dim = reader.load_data_meta(args.data_meta_file) - - # create ctr model. 
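-    # wire up the wide (LR) and deep (DNN) submodels defined in network_conf.py;
-    # model_type selects the classification (cross-entropy) or the regression
-    # (squared-error) head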
diff --git a/legacy/ctr/train.py b/legacy/ctr/train.py
deleted file mode 100644
index de7add61d65aba363cc17bed49d32c9054600108..0000000000000000000000000000000000000000
--- a/legacy/ctr/train.py
+++ /dev/null
@@ -1,112 +0,0 @@
-import argparse
-import gzip
-
-import reader
-import paddle.v2 as paddle
-from utils import logger, ModelType
-from network_conf import CTRmodel
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
-    parser.add_argument(
-        '--train_data_path',
-        type=str,
-        required=True,
-        help="path of the training dataset")
-    parser.add_argument(
-        '--test_data_path', type=str, help='path of the testing dataset')
-    parser.add_argument(
-        '--batch_size',
-        type=int,
-        default=10000,
-        help="size of a mini-batch (default: 10000)")
-    parser.add_argument(
-        '--num_passes', type=int, default=10, help="number of passes to train")
-    parser.add_argument(
-        '--model_output_prefix',
-        type=str,
-        default='./ctr_models',
-        help='prefix of the path where models are stored (default: ./ctr_models)')
-    parser.add_argument(
-        '--data_meta_file',
-        type=str,
-        required=True,
-        help='path of the data meta info file')
-    parser.add_argument(
-        '--model_type',
-        type=int,
-        required=True,
-        default=ModelType.CLASSIFICATION,
-        help='model type, classification: %d, regression: %d (default: classification)'
-        % (ModelType.CLASSIFICATION, ModelType.REGRESSION))
-
-    return parser.parse_args()
-
-
-dnn_layer_dims = [128, 64, 32, 1]
-
-# ==============================================================================
-# cost and train period
-# ==============================================================================
-
-
-def train():
-    args = parse_args()
-    args.model_type = ModelType(args.model_type)
-    paddle.init(use_gpu=False, trainer_count=1)
-    dnn_input_dim, lr_input_dim = reader.load_data_meta(args.data_meta_file)
-
-    # create the ctr model.
-    model = CTRmodel(
-        dnn_layer_dims,
-        dnn_input_dim,
-        lr_input_dim,
-        model_type=args.model_type,
-        is_infer=False)
-
-    params = paddle.parameters.create(model.train_cost)
-    optimizer = paddle.optimizer.AdaGrad()
-
-    trainer = paddle.trainer.SGD(cost=model.train_cost,
-                                 parameters=params,
-                                 update_equation=optimizer)
-
-    dataset = reader.Dataset()
-
-    def __event_handler__(event):
-        if isinstance(event, paddle.event.EndIteration):
-            num_samples = event.batch_id * args.batch_size
-            if event.batch_id % 100 == 0:
-                logger.warning("Pass %d, Samples %d, Cost %f, %s" % (
-                    event.pass_id, num_samples, event.cost, event.metrics))
-
-            if event.batch_id % 1000 == 0:
-                if args.test_data_path:
-                    result = trainer.test(
-                        reader=paddle.batch(
-                            dataset.test(args.test_data_path),
-                            batch_size=args.batch_size),
-                        feeding=reader.feeding_index)
-                    logger.warning("Test %d-%d, Cost %f, %s" %
-                                   (event.pass_id, event.batch_id, result.cost,
-                                    result.metrics))
-
-                    path = "{}-pass-{}-batch-{}-test-{}.tar.gz".format(
-                        args.model_output_prefix, event.pass_id,
-                        event.batch_id, result.cost)
-                    with gzip.open(path, 'w') as f:
-                        trainer.save_parameter_to_tar(f)
-
-    trainer.train(
-        reader=paddle.batch(
-            paddle.reader.shuffle(
-                dataset.train(args.train_data_path), buf_size=500),
-            batch_size=args.batch_size),
-        feeding=reader.feeding_index,
-        event_handler=__event_handler__,
-        num_passes=args.num_passes)
-
-
-if __name__ == '__main__':
-    train()
diff --git a/legacy/ctr/utils.py b/legacy/ctr/utils.py
deleted file mode 100644
index 437554c3c291d5a74cc0b3844c8684c73b189a19..0000000000000000000000000000000000000000
--- a/legacy/ctr/utils.py
+++ /dev/null
@@ -1,70 +0,0 @@
-import logging
-
-logging.basicConfig()
-logger = logging.getLogger("paddle")
-logger.setLevel(logging.INFO)
-
-
-class TaskMode:
-    TRAIN_MODE = 0
-    TEST_MODE = 1
-    INFER_MODE = 2
-
-    def __init__(self, mode):
-        self.mode = mode
-
-    def is_train(self):
-        return self.mode == self.TRAIN_MODE
-
-    def is_test(self):
-        return self.mode == self.TEST_MODE
-
-    def is_infer(self):
-        return self.mode == self.INFER_MODE
-
-    @staticmethod
-    def create_train():
-        return TaskMode(TaskMode.TRAIN_MODE)
-
-    @staticmethod
-    def create_test():
-        return TaskMode(TaskMode.TEST_MODE)
-
-    @staticmethod
-    def create_infer():
-        return TaskMode(TaskMode.INFER_MODE)
-
-
-class ModelType:
-    CLASSIFICATION = 0
-    REGRESSION = 1
-
-    def __init__(self, mode):
-        self.mode = mode
-
-    def is_classification(self):
-        return self.mode == self.CLASSIFICATION
-
-    def is_regression(self):
-        return self.mode == self.REGRESSION
-
-    @staticmethod
-    def create_classification():
-        return ModelType(ModelType.CLASSIFICATION)
-
-    @staticmethod
-    def create_regression():
-        return ModelType(ModelType.REGRESSION)
-
-
-def load_dnn_input_record(sent):
-    return map(int, sent.split())
-
-
-def load_lr_input_record(sent):
-    res = []
-    for _ in [x.split(':') for x in sent.split()]:
-        res.append((
-            int(_[0]),
-            float(_[1]), ))
-    return res
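The two `load_*_input_record` helpers above define the on-disk record format: the DNN field is a space-separated list of feature IDs, and the LR field is a space-separated list of `id:value` pairs. A small editorial illustration (Python 3 equivalents, with invented values):

```python
def load_dnn_input_record(sent):
    # "3 7 42" -> [3, 7, 42]
    return [int(x) for x in sent.split()]


def load_lr_input_record(sent):
    # "3:0.5 7:1.0" -> [(3, 0.5), (7, 1.0)]
    return [(int(i), float(v)) for i, v in (x.split(':') for x in sent.split())]


assert load_dnn_input_record("3 7 42") == [3, 7, 42]
assert load_lr_input_record("3:0.5 7:1.0") == [(3, 0.5), (7, 1.0)]
```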
diff --git a/legacy/deep_fm/README.cn.md b/legacy/deep_fm/README.cn.md
deleted file mode 100644
index 1f651acbde0078340dab06c551f583ca2b1dd86c..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/README.cn.md
+++ /dev/null
@@ -1,76 +0,0 @@
-The code samples in this directory require PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html) to update it.
-
----
-
-# Click-Through Rate Prediction Based on a Deep Factorization Machine
-
-## Introduction
-This model implements the DeepFM model proposed in the following paper:
-
-```text
-@inproceedings{guo2017deepfm,
-  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
-  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
-  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
-  pages={1725--1731},
-  year={2017}
-}
-```
-
-DeepFM combines the low-order and high-order feature interactions of factorization machines and deep neural networks. For details of factorization machines, please refer to the paper [factorization machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
-
-## Dataset
-This example uses the Criteo dataset from the [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/) hosted by Kaggle.
-
-Each row holds the features of one ad display, and the first column is a label indicating whether the ad was clicked. There are 39 features in total: 13 take integer values and the other 26 are categorical. The test set has no labels.
-
-Download the dataset:
-```bash
-cd data && ./download.sh && cd ..
-```
-
-## Model
-The DeepFM model is composed of a factorization machine (FM) and a deep neural network (DNN). All input features are fed to both the FM and the DNN, and their outputs are combined to form the final prediction. The embedding layer for the sparse features in the DNN shares its parameters with the latent vectors (factors) of the FM layer.
-
-The factorization machine layer in PaddlePaddle computes the second-order feature interactions. The following code example combines the factorization machine layer with a fully connected layer to form a complete factorization machine:
-
-```python
-def fm_layer(input, factor_size):
-    first_order = paddle.layer.fc(input=input, size=1, act=paddle.activation.Linear())
-    second_order = paddle.layer.factorization_machine(input=input, factor_size=factor_size)
-    fm = paddle.layer.addto(input=[first_order, second_order],
-                            act=paddle.activation.Linear(),
-                            bias_attr=False)
-    return fm
-```
-
-## Data preparation
-To preprocess the raw dataset, the integer features are normalized to [0, 1] with min-max normalization, and the categorical features are one-hot encoded. The raw training set is split in two: 90% for training and the remaining 10% for validation during training.
-
-```bash
-python preprocess.py --datadir ./data/raw --outdir ./data
-```
-
-## Train
-The command line options for training can be listed by `python train.py -h`.
-
-Train the model:
-```bash
-python train.py \
-     --train_data_path data/train.txt \
-     --test_data_path data/valid.txt \
-     2>&1 | tee train.log
-```
-
-After training to batch 40000 of pass 9, the test AUC is 0.807178 and the cost is 0.445196.
-
-## Infer
-The command line options for inference can be listed by `python infer.py -h`.
-
-Run inference on the test set:
-```bash
-python infer.py \
-     --model_gz_path models/model-pass-9-batch-10000.tar.gz \
-     --data_path data/test.txt \
-     --prediction_output_path ./predict.txt
-```
diff --git a/legacy/deep_fm/README.md b/legacy/deep_fm/README.md
deleted file mode 100644
index 6e2c6fad38d2e9e9db8d17c4967196b4f1cc5a36..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/README.md
+++ /dev/null
@@ -1,95 +0,0 @@
-The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-
----
-
-# Deep Factorization Machine for Click-Through Rate prediction
-
-## Introduction
-This model implements the DeepFM proposed in the following paper:
-
-```text
-@inproceedings{guo2017deepfm,
-  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
-  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
-  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
-  pages={1725--1731},
-  year={2017}
-}
-```
-
-DeepFM combines factorization machines and deep neural networks to model
-both low-order and high-order feature interactions. For details of the
-factorization machines, please refer to the paper [factorization
-machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
-
-## Dataset
-This example uses the Criteo dataset, which was used for the [Display Advertising
-Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/)
-hosted by Kaggle.
-
-Each row holds the features for an ad display, and the first column is a label
-indicating whether this ad has been clicked or not. There are 39 features in
-total.
-13 features take integer values and the other 26 are categorical. For the
-test dataset, the labels are omitted.
-
-Download the dataset:
-```bash
-cd data && ./download.sh && cd ..
-```
-
-## Model
-The DeepFM model is composed of the factorization machine layer (FM) and deep
-neural networks (DNN). All the input features are fed to both the FM and the DNN.
-The outputs from the FM and DNN are combined to form the final output. The embedding
-layer for the sparse features in the DNN shares its parameters with the latent
-vectors (factors) of the FM layer.
-
-The factorization machine layer in PaddlePaddle computes the second-order
-interactions. The following code example combines the factorization machine
-layer and a fully connected layer to form a complete factorization machine:
-
-```python
-def fm_layer(input, factor_size):
-    first_order = paddle.layer.fc(input=input, size=1, act=paddle.activation.Linear())
-    second_order = paddle.layer.factorization_machine(input=input, factor_size=factor_size)
-    fm = paddle.layer.addto(input=[first_order, second_order],
-                            act=paddle.activation.Linear(),
-                            bias_attr=False)
-    return fm
-```
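For reference, the model this layer builds is the standard factorization machine from Rendle's paper cited above; the equation is restated here as an editorial aside:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

The `first_order` fc layer corresponds to the linear part, and `paddle.layer.factorization_machine` supplies the pairwise term through the latent factors $\mathbf{v}_i$ of dimension `factor_size`.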
-
-## Data preparation
-To preprocess the raw dataset, the integer features are clipped and then min-max
-normalized to [0, 1], and the categorical features are one-hot encoded. The raw
-training dataset is split so that 90% is used for training and the other
-10% for validation during training.
-
-```bash
-python preprocess.py --datadir ./data/raw --outdir ./data
-```
-
-## Train
-The command line options for training can be listed by `python train.py -h`.
-
-To train the model:
-```bash
-python train.py \
-     --train_data_path data/train.txt \
-     --test_data_path data/valid.txt \
-     2>&1 | tee train.log
-```
-
-After training pass 9 batch 40000, the testing AUC is `0.807178` and the testing
-cost is `0.445196`.
-
-## Infer
-The command line options for inference can be listed by `python infer.py -h`.
-
-To run inference on the test dataset:
-```bash
-python infer.py \
-     --model_gz_path models/model-pass-9-batch-10000.tar.gz \
-     --data_path data/test.txt \
-     --prediction_output_path ./predict.txt
-```
diff --git a/legacy/deep_fm/data/download.sh b/legacy/deep_fm/data/download.sh
deleted file mode 100755
index 466a22f2c6cc885cea0a1468f3043cb59c611b59..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/data/download.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/criteo-labs/dac.tar.gz
-tar zxf dac.tar.gz
-rm -f dac.tar.gz
-
-mkdir raw
-mv ./*.txt raw/
diff --git a/legacy/deep_fm/infer.py b/legacy/deep_fm/infer.py
deleted file mode 100755
index 40a5929780090d403b8b905f8e949f1f8a020eb3..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/infer.py
+++ /dev/null
@@ -1,63 +0,0 @@
-import os
-import gzip
-import argparse
-import itertools
-
-import paddle.v2 as paddle
-
-from network_conf import DeepFM
-import reader
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
-    parser.add_argument(
-        '--model_gz_path',
-        type=str,
-        required=True,
-        help="The path of the model parameters gz file")
-    parser.add_argument(
-        '--data_path',
-        type=str,
-        required=True,
-        help="The path of the dataset to infer")
-    parser.add_argument(
-        '--prediction_output_path',
-        type=str,
-        required=True,
-        help="The path to output the prediction")
-    parser.add_argument(
-        '--factor_size',
-        type=int,
-        default=10,
-        help="The factor size for the factorization machine (default: 10)")
-
-    return parser.parse_args()
-
-
-def infer():
-    args = parse_args()
-
-    paddle.init(use_gpu=False, trainer_count=1)
-
-    model = DeepFM(args.factor_size, infer=True)
-
-    parameters = paddle.parameters.Parameters.from_tar(
-        gzip.open(args.model_gz_path, 'r'))
-
-    inferer = paddle.inference.Inference(
-        output_layer=model, parameters=parameters)
-
-    dataset = reader.Dataset()
-
-    infer_reader = paddle.batch(dataset.infer(args.data_path), batch_size=1000)
-
-    with open(args.prediction_output_path, 'w') as out:
-        for id, batch in enumerate(infer_reader()):
-            res = inferer.infer(input=batch)
-            predictions = [x for x in itertools.chain.from_iterable(res)]
-            out.write('\n'.join(map(str, predictions)) + '\n')
-
-
-if __name__ == '__main__':
-    infer()
diff --git a/legacy/deep_fm/network_conf.py b/legacy/deep_fm/network_conf.py
deleted file mode 100644
index 545fe07b8197e3379eb5a6f34c3134b813a4684e..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/network_conf.py
+++ /dev/null
@@ -1,75 +0,0 @@
-import paddle.v2 as paddle
-
-dense_feature_dim = 13
-sparse_feature_dim = 117568
-
-
-def fm_layer(input, factor_size, fm_param_attr):
-    first_order = paddle.layer.fc(input=input,
-                                  size=1,
-                                  act=paddle.activation.Linear())
-    second_order = paddle.layer.factorization_machine(
-        input=input,
-        factor_size=factor_size,
-        act=paddle.activation.Linear(),
-        param_attr=fm_param_attr)
-    out = paddle.layer.addto(
-        input=[first_order, second_order],
-        act=paddle.activation.Linear(),
-        bias_attr=False)
-    return out
-
-
-def DeepFM(factor_size, infer=False):
-    dense_input = paddle.layer.data(
-        name="dense_input",
-        type=paddle.data_type.dense_vector(dense_feature_dim))
-    sparse_input = paddle.layer.data(
-        name="sparse_input",
-        type=paddle.data_type.sparse_binary_vector(sparse_feature_dim))
-    sparse_input_ids = [
-        paddle.layer.data(
-            name="C" + str(i),
-            type=paddle.data_type.integer_value(sparse_feature_dim))
-        for i in range(1, 27)
-    ]
-
-    dense_fm = fm_layer(
-        dense_input,
-        factor_size,
-        fm_param_attr=paddle.attr.Param(name="DenseFeatFactors"))
-    sparse_fm = fm_layer(
-        sparse_input,
-        factor_size,
-        fm_param_attr=paddle.attr.Param(name="SparseFeatFactors"))
-
-    def embedding_layer(input):
-        # the DNN embeddings reuse the "SparseFeatFactors" parameter, so the
-        # embedding table is shared with the FM latent factors of sparse_fm
-        return paddle.layer.embedding(
-            input=input,
-            size=factor_size,
-            param_attr=paddle.attr.Param(name="SparseFeatFactors"))
-
-    sparse_embed_seq = map(embedding_layer, sparse_input_ids)
-    sparse_embed = paddle.layer.concat(sparse_embed_seq)
-
-    fc1 = paddle.layer.fc(input=[sparse_embed, dense_input],
-                          size=400,
-                          act=paddle.activation.Relu())
-    fc2 = paddle.layer.fc(input=fc1, size=400, act=paddle.activation.Relu())
-    fc3 = paddle.layer.fc(input=fc2, size=400, act=paddle.activation.Relu())
-
-    # combine the two FM towers with the DNN tower into the final CTR estimate
-    predict = paddle.layer.fc(input=[dense_fm, sparse_fm, fc3],
-                              size=1,
-                              act=paddle.activation.Sigmoid())
-
-    if not infer:
-        label = paddle.layer.data(
-            name="label", type=paddle.data_type.dense_vector(1))
-        cost = paddle.layer.multi_binary_label_cross_entropy_cost(
-            input=predict, label=label)
-        paddle.evaluator.classification_error(
-            name="classification_error", input=predict, label=label)
-        paddle.evaluator.auc(name="auc", input=predict, label=label)
-        return cost
-    else:
-        return predict
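An implementation note, added here as an aside: the second-order FM term never needs the O(n²) double sum; it can be evaluated in O(nk) with the well-known identity ½[(Σᵢvᵢxᵢ)² − Σᵢ(vᵢxᵢ)²]. A NumPy sketch of the arithmetic with invented toy weights (this is not the PaddlePaddle implementation):

```python
import numpy as np


def fm_forward(x, w0, w, V):
    """Score a dense feature vector x with a factorization machine.
    V holds one row of k latent factors per input feature."""
    linear = w0 + x @ w
    s = V.T @ x                  # (k,) = sum_i v_i * x_i
    s2 = (V ** 2).T @ (x ** 2)   # (k,) = sum_i (v_i * x_i)^2
    pairwise = 0.5 * np.sum(s * s - s2)   # the FM identity, O(n * k)
    return linear + pairwise


rng = np.random.RandomState(0)
n, k = 13, 10                    # e.g. 13 dense features, factor_size=10
x = rng.rand(n)
print(fm_forward(x, 0.0, rng.randn(n), rng.randn(n, k) * 0.1))
```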
diff --git a/legacy/deep_fm/preprocess.py b/legacy/deep_fm/preprocess.py
deleted file mode 100755
index 36ffea16637c19dee9352d17ed51a67edf582167..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/preprocess.py
+++ /dev/null
@@ -1,164 +0,0 @@
-"""
-Preprocess the Criteo dataset. This dataset was used for the Display Advertising
-Challenge (https://www.kaggle.com/c/criteo-display-ad-challenge).
-"""
-import os
-import sys
-import click
-import random
-import collections
-
-# There are 13 integer features and 26 categorical features
-continous_features = range(1, 14)
-categorial_features = range(14, 40)
-
-# Clip integer features. The clip point for each integer feature
-# is derived from the 95% quantile of the total values in each feature
-continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
-
-
-class CategoryDictGenerator:
-    """
-    Generate a dictionary for each of the categorical features
-    """
-
-    def __init__(self, num_feature):
-        self.dicts = []
-        self.num_feature = num_feature
-        for i in range(0, num_feature):
-            self.dicts.append(collections.defaultdict(int))
-
-    def build(self, datafile, categorial_features, cutoff=0):
-        with open(datafile, 'r') as f:
-            for line in f:
-                features = line.rstrip('\n').split('\t')
-                for i in range(0, self.num_feature):
-                    if features[categorial_features[i]] != '':
-                        self.dicts[i][features[categorial_features[i]]] += 1
-        for i in range(0, self.num_feature):
-            self.dicts[i] = filter(lambda x: x[1] >= cutoff,
-                                   self.dicts[i].items())
-            self.dicts[i] = sorted(self.dicts[i], key=lambda x: (-x[1], x[0]))
-            vocabs, _ = list(zip(*self.dicts[i]))
-            self.dicts[i] = dict(zip(vocabs, range(1, len(vocabs) + 1)))
-            self.dicts[i][''] = 0
-
-    def gen(self, idx, key):
-        if key not in self.dicts[idx]:
-            res = self.dicts[idx]['']
-        else:
-            res = self.dicts[idx][key]
-        return res
-
-    def dicts_sizes(self):
-        return map(len, self.dicts)
-
-
-class ContinuousFeatureGenerator:
-    """
-    Normalize the integer features to [0, 1] by min-max normalization
-    """
-
-    def __init__(self, num_feature):
-        self.num_feature = num_feature
-        self.min = [sys.maxint] * num_feature
-        self.max = [-sys.maxint] * num_feature
-
-    def build(self, datafile, continous_features):
-        with open(datafile, 'r') as f:
-            for line in f:
-                features = line.rstrip('\n').split('\t')
-                for i in range(0, self.num_feature):
-                    val = features[continous_features[i]]
-                    if val != '':
-                        val = int(val)
-                        if val > continous_clip[i]:
-                            val = continous_clip[i]
-                        self.min[i] = min(self.min[i], val)
-                        self.max[i] = max(self.max[i], val)
-
-    def gen(self, idx, val):
-        if val == '':
-            return 0.0
-        val = float(val)
-        return (val - self.min[idx]) / (self.max[idx] - self.min[idx])
-
-
-@click.command("preprocess")
-@click.option("--datadir", type=str, help="Path to the raw criteo dataset")
-@click.option("--outdir", type=str, help="Path to save the processed data")
-def preprocess(datadir, outdir):
-    """
-    All 13 integer features are normalized to continuous values, and these
-    continuous features are combined into one vector with dimension 13.
-
-    Each of the 26 categorical features is one-hot encoded, and all the one-hot
-    vectors are combined into one sparse binary vector.
-    """
-    dists = ContinuousFeatureGenerator(len(continous_features))
-    dists.build(os.path.join(datadir, 'train.txt'), continous_features)
-
-    dicts = CategoryDictGenerator(len(categorial_features))
-    dicts.build(
-        os.path.join(datadir, 'train.txt'), categorial_features, cutoff=200)
-
-    dict_sizes = dicts.dicts_sizes()
-    categorial_feature_offset = [0]
-    for i in range(1, len(categorial_features)):
-        offset = categorial_feature_offset[i - 1] + dict_sizes[i - 1]
-        categorial_feature_offset.append(offset)
-
-    random.seed(0)
-
-    # 90% of the data are used for training, and 10% of the data are used
-    # for validation.
-    with open(os.path.join(outdir, 'train.txt'), 'w') as out_train:
-        with open(os.path.join(outdir, 'valid.txt'), 'w') as out_valid:
-            with open(os.path.join(datadir, 'train.txt'), 'r') as f:
-                for line in f:
-                    features = line.rstrip('\n').split('\t')
-
-                    continous_vals = []
-                    for i in range(0, len(continous_features)):
-                        val = dists.gen(i, features[continous_features[i]])
-                        continous_vals.append("{0:.6f}".format(val).rstrip('0')
-                                              .rstrip('.'))
-                    categorial_vals = []
-                    for i in range(0, len(categorial_features)):
-                        val = dicts.gen(i, features[categorial_features[
-                            i]]) + categorial_feature_offset[i]
-                        categorial_vals.append(str(val))
-
-                    continous_vals = ','.join(continous_vals)
-                    categorial_vals = ','.join(categorial_vals)
-                    label = features[0]
-                    if random.randint(0, 9999) % 10 != 0:
-                        out_train.write('\t'.join(
-                            [continous_vals, categorial_vals, label]) + '\n')
-                    else:
-                        out_valid.write('\t'.join(
-                            [continous_vals, categorial_vals, label]) + '\n')
-
-    with open(os.path.join(outdir, 'test.txt'), 'w') as out:
-        with open(os.path.join(datadir, 'test.txt'), 'r') as f:
-            for line in f:
-                features = line.rstrip('\n').split('\t')
-
-                continous_vals = []
-                for i in range(0, len(continous_features)):
-                    val = dists.gen(i, features[continous_features[i] - 1])
-                    continous_vals.append("{0:.6f}".format(val).rstrip('0')
-                                          .rstrip('.'))
-                categorial_vals = []
-                for i in range(0, len(categorial_features)):
-                    val = dicts.gen(i, features[categorial_features[
-                        i] - 1]) + categorial_feature_offset[i]
-                    categorial_vals.append(str(val))
-
-                continous_vals = ','.join(continous_vals)
-                categorial_vals = ','.join(categorial_vals)
-                out.write('\t'.join([continous_vals, categorial_vals]) + '\n')
-
-
-if __name__ == "__main__":
-    preprocess()
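Two details of this preprocessing are worth making explicit: integer features are clipped (at roughly the 95% quantile) before min-max normalization, and each categorical dictionary is shifted by an offset so the 26 one-hot blocks occupy disjoint index ranges. A small editorial illustration with invented numbers:

```python
def normalize(val, lo, hi, clip):
    val = min(val, clip)             # clip first, then min-max normalize
    return (val - lo) / float(hi - lo)


assert normalize(700, 0, 600, 600) == 1.0   # an outlier is clipped into range

# disjoint one-hot index ranges per categorical feature
dict_sizes = [4, 3, 5]                       # sizes of three toy dictionaries
offsets = [0]
for size in dict_sizes[:-1]:
    offsets.append(offsets[-1] + size)
assert offsets == [0, 4, 7]                  # feature 2's ids start at 4, feature 3's at 7
```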
diff --git a/legacy/deep_fm/reader.py b/legacy/deep_fm/reader.py
deleted file mode 100644
index 1098ce423c9071864671be91dea81972e47fbc98..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/reader.py
+++ /dev/null
@@ -1,58 +0,0 @@
-class Dataset:
-    def _reader_creator(self, path, is_infer):
-        def reader():
-            with open(path, 'r') as f:
-                for line in f:
-                    features = line.rstrip('\n').split('\t')
-                    dense_feature = map(float, features[0].split(','))
-                    sparse_feature = map(int, features[1].split(','))
-                    if not is_infer:
-                        label = [float(features[2])]
-                        yield [dense_feature, sparse_feature
-                               ] + sparse_feature + [label]
-                    else:
-                        yield [dense_feature, sparse_feature] + sparse_feature
-
-        return reader
-
-    def train(self, path):
-        return self._reader_creator(path, False)
-
-    def test(self, path):
-        return self._reader_creator(path, False)
-
-    def infer(self, path):
-        return self._reader_creator(path, True)
-
-
-feeding = {
-    'dense_input': 0,
-    'sparse_input': 1,
-    'C1': 2,
-    'C2': 3,
-    'C3': 4,
-    'C4': 5,
-    'C5': 6,
-    'C6': 7,
-    'C7': 8,
-    'C8': 9,
-    'C9': 10,
-    'C10': 11,
-    'C11': 12,
-    'C12': 13,
-    'C13': 14,
-    'C14': 15,
-    'C15': 16,
-    'C16': 17,
-    'C17': 18,
-    'C18': 19,
-    'C19': 20,
-    'C20': 21,
-    'C21': 22,
-    'C22': 23,
-    'C23': 24,
-    'C24': 25,
-    'C25': 26,
-    'C26': 27,
-    'label': 28
-}
diff --git a/legacy/deep_fm/train.py b/legacy/deep_fm/train.py
deleted file mode 100755
index 92d48696d8845ac13b714b66f7810acdd35fe164..0000000000000000000000000000000000000000
--- a/legacy/deep_fm/train.py
+++ /dev/null
@@ -1,108 +0,0 @@
-import os
-import gzip
-import logging
-import argparse
-
-import paddle.v2 as paddle
-
-from network_conf import DeepFM
-import reader
-
-logging.basicConfig()
-logger = logging.getLogger("paddle")
-logger.setLevel(logging.INFO)
-
-
-def parse_args():
-    parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
-    parser.add_argument(
-        '--train_data_path',
-        type=str,
-        required=True,
-        help="The path of the training dataset")
-    parser.add_argument(
-        '--test_data_path',
-        type=str,
-        required=True,
-        help="The path of the testing dataset")
-    parser.add_argument(
-        '--batch_size',
-        type=int,
-        default=1000,
-        help="The size of a mini-batch (default: 1000)")
-    parser.add_argument(
-        '--num_passes',
-        type=int,
-        default=10,
-        help="The number of passes to train (default: 10)")
-    parser.add_argument(
-        '--factor_size',
-        type=int,
-        default=10,
-        help="The factor size for the factorization machine (default: 10)")
-    parser.add_argument(
-        '--model_output_dir',
-        type=str,
-        default='models',
-        help='The path where models are stored (default: models)')
-
-    return parser.parse_args()
-
-
-def train():
-    args = parse_args()
-
-    if not os.path.isdir(args.model_output_dir):
-        os.mkdir(args.model_output_dir)
-
-    paddle.init(use_gpu=False, trainer_count=1)
-
-    optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
-
-    model = DeepFM(args.factor_size)
-
-    params = paddle.parameters.create(model)
-
-    trainer = paddle.trainer.SGD(cost=model,
-                                 parameters=params,
-                                 update_equation=optimizer)
-
-    dataset = reader.Dataset()
-
-    def __event_handler__(event):
-        if isinstance(event, paddle.event.EndIteration):
-            num_samples = event.batch_id * args.batch_size
-            if event.batch_id % 100 == 0:
-                logger.warning("Pass %d, Batch %d, Samples %d, Cost %f, %s" %
-                               (event.pass_id, event.batch_id, num_samples,
-                                event.cost, event.metrics))
-
-            if event.batch_id % 10000 == 0:
-                if args.test_data_path:
-                    result = trainer.test(
-                        reader=paddle.batch(
-                            dataset.test(args.test_data_path),
-                            batch_size=args.batch_size),
-                        feeding=reader.feeding)
-                    logger.warning("Test %d-%d, Cost %f, %s" %
-                                   (event.pass_id, event.batch_id, result.cost,
-                                    result.metrics))
-
-                path = "{}/model-pass-{}-batch-{}.tar.gz".format(
-                    args.model_output_dir, event.pass_id, event.batch_id)
-                with gzip.open(path, 'w') as f:
-                    trainer.save_parameter_to_tar(f)
-
-    trainer.train(
-        reader=paddle.batch(
-            paddle.reader.shuffle(
-                dataset.train(args.train_data_path),
-                buf_size=args.batch_size * 10000),
-            batch_size=args.batch_size),
-        feeding=reader.feeding,
-        event_handler=__event_handler__,
-        num_passes=args.num_passes)
-
-
-if __name__ == '__main__':
-    train()
diff --git a/legacy/dssm/README.cn.md b/legacy/dssm/README.cn.md
deleted file mode 100644
index 140446ad2e071e8bc185d7788dcf33651a370d69..0000000000000000000000000000000000000000
--- a/legacy/dssm/README.cn.md
+++ /dev/null
@@ -1,294 +0,0 @@
-The code samples in this directory require PaddlePaddle v0.10.0. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.
-
----
-
-# Deep Structured Semantic Models (DSSM)
-DSSM uses a DNN to learn low-dimensional representations of text in a continuous semantic space and to model the semantic similarity between two sentences. This example shows how to implement a general-purpose DSSM model with PaddlePaddle for modeling the semantic similarity between two strings. The implementation supports a generic data format, so the model can be used in real-world scenarios simply by swapping in your own data.
-
-## Background
-DSSM \[[1](#参考文献)\] is a classic semantic model proposed by Microsoft Research in 2013 for learning the semantic distance between two pieces of text. More broadly, the model also generalizes to scenarios such as:
-
-1. CTR prediction, measuring how relevant a user query is to a set of candidate web pages (documents).
-2. Text relevance, measuring the semantic relatedness of two strings.
-3. Recommendation, measuring the relatedness between a user and a recommended item.
-
-DSSM has grown into a framework that naturally models the distance between two records. For text relevance, cosine similarity can express the semantic distance; for ranking search-engine results, a rank loss can be attached on top of DSSM to train a ranking model.
-
-## Model overview
-In the original paper \[[1](#参考文献)\], DSSM measures the latent semantic relation between a user query and a set of candidate documents; the model structure is shown below:

-Figure 1. The original DSSM architecture
-
-The idea it carries through is to **use a DNN to map high-dimensional feature vectors into continuous vectors in a low-dimensional space (the red boxes in the figure)** and to **use cosine similarity on top to measure the semantic relevance between the query and each candidate document**.
-
-For the top-level loss, the original model uses negative sampling similar to Word2Vec: for each query, one positive document $D^+$ and four negative documents $D^-$ are drawn, a conditional probability is computed over them, and the log-likelihood serves as the loss. This is the $P(D_1|Q)$-like structure in Figure 1; see the original paper for details.
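Restated from the original paper as an editorial aside ($\gamma$ is a smoothing factor set empirically), the posterior and the training objective are:

```latex
P(D \mid Q) = \frac{\exp\left(\gamma \cos(Q, D)\right)}
                   {\sum_{D' \in \mathbf{D}} \exp\left(\gamma \cos(Q, D')\right)},
\qquad
L = -\log \prod_{(Q, D^{+})} P(D^{+} \mid Q)
```

where $\mathbf{D}$ contains the positive document $D^{+}$ and the four sampled negatives.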

-As follow-up work refined DSSM, the structure was simplified \[[3](#参考文献)\] and evolved into:
-
-Figure 2. The generalized DSSM architecture
-
-The blank boxes in the figure can be replaced by any model, such as fully connected (FC) layers, a convolutional network (CNN), or a recurrent network (RNN). This structure is dedicated to measuring the semantic distance between two elements (for example, two strings). In practice, DSSM serves as a basic building block that is paired with different loss functions to carry out a concrete task, for example:
-
-- In learning to rank, adding a pairwise rank loss to the structure in Figure 2 yields a ranking model.
-- In CTR prediction, treating clicked-or-not as a 0/1 binary classification and adding a cross-entropy loss yields a classification model.
-- When a candidate string needs a score, cosine similarity can serve as the similarity measure, yielding a regression model.
-
-This example provides a fairly general solution. The supported task types are:
-
-- classification
-- regression within the range [-1, 1]
-- pairwise rank
-
-For the structure that produces the low-dimensional semantic vectors, three options are supported:
-
-- FC, a multi-layer fully connected network
-- CNN, a convolutional neural network
-- RNN, a recurrent neural network
-
-## Model implementation
-A DSSM model splits into three parts: the left and right DNNs, and the loss function on top. In complex tasks the two DNNs may have different structures. In the original paper the two networks learn semantic vectors for the query and the documents respectively; since the two sides consume different data, customizing each DNN accordingly is recommended.
-
-**For simplicity and generality, this example gives the left and right DNNs the same structure, so only the three options FC, CNN, and RNN are provided.**
-
-Three loss types are supported as well: classification, regression, and rank. For the regression and rank losses, the matching degree between the left and right sides is computed with cosine similarity (a small illustration follows below); for classification, the predicted class distribution is computed with softmax.
-
-Many of these topics are covered in detail in other tutorials, for example:
-
-- Extracting text information with CNNs and FC layers: see [text classification](https://github.com/PaddlePaddle/models/blob/develop/text_classification/README.md#模型详解)
-- RNN/GRU: see [Machine Translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#gated-recurrent-unit-gru)
-- Pairwise rank, i.e. learning to rank: see [learn to rank](https://github.com/PaddlePaddle/models/blob/develop/ltr/README.md)
-
-The underlying theory is not repeated here; the rest of this document focuses on implementing these structures with PaddlePaddle.
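Since the regression and rank variants both reduce to a cosine score between the two semantic vectors, here is that computation in plain NumPy (an editorial illustration with invented vectors, not the PaddlePaddle layer):

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    # cos(a, b) in [-1, 1]: the matching score DSSM places on top of the
    # left and right semantic vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


left = np.array([0.2, -0.5, 0.1])    # toy semantic vectors
right = np.array([0.1, -0.4, 0.3])
print(cosine_similarity(left, right))
```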

-As Figure 3 shows, the regression and classification models share a similar structure:
-
-Figure 3. DSSM for REGRESSION or CLASSIFICATION
-
-The most important components are the word embeddings, the two low-dimensional-vector learners `(1)` and `(2)` in the figure (each can be implemented with any of RNN/CNN/FC), and the loss function on top.
-
-The pairwise rank structure is more complex: the structure in Figure 4 appears twice, with a corresponding rank loss added. The overall idea is:
-
-- Given the same source, score the left and right targets separately — `(a)` and `(b)`; the learning objective is the ordering between (a) and (b). A minimal sketch of this objective follows Figure 4 below.
-- `(a)` and `(b)` resemble the structure in Figure 3 and each scores one (source, target) pair.
-- `(1)` and `(2)` actually share one structure; both represent the same source and are only drawn twice for clarity.

-Figure 4. DSSM for Pairwise Rank
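A minimal sketch of the pairwise idea in pure Python (editorial, with invented scores): the shared structure scores (source, left target) and (source, right target), and the training signal is only which of the two scores should be larger. The logistic loss on the score difference shown here is the RankNet-style formulation commonly used for pairwise rank; the exact cost used by PaddlePaddle may differ in detail.

```python
import math


def rank_cost(score_left, score_right, label):
    """label = 1 if the left target should rank above the right one, else 0.
    Logistic loss on the score difference."""
    p_left_wins = 1.0 / (1.0 + math.exp(-(score_left - score_right)))
    return -(label * math.log(p_left_wins) +
             (1 - label) * math.log(1 - p_left_wins))


print(rank_cost(0.8, 0.3, 1))   # label agrees with the ordering -> ~0.47
print(rank_cost(0.8, 0.3, 0))   # label disagrees -> larger loss, ~0.97
```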

-
-The concrete implementation of each part follows; the relevant code is all contained in `./network_conf.py`.
-
-
-### Creating the word embedding table
-
-```python
-def create_embedding(self, input, prefix=''):
-    """
-    Create a word embedding. The `prefix` is prepended to the name of the
-    embedding's learnable parameter.
-    """
-    logger.info("Create embedding table [%s] whose dimension is %d" %
-                (prefix, self.dnn_dims[0]))
-    emb = paddle.layer.embedding(
-        input=input,
-        size=self.dnn_dims[0],
-        param_attr=ParamAttr(name='%s_emb.w' % prefix))
-    return emb
-```
-
-Because the input to the embedding table is the list of word IDs of a sentence, the embedding table outputs a sequence of word vectors.
-
-### CNN implementation
-
-```python
-def create_cnn(self, emb, prefix=''):
-    """
-    A multi-layer CNN.
-    :param emb: The word embedding.
-    :type emb: paddle.layer
-    :param prefix: The prefix will be added to the layers' names.
-    :type prefix: str
-    """
-
-    def create_conv(context_len, hidden_size, prefix):
-        key = "%s_%d_%d" % (prefix, context_len, hidden_size)
-        conv = paddle.networks.sequence_conv_pool(
-            input=emb,
-            context_len=context_len,
-            hidden_size=hidden_size,
-            # set parameter attr for parameter sharing
-            context_proj_param_attr=ParamAttr(name=key + "contex_proj.w"),
-            fc_param_attr=ParamAttr(name=key + "_fc.w"),
-            fc_bias_attr=ParamAttr(name=key + "_fc.b"),
-            pool_bias_attr=ParamAttr(name=key + "_pool.b"))
-        return conv
-
-    conv_3 = create_conv(3, self.dnn_dims[1], "cnn")
-    conv_4 = create_conv(4, self.dnn_dims[1], "cnn")
-    return paddle.layer.concat(input=[conv_3, conv_4])
-```
-
-The CNN takes the sequence of word vectors and, through convolution and pooling, captures the key information of the original sentence, finally producing one semantic vector (which can be regarded as a sentence vector).
-
-In this implementation, the sentence vectors learned by CNNs with window sizes 3 and 4 are concatenated to form the final sentence vector (see the `paddle.layer.concat` call above).
-
-### RNN implementation
-
-An RNN is well suited to learning from variable-length sequences; using an RNN to encode a sentence is practically standard in natural language processing tasks.
-
-```python
-def create_rnn(self, emb, prefix=''):
-    """
-    A GRU sentence vector learner.
-    """
-    gru = paddle.networks.simple_gru(
-        input=emb,
-        size=self.dnn_dims[1],
-        mixed_param_attr=ParamAttr(name='%s_gru_mixed.w' % prefix),
-        mixed_bias_param_attr=ParamAttr(name="%s_gru_mixed.b" % prefix),
-        gru_param_attr=ParamAttr(name='%s_gru.w' % prefix),
-        gru_bias_attr=ParamAttr(name="%s_gru.b" % prefix))
-    sent_vec = paddle.layer.last_seq(gru)
-    return sent_vec
-```
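An editorial aside before the FC variant below: what distinguishes these encoders is mainly how a variable-length sequence of vectors is collapsed into one fixed-size sentence vector — the last GRU state above, max pooling below. A toy NumPy illustration with invented values (the rows stand in for per-timestep outputs):

```python
import numpy as np

seq = np.array([[0.1, 0.9],    # one row per word in the sentence
                [0.7, 0.2],
                [0.4, 0.5]])

last_state = seq[-1]             # what a last_seq-style encoder keeps: [0.4 0.5]
max_pooled = seq.max(axis=0)     # what max pooling keeps: [0.7 0.9]
print(last_state, max_pooled)
```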
-### Multi-layer fully connected network (FC)
-
-```python
-def create_fc(self, emb, prefix=''):
-    """
-    A multi-layer fully connected neural network.
-    :param emb: The output of the embedding layer
-    :type emb: paddle.layer
-    :param prefix: A prefix will be added to the layers' names.
-    :type prefix: str
-    """
-
-    _input_layer = paddle.layer.pooling(
-        input=emb, pooling_type=paddle.pooling.Max())
-    fc = paddle.layer.fc(
-        input=_input_layer,
-        size=self.dnn_dims[1],
-        param_attr=ParamAttr(name='%s_fc.w' % prefix),
-        bias_attr=ParamAttr(name="%s_fc.b" % prefix))
-    return fc
-```
-
-When building the fully connected network, `paddle.layer.pooling` first applies max pooling to the sequence of word vectors, turning the variable-length sequence into one fixed-dimension vector that represents the whole sentence. Max pooling reduces the influence of sentence length on the sentence representation.
-
-### Multi-layer DNN
-After the CNN/RNN/FC has extracted the semantic vector, further FC layers can be stacked on top to form a deep DNN.
-
-```python
-def create_dnn(self, sent_vec, prefix):
-    if len(self.dnn_dims) > 1:
-        _input_layer = sent_vec
-        for id, dim in enumerate(self.dnn_dims[1:]):
-            name = "%s_fc_%d_%d" % (prefix, id, dim)
-            fc = paddle.layer.fc(
-                input=_input_layer,
-                size=dim,
-                act=paddle.activation.Tanh(),
-                param_attr=ParamAttr(name='%s.w' % name),
-                bias_attr=ParamAttr(name='%s.b' % name), )
-            _input_layer = fc
-    return _input_layer
-```
-
-### Classification and regression
-The structures for classification and regression are similar; for the concrete implementation, see the
-`_build_classification_or_regression_model` function in [network_conf.py](https://github.com/PaddlePaddle/models/blob/develop/dssm/network_conf.py).
-
-### Pairwise Rank
-Pairwise rank reuses the DNN structure above: the same source is scored against two targets, and the prediction is 1 if the left target scores higher, 0 otherwise. For the implementation, see the `_build_rank_model` function in [network_conf.py](https://github.com/PaddlePaddle/models/blob/develop/dssm/network_conf.py).
-
-## Data format
-Simple example data is provided under `./data`.
-
-### Data format for regression
-```
-# 3 fields each line:
-#   - source word list
-#   - target word list
-#   - target
-<source words> \t <target words> \t <target>
-```
-
-For example:
-
-```
-苹果 六 袋 苹果 6s 0.1
-新手 汽车 驾驶 驾校 培训 0.9
-```
-### Data format for classification
-```
-# 3 fields each line:
-#   - source word list
-#   - target word list
-#   - target
-<source words> \t <target words> \t