Commit 3f83c18d authored by wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/models into rl

#!/usr/bin/env bash
set -e
readonly VERSION="3.8"
version=$(clang-format -version)
......
- repo: https://github.com/PaddlePaddle/mirrors-yapf.git
sha: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37
hooks:
- id: yapf
files: \.py$
......
......@@ -16,11 +16,12 @@ addons:
- python
- python-pip
- python2.7-dev
- clang-format-3.8
ssh_known_hosts: 52.76.173.135
before_install:
- sudo pip install -U virtualenv pre-commit pip
- docker pull paddlepaddle/paddle:latest
script:
- exit_code=0
......
......@@ -29,15 +29,16 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way
The click-through rate (CTR) prediction model estimates the probability that a user will click on an ad, making a prediction for every ad impression; it is one of the core algorithms in advertising technology. Logistic regression learns large-scale sparse features well and dominated the early stage of CTR prediction. In recent years, DNN models, thanks to their strong learning ability, have gradually taken over the task.
In the CTR prediction task, we first present the Wide & Deep model proposed by Google. It fuses the strengths of DNNs, which are good at learning abstract features, and of logistic regression, which suits large-scale sparse features; it can be used as a relatively mature model framework and has seen some industrial adoption. We also provide a deep neural network model based on factorization machines, which combines a factorization machine and a deep neural network to model low-order and high-order interactions of the input features, respectively.
- 3.1 [Wide & Deep click-through rate prediction model](https://github.com/PaddlePaddle/models/tree/develop/ctr/README.cn.md)
- 3.2 [Deep-factorization-machine-based click-through rate prediction model](https://github.com/PaddlePaddle/models/tree/develop/deep_fm)

## 4. Text classification

Text classification is one of the most fundamental tasks in natural language processing. Deep learning methods can dispense with complex feature engineering, take raw text directly as input, and optimize classification accuracy in a data-driven way.

Taking sentiment classification as an example, we provide a non-sequence text classification model based on DNN and a sequence model based on CNN for everyone to learn from and use (for an LSTM-based model, see the [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.cn.html) chapter of PaddleBook).

- 4.1 [Sentiment classification based on DNN/CNN](https://github.com/PaddlePaddle/models/tree/develop/text_classification)
- 4.2 [Text classification model based on nested sequences](https://github.com/PaddlePaddle/models/tree/develop/nested_sequence/text_classification)
......@@ -46,7 +47,7 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

Learning to rank (LTR) is one of the core problems in information retrieval and search engine research. A scoring function is learned to score the candidates to be ranked, and the order is then determined by the scores. Deep neural networks can be used to model the scoring function, forming various deep-learning-based LTR models.

In the learning-to-rank task, we introduce a pairwise ranking model based on the RankLoss loss function and a listwise ranking model based on the LambdaRank loss function (for the pointwise approach, see the [Recommender System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.cn.html) chapter of PaddleBook).

- 5.1 [Learning to rank based on pairwise and listwise approaches](https://github.com/PaddlePaddle/models/tree/develop/ltr)
......@@ -56,7 +57,7 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

In the structured semantic model task, we demonstrate how to model the semantic similarity between two strings. The model supports different network structures such as DNN (fully connected feed-forward network), CNN (convolutional network), and RNN (recurrent neural network), as well as different loss functions for classification, regression, and ranking. This example uses the simplest text data as input; by substituting your own training and prediction data, you can use it in real scenarios.

- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/dssm/README.cn.md)

## 7. Named entity recognition
......@@ -72,7 +73,7 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

In the sequence-to-sequence learning task, we first take machine translation as an example and provide several improved models for everyone to learn from and use, including: a sequence-to-sequence mapping model without an attention mechanism, which is the basis of all sequence-to-sequence learning models; Scheduled Sampling, which mitigates the error accumulation of RNN models in generation tasks; and neural machine translation with an external memory mechanism, which enhances the network's memory capacity to handle complex sequence-to-sequence learning tasks. Beyond machine translation, we also provide a model that generates classical Chinese poetry with a deep LSTM network, i.e., generation within one language.

- 8.1 [Neural machine translation without attention](https://github.com/PaddlePaddle/models/tree/develop/nmt_without_attention/README.cn.md)
- 8.2 [Improving translation quality with Scheduled Sampling](https://github.com/PaddlePaddle/models/tree/develop/scheduled_sampling)
- 8.3 [Neural machine translation with external memory](https://github.com/PaddlePaddle/models/tree/develop/mt_with_external_memory)
- 8.4 [Generating classical Chinese poetry](https://github.com/PaddlePaddle/models/tree/develop/generate_chinese_poetry)
......@@ -97,12 +98,16 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

Compared with text, images provide information that is more vivid, easier to understand, and more artistic, and they are an important medium for people to convey and exchange information. Image classification distinguishes images of different categories based on their semantic information. It is an important fundamental problem in computer vision and the basis of other high-level vision tasks such as object detection, image segmentation, object tracking, and behavior analysis, with wide applications in many fields, e.g., face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic album categorization on the internet, and image recognition in medicine.

In the image classification task, we show how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models. We also provide conversion tools that turn model files trained with Caffe or TensorFlow into PaddlePaddle model files.

- 11.1 [Convert a Caffe model file into a PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle)
- 11.2 [Convert a TensorFlow model file into a PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle)
- 11.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 11.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
## 12. Object detection

......@@ -110,7 +115,7 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

In the object detection task, we introduce object detection with SSD (Single Shot MultiBox Detector), one of the newer detection algorithms with strong results; it is both fast and accurate.

- 12.1 [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/tree/develop/ssd/README.cn.md)

## 13. Scene text recognition
......@@ -126,7 +131,6 @@ PaddlePaddle provides rich computation units for building diverse deep learning models in a modular way

In the speech recognition task, we provide a complete pipeline based on the DeepSpeech2 model, including feature extraction, data augmentation, model training, a language model, and a decoding module, together with a trained model and a hands-on demo, so that everyone can experience speech recognition with their own voice.

- 14.1 [Speech recognition: DeepSpeech2](https://github.com/PaddlePaddle/DeepSpeech)

This tutorial is created by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and is licensed under the [Apache-2.0](LICENSE) license.
......@@ -25,15 +25,16 @@ The language model is important in the field of natural language processing. In
## 3. Click-Through Rate prediction
The click-through rate model predicts the probability that a user will click on an ad; it is widely used in advertising technology. Logistic regression showed good learning performance for large-scale sparse features in the early stages of click-through rate prediction. In recent years, DNN models, owing to their strong learning ability, have gradually taken over the task.
In the click-through rate example, we first present Google's Wide & Deep model, which combines the strengths of a DNN for learning abstract features with those of logistic regression for large-scale sparse features. We then provide a deep factorization machine for click-through rate prediction, which combines a factorization machine and deep neural networks to model both low-order and high-order interactions of the input features.
- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/ctr)
- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/deep_fm)
## 4. Text classification
Text classification is one of the most basic tasks in natural language processing. Deep learning methods can eliminate complex feature engineering and use the raw text as input to optimize classification accuracy.
For text classification, we provide a non-sequential text classification model based on DNN and a sequential model based on CNN. (For an LSTM-based model, please refer to the PaddleBook chapter [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html).)
- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/text_classification)
......@@ -42,7 +43,7 @@ For text classification, we provide a non-sequential text classification model b
Learning to rank (LTR) is one of the core problems in information retrieval and search engine research. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries.
Deep neural networks can be used to model the scoring function, forming various LTR models based on deep learning.
The algorithms for learning to rank are usually categorized into three groups by their input representation and loss function: pointwise, pairwise, and listwise approaches. Here we demonstrate the RankLoss loss function (pairwise approach) and the LambdaRank loss function (listwise approach). (For pointwise approaches, please refer to the [Recommender System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html) chapter.)
- 5.1 [Learning to rank based on pairwise and listwise approaches](https://github.com/PaddlePaddle/models/tree/develop/ltr)
......@@ -71,11 +72,15 @@ As an example for sequence-to-sequence learning, we take the machine translation
## 9. Image classification
For the example of image classification, we show you how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. We also provide model conversion tools that convert model files trained with Caffe or TensorFlow into PaddlePaddle model files.
- 9.1 [Convert a Caffe model file to a PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/caffe2paddle)
- 9.2 [Convert a TensorFlow model file to a PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/image_classification/tf2paddle)
- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/image_classification)
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
......@@ -4,55 +4,63 @@ This model implements the work in the following paper:
Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. Association for Computational Linguistics (ACL), 2017
# Data Preparation
- The data used in this tutorial can be downloaded by running:

    ```bash
    sh download.sh
    ```

- In this tutorial, each line in a data file contains one sample, and each sample consists of a source sentence and a target sentence separated by '\t'. So, to use your own data, it should be organized as follows:

    ```
    <source sentence>\t<target sentence>
    ```
# Training a Model
- Modify the following script if needed and then run:
    ```bash
    python train.py \
        --train_data_path ./data/train \
        --test_data_path ./data/test \
        --src_dict_path ./data/src_dict \
        --trg_dict_path ./data/trg_dict \
        --enc_blocks "[(256, 3)] * 5" \
        --dec_blocks "[(256, 3)] * 3" \
        --emb_size 256 \
        --pos_size 200 \
        --drop_rate 0.2 \
        --use_bn False \
        --use_gpu False \
        --trainer_count 1 \
        --batch_size 32 \
        --num_passes 20 \
        >train.log 2>&1
    ```
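The `--enc_blocks`/`--dec_blocks` flags are Python list literals of `(hidden_size, context_len)` tuples; `train.py` parses them with `eval`. A minimal illustration (not part of the repo's code):

```python
# "[(256, 3)] * 5" -> five convolution blocks, each with 256 hidden units
# and a context (kernel) width of 3.
enc_conv_blocks = eval("[(256, 3)] * 5")
print(enc_conv_blocks)  # [(256, 3), (256, 3), (256, 3), (256, 3), (256, 3)]
```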
# Inference with a Trained Model
- Decode with a trained model by running:
    ```bash
    python infer.py \
        --infer_data_path ./data/dev \
        --src_dict_path ./data/src_dict \
        --trg_dict_path ./data/trg_dict \
        --enc_blocks "[(256, 3)] * 5" \
        --dec_blocks "[(256, 3)] * 3" \
        --emb_size 256 \
        --pos_size 200 \
        --drop_rate 0.2 \
        --use_bn False \
        --use_gpu False \
        --trainer_count 1 \
        --max_len 100 \
        --batch_size 256 \
        --beam_size 1 \
        --is_show_attention False \
        --model_path ./params.pass-0.tar.gz \
        1>infer_result 2>infer.log
    ```
# Notes
Currently, beam search forwards the encoder multiple times when predicting each target word, which requires extra computation. We will fix this later.
Since the current version of PaddlePaddle does not support weight normalization, we use batch normalization instead to ensure convergence when the network is deep.
......@@ -2,13 +2,15 @@
import sys
import time
import math
import numpy as np
import reader
class BeamSearch(object):
"""
Generate sequence by beam search
NOTE: this class only implements generating one sentence at a time.
"""
def __init__(self,
......@@ -16,44 +18,44 @@ class BeamSearch(object):
                 trg_dict,
                 pos_size,
                 padding_num,
                 batch_size=1,
                 beam_size=1,
                 max_len=100):
        self.inferer = inferer
        self.trg_dict = trg_dict
        self.reverse_trg_dict = reader.get_reverse_dict(trg_dict)
        self.word_padding = trg_dict.__len__()
        self.pos_size = pos_size
        self.pos_padding = pos_size
        self.padding_num = padding_num
        self.win_len = padding_num + 1
        self.max_len = max_len
        self.batch_size = batch_size
        self.beam_size = beam_size
    def get_beam_input(self, batch, sample_list):
        """
        Get input for generation at the current iteration.
        """
        beam_input = []
        for sample_id in sample_list:
            for path in self.candidate_path[sample_id]:
                if len(path['seq']) < self.win_len:
                    cur_trg = [self.word_padding] * (
                        self.win_len - len(path['seq']) - 1
                    ) + [self.trg_dict['<s>']] + path['seq']
                    cur_trg_pos = [self.pos_padding] * (
                        self.win_len - len(path['seq']) - 1) + [0] + range(
                            1, len(path['seq']) + 1)
                else:
                    cur_trg = path['seq'][-self.win_len:]
                    cur_trg_pos = range(
                        len(path['seq']) + 1 - self.win_len,
                        len(path['seq']) + 1)
                beam_input.append(batch[sample_id] + [cur_trg] + [cur_trg_pos])
        return beam_input
def get_prob(self, beam_input):
......@@ -64,100 +66,132 @@ class BeamSearch(object):
prob = self.inferer.infer(beam_input, field='value')[row_list, :]
return prob
    def _top_k(self, prob, k):
        """
        Get indices of the words with the k highest probabilities.
        """
        return prob.argsort()[-k:][::-1]

    def beam_expand(self, prob, sample_list):
        """
        In every iteration step, the model predicts the possible next words.
        For each input sentence, the top beam_size words are selected as candidates.
        """
        top_words = np.apply_along_axis(self._top_k, 1, prob, self.beam_size)

        candidate_words = [[]] * len(self.candidate_path)
        idx = 0

        for sample_id in sample_list:
            for seq_id, path in enumerate(self.candidate_path[sample_id]):
                for w in top_words[idx, :]:
                    score = path['score'] + math.log(prob[idx, w])
                    candidate_words[sample_id] = candidate_words[sample_id] + [{
                        'word': w,
                        'score': score,
                        'seq_id': seq_id
                    }]
                idx = idx + 1

        return candidate_words
    def beam_shrink(self, candidate_words, sample_list):
        """
        Pruning process of the beam search. During the process, the beam_size
        most probable sequences are selected for the beam in the next generation.
        """
        new_path = [[]] * len(self.candidate_path)

        for sample_id in sample_list:
            beam_words = sorted(
                candidate_words[sample_id],
                key=lambda x: x['score'],
                reverse=True)[:self.beam_size]

            complete_seq_min_score = None
            complete_path_num = len(self.complete_path[sample_id])

            if complete_path_num > 0:
                complete_seq_min_score = min(self.complete_path[sample_id],
                                             key=lambda x: x['score'])['score']
                if complete_path_num >= self.beam_size:
                    beam_words_max_score = beam_words[0]['score']
                    if beam_words_max_score < complete_seq_min_score:
                        continue

            for w in beam_words:
                if w['word'] == self.trg_dict['<e>']:
                    if complete_path_num < self.beam_size or complete_seq_min_score <= w[
                            'score']:
                        seq = self.candidate_path[sample_id][w['seq_id']]['seq']
                        self.complete_path[sample_id] = self.complete_path[
                            sample_id] + [{
                                'seq': seq,
                                'score': w['score']
                            }]
                        if complete_seq_min_score is None or complete_seq_min_score > w[
                                'score']:
                            complete_seq_min_score = w['score']
                else:
                    seq = self.candidate_path[sample_id][w['seq_id']]['seq'] + [
                        w['word']
                    ]
                    new_path[sample_id] = new_path[sample_id] + [{
                        'seq': seq,
                        'score': w['score']
                    }]

        return new_path
    def search_one_batch(self, batch):
        """
        Perform beam search on one mini-batch.
        """
        real_size = len(batch)
        self.candidate_path = [[{'seq': [], 'score': 0.}]] * real_size
        self.complete_path = [[]] * real_size
        sample_list = range(real_size)

        for i in xrange(self.max_len):
            beam_input = self.get_beam_input(batch, sample_list)
            prob = self.get_prob(beam_input)

            candidate_words = self.beam_expand(prob, sample_list)
            new_path = self.beam_shrink(candidate_words, sample_list)
            self.candidate_path = new_path
            sample_list = [
                sample_id for sample_id in sample_list
                if len(new_path[sample_id]) > 0
            ]

            if len(sample_list) == 0:
                break

        final_path = []
        for i in xrange(real_size):
            top_path = sorted(
                self.complete_path[i] + self.candidate_path[i],
                key=lambda x: x['score'],
                reverse=True)[:self.beam_size]
            final_path.append(top_path)
        return final_path
    def search(self, infer_data):
        """
        Perform beam search on all data.
        """

        def _to_sentence(seq):
            raw_sentence = [self.reverse_trg_dict[id] for id in seq]
            sentence = " ".join(raw_sentence)
            return sentence

        for pos in xrange(0, len(infer_data), self.batch_size):
            batch = infer_data[pos:min(pos + self.batch_size, len(infer_data))]
            self.final_path = self.search_one_batch(batch)
            for top_path in self.final_path:
                print _to_sentence(top_path[0]['seq'])
            sys.stdout.flush()
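For reference, a minimal sketch of how `infer.py` in this same commit drives the class (names follow the diff later in this page):

```python
# Build an Inference object over the model's output probabilities, wrap it
# in BeamSearch, then decode the whole dataset; search() prints one decoded
# sentence per input sample.
inferer = paddle.inference.Inference(output_layer=prob, parameters=parameters)

searcher = BeamSearch(
    inferer=inferer,
    trg_dict=trg_dict,
    pos_size=pos_size,
    padding_num=padding_num,
    max_len=max_len,
    batch_size=batch_size,
    beam_size=beam_size)

searcher.search(infer_data)
```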
#!/usr/bin/env bash
CUR_PATH=`pwd`
git clone https://github.com/moses-smt/mosesdecoder.git
git clone https://github.com/rizar/actor-critic-public
export MOSES=`pwd`/mosesdecoder
export LVSR=`pwd`/actor-critic-public
cd actor-critic-public/exp/ted
sh create_dataset.sh
cd $CUR_PATH
mkdir data
cp actor-critic-public/exp/ted/prep/*-* data/
cp actor-critic-public/exp/ted/vocab.* data/
cd data
python ../preprocess.py
cd ..
rm -rf actor-critic-public mosesdecoder
......@@ -36,7 +36,7 @@ def parse_args():
parser.add_argument(
'--emb_size',
type=int,
default=256,
help='Dimension of word embedding. (default: %(default)s)')
parser.add_argument(
'--pos_size',
......@@ -48,6 +48,11 @@ def parse_args():
type=float,
default=0.,
help='Dropout rate. (default: %(default)s)')
parser.add_argument(
"--use_bn",
default=False,
type=distutils.util.strtobool,
help="Use batch normalization or not. (default: %(default)s)")
parser.add_argument(
"--use_gpu",
default=False,
......@@ -64,36 +69,43 @@ def parse_args():
default=100,
help="The maximum length of the sentence to be generated. (default: %(default)s)"
)
parser.add_argument(
"--batch_size",
default=1,
type=int,
help="Size of a mini-batch. (default: %(default)s)")
parser.add_argument(
"--beam_size",
default=1,
type=int,
help="The width of beam expansion. (default: %(default)s)")
parser.add_argument(
"--model_path",
type=str,
required=True,
help="The path of trained model. (default: %(default)s)")
parser.add_argument(
"--is_show_attention",
default=False,
type=distutils.util.strtobool,
help="Whether to show attention weight or not. (default: %(default)s)")
return parser.parse_args()
def infer(infer_data_path,
          src_dict_path,
          trg_dict_path,
          model_path,
          enc_conv_blocks,
          dec_conv_blocks,
          emb_dim=256,
          pos_size=200,
          drop_rate=0.,
          use_bn=False,
          max_len=100,
          batch_size=1,
          beam_size=1,
          is_show_attention=False):
"""
Inference.
......@@ -120,10 +132,14 @@ def infer(infer_data_path,
:type pos_size: int
:param drop_rate: Dropout rate.
:type drop_rate: float
:param use_bn: Whether to use batch normalization or not. False is the default value.
:type use_bn: bool
:param max_len: The maximum length of the sentence to be generated.
:type max_len: int
:param beam_size: The width of beam expansion.
:type beam_size: int
:param is_show_attention: Whether to show attention weight or not. False is the default value.
:type is_show_attention: bool
"""
# load dict
src_dict = reader.load_dict(src_dict_path)
......@@ -131,7 +147,7 @@ def infer(infer_data_path,
src_dict_size = src_dict.__len__()
trg_dict_size = trg_dict.__len__()
prob, weight = conv_seq2seq(
src_dict_size=src_dict_size,
trg_dict_size=trg_dict_size,
pos_size=pos_size,
......@@ -139,6 +155,7 @@ def infer(infer_data_path,
enc_conv_blocks=enc_conv_blocks,
dec_conv_blocks=dec_conv_blocks,
drop_rate=drop_rate,
with_bn=use_bn,
is_infer=True)
# load parameters
......@@ -153,6 +170,26 @@ def infer(infer_data_path,
pos_size=pos_size,
padding_num=padding_num)
    if is_show_attention:
        attention_inferer = paddle.inference.Inference(
            output_layer=weight, parameters=parameters)
        for i, data in enumerate(infer_reader()):
            src_len = len(data[0])
            trg_len = len(data[2])
            attention_weight = attention_inferer.infer(
                [data], field='value', flatten_result=False)
            attention_weight = [
                weight.reshape((trg_len, src_len))
                for weight in attention_weight
            ]
            print attention_weight
            break
        return
infer_data = []
for i, raw_data in enumerate(infer_reader()):
infer_data.append([raw_data[0], raw_data[1]])
inferer = paddle.inference.Inference(
output_layer=prob, parameters=parameters)
......@@ -162,15 +199,10 @@ def infer(infer_data_path,
pos_size=pos_size,
padding_num=padding_num,
max_len=max_len,
batch_size=batch_size,
beam_size=beam_size)
searcher.search(infer_data)
return
......@@ -179,6 +211,8 @@ def main():
enc_conv_blocks = eval(args.enc_blocks)
dec_conv_blocks = eval(args.dec_blocks)
sys.setrecursionlimit(10000)
paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count)
infer(
......@@ -191,8 +225,11 @@ def main():
emb_dim=args.emb_size,
pos_size=args.pos_size,
drop_rate=args.drop_rate,
use_bn=args.use_bn,
max_len=args.max_len,
beam_size=args.beam_size)
batch_size=args.batch_size,
beam_size=args.beam_size,
is_show_attention=args.is_show_attention)
if __name__ == '__main__':
......
......@@ -12,7 +12,8 @@ def gated_conv_with_batchnorm(input,
context_len,
context_start=None,
learning_rate=1.0,
drop_rate=0.,
with_bn=False):
"""
Definition of the convolution block.
......@@ -30,6 +31,9 @@ def gated_conv_with_batchnorm(input,
:type learning_rate: float
:param drop_rate: Dropout rate.
:type drop_rate: float
:param with_bn: Whether to use batch normalization or not. False is the default
value.
:type with_bn: bool
:return: The output of the convolution block.
:rtype: LayerOutput
"""
......@@ -50,18 +54,18 @@ def gated_conv_with_batchnorm(input,
learning_rate=learning_rate),
bias_attr=False)
    if with_bn:
        raw_conv = paddle.layer.batch_norm(
            input=raw_conv,
            act=paddle.activation.Linear(),
            param_attr=paddle.attr.Param(learning_rate=learning_rate))

    # The two identity projections below take the halves [A; B] of the
    # 2*size-wide convolution output (offsets 0 and `size`); together with the
    # dotmul below they form A * sigmoid(B), the gated linear unit of
    # Gehring et al. (2017).
    with paddle.layer.mixed(size=size) as conv:
        conv += paddle.layer.identity_projection(raw_conv, size=size, offset=0)

    with paddle.layer.mixed(size=size, act=paddle.activation.Sigmoid()) as gate:
        gate += paddle.layer.identity_projection(
            raw_conv, size=size, offset=size)
with paddle.layer.mixed(size=size) as gated_conv:
gated_conv += paddle.layer.dotmul_operator(conv, gate)
......@@ -73,7 +77,8 @@ def encoder(token_emb,
pos_emb,
conv_blocks=[(256, 3)] * 5,
num_attention=3,
drop_rate=0.,
with_bn=False):
"""
Definition of the encoder.
......@@ -89,6 +94,9 @@ def encoder(token_emb,
:type num_attention: int
:param drop_rate: Dropout rate.
:type drop_rate: float
:param with_bn: Whether to use batch normalization or not. False is the default
value.
:type with_bn: bool
:return: The input token encoding.
:rtype: LayerOutput
"""
......@@ -124,7 +132,8 @@ def encoder(token_emb,
size=size,
context_len=context_len,
learning_rate=1.0 / (2.0 * num_attention),
drop_rate=drop_rate,
with_bn=with_bn)
with paddle.layer.mixed(size=size) as block_output:
block_output += paddle.layer.identity_projection(residual)
......@@ -165,7 +174,7 @@ def attention(decoder_state, cur_embedding, encoded_vec, encoded_sum):
:type encoded_vec: LayerOutput
:param encoded_sum: The sum of the source token's encoding and embedding.
:type encoded_sum: LayerOutput
:return: A context vector and the attention weight.
:rtype: LayerOutput
"""
residual = decoder_state
......@@ -182,31 +191,29 @@ def attention(decoder_state, cur_embedding, encoded_vec, encoded_sum):
expanded = paddle.layer.expand(input=state_summary, expand_as=encoded_vec)
m = paddle.layer.dot_prod(input1=expanded, input2=encoded_vec)
attention_weight = paddle.layer.fc(input=m,
size=1,
act=paddle.activation.SequenceSoftmax(),
bias_attr=False)
scaled = paddle.layer.scaling(weight=attention_weight, input=encoded_sum)
attended = paddle.layer.pooling(
input=scaled, pooling_type=paddle.pooling.Sum())
attended_proj = paddle.layer.fc(input=attended,
size=state_size,
act=paddle.activation.Linear(),
bias_attr=True)
attention_result = paddle.layer.addto(input=[attended_proj, residual])
# halve the variance of the sum
attention_result = paddle.layer.slope_intercept(
input=attention_result, slope=math.sqrt(0.5))
return attention_result, attention_weight
def decoder(token_emb,
......@@ -215,7 +222,8 @@ def decoder(token_emb,
encoded_sum,
dict_size,
conv_blocks=[(256, 3)] * 3,
drop_rate=0.,
with_bn=False):
"""
Definition of the decoder.
......@@ -235,7 +243,10 @@ def decoder(token_emb,
:type conv_blocks: list of tuple
:param drop_rate: Dropout rate.
:type drop_rate: float
:param with_bn: Whether to use batch normalization or not. False is the default
value.
:type with_bn: bool
:return: The probability of the predicted token and the attention weights.
:rtype: LayerOutput
"""
......@@ -261,22 +272,23 @@ def decoder(token_emb,
initial_std=math.sqrt((1.0 - drop_rate) / embedding.size)),
bias_attr=True, )
weight = []
for (size, context_len) in conv_blocks:
if block_input.size == size:
residual = block_input
else:
residual = paddle.layer.fc(input=block_input,
size=size,
act=paddle.activation.Linear(),
bias_attr=True)
decoder_state = gated_conv_with_batchnorm(
input=block_input,
size=size,
context_len=context_len,
context_start=0,
drop_rate=drop_rate,
with_bn=with_bn)
group_inputs = [
decoder_state,
......@@ -285,8 +297,9 @@ def decoder(token_emb,
paddle.layer.StaticInput(input=encoded_sum),
]
conditional, attention_weight = paddle.layer.recurrent_group(
step=attention_step, input=group_inputs)
weight.append(attention_weight)
block_output = paddle.layer.addto(input=[conditional, residual])
......@@ -312,7 +325,7 @@ def decoder(token_emb,
initial_std=math.sqrt((1.0 - drop_rate) / block_output.size)),
bias_attr=True)
return decoder_out, weight
def conv_seq2seq(src_dict_size,
......@@ -321,7 +334,8 @@ def conv_seq2seq(src_dict_size,
emb_dim,
enc_conv_blocks=[(256, 3)] * 5,
dec_conv_blocks=[(256, 3)] * 3,
drop_rate=0.,
with_bn=False,
is_infer=False):
"""
Definition of convolutional sequence-to-sequence network.
......@@ -345,6 +359,8 @@ def conv_seq2seq(src_dict_size,
:type dec_conv_blocks: list of tuple
:param drop_rate: Dropout rate.
:type drop_rate: float
:param with_bn: Whether to use batch normalization or not. False is the default value.
:type with_bn: bool
:param is_infer: Whether infer or not.
:type is_infer: bool
:return: Cost or output layer.
......@@ -362,12 +378,14 @@ def conv_seq2seq(src_dict_size,
input=src,
size=emb_dim,
name='src_word_emb',
param_attr=paddle.attr.Param(
initial_mean=0., initial_std=0.1))
src_pos_emb = paddle.layer.embedding(
input=src_pos,
size=emb_dim,
name='src_pos_emb',
param_attr=paddle.attr.Param(
initial_mean=0., initial_std=0.1))
num_attention = len(dec_conv_blocks)
encoded_vec, encoded_sum = encoder(
......@@ -375,7 +393,8 @@ def conv_seq2seq(src_dict_size,
pos_emb=src_pos_emb,
conv_blocks=enc_conv_blocks,
num_attention=num_attention,
drop_rate=drop_rate,
with_bn=with_bn)
trg = paddle.layer.data(
name='trg_word',
......@@ -390,24 +409,27 @@ def conv_seq2seq(src_dict_size,
input=trg,
size=emb_dim,
name='trg_word_emb',
param_attr=paddle.attr.Param(
initial_mean=0., initial_std=0.1))
trg_pos_emb = paddle.layer.embedding(
input=trg_pos,
size=emb_dim,
name='trg_pos_emb',
param_attr=paddle.attr.Param(
initial_mean=0., initial_std=0.1))
decoder_out, weight = decoder(
token_emb=trg_emb,
pos_emb=trg_pos_emb,
encoded_vec=encoded_vec,
encoded_sum=encoded_sum,
dict_size=trg_dict_size,
conv_blocks=dec_conv_blocks,
drop_rate=drop_rate,
with_bn=with_bn)
if is_infer:
return decoder_out, weight
trg_next_word = paddle.layer.data(
name='trg_next_word',
......
#coding=utf-8
import cPickle


def concat_file(file1, file2, dst_file):
    with open(dst_file, 'w') as dst:
        with open(file1) as f1:
            with open(file2) as f2:
                for i, (line1, line2) in enumerate(zip(f1, f2)):
                    line1 = line1.strip()
                    line = line1 + '\t' + line2
                    dst.write(line)


if __name__ == '__main__':
    concat_file('dev.de-en.de', 'dev.de-en.en', 'dev')
    concat_file('test.de-en.de', 'test.de-en.en', 'test')
    concat_file('train.de-en.de', 'train.de-en.en', 'train')

    src_dict = cPickle.load(open('vocab.de'))
    trg_dict = cPickle.load(open('vocab.en'))

    with open('src_dict', 'w') as f:
        f.write('<s>\n<e>\nUNK\n')
        f.writelines('\n'.join(src_dict.keys()))

    with open('trg_dict', 'w') as f:
        f.write('<s>\n<e>\nUNK\n')
        f.writelines('\n'.join(trg_dict.keys()))
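This script is run from inside the `data` directory (`cd data && python ../preprocess.py`), as the dataset download script earlier in this commit does, after `create_dataset.sh` has produced the TED parallel files and the `vocab.*` pickles.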
......@@ -18,7 +18,7 @@ def get_reverse_dict(dictionary):
def load_data(data_file, src_dict, trg_dict):
UNK_IDX = src_dict['UNK']
with open(data_file, 'r') as f:
for line in f:
line_split = line.strip().split('\t')
......@@ -34,7 +34,7 @@ def load_data(data_file, src_dict, trg_dict):
def data_reader(data_file, src_dict, trg_dict, pos_size, padding_num):
def reader():
UNK_IDX = src_dict['UNK']
word_padding = trg_dict.__len__()
pos_padding = pos_size
......
......@@ -40,7 +40,7 @@ def parse_args():
parser.add_argument(
'--emb_size',
type=int,
default=256,
help='Dimension of word embedding. (default: %(default)s)')
parser.add_argument(
'--pos_size',
......@@ -52,6 +52,11 @@ def parse_args():
type=float,
default=0.,
help='Dropout rate. (default: %(default)s)')
parser.add_argument(
"--use_bn",
default=False,
type=distutils.util.strtobool,
help="Use batch normalization or not. (default: %(default)s)")
parser.add_argument(
"--use_gpu",
default=False,
......@@ -116,9 +121,10 @@ def train(train_data_path,
trg_dict_path,
enc_conv_blocks,
dec_conv_blocks,
emb_dim=256,
pos_size=200,
drop_rate=0.,
use_bn=False,
batch_size=32,
num_passes=15):
"""
......@@ -147,6 +153,8 @@ def train(train_data_path,
:type pos_size: int
:param drop_rate: Dropout rate.
:type drop_rate: float
:param use_bn: Whether to use batch normalization or not. False is the default value.
:type use_bn: bool
:param batch_size: The size of a mini-batch.
:type batch_size: int
:param num_passes: The total number of the passes to train.
......@@ -158,8 +166,7 @@ def train(train_data_path,
src_dict_size = src_dict.__len__()
trg_dict_size = trg_dict.__len__()
optimizer = paddle.optimizer.Adam(learning_rate=1e-3, )
cost = conv_seq2seq(
src_dict_size=src_dict_size,
......@@ -169,12 +176,14 @@ def train(train_data_path,
enc_conv_blocks=enc_conv_blocks,
dec_conv_blocks=dec_conv_blocks,
drop_rate=drop_rate,
with_bn=use_bn,
is_infer=False)
# create parameters and trainer
parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
padding_list = [context_len - 1 for (size, context_len) in dec_conv_blocks]
padding_num = reduce(lambda x, y: x + y, padding_list)
......@@ -203,7 +212,6 @@ def train(train_data_path,
print "[%s]: Pass: %d, Batch: %d, TrainCost: %f, %s" % (
cur_time, event.pass_id, event.batch_id, event.cost,
event.metrics)
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
......@@ -232,6 +240,8 @@ def main():
enc_conv_blocks = eval(args.enc_blocks)
dec_conv_blocks = eval(args.dec_blocks)
sys.setrecursionlimit(10000)
paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count)
train(
......@@ -244,6 +254,7 @@ def main():
emb_dim=args.emb_size,
pos_size=args.pos_size,
drop_rate=args.drop_rate,
use_bn=args.use_bn,
batch_size=args.batch_size,
num_passes=args.num_passes)
......
# Click-Through Rate Prediction

The following are the files in this example's directory and what they contain:

```
├── README.md               # This tutorial's markdown document
├── dataset.md              # Dataset processing tutorial
├── images                  # Images for this tutorial
│   ├── lr_vs_dnn.jpg
│   └── wide_deep.png
├── infer.py                # Inference script
├── network_conf.py         # Model network configuration
├── reader.py               # Data reader
├── train.py                # Training script
├── utils.py                # Helper functions
└── avazu_data_processer.py # Demo data preprocessing script
```
## Background

CTR (Click-Through Rate) prediction \[[1](https://en.wikipedia.org/wiki/Click-through_rate)\]
estimates the probability that a user clicks on a specific link. It is an important step in ad serving, and accurate CTR prediction matters a great deal for maximizing the revenue of an online advertising system.

When there are multiple ad slots, CTR prediction is generally used as the basis for ranking. For example, in a search engine's advertising system, when a user enters a query with commercial value, the system roughly performs the following steps to show ads:

1. Retrieve the set of ads related to the user's query
2. Filter by business rules and relevance
3. Rank according to the auction mechanism and CTR
4. Show the ads

As you can see, CTR plays a very important role in the final ranking.
### Development stages

In industry, CTR models have gone through the following stages:

- Logistic Regression (LR) / GBDT + feature engineering
- LR + DNN features
- DNN + feature engineering

In the early days LR dominated, but recently DNN models, with their strong learning ability and increasingly mature performance optimization, have gradually taken over the CTR prediction task.
### LR vs DNN

The figure below compares the structures of LR and a 3x2 DNN model:

<p align="center">
<img src="images/lr_vs_dnn.jpg" width="620" hspace='10'/> <br/>
Figure 1. Comparison of the LR and DNN model structures
</p>

The blue arrows in the LR part map directly onto corresponding structures in the DNN, so LR and DNN have some things in common (such as weighted sums). However, at the same input dimensionality, the former's model complexity can be much lower than the latter's (in a sense, the more complex a model, the more potential it has to learn more complex information). For LR to match the learning capacity of a DNN, it must increase the input dimensionality, that is, the number of features, which is why LR must always be paired with large-scale feature engineering.

LR's advantage over DNN models is its capacity for large-scale sparse features, in terms of both memory and computation, for which industry has very mature optimizations. DNN models, in turn, can learn new features on their own, improving feature efficiency to some extent, which makes a DNN more likely to learn better with the same number of features.

The later sections of this tutorial demonstrate how to use PaddlePaddle to write a model that combines the advantages of both.
## Data and task abstraction

We can take `click` as the learning target; the task can be set up in several ways:

1. Learn click directly, as binary classification over the 0/1 labels
2. Learning to rank, using pairwise rank (label 1 > 0) or listwise rank
3. Compute each ad's click-through rate, pair up ads under the same query (higher CTR > lower CTR), and do ranking or classification

We use the first approach directly, as a classification task.

We use the dataset of the Kaggle `Click-through rate prediction` challenge \[[2](https://www.kaggle.com/c/avazu-ctr-prediction/data)\] to demonstrate the model in this example.
For the specific feature processing, see [data process](./dataset.md).
The input format of the model demonstrated in this tutorial is as follows:

```
# <dnn input ids> \t <lr input sparse values> \t click
1 23 190 \t 230:0.12 3421:0.9 23451:0.12 \t 0
23 231 \t 1230:0.12 13421:0.9 \t 1
```

The detailed format description is:

- `dnn input ids` is a one-hot representation; only the IDs whose value is 1 are listed (note that this is not a variable-length input)
- `lr input sparse values` uses an `ID:VALUE` representation; the values are best normalized to the range `[-1, 1]`
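To make the format concrete, here is a minimal parsing sketch (a hypothetical helper, not part of this example's `reader.py`):

```python
def parse_line(line):
    # '<dnn input ids> \t <lr input sparse values> \t click'
    dnn_part, lr_part, click = [s.strip() for s in line.strip().split('\t')]
    dnn_ids = [int(tok) for tok in dnn_part.split()]  # ids of the 1-valued one-hot slots
    lr_feas = [(int(i), float(v))
               for i, v in (tok.split(':') for tok in lr_part.split())]
    return dnn_ids, lr_feas, int(click)
```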
In addition, training requires a file describing the input dimensions of the dnn and lr submodels, in the following format:

```
dnn_input_dim: <int>
lr_input_dim: <int>
```

where `<int>` is an integer.
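For example (the values below are purely illustrative):

```
dnn_input_dim: 15
lr_input_dim: 100000
```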
The script `avazu_data_processer.py` in this directory processes the downloaded demo dataset \[[2](#references)\]; its usage is as follows:

```
usage: avazu_data_processer.py [-h] --data_path DATA_PATH --output_dir
                               OUTPUT_DIR
                               [--num_lines_to_detect NUM_LINES_TO_DETECT]
                               [--test_set_size TEST_SET_SIZE]
                               [--train_size TRAIN_SIZE]

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --data_path DATA_PATH
                        path of the Avazu dataset
  --output_dir OUTPUT_DIR
                        directory to output
  --num_lines_to_detect NUM_LINES_TO_DETECT
                        number of records to detect dataset's meta info
  --test_set_size TEST_SET_SIZE
                        size of the validation dataset(default: 10000)
  --train_size TRAIN_SIZE
                        size of the trainset (default: 100000)
```

- `data_path`: path of the data to process
- `output_dir`: output directory for the generated data
- `num_lines_to_detect`: number of lines scanned in advance to generate the IDs
- `test_set_size`: number of lines in the generated test set
- `train_size`: number of lines in the generated training set
## Wide & Deep Learning Model

Google proposed the Wide & Deep Learning framework in 2016 to fuse the advantages of a DNN, which is good at learning abstract features, and LR, which suits large-scale sparse features.

### Model overview

The Wide & Deep Learning Model \[[3](#references)\] can be used as a relatively mature model framework and has seen industrial use for CTR prediction, so this tutorial demonstrates how to use it for the CTR prediction task.

The model structure is as follows:

<p align="center">
<img src="images/wide_deep.png" width="820" hspace='10'/> <br/>
Figure 2. Wide & Deep Model
</p>

The Wide part at the top of the model can accommodate large-scale sparse features and has a certain memorization ability for specific information (such as IDs); the Deep part at the bottom can learn implicit relationships between features and has better learning and inference ability given the same number of features.
### Defining the model input

The model takes only 3 inputs:

- `dnn_input`, the input of the Deep part
- `lr_input`, the input of the Wide part
- `click`, whether the ad was clicked or not, used as the label for binary classification

```python
dnn_merged_input = layer.data(
    name='dnn_input',
    type=paddle.data_type.sparse_binary_vector(data_meta_info['dnn_input']))

lr_merged_input = layer.data(
    name='lr_input',
    type=paddle.data_type.sparse_binary_vector(data_meta_info['lr_input']))

click = paddle.layer.data(name='click', type=dtype.dense_vector(1))
```
### Defining the Wide part

The Wide part uses the LR model directly, but the activation function is changed to `RELU` for speed.

```python
def build_lr_submodel():
    fc = layer.fc(
        input=lr_merged_input, size=1, name='lr', act=paddle.activation.Relu())
    return fc
```

### Defining the Deep part

The Deep part uses a standard multi-layer feed-forward DNN.

```python
def build_dnn_submodel(dnn_layer_dims):
    dnn_embedding = layer.fc(input=dnn_merged_input, size=dnn_layer_dims[0])
    _input_layer = dnn_embedding
    for i, dim in enumerate(dnn_layer_dims[1:]):
        fc = layer.fc(
            input=_input_layer,
            size=dim,
            act=paddle.activation.Relu(),
            name='dnn-fc-%d' % i)
        _input_layer = fc
    return _input_layer
```
### Fusing the two parts

The top-layer outputs of the two submodels are combined by a weighted sum; the output uses `sigmoid` as the activation function to produce a prediction in (0, 1) that approximates the distribution of the binary labels in the training data, and finally serves as the CTR estimate.

```python
# combine DNN and LR submodels
def combine_submodels(dnn, lr):
    merge_layer = layer.concat(input=[dnn, lr])
    fc = layer.fc(
        input=merge_layer,
        size=1,
        name='output',
        # use sigmoid function to approximate ctr, which is a float value between 0 and 1.
        act=paddle.activation.Sigmoid())
    return fc
```
### Defining the training task

```python
dnn = build_dnn_submodel(dnn_layer_dims)
lr = build_lr_submodel()
output = combine_submodels(dnn, lr)

# ==============================================================================
#                   cost and train period
# ==============================================================================
classification_cost = paddle.layer.multi_binary_label_cross_entropy_cost(
    input=output, label=click)

paddle.init(use_gpu=False, trainer_count=11)

params = paddle.parameters.create(classification_cost)

optimizer = paddle.optimizer.Momentum(momentum=0)

trainer = paddle.trainer.SGD(
    cost=classification_cost, parameters=params, update_equation=optimizer)

dataset = AvazuDataset(train_data_path, n_records_as_test=test_set_size)

def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
            logging.warning("Pass %d, Samples %d, Cost %f" % (
                event.pass_id, event.batch_id * batch_size, event.cost))

        if event.batch_id % 1000 == 0:
            result = trainer.test(
                reader=paddle.batch(dataset.test, batch_size=1000),
                feeding=field_index)
            logging.warning("Test %d-%d, Cost %f" %
                            (event.pass_id, event.batch_id, result.cost))

trainer.train(
    reader=paddle.batch(
        paddle.reader.shuffle(dataset.train, buf_size=500),
        batch_size=batch_size),
    feeding=field_index,
    event_handler=event_handler,
    num_passes=100)
```
## Running training and testing

Training the model requires the following steps:

1. Prepare the training data
    1. Download train.gz from [Kaggle CTR](https://www.kaggle.com/c/avazu-ctr-prediction/data)
    2. Unpack train.gz to get train.txt
    3. Run `mkdir -p output; python avazu_data_processer.py --data_path train.txt --output_dir output --num_lines_to_detect 1000 --test_set_size 100` to generate the demo data
2. Run `python train.py --train_data_path ./output/train.txt --test_data_path ./output/test.txt --data_meta_file ./output/data.meta.txt --model_type=0` to start training

In step 2 above, command-line arguments can be passed to `train.py` to customize the training process; the specific arguments and their usage are as follows:
```
usage: train.py [-h] --train_data_path TRAIN_DATA_PATH
                [--test_data_path TEST_DATA_PATH] [--batch_size BATCH_SIZE]
                [--num_passes NUM_PASSES]
                [--model_output_prefix MODEL_OUTPUT_PREFIX] --data_meta_file
                DATA_META_FILE --model_type MODEL_TYPE

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --train_data_path TRAIN_DATA_PATH
                        path of training dataset
  --test_data_path TEST_DATA_PATH
                        path of testing dataset
  --batch_size BATCH_SIZE
                        size of mini-batch (default:10000)
  --num_passes NUM_PASSES
                        number of passes to train
  --model_output_prefix MODEL_OUTPUT_PREFIX
                        prefix of path for model to store (default:
                        ./ctr_models)
  --data_meta_file DATA_META_FILE
                        path of data meta info file
  --model_type MODEL_TYPE
                        model type, classification: 0, regression 1 (default
                        classification)
```

- `train_data_path`: path of the training set
- `test_data_path`: path of the test set
- `num_passes`: number of passes to train the model
- `data_meta_file`: see the description in [Data and task abstraction](#data-and-task-abstraction)
- `model_type`: classification or regression
## Inference with a trained model

A trained model can be used to predict new data; the format of the prediction data is:

```
# <dnn input ids> \t <lr input sparse values>
1 23 190 \t 230:0.12 3421:0.9 23451:0.12
23 231 \t 1230:0.12 13421:0.9
```

The only difference from the training data format is the missing label, i.e., the third column `click` of the training data.

`infer.py` is used as follows:
```
usage: infer.py [-h] --model_gz_path MODEL_GZ_PATH --data_path DATA_PATH
                --prediction_output_path PREDICTION_OUTPUT_PATH
                [--data_meta_path DATA_META_PATH] --model_type MODEL_TYPE

PaddlePaddle CTR example

optional arguments:
  -h, --help            show this help message and exit
  --model_gz_path MODEL_GZ_PATH
                        path of model parameters gz file
  --data_path DATA_PATH
                        path of the dataset to infer
  --prediction_output_path PREDICTION_OUTPUT_PATH
                        path to output the prediction
  --data_meta_path DATA_META_PATH
                        path of trainset's meta info, default is ./data.meta
  --model_type MODEL_TYPE
                        model type, classification: 0, regression 1 (default
                        classification)
```

- `model_gz_path`: path of the model compressed with `gz`
- `data_path`: path of the data to predict
- `prediction_output_path`: output path for the predictions
- `data_meta_file`: see the description in [Data and task abstraction](#data-and-task-abstraction)
- `model_type`: classification or regression

The demo data can be predicted with the following command:

```
python infer.py --model_gz_path <model_path> --data_path output/infer.txt --prediction_output_path predictions.txt --data_meta_path data.meta.txt
```

The final prediction results are in `predictions.txt`.
## References
1. <https://en.wikipedia.org/wiki/Click-through_rate>
2. <https://www.kaggle.com/c/avazu-ctr-prediction/data>
3. Cheng H T, Koc L, Harmsen J, et al. [Wide & deep learning for recommender systems](https://arxiv.org/pdf/1606.07792.pdf)[C]//Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 2016: 7-10.
......@@ -120,7 +120,7 @@ The model structure is as follows:
Figure 2. Wide & Deep Model
</p>
The Wide part at the top of the model can accommodate large-scale sparse features and has some memorization ability for specific information (such as IDs); the Deep part at the bottom can learn the implicit relationships between features.
### Model Input
......
......@@ -79,8 +79,9 @@ all the files are for demo.
feature_dims = {}
categorial_features = (
'C1 banner_pos site_category app_category ' + 'device_type device_conn_type'
).split()
id_features = 'id site_id app_id device_id _device_id_cross_site_id'.split()
......@@ -335,8 +336,8 @@ class AvazuDataset(object):
else:
fea0 = self.fields[key].cross_fea0
fea1 = self.fields[key].cross_fea1
record.append(self.fields[key].gen_cross_fea(row[fea0], row[
fea1]))
sparse_input = concat_sparse_vectors(record, self.id_dims)
......@@ -396,8 +397,9 @@ with open(output_infer_path, 'w') as f:
dnn_input, lr_input = record
dnn_input = ids2dense(dnn_input, feature_dims['dnn_input'])
lr_input = ids2sparse(lr_input)
line = "%s\t%s\n" % (
' '.join(map(str, dnn_input)),
' '.join(map(str, lr_input)), )
f.write(line)
if id > args.test_set_size:
break
......
ctr/images/wide_deep.png: image replaced (139.6 KB → 150.6 KB)
......@@ -60,15 +60,14 @@ class CTRmodel(object):
'''
build DNN submodel.
'''
dnn_embedding = layer.fc(input=self.dnn_merged_input,
size=dnn_layer_dims[0])
_input_layer = dnn_embedding
for i, dim in enumerate(dnn_layer_dims[1:]):
fc = layer.fc(input=_input_layer,
size=dim,
act=paddle.activation.Relu(),
name='dnn-fc-%d' % i)
_input_layer = fc
return _input_layer
......@@ -76,8 +75,9 @@ class CTRmodel(object):
'''
config LR submodel
'''
fc = layer.fc(input=self.lr_merged_input,
size=1,
act=paddle.activation.Relu())
return fc
def _build_classification_model(self, dnn, lr):
......@@ -95,8 +95,9 @@ class CTRmodel(object):
def _build_regression_model(self, dnn, lr):
merge_layer = layer.concat(input=[dnn, lr])
self.output = layer.fc(input=merge_layer,
size=1,
act=paddle.activation.Sigmoid())
if not self.is_infer:
self.train_cost = paddle.layer.square_error_cost(
input=self.output, label=self.click)
......
......@@ -68,8 +68,9 @@ def train():
params = paddle.parameters.create(model.train_cost)
optimizer = paddle.optimizer.AdaGrad()
trainer = paddle.trainer.SGD(cost=model.train_cost,
parameters=params,
update_equation=optimizer)
dataset = reader.Dataset()
......
......@@ -64,5 +64,7 @@ def load_dnn_input_record(sent):
def load_lr_input_record(sent):
res = []
for _ in [x.split(':') for x in sent.split()]:
res.append((
int(_[0]),
float(_[1]), ))
return res
# Deep Factorization Machine for Click-Through Rate prediction
## Introduction
This model implements the DeepFM proposed in the following paper:
```text
@inproceedings{guo2017deepfm,
title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
pages={1725--1731},
year={2017}
}
```
DeepFM combines factorization machines and deep neural networks to model
both low-order and high-order feature interactions. For details of
factorization machines, please refer to the paper [factorization
machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
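As background (the standard second-order formulation from Rendle's paper, not code in this repo), a factorization machine models the prediction as:

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
    + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

where `w_0` and `w_i` are the first-order weights and each `v_i` is a latent factor vector of dimension `factor_size`.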
## Dataset
This example uses Criteo dataset which was used for the [Display Advertising
Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/)
hosted by Kaggle.
Each row is the features for an ad display and the first column is a label
indicating whether this ad has been clicked or not. There are 39 features in
total. 13 features take integer values and the other 26 features are
categorical features. For the test dataset, the labels are omitted.
Download dataset:
```bash
cd data && ./download.sh && cd ..
```
## Model
The DeepFM model is composed of a factorization machine layer (FM) and deep
neural networks (DNN). All input features are fed to both the FM and the DNN.
The outputs from the FM and the DNN are combined to form the final output. The
embedding layer for sparse features in the DNN shares its parameters with the
latent vectors (factors) of the FM layer.
The factorization machine layer in PaddlePaddle computes the second-order
interactions. The following code example combines the factorization machine
layer and a fully connected layer to form the full version of the
factorization machine:
```python
def fm_layer(input, factor_size):
    first_order = paddle.layer.fc(
        input=input, size=1, act=paddle.activation.Linear())
    second_order = paddle.layer.factorization_machine(
        input=input, factor_size=factor_size)
    fm = paddle.layer.addto(
        input=[first_order, second_order],
        act=paddle.activation.Linear(),
        bias_attr=False)
    return fm
```
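Note that this simplified `fm_layer` omits the `param_attr` argument used by the full implementation in `network_conf.py` later in this commit, which names the sparse FM factors (`SparseFeatFactors`) so that the DNN's embedding layer can share them.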
## Data preparation
To preprocess the raw dataset, the integer features are clipped and then
min-max normalized to [0, 1], and the categorical features are one-hot
encoded. The raw training dataset is split so that 90% is used for training
and the other 10% for validation during training.
```bash
python preprocess.py --datadir ./data/raw --outdir ./data
```
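As an illustration of the integer-feature transform described above, a minimal sketch (assuming, for simplicity, a fixed minimum of 0; the actual script derives per-feature statistics from the data):

```python
def transform_integer_feature(value, clip_point, min_val=0.0):
    """Clip an integer feature, then min-max normalize it to [0, 1]."""
    v = min(float(value), float(clip_point))  # clip large values
    return (v - min_val) / (float(clip_point) - min_val)

# With the first feature's clip point of 20 (from continous_clip in preprocess.py):
print(transform_integer_feature(35, 20))  # 1.0  (clipped)
print(transform_integer_feature(5, 20))   # 0.25
```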
## Train
The command line options for training can be listed by `python train.py -h`.
To train the model:
```bash
python train.py \
--train_data_path data/train.txt \
--test_data_path data/valid.txt \
2>&1 | tee train.log
```
After training pass 9 batch 40000, the testing AUC is `0.807178` and the testing
cost is `0.445196`.
## Infer
The command line options for inference can be listed by `python infer.py -h`.
To make inference for the test dataset:
```bash
python infer.py \
--model_gz_path models/model-pass-9-batch-10000.tar.gz \
--data_path data/test.txt \
--prediction_output_path ./predict.txt
```
#!/bin/bash
wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/criteo-labs/dac.tar.gz
tar zxf dac.tar.gz
rm -f dac.tar.gz
mkdir raw
mv ./*.txt raw/
import os
import gzip
import argparse
import itertools

import paddle.v2 as paddle

from network_conf import DeepFM
import reader


def parse_args():
    parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
    parser.add_argument(
        '--model_gz_path',
        type=str,
        required=True,
        help="The path of model parameters gz file")
    parser.add_argument(
        '--data_path',
        type=str,
        required=True,
        help="The path of the dataset to infer")
    parser.add_argument(
        '--prediction_output_path',
        type=str,
        required=True,
        help="The path to output the prediction")
    parser.add_argument(
        '--factor_size',
        type=int,
        default=10,
        help="The factor size for the factorization machine (default:10)")

    return parser.parse_args()


def infer():
    args = parse_args()

    paddle.init(use_gpu=False, trainer_count=1)

    model = DeepFM(args.factor_size, infer=True)

    parameters = paddle.parameters.Parameters.from_tar(
        gzip.open(args.model_gz_path, 'r'))

    inferer = paddle.inference.Inference(
        output_layer=model, parameters=parameters)

    dataset = reader.Dataset()

    infer_reader = paddle.batch(dataset.infer(args.data_path), batch_size=1000)

    with open(args.prediction_output_path, 'w') as out:
        for id, batch in enumerate(infer_reader()):
            res = inferer.infer(input=batch)
            predictions = [x for x in itertools.chain.from_iterable(res)]
            out.write('\n'.join(map(str, predictions)) + '\n')


if __name__ == '__main__':
    infer()
import paddle.v2 as paddle
dense_feature_dim = 13
sparse_feature_dim = 117568
def fm_layer(input, factor_size, fm_param_attr):
first_order = paddle.layer.fc(input=input,
size=1,
act=paddle.activation.Linear())
second_order = paddle.layer.factorization_machine(
input=input,
factor_size=factor_size,
act=paddle.activation.Linear(),
param_attr=fm_param_attr)
out = paddle.layer.addto(
input=[first_order, second_order],
act=paddle.activation.Linear(),
bias_attr=False)
return out
def DeepFM(factor_size, infer=False):
dense_input = paddle.layer.data(
name="dense_input",
type=paddle.data_type.dense_vector(dense_feature_dim))
sparse_input = paddle.layer.data(
name="sparse_input",
type=paddle.data_type.sparse_binary_vector(sparse_feature_dim))
sparse_input_ids = [
paddle.layer.data(
name="C" + str(i),
type=paddle.data_type.integer_value(sparse_feature_dim))
for i in range(1, 27)
]
dense_fm = fm_layer(
dense_input,
factor_size,
fm_param_attr=paddle.attr.Param(name="DenseFeatFactors"))
sparse_fm = fm_layer(
sparse_input,
factor_size,
fm_param_attr=paddle.attr.Param(name="SparseFeatFactors"))
def embedding_layer(input):
return paddle.layer.embedding(
input=input,
size=factor_size,
param_attr=paddle.attr.Param(name="SparseFeatFactors"))
sparse_embed_seq = map(embedding_layer, sparse_input_ids)
sparse_embed = paddle.layer.concat(sparse_embed_seq)
fc1 = paddle.layer.fc(input=[sparse_embed, dense_input],
size=400,
act=paddle.activation.Relu())
fc2 = paddle.layer.fc(input=fc1, size=400, act=paddle.activation.Relu())
fc3 = paddle.layer.fc(input=fc2, size=400, act=paddle.activation.Relu())
predict = paddle.layer.fc(input=[dense_fm, sparse_fm, fc3],
size=1,
act=paddle.activation.Sigmoid())
if not infer:
label = paddle.layer.data(
name="label", type=paddle.data_type.dense_vector(1))
cost = paddle.layer.multi_binary_label_cross_entropy_cost(
input=predict, label=label)
paddle.evaluator.classification_error(
name="classification_error", input=predict, label=label)
paddle.evaluator.auc(name="auc", input=predict, label=label)
return cost
else:
return predict
"""
Preprocess Criteo dataset. This dataset was used for the Display Advertising
Challenge (https://www.kaggle.com/c/criteo-display-ad-challenge).
"""
import os
import sys
import click
import random
import collections
# There are 13 integer features and 26 categorical features
continous_features = range(1, 14)
categorial_features = range(14, 40)
# Clip integer features. The clip point for each integer feature
# is derived from the 95% quantile of the total values in each feature
continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
class CategoryDictGenerator:
"""
Generate dictionary for each of the categorical features
"""
def __init__(self, num_feature):
self.dicts = []
self.num_feature = num_feature
for i in range(0, num_feature):
self.dicts.append(collections.defaultdict(int))
def build(self, datafile, categorial_features, cutoff=0):
with open(datafile, 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
for i in range(0, self.num_feature):
if features[categorial_features[i]] != '':
self.dicts[i][features[categorial_features[i]]] += 1
for i in range(0, self.num_feature):
self.dicts[i] = filter(lambda x: x[1] >= cutoff,
self.dicts[i].items())
self.dicts[i] = sorted(self.dicts[i], key=lambda x: (-x[1], x[0]))
vocabs, _ = list(zip(*self.dicts[i]))
self.dicts[i] = dict(zip(vocabs, range(1, len(vocabs) + 1)))
self.dicts[i]['<unk>'] = 0
def gen(self, idx, key):
if key not in self.dicts[idx]:
res = self.dicts[idx]['<unk>']
else:
res = self.dicts[idx][key]
return res
def dicts_sizes(self):
return map(len, self.dicts)
class ContinuousFeatureGenerator:
"""
Normalize the integer features to [0, 1] by min-max normalization
"""
def __init__(self, num_feature):
self.num_feature = num_feature
self.min = [sys.maxint] * num_feature
self.max = [-sys.maxint] * num_feature
def build(self, datafile, continous_features):
with open(datafile, 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
for i in range(0, self.num_feature):
val = features[continous_features[i]]
if val != '':
val = int(val)
if val > continous_clip[i]:
val = continous_clip[i]
self.min[i] = min(self.min[i], val)
self.max[i] = max(self.max[i], val)
def gen(self, idx, val):
if val == '':
return 0.0
val = float(val)
return (val - self.min[idx]) / (self.max[idx] - self.min[idx])
@click.command("preprocess")
@click.option("--datadir", type=str, help="Path to raw criteo dataset")
@click.option("--outdir", type=str, help="Path to save the processed data")
def preprocess(datadir, outdir):
"""
    All of the 13 integer features are normalized to continuous values, and
    these continuous features are combined into one vector with dimension 13.
    Each of the 26 categorical features is one-hot encoded, and all the
    one-hot vectors are combined into one sparse binary vector.
"""
dists = ContinuousFeatureGenerator(len(continous_features))
dists.build(os.path.join(datadir, 'train.txt'), continous_features)
dicts = CategoryDictGenerator(len(categorial_features))
dicts.build(
os.path.join(datadir, 'train.txt'), categorial_features, cutoff=200)
dict_sizes = dicts.dicts_sizes()
categorial_feature_offset = [0]
for i in range(1, len(categorial_features)):
offset = categorial_feature_offset[i - 1] + dict_sizes[i - 1]
categorial_feature_offset.append(offset)
random.seed(0)
# 90% of the data are used for training, and 10% of the data are used
# for validation.
with open(os.path.join(outdir, 'train.txt'), 'w') as out_train:
with open(os.path.join(outdir, 'valid.txt'), 'w') as out_valid:
with open(os.path.join(datadir, 'train.txt'), 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
continous_vals = []
for i in range(0, len(continous_features)):
val = dists.gen(i, features[continous_features[i]])
continous_vals.append("{0:.6f}".format(val).rstrip('0')
.rstrip('.'))
categorial_vals = []
for i in range(0, len(categorial_features)):
val = dicts.gen(i, features[categorial_features[
i]]) + categorial_feature_offset[i]
categorial_vals.append(str(val))
continous_vals = ','.join(continous_vals)
categorial_vals = ','.join(categorial_vals)
label = features[0]
if random.randint(0, 9999) % 10 != 0:
out_train.write('\t'.join(
[continous_vals, categorial_vals, label]) + '\n')
else:
out_valid.write('\t'.join(
[continous_vals, categorial_vals, label]) + '\n')
with open(os.path.join(outdir, 'test.txt'), 'w') as out:
with open(os.path.join(datadir, 'test.txt'), 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
continous_vals = []
for i in range(0, len(continous_features)):
val = dists.gen(i, features[continous_features[i] - 1])
continous_vals.append("{0:.6f}".format(val).rstrip('0')
.rstrip('.'))
categorial_vals = []
for i in range(0, len(categorial_features)):
val = dicts.gen(i, features[categorial_features[
i] - 1]) + categorial_feature_offset[i]
categorial_vals.append(str(val))
continous_vals = ','.join(continous_vals)
categorial_vals = ','.join(categorial_vals)
out.write('\t'.join([continous_vals, categorial_vals]) + '\n')
if __name__ == "__main__":
preprocess()
class Dataset:
def _reader_creator(self, path, is_infer):
def reader():
with open(path, 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
dense_feature = map(float, features[0].split(','))
sparse_feature = map(int, features[1].split(','))
if not is_infer:
label = [float(features[2])]
yield [dense_feature, sparse_feature
] + sparse_feature + [label]
else:
yield [dense_feature, sparse_feature] + sparse_feature
return reader
def train(self, path):
return self._reader_creator(path, False)
def test(self, path):
return self._reader_creator(path, False)
def infer(self, path):
return self._reader_creator(path, True)
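# Maps each data layer name to its position in the tuples yielded by the
# readers above: dense vector, sparse binary vector, the 26 category ids
# (C1..C26), and finally the label.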
feeding = {
'dense_input': 0,
'sparse_input': 1,
'C1': 2,
'C2': 3,
'C3': 4,
'C4': 5,
'C5': 6,
'C6': 7,
'C7': 8,
'C8': 9,
'C9': 10,
'C10': 11,
'C11': 12,
'C12': 13,
'C13': 14,
'C14': 15,
'C15': 16,
'C16': 17,
'C17': 18,
'C18': 19,
'C19': 20,
'C20': 21,
'C21': 22,
'C22': 23,
'C23': 24,
'C24': 25,
'C25': 26,
'C26': 27,
'label': 28
}
import os
import gzip
import logging
import argparse
import paddle.v2 as paddle
from network_conf import DeepFM
import reader
logging.basicConfig()
logger = logging.getLogger("paddle")
logger.setLevel(logging.INFO)
def parse_args():
parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
parser.add_argument(
'--train_data_path',
type=str,
required=True,
help="The path of training dataset")
parser.add_argument(
'--test_data_path',
type=str,
required=True,
help="The path of testing dataset")
parser.add_argument(
'--batch_size',
type=int,
default=1000,
help="The size of mini-batch (default:1000)")
parser.add_argument(
'--num_passes',
type=int,
default=10,
help="The number of passes to train (default: 10)")
parser.add_argument(
'--factor_size',
type=int,
default=10,
help="The factor size for the factorization machine (default:10)")
parser.add_argument(
'--model_output_dir',
type=str,
default='models',
help='The path for model to store (default: models)')
return parser.parse_args()
def train():
args = parse_args()
if not os.path.isdir(args.model_output_dir):
os.mkdir(args.model_output_dir)
paddle.init(use_gpu=False, trainer_count=1)
optimizer = paddle.optimizer.Adam(learning_rate=1e-4)
model = DeepFM(args.factor_size)
params = paddle.parameters.create(model)
trainer = paddle.trainer.SGD(cost=model,
parameters=params,
update_equation=optimizer)
dataset = reader.Dataset()
def __event_handler__(event):
if isinstance(event, paddle.event.EndIteration):
num_samples = event.batch_id * args.batch_size
if event.batch_id % 100 == 0:
logger.warning("Pass %d, Batch %d, Samples %d, Cost %f, %s" %
(event.pass_id, event.batch_id, num_samples,
event.cost, event.metrics))
if event.batch_id % 10000 == 0:
if args.test_data_path:
result = trainer.test(
reader=paddle.batch(
dataset.test(args.test_data_path),
batch_size=args.batch_size),
feeding=reader.feeding)
logger.warning("Test %d-%d, Cost %f, %s" %
(event.pass_id, event.batch_id, result.cost,
result.metrics))
path = "{}/model-pass-{}-batch-{}.tar.gz".format(
args.model_output_dir, event.pass_id, event.batch_id)
with gzip.open(path, 'w') as f:
trainer.save_parameter_to_tar(f)
trainer.train(
reader=paddle.batch(
paddle.reader.shuffle(
dataset.train(args.train_data_path),
buf_size=args.batch_size * 10000),
batch_size=args.batch_size),
feeding=reader.feeding,
event_handler=__event_handler__,
num_passes=args.num_passes)
if __name__ == '__main__':
train()
# Train DeepSpeech2 on PaddleCloud
>Note:
>Please make sure [PaddleCloud Client](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md#%E4%B8%8B%E8%BD%BD%E5%B9%B6%E9%85%8D%E7%BD%AEpaddlecloud) has been installed and the current directory is `deep_speech_2/cloud/`.
## Step 1: Upload Data
Given several input manifests, `pcloud_upload_data.sh` packs and uploads all the audio files they reference to the PaddleCloud filesystem, and also generates corresponding manifest files with the updated cloud paths.
Please modify the following arguments in `pcloud_upload_data.sh`:
- `IN_MANIFESTS`: Paths (in local filesystem) of manifest files containing the audio files to be uploaded. Multiple paths can be concatenated with a whitespace delimiter.
- `OUT_MANIFESTS`: Paths (in local filesystem) to write the updated output manifest files to. Multiple paths can be concatenated with a whitespace delimiter. The values of `audio_filepath` in the output manifests are updated with cloud filesystem paths.
- `CLOUD_DATA_DIR`: Directory (in PaddleCloud filesystem) to upload the data to. Don't forget to replace `USERNAME` in the default directory and make sure that you have the permission to write to it.
- `NUM_SHARDS`: Number of data shards / parts (in tar files) to be generated when packing and uploading data. A smaller `NUM_SHARDS` requires more temporary local disk space for packing data.
By running:
```
sh pcloud_upload_data.sh
```
all the audio files will be uploaded to the PaddleCloud filesystem, and you will get the modified manifest files in `OUT_MANIFESTS`.
You only need to take this step once, the very first time you run cloud training. Afterwards, the data persists in the cloud filesystem and can be reused by further job submissions.
## Step 2: Configure Training
Configure cloud training arguments in `pcloud_submit.sh`, with the following arguments:
- `TRAIN_MANIFEST`: Manifest filepath (in local filesystem) for training. Notice that the `audio_filepath` entries should point to the cloud filesystem, like those generated by `pcloud_upload_data.sh`.
- `DEV_MANIFEST`: Manifest filepath (in local filesystem) for validation.
- `CLOUD_MODEL_DIR`: Directory (in PaddleCloud filesystem) to save the model parameters (checkpoints). Don't forget to replace `USERNAME` in the default directory and make sure that you have the permission to write to it.
- `BATCH_SIZE`: Training batch size for a single node.
- `NUM_GPU`: Number of GPUs allocated for a single node.
- `NUM_NODE`: Number of nodes (machines) allocated for this job.
- `IS_LOCAL`: Set to False to enable parameter server, if using multiple nodes.
Configure other training hyper-parameters in `pcloud_train.sh` as you wish, just as you would for local training.
By running:
```
sh pcloud_submit.sh
```
you submit a training job to PaddleCloud, and the job name will be printed once the submission is done.
## Step 3: Get Job Logs
Run this to list all the jobs you have submitted, as well as their running status:
```
paddlecloud get jobs
```
Run this to print the corresponding job's logs:
```
paddlecloud logs -n 10000 $REPLACED_WITH_YOUR_ACTUAL_JOB_NAME
```
## More Help
For more information about the usage of PaddleCloud, please refer to [PaddleCloud Usage](https://github.com/PaddlePaddle/cloud/blob/develop/doc/usage_cn.md#提交任务).
"""Set up paths for DS2"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = os.path.dirname(__file__)
proj_path = os.path.join(this_dir, '..')
add_path(proj_path)
#! /usr/bin/env bash
TRAIN_MANIFEST="cloud/cloud_manifests/cloud.manifest.train"
DEV_MANIFEST="cloud/cloud_manifests/cloud.manifest.dev"
CLOUD_MODEL_DIR="./checkpoints"
BATCH_SIZE=512
NUM_GPU=8
NUM_NODE=1
IS_LOCAL="True"
JOB_NAME=deepspeech-`date +%Y%m%d%H%M%S`
DS2_PATH=${PWD%/*}
cp -f pcloud_train.sh ${DS2_PATH}
paddlecloud submit \
-image bootstrapper:5000/paddlepaddle/pcloud_ds2:latest \
-jobname ${JOB_NAME} \
-cpu ${NUM_GPU} \
-gpu ${NUM_GPU} \
-memory 64Gi \
-parallelism ${NUM_NODE} \
-pscpu 1 \
-pservers 1 \
-psmemory 64Gi \
-passes 1 \
-entry "sh pcloud_train.sh ${TRAIN_MANIFEST} ${DEV_MANIFEST} ${CLOUD_MODEL_DIR} ${NUM_GPU} ${BATCH_SIZE} ${IS_LOCAL}" \
${DS2_PATH}
rm ${DS2_PATH}/pcloud_train.sh
#! /usr/bin/env bash
TRAIN_MANIFEST=$1
DEV_MANIFEST=$2
MODEL_PATH=$3
NUM_GPU=$4
BATCH_SIZE=$5
IS_LOCAL=$6
python ./cloud/split_data.py \
--in_manifest_path=${TRAIN_MANIFEST} \
--out_manifest_path='/local.manifest.train'
python ./cloud/split_data.py \
--in_manifest_path=${DEV_MANIFEST} \
--out_manifest_path='/local.manifest.dev'
mkdir ./logs
python -u train.py \
--batch_size=${BATCH_SIZE} \
--trainer_count=${NUM_GPU} \
--num_passes=200 \
--num_proc_data=${NUM_GPU} \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=2048 \
--num_iter_print=100 \
--learning_rate=5e-4 \
--max_duration=27.0 \
--min_duration=0.0 \
--use_sortagrad=True \
--use_gru=False \
--use_gpu=True \
--is_local=${IS_LOCAL} \
--share_rnn_weights=True \
--train_manifest='/local.manifest.train' \
--dev_manifest='/local.manifest.dev' \
--mean_std_path='data/librispeech/mean_std.npz' \
--vocab_path='data/librispeech/vocab.txt' \
--output_model_dir=${MODEL_PATH} \
--augment_conf_path='conf/augmentation.config' \
--specgram_type='linear' \
--shuffle_method='batch_shuffle_clipped' \
2>&1 | tee ./logs/train.log
#! /usr/bin/env bash
mkdir cloud_manifests
IN_MANIFESTS="../data/librispeech/manifest.train ../data/librispeech/manifest.dev-clean ../data/librispeech/manifest.test-clean"
OUT_MANIFESTS="cloud_manifests/cloud.manifest.train cloud_manifests/cloud.manifest.dev cloud_manifests/cloud.manifest.test"
CLOUD_DATA_DIR="/pfs/dlnel/home/USERNAME/deepspeech2/data/librispeech"
NUM_SHARDS=50
python upload_data.py \
--in_manifest_paths ${IN_MANIFESTS} \
--out_manifest_paths ${OUT_MANIFESTS} \
--cloud_data_dir ${CLOUD_DATA_DIR} \
--num_shards ${NUM_SHARDS}
if [ $? -ne 0 ]
then
echo "Upload Data Failed!"
exit 1
fi
echo "All Done."
"""This tool is used for splitting data into each node of
paddlecloud. This script should be called in paddlecloud.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import json
import argparse
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--in_manifest_path",
type=str,
required=True,
help="Input manifest path for all nodes.")
parser.add_argument(
"--out_manifest_path",
type=str,
required=True,
help="Output manifest file path for current node.")
args = parser.parse_args()
def split_data(in_manifest_path, out_manifest_path):
with open("/trainer_id", "r") as f:
trainer_id = int(f.readline()[:-1])
with open("/trainer_count", "r") as f:
trainer_count = int(f.readline()[:-1])
out_manifest = []
for index, json_line in enumerate(open(in_manifest_path, 'r')):
if (index % trainer_count) == trainer_id:
out_manifest.append("%s\n" % json_line.strip())
with open(out_manifest_path, 'w') as f:
f.writelines(out_manifest)
if __name__ == '__main__':
split_data(args.in_manifest_path, args.out_manifest_path)
"""This script is for uploading data for DeepSpeech2 training on paddlecloud.
Steps:
1. Read original manifests and extract local sound files.
2. Tar all local sound files into multiple tar files and upload them.
3. Modify original manifests with updated paths in cloud filesystem.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import os
import tarfile
import sys
import argparse
import shutil
from subprocess import call
import _init_paths
from data_utils.utils import read_manifest
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--in_manifest_paths",
default=[
"../datasets/manifest.train", "../datasets/manifest.dev",
"../datasets/manifest.test"
],
type=str,
nargs='+',
help="Local filepaths of input manifests to load, pack and upload."
"(default: %(default)s)")
parser.add_argument(
"--out_manifest_paths",
default=[
"./cloud.manifest.train", "./cloud.manifest.dev",
"./cloud.manifest.test"
],
type=str,
nargs='+',
help="Local filepaths of modified manifests to write to. "
"(default: %(default)s)")
parser.add_argument(
"--cloud_data_dir",
required=True,
type=str,
help="Destination directory on paddlecloud to upload data to.")
parser.add_argument(
"--num_shards",
default=10,
type=int,
help="Number of parts to split data to. (default: %(default)s)")
parser.add_argument(
"--local_tmp_dir",
default="./tmp/",
type=str,
help="Local directory for storing temporary data. (default: %(default)s)")
args = parser.parse_args()
def upload_data(in_manifest_path_list, out_manifest_path_list, local_tmp_dir,
upload_tar_dir, num_shards):
"""Extract and pack sound files listed in the manifest files into multple
tar files and upload them to padldecloud. Besides, generate new manifest
files with updated paths in paddlecloud.
"""
# compute total audio number
total_line = 0
for manifest_path in in_manifest_path_list:
with open(manifest_path, 'r') as f:
total_line += len(f.readlines())
line_per_tar = (total_line // num_shards) + 1
# pack and upload shard by shard
line_count, tar_file = 0, None
for manifest_path, out_manifest_path in zip(in_manifest_path_list,
out_manifest_path_list):
manifest = read_manifest(manifest_path)
out_manifest = []
for json_data in manifest:
sound_filepath = json_data['audio_filepath']
sound_filename = os.path.basename(sound_filepath)
if line_count % line_per_tar == 0:
                if tar_file is not None:
tar_file.close()
pcloud_cp(tar_path, upload_tar_dir)
os.remove(tar_path)
tar_name = 'part-%s-of-%s.tar' % (
str(line_count // line_per_tar).zfill(5),
str(num_shards).zfill(5))
tar_path = os.path.join(local_tmp_dir, tar_name)
tar_file = tarfile.open(tar_path, 'w')
tar_file.add(sound_filepath, arcname=sound_filename)
line_count += 1
json_data['audio_filepath'] = "tar:%s#%s" % (
os.path.join(upload_tar_dir, tar_name), sound_filename)
out_manifest.append("%s\n" % json.dumps(json_data))
with open(out_manifest_path, 'w') as f:
f.writelines(out_manifest)
pcloud_cp(out_manifest_path, upload_tar_dir)
tar_file.close()
pcloud_cp(tar_path, upload_tar_dir)
os.remove(tar_path)
def pcloud_mkdir(dir):
"""Make directory in PaddleCloud filesystem.
"""
if call(['paddlecloud', 'mkdir', dir]) != 0:
raise IOError("PaddleCloud mkdir failed: %s." % dir)
def pcloud_cp(src, dst):
"""Copy src from local filesytem to dst in PaddleCloud filesystem,
or downlowd src from PaddleCloud filesystem to dst in local filesystem.
"""
if call(['paddlecloud', 'cp', src, dst]) != 0:
raise IOError("PaddleCloud cp failed: from [%s] to [%s]." % (src, dst))
if __name__ == '__main__':
if not os.path.exists(args.local_tmp_dir):
os.makedirs(args.local_tmp_dir)
pcloud_mkdir(args.cloud_data_dir)
upload_data(args.in_manifest_paths, args.out_manifest_paths,
args.local_tmp_dir, args.cloud_data_dir, args.num_shards)
shutil.rmtree(args.local_tmp_dir)
[
{
"type": "shift",
"params": {"min_shift_ms": -5,
"max_shift_ms": 5},
"prob": 1.0
}
]
[
{
"type": "noise",
"params": {"min_snr_dB": 40,
"max_snr_dB": 50,
"noise_manifest_path": "datasets/manifest.noise"},
"prob": 0.6
},
{
"type": "impulse",
"params": {"impulse_manifest_path": "datasets/manifest.impulse"},
"prob": 0.5
},
{
"type": "speed",
"params": {"min_speed_rate": 0.95,
"max_speed_rate": 1.05},
"prob": 0.5
},
{
"type": "shift",
"params": {"min_shift_ms": -5,
"max_shift_ms": 5},
"prob": 1.0
},
{
"type": "volume",
"params": {"min_gain_dBFS": -10,
"max_gain_dBFS": 10},
"prob": 0.0
},
{
"type": "bayesian_normal",
"params": {"target_db": -20,
"prior_db": -20,
"prior_samples": 100},
"prob": 0.0
}
]
"""Prepare Aishell mandarin dataset
Download, unpack and create manifest files.
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import codecs
import soundfile
import json
import argparse
from data_utils.utility import download, unpack
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL_ROOT = 'http://www.openslr.org/resources/33'
DATA_URL = URL_ROOT + '/data_aishell.tgz'
MD5_DATA = '2f494334227864a8a8fec932999db9d8'
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--target_dir",
default=DATA_HOME + "/Aishell",
type=str,
help="Directory to save the dataset. (default: %(default)s)")
parser.add_argument(
"--manifest_prefix",
default="manifest",
type=str,
help="Filepath prefix for output manifests. (default: %(default)s)")
args = parser.parse_args()
def create_manifest(data_dir, manifest_path_prefix):
print("Creating manifest %s ..." % manifest_path_prefix)
json_lines = []
transcript_path = os.path.join(data_dir, 'transcript',
'aishell_transcript_v0.8.txt')
transcript_dict = {}
for line in codecs.open(transcript_path, 'r', 'utf-8'):
line = line.strip()
if line == '': continue
audio_id, text = line.split(' ', 1)
        # remove whitespace
text = ''.join(text.split())
transcript_dict[audio_id] = text
data_types = ['train', 'dev', 'test']
for type in data_types:
audio_dir = os.path.join(data_dir, 'wav', type)
for subfolder, _, filelist in sorted(os.walk(audio_dir)):
for fname in filelist:
audio_path = os.path.join(subfolder, fname)
audio_id = fname[:-4]
                # skip audio files that have no transcription
if audio_id not in transcript_dict:
continue
audio_data, samplerate = soundfile.read(audio_path)
                duration = float(len(audio_data)) / samplerate
text = transcript_dict[audio_id]
json_lines.append(
json.dumps(
{
'audio_filepath': audio_path,
'duration': duration,
'text': text
},
ensure_ascii=False))
manifest_path = manifest_path_prefix + '.' + type
with codecs.open(manifest_path, 'w', 'utf-8') as fout:
for line in json_lines:
fout.write(line + '\n')
def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create manifest file."""
data_dir = os.path.join(target_dir, 'data_aishell')
if not os.path.exists(data_dir):
filepath = download(url, md5sum, target_dir)
unpack(filepath, target_dir)
# unpack all audio tar files
audio_dir = os.path.join(data_dir, 'wav')
for subfolder, _, filelist in sorted(os.walk(audio_dir)):
for ftar in filelist:
unpack(os.path.join(subfolder, ftar), subfolder, True)
else:
print("Skip downloading and unpacking. Data already exists in %s." %
target_dir)
create_manifest(data_dir, manifest_path)
def main():
if args.target_dir.startswith('~'):
args.target_dir = os.path.expanduser(args.target_dir)
prepare_dataset(
url=DATA_URL,
md5sum=MD5_DATA,
target_dir=args.target_dir,
manifest_path=args.manifest_prefix)
if __name__ == '__main__':
main()
"""Prepare Librispeech ASR datasets.
Download, unpack and create manifest files.
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import os
import sys
import argparse
import soundfile
import json
import codecs
from data_utils.utility import download, unpack
URL_ROOT = "http://www.openslr.org/resources/12"
URL_TEST_CLEAN = URL_ROOT + "/test-clean.tar.gz"
URL_TEST_OTHER = URL_ROOT + "/test-other.tar.gz"
URL_DEV_CLEAN = URL_ROOT + "/dev-clean.tar.gz"
URL_DEV_OTHER = URL_ROOT + "/dev-other.tar.gz"
URL_TRAIN_CLEAN_100 = URL_ROOT + "/train-clean-100.tar.gz"
URL_TRAIN_CLEAN_360 = URL_ROOT + "/train-clean-360.tar.gz"
URL_TRAIN_OTHER_500 = URL_ROOT + "/train-other-500.tar.gz"
MD5_TEST_CLEAN = "32fa31d27d2e1cad72775fee3f4849a9"
MD5_TEST_OTHER = "fb5a50374b501bb3bac4815ee91d3135"
MD5_DEV_CLEAN = "42e2234ba48799c1f50f24a7926300a1"
MD5_DEV_OTHER = "c8d0bcc9cca99d4f8b62fcc847357931"
MD5_TRAIN_CLEAN_100 = "2a93770f6d5c6c964bc36631d331a522"
MD5_TRAIN_CLEAN_360 = "c0e676e450a7ff2f54aeade5171606fa"
MD5_TRAIN_OTHER_500 = "d1a0fd59409feb2c614ce4d30c387708"
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--target_dir",
default='~/.cache/paddle/dataset/speech/libri',
type=str,
help="Directory to save the dataset. (default: %(default)s)")
parser.add_argument(
"--manifest_prefix",
default="manifest",
type=str,
help="Filepath prefix for output manifests. (default: %(default)s)")
parser.add_argument(
"--full_download",
default="True",
type=distutils.util.strtobool,
help="Download all datasets for Librispeech."
" If False, only download a minimal requirement (test-clean, dev-clean"
" train-clean-100). (default: %(default)s)")
args = parser.parse_args()
def create_manifest(data_dir, manifest_path):
"""Create a manifest json file summarizing the data set, with each line
containing the meta data (i.e. audio filepath, transcription text, audio
duration) of each audio file within the data set.
"""
print("Creating manifest %s ..." % manifest_path)
json_lines = []
for subfolder, _, filelist in sorted(os.walk(data_dir)):
text_filelist = [
filename for filename in filelist if filename.endswith('trans.txt')
]
if len(text_filelist) > 0:
text_filepath = os.path.join(data_dir, subfolder, text_filelist[0])
for line in open(text_filepath):
segments = line.strip().split()
text = ' '.join(segments[1:]).lower()
audio_filepath = os.path.join(data_dir, subfolder,
segments[0] + '.flac')
audio_data, samplerate = soundfile.read(audio_filepath)
duration = float(len(audio_data)) / samplerate
json_lines.append(
json.dumps({
'audio_filepath': audio_filepath,
'duration': duration,
'text': text
}))
with codecs.open(manifest_path, 'w', 'utf-8') as out_file:
for line in json_lines:
out_file.write(line + '\n')
def prepare_dataset(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file.
"""
if not os.path.exists(os.path.join(target_dir, "LibriSpeech")):
# download
filepath = download(url, md5sum, target_dir)
# unpack
unpack(filepath, target_dir)
else:
print("Skip downloading and unpacking. Data already exists in %s." %
target_dir)
# create manifest json file
create_manifest(target_dir, manifest_path)
def main():
if args.target_dir.startswith('~'):
args.target_dir = os.path.expanduser(args.target_dir)
prepare_dataset(
url=URL_TEST_CLEAN,
md5sum=MD5_TEST_CLEAN,
target_dir=os.path.join(args.target_dir, "test-clean"),
manifest_path=args.manifest_prefix + ".test-clean")
prepare_dataset(
url=URL_DEV_CLEAN,
md5sum=MD5_DEV_CLEAN,
target_dir=os.path.join(args.target_dir, "dev-clean"),
manifest_path=args.manifest_prefix + ".dev-clean")
if args.full_download:
prepare_dataset(
url=URL_TRAIN_CLEAN_100,
md5sum=MD5_TRAIN_CLEAN_100,
target_dir=os.path.join(args.target_dir, "train-clean-100"),
manifest_path=args.manifest_prefix + ".train-clean-100")
prepare_dataset(
url=URL_TEST_OTHER,
md5sum=MD5_TEST_OTHER,
target_dir=os.path.join(args.target_dir, "test-other"),
manifest_path=args.manifest_prefix + ".test-other")
prepare_dataset(
url=URL_DEV_OTHER,
md5sum=MD5_DEV_OTHER,
target_dir=os.path.join(args.target_dir, "dev-other"),
manifest_path=args.manifest_prefix + ".dev-other")
prepare_dataset(
url=URL_TRAIN_CLEAN_360,
md5sum=MD5_TRAIN_CLEAN_360,
target_dir=os.path.join(args.target_dir, "train-clean-360"),
manifest_path=args.manifest_prefix + ".train-clean-360")
prepare_dataset(
url=URL_TRAIN_OTHER_500,
md5sum=MD5_TRAIN_OTHER_500,
target_dir=os.path.join(args.target_dir, "train-other-500"),
manifest_path=args.manifest_prefix + ".train-other-500")
if __name__ == '__main__':
main()
"""Prepare CHiME3 background data.
Download, unpack and create manifest files.
Manifest file is a json-format file with each line containing the
meta data (i.e. audio filepath, transcript and audio duration)
of each audio file in the data set.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import os
import wget
import zipfile
import tarfile
import argparse
import soundfile
import json
from paddle.v2.dataset.common import md5file
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset/speech')
URL = "https://d4s.myairbridge.com/packagev2/AG0Y3DNBE5IWRRTV/?dlid=W19XG7T0NNHB027139H0EQ"
MD5 = "c3ff512618d7a67d4f85566ea1bc39ec"
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--target_dir",
default=DATA_HOME + "/chime3_background",
type=str,
help="Directory to save the dataset. (default: %(default)s)")
parser.add_argument(
"--manifest_filepath",
default="manifest.chime3.background",
type=str,
help="Filepath for output manifests. (default: %(default)s)")
args = parser.parse_args()
def download(url, md5sum, target_dir, filename=None):
"""Download file from url to target_dir, and check md5sum."""
    if filename is None:
filename = url.split("/")[-1]
if not os.path.exists(target_dir): os.makedirs(target_dir)
filepath = os.path.join(target_dir, filename)
if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
print("Downloading %s ..." % url)
wget.download(url, target_dir)
print("\nMD5 Chesksum %s ..." % filepath)
if not md5file(filepath) == md5sum:
raise RuntimeError("MD5 checksum failed.")
else:
print("File exists, skip downloading. (%s)" % filepath)
return filepath
def unpack(filepath, target_dir):
"""Unpack the file to the target_dir."""
print("Unpacking %s ..." % filepath)
if filepath.endswith('.zip'):
zip = zipfile.ZipFile(filepath, 'r')
zip.extractall(target_dir)
zip.close()
elif filepath.endswith('.tar') or filepath.endswith('.tar.gz'):
        tar = tarfile.open(filepath)
tar.extractall(target_dir)
tar.close()
else:
raise ValueError("File format is not supported for unpacking.")
def create_manifest(data_dir, manifest_path):
"""Create a manifest json file summarizing the data set, with each line
containing the meta data (i.e. audio filepath, transcription text, audio
duration) of each audio file within the data set.
"""
print("Creating manifest %s ..." % manifest_path)
json_lines = []
for subfolder, _, filelist in sorted(os.walk(data_dir)):
for filename in filelist:
if filename.endswith('.wav'):
filepath = os.path.join(data_dir, subfolder, filename)
audio_data, samplerate = soundfile.read(filepath)
duration = float(len(audio_data)) / samplerate
json_lines.append(
json.dumps({
'audio_filepath': filepath,
'duration': duration,
'text': ''
}))
with open(manifest_path, 'w') as out_file:
for line in json_lines:
out_file.write(line + '\n')
def prepare_chime3(url, md5sum, target_dir, manifest_path):
"""Download, unpack and create summmary manifest file."""
if not os.path.exists(os.path.join(target_dir, "CHiME3")):
# download
filepath = download(url, md5sum, target_dir,
"myairbridge-AG0Y3DNBE5IWRRTV.zip")
# unpack
unpack(filepath, target_dir)
unpack(
os.path.join(target_dir, 'CHiME3_background_bus.zip'), target_dir)
unpack(
os.path.join(target_dir, 'CHiME3_background_caf.zip'), target_dir)
unpack(
os.path.join(target_dir, 'CHiME3_background_ped.zip'), target_dir)
unpack(
os.path.join(target_dir, 'CHiME3_background_str.zip'), target_dir)
else:
print("Skip downloading and unpacking. Data already exists in %s." %
target_dir)
# create manifest json file
create_manifest(target_dir, manifest_path)
def main():
prepare_chime3(
url=URL,
md5sum=MD5,
target_dir=args.target_dir,
manifest_path=args.manifest_filepath)
if __name__ == '__main__':
main()
"""Contains the data augmentation pipeline."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import json
import random
from data_utils.augmentor.volume_perturb import VolumePerturbAugmentor
from data_utils.augmentor.shift_perturb import ShiftPerturbAugmentor
from data_utils.augmentor.speed_perturb import SpeedPerturbAugmentor
from data_utils.augmentor.noise_perturb import NoisePerturbAugmentor
from data_utils.augmentor.impulse_response import ImpulseResponseAugmentor
from data_utils.augmentor.resample import ResampleAugmentor
from data_utils.augmentor.online_bayesian_normalization import \
OnlineBayesianNormalizationAugmentor
class AugmentationPipeline(object):
"""Build a pre-processing pipeline with various augmentation models.Such a
data augmentation pipeline is oftern leveraged to augment the training
samples to make the model invariant to certain types of perturbations in the
real world, improving model's generalization ability.
The pipeline is built according the the augmentation configuration in json
string, e.g.
.. code-block::
[ {
"type": "noise",
"params": {"min_snr_dB": 10,
"max_snr_dB": 20,
"noise_manifest_path": "datasets/manifest.noise"},
"prob": 0.0
},
{
"type": "speed",
"params": {"min_speed_rate": 0.9,
"max_speed_rate": 1.1},
"prob": 1.0
},
{
"type": "shift",
"params": {"min_shift_ms": -5,
"max_shift_ms": 5},
"prob": 1.0
},
{
"type": "volume",
"params": {"min_gain_dBFS": -10,
"max_gain_dBFS": 10},
"prob": 0.0
},
{
"type": "bayesian_normal",
"params": {"target_db": -20,
"prior_db": -20,
"prior_samples": 100},
"prob": 0.0
}
]
    The augmentation configuration above inserts five augmentation models
    into the pipeline, from NoisePerturbAugmentor to
    OnlineBayesianNormalizationAugmentor. "prob" indicates the probability
    of the current augmentor to take effect. If "prob" is zero, the
    augmentor does not take effect.
:param augmentation_config: Augmentation configuration in json string.
:type augmentation_config: str
:param random_seed: Random seed.
:type random_seed: int
    :raises ValueError: If the augmentation json config is in an incorrect format.
"""
def __init__(self, augmentation_config, random_seed=0):
self._rng = random.Random(random_seed)
self._augmentors, self._rates = self._parse_pipeline_from(
augmentation_config)
def transform_audio(self, audio_segment):
"""Run the pre-processing pipeline for data augmentation.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to process.
        :type audio_segment: AudioSegment|SpeechSegment
"""
for augmentor, rate in zip(self._augmentors, self._rates):
if self._rng.uniform(0., 1.) < rate:
augmentor.transform_audio(audio_segment)
def _parse_pipeline_from(self, config_json):
"""Parse the config json to build a augmentation pipelien."""
try:
configs = json.loads(config_json)
augmentors = [
self._get_augmentor(config["type"], config["params"])
for config in configs
]
rates = [config["prob"] for config in configs]
except Exception as e:
raise ValueError("Failed to parse the augmentation config json: "
"%s" % str(e))
return augmentors, rates
def _get_augmentor(self, augmentor_type, params):
"""Return an augmentation model by the type name, and pass in params."""
if augmentor_type == "volume":
return VolumePerturbAugmentor(self._rng, **params)
elif augmentor_type == "shift":
return ShiftPerturbAugmentor(self._rng, **params)
elif augmentor_type == "speed":
return SpeedPerturbAugmentor(self._rng, **params)
elif augmentor_type == "resample":
return ResampleAugmentor(self._rng, **params)
elif augmentor_type == "bayesian_normal":
return OnlineBayesianNormalizationAugmentor(self._rng, **params)
elif augmentor_type == "noise":
return NoisePerturbAugmentor(self._rng, **params)
elif augmentor_type == "impulse":
return ImpulseResponseAugmentor(self._rng, **params)
else:
raise ValueError("Unknown augmentor type [%s]." % augmentor_type)
"""Contains the abstract base class for augmentation models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from abc import ABCMeta, abstractmethod
class AugmentorBase(object):
"""Abstract base class for augmentation model (augmentor) class.
All augmentor classes should inherit from this class, and implement the
following abstract methods.
"""
__metaclass__ = ABCMeta
@abstractmethod
def __init__(self):
pass
@abstractmethod
def transform_audio(self, audio_segment):
"""Adds various effects to the input audio segment. Such effects
will augment the training data to make the model invariant to certain
types of perturbations in the real world, improving model's
generalization ability.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
        :type audio_segment: AudioSegment|SpeechSegment
"""
pass
"""Contains the impulse response augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
from data_utils.utility import read_manifest
from data_utils.audio import AudioSegment
class ImpulseResponseAugmentor(AugmentorBase):
"""Augmentation model for adding impulse response effect.
:param rng: Random generator object.
:type rng: random.Random
:param impulse_manifest_path: Manifest path for impulse audio data.
:type impulse_manifest_path: basestring
"""
def __init__(self, rng, impulse_manifest_path):
self._rng = rng
self._impulse_manifest = read_manifest(impulse_manifest_path)
def transform_audio(self, audio_segment):
"""Add impulse response effect.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
        :type audio_segment: AudioSegment|SpeechSegment
"""
impulse_json = self._rng.sample(self._impulse_manifest, 1)[0]
impulse_segment = AudioSegment.from_file(impulse_json['audio_filepath'])
audio_segment.convolve(impulse_segment, allow_resample=True)
"""Contains the noise perturb augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
from data_utils.utility import read_manifest
from data_utils.audio import AudioSegment
class NoisePerturbAugmentor(AugmentorBase):
"""Augmentation model for adding background noise.
:param rng: Random generator object.
:type rng: random.Random
:param min_snr_dB: Minimal signal noise ratio, in decibels.
:type min_snr_dB: float
:param max_snr_dB: Maximal signal noise ratio, in decibels.
:type max_snr_dB: float
:param noise_manifest_path: Manifest path for noise audio data.
:type noise_manifest_path: basestring
"""
def __init__(self, rng, min_snr_dB, max_snr_dB, noise_manifest_path):
self._min_snr_dB = min_snr_dB
self._max_snr_dB = max_snr_dB
self._rng = rng
self._noise_manifest = read_manifest(manifest_path=noise_manifest_path)
def transform_audio(self, audio_segment):
"""Add background noise audio.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
        :type audio_segment: AudioSegment|SpeechSegment
"""
noise_json = self._rng.sample(self._noise_manifest, 1)[0]
if noise_json['duration'] < audio_segment.duration:
raise RuntimeError("The duration of sampled noise audio is smaller "
"than the audio segment to add effects to.")
diff_duration = noise_json['duration'] - audio_segment.duration
start = self._rng.uniform(0, diff_duration)
end = start + audio_segment.duration
noise_segment = AudioSegment.slice_from_file(
noise_json['audio_filepath'], start=start, end=end)
snr_dB = self._rng.uniform(self._min_snr_dB, self._max_snr_dB)
audio_segment.add_noise(
noise_segment, snr_dB, allow_downsampling=True, rng=self._rng)
"""Contain the online bayesian normalization augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
class OnlineBayesianNormalizationAugmentor(AugmentorBase):
"""Augmentation model for adding online bayesian normalization.
:param rng: Random generator object.
:type rng: random.Random
:param target_db: Target RMS value in decibels.
:type target_db: float
:param prior_db: Prior RMS estimate in decibels.
:type prior_db: float
:param prior_samples: Prior strength in number of samples.
:type prior_samples: int
    :param startup_delay: Default 0.0s. If provided, this function will
                          accrue statistics for the first startup_delay
                          seconds before applying online normalization.
    :type startup_delay: float
"""
def __init__(self,
rng,
target_db,
prior_db,
prior_samples,
startup_delay=0.0):
self._target_db = target_db
self._prior_db = prior_db
self._prior_samples = prior_samples
self._rng = rng
self._startup_delay = startup_delay
def transform_audio(self, audio_segment):
"""Normalizes the input audio using the online Bayesian approach.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
:type audio_segment: AudioSegment|SpeechSegment
"""
audio_segment.normalize_online_bayesian(self._target_db, self._prior_db,
self._prior_samples,
self._startup_delay)
"""Contain the resample augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
class ResampleAugmentor(AugmentorBase):
"""Augmentation model for resampling.
See more info here:
https://ccrma.stanford.edu/~jos/resample/index.html
:param rng: Random generator object.
:type rng: random.Random
:param new_sample_rate: New sample rate in Hz.
:type new_sample_rate: int
"""
def __init__(self, rng, new_sample_rate):
self._new_sample_rate = new_sample_rate
self._rng = rng
def transform_audio(self, audio_segment):
"""Resamples the input audio to a target sample rate.
Note that this is an in-place transformation.
:param audio: Audio segment to add effects to.
:type audio: AudioSegment|SpeechSegment
"""
audio_segment.resample(self._new_sample_rate)
"""Contains the volume perturb augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
class ShiftPerturbAugmentor(AugmentorBase):
"""Augmentation model for adding random shift perturbation.
:param rng: Random generator object.
:type rng: random.Random
:param min_shift_ms: Minimal shift in milliseconds.
:type min_shift_ms: float
:param max_shift_ms: Maximal shift in milliseconds.
:type max_shift_ms: float
"""
def __init__(self, rng, min_shift_ms, max_shift_ms):
self._min_shift_ms = min_shift_ms
self._max_shift_ms = max_shift_ms
self._rng = rng
def transform_audio(self, audio_segment):
"""Shift audio.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
        :type audio_segment: AudioSegment|SpeechSegment
"""
shift_ms = self._rng.uniform(self._min_shift_ms, self._max_shift_ms)
audio_segment.shift(shift_ms)
"""Contain the speech perturbation augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
class SpeedPerturbAugmentor(AugmentorBase):
"""Augmentation model for adding speed perturbation.
See reference paper here:
http://www.danielpovey.com/files/2015_interspeech_augmentation.pdf
:param rng: Random generator object.
:type rng: random.Random
:param min_speed_rate: Lower bound of new speed rate to sample and should
not be smaller than 0.9.
:type min_speed_rate: float
:param max_speed_rate: Upper bound of new speed rate to sample and should
not be larger than 1.1.
:type max_speed_rate: float
"""
def __init__(self, rng, min_speed_rate, max_speed_rate):
if min_speed_rate < 0.9:
raise ValueError(
"Sampling speed below 0.9 can cause unnatural effects")
if max_speed_rate > 1.1:
raise ValueError(
"Sampling speed above 1.1 can cause unnatural effects")
self._min_speed_rate = min_speed_rate
self._max_speed_rate = max_speed_rate
self._rng = rng
def transform_audio(self, audio_segment):
"""Sample a new speed rate from the given range and
changes the speed of the given audio clip.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
:type audio_segment: AudioSegment|SpeechSegment
"""
sampled_speed = self._rng.uniform(self._min_speed_rate,
self._max_speed_rate)
audio_segment.change_speed(sampled_speed)
"""Contains the volume perturb augmentation model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from data_utils.augmentor.base import AugmentorBase
class VolumePerturbAugmentor(AugmentorBase):
"""Augmentation model for adding random volume perturbation.
This is used for multi-loudness training of PCEN. See
https://arxiv.org/pdf/1607.05666v1.pdf
for more details.
:param rng: Random generator object.
:type rng: random.Random
:param min_gain_dBFS: Minimal gain in dBFS.
:type min_gain_dBFS: float
:param max_gain_dBFS: Maximal gain in dBFS.
:type max_gain_dBFS: float
"""
def __init__(self, rng, min_gain_dBFS, max_gain_dBFS):
self._min_gain_dBFS = min_gain_dBFS
self._max_gain_dBFS = max_gain_dBFS
self._rng = rng
def transform_audio(self, audio_segment):
"""Change audio loadness.
Note that this is an in-place transformation.
:param audio_segment: Audio segment to add effects to.
        :type audio_segment: AudioSegment|SpeechSegment
"""
gain = self._rng.uniform(self._min_gain_dBFS, self._max_gain_dBFS)
audio_segment.gain_db(gain)
"""Contains data generator for orgnaizing various audio data preprocessing
pipeline and offering data reader interface of PaddlePaddle requirements.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import random
import tarfile
import multiprocessing
import numpy as np
import paddle.v2 as paddle
from threading import local
import atexit
from data_utils.utility import read_manifest
from data_utils.utility import xmap_readers_mp
from data_utils.augmentor.augmentation import AugmentationPipeline
from data_utils.featurizer.speech_featurizer import SpeechFeaturizer
from data_utils.speech import SpeechSegment
from data_utils.normalizer import FeatureNormalizer
class DataGenerator(object):
"""
    DataGenerator provides a basic audio data preprocessing pipeline, and
    offers data reader interfaces that meet PaddlePaddle's requirements.
:param vocab_filepath: Vocabulary filepath for indexing tokenized
transcripts.
:type vocab_filepath: basestring
:param mean_std_filepath: File containing the pre-computed mean and stddev.
:type mean_std_filepath: None|basestring
:param augmentation_config: Augmentation configuration in json string.
Details see AugmentationPipeline.__doc__.
:type augmentation_config: str
:param max_duration: Audio with duration (in seconds) greater than
this will be discarded.
:type max_duration: float
:param min_duration: Audio with duration (in seconds) smaller than
this will be discarded.
:type min_duration: float
:param stride_ms: Striding size (in milliseconds) for generating frames.
:type stride_ms: float
:param window_ms: Window size (in milliseconds) for generating frames.
:type window_ms: float
:param max_freq: Used when specgram_type is 'linear', only FFT bins
corresponding to frequencies between [0, max_freq] are
returned.
    :type max_freq: None|float
:param specgram_type: Specgram feature type. Options: 'linear'.
:type specgram_type: str
:param use_dB_normalization: Whether to normalize the audio to -20 dB
before extracting the features.
:type use_dB_normalization: bool
:param num_threads: Number of CPU threads for processing data.
:type num_threads: int
:param random_seed: Random seed.
:type random_seed: int
:param keep_transcription_text: If set to True, transcription text will
be passed forward directly without
converting to index sequence.
:type keep_transcription_text: bool
    :param num_conv_layers: The number of convolution layers, used to compute
the sequence length.
:type num_conv_layers: int
"""
def __init__(self,
vocab_filepath,
mean_std_filepath,
augmentation_config='{}',
max_duration=float('inf'),
min_duration=0.0,
stride_ms=10.0,
window_ms=20.0,
max_freq=None,
specgram_type='linear',
use_dB_normalization=True,
num_threads=multiprocessing.cpu_count() // 2,
random_seed=0,
keep_transcription_text=False,
num_conv_layers=2):
self._max_duration = max_duration
self._min_duration = min_duration
self._normalizer = FeatureNormalizer(mean_std_filepath)
self._augmentation_pipeline = AugmentationPipeline(
augmentation_config=augmentation_config, random_seed=random_seed)
self._speech_featurizer = SpeechFeaturizer(
vocab_filepath=vocab_filepath,
specgram_type=specgram_type,
stride_ms=stride_ms,
window_ms=window_ms,
max_freq=max_freq,
use_dB_normalization=use_dB_normalization)
self._num_threads = num_threads
self._rng = random.Random(random_seed)
self._keep_transcription_text = keep_transcription_text
self._epoch = 0
# for caching tar files info
self._local_data = local()
self._local_data.tar2info = {}
self._local_data.tar2object = {}
self._num_conv_layers = num_conv_layers
def process_utterance(self, filename, transcript):
"""Load, augment, featurize and normalize for speech data.
:param filename: Audio filepath
:type filename: basestring | file
:param transcript: Transcription text.
:type transcript: basestring
:return: Tuple of audio feature tensor and data of transcription part,
where transcription part could be token ids or text.
:rtype: tuple of (2darray, list)
"""
if filename.startswith('tar:'):
speech_segment = SpeechSegment.from_file(
self._subfile_from_tar(filename), transcript)
else:
speech_segment = SpeechSegment.from_file(filename, transcript)
self._augmentation_pipeline.transform_audio(speech_segment)
specgram, transcript_part = self._speech_featurizer.featurize(
speech_segment, self._keep_transcription_text)
specgram = self._normalizer.apply(specgram)
return specgram, transcript_part
def batch_reader_creator(self,
manifest_path,
batch_size,
min_batch_size=1,
padding_to=-1,
flatten=False,
sortagrad=False,
shuffle_method="batch_shuffle"):
"""
Batch data reader creator for audio data. Return a callable generator
function to produce batches of data.
Audio features within one batch will be padded with zeros to have the
same shape, or a user-defined shape.
:param manifest_path: Filepath of manifest for audio files.
:type manifest_path: basestring
:param batch_size: Number of instances in a batch.
:type batch_size: int
:param min_batch_size: Any batch with batch size smaller than this will
be discarded. (To be deprecated in the future.)
:type min_batch_size: int
        :param padding_to: If set to -1, the maximum shape in the batch
will be used as the target shape for padding.
Otherwise, `padding_to` will be the target shape.
:type padding_to: int
        :param flatten: If set to True, audio features will be flattened to
                        a 1-D array.
:type flatten: bool
        :param sortagrad: If set to True, sort the instances by audio duration
                          in the first epoch to speed up training.
:type sortagrad: bool
:param shuffle_method: Shuffle method. Options:
'' or None: no shuffle.
'instance_shuffle': instance-wise shuffle.
'batch_shuffle': similarly-sized instances are
put into batches, and then
batch-wise shuffle the batches.
For more details, please see
``_batch_shuffle.__doc__``.
'batch_shuffle_clipped': 'batch_shuffle' with
head shift and tail
clipping. For more
details, please see
``_batch_shuffle``.
If sortagrad is True, shuffle is disabled
for the first epoch.
:type shuffle_method: None|str
:return: Batch reader function, producing batches of data when called.
:rtype: callable
"""
def batch_reader():
# read manifest
manifest = read_manifest(
manifest_path=manifest_path,
max_duration=self._max_duration,
min_duration=self._min_duration)
# sort (by duration) or batch-wise shuffle the manifest
if self._epoch == 0 and sortagrad:
manifest.sort(key=lambda x: x["duration"])
else:
if shuffle_method == "batch_shuffle":
manifest = self._batch_shuffle(
manifest, batch_size, clipped=False)
elif shuffle_method == "batch_shuffle_clipped":
manifest = self._batch_shuffle(
manifest, batch_size, clipped=True)
elif shuffle_method == "instance_shuffle":
self._rng.shuffle(manifest)
                elif shuffle_method is None:
pass
else:
raise ValueError("Unknown shuffle method %s." %
shuffle_method)
# prepare batches
instance_reader = self._instance_reader_creator(manifest)
batch = []
for instance in instance_reader():
batch.append(instance)
if len(batch) == batch_size:
yield self._padding_batch(batch, padding_to, flatten)
batch = []
if len(batch) >= min_batch_size:
yield self._padding_batch(batch, padding_to, flatten)
self._epoch += 1
return batch_reader
@property
def feeding(self):
"""Returns data reader's feeding dict.
:return: Data feeding dict.
:rtype: dict
"""
feeding_dict = {
"audio_spectrogram": 0,
"transcript_text": 1,
"sequence_offset": 2,
"sequence_length": 3
}
for i in xrange(self._num_conv_layers):
feeding_dict["conv%d_index_range" % i] = len(feeding_dict)
return feeding_dict
@property
def vocab_size(self):
"""Return the vocabulary size.
:return: Vocabulary size.
:rtype: int
"""
return self._speech_featurizer.vocab_size
@property
def vocab_list(self):
"""Return the vocabulary in list.
:return: Vocabulary in list.
:rtype: list
"""
return self._speech_featurizer.vocab_list
def _parse_tar(self, file):
"""Parse a tar file to get a tarfile object
and a map containing tarinfoes
"""
result = {}
f = tarfile.open(file)
for tarinfo in f.getmembers():
result[tarinfo.name] = tarinfo
return f, result
def _subfile_from_tar(self, file):
"""Get subfile object from tar.
It will return a subfile object from tar file
and cached tar file info for next reading request.
"""
tarpath, filename = file.split(':', 1)[1].split('#', 1)
if 'tar2info' not in self._local_data.__dict__:
self._local_data.tar2info = {}
if 'tar2object' not in self._local_data.__dict__:
self._local_data.tar2object = {}
if tarpath not in self._local_data.tar2info:
object, infoes = self._parse_tar(tarpath)
self._local_data.tar2info[tarpath] = infoes
self._local_data.tar2object[tarpath] = object
return self._local_data.tar2object[tarpath].extractfile(
self._local_data.tar2info[tarpath][filename])
def _instance_reader_creator(self, manifest):
"""
Instance reader creator. Create a callable function to produce
instances of data.
Instance: a tuple of ndarray of audio spectrogram and a list of
token indices for transcript.
"""
def reader():
for instance in manifest:
yield instance
reader, cleanup_callback = xmap_readers_mp(
lambda instance: self.process_utterance(instance["audio_filepath"], instance["text"]),
reader,
self._num_threads,
4096,
order=True)
# register callback to main process
atexit.register(cleanup_callback)
return reader
def _padding_batch(self, batch, padding_to=-1, flatten=False):
"""
Pad audio features with zeros so that all instances in a batch share
the same shape (or a user-defined shape).
If ``padding_to`` is -1, the maximum shape in the batch will be used
as the target shape for padding. Otherwise, ``padding_to`` will be the
target shape (only refers to the second axis).
If ``flatten`` is True, features will be flattened to a 1-D array.
"""
new_batch = []
# get target shape
max_length = max([audio.shape[1] for audio, text in batch])
if padding_to != -1:
if padding_to < max_length:
raise ValueError("If padding_to is not -1, it should be larger "
"than any instance's shape in the batch")
max_length = padding_to
# padding
for audio, text in batch:
padded_audio = np.zeros([audio.shape[0], max_length])
padded_audio[:, :audio.shape[1]] = audio
if flatten:
padded_audio = padded_audio.flatten()
# Stride size for conv0 is (3, 2)
# Stride size for conv1 to convN is (1, 2)
# Same as the network, hard-coded here
padded_instance = [padded_audio, text]
padded_conv0_h = (padded_audio.shape[0] - 1) // 2 + 1
padded_conv0_w = (padded_audio.shape[1] - 1) // 3 + 1
valid_w = (audio.shape[1] - 1) // 3 + 1
padded_instance += [
[0], # sequence offset, always 0
[valid_w], # valid sequence length
# Index ranges for channel, height and width
# Please refer scale_sub_region layer to see details
[1, 32, 1, padded_conv0_h, valid_w + 1, padded_conv0_w]
]
pre_padded_h = padded_conv0_h
for i in xrange(self._num_conv_layers - 1):
padded_h = (pre_padded_h - 1) // 2 + 1
pre_padded_h = padded_h
padded_instance += [
[1, 32, 1, padded_h, valid_w + 1, padded_conv0_w]
]
new_batch.append(padded_instance)
return new_batch
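# Worked example (illustrative numbers only): for a padded spectrogram
# of shape (161, 300) and a valid (unpadded) width of 200 frames:
#     padded_conv0_h = (161 - 1) // 2 + 1 = 81    # height stride 2
#     padded_conv0_w = (300 - 1) // 3 + 1 = 100   # width stride 3
#     valid_w        = (200 - 1) // 3 + 1 = 67
# so the conv0 index range marking the padded region is
#     [1, 32, 1, 81, 68, 100]
# and each following conv layer halves the height again, e.g. conv1:
#     padded_h = (81 - 1) // 2 + 1 = 41  ->  [1, 32, 1, 41, 68, 100]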
def _batch_shuffle(self, manifest, batch_size, clipped=False):
"""Put similarly-sized instances into minibatches for better efficiency
and make a batch-wise shuffle.
1. Sort the audio clips by duration.
2. Generate a random number `k`, k in [0, batch_size).
3. Randomly shift `k` instances in order to create different batches
for different epochs. Create minibatches.
4. Shuffle the minibatches.
:param manifest: Manifest contents. List of dict.
:type manifest: list
:param batch_size: Batch size. This size is also used to generate
the random shift for batch shuffle.
:type batch_size: int
:param clipped: Whether to clip the heading (small shift) and trailing
(incomplete batch) instances.
:type clipped: bool
:return: Batch-shuffled manifest.
:rtype: list
"""
manifest.sort(key=lambda x: x["duration"])
shift_len = self._rng.randint(0, batch_size - 1)
batch_manifest = zip(*[iter(manifest[shift_len:])] * batch_size)
self._rng.shuffle(batch_manifest)
batch_manifest = [item for batch in batch_manifest for item in batch]
if not clipped:
# Guard against res_len == 0, where manifest[-0:] would
# (incorrectly) re-append the entire manifest.
res_len = len(manifest) - shift_len - len(batch_manifest)
if res_len > 0:
batch_manifest.extend(manifest[-res_len:])
batch_manifest.extend(manifest[0:shift_len])
return batch_manifest
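# Illustrative trace (batch_size=3, eight sorted instances a..h, and a
# random shift k=2):
#     shift:     drop head [a, b]        -> [c, d, e, f, g, h]
#     batch:     [(c, d, e), (f, g, h)]
#     shuffle:   [(f, g, h), (c, d, e)]  (one possible order)
#     flatten:   [f, g, h, c, d, e]
#     unclipped: re-append the tail residue (none here) and the shifted
#                head [a, b]             -> [f, g, h, c, d, e, a, b]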
"""Set up paths for DS2"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = os.path.dirname(__file__)
# Add project path to PYTHONPATH
proj_path = os.path.join(this_dir, '..')
add_path(proj_path)
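# A typical (assumed) usage: scripts in this subdirectory import this
# module before any project-level imports, e.g.:
#     import _init_paths  # assumed module name for this file
#     from data_utils.data import DataGenerator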
"""Set up paths for DS2"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = os.path.dirname(__file__)
# Add project path to PYTHONPATH
proj_path = os.path.join(this_dir, '..')
add_path(proj_path)