Commit 72a72150 authored by C caoying03

update README.

......@@ -21,9 +21,8 @@ before_install:
- docker pull paddlepaddle/paddle:latest
script:
- .travis/precommit.sh
- docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c
"cd /py_unittest && find . -name 'tests' -type d -print0 | xargs -0 -I{} -n1 bash -c 'cd {};
python -m unittest discover -v'"
- docker run -i --rm -v "$PWD:/py_unittest" paddlepaddle/paddle:latest /bin/bash -c
'cd /py_unittest; sh .travis/unittest.sh'
notifications:
email:
......
#!/bin/bash
abort(){
echo "Run unittest failed" 1>&2
echo "Please check your code" 1>&2
exit 1
}
unittest(){
cd "$1" > /dev/null
if [ -f "requirements.txt" ]; then
pip install -r requirements.txt
fi
if [ $? != 0 ]; then
exit 1
fi
find . -name 'tests' -type d -print0 | \
xargs -0 -I{} -n1 bash -c \
'python -m unittest discover -v -s {}'
cd - > /dev/null
}
trap 'abort' 0
set -e
for proj in */ ; do
if [ -d "$proj" ]; then
unittest $proj
if [ $? != 0 ]; then
exit 1
fi
fi
done
trap : 0
......@@ -14,46 +14,57 @@ PaddlePaddle provides rich computational units that help users build deep learning models in a modular way
In the word embedding examples, we show how to use Hierarchical Sigmoid and Noise Contrastive Estimation (NCE) to speed up the training of word embeddings.
- 1.1 [Accelerating word embedding training with Hsigmoid](https://github.com/PaddlePaddle/models/tree/develop/word_embedding)
- 1.2 [Accelerating word embedding training with noise contrastive estimation](https://github.com/PaddlePaddle/models/tree/develop/nce_cost)
## 2. Click-Through Rate Prediction
## 2. Language Models
The language model is an important, fundamental model in natural language processing. It is a probability distribution model that can determine which word sequence is more likely, or, given several words, predict the most likely next word. Language models are applied in many areas, such as automatic writing, QA, machine translation, spell checking, speech recognition, and part-of-speech tagging.
In the language model examples, we take text generation as the task and provide RNN LMs (including LSTM and GRU) and an N-gram LM for study and use. Following the usage instructions in the documentation, users can quickly get started and adapt the training corpus to train entertaining models such as "automatic poetry writing" or "automatic prose writing".
- 2.1 [Text generation with LSTM, GRU, and N-gram language models](https://github.com/PaddlePaddle/models/tree/develop/language_model)
## 3. Click-Through Rate Prediction
Click-through rate (CTR) prediction models estimate the probability that a user clicks on an ad and predict the outcome of each impression; they are one of the core algorithms in advertising technology. Logistic regression learns well from large-scale sparse features and dominated the early days of CTR prediction. In recent years, DNN models, with their strong learning capacity, have gradually taken over the task.
In the CTR example, we present the Wide & Deep model proposed by Google. It combines the strengths of a DNN, which is good at learning abstract features, and logistic regression, which suits large-scale sparse features. It can serve as a relatively mature model framework and has seen some industrial use.
- 2.1 [Wide & Deep click-through rate prediction model](https://github.com/PaddlePaddle/models/tree/develop/ctr)
- 3.1 [Wide & Deep click-through rate prediction model](https://github.com/PaddlePaddle/models/tree/develop/ctr)
## 3. Text Classification
## 4. Text Classification
Text classification is one of the most fundamental tasks in natural language processing. Deep learning methods avoid complex feature engineering: they take raw text directly as input and optimize classification accuracy in a data-driven way.
In the text classification examples, we take sentiment classification as the task and provide a DNN-based non-sequence text classification model and a CNN-based sequence model (for an LSTM-based model, see the [sentiment analysis](https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md) chapter of the PaddleBook).
- 3.1 [Sentiment classification with DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/text_classification)
- 4.1 [Sentiment classification with DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/text_classification)
## 4. Learning to Rank
## 5. Learning to Rank
Learning to rank (LTR) is one of the core problems in information retrieval and search engine research. It uses machine learning to learn a scoring function that scores the candidates to be ranked and then determines the order from the scores. Deep neural networks can model the scoring function, forming a variety of deep-learning-based LTR models.
In the LTR examples, we introduce a pairwise ranking model based on the RankLoss cost and a listwise ranking model based on the LambdaRank cost (for the pointwise strategy, see the [recommender system](https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/README.cn.md) chapter of the PaddleBook).
- 4.1 [Pairwise and listwise learning to rank](https://github.com/PaddlePaddle/models/tree/develop/ltr)
- 5.1 [Pairwise and listwise learning to rank](https://github.com/PaddlePaddle/models/tree/develop/ltr)
## 5. Sequence Tagging
## 6. Sequence Tagging
Given an input sequence, a sequence tagging model assigns a class label to every element of the sequence; this is one of the most fundamental tasks in natural language processing. As deep learning has developed, using a recurrent neural network to learn feature representations of the input sequence and a conditional random field (CRF) to perform the tagging on top of those features has gradually become the standard solution to sequence tagging problems.
In the sequence tagging example, we take named entity recognition (NER) as the task and show how to train an end-to-end sequence tagging model.
- 5.1 [Named entity recognition](https://github.com/PaddlePaddle/models/tree/develop/sequence_tagging_for_ner)
- 6.1 [Named entity recognition](https://github.com/PaddlePaddle/models/tree/develop/sequence_tagging_for_ner)
## 6. Sequence-to-Sequence Learning
## 7. Sequence-to-Sequence Learning
Sequence-to-sequence learning maps between two or even more variable-length sequences. It has a wide range of applications, including machine translation, dialogue and question answering, ad copy generation, autoencoding (e.g., encoding financial profiles), and judging the semantic relatedness of multiple text strings.
In the sequence-to-sequence examples, we take machine translation as the task and provide several improved models for study and use, including: a sequence-to-sequence model without attention, which is the basis of all sequence-to-sequence models; scheduled sampling, which mitigates the error accumulation of RNN models in generation tasks; and neural machine translation with external memory, which strengthens the network's memory capacity to handle complex sequence-to-sequence learning tasks.
- 6.1 [Encoder-decoder model without attention](https://github.com/PaddlePaddle/models/tree/develop/nmt_without_attention)
- 7.1 [Encoder-decoder model without attention](https://github.com/PaddlePaddle/models/tree/develop/nmt_without_attention)
## Copyright and License
PaddlePaddle is provided under the [Apache-2.0 license](LICENSE).
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
......@@ -6,6 +6,10 @@ from __future__ import print_function
import numpy as np
import io
import soundfile
import scikits.samplerate
from scipy import signal
import random
import copy
class AudioSegment(object):
......@@ -75,6 +79,32 @@ class AudioSegment(object):
io.BytesIO(bytes), dtype='float32')
return cls(samples, sample_rate)
@classmethod
def concatenate(cls, *segments):
"""Concatenate an arbitrary number of audio segments together.
:param *segments: Input audio segments to be concatenated.
:type *segments: tuple of AudioSegment
:return: Audio segment instance holding the concatenation result.
:rtype: AudioSegment
:raises ValueError: If the number of segments is zero, or if the
sample_rate of any two segments does not match.
:raises TypeError: If any segment is not AudioSegment instance.
"""
# Perform basic sanity-checks.
if len(segments) == 0:
raise ValueError("No audio segments are given to concatenate.")
sample_rate = segments[0]._sample_rate
for seg in segments:
if sample_rate != seg._sample_rate:
raise ValueError("Can't concatenate segments with "
"different sample rates")
if type(seg) is not cls:
raise TypeError("Only audio segments of the same type "
"can be concatenated.")
samples = np.concatenate([seg.samples for seg in segments])
return cls(samples, sample_rate)
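# Illustrative usage (not part of the original file; assumes seg1 and seg2
# are AudioSegment instances with equal sample rates):
#   joined = AudioSegment.concatenate(seg1, seg2)
#   assert joined.num_samples == seg1.num_samples + seg2.num_samples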
def to_wav_file(self, filepath, dtype='float32'):
"""Save audio segment to disk as wav file.
......@@ -100,6 +130,89 @@ class AudioSegment(object):
format='WAV',
subtype=subtype_map[dtype])
@classmethod
def slice_from_file(cls, file, start=None, end=None):
"""Loads a small section of an audio file without having to load
the entire file into memory, which can be incredibly wasteful.
:param file: Input audio filepath or file object.
:type file: basestring|file
:param start: Start time in seconds. If start is negative, it wraps
around from the end. If not provided, this function
reads from the very beginning.
:type start: float
:param end: End time in seconds. If end is negative, it wraps around
from the end. If not provided, the default behavior is
to read to the end of the file.
:type end: float
:return: AudioSegment instance of the specified slice of the input
audio file.
:rtype: AudioSegment
:raise ValueError: If start or end is incorrectly set, e.g. out of
bounds in time.
"""
sndfile = soundfile.SoundFile(file)
sample_rate = sndfile.samplerate
duration = float(len(sndfile)) / sample_rate
start = 0. if start is None else start
end = duration if end is None else end
if start < 0.0:
start += duration
if end < 0.0:
end += duration
if start < 0.0:
raise ValueError("The slice start position (%f s) is out of "
"bounds." % start)
if end < 0.0:
raise ValueError("The slice end position (%f s) is out of bounds." %
end)
if start > end:
raise ValueError("The slice start position (%f s) is later than "
"the slice end position (%f s)." % (start, end))
if end > duration:
raise ValueError("The slice end position (%f s) is out of bounds "
"(> %f s)" % (end, duration))
start_frame = int(start * sample_rate)
end_frame = int(end * sample_rate)
sndfile.seek(start_frame)
data = sndfile.read(frames=end_frame - start_frame, dtype='float32')
return cls(data, sample_rate)
@classmethod
def make_silence(cls, duration, sample_rate):
"""Creates a silent audio segment of the given duration and sample rate.
:param duration: Length of silence in seconds.
:type duration: float
:param sample_rate: Sample rate.
:type sample_rate: float
:return: Silent AudioSegment instance of the given duration.
:rtype: AudioSegment
"""
samples = np.zeros(int(duration * sample_rate))
return cls(samples, sample_rate)
def superimpose(self, other):
"""Add samples from another segment to those of this segment
(sample-wise addition, not segment concatenation).
Note that this is an in-place transformation.
:param other: Segment containing samples to be added in.
:type other: AudioSegment
:raise TypeError: If the types of the two segments don't match.
:raise ValueError: If the sample rates of the two segments are not
equal, or if the lengths of segments don't match.
"""
if type(self) != type(other):
raise TypeError("Cannot add segments of different types: %s "
"and %s." % (type(self), type(other)))
if self._sample_rate != other._sample_rate:
raise ValueError("Sample rates must match to add segments.")
if len(self._samples) != len(other._samples):
raise ValueError("Segment lengths must match to add segments.")
self._samples += other._samples
def to_bytes(self, dtype='float32'):
"""Create a byte string containing the audio content.
......@@ -143,23 +256,257 @@ class AudioSegment(object):
new_indices = np.linspace(start=0, stop=old_length, num=new_length)
self._samples = np.interp(new_indices, old_indices, self._samples)
def normalize(self, target_sample_rate):
raise NotImplementedError()
def normalize(self, target_db=-20, max_gain_db=300.0):
"""Normalize audio to be of the desired RMS value in decibels.
Note that this is an in-place transformation.
:param target_db: Target RMS value in decibels. This value should be
less than 0.0 as 0.0 is full-scale audio.
:type target_db: float
:param max_gain_db: Max amount of gain in dB that can be applied for
normalization. This is to prevent nans when
attempting to normalize a signal consisting of
all zeros.
:type max_gain_db: float
:raises ValueError: If the required gain to normalize the segment to
the target_db value exceeds max_gain_db.
"""
gain = target_db - self.rms_db
if gain > max_gain_db:
raise ValueError(
"Unable to normalize segment to %f dB because the required "
"gain exceeds max_gain_db (%f dB)." %
(target_db, max_gain_db))
self.apply_gain(gain)
def normalize_online_bayesian(self,
target_db,
prior_db,
prior_samples,
startup_delay=0.0):
"""Normalize audio using a production-compatible online/causal
algorithm. This uses an exponential likelihood and gamma prior to
make online estimates of the RMS even when there are very few samples.
Note that this is an in-place transformation.
:param target_db: Target RMS value in decibels.
:type target_db: float
:param prior_db: Prior RMS estimate in decibels.
:type prior_db: float
:param prior_samples: Prior strength in number of samples.
:type prior_samples: float
:param startup_delay: Default 0.0s. If provided, this function will
accrue statistics for the first startup_delay
seconds before applying online normalization.
:type startup_delay: float
"""
# Estimate total RMS online.
startup_sample_idx = min(self.num_samples - 1,
int(self.sample_rate * startup_delay))
prior_mean_squared = 10.**(prior_db / 10.)
prior_sum_of_squares = prior_mean_squared * prior_samples
cumsum_of_squares = np.cumsum(self.samples**2)
sample_count = np.arange(self.num_samples) + 1
if startup_sample_idx > 0:
cumsum_of_squares[:startup_sample_idx] = \
cumsum_of_squares[startup_sample_idx]
sample_count[:startup_sample_idx] = \
sample_count[startup_sample_idx]
mean_squared_estimate = ((cumsum_of_squares + prior_sum_of_squares) /
(sample_count + prior_samples))
rms_estimate_db = 10 * np.log10(mean_squared_estimate)
# Compute required time-varying gain.
gain_db = target_db - rms_estimate_db
self.apply_gain(gain_db)
def resample(self, target_sample_rate):
raise NotImplementedError()
def resample(self, target_sample_rate, quality='sinc_medium'):
"""Resample the audio to a target sample rate.
Note that this is an in-place transformation.
:param target_sample_rate: Target sample rate.
:type target_sample_rate: int
:param quality: One of {'sinc_fastest', 'sinc_medium', 'sinc_best'}.
Sets resampling speed/quality tradeoff.
See http://www.mega-nerd.com/SRC/api_misc.html#Converters
:type quality: str
"""
resample_ratio = float(target_sample_rate) / self._sample_rate
self._samples = scikits.samplerate.resample(
self._samples, r=resample_ratio, type=quality)
self._sample_rate = target_sample_rate
def pad_silence(self, duration, sides='both'):
raise NotImplementedError()
"""Pad this audio sample with a period of silence.
Note that this is an in-place transformation.
:param duration: Length of silence in seconds to pad.
:type duration: float
:param sides: Position for padding:
'beginning' - adds silence in the beginning;
'end' - adds silence in the end;
'both' - adds silence in both the beginning and the end.
:type sides: str
:raises ValueError: If sides is not supported.
"""
if duration == 0.0:
return self
cls = type(self)
silence = self.make_silence(duration, self._sample_rate)
if sides == "beginning":
padded = cls.concatenate(silence, self)
elif sides == "end":
padded = cls.concatenate(self, silence)
elif sides == "both":
padded = cls.concatenate(silence, self, silence)
else:
raise ValueError("Unknown value for sides: %s" % sides)
self._samples = padded._samples
def subsegment(self, start_sec=None, end_sec=None):
raise NotImplementedError()
def convolve(self, filter, allow_resample=False):
raise NotImplementedError()
def subsegment(self, start_sec=None, end_sec=None):
"""Cut the AudioSegment between given boundaries.
Note that this is an in-place transformation.
:param start_sec: Beginning of subsegment in seconds.
:type start_sec: float
:param end_sec: End of subsegment in seconds.
:type end_sec: float
:raise ValueError: If start_sec or end_sec is incorrectly set, e.g. out
of bounds in time.
"""
start_sec = 0.0 if start_sec is None else start_sec
end_sec = self.duration if end_sec is None else end_sec
if start_sec < 0.0:
start_sec = self.duration + start_sec
if end_sec < 0.0:
end_sec = self.duration + end_sec
if start_sec < 0.0:
raise ValueError("The slice start position (%f s) is out of "
"bounds." % start_sec)
if end_sec < 0.0:
raise ValueError("The slice end position (%f s) is out of bounds." %
end_sec)
if start_sec > end_sec:
raise ValueError("The slice start position (%f s) is later than "
"the end position (%f s)." % (start_sec, end_sec))
if end_sec > self.duration:
raise ValueError("The slice end position (%f s) is out of bounds "
"(> %f s)" % (end_sec, self.duration))
start_sample = int(round(start_sec * self._sample_rate))
end_sample = int(round(end_sec * self._sample_rate))
self._samples = self._samples[start_sample:end_sample]
def random_subsegment(self, subsegment_length, rng=None):
"""Randomly cut a subsegment of the specified length from the audio segment.
Note that this is an in-place transformation.
:param subsegment_length: Subsegment length in seconds.
:type subsegment_length: float
:param rng: Random number generator state.
:type rng: random.Random
:raises ValueError: If the length of the subsegment is greater than
the duration of the original segment.
"""
rng = random.Random() if rng is None else rng
if subsegment_length > self.duration:
raise ValueError("Length of subsegment must not be greater "
"than original segment.")
start_time = rng.uniform(0.0, self.duration - subsegment_length)
self.subsegment(start_time, start_time + subsegment_length)
def convolve_and_normalize(self, filter, allow_resample=False):
raise NotImplementedError()
def convolve(self, impulse_segment, allow_resample=False):
"""Convolve this audio segment with the given impulse segment.
Note that this is an in-place transformation.
:param impulse_segment: Impulse response segments.
:type impulse_segment: AudioSegment
:param allow_resample: Indicates whether resampling is allowed when
the impulse_segment has a different sample
rate from this signal.
:type allow_resample: bool
:raises ValueError: If the sample rates of the two segments do not
match when resampling is not allowed.
"""
if allow_resample and self.sample_rate != impulse_segment.sample_rate:
impulse_segment = impulse_segment.resample(self.sample_rate)
if self.sample_rate != impulse_segment.sample_rate:
raise ValueError("Impulse segment's sample rate (%d Hz) is not "
"equal to base signal sample rate (%d Hz)." %
(impulse_segment.sample_rate, self.sample_rate))
samples = signal.fftconvolve(self.samples, impulse_segment.samples,
"full")
self._samples = samples
def convolve_and_normalize(self, impulse_segment, allow_resample=False):
"""Convolve and normalize the resulting audio segment so that it
has the same average power as the input signal.
Note that this is an in-place transformation.
:param impulse_segment: Impulse response segments.
:type impulse_segment: AudioSegment
:param allow_resample: Indicates whether resampling is allowed when
the impulse_segment has a different sample
rate from this signal.
:type allow_resample: bool
"""
target_db = self.rms_db
self.convolve(impulse_segment, allow_resample=allow_resample)
self.normalize(target_db)
def add_noise(self,
noise,
snr_dB,
allow_downsampling=False,
max_gain_db=300.0,
rng=None):
"""Add the given noise segment at a specific signal-to-noise ratio.
If the noise segment is longer than this segment, a random subsegment
of matching length is sampled from it and used instead.
Note that this is an in-place transformation.
:param noise: Noise signal to add.
:type noise: AudioSegment
:param snr_dB: Signal-to-Noise Ratio, in decibels.
:type snr_dB: float
:param allow_downsampling: Whether to allow the noise signal to be
downsampled to match the base signal sample
rate.
:type allow_downsampling: bool
:param max_gain_db: Maximum amount of gain to apply to noise signal
before adding it in. This is to prevent attempting
to apply infinite gain to a zero signal.
:type max_gain_db: float
:param rng: Random number generator state.
:type rng: None|random.Random
:raises ValueError: If the sample rates of the two audio segments do
not match when downsampling is not allowed, or
if the noise segment is shorter than the
original audio segment.
"""
rng = random.Random() if rng is None else rng
if allow_downsampling and noise.sample_rate > self.sample_rate:
noise = noise.resample(self.sample_rate)
if noise.sample_rate != self.sample_rate:
raise ValueError("Noise sample rate (%d Hz) is not equal to base "
"signal sample rate (%d Hz)." % (noise.sample_rate,
self.sample_rate))
if noise.duration < self.duration:
raise ValueError("Noise signal (%f sec) must be at least as long as"
" base signal (%f sec)." %
(noise.duration, self.duration))
noise_gain_db = min(self.rms_db - noise.rms_db - snr_dB, max_gain_db)
noise_new = copy.deepcopy(noise)
noise_new.random_subsegment(self.duration, rng=rng)
noise_new.apply_gain(noise_gain_db)
self.superimpose(noise_new)
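# Illustrative usage (not part of the original file; assumes `speech` and
# `noise` are AudioSegment instances with the same sample rate and that
# `noise` is at least as long as `speech`):
#   speech.normalize(target_db=-20)
#   speech.add_noise(noise, snr_dB=10)  # mix noise 10 dB below the signal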
@property
def samples(self):
......@@ -186,7 +533,7 @@ class AudioSegment(object):
:return: Number of samples.
:rtype: int
"""
return self._samples.shape(0)
return self._samples.shape[0]
@property
def duration(self):
......@@ -230,7 +577,7 @@ class AudioSegment(object):
Audio sample type is usually integer or floating-point. For integer
type, float32 will be rescaled from [-1, 1] to the maximum range
supported by the integer type.
This is for writing an audio file.
"""
dtype = np.dtype(dtype)
......
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
......@@ -80,7 +80,7 @@ class DataGenerator(object):
padding_to=-1,
flatten=False,
sortagrad=False,
batch_shuffle=False):
shuffle_method="batch_shuffle"):
"""
Batch data reader creator for audio data. Return a callable generator
function to produce batches of data.
......@@ -104,12 +104,22 @@ class DataGenerator(object):
:param sortagrad: If set True, sort the instances by audio duration
in the first epoch to speed up training.
:type sortagrad: bool
:param batch_shuffle: If set True, instances are batch-wise shuffled.
For more details, please see
``_batch_shuffle.__doc__``.
If sortagrad is True, batch_shuffle is disabled
:param shuffle_method: Shuffle method. Options:
'' or None: no shuffle.
'instance_shuffle': instance-wise shuffle.
'batch_shuffle': similarly-sized instances are
put into batches, and then
batch-wise shuffle the batches.
For more details, please see
``_batch_shuffle.__doc__``.
'batch_shuffle_clipped': 'batch_shuffle' with
head shift and tail
clipping. For more
details, please see
``_batch_shuffle``.
If sortagrad is True, shuffle is disabled
for the first epoch.
:type batch_shuffle: bool
:type shuffle_method: None|str
:return: Batch reader function, producing batches of data when called.
:rtype: callable
"""
......@@ -123,8 +133,20 @@ class DataGenerator(object):
# sort (by duration) or shuffle the manifest
if self._epoch == 0 and sortagrad:
manifest.sort(key=lambda x: x["duration"])
elif batch_shuffle:
manifest = self._batch_shuffle(manifest, batch_size)
else:
if shuffle_method == "batch_shuffle":
manifest = self._batch_shuffle(
manifest, batch_size, clipped=False)
elif shuffle_method == "batch_shuffle_clipped":
manifest = self._batch_shuffle(
manifest, batch_size, clipped=True)
elif shuffle_method == "instance_shuffle":
self._rng.shuffle(manifest)
elif not shuffle_method:
pass
else:
raise ValueError("Unknown shuffle method %s." %
shuffle_method)
# prepare batches
instance_reader = self._instance_reader_creator(manifest)
batch = []
......@@ -218,7 +240,7 @@ class DataGenerator(object):
new_batch.append((padded_audio, text))
return new_batch
def _batch_shuffle(self, manifest, batch_size):
def _batch_shuffle(self, manifest, batch_size, clipped=False):
"""Put similarly-sized instances into minibatches for better efficiency
and perform a batch-wise shuffle.
......@@ -233,6 +255,9 @@ class DataGenerator(object):
:param batch_size: Batch size. This size is also used to generate
the random shift for batch shuffle.
:type batch_size: int
:param clipped: Whether to clip the heading (small shift) and trailing
(incomplete batch) instances.
:type clipped: bool
:return: Batch shuffled manifest.
:rtype: list
"""
......@@ -241,7 +266,8 @@ class DataGenerator(object):
batch_manifest = zip(*[iter(manifest[shift_len:])] * batch_size)
self._rng.shuffle(batch_manifest)
batch_manifest = list(sum(batch_manifest, ()))
res_len = len(manifest) - shift_len - len(batch_manifest)
batch_manifest.extend(manifest[-res_len:])
batch_manifest.extend(manifest[0:shift_len])
if not clipped:
res_len = len(manifest) - shift_len - len(batch_manifest)
# guard against res_len == 0: manifest[-0:] would be the whole list
if res_len > 0:
batch_manifest.extend(manifest[-res_len:])
batch_manifest.extend(manifest[0:shift_len])
return batch_manifest
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
File mode changed from 100755 to 100644
......@@ -65,6 +65,74 @@ class SpeechSegment(AudioSegment):
audio = AudioSegment.from_bytes(bytes)
return cls(audio.samples, audio.sample_rate, transcript)
@classmethod
def concatenate(cls, *segments):
"""Concatenate an arbitrary number of speech segments together, both
audio and transcript will be concatenated.
:param *segments: Input speech segments to be concatenated.
:type *segments: tuple of SpeechSegment
:return: Speech segment instance.
:rtype: SpeechSegment
:raises ValueError: If the number of segments is zero, or if the
sample_rate of any two segments does not match.
:raises TypeError: If any segment is not SpeechSegment instance.
"""
if len(segments) == 0:
raise ValueError("No speech segments are given to concatenate.")
sample_rate = segments[0]._sample_rate
transcripts = ""
for seg in segments:
if sample_rate != seg._sample_rate:
raise ValueError("Can't concatenate segments with "
"different sample rates")
if type(seg) is not cls:
raise TypeError("Only speech segments of the same type "
"can be concatenated.")
transcripts += seg._transcript
samples = np.concatenate([seg.samples for seg in segments])
return cls(samples, sample_rate, transcripts)
@classmethod
def slice_from_file(cls, filepath, start=None, end=None, transcript=""):
"""Loads a small section of a speech file without having to load
the entire file into memory, which can be incredibly wasteful.
:param filepath: Filepath or file object to audio file.
:type filepath: basestring|file
:param start: Start time in seconds. If start is negative, it wraps
around from the end. If not provided, this function
reads from the very beginning.
:type start: float
:param end: End time in seconds. If end is negative, it wraps around
from the end. If not provided, the default behavior is
to read to the end of the file.
:type end: float
:param transcript: Transcript text for the speech. If not provided,
the default is an empty string.
:type transcript: basestring
:return: SpeechSegment instance of the specified slice of the input
speech file.
:rtype: SpeechSegment
"""
audio = AudioSegment.slice_from_file(filepath, start, end)
return cls(audio.samples, audio.sample_rate, transcript)
@classmethod
def make_silence(cls, duration, sample_rate):
"""Creates a silent speech segment of the given duration and
sample rate, transcript will be an empty string.
:param duration: Length of silence in seconds.
:type duration: float
:param sample_rate: Sample rate.
:type sample_rate: float
:return: Silence of the given duration.
:rtype: SpeechSegment
"""
audio = AudioSegment.make_silence(duration, sample_rate)
return cls(audio.samples, audio.sample_rate, "")
@property
def transcript(self):
"""Return the transcript text.
......
File mode changed from 100755 to 100644
......@@ -37,8 +37,7 @@ MD5_TRAIN_CLEAN_100 = "2a93770f6d5c6c964bc36631d331a522"
MD5_TRAIN_CLEAN_360 = "c0e676e450a7ff2f54aeade5171606fa"
MD5_TRAIN_OTHER_500 = "d1a0fd59409feb2c614ce4d30c387708"
parser = argparse.ArgumentParser(
description='Downloads and prepare LibriSpeech dataset.')
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--target_dir",
default=DATA_HOME + "/Libri",
......
File mode changed from 100755 to 100644
......@@ -8,8 +8,7 @@ from itertools import groupby
def ctc_best_path_decode(probs_seq, vocabulary):
"""
Best path decoding, also called argmax decoding or greedy decoding.
"""Best path decoding, also called argmax decoding or greedy decoding.
The path consisting of the most probable tokens is further post-processed
to remove consecutive repetitions and all blanks.
......@@ -38,8 +37,7 @@ def ctc_best_path_decode(probs_seq, vocabulary):
def ctc_decode(probs_seq, vocabulary, method):
"""
CTC-like sequence decoding from a sequence of likelihood probabilities.
"""CTC-like sequence decoding from a sequence of likelihood probabilities.
:param probs_seq: 2-D list of probabilities over the vocabulary for each
character. Each element is a list of float probabilities
......
# -*- coding: utf-8 -*-
"""This module provides functions to calculate error rates at different
levels, e.g., WER at the word level and CER at the character level.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def _levenshtein_distance(ref, hyp):
"""Levenshtein distance is a string metric for measuring the difference
between two sequences. Informally, the Levenshtein distance is defined as
the minimum number of single-character edits (substitutions, insertions or
deletions) required to change one word into the other. The edits extend
naturally to the word level when calculating the Levenshtein distance
between two sentences.
"""
ref_len = len(ref)
hyp_len = len(hyp)
# special case
if ref == hyp:
return 0
if ref_len == 0:
return hyp_len
if hyp_len == 0:
return ref_len
distance = np.zeros((ref_len + 1, hyp_len + 1), dtype=np.int32)
# initialize distance matrix
for j in xrange(hyp_len + 1):
distance[0][j] = j
for i in xrange(ref_len + 1):
distance[i][0] = i
# calculate levenshtein distance
for i in xrange(1, ref_len + 1):
for j in xrange(1, hyp_len + 1):
if ref[i - 1] == hyp[j - 1]:
distance[i][j] = distance[i - 1][j - 1]
else:
s_num = distance[i - 1][j - 1] + 1
i_num = distance[i][j - 1] + 1
d_num = distance[i - 1][j] + 1
distance[i][j] = min(s_num, i_num, d_num)
return distance[ref_len][hyp_len]
def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
"""Calculate word error rate (WER). WER compares reference text and
hypothesis text in word-level. WER is defined as:
.. math::
WER = (Sw + Dw + Iw) / Nw
where
.. code-block:: text
Sw is the number of words substituted,
Dw is the number of words deleted,
Iw is the number of words inserted,
Nw is the number of words in the reference
WER can be computed from the Levenshtein distance. Note that empty items
are removed when splitting sentences by the delimiter.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
:param ignore_case: Whether to ignore case differences.
:type ignore_case: bool
:param delimiter: Delimiter of input sentences.
:type delimiter: char
:return: Word error rate.
:rtype: float
:raises ValueError: If the reference length is zero.
"""
if ignore_case:
reference = reference.lower()
hypothesis = hypothesis.lower()
ref_words = filter(None, reference.split(delimiter))
hyp_words = filter(None, hypothesis.split(delimiter))
if len(ref_words) == 0:
raise ValueError("Reference's word number should be greater than 0.")
edit_distance = _levenshtein_distance(ref_words, hyp_words)
wer = float(edit_distance) / len(ref_words)
return wer
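# Example (illustrative): wer('i love cats', 'i love dogs') substitutes one
# of the three reference words, giving a WER of 1.0 / 3.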
def cer(reference, hypothesis, ignore_case=False):
"""Calculate character error rate (CER). CER compares reference text and
hypothesis text in char-level. CER is defined as:
.. math::
CER = (Sc + Dc + Ic) / Nc
where
.. code-block:: text
Sc is the number of characters substituted,
Dc is the number of characters deleted,
Ic is the number of characters inserted
Nc is the number of characters in the reference
CER can be computed from the Levenshtein distance. Chinese input should be
encoded as unicode. Note that leading and trailing whitespace characters
are stripped, and multiple consecutive whitespace characters within a
sentence are collapsed into a single space.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
:param ignore_case: Whether to ignore case differences.
:type ignore_case: bool
:return: Character error rate.
:rtype: float
:raises ValueError: If the reference length is zero.
"""
if ignore_case:
reference = reference.lower()
hypothesis = hypothesis.lower()
reference = ' '.join(filter(None, reference.split(' ')))
hypothesis = ' '.join(filter(None, hypothesis.split(' ')))
if len(reference) == 0:
raise ValueError("Length of reference should be greater than 0.")
edit_distance = _levenshtein_distance(reference, hypothesis)
cer = float(edit_distance) / len(reference)
return cer
......@@ -10,9 +10,9 @@ import paddle.v2 as paddle
from data_utils.data import DataGenerator
from model import deep_speech2
from decoder import ctc_decode
import utils
parser = argparse.ArgumentParser(
description='Simplified version of DeepSpeech2 inference.')
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--num_samples",
default=10,
......@@ -62,9 +62,7 @@ args = parser.parse_args()
def infer():
"""
Max-ctc-decoding for DeepSpeech2.
"""
"""Max-ctc-decoding for DeepSpeech2."""
# initialize data generator
data_generator = DataGenerator(
vocab_filepath=args.vocab_filepath,
......@@ -98,7 +96,7 @@ def infer():
manifest_path=args.decode_manifest_path,
batch_size=args.num_samples,
sortagrad=False,
batch_shuffle=False)
shuffle_method=None)
infer_data = batch_reader().next()
# run inference
......@@ -123,6 +121,7 @@ def infer():
def main():
utils.print_arguments(args)
paddle.init(use_gpu=args.use_gpu, trainer_count=1)
infer()
......
SoundFile==0.9.0.post1
wget==3.2
scikits.samplerate==0.3.3
scipy==0.13.0b1
# -*- coding: utf-8 -*-
"""Test error rate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import unittest
import error_rate
class TestParse(unittest.TestCase):
def test_wer_1(self):
ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
hyp = 'i GOT IT TO the FULLEST i LOVE TO portable FROM OF STORES last night'
word_error_rate = error_rate.wer(ref, hyp)
self.assertTrue(abs(word_error_rate - 0.769230769231) < 1e-6)
def test_wer_2(self):
ref = 'i UM the PHONE IS i LEFT THE portable PHONE UPSTAIRS last night'
word_error_rate = error_rate.wer(ref, ref)
self.assertEqual(word_error_rate, 0.0)
def test_wer_3(self):
ref = ' '
hyp = 'Hypothesis sentence'
with self.assertRaises(ValueError):
word_error_rate = error_rate.wer(ref, hyp)
def test_cer_1(self):
ref = 'werewolf'
hyp = 'weae wolf'
char_error_rate = error_rate.cer(ref, hyp)
self.assertTrue(abs(char_error_rate - 0.25) < 1e-6)
def test_cer_2(self):
ref = 'werewolf'
char_error_rate = error_rate.cer(ref, ref)
self.assertEqual(char_error_rate, 0.0)
def test_cer_3(self):
ref = u'我是中国人'
hyp = u'我是 美洲人'
char_error_rate = error_rate.cer(ref, hyp)
self.assertTrue(abs(char_error_rate - 0.6) < 1e-6)
def test_cer_4(self):
ref = u'我是中国人'
char_error_rate = error_rate.cer(ref, ref)
self.assertEqual(char_error_rate, 0.0)
def test_cer_5(self):
ref = ''
hyp = 'Hypothesis'
with self.assertRaises(ValueError):
char_error_rate = error_rate.cer(ref, hyp)
if __name__ == '__main__':
unittest.main()
......@@ -12,6 +12,7 @@ import distutils.util
import paddle.v2 as paddle
from model import deep_speech2
from data_utils.data import DataGenerator
import utils
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
......@@ -51,6 +52,12 @@ parser.add_argument(
default=True,
type=distutils.util.strtobool,
help="Use sortagrad or not. (default: %(default)s)")
parser.add_argument(
"--shuffle_method",
default='instance_shuffle',
type=str,
help="Shuffle method: 'instance_shuffle', 'batch_shuffle', "
"'batch_shuffle_clipped'. (default: %(default)s)")
parser.add_argument(
"--trainer_count",
default=4,
......@@ -93,9 +100,7 @@ args = parser.parse_args()
def train():
"""
DeepSpeech2 training.
"""
"""DeepSpeech2 training."""
# initialize data generator
def data_generator():
......@@ -143,13 +148,15 @@ def train():
train_batch_reader = train_generator.batch_reader_creator(
manifest_path=args.train_manifest_path,
batch_size=args.batch_size,
min_batch_size=args.trainer_count,
sortagrad=args.use_sortagrad if args.init_model_path is None else False,
batch_shuffle=True)
shuffle_method=args.shuffle_method)
test_batch_reader = test_generator.batch_reader_creator(
manifest_path=args.dev_manifest_path,
batch_size=args.batch_size,
min_batch_size=1, # must be 1; larger values currently cause errors.
sortagrad=False,
batch_shuffle=False)
shuffle_method=None)
# create event handler
def event_handler(event):
......@@ -157,11 +164,11 @@ def train():
if isinstance(event, paddle.event.EndIteration):
cost_sum += event.cost
cost_counter += 1
if event.batch_id % 50 == 0:
print("\nPass: %d, Batch: %d, TrainCost: %f" %
(event.pass_id, event.batch_id, cost_sum / cost_counter))
if (event.batch_id + 1) % 100 == 0:
print("\nPass: %d, Batch: %d, TrainCost: %f" % (
event.pass_id, event.batch_id + 1, cost_sum / cost_counter))
cost_sum, cost_counter = 0.0, 0
with gzip.open("params_tmp.tar.gz", 'w') as f:
with gzip.open("params.tar.gz", 'w') as f:
parameters.to_tar(f)
else:
sys.stdout.write('.')
......@@ -184,6 +191,7 @@ def train():
def main():
utils.print_arguments(args)
paddle.init(use_gpu=args.use_gpu, trainer_count=args.trainer_count)
train()
......
"""Contains common utility functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----- Configuration Arguments -----")
for arg, value in vars(args).iteritems():
print("%s: %s" % (arg, value))
print("------------------------------------")
TBD
Image Classification
=======================
This chapter shows how to perform image classification in PaddlePaddle with the AlexNet, VGG, GoogLeNet, and ResNet models. For a description of the image classification problem and an introduction to these four models, see the [PaddlePaddle book](https://github.com/PaddlePaddle/book/tree/develop/03.image_classification).
## Training the Model
### Initialization
During initialization, import the required packages and initialize PaddlePaddle.
```python
import gzip
import paddle.v2.dataset.flowers as flowers
import paddle.v2 as paddle
import reader
import vgg
import resnet
import alexnet
import googlenet
# PaddlePaddle init
paddle.init(use_gpu=False, trainer_count=1)
```
### Defining Parameters and Inputs
Set the algorithm parameters (such as the data dimension, the number of classes, and the batch size), and define the data input layer `image` and the class label `lbl`.
```python
DATA_DIM = 3 * 224 * 224
CLASS_DIM = 102
BATCH_SIZE = 128
image = paddle.layer.data(
name="image", type=paddle.data_type.dense_vector(DATA_DIM))
lbl = paddle.layer.data(
name="label", type=paddle.data_type.integer_value(CLASS_DIM))
```
### Obtaining the Model
One of the AlexNet, VGG, GoogLeNet, and ResNet models can be chosen for image classification. Calling the corresponding method returns the network's final Softmax layer.
1. Using the AlexNet model
Given the input layer `image` and the number of classes `CLASS_DIM`, the Softmax layer of AlexNet can be obtained with the following code.
```python
out = alexnet.alexnet(image, class_dim=CLASS_DIM)
```
2. Using the VGG model
Depending on the number of layers, VGG comes in VGG13, VGG16, and VGG19 variants. The code for the VGG16 model is as follows:
```python
out = vgg.vgg16(image, class_dim=CLASS_DIM)
```
Similarly, VGG13 and VGG19 can be obtained with the `vgg.vgg13` and `vgg.vgg19` methods, respectively.
3. Using the GoogLeNet model
During training, GoogLeNet uses two auxiliary classifiers to strengthen the gradient signal and provide extra regularization, so `googlenet.googlenet` returns three Softmax layers, as shown in the following code:
```python
out, out1, out2 = googlenet.googlenet(image, class_dim=CLASS_DIM)
loss1 = paddle.layer.cross_entropy_cost(
input=out1, label=lbl, coeff=0.3)
paddle.evaluator.classification_error(input=out1, label=lbl)
loss2 = paddle.layer.cross_entropy_cost(
input=out2, label=lbl, coeff=0.3)
paddle.evaluator.classification_error(input=out2, label=lbl)
extra_layers = [loss1, loss2]
```
For the two auxiliary outputs, a loss is computed and the classification error is evaluated for each; the losses are then passed as the `extra_layers` of the SGD trainer described below.
4. Using the ResNet model
The ResNet model can be obtained with the following code:
```python
out = resnet.resnet_imagenet(image, class_dim=CLASS_DIM)
```
### Defining the Loss Function
```python
cost = paddle.layer.classification_cost(input=out, label=lbl)
```
### Creating Parameters and the Optimizer
```python
# Create parameters
parameters = paddle.parameters.create(cost)
# Create optimizer
optimizer = paddle.optimizer.Momentum(
momentum=0.9,
regularization=paddle.optimizer.L2Regularization(rate=0.0005 *
BATCH_SIZE),
learning_rate=0.001 / BATCH_SIZE,
learning_rate_decay_a=0.1,
learning_rate_decay_b=128000 * 35,
learning_rate_schedule="discexp", )
```
The learning rate schedule is specified by `learning_rate_decay_a` (abbreviated $a$), `learning_rate_decay_b` (abbreviated $b$), and `learning_rate_schedule`. Here the learning rate is adjusted with a discrete exponential schedule, computed by the formula below, where $n$ is the cumulative number of samples processed so far and $lr_{0}$ is the `learning_rate` set in the parameters.
$$ lr = lr_{0} \cdot a^{\lfloor \frac{n}{b} \rfloor} $$
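As a quick illustration (a minimal sketch, not part of the original code; the constants mirror the optimizer settings above), the schedule can be evaluated directly:

```python
# Discrete exponential ("discexp") learning rate schedule.
def discexp_lr(n, lr0=0.001 / 128, a=0.1, b=128000 * 35):
    """Learning rate after n training samples have been processed."""
    return lr0 * a ** (n // b)

print(discexp_lr(0))            # lr0
print(discexp_lr(128000 * 35))  # lr0 * 0.1: the first decay step
```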
### Defining Data Readers
Taking the [flowers dataset](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html) as an example, the following code defines the inputs for the flowers training and validation sets:
```python
train_reader = paddle.batch(
paddle.reader.shuffle(
flowers.train(),
buf_size=1000),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
flowers.valid(),
batch_size=BATCH_SIZE)
```
To use other data, first build an image list file. `reader.py` defines how such a file is read: it parses the image path and the class label from each line of the image list file.
An image list file is a text file in which each line consists of an image path and a class label separated by a tab character. Class labels are integers starting from 0. A fragment of an image list file looks like this:
```
dataset_100/train_images/n03982430_23191.jpeg 1
dataset_100/train_images/n04461696_23653.jpeg 7
dataset_100/train_images/n02441942_3170.jpeg 8
dataset_100/train_images/n03733281_31716.jpeg 2
dataset_100/train_images/n03424325_240.jpeg 0
dataset_100/train_images/n02643566_75.jpeg 8
```
For training, the image list files of the training and validation sets must be specified separately. Assuming the two files are `train.list` and `val.list`, the data is read as follows:
```python
train_reader = paddle.batch(
paddle.reader.shuffle(
reader.train_reader('train.list'),
buf_size=1000),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
reader.test_reader('val.list'),
batch_size=BATCH_SIZE)
```
### Defining the Event Handler
```python
# End batch and end pass event handler
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 1 == 0:
print "\nPass %d, Batch %d, Cost %f, %s" % (
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=test_reader)
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```
### Defining the Trainer
For AlexNet, VGG, and ResNet, the trainer can be defined with the following code:
```python
# Create trainer
trainer = paddle.trainer.SGD(
cost=cost,
parameters=parameters,
update_equation=optimizer)
```
GoogLeNet has two extra output layers, so `extra_layers` must be specified, as shown below:
```python
# Create trainer
trainer = paddle.trainer.SGD(
cost=cost,
parameters=parameters,
update_equation=optimizer,
extra_layers=extra_layers)
```
### Starting Training
```python
trainer.train(
reader=train_reader, num_passes=200, event_handler=event_handler)
```
## Applying the Model
Once the model is trained, the following code predicts the class of given images.
```python
import numpy as np

# load parameters
with gzip.open('params_pass_10.tar.gz', 'r') as f:
parameters = paddle.parameters.Parameters.from_tar(f)
file_list = [line.strip() for line in open(image_list_file)]
test_data = [(paddle.image.load_and_transform(image_file, 256, 224, False)
.flatten().astype('float32'), )
for image_file in file_list]
probs = paddle.infer(
output_layer=out, parameters=parameters, input=test_data)
lab = np.argsort(-probs)
for file_name, result in zip(file_list, lab):
print "Label of %s is: %d" % (file_name, result[0])
```
The code first loads the trained model from a file (the result of pass 10 is used as an example) and then reads the images listed in `image_list_file`, a text file with one image path per line. It uses `paddle.infer` to determine the class of each image in `image_list_file` and prints the results.
import paddle.v2 as paddle
__all__ = ['alexnet']
def alexnet(input, class_dim):
conv1 = paddle.layer.img_conv(
input=input,
filter_size=11,
num_channels=3,
num_filters=96,
stride=4,
padding=1)
cmrnorm1 = paddle.layer.img_cmrnorm(
input=conv1, size=5, scale=0.0001, power=0.75)
pool1 = paddle.layer.img_pool(input=cmrnorm1, pool_size=3, stride=2)
conv2 = paddle.layer.img_conv(
input=pool1,
filter_size=5,
num_filters=256,
stride=1,
padding=2,
groups=1)
cmrnorm2 = paddle.layer.img_cmrnorm(
input=conv2, size=5, scale=0.0001, power=0.75)
pool2 = paddle.layer.img_pool(input=cmrnorm2, pool_size=3, stride=2)
pool3 = paddle.networks.img_conv_group(
input=pool2,
pool_size=3,
pool_stride=2,
conv_num_filter=[384, 384, 256],
conv_filter_size=3,
pool_type=paddle.pooling.Max())
fc1 = paddle.layer.fc(
input=pool3,
size=4096,
act=paddle.activation.Relu(),
layer_attr=paddle.attr.Extra(drop_rate=0.5))
fc2 = paddle.layer.fc(
input=fc1,
size=4096,
act=paddle.activation.Relu(),
layer_attr=paddle.attr.Extra(drop_rate=0.5))
out = paddle.layer.fc(
input=fc2, size=class_dim, act=paddle.activation.Softmax())
return out
import paddle.v2 as paddle
__all__ = ['googlenet']
def inception(name, input, channels, filter1, filter3R, filter3, filter5R,
filter5, proj):
cov1 = paddle.layer.img_conv(
name=name + '_1',
input=input,
filter_size=1,
num_channels=channels,
num_filters=filter1,
stride=1,
padding=0)
cov3r = paddle.layer.img_conv(
name=name + '_3r',
input=input,
filter_size=1,
num_channels=channels,
num_filters=filter3R,
stride=1,
padding=0)
cov3 = paddle.layer.img_conv(
name=name + '_3',
input=cov3r,
filter_size=3,
num_filters=filter3,
stride=1,
padding=1)
cov5r = paddle.layer.img_conv(
name=name + '_5r',
input=input,
filter_size=1,
num_channels=channels,
num_filters=filter5R,
stride=1,
padding=0)
cov5 = paddle.layer.img_conv(
name=name + '_5',
input=cov5r,
filter_size=5,
num_filters=filter5,
stride=1,
padding=2)
pool1 = paddle.layer.img_pool(
name=name + '_max',
input=input,
pool_size=3,
num_channels=channels,
stride=1,
padding=1)
covprj = paddle.layer.img_conv(
name=name + '_proj',
input=pool1,
filter_size=1,
num_filters=proj,
stride=1,
padding=0)
cat = paddle.layer.concat(name=name, input=[cov1, cov3, cov5, covprj])
return cat
def googlenet(input, class_dim):
# stage 1
conv1 = paddle.layer.img_conv(
name="conv1",
input=input,
filter_size=7,
num_channels=3,
num_filters=64,
stride=2,
padding=3)
pool1 = paddle.layer.img_pool(
name="pool1", input=conv1, pool_size=3, num_channels=64, stride=2)
# stage 2
conv2_1 = paddle.layer.img_conv(
name="conv2_1",
input=pool1,
filter_size=1,
num_filters=64,
stride=1,
padding=0)
conv2_2 = paddle.layer.img_conv(
name="conv2_2",
input=conv2_1,
filter_size=3,
num_filters=192,
stride=1,
padding=1)
pool2 = paddle.layer.img_pool(
name="pool2", input=conv2_2, pool_size=3, num_channels=192, stride=2)
# stage 3
ince3a = inception("ince3a", pool2, 192, 64, 96, 128, 16, 32, 32)
ince3b = inception("ince3b", ince3a, 256, 128, 128, 192, 32, 96, 64)
pool3 = paddle.layer.img_pool(
name="pool3", input=ince3b, num_channels=480, pool_size=3, stride=2)
# stage 4
ince4a = inception("ince4a", pool3, 480, 192, 96, 208, 16, 48, 64)
ince4b = inception("ince4b", ince4a, 512, 160, 112, 224, 24, 64, 64)
ince4c = inception("ince4c", ince4b, 512, 128, 128, 256, 24, 64, 64)
ince4d = inception("ince4d", ince4c, 512, 112, 144, 288, 32, 64, 64)
ince4e = inception("ince4e", ince4d, 528, 256, 160, 320, 32, 128, 128)
pool4 = paddle.layer.img_pool(
name="pool4", input=ince4e, num_channels=832, pool_size=3, stride=2)
# stage 5
ince5a = inception("ince5a", pool4, 832, 256, 160, 320, 32, 128, 128)
ince5b = inception("ince5b", ince5a, 832, 384, 192, 384, 48, 128, 128)
pool5 = paddle.layer.img_pool(
name="pool5",
input=ince5b,
num_channels=1024,
pool_size=7,
stride=7,
pool_type=paddle.pooling.Avg())
dropout = paddle.layer.addto(
input=pool5,
layer_attr=paddle.attr.Extra(drop_rate=0.4),
act=paddle.activation.Linear())
out = paddle.layer.fc(
input=dropout, size=class_dim, act=paddle.activation.Softmax())
# fc for output 1
pool_o1 = paddle.layer.img_pool(
name="pool_o1",
input=ince4a,
num_channels=512,
pool_size=5,
stride=3,
pool_type=paddle.pooling.Avg())
conv_o1 = paddle.layer.img_conv(
name="conv_o1",
input=pool_o1,
filter_size=1,
num_filters=128,
stride=1,
padding=0)
fc_o1 = paddle.layer.fc(
name="fc_o1",
input=conv_o1,
size=1024,
layer_attr=paddle.attr.Extra(drop_rate=0.7),
act=paddle.activation.Relu())
out1 = paddle.layer.fc(
input=fc_o1, size=class_dim, act=paddle.activation.Softmax())
# fc for output 2
pool_o2 = paddle.layer.img_pool(
name="pool_o2",
input=ince4d,
num_channels=528,
pool_size=5,
stride=3,
pool_type=paddle.pooling.Avg())
conv_o2 = paddle.layer.img_conv(
name="conv_o2",
input=pool_o2,
filter_size=1,
num_filters=128,
stride=1,
padding=0)
fc_o2 = paddle.layer.fc(
name="fc_o2",
input=conv_o2,
size=1024,
layer_attr=paddle.attr.Extra(drop_rate=0.7),
act=paddle.activation.Relu())
out2 = paddle.layer.fc(
input=fc_o2, size=class_dim, act=paddle.activation.Softmax())
return out, out1, out2
import gzip
import paddle.v2 as paddle
import reader
import vgg
import resnet
import alexnet
import googlenet
import argparse
import os
from PIL import Image
import numpy as np
WIDTH = 224
HEIGHT = 224
DATA_DIM = 3 * WIDTH * HEIGHT
CLASS_DIM = 102
def main():
# parse the argument
parser = argparse.ArgumentParser()
parser.add_argument(
'data_list',
help='The path of data list file, which consists of one image path per line'
)
parser.add_argument(
'model',
help='The model for image classification',
choices=['alexnet', 'vgg13', 'vgg16', 'vgg19', 'resnet', 'googlenet'])
parser.add_argument(
'params_path', help='The file which stores the parameters')
args = parser.parse_args()
# PaddlePaddle init
paddle.init(use_gpu=True, trainer_count=1)
image = paddle.layer.data(
name="image", type=paddle.data_type.dense_vector(DATA_DIM))
if args.model == 'alexnet':
out = alexnet.alexnet(image, class_dim=CLASS_DIM)
elif args.model == 'vgg13':
out = vgg.vgg13(image, class_dim=CLASS_DIM)
elif args.model == 'vgg16':
out = vgg.vgg16(image, class_dim=CLASS_DIM)
elif args.model == 'vgg19':
out = vgg.vgg19(image, class_dim=CLASS_DIM)
elif args.model == 'resnet':
out = resnet.resnet_imagenet(image, class_dim=CLASS_DIM)
elif args.model == 'googlenet':
out, _, _ = googlenet.googlenet(image, class_dim=CLASS_DIM)
# load parameters
with gzip.open(args.params_path, 'r') as f:
parameters = paddle.parameters.Parameters.from_tar(f)
file_list = [line.strip() for line in open(args.data_list)]
test_data = [(paddle.image.load_and_transform(image_file, 256, 224, False)
.flatten().astype('float32'), ) for image_file in file_list]
probs = paddle.infer(
output_layer=out, parameters=parameters, input=test_data)
lab = np.argsort(-probs)
for file_name, result in zip(file_list, lab):
print "Label of %s is: %d" % (file_name, result[0])
if __name__ == '__main__':
main()
import random
from paddle.v2.image import load_and_transform
import paddle.v2 as paddle
from multiprocessing import cpu_count
def train_reader(train_list):
def train_mapper(sample):
'''
map image path to type needed by model input layer for the training set
'''
img, label = sample
img = paddle.image.load_image(img)
img = paddle.image.simple_transform(img, 256, 224, True)
return img.flatten().astype('float32'), label
def test_mapper(sample):
'''
map image path to type needed by model input layer for the test set
'''
img, label = sample
img = paddle.image.load_image(img)
img = paddle.image.simple_transform(img, 256, 224, True)
return img.flatten().astype('float32'), label
def train_reader(train_list, buffered_size=1024):
def reader():
with open(train_list, 'r') as f:
lines = [line.strip() for line in f]
random.shuffle(lines)
for line in lines:
img_path, lab = line.strip().split('\t')
im = load_and_transform(img_path, 256, 224, True)
yield im.flatten().astype('float32'), int(lab)
yield img_path, int(lab)
return reader
return paddle.reader.xmap_readers(train_mapper, reader,
cpu_count(), buffered_size)
def test_reader(test_list):
def test_reader(test_list, buffered_size=1024):
def reader():
with open(test_list, 'r') as f:
lines = [line.strip() for line in f]
for line in lines:
img_path, lab = line.strip().split('\t')
im = load_and_transform(img_path, 256, 224, False)
yield im.flatten().astype('float32'), int(lab)
yield img_path, int(lab)
return reader
return paddle.reader.xmap_readers(test_mapper, reader,
cpu_count(), buffered_size)
if __name__ == '__main__':
......
import paddle.v2 as paddle
__all__ = ['resnet_imagenet', 'resnet_cifar10']
def conv_bn_layer(input,
ch_out,
filter_size,
stride,
padding,
active_type=paddle.activation.Relu(),
ch_in=None):
tmp = paddle.layer.img_conv(
input=input,
filter_size=filter_size,
num_channels=ch_in,
num_filters=ch_out,
stride=stride,
padding=padding,
act=paddle.activation.Linear(),
bias_attr=False)
return paddle.layer.batch_norm(input=tmp, act=active_type)
def shortcut(input, ch_in, ch_out, stride):
if ch_in != ch_out:
return conv_bn_layer(input, ch_out, 1, stride, 0,
paddle.activation.Linear())
else:
return input
def basicblock(input, ch_in, ch_out, stride):
short = shortcut(input, ch_in, ch_out, stride)
conv1 = conv_bn_layer(input, ch_out, 3, stride, 1)
conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1, paddle.activation.Linear())
return paddle.layer.addto(
input=[short, conv2], act=paddle.activation.Relu())
def bottleneck(input, ch_in, ch_out, stride):
short = shortcut(input, ch_in, ch_out * 4, stride)
conv1 = conv_bn_layer(input, ch_out, 1, stride, 0)
conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1)
conv3 = conv_bn_layer(conv2, ch_out * 4, 1, 1, 0,
paddle.activation.Linear())
return paddle.layer.addto(
input=[short, conv3], act=paddle.activation.Relu())
def layer_warp(block_func, input, ch_in, ch_out, count, stride):
conv = block_func(input, ch_in, ch_out, stride)
for i in range(1, count):
conv = block_func(conv, ch_out, ch_out, 1)
return conv
def resnet_imagenet(input, class_dim, depth=50):
cfg = {
18: ([2, 2, 2, 2], basicblock),
34: ([3, 4, 6, 3], basicblock),
50: ([3, 4, 6, 3], bottleneck),
101: ([3, 4, 23, 3], bottleneck),
152: ([3, 8, 36, 3], bottleneck)
}
stages, block_func = cfg[depth]
conv1 = conv_bn_layer(
input, ch_in=3, ch_out=64, filter_size=7, stride=2, padding=3)
pool1 = paddle.layer.img_pool(input=conv1, pool_size=3, stride=2)
res1 = layer_warp(block_func, pool1, 64, 64, stages[0], 1)
res2 = layer_warp(block_func, res1, 64, 128, stages[1], 2)
res3 = layer_warp(block_func, res2, 128, 256, stages[2], 2)
res4 = layer_warp(block_func, res3, 256, 512, stages[3], 2)
pool2 = paddle.layer.img_pool(
input=res4, pool_size=7, stride=1, pool_type=paddle.pooling.Avg())
out = paddle.layer.fc(
input=pool2, size=class_dim, act=paddle.activation.Softmax())
return out
def resnet_cifar10(input, class_dim, depth=32):
# depth should be one of 20, 32, 44, 56, 110, 1202
assert (depth - 2) % 6 == 0
n = (depth - 2) / 6
nStages = {16, 64, 128}
conv1 = conv_bn_layer(
input, ch_in=3, ch_out=16, filter_size=3, stride=1, padding=1)
res1 = layer_warp(basicblock, conv1, 16, 16, n, 1)
res2 = layer_warp(basicblock, res1, 16, 32, n, 2)
res3 = layer_warp(basicblock, res2, 32, 64, n, 2)
pool = paddle.layer.img_pool(
input=res3, pool_size=8, stride=1, pool_type=paddle.pooling.Avg())
out = paddle.layer.fc(
input=pool, size=class_dim, act=paddle.activation.Softmax())
return out
import gzip
import paddle.v2.dataset.flowers as flowers
import paddle.v2 as paddle
import reader
import vgg
import resnet
import alexnet
import googlenet
import argparse
DATA_DIM = 3 * 224 * 224
CLASS_DIM = 1000
CLASS_DIM = 102
BATCH_SIZE = 128
def main():
# parse the argument
parser = argparse.ArgumentParser()
parser.add_argument(
'model',
help='The model for image classification',
choices=['alexnet', 'vgg13', 'vgg16', 'vgg19', 'resnet', 'googlenet'])
args = parser.parse_args()
# PaddlePaddle init
paddle.init(use_gpu=True, trainer_count=4)
paddle.init(use_gpu=True, trainer_count=1)
image = paddle.layer.data(
name="image", type=paddle.data_type.dense_vector(DATA_DIM))
lbl = paddle.layer.data(
name="label", type=paddle.data_type.integer_value(CLASS_DIM))
net = vgg.vgg13(image)
out = paddle.layer.fc(
input=net, size=CLASS_DIM, act=paddle.activation.Softmax())
extra_layers = None
learning_rate = 0.01
if args.model == 'alexnet':
out = alexnet.alexnet(image, class_dim=CLASS_DIM)
elif args.model == 'vgg13':
out = vgg.vgg13(image, class_dim=CLASS_DIM)
elif args.model == 'vgg16':
out = vgg.vgg16(image, class_dim=CLASS_DIM)
elif args.model == 'vgg19':
out = vgg.vgg19(image, class_dim=CLASS_DIM)
elif args.model == 'resnet':
out = resnet.resnet_imagenet(image, class_dim=CLASS_DIM)
learning_rate = 0.1
elif args.model == 'googlenet':
out, out1, out2 = googlenet.googlenet(image, class_dim=CLASS_DIM)
loss1 = paddle.layer.cross_entropy_cost(
input=out1, label=lbl, coeff=0.3)
paddle.evaluator.classification_error(input=out1, label=lbl)
loss2 = paddle.layer.cross_entropy_cost(
input=out2, label=lbl, coeff=0.3)
paddle.evaluator.classification_error(input=out2, label=lbl)
extra_layers = [loss1, loss2]
cost = paddle.layer.classification_cost(input=out, label=lbl)
# Create parameters
......@@ -31,16 +63,23 @@ def main():
momentum=0.9,
regularization=paddle.optimizer.L2Regularization(rate=0.0005 *
BATCH_SIZE),
learning_rate=0.01 / BATCH_SIZE,
learning_rate=learning_rate / BATCH_SIZE,
learning_rate_decay_a=0.1,
learning_rate_decay_b=128000 * 35,
learning_rate_schedule="discexp", )
train_reader = paddle.batch(
paddle.reader.shuffle(reader.train_reader("train.list"), buf_size=1000),
paddle.reader.shuffle(
flowers.train(),
# To use other data, replace the above line with:
# reader.train_reader('train.list'),
buf_size=1000),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
reader.test_reader("test.list"), batch_size=BATCH_SIZE)
flowers.valid(),
# To use other data, replace the above line with:
# reader.test_reader('val.list'),
batch_size=BATCH_SIZE)
# End batch and end pass event handler
def event_handler(event):
......@@ -57,11 +96,14 @@ def main():
# Create trainer
trainer = paddle.trainer.SGD(
cost=cost, parameters=parameters, update_equation=optimizer)
cost=cost,
parameters=parameters,
update_equation=optimizer,
extra_layers=extra_layers)
trainer.train(
reader=train_reader, num_passes=200, event_handler=event_handler)
if __name__ == '__main__':
main()
main()
\ No newline at end of file
......@@ -3,7 +3,7 @@ import paddle.v2 as paddle
__all__ = ['vgg13', 'vgg16', 'vgg19']
def vgg(input, nums):
def vgg(input, nums, class_dim):
def conv_block(input, num_filter, groups, num_channels=None):
return paddle.networks.img_conv_group(
input=input,
......@@ -34,19 +34,21 @@ def vgg(input, nums):
size=fc_dim,
act=paddle.activation.Relu(),
layer_attr=paddle.attr.Extra(drop_rate=0.5))
return fc2
out = paddle.layer.fc(
input=fc2, size=class_dim, act=paddle.activation.Softmax())
return out
def vgg13(input):
def vgg13(input, class_dim):
nums = [2, 2, 2, 2, 2]
return vgg(input, nums)
return vgg(input, nums, class_dim)
def vgg16(input):
def vgg16(input, class_dim):
nums = [2, 2, 3, 3, 3]
return vgg(input, nums)
return vgg(input, nums, class_dim)
def vgg19(input):
def vgg19(input, class_dim):
nums = [2, 2, 4, 4, 4]
return vgg(input, nums)
return vgg(input, nums, class_dim)
TBD
# Scheduled Sampling
## Overview
In sequence generation tasks, the goal is to maximize the probability of the target sequence given the source input. During training, the model takes the ground-truth elements of the target sequence as the decoder input at each step and maximizes the probability of the next element. At generation time, the element decoded at the previous step is used as the current input to generate the next element. The probability distributions of the decoder inputs in the training and generation phases are therefore inconsistent.
Scheduled Sampling\[[1](#参考文献)\] is a method for resolving this mismatch between the input distributions at training and generation time. Early in training it mostly uses the ground-truth elements of the target sequence as decoder input, which quickly guides the model from its randomly initialized state to a reasonable one. As training proceeds, it gradually uses more generated elements as decoder input, resolving the distribution mismatch.
In a standard sequence-to-sequence model, once a wrong element is generated early in the sequence, the subsequent input states are affected, and the error keeps accumulating along the generation process. Scheduled Sampling feeds generated elements to the decoder with some probability, so even if earlier elements are wrong, the training objective is still to maximize the probability of the true target sequence, and the model is trained in the correct direction. This improves the model's error tolerance.
## 算法简介
Scheduled Sampling is used only in the training phase of a sequence-to-sequence model; generation is unchanged.
When the decoder maximizes the probability of the $t$-th token during training, the standard model uses the ground-truth token $y_{t-1}$ of the previous step as input. Writing $g_{t-1}$ for the token generated at the previous step, Scheduled Sampling uses $g_{t-1}$ as the decoder input with a certain probability.
Suppose training has reached the $i$-th mini-batch. Scheduled Sampling defines a probability $\epsilon_i$ that controls the decoder input; $\epsilon_i$ decays as $i$ grows. Common schedules are (see the sketch after this list):
- Linear decay: $\epsilon_i=max(\epsilon,k-c*i)$, where $\epsilon$ bounds $\epsilon_i$ from below, and $k$ and $c$ control the magnitude of the linear decay.
- Exponential decay: $\epsilon_i=k^i$, where $0<k<1$ and $k$ controls the magnitude of the exponential decay.
- Inverse-sigmoid decay: $\epsilon_i=k/(k+exp(i/k))$, where $k>1$ and $k$ likewise controls the magnitude of the decay.
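To make the schedules concrete, the short sketch below evaluates each decay at a few mini-batch indices (plain Python; the constants `eps`, `k` and `c` are illustrative choices of ours, not values prescribed by the paper):

```python
import math

# Illustrative decay schedules for epsilon_i; all constants are demo values.
def linear_decay(i, eps=0.1, k=1.0, c=1e-5):
    return max(eps, k - c * i)          # clipped straight line

def exponential_decay(i, k=0.9999):
    return k ** i                       # geometric decay per mini-batch

def inverse_sigmoid_decay(i, k=500.0):
    return k / (k + math.exp(i / k))    # slow start, then a steep drop

for i in (0, 1000, 10000, 100000):
    print("i=%-6d linear=%.4f exp=%.4f inv_sigmoid=%.4f" % (
        i, linear_decay(i), exponential_decay(i), inverse_sigmoid_decay(i)))
```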
Figure 1 plots the decay curves of these three schedules.
<p align="center">
<img src="img/decay.jpg" width="50%" align="center"><br>
Figure 1. Decay curves of linear, exponential and inverse-sigmoid decay
</p>
As Figure 2 shows, at decoder step $t$ Scheduled Sampling feeds the ground-truth token $y_{t-1}$ of the previous step with probability $\epsilon_i$, and the generated token $g_{t-1}$ with probability $1-\epsilon_i$. Figure 1 shows that $\epsilon_i$ shrinks as $i$ grows, so the decoder increasingly tends to consume generated tokens, and the data distributions at training and generation time become more and more consistent.
<p align="center">
<img src="img/Scheduled_Sampling.jpg" width="50%" align="center"><br>
Figure 2. Scheduled Sampling choosing the decoder input
</p>
## Implementation
Since Scheduled Sampling is a refinement of the sequence-to-sequence model, its overall structure is very similar; to keep the focus here, only the parts related to Scheduled Sampling are described, and the complete code can be found in `scheduled_sampling.py`.
First import the required packages and declare `RandomScheduleGenerator`, the class that controls the decay probability, as follows:
```python
import numpy as np
import math
class RandomScheduleGenerator:
"""
The random sampling rate for the scheduled sampling algorithm, which uses a
decayed sampling rate.
"""
...
```
The three methods of `RandomScheduleGenerator` — `__init__`, `getScheduleRate` and `processBatch` — are defined below.
`__init__` initializes the object. Its `schedule_type` argument selects the decay schedule; the available options are `constant`, `linear`, `exponential` and `inverse_sigmoid`. `constant` uses a fixed $\epsilon_i$ for all mini-batches, `linear` uses linear decay, `exponential` uses exponential decay, and `inverse_sigmoid` uses inverse-sigmoid decay. The arguments `a` and `b` parameterize the chosen schedule and need to be tuned on the validation set. `self.schedule_computers` maps each schedule name to a function that computes $\epsilon_i$; the last line binds the selected function to `self.schedule_computer`.
```python
def __init__(self, schedule_type, a, b):
"""
schedule_type: the type of the decay. It supports constant, linear,
exponential, and inverse_sigmoid right now.
a: parameter of the decay (MUST BE DOUBLE)
b: parameter of the decay (MUST BE DOUBLE)
"""
self.schedule_type = schedule_type
self.a = a
self.b = b
self.data_processed_ = 0
self.schedule_computers = {
"constant": lambda a, b, d: a,
"linear": lambda a, b, d: max(a, 1 - d / b),
"exponential": lambda a, b, d: pow(a, d / b),
"inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
}
assert (self.schedule_type in self.schedule_computers)
self.schedule_computer = self.schedule_computers[self.schedule_type]
```
`getScheduleRate` computes $\epsilon_i$ from the decay function and the amount of data processed so far.
```python
def getScheduleRate(self):
"""
Get the scheduled sampling rate. Users usually do not need to call this method.
"""
return self.schedule_computer(self.a, self.b, self.data_processed_)
```
`processBatch` samples according to the probability $\epsilon_i$, producing `indexes`: each element of `indexes` is `0` with probability $\epsilon_i$ and `1` with probability $1-\epsilon_i$. `indexes` determines whether the decoder input is the ground-truth token or the generated one: `0` means the true token is used, `1` means the generated token is used.
```python
def processBatch(self, batch_size):
"""
Get a batch_size of sampled indexes. These indexes can be passed to a
MultiplexLayer to select between the ground-truth and generated samples
from the last time step.
"""
rate = self.getScheduleRate()
numbers = np.random.rand(batch_size)
indexes = (numbers >= rate).astype('int32').tolist()
self.data_processed_ += batch_size
return indexes
```
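As a quick sanity check, here is a toy use of the class (an illustrative example of ours, not part of the model code); note that `b` is passed as a float, as the docstring demands:

```python
# Toy usage of RandomScheduleGenerator (illustration only).
schedule_generator = RandomScheduleGenerator("linear", 0.75, 1000000.)

print(schedule_generator.getScheduleRate())   # 1.0: no data processed yet
indexes = schedule_generator.processBatch(8)  # all 0 while the rate is 1.0
print(indexes)                                # 0 -> true token, 1 -> generated
print(schedule_generator.getScheduleRate())   # 0.999992 after 8 samples
```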
Scheduled Sampling adds one input, `true_token_flag`, on top of the sequence-to-sequence model to control the decoder input.
```python
true_token_flags = paddle.layer.data(
name='true_token_flag',
type=paddle.data_type.integer_value_sequence(2))
```
The original reader also needs to be wrapped to generate the data for `true_token_flag`. Taking linear decay as an example, the code below shows how to call the `RandomScheduleGenerator` defined above to produce the `true_token_flag` input:
```python
schedule_generator = RandomScheduleGenerator("linear", 0.75, 1000000.)
def gen_schedule_data(reader):
"""
Creates a data reader for scheduled sampling.
Output from the iterator created by the original reader will be
appended with "true_token_flag" to indicate whether to use the true token.
:param reader: the original reader.
:type reader: callable
:return: the new reader with the field "true_token_flag".
:rtype: callable
"""
def data_reader():
for src_ids, trg_ids, trg_ids_next in reader():
yield src_ids, trg_ids, trg_ids_next, \
[0] + schedule_generator.processBatch(len(trg_ids) - 1)
return data_reader
```
This code appends the decoder-control data after the original input fields (the source tokens `src_ids`, the target tokens `trg_ids` and the next target tokens `trg_ids_next`). Since the first decoder input is the begin-of-sequence token, the first element of the appended data is set to `0`, meaning the decoder's first step always uses the true first target token (the begin-of-sequence marker). A toy example of the wrapped reader is shown below.
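For illustration, wrapping a hand-made two-sample reader (all token ids below are made up) shows the extra field that will feed `true_token_flag`:

```python
# Made-up token ids, just to show the appended true_token_flag field.
def toy_reader():
    yield [2, 4, 6], [1, 3, 5, 7], [3, 5, 7, 9]  # src_ids, trg_ids, trg_ids_next
    yield [8, 10], [1, 2, 3], [2, 3, 4]

for sample in gen_schedule_data(toy_reader)():
    # The appended field has the same length as trg_ids and starts with 0.
    print(sample[3])
```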
During training, the decoder function called at every step of `recurrent_group` is:
```python
def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
true_token_flag):
"""
The decoder step for training.
:param enc_vec: the encoder vector for attention
:type enc_vec: LayerOutput
:param enc_proj: the encoder projection for attention
:type enc_proj: LayerOutput
:param true_word: the ground-truth target word
:type true_word: LayerOutput
:param true_token_flag: the flag of using the ground-truth target word
:type true_token_flag: LayerOutput
:return: the softmax output layer
:rtype: LayerOutput
"""
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
gru_out_memory = paddle.layer.memory(
name='gru_out', size=target_dict_dim)
generated_word = paddle.layer.max_id(input=gru_out_memory)
generated_word_emb = paddle.layer.embedding(
input=generated_word,
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
current_word = paddle.layer.multiplex(
input=[true_token_flag, true_word, generated_word_emb])
with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
decoder_inputs += paddle.layer.full_matrix_projection(input=context)
decoder_inputs += paddle.layer.full_matrix_projection(
input=current_word)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
with paddle.layer.mixed(
name='gru_out',
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax()) as out:
out += paddle.layer.full_matrix_projection(input=gru_step)
return out
```
This function uses the `memory` layer `gru_out_memory` to remember the output of the previous step, and picks the highest-probability word from it as the generated word `generated_word`. The `multiplex` layer then chooses between the true token `true_word` and the generated token and passes the result on as the decoder input. `multiplex` takes three inputs: `true_token_flag`, `true_word` and `generated_word_emb`. For each of their elements, the layer outputs the corresponding element of `true_word` when the value in `true_token_flag` is `0`, and the corresponding element of `generated_word_emb` when it is `1`. A NumPy analogue of this selection rule is sketched below.
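The selection rule of `multiplex` can be mimicked in NumPy as follows (an illustrative analogue, not PaddlePaddle's implementation):

```python
import numpy as np

# For each time step, take the row of true_word when the flag is 0,
# and the row of generated_word_emb when the flag is 1.
true_token_flag = np.array([0, 1, 1, 0])      # one flag per time step
true_word = np.random.rand(4, 512)            # ground-truth word embeddings
generated_word_emb = np.random.rand(4, 512)   # embeddings of generated words

current_word = np.where(true_token_flag[:, None] == 0,
                        true_word, generated_word_emb)
print(current_word.shape)                     # (4, 512)
```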
## References
[1] Bengio S, Vinyals O, Jaitly N, et al. [Scheduled sampling for sequence prediction with recurrent neural networks](http://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks) // Advances in Neural Information Processing Systems. 2015: 1171-1179.
import numpy as np
import math
class RandomScheduleGenerator:
"""
The random sampling rate for the scheduled sampling algorithm, which uses a
decayed sampling rate.
"""
def __init__(self, schedule_type, a, b):
"""
schedule_type: the type of the decay. It supports constant, linear,
exponential, and inverse_sigmoid right now.
a: parameter of the decay (MUST BE DOUBLE)
b: parameter of the decay (MUST BE DOUBLE)
"""
self.schedule_type = schedule_type
self.a = a
self.b = b
self.data_processed_ = 0
self.schedule_computers = {
"constant": lambda a, b, d: a,
"linear": lambda a, b, d: max(a, 1 - d / b),
"exponential": lambda a, b, d: pow(a, d / b),
"inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
}
assert (self.schedule_type in self.schedule_computers)
self.schedule_computer = self.schedule_computers[self.schedule_type]
def getScheduleRate(self):
"""
Get the scheduled sampling rate. Users usually do not need to call this method.
"""
return self.schedule_computer(self.a, self.b, self.data_processed_)
def processBatch(self, batch_size):
"""
Get a batch_size of sampled indexes. These indexes can be passed to a
MultiplexLayer to select between the ground-truth and generated samples
from the last time step.
"""
rate = self.getScheduleRate()
numbers = np.random.rand(batch_size)
indexes = (numbers >= rate).astype('int32').tolist()
self.data_processed_ += batch_size
return indexes
import sys
import gzip
import paddle.v2 as paddle
from random_schedule_generator import RandomScheduleGenerator
schedule_generator = RandomScheduleGenerator("linear", 0.75, 1000000.)
def gen_schedule_data(reader):
"""
Creates a data reader for scheduled sampling.
Output from the iterator created by the original reader will be
appended with "true_token_flag" to indicate whether to use the true token.
:param reader: the original reader.
:type reader: callable
:return: the new reader with the field "true_token_flag".
:rtype: callable
"""
def data_reader():
for src_ids, trg_ids, trg_ids_next in reader():
yield src_ids, trg_ids, trg_ids_next, \
[0] + schedule_generator.processBatch(len(trg_ids) - 1)
return data_reader
def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
"""
The definition of the sequence to sequence model
:param source_dict_dim: the dictionary size of the source language
:type source_dict_dim: int
:param target_dict_dim: the dictionary size of the target language
:type target_dict_dim: int
:param is_generating: whether in generating mode
:type is_generating: Bool
:return: the last layer of the network
:rtype: LayerOutput
"""
### Network Architecture
word_vector_dim = 512 # dimension of word vector
decoder_size = 512 # dimension of hidden unit in GRU Decoder network
encoder_size = 512 # dimension of hidden unit in GRU Encoder network
beam_size = 3
max_length = 250
#### Encoder
src_word_id = paddle.layer.data(
name='source_language_word',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
src_embedding = paddle.layer.embedding(
input=src_word_id, size=word_vector_dim)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
#### Decoder
with paddle.layer.mixed(size=decoder_size) as encoded_proj:
encoded_proj += paddle.layer.full_matrix_projection(
input=encoded_vector)
backward_first = paddle.layer.first_seq(input=src_backward)
with paddle.layer.mixed(
size=decoder_size, act=paddle.activation.Tanh()) as decoder_boot:
decoder_boot += paddle.layer.full_matrix_projection(
input=backward_first)
def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
true_token_flag):
"""
The decoder step for training.
:param enc_vec: the encoder vector for attention
:type enc_vec: LayerOutput
:param enc_proj: the encoder projection for attention
:type enc_proj: LayerOutput
:param true_word: the ground-truth target word
:type true_word: LayerOutput
:param true_token_flag: the flag of using the ground-truth target word
:type true_token_flag: LayerOutput
:return: the softmax output layer
:rtype: LayerOutput
"""
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
gru_out_memory = paddle.layer.memory(
name='gru_out', size=target_dict_dim)
generated_word = paddle.layer.max_id(input=gru_out_memory)
generated_word_emb = paddle.layer.embedding(
input=generated_word,
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
current_word = paddle.layer.multiplex(
input=[true_token_flag, true_word, generated_word_emb])
with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
decoder_inputs += paddle.layer.full_matrix_projection(input=context)
decoder_inputs += paddle.layer.full_matrix_projection(
input=current_word)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
with paddle.layer.mixed(
name='gru_out',
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax()) as out:
out += paddle.layer.full_matrix_projection(input=gru_step)
return out
def gru_decoder_with_attention_test(enc_vec, enc_proj, current_word):
"""
The decoder step for generating.
:param enc_vec: the encoder vector for attention
:type enc_vec: LayerOutput
:param enc_proj: the encoder projection for attention
:type enc_proj: LayerOutput
:param current_word: the previously generated word
:type current_word: LayerOutput
:return: the softmax output layer
:rtype: LayerOutput
"""
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
decoder_inputs += paddle.layer.full_matrix_projection(input=context)
decoder_inputs += paddle.layer.full_matrix_projection(
input=current_word)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
with paddle.layer.mixed(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax()) as out:
out += paddle.layer.full_matrix_projection(input=gru_step)
return out
decoder_group_name = "decoder_group"
group_input1 = paddle.layer.StaticInput(input=encoded_vector, is_seq=True)
group_input2 = paddle.layer.StaticInput(input=encoded_proj, is_seq=True)
group_inputs = [group_input1, group_input2]
if not is_generating:
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='target_language_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)
true_token_flags = paddle.layer.data(
name='true_token_flag',
type=paddle.data_type.integer_value_sequence(2))
group_inputs.append(true_token_flags)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=gru_decoder_with_attention_train,
input=group_inputs)
lbl = paddle.layer.data(
name='target_language_next_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim))
cost = paddle.layer.classification_cost(input=decoder, label=lbl)
return cost
else:
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
beam_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=gru_decoder_with_attention_test,
input=group_inputs,
bos_id=0,
eos_id=1,
beam_size=beam_size,
max_length=max_length)
return beam_gen
def main():
paddle.init(use_gpu=False, trainer_count=1)
is_generating = False
model_path_for_generating = 'params_pass_1.tar.gz'
# source and target dict dim.
dict_size = 30000
source_dict_dim = target_dict_dim = dict_size
# train the network
if not is_generating:
cost = seqToseq_net(source_dict_dim, target_dict_dim)
parameters = paddle.parameters.create(cost)
# define optimize method and trainer
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(
cost=cost, parameters=parameters, update_equation=optimizer)
# define data reader
wmt14_reader = paddle.batch(
gen_schedule_data(
paddle.reader.shuffle(
paddle.dataset.wmt14.train(dict_size), buf_size=8192)),
batch_size=5)
feeding = {
'source_language_word': 0,
'target_language_word': 1,
'target_language_next_word': 2,
'true_token_flag': 3
}
# define event_handler callback
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 10 == 0:
print "\nPass %d, Batch %d, Cost %f, %s" % (
event.pass_id, event.batch_id, event.cost,
event.metrics)
else:
sys.stdout.write('.')
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
# save parameters
with gzip.open('params_pass_%d.tar.gz' % event.pass_id,
'w') as f:
parameters.to_tar(f)
# start to train
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
feeding=feeding,
num_passes=2)
# generate French translations for English source sequences
else:
# use the first 3 samples for generation
gen_creator = paddle.dataset.wmt14.gen(dict_size)
gen_data = []
gen_num = 3
for item in gen_creator():
gen_data.append((item[0], ))
if len(gen_data) == gen_num:
break
beam_gen = seqToseq_net(source_dict_dim, target_dict_dim, is_generating)
# get the trained model
with gzip.open(model_path_for_generating, 'r') as f:
parameters = paddle.parameters.Parameters.from_tar(f)
# prob is the prediction probabilities, and id is the prediction word.
beam_result = paddle.infer(
output_layer=beam_gen,
parameters=parameters,
input=gen_data,
field=['prob', 'id'])
# get the dictionary
src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
# the delimiter element of generated sequences is -1,
# the first element of each generated sequence is the sequence length
seq_list = []
seq = []
for w in beam_result[1]:
if w != -1:
seq.append(w)
else:
seq_list.append(' '.join([trg_dict.get(w) for w in seq[1:]]))
seq = []
prob = beam_result[0]
beam_size = 3
for i in xrange(gen_num):
print "\n*******************************************************\n"
print "src:", ' '.join(
[src_dict.get(w) for w in gen_data[i][0]]), "\n"
for j in xrange(beam_size):
print "prob = %f:" % (prob[i][j]), seq_list[i * beam_size + j]
if __name__ == '__main__':
main()
wget http://cs224d.stanford.edu/assignment2/assignment2.zip
unzip assignment2.zip
cp assignment2_release/data/ner/wordVectors.txt ./
cp assignment2_release/data/ner/vocab.txt ./
rm -rf assignment2.zip assignment2_release
if [ $? -eq 0 ];then
unzip assignment2.zip
cp assignment2_release/data/ner/wordVectors.txt ./data
cp assignment2_release/data/ner/vocab.txt ./data
rm -rf assignment2.zip assignment2_release
else
echo "download data error!" >> /dev/stderr
exit 1
fi
......@@ -12,11 +12,10 @@ def infer(model_path, batch_size, test_data_file, vocab_file, target_file):
for idx, test_sample in enumerate(test_data):
start_id = 0
pred_str = ""
for w, tag in zip(test_sample[0],
probs[start_id:start_id + len(test_sample[0])]):
pred_str += "%s[%s] " % (id_2_word[w], id_2_label[tag])
print(pred_str.strip())
print("%s\t%s" % (id_2_word[w], id_2_label[tag]))
print("\n")
start_id += len(test_sample[0])
word_dict = load_dict(vocab_file)
......
......@@ -26,8 +26,6 @@ def canonicalize_word(word, wordset=None, digits=True):
def data_reader(data_file, word_dict, label_dict):
"""
Conll03 train set creator.
The dataset can be obtained according to http://www.clips.uantwerpen.be/conll2003/ner/.
It returns a reader creator, each sample in the reader includes:
word id sequence, label id sequence and raw sentence.
......
......@@ -5,8 +5,6 @@ import reader
from utils import *
from network_conf import *
from paddle.v2.layer import parse_network
def main(train_data_file,
test_data_file,
......
......@@ -19,10 +19,29 @@ def get_embedding(emb_file='data/wordVectors.txt'):
def load_dict(dict_path):
"""
Load the word dictionary from the given file.
Each line of the given file is a word, which can include multiple columns
separated by tabs.
This function takes the first column (columns in a line are separated by
tabs) as the key and the line number of the line (the index of the word
in the dictionary) as the value.
"""
return dict((line.strip().split("\t")[0], idx)
for idx, line in enumerate(open(dict_path, "r").readlines()))
def load_reverse_dict(dict_path):
"""
Load the word dictionary from the given file.
Each line of the given file is a word, which can include multiple columns
separated by tabs.
This function takes the line number of a line (the index of the word in
the dictionary) as the key and the first column (columns in a line are
separated by tabs) as the value.
"""
return dict((idx, line.strip().split("\t")[0])
for idx, line in enumerate(open(dict_path, "r").readlines()))