提交 2319ecdc 编写于 作者: Y Yu Yang

Merge branch 'gh-pages' of github.com:baidu/Paddle into gh-pages

......@@ -90,7 +90,7 @@ var _hmt = _hmt || [];
<h1>PaddlePaddle</h1>
<hr>
<p>开放易用的深度学习平台,工业应用与学术研究并举</p>
<a href="http://book.paddlepaddle.org" class="btn btn-primary btn-xl page-scroll btn-pd" target="_blank">快速上手</a>
<a href="http://book.paddlepaddle.org" class="btn btn-primary btn-xl page-scroll btn-pd" target="_blank">深度学习入门</a>
<a href="http://www.paddlepaddle.org/doc_cn/" class="btn btn-primary btn-xl page-scroll btn-pd" target="_blank">文档</a>
<a href="https://github.com/baidu/paddle" class="btn btn-primary btn-xl page-scroll btn-pd" target="_blank">代码</a>
</div>
......
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 4d7a146cda87e1e0222ce8a24b0ea6b4
tags: 645f666f9bcd5a90fca523b33c5a78b7
ABOUT
=======
PaddlPaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
PaddlePaddle is now open source but far from complete, which is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community both by providing feedback and by actively contributing to the source code.
Credits
--------
We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/graphs/contributors>`_ of PaddlePaddle!
API
===
Model Config API
----------------
.. toctree::
:maxdepth: 1
v2/model_configs.rst
Data API
--------
.. toctree::
:maxdepth: 1
v2/data.rst
Train API
---------
.. toctree::
:maxdepth: 1
v2/run_logic.rst
\ No newline at end of file
Introduction
==============
DataProvider is a module that loads training or testing data into cpu or gpu
memory for the following triaining or testing process.
For simple use, users can use Python :code:`PyDataProvider` to dynamically reads
the original data in any format or in any form, and then transfer them into a
data format PaddlePaddle requires. The process is extremly flexible and highly
customized, with sacrificing the efficiency only a little. This is extremly
useful when you have to dynamically generate certain kinds of data according to,
for example, the training performance.
Besides, users also can customize a C++ :code:`DataProvider` for a more
complex usage, or for a higher efficiency.
The following parameters are required to define in the PaddlePaddle network
configuration file (trainer_config.py): which DataProvider is chosen to used,
and specific parameters for DataProvider, including training file list
(train.list) and testing file list (test.list).
Train.list and test.list are simply two plain text files, which defines path
of training or testing data. It is recommended that directly placing them into
the training directory, and reference to them by using a relative path (
relative to the PaddePaddle program).
Testing or evaluating will not be performed during training if the test.list is
not set or set to None. Otherwise, PaddlePaddle will evaluate the trained model
by the specified tesing data while training, every testing period (a user
defined command line parameter in PaddlePaddle) to prevent over-fitting.
Each line of train.list and test.list is an absolute or relative path (relative
to the PaddePaddle program runtime) of data file. Fascinatingly more, each line
can also be a HDFS file path or a SQL connection string. As long as the user
assures how to access each file in DataProvider.
.. _api_pydataprovider2:
PyDataProvider2
===============
We highly recommand users to use PyDataProvider2 to provide training or testing
data to PaddlePaddle. The user only needs to focus on how to read a single
sample from the original data file by using PyDataProvider2, leaving all of the
trivial work, including, transfering data into cpu/gpu memory, shuffle, binary
serialization to PyDataProvider2. PyDataProvider2 uses multithreading and a
fanscinating but simple cache strategy to optimize the efficiency of the data
providing process.
DataProvider for the non-sequential model
-----------------------------------------
Here we use the MNIST handwriting recognition data as an example to illustrate
how to write a simple PyDataProvider.
MNIST is a handwriting classification data set. It contains 70,000 digital
grayscale images. Labels of the training sample range from 0 to 9. All the
images have been size-normalized and centered into images with the same size
of 28 x 28 pixels.
A small part of the original data as an example is shown as below:
.. literalinclude:: src/mnist_train.txt
Each line of the data contains two parts, separated by :code:`;`. The first part is
label of an image. The second part contains 28x28 pixel float values.
Just write path of the above data into train.list. It looks like this:
.. literalinclude:: src/train.list
The corresponding dataprovider is shown as below:
.. literalinclude:: src/mnist_provider.dict.py
The first line imports PyDataProvider2 package.
The main function is the process function, that has two parameters.
The first parameter is the settings, which is not used in this example.
The second parameter is the filename, that is exactly each line of train.list.
This parameter is passed to the process function by PaddlePaddle.
:code:`@provider` is a Python
`Decorator <http://www.learnpython.org/en/Decorators>`_ .
It sets some properties to DataProvider, and constructs a real PaddlePaddle
DataProvider from a very simple user implemented python function. It does not
matter if you are not familiar with `Decorator`_. You can keep it simple by
just taking :code:`@provider` as a fixed mark above the provider function you
implemented.
`input_types`_ defines the data format that a DataProvider returns.
In this example, it is set to a 28x28-dimensional dense vector and an integer
scalar, whose value ranges from 0 to 9.
`input_types`_ can be set to several kinds of input formats, please refer to the
document of `input_types`_ for more details.
The process method is the core part to construct a real DataProvider in
PaddlePaddle. It implements how to open the text file, how to read one sample
from the original text file, convert them into `input_types`_, and give them
back to PaddlePaddle process at line 23.
Note that data yielded by the process function must follow the same order that
`input_types`_ are defined.
With the help of PyDataProvider2, user can focus on how to generate ONE traning
sample by using keywords :code:`yield`.
:code:`yield` is a python keyword, and a concept related to it includes
:code:`generator`.
Only a few lines of codes need to be added into the training configuration file,
you can take this as an example.
.. literalinclude:: src/mnist_config.py
Here we specify training data by :code:`train.list`, and no testing data is specified.
The method which actually provide data is :code:`process`.
User also can use another style to provide data, which defines the
:code:`data_layer`'s name explicitly when `yield`. For example,
the :code:`dataprovider` is shown as below.
.. literalinclude:: src/mnist_provider.dict.py
:linenos:
If user did't give the :code:`data_layer`'s name, PaddlePaddle will use
the order of :code:`data_layer` definition roughly to determine which feature to
which :code:`data_layer`. This order may be not correct, so TO DEFINE THE
:code:`data_layer`'s NAMES EXPLICITLY IS THE RECOMMANDED WAY TO PROVIDER DATA.
Now, this simple example of using PyDataProvider is finished.
The only thing that the user should know is how to generte **one sample** from
**one data file**.
And PaddlePadle will do all of the rest things\:
* Form a training batch
* Shuffle the training data
* Read data with multithreading
* Cache the training data (Optional)
* CPU-> GPU double buffering.
Is this cool?
.. _api_pydataprovider2_sequential_model:
DataProvider for the sequential model
-------------------------------------
A sequence model takes sequences as its input. A sequence is made up of several
timesteps. The so-called timestep, is not necessary to have something to do
with time. It can also be explained to that the order of data are taken into
consideration into model design and training.
For example, the sentence can be interpreted as a kind of sequence data in NLP
tasks.
Here is an example on data proivider for English sentiment classification data.
The original input data are simple English text, labeled into positive or
negative sentiment (marked by 0 and 1 respectively).
A small part of the original data as an example can be found in the path below:
.. literalinclude:: src/sentimental_train.txt
The corresponding data provider can be found in the path below:
.. literalinclude:: src/sentimental_provider.py
This data provider for sequential model is a little more complex than that
for MINST dataset.
A new initialization method is introduced here.
The method :code:`on_init` is configured to DataProvider by :code:`@provider`'s
:code:`init_hook` parameter, and it will be invoked once DataProvider is
initialized. The :code:`on_init` function has the following parameters:
* The first parameter is the settings object.
* The rest parameters are passed by key word arguments. Some of them are passed
by PaddlePaddle, see reference for `init_hook`_.
The :code:`dictionary` object is a python dict object passed from the trainer
configuration file, and it maps word string to word id.
To pass these parameters into DataProvider, the following lines should be added
into trainer configuration file.
.. literalinclude:: src/sentimental_config.py
The definition is basically same as MNIST example, except:
* Load dictionary in this configuration
* Pass it as a parameter to the DataProvider
The `input_types` is configured in method :code:`on_init`. It has the same
effect to configure them by :code:`@provider`'s :code:`input_types` parameter.
However, the :code:`input_types` is set at runtime, so we can set it to
different types according to the input data. Input of the neural network is a
sequence of word id, so set :code:`seq_type` to :code:`integer_value_sequence`.
Durning :code:`on_init`, we save :code:`dictionary` variable to
:code:`settings`, and it will be used in :code:`process`. Note the settings
parameter for the process function and for the on_init's function are a same
object.
The basic processing logic is the same as MNIST's :code:`process` method. Each
sample in the data file is given back to PaddlePaddle process.
Thus, the basic usage of PyDataProvider is here.
Please refer to the following section reference for details.
Reference
---------
@provider
+++++++++
.. autofunction:: paddle.trainer.PyDataProvider2.provider
input_types
+++++++++++
PaddlePaddle has four data types, and three sequence types.
The four data types are:
* :code:`dense_vector`: dense float vector.
* :code:`sparse_binary_vector`: sparse binary vector, most of the value is 0, and
the non zero elements are fixed to 1.
* :code:`sparse_float_vector`: sparse float vector, most of the value is 0, and some
non zero elements can be any float value. They are given by the user.
* :code:`integer`: an integer scalar, that is especially used for label or word index.
The three sequence types are:
* :code:`SequenceType.NO_SEQUENCE` means the sample is not a sequence.
* :code:`SequenceType.SEQUENCE` means the sample is a sequence.
* :code:`SequenceType.SUB_SEQUENCE` means it is a nested sequence, that each timestep of
the input sequence is also a sequence.
Different input type has a defferenct input format. Their formats are shown
in the above table.
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| | NO_SEQUENCE | SEQUENCE | SUB_SEQUENCE |
+======================+=====================+===================================+================================================+
| dense_vector | [f, f, ...] | [[f, ...], [f, ...], ...] | [[[f, ...], ...], [[f, ...], ...],...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| sparse_binary_vector | [i, i, ...] | [[i, ...], [i, ...], ...] | [[[i, ...], ...], [[i, ...], ...],...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| sparse_float_vector | [(i,f), (i,f), ...] | [[(i,f), ...], [(i,f), ...], ...] | [[[(i,f), ...], ...], [[(i,f), ...], ...],...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| integer_value | i | [i, i, ...] | [[i, ...], [i, ...], ...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
where f represents a float value, i represents an integer value.
init_hook
+++++++++
init_hook is a function that is invoked once the data provoder is initialized.
Its parameters lists as follows:
* The first parameter is a settings object, which is the same to :code:`settings`
in :code:`process` method. The object contains several attributes, including:
* :code:`settings.input_types`: the input types. Reference `input_types`_.
* :code:`settings.logger`: a logging object.
* The rest parameters are the key word arguments. It is made up of PaddpePaddle
pre-defined parameters and user defined parameters.
* PaddlePaddle-defined parameters including:
* :code:`is_train` is a bool parameter that indicates the DataProvider is used in
training or testing.
* :code:`file_list` is the list of all files.
* User-defined parameters args can be set in training configuration.
Note, PaddlePaddle reserves the right to add pre-defined parameter, so please
use :code:`**kwargs` in init_hook to ensure compatibility by accepting the
parameters which your init_hook does not use.
cache
+++++
DataProvider provides two simple cache strategy. They are:
* :code:`CacheType.NO_CACHE` means do not cache any data, then data is read at runtime by
the user implemented python module every pass.
* :code:`CacheType.CACHE_PASS_IN_MEM` means the first pass reads data by the user
implemented python module, and the rest passes will directly read data from
memory.
API
===
DataProvider API
----------------
.. toctree::
:maxdepth: 1
data_provider/dataprovider_en.rst
data_provider/pydataprovider2_en.rst
.. _api_trainer_config:
Model Config API
----------------
.. toctree::
:maxdepth: 1
trainer_config_helpers/optimizers.rst
trainer_config_helpers/data_sources.rst
trainer_config_helpers/layers.rst
trainer_config_helpers/activations.rst
trainer_config_helpers/poolings.rst
trainer_config_helpers/networks.rst
trainer_config_helpers/evaluators.rst
trainer_config_helpers/attrs.rst
Applications API
----------------
.. toctree::
:maxdepth: 1
predict/swig_py_paddle_en.rst
Python Prediction
==================
PaddlePaddle offers a set of clean prediction interfaces for python with the help of
SWIG. The main steps of predict values in python are:
* Parse training configurations
* Construct GradientMachine
* Prepare data
* Predict
Here is a sample python script that shows the typical prediction process for the
MNIST classification problem. A complete sample code could be found at
:code:`src_root/doc/ui/predict/predict_sample.py`.
.. literalinclude:: src/predict_sample.py
:language: python
:lines: 15-18,90-100,101-104
The module that does the most of the job is py_paddle.swig_paddle, it's
generated by SWIG and has complete documents, for more details you can use
python's :code:`help()` function. Let's walk through the above python script:
* At the beginning, use :code:`swig_paddle.initPaddle()` to initialize
PaddlePaddle with command line arguments, for more about command line arguments
see :ref:`cmd_detail_introduction` .
* Parse the configuration file that is used in training with :code:`parse_config()`.
Because data to predict with always have no label, and output of prediction work
normally is the output layer rather than the cost layer, so you should modify
the configuration file accordingly before using it in the prediction work.
* Create a neural network with
:code:`swig_paddle.GradientMachine.createFromConfigproto()`, which takes the
parsed configuration :code:`conf.model_config` as argument. Then load the
trained parameters from the model with :code:`network.loadParameters()`.
* Create a data converter object of utility class :code:`DataProviderConverter`.
- Note: As swig_paddle can only accept C++ matrices, we offer a utility
class DataProviderConverter that can accept the same input data with
PyDataProvider2, for more information please refer to document
of :ref:`api_pydataprovider2` .
* Do the prediction with :code:`forwardTest()`, which takes the converted
input data and outputs the activations of the output layer.
Here is a typical output:
.. code-block:: text
[{'id': None, 'value': array([[ 5.53018653e-09, 1.12194102e-05, 1.96644767e-09,
1.43630644e-02, 1.51111044e-13, 9.85625684e-01,
2.08823112e-10, 2.32777140e-08, 2.00186201e-09,
1.15501715e-08],
[ 9.99982715e-01, 1.27787406e-10, 1.72296313e-05,
1.49316648e-09, 1.36540484e-11, 6.93137714e-10,
2.70634608e-08, 3.48565123e-08, 5.25639710e-09,
4.48684503e-08]], dtype=float32)}]
:code:`value` is the output of the output layer, each row represents result of
the corresponding row in the input data, each element represents activation of
the corresponding neuron in the output layer.
===========
Activations
===========
BaseActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: BaseActivation
:noindex:
AbsActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: AbsActivation
:noindex:
ExpActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: ExpActivation
:noindex:
IdentityActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: IdentityActivation
:noindex:
LinearActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: LinearActivation
:noindex:
LogActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: LogActivation
:noindex:
SquareActivation
================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SquareActivation
:noindex:
SigmoidActivation
=================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SigmoidActivation
:noindex:
SoftmaxActivation
=================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SoftmaxActivation
:noindex:
SequenceSoftmaxActivation
=========================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SequenceSoftmaxActivation
:noindex:
ReluActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: ReluActivation
:noindex:
BReluActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: BReluActivation
:noindex:
SoftReluActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SoftReluActivation
:noindex:
TanhActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: TanhActivation
:noindex:
STanhActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: STanhActivation
:noindex:
Parameter Attributes
=======================
.. automodule:: paddle.trainer_config_helpers.attrs
:members:
.. _api_trainer_config_helpers_data_sources:
DataSources
===========
.. automodule:: paddle.trainer_config_helpers.data_sources
:members:
.. _api_trainer_config_helpers_evaluators:
==========
Evaluators
==========
Base
====
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: evaluator_base
:noindex:
Classification
==============
classification_error_evaluator
------------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: classification_error_evaluator
:noindex:
auc_evaluator
-------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: auc_evaluator
:noindex:
ctc_error_evaluator
-------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: ctc_error_evaluator
:noindex:
chunk_evaluator
---------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: chunk_evaluator
:noindex:
precision_recall_evaluator
--------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: precision_recall_evaluator
:noindex:
Rank
====
pnpair_evaluator
----------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: pnpair_evaluator
:noindex:
Utils
=====
sum_evaluator
-------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: sum_evaluator
:noindex:
column_sum_evaluator
--------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: column_sum_evaluator
:noindex:
Print
=====
classification_error_printer_evaluator
--------------------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: classification_error_printer_evaluator
:noindex:
gradient_printer_evaluator
--------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: gradient_printer_evaluator
:noindex:
maxid_printer_evaluator
-----------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: maxid_printer_evaluator
:noindex:
maxframe_printer_evaluator
---------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: maxframe_printer_evaluator
:noindex:
seqtext_printer_evaluator
-------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: seqtext_printer_evaluator
:noindex:
value_printer_evaluator
-----------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: value_printer_evaluator
:noindex:
.. _api_trainer_config_helpers_layers:
======
Layers
======
Base
======
LayerType
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: LayerType
:noindex:
LayerOutput
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: LayerOutput
:noindex:
Data layer
===========
.. _api_trainer_config_helpers_layers_data_layer:
data_layer
----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: data_layer
:noindex:
Fully Connected Layers
======================
.. _api_trainer_config_helpers_layers_fc_layer:
fc_layer
--------
.. automodule:: paddle.trainer_config_helpers.layers
:members: fc_layer
:noindex:
selective_fc_layer
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: selective_fc_layer
:noindex:
Conv Layers
===========
conv_operator
-------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: conv_operator
:noindex:
conv_projection
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: conv_projection
:noindex:
conv_shift_layer
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: conv_shift_layer
:noindex:
img_conv_layer
--------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: img_conv_layer
:noindex:
.. _api_trainer_config_helpers_layers_context_projection:
context_projection
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: context_projection
:noindex:
Image Pooling Layer
===================
img_pool_layer
--------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: img_pool_layer
:noindex:
spp_layer
--------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: spp_layer
:noindex:
maxout_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: maxout_layer
:noindex:
Norm Layer
==========
img_cmrnorm_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: img_cmrnorm_layer
:noindex:
batch_norm_layer
---------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: batch_norm_layer
:noindex:
sum_to_one_norm_layer
---------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: sum_to_one_norm_layer
:noindex:
Recurrent Layers
================
recurrent_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: recurrent_layer
:noindex:
lstmemory
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: lstmemory
:noindex:
grumemory
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: grumemory
:noindex:
Recurrent Layer Group
=====================
memory
------
.. automodule:: paddle.trainer_config_helpers.layers
:members: memory
:noindex:
recurrent_group
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: recurrent_group
:noindex:
lstm_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: lstm_step_layer
:noindex:
gru_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: gru_step_layer
:noindex:
beam_search
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: beam_search
:noindex:
get_output_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: get_output_layer
:noindex:
Mixed Layer
===========
.. _api_trainer_config_helpers_layers_mixed_layer:
mixed_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: mixed_layer
:noindex:
.. _api_trainer_config_helpers_layers_embedding_layer:
embedding_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: embedding_layer
:noindex:
scaling_projection
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: scaling_projection
:noindex:
dotmul_projection
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: dotmul_projection
:noindex:
dotmul_operator
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: dotmul_operator
:noindex:
full_matrix_projection
----------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: full_matrix_projection
:noindex:
identity_projection
-------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: identity_projection
:noindex:
table_projection
----------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: table_projection
:noindex:
trans_full_matrix_projection
----------------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: trans_full_matrix_projection
:noindex:
Aggregate Layers
================
.. _api_trainer_config_helpers_layers_pooling_layer:
pooling_layer
-------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: pooling_layer
:noindex:
.. _api_trainer_config_helpers_layers_last_seq:
last_seq
--------
.. automodule:: paddle.trainer_config_helpers.layers
:members: last_seq
:noindex:
.. _api_trainer_config_helpers_layers_first_seq:
first_seq
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: first_seq
:noindex:
concat_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: concat_layer
:noindex:
seq_concat_layer
----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: seq_concat_layer
:noindex:
Reshaping Layers
================
block_expand_layer
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: block_expand_layer
:noindex:
.. _api_trainer_config_helpers_layers_expand_layer:
expand_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: expand_layer
:noindex:
repeat_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: repeat_layer
:noindex:
rotate_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: rotate_layer
:noindex:
seq_reshape_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: seq_reshape_layer
:noindex:
Math Layers
===========
addto_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: addto_layer
:noindex:
linear_comb_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: linear_comb_layer
:noindex:
interpolation_layer
-------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: interpolation_layer
:noindex:
bilinear_interp_layer
----------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: bilinear_interp_layer
:noindex:
power_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: power_layer
:noindex:
scaling_layer
-------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: scaling_layer
:noindex:
slope_intercept_layer
----------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: slope_intercept_layer
:noindex:
tensor_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: tensor_layer
:noindex:
.. _api_trainer_config_helpers_layers_cos_sim:
cos_sim
-------
.. automodule:: paddle.trainer_config_helpers.layers
:members: cos_sim
:noindex:
trans_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: trans_layer
:noindex:
Sampling Layers
===============
maxid_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: maxid_layer
:noindex:
sampling_id_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: sampling_id_layer
:noindex:
Slicing and Joining Layers
==========================
pad_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: pad_layer
:noindex:
.. _api_trainer_config_helpers_layers_cost_layers:
Cost Layers
===========
cross_entropy
-------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: cross_entropy
:noindex:
cross_entropy_with_selfnorm
---------------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: cross_entropy_with_selfnorm
:noindex:
multi_binary_label_cross_entropy
--------------------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: multi_binary_label_cross_entropy
:noindex:
huber_cost
----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: huber_cost
:noindex:
lambda_cost
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: lambda_cost
:noindex:
rank_cost
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: rank_cost
:noindex:
crf_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: crf_layer
:noindex:
crf_decoding_layer
-------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: crf_decoding_layer
:noindex:
ctc_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: ctc_layer
:noindex:
warp_ctc_layer
--------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: warp_ctc_layer
:noindex:
nce_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: nce_layer
:noindex:
hsigmoid
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: hsigmoid
:noindex:
sum_cost
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: sum_cost
:noindex:
Check Layer
============
eos_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: eos_layer
:noindex:
========
Networks
========
The networks module contains pieces of neural network that combine multiple layers.
NLP
===
sequence_conv_pool
------------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: sequence_conv_pool
:noindex:
.. _api_trainer_config_helpers_network_text_conv_pool:
text_conv_pool
--------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: text_conv_pool
:noindex:
Images
======
img_conv_bn_pool
----------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: img_conv_bn_pool
:noindex:
img_conv_group
--------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: img_conv_group
:noindex:
.. _api_trainer_config_helpers_network_simple_img_conv_pool:
simple_img_conv_pool
--------------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: simple_img_conv_pool
:noindex:
vgg_16_network
---------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: vgg_16_network
:noindex:
Recurrent
=========
LSTM
----
lstmemory_unit
``````````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: lstmemory_unit
:noindex:
lstmemory_group
```````````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: lstmemory_group
:noindex:
simple_lstm
```````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: simple_lstm
:noindex:
bidirectional_lstm
``````````````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: bidirectional_lstm
:noindex:
GRU
---
gru_unit
````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: gru_unit
:noindex:
gru_group
`````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: gru_group
:noindex:
simple_gru
``````````
.. automodule:: paddle.trainer_config_helpers.networks
:members: simple_gru
:noindex:
simple_attention
----------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: simple_attention
:noindex:
Miscs
=====
dropout_layer
--------------
.. automodule:: paddle.trainer_config_helpers.networks
:members: dropout_layer
:noindex:
outputs
-------
.. automodule:: paddle.trainer_config_helpers.networks
:members: outputs
:noindex:
.. _api_trainer_config_helpers_optimizers:
==========
Optimizers
==========
BaseSGDOptimizer
================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: BaseSGDOptimizer
:noindex:
MomentumOptimizer
=================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: MomentumOptimizer
:noindex:
AdamOptimizer
=============
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: AdamOptimizer
:noindex:
AdamaxOptimizer
================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: AdamaxOptimizer
:noindex:
AdaGradOptimizer
================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: AdaGradOptimizer
:noindex:
DecayedAdaGradOptimizer
=======================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: DecayedAdaGradOptimizer
:noindex:
AdaDeltaOptimizer
=================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: AdaDeltaOptimizer
:noindex:
RMSPropOptimizer
================
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: RMSPropOptimizer
:noindex:
.. _api_trainer_config_helpers_optimizers_settings:
settings
========
.. automodule:: paddle.trainer_config_helpers.optimizers
:members: settings
:noindex:
========
Poolings
========
BasePoolingType
===============
.. automodule:: paddle.trainer_config_helpers.poolings
:members: BasePoolingType
:noindex:
AvgPooling
==========
.. automodule:: paddle.trainer_config_helpers.poolings
:members: AvgPooling
:noindex:
MaxPooling
==========
.. automodule:: paddle.trainer_config_helpers.poolings
:members: MaxPooling
:noindex:
SumPooling
==========
.. automodule:: paddle.trainer_config_helpers.poolings
:members: SumPooling
:noindex:
SquareRootNPooling
==================
.. automodule:: paddle.trainer_config_helpers.poolings
:members: SquareRootNPooling
:noindex:
================
Data Related API
================
#########
DataTypes
#########
.. automodule:: paddle.v2.data_type
:members:
##########
DataFeeder
##########
.. automodule:: paddle.v2.data_feeder
:members:
######
Reader
######
.. automodule:: paddle.v2.reader
:members:
.. automodule:: paddle.v2.reader.creator
:members:
#########
minibatch
#########
.. automodule:: paddle.v2.minibatch
:members:
#######
Dataset
#######
.. automodule:: paddle.v2.dataset
:members:
mnist
+++++
.. automodule:: paddle.v2.dataset.mnist
:members:
cifar
+++++
.. automodule:: paddle.v2.dataset.cifar
:members:
conll05
+++++++
.. automodule:: paddle.v2.dataset.conll05
:members:
imdb
++++
.. automodule:: paddle.v2.dataset.imdb
:members:
imikolov
++++++++
.. automodule:: paddle.v2.dataset.imikolov
:members:
movielens
+++++++++
.. automodule:: paddle.v2.dataset.movielens
:members:
sentiment
+++++++++
.. automodule:: paddle.v2.dataset.sentiment
:members:
uci_housing
+++++++++++
.. automodule:: paddle.v2.dataset.uci_housing
:members:
#########################
Configuration Related API
#########################
======
Layers
======
.. automodule:: paddle.v2.layer
:members:
==========
Attributes
==========
.. automodule:: paddle.v2.attr
:members:
===========
Activations
===========
.. automodule:: paddle.v2.activation
:members:
========
Poolings
========
.. automodule:: paddle.v2.pooling
:members:
========
Networks
========
.. automodule:: paddle.v2.networks
:members:
==========
Optimizers
==========
.. automodule:: paddle.v2.optimizer
:members:
###########
Trainer API
###########
==========
Parameters
==========
.. automodule:: paddle.v2.parameters
:members:
=======
Trainer
=======
.. automodule:: paddle.v2.trainer
:members:
=====
Event
=====
.. automodule:: paddle.v2.event
:members:
=========
Inference
=========
.. autofunction:: paddle.v2.infer
\ No newline at end of file
# PaddlePaddle Design Doc
## Ingredients
As our design principle is starting from the essence: how could we
allow users to express and solve their problems at neural networks.
Some essential concepts that our API have to provide include:
1. A *topology* is an expression of *layers*.
1. A layer could be any kind of computation, including *cost*.
1. Some layers have parameters, some don't. Most costs don't have
parameters.
1. In some topologies, layers share parameters. For
example,
[the network for training a ranking model](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850).
1. At programming time, users specify topologies and possible sharing
of parameters. PaddlePaddle can figure out and create parameters
required (and possibly shared) by one or more topologies.
## Starting from Examples
As a summarization
of
[our disucssion](https://github.com/PaddlePaddle/Paddle/issues/1315),
let us present two examples here:
### Example 1. Sharing Parameters between Layers
We use
the
[3-branch ranking](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850) model
in this example. For your convenience, I copy-a-paste the model's
topology as follows:
```
A -> f -\
Q -> f --> cost
B -> f -/
```
The following program trains the topology including the cost, and then
use the sub-network in the trained topology in inference:
```python
def f(in):
e = paddle.layer.embedding(in, parameter_name="embedding")
o = paddle.layer.softmax(e, parameter_name="semantic")
return o
# Create 3 topologies (subnets), they share parameters because all
# correspoinding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))
topology = paddle.layer.less_than(
paddle.layer.cross_entropy(fA, fQ),
paddle.layer.corss_entropy(fB, fQ))
# Derive parameters required in topology and create them in model.
parameters = paddle.parameters.create(topology)
# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)
# Inference using fA (or fB or fC, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA)
```
### Example 2. Sharing Parameters between "Models"
We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in
this example. In the following example program, `d0` and `d1`
correspond to the two networks in the following figure:
<img src="https://github.com/wangyang59/book/raw/00036f4b0da5225041a6824587c1a01cf20159b1/gan/image/gan_ig.png" width=400 />
```python
def G(in):
# over-simplified example as G has only one layers:
return paddle.layer.fc(in, parameter_name="G")
def D(in);
# again, over-simplified:
return paddle.layer.fc(in, parameter_name="D")
# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))
# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 share parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))
# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])
# Iterative training of GAN.
for ...:
train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
train(d1, parameters, reader=read_from_realistic_images)
# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
```
### Summarization
Above two programs reveal some important design concerns:
1. Users describe a topology as an expression of layers. Every layer
has a *parameter name*. If the users don't specify it explicitly, it's automatically generated as a unique name. By
specifying the parameter name, users can specify the sharing of
parameters between layers and even between topologies.
1. `paddle.parameters.create` figures out parameters required by one
or more topologies from parameter names of layers. It creates these
parameters and returns a `ParameterSet` object, which is in essence
a map from *parameter names* to *parameters*.
1. At training and inference time, `paddle.train` and `paddle.infer`
requires both a topology and the parameter set that holds the parameters of that topology. There are some reasons:
1. This prevents users from forgetting to call
`paddle.parameters.create`.
1. `paddle.train` needs to know which parameter set to update.
1. Users could load another (pre-trained) parameter set and use it
with a topology in `train.infer`.
1. By specifying the `immutable_parameters` parameter of
`paddle.train`, we can forbid the update of these parameters.
## Reader
Not all programming frameworks allow users to define I/O functions.
An example is Google MapReduce, which can only read from text,
SSTable, and RecordIO files. Hadoop MapReduce allows users to define
readers and writers by deriving from base classes `Reader` and
`Writer`. The former is less flexible but also less error-prone. We
decide to provide the flexibility to users to define their readers.
There are some open questions here:
1. **Should a reader return a Python dictionary?**
1. **How to map multiple outputs from a reader to multiple data layers?**
1. **How to easily compose some existing readers to read more data and
feed a topology with more data layers?**
## Training
The recommended way to training a model is to call `paddle.train`,
which simply calls `paddle.trainer.Default`, a global variable of
type `paddle.trainer.SGD`. Equivalently, we can do
```python
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
```
### Updater
Please be aware that a trainer can accept an updater as its data
member, where an updater is a class derived from
`paddle.trainer.Updater`. This is to make it easier to customize
trainers, as discussed
[here](https://github.com/PaddlePaddle/Paddle/issues/1319).
### Event Handler
`paddle.train` and `paddle.trainer.XXX.train` take an optional
parameter `event_handler`, which should be either `None` or a function
that handle some events:
1. BeginTraining
1. EndTraining
1. BeginIteration
1. EndIteration
1. BeginPass
1. EndPass
where EndPass is sent if and only if the reader yields
`end_pass=True`.
An example as follows:
```python
def event_handler(event):
if ininstance(event, paddle.event.EndIteration):
print paddle.test(...)
paddle.train(topology, parameters, reader, event_handler)
```
If we are writing a PaddlePaddle program in and for iPython/Jypyter,
we can use metaplotlib in the event handler to plot a curve of
cost/error versus iterations, as shown
[here](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/).
### Distributed Training
If users want to do distributed training on a cluster, s/he should
call `paddle.dist_train` and provides access tokens to the cluster as
a parameter.
For example, if the user has a TLS certificate that allows him to
access a Kubernetes cluster, s/he should be able to call
```python
paddle.dist_train(model,
trainer=paddle.trainer.SGD(...,
paddle.updater.Adam(...)),
reader=read,
k8s_user="yi",
k8s_token="kube_cluster_tls.pem",
k8s_job="hello",
num_parameter_servers=15)
```
The pseudo code if `paddle.dist_train` is as follows:
```python
def dist_train(topology, parameters, trainer, reader, ...):
if os.getenv("KUBERNETES_SERVICE_HOST") == None:
image_name = k8s_user + '/' + k8s_job
docker_build(image_name)
docker_push()
kube_ctrl_start_job(image_name, k8s_user, k8s_token)
else:
rank = kube_list_containers_in_job_and_return_current_containers_rank()
if rank == 0:
master()
elif rank < 15:
parameter_server()
else:
trainer.train(model, reader=read)
```
Please be aware that if a process is running on the Kubernetes
cluster, it will have some environment variables pre-defined.
If `dist_train` doesn't see these environment variables, it knows
that it's running on users' personal computer, and it should work as a
*launcher*. Otherwise, it knows that it's running on the cluster and
need to figure out its role as either the master, or a trainer, or a
parameter server.
# Python Data Reader Design Doc
At training and testing time, PaddlePaddle programs need to read data. To ease the users' work to write data reading code, we define that
- A *reader* is a function that reads data (from file, network, random number generator, etc) and yields data items.
- A *reader creator* is a function that returns a reader function.
- A *reader decorator* is a function, which accepts one or more readers, and returns a reader.
- A *batch reader* is a function that reads data (from *reader*, file, network, random number generator, etc) and yields a batch of data items.
and provide function which converts reader to batch reader, frequently used reader creators and reader decorators.
## Data Reader Interface
Indeed, *data reader* doesn't have to be a function that reads and yields data items. It can be any function with no parameter that creates a iterable (anything can be used in `for x in iterable`):
```
iterable = data_reader()
```
Element produced from the iterable should be a **single** entry of data, **not** a mini batch. That entry of data could be a single item, or a tuple of items. Item should be of [supported type](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., numpy 1d array of float32, int, list of int)
An example implementation for single item data reader creator:
```python
def reader_creator_random_image(width, height):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height)
return reader
```
An example implementation for multiple item data reader creator:
```python
def reader_creator_random_image_and_label(width, height, label):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height), label
return reader
```
## Batch Reader Interface
*batch reader* can be any function with no parameter that creates a iterable (anything can be used in `for x in iterable`). The output of the iterable should be a batch (list) of data items. Each item inside the list must be a tuple.
Here are valid outputs:
```python
# a mini batch of three data items. Each data item consist three columns of data, each of which is 1.
[(1, 1, 1),
(2, 2, 2),
(3, 3, 3)]
# a mini batch of three data items, each data item is a list (single column).
[([1,1,1],),
([2,2,2],),
([3,3,3],),
```
Please note that each item inside the list must be a tuple, below is an invalid output:
```python
# wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
# Otherwise it's ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
# or three column of datas, each of which is 1.
[[1,1,1],
[2,2,2],
[3,3,3]]
```
It's easy to convert from reader to batch reader:
```python
mnist_train = paddle.dataset.mnist.train()
mnist_train_batch_reader = paddle.batch(mnist_train, 128)
```
Also easy to create custom batch reader:
```python
def custom_batch_reader():
while True:
batch = []
for i in xrange(128):
batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended.
yield batch
mnist_random_image_batch_reader = custom_batch_reader
```
## Usage
batch reader, mapping from item(s) read to data layer, batch size and number of total pass will be passed into `paddle.train`:
```python
# two data layer is created:
image_layer = paddle.layer.data("image", ...)
label_layer = paddle.layer.data("label", ...)
# ...
batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128)
paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
```
## Data Reader Decorator
*Data reader decorator* takes a single or multiple data reader, returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use `@` syntax.
Since we have a strict interface for data readers (no parameter, return a single data item). Data reader can be used flexiable via data reader decorators. Following are a few examples:
### Prefetch Data
Since reading data may take time and training can not proceed without data. It is generally a good idea to prefetch data.
Use `paddle.reader.buffered` to prefetch data:
```python
buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
```
`buffered_reader` will try to buffer (prefetch) `100` data entries.
### Compose Multiple Data Readers
For example, we want to use a source of real images (reusing mnist dataset), and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
We can do:
```python
def reader_creator_random_image(width, height):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height)
return reader
def reader_creator_bool(t):
def reader:
while True:
yield t
return reader
true_reader = reader_creator_bool(True)
false_reader = reader_creator_bool(False)
reader = paddle.reader.compose(paddle.dataset.mnist.train(), data_reader_creator_random_image(20, 20), true_reader, false_reader)
# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
# And we don't care second item at this time.
paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
```
### Shuffle
Given shuffle buffer size `n`, `paddle.reader.shuffle` will return a data reader that buffers `n` data entries and shuffle them before a data entry is read.
Example:
```python
reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
```
## Q & A
### Why reader return only a single entry, but not a mini batch?
Always returning a single entry make reusing existing data readers much easier (e.g., if existing reader return not a single entry but 3 entries, training code will be more complex because it need to handle cases like batch size 2).
We provide function `paddle.batch` to turn (single entry) reader into batch reader.
### Why do we need batch reader, isn't train take reader and batch_size as arguments sufficient?
In most of the case, train taking reader and batch_size as arguments would be sufficent. However sometimes user want to customize order of data entries inside a mini batch. Or even change batch size dynamically.
### Why use a dictionary but not a list to provide mapping?
We decided to use dictionary (`{"image":0, "label":1}`) instead of list (`["image", "label"]`) is because that user can easily resue item (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or skip item (e.g., using `{"image_a":0, "label":2}`).
### How to create custom data reader creator
```python
def image_reader_creator(image_path, label_path, n):
def reader():
f = open(image_path)
l = open(label_path)
images = numpy.fromfile(
f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32')
images = images / 255.0 * 2.0 - 1.0
labels = numpy.fromfile(l, 'ubyte', count=n).astype("int")
for i in xrange(n):
yield images[i, :], labels[i] # a single entry of data is created each time
f.close()
l.close()
return reader
# images_reader_creator creates a reader
reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024)
paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
```
### How is `paddle.train` implemented
An example implementation of paddle.train could be:
```python
def train(batch_reader, mapping, batch_size, total_pass):
for pass_idx in range(total_pass):
for mini_batch in batch_reader(): # this loop will never end in online learning.
do_forward_backward(mini_batch, mapping)
```
Simple Linear Regression
========================
PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can easily train a classic neural network within a couple lines of configuration, or you can build sophisticated models that provide state-of-the-art performance on difficult learning tasks like sentiment analysis, machine translation, image caption and so on.
Problem Background
------------------
Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - `simple linear regression <https://en.wikipedia.org/wiki/Simple_linear_regression>`_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures relationship between them by observing real estate markets.
Prepare the Data
-----------------
Suppose the true relationship can be characterized as ``Y = 2X + 0.3``, let's see how to recover this pattern only from observed data. Here is a piece of python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory, the only extra thing you need to add for PaddlePaddle is a definition of input data types.
.. code-block:: python
# dataprovider.py
from paddle.trainer.PyDataProvider2 import *
import random
# define data types of input: 2 real numbers
@provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
def process(settings, input_file):
for i in xrange(2000):
x = random.random()
yield [x], [2*x+0.3]
Train a NeuralNetwork
----------------------
To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies, it's just saying that we are starting from a random line ``Y' = wX + b`` , then we gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle:
.. code-block:: python
# trainer_config.py
from paddle.trainer_config_helpers import *
# 1. read data. Suppose you saved above python code as dataprovider.py
data_file = 'empty.list'
with open(data_file, 'w') as f: f.writelines(' ')
define_py_data_sources2(train_list=data_file, test_list=None,
module='dataprovider', obj='process',args={})
# 2. learning algorithm
settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer())
# 3. Network configuration
x = data_layer(name='x', size=1)
y = data_layer(name='y', size=1)
y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
cost = regression_cost(input=y_predict, label=y)
outputs(cost)
Some of the most fundamental usages of PaddlePaddle are demonstrated:
- The first part shows how to feed data into PaddlePaddle. In general cases, PaddlePaddle reads raw data from a list of files, and then do some user-defined process to get real input. In this case, we only need to create a placeholder file since we are generating synthetic data on the fly.
- The second part describes learning algorithm. It defines in what ways adjustments are made to model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum based optimizer will suffice here, and it processes 12 data points each time.
- Finally, the network configuration. It usually is as simple as "stacking" layers. Three kinds of layers are used in this configuration:
- **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for ``X`` and ``Y``.
- **FC Layer**: FC layer is short for Fully Connected Layer, which connects all the input units to current layer and does the actual computation specified as activation function. Computation layers like this are the fundamental building blocks of a deeper model.
- **Cost Layer**: in training phase, cost layers are usually the last layers of the network. They measure the performance of current model, and provide guidence to adjust parameters.
Now that everything is ready, you can train the network with a simple command line call:
.. code-block:: bash
paddle train --config=trainer_config.py --save_dir=./output --num_passes=30
This means that PaddlePaddle will train this network on the synthectic dataset for 30 passes, and save all the models under path ``./output``. You will see from the messages printed out during training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess.
Evaluate the Model
-------------------
Usually, a different dataset that left out during training phase should be used to evalute the models. However, we are lucky enough to know the real answer: ``w=2, b=0.3``, thus a better option is to check out model parameters directly.
In PaddlePaddle, training is just to get a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual file in the popular ``numpy`` array format. Here is the code that reads parameters from last pass.
.. code-block:: python
import numpy as np
import os
def load(file_name):
with open(file_name, 'rb') as f:
f.read(16) # skip header for float type.
return np.fromfile(f, dtype=np.float32)
print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b'))
# w=1.999743, b=0.300137
.. image:: parameters.png
:align: center
Although starts from a random guess, you can see that value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.
There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.
Installing from Sources
==========================
* [1. Download and Setup](#download)
* [2. Requirements](#requirements)
* [3. Build on Ubuntu](#ubuntu)
* [4. Build on Centos](#centos)
## <span id="download">Download and Setup</span>
You can download PaddlePaddle from the [github source](https://github.com/PaddlePaddle/Paddle).
```bash
git clone https://github.com/PaddlePaddle/Paddle paddle
cd paddle
```
## <span id="requirements">Requirements</span>
To compile the source code, your computer must be equipped with the following dependencies.
- **Compiler**: GCC >= 4.8 or Clang >= 3.3 (AppleClang >= 5.1) and gfortran compiler
- **CMake**: CMake >= 3.0 (at least CMake 3.4 on Mac OS X)
- **BLAS**: MKL, OpenBlas or ATLAS
- **Python**: only support Python 2.7
**Note:** For CUDA 7.0 and CUDA 7.5, GCC 5.0 and up are not supported!
For CUDA 8.0, GCC versions later than 5.3 are not supported!
### Options
PaddlePaddle supports some build options.
<html>
<table>
<thead>
<tr>
<th scope="col" class="left">Optional</th>
<th scope="col" class="left">Description</th>
</tr>
</thead>
<tbody>
<tr><td class="left">WITH_GPU</td><td class="left">Compile PaddlePaddle with NVIDIA GPU</td></tr>
<tr><td class="left">WITH_AVX</td><td class="left">Compile PaddlePaddle with AVX intrinsics</td></tr>
<tr><td class="left">WITH_DSO</td><td class="left">Compile PaddlePaddle with dynamic linked CUDA</td></tr>
<tr><td class="left">WITH_TESTING</td><td class="left">Compile PaddlePaddle with unit testing</td></tr>
<tr><td class="left">WITH_SWIG_PY</td><td class="left">Compile PaddlePaddle with inference api</td></tr>
<tr><td class="left">WITH_STYLE_CHECK</td><td class="left">Compile PaddlePaddle with style check</td></tr>
<tr><td class="left">WITH_PYTHON</td><td class="left">Compile PaddlePaddle with python interpreter</td></tr>
<tr><td class="left">WITH_DOUBLE</td><td class="left">Compile PaddlePaddle with double precision</td></tr>
<tr><td class="left">WITH_RDMA</td><td class="left">Compile PaddlePaddle with RDMA support</td></tr>
<tr><td class="left">WITH_TIMER</td><td class="left">Compile PaddlePaddle with stats timer</td></tr>
<tr><td class="left">WITH_PROFILER</td><td class="left">Compile PaddlePaddle with GPU profiler</td></tr>
<tr><td class="left">WITH_DOC</td><td class="left">Compile PaddlePaddle with documentation</td></tr>
<tr><td class="left">ON_COVERALLS</td><td class="left">Compile PaddlePaddle with code coverage</td></tr>
<tr><td class="left">COVERALLS_UPLOAD</td><td class="left">Package code coverage data to coveralls</td></tr>
<tr><td class="left">ON_TRAVIS</td><td class="left">Exclude special unit test on Travis CI</td></tr>
</tbody>
</table>
</html>
**Note:**
- The GPU version works best with Cuda Toolkit 8.0 and cuDNN v5.
- Other versions like Cuda Toolkit 7.0, 7.5 and cuDNN v3, v4 are also supported.
- **To utilize cuDNN v5, Cuda Toolkit 7.5 is prerequisite and vice versa.**
As a simple example, consider the following:
1. **BLAS Dependencies(optional)**
CMake will search BLAS libraries from system. If not found, OpenBLAS will be downloaded, built and installed automatically.
To utilize preinstalled BLAS, you can simply specify MKL, OpenBLAS or ATLAS via `MKL_ROOT`, `OPENBLAS_ROOT` or `ATLAS_ROOT`.
```bash
# specify MKL
cmake .. -DMKL_ROOT=<mkl_path>
# or specify OpenBLAS
cmake .. -DOPENBLAS_ROOT=<openblas_path>
```
2. **Doc Dependencies(optional)**
To generate PaddlePaddle's documentation, install dependencies and set `-DWITH_DOC=ON` as follows:
```bash
pip install 'sphinx>=1.4.0'
pip install sphinx_rtd_theme recommonmark
# install doxygen on Ubuntu
sudo apt-get install doxygen
# install doxygen on Mac OS X
brew install doxygen
# active docs in cmake
cmake .. -DWITH_DOC=ON`
```
## <span id="ubuntu">Build on Ubuntu 14.04</span>
### Install Dependencies
- **Paddle Dependencies**
```bash
# necessary
sudo apt-get update
sudo apt-get install -y git curl gcc g++ gfortran make build-essential automake
sudo apt-get install -y python python-pip python-numpy libpython-dev bison
sudo pip install 'protobuf==3.1.0.post1'
# install cmake 3.4
curl -sSL https://cmake.org/files/v3.4/cmake-3.4.1.tar.gz | tar -xz && \
cd cmake-3.4.1 && ./bootstrap && make -j4 && sudo make install && \
cd .. && rm -rf cmake-3.4.1
```
- **GPU Dependencies (optional)**
To build GPU version, you will need the following installed:
1. a CUDA-capable GPU
2. A supported version of Linux with a gcc compiler and toolchain
3. NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads)
4. NVIDIA cuDNN Library (availabel at https://developer.nvidia.com/cudnn)
The CUDA development environment relies on tight integration with the host development environment,
including the host compiler and C runtime libraries, and is therefore only supported on
distribution versions that have been qualified for this CUDA Toolkit release.
After downloading cuDNN library, issue the following commands:
```bash
sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
Then you need to set LD\_LIBRARY\_PATH, PATH environment variables in ~/.bashrc.
```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
### Build and Install
As usual, the best option is to create build folder under paddle project directory.
```bash
mkdir build && cd build
```
Finally, you can build and install PaddlePaddle:
```bash
# you can add build option here, such as:
cmake .. -DCMAKE_INSTALL_PREFIX=<path to install>
# please use sudo make install, if you want to install PaddlePaddle into the system
make -j `nproc` && make install
# set PaddlePaddle installation path in ~/.bashrc
export PATH=<path to install>/bin:$PATH
# install PaddlePaddle Python modules.
sudo pip install <path to install>/opt/paddle/share/wheels/*.whl
```
## <span id="centos">Build on Centos 7</span>
### Install Dependencies
- **CPU Dependencies**
```bash
# necessary
sudo yum update
sudo yum install -y epel-release
sudo yum install -y make cmake3 python-devel python-pip gcc-gfortran swig git
sudo pip install wheel numpy
sudo pip install 'protobuf>=3.0.0'
```
- **GPU Dependencies (optional)**
To build GPU version, you will need the following installed:
1. a CUDA-capable GPU
2. A supported version of Linux with a gcc compiler and toolchain
3. NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads)
4. NVIDIA cuDNN Library (availabel at https://developer.nvidia.com/cudnn)
The CUDA development environment relies on tight integration with the host development environment,
including the host compiler and C runtime libraries, and is therefore only supported on
distribution versions that have been qualified for this CUDA Toolkit release.
After downloading cuDNN library, issue the following commands:
```bash
sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
Then you need to set LD\_LIBRARY\_PATH, PATH environment variables in ~/.bashrc.
```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
### Build and Install
As usual, the best option is to create build folder under paddle project directory.
```bash
mkdir build && cd build
```
Finally, you can build and install PaddlePaddle:
```bash
# you can add build option here, such as:
cmake3 .. -DCMAKE_INSTALL_PREFIX=<path to install>
# please use sudo make install, if you want to install PaddlePaddle into the system
make -j `nproc` && make install
# set PaddlePaddle installation path in ~/.bashrc
export PATH=<path to install>/bin:$PATH
# install PaddlePaddle Python modules.
sudo pip install <path to install>/opt/paddle/share/wheels/*.whl
```
PaddlePaddle in Docker Containers
=================================
Docker container is currently the only officially-supported way to
running PaddlePaddle. This is reasonable as Docker now runs on all
major operating systems including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Dockers settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resource on Mac OS X and Windows.
Development Using Docker
------------------------
Developers can work on PaddlePaddle using Docker. This allows
developers to work on different platforms -- Linux, Mac OS X, and
Windows -- in a consistent way.
1. Build the Development Environment as a Docker Image
.. code-block:: bash
git clone --recursive https://github.com/PaddlePaddle/Paddle
cd Paddle
docker build -t paddle:dev -f paddle/scripts/docker/Dockerfile .
Note that by default :code:`docker build` wouldn't import source
tree into the image and build it. If we want to do that, we need
to set a build arg:
.. code-block:: bash
docker build -t paddle:dev -f paddle/scripts/docker/Dockerfile --build-arg BUILD_AND_INSTALL=ON .
2. Run the Development Environment
Once we got the image :code:`paddle:dev`, we can use it to develop
Paddle by mounting the local source code tree into a container that
runs the image:
.. code-block:: bash
docker run -d -p 2202:22 -p 8888:8888 -v $PWD:/paddle paddle:dev
This runs a container of the development environment Docker image
with the local source tree mounted to :code:`/paddle` of the
container.
Note that the default entry-point of :code:`paddle:dev` is
:code:`sshd`, and above :code:`docker run` commands actually starts
an SSHD server listening on port 2202. This allows us to log into
this container with:
.. code-block:: bash
ssh root@localhost -p 2202
Usually, I run above commands on my Mac. I can also run them on a
GPU server :code:`xxx.yyy.zzz.www` and ssh from my Mac to it:
.. code-block:: bash
my-mac$ ssh root@xxx.yyy.zzz.www -p 2202
3. Build and Install Using the Development Environment
Once I am in the container, I can use
:code:`paddle/scripts/docker/build.sh` to build, install, and test
Paddle:
.. code-block:: bash
/paddle/paddle/scripts/docker/build.sh
This builds everything about Paddle in :code:`/paddle/build`. And
we can run unit tests there:
.. code-block:: bash
cd /paddle/build
ctest
4. Run PaddlePaddle Book under Docker Container
The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and explanatory text in a single browser.
PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
We already exposed port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
Once you are inside the container, simply issue the command:
.. code-block:: bash
jupyter notebook
Then, you would back and paste the address into the local browser:
.. code-block:: text
http://localhost:8888/
That's all. Enjoy your journey!
CPU-only and GPU Images
-----------------------
For each version of PaddlePaddle, we release 2 Docker images, a
CPU-only one and a CUDA GPU one. We do so by configuring
`dockerhub.com <https://hub.docker.com/r/paddledev/paddle/>`_
automatically runs the following commands:
.. code-block:: bash
docker build -t paddle:cpu -f paddle/scripts/docker/Dockerfile .
docker build -t paddle:gpu -f paddle/scripts/docker/Dockerfile.gpu .
To run the CPU-only image as an interactive container:
.. code-block:: bash
docker run -it --rm paddledev/paddle:cpu-latest /bin/bash
or, we can run it as a daemon container
.. code-block:: bash
docker run -d -p 2202:22 paddledev/paddle:cpu-latest
and SSH to this container using password :code:`root`:
.. code-block:: bash
ssh -p 2202 root@localhost
An advantage of using SSH is that we can connect to PaddlePaddle from
more than one terminals. For example, one terminal running vi and
another one running Python interpreter. Another advantage is that we
can run the PaddlePaddle container on a remote server and SSH to it
from a laptop.
Above methods work with the GPU image too -- just please don't forget
to install CUDA driver and let Docker knows about it:
.. code-block:: bash
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
Non-AVX Images
--------------
Please be aware that the CPU-only and the GPU images both use the AVX
instruction set, but old computers produced before 2008 do not support
AVX. The following command checks if your Linux computer supports
AVX:
.. code-block:: bash
if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
If it doesn't, we will need to build non-AVX images manually from
source code:
.. code-block:: bash
cd ~
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
docker build --build-arg WITH_AVX=OFF -t paddle:cpu-noavx -f paddle/scripts/docker/Dockerfile .
docker build --build-arg WITH_AVX=OFF -t paddle:gpu-noavx -f paddle/scripts/docker/Dockerfile.gpu .
Documentation
-------------
Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_. This makes it easy
for users to browse and understand the C++ source code.
As long as we give the Paddle Docker container a name, we can run an
additional Nginx Docker container to serve the volume from the Paddle
container:
.. code-block:: bash
docker run -d --name paddle-cpu-doc paddle:cpu
docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx
Then we can direct our Web browser to the HTML version of source code
at http://localhost:8088/paddle/
Install and Build
=================
Install PaddlePaddle
----------------------
.. toctree::
:maxdepth: 1
docker_install_en.rst
ubuntu_install_en.rst
Build from Source
-----------------
.. warning::
Please use :code:`deb` package or :code:`docker` image to install paddle. The building guide is used for hacking or contributing PaddlePaddle source code.
.. toctree::
:maxdepth: 1
build_from_source_en.md
Debian Package installation guide
=================================
PaddlePaddle supports :code:`deb` pacakge. The installation of this :code:`deb` package is tested in ubuntu 14.04, but it should be support other debian based linux, too.
There are four versions of debian package, :code:`cpu`, :code:`gpu`, :code:`cpu-noavx`, :code:`gpu-noavx`. And :code:`noavx` version is used to support CPU which does not contain :code:`AVX` instructions. The download url of :code:`deb` package is \: https://github.com/baidu/Paddle/releases/
After downloading PaddlePaddle deb packages, you can use :code:`gdebi` install.
.. code-block:: bash
gdebi paddle-*.deb
If :code:`gdebi` is not installed, you can use :code:`sudo apt-get install gdebi` to install it.
Or you can use following commands to install PaddlePaddle.
.. code-block:: bash
dpkg -i paddle-*.deb
apt-get install -f
And if you use GPU version deb package, you need to install CUDA toolkit and cuDNN, and set related environment variables(such as LD_LIBRARY_PATH) first. It is normal when `dpkg -i` get errors. `apt-get install -f` will continue install paddle, and install dependences.
GET STARTED
============
.. toctree::
:maxdepth: 2
build_and_install/index_en.rst
basic_usage/index_en.rst
RNN Models
==========
.. toctree::
:maxdepth: 1
rnn_config_en.rst
RNN Configuration
=================
This tutorial will guide you how to configure recurrent neural network in PaddlePaddle. PaddlePaddle supports highly flexible and efficient recurrent neural network configuration. In this tutorial, you will learn how to:
- prepare sequence data for learning recurrent neural networks.
- configure recurrent neural network architecture.
- generate sequence with learned recurrent neural network models.
We will use vanilla recurrent neural network, and sequence to sequence model to guide you through these steps. The code of sequence to sequence model can be found at :code:`demo/seqToseq`.
=====================
Prepare Sequence Data
=====================
PaddlePaddle does not need any preprocessing to sequence data, such as padding. The only thing that needs to be done is to set the type of the corresponding type to input. For example, the following code snippets defines three input. All of them are sequences, and the size of them are :code:`src_dict`, :code:`trg_dict`, and :code:`trg_dict`:
.. code-block:: python
settings.input_types = [
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(settings.trg_dict)),
integer_value_sequence(len(settings.trg_dict))]
Then at the :code:`process` function, each :code:`yield` function will return three integer lists. Each integer list is treated as a sequence of integers:
.. code-block:: python
yield src_ids, trg_ids, trg_ids_next
For more details description of how to write a data provider, please refer to :ref:`api_pydataprovider2` . The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
===============================================
Configure Recurrent Neural Network Architecture
===============================================
-------------------------------------
Simple Gated Recurrent Neural Network
-------------------------------------
Recurrent neural network process a sequence at each time step sequentially. An example of the architecture of LSTM is listed below.
.. image:: ../../../tutorials/sentiment_analysis/src/bi_lstm.jpg
:align: center
Generally speaking, a recurrent network perform the following operations from :math:`t=1` to :math:`t=T`, or reversely from :math:`t=T` to :math:`t=1`.
.. math::
x_{t+1} = f_x(x_t), y_t = f_y(x_t)
where :math:`f_x(.)` is called **step function**, and :math:`f_y(.)` is called **output function**. In vanilla recurrent neural network, both of the step function and output function are very simple. However, PaddlePaddle supports the configuration of very complex architectures by modifying these two functions. We will use the sequence to sequence model with attention as an example to demonstrate how you can configure complex recurrent neural network models. In this section, we will use a simple vanilla recurrent neural network as an example of configuring simple recurrent neural network using :code:`recurrent_group`. Notice that if you only need to use simple RNN, GRU, or LSTM, then :code:`grumemory` and :code:`lstmemory` is recommended because they are more computationally efficient than :code:`recurrent_group`.
For vanilla RNN, at each time step, the **step function** is:
.. math::
x_{t+1} = W_x x_t + W_i I_t + b
where :math:`x_t` is the RNN state, and :math:`I_t` is the input, :math:`W_x` and :math:`W_i` are transformation matrices for RNN states and inputs, respectively. :math:`b` is the bias.
Its **output function** simply takes :math:`x_t` as the output.
:code:`recurrent_group` is the most important tools for constructing recurrent neural networks. It defines the **step function**, **output function** and the inputs of the recurrent neural network. Notice that the :code:`step` argument of this function implements both the :code:`step function` and the :code:`output function`:
.. code-block:: python
def simple_rnn(input,
size=None,
name=None,
reverse=False,
rnn_bias_attr=None,
act=None,
rnn_layer_attr=None):
def __rnn_step__(ipt):
out_mem = memory(name=name, size=size)
rnn_out = mixed_layer(input = [full_matrix_projection(ipt),
full_matrix_projection(out_mem)],
name = name,
bias_attr = rnn_bias_attr,
act = act,
layer_attr = rnn_layer_attr,
size = size)
return rnn_out
return recurrent_group(name='%s_recurrent_group' % name,
step=__rnn_step__,
reverse=reverse,
input=input)
PaddlePaddle uses memory to construct step function. **Memory** is the most important concept when constructing recurrent neural networks in PaddlePaddle. A memory is a state that is used recurrently in step functions, such as :math:`x_{t+1} = f_x(x_t)`. One memory contains an **output** and a **input**. The output of memory at the current time step is utilized as the input of the memory at the next time step. A memory can also has a **boot layer**, whose output is utilized as the initial value of the memory. In our case, the output of the gated recurrent unit is employed as the output memory. Notice that the name of the layer :code:`rnn_out` is the same as the name of :code:`out_mem`. This means the output of the layer :code:`rnn_out` (:math:`x_{t+1}`) is utilized as the **output** of :code:`out_mem` memory.
A memory can also be a sequence. In this case, at each time step, we have a sequence as the state of the recurrent neural network. This can be useful when constructing very complex recurrent neural network. Other advanced functions include defining multiple memories, and defining hierarchical recurrent neural network architecture using sub-sequence.
We return :code:`rnn_out` at the end of the function. It means that the output of the layer :code:`rnn_out` is utilized as the **output** function of the gated recurrent neural network.
-----------------------------------------
Sequence to Sequence Model with Attention
-----------------------------------------
We will use the sequence to sequence model with attention as an example to demonstrate how you can configure complex recurrent neural network models. An illustration of the sequence to sequence model with attention is shown in the following figure.
.. image:: ../../../tutorials/text_generation/encoder-decoder-attention-model.png
:align: center
In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`.
The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to :ref:`api_trainer_config_helpers_layers` for more details.
We also project the encoder vector to :code:`decoder_size` dimensional space, get the first instance of the backward recurrent network, and project it to :code:`decoder_size` dimensional space:
.. code-block:: python
# Define the data layer of the source sentence.
src_word_id = data_layer(name='source_language_word', size=source_dict_dim)
# Calculate the word embedding of each word.
src_embedding = embedding_layer(
input=src_word_id,
size=word_vector_dim,
param_attr=ParamAttr(name='_source_language_embedding'))
# Apply forward recurrent neural network.
src_forward = grumemory(input=src_embedding, size=encoder_size)
# Apply backward recurrent neural network. reverse=True means backward recurrent neural network.
src_backward = grumemory(input=src_embedding,
size=encoder_size,
reverse=True)
# Mix the forward and backward parts of the recurrent neural network together.
encoded_vector = concat_layer(input=[src_forward, src_backward])
# Project encoding vector to decoder_size.
encoder_proj = mixed_layer(input = [full_matrix_projection(encoded_vector)],
size = decoder_size)
# Compute the first instance of the backward RNN.
backward_first = first_seq(input=src_backward)
# Project the first instance of backward RNN to decoder size.
decoder_boot = mixed_layer(input=[full_matrix_projection(backward_first)], size=decoder_size, act=TanhActivation())
The decoder uses :code:`recurrent_group` to define the recurrent neural network. The step and output functions are defined in :code:`gru_decoder_with_attention`:
.. code-block:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
trg_embedding = embedding_layer(
input=data_layer(name='target_language_word',
size=target_dict_dim),
size=word_vector_dim,
param_attr=ParamAttr(name='_target_language_embedding'))
group_inputs.append(trg_embedding)
# For decoder equipped with attention mechanism, in training,
# target embedding (the groudtruth) is the data input,
# while encoded source sequence is accessed to as an unbounded memory.
# StaticInput means the same value is utilized at different time steps.
# Otherwise, it is a sequence input. Inputs at different time steps are different.
# All sequence inputs should have the same length.
decoder = recurrent_group(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs)
The implementation of the step function is listed as below. First, it defines the **memory** of the decoder network. Then it defines attention, gated recurrent unit step function, and the output function:
.. code-block:: python
def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
# Defines the memory of the decoder.
# The output of this memory is defined in gru_step.
# Notice that the name of gru_step should be the same as the name of this memory.
decoder_mem = memory(name='gru_decoder',
size=decoder_size,
boot_layer=decoder_boot)
# Compute attention weighted encoder vector.
context = simple_attention(encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
# Mix the current word embedding and the attention weighted encoder vector.
decoder_inputs = mixed_layer(inputs = [full_matrix_projection(context),
full_matrix_projection(current_word)],
size = decoder_size * 3)
# Define Gated recurrent unit recurrent neural network step function.
gru_step = gru_step_layer(name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
# Defines the output function.
out = mixed_layer(input=[full_matrix_projection(input=gru_step)],
size=target_dict_dim,
bias_attr=True,
act=SoftmaxActivation())
return out
=================
Generate Sequence
=================
After training the model, we can use it to generate sequences. A common practice is to use **beam search** to generate sequences. The following code snippets defines a beam search algorithm. Notice that :code:`beam_search` function assumes the output function of the :code:`step` returns a softmax normalized probability vector of the next token. We made the following changes to the model.
* use :code:`GeneratedInput` for trg_embedding. :code:`GeneratedInput` computes the embedding of the generated token at the last time step for the input at the current time step.
* use :code:`beam_search` function. This function needs to set:
- :code:`bos_id`: the start token. Every sentence starts with the start token.
- :code:`eos_id`: the end token. Every sentence ends with the end token.
- :code:`beam_size`: the beam size used in beam search.
- :code:`max_length`: the maximum length of the generated sentences.
* use :code:`seqtext_printer_evaluator` to print text according to index matrix and dictionary. This function needs to set:
- :code:`id_input`: the integer ID of the data, used to identify the corresponding output in the generated files.
- :code:`dict_file`: the dictionary file for converting word id to word.
- :code:`result_file`: the path of the generation result file.
The code is listed below:
.. code-block:: python
group_inputs=[StaticInput(input=encoded_vector,is_seq=True),
StaticInput(input=encoded_proj,is_seq=True)]
# In generation, decoder predicts a next target word based on
# the encoded source sequence and the last generated target word.
# The encoded source sequence (encoder's output) must be specified by
# StaticInput which is a read-only memory.
# Here, GeneratedInputs automatically fetchs the last generated word,
# which is initialized by a start mark, such as <s>.
trg_embedding = GeneratedInput(
size=target_dict_dim,
embedding_name='_target_language_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
beam_gen = beam_search(name=decoder_group_name,
step=gru_decoder_with_attention,
input=group_inputs,
bos_id=0, # Beginnning token.
eos_id=1, # End of sentence token.
beam_size=beam_size,
max_length=max_length)
seqtext_printer_evaluator(input=beam_gen,
id_input=data_layer(name="sent_id", size=1),
dict_file=trg_dict_path,
result_file=gen_trans_file)
outputs(beam_gen)
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to :ref:`semantic_role_labeling` for more details.
The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`.
# Contribute Code
We sincerely appreciate your contributions. You can use fork and pull request
workflow to merge your code.
## Code Requirements
- Your code must be fully documented by
[doxygen](http://www.stack.nl/~dimitri/doxygen/) style.
- Make sure the compiler option WITH\_STYLE\_CHECK is on and the compiler
passes the code style check.
- All code must have unit test.
- Pass all unit tests.
The following tutorial guides you into submitting your contibution.
## [Creating a Fork](https://help.github.com/articles/fork-a-repo/)
Just head over to the GitHub page and click the "Fork" button.
It's just that simple.
## Clone
Paddle is currently using [git-flow branching model](http://nvie.com/posts/a-successful-git-branching-model/).
The **develop** is the main branch, and other user's branches are feature branches.
Once you've created a fork, you can use your favorite git client to clone your
repo or just head straight to the command line:
```shell
# Clone your fork to your local machine
git clone --branch develop https://github.com/USERNAME/Paddle.git
```
If your repository doesn't contain **develop** branch, just create it by your own.
```shell
git clone https://github.com/USERNAME/Paddle.git Paddle
cd Paddle
git checkout -b develop # create develop branch.
git remote add upstream https://github.com/PaddlePaddle/Paddle.git # add upstream to baidu/Paddle
git pull upstream develop # update to upstream
```
Then you can start to develop by making a local developement branch
```shell
git checkout -b MY_COOL_STUFF_BRANCH
```
## Using `pre-commit` hook
Paddle developers use [pre-commit](http://pre-commit.com/) tool to manage git
pre-commit hooks. It can help us format source codes (cpp, python), check some
basic thing before commit (only one EOL for each file, do not add a huge file
in git). `pre-commit` tests is a part of unit tests in Travis-CI now, every
PR doesn't fit hook can not be merged into Paddle.
To use [pre-commit](http://pre-commit.com/), you should install it by
`pip install pre-commit`, and currently, Paddle uses `clang-format` to format
c/cpp sources. Please make sure clang-format 3.8+ installed.
Then just run `pre-commit install` in your Paddle clone directory. When you
commit your code, the pre-commit hook will check the local code if there is
anything not suitable to commit, and so on.
## Commit
Commit your changes by following command lines:
```shell
# show the working tree status
git status
# add modified files
git add xx
env EDITOR=vim git commit # You can write your comments by vim/nano/emacs.
```
The first line of commit infomation is the title. The second and later lines
are the details if any.
## Keeping Fork Up to Date
Before pull your request, you should sync your code from the latest PaddlePaddle.
To do this, you'll need to add a remote at first:
```shell
# see the current configured remote repository
git remote -v
# add upstream repository
git remote add upstream https://github.com/PaddlePaddle/Paddle.git
# verify the new upstream
git remote -v
```
Update your fork with the latest upstream changes:
```shell
git pull --rebase upstream develop
```
If there are no unique commits locally, git will simply perform a fast-forward.
However, if you have been making changes (in the vast majority of cases you
probably shouldn't be), you may have to deal with conflicts.
Now, your local master branch is up-to-date with everything modified upstream.
## Push to GitHub
```shell
# push to your repository in Github
git push -u origin MY_COOL_STUFF_BRANCH # create remote branch MY_COOL_STUFF_BRANCH to origin.
```
## Pull Request
Go to the page for your fork on GitHub, select your development branch,
and click the **pull request button**.
## Update your pull request with the lastest version
During the code review, your pull request may become stale because new commits in
baidu/Paddle. GitHub allows autmotic update if there is no conflict. You can do this
by clicking the "Update Branch" button in your pull request page. However, in the case
of conflict, you need to do the update manually. You need to do the following on
your local repository:
```shell
git checkout MY_COOL_STUFF_BRANCH
git pull upstream develop
# You may need to resolve the conflict according to the git prompt.
# Make and test your code.
git push origin MY_COOL_STUFF_BRANCH
```
Now your Pull Request is updated with the latest version.
## Revise your pull request
When you revise your pull request according to reviewer's comments, please use 'git commit' instead of 'git commit --amend' to commit your changes so that the reviewers can see the difference between the new pull requrest and the old pull request.
The possible commands are
```shell
git checkout MY_COOL_STUFF_BRANCH
git pull upstream develop # update local to newest code base.
# May be some conflicts will occured.
# And develop your cool stuff
env EDITOR=vim git commit # add your revise log
git push origin MY_COOL_STUFF_BRANCH
```
此差异已折叠。
HOW TO
=======
Usage
-------
.. toctree::
:maxdepth: 1
usage/cmd_parameter/index_en.rst
usage/cluster/cluster_train_en.md
usage/k8s/k8s_en.md
usage/k8s/k8s_aws_en.md
Development
------------
.. toctree::
:maxdepth: 1
dev/new_layer_en.rst
dev/contribute_to_paddle_en.md
Configuration
-------------
.. toctree::
:maxdepth: 1
deep_model/rnn/index_en.rst
Optimization
-------------
.. toctree::
:maxdepth: 1
optimization/gpu_profiling_en.rst
====================
Tune GPU Performance
====================
.. contents::
This tutorial will guide you step-by-step through how to conduct profiling and performance tuning using built-in timer, **nvprof** and **nvvp**.
- What is profiling?
- Why we need profiling?
- How to do profiling?
- Profile tools
- Hands-on Tutorial
- Profiling tips
What's profiling?
=================
In software engineering, profiling is a form of dynamic program analysis that measures the space (memory) or time
complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.
Most commonly, profiling information serves to aid program optimization.
Briefly, profiler is used to measure application performance. Program analysis tools are extremely important for
understanding program behavior. Simple profiling can tell you that how long does an operation take? For advanced
profiling, it can interpret why does an operation take a long time?
Why we need profiling?
======================
Since training deep neural network typically take a very long time to get over, performance is gradually becoming
the most important thing in deep learning field. The first step to improve performance is to understand what parts
are slow. There is no point in improving performance of a region which doesn’t take much time!
How to do profiling?
====================
To achieve maximum performance, there are five steps you can take to reach your goals.
- Profile the code
- Find the slow parts
- Work out why they’re slow
- Make them fast
- Profile the code again
Usually, processor has two key performance limits include float point throughput and
memory throughput. For GPU, it also need more parallelism to fulfill its potential.
This is why they can be so fast.
Profiler Tools
==============
For general GPU profiling, a bunch of tools are provided from both NVIDIA and third party.
**nvprof** is Nvidia profiler and **nvvp** is (GUI based) Nvidia visual profiler.
In this tutorial, we will focus on nvprof and nvvp.
:code:`test_GpuProfiler` from :code:`paddle/math/tests` directory will be used to evaluate
above profilers.
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:linenos:
The above code snippet includes two methods, you can use any of them to profile the regions of interest.
1. :code:`REGISTER_TIMER_INFO` is a built-in timer wrapper which can calculate the time overhead of both cpu functions and cuda kernels.
2. :code:`REGISTER_GPU_PROFILER` is a general purpose wrapper object of :code:`cudaProfilerStart` and :code:`cudaProfilerStop` to avoid
program crashes when CPU version of PaddlePaddle invokes them.
You can find more details about how to use both of them in the next session.
Hands-on Approach
=================
Built-in Timer
--------------
To enable built-in timer in PaddlePaddle, first you have to add :code:`REGISTER_TIMER_INFO` into the regions of you interest.
Then, all information could be stamped in the console via :code:`printStatus` or :code:`printAllStatus` function.
As a simple example, consider the following:
1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines).
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:emphasize-lines: 8-12,14
:linenos:
2. Configure cmake with **WITH_TIMER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_TIMER=ON
make
3. Execute your code and observe the results (see the emphasize-lines).
.. code-block:: bash
:emphasize-lines: 1,12-15
> ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.313065 2522362816 Util.cpp:155] commandline: ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.845065 2522362816 Util.cpp:130] Calling runInitFunctions
I1117 11:13:42.845208 2522362816 Util.cpp:143] Call runInitFunctions done.
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Profiler
[ RUN ] Profiler.BilinearFwdBwd
I1117 11:13:42.845310 2522362816 test_GpuProfiler.cpp:114] Enable GPU Profiler Stat: [testBilinearFwdBwd] "numSamples = 10, channels = 16, im
gSizeX = 64, imgSizeY = 64"
I1117 11:13:42.850154 2522362816 ThreadLocal.cpp:37] thread use undeterministic rand seed:20659751
I1117 11:13:42.981501 2522362816 Stat.cpp:130] ======= StatSet: [GlobalStatInfo] status ======
I1117 11:13:42.981539 2522362816 Stat.cpp:133] Stat=testBilinearFwdBwd total=136.141 avg=136.141 max=136.141 min=136.141 count=1
I1117 11:13:42.981572 2522362816 Stat.cpp:141] ======= BarrierStatSet status ======
I1117 11:13:42.981575 2522362816 Stat.cpp:154] --------------------------------------------------
[ OK ] Profiler.BilinearFwdBwd (136 ms)
[----------] 1 test from Profiler (136 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (136 ms total)
[ PASSED ] 1 test.
nvprof profiler
---------------
To use this command line profiler **nvprof**, you can simply issue the following command:
1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines).
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:emphasize-lines: 6-7
:linenos:
2. Configure cmake with **WITH_PROFILER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_PROFILER=ON
make
3. Use Nvidia profiler **nvprof** to profile the binary.
.. code-block:: bash
nvprof ./paddle/math/tests/test_GpuProfiler
Then, you can get the following profiling result:
.. code-block:: bash
==78544== Profiling application: ./paddle/math/tests/test_GpuProfiler
==78544== Profiling result:
Time(%) Time Calls Avg Min Max Name
27.60% 9.6305ms 5 1.9261ms 3.4560us 6.4035ms [CUDA memcpy HtoD]
26.07% 9.0957ms 1 9.0957ms 9.0957ms 9.0957ms KeBilinearInterpBw
23.78% 8.2977ms 1 8.2977ms 8.2977ms 8.2977ms KeBilinearInterpFw
22.55% 7.8661ms 2 3.9330ms 1.5798ms 6.2863ms [CUDA memcpy DtoH]
==78544== API calls:
Time(%) Time Calls Avg Min Max Name
46.85% 682.28ms 8 85.285ms 12.639us 682.03ms cudaStreamCreateWithFlags
39.83% 580.00ms 4 145.00ms 302ns 550.27ms cudaFree
9.82% 143.03ms 9 15.892ms 8.7090us 142.78ms cudaStreamCreate
1.23% 17.983ms 7 2.5690ms 23.210us 6.4563ms cudaMemcpy
1.23% 17.849ms 2 8.9247ms 8.4726ms 9.3768ms cudaStreamSynchronize
0.66% 9.5969ms 7 1.3710ms 288.43us 2.4279ms cudaHostAlloc
0.13% 1.9530ms 11 177.54us 7.6810us 591.06us cudaMalloc
0.07% 1.0424ms 8 130.30us 1.6970us 453.72us cudaGetDevice
0.04% 527.90us 40 13.197us 525ns 253.99us cudaEventCreateWithFlags
0.03% 435.73us 348 1.2520us 124ns 42.704us cuDeviceGetAttribute
0.03% 419.36us 1 419.36us 419.36us 419.36us cudaGetDeviceCount
0.02% 260.75us 2 130.38us 129.32us 131.43us cudaGetDeviceProperties
0.02% 222.32us 2 111.16us 106.94us 115.39us cudaLaunch
0.01% 214.06us 4 53.514us 28.586us 77.655us cuDeviceGetName
0.01% 115.45us 4 28.861us 9.8250us 44.526us cuDeviceTotalMem
0.01% 83.988us 4 20.997us 578ns 77.760us cudaSetDevice
0.00% 38.918us 1 38.918us 38.918us 38.918us cudaEventCreate
0.00% 34.573us 31 1.1150us 279ns 12.784us cudaDeviceGetAttribute
0.00% 17.767us 1 17.767us 17.767us 17.767us cudaProfilerStart
0.00% 15.228us 2 7.6140us 3.5460us 11.682us cudaConfigureCall
0.00% 14.536us 2 7.2680us 1.1490us 13.387us cudaGetLastError
0.00% 8.6080us 26 331ns 173ns 783ns cudaSetupArgument
0.00% 5.5470us 6 924ns 215ns 2.6780us cuDeviceGet
0.00% 5.4090us 6 901ns 328ns 3.3320us cuDeviceGetCount
0.00% 4.1770us 3 1.3920us 1.0630us 1.8300us cuDriverGetVersion
0.00% 3.4650us 3 1.1550us 1.0810us 1.2680us cuInit
0.00% 830ns 1 830ns 830ns 830ns cudaRuntimeGetVersion
nvvp profiler
-------------
For visual profiler **nvvp**, you can either import the output of :code:`nvprof –o ...` or
run application through GUI.
**Note: nvvp also support CPU profiling** (Click the box in nvvp to enable profile execution on CPU).
.. image:: nvvp1.png
:align: center
:scale: 33%
From the perspective of kernel functions, **nvvp** can even illustrate why does an operation take a long time?
As shown in the following figure, kernel's block usage, register usage and shared memory usage from :code:`nvvp`
allow us to fully utilize all warps on the GPU.
.. image:: nvvp2.png
:align: center
:scale: 33%
From the perspective of application, **nvvp** can give you some suggestions to address performance bottleneck.
For instance, some advice in data movement and compute utilization from the below figure can guide you to tune performance.
.. image:: nvvp3.png
:align: center
:scale: 33%
.. image:: nvvp4.png
:align: center
:scale: 33%
Profiling tips
==============
- The **nvprof** and **nvvp** output is a very good place to start.
- The timeline is a good place to go next.
- Only dig deep into a kernel if it’s taking a significant amount of your time.
- Where possible, try to match profiler output with theory.
1) For example, if I know I’m moving 1GB, and my kernel takes 10ms, I expect the profiler to report 100GB/s.
2) Discrepancies are likely to mean your application isn’t doing what you thought it was.
- Know your hardware: If your GPU can do 6 TFLOPs, and you’re already doing 5.5 TFLOPs, you won’t go much faster!
Profiling is a key step in optimization. Sometimes quite simple changes can lead to big improvements in performance.
Your mileage may vary!
Reference
=========
Jeremy Appleyard, `GPU Profiling for Deep Learning <http://www.robots.ox.ac.uk/~seminars/seminars/Extra/2015_10_08_JeremyAppleyard.pdf>`_, 2015
# Run Distributed Training
In this article, we explain how to run distributed Paddle training jobs on clusters. We will create the distributed version of the single-process training example, [recommendation](https://github.com/baidu/Paddle/tree/develop/demo/recommendation).
[Scripts](https://github.com/baidu/Paddle/tree/develop/paddle/scripts/cluster_train) used in this article launch distributed jobs via SSH. They also work as a reference for users running more sophisticated cluster management systems like MPI and [Kubernetes](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/k8s).
## Prerequisite
1. Aforementioned scripts use a Python library [fabric](http://www.fabfile.org/) to run SSH commands. We can use `pip` to install fabric:
```bash
pip install fabric
```
1. We need to install PaddlePaddle on all nodes in the cluster. To enable GPUs, we need to install CUDA in `/usr/local/cuda`; otherwise Paddle would report errors at runtime.
1. Set the `ROOT_DIR` variable in [`cluster_train/conf.py`] on all nodes. For convenience, we often create a Unix user `paddle` on all nodes and set `ROOT_DIR=/home/paddle`. In this way, we can write public SSH keys into `/home/paddle/.ssh/authorized_keys` so that user `paddle` can SSH to all nodes without password.
## Prepare Job Workspace
We refer to the directory where we put dependent libraries, config files, etc., as *workspace*.
These `train/test` data should be prepared before launching cluster job. To satisfy the requirement that train/test data are placed in different directory from workspace, PADDLE refers train/test data according to index file named as `train.list/test.list` which are used in model config file. So the train/test data also contains train.list/test.list two list file. All local training demo already provides scripts to help you create these two files, and all nodes in cluster job will handle files with same logical code in normal condition.
Generally, you can use same model file from local training for cluster training. What you should have in mind that, the `batch_size` set in `setting` function in model file means batch size in `each` node of cluster job instead of total batch size if synchronization SGD was used.
Following steps are based on [demo/recommendation](https://github.com/PaddlePaddle/Paddle/tree/develop/demo/recommendation) demo in demo directory.
You just go through demo/recommendation tutorial doc until `Train` section, and at last you will get train/test data and model configuration file. Finaly, just use demo/recommendation as workspace for cluster training.
At last your workspace should look like as follow:
```
.
|-- common_utils.py
|-- data
| |-- config.json
| |-- config_generator.py
| |-- meta.bin
| |-- meta_config.json
| |-- meta_generator.py
| |-- ml-1m
| |-- ml_data.sh
| |-- ratings.dat.test
| |-- ratings.dat.train
| |-- split.py
| |-- test.list
| `-- train.list
|-- dataprovider.py
|-- evaluate.sh
|-- prediction.py
|-- preprocess.sh
|-- requirements.txt
|-- run.sh
`-- trainer_config.py
```
Not all of these files are needed for cluster training, but it's not necessary to remove useless files.
`trainer_config.py`
Indicates the model config file.
`train.list` and `test.list`
File index. It stores all relative or absolute file paths of all train/test data at current node.
`dataprovider.py`
used to read train/test samples. It's same as local training.
`data`
all files in data directory are refered by train.list/test.list which are refered by data provider.
## Prepare Cluster Job Configuration
The options below must be carefully set in cluster_train/conf.py
`HOSTS` all nodes hostname or ip that will run cluster job. You can also append user and ssh port with hostname, such as root@192.168.100.17:9090.
`ROOT_DIR` workspace ROOT directory for placing JOB workspace directory
`PADDLE_NIC` the NIC(Network Interface Card) interface name for cluster communication channel, such as eth0 for ethternet, ib0 for infiniband.
`PADDLE_PORT` port number for cluster commnunication channel
`PADDLE_PORTS_NUM` the number of port used for cluster communication channle. if the number of cluster nodes is small(less than 5~6nodes), recommend you set it to larger, such as 2 ~ 8, for better network performance.
`PADDLE_PORTS_NUM_FOR_SPARSE` the number of port used for sparse updater cluster commnunication channel. if sparse remote update is used, set it like `PADDLE_PORTS_NUM`
`LD_LIBRARY_PATH` set addtional LD_LIBRARY_PATH for cluster job. You can use it to set CUDA libraries path.
Default Configuration as follow:
```python
HOSTS = [
"root@192.168.100.17",
"root@192.168.100.18",
]
'''
workspace configuration
'''
#root dir for workspace
ROOT_DIR = "/home/paddle"
'''
network configuration
'''
#pserver nics
PADDLE_NIC = "eth0"
#pserver port
PADDLE_PORT = 7164
#pserver ports num
PADDLE_PORTS_NUM = 2
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 2
#environments setting for all processes in cluster job
LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib64"
```
### Launching Cluster Job
`paddle.py` provides automatical scripts to start all PaddlePaddle cluster processes in different nodes. By default, all command line options can set as `paddle.py` command options and `paddle.py` will transparently and automatically set these options to PaddlePaddle lower level processes.
`paddle.py`provides two distinguished command option for easy job launching.
`job_dispatch_package` set it with local `workspace`directory, it will be dispatched to all nodes set in conf.py. It could be helpful for frequent hacking workspace files, otherwise frequent mulit-nodes workspace deployment could make your crazy.
`job_workspace` set it with already deployed workspace directory, `paddle.py` will skip dispatch stage to directly launch cluster job with all nodes. It could help to reduce heavy
dispatch latency.
`cluster_train/run.sh` provides command line sample to run `demo/recommendation` cluster job, just modify `job_dispatch_package` and `job_workspace` with your defined directory, then:
```
sh run.sh
```
The cluster Job will start in several seconds.
### Kill Cluster Job
`paddle.py` can capture `Ctrl + C` SIGINT signal to automatically kill all processes launched by it. So just stop `paddle.py` to kill cluster job. You should mannally kill job if program crashed.
### Check Cluster Training Result
Check log in $workspace/log for details, each node owns same log structure.
`paddle_trainer.INFO`
It provides almost all interal output log for training, same as local training. Check runtime model convergence here.
`paddle_pserver2.INFO`
It provides pserver running log, which could help to diagnose distributed error.
`server.log`
It provides stderr and stdout of pserver process. Check error log if training crashs.
`train.log`
It provides stderr and stdout of trainer process. Check error log if training crashs.
### Check Model Output
After one pass finished, model files will be writed in `output` directory in node 0.
`nodefile` in workspace indicates the node id of current cluster job.
# Argument Outline
It looks like there are a lot of arguments. However, most of them are for developers or alrealy set automatically in cluster submitting environment and users do not need to care about them. Here, we divide these arguments into serveral classes according to the scenario that they are used in. For example, the arguments in `common` can be used in all scenes. Some arguments can be only used in certain layers. Some are needed by multi machines training in cluster, etc.
<html>
<table border="2" frame="border">
<thead>
<tr>
<th scope="col" class="left"></th>
<th scope="col" class="left">args</th>
<th scope="col" class="left">local train</th>
<th scope="col" class="left">cluster train</th>
<th scope="col" class="left">local test</th>
<th scope="col" class="left">cluster test</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left" rowspan="9">common</td>
<td class="left">job</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">use_gpu</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">local</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">config</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">config_args</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">num_passes</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">trainer_count</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">version</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">show_layer_stat</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan="15">train</td><td class="left">dot_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">test_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">saving_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">show_parameter_stats_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">init_model_path</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left"></td>
</tr>
<tr>
<td class="left">load_missing_parameter_strategy</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">saving_period_by_batches</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">use_old_updater</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">enable_grad_share</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">grad_share_block_num</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">log_error_clipping</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">log_clipping</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">save_only_one</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">start_pass</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">train/test</td><td class="left">save_dir</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "2">testing during training</td><td class="left">test_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">average_test_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left" rowspan = "5">test</td><td class="left">model_list</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">test_wait</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">test_pass</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">predict_output_dir</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">distribute_test</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">Auc/PnpairValidation</td><td class="left">predict_file</td>
<td class="left"></td><td class="left"></td><td class="left"></td>√<td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "6">GPU</td><td class="left">gpu_id</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">parallel_nn</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">allow_only_one_model_on_one_gpu</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">cudnn_dir</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">cuda_dir</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">cudnn_conv_workspace_limit_in_mb</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "4">RNN</td>
<td class="left">beam_size</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">rnn_use_batch</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">prev_batch_state</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">diy_beam_search_prob_so</td>
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "2">metric learning</td><td class="left">external</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">data_server_port</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "16">PServer</td><td class="left">start_pserver</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">pservers</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">port</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">port_num</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">ports_num_for_sparse</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">nics</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">rdma_tcp</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">small_messages</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">loadsave_parameters_in_pserver</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left">log_period_server</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">pserver_num_threads</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">sock_send_buf_size</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">sock_recv_buf_size</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">num_gradient_servers</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">parameter_block_size</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">parameter_block_size_for_sparse</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left" rowspan = "3">Async SGD</td><td class="left">async_count</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">async_lagged_ratio_min</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">async_lagged_ratio_default</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left" rowspan = "8">Performance Tuning</td><td class="left">log_barrier_abstract</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">log_barrier_lowest_nodes</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">log_barrier_show_log</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">check_sparse_distribution_batches</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">check_sparse_distribution_ratio</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">check_sparse_distribution_unbalance_degree</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">check_sparse_distribution_in_pserver</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">show_check_sparse_distribution_log</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">Data Provider</td><td class="left">memory_threshold_on_load_data</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left" rowspan = "2">RandomNumber</td><td class="left">seed</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">thread_local_rand_use_global_seed</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">UnitTest</td><td class="left">checkgrad_eps</td>
<td class="left"></td><td class="left"></td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">Matrix/Vector</td><td class="left">enable_parallel_vector</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
</tbody>
</table>
</html>
```eval_rst
.. _cmd_detail_introduction:
```
# Detail Description
## Common
* `--job`
- Job mode, including: **train, test, checkgrad**, where checkgrad is mainly for developers and users do not need to care about.
- type: string (default: train)
* `--config`
- Use to specfiy network configure file.
- type: string (default: null).
* `--use_gpu`
- Whether to use GPU for training, false is cpu mode and true is gpu mode.
- type: bool (default: 1).
* `--local`
- Whether the training is in local mode or not. True when training locally or using one node in cluster. False when using multiple machines in cluster.
- type: bool (default: 1).
* `--trainer_count`
- Define the number of threads used in one machine. For example, trainer_count = 4, means use 4 GPU in GPU mode and 4 threads in CPU mode. Each thread (or GPU) is assigned to 1/4 samples in current batch. That is to say, if setting batch_size of 512 in trainer config, each thread train 128 samples.
- type: int32 (default: 1).
* `--num_passes`
- When `--job=train`, means training for num_passes passes. One pass means training all samples in dataset one time. When `--job=test`, means testing data from model of test_pass to model of (num_passes - 1).
- type: int32 (default: 100).
* `--config_args`
- arguments passed to config file. Format: key1=value1,key2=value2.
- type: string (default: null).
* `--version`
- Whether to print version information.
- type: bool (default: 0).
* `--show_layer_stat`
- Whether to show the statistics of each layer **per batch**.
- type: bool (default: 0).
## Train
* `--log_period`
- Log progress every log_period batches.
- type: int32 (default: 100).
* `--dot_period`
- Print '.' every dot_period batches.
- type: int32 (default: 1).
* `--saving_period`
- Save parameters every saving_period passes
- type: int32 (default: 1).
* `--save_dir`
- Directory for saving model parameters. It needs to be specified, but no need to be created in advance.
- type: string (default: null).
* `--start_pass`
- Start training from this pass. It will load parameters from the previous pass.
- type: int32 (default: 0).
* `--show_parameter_stats_period`
- Show parameter statistic during training every show_parameter_stats_period batches. It will not show by default.
- type: int32 (default: 0).
* `--save_only_one`
- Save the parameters only in last pass, while the previous parameters will be removed.
- type: bool (default: 0).
* `--load_missing_parameter_strategy`
- Specify the loading operation when model file is missing. Now support fail/rand/zero three operations.
- `fail`: program will exit.
- `rand`: uniform or normal distribution according to **initial\_strategy** in network config. Uniform range is: **[mean - std, mean + std]**, where mean and std are configures in trainer config.
- `zero`: all parameters are zero.
- type: string (default: fail).
* `--init_model_path`
- Path of the initialization model. If it was set, start\_pass will be ignored. It can be used to specify model path in testing mode as well.
- type: string (default: null).
* `--saving_period_by_batches`
- Save parameters every saving_period_by_batches batches in one pass.
- type: int32 (default: 0).
* `--log_error_clipping`
- Whether to print error clipping log when setting **error_clipping_threshold** in layer config. If it is true, log will be printed in backward propagation **per batch**. This clipping effects on **gradient of output**.
- type: bool (default: 0).
* `--log_clipping`
- Enable print log clipping or not when setting **gradient_clipping_threshold** in trainer config. This clipping effects on **gradient w.r.t. (with respect to) weight**.
- type: bool (default: 0).
* `--use_old_updater`
- Whether to use the old RemoteParameterUpdater. Default use ConcurrentRemoteParameterUpdater. It is mainly for deverlopers and users usually do not need to care about.
- type: bool (default: 0).
* `--enable_grad_share`
- threshold for enable gradient parameter, which is shared for batch multi-cpu training.
- type: int32 (default: 100 \* 1024 \* 1024).
* `--grad_share_block_num`
- block number of gradient parameter, which is shared for batch multi-cpu training.
- type: int32 (default: 64).
## Test
* `--test_pass`
- Load parameter from this pass to test.
- type: int32 (default: -1).
* `--test_period`
- if equal 0, do test on all test data at the end of each pass. While if equal non-zero, do test on all test data every test_period batches.
- type: int32 (default: 0).
* `--test_wait`
 - Whether to wait for parameter per pass if not exist. It can be used when user launch another process to perfom testing during the training process.
- type: bool (default: 0).
* `--model_list`
- File that saves the model list when testing.
- type: string (default: "", null).
* `--predict_output_dir`
- Directory that saves the layer output. It is configured in Outputs() in network config. Default, this argument is null, meaning save nothing. Specify this directory if you want to save feature map of some layers in testing mode. Note that, layer outputs are values after activation function.
- type: string (default: "", null).
* `--average_test_period`
- Do test on average parameter every `average_test_period` batches. It MUST be devided by FLAGS_log_period. Default 0 means do not test on average parameter.
- type: int32 (default: 0).
* `--distribute_test`
- Testing in distribute environment will merge results from multiple machines.
- type: bool (default: 0).
* `--predict_file`
- File name for saving predicted result. Default, this argument is null, meaning save nothing. Now, this argument is only used in AucValidationLayer and PnpairValidationLayer, and saves predicted result every pass.
- type: string (default: "", null).
## GPU
* `--gpu_id`
- Which gpu core to use.
- type: int32 (default: 0).
* `--allow_only_one_model_on_one_gpu`
- If true, do not allow multiple models on one GPU device.
- type: bool (default: 1).
* `--parallel_nn`
- Whether to use multi-thread to calculate one neural network or not. If false, use gpu_id specify which gpu core to use (the device property in trainer config will be ingored). If true, the gpu core is specified in trainer config (gpu_id will be ignored).
- type: bool (default: 0).
* `--cudnn_dir`
- Choose path to dynamic load NVIDIA CuDNN library, for instance, /usr/local/cuda/lib64. [Default]: LD_LIBRARY_PATH
- type: string (default: "", null)
* `--cuda_dir`
- Choose path to dynamic load NVIDIA CUDA library, for instance, /usr/local/cuda/lib64. [Default]: LD_LIBRARY_PATH
- type: string (default: "", null)
* `--cudnn_conv_workspace_limit_in_mb`
- Specify cuDNN max workspace limit, in units MB, 4096MB=4GB by default.
- type: int32 (default: 4096MB=4GB)
## NLP: RNN/LSTM/GRU
* `--rnn_use_batch`
- Whether to use batch method for calculation in simple RecurrentLayer.
- type: bool (default: 0).
* `--prev_batch_state`
- batch is continue with next batch.
- type: bool (default: 0).
* `--beam_size`
- Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level, sorting them in increasing order of heuristic cost. However, it only stores a predetermined number of best states at each level (called the beam size).
- type: int32 (default: 1).
* `--diy_beam_search_prob_so`
- Specify shared dynamic library. It can be defined out of paddle by user.
- type: string (default: "", null).
## Metric Learning
* `--external`
- Whether to use external machine for metric learning.
- type: bool (default: 0).
* `--data_server_port`
- Listening port for dserver (data server), dserver is mainly used in metric learning.
- type: int32 (default: 21134).
## DataProvider
* `--memory_threshold_on_load_data`
- Stop loading data when memory is not sufficient.
- type: double (default: 1.0).
## Unit Test
* `--checkgrad_eps`
- parameter change size for checkgrad.
- type: double (default: 1e-05).
## Parameter Server and Distributed Communication
* `--start_pserver`
- Whether to start pserver (parameter server).
- type: bool (default: 0).
* `--pservers`
- Comma separated IP addresses of pservers.
- type: string (default: "127.0.0.1").
* `--port`
- Listening port for pserver.
- type: int32 (default: 20134).
* `--ports_num`
- The ports number for parameter send, increment based on default port number.
- type: int32 (default: 1).
* `--trainer_id`
- In distributed training, each trainer must be given an unique id ranging from 0 to num_trainers-1. Trainer 0 is the master trainer. User do not need to care this flag.
- type: int32 (default: 0).
* `--num_gradient_servers`
- Numbers of gradient servers. This arguments is set automatically in cluster submitting environment.
- type: int32 (default: 1).
* `--small_messages`
- If message size is small, recommend set it True to enable quick ACK and no delay
- type: bool (default: 0).
* `--sock_send_buf_size`
- Restrict socket send buffer size. It can reduce network congestion if set carefully.
- type: int32 (default: 1024 \* 1024 \* 40).
* `--sock_recv_buf_size`
- Restrict socket recieve buffer size.
- type: int32 (default: 1024 \* 1024 \* 40).
* `--parameter_block_size`
- Parameter block size for pserver, will automatically calculate a suitable value if it's not set.
- type: int32 (default: 0).
* `--parameter_block_size_for_sparse`
- Parameter block size for sparse update pserver, will automatically calculate a suitable value if it's not set.
- type: int32 (default: 0).
* `--log_period_server`
- Log progress every log_period_server batches at pserver end.
- type: int32 (default: 500).
* `--loadsave_parameters_in_pserver`
- Load and save parameters in pserver. Only work when parameter set sparse_remote_update.
- type: bool (default: 0).
* `--pserver_num_threads`
- number of threads for sync op exec.
- type: bool (default: 1).
* `--ports_num_for_sparse`
- The ports number for parameter send, increment based on default (port + ports_num). It is used by sparse Tranning.
- type: int32 (default: 0).
* `--nics`
- Network device name for pservers, already set in cluster submitting environment.
- type: string (default: "xgbe0,xgbe1").
* `--rdma_tcp`
- Use rdma or tcp transport protocol, already set in cluster submitting environment.
- type: string (default: "tcp").
## Async SGD
* `--async_count`
- Defined the asynchronous training length, if 0, then use synchronized training.
- type: int32 (default: 0).
* `--async_lagged_ratio_min`
- Control the minimize value of `config_.async_lagged_grad_discard_ratio()`.
- type: double (default: 1.0).
* `--async_lagged_ratio_default`
- If async_lagged_grad_discard_ratio is not set in network config, use it as defalut value.
- type: double (default: 1.5).
## Performance Tuning
* `--log_barrier_abstract`
- If true, show abstract barrier performance information.
- type: bool (default: 1).
* `--log_barrier_show_log`
- If true, always show barrier abstract even with little gap.
- type: bool (default: 0).
* `--log_barrier_lowest_nodes`
- How many lowest node will be logged.
- type: int32 (default: 5).
* `--check_sparse_distribution_in_pserver`
- Whether to check that the distribution of sparse parameter on all pservers is balanced.
- type: bool (default: 0).
* `--show_check_sparse_distribution_log`
- show log details for sparse parameter distribution in pserver.
- type: bool (default: 0).
* `--check_sparse_distribution_batches`
- Running sparse parameter distribution check every so many batches.
- type: int32 (default: 100).
* `--check_sparse_distribution_ratio`
- If parameters dispatched to different pservers have an unbalanced distribution for check_sparse_distribution_ratio * check_sparse_distribution_batches times, crash program.
- type: double (default: 0.6).
* `--check_sparse_distribution_unbalance_degree`
- The ratio of maximum data size / minimun data size for different pserver.
- type: double (default: 2).
## Matrix/Vector/RandomNumber
* `--enable_parallel_vector`
- threshold for enable parallel vector.
- type: int32 (default: 0).
* `--seed`
- random number seed. 0 for srand(time)
- type: int32 (default: 1)
* `--thread_local_rand_use_global_seed`
- Whether to use global seed in rand of thread local.
- type: bool (default: 0).
.. _cmd_line_index:
Set Command-line Parameters
===========================
.. toctree::
:maxdepth: 1
use_case_en.md
arguments_en.md
detail_introduction_en.md
# Use Case
## Local Training
These command line arguments are commonly used by local training experiments, such as image classification, natural language processing, et al.
```
paddle train \
--use_gpu=1/0 \ #1:GPU,0:CPU(default:true)
--config=network_config \
--save_dir=output \
--trainer_count=COUNT \ #(default:1)
--test_period=M \ #(default:0)
--num_passes=N \ #(defalut:100)
--log_period=K \ #(default:100)
--dot_period=1000 \ #(default:1)
#[--show_parameter_stats_period=100] \ #(default:0)
#[--saving_period_by_batches=200] \ #(default:0)
```
`show_parameter_stats_period` and `saving_period_by_batches` are optional according to your task.
### 1) Pass Command Argument to Network config
`config_args` is a useful parameter to pass arguments to network config.
```
--config_args=generating=1,beam_size=5,layer_num=10 \
```
And `get_config_arg` can be used to parse these arguments in network config as follows:
```
generating = get_config_arg('generating', bool, False)
beam_size = get_config_arg('beam_size', int, 3)
layer_num = get_config_arg('layer_num', int, 8)
```
`get_config_arg`:
```
get_config_arg(name, type, default_value)
```
- name: the name specified in the `--config_args`
- type: value type, bool, int, str, float etc.
- default_value: default value if not set.
### 2) Use Model to Initialize Network
add argument:
```
--init_model_path=model_path
--load_missing_parameter_strategy=rand
```
## Local Testing
Method 1:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--init_model_path=model_path \
```
- use init\_model\_path to specify test model.
- only can test one model.
Method 2:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--model_list=model.list \
```
- use model_list to specify test models
- can test several models, where model.list likes:
```
./alexnet_pass1
./alexnet_pass2
```
Method 3:
```
paddle train --job=test \
--use_gpu=1/0 \
--config=network_config \
--trainer_count=COUNT \
--save_dir=model \
--test_pass=M \
--num_passes=N \
```
This way must use model path saved by Paddle like this: `model/pass-%5d`. Testing model is from M-th pass to (N-1)-th pass. For example: M=12 and N=14 will test `model/pass-00012` and `model/pass-00013`.
## Sparse Training
Sparse training is usually used to accelerate calculation when input is sparse data with highly dimension. For example, dictionary dimension of input data is 1 million, but one sample just have several words. In paddle, sparse matrix multiplication is used in forward propagation and sparse updating is perfomed on weight updating after backward propagation.
### 1) Local training
You need to set **sparse\_update=True** in network config. Check the network config documentation for more details.
### 2) cluster training
Add the following argument for cluster training of a sparse model. At the same time you need to set **sparse\_remote\_update=True** in network config. Check the network config documentation for more details.
```
--ports_num_for_sparse=1 #(default: 0)
```
## parallel_nn
`parallel_nn` can be set to mixed use of GPUs and CPUs to compute layers. That is to say, you can deploy network to use a GPU to compute some layers and use a CPU to compute other layers. The other way is to split layers into different GPUs, which can **reduce GPU memory** or **use parallel computation to accelerate some layers**.
If you want to use these characteristics, you need to specify device ID in network config (denote it as deviceId) and add command line argument:
```
--parallel_nn=true
```
### case 1: Mixed Use of GPU and CPU
Consider the following example:
```
#command line:
paddle train --use_gpu=true --parallel_nn=true trainer_count=COUNT
default_device(0)
fc1=fc_layer(...)
fc2=fc_layer(...)
fc3=fc_layer(...,layer_attr=ExtraAttr(device=-1))
```
- default_device(0): set default device ID to 0. This means that except the layers with device=-1, all layers will use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layer fc1 and fc2 are computed on the GPU.
- device=-1: use the CPU for layer fc3.
- trainer_count:
- trainer_count=1: if gpu\_id is not set, then use the first GPU to compute layers fc1 and fc2. Otherwise use the GPU with gpu\_id.
- trainer_count>1: use trainer\_count GPUs to compute one layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 will use data parallelism to compute layer fc1 and fc2.
### Case 2: Specify Layers in Different Devices
```
#command line:
paddle train --use_gpu=true --parallel_nn=true --trainer_count=COUNT
#network:
fc2=fc_layer(input=l1, layer_attr=ExtraAttr(device=0), ...)
fc3=fc_layer(input=l1, layer_attr=ExtraAttr(device=1), ...)
fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)
```
In this case, we assume that there are 4 GPUs in one machine.
- trainer_count=1:
- Use GPU 0 to compute layer fc2.
- Use GPU 1 to compute layer fc3.
- Use CPU to compute layer fc4.
- trainer_count=2:
- Use GPU 0 and 1 to compute layer fc2.
- Use GPU 2 and 3 to compute layer fc3.
- Use CPU to compute fc4 in two threads.
- trainer_count=4:
- It will fail (note, we have assumed that there are 4 GPUs in machine), because argument `allow_only_one_model_on_one_gpu` is true by default.
**Allocation of device ID when `device!=-1`**:
```
(deviceId + gpu_id + threadId * numLogicalDevices_) % numDevices_
deviceId: specified in layer.
gpu_id: 0 by default.
threadId: thread ID, range: 0,1,..., trainer_count-1
numDevices_: device (GPU) count in machine.
numLogicalDevices_: min(max(deviceId + 1), numDevices_)
```
此差异已折叠。
# Paddle On Kubernetes
>In this article, we will introduce how to run Paddle training job on single CPU machine using Kubernetes. In next article, we will introduce how to run Paddle training job on distributed cluster.
## Build Docker Image
In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training related data so that all processes in Paddle training can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into Paddle's Docker Image, so we need to create a Paddle Docker image that already includes the training data.
Paddle's [Quick Start Tutorial](http://www.paddlepaddle.org/doc/demo/quick_start/index_en.html) introduces how to download and train data by using script from Paddle's source code.
And `paddledev/paddle:cpu-demo-latest` image has the Paddle source code and demo. (Caution: Default Paddle image `paddledev/paddle:cpu-latest` doesn't include the source code, Paddle's different versions of image can be referred here: [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we run this container and download the training data, and then commit the whole container to be a new Docker image.
### Run Docker Container
```
$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest
```
### Download Training Data
Getting into `/root/paddle/demo/quick_start/data` Directory,using `get_data.sh` to download training data.
Then getting into `/root/paddle/demo/quick_start` Directory, using `preprocess.sh` to pre-process training data.
```
$ root@fbd1f2bb71f4:~/paddle/demo/quick_start/data# ./get_data.sh
Downloading Amazon Electronics reviews data...
--2016-10-31 01:33:43-- http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80
Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 495854086 (473M) [application/x-gzip]
Saving to: 'reviews_Electronics_5.json.gz'
10% [=======> ] 874,279 64.7KB/s eta 2h 13m
```
### Modify Startup Script
After downloading the data,modify `/root/paddle/demo/quick_start/train.sh` file contents are as follows (one more cd cmd):
```
set -e
cd /root/paddle/demo/quick_start
cfg=trainer_config.lr.py
#cfg=trainer_config.emb.py
#cfg=trainer_config.cnn.py
#cfg=trainer_config.lstm.py
#cfg=trainer_config.bidi-lstm.py
#cfg=trainer_config.db-lstm.py
paddle train \
--config=$cfg \
--save_dir=./output \
--trainer_count=4 \
--log_period=20 \
--num_passes=15 \
--use_gpu=false \
--show_parameter_stats_period=100 \
--test_all_data_in_one_period=1 \
2>&1 | tee 'train.log'
```
### Commit Docker Image
```
$ docker commit quick_start_data mypaddle/paddle:quickstart
```
## Use Kubernetes For Training
>We will use Kubernetes job for training process, following steps shows how to do the training with Kubernetes.
### Create Yaml Files
The output result in container will be demolished when job finished (container stopped running), so we need to mount the volume out to the local disk when creating the container to store the training result. Using our previously created image, we can create a [Kubernetes Job](http://kubernetes.io/docs/user-guide/jobs/#what-is-a-job), the yaml contents are as follows:
```
apiVersion: batch/v1
kind: Job
metadata:
name: quickstart
spec:
parallelism: 1
completions: 1
template:
metadata:
name: quickstart
spec:
volumes:
- name: output
hostPath:
path: /home/work/paddle_output
containers:
- name: pi
image: mypaddle/paddle:quickstart
command: ["bin/bash", "-c", "/root/paddle/demo/quick_start/train.sh"]
volumeMounts:
- name: output
mountPath: /root/paddle/demo/quick_start/output
restartPolicy: Never
```
### Start Paddle Job
Using the above yaml file to start the Kubernetes job.
```
$ kubectl create -f paddle.yaml
```
Get the detailed status of the job:
```
$ kubectl get job
NAME DESIRED SUCCESSFUL AGE
quickstart 1 0 58s
$ kubectl describe job quickstart
Name: quickstart
Namespace: default
Image(s): registry.baidu.com/public/paddle:cpu-demo-latest
Selector: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84
Parallelism: 1
Completions: 1
Start Time: Mon, 31 Oct 2016 11:20:16 +0800
Labels: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84,job-name=quickstart
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Volumes:
output:
Type: HostPath (bare host directory volume)
Path: /home/work/paddle_output
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: quickstart-fa0wx
```
### Get Training Result
We can use kubectl command to take a look at the status of related pod.
```
$ kubectl describe pod quickstart-fa0wx
Name: quickstart-fa0wx
Namespace: default
Node: paddle-demo-let02/10.206.202.44
Start Time: Mon, 31 Oct 2016 11:20:17 +0800
Labels: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84,job-name=quickstart
Status: Succeeded
IP: 10.0.0.9
Controllers: Job/quickstart
Containers:
quickstart:
Container ID: docker://b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465
Image: registry.baidu.com/public/paddle:cpu-demo-latest
Image ID: docker://18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750
Port:
Command:
bin/bash
-c
/root/paddle/demo/quick_start/train.sh
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 31 Oct 2016 11:20:20 +0800
Finished: Mon, 31 Oct 2016 11:21:46 +0800
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
output:
Type: HostPath (bare host directory volume)
Path: /home/work/paddle_output
```
We can also ssh to Kubernetes node to take a look at the training result.
```
[root@paddle-demo-let02 paddle_output]# ll
total 60
drwxr-xr-x 2 root root 4096 Oct 31 11:20 pass-00000
drwxr-xr-x 2 root root 4096 Oct 31 11:20 pass-00001
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00002
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00003
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00004
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00005
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00006
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00007
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00008
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00009
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00010
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00011
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00012
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00013
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00014
```
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册