提交 792163f6 编写于 作者: T Travis CI

Deploy to GitHub Pages: 2c98becb

上级 63e6f142
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: abb235454c522821afda02c2aa921d6f
config: 4d7a146cda87e1e0222ce8a24b0ea6b4
tags: 645f666f9bcd5a90fca523b33c5a78b7
ABOUT
=======
PaddlPaddle is an easy-to-use, efficient, flexible and scalable deep learning platform,
which is originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
PaddlePaddle is now open source but far from complete, which is intended to be built upon, improved, scaled, and extended.
We hope to build an active open source community both by providing feedback and by actively contributing to the source code.
Credits
--------
We owe many thanks to `all contributors and developers <https://github.com/PaddlePaddle/Paddle/graphs/contributors>`_ of PaddlePaddle!
API
===
.. toctree::
:maxdepth: 1
v2/model_configs.rst
v2/data.rst
v2/run_logic.rst
DataProvider Introduction
=========================
Introduction
==============
DataProvider is a module that loads training or testing data into cpu or gpu
memory for the following triaining or testing process.
......@@ -32,11 +32,3 @@ Each line of train.list and test.list is an absolute or relative path (relative
to the PaddePaddle program runtime) of data file. Fascinatingly more, each line
can also be a HDFS file path or a SQL connection string. As long as the user
assures how to access each file in DataProvider.
Please refer to the following articles for more information about the detail
usages of DataProvider and how to implement a new DataProvider,
.. toctree::
pydataprovider2.rst
write_new_dataprovider.rst
How to use PyDataProvider2
==========================
.. _api_pydataprovider2:
PyDataProvider2
===============
We highly recommand users to use PyDataProvider2 to provide training or testing
data to PaddlePaddle. The user only needs to focus on how to read a single
......@@ -22,18 +24,18 @@ of 28 x 28 pixels.
A small part of the original data as an example is shown as below:
.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_train.txt
.. literalinclude:: src/mnist_train.txt
Each line of the data contains two parts, separated by :code:`;`. The first part is
label of an image. The second part contains 28x28 pixel float values.
Just write path of the above data into train.list. It looks like this:
.. literalinclude:: ../../../doc_cn/ui/data_provider/train.list
.. literalinclude:: src/train.list
The corresponding dataprovider is shown as below:
.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_provider.py
.. literalinclude:: src/mnist_provider.dict.py
The first line imports PyDataProvider2 package.
The main function is the process function, that has two parameters.
......@@ -72,7 +74,7 @@ sample by using keywords :code:`yield`.
Only a few lines of codes need to be added into the training configuration file,
you can take this as an example.
.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_config.py
.. literalinclude:: src/mnist_config.py
Here we specify training data by :code:`train.list`, and no testing data is specified.
The method which actually provide data is :code:`process`.
......@@ -81,7 +83,7 @@ User also can use another style to provide data, which defines the
:code:`data_layer`'s name explicitly when `yield`. For example,
the :code:`dataprovider` is shown as below.
.. literalinclude:: ../../../doc_cn/ui/data_provider/mnist_provider.dict.py
.. literalinclude:: src/mnist_provider.dict.py
:linenos:
If user did't give the :code:`data_layer`'s name, PaddlePaddle will use
......@@ -102,6 +104,8 @@ And PaddlePadle will do all of the rest things\:
Is this cool?
.. _api_pydataprovider2_sequential_model:
DataProvider for the sequential model
-------------------------------------
A sequence model takes sequences as its input. A sequence is made up of several
......@@ -117,11 +121,11 @@ negative sentiment (marked by 0 and 1 respectively).
A small part of the original data as an example can be found in the path below:
.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_train.txt
.. literalinclude:: src/sentimental_train.txt
The corresponding data provider can be found in the path below:
.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_provider.py
.. literalinclude:: src/sentimental_provider.py
This data provider for sequential model is a little more complex than that
for MINST dataset.
......@@ -139,7 +143,7 @@ initialized. The :code:`on_init` function has the following parameters:
To pass these parameters into DataProvider, the following lines should be added
into trainer configuration file.
.. literalinclude:: ../../../doc_cn/ui/data_provider/sentimental_config.py
.. literalinclude:: src/sentimental_config.py
The definition is basically same as MNIST example, except:
* Load dictionary in this configuration
......@@ -179,7 +183,7 @@ The four data types are:
* :code:`dense_vector`: dense float vector.
* :code:`sparse_binary_vector`: sparse binary vector, most of the value is 0, and
the non zero elements are fixed to 1.
* :code:`sparse_vector`: sparse float vector, most of the value is 0, and some
* :code:`sparse_float_vector`: sparse float vector, most of the value is 0, and some
non zero elements can be any float value. They are given by the user.
* :code:`integer`: an integer scalar, that is especially used for label or word index.
......@@ -200,7 +204,7 @@ in the above table.
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| sparse_binary_vector | [i, i, ...] | [[i, ...], [i, ...], ...] | [[[i, ...], ...], [[i, ...], ...],...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| sparse_vector | [(i,f), (i,f), ...] | [[(i,f), ...], [(i,f), ...], ...] | [[[(i,f), ...], ...], [[(i,f), ...], ...],...] |
| sparse_float_vector | [(i,f), (i,f), ...] | [[(i,f), ...], [(i,f), ...], ...] | [[[(i,f), ...], ...], [[(i,f), ...], ...],...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
| integer_value | i | [i, i, ...] | [[i, ...], [i, ...], ...] |
+----------------------+---------------------+-----------------------------------+------------------------------------------------+
......
API
===
DataProvider API
----------------
.. toctree::
:maxdepth: 1
data_provider/dataprovider_en.rst
data_provider/pydataprovider2_en.rst
.. _api_trainer_config:
Model Config API
----------------
.. toctree::
:maxdepth: 1
trainer_config_helpers/optimizers.rst
trainer_config_helpers/data_sources.rst
trainer_config_helpers/layers.rst
trainer_config_helpers/activations.rst
trainer_config_helpers/poolings.rst
trainer_config_helpers/networks.rst
trainer_config_helpers/evaluators.rst
trainer_config_helpers/attrs.rst
Applications API
----------------
.. toctree::
:maxdepth: 1
predict/swig_py_paddle_en.rst
Python Prediction API
=====================
Python Prediction
==================
PaddlePaddle offers a set of clean prediction interfaces for python with the help of
SWIG. The main steps of predict values in python are:
......@@ -13,7 +13,7 @@ Here is a sample python script that shows the typical prediction process for the
MNIST classification problem. A complete sample code could be found at
:code:`src_root/doc/ui/predict/predict_sample.py`.
.. literalinclude:: ./predict_sample.py
.. literalinclude:: src/predict_sample.py
:language: python
:lines: 15-18,90-100,101-104
......@@ -23,7 +23,7 @@ python's :code:`help()` function. Let's walk through the above python script:
* At the beginning, use :code:`swig_paddle.initPaddle()` to initialize
PaddlePaddle with command line arguments, for more about command line arguments
see `Command Line Arguments <../cmd_argument/detail_introduction.html>`_.
see :ref:`cmd_detail_introduction` .
* Parse the configuration file that is used in training with :code:`parse_config()`.
Because data to predict with always have no label, and output of prediction work
normally is the output layer rather than the cost layer, so you should modify
......@@ -36,7 +36,7 @@ python's :code:`help()` function. Let's walk through the above python script:
- Note: As swig_paddle can only accept C++ matrices, we offer a utility
class DataProviderConverter that can accept the same input data with
PyDataProvider2, for more information please refer to document
of `PyDataProvider2 <../data_provider/pydataprovider2.html>`_.
of :ref:`api_pydataprovider2` .
* Do the prediction with :code:`forwardTest()`, which takes the converted
input data and outputs the activations of the output layer.
......
Parameter and Extra Layer Attribute
===================================
Parameter Attributes
=======================
.. automodule:: paddle.trainer_config_helpers.attrs
:members:
.. _api_trainer_config_helpers_layers:
======
Layers
======
......@@ -20,6 +22,8 @@ LayerOutput
Data layer
===========
.. _api_trainer_config_helpers_layers_data_layer:
data_layer
----------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -29,6 +33,8 @@ data_layer
Fully Connected Layers
======================
.. _api_trainer_config_helpers_layers_fc_layer:
fc_layer
--------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -68,6 +74,8 @@ img_conv_layer
:members: img_conv_layer
:noindex:
.. _api_trainer_config_helpers_layers_context_projection:
context_projection
------------------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -131,24 +139,12 @@ lstmemory
:members: lstmemory
:noindex:
lstm_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: lstm_step_layer
:noindex:
grumemory
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: grumemory
:noindex:
gru_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: gru_step_layer
:noindex:
Recurrent Layer Group
=====================
......@@ -164,6 +160,18 @@ recurrent_group
:members: recurrent_group
:noindex:
lstm_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: lstm_step_layer
:noindex:
gru_step_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: gru_step_layer
:noindex:
beam_search
------------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -179,12 +187,16 @@ get_output_layer
Mixed Layer
===========
.. _api_trainer_config_helpers_layers_mixed_layer:
mixed_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: mixed_layer
:noindex:
.. _api_trainer_config_helpers_layers_embedding_layer:
embedding_layer
---------------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -192,7 +204,7 @@ embedding_layer
:noindex:
scaling_projection
-----------------
------------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: scaling_projection
:noindex:
......@@ -237,18 +249,24 @@ trans_full_matrix_projection
Aggregate Layers
================
.. _api_trainer_config_helpers_layers_pooling_layer:
pooling_layer
-------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: pooling_layer
:noindex:
.. _api_trainer_config_helpers_layers_last_seq:
last_seq
--------
.. automodule:: paddle.trainer_config_helpers.layers
:members: last_seq
:noindex:
.. _api_trainer_config_helpers_layers_first_seq:
first_seq
---------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -261,6 +279,12 @@ concat_layer
:members: concat_layer
:noindex:
seq_concat_layer
----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: seq_concat_layer
:noindex:
Reshaping Layers
================
......@@ -270,6 +294,8 @@ block_expand_layer
:members: block_expand_layer
:noindex:
.. _api_trainer_config_helpers_layers_expand_layer:
expand_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -282,6 +308,18 @@ repeat_layer
:members: repeat_layer
:noindex:
rotate_layer
------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: rotate_layer
:noindex:
seq_reshape_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: seq_reshape_layer
:noindex:
Math Layers
===========
......@@ -333,6 +371,8 @@ tensor_layer
:members: tensor_layer
:noindex:
.. _api_trainer_config_helpers_layers_cos_sim:
cos_sim
-------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -360,6 +400,17 @@ sampling_id_layer
:members: sampling_id_layer
:noindex:
Slicing and Joining Layers
==========================
pad_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
:members: pad_layer
:noindex:
.. _api_trainer_config_helpers_layers_cost_layers:
Cost Layers
===========
......@@ -381,6 +432,12 @@ multi_binary_label_cross_entropy
:members: multi_binary_label_cross_entropy
:noindex:
mse_cost
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: mse_cost
:noindex:
huber_cost
----------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -399,6 +456,12 @@ rank_cost
:members: rank_cost
:noindex:
sum_cost
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: sum_cost
:noindex:
crf_layer
-----------------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -417,6 +480,12 @@ ctc_layer
:members: ctc_layer
:noindex:
warp_ctc_layer
--------------
.. automodule:: paddle.trainer_config_helpers.layers
:members: warp_ctc_layer
:noindex:
nce_layer
-----------
.. automodule:: paddle.trainer_config_helpers.layers
......@@ -429,12 +498,6 @@ hsigmoid
:members: hsigmoid
:noindex:
sum_cost
---------
.. automodule:: paddle.trainer_config_helpers.layers
:members: sum_cost
:noindex:
Check Layer
============
......
......@@ -13,6 +13,8 @@ sequence_conv_pool
:members: sequence_conv_pool
:noindex:
.. _api_trainer_config_helpers_network_text_conv_pool:
text_conv_pool
--------------
.. automodule:: paddle.trainer_config_helpers.networks
......@@ -34,6 +36,8 @@ img_conv_group
:members: img_conv_group
:noindex:
.. _api_trainer_config_helpers_network_simple_img_conv_pool:
simple_img_conv_pool
--------------------
.. automodule:: paddle.trainer_config_helpers.networks
......
.. _api_trainer_config_helpers_optimizers:
==========
Optimizers
==========
......@@ -50,6 +52,8 @@ RMSPropOptimizer
:members: RMSPropOptimizer
:noindex:
.. _api_trainer_config_helpers_optimizers_settings:
settings
========
.. automodule:: paddle.trainer_config_helpers.optimizers
......
===========
Activation
===========
Abs
===
.. automodule:: paddle.v2.activation
:members: Abs
:noindex:
Exp
===
.. automodule:: paddle.v2.activation
:members: Exp
:noindex:
Identity
========
.. automodule:: paddle.v2.activation
:members: Identity
:noindex:
Linear
======
.. automodule:: paddle.v2.activation
:members: Linear
:noindex:
Log
===
.. automodule:: paddle.v2.activation
:members: Log
:noindex:
Square
======
.. automodule:: paddle.v2.activation
:members: Square
:noindex:
Sigmoid
=======
.. automodule:: paddle.v2.activation
:members: Sigmoid
:noindex:
Softmax
=======
.. automodule:: paddle.v2.activation
:members: Softmax
:noindex:
SequenceSoftmax
===============
.. automodule:: paddle.v2.activation
:members: SequenceSoftmax
:noindex:
Relu
====
.. automodule:: paddle.v2.activation
:members: Relu
:noindex:
BRelu
=====
.. automodule:: paddle.v2.activation
:members: BRelu
:noindex:
SoftRelu
========
.. automodule:: paddle.v2.activation
:members: SoftRelu
:noindex:
Tanh
====
.. automodule:: paddle.v2.activation
:members: Tanh
:noindex:
STanh
=====
.. automodule:: paddle.v2.activation
:members: STanh
:noindex:
Parameter Attribute
===================
.. automodule:: paddle.v2.attr
:members:
:noindex:
.. _api_v2.layer:
======
Layers
======
Data layer
===========
.. _api_v2.layer_data:
data
----
.. autoclass:: paddle.v2.layer.data
:noindex:
Fully Connected Layers
======================
.. _api_v2.layer_fc:
fc
--
.. autoclass:: paddle.v2.layer.fc
:noindex:
selective_fc
------------
.. autoclass:: paddle.v2.layer.selective_fc
:noindex:
Conv Layers
===========
conv_operator
-------------
.. autoclass:: paddle.v2.layer.conv_operator
:noindex:
conv_projection
---------------
.. autoclass:: paddle.v2.layer.conv_projection
:noindex:
conv_shift
----------
.. autoclass:: paddle.v2.layer.conv_shift
:noindex:
img_conv
--------
.. autoclass:: paddle.v2.layer.img_conv
:noindex:
.. _api_v2.layer_context_projection:
context_projection
------------------
.. autoclass:: paddle.v2.layer.context_projection
:noindex:
Image Pooling Layer
===================
img_pool
--------
.. autoclass:: paddle.v2.layer.img_pool
:noindex:
spp
---
.. autoclass:: paddle.v2.layer.spp
:noindex:
maxout
------
.. autoclass:: paddle.v2.layer.maxout
:noindex:
Norm Layer
==========
img_cmrnorm
-----------
.. autoclass:: paddle.v2.layer.img_cmrnorm
:noindex:
batch_norm
----------
.. autoclass:: paddle.v2.layer.batch_norm
:noindex:
sum_to_one_norm
---------------
.. autoclass:: paddle.v2.layer.sum_to_one_norm
:noindex:
cross_channel_norm
------------------
.. autoclass:: paddle.v2.layer.cross_channel_norm
:noindex:
Recurrent Layers
================
recurrent
---------
.. autoclass:: paddle.v2.layer.recurrent
:noindex:
lstmemory
---------
.. autoclass:: paddle.v2.layer.lstmemory
:noindex:
grumemory
---------
.. autoclass:: paddle.v2.layer.grumemory
:noindex:
Recurrent Layer Group
=====================
memory
------
.. autoclass:: paddle.v2.layer.memory
:noindex:
recurrent_group
---------------
.. autoclass:: paddle.v2.layer.recurrent_group
:noindex:
lstm_step
---------
.. autoclass:: paddle.v2.layer.lstm_step
:noindex:
gru_step
--------
.. autoclass:: paddle.v2.layer.gru_step
:noindex:
beam_search
------------
.. autoclass:: paddle.v2.layer.beam_search
:noindex:
get_output
----------
.. autoclass:: paddle.v2.layer.get_output
:noindex:
Mixed Layer
===========
.. _api_v2.layer_mixed:
mixed
-----
.. autoclass:: paddle.v2.layer.mixed
:noindex:
.. _api_v2.layer_embedding:
embedding
---------
.. autoclass:: paddle.v2.layer.embedding
:noindex:
scaling_projection
------------------
.. autoclass:: paddle.v2.layer.scaling_projection
:noindex:
dotmul_projection
-----------------
.. autoclass:: paddle.v2.layer.dotmul_projection
:noindex:
dotmul_operator
---------------
.. autoclass:: paddle.v2.layer.dotmul_operator
:noindex:
full_matrix_projection
----------------------
.. autoclass:: paddle.v2.layer.full_matrix_projection
:noindex:
identity_projection
-------------------
.. autoclass:: paddle.v2.layer.identity_projection
:noindex:
table_projection
----------------
.. autoclass:: paddle.v2.layer.table_projection
:noindex:
trans_full_matrix_projection
----------------------------
.. autoclass:: paddle.v2.layer.trans_full_matrix_projection
:noindex:
Aggregate Layers
================
.. _api_v2.layer_pooling:
pooling
-------
.. autoclass:: paddle.v2.layer.pooling
:noindex:
.. _api_v2.layer_last_seq:
last_seq
--------
.. autoclass:: paddle.v2.layer.last_seq
:noindex:
.. _api_v2.layer_first_seq:
first_seq
---------
.. autoclass:: paddle.v2.layer.first_seq
:noindex:
concat
------
.. autoclass:: paddle.v2.layer.concat
:noindex:
seq_concat
----------
.. autoclass:: paddle.v2.layer.seq_concat
:noindex:
Reshaping Layers
================
block_expand
------------
.. autoclass:: paddle.v2.layer.block_expand
:noindex:
.. _api_v2.layer_expand:
expand
------
.. autoclass:: paddle.v2.layer.expand
:noindex:
repeat
------
.. autoclass:: paddle.v2.layer.repeat
:noindex:
rotate
------
.. autoclass:: paddle.v2.layer.rotate
:noindex:
seq_reshape
-----------
.. autoclass:: paddle.v2.layer.seq_reshape
:noindex:
Math Layers
===========
addto
-----
.. autoclass:: paddle.v2.layer.addto
:noindex:
linear_comb
-----------
.. autoclass:: paddle.v2.layer.linear_comb
:noindex:
interpolation
-------------
.. autoclass:: paddle.v2.layer.interpolation
:noindex:
bilinear_interp
---------------
.. autoclass:: paddle.v2.layer.bilinear_interp
:noindex:
power
-----
.. autoclass:: paddle.v2.layer.power
:noindex:
scaling
-------
.. autoclass:: paddle.v2.layer.scaling
:noindex:
slope_intercept
---------------
.. autoclass:: paddle.v2.layer.slope_intercept
:noindex:
tensor
------
.. autoclass:: paddle.v2.layer.tensor
:noindex:
.. _api_v2.layer_cos_sim:
cos_sim
-------
.. autoclass:: paddle.v2.layer.cos_sim
:noindex:
trans
-----
.. autoclass:: paddle.v2.layer.trans
:noindex:
Sampling Layers
===============
maxid
-----
.. autoclass:: paddle.v2.layer.max_id
:noindex:
sampling_id
-----------
.. autoclass:: paddle.v2.layer.sampling_id
:noindex:
Slicing and Joining Layers
==========================
pad
----
.. autoclass:: paddle.v2.layer.pad
:noindex:
.. _api_v2.layer_costs:
Cost Layers
===========
cross_entropy_cost
------------------
.. autoclass:: paddle.v2.layer.cross_entropy_cost
:noindex:
cross_entropy_with_selfnorm_cost
--------------------------------
.. autoclass:: paddle.v2.layer.cross_entropy_with_selfnorm_cost
:noindex:
multi_binary_label_cross_entropy_cost
-------------------------------------
.. autoclass:: paddle.v2.layer.multi_binary_label_cross_entropy_cost
:noindex:
huber_cost
----------
.. autoclass:: paddle.v2.layer.huber_cost
:noindex:
lambda_cost
-----------
.. autoclass:: paddle.v2.layer.lambda_cost
:noindex:
mse_cost
--------
.. autoclass:: paddle.v2.layer.mse_cost
:noindex:
rank_cost
---------
.. autoclass:: paddle.v2.layer.rank_cost
:noindex:
sum_cost
---------
.. autoclass:: paddle.v2.layer.sum_cost
:noindex:
crf
---
.. autoclass:: paddle.v2.layer.crf
:noindex:
crf_decoding
------------
.. autoclass:: paddle.v2.layer.crf_decoding
:noindex:
ctc
---
.. autoclass:: paddle.v2.layer.ctc
:noindex:
warp_ctc
--------
.. autoclass:: paddle.v2.layer.warp_ctc
:noindex:
nce
---
.. autoclass:: paddle.v2.layer.nce
:noindex:
hsigmoid
---------
.. autoclass:: paddle.v2.layer.hsigmoid
:noindex:
Check Layer
============
eos
---
.. autoclass:: paddle.v2.layer.eos
:noindex:
========
Networks
========
The v2.networks module contains pieces of neural network that combine multiple layers.
NLP
===
sequence_conv_pool
------------------
.. automodule:: paddle.v2.networks
:members: sequence_conv_pool
:noindex:
.. _api_trainer_config_helpers_network_text_conv_pool:
text_conv_pool
--------------
.. automodule:: paddle.v2.networks
:members: text_conv_pool
:noindex:
Images
======
img_conv_bn_pool
----------------
.. automodule:: paddle.v2.networks
:members: img_conv_bn_pool
:noindex:
img_conv_group
--------------
.. automodule:: paddle.v2.networks
:members: img_conv_group
:noindex:
.. _api_trainer_config_helpers_network_simple_img_conv_pool:
simple_img_conv_pool
--------------------
.. automodule:: paddle.v2.networks
:members: simple_img_conv_pool
:noindex:
vgg_16_network
---------------
.. automodule:: paddle.v2.networks
:members: vgg_16_network
:noindex:
Recurrent
=========
LSTM
----
lstmemory_unit
``````````````
.. automodule:: paddle.v2.networks
:members: lstmemory_unit
:noindex:
lstmemory_group
```````````````
.. automodule:: paddle.v2.networks
:members: lstmemory_group
:noindex:
simple_lstm
```````````
.. automodule:: paddle.v2.networks
:members: simple_lstm
:noindex:
bidirectional_lstm
``````````````````
.. automodule:: paddle.v2.networks
:members: bidirectional_lstm
:noindex:
GRU
---
gru_unit
````````
.. automodule:: paddle.v2.networks
:members: gru_unit
:noindex:
gru_group
`````````
.. automodule:: paddle.v2.networks
:members: gru_group
:noindex:
simple_gru
``````````
.. automodule:: paddle.v2.networks
:members: simple_gru
:noindex:
simple_attention
----------------
.. automodule:: paddle.v2.networks
:members: simple_attention
:noindex:
Miscs
=====
dropout_layer
--------------
.. automodule:: paddle.v2.networks
:members: dropout_layer
:noindex:
==========
Optimizer
==========
Momentum
========
.. automodule:: paddle.v2.optimizer
:members: Momentum
:noindex:
Adam
====
.. automodule:: paddle.v2.optimizer
:members: Adam
:noindex:
Adamax
======
.. automodule:: paddle.v2.optimizer
:members: Adamax
:noindex:
AdaGrad
=======
.. automodule:: paddle.v2.optimizer
:members: AdaGrad
:noindex:
DecayedAdaGrad
==============
.. automodule:: paddle.v2.optimizer
:members: DecayedAdaGrad
:noindex:
AdaDelta
========
.. automodule:: paddle.v2.optimizer
:members: AdaDelta
:noindex:
RMSProp
=======
.. automodule:: paddle.v2.optimizer
:members: RMSProp
:noindex:
=======
Pooling
=======
BasePool
========
.. automodule:: paddle.v2.pooling
:members: BasePool
:noindex:
Avg
===
.. automodule:: paddle.v2.pooling
:members: Avg
:noindex:
Max
===
.. automodule:: paddle.v2.pooling
:members: Max
:noindex:
Sum
===
.. automodule:: paddle.v2.pooling
:members: Sum
:noindex:
SquareRootN
===========
.. automodule:: paddle.v2.pooling
:members: SquareRootN
:noindex:
CudnnAvg
========
.. automodule:: paddle.v2.pooling
:members: CudnnAvg
:noindex:
CudnnMax
========
.. automodule:: paddle.v2.pooling
:members: CudnnMax
:noindex:
==================================
Data Reader Interface and DataSets
==================================
DataTypes
=========
.. automodule:: paddle.v2.data_type
:members:
:noindex:
DataFeeder
==========
.. automodule:: paddle.v2.data_feeder
:members:
:noindex:
Reader
======
.. automodule:: paddle.v2.reader
:members:
:noindex:
.. automodule:: paddle.v2.reader.creator
:members:
:noindex:
minibatch
=========
.. automodule:: paddle.v2.minibatch
:members:
:noindex:
Dataset
=======
.. automodule:: paddle.v2.dataset
:members:
:noindex:
mnist
+++++
.. automodule:: paddle.v2.dataset.mnist
:members:
:noindex:
cifar
+++++
.. automodule:: paddle.v2.dataset.cifar
:members:
:noindex:
conll05
+++++++
.. automodule:: paddle.v2.dataset.conll05
:members: get_dict,get_embedding,test
:noindex:
imdb
++++
.. automodule:: paddle.v2.dataset.imdb
:members:
:noindex:
imikolov
++++++++
.. automodule:: paddle.v2.dataset.imikolov
:members:
:noindex:
movielens
+++++++++
.. automodule:: paddle.v2.dataset.movielens
:members:
:noindex:
.. autoclass:: paddle.v2.dataset.movielens.MovieInfo
:noindex:
.. autoclass:: paddle.v2.dataset.movielens.UserInfo
:noindex:
sentiment
+++++++++
.. automodule:: paddle.v2.dataset.sentiment
:members:
:noindex:
uci_housing
+++++++++++
.. automodule:: paddle.v2.dataset.uci_housing
:members:
:noindex:
wmt14
+++++
.. automodule:: paddle.v2.dataset.wmt14
:members:
:noindex:
Model Configuration
===================
.. toctree::
:maxdepth: 1
config/activation.rst
config/layer.rst
config/optimizer.rst
config/pooling.rst
config/networks.rst
config/attr.rst
======================
Training and Inference
======================
Parameters
==========
.. automodule:: paddle.v2.parameters
:members: Parameters
:noindex:
Trainer
=======
.. automodule:: paddle.v2.trainer
:members: SGD
:noindex:
Event
=====
.. automodule:: paddle.v2.event
:members:
:noindex:
Inference
=========
.. autofunction:: paddle.v2.infer
:noindex:
\ No newline at end of file
Docker installation guide
==========================
PaddlePaddle provide the `Docker <https://www.docker.com/>`_ image. `Docker`_ is a lightweight container utilities. The performance of PaddlePaddle in `Docker`_ container is basically as same as run it in a normal linux. The `Docker`_ is a very convenient way to deliver the binary release for linux programs.
.. note::
The `Docker`_ image is the recommended way to run PaddlePaddle
PaddlePaddle Docker images
--------------------------
There are 12 `images <https://hub.docker.com/r/paddledev/paddle/tags/>`_ for PaddlePaddle, and the name is :code:`paddle-dev/paddle`, tags are\:
+-----------------+------------------+------------------------+-----------------------+
| | normal | devel | demo |
+=================+==================+========================+=======================+
| CPU | cpu-latest | cpu-devel-latest | cpu-demo-latest |
+-----------------+------------------+------------------------+-----------------------+
| GPU | gpu-latest | gpu-devel-latest | gpu-demo-latest |
+-----------------+------------------+------------------------+-----------------------+
| CPU WITHOUT AVX | cpu-noavx-latest | cpu-devel-noavx-latest | cpu-demo-noavx-latest |
+-----------------+------------------+------------------------+-----------------------+
| GPU WITHOUT AVX | gpu-noavx-latest | gpu-devel-noavx-latest | gpu-demo-noavx-latest |
+-----------------+------------------+------------------------+-----------------------+
And the three columns are:
* normal\: The docker image only contains binary of PaddlePaddle.
* devel\: The docker image contains PaddlePaddle binary, source code and essential build environment.
* demo\: The docker image contains the dependencies to run PaddlePaddle demo.
And the four rows are:
* CPU\: CPU Version. Support CPU which has :code:`AVX` instructions.
* GPU\: GPU Version. Support GPU, and cpu has :code:`AVX` instructions.
* CPU WITHOUT AVX\: CPU Version, which support most CPU even doesn't have :code:`AVX` instructions.
* GPU WITHOUT AVX\: GPU Version, which support most CPU even doesn't have :code:`AVX` instructions.
User can choose any version depends on machine. The following script can help you to detect your CPU support :code:`AVX` or not.
.. code-block:: bash
if cat /proc/cpuinfo | grep -q avx ; then echo "Support AVX"; else echo "Not support AVX"; fi
If the output is :code:`Support AVX`, then you can choose the AVX version of PaddlePaddle, otherwise, you need select :code:`noavx` version of PaddlePaddle. For example, the CPU develop version of PaddlePaddle is :code:`paddle-dev/paddle:cpu-devel-latest`.
The PaddlePaddle images don't contain any entry command. You need to write your entry command to use this image. See :code:`Remote Access` part or just use following command to run a :code:`bash`
.. code-block:: bash
docker run -it paddledev/paddle:cpu-latest /bin/bash
Download and Run Docker images
------------------------------
You have to install Docker in your machine which has linux kernel version 3.10+ first. You can refer to the official guide https://docs.docker.com/engine/installation/ for further information.
You can use :code:`docker pull ` to download images first, or just launch a container with :code:`docker run` \:
.. code-block:: bash
docker run -it paddledev/paddle:cpu-latest
If you want to launch container with GPU support, you need to set some environment variables at the same time:
.. code-block:: bash
export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run ${CUDA_SO} ${DEVICES} -it paddledev/paddle:gpu-latest
Some notes for docker
---------------------
Performance
+++++++++++
Since Docker is based on the lightweight virtual containers, the CPU computing performance maintains well. And GPU driver and equipments are all mapped to the container, so the GPU computing performance would not be seriously affected.
If you use high performance nic, such as RDMA(RoCE 40GbE or IB 56GbE), Ethernet(10GbE), it is recommended to use config "-net = host".
Remote access
+++++++++++++
If you want to enable ssh access background, you need to build an image by yourself. Please refer to official guide https://docs.docker.com/engine/reference/builder/ for further information.
Following is a simple Dockerfile with ssh:
.. literalinclude:: ../../doc_cn/build_and_install/install/paddle_ssh.Dockerfile
Then you can build an image with Dockerfile and launch a container:
.. code-block:: bash
# cd into Dockerfile directory
docker build . -t paddle_ssh
# run container, and map host machine port 8022 to container port 22
docker run -d -p 8022:22 --name paddle_ssh_machine paddle_ssh
Now, you can ssh on port 8022 to access the container, username is root, password is also root:
.. code-block:: bash
ssh -p 8022 root@YOUR_HOST_MACHINE
You can stop and delete the container as following:
.. code-block:: bash
# stop
docker stop paddle_ssh_machine
# delete
docker rm paddle_ssh_machine
Cluster Train
====================
.. toctree::
:glob:
opensource/cluster_train.md
internal/index.md
Image Classification Tutorial
=============================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <image_classification.md>
cluster_train/internal/cluster_train.md
cluster_train/opensource/cluster_train.md
# Examples and demos
There are serveral examples and demos here.
## Image
* [Image Classification](image_classification/index.rst)
## NLP
* [Sentiment Analysis](sentiment_analysis/index.rst)
* [Text Generation](text_generation/index.rst)
* [Semantic Role Labeling](semantic_role_labeling/index.rst)
## Recommendation
* [MovieLens Dataset](rec/ml_dataset.md)
* [MovieLens Regression](rec/ml_regression.rst)
## Model Zoo
* [ImageNet: ResNet](imagenet_model/resnet_model.md)
* [Embedding: Chinese Word](embedding_model/index.md)
Semantic Role Labeling Tutorial
===============================
.. toctree::
:maxdepth: 3
semantic_role_labeling.md
Sentiment Analasis Tutorial
===========================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <sentiment_analysis.md>
internal/cluster_train.md
Text Generation Tutorial
========================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <text_generation.md>
internal/cluster_train.md
# PaddlePaddle Design Doc
## Ingredients
As our design principle is starting from the essence: how could we
allow users to express and solve their problems at neural networks.
Some essential concepts that our API have to provide include:
1. A *topology* is an expression of *layers*.
1. A layer could be any kind of computation, including *cost*.
1. Some layers have parameters, some don't. Most costs don't have
parameters.
1. In some topologies, layers share parameters. For
example,
[the network for training a ranking model](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850).
1. At programming time, users specify topologies and possible sharing
of parameters. PaddlePaddle can figure out and create parameters
required (and possibly shared) by one or more topologies.
## Starting from Examples
As a summarization
of
[our disucssion](https://github.com/PaddlePaddle/Paddle/issues/1315),
let us present two examples here:
### Example 1. Sharing Parameters between Layers
We use
the
[3-branch ranking](https://github.com/PaddlePaddle/Paddle/issues/1311#issuecomment-279121850) model
in this example. For your convenience, I copy-a-paste the model's
topology as follows:
```
A -> f -\
Q -> f --> cost
B -> f -/
```
The following program trains the topology including the cost, and then
use the sub-network in the trained topology in inference:
```python
def f(in):
e = paddle.layer.embedding(in, parameter_name="embedding")
o = paddle.layer.softmax(e, parameter_name="semantic")
return o
# Create 3 topologies (subnets), they share parameters because all
# correspoinding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))
topology = paddle.layer.less_than(
paddle.layer.cross_entropy(fA, fQ),
paddle.layer.corss_entropy(fB, fQ))
# Derive parameters required in topology and create them in model.
parameters = paddle.parameters.create(topology)
# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)
# Inference using fA (or fB or fC, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA)
```
### Example 2. Sharing Parameters between "Models"
We use [GAN](https://github.com/PaddlePaddle/book/tree/develop/gan) in
this example. In the following example program, `d0` and `d1`
correspond to the two networks in the following figure:
<img src="https://github.com/wangyang59/book/raw/00036f4b0da5225041a6824587c1a01cf20159b1/gan/image/gan_ig.png" width=400 />
```python
def G(in):
# over-simplified example as G has only one layers:
return paddle.layer.fc(in, parameter_name="G")
def D(in);
# again, over-simplified:
return paddle.layer.fc(in, parameter_name="D")
# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(D(G(paddle.layer.data())))
# Construct a second topology d1, which contains only D. By
# training this topology, we update parameters of D. Note
# that d1 share parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))
# Create parameters from a list of multiple topologies (models) for
# the chance to share parameters between these topologies.
parameters = paddle.parameters.create([d0, d1])
# Iterative training of GAN.
for ...:
train(d0, parameters, reader=read_from_rng, immutable_parameters={"D"})
train(d1, parameters, reader=read_from_realistic_images)
# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
```
### Summarization
Above two programs reveal some important design concerns:
1. Users describe a topology as an expression of layers. Every layer
has a *parameter name*. If the users don't specify it explicitly, it's automatically generated as a unique name. By
specifying the parameter name, users can specify the sharing of
parameters between layers and even between topologies.
1. `paddle.parameters.create` figures out parameters required by one
or more topologies from parameter names of layers. It creates these
parameters and returns a `ParameterSet` object, which is in essence
a map from *parameter names* to *parameters*.
1. At training and inference time, `paddle.train` and `paddle.infer`
requires both a topology and the parameter set that holds the parameters of that topology. There are some reasons:
1. This prevents users from forgetting to call
`paddle.parameters.create`.
1. `paddle.train` needs to know which parameter set to update.
1. Users could load another (pre-trained) parameter set and use it
with a topology in `train.infer`.
1. By specifying the `immutable_parameters` parameter of
`paddle.train`, we can forbid the update of these parameters.
## Reader
Not all programming frameworks allow users to define I/O functions.
An example is Google MapReduce, which can only read from text,
SSTable, and RecordIO files. Hadoop MapReduce allows users to define
readers and writers by deriving from base classes `Reader` and
`Writer`. The former is less flexible but also less error-prone. We
decide to provide the flexibility to users to define their readers.
There are some open questions here:
1. **Should a reader return a Python dictionary?**
1. **How to map multiple outputs from a reader to multiple data layers?**
1. **How to easily compose some existing readers to read more data and
feed a topology with more data layers?**
## Training
The recommended way to training a model is to call `paddle.train`,
which simply calls `paddle.trainer.Default`, a global variable of
type `paddle.trainer.SGD`. Equivalently, we can do
```python
opt = paddle.trainer.SGD(..., paddle.updater.Adam(...))
opt.train(topology, parameters, reader=read, ...)
```
### Updater
Please be aware that a trainer can accept an updater as its data
member, where an updater is a class derived from
`paddle.trainer.Updater`. This is to make it easier to customize
trainers, as discussed
[here](https://github.com/PaddlePaddle/Paddle/issues/1319).
### Event Handler
`paddle.train` and `paddle.trainer.XXX.train` take an optional
parameter `event_handler`, which should be either `None` or a function
that handle some events:
1. BeginTraining
1. EndTraining
1. BeginIteration
1. EndIteration
1. BeginPass
1. EndPass
where EndPass is sent if and only if the reader yields
`end_pass=True`.
An example as follows:
```python
def event_handler(event):
if ininstance(event, paddle.event.EndIteration):
print paddle.test(...)
paddle.train(topology, parameters, reader, event_handler)
```
If we are writing a PaddlePaddle program in and for iPython/Jypyter,
we can use metaplotlib in the event handler to plot a curve of
cost/error versus iterations, as shown
[here](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/).
### Distributed Training
If users want to do distributed training on a cluster, s/he should
call `paddle.dist_train` and provides access tokens to the cluster as
a parameter.
For example, if the user has a TLS certificate that allows him to
access a Kubernetes cluster, s/he should be able to call
```python
paddle.dist_train(model,
trainer=paddle.trainer.SGD(...,
paddle.updater.Adam(...)),
reader=read,
k8s_user="yi",
k8s_token="kube_cluster_tls.pem",
k8s_job="hello",
num_parameter_servers=15)
```
The pseudo code if `paddle.dist_train` is as follows:
```python
def dist_train(topology, parameters, trainer, reader, ...):
if os.getenv("KUBERNETES_SERVICE_HOST") == None:
image_name = k8s_user + '/' + k8s_job
docker_build(image_name)
docker_push()
kube_ctrl_start_job(image_name, k8s_user, k8s_token)
else:
rank = kube_list_containers_in_job_and_return_current_containers_rank()
if rank == 0:
master()
elif rank < 15:
parameter_server()
else:
trainer.train(model, reader=read)
```
Please be aware that if a process is running on the Kubernetes
cluster, it will have some environment variables pre-defined.
If `dist_train` doesn't see these environment variables, it knows
that it's running on users' personal computer, and it should work as a
*launcher*. Otherwise, it knows that it's running on the cluster and
need to figure out its role as either the master, or a trainer, or a
parameter server.
# Design Doc: Distributed Training
## Objective
In [this slides](https://www.slideshare.net/cxwangyi/paddlepaddle-a-complete-solution-for-businesses), we explained that we'd like PaddlePaddle running on general-purpose clusters like those managed by Kubernetes, so to address demands for AI from both Internet and non-Internet industries.
This poses technical challenges to PaddlePaddle:
1. Support fault-recovery.
1. Support both offline and online training.
1. [Serverless computing](https://en.wikipedia.org/wiki/Serverless_computing) of distributed training.
## Training Job
A training job will be created once user asks Paddle cloud to train a model. The training job is made up of different processes that collaboratively consume data and produce a trained model. There are three kinds of processes:
1. the *master process*, which dispatches tasks to
1. one or more *trainer processes*, which run distributed training and synchronize gradients/models via
1. one or more *parameter server processes*, where each holds a shard of the global model.
Their relation is illustrated in the following graph:
<img src="src/paddle-model-sharding.png"/>
### Master Process
The master process will:
- Partition a dataset into [tasks](#task) and dispatch tasks to trainers.
- Keep track of training progress on the dataset with [task queue](#task-queue). A training job will iterate on the dataset for a full pass until it goes into next pass.
#### Task
A task is a data shard to be trained. The total number of tasks will be much bigger than the total number of trainers. The number of data instances inside a task will be much bigger than the mini-batch size.
#### Task Queue
The master process has three task queues to track training progress. As illustrated in the graph below, Job A and Job B both have one master process. Each master process has three task queues.
<img src="src/paddle-task-queues.png"/>
- The todo queue holds tasks to be dispatched. When a job starts, the master process fills in the todo queue with all tasks.
- The pending queue holds tasks that are currently training by trainers.
- the done queue holds tasks that are already trained.
The life cycle of a single task is illustrated below:
<img src="src/paddle-task-states.png"/>
1. When a new pass of training starts, all tasks will be placed in the todo queue.
1. The master process will dispatch few tasks to each trainer at a time, puts them in the pending queue and waits for completion.
1. The trainer will work on its tasks and tell the master process once a task is completed. The master process will dispatch a new task to that trainer.
1. If a task timeout. the master process will move it back to the todo queue. The timeout count will increase by one. If the timeout count is above a threshold, the task is likely to cause a trainer to crash, so it will be discarded.
1. The master process will move completed task to the done queue. When the todo queue is empty, the master process will start a new pass by moving all tasks in the done queue to todo queue and reset the timeout counter of all tasks to zero.
### Trainer Process
The trainer process will:
- Receive tasks from the master.
- Work on the tasks: calculate and upload gradient to parameter servers, and update local model by downloading new parameters from parameter servers.
### Parameter Server Process
Parameter server processes hold the parameters collaboratively. The parameters are partitioned on different parameter servers.
The parameter server will:
- Receive gradient from the trainers, update its parameters, and give the trainers the latest parameters.
- Periodically save its parameters to distributed file system by overriding the previous save.
### Optimization Algorithms
The communication pattern between the trainers and the parameter servers depends on the category of optimization algorithm:
- Synchronous Stochastic Gradient Descent (sync-SGD)
Parameter server will wait for all trainer finish n-th mini-batch calculation and send their gradients before broadcasting new parameters to every trainer. Every trainer will wait for the new parameters before starting n+1-th mini-batch.
- Asynchronous Stochastic Gradient Descent (async-SGD)
There will no synchronization between different trainers, and parameter server updates its parameter as soon as it receives new gradient:
- Each trainer uploads its accumulated gradient every n mini-batches.
- Every m mini-batches, the trainer downloads new parameters from parameter server.
- n and m do not have to be equal.
## Fault Tolerant
The training job will pause if the master processes is dead, or any of the parameter server process is dead. They will be started by [Kubernetes](https://kubernetes.io/) and recover in few minutes. Please refer to [fault recovery](#fault-recovery).
The training job will continue to make progress if there is at least one training process running. The strategy depends on the type of optimization algorithm:
- sync-SGD
TODO
- async-SGD
Since async-SGD does not require synchronization between mini-batches, the system will by definition make process if at least one trainer is running.
## Fault Recovery
PaddlePaddle uses [etcd](https://github.com/coreos/etcd) to keep track of the states of processes. Because etcd is a distributed reliable key-value store, the restarted process can recover its states from etcd. The model parameters are periodically saved into distributed file system, so a restarted parameter server can recover its parameters from the saved file.
Now we will introduce how each process recovers from a failure, the graph below shows how etcd is used:
<img src="src/paddle-etcd.png"/>
### Master Process
When the master is started by the Kubernetes, it executes the following steps at startup:
1. Grabs a unique *master* lock in etcd, which prevents concurrent master instantiations.
1. Recovers the task queues from etcd if they already exist, otherwise, the master will create them.
1. Watches the trainer prefix keys `/trainer/` on etcd to find the live trainers.
1. Starts dispatching the tasks to the trainers, and updates task queue using an etcd transaction to ensure lock is held during the update.
The master process will kill itself if its etcd lease expires.
When the master process is dead for any reason, Kubernetes will restart it. It will be online again with all states recovered from etcd in few minutes.
### Trainer Process
When the trainer is started by the Kubernetes, it executes the following steps at startup:
1. Watches the available parameter server prefix keys `/ps/` on etcd and waits until the count of parameter servers reaches the desired count.
1. Generates a unique ID, and sets key `/trainer/<unique ID>` with its contact address as value. The key will be deleted when the lease expires, so the master will be aware of the trainer being online and offline.
1. Waits for tasks from the master to start training.
If trainer's etcd lease expires, it will try set key `/trainer/<unique ID>` again so that the master process can discover the trainer again.
### Parameter Server Process
When the parameter server is started by Kubernetes, it executes the following steps at startup:
1. Read desired total number of parameter servers from etcd `/ps_desired`
1. Search through etcd keys `/ps/<index>` (`/ps/0`, `/ps/1`, ...) to find the first non-existant key whose index is smaller than the total number of parameter servers. Set the key using a transaction to avoid concurrent writes. The parameter server's index is inferred from the key name.
The desired number of parameter servers is 3:
<img src="src/paddle-ps-0.png"/>
The third parameter server joined:
<img src="src/paddle-ps-1.png"/>
1. The parameter server can load parameters if there are already saved parameters in the save path (inferred from its index).
1. Now the parameter server is ready for the trainers' requests.
If the parameter server's etcd lease expires, the parameter server will kill itself.
## Dynamic Scaling
### Trainer Scaling
TODO
### Parameter Server Scaling
Not planned for v1.
## Training Dataset Format
TODO
## User Interface
TODO
# Paddle多语言接口实现
## 背景
Paddle需要一个多语言接口,这个接口需要做到:
* 有标准的,良好的文档
* 例如Python可以使用[Sphinx](http://www.sphinx-doc.org/en/stable/)生成API文档,golang可以使用[GoDoc](https://godoc.org/golang.org/x/tools/cmd/godoc)生成文档。这都需要这个接口按照约定俗成的规则来注释完备。
* 不同语言的接口适应不同语言的特性
* 例如Java与Python的错误处理是直接扔出来Exception,而对于golang错误处理应该使用返回值。
## 基本要求
Paddle的多语言接口实现包括一下几个方面:
* 我们使用动态库来分发Paddle。在这个动态库中不嵌入任何其他语言的解释器,也不使用其他动态库。
* 这个动态库使用C99标准的头文件导出一些函数,不使用/导出C++符号。
* 不导出Paddle内部的结构体、类,仅仅使用`void*`指针作为类型的句柄(handler)。
* 不使用SWIG这种代码生成器,而是手写多语言绑定。
## 原因
### 使用动态库来分发Paddle
* Paddle的链接方式比较复杂
* 如果用户要把Paddle的静态库(libpaddle.a)链接到自己的程序里,得使用 `--whole-archive` (for GCC) 或者 `--force_load` (for Clang) 参数,来确保把 libpaddle.a 里所有的符号都写入自己的程序的二进制文件里。这是因为 Paddle 的源码里使用了[object factory design pattern](http://stackoverflow.com/a/1310326/724872)。
* 编译型语言,例如C/C++使用静态库和动态库难度差不多。但是解释性语言,例如[Python](http://stackoverflow.com/questions/19560594/how-to-import-static-library-in-python)或者[Java](http://stackoverflow.com/questions/24493337/linking-static-library-with-jni),只能调用Paddle的动态库,否则得把Paddle静态库链接到解释器里。
* 解释性语言实际运行的二进制是解释器本身,如果调用静态库只能将静态库与解释器链接。例如对于Java来说,便是将静态库加入JVM中。这对于通常的Java的开发者来说,是不常见的做法。
### 动态库中不嵌入任何其他语言的解释器
* 目前Paddle的进程模型是C++内部驱动Python解释器进行模型配置解析和数据读取
* 我们最终的动态库中不嵌入Python或者其他任何语言的解释器。模型配置解析,数据读取均交由其他语言完成
现阶段Paddle有一个问题是,Paddle内嵌的Python解释器和外部使用的Python如果版本不同,会直接报错退出。
### Paddle动态库中,不引用其他动态库
* 即这个动态库是不依赖于其他任何文件的,可以在任何机器上执行的。
### 这个动态库使用C99标准的头文件导出一些函数,不使用/导出C++符号
* 由于C++编译器没有[名字修饰](https://en.wikipedia.org/wiki/Name_mangling#C.2B.2B)的规范,不同版本的编译器之间,对于同一段C++代码生成的符号可能不一致。而多语言接口需要直接读取生成的二进制(动态库),需要有稳定的导出符号。
* C语言是有导出符号的标准的,并且在常见的平台上,都是ABI调用标准的。
* 大多数语言都支持使用C语言API
* 使用C99而不使用C89,是因为C99支持[Fixed-width integer types](https://en.wikipedia.org/wiki/C_data_types#Fixed-width_integer_types)和[Boolean type](https://en.wikipedia.org/wiki/C_data_types#Boolean_type)。
* 使用C99而不使用C11的原因是,[C11](https://en.wikipedia.org/wiki/C11_(C_standard_revision))并没有Paddle特别需要的特性,且C99相对于C11使用更加广泛。
### 不导出Paddle内部的结构体、类,仅仅使用`void*`指针作为类型的句柄(handler)
* Paddle内部的类为C++书写,直接导出到C的接口比较困难。
* 在C-API中使用`void*`来表示Paddle内部类。再在每一个API中自己检查类型。
在C的头文件 `paddle_matrix.h` 中:
```C
typedef void* paddle_matrix;
typedef int paddle_error;
extern "C"
paddle_error paddle_matrix_shape(paddle_matrix matrix,
uint64_t* width,
uint64_t* height);
```
而在CPP里面实现这个C的接口,文件 `paddle_matrix.cpp`
```cpp
#include "paddle/math/matrix.hpp"
extern "C"
paddle_error paddle_matrix_shape(paddle_matrix matrix,
uint64_t *width,
uint64_t *height) {
auto m = (paddle::math::matrix*)(matrix);
*width = m->width();
*height = m->height();
}
```
其中`paddle/math/matrix.hpp`文件内容为:
```cpp
namespace paddle {
namespace math {
class Matrix {
//...
};
} // namespace math
} // namespace paddle
```
### 不使用SWIG这种代码生成器,而是手写多语言绑定
* [SWIG](http://www.swig.org/)是一个多语言接口的代码生成器。他的目标是使用C/C++写代码,SWIG直接读取C/C++的头文件,生成各种语言的绑定代码。
* 对于多语言接口,SWIG需要写一个interface文件。这个文件具有独特的语法,学习成本高。且增加一个第三方语言,就需要对这个第三方语言增加一些定义。有的时候,interface文件的写法非常[tricky](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/api/Paddle.swig#L36)。社区贡献代码学习成本高。
* SWIG暴露的接口保留了C++的接口样式,很难保证多语言代码风格的一致性。(函数命名,错误处理)
* 因为SWIG在第三方语言中暴露的函数名,类名和C++中完全一致。C++的命名风格并不能适应其他第三方语言。如果使用SWIG我们需要将在interface文件里,将大量的`SomeCppClass`重命名成`some_python_class`,或者`SomeGoTypes`。
* 对于不同语言,错误处理的方式也不尽相同。例如对于Java或者Python,最常见的错误处理方式是Exception,而对于Golang,错误处理方式是返回值。而SWIG只能简单的暴露C++接口,无法做到对于各种语言错误处理方式的适配。
* 对于大多数语言,直接使用C语言的.h并不困难。例如Python的[cffi](https://cffi.readthedocs.io/en/latest/overview.html#simple-example-abi-level-in-line)或者[Cython](http://cython.org/), golang的[cgo](https://golang.org/cmd/cgo/)。
* SWIG支持的语言或者解释器有局限。例如对于Python,使用SWIG只支持CPython解释器,而不支持PyPy解释器。
## 原因列表
| 结论 | 对比 | 原因 |
|---| --- | --- |
| 使用动态库 | 不使用静态库 | 解释型语言只能调用动态库,Paddle静态库链接复杂 |
| 不嵌入其他语言解释器 | 不嵌入Python解释器 | Paddle C++目前嵌入Python解释器,会导致不同版本Python在一个进程里的bug |
| 不引用其他动态库 | | Paddle一个动态库可以在任何Linux系统上运行 |
| 使用C99做接口 | 不使用C++做接口 | C有标准的ABI,C99是目前C最广泛的使用标准,且C99支持bool类型和定长整数(uint64_t等)类型 |
| 使用void*作为类句柄 | 不显示的写每个类具体包含什么| 实现简单,并且让接口脱离实现细节 |
| 手写多语言绑定 | 不使用SWIG | 使用SWIG需要多语言绑定的开发人员熟练掌握SWIG配置,社区参与困难。SWIG生成的代码不能保证多语言代码风格的一致性 |
## 简单实现
TBD
# Python Data Reader Design Doc
At training and testing time, PaddlePaddle programs need to read data. To ease the users' work to write data reading code, we define that
- A *reader* is a function that reads data (from file, network, random number generator, etc) and yields data items.
- A *reader creator* is a function that returns a reader function.
- A *reader decorator* is a function, which accepts one or more readers, and returns a reader.
- A *batch reader* is a function that reads data (from *reader*, file, network, random number generator, etc) and yields a batch of data items.
and provide function which converts reader to batch reader, frequently used reader creators and reader decorators.
## Data Reader Interface
Indeed, *data reader* doesn't have to be a function that reads and yields data items. It can be any function with no parameter that creates a iterable (anything can be used in `for x in iterable`):
```
iterable = data_reader()
```
Element produced from the iterable should be a **single** entry of data, **not** a mini batch. That entry of data could be a single item, or a tuple of items. Item should be of [supported type](http://www.paddlepaddle.org/doc/ui/data_provider/pydataprovider2.html?highlight=dense_vector#input-types) (e.g., numpy 1d array of float32, int, list of int)
An example implementation for single item data reader creator:
```python
def reader_creator_random_image(width, height):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height)
return reader
```
An example implementation for multiple item data reader creator:
```python
def reader_creator_random_image_and_label(width, height, label):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height), label
return reader
```
## Batch Reader Interface
*batch reader* can be any function with no parameter that creates a iterable (anything can be used in `for x in iterable`). The output of the iterable should be a batch (list) of data items. Each item inside the list must be a tuple.
Here are valid outputs:
```python
# a mini batch of three data items. Each data item consist three columns of data, each of which is 1.
[(1, 1, 1),
(2, 2, 2),
(3, 3, 3)]
# a mini batch of three data items, each data item is a list (single column).
[([1,1,1],),
([2,2,2],),
([3,3,3],),
```
Please note that each item inside the list must be a tuple, below is an invalid output:
```python
# wrong, [1,1,1] needs to be inside a tuple: ([1,1,1],).
# Otherwise it's ambiguous whether [1,1,1] means a single column of data [1, 1, 1],
# or three column of datas, each of which is 1.
[[1,1,1],
[2,2,2],
[3,3,3]]
```
It's easy to convert from reader to batch reader:
```python
mnist_train = paddle.dataset.mnist.train()
mnist_train_batch_reader = paddle.batch(mnist_train, 128)
```
Also easy to create custom batch reader:
```python
def custom_batch_reader():
while True:
batch = []
for i in xrange(128):
batch.append((numpy.random.uniform(-1, 1, 28*28),)) # note that it's a tuple being appended.
yield batch
mnist_random_image_batch_reader = custom_batch_reader
```
## Usage
batch reader, mapping from item(s) read to data layer, batch size and number of total pass will be passed into `paddle.train`:
```python
# two data layer is created:
image_layer = paddle.layer.data("image", ...)
label_layer = paddle.layer.data("label", ...)
# ...
batch_reader = paddle.batch(paddle.dataset.mnist.train(), 128)
paddle.train(batch_reader, {"image":0, "label":1}, 128, 10, ...)
```
## Data Reader Decorator
*Data reader decorator* takes a single or multiple data reader, returns a new data reader. It is similar to a [python decorator](https://wiki.python.org/moin/PythonDecorators), but it does not use `@` syntax.
Since we have a strict interface for data readers (no parameter, return a single data item). Data reader can be used flexiable via data reader decorators. Following are a few examples:
### Prefetch Data
Since reading data may take time and training can not proceed without data. It is generally a good idea to prefetch data.
Use `paddle.reader.buffered` to prefetch data:
```python
buffered_reader = paddle.reader.buffered(paddle.dataset.mnist.train(), 100)
```
`buffered_reader` will try to buffer (prefetch) `100` data entries.
### Compose Multiple Data Readers
For example, we want to use a source of real images (reusing mnist dataset), and a source of random images as input for [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661).
We can do:
```python
def reader_creator_random_image(width, height):
def reader():
while True:
yield numpy.random.uniform(-1, 1, size=width*height)
return reader
def reader_creator_bool(t):
def reader:
while True:
yield t
return reader
true_reader = reader_creator_bool(True)
false_reader = reader_creator_bool(False)
reader = paddle.reader.compose(paddle.dataset.mnist.train(), data_reader_creator_random_image(20, 20), true_reader, false_reader)
# Skipped 1 because paddle.dataset.mnist.train() produces two items per data entry.
# And we don't care second item at this time.
paddle.train(paddle.batch(reader, 128), {"true_image":0, "fake_image": 2, "true_label": 3, "false_label": 4}, ...)
```
### Shuffle
Given shuffle buffer size `n`, `paddle.reader.shuffle` will return a data reader that buffers `n` data entries and shuffle them before a data entry is read.
Example:
```python
reader = paddle.reader.shuffle(paddle.dataset.mnist.train(), 512)
```
## Q & A
### Why reader return only a single entry, but not a mini batch?
Always returning a single entry make reusing existing data readers much easier (e.g., if existing reader return not a single entry but 3 entries, training code will be more complex because it need to handle cases like batch size 2).
We provide function `paddle.batch` to turn (single entry) reader into batch reader.
### Why do we need batch reader, isn't train take reader and batch_size as arguments sufficient?
In most of the case, train taking reader and batch_size as arguments would be sufficent. However sometimes user want to customize order of data entries inside a mini batch. Or even change batch size dynamically.
### Why use a dictionary but not a list to provide mapping?
We decided to use dictionary (`{"image":0, "label":1}`) instead of list (`["image", "label"]`) is because that user can easily resue item (e.g., using `{"image_a":0, "image_b":0, "label":1}`) or skip item (e.g., using `{"image_a":0, "label":2}`).
### How to create custom data reader creator
```python
def image_reader_creator(image_path, label_path, n):
def reader():
f = open(image_path)
l = open(label_path)
images = numpy.fromfile(
f, 'ubyte', count=n * 28 * 28).reshape((n, 28 * 28)).astype('float32')
images = images / 255.0 * 2.0 - 1.0
labels = numpy.fromfile(l, 'ubyte', count=n).astype("int")
for i in xrange(n):
yield images[i, :], labels[i] # a single entry of data is created each time
f.close()
l.close()
return reader
# images_reader_creator creates a reader
reader = image_reader_creator("/path/to/image_file", "/path/to/label_file", 1024)
paddle.train(paddle.batch(reader, 128), {"image":0, "label":1}, ...)
```
### How is `paddle.train` implemented
An example implementation of paddle.train could be:
```python
def train(batch_reader, mapping, batch_size, total_pass):
for pass_idx in range(total_pass):
for mini_batch in batch_reader(): # this loop will never end in online learning.
do_forward_backward(mini_batch, mapping)
```
Writing New Layers
==================
.. toctree::
:maxdepth: 3
new_layer.rst
# Introduction
Simple Linear Regression
========================
PaddlePaddle is a deep learning platform open-sourced by Baidu. With PaddlePaddle, you can easily train a classic neural network within a couple lines of configuration, or you can build sophisticated models that provide state-of-the-art performance on difficult learning tasks like sentiment analysis, machine translation, image caption and so on.
## 1. A Classic Problem
Problem Background
------------------
Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - <a href="https://en.wikipedia.org/wiki/Simple_linear_regression">**simple linear regression**</a> : you have observed a set of two-dimensional data points of `X` and `Y`, where `X` is an explanatory variable and `Y` is corresponding dependent variable, and you want to recover the underlying correlation between `X` and `Y`. Linear regression can be used in many practical scenarios. For example, `X` can be a variable about house size, and `Y` a variable about house price. You can build a model that captures relationship between them by observing real estate markets.
Now, to give you a hint of what using PaddlePaddle looks like, let's start with a fundamental learning problem - `simple linear regression <https://en.wikipedia.org/wiki/Simple_linear_regression>`_: you have observed a set of two-dimensional data points of ``X`` and ``Y``, where ``X`` is an explanatory variable and ``Y`` is corresponding dependent variable, and you want to recover the underlying correlation between ``X`` and ``Y``. Linear regression can be used in many practical scenarios. For example, ``X`` can be a variable about house size, and ``Y`` a variable about house price. You can build a model that captures relationship between them by observing real estate markets.
## 2. Prepare the Data
Prepare the Data
-----------------
Suppose the true relationship can be characterized as `Y = 2X + 0.3`, let's see how to recover this pattern only from observed data. Here is a piece of python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory, the only extra thing you need to add for PaddlePaddle is a definition of input data types.
Suppose the true relationship can be characterized as ``Y = 2X + 0.3``, let's see how to recover this pattern only from observed data. Here is a piece of python code that feeds synthetic data to PaddlePaddle. The code is pretty self-explanatory, the only extra thing you need to add for PaddlePaddle is a definition of input data types.
```python
# dataprovider.py
from paddle.trainer.PyDataProvider2 import *
import random
.. code-block:: python
# define data types of input: 2 real numbers
@provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
def process(settings, input_file):
for i in xrange(2000):
x = random.random()
yield [x], [2*x+0.3]
```
# dataprovider.py
from paddle.trainer.PyDataProvider2 import *
import random
## 3. Train a NeuralNetwork in PaddlePaddle
# define data types of input: 2 real numbers
@provider(input_types=[dense_vector(1), dense_vector(1)],use_seq=False)
def process(settings, input_file):
for i in xrange(2000):
x = random.random()
yield [x], [2*x+0.3]
To recover this relationship between `X` and `Y`, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies, it's just saying that we are starting from a random line `Y' = wX + b` , then we gradually adapt `w` and `b` to minimize the difference between `Y'` and `Y`. Here is what it looks like in PaddlePaddle:
Train a NeuralNetwork
----------------------
```python
# trainer_config.py
from paddle.trainer_config_helpers import *
To recover this relationship between ``X`` and ``Y``, we use a neural network with one layer of linear activation units and a square error cost layer. Don't worry if you are not familiar with these terminologies, it's just saying that we are starting from a random line ``Y' = wX + b`` , then we gradually adapt ``w`` and ``b`` to minimize the difference between ``Y'`` and ``Y``. Here is what it looks like in PaddlePaddle:
# 1. read data. Suppose you saved above python code as dataprovider.py
data_file = 'empty.list'
with open(data_file, 'w') as f: f.writelines(' ')
define_py_data_sources2(train_list=data_file, test_list=None,
module='dataprovider', obj='process',args={})
.. code-block:: python
# 2. learning algorithm
settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer())
# trainer_config.py
from paddle.trainer_config_helpers import *
# 3. Network configuration
x = data_layer(name='x', size=1)
y = data_layer(name='y', size=1)
y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
cost = regression_cost(input=y_predict, label=y)
outputs(cost)
```
# 1. read data. Suppose you saved above python code as dataprovider.py
data_file = 'empty.list'
with open(data_file, 'w') as f: f.writelines(' ')
define_py_data_sources2(train_list=data_file, test_list=None,
module='dataprovider', obj='process',args={})
# 2. learning algorithm
settings(batch_size=12, learning_rate=1e-3, learning_method=MomentumOptimizer())
# 3. Network configuration
x = data_layer(name='x', size=1)
y = data_layer(name='y', size=1)
y_predict = fc_layer(input=x, param_attr=ParamAttr(name='w'), size=1, act=LinearActivation(), bias_attr=ParamAttr(name='b'))
cost = mse_cost(input=y_predict, label=y)
outputs(cost)
Some of the most fundamental usages of PaddlePaddle are demonstrated:
......@@ -55,46 +59,43 @@ Some of the most fundamental usages of PaddlePaddle are demonstrated:
- The second part describes learning algorithm. It defines in what ways adjustments are made to model parameters. PaddlePaddle provides a rich set of optimizers, but a simple momentum based optimizer will suffice here, and it processes 12 data points each time.
- Finally, the network configuration. It usually is as simple as "stacking" layers. Three kinds of layers are used in this configuration:
- **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for `X` and `Y`.
- **Data Layer**: a network always starts with one or more data layers. They provide input data to the rest of the network. In this problem, two data layers are used respectively for ``X`` and ``Y``.
- **FC Layer**: FC layer is short for Fully Connected Layer, which connects all the input units to current layer and does the actual computation specified as activation function. Computation layers like this are the fundamental building blocks of a deeper model.
- **Cost Layer**: in training phase, cost layers are usually the last layers of the network. They measure the performance of current model, and provide guidence to adjust parameters.
Now that everything is ready, you can train the network with a simple command line call:
```
paddle train --config=trainer_config.py --save_dir=./output --num_passes=30
```
This means that PaddlePaddle will train this network on the synthectic dataset for 30 passes, and save all the models under path `./output`. You will see from the messages printed out during training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess.
.. code-block:: bash
paddle train --config=trainer_config.py --save_dir=./output --num_passes=30
## 4. Evaluate the Model
This means that PaddlePaddle will train this network on the synthectic dataset for 30 passes, and save all the models under path ``./output``. You will see from the messages printed out during training phase that the model cost is decreasing as time goes by, which indicates we are getting a closer guess.
Usually, a different dataset that left out during training phase should be used to evalute the models. However, we are lucky enough to know the real answer: `w=2, b=0.3`, thus a better option is to check out model parameters directly.
In PaddlePaddle, training is just to get a collection of model parameters, which are `w` and `b` in this case. Each parameter is saved in an individual file in the popular `numpy` array format. Here is the code that reads parameters from last pass.
Evaluate the Model
-------------------
```python
import numpy as np
import os
Usually, a different dataset that left out during training phase should be used to evalute the models. However, we are lucky enough to know the real answer: ``w=2, b=0.3``, thus a better option is to check out model parameters directly.
def load(file_name):
with open(file_name, 'rb') as f:
f.read(16) # skip header for float type.
return np.fromfile(f, dtype=np.float32)
print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b'))
# w=1.999743, b=0.300137
```
In PaddlePaddle, training is just to get a collection of model parameters, which are ``w`` and ``b`` in this case. Each parameter is saved in an individual file in the popular ``numpy`` array format. Here is the code that reads parameters from last pass.
<center> ![](./parameters.png) </center>
.. code-block:: python
Although starts from a random guess, you can see that value of `w` changes quickly towards 2 and `b` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.
import numpy as np
import os
There, you have recovered the underlying pattern between `X` and `Y` only from observed data.
def load(file_name):
with open(file_name, 'rb') as f:
f.read(16) # skip header for float type.
return np.fromfile(f, dtype=np.float32)
print 'w=%.6f, b=%.6f' % (load('output/pass-00029/w'), load('output/pass-00029/b'))
# w=1.999743, b=0.300137
.. image:: parameters.png
:align: center
## 5. Where to Go from Here
Although starts from a random guess, you can see that value of ``w`` changes quickly towards 2 and ``b`` changes quickly towards 0.3. In the end, the predicted line is almost identical with real answer.
- <a href="../build/index.html"> Build and Installation </a>
- <a href="../demo/quick_start/index_en.html">Quick Start</a>
- <a href="../demo/index.html">Example and Demo</a>
There, you have recovered the underlying pattern between ``X`` and ``Y`` only from observed data.
......@@ -4,28 +4,31 @@ Installing from Sources
* [1. Download and Setup](#download)
* [2. Requirements](#requirements)
* [3. Build on Ubuntu](#ubuntu)
* [4. Build on Centos](#centos)
## <span id="download">Download and Setup</span>
You can download PaddlePaddle from the [github source](https://github.com/gangliao/Paddle).
You can download PaddlePaddle from the [github source](https://github.com/PaddlePaddle/Paddle).
```bash
git clone https://github.com/baidu/Paddle paddle
git clone https://github.com/PaddlePaddle/Paddle paddle
cd paddle
```
## <span id="requirements">Requirements</span>
To compile the source code, your computer must be equipped with GCC >=4.6 or Clang compiler.
### Dependencies
To compile the source code, your computer must be equipped with the following dependencies.
- **CMake**: version >= 2.8
- **Compiler**: GCC >= 4.8 or Clang >= 3.3 (AppleClang >= 5.1) and gfortran compiler
- **CMake**: CMake >= 3.0 (at least CMake 3.4 on Mac OS X)
- **BLAS**: MKL, OpenBlas or ATLAS
- **protobuf**: version >= 2.4, **Note: 3.x is not supported**
- **python**: only python 2.7 is supported currently
- **Python**: only support Python 2.7
**Note:** For CUDA 7.0 and CUDA 7.5, GCC 5.0 and up are not supported!
For CUDA 8.0, GCC versions later than 5.3 are not supported!
### Options
PaddlePaddle supports some build options. To enable it, first you need to install the related libraries.
PaddlePaddle supports some build options.
<html>
<table>
......@@ -36,37 +39,42 @@ PaddlePaddle supports some build options. To enable it, first you need to instal
</tr>
</thead>
<tbody>
<tr><td class="left">WITH_GPU</td><td class="left">Compile with GPU mode.</td></tr>
<tr><td class="left">WITH_DOUBLE</td><td class="left">Compile with double precision floating-point, default: single precision.</td></tr>
<tr><td class="left">WITH_GLOG</td><td class="left">Compile with glog. If not found, default: an internal log implementation.</td></tr>
<tr><td class="left">WITH_GFLAGS</td><td class="left">Compile with gflags. If not found, default: an internal flag implementation.</td></tr>
<tr><td class="left">WITH_TESTING</td><td class="left">Compile with gtest for PaddlePaddle's unit testing.</td></tr>
<tr><td class="left">WITH_DOC</td><td class="left"> Compile to generate PaddlePaddle's docs, default: disabled (OFF).</td></tr>
<tr><td class="left">WITH_SWIG_PY</td><td class="left">Compile with python predict API, default: disabled (OFF).</td></tr>
<tr><td class="left">WITH_STYLE_CHECK</td><td class="left">Compile with code style check, default: enabled (ON).</td></tr>
<tr><td class="left">WITH_GPU</td><td class="left">Compile PaddlePaddle with NVIDIA GPU</td></tr>
<tr><td class="left">WITH_AVX</td><td class="left">Compile PaddlePaddle with AVX intrinsics</td></tr>
<tr><td class="left">WITH_DSO</td><td class="left">Compile PaddlePaddle with dynamic linked CUDA</td></tr>
<tr><td class="left">WITH_TESTING</td><td class="left">Compile PaddlePaddle with unit testing</td></tr>
<tr><td class="left">WITH_SWIG_PY</td><td class="left">Compile PaddlePaddle with inference api</td></tr>
<tr><td class="left">WITH_STYLE_CHECK</td><td class="left">Compile PaddlePaddle with style check</td></tr>
<tr><td class="left">WITH_PYTHON</td><td class="left">Compile PaddlePaddle with python interpreter</td></tr>
<tr><td class="left">WITH_DOUBLE</td><td class="left">Compile PaddlePaddle with double precision</td></tr>
<tr><td class="left">WITH_RDMA</td><td class="left">Compile PaddlePaddle with RDMA support</td></tr>
<tr><td class="left">WITH_TIMER</td><td class="left">Compile PaddlePaddle with stats timer</td></tr>
<tr><td class="left">WITH_PROFILER</td><td class="left">Compile PaddlePaddle with GPU profiler</td></tr>
<tr><td class="left">WITH_DOC</td><td class="left">Compile PaddlePaddle with documentation</td></tr>
<tr><td class="left">WITH_COVERAGE</td><td class="left">Compile PaddlePaddle with code coverage</td></tr>
<tr><td class="left">COVERALLS_UPLOAD</td><td class="left">Package code coverage data to coveralls</td></tr>
<tr><td class="left">ON_TRAVIS</td><td class="left">Exclude special unit test on Travis CI</td></tr>
</tbody>
</table>
</html>
**Note:**
- The GPU version works best with Cuda Toolkit 7.5 and cuDNN v5.
- Other versions like Cuda Toolkit 6.5, 7.0, 8.0 and cuDNN v2, v3, v4 are also supported.
- The GPU version works best with Cuda Toolkit 8.0 and cuDNN v5.
- Other versions like Cuda Toolkit 7.0, 7.5 and cuDNN v3, v4 are also supported.
- **To utilize cuDNN v5, Cuda Toolkit 7.5 is prerequisite and vice versa.**
As a simple example, consider the following:
1. **Python Dependencies(optional)**
1. **BLAS Dependencies(optional)**
To compile PaddlePaddle with python predict API, make sure swig installed and set `-DWITH_SWIG_PY=ON` as follows:
CMake will search BLAS libraries from system. If not found, OpenBLAS will be downloaded, built and installed automatically.
To utilize preinstalled BLAS, you can simply specify MKL, OpenBLAS or ATLAS via `MKL_ROOT`, `OPENBLAS_ROOT` or `ATLAS_ROOT`.
```bash
# install swig on ubuntu
sudo apt-get install swig
# install swig on Mac OS X
brew install swig
# active swig in cmake
cmake .. -DWITH_SWIG_PY=ON
# specify MKL
cmake .. -DMKL_ROOT=<mkl_path>
# or specify OpenBLAS
cmake .. -DOPENBLAS_ROOT=<openblas_path>
```
2. **Doc Dependencies(optional)**
......@@ -75,7 +83,7 @@ As a simple example, consider the following:
```bash
pip install 'sphinx>=1.4.0'
pip install sphinx_rtd_theme breathe recommonmark
pip install sphinx_rtd_theme recommonmark
# install doxygen on Ubuntu
sudo apt-get install doxygen
......@@ -90,24 +98,21 @@ As a simple example, consider the following:
### Install Dependencies
- **CPU Dependencies**
- **Paddle Dependencies**
```bash
# necessary
sudo apt-get update
sudo apt-get install -y g++ make cmake build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git
# optional
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libgflags-dev
sudo apt-get install libgtest-dev
sudo pip install wheel
pushd /usr/src/gtest
cmake .
make
sudo cp *.a /usr/lib
popd
sudo apt-get install -y git curl gcc g++ gfortran make build-essential automake
sudo apt-get install -y python python-pip python-numpy libpython-dev bison
sudo pip install 'protobuf==3.1.0.post1'
# install cmake 3.4
curl -sSL https://cmake.org/files/v3.4/cmake-3.4.1.tar.gz | tar -xz && \
cd cmake-3.4.1 && ./bootstrap && make -j4 && sudo make install && \
cd .. && rm -rf cmake-3.4.1
```
- **GPU Dependencies (optional)**
To build GPU version, you will need the following installed:
......@@ -140,53 +145,78 @@ As usual, the best option is to create build folder under paddle project directo
```bash
mkdir build && cd build
cmake ..
```
Finally, you can build and install PaddlePaddle:
```bash
# you can add build option here, such as:
cmake .. -DCMAKE_INSTALL_PREFIX=<path to install>
# please use sudo make install, if you want to install PaddlePaddle into the system
make -j `nproc` && make install
# set PaddlePaddle installation path in ~/.bashrc
export PATH=<path to install>/bin:$PATH
# install PaddlePaddle Python modules.
sudo pip install <path to install>/opt/paddle/share/wheels/*.whl
```
## <span id="centos">Build on Centos 7</span>
### Install Dependencies
- **CPU Dependencies**
```bash
# necessary
sudo yum update
sudo yum install -y epel-release
sudo yum install -y make cmake3 python-devel python-pip gcc-gfortran swig git
sudo pip install wheel numpy
sudo pip install 'protobuf>=3.0.0'
```
- **GPU Dependencies (optional)**
To build GPU version, you will need the following installed:
CMake first check PaddlePaddle's dependencies in system default path. After installing some optional
libraries, corresponding build option will be set automatically (for instance, glog, gtest and gflags).
If still not found, you can manually set it based on CMake error information from your screen.
1. a CUDA-capable GPU
2. A supported version of Linux with a gcc compiler and toolchain
3. NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads)
4. NVIDIA cuDNN Library (availabel at https://developer.nvidia.com/cudnn)
As a simple example, consider the following:
The CUDA development environment relies on tight integration with the host development environment,
including the host compiler and C runtime libraries, and is therefore only supported on
distribution versions that have been qualified for this CUDA Toolkit release.
After downloading cuDNN library, issue the following commands:
- **Only CPU**
```bash
sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
Then you need to set LD\_LIBRARY\_PATH, PATH environment variables in ~/.bashrc.
```bash
cmake .. -DWITH_GPU=OFF
```
- **GPU**
```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda/bin:$PATH
```
```bash
cmake .. -DWITH_GPU=ON
```
### Build and Install
- **GPU with doc and swig**
As usual, the best option is to create build folder under paddle project directory.
```bash
cmake .. -DWITH_GPU=ON -DWITH_DOC=ON -DWITH_SWIG_PY=ON
```
```bash
mkdir build && cd build
```
Finally, you can build PaddlePaddle:
Finally, you can build and install PaddlePaddle:
```bash
# you can add build option here, such as:
cmake .. -DWITH_GPU=ON -DCMAKE_INSTALL_PREFIX=<path to install>
cmake3 .. -DCMAKE_INSTALL_PREFIX=<path to install>
# please use sudo make install, if you want to install PaddlePaddle into the system
make -j `nproc` && make install
# set PaddlePaddle installation path in ~/.bashrc
export PATH=<path to install>/bin:$PATH
```
**Note:**
If you set `WITH_SWIG_PY=ON`, related python dependencies also need to be installed.
Otherwise, PaddlePaddle will automatically install python dependencies
at first time when user run paddle commands, such as `paddle version`, `paddle train`.
It may require sudo privileges:
```bash
# you can run
# install PaddlePaddle Python modules.
sudo pip install <path to install>/opt/paddle/share/wheels/*.whl
# or just run
sudo paddle version
```
PaddlePaddle in Docker Containers
=================================
Docker container is currently the only officially-supported way to
running PaddlePaddle. This is reasonable as Docker now runs on all
major operating systems including Linux, Mac OS X, and Windows.
Please be aware that you will need to change `Dockers settings
<https://github.com/PaddlePaddle/Paddle/issues/627>`_ to make full use
of your hardware resource on Mac OS X and Windows.
Working With Docker
-------------------
Docker is simple as long as we understand a few basic concepts:
- *image*: A Docker image is a pack of software. It could contain one or more programs and all their dependencies. For example, the PaddlePaddle's Docker image includes pre-built PaddlePaddle and Python and many Python packages. We can run a Docker image directly, other than installing all these software. We can type
.. code-block:: bash
docker images
to list all images in the system. We can also run
.. code-block:: bash
docker pull paddlepaddle/paddle:0.10.0rc2
to download a Docker image, paddlepaddle/paddle in this example,
from Dockerhub.com.
- *container*: considering a Docker image a program, a container is a
"process" that runs the image. Indeed, a container is exactly an
operating system process, but with a virtualized filesystem, network
port space, and other virtualized environment. We can type
.. code-block:: bash
docker run paddlepaddle/paddle:0.10.0rc2
to start a container to run a Docker image, paddlepaddle/paddle in this example.
- By default docker container have an isolated file system namespace,
we can not see the files in the host file system. By using *volume*,
mounted files in host will be visible inside docker container.
Following command will mount current dirctory into /data inside
docker container, run docker container from debian image with
command :code:`ls /data`.
.. code-block:: bash
docker run --rm -v $(pwd):/data debian ls /data
Usage of CPU-only and GPU Images
----------------------------------
We package PaddlePaddle's compile environment into a Docker image,
called the develop image, it contains all compiling tools that
PaddlePaddle needs. We package compiled PaddlePaddle program into a
Docker image as well, called the production image, it contains all
runtime environment that running PaddlePaddle needs. For each version
of PaddlePaddle, we release both of them. Production image includes
CPU-only version and a CUDA GPU version and their no-AVX versions.
We put the docker images on `dockerhub.com
<https://hub.docker.com/r/paddledev/paddle/>`_. You can find the
latest versions under "tags" tab at dockerhub.com. If you are in
China, you can use our Docker image registry mirror to speed up the
download process. To use it, please replace all paddlepaddle/paddle in
the commands to docker.paddlepaddle.org/paddle.
1. Production images, this image might have multiple variants:
- GPU/AVX::code:`paddlepaddle/paddle:<version>-gpu`
- GPU/no-AVX::code:`paddlepaddle/paddle:<version>-gpu-noavx`
- CPU/AVX::code:`paddlepaddle/paddle:<version>`
- CPU/no-AVX::code:`paddlepaddle/paddle:<version>-noavx`
Please be aware that the CPU-only and the GPU images both use the
AVX instruction set, but old computers produced before 2008 do not
support AVX. The following command checks if your Linux computer
supports AVX:
.. code-block:: bash
if cat /proc/cpuinfo | grep -i avx; then echo Yes; else echo No; fi
To run the CPU-only image as an interactive container:
.. code-block:: bash
docker run -it --rm paddlepaddle/paddle:0.10.0rc2 /bin/bash
Above method work with the GPU image too -- the recommended way is
using `nvidia-docker <https://github.com/NVIDIA/nvidia-docker>`_.
Please install nvidia-docker first following this `tutorial
<https://github.com/NVIDIA/nvidia-docker#quick-start>`_.
Now you can run a GPU image:
.. code-block:: bash
nvidia-docker run -it --rm paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
2. development image :code:`paddlepaddle/paddle:<version>-dev`
This image has packed related develop tools and runtime
environment. Users and developers can use this image instead of
their own local computer to accomplish development, build,
releasing, document writing etc. While different version of paddle
may depends on different version of libraries and tools, if you
want to setup a local environment, you must pay attention to the
versions. The development image contains:
- gcc/clang
- nvcc
- Python
- sphinx
- woboq
- sshd
Many developers use servers with GPUs, they can use ssh to login to
the server and run :code:`docker exec` to enter the docker
container and start their work. Also they can start a development
docker image with SSHD service, so they can login to the container
and start work.
Train Model Using Python API
----------------------------
Our official docker image provides a runtime for PaddlePaddle
programs. The typical workflow will be as follows:
Create a directory as workspace:
.. code-block:: bash
mkdir ~/workspace
Edit a PaddlePaddle python program using your favourite editor
.. code-block:: bash
emacs ~/workspace/example.py
Run the program using docker:
.. code-block:: bash
docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/example.py
Or if you are using GPU for training:
.. code-block:: bash
nvidia-docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu python /workspace/example.py
Above commands will start a docker container by running :code:`python
/workspace/example.py`. It will stop once :code:`python
/workspace/example.py` finishes.
Another way is to tell docker to start a :code:`/bin/bash` session and
run PaddlePaddle program interactively:
.. code-block:: bash
docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 /bin/bash
# now we are inside docker container
cd /workspace
python example.py
Running with GPU is identical:
.. code-block:: bash
nvidia-docker run -it -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2-gpu /bin/bash
# now we are inside docker container
cd /workspace
python example.py
Develop PaddlePaddle or Train Model Using C++ API
---------------------------------------------------
We will be using PaddlePaddle development image since it contains all
compiling tools and dependencies.
1. Build PaddlePaddle develop image
Use following command to build PaddlePaddle develop image:
.. code-block:: bash
git clone https://github.com/PaddlePaddle/Paddle.git && cd Paddle
docker build -t paddle:dev .
2. Build PaddlePaddle production image
There are two steps for building production image, the first step is to run:
.. code-block:: bash
docker run -v $(pwd):/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" -e "WITH_TEST=ON" paddle:dev
The above command will compile PaddlePaddle and create a Dockerfile for building production image. All the generated files are in the build directory. "WITH_GPU" controls if the generated production image supports GPU. "WITH_AVX" controls if the generated production image supports AVX. "WITH_TEST" controls if the unit test will be generated.
The second step is to run:
.. code-block:: bash
docker build -t paddle:prod -f build/Dockerfile ./build
The above command will generate the production image by copying the compiled PaddlePaddle program into the image.
3. Run unit test
Following command will run unit test:
.. code-block:: bash
docker run -it -v $(pwd):/paddle paddle:dev bash -c "cd /paddle/build && ctest"
PaddlePaddle Book
------------------
The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations,
visualizations and explanatory text in a single browser.
PaddlePaddle Book is an interactive Jupyter Notebook for users and developers.
We already exposed port 8888 for this book. If you want to
dig deeper into deep learning, PaddlePaddle Book definitely is your best choice.
We provide a packaged book image, simply issue the command:
.. code-block:: bash
docker run -p 8888:8888 paddlepaddle/book
Then, you would back and paste the address into the local browser:
.. code-block:: text
http://localhost:8888/
That's all. Enjoy your journey!
Documentation
-------------
Paddle Docker images include an HTML version of C++ source code
generated using `woboq code browser
<https://github.com/woboq/woboq_codebrowser>`_. This makes it easy
for users to browse and understand the C++ source code.
As long as we give the Paddle Docker container a name, we can run an
additional Nginx Docker container to serve the volume from the Paddle
container:
.. code-block:: bash
docker run -d --name paddle-cpu-doc paddle:<version>
docker run -d --volumes-from paddle-cpu-doc -p 8088:80 nginx
Then we can direct our Web browser to the HTML version of source code
at http://localhost:8088/paddle/
Build And Install PaddlePaddle
================================
Install and Build
=================
Install PaddlePaddle
----------------------
.. toctree::
:maxdepth: 1
:glob:
install_*
internal/install_from_jumbo.md
docker_install.rst
ubuntu_install.rst
docker_install_en.rst
ubuntu_install_en.rst
Build from Source
-----------------
.. warning::
Please use :code:`deb` package or :code:`docker` image to install paddle. The building guide is used for hacking or contributing to PaddlePaddle.
If you want to hack and contribute PaddlePaddle source code, following guides can help you\:
Please use :code:`deb` package or :code:`docker` image to install paddle. The building guide is used for hacking or contributing PaddlePaddle source code.
.. toctree::
:maxdepth: 1
:glob:
build_from_source.md
contribute_to_paddle.md
build_from_source_en.md
GET STARTED
============
.. toctree::
:maxdepth: 1
build_and_install/index_en.rst
- `Deep Learning 101 <http://book.paddlepaddle.org/index.en.html>`_
Recurrent Neural Network Configuration
======================================
RNN Configuration
=================
This tutorial will guide you how to configure recurrent neural network in PaddlePaddle. PaddlePaddle supports highly flexible and efficient recurrent neural network configuration. In this tutorial, you will learn how to:
......@@ -17,7 +17,7 @@ PaddlePaddle does not need any preprocessing to sequence data, such as padding.
.. code-block:: python
settings.slots = [
settings.input_types = [
integer_value_sequence(len(settings.src_dict)),
integer_value_sequence(len(settings.trg_dict)),
integer_value_sequence(len(settings.trg_dict))]
......@@ -30,7 +30,7 @@ Then at the :code:`process` function, each :code:`yield` function will return th
yield src_ids, trg_ids, trg_ids_next
For more details description of how to write a data provider, please refer to `PyDataProvider2 <../../ui/data_provider/index.html>`_. The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
For more details description of how to write a data provider, please refer to :ref:`api_pydataprovider2` . The full data provider file is located at :code:`demo/seqToseq/dataprovider.py`.
===============================================
Configure Recurrent Neural Network Architecture
......@@ -42,8 +42,8 @@ Simple Gated Recurrent Neural Network
Recurrent neural network process a sequence at each time step sequentially. An example of the architecture of LSTM is listed below.
.. image:: ./bi_lstm.jpg
:align: center
.. image:: ../../../tutorials/sentiment_analysis/src/bi_lstm.jpg
:align: center
Generally speaking, a recurrent network perform the following operations from :math:`t=1` to :math:`t=T`, or reversely from :math:`t=T` to :math:`t=1`.
......@@ -101,12 +101,12 @@ Sequence to Sequence Model with Attention
-----------------------------------------
We will use the sequence to sequence model with attention as an example to demonstrate how you can configure complex recurrent neural network models. An illustration of the sequence to sequence model with attention is shown in the following figure.
.. image:: ./encoder-decoder-attention-model.png
:align: center
.. image:: ../../../tutorials/text_generation/encoder-decoder-attention-model.png
:align: center
In this model, the source sequence :math:`S = \{s_1, \dots, s_T\}` is encoded with a bidirectional gated recurrent neural networks. The hidden states of the bidirectional gated recurrent neural network :math:`H_S = \{H_1, \dots, H_T\}` is called *encoder vector* The decoder is a gated recurrent neural network. When decoding each token :math:`y_t`, the gated recurrent neural network generates a set of weights :math:`W_S^t = \{W_1^t, \dots, W_T^t\}`, which are used to compute a weighted sum of the encoder vector. The weighted sum of the encoder vector is utilized to condition the generation of the token :math:`y_t`.
The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to `Layers <../../ui/api/trainer_config_helpers/layers_index.html>`_ for more details.
The encoder part of the model is listed below. It calls :code:`grumemory` to represent gated recurrent neural network. It is the recommended way of using recurrent neural network if the network architecture is simple, because it is faster than :code:`recurrent_group`. We have implemented most of the commonly used recurrent neural network architectures, you can refer to :ref:`api_trainer_config_helpers_layers` for more details.
We also project the encoder vector to :code:`decoder_size` dimensional space, get the first instance of the backward recurrent network, and project it to :code:`decoder_size` dimensional space:
......@@ -246,6 +246,6 @@ The code is listed below:
outputs(beam_gen)
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to `Semantic Role Labeling Demo <../../demo/semantic_role_labeling/index.html>`_ for more details.
Notice that this generation technique is only useful for decoder like generation process. If you are working on sequence tagging tasks, please refer to :ref:`semantic_role_labeling` for more details.
The full configuration file is located at :code:`demo/seqToseq/seqToseq_net.py`.
# Contribute to PaddlePaddle
# Contribute Code
We sincerely appreciate your contributions. You can use fork and pull request
workflow to merge your code.
workflow to merge your code.
## Code Requirements
- Your code must be fully documented by
[doxygen](http://www.stack.nl/~dimitri/doxygen/) style.
......@@ -12,11 +12,11 @@ workflow to merge your code.
- Pass all unit tests.
The following tutorial guides you into submitting your contibution.
## [Creating a Fork](https://help.github.com/articles/fork-a-repo/)
Just head over to the GitHub page and click the "Fork" button.
It's just that simple.
It's just that simple.
## Clone
......@@ -25,7 +25,7 @@ The **develop** is the main branch, and other user's branches are feature branch
Once you've created a fork, you can use your favorite git client to clone your
repo or just head straight to the command line:
```shell
# Clone your fork to your local machine
git clone --branch develop https://github.com/USERNAME/Paddle.git
......@@ -36,7 +36,7 @@ If your repository doesn't contain **develop** branch, just create it by your ow
git clone https://github.com/USERNAME/Paddle.git Paddle
cd Paddle
git checkout -b develop # create develop branch.
git remote add upstream https://github.com/baidu/Paddle.git # add upstream to baidu/Paddle
git remote add upstream https://github.com/PaddlePaddle/Paddle.git # add upstream to baidu/Paddle
git pull upstream develop # update to upstream
```
......@@ -46,6 +46,22 @@ Then you can start to develop by making a local developement branch
git checkout -b MY_COOL_STUFF_BRANCH
```
## Using `pre-commit` hook
Paddle developers use [pre-commit](http://pre-commit.com/) tool to manage git
pre-commit hooks. It can help us format source codes (cpp, python), check some
basic thing before commit (only one EOL for each file, do not add a huge file
in git). `pre-commit` tests is a part of unit tests in Travis-CI now, every
PR doesn't fit hook can not be merged into Paddle.
To use [pre-commit](http://pre-commit.com/), you should install it by
`pip install pre-commit`, and currently, Paddle uses `clang-format` to format
c/cpp sources. Please make sure clang-format 3.8+ installed.
Then just run `pre-commit install` in your Paddle clone directory. When you
commit your code, the pre-commit hook will check the local code if there is
anything not suitable to commit, and so on.
## Commit
Commit your changes by following command lines:
......@@ -69,7 +85,7 @@ To do this, you'll need to add a remote at first:
# see the current configured remote repository
git remote -v
# add upstream repository
git remote add upstream https://github.com/baidu/Paddle.git
git remote add upstream https://github.com/PaddlePaddle/Paddle.git
# verify the new upstream
git remote -v
```
......@@ -82,7 +98,7 @@ git pull --rebase upstream develop
If there are no unique commits locally, git will simply perform a fast-forward.
However, if you have been making changes (in the vast majority of cases you
probably shouldn't be), you may have to deal with conflicts.
probably shouldn't be), you may have to deal with conflicts.
Now, your local master branch is up-to-date with everything modified upstream.
......
Writing New Layers
==================
================
Write New Layers
================
This tutorial will guide you to write customized layers in PaddlePaddle. We will utilize fully connected layer as an example to guide you through the following steps for writing a new layer.
......@@ -59,7 +60,7 @@ Implement C++ Class
The C++ class of the layer implements the initialization, forward, and backward part of the layer. The fully connected layer is at :code:`paddle/gserver/layers/FullyConnectedLayer.h` and :code:`paddle/gserver/layers/FullyConnectedLayer.cpp`. We list simplified version of the code below.
It needs to derive the base class :code:`paddle::BaseLayer`, and it needs to override the following functions:
It needs to derive the base class :code:`paddle::Layer`, and it needs to override the following functions:
- constructor and destructor.
- :code:`init` function. It is used to initialize the parameters and settings.
......@@ -208,7 +209,6 @@ The implementation of the backward part has the following steps.
if (biases_ && biases_->getWGrad()) {
biases_->getWGrad()->collectBias(*getOutputGrad(), 1);
/* Increasing the number of gradient */
biases_->getParameterPtr()->incUpdate(callback);
}
......@@ -296,7 +296,7 @@ All the gradient check unit tests are located in :code:`paddle/gserver/tests/tes
+ each inputs needs to call :code:`config.layerConfig.add_inputs();` once.
+ call :code:`testLayerGrad` to perform gradient checks. It has the following arguments.
- layer and input configurations. (:code:`config` in our example)
- type of the input. (:code:`fc` in our example)
- type of the layer. (:code:`fc` in our example)
- batch size of the gradient check. (100 in our example)
- whether the input is transpose. Most layers need to set it to :code:`false`. (:code:`false` in our example)
- whether to use weights. Some layers or activations perform normalization so that the sum of their output is a constant. For example, the sum of output of a softmax activation is one. In this case, we cannot correctly compute the gradients using regular gradient check techniques. A weighted sum of the output, which is not a constant, is utilized to compute the gradients. (:code:`true` in our example, because the activation of a fully connected layer can be softmax)
......@@ -309,7 +309,7 @@ All the gradient check unit tests are located in :code:`paddle/gserver/tests/tes
config.biasSize = 4096;
config.layerConfig.set_type("fc");
config.layerConfig.set_size(4096);
config.layerConfig.set_active_type("sigmoid");
config.layerConfig.set_active_type("softmax");
config.layerConfig.set_drop_rate(0.1);
// Setup inputs.
config.inputDefs.push_back(
......
HOW TO
=======
Usage
-------
.. toctree::
:maxdepth: 1
usage/cmd_parameter/index_en.rst
usage/cluster/cluster_train_en.md
usage/k8s/k8s_en.md
usage/k8s/k8s_aws_en.md
Development
------------
.. toctree::
:maxdepth: 1
dev/new_layer_en.rst
dev/contribute_to_paddle_en.md
Configuration
-------------
.. toctree::
:maxdepth: 1
deep_model/rnn/index_en.rst
Optimization
-------------
.. toctree::
:maxdepth: 1
optimization/gpu_profiling_en.rst
====================
Tune GPU Performance
====================
.. contents::
This tutorial will guide you step-by-step through how to conduct profiling and performance tuning using built-in timer, **nvprof** and **nvvp**.
- What is profiling?
- Why we need profiling?
- How to do profiling?
- Profile tools
- Hands-on Tutorial
- Profiling tips
What's profiling?
=================
In software engineering, profiling is a form of dynamic program analysis that measures the space (memory) or time
complexity of a program, the usage of particular instructions, or the frequency and duration of function calls.
Most commonly, profiling information serves to aid program optimization.
Briefly, profiler is used to measure application performance. Program analysis tools are extremely important for
understanding program behavior. Simple profiling can tell you that how long does an operation take? For advanced
profiling, it can interpret why does an operation take a long time?
Why we need profiling?
======================
Since training deep neural network typically take a very long time to get over, performance is gradually becoming
the most important thing in deep learning field. The first step to improve performance is to understand what parts
are slow. There is no point in improving performance of a region which doesn’t take much time!
How to do profiling?
====================
To achieve maximum performance, there are five steps you can take to reach your goals.
- Profile the code
- Find the slow parts
- Work out why they’re slow
- Make them fast
- Profile the code again
Usually, processor has two key performance limits include float point throughput and
memory throughput. For GPU, it also need more parallelism to fulfill its potential.
This is why they can be so fast.
Profiler Tools
==============
For general GPU profiling, a bunch of tools are provided from both NVIDIA and third party.
**nvprof** is Nvidia profiler and **nvvp** is (GUI based) Nvidia visual profiler.
In this tutorial, we will focus on nvprof and nvvp.
:code:`test_GpuProfiler` from :code:`paddle/math/tests` directory will be used to evaluate
above profilers.
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:linenos:
The above code snippet includes two methods, you can use any of them to profile the regions of interest.
1. :code:`REGISTER_TIMER_INFO` is a built-in timer wrapper which can calculate the time overhead of both cpu functions and cuda kernels.
2. :code:`REGISTER_GPU_PROFILER` is a general purpose wrapper object of :code:`cudaProfilerStart` and :code:`cudaProfilerStop` to avoid
program crashes when CPU version of PaddlePaddle invokes them.
You can find more details about how to use both of them in the next session.
Hands-on Approach
=================
Built-in Timer
--------------
To enable built-in timer in PaddlePaddle, first you have to add :code:`REGISTER_TIMER_INFO` into the regions of you interest.
Then, all information could be stamped in the console via :code:`printStatus` or :code:`printAllStatus` function.
As a simple example, consider the following:
1. Add :code:`REGISTER_TIMER_INFO` and :code:`printAllStatus` functions (see the emphasize-lines).
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:emphasize-lines: 8-12,14
:linenos:
2. Configure cmake with **WITH_TIMER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_TIMER=ON
make
3. Execute your code and observe the results (see the emphasize-lines).
.. code-block:: bash
:emphasize-lines: 1,12-15
> ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.313065 2522362816 Util.cpp:155] commandline: ./paddle/math/tests/test_GpuProfiler
I1117 11:13:42.845065 2522362816 Util.cpp:130] Calling runInitFunctions
I1117 11:13:42.845208 2522362816 Util.cpp:143] Call runInitFunctions done.
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Profiler
[ RUN ] Profiler.BilinearFwdBwd
I1117 11:13:42.845310 2522362816 test_GpuProfiler.cpp:114] Enable GPU Profiler Stat: [testBilinearFwdBwd] "numSamples = 10, channels = 16, im
gSizeX = 64, imgSizeY = 64"
I1117 11:13:42.850154 2522362816 ThreadLocal.cpp:37] thread use undeterministic rand seed:20659751
I1117 11:13:42.981501 2522362816 Stat.cpp:130] ======= StatSet: [GlobalStatInfo] status ======
I1117 11:13:42.981539 2522362816 Stat.cpp:133] Stat=testBilinearFwdBwd total=136.141 avg=136.141 max=136.141 min=136.141 count=1
I1117 11:13:42.981572 2522362816 Stat.cpp:141] ======= BarrierStatSet status ======
I1117 11:13:42.981575 2522362816 Stat.cpp:154] --------------------------------------------------
[ OK ] Profiler.BilinearFwdBwd (136 ms)
[----------] 1 test from Profiler (136 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (136 ms total)
[ PASSED ] 1 test.
nvprof profiler
---------------
To use this command line profiler **nvprof**, you can simply issue the following command:
1. Add :code:`REGISTER_GPU_PROFILER` function (see the emphasize-lines).
.. literalinclude:: ../../../paddle/math/tests/test_GpuProfiler.cpp
:language: c++
:lines: 137-151
:emphasize-lines: 6-7
:linenos:
2. Configure cmake with **WITH_PROFILER** and recompile PaddlePaddle.
.. code-block:: bash
cmake .. -DWITH_PROFILER=ON
make
3. Use Nvidia profiler **nvprof** to profile the binary.
.. code-block:: bash
nvprof ./paddle/math/tests/test_GpuProfiler
Then, you can get the following profiling result:
.. code-block:: bash
==78544== Profiling application: ./paddle/math/tests/test_GpuProfiler
==78544== Profiling result:
Time(%) Time Calls Avg Min Max Name
27.60% 9.6305ms 5 1.9261ms 3.4560us 6.4035ms [CUDA memcpy HtoD]
26.07% 9.0957ms 1 9.0957ms 9.0957ms 9.0957ms KeBilinearInterpBw
23.78% 8.2977ms 1 8.2977ms 8.2977ms 8.2977ms KeBilinearInterpFw
22.55% 7.8661ms 2 3.9330ms 1.5798ms 6.2863ms [CUDA memcpy DtoH]
==78544== API calls:
Time(%) Time Calls Avg Min Max Name
46.85% 682.28ms 8 85.285ms 12.639us 682.03ms cudaStreamCreateWithFlags
39.83% 580.00ms 4 145.00ms 302ns 550.27ms cudaFree
9.82% 143.03ms 9 15.892ms 8.7090us 142.78ms cudaStreamCreate
1.23% 17.983ms 7 2.5690ms 23.210us 6.4563ms cudaMemcpy
1.23% 17.849ms 2 8.9247ms 8.4726ms 9.3768ms cudaStreamSynchronize
0.66% 9.5969ms 7 1.3710ms 288.43us 2.4279ms cudaHostAlloc
0.13% 1.9530ms 11 177.54us 7.6810us 591.06us cudaMalloc
0.07% 1.0424ms 8 130.30us 1.6970us 453.72us cudaGetDevice
0.04% 527.90us 40 13.197us 525ns 253.99us cudaEventCreateWithFlags
0.03% 435.73us 348 1.2520us 124ns 42.704us cuDeviceGetAttribute
0.03% 419.36us 1 419.36us 419.36us 419.36us cudaGetDeviceCount
0.02% 260.75us 2 130.38us 129.32us 131.43us cudaGetDeviceProperties
0.02% 222.32us 2 111.16us 106.94us 115.39us cudaLaunch
0.01% 214.06us 4 53.514us 28.586us 77.655us cuDeviceGetName
0.01% 115.45us 4 28.861us 9.8250us 44.526us cuDeviceTotalMem
0.01% 83.988us 4 20.997us 578ns 77.760us cudaSetDevice
0.00% 38.918us 1 38.918us 38.918us 38.918us cudaEventCreate
0.00% 34.573us 31 1.1150us 279ns 12.784us cudaDeviceGetAttribute
0.00% 17.767us 1 17.767us 17.767us 17.767us cudaProfilerStart
0.00% 15.228us 2 7.6140us 3.5460us 11.682us cudaConfigureCall
0.00% 14.536us 2 7.2680us 1.1490us 13.387us cudaGetLastError
0.00% 8.6080us 26 331ns 173ns 783ns cudaSetupArgument
0.00% 5.5470us 6 924ns 215ns 2.6780us cuDeviceGet
0.00% 5.4090us 6 901ns 328ns 3.3320us cuDeviceGetCount
0.00% 4.1770us 3 1.3920us 1.0630us 1.8300us cuDriverGetVersion
0.00% 3.4650us 3 1.1550us 1.0810us 1.2680us cuInit
0.00% 830ns 1 830ns 830ns 830ns cudaRuntimeGetVersion
nvvp profiler
-------------
For visual profiler **nvvp**, you can either import the output of :code:`nvprof –o ...` or
run application through GUI.
**Note: nvvp also support CPU profiling** (Click the box in nvvp to enable profile execution on CPU).
.. image:: nvvp1.png
:align: center
:scale: 33%
From the perspective of kernel functions, **nvvp** can even illustrate why does an operation take a long time?
As shown in the following figure, kernel's block usage, register usage and shared memory usage from :code:`nvvp`
allow us to fully utilize all warps on the GPU.
.. image:: nvvp2.png
:align: center
:scale: 33%
From the perspective of application, **nvvp** can give you some suggestions to address performance bottleneck.
For instance, some advice in data movement and compute utilization from the below figure can guide you to tune performance.
.. image:: nvvp3.png
:align: center
:scale: 33%
.. image:: nvvp4.png
:align: center
:scale: 33%
Profiling tips
==============
- The **nvprof** and **nvvp** output is a very good place to start.
- The timeline is a good place to go next.
- Only dig deep into a kernel if it’s taking a significant amount of your time.
- Where possible, try to match profiler output with theory.
1) For example, if I know I’m moving 1GB, and my kernel takes 10ms, I expect the profiler to report 100GB/s.
2) Discrepancies are likely to mean your application isn’t doing what you thought it was.
- Know your hardware: If your GPU can do 6 TFLOPs, and you’re already doing 5.5 TFLOPs, you won’t go much faster!
Profiling is a key step in optimization. Sometimes quite simple changes can lead to big improvements in performance.
Your mileage may vary!
Reference
=========
Jeremy Appleyard, `GPU Profiling for Deep Learning <http://www.robots.ox.ac.uk/~seminars/seminars/Extra/2015_10_08_JeremyAppleyard.pdf>`_, 2015
# Distributed Training
# Run Distributed Training
In this article, we explain how to run distributed Paddle training jobs on clusters. We will create the distributed version of the single-process training example, [recommendation](https://github.com/baidu/Paddle/tree/develop/demo/recommendation).
[Scripts](https://github.com/baidu/Paddle/tree/develop/paddle/scripts/cluster_train) used in this article launch distributed jobs via SSH. They also work as a reference for users running more sophisticated cluster management systems like MPI and Kubernetes.
[Scripts](https://github.com/baidu/Paddle/tree/develop/paddle/scripts/cluster_train) used in this article launch distributed jobs via SSH. They also work as a reference for users running more sophisticated cluster management systems like MPI and [Kubernetes](https://github.com/PaddlePaddle/Paddle/tree/develop/doc/howto/usage/k8s).
## Prerequisite
1. Aforementioned scripts use a Python library [fabric](http://www.fabfile.org/) to run SSH commands. We can use `pip` to install fabric:
```bash
pip install fabric
pip install fabric
```
1. We need to install PaddlePaddle on all nodes in the cluster. To enable GPUs, we need to install CUDA in `/usr/local/cuda`; otherwise Paddle would report errors at runtime.
......@@ -20,13 +20,13 @@ pip install fabric
We refer to the directory where we put dependent libraries, config files, etc., as *workspace*.
These ```train/test``` data should be prepared before launching cluster job. To satisfy the requirement that train/test data are placed in different directory from workspace, PADDLE refers train/test data according to index file named as ```train.list/test.list``` which are used in model config file. So the train/test data also contains train.list/test.list two list file. All local training demo already provides scripts to help you create these two files, and all nodes in cluster job will handle files with same logical code in normal condition.
These `train/test` data should be prepared before launching cluster job. To satisfy the requirement that train/test data are placed in different directory from workspace, PADDLE refers train/test data according to index file named as `train.list/test.list` which are used in model config file. So the train/test data also contains train.list/test.list two list file. All local training demo already provides scripts to help you create these two files, and all nodes in cluster job will handle files with same logical code in normal condition.
Generally, you can use same model file from local training for cluster training. What you should have in mind that, the ```batch_size``` set in ```setting``` function in model file means batch size in ```each``` node of cluster job instead of total batch size if synchronization SGD was used.
Generally, you can use same model file from local training for cluster training. What you should have in mind that, the `batch_size` set in `setting` function in model file means batch size in `each` node of cluster job instead of total batch size if synchronization SGD was used.
Following steps are based on demo/recommendation demo in demo directory.
Following steps are based on [demo/recommendation](https://github.com/PaddlePaddle/Paddle/tree/develop/demo/recommendation) demo in demo directory.
You just go through demo/recommendation tutorial doc until ```Train``` section, and at last you will get train/test data and model configuration file. Finaly, just use demo/recommendation as workspace for cluster training.
You just go through demo/recommendation tutorial doc until `Train` section, and at last you will get train/test data and model configuration file. Finaly, just use demo/recommendation as workspace for cluster training.
At last your workspace should look like as follow:
```
......@@ -55,16 +55,16 @@ At last your workspace should look like as follow:
```
Not all of these files are needed for cluster training, but it's not necessary to remove useless files.
```trainer_config.py```
`trainer_config.py`
Indicates the model config file.
```train.list``` and ```test.list```
`train.list` and `test.list`
File index. It stores all relative or absolute file paths of all train/test data at current node.
```dataprovider.py```
`dataprovider.py`
used to read train/test samples. It's same as local training.
```data```
`data`
all files in data directory are refered by train.list/test.list which are refered by data provider.
......@@ -72,19 +72,19 @@ all files in data directory are refered by train.list/test.list which are refere
The options below must be carefully set in cluster_train/conf.py
```HOSTS``` all nodes hostname or ip that will run cluster job. You can also append user and ssh port with hostname, such as root@192.168.100.17:9090.
`HOSTS` all nodes hostname or ip that will run cluster job. You can also append user and ssh port with hostname, such as root@192.168.100.17:9090.
```ROOT_DIR``` workspace ROOT directory for placing JOB workspace directory
`ROOT_DIR` workspace ROOT directory for placing JOB workspace directory
```PADDLE_NIC``` the NIC(Network Interface Card) interface name for cluster communication channel, such as eth0 for ethternet, ib0 for infiniband.
`PADDLE_NIC` the NIC(Network Interface Card) interface name for cluster communication channel, such as eth0 for ethternet, ib0 for infiniband.
```PADDLE_PORT``` port number for cluster commnunication channel
`PADDLE_PORT` port number for cluster commnunication channel
```PADDLE_PORTS_NUM``` the number of port used for cluster communication channle. if the number of cluster nodes is small(less than 5~6nodes), recommend you set it to larger, such as 2 ~ 8, for better network performance.
`PADDLE_PORTS_NUM` the number of port used for cluster communication channle. if the number of cluster nodes is small(less than 5~6nodes), recommend you set it to larger, such as 2 ~ 8, for better network performance.
```PADDLE_PORTS_NUM_FOR_SPARSE``` the number of port used for sparse updater cluster commnunication channel. if sparse remote update is used, set it like ```PADDLE_PORTS_NUM```
`PADDLE_PORTS_NUM_FOR_SPARSE` the number of port used for sparse updater cluster commnunication channel. if sparse remote update is used, set it like `PADDLE_PORTS_NUM`
```LD_LIBRARY_PATH``` set addtional LD_LIBRARY_PATH for cluster job. You can use it to set CUDA libraries path.
`LD_LIBRARY_PATH` set addtional LD_LIBRARY_PATH for cluster job. You can use it to set CUDA libraries path.
Default Configuration as follow:
......@@ -118,15 +118,15 @@ LD_LIBRARY_PATH="/usr/local/cuda/lib64:/usr/lib64"
```
### Launching Cluster Job
```paddle.py``` provides automatical scripts to start all PaddlePaddle cluster processes in different nodes. By default, all command line options can set as ```paddle.py``` command options and ```paddle.py``` will transparently and automatically set these options to PaddlePaddle lower level processes.
`paddle.py` provides automatical scripts to start all PaddlePaddle cluster processes in different nodes. By default, all command line options can set as `paddle.py` command options and `paddle.py` will transparently and automatically set these options to PaddlePaddle lower level processes.
```paddle.py```provides two distinguished command option for easy job launching.
`paddle.py`provides two distinguished command option for easy job launching.
```job_dispatch_package``` set it with local ```workspace```directory, it will be dispatched to all nodes set in conf.py. It could be helpful for frequent hacking workspace files, otherwise frequent mulit-nodes workspace deployment could make your crazy.
```job_workspace``` set it with already deployed workspace directory, ```paddle.py``` will skip dispatch stage to directly launch cluster job with all nodes. It could help to reduce heavy
`job_dispatch_package` set it with local `workspace`directory, it will be dispatched to all nodes set in conf.py. It could be helpful for frequent hacking workspace files, otherwise frequent mulit-nodes workspace deployment could make your crazy.
`job_workspace` set it with already deployed workspace directory, `paddle.py` will skip dispatch stage to directly launch cluster job with all nodes. It could help to reduce heavy
dispatch latency.
```cluster_train/run.sh``` provides command line sample to run ```demo/recommendation``` cluster job, just modify ```job_dispatch_package``` and ```job_workspace``` with your defined directory, then:
`cluster_train/run.sh` provides command line sample to run `demo/recommendation` cluster job, just modify `job_dispatch_package` and `job_workspace` with your defined directory, then:
```
sh run.sh
```
......@@ -134,23 +134,23 @@ sh run.sh
The cluster Job will start in several seconds.
### Kill Cluster Job
```paddle.py``` can capture ```Ctrl + C``` SIGINT signal to automatically kill all processes launched by it. So just stop ```paddle.py``` to kill cluster job. You should mannally kill job if program crashed.
`paddle.py` can capture `Ctrl + C` SIGINT signal to automatically kill all processes launched by it. So just stop `paddle.py` to kill cluster job. You should mannally kill job if program crashed.
### Check Cluster Training Result
Check log in $workspace/log for details, each node owns same log structure.
```paddle_trainer.INFO```
`paddle_trainer.INFO`
It provides almost all interal output log for training, same as local training. Check runtime model convergence here.
```paddle_pserver2.INFO```
`paddle_pserver2.INFO`
It provides pserver running log, which could help to diagnose distributed error.
```server.log```
`server.log`
It provides stderr and stdout of pserver process. Check error log if training crashs.
```train.log```
`train.log`
It provides stderr and stdout of trainer process. Check error log if training crashs.
### Check Model Output
After one pass finished, model files will be writed in ```output``` directory in node 0.
```nodefile``` in workspace indicates the node id of current cluster job.
After one pass finished, model files will be writed in `output` directory in node 0.
`nodefile` in workspace indicates the node id of current cluster job.
......@@ -127,11 +127,6 @@ It looks like there are a lot of arguments. However, most of them are for develo
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">allow_inefficient_sparse_update</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
<tr>
<td class="left">start_pass</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
......@@ -143,7 +138,7 @@ It looks like there are a lot of arguments. However, most of them are for develo
</tr>
<tr>
<td class="left" rowspan = "2">testing during training</td><td class="left">test_all_data_in_one_period</td>
<td class="left" rowspan = "2">testing during training</td><td class="left">test_period</td>
<td class="left">√</td><td class="left">√</td><td class="left"></td><td class="left"></td>
</tr>
......@@ -233,16 +228,6 @@ It looks like there are a lot of arguments. However, most of them are for develo
<td class="left"></td><td class="left"></td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "2">metric learning</td><td class="left">external</td>
<td class="left">√</td><td class="left">√</td><td class="left">√</td><td class="left">√</td>
</tr>
<tr>
<td class="left">data_server_port</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
</tr>
<tr>
<td class="left" rowspan = "16">PServer</td><td class="left">start_pserver</td>
<td class="left"></td><td class="left">√</td><td class="left"></td><td class="left">√</td>
......
```eval_rst
.. _cmd_detail_introduction:
```
# Detail Description
## Common
......@@ -31,7 +35,7 @@
- type: string (default: null).
* `--version`
- Whether to print version infomatrion.
- Whether to print version information.
- type: bool (default: 0).
* `--show_layer_stat`
......@@ -69,7 +73,7 @@
- type: bool (default: 0).
* `--load_missing_parameter_strategy`
- Specify the loading operation when model file is missing. Now support fail/rand/zere three operations.
- Specify the loading operation when model file is missing. Now support fail/rand/zero three operations.
- `fail`: program will exit.
- `rand`: uniform or normal distribution according to **initial\_strategy** in network config. Uniform range is: **[mean - std, mean + std]**, where mean and std are configures in trainer config.
- `zero`: all parameters are zero.
......@@ -110,21 +114,17 @@
- type: int32 (default: -1).
* `--test_period`
- Run testing every test_period train batches. If not set, run testing each pass.
- type: int32 (default: 1000).
- if equal 0, do test on all test data at the end of each pass. While if equal non-zero, do test on all test data every test_period batches.
- type: int32 (default: 0).
* `--test_wait`
- Whether to wait for parameter per pass if not exist. If set test_data_path in submitting environment of cluster, it will launch one process to perfom testing, so we need to set test_wait=1. Note that in the cluster submitting environment, this argument has been set True by default.
 - Whether to wait for parameter per pass if not exist. It can be used when user launch another process to perfom testing during the training process.
- type: bool (default: 0).
* `--model_list`
- File that saves the model list when testing. It was set automatically when using cluster submitting environment after setting model_path.
- File that saves the model list when testing.
- type: string (default: "", null).
* `--test_all_data_in_one_period`
- This argument is usually used in testing period during traning. If true, all data will be tested in one test period. Otherwise (batch_size * log_peroid) data will be tested.
- type: bool (default: 0).
* `--predict_output_dir`
- Directory that saves the layer output. It is configured in Outputs() in network config. Default, this argument is null, meaning save nothing. Specify this directory if you want to save feature map of some layers in testing mode. Note that, layer outputs are values after activation function.
- type: string (default: "", null).
......@@ -184,15 +184,6 @@
- Specify shared dynamic library. It can be defined out of paddle by user.
- type: string (default: "", null).
## Metric Learning
* `--external`
- Whether to use external machine for metric learning.
- type: bool (default: 0).
* `--data_server_port`
- Listening port for dserver (data server), dserver is mainly used in metric learning.
- type: int32 (default: 21134).
## DataProvider
* `--memory_threshold_on_load_data`
......@@ -212,7 +203,7 @@
- type: bool (default: 0).
* `--pservers`
- Comma separated IP addresses of pservers. It is set automatically in cluster submitting environment.
- Comma separated IP addresses of pservers.
- type: string (default: "127.0.0.1").
* `--port`
......@@ -310,10 +301,6 @@
- show log details for sparse parameter distribution in pserver.
- type: bool (default: 0).
* `--allow_inefficient_sparse_update`
- Whether to allow inefficient sparse update.
- type: bool (default: 0).
* `--check_sparse_distribution_batches`
- Running sparse parameter distribution check every so many batches.
- type: int32 (default: 100).
......
.. _cmd_line_index:
Set Command-line Parameters
===========================
.. toctree::
:maxdepth: 1
use_case_en.md
arguments_en.md
detail_introduction_en.md
......@@ -10,9 +10,8 @@ paddle train \
--config=network_config \
--save_dir=output \
--trainer_count=COUNT \ #(default:1)
--test_period=M \ #(default:1000)
--test_all_data_in_one_period=true \ #(default:false)
--num_passes=N \ #(defalut:100)
--test_period=M \ #(default:0)
--num_passes=N \ #(defalut:100)
--log_period=K \ #(default:100)
--dot_period=1000 \ #(default:1)
#[--show_parameter_stats_period=100] \ #(default:0)
......@@ -135,14 +134,14 @@ fc2=fc_layer(...)
fc3=fc_layer(...,layer_attr=ExtraAttr(device=-1))
```
- default_device(0): set default device ID to 0. This means that except the layers with device=-1, all layers will use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layer l1 and l2 are computed on the GPU.
- default_device(0): set default device ID to 0. This means that except the layers with device=-1, all layers will use a GPU, and the specific GPU used for each layer depends on trainer\_count and gpu\_id (0 by default). Here, layer fc1 and fc2 are computed on the GPU.
- device=-1: use the CPU for layer l3.
- device=-1: use the CPU for layer fc3.
- trainer_count:
- trainer_count=1: if gpu\_id is not set, then use the first GPU to compute layers l1 and l2. Otherwise use the GPU with gpu\_id.
- trainer_count=1: if gpu\_id is not set, then use the first GPU to compute layers fc1 and fc2. Otherwise use the GPU with gpu\_id.
- trainer_count>1: use trainer\_count GPUs to compute one layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 will use data parallelism to compute layer l1 and l2.
- trainer_count>1: use trainer\_count GPUs to compute one layer using data parallelism. For example, trainer\_count=2 means that GPUs 0 and 1 will use data parallelism to compute layer fc1 and fc2.
### Case 2: Specify Layers in Different Devices
......@@ -158,14 +157,14 @@ fc4=fc_layer(input=fc2, layer_attr=ExtraAttr(device=-1), ...)
In this case, we assume that there are 4 GPUs in one machine.
- trainer_count=1:
- Use GPU 0 to compute layer l2.
- Use GPU 1 to compute layer l3.
- Use CPU to compute layer l4.
- Use GPU 0 to compute layer fc2.
- Use GPU 1 to compute layer fc3.
- Use CPU to compute layer fc4.
- trainer_count=2:
- Use GPU 0 and 1 to compute layer l2.
- Use GPU 2 and 3 to compute layer l3.
- Use CPU to compute l4 in two threads.
- Use GPU 0 and 1 to compute layer fc2.
- Use GPU 2 and 3 to compute layer fc3.
- Use CPU to compute fc4 in two threads.
- trainer_count=4:
- It will fail (note, we have assumed that there are 4 GPUs in machine), because argument `allow_only_one_model_on_one_gpu` is true by default.
......
此差异已折叠。
# Paddle On Kubernetes
>In this article, we will introduce how to run Paddle training job on single CPU machine using Kubernetes. In next article, we will introduce how to run Paddle training job on distributed cluster.
## Build Docker Image
In distributed Kubernetes cluster, we will use Ceph or other shared storage system for storing training related data so that all processes in Paddle training can retrieve data from Ceph. In this example, we will only demo training job on single machine. In order to simplify the requirement of the environment, we will directly put training data into Paddle's Docker Image, so we need to create a Paddle Docker image that already includes the training data.
Paddle's [Quick Start Tutorial](http://www.paddlepaddle.org/doc/demo/quick_start/index_en.html) introduces how to download and train data by using script from Paddle's source code.
And `paddledev/paddle:cpu-demo-latest` image has the Paddle source code and demo. (Caution: Default Paddle image `paddledev/paddle:cpu-latest` doesn't include the source code, Paddle's different versions of image can be referred here: [Docker installation guide](http://www.paddlepaddle.org/doc/build/docker_install.html)), so we run this container and download the training data, and then commit the whole container to be a new Docker image.
### Run Docker Container
```
$ docker run --name quick_start_data -it paddledev/paddle:cpu-demo-latest
```
### Download Training Data
Getting into `/root/paddle/demo/quick_start/data` Directory,using `get_data.sh` to download training data.
Then getting into `/root/paddle/demo/quick_start` Directory, using `preprocess.sh` to pre-process training data.
```
$ root@fbd1f2bb71f4:~/paddle/demo/quick_start/data# ./get_data.sh
Downloading Amazon Electronics reviews data...
--2016-10-31 01:33:43-- http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz
Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80
Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 495854086 (473M) [application/x-gzip]
Saving to: 'reviews_Electronics_5.json.gz'
10% [=======> ] 874,279 64.7KB/s eta 2h 13m
```
### Modify Startup Script
After downloading the data,modify `/root/paddle/demo/quick_start/train.sh` file contents are as follows (one more cd cmd):
```
set -e
cd /root/paddle/demo/quick_start
cfg=trainer_config.lr.py
#cfg=trainer_config.emb.py
#cfg=trainer_config.cnn.py
#cfg=trainer_config.lstm.py
#cfg=trainer_config.bidi-lstm.py
#cfg=trainer_config.db-lstm.py
paddle train \
--config=$cfg \
--save_dir=./output \
--trainer_count=4 \
--log_period=20 \
--num_passes=15 \
--use_gpu=false \
--show_parameter_stats_period=100 \
--test_all_data_in_one_period=1 \
2>&1 | tee 'train.log'
```
### Commit Docker Image
```
$ docker commit quick_start_data mypaddle/paddle:quickstart
```
## Use Kubernetes For Training
>We will use Kubernetes job for training process, following steps shows how to do the training with Kubernetes.
### Create Yaml Files
The output result in container will be demolished when job finished (container stopped running), so we need to mount the volume out to the local disk when creating the container to store the training result. Using our previously created image, we can create a [Kubernetes Job](http://kubernetes.io/docs/user-guide/jobs/#what-is-a-job), the yaml contents are as follows:
```
apiVersion: batch/v1
kind: Job
metadata:
name: quickstart
spec:
parallelism: 1
completions: 1
template:
metadata:
name: quickstart
spec:
volumes:
- name: output
hostPath:
path: /home/work/paddle_output
containers:
- name: pi
image: mypaddle/paddle:quickstart
command: ["bin/bash", "-c", "/root/paddle/demo/quick_start/train.sh"]
volumeMounts:
- name: output
mountPath: /root/paddle/demo/quick_start/output
restartPolicy: Never
```
### Start Paddle Job
Using the above yaml file to start the Kubernetes job.
```
$ kubectl create -f paddle.yaml
```
Get the detailed status of the job:
```
$ kubectl get job
NAME DESIRED SUCCESSFUL AGE
quickstart 1 0 58s
$ kubectl describe job quickstart
Name: quickstart
Namespace: default
Image(s): registry.baidu.com/public/paddle:cpu-demo-latest
Selector: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84
Parallelism: 1
Completions: 1
Start Time: Mon, 31 Oct 2016 11:20:16 +0800
Labels: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84,job-name=quickstart
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Volumes:
output:
Type: HostPath (bare host directory volume)
Path: /home/work/paddle_output
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 {job-controller } Normal SuccessfulCreate Created pod: quickstart-fa0wx
```
### Get Training Result
We can use kubectl command to take a look at the status of related pod.
```
$ kubectl describe pod quickstart-fa0wx
Name: quickstart-fa0wx
Namespace: default
Node: paddle-demo-let02/10.206.202.44
Start Time: Mon, 31 Oct 2016 11:20:17 +0800
Labels: controller-uid=f120da72-9f18-11e6-b363-448a5b355b84,job-name=quickstart
Status: Succeeded
IP: 10.0.0.9
Controllers: Job/quickstart
Containers:
quickstart:
Container ID: docker://b8561f5c79193550d64fa47418a9e67ebdd71546186e840f88de5026b8097465
Image: registry.baidu.com/public/paddle:cpu-demo-latest
Image ID: docker://18e457ce3d362ff5f3febf8e7f85ffec852f70f3b629add10aed84f930a68750
Port:
Command:
bin/bash
-c
/root/paddle/demo/quick_start/train.sh
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 31 Oct 2016 11:20:20 +0800
Finished: Mon, 31 Oct 2016 11:21:46 +0800
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
output:
Type: HostPath (bare host directory volume)
Path: /home/work/paddle_output
```
We can also ssh to Kubernetes node to take a look at the training result.
```
[root@paddle-demo-let02 paddle_output]# ll
total 60
drwxr-xr-x 2 root root 4096 Oct 31 11:20 pass-00000
drwxr-xr-x 2 root root 4096 Oct 31 11:20 pass-00001
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00002
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00003
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00004
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00005
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00006
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00007
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00008
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00009
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00010
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00011
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00012
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00013
drwxr-xr-x 2 root root 4096 Oct 31 11:21 pass-00014
```
To build PaddlePaddle data preparation image in tutorial [Distributed PaddlePaddle Training on AWS with Kubernetes](../../k8s_aws_en.md), run following commands:
```
cp -r ../../../../../../demo/quick_start .
docker build . -t prepare-data-image-name
```
To build PaddlePaddle training image in tutorial [Distributed PaddlePaddle Training on AWS with Kubernetes](../../k8s_aws_en.md), run following command:
```
docker build . -t train-image-name
```
PaddlePaddle Documentation
==========================
User Guide
----------
* [Introduction](introduction/index.md)
* [Quick Start](demo/quick_start/index_en.md)
* [Build and Installation](build/index.rst)
* [Contribute Code](build/contribute_to_paddle.md)
* [User Interface](ui/index.md)
* [Model Config Interface](ui/api/trainer_config_helpers/index.rst)
* [Example and Demo](demo/index.md)
* [Cluster Train](cluster/index.md)
Development Guide
-----------------
* [Layer Documents](layer.md)
* [Writing New Layers](dev/new_layer/index.rst)
* [Source Code Documents](source/index.md)
Algorithm Tutorial
------------------
* [RNN Configuration](algorithm/rnn/rnn.rst)
PaddlePaddle Documentation
==========================
.. toctree::
:maxdepth: 1
getstarted/index_en.rst
howto/index_en.rst
api/index_en.rst
about/index_en.rst
# Layer Documents
* [Layer Source Code Document](source/gserver/layers/index.rst)
* [Layer Python API Document](ui/api/trainer_config_helpers/layers_index.rst)
API
========
.. doxygenfile:: paddle/api/PaddleAPI.h
.. doxygenfile:: paddle/api/Internal.h
Cuda
=============
Dynamic Link Libs
--------------------------
hl_dso_loader.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_dso_loader.h
GPU Resources
----------------
hl_cuda.ph
``````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda.ph
hl_cuda.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda.h
CUDA Wrapper
--------------
hl_cuda_cublas.h
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cublas.h
hl_cuda_cudnn.h
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cudnn.h
hl_cuda_cudnn.h
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cudnn.ph
CUDA
====================
.. toctree::
:maxdepth: 3
cuda.rst
Matrix
====================
.. toctree::
:maxdepth: 3
matrix.rst
Matrix
=======
Base Matrix
-------------
hl_matrix.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix.h
hl_matrix_base.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_base.cuh
hl_matrix_apply.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_apply.cuh
hl_matrix_ops.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_ops.cuh
hl_matrix_type.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_type.cuh
hl_sse_matrix_kernel.cuh
``````````````````````````
.. doxygenfile:: paddle/cuda/include/hl_sse_matrix_kernel.cuh
hl_batch_transpose.h
``````````````````````````
.. doxygenfile:: paddle/cuda/include/hl_batch_transpose.h
Sparse Matrix
--------------
hl_sparse.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_sparse.h
hl_sparse.ph
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_sparse.ph
Others
---------------
hl_aggregate.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_aggregate.h
hl_table_apply.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_table_apply.h
hl_top_k.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_top_k.h
RNN
====================
.. toctree::
:maxdepth: 3
rnn.rst
Neural Networks
==================
Base
-------
.. doxygenfile:: paddle/cuda/include/hl_gpu.h
.. doxygenfile:: paddle/cuda/include/hl_cnn.h
.. doxygenfile:: paddle/cuda/include/hl_functions.h
.. doxygenfile:: paddle/cuda/include/hl_avx_functions.h
.. doxygenfile:: paddle/cuda/include/hl_device_functions.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_functions.cuh
Activation Functions
-----------------------
.. doxygenfile:: paddle/cuda/include/hl_activation_functions.h
RNN Related APIs
-----------------
.. doxygenfile:: paddle/cuda/include/hl_recurrent_apply.cuh
.. doxygenfile:: paddle/cuda/include/hl_sequence.h
LSTM Model
``````````````
.. doxygenfile:: paddle/cuda/include/hl_lstm.h
.. dpxygenfile:: paddle/cuda/include/hl_cpu_lstm.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_lstm.cuh
.. doxygenfile:: paddle/cuda/include/hl_lstm_ops.cuh
GRU Model
````````````````
.. doxygenfile:: paddle/cuda/include/hl_gru_ops.cuh
.. doxygenfile:: paddle/cuda/include/hl_cpu_gru.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_gru.cuh
Utils
====================
.. toctree::
:maxdepth: 3
utils.rst
Utilities
===========
HPPL Base
------------
hl_base.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_base.h
Timer
-----------
hl_time.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_time.h
Thread Resource
-----------
hl_thread.ph
``````````````
.. doxygenfile:: paddle/cuda/include/hl_thread.ph
Activations
=============
.. doxygenclass:: paddle::ActivationFunction
:members:
Data Providers Documents
==========================
.. toctree::
:maxdepth: 3
dataproviders.rst
Evaluators
==========
.. toctree::
:maxdepth: 3
evaluators.rst
Gradient Machines Documents
=============================
.. toctree::
:maxdepth: 3
gradientmachines.rst
Layers Documents
====================
.. toctree::
:maxdepth: 3
layer.rst
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册