diff --git a/develop/doc/_images/ps_en.png b/develop/doc/_images/ps_en.png
new file mode 100644
index 0000000000000000000000000000000000000000..6537d3d56589ca9f19a77a50a970e4b5275e6ce0
Binary files /dev/null and b/develop/doc/_images/ps_en.png differ
diff --git a/develop/doc/_sources/getstarted/quickstart_en.rst.txt b/develop/doc/_sources/getstarted/quickstart_en.rst.txt
index d1bcf82ea071e2c53760a5ccf6a5074a3ac0abd5..70f7fe0646068aa79cd72955c6848ac0250c2300 100644
--- a/develop/doc/_sources/getstarted/quickstart_en.rst.txt
+++ b/develop/doc/_sources/getstarted/quickstart_en.rst.txt
@@ -1,6 +1,9 @@
 Quick Start
 ============
 
+Quick Install
+-------------
+
 You can use pip to install PaddlePaddle with a single command, supports
 CentOS 6 above, Ubuntu 14.04 above or MacOS 10.12, with Python 2.7 installed.
 Simply run the following command to install, the version is cpu_avx_openblas:
@@ -17,6 +20,9 @@
 If you need to install GPU version (cuda7.5_cudnn5_avx_openblas), run:
 
 For more details about installation and build: :ref:`install_steps` .
 
+Quick Use
+---------
+
 Create a new file called housing.py, and paste this Python code:
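The housing.py script that the new "Quick Use" section asks readers to paste lies outside this hunk's context. As a reviewing aid, here is a sketch of the kind of script the quickstart describes, written against the paddle.v2 Python API of this release; the specific calls (paddle.init, paddle.layer, paddle.infer) and the pre-trained paddle.dataset.uci_housing.model() helper are assumptions, and the code block in the rendered quickstart remains authoritative.

```python
# Sketch of a quickstart-style housing.py (assumed paddle.v2 API, not the
# authoritative listing from the rendered quickstart page).
import paddle.v2 as paddle

# Initialize PaddlePaddle in CPU mode with a single trainer thread.
paddle.init(use_gpu=False, trainer_count=1)

# Configure a linear model: 13 input features -> 1 predicted house price.
x = paddle.layer.data(name='x', type=paddle.data_type.dense_vector(13))
y_predict = paddle.layer.fc(input=x, size=1, act=paddle.activation.Linear())

# Infer over the UCI Housing test set with pre-trained parameters.
# uci_housing.model() is assumed to download the pre-trained model.
probs = paddle.infer(
    output_layer=y_predict,
    parameters=paddle.dataset.uci_housing.model(),
    input=[item for item in paddle.dataset.uci_housing.test()()])

for prob in probs:
    print('Predicted price: $%.2f' % (prob[0] * 1000))
```

If those assumptions hold, running `python housing.py` downloads the UCI Housing test data plus a pre-trained model and prints one predicted price per test sample.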
diff --git a/develop/doc/_sources/howto/cluster/index_en.rst.txt b/develop/doc/_sources/howto/cluster/index_en.rst.txt
index af957e06cd7930ce63569a1bafdde47a1d34eb69..2640a09dcc904619bc97c9bd3f3d81a9dc307663 100644
--- a/develop/doc/_sources/howto/cluster/index_en.rst.txt
+++ b/develop/doc/_sources/howto/cluster/index_en.rst.txt
@@ -1,10 +1,22 @@
 Distributed Training
 ====================
 
+In this section, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed training job:
+
+.. image:: src/ps_en.png
+   :width: 500
+
+- Data shard: the training data is split into multiple partitions; the trainers each use a partition of the whole dataset to do the training job.
+- Trainer: each trainer reads its data shard and trains the neural network. It then uploads the calculated "gradients" to the parameter servers and waits for the parameters to be optimized on the parameter server side. When that finishes, the trainer downloads the optimized parameters and continues training.
+- Parameter server: each parameter server stores a part of the whole neural network model. The parameter servers run the optimization calculations when gradients are uploaded by the trainers, and then send the updated parameters back to the trainers.
+
+PaddlePaddle supports both synchronous stochastic gradient descent (SGD) and asynchronous SGD.
+
+When training with synchronous SGD, PaddlePaddle uses an internal "synchronization barrier" so that gradient uploads and parameter downloads happen in strict order. Asynchronous SGD, on the other hand, does not wait for all trainers to finish uploading at each step, which increases the parallelism of distributed training: the parameter servers do not depend on each other and optimize parameters concurrently, and because they do not wait for the trainers, the trainers also work concurrently. However, asynchronous SGD introduces more randomness and noise into the gradients.
+
 .. toctree::
   :maxdepth: 1
 
-  introduction_en.md
   preparations_en.md
   cmd_argument_en.md
   multi_cluster/index_en.rst
diff --git a/develop/doc/_sources/howto/cluster/introduction_en.md.txt b/develop/doc/_sources/howto/cluster/introduction_en.md.txt
deleted file mode 100644
index eb70d7cf35ab729e0da4c6a3a2e732c26905f584..0000000000000000000000000000000000000000
--- a/develop/doc/_sources/howto/cluster/introduction_en.md.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-## Introduction
-
-In this section, we'll explain how to run distributed training jobs with PaddlePaddle on different types of clusters. The diagram below shows the main architecture of a distributed trainning job:
-
-
-- Data shard: training data will be split into multiple partitions, trainers use the partitions of the whole dataset to do the training job.
-- Trainer: each trainer reads the data shard, and train the neural network. Then the trainer will upload calculated "gradients" to parameter servers, and wait for parameters to be optimized on the parameter server side. When that finishes, the trainer download optimized parameters and continues its training.
-- Parameter server: every parameter server stores part of the whole neural network model data. They will do optimization calculations when gradients are uploaded from trainers, and then send updated parameters to trainers.
-
-PaddlePaddle can support both synchronize stochastic gradient descent (SGD) and asynchronous SGD.
-
-When training with synchronize SGD, PaddlePaddle uses an internal "synchronize barrier" which makes gradients update and parameter download in strict order. On the other hand, asynchronous SGD won't wait for all trainers to finish upload at a single step, this will increase the parallelism of distributed training: parameter servers do not depend on each other, they'll do parameter optimization concurrently. Parameter servers will not wait for trainers, so trainers will also do their work concurrently. But asynchronous SGD will introduce more randomness and noises in the gradient.
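The data shard / trainer / parameter server flow described in the two hunks above can be made concrete with a small, framework-free sketch. This is illustrative Python only, not PaddlePaddle's distributed API: the `ParameterServer` and `Trainer` classes and the linear-regression toy model are hypothetical stand-ins for the roles in the diagram.

```python
# Framework-free sketch of synchronous SGD with data shards, trainers and a
# parameter server. All names here are hypothetical, not PaddlePaddle APIs.
import numpy as np

class ParameterServer:
    """Stores the model parameters and optimizes them when gradients arrive."""
    def __init__(self, params, lr=0.1):
        self.params = params
        self.lr = lr

    def push_and_optimize(self, gradients):
        # Synchronous SGD: average the gradients uploaded by *all* trainers,
        # then apply one update (the "synchronization barrier").
        self.params -= self.lr * np.mean(gradients, axis=0)

    def pull(self):
        # Trainers download the freshly optimized parameters.
        return self.params.copy()

class Trainer:
    """Reads one data shard and computes gradients for a linear model."""
    def __init__(self, shard_x, shard_y):
        self.x, self.y = shard_x, shard_y

    def gradient(self, params):
        # Gradient of mean squared error for predictions x . params.
        pred = self.x.dot(params)
        return self.x.T.dot(pred - self.y) / len(self.y)

# Toy dataset split into four shards (the "data shard" role).
rng = np.random.RandomState(0)
X, true_w = rng.randn(400, 4), np.array([1.0, -2.0, 0.5, 3.0])
y = X.dot(true_w)
trainers = [Trainer(X[idx], y[idx]) for idx in np.array_split(np.arange(400), 4)]

# A single parameter server holds this tiny model; real jobs shard it further.
ps = ParameterServer(np.zeros(4))

for step in range(200):
    params = ps.pull()                              # trainers download parameters
    grads = [t.gradient(params) for t in trainers]  # each trainer works on its shard
    ps.push_and_optimize(np.stack(grads))           # wait for all uploads, then optimize

print("learned weights:", np.round(ps.pull(), 2))   # ~ [ 1. -2.  0.5  3.]
```

Asynchronous SGD removes the barrier in the last line of the loop: each gradient is applied as soon as its trainer uploads it, so trainers and parameter servers stop waiting for one another, at the cost of noisier gradient updates.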
diff --git a/develop/doc/api/index_en.html b/develop/doc/api/index_en.html
index 0d0b5f5a04704c357f5804703e16a952d4b717bf..7e297e188e7a466c3035133f8884dcb0427784f9 100644
--- a/develop/doc/api/index_en.html
+++ b/develop/doc/api/index_en.html
@@ -128,7 +128,6 @@
  • Distributed Training