Unverified commit 4087de2c, authored by Cheerego, committed by GitHub

fix_typo_and_adjust_structure (#797)

Parent 933ebbb9
.. _api_guide_cpu_training_best_practice:
############################################
Best Practices for Distributed CPU Training
############################################
Improving the speed of distributed CPU training mainly comes down to two aspects:
1) improving computation speed, chiefly by raising CPU utilization; 2) improving communication speed, chiefly by reducing the amount of data transferred.
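A hedged sketch of the first point (raising CPU utilization) is shown below; it uses Fluid's multi-threaded :code:`ParallelExecutor` on CPU, and the toy model, thread count and :code:`CPU_NUM` value are placeholder assumptions rather than recommendations taken from this page:

.. code-block:: python

    import os
    import paddle.fluid as fluid

    os.environ['CPU_NUM'] = '8'        # number of CPU "devices" used for data-parallel execution

    # A toy model, only so that the snippet is self-contained.
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    pred = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    exec_strategy = fluid.ExecutionStrategy()
    exec_strategy.num_threads = 8      # more executor threads -> higher CPU utilization

    build_strategy = fluid.BuildStrategy()
    build_strategy.reduce_strategy = fluid.BuildStrategy.ReduceStrategy.Reduce  # aggregate gradients with Reduce

    pe = fluid.ParallelExecutor(use_cuda=False,
                                loss_name=loss.name,
                                exec_strategy=exec_strategy,
                                build_strategy=build_strategy)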
......
.. _best_practice_dist_training_gpu:
###########################################
Best Practices for Distributed GPU Training
###########################################

Start optimizing your distributed GPU training job
--------------------------------------------------
......@@ -170,7 +171,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op,Op在启
Optimizing data reading is critical in GPU training, especially when batch_size keeps being increased to raise throughput, because computation then places higher demands on reader performance.
Points to consider when optimizing reader performance include:
1. Use :code:`pyreader`
Refer to `this guide <../../user_guides/howto/prepare_data/use_py_reader.html>`_
to use pyreader, and enable :code:`use_double_buffer`
2. Have the reader return uint8 data
......@@ -229,7 +230,7 @@ PaddlePaddle Fluid使用“线程池” [#]_ 模型调度并执行Op,Op在启
for batch_id in range(iters_per_pass):
exe.run()
pyreader.reset()
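Putting the two reader tips together, a rough sketch follows (the shapes, dtypes, random data source and pass count are placeholder assumptions, not values from this page):

.. code-block:: python

    import numpy
    import paddle
    import paddle.fluid as fluid

    pyreader = fluid.layers.py_reader(
        capacity=64,
        shapes=[(-1, 3, 224, 224), (-1, 1)],
        dtypes=('uint8', 'int64'),            # uint8 samples keep reader-side copies small
        use_double_buffer=True)               # overlap data transfer with computation

    img, label = fluid.layers.read_file(pyreader)
    img = fluid.layers.cast(img, 'float32')   # cast/normalize on the device side
    # ... build the rest of the network on img and label here ...

    place = fluid.CUDAPlace(0) if fluid.core.is_compiled_with_cuda() else fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    def random_sample_reader():               # placeholder data source
        for _ in range(1000):
            yield (numpy.random.randint(0, 256, (3, 224, 224)).astype('uint8'),
                   numpy.random.randint(0, 10, (1,)).astype('int64'))

    pyreader.decorate_paddle_reader(paddle.batch(random_sample_reader, batch_size=32))

    for pass_id in range(2):                  # the number of passes here is arbitrary
        pyreader.start()
        try:
            while True:
                exe.run(fluid.default_main_program())   # no feed: data comes from the pyreader
        except fluid.core.EOFException:                 # the data source for this pass is exhausted
            pyreader.reset()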
Use mixed-precision training
++++++++++++++++++++++++++++
......
##############
Best Practices
##############
.. toctree::
:hidden:
cpu_train_best_practice.rst
dist_training_gpu.rst
###############
Best Practice
###############
.. toctree::
:hidden:
cpu_train_best_practice_en.rst
......@@ -21,28 +21,28 @@ Fluid使用一种编译器式的执行流程,分为编译时和运行时两个
</p>
1. At compile time, the user writes a Python program that, by calling the operators Fluid provides, adds variables (Tensors) and operations on those variables (Operators or Layers) to a Program. The user only needs to describe the core forward computation and does not need to care about backward computation, or about how the computation runs in distributed or heterogeneous-device settings.
2. The original Program is converted inside the platform into an intermediate description language: `ProgramDesc`.
3. The most important module at compile time is the `Transpiler`. The `Transpiler` takes a `ProgramDesc` and outputs a transformed `ProgramDesc`, which is the Fluid Program that the backend `Executor` will eventually execute.
4. The backend Executor takes the Program produced by the Transpiler and executes its Operators one by one (analogous to instructions in a programming language), creating and managing the inputs and outputs each Operator needs during execution.
## 2. The design of Program
After the user finishes defining the network, a Fluid program usually contains two Programs:
1. fluid.default_startup_program: defines operations such as creating model parameters, inputs and outputs, and initializing the learnable parameters in the model.
default_startup_program can be generated automatically by the framework, so there is no need to create it explicitly.
If a call changes the default initialization of a parameter, the framework automatically adds the corresponding change to default_startup_program.
2. fluid.default_main_program: defines the neural network model, the forward and backward computation, and how the optimization algorithm updates the learnable parameters in the network.
The core of using Fluid is building up default_main_program (see the sketch below).
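As a minimal sketch of how the two default programs work together (the layer, shape and input data below are placeholders, not taken from this document):

```python
import numpy
import paddle.fluid as fluid

# Building the network appends variables and operators to fluid.default_main_program(),
# while the parameter-creation and initialization ops go into fluid.default_startup_program().
x = fluid.layers.data(name='x', shape=[1], dtype='float32')
y = fluid.layers.fc(input=x, size=1)

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())        # run the initialization program once
out, = exe.run(fluid.default_main_program(),    # then run the main program
               feed={'x': numpy.ones((1, 1), dtype='float32')},
               fetch_list=[y])
```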
......@@ -53,7 +53,7 @@ Fluid 的 Program 的基本结构是一些嵌套 blocks,形式上类似一段
A block contains:
- definitions of local variables
- a series of operators
The concept of a block is the same as in general-purpose programming languages; for example, the following C++ code contains three blocks:
......@@ -95,7 +95,7 @@ prob = ie()
```
### BlockDesc and ProgramDesc
The block and program information described by the user is stored in Fluid in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) format. All of the `protobuf` messages are defined in `framework.proto`; in Fluid they are called BlockDesc and ProgramDesc. The concepts of ProgramDesc and BlockDesc are similar to an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
`BlockDesc` contains the definitions of the local variables in `vars`, and a series of operators in `ops`.
......@@ -172,12 +172,12 @@ class Executor{
Scope* scope,
int block_id) {
auto& block = pdesc.Block(block_id);
// create all variables
for (auto& var : block.AllVars()) {
scope->Var(var->Name());
}
// create the ops and execute them in order
for (auto& op_desc : block.AllOps()){
auto op = CreateOp(*op_desc);
......@@ -300,7 +300,7 @@ BlockDesc中包含定义的 vars 和一系列的 ops,以输入x为例,python
x = fluid.layers.data(name="x",shape=[1],dtype='float32')
```
In BlockDesc, the variable x is described as:
```
vars {
name: "x"
type {
......
......@@ -29,5 +29,5 @@
development/profiling/index_cn.rst
development/contribute_to_paddle/index_cn.rst
development/write_docs_cn.md
best_practice/dist_training_gpu.rst
paddle_slim/paddle_slim.md
best_practice/index_cn.rst
paddle_slim/paddle_slim.md
......@@ -29,3 +29,4 @@ We gladly encourage your contributions of codes and documentation to our communi
development/profiling/index_en.rst
development/contribute_to_paddle/index_en.rst
development/write_docs_en.md
best_practice/index_en.rst
## Introduction to the High/Low-level API
PaddlePaddle Fluid currently has two sets of API interfaces:
- Low-level API:
  - Highly flexible and relatively mature; models trained with it can be deployed directly for C++ inference.
  - A large number of models are provided as examples, including every chapter of [Book](https://github.com/PaddlePaddle/book) and everything in [models](https://github.com/PaddlePaddle/models).
  - Intended audience: users who already have some understanding of deep learning and need to customize networks for training, inference or deployment.
- High-level API:
  - Simple to use.
  - Not yet mature; the interfaces currently live under [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
## Introduction to High/Low-level API
Currently PaddlePaddle Fluid has 2 branches of API interfaces:
- Low-level API:
- It is highly flexible and relatively mature. The model trained by it can directly support C++ inference deployment and release.
- There are a large number of models as examples, including all chapters in [book](https://github.com/PaddlePaddle/book), and [models](https://github.com/PaddlePaddle/models).
- Recommended for users who have a certain understanding of deep learning and need to customize a network for training/inference/online deployment.
- High-level API:
- Simple to use
- Still under development; the interfaces are temporarily under [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
\ No newline at end of file
================
API Quick Search
================
The API Quick Search introduces, by functionality, the API system and usage of PaddlePaddle Fluid, helping you quickly get the full picture of the PaddlePaddle Fluid API. It covers the following modules:
.. toctree::
:maxdepth: 1
high_low_level_api.md
low_level/program.rst
low_level/layers/index.rst
low_level/executor.rst
low_level/nets.rst
low_level/optimizer.rst
low_level/backward.rst
low_level/metrics.rst
low_level/model_save_reader.rst
low_level/inference.rst
low_level/distributed/index.rst
low_level/memory_optimize.rst
low_level/nets.rst
low_level/executor.rst
low_level/parallel_executor.rst
low_level/backward.rst
low_level/parameter.rst
low_level/program.rst
low_level/distributed/index.rst
=================
API Quick Search
=================
This section introduces the Fluid API structure and usage, to help you quickly get the full picture of the PaddlePaddle Fluid API. This section is divided into the following modules:
.. toctree::
:maxdepth: 1
high_low_level_api_en.md
low_level/program_en.rst
low_level/layers/index_en.rst
low_level/executor_en.rst
low_level/nets_en.rst
low_level/optimizer_en.rst
low_level/backward_en.rst
low_level/metrics_en.rst
low_level/model_save_reader_en.rst
low_level/inference_en.rst
low_level/distributed/index_en.rst
low_level/memory_optimize_en.rst
low_level/nets_en.rst
low_level/executor_en.rst
low_level/parallel_executor_en.rst
low_level/compiled_program_en.rst
low_level/backward_en.rst
low_level/parameter_en.rst
low_level/program_en.rst
low_level/distributed/index_en.rst
......@@ -20,13 +20,13 @@ API详细使用方法参考 :ref:`cn_api_fluid_DistributeTranspiler` ,简单
# configure the strategy
config.slice_var_up = False
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
trainers=1,
sync_mode=False)
For the meaning of the parameters above, please refer to `Synchronous Training <../distributed/sync_training.html>`_
Note that when running asynchronous training, you need to change the value of :code:`sync_mode`
......
......@@ -4,21 +4,21 @@
Asynchronous Distributed Training
####################################
Fluid supports data-parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on the environment variables or startup parameters, each node takes the corresponding :code:`pserver` or :code:`trainer` role.
**Asynchronous distributed training in Fluid only supports the pserver mode** . The main difference between asynchronous training and `synchronous training <../distributed/sync_training_en.html>`_ is that the gradients of each trainer are asynchronously applied on the parameters, but in synchronous training, the gradients of all trainers must be combined first and then they are used to update the parameters. Therefore, the hyperparameters of synchronous training and asynchronous training need to be adjusted separately.
Asynchronous distributed training in Pserver mode
==================================================
For detailed API, please refer to :ref:`api_fluid_transpiler_DistributeTranspiler` . A simple example:
.. code-block:: python
config = fluid.DistributeTranspilerConfig()
# configure the strategy
config.slice_var_up = False
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
......
......@@ -7,8 +7,5 @@
sync_training.rst
async_training.rst
cpu_train_best_practice.rst
large_scale_sparse_feature_training.rst
cluster_train_data_cn.rst
......@@ -7,7 +7,6 @@ Distributed Training
sync_training_en.rst
async_training_en.rst
cpu_train_best_practice_en.rst
large_scale_sparse_feature_training_en.rst
cluster_train_data_en.rst
......
......@@ -4,7 +4,7 @@
Synchronous Distributed Training
################################
Fluid supports data-parallel distributed synchronous training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into
:code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes,
and depending on the environment variables or startup parameters, the node takes the corresponding :code:`pserver` or :code:`trainer` role.
Fluid distributed synchronous training supports both pserver mode and NCCL2 mode; note that their API usage differs.
......@@ -16,11 +16,11 @@ API详细使用方法参考 :ref:`DistributeTranspiler` ,简单实例用法:
.. code-block:: python
config = fluid.DistributeTranspilerConfig()
# configure the strategy
config.slice_var_up = False
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
trainers=1,
......@@ -68,8 +68,8 @@ NCCL2模式分布式训练
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
startup_program=startup_program,
trainers="192.168.0.1:6174,192.168.0.2:6174",
......
......@@ -4,19 +4,19 @@
Synchronous Distributed Training
####################################
Fluid supports data-parallel distributed synchronous training. The API uses :code:`DistributeTranspiler` to convert a single-node network configuration into :code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on the environment variables or startup parameters, each node executes the corresponding :code:`pserver` or :code:`trainer` role. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode; note that the API usage differs between them.
Distributed training in pserver mode
======================================
For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example :
.. code-block:: python
config = fluid.DistributeTranspilerConfig()
# configure the strategy
config.slice_var_up = False
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
......@@ -51,7 +51,7 @@ Configuration for general environment variables:
- :code:`FLAGS_rpc_deadline` : int, the longest waiting time for RPC communication, in milliseconds, default 180000
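These FLAGS values are usually exported in the launch script before the training process starts. As a hedged Python sketch (the value shown is just the documented default):

.. code-block:: python

    import os

    # Must be set before `import paddle.fluid`, which reads FLAGS_* from the
    # environment when the module is first imported.
    os.environ['FLAGS_rpc_deadline'] = '180000'   # RPC timeout in milliseconds

    import paddle.fluid as fluid  # noqa: E402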
Distributed training in NCCL2 mode
====================================
The multi-node synchronous training mode based on NCCL2 (Collective Communication) is only supported in the GPU cluster.
......@@ -65,7 +65,7 @@ Use the following code to convert the current :code:`Program` to a Fluid :code:`
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
startup_program=startup_program,
......
.. _api_guide_Program:
################
Basic Concepts
################
==================
Program
......@@ -13,13 +13,13 @@ Program
In summary:
* A model is a Fluid :code:`Program`, and a model may contain more than one :code:`Program` ;
* A :code:`Program` is made up of nested blocks (:code:`Block`); the concept of a :code:`Block` can be compared to a pair of braces in C++ or Java, or to an indented block in Python;
* Computation inside a :code:`Block` is composed of sequential execution, conditional selection or loops, which combine into complex computation logic;
* A :code:`Block` contains descriptions of computations and of the objects they act on. A computation is described by an Operator; the objects an Operator acts on (its inputs and outputs) are unified as Tensors. In Fluid, a Tensor is represented by a `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ with LoD level 0.
......@@ -37,7 +37,7 @@ Block
+----------------------+-------------------------+
| if-else, switch      | IfElseOp, SwitchOp      |
+----------------------+-------------------------+
| sequential execution | a series of layers      |
+----------------------+-------------------------+
As described above, a :code:`Block` in Fluid describes a set of Operators executed sequentially, conditionally or in a loop, together with the objects the Operators act on: Tensors.
......@@ -54,7 +54,7 @@ Operator
This is because some common operations on Tensors may be composed of several more basic operations. To make them easier to use, the framework wraps the basic Operators, taking care of details such as creating the learnable parameters an Operator depends on and how those parameters are initialized, which saves users from re-implementing them.
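For instance (a sketch; the input shape is arbitrary), :code:`fluid.layers.fc` is such a wrapped layer: besides appending the underlying compute ops, it also creates and initializes its learnable weight and bias for you:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    # fc appends the underlying ops and also creates/initializes its parameters.
    y = fluid.layers.fc(input=x, size=1)

    block = fluid.default_main_program().global_block()
    print([op.type for op in block.ops])              # e.g. ['mul', 'elementwise_add']
    print([p.name for p in block.all_parameters()])   # the automatically created weight and bias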
For more details, see `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_
=========
......@@ -78,4 +78,4 @@ Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多
* Users can also use :ref:`cn_api_fluid_program_guard` together with a :code:`with` statement to modify the already-configured :ref:`cn_api_fluid_default_startup_program` and :ref:`cn_api_fluid_default_main_program` (see the sketch below).
* In Fluid, the execution order inside a Block is determined by control flow, such as :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While` and :ref:`cn_api_fluid_layers_Switch` ; for more information, please refer to :ref:`api_guide_control_flow`
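A small sketch of the first point (the layer and shape are arbitrary):

.. code-block:: python

    import paddle.fluid as fluid

    main_prog = fluid.Program()
    startup_prog = fluid.Program()
    # Variables and operators defined inside this "with" block are added to
    # main_prog / startup_prog instead of the global default programs.
    with fluid.program_guard(main_prog, startup_prog):
        x = fluid.layers.data(name='x', shape=[16], dtype='float32')
        y = fluid.layers.fc(input=x, size=2)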
.. _api_guide_Program_en:
###############
Basic Concepts
###############
==================
Program
......@@ -36,7 +36,7 @@ Block
+----------------------+-------------------------+
| if-else, switch      | IfElseOp, SwitchOp      |
+----------------------+-------------------------+
| execute sequentially | a series of layers      |
+----------------------+-------------------------+
As mentioned above, :code:`Block` in Fluid describes a set of Operators that include sequential execution, conditional selection or loop execution, and the operating object of Operator: Tensor.
......@@ -53,7 +53,7 @@ This is because some common operations on Tensor may consist of more basic opera
For more information, see `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_
=========
......@@ -75,4 +75,4 @@ Related API
* Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` .
* In Fluid, the execution order in a Block is determined by control flow, such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to :ref:`api_guide_control_flow_en`
......@@ -186,6 +186,7 @@
For Python2: cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
For Python3: cmake .. -DPY_VERSION=3.5 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
-DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
> `-DPY_VERSION=3.5` should be changed to the Python version of your installation environment
10. Use the following command to compile:
......
......@@ -121,7 +121,7 @@ Congratulations, you have now completed the process of compiling PaddlePaddle us
4. (Only For Python3) Set Python-related environment variables:
- a. First use
```find `dirname $(dirname
$(which python3))` -name "libpython3.*.dylib"```
to find the path to the Python library (the first result is the dylib path for the python you need to use); then replace [python-lib-path] below with the file path you found.
......@@ -148,7 +148,7 @@ Congratulations, you have now completed the process of compiling PaddlePaddle us
Since we are using CMake 3.4, please follow the steps below:
1. Download the CMake image from the [official CMake website](https://cmake.org/files/v3.4/cmake-3.4.3-Darwin-x86_64.dmg) and install it.
2. Enter `sudo "/Applications/CMake.app/Contents/bin/cmake-gui" --install` in the console
- b. If you do not want to use the system default BLAS but your own installed OpenBLAS, please read the [FAQ](../FAQ.html/#OPENBLAS)
......
......@@ -236,7 +236,7 @@ Fluid的设计思想类似于高级编程语言C++和JAVA等。程序的执行
#define the Executor
cpu = fluid.core.CPUPlace() #define the computation place; here we train on CPU
exe = fluid.Executor(cpu) #create the executor
exe.run(fluid.default_startup_program()) #run the program used for initialization
#train with the Program and start the computation
#feed defines, as a dict, the order in which data is passed into the network
......