Unverified commit 4087de2c, authored by Cheerego, committed by GitHub

fix_typo_and_adjust_structure (#797)

Parent 933ebbb9
.. _api_guide_cpu_training_best_practice:
##################
####################
Best Practices for Distributed CPU Training
##################
####################
Improving the speed of distributed CPU training involves two main aspects:
1) improving computation speed, mainly by raising CPU utilization; 2) improving communication speed, mainly by reducing the amount of data transmitted.
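As a rough sketch of both levers (assuming a standard Fluid setup; the endpoint list, trainer count, and ``CPU_NUM`` value are illustrative, not prescribed by this document):

.. code-block:: python

    import os
    import paddle.fluid as fluid

    # Raise CPU utilization: let the executor run more CPU threads.
    os.environ['CPU_NUM'] = '8'

    # Reduce communication cost: transpile in async mode so trainers
    # do not block on gradient aggregation.
    config = fluid.DistributeTranspilerConfig()
    t = fluid.DistributeTranspiler(config=config)
    t.transpile(trainer_id=0,
                program=fluid.default_main_program(),
                pservers="192.168.0.1:6174,192.168.0.2:6174",
                trainers=2,
                sync_mode=False)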
......
.. _best_practice_dist_training_gpu:
Best Practices for Performance Optimization: Distributed GPU Training
============================
#####################
Best Practices for Distributed GPU Training
#####################
Start optimizing your distributed GPU training job
-------------------------
......
#########
Best Practices
#########
.. toctree::
:hidden:
cpu_train_best_practice.rst
dist_training_gpu.rst
###############
Best Practice
###############
.. toctree::
:hidden:
cpu_train_best_practice_en.rst
......@@ -95,7 +95,7 @@ prob = ie()
```
### BlockDesc and ProgramDesc
The block and program information described by the user is saved in Fluid in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) format; all the `protobub` messages are defined in `framework.proto`, and in Fluid they are called BlockDesc and ProgramDesc. The concepts of ProgramDesc and BlockDesc are similar to an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
The block and program information described by the user is saved in Fluid in [protobuf](https://en.wikipedia.org/wiki/Protocol_Buffers) format; all the `protobuf` messages are defined in `framework.proto`, and in Fluid they are called BlockDesc and ProgramDesc. The concepts of ProgramDesc and BlockDesc are similar to an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
`BlockDesc` contains the definitions of local variables in `vars`, and a list of operators in `ops`.
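As a minimal sketch (assuming a standard `paddle.fluid` install), building a single layer and printing the default program renders its `ProgramDesc`, with each `BlockDesc` listing its `vars` and `ops`:

```python
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.fc(input=x, size=1)

# Printing a Program renders the underlying ProgramDesc protobuf:
# every BlockDesc with its local `vars` and its list of `ops`.
print(fluid.default_main_program())
```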
......
......@@ -29,5 +29,5 @@
development/profiling/index_cn.rst
development/contribute_to_paddle/index_cn.rst
development/write_docs_cn.md
best_practice/dist_training_gpu.rst
best_practice/index_cn.rst
paddle_slim/paddle_slim.md
......@@ -29,3 +29,4 @@ We gladly encourage your contributions of codes and documentation to our communi
development/profiling/index_en.rst
development/contribute_to_paddle/index_en.rst
development/write_docs_en.md
best_practice/index_en.rst
......@@ -213,7 +213,6 @@ memory caches segmented data. The initial value of memory can be zero, or it can be
- **x** (Variable) - The input sequence
- **level** (int) - The LoD level used to split steps; default 0
Returns: the current timestep in the input sequence.
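A hedged sketch of how ``step_input`` is used inside a ``DynamicRNN`` block (shapes and names are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    sentence = fluid.layers.data(name='sentence', shape=[32],
                                 dtype='float32', lod_level=1)
    drnn = fluid.layers.DynamicRNN()
    with drnn.block():
        word = drnn.step_input(sentence)            # one timestep per iteration
        prev = drnn.memory(shape=[32], value=0.0)   # zero-initialized memory
        hidden = fluid.layers.fc(input=[word, prev], size=32, act='tanh')
        drnn.update_memory(prev, hidden)
        drnn.output(hidden)
    out = drnn()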
.. py:method:: static_input(x)
......@@ -1742,7 +1741,7 @@ beam_search
Parameters:
- **pre_ids** (Variable) - A LoDTensor variable; the output of ``beam_search`` at the previous step. At the first step it should be a LoDTensor with shape :math:`(batch\_size, 1)` and :math:`lod [[0,1,...,batch\_size],[0,1,...,batch\_size]]`
- **pre_scores** (Variable) - A LoDTensor variable; the output of beam_search at the previous step
- **ids** (Variable) - A LoDTensor variable containing candidate ids, with shpae :math:`(batch\_size×beam\_ize,K)`, where ``K`` should be ``beam_size``
- **ids** (Variable) - A LoDTensor variable containing candidate ids, with shape :math:`(batch\_size×beam\_ize,K)`, where ``K`` should be ``beam_size``
- **scores** (Variable) - A LoDTensor variable of accumulated scores corresponding to ``ids``, with the same shape as ``ids``.
- **beam_size** (int) - The beam width in beam search.
- **end_id** (int) - The id of the end token.
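A hedged call sketch (the input variables are assumed to be LoDTensors shaped as described above):

.. code-block:: python

    import paddle.fluid as fluid

    # pre_ids, pre_scores, ids, scores: LoDTensor variables as above.
    selected_ids, selected_scores = fluid.layers.beam_search(
        pre_ids=pre_ids,
        pre_scores=pre_scores,
        ids=ids,
        scores=scores,
        beam_size=4,
        end_id=1)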
......@@ -3185,11 +3184,6 @@ The LSTMP layer (LSTM with recurrent projection) has a separate projection layer after the LSTM layer,
- **cell_clip** (float) - If provided, the cell state is clipped by this value prior to the cell output activation.
- **proj_clip** (float) - If num_proj > 0 and proj_clip is provided, the projected values are clipped elementwise to within [-proj_clip, proj_clip]
Returns: a tuple with two output variables: the projection of the hidden state, and the cell state of the LSTMP. The projection has shape (T*P) and the cell state has shape (T*D); the LoD of both is the same as the input's.
Return type: tuple
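A minimal sketch of the call (dimensions are illustrative; ``size`` must be four times the hidden width for the LSTM gates):

.. code-block:: python

    import paddle.fluid as fluid

    emb = fluid.layers.data(name='emb', shape=[128], dtype='float32', lod_level=1)
    hidden_dim, proj_dim = 512, 256
    fc_out = fluid.layers.fc(input=emb, size=hidden_dim * 4)
    # Returns the projection of the hidden state and the cell state.
    proj, cell = fluid.layers.dynamic_lstmp(input=fc_out,
                                            size=hidden_dim * 4,
                                            proj_size=proj_dim)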
......@@ -6312,7 +6306,6 @@ pad_constant_like
[[41, 42, 43]]]]
Y.shape = (1, 3, 1, 3)
Parameters:
- **x** (Variable) - The input Tensor variable.
- **y** (Variable) - The output Tensor variable.
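A minimal sketch matching the shapes above (assuming ``fluid.layers.pad_constant_like``):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[2, 3, 2, 3], dtype='float32',
                          append_batch_size=False)
    y = fluid.layers.data(name='y', shape=[1, 3, 1, 3], dtype='float32',
                          append_batch_size=False)
    # y is padded with pad_value up to the shape of x.
    out = fluid.layers.pad_constant_like(x=x, y=y, pad_value=0.)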
......@@ -7261,7 +7254,7 @@ align_corners and align_mode are optional parameters; the interpolation method can be determined by them
Parameters:
- **input** (Variable) - The input tensor of bilinear interpolation, a 4-D tensor of shpae (N x C x h x w).
- **input** (Variable) - The input tensor of bilinear interpolation, a 4-D tensor of shape (N x C x h x w).
- **out_shape** (Variable) - A 1-D tensor containing two numbers: the first is the height and the second is the width.
- **scale** (float|None) - A multiplier for the input height or width. At least one of out_shape and scale must be set, and out_shape has priority over scale. Default: None.
- **name** (str|None) - The name of the output variable.
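A hedged sketch, assuming this fragment documents ``fluid.layers.resize_bilinear``:

.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.layers.data(name='data', shape=[3, 6, 9], dtype='float32')
    # out_shape takes priority over scale; here we upsample to 12 x 12.
    out = fluid.layers.resize_bilinear(input=data, out_shape=[12, 12])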
......@@ -8605,7 +8598,6 @@ shape layer.
Return type: Variable
**Code example:**
.. code-block:: python
......@@ -9065,6 +9057,7 @@ softmax_with_cross_entropy
Parameters:
- **logits** (Variable) - Unscaled log probabilities, a 2-D tensor of shape N x K, where N is the batch size and K is the number of classes.
- **label** (Variable) - A 2-D tensor holding the ground truth. If ``soft_label`` is False, it is a Tensor<int64> of shape N x 1; if ``soft_label`` is True, it is a Tensor<float/double> of shape N x K.
- **soft_label** (bool) - Whether to treat the input label as a soft label. Default: False.
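A minimal usage sketch (layer sizes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.layers.data(name='data', shape=[128], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    logits = fluid.layers.fc(input=data, size=10)
    # With soft_label=False (default), label holds class ids of shape N x 1.
    loss = fluid.layers.softmax_with_cross_entropy(logits=logits, label=label)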
......@@ -9395,6 +9388,7 @@ stack
Out.dims = [1, 3, 2]
Parameters:
- **x** (Variable|list(Variable)|tuple(Variable)) – The input variables
- **axis** (int|None) – The axis along which the inputs are stacked
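A minimal sketch (shapes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    x0 = fluid.layers.data(name='x0', shape=[1, 2], dtype='float32',
                           append_batch_size=False)
    x1 = fluid.layers.data(name='x1', shape=[1, 2], dtype='float32',
                           append_batch_size=False)
    # Stacking two [1, 2] tensors along axis 0 gives shape [2, 1, 2].
    out = fluid.layers.stack([x0, x1], axis=0)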
......@@ -9914,6 +9908,7 @@ abs
out = |x|
Parameters:
- **x** - Input of the abs operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
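The same single-input call pattern applies to the other elementwise activation ops in this section (ceil, cos, exp, floor, logsigmoid, reciprocal, round, sigmoid, sin, sqrt, tanh, tanh_shrink); a minimal sketch for abs:

.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.layers.data(name='data', shape=[32], dtype='float32')
    out = fluid.layers.abs(data)  # elementwise |x|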
......@@ -9997,6 +9992,7 @@ ceil
Parameters:
- **x** - Input of the Ceil operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10027,6 +10023,7 @@ Cosine activation function.
Parameters:
- **x** - Input of the cos operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10087,6 +10084,7 @@ Exp activation function (exponentiation with the natural constant e as the base).
out = e^x
Parameters:
- **x** - Input of the Exp operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10117,6 +10115,7 @@ floor
Parameters:
- **x** - Input of the Floor operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10189,6 +10188,7 @@ Logsigmoid activation function.
Parameters:
- **x** - Input of the LogSigmoid operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
Returns: output of the LogSigmoid operator
......@@ -10216,6 +10216,7 @@ Reciprocal activation function
out = \frac{1}{x}
Parameters:
- **x** - Input of the reciprocal operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10230,9 +10231,6 @@ Reciprocal activation function
.. _cn_api_fluid_layers_round:
round
......@@ -10248,6 +10246,7 @@ Round (rounding) activation function.
Parameters:
- **x** - Input of the round operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10276,6 +10275,7 @@ sigmoid activation function
Parameters:
- **x** - Input of the Sigmoid operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10304,6 +10304,7 @@ sin
Parameters:
- **x** - Input of the sin operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10421,6 +10422,7 @@ sqrt
out = \sqrt{x}
Parameters:
- **x** - Input of the Sqrt operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10479,6 +10481,7 @@ tanh activation function.
Parameters:
- **x** - Input of the Tanh operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -10507,6 +10510,7 @@ tanh_shrink activation function.
out = x - \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
Parameters:
- **x** - Input of the TanhShrink operator
- **use_cudnn** (BOOLEAN) – (bool, default False) Whether to use the cuDNN kernel only; cuDNN must be installed
......@@ -12513,7 +12517,7 @@ PolygonBoxTransform operator.
Parameters:
- **input** (Variable) - A tensor of shape [batch_size, geometry_channels, height, width]
Returns: the same shpae as the input
Returns: the same shape as the input
Return type: output (Variable)
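A hedged sketch, assuming this fragment documents ``fluid.layers.polygon_box_transform``:

.. code-block:: python

    import paddle.fluid as fluid

    geometry = fluid.layers.data(name='geometry', shape=[8, 32, 32],
                                 dtype='float32')
    # The output has the same shape as the input:
    # [batch_size, geometry_channels, height, width].
    out = fluid.layers.polygon_box_transform(input=geometry)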
......
## Introduction to High/Low-level APIs
PaddlePaddle Fluid currently provides two sets of API interfaces:
- Low-level API:
  - Highly flexible and relatively mature; models trained with it directly support C++ inference deployment.
  - A large number of models are provided as usage examples, including all chapters of [Book](https://github.com/PaddlePaddle/book) and everything in [models](https://github.com/PaddlePaddle/models).
  - Intended audience: users with some understanding of deep learning who need to customize networks for training/inference/online deployment.
- High-level API:
  - Simple to use
  - Not yet mature; the interfaces currently live under [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
## Introduction to High/Low-level API
Currently PaddlePaddle Fluid has 2 branches of API interfaces:
- Low-level API:
  - It is highly flexible and relatively mature. Models trained with it can directly support C++ inference deployment and release.
  - There are a large number of models as examples, including all chapters in [book](https://github.com/PaddlePaddle/book), and [models](https://github.com/PaddlePaddle/models).
  - Recommended for users who have a certain understanding of deep learning and need to customize a network for training/inference/online deployment.
- High-level API:
  - Simple to use
  - Still under development; the interface is temporarily in [paddle.fluid.contrib](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib).
\ No newline at end of file
===========
API Usage Guides
API Quick Search
===========
The API usage guides introduce the PaddlePaddle Fluid API system and usage by function, helping you quickly get a full picture of the PaddlePaddle Fluid API. They include the following modules:
The API Quick Search introduces the PaddlePaddle Fluid API system and usage by function, helping you quickly get a full picture of the PaddlePaddle Fluid API. It includes the following modules:
.. toctree::
:maxdepth: 1
high_low_level_api.md
low_level/program.rst
low_level/layers/index.rst
low_level/executor.rst
low_level/nets.rst
low_level/optimizer.rst
low_level/backward.rst
low_level/metrics.rst
low_level/model_save_reader.rst
low_level/inference.rst
low_level/distributed/index.rst
low_level/memory_optimize.rst
low_level/nets.rst
low_level/executor.rst
low_level/parallel_executor.rst
low_level/backward.rst
low_level/parameter.rst
low_level/program.rst
low_level/distributed/index.rst
===========
API Guides
===========
=================
API Quick Search
=================
This section introduces the Fluid API structure and usage to help you quickly get the full picture of the PaddlePaddle Fluid API. It is divided into the following modules:
.. toctree::
:maxdepth: 1
high_low_level_api_en.md
low_level/program_en.rst
low_level/layers/index_en.rst
low_level/executor_en.rst
low_level/nets_en.rst
low_level/optimizer_en.rst
low_level/backward_en.rst
low_level/metrics_en.rst
low_level/model_save_reader_en.rst
low_level/inference_en.rst
low_level/distributed/index_en.rst
low_level/memory_optimize_en.rst
low_level/nets_en.rst
low_level/executor_en.rst
low_level/parallel_executor_en.rst
low_level/compiled_program_en.rst
low_level/backward_en.rst
low_level/parameter_en.rst
low_level/program_en.rst
low_level/distributed/index_en.rst
......@@ -4,7 +4,7 @@
Asynchronous Distributed Training
####################################
Fluid supports data-parallel asynchronous distributed training. :code:`DistributedTranspiler` converts a single-node network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role is executed.
Fluid supports data-parallel asynchronous distributed training. :code:`DistributeTranspiler` converts a single-node network configuration into a :code:`pserver`-side program and a :code:`trainer`-side program that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role is executed.
**Asynchronous distributed training in Fluid only supports the pserver mode**. The main difference from `synchronous training <../distributed/sync_training_en.html>`_ is that in asynchronous training the gradients of each trainer are applied to the parameters asynchronously, whereas in synchronous training the gradients of all trainers must be combined first and then used to update the parameters. Therefore, the hyperparameters of synchronous and asynchronous training need to be tuned separately.
......@@ -15,10 +15,10 @@ For detailed API, please refer to :ref:`api_fluid_transpiler_DistributeTranspile
.. code-block:: python
config = fluid.DistributedTranspilerConfig()
config = fluid.DistributeTranspilerConfig()
#Configure the config policy
config.slice_var_up = False
t = fluid.DistributedTranspiler(config=config)
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
......
......@@ -7,8 +7,5 @@
sync_training.rst
async_training.rst
cpu_train_best_practice.rst
large_scale_sparse_feature_training.rst
cluster_train_data_cn.rst
......@@ -7,7 +7,6 @@ Distributed Training
sync_training_en.rst
async_training_en.rst
cpu_train_best_practice_en.rst
large_scale_sparse_feature_training_en.rst
cluster_train_data_en.rst
......
......@@ -4,7 +4,7 @@
Synchronous Distributed Training
############
Fluid supports data-parallel synchronous distributed training; the API uses :code:`DistributedTranspiler` to convert a single-node network configuration into
Fluid supports data-parallel synchronous distributed training; the API uses :code:`DistributeTranspiler` to convert a single-node network configuration into
:code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes, and depending on environment variables or startup parameters,
the corresponding :code:`pserver` or :code:`trainer` role is executed. Fluid synchronous distributed training supports both pserver mode and NCCL2 mode;
note that their API usage differs.
......@@ -16,10 +16,10 @@ For detailed API usage, refer to :ref:`DistributeTranspiler`; simple example usage:
.. code-block:: python
config = fluid.DistributedTranspilerConfig()
config = fluid.DistributeTranspilerConfig()
# configure the policy config
config.slice_var_up = False
t = fluid.DistributedTranspiler(config=config)
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
......@@ -68,7 +68,7 @@ Distributed training in NCCL2 mode
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributedTranspiler(config=config)
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
startup_program=startup_program,
......
......@@ -4,7 +4,7 @@
Synchronous Distributed Training
####################################
Fluid supports data-parallel synchronous distributed training; the API uses the :code:`DistributedTranspiler` to convert a single-node network configuration into :code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role is executed. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode; note that their API usage differs.
Fluid supports data-parallel synchronous distributed training; the API uses the :code:`DistributeTranspiler` to convert a single-node network configuration into :code:`pserver`-side and :code:`trainer`-side programs that can be executed on multiple machines. The user executes the same piece of code on different nodes; depending on environment variables or startup parameters, the corresponding :code:`pserver` or :code:`trainer` role is executed. Fluid distributed synchronous training supports both pserver mode and NCCL2 mode; note that their API usage differs.
Distributed training in pserver mode
======================================
......@@ -13,10 +13,10 @@ For API Reference, please refer to :ref:`DistributeTranspiler`. A simple example
.. code-block:: python
config = fluid.DistributedTranspilerConfig()
config = fluid.DistributeTranspilerConfig()
#Configure the policy config
config.slice_var_up = False
t = fluid.DistributedTranspiler(config=config)
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
pservers="192.168.0.1:6174,192.168.0.2:6174",
......@@ -65,7 +65,7 @@ Use the following code to convert the current :code:`Program` to a Fluid :code:`
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
t = fluid.DistributedTranspiler(config=config)
t = fluid.DistributeTranspiler(config=config)
t.transpile(trainer_id,
program=main_program,
startup_program=startup_program,
......
.. _api_guide_Program:
###############################
Program/Block/Operator/Variable
###############################
#########
Basic Concepts
#########
==================
Program
......
.. _api_guide_Program_en:
###############################
Program/Block/Operator/Variable
###############################
###############
Basic Concept
###############
==================
Program
......
......@@ -186,6 +186,7 @@
For Python2: cmake .. -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
For Python3: cmake .. -DPY_VERSION=3.5 -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
-DPYTHON_LIBRARY=${PYTHON_LIBRARY} -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
>`-DPY_VERSION=3.5` should be changed to the Python version of your installation environment
10. Compile with the following commands:
......
......@@ -236,7 +236,7 @@ Fluid's design philosophy is similar to high-level programming languages such as C++ and Java. Program execution
#Define the Executor
cpu = fluid.core.CPUPlace() #choose the compute device; here we train on CPU
exe = fluid.Executor(cpu) #create the executor
exe.run(fluid.default_startup_program()) #initialize the Program
exe.run(fluid.default_startup_program()) #the program used for initialization
#Train the Program and start computation
#feed defines, as a dict, the order in which data is passed into the network
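#A hedged illustration of the training call that typically follows
#(names such as train_data, y_true and avg_cost are illustrative, not from this diff):
outs = exe.run(
    feed={'x': train_data, 'y': y_true},
    fetch_list=[y_predict.name, avg_cost.name])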
......