Commit a9ecd332, authored by: J JepsonWong

Merge branch 'develop' of https://github.com/PaddlePaddle/FluidDoc into update_faq

......@@ -6,7 +6,7 @@ Advanced User Guides
By now you should be familiar with Fluid. If your next step is building a more efficient model or inventing your own Operator, read more on:
- `Design Principles of Fluid <../advanced_usage/design_idea/fluid_design_idea_en.html>`_ : Design principles underlying Fluid to help you understand how the framework runs.
- `Deploy Inference Model <../advanced_usage/deploy/index_en.html>`_ : How to deploy the trained network to perform practical inference
......
......@@ -136,7 +136,7 @@ The feed map provides input data for the program; fetch_list holds the variables the user expects after the program finishes
- **fetch_var_name** (str) – name of the output variable of the fetch operator
- **scope** (Scope) – the scope in which this program runs; users may specify a different scope. Defaults to the global scope
- **return_numpy** (bool) – if True, the fetched tensors are converted to numpy arrays
- **use_program_cache** (bool) – whether to reuse the same cached program setup across different batches. When set to True, execution is faster only if (1) the program was not compiled with data parallelism and (2) the program, feed variable names, and fetch_list variable names are unchanged from the previous step.
Returns: the results fetched according to fetch_list
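**Code example** (a minimal sketch; the small fc network and batch size are illustrative):
.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.fc(input=x, size=1)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # feed provides the input batch; fetch_list names what to retrieve;
    # return_numpy=True converts the fetched tensor to a numpy array
    out, = exe.run(fluid.default_main_program(),
                   feed={'x': np.random.random((8, 13)).astype('float32')},
                   fetch_list=[y],
                   return_numpy=True,
                   use_program_cache=True)
    print(out.shape)  # (8, 1)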
......
......@@ -13,8 +13,6 @@ save_persistables
- **executor** (Executor) – the executor that saves the variables
- **dirname** (str) – directory path
- **main_program** (Program|None) – the Program whose variables are to be saved. If None, default_main_Program is used. Default: None
- **predicate** (function|None) – if not None and main_program is specified, only the variables in main_program for which predicate(variable)==True are saved
- **vars** (list[Variable]|None) – the list of all variables to save; takes priority over main_program. Default: None
- **filename** (str|None) – the file in which to save the variables. To save variables in separate files, set filename=None. Default: None
Returns: None
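**Code example** (a minimal sketch; the directory path is illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    exe = fluid.Executor(fluid.CPUPlace())
    param_path = "./my_paddle_model"
    prog = fluid.default_main_program()
    # Save all persistable variables of prog into param_path
    fluid.io.save_persistables(executor=exe, dirname=param_path, main_program=prog)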
......
......@@ -28,7 +28,6 @@ fluid.layers
layers_cn/atan_cn.rst
layers_cn/auc_cn.rst
layers_cn/autoincreased_step_counter_cn.rst
layers_cn/batch_cn.rst
layers_cn/batch_norm_cn.rst
layers_cn/beam_search_cn.rst
layers_cn/beam_search_decode_cn.rst
......@@ -41,6 +40,7 @@ fluid.layers
layers_cn/brelu_cn.rst
layers_cn/cast_cn.rst
layers_cn/ceil_cn.rst
layers_cn/center_loss_cn.rst
layers_cn/chunk_eval_cn.rst
layers_cn/clip_by_norm_cn.rst
layers_cn/clip_cn.rst
......@@ -96,9 +96,11 @@ fluid.layers
layers_cn/exp_cn.rst
layers_cn/expand_cn.rst
layers_cn/exponential_decay_cn.rst
layers_cn/eye_cn.rst
layers_cn/fc_cn.rst
layers_cn/fill_constant_batch_size_like_cn.rst
layers_cn/fill_constant_cn.rst
layers_cn/filter_by_instag_cn.rst
layers_cn/flatten_cn.rst
layers_cn/floor_cn.rst
layers_cn/fsp_matrix_cn.rst
......@@ -116,6 +118,7 @@ fluid.layers
layers_cn/gru_unit_cn.rst
layers_cn/hard_shrink_cn.rst
layers_cn/hard_sigmoid_cn.rst
layers_cn/hard_swish_cn.rst
layers_cn/has_inf_cn.rst
layers_cn/has_nan_cn.rst
layers_cn/hash_cn.rst
......@@ -141,6 +144,7 @@ fluid.layers
layers_cn/linear_lr_warmup_cn.rst
layers_cn/linspace_cn.rst
layers_cn/load_cn.rst
layers_cn/lod_append_cn.rst
layers_cn/lod_reset_cn.rst
layers_cn/log_cn.rst
layers_cn/log_loss_cn.rst
......@@ -153,6 +157,7 @@ fluid.layers
layers_cn/lstm_cn.rst
layers_cn/lstm_unit_cn.rst
layers_cn/margin_rank_loss_cn.rst
layers_cn/match_matrix_tensor_cn.rst
layers_cn/matmul_cn.rst
layers_cn/maxout_cn.rst
layers_cn/mean_cn.rst
......@@ -165,11 +170,12 @@ fluid.layers
layers_cn/natural_exp_decay_cn.rst
layers_cn/nce_cn.rst
layers_cn/noam_decay_cn.rst
layers_cn/Normal_cn.rst
layers_cn/not_equal_cn.rst
layers_cn/npair_loss_cn.rst
layers_cn/one_hot_cn.rst
layers_cn/ones_cn.rst
layers_cn/open_files_cn.rst
layers_cn/ones_like_cn.rst
layers_cn/pad_cn.rst
layers_cn/pad_constant_like_cn.rst
layers_cn/pad2d_cn.rst
......@@ -181,14 +187,12 @@ fluid.layers
layers_cn/pool3d_cn.rst
layers_cn/pow_cn.rst
layers_cn/prelu_cn.rst
layers_cn/Preprocessor_cn.rst
layers_cn/Print_cn.rst
layers_cn/prior_box_cn.rst
layers_cn/psroi_pool_cn.rst
layers_cn/py_func_cn.rst
layers_cn/py_reader_cn.rst
layers_cn/random_crop_cn.rst
layers_cn/random_data_generator_cn.rst
layers_cn/range_cn.rst
layers_cn/rank_cn.rst
layers_cn/rank_loss_cn.rst
......@@ -207,6 +211,7 @@ fluid.layers
layers_cn/reshape_cn.rst
layers_cn/resize_bilinear_cn.rst
layers_cn/resize_nearest_cn.rst
layers_cn/resize_trilinear_cn.rst
layers_cn/retinanet_detection_output_cn.rst
layers_cn/retinanet_target_assign_cn.rst
layers_cn/reverse_cn.rst
......@@ -239,6 +244,7 @@ fluid.layers
layers_cn/sequence_softmax_cn.rst
layers_cn/sequence_unpad_cn.rst
layers_cn/shape_cn.rst
layers_cn/shard_index_cn.rst
layers_cn/shuffle_channel_cn.rst
layers_cn/shuffle_cn.rst
layers_cn/sigmoid_cn.rst
......@@ -280,10 +286,15 @@ fluid.layers
layers_cn/topk_cn.rst
layers_cn/transpose_cn.rst
layers_cn/tree_conv_cn.rst
layers_cn/unfold_cn.rst
layers_cn/Uniform_cn.rst
layers_cn/uniform_random_cn.rst
layers_cn/uniform_random_batch_size_like_cn.rst
layers_cn/unique_cn.rst
layers_cn/unique_with_counts_cn.rst
layers_cn/unsqueeze_cn.rst
layers_cn/unstack_cn.rst
layers_cn/var_conv_2d_cn.rst
layers_cn/warpctc_cn.rst
layers_cn/where_cn.rst
layers_cn/While_cn.rst
......
.. _cn_api_fluid_layers_Preprocessor:
Preprocessor
-------------------------------
.. py:class:: paddle.fluid.layers.Preprocessor(reader, name=None)
A block for preprocessing data in a reader variable.
Parameters:
- **reader** (Variable) - the reader variable
- **name** (str, default None) - name of the reader
**Code example**:
.. code-block:: python
import paddle.fluid as fluid
reader = fluid.layers.io.open_files(
filenames=['./data1.recordio', './data2.recordio'],
shapes=[(3, 224, 224), (1, )],
lod_levels=[0, 0],
dtypes=['float32', 'int64'])
preprocessor = fluid.layers.io.Preprocessor(reader=reader)
with preprocessor.block():
img, lbl = preprocessor.inputs()
img_out = img / 2
lbl_out = lbl + 1
preprocessor.outputs(img_out, lbl_out)
data_file = fluid.layers.io.double_buffer(preprocessor())
......@@ -35,15 +35,27 @@ Print
import paddle.fluid as fluid
input = fluid.layers.data(name="input", shape=[4, 32, 32], dtype="float32")
input = fluid.layers.Print(input, message = "The content of input layer:")
# value = some_layer(...)
# Print(value, summarize=10,
# message="The content of some_layer: ")
input = fluid.layers.fill_constant(shape=[10,2], value=3, dtype='int64')
input = fluid.layers.Print(input, message="The content of input layer:")
main_program = fluid.default_main_program()
exe = fluid.Executor(fluid.CPUPlace())
exe.run(main_program)
**Output**:
.. code-block:: bash
1564546375 The content of input layer: place:CPUPlace
Tensor[fill_constant_0.tmp_0]
shape: [10,2,]
dtype: x
data: 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,
# The type of the runtime information may differ across environments.
# For example:
# if Tensor y has dtype='int64', the corresponding C++ type is int64_t,
# and on MacOS with gcc 4.8.2 the printed dtype is "x" ("x" is typeid(int64_t).name()).
......
......@@ -19,7 +19,7 @@ anchor_generator
- **name** (str) - name of the prior-box operator. Default: None
Returns:
- Anchors(Variable): the output anchors with layout [H,W,num_anchors,4], where ``H`` is the input height, ``W`` is the input width and ``num_anchors`` is the number of boxes per input position; each anchor is in the (unnormalized) format (xmin,ymin,xmax,ymax)
- Variances(Variable): the expanded variances of the anchors with layout [H,W,num_priors,4], where ``H`` is the input height, ``W`` is the input width and ``num_priors`` is the number of boxes per input position; each variance is in the format (xcenter,ycenter,w,h).
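**Code example** (a minimal sketch; the feature-map shape and anchor settings are illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    conv1 = fluid.layers.data(name='conv1', shape=[48, 16, 16], dtype='float32')
    anchor, var = fluid.layers.anchor_generator(
        input=conv1,
        anchor_sizes=[64, 128, 256, 512],
        aspect_ratios=[0.5, 1.0, 2.0],
        variance=[0.1, 0.1, 0.2, 0.2],
        stride=[16.0, 16.0],
        offset=0.5)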
......
.. _cn_api_fluid_layers_batch:
batch
-------------------------------
.. py:function:: paddle.fluid.layers.batch(reader, batch_size)
This layer is a reader decorator. It takes a reader variable and adds ``batching`` decoration to it; when the decorated reader is read, its output data are automatically organized into batches.
Parameters:
- **reader** (Variable) - the reader variable to be decorated with ``batching``
- **batch_size** (int) - the batch size
Returns: the reader variable decorated with ``batching``
Return type: Variable
**Code example**:
.. code-block:: python
import paddle.fluid as fluid
raw_reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
'./data2.recordio'],
shapes=[(3,224,224), (1,)],
lod_levels=[0, 0],
dtypes=['float32', 'int64'],
thread_num=2,
buffer_size=2)
batch_reader = fluid.layers.batch(reader=raw_reader, batch_size=5)
# If you read data with raw_reader:
# data = fluid.layers.read_file(raw_reader)
# you only get a single data instance.
#
# But if you read data with batch_reader:
# data = fluid.layers.read_file(batch_reader)
# every 5 adjacent instances are automatically concatenated into one batch,
# so get('data') yields a batch of data rather than a single instance.
......@@ -22,7 +22,7 @@ beam_search
Parameters:
- **pre_ids** (Variable) - a LodTensor variable, the output of ``beam_search`` at the previous step. In the first step it should be a LodTensor with shape :math:`(batch\_size,1)` and :math:`lod [[0,1,...,batch\_size],[0,1,...,batch\_size]]`
- **pre_scores** (Variable) - a LodTensor variable, the output of beam_search at the previous step
- **ids** (Variable) - a LodTensor variable containing the candidate IDs, with shape :math:`(batch\_size×beam\_size,K)`, where ``K`` should be ``beam_size``
- **scores** (Variable) - a LodTensor variable of the accumulated scores corresponding to ``ids``, with the same shape as ``ids``.
- **beam_size** (int) - the beam width used in beam search.
- **end_id** (int) - the id of the end token.
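**Code example** (a minimal sketch of the usual usage; the vocabulary size and the log-probability accumulation are illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    beam_size = 4
    end_id = 1
    pre_ids = fluid.layers.data(name='pre_id', shape=[1], lod_level=2, dtype='int64')
    pre_scores = fluid.layers.data(name='pre_scores', shape=[1], lod_level=2, dtype='float32')
    probs = fluid.layers.data(name='probs', shape=[10000], dtype='float32')
    # Take the top beam_size candidates and accumulate their log probabilities
    topk_scores, topk_indices = fluid.layers.topk(probs, k=beam_size)
    accu_scores = fluid.layers.elementwise_add(
        x=fluid.layers.log(x=topk_scores),
        y=fluid.layers.reshape(pre_scores, shape=[-1]),
        axis=0)
    selected_ids, selected_scores = fluid.layers.beam_search(
        pre_ids=pre_ids,
        pre_scores=pre_scores,
        ids=topk_indices,
        scores=accu_scores,
        beam_size=beam_size,
        end_id=end_id)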
......
......@@ -65,7 +65,7 @@ conv2d_transpose
- **filter_size** (int|tuple|None) - the filter size. If filter_size is a tuple, it takes the form (filter_size_H, filter_size_W); otherwise the filter is square. If filter_size=None, the output size is computed internally.
- **padding** (int|tuple) - the padding size. If padding is a tuple, it must contain two integers (padding_H, padding_W); otherwise padding_H = padding_W = padding. Default: padding = 0.
- **stride** (int|tuple) - the stride size. If stride is a tuple, it takes the form (stride_H, stride_W); otherwise stride_H = stride_W = stride. Default: stride = 1.
- **dilation** (int|tuple) - the dilation size. If dilation is a tuple, it takes the form (dilation_H, dilation_W); otherwise dilation_H = dilation_W = dilation. Default: dilation = 1.
- **groups** (int) - the number of groups of the Conv2d transpose layer. Inspired by grouped convolution in Alex Krizhevsky's Deep CNN paper: with groups=2, the first half of the filters connect only to the first half of the input channels, and the second half of the filters connect only to the second half. Default: groups = 1.
- **param_attr** (ParamAttr|None) - the attribute of the learnable parameters/weights of conv2d_transpose. If param_attr is None or an attribute of ParamAttr, conv2d_transpose creates a ParamAttr as param_attr. If no initializer is set in param_attr, Xavier initialization is used. Default: None.
- **bias_attr** (ParamAttr|bool|None) - the bias attribute of conv2d_transpose. If set to False, no bias is added to the output units. If bias_attr is None or an attribute of ParamAttr, conv2d_transpose creates a ParamAttr as bias_attr. If no initializer is set in bias_attr, the bias is initialized to zero. Default: None.
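**Code example** (a minimal sketch; the shapes are illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    data = fluid.layers.data(name='data', shape=[3, 32, 32], dtype='float32')
    # Transposed convolution with 2 output filters and a 3x3 kernel
    conv2d_transpose = fluid.layers.conv2d_transpose(input=data, num_filters=2, filter_size=3)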
......
......@@ -10,11 +10,11 @@ ELU Activation Layer (ELU Activation Operator)
Applies the following computation to each element of the input tensor, following https://arxiv.org/abs/1511.07289.
.. math::
\\out=max(0,x)+min(0,α∗(e^{x}−1))\\
Parameters:
- x (Variable) - input of the ELU operator
- alpha (float|1.0) - the alpha value of ELU
- name (str|None) - name of this layer (optional). If set to None, the layer is named automatically.
Returns: the output of the ELU operator
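**Code example** (a minimal sketch; the shape and alpha value are illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name="x", shape=[3, 10, 32, 32], dtype="float32")
    y = fluid.layers.elu(x, alpha=0.2)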
......
......@@ -33,8 +33,8 @@ The inputs of the lstm unit include :math:`x_{t}` , :math:`h_{t-1}` and :math:`c_{t-1}`
Parameters:
- **x_t** (Variable) - input value of the current step, a 2-D tensor with shape M x N, where M is the batch size and N is the input size
- **hidden_t_prev** (Variable) - hidden state of the lstm unit, a 2-D tensor with shape M x S, where M is the batch size and S is the size of the lstm unit
- **cell_t_prev** (Variable) - cell value of the lstm unit, a 2-D tensor with shape M x S, where M is the batch size and S is the size of the lstm unit
- **forget_bias** (Variable) - forget bias of the lstm unit
- **param_attr** (ParamAttr|None) - parameter attribute of the learnable hidden-hidden weights. If set to None or an attribute of ``ParamAttr``, lstm_unit creates a ``ParamAttr`` as param_attr. If the initializer of param_attr is not set, the parameters are initialized with Xavier. Default: None
- **bias_attr** (ParamAttr|None) - bias attribute of the learnable bias weights. If set to False, no bias is added to the output units. If set to None or an attribute of ``ParamAttr``, lstm_unit creates a ``ParamAttr`` as bias_attr. If the initializer of bias_attr is not set, the bias is initialized to 0. Default: None
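**Code example** (a minimal sketch; the dimensions are illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    dict_dim, emb_dim, hidden_dim = 128, 64, 512
    data = fluid.layers.data(name='step_data', shape=[1], dtype='int64')
    x = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim])
    pre_hidden = fluid.layers.data(name='pre_hidden', shape=[hidden_dim], dtype='float32')
    pre_cell = fluid.layers.data(name='pre_cell', shape=[hidden_dim], dtype='float32')
    # lstm_unit returns the new hidden state and cell value
    hidden, cell = fluid.layers.lstm_unit(x_t=x, hidden_t_prev=pre_hidden, cell_t_prev=pre_cell)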
......
.. _cn_api_fluid_layers_open_files:
open_files
-------------------------------
.. py:function:: paddle.fluid.layers.open_files(filenames, shapes, lod_levels, dtypes, thread_num=None, buffer_size=None, pass_num=1, is_test=None)
Open files.
This function takes a list of files to read and returns a Reader variable, through which data can be obtained from the given files. All files must have a name suffix indicating their format, e.g. ``*.recordio``.
Parameters:
- **filenames** (list) - a list of file names
- **shapes** (list) - a list of tuples declaring the data shapes
- **lod_levels** (list) - a list of ints declaring the lod levels of the data
- **dtypes** (list) - a list of strings declaring the data types
- **thread_num** (None) - the number of threads used to read the files. Default: min(len(filenames), cpu_number)
- **buffer_size** (None) - the buffer size of the reader. Default: 3 * thread_num
- **pass_num** (int) - the number of passes to run
- **is_test** (bool|None) - whether open_files is used for testing. If used for testing, the order of the generated data matches the file order; otherwise the data order is not guaranteed to be consistent across epochs
Returns: a Reader variable through which the file data can be obtained
Return type: Variable
**Code example**:
.. code-block:: python
import paddle.fluid as fluid
reader = fluid.layers.io.open_files(filenames=['./data1.recordio',
'./data2.recordio'],
shapes=[(3,224,224), (1,)],
lod_levels=[0, 0],
dtypes=['float32', 'int64'])
# Through the reader, data can be obtained with the 'read_file' layer:
image, label = fluid.layers.io.read_file(reader)
.. _cn_api_fluid_layers_random_data_generator:
random_data_generator
-------------------------------
.. py:function:: paddle.fluid.layers.random_data_generator(low, high, shapes, lod_levels, for_parallel=True)
Creates a uniformly distributed random data generator.
This layer returns a Reader variable. Instead of opening files and reading data from them, the Reader variable generates float-type uniformly distributed random numbers by itself. It can serve as a dummy reader for testing a network without opening real files.
Parameters:
- **low** (float) - the lower bound of the uniform data distribution
- **high** (float) - the upper bound of the uniform data distribution
- **shapes** (list) - a list of tuples declaring the data shapes
- **lod_levels** (list) - a list of ints declaring the lod levels of the data
- **for_parallel** (Bool) - set to True if the reader is to be used in a chain of operators
Returns: a Reader variable from which random data can be obtained
Return type: Variable
**Code example**:
.. code-block:: python
import paddle.fluid as fluid
reader = fluid.layers.random_data_generator(
low=0.0,
high=1.0,
shapes=[[3,224,224], [1]],
lod_levels=[0, 0])
# Through the reader, data can be obtained with the 'read_file' layer:
image, label = fluid.layers.read_file(reader)
.. _cn_api_fluid_layers_shard_index:
shard_index
-------------------------------
.. py:function:: paddle.fluid.layers.shard_index(input, index_num, nshards, shard_id, ignore_value=-1)
This layer creates sharded indices for the input, and is typically used in hybrid model- and data-parallel training. The index data (usually labels) should be recomputed in each trainer via
::
assert index_num % nshards == 0
shard_size = index_num / nshards
if x / shard_size == shard_id:
    y = x % shard_size
else:
    y = ignore_value
We use a distributed ``one-hot`` representation to show how this layer is used. A distributed ``one-hot`` representation is split into multiple shards, and entries of a shard that are not ones are padded with zeros. To create the sharded representation in each trainer, the original indices must be recomputed (i.e. sharded) first. Consider an example:
.. code-block:: text
X is an integer tensor
X.shape = [4, 1]
X.data = [[1], [6], [12], [19]]
Suppose index_num = 20 and nshards = 2; then shard_size = 10
If shard_id == 0, the output is:
Out.shape = [4, 1]
Out.data = [[1], [6], [-1], [-1]]
If shard_id == 1, the output is:
Out.shape = [4, 1]
Out.data = [[-1], [-1], [2], [9]]
The example above assumes the default ignore_value = -1
Parameters:
- **input** (Variable) - the input index; its last dimension should be 1
- **index_num** (scalar) - an integer defining the length of the index range
- **nshards** (scalar) - the number of shards
- **shard_id** (scalar) - the index of the current shard
- **ignore_value** (scalar) - the integer value for indices that fall outside the shard's range
Returns: the sharded index of the input
Return type: Variable
**Code example:**
.. code-block:: python
import paddle.fluid as fluid
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
shard_label = fluid.layers.shard_index(input=label,
index_num=20,
nshards=2,
shard_id=0)
......@@ -55,14 +55,14 @@ API Reference: see :ref:`cn_api_fluid_layers_floor`
sin
------------------
Computes the sine of each element of the input :code:`Tensor`.
API Reference: see :ref:`cn_api_fluid_layers_sin`
cos
------------------
Computes the cosine of each element of the input :code:`Tensor`.
API Reference: see :ref:`cn_api_fluid_layers_cos`
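A minimal sketch of both layers (the input shape is illustrative):
.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name="x", shape=[32], dtype="float32")
    y_sin = fluid.layers.sin(x)  # elementwise sine
    y_cos = fluid.layers.cos(x)  # elementwise cosine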
......
......@@ -2,7 +2,7 @@
## Environment preparation
* *Windows 7/8/10 Pro/Enterprise (64bit) (the GPU version supports CUDA 8.0/9.0/10.0, single GPU only)*
* *Python 2.7/3.5.1+/3.6/3.7 (64 bit)*
* *pip or pip3 9.0.1+ (64 bit)*
* *Visual Studio 2015 Update3*
......@@ -12,7 +12,7 @@
* If your computer does not have an NVIDIA® GPU, please compile the CPU version of PaddlePaddle
* If your computer has an NVIDIA® GPU and satisfies the following requirements, compiling the GPU version of PaddlePaddle is recommended
* *CUDA Toolkit 8.0 with cuDNN v7.1+, or 9.0/10.0 with cuDNN v7.3+*
* *Hardware with GPU compute capability over 1.0*
## Installation steps
......
***
# **Compile on Windows from Source Code**
This instruction will show you how to compile PaddlePaddle on a *64-bit desktop or laptop* and Windows 10. The Windows systems we support must meet the following requirements:
......@@ -60,7 +59,6 @@ Please note: The current version does not support NCCL and distributed related f
6. Execute cmake:
> For details on the compilation options, see [the compilation options list](../Tables.html/#Compile).
* For users who need to compile **the CPU version PaddlePaddle**:
For Python2: `cmake .. -G "Visual Studio 14 2015 Win64" -DPYTHON_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS}
......
......@@ -2,7 +2,7 @@
## Environment preparation
* *Windows 7/8/10 Pro/Enterprise (64bit) (the GPU version supports CUDA 8.0/9.0/10.0, single GPU only)*
* *Python 2.7.15+/3.5.1+/3.6/3.7 (64 bit)*
* *pip or pip3 9.0.1+ (64 bit)*
......@@ -60,10 +60,10 @@
* If your computer does not have an NVIDIA® GPU, please install the CPU version of PaddlePaddle
* If your computer has an NVIDIA® GPU and satisfies the following requirements, installing the GPU version of PaddlePaddle is recommended
* *CUDA Toolkit 8.0 with cuDNN v7.1+, or 9.0/10.0 with cuDNN v7.3+*
* *Hardware with GPU compute capability over 1.0*
Note: the official Windows packages currently include only the single-GPU mode for CUDA 8.0/9.0/10.0 and do not include CUDA 9.1/9.2/10.1; if you need those versions, please compile PaddlePaddle from source.
Please refer to the official NVIDIA documents for the installation process and configuration of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/)
......@@ -91,7 +91,7 @@ There are 3 installation methods on Windows:
* For python2.7, the `python` command is recommended; for python3.x, the `python3` command is recommended
* `python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple` This command installs PaddlePaddle supporting CUDA 8.0 (with cuDNN v7.1+) or CUDA 9.0/10.0 (with cuDNN v7.3+). If you need a different CUDA or cuDNN version, install a specific release with `python -m pip install paddlepaddle-gpu==[version] -i https://pypi.tuna.tsinghua.edu.cn/simple` or `python3 -m pip install paddlepaddle-gpu==[version] -i https://pypi.tuna.tsinghua.edu.cn/simple`; the available versions are listed [here](https://pypi.org/project/paddlepaddle-gpu/#history), and the mapping between paddlepaddle and CUDA/cuDNN versions is in the [installation package list](./Tables.html/#whls)
<a name="check"></a>
......
......@@ -2,7 +2,7 @@
## Operating Environment
* *Windows 7/8/10 Pro/Enterprise(64bit)(CUDA 8.0/9.0/10.0 are supported, and only single GPU is supported)*
* *Python 2.7.15+/3.5.1+/3.6/3.7(64bit)*
* *pip or pip3 9.0.1+(64bit)*
......@@ -16,10 +16,10 @@
* If your computer doesn’t have NVIDIA® GPU, please install the CPU version of PaddlePaddle
* If your computer has NVIDIA® GPU, and it satisfies the following requirements, we recommend you to install the GPU version of PaddlePaddle
* *CUDA Toolkit 8.0 with cuDNN v7.1+, or 9.0/10.0 with cuDNN v7.3+*
* *GPU's computing capability exceeds 1.0*
Note: currently, the official Windows installation packages only support CUDA 8.0/9.0/10.0 with a single GPU, and do not support CUDA 9.1/9.2/10.1. If you need those versions, please compile PaddlePaddle from source.
Please refer to the NVIDIA official documents for the installation process and the configuration methods of [CUDA](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and [cuDNN](https://docs.nvidia.com/deeplearning/sdk/cudnn-install/).
......@@ -43,7 +43,7 @@ There is a checking function below for [verifying whether the installation is succ
Notice:
* The version of pip should correspond to the version of python: python2.7 corresponds to `pip`; python3.x corresponds to `pip3`.
* `pip install paddlepaddle-gpu` This command will install PaddlePaddle that supports CUDA 8.0 (with cuDNN v7.1+) or CUDA 9.0/10.0 (with cuDNN v7.3+).
<a name="check"></a>
## Installation Verification
......
......@@ -43,7 +43,6 @@ import paddle.fluid as fluid
y = fluid.layers.fc(input=x, size=128, bias_attr=True)
```
**2. Input and output Tensors**
The input data of the whole neural network is also a special Tensor, in which the sizes of some dimensions cannot be determined when the model is defined (these usually include the batch size, and the width and height of images when they vary between mini-batches); placeholders are needed for such dimensions when defining the model.
......@@ -97,7 +96,30 @@ type {
persistable: false
```
The concrete output values are produced when the Executor runs. There are two ways to obtain Variable values at runtime: the first is to create a print op with `paddle.fluid.layers.Print`, which prints the tensor being accessed; the second is to add the Variable to the fetch_list.
The code for the first way is as follows:
```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
data = fluid.layers.Print(data, message="Print data:")
```
The output at runtime:
```
1563874307 Print data: The place is:CPUPlace
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: x
data: 0,
```
For more on the Print API, see [the Print operator](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/api_cn/layers_cn/control_flow_cn.html#print)
The second way, the fetch_list, is described in detail later; a minimal sketch follows.
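A minimal sketch of the fetch_list way, reusing the constant above (the Executor setup is illustrative):
```python
import paddle.fluid as fluid

data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# Add the Variable to fetch_list to retrieve its runtime value
result = exe.run(fluid.default_main_program(), fetch_list=[data])
print(result)  # [array([0])]
```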
## Feed data
......
......@@ -101,7 +101,29 @@ type {
persistable: false
```
Specific output values are produced at the runtime of the Executor. There are two ways to get runtime Variable values: the first is to use `paddle.fluid.layers.Print` to create a print op that prints the tensor being accessed; the second is to add the Variable to the fetch_list.
Code of the first way is as follows:
```python
import paddle.fluid as fluid
data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
data = fluid.layers.Print(data, message="Print data: ")
```
Output at the runtime of Executor:
```
1563874307 Print data: The place is:CPUPlace
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: x
data: 0,
```
For more information on how to use the Print API, please refer to [Print operator](https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/control_flow.html#print).
The detailed process of the second way, the fetch_list, will be explained later; a minimal sketch follows.
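A minimal sketch of the fetch_list way, reusing the constant above (the Executor setup is illustrative):
```python
import paddle.fluid as fluid

data = fluid.layers.fill_constant(shape=[1], value=0, dtype='int64')
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# Add the Variable to fetch_list to retrieve its runtime value
result = exe.run(fluid.default_main_program(), fetch_list=[data])
print(result)  # [array([0])]
```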
## Feed data
......
......@@ -3,7 +3,7 @@ cudnn
==================
FLAGS_conv_workspace_size_limit
*******************************************
(since 0.13.0)
......@@ -18,7 +18,7 @@ Uint64. The default value is 4096, i.e. a 4GB memory workspace.
FLAGS_conv_workspace_size_limit=1024 - sets the workspace limit size for choosing cuDNN convolution algorithms to 1024MB.
FLAGS_cudnn_batchnorm_spatial_persistent
*******************************************
(since 1.4.0)
......@@ -37,7 +37,7 @@ FLAGS_cudnn_batchnorm_spatial_persistent=True - enables the CUDNN_BATCHNORM_SPATIAL_PE
This mode can be faster in some tasks because an optimized path is selected for the CUDNN_DATA_FLOAT and CUDNN_DATA_HALF data types. The reason it defaults to False is that this mode may use scaled atomic integer reduction, which may cause numerical overflow for some ranges of input data.
FLAGS_cudnn_deterministic
*******************************************
(since 0.13.0)
......@@ -56,7 +56,7 @@ FLAGS_cudnn_deterministic=True - selects deterministic functions in cuDNN.
This flag is currently enabled in the cuDNN convolution and pooling operators. Deterministic algorithms may be slower, so the flag is generally used for debugging.
FLAGS_cudnn_exhaustive_search
*******************************************
(since 1.2.0)
......
......@@ -3,7 +3,7 @@ cudnn
==================
FLAGS_conv_workspace_size_limit
*******************************************
(since 0.13.0)
......@@ -18,7 +18,7 @@ Example
FLAGS_conv_workspace_size_limit=1024 set the workspace limit size for choosing cuDNN convolution algorithms to 1024MB.
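These flags are read from the environment when PaddlePaddle initializes; a minimal sketch of setting one, assuming the usual environment-variable mechanism:
.. code-block:: python

    import os

    # Must be set before paddle.fluid is first imported
    os.environ['FLAGS_conv_workspace_size_limit'] = '1024'
    import paddle.fluid as fluid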
FLAGS_cudnn_batchnorm_spatial_persistent
*******************************************
(since 1.4.0)
......@@ -37,7 +37,7 @@ Note
This mode can be faster in some tasks because an optimized path will be selected for CUDNN_DATA_FLOAT and CUDNN_DATA_HALF data types. The reason we set it to False by default is that this mode may use scaled atomic integer reduction which may cause a numerical overflow for some input data range.
FLAGS_cudnn_deterministic
*******************************************
(since 0.13.0)
......@@ -56,7 +56,7 @@ Note
Now this flag is enabled in the cuDNN convolution and pooling operators. The deterministic algorithms may be slower, so this flag is generally used for debugging.
FLAGS_cudnn_exhaustive_search
*******************************************
(since 1.2.0)
......
......@@ -3,7 +3,7 @@
==================
FLAGS_enable_cublas_tensor_op_math
*******************************************
(since 1.2.0)
......@@ -15,10 +15,10 @@ Bool. The default value is False.
Example
-------
FLAGS_enable_cublas_tensor_op_math=True - uses Tensor Cores.
FLAGS_use_mkldnn
*******************************************
(since 0.13.0)
......
......@@ -2,7 +2,7 @@
data processing
==================
FLAGS_enable_cublas_tensor_op_math
*******************************************
(since 1.2.0)
......@@ -14,10 +14,10 @@ Bool. The default value is False.
Example
-------
FLAGS_enable_cublas_tensor_op_math=True will use Tensor Core.
FLAGS_use_mkldnn
*******************************************
(since 0.13.0)
......
......@@ -3,7 +3,7 @@
==================
FLAGS_check_nan_inf
********************
(since 0.13.0)
......@@ -18,7 +18,7 @@ Bool. The default value is False.
FLAGS_check_nan_inf=True - checks whether an Operator's result contains Nan or Inf.
FLAGS_cpu_deterministic
*******************************************
(since 0.15.0)
......@@ -33,7 +33,7 @@ Bool. The default value is False.
FLAGS_cpu_deterministic=True - makes the computation result deterministic on the CPU side.
FLAGS_enable_rpc_profiler
*******************************************
(since 1.0.0)
......@@ -48,7 +48,7 @@ Bool. The default value is False.
FLAGS_enable_rpc_profiler=True - enables the RPC profiler and records the timeline to the profiler file.
FLAGS_multiple_of_cupti_buffer_size
*******************************************
(since 1.4.0)
......@@ -63,7 +63,7 @@ Int32. The default value is 1.
FLAGS_multiple_of_cupti_buffer_size=1 - sets the multiple of the CUPTI device buffer size to 1.
FLAGS_reader_queue_speed_test_mode
*******************************************
(since 1.1.0)
......
......@@ -2,7 +2,7 @@
debug
==================
FLAGS_check_nan_inf
**************************************
(since 0.13.0)
......@@ -17,7 +17,7 @@ Example
FLAGS_check_nan_inf=True will check the result of Operator whether the result has Nan or Inf.
FLAGS_cpu_deterministic
*******************************************
(since 0.15.0)
......@@ -32,7 +32,7 @@ Example
FLAGS_cpu_deterministic=True will make the result of computation deterministic in CPU side.
FLAGS_enable_rpc_profiler
*******************************************
(Since 1.0.0)
......@@ -47,7 +47,7 @@ Example
FLAGS_enable_rpc_profiler=True will enable rpc profiler and record the timeline to profiler file.
FLAGS_multiple_of_cupti_buffer_size
*******************************************
(since 1.4.0)
......@@ -62,7 +62,7 @@ Example
FLAGS_multiple_of_cupti_buffer_size=1 set the multiple of the CUPTI device buffer size to 1.
FLAGS_reader_queue_speed_test_mode
*******************************************
(since 1.1.0)
......
......@@ -3,7 +3,7 @@
==================
FLAGS_paddle_num_threads
*******************************************
(since 0.15.0)
......@@ -18,7 +18,7 @@ Int32. The default value is 1.
FLAGS_paddle_num_threads=2 - sets the maximum number of threads per instance to 2.
FLAGS_selected_gpus
*******************************************
(since 1.3)
......
......@@ -3,7 +3,7 @@ device management
==================
FLAGS_paddle_num_threads
*******************************************
(since 0.15.0)
......@@ -18,7 +18,7 @@ Example
FLAGS_paddle_num_threads=2 will enable 2 threads as max number of threads for each instance.
FLAGS_selected_gpus
*******************************************
(since 1.3)
......
......@@ -3,7 +3,7 @@
==================
FLAGS_communicator_fake_rpc
**********************
(since 1.5.0)
......@@ -22,7 +22,7 @@ FLAGS_communicator_fake_rpc=True - enables the communicator fake mode.
This flag is only for paddlepaddle developers; ordinary users should not set it.
FLAGS_communicator_independent_recv_thread
**************************************
(since 1.5.0)
......@@ -41,7 +41,7 @@ FLAGS_communicator_independent_recv_thread=True - uses an independent thread to receive parameters from the parameter
Developers use this flag to debug and optimize the framework; ordinary users should not set it.
FLAGS_communicator_max_merge_var_num
**************************************
(since 1.5.0)
......@@ -60,7 +60,7 @@ FLAGS_communicator_max_merge_var_num=16 - sets to 16 the number of gradients to be merged by the communicator into one
This flag is closely related to the number of trainer threads; its default value should equal the thread count.
FLAGS_communicator_merge_sparse_grad
*******************************************
(since 1.5.0)
......@@ -79,11 +79,11 @@ FLAGS_communicator_merge_sparse_grad=true - merges sparse gradients.
Merging sparse gradients takes time. If there are many duplicated IDs, it reduces memory usage and speeds up communication; if there are few duplicated IDs, it does not save memory.
FLAGS_communicator_min_send_grad_num_before_recv
*******************************************
(since 1.5.0)
In the communicator there is one send thread that sends gradients to the parameter server and one receive thread that receives parameters from it, and they work independently. This flag controls the frequency of the receive thread: only after the send thread has sent at least FLAGS_communicator_min_send_grad_num_before_recv gradients will the receive thread receive parameters from the parameter server.
Values accepted
---------------
......@@ -98,7 +98,7 @@ FLAGS_communicator_min_send_grad_num_before_recv=10 - before the receive thread receives parameters from the parameter
Because this flag is strongly related to the number of training threads of the trainer, and every training thread sends its gradients, the default value should equal the thread count.
FLAGS_communicator_send_queue_size
*******************************************
(since 1.5.0)
......@@ -117,7 +117,7 @@ FLAGS_communicator_send_queue_size=10 - sets the queue size of each gradient to 10
This flag affects the training speed: a larger queue size may make training faster but can worsen the result.
FLAGS_communicator_send_wait_times
*******************************************
(since 1.5.0)
......@@ -132,7 +132,7 @@ Int32. The default value is 5.
FLAGS_communicator_send_wait_times=5 - sets to 5 the number of times the send thread waits when the merge count has not reached max_merge_var_num.
FLAGS_communicator_thread_pool_size
*******************************************
(since 1.5.0)
......@@ -151,7 +151,7 @@ FLAGS_communicator_thread_pool_size=10 - sets the thread pool size to 10.
In most cases users do not need to set this flag.
FLAGS_dist_threadpool_size
*******************************************
(since 1.0.0)
......@@ -166,7 +166,7 @@ Int32. The default value is 0.
FLAGS_dist_threadpool_size=10 - sets the maximum number of threads used by the distributed module to 10.
FLAGS_rpc_deadline
*******************************************
(since 1.0.0)
......@@ -181,11 +181,11 @@ Int32. The default value is 180000, in ms.
FLAGS_rpc_deadline=180000 - sets the deadline timeout to 3 minutes.
FLAGS_rpc_disable_reuse_port
*******************************************
(since 1.2.0)
When FLAGS_rpc_disable_reuse_port is True, grpc's GRPC_ARG_ALLOW_REUSEPORT is set to False to disable SO_REUSEPORT.
Values accepted
---------------
......@@ -196,7 +196,7 @@ Bool. The default value is False.
FLAGS_rpc_disable_reuse_port=True - disables SO_REUSEPORT.
FLAGS_rpc_get_thread_num
*******************************************
(since 1.0.0)
......@@ -211,7 +211,7 @@ Int32. The default value is 12.
FLAGS_rpc_get_thread_num=6 - sets to 6 the number of threads used to get parameters from the parameter server.
FLAGS_rpc_send_thread_num
*******************************************
(since 1.0.0)
......@@ -226,11 +226,11 @@ Int32. The default value is 12.
FLAGS_rpc_send_thread_num=6 - sets the number of threads used for sending to 6.
FLAGS_rpc_server_profile_path
*******************************************
since(v0.15.0)
Sets the profiler output log file path prefix. The complete path is FLAGS_rpc_server_profile_path_listener_id, where listener_id is a random number.
Values accepted
---------------
......
......@@ -2,7 +2,7 @@
distributed
==================
FLAGS_communicator_fake_rpc
**************************************
(since 1.5.0)
......@@ -21,7 +21,7 @@ Note
This flag is only for developer of paddlepaddle, user should not set it.
FLAGS_communicator_independent_recv_thread
**************************************
(since 1.5.0)
......@@ -40,7 +40,7 @@ Note
This flag is for developer to debug and optimize the framework. User should not set it.
FLAGS_communicator_max_merge_var_num
**************************************
(since 1.5.0)
......@@ -59,7 +59,7 @@ Note
This flag has strong relationship with trainer thread num. The default value should be the same with thread num.
FLAGS_communicator_merge_sparse_grad
*******************************
(since 1.5.0)
......@@ -78,11 +78,11 @@ Note
Merging sparse gradient would be time-consuming. If the sparse gradient has many duplicated ids, it will save memory and communication could be much faster. Otherwise it will not save memory.
FLAGS_communicator_min_send_grad_num_before_recv
*******************************************
(since 1.5.0)
In the communicator, there is one send thread that sends gradients to the parameter server and one receive thread that receives parameters from the parameter server. They work independently. This flag is used to control the frequency of the receive thread: only when the send thread has sent at least FLAGS_communicator_min_send_grad_num_before_recv gradients will the receive thread receive parameters from the parameter server.
Values accepted
---------------
......@@ -97,7 +97,7 @@ Note
This flag has strong relation with the training threads of trainer. because each training thread will send it's grad. So the default value should be training thread num.
FLAGS_communicator_send_queue_size
*******************************************
(since 1.5.0)
......@@ -116,7 +116,7 @@ Note
This flag will affect the training speed, if the queue size is larger, the speed may be faster, but may make the result worse.
FLAGS_communicator_send_wait_times
*******************************************
(since 1.5.0)
......@@ -131,7 +131,7 @@ Example
FLAGS_communicator_send_wait_times=5 set the times that send thread will wait if merge number does not reach max_merge_var_num to 5.
FLAGS_communicator_thread_pool_size
*******************************************
(since 1.5.0)
......@@ -150,7 +150,7 @@ Note
Most of time user does not need to set this flag.
FLAGS_dist_threadpool_size
*******************************************
(Since 1.0.0)
......@@ -165,7 +165,7 @@ Example
FLAGS_dist_threadpool_size=10 will enable 10 threads as max number of thread used for distributed module.
FLAGS_rpc_deadline
*******************************************
(Since 1.0.0)
......@@ -180,11 +180,11 @@ Example
FLAGS_rpc_deadline=180000 will set deadline timeout to 3 minute.
FLAGS_rpc_disable_reuse_port
*******************************************
(since 1.2.0)
When FLAGS_rpc_disable_reuse_port is true, the flag of grpc GRPC_ARG_ALLOW_REUSEPORT will be set to false to
disable the use of SO_REUSEPORT if it's available.
Values accepted
......@@ -196,7 +196,7 @@ Example
FLAGS_rpc_disable_reuse_port=True will disable the use of SO_REUSEPORT.
FLAGS_rpc_get_thread_num
*******************************************
(Since 1.0.0)
......@@ -211,7 +211,7 @@ Example
FLAGS_rpc_get_thread_num=6 will use 6 threads to get parameter from parameter server.
FLAGS_rpc_send_thread_num
*******************************************
(Since 1.0.0)
......@@ -226,11 +226,11 @@ Example
FLAGS_rpc_send_thread_num=6 will set number thread used for send to 6.
FLAGS_rpc_server_profile_path
*******************************************
since(v0.15.0)
Set the profiler output log file path prefix. The complete path will be FLAGS_rpc_server_profile_path_listener_id, where listener_id is a random number.
Values accepted
---------------
......
......@@ -3,7 +3,7 @@
==================
FLAGS_enable_parallel_graph
*******************************************
(since 1.2.0)
......@@ -18,7 +18,7 @@ Bool. The default value is False.
FLAGS_enable_parallel_graph=False - forcibly disables the parallel graph execution mode of ParallelExecutor.
FLAGS_pe_profile_fname
*******************************************
(since 1.3.0)
......@@ -33,7 +33,7 @@ String. The default value is empty ("").
FLAGS_pe_profile_fname="./parallel_executor.perf" - stores the profiling result in parallel_executor.perf.
FLAGS_print_sub_graph_dir
*******************************************
(since 1.2.0)
......@@ -48,7 +48,7 @@ String. The default value is empty ("").
FLAGS_print_sub_graph_dir="./sub_graphs.txt" - prints the disconnected subgraphs to "./sub_graphs.txt".
FLAGS_use_ngraph
*******************************************
(since 1.4.0)
......
......@@ -3,7 +3,7 @@ executor
==================
FLAGS_enable_parallel_graph
*******************************************
(since 1.2.0)
......@@ -18,7 +18,7 @@ Example
FLAGS_enable_parallel_graph=False will force disable parallel graph execution mode by ParallelExecutor.
FLAGS_pe_profile_fname
*******************************************
(since 1.3.0)
......@@ -33,7 +33,7 @@ Example
FLAGS_pe_profile_fname="./parallel_executor.perf" will store the profile result to parallel_executor.perf.
FLAGS_print_sub_graph_dir
*******************************************
(since 1.2.0)
......@@ -48,7 +48,7 @@ Example
FLAGS_print_sub_graph_dir="./sub_graphs.txt" will print the disconnected subgraphs to "./sub_graphs.txt".
FLAGS_use_ngraph
*******************************************
(since 1.4.0)
......
......@@ -3,7 +3,7 @@
==================
FLAGS_allocator_strategy
********************
(since 1.2)
......@@ -20,7 +20,7 @@ FLAGS_allocator_strategy=legacy - uses the legacy allocator.
FLAGS_allocator_strategy=naive_best_fit - uses the newly designed allocator.
FLAGS_eager_delete_scope
*******************************************
(since 0.12.0)
......@@ -35,7 +35,7 @@ Bool. The default value is True.
FLAGS_eager_delete_scope=True - deletes scopes synchronously.
FLAGS_eager_delete_tensor_gb
*******************************************
(since 1.0.0)
......@@ -58,7 +58,7 @@ FLAGS_eager_delete_tensor_gb=-1.0 - disables the garbage collection strategy.
It is recommended to set FLAGS_eager_delete_tensor_gb=0.0 to enable garbage collection when training large networks.
FLAGS_enable_inplace_whitelist
*******************************************
(since 1.4)
......@@ -73,7 +73,7 @@ Bool. The default value is False.
FLAGS_enable_inplace_whitelist=True - disables in-place memory reuse optimization on specific ops.
FLAGS_fast_eager_deletion_mode
*******************************************
(since 1.3)
......@@ -90,7 +90,7 @@ FLAGS_fast_eager_deletion_mode=True - enables the fast garbage collection strategy.
FLAGS_fast_eager_deletion_mode=False - disables the fast garbage collection strategy.
FLAGS_fraction_of_gpu_memory_to_use
*******************************************
(since 1.2.0)
......@@ -109,7 +109,7 @@ FLAGS_fraction_of_gpu_memory_to_use=0.1 - allocates 10% of the total GPU memory as the
On Windows platforms FLAGS_fraction_of_gpu_memory_to_use defaults to 0.5; on Linux it defaults to 0.92.
FLAGS_free_idle_memory
*******************************************
(since 0.15.0)
......@@ -126,7 +126,7 @@ FLAGS_free_idle_memory=True - frees idle memory when there is too much of it.
FLAGS_free_idle_memory=False - does not free idle memory.
FLAGS_fuse_parameter_groups_size
*******************************************
(since 1.4.0)
......@@ -141,7 +141,7 @@ Int32. The default value is 3.
FLAGS_fuse_parameter_groups_size=3 - sets the number of parameters' gradients in one group to 3.
FLAGS_fuse_parameter_memory_size
*******************************************
(since 1.5.0)
......@@ -156,7 +156,7 @@ Double. The default value is -1.0.
FLAGS_fuse_parameter_memory_size=16 - sets the upper memory limit of one group of parameters' gradients to 16MB.
FLAGS_init_allocated_mem
*******************************************
(since 0.15.0)
......@@ -173,7 +173,7 @@ FLAGS_init_allocated_mem=True - initializes allocated memory with non-zero values.
FLAGS_init_allocated_mem=False - does not initialize allocated memory with non-zero values.
FLAGS_initial_cpu_memory_in_mb
*******************************************
(since 0.14.0)
......@@ -188,7 +188,7 @@ Uint64. The default value is 500, in MB.
FLAGS_initial_cpu_memory_in_mb=100 - if FLAGS_fraction_of_cpu_memory_to_use*(total physical memory) is greater than 100MB, the allocator pre-allocates 100MB on the first allocation request and allocates another 100MB when the pre-allocated memory is exhausted.
FLAGS_initial_gpu_memory_in_mb
*******************************************
(since 1.4.0)
......@@ -207,7 +207,7 @@ FLAGS_initial_gpu_memory_in_mb=4096 - allocates 4GB as the initial GPU memory block size.
If this flag is set, it overrides the memory size set by FLAGS_fraction_of_gpu_memory_to_use; if it is not set, PaddlePaddle allocates GPU memory according to FLAGS_fraction_of_gpu_memory_to_use.
FLAGS_limit_of_tmp_allocation
*******************************************
(since 1.3)
......@@ -222,7 +222,7 @@ Int64. The default value is -1.
FLAGS_limit_of_tmp_allocation=1024 - sets the upper limit of the temporary_allocation size to 1024 bytes.
FLAGS_memory_fraction_of_eager_deletion
*******************************************
(since 1.4)
......@@ -242,7 +242,7 @@ FLAGS_memory_fraction_of_eager_deletion=1 - releases all temporary variables.
FLAGS_memory_fraction_of_eager_deletion=0.5 - releases only the 50% of variables with the largest memory usage.
FLAGS_reallocate_gpu_memory_in_mb
*******************************************
(since 1.4.0)
......@@ -261,7 +261,7 @@ FLAGS_reallocate_gpu_memory_in_mb=1024 - if the allocated GPU memory blocks are exhausted,
If this flag is set, PaddlePaddle reallocates GPU memory of the size specified by this flag; otherwise it reallocates the fraction of GPU memory specified by FLAGS_fraction_of_gpu_memory_to_use.
FLAGS_times_excess_than_required_tmp_allocation
*******************************************
(since 1.3)
......@@ -276,7 +276,7 @@ Int64. The default value is 2.
FLAGS_times_excess_than_required_tmp_allocation=1024 - sets the maximum size the TemporaryAllocator can return to 1024*N.
FLAGS_use_pinned_memory
*******************************************
(since 0.12.0)
......
......@@ -3,7 +3,7 @@ memory management
==================
FLAGS_allocator_strategy
**************************************
(since 1.2)
......@@ -21,7 +21,7 @@ FLAGS_allocator_strategy=naive_best_fit would use the new-designed allocator.
FLAGS_eager_delete_scope
*******************************************
(since 0.12.0)
......@@ -36,7 +36,7 @@ Example
FLAGS_eager_delete_scope=True will make scope delete synchronously.
FLAGS_eager_delete_tensor_gb
*******************************************
(since 1.0.0)
......@@ -60,7 +60,7 @@ It is recommended that users enable garbage collection strategy by setting FLAGS
FLAGS_enable_inplace_whitelist
*******************************************
(since 1.4)
......@@ -76,7 +76,7 @@ FLAGS_enable_inplace_whitelist=True would disable memory in-place optimization o
FLAGS_fast_eager_deletion_mode
*******************************************
(since 1.3)
......@@ -93,7 +93,7 @@ FLAGS_fast_eager_deletion_mode=True would turn on fast garbage collection strate
FLAGS_fast_eager_deletion_mode=False would turn off fast garbage collection strategy.
FLAGS_fraction_of_gpu_memory_to_use
*******************************************
(since 1.2.0)
......@@ -113,7 +113,7 @@ Windows series platform will set FLAGS_fraction_of_gpu_memory_to_use to 0.5 by d
Linux will set FLAGS_fraction_of_gpu_memory_to_use to 0.92 by default.
FLAGS_free_idle_memory
*******************************************
(since 0.15.0)
......@@ -130,7 +130,7 @@ FLAGS_free_idle_memory=True will free idle memory when there is too much of it.
FLAGS_free_idle_memory=False will not free idle memory.
FLAGS_fuse_parameter_groups_size
*******************************************
(since 1.4.0)
......@@ -146,7 +146,7 @@ FLAGS_fuse_parameter_groups_size=3 will set the size of one group parameters' gr
FLAGS_fuse_parameter_memory_size
*******************************************
(since 1.5.0)
......@@ -161,7 +161,7 @@ Example
FLAGS_fuse_parameter_memory_size=16 set the up limited memory size of one group parameters' gradient to 16 Megabytes.
FLAGS_init_allocated_mem
*******************************************
(since 0.15.0)
......@@ -178,7 +178,7 @@ FLAGS_init_allocated_mem=True will make the allocated memory initialize as a non
FLAGS_init_allocated_mem=False will not initialize the allocated memory.
FLAGS_initial_cpu_memory_in_mb
*******************************************
(since 0.14.0)
......@@ -193,7 +193,7 @@ Example
FLAGS_initial_cpu_memory_in_mb=100, if FLAGS_fraction_of_cpu_memory_to_use*(total physical memory) > 100MB, then allocator will pre-allocate 100MB when first allocation request raises, and re-allocate 100MB again when the pre-allocated memory is exhaustive.
FLAGS_initial_gpu_memory_in_mb
*******************************************
(since 1.4.0)
......@@ -213,7 +213,7 @@ If you set this flag, the memory size set by FLAGS_fraction_of_gpu_memory_to_use
If you don't set this flag, PaddlePaddle will use FLAGS_fraction_of_gpu_memory_to_use to allocate gpu memory.
FLAGS_limit_of_tmp_allocation
*******************************************
(since 1.3)
......@@ -228,7 +228,7 @@ Example
FLAGS_limit_of_tmp_allocation=1024 will set the up limit of temporary_allocation size to 1024 bytes.
FLAGS_memory_fraction_of_eager_deletion
*******************************************
(since 1.4)
......@@ -248,7 +248,7 @@ FLAGS_memory_fraction_of_eager_deletion=1 would release all temporary variables.
FLAGS_memory_fraction_of_eager_deletion=0.5 would only release 50% of variables with largest memory size.
FLAGS_reallocate_gpu_memory_in_mb
*******************************************
(since 1.4.0)
......@@ -268,12 +268,12 @@ If this flag is set, PaddlePaddle will reallocate the gpu memory with size speci
Else PaddlePaddle will reallocate with size set by FLAGS_fraction_of_gpu_memory_to_use.
FLAGS_times_excess_than_required_tmp_allocation
*******************************************
(since 1.3)
The FLAGS_times_excess_than_required_tmp_allocation indicates the max size the TemporaryAllocator can return. For Example
, if the required memory size is N, and FLAGS_times_excess_than_required_tmp_allocation is 2.0, the TemporaryAllocator will return an available allocation whose size is in the range N ~ 2*N.
Values accepted
---------------
......@@ -284,7 +284,7 @@ Example
FLAGS_times_excess_than_required_tmp_allocation=1024 will set the max size of the TemporaryAllocator can return to 1024*N.
FLAGS_use_pinned_memory
*******************************************
(since 0.12.0)
......
......@@ -4,7 +4,7 @@
FLAGS_benchmark
********************
(since 0.12.0)
......@@ -19,7 +19,7 @@ Bool. The default value is False.
FLAGS_benchmark=True - performs some synchronizations to test the benchmark.
FLAGS_inner_op_parallelism
*******************************************
(since 1.3.0)
......@@ -38,7 +38,7 @@ FLAGS_inner_op_parallelism=5 - sets the number of threads inside an operator to 5.
Currently only the sparse adam op supports inner_op_parallelism.
FLAGS_max_body_size
*******************************************
(since 1.0.0)
......@@ -53,7 +53,7 @@ Int32. The default value is 2147483647.
FLAGS_max_body_size=2147483647 - sets the BRPC message size to 2147483647.
FLAGS_sync_nccl_allreduce
*******************************************
(since 1.3)
......@@ -68,7 +68,7 @@ Bool. The default value is True.
FLAGS_sync_nccl_allreduce=True - calls `cudaStreamSynchronize(nccl_stream)` in allreduce_op_handle.
FLAGS_tracer_profile_fname
*******************************************
(since 1.4.0)
......
......@@ -4,7 +4,7 @@ others
FLAGS_benchmark
**************************************
(since 0.12.0)
......@@ -19,7 +19,7 @@ Example
FLAGS_benchmark=True will do some synchronizations to test benchmark.
FLAGS_inner_op_parallelism
*******************************************
(since 1.3.0)
......@@ -38,7 +38,7 @@ Note
currently only sparse adam op supports inner_op_parallelism.
FLAGS_max_body_size
*******************************************
(Since 1.0.0)
......@@ -53,7 +53,7 @@ Example
FLAGS_max_body_size=2147483647 will set the BRPC message size to 2147483647.
FLAGS_sync_nccl_allreduce
*******************************************
(since 1.3)
......@@ -68,7 +68,7 @@ Example
FLAGS_sync_nccl_allreduce=True will call `cudaStreamSynchronize(nccl_stream)` in allreduce_op_handle.
FLAGS_tracer_profile_fname
*******************************************
(since 1.4.0)
......
##############
Basic Concepts
##############
.. _cn_user_guide_lod_tensor:
This section introduces the basic concepts of Fluid:
#####################
LoD-Tensor User Guide
#####################
- `LoD-Tensor User Guide <lod_tensor.html>`_ : LoD-Tensor is a concept unique to Fluid. It appends sequence information to Tensor and supports processing variable-length data
LoD (Level-of-Detail) Tensor is a concept unique to Fluid: it appends sequence information to Tensor. All data transferred in Fluid, including the inputs, outputs and learnable parameters of the network, are uniformly represented by LoD-Tensor
.. toctree::
:hidden:
Reading this document will help you understand the design of LoD-Tensor in Fluid so that you can use this data type more flexibly.
Challenges of variable-length sequences
=======================================
Most deep learning frameworks use a Tensor to represent a mini-batch.
For example, if a mini-batch contains 10 images, each of size 32x32, the mini-batch is a 10x32x32 Tensor.
Or, in an NLP task, a mini-batch contains N sentences, every word is represented by a D-dimensional one-hot vector and, assuming all sentences share the same length L, the mini-batch can be represented as an NxLxD Tensor.
In both examples the sequence elements have the same size, but in many cases the training data are variable-length sequences. For this scenario, most frameworks choose a fixed length and pad shorter sequences with 0.
In Fluid, thanks to LoD-Tensor, the sequences in each mini-batch are not required to have the same length, so no padding is needed; this satisfies the needs of sequence-oriented tasks such as NLP.
Fluid introduces an index data structure (LoD) to split a tensor into sequences.
LoD index
===========
To better understand the concept of LoD, this section provides several examples:
**A mini-batch of sentences**
Suppose a mini-batch contains 3 sentences of 3, 1 and 2 words respectively. The mini-batch can be represented by a (3+1+2)xD Tensor plus some index information:
.. code-block :: text
3 1 2
| | | | | |
In the representation above, each :code:`|` is a D-dimensional word vector, and the digits 3, 1, 2 form a 1-level LoD.
**Recursive sequences**
Consider another example with a 2-level LoD-Tensor: a mini-batch contains articles of 3, 1 and 2 sentences, and every sentence consists of a different number of words. The mini-batch looks like:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
The corresponding LoD information is:
.. code-block:: text
[[3,1,2]/*level=0*/,[3,2,4,1,2,3]/*level=1*/]
**A mini-batch of videos**
Vision tasks often deal with videos and images, which are high-dimensional objects. Suppose a mini-batch contains 3 videos of 3, 1 and 2 frames respectively, where every frame has the same size 640x480. The mini-batch can be represented as:
.. code-block:: text
3 1 2
口口口 口 口口
The underlying tensor is of size (3+1+2)x640x480, and each :code:`口` represents a 640x480 image
**A mini-batch of images**
Traditionally, for a mini-batch of N images with fixed size, the LoD-Tensor is represented as:
.. code-block:: text
1 1 1 1 1
口口口口 ... 口
In this case, no information is lost by the indices all being 1; we simply regard the LoD-Tensor as an ordinary tensor:
.. code-block:: text
口口口口 ... 口
**Model parameters**
Model parameters are just ordinary tensors; in Fluid they are represented as 0-level LoD-Tensors.
Offset representation of LoDTensor
==================================
To access the underlying sequences quickly, Fluid provides an offset-based representation: it stores where each sequence starts and ends instead of storing the lengths.
In the example above, you can compute the lengths of the basic elements:
.. code-block:: text
3 2 4 1 2 3
and convert them to the offset representation:
.. code-block:: text
0 3 5 9 10 12 15
= = = = = =
3 2+3 4+5 1+9 2+10 3+12
So the first sentence spans words 0 to 3, and the second spans words 3 to 5.
Similarly, the top-level lengths of the LoD
.. code-block:: text
3 1 2
can be converted to the offset form:
.. code-block:: text
0 3 4 6
= = =
3 3+1 4+2
Therefore the offset representation of this LoD-Tensor is:
.. code-block:: text
0 3 4 6
3 5 9 10 12 15
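A minimal Python sketch of this lengths-to-offsets conversion (the helper name is ours):
.. code-block:: python

    def lengths_to_offsets(lengths):
        # Prepend 0, then accumulate: [3, 2, 4, 1, 2, 3] -> [0, 3, 5, 9, 10, 12, 15]
        offsets = [0]
        for length in lengths:
            offsets.append(offsets[-1] + length)
        return offsets

    print(lengths_to_offsets([3, 2, 4, 1, 2, 3]))  # [0, 3, 5, 9, 10, 12, 15]
    print(lengths_to_offsets([3, 1, 2]))           # [0, 3, 4, 6]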
LoD-Tensor
=============
A LoD-Tensor can be viewed as a tree, in which the leaves are the basic sequence elements and the branches mark out those elements.
In Fluid, the sequence information of a LoD-Tensor has two representations: original lengths and offsets. Internally, Paddle uses offsets for faster sequence access; the python API uses original lengths, which are easier to understand and compute with, and calls them :code:`recursive_sequence_lengths`.
Take the 2-level LoD-Tensor above as an example:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
- Expressed with offsets, this LoD-Tensor is: [ [0,3,4,6] , [0,3,5,9,10,12,15] ];
- Expressed with original lengths: recursive_sequence_lengths=[ [3-0 , 4-3 , 6-4] , [3-0 , 5-3 , 9-5 , 10-9 , 12-10 , 15-12] ].
Taking text sequences as an example: [3,1,2] means this mini-batch contains 3 articles of 3, 1 and 2 sentences, and [3,2,4,1,2,3] means the six sentences contain 3, 2, 4, 1, 2 and 3 words respectively.
recursive_seq_lens is a doubly nested list, i.e. a list of lists: the size of the outer list is the nesting depth, i.e. the lod-level, and each inner list gives the size of every element at that lod-level.
The following three snippets show how to create a LoD-Tensor, how to convert a LoD-Tensor to a Tensor, and how to convert a Tensor to a LoD-Tensor:
* Create a LoD-Tensor
.. code-block:: python
# create a lod-tensor
import paddle.fluid as fluid
import numpy as np
a = fluid.create_lod_tensor(np.array([[1],[1],[1],
[1],[1],
[1],[1],[1],[1],
[1],
[1],[1],
[1],[1],[1]]).astype('int64') ,
[[3,1,2] , [3,2,4,1,2,3]],
fluid.CPUPlace())
# check the nesting depth of the lod-tensor
print (len(a.recursive_sequence_lengths()))
# output: 2
# check the number of basic elements
print (sum(a.recursive_sequence_lengths()[-1]))
# output: 15 (3+2+4+1+2+3=15)
* LoD-Tensor to Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
# create a LoD-Tensor
a = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], fluid.CPUPlace())
def LodTensor_to_Tensor(lod_tensor):
    # get the lod information of the LoD-Tensor
    lod = lod_tensor.lod()
    # convert to a numpy array
    array = np.array(lod_tensor)
    new_array = []
    # split back into Tensors according to the level info of the original LoD-Tensor
    for i in range(len(lod[0]) - 1):
        new_array.append(array[lod[0][i]:lod[0][i + 1]])
    return new_array
new_array = LodTensor_to_Tensor(a)
# print the result
print(new_array)
print(new_array)
* Tensor to LoD-Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
def to_lodtensor(data, place):
    # store the lengths of the Tensors as LoD information
    seq_lens = [len(seq) for seq in data]
    cur_len = 0
    lod = [cur_len]
    for l in seq_lens:
        cur_len += l
        lod.append(cur_len)
    # flatten the Tensors to be converted
    flattened_data = np.concatenate(data, axis=0).astype("int64")
    flattened_data = flattened_data.reshape([len(flattened_data), 1])
    # attach the lod information to the Tensor data
    res = fluid.LoDTensor()
    res.set(flattened_data, place)
    res.set_lod([lod])
    return res
# new_array is the Tensor converted in the previous snippet
lod_tensor = to_lodtensor(new_array,fluid.CPUPlace())
# print the LoD information
print("The LoD of the result: {}.".format(lod_tensor.lod()))
# check that the data is consistent with the original Tensor
print("The array : {}.".format(np.array(lod_tensor)))
Code example
============
The code in this section expands the input variable x according to the specified level y-lod. The example brings together several important LoD-Tensor concepts; by following the code you will:
- intuitively understand how :code:`fluid.layers.sequence_expand` is implemented in Fluid
- learn how to create a LoD-Tensor in Fluid
- learn how to print the contents of a LoDTensor
**Define the computation**
layers.sequence_expand expands the data of x using the lod values of y. For a description of :code:`fluid.layers.sequence_expand`, please read :ref:`cn_api_fluid_layers_sequence_expand` first.
Sequence expansion code:
.. code-block:: python
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
*Note*: the dimensions of the output LoD-Tensor depend only on the actual data passed in; the shapes set for x and y when defining the network are placeholders only and do not affect the result.
**Create the Executor**
.. code-block:: python
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
**Prepare the data**
Here we call :code:`fluid.create_lod_tensor` to create the input data for :code:`sequence_expand`, expanding x_d according to the LoD of y_d. The output depends only on the LoD values of y_d; the data of y_d does not take part in the computation and only needs to match LoD[-1] in dimension.
For the usage of :code:`fluid.create_lod_tensor()`, see :ref:`cn_api_fluid_create_lod_tensor`.
The code is as follows:
.. code-block:: python
x_d = fluid.create_lod_tensor(np.array([[1.1],[2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [2,1,2,1]],place)
**Run the computation**
In Fluid, a Tensor with LoD>1 is fed like any other data, with :code:`feed` defining the input order. Since the output results are Tensors carrying LoD information, add :code:`return_numpy=False` to exe.run( ) to obtain LoD-Tensor outputs.
.. code-block:: python
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
**Inspect the LodTensor result**
Because of its special properties, a LoDTensor cannot be inspected with a direct print. The common practice is to fetch the LoD-Tensor as a network output and then run numpy.array(lod_tensor) to convert it to a numpy array:
.. code-block:: python
np.array(results[0])
The output is:
.. code-block:: text
array([[1.1],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4]])
**Inspect the sequence lengths**
The recursive sequence lengths of the LoDTensor can be obtained by querying the sequence lengths:
.. code-block:: python
results[0].recursive_sequence_lengths()
The output is:
.. code-block:: text
[[1L, 3L, 3L, 3L]]
**Complete code**
You can run the complete code below and observe the output:
.. code-block:: python
# load libraries
import paddle
import paddle.fluid as fluid
import numpy as np
# define the forward computation
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
# define the execution place
place = fluid.CPUPlace()
# create the executor
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# create the LoDTensors
x_d = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [1,2,1,2]], place)
# run the computation
results = exe.run(fluid.default_main_program(),
                  feed={'x':x_d, 'y': y_d },
                  fetch_list=[out],return_numpy=False)
# print the execution result
print("The data of the result: {}.".format(np.array(results[0])))
# print the recursive sequence lengths of the result
print("The recursive sequence lengths of the result: {}.".format(results[0].recursive_sequence_lengths()))
# print the LoD of the result
print("The LoD of the result: {}.".format(results[0].lod()))
Summary
========
By now you should have a solid grasp of the LoD-Tensor concept. Try modifying x_d and y_d in the code above and observing the output; this will help you better understand this flexible structure.
For more model applications of LoDTensor, see the beginner tutorials on `word2vec <../../../beginners_guide/basics/word2vec/index.html>`_ , `personalized recommendation <../../../beginners_guide/basics/recommender_system/index.html>`_ and `sentiment analysis <../../../beginners_guide/basics/understand_sentiment/index.html>`_ .
For more advanced applications, see the relevant content in the `model zoo <../../../user_guides/models/index_cn.html>`_ .
lod_tensor.rst
###############
Basic Concepts
###############
This section will introduce the basic concepts in Fluid:
- `LoD-Tensor User Guide <lod_tensor_en.html>`_ : LoD-Tensor is a unique term of Fluid. It appends sequence information to Tensor and supports data of variable lengths.
.. toctree::
:hidden:
#####################
LoD-Tensor User Guide
#####################
LoD (Level-of-Detail) Tensor is a unique term in Fluid, constructed by appending sequence information to a Tensor. The data transferred in Fluid, including the input, output, and learnable parameters of the network, are all represented by LoD-Tensor.
With the help of this user guide, you will learn the design idea of LoD-Tensor in Fluid so that you can use this data type more flexibly.
Challenge of variable-length sequences
======================================
In most deep learning frameworks, a mini-batch is represented by Tensor.
For example, if there are 10 pictures in a mini-batch and the size of each picture is 32*32, the mini-batch will be a 10*32*32 Tensor.
Or in an NLP task, a mini-batch contains N sentences, each of length L, and every word is represented by a one-hot vector of D dimensions; the mini-batch can then be represented by an N*L*D Tensor.
In the two examples above, every sequence element has the same size. However, in many cases the training data are variable-length sequences. For this scenario, most frameworks set a fixed length and pad sequences shorter than it with 0.
Owing to the LoD-Tensor in Fluid, the sequences in a mini-batch are not required to have the same length, so tasks that are sensitive to sequence formats, such as NLP, can be completed without padding.
Fluid introduces an index data structure, LoD, to split a Tensor into sequences.
Index Structure - LoD
======================
To have a better understanding of the concept of LoD, you can refer to the examples in this section.
**mini-batch consisting of sentences**
Suppose a mini-batch contains three sentences of 3, 1, and 2 words respectively. The mini-batch can then be represented by a (3+1+2)*D Tensor with some index information appended:
.. code-block :: text
3 1 2
| | | | | |
In the figure above, each :code:`|` represents a D-dimensional word vector, and the digits 3, 1, 2 make up a 1-level LoD.
**recursive sequence**
Take a 2-level LoD-Tensor as an example: a mini-batch contains 3 articles, consisting of 3, 1, and 2 sentences respectively, and every sentence contains a different number of words. The mini-batch is then formed as follows:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
The LoD that expresses this structure is:
.. code-block:: text
[[3, 1, 2] /*level=0*/, [3, 2, 4, 1, 2, 3] /*level=1*/]
**mini-batch consisting of video data**
In computer vision tasks, we usually need to deal with high-dimensional objects like videos and pictures. Suppose a mini-batch contains 3 videos, composed of 3 frames, 1 frame, and 2 frames respectively, and the size of each frame is 640*480. The mini-batch can then be described as:
.. code-block:: text
3 1 2
口口口 口 口口
The size of the tensor at the bottom is (3+1+2)*640*480. Every :code:`口` represents a 640*480 picture.
**mini-batch consisting of pictures**
Traditionally, for a mini-batch of N pictures with fixed size, LoD-Tensor is described as:
.. code-block:: text
1 1 1 1 1
口口口口 ...
In this case, the lengths of all the elements are 1, so the LoD carries no extra information; instead of discarding anything, we simply regard the LoD-Tensor as an ordinary tensor:
.. code-block:: text
口口口口 ...
**model parameter**
A model parameter is an ordinary tensor, which is described in Fluid as a 0-level LoD-Tensor.
LoDTensor expressed by offset
=============================
For quick access to the original sequences, you can use the offset representation: instead of storing the length of each sequence, store the positions where it begins and ends.
In the example above, the lengths of the fundamental elements are:
.. code-block:: text
3 2 4 1 2 3
It is expressed by offset as follows:
.. code-block:: text
0 3 5 9 10 12 15
= = = = = =
3 2+3 4+5 1+9 2+10 3+12
Therefore the first sentence spans words 0 to 3, and the second sentence spans words 3 to 5.
Similarly, the lengths in the top level of the LoD
.. code-block:: text
3 1 2
It can be expressed by offset:
.. code-block:: text
0 3 4 6
= = =
3 3+1 4+2
Therefore the LoD-Tensor is expressed by offset:
.. code-block:: text
0 3 4 6
0 3 5 9 10 12 15
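The conversion between the two representations is just a running sum over each level. A minimal sketch (these helper names are ours, not a Fluid API):
.. code-block:: python

def lengths_to_offsets(recursive_seq_lens):
    # prepend 0 and take the running sum of each level
    offsets = []
    for level in recursive_seq_lens:
        cur, level_offsets = 0, [0]
        for length in level:
            cur += length
            level_offsets.append(cur)
        offsets.append(level_offsets)
    return offsets

print(lengths_to_offsets([[3, 1, 2], [3, 2, 4, 1, 2, 3]]))
# output: [[0, 3, 4, 6], [0, 3, 5, 9, 10, 12, 15]]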
LoD-Tensor
=============
A LoD-Tensor can be regarded as a tree in which the leaves are the original sequence elements and the branches mark how those elements are grouped into sequences.
There are two ways to express the sequence information of a LoD-Tensor in Fluid: plain lengths and offsets. Internally, Paddle expresses a LoD-Tensor by offsets, which gives quicker access to the sequences; in the Python API it is expressed by plain lengths, which are easier to understand and compute with. The plain lengths are named :code:`recursive_sequence_lengths` .
Take a 2-level LoD-Tensor mentioned above as an example:
.. code-block:: text
3 1 2
3 2 4 1 2 3
||| || |||| | || |||
- LoD-Tensor expressed by offset: [ [0,3,4,6] , [0,3,5,9,10,12,15] ]
- LoD-Tensor expressed by primitive length: recursive_sequence_lengths=[ [3-0 , 4-3 , 6-4] , [3-0 , 5-3 , 9-5 , 10-9 , 12-10 , 15-12] ]
Taking a text sequence as an example, [3, 1, 2] indicates that there are 3 articles in the mini-batch, containing 3, 1, and 2 sentences respectively; [3, 2, 4, 1, 2, 3] indicates the number of words in each of those sentences.
recursive_seq_lens is a doubly nested list, in other words a list of lists. The length of the outermost list is the number of nested levels, namely the lod-level; each inner list holds the sizes of the sequence elements at that level.
The following three code snippets show how to create a LoD-Tensor, how to transform a LoD-Tensor into a Tensor, and how to transform a Tensor into a LoD-Tensor, respectively:
* Create LoD-Tensor
.. code-block:: python
#Create lod-tensor
import paddle.fluid as fluid
import numpy as np
a = fluid.create_lod_tensor(np.array([[1],[1],[1],
[1],[1],
[1],[1],[1],[1],
[1],
[1],[1],
[1],[1],[1]]).astype('int64') ,
[[3,1,2] , [3,2,4,1,2,3]],
fluid.CPUPlace())
#Check lod-tensor nested layers
print (len(a.recursive_sequence_lengths()))
# output: 2
#Check the number of the most fundamental elements
print (sum(a.recursive_sequence_lengths()[-1]))
# output:15 (3+2+4+1+2+3=15)
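As a quick sanity check, the LoDTensor can also report whether its sequence information is consistent with its data shape (we assume this method is available in your Fluid version):
.. code-block:: python

print(a.has_valid_recursive_sequence_lengths())
# output: True, since the last-level lengths sum to the 15 rows of data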
* Transform LoD-Tensor to Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
# create LoD-Tensor
a = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], fluid.CPUPlace())
def LodTensor_to_Tensor(lod_tensor):
# get lod information of LoD-Tensor
lod = lod_tensor.lod()
# transform into array
array = np.array(lod_tensor)
new_array = []
# transform to Tensor according to the layer information of the original LoD-Tensor
for i in range(len(lod[0]) - 1):
new_array.append(array[lod[0][i]:lod[0][i + 1]])
return new_array
new_array = LodTensor_to_Tensor(a)
# output the result
print(new_array)
* Transform Tensor to LoD-Tensor
.. code-block:: python
import paddle.fluid as fluid
import numpy as np
def to_lodtensor(data, place):
# save the length of Tensor as LoD information
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
# concatenate the sequences and reshape into a single column
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
# add lod information to Tensor data
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
# new_array is the transformed Tensor above
lod_tensor = to_lodtensor(new_array,fluid.CPUPlace())
# output LoD information
print("The LoD of the result: {}.".format(lod_tensor.lod()))
# examine the consistency with Tensor data
print("The array : {}.".format(np.array(lod_tensor)))
Code examples
==============
In the code example of this section, the input variable x is expanded according to the specified LoD level of y. The example brings together several fundamental LoD-Tensor concepts; by following the code, you will:
- Have a direct understanding of the implementation of :code:`fluid.layers.sequence_expand` in Fluid
- Know how to create LoD-Tensor in Fluid
- Learn how to print the content of LoDTensor
**Define the Process of Computing**
layers.sequence_expand expands x according to the LoD of y. For more explanation of :code:`fluid.layers.sequence_expand` , please read :ref:`api_fluid_layers_sequence_expand` first.
Code of sequence expanding:
.. code-block:: python
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
*Note*: the dimensions of the output LoD-Tensor depend only on the real data fed in; the shape values set for x and y when defining the network structure are just placeholders and do not influence the result.
**Create Executor**
.. code-block:: python
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
**Prepare Data**
Here we use :code:`fluid.create_lod_tensor` to create the input data of :code:`sequence_expand` , expanding x_d according to the LoD defined on y_d. The output depends only on the LoD of y_d; the data of y_d is not involved in the computation and only needs as many rows as the sum of its LoD[-1].
About the usage of :code:`fluid.create_lod_tensor()` , please refer to :ref:`api_fluid_create_lod_tensor` .
Code:
.. code-block:: python
x_d = fluid.create_lod_tensor(np.array([[1.1],[2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [2,1,2,1]],place)
**Execute Computing**
In Fluid, a Tensor whose LoD > 1 is fed like any other type of data, with :code:`feed` defining the input order. In addition, the parameter :code:`return_numpy=False` needs to be added to exe.run() to fetch the outputs as LoD-Tensors, because the results are Tensors carrying LoD information.
.. code-block:: python
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
**Check the LoDTensor result**
Because of its special attributes, a LoDTensor cannot be printed directly. The usual approach is to fetch the LoD-Tensor as a network output and then convert it with numpy.array(lod_tensor):
.. code-block:: python
np.array(results[0])
Output:
.. code-block:: text
array([[1.1],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4],[2.2],[3.3],[4.4]])
**Check the sequence lengths**
You can obtain the recursive sequence lengths of the result; the first sequence of x (length 1) is copied once and the second (length 3) is copied three times, which is where the lengths below come from:
.. code-block:: python
results[0].recursive_sequence_lengths()
Output:
.. code-block:: text
[[1L, 3L, 3L, 3L]]
**Complete Code**
You can check the output by executing the following complete code:
.. code-block:: python
# load libraries
import paddle
import paddle.fluid as fluid
import numpy as np
#Define forward computation
x = fluid.layers.data(name='x', shape=[1], dtype='float32', lod_level=1)
y = fluid.layers.data(name='y', shape=[1], dtype='float32', lod_level=2)
out = fluid.layers.sequence_expand(x=x, y=y, ref_level=0)
#Define place for computation
place = fluid.CPUPlace()
# create the executor
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
#Create LoDTensor
x_d = fluid.create_lod_tensor(np.array([[1.1], [2.2],[3.3],[4.4]]).astype('float32'), [[1,3]], place)
y_d = fluid.create_lod_tensor(np.array([[1.1],[1.1],[1.1],[1.1],[1.1],[1.1]]).astype('float32'), [[1,3], [2,1,2,1]], place)
#Start computing
results = exe.run(fluid.default_main_program(),
feed={'x':x_d, 'y': y_d },
fetch_list=[out],return_numpy=False)
#Output result
print("The data of the result: {}.".format(np.array(results[0])))
# print the recursive sequence lengths of the result
print("The recursive sequence lengths of the result: {}.".format(results[0].recursive_sequence_lengths()))
#print the LoD of result
print("The LoD of the result: {}.".format(results[0].lod()))
Summary
========
By now you should have a good grasp of the LoD-Tensor concept. Try changing x_d and y_d in the code above and checking the output; this will help you better understand this flexible structure.
For more model applications of LoDTensor, you can refer to `Word2vec <../../../beginners_guide/basics/word2vec/index_en.html>`_ , `Personalized Recommendation <../../../beginners_guide/basics/recommender_system/index_en.html>`_ , and `Sentiment Analysis <../../../beginners_guide/basics/understand_sentiment/index_en.html>`_ in the Beginner's Guide.
For more advanced and complex application examples, please refer to the related content in `models <../../../user_guides/models/index_en.html>`_ .
lod_tensor_en.rst
......@@ -4,127 +4,12 @@
Prepare Data
############
This chapter explains in detail how to provide data to a neural network, covering data preprocessing as well as synchronous and asynchronous reading.
.. toctree::
:maxdepth: 1
prepare_steps.rst
reader_cn.md
......@@ -4,52 +4,12 @@
Prepare Data
#############
This document mainly introduces how to provide data for the network, including the synchronous method and the asynchronous method.
.. toctree::
:maxdepth: 1
prepare_steps_en.rst
reader.md
.. _user_guide_prepare_steps:
#############
Prepare Steps
#############
Preparing data with PaddlePaddle Fluid takes three steps:
Step 1: Define a custom Reader to generate training/prediction data
####################################################################
The generated data can be Numpy Arrays or LoDTensors. Depending on the form of the data returned, a Reader is either a batch-level Reader or a sample-level Reader.
A batch-level Reader returns a batch of data at a time, while a sample-level Reader returns a single sample at a time.
If your data is sample-level, we provide a tool for data preprocessing and batch assembly: :code:`Python Reader` .
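For orientation, here is a minimal sketch of a sample-level Reader; the name, shapes, and sample count are illustrative only:
.. code-block:: python

import numpy as np

def sample_reader():
    # a generator factory: each call to reader() yields one (image, label) sample
    def reader():
        for _ in range(100):
            image = np.random.random([28, 28]).astype('float32')
            label = np.random.randint(0, 10, size=[1]).astype('int64')
            yield image, label
    return reader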
Step 2: Define data layer variables in the network configuration
#################################################################
Use :code:`fluid.layers.data` to define data layer variables in the network. When defining a data layer variable, you need to specify its name, data type dtype, and dimensions shape. For example:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.layers.data(name='image', dtype='float32', shape=[28, 28])
label = fluid.layers.data(name='label', dtype='int64', shape=[1])
Note that the shape here describes a single sample. PaddlePaddle Fluid prepends -1 at position 0 of shape to represent the batch_size dimension, so in this example image.shape is [-1, 28, 28] and label.shape is [-1, 1].
If you do not want the framework to prepend -1 at position 0, set append_batch_size=False:
.. code-block:: python
import paddle.fluid as fluid
image = fluid.layers.data(name='image', dtype='float32', shape=[28, 28], append_batch_size=False)
label = fluid.layers.data(name='label', dtype='int64', shape=[1], append_batch_size=False)
Now image.shape is [28, 28] and label.shape is [1].
Step 3: Feed the data into the network for training/prediction
###############################################################
Fluid provides two ways to do this, the asynchronous PyReader interface and the synchronous feed method (see the sketch after this list):
- Asynchronous PyReader interface
First define a PyReader object with :code:`fluid.io.PyReader` , then set its data source through one of the PyReader object's decorate methods.
With the PyReader interface, data transfer runs asynchronously with model training/prediction, which is more efficient; this is the recommended way.
- Synchronous feed
Construct the input data yourself and pass it in through :code:`executor.run(feed=...)` on :code:`fluid.Executor` or :code:`fluid.ParallelExecutor` .
Data preparation and model training/prediction run synchronously, which is less efficient.
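A minimal sketch of the asynchronous PyReader path, assuming the image/label layers from Step 2 and the illustrative sample_reader sketched in Step 1 (the loop omits a real network and fetch targets):
.. code-block:: python

import paddle
import paddle.fluid as fluid

# assumes `image`, `label` and `sample_reader` are defined as above
py_reader = fluid.io.PyReader(feed_list=[image, label], capacity=64, iterable=True)
# wrap the sample-level reader into batches and attach it as the data source
py_reader.decorate_sample_list_generator(
    paddle.batch(sample_reader(), batch_size=32), places=fluid.cpu_places())

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
for data in py_reader():   # each iteration yields ready-to-feed data
    exe.run(fluid.default_main_program(), feed=data)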
The two methods are compared as follows:
================== ================================= =====================================
Aspect             Synchronous feed                  Asynchronous PyReader interface
================== ================================= =====================================
API                :code:`executor.run(feed=...)`    :code:`fluid.io.PyReader`
Data format        Numpy Array or LoDTensor          Numpy Array or LoDTensor
Data augmentation  done in Python with other libs    done in Python with other libs
Speed              slow                              fast
Recommended use    model debugging                   industrial training
================== ================================= =====================================
How the Reader data type affects usage
######################################
The concrete operations in the steps above differ with the Reader data type, as described below:
Reading data from a sample-level Reader
+++++++++++++++++++++++++++++++++++++++
If your custom Reader returns a single sample at a time, feed the data through the following steps:
Step 1. Assemble the data
=============================
Call the Reader-related interfaces provided by Fluid to assemble batches and perform part of the data preprocessing; for details, see:
.. toctree::
:maxdepth: 1
reader_cn.md
Step 2. Feed the data
=================================
To feed data through the asynchronous PyReader interface, call :code:`decorate_sample_generator` or :code:`decorate_sample_list_generator` ; for details, see:
- :ref:`user_guides_use_py_reader`
To feed data through the synchronous feed method, use the DataFeeder interface to convert the Reader data into LoDTensor format before feeding it into the network (a minimal sketch follows); for details, see :ref:`cn_api_fluid_DataFeeder`
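A minimal sketch of the synchronous feed path with DataFeeder, under the same illustrative names as above:
.. code-block:: python

import paddle
import paddle.fluid as fluid

# assumes `image`, `label` and `sample_reader` are defined as above
feeder = fluid.DataFeeder(feed_list=[image, label], place=fluid.CPUPlace())
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# DataFeeder.feed converts a list of samples into a feed dict of LoDTensors
for mini_batch in paddle.batch(sample_reader(), batch_size=32)():
    exe.run(fluid.default_main_program(), feed=feeder.feed(mini_batch))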
Reading data from a batch-level Reader
++++++++++++++++++++++++++++++++++++++
Step 1. Assemble the data
=================
Since the batches are already assembled, the condition of Step 1 is met and you can proceed directly to Step 2.
Step 2. Feed the data
=================================
To feed data through the asynchronous PyReader interface, call PyReader's :code:`decorate_batch_generator` interface; for details, see:
.. toctree::
:maxdepth: 1
use_py_reader.rst
To feed data through the synchronous feed method, see:
.. toctree::
:maxdepth: 1
feeding_data.rst
.. _user_guide_prepare_steps_en:
#############
Prepare Steps
#############
PaddlePaddle Fluid supports two methods to feed data into networks:
1. Synchronous method - Python Reader: first, use :code:`fluid.layers.data` to set up the data input layer; then feed in the training data through :code:`executor.run(feed=...)` in :code:`fluid.Executor` or :code:`fluid.ParallelExecutor` .
2. Asynchronous method - py_reader: first, use :code:`fluid.layers.py_reader` to set up the data input layer; then configure the data source with the :code:`decorate_paddle_reader` or :code:`decorate_tensor_provider` functions of :code:`py_reader` ; after that, call :code:`fluid.layers.read_file` to read the data.
Comparisons of the two methods:
========================= ==================================================== ===============================================
Aspects Synchronous Python Reader Asynchronous py_reader
========================= ==================================================== ===============================================
API interface :code:`executor.run(feed=...)` :code:`fluid.layers.py_reader`
data type Numpy Array Numpy Array or LoDTensor
data augmentation carried out by other libraries on Python end carried out by other libraries on Python end
speed                     slow                                                 fast
recommended applications model debugging industrial training
========================= ==================================================== ===============================================
Synchronous Python Reader
##########################
Fluid provides Python Reader to feed in data.
Python Reader is a pure Python-side interface, and data feeding is synchronized with the model training/prediction process. Users can pass in data through Numpy Array. For specific operations, please refer to:
.. toctree::
:maxdepth: 1
feeding_data_en.rst
Python Reader supports advanced functions like group batch, shuffle. For specific operations, please refer to:
.. toctree::
:maxdepth: 1
reader.md
Asynchronous py_reader
########################
Fluid provides asynchronous data feeding method PyReader. It is more efficient as data feeding is not synchronized with the model training/prediction process. For specific operations, please refer to:
.. toctree::
:maxdepth: 1
use_py_reader_en.rst
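For orientation, a minimal sketch of the py_reader flow described above; the sample reader name is hypothetical and the shapes/dtypes are illustrative:
.. code-block:: python

import paddle
import paddle.fluid as fluid

# build the asynchronous reader layer; shapes describe one mini-batch
py_reader = fluid.layers.py_reader(capacity=64,
                                   shapes=[(-1, 28, 28), (-1, 1)],
                                   dtypes=['float32', 'int64'])
# attach a hypothetical sample-level reader, wrapped into batches
py_reader.decorate_paddle_reader(paddle.batch(some_sample_reader, batch_size=32))
# read_file unpacks the reader's output into input variables
image, label = fluid.layers.read_file(py_reader)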
......@@ -4,7 +4,7 @@
Use PyReader to read training and test data
############################################
Besides Python Reader, we provide PyReader. The performance of PyReader is better than :ref:`user_guide_use_numpy_array_as_train_data_en` , because the process of loading data is asynchronous with the process of training the model when PyReader is in use. PyReader can also coordinate with :code:`double_buffer_reader` to improve the performance of reading data. What's more, :code:`double_buffer_reader` can perform the transformation from CPU Tensor to GPU Tensor, which further improves the efficiency of reading data.
Create PyReader Object
################################
......
......@@ -6,7 +6,7 @@
If you have mastered the contents of the Beginner's Guide and wish to model practical problems and build your own networks, this module provides some details of Fluid usage for your reference:
- `Basic Concepts <../user_guides/howto/basic_concept/index_cn.html>`_ : Introduces the basic concepts of Fluid
- `LoD-Tensor Concepts <../user_guides/howto/basic_concept/index_cn.html>`_ : Introduces the basic concepts of Fluid LoD-Tensor
- `Prepare Data <../user_guides/howto/prepare_data/index_cn.html>`_ : Introduces the supported data types and transfer methods for training networks with Fluid
......
......@@ -8,7 +8,7 @@ If you have got the hang of Beginner's Guide, and wish to model practical proble
you with some detailed operations:
- `Basic Concepts <../user_guides/howto/basic_concept/index_en.html>`_ : It explains the basic concepts of Fluid.
- `LoD-Tensor Concepts <../user_guides/howto/basic_concept/index_en.html>`_ : It explains the basic concepts of Fluid LoD-Tensor.
- `Prepare Data <../user_guides/howto/prepare_data/index_en.html>`_ : This section introduces the supported data types and data transmission methods for training your networks with Fluid.
......
Subproject commit 2e3ec66be0fa425c389def2db5db7494f77f905c
Subproject commit 103d09169d2d023b0f477cc9362d724ab2531453
Subproject commit 0aa844b15af5f09d1f7c9effa60ea0da1cd0c84f
Subproject commit 4d3d1663c2dd28241ab7ee32396fb0d92793f9fc