Commit 921765ab, authored by RaindragonD, committed by xsrobin

icafe DLTP-1328, stress exe.close() (#900)

* modify write_docs_cn and write_docs_en; include specifications about Python 2.7.15 and creating a virtual env with conda

* gitignore update

* revert .gitignore; update write_docs

* icafe DLTP-1328, stress exe.close()

* DLTP-1383 fix the "not iterable" wording

* DLTP-1404 fix inconsistent formatting and sizing of the code blocks in the DyGraph (dynamic graph) docs

* DLTP-1390 fix the Chinese display formatting of the PyReader docs

* DLTP-1393 fix invalid links

* DLTP-1421 fix the Chinese translation of paddle.fluid.layers.reduce_max

* DyGraph: adjust the heading structure
Parent 15957eec
@@ -24,8 +24,8 @@ PaddlePaddle Fluid supports high-performance distributed training on modern GPU [#]_ server clusters…
   :header: "Tuning option", "Available values", "How to configure"
   :widths: 3, 3, 5

   "Communication mode", "pserver mode; NCCL2 mode (collective [#]_ )", "see :ref:`cluster_howto` for configuration"
   "Execution mode", "single process; single process with ParallelGraph; multi-process", "see :ref:`cluster_howto` for configuration"
   "Synchronous AllReduce", "when enabled, every call waits for the AllReduce to finish", "set the environment variable :code:`FLAGS_sync_nccl_allreduce`"
   "Number of CPU threads", "int, the number of CPU threads to use", "see the notes later in this document"
   "Pre-allocated GPU memory", "float between 0 and 1, the fraction of GPU memory to pre-allocate", "set the environment variable :code:`FLAGS_fraction_of_gpu_memory_to_use`"
@@ -41,7 +41,7 @@ PaddlePaddle Fluid supports high-performance distributed training on modern GPU [#]_ server clusters…
Choosing the communication mode and execution mode
+++++++++++++++++++++++++++++++++++++++++++++++++++

For distributed GPU training, multi-process + NCCL2 (collective) mode usually delivers the best performance. See :ref:`cluster_howto` to configure your program for multi-process NCCL2 training.
In multi-process mode, one training process is launched for every GPU card on every server,
and all processes in the cluster communicate with one another to complete training. This minimizes the overhead of resource contention within a single process.
......
@@ -196,7 +196,7 @@ PyReader
**Code examples**

1. If iterable=False, the created PyReader object is almost identical to ``fluid.layers.py_reader()``. The reader's operators are inserted into the program. The user should call start() before each epoch and catch the ``fluid.core.EOFException`` thrown by ``Executor.run()`` at the end of the epoch. Once the exception is caught, the user should call reset() to reset the reader manually.

.. code-block:: python

@@ -220,7 +220,7 @@ PyReader
            break

2. If iterable=True, the created PyReader object is decoupled from the program. No operators are inserted into the program. In this case, the created reader is a Python generator, which is iterable. The user should feed the data produced by the PyReader object into ``Executor.run(feed=...)``.

.. code-block:: python

@@ -239,15 +239,15 @@ PyReader
    for data in reader():
        executor.run(feed=data, ...)

.. py:function:: start()

Start the data-feeding thread. Can only be called when the reader object is not iterable.

.. py:function:: reset()

Reset the reader object after ``fluid.core.EOFException`` has been raised. Can only be called when the reader object is not iterable.
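To make the non-iterable start()/reset() pattern above concrete, here is a minimal, illustrative sketch (the random sample generator, shapes, and batch size are placeholders, not part of this API page):

.. code-block:: python

    import numpy as np
    import paddle
    import paddle.fluid as fluid

    def sample_generator():                      # placeholder data source
        for _ in range(100):
            yield [np.random.random([784]).astype('float32')]

    image = fluid.layers.data(name='image', shape=[784], dtype='float32')
    reader = fluid.io.PyReader(feed_list=[image], capacity=4, iterable=False)
    reader.decorate_sample_list_generator(
        paddle.batch(sample_generator, batch_size=32))

    loss = fluid.layers.mean(image)
    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    for epoch in range(2):
        reader.start()                           # call start() before every epoch
        try:
            while True:
                exe.run(fetch_list=[loss])
        except fluid.core.EOFException:
            reader.reset()                       # reset manually once EOF is raised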
.. py:function:: decorate_sample_generator(sample_generator, batch_size, drop_last=True, places=None)

Set the data source of the PyReader object.

@@ -264,7 +264,7 @@ PyReader
    - **places** (None|list(CUDAPlace)|list(CPUPlace)) – list of places. Must be provided when the PyReader is iterable.

.. py:function:: decorate_sample_list_generator(reader, places=None)

Set the data source of the PyReader object.

@@ -277,7 +277,7 @@ PyReader
    - **places** (None|list(CUDAPlace)|list(CPUPlace)) – list of places. Must be provided when the PyReader is iterable.

.. py:function:: decorate_batch_generator(reader, places=None)

Set the data source of the PyReader object.
......
@@ -6839,7 +6839,7 @@ reduce_max
Parameters:
    - **input** (Variable): the input variable, a Tensor or LoDTensor.
    - **dim** (list | int | None): the dimension(s) along which to compute the maximum. If None, the maximum over all elements is computed and a single-element Tensor variable is returned; otherwise each value must lie in the range :math:`[-rank(input), rank(input)]`. If :math:`dim[i] < 0`, the dimension to reduce is :math:`rank + dim[i]`.
    - **keep_dim** (bool | False): whether to keep the reduced dimension in the output Tensor. Unless ``keep_dim`` is true, the result tensor has one fewer dimension than the input.
    - **name** (str | None): the name of this layer (optional). If set to None, the layer is named automatically.
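As a quick illustration (the input values below are made up for this example), reducing a 2x2 input along ``dim=0``:

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[2, 2], dtype='float32', append_batch_size=False)
    out = fluid.layers.reduce_max(x, dim=0)     # maximum along the first dimension -> shape [2]

    exe = fluid.Executor(fluid.CPUPlace())
    data = np.array([[0.2, 0.3], [0.1, 0.7]], dtype='float32')
    res, = exe.run(feed={'x': data}, fetch_list=[out])
    print(res)                                  # expected: [0.2 0.7]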
......
@@ -21,51 +21,66 @@ PaddlePaddle DyGraph is a more flexible and easier-to-use mode that provides:
## Setup and basic usage

1. Upgrade to the latest PaddlePaddle 1.4:

```
pip install -q --upgrade paddlepaddle==1.4
```

2. Use the `fluid.dygraph.guard(place=None)` context:

```python
import paddle.fluid as fluid
with fluid.dygraph.guard():
    # write your executable dygraph code here
```

You can now run your network in DyGraph mode inside the `fluid.dygraph.guard()` context. DyGraph changes the way PaddlePaddle used to execute: operations now run immediately and return their results to Python.

DyGraph works very well together with NumPy: `fluid.dygraph.base.to_variable(x)` converts an ndarray into a `fluid.Variable`, and `fluid.Variable.numpy()` converts a result obtained at any point into a NumPy `ndarray`:

```python
import numpy as np
import paddle.fluid as fluid

x = np.ones([2, 2], np.float32)
with fluid.dygraph.guard():
    inputs = []
    for _ in range(10):
        inputs.append(fluid.dygraph.base.to_variable(x))
    ret = fluid.layers.sums(inputs)
    print(ret.numpy())
```

Output:

```
[[10. 10.]
 [10. 10.]]

Process finished with exit code 0
```

> Here we create a series of `ndarray` inputs and run a `sum` operation on them; the result can be printed directly.

Then, after calling `reduce_sum`, use the `Variable.backward()` method to run the backward pass, and `Variable.gradient()` to obtain the gradients as an `ndarray` once the backward pass has finished:

```python
loss = fluid.layers.reduce_sum(ret)
loss.backward()
print(loss.gradient())
```

Output:

```
[1.]

Process finished with exit code 0
```

<!--3. Build a network with Python and NumPy operations:
...@@ -108,47 +123,64 @@ PaddlePaddle DyGraph是一个更加灵活易用的模式,可提供: ...@@ -108,47 +123,64 @@ PaddlePaddle DyGraph是一个更加灵活易用的模式,可提供:
1. 编写一段用于DyGraph执行的Object-Oriented-Designed, PaddlePaddle模型代码主要由以下**三个部分**组成: **请注意,如果您设计的这一层结构是包含参数的,则必需要使用继承自`fluid.Layer`的Object-Oriented-Designed的类来描述该层的行为。** 1. 编写一段用于DyGraph执行的Object-Oriented-Designed, PaddlePaddle模型代码主要由以下**三个部分**组成: **请注意,如果您设计的这一层结构是包含参数的,则必需要使用继承自`fluid.Layer`的Object-Oriented-Designed的类来描述该层的行为。**
1. 建立一个可以在DyGraph模式中执行的,Object-Oriented的网络,需要继承自`fluid.Layer`,其中需要调用基类的`__init__`方法,并且实现带有参数`name_scope`(用来标识本层的名字)的`__init__`构造函数,在构造函数中,我们通常会执行一些例如参数初始化,子网络初始化的操作,执行这些操作时不依赖于输入的动态信息: 1. 建立一个可以在DyGraph模式中执行的,Object-Oriented的网络,需要继承自`fluid.Layer`,其中需要调用基类的`__init__`方法,并且实现带有参数`name_scope`(用来标识本层的名字)的`__init__`构造函数,在构造函数中,我们通常会执行一些例如参数初始化,子网络初始化的操作,执行这些操作时不依赖于输入的动态信息:
class MyLayer(fluid.Layer):
def __init__(self, name_scope): ```python
super(MyLayer, self).__init__(name_scope) class MyLayer(fluid.Layer):
def __init__(self, name_scope):
2. 实现一个`forward(self, *inputs)`的执行函数,该函数将负责执行实际运行时网络的执行逻辑, 该函数将会在每一轮训练/预测中被调用,这里我们将执行一个简单的`relu` -> `elementwise add` -> `reduce sum`: super(MyLayer, self).__init__(name_scope)
```
2. 实现一个`forward(self, *inputs)`的执行函数,该函数将负责执行实际运行时网络的执行逻辑, 该函数将会在每一轮训练/预测中被调用,这里我们将执行一个简单的`relu` -> `elementwise add` -> `reduce sum`:
def forward(self, inputs):
x = fluid.layers.relu(inputs) ```python
self._x_for_debug = x def forward(self, inputs):
x = fluid.layers.elementwise_mul(x, x) x = fluid.layers.relu(inputs)
x = fluid.layers.reduce_sum(x) self._x_for_debug = x
return [x] x = fluid.layers.elementwise_mul(x, x)
x = fluid.layers.reduce_sum(x)
3. (可选)实现一个`build_once(self, *inputs)` 方法,该方法将作为一个单次执行的函数,用于初始化一些依赖于输入信息的参数和网络信息, 例如在`FC`(fully connected layer)当中, 需要依赖输入的`shape`初始化参数, 这里我们并不需要这样的操作,仅仅为了展示,因此这个方法可以直接跳过: return [x]
```
3. (可选)实现一个`build_once(self, *inputs)` 方法,该方法将作为一个单次执行的函数,用于初始化一些依赖于输入信息的参数和网络信息, 例如在`FC`(fully connected layer)当中, 需要依赖输入的`shape`初始化参数, 这里我们并不需要这样的操作,仅仅为了展示,因此这个方法可以直接跳过:
def build_once(self, input):
pass ```python
def build_once(self, input):
pass
```
2.`fluid.dygraph.guard()`中执行: 2.`fluid.dygraph.guard()`中执行:
1. 使用Numpy构建输入:
1. 使用Numpy构建输入:
np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
2. 输入转换并执行前向网络获取返回值: 使用`fluid.dygraph.base.to_variable(np_inp)`转换Numpy输入为DyGraph接收的输入,然后使用`l(var_inp)[0]`调用callable object并且获取了`x`作为返回值,利用`x.numpy()`方法直接获取了执行得到的`x`的`ndarray`返回值。 ```python
np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
```
2. 输入转换并执行前向网络获取返回值: 使用`fluid.dygraph.base.to_variable(np_inp)`转换Numpy输入为DyGraph接收的输入,然后使用`l(var_inp)[0]`调用callable object并且获取了`x`作为返回值,利用`x.numpy()`方法直接获取了执行得到的`x`的`ndarray`返回值。
with fluid.dygraph.guard(): ```python
var_inp = fluid.dygraph.base.to_variable(np_inp) with fluid.dygraph.guard():
l = MyLayer("my_layer") var_inp = fluid.dygraph.base.to_variable(np_inp)
x = l(var_inp)[0] l = MyLayer("my_layer")
dy_out = x.numpy() x = l(var_inp)[0]
dy_out = x.numpy()
```
3. 计算梯度:自动微分对于实现机器学习算法(例如用于训练神经网络的反向传播)来说很有用, 使用`x.backward()`方法可以从某个`fluid.Varaible`开始执行反向网络,同时利用`l._x_for_debug.gradient()`获取了网络中`x`梯度的`ndarray` 返回值: 3. 计算梯度:自动微分对于实现机器学习算法(例如用于训练神经网络的反向传播)来说很有用, 使用`x.backward()`方法可以从某个`fluid.Varaible`开始执行反向网络,同时利用`l._x_for_debug.gradient()`获取了网络中`x`梯度的`ndarray` 返回值:
x.backward()
dy_grad = l._x_for_debug.gradient()
```python
x.backward()
dy_grad = l._x_for_debug.gradient()
```
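For convenience, here are the fragments above assembled into one runnable sketch (nothing new is introduced; the class and calls are exactly the ones shown above):

```python
import numpy as np
import paddle.fluid as fluid

class MyLayer(fluid.Layer):
    def __init__(self, name_scope):
        super(MyLayer, self).__init__(name_scope)

    def forward(self, inputs):
        x = fluid.layers.relu(inputs)
        self._x_for_debug = x                    # keep a handle so we can inspect its gradient later
        x = fluid.layers.elementwise_mul(x, x)
        x = fluid.layers.reduce_sum(x)
        return [x]

np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
with fluid.dygraph.guard():
    var_inp = fluid.dygraph.base.to_variable(np_inp)
    l = MyLayer("my_layer")
    x = l(var_inp)[0]
    dy_out = x.numpy()                           # forward result as ndarray
    x.backward()                                 # backward pass starting from x
    dy_grad = l._x_for_debug.gradient()          # gradient of the relu output as ndarray
    print(dy_out, dy_grad)
```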
## Training a model with DyGraph

@@ -159,144 +159,153 @@ PaddlePaddle DyGraph is a more flexible and easier-to-use mode that provides:
1. Prepare the data. We use `paddle.dataset.mnist` as the training dataset:
```python
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
```
2. Build the network. Although you can define the entire network structure yourself as described above, you can also use some of the basic building blocks provided in `fluid.dygraph.nn`. Here we use `fluid.dygraph.nn.Conv2D` and `fluid.dygraph.nn.Pool2D` to build the basic `SimpleImgConvPool`:
```python
class SimpleImgConvPool(fluid.dygraph.Layer):
    def __init__(self,
                 name_scope,
                 num_channels,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 pool_padding=0,
                 pool_type='max',
                 global_pooling=False,
                 conv_stride=1,
                 conv_padding=0,
                 conv_dilation=1,
                 conv_groups=1,
                 act=None,
                 use_cudnn=False,
                 param_attr=None,
                 bias_attr=None):
        super(SimpleImgConvPool, self).__init__(name_scope)

        self._conv2d = Conv2D(
            self.full_name(),
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=filter_size,
            stride=conv_stride,
            padding=conv_padding,
            dilation=conv_dilation,
            groups=conv_groups,
            param_attr=None,
            bias_attr=None,
            use_cudnn=use_cudnn)

        self._pool2d = Pool2D(
            self.full_name(),
            pool_size=pool_size,
            pool_type=pool_type,
            pool_stride=pool_stride,
            pool_padding=pool_padding,
            global_pooling=global_pooling,
            use_cudnn=use_cudnn)

    def forward(self, inputs):
        x = self._conv2d(inputs)
        x = self._pool2d(x)
        return x
```
> Note: when building a network, define and create sub-networks in `__init__`; invoke them in the `forward` function.

3. Use the `SimpleImgConvPool` built above to compose the final `MNIST` network:
```python
class MNIST(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(MNIST, self).__init__(name_scope)

        self._simple_img_conv_pool_1 = SimpleImgConvPool(
            self.full_name(), 1, 20, 5, 2, 2, act="relu")

        self._simple_img_conv_pool_2 = SimpleImgConvPool(
            self.full_name(), 20, 50, 5, 2, 2, act="relu")

        pool_2_shape = 50 * 4 * 4
        SIZE = 10
        scale = (2.0 / (pool_2_shape**2 * SIZE))**0.5
        self._fc = FC(self.full_name(),
                      10,
                      param_attr=fluid.param_attr.ParamAttr(
                          initializer=fluid.initializer.NormalInitializer(
                              loc=0.0, scale=scale)),
                      act="softmax")

    def forward(self, inputs):
        x = self._simple_img_conv_pool_1(inputs)
        x = self._simple_img_conv_pool_2(x)
        x = self._fc(x)
        return x
```
4. Instantiate the configured `MNIST` network inside `fluid.dygraph.guard()`. Even without any training, you can already call the model inside `fluid.dygraph.guard()` and inspect its output:
```python
with fluid.dygraph.guard():
    mnist = MNIST("mnist")
    id, data = list(enumerate(train_reader()))[0]
    dy_x_data = np.array(
        [x[0].reshape(1, 28, 28)
         for x in data]).astype('float32')
    img = to_variable(dy_x_data)
    print("cost is: {}".format(mnist(img).numpy()))
```
Output:

```
cost is: [[0.10135901 0.1051138  0.1027941  ... 0.0972859  0.10221873 0.10165327]
[0.09735426 0.09970362 0.10198303 ... 0.10134517 0.10179105 0.10025002]
[0.09539858 0.10213123 0.09543551 ... 0.10613529 0.10535969 0.097991  ]
...
[0.10120598 0.0996111  0.10512722 ... 0.10067689 0.10088114 0.10071224]
[0.09889644 0.10033772 0.10151272 ... 0.10245881 0.09878646 0.101483  ]
[0.09097178 0.10078511 0.10198414 ... 0.10317434 0.10087223 0.09816764]]

Process finished with exit code 0
```
5. Build the training loop. After each parameter update we call `mnist.clear_gradients()` to reset the gradients:
```python
for epoch in range(epoch_num):
    for batch_id, data in enumerate(train_reader()):
        dy_x_data = np.array(
            [x[0].reshape(1, 28, 28)
             for x in data]).astype('float32')
        y_data = np.array(
            [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

        img = to_variable(dy_x_data)
        label = to_variable(y_data)
        label.stop_gradient = True

        cost = mnist(img)
        loss = fluid.layers.cross_entropy(cost, label)
        avg_loss = fluid.layers.mean(loss)

        dy_out = avg_loss.numpy()
        avg_loss.backward()
        sgd.minimize(avg_loss)
        mnist.clear_gradients()
```
@@ -305,130 +346,137 @@ PaddlePaddle DyGraph is a more flexible and easier-to-use mode that provides:
The model's parameters, or any values you want to inspect, can be wrapped in the class as variables and retrieved through the object; use the `numpy()` method to obtain them as `ndarray`s. During training you can use `mnist.parameters()` to get all of the network's parameters, or pick a specific parameter of a particular `Layer` (or that layer's `parameters()`), and inspect its value at any time with `numpy()`.

After the backward pass, call the `minimize` method of the previously defined `SGD` optimizer object to update the parameters:
```python
with fluid.dygraph.guard():
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed

    mnist = MNIST("mnist")
    sgd = SGDOptimizer(learning_rate=1e-3)
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

    np.set_printoptions(precision=3, suppress=True)
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array(
                [x[0].reshape(1, 28, 28)
                 for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

            img = to_variable(dy_x_data)
            label = to_variable(y_data)
            label.stop_gradient = True

            cost = mnist(img)
            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            dy_out = avg_loss.numpy()

            avg_loss.backward()
            sgd.minimize(avg_loss)
            mnist.clear_gradients()

            dy_param_value = {}
            for param in mnist.parameters():
                dy_param_value[param.name] = param.numpy()

            if batch_id % 20 == 0:
                print("Loss at step {}: {:.7}".format(batch_id, avg_loss.numpy()))
    print("Final loss: {:.7}".format(avg_loss.numpy()))
    print("_simple_img_conv_pool_1_conv2d W's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._filter_param.numpy().mean()))
    print("_simple_img_conv_pool_1_conv2d Bias's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._bias_param.numpy().mean()))
```
Output:

```
Loss at step 0: [2.302]
Loss at step 20: [1.616]
Loss at step 40: [1.244]
Loss at step 60: [1.142]
Loss at step 80: [0.911]
Loss at step 100: [0.824]
Loss at step 120: [0.774]
Loss at step 140: [0.626]
Loss at step 160: [0.609]
Loss at step 180: [0.627]
Loss at step 200: [0.466]
Loss at step 220: [0.499]
Loss at step 240: [0.614]
Loss at step 260: [0.585]
Loss at step 280: [0.503]
Loss at step 300: [0.423]
Loss at step 320: [0.509]
Loss at step 340: [0.348]
Loss at step 360: [0.452]
Loss at step 380: [0.397]
Loss at step 400: [0.54]
Loss at step 420: [0.341]
Loss at step 440: [0.337]
Loss at step 460: [0.155]
Final loss: [0.164]
_simple_img_conv_pool_1_conv2d W's mean is: 0.00606656912714
_simple_img_conv_pool_1_conv2d Bias's mean is: -3.4576318285e-05
```
7. Performance

When using `fluid.dygraph.guard()`, you can pass in `fluid.CUDAPlace(0)` or `fluid.CPUPlace()` to choose the device on which DyGraph runs; by default, if you do nothing, the device is picked automatically to match your environment (a short sketch follows below).
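A minimal sketch of selecting the device explicitly (the CUDA availability check is just one common way to pick a place, not something mandated by the text above):

```python
import paddle.fluid as fluid

place = fluid.CUDAPlace(0) if fluid.is_compiled_with_cuda() else fluid.CPUPlace()
with fluid.dygraph.guard(place):
    # build and run your DyGraph network here
    pass
```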
## Saving model parameters

During training, you can use `fluid.dygraph.save_persistables(your_model_object.state_dict(), "save_dir")` to save all of the model parameters in `your_model_object`. You can also pass in your own Python dictionary of "parameter name" - "parameter object" pairs to control what is saved.

Likewise, you can use `your_model_object.load_dict(fluid.dygraph.load_persistables("save_dir"))` to restore the saved model parameters and continue training from them.

The following code shows how, in the handwritten-digit-recognition task, to save the parameters and then read the saved parameters back in order to continue training.
```python
dy_param_init_value = {}
for epoch in range(epoch_num):
    for batch_id, data in enumerate(train_reader()):
        dy_x_data = np.array(
            [x[0].reshape(1, 28, 28)
             for x in data]).astype('float32')
        y_data = np.array(
            [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

        img = to_variable(dy_x_data)
        label = to_variable(y_data)
        label.stop_gradient = True

        cost = mnist(img)
        loss = fluid.layers.cross_entropy(cost, label)
        avg_loss = fluid.layers.mean(loss)

        dy_out = avg_loss.numpy()

        avg_loss.backward()
        sgd.minimize(avg_loss)
        fluid.dygraph.save_persistables(mnist.state_dict(), "save_dir")
        mnist.clear_gradients()

        for param in mnist.parameters():
            dy_param_init_value[param.name] = param.numpy()
        mnist.load_dict(fluid.dygraph.load_persistables("save_dir"))
        restore = mnist.parameters()
        # check save and load
        success = True
        for value in restore:
            if (not np.allclose(value.numpy(), dy_param_init_value[value.name])) or (not np.isfinite(value.numpy().all())) or (np.isnan(value.numpy().any())):
                success = False
        print("model save and load success? {}".format(success))
```
@@ -443,134 +491,140 @@ PaddlePaddle DyGraph is a more flexible and easier-to-use mode that provides:
In a second `fluid.dygraph.guard()` context we use the previously saved `checkpoint` for inference; likewise, before running inference you need to switch to inference mode with `YourModel.eval()`.
```python
with fluid.dygraph.guard():
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed

    mnist = MNIST("mnist")
    adam = AdamOptimizer(learning_rate=0.001)
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
    test_reader = paddle.batch(
        paddle.dataset.mnist.test(), batch_size=BATCH_SIZE, drop_last=True)
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array(
                [x[0].reshape(1, 28, 28)
                 for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

            img = to_variable(dy_x_data)
            label = to_variable(y_data)
            label.stop_gradient = True

            cost, acc = mnist(img, label)

            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)
            avg_loss.backward()
            adam.minimize(avg_loss)
            # save checkpoint
            mnist.clear_gradients()
            if batch_id % 100 == 0:
                print("Loss at epoch {} step {}: {:}".format(epoch, batch_id, avg_loss.numpy()))

        mnist.eval()
        # _test_train is an evaluation helper defined elsewhere in the original test code
        test_cost, test_acc = self._test_train(test_reader, mnist, BATCH_SIZE)
        mnist.train()
        print("Loss at epoch {} , Test avg_loss is: {}, acc is: {}".format(epoch, test_cost, test_acc))

    fluid.dygraph.save_persistables(mnist.state_dict(), "save_dir")
    print("checkpoint saved")

with fluid.dygraph.guard():
    fluid.default_startup_program().random_seed = seed
    fluid.default_main_program().random_seed = seed

    mnist_infer = MNIST("mnist")
    # load checkpoint
    mnist_infer.load_dict(
        fluid.dygraph.load_persistables("save_dir"))
    print("checkpoint loaded")

    # start evaluate mode
    mnist_infer.eval()

    def load_image(file):
        im = Image.open(file).convert('L')
        im = im.resize((28, 28), Image.ANTIALIAS)
        im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)
        im = im / 255.0 * 2.0 - 1.0
        return im

    cur_dir = os.path.dirname(os.path.realpath(__file__))
    tensor_img = load_image(cur_dir + '/image/infer_3.png')

    results = mnist_infer(to_variable(tensor_img))
    lab = np.argsort(results.numpy())
    print("Inference result of image/infer_3.png is: %d" % lab[0][-1])
```
Output:

```
Loss at epoch 3 , Test avg_loss is: 0.0721620170576, acc is: 0.97796474359
Loss at epoch 4 step 0: [0.01078923]
Loss at epoch 4 step 100: [0.10447877]
Loss at epoch 4 step 200: [0.05149534]
Loss at epoch 4 step 300: [0.0122997]
Loss at epoch 4 step 400: [0.0281883]
Loss at epoch 4 step 500: [0.10709661]
Loss at epoch 4 step 600: [0.1306036]
Loss at epoch 4 step 700: [0.01628026]
Loss at epoch 4 step 800: [0.07947419]
Loss at epoch 4 step 900: [0.02067161]
Loss at epoch 4 , Test avg_loss is: 0.0802323290939, acc is: 0.976963141026
checkpoint saved
checkpoint loaded

Ran 1 test in 208.017s

Inference result of image/infer_3.png is: 3
```
## Writing compatible models

Taking the handwritten-digit-recognition example from the previous step, the same model code can be executed directly in PaddlePaddle's `Executor`:
```python
exe = fluid.Executor(fluid.CPUPlace(
) if not core.is_compiled_with_cuda() else fluid.CUDAPlace(0))

mnist = MNIST("mnist")
sgd = SGDOptimizer(learning_rate=1e-3)
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

img = fluid.layers.data(
    name='pixel', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
sgd.minimize(avg_loss)

out = exe.run(fluid.default_startup_program())

for epoch in range(epoch_num):
    for batch_id, data in enumerate(train_reader()):
        static_x_data = np.array(
            [x[0].reshape(1, 28, 28)
             for x in data]).astype('float32')
        y_data = np.array(
            [x[1] for x in data]).astype('int64').reshape([BATCH_SIZE, 1])

        fetch_list = [avg_loss.name]
        out = exe.run(
            fluid.default_main_program(),
            feed={"pixel": static_x_data,
                  "label": y_data},
            fetch_list=fetch_list)

        static_out = out[0]
```
\ No newline at end of file
@@ -171,6 +171,9 @@ PSERVER nodes keep the state information of all TRAINER nodes; when a TRAINER …
    # training process ...
    exe.close() # notify PServer to destroy the resource

Note: every trainer must call exe.close() when it exits.
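For illustration only, a minimal trainer-side sketch (the endpoints, trainer count, and training loop are placeholders); the point is the final exe.close() call:

.. code-block:: python

    import paddle.fluid as fluid

    # ... build the model and optimizer here ...

    t = fluid.DistributeTranspiler()
    t.transpile(trainer_id=0, pservers="127.0.0.1:6174", trainers=2)
    trainer_prog = t.get_trainer_program()

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())

    # training process ...
    # for data in train_reader():
    #     exe.run(trainer_prog, feed=..., fetch_list=...)

    exe.close()  # every trainer must notify the PServer to release its resources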
Starting a distributed training job
-----------------------------------
......
@@ -167,6 +167,7 @@ The status information of all trainer nodes is saved in the pserver node. When t…
    # training process ...
    exe.close() # notify PServer to destroy the resource
Note: every trainer needs to call exe.close() when the trainer finishes.
Start a Distributed Training Task
----------------------------------
......