Commit 783c4b20, authored by Youwei Song, committed by Jiabin Yang

fix Dygraph cn sample code and doc format (#1508)

* fix Dygraph cn sample code, test=develop

* fix Dygraph doc format, test=develop
Parent cacbfe4e
@@ -21,36 +21,41 @@ PaddlePaddle DyGraph is a more flexible and easy-to-use mode that provides:
## Setup and basic usage
1. Upgrade to the latest PaddlePaddle 1.5:
```
pip install -q --upgrade paddlepaddle==1.5
```
2. Use the `fluid.dygraph.guard(place=None)` context:
```python
import paddle.fluid as fluid
with fluid.dygraph.guard():
    # write your executable dygraph code here
    pass
```
Now you can run a network in DyGraph mode inside the `fluid.dygraph.guard()` context. DyGraph changes how PaddlePaddle executes: operations now run immediately and return their results to Python.
DyGraph works well with NumPy: `fluid.dygraph.to_variable(x)` converts an `ndarray` to a `fluid.Variable`, and `fluid.Variable.numpy()` converts a result obtained at any point to a NumPy `ndarray`:
```python
import numpy as np

x = np.ones([2, 2], np.float32)
with fluid.dygraph.guard():
    inputs = []
    for _ in range(10):
        inputs.append(fluid.dygraph.to_variable(x))
    ret = fluid.layers.sums(inputs)
    print(ret.numpy())
```
Output:
```
[[10. 10.]
 [10. 10.]]
```
> Here we create a series of `ndarray` inputs, run a `sum` operation, and print the result directly.
@@ -58,260 +63,257 @@ DyGraph works well with NumPy: `fluid.dygraph.to_variable(x)
Then, after calling `reduce_sum`, run the backward pass with `Variable.backward()`. Once the backward pass has finished, `Variable.gradient()` returns the gradient as an `ndarray`:
```python
loss = fluid.layers.reduce_sum(ret)
loss.backward()
print(loss.gradient())
```
Output:
```
[1.]
```
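The arithmetic behind the two outputs above can be checked without PaddlePaddle. The sketch below is only an illustration on nested Python lists: `elementwise_sum` and `reduce_sum` are hypothetical helpers standing in for `fluid.layers.sums` and `fluid.layers.reduce_sum`, not Paddle APIs.

```python
# Hypothetical stand-ins for fluid.layers.sums and fluid.layers.reduce_sum,
# written for nested lists so the example needs no third-party packages.
def elementwise_sum(matrices):
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) for c in range(cols)]
            for r in range(rows)]


def reduce_sum(matrix):
    return sum(sum(row) for row in matrix)


ones = [[1.0, 1.0], [1.0, 1.0]]
inputs = [ones for _ in range(10)]

ret = elementwise_sum(inputs)   # ten 2x2 matrices of ones added together
loss = reduce_sum(ret)          # all four entries collapsed into one scalar

# The backward pass starts from d(loss)/d(loss), which is 1 by definition.
# That seed value is what loss.gradient() reports as [1.].
seed_gradient = 1.0
print(ret, loss, seed_gradient)
```

Every entry of `ret` is 10 because ten ones are added at each position, and the scalar loss is the sum of those four entries.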
## Building a network with DyGraph
1. Object-oriented PaddlePaddle model code written for DyGraph execution consists of the following **three parts**: **Note that if the layer you design contains parameters, you must describe its behavior with an object-oriented class that inherits from `fluid.dygraph.Layer`.**
1. To build an object-oriented network that can run in DyGraph mode, inherit from `fluid.dygraph.Layer` and implement an `__init__` constructor that takes a `name_scope` parameter (which identifies the layer's name) and calls the base class `__init__`. The constructor usually performs work such as parameter initialization and sub-network initialization, none of which depends on dynamic input information:
```python
class MyLayer(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(MyLayer, self).__init__(name_scope)
```
2. Implement a `forward(self, *inputs)` function that carries out the network's actual runtime logic. It is called in every training or inference iteration. Here we run a simple `relu` -> `elementwise mul` -> `reduce sum`:
```python
    def forward(self, inputs):
        x = fluid.layers.relu(inputs)
        self._x_for_debug = x
        x = fluid.layers.elementwise_mul(x, x)
        x = fluid.layers.reduce_sum(x)
        return [x]
```
2. Run inside `fluid.dygraph.guard()`:
1. Build the input with NumPy:
```python
np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
```
2. Convert the input `ndarray` to a `Variable` and run the forward pass to get the return value: `fluid.dygraph.to_variable(np_inp)` converts the NumPy input into an input that DyGraph accepts, `my_layer(var_inp)[0]` calls the callable object and returns `x`, and `x.numpy()` returns the computed value of `x` as an `ndarray`.
```python
with fluid.dygraph.guard():
    var_inp = fluid.dygraph.to_variable(np_inp)
    my_layer = MyLayer("my_layer")
    x = my_layer(var_inp)[0]
    dy_out = x.numpy()
```
3. Compute gradients: automatic differentiation is useful for implementing machine learning algorithms, such as backpropagation for training neural networks. `x.backward()` runs the backward pass starting from a given `fluid.Variable`, and `my_layer._x_for_debug.gradient()` returns the gradient of `x` in the network as an `ndarray`:
```python
    x.backward()
    dy_grad = my_layer._x_for_debug.gradient()
```
The complete code:
```python
import paddle.fluid as fluid
import numpy as np


class MyLayer(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(MyLayer, self).__init__(name_scope)

    def forward(self, inputs):
        x = fluid.layers.relu(inputs)
        self._x_for_debug = x
        x = fluid.layers.elementwise_mul(x, x)
        x = fluid.layers.reduce_sum(x)
        return [x]


if __name__ == '__main__':
    np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
    with fluid.dygraph.guard():
        var_inp = fluid.dygraph.to_variable(np_inp)
        var_inp.stop_gradient = False
        my_layer = MyLayer("my_layer")
        x = my_layer(var_inp)[0]
        dy_out = x.numpy()
        x.backward()
        dy_grad = my_layer._x_for_debug.gradient()
        my_layer.clear_gradients()  # reset parameter gradients so the next training iteration is correct
```
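The tutorial does not print `dy_out` or `dy_grad`. As a rough cross-check, the pure-Python sketch below works out what the textbook derivative of `relu` -> square -> sum gives for the same input. It assumes Paddle's autograd matches that derivative; `relu`, `square_sum`, and `numeric_grad` are hypothetical helpers, not Paddle functions.

```python
def relu(values):
    return [max(v, 0.0) for v in values]


def square_sum(values):
    # elementwise_mul(x, x) followed by reduce_sum
    return sum(v * v for v in values)


def numeric_grad(values, eps=1e-6):
    # Central finite differences, used only to confirm the analytic result.
    grads = []
    for i in range(len(values)):
        plus = list(values)
        minus = list(values)
        plus[i] += eps
        minus[i] -= eps
        grads.append((square_sum(plus) - square_sum(minus)) / (2 * eps))
    return grads


np_inp = [1.0, 2.0, -1.0]
hidden = relu(np_inp)                       # plays the role of _x_for_debug
out = square_sum(hidden)                    # plays the role of dy_out
analytic_grad = [2.0 * h for h in hidden]   # d(sum h^2)/dh = 2h
numeric = numeric_grad(hidden)

print(out, analytic_grad)
```

Under that assumption, the forward value is 1 + 4 + 0 = 5 and the gradient with respect to the `relu` output is `[2, 4, 0]`. The negative input contributes nothing because `relu` clamps it to zero.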
## Training a model with DyGraph
Next, we use handwritten digit recognition, one of the most basic models, as an example to show how to build and train a model in DyGraph mode:
For the theory behind handwritten digit recognition, see [PaddleBook](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits). We assume here that you already know the deep learning theory this model requires.
1. Prepare the data. We use `paddle.dataset.mnist` as the training dataset:
```python
import paddle

BATCH_SIZE = 64
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
```
2. Build the network. You can define the entire network structure yourself as described above, but you can also use the basic layers that `fluid.dygraph` provides ready-made. Here we use `fluid.dygraph.Conv2D` and `fluid.dygraph.Pool2D` to build a basic `SimpleImgConvPool`:
```python
class SimpleImgConvPool(fluid.dygraph.Layer):
    def __init__(self,
                 name_scope,
                 num_filters,
                 filter_size,
                 pool_size,
                 pool_stride,
                 pool_padding=0,
                 pool_type='max',
                 global_pooling=False,
                 conv_stride=1,
                 conv_padding=0,
                 conv_dilation=1,
                 conv_groups=1,
                 act=None,
                 use_cudnn=False,
                 param_attr=None,
                 bias_attr=None):
        super(SimpleImgConvPool, self).__init__(name_scope)

        # Sub-layers are created once here and reused on every forward call.
        self._conv2d = fluid.dygraph.Conv2D(
            self.full_name(),
            num_filters=num_filters,
            filter_size=filter_size,
            stride=conv_stride,
            padding=conv_padding,
            dilation=conv_dilation,
            groups=conv_groups,
            param_attr=param_attr,  # forward the caller's attributes instead of hard-coding None
            bias_attr=bias_attr,
            act=act,
            use_cudnn=use_cudnn)

        self._pool2d = fluid.dygraph.Pool2D(
            self.full_name(),
            pool_size=pool_size,
            pool_type=pool_type,
            pool_stride=pool_stride,
            pool_padding=pool_padding,
            global_pooling=global_pooling,
            use_cudnn=use_cudnn)

    def forward(self, inputs):
        x = self._conv2d(inputs)
        x = self._pool2d(x)
        return x
```
> Note: when building a network, define the sub-networks in `__init__` and execute them in `forward`.
3. Use the `SimpleImgConvPool` built above to assemble the final `MNIST` network:
```python
class MNIST(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(MNIST, self).__init__(name_scope)

        self._simple_img_conv_pool_1 = SimpleImgConvPool(
            self.full_name(), 20, 5, 2, 2, act="relu")

        self._simple_img_conv_pool_2 = SimpleImgConvPool(
            self.full_name(), 50, 5, 2, 2, act="relu")

        pool_2_shape = 50 * 4 * 4
        SIZE = 10
        scale = (2.0 / (pool_2_shape**2 * SIZE))**0.5
        self._fc = fluid.dygraph.FC(self.full_name(),
                                    10,
                                    param_attr=fluid.param_attr.ParamAttr(
                                        initializer=fluid.initializer.NormalInitializer(
                                            loc=0.0, scale=scale)),
                                    act="softmax")

    def forward(self, inputs, label=None):
        x = self._simple_img_conv_pool_1(inputs)
        x = self._simple_img_conv_pool_2(x)
        x = self._fc(x)
        if label is not None:
            acc = fluid.layers.accuracy(input=x, label=label)
            return x, acc
        else:
            return x
```
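The constant `pool_2_shape = 50 * 4 * 4` follows from the layer settings above. The sketch below derives it for a 28x28 MNIST image, assuming the usual floor-mode output-size formulas for convolution and pooling with no padding. `conv_out` and `pool_out` are hypothetical helpers introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Standard convolution output size (floor mode).
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1


def pool_out(size, kernel, stride, padding=0):
    # Standard pooling output size (floor mode, assumed here).
    return (size + 2 * padding - kernel) // stride + 1


size = 28
trace = [size]
for _ in range(2):                      # two SimpleImgConvPool blocks
    size = conv_out(size, kernel=5)     # filter_size=5, stride 1, no padding
    trace.append(size)
    size = pool_out(size, kernel=2, stride=2)
    trace.append(size)

num_filters = 50                        # filters in the second block
pool_2_shape = num_filters * size * size
scale = (2.0 / (pool_2_shape ** 2 * 10)) ** 0.5

print(trace, pool_2_shape, scale)
```

The spatial size goes 28 -> 24 -> 12 -> 8 -> 4, so the flattened input to the fully connected layer has 50 * 4 * 4 = 800 features. That value sets the standard deviation `scale` of the normal initializer.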
4. Define the configured `MNIST` network inside `fluid.dygraph.guard()`. Even before any training, you can call the model inside `fluid.dygraph.guard()` and inspect its output:
```python
with fluid.dygraph.guard():
    mnist = MNIST("mnist")
    data = next(iter(train_reader()))  # take only the first batch
    dy_x_data = np.array(
        [x[0].reshape(1, 28, 28)
         for x in data]).astype('float32')
    img = fluid.dygraph.to_variable(dy_x_data)
    print("result is: {}".format(mnist(img).numpy()))
```
Output:
```
result is: [[0.10135901 0.1051138 0.1027941 ... 0.0972859 0.10221873 0.10165327]
[0.09735426 0.09970362 0.10198303 ... 0.10134517 0.10179105 0.10025002]
[0.09539858 0.10213123 0.09543551 ... 0.10613529 0.10535969 0.097991 ]
...
[0.10120598 0.0996111 0.10512722 ... 0.10067689 0.10088114 0.10071224]
[0.09889644 0.10033772 0.10151272 ... 0.10245881 0.09878646 0.101483 ]
[0.09097178 0.10078511 0.10198414 ... 0.10317434 0.10087223 0.09816764]]
```
5. Build the training loop. After each parameter update, call `mnist.clear_gradients()` to reset the gradients:
```python
with fluid.dygraph.guard():
    epoch_num = 5
    BATCH_SIZE = 64
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
    mnist = MNIST("mnist")
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001)
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                  for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(-1, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)

            cost = mnist(img)

            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            # compare integers with !=, not with the identity operator
            if batch_id % 100 == 0 and batch_id != 0:
                print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
            avg_loss.backward()
            adam.minimize(avg_loss)
            mnist.clear_gradients()
```
6. Variables and optimizers
@@ -319,217 +321,238 @@ DyGraph works well with NumPy: `fluid.dygraph.to_variable(x)
After the backward pass, call the `minimize` method of the `Adam` optimizer object defined earlier to update the parameters:
```python
with fluid.dygraph.guard():
    epoch_num = 5
    BATCH_SIZE = 64

    mnist = MNIST("mnist")
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001)
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

    np.set_printoptions(precision=3, suppress=True)
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array(
                [x[0].reshape(1, 28, 28)
                 for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True

            cost = mnist(img)
            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            dy_out = avg_loss.numpy()

            avg_loss.backward()
            adam.minimize(avg_loss)
            mnist.clear_gradients()

            dy_param_value = {}
            for param in mnist.parameters():
                dy_param_value[param.name] = param.numpy()

            if batch_id % 20 == 0:
                print("Loss at step {}: {}".format(batch_id, avg_loss.numpy()))
    print("Final loss: {}".format(avg_loss.numpy()))
    print("_simple_img_conv_pool_1_conv2d W's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._filter_param.numpy().mean()))
    print("_simple_img_conv_pool_1_conv2d Bias's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._bias_param.numpy().mean()))
```
Output:
```
Loss at step 0: [2.302]
Loss at step 20: [1.616]
Loss at step 40: [1.244]
Loss at step 60: [1.142]
Loss at step 80: [0.911]
Loss at step 100: [0.824]
Loss at step 120: [0.774]
Loss at step 140: [0.626]
Loss at step 160: [0.609]
Loss at step 180: [0.627]
Loss at step 200: [0.466]
Loss at step 220: [0.499]
Loss at step 240: [0.614]
Loss at step 260: [0.585]
Loss at step 280: [0.503]
Loss at step 300: [0.423]
Loss at step 320: [0.509]
Loss at step 340: [0.348]
Loss at step 360: [0.452]
Loss at step 380: [0.397]
Loss at step 400: [0.54]
Loss at step 420: [0.341]
Loss at step 440: [0.337]
Loss at step 460: [0.155]
Final loss: [0.164]
_simple_img_conv_pool_1_conv2d W's mean is: 0.00606656912714
_simple_img_conv_pool_1_conv2d Bias's mean is: -3.4576318285e-05
```
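The first logged loss of about 2.30 is close to what an untrained 10-class classifier gives when it predicts nearly uniform probabilities. The pure-Python sketch below illustrates that relationship. `cross_entropy` is a hypothetical helper that mirrors the usual definition (negative log of the probability assigned to the true class), and the labels are made up for the example.

```python
import math


def cross_entropy(probs, label):
    # Negative log-probability assigned to the true class.
    return -math.log(probs[label])


num_classes = 10
uniform = [1.0 / num_classes] * num_classes   # an untrained, undecided model
batch_labels = [3, 7, 0, 9]                   # made-up labels for illustration

losses = [cross_entropy(uniform, y) for y in batch_labels]
avg_loss = sum(losses) / len(losses)          # what fluid.layers.mean would average

print(avg_loss, math.log(num_classes))
```

With uniform predictions, the average loss equals ln(10) whatever the labels are. A starting loss near that value therefore suggests the softmax output begins close to uniform, and the later decrease reflects learning.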
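7. Performance
When using `fluid.dygraph.guard()`, you can pass `fluid.CUDAPlace(0)` or `fluid.CPUPlace()` to choose the device DyGraph runs on. If you pass nothing, it normally adapts to your device automatically.
## Training a model with multiple GPUs
PaddlePaddle currently supports multi-GPU training through multiple processes, with one process per GPU. During training, the first time a forward operation that needs parameters is executed, the parameters on GPU 0 are broadcast to the other GPUs so that all GPUs hold identical parameters. After the backward pass, the resulting parameter gradients are aggregated across all GPUs. Finally, each GPU updates its parameters separately.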
```python
epoch_num = 5
BATCH_SIZE = 64

place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
with fluid.dygraph.guard(place):

    strategy = fluid.dygraph.parallel.prepare_context()
    mnist = MNIST("mnist")
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001)
    mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)

    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
    train_reader = fluid.contrib.reader.distributed_batch_reader(
        train_reader)

    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                  for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(-1, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True

            cost, acc = mnist(img, label)

            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            avg_loss = mnist.scale_loss(avg_loss)
            avg_loss.backward()
            mnist.apply_collective_grads()

            adam.minimize(avg_loss)
            mnist.clear_gradients()
            # compare integers with !=, not with the identity operator
            if batch_id % 100 == 0 and batch_id != 0:
                print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
```
Converting single-GPU dynamic-graph training to multi-GPU training requires four main changes:
1. Get the device ID from the environment variables:
```python
place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
```
2. Preprocess the original model:
```python
strategy = fluid.dygraph.parallel.prepare_context()
mnist = MNIST("mnist")
adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001)
mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)
```
3. Read the data. Each process must read different data: the intersection of the data read by all processes must be empty, and their union must be the complete dataset:
```python
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
train_reader = fluid.contrib.reader.distributed_batch_reader(
    train_reader)
```
4. Adjust the loss and aggregate the parameter gradients:
```python
avg_loss = mnist.scale_loss(avg_loss)
avg_loss.backward()
mnist.apply_collective_grads()
```
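Changes 3 and 4 above can be illustrated with a small simulation that needs neither GPUs nor PaddlePaddle. This is a conceptual sketch under explicit assumptions, not Paddle's implementation: it assumes batches are handed to processes round-robin, that scaling the loss divides it by the number of processes, and that gradient aggregation is a sum across processes. The toy model (a single weight `w` fitted to scalar targets) and all names are hypothetical.

```python
nranks = 4
samples = list(range(16))                       # toy dataset of scalar targets
batches = [samples[i:i + 2] for i in range(0, len(samples), 2)]

# Assumed sharding scheme: process r takes batches r, r + nranks, r + 2*nranks, ...
shards = [batches[rank::nranks] for rank in range(nranks)]

# The shards are disjoint and together cover the whole dataset.
seen = sorted(t for shard in shards for batch in shard for t in batch)


def grad(w, batch):
    # Gradient of mean((w - t)^2) over the batch with respect to w.
    return sum(2.0 * (w - t) for t in batch) / len(batch)


w = 0.5
step = 0
local_grads = [grad(w, shards[r][step]) for r in range(nranks)]

# Dividing the loss by nranks divides each local gradient by nranks ...
scaled_grads = [g / nranks for g in local_grads]
# ... so summing across processes yields the average gradient.
aggregated = sum(scaled_grads)

# Reference: one process seeing the combined batch of all four processes.
global_batch = [t for r in range(nranks) for t in shards[r][step]]
reference = grad(w, global_batch)

print(aggregated, reference)
```

When every process uses the same batch size, scaling before the sum makes the aggregated gradient equal to the gradient of one large batch. Each process can then apply the same update, which keeps the parameter copies in sync.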
When launching Paddle dynamic-graph multi-process, multi-GPU training, you must specify the GPUs to use. For example, to use GPUs `0,1,2,3`, launch as follows:
```
python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py
```
The output is:
```
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: ./mylog
node_ip: 127.0.0.1
print_config: True
selected_gpus: 0,1,2,3
started_port: 6170
training_script: train.py
training_script_args: ['--use_data_parallel', '1']
use_paddlecloud: True
------------------------------------------------
trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
```
The program then writes each process's output log to the `./mylog` directory:
```
.
├── mylog
│   ├── workerlog.0
│   ├── workerlog.1
│   ├── workerlog.2
│   └── workerlog.3
└── train.py
```
If `--log_dir` is not specified, the program prints the output of all processes:
```
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: 0,1,2,3
started_port: 6170
training_script: train.py
training_script_args: ['--use_data_parallel', '1']
use_paddlecloud: True
------------------------------------------------
trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
I0923 09:32:36.423513 56410 nccl_context.cc:120] init nccl context nranks: 4 local rank: 1 gpu id: 1
I0923 09:32:36.425287 56411 nccl_context.cc:120] init nccl context nranks: 4 local rank: 2 gpu id: 2
I0923 09:32:36.429337 56409 nccl_context.cc:120] init nccl context nranks: 4 local rank: 0 gpu id: 0
I0923 09:32:36.429440 56412 nccl_context.cc:120] init nccl context nranks: 4 local rank: 3 gpu id: 3
W0923 09:32:42.594097 56412 device_context.cc:198] Please NOTE: device: 3, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.605836 56412 device_context.cc:206] device: 3, cuDNN Version: 7.5.
W0923 09:32:42.632463 56410 device_context.cc:198] Please NOTE: device: 1, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.637948 56410 device_context.cc:206] device: 1, cuDNN Version: 7.5.
W0923 09:32:42.648674 56411 device_context.cc:198] Please NOTE: device: 2, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:42.654021 56411 device_context.cc:206] device: 2, cuDNN Version: 7.5.
W0923 09:32:43.048696 56409 device_context.cc:198] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
W0923 09:32:43.053236 56409 device_context.cc:206] device: 0, cuDNN Version: 7.5.
start data reader (trainers_num: 4, trainer_id: 2)
start data reader (trainers_num: 4, trainer_id: 3)
start data reader (trainers_num: 4, trainer_id: 1)
start data reader (trainers_num: 4, trainer_id: 0)
Loss at epoch 0 step 0: [0.57390565]
Loss at epoch 0 step 0: [0.57523954]
Loss at epoch 0 step 0: [0.575606]
Loss at epoch 0 step 0: [0.5767452]
```
## Saving model parameters
@@ -549,63 +572,66 @@ When launching Paddle dynamic-graph multi-process, multi-GPU training, you must specify the GPUs to use,
The code below shows how to save parameters in the handwritten digit recognition task and then load the saved parameters to continue training.
```python
import numpy as np
import paddle
import paddle.fluid as fluid

# MNIST is the model class defined earlier in this document.
with fluid.dygraph.guard():
    epoch_num = 5
    BATCH_SIZE = 64

    mnist = MNIST("mnist")
    adam = fluid.optimizer.Adam(learning_rate=0.001)
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

    np.set_printoptions(precision=3, suppress=True)
    dy_param_init_value = {}
    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array(
                [x[0].reshape(1, 28, 28)
                 for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True

            cost = mnist(img)
            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            dy_out = avg_loss.numpy()

            avg_loss.backward()
            adam.minimize(avg_loss)
            # save the parameters (and optimizer state) once, at step 20
            if batch_id == 20:
                fluid.dygraph.save_persistables(mnist.state_dict(), "save_dir", adam)
            mnist.clear_gradients()

            if batch_id == 20:
                # record the current values, then load the checkpoint back
                for param in mnist.parameters():
                    dy_param_init_value[param.name] = param.numpy()
                model, _ = fluid.dygraph.load_persistables("save_dir")
                mnist.load_dict(model)
                break
        if epoch == 0:
            break
    restore = mnist.parameters()

    # check save and load: restored values must match and be finite
    success = True
    for value in restore:
        if (not np.array_equal(value.numpy(), dy_param_init_value[value.name])) or (
                not np.isfinite(value.numpy()).all()) or np.isnan(value.numpy()).any():
            success = False
    print("model save and load success? {}".format(success))
```
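The final loop in the example above verifies that every restored parameter equals the value recorded before loading and contains no NaN or infinite entries. A minimal NumPy-only sketch of that kind of check follows. `params_match` and the sample dictionaries are hypothetical names used for illustration and are not part of the Paddle API. The finiteness test is applied element-wise first and only then reduced with `.all()`.

```python
import numpy as np


def params_match(saved, restored):
    """Return True if both dicts hold identical, finite arrays under the same names."""
    if saved.keys() != restored.keys():
        return False
    for name, value in restored.items():
        if not np.array_equal(value, saved[name]):
            return False
        # Test each element first, then reduce to a single boolean.
        if not np.isfinite(value).all():
            return False
    return True


saved = {"fc.w": np.ones((2, 2), dtype=np.float32)}
good = {"fc.w": np.ones((2, 2), dtype=np.float32)}
bad = {"fc.w": np.array([[1.0, np.nan], [1.0, 1.0]], dtype=np.float32)}

print(params_match(saved, good))  # True
print(params_match(saved, bad))   # False
```

Reducing before testing (for example `np.isfinite(arr.all())`) only inspects a single boolean, so it cannot detect a NaN or infinity inside the array.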
Note that with multi-GPU training only one process needs to save the model parameters, so when saving you must specify which process does it, for example:
```python
if fluid.dygraph.parallel.Env().local_rank == 0:
    fluid.dygraph.save_persistables(mnist.state_dict(), "save_dir")
```
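The rank guard above can be illustrated without Paddle. The sketch below simulates four trainer processes in a single loop and writes a checkpoint file only for rank 0. `save_on_rank_zero`, the pickle format, and the file name `params.pkl` are assumptions made for this illustration, not what `save_persistables` does internally.

```python
import os
import pickle
import tempfile


def save_on_rank_zero(local_rank, state, path):
    """Write `state` to `path` only when called by rank 0; return whether it wrote."""
    if local_rank != 0:
        return False
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return True


with tempfile.TemporaryDirectory() as save_dir:
    path = os.path.join(save_dir, "params.pkl")
    # Simulate four trainer processes running the same code.
    wrote = [save_on_rank_zero(rank, {"w": [1.0, 2.0]}, path) for rank in range(4)]
    print(wrote)  # [True, False, False, False]
```

Letting a single process write avoids several processes racing to write the same files.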
## Model evaluation
...@@ -617,168 +643,169 @@ When launching multi-process multi-GPU DyGraph training in Paddle, you need to specify the GPUs to use,
We open another `fluid.dygraph.guard()` inside `inference_mnist` and, within that context, load the previously saved checkpoint to run prediction. As before, call `YourModel.eval()` to switch to evaluation mode before predicting.
```python
import os

import numpy as np
import paddle
import paddle.fluid as fluid
from PIL import Image


def test_mnist(reader, model, batch_size):
    acc_set = []
    avg_loss_set = []
    for batch_id, data in enumerate(reader()):
        dy_x_data = np.array([x[0].reshape(1, 28, 28)
                              for x in data]).astype('float32')
        y_data = np.array(
            [x[1] for x in data]).astype('int64').reshape(batch_size, 1)

        img = fluid.dygraph.to_variable(dy_x_data)
        label = fluid.dygraph.to_variable(y_data)
        label.stop_gradient = True
        prediction, acc = model(img, label)
        loss = fluid.layers.cross_entropy(input=prediction, label=label)
        avg_loss = fluid.layers.mean(loss)
        acc_set.append(float(acc.numpy()))
        avg_loss_set.append(float(avg_loss.numpy()))

    # get test acc and loss
    acc_val_mean = np.array(acc_set).mean()
    avg_loss_val_mean = np.array(avg_loss_set).mean()

    return avg_loss_val_mean, acc_val_mean


def inference_mnist():
    with fluid.dygraph.guard():
        mnist_infer = MNIST("mnist")
        # load checkpoint
        model_dict, _ = fluid.dygraph.load_persistables("save_dir")
        mnist_infer.load_dict(model_dict)
        print("checkpoint loaded")

        # start evaluate mode
        mnist_infer.eval()

        def load_image(file):
            im = Image.open(file).convert('L')
            # Image.LANCZOS is the same filter as the older alias Image.ANTIALIAS
            im = im.resize((28, 28), Image.LANCZOS)
            im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)
            # rescale pixel values from [0, 255] to [-1, 1]
            im = im / 255.0 * 2.0 - 1.0
            return im

        cur_dir = os.path.dirname(os.path.realpath(__file__))
        tensor_img = load_image(cur_dir + '/image/infer_3.png')

        results = mnist_infer(fluid.dygraph.to_variable(tensor_img))
        lab = np.argsort(results.numpy())
        print("Inference result of image/infer_3.png is: %d" % lab[0][-1])


with fluid.dygraph.guard():
    epoch_num = 1
    BATCH_SIZE = 64
    mnist = MNIST("mnist")
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001)
    test_reader = paddle.batch(
        paddle.dataset.mnist.test(), batch_size=BATCH_SIZE, drop_last=True)

    train_reader = paddle.batch(
        paddle.dataset.mnist.train(),
        batch_size=BATCH_SIZE,
        drop_last=True)

    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                  for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(-1, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True

            cost, acc = mnist(img, label)

            loss = fluid.layers.cross_entropy(cost, label)
            avg_loss = fluid.layers.mean(loss)

            avg_loss.backward()

            adam.minimize(avg_loss)
            # reset gradients before the next step
            mnist.clear_gradients()
            if batch_id % 100 == 0:
                print("Loss at epoch {} step {}: {:}".format(
                    epoch, batch_id, avg_loss.numpy()))

        # evaluate on the test set, then switch back to training mode
        mnist.eval()
        test_cost, test_acc = test_mnist(test_reader, mnist, BATCH_SIZE)
        mnist.train()
        print("Loss at epoch {} , Test avg_loss is: {}, acc is: {}".format(
            epoch, test_cost, test_acc))

    # save checkpoint
    fluid.dygraph.save_persistables(mnist.state_dict(), "save_dir")
    print("checkpoint saved")

    inference_mnist()
```
Output:

```
Loss at epoch 0 step 0: [2.2991252]
Loss at epoch 0 step 100: [0.15491392]
Loss at epoch 0 step 200: [0.13315125]
Loss at epoch 0 step 300: [0.10253005]
Loss at epoch 0 step 400: [0.04266362]
Loss at epoch 0 step 500: [0.08894891]
Loss at epoch 0 step 600: [0.08999012]
Loss at epoch 0 step 700: [0.12975612]
Loss at epoch 0 step 800: [0.15257305]
Loss at epoch 0 step 900: [0.07429226]
Loss at epoch 0 , Test avg_loss is: 0.05995981965082674, acc is: 0.9794671474358975
checkpoint saved
No optimizer loaded. If you didn't save optimizer, please ignore this. The program can still work with new optimizer.
checkpoint loaded
Inference result of image/infer_3.png is: 3
```
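Two steps in `inference_mnist` are easy to check in isolation: the image is rescaled from 8-bit pixel values to the range [-1, 1], and the predicted digit is read from the last index returned by `np.argsort`. The NumPy-only sketch below shows both. The pixel values and the probability vector are made up for illustration and are not model output.

```python
import numpy as np

# Rescale 8-bit grayscale pixels from [0, 255] to [-1, 1].
pixels = np.array([0, 127.5, 255], dtype=np.float32)
scaled = pixels / 255.0 * 2.0 - 1.0
print(scaled.tolist())  # [-1.0, 0.0, 1.0]

# Hypothetical class probabilities for one image over the 10 digit classes.
probs = np.array([[0.01, 0.02, 0.05, 0.80, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]])
order = np.argsort(probs)  # indices sorted by ascending probability, per row
top1 = int(order[0][-1])   # the last index is the most likely class
print(top1)  # 3
```

When only the top prediction is needed, `np.argmax(probs, axis=1)` gives the same index without sorting the whole row.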
## Writing compatible models
Taking the handwritten digit recognition example from the previous step, the DyGraph model code can be reused unchanged as static graph model code. To run it, use PaddlePaddle's static graph execution; the example here uses the static graph `Executor`.
```python
import numpy as np
import paddle
import paddle.fluid as fluid

# MNIST is the same model class used in the DyGraph examples above.
epoch_num = 1
BATCH_SIZE = 64
exe = fluid.Executor(fluid.CPUPlace())

mnist = MNIST("mnist")
sgd = fluid.optimizer.SGDOptimizer(learning_rate=1e-3)
train_reader = paddle.batch(
    paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
img = fluid.layers.data(
    name='pixel', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
cost = mnist(img)
loss = fluid.layers.cross_entropy(cost, label)
avg_loss = fluid.layers.mean(loss)
sgd.minimize(avg_loss)

# initialize parameters
out = exe.run(fluid.default_startup_program())

for epoch in range(epoch_num):
    for batch_id, data in enumerate(train_reader()):
        static_x_data = np.array(
            [x[0].reshape(1, 28, 28)
             for x in data]).astype('float32')
        y_data = np.array(
            [x[1] for x in data]).astype('int64').reshape([BATCH_SIZE, 1])

        fetch_list = [avg_loss.name]
        out = exe.run(
            fluid.default_main_program(),
            feed={"pixel": static_x_data,
                  "label": y_data},
            fetch_list=fetch_list)

        static_out = out[0]

        # compare by value; `is not 0` tests identity and is unreliable for ints
        if batch_id % 100 == 0 and batch_id != 0:
            print("epoch: {}, batch_id: {}, loss: {}".format(epoch, batch_id, static_out))
```
\ No newline at end of file