Unverified commit 3e91a34d, authored by Chen Long, committed by GitHub

add_tutorial test=develop (#2582)

* add_tutorial test=develop

* fix quick start test=develop
Parent 7b69a251
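Image Classification with a Convolutional Neural Network
========================================================
This tutorial shows how to use a convolutional neural network in PaddlePaddle to classify images. It is a fairly simple example: a network built from three convolutional layers classifies the images of the `cifar10 <https://www.cs.toronto.edu/~kriz/cifar.html>`__ dataset.

Environment Setup
-----------------
We use the PaddlePaddle 2.0 beta release.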
.. code:: ipython3

    import paddle
    import paddle.nn.functional as F
    from paddle.vision.transforms import Normalize
    import numpy as np
    import matplotlib.pyplot as plt

    paddle.disable_static()
    print(paddle.__version__)
    print(paddle.__git_commit__)

.. parsed-literal::
0.0.0
264e76cae6861ad9b1d4bcd8c3212f7a78c01e4d
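Loading the Dataset
-------------------
We use the PaddlePaddle API to download the dataset and prepare the data for the training task that follows. cifar10 consists of 60000 colour images of 32 × 32 pixels: 50000 of them form the training set and the remaining 10000 form the test set. The images fall into 10 classes, and our task is to train a model that assigns each image to the correct class.

.. code:: ipython3

    cifar10_train = paddle.vision.datasets.cifar.Cifar10(mode='train', transform=None)
    train_images = np.zeros((50000, 32, 32, 3), dtype='float32')
    train_labels = np.zeros((50000, 1), dtype='int32')

    for i, data in enumerate(cifar10_train):
        train_image, train_label = data
        # Reshape the sample to (C, H, W) and scale the pixel values to [0, 1]
        train_image = train_image.reshape((3, 32, 32)).astype('float32') / 255.
        # Move the channel axis last, (H, W, C), for plotting with matplotlib
        train_image = train_image.transpose(1, 2, 0)
        train_images[i, :, :, :] = train_image
        train_labels[i, 0] = train_label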
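
The reshape to ``(3, 32, 32)`` followed by ``transpose(1, 2, 0)`` can be read as an index mapping. The sketch below is only an illustration in plain Python: it assumes, as the reshape above implies, that each sample is stored as a flat channel-major buffer, and ``chw_flat_to_hwc`` is a hypothetical helper, not part of PaddlePaddle.

```python
# Illustrative sketch; chw_flat_to_hwc is a hypothetical helper name.
HEIGHT, WIDTH, CHANNELS = 32, 32, 3


def chw_flat_to_hwc(flat_index):
    """Return (row, col, channel) for a value stored at `flat_index`
    in a flat, channel-major (C, H, W) buffer."""
    channel, rest = divmod(flat_index, HEIGHT * WIDTH)
    row, col = divmod(rest, WIDTH)
    return row, col, channel


# Number of values per image and the size of the float32 training array
values_per_image = CHANNELS * HEIGHT * WIDTH
train_array_bytes = 50000 * values_per_image * 4  # float32 uses 4 bytes
```

Under these assumptions the first 1024 values of a sample fill channel 0, the next 1024 fill channel 1, and so on, and the ``train_images`` array occupies roughly 614 MB.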
Exploring the Dataset
---------------------
Next we pick some images from the dataset at random and display them to get an intuitive feel for the data.

.. code:: ipython3

    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

    plt.figure(figsize=(10,10))
    sample_idxs = np.random.choice(50000, size=25, replace=False)

    for i in range(25):
        plt.subplot(5, 5, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.imshow(train_images[sample_idxs[i]], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[sample_idxs[i]][0]])
    plt.show()

.. image:: convnet_image_classification_files/convnet_image_classification_6_0.png
Building the Network
--------------------
Next we define a classification network made of three 2-D convolutions (``Conv2d``), each followed by a ``relu`` activation, two 2-D max-pooling layers (``MaxPool2d``), and two linear layers. It maps an image, fed in channel-first form with shape ``(3, 32, 32)``, to 10 outputs, one for each of the 10 classes.

.. code:: ipython3

    class MyNet(paddle.nn.Layer):
        def __init__(self, num_classes=1):
            super(MyNet, self).__init__()

            self.conv1 = paddle.nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3))
            self.pool1 = paddle.nn.MaxPool2d(kernel_size=2, stride=2)

            self.conv2 = paddle.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3,3))
            self.pool2 = paddle.nn.MaxPool2d(kernel_size=2, stride=2)

            self.conv3 = paddle.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3,3))

            self.flatten = paddle.nn.Flatten()

            self.linear1 = paddle.nn.Linear(in_features=1024, out_features=64)
            self.linear2 = paddle.nn.Linear(in_features=64, out_features=num_classes)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.pool1(x)

            x = self.conv2(x)
            x = F.relu(x)
            x = self.pool2(x)

            x = self.conv3(x)
            x = F.relu(x)

            x = self.flatten(x)
            x = self.linear1(x)
            x = F.relu(x)
            x = self.linear2(x)
            return x
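
The value ``in_features=1024`` of the first linear layer follows from the spatial size of the feature map after the last convolution. The sketch below traces that arithmetic in plain Python. It assumes the convolutions use no padding and stride 1 and that pooling rounds down, which matches the constructor arguments shown above; ``conv_out`` and ``pool_out`` are hypothetical helper names.

```python
# Illustrative sketch; conv_out and pool_out are hypothetical helpers.
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output size of a pooling window along one spatial axis (floor mode)."""
    return (size - kernel) // stride + 1


size = 32
size = conv_out(size, 3)      # conv1: 32 -> 30
size = pool_out(size, 2, 2)   # pool1: 30 -> 15
size = conv_out(size, 3)      # conv2: 15 -> 13
size = pool_out(size, 2, 2)   # pool2: 13 -> 6
size = conv_out(size, 3)      # conv3: 6 -> 4

# conv3 has 64 output channels, so flattening yields 64 * 4 * 4 values
flattened_features = 64 * size * size
```

If the kernel sizes, strides or input resolution change, ``in_features`` has to be recomputed in the same way.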
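Training the Model
------------------
Next we train the model in a loop. We will:

- use the ``paddle.optimizer.Adam`` optimizer to update the parameters;
- use ``F.softmax_with_cross_entropy`` to compute the loss;
- use ``paddle.io.DataLoader`` to load the data and assemble batches.
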
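
To make the loss concrete, here is a minimal plain-Python sketch of softmax followed by cross-entropy for a single sample. It is an illustration of the formula only, not PaddlePaddle's implementation, and ``softmax_cross_entropy`` is a hypothetical name.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]


# Ten equal logits: the model has no preference, so the loss is ln(10)
uniform_loss = softmax_cross_entropy([0.0] * 10, label=3)

# One dominant logit on the correct class: the loss is close to zero
confident_loss = softmax_cross_entropy([0.0] * 9 + [10.0], label=9)
```

A 10-class model that has not learned anything yet therefore starts near ln(10) ≈ 2.3026, and the loss falls as the logit of the correct class grows relative to the others.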
.. code:: ipython3

    epoch_num = 10
    batch_size = 32
    learning_rate = 0.001

.. code:: ipython3

    val_acc_history = []
    val_loss_history = []

    def train(model):
        print('start training ... ')
        # turn into training mode
        model.train()

        opt = paddle.optimizer.Adam(learning_rate=learning_rate,
                                    parameters=model.parameters())

        train_loader = paddle.io.DataLoader(cifar10_train,
                                            places=paddle.CPUPlace(),
                                            shuffle=True,
                                            batch_size=batch_size)

        cifar10_test = paddle.vision.datasets.cifar.Cifar10(mode='test', transform=None)
        valid_loader = paddle.io.DataLoader(cifar10_test, places=paddle.CPUPlace(), batch_size=batch_size)

        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_loader()):
                x_data = paddle.cast(data[0], 'float32')
                x_data = paddle.reshape(x_data, (-1, 3, 32, 32)) / 255.0

                y_data = paddle.cast(data[1], 'int64')
                y_data = paddle.reshape(y_data, (-1, 1))

                logits = model(x_data)
                loss = F.softmax_with_cross_entropy(logits, y_data)
                avg_loss = paddle.mean(loss)

                if batch_id % 1000 == 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

            # evaluate model after one epoch
            model.eval()
            accuracies = []
            losses = []
            for batch_id, data in enumerate(valid_loader()):
                x_data = paddle.cast(data[0], 'float32')
                x_data = paddle.reshape(x_data, (-1, 3, 32, 32)) / 255.0

                y_data = paddle.cast(data[1], 'int64')
                y_data = paddle.reshape(y_data, (-1, 1))

                logits = model(x_data)
                loss = F.softmax_with_cross_entropy(logits, y_data)
                acc = paddle.metric.accuracy(logits, y_data)
                accuracies.append(np.mean(acc.numpy()))
                losses.append(np.mean(loss.numpy()))

            avg_acc, avg_loss = np.mean(accuracies), np.mean(losses)
            print("[validation] accuracy/loss: {}/{}".format(avg_acc, avg_loss))
            val_acc_history.append(avg_acc)
            val_loss_history.append(avg_loss)
            model.train()

    model = MyNet(num_classes=10)
    train(model)

.. parsed-literal::
start training ...
epoch: 0, batch_id: 0, loss is: [2.3024805]
epoch: 0, batch_id: 1000, loss is: [1.1422595]
[validation] accuracy/loss: 0.5575079917907715/1.2516425848007202
epoch: 1, batch_id: 0, loss is: [0.9350736]
epoch: 1, batch_id: 1000, loss is: [1.3825703]
[validation] accuracy/loss: 0.5959464907646179/1.1320706605911255
epoch: 2, batch_id: 0, loss is: [0.979844]
epoch: 2, batch_id: 1000, loss is: [0.87730503]
[validation] accuracy/loss: 0.6607428193092346/0.9754576086997986
epoch: 3, batch_id: 0, loss is: [0.7345351]
epoch: 3, batch_id: 1000, loss is: [1.0982555]
[validation] accuracy/loss: 0.6671326160430908/0.9667007327079773
epoch: 4, batch_id: 0, loss is: [0.9291839]
epoch: 4, batch_id: 1000, loss is: [1.1812104]
[validation] accuracy/loss: 0.6895966529846191/0.9075900316238403
epoch: 5, batch_id: 0, loss is: [0.5072213]
epoch: 5, batch_id: 1000, loss is: [0.60360587]
[validation] accuracy/loss: 0.6944888234138489/0.8740479350090027
epoch: 6, batch_id: 0, loss is: [0.5917944]
epoch: 6, batch_id: 1000, loss is: [0.7963876]
[validation] accuracy/loss: 0.7072683572769165/0.8597638607025146
epoch: 7, batch_id: 0, loss is: [0.50116754]
epoch: 7, batch_id: 1000, loss is: [0.95844793]
[validation] accuracy/loss: 0.700579047203064/0.876727819442749
epoch: 8, batch_id: 0, loss is: [0.87496114]
epoch: 8, batch_id: 1000, loss is: [0.68749857]
[validation] accuracy/loss: 0.7198482155799866/0.8403064608573914
epoch: 9, batch_id: 0, loss is: [0.8548105]
epoch: 9, batch_id: 1000, loss is: [0.6488569]
[validation] accuracy/loss: 0.7106629610061646/0.874437153339386
.. code:: ipython3

    plt.plot(val_acc_history, label = 'validation accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.ylim([0.5, 0.8])
    plt.legend(loc='lower right')

.. parsed-literal::
<matplotlib.legend.Legend at 0x163d6ec50>
.. image:: convnet_image_classification_files/convnet_image_classification_12_1.png
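
The validation accuracies printed in the training log can also be summarised numerically. The sketch below copies them from that log, rounded to four decimals; the variable names are chosen here for illustration and are not part of the tutorial code.

```python
# Validation accuracy per epoch, copied from the training log and rounded
val_acc_log = [0.5575, 0.5959, 0.6607, 0.6671, 0.6896,
               0.6945, 0.7073, 0.7006, 0.7198, 0.7107]

# Epoch (0-based) with the highest validation accuracy
best_epoch = max(range(len(val_acc_log)), key=val_acc_log.__getitem__)
best_accuracy = val_acc_log[best_epoch]
final_accuracy = val_acc_log[-1]
```

In this run the best validation accuracy is reached at epoch 8 rather than at the last epoch, so keeping the best checkpoint instead of the final one is a reasonable option to consider.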
The End
-------
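As the example above shows, a simple convolutional neural network built with PaddlePaddle reaches an accuracy of a little over 71% on the cifar10 dataset.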
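Image Search Based on Image Similarity
========================================
Overview
--------
Image search is a deep learning application with a wide range of uses. Whether the task is retrieving engineering drawings or finding similar pictures on the internet, deep learning algorithms can take a given image and retrieve images similar to it with good results.

This example briefly shows how to implement image search with the open-source PaddlePaddle framework. The basic idea is to first convert each image into a vector in a high-dimensional space with a convolutional neural network, and then to measure how similar the vector representations of two images are (in this example we use cosine similarity). During training, the objective is to make the similarity between images of the same class as high as possible and the similarity between images of different classes as low as possible. At prediction time, for an image uploaded by a user, the model computes its similarity to the images in the library and returns a list ordered from most to least similar as the search result.

Environment Setup
-----------------
This example is based on version 2.0 of the open-source PaddlePaddle framework.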
.. code:: ipython3

    import paddle
    import paddle.nn.functional as F
    import numpy as np
    import random
    import matplotlib.pyplot as plt
    from PIL import Image
    from collections import defaultdict

    paddle.disable_static()
    print(paddle.__version__)
    print(paddle.__git_commit__)

.. parsed-literal::
0.0.0
89af2088b6e74bdfeef2d4d78e08461ed2aafee5
Dataset
-------
This example uses the `CIFAR-10 <https://www.cs.toronto.edu/~kriz/cifar.html>`__ dataset, a classic dataset made up of 50000 training images and 10000 test images, each an RGB image 32 pixels high and 32 pixels wide. ``paddle.vision.datasets.cifar.Cifar10`` takes care of downloading the data and lets us access the samples in order; we then scale the pixel values into the ``[0, 1.0]`` range ourselves. The training and test data are stored in two separate ``numpy`` arrays for use in training and evaluation later.

.. code:: ipython3

    cifar10_train = paddle.vision.datasets.cifar.Cifar10(mode='train', transform=None)
    # float32 matches x_test and halves the memory of numpy's default float64
    x_train = np.zeros((50000, 3, 32, 32), dtype='float32')
    y_train = np.zeros((50000, 1), dtype='int32')

    for i in range(len(cifar10_train)):
        train_image, train_label = cifar10_train[i]
        train_image = train_image.reshape((3, 32, 32))

        # normalize the data
        x_train[i, :, :, :] = train_image / 255.
        y_train[i, 0] = train_label

    y_train = np.squeeze(y_train)

    print(x_train.shape)
    print(y_train.shape)

.. parsed-literal::
(50000, 3, 32, 32)
(50000,)
.. code:: ipython3

    cifar10_test = paddle.vision.datasets.cifar.Cifar10(mode='test', transform=None)
    x_test = np.zeros((10000, 3, 32, 32), dtype='float32')
    y_test = np.zeros((10000, 1), dtype='int64')

    for i in range(len(cifar10_test)):
        test_image, test_label = cifar10_test[i]
        test_image = test_image.reshape((3, 32, 32))

        # normalize the data
        x_test[i, :, :, :] = test_image / 255.
        y_test[i, 0] = test_label

    y_test = np.squeeze(y_test)

    print(x_test.shape)
    print(y_test.shape)

.. parsed-literal::
(10000, 3, 32, 32)
(10000,)
Exploring the Data
------------------
Next we pick some images from the training data at random and look at them.

.. code:: ipython3

    height_width = 32

    def show_collage(examples):
        box_size = height_width + 2
        num_rows, num_cols = examples.shape[:2]

        collage = Image.new(
            mode="RGB",
            size=(num_cols * box_size, num_rows * box_size),
            color=(255, 255, 255),
        )
        for row_idx in range(num_rows):
            for col_idx in range(num_cols):
                array = (np.array(examples[row_idx, col_idx]) * 255).astype(np.uint8)
                array = array.transpose(1,2,0)
                collage.paste(
                    Image.fromarray(array), (col_idx * box_size, row_idx * box_size)
                )

        collage = collage.resize((2 * num_cols * box_size, 2 * num_rows * box_size))
        return collage

    sample_idxs = np.random.randint(0, 50000, size=(5, 5))
    examples = x_train[sample_idxs]
    show_collage(examples)

.. image:: image_search_files/image_search_8_0.png
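Building the Training Data
--------------------------
Training samples for an image retrieval model differ from those of the usual classification task: a sample is not an ``(image, class)`` pair but a triple ``(image0, image1, similar_or_not)``. That is, each training sample consists of two images, and its ``label`` is a flag (0 or 1) that states whether the two images are similar.

Naturally, two images from the same class count as similar, and two images from different classes count as dissimilar.

To make it easy to sample similar (and dissimilar) images, we first build an index that maps each class to all the images in that class.

.. code:: ipython3

    class_idx_to_train_idxs = defaultdict(list)
    for y_train_idx, y in enumerate(y_train):
        class_idx_to_train_idxs[y].append(y_train_idx)

    class_idx_to_test_idxs = defaultdict(list)
    for y_test_idx, y in enumerate(y_test):
        class_idx_to_test_idxs[y].append(y_test_idx)
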
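With the index above we can prepare a data iterator for PaddlePaddle. Each iteration yields ``2 * number of classes`` images, which is 20 images for CIFAR10. The first 10 images and the last 10 images each contain one randomly drawn image per class. During training this gives us 10 similar pairs and 90 dissimilar pairs: each of the first 10 images is similar to the image at the same position among the last 10 and dissimilar to the other 9.
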
.. code:: ipython3

    num_classes = 10

    def reader_creator(num_batchs):
        def reader():
            iter_step = 0
            while True:
                if iter_step >= num_batchs:
                    break
                iter_step += 1
                x = np.empty((2, num_classes, 3, height_width, height_width), dtype=np.float32)
                for class_idx in range(num_classes):
                    examples_for_class = class_idx_to_train_idxs[class_idx]
                    anchor_idx = random.choice(examples_for_class)
                    positive_idx = random.choice(examples_for_class)
                    while positive_idx == anchor_idx:
                        positive_idx = random.choice(examples_for_class)
                    x[0, class_idx] = x_train[anchor_idx]
                    x[1, class_idx] = x_train[positive_idx]
                yield x

        return reader

    # num_batchs: how many batchs to generate
    def anchor_positive_pairs(num_batchs=100):
        return reader_creator(num_batchs)

.. code:: ipython3

    pairs_train_reader = anchor_positive_pairs(num_batchs=1000)

We take the first batch of images and visualise it, as shown below, which makes the composition of a training sample easier to understand.

.. code:: ipython3

    examples = next(pairs_train_reader())
    print(examples.shape)
    show_collage(examples)

.. parsed-literal::
(2, 10, 3, 32, 32)
.. image:: image_search_files/image_search_15_1.png
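
The split into similar and dissimilar pairs within one batch can be checked with a few lines of plain Python. This is only an illustrative count, assuming one anchor and one positive image per class as produced by the reader above; ``pair_is_similar`` is a name made up for this sketch.

```python
num_classes = 10

# Entry (i, j) pairs anchor i with positive j.
# The two images share a class only when i == j.
pair_is_similar = [[i == j for j in range(num_classes)]
                   for i in range(num_classes)]

similar_pairs = sum(row.count(True) for row in pair_is_similar)
dissimilar_pairs = num_classes * num_classes - similar_pairs
```

The similar pairs sit on the diagonal of this 10 × 10 grid, which is why the diagonal positions can later serve as class labels.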
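A Network That Maps Images to High-Dimensional Vectors
--------------------------------------------------------
Our goal is to first convert each image into a representation in a high-dimensional space and then compute the similarity of images in that space.

The network below converts an image of shape ``(3, 32, 32)`` into a vector of shape ``(8,)``. Some sources call this vector an ``Embedding``; note that it is not the same thing as the word embeddings used in natural language processing.

The model consists of three consecutive convolutions followed by global average pooling, and then a fully connected linear layer that maps into a vector space of dimension 8. To simplify the later computation of cosine similarity, we also normalise the output with `l2_normalize <https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/l2_normalize_cn.html>`__ at the end, which takes care of the denominator of the cosine similarity.
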
.. code:: ipython3

    class MyNet(paddle.nn.Layer):
        def __init__(self):
            super(MyNet, self).__init__()

            self.conv1 = paddle.nn.Conv2d(in_channels=3,
                                          out_channels=32,
                                          kernel_size=(3, 3),
                                          stride=2)

            self.conv2 = paddle.nn.Conv2d(in_channels=32,
                                          out_channels=64,
                                          kernel_size=(3,3),
                                          stride=2)

            self.conv3 = paddle.nn.Conv2d(in_channels=64,
                                          out_channels=128,
                                          kernel_size=(3,3),
                                          stride=2)

            # Global average pooling reduces each feature map to a single value
            self.global_pool = paddle.nn.AdaptiveAvgPool2d((1,1))

            self.fc1 = paddle.nn.Linear(in_features=128, out_features=8)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = self.conv3(x)
            x = F.relu(x)
            x = self.global_pool(x)
            x = paddle.squeeze(x, axis=[2, 3])
            x = self.fc1(x)
            # Unit-length embeddings make the dot product equal to cosine similarity
            x = F.l2_normalize(x, axis=1)

            return x
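
Why L2 normalisation makes the later dot product a cosine similarity can be shown on two small vectors. The following plain-Python sketch is illustrative only; the helper functions are defined here and are not PaddlePaddle APIs.

```python
import math


def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(dot(vector, vector))
    return [v / norm for v in vector]


def cosine_similarity(a, b):
    """Textbook definition: dot product divided by the product of norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Dot product of the normalised vectors
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
```

Because both vectors are divided by their norms beforehand, the division in the cosine formula has already happened, and a single matrix multiplication over normalised embeddings yields all pairwise cosine similarities.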
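The training procedure, shown in the code below, works as follows:

- The ``inverse_temperature`` parameter scales the similarities so that softmax operates in a region where its gradients are more pronounced. (Compare the ``scale`` operation applied after the dot product in `Attention Is All You Need <https://arxiv.org/abs/1706.03762>`__.)
- The computation first uses the network above to obtain the high-dimensional representations of the first 10 images (the anchors) and of the last 10 images. It then uses `matmul <https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/matmul_cn.html>`__ to compute the similarity of each of the first 10 images to each of the last 10. (``similarities`` is therefore a ``(10, 10)`` Tensor.)
- The class labels passed to `softmax_with_cross_entropy <https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/softmax_with_cross_entropy_cn.html>`__ are accordingly the values 0 to ``num_classes - 1``, so that the learning objective pushes the similarity of similar images towards 1.0 and the similarity of dissimilar images towards -1.0.
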
.. code:: ipython3

    # Define the training procedure
    def train(model):
        print('start training ... ')
        model.train()

        inverse_temperature = paddle.to_tensor(np.array([1.0/0.2], dtype='float32'))

        epoch_num = 20

        opt = paddle.optimizer.Adam(learning_rate=0.0001,
                                    parameters=model.parameters())

        for epoch in range(epoch_num):
            for batch_id, data in enumerate(pairs_train_reader()):
                anchors_data, positives_data = data[0], data[1]

                anchors = paddle.to_tensor(anchors_data)
                positives = paddle.to_tensor(positives_data)

                anchor_embeddings = model(anchors)
                positive_embeddings = model(positives)

                similarities = paddle.matmul(anchor_embeddings, positive_embeddings, transpose_y=True)
                similarities = paddle.multiply(similarities, inverse_temperature)

                sparse_labels = paddle.arange(0, num_classes, dtype='int64')
                sparse_labels = paddle.reshape(sparse_labels, (num_classes, 1))

                loss = F.softmax_with_cross_entropy(similarities, sparse_labels)

                avg_loss = paddle.mean(loss)
                if batch_id % 500 == 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                avg_loss.backward()
                opt.minimize(avg_loss)
                model.clear_gradients()

    model = MyNet()
    train(model)

.. parsed-literal::
start training ...
epoch: 0, batch_id: 0, loss is: [2.3080945]
epoch: 0, batch_id: 500, loss is: [2.326215]
epoch: 1, batch_id: 0, loss is: [2.0898924]
epoch: 1, batch_id: 500, loss is: [1.8754089]
epoch: 2, batch_id: 0, loss is: [2.2416227]
epoch: 2, batch_id: 500, loss is: [1.9024051]
epoch: 3, batch_id: 0, loss is: [1.841417]
epoch: 3, batch_id: 500, loss is: [2.1239076]
epoch: 4, batch_id: 0, loss is: [1.9291763]
epoch: 4, batch_id: 500, loss is: [2.2363486]
epoch: 5, batch_id: 0, loss is: [2.0078473]
epoch: 5, batch_id: 500, loss is: [2.0765374]
epoch: 6, batch_id: 0, loss is: [2.080376]
epoch: 6, batch_id: 500, loss is: [2.1759136]
epoch: 7, batch_id: 0, loss is: [1.908263]
epoch: 7, batch_id: 500, loss is: [1.7774136]
epoch: 8, batch_id: 0, loss is: [1.6335764]
epoch: 8, batch_id: 500, loss is: [1.5713912]
epoch: 9, batch_id: 0, loss is: [2.287479]
epoch: 9, batch_id: 500, loss is: [1.7719988]
epoch: 10, batch_id: 0, loss is: [1.2894523]
epoch: 10, batch_id: 500, loss is: [1.599735]
epoch: 11, batch_id: 0, loss is: [1.78816]
epoch: 11, batch_id: 500, loss is: [1.4773489]
epoch: 12, batch_id: 0, loss is: [1.6737808]
epoch: 12, batch_id: 500, loss is: [1.8889393]
epoch: 13, batch_id: 0, loss is: [1.6156021]
epoch: 13, batch_id: 500, loss is: [1.3851049]
epoch: 14, batch_id: 0, loss is: [1.3854092]
epoch: 14, batch_id: 500, loss is: [2.0325592]
epoch: 15, batch_id: 0, loss is: [1.9734558]
epoch: 15, batch_id: 500, loss is: [1.8050598]
epoch: 16, batch_id: 0, loss is: [1.7084911]
epoch: 16, batch_id: 500, loss is: [1.8919995]
epoch: 17, batch_id: 0, loss is: [1.3137552]
epoch: 17, batch_id: 500, loss is: [1.8817297]
epoch: 18, batch_id: 0, loss is: [1.9453808]
epoch: 18, batch_id: 500, loss is: [2.1317677]
epoch: 19, batch_id: 0, loss is: [1.6051079]
epoch: 19, batch_id: 500, loss is: [1.779858]
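
The effect of the temperature scaling can be examined without a framework. The plain-Python sketch below evaluates the same kind of loss, a softmax cross-entropy in which row i's correct column is i, on two hand-made similarity matrices. It illustrates the formula under these assumptions and is not the PaddlePaddle implementation; ``contrastive_loss`` is a hypothetical name.

```python
import math


def contrastive_loss(similarities, inverse_temperature):
    """Mean softmax cross-entropy where row i's correct column is i."""
    total = 0.0
    for i, row in enumerate(similarities):
        scaled = [s * inverse_temperature for s in row]
        peak = max(scaled)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in scaled))
        total += log_sum_exp - scaled[i]
    return total / len(similarities)


n = 10
# All similarities equal: the model cannot tell the pairs apart
untrained = [[0.0] * n for _ in range(n)]
# Best case for cosine similarity: 1.0 on the diagonal, -1.0 elsewhere
ideal = [[1.0 if i == j else -1.0 for j in range(n)] for i in range(n)]

loss_untrained = contrastive_loss(untrained, 1.0 / 0.2)
loss_ideal = contrastive_loss(ideal, 1.0 / 0.2)
loss_ideal_unscaled = contrastive_loss(ideal, 1.0)
```

Since cosine similarities are confined to [-1, 1], the unscaled loss cannot fall below roughly 0.80 even for perfect embeddings, whereas multiplying by 1/0.2 lets it approach zero. An undifferentiated model starts at ln(10) ≈ 2.30 in both cases.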
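Prediction
----------
Once the training described above has finished, we can use the network to compute the high-dimensional vector representation (embedding) of any image. By computing the similarity between this embedding and the embeddings of the other images in the library, we can rank the images by similarity: the higher an image ranks, the more similar it is.

Below we compute the pairwise similarity of all images in the test set and then display a selection of similar images.
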
.. code:: ipython3

    near_neighbours_per_example = 10

    x_test_t = paddle.to_tensor(x_test)
    test_images_embeddings = model(x_test_t)
    similarities_matrix = paddle.matmul(test_images_embeddings, test_images_embeddings, transpose_y=True)

    indicies = paddle.argsort(similarities_matrix, descending=True)
    indicies = indicies.numpy()

.. code:: ipython3

    num_collage_examples = 10

    examples = np.empty(
        (
            num_collage_examples,
            near_neighbours_per_example + 1,
            3,
            height_width,
            height_width,
        ),
        dtype=np.float32,
    )

    for row_idx in range(num_collage_examples):
        examples[row_idx, 0] = x_test[row_idx]
        anchor_near_neighbours = indicies[row_idx][1:near_neighbours_per_example+1]
        for col_idx, nn_idx in enumerate(anchor_near_neighbours):
            examples[row_idx, col_idx + 1] = x_test[nn_idx]

    show_collage(examples)

.. image:: image_search_files/image_search_22_0.png
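
The ranking step can be illustrated on a single row of a similarity matrix. The sketch below is plain Python with made-up similarity values. Note that it removes the query by index, which is a slightly different technique from the tutorial code, where the first ranked entry is skipped on the assumption that each image is most similar to itself. ``nearest_neighbours`` is a hypothetical helper.

```python
def nearest_neighbours(similarity_row, query_index, k):
    """Indices of the k most similar items, excluding the query itself."""
    ranked = sorted(range(len(similarity_row)),
                    key=lambda idx: similarity_row[idx],
                    reverse=True)
    return [idx for idx in ranked if idx != query_index][:k]


# Made-up similarities of item 0 to items 0..4 (item 0 is the query)
row = [1.0, 0.2, 0.9, -0.3, 0.5]
neighbours = nearest_neighbours(row, query_index=0, k=3)
```

Filtering by index stays correct even when another image ties with the query at similarity 1.0, for example an exact duplicate in the library.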
The end
-------
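In the result shown above, the remaining images in each row are the images most similar to the first one, ordered by similarity. You can also adjust the network structure and the hyperparameters to obtain better results.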
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ueGUN2EQeScw"
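},
"source": [
"# Pet Image Segmentation with a U-Shaped Semantic Segmentation Model\n",
"\n",
"This tutorial is currently built on the 2.0-beta release of Paddle and will be updated as later releases in the 2.0 series come out."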
]
},
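{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Overview\n",
"\n",
"In computer vision, image segmentation is the process of partitioning a digital image into multiple sub-regions. Its purpose is to simplify or change the representation of an image so that it is easier to understand and analyse. Image segmentation is typically used to locate objects and boundaries (lines, curves, and so on) in an image. More precisely, it is the process of assigning a label to every pixel so that pixels with the same label share certain visual characteristics. Image segmentation is applied in many areas, such as self-driving vehicles, land-parcel detection, and meter reading.\n",
"\n",
"This example briefly shows how to implement image segmentation with the open-source PaddlePaddle framework. We use U-Net, a well-known network architecture in image segmentation. It is a deep learning network that improves on FCN and consists of two stages, downsampling (the encoder, for feature extraction) and upsampling (the decoder, for restoring resolution). It is named U-Net because the model structure resembles the letter U."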
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Environment Setup\n",
"\n",
"Import some commonly used basic modules and check your PaddlePaddle version."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'0.0.0'"
},
"metadata": {},
"execution_count": 1
}
],
"source": [
"import os\n",
"import io\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from PIL import Image as PilImage\n",
"\n",
"import paddle\n",
"from paddle.nn import functional as F\n",
"\n",
"paddle.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "VMC2xLAxeScx"
},
"source": [
"## 3. Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "H0KiJ_5N936Y"
},
"source": [
"### 3.1 Downloading the Dataset\n",
"\n",
"This example uses the Oxford-IIIT Pet dataset. Official site: https://www.robots.ox.ac.uk/~vgg/data/pets .\n",
"\n",
"Dataset statistics:\n",
"\n",
"![Dataset statistics](https://www.robots.ox.ac.uk/~vgg/data/pets/breed_count.jpg)\n",
"\n",
"The dataset consists of two archive files:\n",
"\n",
"1. Original images: https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz\n",
"2. Segmentation annotations: https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 119
},
"colab_type": "code",
"id": "xJd9y-u9eScy",
"outputId": "3985783f-7166-4afa-f511-16427b3e2a71",
"tags": []
},
"outputs": [],
"source": [
"!curl -O http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz\n",
"!curl -O http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz\n",
"!tar -xf images.tar.gz\n",
"!tar -xf annotations.tar.gz"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "L5cP2CBz-Mra"
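},
"source": [
"### 3.2 Dataset Overview\n",
"\n",
"Let us first look at the file structure downloaded to disk to get to know the dataset.\n",
"\n",
"1. First, the images.tar.gz archive. Extracting it yields an images directory. This directory is simple: it directly contains image files named by class name and sequence number, each being a photo of the corresponding pet.\n",
"\n",
"```bash\n",
".\n",
"├── samoyed_7.jpg\n",
"├── ......\n",
"└── samoyed_81.jpg\n",
"```\n",
"\n",
"2. Next, annotations.tar.gz. The extracted directory contains the items below. The README file in it describes each directory and file in some detail, so we can consult it for their meaning.\n",
"\n",
"```bash\n",
".\n",
"├── README\n",
"├── list.txt\n",
"├── test.txt\n",
"├── trainval.txt\n",
"├── trimaps\n",
"│    ├── Abyssinian_1.png\n",
"│    ├── Abyssinian_10.png\n",
"│    ├── ......\n",
"│    └── yorkshire_terrier_99.png\n",
"└── xmls\n",
"     ├── Abyssinian_1.xml\n",
"     ├── Abyssinian_10.xml\n",
"     ├── ......\n",
"     └── yorkshire_terrier_190.xml\n",
"```\n",
"\n",
"Here we mainly use the images and annotations/trimaps directories, that is, the original images and the trimap files. The former serve as the training inputs and the latter as the corresponding labels.\n",
"\n",
"Let us see how many training samples the dataset provides."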
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"id": "tqB7YQ4leSc4",
"outputId": "8872356c-ef32-4c94-defb-66250a00890a",
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Number of image samples available for training: 7390\n"
}
],
"source": [
"train_images_path = \"images/\"\n",
"label_images_path = \"annotations/trimaps/\"\n",
"\n",
"print(\"Number of image samples available for training:\", len([os.path.join(train_images_path, image_name) \n",
" for image_name in os.listdir(train_images_path) \n",
" if image_name.endswith('.jpg')]))"
]
},
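{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Defining the Dataset Class\n",
"\n",
"PaddlePaddle loads data through the combination of Dataset (dataset definition) and DataLoader (multi-process data loading).\n",
"\n",
"We first define the dataset. This mainly means implementing a new Dataset class that inherits from paddle.io.Dataset and implements its two abstract methods, `__getitem__` and `__len__`:\n",
"\n",
"```python\n",
"class MyDataset(Dataset):\n",
"    def __init__(self):\n",
"        ...\n",
"\n",
"    # Return one sample and its label for the given index\n",
"    def __getitem__(self, idx):\n",
"        return x, y\n",
"\n",
"    # Return the total number of samples in the dataset\n",
"    def __len__(self):\n",
"        return len(samples)\n",
"```\n",
"\n",
"Inside the dataset, the image preprocessing APIs can be used to preprocess images (resizing, flipping, format conversion, and so on).\n",
"\n",
"Loaded images do not always meet our needs. For example, some of the downloaded images are in RGBA format, which does not match the 3 channels we require, so they need to be converted. We therefore implement a general image-reading interface that ensures every image read meets our requirements.\n",
"\n",
"In addition, images are loaded with an HWC shape by default, so we need to check whether this suits the training that follows. If a Layer's default format does not match, see whether the Layer has a parameter for adjusting the format. With many layers, however, it is better to change the shape of the source data directly; otherwise every layer needs the parameter set, and missing one causes training errors. In this example we therefore convert the data from HWC to CHW once at the source, because the default input format of PaddlePaddle's convolution and related APIs is CHW, which simplifies the subsequent training."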
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"from paddle.io import Dataset\n",
"from paddle.vision.transforms import transforms\n",
"\n",
"\n",
"class ImgTranspose(object):\n",
"    \"\"\"\n",
"    Image preprocessing helper: adds a channel axis to mask images,\n",
"    (160, 160) => (160, 160, 1), and transposes images from HWC to CHW.\n",
"    \"\"\"\n",
"    def __init__(self, fmt):\n",
"        self.format = fmt\n",
"\n",
"    def __call__(self, img):\n",
"        if len(img.shape) == 2:\n",
"            img = np.expand_dims(img, axis=2)\n",
"\n",
"        return img.transpose(self.format)\n",
"\n",
"class PetDataset(Dataset):\n",
"    \"\"\"\n",
"    Dataset definition\n",
"    \"\"\"\n",
"    def __init__(self, image_path, label_path, mode='train'):\n",
"        \"\"\"\n",
"        Constructor\n",
"        \"\"\"\n",
"        self.image_size = (160, 160)\n",
"        self.image_path = image_path\n",
"        self.label_path = label_path\n",
"        self.mode = mode.lower()\n",
"        self.eval_image_num = 1000\n",
"\n",
"        assert self.mode in ['train', 'test'], \\\n",
"            \"mode should be 'train' or 'test', but got {}\".format(self.mode)\n",
"\n",
"        self._parse_dataset()\n",
"\n",
"        self.transforms = transforms.Compose([\n",
"            ImgTranspose((2, 0, 1))\n",
"        ])\n",
"\n",
"    def _sort_images(self, image_dir, image_type):\n",
"        \"\"\"\n",
"        Sort the images in a directory by file name\n",
"        \"\"\"\n",
"        files = []\n",
"\n",
"        for image_name in os.listdir(image_dir):\n",
"            if image_name.endswith('.{}'.format(image_type)) \\\n",
"                    and not image_name.startswith('.'):\n",
"                files.append(os.path.join(image_dir, image_name))\n",
"\n",
"        return sorted(files)\n",
"\n",
"    def _parse_dataset(self):\n",
"        \"\"\"\n",
"        The files are scattered across directories, while training needs each image paired with its label.\n",
"        The first step is therefore to organise the raw dataset into two lists, images and labels,\n",
"        whose entries correspond one to one.\n",
"        This makes it easy to look up the label for an image; the raw directories cannot be used directly.\n",
"        We use a very simple method here: sorting by file name.\n",
"        It works because image and label files share the same name and differ only in extension.\n",
"        \"\"\"\n",
"        temp_train_images = self._sort_images(self.image_path, 'jpg')\n",
"        temp_label_images = self._sort_images(self.label_path, 'png')\n",
"\n",
"        # The same seed shuffles both lists identically, keeping the pairs aligned\n",
"        random.Random(1337).shuffle(temp_train_images)\n",
"        random.Random(1337).shuffle(temp_label_images)\n",
"\n",
"        if self.mode == 'train':\n",
"            self.train_images = temp_train_images[:-self.eval_image_num]\n",
"            self.label_images = temp_label_images[:-self.eval_image_num]\n",
"        else:\n",
"            self.train_images = temp_train_images[-self.eval_image_num:]\n",
"            self.label_images = temp_label_images[-self.eval_image_num:]\n",
"\n",
"    def _load_img(self, path, color_mode='rgb'):\n",
"        \"\"\"\n",
"        Unified image-loading interface that normalises image size and channels\n",
"        \"\"\"\n",
"        with open(path, 'rb') as f:\n",
"            img = PilImage.open(io.BytesIO(f.read()))\n",
"        if color_mode == 'grayscale':\n",
"            # if image is not already an 8-bit, 16-bit or 32-bit grayscale image\n",
"            # convert it to an 8-bit grayscale image.\n",
"            if img.mode not in ('L', 'I;16', 'I'):\n",
"                img = img.convert('L')\n",
"        elif color_mode == 'rgba':\n",
"            if img.mode != 'RGBA':\n",
"                img = img.convert('RGBA')\n",
"        elif color_mode == 'rgb':\n",
"            if img.mode != 'RGB':\n",
"                img = img.convert('RGB')\n",
"        else:\n",
"            raise ValueError('color_mode must be \"grayscale\", \"rgb\", or \"rgba\"')\n",
"\n",
"        if self.image_size is not None:\n",
"            if img.size != self.image_size:\n",
"                img = img.resize(self.image_size, PilImage.NEAREST)\n",
"\n",
"        return img\n",
"\n",
"    def __getitem__(self, idx):\n",
"        \"\"\"\n",
"        Return image, label\n",
"        \"\"\"\n",
"        # Much of the effort goes into data handling: the data must be shaped the way the model expects,\n",
"        # and some images do not have the expected number of channels (not every file is RGB)\n",
"\n",
"        # Load the original image\n",
"        train_image = self._load_img(self.train_images[idx])\n",
"        x = np.array(train_image, dtype='float32')\n",
"\n",
"        # Preprocess the image: the size is already unified, convert the layout (HWC => CHW)\n",
"        x = self.transforms(x)\n",
"\n",
"        # Load the label image\n",
"        label_image = self._load_img(self.label_images[idx], color_mode=\"grayscale\")\n",
"        y = np.array(label_image, dtype='uint8')\n",
"\n",
"        # Preprocess the label\n",
"        # The label is a 2-D array (size, size); it needs a channel axis, (size, size, 1), for the loss computation\n",
"        y = self.transforms(y)\n",
"\n",
"        # Return img, label in the required format\n",
"        return x, y.astype('int64')\n",
"\n",
"    def __len__(self):\n",
"        \"\"\"\n",
"        Return the total number of samples\n",
"        \"\"\"\n",
"        return len(self.train_images)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "GYxTHfbBESSG"
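},
"source": [
"### 3.4 Displaying Samples from PetDataSet\n",
"\n",
"With the Dataset implemented, let us test whether it behaves as expected. Because a Dataset is an iterable class, we read data from it in a for loop and display it with matplotlib. Note that the segmentation label files are single-channel grayscale images, so pass cmap='gray' when calling imshow."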
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 479
},
"colab_type": "code",
"id": "MTO-C5qFDnPn",
"outputId": "0937ed5e-1216-4773-9b54-16db8fe7b457"
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 2 Axes>",
"image/svg+xml": "<!-- matplotlib SVG output omitted (figure with two axes; first panel titled 'Train Image') -->"
6FOvXr4fD4UBPTw/LKNL79ddf8eKLL057xkVKSgoMBgP6+/s1TDY95tveQCCAvr4+GAwGXLt2DRaLJWQIXblyJTIzM1lHkd6JEydmPAUoJSUFS5cu1ShRdDQ7adHv9+Ovv/7CuXPnwg6N+/fvR25urlZxpLV27VreEWKm+Zmxfr8fbrc7ZLrFYpH+Brxa6O7unnGZe+65R6j7ZnA5/bqnpwc1NTU8Vj0vGI1GZGdnw2q18o4CgFPJAoEA2traUFtbGzS9vLxcmF+MyKI5v89gMGDTpk145JFHNEg0PW4XkgQCgZAnoCQnJ0t7ozctRXsun16vF+L3yTVBuE9KIvxSSHxx/Yu2trbixIkTQdPq6urw0EMPcUqUeER4ZpN831GQqKiqilOnTuH06dO8o4hZsoULF0Kv10d9pc58E+lpvqqqTt1Dw263o6GhQctYEXEvmc/nw+TkJIxG49S0hoYGWK1WnDp1iooWxvDwcNBrVVUxMTGBgYEBlJeXc0oVGfeSNTc3w2w2Izs7O2j6yZMnsWzZMly+fJlTMnmMj4/jk08+4R0jIvooR5gTomQulwsTExO8YxBGhChZa2srHA5HyPTc3NygfTUC5OXl8Y4QMyFKFklJSQkWLlzIO4ZQZHzIlzAl6+vrg8vlCpn+8ssvS3nKMfkfYUrW3t6OwcHBkOkffvghkpOTOSQi8SJMyQDg3LlzGBkZ4R1DWEVFRVJ+tytU4osXL2JsbIx3DGG9/vrrIef3+3y+kFOmREM7OxJTFAWHDx8O+8lcJEJtyQCgtrY25GuTmpoaYW9VyUsgEEBZWZnwBQMELNng4GDIgxI2bdok9I13eVBVVajL3qYjXMlI4qGSEeaELFlpaSkdykggQpbM6/WGnEc2NjZG+2WSErJk4dDzBOQlTcmIvKhkhDkqGWGOSkaYo5IR5qhkhDkqGWGOSkaYo5IR5qhkhDkqGWGOSkaYo5IR5qhkhDkqGWGOSkaYo5IR5qhkhDkqGWGOSkaYo5JJymAw4O233+YdIypUMonJcnNAKhlhjkomkcnJSd4RZoVKJhGz2Rzy+EYZUMkIc1QywhyVjDBHJSPMUckIc1QywhyVjDBHJSPMUckIc1QywpwcX+PP0d69e/Hggw/yjhGipaUFxcXFvGMwl/Al27t3L959911YLBbeUULk5OQgNzcXFRUV+Oabb3jHYSahS/baa69h3759uO2223hHCSszMxN5eXl44IEHoCgKDh06xDsSEwm7T/bKK6/g/fffF7Zg/++OO+5AUVERnn/+ed5RmEjILdmOHTuwf/9+pKen844SNYvFgiVLlvCOwURClWzLli0oKSnBLbfcArPZzDtOzD766CNcvXoVR44c4R0lroQdLr/88ku43e6ol9+4cSN++OEHZGZmSlkw4MZJiaWlpdi8eTPvKHEl7JbM5/NBVdWoll23bh3q6+uRlJQUcZmysjI4nc44pYuPe++9F9u3bw+alpqaiqqqKvj9fqxfvx4dHR1T81wul5QPlxW2ZLHQ6/XTFuz7779Hb2+vhomi09HRgaSkJOTl5QVNNxqNMBqNIQ+1N5lMWsaLG2GHy2itWLECbW1tEedXVlbiwoULGiaKzXRba7vdjhUrVkj/dDzpS5acnBx2uqqqOHbsGOx2u8aJYnP27FnYbLaIZXM4HAgEAsjIyAiZJ8tFJVIPl6tWrQraZwFuPABeURQ0NTXh999/55QsNs3NzUhKSoLVag0ZIm8aGhoKeu33+1FUVKRFvDmTtmTp6eno7u4Omd7e3o7KykoOieamsbERRqMR2dnZUu7cT0eq4fLOO+8EAOh0OixbtixkvqIo8Hg8WseKm/r6erS2tkJRFN5R4kqqLZnT6cTatWthMpnC7uz39vaitraWQ7L4sdlsWLBgAe6//34YjcaIy127dk3DVHMjVckA4M8//ww73ev1Ynh4WOM0bPz888/Q6XS47777whZNVVV88cUXHJLNjtDDZX9/f8gD78P
x+Xxoa2vD8ePHNUiljerqapw/fz7s0Hnp0iUOiWZP6JL9+OOP8Hq9My43PDycUAW7qbq6GmfOnEFnZ2fQf7aysjKOqWIn/HBpt9vx8MMPR/xo7/V60dXVpXEq7Rw9ehTAjS//p9tHE5nwJbPZbJicnMTGjRtDjnz7fD40NTXh9OnTnNJp59ixY7wjzJrQw+VNDQ0NIUfEFUVBfX39vCiY7ITfkt1UU1MT9DoQCODs2bOc0pBYSFOy6b4EJ2KTYrgkcqOSEeaoZIQ5KhlhjkpGmKOSEeaoZIQ5Khlh7j+IobnQcdL/mQAAAABJRU5ErkJggg==\" y=\"-21.499943\"/>\n </g>\n <g id=\"text_2\">\n <!-- Label -->\n <defs>\n <path d=\"M 9.8125 72.90625 \nL 19.671875 72.90625 \nL 19.671875 8.296875 \nL 55.171875 8.296875 \nL 55.171875 0 \nL 9.8125 0 \nz\n\" id=\"DejaVuSans-76\"/>\n <path d=\"M 48.6875 27.296875 \nQ 48.6875 37.203125 44.609375 42.84375 \nQ 40.53125 48.484375 33.40625 48.484375 \nQ 26.265625 48.484375 22.1875 42.84375 \nQ 18.109375 37.203125 18.109375 27.296875 \nQ 18.109375 17.390625 22.1875 11.75 \nQ 26.265625 6.109375 33.40625 6.109375 \nQ 40.53125 6.109375 44.609375 11.75 \nQ 48.6875 17.390625 48.6875 27.296875 \nz\nM 18.109375 46.390625 \nQ 20.953125 51.265625 25.265625 53.625 \nQ 29.59375 56 35.59375 56 \nQ 45.5625 56 51.78125 48.09375 \nQ 58.015625 40.1875 58.015625 27.296875 \nQ 58.015625 14.40625 51.78125 6.484375 \nQ 45.5625 -1.421875 35.59375 -1.421875 \nQ 29.59375 -1.421875 25.265625 0.953125 \nQ 20.953125 3.328125 18.109375 8.203125 \nL 18.109375 0 \nL 9.078125 0 \nL 9.078125 75.984375 \nL 18.109375 75.984375 \nz\n\" id=\"DejaVuSans-98\"/>\n <path d=\"M 9.421875 75.984375 \nL 18.40625 75.984375 \nL 18.40625 0 \nL 9.421875 0 \nz\n\" id=\"DejaVuSans-108\"/>\n </defs>\n <g transform=\"translate(249.721278 16.318125)scale(0.12 -0.12)\">\n <use xlink:href=\"#DejaVuSans-76\"/>\n <use x=\"55.712891\" xlink:href=\"#DejaVuSans-97\"/>\n <use x=\"116.992188\" xlink:href=\"#DejaVuSans-98\"/>\n <use x=\"180.46875\" xlink:href=\"#DejaVuSans-101\"/>\n <use x=\"241.992188\" xlink:href=\"#DejaVuSans-108\"/>\n </g>\n </g>\n </g>\n </g>\n <defs>\n <clipPath id=\"p58ad9a7e6d\">\n <rect height=\"152.181818\" width=\"152.181818\" x=\"7.2\" y=\"22.318125\"/>\n </clipPath>\n <clipPath 
id=\"pf02e2d733d\">\n <rect height=\"152.181818\" width=\"152.181818\" x=\"189.818182\" y=\"22.318125\"/>\n </clipPath>\n </defs>\n</svg>\n",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"# Training dataset\n",
"train_dataset = PetDataset(train_images_path, label_images_path, mode='train')\n",
"\n",
"# Validation dataset\n",
"val_dataset = PetDataset(train_images_path, label_images_path, mode='test')\n",
"\n",
"# Take one sample\n",
"image, label = train_dataset[0]\n",
"\n",
"# Show the image next to its label\n",
"plt.figure()\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plt.title('Train Image')\n",
"plt.imshow(image.transpose((1, 2, 0)).astype('uint8'))\n",
"plt.axis('off')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.title('Label')\n",
"plt.imshow(np.squeeze(label, axis=0).astype('uint8'), cmap='gray')\n",
"plt.axis('off')\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "d9JyZz3ZEnQ1"
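},
"source": [
"## 4. Building the model\n",
"\n",
"U-Net is a U-shaped network that can be viewed as two major stages: the image first passes through the Encoder, which downsamples it into high-level semantic feature maps, and then through the Decoder, which upsamples the feature maps back to the resolution of the original image."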
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "wi-ouGZL--BN"
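},
"source": [
"### 4.1 Defining the SeparableConv2d layer\n",
"\n",
"To reduce the number of trainable parameters in the convolutions and so improve performance, we define a custom SeparableConv2d class that inherits from paddle.nn.Layer. It splits a `filter_size * filter_size * num_filters` Conv2d into two smaller Conv2d operations: first each input channel is convolved with its own `filter_size * filter_size * 1` kernel, keeping the number of channels unchanged, and then a `1 * 1 * num_filters` kernel is applied."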
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "0c-FikH-A4qP"
},
"outputs": [],
"source": [
"class SeparableConv2d(paddle.nn.Layer):\n",
"    def __init__(self,\n",
"                 in_channels,\n",
"                 out_channels,\n",
"                 kernel_size,\n",
"                 stride=1,\n",
"                 padding=0,\n",
"                 dilation=1,\n",
"                 groups=None,\n",
"                 weight_attr=None,\n",
"                 bias_attr=None,\n",
"                 data_format=\"NCHW\"):\n",
"        super(SeparableConv2d, self).__init__()\n",
"        # Depthwise convolution: one filter per input channel, no bias\n",
"        self.conv_1 = paddle.nn.Conv2d(in_channels,\n",
"                                       in_channels,\n",
"                                       kernel_size,\n",
"                                       stride=stride,\n",
"                                       padding=padding,\n",
"                                       dilation=dilation,\n",
"                                       groups=in_channels,\n",
"                                       weight_attr=weight_attr,\n",
"                                       bias_attr=False,\n",
"                                       data_format=data_format)\n",
"        # Pointwise 1 x 1 convolution: mixes channels, in_channels -> out_channels\n",
"        self.pointwise = paddle.nn.Conv2d(in_channels,\n",
"                                          out_channels,\n",
"                                          1,\n",
"                                          stride=1,\n",
"                                          padding=0,\n",
"                                          dilation=1,\n",
"                                          groups=1,\n",
"                                          weight_attr=weight_attr,\n",
"                                          data_format=data_format)\n",
"\n",
"    def forward(self, inputs):\n",
"        y = self.conv_1(inputs)\n",
"        y = self.pointwise(y)\n",
"\n",
"        return y"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "zNyzlqQmBEEi"
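},
"source": [
"### 4.2 Defining the Encoder\n",
"\n",
"We wrap the Encoder downsampling step in a Layer so that it can be reused without repeating code. Downsampling corresponds to the descending side of the U: the same unit is repeated, each time increasing the number of channels and shrinking the spatial size, with a residual connection added. We factor all of this into one Layer."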
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "OpUi9VUeGmXp"
},
"outputs": [],
"source": [
"class Encoder(paddle.nn.Layer):\n",
"    def __init__(self, in_channels, out_channels):\n",
"        super(Encoder, self).__init__()\n",
"\n",
"        self.relu = paddle.nn.ReLU()\n",
"        self.separable_conv_01 = SeparableConv2d(in_channels,\n",
"                                                 out_channels,\n",
"                                                 kernel_size=3,\n",
"                                                 padding='same')\n",
"        self.bn = paddle.nn.BatchNorm2d(out_channels)\n",
"        self.separable_conv_02 = SeparableConv2d(out_channels,\n",
"                                                 out_channels,\n",
"                                                 kernel_size=3,\n",
"                                                 padding='same')\n",
"        self.pool = paddle.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n",
"        self.residual_conv = paddle.nn.Conv2d(in_channels,\n",
"                                              out_channels,\n",
"                                              kernel_size=1,\n",
"                                              stride=2,\n",
"                                              padding='same')\n",
"\n",
"    def forward(self, inputs):\n",
"        previous_block_activation = inputs\n",
"\n",
"        y = self.relu(inputs)\n",
"        y = self.separable_conv_01(y)\n",
"        y = self.bn(y)\n",
"        y = self.relu(y)\n",
"        y = self.separable_conv_02(y)\n",
"        y = self.bn(y)\n",
"        y = self.pool(y)\n",
"\n",
"        residual = self.residual_conv(previous_block_activation)\n",
"        y = paddle.add(y, residual)\n",
"\n",
"        return y"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "nPBRD42WGmuH"
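},
"source": [
"### 4.3 Defining the Decoder\n",
"\n",
"Once the channel count reaches its maximum and the high-level semantic feature maps are obtained, the network starts decoding: it upsamples, gradually reducing the number of channels while increasing the spatial size until the original image size is restored. This stage also repeats a residual block of identical structure, so to avoid repetition we define it as a Layer for use in the model."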
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "ltVurq8OGvK7"
},
"outputs": [],
"source": [
"class Decoder(paddle.nn.Layer):\n",
"    def __init__(self, in_channels, out_channels):\n",
"        super(Decoder, self).__init__()\n",
"\n",
"        self.relu = paddle.nn.ReLU()\n",
"        self.conv_transpose_01 = paddle.nn.ConvTranspose2d(in_channels,\n",
"                                                           out_channels,\n",
"                                                           kernel_size=3,\n",
"                                                           padding='same')\n",
"        self.conv_transpose_02 = paddle.nn.ConvTranspose2d(out_channels,\n",
"                                                           out_channels,\n",
"                                                           kernel_size=3,\n",
"                                                           padding='same')\n",
"        self.bn = paddle.nn.BatchNorm2d(out_channels)\n",
"        self.upsample = paddle.nn.UpSample(scale_factor=2.0)\n",
"        self.residual_conv = paddle.nn.Conv2d(in_channels,\n",
"                                              out_channels,\n",
"                                              kernel_size=1,\n",
"                                              padding='same')\n",
"\n",
"    def forward(self, inputs):\n",
"        previous_block_activation = inputs\n",
"\n",
"        y = self.relu(inputs)\n",
"        y = self.conv_transpose_01(y)\n",
"        y = self.bn(y)\n",
"        y = self.relu(y)\n",
"        y = self.conv_transpose_02(y)\n",
"        y = self.bn(y)\n",
"        y = self.upsample(y)\n",
"\n",
"        residual = self.upsample(previous_block_activation)\n",
"        residual = self.residual_conv(residual)\n",
"\n",
"        y = paddle.add(y, residual)\n",
"\n",
"        return y"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "vLKLj2FMGvdc"
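},
"source": [
"### 4.4 Assembling the training model\n",
"\n",
"Build the whole network following the U shape: three Encoder (downsampling) blocks and four Decoder (upsampling) blocks. The stride-2 convolution at the input provides the fourth halving, so the output returns to the input resolution."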
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "an1YFILpG4Xy"
},
"outputs": [],
"source": [
"class PetModel(paddle.nn.Layer):\n",
"    def __init__(self, num_classes):\n",
"        super(PetModel, self).__init__()\n",
"\n",
"        self.conv_1 = paddle.nn.Conv2d(3, 32,\n",
"                                       kernel_size=3,\n",
"                                       stride=2,\n",
"                                       padding='same')\n",
"        self.bn = paddle.nn.BatchNorm2d(32)\n",
"        self.relu = paddle.nn.ReLU()\n",
"\n",
"        in_channels = 32\n",
"        self.encoders = []\n",
"        self.encoder_list = [64, 128, 256]\n",
"        self.decoder_list = [256, 128, 64, 32]\n",
"\n",
"        # Define the downsampling sublayers in a loop to avoid repeated code.\n",
"        # Each sublayer needs a unique name; a reused name replaces the earlier sublayer.\n",
"        for out_channels in self.encoder_list:\n",
"            block = self.add_sublayer('encoder_{}'.format(out_channels),\n",
"                                      Encoder(in_channels, out_channels))\n",
"            self.encoders.append(block)\n",
"            in_channels = out_channels\n",
"\n",
"        self.decoders = []\n",
"\n",
"        # Define the upsampling sublayers in a loop to avoid repeated code\n",
"        for out_channels in self.decoder_list:\n",
"            block = self.add_sublayer('decoder_{}'.format(out_channels),\n",
"                                      Decoder(in_channels, out_channels))\n",
"            self.decoders.append(block)\n",
"            in_channels = out_channels\n",
"\n",
"        self.output_conv = paddle.nn.Conv2d(in_channels,\n",
"                                            num_classes,\n",
"                                            kernel_size=3,\n",
"                                            padding='same')\n",
"\n",
"    def forward(self, inputs):\n",
"        y = self.conv_1(inputs)\n",
"        y = self.bn(y)\n",
"        y = self.relu(y)\n",
"\n",
"        for encoder in self.encoders:\n",
"            y = encoder(y)\n",
"\n",
"        for decoder in self.decoders:\n",
"            y = decoder(y)\n",
"\n",
"        y = self.output_conv(y)\n",
"\n",
"        return y"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6Nf7hQ60G4sj"
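},
"source": [
"### 4.5 Model visualization\n",
"\n",
"Call the summary interface provided by Paddle to visualize the assembled model, which makes it easy to inspect and confirm the model structure and parameter information.\n",
"@TODO: needs to be replaced"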
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "1_MXfWkZeSdE",
"outputId": "4c9870de-9eb6-47e8-e88c-79509ef78cf5",
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "--------------------------------------------------------------------------------\n Layer (type) Input Shape Output Shape Param #\n================================================================================\n Conv2d-22 [-1, 3, 160, 160] [-1, 32, 80, 80] 896\n BatchNorm2d-9 [-1, 32, 80, 80] [-1, 32, 80, 80] 64\n ReLU-9 [-1, 32, 80, 80] [-1, 32, 80, 80] 0\n ReLU-12 [-1, 256, 20, 20] [-1, 256, 20, 20] 0\n Conv2d-33 [-1, 128, 20, 20] [-1, 128, 20, 20] 1,152\n Conv2d-34 [-1, 128, 20, 20] [-1, 256, 20, 20] 33,024\nSeparableConv2d-11 [-1, 128, 20, 20] [-1, 256, 20, 20] 0\n BatchNorm2d-12 [-1, 256, 20, 20] [-1, 256, 20, 20] 512\n Conv2d-35 [-1, 256, 20, 20] [-1, 256, 20, 20] 2,304\n Conv2d-36 [-1, 256, 20, 20] [-1, 256, 20, 20] 65,792\nSeparableConv2d-12 [-1, 256, 20, 20] [-1, 256, 20, 20] 0\n MaxPool2d-6 [-1, 256, 20, 20] [-1, 256, 10, 10] 0\n Conv2d-37 [-1, 128, 20, 20] [-1, 256, 10, 10] 33,024\n Encoder-6 [-1, 128, 20, 20] [-1, 256, 10, 10] 0\n ReLU-16 [-1, 32, 80, 80] [-1, 32, 80, 80] 0\nConvTranspose2d-15 [-1, 64, 80, 80] [-1, 32, 80, 80] 18,464\n BatchNorm2d-16 [-1, 32, 80, 80] [-1, 32, 80, 80] 64\nConvTranspose2d-16 [-1, 32, 80, 80] [-1, 32, 80, 80] 9,248\n UpSample-8 [-1, 64, 80, 80] [-1, 64, 160, 160] 0\n Conv2d-41 [-1, 64, 160, 160] [-1, 32, 160, 160] 2,080\n Decoder-8 [-1, 64, 80, 80] [-1, 32, 160, 160] 0\n Conv2d-42 [-1, 32, 160, 160] [-1, 4, 160, 160] 1,156\n================================================================================\nTotal params: 167,780\nTrainable params: 167,780\nNon-trainable params: 0\n--------------------------------------------------------------------------------\nInput size (MB): 0.29\nForward/backward pass size (MB): 43.16\nParams size (MB): 0.64\nEstimated Total Size (MB): 44.10\n--------------------------------------------------------------------------------\n\n"
},
{
"output_type": "execute_result",
"data": {
"text/plain": "{'total_params': 167780, 'trainable_params': 167780}"
},
"metadata": {},
"execution_count": 11
}
],
"source": [
"from paddle.static import InputSpec\n",
"\n",
"paddle.disable_static()\n",
"num_classes = 4\n",
"model = paddle.Model(PetModel(num_classes))\n",
"model.summary((3, 160, 160))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "j9Trlcvj8R7L"
},
"source": [
"## 5. Model training"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8Sskbyz58X4J"
},
"source": [
"### 5.1 Configuration\n",
"\n",
"Define the training BATCH_SIZE, the number of epochs, the compute device, and related settings."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "4fSkTiRB8OpP"
},
"outputs": [],
"source": [
"BATCH_SIZE = 32\n",
"EPOCHS = 15\n",
"device = paddle.set_device('gpu')\n",
"paddle.disable_static(device)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "x_vaedRa8eoy"
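},
"source": [
"### 5.3 Custom loss\n",
"\n",
"This task uses the SoftmaxWithCrossEntropy loss. Paddle offers it as a functional API, so here we wrap it in a class-style API for use in training. We do not use CrossEntropyLoss directly mainly because we need to choose the axis of the computation: softmax must be computed over axis 1 rather than the default last axis, so we use the loss mentioned above and specify the softmax axis through its axis parameter."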
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "AEZq_jT78jNe"
},
"outputs": [],
"source": [
"class SoftmaxWithCrossEntropy(paddle.nn.Layer):\n",
"    def __init__(self):\n",
"        super(SoftmaxWithCrossEntropy, self).__init__()\n",
"\n",
"    def forward(self, input, label):\n",
"        # Softmax over axis 1, the class (channel) axis of NCHW logits\n",
"        loss = F.softmax_with_cross_entropy(input,\n",
"                                            label,\n",
"                                            return_softmax=False,\n",
"                                            axis=1)\n",
"        return paddle.mean(loss)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rj6MPPMkJIdZ"
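},
"source": [
"### 5.4 Starting training\n",
"\n",
"Create a Model instance from the network code, then use the prepare interface to set the optimizer, loss function, evaluation metrics, and so on for training. Once this setup is done, call fit to start training; fit only needs the training dataset, the test dataset, the number of epochs, and the batch size (batch_size) defined earlier."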
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 51
},
"colab_type": "code",
"id": "m-cVyjNreSdO",
"outputId": "9b37dd07-746b-41cc-c8e2-687a83b1ad75",
"tags": []
},
"outputs": [],
"source": [
"# Create the Model first so that the optimizer receives the parameters\n",
"# of the network that is actually trained\n",
"model = paddle.Model(PetModel(num_classes))\n",
"optim = paddle.optimizer.RMSProp(learning_rate=0.001,\n",
"                                 rho=0.9,\n",
"                                 momentum=0.0,\n",
"                                 epsilon=1e-07,\n",
"                                 centered=False,\n",
"                                 parameters=model.parameters())\n",
"model.prepare(optim,\n",
"              SoftmaxWithCrossEntropy())\n",
"\n",
"model.fit(train_dataset,\n",
"          val_dataset,\n",
"          epochs=EPOCHS,\n",
"          batch_size=BATCH_SIZE)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-mouwS1kJRqJ"
},
"source": [
"## 6. Model prediction"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Dvjxu91DJd1G"
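},
"source": [
"### 6.1 Preparing the prediction dataset and predicting\n",
"\n",
"We again use PetDataset to instantiate the dataset to predict on. For convenience we do not prepare separate prediction data and reuse the evaluation data instead.\n",
"\n",
"We can call model.predict directly on the dataset; we only need to pass the prediction dataset to it."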
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Ur088_vjeSdR"
},
"outputs": [],
"source": [
"predict_results = model.predict(val_dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-DpAEFBSJioy"
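},
"source": [
"### 6.2 Visualizing predictions\n",
"\n",
"Take 3 animals from the prediction dataset to see how the predictions look, showing the original image, the label image, and the predicted result for each."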
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "1mfaFkO5S1PU"
},
"outputs": [],
"source": [
"plt.figure(figsize=(10, 10))\n",
"\n",
"i = 0\n",
"mask_idx = 0\n",
"\n",
"for data in val_dataset:\n",
" if i > 8: \n",
" break\n",
" plt.subplot(3, 3, i + 1)\n",
" plt.imshow(data[0].transpose((1, 2, 0)).astype('uint8'))\n",
" plt.title('Input Image')\n",
" plt.axis(\"off\")\n",
"\n",
" plt.subplot(3, 3, i + 2)\n",
" plt.imshow(np.squeeze(data[1], axis=0).astype('uint8'), cmap='gray')\n",
" plt.title('Label')\n",
" plt.axis(\"off\")\n",
" \n",
" \n",
" data = val_preds[0][mask_idx][0].transpose((1, 2, 0))\n",
" mask = np.argmax(data, axis=-1)\n",
" mask = np.expand_dims(mask, axis=-1)\n",
"\n",
" plt.subplot(3, 3, i + 3)\n",
" plt.imshow(np.squeeze(mask, axis=2).astype('uint8'), cmap='gray')\n",
" plt.title('Predict')\n",
" plt.axis(\"off\")\n",
" i += 3\n",
" mask_idx += 1\n",
"\n",
"plt.show()"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "pets_image_segmentation_U_Net_like.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3.7.4 64-bit",
"language": "python",
"name": "python_defaultSpec_1599452401282"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4-final"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
\ No newline at end of file
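Pet Image Segmentation with a U-Net-like Model
==============================================
This tutorial is currently implemented against the 2.0-beta release of Paddle and will be updated as later 2.0 releases come out.

1. Introduction
---------------
In computer vision, image segmentation is the process of partitioning a digital image into multiple image sub-regions. Its goal is to simplify or change the representation of an image so that it is easier to understand and analyze. Segmentation is typically used to locate objects and boundaries (lines, curves, and so on) in an image. More precisely, it assigns a label to every pixel so that pixels with the same label share some visual property. Applications are wide-ranging: autonomous vehicles, land-parcel detection, meter reading, and more.

This example briefly shows how to implement image segmentation with the open-source PaddlePaddle framework. We use U-Net, a well-known architecture in image segmentation. It is a deep network derived from FCN with two stages, downsampling (the encoder, feature extraction) and upsampling (the decoder, resolution recovery), and it is named U-Net because the model diagram resembles the letter U.

2. Environment setup
--------------------
Import a few commonly used modules and check the installed Paddle version.

.. code:: ipython3

    import os
    import io
    import numpy as np
    import matplotlib.pyplot as plt
    from PIL import Image as PilImage
    import paddle
    from paddle.nn import functional as F
    paddle.__version__

.. parsed-literal::

    '0.0.0'
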
3. Dataset
----------
3.1 Downloading the dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~
This example uses the Oxford-IIIT Pet dataset. Official site: https://www.robots.ox.ac.uk/~vgg/data/pets

Dataset statistics:

.. figure:: https://www.robots.ox.ac.uk/~vgg/data/pets/breed_count.jpg
   :alt: Dataset statistics

   Dataset statistics

The dataset consists of two archives:

1. Original images: https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
2. Segmentation images: https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz

.. code:: ipython3

    !curl -O http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
    !curl -O http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
    !tar -xf images.tar.gz
    !tar -xf annotations.tar.gz

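3.2 Dataset overview
~~~~~~~~~~~~~~~~~~~~
Let us first look at how the downloaded files are laid out on disk to get to know the dataset.

1. images.tar.gz unpacks to an ``images`` directory. Its layout is simple: it directly contains image files named by class name and index, each one a photo of the corresponding pet.

.. code:: bash

    .
    ├── samoyed_7.jpg
    ├── ......
    └── samoyed_81.jpg
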
2. annotations.tar.gz unpacks to a directory with the contents below. Its README file describes every directory and file in some detail.

.. code:: bash

    .
    ├── README
    ├── list.txt
    ├── test.txt
    ├── trainval.txt
    ├── trimaps
    │   ├── Abyssinian_1.png
    │   ├── Abyssinian_10.png
    │   ├── ......
    │   └── yorkshire_terrier_99.png
    └── xmls
        ├── Abyssinian_1.xml
        ├── Abyssinian_10.xml
        ├── ......
        └── yorkshire_terrier_190.xml

In this tutorial we mainly use two directories, ``images`` and ``annotations/trimaps``, that is, the original images and the trimap files. The former are the training inputs and the latter are the corresponding labels.

Let us see how many training samples the dataset provides.

.. code:: ipython3

    train_images_path = "images/"
    label_images_path = "annotations/trimaps/"
    print("Number of image samples available for training:",
          len([os.path.join(train_images_path, image_name)
               for image_name in os.listdir(train_images_path)
               if image_name.endswith('.jpg')]))

.. parsed-literal::

    Number of image samples available for training: 7390

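3.3 Defining the dataset class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PaddlePaddle loads data through a unified scheme: Dataset (dataset definition) + DataLoader (multi-process data loading).

We start by defining the dataset: implement a new Dataset class that inherits from paddle.io.Dataset and implements its two abstract methods, ``__getitem__`` and ``__len__``:

.. code:: python

    from paddle.io import Dataset

    class MyDataset(Dataset):
        def __init__(self, samples):
            # samples: a sequence of (data, label) pairs; a placeholder
            # for whatever loading logic a real dataset needs
            self.samples = samples

        # Return one sample and its label for the given index
        def __getitem__(self, idx):
            x, y = self.samples[idx]
            return x, y

        # Return the total number of samples in the dataset
        def __len__(self):
            return len(self.samples)
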
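Inside the dataset, image preprocessing APIs can be used to preprocess the images (resizing, flipping, format conversion, and so on).

Loaded images do not always match what we need. For example, some of the downloaded images are in RGBA format, which does not satisfy the 3-channel requirement, so they have to be converted. We therefore implement a general image-reading interface that guarantees every image it returns meets our requirements.

In addition, images load with a default shape of HWC, so we need to check whether that suits the later training. If a Layer's default data format differs, check whether the Layer has a parameter to change it. With many layers, however, it is better to change the shape of the source data directly; otherwise every layer needs the parameter set, and missing one breaks training. In this example we convert the source data from HWC to CHW once, because Paddle's convolution and related APIs default to CHW input, which simplifies the later model training.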
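
The HWC to CHW conversion described above can be sketched with NumPy alone. This is a minimal illustration rather than the tutorial's own code: the two zero-filled arrays are stand-ins for a loaded RGB image and a 2-D label mask, and the variable names are chosen here for the example.

```python
import numpy as np

# Stand-ins for a loaded RGB image (H, W, C) and a 2-D grayscale mask (H, W)
image_hwc = np.zeros((160, 160, 3), dtype='float32')
mask_hw = np.zeros((160, 160), dtype='uint8')

# Image: move the channel axis to the front, HWC -> CHW
image_chw = image_hwc.transpose((2, 0, 1))

# Mask: add a trailing channel axis first, then apply the same transpose
mask_chw = np.expand_dims(mask_hw, axis=2).transpose((2, 0, 1))

print(image_chw.shape)  # (3, 160, 160)
print(mask_chw.shape)   # (1, 160, 160)
```

Doing this once at the data source means no layer needs a non-default ``data_format`` argument.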
.. code:: ipython3

    import random

    from paddle.io import Dataset
    from paddle.vision.transforms import transforms


    class ImgTranspose(object):
        """
        Image preprocessing helper. Expands a 2-D mask to three dimensions,
        (160, 160) => (160, 160, 1), and then reorders the axes,
        for example from HWC to CHW.
        """
        def __init__(self, fmt):
            self.format = fmt

        def __call__(self, img):
            if len(img.shape) == 2:
                img = np.expand_dims(img, axis=2)

            return img.transpose(self.format)


    class PetDataset(Dataset):
        """
        Dataset definition
        """
        def __init__(self, image_path, label_path, mode='train'):
            """
            Constructor
            """
            self.image_size = (160, 160)
            self.image_path = image_path
            self.label_path = label_path
            self.mode = mode.lower()
            self.eval_image_num = 1000

            assert self.mode in ['train', 'test'], \
                "mode should be 'train' or 'test', but got {}".format(self.mode)

            self._parse_dataset()

            self.transforms = transforms.Compose([
                ImgTranspose((2, 0, 1))
            ])

        def _sort_images(self, image_dir, image_type):
            """
            Return the images in a directory, sorted by file name
            """
            files = []

            for image_name in os.listdir(image_dir):
                if image_name.endswith('.{}'.format(image_type)) \
                        and not image_name.startswith('.'):
                    files.append(os.path.join(image_dir, image_name))

            return sorted(files)

        def _parse_dataset(self):
            """
            The files are scattered across directories, while training needs
            every image matched to its label. The first step is therefore to
            organize the raw data into two lists, images and labels, that
            correspond one to one, so that the pairing is easy to look up later.

            We use a very simple method here: sort by file name. This works
            because an image and its label share the same file name and
            differ only in extension.
            """
            temp_train_images = self._sort_images(self.image_path, 'jpg')
            temp_label_images = self._sort_images(self.label_path, 'png')

            # Shuffling both lists with the same seed keeps the pairs aligned
            random.Random(1337).shuffle(temp_train_images)
            random.Random(1337).shuffle(temp_label_images)

            if self.mode == 'train':
                self.train_images = temp_train_images[:-self.eval_image_num]
                self.label_images = temp_label_images[:-self.eval_image_num]
            else:
                self.train_images = temp_train_images[-self.eval_image_num:]
                self.label_images = temp_label_images[-self.eval_image_num:]

        def _load_img(self, path, color_mode='rgb'):
            """
            Common image-loading wrapper that normalizes image size and channels
            """
            with open(path, 'rb') as f:
                img = PilImage.open(io.BytesIO(f.read()))
                if color_mode == 'grayscale':
                    # if image is not already an 8-bit, 16-bit or 32-bit grayscale image
                    # convert it to an 8-bit grayscale image.
                    if img.mode not in ('L', 'I;16', 'I'):
                        img = img.convert('L')
                elif color_mode == 'rgba':
                    if img.mode != 'RGBA':
                        img = img.convert('RGBA')
                elif color_mode == 'rgb':
                    if img.mode != 'RGB':
                        img = img.convert('RGB')
                else:
                    raise ValueError('color_mode must be "grayscale", "rgb", or "rgba"')

                if self.image_size is not None:
                    if img.size != self.image_size:
                        img = img.resize(self.image_size, PilImage.NEAREST)

                return img

        def __getitem__(self, idx):
            """
            Return image, label
            """
            # Data handling needs care: everything must be converted to a format
            # the model accepts. Some images are not RGB, and some have an
            # unexpected number of channels, so both cases must be handled.

            # Load the original image
            train_image = self._load_img(self.train_images[idx])
            x = np.array(train_image, dtype='float32')

            # Preprocess the image: convert the layout (HWC => CHW)
            x = self.transforms(x)

            # Load the label image
            label_image = self._load_img(self.label_images[idx], color_mode="grayscale")
            y = np.array(label_image, dtype='uint8')

            # The label is a 2-D array (size, size); it is expanded to
            # (size, size, 1) and transposed so it can be used in the loss
            y = self.transforms(y)

            # Return image and label in the required formats
            return x, y.astype('int64')

        def __len__(self):
            """
            Return the number of samples in the dataset
            """
            return len(self.train_images)

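
The pairing trick used in ``_parse_dataset`` relies on two generators with the same seed applying the same permutation to lists of equal length. The sketch below checks that on hypothetical file names (the five names are made up for the example) and then shows the tail split used for the evaluation set.

```python
import random

# Hypothetical, already-sorted file lists; image and label share a stem
images = ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg', 'e.jpg']
labels = ['a.png', 'b.png', 'c.png', 'd.png', 'e.png']

# Same seed, same list length: both lists receive the same permutation
random.Random(1337).shuffle(images)
random.Random(1337).shuffle(labels)

# Every image still lines up with its own label
pairs_ok = all(img[:-4] == lab[:-4] for img, lab in zip(images, labels))
print(pairs_ok)

# Hold out the last `eval_num` pairs for evaluation, train on the rest
eval_num = 2
train_images, eval_images = images[:-eval_num], images[-eval_num:]
train_labels, eval_labels = labels[:-eval_num], labels[-eval_num:]
```

The alignment only holds if both directories contain exactly matching file sets; a missing label would silently shift every later pair, so a length check before shuffling is a cheap safeguard.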
3.4 Sampling from PetDataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the Dataset implemented, let us check that it behaves as expected. A Dataset can be indexed and iterated, so we read a sample from it and display it with matplotlib. Note that the segmentation label is a single-channel grayscale image, so pass ``cmap='gray'`` when calling imshow.

.. code:: ipython3

    # Training dataset
    train_dataset = PetDataset(train_images_path, label_images_path, mode='train')
    # Validation dataset
    val_dataset = PetDataset(train_images_path, label_images_path, mode='test')
    # Take one sample
    image, label = train_dataset[0]
    # Show the image next to its label
    plt.figure()
    plt.subplot(1, 2, 1)
    plt.title('Train Image')
    plt.imshow(image.transpose((1, 2, 0)).astype('uint8'))
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.title('Label')
    plt.imshow(np.squeeze(label, axis=0).astype('uint8'), cmap='gray')
    plt.axis('off')
    plt.show()

.. image:: pets_image_segmentation_U_Net_like_files/pets_image_segmentation_U_Net_like_12_0.svg
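4. Building the model
---------------------
U-Net is a U-shaped network that can be viewed as two major stages: the image first passes through the Encoder, which downsamples it into high-level semantic feature maps, and then through the Decoder, which upsamples the feature maps back to the resolution of the original image.

4.1 Defining the SeparableConv2d layer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To reduce the number of trainable parameters in the convolutions and so improve performance, we define a custom SeparableConv2d class that inherits from paddle.nn.Layer. It splits a ``filter_size * filter_size * num_filters`` Conv2d into two smaller Conv2d operations: first each input channel is convolved with its own ``filter_size * filter_size * 1`` kernel, keeping the number of channels unchanged, and then a ``1 * 1 * num_filters`` kernel is applied.

.. code:: ipython3

    class SeparableConv2d(paddle.nn.Layer):
        def __init__(self,
                     in_channels,
                     out_channels,
                     kernel_size,
                     stride=1,
                     padding=0,
                     dilation=1,
                     groups=None,
                     weight_attr=None,
                     bias_attr=None,
                     data_format="NCHW"):
            super(SeparableConv2d, self).__init__()
            # Depthwise convolution: one filter per input channel, no bias
            self.conv_1 = paddle.nn.Conv2d(in_channels,
                                           in_channels,
                                           kernel_size,
                                           stride=stride,
                                           padding=padding,
                                           dilation=dilation,
                                           groups=in_channels,
                                           weight_attr=weight_attr,
                                           bias_attr=False,
                                           data_format=data_format)
            # Pointwise 1 x 1 convolution: mixes channels, in_channels -> out_channels
            self.pointwise = paddle.nn.Conv2d(in_channels,
                                              out_channels,
                                              1,
                                              stride=1,
                                              padding=0,
                                              dilation=1,
                                              groups=1,
                                              weight_attr=weight_attr,
                                              data_format=data_format)

        def forward(self, inputs):
            y = self.conv_1(inputs)
            y = self.pointwise(y)

            return y
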
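
To see how much the separable form saves, the parameter counts can be worked out by hand. The helper below is a small sketch written for this illustration (``conv2d_params`` is not a Paddle function); it uses the 128 to 256 channel case, whose depthwise and pointwise counts match the 1,152 and 33,024 entries in the model summary.

```python
def conv2d_params(in_ch, out_ch, kernel, groups=1, bias=True):
    """Number of trainable parameters in a 2-D convolution."""
    weights = (in_ch // groups) * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


in_ch, out_ch, k = 128, 256, 3

# A single ordinary 3 x 3 convolution
standard = conv2d_params(in_ch, out_ch, k)

# Depthwise 3 x 3 (groups == in_ch, no bias) followed by pointwise 1 x 1
depthwise = conv2d_params(in_ch, in_ch, k, groups=in_ch, bias=False)
pointwise = conv2d_params(in_ch, out_ch, 1)
separable = depthwise + pointwise

print(standard)   # 295168
print(depthwise)  # 1152
print(pointwise)  # 33024
print(separable)  # 34176
```

For this layer the separable version needs roughly an eighth of the parameters of the ordinary convolution.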
4.2 Defining the Encoder
~~~~~~~~~~~~~~~~~~~~~~~~
We wrap the Encoder downsampling step in a Layer so that it can be reused without repeating code. Downsampling corresponds to the descending side of the U: the same unit is repeated, each time increasing the number of channels and shrinking the spatial size, with a residual connection added. We factor all of this into one Layer.

.. code:: ipython3

    class Encoder(paddle.nn.Layer):
        def __init__(self, in_channels, out_channels):
            super(Encoder, self).__init__()

            self.relu = paddle.nn.ReLU()
            self.separable_conv_01 = SeparableConv2d(in_channels,
                                                     out_channels,
                                                     kernel_size=3,
                                                     padding='same')
            self.bn = paddle.nn.BatchNorm2d(out_channels)
            self.separable_conv_02 = SeparableConv2d(out_channels,
                                                     out_channels,
                                                     kernel_size=3,
                                                     padding='same')
            self.pool = paddle.nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            self.residual_conv = paddle.nn.Conv2d(in_channels,
                                                  out_channels,
                                                  kernel_size=1,
                                                  stride=2,
                                                  padding='same')

        def forward(self, inputs):
            previous_block_activation = inputs

            y = self.relu(inputs)
            y = self.separable_conv_01(y)
            y = self.bn(y)
            y = self.relu(y)
            y = self.separable_conv_02(y)
            y = self.bn(y)
            y = self.pool(y)

            residual = self.residual_conv(previous_block_activation)
            y = paddle.add(y, residual)

            return y

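
The residual addition in the Encoder only works if the pooled branch and the strided 1 x 1 branch produce the same spatial size. The sketch below checks this with the usual output-size formulas; it assumes floor-mode pooling and that ``'same'`` padding yields ``ceil(size / stride)``, which agrees with the shapes in the model summary but is stated here as an assumption rather than taken from the Paddle documentation.

```python
import math


def max_pool_out(size, kernel=3, stride=2, padding=1):
    # Floor-mode pooling output size
    return (size + 2 * padding - kernel) // stride + 1


def same_conv_out(size, stride=2):
    # Assumed behaviour of 'same' padding: ceil(size / stride)
    return math.ceil(size / stride)


# Feature-map sizes entering the three encoder blocks for a 160 x 160 input
for size in (80, 40, 20):
    print(size, max_pool_out(size), same_conv_out(size))
```

Under these formulas both branches halve the size, rounding up for odd inputs, so the two tensors passed to ``paddle.add`` agree in shape.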
4.3 Defining the Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
Once the channel count reaches its maximum and the high-level semantic feature maps are obtained, the network starts decoding: it upsamples, gradually reducing the number of channels while increasing the spatial size until the original image size is restored. This stage also repeats a residual block of identical structure, so to avoid repetition we define it as a Layer for use in the model.

.. code:: ipython3

    class Decoder(paddle.nn.Layer):
        def __init__(self, in_channels, out_channels):
            super(Decoder, self).__init__()

            self.relu = paddle.nn.ReLU()
            self.conv_transpose_01 = paddle.nn.ConvTranspose2d(in_channels,
                                                               out_channels,
                                                               kernel_size=3,
                                                               padding='same')
            self.conv_transpose_02 = paddle.nn.ConvTranspose2d(out_channels,
                                                               out_channels,
                                                               kernel_size=3,
                                                               padding='same')
            self.bn = paddle.nn.BatchNorm2d(out_channels)
            self.upsample = paddle.nn.UpSample(scale_factor=2.0)
            self.residual_conv = paddle.nn.Conv2d(in_channels,
                                                  out_channels,
                                                  kernel_size=1,
                                                  padding='same')

        def forward(self, inputs):
            previous_block_activation = inputs

            y = self.relu(inputs)
            y = self.conv_transpose_01(y)
            y = self.bn(y)
            y = self.relu(y)
            y = self.conv_transpose_02(y)
            y = self.bn(y)
            y = self.upsample(y)

            residual = self.upsample(previous_block_activation)
            residual = self.residual_conv(residual)

            y = paddle.add(y, residual)

            return y

4.4 Assembling the training model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The overall network follows a U-shaped layout with three downsampling blocks and four upsampling blocks. The stride-2 convolution at the input supplies the fourth halving, so the output has the same spatial size as the input.

.. code:: ipython3

    class PetModel(paddle.nn.Layer):
        def __init__(self, num_classes):
            super(PetModel, self).__init__()

            self.conv_1 = paddle.nn.Conv2d(3, 32,
                                           kernel_size=3,
                                           stride=2,
                                           padding='same')
            self.bn = paddle.nn.BatchNorm2d(32)
            self.relu = paddle.nn.ReLU()

            in_channels = 32
            self.encoders = []
            self.encoder_list = [64, 128, 256]
            self.decoder_list = [256, 128, 64, 32]

            # Define the encoder sublayers in a loop driven by the configuration
            # list, instead of writing each one out by hand.
            for out_channels in self.encoder_list:
                # Use a '{}' placeholder: str.format does not substitute '%s',
                # so '%s' would give every sublayer the same name.
                block = self.add_sublayer('encoder_{}'.format(out_channels),
                                          Encoder(in_channels, out_channels))
                self.encoders.append(block)
                in_channels = out_channels

            self.decoders = []

            # Define the decoder sublayers in the same way.
            for out_channels in self.decoder_list:
                block = self.add_sublayer('decoder_{}'.format(out_channels),
                                          Decoder(in_channels, out_channels))
                self.decoders.append(block)
                in_channels = out_channels

            self.output_conv = paddle.nn.Conv2d(in_channels,
                                                num_classes,
                                                kernel_size=3,
                                                padding='same')

        def forward(self, inputs):
            y = self.conv_1(inputs)
            y = self.bn(y)
            y = self.relu(y)

            for encoder in self.encoders:
                y = encoder(y)

            for decoder in self.decoders:
                y = decoder(y)

            y = self.output_conv(y)

            return y
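
Sublayers registered through ``add_sublayer`` are identified by name, so each name produced in the loop should be distinct. One detail worth checking is that ``str.format`` only fills ``{}`` placeholders and leaves a printf-style ``%s`` untouched. The snippet below is plain Python and makes no Paddle calls; the dict standing in for a name-keyed registry is an assumption for illustration, not Paddle's actual implementation.

```python
channels = [64, 128, 256]

# str.format ignores "%s": every iteration yields the same string.
percent_names = ['encoder_%s'.format(c) for c in channels]
# A "{}" placeholder is substituted as intended.
brace_names = ['encoder_{}'.format(c) for c in channels]

print(percent_names)  # ['encoder_%s', 'encoder_%s', 'encoder_%s']
print(brace_names)    # ['encoder_64', 'encoder_128', 'encoder_256']

# A plain dict stands in for a name-keyed registry (an assumption, not Paddle code).
registry = {}
for name in percent_names:
    registry[name] = object()
print(len(registry))  # 1
```

With identical names, a name-keyed store keeps only the last entry, which is why unique names matter when parameters are later collected from the registered sublayers.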
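4.5 Visualizing the model
~~~~~~~~~~~~~~~~~~~~~~~~~

Call the summary interface provided by Paddle to display the assembled model, so that its structure and parameter counts can be inspected and confirmed.

@TODO: needs to be replaced.

.. code:: ipython3

    from paddle.static import InputSpec

    paddle.disable_static()
    num_classes = 4
    model = paddle.Model(PetModel(num_classes))
    model.summary((3, 160, 160))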
.. parsed-literal::

    --------------------------------------------------------------------------------
    Layer (type)        Input Shape         Output Shape           Param #
    ================================================================================
    Conv2d-22           [-1, 3, 160, 160]   [-1, 32, 80, 80]           896
    BatchNorm2d-9       [-1, 32, 80, 80]    [-1, 32, 80, 80]            64
    ReLU-9              [-1, 32, 80, 80]    [-1, 32, 80, 80]             0
    ReLU-12             [-1, 256, 20, 20]   [-1, 256, 20, 20]            0
    Conv2d-33           [-1, 128, 20, 20]   [-1, 128, 20, 20]        1,152
    Conv2d-34           [-1, 128, 20, 20]   [-1, 256, 20, 20]       33,024
    SeparableConv2d-11  [-1, 128, 20, 20]   [-1, 256, 20, 20]            0
    BatchNorm2d-12      [-1, 256, 20, 20]   [-1, 256, 20, 20]          512
    Conv2d-35           [-1, 256, 20, 20]   [-1, 256, 20, 20]        2,304
    Conv2d-36           [-1, 256, 20, 20]   [-1, 256, 20, 20]       65,792
    SeparableConv2d-12  [-1, 256, 20, 20]   [-1, 256, 20, 20]            0
    MaxPool2d-6         [-1, 256, 20, 20]   [-1, 256, 10, 10]            0
    Conv2d-37           [-1, 128, 20, 20]   [-1, 256, 10, 10]       33,024
    Encoder-6           [-1, 128, 20, 20]   [-1, 256, 10, 10]            0
    ReLU-16             [-1, 32, 80, 80]    [-1, 32, 80, 80]             0
    ConvTranspose2d-15  [-1, 64, 80, 80]    [-1, 32, 80, 80]        18,464
    BatchNorm2d-16      [-1, 32, 80, 80]    [-1, 32, 80, 80]            64
    ConvTranspose2d-16  [-1, 32, 80, 80]    [-1, 32, 80, 80]         9,248
    UpSample-8          [-1, 64, 80, 80]    [-1, 64, 160, 160]           0
    Conv2d-41           [-1, 64, 160, 160]  [-1, 32, 160, 160]       2,080
    Decoder-8           [-1, 64, 80, 80]    [-1, 32, 160, 160]           0
    Conv2d-42           [-1, 32, 160, 160]  [-1, 4, 160, 160]        1,156
    ================================================================================
    Total params: 167,780
    Trainable params: 167,780
    Non-trainable params: 0
    --------------------------------------------------------------------------------
    Input size (MB): 0.29
    Forward/backward pass size (MB): 43.16
    Params size (MB): 0.64
    Estimated Total Size (MB): 44.10
    --------------------------------------------------------------------------------

.. parsed-literal::

    {'total_params': 167780, 'trainable_params': 167780}
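
The ``Param #`` column can be sanity-checked by hand. For an ordinary convolution with a bias term, the count is in_channels x out_channels x k x k + out_channels. The sketch below is plain Python; ``conv_params`` is a helper name of ours, and the assumption that these layers carry a bias is inferred from the printed numbers rather than stated in the tutorial.

```python
def conv_params(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional per-output-channel bias of a dense 2-D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)

print(conv_params(3, 32, 3))     # Conv2d-22          -> 896
print(conv_params(128, 256, 1))  # Conv2d-34 / -37    -> 33024
print(conv_params(64, 32, 3))    # ConvTranspose2d-15 -> 18464
print(conv_params(32, 4, 3))     # Conv2d-42          -> 1156
```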
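5. Model training
-----------------

5.1 Configuration
~~~~~~~~~~~~~~~~~

Define the training BATCH_SIZE, the number of training epochs, the compute device and related settings.

.. code:: ipython3

    BATCH_SIZE = 32
    EPOCHS = 15
    device = paddle.set_device('gpu')
    paddle.disable_static(device)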
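5.3 Custom loss
~~~~~~~~~~~~~~~

For this task we use the SoftmaxWithCrossEntropy loss. Paddle provides it as a functional API; here we wrap it in a class-style API for use during training. We do not use CrossEntropyLoss directly mainly because we need to choose the dimension over which softmax is computed: here it must be dimension 1 (the class channel of the model output) rather than the default last dimension, so we use the loss above and specify the softmax dimension through the axis parameter.

.. code:: ipython3

    class SoftmaxWithCrossEntropy(paddle.nn.Layer):
        def __init__(self):
            super(SoftmaxWithCrossEntropy, self).__init__()

        def forward(self, input, label):
            loss = F.softmax_with_cross_entropy(input,
                                                label,
                                                return_softmax=False,
                                                axis=1)
            return paddle.mean(loss)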
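
To make the role of ``axis=1`` concrete, the following sketch computes the same kind of per-pixel softmax cross-entropy by hand for one tiny sample laid out as [class][row][col]. It is plain Python rather than Paddle; the function name ``pixel_cross_entropy`` and the toy numbers are ours, and the code is meant only to illustrate the arithmetic, not to reproduce Paddle's implementation.

```python
import math

def pixel_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over pixels.

    logits: nested list indexed [class][row][col]
    labels: nested list indexed [row][col] holding class ids
    """
    num_classes = len(logits)
    height, width = len(labels), len(labels[0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            # Gather the scores of every class at this pixel (the class axis).
            scores = [logits[k][r][c] for k in range(num_classes)]
            peak = max(scores)  # subtract the max for numerical stability
            log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_sum - scores[labels[r][c]]
    return total / (height * width)

# Two classes on a 1x2 image: equal scores at the first pixel,
# a confident correct prediction at the second.
logits = [[[0.0, 5.0]],
          [[0.0, -5.0]]]
labels = [[0, 0]]
loss = pixel_cross_entropy(logits, labels)
print(loss)
```

The first pixel contributes log 2 because both classes are equally likely, while the second contributes almost nothing; the softmax is always taken across classes at a fixed pixel, which is what choosing the class axis means.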
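5.4 Starting model training
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a Model instance from the network code, then use the prepare interface to set the optimizer, loss function, evaluation metrics and so on for the subsequent training. Once this setup is complete, call the fit interface to start training; fit only needs the training dataset, the evaluation dataset, the number of epochs (Epoch) and the batch size (batch_size) defined earlier.

.. code:: ipython3

    # Build the model first so that the optimizer is bound to the parameters
    # that will actually be trained. PetModel takes only num_classes.
    model = paddle.Model(PetModel(num_classes))
    optim = paddle.optimizer.RMSProp(learning_rate=0.001,
                                     rho=0.9,
                                     momentum=0.0,
                                     epsilon=1e-07,
                                     centered=False,
                                     parameters=model.parameters())
    model.prepare(optim,
                  SoftmaxWithCrossEntropy())
    model.fit(train_dataset,
              val_dataset,
              epochs=EPOCHS,
              batch_size=BATCH_SIZE)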
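6. Model prediction
-------------------

6.1 Preparing the prediction dataset and predicting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We again use PetDataset to instantiate the dataset used for prediction. For convenience we do not prepare separate prediction data and simply reuse the evaluation data.

The model.predict interface can be used directly to run prediction on a dataset; only the prediction dataset needs to be passed in.

.. code:: ipython3

    predict_results = model.predict(val_dataset)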
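6.2 Visualizing the prediction results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We take three animals from the prediction dataset to see how well the model does, showing the original image, the label mask and the predicted result for each.

.. code:: ipython3

    plt.figure(figsize=(10, 10))

    i = 0
    mask_idx = 0

    for data in val_dataset:
        # Three rows of three panels each: stop after nine subplots.
        if i > 8:
            break
        plt.subplot(3, 3, i + 1)
        plt.imshow(data[0].transpose((1, 2, 0)).astype('uint8'))
        plt.title('Input Image')
        plt.axis("off")

        plt.subplot(3, 3, i + 2)
        plt.imshow(np.squeeze(data[1], axis=0).astype('uint8'), cmap='gray')
        plt.title('Label')
        plt.axis("off")

        # Use the results returned by model.predict above; convert the
        # per-class scores to a mask by taking the argmax over the class axis.
        data = predict_results[0][mask_idx][0].transpose((1, 2, 0))
        mask = np.argmax(data, axis=-1)
        mask = np.expand_dims(mask, axis=-1)

        plt.subplot(3, 3, i + 3)
        plt.imshow(np.squeeze(mask, axis=2).astype('uint8'), cmap='gray')
        plt.title('Predict')
        plt.axis("off")
        i += 3
        mask_idx += 1

    plt.show()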
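
The prediction for each image is a stack of per-class score maps, and taking the argmax over the class axis turns it into a single mask of class ids. The sketch below shows that step in plain Python on a toy 2x2 example; ``argmax_mask`` and the numbers are illustrative and not part of the tutorial, which uses ``np.argmax`` on the real output.

```python
def argmax_mask(scores):
    """scores is indexed [row][col][class]; returns the winning class id per pixel."""
    return [[max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
            for row in scores]

# Toy height x width x classes scores for a 2x2 image with 3 classes.
scores = [
    [[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
]
mask = argmax_mask(scores)
print(mask)  # [[1, 0], [2, 1]]
```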
[Figure: SVG image generated by matplotlib, with two panels titled "Train Image" and "Label".]
@@ -6,10 +6,16 @@
Here PaddlePaddle provides tutorials on computer vision (CV) for you to study:
- `Image classification <./mnist_lenet_classification/mnist_lenet_classification.html>`_ : uses Paddle to classify images from the MNIST dataset.
- `Image classification <./convnet_image_classification/convnet_image_classification.html>`_ : uses Paddle to classify images from the CIFAR10 dataset.
- `Image search <./image_search/image_search.html>`_ : uses Paddle to implement image search.
- `Image segmentation <./image_segmentation/pets_image_segmentation_U_Net_like.html>`_ : uses Paddle to implement a U-Net model for image segmentation.
.. toctree::
    :hidden:
    :titlesonly:
    mnist_lenet_classification/mnist_lenet_classification.rst
    convnet_image_classification/convnet_image_classification.rst
    image_search/image_search.rst
    image_segmentation/pets_image_segmentation_U_Net_like.rst
@@ -11,9 +11,13 @@
**Contents**
- `Quick start <./simple_case/index_cn.html>`_ : a quick look at the features and capabilities of Paddle 2.
- `Computer vision <./cv_case/index_cn.html>`_ : case studies that use Paddle for computer vision.
- `Natural language processing <./nlp_case/index_cn.html>`_ : case studies that use Paddle for natural language processing.
.. toctree::
    :hidden:
    quick_start/index_cn.rst
    cv_case/index_cn.rst
    nlp_case/index_cn.rst
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
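"# Text classification on the IMDB dataset with a BOW network\n",
"\n",
"This tutorial shows how to perform text classification on the IMDB dataset with a simple bag-of-words (BOW) network.\n",
"\n",
"The IMDB dataset consists of movie reviews labeled as positive or negative, with 25,000 reviews for training and 25,000 for testing.\n",
"The official address of the dataset is: http://ai.stanford.edu/~amaas/data/sentiment/\n",
"\n",
"- Warning: `paddle.dataset.imdb` is currently a very rough implementation; a replacement will be needed later."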
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Environment setup\n",
"\n",
"This example is based on version 2.0 of the PaddlePaddle open-source framework."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0.0\n",
"264e76cae6861ad9b1d4bcd8c3212f7a78c01e4d\n"
]
}
],
"source": [
"import paddle\n",
"import numpy as np\n",
"\n",
"paddle.disable_static()\n",
"print(paddle.__version__)\n",
"print(paddle.__git_commit__)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
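"# Loading the data\n",
"\n",
"We use `paddle.dataset` to download the data, build the dictionary and prepare the data readers. In Paddle 2.0, padding is the recommended way to bring sequences of different lengths within one batch to the same length, so we also add a special `<pad>` token to the dictionary; it is used later to fill the shorter sentences in a batch."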
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading IMDB word dict....\n"
]
}
],
"source": [
"print(\"Loading IMDB word dict....\")\n",
"word_dict = paddle.dataset.imdb.word_dict()\n",
"\n",
"train_reader = paddle.dataset.imdb.train(word_dict)\n",
"test_reader = paddle.dataset.imdb.test(word_dict)\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the:0\n",
"and:1\n",
"a:2\n",
"of:3\n",
"to:4\n",
"...\n",
"virtual:5143\n",
"warriors:5144\n",
"widely:5145\n",
"<unk>:5146\n",
"<pad>:5147\n",
"totally 5148 words\n"
]
}
],
"source": [
"# add a pad token to the dict for later padding the sequence\n",
"word_dict['<pad>'] = len(word_dict)\n",
"\n",
"for k in list(word_dict)[:5]:\n",
" print(\"{}:{}\".format(k.decode('ASCII'), word_dict[k]))\n",
"\n",
"print(\"...\")\n",
"\n",
"for k in list(word_dict)[-5:]:\n",
" print(\"{}:{}\".format(k if isinstance(k, str) else k.decode('ASCII'), word_dict[k]))\n",
"\n",
"print(\"totally {} words\".format(len(word_dict)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Parameter settings\n",
"\n",
"Here we set the vocabulary size, the `embedding` size, batch_size, and so on."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"vocab_size = len(word_dict)\n",
"emb_size = 256\n",
"seq_len = 200\n",
"batch_size = 32\n",
"epoch_num = 2\n",
"pad_id = word_dict['<pad>']\n",
"\n",
"classes = ['negative', 'positive']\n",
"\n",
"def ids_to_str(ids):\n",
" #print(ids)\n",
" words = []\n",
" for k in ids:\n",
" w = list(word_dict)[k]\n",
" words.append(w if isinstance(w, str) else w.decode('ASCII'))\n",
" return \" \".join(words)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we take one sample and print it to get a first, intuitive impression of the data."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5146, 43, 71, 6, 1092, 14, 0, 878, 130, 151, 5146, 18, 281, 747, 0, 5146, 3, 5146, 2165, 37, 5146, 46, 5, 71, 4089, 377, 162, 46, 5, 32, 1287, 300, 35, 203, 2136, 565, 14, 2, 253, 26, 146, 61, 372, 1, 615, 5146, 5, 30, 0, 50, 3290, 6, 2148, 14, 0, 5146, 11, 17, 451, 24, 4, 127, 10, 0, 878, 130, 43, 2, 50, 5146, 751, 5146, 5, 2, 221, 3727, 6, 9, 1167, 373, 9, 5, 5146, 7, 5, 1343, 13, 2, 5146, 1, 250, 7, 98, 4270, 56, 2316, 0, 928, 11, 11, 9, 16, 5, 5146, 5146, 6, 50, 69, 27, 280, 27, 108, 1045, 0, 2633, 4177, 3180, 17, 1675, 1, 2571] 0\n",
"<unk> has much in common with the third man another <unk> film set among the <unk> of <unk> europe like <unk> there is much inventive camera work there is an innocent american who gets emotionally involved with a woman he doesnt really understand and whose <unk> is all the more striking in contrast with the <unk> br but id have to say that the third man has a more <unk> storyline <unk> is a bit disjointed in this respect perhaps this is <unk> it is presented as a <unk> and making it too coherent would spoil the effect br br this movie is <unk> <unk> in more than one sense one never sees the sun shine grim but intriguing and frightening\n",
"negative\n"
]
}
],
"source": [
"# Take the first sample and see what it looks like.\n",
"sent, label = next(train_reader())\n",
"print(sent, label)\n",
"\n",
"print(ids_to_str(sent))\n",
"print(classes[label])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
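"# Aligning the data with padding\n",
"\n",
"Sentences in text data all have different lengths. To simplify the later neural-network computation, a common approach is to bring every sample in the dataset to the same length: longer samples are truncated and shorter ones are filled with the special `<pad>` token. The code below applies this processing to the dataset."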
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(25000, 200)\n",
"(25000, 1)\n",
"(25000, 200)\n",
"(25000, 1)\n",
"<unk> has much in common with the third man another <unk> film set among the <unk> of <unk> europe like <unk> there is much inventive camera work there is an innocent american who gets emotionally involved with a woman he doesnt really understand and whose <unk> is all the more striking in contrast with the <unk> br but id have to say that the third man has a more <unk> storyline <unk> is a bit disjointed in this respect perhaps this is <unk> it is presented as a <unk> and making it too coherent would spoil the effect br br this movie is <unk> <unk> in more than one sense one never sees the sun shine grim but intriguing and frightening <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>\n",
"<unk> is the most original movie ive seen in years if you like unique thrillers that are influenced by film noir then this is just the right cure for all of those hollywood summer <unk> <unk> the theaters these days von <unk> <unk> like breaking the waves have gotten more <unk> but this is really his best work it is <unk> without being distracting and offers the perfect combination of suspense and dark humor its too bad he decided <unk> cameras were the wave of the future its hard to say who talked him away from the style he <unk> here but its everyones loss that he went into his heavily <unk> <unk> direction instead <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>\n",
"<unk> von <unk> is never <unk> in trying out new techniques some of them are very original while others are best <unk> br he depicts <unk> germany as a <unk> train journey with so many cities lying in ruins <unk> <unk> a young american of german descent feels <unk> to help in their <unk> it is not a simple task as he quickly finds outbr br his uncle finds him a job as a night <unk> on the <unk> <unk> line his job is to <unk> to the needs of the passengers when the shoes are <unk> a <unk> mark is made on the <unk> a terrible argument <unk> when a passengers shoes are not <unk> despite the fact they have been <unk> there are many <unk> to the german <unk> of <unk> to such stupid <unk> br the <unk> journey is like an <unk> <unk> mans <unk> through life with all its <unk> and <unk> in one sequence <unk> <unk> through the back <unk> to discover them filled with <unk> bodies appearing to have just escaped from <unk> these images horrible as they are are <unk> as in a dream each with its own terrible impact yet <unk> br\n"
]
}
],
"source": [
"def create_padded_dataset(reader):\n",
" padded_sents = []\n",
" labels = []\n",
" for batch_id, data in enumerate(reader):\n",
" sent, label = data\n",
" padded_sent = sent[:seq_len] + [pad_id] * (seq_len - len(sent))\n",
" padded_sents.append(padded_sent)\n",
" labels.append(label)\n",
" return np.array(padded_sents), np.expand_dims(np.array(labels), axis=1)\n",
"\n",
"train_sents, train_labels = create_padded_dataset(train_reader())\n",
"test_sents, test_labels = create_padded_dataset(test_reader())\n",
"\n",
"print(train_sents.shape)\n",
"print(train_labels.shape)\n",
"print(test_sents.shape)\n",
"print(test_labels.shape)\n",
"\n",
"for sent in train_sents[:3]:\n",
" print(ids_to_str(sent))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
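"# Building the network\n",
"\n",
"In this example we use a BOW network that ignores word order: after looking up the embedding of each word, we simply average the embeddings to represent the sentence, then apply a linear transformation with `Linear`. To reduce overfitting we also use `Dropout`."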
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"class MyNet(paddle.nn.Layer):\n",
" def __init__(self):\n",
" super(MyNet, self).__init__()\n",
" self.emb = paddle.nn.Embedding(vocab_size, emb_size)\n",
" self.fc = paddle.nn.Linear(in_features=emb_size, out_features=2)\n",
" self.dropout = paddle.nn.Dropout(0.5)\n",
"\n",
" def forward(self, x):\n",
" x = self.emb(x)\n",
" x = paddle.reduce_mean(x, dim=1)\n",
" x = self.dropout(x)\n",
" x = self.fc(x)\n",
" return x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training the model\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch: 0, batch_id: 0, loss is: [0.6926701]\n",
"epoch: 0, batch_id: 500, loss is: [0.41248566]\n",
"[validation] accuracy/loss: 0.8505121469497681/0.3615057170391083\n",
"epoch: 1, batch_id: 0, loss is: [0.29521096]\n",
"epoch: 1, batch_id: 500, loss is: [0.2916747]\n",
"[validation] accuracy/loss: 0.86475670337677/0.3259459137916565\n"
]
}
],
"source": [
"def train(model):\n",
" model.train()\n",
"\n",
" opt = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())\n",
"\n",
" for epoch in range(epoch_num):\n",
" # shuffle data\n",
" perm = np.random.permutation(len(train_sents))\n",
" train_sents_shuffled = train_sents[perm]\n",
" train_labels_shuffled = train_labels[perm]\n",
" \n",
" for batch_id in range(len(train_sents_shuffled) // batch_size):\n",
" x_data = train_sents_shuffled[(batch_id * batch_size):((batch_id+1)*batch_size)]\n",
" y_data = train_labels_shuffled[(batch_id * batch_size):((batch_id+1)*batch_size)]\n",
" \n",
" sent = paddle.to_tensor(x_data)\n",
" label = paddle.to_tensor(y_data)\n",
" \n",
" logits = model(sent)\n",
" loss = paddle.nn.functional.softmax_with_cross_entropy(logits, label)\n",
" \n",
" avg_loss = paddle.mean(loss)\n",
" if batch_id % 500 == 0:\n",
" print(\"epoch: {}, batch_id: {}, loss is: {}\".format(epoch, batch_id, avg_loss.numpy()))\n",
" avg_loss.backward()\n",
" opt.minimize(avg_loss)\n",
" model.clear_gradients()\n",
"\n",
" # evaluate model after one epoch\n",
" model.eval()\n",
" accuracies = []\n",
" losses = []\n",
" for batch_id in range(len(test_sents) // batch_size):\n",
" x_data = test_sents[(batch_id * batch_size):((batch_id+1)*batch_size)]\n",
" y_data = test_labels[(batch_id * batch_size):((batch_id+1)*batch_size)]\n",
" \n",
" sent = paddle.to_tensor(x_data)\n",
" label = paddle.to_tensor(y_data)\n",
"\n",
" logits = model(sent)\n",
" loss = paddle.nn.functional.softmax_with_cross_entropy(logits, label)\n",
" acc = paddle.metric.accuracy(logits, label)\n",
" \n",
" accuracies.append(acc.numpy())\n",
" losses.append(loss.numpy())\n",
" \n",
" avg_acc, avg_loss = np.mean(accuracies), np.mean(losses)\n",
" print(\"[validation] accuracy/loss: {}/{}\".format(avg_acc, avg_loss))\n",
" \n",
" model.train()\n",
" \n",
"model = MyNet()\n",
"train(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The End\n",
"\n",
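"As the output shows, two epochs on this dataset give an accuracy of about 86%. You can also adjust the network structure and hyperparameters to get better results."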
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"name": "cifar-10-cnn.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
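Text classification on the IMDB dataset with a BOW network
============================================================
This tutorial shows how to perform text classification on the IMDB dataset with a simple BOW (bag-of-words) network.
The IMDB dataset consists of movie reviews labeled as positive or negative, with 25000 reviews in the training set and 25000 in the test set.
The official page of the dataset is: http://ai.stanford.edu/~amaas/data/sentiment/
- Warning:
``paddle.dataset.imdb``\ is currently a very rough implementation and will need to be replaced later.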
Environment setup
-----------------
This example is based on version 2.0 of the PaddlePaddle open-source framework.
.. code::
import paddle
import numpy as np
paddle.disable_static()
print(paddle.__version__)
print(paddle.__git_commit__)
.. parsed-literal::
0.0.0
264e76cae6861ad9b1d4bcd8c3212f7a78c01e4d
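Loading the data
----------------
We use ``paddle.dataset`` to download the data, build the dictionary, and prepare the data readers. In PaddlePaddle 2.0, padding is the recommended way to bring sequences of different lengths within a batch to the same length, so we also add a special ``<pad>`` token to the dictionary, which is later used to fill up the shorter sentences in a batch.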
.. code::
print("Loading IMDB word dict....")
word_dict = paddle.dataset.imdb.word_dict()
train_reader = paddle.dataset.imdb.train(word_dict)
test_reader = paddle.dataset.imdb.test(word_dict)
.. parsed-literal::
Loading IMDB word dict....
.. code::
# add a pad token to the dict for later padding the sequence
word_dict['<pad>'] = len(word_dict)
for k in list(word_dict)[:5]:
print("{}:{}".format(k.decode('ASCII'), word_dict[k]))
print("...")
for k in list(word_dict)[-5:]:
print("{}:{}".format(k if isinstance(k, str) else k.decode('ASCII'), word_dict[k]))
print("totally {} words".format(len(word_dict)))
.. parsed-literal::
the:0
and:1
a:2
of:3
to:4
...
virtual:5143
warriors:5144
widely:5145
<unk>:5146
<pad>:5147
totally 5148 words
Parameter settings
------------------
Here we set the vocabulary size, the ``embedding`` size, batch_size, and so on.
.. code::
vocab_size = len(word_dict)
emb_size = 256
seq_len = 200
batch_size = 32
epoch_num = 2
pad_id = word_dict['<pad>']
classes = ['negative', 'positive']
def ids_to_str(ids):
#print(ids)
words = []
for k in ids:
w = list(word_dict)[k]
words.append(w if isinstance(w, str) else w.decode('ASCII'))
return " ".join(words)
Here we take one sample and print it to get a first impression of the data.
.. code::
# Take a look at the first sample.
sent, label = next(train_reader())
print(sent, label)
print(ids_to_str(sent))
print(classes[label])
.. parsed-literal::
[5146, 43, 71, 6, 1092, 14, 0, 878, 130, 151, 5146, 18, 281, 747, 0, 5146, 3, 5146, 2165, 37, 5146, 46, 5, 71, 4089, 377, 162, 46, 5, 32, 1287, 300, 35, 203, 2136, 565, 14, 2, 253, 26, 146, 61, 372, 1, 615, 5146, 5, 30, 0, 50, 3290, 6, 2148, 14, 0, 5146, 11, 17, 451, 24, 4, 127, 10, 0, 878, 130, 43, 2, 50, 5146, 751, 5146, 5, 2, 221, 3727, 6, 9, 1167, 373, 9, 5, 5146, 7, 5, 1343, 13, 2, 5146, 1, 250, 7, 98, 4270, 56, 2316, 0, 928, 11, 11, 9, 16, 5, 5146, 5146, 6, 50, 69, 27, 280, 27, 108, 1045, 0, 2633, 4177, 3180, 17, 1675, 1, 2571] 0
<unk> has much in common with the third man another <unk> film set among the <unk> of <unk> europe like <unk> there is much inventive camera work there is an innocent american who gets emotionally involved with a woman he doesnt really understand and whose <unk> is all the more striking in contrast with the <unk> br but id have to say that the third man has a more <unk> storyline <unk> is a bit disjointed in this respect perhaps this is <unk> it is presented as a <unk> and making it too coherent would spoil the effect br br this movie is <unk> <unk> in more than one sense one never sees the sun shine grim but intriguing and frightening
negative
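Aligning the data with padding
------------------------------
Sentences in text data all have different lengths. To simplify the later neural-network computation, a common approach is to bring every sample in the dataset to the same length: longer samples are truncated, and shorter ones are filled with the special ``<pad>`` token. The code below applies this processing to the dataset.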
.. code::
def create_padded_dataset(reader):
padded_sents = []
labels = []
for batch_id, data in enumerate(reader):
sent, label = data
padded_sent = sent[:seq_len] + [pad_id] * (seq_len - len(sent))
padded_sents.append(padded_sent)
labels.append(label)
return np.array(padded_sents), np.expand_dims(np.array(labels), axis=1)
train_sents, train_labels = create_padded_dataset(train_reader())
test_sents, test_labels = create_padded_dataset(test_reader())
print(train_sents.shape)
print(train_labels.shape)
print(test_sents.shape)
print(test_labels.shape)
for sent in train_sents[:3]:
print(ids_to_str(sent))
.. parsed-literal::
(25000, 200)
(25000, 1)
(25000, 200)
(25000, 1)
<unk> has much in common with the third man another <unk> film set among the <unk> of <unk> europe like <unk> there is much inventive camera work there is an innocent american who gets emotionally involved with a woman he doesnt really understand and whose <unk> is all the more striking in contrast with the <unk> br but id have to say that the third man has a more <unk> storyline <unk> is a bit disjointed in this respect perhaps this is <unk> it is presented as a <unk> and making it too coherent would spoil the effect br br this movie is <unk> <unk> in more than one sense one never sees the sun shine grim but intriguing and frightening <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
<unk> is the most original movie ive seen in years if you like unique thrillers that are influenced by film noir then this is just the right cure for all of those hollywood summer <unk> <unk> the theaters these days von <unk> <unk> like breaking the waves have gotten more <unk> but this is really his best work it is <unk> without being distracting and offers the perfect combination of suspense and dark humor its too bad he decided <unk> cameras were the wave of the future its hard to say who talked him away from the style he <unk> here but its everyones loss that he went into his heavily <unk> <unk> direction instead <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
<unk> von <unk> is never <unk> in trying out new techniques some of them are very original while others are best <unk> br he depicts <unk> germany as a <unk> train journey with so many cities lying in ruins <unk> <unk> a young american of german descent feels <unk> to help in their <unk> it is not a simple task as he quickly finds outbr br his uncle finds him a job as a night <unk> on the <unk> <unk> line his job is to <unk> to the needs of the passengers when the shoes are <unk> a <unk> mark is made on the <unk> a terrible argument <unk> when a passengers shoes are not <unk> despite the fact they have been <unk> there are many <unk> to the german <unk> of <unk> to such stupid <unk> br the <unk> journey is like an <unk> <unk> mans <unk> through life with all its <unk> and <unk> in one sequence <unk> <unk> through the back <unk> to discover them filled with <unk> bodies appearing to have just escaped from <unk> these images horrible as they are are <unk> as in a dream each with its own terrible impact yet <unk> br
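The truncate-or-pad rule is easier to see on a toy input. The sketch below is illustrative only: ``pad_or_truncate`` and the sample ids are hypothetical names and values, not part of the tutorial code.

```python
# Toy illustration of truncate-or-pad; all values here are made up.
def pad_or_truncate(ids, seq_len, pad_id):
    """Return a list of exactly seq_len ids."""
    clipped = ids[:seq_len]  # truncate inputs longer than seq_len
    # Pad inputs shorter than seq_len; the multiplier is 0 when nothing is missing.
    return clipped + [pad_id] * (seq_len - len(clipped))


short = pad_or_truncate([7, 8, 9], seq_len=5, pad_id=0)
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], seq_len=5, pad_id=0)
print(short)  # [7, 8, 9, 0, 0]
print(long_)  # [1, 2, 3, 4, 5]
```

This matches what the third sample above shows: a review longer than ``seq_len`` is cut off and carries no ``<pad>`` tokens.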
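Building the network
--------------------
In this example we use a BOW network that ignores word order: after looking up the embedding of each word, we simply average them to obtain a representation of the sentence, and then apply a linear transformation with ``Linear``. To reduce overfitting we also use ``Dropout``.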
.. code::
class MyNet(paddle.nn.Layer):
def __init__(self):
super(MyNet, self).__init__()
self.emb = paddle.nn.Embedding(vocab_size, emb_size)
self.fc = paddle.nn.Linear(in_features=emb_size, out_features=2)
self.dropout = paddle.nn.Dropout(0.5)
def forward(self, x):
x = self.emb(x)
x = paddle.reduce_mean(x, dim=1)
x = self.dropout(x)
x = self.fc(x)
return x
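To make the "ignores word order" property concrete, here is a minimal pure-Python sketch of the same computation (embedding lookup, mean over the sequence, linear layer) with dropout left out. The function ``bow_forward`` and the tiny tables are assumptions chosen for illustration; this is not Paddle code.

```python
# Pure-Python sketch of a bag-of-words forward pass (no dropout).
# emb_table, weight and bias are tiny hypothetical parameters.
def bow_forward(token_ids, emb_table, weight, bias):
    vectors = [emb_table[t] for t in token_ids]  # embedding lookup
    dim = len(vectors[0])
    # Average the word vectors over the sequence dimension.
    sent_vec = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    # Linear layer: one logit per class.
    return [
        sum(sent_vec[d] * weight[d][c] for d in range(dim)) + bias[c]
        for c in range(len(bias))
    ]


emb_table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 words, 2 dimensions
weight = [[1.0, -1.0], [0.5, 0.5]]                # 2 dimensions -> 2 classes
bias = [0.0, 0.25]

logits_a = bow_forward([0, 2, 1], emb_table, weight, bias)
logits_b = bow_forward([1, 0, 2], emb_table, weight, bias)  # same words, shuffled
print(logits_a, logits_b)  # [1.5, -0.25] [1.5, -0.25]
```

Both orderings produce the same logits, because the mean does not depend on the position of each word.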
Training the model
------------------
.. code::
def train(model):
model.train()
opt = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
for epoch in range(epoch_num):
# shuffle data
perm = np.random.permutation(len(train_sents))
train_sents_shuffled = train_sents[perm]
train_labels_shuffled = train_labels[perm]
for batch_id in range(len(train_sents_shuffled) // batch_size):
x_data = train_sents_shuffled[(batch_id * batch_size):((batch_id+1)*batch_size)]
y_data = train_labels_shuffled[(batch_id * batch_size):((batch_id+1)*batch_size)]
sent = paddle.to_tensor(x_data)
label = paddle.to_tensor(y_data)
logits = model(sent)
loss = paddle.nn.functional.softmax_with_cross_entropy(logits, label)
avg_loss = paddle.mean(loss)
if batch_id % 500 == 0:
print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
avg_loss.backward()
opt.minimize(avg_loss)
model.clear_gradients()
# evaluate model after one epoch
model.eval()
accuracies = []
losses = []
for batch_id in range(len(test_sents) // batch_size):
x_data = test_sents[(batch_id * batch_size):((batch_id+1)*batch_size)]
y_data = test_labels[(batch_id * batch_size):((batch_id+1)*batch_size)]
sent = paddle.to_tensor(x_data)
label = paddle.to_tensor(y_data)
logits = model(sent)
loss = paddle.nn.functional.softmax_with_cross_entropy(logits, label)
acc = paddle.metric.accuracy(logits, label)
accuracies.append(acc.numpy())
losses.append(loss.numpy())
avg_acc, avg_loss = np.mean(accuracies), np.mean(losses)
print("[validation] accuracy/loss: {}/{}".format(avg_acc, avg_loss))
model.train()
model = MyNet()
train(model)
.. parsed-literal::
epoch: 0, batch_id: 0, loss is: [0.6926701]
epoch: 0, batch_id: 500, loss is: [0.41248566]
[validation] accuracy/loss: 0.8505121469497681/0.3615057170391083
epoch: 1, batch_id: 0, loss is: [0.29521096]
epoch: 1, batch_id: 500, loss is: [0.2916747]
[validation] accuracy/loss: 0.86475670337677/0.3259459137916565
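The batching pattern used in ``train`` (shuffle once per epoch, then slice fixed-size batches) can be sketched without Paddle or numpy. ``iter_batches`` and the toy data below are hypothetical, and the seeded ``random.Random`` is only there to make the sketch repeatable.

```python
import random


def iter_batches(samples, labels, batch_size, rng):
    """Yield shuffled (samples, labels) batches; an incomplete last batch is dropped."""
    order = list(range(len(samples)))
    rng.shuffle(order)  # one permutation shared by samples and labels
    for b in range(len(order) // batch_size):
        idx = order[b * batch_size:(b + 1) * batch_size]
        yield [samples[i] for i in idx], [labels[i] for i in idx]


samples = [[i, i] for i in range(10)]   # 10 toy "sentences"
labels = [i % 2 for i in range(10)]     # toy labels derived from the sample
batches = list(iter_batches(samples, labels, batch_size=4, rng=random.Random(0)))
print(len(batches))  # 2: the remaining 2 of the 10 samples are dropped
```

With the integer division used in the loop above, 25000 samples and ``batch_size = 32`` give 781 full batches, so 8 samples are skipped in each epoch.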
The End
--------
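As the log above shows, two epochs of training reach an accuracy of about 86% on this dataset. You can also adjust the network structure and the hyperparameters to try for better results.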
###########################
Natural language processing
###########################
PaddlePaddle provides the following NLP tutorials:
- `N-Gram <./n_gram_model/n_gram_model.html>`_ : implementing an N-Gram model with Paddle.
- `Text classification <./imdb_bow_classification/imdb_bow_classification.html>`_ : text classification on the IMDB dataset with Paddle.
- `Machine translation <./seq2seq_with_attention/seq2seq_with_attention.html>`_ : implementing machine translation with Paddle.
.. toctree::
:hidden:
:titlesonly:
n_gram_model/n_gram_model.rst
imdb_bow_classification/imdb_bow_classification.rst
seq2seq_with_attention/seq2seq_with_attention.rst
{
"cells": [
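{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Training word embeddings on the works of Shakespeare with an N-gram model\n",
"An N-gram is a concept from computational linguistics and probability theory: a sequence of N items taken from a given piece of text.\n",
"With N=1 an N-gram is called a unigram, with N=2 a bigram, with N=3 a trigram, and so on. Practical applications usually work with bigrams and trigrams.\n",
"This example implements a trigram model on the works of Shakespeare."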
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment\n",
"This tutorial is written against paddle-develop. If your environment uses a different version, please install paddle-develop first."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.0.0'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import paddle\n",
"paddle.__version__"
]
},
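{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset and parameters\n",
"The training data is the collected works of Shakespeare; [download](https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt) it and save it as a plain-text file.<br>\n",
"`context_size` is set to 2, which makes this a trigram model, and `embedding_dim` is set to 256."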
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-09-09 14:58:26-- https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt\n",
"正在解析主机 ocw.mit.edu (ocw.mit.edu)... 151.101.110.133\n",
"正在连接 ocw.mit.edu (ocw.mit.edu)|151.101.110.133|:443... 已连接。\n",
"已发出 HTTP 请求,正在等待回应... 200 OK\n",
"长度:5458199 (5.2M) [text/plain]\n",
"正在保存至: “t8.shakespeare.txt”\n",
"\n",
"t8.shakespeare.txt 100%[===================>] 5.21M 94.1KB/s 用时 70s \n",
"\n",
"2020-09-09 14:59:38 (75.7 KB/s) - 已保存 “t8.shakespeare.txt” [5458199/5458199])\n",
"\n"
]
}
],
"source": [
"!wget https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"embedding_dim = 256\n",
"context_size = 2"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Length of text: 5458199 characters\n"
]
}
],
"source": [
"# Path to the file\n",
"path_to_file = './t8.shakespeare.txt'\n",
"test_sentence = open(path_to_file, 'rb').read().decode(encoding='utf-8')\n",
"\n",
"# The text length is the number of characters in the text\n",
"print ('Length of text: {} characters'.format(len(test_sentence)))"
]
},
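{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Removing punctuation\n",
"Punctuation marks carry little meaning for this task, so we use `punctuation` from the `string` library to strip the English punctuation characters."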
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'!': '', '\"': '', '#': '', '$': '', '%': '', '&': '', \"'\": '', '(': '', ')': '', '*': '', '+': '', ',': '', '-': '', '.': '', '/': '', ':': '', ';': '', '<': '', '=': '', '>': '', '?': '', '@': '', '[': '', '\\\\': '', ']': '', '^': '', '_': '', '`': '', '{': '', '|': '', '}': '', '~': ''}\n"
]
}
],
"source": [
"from string import punctuation\n",
"process_dicts={i:'' for i in punctuation}\n",
"print(process_dicts)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"28343\n"
]
}
],
"source": [
"punc_table = str.maketrans(process_dicts)\n",
"test_sentence = test_sentence.translate(punc_table)\n",
"test_sentence = test_sentence.lower().split()\n",
"vocab = set(test_sentence)\n",
"print(len(vocab))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data preprocessing\n",
"The text is split into tuples of the form (('first word', 'second word'), 'third word'), where the third word is the prediction target."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[['this', 'is'], 'the'], [['is', 'the'], '100th'], [['the', '100th'], 'etext']]\n"
]
}
],
"source": [
"trigram = [[[test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2]]\n",
" for i in range(len(test_sentence) - 2)]\n",
"\n",
"word_to_idx = {word: i for i, word in enumerate(vocab)}\n",
"idx_to_word = {word_to_idx[word]: word for word in word_to_idx}\n",
"# Take a look at the dataset\n",
"print(trigram[:3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a `Dataset` class to load the data\n",
"Build the dataset with `paddle.io.Dataset`, then pass it as an argument to `paddle.io.DataLoader` to load the data."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import paddle\n",
"import numpy as np\n",
"batch_size = 256\n",
"paddle.disable_static()\n",
"class TrainDataset(paddle.io.Dataset):\n",
" def __init__(self, tuple_data):\n",
" self.tuple_data = tuple_data\n",
"\n",
" def __getitem__(self, idx):\n",
" data = self.tuple_data[idx][0]\n",
" label = self.tuple_data[idx][1]\n",
" data = np.array(list(map(lambda w: word_to_idx[w], data)))\n",
" label = np.array(word_to_idx[label])\n",
" return data, label\n",
" \n",
" def __len__(self):\n",
" return len(self.tuple_data)\n",
"train_dataset = TrainDataset(trigram)\n",
"train_loader = paddle.io.DataLoader(train_dataset,places=paddle.CPUPlace(), return_list=True,\n",
" shuffle=True, batch_size=batch_size, drop_last=True)"
]
},
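{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building and training the network\n",
"The network is built in Paddle's dynamic-graph mode. The trigram model consists of one `Embedding` layer and two `Linear` layers: the `Embedding` layer embeds the two input context words, and the result is fed through the two `Linear` layers for feature extraction."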
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import paddle\n",
"import numpy as np\n",
"hidden_size = 1024\n",
"class NGramModel(paddle.nn.Layer):\n",
" def __init__(self, vocab_size, embedding_dim, context_size):\n",
" super(NGramModel, self).__init__()\n",
" self.embedding = paddle.nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)\n",
" self.linear1 = paddle.nn.Linear(context_size * embedding_dim, hidden_size)\n",
" self.linear2 = paddle.nn.Linear(hidden_size, len(vocab))\n",
"\n",
" def forward(self, x):\n",
" x = self.embedding(x)\n",
" x = paddle.reshape(x, [-1, context_size * embedding_dim])\n",
" x = self.linear1(x)\n",
" x = paddle.nn.functional.relu(x)\n",
" x = self.linear2(x)\n",
" return x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the `train()` function and train the model"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch: 0, batch_id: 0, loss is: [10.252193]\n",
"epoch: 0, batch_id: 500, loss is: [6.894636]\n",
"epoch: 0, batch_id: 1000, loss is: [6.849346]\n",
"epoch: 0, batch_id: 1500, loss is: [6.931605]\n",
"epoch: 0, batch_id: 2000, loss is: [6.6860313]\n",
"epoch: 0, batch_id: 2500, loss is: [6.2472367]\n",
"epoch: 0, batch_id: 3000, loss is: [6.8818874]\n",
"epoch: 0, batch_id: 3500, loss is: [6.941615]\n",
"epoch: 1, batch_id: 0, loss is: [6.3628616]\n",
"epoch: 1, batch_id: 500, loss is: [6.2065206]\n",
"epoch: 1, batch_id: 1000, loss is: [6.5334334]\n",
"epoch: 1, batch_id: 1500, loss is: [6.5788]\n",
"epoch: 1, batch_id: 2000, loss is: [6.352103]\n",
"epoch: 1, batch_id: 2500, loss is: [6.6272373]\n",
"epoch: 1, batch_id: 3000, loss is: [6.801074]\n",
"epoch: 1, batch_id: 3500, loss is: [6.2274427]\n"
]
}
],
"source": [
"vocab_size = len(vocab)\n",
"epochs = 2\n",
"losses = []\n",
"def train(model):\n",
" model.train()\n",
" optim = paddle.optimizer.Adam(learning_rate=0.01, parameters=model.parameters())\n",
" for epoch in range(epochs):\n",
" for batch_id, data in enumerate(train_loader()):\n",
" x_data = data[0]\n",
" y_data = data[1]\n",
" predicts = model(x_data)\n",
" y_data = paddle.reshape(y_data, ([-1, 1]))\n",
" loss = paddle.nn.functional.softmax_with_cross_entropy(predicts, y_data)\n",
" avg_loss = paddle.mean(loss)\n",
" avg_loss.backward()\n",
" if batch_id % 500 == 0:\n",
" losses.append(avg_loss.numpy())\n",
" print(\"epoch: {}, batch_id: {}, loss is: {}\".format(epoch, batch_id, avg_loss.numpy())) \n",
" optim.minimize(avg_loss)\n",
" model.clear_gradients()\n",
"model = NGramModel(vocab_size, embedding_dim, context_size)\n",
"train(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting the loss curve\n",
"Plotting the loss makes it easy to see how training is progressing."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x14e27b3c8>]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib.ticker as ticker\n",
"%matplotlib inline\n",
"\n",
"plt.figure()\n",
"plt.plot(losses)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prediction\n",
"Use the trained model to make a prediction."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"the input words is: of, william\n",
"the predict words is: shakespeare\n",
"the true words is: shakespeare\n"
]
}
],
"source": [
"import random\n",
"def test(model):\n",
" model.eval()\n",
" # Randomly pick one of the last 10 trigrams\n",
" idx = random.randint(len(trigram)-10, len(trigram)-1)\n",
" print('the input words is: ' + trigram[idx][0][0] + ', ' + trigram[idx][0][1])\n",
" x_data = list(map(lambda w: word_to_idx[w], trigram[idx][0]))\n",
" x_data = paddle.to_tensor(np.array(x_data))\n",
" predicts = model(x_data)\n",
" predicts = predicts.numpy().tolist()[0]\n",
" predicts = predicts.index(max(predicts))\n",
" print('the predict words is: ' + idx_to_word[predicts])\n",
" y_data = trigram[idx][1]\n",
" print('the true words is: ' + y_data)\n",
"test(model)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
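Training word embeddings on the works of Shakespeare with an N-gram model
===========================================================================
An N-gram is a concept from computational linguistics and probability theory: a sequence of N items taken from a given piece of text.
With N=1 an N-gram is called a unigram, with N=2 a bigram, with N=3 a trigram, and so on. Practical applications usually work with bigrams and trigrams.
This example implements a trigram model on the works of Shakespeare.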
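As a quick illustration of the definition, the following sketch extracts unigrams, bigrams, and trigrams from a short token list. The helper ``ngrams`` and the sample sentence are hypothetical and only serve this example.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "to be or not to be".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(trigrams[:2])  # [('to', 'be', 'or'), ('be', 'or', 'not')]
```

A sequence of 6 tokens yields 5 bigrams and 4 trigrams; in general there are ``len(tokens) - n + 1`` N-grams.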
Environment
-----------
This tutorial is written against paddle-develop. If your environment uses a different version, please install paddle-develop first.
.. code:: ipython3
import paddle
paddle.__version__
.. parsed-literal::
'0.0.0'
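Dataset and parameters
----------------------
The training data is the collected works of Shakespeare; `download <https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt>`__ it and save it as a plain-text file.
``context_size`` is set to 2, which makes this a trigram model, and ``embedding_dim`` is set to 256.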
.. code:: ipython3
!wget https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
.. parsed-literal::
--2020-09-09 14:58:26-- https://ocw.mit.edu/ans7870/6/6.006/s08/lecturenotes/files/t8.shakespeare.txt
正在解析主机 ocw.mit.edu (ocw.mit.edu)... 151.101.110.133
正在连接 ocw.mit.edu (ocw.mit.edu)|151.101.110.133|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:5458199 (5.2M) [text/plain]
正在保存至: t8.shakespeare.txt
t8.shakespeare.txt 100%[===================>] 5.21M 94.1KB/s 用时 70s
2020-09-09 14:59:38 (75.7 KB/s) - 已保存 t8.shakespeare.txt [5458199/5458199])
.. code:: ipython3
embedding_dim = 256
context_size = 2
.. code:: ipython3
# Path to the file
path_to_file = './t8.shakespeare.txt'
test_sentence = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# The text length is the number of characters in the text
print ('Length of text: {} characters'.format(len(test_sentence)))
.. parsed-literal::
Length of text: 5458199 characters
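Removing punctuation
--------------------
Punctuation marks carry little meaning for this task, so we use ``punctuation`` from the ``string`` library to strip the English punctuation characters.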
.. code:: ipython3
from string import punctuation
process_dicts={i:'' for i in punctuation}
print(process_dicts)
.. parsed-literal::
{'!': '', '"': '', '#': '', '$': '', '%': '', '&': '', "'": '', '(': '', ')': '', '*': '', '+': '', ',': '', '-': '', '.': '', '/': '', ':': '', ';': '', '<': '', '=': '', '>': '', '?': '', '@': '', '[': '', '\\': '', ']': '', '^': '', '_': '', '`': '', '{': '', '|': '', '}': '', '~': ''}
.. code:: ipython3
punc_table = str.maketrans(process_dicts)
test_sentence = test_sentence.translate(punc_table)
test_sentence = test_sentence.lower().split()
vocab = set(test_sentence)
print(len(vocab))
.. parsed-literal::
28343
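The same three steps (strip punctuation, lowercase, split on whitespace) are shown below on a single sentence, which makes the effect easier to inspect than on the full 5 MB file. The sample sentence is an assumption chosen for illustration.

```python
from string import punctuation

# Map every ASCII punctuation character to the empty string.
table = str.maketrans({ch: '' for ch in punctuation})

raw = "To be, or not to be: that is the question!"
words = raw.translate(table).lower().split()
print(words)
# ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
```

One side effect worth knowing: apostrophes are removed as well, so a contraction such as ``don't`` becomes ``dont``.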
Data preprocessing
------------------
The text is split into tuples of the form (('first word', 'second word'), 'third word'), where the third word is the prediction target.
.. code:: ipython3
trigram = [[[test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2]]
for i in range(len(test_sentence) - 2)]
word_to_idx = {word: i for i, word in enumerate(vocab)}
idx_to_word = {word_to_idx[word]: word for word in word_to_idx}
# Take a look at the dataset
print(trigram[:3])
.. parsed-literal::
[[['this', 'is'], 'the'], [['is', 'the'], '100th'], [['the', '100th'], 'etext']]
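The sketch below repeats the preprocessing on five tokens and also encodes the pairs as integer ids. It differs from the code above in one respect, which is an assumption of this sketch rather than the tutorial's method: the vocabulary is sorted before indexing, because the iteration order of a ``set`` of strings can change between Python runs, which would change ``word_to_idx`` as well.

```python
tokens = "this is the 100th etext".split()

# ((first word, second word), third word) pairs, as in the trigram list above.
pairs = [((tokens[i], tokens[i + 1]), tokens[i + 2]) for i in range(len(tokens) - 2)]

# Sorting makes the word-to-id mapping reproducible across runs (sketch-only choice).
word_to_idx = {w: i for i, w in enumerate(sorted(set(tokens)))}
idx_to_word = {i: w for w, i in word_to_idx.items()}

encoded = [([word_to_idx[w] for w in ctx], word_to_idx[tgt]) for ctx, tgt in pairs]
print(encoded)  # [([4, 2], 3), ([2, 3], 0), ([3, 0], 1)]
```

Each encoded pair has the same layout that ``TrainDataset.__getitem__`` returns later: two context ids and one target id.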
Building a ``Dataset`` class to load the data
----------------------------------------------
Build the dataset with ``paddle.io.Dataset``, then pass it as an argument to ``paddle.io.DataLoader`` to load the data.
.. code:: ipython3
import paddle
import numpy as np
batch_size = 256
paddle.disable_static()
class TrainDataset(paddle.io.Dataset):
def __init__(self, tuple_data):
self.tuple_data = tuple_data
def __getitem__(self, idx):
data = self.tuple_data[idx][0]
label = self.tuple_data[idx][1]
data = np.array(list(map(lambda w: word_to_idx[w], data)))
label = np.array(word_to_idx[label])
return data, label
def __len__(self):
return len(self.tuple_data)
train_dataset = TrainDataset(trigram)
train_loader = paddle.io.DataLoader(train_dataset,places=paddle.CPUPlace(), return_list=True,
shuffle=True, batch_size=batch_size, drop_last=True)
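Building and training the network
---------------------------------
The network is built in Paddle's dynamic-graph mode. The trigram model consists of one ``Embedding`` layer and two ``Linear`` layers: the ``Embedding`` layer embeds the two input context words, and the result is fed through the two ``Linear`` layers for feature extraction.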
.. code:: ipython3
import paddle
import numpy as np
hidden_size = 1024
class NGramModel(paddle.nn.Layer):
def __init__(self, vocab_size, embedding_dim, context_size):
super(NGramModel, self).__init__()
self.embedding = paddle.nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim)
self.linear1 = paddle.nn.Linear(context_size * embedding_dim, hidden_size)
self.linear2 = paddle.nn.Linear(hidden_size, len(vocab))
def forward(self, x):
x = self.embedding(x)
x = paddle.reshape(x, [-1, context_size * embedding_dim])
x = self.linear1(x)
x = paddle.nn.functional.relu(x)
x = self.linear2(x)
return x
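It can help to trace the tensor shapes through ``forward`` and to estimate the model size. The sketch below is plain arithmetic, not Paddle code; the helper names are hypothetical, and the parameter count assumes that both ``Linear`` layers carry a bias term.

```python
def ngram_model_shapes(batch, vocab_size, embedding_dim, context_size, hidden_size):
    """Shapes after each step of the forward pass."""
    return [
        ("input ids", (batch, context_size)),
        ("embedding", (batch, context_size, embedding_dim)),
        ("reshape", (batch, context_size * embedding_dim)),
        ("linear1 + relu", (batch, hidden_size)),
        ("linear2 (logits)", (batch, vocab_size)),
    ]


def ngram_model_param_count(vocab_size, embedding_dim, context_size, hidden_size):
    """Number of trainable values, assuming weight plus bias in each Linear layer."""
    embedding = vocab_size * embedding_dim
    linear1 = context_size * embedding_dim * hidden_size + hidden_size
    linear2 = hidden_size * vocab_size + vocab_size
    return embedding + linear1 + linear2


# Values taken from this tutorial: 28343 words, 256-dim embeddings, 2 context words.
for name, shape in ngram_model_shapes(256, 28343, 256, 2, 1024):
    print(name, shape)
print(ngram_model_param_count(28343, 256, 2, 1024))
```

Most of the parameters sit in the output layer, since it maps 1024 hidden units to one logit per vocabulary word.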
Defining the ``train()`` function and training the model
--------------------------------------------------------
.. code:: ipython3
vocab_size = len(vocab)
epochs = 2
losses = []
def train(model):
model.train()
optim = paddle.optimizer.Adam(learning_rate=0.01, parameters=model.parameters())
for epoch in range(epochs):
for batch_id, data in enumerate(train_loader()):
x_data = data[0]
y_data = data[1]
predicts = model(x_data)
y_data = paddle.reshape(y_data, ([-1, 1]))
loss = paddle.nn.functional.softmax_with_cross_entropy(predicts, y_data)
avg_loss = paddle.mean(loss)
avg_loss.backward()
if batch_id % 500 == 0:
losses.append(avg_loss.numpy())
print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
optim.minimize(avg_loss)
model.clear_gradients()
model = NGramModel(vocab_size, embedding_dim, context_size)
train(model)
.. parsed-literal::
epoch: 0, batch_id: 0, loss is: [10.252193]
epoch: 0, batch_id: 500, loss is: [6.894636]
epoch: 0, batch_id: 1000, loss is: [6.849346]
epoch: 0, batch_id: 1500, loss is: [6.931605]
epoch: 0, batch_id: 2000, loss is: [6.6860313]
epoch: 0, batch_id: 2500, loss is: [6.2472367]
epoch: 0, batch_id: 3000, loss is: [6.8818874]
epoch: 0, batch_id: 3500, loss is: [6.941615]
epoch: 1, batch_id: 0, loss is: [6.3628616]
epoch: 1, batch_id: 500, loss is: [6.2065206]
epoch: 1, batch_id: 1000, loss is: [6.5334334]
epoch: 1, batch_id: 1500, loss is: [6.5788]
epoch: 1, batch_id: 2000, loss is: [6.352103]
epoch: 1, batch_id: 2500, loss is: [6.6272373]
epoch: 1, batch_id: 3000, loss is: [6.801074]
epoch: 1, batch_id: 3500, loss is: [6.2274427]
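The first logged loss, 10.252193, has a simple interpretation: it is very close to ln(28343), the cross-entropy of a prediction that spreads probability evenly over the whole vocabulary, which is roughly what a freshly initialized model does. The sketch below checks this with a small pure-Python softmax cross-entropy; the function is an illustrative stand-in, not the Paddle implementation.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against a single target index."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


vocab_size = 28343  # vocabulary size printed earlier in this tutorial
uniform_loss = softmax_cross_entropy([0.0] * vocab_size, target=0)
print(round(uniform_loss, 3))  # 10.252
```

Later losses around 6.2 to 6.9 are therefore well below the uniform baseline, even though they fluctuate from batch to batch.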
Plotting the loss curve
-----------------------
Plotting the loss makes it easy to see how training is progressing.
.. code:: ipython3
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
plt.figure()
plt.plot(losses)
.. parsed-literal::
[<matplotlib.lines.Line2D at 0x14e27b3c8>]
.. image:: n_gram_model_files/n_gram_model_19_1.png
Prediction
----------
Use the trained model to make a prediction.
.. code:: ipython3
import random
def test(model):
model.eval()
# Randomly pick one of the last 10 trigrams
idx = random.randint(len(trigram)-10, len(trigram)-1)
print('the input words is: ' + trigram[idx][0][0] + ', ' + trigram[idx][0][1])
x_data = list(map(lambda w: word_to_idx[w], trigram[idx][0]))
x_data = paddle.to_tensor(np.array(x_data))
predicts = model(x_data)
predicts = predicts.numpy().tolist()[0]
predicts = predicts.index(max(predicts))
print('the predict words is: ' + idx_to_word[predicts])
y_data = trigram[idx][1]
print('the true words is: ' + y_data)
test(model)
.. parsed-literal::
the input words is: of, william
the predict words is: shakespeare
the true words is: shakespeare
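The prediction step reduces to taking the index of the largest logit and mapping it back to a word. A minimal sketch of that step follows, with a hypothetical three-word vocabulary and made-up logits standing in for the model output.

```python
def predict_word(logits, idx_to_word):
    """Return the word whose logit is largest (argmax over the vocabulary)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return idx_to_word[best]


idx_to_word = {0: 'of', 1: 'william', 2: 'shakespeare'}  # toy vocabulary
print(predict_word([0.1, -1.2, 3.4], idx_to_word))  # shakespeare
```

Applying softmax first would not change the result, because softmax preserves the ordering of the logits.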
{
"cells": [
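{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine translation with an attention-based LSTM\n",
"\n",
"This tutorial shows how to carry out a machine translation task with PaddlePaddle. We use the LSTM API provided by PaddlePaddle to build a `sequence to sequence with attention` translation model and train it on a sample dataset to translate English into Chinese."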
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Environment setup\n",
"\n",
"This tutorial is based on PaddlePaddle 2.0-beta."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0.0\n",
"89af2088b6e74bdfeef2d4d78e08461ed2aafee5\n"
]
}
],
"source": [
"import paddle\n",
"import paddle.nn.functional as F\n",
"import re\n",
"import numpy as np\n",
"\n",
"paddle.disable_static()\n",
"print(paddle.__version__)\n",
"print(paddle.__git_commit__)"
]
},
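{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Downloading the dataset\n",
"\n",
"We use the English-Chinese sentence pairs provided by [http://www.manythings.org/anki/](http://www.manythings.org/anki/) as the dataset for this task. The dataset contains 23610 bilingual sentence pairs."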
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-09-04 16:13:35-- https://www.manythings.org/anki/cmn-eng.zip\n",
"Resolving www.manythings.org (www.manythings.org)... 104.24.109.196, 172.67.173.198, 2606:4700:3037::6818:6cc4, ...\n",
"Connecting to www.manythings.org (www.manythings.org)|104.24.109.196|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 1030722 (1007K) [application/zip]\n",
"Saving to: ‘cmn-eng.zip’\n",
"\n",
"cmn-eng.zip 100%[===================>] 1007K 520KB/s in 1.9s \n",
"\n",
"2020-09-04 16:13:38 (520 KB/s) - ‘cmn-eng.zip’ saved [1030722/1030722]\n",
"\n",
"Archive: cmn-eng.zip\n",
" inflating: cmn.txt \n",
" inflating: _about.txt \n"
]
}
],
"source": [
"!wget -c https://www.manythings.org/anki/cmn-eng.zip && unzip cmn-eng.zip"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 23610 cmn.txt\r\n"
]
}
],
"source": [
"!wc -l cmn.txt"
]
},
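{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building the data structure for the bilingual sentence pairs\n",
"\n",
"Next we process the downloaded text file and read the bilingual sentence pairs into Python data structures. The processing steps are as follows.\n",
"\n",
"- For English, all text is lowercased and only the English words are kept.\n",
"- For Chinese, no word segmentation is done, to keep things simple; the text is split into individual characters.\n",
"- To make the later steps run faster, we restrict the sentence length and keep only sentences that begin with certain English words, which yields a smaller dataset of 5508 sentence pairs."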
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"MAX_LEN = 10"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5508\n",
"(['i', 'won'], ['我', '赢', '了', '。'])\n",
"(['he', 'ran'], ['他', '跑', '了', '。'])\n",
"(['i', 'quit'], ['我', '退', '出', '。'])\n",
"(['i', 'm', 'ok'], ['我', '沒', '事', '。'])\n",
"(['i', 'm', 'up'], ['我', '已', '经', '起', '来', '了', '。'])\n",
"(['we', 'try'], ['我', '们', '来', '试', '试', '。'])\n",
"(['he', 'came'], ['他', '来', '了', '。'])\n",
"(['he', 'runs'], ['他', '跑', '。'])\n",
"(['i', 'agree'], ['我', '同', '意', '。'])\n",
"(['i', 'm', 'ill'], ['我', '生', '病', '了', '。'])\n"
]
}
],
"source": [
"lines = open('cmn.txt', encoding='utf-8').read().strip().split('\\n')\n",
"words_re = re.compile(r'\\w+')\n",
"\n",
"pairs = []\n",
"for l in lines:\n",
" en_sent, cn_sent, _ = l.split('\\t')\n",
" pairs.append((words_re.findall(en_sent.lower()), list(cn_sent)))\n",
"\n",
"# create a smaller dataset to make the demo process faster\n",
"filtered_pairs = []\n",
"\n",
"for x in pairs:\n",
" if len(x[0]) < MAX_LEN and len(x[1]) < MAX_LEN and \\\n",
" x[0][0] in ('i', 'you', 'he', 'she', 'we', 'they'):\n",
" filtered_pairs.append(x)\n",
" \n",
"print(len(filtered_pairs))\n",
"for x in filtered_pairs[:10]: print(x) "
]
},
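{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating the vocabularies\n",
"\n",
"Next we create separate English and Chinese vocabularies, which are used to convert English and Chinese sentences into sequences of token IDs. Three special tokens are also added to each vocabulary:\n",
"- `<pad>`: used to pad shorter sentences.\n",
"- `<bos>`: \"begin of sentence\", a special token marking the start of a sentence.\n",
"- `<eos>`: \"end of sentence\", a special token marking the end of a sentence.\n",
"\n",
"Note: in a real task, you may also need a special `<unk>` (or `<oov>`) token to represent words that do not appear in the vocabulary."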
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2539\n",
"2039\n"
]
}
],
"source": [
"en_vocab = {}\n",
"cn_vocab = {}\n",
"\n",
"# create special token for pad, begin of sentence, end of sentence\n",
"en_vocab['<pad>'], en_vocab['<bos>'], en_vocab['<eos>'] = 0, 1, 2\n",
"cn_vocab['<pad>'], cn_vocab['<bos>'], cn_vocab['<eos>'] = 0, 1, 2\n",
"\n",
"en_idx, cn_idx = 3, 3\n",
"for en, cn in filtered_pairs:\n",
" for w in en: \n",
" if w not in en_vocab: \n",
" en_vocab[w] = en_idx\n",
" en_idx += 1\n",
" for w in cn: \n",
" if w not in cn_vocab: \n",
" cn_vocab[w] = cn_idx\n",
" cn_idx += 1\n",
"\n",
"print(len(list(en_vocab)))\n",
"print(len(list(cn_vocab)))"
]
},
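{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating the padded dataset\n",
"\n",
"Next, using the vocabularies, we create the actual training dataset, organized as numpy arrays.\n",
"- All sentences are padded with `<pad>` to the same length.\n",
"- The English (source-language) sentences are reversed, which tends to give better translation results.\n",
"- `padded_cn_label_sents` holds the prediction targets for training: at each position, the current Chinese token is used to predict the next token.\n"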
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(5508, 11)\n",
"(5508, 12)\n",
"(5508, 12)\n"
]
}
],
"source": [
"padded_en_sents = []\n",
"padded_cn_sents = []\n",
"padded_cn_label_sents = []\n",
"for en, cn in filtered_pairs:\n",
" # reverse source sentence\n",
" padded_en_sent = en + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(en))\n",
" padded_en_sent.reverse()\n",
" padded_cn_sent = ['<bos>'] + cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn))\n",
" padded_cn_label_sent = cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn) + 1) \n",
"\n",
" padded_en_sents.append([en_vocab[w] for w in padded_en_sent])\n",
" padded_cn_sents.append([cn_vocab[w] for w in padded_cn_sent])\n",
" padded_cn_label_sents.append([cn_vocab[w] for w in padded_cn_label_sent])\n",
"\n",
"train_en_sents = np.array(padded_en_sents)\n",
"train_cn_sents = np.array(padded_cn_sents)\n",
"train_cn_label_sents = np.array(padded_cn_label_sents)\n",
"\n",
"print(train_en_sents.shape)\n",
"print(train_cn_sents.shape)\n",
"print(train_cn_label_sents.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building the network\n",
"\n",
"We will build an Encoder-AttentionDecoder model for the machine translation task.\n",
"First, we set a few hyperparameters that the network needs."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"embedding_size = 128\n",
"hidden_size = 256\n",
"num_encoder_lstm_layers = 1\n",
"en_vocab_size = len(list(en_vocab))\n",
"cn_vocab_size = len(list(cn_vocab))\n",
"epochs = 20\n",
"batch_size = 16"
]
},
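{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Encoder\n",
"\n",
"In the encoder, each source token is looked up in an Embedding and the result is fed through an LSTM, giving a network that encodes the source sentence. Besides LSTM, PaddlePaddle's RNN APIs also provide SimpleRNN and GRU, and they support backward, bidirectional, and multi-layer RNNs. The `dropout` parameter controls whether dropout is applied between the layers of a multi-layer RNN to reduce overfitting.\n",
"\n",
"Besides these sequence-level RNN layers, you can build single-step RNN computations more flexibly with APIs such as SimpleRNNCell, GRUCell, and LSTMCell, or even implement your own RNN cell by subclassing RNNCellBase."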
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# encoder: simply learn representation of source sentence\n",
"class Encoder(paddle.nn.Layer):\n",
" def __init__(self):\n",
" super(Encoder, self).__init__()\n",
" self.emb = paddle.nn.Embedding(en_vocab_size, embedding_size,)\n",
" self.lstm = paddle.nn.LSTM(input_size=embedding_size, \n",
" hidden_size=hidden_size, \n",
" num_layers=num_encoder_lstm_layers)\n",
"\n",
" def forward(self, x):\n",
" x = self.emb(x)\n",
" x, (_, _) = self.lstm(x)\n",
" return x"
]
},
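{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AttentionDecoder\n",
"\n",
"The decoder is an LSTM with an attention mechanism.\n",
"\n",
"- Single-step LSTM: the decoder also uses an LSTM, but unlike the encoder, the code below advances the LSTM by only one step per call. The recurrence as a whole is carried out inside the training loop.\n",
"- Attention mechanism: a small network of two Linear layers computes the attention, that is, how much weight each source-language token should receive each time a target-language token is generated.\n",
"- If this is your first encounter with this kind of architecture, the code below may be somewhat hard to follow. Printing the shape of each tensor at each step is a good way to understand it."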
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# only move one step of LSTM, \n",
"# the recurrent loop is implemented inside training loop\n",
"class AttentionDecoder(paddle.nn.Layer):\n",
" def __init__(self):\n",
" super(AttentionDecoder, self).__init__()\n",
" self.emb = paddle.nn.Embedding(cn_vocab_size, embedding_size)\n",
" self.lstm = paddle.nn.LSTM(input_size=embedding_size + hidden_size, \n",
" hidden_size=hidden_size)\n",
"\n",
" # for computing attention weights\n",
" self.attention_linear1 = paddle.nn.Linear(hidden_size * 2, hidden_size)\n",
" self.attention_linear2 = paddle.nn.Linear(hidden_size, 1)\n",
" \n",
" # for computing output logits\n",
" self.outlinear =paddle.nn.Linear(hidden_size, cn_vocab_size)\n",
"\n",
" def forward(self, x, previous_hidden, previous_cell, encoder_outputs):\n",
" x = self.emb(x)\n",
" \n",
" attention_inputs = paddle.concat((encoder_outputs, \n",
" paddle.tile(previous_hidden, repeat_times=[1, MAX_LEN+1, 1])),\n",
" axis=-1\n",
" )\n",
"\n",
" attention_hidden = self.attention_linear1(attention_inputs)\n",
" attention_hidden = F.tanh(attention_hidden)\n",
" attention_logits = self.attention_linear2(attention_hidden)\n",
" attention_logits = paddle.squeeze(attention_logits)\n",
"\n",
" attention_weights = F.softmax(attention_logits) \n",
" attention_weights = paddle.expand_as(paddle.unsqueeze(attention_weights, -1), \n",
" encoder_outputs)\n",
"\n",
" context_vector = paddle.multiply(encoder_outputs, attention_weights) \n",
" context_vector = paddle.reduce_sum(context_vector, 1)\n",
" context_vector = paddle.unsqueeze(context_vector, 1)\n",
" \n",
" lstm_input = paddle.concat((x, context_vector), axis=-1)\n",
"\n",
" # LSTM requirement to previous hidden/state: \n",
" # (number_of_layers * direction, batch, hidden)\n",
" previous_hidden = paddle.transpose(previous_hidden, [1, 0, 2])\n",
" previous_cell = paddle.transpose(previous_cell, [1, 0, 2])\n",
" \n",
" x, (hidden, cell) = self.lstm(lstm_input, (previous_hidden, previous_cell))\n",
" \n",
" # change the return to (batch, number_of_layers * direction, hidden)\n",
" hidden = paddle.transpose(hidden, [1, 0, 2])\n",
" cell = paddle.transpose(cell, [1, 0, 2])\n",
"\n",
" output = self.outlinear(hidden)\n",
" output = paddle.squeeze(output)\n",
" return output, (hidden, cell)"
]
},
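{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Training the model\n",
"\n",
"Now we train the model.\n",
"\n",
"- The training data is shuffled at the start of every epoch.\n",
"- The decoding recurrence is implemented here by calling `atten_decoder` repeatedly.\n",
"- `teacher forcing`: when decoding each token, the ground-truth token from the training data is fed in as the input for predicting the next token. Alternatively, you can feed in the model's own prediction as the next input, or mix the two strategies."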
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch:0\n",
"iter 0, loss:[7.6194725]\n",
"iter 200, loss:[3.4147663]\n",
"epoch:1\n",
"iter 0, loss:[3.0931656]\n",
"iter 200, loss:[2.7543137]\n",
"epoch:2\n",
"iter 0, loss:[2.8413522]\n",
"iter 200, loss:[2.340513]\n",
"epoch:3\n",
"iter 0, loss:[2.597812]\n",
"iter 200, loss:[2.5552855]\n",
"epoch:4\n",
"iter 0, loss:[2.0783448]\n",
"iter 200, loss:[2.4544785]\n",
"epoch:5\n",
"iter 0, loss:[1.8709135]\n",
"iter 200, loss:[1.8736631]\n",
"epoch:6\n",
"iter 0, loss:[1.9589291]\n",
"iter 200, loss:[2.119414]\n",
"epoch:7\n",
"iter 0, loss:[1.5829577]\n",
"iter 200, loss:[1.6002902]\n",
"epoch:8\n",
"iter 0, loss:[1.6022769]\n",
"iter 200, loss:[1.52694]\n",
"epoch:9\n",
"iter 0, loss:[1.3616685]\n",
"iter 200, loss:[1.5420443]\n",
"epoch:10\n",
"iter 0, loss:[1.0397792]\n",
"iter 200, loss:[1.2458231]\n",
"epoch:11\n",
"iter 0, loss:[1.2107158]\n",
"iter 200, loss:[1.426417]\n",
"epoch:12\n",
"iter 0, loss:[1.1840894]\n",
"iter 200, loss:[1.0999664]\n",
"epoch:13\n",
"iter 0, loss:[1.0968472]\n",
"iter 200, loss:[0.8149167]\n",
"epoch:14\n",
"iter 0, loss:[0.95585203]\n",
"iter 200, loss:[1.0070628]\n",
"epoch:15\n",
"iter 0, loss:[0.89463925]\n",
"iter 200, loss:[0.8288595]\n",
"epoch:16\n",
"iter 0, loss:[0.5672495]\n",
"iter 200, loss:[0.7317069]\n",
"epoch:17\n",
"iter 0, loss:[0.76785177]\n",
"iter 200, loss:[0.5319323]\n",
"epoch:18\n",
"iter 0, loss:[0.5250005]\n",
"iter 200, loss:[0.4182841]\n",
"epoch:19\n",
"iter 0, loss:[0.52320284]\n",
"iter 200, loss:[0.47618982]\n"
]
}
],
"source": [
"encoder = Encoder()\n",
"atten_decoder = AttentionDecoder()\n",
"\n",
"opt = paddle.optimizer.Adam(learning_rate=0.001, \n",
" parameters=encoder.parameters()+atten_decoder.parameters())\n",
"\n",
"for epoch in range(epochs):\n",
" print(\"epoch:{}\".format(epoch))\n",
"\n",
" # shuffle training data\n",
" perm = np.random.permutation(len(train_en_sents))\n",
" train_en_sents_shuffled = train_en_sents[perm]\n",
" train_cn_sents_shuffled = train_cn_sents[perm]\n",
" train_cn_label_sents_shuffled = train_cn_label_sents[perm]\n",
"\n",
" for iteration in range(train_en_sents_shuffled.shape[0] // batch_size):\n",
" x_data = train_en_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
" sent = paddle.to_tensor(x_data)\n",
" en_repr = encoder(sent)\n",
"\n",
" x_cn_data = train_cn_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
" x_cn_label_data = train_cn_label_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]\n",
"\n",
" # shape: (batch, num_layer(=1 here) * num_of_direction(=1 here), hidden_size)\n",
" hidden = paddle.zeros([batch_size, 1, hidden_size])\n",
" cell = paddle.zeros([batch_size, 1, hidden_size])\n",
"\n",
" loss = paddle.zeros([1])\n",
" # the decoder recurrent loop mentioned above\n",
" for i in range(MAX_LEN + 2):\n",
" cn_word = paddle.to_tensor(x_cn_data[:,i:i+1])\n",
" cn_word_label = paddle.to_tensor(x_cn_label_data[:,i:i+1])\n",
"\n",
" logits, (hidden, cell) = atten_decoder(cn_word, hidden, cell, en_repr)\n",
" step_loss = F.softmax_with_cross_entropy(logits, cn_word_label)\n",
" avg_step_loss = paddle.mean(step_loss)\n",
" loss += avg_step_loss\n",
"\n",
" loss = loss / (MAX_LEN + 2)\n",
" if(iteration % 200 == 0):\n",
" print(\"iter {}, loss:{}\".format(iteration, loss.numpy()))\n",
"\n",
" loss.backward()\n",
" opt.minimize(loss)\n",
" encoder.clear_gradients()\n",
" atten_decoder.clear_gradients()"
]
},
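{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Translating with the model\n",
"\n",
"Training time depends on the hardware you use (roughly 15 to 20 minutes on a Mac laptop).\n",
"Once training has finished, we have a model that translates English into Chinese. Below we use greedy search to run actual translations with it. (In a real task, you may want to use beam search to improve the results.)"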
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"i agree with him\n",
"true: 我同意他。\n",
"pred: 我同意他。\n",
"i think i ll take a bath tonight\n",
"true: 我想我今晚會洗澡。\n",
"pred: 我想我今晚會洗澡。\n",
"he asked for a drink of water\n",
"true: 他要了水喝。\n",
"pred: 他喝了一杯水。\n",
"i began running\n",
"true: 我開始跑。\n",
"pred: 我開始跑。\n",
"i m sick\n",
"true: 我生病了。\n",
"pred: 我生病了。\n",
"you had better go to the dentist s\n",
"true: 你最好去看牙醫。\n",
"pred: 你最好去看牙醫。\n",
"we went for a walk in the forest\n",
"true: 我们去了林中散步。\n",
"pred: 我們去公园散步。\n",
"you ve arrived very early\n",
"true: 你來得很早。\n",
"pred: 你去早个。\n",
"he pretended not to be listening\n",
"true: 他裝作沒在聽。\n",
"pred: 他假装聽到它。\n",
"he always wanted to study japanese\n",
"true: 他一直想學日語。\n",
"pred: 他一直想學日語。\n"
]
}
],
"source": [
"encoder.eval()\n",
"atten_decoder.eval()\n",
"\n",
"num_of_exampels_to_evaluate = 10\n",
"\n",
"indices = np.random.choice(len(train_en_sents), num_of_exampels_to_evaluate, replace=False)\n",
"x_data = train_en_sents[indices]\n",
"sent = paddle.to_tensor(x_data)\n",
"en_repr = encoder(sent)\n",
"\n",
"word = np.array(\n",
" [[cn_vocab['<bos>']]] * num_of_exampels_to_evaluate\n",
")\n",
"word = paddle.to_tensor(word)\n",
"\n",
"hidden = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"cell = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"\n",
"decoded_sent = []\n",
"for i in range(MAX_LEN + 2):\n",
" logits, (hidden, cell) = atten_decoder(word, hidden, cell, en_repr)\n",
" word = paddle.argmax(logits, axis=1)\n",
" decoded_sent.append(word.numpy())\n",
" word = paddle.unsqueeze(word, axis=-1)\n",
" \n",
"results = np.stack(decoded_sent, axis=1)\n",
"for i in range(num_of_exampels_to_evaluate):\n",
" en_input = \" \".join(filtered_pairs[indices[i]][0])\n",
" ground_truth_translate = \"\".join(filtered_pairs[indices[i]][1])\n",
" model_translate = \"\"\n",
" for k in results[i]:\n",
" w = list(cn_vocab)[k]\n",
" if w != '<pad>' and w != '<eos>':\n",
" model_translate += w\n",
" print(en_input)\n",
" print(\"true: {}\".format(ground_truth_translate))\n",
" print(\"pred: {}\".format(model_translate))"
]
},
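{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The End\n",
"\n",
"You can improve the translation quality of this example further by changing the network architecture, adjusting the dataset, or trying different hyperparameters. You can also try using PaddlePaddle on other, similar tasks."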
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
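Machine Translation with an Attention-Based LSTM
================================================
This tutorial shows how to carry out a machine translation task with PaddlePaddle. We will use PaddlePaddle's LSTM API to build a ``sequence to sequence with attention`` translation model and train it on a sample dataset to translate English into Chinese.
Environment setup
-----------------
This tutorial is based on PaddlePaddle 2.0-beta.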
.. code:: ipython3
import paddle
import paddle.nn.functional as F
import re
import numpy as np
paddle.disable_static()
print(paddle.__version__)
print(paddle.__git_commit__)
.. parsed-literal::
0.0.0
89af2088b6e74bdfeef2d4d78e08461ed2aafee5
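Downloading the dataset
-----------------------
We use the English-Chinese sentence pairs provided by http://www.manythings.org/anki/
as the dataset for this task. The dataset contains 23610 bilingual sentence pairs.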
.. code:: ipython3
!wget -c https://www.manythings.org/anki/cmn-eng.zip && unzip cmn-eng.zip
.. parsed-literal::
--2020-09-04 16:13:35-- https://www.manythings.org/anki/cmn-eng.zip
Resolving www.manythings.org (www.manythings.org)... 104.24.109.196, 172.67.173.198, 2606:4700:3037::6818:6cc4, ...
Connecting to www.manythings.org (www.manythings.org)|104.24.109.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1030722 (1007K) [application/zip]
Saving to: ‘cmn-eng.zip’
cmn-eng.zip 100%[===================>] 1007K 520KB/s in 1.9s
2020-09-04 16:13:38 (520 KB/s) - ‘cmn-eng.zip’ saved [1030722/1030722]
Archive: cmn-eng.zip
inflating: cmn.txt
inflating: _about.txt
.. code:: ipython3
!wc -l cmn.txt
.. parsed-literal::
23610 cmn.txt
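Building the sentence-pair data structure
-----------------------------------------
Next, we process the downloaded text file and load the bilingual sentence pairs into Python data structures, with the following processing:
- English text is lowercased, and only the words are kept.
- Chinese text is, for simplicity, not word-segmented; it is split into individual characters.
- To make the rest of the program run faster, we keep only short sentences and only sentences that begin with certain English words. This yields a smaller dataset of 5508 sentence pairs.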
.. code:: ipython3
MAX_LEN = 10
.. code:: ipython3
lines = open('cmn.txt', encoding='utf-8').read().strip().split('\n')
words_re = re.compile(r'\w+')
pairs = []
for l in lines:
    en_sent, cn_sent, _ = l.split('\t')
    pairs.append((words_re.findall(en_sent.lower()), list(cn_sent)))
# create a smaller dataset to make the demo process faster
filtered_pairs = []
for x in pairs:
    if len(x[0]) < MAX_LEN and len(x[1]) < MAX_LEN and \
            x[0][0] in ('i', 'you', 'he', 'she', 'we', 'they'):
        filtered_pairs.append(x)
print(len(filtered_pairs))
for x in filtered_pairs[:10]: print(x)
.. parsed-literal::
5508
(['i', 'won'], ['我', '赢', '了', '。'])
(['he', 'ran'], ['他', '跑', '了', '。'])
(['i', 'quit'], ['我', '退', '出', '。'])
(['i', 'm', 'ok'], ['我', '沒', '事', '。'])
(['i', 'm', 'up'], ['我', '已', '经', '起', '来', '了', '。'])
(['we', 'try'], ['我', '们', '来', '试', '试', '。'])
(['he', 'came'], ['他', '来', '了', '。'])
(['he', 'runs'], ['他', '跑', '。'])
(['i', 'agree'], ['我', '同', '意', '。'])
(['i', 'm', 'ill'], ['我', '生', '病', '了', '。'])
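The tokenization rules above can be checked on a single line in isolation. The following is a minimal sketch, assuming the tab-separated three-field layout that the loading code relies on; the sample line and its third field are made-up placeholders, not real content from ``cmn.txt``.

```python
import re

# Same pattern as in the tutorial: runs of word characters.
words_re = re.compile(r'\w+')

# Hypothetical sample line: English, Chinese, and a placeholder third field.
sample_line = "I'm OK.\t我沒事。\tplaceholder attribution"

en_sent, cn_sent, _ = sample_line.split('\t')
en_tokens = words_re.findall(en_sent.lower())  # lowercase, keep words only
cn_tokens = list(cn_sent)                      # split Chinese into characters

print(en_tokens)
print(cn_tokens)
```

Because ``\w+`` treats the apostrophe as a separator, ``I'm`` becomes the two tokens ``i`` and ``m``, which is why such tokens appear in the printed pairs.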
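Building the vocabularies
-------------------------
Next, we build separate English and Chinese vocabularies, which are used to convert sentences in each language into sequences of token IDs. Three special tokens are also added to each vocabulary:
- ``<pad>``: pads shorter sentences to a common length.
- ``<bos>``: "begin of sentence", a special token that marks the start of a sentence.
- ``<eos>``: "end of sentence", a special token that marks the end of a sentence.
Note: in a real task, you may also need an ``<unk>`` (or ``<oov>``) special token to represent words that do not appear in the vocabulary.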
.. code:: ipython3
en_vocab = {}
cn_vocab = {}
# create special token for pad, begin of sentence, end of sentence
en_vocab['<pad>'], en_vocab['<bos>'], en_vocab['<eos>'] = 0, 1, 2
cn_vocab['<pad>'], cn_vocab['<bos>'], cn_vocab['<eos>'] = 0, 1, 2
en_idx, cn_idx = 3, 3
for en, cn in filtered_pairs:
    for w in en:
        if w not in en_vocab:
            en_vocab[w] = en_idx
            en_idx += 1
    for w in cn:
        if w not in cn_vocab:
            cn_vocab[w] = cn_idx
            cn_idx += 1
print(len(list(en_vocab)))
print(len(list(cn_vocab)))
.. parsed-literal::
2539
2039
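The note about ``<unk>`` can be made concrete with a small sketch. The helpers ``build_vocab`` and ``encode`` below are hypothetical names introduced only for illustration; the tutorial's own vocabularies do not contain an ``<unk>`` entry, so this shows one possible extension rather than what the code above does.

```python
# Hypothetical helpers; the tutorial itself does not define an <unk> token.
def build_vocab(sentences, specials=('<pad>', '<bos>', '<eos>', '<unk>')):
    """Map each distinct token to an integer ID, special tokens first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sent in sentences:
        for tok in sent:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(sent, vocab):
    """Convert tokens to IDs, falling back to <unk> for unseen tokens."""
    unk = vocab['<unk>']
    return [vocab.get(tok, unk) for tok in sent]


toy_sents = [['i', 'won'], ['he', 'ran'], ['i', 'quit']]
toy_vocab = build_vocab(toy_sents)

# 'away' never appears in toy_sents, so it maps to the <unk> ID.
ids = encode(['i', 'ran', 'away'], toy_vocab)
print(ids)
```

With such a fallback, a sentence containing words unseen during training can still be encoded instead of raising a ``KeyError``.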
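Building the padded dataset
---------------------------
Using the vocabularies, we now build the actual training dataset, stored as NumPy arrays.
- Every sentence is padded with ``<pad>`` so that all sentences have the same length.
- The English (source-language) sentences are reversed, which tends to improve translation quality.
- ``padded_cn_label_sents`` holds the prediction targets used during training: at each position, the current Chinese token is used to predict the next one.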
.. code:: ipython3
padded_en_sents = []
padded_cn_sents = []
padded_cn_label_sents = []
for en, cn in filtered_pairs:
    # reverse source sentence
    padded_en_sent = en + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(en))
    padded_en_sent.reverse()
    padded_cn_sent = ['<bos>'] + cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn))
    padded_cn_label_sent = cn + ['<eos>'] + ['<pad>'] * (MAX_LEN - len(cn) + 1)
    padded_en_sents.append([en_vocab[w] for w in padded_en_sent])
    padded_cn_sents.append([cn_vocab[w] for w in padded_cn_sent])
    padded_cn_label_sents.append([cn_vocab[w] for w in padded_cn_label_sent])
train_en_sents = np.array(padded_en_sents)
train_cn_sents = np.array(padded_cn_sents)
train_cn_label_sents = np.array(padded_cn_label_sents)
print(train_en_sents.shape)
print(train_cn_sents.shape)
print(train_cn_label_sents.shape)
.. parsed-literal::
(5508, 11)
(5508, 12)
(5508, 12)
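To see where the widths 11 and 12 come from, and how the label sequence relates to the decoder input, the padding rules can be applied to a single pair. This is a small sketch that reuses the same expressions as the code above on tokens rather than IDs; the function name ``pad_pair`` is introduced here for illustration only.

```python
MAX_LEN = 10  # same value as in the tutorial


def pad_pair(en, cn, max_len=MAX_LEN):
    """Apply the tutorial's padding rules to one tokenized sentence pair."""
    # Source: tokens + <eos> + padding, then reversed (length max_len + 1).
    src = en + ['<eos>'] + ['<pad>'] * (max_len - len(en))
    src.reverse()
    # Decoder input: <bos> + tokens + <eos> + padding (length max_len + 2).
    dec_in = ['<bos>'] + cn + ['<eos>'] + ['<pad>'] * (max_len - len(cn))
    # Label: the decoder input shifted left by one (length max_len + 2).
    label = cn + ['<eos>'] + ['<pad>'] * (max_len - len(cn) + 1)
    return src, dec_in, label


src, dec_in, label = pad_pair(['i', 'won'], ['我', '赢', '了', '。'])
print(len(src), len(dec_in), len(label))
print(dec_in[:6])
print(label[:6])
```

At every position ``i``, ``label[i]`` equals ``dec_in[i + 1]``, so the decoder is trained to predict the next token from the current one.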
Building the network
--------------------
We will build an Encoder-AttentionDecoder model for the machine translation task.
First, we set a few hyperparameters that the network needs.
.. code:: ipython3
embedding_size = 128
hidden_size = 256
num_encoder_lstm_layers = 1
en_vocab_size = len(list(en_vocab))
cn_vocab_size = len(list(cn_vocab))
epochs = 20
batch_size = 16
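Encoder
-------
In the encoder, each source token is looked up in an Embedding and the result is fed through an LSTM, giving a network that encodes the source sentence. Besides LSTM, PaddlePaddle's RNN APIs also provide SimpleRNN and GRU, and they support backward, bidirectional, and multi-layer RNNs. The ``dropout`` parameter controls whether dropout is applied between the layers of a multi-layer RNN to reduce overfitting.
Besides these sequence-level RNN layers, you can build single-step RNN computations more flexibly with APIs such as SimpleRNNCell, GRUCell, and LSTMCell, or even implement your own RNN cell by subclassing RNNCellBase.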
.. code:: ipython3
# encoder: simply learn representation of source sentence
class Encoder(paddle.nn.Layer):
    def __init__(self):
        super(Encoder, self).__init__()
        self.emb = paddle.nn.Embedding(en_vocab_size, embedding_size,)
        self.lstm = paddle.nn.LSTM(input_size=embedding_size,
                                   hidden_size=hidden_size,
                                   num_layers=num_encoder_lstm_layers)
    def forward(self, x):
        x = self.emb(x)
        x, (_, _) = self.lstm(x)
        return x
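AttentionDecoder
----------------
The decoder is an LSTM with an attention mechanism.
- Single-step LSTM: the decoder also uses an LSTM, but unlike the encoder, the code below advances the LSTM by only one step per call. The recurrence as a whole is carried out inside the training loop.
- Attention mechanism: a small network of two Linear layers computes the attention, that is, how much weight each source-language token should receive each time a target-language token is generated.
- If this is your first encounter with this kind of architecture, the code below may be somewhat hard to follow. Printing the shape of each tensor at each step is a good way to understand it.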
.. code:: ipython3
# only move one step of LSTM,
# the recurrent loop is implemented inside training loop
class AttentionDecoder(paddle.nn.Layer):
    def __init__(self):
        super(AttentionDecoder, self).__init__()
        self.emb = paddle.nn.Embedding(cn_vocab_size, embedding_size)
        self.lstm = paddle.nn.LSTM(input_size=embedding_size + hidden_size,
                                   hidden_size=hidden_size)
        # for computing attention weights
        self.attention_linear1 = paddle.nn.Linear(hidden_size * 2, hidden_size)
        self.attention_linear2 = paddle.nn.Linear(hidden_size, 1)
        # for computing output logits
        self.outlinear = paddle.nn.Linear(hidden_size, cn_vocab_size)
    def forward(self, x, previous_hidden, previous_cell, encoder_outputs):
        x = self.emb(x)
        attention_inputs = paddle.concat((encoder_outputs,
                                          paddle.tile(previous_hidden, repeat_times=[1, MAX_LEN+1, 1])),
                                         axis=-1
                                        )
        attention_hidden = self.attention_linear1(attention_inputs)
        attention_hidden = F.tanh(attention_hidden)
        attention_logits = self.attention_linear2(attention_hidden)
        attention_logits = paddle.squeeze(attention_logits)
        attention_weights = F.softmax(attention_logits)
        attention_weights = paddle.expand_as(paddle.unsqueeze(attention_weights, -1),
                                             encoder_outputs)
        context_vector = paddle.multiply(encoder_outputs, attention_weights)
        context_vector = paddle.reduce_sum(context_vector, 1)
        context_vector = paddle.unsqueeze(context_vector, 1)
        lstm_input = paddle.concat((x, context_vector), axis=-1)
        # LSTM requirement to previous hidden/state:
        # (number_of_layers * direction, batch, hidden)
        previous_hidden = paddle.transpose(previous_hidden, [1, 0, 2])
        previous_cell = paddle.transpose(previous_cell, [1, 0, 2])
        x, (hidden, cell) = self.lstm(lstm_input, (previous_hidden, previous_cell))
        # change the return to (batch, number_of_layers * direction, hidden)
        hidden = paddle.transpose(hidden, [1, 0, 2])
        cell = paddle.transpose(cell, [1, 0, 2])
        output = self.outlinear(hidden)
        output = paddle.squeeze(output)
        return output, (hidden, cell)
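The attention computation inside ``forward`` can be hard to follow through the tensor reshaping. The following framework-free sketch walks through the same idea for a single example: score each encoder output against the previous decoder hidden state, normalize the scores with softmax, and take the weighted sum as the context vector. It is a simplified illustration, not the tutorial's implementation: the bias terms of the two Linear layers are omitted, and the weights ``W`` and ``v`` and all input values are made-up toy numbers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(encoder_outputs, decoder_hidden, W, v):
    """Return (weights, context) for one decoding step, biases omitted."""
    scores = []
    for h in encoder_outputs:
        # Concatenate encoder output and decoder state, as paddle.concat does.
        hidden = [math.tanh(z) for z in matvec(W, h + decoder_hidden)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # Weighted sum of encoder outputs over the source positions.
    context = [sum(w * h[d] for w, h in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context


# Toy inputs: 3 source positions, hidden size 2 (all values are made up).
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_hidden = [0.5, -0.5]
W = [[0.1, -0.2, 0.3, 0.4], [0.5, 0.6, -0.7, 0.8]]  # stands in for attention_linear1
v = [1.0, -1.0]                                      # stands in for attention_linear2

weights, context = additive_attention(encoder_outputs, decoder_hidden, W, v)
print(weights)
print(context)
```

The weights are positive and sum to 1, so the context vector is a weighted average of the encoder outputs; in the tutorial it is concatenated with the embedded input token and fed to the LSTM.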
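Training the model
------------------
Now we train the model.
- The training data is shuffled at the start of every epoch.
- The decoding recurrence is implemented here by calling ``atten_decoder`` repeatedly.
- ``teacher forcing``: when decoding each token, the ground-truth token from the training data is fed in as the input for predicting the next token. Alternatively, you can feed in the model's own prediction as the next input, or mix the two strategies.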
.. code:: ipython3
encoder = Encoder()
atten_decoder = AttentionDecoder()
opt = paddle.optimizer.Adam(learning_rate=0.001,
                            parameters=encoder.parameters()+atten_decoder.parameters())
for epoch in range(epochs):
    print("epoch:{}".format(epoch))
    # shuffle training data
    perm = np.random.permutation(len(train_en_sents))
    train_en_sents_shuffled = train_en_sents[perm]
    train_cn_sents_shuffled = train_cn_sents[perm]
    train_cn_label_sents_shuffled = train_cn_label_sents[perm]
    for iteration in range(train_en_sents_shuffled.shape[0] // batch_size):
        x_data = train_en_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]
        sent = paddle.to_tensor(x_data)
        en_repr = encoder(sent)
        x_cn_data = train_cn_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]
        x_cn_label_data = train_cn_label_sents_shuffled[(batch_size*iteration):(batch_size*(iteration+1))]
        # shape: (batch, num_layer(=1 here) * num_of_direction(=1 here), hidden_size)
        hidden = paddle.zeros([batch_size, 1, hidden_size])
        cell = paddle.zeros([batch_size, 1, hidden_size])
        loss = paddle.zeros([1])
        # the decoder recurrent loop mentioned above
        for i in range(MAX_LEN + 2):
            cn_word = paddle.to_tensor(x_cn_data[:,i:i+1])
            cn_word_label = paddle.to_tensor(x_cn_label_data[:,i:i+1])
            logits, (hidden, cell) = atten_decoder(cn_word, hidden, cell, en_repr)
            step_loss = F.softmax_with_cross_entropy(logits, cn_word_label)
            avg_step_loss = paddle.mean(step_loss)
            loss += avg_step_loss
        loss = loss / (MAX_LEN + 2)
        if(iteration % 200 == 0):
            print("iter {}, loss:{}".format(iteration, loss.numpy()))
        loss.backward()
        opt.minimize(loss)
        encoder.clear_gradients()
        atten_decoder.clear_gradients()
.. parsed-literal::
epoch:0
iter 0, loss:[7.6194725]
iter 200, loss:[3.4147663]
epoch:1
iter 0, loss:[3.0931656]
iter 200, loss:[2.7543137]
epoch:2
iter 0, loss:[2.8413522]
iter 200, loss:[2.340513]
epoch:3
iter 0, loss:[2.597812]
iter 200, loss:[2.5552855]
epoch:4
iter 0, loss:[2.0783448]
iter 200, loss:[2.4544785]
epoch:5
iter 0, loss:[1.8709135]
iter 200, loss:[1.8736631]
epoch:6
iter 0, loss:[1.9589291]
iter 200, loss:[2.119414]
epoch:7
iter 0, loss:[1.5829577]
iter 200, loss:[1.6002902]
epoch:8
iter 0, loss:[1.6022769]
iter 200, loss:[1.52694]
epoch:9
iter 0, loss:[1.3616685]
iter 200, loss:[1.5420443]
epoch:10
iter 0, loss:[1.0397792]
iter 200, loss:[1.2458231]
epoch:11
iter 0, loss:[1.2107158]
iter 200, loss:[1.426417]
epoch:12
iter 0, loss:[1.1840894]
iter 200, loss:[1.0999664]
epoch:13
iter 0, loss:[1.0968472]
iter 200, loss:[0.8149167]
epoch:14
iter 0, loss:[0.95585203]
iter 200, loss:[1.0070628]
epoch:15
iter 0, loss:[0.89463925]
iter 200, loss:[0.8288595]
epoch:16
iter 0, loss:[0.5672495]
iter 200, loss:[0.7317069]
epoch:17
iter 0, loss:[0.76785177]
iter 200, loss:[0.5319323]
epoch:18
iter 0, loss:[0.5250005]
iter 200, loss:[0.4182841]
epoch:19
iter 0, loss:[0.52320284]
iter 200, loss:[0.47618982]
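The training loop above always feeds the ground-truth token into the next decoding step. Mixing teacher forcing with the model's own predictions, as suggested earlier, can be sketched as a per-step random choice. The function ``next_decoder_input`` and the parameter ``teacher_forcing_ratio`` are hypothetical names, and the predicted tokens are made up; wiring this into the Paddle loop would additionally require taking the ``argmax`` of the logits at each step.

```python
import random


def next_decoder_input(true_token, predicted_token, teacher_forcing_ratio, rng):
    """Pick the input for the next decoding step.

    With probability teacher_forcing_ratio the ground-truth token is used;
    otherwise the model's own prediction is fed back in.
    """
    if rng.random() < teacher_forcing_ratio:
        return true_token
    return predicted_token


rng = random.Random(0)
true_tokens = ['我', '同', '意', '。']
predicted_tokens = ['我', '想', '意', '。']  # made-up model predictions

pairs = list(zip(true_tokens, predicted_tokens))
always_true = [next_decoder_input(t, p, 1.0, rng) for t, p in pairs]
always_pred = [next_decoder_input(t, p, 0.0, rng) for t, p in pairs]
mixed = [next_decoder_input(t, p, 0.5, rng) for t, p in pairs]

print(always_true)
print(always_pred)
print(mixed)
```

A ratio of 1.0 reproduces pure teacher forcing as used in this tutorial, while 0.0 corresponds to always feeding back the model's prediction.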
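Translating with the model
--------------------------
Training time depends on the hardware you use (roughly 15 to 20 minutes on a Mac laptop).
Once training has finished, we have a model that translates English into Chinese. Below we use greedy search to run actual translations with it. (In a real task, you may want to use beam search to improve the results.)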
.. code:: ipython3
encoder.eval()
atten_decoder.eval()
num_of_exampels_to_evaluate = 10
indices = np.random.choice(len(train_en_sents), num_of_exampels_to_evaluate, replace=False)
x_data = train_en_sents[indices]
sent = paddle.to_tensor(x_data)
en_repr = encoder(sent)
word = np.array(
    [[cn_vocab['<bos>']]] * num_of_exampels_to_evaluate
)
word = paddle.to_tensor(word)
hidden = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])
cell = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])
decoded_sent = []
for i in range(MAX_LEN + 2):
    logits, (hidden, cell) = atten_decoder(word, hidden, cell, en_repr)
    word = paddle.argmax(logits, axis=1)
    decoded_sent.append(word.numpy())
    word = paddle.unsqueeze(word, axis=-1)
results = np.stack(decoded_sent, axis=1)
for i in range(num_of_exampels_to_evaluate):
    en_input = " ".join(filtered_pairs[indices[i]][0])
    ground_truth_translate = "".join(filtered_pairs[indices[i]][1])
    model_translate = ""
    for k in results[i]:
        w = list(cn_vocab)[k]
        if w != '<pad>' and w != '<eos>':
            model_translate += w
    print(en_input)
    print("true: {}".format(ground_truth_translate))
    print("pred: {}".format(model_translate))
.. parsed-literal::
i agree with him
true: 我同意他。
pred: 我同意他。
i think i ll take a bath tonight
true: 我想我今晚會洗澡。
pred: 我想我今晚會洗澡。
he asked for a drink of water
true: 他要了水喝。
pred: 他喝了一杯水。
i began running
true: 我開始跑。
pred: 我開始跑。
i m sick
true: 我生病了。
pred: 我生病了。
you had better go to the dentist s
true: 你最好去看牙醫。
pred: 你最好去看牙醫。
we went for a walk in the forest
true: 我们去了林中散步。
pred: 我們去公园散步。
you ve arrived very early
true: 你來得很早。
pred: 你去早个。
he pretended not to be listening
true: 他裝作沒在聽。
pred: 他假装聽到它。
he always wanted to study japanese
true: 他一直想學日語。
pred: 他一直想學日語。
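The beam search mentioned above can be illustrated without the trained model. The sketch below is a generic, simplified beam search (no length normalization) over a hypothetical ``step_fn`` that returns next-token probabilities for a given prefix. The toy probability table is made up and chosen so that greedy search (beam size 1) misses the most probable sequence; it does not use the tutorial's ``atten_decoder``.

```python
import math


def beam_search(step_fn, bos, eos, beam_size=3, max_len=12):
    """Return the (tokens, log_prob) hypothesis with the highest score."""
    beams = [([bos], 0.0)]  # unfinished hypotheses
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, p in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the beam_size best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without <eos>
    return max(finished, key=lambda c: c[1])


# Made-up next-token probabilities keyed by the last token of the prefix.
TOY_TABLE = {
    '<bos>': {'a': 0.6, 'b': 0.4},
    'a': {'x': 0.5, 'y': 0.5},
    'b': {'z': 0.9, 'w': 0.1},
}


def toy_step(tokens):
    """Hypothetical model: anything not in the table is followed by <eos>."""
    return TOY_TABLE.get(tokens[-1], {'<eos>': 1.0})


greedy_tokens, greedy_score = beam_search(toy_step, '<bos>', '<eos>', beam_size=1)
beam_tokens, beam_score = beam_search(toy_step, '<bos>', '<eos>', beam_size=2)
print(greedy_tokens, math.exp(greedy_score))
print(beam_tokens, math.exp(beam_score))
```

Greedy search commits to ``a`` because it is the most likely first token and ends with a total probability of about 0.30, while a beam of size 2 also keeps ``b`` alive and finds the sequence through ``z`` with probability of about 0.36.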
The End
-------
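You can improve the translation quality of this example further by changing the network architecture, adjusting the dataset, or trying different hyperparameters. You can also try using PaddlePaddle on other, similar tasks.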
{
"cells": [
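{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dynamic graph\n",
"\n",
"Starting with the 2.0beta release of the open-source PaddlePaddle framework, dynamic graph mode is enabled by default. In this mode, every operation is executed immediately and returns its result (rather than first defining the whole network and then executing it).\n",
"\n",
"Dynamic graph mode makes it easier to organize code and to debug programs. This tutorial introduces how to use PaddlePaddle's dynamic graph mode.\n"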
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setting up the environment\n",
"\n",
"We use PaddlePaddle 2.0beta and make sure that dynamic graph mode is enabled."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0.0\n",
"89af2088b6e74bdfeef2d4d78e08461ed2aafee5\n"
]
}
],
"source": [
"import paddle\n",
"import paddle.nn.functional as F\n",
"import numpy as np\n",
"\n",
"paddle.disable_static()\n",
"print(paddle.__version__)\n",
"print(paddle.__git_commit__)\n"
]
},
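{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic usage\n",
"\n",
"In dynamic graph mode, you can call a PaddlePaddle API directly and it returns its result to Python immediately. There is no longer any need to build a computation graph first and then feed it data to run."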
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-0.49341336 -0.8112665 ]\n",
" [ 0.8929015 0.24661176]\n",
" [-0.64440054 -0.7945008 ]\n",
" [-0.07345356 1.3641853 ]]\n",
"[1. 2.]\n",
"[[0.5065867 1.1887336 ]\n",
" [1.8929014 2.2466118 ]\n",
" [0.35559946 1.2054992 ]\n",
" [0.92654645 3.3641853 ]]\n",
"[-2.1159463 1.386125 -2.2334023 2.654917 ]\n"
]
}
],
"source": [
"a = paddle.randn([4, 2])\n",
"b = paddle.arange(1, 3, dtype='float32')\n",
"\n",
"print(a.numpy())\n",
"print(b.numpy())\n",
"\n",
"c = a + b\n",
"print(c.numpy())\n",
"\n",
"d = paddle.matmul(a, b)\n",
"print(d.numpy())"
]
},
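{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using Python control flow\n",
"\n",
"In dynamic graph mode, you can drive neural-network computation with ordinary Python control statements such as conditionals and loops. (OPs such as `cond` and `loop` are no longer needed.)\n"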
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 +> [5 6 7]\n",
"1 +> [5 7 9]\n",
"2 +> [ 5 9 15]\n",
"3 -> [-3 3 21]\n",
"4 -> [-3 11 75]\n",
"5 +> [ 5 37 249]\n",
"6 +> [ 5 69 735]\n",
"7 -> [ -3 123 2181]\n",
"8 +> [ 5 261 6567]\n",
"9 +> [ 5 517 19689]\n"
]
}
],
"source": [
"a = paddle.to_tensor(np.array([1, 2, 3]))\n",
"b = paddle.to_tensor(np.array([4, 5, 6]))\n",
"\n",
"for i in range(10):\n",
" r = paddle.rand([1,])\n",
" if r > 0.5:\n",
" c = paddle.pow(a, i) + b\n",
" print(\"{} +> {}\".format(i, c.numpy()))\n",
" else:\n",
" c = paddle.pow(a, i) - b\n",
" print(\"{} -> {}\".format(i, c.numpy()))\n"
]
},
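{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building more flexible networks: control flow\n",
"\n",
"- Dynamic graphs make it possible to build more flexible networks, for example by selecting different branches of a network through control flow, or by conveniently sharing weights. Let's look at a concrete example in which the second linear transformation runs with a probability of only 0.5.\n",
"- The sequence to sequence with attention machine translation example shows a more realistic case of the flexibility that dynamic graphs bring to building RNN-style networks.\n"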
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"class MyModel(paddle.nn.Layer):\n",
" def __init__(self, input_size, hidden_size):\n",
" super(MyModel, self).__init__()\n",
" self.linear1 = paddle.nn.Linear(input_size, hidden_size)\n",
" self.linear2 = paddle.nn.Linear(hidden_size, hidden_size)\n",
" self.linear3 = paddle.nn.Linear(hidden_size, 1)\n",
"\n",
" def forward(self, inputs):\n",
" x = self.linear1(inputs)\n",
" x = F.relu(x)\n",
"\n",
" if paddle.rand([1,]) > 0.5: \n",
" x = self.linear2(x)\n",
" x = F.relu(x)\n",
"\n",
" x = self.linear3(x)\n",
" \n",
" return x "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 [2.0915627]\n",
"200 [0.67530334]\n",
"400 [0.52042854]\n",
"600 [0.28010666]\n",
"800 [0.09739777]\n",
"1000 [0.09307177]\n",
"1200 [0.04252927]\n",
"1400 [0.03095707]\n",
"1600 [0.03022156]\n",
"1800 [0.01616007]\n",
"2000 [0.01069116]\n",
"2200 [0.0055158]\n",
"2400 [0.00195092]\n",
"2600 [0.00101116]\n",
"2800 [0.00192219]\n"
]
}
],
"source": [
"total_data, batch_size, input_size, hidden_size = 1000, 64, 128, 256\n",
"\n",
"x_data = np.random.randn(total_data, input_size).astype(np.float32)\n",
"y_data = np.random.randn(total_data, 1).astype(np.float32)\n",
"\n",
"model = MyModel(input_size, hidden_size)\n",
"\n",
"loss_fn = paddle.nn.MSELoss(reduction='mean')\n",
"optimizer = paddle.optimizer.SGD(learning_rate=0.01, \n",
" parameters=model.parameters())\n",
"\n",
"for t in range(200 * (total_data // batch_size)):\n",
" idx = np.random.choice(total_data, batch_size, replace=False)\n",
" x = paddle.to_tensor(x_data[idx,:])\n",
" y = paddle.to_tensor(y_data[idx,:])\n",
" y_pred = model(x)\n",
"\n",
" loss = loss_fn(y_pred, y)\n",
" if t % 200 == 0:\n",
" print(t, loss.numpy())\n",
"\n",
" loss.backward()\n",
" optimizer.minimize(loss)\n",
" model.clear_gradients()"
]
},
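{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building more flexible networks: weight sharing\n",
"\n",
"- Dynamic graphs also make it easier to build networks with shared weights. The example below is a simple AutoEncoder whose weights are shared.\n",
"- For a more realistic use of shared weights, see the image search example."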
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"step: 0, loss: [0.37666085]\n",
"step: 1, loss: [0.3063845]\n",
"step: 2, loss: [0.2647248]\n",
"step: 3, loss: [0.23831272]\n",
"step: 4, loss: [0.21714918]\n",
"step: 5, loss: [0.1955545]\n",
"step: 6, loss: [0.17261818]\n",
"step: 7, loss: [0.15009595]\n",
"step: 8, loss: [0.13051331]\n",
"step: 9, loss: [0.11537809]\n"
]
}
],
"source": [
"inputs = paddle.rand((256, 64))\n",
"\n",
"linear = paddle.nn.Linear(64, 8, bias_attr=False)\n",
"loss_fn = paddle.nn.MSELoss()\n",
"optimizer = paddle.optimizer.Adam(0.01, parameters=linear.parameters())\n",
"\n",
"for i in range(10):\n",
" hidden = linear(inputs)\n",
" # weight from input to hidden is shared with the linear mapping from hidden to output\n",
" outputs = paddle.matmul(hidden, linear.weight, transpose_y=True) \n",
" loss = loss_fn(outputs, inputs)\n",
" loss.backward()\n",
" print(\"step: {}, loss: {}\".format(i, loss.numpy()))\n",
" optimizer.minimize(loss)\n",
" linear.clear_gradients()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The end\n",
"\n",
"As you can see, dynamic graphs offer a more flexible and easier-to-use way to build and train networks."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
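Dynamic graph
=============
Starting with the 2.0beta release of the open-source PaddlePaddle framework, dynamic graph mode is enabled by default. In this mode, every operation is executed immediately and returns its result (rather than first defining the whole network and then executing it).
Dynamic graph mode makes it easier to organize code and to debug programs. This tutorial introduces how to use PaddlePaddle's dynamic graph mode.
Setting up the environment
--------------------------
We use PaddlePaddle 2.0beta and make sure that dynamic graph mode is enabled.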
.. code:: ipython3
import paddle
import paddle.nn.functional as F
import numpy as np
paddle.disable_static()
print(paddle.__version__)
print(paddle.__git_commit__)
.. parsed-literal::
0.0.0
89af2088b6e74bdfeef2d4d78e08461ed2aafee5
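Basic usage
-----------
In dynamic graph mode, you can call a PaddlePaddle API directly and it returns its result to Python immediately. There is no longer any need to build a computation graph first and then feed it data to run.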
.. code:: ipython3
a = paddle.randn([4, 2])
b = paddle.arange(1, 3, dtype='float32')
print(a.numpy())
print(b.numpy())
c = a + b
print(c.numpy())
d = paddle.matmul(a, b)
print(d.numpy())
.. parsed-literal::
[[-0.49341336 -0.8112665 ]
[ 0.8929015 0.24661176]
[-0.64440054 -0.7945008 ]
[-0.07345356 1.3641853 ]]
[1. 2.]
[[0.5065867 1.1887336 ]
[1.8929014 2.2466118 ]
[0.35559946 1.2054992 ]
[0.92654645 3.3641853 ]]
[-2.1159463 1.386125 -2.2334023 2.654917 ]
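To make the shapes in the output above concrete, here is a minimal pure-Python sketch (no PaddlePaddle required) of what ``a + b`` and ``paddle.matmul(a, b)`` compute for a ``[4, 2]`` tensor and a ``[2]`` tensor. The helper names ``broadcast_add`` and ``matvec`` are illustrative assumptions and are not part of PaddlePaddle.

```python
# Values of `a` and `b` are copied from the printed output above.
a = [[-0.49341336, -0.8112665],
     [0.8929015, 0.24661176],
     [-0.64440054, -0.7945008],
     [-0.07345356, 1.3641853]]
b = [1.0, 2.0]


def broadcast_add(matrix, vector):
    """Add `vector` to every row of `matrix` (row-wise broadcasting)."""
    return [[m + v for m, v in zip(row, vector)] for row in matrix]


def matvec(matrix, vector):
    """Multiply a [rows, cols] matrix by a [cols] vector, giving [rows]."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


c = broadcast_add(a, b)  # shape [4, 2], like `a + b`
d = matvec(a, b)         # shape [4], like `paddle.matmul(a, b)`
print(c)
print(d)
```

The addition keeps the ``[4, 2]`` shape because ``b`` is reused for each row, while the matrix product collapses the shared dimension of size 2 and yields four values.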
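Using Python Control Flow
-------------------------
In dynamic graph mode you can use Python's own conditionals and loops to drive neural network computation, so operators such as ``cond`` and ``loop`` are no longer needed.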
.. code:: ipython3

    a = paddle.to_tensor(np.array([1, 2, 3]))
    b = paddle.to_tensor(np.array([4, 5, 6]))
    for i in range(10):
        r = paddle.rand([1,])
        if r > 0.5:
            c = paddle.pow(a, i) + b
            print("{} +> {}".format(i, c.numpy()))
        else:
            c = paddle.pow(a, i) - b
            print("{} -> {}".format(i, c.numpy()))

.. parsed-literal::
0 +> [5 6 7]
1 +> [5 7 9]
2 +> [ 5 9 15]
3 -> [-3 3 21]
4 -> [-3 11 75]
5 +> [ 5 37 249]
6 +> [ 5 69 735]
7 -> [ -3 123 2181]
8 +> [ 5 261 6567]
9 +> [ 5 517 19689]
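The branch logic above can be checked without PaddlePaddle. The following is a minimal sketch, assuming plain Python lists in place of tensors; the helper ``step`` and the fixed random seed are illustrative choices, not part of the original example.

```python
import random

a = [1, 2, 3]
b = [4, 5, 6]


def step(i, add):
    """Return a**i + b if `add` is true, otherwise a**i - b (element-wise)."""
    powered = [x ** i for x in a]
    if add:
        return [p + y for p, y in zip(powered, b)]
    return [p - y for p, y in zip(powered, b)]


random.seed(0)  # arbitrary seed, used only to make the run repeatable
for i in range(10):
    add = random.random() > 0.5  # an ordinary Python condition picks the branch
    print(i, "+>" if add else "->", step(i, add))
```

Because the branch is chosen by a normal ``if`` statement at run time, each iteration may take a different path, which is exactly what the ``+>`` and ``->`` markers in the output show.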
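Building More Flexible Networks: Control Flow
---------------------------------------------
- Dynamic graphs make it possible to build more flexible networks, for example by selecting different branches through control flow or by sharing weights conveniently. The following example shows a network in which the second linear transformation runs with a probability of only 0.5.
- The sequence-to-sequence machine translation example with attention shows, in a more realistic setting, the flexibility that dynamic graphs bring to building RNN-style networks.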
.. code:: ipython3

    class MyModel(paddle.nn.Layer):
        def __init__(self, input_size, hidden_size):
            super(MyModel, self).__init__()
            self.linear1 = paddle.nn.Linear(input_size, hidden_size)
            self.linear2 = paddle.nn.Linear(hidden_size, hidden_size)
            self.linear3 = paddle.nn.Linear(hidden_size, 1)

        def forward(self, inputs):
            x = self.linear1(inputs)
            x = F.relu(x)
            if paddle.rand([1,]) > 0.5:
                x = self.linear2(x)
                x = F.relu(x)
            x = self.linear3(x)
            return x

.. code:: ipython3

    total_data, batch_size, input_size, hidden_size = 1000, 64, 128, 256
    x_data = np.random.randn(total_data, input_size).astype(np.float32)
    y_data = np.random.randn(total_data, 1).astype(np.float32)
    model = MyModel(input_size, hidden_size)
    loss_fn = paddle.nn.MSELoss(reduction='mean')
    optimizer = paddle.optimizer.SGD(learning_rate=0.01,
                                     parameters=model.parameters())
    for t in range(200 * (total_data // batch_size)):
        idx = np.random.choice(total_data, batch_size, replace=False)
        x = paddle.to_tensor(x_data[idx,:])
        y = paddle.to_tensor(y_data[idx,:])
        y_pred = model(x)
        loss = loss_fn(y_pred, y)
        if t % 200 == 0:
            print(t, loss.numpy())
        loss.backward()
        optimizer.minimize(loss)
        model.clear_gradients()

.. parsed-literal::
0 [2.0915627]
200 [0.67530334]
400 [0.52042854]
600 [0.28010666]
800 [0.09739777]
1000 [0.09307177]
1200 [0.04252927]
1400 [0.03095707]
1600 [0.03022156]
1800 [0.01616007]
2000 [0.01069116]
2200 [0.0055158]
2400 [0.00195092]
2600 [0.00101116]
2800 [0.00192219]
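Building More Flexible Networks: Shared Weights
-----------------------------------------------
- Dynamic graphs also make it easier to build networks with shared weights. The example below shows a simple autoencoder whose encoder and decoder share one weight matrix.
- For a more realistic use of shared parameters, see the image search example.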
.. code:: ipython3

    inputs = paddle.rand((256, 64))
    linear = paddle.nn.Linear(64, 8, bias_attr=False)
    loss_fn = paddle.nn.MSELoss()
    optimizer = paddle.optimizer.Adam(0.01, parameters=linear.parameters())
    for i in range(10):
        hidden = linear(inputs)
        # weight from input to hidden is shared with the linear mapping from hidden to output
        outputs = paddle.matmul(hidden, linear.weight, transpose_y=True)
        loss = loss_fn(outputs, inputs)
        loss.backward()
        print("step: {}, loss: {}".format(i, loss.numpy()))
        optimizer.minimize(loss)
        linear.clear_gradients()

.. parsed-literal::
step: 0, loss: [0.37666085]
step: 1, loss: [0.3063845]
step: 2, loss: [0.2647248]
step: 3, loss: [0.23831272]
step: 4, loss: [0.21714918]
step: 5, loss: [0.1955545]
step: 6, loss: [0.17261818]
step: 7, loss: [0.15009595]
step: 8, loss: [0.13051331]
step: 9, loss: [0.11537809]
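The weight sharing used above can be illustrated on a tiny example. This is a sketch in plain Python, assuming a hand-written 3-by-2 matrix as a stand-in for ``linear.weight``; the helpers ``matmul`` and ``transpose`` are hypothetical and only mirror what ``paddle.matmul(..., transpose_y=True)`` does.

```python
def matmul(x, y):
    """Plain nested-list matrix product of x [n, k] and y [k, m]."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


# Stand-in for linear.weight with shape [in_features=3, out_features=2].
weight = [[1.0, 0.0],
          [0.0, 1.0],
          [0.0, 0.0]]
inputs = [[0.5, -1.0, 2.0]]  # one sample with 3 features

hidden = matmul(inputs, weight)              # encoder: [1, 3] x [3, 2] -> [1, 2]
outputs = matmul(hidden, transpose(weight))  # decoder reuses the same weight

print(hidden)
print(outputs)
```

Only one matrix is stored, so every gradient step updates the encoder and the decoder together. In this toy case the third input feature is dropped by the encoder and therefore cannot be reconstructed.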
The end
--------
As you can see, dynamic graph mode provides a more flexible and easier-to-use way to build and train networks.
{
"cells": [
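{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quick Start with PaddlePaddle\n",
"\n",
"This example walks you through a basic case so that, even as a newcomer to PaddlePaddle, you can quickly learn how to use it."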
]
},
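{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install PaddlePaddle\n",
"\n",
"If you have already installed PaddlePaddle, you can skip this step. We provide an installation guide page where you select your system and software versions to get the matching install command. See [Quick Install](https://www.paddlepaddle.org.cn/install/quick) for details."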
]
},
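{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Import PaddlePaddle\n",
"\n",
"This example is written as a notebook, so you can run it directly on platforms such as AI Studio or Jupyter. A notebook lets you run Python in the browser, execute the code while reading the tutorial, compare the results, and step through the cells one at a time.\n",
"\n",
"Once PaddlePaddle is installed, you can import it in a Python program."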
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.0.0'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import paddle\n",
"\n",
"paddle.__version__"
]
},
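{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Practice: A Handwritten Digit Recognition Task\n",
"\n",
"Viewed simply, a deep learning task consists of a few core steps: 1. preparing and loading the dataset; 2. building the model; 3. training the model; 4. evaluating the model. We will now go through these steps one by one using a small number of PaddlePaddle APIs."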
]
},
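{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 Loading the Data\n",
"\n",
"Load the handwritten digit recognition dataset that the framework provides. We use two datasets here: one for training the model and one for evaluating it."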
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"train_dataset = paddle.vision.datasets.MNIST(mode='train', chw_format=False)\n",
"val_dataset = paddle.vision.datasets.MNIST(mode='test', chw_format=False)"
]
},
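{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Building the Model\n",
"\n",
"Use `Sequential` to stack the network layer by layer. Because the `chw_format` parameter of the dataset loading interface has already reshaped each image from [1, 28, 28] to [1, 784], no `Flatten` operation is needed before the first layer."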
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"mnist = paddle.nn.Sequential(\n",
" paddle.nn.Linear(784, 512),\n",
" paddle.nn.ReLU(),\n",
" paddle.nn.Dropout(0.2),\n",
" paddle.nn.Linear(512, 10)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Training the Model\n",
"\n",
"After configuring the loss function and the optimizer needed for training, call the `fit` interface to start training."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/5\n",
"step 1875/1875 [==============================] - loss: 0.2571 - acc: 0.9037 - 10ms/step \n",
"Epoch 2/5\n",
"step 1875/1875 [==============================] - loss: 0.1880 - acc: 0.9458 - 14ms/step \n",
"Epoch 3/5\n",
"step 1875/1875 [==============================] - loss: 0.0279 - acc: 0.9549 - 11ms/step \n",
"Epoch 4/5\n",
"step 1875/1875 [==============================] - loss: 0.0505 - acc: 0.9608 - 13ms/step \n",
"Epoch 5/5\n",
"step 1875/1875 [==============================] - loss: 0.2253 - acc: 0.9646 - 12ms/step \n"
]
}
],
"source": [
"# Enable dynamic graph mode\n",
"paddle.disable_static()\n",
"\n",
"# Create a model instance from the network structure for later configuration, training, and validation\n",
"model = paddle.Model(mnist)\n",
"\n",
"# Training configuration: set the optimizer, the loss function, and the accuracy metric\n",
"model.prepare(paddle.optimizer.Adam(parameters=mnist.parameters()),\n",
"              paddle.nn.CrossEntropyLoss(),\n",
"              paddle.metric.Accuracy())\n",
"\n",
"# Start training\n",
"model.fit(train_dataset,\n",
"          epochs=5,\n",
"          batch_size=32,\n",
"          verbose=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4 Evaluating the Model\n",
"\n",
"Evaluate the model with the parameters we have just trained to see how accurate it is."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'loss': [3.576278e-07], 'acc': 0.9666}"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.evaluate(val_dataset, verbose=0)"
]
},
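{
"cell_type": "markdown",
"metadata": {},
"source": [
"The initially trained model reaches an accuracy of about 97%. Tuning the training parameters can improve it further.\n",
"\n",
"You now know how to complete a deep learning task quickly with a few simple PaddlePaddle APIs. You can adapt the code to your own needs: to use your own dataset, replace the dataset loading part; to use a different model, change the model code; and so on. We also provide sample code for many other scenarios that shows how to use the PaddlePaddle API. Follow the links below or use the page navigation to find the parts that interest you.\n",
"\n",
"TODO: add quick links to the other example tutorials."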
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.4 64-bit",
"language": "python",
"name": "python37464bitc4da1ac836094043840bff631bedbf7f"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
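Quick Start with PaddlePaddle
=============================
This example walks you through a basic case so that, even as a newcomer to PaddlePaddle, you can quickly learn how to use it.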
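1. Install PaddlePaddle
-----------------------
If you have already installed PaddlePaddle, you can skip this step. We provide an installation guide page where you select your system and software versions to get the matching install command. See `Quick Install <https://www.paddlepaddle.org.cn/install/quick>`__ for details.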
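2. Import PaddlePaddle
----------------------
This example is written as a notebook, so you can run it directly on platforms such as AI Studio or Jupyter. A notebook lets you run Python in the browser, execute the code while reading the tutorial, compare the results, and step through the cells one at a time.
Once PaddlePaddle is installed, you can import it in a Python program.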
.. code:: ipython3
import paddle
paddle.__version__
.. parsed-literal::
'0.0.0'
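3. Practice: A Handwritten Digit Recognition Task
-------------------------------------------------
Viewed simply, a deep learning task consists of a few core steps: 1. preparing and loading the dataset; 2. building the model; 3. training the model; 4. evaluating the model. We will now go through these steps one by one using a small number of PaddlePaddle APIs.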
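3.1 Loading the Data
~~~~~~~~~~~~~~~~~~~~
Load the handwritten digit recognition dataset that the framework provides. We use two datasets here: one for training the model and one for evaluating it.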
.. code:: ipython3
train_dataset = paddle.vision.datasets.MNIST(mode='train', chw_format=False)
val_dataset = paddle.vision.datasets.MNIST(mode='test', chw_format=False)
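3.2 Building the Model
~~~~~~~~~~~~~~~~~~~~~~
Use ``Sequential`` to stack the network layer by layer. Because the ``chw_format`` parameter of the dataset loading interface has already reshaped each image from [1, 28, 28] to [1, 784], no ``Flatten`` operation is needed before the first layer.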
.. code:: ipython3
mnist = paddle.nn.Sequential(
paddle.nn.Linear(784, 512),
paddle.nn.ReLU(),
paddle.nn.Dropout(0.2),
paddle.nn.Linear(512, 10)
)
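As a rough check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes each ``Linear`` layer holds one weight matrix plus one bias vector, and that ``ReLU`` and ``Dropout`` have no parameters; ``linear_param_count`` is a hypothetical helper, not a PaddlePaddle function.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


layers = [(784, 512), (512, 10)]  # the two Linear layers defined above
total = sum(linear_param_count(i, o) for i, o in layers)
print(total)
```

Almost all of the parameters sit in the first layer, which maps the 784 input pixels to 512 hidden units.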
3.3 Training the Model
~~~~~~~~~~~~~~~~~~~~~~
After configuring the loss function and the optimizer needed for training, call the ``fit`` interface to start training.
.. code:: ipython3
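# Enable dynamic graph mode
paddle.disable_static()
# Create a model instance from the network structure for later configuration, training, and validation
model = paddle.Model(mnist)
# Training configuration: set the optimizer, the loss function, and the accuracy metric
model.prepare(paddle.optimizer.Adam(parameters=mnist.parameters()),
paddle.nn.CrossEntropyLoss(),
paddle.metric.Accuracy())
# Start training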
model.fit(train_dataset,
epochs=5,
batch_size=32,
verbose=1)
.. parsed-literal::
Epoch 1/5
step 1875/1875 [==============================] - loss: 0.2571 - acc: 0.9037 - 10ms/step
Epoch 2/5
step 1875/1875 [==============================] - loss: 0.1880 - acc: 0.9458 - 14ms/step
Epoch 3/5
step 1875/1875 [==============================] - loss: 0.0279 - acc: 0.9549 - 11ms/step
Epoch 4/5
step 1875/1875 [==============================] - loss: 0.0505 - acc: 0.9608 - 13ms/step
Epoch 5/5
step 1875/1875 [==============================] - loss: 0.2253 - acc: 0.9646 - 12ms/step
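The ``1875`` in each progress line follows from the dataset size and the batch size. A minimal sketch of that arithmetic, assuming the standard MNIST training split of 60000 images:

```python
import math

num_train_samples = 60000  # size of the MNIST training split
batch_size = 32            # value passed to model.fit above
steps_per_epoch = math.ceil(num_train_samples / batch_size)
print(steps_per_epoch)
```

Changing ``batch_size`` changes the number of steps per epoch accordingly; for example, doubling it halves the step count.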
3.4 Evaluating the Model
~~~~~~~~~~~~~~~~~~~~~~~~
Evaluate the model with the parameters we have just trained to see how accurate it is.
.. code:: ipython3
model.evaluate(val_dataset, verbose=0)
.. parsed-literal::
{'loss': [3.576278e-07], 'acc': 0.9666}
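The initially trained model reaches an accuracy of about 97%. Tuning the training parameters can improve it further.
You now know how to complete a deep learning task quickly with a few simple PaddlePaddle APIs. You can adapt the code to your own needs: to use your own dataset, replace the dataset loading part; to use a different model, change the model code; and so on. We also provide sample code for many other scenarios that shows how to use the PaddlePaddle API. Follow the links below or use the page navigation to find the parts that interest you.
TODO: add quick links to the other example tutorials.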
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Hello Paddle: From Ordinary Programs to Machine Learning Programs"
]
},
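{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example introduces the difference between ordinary programs and machine learning programs, and guides you through writing your first machine learning program with the PaddlePaddle framework."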
]
},
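{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Logical Difference Between Ordinary and Machine Learning Programs\n",
"\n",
"As a developer, the way you are most used to starting with a new programming language or deep learning framework is probably a hello, world program.\n",
"\n",
"Learning PaddlePaddle can start the same way. This short tutorial uses a very simple example to show you how to get started with PaddlePaddle.\n",
"\n",
"The biggest difference between a machine learning program and an ordinary program is this: an ordinary program is given the input and told the rules for processing it, and then produces the result. A machine learning program does not know the rules in advance and lets the machine **learn** them from data.\n",
"\n",
"As a warm-up, let us first look at what an ordinary program does.\n",
"\n",
"Suppose we face the following task:\n",
"\n",
"A taxi ride has a base fare of 10 yuan, charged as soon as the passenger gets in. Each kilometer travelled costs an additional 2 yuan. When the ride ends, the meter must compute the total fare the passenger owes.\n",
"\n",
"Implemented in Python, it looks like this:"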
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12.0\n",
"16.0\n",
"20.0\n",
"28.0\n",
"30.0\n",
"50.0\n"
]
}
],
"source": [
"def calculate_fee(distance_travelled):\n",
"    return 10 + 2 * distance_travelled\n",
"\n",
"for x in [1.0, 3.0, 5.0, 9.0, 10.0, 20.0]:\n",
"    print(calculate_fee(x))"
]
},
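{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let us change the problem slightly. We know how many kilometers each passenger travelled and the total fare each passenger paid the driver, but we do not know the base fare or the cost per kilometer. We want the machine to learn the rule for computing the total fare from this data.\n",
"\n",
"More specifically, we want the machine learning program to learn the parameters `w` and `b` in the formula below from data. This is a very simple example, so `w` and `b` are both floating-point numbers. As you learn more about deep learning, you will see that `w` and `b` are usually a matrix and a vector. Then, the next time we know the distance `distance_travelled` of a ride, we can estimate the total fare `total_fee`.\n",
"\n",
"```\n",
"total_fee = w * distance_travelled + b\n",
"```\n",
"\n",
"Next, let us see how to implement this hello, world level machine learning program with PaddlePaddle."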
]
},
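{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Import PaddlePaddle\n",
"\n",
"To use PaddlePaddle, we first import `paddle` with Python's `import` statement.\n",
"To make array computation and processing easier, we also need to import `numpy`.\n",
"\n",
"If you are running this notebook on your own machine and have not installed PaddlePaddle yet, see the [PaddlePaddle website](https://www.paddlepaddle.org.cn/) for installation instructions. Please use PaddlePaddle 2.0 beta or later."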
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"paddle version 0.0.0\n"
]
}
],
"source": [
"import paddle\n",
"paddle.disable_static()\n",
"print(\"paddle version \" + paddle.__version__)"
]
},
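{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Prepare the Data\n",
"\n",
"In this machine learning task we already know each passenger's distance `distance_travelled` and the corresponding total fare `total_fee`.\n",
"In machine learning, an input value such as `distance_travelled` is usually called `x` (or a feature), and an output value such as `total_fee` is usually called `y` (or a label).\n",
"\n",
"We use `paddle.to_tensor` to convert the sample data into PaddlePaddle Tensors."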
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"x_data = paddle.to_tensor([[1.], [3.0], [5.0], [9.0], [10.0], [20.0]])\n",
"y_data = paddle.to_tensor([[12.], [16.0], [20.0], [28.0], [30.0], [50.0]])"
]
},
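{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Define the Model Computation with PaddlePaddle\n",
"\n",
"Defining a model's computation with PaddlePaddle essentially means using Python and the PaddlePaddle API to tell PaddlePaddle our computation rule. To recap, we want to use machine learning to learn `w` and `b` in the following formula from data, so that in the future we can estimate `y` for a given `x` (the estimate is written `y_predict`).\n",
"\n",
"```\n",
"y_predict = w * x + b\n",
"```\n",
"\n",
"We will implement this computation with PaddlePaddle's linear transformation layer, `paddle.nn.Linear`. The variables `x, y, w, b, y_predict` in the formula correspond to the [Tensor concept](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/tensor.html) in PaddlePaddle.\n",
"\n",
"### A Brief Aside\n",
"\n",
"In this example we know from experience that `distance_travelled` and `total_fee` are linearly related. In more realistic problems the relationship between `x` and `y` is usually non-linear, which calls for more varied and more complex neural networks. (For example, BMI is not linearly related to your height, and the value of a single pixel is not linearly related to whether the picture shows a cat or a dog.)\n"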
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"linear = paddle.nn.Linear(in_features=1, out_features=1)"
]
},
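{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get Ready to Run PaddlePaddle\n",
"\n",
"At the beginning the machine guesses `w` and `b` at random, so let us first see how good the guess is. You should see that `w` is a random value and `b` is 0.0. This is PaddlePaddle's initialization strategy and also a common one in this field. (You can use other initialization methods if you like. Later you will see that choosing the initialization strategy is an important part of doing deep learning well.)"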
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"w before optimize: -1.7107375860214233\n",
"b before optimize: 0.0\n"
]
}
],
"source": [
"w_before_opt = linear.weight.numpy().item()\n",
"b_before_opt = linear.bias.numpy().item()\n",
"\n",
"print(\"w before optimize: {}\".format(w_before_opt))\n",
"print(\"b before optimize: {}\".format(b_before_opt))"
]
},
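{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tell PaddlePaddle How to Learn\n",
"\n",
"We have defined the neural network (albeit the simplest possible one). We still need to tell PaddlePaddle how to **learn** so that it can find the parameters `w` and `b`.\n",
"\n",
"A brief description should make the process roughly clear, even though the theory behind it takes time to learn. In machine learning and deep learning, the machine starts by guessing `w` and `b` at random. Predictions computed from these guesses, `y_predict`, inevitably show a **gap** from the actual `y`. The machine then **adjusts `w` and `b`** according to this gap. With each small adjustment, `w` and `b` become more accurate and the gap between `y_predict` and `y` shrinks, until usable values of `w` and `b` are obtained. This process is how the machine **learns**.\n",
"\n",
"In more technical terms, the function (a formula) that measures the **gap** is the loss function, and the method used to **adjust** the parameters is the optimization algorithm.\n",
"\n",
"In this example we use the simplest loss function, mean squared error (`paddle.nn.MSELoss`), and the most common optimization algorithm, SGD (stochastic gradient descent). The `learning_rate` argument passed to `paddle.optimizer.SGD` can be understood as controlling the size of each adjustment step."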
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"mse_loss = paddle.nn.MSELoss()\n",
"sgd_optimizer = paddle.optimizer.SGD(learning_rate=0.001, parameters = linear.parameters())"
]
},
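{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run the Optimization Algorithm\n",
"\n",
"Next we let PaddlePaddle run the optimization algorithm. This is the step-by-step parameter adjustment described above, and you should see the `loss` value (which measures the gap between `y` and `y_predict`) decrease steadily."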
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 0 loss [2107.3943]\n",
"epoch 1000 loss [7.8432994]\n",
"epoch 2000 loss [1.7537074]\n",
"epoch 3000 loss [0.39211753]\n",
"epoch 4000 loss [0.08767726]\n",
"finished training, loss [0.01963376]\n"
]
}
],
"source": [
"total_epoch = 5000\n",
"for i in range(total_epoch):\n",
"    y_predict = linear(x_data)\n",
"    loss = mse_loss(y_predict, y_data)\n",
"    loss.backward()\n",
"    sgd_optimizer.minimize(loss)\n",
"    linear.clear_gradients()\n",
"\n",
"    if i%1000 == 0:\n",
"        print(\"epoch {} loss {}\".format(i, loss.numpy()))\n",
"\n",
"print(\"finished training, loss {}\".format(loss.numpy()))"
]
},
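{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Learned Parameters\n",
"\n",
"After adjusting (**learning**) the parameters `w` and `b` in this way, we use the program below to see what they have become. You should find that `w` is now very close to 2.0 and `b` is close to 10.0. They are not exactly 2 and 10, but they are reasonably good model parameters learned from data and can be used to make estimates in the future. (If you like, you can let the machine learn for longer to get values even closer to 2.0 and 10.0.)"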
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"w after optimize: 2.017843246459961\n",
"b after optimize: 9.771851539611816\n"
]
}
],
"source": [
"w_after_opt = linear.weight.numpy().item()\n",
"b_after_opt = linear.bias.numpy().item()\n",
"\n",
"print(\"w after optimize: {}\".format(w_after_opt))\n",
"print(\"b after optimize: {}\".format(b_after_opt))\n"
]
},
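{
"cell_type": "markdown",
"metadata": {},
"source": [
"# hello paddle\n",
"\n",
"We hope this small example has given you a first impression of PaddlePaddle, and that you can go on to solve real problems as you learn more about it."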
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hello paddle\n"
]
}
],
"source": [
"print(\"hello paddle\")"
]
}
],
"metadata": {
"colab": {
"name": "hello-paddle.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
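Hello Paddle: From Ordinary Programs to Machine Learning Programs
=================================================================
This example introduces the difference between ordinary programs and machine learning programs, and guides you through writing your first machine learning program with the PaddlePaddle framework.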
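The Logical Difference Between Ordinary and Machine Learning Programs
---------------------------------------------------------------------
As a developer, the way you are most used to starting with a new programming language or deep learning framework is probably a hello, world program.
Learning PaddlePaddle can start the same way. This short tutorial uses a very simple example to show you how to get started with PaddlePaddle.
The biggest difference between a machine learning program and an ordinary program is this: an ordinary program is given the input and told the rules for processing it, and then produces the result. A machine learning program does not know the rules in advance and lets the machine **learn** them from data.
As a warm-up, let us first look at what an ordinary program does.
Suppose we face the following task:
A taxi ride has a base fare of 10 yuan, charged as soon as the passenger gets in. Each kilometer travelled costs an additional 2 yuan. When the ride ends, the meter must compute the total fare the passenger owes.
Implemented in Python, it looks like this: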
.. code:: ipython3

    def calculate_fee(distance_travelled):
        return 10 + 2 * distance_travelled

    for x in [1.0, 3.0, 5.0, 9.0, 10.0, 20.0]:
        print(calculate_fee(x))

.. parsed-literal::
12.0
16.0
20.0
28.0
30.0
50.0
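Now let us change the problem slightly. We know how many kilometers each passenger travelled and the total fare each passenger paid the driver, but we do not know the base fare or the cost per kilometer. We want the machine to learn the rule for computing the total fare from this data.
More specifically, we want the machine learning program to learn the parameters ``w`` and ``b`` in the formula below from data. This is a very simple example, so ``w`` and ``b`` are both floating-point numbers. As you learn more about deep learning, you will see that ``w`` and ``b`` are usually a matrix and a vector. Then, the next time we know the distance ``distance_travelled`` of a ride, we can estimate the total fare ``total_fee``.
::
total_fee = w * distance_travelled + b
Next, let us see how to implement this hello, world level machine learning program with PaddlePaddle.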
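Import PaddlePaddle
-------------------
To use PaddlePaddle, we first import ``paddle`` with Python's ``import`` statement.
To make array computation and processing easier, we also need to import ``numpy``.
If you are running this notebook on your own machine and have not installed PaddlePaddle yet, see the `PaddlePaddle website <https://www.paddlepaddle.org.cn/>`__ for installation instructions. Please use PaddlePaddle 2.0 beta or later.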
.. code:: ipython3
import paddle
paddle.disable_static()
print("paddle version " + paddle.__version__)
.. parsed-literal::
paddle version 0.0.0
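Prepare the Data
----------------
In this machine learning task we already know each passenger's distance ``distance_travelled`` and the corresponding total fare ``total_fee``.
In machine learning, an input value such as ``distance_travelled`` is usually called ``x`` (or a feature), and an output value such as ``total_fee`` is usually called ``y`` (or a label).
We use ``paddle.to_tensor`` to convert the sample data into PaddlePaddle Tensors.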
.. code:: ipython3
x_data = paddle.to_tensor([[1.], [3.0], [5.0], [9.0], [10.0], [20.0]])
y_data = paddle.to_tensor([[12.], [16.0], [20.0], [28.0], [30.0], [50.0]])
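Define the Model Computation with PaddlePaddle
----------------------------------------------
Defining a model's computation with PaddlePaddle essentially means using Python and the PaddlePaddle API to tell PaddlePaddle our computation rule. To recap, we want to use machine learning to learn ``w`` and ``b`` in the following formula from data, so that in the future we can estimate ``y`` for a given ``x`` (the estimate is written ``y_predict``).
::
y_predict = w * x + b
We will implement this computation with PaddlePaddle's linear transformation layer, ``paddle.nn.Linear``. The variables ``x, y, w, b, y_predict`` in the formula correspond to the `Tensor concept <https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/tensor.html>`__ in PaddlePaddle.
A Brief Aside
~~~~~~~~~~~~~
In this example we know from experience that ``distance_travelled`` and ``total_fee`` are linearly related. In more realistic problems the relationship between ``x`` and ``y`` is usually non-linear, which calls for more varied and more complex neural networks. (For example, BMI is not linearly related to your height, and the value of a single pixel is not linearly related to whether the picture shows a cat or a dog.)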
.. code:: ipython3
linear = paddle.nn.Linear(in_features=1, out_features=1)
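Get Ready to Run PaddlePaddle
-----------------------------
At the beginning the machine guesses ``w`` and ``b`` at random, so let us first see how good the guess is. You should see that ``w`` is a random value and ``b`` is 0.0. This is PaddlePaddle's initialization strategy and also a common one in this field. (You can use other initialization methods if you like. Later you will see that choosing the initialization strategy is an important part of doing deep learning well.)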
.. code:: ipython3
w_before_opt = linear.weight.numpy().item()
b_before_opt = linear.bias.numpy().item()
print("w before optimize: {}".format(w_before_opt))
print("b before optimize: {}".format(b_before_opt))
.. parsed-literal::
w before optimize: -1.7107375860214233
b before optimize: 0.0
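Tell PaddlePaddle How to Learn
------------------------------
We have defined the neural network (albeit the simplest possible one). We still need to tell PaddlePaddle how to **learn** so that it can find the parameters ``w`` and ``b``.
A brief description should make the process roughly clear, even though the theory behind it takes time to learn. In machine learning and deep learning, the machine starts by guessing ``w`` and ``b`` at random. Predictions computed from these guesses, ``y_predict``, inevitably show a **gap** from the actual ``y``. The machine then **adjusts** ``w`` and ``b`` according to this gap. With each small adjustment, ``w`` and ``b`` become more accurate and the gap between ``y_predict`` and ``y`` shrinks, until usable values of ``w`` and ``b`` are obtained. This process is how the machine **learns**.
In more technical terms, the function (a formula) that measures the **gap** is the loss function, and the method used to **adjust** the parameters is the optimization algorithm.
In this example we use the simplest loss function, mean squared error (``paddle.nn.MSELoss``), and the most common optimization algorithm, SGD (stochastic gradient descent). The ``learning_rate`` argument passed to ``paddle.optimizer.SGD`` can be understood as controlling the size of each adjustment step.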
.. code:: ipython3
mse_loss = paddle.nn.MSELoss()
sgd_optimizer = paddle.optimizer.SGD(learning_rate=0.001, parameters = linear.parameters())
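To show why the step size matters, the sketch below performs a single gradient descent update by hand on the taxi data, once with the learning rate used above and once with a deliberately larger one. It is written in plain Python under the assumption that both parameters start at 0.0 (PaddlePaddle draws ``w`` at random); the helpers ``mse`` and ``sgd_step`` and the value 0.02 are illustrative, not part of the tutorial.

```python
xs = [1.0, 3.0, 5.0, 9.0, 10.0, 20.0]
ys = [12.0, 16.0, 20.0, 28.0, 30.0, 50.0]


def mse(w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_step(w, b, lr):
    """One full-batch gradient descent update of w and b."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    return w - lr * grad_w, b - lr * grad_b


loss_start = mse(0.0, 0.0)
loss_small_step = mse(*sgd_step(0.0, 0.0, lr=0.001))
loss_large_step = mse(*sgd_step(0.0, 0.0, lr=0.02))
print(loss_start, loss_small_step, loss_large_step)
```

With the small learning rate the loss drops after one update, while the larger step overshoots and makes the loss worse, which is why a conservative value such as 0.001 is used here.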
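Run the Optimization Algorithm
------------------------------
Next we let PaddlePaddle run the optimization algorithm. This is the step-by-step parameter adjustment described above, and you should see the ``loss`` value (which measures the gap between ``y`` and ``y_predict``) decrease steadily.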
.. code:: ipython3

    total_epoch = 5000
    for i in range(total_epoch):
        y_predict = linear(x_data)
        loss = mse_loss(y_predict, y_data)
        loss.backward()
        sgd_optimizer.minimize(loss)
        linear.clear_gradients()

        if i%1000 == 0:
            print("epoch {} loss {}".format(i, loss.numpy()))

    print("finished training, loss {}".format(loss.numpy()))

.. parsed-literal::
epoch 0 loss [2107.3943]
epoch 1000 loss [7.8432994]
epoch 2000 loss [1.7537074]
epoch 3000 loss [0.39211753]
epoch 4000 loss [0.08767726]
finished training, loss [0.01963376]
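For readers who want to see what the optimizer does on each pass, the same training loop can be written out in plain Python. This is a sketch under stated assumptions: both parameters start at 0.0 rather than at PaddlePaddle's random initial weight, and the gradients of the mean squared error are computed by hand, so the printed numbers will differ somewhat from the output above.

```python
xs = [1.0, 3.0, 5.0, 9.0, 10.0, 20.0]
ys = [12.0, 16.0, 20.0, 28.0, 30.0, 50.0]
n = len(xs)

w, b = 0.0, 0.0        # assumed starting point
learning_rate = 0.001  # same value as passed to the SGD optimizer above

for epoch in range(5000):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors) / n
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2.0 / n * sum(errors)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    if epoch % 1000 == 0:
        print("epoch {} loss {:.4f}".format(epoch, loss))

print("w = {:.3f}, b = {:.3f}".format(w, b))
```

The slope ``w`` settles quickly, while the intercept ``b`` approaches 10 more slowly, which matches the behaviour seen in the PaddlePaddle run.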
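The Learned Parameters
----------------------
After adjusting (**learning**) the parameters ``w`` and ``b`` in this way, we use the program below to see what they have become. You should find that ``w`` is now very close to 2.0 and ``b`` is close to 10.0. They are not exactly 2 and 10, but they are reasonably good model parameters learned from data and can be used to make estimates in the future. (If you like, you can let the machine learn for longer to get values even closer to 2.0 and 10.0.)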
.. code:: ipython3
w_after_opt = linear.weight.numpy().item()
b_after_opt = linear.bias.numpy().item()
print("w after optimize: {}".format(w_after_opt))
print("b after optimize: {}".format(b_after_opt))
.. parsed-literal::
w after optimize: 2.017843246459961
b after optimize: 9.771851539611816
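To see how usable these learned values are, the sketch below plugs them into the fare formula and compares the result with the true rule from the start of the tutorial. The two distances are arbitrary illustrative choices, and ``predict_fee`` and ``true_fee`` are hypothetical helper names.

```python
# Parameter values copied from the printed output above.
w_learned = 2.017843246459961
b_learned = 9.771851539611816


def predict_fee(distance_travelled):
    """Fare estimated with the learned parameters."""
    return w_learned * distance_travelled + b_learned


def true_fee(distance_travelled):
    """Fare from the known rule: 10 yuan base fare plus 2 yuan per kilometer."""
    return 10 + 2 * distance_travelled


for km in [2.0, 15.0]:
    print(km, predict_fee(km), true_fee(km))
```

The estimates are close to the true fares but not identical, which reflects the small remaining error in ``w`` and ``b``.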
hello paddle
---------------
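We hope this small example has given you a first impression of PaddlePaddle, and that you can go on to solve real problems as you learn more about it.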
.. code:: ipython3
print("hello paddle")
.. parsed-literal::
hello paddle
{
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4-final"
},
"orig_nbformat": 2,
"kernelspec": {
"name": "python37464bitc4da1ac836094043840bff631bedbf7f",
"display_name": "Python 3.7.4 64-bit"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
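{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PaddlePaddle High-Level API Guide\n",
"\n",
"## 1. Introduction\n",
"\n",
"PaddlePaddle 2.0 introduces a new high-level API. It further wraps and upgrades the existing PaddlePaddle API, offering simpler interfaces that make PaddlePaddle easier to learn and use while extending its functionality.\n",
"\n",
"The high-level API is aimed at everyone from deep learning beginners to experienced developers. Beginners can use it to build deep learning projects quickly and simply, and experienced developers can use it to iterate on algorithms quickly.\n",
"\n",
"The high-level API has the following characteristics:\n",
"\n",
"* Easy to learn and use: the high-level API further wraps and optimizes the ordinary dynamic graph API while staying compatible with it. The same implementation takes far less code with the high-level API.\n",
"* Low-code development: a notable feature of the high-level API is that the amount of code users have to write is greatly reduced.\n",
"* Dynamic-to-static conversion: the high-level API supports switching between dynamic and static graphs. Changing a single line lets dynamic graph code train in static graph mode, so users can debug the model in dynamic graph mode and still benefit from more efficient training.\n",
"\n",
"In terms of added functionality and usage, the high-level API brings the following upgrades:\n",
"\n",
"* Upgraded model training: the high-level API provides the `Model` class, and a neural network that inherits from `Model` can be trained with just a few lines of code.\n",
"* New image processing module `transform`: PaddlePaddle adds an image preprocessing module containing dozens of data processing functions, which covers most common data processing and data augmentation methods.\n",
"* Ready-to-use common neural network models: the high-level API integrates commonly used models from computer vision and natural language processing, including but not limited to mobilenet, resnet, yolov3, cyclegan, bert, transformer, and seq2seq. Pretrained weights for these models are released as well, so users can use the models directly or build on them.\n"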
]
},
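{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Install and Use the High-Level API\n",
"\n",
"The high-level API does not need to be installed separately. Once paddlepaddle is installed, `import paddle` gives access to the high-level APIs, such as `paddle.Model`, `paddle.vision` for computer vision, and `paddle.text` for NLP."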
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'0.0.0'"
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"import paddle\n",
"import paddle.vision as vision\n",
"import paddle.text as text\n",
"\n",
"paddle.__version__"
]
},
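{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Contents\n",
"\n",
"This guide covers the following topics:\n",
"\n",
"* Training deep learning tasks with the datasets built into the high-level API.\n",
"* Defining a dataset from custom data, preprocessing it, and training on it.\n",
"* Applying data augmentation interfaces when defining and loading a dataset.\n",
"* Building the model network.\n",
"* Using the high-level APIs for model training.\n",
"* Customizing training with the basic API when the `fit` interface does not meet your needs.\n",
"* Using multiple GPUs to speed up training.\n",
"\n",
"Other end-to-end example tutorials:\n",
"* TBD"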
]
},
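{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Dataset Definition, Loading, and Preprocessing\n",
"\n",
"In a deep learning task the framework computes on numeric data of various types and cannot work directly on raw files such as images and text. The raw data files therefore have to be processed and converted into data that the deep learning task can use.\n",
"\n",
"### 3.1 Using the Built-in Datasets\n",
"\n",
"The high-level API exposes several commonly used datasets as domain APIs under `paddle.vision.datasets`. Let us first look at which datasets are available."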
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "['DatasetFolder',\n 'ImageFolder',\n 'MNIST',\n 'Flowers',\n 'Cifar10',\n 'Cifar100',\n 'VOC2012']"
},
"metadata": {},
"execution_count": 17
}
],
"source": [
"paddle.vision.datasets.__all__"
]
},
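{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we load a handwritten digit recognition dataset and use `mode` to indicate whether it is the training set or the test set. The dataset interface automatically downloads the data from the remote server to the local cache directory `~/.cache/paddle/dataset`."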
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# Training dataset\n",
"train_dataset = vision.datasets.MNIST(mode='train')\n",
"\n",
"# Validation dataset\n",
"val_dataset = vision.datasets.MNIST(mode='test')"
]
},
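{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Custom Datasets\n",
"\n",
"More often we need to define a dataset from data we already have. The example below shows how. PaddlePaddle provides the `paddle.io.Dataset` base class, and a dataset can be defined quickly by inheriting from it."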
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "=============train dataset=============\ntraindata1 label1\ntraindata2 label2\ntraindata3 label3\ntraindata4 label4\n=============evaluation dataset=============\ntestdata1 label1\ntestdata2 label2\ntestdata3 label3\ntestdata4 label4\n"
}
],
"source": [
"from paddle.io import Dataset\n",
"\n",
"\n",
"class MyDataset(Dataset):\n",
"    \"\"\"\n",
"    Step 1: inherit from paddle.io.Dataset.\n",
"    \"\"\"\n",
"    def __init__(self, mode='train'):\n",
"        \"\"\"\n",
"        Step 2: implement the constructor, define how the data is read, and split it into training and test sets.\n",
"        \"\"\"\n",
"        super(MyDataset, self).__init__()\n",
"\n",
"        if mode == 'train':\n",
"            self.data = [\n",
"                ['traindata1', 'label1'],\n",
"                ['traindata2', 'label2'],\n",
"                ['traindata3', 'label3'],\n",
"                ['traindata4', 'label4'],\n",
"            ]\n",
"        else:\n",
"            self.data = [\n",
"                ['testdata1', 'label1'],\n",
"                ['testdata2', 'label2'],\n",
"                ['testdata3', 'label3'],\n",
"                ['testdata4', 'label4'],\n",
"            ]\n",
"\n",
"    def __getitem__(self, index):\n",
"        \"\"\"\n",
"        Step 3: implement __getitem__, which fetches the sample at the given index and returns it as (data, label).\n",
"        \"\"\"\n",
"        data = self.data[index][0]\n",
"        label = self.data[index][1]\n",
"\n",
"        return data, label\n",
"\n",
"    def __len__(self):\n",
"        \"\"\"\n",
"        Step 4: implement __len__, which returns the total number of samples.\n",
"        \"\"\"\n",
"        return len(self.data)\n",
"\n",
"# Try out the dataset defined above\n",
"train_dataset = MyDataset(mode='train')\n",
"val_dataset = MyDataset(mode='test')\n",
"\n",
"print('=============train dataset=============')\n",
"for data, label in train_dataset:\n",
"    print(data, label)\n",
"\n",
"print('=============evaluation dataset=============')\n",
"for data, label in val_dataset:\n",
"    print(data, label)"
]
},
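{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Data Augmentation\n",
"\n",
"Training sometimes runs into overfitting. One remedy is to augment the training data, that is, to process it into varied images so that the dataset generalizes better. The data augmentation APIs are defined under `transforms` in the domain directory. We introduce two ways of using them: with a built-in dataset and with a custom dataset.\n",
"\n",
"#### 3.3.1 Built-in Datasets"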
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from paddle.vision.transforms import Compose, Resize, ColorJitter\n",
"\n",
"\n",
"# Choose the augmentation methods: random changes to brightness, contrast, and saturation, plus resizing\n",
"transform = Compose([ColorJitter(), Resize(size=100)])\n",
"\n",
"# Pass the augmentation pipeline through the transform argument to apply it to a built-in dataset\n",
"train_dataset = vision.datasets.MNIST(mode='train', transform=transform)"
]
},
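{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.3.2 Custom Datasets\n",
"\n",
"There are two ways to use data augmentation with a custom dataset. One is to define the augmentation methods in the dataset's constructor and apply them to the data returned by `__getitem__`. The other is to expose a constructor parameter on the custom dataset class and pass the augmentation methods in when the class is instantiated."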
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from paddle.io import Dataset\n",
"from paddle.vision.transforms import Compose, Resize, ColorJitter\n",
"\n",
"\n",
"class MyDataset(Dataset):\n",
"    def __init__(self, mode='train'):\n",
"        super(MyDataset, self).__init__()\n",
"\n",
"        if mode == 'train':\n",
"            self.data = [\n",
"                ['traindata1', 'label1'],\n",
"                ['traindata2', 'label2'],\n",
"                ['traindata3', 'label3'],\n",
"                ['traindata4', 'label4'],\n",
"            ]\n",
"        else:\n",
"            self.data = [\n",
"                ['testdata1', 'label1'],\n",
"                ['testdata2', 'label2'],\n",
"                ['testdata3', 'label3'],\n",
"                ['testdata4', 'label4'],\n",
"            ]\n",
"\n",
"        # Define the preprocessing methods to use; these operate on images\n",
"        self.transform = Compose([ColorJitter(), Resize(size=100)])\n",
"\n",
"    def __getitem__(self, index):\n",
"        data = self.data[index][0]\n",
"\n",
"        # Apply the transforms to the training data here\n",
"        # This is only an illustration; to actually run it, replace the placeholder strings with image data\n",
"        data = self.transform(data)\n",
"\n",
"        label = self.data[index][1]\n",
"\n",
"        return data, label\n",
"\n",
"    def __len__(self):\n",
"        return len(self.data)"
]
},
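{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Building the Network\n",
"\n",
"The high-level API and the basic API share the same way of building networks, so there is nothing extra to learn. Here are a few simple examples.\n",
"\n",
"### 4.1 Building with Sequential\n",
"\n",
"For a sequential, linear network structure we can use `Sequential` directly to build the network quickly, which avoids writing class definitions and similar code."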
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# Build the network with Sequential\n",
"mnist = paddle.nn.Sequential(\n",
" paddle.nn.Flatten(),\n",
" paddle.nn.Linear(784, 512),\n",
" paddle.nn.ReLU(),\n",
" paddle.nn.Dropout(0.2),\n",
" paddle.nn.Linear(512, 10)\n",
")"
]
},
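{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2 Building with a Subclass\n",
"For more complex network structures, define the model as a subclass of `Layer`: declare the layers in the `__init__` constructor and use them in `forward` for the forward computation. Subclassing also allows sublayers to be reused, since a layer can be defined once in the constructor and called several times in `forward`."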
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build the network by inheriting from Layer\n",
"class Mnist(paddle.nn.Layer):\n",
"    def __init__(self):\n",
"        super(Mnist, self).__init__()\n",
"\n",
"        self.flatten = paddle.nn.Flatten()\n",
"        self.linear_1 = paddle.nn.Linear(784, 512)\n",
"        self.linear_2 = paddle.nn.Linear(512, 10)\n",
"        self.relu = paddle.nn.ReLU()\n",
"        self.dropout = paddle.nn.Dropout(0.2)\n",
"\n",
"    def forward(self, inputs):\n",
"        y = self.flatten(inputs)\n",
"        y = self.linear_1(y)\n",
"        y = self.relu(y)\n",
"        y = self.dropout(y)\n",
"        y = self.linear_2(y)\n",
"\n",
"        return y\n",
"\n",
"mnist = Mnist()"
]
},
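{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.3 Wrapping the Model\n",
"\n",
"After defining the network structure, we wrap it with `paddle.Model`, which turns the network into a class that can be trained, evaluated, and used for prediction through the high-level API.\n",
"\n",
"There are two scenarios for wrapping: dynamic graph training mode and static graph training mode."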
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Scenario 1: dynamic graph mode\n",
"\n",
"# Enable dynamic graph training mode\n",
"paddle.disable_static()\n",
"# Train on the GPU\n",
"paddle.set_device('gpu')\n",
"# Wrap the network\n",
"model = paddle.Model(mnist)\n",
"\n",
"\n",
"# Scenario 2: static graph mode\n",
"\n",
"# input = paddle.static.InputSpec([None, 1, 28, 28], dtype='float32')\n",
"# label = paddle.static.InputSpec([None, 1], dtype='int64')\n",
"# model = paddle.Model(mnist, input, label)"
]
},
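{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.4 Visualizing the model\n",
"\n",
"After building a network it is usually worth visualizing it and checking the parameters layer by layer against what was intended. The `Model.summary` interface displays this information.\n"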
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.summary((1, 28, 28))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summary interface can be used in two ways. Besides `Model.summary`, which goes with a network wrapped by `paddle.Model`, there is a variant for networks that have not been wrapped: pass an instantiated Layer subclass directly to `paddle.summary`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"paddle.summary(mnist, (1, 28, 28))"
]
},
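{
"cell_type": "markdown",
"metadata": {},
"source": [
"One point is worth noting: some users may wonder why the `input_size` argument `(1, 28, 28)` has to be passed. In dynamic graph mode the shape of the input data is not yet known when the network is defined, so there is nothing from which to derive the structure. Once the interface is told the input shape, it can run the network layer by layer and work out the complete structure to display. In static graph mode the input shape does not need to be passed to the summary interface, because the InputSpec supplied when wrapping the network with Model already contains the shape of the input data."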
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Model Training\n",
"\n",
"Once the network is wrapped with `paddle.Model`, training is concise: calling `Model.fit` runs the whole training process.\n",
"\n",
"Before starting training with `Model.fit`, call `Model.prepare` to configure the run: the optimizer, the loss function, the metrics, and so on.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prepare for training: set the optimizer, the loss function and the accuracy metric\n",
"model.prepare(paddle.optimizer.Adam(parameters=model.parameters()), \n",
" paddle.nn.CrossEntropyLoss(),\n",
" paddle.metric.Accuracy())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the preparation done, call `fit()` to start training. At least three key arguments must be given: the training dataset, the number of epochs, and the batch size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Start training: pass the training dataset, the number of epochs, the batch size and the log format\n",
"model.fit(train_dataset, \n",
" epochs=10, \n",
" batch_size=32,\n",
" verbose=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 Single machine, single GPU\n",
"\n",
"Putting the step-by-step training code above together gives the complete single-machine, single-GPU training program."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Enable dynamic graph training mode\n",
"paddle.disable_static()\n",
"\n",
"# Train on the GPU\n",
"paddle.set_device('gpu')\n",
"\n",
"# Build the Model used for training and tell it which network to train\n",
"model = paddle.Model(mnist)\n",
"\n",
"# Prepare for training: set the optimizer, the loss function and the accuracy metric\n",
"model.prepare(paddle.optimizer.Adam(parameters=model.parameters()), \n",
"              paddle.nn.CrossEntropyLoss(),\n",
"              paddle.metric.Accuracy())\n",
"\n",
"# Start training: pass the training dataset, the number of epochs, the batch size and the log format\n",
"model.fit(train_dataset, \n",
" epochs=10, \n",
" batch_size=32,\n",
" verbose=1)"
]
},
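{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2 Single machine, multiple GPUs\n",
"\n",
"Multi-GPU training on a single machine is very simple with the high-level API: the training code is identical to the single-GPU version. Just start the single-GPU program with `paddle.distributed.launch`."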
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# train.py contains the single-machine, single-GPU code shown above\n",
"!python -m paddle.distributed.launch train.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Model Evaluation\n",
"\n",
"A trained model can be evaluated with the `evaluate` interface."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"result = model.evaluate(val_dataset, verbose=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Model Prediction\n",
"\n",
"The high-level API provides the `predict` interface for running the model on test data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pred_result = model.predict(val_dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Model Deployment\n",
"\n",
"### 8.1 Saving the model\n",
"\n",
"Once training and validation meet expectations, the `save` interface stores the model for later fine-tuning or inference deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save a model for inference deployment (training=False)\n",
"model.save('~/model/mnist', training=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8.2 预测部署\n",
"\n",
"有了用于推理部署的模型,就可以使用推理部署框架来完成预测服务部署,具体可以参见:[预测部署](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/index_cn.html), 包括服务端部署、移动端部署和模型压缩。"
]
}
]
}
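Paddle High-Level API Guide
===========================
1. Introduction
---------------
Paddle 2.0 introduces a new high-level API that further wraps and upgrades the existing Paddle API. It offers a more concise interface, makes Paddle easier to learn and use, and extends its functionality.

The high-level API is aimed at everyone from deep learning beginners to experienced developers: beginners can build a deep learning project quickly and simply, and experienced developers can iterate on algorithms faster.

The high-level API has the following characteristics:

- Easy to learn and use: the high-level API is a further wrapping and optimization of the ordinary dynamic graph API and remains compatible with it. The same implementation takes far less code when written with the high-level API.
- Low-code development: a visible effect of using the high-level API is that the amount of code users have to write shrinks considerably.
- Dynamic-to-static conversion: the high-level API supports switching between dynamic and static graphs. Changing a single line of code is enough to train dynamic graph code in static graph mode, so users can debug in dynamic graph mode and still benefit from more efficient training.

In terms of added functionality and usage, the high-level API brings the following upgrades:

- Upgraded training workflow: the high-level API provides the Model class, and a neural network wrapped in Model can be trained in just a few lines of code.
- New image processing module transform: Paddle adds an image preprocessing module containing dozens of data processing functions, covering most common data processing and augmentation methods.
- Ready-to-use common models: the high-level API bundles commonly used computer vision and natural language processing models, including but not limited to mobilenet, resnet, yolov3, cyclegan, bert, transformer, and seq2seq. Pretrained weights for these models are published as well, so users can use the models directly or build on them.

2. Installing and Using the Paddle High-Level API
-------------------------------------------------
The high-level API needs no separate installation; installing paddlepaddle is enough. After installation, ``import paddle`` gives access to the high-level API, for example paddle.Model, paddle.vision for computer vision, and paddle.text for NLP.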
.. code:: ipython3
import paddle
import paddle.vision as vision
import paddle.text as text
paddle.__version__
.. parsed-literal::
'0.0.0'
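2. Contents
-----------
This guide covers:

- Training deep learning tasks with the datasets built into the high-level API.
- Defining a dataset from custom data, preprocessing it, and training on it.
- Applying data augmentation when defining and loading a dataset.
- Building a model network.
- Using the high-level API's training interfaces.
- Customizing training with the base API when the fit interface does not meet your needs.
- Using multiple GPUs to speed up training.

Other end-to-end example tutorials: TBD

3. Dataset Definition, Loading, and Preprocessing
-------------------------------------------------
Deep learning frameworks compute on numeric data and cannot work directly on raw files such as images and text. The raw data files therefore have to be processed and converted into a form that a deep learning task can consume.

3.1 Using the built-in datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The high-level API exposes a number of commonly used datasets as domain APIs under ``paddle.vision.datasets``. Let us first see which datasets are provided.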
.. code:: ipython3
paddle.vision.datasets.__all__
.. parsed-literal::
['DatasetFolder',
'ImageFolder',
'MNIST',
'Flowers',
'Cifar10',
'Cifar100',
'VOC2012']
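Here we load a handwritten-digit recognition dataset and use ``mode`` to select the training or the test split. The dataset interface automatically downloads the data to the local cache directory ``~/.cache/paddle/dataset``.

.. code:: ipython3

    # Training dataset
    train_dataset = vision.datasets.MNIST(mode='train')
    # Validation dataset
    val_dataset = vision.datasets.MNIST(mode='test')

3.2 Custom datasets
~~~~~~~~~~~~~~~~~~~
More often we need to define a dataset from data we already have. The example below shows how. Paddle provides the ``paddle.io.Dataset`` base class, and a dataset can be defined quickly by subclassing it.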
.. code:: ipython3

    from paddle.io import Dataset


    class MyDataset(Dataset):
        """
        Step 1: inherit from paddle.io.Dataset
        """
        def __init__(self, mode='train'):
            """
            Step 2: implement the constructor, define how the data is read,
            and split it into training and test sets
            """
            super(MyDataset, self).__init__()

            if mode == 'train':
                self.data = [
                    ['traindata1', 'label1'],
                    ['traindata2', 'label2'],
                    ['traindata3', 'label3'],
                    ['traindata4', 'label4'],
                ]
            else:
                self.data = [
                    ['testdata1', 'label1'],
                    ['testdata2', 'label2'],
                    ['testdata3', 'label3'],
                    ['testdata4', 'label4'],
                ]

        def __getitem__(self, index):
            """
            Step 3: implement __getitem__, which defines how the sample at a given
            index is fetched and returns a single sample (training data, label)
            """
            data = self.data[index][0]
            label = self.data[index][1]

            return data, label

        def __len__(self):
            """
            Step 4: implement __len__, which returns the total number of samples
            """
            return len(self.data)

    # Try out the dataset defined above
    train_dataset = MyDataset(mode='train')
    val_dataset = MyDataset(mode='test')

    print('=============train dataset=============')
    for data, label in train_dataset:
        print(data, label)

    print('=============evaluation dataset=============')
    for data, label in val_dataset:
        print(data, label)
.. parsed-literal::
=============train dataset=============
traindata1 label1
traindata2 label2
traindata3 label3
traindata4 label4
=============evaluation dataset=============
testdata1 label1
testdata2 label2
testdata3 label3
testdata4 label4
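3.3 Data augmentation
~~~~~~~~~~~~~~~~~~~~~
Training sometimes suffers from overfitting. One remedy is to augment the training data: transforming the samples yields different images and so makes the dataset more varied. The augmentation APIs live in the ``transforms`` module of the domain package (``paddle.vision.transforms``). Two ways of using them are described here, one for the built-in datasets and one for custom datasets.

3.3.1 Built-in datasets
^^^^^^^^^^^^^^^^^^^^^^^

.. code:: ipython3

    from paddle.vision.transforms import Compose, Resize, ColorJitter

    # Choose the augmentations to apply: random brightness, contrast and saturation changes, then a resize
    transform = Compose([ColorJitter(), Resize(size=100)])
    # Pass the transforms through the transform argument to apply them to a built-in dataset
    train_dataset = vision.datasets.MNIST(mode='train', transform=transform)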
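
To make the role of ``Compose`` concrete, here is a minimal framework-free sketch of the idea: a composed transform simply applies each callable in order and feeds the result to the next one. The class ``SimpleCompose`` and the two toy transforms are hypothetical names made up for illustration; this is not Paddle's implementation.

```python
class SimpleCompose:
    """Apply a list of callables in order, feeding each result to the next.

    Hypothetical stand-in for the idea behind a Compose transform.
    """

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


def scale_to_unit(pixels):
    # Map 0..255 pixel values to the range 0..1
    return [p / 255.0 for p in pixels]


def center(pixels):
    # Shift values so that 0.5 becomes 0
    return [p - 0.5 for p in pixels]


pipeline = SimpleCompose([scale_to_unit, center])
result = pipeline([0, 255])
print(result)
```

The order of the list matters: swapping the two toy transforms would subtract 0.5 before scaling and give a different result.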
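3.3.2 Custom datasets
^^^^^^^^^^^^^^^^^^^^^
There are two ways to apply data augmentation to a custom dataset. One is to define the augmentation in the dataset constructor and apply it to the data returned from ``__getitem__``. The other is to expose a constructor parameter on the dataset class and pass the augmentation in when the class is instantiated.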
.. code:: ipython3

    from paddle.io import Dataset


    class MyDataset(Dataset):
        def __init__(self, mode='train'):
            super(MyDataset, self).__init__()

            if mode == 'train':
                self.data = [
                    ['traindata1', 'label1'],
                    ['traindata2', 'label2'],
                    ['traindata3', 'label3'],
                    ['traindata4', 'label4'],
                ]
            else:
                self.data = [
                    ['testdata1', 'label1'],
                    ['testdata2', 'label2'],
                    ['testdata3', 'label3'],
                    ['testdata4', 'label4'],
                ]

            # Define the preprocessing to use; these operations work on images
            self.transform = Compose([ColorJitter(), Resize(size=100)])

        def __getitem__(self, index):
            data = self.data[index][0]

            # Apply the transform to the training data here.
            # This is only an illustration: to actually run it, replace the
            # placeholder strings with image data.
            data = self.transform(data)

            label = self.data[index][1]

            return data, label

        def __len__(self):
            return len(self.data)
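
The second approach mentioned above, passing the augmentation in through a constructor parameter, could look roughly like the sketch below. It is kept framework-free so that it runs on its own; with Paddle installed the class would subclass ``paddle.io.Dataset``. The class name ``TransformableDataset`` and the use of ``str.upper`` as a stand-in transform are assumptions made purely for illustration.

```python
class TransformableDataset:
    """Map-style dataset that receives its transform as a constructor argument.

    Hypothetical sketch: in a real project this would subclass
    paddle.io.Dataset and the transform would operate on images.
    """

    def __init__(self, samples, transform=None):
        self.samples = list(samples)
        self.transform = transform

    def __getitem__(self, index):
        data, label = self.samples[index]
        # Apply the transform only when one was supplied
        if self.transform is not None:
            data = self.transform(data)
        return data, label

    def __len__(self):
        return len(self.samples)


samples = [('traindata1', 'label1'), ('traindata2', 'label2')]

plain = TransformableDataset(samples)
upper = TransformableDataset(samples, transform=str.upper)

first_plain = plain[0]
first_upper = upper[0]
print(first_plain, first_upper, len(upper))
```

The benefit of this design is that the same dataset class can be instantiated with augmentation for training and without it for evaluation.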
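4. Building the Network
-----------------------
The high-level API uses exactly the same network-building interfaces as the base API, so there is nothing extra to learn. A few simple examples follow.

4.1 Building with Sequential
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For a network whose layers are simply applied one after another, ``paddle.nn.Sequential`` builds the model directly and avoids writing a class definition.

.. code:: ipython3

    # Build the network with Sequential
    mnist = paddle.nn.Sequential(
        paddle.nn.Flatten(),
        paddle.nn.Linear(784, 512),
        paddle.nn.ReLU(),
        paddle.nn.Dropout(0.2),
        paddle.nn.Linear(512, 10)
    )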
4.2 Building by subclassing
~~~~~~~~~~~~~~~~~~~~~~~~~~~
For more complex network structures, define the model as a subclass of Layer: declare the layers in the ``__init__`` constructor and use them in ``forward`` for the forward computation. Subclassing also allows sublayers to be reused: a layer is defined once in the constructor and can be called several times in ``forward``.

.. code:: ipython3

    # Build the network by subclassing Layer
    class Mnist(paddle.nn.Layer):
        def __init__(self):
            super(Mnist, self).__init__()

            self.flatten = paddle.nn.Flatten()
            self.linear_1 = paddle.nn.Linear(784, 512)
            self.linear_2 = paddle.nn.Linear(512, 10)
            self.relu = paddle.nn.ReLU()
            self.dropout = paddle.nn.Dropout(0.2)

        def forward(self, inputs):
            y = self.flatten(inputs)
            y = self.linear_1(y)
            y = self.relu(y)
            y = self.dropout(y)
            y = self.linear_2(y)

            return y

    mnist = Mnist()
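4.3 Wrapping the model
~~~~~~~~~~~~~~~~~~~~~~
Once the network structure is defined, wrap it with ``paddle.Model``. This turns the network into an object that can be trained, evaluated and used for prediction through the high-level API.

There are two scenarios when wrapping: dynamic graph training mode and static graph training mode.

.. code:: ipython3

    # Scenario 1: dynamic graph mode

    # Enable dynamic graph training mode
    paddle.disable_static()
    # Train on the GPU
    paddle.set_device('gpu')
    # Wrap the network
    model = paddle.Model(mnist)


    # Scenario 2: static graph mode

    # input = paddle.static.InputSpec([None, 1, 28, 28], dtype='float32')
    # label = paddle.static.InputSpec([None, 1], dtype='int64')
    # model = paddle.Model(mnist, input, label)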
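4.4 Visualizing the model
~~~~~~~~~~~~~~~~~~~~~~~~~
After building a network it is usually worth visualizing it and checking the parameters layer by layer against what was intended. The ``Model.summary`` interface displays this information.

.. code:: ipython3

    model.summary((1, 28, 28))

The summary interface can be used in two ways. Besides ``Model.summary``, which goes with a network wrapped by ``paddle.Model``, there is a variant for networks that have not been wrapped: pass an instantiated Layer subclass directly to ``paddle.summary``.

.. code:: ipython3

    paddle.summary(mnist, (1, 28, 28))

One point is worth noting: some users may wonder why the ``input_size`` argument ``(1, 28, 28)`` has to be passed. In dynamic graph mode the shape of the input data is not yet known when the network is defined, so there is nothing from which to derive the structure. Once the interface is told the input shape, it can run the network layer by layer and work out the complete structure to display. In static graph mode the input shape does not need to be passed to the summary interface, because the InputSpec supplied when wrapping the network with Model already contains the shape of the input data.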
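
When checking a summary against expectations, it helps to work out the parameter counts by hand first. The sketch below does this for the two ``Linear`` layers of the MNIST network defined above (784 to 512, then 512 to 10), assuming each layer has a weight matrix plus a bias vector. The helper name ``linear_param_count`` is hypothetical, and the exact layout of the real summary output is not reproduced here.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of trainable values in a fully connected layer."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases


# Flatten turns a (1, 28, 28) image into a vector of this length
flattened_size = 1 * 28 * 28

# (in_features, out_features) of the two Linear layers in the network above
layer_shapes = [(784, 512), (512, 10)]
per_layer = [linear_param_count(i, o) for i, o in layer_shapes]
total_params = sum(per_layer)

print(flattened_size, per_layer, total_params)
```

Flatten, ReLU and Dropout hold no trainable parameters, so under these assumptions the two Linear layers account for the whole total.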
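5. Model Training
-----------------
Once the network is wrapped with ``paddle.Model``, training is concise: calling ``Model.fit`` runs the whole training process.

Before starting training with ``Model.fit``, call ``Model.prepare`` to configure the run: the optimizer, the loss function, the metrics, and so on.

.. code:: ipython3

    # Prepare for training: set the optimizer, the loss function and the accuracy metric
    model.prepare(paddle.optimizer.Adam(parameters=model.parameters()),
                  paddle.nn.CrossEntropyLoss(),
                  paddle.metric.Accuracy())

With the preparation done, call ``fit()`` to start training. At least three key arguments must be given: the training dataset, the number of epochs, and the batch size.

.. code:: ipython3

    # Start training: pass the training dataset, the number of epochs, the batch size and the log format
    model.fit(train_dataset,
              epochs=10,
              batch_size=32,
              verbose=1)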
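
The relationship between dataset size, ``batch_size`` and ``epochs`` determines how many optimizer steps a run performs. The small sketch below works this out for the 60,000-sample MNIST training set, assuming the last partial batch is kept rather than dropped. The helper name ``steps_per_epoch`` is hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to see every sample once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


mnist_train_size = 60000

steps_32 = steps_per_epoch(mnist_train_size, 32)
steps_64 = steps_per_epoch(mnist_train_size, 64)
total_steps = steps_32 * 10  # epochs=10 as in the call above

print(steps_32, steps_64, total_steps)
```

With ``batch_size=64`` the result is 938 steps per epoch, which matches the ``step .../938`` progress lines in the model-saving notebook log elsewhere in this changeset.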
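5.1 Single machine, single GPU
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Putting the step-by-step training code above together gives the complete single-machine, single-GPU training program.

.. code:: ipython3

    # Enable dynamic graph training mode
    paddle.disable_static()

    # Train on the GPU
    paddle.set_device('gpu')

    # Build the Model used for training and tell it which network to train
    model = paddle.Model(mnist)

    # Prepare for training: set the optimizer, the loss function and the accuracy metric
    model.prepare(paddle.optimizer.Adam(parameters=model.parameters()),
                  paddle.nn.CrossEntropyLoss(),
                  paddle.metric.Accuracy())

    # Start training: pass the training dataset, the number of epochs, the batch size and the log format
    model.fit(train_dataset,
              epochs=10,
              batch_size=32,
              verbose=1)

5.2 Single machine, multiple GPUs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Multi-GPU training on a single machine is very simple with the high-level API: the training code is identical to the single-GPU version. Just start the single-GPU program with ``paddle.distributed.launch``.

.. code:: ipython3

    # train.py contains the single-machine, single-GPU code shown above
    !python -m paddle.distributed.launch train.py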
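6. Model Evaluation
-------------------
A trained model can be evaluated with the ``evaluate`` interface.

.. code:: ipython3

    result = model.evaluate(val_dataset, verbose=1)

7. Model Prediction
-------------------
The high-level API provides the ``predict`` interface for running the model on test data.

.. code:: ipython3

    pred_result = model.predict(val_dataset)

8. Model Deployment
-------------------
8.1 Saving the model
~~~~~~~~~~~~~~~~~~~~
Once training and validation meet expectations, the ``save`` interface stores the model for later fine-tuning or inference deployment.

.. code:: ipython3

    # Save a model for inference deployment (training=False)
    model.save('~/model/mnist', training=False)

8.2 Inference deployment
~~~~~~~~~~~~~~~~~~~~~~~~
With a model saved for inference, an inference deployment framework can be used to deploy a prediction service. See `Inference Deployment <https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/index_cn.html>`__ for details, which covers server-side deployment, mobile deployment, and model compression.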
################
Quick Start
################
PaddlePaddle provides a few simple examples here to help you get started with Paddle 2.0 quickly:

- `hello paddle <./hello_paddle/hello_paddle.html>`_ : a brief introduction to Paddle and your first Paddle project.
- `Paddle dynamic graph <./dynamic_graph/dynamic_graph.html>`_ : how to use the Paddle dynamic graph.
- `High-level API quick start <./getting_started/getting_started.html>`_ : an introduction to the Paddle high-level API for building models quickly.
- `High-level API in detail <./high_level_api/high_level_api.html>`_ : a detailed guide to the Paddle high-level API.
- `Saving and loading models <./save_model/save_model.html>`_ : how to save and load Paddle models.
- `Linear regression <./linear_regression/linear_regression.html>`_ : how to implement a linear regression task with Paddle.

.. toctree::
    :hidden:
    :titlesonly:

    hello_paddle/hello_paddle.rst
    dynamic_graph/dynamic_graph.rst
    getting_started/getting_started.rst
    high_level_api/high_level_api.rst
    save_model/save_model.rst
    linear_regression/linear_regression.rst
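Linear Regression
=================
NOTE:
This example tutorial is still under development and is currently based on the 2.0 beta version (because 2.0 beta has not been officially released, Paddle is installed from the latest develop whl package).

Brief introduction
------------------
The classic linear regression model is mainly used to make predictions on datasets in which a linear relationship exists. A regression model can be understood as follows: given a set of points, fit a curve to their distribution. If the fitted curve is a straight line, the method is called linear regression; if it is a quadratic curve, it is called quadratic regression. Linear regression is the simplest kind of regression model.

This example briefly shows how to predict Boston house prices with the open-source Paddle framework. The idea is to assume that the relationship between the house attributes in the uci-housing dataset and the house price can be described by a linear combination of the attributes. During training, the error between the predictions made under this assumption and the true values is made smaller and smaller. During prediction, the predictor reads the trained model and predicts prices for house attributes it has never seen.

Environment setup
-----------------
This example is based on version 2.0 of the open-source Paddle framework.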
.. code:: ipython3
import paddle
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
paddle.__version__
.. parsed-literal::
'0.0.0'
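Dataset
-------
This example uses the uci-housing dataset, a classic dataset for linear regression. The dataset has 506 rows of 14 columns each. The first 13 columns describe various attributes of the houses, and the last column is the median price of that type of house. Paddle provides interfaces for reading the uci_housing training and test sets: paddle.dataset.uci_housing.train() and paddle.dataset.uci_housing.test().

The figure below lists the 13 columns that describe the house attributes:

.. figure:: https://ai-studio-static-online.cdn.bcebos.com/c19602ce74284e3b9a50422f8dc37c0c1c79cf5cd8424994b6a6b073dcb7c057
   :alt: The 13 attribute columns of the uci-housing dataset

Let us take a look at what the data looks like: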
.. code:: ipython3
import matplotlib.pyplot as plt
import matplotlib
train_data=paddle.dataset.uci_housing.train()
sample_data=next(train_data())
print(sample_data[0])
# Plot the pairwise relationships between the features (linear or non-linear, and whether any pair is clearly correlated)
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE','DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
feature_num = len(feature_names)
features_np=np.array([x[0] for x in train_data()],np.float32)
labels_np=np.array([x[1] for x in train_data()],np.float32)
data_np=np.c_[features_np,labels_np]
df=pd.DataFrame(data_np,columns=feature_names)
matplotlib.use('TkAgg')
%matplotlib inline
sns.pairplot(df.dropna())
plt.show()
.. parsed-literal::
[-0.0405441 0.06636364 -0.32356227 -0.06916996 -0.03435197 0.05563625
-0.03475696 0.02682186 -0.37171335 -0.21419304 -0.33569506 0.10143217
-0.21172912]
.. image:: linear_regression_files/linear_regression_6_1.png
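In the figure above, the diagonal shows the histogram of each attribute, and the off-diagonal panels show the scatter plots between pairs of attributes.

The figure shows, for example, that RM (average number of rooms per dwelling) and LSTAT (percentage of lower-income population) are clearly correlated with the house price, and that NOX (nitric oxides concentration) and DIS (distance to the Boston employment centers) are clearly correlated with each other.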
.. code:: ipython3
# Correlation analysis
fig, ax = plt.subplots(figsize=(15,15))
ax=sns.heatmap(df.corr(), cbar=True, annot=True)
ax.set_ylim([14, 0])
plt.show()
.. image:: linear_regression_files/linear_regression_8_0.png
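
Each cell of a correlation heatmap like the one above holds a Pearson correlation coefficient between two columns. As a minimal sketch of what that number means, the code below computes it by hand in plain Python. The three short columns are made-up toy values chosen for illustration; they are not taken from the uci-housing data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy columns (hypothetical values): price rises with rooms, falls with poverty
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 17.0, 22.0, 27.0, 32.0]
poverty = [30.0, 24.0, 18.0, 12.0, 6.0]

r_pos = pearson(rooms, price)
r_neg = pearson(poverty, price)
print(r_pos, r_neg)
```

A value near +1 means the two columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.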
**Data normalization**

The figure below shows the range and distribution of values for each attribute:
.. code:: ipython3
sns.boxplot(data=df.iloc[:,0:13])
.. parsed-literal::
<matplotlib.axes._subplots.AxesSubplot at 0x1a3adcb410>
.. image:: linear_regression_files/linear_regression_11_1.png
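There are at least three reasons to normalize the data (feature scaling):

- Value ranges that are too large or too small can cause floating-point overflow or underflow during computation.
- Different value ranges make different attributes carry different weight in the model (at least in the early stage of training), and this implicit assumption is often unjustified. It makes optimization harder and training much longer.
- Many machine learning techniques and models (for example L1 and L2 regularization, or the Vector Space Model) assume that all attributes have roughly zero mean and similar value ranges.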
.. code:: ipython3

    # Read the training features once and compute the per-feature statistics
    train_features = np.array([x[0] for x in train_data()], np.float32)
    features_max = train_features.max(axis=0)
    features_min = train_features.min(axis=0)
    # Mean over the samples actually yielded by the training reader
    features_avg = train_features.mean(axis=0)
.. code:: ipython3

    BATCH_SIZE=20

    def feature_norm(input):
        f_size=input.shape[0]
        output_features=np.zeros((f_size,13),np.float32)
        for batch_id in range(f_size):
            for index in range(13):
                output_features[batch_id][index]=(input[batch_id][index]-features_avg[index])/(features_max[index]-features_min[index])
        return output_features
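
The normalization applied by ``feature_norm`` is, for each feature, (value - mean) / (max - min). The plain-Python sketch below applies the same formula to a single made-up column so that the effect is easy to see. The helper name ``mean_range_normalize`` and the sample values are hypothetical.

```python
def mean_range_normalize(column):
    """Scale a column with (value - mean) / (max - min)."""
    low, high = min(column), max(column)
    average = sum(column) / len(column)
    return [(value - average) / (high - low) for value in column]


# Made-up values standing in for one feature column
tax = [200.0, 300.0, 400.0, 700.0]
normalized = mean_range_normalize(tax)
print(normalized)
```

After this scaling the column is centered on zero and its values span a range of exactly one, which is the property the three reasons listed earlier rely on.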
The ``draw_train_process`` function below plots how the loss changes during training.

.. code:: ipython3

    global iter
    iter=0
    iters=[]
    train_costs=[]

    def draw_train_process(iters,train_costs):
        plt.title("training cost" ,fontsize=24)
        plt.xlabel("iter", fontsize=14)
        plt.ylabel("cost", fontsize=14)
        plt.plot(iters, train_costs,color='red',label='training cost')
        plt.show()
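**Data providers**

Next we define the data providers used for training and testing. A provider reads one batch of BATCH_SIZE samples at a time. To add some randomness, a provider can be given both a batch size and a buffer size; it then fills the buffer, shuffles it, and draws each batch of samples from the shuffled buffer.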
.. code:: ipython3
BATCH_SIZE=20
BUF_SIZE=500
train_reader=paddle.batch(paddle.reader.shuffle(paddle.dataset.uci_housing.train(),buf_size=BUF_SIZE),batch_size=BATCH_SIZE)
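
The combination of a shuffle buffer and batching described above can be sketched without any framework. The code below is an illustration of the idea only, not Paddle's implementation: ``shuffle_reader``, ``batch_reader`` and ``toy_reader`` are hypothetical names, and the fixed seed is there just to make the run repeatable.

```python
import random


def shuffle_reader(reader, buf_size, seed=None):
    """Wrap a reader so samples are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled():
        buffer = []
        for sample in reader():
            buffer.append(sample)
            if len(buffer) >= buf_size:
                rng.shuffle(buffer)
                yield from buffer
                buffer = []
        # Flush whatever is left in the final, partially filled buffer
        rng.shuffle(buffer)
        yield from buffer

    return shuffled


def batch_reader(reader, batch_size):
    """Wrap a reader so it yields lists of batch_size samples."""

    def batched():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:
            yield batch

    return batched


def toy_reader():
    # Stand-in for a dataset reader: yields the samples 0..9
    yield from range(10)


train_batches = batch_reader(shuffle_reader(toy_reader, buf_size=4, seed=0), batch_size=3)
batches = list(train_batches())
print(batches)
```

A larger buffer gives a more thorough shuffle at the cost of memory; a buffer at least as large as the dataset shuffles it completely.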
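Model configuration
-------------------
Linear regression is simply a single fully connected layer from input to output.

For the Boston house price dataset, we assume that the relationship between the attributes and the price can be described by a linear combination of the attributes.

.. code:: ipython3

    class Regressor(paddle.nn.Layer):
        def __init__(self):
            super(Regressor,self).__init__()
            self.fc=paddle.nn.Linear(13,1,None)

        def forward(self,inputs):
            pred=self.fc(inputs)
            return pred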
Model training
--------------
The code below trains the model.

The loss function is mean squared error (MSE), the most common choice for linear regression; it measures the difference between the predicted and the true house prices.

The loss is minimized with gradient descent.

.. code:: ipython3

    y_preds=[]
    labels_list=[]

    def train(model):
        print('start training ... ')
        model.train()
        EPOCH_NUM=500
        optimizer=paddle.optimizer.SGD(learning_rate=0.001, parameters = model.parameters())
        iter=0
        for epoch_id in range(EPOCH_NUM):
            train_cost=0
            for batch_id,data in enumerate(train_reader()):
                features_np=np.array([x[0] for x in data],np.float32)
                labels_np=np.array([x[1] for x in data],np.float32)
                features=paddle.to_variable(feature_norm(features_np))
                labels=paddle.to_variable(labels_np)
                # Forward pass
                y_pred=model(features)
                cost=paddle.nn.functional.square_error_cost(y_pred,label=labels)
                avg_cost=paddle.mean(cost)
                train_cost = [avg_cost.numpy()]
                # Backward pass
                avg_cost.backward()
                # Minimize the loss and update the parameters
                opts=optimizer.minimize(avg_cost)
                # Clear the gradients
                model.clear_gradients()
                if batch_id%30==0 and epoch_id%30==0:
                    print("Pass:%d,Cost:%0.5f"%(epoch_id,train_cost[0][0]))

                iter=iter+BATCH_SIZE
                iters.append(iter)
                train_costs.append(train_cost[0][0])

    paddle.disable_static()
    model = Regressor()
    train(model)
.. parsed-literal::
start training ...
Pass:0,Cost:531.75244
Pass:30,Cost:61.10927
Pass:60,Cost:22.68571
Pass:90,Cost:34.80560
Pass:120,Cost:78.28358
Pass:150,Cost:124.95644
Pass:180,Cost:91.88014
Pass:210,Cost:15.23689
Pass:240,Cost:34.86035
Pass:270,Cost:54.76824
Pass:300,Cost:65.88247
Pass:330,Cost:41.25426
Pass:360,Cost:64.10200
Pass:390,Cost:77.11707
Pass:420,Cost:20.80456
Pass:450,Cost:29.80167
Pass:480,Cost:41.59278
.. code:: ipython3
matplotlib.use('TkAgg')
%matplotlib inline
draw_train_process(iters,train_costs)
.. image:: linear_regression_files/linear_regression_23_0.png
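The figure above shows that the loss trends downward as training proceeds. Because each step updates the parameters and computes the loss on only a small number of samples, the loss curve oscillates.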
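
To see mini-batch gradient descent on an MSE loss in isolation, here is a small framework-free sketch that fits a line y = w*x + b to noisy toy data. Everything in it is an assumption made for illustration: the data is synthetic (true slope 3, intercept 1, small uniform noise), and the learning rate, batch size and epoch count are arbitrary choices, not the values used in the tutorial.

```python
import random


def mse(w, b, batch):
    """Mean squared error of the line w*x + b on a batch of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in batch) / len(batch)


def sgd_step(w, b, batch, lr):
    """One gradient descent update on the MSE of a single mini-batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b


rng = random.Random(0)
# Synthetic data: y = 3x + 1 plus a little noise
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0 + rng.uniform(-0.1, 0.1)) for k in range(20)]

w, b = 0.0, 0.0
initial_loss = mse(w, b, data)
batch_losses = []

for epoch in range(200):
    rng.shuffle(data)
    for start in range(0, len(data), 5):
        batch = data[start:start + 5]
        batch_losses.append(mse(w, b, batch))
        w, b = sgd_step(w, b, batch, lr=0.05)

final_loss = mse(w, b, data)
print(round(w, 2), round(b, 2), initial_loss, final_loss)
```

Plotting ``batch_losses`` would show the same qualitative picture as the training curve above: an overall downward trend, with the per-batch values fluctuating because each one is computed on only five samples.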
Model prediction
----------------

.. code:: ipython3

    # Fetch the data to predict on
    INFER_BATCH_SIZE=100
    infer_reader=paddle.batch(paddle.dataset.uci_housing.test(),batch_size=INFER_BATCH_SIZE)
    infer_data = next(infer_reader())
    infer_features_np = np.array([data[0] for data in infer_data]).astype("float32")
    infer_labels_np= np.array([data[1] for data in infer_data]).astype("float32")
    infer_features=paddle.to_variable(feature_norm(infer_features_np))
    infer_labels=paddle.to_variable(infer_labels_np)
    fetch_list=model(infer_features).numpy()
    sum_cost=0
    for i in range(INFER_BATCH_SIZE):
        infer_result=fetch_list[i][0]
        ground_truth=infer_labels.numpy()[i]
        if i%10==0:
            print("No.%d: infer result is %.2f,ground truth is %.2f" % (i, infer_result,ground_truth))
        cost=np.power(infer_result-ground_truth,2)
        sum_cost+=cost
    print("平均误差为:",sum_cost/INFER_BATCH_SIZE)
.. parsed-literal::
No.0: infer result is 12.20,ground truth is 8.50
No.10: infer result is 5.65,ground truth is 7.00
No.20: infer result is 14.87,ground truth is 11.70
No.30: infer result is 16.60,ground truth is 11.70
No.40: infer result is 13.71,ground truth is 10.80
No.50: infer result is 16.11,ground truth is 14.90
No.60: infer result is 18.78,ground truth is 21.40
No.70: infer result is 15.53,ground truth is 13.80
No.80: infer result is 18.10,ground truth is 20.60
No.90: infer result is 21.39,ground truth is 24.50
平均误差为: [12.917107]
.. code:: ipython3

    def plot_pred_ground(pred, ground):
        plt.figure()
        plt.title("Prediction vs. Ground Truth", fontsize=24)
        plt.xlabel("ground truth price (unit: $1000)", fontsize=14)
        plt.ylabel("predicted price", fontsize=14)
        plt.scatter(ground, pred, alpha=0.5)  # scatter plot; alpha sets the transparency
        plt.plot(ground, ground, c='red')
        plt.show()

.. code:: ipython3

    plot_pred_ground(fetch_list, infer_labels_np)

.. image:: linear_regression_files/linear_regression_28_0.png

The figure above shows that the predictions of the trained model are fairly close to the true values.
{
"cells": [
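{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Saving and Loading Models\n",
"This tutorial explains how to save and load model parameters with the Paddle high-level API. Everyday training can be interrupted, deliberately or not, by unexpected events, so while a model is still being trained its parameters should be saved frequently; after an interruption the saved parameters can be loaded quickly and training can continue. Alternatively, the model may already be trained and the parameters are needed for prediction or for deploying the model online. For these cases Paddle provides methods for saving and loading models and supports resuming training from the last saved state: as long as the model state is saved regularly during training, there is no need to retrain from scratch.\n",
"The sections below use a handwritten-digit recognition model to show how Paddle saves and loads a model and resumes training. The network structure itself is not explained."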
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Environment\n",
"This tutorial is written against paddle-develop. If your environment uses a different version, please install the paddle-develop version first."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0.0\n"
]
}
],
"source": [
"import paddle\n",
"import paddle.nn.functional as F\n",
"from paddle.nn import Layer\n",
"from paddle.vision.datasets import MNIST\n",
"from paddle.metric import Accuracy\n",
"from paddle.nn import Conv2d,MaxPool2d,Linear\n",
"from paddle.static import InputSpec\n",
"\n",
"print(paddle.__version__)\n",
"paddle.disable_static()"
]
},
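{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset\n",
"The MNIST dataset of handwritten digits contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in fixed-size images (28x28 pixels) with values from 0 to 1. The official address of the dataset is http://yann.lecun.com/exdb/mnist/\n",
"This example uses the mnist dataset bundled with Paddle. Import it with from paddle.vision.datasets import MNIST."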
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"train_dataset = MNIST(mode='train')\n",
"test_dataset = MNIST(mode='test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the model"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"class MyModel(Layer):\n",
"    def __init__(self):\n",
"        super(MyModel, self).__init__()\n",
"        self.conv1 = paddle.nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)\n",
"        self.max_pool1 = MaxPool2d(kernel_size=2, stride=2)\n",
"        self.conv2 = Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1)\n",
"        self.max_pool2 = MaxPool2d(kernel_size=2, stride=2)\n",
"        self.linear1 = Linear(in_features=16*5*5, out_features=120)\n",
"        self.linear2 = Linear(in_features=120, out_features=84)\n",
"        self.linear3 = Linear(in_features=84, out_features=10)\n",
"\n",
"    def forward(self, x):\n",
"        x = self.conv1(x)\n",
"        x = F.relu(x)\n",
"        x = self.max_pool1(x)\n",
"        x = self.conv2(x)\n",
"        x = F.relu(x)\n",
"        x = self.max_pool2(x)\n",
"        x = paddle.flatten(x, start_axis=1, stop_axis=-1)\n",
"        x = self.linear1(x)\n",
"        x = F.relu(x)\n",
"        x = self.linear2(x)\n",
"        x = F.relu(x)\n",
"        x = self.linear3(x)\n",
"        x = F.softmax(x)\n",
"        return x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the model\n",
"Create a `Model` instance to train the model quickly."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/1\n",
"step 100/938 - loss: 1.6177 - acc_top1: 0.6119 - acc_top2: 0.6813 - 15ms/step\n",
"step 200/938 - loss: 1.7720 - acc_top1: 0.7230 - acc_top2: 0.7788 - 15ms/step\n",
"step 300/938 - loss: 1.6114 - acc_top1: 0.7666 - acc_top2: 0.8164 - 15ms/step\n",
"step 400/938 - loss: 1.6537 - acc_top1: 0.7890 - acc_top2: 0.8350 - 15ms/step\n",
"step 500/938 - loss: 1.5229 - acc_top1: 0.8170 - acc_top2: 0.8619 - 15ms/step\n",
"step 600/938 - loss: 1.5269 - acc_top1: 0.8391 - acc_top2: 0.8821 - 15ms/step\n",
"step 700/938 - loss: 1.4821 - acc_top1: 0.8561 - acc_top2: 0.8970 - 15ms/step\n",
"step 800/938 - loss: 1.4860 - acc_top1: 0.8689 - acc_top2: 0.9081 - 15ms/step\n",
"step 900/938 - loss: 1.5032 - acc_top1: 0.8799 - acc_top2: 0.9174 - 15ms/step\n",
"step 938/938 - loss: 1.4617 - acc_top1: 0.8835 - acc_top2: 0.9203 - 15ms/step\n",
"save checkpoint at /Users/dingjiawei/online_repo/book/paddle2.0_docs/save_model/mnist_checkpoint/0\n",
"Eval begin...\n",
"step 100/157 - loss: 1.4765 - acc_top1: 0.9636 - acc_top2: 0.9891 - 6ms/step\n",
"step 157/157 - loss: 1.4612 - acc_top1: 0.9705 - acc_top2: 0.9910 - 6ms/step\n",
"Eval samples: 10000\n",
"save checkpoint at /Users/dingjiawei/online_repo/book/paddle2.0_docs/save_model/mnist_checkpoint/final\n"
]
}
],
"source": [
"inputs = InputSpec([None, 1, 28, 28], 'float32', 'x')\n",
"labels = InputSpec([None, 1], 'int64', 'label')\n",
"model = paddle.Model(MyModel(), inputs, labels)\n",
"\n",
"optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())\n",
"\n",
"model.prepare(\n",
" optim,\n",
" paddle.nn.loss.CrossEntropyLoss(),\n",
" Accuracy(topk=(1, 2))\n",
" )\n",
"model.fit(train_dataset,\n",
" test_dataset,\n",
" epochs=1,\n",
" log_freq=100,\n",
" batch_size=64,\n",
" save_dir='mnist_checkpoint')\n"
]
},
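{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving model parameters\n",
"\n",
"The Paddle framework currently has three families of interfaces for saving model parameters:\n",
"#### Paddle high-level API: saving model parameters\n",
" * paddle.Model.fit\n",
" * paddle.Model.save\n",
"#### Paddle base framework, dynamic graph: saving model parameters \n",
" * paddle.save\n",
"#### Paddle base framework, static graph: saving model parameters \n",
" * paddle.io.save\n",
" * paddle.io.save_inference_model\n",
"\n",
"The following sections explain saving and loading with the high-level API."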
]
},
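{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#### Method 1:\n",
"* paddle.Model.fit(train_data, epochs, batch_size, save_dir, log_freq) <br><br>\n",
"When training in a loop with model.fit, set the save_dir argument to the path where the model should be saved and save_freq to the saving frequency; training and saving then happen together. model.fit() saves only the model parameters, not the optimizer parameters, and produces a single .pdparams file at the end of each epoch, so checkpoints are written while training is still running. \n",
"\n",
"#### Method 2:\n",
"* paddle.Model.save(self, path, training=True) <br><br>\n",
"model.save(path) can save the model structure, the network parameters and the optimizer parameters. training=True is meant for use during training and saves the network parameters and the optimizer parameters. Each epoch produces two kinds of file, 0.pdparams and 0.pdopt, which hold the model parameters and the optimizer parameters respectively, but the file containing the parameters of all epochs is generated only after the whole training run has finished. path has the form 'dirname/file_prefix' or 'file_prefix', where dirname is the directory and file_prefix is the name of the parameter file. training=False means that training has finished; in that case the inference model structure and the network parameters are saved."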
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Method 1: save the model parameters of every epoch while training\n",
"model.fit(train_dataset,\n",
" test_dataset,\n",
" epochs=2,\n",
" batch_size=64,\n",
" save_dir='mnist_checkpoint'\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Method 2: model.save() stores the model and optimizer parameters\n",
"model.save('mnist_checkpoint/test')"
]
},
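{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading model parameters\n",
"\n",
"To restore the training state, the model data has to be loaded. The loading functions read the model parameters and the optimizer parameters from the files that store the model state and the optimizer state. If the optimizer does not need to be restored, the optimizer state file is not required.\n",
"#### High-level API: loading model parameters\n",
" * paddle.Model.load\n",
"#### Paddle base framework, dynamic graph: loading model parameters\n",
" * paddle.load\n",
"#### Paddle base framework, static graph: loading model parameters\n",
" * paddle.io.load \n",
" * paddle.io.load_inference_model"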
]
},
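{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following explains how to load model parameters with the high-level API.\n",
"* model.load(self, path, skip_mismatch=False, reset_optimizer=False)<br><br>\n",
"model.load can load the model parameters and the optimizer parameters together. The reset_optimizer argument controls whether the optimizer parameters are restored: if reset_optimizer is True, the optimizer parameters are re-initialized; if it is False, they are restored from the given path."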
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the model with the high-level API\n",
"model.load('mnist_checkpoint/test')"
]
},
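{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resuming training\n",
"\n",
"Ideally, resuming training returns the model to the exact state at which training was interrupted, so that the gradient updates after resuming follow exactly the same course as they would have without the interruption. We can therefore judge whether the methods above restore training accurately by watching the loss after resuming: resume from the model parameters and optimizer state saved at the end of epoch 0 and check that the loss during the following training (epoch 1) is identical to that of an uninterrupted run.\n",
"\n",
"Notes:\n",
"\n",
"There are two key points when resuming training:\n",
"\n",
"* When saving, save both the model parameters and the optimizer parameters.\n",
"\n",
"* When restoring, restore both the model parameters and the optimizer parameters."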
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/2\n",
"step 100/938 - loss: 1.4635 - acc_top1: 0.9650 - acc_top2: 0.9898 - 15ms/step\n",
"step 200/938 - loss: 1.5459 - acc_top1: 0.9659 - acc_top2: 0.9897 - 15ms/step\n",
"step 300/938 - loss: 1.5109 - acc_top1: 0.9658 - acc_top2: 0.9893 - 15ms/step\n",
"step 400/938 - loss: 1.4797 - acc_top1: 0.9664 - acc_top2: 0.9899 - 15ms/step\n",
"step 500/938 - loss: 1.4786 - acc_top1: 0.9673 - acc_top2: 0.9902 - 15ms/step\n",
"step 600/938 - loss: 1.5082 - acc_top1: 0.9679 - acc_top2: 0.9906 - 15ms/step\n",
"step 700/938 - loss: 1.4768 - acc_top1: 0.9687 - acc_top2: 0.9909 - 15ms/step\n",
"step 800/938 - loss: 1.4638 - acc_top1: 0.9696 - acc_top2: 0.9913 - 15ms/step\n",
"step 900/938 - loss: 1.5058 - acc_top1: 0.9704 - acc_top2: 0.9916 - 15ms/step\n",
"step 938/938 - loss: 1.4702 - acc_top1: 0.9708 - acc_top2: 0.9917 - 15ms/step\n",
"Eval begin...\n",
"step 100/157 - loss: 1.4613 - acc_top1: 0.9755 - acc_top2: 0.9944 - 5ms/step\n",
"step 157/157 - loss: 1.4612 - acc_top1: 0.9805 - acc_top2: 0.9956 - 5ms/step\n",
"Eval samples: 10000\n",
"Epoch 2/2\n",
"step 100/938 - loss: 1.4832 - acc_top1: 0.9789 - acc_top2: 0.9927 - 15ms/step\n",
"step 200/938 - loss: 1.4618 - acc_top1: 0.9779 - acc_top2: 0.9932 - 14ms/step\n",
"step 300/938 - loss: 1.4613 - acc_top1: 0.9779 - acc_top2: 0.9929 - 15ms/step\n",
"step 400/938 - loss: 1.4765 - acc_top1: 0.9772 - acc_top2: 0.9932 - 15ms/step\n",
"step 500/938 - loss: 1.4932 - acc_top1: 0.9775 - acc_top2: 0.9934 - 15ms/step\n",
"step 600/938 - loss: 1.4773 - acc_top1: 0.9773 - acc_top2: 0.9936 - 15ms/step\n",
"step 700/938 - loss: 1.4612 - acc_top1: 0.9783 - acc_top2: 0.9939 - 15ms/step\n",
"step 800/938 - loss: 1.4653 - acc_top1: 0.9779 - acc_top2: 0.9939 - 15ms/step\n",
"step 900/938 - loss: 1.4639 - acc_top1: 0.9780 - acc_top2: 0.9939 - 15ms/step\n",
"step 938/938 - loss: 1.4678 - acc_top1: 0.9779 - acc_top2: 0.9937 - 15ms/step\n",
"Eval begin...\n",
"step 100/157 - loss: 1.4612 - acc_top1: 0.9733 - acc_top2: 0.9945 - 6ms/step\n",
"step 157/157 - loss: 1.4612 - acc_top1: 0.9778 - acc_top2: 0.9952 - 6ms/step\n",
"Eval samples: 10000\n"
]
}
],
"source": [
"import paddle\n",
"from paddle.vision.datasets import MNIST\n",
"from paddle.metric import Accuracy\n",
"from paddle.static import InputSpec\n",
"#\n",
"#\n",
"train_dataset = MNIST(mode='train')\n",
"test_dataset = MNIST(mode='test')\n",
"\n",
"paddle.disable_static()\n",
"\n",
"inputs = InputSpec([None, 1, 28, 28], 'float32', 'x')\n",
"labels = InputSpec([None, 1], 'int64', 'label')\n",
"model = paddle.Model(MyModel(), inputs, labels)\n",
"optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())\n",
"model.load(\"./mnist_checkpoint/final\")\n",
"model.prepare( \n",
" optim,\n",
" paddle.nn.loss.CrossEntropyLoss(),\n",
" Accuracy(topk=(1, 2))\n",
" )\n",
"model.fit(train_data=train_dataset,\n",
" eval_data=test_dataset,\n",
" batch_size=64,\n",
" log_freq=100,\n",
" epochs=2\n",
" )"
]
},
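{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tutorial used MNIST handwritten digit recognition to walk through saving a model, loading a model, and resuming training. Paddle provides many APIs for saving and loading, so you can choose the one that fits your needs."
]
},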
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
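Saving and Loading Models
=========================
This tutorial explains how to save and load model parameters with Paddle's high-level API. Training can be interrupted, deliberately or unexpectedly, before the model has finished training, so it is worth saving the model parameters frequently; after an interruption, the saved parameters can be loaded quickly and training can continue. Alternatively, once training has finished, the trained parameters are needed to run inference or to deploy the model. For both cases Paddle provides methods to save and load a model, and it supports resuming training from the most recently saved state: as long as the model state is saved regularly during training, there is no need to retrain from scratch.

The sections below use a handwritten digit recognition model to show how Paddle saves and loads a model and resumes training. The network architecture itself is not explained.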
Environment
-----------
This tutorial is written against paddle-develop. If your environment uses a different version, install the paddle-develop version first.
.. code:: ipython3
import paddle
import paddle.nn.functional as F
from paddle.nn import Layer
from paddle.vision.datasets import MNIST
from paddle.metric import Accuracy
from paddle.nn import Conv2d,MaxPool2d,Linear
from paddle.static import InputSpec
print(paddle.__version__)
paddle.disable_static()
.. parsed-literal::
0.0.0
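Dataset
-------
The MNIST dataset of handwritten digits contains 60,000 training examples and 10,000 test examples. The digits have been size-normalized and centered in fixed-size images (28x28 pixels) with values from 0 to 1. The official address of the dataset is: http://yann.lecun.com/exdb/mnist/

This example uses the MNIST dataset bundled with Paddle, which can be imported with ``from paddle.vision.datasets import MNIST``.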
.. code:: ipython3
train_dataset = MNIST(mode='train')
test_dataset = MNIST(mode='test')
Building the Model
------------------
.. code:: ipython3

    class MyModel(Layer):
        def __init__(self):
            super(MyModel, self).__init__()
            # Two convolution + max-pooling stages
            self.conv1 = Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2)
            self.max_pool1 = MaxPool2d(kernel_size=2, stride=2)
            self.conv2 = Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1)
            self.max_pool2 = MaxPool2d(kernel_size=2, stride=2)
            # Three fully connected layers mapping 16*5*5 features to 10 classes
            self.linear1 = Linear(in_features=16*5*5, out_features=120)
            self.linear2 = Linear(in_features=120, out_features=84)
            self.linear3 = Linear(in_features=84, out_features=10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.max_pool1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = self.max_pool2(x)
            # Flatten everything except the batch dimension
            x = paddle.flatten(x, start_axis=1, stop_axis=-1)
            x = self.linear1(x)
            x = F.relu(x)
            x = self.linear2(x)
            x = F.relu(x)
            x = self.linear3(x)
            # Turn the 10 outputs into class probabilities
            x = F.softmax(x)
            return x
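The value ``16*5*5`` passed to ``linear1`` follows from the layer sizes above. As a quick check, the sketch below traces the spatial size of a 28x28 input through the two convolution and pooling stages using the usual output-size formula ``(size + 2*padding - kernel) // stride + 1``. The helper ``out_size`` is introduced here for illustration and is not part of Paddle; it assumes a padding of 0 wherever the model does not set one.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                              # MNIST images are 28x28
size = out_size(size, kernel=5, stride=1, padding=2)   # conv1     -> 28
size = out_size(size, kernel=2, stride=2)              # max_pool1 -> 14
size = out_size(size, kernel=5, stride=1)              # conv2     -> 10
size = out_size(size, kernel=2, stride=2)              # max_pool2 -> 5

flattened = 16 * size * size   # conv2 produces 16 channels
print(size, flattened)         # 5 400
```

The flattened feature count of 400 is what ``paddle.flatten`` hands to ``linear1``.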
Training the Model
------------------
Create a ``Model`` instance to train the model quickly.
.. code:: ipython3
inputs = InputSpec([None, 1, 28, 28], 'float32', 'x')
labels = InputSpec([None, 1], 'int64', 'label')
model = paddle.Model(MyModel(), inputs, labels)
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
model.prepare(
optim,
paddle.nn.loss.CrossEntropyLoss(),
Accuracy(topk=(1, 2))
)
model.fit(train_dataset,
test_dataset,
epochs=1,
log_freq=100,
batch_size=64,
save_dir='mnist_checkpoint')
.. parsed-literal::
Epoch 1/1
step 100/938 - loss: 1.6177 - acc_top1: 0.6119 - acc_top2: 0.6813 - 15ms/step
step 200/938 - loss: 1.7720 - acc_top1: 0.7230 - acc_top2: 0.7788 - 15ms/step
step 300/938 - loss: 1.6114 - acc_top1: 0.7666 - acc_top2: 0.8164 - 15ms/step
step 400/938 - loss: 1.6537 - acc_top1: 0.7890 - acc_top2: 0.8350 - 15ms/step
step 500/938 - loss: 1.5229 - acc_top1: 0.8170 - acc_top2: 0.8619 - 15ms/step
step 600/938 - loss: 1.5269 - acc_top1: 0.8391 - acc_top2: 0.8821 - 15ms/step
step 700/938 - loss: 1.4821 - acc_top1: 0.8561 - acc_top2: 0.8970 - 15ms/step
step 800/938 - loss: 1.4860 - acc_top1: 0.8689 - acc_top2: 0.9081 - 15ms/step
step 900/938 - loss: 1.5032 - acc_top1: 0.8799 - acc_top2: 0.9174 - 15ms/step
step 938/938 - loss: 1.4617 - acc_top1: 0.8835 - acc_top2: 0.9203 - 15ms/step
save checkpoint at /Users/dingjiawei/online_repo/book/paddle2.0_docs/save_model/mnist_checkpoint/0
Eval begin...
step 100/157 - loss: 1.4765 - acc_top1: 0.9636 - acc_top2: 0.9891 - 6ms/step
step 157/157 - loss: 1.4612 - acc_top1: 0.9705 - acc_top2: 0.9910 - 6ms/step
Eval samples: 10000
save checkpoint at /Users/dingjiawei/online_repo/book/paddle2.0_docs/save_model/mnist_checkpoint/final
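The loss in the log levels off near 1.46 instead of approaching 0. One possible explanation, assuming ``CrossEntropyLoss`` applies a softmax to its input internally, is that ``forward`` already returns softmax probabilities, so the loss only ever sees values between 0 and 1. Under that assumption the best reachable input is a one-hot vector, and the smallest possible loss for 10 classes can be worked out directly. This is a back-of-the-envelope check in plain Python, not Paddle code:

```python
import math

num_classes = 10

# Best case under the assumption above: the model outputs probability 1 for
# the correct class and 0 for the other nine, and the loss applies softmax
# to that vector again before taking the negative log-likelihood.
correct = math.exp(1.0)
others = (num_classes - 1) * math.exp(0.0)
min_loss = -math.log(correct / (correct + others))

print(round(min_loss, 4))  # 1.4612
```

The result matches the lowest loss values that appear in the evaluation output.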
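Saving Model Parameters
-----------------------
Paddle currently provides three sets of APIs for saving model parameters:

- High-level API: ``paddle.Model.fit``, ``paddle.Model.save``
- Base framework, dynamic graph: ``paddle.save``
- Base framework, static graph: ``paddle.io.save``, ``paddle.io.save_inference_model``

The rest of this tutorial covers saving and loading with the high-level API.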
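Method 1
^^^^^^^^
- paddle.Model.fit(train_data, epochs, batch_size, save_dir, log_freq)

When training with ``model.fit``, set ``save_dir`` to the directory where the model should be saved and ``save_freq`` to how often it is written; training and saving then happen in the same call. ``model.fit()`` saves only the model parameters, not the optimizer parameters. Saving happens while training runs: one ``.pdparams`` file is written at the end of each epoch.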
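Method 2
^^^^^^^^
- paddle.Model.save(self, path, training=True)

``model.save(path)`` can save the model structure, the network parameters and the optimizer parameters. ``training=True`` is intended for use during training and saves the network parameters and the optimizer parameters. Each epoch produces two files, for example ``0.pdparams`` and ``0.pdopt``, which store the model parameters and the optimizer parameters respectively; the file that contains the parameters of all epochs is produced only after the whole training run has finished. ``path`` has the form ``'dirname/file_prefix'``, where ``dirname`` is the directory and ``file_prefix`` is the name of the parameter files. ``training=False`` means that training has finished; in that case the inference model structure and the network parameters are stored.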
.. code:: ipython3
# Method 1: save the model parameters of each epoch while training
model.fit(train_dataset,
test_dataset,
epochs=2,
batch_size=64,
save_dir='mnist_checkpoint'
)
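Judging from the training log earlier in this tutorial, ``fit`` writes checkpoints under ``save_dir`` using the epoch index as the file prefix, plus a ``final`` prefix when training ends. The sketch below lists the prefixes one would expect for a given run. It is plain Python with a hypothetical helper, ``expected_checkpoints``, and it assumes that a checkpoint is written whenever the zero-based epoch index is a multiple of ``save_freq``; verify this against your Paddle version before relying on it.

```python
def expected_checkpoints(save_dir, epochs, save_freq=1):
    """Checkpoint prefixes expected from a fit() run (assumed naming scheme)."""
    prefixes = [str(epoch) for epoch in range(epochs) if epoch % save_freq == 0]
    prefixes.append("final")  # written once when training ends
    return [f"{save_dir}/{prefix}" for prefix in prefixes]

for prefix in expected_checkpoints("mnist_checkpoint", epochs=2):
    print(prefix)
```

Any of these prefixes can later be passed to ``model.load``.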
.. code:: ipython3
# Method 2: model.save() saves the model and optimizer parameters
model.save('mnist_checkpoint/test')
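As described for ``paddle.Model.save``, a ``path`` of the form ``dirname/file_prefix`` yields a ``.pdparams`` file and a ``.pdopt`` file. The helper below, ``checkpoint_files``, is hypothetical and only does string handling; it shows which files to look for after the call above.

```python
import posixpath

def checkpoint_files(path):
    """Split 'dirname/file_prefix' and return the two expected file names."""
    dirname, file_prefix = posixpath.split(path)
    return {
        "dirname": dirname,
        "file_prefix": file_prefix,
        "params": path + ".pdparams",   # model parameters
        "optimizer": path + ".pdopt",   # optimizer parameters
    }

info = checkpoint_files("mnist_checkpoint/test")
print(info["params"])     # mnist_checkpoint/test.pdparams
print(info["optimizer"])  # mnist_checkpoint/test.pdopt
```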
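Loading Model Parameters
------------------------
To resume training, the saved model data has to be loaded. The load functions read the model parameters and the optimizer parameters from the files that store the model state and the optimizer state. If the optimizer does not need to be restored, the optimizer state file is not required.

- High-level API: ``paddle.Model.load``
- Base framework, dynamic graph: ``paddle.load``
- Base framework, static graph: ``paddle.io.load``, ``paddle.io.load_inference_model``

The high-level API loads model parameters with the following method:

- model.load(self, path, skip_mismatch=False, reset_optimizer=False)

``model.load`` can load the model parameters and the optimizer parameters together. The ``reset_optimizer`` argument controls whether the optimizer parameters are restored: if it is ``True``, the optimizer parameters are re-initialized; if it is ``False``, they are restored from ``path``.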
.. code:: ipython3
# Load the model with the high-level API
model.load('mnist_checkpoint/test')
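Before calling ``model.load``, it can help to confirm that the files for a checkpoint prefix are present, since the optimizer state file is only needed when the optimizer is to be restored. The following is a sketch in plain Python; ``can_load`` is a hypothetical helper, and the ``.pdparams`` and ``.pdopt`` suffixes are taken from the description of ``model.save`` earlier in this tutorial.

```python
import os
import tempfile

def can_load(prefix, reset_optimizer=False):
    """Return True if the files needed to load `prefix` exist on disk."""
    if not os.path.isfile(prefix + ".pdparams"):
        return False
    # The optimizer state is only required when it will be restored.
    return reset_optimizer or os.path.isfile(prefix + ".pdopt")

# Demonstration with an empty stand-in file instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "test")
    open(prefix + ".pdparams", "w").close()
    print(can_load(prefix))                        # False: no .pdopt file
    print(can_load(prefix, reset_optimizer=True))  # True
```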
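Resuming Training
-----------------
Ideally, resuming training returns the model to the exact state it was in when training was interrupted, so that the gradient updates after resuming follow the same trajectory as they would have without the interruption. This gives a way to check whether the methods above resume training accurately: restore the model parameters and the optimizer state saved at the end of epoch 0, continue training, and verify that the loss in epoch 1 matches the loss of an uninterrupted run exactly.

Note:
Resuming training depends on two points:

- Save both the model parameters and the optimizer parameters when saving the model.
- Restore both the model parameters and the optimizer parameters when loading.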
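The check described above amounts to comparing two loss sequences step by step. A minimal sketch, assuming the per-step losses of the uninterrupted run and of the resumed run have been collected into two lists; the helper ``losses_match`` and the sample numbers are made up for illustration.

```python
import math

def losses_match(reference, resumed, rel_tol=1e-6):
    """True if both runs logged the same number of steps with matching losses."""
    if len(reference) != len(resumed):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol)
               for a, b in zip(reference, resumed))

# Illustrative values only, not taken from a real run.
uninterrupted = [1.4832, 1.4618, 1.4613]
print(losses_match(uninterrupted, [1.4832, 1.4618, 1.4613]))  # True
print(losses_match(uninterrupted, [1.4832, 1.4700, 1.4613]))  # False
```

A small relative tolerance is used instead of exact equality so that logged values rounded to a few decimals still compare cleanly.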
.. code:: ipython3
import paddle
from paddle.vision.datasets import MNIST
from paddle.metric import Accuracy
from paddle.static import InputSpec
#
#
train_dataset = MNIST(mode='train')
test_dataset = MNIST(mode='test')
paddle.disable_static()
inputs = InputSpec([None, 1, 28, 28], 'float32', 'x')
labels = InputSpec([None, 1], 'int64', 'label')
model = paddle.Model(MyModel(), inputs, labels)
optim = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
model.load("./mnist_checkpoint/final")
model.prepare(
optim,
paddle.nn.loss.CrossEntropyLoss(),
Accuracy(topk=(1, 2))
)
model.fit(train_data=train_dataset,
eval_data=test_dataset,
batch_size=64,
log_freq=100,
epochs=2
)
.. parsed-literal::
Epoch 1/2
step 100/938 - loss: 1.4635 - acc_top1: 0.9650 - acc_top2: 0.9898 - 15ms/step
step 200/938 - loss: 1.5459 - acc_top1: 0.9659 - acc_top2: 0.9897 - 15ms/step
step 300/938 - loss: 1.5109 - acc_top1: 0.9658 - acc_top2: 0.9893 - 15ms/step
step 400/938 - loss: 1.4797 - acc_top1: 0.9664 - acc_top2: 0.9899 - 15ms/step
step 500/938 - loss: 1.4786 - acc_top1: 0.9673 - acc_top2: 0.9902 - 15ms/step
step 600/938 - loss: 1.5082 - acc_top1: 0.9679 - acc_top2: 0.9906 - 15ms/step
step 700/938 - loss: 1.4768 - acc_top1: 0.9687 - acc_top2: 0.9909 - 15ms/step
step 800/938 - loss: 1.4638 - acc_top1: 0.9696 - acc_top2: 0.9913 - 15ms/step
step 900/938 - loss: 1.5058 - acc_top1: 0.9704 - acc_top2: 0.9916 - 15ms/step
step 938/938 - loss: 1.4702 - acc_top1: 0.9708 - acc_top2: 0.9917 - 15ms/step
Eval begin...
step 100/157 - loss: 1.4613 - acc_top1: 0.9755 - acc_top2: 0.9944 - 5ms/step
step 157/157 - loss: 1.4612 - acc_top1: 0.9805 - acc_top2: 0.9956 - 5ms/step
Eval samples: 10000
Epoch 2/2
step 100/938 - loss: 1.4832 - acc_top1: 0.9789 - acc_top2: 0.9927 - 15ms/step
step 200/938 - loss: 1.4618 - acc_top1: 0.9779 - acc_top2: 0.9932 - 14ms/step
step 300/938 - loss: 1.4613 - acc_top1: 0.9779 - acc_top2: 0.9929 - 15ms/step
step 400/938 - loss: 1.4765 - acc_top1: 0.9772 - acc_top2: 0.9932 - 15ms/step
step 500/938 - loss: 1.4932 - acc_top1: 0.9775 - acc_top2: 0.9934 - 15ms/step
step 600/938 - loss: 1.4773 - acc_top1: 0.9773 - acc_top2: 0.9936 - 15ms/step
step 700/938 - loss: 1.4612 - acc_top1: 0.9783 - acc_top2: 0.9939 - 15ms/step
step 800/938 - loss: 1.4653 - acc_top1: 0.9779 - acc_top2: 0.9939 - 15ms/step
step 900/938 - loss: 1.4639 - acc_top1: 0.9780 - acc_top2: 0.9939 - 15ms/step
step 938/938 - loss: 1.4678 - acc_top1: 0.9779 - acc_top2: 0.9937 - 15ms/step
Eval begin...
step 100/157 - loss: 1.4612 - acc_top1: 0.9733 - acc_top2: 0.9945 - 6ms/step
step 157/157 - loss: 1.4612 - acc_top1: 0.9778 - acc_top2: 0.9952 - 6ms/step
Eval samples: 10000
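Summary
-------
This tutorial used MNIST handwritten digit recognition to walk through saving a model, loading a model, and resuming training. Paddle provides many APIs for saving and loading, so you can choose the one that fits your needs.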