The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, please [update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
-----------------------
# Deep Residual Networks (DRN)
## Introduction
Paper [1] notes that its 1202-layer ResNet suffered from overfitting and left room for improvement. The following year, He's team published "Identity Mappings in Deep Residual Networks" [2], which analyzes the key to ResNet's success, namely the computation inside the residual block, and improves both the residual block and the after-addition activation. Through a series of ablation experiments the paper shows that using an identity mapping for both the shortcut and the after-addition activation makes the model train very well; with this improvement a ResNet-1001 with strong results was trained successfully.
## DRN Network Architecture
In the original ResNet, each residual building block
![pic1](./img/pic1.png)
can be written in the following form:

$y_l = h(x_l) + F(x_l, W_l)$

$x_{l+1} = f(y_l)$

where $h(x_l)$ is an identity mapping and $f(y_l)$ denotes the ReLU activation. [2] shows that if both $h(x)$ and $f(y)$ are identity mappings, i.e. $h(x_l) = x_l$ and $f(y_l) = y_l$, then the signal can be propagated directly from any unit to any other unit in both the forward and the backward pass, which makes training much easier.
A residual network with these identity mappings [2] has the following desirable properties:
(1) The feature $x_L$ of any deeper unit **L** can be expressed as the feature $x_l$ of a shallower unit **l** plus a residual term of the form $\sum_{i=l}^{L-1}F$, so any pair of units **L** and **l** is connected in a residual fashion.
(2) The feature of any deep unit **L** is $x_L = x_0 + \sum_{i=0}^{L-1}F(x_i, W_i)$, i.e. the sum of the outputs of all preceding residual functions (plus $x_0$). In contrast, a feature $x_L$ in a "plain network" is a chain of matrix-vector products, $\prod_{i=0}^{L-1}W_i\,x_0$, and working with the sum is far cheaper than working with the product.
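Property (2) also explains why these networks remain easy to optimize. Differentiating the relation $x_L = x_l + \sum_{i=l}^{L-1}F(x_i, W_i)$, as done in [2], gives

$\frac{\partial E}{\partial x_l} = \frac{\partial E}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1}F(x_i, W_i)\right)$

so the gradient $\frac{\partial E}{\partial x_L}$ always reaches the shallower unit directly through the additive term $1$, and is unlikely to vanish even when the residual-branch gradients are small.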
Experiments show that $h(x_l) = x_l$ gives the fastest error decay and the lowest final error (subfigure (a) in the figure below):
![pic2](./img/pic2.png)
For the activation, experiments show that placing both ReLU and BN before the weight layers, i.e. full pre-activation (subfigure (e) in the figure below), works best on both ResNet-110 and ResNet-164.
![pic3](./img/pic3.png)
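As a reference, such a pre-activation unit (BN and ReLU applied before the convolution) can be written in PaddlePaddle v2 roughly as follows. This is only a sketch that mirrors the `bn_conv_layer` helper in the CIFAR script included in this commit; the function name is illustrative.
```
def pre_act_conv(input, ch_out, filter_size, stride, padding, ch_in=None):
    # full pre-activation: BN -> ReLU -> convolution (sketch, mirrors bn_conv_layer)
    tmp = paddle.layer.batch_norm(input=input, act=paddle.activation.Relu())
    return paddle.layer.img_conv(
        input=tmp,
        filter_size=filter_size,
        num_channels=ch_in,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
        act=paddle.activation.Linear(),
        bias_attr=False)
```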
## Files in This Reproduction
The reproduction consists of the following files:
<table>
<tr>
<th width=60%>File</th>
<th width=40%>Description</th>
</tr>
<tr>
<td> train.py </td>
<td> Training script for the DRN model </td>
</tr>
<tr>
<td> infer.py </td>
<td> Run prediction with a trained DRN model </td>
</tr>
<tr>
<td> drn.py </td>
<td> Definition of the DRN network structure </td>
</tr>
<tr>
<td> utils.py </td>
<td> Condense and display the training progress </td>
</tr>
</table>
## Reproducing the Model on the Flowers Dataset
### Data Preparation
Training uses the flowers dataset that ships with Paddle, so it can be imported directly:
```
import paddle.v2.dataset.flowers as flowers
```
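For training, ```train.py``` in this directory wraps the dataset into a shuffled, batched reader:
```
# paddle refers to paddle.v2, as imported in train.py; BATCH_SIZE there is 128
train_reader = paddle.batch(
    paddle.reader.shuffle(flowers.train(), buf_size=1000),
    batch_size=128)
```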
### Network Definition
The network is fully defined in ```drn.py```; the core part is the residual structure:
```
def conv_bn_layer(input,
                  ch_out,
                  filter_size,
                  stride,
                  padding,
                  active_type=paddle.activation.Relu(),
                  ch_in=None):
    tmp = paddle.layer.img_conv(
        input=input,
        filter_size=filter_size,
        num_channels=ch_in,
        num_filters=ch_out,
        stride=stride,
        padding=padding,
        act=paddle.activation.Linear(),
        bias_attr=False)
    return paddle.layer.batch_norm(input=tmp, act=active_type)
```
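In ```drn.py```, a complete residual block then combines two of these conv-BN layers with a shortcut branch and an element-wise addition (reproduced here with added comments):
```
def basicblock(input, ch_out, stride):
    # shortcut branch: identity, or a 1x1 projection when the channel count changes
    short = shortcut(input, ch_out, stride)
    conv1 = conv_bn_layer(input, ch_out, 3, stride, 1)
    conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1, paddle.activation.Linear())
    # element-wise addition of the two branches, followed by ReLU
    return paddle.layer.addto(
        input=[short, conv2], act=paddle.activation.Relu())
```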
### Training
Next, run ``` python train.py drn ``` to start training. Using a CUDA GPU is strongly recommended; training on CPU can take more than 90 hours. The key code is:
```
paddle.init(use_gpu=True, trainer_count=1)
image = paddle.layer.data(name="image", type=paddle.data_type.dense_vector(DATA_DIM))
lbl = paddle.layer.data(name="label", type=paddle.data_type.integer_value(CLASS_DIM))
# ... (some code omitted)
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer,
                             extra_layers=extra_layers)
# ... (some code omitted)
trainer.train(
    reader=train_reader, num_passes=200, event_handler=event_handler)
```
The code above does the following:
1. Call ``` paddle.init ``` to initialize PaddlePaddle with one GPU.
2. Define the data layers: ```image``` for the input image and ```lbl``` for its label.
3. Define the ```trainer```, which bundles the cost, the parameters, the optimizer, and the extra layers.
4. Call ```train``` to run the actual training for 200 passes.
While training runs, the console prints messages such as:
```
Pass 0, Batch 0, Cost 2.2512, ...
Pass 0, Batch 1, Cost 2.1532, ...
```
At the same time, a ```params_pass_0.tar.gz```-style archive is written to the ```train.py``` directory after each pass; once the archive for the final pass has been generated, training is complete.
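These archives are written by the pass-end branch of ```event_handler``` in ```train.py```, which also evaluates the model on the validation reader:
```
if isinstance(event, paddle.event.EndPass):
    with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
        trainer.save_parameter_to_tar(f)
    result = trainer.test(reader=test_reader)
    print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```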
### Simplified Training Output
Running ```python utils.py``` condenses the training output, making it easy to follow how the training cost and classification error evolve:
```
[INFO 2018-03-27 09:21:50,020 utils.py:83] pass_id:0,batch_id:0,train_cost:2.97480511665,s:{'classification_error_evaluator': 0.875}
INFO :pass_id:0,batch_id:0,train_cost:2.97480511665,s:{'classification_error_evaluator': 0.875}
[INFO 2018-03-27 09:21:55,806 utils.py:83] pass_id:0,batch_id:10,train_cost:4.01303768158,s:{'classification_error_evaluator': 1.0}
INFO :pass_id:0,batch_id:10,train_cost:4.01303768158,s:{'classification_error_evaluator': 1.0}
[INFO 2018-03-27 09:22:01,413 utils.py:83] pass_id:0,batch_id:20,train_cost:2.66417765617,s:{'classification_error_evaluator': 0.875}
INFO :pass_id:0,batch_id:20,train_cost:2.66417765617,s:{'classification_error_evaluator': 0.875}
```
### Using the Trained Model
To run prediction with a trained model, execute ``` python infer.py <data_list> drn <params_path> ```, where ```<data_list>``` is a file with one image path per line and ```<params_path>``` is a saved parameter archive:
```
# load parameters
with gzip.open('params_pass_200.tar.gz', 'r') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

file_list = [line.strip() for line in open(image_list_file)]
test_data = [(paddle.image.load_and_transform(image_file, 256, 224, False)
              .flatten().astype('float32'), )
             for image_file in file_list]
probs = paddle.infer(
    output_layer=out, parameters=parameters, input=test_data)
lab = np.argsort(-probs)
for file_name, result in zip(file_list, lab):
    print "Label of %s is: %d" % (file_name, result[0])
```
The code reads every image listed in the data list, predicts a label for each one, and prints the result.
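Because ```np.argsort(-probs)``` sorts the class indices by descending probability, the loop can easily be changed to print more than the top prediction. The snippet below is an illustrative variation, not part of ```infer.py```:
```
for file_name, result in zip(file_list, lab):
    top5 = ", ".join(str(idx) for idx in result[:5])
    print "Top-5 labels of %s: %s" % (file_name, top5)
```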
### References
[1] [Deep Residual Learning for Image Recognition](http://arxiv.org/pdf/1512.03385.pdf)
[2] [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027)
import paddle.v2 as paddle
__all__ = ['drn_imagenet']
def conv_bn_layer(input,
ch_out,
filter_size,
stride,
padding,
active_type=paddle.activation.Relu(),
ch_in=None):
tmp = paddle.layer.img_conv(
input=input,
filter_size=filter_size,
num_channels=ch_in,
num_filters=ch_out,
stride=stride,
padding=padding,
act=paddle.activation.Linear(),
bias_attr=False)
return paddle.layer.batch_norm(input=tmp, act=active_type)
def shortcut(input, ch_out, stride):
if input.num_filters != ch_out:
return conv_bn_layer(input, ch_out, 1, stride, 0,
paddle.activation.Linear())
else:
return input
def basicblock(input, ch_out, stride):
short = shortcut(input, ch_out, stride)
conv1 = conv_bn_layer(input, ch_out, 3, stride, 1)
conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1, paddle.activation.Linear())
return paddle.layer.addto(
input=[short, conv2], act=paddle.activation.Relu())
def bottleneck(input, ch_out, stride):
short = shortcut(input, ch_out * 4, stride)
conv1 = conv_bn_layer(input, ch_out, 1, stride, 0)
conv2 = conv_bn_layer(conv1, ch_out, 3, 1, 1)
conv3 = conv_bn_layer(conv2, ch_out * 4, 1, 1, 0,
paddle.activation.Linear())
return paddle.layer.addto(
input=[short, conv3], act=paddle.activation.Relu())
def layer_warp(block_func, input, ch_out, count, stride):
conv = block_func(input, ch_out, stride)
for i in range(1, count):
conv = block_func(conv, ch_out, 1)
return conv
def drn_imagenet(input, class_dim, depth=50):
cfg = {
18: ([2, 2, 2, 1], basicblock),
34: ([3, 4, 6, 3], basicblock),
50: ([3, 4, 6, 3], bottleneck),
101: ([3, 4, 23, 3], bottleneck),
152: ([3, 8, 36, 3], bottleneck)
}
stages, block_func = cfg[depth]
conv1 = conv_bn_layer(
input, ch_in=3, ch_out=64, filter_size=7, stride=2, padding=3)
pool1 = paddle.layer.img_pool(input=conv1, pool_size=3, stride=2)
res1 = layer_warp(block_func, pool1, 64, stages[0], 1)
res2 = layer_warp(block_func, res1, 128, stages[1], 2)
res3 = layer_warp(block_func, res2, 256, stages[2], 2)
res4 = layer_warp(block_func, res3, 512, stages[3], 2)
pool2 = paddle.layer.img_pool(
input=res4, pool_size=7, stride=1, pool_type=paddle.pooling.Avg())
out = paddle.layer.fc(input=pool2,
size=class_dim,
act=paddle.activation.Softmax())
return out
import os
import gzip
import argparse
import numpy as np
from PIL import Image
import paddle.v2 as paddle
import reader
import vgg
import drn
DATA_DIM = 3 * 224 * 224
CLASS_DIM = 102
def main():
# parse the argument
parser = argparse.ArgumentParser()
parser.add_argument(
'data_list',
help='The path of data list file, which consists of one image path per line'
)
parser.add_argument(
'model',
help='The model for image classification',
choices=[
'drn'
])
parser.add_argument(
'params_path', help='The file which stores the parameters')
args = parser.parse_args()
# PaddlePaddle init
paddle.init(use_gpu=True, trainer_count=1)
image = paddle.layer.data(
name="image", type=paddle.data_type.dense_vector(DATA_DIM))
if args.model == 'drn':
out = drn.drn_imagenet(image, class_dim=CLASS_DIM)
# load parameters
with gzip.open(args.params_path, 'r') as f:
parameters = paddle.parameters.Parameters.from_tar(f)
file_list = [line.strip() for line in open(args.data_list)]
test_data = [(paddle.image.load_and_transform(image_file, 256, 224, False)
.flatten().astype('float32'), ) for image_file in file_list]
probs = paddle.infer(
output_layer=out, parameters=parameters, input=test_data)
lab = np.argsort(-probs)
for file_name, result in zip(file_list, lab):
print "Label of %s is: %d" % (file_name, result[0])
if __name__ == '__main__':
main()
import random
from paddle.v2.image import load_and_transform
import paddle.v2 as paddle
from multiprocessing import cpu_count
def train_mapper(sample):
'''
map image path to type needed by model input layer for the training set
'''
img, label = sample
img = paddle.image.load_image(img)
img = paddle.image.simple_transform(img, 256, 224, True)
return img.flatten().astype('float32'), label
def test_mapper(sample):
'''
map image path to type needed by model input layer for the test set
'''
img, label = sample
img = paddle.image.load_image(img)
img = paddle.image.simple_transform(img, 256, 224, True)
return img.flatten().astype('float32'), label
def train_reader(train_list, buffered_size=1024):
def reader():
with open(train_list, 'r') as f:
lines = [line.strip() for line in f]
for line in lines:
img_path, lab = line.strip().split('\t')
yield img_path, int(lab)
return paddle.reader.xmap_readers(train_mapper, reader,
cpu_count(), buffered_size)
def test_reader(test_list, buffered_size=1024):
def reader():
with open(test_list, 'r') as f:
lines = [line.strip() for line in f]
for line in lines:
img_path, lab = line.strip().split('\t')
yield img_path, int(lab)
return paddle.reader.xmap_readers(test_mapper, reader,
cpu_count(), buffered_size)
if __name__ == '__main__':
for im in train_reader('train.list'):
print len(im[0])
for im in train_reader('test.list'):
print len(im[0])
import gzip
import argparse
import paddle.v2.dataset.flowers as flowers
import paddle.v2 as paddle
import reader
import drn
DATA_DIM = 3 * 224 * 224
CLASS_DIM = 102
BATCH_SIZE = 128
def main():
# parse the argument
parser = argparse.ArgumentParser()
parser.add_argument(
'model',
help='The model for image classification',
choices=[
'drn'
])
args = parser.parse_args()
# PaddlePaddle init
paddle.init(use_gpu=False, trainer_count=1)
image = paddle.layer.data(
name="image", type=paddle.data_type.dense_vector(DATA_DIM))
lbl = paddle.layer.data(
name="label", type=paddle.data_type.integer_value(CLASS_DIM))
extra_layers = None
learning_rate = 0.01
if args.model == 'drn':
out = drn.drn_imagenet(image, class_dim=CLASS_DIM)
learning_rate = 0.1
cost = paddle.layer.classification_cost(input=out, label=lbl)
# Create parameters
parameters = paddle.parameters.create(cost)
# Create optimizer
optimizer = paddle.optimizer.Momentum(
momentum=0.9,
regularization=paddle.optimizer.L2Regularization(rate=0.0005 *
BATCH_SIZE),
learning_rate=learning_rate / BATCH_SIZE,
learning_rate_decay_a=0.1,
learning_rate_decay_b=128000 * 35,
learning_rate_schedule="discexp", )
train_reader = paddle.batch(
paddle.reader.shuffle(
flowers.train(),
# To use other data, replace the above line with:
# reader.train_reader('train.list'),
buf_size=1000),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
flowers.valid(),
# To use other data, replace the above line with:
# reader.test_reader('val.list'),
batch_size=BATCH_SIZE)
# Create trainer
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer,
extra_layers=extra_layers)
# End batch and end pass event handler
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 1 == 0:
print "\nPass %d, Batch %d, Cost %f, %s" % (
event.pass_id, event.batch_id, event.cost, event.metrics)
if isinstance(event, paddle.event.EndPass):
with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
trainer.save_parameter_to_tar(f)
result = trainer.test(reader=test_reader)
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
trainer.train(
reader=train_reader, num_passes=200, event_handler=event_handler)
if __name__ == '__main__':
main()
# encoding=utf-8
from __future__ import print_function
import paddle.v2 as paddle
import sys
import logging
data_dim = 3 * 32 * 32
num_class = 10
paddle.init(use_gpu=False, trainer_count=4)  # run on CPU with 4 trainer threads
logger = logging.getLogger()
formatter = logging.Formatter('%(levelname)-8s:%(message)s')
file_handler = logging.FileHandler('./resNet_paddlepaddle2016.log') # 日志文件名称
file_handler.setFormatter(formatter)
console_handler = logging.StreamHandler(sys.stdout)
console_handler.formatter = formatter
logger.addHandler(file_handler)
logger.addHandler(console_handler)
logger.setLevel(logging.INFO)
logger.info('input=3*32*32') # 样本图片大小
logger.info('numclass=10')
def bn_conv_layer(input_x, ch_out, filter_size, stride, padding, ch_in=None):
tmp = paddle.layer.batch_norm(input=input_x, act=paddle.activation.Relu())
return paddle.layer.img_conv(input=tmp, filter_size=filter_size, num_channels=ch_in, num_filters=ch_out,
stride=stride, padding=padding, act=paddle.activation.Linear(), bias_attr=False)
def shortcut(input_x, ch_in, ch_out, stride=1):
if ch_in != ch_out:
return bn_conv_layer(input_x, ch_out, 1, stride, 0)
else:
return input_x
def basicblock_changeC(ipt, ch_in, ch_out, stride):
tmp = bn_conv_layer(ipt, ch_out, 3, stride, 1)
tmp = bn_conv_layer(tmp, ch_out, 3, 1, 1)
short = shortcut(ipt, ch_in, ch_out, stride)
return paddle.layer.addto(input=[tmp, short], act=paddle.activation.Linear())
def layer_warp(block_func, ipt, ch_in, ch_out, count, stride):
tmp = block_func(ipt, ch_in, ch_out, stride)
for i in range(1, count):
tmp = block_func(tmp, ch_out, ch_out, 1)
return tmp
def resNet_(input_x, layer_num=[3, 4, 6, 3]):
conv_1 = bn_conv_layer(input_x, ch_in=3, ch_out=16, filter_size=3, stride=1, padding=1)
basic_1 = layer_warp(basicblock_changeC, conv_1, 16, 16, layer_num[0], 1)
basic_2 = layer_warp(basicblock_changeC, basic_1, 16, 32, layer_num[1], 2)
basic_3 = layer_warp(basicblock_changeC, basic_2, 32, 64, layer_num[2], 2)
basic_4 = layer_warp(basicblock_changeC, basic_3, 64, 128, layer_num[3], 2)
predict = paddle.layer.img_pool(input=basic_4, pool_size=4, stride=1, pool_type=paddle.pooling.Avg())
predict_ = paddle.layer.fc(input=predict, size=num_class, act=paddle.activation.Softmax())
return predict_
input_image = paddle.layer.data(name="image", type=paddle.data_type.dense_vector(data_dim))
images, label = input_image, paddle.layer.data(name="label", type=paddle.data_type.integer_value(num_class))
predict = resNet_(images)
cost = paddle.layer.classification_cost(input=predict, label=label)
parameters = paddle.parameters.create(cost)
optimizer = paddle.optimizer.Momentum(momentum=0.9, regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
learning_rate=0.1 / 128.0, learning_rate_decay_a=0.1,
learning_rate_decay_b=50000 * 100, learning_rate_schedule='discexp')
trainer = paddle.trainer.SGD(cost=cost, parameters=parameters, update_equation=optimizer)
reader = paddle.batch(paddle.reader.shuffle(paddle.dataset.cifar.train10(), buf_size=100), batch_size=16)
feeding = {'image': 0, 'label': 1}
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 10 == 0:
logger.info('pass_id:' + str(event.pass_id) + ',batch_id:' + str(event.batch_id) + ',train_cost:' + str(
event.cost) + ',s:' + str(event.metrics))
if isinstance(event, paddle.event.EndPass):
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
trainer.save_parameter_to_tar(f)
result = trainer.test(
reader=paddle.batch(
paddle.dataset.cifar.test10(), batch_size=64),
feeding=feeding)
logger.info('pass_id:' + str(event.pass_id) + ',s:' + str(result.metrics))
trainer.train(reader=reader, num_passes=20, event_handler=event_handler, feeding=feeding)
logger.removeHandler(file_handler)
logger.removeHandler(console_handler)
# Teaching Machines to Read and Comprehend
-----
The files contained in this example directory and what they do:
```
├── README.md            # this tutorial in markdown
├── images               # images used in this tutorial
│   ├── attentive_reader.png
│   ├── impatient_reader.png
│   ├── two_layer_lstm.png
│   └── machine_reader.png
├── train.py             # training script
├── attentive_reader.py  # implementation of the Attentive Reader network
├── impatient_reader.py  # implementation of the Impatient Reader network
└── two_layer_lstm.py    # implementation of the two-layer LSTM
```
## Background
---
This tutorial reproduces the main model architectures from the NIPS 2015 paper "Teaching Machines to Read and Comprehend". After working through it, readers should understand the main architectures used in the paper and be able to carry out the core reproduction work. The paper's contributions are:
* It builds large-scale datasets for reading comprehension, addressing the lack of training data
* It applies neural network models to the machine reading comprehension problem
Since building the dataset does not involve the neural network models and is mainly an NLP data-construction problem, this tutorial focuses on explaining and reproducing the network models.
## Model Overview
---
The paper presents three reading models: the two-layer LSTM, the Attentive Reader, and the Impatient Reader. Their construction is described in turn below.
### Two-Layer LSTM
The most basic model, and an important building block of the other two, is the Deep LSTM model shown below:
![fig1](./img/two_layer_lstm.png)
Its core idea is to encode "query||document" or "document||query" with a two-layer LSTM and then use the resulting representation for the subsequent matching.
### Attentive Reader
The Attentive Reader model is shown below:
![fig2](./img/attentive_reader.png)
This model encodes the document and the query separately and then combines them with an additional feed-forward network.
The document is encoded with a bidirectional LSTM: each token is represented by the concatenation of the forward and backward hidden states, and the document as a whole is a weighted average of all token representations, where the weights are the attention; the query is represented by concatenating the final states of the two directions of its bidirectional LSTM.
The document and query representations are then used for the downstream prediction.
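In this example the attention weighting itself is provided by ```paddle.networks.simple_attention```; the core step inside the decoder of [attentive_reader.py](./attentive_reader.py) is:
```
# enc_vec / enc_proj come from the encoder; decoder_mem is the current decoder state
context = paddle.networks.simple_attention(
    encoded_sequence=enc_vec,
    encoded_proj=enc_proj,
    decoder_state=decoder_mem)
```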
### Impatient Reader
The Impatient Reader is quite similar to the Attentive Reader, but it recomputes the attention distribution over the document each time a query token is read in, so only the query token length needs to be set. Its architecture is shown below:
![fig3](./img/impatient_reader.png)
## Results
---
After the entities are anonymized, the model reads the context and produces the answer to the query, i.e. it performs machine reading. The behaviour is illustrated below:
![fig4](./img/machine_reader.png)
## Reproducing the Models
----
This tutorial focuses on reproducing the two-layer LSTM, the most important component of the paper; the Attentive Reader and the Impatient Reader can then be obtained by adjusting the parameters of the same construction. The two-layer LSTM model is built in [two_layer_lstm.py](./two_layer_lstm.py).
### Building the Network Function
First define the network construction function. Its inputs are the source dictionary size, the target dictionary size, whether the model has already been generated, the beam size, and the maximum reading length.
```
def two_layer_lstm_net(source_dict_dim,
                       target_dict_dim,
                       is_generating,
                       beam_size=3,
                       max_length=250):
```
### Basic Network Parameters
The word embedding dimension is set to 512, and the encoder and decoder hidden sizes match it:
```
### network structure definition
word_vector_dim = 512  # word embedding dimension
decoder_size = 512  # decoder hidden size
encoder_size = 512  # encoder hidden size
```
### Encoder: Reading in the Query
The query is fed into the network and encoded:
```
src_word_id = paddle.layer.data(
    name='source_words',
    type=paddle.data_type.integer_value_sequence(source_dict_dim))
src_embedding = paddle.layer.embedding(
    input=src_word_id, size=word_vector_dim)
src_forward = paddle.networks.simple_gru(
    input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
    input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
```
### Feeding the Result into the Decoder
The encoder output is then passed on to the decoder:
```
encoded_proj = paddle.layer.fc(
    act=paddle.activation.Linear(),
    size=decoder_size,
    bias_attr=False,
    input=encoded_vector)
backward_first = paddle.layer.first_seq(input=src_backward)
decoder_boot = paddle.layer.fc(
    size=decoder_size,
    act=paddle.activation.Tanh(),
    bias_attr=False,
    input=backward_first)
```
Next, if the model has not been generated yet, i.e. ```is_generating``` is false, the model is trained and a parameter tar file is produced; otherwise, if a model tar file already exists, it is read in directly.
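The corresponding branch inside ```two_layer_lstm_net```, condensed from [two_layer_lstm.py](./two_layer_lstm.py), looks like this:
```
if not is_generating:
    # training: decode against the target sequence and return a classification cost
    cost = paddle.layer.classification_cost(input=decoder, label=lbl)
    return cost
else:
    # generation: decode with beam search using the trained embedding
    answer_gen = paddle.layer.beam_search(
        name=decoder_group_name,
        step=two_layer_lstm_decoder,
        input=group_inputs,
        bos_id=0,
        eos_id=1,
        beam_size=beam_size,
        max_length=max_length)
    return answer_gen
```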
## Model Training
---
Model training is implemented in [train.py](./train.py) and consists of the following parts:
### Saving the Model
The trained parameters are packed into a tar file so they can be reused later:
```
def save_model(trainer, parameters, save_path):
    with open(save_path, 'w') as f:
        trainer.save_parameter_to_tar(f)
```
### Initialization
Training here uses the CPU, and the model is initially marked as not yet generated:
```
paddle.init(use_gpu=False, trainer_count=1)
is_generating = False
# dictionary size
dict_size = 30000
source_dict_dim = target_dict_dim = dict_size
```
### Optimizer and Trainer
The Adam optimizer provided by Paddle is used for optimization:
```
optimizer = paddle.optimizer.Adam(
    learning_rate=5e-5,
    regularization=paddle.optimizer.L2Regularization(rate=8e-4))
cost = two_layer_lstm_net(source_dict_dim, target_dict_dim, is_generating)
parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(
    cost=cost, parameters=parameters, update_equation=optimizer)
```
### Dataset
Because the dataset released with the paper is very large, this example uses Paddle's built-in WMT-14 dataset as a simplified substitute:
```
src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
gen_sen_idx = np.where(beam_result[1] == -1)[0]
assert len(gen_sen_idx) == len(gen_data) * beam_size
```
### Generating Answers
Finally, the decoder is used to produce the answers for the reading step:
```
# generate the answers
start_pos, end_pos = 1, 0
for i, sample in enumerate(gen_data):
    print(
        " ".join([src_dict[w] for w in sample[0][1:-1]]))
    for j in xrange(beam_size):
        end_pos = gen_sen_idx[i * beam_size + j]
        print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
            trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
        start_pos = end_pos + 2
    print("\n")
```
### Running Training
Run the following command in a terminal:
```
python train.py
```
The terminal will then show output such as:
```
Pass 0, Batch 0, Cost 306.693146, {'classification_error_evaluator': 1.0}
.........
Pass 0, Batch 10, Cost 211.242233, {'classification_error_evaluator': 0.9268292784690857}
.........
Pass 0, Batch 20, Cost 203.324371, {'classification_error_evaluator': 0.8860759735107422}
.........
Pass 0, Batch 30, Cost 351.998260, {'classification_error_evaluator': 0.8540145754814148}
```
## Running Inference
Run the following command in a terminal:
```
python infer.py
```
This generates the queries and prints the corresponding answers.
## Extensions
---
Readers can download the original dataset from the paper [3] to extend this work further.
## References
---
[1] http://www.paddlepaddle.org
[2] http://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend.pdf
[3] https://github.com/deepmind/rc-data/
# -*- coding:utf-8 -*-
import sys, os
import numpy as np
import paddle.v2 as paddle
def attentive_reader_net(source_dict_dim,
target_dict_dim,
is_generating,
beam_size=3,
max_length=250):
### 网络结构定义
word_vector_dim = 256 # 词向量维度
decoder_size = 256 # decoder隐藏单元的维度
encoder_size = 256 # encoder隐藏单元维度
#### Encoder 输入疑问句
src_word_id = paddle.layer.data(
name='source_words',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
src_embedding = paddle.layer.embedding(
input=src_word_id, size=word_vector_dim)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
#### Decoder 输出回答
encoded_proj = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size,
bias_attr=False,
input=encoded_vector)
backward_first = paddle.layer.first_seq(input=src_backward)
decoder_boot = paddle.layer.fc(
size=decoder_size,
act=paddle.activation.Tanh(),
bias_attr=False,
input=backward_first)
def attentive_decoder(enc_vec, enc_proj, current_word):
decoder_mem = paddle.layer.memory(
name='attentive_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
decoder_inputs = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size * 3,
bias_attr=False,
input=[context, current_word],
layer_attr=paddle.attr.ExtraLayerAttribute(
error_clipping_threshold=100.0))
attentive_step = paddle.layer.gru_step(
name='attentive_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
out = paddle.layer.fc(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=attentive_step)
return out
decoder_group_name = 'decoder_group'
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
if not is_generating:
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='attentive_reader',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_attentive_reader_embedding'))
group_inputs.append(trg_embedding)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=attentive_decoder,
input=group_inputs)
lbl = paddle.layer.data(
name='attentive_reader_next_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim))
cost = paddle.layer.classification_cost(input=decoder, label=lbl)
return cost
else:
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_attentive_reader_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
answer_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=attentive_decoder,
input=group_inputs,
bos_id=0,
eos_id=1,
beam_size=beam_size,
max_length=max_length)
return answer_gen
# -*- coding:utf-8 -*-
import sys, os
import numpy as np
import paddle.v2 as paddle
def impatient_reader_net(source_dict_dim,
target_dict_dim,
is_generating,
beam_size=3,
max_length=250):
### 网络结构定义
word_vector_dim = 128 # 词向量维度
decoder_size = 128 # decoder隐藏单元的维度
encoder_size = 128 # encoder隐藏单元维度
#### Encoder 输入疑问句
src_word_id = paddle.layer.data(
name='source_words',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
src_embedding = paddle.layer.embedding(
input=src_word_id, size=word_vector_dim)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
#### Decoder 输出回答
encoded_proj = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size,
bias_attr=False,
input=encoded_vector)
backward_first = paddle.layer.first_seq(input=src_backward)
decoder_boot = paddle.layer.fc(
size=decoder_size,
act=paddle.activation.Tanh(),
bias_attr=False,
input=backward_first)
def impatient_decoder(enc_vec, enc_proj, current_word):
decoder_mem = paddle.layer.memory(
name='impatient_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
decoder_inputs = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size * 3,
bias_attr=False,
input=[context, current_word],
layer_attr=paddle.attr.ExtraLayerAttribute(
error_clipping_threshold=100.0))
impatient_step = paddle.layer.gru_step(
name='impatient_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
out = paddle.layer.fc(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=impatient_step)
return out
decoder_group_name = 'decoder_group'
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
if not is_generating:
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='impatient_reader',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_impatient_reader_embedding'))
group_inputs.append(trg_embedding)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=impatient_decoder,
input=group_inputs)
lbl = paddle.layer.data(
name='impatient_reader_next_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim))
cost = paddle.layer.classification_cost(input=decoder, label=lbl)
return cost
else:
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_impatient_reader_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
answer_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=impatient_decoder,
input=group_inputs,
bos_id=0,
eos_id=1,
beam_size=beam_size,
max_length=max_length)
return answer_gen
# -*- coding:utf-8 -*-
import sys, os
import numpy as np
import paddle.v2 as paddle
from attentive_reader import attentive_reader_net
from impatient_reader import impatient_reader_net
from two_layer_lstm import two_layer_lstm_net
with_gpu = os.getenv('WITH_GPU', '0') != '0'
def save_model(trainer, parameters, save_path):
with open(save_path, 'w') as f:
trainer.save_parameter_to_tar(f)
def main():
paddle.init(use_gpu=True, trainer_count=1)
is_generating = True
# 定义dict的维度
dict_size = 30000
source_dict_dim = target_dict_dim = dict_size
# 训练网络
if not is_generating:
# 定义方法并优化训练器
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
cost = two_layer_lstm_net(source_dict_dim, target_dict_dim, is_generating)
parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(
cost=cost, parameters=parameters, update_equation=optimizer)
# 设置数据集
wmt14_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.wmt14.train(dict_size), buf_size=8192),
batch_size=4)
# 设置event_handler
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 10 == 0:
print("\nPass %d, Batch %d, Cost %f, %s" %
(event.pass_id, event.batch_id, event.cost,
event.metrics))
else:
sys.stdout.write('.')
sys.stdout.flush()
if not event.batch_id % 10:
save_path = 'params_pass_%05d_batch_%05d.tar' % (
event.pass_id, event.batch_id)
save_model(trainer, parameters, save_path)
if isinstance(event, paddle.event.EndPass):
# save parameters
save_path = 'params_pass_%05d.tar' % (event.pass_id)
save_model(trainer, parameters, save_path)
# 开始训练
trainer.train(
reader=wmt14_reader, event_handler=event_handler, num_passes=2)
# 生成问句并寻找回答
else:
# 只针对头三个问句
gen_data = []
gen_num = 3
for item in paddle.dataset.wmt14.gen(dict_size)():
gen_data.append([item[0]])
if len(gen_data) == gen_num:
break
beam_size = 3
beam_gen = two_layer_lstm_net(source_dict_dim, target_dict_dim,
is_generating, beam_size)
# 设置训练好的模型 bleu = 26.92
parameters = paddle.dataset.wmt14.model()
# prob 是回答准确的概率
beam_result = paddle.infer(
output_layer=beam_gen,
parameters=parameters,
input=gen_data,
field=['prob', 'id'])
# 载入数据集
src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
gen_sen_idx = np.where(beam_result[1] == -1)[0]
assert len(gen_sen_idx) == len(gen_data) * beam_size
# 生成回答
start_pos, end_pos = 1, 0
for i, sample in enumerate(gen_data):
print(
" ".join([src_dict[w] for w in sample[0][1:-1]])
)
for j in xrange(beam_size):
end_pos = gen_sen_idx[i * beam_size + j]
print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
start_pos = end_pos + 2
print("\n")
if __name__ == '__main__':
main()
# -*- coding:utf-8 -*-
import sys, os
import numpy as np
import paddle.v2 as paddle
from attentive_reader import attentive_reader_net
from impatient_reader import impatient_reader_net
from two_layer_lstm import two_layer_lstm_net
with_gpu = os.getenv('WITH_GPU', '0') != '0'
def save_model(trainer, parameters, save_path):
with open(save_path, 'w') as f:
trainer.save_parameter_to_tar(f)
def main():
paddle.init(use_gpu=True, trainer_count=4)
is_generating = False
# 定义dict的维度
dict_size = 30000
source_dict_dim = target_dict_dim = dict_size
# 训练网络
if not is_generating:
# 定义方法并优化训练器
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
cost = two_layer_lstm_net(source_dict_dim, target_dict_dim, is_generating)
parameters = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(
cost=cost, parameters=parameters, update_equation=optimizer)
# 设置数据集
wmt14_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.wmt14.train(dict_size), buf_size=8192),
batch_size=4)
# 设置event_handler
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 10 == 0:
print("\nPass %d, Batch %d, Cost %f, %s" %
(event.pass_id, event.batch_id, event.cost,
event.metrics))
else:
sys.stdout.write('.')
sys.stdout.flush()
if not event.batch_id % 10:
save_path = 'params_pass_%05d_batch_%05d.tar' % (
event.pass_id, event.batch_id)
save_model(trainer, parameters, save_path)
if isinstance(event, paddle.event.EndPass):
# save parameters
save_path = 'params_pass_%05d.tar' % (event.pass_id)
save_model(trainer, parameters, save_path)
# 开始训练
trainer.train(
reader=wmt14_reader, event_handler=event_handler, num_passes=2)
# 生成问句并寻找回答
else:
# 只针对头三个问句
gen_data = []
gen_num = 3
for item in paddle.dataset.wmt14.gen(dict_size)():
gen_data.append([item[0]])
if len(gen_data) == gen_num:
break
beam_size = 3
beam_gen = two_layer_lstm_net(source_dict_dim, target_dict_dim,
is_generating, beam_size)
# 设置训练好的模型 bleu = 26.92
parameters = paddle.dataset.wmt14.model()
# prob 是回答准确的概率
beam_result = paddle.infer(
output_layer=beam_gen,
parameters=parameters,
input=gen_data,
field=['prob', 'id'])
# 载入数据集
src_dict, trg_dict = paddle.dataset.wmt14.get_dict(dict_size)
gen_sen_idx = np.where(beam_result[1] == -1)[0]
assert len(gen_sen_idx) == len(gen_data) * beam_size
# 生成回答
start_pos, end_pos = 1, 0
for i, sample in enumerate(gen_data):
print(
" ".join([src_dict[w] for w in sample[0][1:-1]])
)
for j in xrange(beam_size):
end_pos = gen_sen_idx[i * beam_size + j]
print("%.4f\t%s" % (beam_result[0][i][j], " ".join(
trg_dict[w] for w in beam_result[1][start_pos:end_pos])))
start_pos = end_pos + 2
print("\n")
if __name__ == '__main__':
main()
# -*- coding:utf-8 -*-
import sys, os
import numpy as np
import paddle.v2 as paddle
def two_layer_lstm_net(source_dict_dim,
target_dict_dim,
is_generating,
beam_size=3,
max_length=250):
### 网络结构定义
word_vector_dim = 512 # 词向量维度
decoder_size = 512 # decoder隐藏单元的维度
encoder_size = 512 # encoder隐藏单元维度
#### Encoder 输入疑问句
src_word_id = paddle.layer.data(
name='source_words',
type=paddle.data_type.integer_value_sequence(source_dict_dim))
src_embedding = paddle.layer.embedding(
input=src_word_id, size=word_vector_dim)
src_forward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size)
src_backward = paddle.networks.simple_gru(
input=src_embedding, size=encoder_size, reverse=True)
encoded_vector = paddle.layer.concat(input=[src_forward, src_backward])
#### Decoder 输出回答
encoded_proj = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size,
bias_attr=False,
input=encoded_vector)
backward_first = paddle.layer.first_seq(input=src_backward)
decoder_boot = paddle.layer.fc(
size=decoder_size,
act=paddle.activation.Tanh(),
bias_attr=False,
input=backward_first)
def two_layer_lstm_decoder(enc_vec, enc_proj, current_word):
decoder_mem = paddle.layer.memory(
name='two_layer_lstm_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
decoder_inputs = paddle.layer.fc(
act=paddle.activation.Linear(),
size=decoder_size * 3,
bias_attr=False,
input=[context, current_word],
layer_attr=paddle.attr.ExtraLayerAttribute(
error_clipping_threshold=100.0))
two_layer_lstm_step = paddle.layer.gru_step(
name='two_layer_lstm_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
out = paddle.layer.fc(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=two_layer_lstm_step)
return out
decoder_group_name = 'decoder_group'
group_input1 = paddle.layer.StaticInput(input=encoded_vector)
group_input2 = paddle.layer.StaticInput(input=encoded_proj)
group_inputs = [group_input1, group_input2]
if not is_generating:
trg_embedding = paddle.layer.embedding(
input=paddle.layer.data(
name='two_layer_lstm_reader',
type=paddle.data_type.integer_value_sequence(target_dict_dim)),
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_two_layer_lstm_reader_embedding'))
group_inputs.append(trg_embedding)
decoder = paddle.layer.recurrent_group(
name=decoder_group_name,
step=two_layer_lstm_decoder,
input=group_inputs)
lbl = paddle.layer.data(
name='two_layer_lstm_reader_next_word',
type=paddle.data_type.integer_value_sequence(target_dict_dim))
cost = paddle.layer.classification_cost(input=decoder, label=lbl)
return cost
else:
trg_embedding = paddle.layer.GeneratedInput(
size=target_dict_dim,
embedding_name='_two_layer_lstm_reader_embedding',
embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
answer_gen = paddle.layer.beam_search(
name=decoder_group_name,
step=two_layer_lstm_decoder,
input=group_inputs,
bos_id=0,
eos_id=1,
beam_size=beam_size,
max_length=max_length)
return answer_gen
```
Pass id: 1, batch id: 5900, cost: 1.094884, pass_acc: 0.781732
Pass id: 1, batch id: 6000, cost: 0.867171, pass_acc: 0.782620
Pass id: 1, batch id: 6100, cost: 0.190018, pass_acc: 0.782945
Pass id: 1, batch id: 6200, cost: 0.225717, pass_acc: 0.783059
Pass id: 1, test_acc: 0.820560
Pass id: 2, batch id: 100, cost: 0.172085, pass_acc: 0.836634
Pass id: 2, batch id: 200, cost: 0.425561, pass_acc: 0.825871
Pass id: 2, batch id: 300, cost: 0.201359, pass_acc: 0.833887
···
```
class TrainConfig(object):
# Whether to use GPU in training or not.
use_gpu = True
# The training batch size.
batch_size = 4
# The epoch number.
num_passes = 30
# The global learning rate.
learning_rate = 0.01
# Training log will be printed every log_period.
log_period = 100
class TestConfig(object):
# Whether to use GPU in training or not.
use_gpu = True
# The training batch size.
batch_size = 4
# The epoch number.
num_passes = 30
# The global learning rate.
learning_rate = 0.01
# Training log will be printed every log_period.
log_period = 100
#!/bin/sh
wget http://ai.stanford.edu/%7Eamaas/data/sentiment/aclImdb_v1.tar.gz; tar zxf aclImdb_v1.tar.gz
import numpy as np
import sys
import os
import argparse
import time
import paddle.v2 as paddle
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
from config import TestConfig as conf
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--dict_path',
type=str,
required=True,
help="Path of the word dictionary.")
return parser.parse_args()
# Define to_lodtensor function to process the sequential data.
def to_lodtensor(data, place):
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
# Load the dictionary.
def load_vocab(filename):
vocab = {}
with open(filename) as f:
for idx, line in enumerate(f):
vocab[line.strip()] = idx
return vocab
# Define the convolution model.
def conv_net(dict_dim,
window_size=3,
emb_dim=128,
num_filters=128,
fc0_dim=96,
class_dim=2):
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
emb = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim])
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=num_filters,
filter_size=window_size,
act="tanh",
pool_type="max")
fc_0 = fluid.layers.fc(input=[conv_3], size=fc0_dim)
prediction = fluid.layers.fc(input=[fc_0], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
return data, label, prediction, avg_cost
def main(dict_path):
word_dict = load_vocab(dict_path)
word_dict["<unk>"] = len(word_dict)
dict_dim = len(word_dict)
print("The dictionary size is : %d" % dict_dim)
data, label, prediction, avg_cost = conv_net(dict_dim)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=conf.learning_rate)
sgd_optimizer.minimize(avg_cost)
# The training data set.
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.train(word_dict), buf_size=51200),
batch_size=conf.batch_size)
# The testing data set.
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.test(word_dict), buf_size=51200),
batch_size=conf.batch_size)
if conf.use_gpu:
place = fluid.CUDAPlace(0)
else:
place = fluid.CPUPlace()
exe = fluid.Executor(place)
feeder = fluid.DataFeeder(feed_list=[data, label], place=place)
exe.run(fluid.default_startup_program())
print("Done Inferring.")
if __name__ == '__main__':
args = parse_args()
with profiler.profiler("GPU", 'total') as prof:
main(args.dict_path)
import numpy as np
import sys
import os
import argparse
import time
import paddle.v2 as paddle
import paddle.fluid as fluid
from config import TrainConfig as conf
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--dict_path',
type=str,
required=True,
help="Path of the word dictionary.")
return parser.parse_args()
# Define to_lodtensor function to process the sequential data.
def to_lodtensor(data, place):
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
# Load the dictionary.
def load_vocab(filename):
vocab = {}
with open(filename) as f:
for idx, line in enumerate(f):
vocab[line.strip()] = idx
return vocab
# Define the convolution model.
def conv_net(dict_dim,
window_size=3,
emb_dim=128,
num_filters=128,
fc0_dim=96,
class_dim=2):
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
emb = fluid.layers.embedding(input=data, size=[dict_dim, emb_dim])
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=num_filters,
filter_size=window_size,
act="tanh",
pool_type="max")
fc_0 = fluid.layers.fc(input=[conv_3], size=fc0_dim)
prediction = fluid.layers.fc(input=[fc_0], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
return data, label, prediction, avg_cost
def main(dict_path):
word_dict = load_vocab(dict_path)
word_dict["<unk>"] = len(word_dict)
dict_dim = len(word_dict)
print("The dictionary size is : %d" % dict_dim)
data, label, prediction, avg_cost = conv_net(dict_dim)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=conf.learning_rate)
sgd_optimizer.minimize(avg_cost)
batch_size_var = fluid.layers.create_tensor(dtype='int64')
batch_acc_var = fluid.layers.accuracy(
input=prediction, label=label, total=batch_size_var)
inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program):
inference_program = fluid.io.get_inference_program(
target_vars=[batch_acc_var, batch_size_var])
# The training data set.
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.train(word_dict), buf_size=51200),
batch_size=conf.batch_size)
# The testing data set.
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.imdb.test(word_dict), buf_size=51200),
batch_size=conf.batch_size)
if conf.use_gpu:
place = fluid.CUDAPlace(0)
else:
place = fluid.CPUPlace()
exe = fluid.Executor(place)
feeder = fluid.DataFeeder(feed_list=[data, label], place=place)
exe.run(fluid.default_startup_program())
train_pass_acc_evaluator = fluid.average.WeightedAverage()
test_pass_acc_evaluator = fluid.average.WeightedAverage()
def test(exe):
test_pass_acc_evaluator.reset()
for batch_id, data in enumerate(test_reader()):
input_seq = to_lodtensor(map(lambda x: x[0], data), place)
y_data = np.array(map(lambda x: x[1], data)).astype("int64")
y_data = y_data.reshape([-1, 1])
b_acc, b_size = exe.run(inference_program,
feed={"words": input_seq,
"label": y_data},
fetch_list=[batch_acc_var, batch_size_var])
test_pass_acc_evaluator.add(value=b_acc, weight=b_size)
test_acc = test_pass_acc_evaluator.eval()
return test_acc
total_time = 0.
for pass_id in xrange(conf.num_passes):
train_pass_acc_evaluator.reset()
start_time = time.time()
for batch_id, data in enumerate(train_reader()):
cost_val, acc_val, size_val = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost, batch_acc_var, batch_size_var])
train_pass_acc_evaluator.add(value=acc_val, weight=size_val)
if batch_id and batch_id % conf.log_period == 0:
print("Pass id: %d, batch id: %d, cost: %f, pass_acc: %f" %
(pass_id, batch_id, cost_val,
train_pass_acc_evaluator.eval()))
end_time = time.time()
total_time += (end_time - start_time)
pass_test_acc = test(exe)
print("Pass id: %d, test_acc: %f" % (pass_id, pass_test_acc))
print("Total train time: %f" % (total_time))
if __name__ == '__main__':
args = parse_args()
main(args.dict_path)