Unverified commit 4026558a authored by guochaorong, committed by GitHub

Merge pull request #1243 from guochaorong/remove_unrelease_models

Remove unreleased models
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
## Deep Automatic Speech Recognition
### Introduction
TBD
### Installation
#### Kaldi
The decoder depends on [Kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. Then
```shell
export KALDI_ROOT=<absolute path to kaldi>
```
#### Decoder
```shell
git clone https://github.com/PaddlePaddle/models.git
cd models/fluid/DeepASR/decoder
sh setup.sh
```
### Data preprocessing
TBD
### Training
TBD
### Inference & Decoding
TBD
### Question and Contribution
TBD
Running the sample code in this directory requires PaddlePaddle v0.14 or higher. If your PaddlePaddle installation is older than this requirement, please update it following the instructions in the [installation guide](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
DeepASR (Deep Automatic Speech Recognition) is a speech recognition system based on PaddlePaddle Fluid and [Kaldi](http://www.kaldi-asr.org). It uses the Fluid framework to configure and train the acoustic model for speech recognition and integrates Kaldi's decoder, so that users who are already familiar with Kaldi can train acoustic models quickly and at large scale, while relying on Kaldi for the complex speech data preprocessing and the final decoding.
### Contents
- [Model overview](#model-overview)
- [Installation](#installation)
- [Data preprocessing](#data-reprocessing)
- [Training](#training)
- [Profiling the training process](#perf-profiling)
- [Inference and decoding](#infer-decoding)
- [Scoring the error rate](#scoring-error-rate)
- [Aishell example](#aishell-example)
- [Contributing more examples](#how-to-contrib)
### Model overview
The acoustic model of DeepASR consists of a single convolutional layer followed by a stack of LSTMP layers: the convolution performs preliminary feature extraction, the stacked LSTMP layers model the temporal dependencies, and the loss function is cross entropy. [LSTMP](https://arxiv.org/abs/1402.1128) (LSTM with recurrent projection layer) extends the conventional LSTM by adding a projection layer that maps the hidden state to a lower dimension before feeding it to the next time step, which greatly reduces the parameter count and computational complexity of the LSTM while also improving its performance.
<p align="center">
<img src="images/lstmp.png" height=240 width=480 hspace='10'/> <br />
Figure 1. Topology of LSTMP
</p>
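The LSTMP layers are built with Fluid's `dynamic_lstmp` operator, the same call used by the full model definition in `model_utils/model.py`. The snippet below is only a minimal sketch of one such layer; the sizes `hidden_dim` and `proj_dim` are illustrative placeholders, not the released configuration.
```python
import paddle.fluid as fluid

hidden_dim, proj_dim = 1024, 512  # illustrative sizes, not a recommended setting

# A variable-length (LoD) sequence of 120-dim acoustic frames.
feature = fluid.layers.data(
    name="feature", shape=[-1, 120], dtype="float32", lod_level=1)
# dynamic_lstmp expects its input already projected to 4 * hidden_dim
# (one slice per LSTM gate), hence the preceding fc layer.
fc = fluid.layers.fc(input=feature, size=hidden_dim * 4, bias_attr=None)
proj, cell = fluid.layers.dynamic_lstmp(
    input=fc,
    size=hidden_dim * 4,
    proj_size=proj_dim,
    use_peepholes=True,
    cell_activation="tanh",
    proj_activation="tanh")
```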
### Installation
#### Installing and configuring Kaldi
The decoder used in DeepASR's decoding stage depends on [Kaldi](https://github.com/kaldi-asr/kaldi). If Kaldi is not present in your environment, please `git clone` its source code, install it following its instructions, and finally set the environment variable `KALDI_ROOT`
```shell
export KALDI_ROOT=<absolute path to kaldi>
```
#### Installing the decoder
Enter the directory containing the decoder source code
```shell
cd models/fluid/DeepASR/decoder
```
Run the setup script
```shell
sh setup.sh
```
Once the compilation completes, the decoder is installed successfully.
### Data preprocessing
Follow [Kaldi's data preparation process](http://kaldi-asr.org/doc/data_prep.html) to extract the audio features and align the labels.
### Training the acoustic model
The acoustic model can be trained in either CPU or GPU mode. For example, to train in GPU mode:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u train.py \
--train_feature_lst train_feature.lst \
--train_label_lst train_label.lst \
--val_feature_lst val_feature.lst \
--val_label_lst val_label.lst \
--mean_var global_mean_var \
--parallel
```
Here `train_feature.lst` and `train_label.lst` are the feature list file and the label list file of the training set; similarly, `val_feature.lst` and `val_label.lst` are the list files of the validation set. In an actual training run, important parameters such as the size of the modeling units and the learning rate must be set correctly. For an explanation of these parameters, run
```shell
python train.py --help
```
to get more information.
### Profiling the training process
With the profiler tool provided by Fluid, the training process can be profiled to obtain the operator-level execution time of the network:
```shell
CUDA_VISIBLE_DEVICES=0 python -u tools/profile.py \
--train_feature_lst train_feature.lst \
--train_label_lst train_label.lst \
--val_feature_lst val_feature.lst \
--val_label_lst val_label.lst \
--mean_var global_mean_var
```
### Inference and decoding
After the acoustic model has been sufficiently trained, a checkpoint saved during training can be used to decode input audio and produce the speech-to-text recognition result:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python -u infer_by_ckpt.py \
--batch_size 96 \
--checkpoint deep_asr.pass_1.checkpoint \
--infer_feature_lst test_feature.lst \
--infer_label_lst test_label.lst \
--mean_var global_mean_var \
--parallel
```
### Scoring the error rate
Word error rate (WER) and character error rate (CER) are the metrics commonly used to evaluate speech recognition systems. DeepASR also implements the corresponding scoring tool, which is run as
```
python score_error_rate.py --error_rate_type cer --ref ref.txt --hyp decoding.txt
```
The argument `error_rate_type` specifies which type of error rate to measure, i.e. WER or CER; `ref.txt` and `decoding.txt` are the reference text and the actual decoding output respectively, and they share the same format:
```
key1 text1
key2 text2
key3 text3
...
```
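Equivalently, the error rate can be computed programmatically with the `tools/error_rate.py` module in this directory. The snippet below is a small usage sketch; the two sentences are made-up examples, not data from the Aishell set.
```python
# -*- coding: utf-8 -*-
from tools.error_rate import char_errors, cer

ref = u"今年 土地 收入 预计 近 四万亿 元"   # reference transcription (made up here)
hyp = u"今年 土地 收入 预计 近 四亿 元"     # hypothetical decoding output

# remove_space=True strips the word separators, matching how score_error_rate.py
# scores CER.
errors, ref_len = char_errors(ref, hyp, remove_space=True)
print("edit distance: %d, reference length: %d" % (int(errors), ref_len))
print("CER = %.4f" % cer(ref, hyp, remove_space=True))
```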
### Aishell example
This section uses the [Aishell dataset](http://www.aishelltech.com/kysjcp) as an example to walk through the whole pipeline from data preprocessing to decoding output. Aishell is an open Mandarin Chinese speech corpus released by Beijing Shell Shell Technology; it contains 178 hours of speech recorded by 400 speakers from different accent regions, and the raw data can be obtained from [openslr](http://www.openslr.org/33). To simplify the process, a preprocessed version of the dataset is provided for download:
```
cd examples/aishell
sh prepare_data.sh
```
The download includes the training data for the acoustic model as well as the auxiliary files used in the decoding process. Once the data is downloaded, the training process can be profiled before training starts:
```
sh profile.sh
```
Run the training:
```
sh train.sh
```
Training uses 4 GPUs by default. In practice, parameters such as `batch_size` and the learning rate can be adjusted according to the number of available GPUs and the amount of GPU memory. Figure 2 shows the typical evolution of the loss and accuracy during training.
<p align="center">
<img src="images/learning_curve.png" height=480 width=640 hspace='10'/> <br />
Figure 2. Learning curves of the acoustic model trained on the Aishell dataset
</p>
After the model is trained, run inference to recognize the text in the test-set audio:
```
sh infer_by_ckpt.sh
```
This step covers two key stages: prediction by the acoustic model and decoding by the decoder. Below is a sample of the decoding output:
```
...
BAC009S0764W0239 十一 五 期间 我 国 累计 境外 投资 七千亿 美元
BAC009S0765W0140 在 了解 送 方 的 资产 情况 与 需求 之后
BAC009S0915W0291 这 对 苹果 来说 不 是 件 容易 的 事 儿
BAC009S0769W0159 今年 土地 收入 预计 近 四万亿 元
BAC009S0907W0451 由 浦东 商店 作为 掩护
BAC009S0768W0128 土地 交易 可能 随着 供应 淡季 的 到来 而 降温
...
```
Each line corresponds to one output: it starts with the key of the audio sample, followed by the decoded Chinese text separated into words. After decoding completes, run the script to evaluate the character error rate (CER):
```
sh score_cer.sh
```
Its output looks similar to the following:
```
Error rate[cer] = 0.101971 (10683/104765),
total 7176 sentences in hyp, 0 not presented in ref.
```
With an acoustic model trained for about 20 epochs, a CER of roughly 10% can be obtained on the Aishell test set.
### Contributing more examples
DeepASR currently only provides the Aishell example. We welcome users to test the complete training pipeline on more datasets and contribute them to this project.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
16.2845556399 11.6891798673
17.21509949 12.3788567902
18.1143704548 14.9912618017
19.2335963752 18.5419556172
19.9266772451 21.2768220522
19.8245737202 21.2347210705
19.5432940972 20.2784036567
19.4631271754 20.2934452329
19.3929919324 20.457971868
19.2924788362 20.3626439234
18.9207244502 19.9196569759
18.7202605641 19.5920276899
18.4844279398 19.2068349019
18.2670948624 18.8716893824
18.0929628855 18.5439666541
17.8428896026 18.0255891747
17.6646850635 17.473764296
17.4955705896 16.8966859471
17.3706720293 16.4294027467
17.2530867792 16.0514717623
17.1304341172 15.7234699057
17.0038353287 15.4344471514
16.902550309 15.1603287337
16.8375590047 14.9304337826
16.816287853 14.9119310513
16.828838265 15.0930023024
16.8602209498 15.3771992423
16.9101763812 15.6897991789
16.9466065143 15.9364556489
16.9486061956 16.0699417826
16.9041374104 16.0796970272
16.8410093699 16.0111444599
16.7045718836 15.7991985601
16.51128489 15.5208920129
16.3253910608 15.2603181921
16.1297317333 14.9499965958
15.903428372 14.5958280409
15.6131718105 14.2709618
15.1395035533 13.9993939893
14.4298229999 13.3841189151
0.0034970565424 0.246184766149
0.00501284154705 0.238484972472
0.00605942680019 0.269064381708
0.00687266156243 0.319479238011
0.00734065019253 0.371947383205
0.00718807218417 0.384426479694
0.00652195540212 0.384676838281
0.00660416525951 0.395543910317
0.00680202057642 0.400803979681
0.00659144183007 0.393228973031
0.00605294530423 0.385021118038
0.00590452969394 0.361763039625
0.00612315374687 0.346777773373
0.00582354093973 0.335802403976
0.00574556002554 0.320733728218
0.00612254485891 0.310153103033
0.00626733043219 0.299854747445
0.00567398408041 0.293353685493
0.00519236700706 0.287668810947
0.00529581474367 0.281479660772
0.00479019484082 0.27451415777
0.00486381039428 0.266294391154
0.00491126372868 0.258105116126
0.00452105305011 0.252926328298
0.00531483334271 0.250910887373
0.00546572110469 0.253302256977
0.00479544857908 0.258484183394
0.00422106426297 0.264582900173
0.00401824135188 0.268467945623
0.0041705465252 0.269699480291
0.00405239564143 0.270406162975
0.0040059737566 0.270407601782
0.00406426729317 0.267951582656
0.00416613791013 0.264543833042
0.00427847607653 0.26247798891
0.00428050903034 0.259635263243
0.00454842971786 0.255829377617
0.00393747552387 0.253802307025
0.00374143688909 0.251011478787
0.00335475310258 0.236543650856
0.000373194755312 0.0419494800709
0.000230909648678 0.0394102370205
0.000150840015851 0.0414956922398
8.44401840771e-05 0.0460502231327
-6.24759314572e-06 0.0528049937739
-8.82957758148e-05 0.055711244886
1.16795791952e-05 0.0563188428833
-1.68716267856e-05 0.0575232763711
-0.000112625308645 0.057979929947
-0.000122619090002 0.0564126233493
1.73569637319e-05 0.05522573909
6.49872782342e-05 0.0507353361334
4.17746389178e-05 0.0479568131253
5.13884475653e-05 0.0461253238047
1.8860115143e-05 0.0436860476919
-5.64317701105e-05 0.042516381059
-0.000136859948115 0.0413574820205
-7.00847019726e-05 0.0409516370727
-5.39392223336e-05 0.040441504085
-9.24897162815e-05 0.0397800398173
4.7104970622e-05 0.039046286243
6.24805896165e-06 0.0380185986602
-2.35272813418e-05 0.036851063786
5.88344154127e-05 0.0361640489242
-8.39162076993e-05 0.0357639427311
-0.000108702805776 0.0358774639538
3.22013961834e-06 0.0363644530435
9.43501518394e-05 0.0370309934774
0.000134406229423 0.0374972993343
3.84007008533e-05 0.037676222515
3.05989328157e-05 0.0379111939182
9.52201629091e-05 0.0380927209106
0.000102126083729 0.0379925358499
6.98628072264e-05 0.0377276252241
4.55782256339e-05 0.0375165468654
4.76370987786e-05 0.0371482526345
-2.24128832709e-05 0.0366810742947
0.000125621306953 0.036628355271
0.000134568666093 0.0364860461759
0.000159858844464 0.0345583593149
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import unittest
import numpy as np
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
class TestTransMeanVarianceNorm(unittest.TestCase):
"""unit test for TransMeanVarianceNorm
"""
def setUp(self):
self._file_path = "./data_utils/augmentor/tests/data/" \
"global_mean_var_search26kHr"
def test(self):
feature = np.zeros((2, 120), dtype="float32")
feature.fill(1)
trans = trans_mean_variance_norm.TransMeanVarianceNorm(self._file_path)
(feature1, label1, name) = trans.perform_trans((feature, None, None))
(mean, var) = trans.get_mean_var()
feature_flat1 = feature1.flatten()
feature_flat = feature.flatten()
one = np.ones((1), dtype="float32")
for idx, val in enumerate(feature_flat1):
cur_idx = idx % 120
self.assertAlmostEqual(val, (one[0] - mean[cur_idx]) * var[cur_idx])
class TestTransAddDelta(unittest.TestCase):
"""unit test TestTransAddDelta
"""
def test_regress(self):
"""test regress
"""
feature = np.zeros((14, 120), dtype="float32")
feature[0:5, 0:40].fill(1)
feature[0 + 5, 0:40].fill(1)
feature[1 + 5, 0:40].fill(2)
feature[2 + 5, 0:40].fill(3)
feature[3 + 5, 0:40].fill(4)
feature[8:14, 0:40].fill(4)
trans = trans_add_delta.TransAddDelta()
feature = feature.reshape((14 * 120))
trans._regress(feature, 5 * 120, feature, 5 * 120 + 40, 40, 4, 120)
trans._regress(feature, 5 * 120 + 40, feature, 5 * 120 + 80, 40, 4, 120)
feature = feature.reshape((14, 120))
tmp_feature = feature[5:5 + 4, :]
self.assertAlmostEqual(1.0, tmp_feature[0][0])
self.assertAlmostEqual(0.24, tmp_feature[0][119])
self.assertAlmostEqual(2.0, tmp_feature[1][0])
self.assertAlmostEqual(0.13, tmp_feature[1][119])
self.assertAlmostEqual(3.0, tmp_feature[2][0])
self.assertAlmostEqual(-0.13, tmp_feature[2][119])
self.assertAlmostEqual(4.0, tmp_feature[3][0])
self.assertAlmostEqual(-0.24, tmp_feature[3][119])
def test_perform(self):
"""test perform
"""
feature = np.zeros((4, 40), dtype="float32")
feature[0, 0:40].fill(1)
feature[1, 0:40].fill(2)
feature[2, 0:40].fill(3)
feature[3, 0:40].fill(4)
trans = trans_add_delta.TransAddDelta()
(feature, label, name) = trans.perform_trans((feature, None, None))
self.assertAlmostEqual(feature.shape[0], 4)
self.assertAlmostEqual(feature.shape[1], 120)
self.assertAlmostEqual(1.0, feature[0][0])
self.assertAlmostEqual(0.24, feature[0][119])
self.assertAlmostEqual(2.0, feature[1][0])
self.assertAlmostEqual(0.13, feature[1][119])
self.assertAlmostEqual(3.0, feature[2][0])
self.assertAlmostEqual(-0.13, feature[2][119])
self.assertAlmostEqual(4.0, feature[3][0])
self.assertAlmostEqual(-0.24, feature[3][119])
class TestTransSplice(unittest.TestCase):
    """unit test for TransSplice
    """
    def test_perform(self):
feature = np.zeros((8, 10), dtype="float32")
for i in xrange(feature.shape[0]):
feature[i, :].fill(i)
trans = trans_splice.TransSplice()
(feature, label, name) = trans.perform_trans((feature, None, None))
self.assertEqual(feature.shape[1], 110)
for i in xrange(8):
nzero_num = 5 - i
cur_val = 0.0
if nzero_num < 0:
cur_val = i - 5 - 1
for j in xrange(11):
if j <= nzero_num:
for k in xrange(10):
self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)
else:
if cur_val < 7:
cur_val += 1.0
for k in xrange(10):
self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)
class TestTransDelay(unittest.TestCase):
"""unittest TransDelay
"""
def test_perform(self):
label = np.zeros((10, 1), dtype="int64")
for i in xrange(10):
label[i][0] = i
trans = trans_delay.TransDelay(5)
(_, label, _) = trans.perform_trans((None, label, None))
for i in xrange(5):
self.assertAlmostEqual(label[i + 5][0], i)
for i in xrange(5):
self.assertAlmostEqual(label[i][0], 0)
if __name__ == '__main__':
unittest.main()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
import copy
class TransAddDelta(object):
""" add delta of feature data
trans feature for shape(a, b) to shape(a, b * 3)
Attributes:
_norder(int):
_window(int):
"""
def __init__(self, norder=2, nwindow=2):
""" init construction
Args:
norder: default 2
nwindow: default 2
"""
self._norder = norder
self._nwindow = nwindow
def perform_trans(self, sample):
""" add delta for feature
trans feature shape from (a,b) to (a, b * 3)
Args:
sample(object,tuple): contain feature numpy and label numpy
Returns:
(feature, label, name)
"""
(feature, label, name) = sample
frame_dim = feature.shape[1]
d_frame_dim = frame_dim * 3
head_filled = 5
tail_filled = 5
mat = np.zeros(
(feature.shape[0] + head_filled + tail_filled, d_frame_dim),
dtype="float32")
#copy first frame
for i in xrange(head_filled):
np.copyto(mat[i, 0:frame_dim], feature[0, :])
np.copyto(mat[head_filled:head_filled + feature.shape[0], 0:frame_dim],
feature[:, :])
# copy last frame
for i in xrange(head_filled + feature.shape[0], mat.shape[0], 1):
np.copyto(mat[i, 0:frame_dim], feature[feature.shape[0] - 1, :])
nframe = feature.shape[0]
start = head_filled
tmp_shape = mat.shape
mat = mat.reshape((tmp_shape[0] * tmp_shape[1]))
self._regress(mat, start * d_frame_dim, mat,
start * d_frame_dim + frame_dim, frame_dim, nframe,
d_frame_dim)
self._regress(mat, start * d_frame_dim + frame_dim, mat,
start * d_frame_dim + 2 * frame_dim, frame_dim, nframe,
d_frame_dim)
mat.shape = tmp_shape
return (mat[head_filled:mat.shape[0] - tail_filled, :], label, name)
def _regress(self, data_in, start_in, data_out, start_out, size, n, step):
""" regress
Args:
data_in: in data
start_in: start index of data_in
data_out: out data
start_out: start index of data_out
            size: frame dimension
            n: frame num
            step: stride between adjacent frames in the flattened data, i.e. 3 * (frame dim)
Returns:
None
"""
sigma_t2 = 0.0
delta_window = self._nwindow
for t in xrange(1, delta_window + 1):
sigma_t2 += t * t
sigma_t2 *= 2.0
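        # Standard delta regression: for each frame t and dimension j,
        #   delta_t[j] = sum_{w=1..W} w * (x_{t+w}[j] - x_{t-w}[j]) / (2 * sum_{w=1..W} w^2),
        # where W = self._nwindow; sigma_t2 above is the denominator.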
for i in xrange(n):
fp1 = start_in
fp2 = start_out
for j in xrange(size):
back = fp1
forw = fp1
sum = 0.0
for t in xrange(1, delta_window + 1):
back -= step
forw += step
sum += t * (data_in[forw] - data_in[back])
data_out[fp2] = sum / sigma_t2
fp1 += 1
fp2 += 1
start_in += step
start_out += step
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransDelay(object):
""" Delay label, and copy first label value in the front.
Attributes:
_delay_time : the delay frame num of label
"""
def __init__(self, delay_time):
"""init construction
Args:
delay_time : the delay frame num of label
"""
self._delay_time = delay_time
def perform_trans(self, sample):
"""
Args:
sample(object):input sample, contain feature numpy and label numpy, sample name list
Returns:
(feature, label, name)
"""
(feature, label, name) = sample
shape = label.shape
assert len(shape) == 2
label[self._delay_time:shape[0]] = label[0:shape[0] - self._delay_time]
for i in xrange(self._delay_time):
label[i][0] = label[self._delay_time][0]
return (feature, label, name)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransMeanVarianceNorm(object):
""" normalization of mean variance for feature data
Attributes:
_mean(numpy.array): the feature mean vector
_var(numpy.array): the feature variance
"""
def __init__(self, snorm_path):
"""init construction
Args:
snorm_path: the path of mean and variance
"""
self._mean = None
self._var = None
self._load_norm(snorm_path)
def _load_norm(self, snorm_path):
""" load mean var file
Args:
snorm_path(str):the file path
"""
lLines = open(snorm_path).readlines()
nLen = len(lLines)
self._mean = np.zeros((nLen), dtype="float32")
self._var = np.zeros((nLen), dtype="float32")
self._nLen = nLen
for nidx, l in enumerate(lLines):
s = l.split()
assert len(s) == 2
self._mean[nidx] = float(s[0])
self._var[nidx] = 1.0 / math.sqrt(float(s[1]))
if self._var[nidx] > 100000.0:
self._var[nidx] = 100000.0
def get_mean_var(self):
""" get mean and var
Args:
Returns:
(mean, var)
"""
return (self._mean, self._var)
def perform_trans(self, sample):
""" feature = (feature - mean) * var
Args:
sample(object):input sample, contain feature numpy and label numpy
Returns:
(feature, label, name)
"""
(feature, label, name) = sample
shape = feature.shape
assert len(shape) == 2
nfeature_len = shape[0] * shape[1]
assert nfeature_len % self._nLen == 0
ncur_idx = 0
feature = feature.reshape((nfeature_len))
while ncur_idx < nfeature_len:
block = feature[ncur_idx:ncur_idx + self._nLen]
block = (block - self._mean) * self._var
feature[ncur_idx:ncur_idx + self._nLen] = block
ncur_idx += self._nLen
feature = feature.reshape(shape)
return (feature, label, name)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransSplice(object):
""" copy feature context to construct new feature
expand feature data from shape (frame_num, frame_dim)
to shape (frame_num, frame_dim * 11)
Attributes:
_nleft_context(int): copy left context number
_nright_context(int): copy right context number
"""
def __init__(self, nleft_context=5, nright_context=5):
""" init construction
Args:
nleft_context(int):
nright_context(int):
"""
self._nleft_context = nleft_context
self._nright_context = nright_context
def perform_trans(self, sample):
""" copy feature context
Args:
sample(object): input sample(feature, label)
Return:
(feature, label, name)
"""
(feature, label, name) = sample
nframe_num = feature.shape[0]
nframe_dim = feature.shape[1]
nnew_frame_dim = nframe_dim * (
self._nleft_context + self._nright_context + 1)
mat = np.zeros(
(nframe_num + self._nleft_context + self._nright_context,
nframe_dim),
dtype="float32")
ret = np.zeros((nframe_num, nnew_frame_dim), dtype="float32")
#copy left
for i in xrange(self._nleft_context):
mat[i, :] = feature[0, :]
#copy middle
mat[self._nleft_context:self._nleft_context +
nframe_num, :] = feature[:, :]
#copy right
for i in xrange(self._nright_context):
mat[i + self._nleft_context + nframe_num, :] = feature[-1, :]
mat = mat.reshape(mat.shape[0] * mat.shape[1])
ret = ret.reshape(ret.shape[0] * ret.shape[1])
for i in xrange(nframe_num):
np.copyto(ret[i * nnew_frame_dim:(i + 1) * nnew_frame_dim],
mat[i * nframe_dim:i * nframe_dim + nnew_frame_dim])
ret = ret.reshape((nframe_num, nnew_frame_dim))
return (ret, label, name)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
from six import reraise
from tblib import Traceback
import numpy as np
import paddle.fluid as fluid
def to_lodtensor(data, place):
"""convert tensor to lodtensor
"""
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
    flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def lodtensor_to_ndarray(lod_tensor):
"""conver lodtensor to ndarray
"""
dims = lod_tensor._get_dims()
ret = np.zeros(shape=dims).astype('float32')
for i in xrange(np.product(dims)):
ret.ravel()[i] = lod_tensor.get_float_element(i)
return ret, lod_tensor.lod()
def split_infer_result(infer_seq, lod):
infer_batch = []
for i in xrange(0, len(lod[0]) - 1):
infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
return infer_batch
class CriticalException(Exception):
pass
def suppress_signal(signo, stack_frame):
pass
def suppress_complaints(verbose, notify=None):
def decorator_maker(func):
        def suppress_wrapper(*args, **kwargs):
try:
func(*args, **kwargs)
except:
et, ev, tb = sys.exc_info()
if notify is not None:
notify(except_type=et, except_value=ev, traceback=tb)
if verbose == 1 or isinstance(ev, CriticalException):
reraise(et, ev, Traceback(tb).as_traceback())
        return suppress_wrapper
return decorator_maker
class ForceExitWrapper(object):
def __init__(self, exit_flag):
self._exit_flag = exit_flag
@suppress_complaints(verbose=0)
def __call__(self, *args, **kwargs):
self._exit_flag.value = True
def __eq__(self, flag):
return self._exit_flag.value == flag
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "post_latgen_faster_mapped.h"
#include <limits>
#include "ThreadPool.h"
using namespace kaldi;
typedef kaldi::int32 int32;
using fst::SymbolTable;
using fst::Fst;
using fst::StdArc;
Decoder::Decoder(std::string trans_model_in_filename,
std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_in_filename,
size_t beam_size,
kaldi::BaseFloat acoustic_scale) {
const char *usage =
"Generate lattices using neural net model.\n"
"Usage: post-latgen-faster-mapped [options] <trans-model> "
"<fst-in|fsts-rspecifier> <logprior> <posts-rspecifier>"
" <lattice-wspecifier> [ <words-wspecifier> [<alignments-wspecifier>] "
"]\n";
ParseOptions po(usage);
allow_partial = false;
this->acoustic_scale = acoustic_scale;
config.Register(&po);
int32 beam = 11;
po.Register("acoustic-scale",
&acoustic_scale,
"Scaling factor for acoustic likelihoods");
po.Register("word-symbol-table",
&word_syms_filename,
"Symbol table for words [for debug output]");
po.Register("allow-partial",
&allow_partial,
"If true, produce output even if end state was not reached.");
int argc = 2;
char *argv[] = {(char *)"post-latgen-faster-mapped",
(char *)("--beam=" + std::to_string(beam_size)).c_str()};
po.Read(argc, argv);
std::ifstream is_logprior(logprior_in_filename);
logprior.Read(is_logprior, false);
{
bool binary;
Input ki(trans_model_in_filename, &binary);
this->trans_model.Read(ki.Stream(), binary);
}
this->determinize = config.determinize_lattice;
this->word_syms = NULL;
if (word_syms_filename != "") {
if (!(word_syms = fst::SymbolTable::ReadText(word_syms_filename))) {
KALDI_ERR << "Could not read symbol table from file "
<< word_syms_filename;
}
}
// Input FST is just one FST, not a table of FSTs.
this->decode_fst = fst::ReadFstKaldiGeneric(fst_in_filename);
kaldi::LatticeFasterDecoder *decoder =
new LatticeFasterDecoder(*decode_fst, config);
decoder_pool.emplace_back(decoder);
std::string lattice_wspecifier =
"ark:|gzip -c > mapped_decoder_data/lat.JOB.gz";
if (!(determinize ? compact_lattice_writer.Open(lattice_wspecifier)
: lattice_writer.Open(lattice_wspecifier)))
KALDI_ERR << "Could not open table for writing lattices: "
<< lattice_wspecifier;
words_writer = new Int32VectorWriter("");
alignment_writer = new Int32VectorWriter("");
}
Decoder::~Decoder() {
  delete this->word_syms;
delete this->decode_fst;
for (size_t i = 0; i < decoder_pool.size(); ++i) {
delete decoder_pool[i];
}
delete words_writer;
delete alignment_writer;
}
void Decoder::decode_from_file(std::string posterior_rspecifier,
size_t num_processes) {
try {
double tot_like = 0.0;
kaldi::int64 frame_count = 0;
// int num_success = 0, num_fail = 0;
KALDI_ASSERT(ClassifyRspecifier(fst_in_filename, NULL, NULL) ==
kNoRspecifier);
SequentialBaseFloatMatrixReader posterior_reader("ark:" +
posterior_rspecifier);
Timer timer;
timer.Reset();
double elapsed = 0.0;
for (size_t n = decoder_pool.size(); n < num_processes; ++n) {
kaldi::LatticeFasterDecoder *decoder =
new LatticeFasterDecoder(*decode_fst, config);
decoder_pool.emplace_back(decoder);
}
elapsed = timer.Elapsed();
ThreadPool thread_pool(num_processes);
while (!posterior_reader.Done()) {
timer.Reset();
std::vector<std::future<std::string>> que;
for (size_t i = 0; i < num_processes && !posterior_reader.Done(); ++i) {
std::string utt = posterior_reader.Key();
Matrix<BaseFloat> &loglikes(posterior_reader.Value());
que.emplace_back(thread_pool.enqueue(std::bind(
&Decoder::decode_internal, this, decoder_pool[i], utt, loglikes)));
posterior_reader.Next();
}
timer.Reset();
for (size_t i = 0; i < que.size(); ++i) {
std::cout << que[i].get() << std::endl;
}
}
} catch (const std::exception &e) {
std::cerr << e.what();
}
}
inline kaldi::Matrix<kaldi::BaseFloat> vector2kaldi_mat(
const std::vector<std::vector<kaldi::BaseFloat>> &log_probs) {
size_t num_frames = log_probs.size();
size_t dim_label = log_probs[0].size();
kaldi::Matrix<kaldi::BaseFloat> loglikes(
num_frames, dim_label, kaldi::kSetZero, kaldi::kStrideEqualNumCols);
for (size_t i = 0; i < num_frames; ++i) {
memcpy(loglikes.Data() + i * dim_label,
log_probs[i].data(),
sizeof(kaldi::BaseFloat) * dim_label);
}
return loglikes;
}
std::vector<std::string> Decoder::decode_batch(
std::vector<std::string> keys,
const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>
&log_probs_batch,
size_t num_processes) {
ThreadPool thread_pool(num_processes);
std::vector<std::string> decoding_results; //(keys.size(), "");
for (size_t n = decoder_pool.size(); n < num_processes; ++n) {
kaldi::LatticeFasterDecoder *decoder =
new LatticeFasterDecoder(*decode_fst, config);
decoder_pool.emplace_back(decoder);
}
size_t index = 0;
while (index < keys.size()) {
std::vector<std::future<std::string>> res_in_que;
for (size_t t = 0; t < num_processes && index < keys.size(); ++t) {
kaldi::Matrix<kaldi::BaseFloat> loglikes =
vector2kaldi_mat(log_probs_batch[index]);
res_in_que.emplace_back(
thread_pool.enqueue(std::bind(&Decoder::decode_internal,
this,
decoder_pool[t],
keys[index],
loglikes)));
index++;
}
for (size_t i = 0; i < res_in_que.size(); ++i) {
decoding_results.emplace_back(res_in_que[i].get());
}
}
return decoding_results;
}
std::string Decoder::decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>> &log_probs) {
kaldi::Matrix<kaldi::BaseFloat> loglikes = vector2kaldi_mat(log_probs);
return decode_internal(decoder_pool[0], key, loglikes);
}
std::string Decoder::decode_internal(
LatticeFasterDecoder *decoder,
std::string key,
kaldi::Matrix<kaldi::BaseFloat> &loglikes) {
if (loglikes.NumRows() == 0) {
KALDI_WARN << "Zero-length utterance: " << key;
// num_fail++;
}
KALDI_ASSERT(loglikes.NumCols() == logprior.Dim());
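  // Convert the network's posterior probabilities into pseudo log-likelihoods:
  // take the log and subtract the class log-priors (standard hybrid DNN/HMM
  // practice) before wrapping them in a DecodableMatrixScaledMapped.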
loglikes.ApplyLog();
loglikes.AddVecToRows(-1.0, logprior);
DecodableMatrixScaledMapped matrix_decodable(
trans_model, loglikes, acoustic_scale);
double like;
return this->DecodeUtteranceLatticeFaster(
decoder, matrix_decodable, key, &like);
}
std::string Decoder::DecodeUtteranceLatticeFaster(
LatticeFasterDecoder *decoder,
DecodableInterface &decodable, // not const but is really an input.
std::string utt,
double *like_ptr) { // puts utterance's like in like_ptr on success.
using fst::VectorFst;
std::string ret = utt + ' ';
if (!decoder->Decode(&decodable)) {
KALDI_WARN << "Failed to decode file " << utt;
return ret;
}
if (!decoder->ReachedFinal()) {
if (allow_partial) {
KALDI_WARN << "Outputting partial output for utterance " << utt
<< " since no final-state reached\n";
} else {
KALDI_WARN << "Not producing output for utterance " << utt
<< " since no final-state reached and "
<< "--allow-partial=false.\n";
return ret;
}
}
double likelihood;
LatticeWeight weight;
int32 num_frames;
{ // First do some stuff with word-level traceback...
VectorFst<LatticeArc> decoded;
if (!decoder->GetBestPath(&decoded))
// Shouldn't really reach this point as already checked success.
KALDI_ERR << "Failed to get traceback for utterance " << utt;
std::vector<int32> alignment;
std::vector<int32> words;
GetLinearSymbolSequence(decoded, &alignment, &words, &weight);
num_frames = alignment.size();
// if (alignment_writer->IsOpen()) alignment_writer->Write(utt, alignment);
if (word_syms != NULL) {
for (size_t i = 0; i < words.size(); i++) {
std::string s = word_syms->Find(words[i]);
ret += s + ' ';
}
}
likelihood = -(weight.Value1() + weight.Value2());
}
// Get lattice, and do determinization if requested.
Lattice lat;
decoder->GetRawLattice(&lat);
if (lat.NumStates() == 0)
KALDI_ERR << "Unexpected problem getting lattice for utterance " << utt;
fst::Connect(&lat);
if (determinize) {
CompactLattice clat;
if (!DeterminizeLatticePhonePrunedWrapper(
trans_model,
&lat,
decoder->GetOptions().lattice_beam,
&clat,
decoder->GetOptions().det_opts))
KALDI_WARN << "Determinization finished earlier than the beam for "
<< "utterance " << utt;
// We'll write the lattice without acoustic scaling.
if (acoustic_scale != 0.0)
fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &clat);
// disable output lattice temporarily
// compact_lattice_writer.Write(utt, clat);
} else {
// We'll write the lattice without acoustic scaling.
if (acoustic_scale != 0.0)
fst::ScaleLattice(fst::AcousticLatticeScale(1.0 / acoustic_scale), &lat);
// lattice_writer.Write(utt, lat);
}
return ret;
}
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <string>
#include <vector>
#include "base/kaldi-common.h"
#include "base/timer.h"
#include "decoder/decodable-matrix.h"
#include "decoder/decoder-wrappers.h"
#include "fstext/kaldi-fst-io.h"
#include "hmm/transition-model.h"
#include "tree/context-dep.h"
#include "util/common-utils.h"
class Decoder {
public:
Decoder(std::string trans_model_in_filename,
std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_in_filename,
size_t beam_size,
kaldi::BaseFloat acoustic_scale);
~Decoder();
// Interface to accept the scores read from specifier and print
// the decoding results directly
void decode_from_file(std::string posterior_rspecifier,
size_t num_processes = 1);
// Accept the scores of one utterance and return the decoding result
std::string decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>> &log_probs);
// Accept the scores of utterances in batch and return the decoding results
std::vector<std::string> decode_batch(
std::vector<std::string> key,
const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>
&log_probs_batch,
size_t num_processes = 1);
private:
// For decoding one utterance
std::string decode_internal(kaldi::LatticeFasterDecoder *decoder,
std::string key,
kaldi::Matrix<kaldi::BaseFloat> &loglikes);
std::string DecodeUtteranceLatticeFaster(kaldi::LatticeFasterDecoder *decoder,
kaldi::DecodableInterface &decodable,
std::string utt,
double *like_ptr);
fst::SymbolTable *word_syms;
fst::Fst<fst::StdArc> *decode_fst;
std::vector<kaldi::LatticeFasterDecoder *> decoder_pool;
kaldi::Vector<kaldi::BaseFloat> logprior;
kaldi::TransitionModel trans_model;
kaldi::LatticeFasterDecoderConfig config;
kaldi::CompactLatticeWriter compact_lattice_writer;
kaldi::LatticeWriter lattice_writer;
kaldi::Int32VectorWriter *words_writer;
kaldi::Int32VectorWriter *alignment_writer;
bool binary;
bool determinize;
kaldi::BaseFloat acoustic_scale;
bool allow_partial;
};
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "post_latgen_faster_mapped.h"
namespace py = pybind11;
PYBIND11_MODULE(post_latgen_faster_mapped, m) {
m.doc() = "Decoder for Deep ASR model";
py::class_<Decoder>(m, "Decoder")
.def(py::init<std::string,
std::string,
std::string,
std::string,
size_t,
kaldi::BaseFloat>())
.def("decode_from_file",
(void (Decoder::*)(std::string, size_t)) & Decoder::decode_from_file,
"Decode for the probability matrices in specifier "
"and print the transcriptions.")
.def(
"decode",
(std::string (Decoder::*)(
std::string, const std::vector<std::vector<kaldi::BaseFloat>>&)) &
Decoder::decode,
"Decode one input probability matrix "
"and return the transcription.")
.def("decode_batch",
(std::vector<std::string> (Decoder::*)(
std::vector<std::string>,
const std::vector<std::vector<std::vector<kaldi::BaseFloat>>>&,
size_t num_processes)) &
Decoder::decode_batch,
"Decode one batch of probability matrices "
"and return the transcriptions.");
}
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import glob
from distutils.core import setup, Extension
from distutils.sysconfig import get_config_vars
try:
kaldi_root = os.environ['KALDI_ROOT']
except:
raise ValueError("Enviroment variable 'KALDI_ROOT' is not defined. Please "
"install kaldi and export KALDI_ROOT=<kaldi's root dir> .")
args = [
'-std=c++11', '-fopenmp', '-Wno-sign-compare', '-Wno-unused-variable',
'-Wno-unused-local-typedefs', '-Wno-unused-but-set-variable',
'-Wno-deprecated-declarations', '-Wno-unused-function'
]
# remove warning about -Wstrict-prototypes
(opt, ) = get_config_vars('OPT')
os.environ['OPT'] = " ".join(flag for flag in opt.split()
if flag != '-Wstrict-prototypes')
os.environ['CC'] = 'g++'
LIBS = [
'fst', 'kaldi-base', 'kaldi-util', 'kaldi-matrix', 'kaldi-tree',
'kaldi-hmm', 'kaldi-fstext', 'kaldi-decoder', 'kaldi-lat'
]
LIB_DIRS = [
'tools/openfst/lib', 'src/base', 'src/matrix', 'src/util', 'src/tree',
'src/hmm', 'src/fstext', 'src/decoder', 'src/lat'
]
LIB_DIRS = [os.path.join(kaldi_root, path) for path in LIB_DIRS]
LIB_DIRS = [os.path.abspath(path) for path in LIB_DIRS]
ext_modules = [
Extension(
'post_latgen_faster_mapped',
['pybind.cc', 'post_latgen_faster_mapped.cc'],
include_dirs=[
'pybind11/include', '.', os.path.join(kaldi_root, 'src'),
os.path.join(kaldi_root, 'tools/openfst/src/include'), 'ThreadPool'
],
language='c++',
libraries=LIBS,
library_dirs=LIB_DIRS,
runtime_library_dirs=LIB_DIRS,
extra_compile_args=args, ),
]
setup(
name='post_latgen_faster_mapped',
version='0.1.0',
author='Paddle',
author_email='',
description='Decoder for Deep ASR model',
ext_modules=ext_modules, )
set -e
if [ ! -d pybind11 ]; then
git clone https://github.com/pybind/pybind11.git
fi
if [ ! -d ThreadPool ]; then
git clone https://github.com/progschj/ThreadPool.git
echo -e "\n"
fi
python setup.py build_ext -i
url=http://deep-asr-data.gz.bcebos.com/aishell_pretrained_model.tar.gz
md5=7b51bde64e884f43901b7a3461ccbfa3
wget -c $url
echo "Checking md5 sum ..."
md5sum_tmp=`md5sum aishell_pretrained_model.tar.gz | cut -d ' ' -f1`
if [ $md5sum_tmp != $md5 ]; then
echo "Md5sum check failed, please remove and redownload "
"aishell_pretrained_model.tar.gz."
exit 1
fi
tar xvf aishell_pretrained_model.tar.gz
decode_to_path=./decoding_result.txt
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u ../../infer_by_ckpt.py --batch_size 96 \
--checkpoint checkpoints/deep_asr.latest.checkpoint \
--infer_feature_lst data/test_feature.lst \
--mean_var data/global_mean_var \
--frame_dim 80 \
--class_num 3040 \
--num_threads 24 \
--beam_size 11 \
--decode_to_path $decode_to_path \
--trans_model aux/final.mdl \
--log_prior aux/logprior \
--vocabulary aux/graph/words.txt \
--graphs aux/graph/HCLG.fst \
--acoustic_scale 0.059 \
--parallel
data_dir=~/.cache/paddle/dataset/speech/deep_asr_data/aishell
data_url='http://deep-asr-data.gz.bcebos.com/aishell_data.tar.gz'
lst_url='http://deep-asr-data.gz.bcebos.com/aishell_lst.tar.gz'
aux_url='http://deep-asr-data.gz.bcebos.com/aux.tar.gz'
md5=17669b8d63331c9326f4a9393d289bfb
aux_md5=50e3125eba1e3a2768a6f2e499cc1749
if [ ! -e $data_dir ]; then
mkdir -p $data_dir
fi
if [ ! -e $data_dir/aishell_data.tar.gz ]; then
echo "Download $data_dir/aishell_data.tar.gz ..."
wget -c -P $data_dir $data_url
else
echo "Skip downloading for $data_dir/aishell_data.tar.gz has already existed!"
fi
echo "Checking md5 sum ..."
md5sum_tmp=`md5sum $data_dir/aishell_data.tar.gz | cut -d ' ' -f1`
if [ $md5sum_tmp != $md5 ]; then
echo "Md5sum check failed, please remove and redownload "
"$data_dir/aishell_data.tar.gz"
exit 1
fi
echo "Untar aishell_data.tar.gz ..."
tar xzf $data_dir/aishell_data.tar.gz -C $data_dir
if [ ! -e data ]; then
mkdir data
fi
echo "Download and untar lst files ..."
wget -c -P data $lst_url
tar xvf data/aishell_lst.tar.gz -C data
ln -s $data_dir data/aishell
echo "Download and untar aux files ..."
wget -c $aux_url
tar xvf aux.tar.gz
export CUDA_VISIBLE_DEVICES=0
python -u ../../tools/profile.py --feature_lst data/train_feature.lst \
--label_lst data/train_label.lst \
--mean_var data/global_mean_var \
--frame_dim 80 \
--class_num 3040 \
--batch_size 16
ref_txt=aux/test.ref.txt
hyp_txt=decoding_result.txt
python ../../score_error_rate.py --error_rate_type cer --ref $ref_txt --hyp $hyp_txt
export CUDA_VISIBLE_DEVICES=4,5,6,7
python -u ../../train.py --train_feature_lst data/train_feature.lst \
--train_label_lst data/train_label.lst \
--val_feature_lst data/val_feature.lst \
--val_label_lst data/val_label.lst \
--mean_var data/global_mean_var \
--checkpoints checkpoints \
--frame_dim 80 \
--class_num 3040 \
--infer_models '' \
--batch_size 64 \
--learning_rate 6.4e-5 \
--parallel
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from data_utils.util import split_infer_result
def parse_args():
parser = argparse.ArgumentParser("Inference for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
        help='The number of sequences in a batch of data. (default: %(default)d)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_model_path',
type=str,
default='./infer_models/deep_asr.pass_0.infer.model/',
help='The directory for loading inference model. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def infer(args):
""" Gets one batch of feature data and predicts labels for each sample.
"""
if not os.path.exists(args.infer_model_path):
raise IOError("Invalid inference model path!")
place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace()
exe = fluid.Executor(place)
# load model
[infer_program, feed_dict,
fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
infer_data_reader = reader.AsyncDataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
one_batch = infer_data_reader.batch_iterator(args.batch_size, 1).next()
(features, labels, lod) = one_batch
feature_t.set(features, place)
feature_t.set_lod([lod])
results = exe.run(infer_program,
feed={feed_dict[0]: feature_t},
fetch_list=fetch_targets,
return_numpy=False)
probs, lod = lodtensor_to_ndarray(results[0])
preds = probs.argmax(axis=1)
infer_batch = split_infer_result(preds, lod)
for index, sample in enumerate(infer_batch):
print("result %d: " % index, sample, '\n')
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray, split_infer_result
from model_utils.model import stacked_lstmp_model
from decoder.post_latgen_faster_mapped import Decoder
from tools.error_rate import char_errors
def parse_args():
parser = argparse.ArgumentParser("Run inference by using checkpoint.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
        help='The number of sequences in a batch of data. (default: %(default)d)')
parser.add_argument(
'--beam_size',
type=int,
default=11,
help='The beam size for decoding. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
        help='The minimum number of sequences in a batch of data. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=80,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--num_threads',
type=int,
default=10,
help='The number of threads for decoding. (default: %(default)d)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--checkpoint',
type=str,
default='./checkpoint',
help="The checkpoint path to init model. (default: %(default)s)")
parser.add_argument(
'--trans_model',
type=str,
default='./graph/trans_model',
help="The path to vocabulary. (default: %(default)s)")
parser.add_argument(
'--vocabulary',
type=str,
default='./graph/words.txt',
help="The path to vocabulary. (default: %(default)s)")
parser.add_argument(
'--graphs',
type=str,
default='./graph/TLG.fst',
help="The path to TLG graphs for decoding. (default: %(default)s)")
parser.add_argument(
'--log_prior',
type=str,
default="./logprior",
help="The log prior probs for training data. (default: %(default)s)")
parser.add_argument(
'--acoustic_scale',
type=float,
default=0.2,
help="Scaling factor for acoustic likelihoods. (default: %(default)f)")
parser.add_argument(
'--post_matrix_path',
type=str,
default=None,
help="The path to output post prob matrix. (default: %(default)s)")
parser.add_argument(
'--decode_to_path',
type=str,
default='./decoding_result.txt',
required=True,
help="The path to output the decoding result. (default: %(default)s)")
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
class PostMatrixWriter:
""" The writer for outputing the post probability matrix
"""
def __init__(self, to_path):
self._to_path = to_path
with open(self._to_path, "w") as post_matrix:
post_matrix.seek(0)
post_matrix.truncate()
def write(self, keys, probs):
with open(self._to_path, "a") as post_matrix:
if isinstance(keys, str):
keys, probs = [keys], [probs]
for key, prob in zip(keys, probs):
post_matrix.write(key + " [\n")
for i in range(prob.shape[0]):
for j in range(prob.shape[1]):
post_matrix.write(str(prob[i][j]) + " ")
post_matrix.write("\n")
post_matrix.write("]\n")
class DecodingResultWriter:
""" The writer for writing out decoding results
"""
def __init__(self, to_path):
self._to_path = to_path
with open(self._to_path, "w") as decoding_result:
decoding_result.seek(0)
decoding_result.truncate()
def write(self, results):
with open(self._to_path, "a") as decoding_result:
if isinstance(results, str):
decoding_result.write(results.encode("utf8") + "\n")
else:
for result in results:
decoding_result.write(result.encode("utf8") + "\n")
def infer_from_ckpt(args):
"""Inference by using checkpoint."""
if not os.path.exists(args.checkpoint):
raise IOError("Invalid checkpoint!")
prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=args.class_num,
parallel=args.parallel)
infer_program = fluid.default_main_program().clone()
# optimizer, placeholder
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.exponential_decay(
learning_rate=0.0001,
decay_steps=1879,
decay_rate=1 / 1.2,
staircase=True))
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load checkpoint.
fluid.io.load_persistables(exe, args.checkpoint)
# init decoder
decoder = Decoder(args.trans_model, args.vocabulary, args.graphs,
args.log_prior, args.beam_size, args.acoustic_scale)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
]
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
# infer data reader
infer_data_reader = reader.AsyncDataReader(
args.infer_feature_lst, drop_frame_len=-1, split_sentence_threshold=-1)
infer_data_reader.set_transformers(ltrans)
decoding_result_writer = DecodingResultWriter(args.decode_to_path)
post_matrix_writer = None if args.post_matrix_path is None \
else PostMatrixWriter(args.post_matrix_path)
for batch_id, batch_data in enumerate(
infer_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod, name_lst) = batch_data
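        # The reader yields spliced frames of width 11 * (3 * frame_dim); reshape
        # and transpose them to (num_frames, 3, 11, frame_dim) so they match the
        # layout of the model's "feature" data layer before being set into the LoDTensor.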
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
results = exe.run(infer_program,
feed={"feature": feature_t,
"label": label_t},
fetch_list=[prediction, avg_cost, accuracy],
return_numpy=False)
probs, lod = lodtensor_to_ndarray(results[0])
infer_batch = split_infer_result(probs, lod)
print("Decoding batch %d ..." % batch_id)
decoded = decoder.decode_batch(name_lst, infer_batch, args.num_threads)
decoding_result_writer.write(decoded)
if args.post_matrix_path is not None:
post_matrix_writer.write(name_lst, infer_batch)
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer_from_ckpt(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.v2 as paddle
import paddle.fluid as fluid
def stacked_lstmp_model(frame_dim,
hidden_dim,
proj_dim,
stacked_num,
class_num,
parallel=False,
is_train=True):
""" The model for DeepASR. The main structure is composed of stacked
identical LSTMP (LSTM with recurrent projection) layers.
When running in training and validation phase, the feeding dictionary
is {'feature', 'label'}, fed by the LodTensor for feature data and
label data respectively. And in inference, only `feature` is needed.
Args:
frame_dim(int): The frame dimension of feature data.
hidden_dim(int): The hidden state's dimension of the LSTMP layer.
proj_dim(int): The projection size of the LSTMP layer.
stacked_num(int): The number of stacked LSTMP layers.
parallel(bool): Run in parallel or not, default `False`.
is_train(bool): Run in training phase or not, default `True`.
        class_num(int): The number of output classes.
"""
# network configuration
def _net_conf(feature, label):
conv1 = fluid.layers.conv2d(
input=feature,
num_filters=32,
filter_size=3,
stride=1,
padding=1,
bias_attr=True,
act="relu")
pool1 = fluid.layers.pool2d(
conv1, pool_size=3, pool_type="max", pool_stride=2, pool_padding=0)
stack_input = pool1
for i in range(stacked_num):
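            # dynamic_lstmp consumes an input that is already projected to
            # 4 * hidden_dim (one slice per LSTM gate), hence the plain fc
            # layer in front of each LSTMP layer.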
fc = fluid.layers.fc(input=stack_input,
size=hidden_dim * 4,
bias_attr=None)
proj, cell = fluid.layers.dynamic_lstmp(
input=fc,
size=hidden_dim * 4,
proj_size=proj_dim,
bias_attr=True,
use_peepholes=True,
is_reverse=False,
cell_activation="tanh",
proj_activation="tanh")
bn = fluid.layers.batch_norm(
input=proj,
is_test=not is_train,
momentum=0.9,
epsilon=1e-05,
data_layout='NCHW')
stack_input = bn
prediction = fluid.layers.fc(input=stack_input,
size=class_num,
act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return prediction, avg_cost, acc
# data feeder
feature = fluid.layers.data(
name="feature",
shape=[-1, 3, 11, frame_dim],
dtype="float32",
lod_level=1)
label = fluid.layers.data(
name="label", shape=[-1, 1], dtype="int64", lod_level=1)
if parallel:
# When the execution place is specified to CUDAPlace, the program will
# run on all $CUDA_VISIBLE_DEVICES GPUs. Otherwise the program will
# run on all CPU devices.
places = fluid.layers.device.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
feat_ = pd.read_input(feature)
label_ = pd.read_input(label)
prediction, avg_cost, acc = _net_conf(feat_, label_)
for out in [prediction, avg_cost, acc]:
pd.write_output(out)
# get mean loss and acc through every devices.
prediction, avg_cost, acc = pd()
prediction.stop_gradient = True
avg_cost = fluid.layers.mean(x=avg_cost)
acc = fluid.layers.mean(x=acc)
else:
prediction, avg_cost, acc = _net_conf(feature, label)
return prediction, avg_cost, acc
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from tools.error_rate import char_errors, word_errors
def parse_args():
parser = argparse.ArgumentParser(
"Score word/character error rate (WER/CER) "
"for decoding result.")
parser.add_argument(
'--error_rate_type',
type=str,
default='cer',
choices=['cer', 'wer'],
help="Error rate type. (default: %(default)s)")
parser.add_argument(
'--special_tokens',
type=str,
default='<SPOKEN_NOISE>',
help="Special tokens in scoring CER, seperated by space. "
"They shouldn't be splitted and should be treated as one special "
"character. Example: '<SPOKEN_NOISE> <bos> <eos>' "
"(default: %(default)s)")
parser.add_argument(
'--ref', type=str, required=True, help="The ground truth text.")
parser.add_argument(
'--hyp', type=str, required=True, help="The decoding result text.")
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
ref_dict = {}
sum_errors, sum_ref_len = 0.0, 0
sent_cnt, not_in_ref_cnt = 0, 0
special_tokens = args.special_tokens.split(" ")
with open(args.ref, "r") as ref_txt:
line = ref_txt.readline()
while line:
del_pos = line.find(" ")
key, sent = line[0:del_pos], line[del_pos + 1:-1].strip()
ref_dict[key] = sent
line = ref_txt.readline()
with open(args.hyp, "r") as hyp_txt:
line = hyp_txt.readline()
while line:
del_pos = line.find(" ")
key, sent = line[0:del_pos], line[del_pos + 1:-1].strip()
sent_cnt += 1
line = hyp_txt.readline()
if key not in ref_dict:
not_in_ref_cnt += 1
continue
if args.error_rate_type == 'cer':
for sp_tok in special_tokens:
sent = sent.replace(sp_tok, '\0')
errors, ref_len = char_errors(
ref_dict[key].decode("utf8"),
sent.decode("utf8"),
remove_space=True)
else:
errors, ref_len = word_errors(ref_dict[key].decode("utf8"),
sent.decode("utf8"))
sum_errors += errors
sum_ref_len += ref_len
print("Error rate[%s] = %f (%d/%d)," %
(args.error_rate_type, sum_errors / sum_ref_len, int(sum_errors),
sum_ref_len))
print("total %d sentences in hyp, %d not presented in ref." %
(sent_cnt, not_in_ref_cnt))
"""Add the parent directory to $PYTHONPATH"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = os.path.dirname(__file__)
# Add project path to PYTHONPATH
proj_path = os.path.join(this_dir, '..')
add_path(proj_path)
# -*- coding: utf-8 -*-
"""This module provides functions to calculate error rate in different level.
e.g. wer for word-level, cer for char-level.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def _levenshtein_distance(ref, hyp):
"""Levenshtein distance is a string metric for measuring the difference
    between two sequences. Informally, the Levenshtein distance is defined as
    the minimum number of single-character edits (substitutions, insertions or
    deletions) required to change one word into the other. We can naturally
    extend the edits to word level when calculating the Levenshtein distance of
    two sentences.
"""
m = len(ref)
n = len(hyp)
# special case
if ref == hyp:
return 0
if m == 0:
return n
if n == 0:
return m
if m < n:
ref, hyp = hyp, ref
m, n = n, m
# use O(min(m, n)) space
distance = np.zeros((2, n + 1), dtype=np.int32)
# initialize distance matrix
for j in xrange(n + 1):
distance[0][j] = j
# calculate levenshtein distance
for i in xrange(1, m + 1):
prev_row_idx = (i - 1) % 2
cur_row_idx = i % 2
distance[cur_row_idx][0] = i
for j in xrange(1, n + 1):
if ref[i - 1] == hyp[j - 1]:
distance[cur_row_idx][j] = distance[prev_row_idx][j - 1]
else:
s_num = distance[prev_row_idx][j - 1] + 1
i_num = distance[cur_row_idx][j - 1] + 1
d_num = distance[prev_row_idx][j] + 1
distance[cur_row_idx][j] = min(s_num, i_num, d_num)
return distance[m % 2][n]
def word_errors(reference, hypothesis, ignore_case=False, delimiter=' '):
"""Compute the levenshtein distance between reference sequence and
hypothesis sequence in word-level.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
    :param ignore_case: Whether to ignore case in the comparison.
    :type ignore_case: bool
:param delimiter: Delimiter of input sentences.
:type delimiter: char
:return: Levenshtein distance and word number of reference sentence.
:rtype: list
"""
if ignore_case == True:
reference = reference.lower()
hypothesis = hypothesis.lower()
ref_words = filter(None, reference.split(delimiter))
hyp_words = filter(None, hypothesis.split(delimiter))
edit_distance = _levenshtein_distance(ref_words, hyp_words)
return float(edit_distance), len(ref_words)
def char_errors(reference, hypothesis, ignore_case=False, remove_space=False):
"""Compute the levenshtein distance between reference sequence and
hypothesis sequence in char-level.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
    :param ignore_case: Whether to ignore case in the comparison.
    :type ignore_case: bool
    :param remove_space: Whether to remove internal space characters.
    :type remove_space: bool
:return: Levenshtein distance and length of reference sentence.
:rtype: list
"""
if ignore_case == True:
reference = reference.lower()
hypothesis = hypothesis.lower()
join_char = ' '
if remove_space == True:
join_char = ''
reference = join_char.join(filter(None, reference.split(' ')))
hypothesis = join_char.join(filter(None, hypothesis.split(' ')))
edit_distance = _levenshtein_distance(reference, hypothesis)
return float(edit_distance), len(reference)
def wer(reference, hypothesis, ignore_case=False, delimiter=' '):
"""Calculate word error rate (WER). WER compares reference text and
hypothesis text in word-level. WER is defined as:
.. math::
WER = (Sw + Dw + Iw) / Nw
where
.. code-block:: text
        Sw is the number of words substituted,
Dw is the number of words deleted,
Iw is the number of words inserted,
Nw is the number of words in the reference
    We can use the Levenshtein distance to calculate WER. Note that empty
    items will be removed when splitting sentences by the delimiter.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
    :param ignore_case: Whether to ignore case in the comparison.
    :type ignore_case: bool
:param delimiter: Delimiter of input sentences.
:type delimiter: char
:return: Word error rate.
:rtype: float
:raises ValueError: If word number of reference is zero.
"""
edit_distance, ref_len = word_errors(reference, hypothesis, ignore_case,
delimiter)
if ref_len == 0:
raise ValueError("Reference's word number should be greater than 0.")
wer = float(edit_distance) / ref_len
return wer
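# Illustrative example (not part of the original module): for the reference
# "who is there" and the hypothesis "is there", one word is deleted out of a
# three-word reference, so wer("who is there", "is there") == 1.0 / 3.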
def cer(reference, hypothesis, ignore_case=False, remove_space=False):
"""Calculate charactor error rate (CER). CER compares reference text and
hypothesis text in char-level. CER is defined as:
.. math::
CER = (Sc + Dc + Ic) / Nc
where
.. code-block:: text
Sc is the number of characters substituted,
Dc is the number of characters deleted,
Ic is the number of characters inserted
Nc is the number of characters in the reference
    We can use the Levenshtein distance to calculate CER. Chinese input should
    be encoded to unicode. Note that leading and trailing space characters
    will be truncated and multiple consecutive space characters in a sentence
    will be replaced by one space character.
:param reference: The reference sentence.
:type reference: basestring
:param hypothesis: The hypothesis sentence.
:type hypothesis: basestring
    :param ignore_case: Whether to ignore case in the comparison.
    :type ignore_case: bool
    :param remove_space: Whether to remove internal space characters.
    :type remove_space: bool
:return: Character error rate.
:rtype: float
:raises ValueError: If the reference length is zero.
"""
edit_distance, ref_len = char_errors(reference, hypothesis, ignore_case,
remove_space)
if ref_len == 0:
raise ValueError("Length of reference should be greater than 0.")
cer = float(edit_distance) / ref_len
return cer
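# Illustrative example (not part of the original module): for the reference
# u"who is there" and the hypothesis u"who is", the missing " there" amounts
# to six character deletions (including the space) out of a 12-character
# reference, so cer(u"who is there", u"who is") == 6.0 / 12.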
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
import _init_paths
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from model_utils.model import stacked_lstmp_model
from data_utils.util import lodtensor_to_ndarray
def parse_args():
parser = argparse.ArgumentParser("Profiling for the stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
help='The minimum sequence number of a batch data. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help='mean var path')
parser.add_argument(
'--feature_lst',
type=str,
default='data/feature.lst',
help='feature list path.')
parser.add_argument(
'--label_lst',
type=str,
default='data/label.lst',
help='label list path.')
parser.add_argument(
'--max_batch_num',
type=int,
default=11,
help='Maximum number of batches for profiling. (default: %(default)d)')
parser.add_argument(
'--first_batches_to_skip',
type=int,
default=1,
help='Number of first batches to skip for profiling. '
'(default: %(default)d)')
parser.add_argument(
'--print_train_acc',
action='store_true',
        help='If set, output training accuracy.')
parser.add_argument(
'--sorted_key',
type=str,
default='total',
choices=['None', 'total', 'calls', 'min', 'max', 'ave'],
help='Different types of time to sort the profiling report. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def profile(args):
"""profile the training process.
"""
if not args.first_batches_to_skip < args.max_batch_num:
raise ValueError("arg 'first_batches_to_skip' must be smaller than "
"'max_batch_num'.")
if not args.first_batches_to_skip >= 0:
raise ValueError(
"arg 'first_batches_to_skip' must not be smaller than 0.")
_, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=args.class_num,
parallel=args.parallel)
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.exponential_decay(
learning_rate=args.learning_rate,
decay_steps=1879,
decay_rate=1 / 1.2,
staircase=True))
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
]
data_reader = reader.AsyncDataReader(
args.feature_lst, args.label_lst, -1, split_sentence_threshold=1024)
data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
    sorted_key = None if args.sorted_key == 'None' else args.sorted_key
with profiler.profiler(args.device, sorted_key) as prof:
frames_seen, start_time = 0, 0.0
for batch_id, batch_data in enumerate(
data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
if batch_id >= args.max_batch_num:
break
if args.first_batches_to_skip == batch_id:
profiler.reset_profiler()
start_time = time.time()
frames_seen = 0
# load_data
(features, labels, lod, _) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
frames_seen += lod[-1]
outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy]
if args.print_train_acc else [],
return_numpy=False)
if args.print_train_acc:
print("Batch %d acc: %f" %
(batch_id, lodtensor_to_ndarray(outs[1])[0]))
else:
sys.stdout.write('.')
sys.stdout.flush()
time_consumed = time.time() - start_time
frames_per_sec = frames_seen / time_consumed
print("\nTime consumed: %f s, performance: %f frames/s." %
(time_consumed, frames_per_sec))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
profile(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.augmentor.trans_delay as trans_delay
import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
def parse_args():
parser = argparse.ArgumentParser("Training for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
help='The minimum sequence number of a batch data. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=80,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=3040,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--pass_num',
type=int,
default=100,
help='Epoch number to train. (default: %(default)d)')
parser.add_argument(
'--print_per_batches',
type=int,
default=100,
help='Interval to print training accuracy. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--train_feature_lst',
type=str,
default='data/feature.lst',
help='The feature list path for training. (default: %(default)s)')
parser.add_argument(
'--train_label_lst',
type=str,
default='data/label.lst',
help='The label list path for training. (default: %(default)s)')
parser.add_argument(
'--val_feature_lst',
type=str,
default='data/val_feature.lst',
help='The feature list path for validation. (default: %(default)s)')
parser.add_argument(
'--val_label_lst',
type=str,
default='data/val_label.lst',
help='The label list path for validation. (default: %(default)s)')
parser.add_argument(
'--init_model_path',
type=str,
default=None,
help="The model (checkpoint) path which the training resumes from. "
"If None, train the model from scratch. (default: %(default)s)")
parser.add_argument(
'--checkpoints',
type=str,
default='./checkpoints',
help="The directory for saving checkpoints. Do not save checkpoints "
"if set to ''. (default: %(default)s)")
parser.add_argument(
'--infer_models',
type=str,
default='./infer_models',
help="The directory for saving inference models. Do not save inference "
"models if set to ''. (default: %(default)s)")
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def train(args):
"""train in loop.
"""
# paths check
if args.init_model_path is not None and \
not os.path.exists(args.init_model_path):
raise IOError("Invalid initial model path!")
if args.checkpoints != '' and not os.path.exists(args.checkpoints):
os.mkdir(args.checkpoints)
if args.infer_models != '' and not os.path.exists(args.infer_models):
os.mkdir(args.infer_models)
prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=args.class_num,
parallel=args.parallel)
# program for test
test_program = fluid.default_main_program().clone()
#optimizer = fluid.optimizer.Momentum(learning_rate=args.learning_rate, momentum=0.9)
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.exponential_decay(
learning_rate=args.learning_rate,
decay_steps=1879,
decay_rate=1 / 1.2,
staircase=True))
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# resume training if initial model provided.
if args.init_model_path is not None:
fluid.io.load_persistables(exe, args.init_model_path)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
]
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
# validation
def test(exe):
# If test data not found, return invalid cost and accuracy
if not (os.path.exists(args.val_feature_lst) and
os.path.exists(args.val_label_lst)):
return -1.0, -1.0
# test data reader
test_data_reader = reader.AsyncDataReader(
args.val_feature_lst,
args.val_label_lst,
-1,
split_sentence_threshold=1024)
test_data_reader.set_transformers(ltrans)
test_costs, test_accs = [], []
for batch_id, batch_data in enumerate(
test_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod, _) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
cost, acc = exe.run(test_program,
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy],
return_numpy=False)
test_costs.append(lodtensor_to_ndarray(cost)[0])
test_accs.append(lodtensor_to_ndarray(acc)[0])
return np.mean(test_costs), np.mean(test_accs)
# train data reader
train_data_reader = reader.AsyncDataReader(
args.train_feature_lst,
args.train_label_lst,
-1,
split_sentence_threshold=1024)
train_data_reader.set_transformers(ltrans)
# train
for pass_id in xrange(args.pass_num):
pass_start_time = time.time()
for batch_id, batch_data in enumerate(
train_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod, name_lst) = batch_data
features = np.reshape(features, (-1, 11, 3, args.frame_dim))
features = np.transpose(features, (0, 2, 1, 3))
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
to_print = batch_id > 0 and (batch_id % args.print_per_batches == 0)
outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy] if to_print else [],
return_numpy=False)
if to_print:
print("\nBatch %d, train cost: %f, train acc: %f" %
(batch_id, lodtensor_to_ndarray(outs[0])[0],
lodtensor_to_ndarray(outs[1])[0]))
# save the latest checkpoint
if args.checkpoints != '':
model_path = os.path.join(args.checkpoints,
"deep_asr.latest.checkpoint")
fluid.io.save_persistables(exe, model_path)
else:
sys.stdout.write('.')
sys.stdout.flush()
# run test
val_cost, val_acc = test(exe)
# save checkpoint per pass
if args.checkpoints != '':
model_path = os.path.join(
args.checkpoints,
"deep_asr.pass_" + str(pass_id) + ".checkpoint")
fluid.io.save_persistables(exe, model_path)
# save inference model
if args.infer_models != '':
model_path = os.path.join(
args.infer_models,
"deep_asr.pass_" + str(pass_id) + ".infer.model")
fluid.io.save_inference_model(model_path, ["feature"],
[prediction], exe)
# cal pass time
pass_end_time = time.time()
time_consumed = pass_end_time - pass_start_time
# print info at pass end
print("\nPass %d, time consumed: %f s, val cost: %f, val acc: %f\n" %
(pass_id, time_consumed, val_cost, val_acc))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
train(args)
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
import math
from tqdm import tqdm
from utils import fluid_flatten
class DQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
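        # One-step TD target: r + (1 - isOver) * gamma * max_a Q_target(s', a)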
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_argmax, fluid_flatten
class DoubleDQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
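        # Double Q-learning: select the greedy action with the online (policy)
        # network, but evaluate its value with the target network.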
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
        next_s_predict_value = self.get_DQN_prediction(next_s)
        greedy_action = fluid_argmax(next_s_predict_value)
predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
best_v = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value),
dim=1)
best_v.stop_gradient = True
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_flatten
class DuelingDQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
best_v.stop_gradient = True
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
value = fluid.layers.fc(
input=flatten,
size=1,
param_attr=ParamAttr(name='{}_value_fc'.format(variable_field)),
bias_attr=ParamAttr(name='{}_value_fc_b'.format(variable_field)))
advantage = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_advantage_fc'.format(variable_field)),
bias_attr=ParamAttr(
name='{}_advantage_fc_b'.format(variable_field)))
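        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)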
Q = advantage + (value - fluid.layers.reduce_mean(
advantage, dim=1, keep_dim=True))
return Q
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars))
target_vars = list(filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars))
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program, \
feed={'state': state.astype('float32'), \
'action': action.astype('int32'), \
'reward': reward, \
'next_s': next_state.astype('float32'), \
'isOver': isOver})
def sync_target_network(self):
self.exe.run(self._sync_program)
[中文版](README_cn.md)
## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle
Based on PaddlePaddle's next-generation API, Fluid, this repository reproduces the DQN family of deep reinforcement learning models and reaches the same level of metrics as the original papers on classic Atari games. Each model takes the raw game image as input and predicts the next action end to end; the temporal-difference targets they optimize are summarized after the list below. The repository contains the following three models:
+ DQN in:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN in:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN in:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
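For reference, the three agents in this directory optimize the following temporal-difference targets, written in the notation of the agent code in `DQN_agent.py`, `DoubleDQN_agent.py` and `DuelingDQN_agent.py` (`isOver` marks the end of an episode and `Q_target` is the periodically synced target network):
```
DQN:        target = r + (1 - isOver) * gamma * max_a Q_target(s', a)
DoubleDQN:  target = r + (1 - isOver) * gamma * Q_target(s', argmax_a Q(s', a))
DuelingDQN: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), trained with the DQN target above
```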
## Atari benchmark & performance
### Atari games introduction
Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari game.
### Pong game result
The average game reward obtained by each of the three models as the number of training steps increases is shown below (roughly 3 hours per 1 million steps):
<div align="center">
<img src="assets/dqn.png" width="600" height="300" alt="DQN result"></img>
</div>
## How to use
### Dependencies:
+ python2.7
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ ale_python_interface
### Install Dependencies:
+ Install PaddlePaddle:
It is recommended to compile and install PaddlePaddle from source.
+ Install other dependencies:
```
pip install -r requirement.txt
pip install gym[atari]
```
To install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment).
### Start Training:
```
# To train a model for Pong game with gpu (use DQN model as default)
python train.py --rom ./rom_files/pong.bin --use_cuda
# To train a model for Pong with DoubleDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN
# To train a model for Pong with DuelingDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train more games, you can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms).
### Start Testing:
```
# Play the game with saved best model and calculate the average rewards
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong
# Play the game with visualization
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01
```
[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) are saved models for the Pong and Breakout games. You can use them to play the games directly.
## Reproducing DQN, DoubleDQN and DuelingDQN with the Fluid version of PaddlePaddle
Based on PaddlePaddle's next-generation API, Fluid, this repository reproduces the DQN model from deep reinforcement learning and reaches the same level of metrics as the paper on classic Atari games. The model takes the game image as input and predicts the control signal for the next step end to end. The repository contains the following three models:
+ DQN:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
## Results on Atari games
### About Atari games
Please see [here](https://gym.openai.com/envs/#atari) to learn more about Atari games.
### Results on the Pong game
The average game reward obtained by each of the three models as the number of training steps increases is shown below (roughly 3 hours per 1 million steps):
<div align="center">
<img src="assets/dqn.png" width="600" height="300" alt="DQN result"></img>
</div>
## How to use
### Dependencies:
+ python2.7
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ ale_python_interface
### Install dependencies:
+ Install PaddlePaddle:
It is recommended to compile and install PaddlePaddle from source.
+ Install other dependencies:
```
pip install -r requirement.txt
pip install gym[atari]
```
To install ale_python_interface, please see [here](https://github.com/mgbellemare/Arcade-Learning-Environment).
### Train a model:
```
# Train on the Pong game with GPU (DQN model by default)
python train.py --rom ./rom_files/pong.bin --use_cuda
# Train a DoubleDQN model
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN
# Train a DuelingDQN model
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train on more games, you can download additional game ROM files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms).
### Test a model:
```
# Play the game with the best model saved during training and calculate the average rewards
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong
# Play the game with visualization
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01
```
[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) are trained models for the Pong and Breakout games, which you can use directly for testing.
# -*- coding: utf-8 -*-
import numpy as np
import os
import cv2
import threading
import gym
from gym import spaces
from gym.envs.atari.atari_env import ACTION_MEANING
from ale_python_interface import ALEInterface
__all__ = ['AtariPlayer']
ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms"
_ALE_LOCK = threading.Lock()
"""
The following AtariPlayer are copied or modified from tensorpack/tensorpack:
https://github.com/tensorpack/tensorpack/blob/master/examples/DeepQNetwork/atari.py
"""
class AtariPlayer(gym.Env):
"""
A wrapper for ALE emulator, with configurations to mimic DeepMind DQN settings.
Info:
score: the accumulated reward in the current game
gameOver: True when the current game is Over
"""
def __init__(self,
rom_file,
viz=0,
frame_skip=4,
nullop_start=30,
live_lost_as_eoe=True,
max_num_frames=0):
"""
Args:
rom_file: path to the rom
frame_skip: skip every k frames and repeat the action
viz: visualization to be done.
Set to 0 to disable.
Set to a positive number to be the delay between frames to show.
Set to a string to be a directory to store frames.
nullop_start: start with random number of null ops.
            live_lost_as_eoe: consider a lost life as end of episode. Useful for training.
max_num_frames: maximum number of frames per episode.
"""
super(AtariPlayer, self).__init__()
assert os.path.isfile(rom_file), \
"rom {} not found. Please download at {}".format(rom_file, ROM_URL)
try:
ALEInterface.setLoggerMode(ALEInterface.Logger.Error)
except AttributeError:
print("You're not using latest ALE")
# avoid simulator bugs: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/86
with _ALE_LOCK:
self.ale = ALEInterface()
self.ale.setInt(b"random_seed", np.random.randint(0, 30000))
self.ale.setInt(b"max_num_frames_per_episode", max_num_frames)
self.ale.setBool(b"showinfo", False)
self.ale.setInt(b"frame_skip", 1)
self.ale.setBool(b'color_averaging', False)
# manual.pdf suggests otherwise.
self.ale.setFloat(b'repeat_action_probability', 0.0)
# viz setup
if isinstance(viz, str):
assert os.path.isdir(viz), viz
self.ale.setString(b'record_screen_dir', viz)
viz = 0
if isinstance(viz, int):
viz = float(viz)
self.viz = viz
if self.viz and isinstance(self.viz, float):
self.windowname = os.path.basename(rom_file)
cv2.startWindowThread()
cv2.namedWindow(self.windowname)
self.ale.loadROM(rom_file.encode('utf-8'))
self.width, self.height = self.ale.getScreenDims()
self.actions = self.ale.getMinimalActionSet()
self.live_lost_as_eoe = live_lost_as_eoe
self.frame_skip = frame_skip
self.nullop_start = nullop_start
self.action_space = spaces.Discrete(len(self.actions))
self.observation_space = spaces.Box(low=0,
high=255,
shape=(self.height, self.width),
dtype=np.uint8)
self._restart_episode()
def get_action_meanings(self):
return [ACTION_MEANING[i] for i in self.actions]
def _grab_raw_image(self):
"""
:returns: the current 3-channel image
"""
m = self.ale.getScreenRGB()
return m.reshape((self.height, self.width, 3))
def _current_state(self):
"""
returns: a gray-scale (h, w) uint8 image
"""
ret = self._grab_raw_image()
# avoid missing frame issue: max-pooled over the last screen
ret = np.maximum(ret, self.last_raw_screen)
if self.viz:
if isinstance(self.viz, float):
cv2.imshow(self.windowname, ret)
cv2.waitKey(int(self.viz * 1000))
ret = ret.astype('float32')
        # weights 0.299, 0.587, 0.114; same as rgb2y in torch/image
ret = cv2.cvtColor(ret, cv2.COLOR_RGB2GRAY)
return ret.astype('uint8') # to save some memory
def _restart_episode(self):
with _ALE_LOCK:
self.ale.reset_game()
# random null-ops start
n = np.random.randint(self.nullop_start)
self.last_raw_screen = self._grab_raw_image()
for k in range(n):
if k == n - 1:
self.last_raw_screen = self._grab_raw_image()
self.ale.act(0)
def reset(self):
if self.ale.game_over():
self._restart_episode()
return self._current_state()
def step(self, act):
oldlives = self.ale.lives()
r = 0
for k in range(self.frame_skip):
if k == self.frame_skip - 1:
self.last_raw_screen = self._grab_raw_image()
r += self.ale.act(self.actions[act])
newlives = self.ale.lives()
if self.ale.game_over() or \
(self.live_lost_as_eoe and newlives < oldlives):
break
isOver = self.ale.game_over()
if self.live_lost_as_eoe:
isOver = isOver or newlives < oldlives
info = {'ale.lives': newlives}
return self._current_state(), r, isOver, info
# -*- coding: utf-8 -*-
import numpy as np
from collections import deque
import gym
from gym import spaces
_v0, _v1 = gym.__version__.split('.')[:2]
assert int(_v0) > 0 or int(_v1) >= 10, gym.__version__
"""
The following wrappers are copied or modified from openai/baselines:
https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
"""
class MapState(gym.ObservationWrapper):
def __init__(self, env, map_func):
gym.ObservationWrapper.__init__(self, env)
self._func = map_func
def observation(self, obs):
return self._func(obs)
class FrameStack(gym.Wrapper):
def __init__(self, env, k):
"""Buffer observations and stack across channels (last axis)."""
gym.Wrapper.__init__(self, env)
self.k = k
self.frames = deque([], maxlen=k)
shp = env.observation_space.shape
chan = 1 if len(shp) == 2 else shp[2]
self.observation_space = spaces.Box(low=0,
high=255,
shape=(shp[0], shp[1], chan * k),
dtype=np.uint8)
def reset(self):
"""Clear buffer and re-fill by duplicating the first observation."""
ob = self.env.reset()
for _ in range(self.k - 1):
self.frames.append(np.zeros_like(ob))
self.frames.append(ob)
return self.observation()
def step(self, action):
ob, reward, done, info = self.env.step(action)
self.frames.append(ob)
return self.observation(), reward, done, info
def observation(self):
assert len(self.frames) == self.k
return np.stack(self.frames, axis=0)
class _FireResetEnv(gym.Wrapper):
def __init__(self, env):
"""Take action on reset for environments that are fixed until firing."""
gym.Wrapper.__init__(self, env)
assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
assert len(env.unwrapped.get_action_meanings()) >= 3
def reset(self):
self.env.reset()
obs, _, done, _ = self.env.step(1)
if done:
self.env.reset()
obs, _, done, _ = self.env.step(2)
if done:
self.env.reset()
return obs
def step(self, action):
return self.env.step(action)
def FireResetEnv(env):
if isinstance(env, gym.Wrapper):
baseenv = env.unwrapped
else:
baseenv = env
if 'FIRE' in baseenv.get_action_meanings():
return _FireResetEnv(env)
return env
class LimitLength(gym.Wrapper):
def __init__(self, env, k):
gym.Wrapper.__init__(self, env)
self.k = k
def reset(self):
# This assumes that reset() will really reset the env.
# If the underlying env tries to be smart about reset
# (e.g. end-of-life), the assumption doesn't hold.
ob = self.env.reset()
self.cnt = 0
return ob
def step(self, action):
ob, r, done, info = self.env.step(action)
self.cnt += 1
if self.cnt == self.k:
done = True
return ob, r, done, info
# -*- coding: utf-8 -*-
import numpy as np
import copy
from collections import deque, namedtuple
Experience = namedtuple('Experience', ['state', 'action', 'reward', 'isOver'])
class ReplayMemory(object):
def __init__(self, max_size, state_shape, context_len):
self.max_size = int(max_size)
self.state_shape = state_shape
self.context_len = int(context_len)
self.state = np.zeros((self.max_size, ) + state_shape, dtype='uint8')
self.action = np.zeros((self.max_size, ), dtype='int32')
self.reward = np.zeros((self.max_size, ), dtype='float32')
self.isOver = np.zeros((self.max_size, ), dtype='bool')
self._curr_size = 0
self._curr_pos = 0
self._context = deque(maxlen=context_len - 1)
def append(self, exp):
"""append a new experience into replay memory
"""
if self._curr_size < self.max_size:
self._assign(self._curr_pos, exp)
self._curr_size += 1
else:
self._assign(self._curr_pos, exp)
self._curr_pos = (self._curr_pos + 1) % self.max_size
if exp.isOver:
self._context.clear()
else:
self._context.append(exp)
def recent_state(self):
""" maintain recent state for training"""
lst = list(self._context)
states = [np.zeros(self.state_shape, dtype='uint8')] * \
(self._context.maxlen - len(lst))
states.extend([k.state for k in lst])
return states
def sample(self, idx):
""" return state, action, reward, isOver,
note that some frames in state may be generated from last episode,
they should be removed from state
"""
state = np.zeros(
(self.context_len + 1, ) + self.state_shape, dtype=np.uint8)
state_idx = np.arange(idx, idx + self.context_len + 1) % self._curr_size
# confirm that no frame was generated from last episode
has_last_episode = False
for k in range(self.context_len - 2, -1, -1):
to_check_idx = state_idx[k]
if self.isOver[to_check_idx]:
has_last_episode = True
state_idx = state_idx[k + 1:]
state[k + 1:] = self.state[state_idx]
break
if not has_last_episode:
state = self.state[state_idx]
real_idx = (idx + self.context_len - 1) % self._curr_size
action = self.action[real_idx]
reward = self.reward[real_idx]
isOver = self.isOver[real_idx]
return state, reward, action, isOver
def __len__(self):
return self._curr_size
def _assign(self, pos, exp):
self.state[pos] = exp.state
self.reward[pos] = exp.reward
self.action[pos] = exp.action
self.isOver[pos] = exp.isOver
def sample_batch(self, batch_size):
"""sample a batch from replay memory for training
"""
batch_idx = np.random.randint(
self._curr_size - self.context_len - 1, size=batch_size)
batch_idx = (self._curr_pos + batch_idx) % self._curr_size
batch_exp = [self.sample(i) for i in batch_idx]
return self._process_batch(batch_exp)
def _process_batch(self, batch_exp):
state = np.asarray([e[0] for e in batch_exp], dtype='uint8')
reward = np.asarray([e[1] for e in batch_exp], dtype='float32')
action = np.asarray([e[2] for e in batch_exp], dtype='int8')
isOver = np.asarray([e[3] for e in batch_exp], dtype='bool')
return [state, action, reward, isOver]
#-*- coding: utf-8 -*-
import argparse
import os
import numpy as np
import paddle.fluid as fluid
from train import get_player
from tqdm import tqdm
def predict_action(exe, state, predict_program, feed_names, fetch_targets,
action_dim):
if np.random.random() < 0.01:
act = np.random.randint(action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = exe.run(predict_program,
feed={feed_names[0]: state.astype('float32')},
fetch_list=fetch_targets)[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
return act
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument('--rom', type=str, required=True, help='atari rom')
parser.add_argument(
'--model_path', type=str, required=True, help='dirname to load model')
parser.add_argument(
'--viz',
type=float,
default=0,
help='''viz: visualization setting:
Set to 0 to disable;
Set to a positive number to be the delay between frames to show.
''')
args = parser.parse_args()
env = get_player(args.rom, viz=args.viz)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[predict_program, feed_names,
fetch_targets] = fluid.io.load_inference_model(args.model_path, exe)
episode_reward = []
for _ in tqdm(xrange(30), desc='eval agent'):
state = env.reset()
total_reward = 0
while True:
action = predict_action(exe, state, predict_program, feed_names,
fetch_targets, env.action_space.n)
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
        print('Average reward of 30 episodes: {}'.format(eval_reward))
numpy
gym
tqdm
opencv-python
paddlepaddle-gpu==0.12.0
#-*- coding: utf-8 -*-
from DQN_agent import DQNModel
from DoubleDQN_agent import DoubleDQNModel
from DuelingDQN_agent import DuelingDQNModel
from atari import AtariPlayer
import paddle.fluid as fluid
import gym
import argparse
import cv2
from tqdm import tqdm
from expreplay import ReplayMemory, Experience
import numpy as np
import os
from datetime import datetime
from atari_wrapper import FrameStack, MapState, FireResetEnv, LimitLength
from collections import deque
UPDATE_FREQ = 4
#MEMORY_WARMUP_SIZE = 2000
MEMORY_SIZE = 1e6
MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20
IMAGE_SIZE = (84, 84)
CONTEXT_LEN = 4
ACTION_REPEAT = 4 # aka FRAME_SKIP
UPDATE_FREQ = 4
def run_train_episode(agent, env, exp):
total_reward = 0
state = env.reset()
step = 0
while True:
step += 1
context = exp.recent_state()
context.append(state)
context = np.stack(context, axis=0)
action = agent.act(context, train_or_test='train')
next_state, reward, isOver, _ = env.step(action)
exp.append(Experience(state, action, reward, isOver))
# train model
# start training
if len(exp) > MEMORY_WARMUP_SIZE:
if step % UPDATE_FREQ == 0:
batch_all_state, batch_action, batch_reward, batch_isOver = exp.sample_batch(
args.batch_size)
batch_state = batch_all_state[:, :CONTEXT_LEN, :, :]
batch_next_state = batch_all_state[:, 1:, :, :]
agent.train(batch_state, batch_action, batch_reward,
batch_next_state, batch_isOver)
total_reward += reward
state = next_state
if isOver:
break
return total_reward, step
def get_player(rom, viz=False, train=False):
env = AtariPlayer(
rom,
frame_skip=ACTION_REPEAT,
viz=viz,
live_lost_as_eoe=train,
max_num_frames=60000)
env = FireResetEnv(env)
env = MapState(env, lambda im: cv2.resize(im, IMAGE_SIZE))
if not train:
# in training, context is taken care of in expreplay buffer
env = FrameStack(env, CONTEXT_LEN)
return env
def eval_agent(agent, env):
episode_reward = []
for _ in tqdm(range(30), desc='eval agent'):
state = env.reset()
total_reward = 0
step = 0
while True:
step += 1
action = agent.act(state, train_or_test='test')
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
return eval_reward
def train_agent():
env = get_player(args.rom, train=True)
test_env = get_player(args.rom)
exp = ReplayMemory(args.mem_size, IMAGE_SIZE, CONTEXT_LEN)
action_dim = env.action_space.n
if args.alg == 'DQN':
agent = DQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DoubleDQN':
agent = DoubleDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DuelingDQN':
agent = DuelingDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
else:
print('Input algorithm name error!')
return
with tqdm(total=MEMORY_WARMUP_SIZE) as pbar:
while len(exp) < MEMORY_WARMUP_SIZE:
total_reward, step = run_train_episode(agent, env, exp)
pbar.update(step)
# train
test_flag = 0
save_flag = 0
pbar = tqdm(total=1e8)
recent_100_reward = []
total_step = 0
max_reward = None
save_path = os.path.join(args.model_dirname, '{}-{}'.format(
args.alg, os.path.basename(args.rom).split('.')[0]))
while True:
# start epoch
total_reward, step = run_train_episode(agent, env, exp)
total_step += step
pbar.set_description('[train]exploration:{}'.format(agent.exploration))
pbar.update(step)
if total_step // args.test_every_steps == test_flag:
pbar.write("testing")
eval_reward = eval_agent(agent, test_env)
test_flag += 1
print("eval_agent done, (steps, eval_reward): ({}, {})".format(
total_step, eval_reward))
if max_reward is None or eval_reward > max_reward:
max_reward = eval_reward
fluid.io.save_inference_model(save_path, ['state'],
agent.pred_value, agent.exe,
agent.predict_program)
pbar.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--alg',
type=str,
default='DQN',
help='Reinforcement learning algorithm, support: DQN, DoubleDQN, DuelingDQN'
)
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument(
'--gamma',
type=float,
default=0.99,
help='discount factor for accumulated reward computation')
parser.add_argument(
'--mem_size',
type=int,
default=1000000,
help='memory size for experience replay')
parser.add_argument(
'--batch_size', type=int, default=64, help='batch size for training')
parser.add_argument('--rom', help='atari rom', required=True)
parser.add_argument(
'--model_dirname',
type=str,
default='saved_model',
help='dirname to save model')
parser.add_argument(
'--test_every_steps',
type=int,
default=100000,
help='every steps number to run test')
args = parser.parse_args()
train_agent()
#-*- coding: utf-8 -*-
#File: utils.py
import paddle.fluid as fluid
import numpy as np
def fluid_argmax(x):
"""
Get index of max value for the last dimension
"""
_, max_index = fluid.layers.topk(x, k=1)
return max_index
def fluid_flatten(x):
"""
Flatten fluid variable along the first dimension
"""
return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])])
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Advbox
Advbox is a toolbox for generating adversarial examples that fool neural networks, and it can be used to benchmark the robustness of machine learning models.
Advbox is based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) Fluid and is under continual development, always welcoming contributions of the latest methods of adversarial attack and defense.
## Overview
[Szegedy et al.](https://arxiv.org/abs/1312.6199) first discovered an intriguing property of deep neural networks in the context of image classification. They showed that state-of-the-art deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations to images that remain (almost) imperceptible to the human visual system. These perturbations are found by optimizing the input to maximize the prediction error, and the images modified by them are called `adversarial examples`. The profound implications of these results triggered wide interest among researchers in adversarial attacks and defenses for deep learning in general.
Advbox is similar to [Foolbox](https://github.com/bethgelab/foolbox) and [CleverHans](https://github.com/tensorflow/cleverhans). CleverHans only supports the TensorFlow framework, while Foolbox interfaces with many popular machine learning frameworks such as PyTorch, Keras, TensorFlow, Theano, Lasagne and MXNet. However, these two great libraries don't support PaddlePaddle, an easy-to-use, efficient, flexible and scalable deep learning platform originally developed by Baidu scientists and engineers for the purpose of applying deep learning to many products at Baidu.
## Usage
Advbox provides stable reference implementations of modern methods for generating adversarial examples, such as FGSM, DeepFool and JSMA. When you want to benchmark the robustness of your neural networks, you can use Advbox to generate adversarial examples and evaluate the networks against them. Some tips for using Advbox:
1. Train a model and save the parameters.
2. Load the trained parameters, then reconstruct the model.
3. Use Advbox to generate the adversarial samples, as in the sketch below.
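A minimal sketch of this workflow is shown below, using the `Adversary` and `DeepFoolAttack` classes included in this module. The construction of the model wrapper is only indicated by a placeholder, because the exact constructor arguments of the PaddlePaddle wrapper in `models/paddle.py` depend on your network definition; the import paths simply follow the directory layout listed under *Structure*, and `tutorials/mnist_tutorial_deepfool.py` contains a complete, runnable version.

```python
import numpy as np

from advbox.adversary import Adversary
from advbox.attack.deepfool import DeepFoolAttack

# Steps 1-2: train a network, save its parameters, reload them and wrap the
# network into an Advbox model. The wrapper must provide predict/gradient/
# bounds/num_classes/channel_axis; see models/paddle.py for the PaddlePaddle
# implementation and the MNIST tutorials for how it is constructed.
model = ...  # placeholder: an Advbox model wrapping the trained network

# Step 3: wrap a single input sample and run an attack on it.
image = np.random.rand(1, 28, 28).astype('float32')  # placeholder MNIST-shaped input
adversary = Adversary(image, original_label=7)

attack = DeepFoolAttack(model)
adversary = attack(adversary, iterations=100, overshoot=0.02)

if adversary.is_successful():
    adv_image = adversary.adversarial_example
    print('found adversarial example with label {}'.format(
        adversary.adversarial_label))
```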
#### Dependencies
* PaddlePaddle: [the latest develop branch](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)
* Python 2.x
#### Structure
Network models, attack method implementations and the criterion that defines an adversarial example are the three essential elements for generating adversarial examples. For brevity, misclassification is adopted as the adversarial criterion in Advbox.
The structure of the Advbox module is as follows:
.
├── advbox
| ├── __init__.py
| ├── attack
| ├── __init__.py
| ├── base.py
| ├── deepfool.py
| ├── gradient_method.py
| ├── lbfgs.py
| └── saliency.py
| ├── models
| ├── __init__.py
| ├── base.py
| └── paddle.py
| └── adversary.py
├── tutorials
| ├── __init__.py
| ├── mnist_model.py
| ├── mnist_tutorial_lbfgs.py
| ├── mnist_tutorial_fgsm.py
| ├── mnist_tutorial_bim.py
| ├── mnist_tutorial_ilcm.py
| ├── mnist_tutorial_mifgsm.py
| ├── mnist_tutorial_jsma.py
| └── mnist_tutorial_deepfool.py
└── README.md
**advbox.attack**
Advbox implements several popular adversarial attacks which search for adversarial examples. Each attack method uses a distance measure (L1, L2, etc.) to quantify the size of the adversarial perturbation. Advbox makes it easy to craft adversarial examples, as some attack methods perform internal hyperparameter tuning to find the minimum perturbation.
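For illustration only, a new attack can be plugged in by subclassing the `Attack` base class (defined in `attack/base.py` and reproduced later in this document) and implementing `_apply`. The toy single-step attack below is a hypothetical example written as if it lived next to the other attack modules; it is not one of the library's own implementations.

```python
import numpy as np

from .base import Attack


class OneStepSignAttack(Attack):
    """A toy one-step sign attack, for illustration only."""

    def _apply(self, adversary, epsilon=0.1):
        min_, max_ = self.model.bounds()
        # Gradient returned by the model wrapper for the original label.
        gradient = self.model.gradient(adversary.original,
                                       adversary.original_label)
        # Take a single step of size epsilon along the gradient sign and
        # clip the result back into the valid input range.
        adv_img = np.clip(adversary.original + epsilon * np.sign(gradient),
                          min_, max_)
        adv_label = np.argmax(self.model.predict(adv_img))
        # Accept the candidate only if it satisfies the adversarial criterion.
        adversary.try_accept_the_example(adv_img, adv_label)
        return adversary
```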
**advbox.model**
Advbox implements an interface to PaddlePaddle. Additionally, wrappers for other deep learning frameworks such as TensorFlow can also be defined and employed. The module is used to compute predictions and gradients for given inputs in a specific framework.
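The attack implementations shipped with Advbox rely only on a small set of methods from the model wrapper. The sketch below is not copied from `models/base.py`; it is inferred from the calls made in `attack/base.py` and `attack/deepfool.py` later in this document, so treat it as an approximation of the real interface:

```python
class ModelInterfaceSketch(object):
    """Approximate interface that the attack classes expect from a model."""

    def predict(self, data):
        """Return the model's output vector (e.g. softmax scores) for one input."""
        raise NotImplementedError

    def gradient(self, data, label):
        """Return the gradient with respect to the input for the given label."""
        raise NotImplementedError

    def bounds(self):
        """Return the (min, max) range of valid input values."""
        raise NotImplementedError

    def num_classes(self):
        """Return the number of output classes."""
        raise NotImplementedError

    def channel_axis(self):
        """Return the value compared against the input's ndim in attack/base.py."""
        raise NotImplementedError
```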
**advbox.adversary**
Adversary contains the original object, the target and the adversarial examples. It uses misclassification as the criterion for accepting an adversarial example.
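A short usage sketch of `Adversary` (the full class definition appears further down): under the misclassification criterion, a candidate example is accepted when its predicted label differs from the original label, or matches the target label in a targeted attack.

```python
import numpy as np

from advbox.adversary import Adversary

image = np.zeros((1, 28, 28), dtype='float32')   # placeholder input
adversary = Adversary(image, original_label=3)

# For a targeted attack, ask the attack to drive the prediction towards label 8.
adversary.set_target(is_targeted_attack=True, target_label=8)

# Attacks hand candidate examples back through try_accept_the_example();
# a candidate is kept only if it satisfies the criterion described above.
candidate = image.copy()
adversary.try_accept_the_example(candidate, adversarial_label=8)
print(adversary.is_successful())  # True, since the label matches the target
```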
## Tutorials
The `./tutorials/` folder provides some tutorials for generating adversarial examples on the MNIST dataset. You can slightly modify the code to apply it to other datasets. These attack methods are supported in Advbox:
* [L-BFGS](https://arxiv.org/abs/1312.6199)
* [FGSM](https://arxiv.org/abs/1412.6572)
* [BIM](https://arxiv.org/abs/1607.02533)
* [ILCM](https://arxiv.org/abs/1607.02533)
* [MI-FGSM](https://arxiv.org/pdf/1710.06081.pdf)
* [JSMA](https://arxiv.org/pdf/1511.07528)
* [DeepFool](https://arxiv.org/abs/1511.04599)
## Testing
Benchmarks on a vanilla CNN model.
> MNIST
| adversarial attacks | fooling rate (non-targeted) | fooling rate (targeted) | max_epsilon | iterations | Strength |
|:-----:| :----: | :---: | :----: | :----: | :----: |
|L-BFGS| --- | 89.2% | --- | One shot | *** |
|FGSM| 57.8% | 26.55% | 0.3 | One shot| *** |
|BIM| 97.4% | --- | 0.1 | 100 | **** |
|ILCM| --- | 100.0% | 0.1 | 100 | **** |
|MI-FGSM| 94.4% | 100.0% | 0.1 | 100 | **** |
|JSMA| 96.8% | 90.4%| 0.1 | 2000 | *** |
|DeepFool| 97.7% | 51.3% | --- | 100 | **** |
* The strength (higher for more asterisks) is based on the impression from the reviewed literature.
---
## References
* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199), C. Szegedy et al., arxiv 2014
* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572), I. Goodfellow et al., ICLR 2015
* [Adversarial Examples In The Physical World](https://arxiv.org/pdf/1607.02533v3.pdf), A. Kurakin et al., ICLR workshop 2017
* [Boosting Adversarial Attacks with Momentum](https://arxiv.org/abs/1710.06081), Yinpeng Dong et al., arxiv 2018
* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528), N. Papernot et al., ESSP 2016
* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/abs/1511.04599), S. Moosavi-Dezfooli et al., CVPR 2016
* [Foolbox: A Python toolbox to benchmark the robustness of machine learning models](https://arxiv.org/abs/1707.04131), Jonas Rauber et al., arxiv 2018
* [CleverHans: An adversarial example library for constructing attacks, building defenses, and benchmarking both](https://github.com/tensorflow/cleverhans#setting-up-cleverhans)
* [Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey](https://arxiv.org/abs/1801.00553), Naveed Akhtar, Ajmal Mian, arxiv 2018
"""
A set of tools for generating adversarial examples on the PaddlePaddle platform.
"""
"""
Defines a class that contains the original object, the target and the
adversarial example.
"""
class Adversary(object):
"""
Adversary contains the original object, the target and the adversarial
example.
"""
def __init__(self, original, original_label=None):
"""
:param original: The original instance, such as an image.
:param original_label: The original instance's label.
"""
assert original is not None
self.original_label = original_label
self.target_label = None
self.adversarial_label = None
self.__original = original
self.__target = None
self.__is_targeted_attack = False
self.__adversarial_example = None
self.__bad_adversarial_example = None
def set_target(self, is_targeted_attack, target=None, target_label=None):
"""
Set the target be targeted or untargeted.
:param is_targeted_attack: bool
:param target: The target.
:param target_label: If is_targeted_attack is true and target_label is
None, self.target_label will be set by the Attack class.
If is_targeted_attack is false, target_label must be None.
"""
assert (target_label is None) or is_targeted_attack
self.__is_targeted_attack = is_targeted_attack
self.target_label = target_label
self.__target = target
if not is_targeted_attack:
self.target_label = None
self.__target = None
def set_original(self, original, original_label=None):
"""
Reset the original.
:param original: Original instance.
:param original_label: Original instance's label.
"""
if original != self.__original:
self.__original = original
self.original_label = original_label
self.__adversarial_example = None
self.__bad_adversarial_example = None
if original is None:
self.original_label = None
def _is_successful(self, adversarial_label):
"""
        Check whether adversarial_label is the expected adversarial label.
:param adversarial_label: adversarial label.
:return: bool
"""
if self.target_label is not None:
return adversarial_label == self.target_label
else:
return (adversarial_label is not None) and \
(adversarial_label != self.original_label)
def is_successful(self):
"""
Has the adversarial example been found.
:return: bool
"""
return self._is_successful(self.adversarial_label)
def try_accept_the_example(self, adversarial_example, adversarial_label):
"""
        If adversarial_label is the label we are looking for, the
        adversarial_example and adversarial_label will be accepted and
        True will be returned.
:return: bool
"""
assert adversarial_example is not None
assert self.__original.shape == adversarial_example.shape
ok = self._is_successful(adversarial_label)
if ok:
self.__adversarial_example = adversarial_example
self.adversarial_label = adversarial_label
else:
self.__bad_adversarial_example = adversarial_example
return ok
def perturbation(self, multiplying_factor=1.0):
"""
        The perturbation that was added to obtain the adversarial example.
:param multiplying_factor: float.
:return: The perturbation that is multiplied by multiplying_factor.
"""
assert self.__original is not None
assert (self.__adversarial_example is not None) or \
(self.__bad_adversarial_example is not None)
if self.__adversarial_example is not None:
return multiplying_factor * (
self.__adversarial_example - self.__original)
else:
return multiplying_factor * (
self.__bad_adversarial_example - self.__original)
@property
def is_targeted_attack(self):
"""
:property: is_targeted_attack
"""
return self.__is_targeted_attack
@property
def target(self):
"""
:property: target
"""
return self.__target
@property
def original(self):
"""
:property: original
"""
return self.__original
@property
def adversarial_example(self):
"""
:property: adversarial_example
"""
return self.__adversarial_example
@property
def bad_adversarial_example(self):
"""
:property: bad_adversarial_example
"""
return self.__bad_adversarial_example
"""
The abstract base class for adversarial attacks.
"""
import logging
from abc import ABCMeta
from abc import abstractmethod
import numpy as np
class Attack(object):
"""
    Abstract base class for adversarial attacks. `Attack` represents an
    adversarial attack which searches for an adversarial example. A subclass
    should implement the _apply() method.
Args:
model(Model): an instance of the class advbox.base.Model.
"""
__metaclass__ = ABCMeta
def __init__(self, model):
self.model = model
def __call__(self, adversary, **kwargs):
"""
Generate the adversarial sample.
Args:
adversary(object): The adversary object.
**kwargs: Other named arguments.
"""
self._preprocess(adversary)
return self._apply(adversary, **kwargs)
@abstractmethod
def _apply(self, adversary, **kwargs):
"""
Search an adversarial example.
Args:
adversary(object): The adversary object.
**kwargs: Other named arguments.
"""
raise NotImplementedError
def _preprocess(self, adversary):
"""
Preprocess the adversary object.
:param adversary: adversary
:return: None
"""
assert self.model.channel_axis() == adversary.original.ndim
if adversary.original_label is None:
adversary.original_label = np.argmax(
self.model.predict(adversary.original))
if adversary.is_targeted_attack and adversary.target_label is None:
if adversary.target is None:
raise ValueError(
'When adversary.is_targeted_attack is true, '
'adversary.target_label or adversary.target must be set.')
else:
adversary.target_label = np.argmax(
self.model.predict(adversary.target))
logging.info('adversary:'
'\n original_label: {}'
'\n target_label: {}'
'\n is_targeted_attack: {}'
''.format(adversary.original_label, adversary.target_label,
adversary.is_targeted_attack))
"""
This module provide the attack method for deepfool. Deepfool is a simple and
accurate adversarial attack.
"""
from __future__ import division
import logging
import numpy as np
from .base import Attack
__all__ = ['DeepFoolAttack']
class DeepFoolAttack(Attack):
"""
    "DeepFool: a simple and accurate method to fool deep neural networks",
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard,
https://arxiv.org/abs/1511.04599
"""
def _apply(self, adversary, iterations=100, overshoot=0.02):
"""
Apply the deep fool attack.
Args:
adversary(Adversary): The Adversary object.
iterations(int): The iterations.
overshoot(float): We add (1+overshoot)*pert every iteration.
Return:
adversary: The Adversary object.
"""
assert adversary is not None
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
f = self.model.predict(adversary.original)
if adversary.is_targeted_attack:
labels = [adversary.target_label]
else:
max_class_count = 10
class_count = self.model.num_classes()
if class_count > max_class_count:
labels = np.argsort(f)[-(max_class_count + 1):-1]
else:
labels = np.arange(class_count)
gradient = self.model.gradient(adversary.original, pre_label)
x = adversary.original
for iteration in xrange(iterations):
w = np.inf
w_norm = np.inf
pert = np.inf
for k in labels:
if k == pre_label:
continue
gradient_k = self.model.gradient(x, k)
w_k = gradient_k - gradient
f_k = f[k] - f[pre_label]
w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8
pert_k = (np.abs(f_k) + 1e-8) / w_k_norm
if pert_k < pert:
pert = pert_k
w = w_k
w_norm = w_k_norm
r_i = -w * pert / w_norm # The gradient is -gradient in the paper.
x = x + (1 + overshoot) * r_i
x = np.clip(x, min_, max_)
f = self.model.predict(x)
gradient = self.model.gradient(x, pre_label)
adv_label = np.argmax(f)
logging.info('iteration={}, f[pre_label]={}, f[target_label]={}'
', f[adv_label]={}, pre_label={}, adv_label={}'
''.format(iteration, f[pre_label], (
f[adversary.target_label]
if adversary.is_targeted_attack else 'NaN'), f[
adv_label], pre_label, adv_label))
if adversary.try_accept_the_example(x, adv_label):
return adversary
return adversary
"""
Models __init__.py
"""
"""
A set of tutorials for generating adversarial examples with advbox.
"""