Commit 3cbf0f73, authored by G guosheng

Merge branch 'develop' of https://github.com/PaddlePaddle/models into add-transformer-BeamsearchDecoder-clean
Running the sample programs in this directory requires PaddlePaddle v0.10.0. If your installed PaddlePaddle version is lower than this requirement, please update it following the instructions in the [installation guide](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# DeepFM for Click-Through Rate Prediction
## Introduction
This model implements the DeepFM model proposed in the following paper:
```text
@inproceedings{guo2017deepfm,
title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
pages={1725--1731},
year={2017}
}
```
The DeepFM model combines the low-order feature interactions captured by a factorization machine with the high-order interactions captured by a deep neural network. For details on factorization machines, please refer to the paper [Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf).
## Dataset
This example uses the Criteo dataset from the Kaggle [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/).
Each row is the feature vector of one ad impression, with the first column being a label that indicates whether the impression was clicked. There are 39 features in total: 13 integer-valued features and 26 categorical features. The test set has no labels.
Download the dataset:
```bash
cd data && ./download.sh && cd ..
```
## Model
The DeepFM model consists of a factorization machine (FM) and a deep neural network (DNN). All input features are fed into both the FM and the DNN, and their outputs are combined to form the final prediction. The embedding layer for the sparse features in the DNN shares its parameters with the latent vectors (factors) of the FM layer.
The factorization machine layer in PaddlePaddle computes the second-order feature interactions. The following code example combines the factorization machine layer with a fully connected layer to form a complete factorization machine:
```python
def fm_layer(input, factor_size):
    first_order = paddle.layer.fc(
        input=input, size=1, act=paddle.activation.Linear())
    second_order = paddle.layer.factorization_machine(
        input=input, factor_size=factor_size)
    fm = paddle.layer.addto(
        input=[first_order, second_order],
        act=paddle.activation.Linear(),
        bias_attr=False)
    return fm
```
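For context, here is a rough sketch of how the FM branch above could be combined with a DNN branch into the final DeepFM output. The hidden sizes and the `Relu`/`Sigmoid` choices are illustrative assumptions, not the repository's actual network definition:
```python
# A sketch only: combine the FM branch with a DNN branch to form the DeepFM
# output. Layer sizes and activations below are hypothetical.
def deepfm(input, factor_size):
    fm = fm_layer(input, factor_size)          # first- plus second-order terms
    dnn = input
    for size in [400, 400, 400]:               # hypothetical hidden sizes
        dnn = paddle.layer.fc(input=dnn, size=size,
                              act=paddle.activation.Relu())
    # sum the two branches and squash into a click probability
    return paddle.layer.fc(input=[fm, dnn], size=1,
                           act=paddle.activation.Sigmoid())
```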
## Data preparation
Preprocess the raw dataset: integer features are normalized to [0, 1] with min-max normalization, and categorical features are one-hot encoded. The raw dataset is split into two parts: 90% for training and the remaining 10% for validation during training.
```bash
python preprocess.py --datadir ./data/raw --outdir ./data
```
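For illustration only, the min-max normalization described above amounts to the following (a numpy sketch, not the actual code in `preprocess.py`):
```python
# Illustrative numpy equivalent of the min-max normalization step; the real
# logic lives in preprocess.py.
import numpy as np

def min_max_normalize(col):
    lo, hi = col.min(), col.max()
    if hi > lo:
        return (col - lo).astype('float32') / (hi - lo)
    return np.zeros_like(col, dtype='float32')
```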
## Training
The command-line options for training can be listed with `python train.py -h`.
Train the model:
```bash
python train.py \
--train_data_path data/train.txt \
--test_data_path data/valid.txt \
2>&1 | tee train.log
```
After training up to batch 40000 of pass 9, the test AUC is 0.807178 and the cost is 0.445196.
## Inference
The command-line options for inference can be listed with `python infer.py -h`.
Run inference on the test set:
```bash
python infer.py \
--model_gz_path models/model-pass-9-batch-10000.tar.gz \
--data_path data/test.txt \
--prediction_output_path ./predict.txt
```
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
## Deep Automatic Speech Recognition
### TODO
This project is still under active development.

### Introduction
TBD
### Installation
#### Kaldi
The decoder depends on [kaldi](https://github.com/kaldi-asr/kaldi); install it by following its instructions. Then
```shell
export KALDI_ROOT=<absolute path to kaldi>
```
#### Decoder
```shell
git clone https://github.com/PaddlePaddle/models.git
cd models/fluid/DeepASR/decoder
sh setup.sh
```
### Data preprocessing
TBD
### Training
TBD
### Inference & Decoding
TBD
### Question and Contribution
TBD
...@@ -8,6 +8,7 @@ import numpy as np
 import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
 import data_utils.augmentor.trans_add_delta as trans_add_delta
 import data_utils.augmentor.trans_splice as trans_splice
+import data_utils.augmentor.trans_delay as trans_delay


 class TestTransMeanVarianceNorm(unittest.TestCase):
...@@ -112,5 +113,24 @@ class TestTransSplict(unittest.TestCase):
                     self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)


+class TestTransDelay(unittest.TestCase):
+    """unittest for TransDelay
+    """
+
+    def test_perform(self):
+        label = np.zeros((10, 1), dtype="int64")
+        for i in xrange(10):
+            label[i][0] = i
+        trans = trans_delay.TransDelay(5)
+        (_, label, _) = trans.perform_trans((None, label, None))
+        for i in xrange(5):
+            self.assertAlmostEqual(label[i + 5][0], i)
+        for i in xrange(5):
+            self.assertAlmostEqual(label[i][0], 0)
+
+
 if __name__ == '__main__':
     unittest.main()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import math


class TransDelay(object):
    """Delay the label sequence, padding the front with the first delayed value.

    Attributes:
        _delay_time : the number of frames to delay the label by
    """

    def __init__(self, delay_time):
        """
        Args:
            delay_time : the number of frames to delay the label by
        """
        self._delay_time = delay_time

    def perform_trans(self, sample):
        """
        Args:
            sample(object): input sample, containing a feature numpy array,
                            a label numpy array, and a sample name list
        Returns:
            (feature, label, name)
        """
        (feature, label, name) = sample

        shape = label.shape
        assert len(shape) == 2
        # shift the labels back by _delay_time frames
        label[self._delay_time:shape[0]] = label[0:shape[0] - self._delay_time]
        # pad the front with the first (now delayed) label value
        for i in xrange(self._delay_time):
            label[i][0] = label[self._delay_time][0]

        return (feature, label, name)
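A quick toy walk-through of `TransDelay` (hypothetical data, for illustration):
```python
# Toy example (not from the repo): delay a 6-frame label sequence by 2 frames.
import numpy as np

label = np.arange(6, dtype="int64").reshape(6, 1)    # [0, 1, 2, 3, 4, 5]
trans = TransDelay(2)
_, delayed, _ = trans.perform_trans((None, label, None))
print(delayed.ravel())                               # -> [0 0 0 1 2 3]
```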
 data_dir=~/.cache/paddle/dataset/speech/deep_asr_data/aishell
 data_url='http://deep-asr-data.gz.bcebos.com/aishell_data.tar.gz'
 lst_url='http://deep-asr-data.gz.bcebos.com/aishell_lst.tar.gz'
-md5=e017d858d9e509c8a84b73f673f08b9a
+md5=17669b8d63331c9326f4a9393d289bfb
 if [ ! -e $data_dir ]; then
     mkdir -p $data_dir
......
-export CUDA_VISIBLE_DEVICES=2,3,4,5
+export CUDA_VISIBLE_DEVICES=0,1,2,3
 python -u ../../tools/profile.py --feature_lst data/train_feature.lst \
                     --label_lst data/train_label.lst \
                     --mean_var data/aishell/global_mean_var \
                     --parallel \
-                    --frame_dim 2640 \
-                    --class_num 101 \
+                    --frame_dim 80 \
+                    --class_num 3040 \

-export CUDA_VISIBLE_DEVICES=2,3,4,5
+export CUDA_VISIBLE_DEVICES=0,1,2,3
 python -u ../../train.py --train_feature_lst data/train_feature.lst \
                          --train_label_lst data/train_label.lst \
                          --val_feature_lst data/val_feature.lst \
                          --val_label_lst data/val_label.lst \
                          --mean_var data/aishell/global_mean_var \
                          --checkpoints checkpoints \
-                         --frame_dim 2640 \
-                         --class_num 101 \
+                         --frame_dim 80 \
+                         --class_num 3040 \
                          --infer_models '' \
-                         --batch_size 128 \
-                         --learning_rate 0.00016 \
+                         --batch_size 64 \
+                         --learning_rate 6.4e-5 \
                          --parallel
...@@ -12,6 +12,7 @@ import paddle.fluid as fluid
 import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
 import data_utils.augmentor.trans_add_delta as trans_add_delta
 import data_utils.augmentor.trans_splice as trans_splice
+import data_utils.augmentor.trans_delay as trans_delay
 import data_utils.async_data_reader as reader
 from decoder.post_decode_faster import Decoder
 from data_utils.util import lodtensor_to_ndarray
...@@ -36,7 +37,7 @@ def parse_args():
     parser.add_argument(
         '--frame_dim',
         type=int,
-        default=120 * 11,
+        default=80,
         help='Frame dimension of feature data. (default: %(default)d)')
     parser.add_argument(
         '--stacked_num',
...@@ -179,7 +180,7 @@ def infer_from_ckpt(args):
     ltrans = [
         trans_add_delta.TransAddDelta(2, 2),
         trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
+        trans_splice.TransSplice(), trans_delay.TransDelay(5)
     ]

     feature_t = fluid.LoDTensor()
......
...@@ -32,25 +32,23 @@ def stacked_lstmp_model(frame_dim,
     # network configuration
     def _net_conf(feature, label):
-        seq_conv1 = fluid.layers.sequence_conv(
-            input=feature,
-            num_filters=1024,
-            filter_size=3,
-            filter_stride=1,
-            bias_attr=True)
-        bn1 = fluid.layers.batch_norm(
-            input=seq_conv1,
-            act="sigmoid",
-            is_test=not is_train,
-            momentum=0.9,
-            epsilon=1e-05,
-            data_layout='NCHW')
-
-        stack_input = bn1
+        conv1 = fluid.layers.conv2d(
+            input=feature,
+            num_filters=32,
+            filter_size=3,
+            stride=1,
+            padding=1,
+            bias_attr=True,
+            act="relu")
+
+        pool1 = fluid.layers.pool2d(
+            conv1, pool_size=3, pool_type="max", pool_stride=2, pool_padding=0)
+
+        stack_input = pool1
         for i in range(stacked_num):
             fc = fluid.layers.fc(input=stack_input,
                                  size=hidden_dim * 4,
-                                 bias_attr=True)
+                                 bias_attr=None)
             proj, cell = fluid.layers.dynamic_lstmp(
                 input=fc,
                 size=hidden_dim * 4,
...@@ -62,7 +60,6 @@ def stacked_lstmp_model(frame_dim,
                 proj_activation="tanh")
             bn = fluid.layers.batch_norm(
                 input=proj,
-                act="sigmoid",
                 is_test=not is_train,
                 momentum=0.9,
                 epsilon=1e-05,
...@@ -80,7 +77,10 @@ def stacked_lstmp_model(frame_dim,
     # data feeder
     feature = fluid.layers.data(
-        name="feature", shape=[-1, frame_dim], dtype="float32", lod_level=1)
+        name="feature",
+        shape=[-1, 3, 11, frame_dim],
+        dtype="float32",
+        lod_level=1)
     label = fluid.layers.data(
         name="label", shape=[-1, 1], dtype="int64", lod_level=1)
......
...@@ -13,6 +13,7 @@ import _init_paths
 import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
 import data_utils.augmentor.trans_add_delta as trans_add_delta
 import data_utils.augmentor.trans_splice as trans_splice
+import data_utils.augmentor.trans_delay as trans_delay
 import data_utils.async_data_reader as reader
 from model_utils.model import stacked_lstmp_model
 from data_utils.util import lodtensor_to_ndarray
...@@ -87,7 +88,7 @@ def parse_args():
     parser.add_argument(
         '--max_batch_num',
         type=int,
-        default=10,
+        default=11,
         help='Maximum number of batches for profiling. (default: %(default)d)')
     parser.add_argument(
         '--first_batches_to_skip',
...@@ -146,10 +147,10 @@ def profile(args):
     ltrans = [
         trans_add_delta.TransAddDelta(2, 2),
         trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
+        trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
     ]

-    data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst)
+    data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst, -1)
     data_reader.set_transformers(ltrans)

     feature_t = fluid.LoDTensor()
...@@ -169,6 +170,8 @@ def profile(args):
             frames_seen = 0
             # load_data
             (features, labels, lod, _) = batch_data
+            features = np.reshape(features, (-1, 11, 3, args.frame_dim))
+            features = np.transpose(features, (0, 2, 1, 3))
             feature_t.set(features, place)
             feature_t.set_lod([lod])
             label_t.set(labels, place)
......
...@@ -12,6 +12,7 @@ import paddle.fluid as fluid
 import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
 import data_utils.augmentor.trans_add_delta as trans_add_delta
 import data_utils.augmentor.trans_splice as trans_splice
+import data_utils.augmentor.trans_delay as trans_delay
 import data_utils.async_data_reader as reader
 from data_utils.util import lodtensor_to_ndarray
 from model_utils.model import stacked_lstmp_model
...@@ -33,7 +34,7 @@ def parse_args():
     parser.add_argument(
         '--frame_dim',
         type=int,
-        default=120 * 11,
+        default=80,
         help='Frame dimension of feature data. (default: %(default)d)')
     parser.add_argument(
         '--stacked_num',
...@@ -53,7 +54,7 @@ def parse_args():
     parser.add_argument(
         '--class_num',
         type=int,
-        default=1749,
+        default=3040,
         help='Number of classes in label. (default: %(default)d)')
     parser.add_argument(
         '--pass_num',
...@@ -157,6 +158,7 @@ def train(args):
     # program for test
     test_program = fluid.default_main_program().clone()

+    #optimizer = fluid.optimizer.Momentum(learning_rate=args.learning_rate, momentum=0.9)
     optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
     optimizer.minimize(avg_cost)
...@@ -171,7 +173,7 @@ def train(args):
     ltrans = [
         trans_add_delta.TransAddDelta(2, 2),
         trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
-        trans_splice.TransSplice()
+        trans_splice.TransSplice(5, 5), trans_delay.TransDelay(5)
     ]

     feature_t = fluid.LoDTensor()
...@@ -193,6 +195,8 @@ def train(args):
                 args.minimum_batch_size)):
             # load_data
             (features, labels, lod, _) = batch_data
+            features = np.reshape(features, (-1, 11, 3, args.frame_dim))
+            features = np.transpose(features, (0, 2, 1, 3))
             feature_t.set(features, place)
             feature_t.set_lod([lod])
             label_t.set(labels, place)
...@@ -220,6 +224,8 @@ def train(args):
                 args.minimum_batch_size)):
             # load_data
             (features, labels, lod, name_lst) = batch_data
+            features = np.reshape(features, (-1, 11, 3, args.frame_dim))
+            features = np.transpose(features, (0, 2, 1, 3))
             feature_t.set(features, place)
             feature_t.set_lod([lod])
             label_t.set(labels, place)
......
#-*- coding: utf-8 -*-
#File: DQN.py
from agent import Model
import gym
import argparse
from tqdm import tqdm
from expreplay import ReplayMemory, Experience
import numpy as np
import os
UPDATE_FREQ = 4
MEMORY_WARMUP_SIZE = 1000
def run_episode(agent, env, exp, train_or_test):
assert train_or_test in ['train', 'test'], train_or_test
total_reward = 0
state = env.reset()
for step in range(200):
action = agent.act(state, train_or_test)
next_state, reward, isOver, _ = env.step(action)
if train_or_test == 'train':
exp.append(Experience(state, action, reward, isOver))
            # start training once enough experience has been collected
if len(exp) > MEMORY_WARMUP_SIZE:
batch_idx = np.random.randint(
len(exp) - 1, size=(args.batch_size))
if step % UPDATE_FREQ == 0:
batch_state, batch_action, batch_reward, \
batch_next_state, batch_isOver = exp.sample(batch_idx)
agent.train(batch_state, batch_action, batch_reward, \
batch_next_state, batch_isOver)
total_reward += reward
state = next_state
if isOver:
break
return total_reward
def train_agent():
env = gym.make(args.env)
state_shape = env.observation_space.shape
exp = ReplayMemory(args.mem_size, state_shape)
action_dim = env.action_space.n
    agent = Model(state_shape[0], action_dim, gamma=args.gamma)
while len(exp) < MEMORY_WARMUP_SIZE:
run_episode(agent, env, exp, train_or_test='train')
max_episode = 4000
# train
total_episode = 0
pbar = tqdm(total=max_episode)
recent_100_reward = []
for episode in xrange(max_episode):
# start epoch
total_reward = run_episode(agent, env, exp, train_or_test='train')
pbar.set_description('[train]exploration:{}'.format(agent.exploration))
pbar.update()
# recent 100 reward
total_reward = run_episode(agent, env, exp, train_or_test='test')
recent_100_reward.append(total_reward)
if len(recent_100_reward) > 100:
recent_100_reward = recent_100_reward[1:]
pbar.write("episode:{} test_reward:{}".format(\
episode, np.mean(recent_100_reward)))
pbar.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, default='MountainCar-v0', \
                        help='environment to train the DQN model on, e.g. CartPole-v0')
parser.add_argument('--gamma', type=float, default=0.99, \
help='discount factor for accumulated reward computation')
parser.add_argument('--mem_size', type=int, default=500000, \
help='memory size for experience replay')
parser.add_argument('--batch_size', type=int, default=192, \
help='batch size for training')
args = parser.parse_args()
train_agent()
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
import math
from tqdm import tqdm
from utils import fluid_flatten
class DQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
        # clip rewards to [-1, 1] as in the DQN paper
        reward = fluid.layers.clip(reward, min=-1.0, max=1.0)

        action_onehot = fluid.layers.one_hot(action, self.action_dim)
        action_onehot = fluid.layers.cast(action_onehot, dtype='float32')

        # Q(s, a) of the action that was actually taken
        pred_action_value = fluid.layers.reduce_sum(
            fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)

        targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
        best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
        best_v.stop_gradient = True

        # Bellman target: r + gamma * max_a' Q_target(s', a'), zeroed at episode end
        target = reward + (1.0 - fluid.layers.cast(
            isOver, dtype='float32')) * self.gamma * best_v
        cost = fluid.layers.square_error_cost(pred_action_value, target)
        cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
target_vars = filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
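For reference, the loss assembled in `_build_net` above is the standard one-step Q-learning objective; restated from the code:
```latex
% One-step Q-learning target used in _build_net (rewards clipped to [-1, 1]):
y = r + (1 - \mathbb{1}[\text{isOver}])\,\gamma \max_{a'} Q_{\text{target}}(s', a'),
\qquad
L = \big(Q_{\text{policy}}(s, a) - y\big)^2
```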
#-*- coding: utf-8 -*-
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_argmax, fluid_flatten
class DoubleDQNModel(object):
def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
self.img_height = state_dim[0]
self.img_width = state_dim[1]
self.action_dim = action_dim
self.gamma = gamma
self.exploration = 1.1
self.update_target_steps = 10000 // 4
self.hist_len = hist_len
self.use_cuda = use_cuda
self.global_step = 0
self._build_net()
def _get_inputs(self):
return fluid.layers.data(
name='state',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='action', shape=[1], dtype='int32'), \
fluid.layers.data(
name='reward', shape=[], dtype='float32'), \
fluid.layers.data(
name='next_s',
shape=[self.hist_len, self.img_height, self.img_width],
dtype='float32'), \
fluid.layers.data(
name='isOver', shape=[], dtype='bool')
def _build_net(self):
state, action, reward, next_s, isOver = self._get_inputs()
self.pred_value = self.get_DQN_prediction(state)
self.predict_program = fluid.default_main_program().clone()
reward = fluid.layers.clip(reward, min=-1.0, max=1.0)
action_onehot = fluid.layers.one_hot(action, self.action_dim)
action_onehot = fluid.layers.cast(action_onehot, dtype='float32')
pred_action_value = fluid.layers.reduce_sum(
fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)
        targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)

        # Double DQN: pick the greedy action with the online network,
        # then evaluate that action with the target network
        next_s_predict_value = self.get_DQN_prediction(next_s)
        greedy_action = fluid_argmax(next_s_predict_value)
        predict_onehot = fluid.layers.one_hot(greedy_action, self.action_dim)
        best_v = fluid.layers.reduce_sum(
            fluid.layers.elementwise_mul(predict_onehot, targetQ_predict_value),
            dim=1)
best_v.stop_gradient = True
target = reward + (1.0 - fluid.layers.cast(
isOver, dtype='float32')) * self.gamma * best_v
cost = fluid.layers.square_error_cost(pred_action_value, target)
cost = fluid.layers.reduce_mean(cost)
self._sync_program = self._build_sync_target_network()
optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
optimizer.minimize(cost)
# define program
self.train_program = fluid.default_main_program()
# fluid exe
place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
self.exe = fluid.Executor(place)
self.exe.run(fluid.default_startup_program())
def get_DQN_prediction(self, image, target=False):
image = image / 255.0
variable_field = 'target' if target else 'policy'
conv1 = fluid.layers.conv2d(
input=image,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
max_pool1 = fluid.layers.pool2d(
input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv2 = fluid.layers.conv2d(
input=max_pool1,
num_filters=32,
filter_size=[5, 5],
stride=[1, 1],
padding=[2, 2],
act='relu',
param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
max_pool2 = fluid.layers.pool2d(
input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv3 = fluid.layers.conv2d(
input=max_pool2,
num_filters=64,
filter_size=[4, 4],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
max_pool3 = fluid.layers.pool2d(
input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')
conv4 = fluid.layers.conv2d(
input=max_pool3,
num_filters=64,
filter_size=[3, 3],
stride=[1, 1],
padding=[1, 1],
act='relu',
param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))
flatten = fluid_flatten(conv4)
out = fluid.layers.fc(
input=flatten,
size=self.action_dim,
param_attr=ParamAttr(name='{}_fc1'.format(variable_field)),
bias_attr=ParamAttr(name='{}_fc1_b'.format(variable_field)))
return out
def _build_sync_target_network(self):
vars = list(fluid.default_main_program().list_vars())
policy_vars = filter(
lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
target_vars = filter(
lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
policy_vars.sort(key=lambda x: x.name)
target_vars.sort(key=lambda x: x.name)
sync_program = fluid.default_main_program().clone()
with fluid.program_guard(sync_program):
sync_ops = []
for i, var in enumerate(policy_vars):
sync_op = fluid.layers.assign(policy_vars[i], target_vars[i])
sync_ops.append(sync_op)
sync_program = sync_program.prune(sync_ops)
return sync_program
def act(self, state, train_or_test):
sample = np.random.random()
if train_or_test == 'train' and sample < self.exploration:
act = np.random.randint(self.action_dim)
else:
if np.random.random() < 0.01:
act = np.random.randint(self.action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = self.exe.run(self.predict_program,
feed={'state': state.astype('float32')},
fetch_list=[self.pred_value])[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
if train_or_test == 'train':
self.exploration = max(0.1, self.exploration - 1e-6)
return act
def train(self, state, action, reward, next_state, isOver):
if self.global_step % self.update_target_steps == 0:
self.sync_target_network()
self.global_step += 1
action = np.expand_dims(action, -1)
self.exe.run(self.train_program,
feed={
'state': state.astype('float32'),
'action': action.astype('int32'),
'reward': reward,
'next_s': next_state.astype('float32'),
'isOver': isOver
})
def sync_target_network(self):
self.exe.run(self._sync_program)
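The only change relative to the DQN agent above is how the bootstrap value is formed: the greedy action is selected with the online (policy) network and evaluated with the target network, which is what the `fluid_argmax`/`one_hot` sequence in `_build_net` computes:
```latex
% Double DQN target: select with the online network, evaluate with the target network.
a^{*} = \arg\max_{a'} Q_{\text{policy}}(s', a'), \qquad
y = r + (1 - \mathbb{1}[\text{isOver}])\,\gamma\, Q_{\text{target}}(s', a^{*})
```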
#-*- coding: utf-8 -*-

import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
import numpy as np
from tqdm import tqdm
import math
from utils import fluid_flatten


class DuelingDQNModel(object):
    def __init__(self, state_dim, action_dim, gamma, hist_len, use_cuda=False):
        self.img_height = state_dim[0]
        self.img_width = state_dim[1]
        self.action_dim = action_dim
        self.gamma = gamma
        self.exploration = 1.1
        self.update_target_steps = 10000 // 4
        self.hist_len = hist_len
        self.use_cuda = use_cuda

        self.global_step = 0
        self._build_net()

    def _get_inputs(self):
        return fluid.layers.data(
            name='state',
            shape=[self.hist_len, self.img_height, self.img_width],
            dtype='float32'), \
            fluid.layers.data(
                name='action', shape=[1], dtype='int32'), \
            fluid.layers.data(
                name='reward', shape=[], dtype='float32'), \
            fluid.layers.data(
                name='next_s',
                shape=[self.hist_len, self.img_height, self.img_width],
                dtype='float32'), \
            fluid.layers.data(
                name='isOver', shape=[], dtype='bool')

    def _build_net(self):
        state, action, reward, next_s, isOver = self._get_inputs()
        self.pred_value = self.get_DQN_prediction(state)
        self.predict_program = fluid.default_main_program().clone()

        reward = fluid.layers.clip(reward, min=-1.0, max=1.0)

        action_onehot = fluid.layers.one_hot(action, self.action_dim)
        action_onehot = fluid.layers.cast(action_onehot, dtype='float32')

        pred_action_value = fluid.layers.reduce_sum(
            fluid.layers.elementwise_mul(action_onehot, self.pred_value), dim=1)

        targetQ_predict_value = self.get_DQN_prediction(next_s, target=True)
        best_v = fluid.layers.reduce_max(targetQ_predict_value, dim=1)
        best_v.stop_gradient = True

        target = reward + (1.0 - fluid.layers.cast(
            isOver, dtype='float32')) * self.gamma * best_v
        cost = fluid.layers.square_error_cost(pred_action_value, target)
        cost = fluid.layers.reduce_mean(cost)

        self._sync_program = self._build_sync_target_network()

        optimizer = fluid.optimizer.Adam(1e-3 * 0.5, epsilon=1e-3)
        optimizer.minimize(cost)

        # define program
        self.train_program = fluid.default_main_program()

        # fluid exe
        place = fluid.CUDAPlace(0) if self.use_cuda else fluid.CPUPlace()
        self.exe = fluid.Executor(place)
        self.exe.run(fluid.default_startup_program())

    def get_DQN_prediction(self, image, target=False):
        image = image / 255.0
        variable_field = 'target' if target else 'policy'

        conv1 = fluid.layers.conv2d(
            input=image,
            num_filters=32,
            filter_size=[5, 5],
            stride=[1, 1],
            padding=[2, 2],
            act='relu',
            param_attr=ParamAttr(name='{}_conv1'.format(variable_field)),
            bias_attr=ParamAttr(name='{}_conv1_b'.format(variable_field)))
        max_pool1 = fluid.layers.pool2d(
            input=conv1, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')

        conv2 = fluid.layers.conv2d(
            input=max_pool1,
            num_filters=32,
            filter_size=[5, 5],
            stride=[1, 1],
            padding=[2, 2],
            act='relu',
            param_attr=ParamAttr(name='{}_conv2'.format(variable_field)),
            bias_attr=ParamAttr(name='{}_conv2_b'.format(variable_field)))
        max_pool2 = fluid.layers.pool2d(
            input=conv2, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')

        conv3 = fluid.layers.conv2d(
            input=max_pool2,
            num_filters=64,
            filter_size=[4, 4],
            stride=[1, 1],
            padding=[1, 1],
            act='relu',
            param_attr=ParamAttr(name='{}_conv3'.format(variable_field)),
            bias_attr=ParamAttr(name='{}_conv3_b'.format(variable_field)))
        max_pool3 = fluid.layers.pool2d(
            input=conv3, pool_size=[2, 2], pool_stride=[2, 2], pool_type='max')

        conv4 = fluid.layers.conv2d(
            input=max_pool3,
            num_filters=64,
            filter_size=[3, 3],
            stride=[1, 1],
            padding=[1, 1],
            act='relu',
            param_attr=ParamAttr(name='{}_conv4'.format(variable_field)),
            bias_attr=ParamAttr(name='{}_conv4_b'.format(variable_field)))

        flatten = fluid_flatten(conv4)

        value = fluid.layers.fc(
            input=flatten,
            size=1,
            param_attr=ParamAttr(name='{}_value_fc'.format(variable_field)),
            bias_attr=ParamAttr(name='{}_value_fc_b'.format(variable_field)))

        advantage = fluid.layers.fc(
            input=flatten,
            size=self.action_dim,
            param_attr=ParamAttr(name='{}_advantage_fc'.format(variable_field)),
            bias_attr=ParamAttr(
                name='{}_advantage_fc_b'.format(variable_field)))

        # dueling aggregation: Q(s, a) = A(s, a) + (V(s) - mean_a A(s, a))
        Q = advantage + (value - fluid.layers.reduce_mean(
            advantage, dim=1, keep_dim=True))
        return Q

    def _build_sync_target_network(self):
        vars = list(fluid.default_main_program().list_vars())
        policy_vars = filter(
            lambda x: 'GRAD' not in x.name and 'policy' in x.name, vars)
        target_vars = filter(
            lambda x: 'GRAD' not in x.name and 'target' in x.name, vars)
        policy_vars.sort(key=lambda x: x.name)
        target_vars.sort(key=lambda x: x.name)

        sync_program = fluid.default_main_program().clone()
        with fluid.program_guard(sync_program):
...@@ -122,26 +166,30 @@
        if train_or_test == 'train' and sample < self.exploration:
            act = np.random.randint(self.action_dim)
        else:
            if np.random.random() < 0.01:
                act = np.random.randint(self.action_dim)
            else:
                state = np.expand_dims(state, axis=0)
                pred_Q = self.exe.run(self.predict_program,
                                      feed={'state': state.astype('float32')},
                                      fetch_list=[self.pred_value])[0]
                pred_Q = np.squeeze(pred_Q, axis=0)
                act = np.argmax(pred_Q)
        if train_or_test == 'train':
            self.exploration = max(0.1, self.exploration - 1e-6)
        return act

    def train(self, state, action, reward, next_state, isOver):
        if self.global_step % self.update_target_steps == 0:
            self.sync_target_network()
        self.global_step += 1

        action = np.expand_dims(action, -1)
        self.exe.run(self.train_program, \
                     feed={'state': state.astype('float32'), \
                           'action': action.astype('int32'), \
                           'reward': reward, \
                           'next_s': next_state.astype('float32'), \
                           'isOver': isOver})

    def sync_target_network(self):
......
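The value and advantage heads at the end of `get_DQN_prediction` implement the mean-subtracted dueling aggregation from the paper, restated from the code:
```latex
% Dueling aggregation used in get_DQN_prediction:
Q(s, a) = A(s, a) + \Big(V(s) - \tfrac{1}{|\mathcal{A}|}\textstyle\sum_{a'} A(s, a')\Big)
```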
<img src="mountain_car.gif" width="300" height="200"> # Reproduce DQN, DoubleDQN, DuelingDQN model with fluid version of PaddlePaddle
# Reproduce DQN model + DQN in:
+ DQN in:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN in:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN in:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
# Mountain-CAR benchmark & performance # Atari benchmark & performance
[MountainCar-v0](https://gym.openai.com/envs/MountainCar-v0/) ## [Atari games introduction](https://gym.openai.com/envs/#atari)
A car is on a one-dimensional track, positioned between two "mountains". The goal is to drive up the mountain on the right; however, the car's engine is not strong enough to scale the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum. + Pong game result
![DQN result](assets/dqn.png)
# How to use
+ Dependencies:
+ python2.7
+ gym
+ tqdm
+ paddlepaddle-gpu==0.12.0
+ Start Training:
```
# To train a model for Pong game with gpu (use DQN model as default)
python train.py --rom ./rom_files/pong.bin --use_cuda
<img src="curve.png" > # To train a model for Pong with DoubleDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN
# To train a model for Pong with DuelingDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train more games, can install more rom files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms)
# How to use + Start Testing:
+ Dependencies: ```
+ python2.7 # Play the game with saved model and calculate the average rewards
+ gym python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong/stepXXXXX
+ tqdm
+ paddle-fluid
+ Start Training:
```
# use mountain-car enviroment as default
python DQN.py
# use other enviorment # Play the game with visualization
python DQN.py --env CartPole-v0 python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong/stepXXXXX --viz 0.01
``` ```
# -*- coding: utf-8 -*-
import numpy as np
import os
import cv2
import threading
import gym
from gym import spaces
from gym.envs.atari.atari_env import ACTION_MEANING
from ale_python_interface import ALEInterface
__all__ = ['AtariPlayer']
ROM_URL = "https://github.com/openai/atari-py/tree/master/atari_py/atari_roms"
_ALE_LOCK = threading.Lock()
"""
The following AtariPlayer are copied or modified from tensorpack/tensorpack:
https://github.com/tensorpack/tensorpack/blob/master/examples/DeepQNetwork/atari.py
"""
class AtariPlayer(gym.Env):
"""
A wrapper for ALE emulator, with configurations to mimic DeepMind DQN settings.
Info:
score: the accumulated reward in the current game
gameOver: True when the current game is Over
"""
def __init__(self,
rom_file,
viz=0,
frame_skip=4,
nullop_start=30,
live_lost_as_eoe=True,
max_num_frames=0):
"""
Args:
rom_file: path to the rom
frame_skip: skip every k frames and repeat the action
viz: visualization to be done.
Set to 0 to disable.
Set to a positive number to be the delay between frames to show.
Set to a string to be a directory to store frames.
            nullop_start: start with a random number of null ops.
            live_lost_as_eoe: consider the loss of a life as end of episode. Useful for training.
            max_num_frames: maximum number of frames per episode.
"""
super(AtariPlayer, self).__init__()
assert os.path.isfile(rom_file), \
"rom {} not found. Please download at {}".format(rom_file, ROM_URL)
try:
ALEInterface.setLoggerMode(ALEInterface.Logger.Error)
except AttributeError:
print "You're not using latest ALE"
# avoid simulator bugs: https://github.com/mgbellemare/Arcade-Learning-Environment/issues/86
with _ALE_LOCK:
self.ale = ALEInterface()
self.ale.setInt(b"random_seed", np.random.randint(0, 30000))
self.ale.setInt(b"max_num_frames_per_episode", max_num_frames)
self.ale.setBool(b"showinfo", False)
self.ale.setInt(b"frame_skip", 1)
self.ale.setBool(b'color_averaging', False)
# manual.pdf suggests otherwise.
self.ale.setFloat(b'repeat_action_probability', 0.0)
# viz setup
if isinstance(viz, str):
assert os.path.isdir(viz), viz
self.ale.setString(b'record_screen_dir', viz)
viz = 0
if isinstance(viz, int):
viz = float(viz)
self.viz = viz
if self.viz and isinstance(self.viz, float):
self.windowname = os.path.basename(rom_file)
cv2.startWindowThread()
cv2.namedWindow(self.windowname)
self.ale.loadROM(rom_file.encode('utf-8'))
self.width, self.height = self.ale.getScreenDims()
self.actions = self.ale.getMinimalActionSet()
self.live_lost_as_eoe = live_lost_as_eoe
self.frame_skip = frame_skip
self.nullop_start = nullop_start
self.action_space = spaces.Discrete(len(self.actions))
self.observation_space = spaces.Box(low=0,
high=255,
shape=(self.height, self.width),
dtype=np.uint8)
self._restart_episode()
def get_action_meanings(self):
return [ACTION_MEANING[i] for i in self.actions]
def _grab_raw_image(self):
"""
:returns: the current 3-channel image
"""
m = self.ale.getScreenRGB()
return m.reshape((self.height, self.width, 3))
def _current_state(self):
"""
returns: a gray-scale (h, w) uint8 image
"""
ret = self._grab_raw_image()
# avoid missing frame issue: max-pooled over the last screen
ret = np.maximum(ret, self.last_raw_screen)
if self.viz:
if isinstance(self.viz, float):
cv2.imshow(self.windowname, ret)
cv2.waitKey(int(self.viz * 1000))
ret = ret.astype('float32')
        # weights 0.299, 0.587, 0.114, same as rgb2y in torch/image
ret = cv2.cvtColor(ret, cv2.COLOR_RGB2GRAY)
return ret.astype('uint8') # to save some memory
def _restart_episode(self):
with _ALE_LOCK:
self.ale.reset_game()
# random null-ops start
n = np.random.randint(self.nullop_start)
self.last_raw_screen = self._grab_raw_image()
for k in range(n):
if k == n - 1:
self.last_raw_screen = self._grab_raw_image()
self.ale.act(0)
def reset(self):
if self.ale.game_over():
self._restart_episode()
return self._current_state()
def step(self, act):
oldlives = self.ale.lives()
r = 0
for k in range(self.frame_skip):
if k == self.frame_skip - 1:
self.last_raw_screen = self._grab_raw_image()
r += self.ale.act(self.actions[act])
newlives = self.ale.lives()
if self.ale.game_over() or \
(self.live_lost_as_eoe and newlives < oldlives):
break
isOver = self.ale.game_over()
if self.live_lost_as_eoe:
isOver = isOver or newlives < oldlives
info = {'ale.lives': newlives}
return self._current_state(), r, isOver, info
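A minimal smoke test for `AtariPlayer`; the ROM path here is an assumed example, not a file shipped with the repo:
```python
# Hypothetical smoke test; 'rom_files/pong.bin' is an assumed path.
from atari import AtariPlayer

env = AtariPlayer('rom_files/pong.bin', frame_skip=4, viz=0)
state = env.reset()                     # grayscale (height, width) uint8 frame
state, reward, isOver, info = env.step(env.action_space.sample())
print('{} {} {} {}'.format(state.shape, reward, isOver, info['ale.lives']))
```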
# -*- coding: utf-8 -*-
import numpy as np
from collections import deque
import gym
from gym import spaces
_v0, _v1 = gym.__version__.split('.')[:2]
assert int(_v0) > 0 or int(_v1) >= 10, gym.__version__
"""
The following wrappers are copied or modified from openai/baselines:
https://github.com/openai/baselines/blob/master/baselines/common/atari_wrappers.py
"""
class MapState(gym.ObservationWrapper):
def __init__(self, env, map_func):
gym.ObservationWrapper.__init__(self, env)
self._func = map_func
def observation(self, obs):
return self._func(obs)
class FrameStack(gym.Wrapper):
def __init__(self, env, k):
"""Buffer observations and stack across channels (last axis)."""
gym.Wrapper.__init__(self, env)
self.k = k
self.frames = deque([], maxlen=k)
shp = env.observation_space.shape
chan = 1 if len(shp) == 2 else shp[2]
self.observation_space = spaces.Box(low=0,
high=255,
shape=(shp[0], shp[1], chan * k),
dtype=np.uint8)
def reset(self):
"""Clear buffer and re-fill by duplicating the first observation."""
ob = self.env.reset()
for _ in range(self.k - 1):
self.frames.append(np.zeros_like(ob))
self.frames.append(ob)
return self.observation()
def step(self, action):
ob, reward, done, info = self.env.step(action)
self.frames.append(ob)
return self.observation(), reward, done, info
def observation(self):
assert len(self.frames) == self.k
return np.stack(self.frames, axis=0)
class _FireResetEnv(gym.Wrapper):
def __init__(self, env):
"""Take action on reset for environments that are fixed until firing."""
gym.Wrapper.__init__(self, env)
assert env.unwrapped.get_action_meanings()[1] == 'FIRE'
assert len(env.unwrapped.get_action_meanings()) >= 3
def reset(self):
self.env.reset()
obs, _, done, _ = self.env.step(1)
if done:
self.env.reset()
obs, _, done, _ = self.env.step(2)
if done:
self.env.reset()
return obs
def step(self, action):
return self.env.step(action)
def FireResetEnv(env):
if isinstance(env, gym.Wrapper):
baseenv = env.unwrapped
else:
baseenv = env
if 'FIRE' in baseenv.get_action_meanings():
return _FireResetEnv(env)
return env
class LimitLength(gym.Wrapper):
def __init__(self, env, k):
gym.Wrapper.__init__(self, env)
self.k = k
def reset(self):
# This assumes that reset() will really reset the env.
# If the underlying env tries to be smart about reset
# (e.g. end-of-life), the assumption doesn't hold.
ob = self.env.reset()
self.cnt = 0
return ob
def step(self, action):
ob, r, done, info = self.env.step(action)
self.cnt += 1
if self.cnt == self.k:
done = True
return ob, r, done, info
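These wrappers compose in the same order that `get_player` in train.py later uses them; a sketch, with the ROM path and image size as illustrative assumptions:
```python
# Hypothetical composition of the wrappers above; ROM path and sizes are
# illustrative. Mirrors the get_player() pipeline defined later in train.py.
import cv2
from atari import AtariPlayer

env = AtariPlayer('rom_files/pong.bin', frame_skip=4)     # assumed ROM path
env = FireResetEnv(env)                                   # press FIRE on reset if the game needs it
env = MapState(env, lambda im: cv2.resize(im, (84, 84)))  # downsample each frame
env = FrameStack(env, 4)                                  # stack 4 frames on axis 0
print(env.reset().shape)                                  # -> (4, 84, 84)
```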
# -*- coding: utf-8 -*-

import numpy as np
import copy
from collections import deque, namedtuple

Experience = namedtuple('Experience', ['state', 'action', 'reward', 'isOver'])


class ReplayMemory(object):
    def __init__(self, max_size, state_shape, context_len):
        self.max_size = int(max_size)
        self.state_shape = state_shape
        self.context_len = int(context_len)

        self.state = np.zeros((self.max_size, ) + state_shape, dtype='uint8')
        self.action = np.zeros((self.max_size, ), dtype='int32')
        self.reward = np.zeros((self.max_size, ), dtype='float32')
        self.isOver = np.zeros((self.max_size, ), dtype='bool')

        self._curr_size = 0
        self._curr_pos = 0
        self._context = deque(maxlen=context_len - 1)

    def append(self, exp):
        """append a new experience into replay memory
        """
        if self._curr_size < self.max_size:
            self._assign(self._curr_pos, exp)
            self._curr_size += 1
        else:
            self._assign(self._curr_pos, exp)
        self._curr_pos = (self._curr_pos + 1) % self.max_size
        if exp.isOver:
            self._context.clear()
        else:
            self._context.append(exp)

    def recent_state(self):
        """maintain recent state for training"""
        lst = list(self._context)
        states = [np.zeros(self.state_shape, dtype='uint8')] * \
            (self._context.maxlen - len(lst))
        states.extend([k.state for k in lst])
        return states

    def sample(self, idx):
        """return state, action, reward, isOver;
        note that some frames in state may be generated from the last episode,
        they should be removed from state
        """
        state = np.zeros(
            (self.context_len + 1, ) + self.state_shape, dtype=np.uint8)
        state_idx = np.arange(idx, idx + self.context_len + 1) % self._curr_size

        # confirm that no frame was generated from the last episode
        has_last_episode = False
        for k in range(self.context_len - 2, -1, -1):
            to_check_idx = state_idx[k]
            if self.isOver[to_check_idx]:
                has_last_episode = True
                state_idx = state_idx[k + 1:]
                state[k + 1:] = self.state[state_idx]
                break

        if not has_last_episode:
            state = self.state[state_idx]

        real_idx = (idx + self.context_len - 1) % self._curr_size
        action = self.action[real_idx]
        reward = self.reward[real_idx]
        isOver = self.isOver[real_idx]
        return state, reward, action, isOver

    def __len__(self):
        return self._curr_size

    def _assign(self, pos, exp):
        self.state[pos] = exp.state
        self.reward[pos] = exp.reward
        self.action[pos] = exp.action
        self.isOver[pos] = exp.isOver

    def sample_batch(self, batch_size):
        """sample a batch from replay memory for training
        """
        batch_idx = np.random.randint(
            self._curr_size - self.context_len - 1, size=batch_size)
        batch_idx = (self._curr_pos + batch_idx) % self._curr_size
        batch_exp = [self.sample(i) for i in batch_idx]
        return self._process_batch(batch_exp)

    def _process_batch(self, batch_exp):
        state = np.asarray([e[0] for e in batch_exp], dtype='uint8')
        reward = np.asarray([e[1] for e in batch_exp], dtype='float32')
        action = np.asarray([e[2] for e in batch_exp], dtype='int8')
        isOver = np.asarray([e[3] for e in batch_exp], dtype='bool')
        return [state, action, reward, isOver]
#-*- coding: utf-8 -*-
import argparse
import os
import numpy as np
import paddle.fluid as fluid
from train import get_player
from tqdm import tqdm
def predict_action(exe, state, predict_program, feed_names, fetch_targets,
action_dim):
if np.random.randint(100) == 0:
act = np.random.randint(action_dim)
else:
state = np.expand_dims(state, axis=0)
pred_Q = exe.run(predict_program,
feed={feed_names[0]: state.astype('float32')},
fetch_list=fetch_targets)[0]
pred_Q = np.squeeze(pred_Q, axis=0)
act = np.argmax(pred_Q)
return act
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument('--rom', type=str, required=True, help='atari rom')
parser.add_argument(
'--model_path', type=str, required=True, help='dirname to load model')
parser.add_argument(
'--viz',
type=float,
default=0,
help='''viz: visualization setting:
Set to 0 to disable;
Set to a positive number to be the delay between frames to show.
''')
args = parser.parse_args()
env = get_player(args.rom, viz=args.viz)
place = fluid.CUDAPlace(0) if args.use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[predict_program, feed_names,
fetch_targets] = fluid.io.load_inference_model(args.model_path, exe)
episode_reward = []
for _ in tqdm(xrange(30), desc='eval agent'):
state = env.reset()
total_reward = 0
while True:
action = predict_action(exe, state, predict_program, feed_names,
fetch_targets, env.action_space.n)
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
    print('Average reward of 30 episodes: {}'.format(eval_reward))
#-*- coding: utf-8 -*-
from DQN_agent import DQNModel
from DoubleDQN_agent import DoubleDQNModel
from DuelingDQN_agent import DuelingDQNModel
from atari import AtariPlayer
import paddle.fluid as fluid
import gym
import argparse
import cv2
from tqdm import tqdm
from expreplay import ReplayMemory, Experience
import numpy as np
import os
from datetime import datetime
from atari_wrapper import FrameStack, MapState, FireResetEnv, LimitLength
from collections import deque
MEMORY_SIZE = 1e6
MEMORY_WARMUP_SIZE = MEMORY_SIZE // 20
IMAGE_SIZE = (84, 84)
CONTEXT_LEN = 4
ACTION_REPEAT = 4  # aka FRAME_SKIP
UPDATE_FREQ = 4
def run_train_episode(agent, env, exp):
total_reward = 0
state = env.reset()
step = 0
while True:
step += 1
context = exp.recent_state()
context.append(state)
context = np.stack(context, axis=0)
action = agent.act(context, train_or_test='train')
next_state, reward, isOver, _ = env.step(action)
exp.append(Experience(state, action, reward, isOver))
        # start training once the replay memory is warmed up
if len(exp) > MEMORY_WARMUP_SIZE:
if step % UPDATE_FREQ == 0:
batch_all_state, batch_action, batch_reward, batch_isOver = exp.sample_batch(
args.batch_size)
batch_state = batch_all_state[:, :CONTEXT_LEN, :, :]
batch_next_state = batch_all_state[:, 1:, :, :]
agent.train(batch_state, batch_action, batch_reward,
batch_next_state, batch_isOver)
total_reward += reward
state = next_state
if isOver:
break
return total_reward, step
def get_player(rom, viz=False, train=False):
env = AtariPlayer(
rom,
frame_skip=ACTION_REPEAT,
viz=viz,
live_lost_as_eoe=train,
max_num_frames=60000)
env = FireResetEnv(env)
env = MapState(env, lambda im: cv2.resize(im, IMAGE_SIZE))
if not train:
# in training, context is taken care of in expreplay buffer
env = FrameStack(env, CONTEXT_LEN)
return env
def eval_agent(agent, env):
episode_reward = []
for _ in tqdm(xrange(30), desc='eval agent'):
state = env.reset()
total_reward = 0
step = 0
while True:
step += 1
action = agent.act(state, train_or_test='test')
state, reward, isOver, info = env.step(action)
total_reward += reward
if isOver:
break
episode_reward.append(total_reward)
eval_reward = np.mean(episode_reward)
return eval_reward
def train_agent():
env = get_player(args.rom, train=True)
test_env = get_player(args.rom)
exp = ReplayMemory(args.mem_size, IMAGE_SIZE, CONTEXT_LEN)
action_dim = env.action_space.n
if args.alg == 'DQN':
agent = DQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DoubleDQN':
agent = DoubleDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
elif args.alg == 'DuelingDQN':
agent = DuelingDQNModel(IMAGE_SIZE, action_dim, args.gamma, CONTEXT_LEN,
args.use_cuda)
else:
        print('Unsupported algorithm: {}. Expected one of DQN, DoubleDQN, DuelingDQN.'.format(args.alg))
return
with tqdm(total=MEMORY_WARMUP_SIZE) as pbar:
while len(exp) < MEMORY_WARMUP_SIZE:
total_reward, step = run_train_episode(agent, env, exp)
pbar.update(step)
# train
test_flag = 0
save_flag = 0
pbar = tqdm(total=1e8)
recent_100_reward = []
total_step = 0
while True:
# start epoch
total_reward, step = run_train_episode(agent, env, exp)
total_step += step
pbar.set_description('[train]exploration:{}'.format(agent.exploration))
pbar.update(step)
if total_step // args.test_every_steps == test_flag:
pbar.write("testing")
eval_reward = eval_agent(agent, test_env)
test_flag += 1
print("eval_agent done, (steps, eval_reward): ({}, {})".format(
total_step, eval_reward))
if total_step // args.save_every_steps == save_flag:
save_flag += 1
save_path = os.path.join(args.model_dirname, '{}-{}'.format(
args.alg, os.path.basename(args.rom).split('.')[0]),
'step{}'.format(total_step))
fluid.io.save_inference_model(save_path, ['state'],
agent.pred_value, agent.exe,
agent.predict_program)
pbar.close()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--alg',
type=str,
default='DQN',
help='Reinforcement learning algorithm, support: DQN, DoubleDQN, DuelingDQN'
)
parser.add_argument(
'--use_cuda', action='store_true', help='if set, use cuda')
parser.add_argument(
'--gamma',
type=float,
default=0.99,
help='discount factor for accumulated reward computation')
parser.add_argument(
'--mem_size',
type=int,
default=1000000,
help='memory size for experience replay')
parser.add_argument(
'--batch_size', type=int, default=64, help='batch size for training')
parser.add_argument('--rom', help='atari rom', required=True)
parser.add_argument(
'--model_dirname',
type=str,
default='saved_model',
help='dirname to save model')
parser.add_argument(
'--save_every_steps',
type=int,
default=100000,
help='every steps number to save model')
parser.add_argument(
'--test_every_steps',
type=int,
default=100000,
help='every steps number to run test')
args = parser.parse_args()
train_agent()
#-*- coding: utf-8 -*-
#File: utils.py
import paddle.fluid as fluid
import numpy as np
def fluid_argmax(x):
"""
Get index of max value for the last dimension
"""
_, max_index = fluid.layers.topk(x, k=1)
return max_index
def fluid_flatten(x):
"""
Flatten fluid variable along the first dimension
"""
return fluid.layers.reshape(x, shape=[-1, np.prod(x.shape[1:])])
@@ -211,13 +211,12 @@ def main(train_data_file, test_data_file, model_save_dir, num_passes):
     avg_cost, feature_out, word, mention, target = ner_net(word_dict_len,
                                                            label_dict_len)

+    crf_decode = fluid.layers.crf_decoding(
+        input=feature_out, param_attr=fluid.ParamAttr(name='crfw'))
+
     sgd_optimizer = fluid.optimizer.SGD(learning_rate=1e-3)
     sgd_optimizer.minimize(avg_cost)

-    crf_decode = fluid.layers.crf_decoding(
-        input=feature_out, param_attr=fluid.ParamAttr(
-            name='crfw', ))
-
     (precision, recall, f1_score, num_infer_chunks, num_label_chunks,
      num_correct_chunks) = fluid.layers.chunk_eval(
          input=crf_decode,
@@ -289,8 +288,8 @@ def main(train_data_file, test_data_file, model_save_dir, num_passes):
                       + str(f1))
                 save_dirname = os.path.join(model_save_dir,
                                             "params_pass_%d" % pass_id)
-                fluid.io.save_inference_model(
-                    save_dirname, ['word', 'mention', 'target'], [crf_decode], exe)
+                fluid.io.save_inference_model(save_dirname, ['word', 'mention'],
+                                              [crf_decode], exe)

 if __name__ == "__main__":
model/
pretrained/
data/
label/
*.swp
*.log
infer_results/
from PIL import Image, ImageEnhance, ImageDraw
from PIL import ImageFile
import numpy as np
import random
import math
import cv2  # needed by crop_image_sampling for resizing

ImageFile.LOAD_TRUNCATED_IMAGES = True  # otherwise an IOError is raised when an image file is truncated
class sampler():
def __init__(self,
max_sample,
max_trial,
min_scale,
max_scale,
min_aspect_ratio,
max_aspect_ratio,
min_jaccard_overlap,
max_jaccard_overlap,
min_object_coverage,
max_object_coverage,
use_square=False):
self.max_sample = max_sample
self.max_trial = max_trial
self.min_scale = min_scale
self.max_scale = max_scale
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.min_jaccard_overlap = min_jaccard_overlap
self.max_jaccard_overlap = max_jaccard_overlap
self.min_object_coverage = min_object_coverage
self.max_object_coverage = max_object_coverage
self.use_square = use_square
class bbox():
def __init__(self, xmin, ymin, xmax, ymax):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
def intersect_bbox(bbox1, bbox2):
if bbox2.xmin > bbox1.xmax or bbox2.xmax < bbox1.xmin or \
bbox2.ymin > bbox1.ymax or bbox2.ymax < bbox1.ymin:
intersection_box = bbox(0.0, 0.0, 0.0, 0.0)
else:
intersection_box = bbox(
max(bbox1.xmin, bbox2.xmin),
max(bbox1.ymin, bbox2.ymin),
min(bbox1.xmax, bbox2.xmax), min(bbox1.ymax, bbox2.ymax))
return intersection_box
def bbox_coverage(bbox1, bbox2):
inter_box = intersect_bbox(bbox1, bbox2)
intersect_size = bbox_area(inter_box)
if intersect_size > 0:
bbox1_size = bbox_area(bbox1)
return intersect_size / bbox1_size
else:
return 0.
def bbox_area(src_bbox):
if src_bbox.xmax < src_bbox.xmin or src_bbox.ymax < src_bbox.ymin:
return 0.
else:
width = src_bbox.xmax - src_bbox.xmin
height = src_bbox.ymax - src_bbox.ymin
return width * height
def generate_sample(sampler, image_width, image_height):
scale = random.uniform(sampler.min_scale, sampler.max_scale)
aspect_ratio = random.uniform(sampler.min_aspect_ratio,
sampler.max_aspect_ratio)
aspect_ratio = max(aspect_ratio, (scale**2.0))
aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
# guarantee a squared image patch after cropping
if sampler.use_square:
if image_height < image_width:
bbox_width = bbox_height * image_height / image_width
else:
bbox_height = bbox_width * image_width / image_height
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = random.uniform(0, xmin_bound)
ymin = random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = bbox(xmin, ymin, xmax, ymax)
return sampled_bbox
def data_anchor_sampling(sampler, bbox_labels, image_width, image_height,
scale_array, resize_width, resize_height):
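    # data-anchor-sampling (PyramidBox): pick a random ground-truth face, snap
    # its size to a nearby anchor scale, and crop a region around it so that
    # face scales are rebalanced during training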
num_gt = len(bbox_labels)
# np.random.randint range: [low, high)
rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
if num_gt != 0:
norm_xmin = bbox_labels[rand_idx][0]
norm_ymin = bbox_labels[rand_idx][1]
norm_xmax = bbox_labels[rand_idx][2]
norm_ymax = bbox_labels[rand_idx][3]
xmin = norm_xmin * image_width
ymin = norm_ymin * image_height
wid = image_width * (norm_xmax - norm_xmin)
hei = image_height * (norm_ymax - norm_ymin)
range_size = 0
for scale_ind in range(0, len(scale_array) - 1):
area = wid * hei
if area > scale_array[scale_ind] ** 2 and area < \
scale_array[scale_ind + 1] ** 2:
range_size = scale_ind + 1
break
scale_choose = 0.0
if range_size == 0:
rand_idx_size = range_size + 1
else:
# np.random.randint range: [low, high)
rng_rand_size = np.random.randint(0, range_size)
rand_idx_size = rng_rand_size % range_size
scale_choose = random.uniform(scale_array[rand_idx_size] / 2.0,
2.0 * scale_array[rand_idx_size])
sample_bbox_size = wid * resize_width / scale_choose
w_off_orig = 0.0
h_off_orig = 0.0
if sample_bbox_size < max(image_height, image_width):
if wid <= sample_bbox_size:
w_off_orig = random.uniform(xmin + wid - sample_bbox_size, xmin)
else:
w_off_orig = random.uniform(xmin, xmin + wid - sample_bbox_size)
if hei <= sample_bbox_size:
h_off_orig = random.uniform(ymin + hei - sample_bbox_size, ymin)
else:
h_off_orig = random.uniform(ymin, ymin + hei - sample_bbox_size)
else:
w_off_orig = random.uniform(image_width - sample_bbox_size, 0.0)
h_off_orig = random.uniform(image_height - sample_bbox_size, 0.0)
w_off_orig = math.floor(w_off_orig)
h_off_orig = math.floor(h_off_orig)
# Figure out top left coordinates.
w_off = 0.0
h_off = 0.0
w_off = float(w_off_orig / image_width)
h_off = float(h_off_orig / image_height)
sampled_bbox = bbox(w_off, h_off,
w_off + float(sample_bbox_size / image_width),
h_off + float(sample_bbox_size / image_height))
return sampled_bbox
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox.xmin >= object_bbox.xmax or \
sample_bbox.xmax <= object_bbox.xmin or \
sample_bbox.ymin >= object_bbox.ymax or \
sample_bbox.ymax <= object_bbox.ymin:
return 0
intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin)
intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin)
intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax)
intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
has_jaccard_overlap = False
else:
has_jaccard_overlap = True
if sampler.min_object_coverage == 0 and sampler.max_object_coverage == 0:
has_object_coverage = False
else:
has_object_coverage = True
if not has_jaccard_overlap and not has_object_coverage:
return True
found = False
for i in range(len(bbox_labels)):
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if has_jaccard_overlap:
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler.min_jaccard_overlap != 0 and \
overlap < sampler.min_jaccard_overlap:
continue
if sampler.max_jaccard_overlap != 0 and \
overlap > sampler.max_jaccard_overlap:
continue
found = True
if has_object_coverage:
object_coverage = bbox_coverage(object_bbox, sample_bbox)
if sampler.min_object_coverage != 0 and \
object_coverage < sampler.min_object_coverage:
continue
if sampler.max_object_coverage != 0 and \
object_coverage > sampler.max_object_coverage:
continue
found = True
if found:
return True
return found
def generate_batch_samples(batch_sampler, bbox_labels, image_width,
image_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for i in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = generate_sample(sampler, image_width, image_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def generate_batch_random_samples(batch_sampler, bbox_labels, image_width,
image_height, scale_array, resize_width,
resize_height):
sampled_bbox = []
for sampler in batch_sampler:
found = 0
for i in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = data_anchor_sampling(
sampler, bbox_labels, image_width, image_height, scale_array,
resize_width, resize_height)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
return sampled_bbox
def clip_bbox(src_bbox):
src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0)
return src_bbox
def meet_emit_constraint(src_bbox, sample_bbox):
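    # keep a ground-truth box only if its center lies inside the sampled crop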
center_x = (src_bbox.xmax + src_bbox.xmin) / 2
center_y = (src_bbox.ymax + src_bbox.ymin) / 2
if center_x >= sample_bbox.xmin and \
center_x <= sample_bbox.xmax and \
center_y >= sample_bbox.ymin and \
center_y <= sample_bbox.ymax:
return True
return False
def project_bbox(object_bbox, sample_bbox):
if object_bbox.xmin >= sample_bbox.xmax or \
object_bbox.xmax <= sample_bbox.xmin or \
object_bbox.ymin >= sample_bbox.ymax or \
object_bbox.ymax <= sample_bbox.ymin:
return False
else:
proj_bbox = bbox(0, 0, 0, 0)
sample_width = sample_bbox.xmax - sample_bbox.xmin
sample_height = sample_bbox.ymax - sample_bbox.ymin
proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
proj_bbox = clip_bbox(proj_bbox)
if bbox_area(proj_bbox) > 0:
return proj_bbox
else:
return False
def transform_labels(bbox_labels, sample_bbox):
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
proj_bbox = project_bbox(object_bbox, sample_bbox)
if proj_bbox:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label = sample_label + bbox_labels[i][5:]
sample_labels.append(sample_label)
return sample_labels
def crop_image(img, bbox_labels, sample_bbox, image_width, image_height):
sample_bbox = clip_bbox(sample_bbox)
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
sample_img = img[ymin:ymax, xmin:xmax]
sample_labels = transform_labels(bbox_labels, sample_bbox)
return sample_img, sample_labels
def crop_image_sampling(img, bbox_labels, sample_bbox, image_width,
image_height, resize_width, resize_height):
# no clipping here
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
w_off = xmin
h_off = ymin
width = xmax - xmin
height = ymax - ymin
cross_xmin = max(0.0, float(w_off))
cross_ymin = max(0.0, float(h_off))
cross_xmax = min(float(w_off + width - 1.0), float(image_width))
cross_ymax = min(float(h_off + height - 1.0), float(image_height))
cross_width = cross_xmax - cross_xmin
cross_height = cross_ymax - cross_ymin
roi_xmin = 0 if w_off >= 0 else abs(w_off)
roi_ymin = 0 if h_off >= 0 else abs(h_off)
roi_width = cross_width
roi_height = cross_height
    # images are HWC: rows index y, columns index x
    sample_img = np.zeros((height, width, 3))
    sample_img[int(roi_ymin) : int(roi_ymin + roi_height), int(roi_xmin) : int(roi_xmin + roi_width)] = \
        img[int(cross_ymin) : int(cross_ymin + cross_height), int(cross_xmin) : int(cross_xmin + cross_width)]
sample_img = cv2.resize(
sample_img, (resize_width, resize_height), interpolation=cv2.INTER_AREA)
sample_labels = transform_labels(bbox_labels, sample_bbox)
return sample_img, sample_labels
def random_brightness(img, settings):
prob = random.uniform(0, 1)
if prob < settings.brightness_prob:
delta = random.uniform(-settings.brightness_delta,
settings.brightness_delta) + 1
img = ImageEnhance.Brightness(img).enhance(delta)
return img
def random_contrast(img, settings):
prob = random.uniform(0, 1)
if prob < settings.contrast_prob:
delta = random.uniform(-settings.contrast_delta,
settings.contrast_delta) + 1
img = ImageEnhance.Contrast(img).enhance(delta)
return img
def random_saturation(img, settings):
prob = random.uniform(0, 1)
if prob < settings.saturation_prob:
delta = random.uniform(-settings.saturation_delta,
settings.saturation_delta) + 1
img = ImageEnhance.Color(img).enhance(delta)
return img
def random_hue(img, settings):
prob = random.uniform(0, 1)
if prob < settings.hue_prob:
delta = random.uniform(-settings.hue_delta, settings.hue_delta)
img_hsv = np.array(img.convert('HSV'))
img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
return img
def distort_image(img, settings):
prob = random.uniform(0, 1)
# Apply different distort order
if prob > 0.5:
img = random_brightness(img, settings)
img = random_contrast(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
else:
img = random_brightness(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
img = random_contrast(img, settings)
return img
def expand_image(img, bbox_labels, img_width, img_height, settings):
prob = random.uniform(0, 1)
if prob < settings.expand_prob:
if settings.expand_max_ratio - 1 >= 0.01:
expand_ratio = random.uniform(1, settings.expand_max_ratio)
height = int(img_height * expand_ratio)
width = int(img_width * expand_ratio)
h_off = math.floor(random.uniform(0, height - img_height))
w_off = math.floor(random.uniform(0, width - img_width))
expand_bbox = bbox(-w_off / img_width, -h_off / img_height,
(width - w_off) / img_width,
(height - h_off) / img_height)
expand_img = np.ones((height, width, 3))
expand_img = np.uint8(expand_img * np.squeeze(settings.img_mean))
expand_img = Image.fromarray(expand_img)
expand_img.paste(img, (int(w_off), int(h_off)))
bbox_labels = transform_labels(bbox_labels, expand_bbox)
return expand_img, bbox_labels, width, height
return img, bbox_labels, img_width, img_height
import os
import time
import numpy as np
import argparse
import functools
from PIL import Image
from PIL import ImageDraw
import paddle
import paddle.fluid as fluid
import reader
from pyramidbox import PyramidBox
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('use_pyramidbox', bool, False, "Whether use PyramidBox model.")
add_arg('confs_threshold', float, 0.25, "Confidence threshold to draw bbox.")
add_arg('image_path', str, '', "The data root path.")
add_arg('model_dir', str, '', "The model path.")
# yapf: enable
def draw_bounding_box_on_image(image_path, nms_out, confs_threshold):
image = Image.open(image_path)
draw = ImageDraw.Draw(image)
for dt in nms_out:
xmin, ymin, xmax, ymax, score = dt
if score < confs_threshold:
continue
(left, right, top, bottom) = (xmin, xmax, ymin, ymax)
draw.line(
[(left, top), (left, bottom), (right, bottom), (right, top),
(left, top)],
width=4,
fill='red')
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
print("image with bbox drawed saved as {}".format(image_name))
image.save('./infer_results/' + image_class.encode('utf-8') + '/' +
image_name.encode('utf-8'))
def write_to_txt(image_path, f, nms_out):
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
f.write('{:s}\n'.format(
image_class.encode('utf-8') + '/' + image_name.encode('utf-8')))
f.write('{:d}\n'.format(nms_out.shape[0]))
for dt in nms_out:
xmin, ymin, xmax, ymax, score = dt
f.write('{:.1f} {:.1f} {:.1f} {:.1f} {:.3f}\n'.format(xmin, ymin, (
xmax - xmin + 1), (ymax - ymin + 1), score))
print("image infer result saved {}".format(image_name[:-4]))
def get_round(x, loc):
    str_x = str(x)
    if '.' in str_x:
        str_before, str_after = str_x.split('.')
        # keep only the first `loc` decimal digits when there are 3 or more
        if len(str_after) >= 3:
            return float(str_before + '.' + str_after[0:loc])
    return x
def bbox_vote(det):
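    # Box voting: sort detections by score; repeatedly take the top box, merge
    # every box with IoU >= 0.3 into it via score-weighted coordinate averaging,
    # and keep at most 750 final detections.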
order = det[:, 4].ravel().argsort()[::-1]
det = det[order, :]
if det.shape[0] == 0:
dets = np.array([[10, 10, 20, 20, 0.002]])
det = np.empty(shape=[0, 5])
while det.shape[0] > 0:
# IOU
area = (det[:, 2] - det[:, 0] + 1) * (det[:, 3] - det[:, 1] + 1)
xx1 = np.maximum(det[0, 0], det[:, 0])
yy1 = np.maximum(det[0, 1], det[:, 1])
xx2 = np.minimum(det[0, 2], det[:, 2])
yy2 = np.minimum(det[0, 3], det[:, 3])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
o = inter / (area[0] + area[:] - inter)
# get needed merge det and delete these det
merge_index = np.where(o >= 0.3)[0]
det_accu = det[merge_index, :]
det = np.delete(det, merge_index, 0)
if merge_index.shape[0] <= 1:
if det.shape[0] == 0:
try:
dets = np.row_stack((dets, det_accu))
except:
dets = det_accu
continue
det_accu[:, 0:4] = det_accu[:, 0:4] * np.tile(det_accu[:, -1:], (1, 4))
max_score = np.max(det_accu[:, 4])
det_accu_sum = np.zeros((1, 5))
det_accu_sum[:, 0:4] = np.sum(det_accu[:, 0:4],
axis=0) / np.sum(det_accu[:, -1:])
det_accu_sum[:, 4] = max_score
try:
dets = np.row_stack((dets, det_accu_sum))
except:
dets = det_accu_sum
dets = dets[0:750, :]
return dets
def image_preprocess(image):
img = np.array(image)
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
    # RGB to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= np.array(
[104., 117., 123.])[:, np.newaxis, np.newaxis].astype('float32')
img = img * 0.007843
img = [img]
img = np.array(img)
return img
def detect_face(image, shrink):
image_shape = [3, image.size[1], image.size[0]]
num_classes = 2
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
if shrink != 1:
image = image.resize((int(image_shape[2] * shrink),
int(image_shape[1] * shrink)), Image.ANTIALIAS)
image_shape = [
image_shape[0], int(image_shape[1] * shrink),
int(image_shape[2] * shrink)
]
print "image_shape:", image_shape
img = image_preprocess(image)
scope = fluid.core.Scope()
main_program = fluid.Program()
startup_program = fluid.Program()
with fluid.scope_guard(scope):
with fluid.unique_name.guard():
with fluid.program_guard(main_program, startup_program):
fetches = []
network = PyramidBox(
image_shape,
num_classes,
sub_network=args.use_pyramidbox,
is_infer=True)
infer_program, nmsed_out = network.infer(main_program)
fetches = [nmsed_out]
fluid.io.load_persistables(
exe, args.model_dir, main_program=main_program)
detection, = exe.run(infer_program,
feed={'image': img},
fetch_list=fetches,
return_numpy=False)
detection = np.array(detection)
    # layout: xmin, ymin, xmax, ymax, score
det_conf = detection[:, 1]
det_xmin = image_shape[2] * detection[:, 2] / shrink
det_ymin = image_shape[1] * detection[:, 3] / shrink
det_xmax = image_shape[2] * detection[:, 4] / shrink
det_ymax = image_shape[1] * detection[:, 5] / shrink
det = np.column_stack((det_xmin, det_ymin, det_xmax, det_ymax, det_conf))
keep_index = np.where(det[:, 4] >= 0)[0]
det = det[keep_index, :]
return det
def flip_test(image, shrink):
img = image.transpose(Image.FLIP_LEFT_RIGHT)
det_f = detect_face(img, shrink)
det_t = np.zeros(det_f.shape)
# image.size: [width, height]
det_t[:, 0] = image.size[0] - det_f[:, 2]
det_t[:, 1] = det_f[:, 1]
det_t[:, 2] = image.size[0] - det_f[:, 0]
det_t[:, 3] = det_f[:, 3]
det_t[:, 4] = det_f[:, 4]
return det_t
def multi_scale_test(image, max_shrink):
    # shrink detection: the shrunken image is used to detect only big faces
st = 0.5 if max_shrink >= 0.75 else 0.5 * max_shrink
det_s = detect_face(image, st)
index = np.where(
np.maximum(det_s[:, 2] - det_s[:, 0] + 1, det_s[:, 3] - det_s[:, 1] + 1)
> 30)[0]
det_s = det_s[index, :]
    # enlarge by one step
bt = min(2, max_shrink) if max_shrink > 1 else (st + max_shrink) / 2
det_b = detect_face(image, bt)
    # progressively enlarge the image to detect small faces
if max_shrink > 2:
bt *= 2
while bt < max_shrink:
det_b = np.row_stack((det_b, detect_face(image, bt)))
bt *= 2
det_b = np.row_stack((det_b, detect_face(image, max_shrink)))
# enlarge only detect small face
if bt > 1:
index = np.where(
np.minimum(det_b[:, 2] - det_b[:, 0] + 1,
det_b[:, 3] - det_b[:, 1] + 1) < 100)[0]
det_b = det_b[index, :]
else:
index = np.where(
np.maximum(det_b[:, 2] - det_b[:, 0] + 1,
det_b[:, 3] - det_b[:, 1] + 1) > 30)[0]
det_b = det_b[index, :]
return det_s, det_b
def get_im_shrink(image_shape):
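    # heuristic upper bounds on the enlargement factor so that the largest
    # test-time scale still fits in memory; the constants are empirical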
max_shrink_v1 = (0x7fffffff / 577.0 /
(image_shape[1] * image_shape[2]))**0.5
max_shrink_v2 = (
(678 * 1024 * 2.0 * 2.0) / (image_shape[1] * image_shape[2]))**0.5
max_shrink = get_round(min(max_shrink_v1, max_shrink_v2), 2) - 0.3
if max_shrink >= 1.5 and max_shrink < 2:
max_shrink = max_shrink - 0.1
elif max_shrink >= 2 and max_shrink < 3:
max_shrink = max_shrink - 0.2
elif max_shrink >= 3 and max_shrink < 4:
max_shrink = max_shrink - 0.3
elif max_shrink >= 4 and max_shrink < 5:
max_shrink = max_shrink - 0.4
elif max_shrink >= 5:
max_shrink = max_shrink - 0.5
print 'max_shrink = ', max_shrink
shrink = max_shrink if max_shrink < 1 else 1
print "shrink = ", shrink
return shrink, max_shrink
def infer(args, batch_size, data_args):
if not os.path.exists(args.model_dir):
raise ValueError("The model path [%s] does not exist." %
(args.model_dir))
infer_reader = paddle.batch(
reader.test(data_args, file_list), batch_size=batch_size)
for batch_id, img in enumerate(infer_reader()):
image = img[0][0]
image_path = img[0][1]
# image.size: [width, height]
image_shape = [3, image.size[1], image.size[0]]
shrink, max_shrink = get_im_shrink(image_shape)
det0 = detect_face(image, shrink)
det1 = flip_test(image, shrink)
[det2, det3] = multi_scale_test(image, max_shrink)
det = np.row_stack((det0, det1, det2, det3))
dets = bbox_vote(det)
image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2]
if not os.path.exists('./infer_results/' + image_class.encode('utf-8')):
os.makedirs('./infer_results/' + image_class.encode('utf-8'))
f = open('./infer_results/' + image_class.encode('utf-8') + '/' +
image_name.encode('utf-8')[:-4] + '.txt', 'w')
write_to_txt(image_path, f, dets)
# draw_bounding_box_on_image(image_path, dets, args.confs_threshold)
print "Done"
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
data_dir = 'data/WIDERFACE/WIDER_val/images/'
file_list = 'label/val_gt_widerface.res'
data_args = reader.Settings(
data_dir=data_dir,
mean_value=[104., 117., 123],
apply_distort=False,
apply_expand=False,
ap_version='11point')
infer(args, batch_size=1, data_args=data_args)
import numpy as np
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Xavier
from paddle.fluid.initializer import Constant
from paddle.fluid.initializer import Bilinear
from paddle.fluid.regularizer import L2Decay
def conv_bn(input, filter, ksize, stride, padding, act='relu', bias_attr=False):
conv = fluid.layers.conv2d(
input=input,
filter_size=ksize,
num_filters=filter,
stride=stride,
padding=padding,
act=None,
bias_attr=bias_attr)
return fluid.layers.batch_norm(input=conv, act=act)
def conv_block(input, groups, filters, ksizes, strides=None, with_pool=True):
assert len(filters) == groups
assert len(ksizes) == groups
strides = [1] * groups if strides is None else strides
w_attr = ParamAttr(learning_rate=1., initializer=Xavier())
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv = input
for i in xrange(groups):
conv = fluid.layers.conv2d(
input=conv,
num_filters=filters[i],
filter_size=ksizes[i],
stride=strides[i],
padding=(ksizes[i] - 1) / 2,
param_attr=w_attr,
bias_attr=b_attr,
act='relu')
if with_pool:
pool = fluid.layers.pool2d(
input=conv,
pool_size=2,
pool_type='max',
pool_stride=2,
ceil_mode=True)
return conv, pool
else:
return conv
class PyramidBox(object):
def __init__(self,
data_shape,
num_classes,
use_transposed_conv2d=True,
is_infer=False,
sub_network=False):
"""
TODO(qingqing): add comments.
"""
self.data_shape = data_shape
self.min_sizes = [16., 32., 64., 128., 256., 512.]
self.steps = [4., 8., 16., 32., 64., 128.]
self.num_classes = num_classes
self.use_transposed_conv2d = use_transposed_conv2d
self.is_infer = is_infer
self.sub_network = sub_network
# the base network is VGG with atrous layers
self._input()
self._vgg()
if sub_network:
self._low_level_fpn()
self._cpm_module()
self._pyramidbox()
else:
self._vgg_ssd()
def feeds(self):
if self.is_infer:
return [self.image]
else:
return [
self.image, self.face_box, self.head_box, self.gt_label,
self.difficult
]
def _input(self):
self.image = fluid.layers.data(
name='image', shape=self.data_shape, dtype='float32')
if not self.is_infer:
self.face_box = fluid.layers.data(
name='face_box', shape=[4], dtype='float32', lod_level=1)
self.head_box = fluid.layers.data(
name='head_box', shape=[4], dtype='float32', lod_level=1)
self.gt_label = fluid.layers.data(
name='gt_label', shape=[1], dtype='int32', lod_level=1)
self.difficult = fluid.layers.data(
name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
def _vgg(self):
self.conv1, self.pool1 = conv_block(self.image, 2, [64] * 2, [3] * 2)
self.conv2, self.pool2 = conv_block(self.pool1, 2, [128] * 2, [3] * 2)
#priorbox min_size is 16
self.conv3, self.pool3 = conv_block(self.pool2, 3, [256] * 3, [3] * 3)
#priorbox min_size is 32
self.conv4, self.pool4 = conv_block(self.pool3, 3, [512] * 3, [3] * 3)
#priorbox min_size is 64
self.conv5, self.pool5 = conv_block(self.pool4, 3, [512] * 3, [3] * 3)
# fc6 and fc7 in paper, priorbox min_size is 128
self.conv6 = conv_block(
self.pool5, 2, [1024, 1024], [3, 1], with_pool=False)
# conv6_1 and conv6_2 in paper, priorbox min_size is 256
self.conv7 = conv_block(
self.conv6, 2, [256, 512], [1, 3], [1, 2], with_pool=False)
# conv7_1 and conv7_2 in paper, priorbox mini_size is 512
self.conv8 = conv_block(
self.conv7, 2, [128, 256], [1, 3], [1, 2], with_pool=False)
def _low_level_fpn(self):
"""
Low-level feature pyramid network.
"""
def fpn(up_from, up_to):
ch = up_to.shape[1]
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv1 = fluid.layers.conv2d(
up_from, ch, 1, act='relu', bias_attr=b_attr)
if self.use_transposed_conv2d:
w_attr = ParamAttr(
learning_rate=0.,
regularizer=L2Decay(0.),
initializer=Bilinear())
upsampling = fluid.layers.conv2d_transpose(
conv1,
ch,
output_size=None,
filter_size=4,
padding=1,
stride=2,
groups=ch,
param_attr=w_attr,
bias_attr=False)
else:
upsampling = fluid.layers.resize_bilinear(
conv1, out_shape=up_to.shape[2:])
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
conv2 = fluid.layers.conv2d(
up_to, ch, 1, act='relu', bias_attr=b_attr)
if self.is_infer:
upsampling = fluid.layers.crop(upsampling, shape=conv2)
# eltwise mul
conv_fuse = upsampling * conv2
return conv_fuse
self.lfpn2_on_conv5 = fpn(self.conv6, self.conv5)
self.lfpn1_on_conv4 = fpn(self.lfpn2_on_conv5, self.conv4)
self.lfpn0_on_conv3 = fpn(self.lfpn1_on_conv4, self.conv3)
def _cpm_module(self):
"""
Context-sensitive Prediction Module
"""
def cpm(input):
# residual
branch1 = conv_bn(input, 1024, 1, 1, 0, None)
branch2a = conv_bn(input, 256, 1, 1, 0, act='relu')
branch2b = conv_bn(branch2a, 256, 3, 1, 1, act='relu')
branch2c = conv_bn(branch2b, 1024, 1, 1, 0, None)
sum = branch1 + branch2c
rescomb = fluid.layers.relu(x=sum)
# ssh
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
ssh_1 = fluid.layers.conv2d(rescomb, 256, 3, 1, 1, bias_attr=b_attr)
ssh_dimred = fluid.layers.conv2d(
rescomb, 128, 3, 1, 1, act='relu', bias_attr=b_attr)
ssh_2 = fluid.layers.conv2d(
ssh_dimred, 128, 3, 1, 1, bias_attr=b_attr)
ssh_3a = fluid.layers.conv2d(
ssh_dimred, 128, 3, 1, 1, act='relu', bias_attr=b_attr)
ssh_3b = fluid.layers.conv2d(ssh_3a, 128, 3, 1, 1, bias_attr=b_attr)
ssh_concat = fluid.layers.concat([ssh_1, ssh_2, ssh_3b], axis=1)
ssh_out = fluid.layers.relu(x=ssh_concat)
return ssh_out
self.ssh_conv3 = cpm(self.lfpn0_on_conv3)
self.ssh_conv4 = cpm(self.lfpn1_on_conv4)
self.ssh_conv5 = cpm(self.lfpn2_on_conv5)
self.ssh_conv6 = cpm(self.conv6)
self.ssh_conv7 = cpm(self.conv7)
self.ssh_conv8 = cpm(self.conv8)
def _l2_norm_scale(self, input, init_scale=1.0, channel_shared=False):
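        # L2-normalize along the channel axis, then multiply by a learnable
        # per-channel scale (the ParseNet-style normalization used by SSD)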
from paddle.fluid.layer_helper import LayerHelper
helper = LayerHelper("Scale")
l2_norm = fluid.layers.l2_normalize(
input, axis=1) # l2 norm along channel
shape = [1] if channel_shared else [input.shape[1]]
scale = helper.create_parameter(
attr=helper.param_attr,
shape=shape,
dtype=input.dtype,
default_initializer=Constant(init_scale))
out = fluid.layers.elementwise_mul(
x=l2_norm, y=scale, axis=-1 if channel_shared else 1)
return out
def _pyramidbox(self):
"""
Get prior-boxes and pyramid-box
"""
self.ssh_conv3_norm = self._l2_norm_scale(
self.ssh_conv3, init_scale=10.)
self.ssh_conv4_norm = self._l2_norm_scale(self.ssh_conv4, init_scale=8.)
self.ssh_conv5_norm = self._l2_norm_scale(self.ssh_conv5, init_scale=5.)
def permute_and_reshape(input, last_dim):
trans = fluid.layers.transpose(input, perm=[0, 2, 3, 1])
new_shape = [
trans.shape[0], np.prod(trans.shape[1:]) / last_dim, last_dim
]
return fluid.layers.reshape(trans, shape=new_shape)
face_locs, face_confs = [], []
head_locs, head_confs = [], []
boxes, vars = [], []
inputs = [
self.ssh_conv3_norm, self.ssh_conv4_norm, self.ssh_conv5_norm,
self.ssh_conv6, self.ssh_conv7, self.ssh_conv8
]
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
for i, input in enumerate(inputs):
mbox_loc = fluid.layers.conv2d(input, 8, 3, 1, 1, bias_attr=b_attr)
face_loc, head_loc = fluid.layers.split(
mbox_loc, num_or_sections=2, dim=1)
face_loc = permute_and_reshape(face_loc, 4)
head_loc = permute_and_reshape(head_loc, 4)
mbox_conf = fluid.layers.conv2d(input, 6, 3, 1, 1, bias_attr=b_attr)
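            # max-in-out: reduce the three extra confidence channels to their
            # maximum so each anchor ends with a two-way face/background score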
face_conf1, face_conf3, head_conf = fluid.layers.split(
mbox_conf, num_or_sections=[1, 3, 2], dim=1)
face_conf3_maxin = fluid.layers.reduce_max(
face_conf3, dim=1, keep_dim=True)
face_conf = fluid.layers.concat(
[face_conf1, face_conf3_maxin], axis=1)
face_conf = permute_and_reshape(face_conf, 2)
head_conf = permute_and_reshape(head_conf, 2)
face_locs.append(face_loc)
face_confs.append(face_conf)
head_locs.append(head_loc)
head_confs.append(head_conf)
box, var = fluid.layers.prior_box(
input,
self.image,
min_sizes=[self.min_sizes[i]],
steps=[self.steps[i]] * 2,
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
boxes.append(box)
vars.append(var)
self.face_mbox_loc = fluid.layers.concat(face_locs, axis=1)
self.face_mbox_conf = fluid.layers.concat(face_confs, axis=1)
self.head_mbox_loc = fluid.layers.concat(head_locs, axis=1)
self.head_mbox_conf = fluid.layers.concat(head_confs, axis=1)
self.prior_boxes = fluid.layers.concat(boxes)
self.box_vars = fluid.layers.concat(vars)
def _vgg_ssd(self):
self.conv3_norm = self._l2_norm_scale(self.conv3, init_scale=10.)
self.conv4_norm = self._l2_norm_scale(self.conv4, init_scale=8.)
self.conv5_norm = self._l2_norm_scale(self.conv5, init_scale=5.)
def permute_and_reshape(input, last_dim):
trans = fluid.layers.transpose(input, perm=[0, 2, 3, 1])
new_shape = [
trans.shape[0], np.prod(trans.shape[1:]) / last_dim, last_dim
]
return fluid.layers.reshape(trans, shape=new_shape)
locs, confs = [], []
boxes, vars = [], []
b_attr = ParamAttr(learning_rate=2., regularizer=L2Decay(0.))
# conv3
mbox_loc = fluid.layers.conv2d(
self.conv3_norm, 4, 3, 1, 1, bias_attr=b_attr)
loc = permute_and_reshape(mbox_loc, 4)
mbox_conf = fluid.layers.conv2d(
self.conv3_norm, 4, 3, 1, 1, bias_attr=b_attr)
conf1, conf3 = fluid.layers.split(
mbox_conf, num_or_sections=[1, 3], dim=1)
conf3_maxin = fluid.layers.reduce_max(conf3, dim=1, keep_dim=True)
conf = fluid.layers.concat([conf1, conf3_maxin], axis=1)
conf = permute_and_reshape(conf, 2)
box, var = fluid.layers.prior_box(
self.conv3_norm,
self.image,
min_sizes=[16.],
steps=[4, 4],
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
locs.append(loc)
confs.append(conf)
boxes.append(box)
vars.append(var)
min_sizes = [32., 64., 128., 256., 512.]
steps = [8., 16., 32., 64., 128.]
inputs = [
self.conv4_norm, self.conv5_norm, self.conv6, self.conv7, self.conv8
]
for i, input in enumerate(inputs):
mbox_loc = fluid.layers.conv2d(input, 4, 3, 1, 1, bias_attr=b_attr)
loc = permute_and_reshape(mbox_loc, 4)
mbox_conf = fluid.layers.conv2d(input, 2, 3, 1, 1, bias_attr=b_attr)
conf = permute_and_reshape(mbox_conf, 2)
box, var = fluid.layers.prior_box(
input,
self.image,
min_sizes=[min_sizes[i]],
steps=[steps[i]] * 2,
aspect_ratios=[1.],
clip=False,
flip=True,
offset=0.5)
box = fluid.layers.reshape(box, shape=[-1, 4])
var = fluid.layers.reshape(var, shape=[-1, 4])
locs.append(loc)
confs.append(conf)
boxes.append(box)
vars.append(var)
self.face_mbox_loc = fluid.layers.concat(locs, axis=1)
self.face_mbox_conf = fluid.layers.concat(confs, axis=1)
self.prior_boxes = fluid.layers.concat(boxes)
self.box_vars = fluid.layers.concat(vars)
def vgg_ssd_loss(self):
loss = fluid.layers.ssd_loss(
self.face_mbox_loc,
self.face_mbox_conf,
self.face_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
loss = fluid.layers.reduce_sum(loss)
return loss
def train(self):
face_loss = fluid.layers.ssd_loss(
self.face_mbox_loc,
self.face_mbox_conf,
self.face_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
head_loss = fluid.layers.ssd_loss(
self.head_mbox_loc,
self.head_mbox_conf,
self.head_box,
self.gt_label,
self.prior_boxes,
self.box_vars,
overlap_threshold=0.35,
neg_overlap=0.35)
face_loss = fluid.layers.reduce_sum(face_loss)
head_loss = fluid.layers.reduce_sum(head_loss)
total_loss = face_loss + head_loss
return face_loss, head_loss, total_loss
def infer(self, main_program=None):
if main_program is None:
test_program = fluid.default_main_program().clone(for_test=True)
else:
test_program = main_program.clone(for_test=True)
with fluid.program_guard(test_program):
face_nmsed_out = fluid.layers.detection_output(
self.face_mbox_loc,
self.face_mbox_conf,
self.prior_boxes,
self.box_vars,
nms_threshold=0.45)
return test_program, face_nmsed_out
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import image_util
from paddle.utils.image_util import *
import random
from PIL import Image
from PIL import ImageDraw
import numpy as np
import xml.etree.ElementTree
import os
import time
import copy
class Settings(object):
def __init__(self,
dataset=None,
data_dir=None,
label_file=None,
resize_h=None,
resize_w=None,
mean_value=[104., 117., 123.],
apply_distort=True,
apply_expand=True,
ap_version='11point',
toy=0):
self.dataset = dataset
self.ap_version = ap_version
self.toy = toy
self.data_dir = data_dir
self.apply_distort = apply_distort
self.apply_expand = apply_expand
self.resize_height = resize_h
self.resize_width = resize_w
self.img_mean = np.array(mean_value)[:, np.newaxis, np.newaxis].astype(
'float32')
self.expand_prob = 0.5
self.expand_max_ratio = 4
self.hue_prob = 0.5
self.hue_delta = 18
self.contrast_prob = 0.5
self.contrast_delta = 0.5
self.saturation_prob = 0.5
self.saturation_delta = 0.5
self.brightness_prob = 0.5
# _brightness_delta is the normalized value by 256
# self._brightness_delta = 32
self.brightness_delta = 0.125
self.scale = 0.007843 # 1 / 127.5
self.data_anchor_sampling_prob = 0.5
def preprocess(img, bbox_labels, mode, settings):
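    # training pipeline: photometric distortion -> optional expansion ->
    # sampled crop (data-anchor-sampling or classic SSD batch samplers) ->
    # resize -> random mirror -> HWC->CHW, RGB->BGR, mean/scale normalization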
img_width, img_height = img.size
sampled_labels = bbox_labels
if mode == 'train':
if settings.apply_distort:
img = image_util.distort_image(img, settings)
if settings.apply_expand:
img, bbox_labels, img_width, img_height = image_util.expand_image(
img, bbox_labels, img_width, img_height, settings)
# sampling
batch_sampler = []
prob = random.uniform(0., 1.)
if prob > settings.data_anchor_sampling_prob:
scale_array = np.array([16, 32, 64, 128, 256, 512])
batch_sampler.append(
image_util.sampler(1, 10, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.2,
0.0, True))
sampled_bbox = image_util.generate_batch_random_samples(
batch_sampler, bbox_labels, img_width, img_height, scale_array,
settings.resize_width, settings.resize_height)
img = np.array(img)
if len(sampled_bbox) > 0:
idx = int(random.uniform(0, len(sampled_bbox)))
img, sampled_labels = image_util.crop_image_sampling(
img, bbox_labels, sampled_bbox[idx], img_width, img_height,
                settings.resize_width, settings.resize_height)
img = Image.fromarray(img)
else:
# hard-code here
batch_sampler.append(
image_util.sampler(1, 50, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
0.0, True))
sampled_bbox = image_util.generate_batch_samples(
batch_sampler, bbox_labels, img_width, img_height)
img = np.array(img)
if len(sampled_bbox) > 0:
idx = int(random.uniform(0, len(sampled_bbox)))
img, sampled_labels = image_util.crop_image(
img, bbox_labels, sampled_bbox[idx], img_width, img_height)
img = Image.fromarray(img)
    if isinstance(img, np.ndarray):
        # the sampling branches may leave img as a numpy array
        img = Image.fromarray(img)
    img = img.resize((settings.resize_width, settings.resize_height),
                     Image.ANTIALIAS)
img = np.array(img)
if mode == 'train':
mirror = int(random.uniform(0, 2))
if mirror == 1:
img = img[:, ::-1, :]
for i in xrange(len(sampled_labels)):
tmp = sampled_labels[i][1]
sampled_labels[i][1] = 1 - sampled_labels[i][3]
sampled_labels[i][3] = 1 - tmp
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
    # RGB to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= settings.img_mean
img = img * settings.scale
return img, sampled_labels
def put_txt_in_dict(input_txt):
with open(input_txt, 'r') as f_dir:
lines_input_txt = f_dir.readlines()
dict_input_txt = {}
num_class = 0
for i in range(len(lines_input_txt)):
tmp_line_txt = lines_input_txt[i].strip('\n\t\r')
if '--' in tmp_line_txt:
if i != 0:
num_class += 1
dict_input_txt[num_class] = []
dict_name = tmp_line_txt
dict_input_txt[num_class].append(tmp_line_txt)
if '--' not in tmp_line_txt:
if len(tmp_line_txt) > 6:
split_str = tmp_line_txt.split(' ')
x1_min = float(split_str[0])
y1_min = float(split_str[1])
x2_max = float(split_str[2])
y2_max = float(split_str[3])
tmp_line_txt = str(x1_min) + ' ' + str(y1_min) + ' ' + str(
x2_max) + ' ' + str(y2_max)
dict_input_txt[num_class].append(tmp_line_txt)
else:
dict_input_txt[num_class].append(tmp_line_txt)
return dict_input_txt
def expand_bboxes(bboxes,
expand_left=2.,
expand_up=2.,
expand_right=2.,
expand_down=2.):
"""
    Expand bboxes, expand 2 times by default.
"""
expand_boxes = []
for bbox in bboxes:
xmin = bbox[0]
ymin = bbox[1]
xmax = bbox[2]
ymax = bbox[3]
w = xmax - xmin
h = ymax - ymin
ex_xmin = max(xmin - w / expand_left, 0.)
ex_ymin = max(ymin - h / expand_up, 0.)
ex_xmax = min(xmax + w / expand_right, 1.)
ex_ymax = min(ymax + h / expand_down, 1.)
expand_boxes.append([ex_xmin, ex_ymin, ex_xmax, ex_ymax])
return expand_boxes
def pyramidbox(settings, file_list, mode, shuffle):
dict_input_txt = {}
dict_input_txt = put_txt_in_dict(file_list)
def reader():
if mode == 'train' and shuffle:
random.shuffle(dict_input_txt)
for index_image in range(len(dict_input_txt)):
image_name = dict_input_txt[index_image][0] + '.jpg'
image_path = os.path.join(settings.data_dir, image_name)
im = Image.open(image_path)
if im.mode == 'L':
im = im.convert('RGB')
im_width, im_height = im.size
# layout: label | xmin | ymin | xmax | ymax
if mode == 'train':
bbox_labels = []
for index_box in range(len(dict_input_txt[index_image])):
if index_box >= 2:
bbox_sample = []
temp_info_box = dict_input_txt[index_image][
index_box].split(' ')
xmin = float(temp_info_box[0])
ymin = float(temp_info_box[1])
w = float(temp_info_box[2])
h = float(temp_info_box[3])
xmax = xmin + w
ymax = ymin + h
bbox_sample.append(1)
bbox_sample.append(float(xmin) / im_width)
bbox_sample.append(float(ymin) / im_height)
bbox_sample.append(float(xmax) / im_width)
bbox_sample.append(float(ymax) / im_height)
bbox_labels.append(bbox_sample)
im, sample_labels = preprocess(im, bbox_labels, mode, settings)
sample_labels = np.array(sample_labels)
if len(sample_labels) == 0: continue
im = im.astype('float32')
boxes = sample_labels[:, 1:5]
lbls = [1] * len(boxes)
difficults = [1] * len(boxes)
yield im, boxes, expand_bboxes(boxes), lbls, difficults
if mode == 'test':
yield im, image_path
return reader
def train(settings, file_list, shuffle=True):
return pyramidbox(settings, file_list, 'train', shuffle)
def test(settings, file_list):
return pyramidbox(settings, file_list, 'test', False)
def infer(settings, image_path):
def batch_reader():
img = Image.open(image_path)
if img.mode == 'L':
            img = img.convert('RGB')
im_width, im_height = img.size
if settings.resize_width and settings.resize_height:
img = img.resize((settings.resize_width, settings.resize_height),
Image.ANTIALIAS)
img = np.array(img)
# HWC to CHW
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
        # RGB to BGR
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= settings.img_mean
img = img * settings.scale
return np.array([img])
return batch_reader
import os
import shutil
import numpy as np
import time
import argparse
import functools
import reader
import paddle
import paddle.fluid as fluid
from pyramidbox import PyramidBox
from utility import add_arguments, print_arguments
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('parallel',         bool,  True,      "Whether to use ParallelExecutor for training.")
add_arg('learning_rate', float, 0.001, "Learning rate.")
add_arg('batch_size', int, 12, "Minibatch size.")
add_arg('num_passes', int, 120, "Epoch number.")
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('use_pyramidbox', bool, True, "Whether use PyramidBox model.")
add_arg('model_save_dir', str, 'output', "The path to save model.")
add_arg('pretrained_model', str, './pretrained/', "The init model path.")
add_arg('resize_h', int, 640, "The resized image height.")
add_arg('resize_w',         int,   640,    "The resized image width.")
#yapf: enable
def train(args, config, train_file_list, optimizer_method):
learning_rate = args.learning_rate
batch_size = args.batch_size
num_passes = args.num_passes
height = args.resize_h
width = args.resize_w
use_gpu = args.use_gpu
use_pyramidbox = args.use_pyramidbox
model_save_dir = args.model_save_dir
pretrained_model = args.pretrained_model
num_classes = 2
image_shape = [3, height, width]
devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
fetches = []
network = PyramidBox(image_shape, num_classes,
sub_network=use_pyramidbox)
if use_pyramidbox:
face_loss, head_loss, loss = network.train()
fetches = [face_loss, head_loss]
else:
loss = network.vgg_ssd_loss()
fetches = [loss]
    iters_per_epoch = 12880 / batch_size  # the WIDER FACE train set has 12880 images
    boundaries = [iters_per_epoch * 40, iters_per_epoch * 60,
                  iters_per_epoch * 80, iters_per_epoch * 100]
values = [
learning_rate, learning_rate * 0.5, learning_rate * 0.25,
learning_rate * 0.1, learning_rate * 0.01
]
if optimizer_method == "momentum":
optimizer = fluid.optimizer.Momentum(
learning_rate=fluid.layers.piecewise_decay(
boundaries=boundaries, values=values),
momentum=0.9,
regularization=fluid.regularizer.L2Decay(0.0005),
)
else:
optimizer = fluid.optimizer.RMSProp(
learning_rate=fluid.layers.piecewise_decay(boundaries, values),
regularization=fluid.regularizer.L2Decay(0.0005),
)
optimizer.minimize(loss)
#fluid.memory_optimize(fluid.default_main_program())
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
start_pass = 0
if pretrained_model:
if pretrained_model.isdigit():
start_pass = int(pretrained_model) + 1
pretrained_model = os.path.join(model_save_dir, pretrained_model)
print("Resume from %s " %(pretrained_model))
if not os.path.exists(pretrained_model):
raise ValueError("The pre-trained model path [%s] does not exist." %
(pretrained_model))
def if_exist(var):
return os.path.exists(os.path.join(pretrained_model, var.name))
fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
if args.parallel:
train_exe = fluid.ParallelExecutor(
use_cuda=use_gpu, loss_name=loss.name)
train_reader = paddle.batch(
reader.train(config, train_file_list), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=network.feeds())
def save_model(postfix):
model_path = os.path.join(model_save_dir, postfix)
if os.path.isdir(model_path):
shutil.rmtree(model_path)
print 'save models to %s' % (model_path)
fluid.io.save_persistables(exe, model_path)
for pass_id in range(start_pass, num_passes):
start_time = time.time()
prev_start_time = start_time
end_time = 0
for batch_id, data in enumerate(train_reader()):
prev_start_time = start_time
start_time = time.time()
if len(data) < 2 * devices_num: continue
if args.parallel:
fetch_vars = train_exe.run(fetch_list=[v.name for v in fetches],
feed=feeder.feed(data))
else:
fetch_vars = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=fetches)
end_time = time.time()
fetch_vars = [np.mean(np.array(v)) for v in fetch_vars]
if batch_id % 1 == 0:
if not args.use_pyramidbox:
print("Pass {0}, batch {1}, loss {2}, time {3}".format(
pass_id, batch_id, fetch_vars[0],
start_time - prev_start_time))
else:
print("Pass {0}, batch {1}, face loss {2}, head loss {3}, " \
"time {4}".format(pass_id,
batch_id, fetch_vars[0], fetch_vars[1],
start_time - prev_start_time))
if pass_id % 1 == 0 or pass_id == num_passes - 1:
save_model(str(pass_id))
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
data_dir = 'data/WIDERFACE/WIDER_train/images/'
train_file_list = 'label/train_gt_widerface.res'
config = reader.Settings(
data_dir=data_dir,
resize_h=args.resize_h,
resize_w=args.resize_w,
apply_expand=False,
mean_value=[104., 117., 123],
ap_version='11point')
train(args, config, train_file_list, optimizer_method="momentum")
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
Running the sample programs in this directory requires the latest develop branch of PaddlePaddle. If your installed version of PaddlePaddle is older than this, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your installation.
## Code Structure
```
├── network.py # network architecture definition
├── train.py # training script
├── eval.py # evaluation script
├── infer.py # inference script
├── cityscape.py # data preprocessing script
└── utils.py # common utility functions
```
## Introduction
Image Cascade Network (ICNet) is designed for real-time semantic segmentation of images. Compared with other approaches that compress computation, ICNet takes both speed and accuracy into account.
The main idea of ICNet is to transform the input image into several resolutions, feed each resolution to a sub-network of matching computational complexity, and then merge the results. ICNet consists of three sub-networks: a computationally heavy network processes the low-resolution input, while a lightweight network processes the high-resolution input. This design strikes a balance between the accuracy obtainable from high-resolution images and the efficiency of low-complexity networks.
The overall network architecture is shown below:
<p align="center">
<img src="images/icnet.png" width="620" hspace='10'/> <br/>
<strong>Figure 1</strong>
</p>
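The cascade can be sketched in a few lines of Python. The snippet below is an illustrative outline only, with hypothetical `branch_low` / `branch_mid` / `branch_high` / `fuse` callables standing in for the sub-networks defined in `network.py`:
```python
import cv2

def icnet_forward_sketch(image, branch_low, branch_mid, branch_high, fuse):
    # Three input resolutions feed three sub-networks of decreasing depth;
    # a final fusion step merges their feature maps (cf. Figure 1).
    h, w = image.shape[:2]
    x_quarter = cv2.resize(image, (w // 4, h // 4))  # 1/4 size -> deepest branch
    x_half = cv2.resize(image, (w // 2, h // 2))     # 1/2 size -> medium branch
    f_low = branch_low(x_quarter)      # heavy computation on the small input
    f_mid = branch_mid(x_half)
    f_high = branch_high(image)        # light computation on the full-size input
    return fuse(f_low, f_mid, f_high)  # cascade feature fusion -> class scores
```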
## Data Preparation
This example uses the Cityscapes dataset. Please register at the [Cityscapes website](https://www.cityscapes-dataset.com) to download it. After downloading, process the data with the tools and instructions [here](https://github.com/mcordts/cityscapesScripts/blob/master/cityscapesscripts/preparation/createTrainIdLabelImgs.py#L3).
The processed data is laid out as follows:
```
data/cityscape/
|-- gtFine
| |-- test
| |-- train
| `-- val
|-- leftImg8bit
| |-- test
| |-- train
| `-- val
|-- train.list
`-- val.list
```
Here, train.list and val.list are the list files used for training and validation respectively. In each line, the first column is the input image and the second column the annotation, separated by a space. For example:
```
leftImg8bit/train/stuttgart/stuttgart_000021_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000021_000019_gtFine_labelTrainIds.png
leftImg8bit/train/stuttgart/stuttgart_000072_000019_leftImg8bit.png gtFine/train/stuttgart/stuttgart_000072_000019_gtFine_labelTrainIds.png
```
After downloading and preparing the data, update the corresponding data paths in the `cityscape.py` script.
## Training and Inference
### Training
Run the following command to start training, specifying the path where checkpoints are saved:
```
python train.py --batch_size=16 --use_gpu=True --checkpoint_path="./chkpnt/"
```
Use the following command for more usage information:
```
python train.py --help
```
During training, the `loss` of each network branch on the training set is printed according to the user's settings, for example:
```
Iter[0]; train loss: 2.338; sub4_loss: 3.367; sub24_loss: 4.120; sub124_loss: 0.151
```
### Evaluation
Run the following command to evaluate on the `Cityscape` test set:
```
python eval.py --model_path="./model/" --use_gpu=True
```
The model file must be specified via the `--model_path` option.
The evaluation metric reported by the test script is [mean IoU]().
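Mean IoU averages the per-class intersection-over-union over all classes. Below is a minimal numpy sketch of the metric for reference; it is an illustration, not the repository's `eval.py`:
```python
import numpy as np

def mean_iou(pred, label, num_classes, ignore_label=255):
    # per-class IoU, averaged over the classes that appear in pred or label
    mask = label != ignore_label
    pred, label = pred[mask], label[mask]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:
            ious.append(float(inter) / union)
    return float(np.mean(ious))
```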
### Inference
Run the following command to run inference on the specified data:
```
python infer.py \
--model_path="./model" \
--images_path="./data/cityscape/" \
--images_list="./data/cityscape/infer.list"
```
The `--images_list` option specifies a list file in which each line is the path of an image to predict.
By default, prediction results are saved to the `output` folder under the current path.
## Experimental Results
Figure 2 shows the loss curve of training on the `CityScape` training set:
<p align="center">
<img src="images/train_loss.png" width="620" hspace='10'/> <br/>
<strong>Figure 2</strong>
</p>
Training on the training set and validating on the validation set gives mean_IoU = 67.0% (67.7% in the paper).
Figure 3 shows sample results produced by the `infer.py` script: the first row is the original input image, the second row the human annotation, and the third row the output of our model.
<p align="center">
<img src="images/result.png" width="620" hspace='10'/> <br/>
<strong>Figure 3</strong>
</p>
## Additional Information
|Dataset | Pretrained Model |
|---|---|
|CityScape | [Model]()[md: ] |
## References
- [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545)
"""Reader for Cityscape dataset.
"""
import os
import cv2
import numpy as np
import paddle.v2 as paddle
DATA_PATH = "./data/cityscape"
TRAIN_LIST = DATA_PATH + "/train.list"
TEST_LIST = DATA_PATH + "/val.list"
IGNORE_LABEL = 255
NUM_CLASSES = 19
TRAIN_DATA_SHAPE = (3, 720, 720)
TEST_DATA_SHAPE = (3, 1024, 2048)
IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)
def train_data_shape():
return TRAIN_DATA_SHAPE
def test_data_shape():
return TEST_DATA_SHAPE
def num_classes():
return NUM_CLASSES
class DataGenerater:
def __init__(self, data_list, mode="train", flip=True, scaling=True):
self.flip = flip
self.scaling = scaling
self.image_label = []
with open(data_list, 'r') as f:
for line in f:
image_file, label_file = line.strip().split(' ')
self.image_label.append((image_file, label_file))
def create_train_reader(self, batch_size):
"""
Create a reader for train dataset.
"""
def reader():
np.random.shuffle(self.image_label)
images = []
labels_sub1 = []
labels_sub2 = []
labels_sub4 = []
count = 0
for image, label in self.image_label:
image, label_sub1, label_sub2, label_sub4 = self.process_train_data(
image, label)
count += 1
images.append(image)
labels_sub1.append(label_sub1)
labels_sub2.append(label_sub2)
labels_sub4.append(label_sub4)
if count == batch_size:
yield self.mask(
np.array(images),
np.array(labels_sub1),
np.array(labels_sub2), np.array(labels_sub4))
images = []
labels_sub1 = []
labels_sub2 = []
labels_sub4 = []
count = 0
if images:
yield self.mask(
np.array(images),
np.array(labels_sub1),
np.array(labels_sub2), np.array(labels_sub4))
return reader
def create_test_reader(self):
"""
Create a reader for test dataset.
"""
def reader():
for image, label in self.image_label:
image, label = self.load(image, label)
image = paddle.image.to_chw(image)[np.newaxis, :]
label = label[np.newaxis, :, :, np.newaxis].astype("float32")
label_mask = np.where((label != IGNORE_LABEL).flatten())[
0].astype("int32")
yield image, label, label_mask
return reader
def process_train_data(self, image, label):
"""
Process training data.
"""
image, label = self.load(image, label)
if self.flip:
image, label = self.random_flip(image, label)
if self.scaling:
image, label = self.random_scaling(image, label)
image, label = self.resize(image, label, out_size=TRAIN_DATA_SHAPE[1:])
label = label.astype("float32")
label_sub1 = paddle.image.to_chw(self.scale_label(label, factor=4))
label_sub2 = paddle.image.to_chw(self.scale_label(label, factor=8))
label_sub4 = paddle.image.to_chw(self.scale_label(label, factor=16))
image = paddle.image.to_chw(image)
return image, label_sub1, label_sub2, label_sub4
def load(self, image, label):
"""
Load image from file.
"""
image = paddle.image.load_image(
DATA_PATH + "/" + image, is_color=True).astype("float32")
image -= IMG_MEAN
label = paddle.image.load_image(
DATA_PATH + "/" + label, is_color=False).astype("float32")
return image, label
def random_flip(self, image, label):
"""
Flip image and label randomly.
"""
r = np.random.rand(1)
if r > 0.5:
image = paddle.image.left_right_flip(image, is_color=True)
label = paddle.image.left_right_flip(label, is_color=False)
return image, label
def random_scaling(self, image, label):
"""
Scale image and label randomly.
"""
scale = np.random.uniform(0.5, 2.0, 1)[0]
h_new = int(image.shape[0] * scale)
w_new = int(image.shape[1] * scale)
image = cv2.resize(image, (w_new, h_new))
label = cv2.resize(
label, (w_new, h_new), interpolation=cv2.INTER_NEAREST)
return image, label
def padding_as(self, image, h, w, is_color):
"""
Padding image.
"""
pad_h = max(image.shape[0], h) - image.shape[0]
pad_w = max(image.shape[1], w) - image.shape[1]
if is_color:
return np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), 'constant')
else:
return np.pad(image, ((0, pad_h), (0, pad_w)), 'constant')
def resize(self, image, label, out_size):
"""
Resize image and label by padding or cropping.
"""
ignore_label = IGNORE_LABEL
label = label - ignore_label
if len(label.shape) == 2:
label = label[:, :, np.newaxis]
combined = np.concatenate((image, label), axis=2)
combined = self.padding_as(
combined, out_size[0], out_size[1], is_color=True)
combined = paddle.image.random_crop(
combined, out_size[0], is_color=True)
image = combined[:, :, 0:3]
label = combined[:, :, 3:4] + ignore_label
return image, label
def scale_label(self, label, factor):
"""
Scale label according to factor.
"""
        h = label.shape[0] / factor
        w = label.shape[1] / factor
        # cv2.resize expects dsize as (width, height)
        return cv2.resize(
            label, (w, h), interpolation=cv2.INTER_NEAREST)[:, :, np.newaxis]
def mask(self, image, label0, label1, label2):
"""
Get mask for valid pixels.
"""
mask_sub1 = np.where(((label0 < (NUM_CLASSES + 1)) & (
label0 != IGNORE_LABEL)).flatten())[0].astype("int32")
mask_sub2 = np.where(((label1 < (NUM_CLASSES + 1)) & (
label1 != IGNORE_LABEL)).flatten())[0].astype("int32")
mask_sub4 = np.where(((label2 < (NUM_CLASSES + 1)) & (
label2 != IGNORE_LABEL)).flatten())[0].astype("int32")
return image.astype(
"float32"), label0, mask_sub1, label1, mask_sub2, label2, mask_sub4
def train(batch_size=32, flip=True, scaling=True):
"""
Cityscape training set reader.
It returns a reader, in which each result is a batch with batch_size samples.
:param batch_size: The batch size of each result return by the reader.
:type batch_size: int
    :param flip: Whether to flip images randomly.
    :type flip: bool
    :param scaling: Whether to scale images randomly.
    :type scaling: bool
:return: Training reader.
:rtype: callable
"""
reader = DataGenerater(
TRAIN_LIST, flip=flip, scaling=scaling).create_train_reader(batch_size)
return reader
def test():
"""
Cityscape validation set reader.
It returns a reader, in which each result is a sample.
:return: Training reader.
:rtype: callable
"""
reader = DataGenerater(TEST_LIST).create_test_reader()
return reader
def infer(image_list=TEST_LIST):
"""
Infer set reader.
It returns a reader, in which each result is a sample.
    :param image_list: The image list file in which each line is a path of image to be inferred.
    :type image_list: str
:return: Infer reader.
:rtype: callable
"""
    reader = DataGenerater(image_list).create_test_reader()
    return reader
"""Evaluator for ICNet model."""
import paddle.fluid as fluid
import numpy as np
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
from icnet import icnet
import cityscape
import argparse
import functools
import sys
import os
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_path', str, None, "Model path.")
add_arg('use_gpu', bool, True, "Whether use GPU to test.")
# yapf: enable
def cal_mean_iou(wrong, correct):
    sum = wrong + correct
    true_num = (sum != 0).sum()
    for i in range(len(sum)):
        if sum[i] == 0:
            sum[i] = 1
    return (correct.astype("float64") / sum).sum() / true_num
def create_iou(predict, label, mask, num_classes, image_shape):
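    # Upsample the logits to the full image size, take the argmax label via
    # top-1, then keep only the valid pixels selected by `mask` before
    # computing per-class IoU with fluid.layers.mean_iou (which also returns
    # per-class wrong/correct counts for accumulation across batches).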
predict = fluid.layers.resize_bilinear(predict, out_shape=image_shape[1:3])
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
label = fluid.layers.reshape(label, shape=[-1, 1])
_, predict = fluid.layers.topk(predict, k=1)
predict = fluid.layers.cast(predict, dtype="float32")
predict = fluid.layers.gather(predict, mask)
label = fluid.layers.gather(label, mask)
label = fluid.layers.cast(label, dtype="int32")
predict = fluid.layers.cast(predict, dtype="int32")
iou, out_w, out_r = fluid.layers.mean_iou(predict, label, num_classes)
return iou, out_w, out_r
def eval(args):
data_shape = cityscape.test_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int32')
mask = fluid.layers.data(name='mask', shape=[-1], dtype='int32')
_, _, sub124_out = icnet(images, num_classes,
np.array(data_shape[1:]).astype("float32"))
iou, out_w, out_r = create_iou(sub124_out, label, mask, num_classes,
data_shape)
inference_program = fluid.default_main_program().clone(for_test=True)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
assert os.path.exists(args.model_path)
fluid.io.load_params(exe, args.model_path)
print "loaded model from: %s" % args.model_path
sys.stdout.flush()
fetch_vars = [iou, out_w, out_r]
out_wrong = np.zeros([num_classes]).astype("int64")
out_right = np.zeros([num_classes]).astype("int64")
count = 0
test_reader = cityscape.test()
for data in test_reader():
count += 1
result = exe.run(inference_program,
feed=get_feeder_data(
data, place, for_test=True),
fetch_list=fetch_vars)
out_wrong += result[1]
out_right += result[2]
print "count: %s; current iou: %.3f;\r" % (count, result[0]),
sys.stdout.flush()
iou = cal_mean_iou(out_wrong, out_right)
print "\nmean iou: %.3f" % iou
def main():
args = parser.parse_args()
print_arguments(args)
eval(args)
if __name__ == "__main__":
main()
import paddle.fluid as fluid
import numpy as np
import sys
def conv(input,
k_h,
k_w,
c_o,
s_h,
s_w,
relu=False,
padding="VALID",
biased=False,
name=None):
act = None
tmp = input
if relu:
act = "relu"
if padding == "SAME":
padding_h = max(k_h - s_h, 0)
padding_w = max(k_w - s_w, 0)
padding_top = padding_h / 2
padding_left = padding_w / 2
padding_bottom = padding_h - padding_top
padding_right = padding_w - padding_left
padding = [
0, 0, 0, 0, padding_top, padding_bottom, padding_left, padding_right
]
tmp = fluid.layers.pad(tmp, padding)
tmp = fluid.layers.conv2d(
tmp,
num_filters=c_o,
filter_size=[k_h, k_w],
stride=[s_h, s_w],
groups=1,
act=act,
bias_attr=biased,
use_cudnn=False,
name=name)
return tmp
def atrous_conv(input,
k_h,
k_w,
c_o,
dilation,
relu=False,
padding="VALID",
biased=False,
name=None):
act = None
if relu:
act = "relu"
tmp = input
if padding == "SAME":
        # For a dilated conv the effective kernel is k + (k - 1) * (dilation - 1);
        # with stride 1, "SAME" needs (effective kernel - 1) total padding.
        padding_h = max(k_h + (k_h - 1) * (dilation - 1) - 1, 0)
        padding_w = max(k_w + (k_w - 1) * (dilation - 1) - 1, 0)
padding_top = padding_h / 2
padding_left = padding_w / 2
padding_bottom = padding_h - padding_top
padding_right = padding_w - padding_left
padding = [
0, 0, 0, 0, padding_top, padding_bottom, padding_left, padding_right
]
tmp = fluid.layers.pad(tmp, padding)
tmp = fluid.layers.conv2d(
        tmp,
num_filters=c_o,
filter_size=[k_h, k_w],
dilation=dilation,
groups=1,
act=act,
bias_attr=biased,
use_cudnn=False,
name=name)
return tmp
def zero_padding(input, padding):
return fluid.layers.pad(input,
[0, 0, 0, 0, padding, padding, padding, padding])
def bn(input, relu=False, name=None, is_test=False):
    act = None
    if relu:
        act = 'relu'
    if name is None:
        name = input.name.split(".")[0] + "_bn"
    tmp = fluid.layers.batch_norm(
        input, act=act, momentum=0.95, epsilon=1e-5, name=name, is_test=is_test)
    return tmp
def avg_pool(input, k_h, k_w, s_h, s_w, name=None, padding=0):
temp = fluid.layers.pool2d(
input,
pool_size=[k_h, k_w],
pool_type="avg",
pool_stride=[s_h, s_w],
pool_padding=padding,
name=name)
return temp
def max_pool(input, k_h, k_w, s_h, s_w, name=None, padding=0):
temp = fluid.layers.pool2d(
input,
pool_size=[k_h, k_w],
pool_type="max",
pool_stride=[s_h, s_w],
pool_padding=padding,
name=name)
return temp
def interp(input, out_shape):
out_shape = list(out_shape.astype("int32"))
return fluid.layers.resize_bilinear(input, out_shape=out_shape)
def dilation_convs(input):
tmp = res_block(input, filter_num=256, padding=1, name="conv3_2")
tmp = res_block(tmp, filter_num=256, padding=1, name="conv3_3")
tmp = res_block(tmp, filter_num=256, padding=1, name="conv3_4")
tmp = proj_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_1")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_2")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_3")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_4")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_5")
tmp = res_block(tmp, filter_num=512, padding=2, dilation=2, name="conv4_6")
tmp = proj_block(
tmp, filter_num=1024, padding=4, dilation=4, name="conv5_1")
tmp = res_block(tmp, filter_num=1024, padding=4, dilation=4, name="conv5_2")
tmp = res_block(tmp, filter_num=1024, padding=4, dilation=4, name="conv5_3")
return tmp
def pyramis_pooling(input, input_shape):
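    # PSPNet-style pyramid pooling: average-pool the feature map over grids
    # of several sizes, bilinearly upsample each pooled map back to the
    # feature resolution, and sum them with the input.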
shape = np.ceil(input_shape / 32).astype("int32")
h, w = shape
pool1 = avg_pool(input, h, w, h, w)
pool1_interp = interp(pool1, shape)
pool2 = avg_pool(input, h / 2, w / 2, h / 2, w / 2)
pool2_interp = interp(pool2, shape)
pool3 = avg_pool(input, h / 3, w / 3, h / 3, w / 3)
pool3_interp = interp(pool3, shape)
pool4 = avg_pool(input, h / 4, w / 4, h / 4, w / 4)
pool4_interp = interp(pool4, shape)
conv5_3_sum = input + pool4_interp + pool3_interp + pool2_interp + pool1_interp
return conv5_3_sum
def shared_convs(image):
tmp = conv(image, 3, 3, 32, 2, 2, padding='SAME', name="conv1_1_3_3_s2")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 32, 1, 1, padding='SAME', name="conv1_2_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 64, 1, 1, padding='SAME', name="conv1_3_3_3")
tmp = bn(tmp, relu=True)
tmp = max_pool(tmp, 3, 3, 2, 2, padding=[1, 1])
tmp = proj_block(tmp, filter_num=128, padding=0, name="conv2_1")
tmp = res_block(tmp, filter_num=128, padding=1, name="conv2_2")
tmp = res_block(tmp, filter_num=128, padding=1, name="conv2_3")
tmp = proj_block(tmp, filter_num=256, padding=1, stride=2, name="conv3_1")
return tmp
def res_block(input, filter_num, padding=0, dilation=None, name=None):
tmp = conv(input, 1, 1, filter_num / 4, 1, 1, name=name + "_1_1_reduce")
tmp = bn(tmp, relu=True)
tmp = zero_padding(tmp, padding=padding)
if dilation is None:
tmp = conv(tmp, 3, 3, filter_num / 4, 1, 1, name=name + "_3_3")
else:
tmp = atrous_conv(
tmp, 3, 3, filter_num / 4, dilation, name=name + "_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, filter_num, 1, 1, name=name + "_1_1_increase")
tmp = bn(tmp, relu=False)
tmp = input + tmp
tmp = fluid.layers.relu(tmp, name=name + "_relu")
return tmp
def proj_block(input, filter_num, padding=0, dilation=None, stride=1,
name=None):
proj = conv(
input, 1, 1, filter_num, stride, stride, name=name + "_1_1_proj")
proj_bn = bn(proj, relu=False)
tmp = conv(
input, 1, 1, filter_num / 4, stride, stride, name=name + "_1_1_reduce")
tmp = bn(tmp, relu=True)
tmp = zero_padding(tmp, padding=padding)
if padding == 0:
padding = 'SAME'
else:
padding = 'VALID'
if dilation is None:
tmp = conv(
tmp,
3,
3,
filter_num / 4,
1,
1,
padding=padding,
name=name + "_3_3")
else:
tmp = atrous_conv(
tmp,
3,
3,
filter_num / 4,
dilation,
padding=padding,
name=name + "_3_3")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, filter_num, 1, 1, name=name + "_1_1_increase")
tmp = bn(tmp, relu=False)
tmp = proj_bn + tmp
tmp = fluid.layers.relu(tmp, name=name + "_relu")
return tmp
def sub_net_4(input, input_shape):
tmp = interp(input, out_shape=np.ceil(input_shape / 32))
tmp = dilation_convs(tmp)
tmp = pyramis_pooling(tmp, input_shape)
tmp = conv(tmp, 1, 1, 256, 1, 1, name="conv5_4_k1")
tmp = bn(tmp, relu=True)
tmp = interp(tmp, input_shape / 16)
return tmp
def sub_net_2(input):
tmp = conv(input, 1, 1, 128, 1, 1, name="conv3_1_sub2_proj")
tmp = bn(tmp, relu=False)
return tmp
def sub_net_1(input):
tmp = conv(input, 3, 3, 32, 2, 2, padding='SAME', name="conv1_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 32, 2, 2, padding='SAME', name="conv2_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 3, 3, 64, 2, 2, padding='SAME', name="conv3_sub1")
tmp = bn(tmp, relu=True)
tmp = conv(tmp, 1, 1, 128, 1, 1, name="conv3_sub1_proj")
tmp = bn(tmp, relu=False)
return tmp
def CCF24(sub2_out, sub4_out, input_shape):
tmp = zero_padding(sub4_out, padding=2)
tmp = atrous_conv(tmp, 3, 3, 128, 2, name="conv_sub4")
tmp = bn(tmp, relu=False)
tmp = tmp + sub2_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape / 8)
return tmp
def CCF124(sub1_out, sub24_out, input_shape):
tmp = zero_padding(sub24_out, padding=2)
tmp = atrous_conv(tmp, 3, 3, 128, 2, name="conv_sub2")
tmp = bn(tmp, relu=False)
tmp = tmp + sub1_out
tmp = fluid.layers.relu(tmp)
tmp = interp(tmp, input_shape / 4)
return tmp
def icnet(data, num_classes, input_shape):
image_sub1 = data
image_sub2 = interp(data, out_shape=input_shape * 0.5)
s_convs = shared_convs(image_sub2)
sub4_out = sub_net_4(s_convs, input_shape)
sub2_out = sub_net_2(s_convs)
sub1_out = sub_net_1(image_sub1)
sub24_out = CCF24(sub2_out, sub4_out, input_shape)
sub124_out = CCF124(sub1_out, sub24_out, input_shape)
conv6_cls = conv(
sub124_out, 1, 1, num_classes, 1, 1, biased=True, name="conv6_cls")
sub4_out = conv(
sub4_out, 1, 1, num_classes, 1, 1, biased=True, name="sub4_out")
sub24_out = conv(
sub24_out, 1, 1, num_classes, 1, 1, biased=True, name="sub24_out")
return sub4_out, sub24_out, conv6_cls
"""Infer for ICNet model."""
import cityscape
import argparse
import functools
import sys
import os
import cv2
import paddle.fluid as fluid
import paddle.v2 as paddle
from icnet import icnet
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
import numpy as np
IMG_MEAN = np.array((103.939, 116.779, 123.68), dtype=np.float32)
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('model_path', str, None, "Model path.")
add_arg('images_list', str, None, "List file with images to be inferred.")
add_arg('images_path', str, None, "The images path.")
add_arg('out_path', str, "./output", "Output path.")
add_arg('use_gpu', bool, True, "Whether use GPU to test.")
# yapf: enable
data_shape = [3, 1024, 2048]
num_classes = 19
label_colours = [
    [128, 64, 128], [244, 35, 231], [69, 69, 69],       # 0 = road, 1 = sidewalk, 2 = building
    [102, 102, 156], [190, 153, 153], [153, 153, 153],  # 3 = wall, 4 = fence, 5 = pole
    [250, 170, 29], [219, 219, 0], [106, 142, 35],      # 6 = traffic light, 7 = traffic sign, 8 = vegetation
    [152, 250, 152], [69, 129, 180], [219, 19, 60],     # 9 = terrain, 10 = sky, 11 = person
    [255, 0, 0], [0, 0, 142], [0, 0, 69],               # 12 = rider, 13 = car, 14 = truck
    [0, 60, 100], [0, 79, 100], [0, 0, 230],            # 15 = bus, 16 = train, 17 = motorcycle
    [119, 10, 32],                                      # 18 = bicycle
]
def color(input):
"""
Convert infered result to color image.
"""
result = []
for i in input.flatten():
result.append(
[label_colours[i][2], label_colours[i][1], label_colours[i][0]])
result = np.array(result).reshape([input.shape[0], input.shape[1], 3])
return result
def infer(args):
data_shape = cityscape.test_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
_, _, sub124_out = icnet(images, num_classes,
np.array(data_shape[1:]).astype("float32"))
predict = fluid.layers.resize_bilinear(
sub124_out, out_shape=data_shape[1:3])
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
_, predict = fluid.layers.topk(predict, k=1)
predict = fluid.layers.reshape(
predict,
shape=[data_shape[1], data_shape[2], -1]) # batch_size should be 1
inference_program = fluid.default_main_program().clone(for_test=True)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
assert os.path.exists(args.model_path)
fluid.io.load_params(exe, args.model_path)
print "loaded model from: %s" % args.model_path
sys.stdout.flush()
if not os.path.isdir(args.out_path):
os.makedirs(args.out_path)
for line in open(args.images_list):
image_file = args.images_path + "/" + line.strip()
filename = os.path.basename(image_file)
image = paddle.image.load_image(
image_file, is_color=True).astype("float32")
image -= IMG_MEAN
img = paddle.image.to_chw(image)[np.newaxis, :]
image_t = fluid.core.LoDTensor()
image_t.set(img, place)
result = exe.run(inference_program,
feed={"image": image_t},
fetch_list=[predict])
cv2.imwrite(args.out_path + "/" + filename + "_result.png",
color(result[0]))
def main():
args = parser.parse_args()
print_arguments(args)
infer(args)
if __name__ == "__main__":
main()
"""Trainer for ICNet model."""
from icnet import icnet
import cityscape
import argparse
import functools
import sys
import time
import paddle.fluid as fluid
import numpy as np
from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 16, "Minibatch size.")
add_arg('checkpoint_path', str, None, "Checkpoint save path.")
add_arg('init_model', str, None, "Pretrain model path.")
add_arg('use_gpu', bool, True, "Whether use GPU to train.")
add_arg('random_mirror', bool, True, "Whether prepare by random mirror.")
add_arg('random_scaling', bool, True, "Whether prepare by random scaling.")
# yapf: enable
LAMBDA1 = 0.16
LAMBDA2 = 0.4
LAMBDA3 = 1.0
LEARNING_RATE = 0.003
POWER = 0.9
LOG_PERIOD = 1
CHECKPOINT_PERIOD = 1000
TOTAL_STEP = 60000
no_grad_set = []
def create_loss(predict, label, mask, num_classes):
predict = fluid.layers.transpose(predict, perm=[0, 2, 3, 1])
predict = fluid.layers.reshape(predict, shape=[-1, num_classes])
label = fluid.layers.reshape(label, shape=[-1, 1])
predict = fluid.layers.gather(predict, mask)
label = fluid.layers.gather(label, mask)
label = fluid.layers.cast(label, dtype="int64")
loss = fluid.layers.softmax_with_cross_entropy(predict, label)
no_grad_set.append(label.name)
return fluid.layers.reduce_mean(loss)
def poly_decay():
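    # Polynomial ("poly") schedule:
    #   lr = LEARNING_RATE * (1 - step / TOTAL_STEP) ** POWER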
global_step = _decay_step_counter()
with init_on_cpu():
decayed_lr = LEARNING_RATE * (fluid.layers.pow(
(1 - global_step / TOTAL_STEP), POWER))
return decayed_lr
def train(args):
data_shape = cityscape.train_data_shape()
num_classes = cityscape.num_classes()
# define network
images = fluid.layers.data(name='image', shape=data_shape, dtype='float32')
label_sub1 = fluid.layers.data(name='label_sub1', shape=[1], dtype='int32')
label_sub2 = fluid.layers.data(name='label_sub2', shape=[1], dtype='int32')
label_sub4 = fluid.layers.data(name='label_sub4', shape=[1], dtype='int32')
mask_sub1 = fluid.layers.data(name='mask_sub1', shape=[-1], dtype='int32')
mask_sub2 = fluid.layers.data(name='mask_sub2', shape=[-1], dtype='int32')
mask_sub4 = fluid.layers.data(name='mask_sub4', shape=[-1], dtype='int32')
sub4_out, sub24_out, sub124_out = icnet(
images, num_classes, np.array(data_shape[1:]).astype("float32"))
loss_sub4 = create_loss(sub4_out, label_sub4, mask_sub4, num_classes)
loss_sub24 = create_loss(sub24_out, label_sub2, mask_sub2, num_classes)
loss_sub124 = create_loss(sub124_out, label_sub1, mask_sub1, num_classes)
reduced_loss = LAMBDA1 * loss_sub4 + LAMBDA2 * loss_sub24 + LAMBDA3 * loss_sub124
regularizer = fluid.regularizer.L2Decay(0.0001)
optimizer = fluid.optimizer.Momentum(
learning_rate=poly_decay(), momentum=0.9, regularization=regularizer)
_, params_grads = optimizer.minimize(reduced_loss, no_grad_set=no_grad_set)
# prepare environment
place = fluid.CPUPlace()
if args.use_gpu:
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if args.init_model is not None:
print "load model from: %s" % args.init_model
sys.stdout.flush()
fluid.io.load_params(exe, args.init_model)
iter_id = 0
t_loss = 0.
sub4_loss = 0.
sub24_loss = 0.
sub124_loss = 0.
train_reader = cityscape.train(
args.batch_size, flip=args.random_mirror, scaling=args.random_scaling)
while True:
# train a pass
for data in train_reader():
if iter_id > TOTAL_STEP:
return
iter_id += 1
results = exe.run(
feed=get_feeder_data(data, place),
fetch_list=[reduced_loss, loss_sub4, loss_sub24, loss_sub124])
t_loss += results[0]
sub4_loss += results[1]
sub24_loss += results[2]
sub124_loss += results[3]
# training log
if iter_id % LOG_PERIOD == 0:
print "Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f" % (
iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD,
sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD)
t_loss = 0.
sub4_loss = 0.
sub24_loss = 0.
sub124_loss = 0.
sys.stdout.flush()
if iter_id % CHECKPOINT_PERIOD == 0:
dir_name = args.checkpoint_path + "/" + str(iter_id)
fluid.io.save_persistables(exe, dirname=dir_name)
print "Saved checkpoint: %s" % (dir_name)
def main():
args = parser.parse_args()
print_arguments(args)
train(args)
if __name__ == "__main__":
main()
"""Contains common utility functions."""
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import numpy as np
from paddle.fluid import core
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
    for arg, value in sorted(vars(args).items()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
def to_lodtensor(data, place):
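    # Pack a list of variable-length sequences into one LoDTensor: flatten
    # the data and record the cumulative sequence offsets
    # [0, len_1, len_1 + len_2, ...] as the level-of-detail (LoD) info.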
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int32")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = core.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def get_feeder_data(data, place, for_test=False):
feed_dict = {}
image_t = core.LoDTensor()
image_t.set(data[0], place)
feed_dict["image"] = image_t
if not for_test:
labels_sub1_t = core.LoDTensor()
labels_sub2_t = core.LoDTensor()
labels_sub4_t = core.LoDTensor()
mask_sub1_t = core.LoDTensor()
mask_sub2_t = core.LoDTensor()
mask_sub4_t = core.LoDTensor()
labels_sub1_t.set(data[1], place)
labels_sub2_t.set(data[3], place)
mask_sub1_t.set(data[2], place)
mask_sub2_t.set(data[4], place)
labels_sub4_t.set(data[5], place)
mask_sub4_t.set(data[6], place)
feed_dict["label_sub1"] = labels_sub1_t
feed_dict["label_sub2"] = labels_sub2_t
feed_dict["mask_sub1"] = mask_sub1_t
feed_dict["mask_sub2"] = mask_sub2_t
feed_dict["label_sub4"] = labels_sub4_t
feed_dict["mask_sub4"] = mask_sub4_t
else:
label_t = core.LoDTensor()
mask_t = core.LoDTensor()
label_t.set(data[1], place)
mask_t.set(data[2], place)
feed_dict["label"] = label_t
feed_dict["mask"] = mask_t
return feed_dict
# Image Classification and Model Zoo
Image classification, an important field of computer vision, aims to classify an image into one of a set of pre-defined labels. Recently, many researchers have developed different kinds of neural networks that greatly improve classification performance. This page introduces how to do image classification with PaddlePaddle Fluid, including [data preparation](#data-preparation), [training](#training-a-model), [finetuning](#finetuning), [evaluation](#evaluation) and [inference](#inference).
---
## Table of Contents
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Training a model with flexible parameters](#training-a-model)
- [Finetuning](#finetuning)
- [Evaluation](#evaluation)
- [Inference](#inference)
- [Supported models and performances](#supported-models)
## Installation
Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle on your device is lower than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update.
## Data preparation
An example for ImageNet classification is as follows. First of all, preparation of the ImageNet data can be done as:
```
cd data/ILSVRC2012/
sh download_imagenet2012.sh
```
In the shell script ```download_imagenet2012.sh```, there are three steps to prepare the data:
**step-1:** Register at ```image-net.org``` first in order to get a pair of ```Username``` and ```AccessKey```, which are used to download the ImageNet data.
**step-2:** Download the ImageNet-2012 dataset from the website. The training and validation data will be downloaded into the "train" and "val" folders respectively. Please note that the dataset is more than 40 GB, so downloading will take quite a while. Users who have already downloaded the ImageNet data can organize it into ```data/ILSVRC2012``` directly.
**step-3:** Download the training and validation label files. There are two label files, containing the train and validation image labels respectively:
* *train_list.txt*: label file of the imagenet-2012 training set, with each line separated by ```SPACE```, like:
```
train/n02483708/n02483708_2436.jpeg 369
train/n03998194/n03998194_7015.jpeg 741
train/n04523525/n04523525_38118.jpeg 884
train/n04596742/n04596742_3032.jpeg 909
train/n03208938/n03208938_7065.jpeg 535
...
```
* *val_list.txt*: label file of the imagenet-2012 validation set, with each line separated by ```SPACE```, like:
```
val/ILSVRC2012_val_00000001.jpeg 65
val/ILSVRC2012_val_00000002.jpeg 970
val/ILSVRC2012_val_00000003.jpeg 230
val/ILSVRC2012_val_00000004.jpeg 809
val/ILSVRC2012_val_00000005.jpeg 516
...
```
## Training a model with flexible parameters
After data preparation, one can start the training step by:
```
python train.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--total_images=1281167 \
       --class_dim=1000 \
--image_shape=3,224,224 \
--model_save_dir=output/ \
--with_mem_opt=False \
--lr_strategy=piecewise_decay \
--lr=0.1
```
**parameter introduction:**
* **model**: name model to use. Default: "SE_ResNeXt50_32x4d".
* **num_epochs**: the number of epochs. Default: 120.
* **batch_size**: the size of each mini-batch. Default: 256.
* **use_gpu**: whether to use GPU or not. Default: True.
* **total_images**: total number of images in the training set. Default: 1281167.
* **class_dim**: the class number of the classification task. Default: 1000.
* **image_shape**: input size of the network. Default: "3,224,224".
* **model_save_dir**: the directory to save trained model. Default: "output".
* **with_mem_opt**: whether to use memory optimization or not. Default: False.
* **lr_strategy**: learning rate changing strategy. Default: "piecewise_decay".
* **lr**: initialized learning rate. Default: 0.1.
* **pretrained_model**: model path for pretraining. Default: None.
* **checkpoint**: the checkpoint path to resume. Default: None.
**data reader introduction:** The data reader is defined in ```reader.py```. In the [training stage](#training-a-model), random crop and flipping are used, while center crop is used in the [evaluation](#evaluation) and [inference](#inference) stages. Supported data augmentation includes (an illustrative sketch follows this list):
* rotation
* color jitter
* random crop
* center crop
* resize
* flipping
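The sketch below illustrates only the random-crop-plus-flip combination; it is a simplified stand-in, and the names here are assumptions rather than the real API of ```reader.py```.
```python
import random
from PIL import Image

def train_transform(img, crop_size=224):
    """Illustrative random crop + horizontal flip for one PIL image.

    Assumes the image is at least crop_size pixels in both dimensions.
    """
    w, h = img.size
    x = random.randint(0, w - crop_size)
    y = random.randint(0, h - crop_size)
    img = img.crop((x, y, x + crop_size, y + crop_size))
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img
```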
**training curve:** The training curve can be drawn from the training log. For example, the log from training AlexNet looks like:
```
End pass 1, train_loss 6.23153877258, train_acc1 0.0150696625933, train_acc5 0.0552518665791, test_loss 5.41981744766, test_acc1 0.0519132651389, test_acc5 0.156150355935
End pass 2, train_loss 5.15442800522, train_acc1 0.0784279331565, train_acc5 0.211050540209, test_loss 4.45795249939, test_acc1 0.140469551086, test_acc5 0.333163291216
End pass 3, train_loss 4.51505613327, train_acc1 0.145300447941, train_acc5 0.331567406654, test_loss 3.86548018456, test_acc1 0.219443559647, test_acc5 0.446448504925
End pass 4, train_loss 4.12735557556, train_acc1 0.19437250495, train_acc5 0.405713528395, test_loss 3.56990146637, test_acc1 0.264536827803, test_acc5 0.507190704346
End pass 5, train_loss 3.87505435944, train_acc1 0.229518383741, train_acc5 0.453582793474, test_loss 3.35345435143, test_acc1 0.297349333763, test_acc5 0.54753267765
End pass 6, train_loss 3.6929500103, train_acc1 0.255628824234, train_acc5 0.487188398838, test_loss 3.17112898827, test_acc1 0.326953113079, test_acc5 0.581780135632
End pass 7, train_loss 3.55882954597, train_acc1 0.275381118059, train_acc5 0.511990904808, test_loss 3.03736782074, test_acc1 0.349035382271, test_acc5 0.606293857098
End pass 8, train_loss 3.45595097542, train_acc1 0.291462600231, train_acc5 0.530815005302, test_loss 2.96034455299, test_acc1 0.362228929996, test_acc5 0.617390751839
End pass 9, train_loss 3.3745200634, train_acc1 0.303871691227, train_acc5 0.545210540295, test_loss 2.93932366371, test_acc1 0.37129303813, test_acc5 0.623573005199
...
```
The error rate curves of AlexNet, ResNet50 and SE-ResNeXt-50 are shown in the figure below.
<p align="center">
<img src="images/curve.jpg" height=480 width=640 hspace='10'/> <br />
Training and validation Curves
</p>
## Finetuning
Finetuning is to finetune the weights of a pretrained model on a specific task. After setting ```path_to_pretrain_model```, one can finetune a model as:
```
python train.py \
--model=SE_ResNeXt50_32x4d \
--pretrained_model=${path_to_pretrain_model} \
--batch_size=32 \
--total_images=1281167 \
--class_dim=1000 \
--image_shape=3,224,224 \
--model_save_dir=output/ \
--with_mem_opt=True \
--lr_strategy=piecewise_decay \
--lr=0.1
```
## Evaluation
Evaluation measures the performance of a trained model. One can download [pretrained models](#supported-models) and set their path in ```path_to_pretrain_model```. Then top1/top5 accuracy can be obtained by running the following command:
```
python eval.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
--pretrained_model=${path_to_pretrain_model}
```
According to the configuration of evaluation, the output log is like:
```
Testbatch 0,loss 2.1786134243, acc1 0.625,acc5 0.8125,time 0.48 sec
Testbatch 10,loss 0.898496925831, acc1 0.75,acc5 0.9375,time 0.51 sec
Testbatch 20,loss 1.32524681091, acc1 0.6875,acc5 0.9375,time 0.37 sec
Testbatch 30,loss 1.46830511093, acc1 0.5,acc5 0.9375,time 0.51 sec
Testbatch 40,loss 1.12802267075, acc1 0.625,acc5 0.9375,time 0.35 sec
Testbatch 50,loss 0.881597697735, acc1 0.8125,acc5 1.0,time 0.32 sec
Testbatch 60,loss 0.300163716078, acc1 0.875,acc5 1.0,time 0.48 sec
Testbatch 70,loss 0.692037761211, acc1 0.875,acc5 1.0,time 0.35 sec
Testbatch 80,loss 0.0969972759485, acc1 1.0,acc5 1.0,time 0.41 sec
...
```
## Inference
Inference is used to get prediction scores or image features based on trained models.
```
python infer.py \
--model=SE_ResNeXt50_32x4d \
--batch_size=32 \
--class_dim=1000 \
--image_shape=3,224,224 \
--with_mem_opt=True \
--pretrained_model=${path_to_pretrain_model}
```
The output contains prediction results, including the maximum score (before softmax) and the corresponding predicted label.
```
Test-0-score: [13.168352], class [491]
Test-1-score: [7.913302], class [975]
Test-2-score: [16.959702], class [21]
Test-3-score: [14.197695], class [383]
Test-4-score: [12.607652], class [878]
Test-5-score: [17.725458], class [15]
Test-6-score: [12.678599], class [118]
Test-7-score: [12.353498], class [505]
Test-8-score: [20.828007], class [747]
Test-9-score: [15.135801], class [315]
Test-10-score: [14.585114], class [920]
Test-11-score: [13.739927], class [679]
Test-12-score: [15.040644], class [386]
...
```
## Supported models and performances
Models are trained by starting with learning rate ```0.1``` and decaying it by ```0.1``` after each pre-defined set of epochs, unless otherwise noted. Available top-1/top-5 validation accuracy on ImageNet 2012 is listed in the table below. Pretrained models can be downloaded by clicking the related model names.
|model | top-1/top-5 accuracy
|- | -:
|[AlexNet](http://paddle-imagenet-models.bj.bcebos.com/alexnet_model.tar) | 57.21%/79.72%
|VGG11 | -
|VGG13 | -
|VGG16 | -
|VGG19 | -
|GoogleNet | -
|InceptionV4 | -
|MobileNet | -
|[ResNet50](http://paddle-imagenet-models.bj.bcebos.com/resnet_50_model.tar) | 76.63%/93.10%
|ResNet101 | -
|ResNet152 | -
|[SE_ResNeXt50_32x4d](http://paddle-imagenet-models.bj.bcebos.com/se_resnext_50_model.tar) | 78.33%/93.96%
|SE_ResNeXt101_32x4d | -
|SE_ResNeXt152_32x4d | -
|DPN68 | -
|DPN92 | -
|DPN98 | -
|DPN107 | -
|DPN131 | -
@@ -20,8 +20,8 @@ def calc_diff(f1, f2):
    d1 = np.load(f1)
    d2 = np.load(f2)
    #print d1.shape
    #print d2.shape
    #print d1[0, 0, 0:10, 0:10]
    #print d2[0, 0, 0:10, 0:10]
    #d1 = d1[:, :, 1:-2, 1:-2]
...
@@ -78,6 +78,54 @@ def dump_results(results, names, root):
            np.save(filename + '.npy', res)
def normalize_name(name_map):
return {
k.replace('/', '_'): v.replace('/', '_')
for k, v in name_map.items()
}
def rename_layer_name(names, net):
""" because the names of output layers from caffe maybe changed for 'INPLACE' operation,
and paddle's layers maybe fused, so we need to re-mapping their relationship for comparing
"""
#build a mapping from paddle's name to caffe's name
trace = getattr(net, 'name_trace', None)
cf_trace = trace['caffe']
real2cf = normalize_name(cf_trace['real2chg'])
pd_trace = trace['paddle']
pd2real = normalize_name(pd_trace['chg2real'])
pd_deleted = normalize_name(pd_trace['deleted'])
pd2cf_name = {}
for pd_name, real_name in pd2real.items():
if real_name in real2cf:
pd2cf_name[pd_name] = '%s.%s.%s.both_changed' \
% (real2cf[real_name], real_name, pd_name)
else:
pd2cf_name[pd_name] = '%s.%s.pd_changed' % (real_name, pd_name)
for pd_name, trace in pd_deleted.items():
        assert pd_name not in pd2cf_name, "this name[%s] already exists" % (
pd_name)
pd2cf_name[pd_name] = '%s.pd_deleted' % (pd_name)
for real_name, cf_name in real2cf.items():
if cf_name not in pd2cf_name:
pd2cf_name[cf_name] = '%s.cf_deleted' % (cf_name)
if real_name not in pd2cf_name:
pd2cf_name[real_name] = '%s.%s.cf_changed' % (cf_name, real_name)
ret = []
for name in names:
new_name = pd2cf_name[name] if name in pd2cf_name else name
print('remap paddle name[%s] to output name[%s]' % (name, new_name))
ret.append(new_name)
return ret
def load_model(exe, place, net_file, net_name, net_weight, debug):
    """ load model using xxxnet.py and xxxnet.npy
    """
@@ -117,7 +165,8 @@ def load_model(exe, place, net_file, net_name, net_weight, debug):
        'feed_names': feed_names,
        'fetch_vars': fetch_list_var,
        'fetch_names': fetch_list_name,
        'feed_shapes': feed_shapes,
        'net': net
    }
@@ -171,6 +220,7 @@ def infer(model_path, imgfile, net_file=None, net_name=None, debug=True):
    fetch_targets = ret['fetch_vars']
    fetch_list_name = ret['fetch_names']
    feed_shapes = ret['feed_shapes']
    net = ret['net']

    input_name = feed_names[0]
    input_shape = feed_shapes[0]
if debug is True: if debug is True:
dump_path = 'results.paddle' dump_path = 'results.paddle'
dump_results(results, fetch_list_name, dump_path) dump_names = rename_layer_name(fetch_list_name, net)
dump_results(results, dump_names, dump_path)
print('all result of layers dumped to [%s]' % (dump_path)) print('all result of layers dumped to [%s]' % (dump_path))
else: else:
result = results[0] result = results[0]
......
@@ -19,4 +19,6 @@
else
    caffe_file="./results/${model_name}.caffe/${2}.npy"
fi
cmd="python ./compare.py $paddle_file $caffe_file"
echo $cmd
eval $cmd
@@ -3,7 +3,7 @@
#function:
#   a tool used to compare all layers' results
#
#set -x
if [[ $# -ne 1 ]];then
    echo "usage:"
    echo "  bash $0 [model_name]"
@@ -13,11 +13,20 @@ fi
model_name=$1
prototxt="models.caffe/$model_name/${model_name}.prototxt"
cat $prototxt | grep name | perl -ne 'if(/^\s*name\s*:\s+\"([^\"]+)/){ print $1."\n";}' >.layer_names
final_layer=$(cat $prototxt | perl -ne 'if(/^\s*top\s*:\s+\"([^\"]+)/){ print $1."\n";}' | tail -n1)
ret=$(grep "^$final_layer$" .layer_names | wc -l)
if [[ $ret -eq 0 ]];then
    echo $final_layer >>.layer_names
fi

for i in $(cat .layer_names);do
    i=${i//\//_}
    cf_npy="results/${model_name}.caffe/${i}.npy"
    pd_npy=$(find results/${model_name}.paddle -iname "${i}.*npy" | grep deleted -v | head -n1)
    if [[ ! -e $cf_npy ]];then
        echo "caffe's result not exist[$cf_npy]"
...
@@ -29,8 +29,8 @@ fi
mkdir -p $results_root

prototxt="models.caffe/$model_name/${model_name}.prototxt"
caffemodel="models.caffe/${model_name}/${model_name}.caffemodel"

#1, dump layers' results from paddle
paddle_results="$results_root/${model_name}.paddle"
@@ -47,7 +47,11 @@ mv results.paddle $paddle_results
caffe_results="$results_root/${model_name}.caffe"
rm -rf $caffe_results
rm -rf "results.caffe"
PYTHON=`which cfpython`
if [[ -z $PYTHON ]];then
    PYTHON=`which python`
fi
$PYTHON ./infer.py caffe $prototxt $caffemodel $paddle_results/data.npy
if [[ $? -ne 0 ]] || [[ ! -e "results.caffe" ]];then
    echo "not found caffe's results, maybe failed to do inference with caffe"
    exit 1
@@ -55,10 +59,25 @@ fi
mv results.caffe $caffe_results

#3, extract layer names
cat $prototxt | grep name | perl -ne 'if(/^\s*name\s*:\s+\"([^\"]+)/){ print $1."\n";}' >.layer_names
final_layer=$(cat $prototxt | perl -ne 'if(/^\s*top\s*:\s+\"([^\"]+)/){ print $1."\n";}' | tail -n1)
ret=$(grep "^$final_layer$" .layer_names | wc -l)
if [[ $ret -eq 0 ]];then
    echo $final_layer >>.layer_names
fi

#4, compare one by one
for i in $(cat .layer_names | tail -n1);do
    i=${i//\//_}
    echo "process $i"
    pd_npy=$(find $paddle_results/ -iname "${i}.*npy" | grep deleted -v | head -n1)
    if [[ -f $pd_npy ]];then
        $PYTHON compare.py $caffe_results/${i}.npy $pd_npy
    else
        echo "not found npy file[${i}.*npy] for layer[$i]"
        exit 1
    fi
done
@@ -71,7 +71,9 @@ if [[ -z $only_convert ]];then
    if [[ -z $net_name ]];then
        net_name="MyNet"
    fi
    cmd="$PYTHON ./infer.py dump $net_file $weight_file $imgfile $net_name"
    echo $cmd
    eval $cmd
    ret=$?
fi
exit $ret
#!/bin/bash
#
#script to test all models
#
models="alexnet vgg16 googlenet resnet152 resnet101 resnet50"
for i in $models;do
echo "begin to process $i"
bash ./tools/diff.sh $i 2>&1
echo "finished to process $i with ret[$?]"
done
@@ -58,11 +58,13 @@ def argmax_layer(input, name, out_max_val=False, top_k=1, axis=-1):
    if axis < 0:
        axis += len(input.shape)

    if out_max_val is True:
        topk_var, index_var = fluid.layers.topk(input=input, k=top_k)
        index_var = fluid.layers.cast(index_var, dtype=topk_var.dtype)
        output = fluid.layers.concat(
            [index_var, topk_var], axis=axis, name=name)
    else:
        topk_var, index_var = fluid.layers.topk(input=input, k=top_k, name=name)
        output = index_var
    return output
...
@@ -43,7 +43,7 @@ def axpy_layer(inputs, name):
    x = inputs[1]
    y = inputs[2]
    output = fluid.layers.elementwise_mul(x, alpha, axis=0)
    output = fluid.layers.elementwise_add(output, y, name=name)
    return output
...
...@@ -63,9 +63,10 @@ class Node(object):
class Graph(object):
    def __init__(self, nodes=None, name=None, trace=None):
        self.nodes = nodes or []
        self.node_lut = {node.name: node for node in self.nodes}
        self.output_trace = trace if trace is not None else {}
        if name is None or name == '':
            self.name = 'MyNet'
        else:
...@@ -81,6 +82,15 @@ class Graph(object):
        except KeyError:
            raise KaffeError('Layer not found: %s' % name)
    def add_name_trace(self, trace, which='caffe'):
        """Record a name-mapping dict (e.g. for renamed or fused layers) under a key."""
        self.output_trace[which] = trace

    def get_name_trace(self, which=None):
        """Return one recorded name mapping, or all of them when `which` is None."""
        if which is not None:
            return self.output_trace[which]
        else:
            return self.output_trace
    def get_input_nodes(self):
        return [node for node in self.nodes if len(node.parents) == 0]
...@@ -116,7 +126,7 @@ class Graph(object):
                *NodeKind.compute_output_shape(node))

    def replaced(self, new_nodes):
        return Graph(nodes=new_nodes, name=self.name, trace=self.output_trace)

    def transformed(self, transformers):
        graph = self
...@@ -262,6 +272,7 @@ class GraphBuilder(object):
        # The current implementation only supports single-output nodes (note that a node can still
        # have multiple children, since multiple child nodes can refer to the single top's name).
        node_outputs = {}
        output_trace = {}
        for layer in layers:
            node = graph.get_node(layer.name)
            for input_name in layer.bottom:
...@@ -291,7 +302,26 @@ class GraphBuilder(object):
                #
                # For both cases, future references to this top re-route to this node.
                node_outputs[output_name] = node
if output_name in output_trace:
output_trace[output_name].append(node.name)
else:
output_trace[output_name] = [output_name, node.name]
        # build a mapping from real-name to changed-name (for caffe's INPLACE inference)
real2chg = {}
deleted = {}
for k, v in output_trace.items():
real2chg[v[-1]] = k
for n in v:
if n in real2chg:
continue
if n not in deleted:
deleted[n] = '%s.%s' % (k, v[-1])
graph.add_name_trace({
'real2chg': real2chg,
'deleted': deleted
}, 'caffe')
        graph.compute_output_shapes()
        return graph
......
...@@ -216,7 +216,7 @@ class LayerAdapter(object):
        s_w = self.get_kernel_value(
            params.stride_w, params.stride, 1, default=1)
        p_h = self.get_kernel_value(params.pad_h, params.pad, 0, default=0)
        p_w = self.get_kernel_value(params.pad_w, params.pad, 1, default=0)
        return KernelParameters(k_h, k_w, s_h, s_w, p_h, p_w)
......
...@@ -78,7 +78,7 @@ class MyNet(object):
            exe,
            main_program=None,
            model_filename=model_filename,
            params_filename=params_filename)
        return 0
......
...@@ -47,6 +47,8 @@ class Network(object):
        self.trainable = trainable
        # Switch variable for dropout
        self.paddle_env = None
        self.output_names = []
        self.name_trace = None
        self.setup()

    def setup(self):
...@@ -79,6 +81,10 @@ class Network(object):
        data_dict = np.load(data_path).item()
        for op_name in data_dict:
            if op_name == 'caffe2fluid_name_trace':
                self.name_trace = data_dict[op_name]
                continue

            layer = self.layers[op_name]
            for param_name, data in data_dict[op_name].iteritems():
                try:
...@@ -117,6 +123,15 @@ class Network(object):
        ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1
        return '%s_%d' % (prefix, ident)
def get_unique_output_name(self, prefix, layertype):
'''Returns an index-suffixed unique name for the given prefix.
This is used for auto-generating layer names based on the type-prefix.
'''
ident = sum(t.startswith(prefix) for t in self.output_names) + 1
unique_name = '%s.%s.output.%d' % (prefix, layertype, ident)
self.output_names.append(unique_name)
return unique_name
    @layer
    def conv(self,
             input,
...@@ -152,6 +167,7 @@ class Network(object):
            act = None
        output = fluid.layers.conv2d(
            name=self.get_unique_output_name(name, 'conv2d'),
            input=input,
            filter_size=[k_h, k_w],
            num_filters=c_o,
...@@ -170,7 +186,8 @@ class Network(object):
    @layer
    def relu(self, input, name):
        fluid = import_fluid()
        output = fluid.layers.relu(
            name=self.get_unique_output_name(name, 'relu'), x=input)
        return output

    def pool(self, pool_type, input, k_h, k_w, s_h, s_w, ceil_mode, padding,
...@@ -182,6 +199,7 @@ class Network(object):
        fluid = import_fluid()
        output = fluid.layers.pool2d(
            name=name,
            input=input,
            pool_size=k_hw,
            pool_stride=s_hw,
...@@ -200,8 +218,16 @@ class Network(object):
                 ceil_mode,
                 padding=[0, 0],
                 name=None):
        return self.pool(
            'max',
            input,
            k_h,
            k_w,
            s_h,
            s_w,
            ceil_mode,
            padding,
            name=self.get_unique_output_name(name, 'max_pool'))

    @layer
    def avg_pool(self,
...@@ -213,25 +239,41 @@ class Network(object):
                 ceil_mode,
                 padding=[0, 0],
                 name=None):
        return self.pool(
            'avg',
            input,
            k_h,
            k_w,
            s_h,
            s_w,
            ceil_mode,
            padding,
            name=self.get_unique_output_name(name, 'avg_pool'))

    @layer
    def sigmoid(self, input, name):
        fluid = import_fluid()
        return fluid.layers.sigmoid(
            input, name=self.get_unique_output_name(name, 'sigmoid'))
    @layer
    def lrn(self, input, radius, alpha, beta, name, bias=1.0):
        fluid = import_fluid()
        output = fluid.layers.lrn(input=input,
                                  n=radius,
                                  k=bias,
                                  alpha=alpha,
                                  beta=beta,
                                  name=self.get_unique_output_name(name, 'lrn'))
        return output

    @layer
    def concat(self, inputs, axis, name):
        fluid = import_fluid()
        output = fluid.layers.concat(
            input=inputs,
            axis=axis,
            name=self.get_unique_output_name(name, 'concat'))
        return output
    @layer
...@@ -239,7 +281,8 @@ class Network(object):
        fluid = import_fluid()
        output = inputs[0]
        for i in inputs[1:]:
            output = fluid.layers.elementwise_add(
                x=output, y=i, name=self.get_unique_output_name(name, 'add'))
        return output
    @layer
...@@ -251,7 +294,7 @@ class Network(object):
        prefix = name + '_'
        output = fluid.layers.fc(
            name=self.get_unique_output_name(name, 'fc'),
            input=input,
            size=num_out,
            act=act,
...@@ -269,7 +312,8 @@ class Network(object):
                str(shape))
            input = fluid.layers.reshape(input, shape[0:2])
        output = fluid.layers.softmax(
            input, name=self.get_unique_output_name(name, 'softmax'))
        return output
    @layer
...@@ -289,7 +333,7 @@ class Network(object):
        mean_name = prefix + 'mean'
        variance_name = prefix + 'variance'
        output = fluid.layers.batch_norm(
            name=self.get_unique_output_name(name, 'batch_norm'),
            input=input,
            is_test=True,
            param_attr=param_attr,
...@@ -308,7 +352,10 @@ class Network(object):
            output = input
        else:
            output = fluid.layers.dropout(
                input,
                dropout_prob=drop_prob,
                is_test=is_test,
                name=self.get_unique_output_name(name, 'dropout'))
        return output
    @layer
...@@ -328,8 +375,16 @@ class Network(object):
        offset_param = fluid.layers.create_parameter(
            shape=scale_shape, dtype=input.dtype, name=name, attr=offset_attr)
        output = fluid.layers.elementwise_mul(
            input,
            scale_param,
            axis=axis,
            name=self.get_unique_output_name(name, 'scale_mul'))
        output = fluid.layers.elementwise_add(
            output,
            offset_param,
            axis=axis,
            name=self.get_unique_output_name(name, 'scale_add'))
        return output
    def custom_layer_factory(self):
...@@ -342,5 +397,6 @@ class Network(object):
    def custom_layer(self, inputs, kind, name, *args, **kwargs):
        """ make custom layer
        """
        name = self.get_unique_output_name(name, kind)
        layer_factory = self.custom_layer_factory()
        return layer_factory(kind, inputs, name, *args, **kwargs)
...@@ -3,9 +3,9 @@ import numpy as np
from ..errors import KaffeError, print_stderr
from ..graph import GraphBuilder, NodeMapper
from ..layers import NodeKind
from ..transformers import (DataInjector, DataReshaper, NodeRenamer,
                            SubNodeFuser, ReLUFuser, BatchNormScaleBiasFuser,
                            BatchNormPreprocessor, ParameterNamer)
from . import network
...@@ -18,7 +18,7 @@ def get_padding_type(kernel_params, input_shape, output_shape):
    https://github.com/Yangqing/caffe2/blob/master/caffe2/proto/caffe2_legacy.proto
    '''
    k_h, k_w, s_h, s_w, p_h, p_w = kernel_params
    if p_h > 0 or p_w > 0:
        return [p_h, p_w]
    else:
        return None
...@@ -315,6 +315,23 @@ class Transformer(object):
        self.graph = graph.transformed(transformers)

        # record the name mapping created by fused nodes
trace = SubNodeFuser.traced_names()
chg2real = {}
deleted = {}
for k, v in trace.items():
chg2real[k] = v[-1] #mapping from changed-name to real-name
for n in v:
if n in chg2real:
continue
if n not in deleted:
deleted[n] = '%s.%s' % (k, v[-1])
self.graph.add_name_trace({
'chg2real': chg2real,
'deleted': deleted
}, 'paddle')
        # Display the graph
        if self.verbose:
            print_stderr(self.graph)
...@@ -339,6 +356,8 @@ class Transformer(object):
                node.name: node.data
                for node in self.graph.nodes if node.data
            }
            self.params['caffe2fluid_name_trace'] = self.graph.get_name_trace()
        return self.params

    def transform_source(self):
......
...@@ -181,6 +181,20 @@ class SubNodeFuser(object):
    '''
    An abstract helper for merging a single-child with its single-parent.
    '''
    _traced_names = {}

    @classmethod
    def traced_names(cls):
        return cls._traced_names

    @classmethod
    def trace(cls, fname, tname):
        """Record the name mapping: the value of 'fname' will be
        replaced by the value of 'tname'.
        """
        if fname not in cls._traced_names:
            cls._traced_names[fname] = []
        cls._traced_names[fname].append(tname)
    def __call__(self, graph):
        nodes = graph.nodes
...@@ -234,6 +248,7 @@ class ReLUFuser(SubNodeFuser):
                child.kind == NodeKind.ReLU)

    def merge(self, parent, child):
        SubNodeFuser.trace(parent.name, child.name)
        parent.metadata['relu'] = True
        parent.metadata['relu_negative_slope'] = child.parameters.negative_slope
...@@ -255,6 +270,7 @@ class BatchNormScaleBiasFuser(SubNodeFuser):
                child.parameters.bias_term == True)

    def merge(self, parent, child):
        SubNodeFuser.trace(parent.name, child.name)
        parent.scale_bias_node = child
...@@ -318,7 +334,9 @@ class ParameterNamer(object):
                if len(node.data) == 4:
                    names += ('scale', 'offset')
            elif node.kind == NodeKind.Scale:
                names = ('scale', )
                if getattr(node.parameters, 'bias_term', False):
                    names = ('scale', 'offset')
            else:
                warn('Unhandled parameters when naming node [%s]' % (node.kind))
......
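#!/bin/bash
# Download the ImageNet-2012 train/val archives plus the label lists, then
# unpack them into train/ and val/ (prompts for an image-net.org
# username/accesskey if they are not already set in the environment).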
set -e
if [ "x${IMAGENET_USERNAME}" == x -o "x${IMAGENET_ACCESS_KEY}" == x ];then
echo "Please create an account on image-net.org."
echo "It will provide you a pair of username and accesskey to download imagenet data."
read -p "Username: " IMAGENET_USERNAME
read -p "Accesskey: " IMAGENET_ACCESS_KEY
fi
root_url=http://www.image-net.org/challenges/LSVRC/2012/nnoupb
valid_tar=ILSVRC2012_img_val.tar
train_tar=ILSVRC2012_img_train.tar
train_folder=train/
valid_folder=val/
echo "Download imagenet training data..."
mkdir -p ${train_folder}
wget -nd -c ${root_url}/${train_tar}
tar xf ${train_tar} -C ${train_folder}
cd ${train_folder}
for x in `ls *.tar`
do
filename=`basename $x .tar`
mkdir -p $filename
tar -xf $x -C $filename
rm -rf $x
done
cd -
echo "Download imagenet validation data..."
mkdir -p ${valid_folder}
wget -nd -c ${root_url}/${valid_tar}
tar xf ${valid_tar} -C ${valid_folder}
echo "Download imagenet label file: val_list.txt & train_list.txt"
label_file=ImageNet_label.tgz
label_url=http://imagenet-data.bj.bcebos.com/${label_file}
wget -nd -c ${label_url}
tar zxf ${label_file}
cd train
dir=./
for x in `ls *.tar`
do
filename=`basename $x .tar`
mkdir $filename
tar -xvf $x -C ./$filename
done
import os
import numpy as np
import time
import sys
import paddle
import paddle.fluid as fluid
import models
import reader
import argparse
import functools
from models.learning_rate import cosine_decay
from utility import add_arguments, print_arguments
import math

parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size',       int,  256,                  "Minibatch size.")
add_arg('use_gpu',          bool, True,                 "Whether to use GPU or not.")
add_arg('class_dim',        int,  1000,                 "Class number.")
add_arg('image_shape',      str,  "3,224,224",          "Input image size")
add_arg('with_mem_opt',     bool, True,                 "Whether to use memory optimization or not.")
add_arg('pretrained_model', str,  None,                 "Whether to use pretrained model.")
add_arg('model',            str,  "SE_ResNeXt50_32x4d", "Set the network to use.")
# yapf: enable

model_list = [m for m in dir(models) if "__" not in m]


def eval(args):
    # parameters from arguments
    class_dim = args.class_dim
    model_name = args.model
    pretrained_model = args.pretrained_model
    with_memory_optimization = args.with_mem_opt
    image_shape = [int(m) for m in args.image_shape.split(",")]

    assert model_name in model_list, "{} is not in lists: {}".format(
        args.model, model_list)

    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    # model definition
    model = models.__dict__[model_name]()

    if model_name == "GoogleNet":
        out0, out1, out2 = model.net(input=image, class_dim=class_dim)
        cost0 = fluid.layers.cross_entropy(input=out0, label=label)
        cost1 = fluid.layers.cross_entropy(input=out1, label=label)
        cost2 = fluid.layers.cross_entropy(input=out2, label=label)
        avg_cost0 = fluid.layers.mean(x=cost0)
        avg_cost1 = fluid.layers.mean(x=cost1)
        avg_cost2 = fluid.layers.mean(x=cost2)
        avg_cost = avg_cost0 + 0.3 * avg_cost1 + 0.3 * avg_cost2
        acc_top1 = fluid.layers.accuracy(input=out0, label=label, k=1)
        acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
    else:
        out = model.net(input=image, class_dim=class_dim)
        cost = fluid.layers.cross_entropy(input=out, label=label)
        avg_cost = fluid.layers.mean(x=cost)
        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)

    test_program = fluid.default_main_program().clone(for_test=True)

    if with_memory_optimization:
        fluid.memory_optimize(fluid.default_main_program())

    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    if pretrained_model:

        def if_exist(var):
            return os.path.exists(os.path.join(pretrained_model, var.name))

        fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)

    val_reader = paddle.batch(reader.val(), batch_size=args.batch_size)
    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])

    fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]

    test_info = [[], [], []]
    cnt = 0
    for batch_id, data in enumerate(val_reader()):
        t1 = time.time()
        loss, acc1, acc5 = exe.run(test_program,
                                   fetch_list=fetch_list,
                                   feed=feeder.feed(data))
        t2 = time.time()
        period = t2 - t1
        loss = np.mean(loss)
        acc1 = np.mean(acc1)
        acc5 = np.mean(acc5)
        test_info[0].append(loss * len(data))
        test_info[1].append(acc1 * len(data))
        test_info[2].append(acc5 * len(data))
        cnt += len(data)
        if batch_id % 10 == 0:
            print("Testbatch {0}, loss {1}, acc1 {2}, acc5 {3}, time {4}".
                  format(batch_id, loss, acc1, acc5, "%2.2f sec" % period))
            sys.stdout.flush()

    test_loss = np.sum(test_info[0]) / cnt
    test_acc1 = np.sum(test_info[1]) / cnt
    test_acc5 = np.sum(test_info[2]) / cnt

    print("Test_loss {0}, test_acc1 {1}, test_acc5 {2}".format(
        test_loss, test_acc1, test_acc5))
    sys.stdout.flush()


def main():
    args = parser.parse_args()
    print_arguments(args)
    eval(args)


if __name__ == '__main__':
    main()
import os
import paddle.fluid as fluid
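# Standalone Inception-V4 definition in fluid: stem, stacked inception_A/B/C
# blocks with reduction_A/B in between, then average pooling, dropout and a
# softmax classifier.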
def inception_v4(img, class_dim):
    tmp = stem(input=img)
    for i in range(1):
        tmp = inception_A(input=tmp, depth=i)
    tmp = reduction_A(input=tmp)

    for i in range(7):
        tmp = inception_B(input=tmp, depth=i)
    tmp = reduction_B(input=tmp)

    for i in range(3):
        tmp = inception_C(input=tmp, depth=i)

    pool = fluid.layers.pool2d(
        pool_type='avg', input=tmp, pool_size=7, pool_stride=1)
    dropout = fluid.layers.dropout(x=pool, dropout_prob=0.2)
    fc = fluid.layers.fc(input=dropout, size=class_dim)
    out = fluid.layers.softmax(input=fc)
    return out
def conv_bn_layer(name,
input,
num_filters,
filter_size,
padding=0,
stride=1,
groups=1,
act=None):
conv = fluid.layers.conv2d(
name=name,
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(name=name + '_norm', input=conv, act=act)
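# The stem: three plain convolutions, then three two-branch mixing blocks
# (maxpool||conv, two conv towers, conv||maxpool), each concatenated on channels.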
def stem(input):
conv0 = conv_bn_layer(
name='stem_conv_0',
input=input,
num_filters=32,
filter_size=3,
padding=1,
stride=2)
conv1 = conv_bn_layer(
name='stem_conv_1',
input=conv0,
num_filters=32,
filter_size=3,
padding=1)
conv2 = conv_bn_layer(
name='stem_conv_2',
input=conv1,
num_filters=64,
filter_size=3,
padding=1)
def block0(input):
pool0 = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_type='max',
pool_padding=1)
conv0 = conv_bn_layer(
name='stem_block0_conv',
input=input,
num_filters=96,
filter_size=3,
stride=2,
padding=1)
return fluid.layers.concat(input=[pool0, conv0], axis=1)
def block1(input):
l_conv0 = conv_bn_layer(
name='stem_block1_l_conv0',
input=input,
num_filters=64,
filter_size=1,
stride=1,
padding=0)
l_conv1 = conv_bn_layer(
name='stem_block1_l_conv1',
input=l_conv0,
num_filters=96,
filter_size=3,
stride=1,
padding=1)
r_conv0 = conv_bn_layer(
name='stem_block1_r_conv0',
input=input,
num_filters=64,
filter_size=1,
stride=1,
padding=0)
r_conv1 = conv_bn_layer(
name='stem_block1_r_conv1',
input=r_conv0,
num_filters=64,
filter_size=(7, 1),
stride=1,
padding=(3, 0))
r_conv2 = conv_bn_layer(
name='stem_block1_r_conv2',
input=r_conv1,
num_filters=64,
filter_size=(1, 7),
stride=1,
padding=(0, 3))
r_conv3 = conv_bn_layer(
name='stem_block1_r_conv3',
input=r_conv2,
num_filters=96,
filter_size=3,
stride=1,
padding=1)
return fluid.layers.concat(input=[l_conv1, r_conv3], axis=1)
def block2(input):
conv0 = conv_bn_layer(
name='stem_block2_conv',
input=input,
num_filters=192,
filter_size=3,
stride=2,
padding=1)
pool0 = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return fluid.layers.concat(input=[conv0, pool0], axis=1)
conv3 = block0(conv2)
conv4 = block1(conv3)
conv5 = block2(conv4)
return conv5
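# Inception-A block: four parallel branches (avg-pool + 1x1, 1x1, 1x1 -> 3x3,
# 1x1 -> 3x3 -> 3x3) concatenated along the channel axis.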
def inception_A(input, depth):
b0_pool0 = fluid.layers.pool2d(
name='inceptA{0}_branch0_pool0'.format(depth),
input=input,
pool_size=3,
pool_stride=1,
pool_padding=1,
pool_type='avg')
b0_conv0 = conv_bn_layer(
name='inceptA{0}_branch0_conv0'.format(depth),
input=b0_pool0,
num_filters=96,
filter_size=1,
stride=1,
padding=0)
b1_conv0 = conv_bn_layer(
name='inceptA{0}_branch1_conv0'.format(depth),
input=input,
num_filters=96,
filter_size=1,
stride=1,
padding=0)
b2_conv0 = conv_bn_layer(
name='inceptA{0}_branch2_conv0'.format(depth),
input=input,
num_filters=64,
filter_size=1,
stride=1,
padding=0)
b2_conv1 = conv_bn_layer(
name='inceptA{0}_branch2_conv1'.format(depth),
input=b2_conv0,
num_filters=96,
filter_size=3,
stride=1,
padding=1)
b3_conv0 = conv_bn_layer(
name='inceptA{0}_branch3_conv0'.format(depth),
input=input,
num_filters=64,
filter_size=1,
stride=1,
padding=0)
b3_conv1 = conv_bn_layer(
name='inceptA{0}_branch3_conv1'.format(depth),
input=b3_conv0,
num_filters=96,
filter_size=3,
stride=1,
padding=1)
b3_conv2 = conv_bn_layer(
name='inceptA{0}_branch3_conv2'.format(depth),
input=b3_conv1,
num_filters=96,
filter_size=3,
stride=1,
padding=1)
return fluid.layers.concat(
input=[b0_conv0, b1_conv0, b2_conv1, b3_conv2], axis=1)
def reduction_A(input):
b0_pool0 = fluid.layers.pool2d(
name='ReductA_branch0_pool0',
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
b1_conv0 = conv_bn_layer(
name='ReductA_branch1_conv0',
input=input,
num_filters=384,
filter_size=3,
stride=2,
padding=1)
b2_conv0 = conv_bn_layer(
name='ReductA_branch2_conv0',
input=input,
num_filters=192,
filter_size=1,
stride=1,
padding=0)
b2_conv1 = conv_bn_layer(
name='ReductA_branch2_conv1',
input=b2_conv0,
num_filters=224,
filter_size=3,
stride=1,
padding=1)
b2_conv2 = conv_bn_layer(
name='ReductA_branch2_conv2',
input=b2_conv1,
num_filters=256,
filter_size=3,
stride=2,
padding=1)
return fluid.layers.concat(input=[b0_pool0, b1_conv0, b2_conv2], axis=1)
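# Inception-B block: avg-pool + 1x1, a plain 1x1, and two branches of
# factorized 7x7 convolutions (1x7 / 7x1), concatenated along channels.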
def inception_B(input, depth):
b0_pool0 = fluid.layers.pool2d(
name='inceptB{0}_branch0_pool0'.format(depth),
input=input,
pool_size=3,
pool_stride=1,
pool_padding=1,
pool_type='avg')
b0_conv0 = conv_bn_layer(
name='inceptB{0}_branch0_conv0'.format(depth),
input=b0_pool0,
num_filters=128,
filter_size=1,
stride=1,
padding=0)
b1_conv0 = conv_bn_layer(
name='inceptB{0}_branch1_conv0'.format(depth),
input=input,
num_filters=384,
filter_size=1,
stride=1,
padding=0)
b2_conv0 = conv_bn_layer(
name='inceptB{0}_branch2_conv0'.format(depth),
input=input,
num_filters=192,
filter_size=1,
stride=1,
padding=0)
b2_conv1 = conv_bn_layer(
name='inceptB{0}_branch2_conv1'.format(depth),
input=b2_conv0,
num_filters=224,
filter_size=(1, 7),
stride=1,
padding=(0, 3))
b2_conv2 = conv_bn_layer(
name='inceptB{0}_branch2_conv2'.format(depth),
input=b2_conv1,
num_filters=256,
filter_size=(7, 1),
stride=1,
padding=(3, 0))
b3_conv0 = conv_bn_layer(
name='inceptB{0}_branch3_conv0'.format(depth),
input=input,
num_filters=192,
filter_size=1,
stride=1,
padding=0)
b3_conv1 = conv_bn_layer(
name='inceptB{0}_branch3_conv1'.format(depth),
input=b3_conv0,
num_filters=192,
filter_size=(1, 7),
stride=1,
padding=(0, 3))
b3_conv2 = conv_bn_layer(
name='inceptB{0}_branch3_conv2'.format(depth),
input=b3_conv1,
num_filters=224,
filter_size=(7, 1),
stride=1,
padding=(3, 0))
b3_conv3 = conv_bn_layer(
name='inceptB{0}_branch3_conv3'.format(depth),
input=b3_conv2,
num_filters=224,
filter_size=(1, 7),
stride=1,
padding=(0, 3))
b3_conv4 = conv_bn_layer(
name='inceptB{0}_branch3_conv4'.format(depth),
input=b3_conv3,
num_filters=256,
filter_size=(7, 1),
stride=1,
padding=(3, 0))
return fluid.layers.concat(
input=[b0_conv0, b1_conv0, b2_conv2, b3_conv4], axis=1)
def reduction_B(input):
b0_pool0 = fluid.layers.pool2d(
name='ReductB_branch0_pool0',
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
b1_conv0 = conv_bn_layer(
name='ReductB_branch1_conv0',
input=input,
num_filters=192,
filter_size=1,
stride=1,
padding=0)
b1_conv1 = conv_bn_layer(
name='ReductB_branch1_conv1',
input=b1_conv0,
num_filters=192,
filter_size=3,
stride=2,
padding=1)
b2_conv0 = conv_bn_layer(
name='ReductB_branch2_conv0',
input=input,
num_filters=256,
filter_size=1,
stride=1,
padding=0)
b2_conv1 = conv_bn_layer(
name='ReductB_branch2_conv1',
input=b2_conv0,
num_filters=256,
filter_size=(1, 7),
stride=1,
padding=(0, 3))
b2_conv2 = conv_bn_layer(
name='ReductB_branch2_conv2',
input=b2_conv1,
num_filters=320,
filter_size=(7, 1),
stride=1,
padding=(3, 0))
b2_conv3 = conv_bn_layer(
name='ReductB_branch2_conv3',
input=b2_conv2,
num_filters=320,
filter_size=3,
stride=2,
padding=1)
return fluid.layers.concat(input=[b0_pool0, b1_conv1, b2_conv3], axis=1)
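# Inception-C block: avg-pool + 1x1, a plain 1x1, and two branches that fork
# into parallel 1x3 / 3x1 convolutions before the channel-wise concat.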
def inception_C(input, depth):
b0_pool0 = fluid.layers.pool2d(
name='inceptC{0}_branch0_pool0'.format(depth),
input=input,
pool_size=3,
pool_stride=1,
pool_padding=1,
pool_type='avg')
b0_conv0 = conv_bn_layer(
name='inceptC{0}_branch0_conv0'.format(depth),
input=b0_pool0,
num_filters=256,
filter_size=1,
stride=1,
padding=0)
b1_conv0 = conv_bn_layer(
name='inceptC{0}_branch1_conv0'.format(depth),
input=input,
num_filters=256,
filter_size=1,
stride=1,
padding=0)
b2_conv0 = conv_bn_layer(
name='inceptC{0}_branch2_conv0'.format(depth),
input=input,
num_filters=384,
filter_size=1,
stride=1,
padding=0)
b2_conv1 = conv_bn_layer(
name='inceptC{0}_branch2_conv1'.format(depth),
input=b2_conv0,
num_filters=256,
filter_size=(1, 3),
stride=1,
padding=(0, 1))
b2_conv2 = conv_bn_layer(
name='inceptC{0}_branch2_conv2'.format(depth),
input=b2_conv0,
num_filters=256,
filter_size=(3, 1),
stride=1,
padding=(1, 0))
b3_conv0 = conv_bn_layer(
name='inceptC{0}_branch3_conv0'.format(depth),
input=input,
num_filters=384,
filter_size=1,
stride=1,
padding=0)
b3_conv1 = conv_bn_layer(
name='inceptC{0}_branch3_conv1'.format(depth),
input=b3_conv0,
num_filters=448,
filter_size=(1, 3),
stride=1,
padding=(0, 1))
b3_conv2 = conv_bn_layer(
name='inceptC{0}_branch3_conv2'.format(depth),
input=b3_conv1,
num_filters=512,
filter_size=(3, 1),
stride=1,
padding=(1, 0))
b3_conv3 = conv_bn_layer(
name='inceptC{0}_branch3_conv3'.format(depth),
input=b3_conv2,
num_filters=256,
filter_size=(3, 1),
stride=1,
padding=(1, 0))
b3_conv4 = conv_bn_layer(
name='inceptC{0}_branch3_conv4'.format(depth),
input=b3_conv2,
num_filters=256,
filter_size=(1, 3),
stride=1,
padding=(0, 1))
return fluid.layers.concat(
input=[b0_conv0, b1_conv0, b2_conv1, b2_conv2, b3_conv3, b3_conv4],
axis=1)
import os
import numpy as np
import time
import sys
import paddle
import paddle.fluid as fluid
import models
import reader
import argparse
import functools
from models.learning_rate import cosine_decay
from utility import add_arguments, print_arguments
import math

parser = argparse.ArgumentParser(description=__doc__)
# yapf: disable
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg('batch_size',       int,  256,                  "Minibatch size.")
add_arg('use_gpu',          bool, True,                 "Whether to use GPU or not.")
add_arg('class_dim',        int,  1000,                 "Class number.")
add_arg('image_shape',      str,  "3,224,224",          "Input image size")
add_arg('with_mem_opt',     bool, True,                 "Whether to use memory optimization or not.")
add_arg('pretrained_model', str,  None,                 "Whether to use pretrained model.")
add_arg('model',            str,  "SE_ResNeXt50_32x4d", "Set the network to use.")
# yapf: enable

model_list = [m for m in dir(models) if "__" not in m]


def infer(args):
    # parameters from arguments
    class_dim = args.class_dim
    model_name = args.model
    pretrained_model = args.pretrained_model
    with_memory_optimization = args.with_mem_opt
    image_shape = [int(m) for m in args.image_shape.split(",")]

    assert model_name in model_list, "{} is not in lists: {}".format(
        args.model, model_list)

    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')

    # model definition
    model = models.__dict__[model_name]()
    if model_name == "GoogleNet":
        out, _, _ = model.net(input=image, class_dim=class_dim)
    else:
        out = model.net(input=image, class_dim=class_dim)

    test_program = fluid.default_main_program().clone(for_test=True)

    if with_memory_optimization:
        fluid.memory_optimize(fluid.default_main_program())

    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())

    if pretrained_model:

        def if_exist(var):
            return os.path.exists(os.path.join(pretrained_model, var.name))

        fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)

    test_batch_size = 1
    test_reader = paddle.batch(reader.test(), batch_size=test_batch_size)
    feeder = fluid.DataFeeder(place=place, feed_list=[image])

    fetch_list = [out.name]

    TOPK = 1
    for batch_id, data in enumerate(test_reader()):
        result = exe.run(test_program,
                         fetch_list=fetch_list,
                         feed=feeder.feed(data))
        result = result[0][0]
        pred_label = np.argsort(result)[::-1][:TOPK]
        print("Test-{0}-score: {1}, class {2}".format(batch_id, result[
            pred_label], pred_label))
        sys.stdout.flush()


def main():
    args = parser.parse_args()
    print_arguments(args)
    infer(args)


if __name__ == '__main__':
    main()
import os
import paddle.v2 as paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
parameter_attr = ParamAttr(initializer=MSRA())
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
scale):
"""
"""
depthwise_conv = conv_bn_layer(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
pointwise_conv = conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
def mobile_net(img, class_dim, scale=1.0):
# conv1: 112x112
tmp = conv_bn_layer(
img,
filter_size=3,
channels=3,
num_filters=int(32 * scale),
stride=2,
padding=1)
# 56x56
tmp = depthwise_separable(
tmp,
num_filters1=32,
num_filters2=64,
num_groups=32,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=64,
num_filters2=128,
num_groups=64,
stride=2,
scale=scale)
# 28x28
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=128,
num_groups=128,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=256,
num_groups=128,
stride=2,
scale=scale)
# 14x14
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=256,
num_groups=256,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=512,
num_groups=256,
stride=2,
scale=scale)
# 14x14
for i in range(5):
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=512,
num_groups=512,
stride=1,
scale=scale)
# 7x7
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=1024,
num_groups=512,
stride=2,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=1024,
num_filters2=1024,
num_groups=1024,
stride=1,
scale=scale)
tmp = fluid.layers.pool2d(
input=tmp,
pool_size=0,
pool_stride=1,
pool_type='avg',
global_pooling=True)
tmp = fluid.layers.fc(input=tmp,
size=class_dim,
act='softmax',
param_attr=parameter_attr)
return tmp
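# A minimal usage sketch (assumed shapes/names), building the net for ImageNet:
#   img = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
#   out = mobile_net(img, class_dim=1000, scale=1.0)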
from .alexnet import AlexNet
from .mobilenet import MobileNet
from .googlenet import GoogleNet
from .vgg import VGG11, VGG13, VGG16, VGG19
from .resnet import ResNet50, ResNet101, ResNet152
from .inception_v4 import InceptionV4
from .se_resnext import SE_ResNeXt50_32x4d, SE_ResNeXt101_32x4d, SE_ResNeXt152_32x4d
from .dpn import DPN68, DPN92, DPN98, DPN107, DPN131
import paddle
import paddle.fluid as fluid
import math
__all__ = ['AlexNet']
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [40, 70, 100],
"steps": [0.01, 0.001, 0.0001, 0.00001]
}
}
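# AlexNet: five conv stages with max-pooling after stages 1, 2 and 5, then two
# dropout+fc layers and a softmax classifier; weights and biases use uniform
# init scaled by fan-in (1/sqrt(k*k*c)).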
class AlexNet():
def __init__(self):
self.params = train_parameters
def net(self, input, class_dim=1000):
stdv = 1.0 / math.sqrt(input.shape[1] * 11 * 11)
conv1 = fluid.layers.conv2d(
input=input,
num_filters=64,
filter_size=11,
stride=4,
padding=2,
groups=1,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
pool1 = fluid.layers.pool2d(
input=conv1,
pool_size=3,
pool_stride=2,
pool_padding=0,
pool_type='max')
stdv = 1.0 / math.sqrt(pool1.shape[1] * 5 * 5)
conv2 = fluid.layers.conv2d(
input=pool1,
num_filters=192,
filter_size=5,
stride=1,
padding=2,
groups=1,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
pool2 = fluid.layers.pool2d(
input=conv2,
pool_size=3,
pool_stride=2,
pool_padding=0,
pool_type='max')
stdv = 1.0 / math.sqrt(pool2.shape[1] * 3 * 3)
conv3 = fluid.layers.conv2d(
input=pool2,
num_filters=384,
filter_size=3,
stride=1,
padding=1,
groups=1,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(conv3.shape[1] * 3 * 3)
conv4 = fluid.layers.conv2d(
input=conv3,
num_filters=256,
filter_size=3,
stride=1,
padding=1,
groups=1,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(conv4.shape[1] * 3 * 3)
conv5 = fluid.layers.conv2d(
input=conv4,
num_filters=256,
filter_size=3,
stride=1,
padding=1,
groups=1,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
pool5 = fluid.layers.pool2d(
input=conv5,
pool_size=3,
pool_stride=2,
pool_padding=0,
pool_type='max')
drop6 = fluid.layers.dropout(x=pool5, dropout_prob=0.5)
stdv = 1.0 / math.sqrt(drop6.shape[1] * drop6.shape[2] *
drop6.shape[3] * 1.0)
fc6 = fluid.layers.fc(
input=drop6,
size=4096,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
drop7 = fluid.layers.dropout(x=fc6, dropout_prob=0.5)
stdv = 1.0 / math.sqrt(drop7.shape[1] * 1.0)
fc7 = fluid.layers.fc(
input=drop7,
size=4096,
act='relu',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
stdv = 1.0 / math.sqrt(fc7.shape[1] * 1.0)
out = fluid.layers.fc(
input=fc7,
size=class_dim,
act='softmax',
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)),
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
import os
import numpy as np
import time
import sys
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers.control_flow as control_flow
import paddle.fluid.layers.nn as nn
import paddle.fluid.layers.tensor as tensor
import math
__all__ = ["DPN", "DPN68", "DPN92", "DPN98", "DPN107", "DPN131"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class DPN(object):
def __init__(self, layers=68):
self.params = train_parameters
self.layers = layers
def net(self, input, class_dim=1000):
# get network args
args = self.get_net_args(self.layers)
        bws = args['bw']
        inc_sec = args['inc_sec']
        rs = args['r']
k_r = args['k_r']
k_sec = args['k_sec']
G = args['G']
init_num_filter = args['init_num_filter']
init_filter_size = args['init_filter_size']
init_padding = args['init_padding']
## define Dual Path Network
# conv1
conv1_x_1 = fluid.layers.conv2d(
input=input,
num_filters=init_num_filter,
filter_size=init_filter_size,
stride=2,
padding=init_padding,
groups=1,
act=None,
bias_attr=False)
conv1_x_1 = fluid.layers.batch_norm(
input=conv1_x_1, act='relu', is_test=False)
convX_x_x = fluid.layers.pool2d(
input=conv1_x_1,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
#conv2 - conv5
for gc in range(4):
bw = bws[gc]
inc = inc_sec[gc]
R = (k_r * bw) / rs[gc]
if gc == 0:
_type1 = 'proj'
_type2 = 'normal'
else:
_type1 = 'down'
_type2 = 'normal'
convX_x_x = self.dual_path_factory(convX_x_x, R, R, bw, inc, G,
_type1)
for i_ly in range(2, k_sec[gc] + 1):
convX_x_x = self.dual_path_factory(convX_x_x, R, R, bw, inc, G,
_type2)
conv5_x_x = fluid.layers.concat(convX_x_x, axis=1)
conv5_x_x = fluid.layers.batch_norm(
input=conv5_x_x, act='relu', is_test=False)
pool5 = fluid.layers.pool2d(
input=conv5_x_x,
pool_size=7,
pool_stride=1,
pool_padding=0,
pool_type='avg')
#stdv = 1.0 / math.sqrt(pool5.shape[1] * 1.0)
stdv = 0.01
param_attr = fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv))
fc6 = fluid.layers.fc(input=pool5,
size=class_dim,
act='softmax',
param_attr=param_attr)
return fc6
def get_net_args(self, layers):
if layers == 68:
k_r = 128
G = 32
k_sec = [3, 4, 12, 3]
inc_sec = [16, 32, 32, 64]
bw = [64, 128, 256, 512]
r = [64, 64, 64, 64]
init_num_filter = 10
init_filter_size = 3
init_padding = 1
elif layers == 92:
k_r = 96
G = 32
k_sec = [3, 4, 20, 3]
inc_sec = [16, 32, 24, 128]
bw = [256, 512, 1024, 2048]
r = [256, 256, 256, 256]
init_num_filter = 64
init_filter_size = 7
init_padding = 3
elif layers == 98:
k_r = 160
G = 40
k_sec = [3, 6, 20, 3]
inc_sec = [16, 32, 32, 128]
bw = [256, 512, 1024, 2048]
r = [256, 256, 256, 256]
init_num_filter = 96
init_filter_size = 7
init_padding = 3
elif layers == 107:
k_r = 200
G = 50
k_sec = [4, 8, 20, 3]
inc_sec = [20, 64, 64, 128]
bw = [256, 512, 1024, 2048]
r = [256, 256, 256, 256]
init_num_filter = 128
init_filter_size = 7
init_padding = 3
elif layers == 131:
k_r = 160
G = 40
k_sec = [4, 8, 28, 3]
inc_sec = [16, 32, 32, 128]
bw = [256, 512, 1024, 2048]
r = [256, 256, 256, 256]
init_num_filter = 128
init_filter_size = 7
init_padding = 3
else:
raise NotImplementedError
net_arg = {
'k_r': k_r,
'G': G,
'k_sec': k_sec,
'inc_sec': inc_sec,
'bw': bw,
'r': r
}
net_arg['init_num_filter'] = init_num_filter
net_arg['init_filter_size'] = init_filter_size
net_arg['init_padding'] = init_padding
return net_arg
def dual_path_factory(self,
data,
num_1x1_a,
num_3x3_b,
num_1x1_c,
inc,
G,
_type='normal'):
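        """One dual-path block: a shared 1x1 -> grouped 3x3 -> 1x1 stack whose
        output is split into a residual (summed) path and a dense
        (concatenated) path; `_type` selects projection ('proj'),
        downsampling ('down') or a normal block.
        """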
kw = 3
kh = 3
pw = (kw - 1) / 2
ph = (kh - 1) / 2
        # select stride/projection from the block type
        if _type == 'proj':
            key_stride = 1
            has_proj = True
        if _type == 'down':
            key_stride = 2
            has_proj = True
        if _type == 'normal':
            key_stride = 1
            has_proj = False
# PROJ
if type(data) is list:
data_in = fluid.layers.concat([data[0], data[1]], axis=1)
else:
data_in = data
if has_proj:
c1x1_w = self.bn_ac_conv(
data=data_in,
num_filter=(num_1x1_c + 2 * inc),
kernel=(1, 1),
pad=(0, 0),
stride=(key_stride, key_stride))
data_o1, data_o2 = fluid.layers.split(
c1x1_w, num_or_sections=[num_1x1_c, 2 * inc], dim=1)
else:
data_o1 = data[0]
data_o2 = data[1]
# MAIN
c1x1_a = self.bn_ac_conv(
data=data_in, num_filter=num_1x1_a, kernel=(1, 1), pad=(0, 0))
c3x3_b = self.bn_ac_conv(
data=c1x1_a,
num_filter=num_3x3_b,
kernel=(kw, kh),
pad=(pw, ph),
stride=(key_stride, key_stride),
num_group=G)
c1x1_c = self.bn_ac_conv(
data=c3x3_b,
num_filter=(num_1x1_c + inc),
kernel=(1, 1),
pad=(0, 0))
c1x1_c1, c1x1_c2 = fluid.layers.split(
c1x1_c, num_or_sections=[num_1x1_c, inc], dim=1)
# OUTPUTS
summ = fluid.layers.elementwise_add(x=data_o1, y=c1x1_c1)
dense = fluid.layers.concat([data_o2, c1x1_c2], axis=1)
return [summ, dense]
def bn_ac_conv(self,
data,
num_filter,
kernel,
pad,
stride=(1, 1),
num_group=1):
bn_ac = fluid.layers.batch_norm(input=data, act='relu', is_test=False)
bn_ac_conv = fluid.layers.conv2d(
input=bn_ac,
num_filters=num_filter,
filter_size=kernel,
stride=stride,
padding=pad,
groups=num_group,
act=None,
bias_attr=False)
return bn_ac_conv
def DPN68():
model = DPN(layers=68)
return model
def DPN92():
model = DPN(layers=92)
return model
def DPN98():
model = DPN(layers=98)
return model
def DPN107():
model = DPN(layers=107)
return model
def DPN131():
model = DPN(layers=131)
return model
import paddle
import paddle.fluid as fluid
__all__ = ['GoogleNet']
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class GoogleNet():
def __init__(self):
self.params = train_parameters
def conv_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None):
channels = input.shape[1]
stdv = (3.0 / (filter_size**2 * channels))**0.5
param_attr = fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv))
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) / 2,
groups=groups,
act=act,
param_attr=param_attr,
bias_attr=False)
return conv
def xavier(self, channels, filter_size):
stdv = (3.0 / (filter_size**2 * channels))**0.5
param_attr = fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv))
return param_attr
def inception(self, name, input, channels, filter1, filter3R, filter3,
filter5R, filter5, proj):
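        # four parallel branches (1x1, 1x1 -> 3x3, 1x1 -> 5x5, and
        # 3x3-maxpool -> 1x1 projection), concatenated along the channel axis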
conv1 = self.conv_layer(
input=input, num_filters=filter1, filter_size=1, stride=1, act=None)
conv3r = self.conv_layer(
input=input,
num_filters=filter3R,
filter_size=1,
stride=1,
act=None)
conv3 = self.conv_layer(
input=conv3r,
num_filters=filter3,
filter_size=3,
stride=1,
act=None)
conv5r = self.conv_layer(
input=input,
num_filters=filter5R,
filter_size=1,
stride=1,
act=None)
conv5 = self.conv_layer(
input=conv5r,
num_filters=filter5,
filter_size=5,
stride=1,
act=None)
pool = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=1,
pool_padding=1,
pool_type='max')
convprj = fluid.layers.conv2d(
input=pool, filter_size=1, num_filters=proj, stride=1, padding=0)
cat = fluid.layers.concat(input=[conv1, conv3, conv5, convprj], axis=1)
cat = fluid.layers.relu(cat)
return cat
def net(self, input, class_dim=1000):
conv = self.conv_layer(
input=input, num_filters=64, filter_size=7, stride=2, act=None)
pool = fluid.layers.pool2d(
input=conv, pool_size=3, pool_type='max', pool_stride=2)
conv = self.conv_layer(
input=pool, num_filters=64, filter_size=1, stride=1, act=None)
conv = self.conv_layer(
input=conv, num_filters=192, filter_size=3, stride=1, act=None)
pool = fluid.layers.pool2d(
input=conv, pool_size=3, pool_type='max', pool_stride=2)
ince3a = self.inception("ince3a", pool, 192, 64, 96, 128, 16, 32, 32)
ince3b = self.inception("ince3b", ince3a, 256, 128, 128, 192, 32, 96,
64)
pool3 = fluid.layers.pool2d(
input=ince3b, pool_size=3, pool_type='max', pool_stride=2)
ince4a = self.inception("ince4a", pool3, 480, 192, 96, 208, 16, 48, 64)
ince4b = self.inception("ince4b", ince4a, 512, 160, 112, 224, 24, 64,
64)
ince4c = self.inception("ince4c", ince4b, 512, 128, 128, 256, 24, 64,
64)
ince4d = self.inception("ince4d", ince4c, 512, 112, 144, 288, 32, 64,
64)
ince4e = self.inception("ince4e", ince4d, 528, 256, 160, 320, 32, 128,
128)
pool4 = fluid.layers.pool2d(
input=ince4e, pool_size=3, pool_type='max', pool_stride=2)
ince5a = self.inception("ince5a", pool4, 832, 256, 160, 320, 32, 128,
128)
ince5b = self.inception("ince5b", ince5a, 832, 384, 192, 384, 48, 128,
128)
pool5 = fluid.layers.pool2d(
input=ince5b, pool_size=7, pool_type='avg', pool_stride=7)
dropout = fluid.layers.dropout(x=pool5, dropout_prob=0.4)
out = fluid.layers.fc(input=dropout,
size=class_dim,
act='softmax',
param_attr=self.xavier(1024, 1))
pool_o1 = fluid.layers.pool2d(
input=ince4a, pool_size=5, pool_type='avg', pool_stride=3)
conv_o1 = self.conv_layer(
input=pool_o1, num_filters=128, filter_size=1, stride=1, act=None)
fc_o1 = fluid.layers.fc(input=conv_o1,
size=1024,
act='relu',
param_attr=self.xavier(2048, 1))
dropout_o1 = fluid.layers.dropout(x=fc_o1, dropout_prob=0.7)
out1 = fluid.layers.fc(input=dropout_o1,
size=class_dim,
act='softmax',
param_attr=self.xavier(1024, 1))
pool_o2 = fluid.layers.pool2d(
input=ince4d, pool_size=5, pool_type='avg', pool_stride=3)
conv_o2 = self.conv_layer(
input=pool_o2, num_filters=128, filter_size=1, stride=1, act=None)
fc_o2 = fluid.layers.fc(input=conv_o2,
size=1024,
act='relu',
param_attr=self.xavier(2048, 1))
dropout_o2 = fluid.layers.dropout(x=fc_o2, dropout_prob=0.7)
out2 = fluid.layers.fc(input=dropout_o2,
size=class_dim,
act='softmax',
param_attr=self.xavier(1024, 1))
# last fc layer is "out"
return out, out1, out2
import paddle
import paddle.fluid as fluid
import math
__all__ = ['InceptionV4']
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class InceptionV4():
def __init__(self):
self.params = train_parameters
def net(self, input, class_dim=1000):
x = self.inception_stem(input)
for i in range(4):
x = self.inceptionA(x)
x = self.reductionA(x)
for i in range(7):
x = self.inceptionB(x)
x = self.reductionB(x)
for i in range(3):
x = self.inceptionC(x)
pool = fluid.layers.pool2d(
input=x, pool_size=8, pool_type='avg', global_pooling=True)
drop = fluid.layers.dropout(x=pool, dropout_prob=0.2)
stdv = 1.0 / math.sqrt(drop.shape[1] * 1.0)
out = fluid.layers.fc(
input=drop,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
return out
def conv_bn_layer(self,
data,
num_filters,
filter_size,
stride=1,
padding=0,
groups=1,
act='relu'):
conv = fluid.layers.conv2d(
input=data,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def inception_stem(self, data):
conv = self.conv_bn_layer(data, 32, 3, stride=2, act='relu')
conv = self.conv_bn_layer(conv, 32, 3, act='relu')
conv = self.conv_bn_layer(conv, 64, 3, padding=1, act='relu')
pool1 = fluid.layers.pool2d(
input=conv, pool_size=3, pool_stride=2, pool_type='max')
conv2 = self.conv_bn_layer(conv, 96, 3, stride=2, act='relu')
concat = fluid.layers.concat([pool1, conv2], axis=1)
conv1 = self.conv_bn_layer(concat, 64, 1, act='relu')
conv1 = self.conv_bn_layer(conv1, 96, 3, act='relu')
conv2 = self.conv_bn_layer(concat, 64, 1, act='relu')
conv2 = self.conv_bn_layer(
conv2, 64, (7, 1), padding=(3, 0), act='relu')
conv2 = self.conv_bn_layer(
conv2, 64, (1, 7), padding=(0, 3), act='relu')
conv2 = self.conv_bn_layer(conv2, 96, 3, act='relu')
concat = fluid.layers.concat([conv1, conv2], axis=1)
conv1 = self.conv_bn_layer(concat, 192, 3, stride=2, act='relu')
pool1 = fluid.layers.pool2d(
input=concat, pool_size=3, pool_stride=2, pool_type='max')
concat = fluid.layers.concat([conv1, pool1], axis=1)
return concat
def inceptionA(self, data):
pool1 = fluid.layers.pool2d(
input=data, pool_size=3, pool_padding=1, pool_type='avg')
conv1 = self.conv_bn_layer(pool1, 96, 1, act='relu')
conv2 = self.conv_bn_layer(data, 96, 1, act='relu')
conv3 = self.conv_bn_layer(data, 64, 1, act='relu')
conv3 = self.conv_bn_layer(conv3, 96, 3, padding=1, act='relu')
conv4 = self.conv_bn_layer(data, 64, 1, act='relu')
conv4 = self.conv_bn_layer(conv4, 96, 3, padding=1, act='relu')
conv4 = self.conv_bn_layer(conv4, 96, 3, padding=1, act='relu')
concat = fluid.layers.concat([conv1, conv2, conv3, conv4], axis=1)
return concat
def reductionA(self, data):
pool1 = fluid.layers.pool2d(
input=data, pool_size=3, pool_stride=2, pool_type='max')
conv2 = self.conv_bn_layer(data, 384, 3, stride=2, act='relu')
conv3 = self.conv_bn_layer(data, 192, 1, act='relu')
conv3 = self.conv_bn_layer(conv3, 224, 3, padding=1, act='relu')
conv3 = self.conv_bn_layer(conv3, 256, 3, stride=2, act='relu')
concat = fluid.layers.concat([pool1, conv2, conv3], axis=1)
return concat
def inceptionB(self, data):
pool1 = fluid.layers.pool2d(
input=data, pool_size=3, pool_padding=1, pool_type='avg')
conv1 = self.conv_bn_layer(pool1, 128, 1, act='relu')
conv2 = self.conv_bn_layer(data, 384, 1, act='relu')
conv3 = self.conv_bn_layer(data, 192, 1, act='relu')
conv3 = self.conv_bn_layer(
conv3, 224, (1, 7), padding=(0, 3), act='relu')
conv3 = self.conv_bn_layer(
conv3, 256, (7, 1), padding=(3, 0), act='relu')
conv4 = self.conv_bn_layer(data, 192, 1, act='relu')
conv4 = self.conv_bn_layer(
conv4, 192, (1, 7), padding=(0, 3), act='relu')
conv4 = self.conv_bn_layer(
conv4, 224, (7, 1), padding=(3, 0), act='relu')
conv4 = self.conv_bn_layer(
conv4, 224, (1, 7), padding=(0, 3), act='relu')
conv4 = self.conv_bn_layer(
conv4, 256, (7, 1), padding=(3, 0), act='relu')
concat = fluid.layers.concat([conv1, conv2, conv3, conv4], axis=1)
return concat
def reductionB(self, data):
pool1 = fluid.layers.pool2d(
input=data, pool_size=3, pool_stride=2, pool_type='max')
conv2 = self.conv_bn_layer(data, 192, 1, act='relu')
conv2 = self.conv_bn_layer(conv2, 192, 3, stride=2, act='relu')
conv3 = self.conv_bn_layer(data, 256, 1, act='relu')
conv3 = self.conv_bn_layer(
conv3, 256, (1, 7), padding=(0, 3), act='relu')
conv3 = self.conv_bn_layer(
conv3, 320, (7, 1), padding=(3, 0), act='relu')
conv3 = self.conv_bn_layer(conv3, 320, 3, stride=2, act='relu')
concat = fluid.layers.concat([pool1, conv2, conv3], axis=1)
return concat
def inceptionC(self, data):
pool1 = fluid.layers.pool2d(
input=data, pool_size=3, pool_padding=1, pool_type='avg')
conv1 = self.conv_bn_layer(pool1, 256, 1, act='relu')
conv2 = self.conv_bn_layer(data, 256, 1, act='relu')
conv3 = self.conv_bn_layer(data, 384, 1, act='relu')
conv3_1 = self.conv_bn_layer(
conv3, 256, (1, 3), padding=(0, 1), act='relu')
conv3_2 = self.conv_bn_layer(
conv3, 256, (3, 1), padding=(1, 0), act='relu')
conv4 = self.conv_bn_layer(data, 384, 1, act='relu')
conv4 = self.conv_bn_layer(
conv4, 448, (1, 3), padding=(0, 1), act='relu')
conv4 = self.conv_bn_layer(
conv4, 512, (3, 1), padding=(1, 0), act='relu')
conv4_1 = self.conv_bn_layer(
conv4, 256, (1, 3), padding=(0, 1), act='relu')
conv4_2 = self.conv_bn_layer(
conv4, 256, (3, 1), padding=(1, 0), act='relu')
concat = fluid.layers.concat(
[conv1, conv2, conv3_1, conv3_2, conv4_1, conv4_2], axis=1)
return concat
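# --- Usage sketch (illustrative, not in the original file) ---
# Building the InceptionV4 graph for 1000-way classification with the same
# static-graph Fluid API used above; the data layer name is arbitrary.
def _inception_v4_demo():
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    # four inceptionA, seven inceptionB and three inceptionC blocks, as in net()
    return InceptionV4().net(image, class_dim=1000)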
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers.ops as ops
from paddle.fluid.initializer import init_on_cpu
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
import math
def cosine_decay(learning_rate, step_each_epoch, epochs=120):
"""Applies cosine decay to the learning rate.
lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
"""
global_step = _decay_step_counter()
with init_on_cpu():
epoch = ops.floor(global_step / step_each_epoch)
decayed_lr = learning_rate * \
(ops.cos(epoch * (math.pi / epochs)) + 1)/2
return decayed_lr
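# --- Worked example (illustrative, not in the original file) ---
# With learning_rate=0.1 and epochs=120 the schedule above evaluates to:
#   epoch 0   -> 0.1 * (cos(0) + 1) / 2              = 0.1
#   epoch 60  -> 0.1 * (cos(pi / 2) + 1) / 2         = 0.05
#   epoch 119 -> 0.1 * (cos(119 * pi / 120) + 1) / 2 ~= 1.7e-5
# i.e. the rate decays smoothly from the base value towards zero.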
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
__all__ = ['MobileNet']
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class MobileNet():
def __init__(self):
self.params = train_parameters
def net(self, input, class_dim=1000, scale=1.0):
# conv1: 112x112
input = self.conv_bn_layer(
input,
filter_size=3,
channels=3,
num_filters=int(32 * scale),
stride=2,
padding=1)
# 56x56
input = self.depthwise_separable(
input,
num_filters1=32,
num_filters2=64,
num_groups=32,
stride=1,
scale=scale)
input = self.depthwise_separable(
input,
num_filters1=64,
num_filters2=128,
num_groups=64,
stride=2,
scale=scale)
# 28x28
input = self.depthwise_separable(
input,
num_filters1=128,
num_filters2=128,
num_groups=128,
stride=1,
scale=scale)
input = self.depthwise_separable(
input,
num_filters1=128,
num_filters2=256,
num_groups=128,
stride=2,
scale=scale)
# 14x14
input = self.depthwise_separable(
input,
num_filters1=256,
num_filters2=256,
num_groups=256,
stride=1,
scale=scale)
input = self.depthwise_separable(
input,
num_filters1=256,
num_filters2=512,
num_groups=256,
stride=2,
scale=scale)
# 14x14
for i in range(5):
input = self.depthwise_separable(
input,
num_filters1=512,
num_filters2=512,
num_groups=512,
stride=1,
scale=scale)
# 7x7
input = self.depthwise_separable(
input,
num_filters1=512,
num_filters2=1024,
num_groups=512,
stride=2,
scale=scale)
input = self.depthwise_separable(
input,
num_filters1=1024,
num_filters2=1024,
num_groups=1024,
stride=1,
scale=scale)
input = fluid.layers.pool2d(
input=input,
pool_size=0,
pool_stride=1,
pool_type='avg',
global_pooling=True)
output = fluid.layers.fc(input=input,
size=class_dim,
act='softmax',
param_attr=ParamAttr(initializer=MSRA()))
return output
def conv_bn_layer(self,
input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=ParamAttr(initializer=MSRA()),
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(self, input, num_filters1, num_filters2, num_groups,
stride, scale):
depthwise_conv = self.conv_bn_layer(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
pointwise_conv = self.conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
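# --- Usage sketch (illustrative, not in the original file) ---
# 'scale' is the MobileNet width multiplier: every channel count above is
# wrapped in int(channels * scale), so scale=0.5 builds the stem with
# int(32 * 0.5) == 16 filters and roughly quarters the multiply-adds.
def _mobilenet_demo():
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    return MobileNet().net(image, class_dim=1000, scale=0.5)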
import paddle
import paddle.fluid as fluid
import math
__all__ = ["ResNet", "ResNet50", "ResNet101", "ResNet152"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class ResNet():
def __init__(self, layers=50):
self.params = train_parameters
self.layers = layers
def net(self, input, class_dim=1000):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
depth = [3, 4, 6, 3]
elif layers == 101:
depth = [3, 4, 23, 3]
elif layers == 152:
depth = [3, 8, 36, 3]
num_filters = [64, 128, 256, 512]
conv = self.conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=pool,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
return out
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) / 2,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def shortcut(self, input, ch_out, stride):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
return self.conv_bn_layer(input, ch_out, 1, stride)
else:
return input
def bottleneck_block(self, input, num_filters, stride):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu')
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu')
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 4, filter_size=1, act=None)
short = self.shortcut(input, num_filters * 4, stride)
return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
def ResNet50():
model = ResNet(layers=50)
return model
def ResNet101():
model = ResNet(layers=101)
return model
def ResNet152():
model = ResNet(layers=152)
return model
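# --- Usage sketch (illustrative, not in the original file) ---
# The factory functions only fix the depth; the graph itself is built by
# net(), and depths outside [50, 101, 152] are rejected by the assert there.
def _resnet_demo():
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    return ResNet50().net(image, class_dim=1000)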
import paddle
import paddle.fluid as fluid
import math
__all__ = [
"SE_ResNeXt", "SE_ResNeXt50_32x4d", "SE_ResNeXt101_32x4d",
"SE_ResNeXt152_32x4d"
]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class SE_ResNeXt():
def __init__(self, layers=50):
self.params = train_parameters
self.layers = layers
def net(self, input, class_dim=1000):
layers = self.layers
supported_layers = [50, 101, 152]
assert layers in supported_layers, \
"supported layers are {} but input layer is {}".format(supported_layers, layers)
if layers == 50:
cardinality = 32
reduction_ratio = 16
depth = [3, 4, 6, 3]
num_filters = [128, 256, 512, 1024]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
elif layers == 101:
cardinality = 32
reduction_ratio = 16
depth = [3, 4, 23, 3]
num_filters = [128, 256, 512, 1024]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=7,
stride=2,
act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
elif layers == 152:
cardinality = 64
reduction_ratio = 16
depth = [3, 8, 36, 3]
num_filters = [128, 256, 512, 1024]
conv = self.conv_bn_layer(
input=input,
num_filters=64,
filter_size=3,
stride=2,
act='relu')
conv = self.conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu')
conv = self.conv_bn_layer(
input=conv,
num_filters=128,
filter_size=3,
stride=1,
act='relu')
conv = fluid.layers.pool2d(
input=conv, pool_size=3, pool_stride=2, pool_padding=1, \
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
conv = self.bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
cardinality=cardinality,
reduction_ratio=reduction_ratio)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
drop = fluid.layers.dropout(x=pool, dropout_prob=0.5)
stdv = 1.0 / math.sqrt(drop.shape[1] * 1.0)
out = fluid.layers.fc(input=drop,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
return out
def shortcut(self, input, ch_out, stride):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
filter_size = 1
return self.conv_bn_layer(input, ch_out, filter_size, stride)
else:
return input
def bottleneck_block(self, input, num_filters, stride, cardinality,
reduction_ratio):
conv0 = self.conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu')
conv1 = self.conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
groups=cardinality,
act='relu')
conv2 = self.conv_bn_layer(
input=conv1, num_filters=num_filters * 2, filter_size=1, act=None)
scale = self.squeeze_excitation(
input=conv2,
num_channels=num_filters * 2,
reduction_ratio=reduction_ratio)
short = self.shortcut(input, num_filters * 2, stride)
return fluid.layers.elementwise_add(x=short, y=scale, act='relu')
def conv_bn_layer(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) / 2,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def squeeze_excitation(self, input, num_channels, reduction_ratio):
pool = fluid.layers.pool2d(
input=input, pool_size=0, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(input=pool,
size=num_channels / reduction_ratio,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv)))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv)))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
def SE_ResNeXt50_32x4d():
model = SE_ResNeXt(layers=50)
return model
def SE_ResNeXt101_32x4d():
model = SE_ResNeXt(layers=101)
return model
def SE_ResNeXt152_32x4d():
model = SE_ResNeXt(layers=152)
return model
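# --- Worked example (illustrative, not in the original file) ---
# In the last stage num_filters is 1024, so squeeze_excitation() sees
# 1024 * 2 = 2048 channels; with reduction_ratio=16 the bottleneck fc has
# 2048 / 16 = 128 units and the sigmoid fc emits 2048 per-channel gates that
# rescale the residual branch via elementwise_mul.
def _se_resnext_demo():
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    return SE_ResNeXt50_32x4d().net(image, class_dim=1000)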
import paddle
import paddle.fluid as fluid
__all__ = ["VGGNet", "VGG11", "VGG13", "VGG16", "VGG19"]
train_parameters = {
"input_size": [3, 224, 224],
"input_mean": [0.485, 0.456, 0.406],
"input_std": [0.229, 0.224, 0.225],
"learning_strategy": {
"name": "piecewise_decay",
"batch_size": 256,
"epochs": [30, 60, 90],
"steps": [0.1, 0.01, 0.001, 0.0001]
}
}
class VGGNet():
def __init__(self, layers=16):
self.params = train_parameters
self.layers = layers
def net(self, input, class_dim=1000):
layers = self.layers
vgg_spec = {
11: ([1, 1, 2, 2, 2]),
13: ([2, 2, 2, 2, 2]),
16: ([2, 2, 3, 3, 3]),
19: ([2, 2, 4, 4, 4])
}
assert layers in vgg_spec.keys(), \
"supported layers are {} but input layer is {}".format(vgg_spec.keys(), layers)
nums = vgg_spec[layers]
conv1 = self.conv_block(input, 64, nums[0])
conv2 = self.conv_block(conv1, 128, nums[1])
conv3 = self.conv_block(conv2, 256, nums[2])
conv4 = self.conv_block(conv3, 512, nums[3])
conv5 = self.conv_block(conv4, 512, nums[4])
fc_dim = 4096
fc1 = fluid.layers.fc(
input=conv5,
size=fc_dim,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Normal(scale=0.005)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Constant(value=0.1)))
fc1 = fluid.layers.dropout(x=fc1, dropout_prob=0.5)
fc2 = fluid.layers.fc(
input=fc1,
size=fc_dim,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Normal(scale=0.005)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Constant(value=0.1)))
fc2 = fluid.layers.dropout(x=fc2, dropout_prob=0.5)
out = fluid.layers.fc(
input=fc2,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Normal(scale=0.005)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Constant(value=0.1)))
return out
def conv_block(self, input, num_filter, groups):
conv = input
for i in range(groups):
conv = fluid.layers.conv2d(
input=conv,
num_filters=num_filter,
filter_size=3,
stride=1,
padding=1,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Normal(scale=0.01)),
bias_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Constant(value=0.0)))
return fluid.layers.pool2d(
input=conv, pool_size=2, pool_type='max', pool_stride=2)
def VGG11():
model = VGGNet(layers=11)
return model
def VGG13():
model = VGGNet(layers=13)
return model
def VGG16():
model = VGGNet(layers=16)
return model
def VGG19():
model = VGGNet(layers=19)
return model
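# --- Worked example (illustrative, not in the original file) ---
# Each variant's depth is the conv count from vgg_spec plus the three fc
# layers: VGG16 -> (2 + 2 + 3 + 3 + 3) = 13 convs + 3 fc = 16 weight layers.
def _vgg_demo():
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    return VGG16().net(image, class_dim=1000)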
@@ -11,7 +11,7 @@ random.seed(0)
 DATA_DIM = 224
 THREAD = 8
-BUF_SIZE = 1024
+BUF_SIZE = 102400
 DATA_DIR = 'data/ILSVRC2012'
 TRAIN_LIST = 'data/ILSVRC2012/train_list.txt'
@@ -105,7 +105,7 @@ def process_image(sample, mode, color_jitter, rotate):
         if rotate: img = rotate_image(img)
         img = random_crop(img, DATA_DIM)
     else:
-        img = resize_short(img, DATA_DIM)
+        img = resize_short(img, target_size=256)
         img = crop_image(img, target_size=DATA_DIM, center=True)
     if mode == 'train':
         if color_jitter:
@@ -120,9 +120,9 @@ def process_image(sample, mode, color_jitter, rotate):
     img -= img_mean
     img /= img_std
-    if mode == 'train' or mode == 'test':
+    if mode == 'train' or mode == 'val':
         return img, sample[1]
-    elif mode == 'infer':
+    elif mode == 'test':
         return [img]
@@ -137,11 +137,11 @@ def _reader_creator(file_list,
         if shuffle:
             random.shuffle(lines)
         for line in lines:
-            if mode == 'train' or mode == 'test':
+            if mode == 'train' or mode == 'val':
                 img_path, label = line.split()
                 img_path = os.path.join(DATA_DIR, img_path)
                 yield img_path, int(label)
-            elif mode == 'infer':
+            elif mode == 'test':
                 img_path = os.path.join(DATA_DIR, line)
                 yield [img_path]
@@ -156,9 +156,9 @@ def train(file_list=TRAIN_LIST):
         file_list, 'train', shuffle=True, color_jitter=False, rotate=False)
-def test(file_list=TEST_LIST):
-    return _reader_creator(file_list, 'test', shuffle=False)
+def val(file_list=TEST_LIST):
+    return _reader_creator(file_list, 'val', shuffle=False)
-def infer(file_list):
-    return _reader_creator(file_list, 'infer', shuffle=False)
+def test(file_list):
+    return _reader_creator(file_list, 'test', shuffle=False)
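# --- Usage note (illustrative, not part of the diff above) ---
# After this rename the reader exposes train/val/test instead of
# train/test/infer, matching the refactored train.py below:
#   train_reader = paddle.batch(reader.train(), batch_size=256)
#   test_reader = paddle.batch(reader.val(), batch_size=16)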
import os
import numpy as np
import time
import sys
import paddle
import paddle.fluid as fluid
import reader
import paddle.fluid.layers.control_flow as control_flow
import paddle.fluid.layers.nn as nn
import paddle.fluid.layers.tensor as tensor
import math
def conv_bn_layer(input, num_filters, filter_size, stride=1, groups=1,
act=None):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=(filter_size - 1) / 2,
groups=groups,
act=None,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def squeeze_excitation(input, num_channels, reduction_ratio):
pool = fluid.layers.pool2d(
input=input, pool_size=0, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
squeeze = fluid.layers.fc(input=pool,
size=num_channels / reduction_ratio,
act='relu',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
stdv = 1.0 / math.sqrt(squeeze.shape[1] * 1.0)
excitation = fluid.layers.fc(input=squeeze,
size=num_channels,
act='sigmoid',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv)))
scale = fluid.layers.elementwise_mul(x=input, y=excitation, axis=0)
return scale
def shortcut(input, ch_out, stride):
ch_in = input.shape[1]
if ch_in != ch_out or stride != 1:
filter_size = 1
return conv_bn_layer(input, ch_out, filter_size, stride)
else:
return input
def bottleneck_block(input, num_filters, stride, cardinality, reduction_ratio):
conv0 = conv_bn_layer(
input=input, num_filters=num_filters, filter_size=1, act='relu')
conv1 = conv_bn_layer(
input=conv0,
num_filters=num_filters,
filter_size=3,
stride=stride,
groups=cardinality,
act='relu')
conv2 = conv_bn_layer(
input=conv1, num_filters=num_filters * 2, filter_size=1, act=None)
scale = squeeze_excitation(
input=conv2,
num_channels=num_filters * 2,
reduction_ratio=reduction_ratio)
short = shortcut(input, num_filters * 2, stride)
return fluid.layers.elementwise_add(x=short, y=scale, act='relu')
def SE_ResNeXt(input, class_dim, infer=False, layers=50):
supported_layers = [50, 152]
if layers not in supported_layers:
print("supported layers are", supported_layers, \
"but input layer is ", layers)
exit()
if layers == 50:
cardinality = 32
reduction_ratio = 16
depth = [3, 4, 6, 3]
num_filters = [128, 256, 512, 1024]
conv = conv_bn_layer(
input=input, num_filters=64, filter_size=7, stride=2, act='relu')
conv = fluid.layers.pool2d(
input=conv,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
elif layers == 152:
cardinality = 64
reduction_ratio = 16
depth = [3, 8, 36, 3]
num_filters = [128, 256, 512, 1024]
conv = conv_bn_layer(
input=input, num_filters=64, filter_size=3, stride=2, act='relu')
conv = conv_bn_layer(
input=conv, num_filters=64, filter_size=3, stride=1, act='relu')
conv = conv_bn_layer(
input=conv, num_filters=128, filter_size=3, stride=1, act='relu')
conv = fluid.layers.pool2d(
input=conv, pool_size=3, pool_stride=2, pool_padding=1, \
pool_type='max')
for block in range(len(depth)):
for i in range(depth[block]):
conv = bottleneck_block(
input=conv,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
cardinality=cardinality,
reduction_ratio=reduction_ratio)
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
if not infer:
drop = fluid.layers.dropout(x=pool, dropout_prob=0.5)
else:
drop = pool
stdv = 1.0 / math.sqrt(drop.shape[1] * 1.0)
out = fluid.layers.fc(input=drop,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv,
stdv)))
return out
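# --- Usage sketch (illustrative, not part of the original file) ---
# 'infer=True' bypasses the dropout in front of the classifier so evaluation
# is deterministic; during training dropout_prob=0.5 is applied to the pooled
# features.
def _se_resnext_infer_demo(infer=False):
    import paddle.fluid as fluid
    image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
    return SE_ResNeXt(input=image, class_dim=1000, infer=infer, layers=50)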
@@ -4,278 +4,145 @@ import time
 import sys
 import paddle
 import paddle.fluid as fluid
-from se_resnext import SE_ResNeXt
-from mobilenet import mobile_net
-from inception_v4 import inception_v4
+import models
 import reader
 import argparse
 import functools
-import paddle.fluid.layers.ops as ops
+from models.learning_rate import cosine_decay
 from utility import add_arguments, print_arguments
-from paddle.fluid.initializer import init_on_cpu
-from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
 import math
 parser = argparse.ArgumentParser(description=__doc__)
 add_arg = functools.partial(add_arguments, argparser=parser)
-add_arg('batch_size', int, 256, "Minibatch size.")
-add_arg('num_layers', int, 50, "How many layers for SE-ResNeXt model.")
-add_arg('with_mem_opt', bool, True,
-        "Whether to use memory optimization or not.")
-add_arg('parallel_exe', bool, True,
-        "Whether to use ParallelExecutor to train or not.")
-add_arg('init_model', str, None, "Whether to use initialized model.")
-add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
-add_arg('lr_strategy', str, "cosine_decay",
-        "Set the learning rate decay strategy.")
-add_arg('model', str, "se_resnext", "Set the network to use.")
+# yapf: disable
+add_arg('batch_size', int, 256, "Minibatch size.")
+add_arg('use_gpu', bool, True, "Whether to use GPU or not.")
+add_arg('total_images', int, 1281167, "Training image number.")
+add_arg('num_epochs', int, 120, "number of epochs.")
+add_arg('class_dim', int, 1000, "Class number.")
+add_arg('image_shape', str, "3,224,224", "input image size")
+add_arg('model_save_dir', str, "output", "model save directory")
+add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
+add_arg('pretrained_model', str, None, "Whether to use pretrained model.")
+add_arg('checkpoint', str, None, "Whether to resume checkpoint.")
+add_arg('lr', float, 0.1, "set learning rate.")
+add_arg('lr_strategy', str, "piecewise_decay", "Set the learning rate decay strategy.")
+add_arg('model', str, "SE_ResNeXt50_32x4d", "Set the network to use.")
+# yapf: enable
-def cosine_decay(learning_rate, step_each_epoch, epochs=120):
-    """Applies cosine decay to the learning rate.
-    lr = 0.05 * (math.cos(epoch * (math.pi / 120)) + 1)
-    """
-    global_step = _decay_step_counter()
-    with init_on_cpu():
-        epoch = ops.floor(global_step / step_each_epoch)
-        decayed_lr = learning_rate * \
-                     (ops.cos(epoch * (math.pi / epochs)) + 1)/2
-    return decayed_lr
+model_list = [m for m in dir(models) if "__" not in m]
-def train_parallel_do(args,
-                      learning_rate,
-                      batch_size,
-                      num_passes,
-                      init_model=None,
-                      pretrained_model=None,
-                      model_save_dir='model',
-                      parallel=True,
-                      use_nccl=True,
-                      lr_strategy=None,
-                      layers=50):
-    class_dim = 1000
-    image_shape = [3, 224, 224]
-    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    if parallel:
-        places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places, use_nccl=use_nccl)
-        with pd.do():
-            image_ = pd.read_input(image)
-            label_ = pd.read_input(label)
-            if args.model is 'se_resnext':
-                out = SE_ResNeXt(
-                    input=image_, class_dim=class_dim, layers=layers)
-            elif args.model is 'mobile_net':
-                out = mobile_net(img=image_, class_dim=class_dim)
-            else:
-                out = inception_v4(img=image_, class_dim=class_dim)
-            cost = fluid.layers.cross_entropy(input=out, label=label_)
-            avg_cost = fluid.layers.mean(x=cost)
-            acc_top1 = fluid.layers.accuracy(input=out, label=label_, k=1)
-            acc_top5 = fluid.layers.accuracy(input=out, label=label_, k=5)
-            pd.write_output(avg_cost)
-            pd.write_output(acc_top1)
-            pd.write_output(acc_top5)
-        avg_cost, acc_top1, acc_top5 = pd()
-        avg_cost = fluid.layers.mean(x=avg_cost)
-        acc_top1 = fluid.layers.mean(x=acc_top1)
-        acc_top5 = fluid.layers.mean(x=acc_top5)
-    else:
-        if args.model is 'se_resnext':
-            out = SE_ResNeXt(input=image, class_dim=class_dim, layers=layers)
-        elif args.model is 'mobile_net':
-            out = mobile_net(img=image, class_dim=class_dim)
-        else:
-            out = inception_v4(img=image, class_dim=class_dim)
-        cost = fluid.layers.cross_entropy(input=out, label=label)
-        avg_cost = fluid.layers.mean(x=cost)
-        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
-        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
-    inference_program = fluid.default_main_program().clone(for_test=True)
-    if "piecewise_decay" in lr_strategy:
-        bd = lr_strategy["piecewise_decay"]["bd"]
-        lr = lr_strategy["piecewise_decay"]["lr"]
+def optimizer_setting(params):
+    ls = params["learning_strategy"]
+    if ls["name"] == "piecewise_decay":
+        if "total_images" not in params:
+            total_images = 1281167
+        else:
+            total_images = params["total_images"]
+        batch_size = ls["batch_size"]
+        step = int(total_images / batch_size + 1)
+        bd = [step * e for e in ls["epochs"]]
+        base_lr = params["lr"]
+        lr = []
+        lr = [base_lr * (0.1**i) for i in range(len(bd) + 1)]
         optimizer = fluid.optimizer.Momentum(
             learning_rate=fluid.layers.piecewise_decay(
                 boundaries=bd, values=lr),
             momentum=0.9,
             regularization=fluid.regularizer.L2Decay(1e-4))
-    elif "cosine_decay" in lr_strategy:
-        step_each_epoch = lr_strategy["cosine_decay"]["step_each_epoch"]
-        epochs = lr_strategy["cosine_decay"]["epochs"]
+    elif ls["name"] == "cosine_decay":
+        if "total_images" not in params:
+            total_images = 1281167
+        else:
+            total_images = params["total_images"]
+        batch_size = ls["batch_size"]
+        step = int(total_images / batch_size + 1)
+        lr = params["lr"]
+        num_epochs = params["num_epochs"]
         optimizer = fluid.optimizer.Momentum(
             learning_rate=cosine_decay(
-                learning_rate=learning_rate,
-                step_each_epoch=step_each_epoch,
-                epochs=epochs),
+                learning_rate=lr, step_each_epoch=step, epochs=num_epochs),
             momentum=0.9,
             regularization=fluid.regularizer.L2Decay(1e-4))
     else:
+        lr = params["lr"]
         optimizer = fluid.optimizer.Momentum(
-            learning_rate=learning_rate,
+            learning_rate=lr,
             momentum=0.9,
             regularization=fluid.regularizer.L2Decay(1e-4))
-    opts = optimizer.minimize(avg_cost)
-    if args.with_mem_opt:
-        fluid.memory_optimize(fluid.default_main_program())
-    place = fluid.CUDAPlace(0)
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    if init_model is not None:
-        fluid.io.load_persistables(exe, init_model)
-    if pretrained_model:
-        def if_exist(var):
-            return os.path.exists(os.path.join(pretrained_model, var.name))
-        fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
-    train_reader = paddle.batch(reader.train(), batch_size=batch_size)
-    test_reader = paddle.batch(reader.test(), batch_size=batch_size)
-    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
-    for pass_id in range(num_passes):
-        train_info = [[], [], []]
-        test_info = [[], [], []]
-        for batch_id, data in enumerate(train_reader()):
-            t1 = time.time()
-            loss, acc1, acc5 = exe.run(
-                fluid.default_main_program(),
-                feed=feeder.feed(data),
-                fetch_list=[avg_cost, acc_top1, acc_top5])
-            t2 = time.time()
-            period = t2 - t1
-            train_info[0].append(loss[0])
-            train_info[1].append(acc1[0])
-            train_info[2].append(acc5[0])
-            if batch_id % 10 == 0:
-                print("Pass {0}, trainbatch {1}, loss {2}, \
-                      acc1 {3}, acc5 {4} time {5}"
-                      .format(pass_id, \
-                      batch_id, loss[0], acc1[0], acc5[0], \
-                      "%2.2f sec" % period))
-                sys.stdout.flush()
-        train_loss = np.array(train_info[0]).mean()
-        train_acc1 = np.array(train_info[1]).mean()
-        train_acc5 = np.array(train_info[2]).mean()
-        for data in test_reader():
-            t1 = time.time()
-            loss, acc1, acc5 = exe.run(
-                inference_program,
-                feed=feeder.feed(data),
-                fetch_list=[avg_cost, acc_top1, acc_top5])
-            t2 = time.time()
-            period = t2 - t1
-            test_info[0].append(loss[0])
-            test_info[1].append(acc1[0])
-            test_info[2].append(acc5[0])
-            if batch_id % 10 == 0:
-                print("Pass {0},testbatch {1},loss {2}, \
-                      acc1 {3},acc5 {4},time {5}"
-                      .format(pass_id, \
-                      batch_id, loss[0], acc1[0], acc5[0], \
-                      "%2.2f sec" % period))
-                sys.stdout.flush()
-        test_loss = np.array(test_info[0]).mean()
-        test_acc1 = np.array(test_info[1]).mean()
-        test_acc5 = np.array(test_info[2]).mean()
-        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, \
-              test_loss {4}, test_acc1 {5}, test_acc5 {6}"
-              .format(pass_id, \
-              train_loss, train_acc1, train_acc5, test_loss, test_acc1, \
-              test_acc5))
-        sys.stdout.flush()
-        model_path = os.path.join(model_save_dir + '/' + args.model,
-                                  str(pass_id))
-        if not os.path.isdir(model_path):
-            os.makedirs(model_path)
-        fluid.io.save_persistables(exe, model_path)
+    return optimizer
-def train_parallel_exe(args,
-                       learning_rate,
-                       batch_size,
-                       num_passes,
-                       init_model=None,
-                       pretrained_model=None,
-                       model_save_dir='model',
-                       parallel=True,
-                       use_nccl=True,
-                       lr_strategy=None,
-                       layers=50):
-    class_dim = 1000
-    image_shape = [3, 224, 224]
+def train(args):
+    # parameters from arguments
+    class_dim = args.class_dim
+    model_name = args.model
+    checkpoint = args.checkpoint
+    pretrained_model = args.pretrained_model
+    with_memory_optimization = args.with_mem_opt
+    model_save_dir = args.model_save_dir
+    image_shape = [int(m) for m in args.image_shape.split(",")]
+    assert model_name in model_list, "{} is not in lists: {}".format(args.model,
+                                                                     model_list)
     image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
     label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    if args.model is 'se_resnext':
-        out = SE_ResNeXt(input=image, class_dim=class_dim, layers=layers)
-    elif args.model is 'mobile_net':
-        out = mobile_net(img=image, class_dim=class_dim)
+    # model definition
+    model = models.__dict__[model_name]()
+    if model_name == "GoogleNet":
+        out0, out1, out2 = model.net(input=image, class_dim=class_dim)
+        cost0 = fluid.layers.cross_entropy(input=out0, label=label)
+        cost1 = fluid.layers.cross_entropy(input=out1, label=label)
+        cost2 = fluid.layers.cross_entropy(input=out2, label=label)
+        avg_cost0 = fluid.layers.mean(x=cost0)
+        avg_cost1 = fluid.layers.mean(x=cost1)
+        avg_cost2 = fluid.layers.mean(x=cost2)
+        avg_cost = avg_cost0 + 0.3 * avg_cost1 + 0.3 * avg_cost2
+        acc_top1 = fluid.layers.accuracy(input=out0, label=label, k=1)
+        acc_top5 = fluid.layers.accuracy(input=out0, label=label, k=5)
     else:
-        out = inception_v4(img=image, class_dim=class_dim)
-    cost = fluid.layers.cross_entropy(input=out, label=label)
-    acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
-    acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
-    avg_cost = fluid.layers.mean(x=cost)
+        out = model.net(input=image, class_dim=class_dim)
+        cost = fluid.layers.cross_entropy(input=out, label=label)
+        avg_cost = fluid.layers.mean(x=cost)
+        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
+        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
     test_program = fluid.default_main_program().clone(for_test=True)
-    if "piecewise_decay" in lr_strategy:
-        bd = lr_strategy["piecewise_decay"]["bd"]
-        lr = lr_strategy["piecewise_decay"]["lr"]
-        optimizer = fluid.optimizer.Momentum(
-            learning_rate=fluid.layers.piecewise_decay(
-                boundaries=bd, values=lr),
-            momentum=0.9,
-            regularization=fluid.regularizer.L2Decay(1e-4))
-    elif "cosine_decay" in lr_strategy:
-        step_each_epoch = lr_strategy["cosine_decay"]["step_each_epoch"]
-        epochs = lr_strategy["cosine_decay"]["epochs"]
-        optimizer = fluid.optimizer.Momentum(
-            learning_rate=cosine_decay(
-                learning_rate=learning_rate,
-                step_each_epoch=step_each_epoch,
-                epochs=epochs),
-            momentum=0.9,
-            regularization=fluid.regularizer.L2Decay(1e-4))
-    else:
-        optimizer = fluid.optimizer.Momentum(
-            learning_rate=learning_rate,
-            momentum=0.9,
-            regularization=fluid.regularizer.L2Decay(1e-4))
+    # parameters from model and arguments
+    params = model.params
+    params["total_images"] = args.total_images
+    params["lr"] = args.lr
+    params["num_epochs"] = args.num_epochs
+    params["learning_strategy"]["batch_size"] = args.batch_size
+    params["learning_strategy"]["name"] = args.lr_strategy
+    # initialize optimizer
+    optimizer = optimizer_setting(params)
     opts = optimizer.minimize(avg_cost)
-    if args.with_mem_opt:
+    if with_memory_optimization:
         fluid.memory_optimize(fluid.default_main_program())
-    place = fluid.CUDAPlace(0)
+    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
     exe = fluid.Executor(place)
     exe.run(fluid.default_startup_program())
-    if init_model is not None:
-        fluid.io.load_persistables(exe, init_model)
+    if checkpoint is not None:
+        fluid.io.load_persistables(exe, checkpoint)
     if pretrained_model:
@@ -284,18 +151,17 @@ def train_parallel_exe(args,
         fluid.io.load_vars(exe, pretrained_model, predicate=if_exist)
-    train_reader = paddle.batch(reader.train(), batch_size=batch_size)
-    test_reader = paddle.batch(reader.test(), batch_size=batch_size)
+    train_batch_size = args.batch_size
+    test_batch_size = 16
+    train_reader = paddle.batch(reader.train(), batch_size=train_batch_size)
+    test_reader = paddle.batch(reader.val(), batch_size=test_batch_size)
     feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
     train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_cost.name)
-    test_exe = fluid.ParallelExecutor(
-        use_cuda=True, main_program=test_program, share_vars_from=train_exe)
     fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]
-    for pass_id in range(num_passes):
+    for pass_id in range(params["num_epochs"]):
         train_info = [[], [], []]
         test_info = [[], [], []]
         for batch_id, data in enumerate(train_reader()):
@@ -320,81 +186,51 @@ def train_parallel_exe(args,
         train_loss = np.array(train_info[0]).mean()
         train_acc1 = np.array(train_info[1]).mean()
         train_acc5 = np.array(train_info[2]).mean()
-        for data in test_reader():
+        cnt = 0
+        for test_batch_id, data in enumerate(test_reader()):
             t1 = time.time()
-            loss, acc1, acc5 = test_exe.run(fetch_list, feed=feeder.feed(data))
+            loss, acc1, acc5 = exe.run(test_program,
+                                       fetch_list=fetch_list,
+                                       feed=feeder.feed(data))
             t2 = time.time()
             period = t2 - t1
-            loss = np.mean(np.array(loss))
-            acc1 = np.mean(np.array(acc1))
-            acc5 = np.mean(np.array(acc5))
-            test_info[0].append(loss)
-            test_info[1].append(acc1)
-            test_info[2].append(acc5)
-            if batch_id % 10 == 0:
+            loss = np.mean(loss)
+            acc1 = np.mean(acc1)
+            acc5 = np.mean(acc5)
+            test_info[0].append(loss * len(data))
+            test_info[1].append(acc1 * len(data))
+            test_info[2].append(acc5 * len(data))
+            cnt += len(data)
+            if test_batch_id % 10 == 0:
                 print("Pass {0},testbatch {1},loss {2}, \
                       acc1 {3},acc5 {4},time {5}"
                       .format(pass_id, \
-                      batch_id, loss, acc1, acc5, \
+                      test_batch_id, loss, acc1, acc5, \
                       "%2.2f sec" % period))
                 sys.stdout.flush()
-        test_loss = np.array(test_info[0]).mean()
-        test_acc1 = np.array(test_info[1]).mean()
-        test_acc5 = np.array(test_info[2]).mean()
-        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, \
-              test_loss {4}, test_acc1 {5}, test_acc5 {6}"
-              .format(pass_id, \
+        test_loss = np.sum(test_info[0]) / cnt
+        test_acc1 = np.sum(test_info[1]) / cnt
+        test_acc5 = np.sum(test_info[2]) / cnt
+        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, "
+              "test_loss {4}, test_acc1 {5}, test_acc5 {6}".format(pass_id, \
              train_loss, train_acc1, train_acc5, test_loss, test_acc1, \
              test_acc5))
         sys.stdout.flush()
-        model_path = os.path.join(model_save_dir + '/' + args.model,
+        model_path = os.path.join(model_save_dir + '/' + model_name,
                                   str(pass_id))
         if not os.path.isdir(model_path):
             os.makedirs(model_path)
         fluid.io.save_persistables(exe, model_path)
-if __name__ == '__main__':
+def main():
     args = parser.parse_args()
     print_arguments(args)
+    train(args)
-    total_images = 1281167
-    batch_size = args.batch_size
-    step = int(total_images / batch_size + 1)
-    num_epochs = 120
-    learning_rate_mode = args.lr_strategy
-    lr_strategy = {}
-    if learning_rate_mode == "piecewise_decay":
-        epoch_points = [30, 60, 90]
-        bd = [e * step for e in epoch_points]
-        lr = [0.1, 0.01, 0.001, 0.0001]
-        lr_strategy[learning_rate_mode] = {"bd": bd, "lr": lr}
-    elif learning_rate_mode == "cosine_decay":
-        lr_strategy[learning_rate_mode] = {
-            "step_each_epoch": step,
-            "epochs": num_epochs
-        }
-    else:
-        lr_strategy = None
-    use_nccl = True
-    # layers: 50, 152
-    layers = args.num_layers
-    method = train_parallel_exe if args.parallel_exe else train_parallel_do
-    init_model = args.init_model if args.init_model else None
-    pretrained_model = args.pretrained_model if args.pretrained_model else None
-    method(
-        args,
-        learning_rate=0.1,
-        batch_size=batch_size,
-        num_passes=num_epochs,
-        init_model=init_model,
-        pretrained_model=pretrained_model,
-        parallel=True,
-        use_nccl=True,
-        lr_strategy=lr_strategy,
-        layers=layers)
+if __name__ == '__main__':
+    main()
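# --- Usage sketch (illustrative, not part of the diff above) ---
# With the arguments registered on the new side of this diff, the refactored
# trainer would be launched roughly as:
#   python train.py --model=SE_ResNeXt50_32x4d --batch_size=256 \
#       --lr=0.1 --lr_strategy=piecewise_decay --num_epochs=120 --use_gpu=True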
 class TrainTaskConfig(object):
+    # only support GPU currently
     use_gpu = True
     # the epoch number to train.
     pass_num = 30
     # the number of sequences contained in a mini-batch.
+    # deprecated, set batch_size in args.
     batch_size = 32
     # the hyper parameters for Adam optimizer.
     # This static learning_rate will be multiplied to the LearningRateScheduler
@@ -13,8 +15,6 @@ class TrainTaskConfig(object):
     eps = 1e-9
     # the parameters for learning rate scheduling.
     warmup_steps = 4000
-    # the flag indicating to use average loss or sum loss when training.
-    use_avg_cost = True
     # the weight used to mix up the ground-truth distribution and the fixed
     # uniform distribution in label smoothing when training.
     # Set this as zero if label smoothing is not wanted.
@@ -38,22 +38,20 @@ class InferTaskConfig(object):
     batch_size = 2
     # the parameters for beam search.
     beam_size = 5
-    max_out_len = 30
+    max_out_len = 256
     # the number of decoded sentences to output.
     n_best = 1
     # the flags indicating whether to output the special tokens.
-    output_bos = True  #False
-    output_eos = True  #False
-    output_unk = True  #False
+    output_bos = False
+    output_eos = False
+    output_unk = True
     # the directory for loading the trained model.
     model_path = "trained_models/pass_1.infer.model"
 class ModelHyperParams(object):
-    # This model directly uses paddle.dataset.wmt16 in which <bos>, <eos> and
-    # <unk> token has alreay been added. As for the <pad> token, any token
-    # included in dict can be used to pad, since the paddings' loss will be
-    # masked out and make no effect on parameter gradients.
+    # These following five vocabularies related configurations will be set
+    # automatically according to the passed vocabulary path and special tokens.
     # size of source word dictionary.
     src_vocab_size = 10000
     # size of target word dictionay
@@ -68,13 +66,13 @@ class ModelHyperParams(object):
     # The size of position encoding table should at least plus 1, since the
     # sinusoid position encoding starts from 1 and 0 can be used as the padding
     # token for position encoding.
-    max_length = 50
+    max_length = 256
     # the dimension for word embeddings, which is also the last dimension of
     # the input and output of multi-head attention, position-wise feed-forward
     # networks, encoder and decoder.
     d_model = 512
     # size of the hidden layer in position-wise feed-forward networks.
-    d_inner_hid = 1024
+    d_inner_hid = 2048
     # the dimension that keys are projected to for dot-product attention.
     d_key = 64
     # the dimension that values are projected to for dot-product attention.
@@ -85,6 +83,9 @@ class ModelHyperParams(object):
     n_layer = 6
     # dropout rate used by all dropout layers.
     dropout = 0.1
+    # the flag indicating whether to share embedding and softmax weights.
+    # vocabularies in source and target should be same for weight sharing.
+    weight_sharing = True
 def merge_cfg_from_list(cfg_list, g_cfgs):
@@ -97,7 +98,7 @@ def merge_cfg_from_list(cfg_list, g_cfgs):
             if hasattr(g_cfg, key):
                 try:
                     value = eval(value)
-                except SyntaxError:  # for file path
+                except Exception:  # for file path
                     pass
                 setattr(g_cfg, key, value)
                 break
@@ -179,6 +180,10 @@ input_descs = {
     "init_score": [(batch_size, 1L), "float32"],
 }
+# Names of word embedding table which might be reused for weight sharing.
+word_emb_param_names = (
+    "src_word_emb_table",
+    "trg_word_emb_table", )
 # Names of position encoding table which will be initialized externally.
 pos_enc_param_names = (
     "src_pos_enc_table",
...
@@ -319,7 +319,7 @@ def infer(args):
         ModelHyperParams.n_layer, ModelHyperParams.n_head,
         ModelHyperParams.d_key, ModelHyperParams.d_value,
         ModelHyperParams.d_model, ModelHyperParams.d_inner_hid,
-        ModelHyperParams.dropout)
+        ModelHyperParams.dropout, ModelHyperParams.weight_sharing)
     decoder_program = fluid.Program()
     with fluid.program_guard(main_program=decoder_program):
@@ -328,7 +328,7 @@ def infer(args):
         ModelHyperParams.n_layer, ModelHyperParams.n_head,
         ModelHyperParams.d_key, ModelHyperParams.d_value,
         ModelHyperParams.d_model, ModelHyperParams.d_inner_hid,
-        ModelHyperParams.dropout)
+        ModelHyperParams.dropout, ModelHyperParams.weight_sharing)
     # Load model parameters of encoder and decoder separately from the saved
     # transformer model.
...
@@ -49,26 +49,14 @@ def multi_head_attention(queries,
     """
     q = layers.fc(input=queries,
                   size=d_key * n_head,
-                  param_attr=fluid.initializer.Xavier(
-                      uniform=False,
-                      fan_in=d_model * d_key,
-                      fan_out=n_head * d_key),
                   bias_attr=False,
                   num_flatten_dims=2)
     k = layers.fc(input=keys,
                   size=d_key * n_head,
-                  param_attr=fluid.initializer.Xavier(
-                      uniform=False,
-                      fan_in=d_model * d_key,
-                      fan_out=n_head * d_key),
                   bias_attr=False,
                   num_flatten_dims=2)
     v = layers.fc(input=values,
                   size=d_value * n_head,
-                  param_attr=fluid.initializer.Xavier(
-                      uniform=False,
-                      fan_in=d_model * d_value,
-                      fan_out=n_head * d_value),
                   bias_attr=False,
                   num_flatten_dims=2)
     return q, k, v
@@ -168,7 +156,6 @@ def multi_head_attention(queries,
     # Project back to the model size.
     proj_out = layers.fc(input=out,
                          size=d_model,
-                         param_attr=fluid.initializer.Xavier(uniform=False),
                          bias_attr=False,
                          num_flatten_dims=2)
     # global FLAG
@@ -188,14 +175,8 @@ def positionwise_feed_forward(x, d_inner_hid, d_hid):
     hidden = layers.fc(input=x,
                        size=d_inner_hid,
                        num_flatten_dims=2,
-                       param_attr=fluid.initializer.Uniform(
-                           low=-(d_hid**-0.5), high=(d_hid**-0.5)),
                        act="relu")
-    out = layers.fc(input=hidden,
-                    size=d_hid,
-                    num_flatten_dims=2,
-                    param_attr=fluid.initializer.Uniform(
-                        low=-(d_inner_hid**-0.5), high=(d_inner_hid**-0.5)))
+    out = layers.fc(input=hidden, size=d_hid, num_flatten_dims=2)
     return out
@@ -233,6 +214,7 @@ def prepare_encoder(src_word,
                     src_max_len,
                     dropout_rate=0.,
                     src_data_shape=None,
+                    word_emb_param_name=None,
                     pos_enc_param_name=None):
     """Add word embeddings and position encodings.
     The output tensor has a shape of:
@@ -242,7 +224,10 @@ def prepare_encoder(src_word,
     src_word_emb = layers.embedding(
         src_word,
         size=[src_vocab_size, src_emb_dim],
-        param_attr=fluid.initializer.Normal(0., 1.))
+        param_attr=fluid.ParamAttr(
+            name=word_emb_param_name,
+            initializer=fluid.initializer.Normal(0., src_emb_dim**-0.5)))
+    src_word_emb = layers.scale(x=src_word_emb, scale=src_emb_dim**0.5)
     src_pos_enc = layers.embedding(
         src_pos,
         size=[src_max_len, src_emb_dim],
@@ -447,7 +432,12 @@ def transformer(
         d_model,
         d_inner_hid,
         dropout_rate,
+        weight_sharing,
         label_smooth_eps, ):
+    if weight_sharing:
+        assert src_vocab_size == trg_vocab_size, (
+            "Vocabularies in source and target should be same for weight sharing."
+        )
     enc_inputs = make_all_inputs(encoder_data_input_fields +
                                  encoder_util_input_fields)
@@ -461,6 +451,7 @@ def transformer(
         d_model,
         d_inner_hid,
         dropout_rate,
+        weight_sharing,
         enc_inputs, )
     dec_inputs = make_all_inputs(decoder_data_input_fields[:-1] +
@@ -476,6 +467,7 @@ def transformer(
         d_model,
         d_inner_hid,
         dropout_rate,
+        weight_sharing,
         dec_inputs,
         enc_output, )
@@ -507,6 +499,7 @@ def wrap_encoder(src_vocab_size,
                  d_model,
                  d_inner_hid,
                  dropout_rate,
+                 weight_sharing,
                  enc_inputs=None):
     """
     The wrapper assembles together all needed layers for the encoder.
@@ -528,7 +521,8 @@ def wrap_encoder(src_vocab_size,
         d_model,
         max_length,
         dropout_rate,
-        src_data_shape, )
+        src_data_shape,
+        word_emb_param_name=word_emb_param_names[0])
     enc_output = encoder(
         enc_input,
         src_slf_attn_bias,
@@ -553,6 +547,7 @@ def wrap_decoder(trg_vocab_size,
                  d_model,
                  d_inner_hid,
                  dropout_rate,
+                 weight_sharing,
                  dec_inputs=None,
                  enc_output=None,
                  caches=None):
@@ -579,7 +574,9 @@ def wrap_decoder(trg_vocab_size,
         d_model,
         max_length,
         dropout_rate,
-        trg_data_shape, )
+        trg_data_shape,
+        word_emb_param_name=word_emb_param_names[0]
+        if weight_sharing else word_emb_param_names[1])
     dec_output = decoder(
         dec_input,
         enc_output,
@@ -598,13 +595,22 @@ def wrap_decoder(trg_vocab_size,
         src_attn_post_softmax_shape,
         caches, )
     # Return logits for training and probs for inference.
-    predict = layers.reshape(
-        x=layers.fc(input=dec_output,
-                    size=trg_vocab_size,
-                    bias_attr=False,
-                    num_flatten_dims=2),
-        shape=[-1, trg_vocab_size],
-        act="softmax" if dec_inputs is None else None)
+    if weight_sharing:
+        predict = layers.reshape(
+            x=layers.matmul(
+                x=dec_output,
+                y=fluid.get_var(word_emb_param_names[0]),
+                transpose_y=True),
+            shape=[-1, trg_vocab_size],
+            act="softmax" if dec_inputs is None else None)
+    else:
+        predict = layers.reshape(
+            x=layers.fc(input=dec_output,
+                        size=trg_vocab_size,
+                        bias_attr=False,
+                        num_flatten_dims=2),
+            shape=[-1, trg_vocab_size],
+            act="softmax" if dec_inputs is None else None)
     return predict
...
...@@ -43,9 +43,11 @@ def parse_args(): ...@@ -43,9 +43,11 @@ def parse_args():
parser.add_argument( parser.add_argument(
"--batch_size", "--batch_size",
type=int, type=int,
default=2000, default=2048,
help="The number of sequences contained in a mini-batch, or the maximum " help="The number of sequences contained in a mini-batch, or the maximum "
"number of tokens (include paddings) contained in a mini-batch.") "number of tokens (include paddings) contained in a mini-batch. Note "
"that this represents the number on single device and the actual batch "
"size for multi-devices will multiply the device number.")
parser.add_argument( parser.add_argument(
"--pool_size", "--pool_size",
type=int, type=int,
...@@ -203,50 +205,50 @@ def prepare_batch_input(insts, data_input_names, util_input_names, src_pad_idx, ...@@ -203,50 +205,50 @@ def prepare_batch_input(insts, data_input_names, util_input_names, src_pad_idx,
[num_token], dtype="float32") [num_token], dtype="float32")
def train(args): def read_multiple(reader, count, clip_last=True):
dev_count = fluid.core.get_cuda_device_count() """
Stack data from reader for multi-devices.
"""
def read_multiple(reader, def __impl__():
count=dev_count if args.use_token_batch else 1, res = []
clip_last=True): for item in reader():
""" res.append(item)
Stack data from reader for multi-devices.
"""
def __impl__():
res = []
for item in reader():
res.append(item)
if len(res) == count:
yield res
res = []
if len(res) == count: if len(res) == count:
yield res yield res
elif not clip_last: res = []
data = [] if len(res) == count:
for item in res: yield res
data += item elif not clip_last:
if len(data) > count: data = []
inst_num_per_part = len(data) // count for item in res:
yield [ data += item
data[inst_num_per_part * i:inst_num_per_part * (i + 1)] if len(data) > count:
for i in range(count) inst_num_per_part = len(data) // count
] yield [
data[inst_num_per_part * i:inst_num_per_part * (i + 1)]
return __impl__ for i in range(count)
]
def split_data(data, num_part=dev_count):
""" return __impl__
Split data for each device.
"""
if len(data) == num_part: def split_data(data, num_part):
return data """
data = data[0] Split data for each device.
inst_num_per_part = len(data) // num_part """
return [ if len(data) == num_part:
data[inst_num_per_part * i:inst_num_per_part * (i + 1)] return data
for i in range(num_part) data = data[0]
] inst_num_per_part = len(data) // num_part
return [
data[inst_num_per_part * i:inst_num_per_part * (i + 1)]
for i in range(num_part)
]
def train(args):
dev_count = fluid.core.get_cuda_device_count()
    sum_cost, avg_cost, predict, token_num = transformer(
        ModelHyperParams.src_vocab_size, ModelHyperParams.trg_vocab_size,

@@ -254,7 +256,7 @@ def train(args):
        ModelHyperParams.n_head, ModelHyperParams.d_key,
        ModelHyperParams.d_value, ModelHyperParams.d_model,
        ModelHyperParams.d_inner_hid, ModelHyperParams.dropout,
        ModelHyperParams.weight_sharing, TrainTaskConfig.label_smooth_eps)

    lr_scheduler = LearningRateScheduler(ModelHyperParams.d_model,
                                         TrainTaskConfig.warmup_steps,

@@ -290,19 +292,27 @@ def train(args):
        unk_mark=args.special_token[2],
        max_length=ModelHyperParams.max_length,
        clip_last_batch=False)

    train_data = read_multiple(
        reader=train_data.batch_generator,
        count=dev_count if args.use_token_batch else 1)

    build_strategy = fluid.BuildStrategy()
    # Since the token number differs among devices, customize the gradient
    # scale to use the token-average cost among multi-devices: the gradient
    # scale is `1 / token_number` for the average cost.
    build_strategy.gradient_scale_strategy = fluid.BuildStrategy.GradientScaleStrategy.Customized

    train_exe = fluid.ParallelExecutor(
        use_cuda=TrainTaskConfig.use_gpu,
        loss_name=sum_cost.name,
        build_strategy=build_strategy)
    def test_context():
        # Context to do validation.
        test_program = fluid.default_main_program().clone(for_test=True)
        test_exe = fluid.ParallelExecutor(
            use_cuda=TrainTaskConfig.use_gpu,
            main_program=test_program,
            share_vars_from=train_exe)

        val_data = reader.DataReader(
            src_vocab_fpath=args.src_vocab_fpath,

@@ -321,18 +331,17 @@ def train(args):
            shuffle=False,
            shuffle_batch=False)

        def test(exe=test_exe):
            test_total_cost = 0
            test_total_token = 0
            test_data = read_multiple(
                reader=val_data.batch_generator,
                count=dev_count if args.use_token_batch else 1)
            for batch_id, data in enumerate(test_data()):
                feed_list = []
                for place_id, data_buffer in enumerate(
                        split_data(
                            data, num_part=dev_count)):
                    data_input_dict, util_input_dict, _ = prepare_batch_input(
                        data_buffer, data_input_names, util_input_names,
                        ModelHyperParams.eos_idx, ModelHyperParams.eos_idx,

@@ -365,7 +374,9 @@ def train(args):
            feed_list = []
            total_num_token = 0
            lr_rate = lr_scheduler.update_learning_rate()
            for place_id, data_buffer in enumerate(
                    split_data(
                        data, num_part=dev_count)):
                data_input_dict, util_input_dict, num_token = prepare_batch_input(
                    data_buffer, data_input_names, util_input_names,
                    ModelHyperParams.eos_idx, ModelHyperParams.eos_idx,

@@ -375,17 +386,14 @@ def train(args):
                    dict(data_input_dict.items() + util_input_dict.items() +
                         {lr_scheduler.learning_rate.name: lr_rate}.items()))

            if not init:  # init the position encoding table
                for pos_enc_param_name in pos_enc_param_names:
                    pos_enc = position_encoding_init(
                        ModelHyperParams.max_length + 1,
                        ModelHyperParams.d_model)
                    feed_list[place_id][pos_enc_param_name] = pos_enc

                for feed_dict in feed_list:
                    feed_dict[sum_cost.name + "@GRAD"] = 1. / total_num_token

            outs = train_exe.run(fetch_list=[sum_cost.name, token_num.name],
                                 feed=feed_list)
            sum_cost_val, token_num_val = np.array(outs[0]), np.array(outs[1])
......
# saved model
model/

pretrained/ssd_mobilenet_v1_coco.tar.gz
pretrained/ssd_mobilenet_v1_coco
pretrained/mobilenet_v1_imagenet.tar.gz
pretrained/mobilenet_v1_imagenet

# coco and pascalvoc data
data/coco/train2017/
data/coco/train2014/
data/coco/val2017/
data/coco/val2014/
data/coco/*.zip
data/coco/annotations/
data/pascalvoc/VOCdevkit/
data/pascalvoc/*.tar
data/pascalvoc/test.txt
data/pascalvoc/trainval.txt

log*
*.log
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

## SSD Object Detection
## Table of Contents
- [Introduction](#introduction)
- [Data Preparation](#data-preparation)
- [Train](#train)
- [Evaluate](#evaluate)
- [Infer and Visualize](#infer-and-visualize)
- [Released Model](#released-model)
### Introduction

[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detection framework. A single-stage detector casts object detection as a regression problem: it directly predicts bounding boxes and class probabilities without a region-proposal step. SSD further improves on this by producing predictions at different scales from different layers, as shown below. Predictions are made at six levels, on six feature maps of different scales. Each feature map feeds two 3x3 convolutional layers, which predict the category and a shape offset relative to the prior box (also called anchor), respectively. Thus we get 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detections per class.
<p align="center">
<img src="images/SSD_paper_figure.jpg" height=300 width=900 hspace='10'/> <br />
The Single Shot MultiBox Detector (SSD)
</p>
SSD is readily pluggable into a wide variety of standard convolutional networks, such as VGG, ResNet, or MobileNet, which are also called the base network or backbone. In this tutorial we use [MobileNet](https://arxiv.org/abs/1704.04861).
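As a quick check of the detection count quoted above, the per-class total follows directly from the six feature-map side lengths and the priors per location (plain Python, numbers taken from the paragraph above):

```python
# (feature-map side length, priors per location) for the six prediction levels
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
print(sum(side * side * priors for side, priors in feature_maps))  # 8732
```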
### Data Preparation

You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).
If you want to train a model on the PASCAL VOC dataset, please download the dataset first; skip this step if you already have it.
```bash
cd data/pascalvoc
./download.sh
```

The command `download.sh` will also create the training and testing file lists.
If you want to train a model on the MS-COCO dataset, please download the dataset first; skip this step if you already have it.
```
cd data/coco
./download.sh
```
#### Download the Pre-trained Model.
We provide two pre-trained models. The first is a MobileNet-v1 SSD trained on the COCO dataset, with the COCO-specific convolutional predictors removed. This model can be used to initialize models trained on other datasets, like PASCAL VOC. The other pre-trained model is MobileNet-v1 trained on the ImageNet 2012 dataset, with the weights and bias of the last fully-connected layer removed.
Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe). We will also release our own pre-trained models soon.
- Download MobileNet-v1 SSD:
  ```bash
  ./pretrained/download_coco.sh
  ```
- Download MobileNet-v1:
  ```bash
  ./pretrained/download_imagenet.sh
  ```
#### Train on PASCAL VOC
`train.py` is the main caller of the training module. Examples of usage are shown below.
```bash
python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
```
- Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify which GPUs to use.
- Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train the model on the MS-COCO dataset.
- For more help on arguments:
```bash
python train.py --help
```
The data reader is defined in `reader.py`. All images are resized to 300x300. In the training stage, images are randomly distorted, expanded, cropped, and flipped (see the sketch of the expand step after this list):
- distort: randomly perturb brightness, contrast, saturation, and hue.
- expand: paste the original image into a larger canvas initialized with the image mean.
- crop: crop the image with respect to different scales, aspect ratios, and overlaps.
- flip: flip horizontally.
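A minimal NumPy sketch of the expand step (the function name, maximum ratio, and BGR-mean argument are illustrative, and ground-truth boxes would need the same offset applied):

```python
import numpy as np

def expand(img, mean_bgr, max_ratio=4.0):
    # Paste img (HxWx3) into a larger mean-filled canvas at a random offset.
    h, w, c = img.shape
    ratio = np.random.uniform(1.0, max_ratio)
    oh, ow = int(h * ratio), int(w * ratio)
    canvas = np.zeros((oh, ow, c), dtype=img.dtype)
    canvas[:] = mean_bgr  # fill with the per-channel image mean
    off_y = np.random.randint(0, oh - h + 1)
    off_x = np.random.randint(0, ow - w + 1)
    canvas[off_y:off_y + h, off_x:off_x + w] = img
    return canvas
```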
We use the RMSProp optimizer with a mini-batch size of 64 to train MobileNet-SSD. The initial learning rate is 0.001 and is decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. Weight decay is 0.00005. After 120 epochs we achieve 73.32% mAP under the 11point metric.
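A sketch of this schedule in Fluid, mirroring the `piecewise_decay` plus RMSProp setup used in `train.py`; the 16551-image VOC07+12 trainval count and the epoch-to-iteration conversion are assumptions:

```python
import paddle.fluid as fluid

steps_per_epoch = 16551 // 64  # assumed VOC07+12 trainval size / batch size
boundaries = [e * steps_per_epoch for e in (40, 60, 80, 100)]
lr = 0.001
values = [lr * m for m in (1.0, 0.5, 0.25, 0.1, 0.01)]
optimizer = fluid.optimizer.RMSProp(
    learning_rate=fluid.layers.piecewise_decay(boundaries, values),
    regularization=fluid.regularizer.L2Decay(0.00005))
```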
### Evaluate

You can evaluate your trained model with different metrics, like 11point and integral, on both the PASCAL VOC and COCO datasets. Note that we set the default test list to the dataset's test/val list; you can use your own test list by setting the ```--test_list``` argument.

`eval.py` is the main caller of the evaluation module. Examples of usage are shown below.
```bash
python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
```
You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate on the COCO dataset. Moreover, we provide `eval_coco_map.py`, which uses a COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval). To use `eval_coco_map.py`, the [cocoapi](https://github.com/cocodataset/cocoapi) is needed.
Install the cocoapi:
```
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```
### Infer and Visualize

`infer.py` is the main caller of the inference module. Examples of usage are shown below.
```bash
python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
```

Below are examples of running inference and visualizing the model results.
<p align="center">
<img src="images/009943.jpg" height=300 width=400 hspace='10'/>
<img src="images/009956.jpg" height=300 width=400 hspace='10'/>
<img src="images/009960.jpg" height=300 width=400 hspace='10'/>
<img src="images/009962.jpg" height=300 width=400 hspace='10'/> <br />
MobileNet-v1-SSD 300x300 Visualization Examples
</p>
### Released Model

| Model                    | Pre-trained Model  | Training data    | Test data    | mAP  |
|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% |
Running the sample code in this directory requires the latest PaddlePaddle develop branch. If your installed PaddlePaddle is older than this requirement, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
## SSD Object Detection

## Table of Contents
- [Introduction](#introduction)
- [Data Preparation](#data-preparation)
- [Model Training](#model-training)
- [Model Evaluation](#model-evaluation)
- [Inference and Visualization](#inference-and-visualization)
- [Released Model](#released-model)

### Introduction

[Single Shot MultiBox Detector (SSD)](https://arxiv.org/abs/1512.02325) is a single-stage object detector. Unlike two-stage methods, single-stage detection performs no region proposal; it regresses the objects' bounding boxes and class probabilities directly from the feature maps. SSD applies this single-stage idea and improves on it by detecting objects of matching scales on feature maps of different scales. As shown below, SSD makes predictions at six levels, on feature maps of six scales. At each level, two 3x3 convolutions regress the object category and the bounding-box offset, respectively. Hence for each class, SSD's six levels together produce 38x38x4 + 19x19x6 + 10x10x6 + 5x5x6 + 3x3x4 + 1x1x4 = 8732 detection results.
<p align="center">
<img src="images/SSD_paper_figure.jpg" height=300 width=900 hspace='10'/> <br />
The SSD object detection model
</p>
SSD plugs conveniently into any standard convolutional network, such as VGG, ResNet, or MobileNet; such a network is called the detector's base network. In this example we use [MobileNet](https://arxiv.org/abs/1704.04861).

### Data Preparation

You can use the [PASCAL VOC dataset](http://host.robots.ox.ac.uk/pascal/VOC/) or the [MS-COCO dataset](http://cocodataset.org/#download).

To train on the PASCAL VOC dataset, first download it with the following commands.
```bash
cd data/pascalvoc
./download.sh
```
The `download.sh` command also creates the training and testing list files automatically.

To train on the MS-COCO dataset, first download it with the following commands.
```
cd data/coco
./download.sh
```
### Model Training

#### Download the Pre-trained Model

We provide two pre-trained models. The first is a MobileNet-v1 SSD pre-trained on the COCO dataset, with its prediction heads removed so that it can be trained on datasets other than COCO. The second is a MobileNet-v1 pre-trained on the ImageNet 2012 dataset, with its final fully-connected layer removed so that it can be trained for object detection.

Declaration: the MobileNet-v1 SSD model is converted from the [TensorFlow model](https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/g3doc/detection_model_zoo.md). The MobileNet-v1 model is converted from [Caffe](https://github.com/shicai/MobileNet-Caffe). We will also release our own pre-trained models soon.

- Download MobileNet-v1 SSD:
```bash
./pretrained/download_coco.sh
```
- Download MobileNet-v1:
```bash
./pretrained/download_imagenet.sh
```
#### Training on PASCAL VOC

`train.py` is the main entry point of the training module; an example invocation:
```bash
python -u train.py --batch_size=64 --dataset='pascalvoc' --pretrained_model='pretrained/ssd_mobilenet_v1_coco/'
```
- Set ```export CUDA_VISIBLE_DEVICES=0,1``` to specify which GPUs to use.
- Set ```--dataset='coco2014'``` or ```--dataset='coco2017'``` to train on the MS-COCO dataset.
- For more options, run:
```bash
python train.py --help
```
Data reading is defined in `reader.py`; every image is resized to 300x300. During training, images are further augmented by random distortion, expansion, flipping, and cropping:
- distort: perturb the image's brightness, contrast, saturation, and hue.
- expand: paste the original image into a larger canvas filled with the pixel mean (which is subtracted again later in the mean-subtraction step), then crop, resize, and flip the result.
- flip: flip horizontally.
- crop: generate candidate boxes from scale and aspect-ratio parameters, then keep crops whose IoU with the annotated boxes meets the requirement.

We train MobileNet-SSD with the RMSProp optimizer, a batch size of 64, a weight-decay coefficient of 0.00005, and an initial learning rate of 0.001, decayed at epochs 40, 60, 80, and 100 with multipliers 0.5, 0.25, 0.1, and 0.01, respectively. After 120 epochs of training, the mAP under the 11point metric is 73.32%.

### Model Evaluation

You can evaluate the trained model on the PASCAL VOC and COCO datasets with metrics such as 11point and integral. Without loss of generality, the sample code defaults to the corresponding dataset's test list; you can specify your own list via ```--test_list```.
`eval.py` is the main entry point of the evaluation module; an example invocation:
```bash
python eval.py --dataset='pascalvoc' --model_dir='train_pascal_model/best_model' --data_dir='data/pascalvoc' --test_list='test.txt' --ap_version='11point' --nms_threshold=0.45
```
You can set ```--dataset``` to ```coco2014``` or ```coco2017``` to evaluate on COCO. We also provide `eval_coco_map.py` for the [official COCO evaluation](http://cocodataset.org/#detections-eval). To use eval_coco_map.py, first download the [cocoapi](https://github.com/cocodataset/cocoapi):
```
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
```
### Inference and Visualization

`infer.py` is the main entry point of the inference and visualization module; an example invocation:
```bash
python infer.py --dataset='pascalvoc' --nms_threshold=0.45 --model_dir='train_pascal_model/best_model' --image_path='./data/pascalvoc/VOCdevkit/VOC2007/JPEGImages/009963.jpg'
```
The figures below visualize the model's predictions:
<p align="center">
<img src="images/009943.jpg" height=300 width=400 hspace='10'/>
<img src="images/009956.jpg" height=300 width=400 hspace='10'/>
<img src="images/009960.jpg" height=300 width=400 hspace='10'/>
<img src="images/009962.jpg" height=300 width=400 hspace='10'/> <br />
MobileNet-v1-SSD 300x300 visualization of predictions
</p>
### Released Model

| Model                    | Pre-trained Model  | Training data    | Test data    | mAP  |
|:------------------------:|:------------------:|:----------------:|:------------:|:----:|
|[MobileNet-v1-SSD 300x300](http://paddlemodels.bj.bcebos.com/ssd_mobilenet_v1_pascalvoc.tar.gz) | COCO MobileNet SSD | VOC07+12 trainval| VOC07 test | 73.32% |
@@ -64,6 +64,7 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
        place=place, feed_list=[image, gt_box, gt_label, difficult])

    def test():
        # switch network to test mode (i.e. batch norm test mode)
        test_program = fluid.default_main_program().clone(for_test=True)
        with fluid.program_guard(test_program):
            map_eval = fluid.evaluator.DetectionMAP(

@@ -79,12 +80,12 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
        _, accum_map = map_eval.get_map_var()
        map_eval.reset(exe)
        for batch_id, data in enumerate(test_reader()):
            test_map, = exe.run(test_program,
                                feed=feeder.feed(data),
                                fetch_list=[accum_map])
            if batch_id % 20 == 0:
                print("Batch {0}, map {1}".format(batch_id, test_map))
        print("Test model {0}, map {1}".format(model_dir, test_map))

    test()

@@ -96,10 +97,14 @@ def eval(args, data_args, test_list, batch_size, model_dir=None):
    data_dir = 'data/pascalvoc'
    test_list = 'test.txt'
    label_file = 'label_list'

    if not os.path.exists(args.model_dir):
        raise ValueError("The model path [%s] does not exist." %
                         (args.model_dir))
    if 'coco' in args.dataset:
        data_dir = 'data/coco'
        if '2014' in args.dataset:
            test_list = 'annotations/instances_val2014.json'
        elif '2017' in args.dataset:
            test_list = 'annotations/instances_val2017.json'
......
@@ -133,7 +133,7 @@ if __name__ == '__main__':
        data_dir = './data/coco'
        if '2014' in args.dataset:
            test_list = 'annotations/instances_val2014.json'
        elif '2017' in args.dataset:
            test_list = 'annotations/instances_val2017.json'
......
@@ -37,9 +37,11 @@ def bbox_area(src_bbox):
def generate_sample(sampler):
    scale = random.uniform(sampler.min_scale, sampler.max_scale)
    aspect_ratio = random.uniform(sampler.min_aspect_ratio,
                                  sampler.max_aspect_ratio)
    aspect_ratio = max(aspect_ratio, (scale**2.0))
    aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
    bbox_width = scale * (aspect_ratio**0.5)
    bbox_height = scale / (aspect_ratio**0.5)
    xmin_bound = 1 - bbox_width
......
@@ -5,6 +5,7 @@ import argparse
import functools
from PIL import Image
from PIL import ImageDraw
from PIL import ImageFont

import paddle
import paddle.fluid as fluid

@@ -20,7 +21,7 @@ add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('image_path', str, '', "The image used for inference and visualization.")
add_arg('model_dir', str, '', "The model path.")
add_arg('nms_threshold', float, 0.45, "NMS threshold.")
add_arg('confs_threshold', float, 0.5, "Confidence threshold to draw bbox.")
add_arg('resize_h', int, 300, "The resized image height.")
add_arg('resize_w', int, 300, "The resized image width.")
add_arg('mean_value_B', float, 127.5, "Mean value for B channel which will be subtracted.")  #123.68

@@ -33,8 +34,20 @@ def infer(args, data_args, image_path, model_dir):
    image_shape = [3, data_args.resize_h, data_args.resize_w]
    if 'coco' in data_args.dataset:
        num_classes = 91
        # cocoapi
        from pycocotools.coco import COCO
        from pycocotools.cocoeval import COCOeval
        label_fpath = os.path.join(data_dir, label_file)
        coco = COCO(label_fpath)
        category_ids = coco.getCatIds()
        label_list = {
            item['id']: item['name']
            for item in coco.loadCats(category_ids)
        }
        label_list[0] = ['background']
    elif 'pascalvoc' in data_args.dataset:
        num_classes = 21
        label_list = data_args.label_list

    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)

@@ -52,22 +65,21 @@ def infer(args, data_args, image_path, model_dir):
    infer_reader = reader.infer(data_args, image_path)
    feeder = fluid.DataFeeder(place=place, feed_list=[image])

    data = infer_reader()

    # switch network to test mode (i.e. batch norm test mode)
    test_program = fluid.default_main_program().clone(for_test=True)
    nmsed_out_v, = exe.run(test_program,
                           feed=feeder.feed([[data]]),
                           fetch_list=[nmsed_out],
                           return_numpy=False)
    nmsed_out_v = np.array(nmsed_out_v)
    draw_bounding_box_on_image(image_path, nmsed_out_v, args.confs_threshold,
                               label_list)


def draw_bounding_box_on_image(image_path, nms_out, confs_threshold,
                               label_list):
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    im_width, im_height = image.size

@@ -85,6 +97,8 @@ def draw_bounding_box_on_image(image_path, nms_out, confs_threshold):
                   (left, top)],
                  width=4,
                  fill='red')
        if image.mode == 'RGB':
            draw.text((left, top), label_list[int(category_id)], (255, 255, 0))
    image_name = image_path.split('/')[-1]
    print("image with bbox drawn saved as {}".format(image_name))
    image.save(image_name)

@@ -94,10 +108,20 @@ if __name__ == '__main__':
    args = parser.parse_args()
    print_arguments(args)

    data_dir = 'data/pascalvoc'
    label_file = 'label_list'

    if not os.path.exists(args.model_dir):
        raise ValueError("The model path [%s] does not exist." %
                         (args.model_dir))
    if 'coco' in args.dataset:
        data_dir = 'data/coco'
        label_file = 'annotations/instances_val2014.json'

    data_args = reader.Settings(
        dataset=args.dataset,
        data_dir=data_dir,
        label_file=label_file,
        resize_h=args.resize_h,
        resize_w=args.resize_w,
        mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R],
......
@@ -15,18 +15,17 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('learning_rate', float, 0.001, "Learning rate.")
add_arg('batch_size', int, 64, "Minibatch size.")
add_arg('num_passes', int, 120, "Epoch number.")
add_arg('use_gpu', bool, True, "Whether use GPU.")
add_arg('parallel', bool, True, "Parallel.")
add_arg('dataset', str, 'pascalvoc', "coco2014, coco2017, and pascalvoc.")
add_arg('model_save_dir', str, 'model', "The path to save model.")
add_arg('pretrained_model', str, 'pretrained/ssd_mobilenet_v1_coco/', "The init model path.")
add_arg('apply_distort', bool, True, "Whether apply distort.")
add_arg('apply_expand', bool, True, "Whether apply expand.")
add_arg('nms_threshold', float, 0.45, "NMS threshold.")
add_arg('ap_version', str, '11point', "integral, 11point.")
add_arg('resize_h', int, 300, "The resized image height.")
add_arg('resize_w', int, 300, "The resized image width.")
add_arg('mean_value_B', float, 127.5, "Mean value for B channel which will be subtracted.")  #123.68

@@ -35,141 +34,16 @@ add_arg('mean_value_R', float, 127.5, "Mean value for R channel which will
add_arg('is_toy', int, 0, "Toy for quick debug, 0 means using all data, while n means using only n sample.")
#yapf: enable
def train(args,
          train_file_list,
          val_file_list,
          data_args,
          learning_rate,
          batch_size,
          num_passes,
          model_save_dir,
          pretrained_model=None):
    image_shape = [3, data_args.resize_h, data_args.resize_w]
    if 'coco' in data_args.dataset:
        num_classes = 91

@@ -186,10 +60,6 @@
        name='gt_label', shape=[1], dtype='int32', lod_level=1)
    difficult = fluid.layers.data(
        name='gt_difficult', shape=[1], dtype='int32', lod_level=1)

    locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
    nmsed_out = fluid.layers.detection_output(

@@ -267,15 +137,15 @@
        _, accum_map = map_eval.get_map_var()
        map_eval.reset(exe)
        for batch_id, data in enumerate(test_reader()):
            test_map, = exe.run(test_program,
                                feed=feeder.feed(data),
                                fetch_list=[accum_map])
            if batch_id % 20 == 0:
                print("Batch {0}, map {1}".format(batch_id, test_map))
        if test_map[0] > best_map:
            best_map = test_map[0]
            save_model('best_model')
        print("Pass {0}, test map {1}".format(pass_id, test_map))
        return best_map

    for pass_id in range(num_passes):

@@ -285,10 +155,12 @@
        for batch_id, data in enumerate(train_reader()):
            prev_start_time = start_time
            start_time = time.time()
            if len(data) < (devices_num * 2):
                print("There are too few data to train on all devices.")
                continue
            if args.parallel:
                loss_v, = train_exe.run(fetch_list=[loss.name],
                                        feed=feeder.feed(data))
            else:
                loss_v, = exe.run(fluid.default_main_program(),
                                  feed=feeder.feed(data),

@@ -314,10 +186,10 @@ if __name__ == '__main__':
    label_file = 'label_list'
    model_save_dir = args.model_save_dir
    if 'coco' in args.dataset:
        data_dir = 'data/coco'
        if '2014' in args.dataset:
            train_file_list = 'annotations/instances_train2014.json'
            val_file_list = 'annotations/instances_val2014.json'
        elif '2017' in args.dataset:
            train_file_list = 'annotations/instances_train2017.json'
            val_file_list = 'annotations/instances_val2017.json'

@@ -333,8 +205,7 @@ if __name__ == '__main__':
        apply_expand=args.apply_expand,
        ap_version=args.ap_version,
        toy=args.is_toy)
    train(
        args,
        train_file_list=train_file_list,
        val_file_list=val_file_list,
......
 
Running the sample code in this directory requires the latest PaddlePaddle develop version. If your installed PaddlePaddle is older than this requirement, please update it following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).

## Code Structure
```
├── ctc_reader.py      # downloads, reads, and preprocesses the data
├── crnn_ctc_model.py  # defines the training, inference, and evaluation networks
├── ctc_train.py       # trains the model
├── infer.py           # loads a trained model and runs inference on new data
├── eval.py            # evaluates the model on a given dataset
└── utils.py           # common utility functions
```

## Introduction

This example recognizes images containing a single line of Chinese characters: convolutions first turn the image into feature maps, the `im2sequence` op converts the feature maps into a sequence, and a bidirectional GRU learns features over that sequence. Training uses the CTC (Connectionist Temporal Classification) loss, and the final evaluation metric is the sample-level error rate.
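A condensed sketch of this pipeline in Fluid; the layer counts and sizes here are illustrative and do not match the exact configuration in `crnn_ctc_model.py`:

```python
import paddle.fluid as fluid

num_classes = 95
hid = 128

images = fluid.layers.data(name='pixel', shape=[1, 48, 512], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1)

conv = fluid.layers.conv2d(
    input=images, num_filters=64, filter_size=3, padding=1, act='relu')
# Turn each column of the feature map into one timestep of a sequence.
seq = fluid.layers.im2sequence(
    input=conv, stride=[1, 1], filter_size=[conv.shape[2], 1])
# Bidirectional GRU over the column sequence (gate projection is 3x hidden).
fc_fw = fluid.layers.fc(input=seq, size=hid * 3)
fc_bw = fluid.layers.fc(input=seq, size=hid * 3)
gru_fw = fluid.layers.dynamic_gru(input=fc_fw, size=hid)
gru_bw = fluid.layers.dynamic_gru(input=fc_bw, size=hid, is_reverse=True)
# Per-timestep class scores (+1 for the CTC blank), then the CTC loss.
fc_out = fluid.layers.fc(input=[gru_fw, gru_bw], size=num_classes + 1)
cost = fluid.layers.warpctc(
    input=fc_out, label=label, blank=num_classes, norm_by_times=True)
```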
## Data

Downloading and light preprocessing of the data are both implemented in `ctc_reader.py`.

### Data Example

The training and test data we use are shown in Figure 1: each image contains a single line of text of indefinite length, pre-cropped by a detection algorithm.
<p align="center">
<img src="images/demo.jpg" width="620" hspace='10'/> <br/>

@@ -35,12 +34,12 @@
In the training set, the label of each image consists of the indices of its characters in the dictionary. The label corresponding to Figure 1 is shown below:

```
80,84,68,82,83,72,78,77,68,67
```

In the label above, `80` is the index of the character `Q` and `67` is the index of the character `D`.
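For illustration only, the indices can be mapped back to characters through the dictionary; the `data/label_list` path and its one-character-per-line layout are assumptions here:

```python
# Hypothetical decode of the label line above, assuming the dictionary file
# stores one character per line so that the line number is the index.
with open('data/label_list') as f:
    label_list = [line.rstrip('\n') for line in f]

indices = [80, 84, 68, 82, 83, 72, 78, 77, 68, 67]
print(''.join(label_list[i] for i in indices))  # begins with 'Q' per the text
```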
### Data Preparation

**A. Training set**

@@ -105,7 +104,9 @@ data/test_images/00003.jpg
Third: read the path of an image from stdin and run inference on it once.

## Model Training and Inference

### Training

Train on a single GPU with the default data:

@@ -121,7 +122,7 @@ env CUDA_VISIABLE_DEVICES=0,1,2,3 python ctc_train.py --parallel=True
Run `python ctc_train.py --help` for more usage patterns and detailed argument descriptions.

Figure 2 shows the convergence curves when training with the default arguments and the default dataset; the horizontal axis is the number of training iterations and the vertical axis is the sample-level error rate. The blue line is the error rate on the training set, and the red line is the error rate on the test set. Over 60 epochs of training, the lowest test error rate is 22.0%, reached at epoch 32.
<p align="center">
<img src="images/train.jpg" width="620" hspace='10'/> <br/>

@@ -130,7 +131,7 @@ env CUDA_VISIABLE_DEVICES=0,1,2,3 python ctc_train.py --parallel=True
## Testing

Evaluate the model on a specified dataset by invoking the evaluation script as follows:

@@ -144,7 +145,7 @@ env CUDA_VISIBLE_DEVICE=0 python eval.py \
Run `python eval.py --help` for detailed argument descriptions.

### Inference

Read the path of an image from standard input and run inference on it:

@@ -176,5 +177,3 @@ env CUDA_VISIBLE_DEVICE=0 python infer.py \
    --model_path="models/model_00044_15000" \
    --input_images_list="data/test.list"
```

> Note: for copyright reasons, we have temporarily stopped providing the Chinese dataset for download; the data automatically downloaded via `ctc_reader.py` is an English dataset with 300K images. Training results on the English dataset will be released later.
import paddle.fluid as fluid
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.initializer import init_on_cpu
import math


def conv_bn_pool(input,

@@ -8,7 +11,8 @@ def conv_bn_pool(input,
                 param=None,
                 bias=None,
                 param_0=None,
                 is_test=False,
                 pooling=True):
    tmp = input
    for i in xrange(group):
        tmp = fluid.layers.conv2d(

@@ -19,32 +23,25 @@ def conv_bn_pool(input,
            param_attr=param if param_0 is None else param_0,
            act=None,  # LinearActivation
            use_cudnn=True)
        tmp = fluid.layers.batch_norm(
            input=tmp,
            act=act,
            param_attr=param,
            bias_attr=bias,
            is_test=is_test)
    if pooling:
        tmp = fluid.layers.pool2d(
            input=tmp,
            pool_size=2,
            pool_type='max',
            pool_stride=2,
            use_cudnn=True,
            ceil_mode=True)

    return tmp


def ocr_convs(input, regularizer=None, gradient_clip=None, is_test=False):
    b = fluid.ParamAttr(
        regularizer=regularizer,
        gradient_clip=gradient_clip,

@@ -63,7 +60,8 @@ def ocr_convs(input,
    tmp = conv_bn_pool(tmp, 2, [32, 32], param=w1, bias=b, is_test=is_test)
    tmp = conv_bn_pool(tmp, 2, [64, 64], param=w1, bias=b, is_test=is_test)
    tmp = conv_bn_pool(
        tmp, 2, [128, 128], param=w1, bias=b, is_test=is_test, pooling=False)
    return tmp

@@ -75,8 +73,6 @@ def encoder_net(images,
                is_test=False):
    conv_features = ocr_convs(
        images,
        regularizer=regularizer,
        gradient_clip=gradient_clip,
        is_test=is_test)

@@ -143,6 +139,7 @@ def ctc_train_net(images, label, args, num_classes):
    L2_RATE = 0.0004
    LR = 1.0e-3
    MOMENTUM = 0.9
    learning_rate_decay = None
    regularizer = fluid.regularizer.L2Decay(L2_RATE)

    fc_out = encoder_net(images, num_classes, regularizer=regularizer)

@@ -155,7 +152,15 @@ def ctc_train_net(images, label, args, num_classes):
    error_evaluator = fluid.evaluator.EditDistance(
        input=decoded_out, label=casted_label)
    inference_program = fluid.default_main_program().clone(for_test=True)
    if learning_rate_decay == "piecewise_decay":
        learning_rate = fluid.layers.piecewise_decay([
            args.total_step / 4, args.total_step / 2, args.total_step * 3 / 4
        ], [LR, LR * 0.1, LR * 0.01, LR * 0.001])
    else:
        learning_rate = LR

    optimizer = fluid.optimizer.Momentum(
        learning_rate=learning_rate, momentum=MOMENTUM)
    _, params_grads = optimizer.minimize(sum_cost)
    model_average = None
    if args.average_window > 0:
......
@@ -7,7 +7,7 @@ from os import path
from paddle.v2.image import load_image
import paddle.v2 as paddle

NUM_CLASSES = 95
DATA_SHAPE = [1, 48, 512]

DATA_MD5 = "7256b1d5420d8c3e74815196e58cdad5"
......
@@ -14,7 +14,7 @@ parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('batch_size', int, 32, "Minibatch size.")
add_arg('total_step', int, 720000, "Number of training iterations.")
add_arg('log_period', int, 1000, "Log period.")
add_arg('save_model_period', int, 15000, "Save model period. '-1' means never saving the model.")
add_arg('eval_period', int, 15000, "Evaluate period. '-1' means never evaluating the model.")

@@ -22,7 +22,7 @@ add_arg('save_model_dir', str, "./models", "The directory the model to be s
add_arg('init_model', str, None, "The init model file of directory.")
add_arg('use_gpu', bool, True, "Whether use GPU to train.")
add_arg('min_average_window', int, 10000, "Min average window.")
add_arg('max_average_window', int, 12500, "Max average window. It is proposed to be set as the number of minibatch in a pass.")
add_arg('average_window', float, 0.15, "Average window.")
add_arg('parallel', bool, False, "Whether use parallel training.")
# yapf: enable
@@ -71,6 +71,7 @@ def train(args, data_reader=ctc_reader):
        print "Init model from: %s." % args.init_model

    train_exe = exe
    error_evaluator.reset(exe)
    if args.parallel:
        train_exe = fluid.ParallelExecutor(
            use_cuda=True, loss_name=sum_cost.name)

@@ -81,7 +82,7 @@ def train(args, data_reader=ctc_reader):
        var_names = [var.name for var in fetch_vars]
        if args.parallel:
            results = train_exe.run(var_names,
                                    feed=get_feeder_data(data, place))
            results = [np.array(result).sum() for result in results]
        else:
            results = exe.run(feed=get_feeder_data(data, place),

@@ -89,55 +90,57 @@ def train(args, data_reader=ctc_reader):
            results = [result[0] for result in results]
        return results

    def test(iter_num):
        error_evaluator.reset(exe)
        for data in test_reader():
            exe.run(inference_program, feed=get_feeder_data(data, place))
        _, test_seq_error = error_evaluator.eval(exe)
        print "\nTime: %s; Iter[%d]; Test seq error: %s.\n" % (
            time.time(), iter_num, str(test_seq_error[0]))

    def save_model(args, exe, iter_num):
        filename = "model_%05d" % iter_num
        fluid.io.save_params(
            exe, dirname=args.save_model_dir, filename=filename)
        print "Saved model to: %s/%s." % (args.save_model_dir, filename)

    iter_num = 0
    while True:
        total_loss = 0.0
        total_seq_error = 0.0
        # train a pass
        for data in train_reader():
            iter_num += 1
            if iter_num > args.total_step:
                return
            results = train_one_batch(data)
            total_loss += results[0]
            total_seq_error += results[2]
            # training log
            if iter_num % args.log_period == 0:
                print "\nTime: %s; Iter[%d]; Avg Warp-CTC loss: %.3f; Avg seq err: %.3f" % (
                    time.time(), iter_num,
                    total_loss / (args.log_period * args.batch_size),
                    total_seq_error / (args.log_period * args.batch_size))
                sys.stdout.flush()
                total_loss = 0.0
                total_seq_error = 0.0

            # evaluate
            if iter_num % args.eval_period == 0:
                if model_average:
                    with model_average.apply(exe):
                        test(iter_num)
                else:
                    test(iter_num)

            # save model
            if iter_num % args.save_model_period == 0:
                if model_average:
                    with model_average.apply(exe):
                        save_model(args, exe, iter_num)
                else:
                    save_model(args, exe, iter_num)


def main():
......
@@ -35,7 +35,7 @@ def evaluate(args, eval=ctc_eval, data_reader=ctc_reader):
    # prepare environment
    place = fluid.CPUPlace()
    if args.use_gpu:
        place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
......
@@ -238,7 +238,7 @@ def lstm_net(data,
        size=[dict_dim, emb_dim],
        param_attr=fluid.ParamAttr(learning_rate=emb_lr))

    fc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)

    lstm_h, c = fluid.layers.dynamic_lstm(
        input=fc0, size=hid_dim * 4, is_reverse=False)

@@ -273,9 +273,9 @@ def bilstm_net(data,
        size=[dict_dim, emb_dim],
        param_attr=fluid.ParamAttr(learning_rate=emb_lr))

    fc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)

    rfc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)

    lstm_h, c = fluid.layers.dynamic_lstm(
        input=fc0, size=hid_dim * 4, is_reverse=False)
......
```python
# in lstm_net(data, ...):
        size=[dict_dim, emb_dim],
        param_attr=fluid.ParamAttr(learning_rate=emb_lr))
    fc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)
    lstm_h, c = fluid.layers.dynamic_lstm(
        input=fc0, size=hid_dim * 4, is_reverse=False)
```
Running the sample code in this directory requires PaddlePaddle v0.10.0 or later. If the PaddlePaddle on your device is older than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update it.
---
# Chinese Ancient Poetry Generation
## Introduction
We perform sequence-to-sequence training on an encoder-decoder neural network model using The Complete Tang Poems: given an input verse, the model should generate the next verse.
Both the encoder and the decoder in the model are stacked bi-directional LSTMs with attention, which by default have three layers.
The following is a brief directory structure and description of this example:
```text
.
├── data # store training data and dictionary
│ ├── download.sh # download raw data
├── README.md # documentation
├── index.html # document (html format)
├── preprocess.py # raw data preprocessing
├── generate.py # generate verse script
├── network_conf.py # model definition
├── reader.py # data reading interface
├── train.py # training script
└── utils.py # define utility functions
```
## Data Processing
### Raw Data Source
The training data of this example is The Complete Tang Poems in the [Chinese ancient poetry database](https://github.com/chinese-poetry/chinese-poetry). There are about 54,000 Tang poems.
### Downloading Raw Data
```bash
cd data && ./download.sh && cd ..
```
### Data Preprocessing
```bash
python preprocess.py --datadir data/raw --outfile data/poems.txt --dictfile data/dict.txt
```
After the above script is executed, the processed training data "poems.txt" and the dictionary "dict.txt" will be generated. The dictionary is built at the word level from words that occur at least 10 times.
Each line in poems.txt has three columns containing the title, author, and content of a poem. Verses of a poem are separated by `.`.
Training data example:
```text
登鸛雀樓 王之渙 白日依山盡.黃河入海流.欲窮千里目.更上一層樓
觀獵 李白 太守耀清威.乘閑弄晚暉.江沙橫獵騎.山火遶行圍.箭逐雲鴻落.鷹隨月兔飛.不知白日暮.歡賞夜方歸
晦日重宴 陳嘉言 高門引冠蓋.下客抱支離.綺席珍羞滿.文場翰藻摛.蓂華彫上月.柳色藹春池.日斜歸戚里.連騎勒金羈
```
When the model is trained, each verse is used as a model input, and the next verse is used as a prediction target.
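A minimal sketch of how such (input, target) pairs could be formed from one line of poems.txt (the helper below is illustrative only; reader.py implements the repository's actual logic):
```python
def verse_pairs(line):
    # line format: "<title> <author> <verse1.verse2....>"
    title, author, content = line.strip().split()
    verses = content.split('.')
    # each verse is an input; the verse that follows it is the prediction target
    return [(verses[i], verses[i + 1]) for i in range(len(verses) - 1)]

pairs = verse_pairs(u"登鸛雀樓 王之渙 白日依山盡.黃河入海流.欲窮千里目.更上一層樓")
```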
## Model Training
The command line arguments in the training script, ["train.py"](./train.py), can be viewed with `python train.py --help`. The main parameters are as follows:
- `num_passes`: number of passes
- `batch_size`: batch size
- `use_gpu`: whether to use GPU
- `trainer_count`: number of trainers, the default is 1
- `save_dir_path`: model storage path, the default is the current directory under the models directory
- `encoder_depth`: model encoder LSTM depth, default 3
- `decoder_depth`: model decoder LSTM depth, default 3
- `train_data_path`: training data path
- `word_dict_path`: data dictionary path
- `init_model_path`: initial model path, no need to specify at the start of training
### Training Execution
```bash
python train.py \
--num_passes 50 \
--batch_size 256 \
--use_gpu True \
--trainer_count 1 \
--save_dir_path models \
--train_data_path data/poems.txt \
--word_dict_path data/dict.txt \
2>&1 | tee train.log
```
After each training pass, the model parameters are saved in the "models" directory. Training logs are stored in "train.log".
### Optimal Model Parameters
Find the pass with the lowest cost and use the model parameters corresponding to the pass for subsequent prediction.
```bash
python -c 'import utils; utils.find_optiaml_pass("./train.log")'
```
## Generating Verses
Use the ["generate.py"](./generate.py) script to generate the next verse for the input verses. Command line arguments can be viewed with `python generate.py --help`.
The main parameters are described as follows:
- `model_path`: trained model parameter file
- `word_dict_path`: data dictionary path
- `test_data_path`: input data path
- `batch_size`: batch size, default is 1
- `beam_size`: search size in beam search, the default is 5
- `save_file`: output save path
- `use_gpu`: whether to use GPU
### Perform Generation
For example, save the verse `孤帆遠影碧空盡` in the file `input.txt` as input. To predict the next verse, execute the command:
```bash
python generate.py \
--model_path models/pass_00049.tar.gz \
--word_dict_path data/dict.txt \
--test_data_path input.txt \
--save_file output.txt
```
The result will be saved in the file "output.txt". For the above example input, the generated verses are as follows:
```text
-9.6987 萬 壑 清 風 黃 葉 多
-10.0737 萬 里 遠 山 紅 葉 深
-10.4233 萬 壑 清 波 紅 一 流
-10.4802 萬 壑 清 風 黃 葉 深
-10.9060 萬 壑 清 風 紅 葉 多
```
Running the sample program in this directory requires PaddlePaddle v0.10.0. If your version is below this requirement, please follow the [instructions](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) in the installation document to update your PaddlePaddle installation.
# Learning To Rank
Learning to rank [1] is a family of machine learning methods for building ranking models, and it plays an important role in computer science applications such as information retrieval, natural language processing, and data mining. The primary purpose of learning to rank is to order a given set of documents so that the order reflects their relevance to a query. In this example, we use an annotated corpus to train two classical ranking models, RankNet [4] and LambdaRank [6]; the resulting models can then rank the relevant documents for any query.
## Background Information
Learning to rank is an application of machine learning. Hand-written ranking rules can neither handle large sets of candidate items nor assign appropriate weights to candidates from different channels, so learned ranking models are widely used in daily life. Learning to rank originated in the field of information retrieval and is still a core part of many information retrieval systems, such as ranking search results in a search engine, ranking candidate items in a recommendation system, and ordering online advertisements. In this example, we use a document retrieval task to illustrate learning-to-rank models.
![image](https://github.com/PaddlePaddle/models/blob/develop/ltr/images/search_engine_example.png?raw=true)
Figure 1. The role of the ranking model in document retrieval, a typical application in search engines.
Assume that there is a set of documents $S$. The document retrieval task is to order the documents by their relevance to a query. Given a query, the ranking engine scores every document and arranges the documents in descending order of score to produce the query result. Given queries and their corresponding documents, a model is trained on the scored document orderings; in the prediction phase, the model produces a document ranking for each incoming query. Common learning-to-rank methods fall into the following three categories.
- Pointwise approach
In this case, the learning-to-rank problem can be viewed as a regression problem. A single input sample is a **query-document pair with its score**: the relevance score of each query-document pair is treated as a real number or an ordinal, so each individual query-document pair is used as a sample point (the origin of the name pointwise) to train the ranking model. At prediction time, the model outputs the relevance score for the given query-document pair.
- Pairwise approach
In this case, the learning-to-rank problem is approximated by a classification problem: learning a binary classifier that tells which document is better in a given pair of documents. A single input sample is a **label-document pair**: for the result documents of one query, any two documents are combined to form a document pair used as an input sample. That is, we learn a binary classifier whose input is a pair of documents A-B (the origin of the name pairwise) and which outputs the label 1 or 0 according to whether A is more relevant than B. After classifying all document pairs, we obtain a set of partial-order relations from which the ordering of the documents is constructed. The principle of this kind of method is to reduce the number of inversely ordered document pairs in the ranking of the document set $S$, thereby optimizing the ranking result.
- Listwise approach
These algorithms try to directly optimize one of the ranking evaluation measures, averaged over all queries in the training data. A single input sample is an **ordered list of documents**. By constructing an appropriate loss function that measures the difference between the current document ranking and the optimal ranking, the model is trained by optimizing that measure. This is difficult because most ranking metrics are not continuous, smooth functions.
![image](https://github.com/PaddlePaddle/models/blob/develop/ltr/images/learning_to_rank.jpg?raw=true)
Figure 2. The three approaches to building ranking models.
## Experimental data
The experiments in this example use the LETOR benchmark corpus for learning to rank. Part of the query results come from the Gov2 website; the corpus contains about 1,700 query requests with their result document lists, and the relevance of the documents has been annotated manually. Each query has a unique query id and corresponds to a number of related documents, forming a result list for the query. The feature vector of each document is represented by a one-dimensional array, together with a human-annotated relevance score with respect to the query.
This example automatically downloads and caches the LETOR MQ2007 dataset on first run; no manual download is needed.
The **mq2007** dataset provides a generation format for each of the three types of ranking models; the **format** argument selects among them.
For example, the call interface is:
```
pairwise_train_dataset = functools.partial(paddle.dataset.mq2007.train, format="pairwise")
for label, left_doc, right_doc in pairwise_train_dataset():
...
```
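The other formats are selected the same way. The following is a hedged usage sketch for "listwise", where each iteration is assumed to yield one query's label list and document-feature matrix:
```
import functools
import paddle.v2 as paddle

listwise_train_dataset = functools.partial(paddle.dataset.mq2007.train, format="listwise")
for label_list, query_docs_feature_matrix in listwise_train_dataset():
    pass  # consume one query's annotated document list
```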
## Model overview
This example provides the RankNet model for the pairwise method and LambdaRank for the listwise method, representing two types of learning methods. A ranking model for the pointwise method degenerates into a regression problem; please refer to the [recommender system](https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/README.cn.md) chapter of the PaddleBook.
## RankNet model
[RankNet](http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf) is a classic pairwise learning-to-rank method and a typical feed-forward neural ranking model. Denote the $i$-th document in the document set $S$ as $U_i$ and its feature vector as $x_i$. For a given document pair $U_i$, $U_j$, RankNet maps each input feature vector $x$ to $f(x)$, obtaining $s_i=f(x_i)$, $s_j=f(x_j)$. The probability that $U_i$ is more relevant than $U_j$ is denoted $P_{i,j}$:
$$P_{i,j}=P(U_{i}>U_{j})=\frac{1}{1+e^{-\sigma (s_{i}-s_{j})}}$$
Because most ranking metrics are non-continuous and non-smooth, RankNet needs a loss function $C$ that can be optimized directly. It uses cross entropy to measure the prediction cost, and the loss function $C$ is written as
$$C_{i,j}=-\bar{P_{i,j}}\log P_{i,j}-(1-\bar{P_{i,j}})\log(1-P_{i,j})$$
where $\bar{P_{i,j}}$ denotes the true probability, defined as
$$\bar{P_{i,j}}=\frac{1}{2}(1+S_{i,j})$$
Here $S_{i,j} \in \{+1,0\}$ is the label of the pair consisting of $U_i$ and $U_j$, that is, whether $U_i$ is more relevant than $U_j$.
Finally, a differentiable loss function is obtained:
$$C=\frac{1}{2}(1-S_{i,j})\sigma (s_{i}-s_{j})+\log(1+e^{-\sigma (s_{i}-s_{j})})$$
It can be optimized using conventional gradient descent methods. See [RankNet](http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf) for details.
Meanwhile, we obtain the gradient information for document $U_i$ during ranking optimization:
$$\lambda_{i,j}=\frac{\partial C}{\partial s_{i}} = \frac{1}{2}(1-S_{i,j})-\frac{1}{1+e^{\sigma (s_{i}-s_{j})}}$$
This expression can be read as the amount by which document $U_i$ should move up or down in this round of ranking optimization.
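As a quick numerical illustration of the formulas above (a standalone sketch; $\sigma$ and the scores are made-up values):
```python
import numpy as np

sigma = 1.0
s_i, s_j = 2.0, 1.2   # scores f(x_i), f(x_j) of the two documents
S_ij = 1.0            # U_i is labeled more relevant than U_j

# predicted probability that U_i ranks above U_j
P_ij = 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))
# pairwise cross-entropy loss C
C = 0.5 * (1 - S_ij) * sigma * (s_i - s_j) + np.log(1 + np.exp(-sigma * (s_i - s_j)))
# gradient term lambda_{i,j} as defined above
lambda_ij = 0.5 * (1 - S_ij) - 1.0 / (1 + np.exp(sigma * (s_i - s_j)))
```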
Based on the above derivation, the RankNet network is constructed from several hidden layers and fully connected layers. As shown in the figure, the document features pass through the hidden and fully connected layers and are transformed layer by layer from the low-level feature space into a high-level feature space. The docA and docB branches are symmetric and feed into the final RankCost layer.
![image](https://github.com/sunshine-2015/models/blob/patch-4/ltr/images/ranknet_en.png?raw=true)
Figure 3. The structure of the RankNet network.
- Fully connected layer: every node in the previous layer is connected to every node in the next. In this example it is implemented with **paddle.layer.fc**; note that the fully connected layer feeding the RankCost layer has dimension 1.
- RankCost layer: the RankCost layer is the core of the RankNet ranking network. It measures whether docA is more relevant than docB, gives the predicted value, and compares it with the label. Cross entropy is used as the loss function, optimized by gradient descent. Details can be found in [RankNet](http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf) [4].
Because the network structure in the pairwise setting is left-right symmetric, only half of the network needs to be defined, and the other half shares its parameters. PaddlePaddle allows parameter sharing in a network: parameters with the same name are shared. The sample code defining the RankNet network structure is given in the **half_ranknet** function in [ranknet.py](https://github.com/PaddlePaddle/models/blob/develop/ltr/ranknet.py).
The structure defined in the **half_ranknet** function is the model structure of Figure 3: two hidden layers, a fully connected layer with **hidden_size=10** followed by a fully connected layer with **hidden_size=1**. Here **input_dim** refers to the feature dimension of a **single document**. The label takes the values 1 and 0, and each input sample has the structure **<label>, <docA, docB>**. Taking docA as an example, its **input_dim** document features are transformed into 10-dimensional and then 1-dimensional representations and finally fed into the RankCost layer, which compares docA and docB and gives the predicted value.
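The following is a minimal sketch of such a shared tower in the paddle.v2 API (the parameter and input names here are illustrative; the repository's actual definition is the half_ranknet function linked above):
```python
import paddle.v2 as paddle

def half_ranknet(name_prefix, input_dim):
    # one tower of the symmetric network; explicit parameter names let the
    # left and right towers share weights
    data = paddle.layer.data(name=name_prefix + "/data",
                             type=paddle.data_type.dense_vector(input_dim))
    hd1 = paddle.layer.fc(input=data,
                          size=10,
                          act=paddle.activation.Tanh(),
                          param_attr=paddle.attr.ParamAttr(name="hidden_w1"))
    output = paddle.layer.fc(input=hd1,
                             size=1,
                             act=paddle.activation.Linear(),
                             param_attr=paddle.attr.ParamAttr(name="output"))
    return output
```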
### RankNet model training
To train the **RankNet** model, run:
```
python train.py --model_type ranknet
```
The first run automatically downloads the data and trains the RankNet model, saving the model parameters after each pass.
### RankNet model prediction
To run prediction with the trained **RankNet** model, execute:
```
python infer.py --model_type ranknet --test_model_path models/ranknet_params_0.tar.gz
```
This example provides both the training and the prediction parts of the RankNet model. After training, the model consists of two parts: the topology (the **rank_cost** layer is not part of the inference topology) and the parameter file. For prediction, the **half_ranknet** topology used during **ranknet** training is reused, and the model parameters are loaded from storage. The prediction input is the feature vector of a single document, and the model outputs a relevance score; sorting by the predicted scores yields the final document ranking.
## User-defined RankNet data
The code above uses PaddlePaddle's built-in ranking data. To use data in a custom format, refer to PaddlePaddle's built-in **mq2007** dataset and write a new generator function. Suppose the input data has the following format and contains only three documents, doc0-doc2.
<query_id> <relevance_score> <feature_vector>(featureid: feature_value)
```
query_id : 1, relevance_score:1, feature_vector 0:0.1, 1:0.2, 2:0.4 #doc0
query_id : 1, relevance_score:2, feature_vector 0:0.3, 1:0.1, 2:0.4 #doc1
query_id : 1, relevance_score:0, feature_vector 0:0.2, 1:0.4, 2:0.1 #doc2
query_id : 2, relevance_score:0, feature_vector 0:0.1, 1:0.4, 2:0.1 #doc0
.....
```
The input samples need to be converted into the pairwise input format, generating combinations with the same structure as the mq2007 pairwise format:
<label> <docA_feature_vector> <docB_feature_vector>
```
1 doc1 doc0
1 doc1 doc2
1 doc0 doc2
....
```
Note that in pairwise data, label=1 generally indicates that docA is more relevant to the query than docB; the label information is implicit in the order of docA and docB. If a sample reads **0 docA docB**, swap the order to construct **1 docB docA**.
In addition, combining all pairs makes the training data redundant, because the total order over the document set can be recovered from a subset of the partial-order relations. See the [Pairwise approach](http://www.machinelearning.org/proceedings/icml2007/papers/139.pdf) [5] for related research; this example does not elaborate further.
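As an illustration, here is a small hedged sketch (the helper name is hypothetical) that converts one query's records in the format above into pairwise samples, always placing the more relevant document first:
```python
import itertools

def make_pairs(docs):
    """docs: list of (relevance_score, feature_vector) for one query."""
    for (s_a, f_a), (s_b, f_b) in itertools.combinations(docs, 2):
        if s_a == s_b:
            continue  # equally relevant documents carry no order signal
        # the label is always 1; the order encodes which document is better
        yield (1, f_a, f_b) if s_a > s_b else (1, f_b, f_a)

pairs = list(make_pairs([(1, [0.1, 0.2, 0.4]),   # doc0
                         (2, [0.3, 0.1, 0.4]),   # doc1
                         (0, [0.2, 0.4, 0.1])]))  # doc2
```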
```
# a customized data generator
def gen_pairwise_data(text_line_of_data):
    """
    return :
    ------
    label : np.array, shape=(1)
    docA_feature_vector : np.array, shape=(1, feature_dimension)
    docB_feature_vector : np.array, shape=(1, feature_dimension)
    """
    return label, docA_feature_vector, docB_feature_vector
```
In PaddlePaddle's input types, **integer_value** is a single integer and **dense_vector** is a one-dimensional real vector. To match the generator, the correspondence of the input data must be specified before training the model:
```
# Define the input data order
feeding = { "label":0,
"left/data" :1,
"right/data":2}
```
## LambdaRank ranking model
[LambdaRank](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf) [6] is a listwise ranking method developed by Burges et al. [6] from RankNet. It constructs a lambda function (the origin of the name LambdaRank) to optimize the metric NDCG (Normalized Discounted Cumulative Gain), and each query's result list is used as a single training sample. NDCG is a standard information-retrieval measure of the ranking quality of a document list; the NDCG score of the top $K$ documents is
$$NDCG@K=Z_{K}\sum_{i=1}^{K}\frac{2^{rel_{i}}-1}{\log_{2}(i+1)}$$
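As a numerical aside, a small sketch of NDCG@K under the standard definition assumed above ($Z_K$ normalizes by the DCG of the ideal ordering):
```python
import numpy as np

def dcg_at_k(rels, k):
    rels = np.asfarray(rels)[:k]
    # gain 2^rel - 1, discounted by log2 of position + 1 (positions are 1-based)
    return np.sum((2 ** rels - 1) / np.log2(np.arange(2, rels.size + 2)))

def ndcg_at_k(rels, k):
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

print ndcg_at_k([2, 1, 0, 2], 4)  # NDCG of one ranked list of relevance labels
```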
As in the RankNet derivation, ranking optimization requires gradient information from the ranking errors. The NDCG metric is non-smooth and non-continuous, so its gradient cannot be obtained directly; therefore $|\Delta NDCG|=|NDCG(new) - NDCG(old)|$ is introduced, and the lambda function is constructed as
$$\lambda_{i,j}=\frac{\partial C}{\partial s_{i}}=-\frac{\sigma }{1+e^{\sigma (s_{i}-s_{j})}}|\Delta NDCG|$$
Substituting this for the gradient representation in RankNet yields the ranking model known as [LambdaRank](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf).
From the above derivation we can see that the LambdaRank network structure is very similar to that of RankNet, as shown below.
![image](https://github.com/sunshine-2015/models/blob/patch-4/ltr/images/LambdaRank_EN.png?raw=true)
Figure 4. Network structure of LambdaRank
Replacing the input document pair with the list of documents related to one query, and replacing the RankCost layer with a LambdaCost layer, the rest of the network remains the same as in RankNet.
- LambdaCost layer: the LambdaCost layer uses the NDCG difference as the lambda function. The score is a one-dimensional sequence; for a single training sample, the fully connected layer outputs a 1x1 sequence, and both sequences have length equal to the number of documents retrieved for the query. Details of the lambda function are given in [LambdaRank](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf).
An example of a LambdaRank network structure defined using PaddlePaddle is the **lambda_rank** function in [lambda_rank.py](https://github.com/PaddlePaddle/models/blob/develop/ltr/lambda_rank.py).
The same model structure as in Figure 3 is used here. As in RankNet, two fully connected layers with **hidden_size=10** and **hidden_size=1** are used, and **input_dim** refers to the feature dimension of a single input document. Each document's **input_dim** features are transformed into 10-dimensional and then 1-dimensional representations and finally fed into the LambdaCost layer. Note that the label and data formats here are **dense_vector_sequence**, representing a sequence of document scores or document features.
### LambdaRank model training
To train the **LambdaRank** model, run:
```
python train.py --model_type lambdarank
```
The first run of the script automatically downloads the data and trains the LambdaRank model, saving the model after each pass.
### LambdaRank model prediction
The prediction process of the LambdaRank model is the same as RankNet's: the prediction topology reuses the model definition in the code and loads the corresponding parameter file from storage. The input during prediction is one query's document list, and the output is a relevance score for each document in the list; the documents are then re-sorted by score to obtain the final ranking.
Run prediction with the trained LambdaRank model:
```
python infer.py --model_type lambdarank --test_model_path models/lambda_rank_params_0.tar.gz
```
## Customize LambdaRank data
The code above uses PaddlePaddle's built-in mq2007 data. To use data in a custom format, refer to the built-in mq2007 dataset and write a generator function. Suppose the input data has the following format and contains only three documents, doc0-doc2.
<query_id> <relevance_score> <feature_vector>
```
query_id : 1, relevance_score:1, feature_vector 0:0.1, 1:0.2, 2:0.4 #doc0
query_id : 1, relevance_score:2, feature_vector 0:0.3, 1:0.1, 2:0.4 #doc1
query_id : 1, relevance_score:0, feature_vector 0:0.2, 1:0.4, 2:0.1 #doc2
query_id : 2, relevance_score:0, feature_vector 0:0.1, 1:0.4, 2:0.1 #doc0
query_id : 2, relevance_score:2, feature_vector 0:0.1, 1:0.4, 2:0.1 #doc1
.....
```
Convert the data into the listwise format, for example:
<query_id><relevance_score> <feature_vector>
```
1 1 0.1,0.2,0.4
1 2 0.3,0.1,0.4
1 0 0.2,0.4,0.1
2 0 0.1,0.4,0.1
2 2 0.1,0.4,0.1
......
```
**Notes on the data format**
- The number of documents in each sample must be larger than the NDCG_num of the **lambda_cost** layer.
- If every document of a sample has relevance 0, the NDCG computation is undefined for that query; such queries can be deemed invalid and filtered out during training.
```
# a customized data generator
def gen_listwise_data(text_all_lines_of_data):
"""
return :
------
label : np.array, shape=(samples_num, )
querylist : np.array, shape=(samples_num, feature_dimension)
"""
return label_list, query_docs_feature_vector_matrix
```
Corresponding to PaddlePaddle's input, **label** has type **dense_vector_sequence** and is the sequence of scores; **data** has type **dense_vector_sequence** and is the sequence of input feature vectors; **input_dim** is the feature dimension of a single document. The correspondence of the input data must be specified before training the model:
```
# Define the input data order
feeding = {"label":0,
"data" : 1}
```
## Outputting custom evaluation metrics during training
Taking **RankNet** as an example, we show how to output custom evaluation metrics during training. The same method can be used to obtain the value of any output matrix of the network during training.
The RankNet network learns a scoring function that scores the left and right inputs. The greater the difference between the two scores, the stronger the scoring function's ability to discriminate positive from negative examples, and the better the model's generalization ability. Suppose we want the average absolute difference between the left and right scores during training. To compute this custom metric, we need the output matrix of each mini-batch after the split layers (the layers named left_score and right_score in RankNet). This can be done in the following two steps:
1. In the event_handler, process the predefined PaddlePaddle events paddle.event.EndIteration or paddle.event.EndPass.
2. Call event.gm.getLayerOutputs, passing the name of the desired layer in the network, to obtain the value of that layer after the forward computation of the mini-batch.
Here is the code example:
```
def score_diff(right_score, left_score):
return np.average(np.abs(right_score - left_score))
def event_handler(event):
if isinstance(event, paddle.event.EndIteration):
if event.batch_id % 25 == 0:
diff = score_diff(
event.gm.getLayerOutputs("right_score")["right_score"][
"value"],
event.gm.getLayerOutputs("left_score")["left_score"][
"value"])
logger.info(("Pass %d Batch %d : Cost %.6f, "
"average absolute diff scores: %.6f") %
(event.pass_id, event.batch_id, event.cost, diff))
```
## Conclusion
Learning to rank is widely used in real life. Ranking model construction methods generally fall into the pointwise, pairwise, and listwise categories. Using the LETOR mq2007 data as an example, this chapter explains RankNet, a classical pairwise method, and LambdaRank, a listwise method, shows how to build the corresponding ranking models with the PaddlePaddle framework, and provides examples of using custom data types. PaddlePaddle offers a flexible programming interface, and the same single-GPU code can be run in a distributed, multi-machine, multi-GPU setting.
## Notes
1. As a demonstration of learning to rank, this example uses a small network. In applications, the network complexity should be adjusted to the actual situation and the network scale reset accordingly.
2. The feature vectors in the experimental data are joint query-document features. When independent query and document features are used, [DSSM](https://github.com/PaddlePaddle/models/tree/develop/dssm) can be used to build the network.
## Reference
1. https://en.wikipedia.org/wiki/Learning_to_rank
2. Liu T Y. [Learning to rank for information retrieval](http://ftp.nowpublishers.com/article/DownloadSummary/INR-016)[J]. Foundations and Trends® in Information Retrieval, 2009, 3(3): 225-331.
3. Li H. [Learning to rank for information retrieval and natural language processing](http://www.morganclaypool.com/doi/abs/10.2200/S00607ED2V01Y201410HLT026)[J]. Synthesis Lectures on Human Language Technologies, 2014, 7(3): 1-121.
4. Burges C, Shaked T, Renshaw E, et al. [Learning to rank using gradient descent](http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_BurgesSRLDHH05.pdf)[C]//Proceedings of the 22nd international conference on Machine learning. ACM, 2005: 89-96.
5. Cao Z, Qin T, Liu T Y, et al. [Learning to rank: from pairwise approach to listwise approach](http://machinelearning.wustl.edu/mlpapers/paper_files/icml2007_CaoQLTL07.pdf)[C]//Proceedings of the 24th international conference on Machine learning. ACM, 2007: 129-136.
6. Burges C J C, Ragno R, Le Q V. [Learning to rank with nonsmooth cost functions](https://papers.nips.cc/paper/2971-learning-to-rank-with-nonsmooth-cost-functions.pdf)[C]//NIPS. 2006, 6: 193-200.
## Introduction
Sequences are an input data type faced by many machine learning and data mining tasks. Taking natural language processing as an example, a sentence is composed of words and a paragraph is composed of sentences; a paragraph can therefore be seen as a nested sequence (also called a double sequence) whose elements are themselves sequences.
The double sequence is a very flexible data organization supported by PaddlePaddle. It helps us better describe more complex data such as paragraphs and multi-turn dialogues. With a double-layer sequence as input, we can design hierarchical networks to better accomplish complex tasks.
This unit introduces how to use double sequences in PaddlePaddle.
- [Text Classification Based on Double Sequence](https://github.com/PaddlePaddle/models/tree/develop/nested_sequence/text_classification)
The code implementing this network structure in PaddlePaddle is in `network_conf.py`.
To process a double-layer time series, the data must first be transformed into single-layer time series, and each single-layer series is then processed. In PaddlePaddle, `recurrent_group` is the main tool for building hierarchical models over double-layer sequences. Here we use two nested `recurrent_group`s: the outer one splits the paragraph into sentences, so its `step` function receives the sentence sequence as input; the inner one splits each sentence into words, so its `step` function receives the non-sequential words as input.
At the word level, we apply a CNN that takes word vectors as input and outputs the learned sentence representation; at the paragraph level, we obtain the paragraph representation by pooling over the sentence representations.
``` python
nest_group = paddle.layer.recurrent_group(input=[paddle.layer.SubsequenceInput(emb),
...
```
```bash
python train.py
```
This example runs on PaddlePaddle's built-in sentiment classification dataset `imdb`.
### Prediction
After training, the model is stored in the specified directory (the models directory by default). Execute in the terminal:
```bash
python infer.py --model_path 'models/params_pass_00000.tar.gz'
```
The prediction script loads the model trained for one pass and tests it on the `imdb` test set.
## Training and Prediction with Custom Data
### Training
1.Data organization
Each line is one sample, with fields separated by `\t`: the first column is the class label and the second is the input text.
```
positive This movie is very good. The actor is so handsome.
negative What a terrible movie. I waste so much time.
```
2.Writing the data reading interface
A custom data reading interface only needs a Python generator implementing the logic of **parsing the input text**. The following code fragment reads the raw data and returns the types `paddle.data_type.integer_value_sub_sequence` and `paddle.data_type.integer_value`:
```python
def train_reader(data_dir, word_dict, label_dict):
    """
    ...
```
Running the sample code in this directory requires PaddlePaddle v0.11.0 or later. If the PaddlePaddle on your device is older than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html) and make an update.
---
# Text Classification Based on Double Sequence
## Introduction
This example demonstrates how to organize long text (usually paragraphs or chapters) into a double sequence in PaddlePaddle to accomplish the task of long-text classification.
## Model introduction
We treat a text as a sequence of sentences, and each sentence as a sequence of words.
We first use a convolutional neural network to encode each sentence in the paragraph; then the sentence representation vectors are passed through a pooling layer to obtain the encoded vector of the paragraph; finally, the paragraph's encoded vector is fed into the classifier (a fully connected layer with softmax) to obtain the final classification result.
**The model structure is shown in the figure below**
<p align="center">
<img src="images/model.jpg" width = "60%" align="center"/><br/>
Figure1. Text classification model based on double layer sequence
</p>
PaddlePaddle implementation of the network structure is in `network_conf.py`.
To process a double-layer time series, we first transform it into single-layer time series and then process each single-layer series. In PaddlePaddle, `recurrent_group` is the main tool for building hierarchical models that process double-layer sequences. Here we use two nested `recurrent_group`s: the outer `recurrent_group` splits the paragraph into sentences, and the input received by its `step` function is the sentence sequence; the inner `recurrent_group` splits each sentence into words, and the input received by its `step` function is the group of non-sequential words.
At the word level, we use a CNN to obtain each sentence's representation from word vectors. At the paragraph level, we obtain the paragraph's representation from the representations of its sentences through pooling.
``` python
nest_group = paddle.layer.recurrent_group(input=[paddle.layer.SubsequenceInput(emb),
hidden_size],
step=cnn_cov_group)
```
The single-layer sequences obtained after splitting are passed through a CNN network to learn their corresponding representation vectors. The CNN network contains the following parts:
- **Convolution layer**: convolution in text classification is performed over the time dimension, and the width of the convolution kernel equals the dimension of the word-vector matrix. After convolution, the result is a "feature map"; multiple feature maps are obtained by using convolution kernels of different heights. This code uses kernels of height 3 (the red box in Figure 1) and 4 (the blue box in Figure 1) by default.
- **Max pooling layer**: max pooling is performed on each feature map obtained by convolution. Since each feature map is itself a vector, max pooling actually selects the largest element of each vector; all the largest elements are then spliced together to form a new vector.
- **Linear projection layer**: concatenates the results of the max pooling operations into a long vector and applies a linear projection to obtain the representation vector of the corresponding single-layer sequence.
Implementation of CNN network:
```python
def cnn_cov_group(group_input, hidden_size):
"""
Convolution group definition.
:param group_input: The input of this layer.
:type group_input: LayerOutput
:params hidden_size: The size of the fully connected layer.
:type hidden_size: int
"""
conv3 = paddle.networks.sequence_conv_pool(
input=group_input, context_len=3, hidden_size=hidden_size)
conv4 = paddle.networks.sequence_conv_pool(
input=group_input, context_len=4, hidden_size=hidden_size)
linear_proj = paddle.layer.fc(input=[conv3, conv4],
size=hidden_size,
param_attr=paddle.attr.ParamAttr(name='_cov_value_weight'),
bias_attr=paddle.attr.ParamAttr(name='_cov_value_bias'),
act=paddle.activation.Linear())
return linear_proj
```
PaddlePaddle provides a text sequence convolution-plus-pooling module, `paddle.networks.sequence_conv_pool`, which can be called directly.
After obtaining the representation vector of each sentence, all the sentence vectors pass through an average pooling layer to produce one vector per sample; this vector is then fed through a fully connected layer to output the final prediction. The code:
```python
avg_pool = paddle.layer.pooling(input=nest_group,
pooling_type=paddle.pooling.Avg(),
agg_level=paddle.layer.AggregateLevel.TO_NO_SEQUENCE)
prob = paddle.layer.mixed(size=class_num,
input=[paddle.layer.full_matrix_projection(input=avg_pool)],
act=paddle.activation.Softmax())
```
## Install dependency packages
```bash
pip install -r requirements.txt
```
## Specify training configuration parameters
The training and model configuration parameters are modified through the `config.py` script. There are detailed explanations for configurable parameters in the script. The examples are as follows:
```python
class TrainerConfig(object):
# whether to use GPU for training
use_gpu = False
# the number of threads used in one machine
trainer_count = 1
# train batch size
batch_size = 32
...
class ModelConfig(object):
# embedding vector dimension
emb_size = 28
...
```
Modify the `config.py` to adjust the parameters. For example, we can specify whether or not to use GPU for training by modifying `use_gpu`.
## Run with PaddlePaddle built-in data
### Train
Execute at the terminal:
```bash
python train.py
```
This runs the example with PaddlePaddle's built-in sentiment classification dataset, `imdb`.
### Prediction
After training, the model will be stored in the specified directory (the default models directory), execute the following command:
```bash
python infer.py --model_path 'models/params_pass_00000.tar.gz'
```
By default, the prediction script loads the model trained for one pass and tests it on the `imdb` test set.
## Use custom data train and predict
### Train
1.Data structure
Each line is one sample containing a class label and text, separated by `\t`. The following are two example samples:
```
positive This movie is very good. The actor is so handsome.
negative What a terrible movie. I waste so much time.
```
2.Write the Data Reading Interface
To define a custom data reading interface, we only need to write a Python generator that **parses the input text**. The following code fragment reads the raw data and returns the types `paddle.data_type.integer_value_sub_sequence` and `paddle.data_type.integer_value`:
```python
def train_reader(data_dir, word_dict, label_dict):
"""
Reader interface for training data
:param data_dir: data directory
:type data_dir: str
:param word_dict: path of word dictionary,
the dictionary must has a "UNK" in it.
:type word_dict: Python dict
:param label_dict: path of label dictionary.
:type label_dict: Python dict
"""
def reader():
UNK_ID = word_dict['<unk>']
word_col = 1
lbl_col = 0
for file_name in os.listdir(data_dir):
file_path = os.path.join(data_dir, file_name)
if not os.path.isfile(file_path):
continue
with open(file_path, "r") as f:
for line in f:
line_split = line.strip().split("\t")
doc = line_split[word_col]
doc_ids = []
for sent in doc.strip().split("."):
sent_ids = [
word_dict.get(w, UNK_ID)
for w in sent.split()]
if sent_ids:
doc_ids.append(sent_ids)
yield doc_ids, label_dict[line_split[lbl_col]]
return reader
```
Note that in this example the English period `'.'` is used as a separator to split the text into sentences, and each sentence is represented as the corresponding array of dictionary indices (`sent_ids`). Since the representation of a sample (`doc_ids`) contains all the sentences of the text, its type is `paddle.data_type.integer_value_sub_sequence`.
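For instance, a toy walk-through of that nesting (the dictionary and word ids are made up for illustration):
```python
word_dict = {'<unk>': 0, 'this': 1, 'movie': 2, 'is': 3, 'very': 4, 'good': 5}
UNK_ID = word_dict['<unk>']
doc = "this movie is very good . this movie is good"
doc_ids = []
for sent in doc.strip().split("."):
    sent_ids = [word_dict.get(w, UNK_ID) for w in sent.split()]
    if sent_ids:
        doc_ids.append(sent_ids)
# doc_ids == [[1, 2, 3, 4, 5], [1, 2, 3, 5]] -- one sub-sequence per sentence
```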
3.Specify command line parameters for training
`train.py` contains the following parameters:
```
Options:
--train_data_dir TEXT The path of training dataset (default: None). If
this parameter is not set, imdb dataset will be
used.
--test_data_dir TEXT The path of testing dataset (default: None). If this
parameter is not set, imdb dataset will be used.
--word_dict_path TEXT The path of word dictionary (default: None). If this
parameter is not set, imdb dataset will be used. If
this parameter is set, but the file does not exist,
word dictionary will be built from the training data
automatically.
--label_dict_path TEXT The path of label dictionary (default: None).If this
parameter is not set, imdb dataset will be used. If
this parameter is set, but the file does not exist,
label dictionary will be built from the training data
automatically.
--model_save_dir TEXT The path to save the trained models (default:
'models').
--help Show this message and exit.
```
Modify the startup parameters in the `train.py` script to run this example directly. Take the sample data in the data directory for example, execute at the terminal:
```bash
python train.py \
--train_data_dir 'data/train_data' \
--test_data_dir 'data/test_data' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt'
```
This trains the model on the sample data.
### Prediction
1.Specify command line parameters
`infer.py` contains the following parameters:
```
Options:
--data_path TEXT The path of data for inference (default: None). If
this parameter is not set, imdb test dataset will be
used.
--model_path TEXT The path of saved model. [required]
--word_dict_path TEXT The path of word dictionary (default: None). If this
parameter is not set, imdb dataset will be used.
--label_dict_path TEXT The path of label dictionary (default: None).If this
parameter is not set, imdb dataset will be used.
--batch_size INTEGER The number of examples in one batch (default: 32).
--help Show this message and exit.
```
2.Take the sample data in the `data` directory as an example and execute at the terminal:
```bash
python infer.py \
--data_path 'data/infer.txt' \
--word_dict_path 'word_dict.txt' \
--label_dict_path 'label_dict.txt' \
--model_path 'models/params_pass_00000.tar.gz'
```
This runs prediction on the sample data.
Running the sample code in this directory requires PaddlePaddle v0.10.0 or later. If the PaddlePaddle on your device is older than this version, please follow the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) and make an update.
---
# Scheduled Sampling
## Overview
The goal of a sequence generation task is to maximize the probability of the target sequence given the source input. During training, the model uses the real elements of the target sequence as the input to each decoder step and maximizes the probability of the next element. During generation, however, the element decoded in the previous step is used as the current input to generate the next element. The probability distributions of the decoder's input data in the training and generation phases are therefore inconsistent.
Scheduled sampling\[[1](#references)\] is a solution to the inconsistency in the distribution of input data during training and generation phases. In the early stage of training, this method uses the real elements in the target sequence as the decoder input and quickly guides the model from a randomly initialized state to a reasonable state. As training progresses, the method will gradually increase the use of the generated elements as decoder input to solve the problem of inconsistent data distribution.
In a standard sequence-to-sequence model, if an incorrect element is generated at the beginning of the sequence, the subsequent input state will be affected, and the error will continue to accumulate as the generation process continues. Scheduled sampling uses the generated elements as the decoder input with a certain probability, so even if the previous generation steps have errors, the training process' target is still to maximize the probability of the real target sequence, so the model is still trained in the right direction. Therefore, this approach increases the fault tolerance of the model.
## Introduction to the Algorithm
Scheduled sampling is used only in the training phase of the sequence-to-sequence model and not in the generation phase.
The standard sequence-to-sequence model uses the true element of the previous moment, $y_{t-1}$, as decoder input to maximize the probability of the $t$-th element. Let $g_{t-1}$ be the element generated at the previous moment. The scheduled sampling algorithm instead uses $g_{t-1}$ as the decoder input with a certain probability.
Suppose the model has been trained on $i$ mini-batches. To control the decoder's input, the scheduled sampling algorithm defines a probability variable $\epsilon_i$ that decays as $i$ increases. Some common definitions are:
- Linear decay: $\epsilon_i=\max(\epsilon,k-c*i)$, where $\epsilon$ limits the minimum value of $\epsilon_i$, and $k$ and $c$ control the magnitude of the linear decay.
- Exponential decay: $\epsilon_i=k^i$, where $0<k<1$ and $k$ controls the magnitude of the exponential decay.
- Inverse sigmoid decay: $\epsilon_i=k/(k+e^{i/k})$, where $k>1$ and $k$ likewise controls the magnitude of the decay.
A small numerical sketch of these schedules is given after Figure 1.
<p align="center">
<img src="images/decay.jpg" width="50%" align="center"><br>
Figure 1. Attenuation curves for linear attenuation, exponential decay, and inverse Sigmoid decay
</p>
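The promised sketch of the three schedules (the constants are illustrative only):
```python
import numpy as np

i = np.arange(100, dtype='float64')  # number of mini-batches trained so far
linear = np.maximum(0.1, 1.0 - 0.01 * i)              # max(epsilon, k - c*i)
exponential = 0.95 ** i                                # k^i with 0 < k < 1
inverse_sigmoid = 10.0 / (10.0 + np.exp(i / 10.0))     # k/(k+exp(i/k)) with k > 1
```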
As shown in Figure 2, at decoder time step $t$, the scheduled sampling algorithm uses the true element of the previous moment, $y_{t-1}$, as the decoder input with probability $\epsilon_i$, and uses the element $g_{t-1}$ generated at the previous moment with probability $1-\epsilon_i$. From Figure 1 we see that $\epsilon_i$ decreases as $i$ increases, so the decoder increasingly uses generated elements as input, and the data distributions of the training and generation phases gradually become consistent.
<p align="center">
<img src="images/Scheduled_Sampling.jpg" width="50%" align="center"><br>
Figure 2. Scheduled sampling algorithm selects different elements as decoder input
</p>
## Model Implementation
Since the scheduled sampling algorithm is just an improvement over the sequence-to-sequence model, its overall implementation framework is similar to that of the sequence-to-sequence model. Thus, only the parts related to scheduled sampling are described here. For the complete code, see `network_conf.py`.
First, we import the required package and define the class `RandomScheduleGenerator` that controls the decay probability as follows:
```python
import numpy as np
import math
class RandomScheduleGenerator:
"""
    The random sampling rate for the scheduled sampling algorithm, which uses a
    decayed sampling rate.
"""
...
```
We will now define the three methods of class `RandomScheduleGenerator`: `__init__`, `getScheduleRate`, and `processBatch`.
The `__init__` method initializes the class. The `schedule_type` parameter specifies which decay mode to use; the options are `constant`, `linear`, `exponential`, and `inverse_sigmoid`. Mode `constant` uses a fixed $\epsilon_i$ for all mini-batches; `linear` is linear decay; `exponential` is exponential decay; `inverse_sigmoid` is inverse sigmoid decay. Parameters `a` and `b` of `__init__` are the parameters of the decay schedule and need to be tuned on the validation set. `self.schedule_computers` maps each decay mode to a function that computes $\epsilon_i$. The last line assigns the selected decay function to `self.schedule_computer` according to `schedule_type`.
```python
def __init__(self, schedule_type, a, b):
"""
        schedule_type: is the type of the decay. It supports constant, linear,
exponential, and inverse_sigmoid right now.
a: parameter of the decay (MUST BE DOUBLE)
b: parameter of the decay (MUST BE DOUBLE)
"""
self.schedule_type = schedule_type
self.a = a
self.b = b
self.data_processed_ = 0
self.schedule_computers = {
"constant": lambda a, b, d: a,
"linear": lambda a, b, d: max(a, 1 - d / b),
"exponential": lambda a, b, d: pow(a, d / b),
"inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
}
assert (self.schedule_type in self.schedule_computers)
self.schedule_computer = self.schedule_computers[self.schedule_type]
```
`getScheduleRate` calculates $\epsilon_i$ based on the decay function and the amount of data already processed.
```python
def getScheduleRate(self):
"""
Get the schedule sampling rate. Usually not needed to be called by the users
"""
return self.schedule_computer(self.a, self.b, self.data_processed_)
```
The `processBatch` method samples according to the probability $\epsilon_i$ and outputs `indexes`: each element of `indexes` is assigned `0` with probability $\epsilon_i$ and `1` with probability $1-\epsilon_i$. `indexes` determines whether each decoder input is a real element (`0`) or a generated element (`1`).
```python
def processBatch(self, batch_size):
"""
Get a batch_size of sampled indexes. These indexes can be passed to a
        MultiplexLayer to select from the ground truth and generated samples
from the last time step.
"""
rate = self.getScheduleRate()
numbers = np.random.rand(batch_size)
indexes = (numbers >= rate).astype('int32').tolist()
self.data_processed_ += batch_size
return indexes
```
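For example (a usage sketch; the constructor arguments mirror the linear-decay defaults used later in this document):
```python
schedule_generator = RandomScheduleGenerator("linear", 0.75, 1000000)
indexes = schedule_generator.processBatch(5)
# e.g. [0, 0, 1, 0, 1]: 0 means feed the true token, 1 the generated one
```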
The scheduled sampling algorithm adds one more input variable, `true_token_flag`, to the sequence-to-sequence model to control the decoder input.
```python
true_token_flags = paddle.layer.data(
name='true_token_flag',
type=paddle.data_type.integer_value_sequence(2))
```
Here, we also need to wrap the original reader and add a generator for the `true_token_flag` data. We use linear decay as an example to show how the `RandomScheduleGenerator` defined above generates input data for `true_token_flag`.
```python
def gen_schedule_data(reader,
schedule_type="linear",
decay_a=0.75,
decay_b=1000000):
"""
Creates a data reader for scheduled sampling.
Output from the iterator that created by original reader will be
appended with "true_token_flag" to indicate whether to use true token.
:param reader: the original reader.
:type reader: callable
:param schedule_type: the type of sampling rate decay.
:type schedule_type: str
:param decay_a: the decay parameter a.
:type decay_a: float
:param decay_b: the decay parameter b.
:type decay_b: float
:return: the new reader with the field "true_token_flag".
:rtype: callable
"""
schedule_generator = RandomScheduleGenerator(schedule_type, decay_a, decay_b)
def data_reader():
for src_ids, trg_ids, trg_ids_next in reader():
yield src_ids, trg_ids, trg_ids_next, \
[0] + schedule_generator.processBatch(len(trg_ids) - 1)
return data_reader
```
This code appends the data controlling the decoder input after the original input data (i.e., the source sequence elements `src_ids`, the target sequence elements `trg_ids`, and the next elements of the target sequence `trg_ids_next`). Since the first step of the decoder takes the sequence start token, the first element of the appended data is set to `0`, so the decoder's first step always uses the first element of the real target sequence (the sequence start token).
The decoder function called by each step of `recurrent_group` during training is as follows:
```python
def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
true_token_flag):
"""
The decoder step for training.
:param enc_vec: the encoder vector for attention
:type enc_vec: LayerOutput
:param enc_proj: the encoder projection for attention
:type enc_proj: LayerOutput
:param true_word: the ground-truth target word
:type true_word: LayerOutput
:param true_token_flag: the flag of using the ground-truth target word
:type true_token_flag: LayerOutput
:return: the softmax output layer
:rtype: LayerOutput
"""
decoder_mem = paddle.layer.memory(
name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
context = paddle.networks.simple_attention(
encoded_sequence=enc_vec,
encoded_proj=enc_proj,
decoder_state=decoder_mem)
gru_out_memory = paddle.layer.memory(
name='gru_out', size=target_dict_dim)
generated_word = paddle.layer.max_id(input=gru_out_memory)
generated_word_emb = paddle.layer.embedding(
input=generated_word,
size=word_vector_dim,
param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
current_word = paddle.layer.multiplex(
input=[true_token_flag, true_word, generated_word_emb])
decoder_inputs = paddle.layer.fc(
input=[context, current_word],
size=decoder_size * 3,
act=paddle.activation.Linear(),
bias_attr=False)
gru_step = paddle.layer.gru_step(
name='gru_decoder',
input=decoder_inputs,
output_mem=decoder_mem,
size=decoder_size)
out = paddle.layer.fc(
name='gru_out',
input=gru_step,
size=target_dict_dim,
act=paddle.activation.Softmax())
return out
```
The function uses the `memory` layer `gru_out_memory` to remember the output generated at the previous moment and selects the word with the highest probability as the generated word. The `multiplex` layer then chooses between the true element, `true_word`, and the generated element, `generated_word_emb`, and the result is used as the decoder input. The `multiplex` layer takes three inputs: `true_token_flag`, `true_word`, and `generated_word_emb`. For each position, if the corresponding value in `true_token_flag` is 0, the `multiplex` layer outputs the corresponding element of `true_word`; if it is 1, it outputs the corresponding element of `generated_word_emb`. A toy illustration of this selection follows.
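Plain numpy standing in for the layer's selection behavior (the flags and embeddings are made-up values):
```python
import numpy as np

true_token_flag = np.array([0, 1, 0])                    # one flag per position
true_word_emb = np.array([[1.0], [2.0], [3.0]])          # ground-truth embeddings
generated_word_emb = np.array([[10.0], [20.0], [30.0]])  # generated-word embeddings

# flag 0 -> take the true word, flag 1 -> take the generated word
current_word = np.where(true_token_flag[:, None] == 0,
                        true_word_emb, generated_word_emb)
# current_word == [[1.0], [20.0], [3.0]]
```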
## References
[1] Bengio S, Vinyals O, Jaitly N, et al. [Scheduled sampling for sequence prediction with recurrent neural networks](http://papers.nips.cc/paper/5956-scheduled-sampling-for-sequence-prediction-with-recurrent-neural-networks)//Advances in Neural Information Processing Systems. 2015: 1171-1179.