提交 7df53c9a 编写于 作者: W wanghaoshuang

Merge branch 'develop' of https://github.com/PaddlePaddle/models into model_avg

The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Convolutional Sequence to Sequence Learning # Convolutional Sequence to Sequence Learning
This model implements the work in the following paper: This model implements the work in the following paper:
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 点击率预估 # 点击率预估
以下是本例目录包含的文件以及对应说明: 以下是本例目录包含的文件以及对应说明:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Click-Through Rate Prediction # Click-Through Rate Prediction
## Introduction ## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Factorization Machine for Click-Through Rate prediction # Deep Factorization Machine for Click-Through Rate prediction
## Introduction ## Introduction
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此版本要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 深度结构化语义模型 (Deep Structured Semantic Models, DSSM) # 深度结构化语义模型 (Deep Structured Semantic Models, DSSM)
DSSM使用DNN模型在一个连续的语义空间中学习文本低纬的表示向量,并且建模两个句子间的语义相似度。本例演示如何使用PaddlePaddle实现一个通用的DSSM 模型,用于建模两个字符串间的语义相似度,模型实现支持通用的数据格式,用户替换数据便可以在真实场景中使用该模型。 DSSM使用DNN模型在一个连续的语义空间中学习文本低纬的表示向量,并且建模两个句子间的语义相似度。本例演示如何使用PaddlePaddle实现一个通用的DSSM 模型,用于建模两个字符串间的语义相似度,模型实现支持通用的数据格式,用户替换数据便可以在真实场景中使用该模型。
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Structured Semantic Models (DSSM) # Deep Structured Semantic Models (DSSM)
Deep Structured Semantic Models (DSSM) is simple but powerful DNN based model for matching web search queries and the URL based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings. Deep Structured Semantic Models (DSSM) is simple but powerful DNN based model for matching web search queries and the URL based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings.
......
Deep ASR Kickoff The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
### TODO
This project is still under active development.
...@@ -15,6 +15,9 @@ from multiprocessing import Manager, Process ...@@ -15,6 +15,9 @@ from multiprocessing import Manager, Process
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta import data_utils.augmentor.trans_add_delta as trans_add_delta
from data_utils.util import suppress_complaints, suppress_signal from data_utils.util import suppress_complaints, suppress_signal
from data_utils.util import SharedNDArray, SharedMemoryPoolManager
from data_utils.util import DaemonProcessGroup, batch_to_ndarray
from data_utils.util import CriticalException, ForceExitWrapper, EpochEndSignal
class SampleInfo(object): class SampleInfo(object):
...@@ -27,7 +30,7 @@ class SampleInfo(object): ...@@ -27,7 +30,7 @@ class SampleInfo(object):
feature_frame_num (int): Time length of the sample. feature_frame_num (int): Time length of the sample.
feature_dim (int): Feature dimension of one frame. feature_dim (int): Feature dimension of one frame.
label_bin_path (str): File containing the label data. label_bin_path (str): File containing the label data.
label_size (int): Byte count of the sample's label data. label_size (int): Byte count of the sample's label data.
label_frame_num (int): Label number of the sample. label_frame_num (int): Label number of the sample.
""" """
...@@ -48,23 +51,36 @@ class SampleInfo(object): ...@@ -48,23 +51,36 @@ class SampleInfo(object):
class SampleInfoBucket(object): class SampleInfoBucket(object):
"""SampleInfoBucket contains paths of several description files. Feature """SampleInfoBucket contains paths of several description files. Feature
description file contains necessary information (including path of binary description file contains necessary information (including path of binary
data, sample start position, sample byte number etc.) to access samples' data, sample start position, sample byte number etc.) to access samples'
feature data and the same with the label description file. SampleInfoBucket feature data and the same with the label description file. SampleInfoBucket
is the minimum unit to do shuffle. is the minimum unit to do shuffle.
Args: Args:
feature_bin_paths (list|tuple): Files containing the binary feature feature_bin_paths (list|tuple): Files containing the binary feature
data. data.
feature_desc_paths (list|tuple): Files containing the description of feature_desc_paths (list|tuple): Files containing the description of
samples' feature data. samples' feature data.
label_bin_paths (list|tuple): Files containing the binary label data. label_bin_paths (list|tuple): Files containing the binary label data.
label_desc_paths (list|tuple): Files containing the description of label_desc_paths (list|tuple): Files containing the description of
samples' label data. samples' label data.
split_perturb(int): Maximum perturbation value for length of
sub-sentence when splitting long sentence.
split_sentence_threshold(int): Sentence whose length larger than
the value will trigger split operation.
split_sub_sentence_len(int): sub-sentence length is equal to
(split_sub_sentence_len + \
rand() % split_perturb).
""" """
def __init__(self, feature_bin_paths, feature_desc_paths, label_bin_paths, def __init__(self,
label_desc_paths): feature_bin_paths,
feature_desc_paths,
label_bin_paths,
label_desc_paths,
split_perturb=50,
split_sentence_threshold=512,
split_sub_sentence_len=256):
block_num = len(label_bin_paths) block_num = len(label_bin_paths)
assert len(label_desc_paths) == block_num assert len(label_desc_paths) == block_num
assert len(feature_bin_paths) == block_num assert len(feature_bin_paths) == block_num
...@@ -75,6 +91,10 @@ class SampleInfoBucket(object): ...@@ -75,6 +91,10 @@ class SampleInfoBucket(object):
self._feature_desc_paths = feature_desc_paths self._feature_desc_paths = feature_desc_paths
self._label_bin_paths = label_bin_paths self._label_bin_paths = label_bin_paths
self._label_desc_paths = label_desc_paths self._label_desc_paths = label_desc_paths
self._split_perturb = split_perturb
self._split_sentence_threshold = split_sentence_threshold
self._split_sub_sentence_len = split_sub_sentence_len
self._rng = random.Random(0)
def generate_sample_info_list(self): def generate_sample_info_list(self):
sample_info_list = [] sample_info_list = []
...@@ -101,42 +121,70 @@ class SampleInfoBucket(object): ...@@ -101,42 +121,70 @@ class SampleInfoBucket(object):
label_start = int(label_desc_split[2]) label_start = int(label_desc_split[2])
label_size = int(label_desc_split[3]) label_size = int(label_desc_split[3])
label_frame_num = int(label_desc_split[4]) label_frame_num = int(label_desc_split[4])
assert feature_frame_num == label_frame_num
sample_info_list.append(
SampleInfo(feature_bin_path, feature_start, feature_size, if self._split_sentence_threshold == -1 or \
feature_frame_num, feature_dim, label_bin_path, self._split_perturb == -1 or \
label_start, label_size, label_frame_num)) self._split_sub_sentence_len == -1 \
or self._split_sentence_threshold >= feature_frame_num:
sample_info_list.append(
SampleInfo(feature_bin_path, feature_start,
feature_size, feature_frame_num, feature_dim,
label_bin_path, label_start, label_size,
label_frame_num))
#split sentence
else:
cur_frame_pos = 0
cur_frame_len = 0
remain_frame_num = feature_frame_num
while True:
if remain_frame_num > self._split_sentence_threshold:
cur_frame_len = self._split_sub_sentence_len + \
self._rng.randint(0, self._split_perturb)
if cur_frame_len > remain_frame_num:
cur_frame_len = remain_frame_num
else:
cur_frame_len = remain_frame_num
sample_info_list.append(
SampleInfo(
feature_bin_path, feature_start + cur_frame_pos
* feature_dim * 4, cur_frame_len * feature_dim *
4, cur_frame_len, feature_dim, label_bin_path,
label_start + cur_frame_pos * 4, cur_frame_len *
4, cur_frame_len))
remain_frame_num -= cur_frame_len
cur_frame_pos += cur_frame_len
if remain_frame_num <= 0:
break
return sample_info_list return sample_info_list
class EpochEndSignal(): class AsyncDataReader(object):
pass
class DataReader(object):
"""DataReader provides basic audio sample preprocessing pipeline including """DataReader provides basic audio sample preprocessing pipeline including
data loading and data augmentation. data loading and data augmentation.
Args: Args:
feature_file_list (str): File containing paths of feature data file and feature_file_list (str): File containing paths of feature data file and
corresponding description file. corresponding description file.
label_file_list (str): File containing paths of label data file and label_file_list (str): File containing paths of label data file and
corresponding description file. corresponding description file.
drop_frame_len (int): Samples whose label length above the value will be drop_frame_len (int): Samples whose label length above the value will be
dropped. dropped.(Using '-1' to disable the policy)
process_num (int): Number of processes for processing data. proc_num (int): Number of processes for processing data.
sample_buffer_size (int): Buffer size to indicate the maximum samples sample_buffer_size (int): Buffer size to indicate the maximum samples
cached. cached.
sample_info_buffer_size (int): Buffer size to indicate the maximum sample_info_buffer_size (int): Buffer size to indicate the maximum
sample information cached. sample information cached.
batch_buffer_size (int): Buffer size to indicate the maximum batch batch_buffer_size (int): Buffer size to indicate the maximum batch
cached. cached.
shuffle_block_num (int): Block number indicating the minimum unit to do shuffle_block_num (int): Block number indicating the minimum unit to do
shuffle. shuffle.
random_seed (int): Random seed. random_seed (int): Random seed.
verbose (int): If set to 0, complaints including exceptions and signal verbose (int): If set to 0, complaints including exceptions and signal
traceback from sub-process will be suppressed. If set traceback from sub-process will be suppressed. If set
to 1, all complaints will be printed. to 1, all complaints will be printed.
""" """
...@@ -144,11 +192,11 @@ class DataReader(object): ...@@ -144,11 +192,11 @@ class DataReader(object):
feature_file_list, feature_file_list,
label_file_list, label_file_list,
drop_frame_len=512, drop_frame_len=512,
process_num=10, proc_num=10,
sample_buffer_size=1024, sample_buffer_size=1024,
sample_info_buffer_size=1024, sample_info_buffer_size=1024,
batch_buffer_size=1024, batch_buffer_size=10,
shuffle_block_num=1, shuffle_block_num=10,
random_seed=0, random_seed=0,
verbose=0): verbose=0):
self._feature_file_list = feature_file_list self._feature_file_list = feature_file_list
...@@ -164,8 +212,12 @@ class DataReader(object): ...@@ -164,8 +212,12 @@ class DataReader(object):
self._sample_buffer_size = sample_buffer_size self._sample_buffer_size = sample_buffer_size
self._sample_info_buffer_size = sample_info_buffer_size self._sample_info_buffer_size = sample_info_buffer_size
self._batch_buffer_size = batch_buffer_size self._batch_buffer_size = batch_buffer_size
self._process_num = process_num self._proc_num = proc_num
if self._proc_num <= 2:
raise ValueError("Value of `proc_num` should be greater than 2.")
self._sample_proc_num = self._proc_num - 2
self._verbose = verbose self._verbose = verbose
self._force_exit = ForceExitWrapper(self._manager.Value('b', False))
def generate_bucket_list(self, is_shuffle): def generate_bucket_list(self, is_shuffle):
if self._block_info_list is None: if self._block_info_list is None:
...@@ -199,41 +251,57 @@ class DataReader(object): ...@@ -199,41 +251,57 @@ class DataReader(object):
def set_transformers(self, transformers): def set_transformers(self, transformers):
self._transformers = transformers self._transformers = transformers
def _sample_generator(self): def recycle(self, *args):
for shared_ndarray in args:
if not isinstance(shared_ndarray, SharedNDArray):
raise Value("Only support recycle SharedNDArray object.")
shared_ndarray.recycle(self._pool_manager.pool)
def _start_async_processing(self):
sample_info_queue = self._manager.Queue(self._sample_info_buffer_size) sample_info_queue = self._manager.Queue(self._sample_info_buffer_size)
sample_queue = self._manager.Queue(self._sample_buffer_size) sample_queue = self._manager.Queue(self._sample_buffer_size)
self._order_id = 0 self._order_id = 0
@suppress_complaints(verbose=self._verbose) @suppress_complaints(verbose=self._verbose, notify=self._force_exit)
def ordered_feeding_task(sample_info_queue): def ordered_feeding_task(sample_info_queue):
if self._verbose == 0:
signal.signal(signal.SIGTERM, suppress_signal)
signal.signal(signal.SIGINT, suppress_signal)
for sample_info_bucket in self._bucket_list: for sample_info_bucket in self._bucket_list:
sample_info_list = sample_info_bucket.generate_sample_info_list( try:
) sample_info_list = \
self._rng.shuffle(sample_info_list) # do shuffle here sample_info_bucket.generate_sample_info_list()
for sample_info in sample_info_list: except Exception as e:
sample_info_queue.put((sample_info, self._order_id)) raise CriticalException(e)
self._order_id += 1 else:
self._rng.shuffle(sample_info_list) # do shuffle here
for i in xrange(self._process_num): for sample_info in sample_info_list:
sample_info_queue.put((sample_info, self._order_id))
self._order_id += 1
for i in xrange(self._sample_proc_num):
sample_info_queue.put(EpochEndSignal()) sample_info_queue.put(EpochEndSignal())
feeding_thread = Thread( feeding_proc = DaemonProcessGroup(
target=ordered_feeding_task, args=(sample_info_queue, )) proc_num=1, target=ordered_feeding_task, args=(sample_info_queue, ))
feeding_thread.daemon = True feeding_proc.start_all()
feeding_thread.start()
@suppress_complaints(verbose=self._verbose) @suppress_complaints(verbose=self._verbose, notify=self._force_exit)
def ordered_processing_task(sample_info_queue, sample_queue, out_order): def ordered_processing_task(sample_info_queue, sample_queue, out_order):
if self._verbose == 0: if self._verbose == 0:
signal.signal(signal.SIGTERM, suppress_signal) signal.signal(signal.SIGTERM, suppress_signal)
signal.signal(signal.SIGINT, suppress_signal) signal.signal(signal.SIGINT, suppress_signal)
def read_bytes(fpath, start, size): def read_bytes(fpath, start, size):
f = open(fpath, 'r') try:
f.seek(start, 0) f = open(fpath, 'r')
binary_bytes = f.read(size) f.seek(start, 0)
f.close() binary_bytes = f.read(size)
return binary_bytes f.close()
return binary_bytes
except Exception as e:
raise CriticalException(e)
ins = sample_info_queue.get() ins = sample_info_queue.get()
...@@ -244,11 +312,21 @@ class DataReader(object): ...@@ -244,11 +312,21 @@ class DataReader(object):
sample_info.feature_start, sample_info.feature_start,
sample_info.feature_size) sample_info.feature_size)
assert sample_info.feature_frame_num \
* sample_info.feature_dim * 4 == len(feature_bytes), \
(sample_info.feature_bin_path,
sample_info.feature_frame_num,
sample_info.feature_dim,
len(feature_bytes))
label_bytes = read_bytes(sample_info.label_bin_path, label_bytes = read_bytes(sample_info.label_bin_path,
sample_info.label_start, sample_info.label_start,
sample_info.label_size) sample_info.label_size)
assert sample_info.label_frame_num * 4 == len(label_bytes) assert sample_info.label_frame_num * 4 == len(label_bytes), (
sample_info.label_bin_path, sample_info.label_array,
len(label_bytes))
label_array = struct.unpack('I' * sample_info.label_frame_num, label_array = struct.unpack('I' * sample_info.label_frame_num,
label_bytes) label_bytes)
label_data = np.array( label_data = np.array(
...@@ -273,7 +351,8 @@ class DataReader(object): ...@@ -273,7 +351,8 @@ class DataReader(object):
time.sleep(0.001) time.sleep(0.001)
# drop long sentence # drop long sentence
if self._drop_frame_len >= sample_data[0].shape[0]: if self._drop_frame_len == -1 or \
self._drop_frame_len >= sample_data[0].shape[0]:
sample_queue.put(sample_data) sample_queue.put(sample_data)
out_order[0] += 1 out_order[0] += 1
...@@ -283,73 +362,76 @@ class DataReader(object): ...@@ -283,73 +362,76 @@ class DataReader(object):
out_order = self._manager.list([0]) out_order = self._manager.list([0])
args = (sample_info_queue, sample_queue, out_order) args = (sample_info_queue, sample_queue, out_order)
workers = [ sample_proc = DaemonProcessGroup(
Process( proc_num=self._sample_proc_num,
target=ordered_processing_task, args=args) target=ordered_processing_task,
for _ in xrange(self._process_num) args=args)
] sample_proc.start_all()
for w in workers:
w.daemon = True
w.start()
finished_process_num = 0 return sample_queue
while finished_process_num < self._process_num: def batch_iterator(self, batch_size, minimum_batch_size):
sample = sample_queue.get() @suppress_complaints(verbose=self._verbose, notify=self._force_exit)
if isinstance(sample, EpochEndSignal): def batch_assembling_task(sample_queue, batch_queue, pool):
finished_process_num += 1 def conv_to_shared(ndarray):
continue while self._force_exit == False:
yield sample try:
(name, shared_ndarray) = pool.popitem()
except Exception as e:
time.sleep(0.001)
else:
shared_ndarray.copy(ndarray)
return shared_ndarray
feeding_thread.join() if self._verbose == 0:
for w in workers: signal.signal(signal.SIGTERM, suppress_signal)
w.join() signal.signal(signal.SIGINT, suppress_signal)
def batch_iterator(self, batch_size, minimum_batch_size):
def batch_to_ndarray(batch_samples, lod):
assert len(batch_samples)
frame_dim = batch_samples[0][0].shape[1]
batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32")
batch_label = np.zeros((lod[-1], 1), dtype="int64")
start = 0
for sample in batch_samples:
frame_num = sample[0].shape[0]
batch_feature[start:start + frame_num, :] = sample[0]
batch_label[start:start + frame_num, :] = sample[1]
start += frame_num
return (batch_feature, batch_label)
@suppress_complaints(verbose=self._verbose)
def batch_assembling_task(sample_generator, batch_queue):
batch_samples = [] batch_samples = []
lod = [0] lod = [0]
for sample in sample_generator(): done_num = 0
batch_samples.append(sample) while done_num < self._sample_proc_num:
lod.append(lod[-1] + sample[0].shape[0]) sample = sample_queue.get()
if len(batch_samples) == batch_size: if isinstance(sample, EpochEndSignal):
(batch_feature, batch_label) = batch_to_ndarray( done_num += 1
batch_samples, lod) else:
batch_queue.put((batch_feature, batch_label, lod)) batch_samples.append(sample)
batch_samples = [] lod.append(lod[-1] + sample[0].shape[0])
lod = [0] if len(batch_samples) == batch_size:
feature, label = batch_to_ndarray(batch_samples, lod)
feature = conv_to_shared(feature)
label = conv_to_shared(label)
lod = conv_to_shared(np.array(lod).astype('int64'))
batch_queue.put((feature, label, lod))
batch_samples = []
lod = [0]
if len(batch_samples) >= minimum_batch_size: if len(batch_samples) >= minimum_batch_size:
(batch_feature, batch_label) = batch_to_ndarray(batch_samples, (feature, label) = batch_to_ndarray(batch_samples, lod)
lod)
batch_queue.put((batch_feature, batch_label, lod)) feature = conv_to_shared(feature)
label = conv_to_shared(label)
lod = conv_to_shared(np.array(lod).astype('int64'))
batch_queue.put((feature, label, lod))
batch_queue.put(EpochEndSignal()) batch_queue.put(EpochEndSignal())
batch_queue = Queue.Queue(self._batch_buffer_size) sample_queue = self._start_async_processing()
batch_queue = self._manager.Queue(self._batch_buffer_size)
self._pool_manager = SharedMemoryPoolManager(self._batch_buffer_size *
3, self._manager)
assembling_thread = Thread( assembling_proc = DaemonProcessGroup(
proc_num=1,
target=batch_assembling_task, target=batch_assembling_task,
args=(self._sample_generator, batch_queue)) args=(sample_queue, batch_queue, self._pool_manager.pool))
assembling_thread.daemon = True assembling_proc.start_all()
assembling_thread.start()
while True: while self._force_exit == False:
try: try:
batch_data = batch_queue.get_nowait() batch_data = batch_queue.get_nowait()
except Queue.Empty: except Queue.Empty:
...@@ -359,4 +441,5 @@ class DataReader(object): ...@@ -359,4 +441,5 @@ class DataReader(object):
break break
yield batch_data yield batch_data
assembling_thread.join() # clean the shared memory
del self._pool_manager
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import sys import sys, time
from six import reraise from six import reraise
from tblib import Traceback from tblib import Traceback
from multiprocessing import Manager, Process
import posix_ipc, mmap
import numpy as np import numpy as np
...@@ -35,21 +37,177 @@ def lodtensor_to_ndarray(lod_tensor): ...@@ -35,21 +37,177 @@ def lodtensor_to_ndarray(lod_tensor):
return ret, lod_tensor.lod() return ret, lod_tensor.lod()
def batch_to_ndarray(batch_samples, lod):
frame_dim = batch_samples[0][0].shape[1]
batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32")
batch_label = np.zeros((lod[-1], 1), dtype="int64")
start = 0
for sample in batch_samples:
frame_num = sample[0].shape[0]
batch_feature[start:start + frame_num, :] = sample[0]
batch_label[start:start + frame_num, :] = sample[1]
start += frame_num
return (batch_feature, batch_label)
def split_infer_result(infer_seq, lod):
infer_batch = []
for i in xrange(0, len(lod[0]) - 1):
infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
return infer_batch
class DaemonProcessGroup(object):
def __init__(self, proc_num, target, args):
self._proc_num = proc_num
self._workers = [
Process(
target=target, args=args) for _ in xrange(self._proc_num)
]
def start_all(self):
for w in self._workers:
w.daemon = True
w.start()
@property
def proc_num(self):
return self._proc_num
class EpochEndSignal(object):
pass
class CriticalException(Exception):
pass
class SharedNDArray(object):
"""SharedNDArray utilizes shared memory to avoid data serialization when
data object shared among different processes. We can reconstruct the
`ndarray` when memory address, shape and dtype provided.
Args:
name (str): Address name of shared memory.
whether_verify (bool): Whether to validate the writing operation.
"""
def __init__(self, name, whether_verify=False):
self._name = name
self._shm = None
self._buf = None
self._array = np.zeros(1, dtype=np.float32)
self._inited = False
self._whether_verify = whether_verify
def zeros_like(self, shape, dtype):
size = int(np.prod(shape)) * np.dtype(dtype).itemsize
if self._inited:
self._shm = posix_ipc.SharedMemory(self._name)
else:
self._shm = posix_ipc.SharedMemory(
self._name, posix_ipc.O_CREAT, size=size)
self._buf = mmap.mmap(self._shm.fd, size)
self._array = np.ndarray(shape, dtype, self._buf, order='C')
def copy(self, ndarray):
size = int(np.prod(ndarray.shape)) * np.dtype(ndarray.dtype).itemsize
self.zeros_like(ndarray.shape, ndarray.dtype)
self._array[:] = ndarray
self._buf.flush()
self._inited = True
if self._whether_verify:
shm = posix_ipc.SharedMemory(self._name)
buf = mmap.mmap(shm.fd, size)
array = np.ndarray(ndarray.shape, ndarray.dtype, buf, order='C')
np.testing.assert_array_equal(array, ndarray)
@property
def ndarray(self):
return self._array
def recycle(self, pool):
self._buf.close()
self._shm.close_fd()
self._inited = False
pool[self._name] = self
def __getstate__(self):
return (self._name, self._array.shape, self._array.dtype, self._inited,
self._whether_verify)
def __setstate__(self, state):
self._name = state[0]
self._inited = state[3]
self.zeros_like(state[1], state[2])
self._whether_verify = state[4]
class SharedMemoryPoolManager(object):
"""SharedMemoryPoolManager maintains a multiprocessing.Manager.dict object.
All available addresses are allocated once and will be reused. Though this
class is not process-safe, the pool can be shared between processes. All
shared memory should be unlinked before the main process exited.
Args:
pool_size (int): Size of shared memory pool.
manager (dict): A multiprocessing.Manager object, the pool is
maintained by the proxy process.
name_prefix (str): Address prefix of shared memory.
"""
def __init__(self, pool_size, manager, name_prefix='/deep_asr'):
self._names = []
self._dict = manager.dict()
self._time_prefix = time.strftime('%Y%m%d%H%M%S')
for i in xrange(pool_size):
name = name_prefix + '_' + self._time_prefix + '_' + str(i)
self._dict[name] = SharedNDArray(name)
self._names.append(name)
@property
def pool(self):
return self._dict
def __del__(self):
for name in self._names:
# have to unlink the shared memory
posix_ipc.unlink_shared_memory(name)
def suppress_signal(signo, stack_frame): def suppress_signal(signo, stack_frame):
pass pass
def suppress_complaints(verbose): def suppress_complaints(verbose, notify=None):
def decorator_maker(func): def decorator_maker(func):
def suppress_warpper(*args, **kwargs): def suppress_warpper(*args, **kwargs):
try: try:
func(*args, **kwargs) func(*args, **kwargs)
except: except:
et, ev, tb = sys.exc_info() et, ev, tb = sys.exc_info()
tb = Traceback(tb)
if verbose == 1: if notify is not None:
reraise(et, ev, tb.as_traceback()) notify(except_type=et, except_value=ev, traceback=tb)
if verbose == 1 or isinstance(ev, CriticalException):
reraise(et, ev, Traceback(tb).as_traceback())
return suppress_warpper return suppress_warpper
return decorator_maker return decorator_maker
class ForceExitWrapper(object):
def __init__(self, exit_flag):
self._exit_flag = exit_flag
@suppress_complaints(verbose=0)
def __call__(self, *args, **kwargs):
self._exit_flag.value = True
def __eq__(self, flag):
return self._exit_flag.value == flag
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include "post_decode_faster.h"
typedef kaldi::int32 int32;
using fst::SymbolTable;
using fst::VectorFst;
using fst::StdArc;
Decoder::Decoder(std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_rxfilename) {
const char* usage =
"Decode, reading log-likelihoods (of transition-ids or whatever symbol "
"is on the graph) as matrices.";
kaldi::ParseOptions po(usage);
binary = true;
acoustic_scale = 1.5;
allow_partial = true;
kaldi::FasterDecoderOptions decoder_opts;
decoder_opts.Register(&po, true); // true == include obscure settings.
po.Register("binary", &binary, "Write output in binary mode");
po.Register("allow-partial",
&allow_partial,
"Produce output even when final state was not reached");
po.Register("acoustic-scale",
&acoustic_scale,
"Scaling factor for acoustic likelihoods");
word_syms = NULL;
if (word_syms_filename != "") {
word_syms = fst::SymbolTable::ReadText(word_syms_filename);
if (!word_syms)
KALDI_ERR << "Could not read symbol table from file "
<< word_syms_filename;
}
std::ifstream is_logprior(logprior_rxfilename);
logprior.Read(is_logprior, false);
// It's important that we initialize decode_fst after loglikes_reader, as it
// can prevent crashes on systems installed without enough virtual memory.
// It has to do with what happens on UNIX systems if you call fork() on a
// large process: the page-table entries are duplicated, which requires a
// lot of virtual memory.
decode_fst = fst::ReadFstKaldi(fst_in_filename);
decoder = new kaldi::FasterDecoder(*decode_fst, decoder_opts);
}
Decoder::~Decoder() {
if (!word_syms) delete word_syms;
delete decode_fst;
delete decoder;
}
std::string Decoder::decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>>& log_probs) {
size_t num_frames = log_probs.size();
size_t dim_label = log_probs[0].size();
kaldi::Matrix<kaldi::BaseFloat> loglikes(
num_frames, dim_label, kaldi::kSetZero, kaldi::kStrideEqualNumCols);
for (size_t i = 0; i < num_frames; ++i) {
memcpy(loglikes.Data() + i * dim_label,
log_probs[i].data(),
sizeof(kaldi::BaseFloat) * dim_label);
}
return decode(key, loglikes);
}
std::vector<std::string> Decoder::decode(std::string posterior_rspecifier) {
kaldi::SequentialBaseFloatMatrixReader posterior_reader(posterior_rspecifier);
std::vector<std::string> decoding_results;
for (; !posterior_reader.Done(); posterior_reader.Next()) {
std::string key = posterior_reader.Key();
kaldi::Matrix<kaldi::BaseFloat> loglikes(posterior_reader.Value());
decoding_results.push_back(decode(key, loglikes));
}
return decoding_results;
}
std::string Decoder::decode(std::string key,
kaldi::Matrix<kaldi::BaseFloat>& loglikes) {
std::string decoding_result;
if (loglikes.NumRows() == 0) {
KALDI_WARN << "Zero-length utterance: " << key;
}
KALDI_ASSERT(loglikes.NumCols() == logprior.Dim());
loglikes.ApplyLog();
loglikes.AddVecToRows(-1.0, logprior);
kaldi::DecodableMatrixScaled decodable(loglikes, acoustic_scale);
decoder->Decode(&decodable);
VectorFst<kaldi::LatticeArc> decoded; // linear FST.
if ((allow_partial || decoder->ReachedFinal()) &&
decoder->GetBestPath(&decoded)) {
if (!decoder->ReachedFinal())
KALDI_WARN << "Decoder did not reach end-state, outputting partial "
"traceback.";
std::vector<int32> alignment;
std::vector<int32> words;
kaldi::LatticeWeight weight;
GetLinearSymbolSequence(decoded, &alignment, &words, &weight);
if (word_syms != NULL) {
for (size_t i = 0; i < words.size(); i++) {
std::string s = word_syms->Find(words[i]);
decoding_result += s;
if (s == "")
KALDI_ERR << "Word-id " << words[i] << " not in symbol table.";
}
}
}
return decoding_result;
}
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <string>
#include <vector>
#include "base/kaldi-common.h"
#include "base/timer.h"
#include "decoder/decodable-matrix.h"
#include "decoder/faster-decoder.h"
#include "fstext/fstext-lib.h"
#include "hmm/transition-model.h"
#include "lat/kaldi-lattice.h" // for {Compact}LatticeArc
#include "tree/context-dep.h"
#include "util/common-utils.h"
class Decoder {
public:
Decoder(std::string word_syms_filename,
std::string fst_in_filename,
std::string logprior_rxfilename);
~Decoder();
// Interface to accept the scores read from specifier and return
// the batch decoding results
std::vector<std::string> decode(std::string posterior_rspecifier);
// Accept the scores of one utterance and return the decoding result
std::string decode(
std::string key,
const std::vector<std::vector<kaldi::BaseFloat>> &log_probs);
private:
// For decoding one utterance
std::string decode(std::string key,
kaldi::Matrix<kaldi::BaseFloat> &loglikes);
fst::SymbolTable *word_syms;
fst::VectorFst<fst::StdArc> *decode_fst;
kaldi::FasterDecoder *decoder;
kaldi::Vector<kaldi::BaseFloat> logprior;
bool binary;
kaldi::BaseFloat acoustic_scale;
bool allow_partial;
};
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "post_decode_faster.h"
namespace py = pybind11;
PYBIND11_MODULE(post_decode_faster, m) {
m.doc() = "Decoder for Deep ASR model";
py::class_<Decoder>(m, "Decoder")
.def(py::init<std::string, std::string, std::string>())
.def("decode",
(std::vector<std::string> (Decoder::*)(std::string)) &
Decoder::decode,
"Decode for the probability matrices in specifier "
"and return the transcriptions.")
.def(
"decode",
(std::string (Decoder::*)(
std::string, const std::vector<std::vector<kaldi::BaseFloat>>&)) &
Decoder::decode,
"Decode one input probability matrix "
"and return the transcription.");
}
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import glob
from distutils.core import setup, Extension
from distutils.sysconfig import get_config_vars
try:
kaldi_root = os.environ['KALDI_ROOT']
except:
raise ValueError("Enviroment variable 'KALDI_ROOT' is not defined. Please "
"install kaldi and export KALDI_ROOT=<kaldi's root dir> .")
args = [
'-std=c++11', '-Wno-sign-compare', '-Wno-unused-variable',
'-Wno-unused-local-typedefs', '-Wno-unused-but-set-variable',
'-Wno-deprecated-declarations', '-Wno-unused-function'
]
# remove warning about -Wstrict-prototypes
(opt, ) = get_config_vars('OPT')
os.environ['OPT'] = " ".join(flag for flag in opt.split()
if flag != '-Wstrict-prototypes')
os.environ['CC'] = 'g++'
LIBS = [
'fst', 'kaldi-base', 'kaldi-util', 'kaldi-matrix', 'kaldi-tree',
'kaldi-hmm', 'kaldi-fstext', 'kaldi-decoder', 'kaldi-lat'
]
LIB_DIRS = [
'tools/openfst/lib', 'src/base', 'src/matrix', 'src/util', 'src/tree',
'src/hmm', 'src/fstext', 'src/decoder', 'src/lat'
]
LIB_DIRS = [os.path.join(kaldi_root, path) for path in LIB_DIRS]
LIB_DIRS = [os.path.abspath(path) for path in LIB_DIRS]
ext_modules = [
Extension(
'post_decode_faster',
['pybind.cc', 'post_decode_faster.cc'],
include_dirs=[
'pybind11/include', '.', os.path.join(kaldi_root, 'src'),
os.path.join(kaldi_root, 'tools/openfst/src/include')
],
language='c++',
libraries=LIBS,
library_dirs=LIB_DIRS,
runtime_library_dirs=LIB_DIRS,
extra_compile_args=args, ),
]
setup(
name='post_decode_faster',
version='0.0.1',
author='Paddle',
author_email='',
description='Decoder for Deep ASR model',
ext_modules=ext_modules, )
set -e
if [ ! -d pybind11 ]; then
git clone https://github.com/pybind/pybind11.git
fi
python setup.py build_ext -i
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from data_utils.util import split_infer_result
def parse_args():
parser = argparse.ArgumentParser("Inference for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_model_path',
type=str,
default='./infer_models/deep_asr.pass_0.infer.model/',
help='The directory for loading inference model. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def infer(args):
""" Gets one batch of feature data and predicts labels for each sample.
"""
if not os.path.exists(args.infer_model_path):
raise IOError("Invalid inference model path!")
place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace()
exe = fluid.Executor(place)
# load model
[infer_program, feed_dict,
fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
infer_data_reader = reader.DataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
one_batch = infer_data_reader.batch_iterator(args.batch_size, 1).next()
(features, labels, lod) = one_batch
feature_t.set(features, place)
feature_t.set_lod([lod])
results = exe.run(infer_program,
feed={feed_dict[0]: feature_t},
fetch_list=fetch_targets,
return_numpy=False)
probs, lod = lodtensor_to_ndarray(results[0])
preds = probs.argmax(axis=1)
infer_batch = split_infer_result(preds, lod)
for index, sample in enumerate(infer_batch):
print("result %d: " % index, sample, '\n')
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.async_data_reader as reader
from decoder.post_decode_faster import Decoder
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
from data_utils.util import split_infer_result
def parse_args():
parser = argparse.ArgumentParser("Run inference by using checkpoint.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
help='The minimum sequence number of a batch data. '
'(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.00016,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--checkpoint',
type=str,
default='./checkpoint',
help="The checkpoint path to init model. (default: %(default)s)")
parser.add_argument(
'--vocabulary',
type=str,
default='./decoder/graph/words.txt',
help="The path to vocabulary. (default: %(default)s)")
parser.add_argument(
'--graphs',
type=str,
default='./decoder/graph/TLG.fst',
help="The path to TLG graphs for decoding. (default: %(default)s)")
parser.add_argument(
'--log_prior',
type=str,
default="./decoder/logprior",
help="The log prior probs for training data. (default: %(default)s)")
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def infer_from_ckpt(args):
"""Inference by using checkpoint."""
if not os.path.exists(args.checkpoint):
raise IOError("Invalid checkpoint!")
prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=args.class_num,
parallel=args.parallel)
infer_program = fluid.default_main_program().clone()
optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# load checkpoint.
fluid.io.load_persistables(exe, args.checkpoint)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
# infer data reader
infer_data_reader = reader.AsyncDataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
infer_costs, infer_accs = [], []
for batch_id, batch_data in enumerate(
infer_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
feature_t.set(features.ndarray, place)
feature_t.set_lod([lod.ndarray])
label_t.set(labels.ndarray, place)
label_t.set_lod([lod.ndarray])
infer_data_reader.recycle(features, labels, lod)
results = exe.run(infer_program,
feed={"feature": feature_t,
"label": label_t},
fetch_list=[prediction, avg_cost, accuracy],
return_numpy=False)
infer_costs.append(lodtensor_to_ndarray(results[1])[0])
infer_accs.append(lodtensor_to_ndarray(results[2])[0])
probs, lod = lodtensor_to_ndarray(results[0])
infer_batch = split_infer_result(probs, lod)
for index, sample in enumerate(infer_batch):
key = "utter#%d" % (batch_id * args.batch_size + index)
print(key, ": ", decoder.decode(key, sample), "\n")
print(np.mean(infer_costs), np.mean(infer_accs))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer_from_ckpt(args)
...@@ -3,10 +3,11 @@ from __future__ import division ...@@ -3,10 +3,11 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
def stacked_lstmp_model(hidden_dim, def stacked_lstmp_model(frame_dim,
hidden_dim,
proj_dim, proj_dim,
stacked_num, stacked_num,
class_num, class_num,
...@@ -20,12 +21,13 @@ def stacked_lstmp_model(hidden_dim, ...@@ -20,12 +21,13 @@ def stacked_lstmp_model(hidden_dim,
label data respectively. And in inference, only `feature` is needed. label data respectively. And in inference, only `feature` is needed.
Args: Args:
hidden_dim(int): The hidden state's dimension of the LSTMP layer. frame_dim(int): The frame dimension of feature data.
proj_dim(int): The projection size of the LSTMP layer. hidden_dim(int): The hidden state's dimension of the LSTMP layer.
stacked_num(int): The number of stacked LSTMP layers. proj_dim(int): The projection size of the LSTMP layer.
parallel(bool): Run in parallel or not, default `False`. stacked_num(int): The number of stacked LSTMP layers.
is_train(bool): Run in training phase or not, default `True`. parallel(bool): Run in parallel or not, default `False`.
class_dim(int): The number of output classes. is_train(bool): Run in training phase or not, default `True`.
class_dim(int): The number of output classes.
""" """
# network configuration # network configuration
...@@ -78,7 +80,7 @@ def stacked_lstmp_model(hidden_dim, ...@@ -78,7 +80,7 @@ def stacked_lstmp_model(hidden_dim,
# data feeder # data feeder
feature = fluid.layers.data( feature = fluid.layers.data(
name="feature", shape=[-1, 120 * 11], dtype="float32", lod_level=1) name="feature", shape=[-1, frame_dim], dtype="float32", lod_level=1)
label = fluid.layers.data( label = fluid.layers.data(
name="label", shape=[-1, 1], dtype="int64", lod_level=1) name="label", shape=[-1, 1], dtype="int64", lod_level=1)
...@@ -92,11 +94,12 @@ def stacked_lstmp_model(hidden_dim, ...@@ -92,11 +94,12 @@ def stacked_lstmp_model(hidden_dim,
feat_ = pd.read_input(feature) feat_ = pd.read_input(feature)
label_ = pd.read_input(label) label_ = pd.read_input(label)
prediction, avg_cost, acc = _net_conf(feat_, label_) prediction, avg_cost, acc = _net_conf(feat_, label_)
for out in [avg_cost, acc]: for out in [prediction, avg_cost, acc]:
pd.write_output(out) pd.write_output(out)
# get mean loss and acc through every devices. # get mean loss and acc through every devices.
avg_cost, acc = pd() prediction, avg_cost, acc = pd()
prediction.stop_gradient = True
avg_cost = fluid.layers.mean(x=avg_cost) avg_cost = fluid.layers.mean(x=avg_cost)
acc = fluid.layers.mean(x=acc) acc = fluid.layers.mean(x=acc)
else: else:
......
...@@ -7,13 +7,13 @@ import numpy as np ...@@ -7,13 +7,13 @@ import numpy as np
import argparse import argparse
import time import time
import paddle.v2.fluid as fluid import paddle.fluid as fluid
import paddle.v2.fluid.profiler as profiler import paddle.fluid.profiler as profiler
import _init_paths import _init_paths
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader import data_utils.async_data_reader as reader
from model_utils.model import stacked_lstmp_model from model_utils.model import stacked_lstmp_model
from data_utils.util import lodtensor_to_ndarray from data_utils.util import lodtensor_to_ndarray
...@@ -31,6 +31,11 @@ def parse_args(): ...@@ -31,6 +31,11 @@ def parse_args():
default=1, default=1,
help='The minimum sequence number of a batch data. ' help='The minimum sequence number of a batch data. '
'(default: %(default)d)') '(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--stacked_num', '--stacked_num',
type=int, type=int,
...@@ -46,10 +51,15 @@ def parse_args(): ...@@ -46,10 +51,15 @@ def parse_args():
type=int, type=int,
default=1024, default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)') help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--learning_rate', '--learning_rate',
type=float, type=float,
default=0.002, default=0.00016,
help='Learning rate used to train. (default: %(default)f)') help='Learning rate used to train. (default: %(default)f)')
parser.add_argument( parser.add_argument(
'--device', '--device',
...@@ -119,14 +129,15 @@ def profile(args): ...@@ -119,14 +129,15 @@ def profile(args):
"arg 'first_batches_to_skip' must not be smaller than 0.") "arg 'first_batches_to_skip' must not be smaller than 0.")
_, avg_cost, accuracy = stacked_lstmp_model( _, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim, hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim, proj_dim=args.proj_dim,
stacked_num=args.stacked_num, stacked_num=args.stacked_num,
class_num=1749, class_num=args.class_num,
parallel=args.parallel) parallel=args.parallel)
adam_optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate) optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
adam_optimizer.minimize(avg_cost) optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place) exe = fluid.Executor(place)
...@@ -138,7 +149,7 @@ def profile(args): ...@@ -138,7 +149,7 @@ def profile(args):
trans_splice.TransSplice() trans_splice.TransSplice()
] ]
data_reader = reader.DataReader(args.feature_lst, args.label_lst) data_reader = reader.AsyncDataReader(args.feature_lst, args.label_lst)
data_reader.set_transformers(ltrans) data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor() feature_t = fluid.LoDTensor()
...@@ -158,17 +169,20 @@ def profile(args): ...@@ -158,17 +169,20 @@ def profile(args):
frames_seen = 0 frames_seen = 0
# load_data # load_data
(features, labels, lod) = batch_data (features, labels, lod) = batch_data
feature_t.set(features, place) feature_t.set(features.ndarray, place)
feature_t.set_lod([lod]) feature_t.set_lod([lod.ndarray])
label_t.set(labels, place) label_t.set(labels.ndarray, place)
label_t.set_lod([lod]) label_t.set_lod([lod.ndarray])
frames_seen += lod.ndarray[-1]
frames_seen += lod[-1] data_reader.recycle(features, labels, lod)
outs = exe.run(fluid.default_main_program(), outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t, feed={"feature": feature_t,
"label": label_t}, "label": label_t},
fetch_list=[avg_cost, accuracy], fetch_list=[avg_cost, accuracy]
if args.print_train_acc else [],
return_numpy=False) return_numpy=False)
if args.print_train_acc: if args.print_train_acc:
......
...@@ -8,11 +8,11 @@ import numpy as np ...@@ -8,11 +8,11 @@ import numpy as np
import argparse import argparse
import time import time
import paddle.v2.fluid as fluid import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader import data_utils.async_data_reader as reader
from data_utils.util import lodtensor_to_ndarray from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model from model_utils.model import stacked_lstmp_model
...@@ -30,21 +30,31 @@ def parse_args(): ...@@ -30,21 +30,31 @@ def parse_args():
default=1, default=1,
help='The minimum sequence number of a batch data. ' help='The minimum sequence number of a batch data. '
'(default: %(default)d)') '(default: %(default)d)')
parser.add_argument(
'--frame_dim',
type=int,
default=120 * 11,
help='Frame dimension of feature data. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--stacked_num', '--stacked_num',
type=int, type=int,
default=5, default=5,
help='Number of lstm layers to stack. (default: %(default)d)') help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--proj_dim', '--proj_dim',
type=int, type=int,
default=512, default=512,
help='Project size of lstm unit. (default: %(default)d)') help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--hidden_dim', '--hidden_dim',
type=int, type=int,
default=1024, default=1024,
help='Hidden size of lstm unit. (default: %(default)d)') help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--class_num',
type=int,
default=1749,
help='Number of classes in label. (default: %(default)d)')
parser.add_argument( parser.add_argument(
'--pass_num', '--pass_num',
type=int, type=int,
...@@ -58,7 +68,7 @@ def parse_args(): ...@@ -58,7 +68,7 @@ def parse_args():
parser.add_argument( parser.add_argument(
'--learning_rate', '--learning_rate',
type=float, type=float,
default=0.002, default=0.00016,
help='Learning rate used to train. (default: %(default)f)') help='Learning rate used to train. (default: %(default)f)')
parser.add_argument( parser.add_argument(
'--device', '--device',
...@@ -72,33 +82,46 @@ def parse_args(): ...@@ -72,33 +82,46 @@ def parse_args():
'--mean_var', '--mean_var',
type=str, type=str,
default='data/global_mean_var_search26kHr', default='data/global_mean_var_search26kHr',
help='mean var path') help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument( parser.add_argument(
'--train_feature_lst', '--train_feature_lst',
type=str, type=str,
default='data/feature.lst', default='data/feature.lst',
help='feature list path for training.') help='The feature list path for training. (default: %(default)s)')
parser.add_argument( parser.add_argument(
'--train_label_lst', '--train_label_lst',
type=str, type=str,
default='data/label.lst', default='data/label.lst',
help='label list path for training.') help='The label list path for training. (default: %(default)s)')
parser.add_argument( parser.add_argument(
'--val_feature_lst', '--val_feature_lst',
type=str, type=str,
default='data/val_feature.lst', default='data/val_feature.lst',
help='feature list path for validation.') help='The feature list path for validation. (default: %(default)s)')
parser.add_argument( parser.add_argument(
'--val_label_lst', '--val_label_lst',
type=str, type=str,
default='data/val_label.lst', default='data/val_label.lst',
help='label list path for validation.') help='The label list path for validation. (default: %(default)s)')
parser.add_argument(
'--init_model_path',
type=str,
default=None,
help="The model (checkpoint) path which the training resumes from. "
"If None, train the model from scratch. (default: %(default)s)")
parser.add_argument( parser.add_argument(
'--model_save_dir', '--checkpoints',
type=str, type=str,
default='./checkpoints', default='./checkpoints',
help='directory to save model. Do not save model if set to ' help="The directory for saving checkpoints. Do not save checkpoints "
'.') "if set to ''. (default: %(default)s)")
parser.add_argument(
'--infer_models',
type=str,
default='./infer_models',
help="The directory for saving inference models. Do not save inference "
"models if set to ''. (default: %(default)s)")
args = parser.parse_args() args = parser.parse_args()
return args return args
...@@ -114,27 +137,37 @@ def train(args): ...@@ -114,27 +137,37 @@ def train(args):
"""train in loop. """train in loop.
""" """
# prediction, avg_cost, accuracy = stacked_lstmp_model(args.hidden_dim, # paths check
# args.proj_dim, args.stacked_num, class_num=1749, args.parallel) if args.init_model_path is not None and \
not os.path.exists(args.init_model_path):
raise IOError("Invalid initial model path!")
if args.checkpoints != '' and not os.path.exists(args.checkpoints):
os.mkdir(args.checkpoints)
if args.infer_models != '' and not os.path.exists(args.infer_models):
os.mkdir(args.infer_models)
prediction, avg_cost, accuracy = stacked_lstmp_model( prediction, avg_cost, accuracy = stacked_lstmp_model(
frame_dim=args.frame_dim,
hidden_dim=args.hidden_dim, hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim, proj_dim=args.proj_dim,
stacked_num=args.stacked_num, stacked_num=args.stacked_num,
class_num=1749, class_num=args.class_num,
parallel=args.parallel) parallel=args.parallel)
adam_optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
adam_optimizer.minimize(avg_cost)
# program for test # program for test
test_program = fluid.default_main_program().clone() test_program = fluid.default_main_program().clone()
with fluid.program_guard(test_program):
test_program = fluid.io.get_inference_program([avg_cost, accuracy]) optimizer = fluid.optimizer.Adam(learning_rate=args.learning_rate)
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0) place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
# resume training if initial model provided.
if args.init_model_path is not None:
fluid.io.load_persistables(exe, args.init_model_path)
ltrans = [ ltrans = [
trans_add_delta.TransAddDelta(2, 2), trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var), trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
...@@ -151,8 +184,8 @@ def train(args): ...@@ -151,8 +184,8 @@ def train(args):
os.path.exists(args.val_label_lst)): os.path.exists(args.val_label_lst)):
return -1.0, -1.0 return -1.0, -1.0
# test data reader # test data reader
test_data_reader = reader.DataReader(args.val_feature_lst, test_data_reader = reader.AsyncDataReader(args.val_feature_lst,
args.val_label_lst) args.val_label_lst)
test_data_reader.set_transformers(ltrans) test_data_reader.set_transformers(ltrans)
test_costs, test_accs = [], [] test_costs, test_accs = [], []
for batch_id, batch_data in enumerate( for batch_id, batch_data in enumerate(
...@@ -160,10 +193,12 @@ def train(args): ...@@ -160,10 +193,12 @@ def train(args):
args.minimum_batch_size)): args.minimum_batch_size)):
# load_data # load_data
(features, labels, lod) = batch_data (features, labels, lod) = batch_data
feature_t.set(features, place) feature_t.set(features.ndarray, place)
feature_t.set_lod([lod]) feature_t.set_lod([lod.ndarray])
label_t.set(labels, place) label_t.set(labels.ndarray, place)
label_t.set_lod([lod]) label_t.set_lod([lod.ndarray])
test_data_reader.recycle(features, labels, lod)
cost, acc = exe.run(test_program, cost, acc = exe.run(test_program,
feed={"feature": feature_t, feed={"feature": feature_t,
...@@ -175,8 +210,8 @@ def train(args): ...@@ -175,8 +210,8 @@ def train(args):
return np.mean(test_costs), np.mean(test_accs) return np.mean(test_costs), np.mean(test_accs)
# train data reader # train data reader
train_data_reader = reader.DataReader(args.train_feature_lst, train_data_reader = reader.AsyncDataReader(args.train_feature_lst,
args.train_label_lst) args.train_label_lst, -1)
train_data_reader.set_transformers(ltrans) train_data_reader.set_transformers(ltrans)
# train # train
for pass_id in xrange(args.pass_num): for pass_id in xrange(args.pass_num):
...@@ -186,30 +221,46 @@ def train(args): ...@@ -186,30 +221,46 @@ def train(args):
args.minimum_batch_size)): args.minimum_batch_size)):
# load_data # load_data
(features, labels, lod) = batch_data (features, labels, lod) = batch_data
feature_t.set(features, place) feature_t.set(features.ndarray, place)
feature_t.set_lod([lod]) feature_t.set_lod([lod.ndarray])
label_t.set(labels, place) label_t.set(labels.ndarray, place)
label_t.set_lod([lod]) label_t.set_lod([lod.ndarray])
cost, acc = exe.run(fluid.default_main_program(), train_data_reader.recycle(features, labels, lod)
feed={"feature": feature_t,
"label": label_t}, to_print = batch_id > 0 and (batch_id % args.print_per_batches == 0)
fetch_list=[avg_cost, accuracy], outs = exe.run(fluid.default_main_program(),
return_numpy=False) feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy] if to_print else [],
return_numpy=False)
if batch_id > 0 and (batch_id % args.print_per_batches == 0): if to_print:
print("\nBatch %d, train cost: %f, train acc: %f" % print("\nBatch %d, train cost: %f, train acc: %f" %
(batch_id, lodtensor_to_ndarray(cost)[0], (batch_id, lodtensor_to_ndarray(outs[0])[0],
lodtensor_to_ndarray(acc)[0])) lodtensor_to_ndarray(outs[1])[0]))
# save the latest checkpoint
if args.checkpoints != '':
model_path = os.path.join(args.checkpoints,
"deep_asr.latest.checkpoint")
fluid.io.save_persistables(exe, model_path)
else: else:
sys.stdout.write('.') sys.stdout.write('.')
sys.stdout.flush() sys.stdout.flush()
# run test # run test
val_cost, val_acc = test(exe) val_cost, val_acc = test(exe)
# save model
if args.model_save_dir != '': # save checkpoint per pass
if args.checkpoints != '':
model_path = os.path.join( model_path = os.path.join(
args.model_save_dir, "deep_asr.pass_" + str(pass_id) + ".model") args.checkpoints,
"deep_asr.pass_" + str(pass_id) + ".checkpoint")
fluid.io.save_persistables(exe, model_path)
# save inference model
if args.infer_models != '':
model_path = os.path.join(
args.infer_models,
"deep_asr.pass_" + str(pass_id) + ".infer.model")
fluid.io.save_inference_model(model_path, ["feature"], fluid.io.save_inference_model(model_path, ["feature"],
[prediction], exe) [prediction], exe)
# cal pass time # cal pass time
...@@ -224,7 +275,4 @@ if __name__ == '__main__': ...@@ -224,7 +275,4 @@ if __name__ == '__main__':
args = parse_args() args = parse_args()
print_arguments(args) print_arguments(args)
if args.model_save_dir != '' and not os.path.exists(args.model_save_dir):
os.mkdir(args.model_save_dir)
train(args) train(args)
# Paddle Fluid Models
---
The Paddle Fluid models are a collection of example models that use Paddle Fluid APIs. Currently, example codes in this directory are still under active development.
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Advbox # Advbox
Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and paddle. Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and paddle.
......
""" """
A set of tools for generating adversarial example on paddle platform A set of tools for generating adversarial example on paddle platform
""" """
from . import attacks
from . import models
from .adversary import Adversary
...@@ -18,13 +18,15 @@ class Adversary(object): ...@@ -18,13 +18,15 @@ class Adversary(object):
""" """
assert original is not None assert original is not None
self.original_label = original_label
self.target_label = None
self.adversarial_label = None
self.__original = original self.__original = original
self.__original_label = original_label
self.__target_label = None
self.__target = None self.__target = None
self.__is_targeted_attack = False self.__is_targeted_attack = False
self.__adversarial_example = None self.__adversarial_example = None
self.__adversarial_label = None self.__bad_adversarial_example = None
def set_target(self, is_targeted_attack, target=None, target_label=None): def set_target(self, is_targeted_attack, target=None, target_label=None):
""" """
...@@ -38,10 +40,10 @@ class Adversary(object): ...@@ -38,10 +40,10 @@ class Adversary(object):
""" """
assert (target_label is None) or is_targeted_attack assert (target_label is None) or is_targeted_attack
self.__is_targeted_attack = is_targeted_attack self.__is_targeted_attack = is_targeted_attack
self.__target_label = target_label self.target_label = target_label
self.__target = target self.__target = target
if not is_targeted_attack: if not is_targeted_attack:
self.__target_label = None self.target_label = None
self.__target = None self.__target = None
def set_original(self, original, original_label=None): def set_original(self, original, original_label=None):
...@@ -53,10 +55,11 @@ class Adversary(object): ...@@ -53,10 +55,11 @@ class Adversary(object):
""" """
if original != self.__original: if original != self.__original:
self.__original = original self.__original = original
self.__original_label = original_label self.original_label = original_label
self.__adversarial_example = None self.__adversarial_example = None
self.__bad_adversarial_example = None
if original is None: if original is None:
self.__original_label = None self.original_label = None
def _is_successful(self, adversarial_label): def _is_successful(self, adversarial_label):
""" """
...@@ -65,11 +68,11 @@ class Adversary(object): ...@@ -65,11 +68,11 @@ class Adversary(object):
:param adversarial_label: adversarial label. :param adversarial_label: adversarial label.
:return: bool :return: bool
""" """
if self.__target_label is not None: if self.target_label is not None:
return adversarial_label == self.__target_label return adversarial_label == self.target_label
else: else:
return (adversarial_label is not None) and \ return (adversarial_label is not None) and \
(adversarial_label != self.__original_label) (adversarial_label != self.original_label)
def is_successful(self): def is_successful(self):
""" """
...@@ -77,7 +80,7 @@ class Adversary(object): ...@@ -77,7 +80,7 @@ class Adversary(object):
:return: bool :return: bool
""" """
return self._is_successful(self.__adversarial_label) return self._is_successful(self.adversarial_label)
def try_accept_the_example(self, adversarial_example, adversarial_label): def try_accept_the_example(self, adversarial_example, adversarial_label):
""" """
...@@ -93,7 +96,9 @@ class Adversary(object): ...@@ -93,7 +96,9 @@ class Adversary(object):
ok = self._is_successful(adversarial_label) ok = self._is_successful(adversarial_label)
if ok: if ok:
self.__adversarial_example = adversarial_example self.__adversarial_example = adversarial_example
self.__adversarial_label = adversarial_label self.adversarial_label = adversarial_label
else:
self.__bad_adversarial_example = adversarial_example
return ok return ok
def perturbation(self, multiplying_factor=1.0): def perturbation(self, multiplying_factor=1.0):
...@@ -104,9 +109,14 @@ class Adversary(object): ...@@ -104,9 +109,14 @@ class Adversary(object):
:return: The perturbation that is multiplied by multiplying_factor. :return: The perturbation that is multiplied by multiplying_factor.
""" """
assert self.__original is not None assert self.__original is not None
assert self.__adversarial_example is not None assert (self.__adversarial_example is not None) or \
return multiplying_factor * ( (self.__bad_adversarial_example is not None)
self.__adversarial_example - self.__original) if self.__adversarial_example is not None:
return multiplying_factor * (
self.__adversarial_example - self.__original)
else:
return multiplying_factor * (
self.__bad_adversarial_example - self.__original)
@property @property
def is_targeted_attack(self): def is_targeted_attack(self):
...@@ -115,20 +125,6 @@ class Adversary(object): ...@@ -115,20 +125,6 @@ class Adversary(object):
""" """
return self.__is_targeted_attack return self.__is_targeted_attack
@property
def target_label(self):
"""
:property: target_label
"""
return self.__target_label
@target_label.setter
def target_label(self, label):
"""
:property: target_label
"""
self.__target_label = label
@property @property
def target(self): def target(self):
""" """
...@@ -143,20 +139,6 @@ class Adversary(object): ...@@ -143,20 +139,6 @@ class Adversary(object):
""" """
return self.__original return self.__original
@property
def original_label(self):
"""
:property: original
"""
return self.__original_label
@original_label.setter
def original_label(self, label):
"""
original_label setter
"""
self.__original_label = label
@property @property
def adversarial_example(self): def adversarial_example(self):
""" """
...@@ -164,23 +146,9 @@ class Adversary(object): ...@@ -164,23 +146,9 @@ class Adversary(object):
""" """
return self.__adversarial_example return self.__adversarial_example
@adversarial_example.setter
def adversarial_example(self, example):
"""
adversarial_example setter
"""
self.__adversarial_example = example
@property @property
def adversarial_label(self): def bad_adversarial_example(self):
"""
:property: adversarial_label
"""
return self.__adversarial_label
@adversarial_label.setter
def adversarial_label(self, label):
""" """
adversarial_label setter :property: bad_adversarial_example
""" """
self.__adversarial_label = label return self.__bad_adversarial_example
""" """
Attack methods Attack methods __init__.py
""" """
from .base import Attack
from .deepfool import DeepFoolAttack
from .gradientsign import FGSM
from .gradientsign import GradientSignAttack
from .iterator_gradientsign import IFGSM
from .iterator_gradientsign import IteratorGradientSignAttack
...@@ -52,21 +52,23 @@ class Attack(object): ...@@ -52,21 +52,23 @@ class Attack(object):
:param adversary: adversary :param adversary: adversary
:return: None :return: None
""" """
assert self.model.channel_axis() == adversary.original.ndim
if adversary.original_label is None: if adversary.original_label is None:
adversary.original_label = np.argmax( adversary.original_label = np.argmax(
self.model.predict(adversary.original)) self.model.predict(adversary.original))
if adversary.is_targeted_attack and adversary.target_label is None: if adversary.is_targeted_attack and adversary.target_label is None:
if adversary.target is None: if adversary.target is None:
raise ValueError( raise ValueError(
'When adversary.is_targeted_attack is True, ' 'When adversary.is_targeted_attack is true, '
'adversary.target_label or adversary.target must be set.') 'adversary.target_label or adversary.target must be set.')
else: else:
adversary.target_label_label = np.argmax( adversary.target_label = np.argmax(
self.model.predict( self.model.predict(adversary.target))
self.model.scale_input(adversary.target)))
logging.info('adversary:\noriginal_label: {}' logging.info('adversary:'
'\n target_lable: {}' '\n original_label: {}'
'\n is_targeted_attack: {}' '\n target_label: {}'
'\n is_targeted_attack: {}'
''.format(adversary.original_label, adversary.target_label, ''.format(adversary.original_label, adversary.target_label,
adversary.is_targeted_attack)) adversary.is_targeted_attack))
...@@ -10,6 +10,8 @@ import numpy as np ...@@ -10,6 +10,8 @@ import numpy as np
from .base import Attack from .base import Attack
__all__ = ['DeepFoolAttack']
class DeepFoolAttack(Attack): class DeepFoolAttack(Attack):
""" """
...@@ -56,7 +58,7 @@ class DeepFoolAttack(Attack): ...@@ -56,7 +58,7 @@ class DeepFoolAttack(Attack):
gradient_k = self.model.gradient(x, k) gradient_k = self.model.gradient(x, k)
w_k = gradient_k - gradient w_k = gradient_k - gradient
f_k = f[k] - f[pre_label] f_k = f[k] - f[pre_label]
w_k_norm = np.linalg.norm(w_k) + 1e-8 w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8
pert_k = (np.abs(f_k) + 1e-8) / w_k_norm pert_k = (np.abs(f_k) + 1e-8) / w_k_norm
if pert_k < pert: if pert_k < pert:
pert = pert_k pert = pert_k
...@@ -70,9 +72,12 @@ class DeepFoolAttack(Attack): ...@@ -70,9 +72,12 @@ class DeepFoolAttack(Attack):
f = self.model.predict(x) f = self.model.predict(x)
gradient = self.model.gradient(x, pre_label) gradient = self.model.gradient(x, pre_label)
adv_label = np.argmax(f) adv_label = np.argmax(f)
logging.info('iteration = {}, f = {}, pre_label = {}' logging.info('iteration={}, f[pre_label]={}, f[target_label]={}'
', adv_label={}'.format(iteration, f[pre_label], ', f[adv_label]={}, pre_label={}, adv_label={}'
pre_label, adv_label)) ''.format(iteration, f[pre_label], (
f[adversary.target_label]
if adversary.is_targeted_attack else 'NaN'), f[
adv_label], pre_label, adv_label))
if adversary.try_accept_the_example(x, adv_label): if adversary.try_accept_the_example(x, adv_label):
return adversary return adversary
......
"""
This module provide the attack method for Iterator FGSM's implement.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
__all__ = [
'GradientMethodAttack', 'FastGradientSignMethodAttack', 'FGSM',
'FastGradientSignMethodTargetedAttack', 'FGSMT',
'BasicIterativeMethodAttack', 'BIM',
'IterativeLeastLikelyClassMethodAttack', 'ILCM'
]
class GradientMethodAttack(Attack):
"""
This class implements gradient attack method, and is the base of FGSM, BIM,
ILCM, etc.
"""
def __init__(self, model, support_targeted=True):
"""
:param model(model): The model to be attacked.
:param support_targeted(bool): Does this attack method support targeted.
"""
super(GradientMethodAttack, self).__init__(model)
self.support_targeted = support_targeted
def _apply(self, adversary, norm_ord=np.inf, epsilons=0.01, steps=100):
"""
Apply the gradient attack method.
:param adversary(Adversary):
The Adversary object.
:param norm_ord(int):
Order of the norm, such as np.inf, 1, 2, etc. It can't be 0.
:param epsilons(list|tuple|int):
Attack step size (input variation).
:param steps:
The number of iterator steps.
:return:
adversary(Adversary): The Adversary object.
"""
if norm_ord == 0:
raise ValueError("L0 norm is not supported!")
if not self.support_targeted:
if adversary.is_targeted_attack:
raise ValueError(
"This attack method doesn't support targeted attack!")
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(epsilons, epsilons + 1e-10, num=steps)
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
assert self.model.channel_axis() == adversary.original.ndim
assert (self.model.channel_axis() == 1 or
self.model.channel_axis() == adversary.original.shape[0] or
self.model.channel_axis() == adversary.original.shape[-1])
step = 1
adv_img = adversary.original
for epsilon in epsilons[:steps]:
if epsilon == 0.0:
continue
if adversary.is_targeted_attack:
gradient = -self.model.gradient(adv_img, adversary.target_label)
else:
gradient = self.model.gradient(adv_img,
adversary.original_label)
if norm_ord == np.inf:
gradient_norm = np.sign(gradient)
else:
gradient_norm = gradient / self._norm(gradient, ord=norm_ord)
adv_img = adv_img + epsilon * gradient_norm * (max_ - min_)
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('step={}, epsilon = {:.5f}, pre_label = {}, '
'adv_label={}'.format(step, epsilon, pre_label,
adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
step += 1
return adversary
@staticmethod
def _norm(a, ord):
if a.ndim == 1:
return np.linalg.norm(a, ord=ord)
if a.ndim == a.shape[0]:
norm_shape = (a.ndim, reduce(np.dot, a.shape[1:]))
norm_axis = 1
else:
norm_shape = (reduce(np.dot, a.shape[:-1]), a.ndim)
norm_axis = 0
return np.linalg.norm(a.reshape(norm_shape), ord=ord, axis=norm_axis)
class FastGradientSignMethodTargetedAttack(GradientMethodAttack):
"""
"Fast Gradient Sign Method" is extended to support targeted attack.
"Fast Gradient Sign Method" was originally implemented by Goodfellow et
al. (2015) with the infinity norm.
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=0.03):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=1)
class FastGradientSignMethodAttack(FastGradientSignMethodTargetedAttack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
infinity norm, and is known as the "Fast Gradient Sign Method".
Paper link: https://arxiv.org/abs/1412.6572
"""
def __init__(self, model):
super(FastGradientSignMethodAttack, self).__init__(model, False)
class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack):
"""
"Iterative Least-likely Class Method (ILCM)" extends "BIM" to support
targeted attack.
"The Basic Iterative Method (BIM)" is to extend "FSGM". "BIM" iteratively
take multiple small steps while adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def _apply(self, adversary, epsilons=0.001, steps=1000):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=steps)
class BasicIterativeMethodAttack(IterativeLeastLikelyClassMethodAttack):
"""
FGSM is a one-step method. "The Basic Iterative Method (BIM)" iteratively
take multiple small steps while adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def __init__(self, model):
super(BasicIterativeMethodAttack, self).__init__(model, False)
FGSM = FastGradientSignMethodAttack
FGSMT = FastGradientSignMethodTargetedAttack
BIM = BasicIterativeMethodAttack
ILCM = IterativeLeastLikelyClassMethodAttack
"""
This module provide the attack method for FGSM's implement.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class GradientSignAttack(Attack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
infinity norm (and is known as the "Fast Gradient Sign Method").
This is therefore called the Fast Gradient Method.
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=1000):
"""
Apply the gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
if adversary.is_targeted_attack:
gradient = self.model.gradient(adversary.original,
adversary.target_label)
gradient_sign = -np.sign(gradient) * (max_ - min_)
else:
gradient = self.model.gradient(adversary.original,
adversary.original_label)
gradient_sign = np.sign(gradient) * (max_ - min_)
for epsilon in epsilons:
adv_img = adversary.original + epsilon * gradient_sign
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
FGSM = GradientSignAttack
"""
This module provide the attack method for Iterator FGSM's implement.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class IteratorGradientSignAttack(Attack):
"""
This attack was originally implemented by Alexey Kurakin(Google Brain).
Paper link: https://arxiv.org/pdf/1607.02533.pdf
"""
def _apply(self, adversary, epsilons=100, steps=10):
"""
Apply the iterative gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
steps(int): The number of iterator steps.
Return:
adversary(Adversary): The Adversary object.
"""
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1 / steps, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
for epsilon in epsilons:
adv_img = adversary.original
for _ in range(steps):
if adversary.is_targeted_attack:
gradient = self.model.gradient(adversary.original,
adversary.target_label)
gradient_sign = -np.sign(gradient) * (max_ - min_)
else:
gradient = self.model.gradient(adversary.original,
adversary.original_label)
gradient_sign = np.sign(gradient) * (max_ - min_)
adv_img = adv_img + gradient_sign * epsilon
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
IFGSM = IteratorGradientSignAttack
"""
This module provide the attack method of "LBFGS".
"""
from __future__ import division
import logging
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from .base import Attack
__all__ = ['LBFGSAttack', 'LBFGS']
class LBFGSAttack(Attack):
"""
Uses L-BFGS-B to minimize the cross-entropy and the distance between the
original and the adversary.
Paper link: https://arxiv.org/abs/1510.05328
"""
def __init__(self, model):
super(LBFGSAttack, self).__init__(model)
self._predicts_normalized = None
self._adversary = None # type: Adversary
def _apply(self, adversary, epsilon=0.001, steps=10):
self._adversary = adversary
if not adversary.is_targeted_attack:
raise ValueError("This attack method only support targeted attack!")
# finding initial c
logging.info('finding initial c...')
c = epsilon
x0 = adversary.original.flatten()
for i in range(30):
c = 2 * c
logging.info('c={}'.format(c))
is_adversary = self._lbfgsb(x0, c, steps)
if is_adversary:
break
if not is_adversary:
logging.info('Failed!')
return adversary
# binary search c
logging.info('binary search c...')
c_low = 0
c_high = c
while c_high - c_low >= epsilon:
logging.info('c_high={}, c_low={}, diff={}, epsilon={}'
.format(c_high, c_low, c_high - c_low, epsilon))
c_half = (c_low + c_high) / 2
is_adversary = self._lbfgsb(x0, c_half, steps)
if is_adversary:
c_high = c_half
else:
c_low = c_half
return adversary
def _is_predicts_normalized(self, predicts):
"""
To determine the predicts is normalized.
:param predicts(np.array): the output of the model.
:return: bool
"""
if self._predicts_normalized is None:
if self.model.predict_name().lower() in [
'softmax', 'probabilities', 'probs'
]:
self._predicts_normalized = True
else:
if np.any(predicts < 0.0):
self._predicts_normalized = False
else:
s = np.sum(predicts.flatten())
if 0.999 <= s <= 1.001:
self._predicts_normalized = True
else:
self._predicts_normalized = False
assert self._predicts_normalized is not None
return self._predicts_normalized
def _loss(self, adv_x, c):
"""
To get the loss and gradient.
:param adv_x: the candidate adversarial example
:param c: parameter 'C' in the paper
:return: (loss, gradient)
"""
x = adv_x.reshape(self._adversary.original.shape)
# cross_entropy
logits = self.model.predict(x)
if not self._is_predicts_normalized(logits): # to softmax
e = np.exp(logits)
logits = e / np.sum(e)
e = np.exp(logits)
s = np.sum(e)
ce = np.log(s) - logits[self._adversary.target_label]
# L2 distance
min_, max_ = self.model.bounds()
d = np.sum((x - self._adversary.original).flatten() ** 2) \
/ ((max_ - min_) ** 2) / len(adv_x)
# gradient
gradient = self.model.gradient(x, self._adversary.target_label)
result = (c * ce + d).astype(float), gradient.flatten().astype(float)
return result
def _lbfgsb(self, x0, c, maxiter):
min_, max_ = self.model.bounds()
bounds = [(min_, max_)] * len(x0)
approx_grad_eps = (max_ - min_) / 100.0
x, f, d = fmin_l_bfgs_b(
self._loss,
x0,
args=(c, ),
bounds=bounds,
maxiter=maxiter,
epsilon=approx_grad_eps)
if np.amax(x) > max_ or np.amin(x) < min_:
x = np.clip(x, min_, max_)
shape = self._adversary.original.shape
adv_label = np.argmax(self.model.predict(x.reshape(shape)))
logging.info('pre_label = {}, adv_label={}'.format(
self._adversary.target_label, adv_label))
return self._adversary.try_accept_the_example(
x.reshape(shape), adv_label)
LBFGS = LBFGSAttack
"""
This module provide the attack method for JSMA's implement.
"""
from __future__ import division
import logging
import random
import numpy as np
from .base import Attack
class SaliencyMapAttack(Attack):
"""
Implements the Saliency Map Attack.
The Jacobian-based Saliency Map Approach (Papernot et al. 2016).
Paper link: https://arxiv.org/pdf/1511.07528.pdf
"""
def _apply(self,
adversary,
max_iter=2000,
fast=True,
theta=0.1,
max_perturbations_per_pixel=7):
"""
Apply the JSMA attack.
Args:
adversary(Adversary): The Adversary object.
max_iter(int): The max iterations.
fast(bool): Whether evaluate the pixel influence on sum of residual classes.
theta(float): Perturbation per pixel relative to [min, max] range.
max_perturbations_per_pixel(int): The max count of perturbation per pixel.
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not adversary.is_targeted_attack or (adversary.target_label is None):
target_labels = self._generate_random_target(
adversary.original_label)
else:
target_labels = [adversary.target_label]
for target in target_labels:
original_image = adversary.original
# the mask defines the search domain
# each modified pixel with border value is set to zero in mask
mask = np.ones_like(original_image)
# count tracks how often each pixel was changed
counts = np.zeros_like(original_image)
labels = range(self.model.num_classes())
adv_img = original_image.copy()
min_, max_ = self.model.bounds()
for step in range(max_iter):
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
# stop if mask is all zero
if not any(mask.flatten()):
return adversary
logging.info('step = {}, original_label = {}, adv_label={}'.
format(step, adversary.original_label, adv_label))
# get pixel location with highest influence on class
idx, p_sign = self._saliency_map(
adv_img, target, labels, mask, fast=fast)
# apply perturbation
adv_img[idx] += -p_sign * theta * (max_ - min_)
# tracks number of updates for each pixel
counts[idx] += 1
# remove pixel from search domain if it hits the bound
if adv_img[idx] <= min_ or adv_img[idx] >= max_:
mask[idx] = 0
# remove pixel if it was changed too often
if counts[idx] >= max_perturbations_per_pixel:
mask[idx] = 0
adv_img = np.clip(adv_img, min_, max_)
def _generate_random_target(self, original_label):
"""
Draw random target labels all of which are different and not the original label.
Args:
original_label(int): Original label.
Return:
target_labels(list): random target labels
"""
num_random_target = 1
num_classes = self.model.num_classes()
assert num_random_target <= num_classes - 1
target_labels = random.sample(range(num_classes), num_random_target + 1)
target_labels = [t for t in target_labels if t != original_label]
target_labels = target_labels[:num_random_target]
return target_labels
def _saliency_map(self, image, target, labels, mask, fast=False):
"""
Get pixel location with highest influence on class.
Args:
image(numpy.ndarray): Image with shape (height, width, channels).
target(int): The target label.
labels(int): The number of classes of the output label.
mask(list): Each modified pixel with border value is set to zero in mask.
fast(bool): Whether evaluate the pixel influence on sum of residual classes.
Return:
idx: The index of optimal pixel.
pix_sign: The direction of perturbation
"""
# pixel influence on target class
alphas = self.model.gradient(image, target) * mask
# pixel influence on sum of residual classes(don't evaluate if fast == True)
if fast:
betas = -np.ones_like(alphas)
else:
betas = np.sum([
self.model.gradient(image, label) * mask - alphas
for label in labels
], 0)
# compute saliency map (take into account both pos. & neg. perturbations)
sal_map = np.abs(alphas) * np.abs(betas) * np.sign(alphas * betas)
# find optimal pixel & direction of perturbation
idx = np.argmin(sal_map)
idx = np.unravel_index(idx, mask.shape)
pix_sign = np.sign(alphas)[idx]
return idx, pix_sign
JSMA = SaliencyMapAttack
""" """
Paddle model for target of attack Models __init__.py
""" """
from .base import Model \ No newline at end of file
from .paddle import PaddleModel
...@@ -24,11 +24,21 @@ class Model(object): ...@@ -24,11 +24,21 @@ class Model(object):
assert len(bounds) == 2 assert len(bounds) == 2
assert channel_axis in [0, 1, 2, 3] assert channel_axis in [0, 1, 2, 3]
if preprocess is None:
preprocess = (0, 1)
self._bounds = bounds self._bounds = bounds
self._channel_axis = channel_axis self._channel_axis = channel_axis
self._preprocess = preprocess
# Make self._preprocess to be (0,1) if possible, so that don't need
# to do substract or divide.
if preprocess is not None:
sub, div = np.array(preprocess)
if not np.any(sub):
sub = 0
if np.all(div == 1):
div = 1
assert (div is None) or np.all(div)
self._preprocess = (sub, div)
else:
self._preprocess = (0, 1)
def bounds(self): def bounds(self):
""" """
...@@ -47,8 +57,7 @@ class Model(object): ...@@ -47,8 +57,7 @@ class Model(object):
sub, div = self._preprocess sub, div = self._preprocess
if np.any(sub != 0): if np.any(sub != 0):
res = input_ - sub res = input_ - sub
assert np.any(div != 0) if not np.all(sub == 1):
if np.any(div != 1):
if res is None: # "res = input_ - sub" is not executed! if res is None: # "res = input_ - sub" is not executed!
res = input_ / div res = input_ / div
else: else:
...@@ -97,3 +106,11 @@ class Model(object): ...@@ -97,3 +106,11 @@ class Model(object):
with the shape (height, width, channel). with the shape (height, width, channel).
""" """
raise NotImplementedError raise NotImplementedError
@abstractmethod
def predict_name(self):
"""
Get the predict name, such as "softmax",etc.
:return: string
"""
raise NotImplementedError
...@@ -4,7 +4,7 @@ Paddle model ...@@ -4,7 +4,7 @@ Paddle model
from __future__ import absolute_import from __future__ import absolute_import
import numpy as np import numpy as np
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from .base import Model from .base import Model
...@@ -16,7 +16,7 @@ class PaddleModel(Model): ...@@ -16,7 +16,7 @@ class PaddleModel(Model):
instance of PaddleModel. instance of PaddleModel.
Args: Args:
program(paddle.v2.fluid.framework.Program): The program of the model program(paddle.fluid.framework.Program): The program of the model
which generate the adversarial sample. which generate the adversarial sample.
input_name(string): The name of the input. input_name(string): The name of the input.
logits_name(string): The name of the logits. logits_name(string): The name of the logits.
...@@ -114,3 +114,10 @@ class PaddleModel(Model): ...@@ -114,3 +114,10 @@ class PaddleModel(Model):
feed=feeder.feed([(scaled_data, label)]), feed=feeder.feed([(scaled_data, label)]),
fetch_list=[self._gradient]) fetch_list=[self._gradient])
return grad.reshape(data.shape) return grad.reshape(data.shape)
def predict_name(self):
"""
Get the predict name, such as "softmax",etc.
:return: string
"""
return self._program.block(0).var(self._predict_name).op.type
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
CNN on mnist data using fluid api of paddlepaddle CNN on mnist data using fluid api of paddlepaddle
""" """
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
def mnist_cnn_model(img): def mnist_cnn_model(img):
...@@ -47,7 +47,9 @@ def main(): ...@@ -47,7 +47,9 @@ def main():
optimizer = fluid.optimizer.Adam(learning_rate=0.01) optimizer = fluid.optimizer.Adam(learning_rate=0.01)
optimizer.minimize(avg_cost) optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=logits, label=label) batch_size = fluid.layers.create_tensor(dtype='int64')
batch_acc = fluid.layers.accuracy(
input=logits, label=label, total=batch_size)
BATCH_SIZE = 50 BATCH_SIZE = 50
PASS_NUM = 3 PASS_NUM = 3
...@@ -63,20 +65,22 @@ def main(): ...@@ -63,20 +65,22 @@ def main():
feeder = fluid.DataFeeder(feed_list=[img, label], place=place) feeder = fluid.DataFeeder(feed_list=[img, label], place=place)
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
pass_acc = fluid.average.WeightedAverage()
for pass_id in range(PASS_NUM): for pass_id in range(PASS_NUM):
accuracy.reset(exe) pass_acc.reset()
for data in train_reader(): for data in train_reader():
loss, acc = exe.run(fluid.default_main_program(), loss, acc, b_size = exe.run(
feed=feeder.feed(data), fluid.default_main_program(),
fetch_list=[avg_cost] + accuracy.metrics) feed=feeder.feed(data),
pass_acc = accuracy.eval(exe) fetch_list=[avg_cost, batch_acc, batch_size])
print("pass_id=" + str(pass_id) + " acc=" + str(acc) + " pass_acc=" pass_acc.add(value=acc, weight=b_size)
+ str(pass_acc)) print("pass_id=" + str(pass_id) + " acc=" + str(acc[0]) +
" pass_acc=" + str(pass_acc.eval()[0]))
if loss < LOSS_THRESHOLD and pass_acc > ACC_THRESHOLD: if loss < LOSS_THRESHOLD and pass_acc > ACC_THRESHOLD:
break break
pass_acc = accuracy.eval(exe) print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc.eval()[
print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc)) 0]))
fluid.io.save_params( fluid.io.save_params(
exe, dirname='./mnist', main_program=fluid.default_main_program()) exe, dirname='./mnist', main_program=fluid.default_main_program())
print('train mnist done') print('train mnist done')
......
...@@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool. ...@@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool.
""" """
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from advbox import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradientsign import GradientSignAttack from advbox.attacks.gradient_method import FGSM
from advbox.models.paddle import PaddleModel from advbox.models.paddle import PaddleModel
...@@ -73,7 +73,7 @@ def main(): ...@@ -73,7 +73,7 @@ def main():
# advbox demo # advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME, m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1)) logits.name, avg_cost.name, (-1, 1))
att = GradientSignAttack(m) att = FGSM(m)
for data in train_reader(): for data in train_reader():
# fgsm attack # fgsm attack
adversary = att(Adversary(data[0][0], data[0][1])) adversary = att(Adversary(data[0][0], data[0][1]))
......
"""
FGSM demos on mnist using advbox tool.
"""
import matplotlib.pyplot as plt
import paddle.v2 as paddle
import paddle.fluid as fluid
import numpy as np
from advbox import Adversary
from advbox.attacks.saliency import SaliencyMapAttack
from advbox.models.paddle import PaddleModel
def cnn_model(img):
"""
Mnist cnn model
Args:
img(Varaible): the input image to be recognized
Returns:
Variable: the label prediction
"""
# conv1 = fluid.nets.conv2d()
conv_pool_1 = fluid.nets.simple_img_conv_pool(
input=img,
num_filters=20,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
conv_pool_2 = fluid.nets.simple_img_conv_pool(
input=conv_pool_1,
num_filters=50,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
return logits
def main():
"""
Advbox demo which demonstrate how to use advbox.
"""
IMG_NAME = 'img'
LABEL_NAME = 'label'
img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
# gradient should flow
img.stop_gradient = False
label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
logits = cnn_model(img)
cost = fluid.layers.cross_entropy(input=logits, label=label)
avg_cost = fluid.layers.mean(x=cost)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
BATCH_SIZE = 1
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500),
batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(
feed_list=[IMG_NAME, LABEL_NAME],
place=place,
program=fluid.default_main_program())
fluid.io.load_params(
exe, "./mnist/", main_program=fluid.default_main_program())
# advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1))
attack = SaliencyMapAttack(m)
total_num = 0
success_num = 0
for data in train_reader():
total_num += 1
# adversary.set_target(True, target_label=target_label)
jsma_attack = attack(Adversary(data[0][0], data[0][1]))
if jsma_attack is not None and jsma_attack.is_successful():
# plt.imshow(jsma_attack.target, cmap='Greys_r')
# plt.show()
success_num += 1
print('original_label=%d, adversary examples label =%d' %
(data[0][1], jsma_attack.adversarial_label))
# np.save('adv_img', jsma_attack.adversarial_example)
print('total num = %d, success num = %d ' % (total_num, success_num))
if total_num == 100:
break
if __name__ == '__main__':
main()
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# SE-ResNeXt for image classification # SE-ResNeXt for image classification
This model built with paddle fluid is still under active development and is not This model built with paddle fluid is still under active development and is not
......
### Caffe2Fluid
This tool is used to convert a Caffe model to Fluid model
### Howto
1, Prepare caffepb.py in ./proto if your python has no 'pycaffe' module, two options provided here:
1) generate it from caffe.proto using protoc
bash ./proto/compile.sh
2) download one from github directly
cd proto/ && wget https://github.com/ethereon/caffe-tensorflow/blob/master/kaffe/caffe/caffepb.py
2, Convert the caffe model using 'convert.py' which will generate a python script and a weight(in .npy) file
3, Use the converted model to predict
see more detail info in 'examples/xxx'
### Tested models
- Lenet on mnist dataset
- ResNets:(ResNet-50, ResNet-101, ResNet-152)
model addr: `https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777`_
- GoogleNet:
model addr: `https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034`_
- VGG:
model addr: `https://gist.github.com/ksimonyan/211839e770f7b538e2d8`_
- AlexNet:
model addr: `https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet`_
### Notes
Some of this code come from here: https://github.com/ethereon/caffe-tensorflow
#!/usr/bin/env python
import os
import sys
import numpy as np
import argparse
from kaffe import KaffeError, print_stderr
from kaffe.paddle import Transformer
def fatal_error(msg):
""" fatal error encounted
"""
print_stderr(msg)
exit(-1)
def validate_arguments(args):
""" validate args
"""
if (args.data_output_path is not None) and (args.caffemodel is None):
fatal_error('No input data path provided.')
if (args.caffemodel is not None) and (args.data_output_path is None):
fatal_error('No output data path provided.')
if (args.code_output_path is None) and (args.data_output_path is None):
fatal_error('No output path specified.')
def convert(def_path, caffemodel_path, data_output_path, code_output_path,
phase):
""" convert caffe model to tf/paddle models
"""
try:
transformer = Transformer(def_path, caffemodel_path, phase=phase)
print_stderr('Converting data...')
if caffemodel_path is not None:
data = transformer.transform_data()
print_stderr('Saving data...')
with open(data_output_path, 'wb') as data_out:
np.save(data_out, data)
if code_output_path:
print_stderr('Saving source...')
with open(code_output_path, 'wb') as src_out:
src_out.write(transformer.transform_source())
print_stderr('Done.')
except KaffeError as err:
fatal_error('Error encountered: {}'.format(err))
return 0
def main():
""" main
"""
parser = argparse.ArgumentParser()
parser.add_argument('def_path', help='Model definition (.prototxt) path')
parser.add_argument('--caffemodel', help='Model data (.caffemodel) path')
parser.add_argument('--data-output-path', help='Converted data output path')
parser.add_argument(
'--code-output-path', help='Save generated source to this path')
parser.add_argument(
'-p',
'--phase',
default='test',
help='The phase to convert: test (default) or train')
args = parser.parse_args()
validate_arguments(args)
return convert(args.def_path, args.caffemodel, args.data_output_path,
args.code_output_path, args.phase)
if __name__ == '__main__':
ret = main()
sys.exit(ret)
a demo to show converting caffe models on 'imagenet' using caffe2fluid
---
# How to use
1. prepare python environment
2. download caffe model to "models.caffe/xxx" which contains "xxx.caffemodel" and "xxx.prototxt"
3. run the tool
eg: bash ./run.sh resnet50 ./models.caffe/resnet50 ./models/resnet50
#!/bin/env python
#function:
# a demo to show how to use the converted model genereated by caffe2fluid
#
#notes:
# only support imagenet data
import os
import sys
import inspect
import numpy as np
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
def load_data(imgfile, shape):
h, w = shape[1:]
from PIL import Image
im = Image.open(imgfile)
# The storage order of the loaded image is W(widht),
# H(height), C(channel). PaddlePaddle requires
# the CHW order, so transpose them.
im = im.resize((w, h), Image.ANTIALIAS)
im = np.array(im).astype(np.float32)
im = im.transpose((2, 0, 1)) # CHW
im = im[(2, 1, 0), :, :] # BGR
# The mean to be subtracted from each image.
# By default, the per-channel ImageNet mean.
mean = np.array([104., 117., 124.], dtype=np.float32)
mean = mean.reshape([3, 1, 1])
im = im - mean
return im.reshape([1] + shape)
def build_model(net_file, net_name):
print('build model with net_file[%s] and net_name[%s]' %
(net_file, net_name))
net_path = os.path.dirname(net_file)
module_name = os.path.basename(net_file).rstrip('.py')
if net_path not in sys.path:
sys.path.insert(0, net_path)
try:
m = __import__(module_name, fromlist=[net_name])
MyNet = getattr(m, net_name)
except Exception as e:
print('failed to load module[%s]' % (module_name))
print(e)
return None
input_name = 'data'
input_shape = MyNet.input_shapes()[input_name]
images = fluid.layers.data(name='image', shape=input_shape, dtype='float32')
#label = fluid.layers.data(name='label', shape=[1], dtype='int64')
net = MyNet({input_name: images})
input_shape = MyNet.input_shapes()[input_name]
return net, input_shape
def dump_results(results, names, root):
if os.path.exists(root) is False:
os.path.mkdir(root)
for i in range(len(names)):
n = names[i]
res = results[i]
filename = os.path.join(root, n)
np.save(filename + '.npy', res)
def infer(net_file, net_name, model_file, imgfile, debug=False):
""" do inference using a model which consist 'xxx.py' and 'xxx.npy'
"""
#1, build model
net, input_shape = build_model(net_file, net_name)
prediction = net.get_output()
#2, load weights for this model
place = fluid.CPUPlace()
exe = fluid.Executor(place)
startup_program = fluid.default_startup_program()
exe.run(startup_program)
if model_file.find('.npy') > 0:
net.load(data_path=model_file, exe=exe, place=place)
else:
net.load(data_path=model_file, exe=exe)
#3, test this model
test_program = fluid.default_main_program().clone()
fetch_list_var = []
fetch_list_name = []
if debug is False:
fetch_list_var.append(prediction)
else:
for k, v in net.layers.items():
fetch_list_var.append(v)
fetch_list_name.append(k)
np_images = load_data(imgfile, input_shape)
results = exe.run(program=test_program,
feed={'image': np_images},
fetch_list=fetch_list_var)
if debug is True:
dump_path = 'results.layers'
dump_results(results, fetch_list_name, dump_path)
print('all results dumped to [%s]' % (dump_path))
else:
result = results[0]
print('predicted class:', np.argmax(result))
if __name__ == "__main__":
""" maybe more convenient to use 'run.sh' to call this tool
"""
net_file = 'models/resnet50/resnet50.py'
weight_file = 'models/resnet50/resnet50.npy'
imgfile = 'data/65.jpeg'
net_name = 'ResNet50'
argc = len(sys.argv)
if argc == 5:
net_file = sys.argv[1]
weight_file = sys.argv[2]
imgfile = sys.argv[3]
net_name = sys.argv[4]
elif argc > 1:
print('usage:')
print('\tpython %s [net_file] [weight_file] [imgfile] [net_name]' %
(sys.argv[0]))
print('\teg:python %s %s %s %s %s' % (sys.argv[0], net_file,
weight_file, imgfile, net_name))
sys.exit(1)
infer(net_file, net_name, weight_file, imgfile)
#!/bin/bash
#function:
# a tool used to:
# 1, convert a caffe model
# 2, do inference using this model
#
#usage:
# bash run.sh resnet50 ./models.caffe/resnet50 ./models/resnet50
#
#set -x
if [[ $# -lt 3 ]];then
echo "usage:"
echo " bash $0 [model_name] [cf_model_path] [pd_model_path] [only_convert]"
echo " eg: bash $0 resnet50 ./models.caffe/resnet50 ./models/resnet50"
exit 1
else
model_name=$1
cf_model_path=$2
pd_model_path=$3
only_convert=$4
fi
proto_file=$cf_model_path/${model_name}.prototxt
caffemodel_file=$cf_model_path/${model_name}.caffemodel
weight_file=$pd_model_path/${model_name}.npy
net_file=$pd_model_path/${model_name}.py
if [[ ! -e $proto_file ]];then
echo "not found prototxt[$proto_file]"
exit 1
fi
if [[ ! -e $caffemodel_file ]];then
echo "not found caffemodel[$caffemodel_file]"
exit 1
fi
if [[ ! -e $pd_model_path ]];then
mkdir $pd_model_path
fi
PYTHON=`which cfpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
$PYTHON ../../convert.py \
$proto_file \
--caffemodel $caffemodel_file \
--data-output-path $weight_file\
--code-output-path $net_file
ret=$?
if [[ $ret -ne 0 ]];then
echo "failed to convert caffe model[$cf_model_path]"
exit $ret
else
echo "succeed to convert caffe model[$cf_model_path] to fluid model[$pd_model_path]"
fi
if [[ -z $only_convert ]];then
PYTHON=`which pdpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
imgfile="data/65.jpeg"
net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/\"([^\"]+)\"/){ print $1."\n";}'`
$PYTHON ./infer.py $net_file $weight_file $imgfile $net_name
ret=$?
fi
exit $ret
a demo to show converting caffe model on 'mnist' using caffe2fluid
---
# How to use
1. prepare python environment
2. download caffe model to "models.caffe/lenet" which contains "lenet.caffemodel" and "lenet.prototxt"
3. run the tool
eg: bash ./run.sh lenet ./models.caffe/lenet ./models/lenet
#!/bin/env python
#function:
# demo to show how to use converted model using caffe2fluid
#
import sys
import os
import numpy as np
import paddle.v2 as paddle
import paddle.v2.fluid as fluid
def test_model(exe, test_program, fetch_list, test_reader, feeder):
acc_set = []
for data in test_reader():
acc_np, pred = exe.run(program=test_program,
feed=feeder.feed(data),
fetch_list=fetch_list)
acc_set.append(float(acc_np))
acc_val = np.array(acc_set).mean()
return float(acc_val)
def evaluate(net_file, model_file):
""" main
"""
#1, build model
net_path = os.path.dirname(net_file)
if net_path not in sys.path:
sys.path.insert(0, net_path)
from lenet import LeNet as MyNet
with_gpu = False
paddle.init(use_gpu=with_gpu)
#1, define network topology
images = fluid.layers.data(name='image', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
net = MyNet({'data': images})
prediction = net.layers['prob']
acc = fluid.layers.accuracy(input=prediction, label=label)
place = fluid.CUDAPlace(0) if with_gpu is True else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
#2, load weights
if model_file.find('.npy') > 0:
net.load(data_path=model_file, exe=exe, place=place)
else:
net.load(data_path=model_file, exe=exe)
#3, test this model
test_program = fluid.default_main_program().clone()
test_reader = paddle.batch(paddle.dataset.mnist.test(), batch_size=128)
feeder = fluid.DataFeeder(feed_list=[images, label], place=place)
fetch_list = [acc, prediction]
print('go to test model using test set')
acc_val = test_model(exe, test_program, \
fetch_list, test_reader, feeder)
print('test accuracy is [%.4f], expected value[0.919]' % (acc_val))
if __name__ == "__main__":
net_file = 'models/lenet/lenet.py'
weight_file = 'models/lenet/lenet.npy'
argc = len(sys.argv)
if argc == 3:
net_file = sys.argv[1]
weight_file = sys.argv[2]
elif argc > 1:
print('usage:')
print('\tpython %s [net_file] [weight_file]' % (sys.argv[0]))
print('\teg:python %s %s %s %s' % (sys.argv[0], net_file, weight_file))
sys.exit(1)
evaluate(net_file, weight_file)
#!/bin/bash
#function:
# a tool used to:
# 1, convert a caffe model
# 2, do inference using this model
#
#usage:
# bash run.sh lenet ./models.caffe/lenet ./models/lenet
#
#set -x
if [[ $# -lt 3 ]];then
echo "usage:"
echo " bash $0 [model_name] [cf_model_path] [pd_model_path] [only_convert]"
echo " eg: bash $0 lenet ./models.caffe/lenet ./models/lenet"
exit 1
else
model_name=$1
cf_model_path=$2
pd_model_path=$3
no_eval=$4
fi
proto_file=$cf_model_path/${model_name}.prototxt
caffemodel_file=$cf_model_path/${model_name}.caffemodel
weight_file=$pd_model_path/${model_name}.npy
net_file=$pd_model_path/${model_name}.py
if [[ ! -e $proto_file ]];then
echo "not found prototxt[$proto_file]"
exit 1
fi
if [[ ! -e $caffemodel_file ]];then
echo "not found caffemodel[$caffemodel_file]"
exit 1
fi
if [[ ! -e $pd_model_path ]];then
mkdir $pd_model_path
fi
PYTHON=`which cfpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
$PYTHON ../../convert.py \
$proto_file \
--caffemodel $caffemodel_file \
--data-output-path $weight_file\
--code-output-path $net_file
ret=$?
if [[ $ret -ne 0 ]];then
echo "failed to convert caffe model[$cf_model_path]"
exit $ret
else
echo "succeed to convert caffe model[$cf_model_path] to fluid model[$pd_model_path]"
fi
if [[ -z $only_convert ]];then
PYTHON=`which pdpython`
if [[ -z $PYTHON ]];then
PYTHON=`which python`
fi
net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/\"([^\"]+)\"/){ print $1."\n";}'`
if [[ $net_name != "LeNet" ]];then
echo "only support LeNet"
exit 1
fi
$PYTHON ./evaluate.py $net_file $weight_file
ret=$?
fi
exit $ret
from .graph import GraphBuilder, NodeMapper
from .errors import KaffeError, print_stderr
import os
from . import paddle
from .resolver import get_caffe_resolver, has_pycaffe
import os
import sys
SHARED_CAFFE_RESOLVER = None
def import_caffepb():
p = os.path.realpath(__file__)
p = os.path.dirname(p)
p = os.path.join(p, '../../proto')
sys.path.insert(0, p)
import caffepb
return caffepb
class CaffeResolver(object):
def __init__(self):
self.import_caffe()
def import_caffe(self):
self.caffe = None
try:
# Try to import PyCaffe first
import caffe
self.caffe = caffe
except ImportError:
# Fall back to the protobuf implementation
self.caffepb = import_caffepb()
show_fallback_warning()
if self.caffe:
# Use the protobuf code from the imported distribution.
# This way, Caffe variants with custom layers will work.
self.caffepb = self.caffe.proto.caffe_pb2
self.NetParameter = self.caffepb.NetParameter
def has_pycaffe(self):
return self.caffe is not None
def get_caffe_resolver():
global SHARED_CAFFE_RESOLVER
if SHARED_CAFFE_RESOLVER is None:
SHARED_CAFFE_RESOLVER = CaffeResolver()
return SHARED_CAFFE_RESOLVER
def has_pycaffe():
return get_caffe_resolver().has_pycaffe()
def show_fallback_warning():
msg = '''
------------------------------------------------------------
WARNING: PyCaffe not found!
Falling back to a pure protocol buffer implementation.
* Conversions will be drastically slower.
------------------------------------------------------------
'''
sys.stderr.write(msg)
import sys
#debug level, can be 'warn', 'verbose'
log_level = 'warn'
class KaffeError(Exception):
pass
def print_stderr(msg):
sys.stderr.write('%s\n' % msg)
def debug(msg):
if log_level == 'verbose':
print_stderr('[DEBUG]' + msg)
def notice(msg):
print_stderr('[NOTICE]' + msg)
def warn(msg):
print_stderr('[WARNING]' + msg)
def set_loglevel(level):
global log_level
if 'warn' != level and 'verbose' != level:
raise Exception('not supported log level[%s]' % (level))
log_level = level
from google.protobuf import text_format
from .caffe import get_caffe_resolver
from .errors import KaffeError, print_stderr
from .layers import LayerAdapter, LayerType, NodeKind, NodeDispatch
from .shapes import TensorShape
class Node(object):
def __init__(self, name, kind, layer=None):
self.name = name
self.kind = kind
self.layer = LayerAdapter(layer, kind) if layer else None
self.parents = []
self.children = []
self.data = None
self.output_shape = None
self.metadata = {}
def add_parent(self, parent_node):
assert parent_node not in self.parents
self.parents.append(parent_node)
if self not in parent_node.children:
parent_node.children.append(self)
def add_child(self, child_node):
assert child_node not in self.children
self.children.append(child_node)
if self not in child_node.parents:
child_node.parents.append(self)
def get_only_parent(self):
if len(self.parents) != 1:
raise KaffeError('Node (%s) expected to have 1 parent. Found %s.' %
(self, len(self.parents)))
return self.parents[0]
@property
def parameters(self):
if self.layer is not None:
return self.layer.parameters
return None
def __str__(self):
return '[%s] %s' % (self.kind, self.name)
def __repr__(self):
return '%s (0x%x)' % (self.name, id(self))
class Graph(object):
def __init__(self, nodes=None, name=None):
self.nodes = nodes or []
self.node_lut = {node.name: node for node in self.nodes}
self.name = name
def add_node(self, node):
self.nodes.append(node)
self.node_lut[node.name] = node
def get_node(self, name):
try:
return self.node_lut[name]
except KeyError:
raise KaffeError('Layer not found: %s' % name)
def get_input_nodes(self):
return [node for node in self.nodes if len(node.parents) == 0]
def get_output_nodes(self):
return [node for node in self.nodes if len(node.children) == 0]
def topologically_sorted(self):
sorted_nodes = []
unsorted_nodes = list(self.nodes)
temp_marked = set()
perm_marked = set()
def visit(node):
if node in temp_marked:
raise KaffeError('Graph is not a DAG.')
if node in perm_marked:
return
temp_marked.add(node)
for child in node.children:
visit(child)
perm_marked.add(node)
temp_marked.remove(node)
sorted_nodes.insert(0, node)
while len(unsorted_nodes):
visit(unsorted_nodes.pop())
return sorted_nodes
def compute_output_shapes(self):
sorted_nodes = self.topologically_sorted()
for node in sorted_nodes:
node.output_shape = TensorShape(
*NodeKind.compute_output_shape(node))
def replaced(self, new_nodes):
return Graph(nodes=new_nodes, name=self.name)
def transformed(self, transformers):
graph = self
for transformer in transformers:
graph = transformer(graph)
if graph is None:
raise KaffeError('Transformer failed: {}'.format(transformer))
assert isinstance(graph, Graph)
return graph
def __contains__(self, key):
return key in self.node_lut
def __str__(self):
hdr = '{:<20} {:<30} {:>20} {:>20}'.format('Type', 'Name', 'Param',
'Output')
s = [hdr, '-' * 94]
for node in self.topologically_sorted():
# If the node has learned parameters, display the first one's shape.
# In case of convolutions, this corresponds to the weights.
data_shape = node.data[0].shape if node.data else '--'
out_shape = node.output_shape or '--'
s.append('{:<20} {:<30} {:>20} {:>20}'.format(
node.kind, node.name, data_shape, tuple(out_shape)))
return '\n'.join(s)
class GraphBuilder(object):
'''Constructs a model graph from a Caffe protocol buffer definition.'''
def __init__(self, def_path, phase='test'):
'''
def_path: Path to the model definition (.prototxt)
data_path: Path to the model data (.caffemodel)
phase: Either 'test' or 'train'. Used for filtering phase-specific nodes.
'''
self.def_path = def_path
self.phase = phase
self.load()
def load(self):
'''Load the layer definitions from the prototxt.'''
self.params = get_caffe_resolver().NetParameter()
with open(self.def_path, 'rb') as def_file:
text_format.Merge(def_file.read(), self.params)
def filter_layers(self, layers):
'''Filter out layers based on the current phase.'''
phase_map = {0: 'train', 1: 'test'}
filtered_layer_names = set()
filtered_layers = []
for layer in layers:
phase = self.phase
if len(layer.include):
phase = phase_map[layer.include[0].phase]
if len(layer.exclude):
phase = phase_map[1 - layer.include[0].phase]
exclude = (phase != self.phase)
# Dropout layers appear in a fair number of Caffe
# test-time networks. These are just ignored. We'll
# filter them out here.
if (not exclude) and (phase == 'test'):
exclude = (layer.type == LayerType.Dropout)
if not exclude:
filtered_layers.append(layer)
# Guard against dupes.
assert layer.name not in filtered_layer_names
filtered_layer_names.add(layer.name)
return filtered_layers
def make_node(self, layer):
'''Create a graph node for the given layer.'''
kind = NodeKind.map_raw_kind(layer.type)
if kind is None:
raise KaffeError('Unknown layer type encountered: %s' % layer.type)
# We want to use the layer's top names (the "output" names), rather than the
# name attribute, which is more of readability thing than a functional one.
# Other layers will refer to a node by its "top name".
return Node(layer.name, kind, layer=layer)
def make_input_nodes(self):
'''
Create data input nodes.
This method is for old-style inputs, where the input specification
was not treated as a first-class layer in the prototext.
Newer models use the "Input layer" type.
'''
nodes = [Node(name, NodeKind.Data) for name in self.params.input]
if len(nodes):
input_dim = map(int, self.params.input_dim)
if not input_dim:
if len(self.params.input_shape) > 0:
input_dim = map(int, self.params.input_shape[0].dim)
else:
raise KaffeError('Dimensions for input not specified.')
for node in nodes:
node.output_shape = tuple(input_dim)
return nodes
def build(self):
'''
Builds the graph from the Caffe layer definitions.
'''
# Get the layers
layers = self.params.layers or self.params.layer
# Filter out phase-excluded layers
layers = self.filter_layers(layers)
# Get any separately-specified input layers
nodes = self.make_input_nodes()
nodes += [self.make_node(layer) for layer in layers]
# Initialize the graph
graph = Graph(nodes=nodes, name=self.params.name)
# Connect the nodes
#
# A note on layers and outputs:
# In Caffe, each layer can produce multiple outputs ("tops") from a set of inputs
# ("bottoms"). The bottoms refer to other layers' tops. The top can rewrite a bottom
# (in case of in-place operations). Note that the layer's name is not used for establishing
# any connectivity. It's only used for data association. By convention, a layer with a
# single top will often use the same name (although this is not required).
#
# The current implementation only supports single-output nodes (note that a node can still
# have multiple children, since multiple child nodes can refer to the single top's name).
node_outputs = {}
for layer in layers:
node = graph.get_node(layer.name)
for input_name in layer.bottom:
assert input_name != layer.name
parent_node = node_outputs.get(input_name)
if (parent_node is None) or (parent_node == node):
parent_node = graph.get_node(input_name)
node.add_parent(parent_node)
if len(layer.top) > 1:
raise KaffeError('Multiple top nodes are not supported.')
for output_name in layer.top:
if output_name == layer.name:
# Output is named the same as the node. No further action required.
continue
# There are two possibilities here:
#
# Case 1: output_name refers to another node in the graph.
# This is an "in-place operation" that overwrites an existing node.
# This would create a cycle in the graph. We'll undo the in-placing
# by substituting this node wherever the overwritten node is referenced.
#
# Case 2: output_name violates the convention layer.name == output_name.
# Since we are working in the single-output regime, we will can rename it to
# match the layer name.
#
# For both cases, future references to this top re-routes to this node.
node_outputs[output_name] = node
graph.compute_output_shapes()
return graph
class NodeMapper(NodeDispatch):
def __init__(self, graph):
self.graph = graph
def map(self):
nodes = self.graph.topologically_sorted()
# Remove input nodes - we'll handle them separately.
input_nodes = self.graph.get_input_nodes()
nodes = [t for t in nodes if t not in input_nodes]
# Decompose DAG into chains.
chains = []
for node in nodes:
attach_to_chain = None
if len(node.parents) == 1:
parent = node.get_only_parent()
for chain in chains:
if chain[-1] == parent:
# Node is part of an existing chain.
attach_to_chain = chain
break
if attach_to_chain is None:
# Start a new chain for this node.
attach_to_chain = []
chains.append(attach_to_chain)
attach_to_chain.append(node)
# Map each chain.
mapped_chains = []
for chain in chains:
mapped_chains.append(self.map_chain(chain))
return self.commit(mapped_chains)
def map_chain(self, chain):
return [self.map_node(node) for node in chain]
def map_node(self, node):
map_func = self.get_handler(node.kind, 'map')
mapped_node = map_func(node)
assert mapped_node is not None
mapped_node.node = node
return mapped_node
def commit(self, mapped_chains):
raise NotImplementedError('Must be implemented by subclass.')
import re
import numbers
from collections import namedtuple
from .shapes import *
LAYER_DESCRIPTORS = {
# Caffe Types
'AbsVal': shape_identity,
'Accuracy': shape_scalar,
'ArgMax': shape_not_implemented,
'BatchNorm': shape_identity,
'BNLL': shape_not_implemented,
'Concat': shape_concat,
'ContrastiveLoss': shape_scalar,
'Convolution': shape_convolution,
'Deconvolution': shape_not_implemented,
'Data': shape_data,
'Dropout': shape_identity,
'DummyData': shape_data,
'EuclideanLoss': shape_scalar,
'Eltwise': shape_identity,
'Exp': shape_identity,
'Flatten': shape_not_implemented,
'HDF5Data': shape_data,
'HDF5Output': shape_identity,
'HingeLoss': shape_scalar,
'Im2col': shape_not_implemented,
'ImageData': shape_data,
'InfogainLoss': shape_scalar,
'InnerProduct': shape_inner_product,
'Input': shape_data,
'LRN': shape_identity,
'MemoryData': shape_mem_data,
'MultinomialLogisticLoss': shape_scalar,
'MVN': shape_not_implemented,
'Pooling': shape_pool,
'Power': shape_identity,
'ReLU': shape_identity,
'Scale': shape_identity,
'Sigmoid': shape_identity,
'SigmoidCrossEntropyLoss': shape_scalar,
'Silence': shape_not_implemented,
'Softmax': shape_identity,
'SoftmaxWithLoss': shape_scalar,
'Split': shape_not_implemented,
'Slice': shape_not_implemented,
'TanH': shape_identity,
'WindowData': shape_not_implemented,
'Threshold': shape_identity,
}
# layer types in 'V1LayerParameter'
# (v1layertype name, enum value, mapped to layer type)
v1_layertypes = [
('ABSVAL', 35),
('ACCURACY', 1),
('ARGMAX', 30),
('BNLL', 2),
('CONCAT', 3),
('CONVOLUTION', 4),
('DATA', 5),
('DECONVOLUTION', 39),
('DROPOUT', 6),
('ELTWISE', 25),
('EXP', 38),
('FLATTEN', 8),
('IM2COL', 11),
('INNERPRODUCT', 14),
('LRN', 15),
('MEMORYDATA', 29),
('MULTINOMIALLOGISTICLOSS', 16),
('MVN', 34),
('POOLING', 17),
('POWER', 26),
('RELU', 18),
('SIGMOID', 19),
('SIGMOIDCROSSENTROPYLOSS', 27),
('SILENCE', 36),
('SOFTMAX', 20),
('SPLIT', 22),
('SLICE', 33),
('TANH', 23),
('WINDOWDATA', 24),
('THRESHOLD', 31),
]
LAYER_TYPES = LAYER_DESCRIPTORS.keys()
LayerType = type('LayerType', (), {t: t for t in LAYER_TYPES})
#map the layer name in V1 to standard name
V1_LAYER_MAP = {'_not_init_': True}
def get_v1_layer_map():
global V1_LAYER_MAP
if '_not_init_' not in V1_LAYER_MAP:
return V1_LAYER_MAP
else:
del V1_LAYER_MAP['_not_init_']
name2layer = {}
for n in LAYER_TYPES:
name2layer[n.upper()] = n
for l in v1_layertypes:
n, v = l
if n in name2layer and v not in V1_LAYER_MAP:
V1_LAYER_MAP[v] = name2layer[n]
else:
raise KaffeError('not found v1 layer type %s' % n)
return V1_LAYER_MAP
class NodeKind(LayerType):
@staticmethod
def map_raw_kind(kind):
if kind in LAYER_TYPES:
return kind
v1_layers = get_v1_layer_map()
if kind in v1_layers:
return v1_layers[kind]
else:
return None
@staticmethod
def compute_output_shape(node):
try:
val = LAYER_DESCRIPTORS[node.kind](node)
return val
except NotImplementedError:
raise KaffeError(
'Output shape computation not implemented for type: %s' %
node.kind)
class NodeDispatchError(KaffeError):
pass
class NodeDispatch(object):
@staticmethod
def get_handler_name(node_kind):
if len(node_kind) <= 4:
# A catch-all for things like ReLU and tanh
return node_kind.lower()
# Convert from CamelCase to under_scored
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', node_kind)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
def get_handler(self, node_kind, prefix):
name = self.get_handler_name(node_kind)
name = '_'.join((prefix, name))
try:
return getattr(self, name)
except AttributeError:
raise NodeDispatchError(
'No handler found for node kind: %s (expected: %s)' %
(node_kind, name))
class LayerAdapter(object):
def __init__(self, layer, kind):
self.layer = layer
self.kind = kind
@property
def parameters(self):
name = NodeDispatch.get_handler_name(self.kind)
name = '_'.join((name, 'param'))
try:
return getattr(self.layer, name)
except AttributeError:
raise NodeDispatchError(
'Caffe parameters not found for layer kind: %s' % (self.kind))
@staticmethod
def get_kernel_value(scalar, repeated, idx, default=None):
if scalar:
return scalar
if repeated:
if isinstance(repeated, numbers.Number):
return repeated
if len(repeated) == 1:
# Same value applies to all spatial dimensions
return int(repeated[0])
assert idx < len(repeated)
# Extract the value for the given spatial dimension
return repeated[idx]
if default is None:
raise ValueError('Unable to determine kernel parameter!')
return default
@property
def kernel_parameters(self):
assert self.kind in (NodeKind.Convolution, NodeKind.Pooling)
params = self.parameters
k_h = self.get_kernel_value(params.kernel_h, params.kernel_size, 0)
k_w = self.get_kernel_value(params.kernel_w, params.kernel_size, 1)
s_h = self.get_kernel_value(
params.stride_h, params.stride, 0, default=1)
s_w = self.get_kernel_value(
params.stride_w, params.stride, 1, default=1)
p_h = self.get_kernel_value(params.pad_h, params.pad, 0, default=0)
p_w = self.get_kernel_value(params.pad_h, params.pad, 1, default=0)
return KernelParameters(k_h, k_w, s_h, s_w, p_h, p_w)
KernelParameters = namedtuple('KernelParameters', [
'kernel_h', 'kernel_w', 'stride_h', 'stride_w', 'pad_h', 'pad_w'
])
from .transformer import Transformer
from .network import Network
import math
import os
import numpy as np
def import_fluid():
import paddle.v2.fluid as fluid
return fluid
def layer(op):
'''Decorator for composable network layers.'''
def layer_decorated(self, *args, **kwargs):
# Automatically set a name if not provided.
name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
# Figure out the layer inputs.
if len(self.terminals) == 0:
raise RuntimeError('No input variables found for layer %s.' % name)
elif len(self.terminals) == 1:
layer_input = self.terminals[0]
else:
layer_input = list(self.terminals)
# Perform the operation and get the output.
layer_output = op(self, layer_input, *args, **kwargs)
# Add to layer LUT.
self.layers[name] = layer_output
# This output is now the input for the next layer.
self.feed(layer_output)
#print('output shape of %s:' % (name))
#print layer_output.shape
# Return self for chained calls.
return self
return layer_decorated
class Network(object):
def __init__(self, inputs, trainable=True):
# The input nodes for this network
self.inputs = inputs
# The current list of terminal nodes
self.terminals = []
# Mapping from layer names to layers
self.layers = dict(inputs)
# If true, the resulting variables are set as trainable
self.trainable = trainable
# Switch variable for dropout
self.paddle_env = None
self.setup()
def setup(self):
'''Construct the network. '''
raise NotImplementedError('Must be implemented by the subclass.')
def load(self, data_path, exe=None, place=None, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
fluid = import_fluid()
#load fluid mode directly
if os.path.isdir(data_path):
assert (exe is not None), \
'must provide a executor to load fluid model'
fluid.io.load_persistables_if_exist(executor=exe, dirname=data_path)
return True
#load model from a npy file
if exe is None or place is None:
if self.paddle_env is None:
place = fluid.CPUPlace()
exe = fluid.Executor(place)
self.paddle_env = {'place': place, 'exe': exe}
exe = exe.run(fluid.default_startup_program())
else:
place = self.paddle_env['place']
exe = self.paddle_env['exe']
data_dict = np.load(data_path).item()
for op_name in data_dict:
layer = self.layers[op_name]
for param_name, data in data_dict[op_name].iteritems():
try:
name = '%s_%s' % (op_name, param_name)
v = fluid.global_scope().find_var(name)
w = v.get_tensor()
w.set(data, place)
except ValueError:
if not ignore_missing:
raise
return True
def feed(self, *args):
'''Set the input(s) for the next operation by replacing the terminal nodes.
The arguments can be either layer names or the actual layers.
'''
assert len(args) != 0
self.terminals = []
for fed_layer in args:
if isinstance(fed_layer, basestring):
try:
fed_layer = self.layers[fed_layer]
except KeyError:
raise KeyError('Unknown layer name fed: %s' % fed_layer)
self.terminals.append(fed_layer)
return self
def get_output(self):
'''Returns the current network output.'''
return self.terminals[-1]
def get_unique_name(self, prefix):
'''Returns an index-suffixed unique name for the given prefix.
This is used for auto-generating layer names based on the type-prefix.
'''
ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1
return '%s_%d' % (prefix, ident)
@layer
def conv(self,
input,
k_h,
k_w,
c_o,
s_h,
s_w,
name,
relu=True,
padding=None,
group=1,
biased=True):
if padding is None:
padding = [0, 0]
# Get the number of channels in the input
c_i, h_i, w_i = input.shape[1:]
# Verify that the grouping parameter is valid
assert c_i % group == 0
assert c_o % group == 0
fluid = import_fluid()
prefix = name + '_'
output = fluid.layers.conv2d(
input=input,
filter_size=[k_h, k_w],
num_filters=c_o,
stride=[s_h, s_w],
padding=padding,
groups=group,
param_attr=fluid.ParamAttr(name=prefix + "weights"),
bias_attr=fluid.ParamAttr(name=prefix + "biases"),
act="relu" if relu is True else None)
return output
@layer
def relu(self, input, name):
fluid = import_fluid()
output = fluid.layers.relu(x=input)
return output
def _adjust_pad_if_needed(self, i_hw, k_hw, s_hw, p_hw):
#adjust the padding if needed
i_h, i_w = i_hw
k_h, k_w = k_hw
s_h, s_w = s_hw
p_h, p_w = p_hw
def is_consistent(i, k, s, p):
o = i + 2 * p - k
if o % s == 0:
return True
else:
return False
real_p_h = 0
real_p_w = 0
if is_consistent(i_h, k_h, s_h, p_h) is False:
real_p_h = int(k_h / 2)
if is_consistent(i_w, k_w, s_w, p_w) is False:
real_p_w = int(k_w / 2)
return [real_p_h, real_p_w]
def pool(self, pool_type, input, k_h, k_w, s_h, s_w, name, padding):
# Get the number of channels in the input
in_hw = input.shape[2:]
k_hw = [k_h, k_w]
s_hw = [s_h, s_w]
if padding is None:
#fix bug about the difference between conv and pool
#more info: https://github.com/BVLC/caffe/issues/1318
padding = self._adjust_pad_if_needed(in_hw, k_hw, s_hw, [0, 0])
fluid = import_fluid()
output = fluid.layers.pool2d(
input=input,
pool_size=k_hw,
pool_stride=s_hw,
pool_padding=padding,
pool_type=pool_type)
return output
@layer
def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
return self.pool('max', input, k_h, k_w, s_h, s_w, name, padding)
@layer
def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
return self.pool('avg', input, k_h, k_w, s_h, s_w, name, padding)
@layer
def lrn(self, input, radius, alpha, beta, name, bias=1.0):
fluid = import_fluid()
output = fluid.layers.lrn(input=input, \
n=radius, k=bias, alpha=alpha, beta=beta, name=name)
return output
@layer
def concat(self, inputs, axis, name):
fluid = import_fluid()
output = fluid.layers.concat(input=inputs, axis=axis)
return output
@layer
def add(self, inputs, name):
fluid = import_fluid()
output = inputs[0]
for i in inputs[1:]:
output = fluid.layers.elementwise_add(x=output, y=i)
return output
@layer
def fc(self, input, num_out, name, relu=True, act=None):
fluid = import_fluid()
if act is None:
act = 'relu' if relu is True else None
prefix = name + '_'
output = fluid.layers.fc(
name=name,
input=input,
size=num_out,
act=act,
param_attr=fluid.ParamAttr(name=prefix + 'weights'),
bias_attr=fluid.ParamAttr(name=prefix + 'biases'))
return output
@layer
def softmax(self, input, name):
fluid = import_fluid()
output = fluid.layers.softmax(input)
return output
@layer
def batch_normalization(self, input, name, scale_offset=True, relu=False):
# NOTE: Currently, only inference is supported
fluid = import_fluid()
prefix = name + '_'
param_attr = None if scale_offset is False else fluid.ParamAttr(
name=prefix + 'scale')
bias_attr = None if scale_offset is False else fluid.ParamAttr(
name=prefix + 'offset')
mean_name = prefix + 'mean'
variance_name = prefix + 'variance'
output = fluid.layers.batch_norm(
name=name,
input=input,
is_test=True,
param_attr=param_attr,
bias_attr=bias_attr,
moving_mean_name=mean_name,
moving_variance_name=variance_name,
epsilon=1e-5,
act='relu' if relu is True else None)
return output
@layer
def dropout(self, input, drop_prob, name, is_test=True):
fluid = import_fluid()
output = fluid.layers.dropout(
input, dropout_prob=drop_prob, is_test=is_test, name=name)
return output
import numpy as np
from ..errors import KaffeError, print_stderr
from ..graph import GraphBuilder, NodeMapper
from ..layers import NodeKind
from ..transformers import (DataInjector, DataReshaper, NodeRenamer, ReLUFuser,
BatchNormScaleBiasFuser, BatchNormPreprocessor,
ParameterNamer)
from . import network
def get_padding_type(kernel_params, input_shape, output_shape):
'''Translates Caffe's numeric padding to one of ('SAME', 'VALID').
Caffe supports arbitrary padding values, while TensorFlow only
supports 'SAME' and 'VALID' modes. So, not all Caffe paddings
can be translated to TensorFlow. There are some subtleties to
how the padding edge-cases are handled. These are described here:
https://github.com/Yangqing/caffe2/blob/master/caffe2/proto/caffe2_legacy.proto
'''
k_h, k_w, s_h, s_w, p_h, p_w = kernel_params
if p_h * p_w > 0:
return [p_h, p_w]
else:
return None
class TensorFlowNode(object):
'''An intermediate representation for TensorFlow operations.'''
def __init__(self, op, *args, **kwargs):
# A string corresponding to the TensorFlow operation
self.op = op
# Positional arguments for the operation
self.args = args
# Keyword arguments for the operation
self.kwargs = list(kwargs.items())
# The source Caffe node
self.node = None
def format(self, arg):
'''Returns a string representation for the given value.'''
return "'%s'" % arg if isinstance(arg, basestring) else str(arg)
def pair(self, key, value):
'''Returns key=formatted(value).'''
return '%s=%s' % (key, self.format(value))
def emit(self):
'''Emits the Python source for this node.'''
# Format positional arguments
args = map(self.format, self.args)
# Format any keyword arguments
if self.kwargs:
args += [self.pair(k, v) for k, v in self.kwargs]
# Set the node name
args.append(self.pair('name', self.node.name))
args = ', '.join(args)
return '%s(%s)' % (self.op, args)
class MaybeActivated(object):
def __init__(self, node, default=True):
self.inject_kwargs = {}
if node.metadata.get('relu', False) != default:
self.inject_kwargs['relu'] = not default
def __call__(self, *args, **kwargs):
kwargs.update(self.inject_kwargs)
return TensorFlowNode(*args, **kwargs)
class TensorFlowMapper(NodeMapper):
def get_kernel_params(self, node):
kernel_params = node.layer.kernel_parameters
input_shape = node.get_only_parent().output_shape
padding = get_padding_type(kernel_params, input_shape,
node.output_shape)
# Only emit the padding if it's not the default value.
padding = {'padding': padding} if padding is not None else {}
return (kernel_params, padding)
def map_convolution(self, node):
(kernel_params, kwargs) = self.get_kernel_params(node)
h = kernel_params.kernel_h
w = kernel_params.kernel_w
c_o = node.output_shape[1]
c_i = node.parents[0].output_shape[1]
group = node.parameters.group
if group != 1:
kwargs['group'] = group
if not node.parameters.bias_term:
kwargs['biased'] = False
assert kernel_params.kernel_h == h
assert kernel_params.kernel_w == w
return MaybeActivated(node)(
'conv', kernel_params.kernel_h, kernel_params.kernel_w, c_o,
kernel_params.stride_h, kernel_params.stride_w, **kwargs)
def map_relu(self, node):
return TensorFlowNode('relu')
def map_pooling(self, node):
pool_type = node.parameters.pool
if pool_type == 0:
pool_op = 'max_pool'
elif pool_type == 1:
pool_op = 'avg_pool'
else:
# Stochastic pooling, for instance.
raise KaffeError('Unsupported pooling type.')
(kernel_params, padding) = self.get_kernel_params(node)
return TensorFlowNode(pool_op, kernel_params.kernel_h,
kernel_params.kernel_w, kernel_params.stride_h,
kernel_params.stride_w, **padding)
def map_inner_product(self, node):
#TODO: Axis
assert node.parameters.axis == 1
#TODO: Unbiased
assert node.parameters.bias_term == True
return MaybeActivated(node)('fc', node.parameters.num_output)
def map_softmax(self, node):
return TensorFlowNode('softmax')
def map_lrn(self, node):
params = node.parameters
# The window size must be an odd value. For a window
# size of (2*n+1), TensorFlow defines depth_radius = n.
assert params.local_size % 2 == 1
# Caffe scales by (alpha/(2*n+1)), whereas TensorFlow
# just scales by alpha (as does Krizhevsky's paper).
# We'll account for that here.
alpha = params.alpha / float(params.local_size)
return TensorFlowNode('lrn', params.local_size, alpha, params.beta)
def map_concat(self, node):
return TensorFlowNode('concat', node.parameters.axis)
def map_dropout(self, node):
return TensorFlowNode('dropout', node.parameters.dropout_ratio)
def map_batch_norm(self, node):
scale_offset = len(node.data) == 4
kwargs = {} if scale_offset else {'scale_offset': False}
return MaybeActivated(
node, default=False)('batch_normalization', **kwargs)
def map_eltwise(self, node):
operations = {0: 'multiply', 1: 'add', 2: 'max'}
op_code = node.parameters.operation
try:
return TensorFlowNode(operations[op_code])
except KeyError:
raise KaffeError('Unknown elementwise operation: {}'.format(
op_code))
def commit(self, chains):
return chains
class TensorFlowEmitter(object):
def __init__(self, tab=None):
self.tab = tab or ' ' * 4
self.prefix = ''
self.net_name = ''
def indent(self):
self.prefix += self.tab
def outdent(self):
self.prefix = self.prefix[:-len(self.tab)]
def statement(self, s):
return self.prefix + s + '\n'
def emit_imports(self):
import inspect
codes = []
codes.append(
'### generated by caffe2fluid, your net is in class "%s" ###\n' %
(self.net_name))
network_source = inspect.getsource(network)
codes.append(network_source + '\n')
return self.statement('\n'.join(codes))
def emit_class_def(self, name):
return self.statement('class %s(Network):' % (name))
def emit_setup_def(self):
return self.statement('def setup(self):')
def emit_shape_def(self, input_nodes):
self.outdent()
func_def = self.statement('@classmethod')
func_def += self.statement('def input_shapes(cls):')
self.indent()
input_shapes = {}
for n in input_nodes:
name = n.name
output_shape = n.output_shape
shape = [str(s) for s in output_shape[1:]]
input_shapes[name] = ', '.join(shape)
input_shapes = ['"%s": [%s]' % (n, l) for n, l in input_shapes.items()]
shape_str = ','.join(input_shapes)
func_def += self.statement('return {%s}' % (shape_str))
return '\n\n' + func_def
def emit_convert_def(self, input_nodes):
codes = []
inputs = {}
codes.append('shapes = cls.input_shapes()')
for n in input_nodes:
name = n.name
layer_var = name + '_layer'
layer_def = '%s = fluid.layers.data(name="%s", shape=shapes["%s"],'\
' dtype="float32")' % (layer_var, name, name)
#layer_var, layer_def = data_layer_def(n.name, n.output_shape)
codes.append(layer_def)
inputs[name] = layer_var
input_dict = ','.join(['"%s": %s' % (n, l) for n, l in inputs.items()])
codes.append('feed_data = {' + input_dict + '}')
codes.append('net = cls(feed_data)')
codes.append("place = fluid.CPUPlace()")
codes.append("exe = fluid.Executor(place)")
codes.append("exe.run(fluid.default_startup_program())")
codes.append("net.load(data_path=npy_model, exe=exe, place=place)")
codes.append(
"fluid.io.save_persistables(executor=exe, dirname=fluid_path)")
self.outdent()
func_def = self.statement('@classmethod')
func_def += self.statement('def convert(cls, npy_model, fluid_path):')
self.indent()
func_def += self.statement('import paddle.v2.fluid as fluid')
for l in codes:
func_def += self.statement(l)
return '\n' + func_def
def emit_main_def(self, name):
if name is None:
return ''
self.prefix = ''
main_def = self.statement('if __name__ == "__main__":')
self.indent()
main_def += self.statement("#usage: python xxxnet.py xxx.npy ./model\n")
main_def += self.statement("import sys")
main_def += self.statement("npy_weight = sys.argv[1]")
main_def += self.statement("fluid_model = sys.argv[2]")
main_def += self.statement("%s.convert(npy_weight, fluid_model)" %
(name))
main_def += self.statement("exit(0)")
return '\n\n' + main_def
def emit_parents(self, chain):
assert len(chain)
s = 'self.feed('
sep = ', \n' + self.prefix + (' ' * len(s))
s += sep.join(
["'%s'" % parent.name for parent in chain[0].node.parents])
return self.statement(s + ')')
def emit_node(self, node):
return self.statement('self.' + node.emit())
def emit(self, name, chains, input_nodes=None):
self.net_name = name
s = self.emit_imports()
s += self.emit_class_def(name)
self.indent()
s += self.emit_setup_def()
self.indent()
blocks = []
for chain in chains:
b = ''
b += self.emit_parents(chain)
for node in chain:
b += self.emit_node(node)
blocks.append(b[:-1])
s = s + '\n\n'.join(blocks)
s += self.emit_shape_def(input_nodes)
s += self.emit_convert_def(input_nodes)
s += self.emit_main_def(name)
return s
class Transformer(object):
def __init__(self, def_path, data_path, verbose=True, phase='test'):
self.verbose = verbose
self.phase = phase
self.load(def_path, data_path, phase)
self.params = None
self.source = None
def load(self, def_path, data_path, phase):
# Build the graph
graph = GraphBuilder(def_path, phase).build()
if data_path is not None:
# Load and associate learned parameters
graph = DataInjector(def_path, data_path)(graph)
# Transform the graph
transformers = [
# Fuse split batch normalization layers
BatchNormScaleBiasFuser(),
# Fuse ReLUs
# TODO: Move non-linearity application to layer wrapper, allowing
# any arbitrary operation to be optionally activated.
ReLUFuser(allowed_parent_types=[
NodeKind.Convolution, NodeKind.InnerProduct, NodeKind.BatchNorm
]),
# Rename nodes
# Slashes are used for scoping in TensorFlow. Replace slashes
# in node names with underscores.
# (Caffe's GoogLeNet implementation uses slashes)
NodeRenamer(lambda node: node.name.replace('/', '_'))
]
self.graph = graph.transformed(transformers)
# Display the graph
if self.verbose:
print_stderr(self.graph)
def transform_data(self):
if self.params is None:
transformers = [
# Reshape the parameters to TensorFlow's ordering
DataReshaper({
# (c_o, c_i, h, w) -> (h, w, c_i, c_o) for TF
NodeKind.Convolution: (0, 1, 2, 3),
# (c_o, c_i) -> (c_i, c_o)
NodeKind.InnerProduct: (1, 0)
}),
# Pre-process batch normalization data
BatchNormPreprocessor(),
# Convert parameters to dictionaries
ParameterNamer(),
]
self.graph = self.graph.transformed(transformers)
self.params = {
node.name: node.data
for node in self.graph.nodes if node.data
}
return self.params
def transform_source(self):
if self.source is None:
mapper = TensorFlowMapper(self.graph)
chains = mapper.map()
emitter = TensorFlowEmitter()
input_nodes = self.graph.get_input_nodes()
self.source = emitter.emit(self.graph.name, chains, input_nodes)
return self.source
import math
from collections import namedtuple
from .errors import KaffeError
TensorShape = namedtuple('TensorShape',
['batch_size', 'channels', 'height', 'width'])
def get_filter_output_shape(i_h, i_w, params, round_func):
o_h = (i_h + 2 * params.pad_h - params.kernel_h
) / float(params.stride_h) + 1
o_w = (i_w + 2 * params.pad_w - params.kernel_w
) / float(params.stride_w) + 1
return (int(round_func(o_h)), int(round_func(o_w)))
def get_strided_kernel_output_shape(node, round_func):
assert node.layer is not None
input_shape = node.get_only_parent().output_shape
o_h, o_w = get_filter_output_shape(input_shape.height, input_shape.width,
node.layer.kernel_parameters, round_func)
params = node.layer.parameters
has_c_o = hasattr(params, 'num_output')
c = params.num_output if has_c_o else input_shape.channels
return TensorShape(input_shape.batch_size, c, o_h, o_w)
def shape_not_implemented(node):
raise NotImplementedError
def shape_identity(node):
assert len(node.parents) > 0
return node.parents[0].output_shape
def shape_scalar(node):
return TensorShape(1, 1, 1, 1)
def shape_data(node):
if node.output_shape:
# Old-style input specification
return node.output_shape
try:
# New-style input specification
return map(int, node.parameters.shape[0].dim)
except:
# We most likely have a data layer on our hands. The problem is,
# Caffe infers the dimensions of the data from the source (eg: LMDB).
# We want to avoid reading datasets here. Fail for now.
# This can be temporarily fixed by transforming the data layer to
# Caffe's "input" layer (as is usually used in the "deploy" version).
# TODO: Find a better solution for this.
raise KaffeError('Cannot determine dimensions of data layer.\n'
'See comments in function shape_data for more info.')
def shape_mem_data(node):
params = node.parameters
return TensorShape(params.batch_size, params.channels, params.height,
params.width)
def shape_concat(node):
axis = node.layer.parameters.axis
output_shape = None
for parent in node.parents:
if output_shape is None:
output_shape = list(parent.output_shape)
else:
output_shape[axis] += parent.output_shape[axis]
return tuple(output_shape)
def shape_convolution(node):
return get_strided_kernel_output_shape(node, math.floor)
def shape_pool(node):
return get_strided_kernel_output_shape(node, math.ceil)
def shape_inner_product(node):
input_shape = node.get_only_parent().output_shape
return TensorShape(input_shape.batch_size, node.layer.parameters.num_output,
1, 1)
'''
A collection of graph transforms.
A transformer is a callable that accepts a graph and returns a transformed version.
'''
import os
import numpy as np
from .caffe import get_caffe_resolver, has_pycaffe
from .errors import KaffeError, debug, notice, warn
from .layers import NodeKind
class DataInjector(object):
'''
Associates parameters loaded from a .caffemodel file with their corresponding nodes.
'''
def __init__(self, def_path, data_path):
# The .prototxt file defining the graph
self.def_path = def_path
# The .caffemodel file containing the learned parameters
self.data_path = data_path
# Set to true if the fallback protocol-buffer based backend was used
self.did_use_pb = False
# A list containing (layer name, parameters) tuples
self.params = None
# Load the parameters
self.load()
def load(self):
if has_pycaffe():
self.load_using_caffe()
else:
self.load_using_pb()
def load_using_caffe(self):
caffe = get_caffe_resolver().caffe
net = caffe.Net(self.def_path, self.data_path, caffe.TEST)
data = lambda blob: blob.data
self.params = [(k, map(data, v)) for k, v in net.params.items()]
def load_using_pb(self):
data = get_caffe_resolver().NetParameter()
data.MergeFromString(open(self.data_path, 'rb').read())
pair = lambda layer: (layer.name, self.normalize_pb_data(layer))
layers = data.layers or data.layer
self.params = [pair(layer) for layer in layers if layer.blobs]
self.did_use_pb = True
def normalize_pb_data(self, layer):
transformed = []
for blob in layer.blobs:
if len(blob.shape.dim):
dims = blob.shape.dim
c_o, c_i, h, w = map(int, [1] * (4 - len(dims)) + list(dims))
else:
c_o = blob.num
c_i = blob.channels
h = blob.height
w = blob.width
data = np.array(blob.data, dtype=np.float32).reshape(c_o, c_i, h, w)
transformed.append(data)
return transformed
def adjust_parameters(self, node, data):
if not self.did_use_pb:
return data
# When using the protobuf-backend, each parameter initially has four dimensions.
# In certain cases (like FC layers), we want to eliminate the singleton dimensions.
# This implementation takes care of the common cases. However, it does leave the
# potential for future issues.
# The Caffe-backend does not suffer from this problem.
data = list(data)
squeeze_indices = [1] # Squeeze biases.
if node.kind == NodeKind.InnerProduct:
squeeze_indices.append(0) # Squeeze FC.
for idx in squeeze_indices:
if idx >= len(data):
continue
shape_old = data[idx].shape
data[idx] = np.squeeze(data[idx])
shape_new = data[idx].shape
if len(shape_old) != shape_new:
debug('squeeze idx:%d, with kind:%s,name:%s' % \
(idx, node.kind, node.name))
return data
def __call__(self, graph):
for layer_name, data in self.params:
if layer_name in graph:
node = graph.get_node(layer_name)
node.data = self.adjust_parameters(node, data)
else:
notice('Ignoring parameters for non-existent layer: %s' % \
layer_name)
return graph
class DataReshaper(object):
def __init__(self, mapping, replace=True):
# A dictionary mapping NodeKind to the transposed order.
self.mapping = mapping
# The node kinds eligible for reshaping
self.reshaped_node_types = self.mapping.keys()
# If true, the reshaped data will replace the old one.
# Otherwise, it's set to the reshaped_data attribute.
self.replace = replace
def has_spatial_parent(self, node):
try:
parent = node.get_only_parent()
s = parent.output_shape
return s.height > 1 or s.width > 1
except KaffeError:
return False
def map(self, node_kind):
try:
return self.mapping[node_kind]
except KeyError:
raise
#raise KaffeError('Ordering not found for node kind: {}'.format(node_kind))
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind not in self.reshaped_node_types:
# Check for 2+ dimensional data
if any(len(tensor.shape) > 1 for tensor in node.data):
notice('parmaters not reshaped for node: {}'.format(node))
continue
transpose_order = self.map(node.kind)
weights = node.data[0]
if (node.kind == NodeKind.InnerProduct
) and self.has_spatial_parent(node):
# The FC layer connected to the spatial layer needs to be
# re-wired to match the new spatial ordering.
in_shape = node.get_only_parent().output_shape
fc_shape = weights.shape
output_channels = fc_shape[0]
weights = weights.reshape((output_channels, -1))
weights = weights.transpose(transpose_order)
node.reshaped_data = weights
else:
node.reshaped_data = weights.transpose(transpose_order)
if self.replace:
for node in graph.nodes:
if hasattr(node, 'reshaped_data'):
# Set the weights
node.data[0] = node.reshaped_data
del node.reshaped_data
return graph
class SubNodeFuser(object):
'''
An abstract helper for merging a single-child with its single-parent.
'''
def __call__(self, graph):
nodes = graph.nodes
fused_nodes = []
for node in nodes:
if len(node.parents) != 1:
# We're only fusing nodes with single parents
continue
parent = node.get_only_parent()
if len(parent.children) != 1:
# We can only fuse a node if its parent's
# value isn't used by any other node.
continue
if not self.is_eligible_pair(parent, node):
continue
# Rewrite the fused node's children to its parent.
for child in node.children:
child.parents.remove(node)
parent.add_child(child)
# Disconnect the fused node from the graph.
parent.children.remove(node)
fused_nodes.append(node)
# Let the sub-class merge the fused node in any arbitrary way.
self.merge(parent, node)
transformed_nodes = [node for node in nodes if node not in fused_nodes]
return graph.replaced(transformed_nodes)
def is_eligible_pair(self, parent, child):
'''Returns true if this parent/child pair is eligible for fusion.'''
raise NotImplementedError('Must be implemented by subclass.')
def merge(self, parent, child):
'''Merge the child node into the parent.'''
raise NotImplementedError('Must be implemented by subclass')
class ReLUFuser(SubNodeFuser):
'''
Fuses rectified linear units with their parent nodes.
'''
def __init__(self, allowed_parent_types=None):
# Fuse ReLUs when the parent node is one of the given types.
# If None, all node types are eligible.
self.allowed_parent_types = allowed_parent_types
def is_eligible_pair(self, parent, child):
return ((self.allowed_parent_types is None or \
parent.kind in self.allowed_parent_types) and \
child.kind == NodeKind.ReLU)
def merge(self, parent, _):
parent.metadata['relu'] = True
class BatchNormScaleBiasFuser(SubNodeFuser):
'''
The original batch normalization paper includes two learned
parameters: a scaling factor \gamma and a bias \beta.
Caffe's implementation does not include these two. However, it is commonly
replicated by adding a scaling+bias layer immidiately after the batch norm.
This fuser merges the scaling+bias layer with the batch norm.
'''
def is_eligible_pair(self, parent, child):
return (parent.kind == NodeKind.BatchNorm and \
child.kind == NodeKind.Scale and \
child.parameters.axis == 1 and \
child.parameters.bias_term == True)
def merge(self, parent, child):
parent.scale_bias_node = child
class BatchNormPreprocessor(object):
'''
Prescale batch normalization parameters.
Concatenate gamma (scale) and beta (bias) terms if set.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.kind != NodeKind.BatchNorm:
continue
assert node.data is not None
assert len(node.data) == 3
node.data = [np.squeeze(i) for i in node.data]
mean, variance, scale = node.data
# Prescale the stats
scaling_factor = 1.0 / scale if scale != 0 else 0
mean *= scaling_factor
variance *= scaling_factor
# Replace with the updated values
node.data = [mean, variance]
if hasattr(node, 'scale_bias_node'):
# Include the scale and bias terms
gamma, beta = node.scale_bias_node.data
node.data += [np.squeeze(i) for i in [gamma, beta]]
return graph
class NodeRenamer(object):
'''
Renames nodes in the graph using a given unary function that
accepts a node and returns its new name.
'''
def __init__(self, renamer):
self.renamer = renamer
def __call__(self, graph):
for node in graph.nodes:
node.name = self.renamer(node)
return graph
class ParameterNamer(object):
'''
Convert layer data arrays to a dictionary mapping parameter names to their values.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind in (NodeKind.Convolution, NodeKind.InnerProduct):
names = ('weights', )
if node.parameters.bias_term:
names += ('biases', )
elif node.kind == NodeKind.BatchNorm:
names = ('mean', 'variance')
if len(node.data) == 4:
names += ('scale', 'offset')
else:
warn('Unhandled parameters: {}'.format(node.kind))
continue
assert len(names) == len(node.data)
node.data = dict(zip(names, node.data))
return graph
syntax = "proto2";
package caffe;
// Specifies the shape (dimensions) of a Blob.
message BlobShape { repeated int64 dim = 1 [ packed = true ]; }
message BlobProto {
optional BlobShape shape = 7;
repeated float data = 5 [ packed = true ];
repeated float diff = 6 [ packed = true ];
repeated double double_data = 8 [ packed = true ];
repeated double double_diff = 9 [ packed = true ];
// 4D dimensions -- deprecated. Use "shape" instead.
optional int32 num = 1 [ default = 0 ];
optional int32 channels = 2 [ default = 0 ];
optional int32 height = 3 [ default = 0 ];
optional int32 width = 4 [ default = 0 ];
}
// The BlobProtoVector is simply a way to pass multiple blobproto instances
// around.
message BlobProtoVector { repeated BlobProto blobs = 1; }
message Datum {
optional int32 channels = 1;
optional int32 height = 2;
optional int32 width = 3;
// the actual image data, in bytes
optional bytes data = 4;
optional int32 label = 5;
// Optionally, the datum could also hold float data.
repeated float float_data = 6;
// If true data contains an encoded image that need to be decoded
optional bool encoded = 7 [ default = false ];
}
message FillerParameter {
// The filler type.
optional string type = 1 [ default = 'constant' ];
optional float value = 2 [ default = 0 ]; // the value in constant filler
optional float min = 3 [ default = 0 ]; // the min value in uniform filler
optional float max = 4 [ default = 1 ]; // the max value in uniform filler
optional float mean = 5 [ default = 0 ]; // the mean value in Gaussian filler
optional float std = 6 [ default = 1 ]; // the std value in Gaussian filler
// The expected number of non-zero output weights for a given input in
// Gaussian filler -- the default -1 means don't perform sparsification.
optional int32 sparse = 7 [ default = -1 ];
// Normalize the filler variance by fan_in, fan_out, or their average.
// Applies to 'xavier' and 'msra' fillers.
enum VarianceNorm {
FAN_IN = 0;
FAN_OUT = 1;
AVERAGE = 2;
}
optional VarianceNorm variance_norm = 8 [ default = FAN_IN ];
}
message NetParameter {
optional string name = 1; // consider giving the network a name
// DEPRECATED. See InputParameter. The input blobs to the network.
repeated string input = 3;
// DEPRECATED. See InputParameter. The shape of the input blobs.
repeated BlobShape input_shape = 8;
// 4D input dimensions -- deprecated. Use "input_shape" instead.
// If specified, for each input blob there should be four
// values specifying the num, channels, height and width of the input blob.
// Thus, there should be a total of (4 * #input) numbers.
repeated int32 input_dim = 4;
// Whether the network will force every layer to carry out backward operation.
// If set False, then whether to carry out backward is determined
// automatically according to the net structure and learning rates.
optional bool force_backward = 5 [ default = false ];
// The current "state" of the network, including the phase, level, and stage.
// Some layers may be included/excluded depending on this state and the states
// specified in the layers' include and exclude fields.
optional NetState state = 6;
// Print debugging information about results while running Net::Forward,
// Net::Backward, and Net::Update.
optional bool debug_info = 7 [ default = false ];
// The layers that make up the net. Each of their configurations, including
// connectivity and behavior, is specified as a LayerParameter.
repeated LayerParameter layer = 100; // ID 100 so layers are printed last.
// DEPRECATED: use 'layer' instead.
repeated V1LayerParameter layers = 2;
}
// NOTE
// Update the next available ID when you add a new SolverParameter field.
//
// SolverParameter next available ID: 42 (last added: layer_wise_reduce)
message SolverParameter {
//////////////////////////////////////////////////////////////////////////////
// Specifying the train and test networks
//
// Exactly one train net must be specified using one of the following fields:
// train_net_param, train_net, net_param, net
// One or more test nets may be specified using any of the following fields:
// test_net_param, test_net, net_param, net
// If more than one test net field is specified (e.g., both net and
// test_net are specified), they will be evaluated in the field order given
// above: (1) test_net_param, (2) test_net, (3) net_param/net.
// A test_iter must be specified for each test_net.
// A test_level and/or a test_stage may also be specified for each test_net.
//////////////////////////////////////////////////////////////////////////////
// Proto filename for the train net, possibly combined with one or more
// test nets.
optional string net = 24;
// Inline train net param, possibly combined with one or more test nets.
optional NetParameter net_param = 25;
optional string train_net = 1; // Proto filename for the train net.
repeated string test_net = 2; // Proto filenames for the test nets.
optional NetParameter train_net_param = 21; // Inline train net params.
repeated NetParameter test_net_param = 22; // Inline test net params.
// The states for the train/test nets. Must be unspecified or
// specified once per net.
//
// By default, train_state will have phase = TRAIN,
// and all test_state's will have phase = TEST.
// Other defaults are set according to the NetState defaults.
optional NetState train_state = 26;
repeated NetState test_state = 27;
// The number of iterations for each test net.
repeated int32 test_iter = 3;
// The number of iterations between two testing phases.
optional int32 test_interval = 4 [ default = 0 ];
optional bool test_compute_loss = 19 [ default = false ];
// If true, run an initial test pass before the first iteration,
// ensuring memory availability and printing the starting value of the loss.
optional bool test_initialization = 32 [ default = true ];
optional float base_lr = 5; // The base learning rate
// the number of iterations between displaying info. If display = 0, no info
// will be displayed.
optional int32 display = 6;
// Display the loss averaged over the last average_loss iterations
optional int32 average_loss = 33 [ default = 1 ];
optional int32 max_iter = 7; // the maximum number of iterations
// accumulate gradients over `iter_size` x `batch_size` instances
optional int32 iter_size = 36 [ default = 1 ];
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
// - fixed: always return base_lr.
// - step: return base_lr * gamma ^ (floor(iter / step))
// - exp: return base_lr * gamma ^ iter
// - inv: return base_lr * (1 + gamma * iter) ^ (- power)
// - multistep: similar to step but it allows non uniform steps defined by
// stepvalue
// - poly: the effective learning rate follows a polynomial decay, to be
// zero by the max_iter. return base_lr (1 - iter/max_iter) ^ (power)
// - sigmoid: the effective learning rate follows a sigmod decay
// return base_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
optional string lr_policy = 8;
optional float gamma = 9; // The parameter to compute the learning rate.
optional float power = 10; // The parameter to compute the learning rate.
optional float momentum = 11; // The momentum value.
optional float weight_decay = 12; // The weight decay.
// regularization types supported: L1 and L2
// controlled by weight_decay
optional string regularization_type = 29 [ default = "L2" ];
// the stepsize for learning rate policy "step"
optional int32 stepsize = 13;
// the stepsize for learning rate policy "multistep"
repeated int32 stepvalue = 34;
// Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm,
// whenever their actual L2 norm is larger.
optional float clip_gradients = 35 [ default = -1 ];
optional int32 snapshot = 14 [ default = 0 ]; // The snapshot interval
optional string snapshot_prefix = 15; // The prefix for the snapshot.
// whether to snapshot diff in the results or not. Snapshotting diff will help
// debugging but the final protocol buffer size will be much larger.
optional bool snapshot_diff = 16 [ default = false ];
enum SnapshotFormat {
HDF5 = 0;
BINARYPROTO = 1;
}
optional SnapshotFormat snapshot_format = 37 [ default = BINARYPROTO ];
// the mode solver will use: 0 for CPU and 1 for GPU. Use GPU in default.
enum SolverMode {
CPU = 0;
GPU = 1;
}
optional SolverMode solver_mode = 17 [ default = GPU ];
// the device_id will that be used in GPU mode. Use device_id = 0 in default.
optional int32 device_id = 18 [ default = 0 ];
// If non-negative, the seed with which the Solver will initialize the Caffe
// random number generator -- useful for reproducible results. Otherwise,
// (and by default) initialize using a seed derived from the system clock.
optional int64 random_seed = 20 [ default = -1 ];
// type of the solver
optional string type = 40 [ default = "SGD" ];
// numerical stability for RMSProp, AdaGrad and AdaDelta and Adam
optional float delta = 31 [ default = 1e-8 ];
// parameters for the Adam solver
optional float momentum2 = 39 [ default = 0.999 ];
// RMSProp decay value
// MeanSquare(t) = rms_decay*MeanSquare(t-1) + (1-rms_decay)*SquareGradient(t)
optional float rms_decay = 38 [ default = 0.99 ];
// If true, print information about the state of the net that may help with
// debugging learning problems.
optional bool debug_info = 23 [ default = false ];
// If false, don't save a snapshot after training finishes.
optional bool snapshot_after_train = 28 [ default = true ];
// DEPRECATED: old solver enum types, use string instead
enum SolverType {
SGD = 0;
NESTEROV = 1;
ADAGRAD = 2;
RMSPROP = 3;
ADADELTA = 4;
ADAM = 5;
}
// DEPRECATED: use type instead of solver_type
optional SolverType solver_type = 30 [ default = SGD ];
// Overlap compute and communication for data parallel training
optional bool layer_wise_reduce = 41 [ default = true ];
}
// A message that stores the solver snapshots
message SolverState {
optional int32 iter = 1; // The current iteration
optional string learned_net = 2; // The file that stores the learned net.
repeated BlobProto history = 3; // The history for sgd solvers
optional int32 current_step = 4
[ default = 0 ]; // The current step for learning rate
}
enum Phase {
TRAIN = 0;
TEST = 1;
}
message NetState {
optional Phase phase = 1 [ default = TEST ];
optional int32 level = 2 [ default = 0 ];
repeated string stage = 3;
}
message NetStateRule {
// Set phase to require the NetState have a particular phase (TRAIN or TEST)
// to meet this rule.
optional Phase phase = 1;
// Set the minimum and/or maximum levels in which the layer should be used.
// Leave undefined to meet the rule regardless of level.
optional int32 min_level = 2;
optional int32 max_level = 3;
// Customizable sets of stages to include or exclude.
// The net must have ALL of the specified stages and NONE of the specified
// "not_stage"s to meet the rule.
// (Use multiple NetStateRules to specify conjunctions of stages.)
repeated string stage = 4;
repeated string not_stage = 5;
}
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
message ParamSpec {
// The names of the parameter blobs -- useful for sharing parameters among
// layers, but never required otherwise. To share a parameter between two
// layers, give it a (non-empty) name.
optional string name = 1;
// Whether to require shared weights to have the same shape, or just the same
// count -- defaults to STRICT if unspecified.
optional DimCheckMode share_mode = 2;
enum DimCheckMode {
// STRICT (default) requires that num, channels, height, width each match.
STRICT = 0;
// PERMISSIVE requires only the count (num*channels*height*width) to match.
PERMISSIVE = 1;
}
// The multiplier on the global learning rate for this parameter.
optional float lr_mult = 3 [ default = 1.0 ];
// The multiplier on the global weight decay for this parameter.
optional float decay_mult = 4 [ default = 1.0 ];
}
// NOTE
// Update the next available ID when you add a new LayerParameter field.
//
// LayerParameter next available layer-specific ID: 147 (last added:
// recurrent_param)
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The train / test phase for computation.
optional Phase phase = 10;
// The amount of weight to assign each top blob in the objective.
// Each layer assigns a default value, usually of either 0 or 1,
// to each top blob.
repeated float loss_weight = 5;
// Specifies training parameters (multipliers on global learning constants,
// and the name and other settings used for weight sharing).
repeated ParamSpec param = 6;
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
// Specifies whether to backpropagate to each bottom. If unspecified,
// Caffe will automatically infer whether each input needs backpropagation
// to compute parameter gradients. If set to true for some inputs,
// backpropagation to those inputs is forced; if set false for some inputs,
// backpropagation to those inputs is skipped.
//
// The size must be either 0 or equal to the number of bottoms.
repeated bool propagate_down = 11;
// Rules controlling whether and when a layer is included in the network,
// based on the current NetState. You may specify a non-zero number of rules
// to include OR exclude, but not both. If no include or exclude rules are
// specified, the layer is always included. If the current NetState meets
// ANY (i.e., one or more) of the specified rules, the layer is
// included/excluded.
repeated NetStateRule include = 8;
repeated NetStateRule exclude = 9;
// Parameters for data pre-processing.
optional TransformationParameter transform_param = 100;
// Parameters shared by loss layers.
optional LossParameter loss_param = 101;
// Layer type-specific parameters.
//
// Note: certain layers may have more than one computational engine
// for their implementation. These layers include an Engine type and
// engine parameter for selecting the implementation.
// The default for the engine is set by the ENGINE switch at compile-time.
optional AccuracyParameter accuracy_param = 102;
optional ArgMaxParameter argmax_param = 103;
optional BatchNormParameter batch_norm_param = 139;
optional BiasParameter bias_param = 141;
optional ConcatParameter concat_param = 104;
optional ContrastiveLossParameter contrastive_loss_param = 105;
optional ConvolutionParameter convolution_param = 106;
optional CropParameter crop_param = 144;
optional DataParameter data_param = 107;
optional DropoutParameter dropout_param = 108;
optional DummyDataParameter dummy_data_param = 109;
optional EltwiseParameter eltwise_param = 110;
optional ELUParameter elu_param = 140;
optional EmbedParameter embed_param = 137;
optional ExpParameter exp_param = 111;
optional FlattenParameter flatten_param = 135;
optional HDF5DataParameter hdf5_data_param = 112;
optional HDF5OutputParameter hdf5_output_param = 113;
optional HingeLossParameter hinge_loss_param = 114;
optional ImageDataParameter image_data_param = 115;
optional InfogainLossParameter infogain_loss_param = 116;
optional InnerProductParameter inner_product_param = 117;
optional InputParameter input_param = 143;
optional LogParameter log_param = 134;
optional LRNParameter lrn_param = 118;
optional MemoryDataParameter memory_data_param = 119;
optional MVNParameter mvn_param = 120;
optional ParameterParameter parameter_param = 145;
optional PoolingParameter pooling_param = 121;
optional PowerParameter power_param = 122;
optional PReLUParameter prelu_param = 131;
optional PythonParameter python_param = 130;
optional RecurrentParameter recurrent_param = 146;
optional ReductionParameter reduction_param = 136;
optional ReLUParameter relu_param = 123;
optional ReshapeParameter reshape_param = 133;
optional ScaleParameter scale_param = 142;
optional SigmoidParameter sigmoid_param = 124;
optional SoftmaxParameter softmax_param = 125;
optional SPPParameter spp_param = 132;
optional SliceParameter slice_param = 126;
optional TanHParameter tanh_param = 127;
optional ThresholdParameter threshold_param = 128;
optional TileParameter tile_param = 138;
optional WindowDataParameter window_data_param = 129;
}
// Message that stores parameters used to apply transformation
// to the data layer's data
message TransformationParameter {
// For data pre-processing, we can do simple scaling and subtracting the
// data mean, if provided. Note that the mean subtraction is always carried
// out before scaling.
optional float scale = 1 [ default = 1 ];
// Specify if we want to randomly mirror data.
optional bool mirror = 2 [ default = false ];
// Specify if we would like to randomly crop an image.
optional uint32 crop_size = 3 [ default = 0 ];
// mean_file and mean_value cannot be specified at the same time
optional string mean_file = 4;
// if specified can be repeated once (would subtract it from all the channels)
// or can be repeated the same number of times as channels
// (would subtract them from the corresponding channel)
repeated float mean_value = 5;
// Force the decoded image to have 3 color channels.
optional bool force_color = 6 [ default = false ];
// Force the decoded image to have 1 color channels.
optional bool force_gray = 7 [ default = false ];
}
// Message that stores parameters shared by loss layers
message LossParameter {
// If specified, ignore instances with the given label.
optional int32 ignore_label = 1;
// How to normalize the loss for loss layers that aggregate across batches,
// spatial dimensions, or other dimensions. Currently only implemented in
// SoftmaxWithLoss and SigmoidCrossEntropyLoss layers.
enum NormalizationMode {
// Divide by the number of examples in the batch times spatial dimensions.
// Outputs that receive the ignore label will NOT be ignored in computing
// the normalization factor.
FULL = 0;
// Divide by the total number of output locations that do not take the
// ignore_label. If ignore_label is not set, this behaves like FULL.
VALID = 1;
// Divide by the batch size.
BATCH_SIZE = 2;
// Do not normalize the loss.
NONE = 3;
}
// For historical reasons, the default normalization for
// SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.
optional NormalizationMode normalization = 3 [ default = VALID ];
// Deprecated. Ignored if normalization is specified. If normalization
// is not specified, then setting this to false will be equivalent to
// normalization = BATCH_SIZE to be consistent with previous behavior.
optional bool normalize = 2;
}
// Messages that store parameters used by individual layer types follow, in
// alphabetical order.
message AccuracyParameter {
// When computing accuracy, count as correct by comparing the true label to
// the top k scoring classes. By default, only compare to the top scoring
// class (i.e. argmax).
optional uint32 top_k = 1 [ default = 1 ];
// The "label" axis of the prediction blob, whose argmax corresponds to the
// predicted label -- may be negative to index from the end (e.g., -1 for the
// last axis). For example, if axis == 1 and the predictions are
// (N x C x H x W), the label blob is expected to contain N*H*W ground truth
// labels with integer values in {0, 1, ..., C-1}.
optional int32 axis = 2 [ default = 1 ];
// If specified, ignore instances with the given label.
optional int32 ignore_label = 3;
}
message ArgMaxParameter {
// If true produce pairs (argmax, maxval)
optional bool out_max_val = 1 [ default = false ];
optional uint32 top_k = 2 [ default = 1 ];
// The axis along which to maximise -- may be negative to index from the
// end (e.g., -1 for the last axis).
// By default ArgMaxLayer maximizes over the flattened trailing dimensions
// for each index of the first / num dimension.
optional int32 axis = 3;
}
message ConcatParameter {
// The axis along which to concatenate -- may be negative to index from the
// end (e.g., -1 for the last axis). Other axes must have the
// same dimension for all the bottom blobs.
// By default, ConcatLayer concatenates blobs along the "channels" axis (1).
optional int32 axis = 2 [ default = 1 ];
// DEPRECATED: alias for "axis" -- does not support negative indexing.
optional uint32 concat_dim = 1 [ default = 1 ];
}
message BatchNormParameter {
// If false, normalization is performed over the current mini-batch
// and global statistics are accumulated (but not yet used) by a moving
// average.
// If true, those accumulated mean and variance values are used for the
// normalization.
// By default, it is set to false when the network is in the training
// phase and true when the network is in the testing phase.
optional bool use_global_stats = 1;
// What fraction of the moving average remains each iteration?
// Smaller values make the moving average decay faster, giving more
// weight to the recent values.
// Each iteration updates the moving average @f$S_{t-1}@f$ with the
// current mean @f$ Y_t @f$ by
// @f$ S_t = (1-\beta)Y_t + \beta \cdot S_{t-1} @f$, where @f$ \beta @f$
// is the moving_average_fraction parameter.
optional float moving_average_fraction = 2 [ default = .999 ];
// Small value to add to the variance estimate so that we don't divide by
// zero.
optional float eps = 3 [ default = 1e-5 ];
}
message BiasParameter {
// The first axis of bottom[0] (the first input Blob) along which to apply
// bottom[1] (the second input Blob). May be negative to index from the end
// (e.g., -1 for the last axis).
//
// For example, if bottom[0] is 4D with shape 100x3x40x60, the output
// top[0] will have the same shape, and bottom[1] may have any of the
// following shapes (for the given value of axis):
// (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
// (axis == 1 == -3) 3; 3x40; 3x40x60
// (axis == 2 == -2) 40; 40x60
// (axis == 3 == -1) 60
// Furthermore, bottom[1] may have the empty shape (regardless of the value of
// "axis") -- a scalar bias.
optional int32 axis = 1 [ default = 1 ];
// (num_axes is ignored unless just one bottom is given and the bias is
// a learned parameter of the layer. Otherwise, num_axes is determined by the
// number of axes by the second bottom.)
// The number of axes of the input (bottom[0]) covered by the bias
// parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
// Set num_axes := 0, to add a zero-axis Blob: a scalar.
optional int32 num_axes = 2 [ default = 1 ];
// (filler is ignored unless just one bottom is given and the bias is
// a learned parameter of the layer.)
// The initialization for the learned bias parameter.
// Default is the zero (0) initialization, resulting in the BiasLayer
// initially performing the identity operation.
optional FillerParameter filler = 3;
}
message ContrastiveLossParameter {
// margin for dissimilar pair
optional float margin = 1 [ default = 1.0 ];
// The first implementation of this cost did not exactly match the cost of
// Hadsell et al 2006 -- using (margin - d^2) instead of (margin - d)^2.
// legacy_version = false (the default) uses (margin - d)^2 as proposed in the
// Hadsell paper. New models should probably use this version.
// legacy_version = true uses (margin - d^2). This is kept to support /
// reproduce existing models and results
optional bool legacy_version = 2 [ default = false ];
}
message ConvolutionParameter {
optional uint32 num_output = 1; // The number of outputs for the layer
optional bool bias_term = 2 [ default = true ]; // whether to have bias terms
// Pad, kernel size, and stride are all given as a single value for equal
// dimensions in all spatial dimensions, or once per spatial dimension.
repeated uint32 pad = 3; // The padding size; defaults to 0
repeated uint32 kernel_size = 4; // The kernel size
repeated uint32 stride = 6; // The stride; defaults to 1
// Factor used to dilate the kernel, (implicitly) zero-filling the resulting
// holes. (Kernel dilation is sometimes referred to by its use in the
// algorithme à trous from Holschneider et al. 1987.)
repeated uint32 dilation = 18; // The dilation; defaults to 1
// For 2D convolution only, the *_h and *_w versions may also be used to
// specify both spatial dimensions.
optional uint32 pad_h = 9 [ default = 0 ]; // The padding height (2D only)
optional uint32 pad_w = 10 [ default = 0 ]; // The padding width (2D only)
optional uint32 kernel_h = 11; // The kernel height (2D only)
optional uint32 kernel_w = 12; // The kernel width (2D only)
optional uint32 stride_h = 13; // The stride height (2D only)
optional uint32 stride_w = 14; // The stride width (2D only)
optional uint32 group = 5 [ default = 1 ]; // The group size for group conv
optional FillerParameter weight_filler = 7; // The filler for the weight
optional FillerParameter bias_filler = 8; // The filler for the bias
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 15 [ default = DEFAULT ];
// The axis to interpret as "channels" when performing convolution.
// Preceding dimensions are treated as independent inputs;
// succeeding dimensions are treated as "spatial".
// With (N, C, H, W) inputs, and axis == 1 (the default), we perform
// N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
// groups g>1) filters across the spatial axes (H, W) of the input.
// With (N, C, D, H, W) inputs, and axis == 1, we perform
// N independent 3D convolutions, sliding (C/g)-channels
// filters across the spatial axes (D, H, W) of the input.
optional int32 axis = 16 [ default = 1 ];
// Whether to force use of the general ND convolution, even if a specific
// implementation for blobs of the appropriate number of spatial dimensions
// is available. (Currently, there is only a 2D-specific convolution
// implementation; for input blobs with num_axes != 2, this option is
// ignored and the ND implementation will be used.)
optional bool force_nd_im2col = 17 [ default = false ];
}
message CropParameter {
// To crop, elements of the first bottom are selected to fit the dimensions
// of the second, reference bottom. The crop is configured by
// - the crop `axis` to pick the dimensions for cropping
// - the crop `offset` to set the shift for all/each dimension
// to align the cropped bottom with the reference bottom.
// All dimensions up to but excluding `axis` are preserved, while
// the dimensions including and trailing `axis` are cropped.
// If only one `offset` is set, then all dimensions are offset by this amount.
// Otherwise, the number of offsets must equal the number of cropped axes to
// shift the crop in each dimension accordingly.
// Note: standard dimensions are N,C,H,W so the default is a spatial crop,
// and `axis` may be negative to index from the end (e.g., -1 for the last
// axis).
optional int32 axis = 1 [ default = 2 ];
repeated uint32 offset = 2;
}
message DataParameter {
enum DB {
LEVELDB = 0;
LMDB = 1;
}
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 4;
// The rand_skip variable is for the data layer to skip a few data points
// to avoid all asynchronous sgd clients to start at the same point. The skip
// point would be set as rand_skip * rand(0,1). Note that rand_skip should not
// be larger than the number of keys in the database.
// DEPRECATED. Each solver accesses a different subset of the database.
optional uint32 rand_skip = 7 [ default = 0 ];
optional DB backend = 8 [ default = LEVELDB ];
// DEPRECATED. See TransformationParameter. For data pre-processing, we can do
// simple scaling and subtracting the data mean, if provided. Note that the
// mean subtraction is always carried out before scaling.
optional float scale = 2 [ default = 1 ];
optional string mean_file = 3;
// DEPRECATED. See TransformationParameter. Specify if we would like to
// randomly
// crop an image.
optional uint32 crop_size = 5 [ default = 0 ];
// DEPRECATED. See TransformationParameter. Specify if we want to randomly
// mirror
// data.
optional bool mirror = 6 [ default = false ];
// Force the encoded image to have 3 color channels
optional bool force_encoded_color = 9 [ default = false ];
// Prefetch queue (Increase if data feeding bandwidth varies, within the
// limit of device memory for GPU training)
optional uint32 prefetch = 10 [ default = 4 ];
}
message DropoutParameter {
optional float dropout_ratio = 1 [ default = 0.5 ]; // dropout ratio
}
// DummyDataLayer fills any number of arbitrarily shaped blobs with random
// (or constant) data generated by "Fillers" (see "message FillerParameter").
message DummyDataParameter {
// This layer produces N >= 1 top blobs. DummyDataParameter must specify 1 or
// N
// shape fields, and 0, 1 or N data_fillers.
//
// If 0 data_fillers are specified, ConstantFiller with a value of 0 is used.
// If 1 data_filler is specified, it is applied to all top blobs. If N are
// specified, the ith is applied to the ith top blob.
repeated FillerParameter data_filler = 1;
repeated BlobShape shape = 6;
// 4D dimensions -- deprecated. Use "shape" instead.
repeated uint32 num = 2;
repeated uint32 channels = 3;
repeated uint32 height = 4;
repeated uint32 width = 5;
}
message EltwiseParameter {
enum EltwiseOp {
PROD = 0;
SUM = 1;
MAX = 2;
}
optional EltwiseOp operation = 1 [ default = SUM ]; // element-wise operation
repeated float coeff = 2; // blob-wise coefficient for SUM operation
// Whether to use an asymptotically slower (for >2 inputs) but stabler method
// of computing the gradient for the PROD operation. (No effect for SUM op.)
optional bool stable_prod_grad = 3 [ default = true ];
}
// Message that stores parameters used by ELULayer
message ELUParameter {
// Described in:
// Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2015). Fast and Accurate
// Deep Network Learning by Exponential Linear Units (ELUs). arXiv
optional float alpha = 1 [ default = 1 ];
}
// Message that stores parameters used by EmbedLayer
message EmbedParameter {
optional uint32 num_output = 1; // The number of outputs for the layer
// The input is given as integers to be interpreted as one-hot
// vector indices with dimension num_input. Hence num_input should be
// 1 greater than the maximum possible input value.
optional uint32 input_dim = 2;
optional bool bias_term = 3 [ default = true ]; // Whether to use a bias term
optional FillerParameter weight_filler = 4; // The filler for the weight
optional FillerParameter bias_filler = 5; // The filler for the bias
}
// Message that stores parameters used by ExpLayer
message ExpParameter {
// ExpLayer computes outputs y = base ^ (shift + scale * x), for base > 0.
// Or if base is set to the default (-1), base is set to e,
// so y = exp(shift + scale * x).
optional float base = 1 [ default = -1.0 ];
optional float scale = 2 [ default = 1.0 ];
optional float shift = 3 [ default = 0.0 ];
}
/// Message that stores parameters used by FlattenLayer
message FlattenParameter {
// The first axis to flatten: all preceding axes are retained in the output.
// May be negative to index from the end (e.g., -1 for the last axis).
optional int32 axis = 1 [ default = 1 ];
// The last axis to flatten: all following axes are retained in the output.
// May be negative to index from the end (e.g., the default -1 for the last
// axis).
optional int32 end_axis = 2 [ default = -1 ];
}
// Message that stores parameters used by HDF5DataLayer
message HDF5DataParameter {
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 2;
// Specify whether to shuffle the data.
// If shuffle == true, the ordering of the HDF5 files is shuffled,
// and the ordering of data within any given HDF5 file is shuffled,
// but data between different files are not interleaved; all of a file's
// data are output (in a random order) before moving onto another file.
optional bool shuffle = 3 [ default = false ];
}
message HDF5OutputParameter { optional string file_name = 1; }
message HingeLossParameter {
enum Norm {
L1 = 1;
L2 = 2;
}
// Specify the Norm to use L1 or L2
optional Norm norm = 1 [ default = L1 ];
}
message ImageDataParameter {
// Specify the data source.
optional string source = 1;
// Specify the batch size.
optional uint32 batch_size = 4 [ default = 1 ];
// The rand_skip variable is for the data layer to skip a few data points
// to avoid all asynchronous sgd clients to start at the same point. The skip
// point would be set as rand_skip * rand(0,1). Note that rand_skip should not
// be larger than the number of keys in the database.
optional uint32 rand_skip = 7 [ default = 0 ];
// Whether or not ImageLayer should shuffle the list of files at every epoch.
optional bool shuffle = 8 [ default = false ];
// It will also resize images if new_height or new_width are not zero.
optional uint32 new_height = 9 [ default = 0 ];
optional uint32 new_width = 10 [ default = 0 ];
// Specify if the images are color or gray
optional bool is_color = 11 [ default = true ];
// DEPRECATED. See TransformationParameter. For data pre-processing, we can do
// simple scaling and subtracting the data mean, if provided. Note that the
// mean subtraction is always carried out before scaling.
optional float scale = 2 [ default = 1 ];
optional string mean_file = 3;
// DEPRECATED. See TransformationParameter. Specify if we would like to
// randomly
// crop an image.
optional uint32 crop_size = 5 [ default = 0 ];
// DEPRECATED. See TransformationParameter. Specify if we want to randomly
// mirror
// data.
optional bool mirror = 6 [ default = false ];
optional string root_folder = 12 [ default = "" ];
}
message InfogainLossParameter {
// Specify the infogain matrix source.
optional string source = 1;
optional int32 axis = 2 [ default = 1 ]; // axis of prob
}
message InnerProductParameter {
optional uint32 num_output = 1; // The number of outputs for the layer
optional bool bias_term = 2 [ default = true ]; // whether to have bias terms
optional FillerParameter weight_filler = 3; // The filler for the weight
optional FillerParameter bias_filler = 4; // The filler for the bias
// The first axis to be lumped into a single inner product computation;
// all preceding axes are retained in the output.
// May be negative to index from the end (e.g., -1 for the last axis).
optional int32 axis = 5 [ default = 1 ];
// Specify whether to transpose the weight matrix or not.
// If transpose == true, any operations will be performed on the transpose
// of the weight matrix. The weight matrix itself is not going to be
// transposed
// but rather the transfer flag of operations will be toggled accordingly.
optional bool transpose = 6 [ default = false ];
}
message InputParameter {
// This layer produces N >= 1 top blob(s) to be assigned manually.
// Define N shapes to set a shape for each top.
// Define 1 shape to set the same shape for every top.
// Define no shape to defer to reshaping manually.
repeated BlobShape shape = 1;
}
// Message that stores parameters used by LogLayer
message LogParameter {
// LogLayer computes outputs y = log_base(shift + scale * x), for base > 0.
// Or if base is set to the default (-1), base is set to e,
// so y = ln(shift + scale * x) = log_e(shift + scale * x)
optional float base = 1 [ default = -1.0 ];
optional float scale = 2 [ default = 1.0 ];
optional float shift = 3 [ default = 0.0 ];
}
// Message that stores parameters used by LRNLayer
message LRNParameter {
optional uint32 local_size = 1 [ default = 5 ];
optional float alpha = 2 [ default = 1. ];
optional float beta = 3 [ default = 0.75 ];
enum NormRegion {
ACROSS_CHANNELS = 0;
WITHIN_CHANNEL = 1;
}
optional NormRegion norm_region = 4 [ default = ACROSS_CHANNELS ];
optional float k = 5 [ default = 1. ];
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 6 [ default = DEFAULT ];
}
message MemoryDataParameter {
optional uint32 batch_size = 1;
optional uint32 channels = 2;
optional uint32 height = 3;
optional uint32 width = 4;
}
message MVNParameter {
// This parameter can be set to false to normalize mean only
optional bool normalize_variance = 1 [ default = true ];
// This parameter can be set to true to perform DNN-like MVN
optional bool across_channels = 2 [ default = false ];
// Epsilon for not dividing by zero while normalizing variance
optional float eps = 3 [ default = 1e-9 ];
}
message ParameterParameter { optional BlobShape shape = 1; }
message PoolingParameter {
enum PoolMethod {
MAX = 0;
AVE = 1;
STOCHASTIC = 2;
}
optional PoolMethod pool = 1 [ default = MAX ]; // The pooling method
// Pad, kernel size, and stride are all given as a single value for equal
// dimensions in height and width or as Y, X pairs.
optional uint32 pad = 4 [ default = 0 ]; // The padding size (equal in Y, X)
optional uint32 pad_h = 9 [ default = 0 ]; // The padding height
optional uint32 pad_w = 10 [ default = 0 ]; // The padding width
optional uint32 kernel_size = 2; // The kernel size (square)
optional uint32 kernel_h = 5; // The kernel height
optional uint32 kernel_w = 6; // The kernel width
optional uint32 stride = 3 [ default = 1 ]; // The stride (equal in Y, X)
optional uint32 stride_h = 7; // The stride height
optional uint32 stride_w = 8; // The stride width
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 11 [ default = DEFAULT ];
// If global_pooling then it will pool over the size of the bottom by doing
// kernel_h = bottom->height and kernel_w = bottom->width
optional bool global_pooling = 12 [ default = false ];
}
message PowerParameter {
// PowerLayer computes outputs y = (shift + scale * x) ^ power.
optional float power = 1 [ default = 1.0 ];
optional float scale = 2 [ default = 1.0 ];
optional float shift = 3 [ default = 0.0 ];
}
message PythonParameter {
optional string module = 1;
optional string layer = 2;
// This value is set to the attribute `param_str` of the `PythonLayer` object
// in Python before calling the `setup()` method. This could be a number,
// string, dictionary in Python dict format, JSON, etc. You may parse this
// string in `setup` method and use it in `forward` and `backward`.
optional string param_str = 3 [ default = ''];
// DEPRECATED
optional bool share_in_parallel = 4 [ default = false ];
}
// Message that stores parameters used by RecurrentLayer
message RecurrentParameter {
// The dimension of the output (and usually hidden state) representation --
// must be explicitly set to non-zero.
optional uint32 num_output = 1 [ default = 0 ];
optional FillerParameter weight_filler = 2; // The filler for the weight
optional FillerParameter bias_filler = 3; // The filler for the bias
// Whether to enable displaying debug_info in the unrolled recurrent net.
optional bool debug_info = 4 [ default = false ];
// Whether to add as additional inputs (bottoms) the initial hidden state
// blobs, and add as additional outputs (tops) the final timestep hidden state
// blobs. The number of additional bottom/top blobs required depends on the
// recurrent architecture -- e.g., 1 for RNNs, 2 for LSTMs.
optional bool expose_hidden = 5 [ default = false ];
}
// Message that stores parameters used by ReductionLayer
message ReductionParameter {
enum ReductionOp {
SUM = 1;
ASUM = 2;
SUMSQ = 3;
MEAN = 4;
}
optional ReductionOp operation = 1 [ default = SUM ]; // reduction operation
// The first axis to reduce to a scalar -- may be negative to index from the
// end (e.g., -1 for the last axis).
// (Currently, only reduction along ALL "tail" axes is supported; reduction
// of axis M through N, where N < num_axes - 1, is unsupported.)
// Suppose we have an n-axis bottom Blob with shape:
// (d0, d1, d2, ..., d(m-1), dm, d(m+1), ..., d(n-1)).
// If axis == m, the output Blob will have shape
// (d0, d1, d2, ..., d(m-1)),
// and the ReductionOp operation is performed (d0 * d1 * d2 * ... * d(m-1))
// times, each including (dm * d(m+1) * ... * d(n-1)) individual data.
// If axis == 0 (the default), the output Blob always has the empty shape
// (count 1), performing reduction across the entire input --
// often useful for creating new loss functions.
optional int32 axis = 2 [ default = 0 ];
optional float coeff = 3 [ default = 1.0 ]; // coefficient for output
}
// Message that stores parameters used by ReLULayer
message ReLUParameter {
// Allow non-zero slope for negative inputs to speed up optimization
// Described in:
// Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities
// improve neural network acoustic models. In ICML Workshop on Deep Learning
// for Audio, Speech, and Language Processing.
optional float negative_slope = 1 [ default = 0 ];
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 2 [ default = DEFAULT ];
}
message ReshapeParameter {
// Specify the output dimensions. If some of the dimensions are set to 0,
// the corresponding dimension from the bottom layer is used (unchanged).
// Exactly one dimension may be set to -1, in which case its value is
// inferred from the count of the bottom blob and the remaining dimensions.
// For example, suppose we want to reshape a 2D blob "input" with shape 2 x 8:
//
// layer {
// type: "Reshape" bottom: "input" top: "output"
// reshape_param { ... }
// }
//
// If "input" is 2D with shape 2 x 8, then the following reshape_param
// specifications are all equivalent, producing a 3D blob "output" with shape
// 2 x 2 x 4:
//
// reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 0 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 0 dim: 2 dim: -1 } }
// reshape_param { shape { dim: 0 dim:-1 dim: 4 } }
//
optional BlobShape shape = 1;
// axis and num_axes control the portion of the bottom blob's shape that are
// replaced by (included in) the reshape. By default (axis == 0 and
// num_axes == -1), the entire bottom blob shape is included in the reshape,
// and hence the shape field must specify the entire output shape.
//
// axis may be non-zero to retain some portion of the beginning of the input
// shape (and may be negative to index from the end; e.g., -1 to begin the
// reshape after the last axis, including nothing in the reshape,
// -2 to include only the last axis, etc.).
//
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// Then the following ReshapeLayer specifications are all equivalent,
// producing a blob "output" with shape 2 x 2 x 4:
//
// reshape_param { shape { dim: 2 dim: 2 dim: 4 } }
// reshape_param { shape { dim: 2 dim: 4 } axis: 1 }
// reshape_param { shape { dim: 2 dim: 4 } axis: -3 }
//
// num_axes specifies the extent of the reshape.
// If num_axes >= 0 (and axis >= 0), the reshape will be performed only on
// input axes in the range [axis, axis+num_axes].
// num_axes may also be -1, the default, to include all remaining axes
// (starting from axis).
//
// For example, suppose "input" is a 2D blob with shape 2 x 8.
// Then the following ReshapeLayer specifications are equivalent,
// producing a blob "output" with shape 1 x 2 x 8.
//
// reshape_param { shape { dim: 1 dim: 2 dim: 8 } }
// reshape_param { shape { dim: 1 dim: 2 } num_axes: 1 }
// reshape_param { shape { dim: 1 } num_axes: 0 }
//
// On the other hand, these would produce output blob shape 2 x 1 x 8:
//
// reshape_param { shape { dim: 2 dim: 1 dim: 8 } }
// reshape_param { shape { dim: 1 } axis: 1 num_axes: 0 }
//
optional int32 axis = 2 [ default = 0 ];
optional int32 num_axes = 3 [ default = -1 ];
}
message ScaleParameter {
// The first axis of bottom[0] (the first input Blob) along which to apply
// bottom[1] (the second input Blob). May be negative to index from the end
// (e.g., -1 for the last axis).
//
// For example, if bottom[0] is 4D with shape 100x3x40x60, the output
// top[0] will have the same shape, and bottom[1] may have any of the
// following shapes (for the given value of axis):
// (axis == 0 == -4) 100; 100x3; 100x3x40; 100x3x40x60
// (axis == 1 == -3) 3; 3x40; 3x40x60
// (axis == 2 == -2) 40; 40x60
// (axis == 3 == -1) 60
// Furthermore, bottom[1] may have the empty shape (regardless of the value of
// "axis") -- a scalar multiplier.
optional int32 axis = 1 [ default = 1 ];
// (num_axes is ignored unless just one bottom is given and the scale is
// a learned parameter of the layer. Otherwise, num_axes is determined by the
// number of axes by the second bottom.)
// The number of axes of the input (bottom[0]) covered by the scale
// parameter, or -1 to cover all axes of bottom[0] starting from `axis`.
// Set num_axes := 0, to multiply with a zero-axis Blob: a scalar.
optional int32 num_axes = 2 [ default = 1 ];
// (filler is ignored unless just one bottom is given and the scale is
// a learned parameter of the layer.)
// The initialization for the learned scale parameter.
// Default is the unit (1) initialization, resulting in the ScaleLayer
// initially performing the identity operation.
optional FillerParameter filler = 3;
// Whether to also learn a bias (equivalent to a ScaleLayer+BiasLayer, but
// may be more efficient). Initialized with bias_filler (defaults to 0).
optional bool bias_term = 4 [ default = false ];
optional FillerParameter bias_filler = 5;
}
message SigmoidParameter {
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 1 [ default = DEFAULT ];
}
message SliceParameter {
// The axis along which to slice -- may be negative to index from the end
// (e.g., -1 for the last axis).
// By default, SliceLayer concatenates blobs along the "channels" axis (1).
optional int32 axis = 3 [ default = 1 ];
repeated uint32 slice_point = 2;
// DEPRECATED: alias for "axis" -- does not support negative indexing.
optional uint32 slice_dim = 1 [ default = 1 ];
}
// Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer
message SoftmaxParameter {
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 1 [ default = DEFAULT ];
// The axis along which to perform the softmax -- may be negative to index
// from the end (e.g., -1 for the last axis).
// Any other axes will be evaluated as independent softmaxes.
optional int32 axis = 2 [ default = 1 ];
}
message TanHParameter {
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 1 [ default = DEFAULT ];
}
// Message that stores parameters used by TileLayer
message TileParameter {
// The index of the axis to tile.
optional int32 axis = 1 [ default = 1 ];
// The number of copies (tiles) of the blob to output.
optional int32 tiles = 2;
}
// Message that stores parameters used by ThresholdLayer
message ThresholdParameter {
optional float threshold = 1 [ default = 0 ]; // Strictly positive values
}
message WindowDataParameter {
// Specify the data source.
optional string source = 1;
// For data pre-processing, we can do simple scaling and subtracting the
// data mean, if provided. Note that the mean subtraction is always carried
// out before scaling.
optional float scale = 2 [ default = 1 ];
optional string mean_file = 3;
// Specify the batch size.
optional uint32 batch_size = 4;
// Specify if we would like to randomly crop an image.
optional uint32 crop_size = 5 [ default = 0 ];
// Specify if we want to randomly mirror data.
optional bool mirror = 6 [ default = false ];
// Foreground (object) overlap threshold
optional float fg_threshold = 7 [ default = 0.5 ];
// Background (non-object) overlap threshold
optional float bg_threshold = 8 [ default = 0.5 ];
// Fraction of batch that should be foreground objects
optional float fg_fraction = 9 [ default = 0.25 ];
// Amount of contextual padding to add around a window
// (used only by the window_data_layer)
optional uint32 context_pad = 10 [ default = 0 ];
// Mode for cropping out a detection window
// warp: cropped window is warped to a fixed size and aspect ratio
// square: the tightest square around the window is cropped
optional string crop_mode = 11 [ default = "warp" ];
// cache_images: will load all images in memory for faster access
optional bool cache_images = 12 [ default = false ];
// append root_folder to locate images
optional string root_folder = 13 [ default = "" ];
}
message SPPParameter {
enum PoolMethod {
MAX = 0;
AVE = 1;
STOCHASTIC = 2;
}
optional uint32 pyramid_height = 1;
optional PoolMethod pool = 2 [ default = MAX ]; // The pooling method
enum Engine {
DEFAULT = 0;
CAFFE = 1;
CUDNN = 2;
}
optional Engine engine = 6 [ default = DEFAULT ];
}
// DEPRECATED: use LayerParameter.
message V1LayerParameter {
repeated string bottom = 2;
repeated string top = 3;
optional string name = 4;
repeated NetStateRule include = 32;
repeated NetStateRule exclude = 33;
enum LayerType {
NONE = 0;
ABSVAL = 35;
ACCURACY = 1;
ARGMAX = 30;
BNLL = 2;
CONCAT = 3;
CONTRASTIVE_LOSS = 37;
CONVOLUTION = 4;
DATA = 5;
DECONVOLUTION = 39;
DROPOUT = 6;
DUMMY_DATA = 32;
EUCLIDEAN_LOSS = 7;
ELTWISE = 25;
EXP = 38;
FLATTEN = 8;
HDF5_DATA = 9;
HDF5_OUTPUT = 10;
HINGE_LOSS = 28;
IM2COL = 11;
IMAGE_DATA = 12;
INFOGAIN_LOSS = 13;
INNER_PRODUCT = 14;
LRN = 15;
MEMORY_DATA = 29;
MULTINOMIAL_LOGISTIC_LOSS = 16;
MVN = 34;
POOLING = 17;
POWER = 26;
RELU = 18;
SIGMOID = 19;
SIGMOID_CROSS_ENTROPY_LOSS = 27;
SILENCE = 36;
SOFTMAX = 20;
SOFTMAX_LOSS = 21;
SPLIT = 22;
SLICE = 33;
TANH = 23;
WINDOW_DATA = 24;
THRESHOLD = 31;
}
optional LayerType type = 5;
repeated BlobProto blobs = 6;
repeated string param = 1001;
repeated DimCheckMode blob_share_mode = 1002;
enum DimCheckMode {
STRICT = 0;
PERMISSIVE = 1;
}
repeated float blobs_lr = 7;
repeated float weight_decay = 8;
repeated float loss_weight = 35;
optional AccuracyParameter accuracy_param = 27;
optional ArgMaxParameter argmax_param = 23;
optional ConcatParameter concat_param = 9;
optional ContrastiveLossParameter contrastive_loss_param = 40;
optional ConvolutionParameter convolution_param = 10;
optional DataParameter data_param = 11;
optional DropoutParameter dropout_param = 12;
optional DummyDataParameter dummy_data_param = 26;
optional EltwiseParameter eltwise_param = 24;
optional ExpParameter exp_param = 41;
optional HDF5DataParameter hdf5_data_param = 13;
optional HDF5OutputParameter hdf5_output_param = 14;
optional HingeLossParameter hinge_loss_param = 29;
optional ImageDataParameter image_data_param = 15;
optional InfogainLossParameter infogain_loss_param = 16;
optional InnerProductParameter inner_product_param = 17;
optional LRNParameter lrn_param = 18;
optional MemoryDataParameter memory_data_param = 22;
optional MVNParameter mvn_param = 34;
optional PoolingParameter pooling_param = 19;
optional PowerParameter power_param = 21;
optional ReLUParameter relu_param = 30;
optional SigmoidParameter sigmoid_param = 38;
optional SoftmaxParameter softmax_param = 39;
optional SliceParameter slice_param = 31;
optional TanHParameter tanh_param = 37;
optional ThresholdParameter threshold_param = 25;
optional WindowDataParameter window_data_param = 20;
optional TransformationParameter transform_param = 36;
optional LossParameter loss_param = 42;
optional V0LayerParameter layer = 1;
}
// DEPRECATED: V0LayerParameter is the old way of specifying layer parameters
// in Caffe. We keep this message type around for legacy support.
message V0LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the string to specify the layer type
// Parameters to specify layers with inner products.
optional uint32 num_output = 3; // The number of outputs for the layer
optional bool biasterm = 4 [ default = true ]; // whether to have bias terms
optional FillerParameter weight_filler = 5; // The filler for the weight
optional FillerParameter bias_filler = 6; // The filler for the bias
optional uint32 pad = 7 [ default = 0 ]; // The padding size
optional uint32 kernelsize = 8; // The kernel size
optional uint32 group = 9 [ default = 1 ]; // The group size for group conv
optional uint32 stride = 10 [ default = 1 ]; // The stride
enum PoolMethod {
MAX = 0;
AVE = 1;
STOCHASTIC = 2;
}
optional PoolMethod pool = 11 [ default = MAX ]; // The pooling method
optional float dropout_ratio = 12 [ default = 0.5 ]; // dropout ratio
optional uint32 local_size = 13 [ default = 5 ]; // for local response norm
optional float alpha = 14 [ default = 1. ]; // for local response norm
optional float beta = 15 [ default = 0.75 ]; // for local response norm
optional float k = 22 [ default = 1. ];
// For data layers, specify the data source
optional string source = 16;
// For data pre-processing, we can do simple scaling and subtracting the
// data mean, if provided. Note that the mean subtraction is always carried
// out before scaling.
optional float scale = 17 [ default = 1 ];
optional string meanfile = 18;
// For data layers, specify the batch size.
optional uint32 batchsize = 19;
// For data layers, specify if we would like to randomly crop an image.
optional uint32 cropsize = 20 [ default = 0 ];
// For data layers, specify if we want to randomly mirror data.
optional bool mirror = 21 [ default = false ];
// The blobs containing the numeric parameters of the layer
repeated BlobProto blobs = 50;
// The ratio that is multiplied on the global learning rate. If you want to
// set the learning ratio for one blob, you need to set it for all blobs.
repeated float blobs_lr = 51;
// The weight decay that is multiplied on the global weight decay.
repeated float weight_decay = 52;
// The rand_skip variable is for the data layer to skip a few data points
// to avoid all asynchronous sgd clients to start at the same point. The skip
// point would be set as rand_skip * rand(0,1). Note that rand_skip should not
// be larger than the number of keys in the database.
optional uint32 rand_skip = 53 [ default = 0 ];
// Fields related to detection (det_*)
// foreground (object) overlap threshold
optional float det_fg_threshold = 54 [ default = 0.5 ];
// background (non-object) overlap threshold
optional float det_bg_threshold = 55 [ default = 0.5 ];
// Fraction of batch that should be foreground objects
optional float det_fg_fraction = 56 [ default = 0.25 ];
// optional bool OBSOLETE_can_clobber = 57 [default = true];
// Amount of contextual padding to add around a window
// (used only by the window_data_layer)
optional uint32 det_context_pad = 58 [ default = 0 ];
// Mode for cropping out a detection window
// warp: cropped window is warped to a fixed size and aspect ratio
// square: the tightest square around the window is cropped
optional string det_crop_mode = 59 [ default = "warp" ];
// For ReshapeLayer, one needs to specify the new dimensions.
optional int32 new_num = 60 [ default = 0 ];
optional int32 new_channels = 61 [ default = 0 ];
optional int32 new_height = 62 [ default = 0 ];
optional int32 new_width = 63 [ default = 0 ];
// Whether or not ImageLayer should shuffle the list of files at every epoch.
// It will also resize images if new_height or new_width are not zero.
optional bool shuffle_images = 64 [ default = false ];
// For ConcatLayer, one needs to specify the dimension for concatenation, and
// the other dimensions must be the same for all the bottom blobs.
// By default it will concatenate blobs along the channels dimension.
optional uint32 concat_dim = 65 [ default = 1 ];
optional HDF5OutputParameter hdf5_output_param = 1001;
}
message PReLUParameter {
// Parametric ReLU described in K. He et al, Delving Deep into Rectifiers:
// Surpassing Human-Level Performance on ImageNet Classification, 2015.
// Initial value of a_i. Default is a_i=0.25 for all i.
optional FillerParameter filler = 1;
// Whether or not slope parameters are shared across channels.
optional bool channel_shared = 2 [ default = false ];
}
#!/bin/bash
#function:
# script used to generate caffepb.py from caffe.proto using protoc
#
PROTOC=`which protoc`
if [[ -z $PROTOC ]];then
echo "not found protoc, you should first install it following this[https://github.com/google/protobuf/releases]"
exit 1
fi
WORK_ROOT=$(dirname `readlink -f "$BASH_SOURCE[0]"`)
PY_NAME="$WORK_ROOT/caffepb.py"
$PROTOC --proto_path=$WORK_ROOT --python_out=$WORK_ROOT $WORK_ROOT/caffe.proto
ret=$?
if [ $ret -eq 0 ];then
mv $WORK_ROOT/caffe_pb2.py $PY_NAME
fi
if [ -e "$PY_NAME" ];then
echo "succeed to generate [$PY_NAME]"
exit 0
else
echo "failed to generate [$PY_NAME]"
fi
exit $ret
import os
import paddle.v2 as paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
parameter_attr = ParamAttr(initializer=MSRA())
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
scale):
"""
"""
depthwise_conv = conv_bn_layer(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
pointwise_conv = conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
def mobile_net(img, class_dim, scale=1.0):
# conv1: 112x112
tmp = conv_bn_layer(
img,
filter_size=3,
channels=3,
num_filters=int(32 * scale),
stride=2,
padding=1)
# 56x56
tmp = depthwise_separable(
tmp,
num_filters1=32,
num_filters2=64,
num_groups=32,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=64,
num_filters2=128,
num_groups=64,
stride=2,
scale=scale)
# 28x28
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=128,
num_groups=128,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=256,
num_groups=128,
stride=2,
scale=scale)
# 14x14
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=256,
num_groups=256,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=512,
num_groups=256,
stride=2,
scale=scale)
# 14x14
for i in range(5):
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=512,
num_groups=512,
stride=1,
scale=scale)
# 7x7
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=1024,
num_groups=512,
stride=2,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=1024,
num_filters2=1024,
num_groups=1024,
stride=1,
scale=scale)
tmp = fluid.layers.pool2d(
input=tmp,
pool_size=0,
pool_stride=1,
pool_type='avg',
global_pooling=True)
tmp = fluid.layers.fc(input=tmp,
size=class_dim,
act='softmax',
param_attr=parameter_attr)
return tmp
def train(learning_rate, batch_size, num_passes, model_save_dir='model'):
class_dim = 102
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
out = mobile_net(image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label)
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(5 * 1e-5))
opts = optimizer.minimize(avg_cost)
b_size_var = fluid.layers.create_tensor(dtype='int64')
b_acc_var = fluid.layers.accuracy(input=out, label=label, total=b_size_var)
inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program):
inference_program = fluid.io.get_inference_program(
target_vars=[b_acc_var, b_size_var])
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
train_reader = paddle.batch(
paddle.dataset.flowers.train(), batch_size=batch_size)
test_reader = paddle.batch(
paddle.dataset.flowers.test(), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
train_pass_acc_evaluator = fluid.average.WeightedAverage()
test_pass_acc_evaluator = fluid.average.WeightedAverage()
for pass_id in range(num_passes):
train_pass_acc_evaluator.reset()
for batch_id, data in enumerate(train_reader()):
loss, acc, size = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost, b_acc_var, b_size_var])
train_pass_acc_evaluator.add(value=acc, weight=size)
print("Pass {0}, batch {1}, loss {2}, acc {3}".format(
pass_id, batch_id, loss[0], acc[0]))
test_pass_acc_evaluator.reset()
for data in test_reader():
loss, acc, size = exe.run(
inference_program,
feed=feeder.feed(data),
fetch_list=[avg_cost, b_acc_var, b_size_var])
test_pass_acc_evaluator.add(value=acc, weight=size)
print("End pass {0}, train_acc {1}, test_acc {2}".format(
pass_id,
train_pass_acc_evaluator.eval(), test_pass_acc_evaluator.eval()))
if pass_id % 10 == 0:
model_path = os.path.join(model_save_dir, str(pass_id))
print 'save models to %s' % (model_path)
fluid.io.save_inference_model(model_path, ['image'], [out], exe)
if __name__ == '__main__':
train(learning_rate=0.005, batch_size=40, num_passes=300)
import os import os
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
import reader import reader
...@@ -103,66 +103,87 @@ def train(learning_rate, ...@@ -103,66 +103,87 @@ def train(learning_rate,
batch_size, batch_size,
num_passes, num_passes,
init_model=None, init_model=None,
model_save_dir='model'): model_save_dir='model',
parallel=True):
class_dim = 1000 class_dim = 1000
image_shape = [3, 224, 224] image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32') image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64') label = fluid.layers.data(name='label', shape=[1], dtype='int64')
out = SE_ResNeXt(input=image, class_dim=class_dim) if parallel:
places = fluid.layers.get_places()
cost = fluid.layers.cross_entropy(input=out, label=label) pd = fluid.layers.ParallelDo(places)
avg_cost = fluid.layers.mean(x=cost)
with pd.do():
image_ = pd.read_input(image)
label_ = pd.read_input(label)
out = SE_ResNeXt(input=image_, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label_)
avg_cost = fluid.layers.mean(x=cost)
accuracy = fluid.layers.accuracy(input=out, label=label_)
pd.write_output(avg_cost)
pd.write_output(accuracy)
avg_cost, accuracy = pd()
avg_cost = fluid.layers.mean(x=avg_cost)
accuracy = fluid.layers.mean(x=accuracy)
else:
out = SE_ResNeXt(input=image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label)
avg_cost = fluid.layers.mean(x=cost)
accuracy = fluid.layers.accuracy(input=out, label=label)
optimizer = fluid.optimizer.Momentum( optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate, learning_rate=learning_rate,
momentum=0.9, momentum=0.9,
regularization=fluid.regularizer.L2Decay(1e-4)) regularization=fluid.regularizer.L2Decay(1e-4))
opts = optimizer.minimize(avg_cost) opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=out, label=label)
inference_program = fluid.default_main_program().clone() inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program): with fluid.program_guard(inference_program):
test_accuracy = fluid.evaluator.Accuracy(input=out, label=label) inference_program = fluid.io.get_inference_program([avg_cost, accuracy])
test_target = [avg_cost] + test_accuracy.metrics + test_accuracy.states
inference_program = fluid.io.get_inference_program(test_target)
place = fluid.CUDAPlace(0) place = fluid.CUDAPlace(0)
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
if init_model is not None: if init_model is not None:
fluid.io.load_persistables_if_exist(exe, init_model) fluid.io.load_persistables(exe, init_model)
train_reader = paddle.batch(reader.train(), batch_size=batch_size) train_reader = paddle.batch(reader.train(), batch_size=batch_size)
test_reader = paddle.batch(reader.test(), batch_size=batch_size) test_reader = paddle.batch(reader.test(), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label]) feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
for pass_id in range(num_passes): for pass_id in range(num_passes):
accuracy.reset(exe)
for batch_id, data in enumerate(train_reader()): for batch_id, data in enumerate(train_reader()):
loss, acc = exe.run(fluid.default_main_program(), loss = exe.run(fluid.default_main_program(),
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost] + accuracy.metrics) fetch_list=[avg_cost])
print("Pass {0}, batch {1}, loss {2}, acc {3}".format( print("Pass {0}, batch {1}, loss {2}".format(pass_id, batch_id,
pass_id, batch_id, loss[0], acc[0])) float(loss[0])))
pass_acc = accuracy.eval(exe)
total_loss = 0.0
test_accuracy.reset(exe) total_acc = 0.0
total_batch = 0
for data in test_reader(): for data in test_reader():
loss, acc = exe.run(inference_program, loss, acc = exe.run(inference_program,
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost] + test_accuracy.metrics) fetch_list=[avg_cost, accuracy])
test_pass_acc = test_accuracy.eval(exe) total_loss += float(loss)
print("End pass {0}, train_acc {1}, test_acc {2}".format( total_acc += float(acc)
pass_id, pass_acc, test_pass_acc)) total_batch += 1
print("End pass {0}, test_loss {1}, test_acc {2}".format(
pass_id, total_loss / total_batch, total_acc / total_batch))
model_path = os.path.join(model_save_dir, str(pass_id)) model_path = os.path.join(model_save_dir, str(pass_id))
if not os.path.isdir(model_path): fluid.io.save_inference_model(model_path, ['image'], [out], exe)
os.makedirs(model_path)
fluid.io.save_persistables(exe, model_path)
if __name__ == '__main__': if __name__ == '__main__':
train(learning_rate=0.1, batch_size=8, num_passes=100, init_model=None) train(
learning_rate=0.1,
batch_size=8,
num_passes=100,
init_model=None,
parallel=False)
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
This is a collection of example models for neural machine translation and neural sequence modeling.
### TODO
This project is still under active development.
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Attention is All You Need: A Paddle Fluid implementation
This is a Paddle Fluid implementation of the Transformer model in [Attention is All You Need]() (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).
If you use the dataset/code in your research, please cite the paper:
```text
@inproceedings{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
booktitle={Advances in Neural Information Processing Systems},
pages={6000--6010},
year={2017}
}
```
### TODO
This project is still under active development.
class TrainTaskConfig(object):
use_gpu = False
# the epoch number to train.
pass_num = 2
# the number of sequences contained in a mini-batch.
batch_size = 64
# the hyper parameters for Adam optimizer.
learning_rate = 0.001
beta1 = 0.9
beta2 = 0.98
eps = 1e-9
# the parameters for learning rate scheduling.
warmup_steps = 4000
# the directory for saving trained models.
model_dir = "trained_models"
class InferTaskConfig(object):
use_gpu = False
# the number of examples in one run for sequence generation.
# currently the batch size can only be set to 1.
batch_size = 1
# the parameters for beam search.
beam_size = 5
max_length = 30
# the number of decoded sentences to output.
n_best = 1
# the directory for loading the trained model.
model_path = "trained_models/pass_1.infer.model"
class ModelHyperParams(object):
# Dictionary size for source and target language. This model directly uses
# paddle.dataset.wmt16 in which <bos>, <eos> and <unk> token has
# alreay been added, but the <pad> token is not added. Transformer requires
# sequences in a mini-batch are padded to have the same length. A <pad> token is
# added into the original dictionary in paddle.dateset.wmt16.
# size of source word dictionary.
src_vocab_size = 10000
# index for <pad> token in source language.
src_pad_idx = src_vocab_size
# size of target word dictionay
trg_vocab_size = 10000
# index for <pad> token in target language.
trg_pad_idx = trg_vocab_size
# index for <bos> token
bos_idx = 0
# index for <eos> token
eos_idx = 1
# position value corresponding to the <pad> token.
pos_pad_idx = 0
# max length of sequences. It should plus 1 to include position
# padding token for position encoding.
max_length = 50
# the dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
d_model = 512
# size of the hidden layer in position-wise feed-forward networks.
d_inner_hid = 1024
# the dimension that keys are projected to for dot-product attention.
d_key = 64
# the dimension that values are projected to for dot-product attention.
d_value = 64
# number of head used in multi-head attention.
n_head = 8
# number of sub-layers to be stacked in the encoder and decoder.
n_layer = 6
# dropout rate used by all dropout layers.
dropout = 0.1
# Names of position encoding table which will be initialized externally.
pos_enc_param_names = (
"src_pos_enc_table",
"trg_pos_enc_table", )
# Names of all data layers in encoder listed in order.
encoder_input_data_names = (
"src_word",
"src_pos",
"src_slf_attn_bias", )
# Names of all data layers in decoder listed in order.
decoder_input_data_names = (
"trg_word",
"trg_pos",
"trg_slf_attn_bias",
"trg_src_attn_bias",
"enc_output", )
# Names of label related data layers listed in order.
label_data_names = (
"lbl_word",
"lbl_weight", )
import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid
import model
from model import wrap_encoder as encoder
from model import wrap_decoder as decoder
from config import InferTaskConfig, ModelHyperParams, \
encoder_input_data_names, decoder_input_data_names
from train import pad_batch_data
def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
decoder, dec_in_names, dec_out_names, beam_size, max_length,
n_best, batch_size, n_head, src_pad_idx, trg_pad_idx,
bos_idx, eos_idx):
"""
Run the encoder program once and run the decoder program multiple times to
implement beam search externally.
"""
# Prepare data for encoder and run the encoder.
enc_in_data = pad_batch_data(
src_words,
src_pad_idx,
n_head,
is_target=False,
return_pos=True,
return_attn_bias=True,
return_max_len=True)
enc_output = exe.run(encoder,
feed=dict(zip(enc_in_names, enc_in_data)),
fetch_list=enc_out_names)[0]
# Beam Search.
# To store the beam info.
scores = np.zeros((batch_size, beam_size), dtype="float32")
prev_branchs = [[]] * batch_size
next_ids = [[]] * batch_size
# Use beam_map to map the instance idx in batch to beam idx, since the
# size of feeded batch is changing.
beam_map = range(batch_size)
def beam_backtrace(prev_branchs, next_ids, n_best=beam_size, add_bos=True):
"""
Decode and select n_best sequences for one instance by backtrace.
"""
seqs = []
for i in range(n_best):
k = i
seq = []
for j in range(len(prev_branchs) - 1, -1, -1):
seq.append(next_ids[j][k])
k = prev_branchs[j][k]
seq = seq[::-1]
seq = [bos_idx] + seq if add_bos else seq
seqs.append(seq)
return seqs
def init_dec_in_data(batch_size, beam_size, enc_in_data, enc_output):
"""
Initialize the input data for decoder.
"""
trg_words = np.array(
[[bos_idx]] * batch_size * beam_size, dtype="int64")
trg_pos = np.array([[1]] * batch_size * beam_size, dtype="int64")
src_max_length, src_slf_attn_bias, trg_max_len = enc_in_data[
-1], enc_in_data[-2], 1
# This is used to remove attention on subsequent words.
trg_slf_attn_bias = np.ones((batch_size * beam_size, trg_max_len,
trg_max_len))
trg_slf_attn_bias = np.triu(trg_slf_attn_bias, 1).reshape(
[-1, 1, trg_max_len, trg_max_len])
trg_slf_attn_bias = (np.tile(trg_slf_attn_bias, [1, n_head, 1, 1]) *
[-1e9]).astype("float32")
# This is used to remove attention on the paddings of source sequences.
trg_src_attn_bias = np.tile(
src_slf_attn_bias[:, :, ::src_max_length, :],
[beam_size, 1, trg_max_len, 1])
enc_output = np.tile(enc_output, [beam_size, 1, 1])
return trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, enc_output
def update_dec_in_data(dec_in_data, next_ids, active_beams):
"""
Update the input data of decoder mainly by slicing from the previous
input data and dropping the finished instance beams.
"""
trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, enc_output = dec_in_data
trg_cur_len = len(next_ids[0]) + 1 # include the <bos>
trg_words = np.array(
[
beam_backtrace(
prev_branchs[beam_idx], next_ids[beam_idx], add_bos=True)
for beam_idx in active_beams
],
dtype="int64")
trg_words = trg_words.reshape([-1, 1])
trg_pos = np.array(
[range(1, trg_cur_len + 1)] * len(active_beams) * beam_size,
dtype="int64").reshape([-1, 1])
active_beams_indice = (
(np.array(active_beams) * beam_size)[:, np.newaxis] +
np.array(range(beam_size))[np.newaxis, :]).flatten()
# This is used to remove attention on subsequent words.
trg_slf_attn_bias = np.ones((len(active_beams) * beam_size, trg_cur_len,
trg_cur_len))
trg_slf_attn_bias = np.triu(trg_slf_attn_bias, 1).reshape(
[-1, 1, trg_cur_len, trg_cur_len])
trg_slf_attn_bias = (np.tile(trg_slf_attn_bias, [1, n_head, 1, 1]) *
[-1e9]).astype("float32")
# This is used to remove attention on the paddings of source sequences.
trg_src_attn_bias = np.tile(trg_src_attn_bias[
active_beams_indice, :, ::trg_src_attn_bias.shape[2], :],
[1, 1, trg_cur_len, 1])
enc_output = enc_output[active_beams_indice, :, :]
return trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, enc_output
dec_in_data = init_dec_in_data(batch_size, beam_size, enc_in_data,
enc_output)
for i in range(max_length):
predict_all = exe.run(decoder,
feed=dict(zip(dec_in_names, dec_in_data)),
fetch_list=dec_out_names)[0]
predict_all = np.log(
predict_all.reshape([len(beam_map) * beam_size, i + 1, -1])[:,
-1, :])
predict_all = (predict_all + scores[beam_map].reshape(
[len(beam_map) * beam_size, -1])).reshape(
[len(beam_map), beam_size, -1])
active_beams = []
for inst_idx, beam_idx in enumerate(beam_map):
predict = (predict_all[inst_idx, :, :]
if i != 0 else predict_all[inst_idx, 0, :]).flatten()
top_k_indice = np.argpartition(predict, -beam_size)[-beam_size:]
top_scores_ids = top_k_indice[np.argsort(predict[top_k_indice])[::
-1]]
top_scores = predict[top_scores_ids]
scores[beam_idx] = top_scores
prev_branchs[beam_idx].append(top_scores_ids /
predict_all.shape[-1])
next_ids[beam_idx].append(top_scores_ids % predict_all.shape[-1])
if next_ids[beam_idx][-1][0] != eos_idx:
active_beams.append(beam_idx)
beam_map = active_beams
if len(beam_map) == 0:
break
dec_in_data = update_dec_in_data(dec_in_data, next_ids, active_beams)
# Decode beams and select n_best sequences for each instance by backtrace.
seqs = [beam_backtrace(prev_branchs[beam_idx], next_ids[beam_idx], n_best)]
return seqs, scores[:, :n_best].tolist()
def main():
place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# The current program desc is coupled with batch_size and the only
# supported batch size is 1 currently.
encoder_program = fluid.Program()
model.batch_size = InferTaskConfig.batch_size
with fluid.program_guard(main_program=encoder_program):
enc_output = encoder(
ModelHyperParams.src_vocab_size + 1,
ModelHyperParams.max_length + 1, ModelHyperParams.n_layer,
ModelHyperParams.n_head, ModelHyperParams.d_key,
ModelHyperParams.d_value, ModelHyperParams.d_model,
ModelHyperParams.d_inner_hid, ModelHyperParams.dropout,
ModelHyperParams.src_pad_idx, ModelHyperParams.pos_pad_idx)
model.batch_size = InferTaskConfig.batch_size * InferTaskConfig.beam_size
decoder_program = fluid.Program()
with fluid.program_guard(main_program=decoder_program):
predict = decoder(
ModelHyperParams.trg_vocab_size + 1,
ModelHyperParams.max_length + 1, ModelHyperParams.n_layer,
ModelHyperParams.n_head, ModelHyperParams.d_key,
ModelHyperParams.d_value, ModelHyperParams.d_model,
ModelHyperParams.d_inner_hid, ModelHyperParams.dropout,
ModelHyperParams.trg_pad_idx, ModelHyperParams.pos_pad_idx)
# Load model parameters of encoder and decoder separately from the saved
# transformer model.
encoder_var_names = []
for op in encoder_program.block(0).ops:
encoder_var_names += op.input_arg_names
encoder_param_names = filter(
lambda var_name: isinstance(encoder_program.block(0).var(var_name),
fluid.framework.Parameter),
encoder_var_names)
encoder_params = map(encoder_program.block(0).var, encoder_param_names)
decoder_var_names = []
for op in decoder_program.block(0).ops:
decoder_var_names += op.input_arg_names
decoder_param_names = filter(
lambda var_name: isinstance(decoder_program.block(0).var(var_name),
fluid.framework.Parameter),
decoder_var_names)
decoder_params = map(decoder_program.block(0).var, decoder_param_names)
fluid.io.load_vars(exe, InferTaskConfig.model_path, vars=encoder_params)
fluid.io.load_vars(exe, InferTaskConfig.model_path, vars=decoder_params)
# This is used here to set dropout to the test mode.
encoder_program = fluid.io.get_inference_program(
target_vars=[enc_output], main_program=encoder_program)
decoder_program = fluid.io.get_inference_program(
target_vars=[predict], main_program=decoder_program)
test_data = paddle.batch(
paddle.dataset.wmt16.test(ModelHyperParams.src_vocab_size,
ModelHyperParams.trg_vocab_size),
batch_size=InferTaskConfig.batch_size)
trg_idx2word = paddle.dataset.wmt16.get_dict(
"de", dict_size=ModelHyperParams.trg_vocab_size, reverse=True)
for batch_id, data in enumerate(test_data()):
batch_seqs, batch_scores = translate_batch(
exe, [item[0] for item in data], encoder_program,
encoder_input_data_names, [enc_output.name], decoder_program,
decoder_input_data_names, [predict.name], InferTaskConfig.beam_size,
InferTaskConfig.max_length, InferTaskConfig.n_best,
len(data), ModelHyperParams.n_head, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.bos_idx,
ModelHyperParams.eos_idx)
for i in range(len(batch_seqs)):
seqs = batch_seqs[i]
scores = batch_scores[i]
for seq in seqs:
print(" ".join([trg_idx2word[idx] for idx in seq]))
if __name__ == "__main__":
main()
from functools import partial
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from config import TrainTaskConfig, pos_enc_param_names, \
encoder_input_data_names, decoder_input_data_names, label_data_names
# FIXME(guosheng): Remove out the batch_size from the model.
batch_size = TrainTaskConfig.batch_size
def position_encoding_init(n_position, d_pos_vec):
"""
Generate the initial values for the sinusoid position encoding table.
"""
position_enc = np.array([[
pos / np.power(10000, 2 * (j // 2) / d_pos_vec)
for j in range(d_pos_vec)
] if pos != 0 else np.zeros(d_pos_vec) for pos in range(n_position)])
position_enc[1:, 0::2] = np.sin(position_enc[1:, 0::2]) # dim 2i
position_enc[1:, 1::2] = np.cos(position_enc[1:, 1::2]) # dim 2i+1
return position_enc.astype("float32")
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
n_head=1,
dropout_rate=0.):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing softmax activiation to mask certain selected positions so that
they will not considered in attention weights.
"""
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError(
"Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * n_head,
param_attr=fluid.initializer.Xavier(
uniform=False,
fan_in=d_model * d_key,
fan_out=n_head * d_key),
bias_attr=False,
num_flatten_dims=2)
k = layers.fc(input=keys,
size=d_key * n_head,
param_attr=fluid.initializer.Xavier(
uniform=False,
fan_in=d_model * d_key,
fan_out=n_head * d_key),
bias_attr=False,
num_flatten_dims=2)
v = layers.fc(input=values,
size=d_value * n_head,
param_attr=fluid.initializer.Xavier(
uniform=False,
fan_in=d_model * d_value,
fan_out=n_head * d_value),
bias_attr=False,
num_flatten_dims=2)
return q, k, v
def __split_heads(x, n_head):
"""
Reshape the last dimension of inpunt tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, n_head * hidden_dim] then output a tensor
with shape [bs, n_head, max_sequence_length, hidden_dim].
"""
if n_head == 1:
return x
hidden_size = x.shape[-1]
# FIXME(guosheng): Decouple the program desc with batch_size.
reshaped = layers.reshape(
x=x, shape=[batch_size, -1, n_head, hidden_size // n_head])
# permuate the dimensions into:
# [batch_size, n_head, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of inpunt tensor x
so that it becomes one dimension, which is reverse to __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# FIXME(guosheng): Decouple the program desc with batch_size.
return layers.reshape(
x=trans_x,
shape=map(int,
[batch_size, -1, trans_x.shape[2] * trans_x.shape[3]]))
def scaled_dot_product_attention(q, k, v, attn_bias, d_model, dropout_rate):
"""
Scaled Dot-Product Attention
"""
# FIXME(guosheng): Optimize the shape in reshape_op or softmax_op.
# The current implementation of softmax_op only supports 2D tensor,
# consequently it cannot be directly used here.
# If to use the reshape_op, Besides, the shape of product inferred in
# compile-time is not the actual shape in run-time. It cann't be used
# to set the attribute of reshape_op.
# So, here define the softmax for temporary solution.
def __softmax(x, eps=1e-9):
exp_out = layers.exp(x=x)
sum_out = layers.reduce_sum(exp_out, dim=-1, keep_dim=False)
return layers.elementwise_div(x=exp_out, y=sum_out, axis=0)
scaled_q = layers.scale(x=q, scale=d_model**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
weights = __softmax(
layers.elementwise_add(
x=product, y=attn_bias) if attn_bias else product)
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)
q = __split_heads(q, n_head)
k = __split_heads(k, n_head)
v = __split_heads(v, n_head)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_model,
dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
param_attr=fluid.initializer.Xavier(uniform=False),
bias_attr=False,
num_flatten_dims=2)
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
param_attr=fluid.initializer.Uniform(
low=-(d_hid**-0.5), high=(d_hid**-0.5)),
act="relu")
out = layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.initializer.Uniform(
low=-(d_inner_hid**-0.5), high=(d_inner_hid**-0.5)))
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout=0.):
"""
Add residual connection, layer normalization and droput to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out = layers.layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.initializer.Constant(1.),
bias_attr=fluid.initializer.Constant(0.))
elif cmd == "d": # add dropout
if dropout:
out = layers.dropout(out, dropout_prob=dropout, is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def prepare_encoder(src_word,
src_pos,
src_vocab_size,
src_emb_dim,
src_pad_idx,
src_max_len,
dropout=0.,
pos_pad_idx=0,
pos_enc_param_name=None):
"""Add word embeddings and position encodings.
The output tensor has a shape of:
[batch_size, max_src_length_in_batch, d_model].
This module is used at the bottom of the encoder stacks.
"""
src_word_emb = layers.embedding(
src_word,
size=[src_vocab_size, src_emb_dim],
padding_idx=src_pad_idx,
param_attr=fluid.initializer.Normal(0., 1.))
src_pos_enc = layers.embedding(
src_pos,
size=[src_max_len, src_emb_dim],
padding_idx=pos_pad_idx,
param_attr=fluid.ParamAttr(
name=pos_enc_param_name, trainable=False))
enc_input = src_word_emb + src_pos_enc
# FIXME(guosheng): Decouple the program desc with batch_size.
enc_input = layers.reshape(x=enc_input, shape=[batch_size, -1, src_emb_dim])
return layers.dropout(
enc_input, dropout_prob=dropout,
is_test=False) if dropout else enc_input
prepare_encoder = partial(
prepare_encoder, pos_enc_param_name=pos_enc_param_names[0])
prepare_decoder = partial(
prepare_encoder, pos_enc_param_name=pos_enc_param_names[1])
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""The encoder layers that can be stacked to form a deep encoder.
This module consits of a multi-head (self) attention followed by
position-wise feed-forward networks and both the two components companied
with the post_process_layer to add residual connection, layer normalization
and droput.
"""
attn_output = multi_head_attention(enc_input, enc_input, enc_input,
attn_bias, d_key, d_value, d_model,
n_head, dropout_rate)
attn_output = post_process_layer(enc_input, attn_output, "dan",
dropout_rate)
ffd_output = positionwise_feed_forward(attn_output, d_inner_hid, d_model)
return post_process_layer(attn_output, ffd_output, "dan", dropout_rate)
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(
enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
enc_input = enc_output
return enc_output
def decoder_layer(dec_input,
enc_output,
slf_attn_bias,
dec_enc_attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
""" The layer to be stacked in decoder part.
The structure of this module is similar to that in the encoder part except
a multi-head attention is added to implement encoder-decoder attention.
"""
slf_attn_output = multi_head_attention(
dec_input,
dec_input,
dec_input,
slf_attn_bias,
d_key,
d_value,
d_model,
n_head,
dropout_rate, )
slf_attn_output = post_process_layer(
dec_input,
slf_attn_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
enc_attn_output = multi_head_attention(
slf_attn_output,
enc_output,
enc_output,
dec_enc_attn_bias,
d_key,
d_value,
d_model,
n_head,
dropout_rate, )
enc_attn_output = post_process_layer(
slf_attn_output,
enc_attn_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
ffd_output = positionwise_feed_forward(
enc_attn_output,
d_inner_hid,
d_model, )
dec_output = post_process_layer(
enc_attn_output,
ffd_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
return dec_output
def decoder(dec_input,
enc_output,
dec_slf_attn_bias,
dec_enc_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""
The decoder is composed of a stack of identical decoder_layer layers.
"""
for i in range(n_layer):
dec_output = decoder_layer(
dec_input,
enc_output,
dec_slf_attn_bias,
dec_enc_attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
dec_input = dec_output
return dec_output
def make_inputs(input_data_names,
n_head,
d_model,
batch_size,
max_length,
is_pos,
slf_attn_bias_flag,
src_attn_bias_flag,
enc_output_flag=False):
"""
Define the input data layers for the transformer model.
"""
input_layers = []
# The shapes here act as placeholder.
# The shapes set here is to pass the infer-shape in compile time.
word = layers.data(
name=input_data_names[len(input_layers)],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
input_layers += [word]
# This is used for position data or label weight.
pos = layers.data(
name=input_data_names[len(input_layers)],
shape=[batch_size * max_length, 1],
dtype="int64" if is_pos else "float32",
append_batch_size=False)
input_layers += [pos]
if slf_attn_bias_flag:
# This input is used to remove attention weights on paddings for the
# encoder and to remove attention weights on subsequent words for the
# decoder.
slf_attn_bias = layers.data(
name=input_data_names[len(input_layers)],
shape=[batch_size, n_head, max_length, max_length],
dtype="float32",
append_batch_size=False)
input_layers += [slf_attn_bias]
if src_attn_bias_flag:
# This input is used to remove attention weights on paddings.
src_attn_bias = layers.data(
name=input_data_names[len(input_layers)],
shape=[batch_size, n_head, max_length, max_length],
dtype="float32",
append_batch_size=False)
input_layers += [src_attn_bias]
if enc_output_flag:
enc_output = layers.data(
name=input_data_names[len(input_layers)],
shape=[batch_size, max_length, d_model],
dtype="float32",
append_batch_size=False)
input_layers += [enc_output]
return input_layers
def transformer(
src_vocab_size,
trg_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
src_pad_idx,
trg_pad_idx,
pos_pad_idx, ):
enc_input_layers = make_inputs(encoder_input_data_names, n_head, d_model,
batch_size, max_length, True, True, False)
enc_output = wrap_encoder(
src_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
src_pad_idx,
pos_pad_idx,
enc_input_layers, )
dec_input_layers = make_inputs(decoder_input_data_names, n_head, d_model,
batch_size, max_length, True, True, True)
predict = wrap_decoder(
trg_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
trg_pad_idx,
pos_pad_idx,
dec_input_layers,
enc_output, )
# Padding index do not contribute to the total loss. The weights is used to
# cancel padding index in calculating the loss.
gold, weights = make_inputs(label_data_names, n_head, d_model, batch_size,
max_length, False, False, False)
cost = layers.cross_entropy(input=predict, label=gold)
weighted_cost = cost * weights
return layers.reduce_sum(weighted_cost), predict
def wrap_encoder(src_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
src_pad_idx,
pos_pad_idx,
enc_input_layers=None):
"""
The wrapper assembles together all needed layers for the encoder.
"""
if enc_input_layers is None:
# This is used to implement independent encoder program in inference.
src_word, src_pos, src_slf_attn_bias = make_inputs(
encoder_input_data_names, n_head, d_model, batch_size, max_length,
True, True, False)
else:
src_word, src_pos, src_slf_attn_bias = enc_input_layers
enc_input = prepare_encoder(
src_word,
src_pos,
src_vocab_size,
d_model,
src_pad_idx,
max_length,
dropout_rate, )
enc_output = encoder(
enc_input,
src_slf_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
return enc_output
def wrap_decoder(trg_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
trg_pad_idx,
pos_pad_idx,
dec_input_layers=None,
enc_output=None):
"""
The wrapper assembles together all needed layers for the decoder.
"""
if dec_input_layers is None:
# This is used to implement independent decoder program in inference.
trg_word, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, enc_output = make_inputs(
decoder_input_data_names, n_head, d_model, batch_size, max_length,
True, True, True, True)
else:
trg_word, trg_pos, trg_slf_attn_bias, trg_src_attn_bias = dec_input_layers
dec_input = prepare_decoder(
trg_word,
trg_pos,
trg_vocab_size,
d_model,
trg_pad_idx,
max_length,
dropout_rate, )
dec_output = decoder(
dec_input,
enc_output,
trg_slf_attn_bias,
trg_src_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
predict = layers.reshape(
x=layers.fc(input=dec_output,
size=trg_vocab_size,
bias_attr=False,
num_flatten_dims=2),
shape=[-1, trg_vocab_size],
act="softmax")
return predict
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
class LearningRateScheduler(object):
"""
Wrapper for learning rate scheduling as described in the Transformer paper.
LearningRateScheduler adapts the learning rate externally and the adapted
learning rate will be feeded into the main_program as input data.
"""
def __init__(self,
d_model,
warmup_steps,
place,
learning_rate=0.001,
current_steps=0,
name="learning_rate"):
self.current_steps = current_steps
self.warmup_steps = warmup_steps
self.d_model = d_model
self.learning_rate = layers.create_global_var(
name=name,
shape=[1],
value=float(learning_rate),
dtype="float32",
persistable=True)
self.place = place
def update_learning_rate(self, data_input):
self.current_steps += 1
lr_value = np.power(self.d_model, -0.5) * np.min([
np.power(self.current_steps, -0.5),
np.power(self.warmup_steps, -1.5) * self.current_steps
])
lr_tensor = fluid.LoDTensor()
lr_tensor.set(np.array([lr_value], dtype="float32"), self.place)
data_input[self.learning_rate.name] = lr_tensor
import os
import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid
from model import transformer, position_encoding_init
from optim import LearningRateScheduler
from config import TrainTaskConfig, ModelHyperParams, pos_enc_param_names, \
encoder_input_data_names, decoder_input_data_names, label_data_names
def pad_batch_data(insts,
pad_idx,
n_head,
is_target=False,
return_pos=True,
return_attn_bias=True,
return_max_len=True):
"""
Pad the instances to the max sequence length in batch, and generate the
corresponding position data and attention bias.
"""
return_list = []
max_len = max(len(inst) for inst in insts)
inst_data = np.array(
[inst + [pad_idx] * (max_len - len(inst)) for inst in insts])
return_list += [inst_data.astype("int64").reshape([-1, 1])]
if return_pos:
inst_pos = np.array([[
pos_i + 1 if w_i != pad_idx else 0 for pos_i, w_i in enumerate(inst)
] for inst in inst_data])
return_list += [inst_pos.astype("int64").reshape([-1, 1])]
if return_attn_bias:
if is_target:
# This is used to avoid attention on paddings and subsequent
# words.
slf_attn_bias_data = np.ones((inst_data.shape[0], max_len, max_len))
slf_attn_bias_data = np.triu(slf_attn_bias_data, 1).reshape(
[-1, 1, max_len, max_len])
slf_attn_bias_data = np.tile(slf_attn_bias_data,
[1, n_head, 1, 1]) * [-1e9]
else:
# This is used to avoid attention on paddings.
slf_attn_bias_data = np.array([[0] * len(inst) + [-1e9] *
(max_len - len(inst))
for inst in insts])
slf_attn_bias_data = np.tile(
slf_attn_bias_data.reshape([-1, 1, 1, max_len]),
[1, n_head, max_len, 1])
return_list += [slf_attn_bias_data.astype("float32")]
if return_max_len:
return_list += [max_len]
return return_list if len(return_list) > 1 else return_list[0]
def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx,
max_length, n_head):
"""
Put all padded data needed by training into a dict.
"""
src_word, src_pos, src_slf_attn_bias, src_max_len = pad_batch_data(
[inst[0] for inst in insts], src_pad_idx, n_head, is_target=False)
trg_word, trg_pos, trg_slf_attn_bias, trg_max_len = pad_batch_data(
[inst[1] for inst in insts], trg_pad_idx, n_head, is_target=True)
trg_src_attn_bias = np.tile(src_slf_attn_bias[:, :, ::src_max_len, :],
[1, 1, trg_max_len, 1]).astype("float32")
lbl_word = pad_batch_data([inst[2] for inst in insts], trg_pad_idx, n_head,
False, False, False, False)
lbl_weight = (lbl_word != trg_pad_idx).astype("float32").reshape([-1, 1])
input_dict = dict(
zip(input_data_names, [
src_word, src_pos, src_slf_attn_bias, trg_word, trg_pos,
trg_slf_attn_bias, trg_src_attn_bias, lbl_word, lbl_weight
]))
return input_dict
def main():
place = fluid.CUDAPlace(0) if TrainTaskConfig.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
cost, predict = transformer(
ModelHyperParams.src_vocab_size + 1,
ModelHyperParams.trg_vocab_size + 1, ModelHyperParams.max_length + 1,
ModelHyperParams.n_layer, ModelHyperParams.n_head,
ModelHyperParams.d_key, ModelHyperParams.d_value,
ModelHyperParams.d_model, ModelHyperParams.d_inner_hid,
ModelHyperParams.dropout, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.pos_pad_idx)
lr_scheduler = LearningRateScheduler(ModelHyperParams.d_model,
TrainTaskConfig.warmup_steps, place,
TrainTaskConfig.learning_rate)
optimizer = fluid.optimizer.Adam(
learning_rate=lr_scheduler.learning_rate,
beta1=TrainTaskConfig.beta1,
beta2=TrainTaskConfig.beta2,
epsilon=TrainTaskConfig.eps)
optimizer.minimize(cost)
train_data = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.wmt16.train(ModelHyperParams.src_vocab_size,
ModelHyperParams.trg_vocab_size),
buf_size=100000),
batch_size=TrainTaskConfig.batch_size)
# Program to do validation.
test_program = fluid.default_main_program().clone()
with fluid.program_guard(test_program):
test_program = fluid.io.get_inference_program([cost])
val_data = paddle.batch(
paddle.dataset.wmt16.validation(ModelHyperParams.src_vocab_size,
ModelHyperParams.trg_vocab_size),
batch_size=TrainTaskConfig.batch_size)
def test(exe):
test_costs = []
for batch_id, data in enumerate(val_data()):
if len(data) != TrainTaskConfig.batch_size:
continue
data_input = prepare_batch_input(
data, encoder_input_data_names + decoder_input_data_names[:-1] +
label_data_names, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length,
ModelHyperParams.n_head)
test_cost = exe.run(test_program,
feed=data_input,
fetch_list=[cost])[0]
test_costs.append(test_cost)
return np.mean(test_costs)
# Initialize the parameters.
exe.run(fluid.framework.default_startup_program())
for pos_enc_param_name in pos_enc_param_names:
pos_enc_param = fluid.global_scope().find_var(
pos_enc_param_name).get_tensor()
pos_enc_param.set(
position_encoding_init(ModelHyperParams.max_length + 1,
ModelHyperParams.d_model), place)
for pass_id in xrange(TrainTaskConfig.pass_num):
for batch_id, data in enumerate(train_data()):
# The current program desc is coupled with batch_size, thus all
# mini-batches must have the same number of instances currently.
if len(data) != TrainTaskConfig.batch_size:
continue
data_input = prepare_batch_input(
data, encoder_input_data_names + decoder_input_data_names[:-1] +
label_data_names, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length,
ModelHyperParams.n_head)
lr_scheduler.update_learning_rate(data_input)
outs = exe.run(fluid.framework.default_main_program(),
feed=data_input,
fetch_list=[cost],
use_program_cache=True)
cost_val = np.array(outs[0])
print("pass_id = " + str(pass_id) + " batch = " + str(batch_id) +
" cost = " + str(cost_val))
# Validate and save the model for inference.
val_cost = test(exe)
print("pass_id = " + str(pass_id) + " val_cost = " + str(val_cost))
fluid.io.save_inference_model(
os.path.join(TrainTaskConfig.model_dir,
"pass_" + str(pass_id) + ".infer.model"),
encoder_input_data_names + decoder_input_data_names[:-1],
[predict], exe)
if __name__ == "__main__":
main()
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# MobileNet-SSD
This model built with paddle fluid is still under active development and is not
the final version. We welcome feedbacks.
background
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
import os
import os.path as osp
import re
import random
devkit_dir = './VOCdevkit'
years = ['2007', '2012']
def get_dir(devkit_dir, year, type):
return osp.join(devkit_dir, 'VOC' + year, type)
def walk_dir(devkit_dir, year):
filelist_dir = get_dir(devkit_dir, year, 'ImageSets/Main')
annotation_dir = get_dir(devkit_dir, year, 'Annotations')
img_dir = get_dir(devkit_dir, year, 'JPEGImages')
trainval_list = []
test_list = []
added = set()
for _, _, files in os.walk(filelist_dir):
for fname in files:
img_ann_list = []
if re.match('[a-z]+_trainval\.txt', fname):
img_ann_list = trainval_list
elif re.match('[a-z]+_test\.txt', fname):
img_ann_list = test_list
else:
continue
fpath = osp.join(filelist_dir, fname)
for line in open(fpath):
name_prefix = line.strip().split()[0]
if name_prefix in added:
continue
added.add(name_prefix)
ann_path = osp.join(annotation_dir, name_prefix + '.xml')
img_path = osp.join(img_dir, name_prefix + '.jpg')
assert os.path.isfile(ann_path), 'file %s not found.' % ann_path
assert os.path.isfile(img_path), 'file %s not found.' % img_path
img_ann_list.append((img_path, ann_path))
return trainval_list, test_list
def prepare_filelist(devkit_dir, years, output_dir):
trainval_list = []
test_list = []
for year in years:
trainval, test = walk_dir(devkit_dir, year)
trainval_list.extend(trainval)
test_list.extend(test)
random.shuffle(trainval_list)
with open(osp.join(output_dir, 'trainval.txt'), 'w') as ftrainval:
for item in trainval_list:
ftrainval.write(item[0] + ' ' + item[1] + '\n')
with open(osp.join(output_dir, 'test.txt'), 'w') as ftest:
for item in test_list:
ftest.write(item[0] + ' ' + item[1] + '\n')
prepare_filelist(devkit_dir, years, '.')
from PIL import Image, ImageEnhance
import numpy as np
import random
import math
class sampler():
def __init__(self, max_sample, max_trial, min_scale, max_scale,
min_aspect_ratio, max_aspect_ratio, min_jaccard_overlap,
max_jaccard_overlap):
self.max_sample = max_sample
self.max_trial = max_trial
self.min_scale = min_scale
self.max_scale = max_scale
self.min_aspect_ratio = min_aspect_ratio
self.max_aspect_ratio = max_aspect_ratio
self.min_jaccard_overlap = min_jaccard_overlap
self.max_jaccard_overlap = max_jaccard_overlap
class bbox():
def __init__(self, xmin, ymin, xmax, ymax):
self.xmin = xmin
self.ymin = ymin
self.xmax = xmax
self.ymax = ymax
def bbox_area(src_bbox):
width = src_bbox.xmax - src_bbox.xmin
height = src_bbox.ymax - src_bbox.ymin
return width * height
def generate_sample(sampler):
scale = random.uniform(sampler.min_scale, sampler.max_scale)
min_aspect_ratio = max(sampler.min_aspect_ratio, (scale**2.0))
max_aspect_ratio = min(sampler.max_aspect_ratio, 1 / (scale**2.0))
aspect_ratio = random.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_width = scale * (aspect_ratio**0.5)
bbox_height = scale / (aspect_ratio**0.5)
xmin_bound = 1 - bbox_width
ymin_bound = 1 - bbox_height
xmin = random.uniform(0, xmin_bound)
ymin = random.uniform(0, ymin_bound)
xmax = xmin + bbox_width
ymax = ymin + bbox_height
sampled_bbox = bbox(xmin, ymin, xmax, ymax)
return sampled_bbox
def jaccard_overlap(sample_bbox, object_bbox):
if sample_bbox.xmin >= object_bbox.xmax or \
sample_bbox.xmax <= object_bbox.xmin or \
sample_bbox.ymin >= object_bbox.ymax or \
sample_bbox.ymax <= object_bbox.ymin:
return 0
intersect_xmin = max(sample_bbox.xmin, object_bbox.xmin)
intersect_ymin = max(sample_bbox.ymin, object_bbox.ymin)
intersect_xmax = min(sample_bbox.xmax, object_bbox.xmax)
intersect_ymax = min(sample_bbox.ymax, object_bbox.ymax)
intersect_size = (intersect_xmax - intersect_xmin) * (
intersect_ymax - intersect_ymin)
sample_bbox_size = bbox_area(sample_bbox)
object_bbox_size = bbox_area(object_bbox)
overlap = intersect_size / (
sample_bbox_size + object_bbox_size - intersect_size)
return overlap
def satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
if sampler.min_jaccard_overlap == 0 and sampler.max_jaccard_overlap == 0:
return True
for i in range(len(bbox_labels)):
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
overlap = jaccard_overlap(sample_bbox, object_bbox)
if sampler.min_jaccard_overlap != 0 and \
overlap < sampler.min_jaccard_overlap:
continue
if sampler.max_jaccard_overlap != 0 and \
overlap > sampler.max_jaccard_overlap:
continue
return True
return False
def generate_batch_samples(batch_sampler, bbox_labels, image_width,
image_height):
sampled_bbox = []
index = []
c = 0
for sampler in batch_sampler:
found = 0
for i in range(sampler.max_trial):
if found >= sampler.max_sample:
break
sample_bbox = generate_sample(sampler)
if satisfy_sample_constraint(sampler, sample_bbox, bbox_labels):
sampled_bbox.append(sample_bbox)
found = found + 1
index.append(c)
c = c + 1
return sampled_bbox
def clip_bbox(src_bbox):
src_bbox.xmin = max(min(src_bbox.xmin, 1.0), 0.0)
src_bbox.ymin = max(min(src_bbox.ymin, 1.0), 0.0)
src_bbox.xmax = max(min(src_bbox.xmax, 1.0), 0.0)
src_bbox.ymax = max(min(src_bbox.ymax, 1.0), 0.0)
return src_bbox
def meet_emit_constraint(src_bbox, sample_bbox):
center_x = (src_bbox.xmax + src_bbox.xmin) / 2
center_y = (src_bbox.ymax + src_bbox.ymin) / 2
if center_x >= sample_bbox.xmin and \
center_x <= sample_bbox.xmax and \
center_y >= sample_bbox.ymin and \
center_y <= sample_bbox.ymax:
return True
return False
def transform_labels(bbox_labels, sample_bbox):
proj_bbox = bbox(0, 0, 0, 0)
sample_labels = []
for i in range(len(bbox_labels)):
sample_label = []
object_bbox = bbox(bbox_labels[i][1], bbox_labels[i][2],
bbox_labels[i][3], bbox_labels[i][4])
if not meet_emit_constraint(object_bbox, sample_bbox):
continue
sample_width = sample_bbox.xmax - sample_bbox.xmin
sample_height = sample_bbox.ymax - sample_bbox.ymin
proj_bbox.xmin = (object_bbox.xmin - sample_bbox.xmin) / sample_width
proj_bbox.ymin = (object_bbox.ymin - sample_bbox.ymin) / sample_height
proj_bbox.xmax = (object_bbox.xmax - sample_bbox.xmin) / sample_width
proj_bbox.ymax = (object_bbox.ymax - sample_bbox.ymin) / sample_height
proj_bbox = clip_bbox(proj_bbox)
if bbox_area(proj_bbox) > 0:
sample_label.append(bbox_labels[i][0])
sample_label.append(float(proj_bbox.xmin))
sample_label.append(float(proj_bbox.ymin))
sample_label.append(float(proj_bbox.xmax))
sample_label.append(float(proj_bbox.ymax))
sample_label.append(bbox_labels[i][5])
sample_labels.append(sample_label)
return sample_labels
def crop_image(img, bbox_labels, sample_bbox, image_width, image_height):
sample_bbox = clip_bbox(sample_bbox)
xmin = int(sample_bbox.xmin * image_width)
xmax = int(sample_bbox.xmax * image_width)
ymin = int(sample_bbox.ymin * image_height)
ymax = int(sample_bbox.ymax * image_height)
sample_img = img[ymin:ymax, xmin:xmax]
sample_labels = transform_labels(bbox_labels, sample_bbox)
return sample_img, sample_labels
def random_brightness(img, settings):
prob = random.uniform(0, 1)
if prob < settings._brightness_prob:
delta = random.uniform(-settings._brightness_delta,
settings._brightness_delta) + 1
img = ImageEnhance.Brightness(img).enhance(delta)
return img
def random_contrast(img, settings):
prob = random.uniform(0, 1)
if prob < settings._contrast_prob:
delta = random.uniform(-settings._contrast_delta,
settings._contrast_delta) + 1
img = ImageEnhance.Contrast(img).enhance(delta)
return img
def random_saturation(img, settings):
prob = random.uniform(0, 1)
if prob < settings._saturation_prob:
delta = random.uniform(-settings._saturation_delta,
settings._saturation_delta) + 1
img = ImageEnhance.Color(img).enhance(delta)
return img
def random_hue(img, settings):
prob = random.uniform(0, 1)
if prob < settings._hue_prob:
delta = random.uniform(-settings._hue_delta, settings._hue_delta)
img_hsv = np.array(img.convert('HSV'))
img_hsv[:, :, 0] = img_hsv[:, :, 0] + delta
img = Image.fromarray(img_hsv, mode='HSV').convert('RGB')
return img
def distort_image(img, settings):
prob = random.uniform(0, 1)
# Apply different distort order
if prob > 0.5:
img = random_brightness(img, settings)
img = random_contrast(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
else:
img = random_brightness(img, settings)
img = random_saturation(img, settings)
img = random_hue(img, settings)
img = random_contrast(img, settings)
return img
def expand_image(img, bbox_labels, img_width, img_height, settings):
prob = random.uniform(0, 1)
if prob < settings._hue_prob:
expand_ratio = random.uniform(1, settings._expand_max_ratio)
if expand_ratio - 1 >= 0.01:
height = int(img_height * expand_ratio)
width = int(img_width * expand_ratio)
h_off = math.floor(random.uniform(0, height - img_height))
w_off = math.floor(random.uniform(0, width - img_width))
expand_bbox = bbox(-w_off / img_width, -h_off / img_height,
(width - w_off) / img_width,
(height - h_off) / img_height)
expand_img = np.ones((height, width, 3))
expand_img = np.uint8(expand_img * np.squeeze(settings._img_mean))
expand_img = Image.fromarray(expand_img)
expand_img.paste(img, (int(w_off), int(h_off)))
bbox_labels = transform_labels(bbox_labels, expand_bbox)
return expand_img, bbox_labels
return img, bbox_labels
import paddle.v2 as paddle
import paddle.fluid as fluid
import numpy as np
# From npy
def load_vars():
vars = {}
name_map = {}
with open('./ssd_mobilenet_v1_coco/names.map', 'r') as map_file:
for param in map_file:
fd_name, tf_name = param.strip().split('\t')
name_map[fd_name] = tf_name
tf_vars = np.load(
'./ssd_mobilenet_v1_coco/ssd_mobilenet_v1_coco_2017_11_17.npy').item()
for fd_name in name_map:
tf_name = name_map[fd_name]
tf_var = tf_vars[tf_name]
if len(tf_var.shape) == 4 and 'depthwise' in tf_name:
vars[fd_name] = np.transpose(tf_var, (2, 3, 0, 1))
elif len(tf_var.shape) == 4:
vars[fd_name] = np.transpose(tf_var, (3, 2, 0, 1))
else:
vars[fd_name] = tf_var
return vars
def load_and_set_vars(place):
vars = load_vars()
for k, v in vars.items():
t = fluid.global_scope().find_var(k).get_tensor()
#print(np.array(t).shape, v.shape, k)
assert np.array(t).shape == v.shape
t.set(v, place)
# From Paddle V1
def load_paddlev1_vars(place):
vars = {}
name_map = {}
with open('./caffe2paddle/names.map', 'r') as map_file:
for param in map_file:
fd_name, tf_name = param.strip().split('\t')
name_map[fd_name] = tf_name
from operator import mul
def load(file_name, shape):
with open(file_name, 'rb') as f:
f.read(16)
arr = np.fromfile(f, dtype=np.float32)
#print(arr.size, reduce(mul, shape), file_name)
assert arr.size == reduce(mul, shape)
return arr.reshape(shape)
for fd_name in name_map:
v1_name = name_map[fd_name]
t = fluid.global_scope().find_var(fd_name).get_tensor()
shape = np.array(t).shape
v1_var = load('./caffe2paddle/' + v1_name, shape)
t.set(v1_var, place)
if __name__ == "__main__":
load_vars()
import paddle.v2 as paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
def conv_bn(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
bias_attr = ParamAttr(learning_rate=0.2)
return fluid.layers.batch_norm(
input=conv,
act=act,
epsilon=0.00001,
param_attr=parameter_attr,
bias_attr=bias_attr)
def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
scale):
depthwise_conv = conv_bn(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
pointwise_conv = conv_bn(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
def extra_block(input, num_filters1, num_filters2, num_groups, stride, scale):
# 1x1 conv
pointwise_conv = conv_bn(
input=input,
filter_size=1,
num_filters=int(num_filters1 * scale),
stride=1,
num_groups=int(num_groups * scale),
padding=0)
# 3x3 conv
normal_conv = conv_bn(
input=pointwise_conv,
filter_size=3,
num_filters=int(num_filters2 * scale),
stride=2,
num_groups=int(num_groups * scale),
padding=1)
return normal_conv
def mobile_net(img, img_shape, scale=1.0):
# 300x300
tmp = conv_bn(img, 3, int(32 * scale), 2, 1, 3)
# 150x150
tmp = depthwise_separable(tmp, 32, 64, 32, 1, scale)
tmp = depthwise_separable(tmp, 64, 128, 64, 2, scale)
# 75x75
tmp = depthwise_separable(tmp, 128, 128, 128, 1, scale)
tmp = depthwise_separable(tmp, 128, 256, 128, 2, scale)
# 38x38
tmp = depthwise_separable(tmp, 256, 256, 256, 1, scale)
tmp = depthwise_separable(tmp, 256, 512, 256, 2, scale)
# 19x19
for i in range(5):
tmp = depthwise_separable(tmp, 512, 512, 512, 1, scale)
module11 = tmp
tmp = depthwise_separable(tmp, 512, 1024, 512, 2, scale)
# 10x10
module13 = depthwise_separable(tmp, 1024, 1024, 1024, 1, scale)
module14 = extra_block(module13, 256, 512, 1, 2, scale)
# 5x5
module15 = extra_block(module14, 128, 256, 1, 2, scale)
# 3x3
module16 = extra_block(module15, 128, 256, 1, 2, scale)
# 2x2
module17 = extra_block(module16, 64, 128, 1, 2, scale)
mbox_locs, mbox_confs, box, box_var = fluid.layers.multi_box_head(
inputs=[module11, module13, module14, module15, module16, module17],
image=img,
num_classes=21,
min_ratio=20,
max_ratio=90,
min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],
max_sizes=[[], 150.0, 195.0, 240.0, 285.0, 300.0],
aspect_ratios=[[2.], [2., 3.], [2., 3.], [2., 3.], [2., 3.], [2., 3.]],
base_size=img_shape[2],
offset=0.5,
flip=True)
return mbox_locs, mbox_confs, box, box_var
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import image_util
from paddle.utils.image_util import *
import random
from PIL import Image
import numpy as np
import xml.etree.ElementTree
import os
class Settings(object):
def __init__(self, data_dir, label_file, resize_h, resize_w, mean_value,
apply_distort, apply_expand):
self._data_dir = data_dir
self._label_list = []
label_fpath = os.path.join(data_dir, label_file)
for line in open(label_fpath):
self._label_list.append(line.strip())
self._apply_distort = apply_distort
self._apply_expand = apply_expand
self._resize_height = resize_h
self._resize_width = resize_w
self._img_mean = np.array(mean_value)[:, np.newaxis, np.newaxis].astype(
'float32')
self._expand_prob = 0.5
self._expand_max_ratio = 4
self._hue_prob = 0.5
self._hue_delta = 18
self._contrast_prob = 0.5
self._contrast_delta = 0.5
self._saturation_prob = 0.5
self._saturation_delta = 0.5
self._brightness_prob = 0.5
self._brightness_delta = 0.125
@property
def apply_distort(self):
return self._apply_expand
@property
def apply_distort(self):
return self._apply_distort
@property
def data_dir(self):
return self._data_dir
@property
def label_list(self):
return self._label_list
@property
def resize_h(self):
return self._resize_height
@property
def resize_w(self):
return self._resize_width
@property
def img_mean(self):
return self._img_mean
def _reader_creator(settings, file_list, mode, shuffle):
def reader():
with open(file_list) as flist:
lines = [line.strip() for line in flist]
if shuffle:
random.shuffle(lines)
for line in lines:
if mode == 'train' or mode == 'test':
img_path, label_path = line.split()
img_path = os.path.join(settings.data_dir, img_path)
label_path = os.path.join(settings.data_dir, label_path)
elif mode == 'infer':
img_path = os.path.join(settings.data_dir, line)
img = Image.open(img_path)
img_width, img_height = img.size
# layout: label | xmin | ymin | xmax | ymax | difficult
if mode == 'train' or mode == 'test':
bbox_labels = []
root = xml.etree.ElementTree.parse(label_path).getroot()
for object in root.findall('object'):
bbox_sample = []
# start from 1
bbox_sample.append(
float(
settings.label_list.index(
object.find('name').text)))
bbox = object.find('bndbox')
difficult = float(object.find('difficult').text)
bbox_sample.append(
float(bbox.find('xmin').text) / img_width)
bbox_sample.append(
float(bbox.find('ymin').text) / img_height)
bbox_sample.append(
float(bbox.find('xmax').text) / img_width)
bbox_sample.append(
float(bbox.find('ymax').text) / img_height)
bbox_sample.append(difficult)
bbox_labels.append(bbox_sample)
sample_labels = bbox_labels
if mode == 'train':
if settings._apply_distort:
img = image_util.distort_image(img, settings)
if settings._apply_expand:
img, bbox_labels = image_util.expand_image(
img, bbox_labels, img_width, img_height,
settings)
batch_sampler = []
# hard-code here
batch_sampler.append(
image_util.sampler(1, 1, 1.0, 1.0, 1.0, 1.0, 0.0,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.1,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.3,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.5,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.7,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.9,
0.0))
batch_sampler.append(
image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.0,
1.0))
""" random crop """
sampled_bbox = image_util.generate_batch_samples(
batch_sampler, bbox_labels, img_width, img_height)
img = np.array(img)
if len(sampled_bbox) > 0:
idx = int(random.uniform(0, len(sampled_bbox)))
img, sample_labels = image_util.crop_image(
img, bbox_labels, sampled_bbox[idx], img_width,
img_height)
img = Image.fromarray(img)
img = img.resize((settings.resize_w, settings.resize_h),
Image.ANTIALIAS)
img = np.array(img)
if mode == 'train':
mirror = int(random.uniform(0, 2))
if mirror == 1:
img = img[:, ::-1, :]
for i in xrange(len(sample_labels)):
tmp = sample_labels[i][1]
sample_labels[i][1] = 1 - sample_labels[i][3]
sample_labels[i][3] = 1 - tmp
if len(img.shape) == 3:
img = np.swapaxes(img, 1, 2)
img = np.swapaxes(img, 1, 0)
img = img[[2, 1, 0], :, :]
img = img.astype('float32')
img -= settings.img_mean
img = img.flatten()
img = img * 0.007843
sample_labels = np.array(sample_labels)
if mode == 'train' or mode == 'test':
if mode == 'train' and len(sample_labels) == 0: continue
yield img.astype(
'float32'
), sample_labels[:, 1:5], sample_labels[:, 0].astype(
'int32'), sample_labels[:, -1].astype('int32')
elif mode == 'infer':
yield img.astype('float32')
return reader
def train(settings, file_list, shuffle=True):
return _reader_creator(settings, file_list, 'train', shuffle)
def test(settings, file_list):
return _reader_creator(settings, file_list, 'test', False)
def infer(settings, file_list):
return _reader_creator(settings, file_list, 'infer', False)
import paddle.v2 as paddle
import paddle.fluid as fluid
import reader
import load_model as load_model
from mobilenet_ssd import mobile_net
from utility import add_arguments, print_arguments
import os
import numpy as np
import argparse
import functools
parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('parallel', bool, True, "Whether use parallel training.")
add_arg('use_gpu', bool, True, "Whether use GPU.")
# yapf: disable
def train(args,
train_file_list,
val_file_list,
data_args,
learning_rate,
batch_size,
num_passes,
model_save_dir='model',
init_model_path=None):
image_shape = [3, data_args.resize_h, data_args.resize_w]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
gt_box = fluid.layers.data(
name='gt_box', shape=[4], dtype='float32', lod_level=1)
gt_label = fluid.layers.data(
name='gt_label', shape=[1], dtype='int32', lod_level=1)
difficult = fluid.layers.data(
name='gt_difficult', shape=[1], dtype='int32', lod_level=1)
if args.parallel:
places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
image_ = pd.read_input(image)
gt_box_ = pd.read_input(gt_box)
gt_label_ = pd.read_input(gt_label)
difficult_ = pd.read_input(difficult)
locs, confs, box, box_var = mobile_net(image_, image_shape)
loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_,
box, box_var)
pd.write_output(loss)
pd.write_output(locs)
pd.write_output(confs)
pd.write_output(box)
pd.write_output(box_var)
loss, locs, confs, box, box_var = pd()
loss = fluid.layers.reduce_sum(loss)
else:
locs, confs, box, box_var = mobile_net(image, image_shape)
nmsed_out = fluid.layers.detection_output(
locs, mbox_confs, box, box_var, nms_threshold=0.45)
loss = fluid.layers.ssd_loss(locs, mbox_confs, gt_box, gt_label,
box, box_var)
loss = fluid.layers.reduce_sum(loss)
test_program = fluid.default_main_program().clone(for_test=True)
with fluid.program_guard(test_program):
nmsed_out = fluid.layers.detection_output(
locs, confs, box, box_var, nms_threshold=0.45)
map_eval = fluid.evaluator.DetectionMAP(
nmsed_out,
gt_label,
gt_box,
difficult,
21,
overlap_threshold=0.5,
evaluate_difficult=False,
ap_version='11point')
boundaries = [40000, 60000]
values = [0.001, 0.0005, 0.00025]
optimizer = fluid.optimizer.RMSProp(
learning_rate=fluid.layers.piecewise_decay(boundaries, values),
regularization=fluid.regularizer.L2Decay(0.00005), )
optimizer.minimize(loss)
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
load_model.load_and_set_vars(place)
#load_model.load_paddlev1_vars(place)
train_reader = paddle.batch(
reader.train(data_args, train_file_list), batch_size=batch_size)
test_reader = paddle.batch(
reader.test(data_args, val_file_list), batch_size=batch_size)
feeder = fluid.DataFeeder(
place=place, feed_list=[image, gt_box, gt_label, difficult])
#print 'test_program ', test_program
def test(pass_id):
_, accum_map = map_eval.get_map_var()
map_eval.reset(exe)
test_map = None
for _, data in enumerate(test_reader()):
test_map = exe.run(test_program,
feed=feeder.feed(data),
fetch_list=[accum_map])
print("Test {0}, map {1}".format(pass_id, test_map[0]))
#print 'main_program ', fluid.default_main_program()
for pass_id in range(num_passes):
for batch_id, data in enumerate(train_reader()):
loss_v = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[loss])
if batch_id % 20 == 0:
print("Pass {0}, batch {1}, loss {2}"
.format(pass_id, batch_id, loss_v[0]))
test(pass_id)
if pass_id % 10 == 0:
model_path = os.path.join(model_save_dir, str(pass_id))
print 'save models to %s' % (model_path)
fluid.io.save_inference_model(model_path, ['image'], [nmsed_out],
exe)
if __name__ == '__main__':
args = parser.parse_args()
print_arguments(args)
data_args = reader.Settings(
data_dir='./data',
label_file='label_list',
apply_distort=True,
apply_expand=True,
resize_h=300,
resize_w=300,
mean_value=[127.5, 127.5, 127.5])
train(args,
train_file_list='./data/trainval.txt',
val_file_list='./data/test.txt',
data_args=data_args,
learning_rate=0.001,
batch_size=32,
num_passes=300)
"""Contains common utility functions."""
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import distutils.util
import numpy as np
from paddle.fluid import core
def print_arguments(args):
"""Print argparse's arguments.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
parser.add_argument("name", default="Jonh", type=str, help="User name.")
args = parser.parse_args()
print_arguments(args)
:param args: Input argparse.Namespace for printing.
:type args: argparse.Namespace
"""
print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()):
print("%s: %s" % (arg, value))
print("------------------------------------------------")
def add_arguments(argname, type, default, help, argparser, **kwargs):
"""Add argparse's argument.
Usage:
.. code-block:: python
parser = argparse.ArgumentParser()
add_argument("name", str, "Jonh", "User name.", parser)
args = parser.parse_args()
"""
type = distutils.util.strtobool if type == bool else type
argparser.add_argument(
"--" + argname,
default=default,
type=type,
help=help + ' Default: %(default)s.',
**kwargs)
...@@ -141,15 +141,55 @@ def encoder_net(images, ...@@ -141,15 +141,55 @@ def encoder_net(images,
def ctc_train_net(images, label, args, num_classes): def ctc_train_net(images, label, args, num_classes):
regularizer = fluid.regularizer.L2Decay(args.l2) regularizer = fluid.regularizer.L2Decay(args.l2)
gradient_clip = None gradient_clip = None
fc_out = encoder_net( if args.parallel:
images, places = fluid.layers.get_places()
num_classes, pd = fluid.layers.ParallelDo(places)
regularizer=regularizer, with pd.do():
gradient_clip=gradient_clip) images_ = pd.read_input(images)
label_ = pd.read_input(label)
fc_out = encoder_net(
images_,
num_classes,
regularizer=regularizer,
gradient_clip=gradient_clip)
cost = fluid.layers.warpctc(
input=fc_out,
label=label_,
blank=num_classes,
norm_by_times=True)
sum_cost = fluid.layers.reduce_sum(cost)
decoded_out = fluid.layers.ctc_greedy_decoder(
input=fc_out, blank=num_classes)
pd.write_output(sum_cost)
pd.write_output(decoded_out)
sum_cost, decoded_out = pd()
sum_cost = fluid.layers.reduce_sum(sum_cost)
else:
fc_out = encoder_net(
images,
num_classes,
regularizer=regularizer,
gradient_clip=gradient_clip)
cost = fluid.layers.warpctc(
input=fc_out, label=label, blank=num_classes, norm_by_times=True)
sum_cost = fluid.layers.reduce_sum(cost)
decoded_out = fluid.layers.ctc_greedy_decoder(
input=fc_out, blank=num_classes)
cost = fluid.layers.warpctc( casted_label = fluid.layers.cast(x=label, dtype='int64')
input=fc_out, label=label, blank=num_classes, norm_by_times=True) error_evaluator = fluid.evaluator.EditDistance(
sum_cost = fluid.layers.reduce_sum(cost) input=decoded_out, label=casted_label)
inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program):
inference_program = fluid.io.get_inference_program(error_evaluator)
optimizer = fluid.optimizer.Momentum( optimizer = fluid.optimizer.Momentum(
learning_rate=args.learning_rate, momentum=args.momentum) learning_rate=args.learning_rate, momentum=args.momentum)
...@@ -166,7 +206,7 @@ def ctc_train_net(images, label, args, num_classes): ...@@ -166,7 +206,7 @@ def ctc_train_net(images, label, args, num_classes):
casted_label = fluid.layers.cast(x=label, dtype='int64') casted_label = fluid.layers.cast(x=label, dtype='int64')
error_evaluator = fluid.evaluator.EditDistance( error_evaluator = fluid.evaluator.EditDistance(
input=decoded_out, label=casted_label) input=decoded_out, label=casted_label)
return sum_cost, error_evaluator, model_average return sum_cost, error_evaluator, inference_program, model_average
def ctc_infer(images, num_classes): def ctc_infer(images, num_classes):
......
...@@ -27,7 +27,7 @@ add_arg('model_average', bool, True, "Whether to aevrage model for eva ...@@ -27,7 +27,7 @@ add_arg('model_average', bool, True, "Whether to aevrage model for eva
add_arg('min_average_window', int, 10000, "Min average window.") add_arg('min_average_window', int, 10000, "Min average window.")
add_arg('max_average_window', int, 15625, "Max average window.") add_arg('max_average_window', int, 15625, "Max average window.")
add_arg('average_window', float, 0.15, "Average window.") add_arg('average_window', float, 0.15, "Average window.")
add_arg('parallel', bool, True, "Whether use parallel training.")
# yapf: disable # yapf: disable
def load_parameter(place): def load_parameter(place):
...@@ -44,7 +44,7 @@ def train(args, data_reader=dummy_reader): ...@@ -44,7 +44,7 @@ def train(args, data_reader=dummy_reader):
# define network # define network
images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32') images = fluid.layers.data(name='pixel', shape=data_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1) label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1)
sum_cost, error_evaluator, model_average = ctc_train_net(images, label, args, num_classes) sum_cost, error_evaluator, inference_program, model_average = ctc_train_net(images, label, args, num_classes)
# data reader # data reader
train_reader = data_reader.train(args.batch_size) train_reader = data_reader.train(args.batch_size)
...@@ -57,8 +57,6 @@ def train(args, data_reader=dummy_reader): ...@@ -57,8 +57,6 @@ def train(args, data_reader=dummy_reader):
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
#load_parameter(place) #load_parameter(place)
inference_program = fluid.io.get_inference_program(error_evaluator)
for pass_id in range(args.pass_num): for pass_id in range(args.pass_num):
error_evaluator.reset(exe) error_evaluator.reset(exe)
batch_id = 1 batch_id = 1
......
# Policy Gradient RL by PaddlePaddle 运行本目录下的程序示例需要使用PaddlePaddle的最新develop分枝。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# Policy Gradient RL by PaddlePaddle
本文介绍了如何使用PaddlePaddle通过policy-based的强化学习方法来训练一个player(actor model), 我们希望这个player可以完成简单的走阶梯任务。 本文介绍了如何使用PaddlePaddle通过policy-based的强化学习方法来训练一个player(actor model), 我们希望这个player可以完成简单的走阶梯任务。
内容分为: 内容分为:
......
import numpy as np import numpy as np
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
# reproducible # reproducible
np.random.seed(1) np.random.seed(1)
......
# 命名实体识别
以下是本例的简要目录结构及说明:
```text
.
├── data # 存储运行本例所依赖的数据,从外部获取
├── network_conf.py # 模型定义
├── reader.py # 数据读取接口, 从外部获取
├── README.md # 文档
├── train.py # 训练脚本
├── infer.py # 预测脚本
├── utils.py # 定义通用的函数, 从外部获取
└── utils_extend.py # 对utils.py的拓展
```
## 简介,模型详解
在PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/README.md)中对于命名实体识别任务有较详细的介绍,在本例中不再重复介绍。
在模型上,我们沿用了v2版本的模型结构,唯一区别是我们使用LSTM代替原始的RNN。
## 数据获取
请参考PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/README.md) 一节中数据获取方式,将该例中的data文件夹拷贝至本例目录下,运行其中的download.sh脚本获取训练和测试数据。
## 通用脚本获取
请将PaddlePaddle v2版本[命名实体识别](https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/README.md)中提供的用于数据读取的文件[reader.py](https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/reader.py)以及包含字典导入等通用功能的文件[utils.py](https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/utils.py)复制到本目录下。本例将会使用到这两个脚本。
## 训练
1. 运行 `sh data/download.sh`
2. 修改 `train.py``main` 函数,指定数据路径
```python
main(
train_data_file="data/train",
test_data_file="data/test",
vocab_file="data/vocab.txt",
target_file="data/target.txt",
emb_file="data/wordVectors.txt",
model_save_dir="models",
num_passes=1000,
use_gpu=False,
parallel=False)
```
3. 运行命令 `python train.py`**需要注意:直接运行使用的是示例数据,请替换真实的标记数据。**
```text
Pass 127, Batch 9525, Cost 4.0867705, Precision 0.3954984, Recall 0.37846154, F1_score0.38679245
Pass 127, Batch 9530, Cost 3.137265, Precision 0.42971888, Recall 0.38351256, F1_score0.405303
Pass 127, Batch 9535, Cost 3.6240938, Precision 0.4272152, Recall 0.41795665, F1_score0.4225352
Pass 127, Batch 9540, Cost 3.5352352, Precision 0.48464164, Recall 0.4536741, F1_score0.46864685
Pass 127, Batch 9545, Cost 4.1130385, Precision 0.40131578, Recall 0.3836478, F1_score0.39228293
Pass 127, Batch 9550, Cost 3.6826708, Precision 0.43333334, Recall 0.43730888, F1_score0.43531203
Pass 127, Batch 9555, Cost 3.6363933, Precision 0.42424244, Recall 0.3962264, F1_score0.4097561
Pass 127, Batch 9560, Cost 3.6101768, Precision 0.51363635, Recall 0.353125, F1_score0.41851854
Pass 127, Batch 9565, Cost 3.5935276, Precision 0.5152439, Recall 0.5, F1_score0.5075075
Pass 127, Batch 9570, Cost 3.4987144, Precision 0.5, Recall 0.4330218, F1_score0.46410686
Pass 127, Batch 9575, Cost 3.4659843, Precision 0.39864865, Recall 0.38064516, F1_score0.38943896
Pass 127, Batch 9580, Cost 3.1702557, Precision 0.5, Recall 0.4490446, F1_score0.47315437
Pass 127, Batch 9585, Cost 3.1587276, Precision 0.49377593, Recall 0.4089347, F1_score0.4473684
Pass 127, Batch 9590, Cost 3.5043538, Precision 0.4556962, Recall 0.4600639, F1_score0.45786962
Pass 127, Batch 9595, Cost 2.981989, Precision 0.44981414, Recall 0.45149255, F1_score0.4506518
[TrainSet] pass_id:127 pass_precision:[0.46023396] pass_recall:[0.43197003] pass_f1_score:[0.44565433]
[TestSet] pass_id:127 pass_precision:[0.4708409] pass_recall:[0.47971722] pass_f1_score:[0.4752376]
```
## 预测
1. 修改 [infer.py](./infer.py)`infer` 函数,指定:需要测试的模型的路径、测试数据、字典文件,预测标记文件的路径,默认参数如下:
```python
infer(
model_path="models/params_pass_0",
batch_size=6,
test_data_file="data/test",
vocab_file="data/vocab.txt",
target_file="data/target.txt",
use_gpu=False
)
```
2. 在终端运行 `python infer.py`,开始测试,会看到如下预测结果(以下为训练70个pass所得模型的部分预测结果):
```text
leicestershire B-ORG B-LOC
extended O O
their O O
first O O
innings O O
by O O
DGDG O O
runs O O
before O O
being O O
bowled O O
out O O
for O O
296 O O
with O O
england B-LOC B-LOC
discard O O
andy B-PER B-PER
caddick I-PER I-PER
taking O O
three O O
for O O
DGDG O O
. O O
```
输出分为三列,以“\t” 分隔,第一列是输入的词语,第二列是标准结果,第三列为生成的标记结果。多条输入序列之间以空行分隔。
## 结果示例
<p align="center">
<img src="imgs/convergence_curve.png" width="80%" align="center"/><br/>
图1. 学习曲线, 横轴表示训练轮数,纵轴表示F1值
</p>
import numpy as np
import paddle.fluid as fluid
import paddle.v2 as paddle
from network_conf import ner_net
import reader
from utils import load_dict, load_reverse_dict
from utils_extend import to_lodtensor
def infer(model_path, batch_size, test_data_file, vocab_file, target_file,
use_gpu):
"""
use the model under model_path to predict the test data, the result will be printed on the screen
return nothing
"""
word_dict = load_dict(vocab_file)
word_reverse_dict = load_reverse_dict(vocab_file)
label_dict = load_dict(target_file)
label_reverse_dict = load_reverse_dict(target_file)
test_data = paddle.batch(
reader.data_reader(test_data_file, word_dict, label_dict),
batch_size=batch_size)
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
with fluid.scope_guard(inference_scope):
[inference_program, feed_target_names,
fetch_targets] = fluid.io.load_inference_model(model_path, exe)
for data in test_data():
word = to_lodtensor(map(lambda x: x[0], data), place)
mark = to_lodtensor(map(lambda x: x[1], data), place)
target = to_lodtensor(map(lambda x: x[2], data), place)
crf_decode = exe.run(
inference_program,
feed={"word": word,
"mark": mark,
"target": target},
fetch_list=fetch_targets,
return_numpy=False)
lod_info = (crf_decode[0].lod())[0]
np_data = np.array(crf_decode[0])
assert len(data) == len(lod_info) - 1
for sen_index in xrange(len(data)):
assert len(data[sen_index][0]) == lod_info[
sen_index + 1] - lod_info[sen_index]
word_index = 0
for tag_index in xrange(lod_info[sen_index],
lod_info[sen_index + 1]):
word = word_reverse_dict[data[sen_index][0][word_index]]
gold_tag = label_reverse_dict[data[sen_index][2][
word_index]]
tag = label_reverse_dict[np_data[tag_index][0]]
print word + "\t" + gold_tag + "\t" + tag
word_index += 1
print ""
if __name__ == "__main__":
infer(
model_path="models/params_pass_0",
batch_size=6,
test_data_file="data/test",
vocab_file="data/vocab.txt",
target_file="data/target.txt",
use_gpu=False)
import math
import paddle.fluid as fluid
from paddle.fluid.initializer import NormalInitializer
from utils import logger, load_dict, get_embedding
def ner_net(word_dict_len, label_dict_len, parallel, stack_num=2):
mark_dict_len = 2
word_dim = 50
mark_dim = 5
hidden_dim = 300
IS_SPARSE = True
embedding_name = 'emb'
def _net_conf(word, mark, target):
word_embedding = fluid.layers.embedding(
input=word,
size=[word_dict_len, word_dim],
dtype='float32',
is_sparse=IS_SPARSE,
param_attr=fluid.ParamAttr(
name=embedding_name, trainable=False))
mark_embedding = fluid.layers.embedding(
input=mark,
size=[mark_dict_len, mark_dim],
dtype='float32',
is_sparse=IS_SPARSE)
word_caps_vector = fluid.layers.concat(
input=[word_embedding, mark_embedding], axis=1)
mix_hidden_lr = 1
rnn_para_attr = fluid.ParamAttr(
initializer=NormalInitializer(
loc=0.0, scale=0.0),
learning_rate=mix_hidden_lr)
hidden_para_attr = fluid.ParamAttr(
initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3)),
learning_rate=mix_hidden_lr)
hidden = fluid.layers.fc(
input=word_caps_vector,
name="__hidden00__",
size=hidden_dim,
act="tanh",
bias_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3))),
param_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3))))
fea = []
for direction in ["fwd", "bwd"]:
for i in range(stack_num):
if i != 0:
hidden = fluid.layers.fc(
name="__hidden%02d_%s__" % (i, direction),
size=hidden_dim,
act="stanh",
bias_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=1.0)),
input=[hidden, rnn[0], rnn[1]],
param_attr=[
hidden_para_attr, rnn_para_attr, rnn_para_attr
])
rnn = fluid.layers.dynamic_lstm(
name="__rnn%02d_%s__" % (i, direction),
input=hidden,
size=hidden_dim,
candidate_activation='relu',
gate_activation='sigmoid',
cell_activation='sigmoid',
bias_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=1.0)),
is_reverse=(i % 2) if direction == "fwd" else not i % 2,
param_attr=rnn_para_attr)
fea += [hidden, rnn[0], rnn[1]]
rnn_fea = fluid.layers.fc(
size=hidden_dim,
bias_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3))),
act="stanh",
input=fea,
param_attr=[hidden_para_attr, rnn_para_attr, rnn_para_attr] * 2)
emission = fluid.layers.fc(
size=label_dict_len,
input=rnn_fea,
param_attr=fluid.ParamAttr(initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3))))
crf_cost = fluid.layers.linear_chain_crf(
input=emission,
label=target,
param_attr=fluid.ParamAttr(
name='crfw',
initializer=NormalInitializer(
loc=0.0, scale=(1. / math.sqrt(hidden_dim) / 3)),
learning_rate=mix_hidden_lr))
avg_cost = fluid.layers.mean(x=crf_cost)
return avg_cost, emission
word = fluid.layers.data(name='word', shape=[1], dtype='int64', lod_level=1)
mark = fluid.layers.data(name='mark', shape=[1], dtype='int64', lod_level=1)
target = fluid.layers.data(
name="target", shape=[1], dtype='int64', lod_level=1)
if parallel:
places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
word_ = pd.read_input(word)
mark_ = pd.read_input(mark)
target_ = pd.read_input(target)
avg_cost, emission_base = _net_conf(word_, mark_, target_)
pd.write_output(avg_cost)
pd.write_output(emission_base)
avg_cost_list, emission = pd()
avg_cost = fluid.layers.mean(x=avg_cost_list)
emission.stop_gradient = True
else:
avg_cost, emission = _net_conf(word, mark, target)
return avg_cost, emission, word, mark, target
import os
import math
import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid
import reader
from network_conf import ner_net
from utils import logger, load_dict
from utils_extend import to_lodtensor, get_embedding
def test(exe, chunk_evaluator, inference_program, test_data, place):
chunk_evaluator.reset(exe)
for data in test_data():
word = to_lodtensor(map(lambda x: x[0], data), place)
mark = to_lodtensor(map(lambda x: x[1], data), place)
target = to_lodtensor(map(lambda x: x[2], data), place)
acc = exe.run(inference_program,
feed={"word": word,
"mark": mark,
"target": target})
return chunk_evaluator.eval(exe)
def main(train_data_file, test_data_file, vocab_file, target_file, emb_file,
model_save_dir, num_passes, use_gpu, parallel):
if not os.path.exists(model_save_dir):
os.mkdir(model_save_dir)
BATCH_SIZE = 200
word_dict = load_dict(vocab_file)
label_dict = load_dict(target_file)
word_vector_values = get_embedding(emb_file)
word_dict_len = len(word_dict)
label_dict_len = len(label_dict)
avg_cost, feature_out, word, mark, target = ner_net(
word_dict_len, label_dict_len, parallel)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=1e-3)
sgd_optimizer.minimize(avg_cost)
crf_decode = fluid.layers.crf_decoding(
input=feature_out, param_attr=fluid.ParamAttr(name='crfw'))
chunk_evaluator = fluid.evaluator.ChunkEvaluator(
input=crf_decode,
label=target,
chunk_scheme="IOB",
num_chunk_types=int(math.ceil((label_dict_len - 1) / 2.0)))
inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program):
test_target = chunk_evaluator.metrics + chunk_evaluator.states
inference_program = fluid.io.get_inference_program(test_target)
train_reader = paddle.batch(
paddle.reader.shuffle(
reader.data_reader(train_data_file, word_dict, label_dict),
buf_size=20000),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
paddle.reader.shuffle(
reader.data_reader(test_data_file, word_dict, label_dict),
buf_size=20000),
batch_size=BATCH_SIZE)
place = fluid.CUDAPlace(0) if use_gpu else fluid.CPUPlace()
feeder = fluid.DataFeeder(feed_list=[word, mark, target], place=place)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
embedding_name = 'emb'
embedding_param = fluid.global_scope().find_var(embedding_name).get_tensor()
embedding_param.set(word_vector_values, place)
batch_id = 0
for pass_id in xrange(num_passes):
chunk_evaluator.reset(exe)
for data in train_reader():
cost, batch_precision, batch_recall, batch_f1_score = exe.run(
fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost] + chunk_evaluator.metrics)
if batch_id % 5 == 0:
print("Pass " + str(pass_id) + ", Batch " + str(
batch_id) + ", Cost " + str(cost[0]) + ", Precision " + str(
batch_precision[0]) + ", Recall " + str(batch_recall[0])
+ ", F1_score" + str(batch_f1_score[0]))
batch_id = batch_id + 1
pass_precision, pass_recall, pass_f1_score = chunk_evaluator.eval(exe)
print("[TrainSet] pass_id:" + str(pass_id) + " pass_precision:" + str(
pass_precision) + " pass_recall:" + str(pass_recall) +
" pass_f1_score:" + str(pass_f1_score))
pass_precision, pass_recall, pass_f1_score = test(
exe, chunk_evaluator, inference_program, test_reader, place)
print("[TestSet] pass_id:" + str(pass_id) + " pass_precision:" + str(
pass_precision) + " pass_recall:" + str(pass_recall) +
" pass_f1_score:" + str(pass_f1_score))
save_dirname = os.path.join(model_save_dir, "params_pass_%d" % pass_id)
fluid.io.save_inference_model(save_dirname, ['word', 'mark', 'target'],
[crf_decode], exe)
if __name__ == "__main__":
main(
train_data_file="data/train",
test_data_file="data/test",
vocab_file="data/vocab.txt",
target_file="data/target.txt",
emb_file="data/wordVectors.txt",
model_save_dir="models",
num_passes=1000,
use_gpu=False,
parallel=False)
import numpy as np
import paddle.fluid as fluid
def get_embedding(emb_file='data/wordVectors.txt'):
"""
Get the trained word vector.
"""
return np.loadtxt(emb_file, dtype='float32')
def to_lodtensor(data, place):
"""
convert data to lodtensor
"""
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Text Classification # Text Classification
## Data Preparation ## Data Preparation
......
...@@ -5,7 +5,7 @@ import argparse ...@@ -5,7 +5,7 @@ import argparse
import time import time
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from config import TrainConfig as conf from config import TrainConfig as conf
...@@ -89,12 +89,14 @@ def main(dict_path): ...@@ -89,12 +89,14 @@ def main(dict_path):
sgd_optimizer = fluid.optimizer.SGD(learning_rate=conf.learning_rate) sgd_optimizer = fluid.optimizer.SGD(learning_rate=conf.learning_rate)
sgd_optimizer.minimize(avg_cost) sgd_optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=prediction, label=label) batch_size_var = fluid.layers.create_tensor(dtype='int64')
batch_acc_var = fluid.layers.accuracy(
input=prediction, label=label, total=batch_size_var)
inference_program = fluid.default_main_program().clone() inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program): with fluid.program_guard(inference_program):
test_target = accuracy.metrics + accuracy.states inference_program = fluid.io.get_inference_program(
inference_program = fluid.io.get_inference_program(test_target) target_vars=[batch_acc_var, batch_size_var])
# The training data set. # The training data set.
train_reader = paddle.batch( train_reader = paddle.batch(
...@@ -119,31 +121,37 @@ def main(dict_path): ...@@ -119,31 +121,37 @@ def main(dict_path):
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
train_pass_acc_evaluator = fluid.average.WeightedAverage()
test_pass_acc_evaluator = fluid.average.WeightedAverage()
def test(exe): def test(exe):
accuracy.reset(exe) test_pass_acc_evaluator.reset()
for batch_id, data in enumerate(test_reader()): for batch_id, data in enumerate(test_reader()):
input_seq = to_lodtensor(map(lambda x: x[0], data), place) input_seq = to_lodtensor(map(lambda x: x[0], data), place)
y_data = np.array(map(lambda x: x[1], data)).astype("int64") y_data = np.array(map(lambda x: x[1], data)).astype("int64")
y_data = y_data.reshape([-1, 1]) y_data = y_data.reshape([-1, 1])
acc = exe.run(inference_program, b_acc, b_size = exe.run(inference_program,
feed={"words": input_seq, feed={"words": input_seq,
"label": y_data}) "label": y_data},
test_acc = accuracy.eval(exe) fetch_list=[batch_acc_var, batch_size_var])
test_pass_acc_evaluator.add(value=b_acc, weight=b_size)
test_acc = test_pass_acc_evaluator.eval()
return test_acc return test_acc
total_time = 0. total_time = 0.
for pass_id in xrange(conf.num_passes): for pass_id in xrange(conf.num_passes):
accuracy.reset(exe) train_pass_acc_evaluator.reset()
start_time = time.time() start_time = time.time()
for batch_id, data in enumerate(train_reader()): for batch_id, data in enumerate(train_reader()):
cost_val, acc_val = exe.run( cost_val, acc_val, size_val = exe.run(
fluid.default_main_program(), fluid.default_main_program(),
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost, accuracy.metrics[0]]) fetch_list=[avg_cost, batch_acc_var, batch_size_var])
pass_acc = accuracy.eval(exe) train_pass_acc_evaluator.add(value=acc_val, weight=size_val)
if batch_id and batch_id % conf.log_period == 0: if batch_id and batch_id % conf.log_period == 0:
print("Pass id: %d, batch id: %d, cost: %f, pass_acc %f" % print("Pass id: %d, batch id: %d, cost: %f, pass_acc: %f" %
(pass_id, batch_id, cost_val, pass_acc)) (pass_id, batch_id, cost_val,
train_pass_acc_evaluator.eval()))
end_time = time.time() end_time = time.time()
total_time += (end_time - start_time) total_time += (end_time - start_time)
pass_test_acc = test(exe) pass_test_acc = test(exe)
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 中国古诗生成 # 中国古诗生成
## 简介 ## 简介
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 使用循环神经网语言模型生成文本 # 使用循环神经网语言模型生成文本
语言模型(Language Model)是一个概率分布模型,简单来说,就是用来计算一个句子的概率的模型。利用它可以确定哪个词序列的可能性更大,或者给定若干个词,可以预测下一个最可能出现的词。语言模型是自然语言处理领域里一个重要的基础模型。 语言模型(Language Model)是一个概率分布模型,简单来说,就是用来计算一个句子的概率的模型。利用它可以确定哪个词序列的可能性更大,或者给定若干个词,可以预测下一个最可能出现的词。语言模型是自然语言处理领域里一个重要的基础模型。
......
此目录中代码示例PaddlePaddle所需版本至少为v0.11.0。如果您使用的PaddlePaddle版本早于v0.11.0, [请更新](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# 全球标准化阅读器
该模型实现以下功能:
Jonathan Raiman and John Miller. Globally Normalized Reader. Empirical Methods in Natural Language Processing (EMNLP), 2017
如果您在研究中使用数据集/代码,请引用上述论文:
```text
@inproceedings{raiman2015gnr,
author={Raiman, Jonathan and Miller, John},
booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
title={Globally Normalized Reader},
year={2017},
}
```
您也可以访问 https://github.com/baidu-research/GloballyNormalizedReader 以获取更多信息。
# 安装
1. 请使用 [docker image](http://doc.paddlepaddle.org/develop/doc/getstarted/build_and_install/docker_install_en.html) 安装最新的PaddlePaddle,运行方法:
```bash
docker pull paddledev/paddle
```
2. 下载所有必要的数据,运行方法:
```bash
cd data && ./download.sh && cd ..
```
3. 预处理并特征化数据:
```bash
python featurize.py --datadir data --outdir data/featurized --glove-path data/glove.840B.300d.txt
```
# 模型训练
- 根据需要修改config.py来配置模型,然后运行:
```bash
python train.py 2>&1 | tee train.log
```
# 使用训练过的模型推断
- 运行以下训练模型来推断:
```bash
python infer.py \
--model_path models/pass_00000.tar.gz \
--data_dir data/featurized/ \
--batch_size 2 \
--use_gpu 0 \
--trainer_count 1 \
2>&1 | tee infer.log
```
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Globally Normalized Reader # Globally Normalized Reader
This model implements the work in the following paper: This model implements the work in the following paper:
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# Hsigmoid加速词向量训练 # Hsigmoid加速词向量训练
## 背景介绍 ## 背景介绍
在自然语言处理领域中,传统做法通常使用one-hot向量来表示词,比如词典为['我', '你', '喜欢'],可以用[1,0,0]、[0,1,0]和[0,0,1]这三个向量分别表示'我'、'你'和'喜欢'。这种表示方式比较简洁,但是当词表很大时,容易产生维度爆炸问题;而且任意两个词的向量是正交的,向量包含的信息有限。为了避免或减轻one-hot表示的缺点,目前通常使用词向量来取代one-hot表示,词向量也就是word embedding,即使用一个低维稠密的实向量取代高维稀疏的one-hot向量。训练词向量的方法有很多种,神经网络模型是其中之一,包括CBOW、Skip-gram等,这些模型本质上都是一个分类模型,当词表较大即类别较多时,传统的softmax将非常消耗时间。PaddlePaddle提供了Hsigmoid Layer、NCE Layer,来加速模型的训练过程。本文主要介绍如何使用Hsigmoid Layer来加速训练,词向量相关内容请查阅PaddlePaddle Book中的[词向量章节](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec) 在自然语言处理领域中,传统做法通常使用one-hot向量来表示词,比如词典为['我', '你', '喜欢'],可以用[1,0,0]、[0,1,0]和[0,0,1]这三个向量分别表示'我'、'你'和'喜欢'。这种表示方式比较简洁,但是当词表很大时,容易产生维度爆炸问题;而且任意两个词的向量是正交的,向量包含的信息有限。为了避免或减轻one-hot表示的缺点,目前通常使用词向量来取代one-hot表示,词向量也就是word embedding,即使用一个低维稠密的实向量取代高维稀疏的one-hot向量。训练词向量的方法有很多种,神经网络模型是其中之一,包括CBOW、Skip-gram等,这些模型本质上都是一个分类模型,当词表较大即类别较多时,传统的softmax将非常消耗时间。PaddlePaddle提供了Hsigmoid Layer、NCE Layer,来加速模型的训练过程。本文主要介绍如何使用Hsigmoid Layer来加速训练,词向量相关内容请查阅PaddlePaddle Book中的[词向量章节](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)
......
运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
图像分类 图像分类
======================= =======================
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 排序学习(Learning To Rank) # 排序学习(Learning To Rank)
排序学习技术\[[1](#参考文献1)\]是构建排序模型的机器学习方法,在信息检索、自然语言处理,数据挖掘等机器学场景中具有重要作用。排序学习的主要目的是对给定一组文档,对任意查询请求给出反映相关性的文档排序。在本例子中,利用标注过的语料库训练两种经典排序模型RankNet[[4](#参考文献4)\]和LamdaRank[[6](#参考文献6)\],分别可以生成对应的排序模型,能够对任意查询请求,给出相关性文档排序。 排序学习技术\[[1](#参考文献1)\]是构建排序模型的机器学习方法,在信息检索、自然语言处理,数据挖掘等机器学场景中具有重要作用。排序学习的主要目的是对给定一组文档,对任意查询请求给出反映相关性的文档排序。在本例子中,利用标注过的语料库训练两种经典排序模型RankNet[[4](#参考文献4)\]和LamdaRank[[6](#参考文献6)\],分别可以生成对应的排序模型,能够对任意查询请求,给出相关性文档排序。
......
运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 带外部记忆机制的神经机器翻译 # 带外部记忆机制的神经机器翻译
**外部记忆**(External Memory)机制的神经机器翻译模型(Neural Machine Translation, NMT),是神经机器翻译模型的一个重要扩展。它引入可微分的记忆网络作为额外的记忆单元,拓展神经翻译模型内部工作记忆(Working Memory)的容量或带宽,辅助完成翻译等任务中信息的临时存取,改善模型表现。 **外部记忆**(External Memory)机制的神经机器翻译模型(Neural Machine Translation, NMT),是神经机器翻译模型的一个重要扩展。它引入可微分的记忆网络作为额外的记忆单元,拓展神经翻译模型内部工作记忆(Working Memory)的容量或带宽,辅助完成翻译等任务中信息的临时存取,改善模型表现。
...@@ -112,7 +116,7 @@ ...@@ -112,7 +116,7 @@
算法实现于以下几个文件中: 算法实现于以下几个文件中:
- `external_memory.py`: 主要实现简化版的 **神经图灵机**`ExternalMemory` 类,对外提供初始化和读写函数。 - `external_memory.py`: 主要实现简化版的 **神经图灵机**`ExternalMemory` 类,对外提供初始化和读写函数。
- `model.py`: 相关模型配置函数,包括双向 GPU 编码器(`bidirectional_gru_encoder`),带外部记忆强化的解码器(`memory_enhanced_decoder`),带外部记忆强化的序列到序列模型(`memory_enhanced_decoder`)。 - `model.py`: 相关模型配置函数,包括双向 GPU 编码器(`bidirectional_gru_encoder`),带外部记忆强化的解码器(`memory_enhanced_decoder`),带外部记忆强化的序列到序列模型(`memory_enhanced_seq2seq`)。
- `data_utils.py`: 相关数据处理辅助函数。 - `data_utils.py`: 相关数据处理辅助函数。
- `train.py`: 模型训练。 - `train.py`: 模型训练。
- `infer.py`: 部分示例样本的翻译(模型推断)。 - `infer.py`: 部分示例样本的翻译(模型推断)。
...@@ -166,6 +170,7 @@ class ExternalMemory(object): ...@@ -166,6 +170,7 @@ class ExternalMemory(object):
a learnable gate function. a learnable gate function.
:type enable_interpolation: bool :type enable_interpolation: bool
""" """
pass
def _content_addressing(self, key_vector): def _content_addressing(self, key_vector):
"""Get write/read head's addressing weights via content-based addressing. """Get write/read head's addressing weights via content-based addressing.
...@@ -190,6 +195,7 @@ class ExternalMemory(object): ...@@ -190,6 +195,7 @@ class ExternalMemory(object):
:param write_key: Key vector for write heads to generate writing :param write_key: Key vector for write heads to generate writing
content and addressing signals. content and addressing signals.
:type write_key: LayerOutput :type write_key: LayerOutput
"""
pass pass
def read(self, read_key): def read(self, read_key):
...@@ -406,7 +412,7 @@ paddle.dataset.wmt14.test(dict_size) ...@@ -406,7 +412,7 @@ paddle.dataset.wmt14.test(dict_size)
命令行输入: 命令行输入:
```bash ```bash
python mt_with_external_memory.py python train.py
``` ```
或自定义部分参数, 例如: 或自定义部分参数, 例如:
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 使用噪声对比估计加速语言模型训练 # 使用噪声对比估计加速语言模型训练
## 为什么需要噪声对比估计 ## 为什么需要噪声对比估计
...@@ -101,11 +105,11 @@ return paddle.layer.nce( ...@@ -101,11 +105,11 @@ return paddle.layer.nce(
NCE 层的一些重要参数解释如下: NCE 层的一些重要参数解释如下:
| 参数名 | 参数作用 | 介绍 | | 参数名 | 参数作用 | 介绍 |
|:------ |:-------| :--------| | :----------------------- | :--------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------- |
| param\_attr / bias\_attr | 用来设置参数名字 |方便预测阶段加载参数,具体在预测一节中介绍。| | param\_attr / bias\_attr | 用来设置参数名字 | 方便预测阶段加载参数,具体在预测一节中介绍。 |
| num\_neg\_samples | 负样本采样个数|可以控制正负样本比例,这个值取值区间为 [1, 字典大小-1],负样本个数越多则整个模型的训练速度越慢,模型精度也会越高 | | num\_neg\_samples | 负样本采样个数 | 可以控制正负样本比例,这个值取值区间为 [1, 字典大小-1],负样本个数越多则整个模型的训练速度越慢,模型精度也会越高 |
| neg\_distribution | 生成负样例标签的分布,默认是一个均匀分布| 可以自行控制负样本采样时各个类别的采样权重。例如:希望正样例为“晴天”时,负样例“洪水”在训练时更被着重区分,则可以将“洪水”这个类别的采样权重增加| | neg\_distribution | 生成负样例标签的分布,默认是一个均匀分布 | 可以自行控制负样本采样时各个类别的采样权重。例如:希望正样例为“晴天”时,负样例“洪水”在训练时更被着重区分,则可以将“洪水”这个类别的采样权重增加 |
## 预测 ## 预测
1. 在命令行运行 : 1. 在命令行运行 :
......
运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 基于双层序列的文本分类 # 基于双层序列的文本分类
## 简介 ## 简介
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering # Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering
This model implements the work in the following paper: This model implements the work in the following paper:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Neural Machine Translation Model # Neural Machine Translation Model
## Background Introduction ## Background Introduction
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 场景文字识别 (STR, Scene Text Recognition) # 场景文字识别 (STR, Scene Text Recognition)
## STR任务简介 ## STR任务简介
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# Scheduled Sampling # Scheduled Sampling
## 概述 ## 概述
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 命名实体识别 # 命名实体识别
以下是本例的简要目录结构及说明: 以下是本例的简要目录结构及说明:
...@@ -21,16 +25,16 @@ ...@@ -21,16 +25,16 @@
命名实体识别(Named Entity Recognition,NER)又称作“专名识别”,是指识别文本中具有特定意义的实体,主要包括人名、地名、机构名、专有名词等,是自然语言处理研究的一个基础问题。NER任务通常包括实体边界识别、确定实体类别两部分,可以将其作为序列标注问题解决。 命名实体识别(Named Entity Recognition,NER)又称作“专名识别”,是指识别文本中具有特定意义的实体,主要包括人名、地名、机构名、专有名词等,是自然语言处理研究的一个基础问题。NER任务通常包括实体边界识别、确定实体类别两部分,可以将其作为序列标注问题解决。
序列标注可以分为Sequence Classification、Segment Classification和Temporal Classification三类[[1](#参考文献)],本例只考虑Segment Classification,即对输入序列中的每个元素在输出序列中给出对应的标签。对于NER任务,由于需要标识边界,一般采用[BIO标注方法](http://book.paddlepaddle.org/07.label_semantic_roles/)定义的标签集,如下是一个NER的标注结果示例: 序列标注可以分为Sequence Classification、Segment Classification和Temporal Classification三类[[1](#参考文献)],本例只考虑Segment Classification,即对输入序列中的每个元素在输出序列中给出对应的标签。对于NER任务,由于需要标识边界,一般采用[BIO标注方法](http://www.paddlepaddle.org/docs/develop/book/07.label_semantic_roles/index.cn.html)定义的标签集,如下是一个NER的标注结果示例:
<p align="center"> <p align="center">
<img src="images/ner_label_ins.png" width="80%" align="center"/><br/> <img src="images/ner_label_ins.png" width="80%" align="center"/><br/>
图1. BIO标注方法示例 图1. BIO标注方法示例
</p> </p>
根据序列标注结果可以直接得到实体边界和实体类别。类似的,分词、词性标注、语块识别、[语义角色标注](http://book.paddlepaddle.org/07.label_semantic_roles/index.cn.html)等任务都可通过序列标注来解决。使用神经网络模型解决问题的思路通常是:前层网络学习输入的特征表示,网络的最后一层在特征基础上完成最终的任务;对于序列标注问题,通常:使用基于RNN的网络结构学习特征,将学习到的特征接入CRF完成序列标注。实际上是将传统CRF中的线性模型换成了非线性神经网络。沿用CRF的出发点是:CRF使用句子级别的似然概率,能够更好的解决标记偏置问题[[2](#参考文献)]。本例也将基于此思路建立模型。虽然,这里以NER任务作为示例,但所给出的模型可以应用到其他各种序列标注任务中。 根据序列标注结果可以直接得到实体边界和实体类别。类似的,分词、词性标注、语块识别、[语义角色标注](http://www.paddlepaddle.org/docs/develop/book/07.label_semantic_roles/index.cn.html)等任务都可通过序列标注来解决。使用神经网络模型解决问题的思路通常是:前层网络学习输入的特征表示,网络的最后一层在特征基础上完成最终的任务;对于序列标注问题,通常:使用基于RNN的网络结构学习特征,将学习到的特征接入CRF完成序列标注。实际上是将传统CRF中的线性模型换成了非线性神经网络。沿用CRF的出发点是:CRF使用句子级别的似然概率,能够更好的解决标记偏置问题[[2](#参考文献)]。本例也将基于此思路建立模型。虽然,这里以NER任务作为示例,但所给出的模型可以应用到其他各种序列标注任务中。
由于序列标注问题的广泛性,产生了[CRF](http://book.paddlepaddle.org/07.label_semantic_roles/index.cn.html)等经典的序列模型,这些模型大多只能使用局部信息或需要人工设计特征。随着深度学习研究的发展,循环神经网络(Recurrent Neural Network,RNN等 序列模型能够处理序列元素之间前后关联问题,能够从原始输入文本中学习特征表示,而更加适合序列标注任务,更多相关知识可参考PaddleBook中[语义角色标注](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md)一课。 由于序列标注问题的广泛性,产生了[CRF](http://www.paddlepaddle.org/docs/develop/book/07.label_semantic_roles/index.cn.html)等经典的序列模型,这些模型大多只能使用局部信息或需要人工设计特征。随着深度学习研究的发展,循环神经网络(Recurrent Neural Network,RNN等 序列模型能够处理序列元素之间前后关联问题,能够从原始输入文本中学习特征表示,而更加适合序列标注任务,更多相关知识可参考PaddleBook中[语义角色标注](https://github.com/PaddlePaddle/book/blob/develop/07.label_semantic_roles/README.cn.md)一课。
## 模型详解 ## 模型详解
...@@ -88,14 +92,14 @@ Baghdad NNP I-NP I-LOC ...@@ -88,14 +92,14 @@ Baghdad NNP I-NP I-LOC
预处理完成后,一条训练样本包含3个部分作为神经网络的输入信息用于训练:(1)句子序列;(2)首字母大写标记序列;(3)标注序列,下表是一条训练样本的示例: 预处理完成后,一条训练样本包含3个部分作为神经网络的输入信息用于训练:(1)句子序列;(2)首字母大写标记序列;(3)标注序列,下表是一条训练样本的示例:
| 句子序列 | 大写标记序列 | 标注序列 | | 句子序列 | 大写标记序列 | 标注序列 |
|---|---|---| | -------- | ------------ | -------- |
| u.n. | 1 | B-ORG | | u.n. | 1 | B-ORG |
| official | 0 | O | | official | 0 | O |
| ekeus | 1 | B-PER | | ekeus | 1 | B-PER |
| heads | 0 | O | | heads | 0 | O |
| for | 0 | O | | for | 0 | O |
| baghdad | 1 | B-LOC | | baghdad | 1 | B-LOC |
| . | 0 | O | | . | 0 | O |
## 运行 ## 运行
### 编写数据读取接口 ### 编写数据读取接口
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# SSD目标检测 # SSD目标检测
## 概述 ## 概述
SSD全称:Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一\[[1](#引用)\],有着检测速度快且检测精度高的有的。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle中的SSD模型进行目标检测。下文首先简要介绍SSD原理,然后介绍示例包含文件及如何使用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。 SSD全称:Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一\[[1](#引用)\],有着检测速度快且检测精度高的有的。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle中的SSD模型进行目标检测。下文首先简要介绍SSD原理,然后介绍示例包含文件及如何使用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Single Shot MultiBox Detector (SSD) Object Detection # Single Shot MultiBox Detector (SSD) Object Detection
## Introduction ## Introduction
......
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# 文本分类 # 文本分类
以下是本例目录包含的文件以及对应说明: 以下是本例目录包含的文件以及对应说明:
...@@ -129,70 +133,70 @@ negative 0.0300 0.9700 i love scifi and am willing to put up with a lot ...@@ -129,70 +133,70 @@ negative 0.0300 0.9700 i love scifi and am willing to put up with a lot
1. 数据组织 1. 数据组织
假设有如下格式的训练数据:每一行为一条样本,以 `\t` 分隔,第一列是类别标签,第二列是输入文本的内容,文本内容中的词语以空格分隔。以下是两条示例数据: 假设有如下格式的训练数据:每一行为一条样本,以 `\t` 分隔,第一列是类别标签,第二列是输入文本的内容,文本内容中的词语以空格分隔。以下是两条示例数据:
``` ```
positive PaddlePaddle is good positive PaddlePaddle is good
negative What a terrible weather negative What a terrible weather
``` ```
2. 编写数据读取接口 2. 编写数据读取接口
自定义数据读取接口只需编写一个 Python 生成器实现**从原始输入文本中解析一条训练样本**的逻辑。以下代码片段实现了读取原始数据返回类型为: `paddle.data_type.integer_value_sequence`(词语在字典的序号)和 `paddle.data_type.integer_value`(类别标签)的 2 个输入给网络中定义的 2 个 `data_layer` 的功能。 自定义数据读取接口只需编写一个 Python 生成器实现**从原始输入文本中解析一条训练样本**的逻辑。以下代码片段实现了读取原始数据返回类型为: `paddle.data_type.integer_value_sequence`(词语在字典的序号)和 `paddle.data_type.integer_value`(类别标签)的 2 个输入给网络中定义的 2 个 `data_layer` 的功能。
```python ```python
def train_reader(data_dir, word_dict, label_dict): def train_reader(data_dir, word_dict, label_dict):
def reader(): def reader():
UNK_ID = word_dict["<UNK>"] UNK_ID = word_dict["<UNK>"]
word_col = 0 word_col = 0
lbl_col = 1 lbl_col = 1
for file_name in os.listdir(data_dir): for file_name in os.listdir(data_dir):
with open(os.path.join(data_dir, file_name), "r") as f: with open(os.path.join(data_dir, file_name), "r") as f:
for line in f: for line in f:
line_split = line.strip().split("\t") line_split = line.strip().split("\t")
word_ids = [ word_ids = [
word_dict.get(w, UNK_ID) word_dict.get(w, UNK_ID)
for w in line_split[word_col].split() for w in line_split[word_col].split()
] ]
yield word_ids, label_dict[line_split[lbl_col]] yield word_ids, label_dict[line_split[lbl_col]]
return reader return reader
``` ```
- 关于 PaddlePaddle 中 `data_layer` 接受输入数据的类型,以及数据读取接口对应该返回数据的格式,请参考 [input-types](http://www.paddlepaddle.org/release_doc/0.9.0/doc_cn/ui/data_provider/pydataprovider2.html#input-types) 一节。 - 关于 PaddlePaddle 中 `data_layer` 接受输入数据的类型,以及数据读取接口对应该返回数据的格式,请参考 [input-types](http://www.paddlepaddle.org/release_doc/0.9.0/doc_cn/ui/data_provider/pydataprovider2.html#input-types) 一节。
- 以上代码片段详见本例目录下的 `reader.py` 脚本,`reader.py` 同时提供了读取测试数据的全部代码。 - 以上代码片段详见本例目录下的 `reader.py` 脚本,`reader.py` 同时提供了读取测试数据的全部代码。
接下来,只需要将数据读取函数 `train_reader` 作为参数传递给 `train.py` 脚本中的 `paddle.batch` 接口即可使用自定义数据接口读取数据,调用方式如下: 接下来,只需要将数据读取函数 `train_reader` 作为参数传递给 `train.py` 脚本中的 `paddle.batch` 接口即可使用自定义数据接口读取数据,调用方式如下:
```python ```python
train_reader = paddle.batch( train_reader = paddle.batch(
paddle.reader.shuffle( paddle.reader.shuffle(
reader.train_reader(train_data_dir, word_dict, lbl_dict), reader.train_reader(train_data_dir, word_dict, lbl_dict),
buf_size=1000), buf_size=1000),
batch_size=batch_size) batch_size=batch_size)
``` ```
3. 修改命令行参数 3. 修改命令行参数
- 如果将数据组织成示例数据的同样的格式,只需在 `run.sh` 脚本中修改 `train.py` 启动参数,指定 `train_data_dir` 参数,可以直接运行本例,无需修改数据读取接口 `reader.py`。 - 如果将数据组织成示例数据的同样的格式,只需在 `run.sh` 脚本中修改 `train.py` 启动参数,指定 `train_data_dir` 参数,可以直接运行本例,无需修改数据读取接口 `reader.py`。
- 执行 `python train.py --help` 可以获取`train.py` 脚本各项启动参数的详细说明,主要参数如下: - 执行 `python train.py --help` 可以获取`train.py` 脚本各项启动参数的详细说明,主要参数如下:
- `nn_type`:选择要使用的模型,目前支持两种:“dnn” 或者 “cnn”。 - `nn_type`:选择要使用的模型,目前支持两种:“dnn” 或者 “cnn”。
- `train_data_dir`:指定训练数据所在的文件夹,使用自定义数据训练,必须指定此参数,否则使用`paddle.dataset.imdb`训练,同时忽略`test_data_dir`,`word_dict`,和 `label_dict` 参数。 - `train_data_dir`:指定训练数据所在的文件夹,使用自定义数据训练,必须指定此参数,否则使用`paddle.dataset.imdb`训练,同时忽略`test_data_dir`,`word_dict`,和 `label_dict` 参数。
- `test_data_dir`:指定测试数据所在的文件夹,若不指定将不进行测试。 - `test_data_dir`:指定测试数据所在的文件夹,若不指定将不进行测试。
- `word_dict`:字典文件所在的路径,若不指定,将从训练数据根据词频统计,自动建立字典。 - `word_dict`:字典文件所在的路径,若不指定,将从训练数据根据词频统计,自动建立字典。
- `label_dict`:类别标签字典,用于将字符串类型的类别标签,映射为整数类型的序号。 - `label_dict`:类别标签字典,用于将字符串类型的类别标签,映射为整数类型的序号。
- `batch_size`:指定多少条样本后进行一次神经网络的前向运行及反向更新。 - `batch_size`:指定多少条样本后进行一次神经网络的前向运行及反向更新。
- `num_passes`:指定训练多少个轮次。 - `num_passes`:指定训练多少个轮次。
### 如何预测 ### 如何预测
1. 修改 `infer.py` 中以下变量,指定使用的模型、指定测试数据。 1. 修改 `infer.py` 中以下变量,指定使用的模型、指定测试数据。
```python ```python
model_path = "dnn_params_pass_00000.tar.gz" # 指定模型所在的路径 model_path = "dnn_params_pass_00000.tar.gz" # 指定模型所在的路径
nn_type = "dnn" # 指定测试使用的模型 nn_type = "dnn" # 指定测试使用的模型
test_dir = "./data/test" # 指定测试文件所在的目录 test_dir = "./data/test" # 指定测试文件所在的目录
word_dict = "./data/dict/word_dict.txt" # 指定字典所在的路径 word_dict = "./data/dict/word_dict.txt" # 指定字典所在的路径
label_dict = "./data/dict/label_dict.txt" # 指定类别标签字典的路径 label_dict = "./data/dict/label_dict.txt" # 指定类别标签字典的路径
``` ```
2. 在终端中执行 `python infer.py` 2. 在终端中执行 `python infer.py`
运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。
---
# Youtube DNN推荐模型 # Youtube DNN推荐模型
以下是本例目录包含的文件以及对应说明: 以下是本例目录包含的文件以及对应说明:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Neural Networks for YouTube Recommendations # Deep Neural Networks for YouTube Recommendations
## Introduction\[[1](#References)\] ## Introduction\[[1](#References)\]
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册