Commit 62ff2e43 authored by: Y Yibing Liu

Init deep attention matching net

Parent 3984c24b
# __Deep Attention Matching Network__
This is the source code of the Deep Attention Matching network (DAM), which is proposed for multi-turn response selection in retrieval-based chatbots.
DAM is a neural matching network based entirely on the attention mechanism. The motivation of DAM is to capture the semantic dependencies among dialogue elements at different levels of granularity in multi-turn conversation as matching evidence, so as to better match a response candidate with its multi-turn context. DAM will appear at ACL 2018; please find our paper at: http://acl2018.org/conference/accepted-papers/.
## __TensorFlow Version__
DAM was originally implemented with TensorFlow, which can be found at: https://github.com/baidu/Dialogue/DAM . We highly recommend using the PaddlePaddle Fluid version here, as it supports parallel training on very large corpora.
## __Network__
DAM is inspired by the Transformer in machine translation (Vaswani et al., 2017). We extend the key attention mechanism of the Transformer from two perspectives and combine both kinds of attention in one uniform neural network.
- **self-attention**: gradually captures semantic representations at different granularities by stacking attention over word-level embeddings. These multi-grained semantic representations facilitate exploring segment-level dependencies between context and response.
- **cross-attention**: attention across context and response generally captures the relevance between segment pairs, which provides complementary information to textual relevance for matching a response with its multi-turn context.
<p align="center">
<img src="images/Figure1.png"/> <br />
Overview of Deep Attention Matching Network
</p>
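For readers new to attentive modules, here is a minimal, framework-free NumPy sketch (not part of this repo) of the masked scaled dot-product attention that both kinds of attention build on; it mirrors `dot_product_attention` in `utils/layers.py` below, and the tensor shapes are illustrative assumptions.
```
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention_np(query, key, value, q_mask, k_mask):
    # query: [batch, Q_time, d]; key, value: [batch, time, d]
    # q_mask: [batch, Q_time]; k_mask: [batch, time]; 1 for real tokens, 0 for padding
    d_key = query.shape[-1]
    logits = np.matmul(query, key.transpose(0, 2, 1)) * d_key ** -0.5
    mask = np.matmul(q_mask[..., None], k_mask[:, None, :])  # [batch, Q_time, time]
    logits = mask * logits + (1 - mask) * (-2 ** 32 + 1)     # suppress padded positions
    attention = softmax(logits, axis=-1)
    return np.matmul(attention, value)                       # [batch, Q_time, d]
```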
## __Results__
We test DAM on two large-scale multi-turn response selection tasks, i.e., the Ubuntu Corpus v1 and the Douban Conversation Corpus. Experimental results are shown below:
<p align="center">
<img src="images/Figure2.png"/> <br />
</p>
## __Usage__
First, please download [data](https://pan.baidu.com/s/1hakfuuwdS8xl7NyxlWzRiQ "data") and unzip it:
```
cd data
unzip data.zip
```
If you want to use the well-trained models directly, please download [models](https://pan.baidu.com/s/1pl4d63MBxihgrEWWfdAz0w "models") and unzip them:
```
cd output
unzip output.zip
```
Train and test the model by:
```
sh run.sh
```
## __Dependencies__
- Python >= 2.7.3
- PaddlePaddle (latest develop branch)
## __Citation__
The following article describes DAM in detail. We recommend citing it by default.
```
@inproceedings{ ,
title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network},
author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu},
booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
volume={1},
pages={ -- },
year={2018}
}
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u test_and_evaluate.py --use_cuda \
--data_path ./data/data.pkl \
--save_path ./ \
--model_path models/step_10000 \
--batch_size 100 \
--vocab_size 172130 \
--emb_size 200 \
--_EOS_ 1
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u ../train_and_evaluate.py --use_cuda \
--data_path ./data/data.pkl \
--save_path ./models \
--batch_size 100 \
--vocab_size 172130 \
--emb_size 200 \
--_EOS_ 1
import cPickle as pickle
import numpy as np
import paddle.fluid as fluid
import utils.layers as layers
class Net(object):
def __init__(self, max_turn_num, max_turn_len, vocab_size, emb_size,
stack_num):
self._max_turn_num = max_turn_num
self._max_turn_len = max_turn_len
self._vocab_size = vocab_size
self._emb_size = emb_size
self._stack_num = stack_num
self.word_emb_name = "shared_word_emb"
def create_network(self):
turns_data = []
for i in xrange(self._max_turn_num):
turn = fluid.layers.data(
name="turn_%d" % i,
shape=[self._max_turn_len, 1],
dtype="int32")
turns_data.append(turn)
turns_mask = []
for i in xrange(self._max_turn_num):
turn_mask = fluid.layers.data(
name="turn_mask_%d" % i,
shape=[self._max_turn_len],
dtype="float32")
turns_mask.append(turn_mask)
response = fluid.layers.data(
name="response", shape=[self._max_turn_len, 1], dtype="int32")
response_mask = fluid.layers.data(
name="response_mask", shape=[self._max_turn_len], dtype="float32")
label = fluid.layers.data(name="label", shape=[1], dtype="float32")
response_emb = fluid.layers.embedding(
input=response,
size=[self._vocab_size + 1, self._emb_size],
param_attr=fluid.ParamAttr(
name=self.word_emb_name,
initializer=fluid.initializer.Normal(scale=0.1)))
# response part
Hr = response_emb
Hr_stack = [Hr]
for index in range(self._stack_num):
Hr = layers.block(
name="response_self_stack" + str(index),
query=Hr,
key=Hr,
value=Hr,
d_key=self._emb_size,
q_mask=response_mask,
k_mask=response_mask)
Hr_stack.append(Hr)
# context part
sim_turns = []
for t in xrange(self._max_turn_num):
Hu = fluid.layers.embedding(
input=turns_data[t],
size=[self._vocab_size + 1, self._emb_size],
param_attr=fluid.ParamAttr(
name=self.word_emb_name,
initializer=fluid.initializer.Normal(scale=0.1)))
Hu_stack = [Hu]
for index in range(self._stack_num):
# share parameters
Hu = layers.block(
name="turn_self_stack" + str(index),
query=Hu,
key=Hu,
value=Hu,
d_key=self._emb_size,
q_mask=turns_mask[t],
k_mask=turns_mask[t])
Hu_stack.append(Hu)
# cross attention
r_a_t_stack = []
t_a_r_stack = []
for index in range(self._stack_num + 1):
t_a_r = layers.block(
name="t_attend_r_" + str(index),
query=Hu_stack[index],
key=Hr_stack[index],
value=Hr_stack[index],
d_key=self._emb_size,
q_mask=turns_mask[t],
k_mask=response_mask)
r_a_t = layers.block(
name="r_attend_t_" + str(index),
query=Hr_stack[index],
key=Hu_stack[index],
value=Hu_stack[index],
d_key=self._emb_size,
q_mask=response_mask,
k_mask=turns_mask[t])
t_a_r_stack.append(t_a_r)
r_a_t_stack.append(r_a_t)
t_a_r_stack.extend(Hu_stack)
r_a_t_stack.extend(Hr_stack)
for index in xrange(len(t_a_r_stack)):
t_a_r_stack[index] = fluid.layers.unsqueeze(
input=t_a_r_stack[index], axes=[1])
r_a_t_stack[index] = fluid.layers.unsqueeze(
input=r_a_t_stack[index], axes=[1])
t_a_r = fluid.layers.concat(input=t_a_r_stack, axis=1)
r_a_t = fluid.layers.concat(input=r_a_t_stack, axis=1)
# sim shape: [batch_size, 2*(stack_num+1), max_turn_len, max_turn_len]
sim = fluid.layers.matmul(x=t_a_r, y=r_a_t, transpose_y=True)
sim = fluid.layers.scale(x=sim, scale=1 / np.sqrt(200.0))
sim_turns.append(sim)
for index in xrange(len(sim_turns)):
sim_turns[index] = fluid.layers.unsqueeze(
input=sim_turns[index], axes=[2])
# sim shape: [batch_size, 2*(stack_num+1), max_turn_num, max_turn_len, max_turn_len]
sim = fluid.layers.concat(input=sim_turns, axis=2)
# for douban
final_info = layers.cnn_3d(sim, 16, 16)
loss, logits = layers.loss(final_info, label)
return loss, logits
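As a small aside (not part of the original files), the following NumPy-only sketch checks the shape bookkeeping of the similarity cube assembled in `create_network` above: with `stack_num` attentive blocks there are `stack_num + 1` representations per side, and concatenating the cross-attended stack with the self-attended stack gives `2 * (stack_num + 1)` channels per turn. The zero tensors are placeholders for illustration only.
```
import numpy as np

batch, stack_num, max_turn_len, emb_size = 2, 5, 50, 200
# stand-ins for Hu_stack / Hr_stack and for the cross-attention outputs
Hu_stack = [np.zeros((batch, max_turn_len, emb_size)) for _ in range(stack_num + 1)]
Hr_stack = [np.zeros((batch, max_turn_len, emb_size)) for _ in range(stack_num + 1)]
t_a_r_stack = [np.zeros((batch, max_turn_len, emb_size)) for _ in range(stack_num + 1)]
r_a_t_stack = [np.zeros((batch, max_turn_len, emb_size)) for _ in range(stack_num + 1)]

t_a_r = np.stack(t_a_r_stack + Hu_stack, axis=1)  # [batch, 2*(stack_num+1), len, emb]
r_a_t = np.stack(r_a_t_stack + Hr_stack, axis=1)
sim = np.matmul(t_a_r, r_a_t.transpose(0, 1, 3, 2)) / np.sqrt(emb_size)
print(sim.shape)  # (2, 12, 50, 50), i.e. [batch, 2*(stack_num+1), max_turn_len, max_turn_len]
```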
import os
import numpy as np
import time
import argparse
import multiprocessing
import paddle
import paddle.fluid as fluid
import utils.reader as reader
import cPickle as pickle
from utils.util import print_arguments
import utils.evaluation as eva
from model import Net
#yapf: disable
def parse_args():
parser = argparse.ArgumentParser("Test for DAM.")
parser.add_argument(
'--batch_size',
type=int,
default=256,
help='Batch size for training. (default: %(default)d)')
parser.add_argument(
'--num_scan_data',
type=int,
default=2,
help='Number of pass for training. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=1e-3,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--data_path',
type=str,
default="data/ubuntu/data_small.pkl",
help='Path to training data. (default: %(default)s)')
parser.add_argument(
'--save_path',
type=str,
default="./",
help='Path to save score and result files. (default: %(default)s)')
parser.add_argument(
'--model_path',
type=str,
default="saved_models/step_1000",
help='Path to load well-trained models. (default: %(default)s)')
parser.add_argument(
'--use_cuda',
action='store_true',
help='If set, use cuda for training.')
parser.add_argument(
'--max_turn_num',
type=int,
default=9,
help='Maximum number of utterances in context.')
parser.add_argument(
'--max_turn_len',
type=int,
default=50,
help='Maximum length of sentences in turns.')
parser.add_argument(
'--word_emb_init',
type=str,
default=None,
help='Path to the initial word embedding.')
parser.add_argument(
'--vocab_size',
type=int,
default=434512,
help='The size of vocabulary.')
parser.add_argument(
'--emb_size',
type=int,
default=200,
help='The dimension of word embedding.')
parser.add_argument(
'--_EOS_',
type=int,
default=28270,
help='The id for end of sentence in vocabulary.')
parser.add_argument(
'--stack_num',
type=int,
default=5,
help='The number of stacked attentive modules in network.')
args = parser.parse_args()
return args
#yapf: enable
def test(args):
if not os.path.exists(args.save_path):
raise ValueError("Invalid save path %s" % args.save_path)
if not os.path.exists(args.model_path):
raise ValueError("Invalid model init path %s" % args.model_path)
# data config
data_conf = {
"batch_size": args.batch_size,
"max_turn_num": args.max_turn_num,
"max_turn_len": args.max_turn_len,
"_EOS_": args._EOS_,
}
dam = Net(args.max_turn_num, args.max_turn_len, args.vocab_size,
args.emb_size, args.stack_num)
loss, logits = dam.create_network()
loss.persistable = True
# gradient clipping
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
max=1.0, min=-1.0))
test_program = fluid.default_main_program().clone(for_test=True)
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.exponential_decay(
learning_rate=args.learning_rate,
decay_steps=400,
decay_rate=0.9,
staircase=True))
optimizer.minimize(loss)
# The fetched loss is wrong when memory optimization is enabled
fluid.memory_optimize(fluid.default_main_program())
if args.use_cuda:
place = fluid.CUDAPlace(0)
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = multiprocessing.cpu_count()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
fluid.io.load_persistables(exe, args.model_path)
test_exe = fluid.ParallelExecutor(
use_cuda=args.use_cuda, main_program=test_program)
print("start loading data ...")
train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb'))
print("finish loading data ...")
test_batches = reader.build_batches(test_data, data_conf)
test_batch_num = len(test_batches["response"])
print("test batch num: %d" % test_batch_num)
print("begin inference ...")
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
score_path = os.path.join(args.save_path, 'score.txt')
score_file = open(score_path, 'w')
for it in xrange(test_batch_num // dev_count):
feed_list = []
for dev in xrange(dev_count):
index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(test_batches, index)
feed_list.append(feed_dict)
predicts = test_exe.run(feed=feed_list, fetch_list=[logits.name])
scores = np.array(predicts[0])
print("step = %d" % it)
for dev in xrange(dev_count):
index = it * dev_count + dev
for i in xrange(args.batch_size):
score_file.write(
str(scores[args.batch_size * dev + i][0]) + '\t' + str(
test_batches["label"][index][i]) + '\n')
score_file.close()
#write evaluation result
result = eva.evaluate(score_path)
result_file_path = os.path.join(args.save_path, 'result.txt')
with open(result_file_path, 'w') as out_file:
for p_at in result:
out_file.write(str(p_at) + '\n')
print('finish test')
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
test(args)
import os
import numpy as np
import time
import argparse
import multiprocessing
import paddle
import paddle.fluid as fluid
import utils.reader as reader
import cPickle as pickle
from utils.util import print_arguments
import utils.evaluation as eva
from model import Net
#yapf: disable
def parse_args():
parser = argparse.ArgumentParser("Training DAM.")
parser.add_argument(
'--batch_size',
type=int,
default=256,
help='Batch size for training. (default: %(default)d)')
parser.add_argument(
'--num_scan_data',
type=int,
default=2,
help='Number of pass for training. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=1e-3,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--data_path',
type=str,
default="data/ubuntu/data_small.pkl",
help='Path to training data. (default: %(default)s)')
parser.add_argument(
'--save_path',
type=str,
default="saved_models",
help='Path to save trained models. (default: %(default)s)')
parser.add_argument(
'--use_cuda',
action='store_true',
help='If set, use cuda for training.')
parser.add_argument(
'--max_turn_num',
type=int,
default=9,
help='Maximum number of utterances in context.')
parser.add_argument(
'--max_turn_len',
type=int,
default=50,
help='Maximum length of sentences in turns.')
parser.add_argument(
'--word_emb_init',
type=str,
default=None,
help='Path to the initial word embedding.')
parser.add_argument(
'--vocab_size',
type=int,
default=434512,
help='The size of vocabulary.')
parser.add_argument(
'--emb_size',
type=int,
default=200,
help='The dimension of word embedding.')
parser.add_argument(
'--_EOS_',
type=int,
default=28270,
help='The id for end of sentence in vocabulary.')
parser.add_argument(
'--stack_num',
type=int,
default=5,
help='The number of stacked attentive modules in network.')
args = parser.parse_args()
return args
#yapf: enable
def train(args):
# data config
data_conf = {
"batch_size": args.batch_size,
"max_turn_num": args.max_turn_num,
"max_turn_len": args.max_turn_len,
"_EOS_": args._EOS_,
}
dam = Net(args.max_turn_num, args.max_turn_len, args.vocab_size,
args.emb_size, args.stack_num)
loss, logits = dam.create_network()
loss.persistable = True
logits.persistable = True
train_program = fluid.default_main_program()
test_program = fluid.default_main_program().clone(for_test=True)
# gradient clipping
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
max=1.0, min=-1.0))
optimizer = fluid.optimizer.Adam(
learning_rate=fluid.layers.exponential_decay(
learning_rate=args.learning_rate,
decay_steps=400,
decay_rate=0.9,
staircase=True))
optimizer.minimize(loss)
fluid.memory_optimize(train_program)
if args.use_cuda:
place = fluid.CUDAPlace(0)
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = multiprocessing.cpu_count()
print("device count %d" % dev_count)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
train_exe = fluid.ParallelExecutor(
use_cuda=args.use_cuda, loss_name=loss.name, main_program=train_program)
test_exe = fluid.ParallelExecutor(
use_cuda=args.use_cuda,
main_program=test_program,
share_vars_from=train_exe)
if args.word_emb_init is not None:
print("start loading word embedding init ...")
word_emb = pickle.load(open(args.word_emb_init, 'rb')).astype('float32')
print("finish loading word embedding init ...")
print("start loading data ...")
train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb'))
print("finish loading data ...")
val_batches = reader.build_batches(val_data, data_conf)
batch_num = len(train_data['y']) / args.batch_size
val_batch_num = len(val_batches["response"])
print_step = max(1, batch_num / (dev_count * 100))
save_step = max(1, batch_num / (dev_count * 10))
word_emb_inited = False
print("begin model training ...")
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
step = 0
for epoch in xrange(args.num_scan_data):
shuffle_train = reader.unison_shuffle(train_data)
train_batches = reader.build_batches(shuffle_train, data_conf)
ave_cost = 0.0
for it in xrange(batch_num // dev_count):
feed_list = []
for dev in xrange(dev_count):
index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(train_batches, index)
if word_emb_inited is False and args.word_emb_init is not None:
feed_dict[dam.word_emb_name] = word_emb
feed_list.append(feed_dict)
word_emb_inited = True
cost = train_exe.run(feed=feed_list, fetch_list=[loss.name])
ave_cost += np.array(cost[0]).mean()
step = step + 1
if step % print_step == 0:
print("processed: [" + str(step * dev_count * 1.0 / batch_num) +
"] ave loss: [" + str(ave_cost / print_step) + "]")
ave_cost = 0.0
if (args.save_path is not None) and (step % save_step == 0):
save_path = os.path.join(args.save_path, "step_" + str(step))
print("Save model at step %d ... " % step)
print(time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
fluid.io.save_persistables(exe, save_path)
score_path = os.path.join(args.save_path, 'score.' + str(step))
score_file = open(score_path, 'w')
for it in xrange(val_batch_num // dev_count):
feed_list = []
for dev in xrange(dev_count):
val_index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(val_batches,
val_index)
feed_list.append(feed_dict)
predicts = test_exe.run(feed=feed_list,
fetch_list=[logits.name])
scores = np.array(predicts[0])
for dev in xrange(dev_count):
val_index = it * dev_count + dev
for i in xrange(args.batch_size):
score_file.write(
str(scores[args.batch_size * dev + i][0]) + '\t'
+ str(val_batches["label"][val_index][
i]) + '\n')
score_file.close()
#write evaluation result
result = eva.evaluate(score_path)
result_file_path = os.path.join(args.save_path,
'result.' + str(step))
with open(result_file_path, 'w') as out_file:
for p_at in result:
out_file.write(str(p_at) + '\n')
print('finish evaluation')
print(time.strftime('%Y-%m-%d %H:%M:%S',
time.localtime(time.time())))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
train(args)
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u test_and_evaluate.py --use_cuda \
--data_path ./data/data.pkl \
--save_path ./ \
--model_path models/step_10000 \
--batch_size 100 \
--vocab_size 434512 \
--emb_size 200 \
--_EOS_ 28270
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -u ../train_and_evaluate.py --use_cuda \
--data_path ./data/data.pkl \
--save_path ./models \
--batch_size 100 \
--vocab_size 434512 \
--emb_size 200 \
--_EOS_ 28270
import sys
def get_p_at_n_in_m(data, n, m, ind):
pos_score = data[ind][0]
curr = data[ind:ind + m]
curr = sorted(curr, key=lambda x: x[0], reverse=True)
if curr[n - 1][0] <= pos_score:
return 1
return 0
def evaluate(file_path):
data = []
with open(file_path, 'r') as file:
for line in file:
line = line.strip()
tokens = line.split("\t")
if len(tokens) != 2:
continue
data.append((float(tokens[0]), int(tokens[1])))
#assert len(data) % 10 == 0
p_at_1_in_2 = 0.0
p_at_1_in_10 = 0.0
p_at_2_in_10 = 0.0
p_at_5_in_10 = 0.0
length = len(data) / 10
for i in xrange(0, length):
ind = i * 10
assert data[ind][1] == 1
p_at_1_in_2 += get_p_at_n_in_m(data, 1, 2, ind)
p_at_1_in_10 += get_p_at_n_in_m(data, 1, 10, ind)
p_at_2_in_10 += get_p_at_n_in_m(data, 2, 10, ind)
p_at_5_in_10 += get_p_at_n_in_m(data, 5, 10, ind)
return (p_at_1_in_2 / length, p_at_1_in_10 / length, p_at_2_in_10 / length,
p_at_5_in_10 / length)
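For reference (this usage example is not in the repo; the file name and random scores are invented), `evaluate` above consumes a tab-separated score file in which every group of 10 consecutive lines belongs to one context, with the positive candidate (label 1) written first; it returns the tuple (R_2@1, R_10@1, R_10@2, R_10@5). Under Python 2 and from the project root (so that `utils` is importable), it can be exercised like this:
```
import random
import utils.evaluation as eva

with open("toy_score.txt", "w") as f:
    for _ in range(5):                            # 5 contexts, 10 candidates each
        f.write("%f\t1\n" % random.random())      # positive candidate comes first
        for _ in range(9):
            f.write("%f\t0\n" % random.random())  # followed by 9 negative candidates

print(eva.evaluate("toy_score.txt"))              # (R_2@1, R_10@1, R_10@2, R_10@5)
```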
import paddle.fluid as fluid
def loss(x, y, clip_value=10.0):
"""Calculate the sigmoid cross entropy with logits for input(x).
Args:
x: Variable with shape [batch, dim]
y: Input label
Returns:
loss: cross entropy
logits: prediction
"""
logits = fluid.layers.fc(
input=x,
size=1,
bias_attr=fluid.ParamAttr(initializer=fluid.initializer.Constant(0.)))
loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=logits, label=y)
loss = fluid.layers.reduce_mean(
fluid.layers.clip(
loss, min=-clip_value, max=clip_value))
return loss, logits
def ffn(input, d_inner_hid, d_hid, name=None):
"""Position-wise Feed-Forward Network
"""
hidden = fluid.layers.fc(input=input,
size=d_inner_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc.w_0'),
bias_attr=fluid.ParamAttr(
name=name + '_fc.b_0',
initializer=fluid.initializer.Constant(0.)),
act="relu")
out = fluid.layers.fc(input=hidden,
size=d_hid,
num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=name + '_fc.w_1'),
bias_attr=fluid.ParamAttr(
name=name + '_fc.b_1',
initializer=fluid.initializer.Constant(0.)))
return out
def dot_product_attention(query,
key,
value,
d_key,
q_mask=None,
k_mask=None,
dropout_rate=None):
"""Dot product layer.
Args:
query: a tensor with shape [batch, Q_time, Q_dimension]
key: a tensor with shape [batch, time, K_dimension]
value: a tensor with shape [batch, time, V_dimension]
q_mask: a tensor with shape [batch, Q_time]
k_mask: a tensor with shape [batch, time]
Returns:
a tensor with shape [batch, query_time, value_dimension]
Raises:
AssertionError: if Q_dimension not equal to K_dimension when attention
type is dot.
"""
logits = fluid.layers.matmul(x=query, y=key, transpose_y=True)
logits = logits * (d_key**(-0.5))
if (q_mask is not None) and (k_mask is not None):
q_mask = fluid.layers.unsqueeze(input=q_mask, axes=[-1])
k_mask = fluid.layers.unsqueeze(input=k_mask, axes=[-1])
mask = fluid.layers.matmul(x=q_mask, y=k_mask, transpose_y=True)
logits = mask * logits + (1 - mask) * (-2**32 + 1)
attention = fluid.layers.softmax(logits)
if dropout_rate:
attention = fluid.layers.dropout(
input=attention, dropout_prob=dropout_rate, is_test=False, seed=2)
atten_out = fluid.layers.matmul(x=attention, y=value)
return atten_out
def block(name,
query,
key,
value,
d_key,
q_mask=None,
k_mask=None,
is_layer_norm=True,
dropout_rate=None):
"""Attentive module: (self- or cross-) attention followed by a position-wise FFN,
each with a residual connection and optional layer normalization.
"""
att_out = dot_product_attention(query, key, value, d_key, q_mask, k_mask,
dropout_rate)
y = query + att_out
if is_layer_norm:
y = fluid.layers.layer_norm(
input=y,
begin_norm_axis=len(y.shape) - 1,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(1.),
name=name + '_layer_norm.w_0'),
bias_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(0.),
name=name + '_layer_norm.b_0'))
z = ffn(y, d_key, d_key, name)
w = y + z
if is_layer_norm:
w = fluid.layers.layer_norm(
input=w,
begin_norm_axis=len(w.shape) - 1,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(1.),
name=name + '_layer_norm.w_1'),
bias_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(0.),
name=name + '_layer_norm.b_1'))
return w
def cnn_3d(input, out_channels_0, out_channels_1, add_relu=True):
# same padding
conv_0 = fluid.layers.conv3d(
name="conv3d_0",
input=input,
num_filters=out_channels_0,
filter_size=[3, 3, 3],
padding=[1, 1, 1],
act="elu" if add_relu else None,
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Uniform(
low=-0.01, high=0.01)),
bias_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(value=0.0)))
# same padding
pooling_0 = fluid.layers.pool3d(
input=conv_0,
pool_type="max",
pool_size=3,
pool_padding=1,
pool_stride=3)
conv_1 = fluid.layers.conv3d(
name="conv3d_1",
input=pooling_0,
num_filters=out_channels_1,
filter_size=[3, 3, 3],
padding=[1, 1, 1],
act="elu" if add_relu else None,
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Uniform(
low=-0.01, high=0.01)),
bias_attr=fluid.ParamAttr(
initializer=fluid.initializer.Constant(value=0.0)))
# same padding
pooling_1 = fluid.layers.pool3d(
input=conv_1,
pool_type="max",
pool_size=3,
pool_padding=1,
pool_stride=3)
return pooling_1
import cPickle as pickle
import numpy as np
def unison_shuffle(data, seed=None):
if seed is not None:
np.random.seed(seed)
y = np.array(data['y'])
c = np.array(data['c'])
r = np.array(data['r'])
assert len(y) == len(c) == len(r)
p = np.random.permutation(len(y))
shuffle_data = {'y': y[p], 'c': c[p], 'r': r[p]}
return shuffle_data
def split_c(c, split_id):
'''c is a list, e.g. a flattened context.
split_id is an integer, conf['_EOS_'], used as the turn separator.
Returns a nested list of turns.
'''
turns = [[]]
for _id in c:
if _id != split_id:
turns[-1].append(_id)
else:
turns.append([])
if turns[-1] == [] and len(turns) > 1:
turns.pop()
return turns
def normalize_length(_list, length, cut_type='tail'):
'''_list is a list or nested list, e.g. turns / r / a single turn of c.
cut_type is 'head' or 'tail'; it is applied when len(_list) > length.
Returns a list of len == length and min(real_length, length).
'''
real_length = len(_list)
if real_length == 0:
return [0] * length, 0
if real_length <= length:
if not isinstance(_list[0], list):
_list.extend([0] * (length - real_length))
else:
_list.extend([[]] * (length - real_length))
return _list, real_length
if cut_type == 'head':
return _list[:length], length
if cut_type == 'tail':
return _list[-length:], length
def produce_one_sample(data,
index,
split_id,
max_turn_num,
max_turn_len,
turn_cut_type='tail',
term_cut_type='tail'):
'''max_turn_num=10
max_turn_len=50
return y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len
'''
c = data['c'][index]
r = data['r'][index][:]
y = data['y'][index]
turns = split_c(c, split_id)
#normalize turns_c length, nor_turns length is max_turn_num
nor_turns, turn_len = normalize_length(turns, max_turn_num, turn_cut_type)
nor_turns_nor_c = []
term_len = []
#nor_turn_nor_c length is max_turn_num, element is a list length is max_turn_len
for c in nor_turns:
#nor_c length is max_turn_len
nor_c, nor_c_len = normalize_length(c, max_turn_len, term_cut_type)
nor_turns_nor_c.append(nor_c)
term_len.append(nor_c_len)
nor_r, r_len = normalize_length(r, max_turn_len, term_cut_type)
return y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len
def build_one_batch(data,
batch_index,
conf,
turn_cut_type='tail',
term_cut_type='tail'):
_turns = []
_tt_turns_len = []
_every_turn_len = []
_response = []
_response_len = []
_label = []
for i in range(conf['batch_size']):
index = batch_index * conf['batch_size'] + i
y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len = produce_one_sample(
data, index, conf['_EOS_'], conf['max_turn_num'],
conf['max_turn_len'], turn_cut_type, term_cut_type)
_label.append(y)
_turns.append(nor_turns_nor_c)
_response.append(nor_r)
_every_turn_len.append(term_len)
_tt_turns_len.append(turn_len)
_response_len.append(r_len)
return _turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label
def build_one_batch_dict(data,
batch_index,
conf,
turn_cut_type='tail',
term_cut_type='tail'):
_turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label = build_one_batch(
data, batch_index, conf, turn_cut_type, term_cut_type)
ans = {
'turns': _turns,
'tt_turns_len': _tt_turns_len,
'every_turn_len': _every_turn_len,
'response': _response,
'response_len': _response_len,
'label': _label
}
return ans
def build_batches(data, conf, turn_cut_type='tail', term_cut_type='tail'):
_turns_batches = []
_tt_turns_len_batches = []
_every_turn_len_batches = []
_response_batches = []
_response_len_batches = []
_label_batches = []
batch_len = len(data['y']) / conf['batch_size']
for batch_index in range(batch_len):
_turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label = build_one_batch(
data, batch_index, conf, turn_cut_type='tail', term_cut_type='tail')
_turns_batches.append(_turns)
_tt_turns_len_batches.append(_tt_turns_len)
_every_turn_len_batches.append(_every_turn_len)
_response_batches.append(_response)
_response_len_batches.append(_response_len)
_label_batches.append(_label)
ans = {
"turns": _turns_batches,
"tt_turns_len": _tt_turns_len_batches,
"every_turn_len": _every_turn_len_batches,
"response": _response_batches,
"response_len": _response_len_batches,
"label": _label_batches
}
return ans
def make_one_batch_input(data_batches, index):
"""Split turns and return feeding data.
Args:
data_batches: All data batches
index: The index for current batch
Return:
feeding dictionary
"""
turns = np.array(data_batches["turns"][index])
tt_turns_len = np.array(data_batches["tt_turns_len"][index])
every_turn_len = np.array(data_batches["every_turn_len"][index])
response = np.array(data_batches["response"][index])
response_len = np.array(data_batches["response_len"][index])
batch_size = turns.shape[0]
max_turn_num = turns.shape[1]
max_turn_len = turns.shape[2]
turns_list = [turns[:, i, :] for i in xrange(max_turn_num)]
every_turn_len_list = [every_turn_len[:, i] for i in xrange(max_turn_num)]
feed_dict = {}
for i, turn in enumerate(turns_list):
feed_dict["turn_%d" % i] = turn
feed_dict["turn_%d" % i] = np.expand_dims(
feed_dict["turn_%d" % i], axis=-1)
for i, turn_len in enumerate(every_turn_len_list):
feed_dict["turn_mask_%d" % i] = np.ones(
(batch_size, max_turn_len)).astype("float32")
for row in xrange(batch_size):
feed_dict["turn_mask_%d" % i][row, turn_len[row]:] = 0
feed_dict["response"] = response
feed_dict["response"] = np.expand_dims(feed_dict["response"], axis=-1)
feed_dict["response_mask"] = np.ones(
(batch_size, max_turn_len)).astype("float32")
for row in xrange(batch_size):
feed_dict["response_mask"][row, response_len[row]:] = 0
feed_dict["label"] = np.array([data_batches["label"][index]]).reshape(
[-1, 1]).astype("float32")
return feed_dict
if __name__ == '__main__':
conf = {
"batch_size": 256,
"max_turn_num": 10,
"max_turn_len": 50,
"_EOS_": 28270,
}
train, val, test = pickle.load(open('../data/ubuntu/data_small.pkl', 'rb'))
print('load data success')
train_batches = build_batches(train, conf)
val_batches = build_batches(val, conf)
test_batches = build_batches(test, conf)
print('build batches success')
pickle.dump([train_batches, val_batches, test_batches],
open('../data/ubuntu/data_small_xxx.pkl', 'wb'))
print('dump success')
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def pos_encoding_init():
pass
def scaled_dot_product_attention():
pass