Commit 48cf8096 authored by xixiaoyao

merge from master

@@ -741,7 +741,7 @@ BERT contains the following input objects
```yaml
token_ids: a matrix of shape [batch_size, seq_len]; each row is one sample, and each element is the vocabulary id of the corresponding token in the text.
position_ids: a matrix of shape [batch_size, seq_len]; each row is one sample, and each element is the position id of the corresponding token in the text.
segment_ids: a 0/1 matrix of shape [batch_size, seq_len] supporting the inputs of models such as BERT and ERNIE; an element of 0 marks a token belonging to text1 of a classification or matching task, and 1 marks a token belonging to text2 of a matching task.
input_mask: a matrix of shape [batch_size, seq_len] whose elements are 0 or 1, marking whether each position holds a real token (1) or padding (0).
```
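For illustration only (not part of this commit), a minimal numpy sketch of how a padded two-sample batch maps onto these four objects; the token ids and the `batch_size`/`seq_len` values are made up:

```python
import numpy as np

batch_size, seq_len = 2, 6  # hypothetical values
# Two samples; the second is padded from length 4 up to seq_len.
token_ids = np.array([[101, 2023, 2003, 1037, 3231, 102],
                      [101, 7592, 2088,  102,    0,    0]], dtype='int64')
position_ids = np.tile(np.arange(seq_len, dtype='int64'), (batch_size, 1))
segment_ids = np.zeros((batch_size, seq_len), dtype='int64')  # single-text task: all 0
input_mask = (token_ids != 0).astype('float32')  # 1 = real token, 0 = padding (pad id is 0)

assert token_ids.shape == (batch_size, seq_len)
```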
@@ -781,6 +781,7 @@ sentence_pair_embedding: a matrix of shape [batch_size, hidden_size], float
## Appendix C: Built-in task paradigms (paradigm)
#### Classification paradigm: cls
The classification paradigm adds the following configuration fields:
@@ -788,6 +789,7 @@ sentence_pair_embedding: a matrix of shape [batch_size, hidden_size], float
```yaml
n_classes (REQUIRED): int. The number of classes in the classification task.
pred_output_path (OPTIONAL): str. Save path for prediction outputs; when this field is left empty, outputs are saved to the task directory under the path given by the `save_path` field of the global config file.
+save_infermodel_every_n_steps (OPTIONAL): int. Interval (in steps) for periodically saving the inference model; if unset or set to -1, the inference model is saved only when training of this task finishes. Defaults to -1.
```
The classification paradigm takes the following input objects:
@@ -812,6 +814,7 @@ sentence_embedding: a matrix of shape [batch_size, hidden_size], float32
```yaml
pred_output_path (OPTIONAL): str. Save path for prediction outputs; when this field is left empty, outputs are saved to the task directory under the path given by the `save_path` field of the global config file.
+save_infermodel_every_n_steps (OPTIONAL): int. Interval (in steps) for periodically saving the inference model; if unset or set to -1, the inference model is saved only when training of this task finishes. Defaults to -1.
```
The matching paradigm takes the following input objects:
@@ -838,6 +841,7 @@ sentence_pair_embedding: a matrix of shape [batch_size, hidden_size], float
max_answer_len (REQUIRED): int. Maximum answer length at prediction time.
n_best_size (OPTIONAL): int, defaults to 20. The number of n-best answers kept per sample in the n-best answer file saved at prediction time.
pred_output_path (OPTIONAL): str. Save path for prediction outputs; when this field is left empty, outputs are saved to the task directory under the path given by the `save_path` field of the global config file.
+save_infermodel_every_n_steps (OPTIONAL): int. Interval (in steps) for periodically saving the inference model; if unset or set to -1, the inference model is saved only when training of this task finishes. Defaults to -1.
```
The machine reading comprehension paradigm takes the following input objects:
@@ -885,7 +889,8 @@ do_lower_case (OPTIONAL): bool. Casing flag. Defaults to False, i.e.
for_cn: bool. Chinese-mode flag. Defaults to False, i.e. inputs are assumed to be English; when set to True, the tokenizer, post-processing, etc. operate in Chinese mode.
print_every_n_steps (OPTIONAL): int. Defaults to 5. How often (in steps) logs are printed during training.
-save_every_n_steps (OPTIONAL): int. Defaults to -1. How often checkpoint models are saved during training; by default they are not saved.
+save_ckpt_every_n_steps (OPTIONAL): int. Defaults to -1. How often full-graph checkpoints are saved during training; with the default -1, a checkpoint is saved automatically only at the last step.
+save_infermodel_every_n_steps (OPTIONAL): int. Interval (in steps) for periodically saving the inference model; if unset or set to -1, the inference model is saved only when training of this task finishes. Defaults to -1.
optimizer (REQUIRED): str. Optimizer name; the framework currently supports only adam, with more optimizers to be supported in the future.
learning_rate (REQUIRED): str. Learning rate for the training phase.
......
@@ -12,6 +12,8 @@ do_lower_case: True
max_seq_len: 512
batch_size: 4
+save_ckpt_every_n_steps: 5
+save_infermodel_every_n_steps: 5
num_epochs: 2
optimizer: "adam"
learning_rate: 3e-5
......
@@ -33,6 +33,7 @@ ssl._create_default_https_context = ssl._create_unverified_context
_items = {
    'pretrain': {'ernie-en-uncased-large': 'https://ernie.bj.bcebos.com/ERNIE_Large_en_stable-2.0.0.tar.gz',
                 'bert-en-uncased-large': 'https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz',
+                'bert-en-uncased-base': 'https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz',
                 'utils': None},
    'reader': {'utils': None},
    'backbone': {'utils': None},
@@ -90,7 +91,7 @@ def _download(item, scope, path, silent=False):
        tar.extractall(path = data_dir)
        tar.close()
        os.remove(filename)
-       if scope == 'bert-en-uncased-large':
+       if scope.startswith('bert'):
            source_path = data_dir + '/' + data_name.split('.')[0]
            fileList = os.listdir(source_path)
            for file in fileList:
......
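Since `_items` now registers two BERT checkpoints, the broadened condition lets the post-extraction fix-up (collapsed above) apply to every BERT scope rather than only the large model. A trivial check of the predicate:

```python
# The two BERT pretrain scopes registered in _items both match now;
# the ERNIE scope still does not.
for scope in ('bert-en-uncased-large', 'bert-en-uncased-base',
              'ernie-en-uncased-large'):
    print(scope, scope.startswith('bert'))
```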
@@ -52,9 +52,9 @@ class Model(backbone):
    @property
    def inputs_attr(self):
-       return {"token_ids": [[-1, -1, 1], 'int64'],
-               "position_ids": [[-1, -1, 1], 'int64'],
-               "segment_ids": [[-1, -1, 1], 'int64'],
+       return {"token_ids": [[-1, -1], 'int64'],
+               "position_ids": [[-1, -1], 'int64'],
+               "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32']}

    @property
@@ -73,7 +73,7 @@ class Model(backbone):
        self._emb_dtype = 'float32'
        # padding id in vocabulary must be set to 0
-       emb_out = fluid.layers.embedding(
+       emb_out = fluid.embedding(
            input=src_ids,
            size=[self._voc_size, self._emb_size],
            dtype=self._emb_dtype,
@@ -84,14 +84,14 @@ class Model(backbone):
        # fluid.global_scope().find_var('backbone-word_embedding').get_tensor()
        embedding_table = fluid.default_main_program().global_block().var(scope_name+self._word_emb_name)
-       position_emb_out = fluid.layers.embedding(
+       position_emb_out = fluid.embedding(
            input=pos_ids,
            size=[self._max_position_seq_len, self._emb_size],
            dtype=self._emb_dtype,
            param_attr=fluid.ParamAttr(
                name=scope_name+self._pos_emb_name, initializer=self._param_initializer))
-       sent_emb_out = fluid.layers.embedding(
+       sent_emb_out = fluid.embedding(
            sent_ids,
            size=[self._sent_types, self._emb_size],
            dtype=self._emb_dtype,
......
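The recurring `fluid.layers.embedding` to `fluid.embedding` swap across both backbones matches the shape change above: the old layer expected lookup ids with a trailing dimension of 1 (`[-1, -1, 1]`), while `fluid.embedding` takes plain `[batch_size, seq_len]` int64 ids. A minimal sketch of the new call, assuming Paddle 1.6-era fluid and made-up sizes and parameter names:

```python
import paddle.fluid as fluid

voc_size, emb_size = 30522, 768  # hypothetical vocabulary/hidden sizes
src_ids = fluid.data('token_ids', shape=[-1, -1], dtype='int64')  # [batch, seq_len]
emb_out = fluid.embedding(
    input=src_ids,
    size=[voc_size, emb_size],
    dtype='float32',
    param_attr=fluid.ParamAttr(name='word_embedding'))  # hypothetical name
# emb_out has shape [batch, seq_len, emb_size]; no trailing [..., 1] dim on the ids.
```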
@@ -62,11 +62,11 @@ class Model(backbone):
    @property
    def inputs_attr(self):
-       return {"token_ids": [[-1, -1, 1], 'int64'],
-               "position_ids": [[-1, -1, 1], 'int64'],
-               "segment_ids": [[-1, -1, 1], 'int64'],
+       return {"token_ids": [[-1, -1], 'int64'],
+               "position_ids": [[-1, -1], 'int64'],
+               "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
-               "task_ids": [[-1,-1, 1], 'int64']}
+               "task_ids": [[-1,-1], 'int64']}

    @property
    def outputs_attr(self):
@@ -85,7 +85,7 @@ class Model(backbone):
        task_ids = inputs['task_ids']
        # padding id in vocabulary must be set to 0
-       emb_out = fluid.layers.embedding(
+       emb_out = fluid.embedding(
            input=src_ids,
            size=[self._voc_size, self._emb_size],
            dtype=self._emb_dtype,
@@ -96,14 +96,14 @@ class Model(backbone):
        # fluid.global_scope().find_var('backbone-word_embedding').get_tensor()
        embedding_table = fluid.default_main_program().global_block().var(scope_name+self._word_emb_name)
-       position_emb_out = fluid.layers.embedding(
+       position_emb_out = fluid.embedding(
            input=pos_ids,
            size=[self._max_position_seq_len, self._emb_size],
            dtype=self._emb_dtype,
            param_attr=fluid.ParamAttr(
                name=scope_name+self._pos_emb_name, initializer=self._param_initializer))
-       sent_emb_out = fluid.layers.embedding(
+       sent_emb_out = fluid.embedding(
            sent_ids,
            size=[self._sent_types, self._emb_size],
            dtype=self._emb_dtype,
@@ -113,7 +113,7 @@ class Model(backbone):
        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
-       task_emb_out = fluid.layers.embedding(
+       task_emb_out = fluid.embedding(
            task_ids,
            size=[self._task_types, self._emb_size],
            dtype=self._emb_dtype,
......
@@ -473,7 +473,7 @@ class Controller(object):
        # compute loss
        task_id_var = net_inputs['__task_id']
-       task_id_vec = layers.one_hot(task_id_var, num_instances)
+       task_id_vec = fluid.one_hot(task_id_var, num_instances)
        losses = fluid.layers.concat([task_output_vars[inst.name+'/loss'] for inst in instances], axis=0)
        loss = layers.reduce_sum(task_id_vec * losses)
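For context, this loss computation selects the loss of whichever task produced the current batch. A toy numpy re-enactment of the same masking trick (the real code does this with fluid ops on `__task_id`; all numbers here are invented):

```python
import numpy as np

num_instances = 3                    # hypothetical number of tasks
task_id = 1                          # id of the task that fed this batch
losses = np.array([0.7, 1.2, 0.4])   # per-task losses, concatenated
task_id_vec = np.eye(num_instances)[task_id]  # one-hot, like fluid.one_hot
loss = np.sum(task_id_vec * losses)  # keeps only the current task's loss
assert loss == 1.2
```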
@@ -622,8 +622,9 @@ class Controller(object):
                global_step += 1
                cur_task.cur_train_step += 1
-               if cur_task.save_infermodel_every_n_steps > 0 and cur_task.cur_train_step % cur_task.save_infermodel_every_n_steps == 0:
-                   cur_task.save(suffix='.step'+str(cur_task.cur_train_step))
+               cur_task_global_step = cur_task.cur_train_step + cur_task.cur_train_epoch * cur_task.steps_pur_epoch
+               if cur_task.is_target and cur_task.save_infermodel_every_n_steps > 0 and cur_task_global_step % cur_task.save_infermodel_every_n_steps == 0:
+                   cur_task.save(suffix='.step'+str(cur_task_global_step))
                if global_step % main_conf.get('print_every_n_steps', 5) == 0:
                    loss = rt_outputs[cur_task.name+'/loss']
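The rewritten condition saves the inference model on a task-global step counter instead of the per-epoch one, so the `.stepN` suffixes keep increasing across epochs (and only target tasks save at all). A toy check of the arithmetic with invented numbers (variable names paraphrase the source attributes, including the `steps_pur_epoch` spelling):

```python
# Hypothetical: 100 steps per epoch, currently step 40 of epoch 2 (0-based).
steps_pur_epoch, cur_train_epoch, cur_train_step = 100, 2, 40
save_infermodel_every_n_steps = 60

cur_task_global_step = cur_train_step + cur_train_epoch * steps_pur_epoch  # 240
if cur_task_global_step % save_infermodel_every_n_steps == 0:
    print('would save with suffix .step%d' % cur_task_global_step)  # .step240
```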
@@ -641,7 +642,7 @@ class Controller(object):
                        print(cur_task.name+': train finished!')
                        cur_task.save()
-               if 'save_every_n_steps' in main_conf and global_step % main_conf['save_every_n_steps'] == 0:
+               if 'save_ckpt_every_n_steps' in main_conf and global_step % main_conf['save_ckpt_every_n_steps'] == 0:
                    save_path = os.path.join(main_conf['save_path'], 'ckpt',
                                             "step_" + str(global_step))
                    fluid.io.save_persistables(self.exe, save_path, saver_program)
......
@@ -62,18 +62,18 @@ class Reader(reader):
    @property
    def outputs_attr(self):
        if self._is_training:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32'],
-                   "label_ids": [[-1,1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64']
+                   "label_ids": [[-1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64']
                    }
        else:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32']
                    }
......
@@ -72,12 +72,12 @@ class Reader(reader):
    @property
    def outputs_attr(self):
        if self._is_training:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32'],
-                   "label_ids": [[-1,1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64']
+                   "label_ids": [[-1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64']
                    }
        if siamese:
            if learning_strategy == 'pointwise':
@@ -102,10 +102,10 @@ class Reader(reader):
            else:
        else:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32']
                    }
......
@@ -60,13 +60,13 @@ class Reader(reader):
    @property
    def outputs_attr(self):
-       return {"token_ids": [[-1, -1, 1], 'int64'],
-               "position_ids": [[-1, -1, 1], 'int64'],
-               "segment_ids": [[-1, -1, 1], 'int64'],
+       return {"token_ids": [[-1, -1], 'int64'],
+               "position_ids": [[-1, -1], 'int64'],
+               "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
-               "task_ids": [[-1, -1, 1], 'int64'],
-               "mask_label": [[-1, 1], 'int64'],
-               "mask_pos": [[-1, 1], 'int64'],
+               "task_ids": [[-1, -1], 'int64'],
+               "mask_label": [[-1], 'int64'],
+               "mask_pos": [[-1], 'int64'],
                }
......
@@ -69,22 +69,21 @@ class Reader(reader):
    @property
    def outputs_attr(self):
        if self._is_training:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32'],
-                   "start_positions": [[-1, 1], 'int64'],
-                   "unique_ids": [[-1, 1], 'int64'],
-                   "end_positions": [[-1, 1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64']
+                   "start_positions": [[-1], 'int64'],
+                   "end_positions": [[-1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64']
                    }
        else:
-           return {"token_ids": [[-1, -1, 1], 'int64'],
-                   "position_ids": [[-1, -1, 1], 'int64'],
-                   "segment_ids": [[-1, -1, 1], 'int64'],
-                   "task_ids": [[-1, -1, 1], 'int64'],
+           return {"token_ids": [[-1, -1], 'int64'],
+                   "position_ids": [[-1, -1], 'int64'],
+                   "segment_ids": [[-1, -1], 'int64'],
+                   "task_ids": [[-1, -1], 'int64'],
                    "input_mask": [[-1, -1, 1], 'float32'],
-                   "unique_ids": [[-1, 1], 'int64']
+                   "unique_ids": [[-1], 'int64']
                    }

    @property
......
@@ -67,8 +67,8 @@ def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3):
                sent[token_index] = MASK
                mask_flag = True
                mask_pos.append(sent_index * max_len + token_index)
-   mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
-   mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
+   mask_label = np.array(mask_label).astype("int64").reshape([-1])
+   mask_pos = np.array(mask_pos).astype("int64").reshape([-1])
    return batch_tokens, mask_label, mask_pos
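Note that `mask_pos` stores positions in the flattened `[batch_size * max_len]` token matrix (`sent_index * max_len + token_index`), which is why a flat `[-1]` vector is the natural shape here. A self-contained numpy sketch, with made-up data, of how such positions gather the masked tokens:

```python
import numpy as np

max_len = 4
batch_tokens = np.array([[11, 12, 13,  0],
                         [21, 22, 23, 24]], dtype='int64')
# Suppose token (0, 2) and token (1, 1) were masked:
mask_pos = np.array([0 * max_len + 2, 1 * max_len + 1], dtype='int64')
gathered = batch_tokens.reshape([-1])[mask_pos]  # -> array([13, 22])
```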
@@ -96,7 +96,7 @@ def prepare_batch_data(insts,
    # or unique id
    for i in range(3, len(insts[0]), 1):
        labels = [inst[i] for inst in insts]
-       labels = np.array(labels).astype("int64").reshape([-1, 1])
+       labels = np.array(labels).astype("int64").reshape([-1])
        labels_list.append(labels)
    # First step: do mask without padding
    if mask_id >= 0:
@@ -154,14 +154,14 @@ def pad_batch_data(insts,
        inst_data = np.array([
            list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
        ])
-       return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_data.astype("int64").reshape([-1, max_len])]
    # position data
    if return_pos:
        inst_pos = np.array([
            list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
            for inst in insts
        ])
-       return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_pos.astype("int64").reshape([-1, max_len])]
    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
......
@@ -113,8 +113,8 @@ def mask(batch_tokens,
            pre_sent_len = len(sent)

-   mask_label = np.array(mask_label).astype("int64").reshape([-1, 1])
-   mask_pos = np.array(mask_pos).astype("int64").reshape([-1, 1])
+   mask_label = np.array(mask_label).astype("int64").reshape([-1])
+   mask_pos = np.array(mask_pos).astype("int64").reshape([-1])
    return batch_tokens, mask_label, mask_pos
@@ -136,7 +136,7 @@ def pad_batch_data(insts,
        inst_data = np.array(
            [inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
-       return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_data.astype("int64").reshape([-1, max_len])]
    # position data
    if return_pos:
@@ -145,7 +145,7 @@ def pad_batch_data(insts,
            for inst in insts
        ])
-       return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_pos.astype("int64").reshape([-1, max_len])]
    if return_input_mask:
        # This is used to avoid attention on paddings.
@@ -165,7 +165,7 @@ def pad_batch_data(insts,
    if return_seq_lens:
        seq_lens = np.array([len(inst) for inst in insts])
-       return_list += [seq_lens.astype("int64").reshape([-1, 1])]
+       return_list += [seq_lens.astype("int64").reshape([-1])]
    return return_list if len(return_list) > 1 else return_list[0]
......
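The reshape changes above mean `pad_batch_data` now returns plain `[batch_size, max_len]` arrays instead of `[batch_size, max_len, 1]`. A self-contained numpy sketch of the padding step, assuming `pad_idx = 0` and made-up token ids:

```python
import numpy as np

insts = [[5, 6, 7], [8, 9]]  # token id lists of unequal length
pad_idx = 0
max_len = max(len(inst) for inst in insts)
inst_data = np.array(
    [list(inst) + [pad_idx] * (max_len - len(inst)) for inst in insts])
inst_data = inst_data.astype('int64').reshape([-1, max_len])  # (2, 3): no trailing 1
input_mask = np.array(
    [[1] * len(inst) + [0] * (max_len - len(inst)) for inst in insts],
    dtype='float32')  # masks attention on the padded positions
```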
@@ -168,14 +168,14 @@ def pad_batch_data(insts,
        inst_data = np.array([
            list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
        ])
-       return_list += [inst_data.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_data.astype("int64").reshape([-1, max_len])]
    # position data
    if return_pos:
        inst_pos = np.array([
            list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
            for inst in insts
        ])
-       return_list += [inst_pos.astype("int64").reshape([-1, max_len, 1])]
+       return_list += [inst_pos.astype("int64").reshape([-1, max_len])]
    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
......
@@ -480,17 +480,17 @@ class ClassifyReader(BaseReader):
        batch_labels = [record.label_id for record in batch_records]
        if self.is_classify:
            batch_labels = np.array(batch_labels).astype("int64").reshape(
-               [-1, 1])
+               [-1])
        elif self.is_regression:
            batch_labels = np.array(batch_labels).astype("float32").reshape(
-               [-1, 1])
+               [-1])

        if batch_records[0].qid:
            batch_qids = [record.qid for record in batch_records]
            batch_qids = np.array(batch_qids).astype("int64").reshape(
-               [-1, 1])
+               [-1])
        else:
-           batch_qids = np.array([]).astype("int64").reshape([-1, 1])
+           batch_qids = np.array([]).astype("int64").reshape([-1])

        # padding
        padded_token_ids, input_mask = pad_batch_data(
@@ -918,19 +918,19 @@ class MRCReader(BaseReader):
                record.end_position for record in batch_records
            ]
            batch_start_position = np.array(batch_start_position).astype(
-               "int64").reshape([-1, 1])
+               "int64").reshape([-1])
            batch_end_position = np.array(batch_end_position).astype(
-               "int64").reshape([-1, 1])
+               "int64").reshape([-1])
        else:
            batch_size = len(batch_token_ids)
            batch_start_position = np.zeros(
-               shape=[batch_size, 1], dtype="int64")
-           batch_end_position = np.zeros(shape=[batch_size, 1], dtype="int64")
+               shape=[batch_size], dtype="int64")
+           batch_end_position = np.zeros(shape=[batch_size], dtype="int64")

        batch_unique_ids = [record.unique_id for record in batch_records]
        batch_unique_ids = np.array(batch_unique_ids).astype("int64").reshape(
-           [-1, 1])
+           [-1])

        # padding
        padded_token_ids, input_mask = pad_batch_data(
......
@@ -43,7 +43,7 @@ class TaskParadigm(task_paradigm):
    @property
    def inputs_attrs(self):
        if self._is_training:
-           reader = {"label_ids": [[-1, 1], 'int64']}
+           reader = {"label_ids": [[-1], 'int64']}
        else:
            reader = {}
        bb = {"sentence_embedding": [[-1, self._hidden_size], 'float32']}
@@ -75,8 +75,9 @@ class TaskParadigm(task_paradigm):
                name=scope_name+"cls_out_b", initializer=fluid.initializer.Constant(0.)))
        if self._is_training:
-           loss = fluid.layers.softmax_with_cross_entropy(
-               logits=logits, label=label_ids)
+           inputs = fluid.layers.softmax(logits)
+           loss = fluid.layers.cross_entropy(
+               input=inputs, label=label_ids)
            loss = layers.mean(loss)
            return {"loss": loss}
        else:
......
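Mathematically the new two-op form computes the same value as the old fused `softmax_with_cross_entropy` (softmax followed by the negative log-likelihood of the label), though the fused op is usually the more numerically stable choice. A numpy check on toy logits:

```python
import numpy as np

logits = np.array([[2.0, 0.5, -1.0]])  # one sample, 3 classes
label = 0

# Two-op form: softmax, then cross entropy.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss_two_op = -np.log(probs[0, label])

# Fused form, algebraically: log-sum-exp minus the label logit.
loss_fused = np.log(np.exp(logits).sum()) - logits[0, label]
assert np.isclose(loss_two_op, loss_fused)
```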
@@ -44,7 +44,7 @@ class TaskParadigm(task_paradigm):
    @property
    def inputs_attrs(self):
        if self._is_training:
-           reader = {"label_ids": [[-1, 1], 'int64']}
+           reader = {"label_ids": [[-1], 'int64']}
        else:
            reader = {}
        bb = {"sentence_pair_embedding": [[-1, self._hidden_size], 'float32']}
@@ -84,8 +84,9 @@ class TaskParadigm(task_paradigm):
                    initializer=fluid.initializer.Constant(0.)))
        if self._is_training:
-           ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
-               logits=logits, label=labels, return_softmax=True)
+           inputs = fluid.layers.softmax(logits)
+           ce_loss = fluid.layers.cross_entropy(
+               input=inputs, label=labels)
            loss = fluid.layers.mean(x=ce_loss)
            return {'loss': loss}
        else:
......
@@ -33,8 +33,8 @@ class TaskParadigm(task_paradigm):
    @property
    def inputs_attrs(self):
        reader = {
-           "mask_label": [[-1, 1], 'int64'],
-           "mask_pos": [[-1, 1], 'int64']}
+           "mask_label": [[-1], 'int64'],
+           "mask_pos": [[-1], 'int64']}
        if not self._is_training:
            del reader['mask_label']
            del reader['batchsize_x_seqlen']
@@ -100,8 +100,9 @@ class TaskParadigm(task_paradigm):
            is_bias=True)

        if self._is_training:
-           mask_lm_loss = fluid.layers.softmax_with_cross_entropy(
-               logits=fc_out, label=mask_label)
+           inputs = fluid.layers.softmax(fc_out)
+           mask_lm_loss = fluid.layers.cross_entropy(
+               input=inputs, label=mask_label)
            loss = fluid.layers.mean(mask_lm_loss)
            return {'loss': loss}
        else:
......
@@ -49,11 +49,11 @@ class TaskParadigm(task_paradigm):
    @property
    def inputs_attrs(self):
        if self._is_training:
-           reader = {"start_positions": [[-1, 1], 'int64'],
-                     "end_positions": [[-1, 1], 'int64'],
+           reader = {"start_positions": [[-1], 'int64'],
+                     "end_positions": [[-1], 'int64'],
                      }
        else:
-           reader = {'unique_ids': [[-1, 1], 'int64']}
+           reader = {'unique_ids': [[-1], 'int64']}
        bb = {"encoder_outputs": [[-1, -1, self._hidden_size], 'float32']}
        return {'reader': reader, 'backbone': bb}
@@ -70,7 +70,7 @@ class TaskParadigm(task_paradigm):
        else:
            return {'start_logits': [[-1, -1, 1], 'float32'],
                    'end_logits': [[-1, -1, 1], 'float32'],
-                   'unique_ids': [[-1, 1], 'int64']}
+                   'unique_ids': [[-1], 'int64']}

    def build(self, inputs, scope_name=""):
@@ -102,9 +102,11 @@ class TaskParadigm(task_paradigm):
        start_logits, end_logits = fluid.layers.unstack(x=logits, axis=0)

        def _compute_single_loss(logits, positions):
            """Compute start/end loss for mrc model"""
-           loss = fluid.layers.softmax_with_cross_entropy(
-               logits=logits, label=positions)
+           inputs = fluid.layers.softmax(logits)
+           loss = fluid.layers.cross_entropy(
+               input=inputs, label=positions)
            loss = fluid.layers.mean(x=loss)
            return loss
@@ -122,7 +124,7 @@ class TaskParadigm(task_paradigm):
    def postprocess(self, rt_outputs):
        """this func will be called after each step(batch) of training/evaluating/predicting process."""
        if not self._is_training:
-           unique_ids = np.squeeze(rt_outputs['unique_ids'], -1)
+           unique_ids = rt_outputs['unique_ids']
            start_logits = rt_outputs['start_logits']
            end_logits = rt_outputs['end_logits']
            for idx in range(len(unique_ids)):
......
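Dropping the `np.squeeze(..., -1)` follows from the reader change: `unique_ids` now arrives as a rank-1 `[-1]` array, so there is no trailing axis left to squeeze. In toy numpy terms:

```python
import numpy as np

old_style = np.array([[7], [8], [9]], dtype='int64')  # shape (3, 1): needed squeeze
new_style = np.array([7, 8, 9], dtype='int64')        # shape (3,): used directly
assert (np.squeeze(old_style, -1) == new_style).all()
```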
@@ -19,7 +19,6 @@ import random
import numpy as np
import paddle
from paddle import fluid
-from paddle.fluid import layers

def _check_and_adapt_shape_dtype(rt_val, attr, message=""):
@@ -65,7 +64,7 @@ def create_net_inputs(input_attrs, async=False, iterator_fn=None, dev_count=1, n
    inputs = []
    ret = {}
    for name, shape, dtype in input_attrs:
-       p = layers.data(name, shape=shape, dtype=dtype)
+       p = fluid.data(name, shape=shape, dtype=dtype)
        ret[name] = p
        inputs.append(p)
@@ -227,7 +226,7 @@ def merge_input_attrs(backbone_attr, task_attrs, insert_taskid=True, insert_batc
    names = []
    start = 0
    if insert_taskid:
-       ret.append(([1,1], 'int64'))
+       ret.append(([1, 1], 'int64'))
        names.append('__task_id')
        start += 1
......
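`fluid.data` (Paddle 1.6+) replaces `fluid.layers.data` here; unlike the old layer it does not implicitly prepend a batch dimension, so the `[-1, -1]` shapes coming from the readers pass through unchanged, with `-1` marking a variable dimension. A minimal sketch with a hypothetical attribute list:

```python
import paddle.fluid as fluid

input_attrs = [('token_ids', [-1, -1], 'int64'),
               ('input_mask', [-1, -1, 1], 'float32')]  # hypothetical attrs
ret = {}
for name, shape, dtype in input_attrs:
    # fluid.data keeps the shape exactly as given.
    ret[name] = fluid.data(name, shape=shape, dtype=dtype)
```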
#!/bin/bash
if [[ $# != 1 ]]; then
echo "usage: bash convert_params.sh <params_dir>"
exit 1
fi
if [[ -f $1/__palminfo__ ]]; then
echo "already converted."
exit 0
fi
echo "converting..."
if [[ -d $1/params ]]; then
cd $1/params
else
cd $1
fi
mkdir .palm.backup
for file in $(ls *)
do cp $file .palm.backup; mv $file "__paddlepalm_"$file
done
tar -cf __rawmodel__ .palm.backup/*
rm .palm.backup/*
mv __rawmodel__ .palm.backup
# find . ! -name '__rawmodel__' -exec rm {} +
tar -cf __palmmodel__ __paddlepalm_*
touch __palminfo__
ls __paddlepalm_* > __palminfo__
rm __paddlepalm_*
cd - >/dev/null
echo "done!"
#!/bin/bash
set -e
if [[ $# != 1 ]]; then
echo "Usage: bash download_pretrain.sh <bert|ernie>"
exit 1
fi
if [[ $1 == 'bert' ]]; then
name="bert"
link="https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz"
packname="uncased_L-24_H-1024_A-16.tar.gz"
dirname="uncased_L-24_H-1024_A-16"
elif [[ $1 == 'ernie' ]]; then
name="ernie"
link="https://ernie.bj.bcebos.com/ERNIE_Large_en_stable-2.0.0.tar.gz"
packname="ERNIE_Large_en_stable-2.0.0.tar.gz"
else
echo "$1 is currently not supported."
exit 1
fi
if [[ ! -d pretrain_model ]]; then
mkdir pretrain_model
fi
cd pretrain_model
mkdir $name
cd $name
echo "downloading ${name}..."
wget --no-check-certificate $link
echo "decompressing..."
tar -zxf $packname
rm -rf $packname
if [[ $dirname != "" ]]; then
mv $dirname/* .
rm -rf $dirname
fi
cd ../..
#!/bin/bash
if [[ $# != 1 ]]; then
echo "usage: bash recover_params.sh <params_dir>"
exit 1
fi
if [[ ! -d $1 ]]; then
echo "$1 not found."
exit 1
fi
if [[ ! -f $1/__palmmodel__ ]]; then
echo "paddlepalm model not found."
exit 1
fi
echo "recovering..."
if [[ -d $1/params ]]; then
cd $1/params
else
cd $1
fi
rm __palm*
mv .palm.backup/__rawmodel__ .
rm -rf .palm.backup
tar -xf __rawmodel__
mv .palm.backup/* .
rm __rawmodel__
rm -rf .palm.backup
cd - >/dev/null
@@ -18,7 +18,7 @@
"""
Setup script.
Authors: zhouxiangyang(zhouxiangyang@baidu.com)
-Date: 2019/09/29 21:00:01
+Date: 2019/12/05 13:24:01
"""
import setuptools
from io import open
@@ -27,10 +27,10 @@ with open("README.md", "r", encoding='utf-8') as fh:
setuptools.setup(
    name="paddlepalm",
-   version="0.2.1",
+   version="0.2.2",
    author="PaddlePaddle",
    author_email="zhangyiming04@baidu.com",
-   description="A Multi-task Learning Lib for PaddlePaddle Users.",
+   description="A Lib for PaddlePaddle Users.",
    # long_description=long_description,
    # long_description_content_type="text/markdown",
    url="https://github.com/PaddlePaddle/PALM",
......