Unverified commit 803dab78, authored by pkpk, committed by GitHub

test=develop (#4389)

Parent 9e12ab90
Subproject commit 5426f75073cf5bd416622dbe71b146d3dc8fffb6
Subproject commit 30b892e3c029bff706337f269e6c158b0a223f60
@@ -10,7 +10,7 @@
- **Rich and comprehensive NLP task support:**
  - PaddleNLP provides application support at multiple granularities and for multiple scenarios, covering fundamental NLP techniques such as [word segmentation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), [part-of-speech tagging](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), and [named entity recognition](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), as well as core NLP techniques such as [text classification](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification), [text similarity](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net), [semantic representation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models), and [text generation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/seq2seq). PaddleNLP also ships the dedicated core techniques, tool components, models, and pretrained parameters behind common large-scale NLP application systems, such as [reading comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_reading_comprehension), [dialogue systems](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system), and [machine translation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_translation), so nothing stands in your way in the NLP domain.
- **Stable, reliable NLP models and powerful pretrained parameters:**
@@ -55,11 +55,11 @@ cd models/PaddleNLP/sentiment_classification
| **Language model** | [Language_model](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/language_model) | A classic neural language model based on recurrent neural networks (RNN). |
| **Sentiment classification** :fire: | [Senta](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification), [EmotionDetection](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/emotion_detection) | The Senta (Sentiment Classification) and EmotionDetection projects provide sentiment analysis models for *general scenarios* and *human-machine dialogue scenarios*, respectively. |
| **Text similarity** :fire: | [SimNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net) | SimNet (Similarity Net) provides efficient and reliable tools and pretrained models for text similarity computation. |
| **Semantic representation** :fire: | [pretrain_langauge_models](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models) | Integrates popular Chinese and English pretrained models such as ELMo, BERT, ERNIE 1.0, ERNIE 2.0, and XLNet. |
| **Text generation** | [seq2seq](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/seq2seq) | seq2seq provides a series of classic text generation model examples, such as vanilla seq2seq, seq2seq with attention, and variational seq2seq. |
| **Reading comprehension** | [machine_reading_comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_reading_comprehension) | Paddle Machine Reading Comprehension collects Baidu's models, tools, and open-source datasets for reading comprehension, including DuReader (Baidu's open-source, large-scale Chinese reading comprehension dataset built from real search user behavior), KT-Net (a knowledge-enhanced reading comprehension model that once ranked first on SQuAD and ReCoRD), and D-Net (a pretrain-finetune framework that won first place in the EMNLP 2019 MRQA international reading comprehension evaluation). |
| **Dialogue systems** | [dialogue_system](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system) | Includes: 1) DGU (Dialogue General Understanding), a general dialogue understanding model covering common dialogue-system tasks such as context-response matching in **retrieval-based chat systems** and **intent recognition**, **slot filling**, and **dialogue state tracking** in **task-oriented dialogue systems**, achieving the best results on six public international datasets;<br/> 2) knowledge-driven dialogue: Baidu's open-source knowledge-driven open-domain dialogue dataset, published at ACL 2019;<br/>3) ADEM (Auto Dialogue Evaluation Model): an automatic dialogue evaluation model for scoring the response quality of different dialogue generation models. |
| **Machine translation** | [machine_translation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_translation) | Paddle Machine Translation: a classic Transformer-based machine translation model. |
| **Other frontier work** | [Research](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research) | Open-source releases of Baidu's latest research. |
@@ -70,13 +70,13 @@ cd models/PaddleNLP/sentiment_classification
```text
.
├── Research                          # collection of Baidu NLP research work
├── machine_translation               # machine translation code, data, and pretrained models
├── dialogue_system                   # dialogue system code, data, and pretrained models
├── machine_reading_comprehension     # reading comprehension code, data, and pretrained models
├── pretrain_langauge_models          # language representation toolkit
├── language_model                    # language models
├── lexical_analysis                  # LAC lexical analysis
├── shared_modules/models             # shared networks
│   ├── __init__.py
│   ├── classification
│   ├── dialogue_model_toolkit
@@ -87,7 +87,7 @@ cd models/PaddleNLP/sentiment_classification
│   ├── representation
│   ├── sequence_labeling
│   └── transformer_encoder.py
├── shared_modules/preprocess         # shared text preprocessing tools
│   ├── __init__.py
│   ├── ernie
│   ├── padding.py
...
@@ -16,7 +16,6 @@
# limitations under the License.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
@@ -40,43 +39,55 @@ import math
np.random.seed(0)
random.seed(0)
parser = argparse.ArgumentParser(__doc__)
DEV_COUNT = 1
model_g = ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.") model_g.add_arg("init_checkpoint", str, None,
model_g.add_arg("checkpoints", str, "./checkpoints", "Path to save checkpoints.") "Init checkpoint to resume training from.")
model_g.add_arg("checkpoints", str, "./checkpoints",
"Path to save checkpoints.")
model_g.add_arg("config_path", str, "./data/input/model.conf", "Model conf.") model_g.add_arg("config_path", str, "./data/input/model.conf", "Model conf.")
model_g.add_arg("build_dict", bool, False, "Build dict.") model_g.add_arg("build_dict", bool, False, "Build dict.")
train_g = ArgumentGroup(parser, "training", "training options.") train_g = ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("cpu_num", int, 3, "Number of Threads.") train_g.add_arg("cpu_num", int, 3, "Number of Threads.")
train_g.add_arg("epoch", int, 100, "Number of epoches for training.") train_g.add_arg("epoch", int, 100, "Number of epoches for training.")
train_g.add_arg("learning_rate", float, 0.1, "Learning rate used to train with warmup.") train_g.add_arg("learning_rate", float, 0.1,
train_g.add_arg("save_steps", int, 1000, "The steps interval to save checkpoints.") "Learning rate used to train with warmup.")
train_g.add_arg("validation_steps", int, 100, "The steps interval to evaluate model performance.") train_g.add_arg("save_steps", int, 1000,
"The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 100,
"The steps interval to evaluate model performance.")
train_g.add_arg("random_seed", int, 7, "random seed") train_g.add_arg("random_seed", int, 7, "random seed")
train_g.add_arg("threshold", float, 0.1, "When the confidence exceeds the threshold, the corresponding label is given.") train_g.add_arg(
"threshold", float, 0.1,
"When the confidence exceeds the threshold, the corresponding label is given."
)
log_g = ArgumentGroup(parser, "logging", "logging related.") log_g = ArgumentGroup(parser, "logging", "logging related.")
log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.") log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
data_g = ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options") data_g = ArgumentGroup(parser, "data",
"Data paths, vocab paths and data processing options")
data_g.add_arg("data_dir", str, "./data/input/", "Path to training data.") data_g.add_arg("data_dir", str, "./data/input/", "Path to training data.")
data_g.add_arg("save_dir", str, "./data/output/", "Path to save.") data_g.add_arg("save_dir", str, "./data/output/", "Path to save.")
data_g.add_arg("max_seq_len", int, 50, "Tokens' number of the longest seqence allowed.") data_g.add_arg("max_seq_len", int, 50,
data_g.add_arg("batch_size", int, 64, "The total number of examples in one batch for training.") "Tokens' number of the longest seqence allowed.")
data_g.add_arg("batch_size", int, 64,
"The total number of examples in one batch for training.")
run_type_g = ArgumentGroup(parser, "run_type", "running type options.") run_type_g = ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.") run_type_g.add_arg("use_cuda", bool, False, "If set, use GPU for training.")
# run_type_g.add_arg("use_fast_executor", bool, False, "If set, use fast parallel executor (in experiment).") # run_type_g.add_arg("use_fast_executor", bool, False, "If set, use fast parallel executor (in experiment).")
run_type_g.add_arg("do_train", bool, True, "Whether to perform evaluation on test data set.") run_type_g.add_arg("do_train", bool, True,
run_type_g.add_arg("do_eval", bool, True, "Whether to perform evaluation on test data set.") "Whether to perform evaluation on test data set.")
run_type_g.add_arg("do_test", bool, True, "Whether to perform evaluation on test data set.") run_type_g.add_arg("do_eval", bool, True,
"Whether to perform evaluation on test data set.")
run_type_g.add_arg("do_test", bool, True,
"Whether to perform evaluation on test data set.")
args = parser.parse_args() args = parser.parse_args()
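The `ArgumentGroup` helper used above is a thin wrapper around `argparse` (only its `__init__` survives in the diff context of the utils module further down). A minimal sketch of what `add_arg` plausibly does, consistent with the call sites above; the real implementation may differ:

```python
import argparse

def str2bool(v):
    # argparse treats any non-empty string as truthy, so bool flags
    # need an explicit converter ("--use_cuda false" should be False)
    return v.lower() in ("true", "t", "1")

class ArgumentGroup(object):
    def __init__(self, parser, title, des):
        self._group = parser.add_argument_group(title=title, description=des)

    def add_arg(self, name, type, default, help, **kwargs):
        # hypothetical reconstruction: route bool through str2bool,
        # everything else through its own type callable
        type = str2bool if type == bool else type
        self._group.add_argument(
            "--" + name, default=default, type=type, help=help, **kwargs)
```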
def get_score(pred_result, label, eval_phase):
    """[get precision recall and f-score]
@@ -93,7 +104,7 @@ def get_score(pred_result, label, eval_phase):
        total += 1
        pred_labels = []
        actual_labels = []
        for j in range(1, len(pred_result[0])):  # the 0 one is background
            if pred_result[i][j] == 1:
                pred_labels.append(j)
            if label[i][j] == 1:
@@ -106,12 +117,12 @@ def get_score(pred_result, label, eval_phase):
                tp += 1
                true_cnt += 1
        elif len(pred_labels) == 0 and len(actual_labels) == 0:
            true_cnt += 1
    try:
        precision = tp * 1.0 / pred_pos_num
        recall = tp * 1.0 / pos_num
        f1 = 2 * precision * recall / (recall + precision)
    except Exception as e:
        precision = 0
        recall = 0
        f1 = 0
@@ -139,7 +150,7 @@ def train(args, train_exe, build_res, place):
    pred_label = build_res["pred_label"]
    label = build_res["label"]
    fetch_list = [cost.name, prediction.name, pred_label.name, label.name]
    train_data_loader = build_res["train_data_loader"]
    train_prog = build_res["train_prog"]
    steps = 0
    time_begin = time.time()
@@ -147,22 +158,24 @@ def train(args, train_exe, build_res, place):
    logger.info("Begin training")
    for i in range(args.epoch):
        try:
            for data in train_data_loader():
                avg_cost_np, avg_pred_np, pred_label, label = train_exe.run(feed=data, program=compiled_prog, \
                                                                            fetch_list=fetch_list)
                steps += 1
                if steps % int(args.skip_steps) == 0:
                    time_end = time.time()
                    used_time = time_end - time_begin
                    get_score(pred_label, label, eval_phase="Train")
                    logger.info('loss is {}'.format(avg_cost_np))
                    logger.info("epoch: %d, step: %d, speed: %f steps/s" %
                                (i, steps, args.skip_steps / used_time))
                    time_begin = time.time()
                if steps % args.save_steps == 0:
                    save_path = os.path.join(args.checkpoints,
                                             "step_" + str(steps))
                    fluid.io.save(train_prog, save_path)
                    logger.info("[save]step %d : save at %s" %
                                (steps, save_path))
                if steps % args.validation_steps == 0:
                    if args.do_eval:
                        evaluate(args, test_exe, build_res, "eval")
@@ -173,11 +186,16 @@ def train(args, train_exe, build_res, place):
            logger.error("Train error : %s" % str(e))
            exit(1)
    save_path = os.path.join(args.checkpoints, "step_" + str(steps))
    fluid.io.save(train_prog, save_path)
    logger.info("[save]step %d : save at %s" % (steps, save_path))
def evaluate(args,
             test_exe,
             build_res,
             eval_phase,
             save_result=False,
             id2intent=None):
    """[evaluate on dev/test dataset]

    Arguments:
@@ -193,7 +211,7 @@ def evaluate(args, test_exe, build_res, eval_phase, save_result=False, id2intent
        save_result {bool} -- [description] (default: {False})
        id2intent {[type]} -- [description] (default: {None})
    """
    place = build_res["test_place"]
    threshold = args.threshold
    cost = build_res["cost"]
    prediction = build_res["prediction"]
@@ -203,29 +221,34 @@ def evaluate(args, test_exe, build_res, eval_phase, save_result=False, id2intent
    total_cost, total_acc, pred_prob_list, pred_label_list, label_list = [], [], [], [], []
    if eval_phase == "eval":
        test_prog = build_res["eval_compiled_prog"]
        test_data_loader = build_res["eval_data_loader"]
    elif eval_phase == "test":
        test_prog = build_res["test_compiled_prog"]
        test_data_loader = build_res["test_data_loader"]
    else:
        exit(1)
    logger.info("-----------------------------------------------------------")
    for data in test_data_loader():
        avg_cost_np, avg_pred_np, pred_label, label = test_exe.run(program=test_prog, fetch_list=fetch_list, feed=data, \
                                                                   return_numpy=True)
        total_cost.append(avg_cost_np)
        pred_prob_list.extend(avg_pred_np)
        pred_label_list.extend(pred_label)
        label_list.extend(label)

    if save_result:
        logger.info("save result at : %s" % args.save_dir + "/" + eval_phase +
                    ".rst")
        save_dir = args.save_dir
        if not os.path.exists(save_dir):
            logger.warning("save dir not exists, and create it")
            os.makedirs(save_dir)
        fin = codecs.open(
            os.path.join(args.data_dir, eval_phase + ".txt"),
            "r",
            encoding="utf8")
        fout = codecs.open(
            args.save_dir + "/" + eval_phase + ".rst", "w", encoding="utf8")
        for line in pred_prob_list:
            query = fin.readline().rsplit("\t", 1)[0]
            res = []
@@ -236,18 +259,23 @@ def evaluate(args, test_exe, build_res, eval_phase, save_result=False, id2intent
            if len(res) == 0:
                res.append(id2intent[0])
            fout.write("%s\t%s\n" % (query, "\2".join(sorted(res))))
        fout.close()
        fin.close()
    logger.info("[%s] result: " % eval_phase)
    get_score(pred_label_list, label_list, eval_phase)
    logger.info('loss is {}'.format(sum(total_cost) * 1.0 / len(total_cost)))
    logger.info("-----------------------------------------------------------")
def create_net(args,
               flow_data,
               class_dim,
               dict_dim,
               place,
               model_name="textcnn_net",
               is_infer=False):
    """[create network and loader]

    Arguments:
        flow_data {[type]} -- [description]
@@ -266,29 +294,42 @@ def create_net(args, flow_data, class_dim, dict_dim, place, model_name="textcnn_
        model = textcnn_net_multi_label
    else:
        return
    char_list = fluid.data(
        name="char",
        shape=[None, args.max_seq_len, 1],
        dtype="int64",
        lod_level=0)
    label = fluid.data(
        name="label", shape=[None, class_dim], dtype="float32",
        lod_level=0)  # label data
    data_loader = fluid.io.DataLoader.from_generator(
        feed_list=[char_list, label],
        capacity=args.batch_size * 10,
        iterable=True,
        return_list=False)
    output = model(
        char_list,
        label,
        dict_dim,
        emb_dim=flow_data["model"]["emb_dim"],
        hid_dim=flow_data["model"]["hid_dim"],
        hid_dim2=flow_data["model"]["hid_dim2"],
        class_dim=class_dim,
        win_sizes=flow_data["model"]["win_sizes"],
        is_infer=is_infer,
        threshold=args.threshold,
        max_seq_len=args.max_seq_len)
    if is_infer:
        prediction = output
        return [data_loader, prediction]
    else:
        avg_cost, prediction, pred_label, label = output[0], output[1], output[
            2], output[3]
        return [data_loader, avg_cost, prediction, pred_label, label]
def build_data_loader(args, char_dict, intent_dict):
    """[decorate samples for dataloader]

    Arguments:
        args {[type]} -- [description]
@@ -298,20 +339,22 @@ def build_data_reader(args, char_dict, intent_dict):
    Returns:
        [type] -- [description]
    """
    loader_res = {}
    if args.do_train:
        train_processor = DataReader(char_dict, intent_dict, args.max_seq_len)
        train_data_generator = train_processor.prepare_data(
            data_path=args.data_dir + "train.txt",
            batch_size=args.batch_size,
            mode='train')
        loader_res["train_data_generator"] = train_data_generator
        num_train_examples = train_processor._get_num_examples()
        logger.info("Num train examples: %d" % num_train_examples)
        logger.info("Num train steps: %d" % (math.ceil(num_train_examples * 1.0 / args.batch_size) * \
                    args.epoch // DEV_COUNT))
        if math.ceil(num_train_examples * 1.0 /
                     args.batch_size) // DEV_COUNT <= 0:
            logger.error(
                "Num of train steps is less than 0 or equals to 0, exit")
            exit(1)
    if args.do_eval:
        eval_processor = DataReader(char_dict, intent_dict, args.max_seq_len)
@@ -319,7 +362,7 @@ def build_data_reader(args, char_dict, intent_dict):
            data_path=args.data_dir + "eval.txt",
            batch_size=args.batch_size,
            mode='eval')
        loader_res["eval_data_generator"] = eval_data_generator
        num_eval_examples = eval_processor._get_num_examples()
        logger.info("Num eval examples: %d" % num_eval_examples)
    if args.do_test:
@@ -328,11 +371,12 @@ def build_data_reader(args, char_dict, intent_dict):
            data_path=args.data_dir + "test.txt",
            batch_size=args.batch_size,
            mode='test')
        loader_res["test_data_generator"] = test_data_generator
    return loader_res
def build_graph(args, model_config, num_labels, dict_dim, place, test_place,
                loader_res):
    """[build paddle graph]

    Arguments:
@@ -341,7 +385,7 @@ def build_graph(args, model_config, num_labels, dict_dim, place, test_place, rea
        num_labels {[type]} -- [description]
        dict_dim {[type]} -- [description]
        place {[type]} -- [description]
        loader_res {[type]} -- [description]

    Returns:
        [type] -- [description]
@@ -349,7 +393,7 @@ def build_graph(args, model_config, num_labels, dict_dim, place, test_place, rea
    res = {}
    cost, prediction, pred_label, label = None, None, None, None
    train_prog = fluid.default_main_program()
    startup_prog = fluid.default_startup_program()
    eval_prog = train_prog.clone(for_test=True)
    test_prog = train_prog.clone(for_test=True)
@@ -358,36 +402,42 @@ def build_graph(args, model_config, num_labels, dict_dim, place, test_place, rea
    if args.do_train:
        with fluid.program_guard(train_prog, startup_prog):
            with fluid.unique_name.guard():
                train_data_loader, cost, prediction, pred_label, label = create_net(args, model_config, num_labels, \
                    dict_dim, place, model_name="textcnn_net")
                train_data_loader.set_sample_list_generator(
                    loader_res['train_data_generator'], places=place)
                res["train_data_loader"] = train_data_loader
                sgd_optimizer = fluid.optimizer.SGD(
                    learning_rate=fluid.layers.exponential_decay(
                        learning_rate=args.learning_rate,
                        decay_steps=1000,
                        decay_rate=0.5,
                        staircase=True))
                sgd_optimizer.minimize(cost)
    if args.do_eval:
        with fluid.program_guard(eval_prog, startup_prog):
            with fluid.unique_name.guard():
                eval_data_loader, cost, prediction, pred_label, label = create_net(args, model_config, num_labels, \
                    dict_dim, test_place, model_name="textcnn_net")
                eval_data_loader.set_sample_list_generator(
                    loader_res['eval_data_generator'], places=test_place)
                res["eval_data_loader"] = eval_data_loader
    if args.do_test:
        with fluid.program_guard(test_prog, startup_prog):
            with fluid.unique_name.guard():
                test_data_loader, cost, prediction, pred_label, label = create_net(args, model_config, num_labels, \
                    dict_dim, test_place, model_name="textcnn_net")
                test_data_loader.set_sample_list_generator(
                    loader_res['test_data_generator'], places=test_place)
                res["test_data_loader"] = test_data_loader
    res["cost"] = cost
    res["prediction"] = prediction
    res["label"] = label
    res["pred_label"] = pred_label
    res["train_prog"] = train_prog
    res["eval_prog"] = eval_prog
    res["test_prog"] = test_prog
    return res
@@ -421,22 +471,25 @@ def main(args):
        id2intent[int(value)] = key
    num_labels = len(intent_dict)
    # build model
    loader_res = build_data_loader(args, char_dict, intent_dict)
    build_res = build_graph(args, model_config, num_labels, dict_dim, place,
                            test_place, loader_res)
    build_res["place"] = place
    build_res["test_place"] = test_place
    if not (args.do_train or args.do_eval or args.do_test):
        raise ValueError("For args `do_train`, `do_eval` and `do_test`, at "
                         "least one of them must be True.")
    exe.run(startup_prog)
    if args.init_checkpoint and args.init_checkpoint != "None":
        try:
            init_checkpoint(
                exe, args.init_checkpoint, main_program=startup_prog)
            logger.info("Load model from %s" % args.init_checkpoint)
        except Exception as e:
            logger.exception(str(e))
            logger.error("Failed to load model from %s [%s]" %
                         (args.init_checkpoint, str(e)))
    build_strategy = fluid.compiler.BuildStrategy()
    build_strategy.fuse_all_reduce_ops = False
    exec_strategy = fluid.ExecutionStrategy()
@@ -449,22 +502,23 @@ def main(args):
        exec_strategy=exec_strategy)
    build_res["compiled_prog"] = compiled_prog
    if args.do_test:
        test_compiled_prog = fluid.compiler.CompiledProgram(build_res[
            "test_prog"])
        build_res["test_compiled_prog"] = test_compiled_prog
    if args.do_eval:
        eval_compiled_prog = fluid.compiler.CompiledProgram(build_res[
            "eval_prog"])
        build_res["eval_compiled_prog"] = eval_compiled_prog
    if args.do_train:
        train(args, exe, build_res, place)
    if args.do_eval:
        evaluate(args, exe, build_res, "eval", \
                 save_result=True, id2intent=id2intent)
    if args.do_test:
        evaluate(args, exe, build_res, "test",\
                 save_result=True, id2intent=id2intent)


if __name__ == "__main__":
    logger.info("the paddle version is %s" % paddle.__version__)
...
@@ -32,14 +32,13 @@ try:
except ImportError:
    import ConfigParser as cp

random_seed = 7
logger = logging.getLogger()
format = "%(asctime)s - %(name)s - %(levelname)s -%(filename)s-%(lineno)4d -%(message)s"
# format = "%(levelname)8s: %(asctime)s: %(filename)s:%(lineno)4d %(message)s"
logging.basicConfig(format=format)
logger.setLevel(logging.INFO)
logger = logging.getLogger('Paddle-DDC')


def str2bool(v):
@@ -77,6 +76,7 @@ class ArgumentGroup(object):
    Arguments:
        object {[type]} -- [description]
    """

    def __init__(self, parser, title, des):
        self._group = parser.add_argument_group(title=title, description=des)
@@ -107,6 +107,7 @@ class DataReader(object):
    Returns:
        [type] -- [description]
    """

    def __init__(self, char_vocab, intent_dict, max_len):
        self._char_vocab = char_vocab
        self._intent_dict = intent_dict
@@ -115,10 +116,10 @@ class DataReader(object):
        self.all_data = []
        self.max_len = max_len
        self.padding_id = 0

    def _get_num_examples(self):
        return len(self.all_data)

    def prepare_data(self, data_path, batch_size, mode):
        """
        prepare data
@@ -128,12 +129,17 @@ class DataReader(object):
        # word_dict_path), "The given word dictionary does not exist."
        assert os.path.exists(data_path), "The given data file does not exist."
        if mode == "train":
            train_reader = fluid.io.batch(
                paddle.reader.shuffle(
                    self.data_reader(
                        data_path, self.max_len, shuffle=True),
                    buf_size=batch_size * 100),
                batch_size)
            # train_reader = fluid.io.batch(self.data_reader(data_path), batch_size)
            return train_reader
        else:
            test_reader = fluid.io.batch(
                self.data_reader(data_path, self.max_len), batch_size)
            return test_reader
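`prepare_data` builds the generators that `build_data_loader` later hands to `set_sample_list_generator`: `self.data_reader(...)` yields one `(char_id_list, intent_id_list)` sample at a time, `paddle.reader.shuffle` buffers `buf_size` samples and re-yields them in random order, and `fluid.io.batch` groups the shuffled stream into lists of `batch_size` samples. A toy sketch of how the decorators compose (hypothetical data):

```python
import paddle
import paddle.fluid as fluid

def toy_sample_reader():
    # stands in for DataReader.data_reader: yields one sample per iteration
    for i in range(10):
        yield [i], [i % 2]

batched = fluid.io.batch(
    paddle.reader.shuffle(toy_sample_reader, buf_size=100), batch_size=4)

for batch in batched():  # each batch is a list of up to 4 samples
    print(batch)
```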
    def data_reader(self, file_path, max_len, shuffle=False):
@@ -141,7 +147,7 @@ class DataReader(object):
        Convert query into id list
        use fixed voc
        """
        for line in codecs.open(file_path, "r", encoding="utf8"):
            line = line.strip()
            if isinstance(line, six.binary_type):
@@ -150,7 +156,8 @@ class DataReader(object):
            char_id_list = list(map(lambda x: 0 if x not in self._char_vocab else int(self._char_vocab[x]), \
                                list(query)))
            if len(char_id_list) < max_len:
                char_id_list.extend([self.padding_id] *
                                    (max_len - len(char_id_list)))
            char_id_list = char_id_list[:max_len]
            intent_id_list = [self.padding_id] * self.intent_size
            for item in intent.split('\2'):
@@ -159,6 +166,7 @@ class DataReader(object):
        if shuffle:
            random.seed(random_seed)
            random.shuffle(self.all_data)

        def reader():
            """
            reader
@@ -166,6 +174,7 @@ class DataReader(object):
            for char_id_list, intent_id_list in self.all_data:
                # print char_id_list, intent_id
                yield char_id_list, intent_id_list

        return reader
@@ -178,6 +187,7 @@ class DataProcesser(object):
    Returns:
        [type] -- [description]
    """

    @staticmethod
    def read_dict(filename):
        """
@@ -211,7 +221,7 @@ class DataProcesser(object):
        char_dict = {}
        intent_dict = {}
        # readfile
        for line in codecs.open(filename):
            line = line.strip()
            if isinstance(line, six.binary_type):
                line = line.strip().decode("utf8", errors="ignore")
@@ -227,7 +237,8 @@ class DataProcesser(object):
                intent_dict[intent] = 0
            intent_dict[intent] += 1
        # save char dict
        with codecs.open(
                "%s/char.dict" % save_dir, "w", encoding="utf8") as f_out:
            f_out.write("PAD\0020\n")
            f_out.write("OOV\0021\n")
            char_id = 2
@@ -238,7 +249,8 @@ class DataProcesser(object):
                f_out.write("%s\002%d\n" % (key, char_id))
                char_id += 1
        # save intent dict
        with codecs.open(
                "%s/domain.dict" % save_dir, "w", encoding="utf8") as f_out:
            f_out.write("SYS_OTHER\0020\n")
            intent_id = 1
            for key, value in intent_dict.items():
@@ -247,7 +259,6 @@ class DataProcesser(object):
                key = key.encode("utf8")
                f_out.write("%s\002%d\n" % (key, intent_id))
                intent_id += 1


class ConfigReader(object):
@@ -282,49 +293,13 @@ class ConfigReader(object):
        return flow_data
def init_checkpoint(exe, init_checkpoint_path, main_program):
    """
    Init CheckPoint
    """
    fluid.load(main_program, init_checkpoint_path, exe)
    print("Load model from {}".format(init_checkpoint_path))
def print_arguments(args):
    """
@@ -350,5 +325,3 @@ def check_version(version='1.6.0'):
    except Exception as e:
        logger.error(err)
        sys.exit(1)
@@ -468,7 +468,7 @@ python -u main.py \
    --loss_type="CLS"
```

#### On Windows:
Evaluation:
```
python -u main.py --do_eval=true --use_cuda=false --evaluation_file=data\input\data\unlabel_data\test.ids --output_prediction_file=data\output\pretrain_matching_predict --loss_type=CLS
```
...
@@ -21,14 +21,16 @@ from kpi import DurationKpi
train_loss_card1 = CostKpi('train_loss_card1', 0.03, 0, actived=True)
train_loss_card4 = CostKpi('train_loss_card4', 0.03, 0, actived=True)
train_duration_card1 = DurationKpi(
    'train_duration_card1', 0.01, 0, actived=True)
train_duration_card4 = DurationKpi(
    'train_duration_card4', 0.01, 0, actived=True)

tracking_kpis = [
    train_loss_card1,
    train_loss_card4,
    train_duration_card1,
    train_duration_card4,
]
...
@@ -20,48 +20,52 @@ import sys
import io
import os

URLLIB = urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB = urllib.request

DATA_MODEL_PATH = {
    "DATA_PATH":
    "https://baidu-nlp.bj.bcebos.com/auto_dialogue_evaluation_dataset-1.0.0.tar.gz",
    "TRAINED_MODEL":
    "https://baidu-nlp.bj.bcebos.com/auto_dialogue_evaluation_models.2.0.0.tar.gz"
}
PATH_MAP = {'DATA_PATH': "./data/input", 'TRAINED_MODEL': './data/saved_models'}
def un_tar(tar_name, dir_name):
    try:
        t = tarfile.open(tar_name)
        t.extractall(path=dir_name)
        return True
    except Exception as e:
        print(e)
        return False


def download_model_and_data():
    print("Downloading ade data, pretrain model and trained models......")
    print("This process is quite long, please wait patiently............")
    for path in ['./data/input/data', './data/saved_models/trained_models']:
        if not os.path.exists(path):
            continue
        shutil.rmtree(path)
    for path_key in DATA_MODEL_PATH:
        filename = os.path.basename(DATA_MODEL_PATH[path_key])
        URLLIB.urlretrieve(DATA_MODEL_PATH[path_key],
                           os.path.join("./", filename))
        state = un_tar(filename, PATH_MAP[path_key])
        if not state:
            print("Tar %s error....." % path_key)
            return False
        os.remove(filename)
    return True


if __name__ == "__main__":
    state = download_model_and_data()
    if not state:
        exit(1)
    print("Downloading data and models success......")
@@ -25,8 +25,8 @@ import numpy as np
import paddle.fluid as fluid


class InputField(object):
    def __init__(self, input_field):
        """init input field"""
        self.context_wordseq = input_field[0]
        self.response_wordseq = input_field[1]
...
@@ -30,7 +30,7 @@ def check_cuda(use_cuda, err = \
if __name__ == "__main__":
    check_cuda(True)
    check_cuda(False)
...
@@ -69,8 +69,8 @@ def init_from_checkpoint(args, exe, program):
def init_from_params(args, exe, program):
    assert isinstance(args.init_from_params, str)

    if not os.path.exists(args.init_from_params):
        raise Warning("the params path does not exist.")
        return False
@@ -122,5 +122,3 @@ def save_param(args, exe, program, dirname):
    print("save parameters at %s" % (os.path.join(param_dir, dirname)))
    return True
@@ -21,14 +21,13 @@ import paddle
import paddle.fluid as fluid


def create_net(is_training,
               model_input,
               args,
               clip_value=10.0,
               word_emb_name="shared_word_emb",
               lstm_W_name="shared_lstm_W",
               lstm_bias_name="shared_lstm_bias"):
    context_wordseq = model_input.context_wordseq
    response_wordseq = model_input.response_wordseq
@@ -52,17 +51,15 @@ def create_net(
            initializer=fluid.initializer.Normal(scale=0.1)))

    #fc to fit dynamic LSTM
    context_fc = fluid.layers.fc(input=context_emb,
                                 size=args.hidden_size * 4,
                                 param_attr=fluid.ParamAttr(name='fc_weight'),
                                 bias_attr=fluid.ParamAttr(name='fc_bias'))
    response_fc = fluid.layers.fc(input=response_emb,
                                  size=args.hidden_size * 4,
                                  param_attr=fluid.ParamAttr(name='fc_weight'),
                                  bias_attr=fluid.ParamAttr(name='fc_bias'))

    #LSTM
    context_rep, _ = fluid.layers.dynamic_lstm(
@@ -82,7 +79,7 @@ def create_net(
    logits = fluid.layers.bilinear_tensor_product(
        context_rep, response_rep, size=1)

    if args.loss_type == 'CLS':
        label = fluid.layers.cast(x=label, dtype='float32')
        loss = fluid.layers.sigmoid_cross_entropy_with_logits(logits, label)
        loss = fluid.layers.reduce_mean(
@@ -95,10 +92,10 @@ def create_net(
        loss = fluid.layers.reduce_mean(loss)
    else:
        raise ValueError

    if is_training:
        return loss
    else:
        return logits
@@ -106,7 +103,5 @@ def set_word_embedding(word_emb, place, word_emb_name="shared_word_emb"):
    """
    Set word embedding
    """
    word_emb_param = fluid.global_scope().find_var(word_emb_name).get_tensor()
    word_emb_param.set(word_emb, place)
@@ -23,13 +23,13 @@ import ade.evaluate as evaluate
from ade.utils.configure import PDConfig


def do_eval(args):
    """evaluate metrics"""
    labels = []
    fr = io.open(args.evaluation_file, 'r', encoding="utf8")
    for line in fr:
        tokens = line.strip().split('\t')
        assert len(tokens) == 3
        label = int(tokens[2])
        labels.append(label)
@@ -43,25 +43,25 @@ def do_eval(args):
        score = score.astype(np.float64)
        scores.append(score)

    if args.loss_type == 'CLS':
        recall_dict = evaluate.evaluate_Recall(list(zip(scores, labels)))
        mean_score = sum(scores) / len(scores)
        print('mean score: %.6f' % mean_score)
        print('evaluation recall result:')
        print('1_in_2: %.6f\t1_in_10: %.6f\t2_in_10: %.6f\t5_in_10: %.6f' %
              (recall_dict['1_in_2'], recall_dict['1_in_10'],
               recall_dict['2_in_10'], recall_dict['5_in_10']))
    elif args.loss_type == 'L2':
        scores = [x[0] for x in scores]
        mean_score = sum(scores) / len(scores)
        cor = evaluate.evaluate_cor(scores, labels)
        print('mean score: %.6f\nevaluation cor results:%.6f' %
              (mean_score, cor))
    else:
        raise ValueError


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/ade.yaml")
    args.build()
...
...@@ -42,22 +42,24 @@ def do_save_inference_model(args):
    with fluid.unique_name.guard():
        context_wordseq = fluid.data(
            name='context_wordseq',
            shape=[-1, 1],
            dtype='int64',
            lod_level=1)
        response_wordseq = fluid.data(
            name='response_wordseq',
            shape=[-1, 1],
            dtype='int64',
            lod_level=1)
        labels = fluid.data(name='labels', shape=[-1, 1], dtype='int64')

        input_inst = [context_wordseq, response_wordseq, labels]
        input_field = InputField(input_inst)
        data_reader = fluid.io.PyReader(
            feed_list=input_inst, capacity=4, iterable=False)

        logits = create_net(
            is_training=False, model_input=input_field, args=args)

    if args.use_cuda:
        place = fluid.CUDAPlace(0)
...@@ -68,7 +70,7 @@ def do_save_inference_model(args):
    exe.run(startup_prog)

    assert (args.init_from_params) or (args.init_from_pretrain_model)

    if args.init_from_params:
        save_load_io.init_from_params(args, exe, test_prog)
    elif args.init_from_pretrain_model:
...@@ -76,24 +78,22 @@ def do_save_inference_model(args):
    # saving inference model
    fluid.io.save_inference_model(
        args.inference_model_dir,
        feeded_var_names=[
            input_field.context_wordseq.name,
            input_field.response_wordseq.name,
        ],
        target_vars=[logits, ],
        executor=exe,
        main_program=test_prog,
        model_filename="model.pdmodel",
        params_filename="params.pdparams")

    print("save inference model at %s" % (args.inference_model_dir))


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/ade.yaml")
    args.build()
    check_cuda(args.use_cuda)
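To sanity-check the exported model, it can be loaded back with the matching Fluid API. A minimal sketch, assuming the directory value of `args.inference_model_dir` and the same `model.pdmodel`/`params.pdparams` filenames used above:

```
import paddle.fluid as fluid

place = fluid.CPUPlace()
exe = fluid.Executor(place)
# load_inference_model returns the program plus feed/fetch metadata
[infer_prog, feed_names, fetch_targets] = fluid.io.load_inference_model(
    "./inference_model",  # assumed value of args.inference_model_dir
    exe,
    model_filename="model.pdmodel",
    params_filename="params.pdparams")
print(feed_names)  # ['context_wordseq', 'response_wordseq']
```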
......
...@@ -26,7 +26,6 @@ from inference_model import do_save_inference_model
from ade.utils.configure import PDConfig


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/ade.yaml")
......
...@@ -32,7 +32,7 @@ from ade.utils.model_check import check_cuda
import ade.utils.save_load_io as save_load_io


def do_predict(args):
    """
    predict function
    """
...@@ -46,30 +46,32 @@ def do_predict(args):
    with fluid.unique_name.guard():
        context_wordseq = fluid.data(
            name='context_wordseq',
            shape=[-1, 1],
            dtype='int64',
            lod_level=1)
        response_wordseq = fluid.data(
            name='response_wordseq',
            shape=[-1, 1],
            dtype='int64',
            lod_level=1)
        labels = fluid.data(name='labels', shape=[-1, 1], dtype='int64')

        input_inst = [context_wordseq, response_wordseq, labels]
        input_field = InputField(input_inst)
        data_reader = fluid.io.PyReader(
            feed_list=input_inst, capacity=4, iterable=False)

        logits = create_net(
            is_training=False, model_input=input_field, args=args)

        logits.persistable = True

        fetch_list = [logits.name]

    # for_test=True changes the is_test attribute of operators to True
    test_prog = test_prog.clone(for_test=True)

    if args.use_cuda:
        place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
    else:
        place = fluid.CPUPlace()

    exe = fluid.Executor(place)
...@@ -85,42 +87,39 @@ def do_predict(args):
    processor = reader.DataProcessor(
        data_path=args.predict_file,
        max_seq_length=args.max_seq_len,
        batch_size=args.batch_size)

    batch_generator = processor.data_generator(
        place=place, phase="test", shuffle=False, sample_pro=1)
    num_test_examples = processor.get_num_examples(phase='test')

    data_reader.decorate_batch_generator(batch_generator)
    data_reader.start()

    scores = []
    while True:
        try:
            results = exe.run(compiled_test_prog, fetch_list=fetch_list)
            scores.extend(results[0])
        except fluid.core.EOFException:
            data_reader.reset()
            break

    scores = scores[:num_test_examples]
    print("Write the predicted results into the output_prediction_file")
    fw = io.open(args.output_prediction_file, 'w', encoding="utf8")
    for index, score in enumerate(scores):
        fw.write("%s\t%s\n" % (index, score))
    print("finish........................................")


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/ade.yaml")
    args.build()
    args.Print()

    check_cuda(args.use_cuda)

    do_predict(args)
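Both inputs are declared with `lod_level=1`, so each feed is a batch of variable-length word-id sequences packed into one level-1 LoDTensor. A hedged sketch of what one such feed looks like (toy ids, not from the real vocabulary):

```
import numpy as np
import paddle.fluid as fluid

place = fluid.CPUPlace()
# two utterances of lengths 3 and 2, flattened to shape [5, 1]
ids = np.array([[12], [7], [256], [9], [33]], dtype='int64')
# recursive_seq_lens gives the per-sequence lengths for LoD level 1
context = fluid.create_lod_tensor(ids, [[3, 2]], place)
print(context.recursive_sequence_lengths())  # [[3, 2]]
```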
...@@ -31,7 +31,7 @@ from ade.utils.input_field import InputField
from ade.utils.model_check import check_cuda
import ade.utils.save_load_io as save_load_io

try:
    import cPickle as pickle  # python 2
except ImportError as e:
    import pickle  # python 3
...@@ -47,24 +47,26 @@ def do_train(args):
        train_prog.random_seed = args.random_seed
        startup_prog.random_seed = args.random_seed

        with fluid.unique_name.guard():
            context_wordseq = fluid.data(
                name='context_wordseq',
                shape=[-1, 1],
                dtype='int64',
                lod_level=1)
            response_wordseq = fluid.data(
                name='response_wordseq',
                shape=[-1, 1],
                dtype='int64',
                lod_level=1)
            labels = fluid.data(name='labels', shape=[-1, 1], dtype='int64')

            input_inst = [context_wordseq, response_wordseq, labels]
            input_field = InputField(input_inst)
            data_reader = fluid.io.PyReader(
                feed_list=input_inst, capacity=4, iterable=False)

            loss = create_net(
                is_training=True, model_input=input_field, args=args)
            loss.persistable = True

            # gradient clipping
            fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByValue(
...@@ -74,20 +76,21 @@ def do_train(args):

    if args.use_cuda:
        dev_count = fluid.core.get_cuda_device_count()
        place = fluid.CUDAPlace(
            int(os.getenv('FLAGS_selected_gpus', '0')))
    else:
        dev_count = int(os.environ.get('CPU_NUM', 1))
        place = fluid.CPUPlace()

    processor = reader.DataProcessor(
        data_path=args.training_file,
        max_seq_length=args.max_seq_len,
        batch_size=args.batch_size)

    batch_generator = processor.data_generator(
        place=place,
        phase="train",
        shuffle=True,
        sample_pro=args.sample_pro)

    num_train_examples = processor.get_num_examples(phase='train')
...@@ -105,18 +108,23 @@ def do_train(args):
        args.init_from_pretrain_model == "")

    # init from some checkpoint, to resume the previous training
    if args.init_from_checkpoint:
        save_load_io.init_from_checkpoint(args, exe, train_prog)

    # init from some pretrain models, to better solve the current task
    if args.init_from_pretrain_model:
        save_load_io.init_from_pretrain_model(args, exe, train_prog)

    if args.word_emb_init:
        print("start loading word embedding init ...")
        if six.PY2:
            word_emb = np.array(
                pickle.load(io.open(args.word_emb_init, 'rb'))).astype(
                    'float32')
        else:
            word_emb = np.array(
                pickle.load(
                    io.open(args.word_emb_init, 'rb'),
                    encoding="bytes")).astype('float32')
        set_word_embedding(word_emb, place)
        print("finish init word embedding ...")
...@@ -124,69 +132,74 @@ def do_train(args):
    build_strategy.enable_inplace = True

    compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
        loss_name=loss.name, build_strategy=build_strategy)

    steps = 0
    begin_time = time.time()
    time_begin = time.time()

    for epoch_step in range(args.epoch):
        data_reader.start()
        sum_loss = 0.0
        ce_loss = 0.0
        while True:
            try:
                fetch_list = [loss.name]
                outputs = exe.run(compiled_train_prog, fetch_list=fetch_list)
                np_loss = outputs
                sum_loss += np.array(np_loss).mean()
                ce_loss = np.array(np_loss).mean()

                if steps % args.print_steps == 0:
                    time_end = time.time()
                    used_time = time_end - time_begin
                    current_time = time.strftime('%Y-%m-%d %H:%M:%S',
                                                 time.localtime(time.time()))
                    print(
                        '%s epoch: %d, step: %s, avg loss %s, speed: %f steps/s'
                        % (current_time, epoch_step, steps, sum_loss /
                           args.print_steps, args.print_steps / used_time))
                    sum_loss = 0.0
                    time_begin = time.time()

                if steps % args.save_steps == 0:
                    if args.save_checkpoint:
                        save_load_io.save_checkpoint(args, exe, train_prog,
                                                     "step_" + str(steps))
                    if args.save_param:
                        save_load_io.save_param(args, exe, train_prog,
                                                "step_" + str(steps))
                steps += 1
            except fluid.core.EOFException:
                data_reader.reset()
                break

    if args.save_checkpoint:
        save_load_io.save_checkpoint(args, exe, train_prog, "step_final")
    if args.save_param:
        save_load_io.save_param(args, exe, train_prog, "step_final")

    def get_cards():
        num = 0
        cards = os.environ.get('CUDA_VISIBLE_DEVICES', '')
        if cards != '':
            num = len(cards.split(","))
        return num

    if args.enable_ce:
        card_num = get_cards()
        pass_time_cost = time.time() - begin_time
        print("test_card_num", card_num)
        print("kpis\ttrain_duration_card%s\t%s" % (card_num, pass_time_cost))
        print("kpis\ttrain_loss_card%s\t%f" % (card_num, ce_loss))


if __name__ == '__main__':
    args = PDConfig(yaml_file="./data/config/ade.yaml")
    args.build()
    args.Print()

    check_cuda(args.use_cuda)

    do_train(args)
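The `word_emb_init` branch above expects a pickled float matrix. A minimal sketch of producing such a file, assuming hypothetical `voc_size` and `emb_dim` values (the real sizes must match the model's embedding table):

```
import io
import pickle

import numpy as np

voc_size, emb_dim = 484016, 256  # hypothetical sizes
word_emb = np.random.normal(
    scale=0.1, size=(voc_size, emb_dim)).astype('float32')
# protocol=2 keeps the file loadable from both Python 2 and Python 3
with io.open('word_emb.pkl', 'wb') as fw:
    pickle.dump(word_emb, fw, protocol=2)
```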
...@@ -62,7 +62,7 @@ SWDA: Switchboard Dialogue Act Corpus;

&ensp;&ensp;&ensp;&ensp;Download the datasets and related models:

&ensp;&ensp;&ensp;&ensp;On Linux:
```
python dgu/prepare_data_and_model.py
```
&ensp;&ensp;&ensp;&ensp;Data path: data/input/data
...@@ -72,7 +72,7 @@ python dgu/prepare_data_and_model.py

&ensp;&ensp;&ensp;&ensp;On Windows:
```
python dgu\prepare_data_and_model.py
```

&ensp;&ensp;&ensp;&ensp;The downloaded datasets already include train, test, and dev splits. If you need to regenerate the training data for a given task, run:
...@@ -164,19 +164,19 @@ task_type: one of the 5 options train, predict, evaluate, inference, all
Training example: bash run.sh atis_intent train
```

&ensp;&ensp;&ensp;&ensp;For CPU training:

```
Set the parameter in run.sh as:
1. export CUDA_VISIBLE_DEVICES=
```

&ensp;&ensp;&ensp;&ensp;For GPU training:

```
Set the parameters in run.sh as:
1. For single-card training (pick one idle card):
export CUDA_VISIBLE_DEVICES=0
2. For multi-card training (pick several idle cards):
export CUDA_VISIBLE_DEVICES=0,1,2,3
```
...@@ -252,19 +252,19 @@ task_type: one of the 5 options train, predict, evaluate, inference, all
Prediction example: bash run.sh atis_intent predict
```

&ensp;&ensp;&ensp;&ensp;For CPU prediction:

```
Set the parameter in run.sh as:
1. export CUDA_VISIBLE_DEVICES=
```

&ensp;&ensp;&ensp;&ensp;For GPU prediction:

```
Set the parameters in run.sh as:
Single-card prediction is supported (pick one idle card):
export CUDA_VISIBLE_DEVICES=0
```

Note: for prediction via method 1, you can change the init_from_params parameter in run.sh to point at your own trained model; by default the code loads the officially released trained model.
...@@ -348,7 +348,7 @@ task_type: one of the 5 options train, predict, evaluate, inference, all

Note: evaluation scores ground_truth against predict_label; the default CPU computation is sufficient.

#### &ensp;&ensp;&ensp;&ensp;Method 2: run the evaluation code directly:

```
TASK_NAME="atis_intent"  # name of the task to evaluate
...@@ -363,7 +363,7 @@ python -u main.py \

#### On Windows

```
python -u main.py --task_name=atis_intent --use_cuda=false --do_eval=true --evaluation_file=data\input\data\atis\atis_intent\test.txt --output_prediction_file=data\output\pred_atis_intent
```

### Model inference
...@@ -378,22 +378,22 @@ task_type: one of the 5 options train, predict, evaluate, inference, all
Model export example: bash run.sh atis_intent inference
```

&ensp;&ensp;&ensp;&ensp;To run the inference-model export on CPU:

```
Set the parameter in run.sh as:
1. export CUDA_VISIBLE_DEVICES=
```

&ensp;&ensp;&ensp;&ensp;To run the inference-model export on GPU:

```
Set the parameters in run.sh as:
1. Single-card inference (pick one idle card):
export CUDA_VISIBLE_DEVICES=0
```

#### &ensp;&ensp;&ensp;&ensp;Method 2: run the inference-model code directly:

```
TASK_NAME="atis_intent"  # name of the task
...@@ -459,7 +459,7 @@ python -u main.py \

&ensp;&ensp;&ensp;&ensp;You can also assemble a custom model to fit your own needs, as follows:

&ensp;&ensp;&ensp;&ensp;a. Custom data

&ensp;&ensp;&ensp;&ensp;&ensp;&ensp;If your dataset is named **task_name**, create a **task_name** folder under **data/input/data** and put the dataset there; in **dgu/reader.py**, add a custom data-processing class (for example, the **udc** dataset maps to **UDCProcessor**); in **train.py**, register the mapping between **task_name** and the processor (e.g. **processors = {'udc': reader.UDCProcessor}**), as sketched below.
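A hedged sketch of that registration step, assuming your processor follows the same interface as `reader.UDCProcessor` (the class body and label set here are hypothetical):

```
# dgu/reader.py: add a processor for the hypothetical dataset "task_name"
class TaskNameProcessor(DataProcessor):
    """Parses data/input/data/task_name/{train,dev,test}.txt."""

    def get_labels(self):
        return ["0", "1"]  # hypothetical label set


# train.py: route the task name to the new processor
processors = {
    'udc': reader.UDCProcessor,
    'task_name': reader.TaskNameProcessor,
}
```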
...@@ -481,7 +481,7 @@ python -u main.py \
- Elizabeth Shriberg, Raj Dhillon, Sonali Bhagat, Jeremy Ang, and Hannah Carvey. 2004. The ICSI meeting recorder dialog act (MRDA) corpus. Technical report, International Computer Science Institute, Berkeley, CA.
- Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373.
- Ye-Yi Wang, Li Deng, and Alex Acero. 2005. Spoken language understanding. IEEE Signal Processing Magazine, 22(5):16–31.
- Jason Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. 2013. The dialog state tracking challenge. In Proceedings of the SIGDIAL 2013 Conference, pages 404–413.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
- Kaisheng Yao, Geoffrey Zweig, Mei-Yuh Hwang, Yangyang Shi, and Dong Yu. 2013. Recurrent neural networks for language understanding. In Interspeech, pages 2524–2528.
- Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu, and Hua Wu. 2018. Multi-turn response selection for chatbots with deep attention matching network. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1118–1127.
- Su Zhu and Kai Yu. 2017. Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5675–5679. IEEE.
......
...@@ -20,20 +20,26 @@ from kpi import CostKpi
from kpi import DurationKpi
from kpi import AccKpi

each_step_duration_atis_slot_card1 = DurationKpi(
    'each_step_duration_atis_slot_card1', 0.01, 0, actived=True)
train_loss_atis_slot_card1 = CostKpi(
    'train_loss_atis_slot_card1', 0.08, 0, actived=True)
train_acc_atis_slot_card1 = CostKpi(
    'train_acc_atis_slot_card1', 0.01, 0, actived=True)
each_step_duration_atis_slot_card4 = DurationKpi(
    'each_step_duration_atis_slot_card4', 0.06, 0, actived=True)
train_loss_atis_slot_card4 = CostKpi(
    'train_loss_atis_slot_card4', 0.03, 0, actived=True)
train_acc_atis_slot_card4 = CostKpi(
    'train_acc_atis_slot_card4', 0.01, 0, actived=True)

tracking_kpis = [
    each_step_duration_atis_slot_card1,
    train_loss_atis_slot_card1,
    train_acc_atis_slot_card1,
    each_step_duration_atis_slot_card4,
    train_loss_atis_slot_card4,
    train_acc_atis_slot_card4,
]
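These trackers consume the `kpis\t<name>\t<value>` lines that train.py prints when `--enable_ce` is on. The real harness lives in the `kpi` module, so the loop below is only an illustrative sketch; it assumes each KPI object exposes `name` and `add_record`, the usual pattern in these `_ce.py` scripts:

```
def parse_ce_log(log_lines):
    """Sketch: route printed kpi values to their trackers."""
    kpi_map = {kpi.name: kpi for kpi in tracking_kpis}
    for line in log_lines:
        fields = line.strip().split('\t')
        if len(fields) == 3 and fields[0] == 'kpis' and fields[1] in kpi_map:
            kpi_map[fields[1]].add_record(float(fields[2]))
```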
......
...@@ -75,8 +75,8 @@ def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3):
def prepare_batch_data(task_name,
                       insts,
                       max_len,
                       total_token_num,
                       voc_size=0,
                       pad_id=None,
...@@ -98,14 +98,18 @@ def prepare_batch_data(task_name,
    # compatible with squad, whose example includes start/end positions,
    # or unique id
    if isinstance(insts[0][3], list):
        if task_name == "atis_slot":
            labels_list = [
                inst[3] + [0] * (max_len - len(inst[3])) for inst in insts
            ]
            labels_list = [
                np.array(labels_list).astype("int64").reshape([-1, max_len])
            ]
        elif task_name == "dstc2":
            labels_list = [inst[3] for inst in insts]
            labels_list = [np.array(labels_list).astype("int64")]
    else:
        for i in range(3, len(insts[0]), 1):
            labels = [inst[i] for inst in insts]
            labels = np.array(labels).astype("int64").reshape([-1, 1])
...@@ -124,28 +128,25 @@ def prepare_batch_data(task_name,
    out = batch_src_ids
    # Second step: padding
    src_id, self_input_mask = pad_batch_data(
        out, max_len, pad_idx=pad_id, return_input_mask=True)
    pos_id = pad_batch_data(
        batch_pos_ids,
        max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)
    sent_id = pad_batch_data(
        batch_sent_ids,
        max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)

    if mask_id >= 0:
        return_list = [
            src_id, pos_id, sent_id, self_input_mask, mask_label, mask_pos
        ] + labels_list
    else:
        return_list = [src_id, pos_id, sent_id, self_input_mask] + labels_list

    return return_list if len(return_list) > 1 else return_list[0]
...@@ -163,13 +164,13 @@ def pad_batch_data(insts,
    corresponding position data and attention bias.
    """
    return_list = []
    max_len = max_len_in if max_len_in != -1 else max(
        len(inst) for inst in insts)
    # Any token included in dict can be used to pad, since the paddings' loss
    # will be masked out by weights and make no effect on parameter gradients.
    inst_data = np.array(
        [inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
    return_list += [inst_data.astype("int64").reshape([-1, max_len])]

    # position data
...@@ -183,10 +184,10 @@ def pad_batch_data(insts,
    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
                                    (max_len - len(inst)) for inst in insts])
        input_mask_data = np.expand_dims(input_mask_data, axis=-1)
        return_list += [input_mask_data.astype("float32")]
    if return_max_len:
        return_list += [max_len]
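As a concrete illustration of the padding contract, here is what `pad_batch_data` would return for a toy batch. This is a hedged sketch: it assumes the elided defaults leave `return_pos=False`, so with `return_input_mask=True` the function returns the id matrix and the mask:

```
insts = [[5, 6, 7], [8, 9]]  # two token-id sequences
# with max_len_in == -1, max_len is taken from the longest sequence (3)
ids, mask = pad_batch_data(insts, -1, pad_idx=0, return_input_mask=True)
print(ids)            # [[5 6 7] [8 9 0]], shape [2, 3]
print(mask[:, :, 0])  # [[1. 1. 1.] [1. 1. 0.]], padding positions masked out
```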
......
...@@ -21,31 +21,34 @@ import paddle
import paddle.fluid as fluid


class DefinePredict(object):
    """
    Packaging Prediction Results
    """

    def __init__(self):
        """
        init
        """
        self.task_map = {
            'udc': 'get_matching_res',
            'swda': 'get_cls_res',
            'mrda': 'get_cls_res',
            'atis_intent': 'get_cls_res',
            'atis_slot': 'get_sequence_tagging',
            'dstc2': 'get_multi_cls_res',
            'dstc2_asr': 'get_multi_cls_res',
            'multi-woz': 'get_multi_cls_res'
        }

    def get_matching_res(self, probs, params=None):
        """
        get matching score
        """
        probs = list(probs)
        return probs[1]

    def get_cls_res(self, probs, params=None):
        """
        get da classify tag
        """
...@@ -54,7 +57,7 @@ class DefinePredict(object):
        tag = probs.index(max_prob)
        return tag

    def get_sequence_tagging(self, probs, params=None):
        """
        get sequence tagging tag
        """
...@@ -63,23 +66,19 @@ class DefinePredict(object):
        labels = [" ".join([str(l) for l in list(l_l)]) for l_l in batch_labels]
        return labels

    def get_multi_cls_res(self, probs, params=None):
        """
        get dst classify tag
        """
        labels = []
        probs = list(probs)
        for i in range(len(probs)):
            if probs[i] >= 0.5:
                labels.append(i)
        if not labels:
            max_prob = max(probs)
            label_str = str(probs.index(max_prob))
        else:
            label_str = " ".join([str(l) for l in sorted(labels)])

        return label_str
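A short usage sketch: for a dstc2-style multi-label output, every index whose probability is at or above 0.5 becomes a label, and the argmax is used as a fallback when nothing crosses the threshold:

```
dp = DefinePredict()
handler = getattr(dp, dp.task_map['dstc2'])
print(handler([0.1, 0.7, 0.9, 0.3]))  # "1 2": indices with prob >= 0.5
print(handler([0.1, 0.4, 0.2, 0.3]))  # "1": argmax fallback
```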
...@@ -20,51 +20,60 @@ import sys
import io
import os

URLLIB = urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB = urllib.request

DATA_MODEL_PATH = {
    "DATA_PATH": "https://baidu-nlp.bj.bcebos.com/dmtk_data_1.0.0.tar.gz",
    "PRETRAIN_MODEL":
    "https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz",
    "TRAINED_MODEL": "https://baidu-nlp.bj.bcebos.com/dgu_models_2.0.0.tar.gz"
}

PATH_MAP = {
    'DATA_PATH': "./data/input",
    'PRETRAIN_MODEL': './data/pretrain_model',
    'TRAINED_MODEL': './data/saved_models'
}


def un_tar(tar_name, dir_name):
    try:
        t = tarfile.open(tar_name)
        t.extractall(path=dir_name)
        return True
    except Exception as e:
        print(e)
        return False


def download_model_and_data():
    print("Downloading dgu data, pretrain model and trained models......")
    print("This process is quite long, please wait patiently............")
    for path in [
            './data/input/data',
            './data/pretrain_model/uncased_L-12_H-768_A-12',
            './data/saved_models/trained_models'
    ]:
        if not os.path.exists(path):
            continue
        shutil.rmtree(path)
    for path_key in DATA_MODEL_PATH:
        filename = os.path.basename(DATA_MODEL_PATH[path_key])
        URLLIB.urlretrieve(DATA_MODEL_PATH[path_key],
                           os.path.join("./", filename))
        state = un_tar(filename, PATH_MAP[path_key])
        if not state:
            print("Tar %s error....." % path_key)
            return False
        os.remove(filename)
    return True


if __name__ == "__main__":
    state = download_model_and_data()
    if not state:
        exit(1)
    print("Downloading data and models success......")
...@@ -6,7 +6,7 @@ scripts: data-processing scripts that convert the official public datasets into the mod…
python run_build_data.py udc
The generated data is in dialogue_general_understanding/data/input/data/udc

2). To generate the train/dev/test sets needed by the DA tasks:
python run_build_data.py swda
python run_build_data.py mrda
The generated data is in dialogue_general_understanding/data/input/data/swda and dialogue_general_understanding/data/input/data/mrda, respectively
...@@ -19,6 +19,3 @@ python run_build_data.py udc
python run_build_data.py atis
The slot-filling data is generated in dialogue_general_understanding/data/input/data/atis/atis_slot
The intent-detection data is generated in dialogue_general_understanding/data/input/data/atis/atis_intent
...@@ -12,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""build atis train dev test dataset"""
import json
...@@ -23,11 +22,12 @@ import io
import re


class ATIS(object):
    """
    nlu dataset atis data process
    """

    def __init__(self):
        """
        init instance
        """
...@@ -41,91 +41,94 @@ class ATIS(object):
        self.map_tag_slot = "../../data/input/data/atis/atis_slot/map_tag_slot_id.txt"
        self.map_tag_intent = "../../data/input/data/atis/atis_intent/map_tag_intent_id.txt"

    def _load_file(self, data_type):
        """
        load dataset filename
        """
        slot_stat = os.path.exists(self.out_slot_dir)
        if not slot_stat:
            os.makedirs(self.out_slot_dir)
        intent_stat = os.path.exists(self.out_intent_dir)
        if not intent_stat:
            os.makedirs(self.out_intent_dir)
        src_examples = []
        json_file = os.path.join(self.src_dir, "%s.json" % data_type)
        load_f = io.open(json_file, 'r', encoding="utf8")
        json_dict = json.load(load_f)
        examples = json_dict['rasa_nlu_data']['common_examples']
        for example in examples:
            text = example.get('text')
            intent = example.get('intent')
            entities = example.get('entities')
            src_examples.append((text, intent, entities))
        return src_examples

    def _parser_intent_data(self, examples, data_type):
        """
        parser intent dataset
        """
        out_filename = "%s/%s.txt" % (self.out_intent_dir, data_type)
        fw = io.open(out_filename, 'w', encoding="utf8")
        for example in examples:
            if example[1] not in self.intent_dict:
                self.intent_dict[example[1]] = self.intent_id
                self.intent_id += 1
            fw.write(u"%s\t%s\n" %
                     (self.intent_dict[example[1]], example[0].lower()))

        fw = io.open(self.map_tag_intent, 'w', encoding="utf8")
        for tag in self.intent_dict:
            fw.write(u"%s\t%s\n" % (tag, self.intent_dict[tag]))

    def _parser_slot_data(self, examples, data_type):
        """
        parser slot dataset
        """
        out_filename = "%s/%s.txt" % (self.out_slot_dir, data_type)
        fw = io.open(out_filename, 'w', encoding="utf8")
        for example in examples:
            tags = []
            text = example[0]
            entities = example[2]
            if not entities:
                tags = [str(self.slot_dict['O'])] * len(text.strip().split())
                continue
            for i in range(len(entities)):
                enty = entities[i]
                start = enty['start']
                value_num = len(enty['value'].split())
                tags_slot = []
                for j in range(value_num):
                    if j == 0:
                        bround_tag = "B"
                    else:
                        bround_tag = "I"
                    tag = "%s-%s" % (bround_tag, enty['entity'])
                    if tag not in self.slot_dict:
                        self.slot_dict[tag] = self.slot_id
                        self.slot_id += 1
                    tags_slot.append(str(self.slot_dict[tag]))
                if i == 0:
                    if start not in [0, 1]:
                        prefix_num = len(text[:start].strip().split())
                        tags.extend([str(self.slot_dict['O'])] * prefix_num)
                    tags.extend(tags_slot)
                else:
                    prefix_num = len(text[entities[i - 1]['end']:start].strip()
                                     .split())
                    tags.extend([str(self.slot_dict['O'])] * prefix_num)
                    tags.extend(tags_slot)
            if entities[-1]['end'] < len(text):
                suffix_num = len(text[entities[-1]['end']:].strip().split())
                tags.extend([str(self.slot_dict['O'])] * suffix_num)
            fw.write(u"%s\t%s\n" %
                     (text.encode('utf8'), " ".join(tags).encode('utf8')))

        fw = io.open(self.map_tag_slot, 'w', encoding="utf8")
        for slot in self.slot_dict:
            fw.write(u"%s\t%s\n" % (slot, self.slot_dict[slot]))

    def get_train_dataset(self):
        """
        parser train dataset and print train.txt
        """
...@@ -133,7 +136,7 @@ class ATIS(object):
        self._parser_intent_data(train_examples, "train")
        self._parser_slot_data(train_examples, "train")

    def get_test_dataset(self):
        """
        parser test dataset and print test.txt
        """
...@@ -141,7 +144,7 @@ class ATIS(object):
        self._parser_intent_data(test_examples, "test")
        self._parser_slot_data(test_examples, "test")

    def main(self):
        """
        run data process
        """
...@@ -149,10 +152,6 @@ class ATIS(object):
        self.get_test_dataset()


if __name__ == "__main__":
    atis_inst = ATIS()
    atis_inst.main()
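To make `_parser_slot_data` concrete: each character-span entity is expanded into word-level B-/I- tag ids, and `O` ids fill the uncovered words. A worked toy example, assuming the ids assigned so far are O=0, B-fromloc=1, I-fromloc=2 (hypothetical values):

```
# hypothetical ATIS-style example
text = "show flights from new york"            # len(text) == 26
entity = {'start': 18, 'end': 26, 'value': 'new york', 'entity': 'fromloc'}
# text[:18].strip().split() -> ['show', 'flights', 'from'] -> three O tags
# 'new york' has two words   -> B-fromloc, I-fromloc       -> ids 1, 2
# line written to train.txt  -> "show flights from new york\t0 0 0 1 2"
```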
...@@ -24,11 +24,12 @@ import re
import commonlib


class DSTC2(object):
    """
    dialogue state tracking dstc2 data process
    """

    def __init__(self):
        """
        init instance
        """
...@@ -42,16 +43,17 @@ class DSTC2(object):
        self._load_file()
        self._load_ontology()

    def _load_file(self):
        """
        load dataset filename
        """
        self.data_dict = commonlib.load_dict(self.data_list)
        for data_type in self.data_dict:
            for i in range(len(self.data_dict[data_type])):
                self.data_dict[data_type][i] = os.path.join(
                    self.src_dir, self.data_dict[data_type][i])

    def _load_ontology(self):
        """
        load ontology tag
        """
...@@ -60,8 +62,8 @@ class DSTC2(object):
        fr = io.open(self.onto_json, 'r', encoding="utf8")
        ontology = json.load(fr)
        slots_values = ontology['informable']
        for slot in slots_values:
            for value in slots_values[slot]:
                key = "%s_%s" % (slot, value)
                self.map_tag_dict[key] = tag_id
                tag_id += 1
...@@ -69,22 +71,22 @@ class DSTC2(object):
            self.map_tag_dict[key] = tag_id
            tag_id += 1

    def _parser_dataset(self, data_type):
        """
        parser train dev test dataset
        """
        stat = os.path.exists(self.out_dir)
        if not stat:
            os.makedirs(self.out_dir)
        asr_stat = os.path.exists(self.out_asr_dir)
        if not asr_stat:
            os.makedirs(self.out_asr_dir)
        out_file = os.path.join(self.out_dir, "%s.txt" % data_type)
        out_asr_file = os.path.join(self.out_asr_dir, "%s.txt" % data_type)
        fw = io.open(out_file, 'w', encoding="utf8")
        fw_asr = io.open(out_asr_file, 'w', encoding="utf8")
        data_list = self.data_dict.get(data_type)
        for fn in data_list:
            log_file = os.path.join(fn, "log.json")
            label_file = os.path.join(fn, "label.json")
            f_log = io.open(log_file, 'r', encoding="utf8")
...@@ -93,49 +95,59 @@ class DSTC2(object):
            label_json = json.load(f_label)
            session_id = log_json['session-id']
            assert len(label_json["turns"]) == len(log_json["turns"])
            for i in range(len(label_json["turns"])):
                log_turn = log_json["turns"][i]
                label_turn = label_json["turns"][i]
                assert log_turn["turn-index"] == label_turn["turn-index"]
                labels = [
                    "%s_%s" % (slot, label_turn["goal-labels"][slot])
                    for slot in label_turn["goal-labels"]
                ]
                labels_ids = " ".join([
                    str(
                        self.map_tag_dict.get(label, self.map_tag_dict[
                            "%s_none" % label.split('_')[0]]))
                    for label in labels
                ])
                mach = log_turn['output']['transcript']
                user = label_turn['transcription']
                if not labels_ids.strip():
                    labels_ids = self.map_tag_dict['none']
                out = "%s\t%s\1%s\t%s" % (session_id, mach, user, labels_ids)
                user_asr = log_turn['input']['live']['asr-hyps'][0][
                    'asr-hyp'].strip()
                out_asr = "%s\t%s\1%s\t%s" % (session_id, mach, user_asr,
                                              labels_ids)
                fw.write(u"%s\n" % out.encode('utf8'))
                fw_asr.write(u"%s\n" % out_asr.encode('utf8'))

    def get_train_dataset(self):
        """
        parser train dataset and print train.txt
        """
        self._parser_dataset("train")

    def get_dev_dataset(self):
        """
        parser dev dataset and print dev.txt
        """
        self._parser_dataset("dev")

    def get_test_dataset(self):
        """
        parser test dataset and print test.txt
        """
        self._parser_dataset("test")

    def get_labels(self):
        """
        get tag and map ids file
        """
        fw = io.open(self.map_tag, 'w', encoding="utf8")
        for elem in self.map_tag_dict:
            fw.write(u"%s\t%s\n" % (elem, self.map_tag_dict[elem]))

    def main(self):
        """
        run data process
        """
...@@ -144,10 +156,7 @@ class DSTC2(object):
        self.get_test_dataset()
        self.get_labels()


if __name__ == "__main__":
    dstc_inst = DSTC2()
    dstc_inst.main()
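The state labels above are flat `slot_value` strings looked up in the ontology-derived map, with `slot_none` as the fallback for values missing from the ontology. A toy illustration with hypothetical ids:

```
# hypothetical fragment of map_tag_dict built from ontology['informable']
map_tag_dict = {'food_chinese': 3, 'food_none': 7, 'area_north': 12}

goal_labels = {'food': 'chinese', 'area': 'north'}
labels = ["%s_%s" % (slot, goal_labels[slot]) for slot in goal_labels]
# unknown values fall back to "<slot>_none"
labels_ids = " ".join(
    str(map_tag_dict.get(l, map_tag_dict["%s_none" % l.split('_')[0]]))
    for l in labels)
print(labels_ids)  # "3 12"
```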
...@@ -23,11 +23,12 @@ import re
import commonlib


class MRDA(object):
    """
    dialogue act dataset mrda data process
    """

    def __init__(self):
        """
        init instance
        """
...@@ -41,7 +42,7 @@ class MRDA(object):
        self._load_file()
        self.tag_dict = commonlib.load_voc(self.voc_map_tag)

    def _load_file(self):
        """
        load dataset filename
        """
...@@ -49,30 +50,30 @@ class MRDA(object):
        self.trans_dict = {}
        self.data_dict = commonlib.load_dict(self.data_list)
        file_list, file_path = commonlib.get_file_list(self.src_dir)
        for i in range(len(file_list)):
            name = file_list[i]
            keyword = name.split('.')[0]
            if 'dadb' in name:
                self.dadb_dict[keyword] = file_path[i]
            if 'trans' in name:
                self.trans_dict[keyword] = file_path[i]

    def load_dadb(self, data_type):
        """
        load dadb dataset
        """
        dadb_dict = {}
        conv_id_list = []
        dadb_list = self.data_dict[data_type]
        for dadb_key in dadb_list:
            dadb_file = self.dadb_dict[dadb_key]
            fr = io.open(dadb_file, 'r', encoding="utf8")
            row = csv.reader(fr, delimiter=',')
            for line in row:
                elems = line
                conv_id = elems[2]
                conv_id_list.append(conv_id)
                if len(elems) != 14:
                    continue
                error_code = elems[3]
                da_tag = elems[-9]
...@@ -80,17 +81,17 @@ class MRDA(object):
                dadb_dict[conv_id] = (error_code, da_ori_tag, da_tag)
        return dadb_dict, conv_id_list

    def load_trans(self, data_type):
        """load trans data"""
        trans_dict = {}
        trans_list = self.data_dict[data_type]
        for trans_key in trans_list:
            trans_file = self.trans_dict[trans_key]
            fr = io.open(trans_file, 'r', encoding="utf8")
            row = csv.reader(fr, delimiter=',')
            for line in row:
                elems = line
                if len(elems) != 3:
                    continue
                conv_id = elems[0]
                text = elems[1]
...@@ -98,7 +99,7 @@ class MRDA(object):
                trans_dict[conv_id] = (text, text_process)
        return trans_dict

    def _parser_dataset(self, data_type):
        """
        parser train dev test dataset
        """
...@@ -106,50 +107,51 @@ class MRDA(object):
        dadb_dict, conv_id_list = self.load_dadb(data_type)
        trans_dict = self.load_trans(data_type)
        fw = io.open(out_filename, 'w', encoding="utf8")
        for elem in conv_id_list:
            v_dadb = dadb_dict[elem]
            v_trans = trans_dict[elem]
            da_tag = v_dadb[2]
            if da_tag not in self.tag_dict:
                continue
            tag = self.tag_dict[da_tag]
            if tag == "Z":
                continue
            if tag not in self.map_tag_dict:
                self.map_tag_dict[tag] = self.tag_id
                self.tag_id += 1
            caller = elem.split('_')[0].split('-')[-1]
            conv_no = elem.split('_')[0].split('-')[0]
            out = "%s\t%s\t%s\t%s" % (conv_no, self.map_tag_dict[tag], caller,
                                      v_trans[0])
            fw.write(u"%s\n" % out)

    def get_train_dataset(self):
        """
        parser train dataset and print train.txt
        """
        self._parser_dataset("train")

    def get_dev_dataset(self):
        """
        parser dev dataset and print dev.txt
        """
        self._parser_dataset("dev")

    def get_test_dataset(self):
        """
        parser test dataset and print test.txt
        """
        self._parser_dataset("test")

    def get_labels(self):
        """
        get tag and map ids file
        """
        fw = io.open(self.map_tag, 'w', encoding="utf8")
        for elem in self.map_tag_dict:
            fw.write(u"%s\t%s\n" % (elem, self.map_tag_dict[elem]))

    def main(self):
        """
        run data process
        """
...@@ -158,10 +160,7 @@ class MRDA(object):
        self.get_test_dataset()
        self.get_labels()


if __name__ == "__main__":
    mrda_inst = MRDA()
    mrda_inst.main()
@@ -23,11 +23,12 @@ import re
import commonlib


class SWDA(object):
    """
    dialogue act dataset swda data process
    """

    def __init__(self):
        """
        init instance
        """
@@ -39,94 +40,94 @@ class SWDA(object):
        self.src_dir = "../../data/input/data/swda/source_data/swda"
        self._load_file()
    def _load_file(self):
        """
        load dataset filenames
        """
        self.data_dict = commonlib.load_dict(self.data_list)
        self.file_dict = {}
        child_dir = commonlib.get_dir_list(self.src_dir)
        for chd in child_dir:
            file_list, file_path = commonlib.get_file_list(chd)
            for i in range(len(file_list)):
                name = file_list[i]
                keyword = "sw%s" % name.split('.')[0].split('_')[-1]
                self.file_dict[keyword] = file_path[i]
    def _parser_dataset(self, data_type):
        """
        parse train/dev/test dataset
        """
        out_filename = "%s/%s.txt" % (self.out_dir, data_type)
        fw = io.open(out_filename, 'w', encoding='utf8')
        for name in self.data_dict[data_type]:
            file_path = self.file_dict[name]
            fr = io.open(file_path, 'r', encoding="utf8")
            idx = 0
            row = csv.reader(fr, delimiter=',')
            for r in row:
                if idx == 0:
                    idx += 1
                    continue
                out = self._parser_utterence(r)
                fw.write(u"%s\n" % out)
    def _clean_text(self, text):
        """
        text cleaning for dialogue act dataset
        """
        if text.startswith('<') and text.endswith('>.'):
            return text
        if "[" in text or "]" in text:
            stat = True
        else:
            stat = False
        group = re.findall("\[.*?\+.*?\]", text)
        while group and stat:
            for elem in group:
                elem_src = elem
                elem = re.sub('\+', '', elem.lstrip('[').rstrip(']'))
                text = text.replace(elem_src, elem)
            if "[" in text or "]" in text:
                stat = True
            else:
                stat = False
            group = re.findall("\[.*?\+.*?\]", text)
        if "{" in text or "}" in text:
            stat = True
        else:
            stat = False
        group = re.findall("{[A-Z].*?}", text)
        while group and stat:
            child_group = re.findall("{[A-Z]*(.*?)}", text)
            for i in range(len(group)):
                text = text.replace(group[i], child_group[i])
            if "{" in text or "}" in text:
                stat = True
            else:
                stat = False
            group = re.findall("{[A-Z].*?}", text)
        if "(" in text or ")" in text:
            stat = True
        else:
            stat = False
        group = re.findall("\(\(.*?\)\)", text)
        while group and stat:
            for elem in group:
                if elem:
                    elem_clean = re.sub("\(|\)", "", elem)
                    text = text.replace(elem, elem_clean)
                else:
                    text = text.replace(elem, "mumblex")
            if "(" in text or ")" in text:
                stat = True
            else:
                stat = False
            group = re.findall("\(\((.*?)\)\)", text)
        group = re.findall("\<.*?\>", text)
        if group:
            for elem in group:
                text = text.replace(elem, "")
        text = re.sub(r" \'s", "\'s", text)
@@ -137,24 +138,24 @@ class SWDA(object):
        text = re.sub("\[|\]|\+|\>|\<|\{|\}", "", text)
        return text.strip().lower()
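To make the effect of this cleaning concrete, here is a small illustrative call (the utterance is hypothetical, and `SWDA.__new__` is used only to bypass `__init__`, which expects the corpus on disk):

```python
# Illustrative only: repairs "[A + B]" keep both parts with the '+' dropped,
# "{F ...}" filler markers are removed, "((...))" loses its parentheses, and
# "<...>" noise tags are deleted before the final strip()/lower().
inst = SWDA.__new__(SWDA)  # bypass __init__ (it loads dataset files)
print(inst._clean_text("[we, + we] {F uh, } went to ((Boston)) <laughter>"))
```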
    def _map_tag(self, da_tag):
        """
        map tag to 42 classes
        """
        curr_da_tags = []
        curr_das = re.split(r"\s*[,;]\s*", da_tag)
        for curr_da in curr_das:
            if curr_da == "qy_d" or curr_da == "qw^d" or curr_da == "b^m":
                pass
            elif curr_da == "nn^e":
                curr_da = "ng"
            elif curr_da == "ny^e":
                curr_da = "na"
            else:
                curr_da = re.sub(r'(.)\^.*', r'\1', curr_da)
                curr_da = re.sub(r'[\(\)@*]', '', curr_da)
            tag = curr_da
            if tag in ('qr', 'qy'):
                tag = 'qy'
            elif tag in ('fe', 'ba'):
                tag = 'ba'
@@ -170,12 +171,12 @@ class SWDA(object):
                tag = 'fo_o_fw_"_by_bc'
            curr_da = tag
            curr_da_tags.append(curr_da)
        if curr_da_tags[0] not in self.map_tag_dict:
            self.map_tag_dict[curr_da_tags[0]] = self.tag_id
            self.tag_id += 1
        return self.map_tag_dict[curr_da_tags[0]]
    def _parser_utterence(self, line):
        """
        parse one dialogue turn
        """
@@ -188,34 +189,34 @@ class SWDA(object):
        out = "%s\t%s\t%s\t%s" % (conversation_no, act_tag, caller, text)
        return out
    def get_train_dataset(self):
        """
        parse train dataset and write train.txt
        """
        self._parser_dataset("train")

    def get_dev_dataset(self):
        """
        parse dev dataset and write dev.txt
        """
        self._parser_dataset("dev")

    def get_test_dataset(self):
        """
        parse test dataset and write test.txt
        """
        self._parser_dataset("test")

    def get_labels(self):
        """
        write the tag-to-id map file
        """
        fw = io.open(self.map_tag, 'w', encoding='utf8')
        for elem in self.map_tag_dict:
            fw.write(u"%s\t%s\n" % (elem, self.map_tag_dict[elem]))
    def main(self):
        """
        run data process
        """
@@ -224,10 +225,7 @@ class SWDA(object):
        self.get_test_dataset()
        self.get_labels()


if __name__ == "__main__":
    swda_inst = SWDA()
    swda_inst.main()
@@ -25,52 +25,49 @@ def get_file_list(dir_name):
    file_list = list()
    file_path = list()
    for root, dirs, files in os.walk(dir_name):
        for file in files:
            file_list.append(file)
            file_path.append(os.path.join(root, file))
    return file_list, file_path
def get_dir_list(dir_name):
    """
    get directory names
    """
    child_dir = []
    dir_list = os.listdir(dir_name)
    for cur_file in dir_list:
        path = os.path.join(dir_name, cur_file)
        if not os.path.isdir(path):
            continue
        child_dir.append(path)
    return child_dir


def load_dict(conf):
    """
    load swda dataset config
    """
    conf_dict = dict()
    fr = io.open(conf, 'r', encoding="utf8")
    for line in fr:
        line = line.strip()
        elems = line.split('\t')
        if elems[0] not in conf_dict:
            conf_dict[elems[0]] = []
        conf_dict[elems[0]].append(elems[1])
    return conf_dict
def load_voc(conf):
    """
    load map dict
    """
    map_dict = {}
    fr = io.open(conf, 'r', encoding="utf8")
    for line in fr:
        line = line.strip()
        elems = line.split('\t')
        map_dict[elems[0]] = elems[1]
    return map_dict
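As a quick illustration of the two loaders (file name and contents hypothetical): `load_dict` groups repeated keys into lists, while `load_voc` keeps a single value per key.

```python
import io

with io.open("data_list.tsv", "w", encoding="utf8") as fw:  # hypothetical config file
    fw.write(u"train\tsw2005\ntrain\tsw2006\ndev\tsw4055\n")

print(load_dict("data_list.tsv"))  # {'train': ['sw2005', 'sw2006'], 'dev': ['sw4055']}
print(load_voc("data_list.tsv"))   # {'train': 'sw2006', 'dev': 'sw4055'} (last value wins)
```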
@@ -20,29 +20,29 @@ from build_dstc2_dataset import DSTC2
from build_mrda_dataset import MRDA
from build_swda_dataset import SWDA

if __name__ == "__main__":
    task_name = sys.argv[1]
    task_name = task_name.lower()
    if task_name not in ['swda', 'mrda', 'atis', 'dstc2', 'udc']:
        print("task name error: we support [swda|mrda|atis|dstc2|udc]")
        exit(1)
    if task_name == 'swda':
        swda_inst = SWDA()
        swda_inst.main()
    elif task_name == 'mrda':
        mrda_inst = MRDA()
        mrda_inst.main()
    elif task_name == 'atis':
        atis_inst = ATIS()
        atis_inst.main()
        shutil.copyfile("../../data/input/data/atis/atis_slot/test.txt",
                        "../../data/input/data/atis/atis_slot/dev.txt")
        shutil.copyfile("../../data/input/data/atis/atis_intent/test.txt",
                        "../../data/input/data/atis/atis_intent/dev.txt")
    elif task_name == 'dstc2':
        dstc_inst = DSTC2()
        dstc_inst.main()
    else:
        exit(0)
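The dispatcher is normally run as `python run_build_data.py <task>`; the equivalent direct calls (assuming the raw corpora are laid out under `../../data/input/data/` as the builders expect) are:

```python
from build_swda_dataset import SWDA
from build_mrda_dataset import MRDA

SWDA().main()  # writes train/dev/test.txt and the tag-id map for SWDA
MRDA().main()  # the same for MRDA
```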
@@ -12,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""

from __future__ import absolute_import
...
@@ -113,7 +113,7 @@ def multi_head_attention(queries,
        """
        Scaled Dot-Product Attention
        """
        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
        if attn_bias:
            product += attn_bias
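As a sanity check on the computation above, here is a small NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_key)) V, with hypothetical shapes:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    d_key = q.shape[-1]
    scores = q @ k.T * d_key ** -0.5              # mirrors layers.scale(..., d_key**-0.5)
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

q, k, v = np.random.rand(4, 8), np.random.rand(6, 8), np.random.rand(6, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 16)
```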
...

@@ -25,8 +25,8 @@ import numpy as np
import paddle.fluid as fluid


class InputField(object):
    def __init__(self, input_field):
        """init input field"""
        self.src_ids = input_field[0]
        self.pos_ids = input_field[1]
...
@@ -30,7 +30,7 @@ def check_cuda(use_cuda, err = \

if __name__ == "__main__":
    check_cuda(True)
    check_cuda(False)
...
@@ -69,8 +69,8 @@ def init_from_checkpoint(args, exe, program):

def init_from_params(args, exe, program):
    assert isinstance(args.init_from_params, str)

    if not os.path.exists(args.init_from_params):
        raise Warning("the params path does not exist.")
        return False
@@ -113,7 +113,7 @@ def save_param(args, exe, program, dirname):
    if not os.path.exists(param_dir):
        os.makedirs(param_dir)

    fluid.io.save_params(
        exe,
        os.path.join(param_dir, dirname),
@@ -122,5 +122,3 @@ def save_param(args, exe, program, dirname):
    print("save parameters at %s" % (os.path.join(param_dir, dirname)))
    return True
@@ -23,14 +23,9 @@ from dgu.bert import BertModel
from dgu.utils.configure import JsonConfig


def create_net(is_training, model_input, num_labels, paradigm_inst, args):
    """create dialogue task model"""

    src_ids = model_input.src_ids
    pos_ids = model_input.pos_ids
    sent_ids = model_input.sent_ids
@@ -48,14 +43,15 @@ def create_net(
        config=bert_conf,
        use_fp16=False)

    params = {
        'num_labels': num_labels,
        'src_ids': src_ids,
        'pos_ids': pos_ids,
        'sent_ids': sent_ids,
        'input_mask': input_mask,
        'labels': labels,
        'is_training': is_training
    }

    results = paradigm_inst.paradigm(bert, params)
    return results
@@ -20,17 +20,17 @@ from dgu.evaluation import evaluate
from dgu.utils.configure import PDConfig


def do_eval(args):
    task_name = args.task_name.lower()
    reference = args.evaluation_file
    predicitions = args.output_prediction_file
    evaluate(task_name, predicitions, reference)


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/dgu.yaml")
    args.build()
...
@@ -29,10 +29,10 @@ import dgu.utils.save_load_io as save_load_io
import dgu.reader as reader
from dgu_net import create_net
import dgu.define_paradigm as define_paradigm


def do_save_inference_model(args):
    """save inference model function"""

    task_name = args.task_name.lower()
@@ -57,35 +57,36 @@ def do_save_inference_model(args):
    with fluid.unique_name.guard():
        # define inputs of the network
        num_labels = len(processors[task_name].get_labels())

        src_ids = fluid.data(
            name='src_ids', shape=[-1, args.max_seq_len], dtype='int64')
        pos_ids = fluid.data(
            name='pos_ids', shape=[-1, args.max_seq_len], dtype='int64')
        sent_ids = fluid.data(
            name='sent_ids', shape=[-1, args.max_seq_len], dtype='int64')
        input_mask = fluid.data(
            name='input_mask',
            shape=[-1, args.max_seq_len],
            dtype='float32')
        if args.task_name == 'atis_slot':
            labels = fluid.data(
                name='labels', shape=[-1, args.max_seq_len], dtype='int64')
        elif args.task_name in ['dstc2', 'dstc2_asr', 'multi-woz']:
            labels = fluid.data(
                name='labels', shape=[-1, num_labels], dtype='int64')
        else:
            labels = fluid.data(name='labels', shape=[-1, 1], dtype='int64')

        input_inst = [src_ids, pos_ids, sent_ids, input_mask, labels]
        input_field = InputField(input_inst)

        results = create_net(
            is_training=False,
            model_input=input_field,
            num_labels=num_labels,
            paradigm_inst=paradigm_inst,
            args=args)
        probs = results.get("probs", None)

    if args.use_cuda:
@@ -97,7 +98,7 @@ def do_save_inference_model(args):
    exe.run(startup_prog)

    assert (args.init_from_params) or (args.init_from_pretrain_model)

    if args.init_from_params:
        save_load_io.init_from_params(args, exe, test_prog)
    elif args.init_from_pretrain_model:
@@ -105,20 +106,16 @@ def do_save_inference_model(args):
    # saving inference model
    fluid.io.save_inference_model(
        args.inference_model_dir,
        feeded_var_names=[
            input_field.src_ids.name, input_field.pos_ids.name,
            input_field.sent_ids.name, input_field.input_mask.name
        ],
        target_vars=[probs],
        executor=exe,
        main_program=test_prog,
        model_filename="model.pdmodel",
        params_filename="params.pdparams")

    print("save inference model at %s" % (args.inference_model_dir))
...

@@ -26,7 +26,6 @@ from inference_model import do_save_inference_model
from dgu.utils.configure import PDConfig


if __name__ == "__main__":
    args = PDConfig(yaml_file="./data/config/dgu.yaml")
...
@@ -28,7 +28,7 @@ import paddle.fluid as fluid
from dgu_net import create_net
import dgu.reader as reader
from dgu.optimization import optimization
import dgu.define_paradigm as define_paradigm
from dgu.utils.configure import PDConfig
from dgu.utils.input_field import InputField
from dgu.utils.model_check import check_cuda
@@ -37,7 +37,7 @@ import dgu.utils.save_load_io as save_load_io

def do_train(args):
    """train function"""

    task_name = args.task_name.lower()
    paradigm_inst = define_paradigm.Paradigm(task_name)
train_prog = fluid.default_main_program() train_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program() startup_prog = fluid.default_startup_program()
with fluid.program_guard(train_prog, startup_prog): with fluid.program_guard(train_prog, startup_prog):
train_prog.random_seed = args.random_seed train_prog.random_seed = args.random_seed
startup_prog.random_seed = args.random_seed startup_prog.random_seed = args.random_seed
with fluid.unique_name.guard(): with fluid.unique_name.guard():
num_labels = len(processors[task_name].get_labels()) num_labels = len(processors[task_name].get_labels())
src_ids = fluid.data( src_ids = fluid.data(
name='src_ids', shape=[-1, args.max_seq_len], dtype='int64') name='src_ids', shape=[-1, args.max_seq_len], dtype='int64')
pos_ids = fluid.data( pos_ids = fluid.data(
name='pos_ids', shape=[-1, args.max_seq_len], dtype='int64') name='pos_ids', shape=[-1, args.max_seq_len], dtype='int64')
sent_ids = fluid.data( sent_ids = fluid.data(
name='sent_ids', shape=[-1, args.max_seq_len], dtype='int64') name='sent_ids', shape=[-1, args.max_seq_len], dtype='int64')
input_mask = fluid.data( input_mask = fluid.data(
name='input_mask', shape=[-1, args.max_seq_len], dtype='float32') name='input_mask',
if args.task_name == 'atis_slot': shape=[-1, args.max_seq_len],
dtype='float32')
if args.task_name == 'atis_slot':
labels = fluid.data( labels = fluid.data(
name='labels', shape=[-1, args.max_seq_len], dtype='int64') name='labels', shape=[-1, args.max_seq_len], dtype='int64')
elif args.task_name in ['dstc2']: elif args.task_name in ['dstc2']:
labels = fluid.data( labels = fluid.data(
name='labels', shape=[-1, num_labels], dtype='int64') name='labels', shape=[-1, num_labels], dtype='int64')
else: else:
labels = fluid.data( labels = fluid.data(name='labels', shape=[-1, 1], dtype='int64')
name='labels', shape=[-1, 1], dtype='int64')
input_inst = [src_ids, pos_ids, sent_ids, input_mask, labels] input_inst = [src_ids, pos_ids, sent_ids, input_mask, labels]
input_field = InputField(input_inst) input_field = InputField(input_inst)
data_reader = fluid.io.PyReader(feed_list=input_inst, data_reader = fluid.io.PyReader(
capacity=4, iterable=False) feed_list=input_inst, capacity=4, iterable=False)
processor = processors[task_name](data_dir=args.data_dir, processor = processors[task_name](data_dir=args.data_dir,
vocab_path=args.vocab_path, vocab_path=args.vocab_path,
max_seq_len=args.max_seq_len, max_seq_len=args.max_seq_len,
...@@ -90,12 +91,12 @@ def do_train(args): ...@@ -90,12 +91,12 @@ def do_train(args):
random_seed=args.random_seed) random_seed=args.random_seed)
results = create_net( results = create_net(
is_training=True, is_training=True,
model_input=input_field, model_input=input_field,
num_labels=num_labels, num_labels=num_labels,
paradigm_inst=paradigm_inst, paradigm_inst=paradigm_inst,
args=args) args=args)
loss = results.get("loss", None) loss = results.get("loss", None)
probs = results.get("probs", None) probs = results.get("probs", None)
accuracy = results.get("accuracy", None) accuracy = results.get("accuracy", None)
...@@ -103,21 +104,19 @@ def do_train(args): ...@@ -103,21 +104,19 @@ def do_train(args):
loss.persistable = True loss.persistable = True
probs.persistable = True probs.persistable = True
if accuracy: if accuracy:
accuracy.persistable = True accuracy.persistable = True
num_seqs.persistable = True num_seqs.persistable = True
if args.use_cuda: if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count() dev_count = fluid.core.get_cuda_device_count()
else: else:
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
batch_generator = processor.data_generator( batch_generator = processor.data_generator(
batch_size=args.batch_size, batch_size=args.batch_size, phase='train', shuffle=True)
phase='train',
shuffle=True)
num_train_examples = processor.get_num_examples(phase='train') num_train_examples = processor.get_num_examples(phase='train')
if args.in_tokens: if args.in_tokens:
max_train_steps = args.epoch * num_train_examples // ( max_train_steps = args.epoch * num_train_examples // (
args.batch_size // args.max_seq_len) // dev_count args.batch_size // args.max_seq_len) // dev_count
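For concreteness, a worked instance of this step arithmetic (all numbers hypothetical): with `in_tokens` set, `batch_size` counts tokens, so each device sees roughly `batch_size // max_seq_len` examples per batch.

```python
epoch, num_train_examples = 2, 50000
batch_size, max_seq_len, dev_count = 4096, 128, 4

max_train_steps = epoch * num_train_examples // (
    batch_size // max_seq_len) // dev_count
print(max_train_steps)  # 100000 // 32 // 4 == 781
```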
@@ -147,32 +146,32 @@ def do_train(args):
        place = fluid.CUDAPlace(int(os.getenv('FLAGS_selected_gpus', '0')))
    else:
        place = fluid.CPUPlace()

    exe = fluid.Executor(place)
    exe.run(startup_prog)

    assert (args.init_from_checkpoint == "") or (
        args.init_from_pretrain_model == "")

    # init from some checkpoint, to resume the previous training
    if args.init_from_checkpoint:
        save_load_io.init_from_checkpoint(args, exe, train_prog)
    # init from some pretrain models, to better solve the current task
    if args.init_from_pretrain_model:
        save_load_io.init_from_pretrain_model(args, exe, train_prog)

    build_strategy = fluid.compiler.BuildStrategy()
    build_strategy.enable_inplace = True
    compiled_train_prog = fluid.CompiledProgram(train_prog).with_data_parallel(
        loss_name=loss.name, build_strategy=build_strategy)

    # start training
    steps = 0
    time_begin = time.time()
    ce_info = []
    for epoch_step in range(args.epoch):
        data_reader.start()
        while True:
            try:
@@ -216,43 +215,38 @@ def do_train(args):
                    used_time = time_end - time_begin
                    current_time = time.strftime('%Y-%m-%d %H:%M:%S',
                                                 time.localtime(time.time()))
                    if accuracy is not None:
                        print("%s epoch: %d, step: %d, ave loss: %f, "
                              "ave acc: %f, speed: %f steps/s" %
                              (current_time, epoch_step, steps,
                               np.mean(np_loss), np.mean(np_acc),
                               args.print_steps / used_time))
                        ce_info.append([
                            np.mean(np_loss), np.mean(np_acc),
                            args.print_steps / used_time
                        ])
                    else:
                        print("%s epoch: %d, step: %d, ave loss: %f, "
                              "speed: %f steps/s" %
                              (current_time, epoch_step, steps,
                               np.mean(np_loss), args.print_steps / used_time))
                        ce_info.append(
                            [np.mean(np_loss), args.print_steps / used_time])
                    time_begin = time.time()

                if steps % args.save_steps == 0:
                    save_path = "step_" + str(steps)
                    if args.save_checkpoint:
                        save_load_io.save_checkpoint(args, exe, train_prog,
                                                     save_path)
                    if args.save_param:
                        save_load_io.save_param(args, exe, train_prog,
                                                save_path)

            except fluid.core.EOFException:
                data_reader.reset()
                break

    if args.save_checkpoint:
        save_load_io.save_checkpoint(args, exe, train_prog, "step_final")
    if args.save_param:
        save_load_io.save_param(args, exe, train_prog, "step_final")
@@ -264,7 +258,7 @@ def do_train(args):
        if cards != '':
            num = len(cards.split(","))
        return num

    if args.enable_ce:
        card_num = get_cards()
        print("test_card_num", card_num)
@@ -283,8 +277,8 @@ def do_train(args):
        print("kpis\ttrain_acc_%s_card%s\t%f" % (task_name, card_num, ce_acc))


if __name__ == '__main__':
    args = PDConfig(yaml_file="./data/config/dgu.yaml")
    args.build()
    args.Print()
...
@@ -19,8 +19,7 @@ from __future__ import print_function
import os
import sys

sys.path.append("../shared_modules/")

import paddle
import paddle.fluid as fluid
import numpy as np
...
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Emotion Detection Task
"""
@@ -24,7 +23,7 @@ import os
import time
import multiprocessing
import sys

sys.path.append("../shared_modules/")

import paddle
import paddle.fluid as fluid
@@ -38,9 +37,7 @@ import reader
import utils


def create_model(args, num_labels, is_prediction=False):
    """
    Create Model for Emotion Detection
    """
@@ -77,10 +74,17 @@ def create_model(args,
        raise ValueError("Unknown network type!")

    if is_prediction:
        probs = network(
            data,
            seq_len,
            None,
            args.vocab_size,
            class_dim=num_labels,
            is_prediction=True)
        return loader, probs, [data.name, seq_len.name]

    avg_loss, probs = network(
        data, seq_len, label, args.vocab_size, class_dim=num_labels)
    num_seqs = fluid.layers.create_tensor(dtype='int64')
    accuracy = fluid.layers.accuracy(input=probs, label=label, total=num_seqs)
    return loader, avg_loss, accuracy, num_seqs
@@ -142,9 +146,10 @@ def main(args):
    exe = fluid.Executor(place)

    task_name = args.task_name.lower()
    processor = reader.EmoTectProcessor(
        data_dir=args.data_dir,
        vocab_path=args.vocab_path,
        random_seed=args.random_seed)
    #num_labels = len(processor.get_labels())
    num_labels = args.num_labels
@@ -173,9 +178,7 @@ def main(args):
    with fluid.program_guard(train_program, startup_prog):
        with fluid.unique_name.guard():
            train_loader, loss, accuracy, num_seqs = create_model(
                args, num_labels=num_labels, is_prediction=False)

            sgd_optimizer = fluid.optimizer.Adagrad(learning_rate=args.lr)
            sgd_optimizer.minimize(loss)
@@ -189,37 +192,27 @@ def main(args):
    if args.do_val:
        if args.do_train:
            test_data_generator = processor.data_generator(
                batch_size=args.batch_size, phase='dev', epoch=1)
        else:
            test_data_generator = processor.data_generator(
                batch_size=args.batch_size, phase='test', epoch=1)

        test_prog = fluid.Program()
        with fluid.program_guard(test_prog, startup_prog):
            with fluid.unique_name.guard():
                test_loader, loss, accuracy, num_seqs = create_model(
                    args, num_labels=num_labels, is_prediction=False)
        test_prog = test_prog.clone(for_test=True)

    if args.do_infer:
        infer_data_generator = processor.data_generator(
            batch_size=args.batch_size, phase='infer', epoch=1)

        test_prog = fluid.Program()
        with fluid.program_guard(test_prog, startup_prog):
            with fluid.unique_name.guard():
                infer_loader, probs, _ = create_model(
                    args, num_labels=num_labels, is_prediction=True)
        test_prog = test_prog.clone(for_test=True)

    exe.run(startup_prog)
@@ -292,8 +285,9 @@ def main(args):
                        time_begin = time.time()

                    if steps % args.save_steps == 0:
                        save_path = os.path.join(args.save_checkpoint_dir,
                                                 "step_" + str(steps))
                        fluid.save(train_program, save_path)

                    if steps % args.validation_steps == 0:
                        # evaluate on dev set
@@ -306,11 +300,11 @@ def main(args):
                print("final step: %d " % steps)
                if args.do_val:
                    evaluate(test_exe, test_prog, test_loader,
                             [loss.name, accuracy.name, num_seqs.name], "dev")

                save_path = os.path.join(args.save_checkpoint_dir,
                                         "step_" + str(steps))
                fluid.save(train_program, save_path)
                train_loader.reset()
                break
@@ -334,15 +328,12 @@ def main(args):
    if not args.do_train and args.do_val:
        print("Final test result:")
        evaluate(test_exe, test_prog, test_loader,
                 [loss.name, accuracy.name, num_seqs.name], "test")

    # infer
    if args.do_infer:
        print("Final infer result:")
        infer(test_exe, test_prog, infer_loader, [probs.name], "infer")


def get_cards():
...
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Emotion Detection Task, based on ERNIE
"""
@@ -25,7 +24,7 @@ import time
import argparse
import multiprocessing
import sys

sys.path.append("../shared_modules/")

import paddle
import paddle.fluid as fluid
@@ -350,7 +349,7 @@ def main(args):
                    if steps % args.save_steps == 0:
                        save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
                        fluid.save(train_program, save_path)

                    if steps % args.validation_steps == 0:
                        # evaluate dev set
@@ -369,7 +368,7 @@ def main(args):
            except fluid.core.EOFException:
                save_path = os.path.join(args.save_checkpoint_dir, "step_" + str(steps))
                fluid.save(train_program, save_path)
                train_pyreader.reset()
                break
...
@@ -11,7 +11,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
EmoTect utilities.
"""
@@ -29,27 +28,13 @@ import paddle
import paddle.fluid as fluid
import numpy as np


def init_checkpoint(exe, init_checkpoint_path, main_program):
    """
    Init CheckPoint
    """
    fluid.load(main_program, init_checkpoint_path, exe)


def word2id(word_dict, query):
@@ -57,8 +42,10 @@ def word2id(word_dict, query):
    Convert word sequence into id list
    """
    unk_id = len(word_dict)
    wids = [
        word_dict[w] if w in word_dict else unk_id
        for w in query.strip().split(" ")
    ]
    return wids
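A tiny usage sketch (vocabulary hypothetical); out-of-vocabulary words map to `len(word_dict)` as the UNK id:

```python
word_dict = {"i": 0, "love": 1, "paddle": 2}
print(word2id(word_dict, "i love nlp"))  # [0, 1, 3] -- "nlp" falls back to unk_id 3
```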
...

@@ -5,7 +5,7 @@
## 1. Task Description

This section describes an LSTM-based language model: given an input word sequence (word-segmented for Chinese, tokenized for English), the model computes its perplexity (ppl), a language-model metric used to indicate how fluent a sentence is. For an introduction to RNN language models, [see this paper](https://arxiv.org/abs/1409.2329). Compared with traditional approaches, RNN-based methods handle rare words better.

**The language model currently requires PaddlePaddle 1.7 or above, or a suitable develop build.**

Users are also encouraged to consult the [IPython Notebook demo](https://aistudio.baidu.com/aistudio/projectDetail/122290).
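For reference, perplexity is the exponential of the average per-token negative log-likelihood; a minimal sketch with hypothetical token probabilities:

```python
import math

token_probs = [0.2, 0.1, 0.05, 0.3]  # hypothetical p(w_t | w_<t) from the model
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(math.exp(avg_nll))  # ppl; lower means the model finds the sentence more fluent
```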
...

@@ -36,7 +36,7 @@ import sys
if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding("utf-8")
sys.path.append('../shared_modules/')

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
@@ -60,7 +60,7 @@ def profile_context(profile=True, profiler_path='/tmp/paddingrnn.profile'):

def get_current_model_para(train_prog, train_exe):
    param_list = train_prog.all_parameters()
    param_name_list = [p.name for p in param_list]

    vals = {}
@@ -73,7 +73,7 @@ def get_current_model_para(train_prog, train_exe):

def save_para_npz(train_prog, train_exe):
    print("begin to save model to model_base")
    param_list = train_prog.all_parameters()
    param_name_list = [p.name for p in param_list]

    vals = {}
...
@@ -16,7 +16,7 @@ Lexical Analysis of Chinese (LAC) is a joint lexical analysis model
#### 1. Installing PaddlePaddle

This project requires PaddlePaddle 1.7 or above and PaddleHub 1.0.0 or above. For PaddlePaddle installation, see the official [Quick Install](http://www.paddlepaddle.org/paddle#quick-start) guide; for PaddleHub installation, see [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).

> Warning: the GPU and CPU builds of PaddlePaddle are paddlepaddle-gpu and paddlepaddle respectively; take care to install the right one.
...
@@ -26,7 +26,7 @@ from paddle.fluid.initializer import NormalInitializer
from reader import Dataset
from ernie_reader import SequenceLabelReader

sys.path.append("../shared_modules/")
from models.sequence_labeling import nets
from models.representation.ernie import ernie_encoder, ernie_pyreader

@@ -35,9 +35,10 @@ def create_model(args, vocab_size, num_labels, mode='train'):
    """create lac model"""

    # model's input data
    words = fluid.data(
        name='words', shape=[None, 1], dtype='int64', lod_level=1)
    targets = fluid.data(
        name='targets', shape=[None, 1], dtype='int64', lod_level=1)

    # for inference process
    if mode == 'infer':
@@ -88,9 +89,11 @@ def create_pyreader(args,
                    return_reader=False,
                    mode='train'):
    # init reader
    device_count = len(fluid.cuda_places()) if args.use_cuda else len(
        fluid.cpu_places())

    if model == 'lac':
        pyreader = fluid.io.DataLoader.from_generator(
            feed_list=feed_list,
            capacity=50,
            use_double_buffer=True,
@@ -101,19 +104,19 @@ def create_pyreader(args,

        # create lac pyreader
        if mode == 'train':
            pyreader.set_sample_list_generator(
                fluid.io.batch(
                    fluid.io.shuffle(
                        reader.file_reader(file_name),
                        buf_size=args.traindata_shuffle_buffer),
                    batch_size=args.batch_size / device_count),
                places=place)
        else:
            pyreader.set_sample_list_generator(
                fluid.io.batch(
                    reader.file_reader(
                        file_name, mode=mode),
                    batch_size=args.batch_size / device_count),
                places=place)

    elif model == 'ernie':
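Below is a self-contained sketch of the `fluid.io.DataLoader.from_generator` pattern this file migrates to; the toy reader and shapes are assumptions standing in for the project's real file readers:

```python
import numpy as np
import paddle.fluid as fluid

x = fluid.data(name='x', shape=[None, 1], dtype='int64')
loader = fluid.io.DataLoader.from_generator(
    feed_list=[x], capacity=4, iterable=True)

def toy_reader():  # stands in for reader.file_reader(...)
    for i in range(8):
        yield [np.array([i], dtype='int64')]  # one sample per yield

loader.set_sample_list_generator(
    fluid.io.batch(toy_reader, batch_size=2), places=fluid.cpu_places())

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
for feed in loader():  # iterable mode yields ready-to-feed batches
    pass               # exe.run(feed=feed, ...) would consume each one
```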
elif model == 'ernie': elif model == 'ernie':
...@@ -162,19 +165,19 @@ def create_ernie_model(args, ernie_config): ...@@ -162,19 +165,19 @@ def create_ernie_model(args, ernie_config):
# ERNIE's input data # ERNIE's input data
src_ids = fluid.data( src_ids = fluid.data(
name='src_ids', shape=[-1, args.max_seq_len, 1], dtype='int64') name='src_ids', shape=[None, args.max_seq_len, 1], dtype='int64')
sent_ids = fluid.data( sent_ids = fluid.data(
name='sent_ids', shape=[-1, args.max_seq_len, 1], dtype='int64') name='sent_ids', shape=[None, args.max_seq_len, 1], dtype='int64')
pos_ids = fluid.data( pos_ids = fluid.data(
name='pos_ids', shape=[-1, args.max_seq_len, 1], dtype='int64') name='pos_ids', shape=[None, args.max_seq_len, 1], dtype='int64')
input_mask = fluid.data( input_mask = fluid.data(
name='input_mask', shape=[-1, args.max_seq_len, 1], dtype='float32') name='input_mask', shape=[None, args.max_seq_len, 1], dtype='float32')
padded_labels = fluid.data( padded_labels = fluid.data(
name='padded_labels', shape=[-1, args.max_seq_len, 1], dtype='int64') name='padded_labels', shape=[None, args.max_seq_len, 1], dtype='int64')
seq_lens = fluid.data( seq_lens = fluid.data(
name='seq_lens', shape=[-1], dtype='int64', lod_level=0) name='seq_lens', shape=[None], dtype='int64', lod_level=0)
squeeze_labels = fluid.layers.squeeze(padded_labels, axes=[-1]) squeeze_labels = fluid.layers.squeeze(padded_labels, axes=[-1])
......
@@ -20,7 +20,7 @@ import sys
from collections import namedtuple

import numpy as np

sys.path.append("../shared_modules/")
from preprocess.ernie.task_reader import BaseReader, tokenization
...
@@ -24,7 +24,7 @@ import paddle
import utils
import reader
import creator
sys.path.append('../shared_modules/models/')
from model_check import check_cuda
from model_check import check_version
...

@@ -10,7 +10,7 @@ import paddle.fluid as fluid
import creator
import reader
import utils
sys.path.append('../shared_modules/models/')
from model_check import check_cuda
from model_check import check_version
...

@@ -24,7 +24,7 @@ import paddle
import utils
import reader
import creator
sys.path.append('../shared_modules/models/')
from model_check import check_cuda
from model_check import check_version
...
@@ -34,7 +34,7 @@ import paddle.fluid as fluid
import creator
import utils

sys.path.append("../shared_modules/")
from models.representation.ernie import ErnieConfig
from models.model_check import check_cuda
from models.model_check import check_version
@@ -188,15 +188,16 @@ def do_train(args):
            if steps % args.save_steps == 0:
                save_path = os.path.join(args.model_save_dir,
                                         "step_" + str(steps), "checkpoint")
                print("\tsaving model as %s" % (save_path))
                fluid.save(train_program, save_path)

            if steps % args.validation_steps == 0:
                evaluate(exe, test_program, test_pyreader, train_ret)

    save_path = os.path.join(args.model_save_dir, "step_" + str(steps),
                             "checkpoint")
    fluid.save(train_program, save_path)


def do_eval(args):
...
@@ -29,7 +29,7 @@ import reader
import utils
import creator
from eval import test_process

sys.path.append('../shared_modules/models/')
from model_check import check_cuda
from model_check import check_version
@@ -151,8 +151,8 @@ def do_train(args):
            # save checkpoints
            if step % args.save_steps == 0 and step != 0:
                save_path = os.path.join(args.model_save_dir,
                                         "step_" + str(step), "checkpoint")
                fluid.save(train_program, save_path)
            step += 1

    if args.enable_ce:
...
@@ -200,19 +200,11 @@ def init_checkpoint(exe, init_checkpoint_path, main_program):
    assert os.path.exists(
        init_checkpoint_path), "[%s] can't be found." % init_checkpoint_path

    try:
        checkpoint_path = os.path.join(init_checkpoint_path, "checkpoint")
        fluid.load(main_program, checkpoint_path, exe)
    except:
        fluid.load(main_program, init_checkpoint_path, exe)

    print("Load model from {}".format(init_checkpoint_path))
@@ -224,15 +216,6 @@ def init_pretraining_params(exe,
    assert os.path.exists(pretraining_params_path
                          ), "[%s] can't be found." % pretraining_params_path

    fluid.load(main_program, pretraining_params_path, exe)
    print("Load pretraining parameters from {}.".format(
        pretraining_params_path))
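For reference, a minimal sketch of the `fluid.load`-based restore used above (paths hypothetical; it assumes the files were produced by `fluid.save` on the same program): `fluid.load` reads `<prefix>.pdparams`, plus `<prefix>.pdopt` when optimizer state was saved, and assigns the values into the program's variables.

```python
import paddle.fluid as fluid

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
# Restores the default main program's variables from
# "step_100/checkpoint.pdparams" (and ".pdopt" if present).
fluid.load(fluid.default_main_program(), "step_100/checkpoint", exe)
```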
...@@ -14,12 +14,12 @@ DuReader是一个大规模、面向真实应用、由人类生成的中文阅读 ...@@ -14,12 +14,12 @@ DuReader是一个大规模、面向真实应用、由人类生成的中文阅读
- 答案由人类生成 - 答案由人类生成
- 面向真实应用场景 - 面向真实应用场景
- 标注更加丰富细致 - 标注更加丰富细致
更多关于DuReader数据集的详细信息可在[DuReader官网](https://ai.baidu.com//broad/subordinate?dataset=dureader)找到。 更多关于DuReader数据集的详细信息可在[DuReader官网](https://ai.baidu.com//broad/subordinate?dataset=dureader)找到。
### DuReader基线系统 ### DuReader基线系统
DuReader基线系统利用[PaddlePaddle](http://paddlepaddle.org)深度学习框架,针对**DuReader阅读理解数据集**实现并升级了一个经典的阅读理解模型 —— BiDAF. DuReader基线系统利用[PaddlePaddle](http://paddlepaddle.org)深度学习框架,针对**DuReader阅读理解数据集**实现并升级了一个经典的阅读理解模型 —— BiDAF.
## [KT-Net](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2019-KTNET) ## [KT-Net](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2019-KTNET)
...@@ -30,7 +30,7 @@ KT-NET是百度NLP提出的具有开创性意义的语言表示与知识表示 ...@@ -30,7 +30,7 @@ KT-NET是百度NLP提出的具有开创性意义的语言表示与知识表示
- 被ACL 2019录用为长文 ([文章链接](https://www.aclweb.org/anthology/P19-1226/)) - 被ACL 2019录用为长文 ([文章链接](https://www.aclweb.org/anthology/P19-1226/))
此外,KT-NET具备很强的通用性,不仅适用于机器阅读理解任务,对其他形式的语言理解任务,如自然语言推断、复述识别、语义相似度判断等均有帮助。 此外,KT-NET具备很强的通用性,不仅适用于机器阅读理解任务,对其他形式的语言理解任务,如自然语言推断、复述识别、语义相似度判断等均有帮助。
## [D-NET](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/MRQA2019-D-NET) ## [D-NET](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/MRQA2019-D-NET)
D-NET是一个以提升**阅读理解模型泛化能力**为目标的“预训练-微调”框架。D-NET的特点包括: D-NET是一个以提升**阅读理解模型泛化能力**为目标的“预训练-微调”框架。D-NET的特点包括:
...@@ -39,4 +39,3 @@ D-NET是一个以提升**阅读理解模型泛化能力**为目标的“预训 ...@@ -39,4 +39,3 @@ D-NET是一个以提升**阅读理解模型泛化能力**为目标的“预训
- Introduces multi-task, multi-domain learning strategies in the fine-tuning stage (based on the [PALM](https://github.com/PaddlePaddle/PALM) multi-task learning framework), effectively improving the model's generalization across domains

With the D-NET framework, Baidu won first place in the EMNLP 2019 [MRQA](https://mrqa.github.io/shared) international reading comprehension evaluation, beating the runner-up by nearly two percentage points and ranking first on 10 of the 12 test sets.
...@@ -106,7 +106,7 @@ python -u main.py \ ...@@ -106,7 +106,7 @@ python -u main.py \
--prepostprocess_dropout 0.3 --prepostprocess_dropout 0.3
``` ```
By default training uses all available GPUs; the `CUDA_VISIBLE_DEVICES` environment variable controls which GPUs are used. Training can also run on CPU only (set `--use_cuda False`), at a comparatively slower speed. If `save_model_path` is provided (default: saved_models), a checkpoint of the current training state is saved to that directory every `save_step` iterations (default: 10000), written as two files, `transformer.pdparams` and `transformer.pdopt`, recording model parameters and optimizer state respectively; and every `print_step` iterations (default: 100) a log like the following is printed to standard output:
```txt ```txt
[2019-08-02 15:30:51,656 INFO train.py:262] step_idx: 150100, epoch: 32, batch: 1364, avg loss: 2.880427, normalized loss: 1.504687, ppl: 17.821888, speed: 3.34 step/s [2019-08-02 15:30:51,656 INFO train.py:262] step_idx: 150100, epoch: 32, batch: 1364, avg loss: 2.880427, normalized loss: 1.504687, ppl: 17.821888, speed: 3.34 step/s
...@@ -195,7 +195,7 @@ BLEU = 26.35, 57.7/32.1/20.0/13.0 (BP=1.000, ratio=1.013, hyp_len=63903, ref_len ...@@ -195,7 +195,7 @@ BLEU = 26.35, 57.7/32.1/20.0/13.0 (BP=1.000, ratio=1.013, hyp_len=63903, ref_len
### Pretrained Models

Model parameters matching the BLEU scores above are available for download: [base model](https://transformer-res.bj.bcebos.com/base_model_graph.tar.gz) and [big model](https://transformer-res.bj.bcebos.com/big_model_graph.tar.gz) (note that the models were trained and tested on the downloadable data provided).

## Advanced Usage
......
...@@ -12,6 +12,7 @@ ...@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
def get_input_descs(args): def get_input_descs(args):
""" """
Generate a dict mapping data fields to the corresponding data shapes and Generate a dict mapping data fields to the corresponding data shapes and
...@@ -42,11 +43,12 @@ def get_input_descs(args): ...@@ -42,11 +43,12 @@ def get_input_descs(args):
# encoder. # encoder.
# The actual data shape of src_slf_attn_bias is: # The actual data shape of src_slf_attn_bias is:
# [batch_size, n_head, max_src_len_in_batch, max_src_len_in_batch] # [batch_size, n_head, max_src_len_in_batch, max_src_len_in_batch]
"src_slf_attn_bias": [(batch_size, n_head, seq_len, seq_len), "float32"], "src_slf_attn_bias":
[(batch_size, n_head, seq_len, seq_len), "float32"],
# The actual data shape of trg_word is: # The actual data shape of trg_word is:
# [batch_size, max_trg_len_in_batch, 1] # [batch_size, max_trg_len_in_batch, 1]
"trg_word": [(batch_size, seq_len), "int64", "trg_word": [(batch_size, seq_len), "int64",
2], # lod_level is only used in fast decoder. 2], # lod_level is only used in fast decoder.
# The actual data shape of trg_pos is: # The actual data shape of trg_pos is:
# [batch_size, max_trg_len_in_batch, 1] # [batch_size, max_trg_len_in_batch, 1]
"trg_pos": [(batch_size, seq_len), "int64"], "trg_pos": [(batch_size, seq_len), "int64"],
...@@ -54,12 +56,14 @@ def get_input_descs(args): ...@@ -54,12 +56,14 @@ def get_input_descs(args):
# subsequent words in the decoder. # subsequent words in the decoder.
# The actual data shape of trg_slf_attn_bias is: # The actual data shape of trg_slf_attn_bias is:
# [batch_size, n_head, max_trg_len_in_batch, max_trg_len_in_batch] # [batch_size, n_head, max_trg_len_in_batch, max_trg_len_in_batch]
"trg_slf_attn_bias": [(batch_size, n_head, seq_len, seq_len), "float32"], "trg_slf_attn_bias":
[(batch_size, n_head, seq_len, seq_len), "float32"],
# This input is used to remove attention weights on paddings of the source # This input is used to remove attention weights on paddings of the source
# input in the encoder-decoder attention. # input in the encoder-decoder attention.
# The actual data shape of trg_src_attn_bias is: # The actual data shape of trg_src_attn_bias is:
# [batch_size, n_head, max_trg_len_in_batch, max_src_len_in_batch] # [batch_size, n_head, max_trg_len_in_batch, max_src_len_in_batch]
"trg_src_attn_bias": [(batch_size, n_head, seq_len, seq_len), "float32"], "trg_src_attn_bias":
[(batch_size, n_head, seq_len, seq_len), "float32"],
# This input is used in independent decoder program for inference. # This input is used in independent decoder program for inference.
# The actual data shape of enc_output is: # The actual data shape of enc_output is:
# [batch_size, max_src_len_in_batch, d_model] # [batch_size, max_src_len_in_batch, d_model]
...@@ -80,6 +84,7 @@ def get_input_descs(args): ...@@ -80,6 +84,7 @@ def get_input_descs(args):
return input_descs return input_descs
# Names of word embedding table which might be reused for weight sharing. # Names of word embedding table which might be reused for weight sharing.
word_emb_param_names = ( word_emb_param_names = (
"src_word_emb_table", "src_word_emb_table",
......
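Each descriptor above pairs a shape tuple with a dtype (plus an optional LoD level), so feed variables can be declared generically. A hedged sketch of how such an entry maps onto `fluid.data` (the name and the use of `None` for variable dimensions are illustrative):

```python
import paddle.fluid as fluid

# descriptor in the style of get_input_descs: [shape, dtype(, lod_level)]
descs = {"src_word": [(None, None), "int64"]}

src_word = fluid.data(
    name="src_word",
    shape=list(descs["src_word"][0]),  # None marks a runtime-sized dimension
    dtype=descs["src_word"][1])
```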
...@@ -24,6 +24,7 @@ import paddle.fluid as fluid ...@@ -24,6 +24,7 @@ import paddle.fluid as fluid
from utils.input_field import InputField from utils.input_field import InputField
from utils.configure import PDConfig from utils.configure import PDConfig
from utils.load import load
# include task-specific libs # include task-specific libs
import desc import desc
...@@ -31,51 +32,6 @@ import reader ...@@ -31,51 +32,6 @@ import reader
from transformer import create_net from transformer import create_net
def init_from_pretrain_model(args, exe, program):
assert isinstance(args.init_from_pretrain_model, str)
if not os.path.exists(args.init_from_pretrain_model):
raise Warning("The pretrained params do not exist.")
return False
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(
os.path.join(args.init_from_pretrain_model, var.name))
fluid.io.load_vars(
exe,
args.init_from_pretrain_model,
main_program=program,
predicate=existed_params)
print("finish initing model from pretrained params from %s" %
(args.init_from_pretrain_model))
return True
def init_from_params(args, exe, program):
assert isinstance(args.init_from_params, str)
if not os.path.exists(args.init_from_params):
raise Warning("the params path does not exist.")
return False
fluid.io.load_params(
executor=exe,
dirname=args.init_from_params,
main_program=program,
filename="params.pdparams")
print("finish init model from params from %s" % (args.init_from_params))
return True
def do_save_inference_model(args): def do_save_inference_model(args):
if args.use_cuda: if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count() dev_count = fluid.core.get_cuda_device_count()
...@@ -84,6 +40,11 @@ def do_save_inference_model(args): ...@@ -84,6 +40,11 @@ def do_save_inference_model(args):
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace() place = fluid.CPUPlace()
src_vocab = reader.DataProcessor.load_dict(args.src_vocab_fpath)
trg_vocab = reader.DataProcessor.load_dict(args.trg_vocab_fpath)
args.src_vocab_size = len(src_vocab)
args.trg_vocab_size = len(trg_vocab)
test_prog = fluid.default_main_program() test_prog = fluid.default_main_program()
startup_prog = fluid.default_startup_program() startup_prog = fluid.default_startup_program()
...@@ -119,13 +80,10 @@ def do_save_inference_model(args): ...@@ -119,13 +80,10 @@ def do_save_inference_model(args):
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(startup_prog) exe.run(startup_prog)
assert (args.init_from_params) or (args.init_from_pretrain_model) assert (
args.init_from_params), "must set init_from_params to load parameters"
if args.init_from_params: load(test_prog, os.path.join(args.init_from_params, "transformer"), exe)
init_from_params(args, exe, test_prog) print("finish initializing model from params at %s" % (args.init_from_params))
elif args.init_from_pretrain_model:
init_from_pretrain_model(args, exe, test_prog)
# saving inference model # saving inference model
......
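The vocabulary sizes are now derived from the dictionary files at export time instead of being passed in. A rough sketch of the pattern, assuming (as the hypothetical `load_dict` below does) a one-token-per-line vocabulary format:

```python
def load_dict(dict_path):
    # hypothetical helper: token -> index, one token per line
    with open(dict_path, "r") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}

src_vocab = load_dict("gen_data/vocab_all.bpe.32000")  # illustrative path
src_vocab_size = len(src_vocab)
```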
...@@ -25,7 +25,6 @@ from train import do_train ...@@ -25,7 +25,6 @@ from train import do_train
from predict import do_predict from predict import do_predict
from inference_model import do_save_inference_model from inference_model import do_save_inference_model
if __name__ == "__main__": if __name__ == "__main__":
LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s" LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s"
logging.basicConfig( logging.basicConfig(
...@@ -43,4 +42,4 @@ if __name__ == "__main__": ...@@ -43,4 +42,4 @@ if __name__ == "__main__":
do_predict(args) do_predict(args)
if args.do_save_inference_model: if args.do_save_inference_model:
do_save_inference_model(args) do_save_inference_model(args)
\ No newline at end of file
...@@ -25,6 +25,7 @@ import paddle.fluid as fluid ...@@ -25,6 +25,7 @@ import paddle.fluid as fluid
from utils.input_field import InputField from utils.input_field import InputField
from utils.configure import PDConfig from utils.configure import PDConfig
from utils.check import check_gpu, check_version from utils.check import check_gpu, check_version
from utils.load import load
# include task-specific libs # include task-specific libs
import desc import desc
...@@ -32,51 +33,6 @@ import reader ...@@ -32,51 +33,6 @@ import reader
from transformer import create_net, position_encoding_init from transformer import create_net, position_encoding_init
def init_from_pretrain_model(args, exe, program):
assert isinstance(args.init_from_pretrain_model, str)
if not os.path.exists(args.init_from_pretrain_model):
raise Warning("The pretrained params do not exist.")
return False
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(
os.path.join(args.init_from_pretrain_model, var.name))
fluid.io.load_vars(
exe,
args.init_from_pretrain_model,
main_program=program,
predicate=existed_params)
print("finish initing model from pretrained params from %s" %
(args.init_from_pretrain_model))
return True
def init_from_params(args, exe, program):
assert isinstance(args.init_from_params, str)
if not os.path.exists(args.init_from_params):
raise Warning("the params path does not exist.")
return False
fluid.io.load_params(
executor=exe,
dirname=args.init_from_params,
main_program=program,
filename="params.pdparams")
print("finish init model from params from %s" % (args.init_from_params))
return True
def post_process_seq(seq, bos_idx, eos_idx, output_bos=False, output_eos=False): def post_process_seq(seq, bos_idx, eos_idx, output_bos=False, output_eos=False):
""" """
Post-process the beam-search decoded sequence. Truncate from the first Post-process the beam-search decoded sequence. Truncate from the first
...@@ -160,13 +116,10 @@ def do_predict(args): ...@@ -160,13 +116,10 @@ def do_predict(args):
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(startup_prog) exe.run(startup_prog)
assert (args.init_from_params) or (args.init_from_pretrain_model) assert (
args.init_from_params), "must set init_from_params to load parameters"
if args.init_from_params: load(test_prog, os.path.join(args.init_from_params, "transformer"), exe)
init_from_params(args, exe, test_prog) print("finish initializing model from params at %s" % (args.init_from_params))
elif args.init_from_pretrain_model:
init_from_pretrain_model(args, exe, test_prog)
# to avoid a longer length than training, reset the size of position encoding to max_length # to avoid a longer length than training, reset the size of position encoding to max_length
for pos_enc_param_name in desc.pos_enc_param_names: for pos_enc_param_name in desc.pos_enc_param_names:
......
...@@ -27,6 +27,7 @@ import utils.dist_utils as dist_utils ...@@ -27,6 +27,7 @@ import utils.dist_utils as dist_utils
from utils.input_field import InputField from utils.input_field import InputField
from utils.configure import PDConfig from utils.configure import PDConfig
from utils.check import check_gpu, check_version from utils.check import check_gpu, check_version
from utils.load import load
# include task-specific libs # include task-specific libs
import desc import desc
...@@ -39,91 +40,6 @@ if os.environ.get('FLAGS_eager_delete_tensor_gb', None) is None: ...@@ -39,91 +40,6 @@ if os.environ.get('FLAGS_eager_delete_tensor_gb', None) is None:
num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1)) num_trainers = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
def init_from_pretrain_model(args, exe, program):
assert isinstance(args.init_from_pretrain_model, str)
if not os.path.exists(args.init_from_pretrain_model):
raise Warning("The pretrained params do not exist.")
return False
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(
os.path.join(args.init_from_pretrain_model, var.name))
fluid.io.load_vars(
exe,
args.init_from_pretrain_model,
main_program=program,
predicate=existed_params)
print("finish initing model from pretrained params from %s" %
(args.init_from_pretrain_model))
return True
def init_from_checkpoint(args, exe, program):
assert isinstance(args.init_from_checkpoint, str)
if not os.path.exists(args.init_from_checkpoint):
raise Warning("the checkpoint path does not exist.")
return False
fluid.io.load_persistables(
executor=exe,
dirname=args.init_from_checkpoint,
main_program=program,
filename="checkpoint.pdckpt")
print("finish initing model from checkpoint from %s" %
(args.init_from_checkpoint))
return True
def save_checkpoint(args, exe, program, dirname):
assert isinstance(args.save_model_path, str)
checkpoint_dir = os.path.join(args.save_model_path, args.save_checkpoint)
if not os.path.exists(checkpoint_dir):
os.mkdir(checkpoint_dir)
fluid.io.save_persistables(
exe,
os.path.join(checkpoint_dir, dirname),
main_program=program,
filename="checkpoint.pdparams")
print("save checkpoint at %s" % (os.path.join(checkpoint_dir, dirname)))
return True
def save_param(args, exe, program, dirname):
assert isinstance(args.save_model_path, str)
param_dir = os.path.join(args.save_model_path, args.save_param)
if not os.path.exists(param_dir):
os.mkdir(param_dir)
fluid.io.save_params(
exe,
os.path.join(param_dir, dirname),
main_program=program,
filename="params.pdparams")
print("save parameters at %s" % (os.path.join(param_dir, dirname)))
return True
def do_train(args): def do_train(args):
if args.use_cuda: if args.use_cuda:
if num_trainers > 1: # for multi-process gpu training if num_trainers > 1: # for multi-process gpu training
...@@ -226,11 +142,17 @@ def do_train(args): ...@@ -226,11 +142,17 @@ def do_train(args):
## init from some checkpoint, to resume the previous training ## init from some checkpoint, to resume the previous training
if args.init_from_checkpoint: if args.init_from_checkpoint:
init_from_checkpoint(args, exe, train_prog) load(train_prog,
os.path.join(args.init_from_checkpoint, "transformer"), exe)
print("finish initing model from checkpoint from %s" %
(args.init_from_checkpoint))
## init from some pretrain models, to better solve the current task ## init from some pretrain models, to better solve the current task
if args.init_from_pretrain_model: if args.init_from_pretrain_model:
init_from_pretrain_model(args, exe, train_prog) load(train_prog,
os.path.join(args.init_from_pretrain_model, "transformer"), exe)
print("finish initing model from pretrained params from %s" %
(args.init_from_pretrain_model))
build_strategy = fluid.compiler.BuildStrategy() build_strategy = fluid.compiler.BuildStrategy()
build_strategy.enable_inplace = True build_strategy.enable_inplace = True
...@@ -259,7 +181,7 @@ def do_train(args): ...@@ -259,7 +181,7 @@ def do_train(args):
batch_id = 0 batch_id = 0
while True: while True:
if args.max_iter and total_batch_num == args.max_iter:  # this is for benchmark
return return
try: try:
outs = exe.run(compiled_train_prog, outs = exe.run(compiled_train_prog,
...@@ -293,18 +215,15 @@ def do_train(args): ...@@ -293,18 +215,15 @@ def do_train(args):
avg_batch_time = time.time() avg_batch_time = time.time()
if step_idx % args.save_step == 0 and step_idx != 0: if step_idx % args.save_step == 0 and step_idx != 0:
if args.save_model_path:
if args.save_checkpoint: model_path = os.path.join(args.save_model_path,
save_checkpoint(args, exe, train_prog, "step_" + str(step_idx),
"step_" + str(step_idx)) "transformer")
fluid.save(train_prog, model_path)
if args.save_param:
save_param(args, exe, train_prog,
"step_" + str(step_idx))
batch_id += 1 batch_id += 1
step_idx += 1 step_idx += 1
total_batch_num = total_batch_num + 1 # this is for benchmark total_batch_num = total_batch_num + 1 # this is for benchmark
# profiler tools for benchmark # profiler tools for benchmark
if args.is_profiler and pass_id == 0 and batch_id == args.print_step: if args.is_profiler and pass_id == 0 and batch_id == args.print_step:
...@@ -319,11 +238,10 @@ def do_train(args): ...@@ -319,11 +238,10 @@ def do_train(args):
time_consumed = time.time() - pass_start_time time_consumed = time.time() - pass_start_time
if args.save_checkpoint: if args.save_model_path:
save_checkpoint(args, exe, train_prog, "step_final") model_path = os.path.join(args.save_model_path, "step_final",
"transformer")
if args.save_param: fluid.save(train_prog, model_path)
save_param(args, exe, train_prog, "step_final")
if args.enable_ce: # For CE if args.enable_ce: # For CE
print("kpis\ttrain_cost_card%d\t%f" % (dev_count, total_avg_cost)) print("kpis\ttrain_cost_card%d\t%f" % (dev_count, total_avg_cost))
......
...@@ -17,6 +17,7 @@ import numpy as np ...@@ -17,6 +17,7 @@ import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.fluid.layers as layers import paddle.fluid.layers as layers
from paddle.fluid.layers.utils import map_structure
from desc import * from desc import *
...@@ -90,7 +91,6 @@ def multi_head_attention(queries, ...@@ -90,7 +91,6 @@ def multi_head_attention(queries,
n_head=1, n_head=1,
dropout_rate=0., dropout_rate=0.,
cache=None, cache=None,
gather_idx=None,
static_kv=False): static_kv=False):
""" """
Multi-Head Attention. Note that attn_bias is added to the logit before Multi-Head Attention. Note that attn_bias is added to the logit before
...@@ -161,30 +161,28 @@ def multi_head_attention(queries, ...@@ -161,30 +161,28 @@ def multi_head_attention(queries,
v = transpose_layer(x=reshaped_v, perm=[0, 2, 1, 3]) v = transpose_layer(x=reshaped_v, perm=[0, 2, 1, 3])
if cache is not None: # only for faster inference if cache is not None: # only for faster inference
cache_, i = cache
if static_kv: # For encoder-decoder attention in inference if static_kv: # For encoder-decoder attention in inference
cache_k, cache_v = cache["static_k"], cache["static_v"] cache_k, cache_v = cache_["static_k"], cache_["static_v"]
# To init the static_k and static_v in cache. # To init the static_k and static_v in global block.
# Maybe we can use condition_op(if_else) to do these at the first
# step in while loop to replace these, however it might be less
# efficient.
static_cache_init = wrap_layer_with_block( static_cache_init = wrap_layer_with_block(
layers.assign, layers.assign,
fluid.default_main_program().current_block().parent_idx) fluid.default_main_program().current_block().parent_idx)
static_cache_init(k, cache_k) static_cache_init(
static_cache_init(v, cache_v) k,
fluid.default_main_program().global_block().var(
"static_k_%d" % i))
static_cache_init(
v,
fluid.default_main_program().global_block().var(
"static_v_%d" % i))
k, v = cache_k, cache_v
else: # For decoder self-attention in inference else: # For decoder self-attention in inference
cache_k, cache_v = cache["k"], cache["v"] # use cache and concat time steps.
# gather cell states corresponding to selected parent cache_k, cache_v = cache_["k"], cache_["v"]
select_k = layers.gather(cache_k, index=gather_idx) k = layers.concat([cache_k, k], axis=2)
select_v = layers.gather(cache_v, index=gather_idx) v = layers.concat([cache_v, v], axis=2)
if not static_kv: cache_["k"], cache_["v"] = (k, v)
# For self attention in inference, use cache and concat time steps.
select_k = layers.concat([select_k, k], axis=2)
select_v = layers.concat([select_v, v], axis=2)
# update cell states(caches) cached in global block
layers.assign(select_k, cache_k)
layers.assign(select_v, cache_v)
return q, select_k, select_v
return q, k, v return q, k, v
def __combine_heads(x): def __combine_heads(x):
...@@ -301,15 +299,16 @@ def prepare_encoder_decoder(src_word, ...@@ -301,15 +299,16 @@ def prepare_encoder_decoder(src_word,
src_word, src_word,
size=[src_vocab_size, src_emb_dim], size=[src_vocab_size, src_emb_dim],
padding_idx=bos_idx, # set embedding of bos to 0 padding_idx=bos_idx, # set embedding of bos to 0
param_attr=fluid.ParamAttr(name=word_emb_param_name, param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal( name=word_emb_param_name,
0., src_emb_dim**-0.5))) initializer=fluid.initializer.Normal(0., src_emb_dim**-0.5)))
src_word_emb = layers.scale(x=src_word_emb, scale=src_emb_dim**0.5) src_word_emb = layers.scale(x=src_word_emb, scale=src_emb_dim**0.5)
src_pos_enc = fluid.embedding(src_pos, src_pos_enc = fluid.embedding(
size=[src_max_len, src_emb_dim], src_pos,
param_attr=fluid.ParamAttr( size=[src_max_len, src_emb_dim],
name=pos_enc_param_name, trainable=False)) param_attr=fluid.ParamAttr(
name=pos_enc_param_name, trainable=False))
src_pos_enc.stop_gradient = True src_pos_enc.stop_gradient = True
enc_input = src_word_emb + src_pos_enc enc_input = src_word_emb + src_pos_enc
return layers.dropout( return layers.dropout(
...@@ -405,8 +404,7 @@ def decoder_layer(dec_input, ...@@ -405,8 +404,7 @@ def decoder_layer(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
cache=None, cache=None):
gather_idx=None):
""" The layer to be stacked in decoder part. """ The layer to be stacked in decoder part.
The structure of this module is similar to that in the encoder part except The structure of this module is similar to that in the encoder part except
a multi-head attention is added to implement encoder-decoder attention. a multi-head attention is added to implement encoder-decoder attention.
...@@ -421,8 +419,7 @@ def decoder_layer(dec_input, ...@@ -421,8 +419,7 @@ def decoder_layer(dec_input,
d_model, d_model,
n_head, n_head,
attention_dropout, attention_dropout,
cache=cache, cache=cache)
gather_idx=gather_idx)
slf_attn_output = post_process_layer( slf_attn_output = post_process_layer(
dec_input, dec_input,
slf_attn_output, slf_attn_output,
...@@ -440,7 +437,6 @@ def decoder_layer(dec_input, ...@@ -440,7 +437,6 @@ def decoder_layer(dec_input,
n_head, n_head,
attention_dropout, attention_dropout,
cache=cache, cache=cache,
gather_idx=gather_idx,
static_kv=True) static_kv=True)
enc_attn_output = post_process_layer( enc_attn_output = post_process_layer(
slf_attn_output, slf_attn_output,
...@@ -476,8 +472,7 @@ def decoder(dec_input, ...@@ -476,8 +472,7 @@ def decoder(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
caches=None, caches=None):
gather_idx=None):
""" """
The decoder is composed of a stack of identical decoder_layer layers. The decoder is composed of a stack of identical decoder_layer layers.
""" """
...@@ -497,8 +492,7 @@ def decoder(dec_input, ...@@ -497,8 +492,7 @@ def decoder(dec_input,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
cache=None if caches is None else caches[i], cache=None if caches is None else (caches[i], i))
gather_idx=gather_idx)
dec_input = dec_output dec_input = dec_output
dec_output = pre_process_layer(dec_output, preprocess_cmd, dec_output = pre_process_layer(dec_output, preprocess_cmd,
prepostprocess_dropout) prepostprocess_dropout)
...@@ -536,48 +530,51 @@ def transformer(model_input, ...@@ -536,48 +530,51 @@ def transformer(model_input,
label = model_input.lbl_word label = model_input.lbl_word
weights = model_input.lbl_weight weights = model_input.lbl_weight
enc_output = wrap_encoder(enc_inputs, enc_output = wrap_encoder(
src_vocab_size, enc_inputs,
max_length, src_vocab_size,
n_layer, max_length,
n_head, n_layer,
d_key, n_head,
d_value, d_key,
d_model, d_value,
d_inner_hid, d_model,
prepostprocess_dropout, d_inner_hid,
attention_dropout, prepostprocess_dropout,
relu_dropout, attention_dropout,
preprocess_cmd, relu_dropout,
postprocess_cmd, preprocess_cmd,
weight_sharing, postprocess_cmd,
bos_idx=bos_idx) weight_sharing,
bos_idx=bos_idx)
predict = wrap_decoder(dec_inputs,
trg_vocab_size, predict = wrap_decoder(
max_length, dec_inputs,
n_layer, trg_vocab_size,
n_head, max_length,
d_key, n_layer,
d_value, n_head,
d_model, d_key,
d_inner_hid, d_value,
prepostprocess_dropout, d_model,
attention_dropout, d_inner_hid,
relu_dropout, prepostprocess_dropout,
preprocess_cmd, attention_dropout,
postprocess_cmd, relu_dropout,
weight_sharing, preprocess_cmd,
enc_output=enc_output) postprocess_cmd,
weight_sharing,
enc_output=enc_output)
# The padding index does not contribute to the total loss. The weights are
# used to cancel the padding index when calculating the loss.
if label_smooth_eps: if label_smooth_eps:
# TODO: use fluid.input.one_hot after softmax_with_cross_entropy removing # TODO: use fluid.input.one_hot after softmax_with_cross_entropy removing
# the enforcement that the last dimension of label must be 1. # the enforcement that the last dimension of label must be 1.
label = layers.label_smooth(label=layers.one_hot(input=label, label = layers.label_smooth(
depth=trg_vocab_size), label=layers.one_hot(
epsilon=label_smooth_eps) input=label, depth=trg_vocab_size),
epsilon=label_smooth_eps)
cost = layers.softmax_with_cross_entropy( cost = layers.softmax_with_cross_entropy(
logits=predict, logits=predict,
...@@ -654,7 +651,6 @@ def wrap_decoder(dec_inputs, ...@@ -654,7 +651,6 @@ def wrap_decoder(dec_inputs,
weight_sharing, weight_sharing,
enc_output=None, enc_output=None,
caches=None, caches=None,
gather_idx=None,
bos_idx=0): bos_idx=0):
""" """
The wrapper assembles together all needed layers for the decoder. The wrapper assembles together all needed layers for the decoder.
...@@ -687,8 +683,7 @@ def wrap_decoder(dec_inputs, ...@@ -687,8 +683,7 @@ def wrap_decoder(dec_inputs,
relu_dropout, relu_dropout,
preprocess_cmd, preprocess_cmd,
postprocess_cmd, postprocess_cmd,
caches=caches, caches=caches)
gather_idx=gather_idx)
# Reshape to 2D tensor to use GEMM instead of BatchedGEMM # Reshape to 2D tensor to use GEMM instead of BatchedGEMM
dec_output = layers.reshape( dec_output = layers.reshape(
dec_output, shape=[-1, dec_output.shape[-1]], inplace=True) dec_output, shape=[-1, dec_output.shape[-1]], inplace=True)
...@@ -722,22 +717,23 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len, ...@@ -722,22 +717,23 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len,
dec_inputs = (model_input.trg_word, model_input.init_score, dec_inputs = (model_input.trg_word, model_input.init_score,
model_input.init_idx, model_input.trg_src_attn_bias) model_input.init_idx, model_input.trg_src_attn_bias)
enc_output = wrap_encoder(enc_inputs, enc_output = wrap_encoder(
src_vocab_size, enc_inputs,
max_in_len, src_vocab_size,
n_layer, max_in_len,
n_head, n_layer,
d_key, n_head,
d_value, d_key,
d_model, d_value,
d_inner_hid, d_model,
prepostprocess_dropout, d_inner_hid,
attention_dropout, prepostprocess_dropout,
relu_dropout, attention_dropout,
preprocess_cmd, relu_dropout,
postprocess_cmd, preprocess_cmd,
weight_sharing, postprocess_cmd,
bos_idx=bos_idx) weight_sharing,
bos_idx=bos_idx)
start_tokens, init_scores, parent_idx, trg_src_attn_bias = dec_inputs start_tokens, init_scores, parent_idx, trg_src_attn_bias = dec_inputs
def beam_search(): def beam_search():
...@@ -748,8 +744,6 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len, ...@@ -748,8 +744,6 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len,
force_cpu=True) force_cpu=True)
step_idx = layers.fill_constant( step_idx = layers.fill_constant(
shape=[1], dtype=start_tokens.dtype, value=0, force_cpu=True) shape=[1], dtype=start_tokens.dtype, value=0, force_cpu=True)
cond = layers.less_than(x=step_idx, y=max_len) # default force_cpu=True
while_op = layers.While(cond)
# array states will be stored for each step. # array states will be stored for each step.
ids = layers.array_write( ids = layers.array_write(
layers.reshape(start_tokens, (-1, 1)), step_idx) layers.reshape(start_tokens, (-1, 1)), step_idx)
...@@ -773,21 +767,31 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len, ...@@ -773,21 +767,31 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len,
dtype=enc_output.dtype, dtype=enc_output.dtype,
value=0), value=0),
"static_k": # for encoder-decoder attention "static_k": # for encoder-decoder attention
layers.create_tensor(dtype=enc_output.dtype), fluid.data(
shape=[None, n_head, 0, d_key],
dtype=enc_output.dtype,
name=("static_k_%d" % i)),
"static_v": # for encoder-decoder attention "static_v": # for encoder-decoder attention
layers.create_tensor(dtype=enc_output.dtype) fluid.data(
shape=[None, n_head, 0, d_value],
dtype=enc_output.dtype,
name=("static_v_%d" % i)),
} for i in range(n_layer) } for i in range(n_layer)
] ]
with while_op.block(): def cond_func(step_idx, selected_ids, selected_scores, gather_idx,
pre_ids = layers.array_read(array=ids, i=step_idx) caches, trg_src_attn_bias):
# Since beam_search_op dosen't enforce pre_ids' shape, we can do length_cond = layers.less_than(x=step_idx, y=max_len)
# inplace reshape here which actually change the shape of pre_ids. finish_cond = layers.logical_not(layers.is_empty(x=selected_ids))
# pre_ids = layers.reshape(pre_ids, (-1, 1, 1), inplace=True) return layers.logical_and(x=length_cond, y=finish_cond)
pre_scores = layers.array_read(array=scores, i=step_idx)
def body_func(step_idx, pre_ids, pre_scores, gather_idx, caches,
trg_src_attn_bias):
# gather cell states corresponding to selected parent # gather cell states corresponding to selected parent
pre_caches = map_structure(
lambda x: layers.gather(x, index=gather_idx), caches)
pre_src_attn_bias = layers.gather( pre_src_attn_bias = layers.gather(
trg_src_attn_bias, index=parent_idx) trg_src_attn_bias, index=gather_idx)
pre_pos = layers.elementwise_mul( pre_pos = layers.elementwise_mul(
x=layers.fill_constant_batch_size_like( x=layers.fill_constant_batch_size_like(
input=pre_src_attn_bias,  # can't use a lod tensor here
...@@ -796,25 +800,25 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len, ...@@ -796,25 +800,25 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len,
dtype=pre_ids.dtype), dtype=pre_ids.dtype),
y=step_idx, y=step_idx,
axis=0) axis=0)
logits = wrap_decoder((pre_ids, pre_pos, None, pre_src_attn_bias), logits = wrap_decoder(
trg_vocab_size, (pre_ids, pre_pos, None, pre_src_attn_bias),
max_in_len, trg_vocab_size,
n_layer, max_in_len,
n_head, n_layer,
d_key, n_head,
d_value, d_key,
d_model, d_value,
d_inner_hid, d_model,
prepostprocess_dropout, d_inner_hid,
attention_dropout, prepostprocess_dropout,
relu_dropout, attention_dropout,
preprocess_cmd, relu_dropout,
postprocess_cmd, preprocess_cmd,
weight_sharing, postprocess_cmd,
enc_output=enc_output, weight_sharing,
caches=caches, enc_output=enc_output,
gather_idx=parent_idx, caches=pre_caches,
bos_idx=bos_idx) bos_idx=bos_idx)
# intra-beam topK # intra-beam topK
topk_scores, topk_indices = layers.topk( topk_scores, topk_indices = layers.topk(
input=layers.softmax(logits), k=beam_size) input=layers.softmax(logits), k=beam_size)
...@@ -832,16 +836,20 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len, ...@@ -832,16 +836,20 @@ def fast_decode(model_input, src_vocab_size, trg_vocab_size, max_in_len,
beam_size=beam_size, beam_size=beam_size,
end_id=eos_idx, end_id=eos_idx,
return_parent_idx=True) return_parent_idx=True)
layers.increment(x=step_idx, value=1.0, in_place=True) step_idx = layers.increment(x=step_idx, value=1.0, in_place=False)
# cell states(caches) have been updated in wrap_decoder,
# only need to update beam search states here.
layers.array_write(selected_ids, i=step_idx, array=ids) layers.array_write(selected_ids, i=step_idx, array=ids)
layers.array_write(selected_scores, i=step_idx, array=scores) layers.array_write(selected_scores, i=step_idx, array=scores)
layers.assign(gather_idx, parent_idx) return (step_idx, selected_ids, selected_scores, gather_idx,
layers.assign(pre_src_attn_bias, trg_src_attn_bias) pre_caches, pre_src_attn_bias)
length_cond = layers.less_than(x=step_idx, y=max_len)
finish_cond = layers.logical_not(layers.is_empty(x=selected_ids)) _ = layers.while_loop(
layers.logical_and(x=length_cond, y=finish_cond, out=cond) cond=cond_func,
body=body_func,
loop_vars=[
step_idx, start_tokens, init_scores, parent_idx, caches,
trg_src_attn_bias
],
is_test=True)
finished_ids, finished_scores = layers.beam_search_decode( finished_ids, finished_scores = layers.beam_search_decode(
ids, scores, beam_size=beam_size, end_id=eos_idx) ids, scores, beam_size=beam_size, end_id=eos_idx)
......
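The decoding loop is rewritten from the imperative `layers.While` block to the functional `layers.while_loop` API, which threads explicit loop variables through `cond` and `body` functions. A minimal, self-contained sketch of that API (a plain counter, not the beam search itself):

```python
import paddle.fluid as fluid
import paddle.fluid.layers as layers

i = layers.fill_constant(shape=[1], dtype="int64", value=0)
limit = layers.fill_constant(shape=[1], dtype="int64", value=10)

def cond(i):
    return layers.less_than(x=i, y=limit)

def body(i):
    # body must return loop variables with the same structure as loop_vars
    return [layers.increment(i, value=1, in_place=False)]

out = layers.while_loop(cond=cond, body=body, loop_vars=[i])

exe = fluid.Executor(fluid.CPUPlace())
result = exe.run(fluid.default_main_program(), fetch_list=out)
print(result)  # expected: [array([10])]
```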
...@@ -11,10 +11,11 @@ init_from_checkpoint: "" ...@@ -11,10 +11,11 @@ init_from_checkpoint: ""
init_from_pretrain_model: "" init_from_pretrain_model: ""
# path of trained parameter, to make prediction # path of trained parameter, to make prediction
init_from_params: "trained_params/step_100000" init_from_params: "trained_params/step_100000"
save_model_path: "" # the directory for saving models.
# the directory for saving checkpoints. save_model_path: "saved_models"
# deprecated, the directory for saving checkpoints.
save_checkpoint: "trained_ckpts" save_checkpoint: "trained_ckpts"
# the directory for saving trained parameters. # deprecated, the directory for saving trained parameters.
save_param: "trained_params" save_param: "trained_params"
# the directory for saving inference model. # the directory for saving inference model.
inference_model_dir: "infer_model" inference_model_dir: "infer_model"
......
...@@ -199,9 +199,14 @@ class PDConfig(object): ...@@ -199,9 +199,14 @@ class PDConfig(object):
"Whether to perform model saving for inference.") "Whether to perform model saving for inference.")
# NOTE: args for profiler # NOTE: args for profiler
self.default_g.add_arg("is_profiler", int, 0, "the switch of profiler tools. (used for benchmark)") self.default_g.add_arg(
self.default_g.add_arg("profiler_path", str, './', "the profiler output file path. (used for benchmark)") "is_profiler", int, 0,
self.default_g.add_arg("max_iter", int, 0, "the max train batch num.(used for benchmark)") "the switch of profiler tools. (used for benchmark)")
self.default_g.add_arg(
"profiler_path", str, './',
"the profiler output file path. (used for benchmark)")
self.default_g.add_arg("max_iter", int, 0,
"the max train batch num.(used for benchmark)")
self.parser = parser self.parser = parser
......
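For orientation, the `add_arg` helper used here follows an argparse-like `(name, type, default, help)` signature. A rough plain-argparse equivalent of the benchmark flags above (the equivalence is assumed, not taken from the source):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--is_profiler", type=int, default=0,
                    help="the switch of profiler tools. (used for benchmark)")
parser.add_argument("--profiler_path", type=str, default="./",
                    help="the profiler output file path. (used for benchmark)")
parser.add_argument("--max_iter", type=int, default=0,
                    help="the max train batch num. (used for benchmark)")
args = parser.parse_args([])  # empty list: take the defaults
```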
import pickle
import six
import warnings
from functools import partial
import paddle.fluid as fluid
def load(program, model_path, executor=None, var_list=None):
"""
To load python2 saved models in python3.
"""
try:
fluid.load(program, model_path, executor, var_list)
except UnicodeDecodeError:
warnings.warn(
"An UnicodeDecodeError is catched, which might be caused by loading "
"a python2 saved model. Encoding of pickle.load would be set and "
"load again automatically.")
if six.PY3:
load_bak = pickle.load
pickle.load = partial(load_bak, encoding="latin1")
fluid.load(program, model_path, executor, var_list)
pickle.load = load_bak
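A short usage sketch of this helper; the checkpoint prefix is illustrative and the program is assumed to contain the saved variables:

```python
import paddle.fluid as fluid
from utils.load import load

exe = fluid.Executor(fluid.CPUPlace())
prog = fluid.default_main_program()
exe.run(fluid.default_startup_program())

# transparently retries with latin1-encoded unpickling if the
# checkpoint was written under python2
load(prog, "saved_models/step_final/transformer", exe)
```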
...@@ -22,6 +22,8 @@ ...@@ -22,6 +22,8 @@
| :------| :------: | :------: |:------: |:------: |
| [BERT-Large, Uncased (Whole Word Masking)](https://bert-models.bj.bcebos.com/wwm_uncased_L-24_H-1024_A-16.tar.gz)| 24 | 1024 | 16 | 340M |
| [BERT-Large, Cased (Whole Word Masking)](https://bert-models.bj.bcebos.com/wwm_cased_L-24_H-1024_A-16.tar.gz)| 24 | 1024 | 16 | 340M |
| [RoBERTa-Base, Chinese](https://bert-models.bj.bcebos.com/chinese_roberta_wwm_ext_L-12_H-768_A-12.tar.gz) | 12 | 768 |12 |110M |
| [RoBERTa-Large, Chinese](https://bert-models.bj.bcebos.com/chinese_roberta_wwm_large_ext_L-24_H-1024_A-16.tar.gz) | 24 | 1024 |16 |340M |
| [BERT-Base, Uncased](https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz) | 12 | 768 |12 |110M |
| [BERT-Large, Uncased](https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz) | 24 | 1024 |16 |340M |
|[BERT-Base, Cased](https://bert-models.bj.bcebos.com/cased_L-12_H-768_A-12.tar.gz)|12|768|12|110M|
...@@ -415,5 +417,3 @@ for (size_t i = 0; i < output.front().data.length() / sizeof(float); i += 3) { ...@@ -415,5 +417,3 @@ for (size_t i = 0; i < output.front().data.length() / sizeof(float); i += 3) {
<< static_cast<float *>(output.front().data.data())[i + 2] << std::endl; << static_cast<float *>(output.front().data.data())[i + 2] << std::endl;
} }
``` ```
...@@ -158,7 +158,7 @@ def optimization(loss, ...@@ -158,7 +158,7 @@ def optimization(loss,
else: else:
if weight_decay > 0: if weight_decay > 0:
for param in train_program.global_block().all_parameters(): for param in train_program.all_parameters():
param_list[param.name] = param * 1.0 param_list[param.name] = param * 1.0
param_list[param.name].stop_gradient = True param_list[param.name].stop_gradient = True
......
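For context, the surrounding code implements decoupled weight decay by snapshotting every parameter before the optimizer step and subtracting the scaled snapshot afterwards. A hedged sketch of the idea (the function boundary is illustrative):

```python
import paddle.fluid as fluid

def add_weight_decay(train_program, weight_decay, scheduled_lr):
    param_list = {}
    for param in train_program.all_parameters():
        # snapshot the parameter; the copy must not receive gradients
        param_list[param.name] = param * 1.0
        param_list[param.name].stop_gradient = True
    # ... optimizer.minimize(loss) would run here ...
    for param in train_program.all_parameters():
        updated = param - param_list[param.name] * weight_decay * scheduled_lr
        fluid.layers.assign(input=updated, output=param)
```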
...@@ -392,7 +392,7 @@ def main(args): ...@@ -392,7 +392,7 @@ def main(args):
if steps % args.save_steps == 0: if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints, save_path = os.path.join(args.checkpoints,
"step_" + str(steps)) "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(program=train_program, model_path=save_path)
if steps % args.validation_steps == 0: if steps % args.validation_steps == 0:
print("Average throughtput: %s" % (np.average(throughput))) print("Average throughtput: %s" % (np.average(throughput)))
...@@ -409,7 +409,7 @@ def main(args): ...@@ -409,7 +409,7 @@ def main(args):
"test") "test")
except fluid.core.EOFException: except fluid.core.EOFException:
save_path = os.path.join(args.checkpoints, "step_" + str(steps)) save_path = os.path.join(args.checkpoints, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(program=train_program, model_path=save_path)
train_data_loader.reset() train_data_loader.reset()
break break
if args.enable_ce: if args.enable_ce:
......
...@@ -398,11 +398,11 @@ def train(args): ...@@ -398,11 +398,11 @@ def train(args):
if steps % args.save_steps == 0 or steps == max_train_steps: if steps % args.save_steps == 0 or steps == max_train_steps:
save_path = os.path.join(args.checkpoints, save_path = os.path.join(args.checkpoints,
"step_" + str(steps)) "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(program=train_program, model_path=save_path)
except fluid.core.EOFException: except fluid.core.EOFException:
save_path = os.path.join(args.checkpoints, save_path = os.path.join(args.checkpoints,
"step_" + str(steps) + "_final") "step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(program=train_program, model_path=save_path)
train_data_loader.reset() train_data_loader.reset()
break break
......
...@@ -412,7 +412,7 @@ def train(args): ...@@ -412,7 +412,7 @@ def train(args):
if steps % args.save_steps == 0: if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints, "step_" + str(steps)) save_path = os.path.join(args.checkpoints, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(program=train_program, model_path=save_path)
if args.validation_set_dir and steps % args.validation_steps == 0: if args.validation_set_dir and steps % args.validation_steps == 0:
vali_cost, vali_lm_cost, vali_acc, vali_steps, vali_speed = predict( vali_cost, vali_lm_cost, vali_acc, vali_steps, vali_speed = predict(
......
...@@ -25,7 +25,7 @@ import paddle.fluid as fluid ...@@ -25,7 +25,7 @@ import paddle.fluid as fluid
def cast_fp32_to_fp16(exe, main_program): def cast_fp32_to_fp16(exe, main_program):
print("Cast parameters to float16 data format.") print("Cast parameters to float16 data format.")
for param in main_program.global_block().all_parameters(): for param in main_program.all_parameters():
if not param.name.endswith(".master"): if not param.name.endswith(".master"):
param_t = fluid.global_scope().find_var(param.name).get_tensor() param_t = fluid.global_scope().find_var(param.name).get_tensor()
data = np.array(param_t) data = np.array(param_t)
...@@ -38,21 +38,9 @@ def cast_fp32_to_fp16(exe, main_program): ...@@ -38,21 +38,9 @@ def cast_fp32_to_fp16(exe, main_program):
def init_checkpoint(exe, init_checkpoint_path, main_program, use_fp16=False): def init_checkpoint(exe, init_checkpoint_path, main_program, use_fp16=False):
assert os.path.exists( fluid.load(
init_checkpoint_path), "[%s] cann't be found." % init_checkpoint_path program=main_program, model_path=init_checkpoint_path, executor=exe)
def existed_persitables(var):
if not fluid.io.is_persistable(var):
return False
if os.path.exists(os.path.join(init_checkpoint_path, var.name)):
print("INIT {}".format(var.name))
return True
fluid.io.load_vars(
exe,
init_checkpoint_path,
main_program=main_program,
predicate=existed_persitables)
print("Load model from {}".format(init_checkpoint_path)) print("Load model from {}".format(init_checkpoint_path))
if use_fp16: if use_fp16:
...@@ -63,24 +51,8 @@ def init_pretraining_params(exe, ...@@ -63,24 +51,8 @@ def init_pretraining_params(exe,
pretraining_params_path, pretraining_params_path,
main_program, main_program,
use_fp16=False): use_fp16=False):
assert os.path.exists(pretraining_params_path fluid.load(
), "[%s] cann't be found." % pretraining_params_path program=main_program, model_path=pretraining_params_path, executor=exe)
def existed_params(var):
if not isinstance(var, fluid.framework.Parameter):
return False
if os.path.exists(os.path.join(pretraining_params_path, var.name)):
print("INIT {}".format(var.name))
return True
else:
print("SKIP {}".format(var.name))
return False
fluid.io.load_vars(
exe,
pretraining_params_path,
main_program=main_program,
predicate=existed_params)
print("Load pretraining parameters from {}.".format( print("Load pretraining parameters from {}.".format(
pretraining_params_path)) pretraining_params_path))
......
...@@ -90,5 +90,3 @@ word_embedding=fluid.layers.concat(input=[elmo_embedding, word_embedding], axis= ...@@ -90,5 +90,3 @@ word_embedding=fluid.layers.concat(input=[elmo_embedding, word_embedding], axis=
### 参考论文 ### 参考论文
[Deep contextualized word representations](https://arxiv.org/abs/1802.05365) [Deep contextualized word representations](https://arxiv.org/abs/1802.05365)
...@@ -7,7 +7,6 @@ from kpi import CostKpi, DurationKpi, AccKpi ...@@ -7,7 +7,6 @@ from kpi import CostKpi, DurationKpi, AccKpi
#### NOTE kpi.py should shared in models in some way!!!! #### NOTE kpi.py should shared in models in some way!!!!
train_duration_sts_b_card1 = DurationKpi( train_duration_sts_b_card1 = DurationKpi(
'train_duration_sts_b_card1', 0.01, 0, actived=True) 'train_duration_sts_b_card1', 0.01, 0, actived=True)
train_cost_sts_b_card1 = CostKpi( train_cost_sts_b_card1 = CostKpi(
......
...@@ -29,7 +29,7 @@ ...@@ -29,7 +29,7 @@
1. Install PaddlePaddle

   This project depends on PaddlePaddle Fluid 1.7 or later; please follow the [installation guide](http://www.paddlepaddle.org/#quick-start) to install it.

2. Install the code
......
...@@ -13,6 +13,7 @@ from run_classifier import create_model ...@@ -13,6 +13,7 @@ from run_classifier import create_model
import utils import utils
import reader import reader
def do_save_inference_model(args): def do_save_inference_model(args):
if args.use_cuda: if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count() dev_count = fluid.core.get_cuda_device_count()
...@@ -20,9 +21,9 @@ def do_save_inference_model(args): ...@@ -20,9 +21,9 @@ def do_save_inference_model(args):
else: else:
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace() place = fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
test_prog = fluid.Program() test_prog = fluid.Program()
startup_prog = fluid.Program() startup_prog = fluid.Program()
...@@ -36,7 +37,7 @@ def do_save_inference_model(args): ...@@ -36,7 +37,7 @@ def do_save_inference_model(args):
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
exe.run(startup_prog) exe.run(startup_prog)
assert (args.init_checkpoint) assert (args.init_checkpoint)
if args.init_checkpoint: if args.init_checkpoint:
...@@ -53,6 +54,7 @@ def do_save_inference_model(args): ...@@ -53,6 +54,7 @@ def do_save_inference_model(args):
print("save inference model at %s" % (args.inference_model_dir)) print("save inference model at %s" % (args.inference_model_dir))
def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
""" """
Inference Function Inference Function
...@@ -61,13 +63,16 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): ...@@ -61,13 +63,16 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
test_pyreader.start() test_pyreader.start()
while True: while True:
try: try:
np_props = exe.run(program=test_program, fetch_list=fetch_list, return_numpy=True) np_props = exe.run(program=test_program,
fetch_list=fetch_list,
return_numpy=True)
for probs in np_props[0]: for probs in np_props[0]:
print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1])) print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1]))
except fluid.core.EOFException: except fluid.core.EOFException:
test_pyreader.reset() test_pyreader.reset()
break break
def test_inference_model(args): def test_inference_model(args):
if args.use_cuda: if args.use_cuda:
dev_count = fluid.core.get_cuda_device_count() dev_count = fluid.core.get_cuda_device_count()
...@@ -75,7 +80,7 @@ def test_inference_model(args): ...@@ -75,7 +80,7 @@ def test_inference_model(args):
else: else:
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace() place = fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
test_prog = fluid.Program() test_prog = fluid.Program()
startup_prog = fluid.Program() startup_prog = fluid.Program()
...@@ -92,7 +97,8 @@ def test_inference_model(args): ...@@ -92,7 +97,8 @@ def test_inference_model(args):
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(startup_prog) exe.run(startup_prog)
processor = reader.SentaProcessor(data_dir=args.data_dir, processor = reader.SentaProcessor(
data_dir=args.data_dir,
vocab_path=args.vocab_path, vocab_path=args.vocab_path,
random_seed=args.random_seed, random_seed=args.random_seed,
max_seq_len=args.max_seq_len) max_seq_len=args.max_seq_len)
...@@ -107,14 +113,14 @@ def test_inference_model(args): ...@@ -107,14 +113,14 @@ def test_inference_model(args):
params_filename="params.pdparams") params_filename="params.pdparams")
infer_data_generator = processor.data_generator( infer_data_generator = processor.data_generator(
batch_size=args.batch_size, batch_size=args.batch_size / dev_count,
phase="infer", phase="infer",
epoch=1, epoch=1,
shuffle=False) shuffle=False)
infer_pyreader.decorate_sample_list_generator(infer_data_generator) infer_pyreader.set_sample_list_generator(infer_data_generator)
inference(exe, test_prog, infer_pyreader, inference(exe, test_prog, infer_pyreader, [probs.name], "infer")
[probs.name], "infer")
if __name__ == "__main__": if __name__ == "__main__":
args = PDConfig('senta_config.json') args = PDConfig('senta_config.json')
......
# -*- coding: utf_8 -*- # -*- coding: utf_8 -*-
import os import os
import sys import sys
sys.path.append("../") sys.path.append("../shared_modules/")
sys.path.append("../models/classification") sys.path.append("../shared_modules/models/classification")
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import numpy as np import numpy as np
...@@ -17,6 +17,7 @@ from models.representation.ernie import ErnieConfig ...@@ -17,6 +17,7 @@ from models.representation.ernie import ErnieConfig
from models.representation.ernie import ernie_encoder, ernie_encoder_with_paddle_hub from models.representation.ernie import ernie_encoder, ernie_encoder_with_paddle_hub
from preprocess.ernie import task_reader from preprocess.ernie import task_reader
def do_save_inference_model(args): def do_save_inference_model(args):
ernie_config = ErnieConfig(args.ernie_config_path) ernie_config = ErnieConfig(args.ernie_config_path)
...@@ -28,30 +29,29 @@ def do_save_inference_model(args): ...@@ -28,30 +29,29 @@ def do_save_inference_model(args):
else: else:
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace() place = fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
test_prog = fluid.Program() test_prog = fluid.Program()
startup_prog = fluid.Program() startup_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
infer_pyreader, ernie_inputs, labels = ernie_pyreader( infer_pyreader, ernie_inputs, labels = ernie_pyreader(
args, args, pyreader_name="infer_reader")
pyreader_name="infer_reader")
if args.use_paddle_hub: if args.use_paddle_hub:
embeddings = ernie_encoder_with_paddle_hub(ernie_inputs, args.max_seq_len) embeddings = ernie_encoder_with_paddle_hub(ernie_inputs,
args.max_seq_len)
else: else:
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config) embeddings = ernie_encoder(
ernie_inputs, ernie_config=ernie_config)
probs = create_model(args, probs = create_model(
embeddings, args, embeddings, labels=labels, is_prediction=True)
labels=labels,
is_prediction=True)
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
exe.run(startup_prog) exe.run(startup_prog)
assert (args.init_checkpoint) assert (args.init_checkpoint)
if args.init_checkpoint: if args.init_checkpoint:
...@@ -59,11 +59,11 @@ def do_save_inference_model(args): ...@@ -59,11 +59,11 @@ def do_save_inference_model(args):
fluid.io.save_inference_model( fluid.io.save_inference_model(
args.inference_model_dir, args.inference_model_dir,
feeded_var_names=[ernie_inputs["src_ids"].name, feeded_var_names=[
ernie_inputs["sent_ids"].name, ernie_inputs["src_ids"].name, ernie_inputs["sent_ids"].name,
ernie_inputs["pos_ids"].name, ernie_inputs["pos_ids"].name, ernie_inputs["input_mask"].name,
ernie_inputs["input_mask"].name, ernie_inputs["seq_lens"].name
ernie_inputs["seq_lens"].name], ],
target_vars=[probs], target_vars=[probs],
executor=exe, executor=exe,
main_program=test_prog, main_program=test_prog,
...@@ -72,6 +72,7 @@ def do_save_inference_model(args): ...@@ -72,6 +72,7 @@ def do_save_inference_model(args):
print("save inference model at %s" % (args.inference_model_dir)) print("save inference model at %s" % (args.inference_model_dir))
def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
""" """
Inference Function Inference Function
...@@ -80,13 +81,16 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): ...@@ -80,13 +81,16 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
test_pyreader.start() test_pyreader.start()
while True: while True:
try: try:
np_props = exe.run(program=test_program, fetch_list=fetch_list, return_numpy=True) np_props = exe.run(program=test_program,
fetch_list=fetch_list,
return_numpy=True)
for probs in np_props[0]: for probs in np_props[0]:
print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1])) print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1]))
except fluid.core.EOFException: except fluid.core.EOFException:
test_pyreader.reset() test_pyreader.reset()
break break
def test_inference_model(args): def test_inference_model(args):
ernie_config = ErnieConfig(args.ernie_config_path) ernie_config = ErnieConfig(args.ernie_config_path)
ernie_config.print_config() ernie_config.print_config()
...@@ -97,9 +101,9 @@ def test_inference_model(args): ...@@ -97,9 +101,9 @@ def test_inference_model(args):
else: else:
dev_count = int(os.environ.get('CPU_NUM', 1)) dev_count = int(os.environ.get('CPU_NUM', 1))
place = fluid.CPUPlace() place = fluid.CPUPlace()
exe = fluid.Executor(place) exe = fluid.Executor(place)
reader = task_reader.ClassifyReader( reader = task_reader.ClassifyReader(
vocab_path=args.vocab_path, vocab_path=args.vocab_path,
label_map_config=args.label_map_config, label_map_config=args.label_map_config,
...@@ -113,15 +117,11 @@ def test_inference_model(args): ...@@ -113,15 +117,11 @@ def test_inference_model(args):
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
infer_pyreader, ernie_inputs, labels = ernie_pyreader( infer_pyreader, ernie_inputs, labels = ernie_pyreader(
args, args, pyreader_name="infer_pyreader")
pyreader_name="infer_pyreader")
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config) embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config)
probs = create_model( probs = create_model(
args, args, embeddings, labels=labels, is_prediction=True)
embeddings,
labels=labels,
is_prediction=True)
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
exe.run(startup_prog) exe.run(startup_prog)
...@@ -129,7 +129,7 @@ def test_inference_model(args): ...@@ -129,7 +129,7 @@ def test_inference_model(args):
assert (args.inference_model_dir) assert (args.inference_model_dir)
infer_data_generator = reader.data_generator( infer_data_generator = reader.data_generator(
input_file=args.test_set, input_file=args.test_set,
batch_size=args.batch_size, batch_size=args.batch_size / dev_count,
phase="infer", phase="infer",
epoch=1, epoch=1,
shuffle=False) shuffle=False)
...@@ -140,9 +140,9 @@ def test_inference_model(args): ...@@ -140,9 +140,9 @@ def test_inference_model(args):
model_filename="model.pdmodel", model_filename="model.pdmodel",
params_filename="params.pdparams") params_filename="params.pdparams")
infer_pyreader.decorate_batch_generator(infer_data_generator) infer_pyreader.set_batch_generator(infer_data_generator)
inference(exe, test_prog, infer_pyreader, inference(exe, test_prog, infer_pyreader, [probs.name], "infer")
[probs.name], "infer")
if __name__ == "__main__": if __name__ == "__main__":
args = PDConfig() args = PDConfig()
......
...@@ -12,8 +12,8 @@ import argparse ...@@ -12,8 +12,8 @@ import argparse
import numpy as np import numpy as np
import multiprocessing import multiprocessing
import sys import sys
sys.path.append("../models/classification/") sys.path.append("../shared_modules/models/classification/")
sys.path.append("../") sys.path.append("../shared_modules/")
from nets import bow_net from nets import bow_net
from nets import lstm_net from nets import lstm_net
...@@ -30,24 +30,19 @@ import paddle.fluid as fluid ...@@ -30,24 +30,19 @@ import paddle.fluid as fluid
import reader import reader
from utils import init_checkpoint from utils import init_checkpoint
def create_model(args,
pyreader_name,
num_labels,
is_prediction=False):
def create_model(args, pyreader_name, num_labels, is_prediction=False):
""" """
Create Model for sentiment classification Create Model for sentiment classification
""" """
data = fluid.layers.data( data = fluid.data(
name="src_ids", shape=[-1, args.max_seq_len], dtype='int64') name="src_ids", shape=[None, args.max_seq_len], dtype='int64')
label = fluid.layers.data( label = fluid.data(name="label", shape=[None, 1], dtype="int64")
name="label", shape=[-1, 1], dtype="int64") seq_len = fluid.data(name="seq_len", shape=[None], dtype="int64")
seq_len = fluid.layers.data(
name="seq_len", shape=[-1], dtype="int64") data_reader = fluid.io.DataLoader.from_generator(
feed_list=[data, label, seq_len], capacity=4, iterable=False)
data_reader = fluid.io.PyReader(feed_list=[data, label, seq_len],
capacity=4, iterable=False)
if args.model_type == "bilstm_net": if args.model_type == "bilstm_net":
network = bilstm_net network = bilstm_net
...@@ -63,18 +58,19 @@ def create_model(args, ...@@ -63,18 +58,19 @@ def create_model(args,
raise ValueError("Unknown network type!") raise ValueError("Unknown network type!")
if is_prediction: if is_prediction:
probs = network(data, seq_len, None, args.vocab_size, is_prediction=is_prediction) probs = network(
data, seq_len, None, args.vocab_size, is_prediction=is_prediction)
print("create inference model...") print("create inference model...")
return data_reader, probs, [data.name, seq_len.name] return data_reader, probs, [data.name, seq_len.name]
ce_loss, probs = network(data, seq_len, label, args.vocab_size, is_prediction=is_prediction) ce_loss, probs = network(
data, seq_len, label, args.vocab_size, is_prediction=is_prediction)
loss = fluid.layers.mean(x=ce_loss) loss = fluid.layers.mean(x=ce_loss)
num_seqs = fluid.layers.create_tensor(dtype='int64') num_seqs = fluid.layers.create_tensor(dtype='int64')
accuracy = fluid.layers.accuracy(input=probs, label=label, total=num_seqs) accuracy = fluid.layers.accuracy(input=probs, label=label, total=num_seqs)
return data_reader, loss, accuracy, num_seqs return data_reader, loss, accuracy, num_seqs
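The rewritten `create_model` illustrates the Fluid 1.6+ input API: `fluid.layers.data(shape=[-1, ...])` becomes `fluid.data(shape=[None, ...])`, and `fluid.io.PyReader` becomes `fluid.io.DataLoader.from_generator` with the same `feed_list` idea. A standalone sketch (`max_seq_len` is illustrative):

```python
import paddle.fluid as fluid

max_seq_len = 128  # illustrative

# fluid.data replaces fluid.layers.data; None marks the variable batch dim.
data = fluid.data(name="src_ids", shape=[None, max_seq_len], dtype="int64")
label = fluid.data(name="label", shape=[None, 1], dtype="int64")
seq_len = fluid.data(name="seq_len", shape=[None], dtype="int64")

# DataLoader.from_generator supersedes fluid.io.PyReader.
loader = fluid.io.DataLoader.from_generator(
    feed_list=[data, label, seq_len], capacity=4, iterable=False)
```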
def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase): def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase):
""" """
Evaluation Function Evaluation Function
...@@ -99,8 +95,8 @@ def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase): ...@@ -99,8 +95,8 @@ def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase):
break break
time_end = time.time() time_end = time.time()
print("[%s evaluation] ave loss: %f, ave acc: %f, elapsed time: %f s" % print("[%s evaluation] ave loss: %f, ave acc: %f, elapsed time: %f s" %
(eval_phase, np.sum(total_cost) / np.sum(total_num_seqs), (eval_phase, np.sum(total_cost) / np.sum(total_num_seqs),
np.sum(total_acc) / np.sum(total_num_seqs), time_end - time_begin)) np.sum(total_acc) / np.sum(total_num_seqs), time_end - time_begin))
def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
...@@ -111,8 +107,9 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase): ...@@ -111,8 +107,9 @@ def inference(exe, test_program, test_pyreader, fetch_list, infer_phrase):
time_begin = time.time() time_begin = time.time()
while True: while True:
try: try:
np_props = exe.run(program=test_program, fetch_list=fetch_list, np_props = exe.run(program=test_program,
return_numpy=True) fetch_list=fetch_list,
return_numpy=True)
for probs in np_props[0]: for probs in np_props[0]:
print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1])) print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1]))
except fluid.core.EOFException: except fluid.core.EOFException:
...@@ -135,10 +132,11 @@ def main(args): ...@@ -135,10 +132,11 @@ def main(args):
exe = fluid.Executor(place) exe = fluid.Executor(place)
task_name = args.task_name.lower() task_name = args.task_name.lower()
processor = reader.SentaProcessor(data_dir=args.data_dir, processor = reader.SentaProcessor(
vocab_path=args.vocab_path, data_dir=args.data_dir,
random_seed=args.random_seed, vocab_path=args.vocab_path,
max_seq_len=args.max_seq_len) random_seed=args.random_seed,
max_seq_len=args.max_seq_len)
num_labels = len(processor.get_labels()) num_labels = len(processor.get_labels())
if not (args.do_train or args.do_val or args.do_infer): if not (args.do_train or args.do_val or args.do_infer):
...@@ -151,7 +149,7 @@ def main(args): ...@@ -151,7 +149,7 @@ def main(args):
if args.do_train: if args.do_train:
train_data_generator = processor.data_generator( train_data_generator = processor.data_generator(
batch_size=args.batch_size, batch_size=args.batch_size // dev_count,
phase='train', phase='train',
epoch=args.epoch, epoch=args.epoch,
shuffle=True) shuffle=True)
...@@ -183,11 +181,11 @@ def main(args): ...@@ -183,11 +181,11 @@ def main(args):
lower_mem, upper_mem, unit = fluid.contrib.memory_usage( lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size) program=train_program, batch_size=args.batch_size)
print("Theoretical memory usage in training: %.3f - %.3f %s" % print("Theoretical memory usage in training: %.3f - %.3f %s" %
(lower_mem, upper_mem, unit)) (lower_mem, upper_mem, unit))
if args.do_val: if args.do_val:
test_data_generator = processor.data_generator( test_data_generator = processor.data_generator(
batch_size=args.batch_size, batch_size=args.batch_size // dev_count,
phase='dev', phase='dev',
epoch=1, epoch=1,
shuffle=False) shuffle=False)
...@@ -204,7 +202,7 @@ def main(args): ...@@ -204,7 +202,7 @@ def main(args):
if args.do_infer: if args.do_infer:
infer_data_generator = processor.data_generator( infer_data_generator = processor.data_generator(
batch_size=args.batch_size, batch_size=args.batch_size // dev_count,
phase='infer', phase='infer',
epoch=1, epoch=1,
shuffle=False) shuffle=False)
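All three generators above now receive the global batch size divided by the device count, so each device consumes its own share of every global batch. Integer division keeps the argument an `int` (hence the `//` spelling used above); a sketch with illustrative numbers:

```python
# Per-device batch sizing.
batch_size = 32   # illustrative global batch size
dev_count = 4     # e.g. number of visible GPUs
per_device_batch = batch_size // dev_count   # 8; // keeps it an int
assert per_device_batch * dev_count == batch_size, "should divide evenly"
```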
...@@ -223,30 +221,25 @@ def main(args): ...@@ -223,30 +221,25 @@ def main(args):
if args.do_train: if args.do_train:
if args.init_checkpoint: if args.init_checkpoint:
init_checkpoint( init_checkpoint(
exe, exe, args.init_checkpoint, main_program=startup_prog)
args.init_checkpoint,
main_program=startup_prog)
elif args.do_val or args.do_infer: elif args.do_val or args.do_infer:
if not args.init_checkpoint: if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if" raise ValueError("args 'init_checkpoint' should be set if"
"only doing validation or testing!") "only doing validation or testing!")
init_checkpoint( init_checkpoint(exe, args.init_checkpoint, main_program=startup_prog)
exe,
args.init_checkpoint,
main_program=startup_prog)
if args.do_train: if args.do_train:
train_exe = exe train_exe = exe
train_reader.decorate_sample_list_generator(train_data_generator) train_reader.set_sample_list_generator(train_data_generator)
else: else:
train_exe = None train_exe = None
if args.do_val: if args.do_val:
test_exe = exe test_exe = exe
test_reader.decorate_sample_list_generator(test_data_generator) test_reader.set_sample_list_generator(test_data_generator)
if args.do_infer: if args.do_infer:
test_exe = exe test_exe = exe
infer_reader.decorate_sample_list_generator(infer_data_generator) infer_reader.set_sample_list_generator(infer_data_generator)
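The reader-wiring changes in this block are pure renames that come with the PyReader-to-DataLoader move: `decorate_sample_list_generator` becomes `set_sample_list_generator` and `decorate_batch_generator` becomes `set_batch_generator`. A one-line sketch, assuming `loader` and `train_data_generator` from the surrounding code:

```python
# Old PyReader spelling:  loader.decorate_sample_list_generator(gen)
# New DataLoader spelling (Fluid 1.6+):
loader.set_sample_list_generator(train_data_generator)
```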
if args.do_train: if args.do_train:
train_reader.start() train_reader.start()
...@@ -262,7 +255,9 @@ def main(args): ...@@ -262,7 +255,9 @@ def main(args):
else: else:
fetch_list = [] fetch_list = []
outputs = train_exe.run(program=train_program, fetch_list=fetch_list, return_numpy=False) outputs = train_exe.run(program=train_program,
fetch_list=fetch_list,
return_numpy=False)
#print("finished one step") #print("finished one step")
if steps % args.skip_steps == 0: if steps % args.skip_steps == 0:
np_loss, np_acc, np_num_seqs = outputs np_loss, np_acc, np_num_seqs = outputs
...@@ -274,35 +269,37 @@ def main(args): ...@@ -274,35 +269,37 @@ def main(args):
total_num_seqs.extend(np_num_seqs) total_num_seqs.extend(np_num_seqs)
if args.verbose: if args.verbose:
verbose = "train pyreader queue size: %d, " % train_pyreader.queue.size() verbose = "train pyreader queue size: %d, " % train_pyreader.queue.size(
)
print(verbose) print(verbose)
time_end = time.time() time_end = time.time()
used_time = time_end - time_begin used_time = time_end - time_begin
print("step: %d, ave loss: %f, " print("step: %d, ave loss: %f, "
"ave acc: %f, speed: %f steps/s" % "ave acc: %f, speed: %f steps/s" %
(steps, np.sum(total_cost) / np.sum(total_num_seqs), (steps, np.sum(total_cost) / np.sum(total_num_seqs),
np.sum(total_acc) / np.sum(total_num_seqs), np.sum(total_acc) / np.sum(total_num_seqs),
args.skip_steps / used_time)) args.skip_steps / used_time))
total_cost, total_acc, total_num_seqs = [], [], [] total_cost, total_acc, total_num_seqs = [], [], []
time_begin = time.time() time_begin = time.time()
if steps % args.save_steps == 0: if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints, save_path = os.path.join(args.checkpoints,
"step_" + str(steps)) "step_" + str(steps), "checkpoint")
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(train_program, save_path)
if steps % args.validation_steps == 0: if steps % args.validation_steps == 0:
# evaluate dev set # evaluate dev set
if args.do_val: if args.do_val:
print("do evalatation") print("do evalatation")
evaluate(exe, test_prog, test_reader, evaluate(exe, test_prog, test_reader,
[loss.name, accuracy.name, num_seqs.name], [loss.name, accuracy.name, num_seqs.name],
"dev") "dev")
except fluid.core.EOFException: except fluid.core.EOFException:
save_path = os.path.join(args.checkpoints, "step_" + str(steps)) save_path = os.path.join(args.checkpoints, "step_" + str(steps),
fluid.io.save_persistables(exe, save_path, train_program) "checkpoint")
fluid.save(train_program, save_path)
train_reader.reset() train_reader.reset()
break break
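Checkpointing also moves from `fluid.io.save_persistables(exe, dir, program)` to `fluid.save(program, prefix)`. Note that `fluid.save` takes a path prefix, not a directory, which is why the code above appends a trailing "checkpoint" component. A sketch with illustrative paths:

```python
import os
import paddle.fluid as fluid

# fluid.save writes sibling files under the prefix, e.g. checkpoint.pdparams
# and checkpoint.pdopt inside step_100/ (illustrative step number).
save_path = os.path.join("checkpoints", "step_100", "checkpoint")
fluid.save(train_program, save_path)

# The matching loader mirrors the same prefix:
# fluid.load(train_program, save_path, exe)
```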
...@@ -310,13 +307,12 @@ def main(args): ...@@ -310,13 +307,12 @@ def main(args):
if args.do_val: if args.do_val:
print("Final validation result:") print("Final validation result:")
evaluate(exe, test_prog, test_reader, evaluate(exe, test_prog, test_reader,
[loss.name, accuracy.name, num_seqs.name], "dev") [loss.name, accuracy.name, num_seqs.name], "dev")
# final eval on test set # final eval on test set
if args.do_infer: if args.do_infer:
print("Final test result:") print("Final test result:")
inference(exe, infer_prog, infer_reader, inference(exe, infer_prog, infer_reader, [prop.name], "infer")
[prop.name], "infer")
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -16,8 +16,8 @@ import sys ...@@ -16,8 +16,8 @@ import sys
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
sys.path.append("../models/classification/") sys.path.append("../shared_modules/models/classification/")
sys.path.append("..") sys.path.append("../shared_modules/")
print(sys.path) print(sys.path)
from nets import bow_net from nets import bow_net
...@@ -36,40 +36,37 @@ from config import PDConfig ...@@ -36,40 +36,37 @@ from config import PDConfig
from utils import init_checkpoint from utils import init_checkpoint
def ernie_pyreader(args, pyreader_name): def ernie_pyreader(args, pyreader_name):
src_ids = fluid.layers.data( src_ids = fluid.data(
name="src_ids", shape=[-1, args.max_seq_len, 1], dtype="int64") name="src_ids", shape=[None, args.max_seq_len, 1], dtype="int64")
sent_ids = fluid.layers.data( sent_ids = fluid.data(
name="sent_ids", shape=[-1, args.max_seq_len, 1], dtype="int64") name="sent_ids", shape=[None, args.max_seq_len, 1], dtype="int64")
pos_ids = fluid.layers.data( pos_ids = fluid.data(
name="pos_ids", shape=[-1, args.max_seq_len, 1], dtype="int64") name="pos_ids", shape=[None, args.max_seq_len, 1], dtype="int64")
input_mask = fluid.layers.data( input_mask = fluid.data(
name="input_mask", shape=[-1, args.max_seq_len, 1], dtype="float32") name="input_mask", shape=[None, args.max_seq_len, 1], dtype="float32")
labels = fluid.layers.data( labels = fluid.data(name="labels", shape=[None, 1], dtype="int64")
name="labels", shape=[-1, 1], dtype="int64") seq_lens = fluid.data(name="seq_lens", shape=[None], dtype="int64")
seq_lens = fluid.layers.data(
name="seq_lens", shape=[-1], dtype="int64")
pyreader = fluid.io.DataLoader.from_generator( pyreader = fluid.io.DataLoader.from_generator(
feed_list=[src_ids, sent_ids, pos_ids, input_mask, labels, seq_lens], feed_list=[src_ids, sent_ids, pos_ids, input_mask, labels, seq_lens],
capacity=50, capacity=50,
iterable=False, iterable=False,
use_double_buffer=True) use_double_buffer=True)
ernie_inputs = { ernie_inputs = {
"src_ids": src_ids, "src_ids": src_ids,
"sent_ids": sent_ids, "sent_ids": sent_ids,
"pos_ids": pos_ids, "pos_ids": pos_ids,
"input_mask": input_mask, "input_mask": input_mask,
"seq_lens": seq_lens} "seq_lens": seq_lens
}
return pyreader, ernie_inputs, labels return pyreader, ernie_inputs, labels
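`ernie_pyreader` declares the six standard ERNIE feeds: token ids, sentence/segment ids, position ids, an attention mask, labels, and true sequence lengths. A sketch of one dummy batch with matching shapes (`batch_size` and `max_seq_len` are assumptions):

```python
import numpy as np

batch_size, max_seq_len = 8, 128  # illustrative
feed = {
    "src_ids":    np.zeros((batch_size, max_seq_len, 1), "int64"),   # token ids
    "sent_ids":   np.zeros((batch_size, max_seq_len, 1), "int64"),   # segment ids
    "pos_ids":    np.tile(np.arange(max_seq_len, dtype="int64")
                          .reshape(1, -1, 1), (batch_size, 1, 1)),   # positions
    "input_mask": np.ones((batch_size, max_seq_len, 1), "float32"),  # 1 = real token
    "labels":     np.zeros((batch_size, 1), "int64"),
    "seq_lens":   np.full((batch_size,), max_seq_len, "int64"),
}
```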
def create_model(args,
embeddings,
labels,
is_prediction=False):
def create_model(args, embeddings, labels, is_prediction=False):
""" """
Create Model for sentiment classification based on ERNIE encoder Create Model for sentiment classification based on ERNIE encoder
""" """
...@@ -78,11 +75,11 @@ def create_model(args, ...@@ -78,11 +75,11 @@ def create_model(args,
if args.model_type == "ernie_base": if args.model_type == "ernie_base":
ce_loss, probs = ernie_base_net(sentence_embeddings, labels, ce_loss, probs = ernie_base_net(sentence_embeddings, labels,
args.num_labels) args.num_labels)
elif args.model_type == "ernie_bilstm": elif args.model_type == "ernie_bilstm":
ce_loss, probs = ernie_bilstm_net(token_embeddings, labels, ce_loss, probs = ernie_bilstm_net(token_embeddings, labels,
args.num_labels) args.num_labels)
else: else:
raise ValueError("Unknown network type!") raise ValueError("Unknown network type!")
...@@ -120,8 +117,8 @@ def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase): ...@@ -120,8 +117,8 @@ def evaluate(exe, test_program, test_pyreader, fetch_list, eval_phase):
break break
time_end = time.time() time_end = time.time()
print("[%s evaluation] ave loss: %f, ave acc: %f, elapsed time: %f s" % print("[%s evaluation] ave loss: %f, ave acc: %f, elapsed time: %f s" %
(eval_phase, np.sum(total_cost) / np.sum(total_num_seqs), (eval_phase, np.sum(total_cost) / np.sum(total_num_seqs),
np.sum(total_acc) / np.sum(total_num_seqs), time_end - time_begin)) np.sum(total_acc) / np.sum(total_num_seqs), time_end - time_begin))
def infer(exe, infer_program, infer_pyreader, fetch_list, infer_phase): def infer(exe, infer_program, infer_pyreader, fetch_list, infer_phase):
...@@ -132,8 +129,9 @@ def infer(exe, infer_program, infer_pyreader, fetch_list, infer_phase): ...@@ -132,8 +129,9 @@ def infer(exe, infer_program, infer_pyreader, fetch_list, infer_phase):
time_begin = time.time() time_begin = time.time()
while True: while True:
try: try:
batch_probs = exe.run(program=infer_program, fetch_list=fetch_list, batch_probs = exe.run(program=infer_program,
return_numpy=True) fetch_list=fetch_list,
return_numpy=True)
for probs in batch_probs[0]: for probs in batch_probs[0]:
print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1])) print("%d\t%f\t%f" % (np.argmax(probs), probs[0], probs[1]))
except fluid.core.EOFException: except fluid.core.EOFException:
...@@ -195,21 +193,19 @@ def main(args): ...@@ -195,21 +193,19 @@ def main(args):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
# create ernie_pyreader # create ernie_pyreader
train_pyreader, ernie_inputs, labels = ernie_pyreader( train_pyreader, ernie_inputs, labels = ernie_pyreader(
args, args, pyreader_name='train_pyreader')
pyreader_name='train_pyreader')
# get ernie_embeddings # get ernie_embeddings
if args.use_paddle_hub: if args.use_paddle_hub:
embeddings = ernie_encoder_with_paddle_hub(ernie_inputs, args.max_seq_len) embeddings = ernie_encoder_with_paddle_hub(ernie_inputs,
args.max_seq_len)
else: else:
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config) embeddings = ernie_encoder(
ernie_inputs, ernie_config=ernie_config)
# user defined model based on ernie embeddings # user defined model based on ernie embeddings
loss, accuracy, num_seqs = create_model( loss, accuracy, num_seqs = create_model(
args, args, embeddings, labels=labels, is_prediction=False)
embeddings,
labels=labels,
is_prediction=False)
optimizer = fluid.optimizer.Adam(learning_rate=args.lr) optimizer = fluid.optimizer.Adam(learning_rate=args.lr)
optimizer.minimize(loss) optimizer.minimize(loss)
...@@ -218,62 +214,59 @@ def main(args): ...@@ -218,62 +214,59 @@ def main(args):
lower_mem, upper_mem, unit = fluid.contrib.memory_usage( lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size) program=train_program, batch_size=args.batch_size)
print("Theoretical memory usage in training: %.3f - %.3f %s" % print("Theoretical memory usage in training: %.3f - %.3f %s" %
(lower_mem, upper_mem, unit)) (lower_mem, upper_mem, unit))
if args.do_val: if args.do_val:
test_data_generator = reader.data_generator( test_data_generator = reader.data_generator(
input_file=args.dev_set, input_file=args.dev_set,
batch_size=args.batch_size, batch_size=args.batch_size,
phase='dev', phase='dev',
epoch=1, epoch=1,
shuffle=False) shuffle=False)
test_prog = fluid.Program() test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
# create ernie_pyreader # create ernie_pyreader
test_pyreader, ernie_inputs, labels = ernie_pyreader( test_pyreader, ernie_inputs, labels = ernie_pyreader(
args, args, pyreader_name='eval_reader')
pyreader_name='eval_reader')
# get ernie_embeddings # get ernie_embeddings
if args.use_paddle_hub: if args.use_paddle_hub:
embeddings = ernie_encoder_with_paddle_hub(ernie_inputs, args.max_seq_len) embeddings = ernie_encoder_with_paddle_hub(ernie_inputs,
args.max_seq_len)
else: else:
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config) embeddings = ernie_encoder(
ernie_inputs, ernie_config=ernie_config)
# user defined model based on ernie embeddings # user defined model based on ernie embeddings
loss, accuracy, num_seqs = create_model( loss, accuracy, num_seqs = create_model(
args, args, embeddings, labels=labels, is_prediction=False)
embeddings,
labels=labels,
is_prediction=False)
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
if args.do_infer: if args.do_infer:
infer_data_generator = reader.data_generator( infer_data_generator = reader.data_generator(
input_file=args.test_set, input_file=args.test_set,
batch_size=args.batch_size, batch_size=args.batch_size,
phase='infer', phase='infer',
epoch=1, epoch=1,
shuffle=False) shuffle=False)
infer_prog = fluid.Program() infer_prog = fluid.Program()
with fluid.program_guard(infer_prog, startup_prog): with fluid.program_guard(infer_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
infer_pyreader, ernie_inputs, labels = ernie_pyreader( infer_pyreader, ernie_inputs, labels = ernie_pyreader(
args, args, pyreader_name="infer_pyreader")
pyreader_name="infer_pyreader")
# get ernie_embeddings # get ernie_embeddings
if args.use_paddle_hub: if args.use_paddle_hub:
embeddings = ernie_encoder_with_paddle_hub(ernie_inputs, args.max_seq_len) embeddings = ernie_encoder_with_paddle_hub(ernie_inputs,
args.max_seq_len)
else: else:
embeddings = ernie_encoder(ernie_inputs, ernie_config=ernie_config) embeddings = ernie_encoder(
ernie_inputs, ernie_config=ernie_config)
probs = create_model(args, probs = create_model(
embeddings, args, embeddings, labels=labels, is_prediction=True)
labels=labels,
is_prediction=True)
infer_prog = infer_prog.clone(for_test=True) infer_prog = infer_prog.clone(for_test=True)
...@@ -282,25 +275,17 @@ def main(args): ...@@ -282,25 +275,17 @@ def main(args):
if args.do_train: if args.do_train:
if args.init_checkpoint: if args.init_checkpoint:
init_checkpoint( init_checkpoint(
exe, exe, args.init_checkpoint, main_program=train_program)
args.init_checkpoint,
main_program=train_program)
elif args.do_val: elif args.do_val:
if not args.init_checkpoint: if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if" raise ValueError("args 'init_checkpoint' should be set if"
"only doing validation or testing!") "only doing validation or testing!")
init_checkpoint( init_checkpoint(exe, args.init_checkpoint, main_program=test_prog)
exe,
args.init_checkpoint,
main_program=test_prog)
elif args.do_infer: elif args.do_infer:
if not args.init_checkpoint: if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if" raise ValueError("args 'init_checkpoint' should be set if"
"only doing validation or testing!") "only doing validation or testing!")
init_checkpoint( init_checkpoint(exe, args.init_checkpoint, main_program=infer_prog)
exe,
args.init_checkpoint,
main_program=infer_prog)
if args.do_train: if args.do_train:
train_exe = exe train_exe = exe
...@@ -327,7 +312,9 @@ def main(args): ...@@ -327,7 +312,9 @@ def main(args):
else: else:
fetch_list = [] fetch_list = []
outputs = train_exe.run(program=train_program, fetch_list=fetch_list, return_numpy=False) outputs = train_exe.run(program=train_program,
fetch_list=fetch_list,
return_numpy=False)
if steps % args.skip_steps == 0: if steps % args.skip_steps == 0:
np_loss, np_acc, np_num_seqs = outputs np_loss, np_acc, np_num_seqs = outputs
np_loss = np.array(np_loss) np_loss = np.array(np_loss)
...@@ -338,34 +325,36 @@ def main(args): ...@@ -338,34 +325,36 @@ def main(args):
total_num_seqs.extend(np_num_seqs) total_num_seqs.extend(np_num_seqs)
if args.verbose: if args.verbose:
verbose = "train pyreader queue size: %d, " % train_pyreader.queue.size() verbose = "train pyreader queue size: %d, " % train_pyreader.queue.size(
)
print(verbose) print(verbose)
time_end = time.time() time_end = time.time()
used_time = time_end - time_begin used_time = time_end - time_begin
print("step: %d, ave loss: %f, " print("step: %d, ave loss: %f, "
"ave acc: %f, speed: %f steps/s" % "ave acc: %f, speed: %f steps/s" %
(steps, np.sum(total_cost) / np.sum(total_num_seqs), (steps, np.sum(total_cost) / np.sum(total_num_seqs),
np.sum(total_acc) / np.sum(total_num_seqs), np.sum(total_acc) / np.sum(total_num_seqs),
args.skip_steps / used_time)) args.skip_steps / used_time))
total_cost, total_acc, total_num_seqs = [], [], [] total_cost, total_acc, total_num_seqs = [], [], []
time_begin = time.time() time_begin = time.time()
if steps % args.save_steps == 0: if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints, save_path = os.path.join(args.checkpoints,
"step_" + str(steps)) "step_" + str(steps), "checkpoint")
fluid.io.save_persistables(exe, save_path, train_program) fluid.save(train_program, save_path)
if steps % args.validation_steps == 0: if steps % args.validation_steps == 0:
# evaluate dev set # evaluate dev set
if args.do_val: if args.do_val:
evaluate(exe, test_prog, test_pyreader, evaluate(exe, test_prog, test_pyreader,
[loss.name, accuracy.name, num_seqs.name], [loss.name, accuracy.name, num_seqs.name],
"dev") "dev")
except fluid.core.EOFException: except fluid.core.EOFException:
save_path = os.path.join(args.checkpoints, "step_" + str(steps)) save_path = os.path.join(args.checkpoints, "step_" + str(steps),
fluid.io.save_persistables(exe, save_path, train_program) "checkpoint")
fluid.save(train_program, save_path)
train_pyreader.reset() train_pyreader.reset()
break break
...@@ -373,13 +362,13 @@ def main(args): ...@@ -373,13 +362,13 @@ def main(args):
if args.do_val: if args.do_val:
print("Final validation result:") print("Final validation result:")
evaluate(exe, test_prog, test_pyreader, evaluate(exe, test_prog, test_pyreader,
[loss.name, accuracy.name, num_seqs.name], "dev") [loss.name, accuracy.name, num_seqs.name], "dev")
# final eval on test set # final eval on test set
if args.do_infer: if args.do_infer:
print("Final test result:") print("Final test result:")
infer(exe, infer_prog, infer_pyreader, infer(exe, infer_prog, infer_pyreader, [probs.name], "infer")
[probs.name], "infer")
if __name__ == "__main__": if __name__ == "__main__":
args = PDConfig() args = PDConfig()
......
...@@ -31,6 +31,7 @@ class ArgumentGroup(object): ...@@ -31,6 +31,7 @@ class ArgumentGroup(object):
""" """
Argument Class Argument Class
""" """
def __init__(self, parser, title, des): def __init__(self, parser, title, des):
self._group = parser.add_argument_group(title=title, description=des) self._group = parser.add_argument_group(title=title, description=des)
...@@ -63,23 +64,14 @@ def init_checkpoint(exe, init_checkpoint_path, main_program): ...@@ -63,23 +64,14 @@ def init_checkpoint(exe, init_checkpoint_path, main_program):
""" """
assert os.path.exists( assert os.path.exists(
init_checkpoint_path), "[%s] cann't be found." % init_checkpoint_path init_checkpoint_path), "[%s] cann't be found." % init_checkpoint_path
try:
def existed_persistables(var): checkpoint_path = os.path.join(init_checkpoint_path, "checkpoint")
""" fluid.load(main_program, checkpoint_path, exe)
Whether the persistable variable exists except:
""" fluid.load(main_program, init_checkpoint_path, exe)
if not fluid.io.is_persistable(var):
return False
return os.path.exists(os.path.join(init_checkpoint_path, var.name))
fluid.io.load_vars(
exe,
init_checkpoint_path,
main_program=main_program,
predicate=existed_persistables)
print("Load model from {}".format(init_checkpoint_path)) print("Load model from {}".format(init_checkpoint_path))
def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len): def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len):
""" """
Convert word sequence into slot Convert word sequence into slot
...@@ -96,8 +88,10 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len): ...@@ -96,8 +88,10 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len):
sys.stderr.write("[NOTICE] Error Format Line!") sys.stderr.write("[NOTICE] Error Format Line!")
continue continue
label = int(cols[1]) label = int(cols[1])
wids = [word_dict[x] if x in word_dict else unk_id wids = [
for x in cols[0].split(" ")] word_dict[x] if x in word_dict else unk_id
for x in cols[0].split(" ")
]
seq_len = len(wids) seq_len = len(wids)
if seq_len < max_seq_len: if seq_len < max_seq_len:
for i in range(max_seq_len - seq_len): for i in range(max_seq_len - seq_len):
...@@ -111,7 +105,7 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len): ...@@ -111,7 +105,7 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len):
random.shuffle(all_data) random.shuffle(all_data)
num_examples[phrase] = len(all_data) num_examples[phrase] = len(all_data)
def reader(): def reader():
""" """
Reader Function Reader Function
...@@ -119,8 +113,10 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len): ...@@ -119,8 +113,10 @@ def data_reader(file_path, word_dict, num_examples, phrase, epoch, max_seq_len):
for epoch_index in range(epoch): for epoch_index in range(epoch):
for doc, label, seq_len in all_data: for doc, label, seq_len in all_data:
yield doc, label, seq_len yield doc, label, seq_len
return reader return reader
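`data_reader` maps each token to its id (falling back to `unk_id`), then pads or truncates to `max_seq_len` while remembering the true length. A plain-Python sketch of that transformation (`pad_id` is an illustrative stand-in, since the padding id is elided in the hunk above):

```python
def to_fixed_len(text, word_dict, unk_id, pad_id, max_seq_len):
    """Map words to ids, then pad/truncate to a fixed length."""
    wids = [word_dict.get(x, unk_id) for x in text.split(" ")]
    seq_len = min(len(wids), max_seq_len)
    return wids[:max_seq_len] + [pad_id] * (max_seq_len - seq_len), seq_len

print(to_fixed_len("the cat sat", {"the": 1, "cat": 2}, 0, 0, 5))
# -> ([1, 2, 0, 0, 0], 3)
```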
def load_vocab(file_path): def load_vocab(file_path):
""" """
load the given vocabulary load the given vocabulary
...@@ -144,15 +140,6 @@ def init_pretraining_params(exe, ...@@ -144,15 +140,6 @@ def init_pretraining_params(exe,
assert os.path.exists(pretraining_params_path assert os.path.exists(pretraining_params_path
), "[%s] cann't be found." % pretraining_params_path ), "[%s] cann't be found." % pretraining_params_path
def _existed_params(var): fluid.load(main_program, pretraining_params_path, exe)
if not isinstance(var, fluid.framework.Parameter):
return False
return os.path.exists(os.path.join(pretraining_params_path, var.name))
fluid.io.load_vars(
exe,
pretraining_params_path,
main_program=main_program,
predicate=_existed_params)
print("Load pretraining parameters from {}.".format( print("Load pretraining parameters from {}.".format(
pretraining_params_path)) pretraining_params_path))
Running the example models in this directory requires PaddlePaddle Fluid 1.6. If your installed version of PaddlePaddle is lower than this requirement, please update your installation following the instructions in the [installation guide](https://www.paddlepaddle.org.cn/#quick-start). Running the example models in this directory requires PaddlePaddle Fluid 1.7. If your installed version of PaddlePaddle is lower than this requirement, please update your installation following the instructions in the [installation guide](https://www.paddlepaddle.org.cn/#quick-start).
# Sequence to Sequence (Seq2Seq) # Sequence to Sequence (Seq2Seq)
......
...@@ -93,7 +93,7 @@ def infer(): ...@@ -93,7 +93,7 @@ def infer():
# clone from default main program and use it as the validation program # clone from default main program and use it as the validation program
main_program = fluid.default_main_program() main_program = fluid.default_main_program()
main_program = main_program.clone(for_test=True) main_program = main_program.clone(for_test=True)
print([param.name for param in main_program.blocks[0].all_parameters()]) print([param.name for param in main_program.all_parameters()])
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace() place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = Executor(place) exe = Executor(place)
...@@ -127,7 +127,8 @@ def infer(): ...@@ -127,7 +127,8 @@ def infer():
dir_name = args.reload_model dir_name = args.reload_model
print("dir name", dir_name) print("dir name", dir_name)
fluid.io.load_params(exe, dir_name) dir_name = os.path.join(dir_name, "checkpoint")
fluid.load(main_program, dir_name, exe)
train_data_iter = reader.get_data_iter(infer_data, 1, mode='eval') train_data_iter = reader.get_data_iter(infer_data, 1, mode='eval')
......
...@@ -214,7 +214,7 @@ def main(): ...@@ -214,7 +214,7 @@ def main():
ce_ppl.append(np.exp(total_loss / word_count)) ce_ppl.append(np.exp(total_loss / word_count))
total_loss = 0.0 total_loss = 0.0
word_count = 0.0 word_count = 0.0
# profiler tools # profiler tools
if args.profile and epoch_id == 0 and batch_id == 100: if args.profile and epoch_id == 0 and batch_id == 100:
profiler.reset_profiler() profiler.reset_profiler()
...@@ -229,10 +229,10 @@ def main(): ...@@ -229,10 +229,10 @@ def main():
% (epoch_id, epoch_time, sum(batch_times) / len(batch_times))) % (epoch_id, epoch_time, sum(batch_times) / len(batch_times)))
if not args.profile: if not args.profile:
dir_name = os.path.join(args.model_path, save_path = os.path.join(args.model_path,
"epoch_" + str(epoch_id)) "epoch_" + str(epoch_id), "checkpoint")
print("begin to save", dir_name) print("begin to save", save_path)
fluid.io.save_params(exe, dir_name, main_program=train_program) fluid.save(train_program, save_path)
print("save finished") print("save finished")
dev_ppl = eval(valid_data) dev_ppl = eval(valid_data)
print("dev ppl", dev_ppl) print("dev ppl", dev_ppl)
......
...@@ -88,7 +88,8 @@ def infer(): ...@@ -88,7 +88,8 @@ def infer():
dir_name = args.reload_model dir_name = args.reload_model
print("dir name", dir_name) print("dir name", dir_name)
fluid.io.load_params(exe, dir_name) dir_name = os.path.join(dir_name, "checkpoint")
fluid.load(main_program, dir_name, exe)
vocab, tar_id2vocab = get_vocab(args.dataset_prefix) vocab, tar_id2vocab = get_vocab(args.dataset_prefix)
infer_output = np.ones((batch_size, 1), dtype='int64') * BOS_ID infer_output = np.ones((batch_size, 1), dtype='int64') * BOS_ID
......
...@@ -255,10 +255,11 @@ def main(): ...@@ -255,10 +255,11 @@ def main():
best_nll = test_nll best_nll = test_nll
best_ppl = test_ppl best_ppl = test_ppl
best_epoch_id = epoch_id best_epoch_id = epoch_id
dir_name = os.path.join(args.model_path, save_path = os.path.join(args.model_path,
"epoch_" + str(best_epoch_id)) "epoch_" + str(best_epoch_id),
print("save model {}".format(dir_name)) "checkpoint")
fluid.io.save_params(exe, dir_name, main_program) print("save model {}".format(save_path))
fluid.save(main_program, save_path)
else: else:
steps_not_improved += 1 steps_not_improved += 1
if steps_not_improved == decay_ts: if steps_not_improved == decay_ts:
......
...@@ -4,6 +4,7 @@ This module provide nets for text classification ...@@ -4,6 +4,7 @@ This module provide nets for text classification
import paddle.fluid as fluid import paddle.fluid as fluid
def bow_net(data, def bow_net(data,
seq_len, seq_len,
label, label,
......
...@@ -43,8 +43,8 @@ class CNN(object): ...@@ -43,8 +43,8 @@ class CNN(object):
left_emb = emb_layer.ops(left) left_emb = emb_layer.ops(left)
right_emb = emb_layer.ops(right) right_emb = emb_layer.ops(right)
# Presentation context # Presentation context
cnn_layer = layers.SequenceConvPoolLayer( cnn_layer = layers.SequenceConvPoolLayer(self.filter_size,
self.filter_size, self.num_filters, "conv") self.num_filters, "conv")
left_cnn = cnn_layer.ops(left_emb) left_cnn = cnn_layer.ops(left_emb)
right_cnn = cnn_layer.ops(right_emb) right_cnn = cnn_layer.ops(right_emb)
# matching layer # matching layer
......
...@@ -33,20 +33,21 @@ def check_cuda(use_cuda, err = \ ...@@ -33,20 +33,21 @@ def check_cuda(use_cuda, err = \
except Exception as e: except Exception as e:
pass pass
def check_version(): def check_version():
""" """
Log an error and exit when the installed version of paddlepaddle Log an error and exit when the installed version of paddlepaddle
does not meet the requirement. does not meet the requirement.
""" """
err = "PaddlePaddle version 1.6 or higher is required, " \ err = "PaddlePaddle version 1.6 or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \ "or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code." \ "Please make sure the version is good with your code." \
try: try:
fluid.require_version('1.6.0') fluid.require_version('1.6.0')
except Exception as e: except Exception as e:
print(err) print(err)
sys.exit(1) sys.exit(1)
def check_version(): def check_version():
......
...@@ -30,10 +30,14 @@ from models.transformer_encoder import encoder, pre_process_layer ...@@ -30,10 +30,14 @@ from models.transformer_encoder import encoder, pre_process_layer
def ernie_pyreader(args, pyreader_name): def ernie_pyreader(args, pyreader_name):
"""define standard ernie pyreader""" """define standard ernie pyreader"""
src_ids = fluid.data(name='1', shape=[-1, args.max_seq_len, 1], dtype='int64') src_ids = fluid.data(
sent_ids = fluid.data(name='2', shape=[-1, args.max_seq_len, 1], dtype='int64') name='1', shape=[-1, args.max_seq_len, 1], dtype='int64')
pos_ids = fluid.data(name='3', shape=[-1, args.max_seq_len, 1], dtype='int64') sent_ids = fluid.data(
input_mask = fluid.data(name='4', shape=[-1, args.max_seq_len, 1], dtype='float32') name='2', shape=[-1, args.max_seq_len, 1], dtype='int64')
pos_ids = fluid.data(
name='3', shape=[-1, args.max_seq_len, 1], dtype='int64')
input_mask = fluid.data(
name='4', shape=[-1, args.max_seq_len, 1], dtype='float32')
labels = fluid.data(name='5', shape=[-1, 1], dtype='int64') labels = fluid.data(name='5', shape=[-1, 1], dtype='int64')
seq_lens = fluid.data(name='6', shape=[-1], dtype='int64') seq_lens = fluid.data(name='6', shape=[-1], dtype='int64')
......
...@@ -29,6 +29,7 @@ from preprocess.ernie import tokenization ...@@ -29,6 +29,7 @@ from preprocess.ernie import tokenization
from preprocess.padding import pad_batch_data from preprocess.padding import pad_batch_data
import io import io
def csv_reader(fd, delimiter='\t'): def csv_reader(fd, delimiter='\t'):
def gen(): def gen():
for i in fd: for i in fd:
...@@ -37,8 +38,10 @@ def csv_reader(fd, delimiter='\t'): ...@@ -37,8 +38,10 @@ def csv_reader(fd, delimiter='\t'):
yield slots, yield slots,
else: else:
yield slots yield slots
return gen() return gen()
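`csv_reader` wraps an open file object in a generator that splits each line on the delimiter; the exact splitting logic is elided in this hunk, so the following is an illustrative reconstruction:

```python
def csv_reader(fd, delimiter='\t'):
    def gen():
        for line in fd:
            yield line.rstrip('\n').split(delimiter)
    return gen()

for row in csv_reader(iter(["a\tb\n", "c\td\n"])):
    print(row)   # ['a', 'b'] then ['c', 'd']
```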
class BaseReader(object): class BaseReader(object):
"""BaseReader for classify and sequence labeling task""" """BaseReader for classify and sequence labeling task"""
......
...@@ -23,6 +23,7 @@ import unicodedata ...@@ -23,6 +23,7 @@ import unicodedata
import six import six
import io import io
def convert_to_unicode(text): def convert_to_unicode(text):
"""Converts `text` to Unicode (if it's not already), assuming utf-8 input.""" """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
if six.PY3: if six.PY3:
......
...@@ -30,7 +30,7 @@ if sys.getdefaultencoding() != defaultencoding: ...@@ -30,7 +30,7 @@ if sys.getdefaultencoding() != defaultencoding:
reload(sys) reload(sys)
sys.setdefaultencoding(defaultencoding) sys.setdefaultencoding(defaultencoding)
sys.path.append("..") sys.path.append("../shared_modules/")
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
...@@ -47,18 +47,18 @@ from models.model_check import check_version ...@@ -47,18 +47,18 @@ from models.model_check import check_version
from models.model_check import check_cuda from models.model_check import check_cuda
def create_model(args, pyreader_name, is_inference = False, is_pointwise = False): def create_model(args, pyreader_name, is_inference=False, is_pointwise=False):
""" """
Create Model for simnet Create Model for simnet
""" """
if is_inference: if is_inference:
inf_pyreader = fluid.layers.py_reader( inf_pyreader = fluid.layers.py_reader(
capacity=16, capacity=16,
shapes=([-1,1], [-1,1]), shapes=([-1], [-1]),
dtypes=('int64', 'int64'), dtypes=('int64', 'int64'),
lod_levels=(1, 1), lod_levels=(1, 1),
name=pyreader_name, name=pyreader_name,
use_double_buffer=False) use_double_buffer=False)
left, pos_right = fluid.layers.read_file(inf_pyreader) left, pos_right = fluid.layers.read_file(inf_pyreader)
return inf_pyreader, left, pos_right return inf_pyreader, left, pos_right
...@@ -66,28 +66,30 @@ def create_model(args, pyreader_name, is_inference = False, is_pointwise = False ...@@ -66,28 +66,30 @@ def create_model(args, pyreader_name, is_inference = False, is_pointwise = False
else: else:
if is_pointwise: if is_pointwise:
pointwise_pyreader = fluid.layers.py_reader( pointwise_pyreader = fluid.layers.py_reader(
capacity=16, capacity=16,
shapes=([-1,1], [-1,1], [-1,1]), shapes=([-1], [-1], [-1]),
dtypes=('int64', 'int64', 'int64'), dtypes=('int64', 'int64', 'int64'),
lod_levels=(1, 1, 0), lod_levels=(1, 1, 0),
name=pyreader_name, name=pyreader_name,
use_double_buffer=False) use_double_buffer=False)
left, right, label = fluid.layers.read_file(pointwise_pyreader) left, right, label = fluid.layers.read_file(pointwise_pyreader)
return pointwise_pyreader, left, right, label return pointwise_pyreader, left, right, label
else: else:
pairwise_pyreader = fluid.layers.py_reader( pairwise_pyreader = fluid.layers.py_reader(
capacity=16, capacity=16,
shapes=([-1,1], [-1,1], [-1,1]), shapes=([-1], [-1], [-1]),
dtypes=('int64', 'int64', 'int64'), dtypes=('int64', 'int64', 'int64'),
lod_levels=(1, 1, 1), lod_levels=(1, 1, 1),
name=pyreader_name, name=pyreader_name,
use_double_buffer=False) use_double_buffer=False)
left, pos_right, neg_right = fluid.layers.read_file(pairwise_pyreader) left, pos_right, neg_right = fluid.layers.read_file(
pairwise_pyreader)
return pairwise_pyreader, left, pos_right, neg_right return pairwise_pyreader, left, pos_right, neg_right
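The shape change from `[-1, 1]` to `[-1]` matches what these slots actually carry: with `lod_levels=1`, each input is a batch of variable-length id sequences flattened into one LoDTensor whose per-sequence row counts live in the LoD info rather than in the shape. A sketch of constructing such a tensor (values are illustrative):

```python
import numpy as np
import paddle.fluid as fluid

place = fluid.CPUPlace()
# Two sequences of lengths 3 and 2, flattened into one int64 vector;
# [[3, 2]] is the level-1 LoD describing where each sequence ends.
flat_ids = np.array([4, 7, 9, 2, 5], dtype="int64")
t = fluid.create_lod_tensor(flat_ids, [[3, 2]], place)
```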
def train(conf_dict, args): def train(conf_dict, args):
""" """
training process training process
...@@ -97,16 +99,16 @@ def train(conf_dict, args): ...@@ -97,16 +99,16 @@ def train(conf_dict, args):
# get vocab size # get vocab size
conf_dict['dict_size'] = len(vocab) conf_dict['dict_size'] = len(vocab)
# Load network structure dynamically # Load network structure dynamically
net = utils.import_class("../models/matching", net = utils.import_class("../shared_modules/models/matching",
conf_dict["net"]["module_name"], conf_dict["net"]["module_name"],
conf_dict["net"]["class_name"])(conf_dict) conf_dict["net"]["class_name"])(conf_dict)
# Load loss function dynamically # Load loss function dynamically
loss = utils.import_class("../models/matching/losses", loss = utils.import_class("../shared_modules/models/matching/losses",
conf_dict["loss"]["module_name"], conf_dict["loss"]["module_name"],
conf_dict["loss"]["class_name"])(conf_dict) conf_dict["loss"]["class_name"])(conf_dict)
# Load Optimization method # Load Optimization method
optimizer = utils.import_class( optimizer = utils.import_class(
"../models/matching/optimizers", "paddle_optimizers", "../shared_modules/models/matching/optimizers", "paddle_optimizers",
conf_dict["optimizer"]["class_name"])(conf_dict) conf_dict["optimizer"]["class_name"])(conf_dict)
# load auc method # load auc method
metric = fluid.metrics.Auc(name="auc") metric = fluid.metrics.Auc(name="auc")
...@@ -131,22 +133,23 @@ def train(conf_dict, args): ...@@ -131,22 +133,23 @@ def train(conf_dict, args):
with fluid.program_guard(train_program, startup_prog): with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
train_pyreader, left, pos_right, neg_right = create_model( train_pyreader, left, pos_right, neg_right = create_model(
args, args, pyreader_name='train_reader')
pyreader_name='train_reader')
left_feat, pos_score = net.predict(left, pos_right) left_feat, pos_score = net.predict(left, pos_right)
pred = pos_score pred = pos_score
_, neg_score = net.predict(left, neg_right) _, neg_score = net.predict(left, neg_right)
avg_cost = loss.compute(pos_score, neg_score) avg_cost = loss.compute(pos_score, neg_score)
avg_cost.persistable = True avg_cost.persistable = True
optimizer.ops(avg_cost) optimizer.ops(avg_cost)
# Get Reader # Get Reader
get_train_examples = simnet_process.get_reader("train",epoch=args.epoch) get_train_examples = simnet_process.get_reader(
"train", epoch=args.epoch)
if args.do_valid: if args.do_valid:
test_prog = fluid.Program() test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
test_pyreader, left, pos_right= create_model(args, pyreader_name = 'test_reader',is_inference=True) test_pyreader, left, pos_right = create_model(
args, pyreader_name='test_reader', is_inference=True)
left_feat, pos_score = net.predict(left, pos_right) left_feat, pos_score = net.predict(left, pos_right)
pred = pos_score pred = pos_score
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
...@@ -156,40 +159,41 @@ def train(conf_dict, args): ...@@ -156,40 +159,41 @@ def train(conf_dict, args):
with fluid.program_guard(train_program, startup_prog): with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
train_pyreader, left, right, label = create_model( train_pyreader, left, right, label = create_model(
args, args, pyreader_name='train_reader', is_pointwise=True)
pyreader_name='train_reader',
is_pointwise=True)
left_feat, pred = net.predict(left, right) left_feat, pred = net.predict(left, right)
avg_cost = loss.compute(pred, label) avg_cost = loss.compute(pred, label)
avg_cost.persistable = True avg_cost.persistable = True
optimizer.ops(avg_cost) optimizer.ops(avg_cost)
# Get Feeder and Reader # Get Feeder and Reader
get_train_examples = simnet_process.get_reader("train",epoch=args.epoch) get_train_examples = simnet_process.get_reader(
"train", epoch=args.epoch)
if args.do_valid: if args.do_valid:
test_prog = fluid.Program() test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
test_pyreader, left, right= create_model(args, pyreader_name = 'test_reader',is_inference=True) test_pyreader, left, right = create_model(
args, pyreader_name='test_reader', is_inference=True)
left_feat, pred = net.predict(left, right) left_feat, pred = net.predict(left, right)
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
if args.init_checkpoint != "": if args.init_checkpoint != "":
utils.init_checkpoint(exe, args.init_checkpoint, utils.init_checkpoint(exe, args.init_checkpoint, startup_prog)
startup_prog)
def valid_and_test(test_program, test_pyreader, get_valid_examples, process, mode, exe, fetch_list): def valid_and_test(test_program, test_pyreader, get_valid_examples, process,
mode, exe, fetch_list):
""" """
return auc and acc return auc and acc
""" """
# Get Batch Data # Get Batch Data
batch_data = fluid.io.batch(get_valid_examples, args.batch_size, drop_last=False) batch_data = fluid.io.batch(
get_valid_examples, args.batch_size, drop_last=False)
test_pyreader.decorate_paddle_reader(batch_data) test_pyreader.decorate_paddle_reader(batch_data)
test_pyreader.start() test_pyreader.start()
pred_list = [] pred_list = []
while True: while True:
try: try:
_pred = exe.run(program=test_program,fetch_list=[pred.name]) _pred = exe.run(program=test_program, fetch_list=[pred.name])
pred_list += list(_pred) pred_list += list(_pred)
except fluid.core.EOFException: except fluid.core.EOFException:
test_pyreader.reset() test_pyreader.reset()
...@@ -222,11 +226,12 @@ def train(conf_dict, args): ...@@ -222,11 +226,12 @@ def train(conf_dict, args):
#for epoch_id in range(args.epoch): #for epoch_id in range(args.epoch):
# used for continuous evaluation # used for continuous evaluation
if args.enable_ce: if args.enable_ce:
train_batch_data = fluid.io.batch(get_train_examples, args.batch_size, drop_last=False) train_batch_data = fluid.io.batch(
get_train_examples, args.batch_size, drop_last=False)
else: else:
train_batch_data = fluid.io.batch( train_batch_data = fluid.io.batch(
fluid.io.shuffle( fluid.io.shuffle(
get_train_examples, buf_size=10000), get_train_examples, buf_size=10000),
args.batch_size, args.batch_size,
drop_last=False) drop_last=False)
train_pyreader.decorate_paddle_reader(train_batch_data) train_pyreader.decorate_paddle_reader(train_batch_data)
...@@ -238,25 +243,29 @@ def train(conf_dict, args): ...@@ -238,25 +243,29 @@ def train(conf_dict, args):
try: try:
global_step += 1 global_step += 1
fetch_list = [avg_cost.name] fetch_list = [avg_cost.name]
avg_loss = train_exe.run(program=train_program, fetch_list = fetch_list) avg_loss = train_exe.run(program=train_program,
fetch_list=fetch_list)
losses.append(np.mean(avg_loss[0])) losses.append(np.mean(avg_loss[0]))
if args.do_valid and global_step % args.validation_steps == 0: if args.do_valid and global_step % args.validation_steps == 0:
get_valid_examples = simnet_process.get_reader("valid") get_valid_examples = simnet_process.get_reader("valid")
valid_result = valid_and_test(test_prog,test_pyreader,get_valid_examples,simnet_process,"valid",exe,[pred.name]) valid_result = valid_and_test(
test_prog, test_pyreader, get_valid_examples,
simnet_process, "valid", exe, [pred.name])
if args.compute_accuracy: if args.compute_accuracy:
valid_auc, valid_acc = valid_result valid_auc, valid_acc = valid_result
logging.info( logging.info(
"global_steps: %d, valid_auc: %f, valid_acc: %f, valid_loss: %f" % "global_steps: %d, valid_auc: %f, valid_acc: %f, valid_loss: %f"
(global_step, valid_auc, valid_acc, np.mean(losses))) % (global_step, valid_auc, valid_acc, np.mean(losses)))
else: else:
valid_auc = valid_result valid_auc = valid_result
logging.info("global_steps: %d, valid_auc: %f, valid_loss: %f" % logging.info(
(global_step, valid_auc, np.mean(losses))) "global_steps: %d, valid_auc: %f, valid_loss: %f" %
(global_step, valid_auc, np.mean(losses)))
if global_step % args.save_steps == 0: if global_step % args.save_steps == 0:
model_save_dir = os.path.join(args.output_dir, model_save_dir = os.path.join(args.output_dir,
conf_dict["model_path"]) conf_dict["model_path"])
model_path = os.path.join(model_save_dir, str(global_step)) model_path = os.path.join(model_save_dir, str(global_step))
if not os.path.exists(model_save_dir): if not os.path.exists(model_save_dir):
os.makedirs(model_save_dir) os.makedirs(model_save_dir)
if args.task_mode == "pairwise": if args.task_mode == "pairwise":
...@@ -269,21 +278,19 @@ def train(conf_dict, args): ...@@ -269,21 +278,19 @@ def train(conf_dict, args):
] ]
target_vars = [left_feat, pred] target_vars = [left_feat, pred]
fluid.io.save_inference_model(model_path, feed_var_names, fluid.io.save_inference_model(model_path, feed_var_names,
target_vars, exe, target_vars, exe, test_prog)
test_prog)
logging.info("saving infer model in %s" % model_path) logging.info("saving infer model in %s" % model_path)
except fluid.core.EOFException: except fluid.core.EOFException:
train_pyreader.reset() train_pyreader.reset()
break break
end_time = time.time() end_time = time.time()
#logging.info("epoch: %d, loss: %f, used time: %d sec" % #logging.info("epoch: %d, loss: %f, used time: %d sec" %
#(epoch_id, np.mean(losses), end_time - start_time)) #(epoch_id, np.mean(losses), end_time - start_time))
ce_info.append([np.mean(losses), end_time - start_time]) ce_info.append([np.mean(losses), end_time - start_time])
#final save #final save
logging.info("the final step is %s" % global_step) logging.info("the final step is %s" % global_step)
model_save_dir = os.path.join(args.output_dir, model_save_dir = os.path.join(args.output_dir, conf_dict["model_path"])
conf_dict["model_path"])
model_path = os.path.join(model_save_dir, str(global_step)) model_path = os.path.join(model_save_dir, str(global_step))
if not os.path.exists(model_save_dir): if not os.path.exists(model_save_dir):
os.makedirs(model_save_dir) os.makedirs(model_save_dir)
...@@ -296,9 +303,8 @@ def train(conf_dict, args): ...@@ -296,9 +303,8 @@ def train(conf_dict, args):
right.name, right.name,
] ]
target_vars = [left_feat, pred] target_vars = [left_feat, pred]
fluid.io.save_inference_model(model_path, feed_var_names, fluid.io.save_inference_model(model_path, feed_var_names, target_vars, exe,
target_vars, exe, test_prog)
test_prog)
logging.info("saving infer model in %s" % model_path) logging.info("saving infer model in %s" % model_path)
# used for continuous evaluation # used for continuous evaluation
if args.enable_ce: if args.enable_ce:
...@@ -322,7 +328,9 @@ def train(conf_dict, args): ...@@ -322,7 +328,9 @@ def train(conf_dict, args):
else: else:
# Get Feeder and Reader # Get Feeder and Reader
get_test_examples = simnet_process.get_reader("test") get_test_examples = simnet_process.get_reader("test")
test_result = valid_and_test(test_prog,test_pyreader,get_test_examples,simnet_process,"test",exe,[pred.name]) test_result = valid_and_test(test_prog, test_pyreader,
get_test_examples, simnet_process, "test",
exe, [pred.name])
if args.compute_accuracy: if args.compute_accuracy:
test_auc, test_acc = test_result test_auc, test_acc = test_result
logging.info("AUC of test is %f, Accuracy of test is %f" % logging.info("AUC of test is %f, Accuracy of test is %f" %
...@@ -344,16 +352,17 @@ def test(conf_dict, args): ...@@ -344,16 +352,17 @@ def test(conf_dict, args):
vocab = utils.load_vocab(args.vocab_path) vocab = utils.load_vocab(args.vocab_path)
simnet_process = reader.SimNetProcessor(args, vocab) simnet_process = reader.SimNetProcessor(args, vocab)
startup_prog = fluid.Program() startup_prog = fluid.Program()
get_test_examples = simnet_process.get_reader("test") get_test_examples = simnet_process.get_reader("test")
batch_data = fluid.io.batch(get_test_examples, args.batch_size, drop_last=False) batch_data = fluid.io.batch(
get_test_examples, args.batch_size, drop_last=False)
test_prog = fluid.Program() test_prog = fluid.Program()
conf_dict['dict_size'] = len(vocab) conf_dict['dict_size'] = len(vocab)
net = utils.import_class("../models/matching", net = utils.import_class("../shared_modules/models/matching",
conf_dict["net"]["module_name"], conf_dict["net"]["module_name"],
conf_dict["net"]["class_name"])(conf_dict) conf_dict["net"]["class_name"])(conf_dict)
...@@ -364,9 +373,7 @@ def test(conf_dict, args): ...@@ -364,9 +373,7 @@ def test(conf_dict, args):
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
test_pyreader, left, pos_right = create_model( test_pyreader, left, pos_right = create_model(
args, args, pyreader_name='test_reader', is_inference=True)
pyreader_name = 'test_reader',
is_inference=True)
left_feat, pos_score = net.predict(left, pos_right) left_feat, pos_score = net.predict(left, pos_right)
pred = pos_score pred = pos_score
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
...@@ -375,19 +382,14 @@ def test(conf_dict, args): ...@@ -375,19 +382,14 @@ def test(conf_dict, args):
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
test_pyreader, left, right = create_model( test_pyreader, left, right = create_model(
args, args, pyreader_name='test_reader', is_inference=True)
pyreader_name = 'test_reader',
is_inference=True)
left_feat, pred = net.predict(left, right) left_feat, pred = net.predict(left, right)
test_prog = test_prog.clone(for_test=True) test_prog = test_prog.clone(for_test=True)
exe.run(startup_prog) exe.run(startup_prog)
utils.init_checkpoint( utils.init_checkpoint(exe, args.init_checkpoint, main_program=test_prog)
exe,
args.init_checkpoint,
main_program=test_prog)
test_exe = exe test_exe = exe
test_pyreader.decorate_paddle_reader(batch_data) test_pyreader.decorate_paddle_reader(batch_data)
...@@ -398,15 +400,18 @@ def test(conf_dict, args): ...@@ -398,15 +400,18 @@ def test(conf_dict, args):
output = [] output = []
while True: while True:
try: try:
output = test_exe.run(program=test_prog,fetch_list=fetch_list) output = test_exe.run(program=test_prog, fetch_list=fetch_list)
if args.task_mode == "pairwise": if args.task_mode == "pairwise":
pred_list += list(map(lambda item: float(item[0]), output[0])) pred_list += list(
map(lambda item: float(item[0]), output[0]))
predictions_file.write(u"\n".join( predictions_file.write(u"\n".join(
map(lambda item: str((item[0] + 1) / 2), output[0])) + "\n") map(lambda item: str((item[0] + 1) / 2), output[0])) +
"\n")
else: else:
pred_list += map(lambda item: item, output[0]) pred_list += map(lambda item: item, output[0])
predictions_file.write(u"\n".join( predictions_file.write(u"\n".join(
map(lambda item: str(np.argmax(item)), output[0])) + "\n") map(lambda item: str(np.argmax(item)), output[0])) +
"\n")
except fluid.core.EOFException: except fluid.core.EOFException:
test_pyreader.reset() test_pyreader.reset()
break break
@@ -450,37 +455,37 @@ def infer(conf_dict, args):
     startup_prog = fluid.Program()
     get_infer_examples = simnet_process.get_infer_reader
-    batch_data = fluid.io.batch(get_infer_examples, args.batch_size, drop_last=False)
+    batch_data = fluid.io.batch(
+        get_infer_examples, args.batch_size, drop_last=False)
     test_prog = fluid.Program()
     conf_dict['dict_size'] = len(vocab)
-    net = utils.import_class("../models/matching",
+    net = utils.import_class("../shared_modules/models/matching",
                              conf_dict["net"]["module_name"],
                              conf_dict["net"]["class_name"])(conf_dict)
     if args.task_mode == "pairwise":
         with fluid.program_guard(test_prog, startup_prog):
             with fluid.unique_name.guard():
-                infer_pyreader, left, pos_right = create_model(args, pyreader_name = 'infer_reader', is_inference = True)
+                infer_pyreader, left, pos_right = create_model(
+                    args, pyreader_name='infer_reader', is_inference=True)
                 left_feat, pos_score = net.predict(left, pos_right)
                 pred = pos_score
                 test_prog = test_prog.clone(for_test=True)
     else:
         with fluid.program_guard(test_prog, startup_prog):
             with fluid.unique_name.guard():
-                infer_pyreader, left, right = create_model(args, pyreader_name = 'infer_reader', is_inference = True)
+                infer_pyreader, left, right = create_model(
+                    args, pyreader_name='infer_reader', is_inference=True)
                 left_feat, pred = net.predict(left, right)
                 test_prog = test_prog.clone(for_test=True)
     exe.run(startup_prog)
-    utils.init_checkpoint(
-        exe,
-        args.init_checkpoint,
-        main_program=test_prog)
+    utils.init_checkpoint(exe, args.init_checkpoint, main_program=test_prog)
     test_exe = exe
     infer_pyreader.decorate_sample_list_generator(batch_data)
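`fluid.io.batch` wraps a sample-level generator into a batch-level one; with `drop_last=False` the final, smaller batch is kept rather than discarded. A toy sketch of the wrapping, with a hypothetical reader standing in for `simnet_process.get_infer_reader` (assuming the PaddlePaddle 1.x semantics of `fluid.io.batch`):

```python
import paddle.fluid as fluid

def toy_reader():  # stands in for simnet_process.get_infer_reader
    for i in range(10):
        yield [i]  # one sample per yield

batched = fluid.io.batch(toy_reader, batch_size=4, drop_last=False)
for batch in batched():
    print(len(batch))  # 4, 4, 2: the last partial batch is kept
```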
@@ -490,16 +495,16 @@ def infer(conf_dict, args):
     output = []
     infer_pyreader.start()
     while True:
         try:
-            output = test_exe.run(program=test_prog,fetch_list=fetch_list)
+            output = test_exe.run(program=test_prog, fetch_list=fetch_list)
             if args.task_mode == "pairwise":
                 preds_list += list(
                     map(lambda item: str((item[0] + 1) / 2), output[0]))
             else:
                 preds_list += map(lambda item: str(np.argmax(item)), output[0])
         except fluid.core.EOFException:
             infer_pyreader.reset()
             break
     with io.open(args.infer_result_path, "w", encoding="utf8") as infer_file:
         for _data, _pred in zip(simnet_process.get_infer_data(), preds_list):
             infer_file.write(_data + "\t" + _pred + "\n")
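This loop follows the PyReader protocol of PaddlePaddle 1.x: `start()` begins feeding, `Executor.run` raises `fluid.core.EOFException` once the decorated reader is exhausted, and `reset()` must be called before the reader can be started again. Because `reset()` rearms the reader, running several full passes is just a loop around the same skeleton. A schematic sketch, not standalone code, reusing the names built earlier in `infer()`:

```python
for epoch in range(2):                     # two full passes over the data
    infer_pyreader.start()                 # must precede the first run()
    while True:
        try:
            output = test_exe.run(program=test_prog, fetch_list=fetch_list)
        except fluid.core.EOFException:    # one pass finished
            infer_pyreader.reset()         # rearm before the next start()
            break
```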
@@ -514,6 +519,7 @@ def get_cards():
     num = len(cards.split(","))
     return num
 
+
 if __name__ == "__main__":
     args = ArgConfig()
......
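`get_cards()` above counts the visible GPUs by splitting on commas; a runnable sketch of that logic, assuming (as the name and the comma-splitting suggest) that `cards` is read from the `CUDA_VISIBLE_DEVICES` environment variable. Note the caveat that it only makes sense when the variable is set and non-empty, since `"".split(",")` still yields one element:

```python
import os

def get_cards():
    # Count devices listed in CUDA_VISIBLE_DEVICES, e.g. "0,1,2" -> 3.
    cards = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return len(cards.split(","))

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
print(get_cards())  # -> 3
```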
@@ -149,7 +149,7 @@ PaddlePaddle provides a rich set of computational units, enabling users to adopt modular
 [**PaddleNLP**](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP) is an open-source project of natural language processing (NLP) tools, algorithms, models, and data built on the PaddlePaddle deep learning framework. Baidu's deep accumulation of more than a decade in NLP gives PaddleNLP its core strength. With PaddleNLP, you get:
 
 - **Rich and comprehensive NLP task support:**
-  - PaddleNLP provides multi-granularity, multi-scenario application support, covering NLP fundamentals such as [word segmentation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), [part-of-speech tagging](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), and [named entity recognition](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), and core NLP technologies such as [text classification](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification), [text similarity](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net), [semantic representation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleLARK), and [text generation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleTextGEN). PaddleNLP also provides the task-specific core techniques, tool components, models, pretrained parameters, and more for common large-scale NLP application systems (such as [reading comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleMRC), [dialogue systems](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleDialogue), and [machine translation systems](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleMT)), so nothing holds you back in NLP.
+  - PaddleNLP provides multi-granularity, multi-scenario application support, covering NLP fundamentals such as [word segmentation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), [part-of-speech tagging](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), and [named entity recognition](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis), and core NLP technologies such as [text classification](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/sentiment_classification), [text similarity](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/similarity_net), [semantic representation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models), and [text generation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/seq2seq). PaddleNLP also provides the task-specific core techniques, tool components, models, pretrained parameters, and more for common large-scale NLP application systems (such as [reading comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_reading_comprehension), [dialogue systems](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system), and [machine translation systems](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_translation)), so nothing holds you back in NLP.
 
 - **Stable and reliable NLP models and powerful pretrained parameters:**
   - PaddleNLP integrates the NLP tool models widely used inside Baidu, offering you stable and reliable NLP algorithm solutions. Pretrained parameters learned from tens of billions of data samples, together with a rich collection of pretrained models, help you easily improve model quality and inject real power into your NLP business.
 - **Continuous improvement and technical support, so you can build NLP applications from scratch:**
@@ -167,14 +167,14 @@ PaddlePaddle provides a rich set of computational units, enabling users to adopt modular
 #### Semantic Representation
 
-[PaddleLARK](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleLARK) (Paddle LAngauge Representation ToolKit) carries traditional language models a step further: general-purpose semantic representation models trained on large-scale corpora that benefit other natural language processing tasks, embodying the "general pretraining + task-specific fine-tuning" paradigm. PaddleLARK bundles popular Chinese and English pretrained models such as ELMo, BERT, ERNIE 1.0, ERNIE 2.0, and XLNet.
+[pretrain_langauge_models](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models) (Paddle LAngauge Representation ToolKit) carries traditional language models a step further: general-purpose semantic representation models trained on large-scale corpora that benefit other natural language processing tasks, embodying the "general pretraining + task-specific fine-tuning" paradigm. pretrain_langauge_models bundles popular Chinese and English pretrained models such as ELMo, BERT, ERNIE 1.0, ERNIE 2.0, and XLNet.
 
 | Model | Description |
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
 | [ERNIE](https://github.com/PaddlePaddle/ERNIE) (Enhanced Representation from kNowledge IntEgration) | Baidu's self-developed semantic representation model. By modeling the words, entities, and entity relations in massive data, it learns real-world semantic knowledge. Whereas BERT learns from raw language signals, ERNIE models prior semantic knowledge units directly, strengthening the model's semantic representation capacity. |
-| [BERT](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleLARK/BERT) (Bidirectional Encoder Representation from Transformers) | A general-purpose semantic representation model with strong transferability. Built from Transformer blocks and trained with the bidirectional Masked Language Model and Next Sentence Prediction objectives, it learns general semantic representations through pretraining; combined with a simple output layer, it is applied to downstream NLP tasks and achieves SOTA results on many of them. |
-| [XLNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleLARK/XLNet) (XLNet: Generalized Autoregressive Pretraining for Language Understanding) | One of the important semantic representation models. With Transformer-XL as its backbone and Permutation Language Modeling as its optimization objective, it outperforms BERT on a number of downstream tasks. |
-| [ELMo](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleLARK/ELMo) (Embeddings from Language Models) | One of the important general-purpose semantic representation models. With bidirectional LSTMs as its basic network component and a language-model training objective, it learns general semantic representations through pretraining; transferring those representations as features into downstream NLP tasks significantly improves downstream model performance. |
+| [BERT](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/BERT) (Bidirectional Encoder Representation from Transformers) | A general-purpose semantic representation model with strong transferability. Built from Transformer blocks and trained with the bidirectional Masked Language Model and Next Sentence Prediction objectives, it learns general semantic representations through pretraining; combined with a simple output layer, it is applied to downstream NLP tasks and achieves SOTA results on many of them. |
+| [XLNet](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/XLNet) (XLNet: Generalized Autoregressive Pretraining for Language Understanding) | One of the important semantic representation models. With Transformer-XL as its backbone and Permutation Language Modeling as its optimization objective, it outperforms BERT on a number of downstream tasks. |
+| [ELMo](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/pretrain_langauge_models/ELMo) (Embeddings from Language Models) | One of the important general-purpose semantic representation models. With bidirectional LSTMs as its basic network component and a language-model training objective, it learns general semantic representations through pretraining; transferring those representations as features into downstream NLP tasks significantly improves downstream model performance. |
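The Masked Language Model objective in the BERT row is easy to state concretely: randomly mask a fraction of input tokens and train the model to recover them. A minimal, self-contained sketch of BERT-style input masking (toy vocabulary; the 15% rate and 80/10/10 split follow the original BERT paper; this is an illustration, not the PaddleNLP implementation):

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15):
    """Return (masked tokens, labels); label is None where nothing is masked."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            labels.append(tok)                       # model must recover tok
            r = random.random()
            if r < 0.8:
                masked.append(MASK)                  # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(random.choice(VOCAB))  # 10%: random token
            else:
                masked.append(tok)                   # 10%: keep unchanged
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

print(mask_tokens("the cat sat on the mat".split()))
```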
 #### Text Similarity
 
@@ -182,7 +182,7 @@ PaddlePaddle provides a rich set of computational units, enabling users to adopt modular
 #### Text Generation
 
-[PaddleTextGEN](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleTextGEN) (Paddle Text Generation) is a PaddlePaddle-based text generation framework that provides a series of classic text generation models, such as vanilla seq2seq, seq2seq with attention, and variational seq2seq.
+[seq2seq](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/seq2seq) (Paddle Text Generation) is a PaddlePaddle-based text generation framework that provides a series of classic text generation models, such as vanilla seq2seq, seq2seq with attention, and variational seq2seq.
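All three seq2seq variants share the same autoregressive factorization of generation, differing mainly in how the decoder conditions on the encoded source (a fixed vector, an attention-weighted context, or a latent variable):

```latex
p(y \mid x) = \prod_{t=1}^{T} p\left(y_t \mid y_{<t},\, \mathrm{enc}(x)\right)
```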
 ### NLP System Applications
 
@@ -195,7 +195,7 @@ PaddlePaddle provides a rich set of computational units, enabling users to adopt modular
 #### Reading Comprehension
 
-[PaddleMRC](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleMRC) (Paddle Machine Reading Comprehension) collects Baidu's work in machine reading comprehension: models, tools, open-source datasets, and more.
+[machine_reading_comprehension](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_reading_comprehension) (Paddle Machine Reading Comprehension) collects Baidu's work in machine reading comprehension: models, tools, open-source datasets, and more.
 
 | Model | Description |
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
@@ -205,16 +205,16 @@ PaddlePaddle provides a rich set of computational units, enabling users to adopt modular
 #### Machine Translation
 
-[PaddleMT](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleMT), in full Paddle Machine Translation, is the classic Transformer-based machine translation model from the paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762).
+[machine_translation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_translation), in full Paddle Machine Translation, is the classic Transformer-based machine translation model from the paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762).
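The building block at the heart of that Transformer is scaled dot-product attention (combined into multi-head attention in the paper), where Q, K, and V are the query, key, and value matrices and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```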
 #### Dialogue Systems
 
-[PaddleDialogue](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleDialogue) contains models, datasets, and tools for dialogue systems.
+[dialogue_system](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system) contains models, datasets, and tools for dialogue systems.
 
 | Model | Description |
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| [DGU](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleDialogue/dialogue_general_understanding) (Dialogue General Understanding) | Covers common dialogue system tasks, including context-response matching for **retrieval-based chatbots** and **intent detection**, **slot filling**, and **dialogue state tracking** for **task-oriented dialogue systems**, achieving the best results on 6 international public datasets. |
-| [ADEM](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleDialogue/auto_dialogue_evaluation) (Auto Dialogue Evaluation Model) | Evaluates the response quality of open-domain dialogue systems, helping companies and individuals quickly assess responses and reduce the cost of human evaluation. |
+| [DGU](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system/dialogue_general_understanding) (Dialogue General Understanding) | Covers common dialogue system tasks, including context-response matching for **retrieval-based chatbots** and **intent detection**, **slot filling**, and **dialogue state tracking** for **task-oriented dialogue systems**, achieving the best results on 6 international public datasets. |
+| [ADEM](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/dialogue_system/auto_dialogue_evaluation) (Auto Dialogue Evaluation Model) | Evaluates the response quality of open-domain dialogue systems, helping companies and individuals quickly assess responses and reduce the cost of human evaluation. |
 | [Proactive Conversation](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2019-DuConv) | Contains [DuConv](https://ai.baidu.com/broad/subordinate?dataset=duconv), Baidu's open-source knowledge-driven open-domain dialogue dataset, along with baseline models. The companion paper, [Proactive Human-Machine Conversation with Explicit Conversation Goals](https://arxiv.org/abs/1906.05572), was published at ACL 2019. |
 | [DAM](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/ACL2018-DAM) (Deep Attention Matching Network) | An open-domain multi-turn response matching model. The companion paper, [Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network](https://aclweb.org/anthology/P18-1103/), was published at ACL 2018. |
......