From 32028a9e1f1edc81a9befbd4ef9f8e3034c2b86c Mon Sep 17 00:00:00 2001
From: xujiaqi01
Date: Fri, 15 May 2020 10:53:48 +0800
Subject: [PATCH] Update readme.md

---
 models/contentunderstanding/readme.md | 29 +++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/models/contentunderstanding/readme.md b/models/contentunderstanding/readme.md
index e2994412..1063982b 100644
--- a/models/contentunderstanding/readme.md
+++ b/models/contentunderstanding/readme.md
@@ -28,6 +28,35 @@
 
 ## Tutorial
 ### Data processing
+
+**(1) TagSpace**
+
+[Data source](https://github.com/mhjabreel/CharCNN/tree/master/data/), [backup data mirror](https://paddle-tagspace.bj.bcebos.com/data.tar)
+
+Each record has the following format:
+```
+"3","Wall St. Bears Claw Back Into the Black (Reuters)","Reuters - Short-sellers, Wall Street's dwindling\band of ultra-cynics, are seeing green again."
+```
+
+After extracting the archive, convert the text data to Paddle format. First, move the files into the training and test data directories:
+
+```
+mkdir raw_big_train_data
+mkdir raw_big_test_data
+mv train.csv raw_big_train_data
+mv test.csv raw_big_test_data
+```
+
+Then run the text2paddle.py script to generate the Paddle input format:
+
+```
+python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test_big_data big_vocab_text.txt big_vocab_tag.txt
+```
+
+**(2) TextClassification**
+
+None
+
 ### Training
 ### Prediction
 
-- 
GitLab