diff --git a/doc/imgs/cnn-ckim2014.JPG b/doc/imgs/cnn-ckim2014.JPG
new file mode 100644
index 0000000000000000000000000000000000000000..a1a8f1ec29adc75652ac79f1a71e0a3db11456e8
Binary files /dev/null and b/doc/imgs/cnn-ckim2014.JPG differ
diff --git a/doc/imgs/tagspace.JPG b/doc/imgs/tagspace.JPG
new file mode 100644
index 0000000000000000000000000000000000000000..3889db2d98c35cd494a77b710e6734fb4139d440
Binary files /dev/null and b/doc/imgs/tagspace.JPG differ
diff --git a/models/contentunderstanding/text_classification/config.yaml b/models/contentunderstanding/classification/config.yaml
similarity index 100%
rename from models/contentunderstanding/text_classification/config.yaml
rename to models/contentunderstanding/classification/config.yaml
diff --git a/models/contentunderstanding/text_classification/model.py b/models/contentunderstanding/classification/model.py
similarity index 100%
rename from models/contentunderstanding/text_classification/model.py
rename to models/contentunderstanding/classification/model.py
diff --git a/models/contentunderstanding/text_classification/reader.py b/models/contentunderstanding/classification/reader.py
similarity index 100%
rename from models/contentunderstanding/text_classification/reader.py
rename to models/contentunderstanding/classification/reader.py
diff --git a/models/contentunderstanding/text_classification/train_data/part-0 b/models/contentunderstanding/classification/train_data/part-0
similarity index 100%
rename from models/contentunderstanding/text_classification/train_data/part-0
rename to models/contentunderstanding/classification/train_data/part-0
diff --git a/models/contentunderstanding/readme.md b/models/contentunderstanding/readme.md
index 1063982b7a98dbe56a06ed7c5915ecd21fd5bebf..2d4482bd9bdd612bf1b28c79a4c951c4a4143598 100644
--- a/models/contentunderstanding/readme.md
+++ b/models/contentunderstanding/readme.md
@@ -1,7 +1,7 @@
 # Content Understanding Model Library
 ## Introduction
-We provide PaddleRec implementations of the models commonly used in content understanding tasks, along with single-machine training and prediction accuracy metrics and distributed training and prediction performance metrics. The implemented content understanding models include [Tagspace](http://gitlab.baidu.com/xujiaqi01/paddlerec/tree/develop/models/contentunderstanding/tagspace) and [text classification](http://gitlab.baidu.com/xujiaqi01/paddlerec/tree/develop/models/contentunderstanding/text_classification).
+We provide PaddleRec implementations of the models commonly used in content understanding tasks, along with single-machine training and prediction accuracy metrics and distributed training and prediction performance metrics. The implemented content understanding models include [Tagspace](http://gitlab.baidu.com/xujiaqi01/paddlerec/tree/develop/models/contentunderstanding/tagspace) and [text classification](http://gitlab.baidu.com/xujiaqi01/paddlerec/tree/develop/models/contentunderstanding/classification).
 More models are being added to the library on an ongoing basis; stay tuned.
@@ -22,9 +22,18 @@
 | Model | Description | Paper |
 | :------------------: | :--------------------: | :---------: |
-| TagSpace | Tag recommendation | [TagSpace: Semantic Embeddings from Hashtags](https://research.fb.com/publications/tagspace-semantic-embeddings-from-hashtags/) |
-| TextClassification | Text classification | -- |
+| TagSpace | Tag recommendation | [TagSpace: Semantic Embeddings from Hashtags (2014)](https://research.fb.com/publications/tagspace-semantic-embeddings-from-hashtags/) |
+| Classification | Text classification | [Convolutional neural networks for sentence classification (2014)](https://www.aclweb.org/anthology/D14-1181.pdf) |
+TagSpace model
+
+
+
+
+Text classification CNN model
+
+
+
+
 ## Tutorial
 ### Data processing
@@ -53,7 +62,7 @@ mv test.csv raw_big_test_data
 python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test_big_data big_vocab_text.txt big_vocab_tag.txt
 ```
-**(2) TextClassification**
+**(2) Classification**
 None
@@ -66,7 +75,7 @@ python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test
 | Dataset | Model | loss | auc | acc | mae |
 | :------------------: | :--------------------: | :---------: |:---------: | :---------: |:---------: |
 | -- | TagSpace | -- | -- | -- | -- |
-| -- | TextClassification | -- | -- | -- | -- |
+| -- | Classification | -- | -- | -- | -- |
 ## Distributed
@@ -74,7 +83,7 @@ python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test
 | Dataset | Model | Single machine | Sync (4 nodes) | Sync (8 nodes) | Sync (16 nodes) | Sync (32 nodes) |
 | :------------------: | :--------------------: | :---------: |:---------: |:---------: |:---------: |:---------: |
 | -- | TagSpace | -- | -- | -- | -- | -- |
-| -- | TextClassification | -- | -- | -- | -- | -- |
+| -- | Classification | -- | -- | -- | -- | -- |
 ----
@@ -82,4 +91,4 @@ python text2paddle.py raw_big_train_data/ raw_big_test_data/ train_big_data test
 | Dataset | Model | Single machine | Async (4 nodes) | Async (8 nodes) | Async (16 nodes) | Async (32 nodes) |
 | :------------------: | :--------------------: | :---------: |:---------: |:---------: |:---------: |:---------: |
 | -- | TagSpace | -- | -- | -- | -- | -- |
-| -- | TextClassification | -- | -- | -- | -- | -- |
\ No newline at end of file
+| -- | Classification | -- | -- | -- | -- | -- |
diff --git a/readme.md b/readme.md
index 4873ab053d13cfa16e53121f0cd5dcd02978b282..ff2b64b8d7eea316b4d4a73249a84ff97751b21e 100644
--- a/readme.md
+++ b/readme.md
@@ -108,7 +108,7 @@ python -m paddlerec.run -m ./models/rank/dnn/config.yaml -e single
 | Category | Model | Single-machine CPU training | Single-machine GPU training | Distributed CPU training |
 | :------: | :----------------------------------------------------------------------------: | :---------: | :---------: | :-----------: |
-| Content understanding | [Text-Classification](models/contentunderstanding/text_classification/model.py) | ✓ | x | ✓ |
+| Content understanding | [Text-Classification](models/contentunderstanding/classification/model.py) | ✓ | x | ✓ |
 | Content understanding | [TagSpace](models/contentunderstanding/tagspace/model.py) | ✓ | x | ✓ |
 | Recall | [TDM](models/treebased/tdm/model.py) | ✓ | x | ✓ |
 | Recall | [Word2Vec](models/recall/word2vec/model.py) | ✓ | x | ✓ |
@@ -162,4 +162,4 @@ python -m paddlerec.run -m ./models/rank/dnn/config.yaml -e single
 ### License
 This project is released under the [Apache 2.0 license](LICENSE).
-
\ No newline at end of file
+