Unverified commit c75dbb37, authored by zhang wenhui, committed by GitHub

add textclassifiler (#4547)

Parent ab2a43a4
# Text_Classifiler
The directory layout of this example is as follows:
```text
.
├── README.md # documentation
├── train.py # training script
└── net.py # network definition
```
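## Introduction
Text classification is a classic machine learning task in NLP and recommendation; it includes, but is not limited to, sentiment classification and tag recommendation. This example uses a CNN to illustrate the task.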
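
As a rough illustration of what a CNN text classifier computes, here is a minimal NumPy sketch of a forward pass for one sentence: embedding lookup, a 1-D convolution over a sliding window of tokens, max-pooling over time, and a softmax output. It is a simplification, not the Paddle implementation: the function name `cnn_text_forward`, the `params` dictionary, the window size of 3, and the random weights are all assumptions made for illustration, and the intermediate hidden layer is omitted.

```python
import numpy as np


def cnn_text_forward(token_ids, seq_len, params, window=3):
    """Forward pass for one sentence: embedding -> conv -> max-pool -> softmax."""
    # Look up embeddings for the valid tokens only; padding positions are dropped.
    emb = params["emb"][token_ids[:seq_len]]              # (seq_len, emb_dim)
    half = window // 2
    # Zero-pad in time so every position has a full context window.
    padded = np.pad(emb, ((half, window - 1 - half), (0, 0)))
    # Each row concatenates `window` neighbouring embeddings.
    windows = np.stack(
        [padded[i:i + window].reshape(-1) for i in range(seq_len)])
    conv = np.tanh(windows @ params["conv_w"])            # (seq_len, num_filters)
    pooled = conv.max(axis=0)                             # max over time
    logits = pooled @ params["fc_w"] + params["fc_b"]     # (class_dim,)
    exp = np.exp(logits - logits.max())                   # numerically stable softmax
    return exp / exp.sum()


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
dict_dim, emb_dim, num_filters, class_dim, window = 100, 8, 32, 2, 3
params = {
    "emb": rng.normal(size=(dict_dim, emb_dim)),
    "conv_w": rng.normal(size=(window * emb_dim, num_filters)),
    "fc_w": rng.normal(size=(num_filters, class_dim)),
    "fc_b": np.zeros(class_dim),
}
token_ids = rng.integers(0, dict_dim, size=10)
probs = cnn_text_forward(token_ids, seq_len=6, params=params, window=window)
print(probs)
```

Because pooling takes the maximum over time, the output has a fixed size regardless of sentence length, which is what lets a single classifier head handle variable-length text.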
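## Data
The model input consists of a sequence of IDs (representing words or other entities) and a label giving the class.
The raw data has two columns: the first is the input text and the second is the class label.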
```
特 喜欢 这种 好看的 狗狗 1
这 真是 惊艳 世界 的 中国 黑科技 1
环境 特别 差 ,脏兮兮 的,再也 不去 了 0
```
This example uses random data as the demo sample.
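
If real data in the two-column format were used instead of random data, it would first have to be converted into padded ID arrays. The following is a minimal sketch of that conversion, not part of this example's code. It assumes the label is the last whitespace-separated field of each line and that ID 0 is reserved for padding; the helper names `parse_line`, `build_vocab`, and `make_feed` are hypothetical. The dictionary keys match the input names declared in `net.py`.

```python
import numpy as np


def parse_line(line):
    """Split one raw line into (tokens, label); the last field is assumed to be the label."""
    *tokens, label = line.split()
    return tokens, int(label)


def build_vocab(token_lists, pad_token="<pad>"):
    """Give every distinct token an integer ID; ID 0 is reserved for padding."""
    vocab = {pad_token: 0}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_feed(lines, max_len=10):
    """Convert raw lines into padded ID, length, and label arrays."""
    parsed = [parse_line(line) for line in lines]
    vocab = build_vocab(tokens for tokens, _ in parsed)
    ids = np.zeros((len(parsed), max_len), dtype="int64")
    seq_len = np.zeros(len(parsed), dtype="int64")
    labels = np.zeros((len(parsed), 1), dtype="int64")
    for row, (tokens, label) in enumerate(parsed):
        tokens = tokens[:max_len]          # truncate sentences that are too long
        ids[row, :len(tokens)] = [vocab[t] for t in tokens]
        seq_len[row] = len(tokens)         # number of real (non-padding) tokens
        labels[row, 0] = label
    return {"input": ids, "seq_len": seq_len, "label": labels}, vocab


sample = [
    "特 喜欢 这种 好看的 狗狗 1",
    "这 真是 惊艳 世界 的 中国 黑科技 1",
]
feed, vocab = make_feed(sample)
print(feed["input"])
print(feed["seq_len"], feed["label"].ravel())
```

With a feed built this way, `dict_dim` would need to be at least `len(vocab)` so that every ID has a row in the embedding table.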
## Training command
```
python train.py
```
## Future work
Add public datasets and test results.
"""net.py: CNN network definition for the text classification demo."""
import paddle.fluid as fluid


def cnn_net(dict_dim=100,
            max_len=10,
            cnn_dim=32,
            cnn_filter_size=128,
            emb_dim=8,
            hid_dim=128,
            class_dim=2,
            is_prediction=False):
    """
    Conv net
    """
    # inputs: padded token IDs, class labels, and true sequence lengths
    data = fluid.data(name="input", shape=[None, max_len], dtype='int64')
    label = fluid.data(name="label", shape=[None, 1], dtype='int64')
    seq_len = fluid.data(name="seq_len", shape=[None], dtype='int64')

    # embedding layer
    emb = fluid.embedding(input=data, size=[dict_dim, emb_dim])
    # drop the padding positions using the true sequence lengths
    emb = fluid.layers.sequence_unpad(emb, length=seq_len)

    # convolution layer
    conv = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=cnn_dim,
        filter_size=cnn_filter_size,
        act="tanh",
        pool_type="max")

    # fully connected layer
    fc_1 = fluid.layers.fc(input=[conv], size=hid_dim)
    # softmax layer
    prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
    #if is_prediction:
    #    return prediction
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(x=cost)
    # accuracy is computed here but not returned
    acc = fluid.layers.accuracy(input=prediction, label=label)
    return avg_cost
"""train.py: trains the CNN text classifier on randomly generated data."""
import net
import numpy as np
import paddle.fluid as fluid


def gen_data(dict_dim=100, class_size=2, batch_size=32, max_len=10):
    """Generate one random batch whose keys match the inputs declared in net.py."""
    return {
        "input": np.random.randint(
            dict_dim, size=(batch_size, max_len)).astype('int64'),
        "seq_len": np.random.randint(
            1, high=max_len, size=(batch_size)).astype('int64'),
        "label": np.random.randint(
            class_size, size=(batch_size, 1)).astype('int64')
    }


# The original assigned these two programs to swapped names; each variable
# now refers to the program its name describes.
main_program = fluid.default_main_program()
startup_program = fluid.default_startup_program()
dict_dim = 100

with fluid.program_guard(main_program, startup_program):
    cost = net.cnn_net(dict_dim=dict_dim)
    optimizer = fluid.optimizer.SGD(learning_rate=0.01)
    optimizer.minimize(cost)

exe = fluid.Executor(fluid.CPUPlace())
# run parameter initialization once before training
exe.run(startup_program)

step = 100
for i in range(step):
    cost_val = exe.run(main_program,
                       feed=gen_data(),
                       fetch_list=[cost.name])
    print("step%d cost=%f" % (i, cost_val[0]))
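
The training loop above reports only the average cost, although `net.py` also builds an accuracy op. As a reference for what these two quantities mean, here is a minimal NumPy sketch that computes the mean cross-entropy and the accuracy from softmax outputs. The function names and the hand-written probabilities are assumptions for illustration; this is not the Paddle implementation.

```python
import numpy as np


def mean_cross_entropy(probs, labels):
    """Average of -log p[true class]; probs is (batch, classes), labels is (batch, 1)."""
    picked = probs[np.arange(len(probs)), labels[:, 0]]
    return float(-np.log(picked).mean())


def accuracy(probs, labels):
    """Fraction of rows whose most probable class equals the label."""
    return float((probs.argmax(axis=1) == labels[:, 0]).mean())


# Hand-written softmax outputs for a batch of four examples.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])
labels = np.array([[0], [1], [1], [0]])
loss = mean_cross_entropy(probs, labels)
acc = accuracy(probs, labels)
print("cost=%f acc=%f" % (loss, acc))
```

With random labels, as in this demo, the accuracy is expected to stay near chance level, so a falling cost mainly indicates that the pipeline runs end to end.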