## Example 1: Classification
This example is a sentiment analysis task. The following sections describe how to prepare the model and the dataset, and how to run the task.

### Step 1: Prepare Pre-trained Model & Dataset

#### Pre-trained Model

This task uses the pre-trained model [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-trained model to the current folder.


#### Dataset

This example demonstrates with [ChnSentiCorp](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/ChnSentiCorp_htl_all), a Chinese sentiment analysis dataset.

Download dataset:
```shell
python download.py
```

If everything goes well, a folder named `data/` will be created containing all the data files.

The training dataset file should have two fields, `text_a` and `label`, stored in [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format. For example:

```
label  text_a
0   当当网名不符实,订货多日不见送货,询问客服只会推托,只会要求用户再下订单。如此服务留不住顾客的。去别的网站买书服务更好。
0   XP的驱动不好找!我的17号提的货,现在就降价了100元,而且还送杀毒软件!
1   <荐书> 推荐所有喜欢<红楼>的红迷们一定要收藏这本书,要知道当年我听说这本书的时候花很长时间去图书馆找和借都没能如愿,所以这次一看到当当有,马上买了,红迷们也要记得备货哦!
```
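
To sanity-check a file in this format, the following minimal sketch reads tab-separated rows with a `label` / `text_a` header using Python's standard `csv` module. The function name `read_examples` and the two short sample rows are made up for illustration, and the sketch is not taken from `run.py`.

```python
import csv
import io


def read_examples(stream):
    """Read (label, text_a) pairs from a TSV stream with a header row."""
    # QUOTE_NONE keeps quotation marks inside review text untouched.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for row in reader:
        examples.append((int(row["label"]), row["text_a"]))
    return examples


# Illustrative in-memory sample; for a real file you would pass something like
# open(path, encoding="utf-8", newline=""), where path points into data/.
sample = "label\ttext_a\n0\t服务太差\n1\t很好的书\n"
examples = read_examples(io.StringIO(sample))
print(examples)
```

A row with a missing field or a non-integer label raises an error here, which makes malformed files easy to spot before training.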

### Step 2: Train & Predict

The code for this task is in `run.py`. Once you have prepared the pre-trained model and the dataset, run:

```shell
python run.py
```

To train on a specific GPU or on multiple GPUs, set **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1,2 python run.py
```


Logs similar to the following will be printed:

```
step 1/154 (epoch 0), loss: 5.512, speed: 0.51 steps/s
step 2/154 (epoch 0), loss: 2.595, speed: 3.36 steps/s
step 3/154 (epoch 0), loss: 1.798, speed: 3.48 steps/s
```
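
If you want to track the loss over time, log lines like these can be parsed with a regular expression. The sketch below assumes every training line follows exactly the format of the three sample lines above; the pattern and the helper `parse_log_line` are illustrative and not part of `run.py`.

```python
import re

# Pattern inferred from the sample log lines; adjust it if your logs differ.
LOG_PATTERN = re.compile(
    r"step (\d+)/(\d+) \(epoch (\d+)\), loss: ([\d.]+), speed: ([\d.]+) steps/s"
)


def parse_log_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    step, total, epoch, loss, speed = match.groups()
    return {
        "step": int(step),
        "total_steps": int(total),
        "epoch": int(epoch),
        "loss": float(loss),
        "speed": float(speed),
    }


lines = [
    "step 1/154 (epoch 0), loss: 5.512, speed: 0.51 steps/s",
    "step 2/154 (epoch 0), loss: 2.595, speed: 3.36 steps/s",
    "step 3/154 (epoch 0), loss: 1.798, speed: 3.48 steps/s",
]
records = [parse_log_line(line) for line in lines]
losses = [r["loss"] for r in records if r is not None]
print(losses)
```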


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], "probs": [0.29290086030960083, 0.7070990800857544], "label": 1}
{"index": 1, "logits": [0.8593899011611938, -0.29743513464927673], "probs": [0.7607553601264954, 0.23924466967582703], "label": 0}
{"index": 2, "logits": [0.7462944388389587, -0.7083730101585388], "probs": [0.8107157349586487, 0.18928426504135132], "label": 0}
```
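
Each prediction is a single JSON object per line. In the samples above, `probs` appears to be the softmax of `logits`, and `label` the index of the largest probability. The following sketch checks this on the first sample line using only the standard library; the relationship is inferred from the sample values rather than from the source code.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# First sample line from the predictions shown above.
line = (
    '{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], '
    '"probs": [0.29290086030960083, 0.7070990800857544], "label": 1}'
)
record = json.loads(line)

probs = softmax(record["logits"])
predicted = probs.index(max(probs))  # index of the most likely class

print(probs, predicted)
```

To process a whole predictions file, apply `json.loads` to each line of the file in the same way.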

### Step 3: Evaluate

Once you have the predictions, run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 1200
precision: 0.956666666667, recall: 0.949013157895, f1: 0.95688225039
```
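
For reference, the sketch below computes accuracy, precision, recall, and F1 for binary labels using the standard definitions, treating label `1` as the positive class. It is an illustration only: the helper `binary_metrics` and the toy labels are made up, and how `evaluate.py` computes its figures may differ.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute standard binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```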