## Example 1: Classification
This example is a sentiment analysis task. The following sections cover model preparation, dataset preparation, and how to run the task.

### Step 1: Prepare Pre-trained Model & Dataset

#### Pre-trained Model

The pre-trained model for this task is [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-trained model into the current folder.


#### Dataset

This example demonstrates with [ChnSentiCorp](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/ChnSentiCorp_htl_all), a Chinese sentiment analysis dataset.

Download the dataset:
```shell
python download.py
```

If everything goes well, a folder named `data/` will be created containing all the data files.

The training dataset file should have two fields, `text_a` and `label`, stored in [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format. For example:

```
label  text_a
0   当当网名不符实,订货多日不见送货,询问客服只会推托,只会要求用户再下订单。如此服务留不住顾客的。去别的网站买书服务更好。
0   XP的驱动不好找!我的17号提的货,现在就降价了100元,而且还送杀毒软件!
1   <荐书> 推荐所有喜欢<红楼>的红迷们一定要收藏这本书,要知道当年我听说这本书的时候花很长时间去图书馆找和借都没能如愿,所以这次一看到当当有,马上买了,红迷们也要记得备货哦!
```
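
To sanity-check a file in this layout before training, a minimal sketch using only the Python standard library is shown here. It assumes the columns are separated by a single tab and that the first row is the header; `read_tsv` is a hypothetical helper for illustration and is not part of PaddlePALM, and the sample rows are abridged.

```python
import csv
import io


def read_tsv(stream):
    """Yield (label, text_a) pairs from a tab-separated stream with a header row.

    Hypothetical helper for illustration; not part of PaddlePALM.
    """
    # QUOTE_NONE keeps quote characters inside reviews as literal text.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield int(row["label"]), row["text_a"]


# Abridged in-memory sample in the same layout as the training file.
sample = "label\ttext_a\n0\tXP的驱动不好找!\n1\t推荐所有喜欢<红楼>的红迷们\n"
rows = list(read_tsv(io.StringIO(sample)))
for label, text in rows:
    print(label, text)
```

To check a real file, pass an open file object (opened with `encoding="utf-8"`) instead of the `io.StringIO` sample.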

### Step 2: Train & Predict

The code for this task is in `run.py`. Once you have prepared the pre-trained model and the dataset, run:

```shell
python run.py
```

To select a specific GPU or to train on multiple GPUs, set **`CUDA_VISIBLE_DEVICES`**. For example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: In multi-GPU mode, PaddlePALM automatically splits each batch across the available cards. For example, if `batch_size` is set to 64 and 4 cards are visible to PaddlePALM, the batch size on each card is actually 64/4=16. If you change the `batch_size` or the number of GPUs used in the example, **make sure that `batch_size` is divisible by the number of cards.**
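
The divisibility rule in the note can be checked before launching a run. The sketch below is only an illustration of the arithmetic: `count_visible_cards` and `per_card_batch_size` are hypothetical helpers, not PaddlePALM functions, and the card count is assumed to equal the number of comma-separated entries in `CUDA_VISIBLE_DEVICES`.

```python
def count_visible_cards(cuda_visible_devices):
    """Count the comma-separated device ids in a CUDA_VISIBLE_DEVICES value."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def per_card_batch_size(batch_size, num_cards):
    """Return the batch size on each card, or raise if the split is uneven."""
    if num_cards <= 0:
        raise ValueError("num_cards must be positive")
    if batch_size % num_cards != 0:
        raise ValueError(
            f"batch_size {batch_size} is not divisible by {num_cards} cards"
        )
    return batch_size // num_cards


cards = count_visible_cards("0,1,2,3")
print(per_card_batch_size(64, cards))  # 16
```

With `batch_size` 64 and 3 cards, `per_card_batch_size` raises `ValueError`, which is the situation the note warns against.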


Logs like the following will be printed:

```
step 1/154 (epoch 0), loss: 5.512, speed: 0.51 steps/s
step 2/154 (epoch 0), loss: 2.595, speed: 3.36 steps/s
step 3/154 (epoch 0), loss: 1.798, speed: 3.48 steps/s
```


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], "probs": [0.29290086030960083, 0.7070990800857544], "label": 1}
{"index": 1, "logits": [0.8593899011611938, -0.29743513464927673], "probs": [0.7607553601264954, 0.23924466967582703], "label": 0}
{"index": 2, "logits": [0.7462944388389587, -0.7083730101585388], "probs": [0.8107157349586487, 0.18928426504135132], "label": 0}
```
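
In the sample records above, `probs` matches the softmax of `logits` and `label` is the index of the larger probability. The sketch below reproduces that relationship for the first record; it is inferred from the sample output rather than from the PaddlePALM source, and `softmax` is a local helper written for illustration.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# First prediction record from the sample output above.
line = (
    '{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], '
    '"probs": [0.29290086030960083, 0.7070990800857544], "label": 1}'
)
record = json.loads(line)

probs = softmax(record["logits"])
predicted = probs.index(max(probs))  # class with the highest probability
print(probs, predicted)
```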

### Step 3: Evaluate

Once you have the predictions, run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 1200
accuracy: 0.9575, precision: 0.9634, recall: 0.9523, f1: 0.9578
```
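
For reference, the four reported metrics follow the standard binary-classification definitions. The sketch below computes them from gold and predicted labels on a toy example; it is not the contents of `evaluate.py`, `binary_metrics` is a hypothetical helper, and treating label `1` as the positive class is an assumption.

```python
def binary_metrics(gold, pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not taken from the dataset.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(gold, pred)
print(metrics)

# F1 is the harmonic mean of precision and recall, so the reported figures
# can be cross-checked against each other.
reported_f1 = 2 * 0.9634 * 0.9523 / (0.9634 + 0.9523)
print(round(reported_f1, 4))
```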