Commit c5f1ce96, authored by xixiaoyao

Merge branch 'r0.3-api'

...@@ -186,17 +186,17 @@ Available pretrain items: ...@@ -186,17 +186,17 @@ Available pretrain items:
For more implementation details, see following demos: For more implementation details, see following demos:
- [Sentiment Classification]() - [Sentiment Classification](https://github.com/PaddlePaddle/PALM/tree/master/examples/classification)
- [Quora Question Pairs matching]() - [Quora Question Pairs matching](https://github.com/PaddlePaddle/PALM/tree/master/examples/matching)
- [Tagging]() - [Tagging](https://github.com/PaddlePaddle/PALM/tree/master/examples/tagging)
- [SQuAD machine Reading Comprehension](). - [SQuAD machine Reading Comprehension](https://github.com/PaddlePaddle/PALM/tree/master/examples/mrc).
### set saver ### set saver
To save models/checkpoints and logs during training, just call `trainer.set_saver` method. More implementation details see [this](). To save models/checkpoints and logs during training, just call `trainer.set_saver` method. More implementation details see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples).
### do prediction ### do prediction
To do predict/evaluation after a training stage, just create another three reader, backbone and head instance with `phase='predict'` (repeat step 1~4 above). Then do predicting with `predict` method in trainer (no need to create another trainer). More implementation details see [this](). To do predict/evaluation after a training stage, just create another three reader, backbone and head instance with `phase='predict'` (repeat step 1~4 above). Then do predicting with `predict` method in trainer (no need to create another trainer). More implementation details see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples/predict).
### multi-task learning ### multi-task learning
To run with multi-task learning mode: To run with multi-task learning mode:
...@@ -212,7 +212,7 @@ The save/load and predict operations of a multi_head_trainer is the same as a tr ...@@ -212,7 +212,7 @@ The save/load and predict operations of a multi_head_trainer is the same as a tr
For more implementation details with `multi_head_trainer`, see For more implementation details with `multi_head_trainer`, see
- [ATIS: joint training of dialogue intent recognition and slot filling]() - [ATIS: joint training of dialogue intent recognition and slot filling](https://github.com/PaddlePaddle/PALM/tree/master/examples/multi-task)
- [MRQA: learning reading comprehension auxilarized with mask language model]() (no need to add this for the initial release) - [MRQA: learning reading comprehension auxilarized with mask language model]() (no need to add this for the initial release)
...@@ -222,5 +222,4 @@ This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/P ...@@ -222,5 +222,4 @@ This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/P
## License ## License
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE). This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE).
\ No newline at end of file
...@@ -32,7 +32,7 @@ label text_a ...@@ -32,7 +32,7 @@ label text_a
### Step 2: Train & Predict ### Step 2: Train & Predict
The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run: The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
```shell ```shell
python run.py python run.py
......
...@@ -64,7 +64,7 @@ if __name__ == '__main__': ...@@ -64,7 +64,7 @@ if __name__ == '__main__':
# step 8-1*: load pretrained parameters # step 8-1*: load pretrained parameters
trainer.load_pretrain(pre_params) trainer.load_pretrain(pre_params)
# step 8-2*: set saver to save model # step 8-2*: set saver to save model
# save_steps = n_steps // gpu_dev_count - batch_size # save_steps = n_steps
save_steps = 2396 save_steps = 2396
trainer.set_saver(save_steps=save_steps, save_path=save_path, save_type=save_type) trainer.set_saver(save_steps=save_steps, save_path=save_path, save_type=save_type)
# step 8-3: start training # step 8-3: start training
......
...@@ -21,7 +21,7 @@ python download.py ...@@ -21,7 +21,7 @@ python download.py
After the dataset is downloaded, you should convert the data format for training: After the dataset is downloaded, you should convert the data format for training:
```shell ```shell
python process.py quora_duplicate_questions.tsv train.tsv test.tsv python process.py data/quora_duplicate_questions.tsv data/train.tsv data/test.tsv
``` ```
If everything goes well, there will be a folder named `data/` created with all the converted datas in it. If everything goes well, there will be a folder named `data/` created with all the converted datas in it.
...@@ -40,7 +40,7 @@ What are the differences between the Dell Inspiron 3000, 5000, and 7000 series l ...@@ -40,7 +40,7 @@ What are the differences between the Dell Inspiron 3000, 5000, and 7000 series l
### Step 2: Train & Predict ### Step 2: Train & Predict
The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run: The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
```shell ```shell
python run.py python run.py
......
...@@ -67,7 +67,7 @@ if __name__ == '__main__': ...@@ -67,7 +67,7 @@ if __name__ == '__main__':
# step 8-1*: load pretrained parameters # step 8-1*: load pretrained parameters
trainer.load_pretrain(pre_params, False) trainer.load_pretrain(pre_params, False)
# step 8-2*: set saver to save model # step 8-2*: set saver to save model
# save_steps = (n_steps-16) // gpu_dev_count # save_steps = n_steps-16
save_steps = 6244 save_steps = 6244
trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type) trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
# step 8-3: start training # step 8-3: start training
......
## Examples 4: Machine Reading Comprehension ## Example 4: Machine Reading Comprehension
This task is a machine reading comprehension task. The following sections detail model preparation, dataset preparation, and how to run the task. This task is a machine reading comprehension task. The following sections detail model preparation, dataset preparation, and how to run the task.
### Step 1: Prepare Pre-trained Models & Datasets ### Step 1: Prepare Pre-trained Models & Datasets
...@@ -39,12 +39,13 @@ Here is some example datas: ...@@ -39,12 +39,13 @@ Here is some example datas:
} }
] ]
} }
}
``` ```
### Step 2: Train & Predict ### Step 2: Train & Predict
The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run: The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
```shell ```shell
python run.py python run.py
......
...@@ -64,7 +64,7 @@ if __name__ == '__main__': ...@@ -64,7 +64,7 @@ if __name__ == '__main__':
# step 8-1*: load pretrained parameters # step 8-1*: load pretrained parameters
trainer.load_pretrain(pre_params) trainer.load_pretrain(pre_params)
# step 8-2*: set saver to save model # step 8-2*: set saver to save model
# save_steps = (n_steps-8) // gpu_dev_count // 4 # save_steps = (n_steps-8) // 4
save_steps = 1520 save_steps = 1520
trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type) trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
# step 8-3: start training # step 8-3: start training
......
## Example 6: Joint Training in Dialogue
This task is a slot filling task. During training, an intent detection task is used as an auxiliary task to help train the slot filling model. The following sections detail model preparation, dataset preparation, and how to run the task.
### Step 1: Prepare Pre-trained Models & Datasets
#### Pre-trained Model
The pre-trained model for this task is [ernie-en-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
Make sure you have downloaded the required pre-trained model into the current folder.
#### Dataset
This task uses the `Airline Travel Information System` dataset.
Download dataset:
```shell
python download.py
```
After the dataset is downloaded, you should convert the data format for training:
```shell
python process.py
```
If everything goes well, a folder named `data/atis/` will be created containing all the data.
Here are some example records:
`data/atis/atis_slot/train.tsv` :
```
text_a label
i want to fly from boston at 838 am and arrive in denver at 1110 in the morning O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day
what flights are available from pittsburgh to baltimore on thursday morning O O O O O B-fromloc.city_name O B-toloc.city_name O B-depart_date.day_name B-depart_time.period_of_day
what is the arrival time in san francisco for the 755 am flight leaving washington O O O B-flight_time I-flight_time O B-fromloc.city_name I-fromloc.city_name O O B-depart_time.time I-depart_time.time O O B-fromloc.city_name
cheapest airfare from tacoma to orlando B-cost_relative O O B-fromloc.city_name O B-toloc.city_name
```
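The rows above look space-separated, but the `process.py` added in this commit writes the control character `\2` (`\x02`) after every token and every tag. Below is a minimal sketch of reading one such row back into token/tag pairs, assuming that separator. `parse_slot_row` is a hypothetical helper for illustration, not part of PALM.

```python
SEP = "\x02"  # assumed item separator, as written by process.py ('\2')


def parse_slot_row(row):
    """Split one 'text_a<TAB>label' row into a list of (token, tag) pairs."""
    text, labels = row.rstrip("\n").split("\t")
    # Every item is followed by SEP, so the last split piece is empty: drop it.
    tokens = [t for t in text.split(SEP) if t]
    tags = [t for t in labels.split(SEP) if t]
    if len(tokens) != len(tags):
        raise ValueError("token/tag count mismatch")
    return list(zip(tokens, tags))


# Rebuild the last example row with the assumed separator.
tokens = ["cheapest", "airfare", "from", "tacoma", "to", "orlando"]
tags = ["B-cost_relative", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
row = "".join(t + SEP for t in tokens) + "\t" + "".join(t + SEP for t in tags) + "\n"

pairs = parse_slot_row(row)
print(pairs[0])
```

The length check is a cheap guard: a row whose token and tag counts differ usually means the separator assumption does not hold for that file.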
`data/atis/atis_intent/train.tsv` :
```
label text_a
0 i want to fly from boston at 838 am and arrive in denver at 1110 in the morning
0 what flights are available from pittsburgh to baltimore on thursday morning
1 what is the arrival time in san francisco for the 755 am flight leaving washington
2 cheapest airfare from tacoma to orlando
```
### Step 2: Train & Predict
The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
```shell
python run.py
```
If you want to select a specific GPU or use multiple GPUs for training, set **`CUDA_VISIBLE_DEVICES`**, for example:
```shell
CUDA_VISIBLE_DEVICES=0,1,2 python run.py
```
Logs like the following will be printed:
```
global step: 5, slot: step 3/309 (epoch 0), loss: 68.965, speed: 0.58 steps/s
global step: 10, intent: step 3/311 (epoch 0), loss: 3.407, speed: 8.76 steps/s
global step: 15, slot: step 12/309 (epoch 0), loss: 54.611, speed: 1.21 steps/s
global step: 20, intent: step 7/311 (epoch 0), loss: 3.487, speed: 10.28 steps/s
```
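Because the two tasks alternate in the log, it can help to separate the loss values per task before inspecting them. The sketch below parses lines in the format shown above with a regular expression. The pattern is inferred from these four sample lines only, so treat it as an assumption about the log format rather than a documented contract.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample log lines; other PALM versions may differ.
LOG_RE = re.compile(
    r"global step: (?P<global_step>\d+), (?P<task>\w+): "
    r"step (?P<step>\d+)/(?P<total>\d+) \(epoch (?P<epoch>\d+)\), "
    r"loss: (?P<loss>[\d.]+), speed: (?P<speed>[\d.]+) steps/s"
)


def losses_by_task(lines):
    """Group (global_step, loss) pairs by task name."""
    grouped = defaultdict(list)
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            grouped[match.group("task")].append(
                (int(match.group("global_step")), float(match.group("loss")))
            )
    return dict(grouped)


sample = [
    "global step: 5, slot: step 3/309 (epoch 0), loss: 68.965, speed: 0.58 steps/s",
    "global step: 10, intent: step 3/311 (epoch 0), loss: 3.407, speed: 8.76 steps/s",
    "global step: 15, slot: step 12/309 (epoch 0), loss: 54.611, speed: 1.21 steps/s",
    "global step: 20, intent: step 7/311 (epoch 0), loss: 3.487, speed: 10.28 steps/s",
]
losses = losses_by_task(sample)
print(losses)
```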
After the run, you can view the saved models in the `outputs/` folder.
If you want to use the trained model to predict the `atis_slot & atis_intent` data, run:
```shell
python predict-slot.py
python predict-intent.py
```
If you want to select a specific GPU or use multiple GPUs for prediction, set **`CUDA_VISIBLE_DEVICES`**, for example:
```shell
CUDA_VISIBLE_DEVICES=0,1,2 python predict-slot.py
CUDA_VISIBLE_DEVICES=0,1,2 python predict-intent.py
```
After the run, you can view the predictions in the `outputs/predict-slot` folder and `outputs/predict-intent` folder. Here are some examples of predictions:
`atis_slot`:
```
[129, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 1, 1, 1, 1, 21, 21, 68, 129]
[129, 1, 39, 37, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 23, 3, 4, 129, 129, 129, 129, 129]
[129, 1, 39, 37, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 129, 129, 129, 129, 129, 129, 129, 129]
[129, 1, 1, 1, 1, 1, 1, 14, 15, 1, 2, 1, 5, 19, 1, 39, 37, 129, 24, 129, 129, 129]
```
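Each row of the `atis_slot` output is a list of integer label ids, one per position. The `process.py` in this commit writes `label_map.json` as a tag-to-id dictionary, so turning ids back into tag names needs the inverse mapping. The sketch below shows that inversion. The three-entry map and its id values are hypothetical stand-ins, since the real file is not shown here.

```python
import json


def invert_label_map(label_map):
    """Turn a {tag: id} mapping into {id: tag}."""
    return {idx: tag for tag, idx in label_map.items()}


def decode(pred_ids, id_to_tag, unknown="<unk>"):
    """Map predicted ids to tag names, marking ids missing from the map."""
    return [id_to_tag.get(i, unknown) for i in pred_ids]


# Hypothetical excerpt; the real mapping is data/atis/atis_slot/label_map.json.
toy_label_map = json.loads('{"O": 0, "B-fromloc.city_name": 1, "B-toloc.city_name": 2}')
id_to_tag = invert_label_map(toy_label_map)

decoded = decode([0, 1, 0, 2, 7], id_to_tag)
print(decoded)
```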
`atis_intent`:
```
{"index": 0, "logits": [9.938603401184082, -0.3914794623851776, -0.050973162055015564, -1.0229418277740479, 0.04799401015043259, -0.9632213115692139, -0.6427211761474609, -1.337939739227295, -0.7969412803649902, -1.4441455602645874, -0.6339573264122009, -1.0393054485321045, -0.9242327213287354, -1.9637483358383179, 0.16733427345752716, -0.5280354619026184, -1.7195699214935303, -2.199411630630493, -1.2833174467086792, -1.3081035614013672, -1.6036226749420166, -1.8527079820632935, -2.289180040359497, -2.267214775085449, -2.2578916549682617, -2.2010505199432373], "probs": [0.999531626701355, 3.26210938510485e-05, 4.585415081237443e-05, 1.7348344044876285e-05, 5.06243304698728e-05, 1.8415948943584226e-05, 2.5373808966833167e-05, 1.266065828531282e-05, 2.174747896788176e-05, 1.1384962817828637e-05, 2.5597169951652177e-05, 1.7066764485207386e-05, 1.914815220516175e-05, 6.771284006390488e-06, 5.70411684748251e-05, 2.8457265216275118e-05, 8.644025911053177e-06, 5.349628736439627e-06, 1.3371440218179487e-05, 1.3044088518654462e-05, 9.706698619993404e-06, 7.5665011536329985e-06, 4.890325726591982e-06, 4.99892985317274e-06, 5.045753368904116e-06, 5.340866664482746e-06], "label": 0}
{"index": 1, "logits": [0.8863624930381775, -2.232290506362915, 8.191509246826172, -0.03161466494202614, -0.9149583578109741, -2.172696352005005, -0.3937145471572876, -0.3954394459724426, 1.5333592891693115, 0.8630291223526001, -0.9684226512908936, -2.722721815109253, -0.0060247331857681274, -0.9865402579307556, 1.6328885555267334, 0.3972966969013214, 0.27919167280197144, -1.4911551475524902, -0.9552251696586609, -0.9169244170188904, -0.810670793056488, -1.5118697881698608, -2.0140435695648193, -1.6299077272415161, -1.8589974641799927, -2.07601261138916], "probs": [0.0006675600307062268, 2.9517297662096098e-05, 0.9932880997657776, 0.0002665741485543549, 0.0001102013120544143, 3.132982965325937e-05, 0.00018559220188762993, 0.00018527248175814748, 0.0012749042361974716, 0.0006521637551486492, 0.00010446414671605453, 1.8075270418194123e-05, 0.0002734838053584099, 0.00010258861584588885, 0.0014083238784223795, 0.00040934717981144786, 0.00036374686169438064, 6.193659646669403e-05, 0.00010585198469925672, 0.00010998480865964666, 0.0001223145518451929, 6.0666847275570035e-05, 3.671637750812806e-05, 5.391232480178587e-05, 4.287416595616378e-05, 3.4510172554291785e-05], "label": 0}
{"index": 2, "logits": [9.789957046508789, -0.1730862706899643, -0.7198237776756287, -1.0460278987884521, 0.23521068692207336, -0.5075851678848267, -0.44724929332733154, -1.2945927381515503, -0.6984466314315796, -1.8749892711639404, -0.4631594121456146, -0.6256799697875977, -1.0252169370651245, -1.951456069946289, -0.17572557926177979, -0.6771697402000427, -1.7992591857910156, -2.1457295417785645, -1.4203097820281982, -1.4963451623916626, -1.692310094833374, -1.9219486713409424, -2.2533645629882812, -2.430952310562134, -2.3094685077667236, -2.2399914264678955], "probs": [0.9994625449180603, 4.708383130491711e-05, 2.725377635215409e-05, 1.9667899323394522e-05, 7.082601223373786e-05, 3.3697724575176835e-05, 3.579350595828146e-05, 1.5339375750045292e-05, 2.784266871458385e-05, 8.58508519741008e-06, 3.522853512549773e-05, 2.9944207199150696e-05, 2.0081495677004568e-05, 7.953084605105687e-06, 4.695970710599795e-05, 2.8441407266655006e-05, 9.26048778637778e-06, 6.548832516273251e-06, 1.3527245755540207e-05, 1.2536826943687629e-05, 1.030578732752474e-05, 8.19125762063777e-06, 5.880556273041293e-06, 4.923717369820224e-06, 5.559719284065068e-06, 5.9597273320832755e-06], "label": 0}
{"index": 3, "logits": [9.787659645080566, -0.6223222017288208, -0.03971472755074501, -1.038114070892334, 0.24018540978431702, -0.8904737830162048, -0.7114139795303345, -1.2315020561218262, -0.5120854377746582, -1.4273980855941772, -0.44618460536003113, -1.0241562128067017, -0.9727545380592346, -1.8587366342544556, 0.020689941942691803, -0.6228570342063904, -1.6020199060440063, -2.130260467529297, -1.370570421218872, -1.40530526638031, -1.6782578229904175, -1.94076669216156, -2.2038567066192627, -2.336832284927368, -2.268157720565796, -2.140028953552246], "probs": [0.9994485974311829, 3.0113611501292326e-05, 5.392447565100156e-05, 1.986949791898951e-05, 7.134198676794767e-05, 2.303065048181452e-05, 2.7546762794372626e-05, 1.6375688574044034e-05, 3.362310235388577e-05, 1.3462414244713727e-05, 3.591357381083071e-05, 2.0148761905147694e-05, 2.12115264730528e-05, 8.74570196174318e-06, 5.728216274292208e-05, 3.0097504350123927e-05, 1.1305383850412909e-05, 6.666126409982098e-06, 1.4249604646465741e-05, 1.3763145034317859e-05, 1.0475521776243113e-05, 8.056933438638225e-06, 6.193143690325087e-06, 5.422014055511681e-06, 5.807448815176031e-06, 6.601325367228128e-06], "label": 0}
```
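The `probs` values in these records appear consistent with a softmax over `logits`. In record 0, for example, the largest logit (about 9.94) gets a probability of about 0.9995. The sketch below shows that computation on a short hypothetical logit vector, not on the values above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]  # hypothetical values for illustration
probs = softmax(logits)
top = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
print(top, probs)
```

In record 1 above, the largest probability is at position 2 while the `label` field is 0, so the `label` field should not be assumed to equal the arg-max of `probs` without checking.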
### Step 3: Evaluate
Once you have the predictions, you can run the evaluation scripts to evaluate the model:
```shell
python evaluate-slot.py
python evaluate-intent.py
```
The evaluation results are as follows:
`atis_slot`:
```
precision: 0.894397728514, recall: 0.894104803493, f1: 0.894251242016
```
`atis_intent`:
```
data num: 893
precision: 0.708846584546, recall: 1.0, f1: 0.999999995
```
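When reading the `atis_intent` numbers, note how the `evaluate-intent.py` in this commit computes them: the value printed as `precision` comes from its `accuracy()` function, and `recall` and `f1` count only label `'1'` as positive and label `'0'` as negative. The sketch below reproduces that logic in plain Python on hypothetical toy labels. The zero-denominator guards are an addition that the original script does not have.

```python
def accuracy(preds, labels):
    """Fraction of positions where prediction and label agree."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def binary_prf(preds, labels, pos="1", neg="0"):
    """Precision, recall and F1 with `pos` as the positive class and `neg` as the negative class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == pos and l == pos for p, l in pairs)
    fp = sum(p == pos and l == neg for p, l in pairs)
    fn = sum(p == neg and l == pos for p, l in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard added in this sketch
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard added in this sketch
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return precision, recall, f1


# Hypothetical toy data: three classes, but only '0' and '1' enter precision/recall/F1.
labels = ["0", "0", "1", "2", "1"]
preds = ["0", "1", "1", "0", "0"]

acc = accuracy(preds, labels)
p, r, f1 = binary_prf(preds, labels)
print(acc, p, r, f1)
```

For a task with many classes such as intent detection, accuracy summarizes all classes, while the binary figures describe class `'1'` only.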
# -*- coding: utf-8 -*-
import os
import requests
import tarfile
import shutil
from tqdm import tqdm


def download(src, url):
    """Stream `url` into the local file `src` and return the reported size in bytes."""
    file_size = int(requests.head(url).headers['Content-Length'])
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/'
                      '70.0.3538.67 Safari/537.36'
    }
    pbar = tqdm(total=file_size)
    resp = requests.get(url, headers=header, stream=True)
    # 'wb' overwrites a partial archive left by an interrupted run instead of appending to it
    with open(src, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                # advance by the bytes actually received; the last chunk may be short
                pbar.update(len(chunk))
    pbar.close()
    return file_size


abs_path = os.path.abspath(__file__)
download_url = "https://baidu-nlp.bj.bcebos.com/dmtk_data_1.0.0.tar.gz"
download_path = os.path.join(os.path.dirname(abs_path), "dmtk_data_1.0.0.tar.gz")
target_dir = os.path.dirname(abs_path)
download(download_path, download_url)

with tarfile.open(download_path) as tar:
    tar.extractall(target_dir)
os.remove(download_path)

# keep only the ATIS data; remove the other datasets shipped in the archive
shutil.rmtree(os.path.join(target_dir, 'data/dstc2/'))
shutil.rmtree(os.path.join(target_dir, 'data/mrda/'))
shutil.rmtree(os.path.join(target_dir, 'data/multi-woz/'))
shutil.rmtree(os.path.join(target_dir, 'data/swda/'))
shutil.rmtree(os.path.join(target_dir, 'data/udc/'))
# -*- coding: utf-8 -*-
import json
import numpy as np


def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    return (preds == labels).mean()


def f1(preds, labels):
    # NOTE: only label '1' (positive) and label '0' (negative) are counted here
    preds = np.array(preds)
    labels = np.array(labels)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    p = tp * 1.0 / (tp + fp)
    r = tp * 1.0 / (tp + fn) * 1.0
    f1 = (2 * p * r) / (p + r + 1e-8)
    return f1


def recall(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall = TP / (TP + FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    re = tp * 1.0 / (tp + fn)
    return re


def res_evaluate(res_dir="./outputs/predict-intent/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir = "./data/atis/atis_intent/test.tsv"
    elif eval_phase == 'dev':
        data_dir = "./data/dev.tsv"
    else:
        raise ValueError('eval_phase should be dev or test')

    labels = []
    with open(data_dir, "r") as file:
        for line in file:
            line = line.split("\t")
            label = line[0]
            if label == 'label':  # skip the header row
                continue
            labels.append(str(label))

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))

    assert len(labels) == len(preds), "prediction result doesn't match to labels"
    print('data num: {}'.format(len(labels)))
    # NOTE: the value printed as "precision" is computed by accuracy()
    print("precision: {}, recall: {}, f1: {}".format(accuracy(preds, labels), recall(preds, labels), f1(preds, labels)))


res_evaluate()
# coding=utf-8
import paddlepalm as palm
import json
from paddlepalm.distribute import gpu_dev_count


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 16
    num_epochs = 6
    print_steps = 5
    num_classes = 26
    vocab_path = './pretrain/ernie-en-base/vocab.txt'
    predict_file = './data/atis/atis_intent/test.tsv'
    save_path = './outputs/'
    pred_output = './outputs/predict-intent/'
    save_type = 'ckpt'
    random_seed = 0
    pre_params = './pretrain/ernie-en-base/params'
    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # ----------------------- for prediction -----------------------

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
    # step 1-2: load the prediction data
    predict_cls_reader.load_data(predict_file, batch_size)

    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_cls_reader.register_with(pred_ernie)

    # step 4: create the task output head
    cls_pred_head = palm.head.Classify(num_classes, input_dim, phase='predict')

    # step 5-1: create a task trainer
    trainer = palm.Trainer("intent")
    # step 5-2: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, cls_pred_head)

    # step 6: load the trained checkpoint
    pred_model_path = './outputs/ckpt.step9282'
    pred_ckpt = trainer.load_ckpt(pred_model_path)

    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_cls_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir=pred_output)
\ No newline at end of file
# coding=utf-8
import paddlepalm as palm
import json
from paddlepalm.distribute import gpu_dev_count


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 16
    num_epochs = 6
    print_steps = 5
    num_classes = 130
    label_map = './data/atis/atis_slot/label_map.json'
    vocab_path = './pretrain/ernie-en-base/vocab.txt'
    predict_file = './data/atis/atis_slot/test.tsv'
    save_path = './outputs/'
    pred_output = './outputs/predict-slot/'
    save_type = 'ckpt'
    random_seed = 0
    pre_params = './pretrain/ernie-en-base/params'
    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # ----------------------- for prediction -----------------------

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
    # step 1-2: load the prediction data
    predict_seq_label_reader.load_data(predict_file, batch_size)

    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_seq_label_reader.register_with(pred_ernie)

    # step 4: create the task output head
    seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')

    # step 5-1: create a task trainer
    trainer_seq_label = palm.Trainer("slot")
    # step 5-2: build forward graph with backbone and task head
    trainer_seq_label.build_predict_forward(pred_ernie, seq_label_pred_head)

    # step 6: load the trained checkpoint
    pred_model_path = './outputs/ckpt.step9282'
    pred_ckpt = trainer_seq_label.load_ckpt(pred_model_path)

    # step 7: fit prepared reader and data
    trainer_seq_label.fit_reader(predict_seq_label_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer_seq_label.predict(print_steps=print_steps, output_dir=pred_output)
\ No newline at end of file
import os
import json

label_new = "data/atis/atis_slot/label_map.json"
label_old = "data/atis/atis_slot/map_tag_slot_id.txt"
train_old = "data/atis/atis_slot/train.txt"
train_new = "data/atis/atis_slot/train.tsv"
dev_old = "data/atis/atis_slot/dev.txt"
dev_new = "data/atis/atis_slot/dev.tsv"
test_old = "data/atis/atis_slot/test.txt"
test_new = "data/atis/atis_slot/test.tsv"

intent_test = "data/atis/atis_intent/test.tsv"
os.rename("data/atis/atis_intent/test.txt", intent_test)
intent_train = "data/atis/atis_intent/train.tsv"
os.rename("data/atis/atis_intent/train.txt", intent_train)
intent_dev = "data/atis/atis_intent/dev.tsv"
os.rename("data/atis/atis_intent/dev.txt", intent_dev)

# prepend the header row to each intent file
for path in (intent_dev, intent_test, intent_train):
    with open(path, 'r+') as f:
        content = f.read()
        f.seek(0, 0)
        f.write("label\ttext_a\n" + content)

# Build the tag -> id map (saved as JSON) and the id -> tag map used for conversion.
# The output files are created by open(..., "w"), so no os.mknod call is needed.
tag_to_id = {}
id_to_tag = {}
with open(label_old, "r") as f:
    for line in f:
        fields = line.rstrip('\n').split('\t')
        tag_to_id[fields[0]] = int(fields[1])
        id_to_tag[fields[1]] = fields[0]
with open(label_new, "w") as f2:
    f2.write(json.dumps(tag_to_id))


def convert(old_path, new_path):
    # Rewrite one slot file as TSV; every token and every tag name is followed by '\2'.
    with open(old_path, "r") as f, open(new_path, "w") as f2:
        f2.write("text_a\tlabel\n")
        for line in f:
            fields = line.split('\t')
            text = fields[0].split(' ')
            label = fields[1].split(' ')
            for t in text:
                f2.write(t)
                f2.write('\2')
            f2.write('\t')
            for t in label:
                if t.endswith('\n'):
                    t = t[:-1]
                f2.write(id_to_tag[t])
                f2.write('\2')
            f2.write('\n')


convert(train_old, train_new)
convert(test_old, test_new)
convert(dev_old, dev_new)

os.remove(label_old)
os.remove(train_old)
os.remove(test_old)
os.remove(dev_old)
\ No newline at end of file
# coding=utf-8
import paddlepalm as palm
import json
from paddlepalm.distribute import gpu_dev_count


if __name__ == '__main__':

    # configs
    max_seqlen = 128
    batch_size = 16
    num_epochs = 20
    print_steps = 5
    lr = 2e-5
    num_classes = 130
    weight_decay = 0.01
    num_classes_intent = 26
    dropout_prob = 0.1
    random_seed = 0
    label_map = './data/atis/atis_slot/label_map.json'
    vocab_path = './pretrain/ernie-en-base/vocab.txt'

    train_slot = './data/atis/atis_slot/train.tsv'
    train_intent = './data/atis/atis_intent/train.tsv'
    predict_file = './data/atis/atis_slot/test.tsv'
    save_path = './outputs/'
    pred_output = './outputs/predict/'
    save_type = 'ckpt'

    pre_params = './pretrain/ernie-en-base/params'
    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # ----------------------- for training -----------------------

    # step 1-1: create readers for training
    seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed)

    # step 1-2: load the training data
    seq_label_reader.load_data(train_slot, file_format='tsv', num_epochs=None, batch_size=batch_size)
    cls_reader.load_data(train_intent, batch_size=batch_size, num_epochs=None)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in readers
    seq_label_reader.register_with(ernie)
    cls_reader.register_with(ernie)

    # step 4: create task output heads
    seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
    cls_head = palm.head.Classify(num_classes_intent, input_dim, dropout_prob)

    # step 5-1: create one trainer per task, then combine them in a multi-head trainer
    trainer_seq_label = palm.Trainer("slot", mix_ratio=1.0)
    trainer_cls = palm.Trainer("intent", mix_ratio=1.0)
    trainer = palm.MultiHeadTrainer([trainer_seq_label, trainer_cls])
    # step 5-2: build forward graph with backbone and task head
    loss1 = trainer_cls.build_forward(ernie, cls_head)
    loss2 = trainer_seq_label.build_forward(ernie, seq_label_head)
    loss_var = trainer.build_forward()

    # step 6-1*: use warmup
    n_steps = seq_label_reader.num_examples * 1.5 * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: create an optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)

    # step 7: fit prepared reader and data
    trainer.fit_readers_with_mixratio([seq_label_reader, cls_reader], "slot", num_epochs)

    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
    save_steps = int(n_steps-batch_size) // 2
    # save_steps = 10
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)

    # step 8-3: start training
    trainer.train(print_steps=print_steps)
\ No newline at end of file
## Examples 3: Tagging ## Example 3: Tagging
This task is a named entity recognition task. The following sections detail model preparation, dataset preparation, and how to run the task. This task is a named entity recognition task. The following sections detail model preparation, dataset preparation, and how to run the task.
### Step 1: Prepare Pre-trained Models & Datasets ### Step 1: Prepare Pre-trained Models & Datasets
...@@ -34,7 +34,7 @@ text_a label ...@@ -34,7 +34,7 @@ text_a label
### Step 2: Train & Predict ### Step 2: Train & Predict
The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run: The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
```shell ```shell
python run.py python run.py
......
...@@ -32,26 +32,26 @@ if __name__ == '__main__': ...@@ -32,26 +32,26 @@ if __name__ == '__main__':
# ----------------------- for training ----------------------- # ----------------------- for training -----------------------
# step 1-1: create readers for training # step 1-1: create readers for training
ner_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed) seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
# step 1-2: load the training data # step 1-2: load the training data
ner_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size) seq_label_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size)
# step 2: create a backbone of the model to extract text features # step 2: create a backbone of the model to extract text features
ernie = palm.backbone.ERNIE.from_config(config) ernie = palm.backbone.ERNIE.from_config(config)
# step 3: register the backbone in reader # step 3: register the backbone in reader
ner_reader.register_with(ernie) seq_label_reader.register_with(ernie)
# step 4: create the task output head # step 4: create the task output head
ner_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob) seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
# step 5-1: create a task trainer # step 5-1: create a task trainer
trainer = palm.Trainer(task_name) trainer = palm.Trainer(task_name)
# step 5-2: build forward graph with backbone and task head # step 5-2: build forward graph with backbone and task head
loss_var = trainer.build_forward(ernie, ner_head) loss_var = trainer.build_forward(ernie, seq_label_head)
# step 6-1*: use warmup # step 6-1*: use warmup
n_steps = ner_reader.num_examples * num_epochs // batch_size n_steps = seq_label_reader.num_examples * num_epochs // batch_size
warmup_steps = int(0.1 * n_steps) warmup_steps = int(0.1 * n_steps)
print('total_steps: {}'.format(n_steps)) print('total_steps: {}'.format(n_steps))
print('warmup_steps: {}'.format(warmup_steps)) print('warmup_steps: {}'.format(warmup_steps))
...@@ -62,43 +62,43 @@ if __name__ == '__main__': ...@@ -62,43 +62,43 @@ if __name__ == '__main__':
trainer.build_backward(optimizer=adam, weight_decay=weight_decay) trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
# step 7: fit prepared reader and data # step 7: fit prepared reader and data
trainer.fit_reader(ner_reader) trainer.fit_reader(seq_label_reader)
# step 8-1*: load pretrained parameters # # step 8-1*: load pretrained parameters
trainer.load_pretrain(pre_params) # trainer.load_pretrain(pre_params)
# step 8-2*: set saver to save model # # step 8-2*: set saver to save model
save_steps = (n_steps-20)// gpu_dev_count save_steps = 1951
print('save_steps: {}'.format(save_steps)) # print('save_steps: {}'.format(save_steps))
trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type) # trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
# step 8-3: start training # # step 8-3: start training
trainer.train(print_steps=train_print_steps) # trainer.train(print_steps=train_print_steps)
# ----------------------- for prediction ----------------------- # ----------------------- for prediction -----------------------
# step 1-1: create readers for prediction # step 1-1: create readers for prediction
print('prepare to predict...') print('prepare to predict...')
predict_ner_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, phase='predict') predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
# step 1-2: load the prediction data # step 1-2: load the prediction data
predict_ner_reader.load_data(predict_file, batch_size) predict_seq_label_reader.load_data(predict_file, batch_size)
# step 2: create a backbone of the model to extract text features # step 2: create a backbone of the model to extract text features
pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict') pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
# step 3: register the backbone in reader # step 3: register the backbone in reader
predict_ner_reader.register_with(pred_ernie) predict_seq_label_reader.register_with(pred_ernie)
# step 4: create the task output head # step 4: create the task output head
ner_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict') seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
# step 5: build forward graph with backbone and task head # step 5: build forward graph with backbone and task head
trainer.build_predict_forward(pred_ernie, ner_pred_head) trainer.build_predict_forward(pred_ernie, seq_label_pred_head)
# step 6: load the trained checkpoint # step 6: load the trained checkpoint
pred_model_path = './outputs/ckpt.step' + str(save_steps) pred_model_path = './outputs/ckpt.step' + str(save_steps)
pred_ckpt = trainer.load_ckpt(pred_model_path) pred_ckpt = trainer.load_ckpt(pred_model_path)
# step 7: fit prepared reader and data # step 7: fit prepared reader and data
trainer.fit_reader(predict_ner_reader, phase='predict') trainer.fit_reader(predict_seq_label_reader, phase='predict')
# step 8: predict # step 8: predict
print('predicting..') print('predicting..')
......
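The step bookkeeping in the training part of this script (total steps and a 10% warmup) can be sketched in isolation. This is a minimal sketch: the helper name `plan_steps` and the sample numbers are illustrative assumptions and are not part of PALM.

```python
def plan_steps(num_examples, num_epochs, batch_size, warmup_ratio=0.1):
    """Return (total_steps, warmup_steps) with the same integer arithmetic as the script."""
    # Floor division: a trailing partial batch does not count as a step.
    n_steps = num_examples * num_epochs // batch_size
    # int() truncates, so the warmup length is rounded down.
    warmup_steps = int(warmup_ratio * n_steps)
    return n_steps, warmup_steps


# Hypothetical dataset size, chosen only for illustration.
total, warmup = plan_steps(num_examples=1000, num_epochs=3, batch_size=16)
print('total_steps: {}'.format(total))    # total_steps: 187
print('warmup_steps: {}'.format(warmup))  # warmup_steps: 18
```

Because both values are rounded down, a checkpoint suffix such as `ckpt.step<save_steps>` only exists if training actually reaches that step count.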
...@@ -57,9 +57,9 @@ def yield_pieces(data, distribute_strategy, batch_size): ...@@ -57,9 +57,9 @@ def yield_pieces(data, distribute_strategy, batch_size):
yield temp yield temp
def data_feeder(reader, postprocess_fn=None, prefetch_steps=2): def data_feeder(reader, postprocess_fn=None, prefetch_steps=2, phase='train', is_multi=False):
if postprocess_fn is None: if postprocess_fn is None:
def postprocess_fn(batch): def postprocess_fn(batch, id=-1, phase='train', is_multi=False):
return batch return batch
def worker(reader, dev_count, queue): def worker(reader, dev_count, queue):
...@@ -90,6 +90,10 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2): ...@@ -90,6 +90,10 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2):
queue.task_done() queue.task_done()
if ret is not None: if ret is not None:
batches, num_pad = ret batches, num_pad = ret
if dev_count > 1 and phase == 'train' and is_multi:
id = batches[0]['__task_id'][0]
else:
id = -1
batch_buf = [] batch_buf = []
flag_buf = [] flag_buf = []
for idx, batch in enumerate(batches): for idx, batch in enumerate(batches):
...@@ -97,8 +101,8 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2): ...@@ -97,8 +101,8 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2):
flag = idx-len(batches) < -num_pad flag = idx-len(batches) < -num_pad
# if num_pad > 0: # if num_pad > 0:
# num_pad -= 1 # num_pad -= 1
# batch = postprocess_fn(batch, id) batch = postprocess_fn(batch, id, phase, is_multi=is_multi)
batch = postprocess_fn(batch) # batch = postprocess_fn(batch)
batch_buf.append(batch) batch_buf.append(batch)
flag_buf.append(flag) flag_buf.append(flag)
yield batch_buf, flag_buf yield batch_buf, flag_buf
......
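The expression `flag = idx-len(batches) < -num_pad` in `data_feeder` marks which per-device batches are real. The sketch below reproduces only that expression in plain Python, assuming (as the expression implies) that the `num_pad` padded batches sit at the end of the list. The helper name `real_batch_flags` is hypothetical.

```python
def real_batch_flags(num_batches, num_pad):
    """True for real batches, False for the trailing `num_pad` padded ones."""
    # idx - num_batches < -num_pad  is equivalent to  idx < num_batches - num_pad
    return [idx - num_batches < -num_pad for idx in range(num_batches)]


print(real_batch_flags(4, 1))  # [True, True, True, False]
print(real_batch_flags(4, 0))  # [True, True, True, True]
```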
...@@ -111,7 +111,7 @@ class Classify(Head): ...@@ -111,7 +111,7 @@ class Classify(Head):
with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer: with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
for i in range(len(self._preds)): for i in range(len(self._preds)):
label = 0 if self._preds[i][0] > self._preds[i][1] else 1 label = 0 if self._preds[i][0] > self._preds[i][1] else 1
result = {'index': i, 'label': label, 'logits': self._preds[i], 'probs': self._preds[i]} result = {'index': i, 'label': label, 'logits': self._preds[i], 'probs': self._probs[i]}
result = json.dumps(result) result = json.dumps(result)
writer.write(result+'\n') writer.write(result+'\n')
print('Predictions saved at '+os.path.join(output_dir, 'predictions.json')) print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
......
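The change in this hunk writes `self._probs[i]` instead of `self._preds[i]` into the `probs` field, so logits and probabilities are no longer the same values in `predictions.json`. The sketch below shows what such a record could look like. It assumes the probabilities are a softmax over the logits, which the diff does not confirm, and the helpers `softmax` and `make_record` are illustrative, not PALM functions.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_record(index, logits):
    """Build one JSON line with separate 'logits' and 'probs' fields."""
    probs = softmax(logits)  # assumption: probs are derived from logits via softmax
    label = 0 if logits[0] > logits[1] else 1
    return json.dumps({'index': index, 'label': label, 'logits': logits, 'probs': probs})


print(make_record(0, [2.0, 0.5]))
```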
...@@ -24,24 +24,21 @@ class MaskLM(Head): ...@@ -24,24 +24,21 @@ class MaskLM(Head):
''' '''
mlm mlm
''' '''
def __init__(self, input_dim, vocab_size, hidden_act, initializer_range, dropout_prob=0.0, \ def __init__(self, input_dim, vocab_size, hidden_act, dropout_prob=0.0, \
param_initializer_range=0.02, phase='train'): param_initializer_range=0.02, phase='train'):
self._is_training = phase == 'train' self._is_training = phase == 'train'
self._emb_size = input_dim self._emb_size = input_dim
self._hidden_size = input_dim self._hidden_size = input_dim
self._dropout_prob = dropout_prob if phase == 'train' else 0.0 self._dropout_prob = dropout_prob if phase == 'train' else 0.0
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=param_initializer_range)
self._preds = [] self._preds = []
self._vocab_size = vocab_size self._vocab_size = vocab_size
self._hidden_act = hidden_act self._hidden_act = hidden_act
self._initializer_range = initializer_range self._initializer_range = param_initializer_range
@property @property
def inputs_attrs(self): def inputs_attrs(self):
reader = { reader = {
"token_ids":[[-1, -1], 'int64'],
"mask_label": [[-1], 'int64'], "mask_label": [[-1], 'int64'],
"mask_pos": [[-1], 'int64'], "mask_pos": [[-1], 'int64'],
} }
...@@ -61,21 +58,19 @@ class MaskLM(Head): ...@@ -61,21 +58,19 @@ class MaskLM(Head):
def build(self, inputs, scope_name=""): def build(self, inputs, scope_name=""):
mask_pos = inputs["reader"]["mask_pos"] mask_pos = inputs["reader"]["mask_pos"]
word_emb = inputs["backbone"]["embedding_table"]
enc_out = inputs["backbone"]["encoder_outputs"]
if self._is_training: if self._is_training:
mask_label = inputs["reader"]["mask_label"] mask_label = inputs["reader"]["mask_label"]
l1 = fluid.layers.shape(inputs["reader"]["token_ids"] )[0] l1 = enc_out.shape[0]
# bxs = inputs["reader"]["token_ids"].shape[2].value l2 = enc_out.shape[1]
l2 = fluid.layers.shape(inputs["reader"]["token_ids"][0])[0] bxs = fluid.layers.fill_constant(shape=[1], value=l1*l2, dtype='int64')
bxs = (l1*l2).astype(np.int64)
# max_position = inputs["reader"]["batchsize_x_seqlen"] - 1
max_position = bxs - 1 max_position = bxs - 1
mask_pos = fluid.layers.elementwise_min(mask_pos, max_position) mask_pos = fluid.layers.elementwise_min(mask_pos, max_position)
mask_pos.stop_gradient = True mask_pos.stop_gradient = True
word_emb = inputs["backbone"]["embedding_table"]
enc_out = inputs["backbone"]["encoder_outputs"]
emb_size = word_emb.shape[-1] emb_size = word_emb.shape[-1]
_param_initializer = fluid.initializer.TruncatedNormal( _param_initializer = fluid.initializer.TruncatedNormal(
...@@ -95,7 +90,7 @@ class MaskLM(Head): ...@@ -95,7 +90,7 @@ class MaskLM(Head):
param_attr=fluid.ParamAttr( param_attr=fluid.ParamAttr(
name=scope_name+'mask_lm_trans_fc.w_0', name=scope_name+'mask_lm_trans_fc.w_0',
initializer=_param_initializer), initializer=_param_initializer),
bias_attr=fluid.ParamAttr(name=scope_name+'mask_lm_trans_fc.b_0')) bias_attr=fluid.ParamAttr(name=scope_name+'mask_lm_trans_fc.b_0'))
# transform: layer norm # transform: layer norm
mask_trans_feat = pre_process_layer( mask_trans_feat = pre_process_layer(
mask_trans_feat, 'n', name=scope_name+'mask_lm_trans') mask_trans_feat, 'n', name=scope_name+'mask_lm_trans')
......
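In the new `MaskLM.build`, `mask_pos` is clamped to `batch_size * seq_len - 1`, with both factors now taken from the shape of `enc_out` rather than from `token_ids`. The clamping itself can be sketched with plain Python lists. This is not Paddle code, and `clamp_mask_positions` is a hypothetical helper name.

```python
def clamp_mask_positions(mask_pos, batch_size, seq_len):
    """Clamp flattened mask positions so they never index past the last token."""
    max_position = batch_size * seq_len - 1
    # Plain-Python stand-in for an element-wise minimum against max_position.
    return [min(p, max_position) for p in mask_pos]


print(clamp_mask_positions([3, 17, 40], batch_size=2, seq_len=16))  # [3, 17, 31]
```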
...@@ -5,6 +5,7 @@ from paddlepalm.distribute import gpu_dev_count, cpu_dev_count ...@@ -5,6 +5,7 @@ from paddlepalm.distribute import gpu_dev_count, cpu_dev_count
from paddlepalm import Trainer from paddlepalm import Trainer
from paddlepalm.utils import reader_helper from paddlepalm.utils import reader_helper
import numpy as np import numpy as np
from paddlepalm.distribute import gpu_dev_count, data_feeder, decode_fake
import time import time
dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count
...@@ -55,7 +56,8 @@ class MultiHeadTrainer(Trainer): ...@@ -55,7 +56,8 @@ class MultiHeadTrainer(Trainer):
for t in self._trainers: for t in self._trainers:
t._set_multitask() t._set_multitask()
def build_forward(self, backbone, heads): # def build_forward(self, backbone, heads):
def build_forward(self):
""" """
Build the forward computation graph for training, which is usually built from the input layer to the loss node. Build the forward computation graph for training, which is usually built from the input layer to the loss node.
...@@ -66,20 +68,13 @@ class MultiHeadTrainer(Trainer): ...@@ -66,20 +68,13 @@ class MultiHeadTrainer(Trainer):
Return: Return:
- loss_var: a Variable object. The computational graph variable(node) of loss. - loss_var: a Variable object. The computational graph variable(node) of loss.
""" """
head_dict = {}
if isinstance(heads, list): backbone = self._trainers[0]._backbone
head_dict = {k.name: v for k,v in zip(self._trainers, heads)} for i in self._trainers:
elif isinstance(heads, dict): assert i._task_head is not None and i._backbone is not None, "You should build forward for the {} task".format(i._name)
head_dict = heads assert i._backbone == backbone, "The backbone for each task must be the same"
else: head_dict[i._name] = i._task_head
raise ValueError()
num_heads = len(self._trainers)
assert len(head_dict) == num_heads
for t in self._trainers:
assert t.name in head_dict, "expected: {}, exists: {}".format(t.name, head_dict.keys())
train_prog = fluid.Program() train_prog = fluid.Program()
train_init_prog = fluid.Program() train_init_prog = fluid.Program()
self._train_prog = train_prog self._train_prog = train_prog
...@@ -87,27 +82,15 @@ class MultiHeadTrainer(Trainer): ...@@ -87,27 +82,15 @@ class MultiHeadTrainer(Trainer):
def get_loss(i): def get_loss(i):
head = head_dict[self._trainers[i].name] head = head_dict[self._trainers[i].name]
# loss_var = self._trainers[i].build_forward(backbone, head, train_prog, train_init_prog) self._trainers[i]._lock_prog = True
loss_var = self._trainers[i].build_forward(backbone, head) loss_var = self._trainers[i].build_forward(backbone, head)
self._trainers[i]._lock_prog = False
return loss_var return loss_var
# task_fns = {} task_fns = {i: lambda i=i: get_loss(i) for i in range(len(self._trainers))}
# for i in range(num_heads):
# def task_loss():
# task_id = i
# return lambda: get_loss(task_id)
# task_fns[i] = task_loss()
# task_fns = {i: lambda: get_loss(i) for i in range(num_heads)}
task_fns = {i: lambda i=i: get_loss(i) for i in range(num_heads)}
with fluid.program_guard(train_prog, train_init_prog): with fluid.program_guard(train_prog, train_init_prog):
task_id_var = fluid.data(name="__task_id",shape=[1],dtype='int64') task_id_var = fluid.data(name="__task_id",shape=[1],dtype='int64')
# task_id_var = fluid.layers.fill_constant(shape=[1],dtype='int64', value=1)
# print(task_id_var.name)
loss_var = layers.switch_case( loss_var = layers.switch_case(
branch_index=task_id_var, branch_index=task_id_var,
...@@ -200,15 +183,15 @@ class MultiHeadTrainer(Trainer): ...@@ -200,15 +183,15 @@ class MultiHeadTrainer(Trainer):
feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs) feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)
if gpu_dev_count > 1: if gpu_dev_count > 1:
distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn) distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase, is_multi=True)
else: else:
distribute_feeder_fn = iterator_fn distribute_feeder_fn = iterator_fn()
if phase == 'train': if phase == 'train':
self._train_reader = distribute_feeder_fn() self._train_reader = distribute_feeder_fn
self._feed_batch_process_fn = feed_batch_process_fn self._feed_batch_process_fn = feed_batch_process_fn
elif phase == 'predict': elif phase == 'predict':
self._predict_reader = distribute_feeder_fn() self._predict_reader = distribute_feeder_fn
self._pred_feed_batch_process_fn = feed_batch_process_fn self._pred_feed_batch_process_fn = feed_batch_process_fn
def _check_finish(self, task_name, silent=False): def _check_finish(self, task_name, silent=False):
...@@ -241,7 +224,6 @@ class MultiHeadTrainer(Trainer): ...@@ -241,7 +224,6 @@ class MultiHeadTrainer(Trainer):
task_rt_outputs = {k[len(self._trainers[task_id].name+'.'):]: v for k,v in rt_outputs.items() if k.startswith(self._trainers[task_id].name+'.')} task_rt_outputs = {k[len(self._trainers[task_id].name+'.'):]: v for k,v in rt_outputs.items() if k.startswith(self._trainers[task_id].name+'.')}
self._trainers[task_id]._task_head.batch_postprocess(task_rt_outputs) self._trainers[task_id]._task_head.batch_postprocess(task_rt_outputs)
if print_steps > 0 and self._cur_train_step % print_steps == 0: if print_steps > 0 and self._cur_train_step % print_steps == 0:
loss = rt_outputs[self._trainers[task_id].name+'.loss'] loss = rt_outputs[self._trainers[task_id].name+'.loss']
loss = np.mean(np.squeeze(loss)).tolist() loss = np.mean(np.squeeze(loss)).tolist()
...@@ -276,8 +258,8 @@ class MultiHeadTrainer(Trainer): ...@@ -276,8 +258,8 @@ class MultiHeadTrainer(Trainer):
def train_one_step(self, batch): def train_one_step(self, batch):
if dev_count > 1: if dev_count > 1:
assert isinstance(batch, list) assert isinstance(batch, tuple)
task_id = batch[0]['__task_id'][0] task_id = batch[0][0]['__task_id'][0]
else: else:
assert isinstance(batch, dict) assert isinstance(batch, dict)
task_id = batch['__task_id'][0] task_id = batch['__task_id'][0]
......
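The `lambda i=i: get_loss(i)` form kept in `build_forward` matters because a plain `lambda: get_loss(i)` (the commented-out variant on the old side) looks up `i` only when called, after the comprehension has finished. A minimal, self-contained sketch with a stand-in `get_loss`:

```python
def get_loss(i):
    """Stand-in for the real get_loss; just reports which task it was called for."""
    return 'loss_of_task_{}'.format(i)


# Late binding: every lambda sees the final value of i.
late = {i: lambda: get_loss(i) for i in range(3)}
# Default-argument binding: each lambda captures its own i at definition time.
bound = {i: lambda i=i: get_loss(i) for i in range(3)}

print([late[k]() for k in range(3)])
# ['loss_of_task_2', 'loss_of_task_2', 'loss_of_task_2']
print([bound[k]() for k in range(3)])
# ['loss_of_task_0', 'loss_of_task_1', 'loss_of_task_2']
```

With the late-binding form, every branch handed to `switch_case` would build the loss of the last task.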
...@@ -34,7 +34,6 @@ class MaskLMReader(Reader): ...@@ -34,7 +34,6 @@ class MaskLMReader(Reader):
for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese' for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'
self._register.add('token_ids')
self._register.add('mask_pos') self._register.add('mask_pos')
if phase == 'train': if phase == 'train':
self._register.add('mask_label') self._register.add('mask_label')
......
...@@ -46,7 +46,7 @@ class Trainer(object): ...@@ -46,7 +46,7 @@ class Trainer(object):
self._pred_reader = None self._pred_reader = None
self._task_head = None self._task_head = None
self._pred_head = None self._pred_head = None
self._train_reader = None self._train_reader = None
self._predict_reader = None self._predict_reader = None
self._train_iterator = None self._train_iterator = None
...@@ -54,6 +54,8 @@ class Trainer(object): ...@@ -54,6 +54,8 @@ class Trainer(object):
self._train_init = False self._train_init = False
self._predict_init = False self._predict_init = False
self._train_init_prog = None
self._pred_init_prog = None
self._check_save = lambda: False self._check_save = lambda: False
...@@ -105,6 +107,7 @@ class Trainer(object): ...@@ -105,6 +107,7 @@ class Trainer(object):
'fetch_list': 'self._pred_fetch_name_list'} 'fetch_list': 'self._pred_fetch_name_list'}
self._lock = False self._lock = False
self._lock_prog = False
self._build_forward = False self._build_forward = False
def build_forward(self, backbone, task_head): def build_forward(self, backbone, task_head):
...@@ -159,9 +162,11 @@ class Trainer(object): ...@@ -159,9 +162,11 @@ class Trainer(object):
train_prog = fluid.Program() train_prog = fluid.Program()
train_init_prog = fluid.Program() train_init_prog = fluid.Program()
self._train_prog = train_prog if not self._lock_prog:
self._train_init_prog = train_init_prog self._train_prog = train_prog
if not self._multi_task: self._train_init_prog = train_init_prog
if not self._lock_prog:
with fluid.program_guard(train_prog, train_init_prog): with fluid.program_guard(train_prog, train_init_prog):
net_inputs = reader_helper.create_net_inputs(input_attrs, async=False) net_inputs = reader_helper.create_net_inputs(input_attrs, async=False)
bb_output_vars = backbone.build(net_inputs) bb_output_vars = backbone.build(net_inputs)
...@@ -182,7 +187,7 @@ class Trainer(object): ...@@ -182,7 +187,7 @@ class Trainer(object):
task_inputs['reader'] = task_inputs_from_reader task_inputs['reader'] = task_inputs_from_reader
scope = self.name+'.' scope = self.name+'.'
if not self._multi_task: if not self._lock_prog:
with fluid.program_guard(train_prog, train_init_prog): with fluid.program_guard(train_prog, train_init_prog):
with fluid.unique_name.guard(scope): with fluid.unique_name.guard(scope):
output_vars = self._build_head(task_inputs, phase='train', scope=scope) output_vars = self._build_head(task_inputs, phase='train', scope=scope)
...@@ -207,7 +212,7 @@ class Trainer(object): ...@@ -207,7 +212,7 @@ class Trainer(object):
# task_id_vec = layers.one_hot(task_id_var, num_instances) # task_id_vec = layers.one_hot(task_id_var, num_instances)
# losses = fluid.layers.concat([task_output_vars[inst.name+'/loss'] for inst in instances], axis=0) # losses = fluid.layers.concat([task_output_vars[inst.name+'/loss'] for inst in instances], axis=0)
# loss = layers.reduce_sum(task_id_vec * losses) # loss = layers.reduce_sum(task_id_vec * losses)
if not self._multi_task: if not self._lock_prog:
with fluid.program_guard(train_prog, train_init_prog): with fluid.program_guard(train_prog, train_init_prog):
loss_var = fluid.layers.reduce_sum(task_output_vars[self.name+'.loss']) loss_var = fluid.layers.reduce_sum(task_output_vars[self.name+'.loss'])
else: else:
...@@ -386,8 +391,9 @@ class Trainer(object): ...@@ -386,8 +391,9 @@ class Trainer(object):
reader_helper.check_io(self._task_head.inputs_attrs['backbone'], self._backbone.outputs_attr, in_name='task_head(backbone, train)', out_name='backbone') reader_helper.check_io(self._task_head.inputs_attrs['backbone'], self._backbone.outputs_attr, in_name='task_head(backbone, train)', out_name='backbone')
elif phase == 'predict': elif phase == 'predict':
self._predict_reader = reader self._predict_reader = reader
tail = self._num_examples % batch_size > 0 # tail = self._num_examples % batch_size > 0
self._pred_steps_pur_epoch = reader.num_examples // batch_size + 1 if tail else 0 # self._pred_steps_pur_epoch = reader.num_examples // batch_size + 1 if tail else 0
self._pred_steps_pur_epoch = reader.num_examples // batch_size
shape_and_dtypes = self._pred_shape_and_dtypes shape_and_dtypes = self._pred_shape_and_dtypes
name_to_position = self._pred_name_to_position name_to_position = self._pred_name_to_position
net_inputs = self._pred_net_inputs net_inputs = self._pred_net_inputs
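The old expression `reader.num_examples // batch_size + 1 if tail else 0` parses as `(n // bs + 1) if tail else 0`, so it yields 0 steps whenever the data divides evenly. The replacement `n // bs` avoids that but drops a trailing partial batch. The sketch below compares both with a ceiling division, which is shown only as a possible alternative and is not what this commit does. All function names are illustrative.

```python
def steps_old(n, bs):
    tail = n % bs > 0
    # Conditional expressions bind loosest: (n // bs + 1) if tail else 0
    return n // bs + 1 if tail else 0


def steps_new(n, bs):
    # Behaviour after this commit: floor division only.
    return n // bs


def steps_ceil(n, bs):
    # Alternative (assumption, not in the commit): count the partial batch too.
    return (n + bs - 1) // bs


for n in (100, 105):
    print(n, steps_old(n, 10), steps_new(n, 10), steps_ceil(n, 10))
# 100 0 10 10
# 105 11 10 11
```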
...@@ -415,7 +421,7 @@ class Trainer(object): ...@@ -415,7 +421,7 @@ class Trainer(object):
self._raw_iterator_fn = iterator_fn self._raw_iterator_fn = iterator_fn
feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs) feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)
if gpu_dev_count > 1: if gpu_dev_count > 1:
distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn) distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase)
else: else:
distribute_feeder_fn = iterator_fn() distribute_feeder_fn = iterator_fn()
...@@ -427,6 +433,7 @@ class Trainer(object): ...@@ -427,6 +433,7 @@ class Trainer(object):
self._pred_feed_batch_process_fn = feed_batch_process_fn self._pred_feed_batch_process_fn = feed_batch_process_fn
# return distribute_feeder_fn() # return distribute_feeder_fn()
def load_ckpt(self, model_path): def load_ckpt(self, model_path):
""" """
load training checkpoint for further training or predicting. load training checkpoint for further training or predicting.
...@@ -465,7 +472,7 @@ class Trainer(object): ...@@ -465,7 +472,7 @@ class Trainer(object):
strict=True) strict=True)
else: else:
raise Exception("model not found. You should at least build_forward or build_predict_forward to load its checkpoint.") raise Exception("model not found. You should at least build_forward or build_predict_forward to load its checkpoint.")
def load_predict_model(self, model_path, convert=False): def load_predict_model(self, model_path, convert=False):
""" """
load pretrain models(backbone) for training. load pretrain models(backbone) for training.
...@@ -510,6 +517,7 @@ class Trainer(object): ...@@ -510,6 +517,7 @@ class Trainer(object):
save_type: a string. The type of saved model. Currently supports checkpoint (ckpt) and predict model (predict); the default is ckpt. To save both types, set it to "ckpt,predict". save_type: a string. The type of saved model. Currently supports checkpoint (ckpt) and predict model (predict); the default is ckpt. To save both types, set it to "ckpt,predict".
""" """
save_type = save_type.split(',') save_type = save_type.split(',')
if 'predict' in save_type: if 'predict' in save_type:
...@@ -534,6 +542,7 @@ class Trainer(object): ...@@ -534,6 +542,7 @@ class Trainer(object):
def temp_func(): def temp_func():
if (self._save_predict or self._save_ckpt) and self._cur_train_step % save_steps == 0: if (self._save_predict or self._save_ckpt) and self._cur_train_step % save_steps == 0:
if self._save_predict: if self._save_predict:
self._save(save_path, suffix='pred.step'+str(self._cur_train_step)) self._save(save_path, suffix='pred.step'+str(self._cur_train_step))
print('predict model has been saved at '+os.path.join(save_path, 'pred.step'+str(self._cur_train_step))) print('predict model has been saved at '+os.path.join(save_path, 'pred.step'+str(self._cur_train_step)))
...@@ -600,7 +609,7 @@ class Trainer(object): ...@@ -600,7 +609,7 @@ class Trainer(object):
(self._cur_train_step-1) % self._steps_pur_epoch + 1 , self._steps_pur_epoch, self._cur_train_epoch, (self._cur_train_step-1) % self._steps_pur_epoch + 1 , self._steps_pur_epoch, self._cur_train_epoch,
loss, print_steps / time_cost)) loss, print_steps / time_cost))
time_begin = time.time() time_begin = time.time()
self._check_save() # self._check_save()
# if cur_task.train_finish and cur_task.cur_train_step + cur_task.cur_train_epoch * cur_task.steps_pur_epoch == cur_task.expected_train_steps: # if cur_task.train_finish and cur_task.cur_train_step + cur_task.cur_train_epoch * cur_task.steps_pur_epoch == cur_task.expected_train_steps:
# print(cur_task.name+': train finished!') # print(cur_task.name+': train finished!')
# cur_task.save() # cur_task.save()
...@@ -718,15 +727,16 @@ class Trainer(object): ...@@ -718,15 +727,16 @@ class Trainer(object):
feed, mask = batch feed, mask = batch
rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list) rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)
num_fakes = decode_fake(len(rt_outputs[0]), mask, self._train_batch_size) num_fakes = decode_fake(len(rt_outputs[0]), mask, self._train_batch_size)
for _ in range(num_fakes): if num_fakes:
for item in rt_outputs: rt_outputs = [i[:-num_fakes] for i in rt_outputs]
item.pop()
else: else:
feed = self._feed_batch_process_fn(batch) feed = self._feed_batch_process_fn(batch)
rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list) rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)
rt_outputs = {k:v for k,v in zip(self._fetch_names, rt_outputs)} rt_outputs = {k:v for k,v in zip(self._fetch_names, rt_outputs)}
self._cur_train_step += 1 self._cur_train_step += 1
self._check_save()
self._cur_train_epoch = (self._cur_train_step-1) // self._steps_pur_epoch self._cur_train_epoch = (self._cur_train_step-1) // self._steps_pur_epoch
return rt_outputs return rt_outputs
...@@ -735,9 +745,8 @@ class Trainer(object): ...@@ -735,9 +745,8 @@ class Trainer(object):
feed, mask = batch feed, mask = batch
rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list) rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list)
num_fakes = decode_fake(len(rt_outputs[0]), mask, self._predict_batch_size) num_fakes = decode_fake(len(rt_outputs[0]), mask, self._predict_batch_size)
for _ in range(num_fakes): if num_fakes:
for item in rt_outputs: rt_outputs = [i[:-num_fakes] for i in rt_outputs]
item.pop()
else: else:
feed = self._pred_feed_batch_process_fn(batch) feed = self._pred_feed_batch_process_fn(batch)
rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list) rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list)
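The `if num_fakes:` guard around `rt_outputs = [i[:-num_fakes] for i in rt_outputs]` is needed because slicing with `[:-0]` is the same as `[:0]` and returns an empty sequence. A minimal sketch using plain lists, with the hypothetical helper name `drop_fakes`:

```python
def drop_fakes(outputs, num_fakes):
    """Remove the last `num_fakes` entries from every fetched output."""
    if num_fakes:
        # Safe only for num_fakes > 0; o[:-0] would drop everything.
        outputs = [o[:-num_fakes] for o in outputs]
    return outputs


print([1, 2, 3][:-0])                          # []
print(drop_fakes([[1, 2, 3], [4, 5, 6]], 1))   # [[1, 2], [4, 5]]
print(drop_fakes([[1, 2, 3], [4, 5, 6]], 0))   # [[1, 2, 3], [4, 5, 6]]
```

Unlike the old `item.pop()` loop, slicing works for any sliceable sequence and does not mutate the fetched outputs in place.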
...@@ -750,7 +759,7 @@ class Trainer(object): ...@@ -750,7 +759,7 @@ class Trainer(object):
@property @property
def name(self): def name(self):
return self._name return self._name
@property @property
def num_examples(self): def num_examples(self):
return self._num_examples return self._num_examples
......
...@@ -21,13 +21,20 @@ import numpy as np ...@@ -21,13 +21,20 @@ import numpy as np
import paddle import paddle
from paddle import fluid from paddle import fluid
from paddle.fluid import layers from paddle.fluid import layers
from paddlepalm.distribute import gpu_dev_count, cpu_dev_count
dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count
def create_feed_batch_process_fn(net_inputs): def create_feed_batch_process_fn(net_inputs):
def feed_batch_process_fn(data): def feed_batch_process_fn(data, id=-1, phase='train', is_multi=False):
temp = {} temp = {}
for q, var in net_inputs.items(): if dev_count > 1 and phase=='train' and is_multi:
inputs = net_inputs[id]
else:
inputs= net_inputs
for q, var in inputs.items():
if isinstance(var, str) or isinstance(var, unicode): if isinstance(var, str) or isinstance(var, unicode):
temp[var] = data[q] temp[var] = data[q]
else: else:
......