Merge branch 'r0.3-api'

c5f1ce96 · xixiaoyao · 1ff55105 · 15ab54d7
23 changed files
--- a/README.md
+++ b/README.md
@@ -186,17 +186,17 @@ Available pretrain items:
 For more implementation details, see the following demos:
- [Sentiment Classification]()
+- [Sentiment Classification](https://github.com/PaddlePaddle/PALM/tree/master/examples/classification)
- [Quora Question Pairs matching]()
+- [Quora Question Pairs matching](https://github.com/PaddlePaddle/PALM/tree/master/examples/matching)
- [Tagging]()
+- [Tagging](https://github.com/PaddlePaddle/PALM/tree/master/examples/tagging)
- [SQuAD machine Reading Comprehension]().
+- [SQuAD machine Reading Comprehension](https://github.com/PaddlePaddle/PALM/tree/master/examples/mrc).
 ### set saver
-To save models/checkpoints and logs during training, just call `trainer.set_saver` method. More implementation details see [this]().
+To save models/checkpoints and logs during training, call the `trainer.set_saver` method. For more implementation details, see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples).
 ### do prediction
-To do predict/evaluation after a training stage, just create another three reader, backbone and head instance with `phase='predict'` (repeat step 1~4 above). Then do predicting with `predict` method in trainer (no need to create another trainer). More implementation details see [this]().
+To run prediction/evaluation after a training stage, create another reader, backbone and head instance with `phase='predict'` (repeat steps 1~4 above). Then predict with the trainer's `predict` method (no need to create another trainer). For more implementation details, see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples/predict).
 ### multi-task learning
 To run with multi-task learning mode:
@@ -212,7 +212,7 @@ The save/load and predict operations of a multi_head_trainer is the same as a tr
 For more implementation details with `multi_head_trainer`, see
- [ATIS: joint training of dialogue intent recognition and slot filling]()
+- [ATIS: joint training of dialogue intent recognition and slot filling](https://github.com/PaddlePaddle/PALM/tree/master/examples/multi-task)
 - [MRQA: learning reading comprehension with a masked language model as an auxiliary task]() (not needed for the initial release)
@@ -222,5 +222,4 @@ This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/P
 ## License
 This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE).
\ No newline at end of file
--- a/examples/classification/README.md
+++ b/examples/classification/README.md
@@ -32,7 +32,7 @@ label  text_a
 ### Step 2: Train & Predict
-The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
+The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
 ```shell
 python run.py

--- a/examples/classification/run.py
+++ b/examples/classification/run.py
@@ -64,7 +64,7 @@ if __name__ == '__main__':
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
-    # save_steps = n_steps // gpu_dev_count - batch_size
+    # save_steps = n_steps 
    save_steps = 2396
    trainer.set_saver(save_steps=save_steps, save_path=save_path, save_type=save_type)
    # step 8-3: start training

--- a/examples/matching/README.md
+++ b/examples/matching/README.md
@@ -21,7 +21,7 @@ python download.py
 After the dataset is downloaded, you should convert the data format for training:
 ```shell
-python process.py quora_duplicate_questions.tsv train.tsv test.tsv
+python process.py data/quora_duplicate_questions.tsv data/train.tsv data/test.tsv
 ```
 If everything goes well, a folder named `data/` will be created containing all the converted data.
@@ -40,7 +40,7 @@ What are the differences between the Dell Inspiron 3000, 5000, and 7000 series l
 ### Step 2: Train & Predict
-The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
+The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
 ```shell
 python run.py

--- a/examples/matching/run.py
+++ b/examples/matching/run.py
@@ -67,7 +67,7 @@ if __name__ == '__main__':
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params, False)
    # step 8-2*: set saver to save model
-    # save_steps = (n_steps-16) // gpu_dev_count
+    # save_steps = n_steps-16
    save_steps = 6244
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
    # step 8-3: start training

--- a/examples/mrc/README.md
+++ b/examples/mrc/README.md
-## Examples 4: Machine Reading Comprehension
+## Example 4: Machine Reading Comprehension
 This task is a machine reading comprehension task. The following sections detail model preparation, dataset preparation, and how to run the task.
 ### Step 1: Prepare Pre-trained Models & Datasets
@@ -39,12 +39,13 @@ Here is some example datas:
                 }
               ]
             }
+         }
 ```
 ### Step 2: Train & Predict
-The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
+The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
 ```shell
 python run.py

--- a/examples/mrc/run.py
+++ b/examples/mrc/run.py
@@ -64,7 +64,7 @@ if __name__ == '__main__':
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
-    # save_steps = (n_steps-8) // gpu_dev_count // 4
+    # save_steps = (n_steps-8)  // 4
    save_steps = 1520
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
    # step 8-3: start training

--- a/examples/multi-task/README.md
+++ b/examples/multi-task/README.md
+## Example 6: Joint Training in Dialogue
+This task is a slot filling task. During training, an intent recognition task is trained jointly to assist the slot filling model. The following sections detail model preparation, dataset preparation, and how to run the task.
+### Step 1: Prepare Pre-trained Models & Datasets
+#### Pre-trained Model
+The pre-trained model for this task is: [ernie-en-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
+Make sure you have downloaded the required pre-trained model into the current folder.
+#### Dataset
+This task uses the `Airline Travel Information System` dataset. 
+Download dataset:
+```shell
+python download.py
+```
+After the dataset is downloaded, you should convert the data format for training:
+```shell
+python process.py
+```
+If everything goes well, a folder named `data/atis/` will be created containing all the data.
+Here are some data examples:
+`data/atis/atis_slot/train.tsv` :
+```
+text_a	label
+i want to fly from boston at 838 am and arrive in denver at 1110 in the morning 	O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day 
+what flights are available from pittsburgh to baltimore on thursday morning 	O O O O O B-fromloc.city_name O B-toloc.city_name O B-depart_date.day_name B-depart_time.period_of_day 
+what is the arrival time in san francisco for the 755 am flight leaving washington 	O O O B-flight_time I-flight_time O B-fromloc.city_name I-fromloc.city_name O O B-depart_time.time I-depart_time.time O O B-fromloc.city_name 
+cheapest airfare from tacoma to orlando 	B-cost_relative O O B-fromloc.city_name O B-toloc.city_name 
+```
+`data/atis/atis_intent/train.tsv` :
+```
+label	text_a
+0	i want to fly from boston at 838 am and arrive in denver at 1110 in the morning
+0	what flights are available from pittsburgh to baltimore on thursday morning
+1	what is the arrival time in san francisco for the 755 am flight leaving washington
+2	cheapest airfare from tacoma to orlando
+```
+### Step 2: Train & Predict
+The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
+```shell
+python run.py
+```
+If you want to use a specific GPU or multiple GPUs for training, set **`CUDA_VISIBLE_DEVICES`**, for example:
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2 python run.py
+```
+Some logs will be shown below:
+```
+global step: 5,   slot: step 3/309 (epoch 0), loss: 68.965, speed: 0.58 steps/s
+global step: 10, intent: step 3/311 (epoch 0), loss: 3.407, speed: 8.76 steps/s
+global step: 15,   slot: step 12/309 (epoch 0), loss: 54.611, speed: 1.21 steps/s
+global step: 20, intent: step 7/311 (epoch 0), loss: 3.487, speed: 10.28 steps/s
+```
+After the run, you can view the saved models in the `outputs/` folder.
+If you want to use the trained model to predict the `atis_slot & atis_intent` data, run:
+```shell
+python predict-slot.py
+python predict-intent.py
+```
+If you want to use a specific GPU or multiple GPUs for prediction, set **`CUDA_VISIBLE_DEVICES`**, for example:
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2 python predict-slot.py
+CUDA_VISIBLE_DEVICES=0,1,2 python predict-intent.py
+```
+After the run, you can view the predictions in the `outputs/predict-slot` folder and `outputs/predict-intent` folder. Here are some examples of predictions:
+`atis_slot`:
+```
+[129, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 1, 1, 1, 1, 21, 21, 68, 129]
+[129, 1, 39, 37, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 23, 3, 4, 129, 129, 129, 129, 129]
+[129, 1, 39, 37, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 129, 129, 129, 129, 129, 129, 129, 129]
+[129, 1, 1, 1, 1, 1, 1, 14, 15, 1, 2, 1, 5, 19, 1, 39, 37, 129, 24, 129, 129, 129]
+```
+`atis_intent`:
+```
+{"index": 0, "logits": [9.938603401184082, -0.3914794623851776, -0.050973162055015564, -1.0229418277740479, 0.04799401015043259, -0.9632213115692139, -0.6427211761474609, -1.337939739227295, -0.7969412803649902, -1.4441455602645874, -0.6339573264122009, -1.0393054485321045, -0.9242327213287354, -1.9637483358383179, 0.16733427345752716, -0.5280354619026184, -1.7195699214935303, -2.199411630630493, -1.2833174467086792, -1.3081035614013672, -1.6036226749420166, -1.8527079820632935, -2.289180040359497, -2.267214775085449, -2.2578916549682617, -2.2010505199432373], "probs": [0.999531626701355, 3.26210938510485e-05, 4.585415081237443e-05, 1.7348344044876285e-05, 5.06243304698728e-05, 1.8415948943584226e-05, 2.5373808966833167e-05, 1.266065828531282e-05, 2.174747896788176e-05, 1.1384962817828637e-05, 2.5597169951652177e-05, 1.7066764485207386e-05, 1.914815220516175e-05, 6.771284006390488e-06, 5.70411684748251e-05, 2.8457265216275118e-05, 8.644025911053177e-06, 5.349628736439627e-06, 1.3371440218179487e-05, 1.3044088518654462e-05, 9.706698619993404e-06, 7.5665011536329985e-06, 4.890325726591982e-06, 4.99892985317274e-06, 5.045753368904116e-06, 5.340866664482746e-06], "label": 0}
+{"index": 1, "logits": [0.8863624930381775, -2.232290506362915, 8.191509246826172, -0.03161466494202614, -0.9149583578109741, -2.172696352005005, -0.3937145471572876, -0.3954394459724426, 1.5333592891693115, 0.8630291223526001, -0.9684226512908936, -2.722721815109253, -0.0060247331857681274, -0.9865402579307556, 1.6328885555267334, 0.3972966969013214, 0.27919167280197144, -1.4911551475524902, -0.9552251696586609, -0.9169244170188904, -0.810670793056488, -1.5118697881698608, -2.0140435695648193, -1.6299077272415161, -1.8589974641799927, -2.07601261138916], "probs": [0.0006675600307062268, 2.9517297662096098e-05, 0.9932880997657776, 0.0002665741485543549, 0.0001102013120544143, 3.132982965325937e-05, 0.00018559220188762993, 0.00018527248175814748, 0.0012749042361974716, 0.0006521637551486492, 0.00010446414671605453, 1.8075270418194123e-05, 0.0002734838053584099, 0.00010258861584588885, 0.0014083238784223795, 0.00040934717981144786, 0.00036374686169438064, 6.193659646669403e-05, 0.00010585198469925672, 0.00010998480865964666, 0.0001223145518451929, 6.0666847275570035e-05, 3.671637750812806e-05, 5.391232480178587e-05, 4.287416595616378e-05, 3.4510172554291785e-05], "label": 0}
+{"index": 2, "logits": [9.789957046508789, -0.1730862706899643, -0.7198237776756287, -1.0460278987884521, 0.23521068692207336, -0.5075851678848267, -0.44724929332733154, -1.2945927381515503, -0.6984466314315796, -1.8749892711639404, -0.4631594121456146, -0.6256799697875977, -1.0252169370651245, -1.951456069946289, -0.17572557926177979, -0.6771697402000427, -1.7992591857910156, -2.1457295417785645, -1.4203097820281982, -1.4963451623916626, -1.692310094833374, -1.9219486713409424, -2.2533645629882812, -2.430952310562134, -2.3094685077667236, -2.2399914264678955], "probs": [0.9994625449180603, 4.708383130491711e-05, 2.725377635215409e-05, 1.9667899323394522e-05, 7.082601223373786e-05, 3.3697724575176835e-05, 3.579350595828146e-05, 1.5339375750045292e-05, 2.784266871458385e-05, 8.58508519741008e-06, 3.522853512549773e-05, 2.9944207199150696e-05, 2.0081495677004568e-05, 7.953084605105687e-06, 4.695970710599795e-05, 2.8441407266655006e-05, 9.26048778637778e-06, 6.548832516273251e-06, 1.3527245755540207e-05, 1.2536826943687629e-05, 1.030578732752474e-05, 8.19125762063777e-06, 5.880556273041293e-06, 4.923717369820224e-06, 5.559719284065068e-06, 5.9597273320832755e-06], "label": 0}
+{"index": 3, "logits": [9.787659645080566, -0.6223222017288208, -0.03971472755074501, -1.038114070892334, 0.24018540978431702, -0.8904737830162048, -0.7114139795303345, -1.2315020561218262, -0.5120854377746582, -1.4273980855941772, -0.44618460536003113, -1.0241562128067017, -0.9727545380592346, -1.8587366342544556, 0.020689941942691803, -0.6228570342063904, -1.6020199060440063, -2.130260467529297, -1.370570421218872, -1.40530526638031, -1.6782578229904175, -1.94076669216156, -2.2038567066192627, -2.336832284927368, -2.268157720565796, -2.140028953552246], "probs": [0.9994485974311829, 3.0113611501292326e-05, 5.392447565100156e-05, 1.986949791898951e-05, 7.134198676794767e-05, 2.303065048181452e-05, 2.7546762794372626e-05, 1.6375688574044034e-05, 3.362310235388577e-05, 1.3462414244713727e-05, 3.591357381083071e-05, 2.0148761905147694e-05, 2.12115264730528e-05, 8.74570196174318e-06, 5.728216274292208e-05, 3.0097504350123927e-05, 1.1305383850412909e-05, 6.666126409982098e-06, 1.4249604646465741e-05, 1.3763145034317859e-05, 1.0475521776243113e-05, 8.056933438638225e-06, 6.193143690325087e-06, 5.422014055511681e-06, 5.807448815176031e-06, 6.601325367228128e-06], "label": 0}
+```
+### Step 3: Evaluate
+Once you have the predictions, you can run the evaluation scripts to evaluate the model:
+```shell
+python evaluate-slot.py
+python evaluate-intent.py
+```
+The evaluation results are as follows:
+`atis_slot`:
+```
+precision: 0.894397728514, recall: 0.894104803493, f1: 0.894251242016
+```
+`atis_intent`:
+```
+data num: 893
+precision: 0.708846584546, recall: 1.0, f1: 0.999999995
+```
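The `probs` values in the `atis_intent` predictions above look like a softmax over `logits`. The sketch below shows that relationship in plain Python. It assumes a standard softmax; the diff does not show how PALM computes `probs`, and the helper names `softmax` and `argmax` are illustrative, not part of the library.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtract the maximum first so math.exp cannot overflow.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy logits for illustration only (not taken from the output above).
toy_logits = [2.0, 0.0, -1.0]
toy_probs = softmax(toy_logits)
print(argmax(toy_probs), toy_probs)
```

Taking the `argmax` of `probs` (or of `logits`, which gives the same index) is the usual way to turn such a record into a predicted class id.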
--- a/examples/multi-task/download.py
+++ b/examples/multi-task/download.py
+#  -*- coding: utf-8 -*-
+import os
+import requests
+import tarfile
+import shutil
+from tqdm import tqdm
+def download(src, url):
+    file_size = int(requests.head(url).headers['Content-Length'])
+    header = {
+        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/'
+        '70.0.3538.67 Safari/537.36'
+    }
+    pbar = tqdm(total=file_size)
+    resp = requests.get(url, headers=header, stream=True)
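+    # 'wb' starts from an empty file, so a rerun cannot append to a partial download
+    with open(src, 'wb') as f:
+        for chunk in resp.iter_content(chunk_size=1024):
+            if chunk:
+                f.write(chunk)
+                # advance by the real chunk size (the last chunk may be shorter)
+                pbar.update(len(chunk))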
+    pbar.close()
+    return file_size
+abs_path = os.path.abspath(__file__)
+download_url = "https://baidu-nlp.bj.bcebos.com/dmtk_data_1.0.0.tar.gz"
+download_path = os.path.join(os.path.dirname(abs_path), "dmtk_data_1.0.0.tar.gz")
+target_dir = os.path.dirname(abs_path)
+download(download_path, download_url)
+# close the archive before deleting it
+with tarfile.open(download_path) as tar:
+    tar.extractall(target_dir)
+os.remove(download_path)
+shutil.rmtree(os.path.join(target_dir, 'data/dstc2/'))
+shutil.rmtree(os.path.join(target_dir, 'data/mrda/'))
+shutil.rmtree(os.path.join(target_dir, 'data/multi-woz/'))
+shutil.rmtree(os.path.join(target_dir, 'data/swda/'))
+shutil.rmtree(os.path.join(target_dir, 'data/udc/'))
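The chunked write loop in `download.py` can be isolated from the network code so that it is easy to check. The following is a minimal sketch, not the script's actual implementation: `save_chunks` and `on_progress` are hypothetical names, and the demo feeds in literal byte strings instead of an HTTP response.

```python
import os
import tempfile


def save_chunks(chunks, dst_path, on_progress=None):
    """Write an iterable of byte chunks to dst_path and return the byte count."""
    written = 0
    # 'wb' truncates any partial file left behind by an earlier run.
    with open(dst_path, "wb") as f:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            f.write(chunk)
            written += len(chunk)
            if on_progress is not None:
                on_progress(len(chunk))  # report the real chunk size
    return written


# Demo with in-memory chunks; no network access is needed.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
total = save_chunks([b"abc", b"", b"de"], demo_path)
print(total)
```

With `requests` and `tqdm`, as used in `download.py`, `chunks` would be `resp.iter_content(chunk_size=1024)` and `on_progress` would be `pbar.update`.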
--- a/examples/multi-task/evaluate-intent.py
+++ b/examples/multi-task/evaluate-intent.py
+#  -*- coding: utf-8 -*-
+import json
+import numpy as np
+def accuracy(preds, labels):
+    preds = np.array(preds)
+    labels = np.array(labels) 
+    return (preds == labels).mean()
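+# f1 and recall below treat '1' as the positive class and '0' as the negative class;
+# pairs involving any other label are not counted.
+def f1(preds, labels):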
+    preds = np.array(preds)
+    labels = np.array(labels)
+    tp = np.sum((labels == '1') & (preds == '1'))
+    tn = np.sum((labels == '0') & (preds == '0'))
+    fp = np.sum((labels == '0') & (preds == '1'))
+    fn = np.sum((labels == '1') & (preds == '0'))
+    p = tp * 1.0 / (tp + fp) 
+    r = tp * 1.0 / (tp + fn) * 1.0
+    f1 = (2 * p * r) / (p + r + 1e-8)
+    return f1
+def recall(preds, labels):
+    preds = np.array(preds)
+    labels = np.array(labels)
+    # recall=TP/(TP+FN)
+    tp = np.sum((labels == '1') & (preds == '1'))
+    fn = np.sum((labels == '1') & (preds == '0'))
+    re = tp * 1.0 / (tp + fn)
+    return re
+def res_evaluate(res_dir="./outputs/predict-intent/predictions.json", eval_phase='test'):
+    if eval_phase == 'test':
+        data_dir="./data/atis/atis_intent/test.tsv"
+    elif eval_phase == 'dev':
+        data_dir="./data/atis/atis_intent/dev.tsv"
+    else:
+        raise ValueError('eval_phase should be dev or test')
+    labels = []
+    with open(data_dir, "r") as file:
+        for line in file:
+            line = line.split("\t")
+            label = line[0]
+            # skip the header row
+            if label=='label':
+                continue
+            labels.append(str(label))
+    preds = []
+    with open(res_dir, "r") as file:
+        for line in file:
+            line = json.loads(line)
+            pred = line['label']
+            preds.append(str(pred))
+    assert len(labels) == len(preds), "prediction result doesn't match to labels"
+    print('data num: {}'.format(len(labels)))
+    # note: the value printed as "precision" is the overall accuracy
+    print("precision: {}, recall: {}, f1: {}".format(accuracy(preds, labels), recall(preds, labels), f1(preds, labels)))
+res_evaluate()
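`evaluate-intent.py` scores a 26-class task with helpers that only count the labels `'0'` and `'1'`. A multi-class alternative is sketched below in plain Python. It is an illustration, not the script's method: `macro_prf` is a hypothetical helper, and macro averaging is one of several reasonable choices.

```python
def accuracy(preds, labels):
    """Fraction of predictions that equal the gold label."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def macro_prf(preds, labels):
    """Macro-averaged precision, recall and F1 over the classes in labels."""
    classes = sorted(set(labels))
    ps, rs, fs = [], [], []
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    n = len(classes)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n


# Toy data for illustration only.
toy_preds = ["0", "0", "1", "2"]
toy_labels = ["0", "1", "1", "2"]
print(accuracy(toy_preds, toy_labels), macro_prf(toy_preds, toy_labels))
```

Each class is scored one-vs-rest and the per-class scores are averaged with equal weight, so rare intents count as much as frequent ones.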
--- a/examples/multi-task/predict-intent.py
+++ b/examples/multi-task/predict-intent.py
+# coding=utf-8
+import paddlepalm as palm
+import json
+from paddlepalm.distribute import gpu_dev_count
+if __name__ == '__main__':
+    # configs
+    max_seqlen = 256
+    batch_size = 16
+    num_epochs = 6 
+    print_steps = 5
+    num_classes = 26
+    vocab_path = './pretrain/ernie-en-base/vocab.txt'
+    predict_file = './data/atis/atis_intent/test.tsv'
+    save_path = './outputs/'
+    pred_output = './outputs/predict-intent/'
+    save_type = 'ckpt'
+    random_seed = 0
+    pre_params = './pretrain/ernie-en-base/params'
+    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
+    input_dim = config['hidden_size']
+    # -----------------------  for prediction ----------------------- 
+    # step 1-1: create readers for prediction
+    print('prepare to predict...')
+    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
+    # step 1-2: load the prediction data
+    predict_cls_reader.load_data(predict_file, batch_size)
+    # step 2: create a backbone of the model to extract text features
+    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
+    # step 3: register the backbone in reader
+    predict_cls_reader.register_with(pred_ernie)
+    # step 4: create the task output head
+    cls_pred_head = palm.head.Classify(num_classes, input_dim, phase='predict')
+    # step 5-1: create a task trainer
+    trainer = palm.Trainer("intent")
+    # step 5-2: build forward graph with backbone and task head
+    trainer.build_predict_forward(pred_ernie, cls_pred_head)
+    # step 6: load the trained checkpoint
+    pred_model_path = './outputs/ckpt.step9282'
+    pred_ckpt = trainer.load_ckpt(pred_model_path)
+    # step 7: fit prepared reader and data
+    trainer.fit_reader(predict_cls_reader, phase='predict')
+    # step 8: predict
+    print('predicting..')
+    trainer.predict(print_steps=print_steps, output_dir=pred_output)
\ No newline at end of file
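The predictions written by `predict-intent.py` are read back line by line as JSON in `evaluate-intent.py`. The sketch below parses records of that shape and flags those whose stored `label` differs from the position of the largest `probs` entry. The field names follow the sample output in the README; `read_predictions` and `argmax_label` are hypothetical helpers, and the two records are made up.

```python
import json


def read_predictions(lines):
    """Parse JSON-lines records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def argmax_label(record):
    """Class id with the highest probability in one record."""
    probs = record["probs"]
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up records with the same fields as the README sample.
sample = [
    '{"index": 0, "probs": [0.9, 0.05, 0.05], "label": 0}',
    '{"index": 1, "probs": [0.1, 0.2, 0.7], "label": 0}',
]
records = read_predictions(sample)
mismatches = [r["index"] for r in records if argmax_label(r) != r["label"]]
print(mismatches)
```

In the README sample, the record with `"index": 1` has its largest probability at position 2 while `label` is 0, so a check of this kind may be worth running on real output.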
--- a/examples/multi-task/predict-slot.py
+++ b/examples/multi-task/predict-slot.py
+# coding=utf-8
+import paddlepalm as palm
+import json
+from paddlepalm.distribute import gpu_dev_count
+if __name__ == '__main__':
+    # configs
+    max_seqlen = 256
+    batch_size = 16
+    num_epochs = 6 
+    print_steps = 5
+    num_classes = 130
+    label_map = './data/atis/atis_slot/label_map.json'
+    vocab_path = './pretrain/ernie-en-base/vocab.txt'
+    predict_file = './data/atis/atis_slot/test.tsv'
+    save_path = './outputs/'
+    pred_output = './outputs/predict-slot/'
+    save_type = 'ckpt'
+    random_seed = 0
+    pre_params = './pretrain/ernie-en-base/params'
+    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
+    input_dim = config['hidden_size']
+    # -----------------------  for prediction ----------------------- 
+    # step 1-1: create readers for prediction
+    print('prepare to predict...')
+    predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
+    # step 1-2: load the prediction data
+    predict_seq_label_reader.load_data(predict_file, batch_size)
+    # step 2: create a backbone of the model to extract text features
+    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
+    # step 3: register the backbone in reader
+    predict_seq_label_reader.register_with(pred_ernie)
+    # step 4: create the task output head
+    seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
+    # step 5-1: create a task trainer
+    trainer_seq_label = palm.Trainer("slot")
+    # step 5-2: build forward graph with backbone and task head
+    trainer_seq_label.build_predict_forward(pred_ernie, seq_label_pred_head)
+    # step 6: load the trained checkpoint
+    pred_model_path = './outputs/ckpt.step9282'
+    pred_ckpt = trainer_seq_label.load_ckpt(pred_model_path)
+    # step 7: fit prepared reader and data
+    trainer_seq_label.fit_reader(predict_seq_label_reader, phase='predict')
+    # step 8: predict
+    print('predicting..')
+    trainer_seq_label.predict(print_steps=print_steps, output_dir=pred_output)
\ No newline at end of file
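The slot predictions are lists of integer ids, and `label_map.json` (written by `process.py`) maps each tag to its id. Decoding a prediction back to tags could look like the sketch below. The three-entry map is hypothetical, and `invert_label_map` and `decode` are illustrative names rather than PALM functions.

```python
import json


def invert_label_map(label_map):
    """label_map maps tag -> id; return the reverse id -> tag mapping."""
    return {idx: tag for tag, idx in label_map.items()}


def decode(pred_ids, id_to_tag, unknown="<unk>"):
    """Translate predicted ids to tags, marking ids missing from the map."""
    return [id_to_tag.get(i, unknown) for i in pred_ids]


# Hypothetical map; the real one is ./data/atis/atis_slot/label_map.json.
toy_map = json.loads('{"O": 1, "B-fromloc.city_name": 2, "B-toloc.city_name": 5}')
tags = decode([1, 2, 1, 5, 99], invert_label_map(toy_map))
print(tags)
```

For real output, `toy_map` would be replaced by `json.load(open(label_map))` with the `label_map` path from the script's config.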
--- a/examples/multi-task/process.py
+++ b/examples/multi-task/process.py
+import os
+import json
+label_new = "data/atis/atis_slot/label_map.json"
+label_old = "data/atis/atis_slot/map_tag_slot_id.txt"
+train_old = "data/atis/atis_slot/train.txt"
+train_new = "data/atis/atis_slot/train.tsv"
+dev_old = "data/atis/atis_slot/dev.txt"
+dev_new = "data/atis/atis_slot/dev.tsv"
+test_old = "data/atis/atis_slot/test.txt"
+test_new = "data/atis/atis_slot/test.tsv"
+intent_test =  "data/atis/atis_intent/test.tsv"
+os.rename("data/atis/atis_intent/test.txt", intent_test)
+intent_train =  "data/atis/atis_intent/train.tsv"
+os.rename("data/atis/atis_intent/train.txt", intent_train)
+intent_dev = "data/atis/atis_intent/dev.tsv"
+os.rename("data/atis/atis_intent/dev.txt", intent_dev)
+# prepend the header row to each intent file
+for intent_file in (intent_dev, intent_test, intent_train):
+    with open(intent_file, 'r+') as f:
+        content = f.read()
+        f.seek(0, 0)
+        f.write("label\ttext_a\n"+content)
+# label_new, train_new, dev_new and test_new are created by open(..., "w") below,
+# so no os.mknod call is needed (it also fails when the file already exists)
+tag = []
+id = []
+map = {}
+with open(label_old, "r") as f, open(label_new, "w") as f2:
+    for line in f.readlines():
+        line = line.split('\t')
+        tag.append(line[0])
+        id.append(int(line[1][:-1]))
+        map[line[1][:-1]] = line[0]
+    # label_map.json maps each tag to its integer id
+    re = {tag[i]:id[i] for i in range(len(tag))}
+    re = json.dumps(re)
+    f2.write(re)
+with open(train_old, "r") as f:
+    with open(train_new, "w") as f2:
+        f2.write("text_a\tlabel\n")
+        for line in f.readlines():
+            line = line.split('\t')
+            text = line[0].split(' ')
+            label = line[1].split(' ')
+            for t in text:
+                f2.write(t)
+                f2.write('\2')
+            f2.write('\t')
+            for t in label:
+                if t.endswith('\n'):
+                    t = t[:-1] 
+                f2.write(map[t])
+                f2.write('\2')
+            f2.write('\n')
+    f2.close()
+f.close()
+with open(test_old, "r") as f:
+    with open(test_new, "w") as f2:
+        f2.write("text_a\tlabel\n")
+        for line in f.readlines():
+            line = line.split('\t')
+            text = line[0].split(' ')
+            label = line[1].split(' ')
+            for t in text:
+                f2.write(t)
+                f2.write('\2')
+            f2.write('\t')
+            for t in label:
+                if t.endswith('\n'):
+                    t = t[:-1] 
+                f2.write(map[t])
+                f2.write('\2')
+            f2.write('\n')
+    f2.close()
+f.close()
+with open(dev_old, "r") as f:
+    with open(dev_new, "w") as f2:
+        f2.write("text_a\tlabel\n")
+        for line in f.readlines():
+            line = line.split('\t')
+            text = line[0].split(' ')
+            label = line[1].split(' ')
+            for t in text:
+                f2.write(t)
+                f2.write('\2')
+            f2.write('\t')
+            for t in label:
+                if t.endswith('\n'):
+                    t = t[:-1] 
+                f2.write(map[t])
+                f2.write('\2')
+            f2.write('\n')
+    f2.close()
+f.close()
+os.remove(label_old)
+os.remove(train_old)
+os.remove(test_old)
+os.remove(dev_old)
\ No newline at end of file
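`process.py` repeats the same conversion loop for the train, test and dev splits. One way to factor it is sketched below, assuming the same input layout (`tokens<TAB>label ids`, both space-separated) and the same `\2` separator. `convert_line` and `convert_split` are hypothetical helpers, not part of this commit.

```python
SEP = "\x02"  # the same character process.py writes as '\2'


def convert_line(line, id_to_tag):
    """Convert one 'tokens<TAB>label ids' line to the SEP-delimited format."""
    text, label_ids = line.rstrip("\n").split("\t")
    tokens = text.split(" ")
    tags = [id_to_tag[i] for i in label_ids.split(" ")]
    # Every token and tag is followed by SEP, as in the original loops.
    return "".join(t + SEP for t in tokens) + "\t" + "".join(t + SEP for t in tags)


def convert_split(src_path, dst_path, id_to_tag):
    """Convert a whole split file and write the header row first."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write("text_a\tlabel\n")
        for line in src:
            dst.write(convert_line(line, id_to_tag) + "\n")


# Toy id -> tag map with string keys, matching how process.py builds `map`.
toy_tags = {"1": "O", "5": "B-toloc.city_name"}
converted = convert_line("fly to boston\t1 1 5\n", toy_tags)
print(repr(converted))
```

The three loops would then reduce to three `convert_split` calls, one per `(old, new)` path pair.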
--- a/examples/multi-task/run.py
+++ b/examples/multi-task/run.py
+# coding=utf-8
+import paddlepalm as palm
+import json
+from paddlepalm.distribute import gpu_dev_count
+if __name__ == '__main__':
+    # configs
+    max_seqlen = 128
+    batch_size = 16
+    num_epochs = 20
+    print_steps = 5
+    lr = 2e-5
+    num_classes = 130
+    weight_decay = 0.01
+    num_classes_intent = 26
+    dropout_prob = 0.1
+    random_seed = 0
+    label_map = './data/atis/atis_slot/label_map.json'
+    vocab_path = './pretrain/ernie-en-base/vocab.txt'
+    train_slot = './data/atis/atis_slot/train.tsv'
+    train_intent = './data/atis/atis_intent/train.tsv'
+    predict_file = './data/atis/atis_slot/test.tsv'
+    save_path = './outputs/'
+    pred_output = './outputs/predict/'
+    save_type = 'ckpt'
+    pre_params = './pretrain/ernie-en-base/params'
+    config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
+    input_dim = config['hidden_size']
+    # -----------------------  for training ----------------------- 
+    # step 1-1: create readers for training 
+    seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
+    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed)
+    # step 1-2: load the training data
+    seq_label_reader.load_data(train_slot, file_format='tsv', num_epochs=None, batch_size=batch_size)
+    cls_reader.load_data(train_intent, batch_size=batch_size, num_epochs=None)
+    # step 2: create a backbone of the model to extract text features
+    ernie = palm.backbone.ERNIE.from_config(config)
+    # step 3: register the backbone in readers
+    seq_label_reader.register_with(ernie)
+    cls_reader.register_with(ernie)
+    # step 4: create task output heads
+    seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
+    cls_head = palm.head.Classify(num_classes_intent, input_dim, dropout_prob)
+    # step 5-1: create a task trainer
+    trainer_seq_label = palm.Trainer("slot", mix_ratio=1.0)
+    trainer_cls = palm.Trainer("intent", mix_ratio=1.0)
+    trainer = palm.MultiHeadTrainer([trainer_seq_label, trainer_cls])
+    # step 5-2: build forward graph with backbone and task head
+    loss1 = trainer_cls.build_forward(ernie, cls_head)
+    loss2 = trainer_seq_label.build_forward(ernie, seq_label_head)
+    loss_var = trainer.build_forward()
+    # step 6-1*: use warmup
+    n_steps = seq_label_reader.num_examples * 1.5 * num_epochs // batch_size
+    warmup_steps = int(0.1 * n_steps)
+    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
+    # step 6-2: create an optimizer
+    adam = palm.optimizer.Adam(loss_var, lr, sched)
+    # step 6-3: build backward
+    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
+    # step 7: fit prepared reader and data
+    trainer.fit_readers_with_mixratio([seq_label_reader, cls_reader], "slot", num_epochs)
+    # step 8-1*: load pretrained parameters
+    trainer.load_pretrain(pre_params)
+    # step 8-2*: set saver to save model
+    save_steps = int(n_steps-batch_size) // 2
+    # save_steps = 10
+    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
+    # step 8-3: start training
+    trainer.train(print_steps=print_steps)
\ No newline at end of file
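The step counts in `run.py` follow from three config values. The arithmetic is reproduced below as a standalone sketch so it can be checked without PALM installed. `plan_steps` is a hypothetical helper and the example count of 1000 is made up; the factor 1.5, the 10% warmup ratio and the `(n_steps - batch_size) // 2` rule are taken from the script.

```python
def plan_steps(num_examples, num_epochs, batch_size, scale=1.5, warmup_ratio=0.1):
    """Return (n_steps, warmup_steps, save_steps) as computed in run.py."""
    # Total training steps, scaled by 1.5 as in the multi-task script.
    n_steps = int(num_examples * scale * num_epochs // batch_size)
    # The first 10% of the steps are used for learning-rate warmup.
    warmup_steps = int(warmup_ratio * n_steps)
    # Checkpoint interval chosen so that two saves fit into the run.
    save_steps = (n_steps - batch_size) // 2
    return n_steps, warmup_steps, save_steps


# Hypothetical dataset size; epochs and batch size match run.py.
plan = plan_steps(num_examples=1000, num_epochs=20, batch_size=16)
print(plan)
```

The checkpoint path hard-coded in the predict scripts (`ckpt.step9282`) depends on the real dataset size and run, so it should be adjusted to whatever step the training run actually saved.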
--- a/examples/tagging/README.md
+++ b/examples/tagging/README.md
-## Examples 3: Tagging
+## Example 3: Tagging
 This task is a named entity recognition task. The following sections detail model preparation, dataset preparation, and how to run the task.
 ### Step 1: Prepare Pre-trained Models & Datasets
@@ -34,7 +34,7 @@ text_a  label
 ### Step 2: Train & Predict
-The code used to perform classification task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
+The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:
 ```shell
 python run.py

--- a/examples/tagging/run.py
+++ b/examples/tagging/run.py
@@ -32,26 +32,26 @@ if __name__ == '__main__':
    # -----------------------  for training ----------------------- 
    # step 1-1: create readers for training
-    ner_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
+    seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
    # step 1-2: load the training data
-    ner_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size)
+    seq_label_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size)
    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)
    # step 3: register the backbone in reader
-    ner_reader.register_with(ernie)
+    seq_label_reader.register_with(ernie)
    # step 4: create the task output head
-    ner_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
+    seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
-    loss_var = trainer.build_forward(ernie, ner_head)
+    loss_var = trainer.build_forward(ernie, seq_label_head)
    # step 6-1*: use warmup
-    n_steps = ner_reader.num_examples * num_epochs // batch_size
+    n_steps = seq_label_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    print('total_steps: {}'.format(n_steps))
    print('warmup_steps: {}'.format(warmup_steps))
@@ -62,43 +62,43 @@ if __name__ == '__main__':
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
    # step 7: fit prepared reader and data
-    trainer.fit_reader(ner_reader)
+    trainer.fit_reader(seq_label_reader)
-    # step 8-1*: load pretrained parameters
+    # # step 8-1*: load pretrained parameters
-    trainer.load_pretrain(pre_params)
+    # trainer.load_pretrain(pre_params)
-    # step 8-2*: set saver to save model
+    # # step 8-2*: set saver to save model
-    save_steps = (n_steps-20)// gpu_dev_count
+    save_steps = 1951
-    print('save_steps: {}'.format(save_steps))
+    # print('save_steps: {}'.format(save_steps))
-    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
+    # trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
-    # step 8-3: start training
+    # # step 8-3: start training
-    trainer.train(print_steps=train_print_steps)
+    # trainer.train(print_steps=train_print_steps)
    # -----------------------  for prediction ----------------------- 
    # step 1-1: create readers for prediction
    print('prepare to predict...')
-    predict_ner_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, phase='predict')
+    predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
    # step 1-2: load the prediction data
-    predict_ner_reader.load_data(predict_file, batch_size)
+    predict_seq_label_reader.load_data(predict_file, batch_size)
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
    # step 3: register the backbone in reader
-    predict_ner_reader.register_with(pred_ernie)
+    predict_seq_label_reader.register_with(pred_ernie)
    # step 4: create the task output head
-    ner_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
+    seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
    # step 5: build forward graph with backbone and task head
-    trainer.build_predict_forward(pred_ernie, ner_pred_head)
+    trainer.build_predict_forward(pred_ernie, seq_label_pred_head)
    # step 6: load pretrained model
    pred_model_path = './outputs/ckpt.step' + str(save_steps)
    pred_ckpt = trainer.load_ckpt(pred_model_path)
    # step 7: fit prepared reader and data
-    trainer.fit_reader(predict_ner_reader, phase='predict')
+    trainer.fit_reader(predict_seq_label_reader, phase='predict')
    # step 8: predict
    print('predicting..')
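
Note on the step arithmetic in `run.py`: the total and warmup step counts, and the checkpoint path read in the prediction phase, all come from a few integer expressions. The sketch below reproduces them in isolation. The helper names `schedule` and `ckpt_path` and the example sizes are hypothetical; only the expressions themselves come from the script.

```python
def schedule(num_examples, num_epochs, batch_size, warmup_ratio=0.1):
    """Return (total_steps, warmup_steps) with the same arithmetic as run.py."""
    n_steps = num_examples * num_epochs // batch_size
    warmup_steps = int(warmup_ratio * n_steps)
    return n_steps, warmup_steps


def ckpt_path(save_steps, output_dir='./outputs'):
    """Build the checkpoint path the prediction phase loads from."""
    return '{}/ckpt.step{}'.format(output_dir, save_steps)


# Hypothetical sizes, chosen only to make the arithmetic visible.
n_steps, warmup_steps = schedule(num_examples=1000, num_epochs=3, batch_size=32)
print('total_steps: {}'.format(n_steps))
print('warmup_steps: {}'.format(warmup_steps))
print(ckpt_path(1951))
```

Because this revision hard-codes `save_steps = 1951` and comments out the training calls, the prediction phase relies on a checkpoint already existing at that path.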

--- a/paddlepalm/distribute/reader.py
+++ b/paddlepalm/distribute/reader.py
@@ -57,9 +57,9 @@ def yield_pieces(data, distribute_strategy, batch_size):
            yield temp
-def data_feeder(reader, postprocess_fn=None, prefetch_steps=2):
+def data_feeder(reader, postprocess_fn=None, prefetch_steps=2, phase='train', is_multi=False):
    if postprocess_fn is None:
-        def postprocess_fn(batch):
+        def postprocess_fn(batch, id=-1, phase='train', is_multi=False):
            return batch
    def worker(reader, dev_count, queue):
@@ -90,6 +90,10 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2):
        queue.task_done()
        if ret is not None:
            batches, num_pad = ret
+            if dev_count > 1 and phase == 'train' and is_multi: 
+                id = batches[0]['__task_id'][0]
+            else:
+                id = -1
            batch_buf = []
            flag_buf = []
            for idx, batch in enumerate(batches):
@@ -97,8 +101,8 @@ def data_feeder(reader, postprocess_fn=None, prefetch_steps=2):
                flag = idx-len(batches) < -num_pad
                # if num_pad > 0:
                #     num_pad -= 1
-                # batch = postprocess_fn(batch, id)
+                batch = postprocess_fn(batch, id, phase, is_multi=is_multi)
-                batch = postprocess_fn(batch)
+                # batch = postprocess_fn(batch)
                batch_buf.append(batch)
                flag_buf.append(flag)
            yield batch_buf, flag_buf
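
Note on the `flag` expression in `data_feeder`: `idx-len(batches) < -num_pad` is compact and easy to misread. A minimal sketch of what it evaluates to, assuming the last `num_pad` entries of `batches` are padding (the diff does not show how padding is produced); `real_batch_flags` is a hypothetical helper name.

```python
def real_batch_flags(num_batches, num_pad):
    """True for entries that precede the trailing num_pad padded entries."""
    return [idx - num_batches < -num_pad for idx in range(num_batches)]


print(real_batch_flags(4, 1))
print(real_batch_flags(4, 0))
```

With no padding every flag is true; with `num_pad` padded entries only the trailing `num_pad` flags are false.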

--- a/paddlepalm/head/cls.py
+++ b/paddlepalm/head/cls.py
@@ -111,7 +111,7 @@ class Classify(Head):
            with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
                for i in range(len(self._preds)):
                    label = 0 if self._preds[i][0] > self._preds[i][1] else 1
-                    result = {'index': i, 'label': label, 'logits': self._preds[i], 'probs': self._preds[i]}
+                    result = {'index': i, 'label': label, 'logits': self._preds[i], 'probs': self._probs[i]}
                    result = json.dumps(result)
                    writer.write(result+'\n')
            print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
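
Note on the `probs` fix: before this change the `logits` and `probs` fields of each prediction carried the same values. The sketch below shows why the two normally differ, assuming `self._probs` holds a softmax over the logits (the diff does not show where `_probs` is filled). The `softmax` helper and the sample logits are illustrative only.

```python
import json
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.0]  # hypothetical two-class logits
probs = softmax(logits)
label = 0 if logits[0] > logits[1] else 1  # same rule as in cls.py

result = {'index': 0, 'label': label, 'logits': logits, 'probs': probs}
print(json.dumps(result))
```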

--- a/paddlepalm/head/mlm.py
+++ b/paddlepalm/head/mlm.py
@@ -24,24 +24,21 @@ class MaskLM(Head):
    '''
    mlm
    '''
-    def __init__(self, input_dim, vocab_size, hidden_act, initializer_range, dropout_prob=0.0, \
+    def __init__(self, input_dim, vocab_size, hidden_act, dropout_prob=0.0, \
                 param_initializer_range=0.02, phase='train'):
        self._is_training = phase == 'train'
        self._emb_size = input_dim
        self._hidden_size = input_dim
        self._dropout_prob = dropout_prob if phase == 'train' else 0.0
-        self._param_initializer = fluid.initializer.TruncatedNormal(
-            scale=param_initializer_range)
        self._preds = []
        self._vocab_size = vocab_size
        self._hidden_act = hidden_act
-        self._initializer_range = initializer_range
+        self._initializer_range = param_initializer_range
    @property
    def inputs_attrs(self):
        reader = {
-            "token_ids":[[-1, -1], 'int64'],
            "mask_label": [[-1], 'int64'],
            "mask_pos": [[-1], 'int64'],
            }
@@ -61,21 +58,19 @@ class MaskLM(Head):
    def build(self, inputs, scope_name=""):
        mask_pos = inputs["reader"]["mask_pos"]
+        word_emb = inputs["backbone"]["embedding_table"]
+        enc_out = inputs["backbone"]["encoder_outputs"]
        if self._is_training:
-            mask_label = inputs["reader"]["mask_label"] 
+            mask_label = inputs["reader"]["mask_label"]
-            l1 = fluid.layers.shape(inputs["reader"]["token_ids"] )[0]
+            l1 = enc_out.shape[0] 
-            # bxs = inputs["reader"]["token_ids"].shape[2].value
+            l2 = enc_out.shape[1]
-            l2 = fluid.layers.shape(inputs["reader"]["token_ids"][0])[0]
+            bxs = fluid.layers.fill_constant(shape=[1], value=l1*l2, dtype='int64')
-            bxs = (l1*l2).astype(np.int64)
-            # max_position = inputs["reader"]["batchsize_x_seqlen"] - 1
            max_position = bxs - 1
            mask_pos = fluid.layers.elementwise_min(mask_pos, max_position)
            mask_pos.stop_gradient = True
-        word_emb = inputs["backbone"]["embedding_table"]
-        enc_out = inputs["backbone"]["encoder_outputs"]
        emb_size = word_emb.shape[-1]
        _param_initializer = fluid.initializer.TruncatedNormal(
@@ -95,7 +90,7 @@ class MaskLM(Head):
            param_attr=fluid.ParamAttr(
                name=scope_name+'mask_lm_trans_fc.w_0',
                initializer=_param_initializer),
-            bias_attr=fluid.ParamAttr(name=scope_name+'mask_lm_trans_fc.b_0'))
+                bias_attr=fluid.ParamAttr(name=scope_name+'mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(
            mask_trans_feat, 'n', name=scope_name+'mask_lm_trans')
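
Note on the `mask_pos` clamp in `MaskLM.build`: the positions are capped at `batch * seqlen - 1`, which suggests they index into the encoder output flattened to `batch * seqlen` rows. A plain-Python sketch of that clamp, under that assumption; `clamp_mask_pos` is a hypothetical helper, not part of the library.

```python
def clamp_mask_pos(mask_pos, batch_size, seq_len):
    """Cap each flat position at the last valid row of a [batch*seq_len, hidden] tensor."""
    max_position = batch_size * seq_len - 1
    return [min(p, max_position) for p in mask_pos]


# A batch of 2 sequences of length 4 has flat rows 0..7; 12 is out of range.
print(clamp_mask_pos([0, 5, 12], batch_size=2, seq_len=4))
```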

--- a/paddlepalm/multihead_trainer.py
+++ b/paddlepalm/multihead_trainer.py
@@ -5,6 +5,7 @@ from paddlepalm.distribute import gpu_dev_count, cpu_dev_count
 from paddlepalm import Trainer
 from paddlepalm.utils import reader_helper
 import numpy as np
+from paddlepalm.distribute import gpu_dev_count, data_feeder, decode_fake
 import time
 dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count
@@ -55,7 +56,8 @@ class MultiHeadTrainer(Trainer):
        for t in self._trainers:
            t._set_multitask()
-    def build_forward(self, backbone, heads):
+    # def build_forward(self, backbone, heads):
+    def build_forward(self):
        """
        Build forward computation graph for training, which usually built from input layer to loss node.
@@ -66,20 +68,13 @@ class MultiHeadTrainer(Trainer):
        Return:
            - loss_var: a Variable object. The computational graph variable(node) of loss.
        """
+        head_dict = {}
-        if isinstance(heads, list):
+        backbone = self._trainers[0]._backbone
-            head_dict = {k.name: v for k,v in zip(self._trainers, heads)}
+        for i in self._trainers:
-        elif isinstance(heads, dict):
+            assert i._task_head is not None and i._backbone is not None, "You should build forward for the {} task".format(i._name)
-            head_dict = heads
+            assert i._backbone == backbone, "The backbone for each task must be the same"
-        else:
+            head_dict[i._name] = i._task_head
-            raise ValueError()
-        num_heads = len(self._trainers)
-        assert len(head_dict) == num_heads
-        for t in self._trainers:
-            assert t.name in head_dict, "expected: {}, exists: {}".format(t.name, head_dict.keys())
        train_prog = fluid.Program()
        train_init_prog = fluid.Program()
        self._train_prog = train_prog
@@ -87,27 +82,15 @@ class MultiHeadTrainer(Trainer):
        def get_loss(i):
            head = head_dict[self._trainers[i].name]
-            # loss_var = self._trainers[i].build_forward(backbone, head, train_prog, train_init_prog)
+            self._trainers[i]._lock_prog = True
            loss_var = self._trainers[i].build_forward(backbone, head)
+            self._trainers[i]._lock_prog = False
            return loss_var
-        # task_fns = {}
+        task_fns = {i: lambda i=i: get_loss(i) for i in range(len(self._trainers))}
-        # for i in range(num_heads):
-        #     def task_loss():
-        #         task_id = i
-        #         return lambda: get_loss(task_id)
-        #     task_fns[i] = task_loss()
-        # task_fns = {i: lambda: get_loss(i) for i in range(num_heads)}
-        task_fns = {i: lambda i=i: get_loss(i) for i in range(num_heads)}
        with fluid.program_guard(train_prog, train_init_prog):
            task_id_var = fluid.data(name="__task_id",shape=[1],dtype='int64')
-            # task_id_var = fluid.layers.fill_constant(shape=[1],dtype='int64', value=1)
-            # print(task_id_var.name)
            loss_var = layers.switch_case(
                branch_index=task_id_var,
@@ -200,15 +183,15 @@ class MultiHeadTrainer(Trainer):
        feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)
        if gpu_dev_count > 1:
-            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn)
+            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase, is_multi=True)
        else:
-            distribute_feeder_fn = iterator_fn
+            distribute_feeder_fn = iterator_fn()
        if phase == 'train':
-            self._train_reader = distribute_feeder_fn()
+            self._train_reader = distribute_feeder_fn
            self._feed_batch_process_fn = feed_batch_process_fn
        elif phase == 'predict':
-            self._predict_reader = distribute_feeder_fn()
+            self._predict_reader = distribute_feeder_fn
            self._pred_feed_batch_process_fn = feed_batch_process_fn
    def _check_finish(self, task_name, silent=False):
@@ -241,7 +224,6 @@ class MultiHeadTrainer(Trainer):
            task_rt_outputs = {k[len(self._trainers[task_id].name+'.'):]: v for k,v in rt_outputs.items() if k.startswith(self._trainers[task_id].name+'.')}
            self._trainers[task_id]._task_head.batch_postprocess(task_rt_outputs)
            if print_steps > 0 and self._cur_train_step % print_steps == 0:
                loss = rt_outputs[self._trainers[task_id].name+'.loss']
                loss = np.mean(np.squeeze(loss)).tolist()
@@ -276,8 +258,8 @@ class MultiHeadTrainer(Trainer):
    def train_one_step(self, batch):
        if dev_count > 1:
-            assert isinstance(batch, list)
+            assert isinstance(batch, tuple)
-            task_id = batch[0]['__task_id'][0]
+            task_id = batch[0][0]['__task_id'][0]
        else:
            assert isinstance(batch, dict)
            task_id = batch['__task_id'][0]
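
Note on `lambda i=i: get_loss(i)` in `build_forward`: the default argument is what makes each branch function remember its own task index. A standalone sketch of the difference; `get_loss` here is a stand-in that returns a string, not the trainer method.

```python
def get_loss(i):
    """Stand-in for the per-task loss builder."""
    return 'loss_of_task_{}'.format(i)


num_tasks = 3

# Late binding: every lambda reads `i` when called, after the loop has finished.
late_bound = {i: lambda: get_loss(i) for i in range(num_tasks)}

# Early binding: the default argument captures the value of `i` at definition time.
early_bound = {i: lambda i=i: get_loss(i) for i in range(num_tasks)}

print([late_bound[k]() for k in range(num_tasks)])
print([early_bound[k]() for k in range(num_tasks)])
```

Without the default argument, every branch handed to `switch_case` would build the loss of the last task.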

--- a/paddlepalm/reader/mlm.py
+++ b/paddlepalm/reader/mlm.py
@@ -34,7 +34,6 @@ class MaskLMReader(Reader):
        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'
-        self._register.add('token_ids')
        self._register.add('mask_pos')
        if phase == 'train':
            self._register.add('mask_label')

--- a/paddlepalm/trainer.py
+++ b/paddlepalm/trainer.py
@@ -46,7 +46,7 @@ class Trainer(object):
        self._pred_reader = None
        self._task_head = None
        self._pred_head = None
        self._train_reader = None
        self._predict_reader = None
        self._train_iterator = None
@@ -54,6 +54,8 @@ class Trainer(object):
        self._train_init = False
        self._predict_init = False
+        self._train_init_prog = None
+        self._pred_init_prog = None
        self._check_save = lambda: False
@@ -105,6 +107,7 @@ class Trainer(object):
            'fetch_list': 'self._pred_fetch_name_list'}
        self._lock = False
+        self._lock_prog = False
        self._build_forward = False
    def build_forward(self, backbone, task_head):
@@ -159,9 +162,11 @@ class Trainer(object):
        train_prog = fluid.Program()
        train_init_prog = fluid.Program()
-        self._train_prog = train_prog
+        if not self._lock_prog:
-        self._train_init_prog = train_init_prog
+            self._train_prog = train_prog
-        if not self._multi_task:
+            self._train_init_prog = train_init_prog
+        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                net_inputs = reader_helper.create_net_inputs(input_attrs, async=False)
                bb_output_vars = backbone.build(net_inputs)
@@ -182,7 +187,7 @@ class Trainer(object):
        task_inputs['reader'] = task_inputs_from_reader
        scope = self.name+'.'
-        if not self._multi_task:
+        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                with fluid.unique_name.guard(scope):
                    output_vars = self._build_head(task_inputs, phase='train', scope=scope)
@@ -207,7 +212,7 @@ class Trainer(object):
        # task_id_vec = layers.one_hot(task_id_var, num_instances)
        # losses = fluid.layers.concat([task_output_vars[inst.name+'/loss'] for inst in instances], axis=0)
        # loss = layers.reduce_sum(task_id_vec * losses)
-        if not self._multi_task:
+        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                loss_var = fluid.layers.reduce_sum(task_output_vars[self.name+'.loss'])
        else:
@@ -386,8 +391,9 @@ class Trainer(object):
            reader_helper.check_io(self._task_head.inputs_attrs['backbone'], self._backbone.outputs_attr, in_name='task_head(backbone, train)', out_name='backbone')
        elif phase == 'predict':
            self._predict_reader = reader
-            tail = self._num_examples % batch_size > 0
+            # tail = self._num_examples % batch_size > 0
-            self._pred_steps_pur_epoch = reader.num_examples // batch_size + 1 if tail else 0
+            # self._pred_steps_pur_epoch = reader.num_examples // batch_size + 1 if tail else 0
+            self._pred_steps_pur_epoch = reader.num_examples // batch_size 
            shape_and_dtypes = self._pred_shape_and_dtypes
            name_to_position = self._pred_name_to_position
            net_inputs = self._pred_net_inputs
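
Note on `_pred_steps_pur_epoch`: the removed expression parsed as `(n // batch_size + 1) if tail else 0`, so it returned 0 whenever the example count was an exact multiple of the batch size. The replacement uses plain floor division, which does not count a trailing partial batch. The sketch compares both with a ceiling division, shown only as an alternative and not as what the commit does. The function names are hypothetical, and a single `num_examples` stands in for both `self._num_examples` and `reader.num_examples`.

```python
def steps_old(num_examples, batch_size):
    """Removed expression: the conditional binds looser than the arithmetic."""
    tail = num_examples % batch_size > 0
    return num_examples // batch_size + 1 if tail else 0


def steps_new(num_examples, batch_size):
    """Expression introduced by this commit."""
    return num_examples // batch_size


def steps_ceil(num_examples, batch_size):
    """Alternative: count a trailing partial batch as one more step."""
    return (num_examples + batch_size - 1) // batch_size


for n in (64, 70):
    print(n, steps_old(n, 32), steps_new(n, 32), steps_ceil(n, 32))
```

Whether the partial batch should be counted depends on how the reader pads or drops it, which this diff does not show.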
@@ -415,7 +421,7 @@ class Trainer(object):
        self._raw_iterator_fn = iterator_fn
        feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)
        if gpu_dev_count > 1:
-            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn)
+            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase)
        else:
            distribute_feeder_fn = iterator_fn()
@@ -427,6 +433,7 @@ class Trainer(object):
            self._pred_feed_batch_process_fn = feed_batch_process_fn
        # return distribute_feeder_fn()
    def load_ckpt(self, model_path):
        """
        load training checkpoint for further training or predicting.
@@ -465,7 +472,7 @@ class Trainer(object):
                strict=True)
        else:
            raise Exception("model not found. You should at least build_forward or build_predict_forward to load its checkpoint.")
    def load_predict_model(self, model_path, convert=False):
        """
        load pretrain models(backbone) for training.
@@ -510,6 +517,7 @@ class Trainer(object):
            save_type: a string. The type of saved model. Currently support checkpoint(ckpt) and predict model(predict), default is ckpt. If both two types are needed to save, you can set as "ckpt,predict".
        """
        save_type = save_type.split(',')
        if 'predict' in save_type:
@@ -534,6 +542,7 @@ class Trainer(object):
        def temp_func():
            if (self._save_predict or self._save_ckpt) and self._cur_train_step % save_steps == 0:
                if self._save_predict:
                    self._save(save_path, suffix='pred.step'+str(self._cur_train_step))
                    print('predict model has been saved at '+os.path.join(save_path, 'pred.step'+str(self._cur_train_step)))
@@ -600,7 +609,7 @@ class Trainer(object):
                       (self._cur_train_step-1) % self._steps_pur_epoch + 1 , self._steps_pur_epoch, self._cur_train_epoch,
                       loss, print_steps / time_cost))
                time_begin = time.time() 
-                self._check_save()
+                # self._check_save()
            # if cur_task.train_finish and cur_task.cur_train_step + cur_task.cur_train_epoch * cur_task.steps_pur_epoch == cur_task.expected_train_steps:
            #     print(cur_task.name+': train finished!')
            #     cur_task.save()
@@ -718,15 +727,16 @@ class Trainer(object):
            feed, mask = batch
            rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)
            num_fakes = decode_fake(len(rt_outputs[0]), mask, self._train_batch_size)
-            for _ in range(num_fakes):
+            if num_fakes:
-                for item in rt_outputs:
+                rt_outputs = [i[:-num_fakes] for i in rt_outputs]
-                    item.pop()
        else:
            feed = self._feed_batch_process_fn(batch)
            rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)
        rt_outputs = {k:v for k,v in zip(self._fetch_names, rt_outputs)}
        self._cur_train_step += 1
+        self._check_save()
        self._cur_train_epoch = (self._cur_train_step-1) // self._steps_pur_epoch
        return rt_outputs
@@ -735,9 +745,8 @@ class Trainer(object):
            feed, mask = batch
            rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list)
            num_fakes = decode_fake(len(rt_outputs[0]), mask, self._predict_batch_size)
-            for _ in range(num_fakes):
+            if num_fakes:
-                for item in rt_outputs:
+                rt_outputs = [i[:-num_fakes] for i in rt_outputs]
-                    item.pop()
        else:
            feed = self._pred_feed_batch_process_fn(batch)
            rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list)
@@ -750,7 +759,7 @@ class Trainer(object):
    @property
    def name(self):
        return self._name
    @property
    def num_examples(self):
        return self._num_examples
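
Note on the fake-output trimming in `train_one_step` and `predict_one_batch`: the `if num_fakes:` guard matters because slicing with `[:-0]` yields an empty sequence rather than the full one. A minimal sketch with plain lists standing in for the fetched outputs; `drop_fakes` is a hypothetical helper name.

```python
def drop_fakes(rt_outputs, num_fakes):
    """Remove the last num_fakes entries from every fetched output."""
    if num_fakes:
        rt_outputs = [out[:-num_fakes] for out in rt_outputs]
    return rt_outputs


outputs = [[0.1, 0.2, 0.3], [1, 2, 3]]
print(drop_fakes(outputs, 1))
print(drop_fakes(outputs, 0))
print([1, 2, 3][:-0])  # what the slice alone would do without the guard
```

Slicing also works when the fetched outputs are array-like objects without a `pop` method, which the removed loop required.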

--- a/paddlepalm/utils/reader_helper.py
+++ b/paddlepalm/utils/reader_helper.py
@@ -21,13 +21,20 @@ import numpy as np
 import paddle
 from paddle import fluid
 from paddle.fluid import layers
+from paddlepalm.distribute import gpu_dev_count, cpu_dev_count
+dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count
 def create_feed_batch_process_fn(net_inputs):
-    def feed_batch_process_fn(data):
+    def feed_batch_process_fn(data, id=-1, phase='train', is_multi=False):
        temp = {}
-        for q, var in net_inputs.items():
+        if dev_count > 1 and phase=='train' and is_multi:
+            inputs = net_inputs[id]
+        else:
+            inputs= net_inputs
+        for q, var in inputs.items():
            if isinstance(var, str) or isinstance(var, unicode):
                temp[var] = data[q]
            else: