Commit 64c247d7 authored by W wangxiao1021

upgrade downloader & fix bugs

Parent fd9a9ba5
# PaddlePALM
English | [简体中文](./README_cn.md)
English | [简体中文](./README_zh.md)
PaddlePALM (PArallel Learning from Multi-tasks) is a fast, flexible, extensible and easy-to-use NLP large-scale pretraining and multi-task learning framework. PaddlePALM is a high-level framework aimed at **rapidly** developing **high-performance** NLP models.
......@@ -115,7 +115,7 @@ You can easily reproduce the following competitive results with minimal code, which
## Overview
<p align="center">
<img src="https://github.com/PaddlePaddle/PALM/blob/master/img/architecture.png" alt="Sample" width="582" height="289">
<img src="https://github.com/PaddlePaddle/PALM/blob/master/img/architecture.png" alt="Sample" width="600px" height="auto">
<p align="center">
<em>Architecture Diagram</em>
</p>
......@@ -171,17 +171,23 @@ We incorporate many pretrained models to initialize model backbone parameters. T
>>> from paddlepalm import downloader
>>> downloader.ls('pretrain')
Available pretrain items:
=> roberta-cn-base
=> roberta-cn-large
=> bert-cn-base
=> bert-cn-large
=> bert-en-uncased-base
=> bert-en-uncased-large
=> bert-en-cased-base
=> bert-en-cased-large
=> ernie-en-uncased-base
=> ernie-en-uncased-large
...
=> RoBERTa-zh-base
=> RoBERTa-zh-large
=> ERNIE-v2-en-base
=> ERNIE-v2-en-large
=> XLNet-cased-base
=> XLNet-cased-large
=> ERNIE-v1-zh-base
=> ERNIE-v1-zh-base-max-len-512
=> BERT-en-uncased-large-whole-word-masking
=> BERT-en-cased-large-whole-word-masking
=> BERT-en-uncased-base
=> BERT-en-uncased-large
=> BERT-en-cased-base
=> BERT-en-cased-large
=> BERT-multilingual-uncased-base
=> BERT-multilingual-cased-base
=> BERT-zh-base
>>> downloader.download('pretrain', 'BERT-en-uncased-base', './pretrain_models')
...
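
Note that the `.lower()` normalization on `item`/`scope` is removed later in this commit, so names are now matched case-sensitively and must be passed exactly as `ls` prints them. A quick sanity-check sketch, assuming only the `ls`/`download` signatures shown above:

```python
from paddlepalm import downloader

# Case matters after this commit: use 'ERNIE-v1-zh-base', not 'ernie-zh-base'.
downloader.ls('pretrain')
downloader.download('pretrain', 'ERNIE-v1-zh-base', './pretrain')
```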
......
......@@ -115,7 +115,7 @@ PaddlePALM (PArallel Learning from Multi-tasks) is a flexible, general-purpose and easy-to-use
## Package Overview
<p align="center">
<img src="https://github.com/PaddlePaddle/PALM/blob/master/img/architecture.png" alt="Sample" width="582" height="289">
<img src="https://github.com/PaddlePaddle/PALM/blob/master/img/architecture.png" alt="Sample" width="600px" height="auto">
<p align="center">
<em>PALM Architecture Diagram</em>
</p>
......@@ -162,17 +162,23 @@ cd PALM && python setup.py install
>>> from paddlepalm import downloader
>>> downloader.ls('pretrain')
Available pretrain items:
=> roberta-cn-base
=> roberta-cn-large
=> bert-cn-base
=> bert-cn-large
=> bert-en-uncased-base
=> bert-en-uncased-large
=> bert-en-cased-base
=> bert-en-cased-large
=> ernie-en-uncased-base
=> ernie-en-uncased-large
...
=> RoBERTa-zh-base
=> RoBERTa-zh-large
=> ERNIE-v2-en-base
=> ERNIE-v2-en-large
=> XLNet-cased-base
=> XLNet-cased-large
=> ERNIE-v1-zh-base
=> ERNIE-v1-zh-base-max-len-512
=> BERT-en-uncased-large-whole-word-masking
=> BERT-en-cased-large-whole-word-masking
=> BERT-en-uncased-base
=> BERT-en-uncased-large
=> BERT-en-cased-base
=> BERT-en-cased-large
=> BERT-multilingual-uncased-base
=> BERT-multilingual-cased-base
=> BERT-zh-base
>>> downloader.download('pretrain', 'BERT-en-uncased-base', './pretrain_models')
...
......
......@@ -5,7 +5,7 @@ This task is a sentiment analysis task. The following sections detail model prep
#### Pre-trained Model
The pre-trained model used for this task is: [ernie-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
The pre-trained model used for this task is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
Make sure you have downloaded the required pre-trained model to the current folder.
......
......@@ -12,11 +12,11 @@ if __name__ == '__main__':
num_epochs = 10
lr = 5e-5
weight_decay = 0.01
vocab_path = './pretrain/ernie-zh-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
train_file = './data/train.tsv'
predict_file = './data/test.tsv'
config = json.load(open('./pretrain/ernie-zh-base/ernie_config.json'))
config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
input_dim = config['hidden_size']
num_classes = 2
dropout_prob = 0.1
......@@ -26,7 +26,7 @@ if __name__ == '__main__':
pred_output = './outputs/predict/'
save_type = 'ckpt'
print_steps = 20
pre_params = './pretrain/ernie-zh-base/params'
pre_params = './pretrain/ERNIE-v1-zh-base/params'
# ----------------------- for training -----------------------
......
......@@ -19,13 +19,13 @@ if __name__ == '__main__':
pred_model_path = './outputs/ckpt.step'+str(18732)
print_steps = 50
pred_output = './outputs/predict/'
pre_params = './pretrain/ernie-en-base/params'
pre_params = './pretrain/ERNIE-v2-en-base/params'
task_name = 'Quora Question Pairs matching'
vocab_path = './pretrain/ernie-en-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
train_file = './data/train.tsv'
predict_file = './data/test.tsv'
config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
input_dim = config['hidden_size']
# ----------------------- for training -----------------------
......
......@@ -5,7 +5,7 @@ This task is a machine reading comprehension task. The following sections detail
#### Pre-trained Model
The pre-trained model used for this task is: [ernie-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
The pre-trained model used for this task is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
Make sure you have downloaded the required pre-trained model to the current folder.
......
......@@ -97,7 +97,6 @@ def find_lcs(s1, s2):
return s1[p - mmax:p], mmax
#
def evaluate(ground_truth_file, prediction_file):
f1 = 0
em = 0
......@@ -163,7 +162,6 @@ def eval_file(dataset_file, prediction_file):
if __name__ == '__main__':
EM, F1, AVG, TOTAL = eval_file("task_data/cmrc2018/dev.json", "predictions.json")
print(EM)
print(F1)
print(TOTAL)
\ No newline at end of file
EM, F1, AVG, TOTAL = eval_file("data/dev.json", "outputs/predict/predictions.json")
print('data_num: {}'.format(TOTAL))
print('em_score: {}, f1: {}'.format(EM, F1))
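
For context, the `find_lcs` helper whose return line appears earlier in this file is the standard longest-common-substring routine the CMRC 2018 evaluator uses to compute character-overlap F1. A reconstruction of the elided body (assumed, chosen to match the visible `return s1[p - mmax:p], mmax`):

```python
def find_lcs(s1, s2):
    """Return the longest common substring of s1 and s2 and its length."""
    m = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    mmax, p = 0, 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] == s2[j]:
                m[i + 1][j + 1] = m[i][j] + 1
                if m[i + 1][j + 1] > mmax:
                    mmax = m[i + 1][j + 1]
                    p = i + 1  # end position of the match in s1
    return s1[p - mmax:p], mmax
```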
......@@ -16,7 +16,7 @@ if __name__ == '__main__':
max_ans_len = 128
weight_decay = 0.01
print_steps = 20
vocab_path = './pretrain/ernie-zh-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
do_lower_case = True
train_file = './data/train.json'
......@@ -25,8 +25,8 @@ if __name__ == '__main__':
pred_output = './outputs/predict/'
save_type = 'ckpt'
task_name = 'cmrc2018'
pre_params = './pretrain/ernie-zh-base/params'
config = json.load(open('./pretrain/ernie-zh-base/ernie_config.json'))
pre_params = './pretrain/ERNIE-v1-zh-base/params'
config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
# ----------------------- for training -----------------------
......@@ -91,11 +91,11 @@ if __name__ == '__main__':
# step 6: load pretrained model
pred_model_path = './outputs/ckpt.step'+str(12160)
pred_ckpt = trainer.load_ckpt(pred_model_path)
trainer.load_ckpt(pred_model_path)
# step 7: fit prepared reader and data
trainer.fit_reader(predict_mrc_reader, phase='predict')
# step 8: predict
print('predicting...')
trainer.predict(print_steps=print_steps, output_dir="outputs/")
trainer.predict(print_steps=print_steps, output_dir="outputs/predict")
......@@ -12,15 +12,15 @@ if __name__ == '__main__':
num_epochs = 6
print_steps = 5
num_classes = 26
vocab_path = './pretrain/ernie-en-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
predict_file = './data/atis/atis_intent/test.tsv'
save_path = './outputs/'
pred_output = './outputs/predict-intent/'
save_type = 'ckpt'
random_seed = 0
pre_params = './pretrain/ernie-en-base/params'
config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
pre_params = './pretrain/ERNIE-v2-en-base/params'
config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
input_dim = config['hidden_size']
# ----------------------- for prediction -----------------------
......
......@@ -13,15 +13,15 @@ if __name__ == '__main__':
print_steps = 5
num_classes = 130
label_map = './data/atis/atis_slot/label_map.json'
vocab_path = './pretrain/ernie-en-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
predict_file = './data/atis/atis_slot/test.tsv'
save_path = './outputs/'
pred_output = './outputs/predict-slot/'
save_type = 'ckpt'
random_seed = 0
pre_params = './pretrain/ernie-en-base/params'
config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
pre_params = './pretrain/ERNIE-v2-en-base/params'
config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
input_dim = config['hidden_size']
# ----------------------- for prediction -----------------------
......
......@@ -18,7 +18,7 @@ if __name__ == '__main__':
dropout_prob = 0.1
random_seed = 0
label_map = './data/atis/atis_slot/label_map.json'
vocab_path = './pretrain/ernie-en-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
train_slot = './data/atis/atis_slot/train.tsv'
train_intent = './data/atis/atis_intent/train.tsv'
......@@ -27,8 +27,8 @@ if __name__ == '__main__':
pred_output = './outputs/predict/'
save_type = 'ckpt'
pre_params = './pretrain/ernie-en-base/params'
config = json.load(open('./pretrain/ernie-en-base/ernie_config.json'))
pre_params = './pretrain/ERNIE-v2-en-base/params'
config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
input_dim = config['hidden_size']
# ----------------------- for training -----------------------
......
......@@ -9,16 +9,16 @@ if __name__ == '__main__':
# configs
max_seqlen = 256
batch_size = 8
vocab_path = './pretrain/ernie-zh-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
predict_file = './data/test.tsv'
random_seed = 1
config = json.load(open('./pretrain/ernie-zh-base/ernie_config.json'))
config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
input_dim = config['hidden_size']
num_classes = 2
task_name = 'chnsenticorp'
pred_output = './outputs/predict/'
print_steps = 20
pre_params = './pretrain/ernie-zh-base/params'
pre_params = './pretrain/ERNIE-v1-zh-base/params'
# ----------------------- for prediction -----------------------
......
......@@ -5,7 +5,7 @@ This task is a named entity recognition task. The following sections detail mode
#### Pre-trained Model
The pre-trained model used for this task is: [ernie-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
The pre-trained model used for this task is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).
Make sure you have downloaded the required pre-trained model to the current folder.
......
......@@ -13,7 +13,7 @@ if __name__ == '__main__':
num_classes = 7
weight_decay = 0.01
dropout_prob = 0.1
vocab_path = './pretrain/ernie-zh-base/vocab.txt'
vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
label_map = './data/label_map.json'
random_seed = 1
train_file = './data/train.tsv'
......@@ -21,8 +21,8 @@ if __name__ == '__main__':
save_path='./outputs/'
save_type='ckpt'
pre_params = './pretrain/ernie-zh-base/params'
config = json.load(open('./pretrain/ernie-zh-base/ernie_config.json'))
pre_params = './pretrain/ERNIE-v1-zh-base/params'
config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
input_dim = config['hidden_size']
task_name = 'msra_ner'
pred_output = './outputs/predict/'
......
img/architecture.png (binary image changed: 355.6 KB → 357.2 KB)
......@@ -22,8 +22,7 @@ try:
from urllib.request import urlopen # Python 3
except ImportError:
from urllib2 import urlopen # Python 2
from collections import OrderedDict
import ssl
__all__ = ["download", "ls"]
......@@ -31,20 +30,38 @@ __all__ = ["download", "ls"]
# for https
ssl._create_default_https_context = ssl._create_unverified_context
_items = {
'pretrain': {'ernie-en-large': 'https://ernie.bj.bcebos.com/ERNIE_Large_en_stable-2.0.0.tar.gz',
'ernie-en-base': 'https://ernie.bj.bcebos.com/ERNIE_Base_en_stable-2.0.0.tar.gz',
'ernie-zh-base':'https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz',
'bert-en-uncased-large': 'https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz',
'bert-en-uncased-base': 'https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz',
'roberta-zh-base': 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_ext_L-12_H-768_A-12.tar.gz',
'roberta-zh-large': 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_large_ext_L-24_H-1024_A-16.tar.gz',
'utils': None},
'vocab': {'utils': None},
'backbone': {'utils': None},
'head': {'utils': None},
'reader': {'utils': None},
}
_pretrain = (('RoBERTa-zh-base', 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_ext_L-12_H-768_A-12.tar.gz'),
('RoBERTa-zh-large', 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_large_ext_L-24_H-1024_A-16.tar.gz'),
('ERNIE-v2-en-base', 'https://ernie.bj.bcebos.com/ERNIE_Base_en_stable-2.0.0.tar.gz'),
('ERNIE-v2-en-large', 'https://ernie.bj.bcebos.com/ERNIE_Large_en_stable-2.0.0.tar.gz'),
('XLNet-cased-base','https://xlnet.bj.bcebos.com/xlnet_cased_L-12_H-768_A-12.tgz'),
('XLNet-cased-large','https://xlnet.bj.bcebos.com/xlnet_cased_L-24_H-1024_A-16.tgz'),
('ERNIE-v1-zh-base','https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz'),
('ERNIE-v1-zh-base-max-len-512','https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz'),
('BERT-en-uncased-large-whole-word-masking','https://bert-models.bj.bcebos.com/wwm_uncased_L-24_H-1024_A-16.tar.gz'),
('BERT-en-cased-large-whole-word-masking','https://bert-models.bj.bcebos.com/wwm_cased_L-24_H-1024_A-16.tar.gz'),
('BERT-en-uncased-base', 'https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz'),
('BERT-en-uncased-large', 'https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz'),
('BERT-en-cased-base','https://bert-models.bj.bcebos.com/cased_L-12_H-768_A-12.tar.gz'),
('BERT-en-cased-large','https://bert-models.bj.bcebos.com/cased_L-24_H-1024_A-16.tar.gz'),
('BERT-multilingual-uncased-base','https://bert-models.bj.bcebos.com/multilingual_L-12_H-768_A-12.tar.gz'),
('BERT-multilingual-cased-base','https://bert-models.bj.bcebos.com/multi_cased_L-12_H-768_A-12.tar.gz'),
('BERT-zh-base','https://bert-models.bj.bcebos.com/chinese_L-12_H-768_A-12.tar.gz'),
('utils', None))
_vocab = (('utils', None),)
_backbone = (('utils', None),)
_head = (('utils', None),)
_reader = (('utils', None),)
_items = (('pretrain', OrderedDict(_pretrain)),
('vocab', OrderedDict(_vocab)),
('backbone', OrderedDict(_backbone)),
('head', OrderedDict(_head)),
('reader', OrderedDict(_reader))
)
_items = OrderedDict(_items)
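
Switching each category from a plain dict to an `OrderedDict` keeps O(1) lookup while letting `ls` print items in the curated order shown in the README above. A minimal sketch of that enumeration (illustrative; not necessarily the module's actual `ls` body):

```python
# Print downloadable pretrain items in insertion order, skipping the
# 'utils' placeholder entry, which carries no archive URL.
for name, url in _items['pretrain'].items():
    if url is not None:
        print('  => ' + name)
```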
def _download(item, scope, path, silent=False, convert=False):
data_url = _items[item][scope]
......@@ -96,7 +113,7 @@ def _download(item, scope, path, silent=False, convert=False):
tar.extractall(path=data_dir)
tar.close()
os.remove(filename)
if scope.startswith('bert'):
if len(os.listdir(data_dir)) == 1:
source_path = data_dir + '/' + data_name.split('.')[0]
fileList = os.listdir(source_path)
for file in fileList:
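
The old condition only flattened archives whose scope name started with `bert`; the new `len(os.listdir(data_dir)) == 1` test flattens whenever extraction leaves exactly one wrapper directory, which also covers the ERNIE, RoBERTa and XLNet archives added above. A self-contained sketch of the pattern (hypothetical helper, not the module's exact code):

```python
import os
import shutil

def flatten_single_dir(data_dir):
    """Hoist files out of a lone wrapper directory left behind by tar extraction."""
    entries = os.listdir(data_dir)
    if len(entries) == 1 and os.path.isdir(os.path.join(data_dir, entries[0])):
        source_path = os.path.join(data_dir, entries[0])
        for name in os.listdir(source_path):
            shutil.move(os.path.join(source_path, name), data_dir)
        os.rmdir(source_path)
```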
......@@ -141,8 +158,8 @@ def download(item, scope='all', path='.'):
scope: the scope of the item to download.
path: the target dir to download to. Default is `.`, means current dir.
"""
item = item.lower()
scope = scope.lower()
# item = item.lower()
# scope = scope.lower()
assert item in _items, '{} is not found. Support list: {}'.format(item, list(_items.keys()))
if _items[item]['utils'] is not None:
......
from . import gpu_dev_count, cpu_dev_count
try:
import queue
import queue as Queue
except ImportError:
import Queue as queue
import Queue
from threading import Thread
dev_count = gpu_dev_count if gpu_dev_count > 0 else cpu_dev_count
......
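
The import fix above standardizes on the `Queue` identifier: the stdlib module is `queue` on Python 3 and `Queue` on Python 2, and the old aliasing (`import Queue as queue`) left the bound name mismatched with code that references `Queue`. A minimal compatibility sketch (the worker usage is illustrative, not this module's actual logic):

```python
from threading import Thread
try:
    import queue as Queue  # Python 3: module name is lowercase
except ImportError:
    import Queue           # Python 2: module name is capitalized

q = Queue.Queue()
q.put('batch-0')
q.put(None)  # sentinel that tells the worker to stop

def worker():
    while True:
        item = q.get()
        if item is None:
            break
        print('got', item)

t = Thread(target=worker)
t.start()
t.join()
```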