Commit 75e463a2 authored by sserdoubleh, committed by Yibing Liu

Upload model: Dialogue-PLATO. (#3932)

Parent 13f7b4b1
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don’t work, or not
# install all needed dependencies.
#Pipfile.lock
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# PLATO
**PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable**
[paper link](http://arxiv.org/abs/1910.07931)
**\*\*\*\*\* Update \*\*\*\*\***
Nov. 14: Support the new APIs in PaddlePaddle 1.6.0 (the model files at the links below have been updated accordingly), multi-GPU training, and a top-k sampling decoding strategy. Release our baseline model `PLATO w/o latent`.
## Requirements
```
- python >= 3.6
- paddlepaddle >= 1.6.0
- numpy
- nltk
- tqdm
- visualdl >= 1.3.0 (optional)
- regex
```
## Pre-trained dialogue generation model
This work introduces a novel pre-trained model for dialogue generation, which incorporates discrete latent variables to model the one-to-many relationship between a dialogue context and its valid responses. The model is flexible enough to support various kinds of conversations, including chit-chat, knowledge grounded dialogues, and conversational question answering. Pre-training is carried out on Reddit and Twitter corpora. You can download the uncased pre-trained models from:
* PLATO, uncased [model](https://baidu-nlp.bj.bcebos.com/PLATO/model.tar.gz): 12-layers, 768-hidden, 12-heads, 132M parameters
* PLATO w/o latent, uncased [model](https://baidu-nlp.bj.bcebos.com/PLATO/model-baseline.tar.gz): 12-layers, 768-hidden, 12-heads, 109M parameters
```bash
mv /path/to/model.tar.gz .
tar xzf model.tar.gz
```
## Fine-tuning
We also provide instructions to fine-tune PLATO on different conversation datasets (chit-chat, knowledge grounded dialogues and conversational question answering).
### Data preparation
Download data from the [link](https://baidu-nlp.bj.bcebos.com/PLATO/data.tar.gz).
The tar file contains three processed datasets: `DailyDialog`, `PersonaChat` and `DSTC7_AVSD`.
```bash
mv /path/to/data.tar.gz .
tar xzf data.tar.gz
```
### Data format
Our model supports two kinds of data formats for dialogue context: `multi` and `multi_knowledge`.
* `multi`: multi-turn dialogue context.
```txt
u_1 __eou__ u_2 __eou__ ... u_n \t r
```
* `multi_knowledge`: multi-turn dialogue context with background knowledge.
```txt
k_1 __eou__ k_2 __eou__ ... k_m \t u_1 __eou__ u_2 __eou__ ... u_n \t r
```
If you want to use this model on other datasets, process your data into one of these two formats; a minimal sketch follows.
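For illustration, here is a minimal sketch (the file names and raw-dialogue structure are assumptions, not part of the released preprocessing) of writing examples in the two formats:
```python
# Illustrative only: flatten raw dialogues into the tab-separated formats above.
dialogues = [["hi , how are you ?", "fine , thanks . and you ?", "great !"]]
knowledge = [["speaker likes hiking", "speaker has a dog"]]

# `multi`: context utterances joined by " __eou__ ", a tab, then the response.
with open("dial.train", "w", encoding="utf-8") as fp:
    for turns in dialogues:
        context, response = turns[:-1], turns[-1]
        fp.write(" __eou__ ".join(context) + "\t" + response + "\n")

# `multi_knowledge`: knowledge entries, a tab, the context, a tab, the response.
with open("dial_knowledge.train", "w", encoding="utf-8") as fp:
    for ks, turns in zip(knowledge, dialogues):
        context, response = turns[:-1], turns[-1]
        fp.write(" __eou__ ".join(ks) + "\t"
                 + " __eou__ ".join(context) + "\t" + response + "\n")
```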
### Train
Fine-tuning the pre-trained model on different `${DATASET}`.
```bash
# DailyDialog / PersonaChat / DSTC7_AVSD
DATASET=DailyDialog
sh scripts/${DATASET}/train.sh
```
After training, you can find the output folder `outputs/${DATASET}` (by default). It contains `best.model` (the checkpoint with the best results on the validation set), `hparams.json` (the hyper-parameters of the training script) and `trainer.log` (the training log).
Fine-tuning the pre-trained model on multiple GPUs.
Note: You need to install the NCCL library and set the environment variable `LD_LIBRARY_PATH` properly.
```bash
sh scripts/DailyDialog/multi_gpu_train.sh
```
You can also fine-tune `PLATO w/o latent` on different datasets. We provide an example script for the DailyDialog dataset.
```bash
sh scripts/DailyDialog/baseline_train.sh
```
#### Recommended settings
Fine-tuning the pre-trained model usually takes about 10 epochs to converge with a learning rate of 1e-5, and about 2-3 epochs with a learning rate of 5e-5.
GPU Memory | batch size | max len
------|------|------
16G | 6 | 256
32G | 12 | 256
### Infer
Running inference on the test dataset.
```bash
# DailyDialog / PersonaChat / DSTC7_AVSD
DATASET=DailyDialog
sh scripts/${DATASET}/infer.sh
# Running inference of PLATO w/o latent
sh scripts/DailyDialog/baseline_infer.sh
```
After inference, you can find the output folder `outputs/${DATASET}.infer` (by default). It contains `infer_0.result.json` (the inference results), `hparams.json` (the hyper-parameters of the inference script) and `trainer.log` (the inference log).
If you want to use top-k sampling (beam search by default), you can follow the example script:
```bash
sh scripts/DailyDialog/topk_infer.sh
```
## Result
### DailyDialog
Model | BLEU-1/2 | Distinct-1/2 | Fluency | Coherence | Informativeness | Overall
------|------|------|------|------|------|-------
Seq2Seq | 0.336/0.268 | 0.030/0.128 | 1.85 | 0.37 | 0.44 | 0.33
iVAE_MI | 0.309/0.249 | 0.029/0.250 | 1.53 | 0.34 | 0.59 | 0.30
Our w/o Latent | **0.405/0.322** | 0.046/0.246 | 1.91 | **1.58** | 1.03 | 1.44
Our Method | 0.397/0.311 | **0.053/0.291** | **1.97** | 1.57 | **1.23** | **1.48**
### PersonaChat
Model | BLEU-1/2 | Distinct-1/2 | Knowledge R/P/F1 | Fluency | Coherence | Informativeness | Overall
------|------|------|------|------|------|-------|-------
Seq2Seq | 0.448/0.353 | 0.004/0.016 | 0.004/0.016/0.006 | 1.82 | 0.37 | 0.85 | 0.34
LIC | 0.405/0.320 | 0.019/0.113 | 0.042/0.154/0.064 | 1.95 | 1.34 | 1.09 | 1.29
Our w/o Latent | **0.458/0.357** | 0.012/0.064 | 0.085/0.263/0.125 | 1.98 | 1.36 | 1.04 | 1.30
Our Method | 0.406/0.315 | **0.021/0.121** | **0.142/0.461/0.211** | **1.99** | **1.51** | **1.70** | **1.50**
### DSTC7_AVSD
Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr
------|------|------|------|------|------|-------|-------
Baseline | 0.629 | 0.485 | 0.383 | 0.309 | 0.215 | 0.487 | 0.746
CMU | 0.718 | 0.584 | 0.478 | 0.394 | 0.267 | 0.563 | 1.094
Our Method | **0.784** | **0.637** | **0.525** | **0.435** | **0.286** | **0.596** | **1.209**
Our Method Upper Bound | 0.925 | 0.843 | 0.767 | 0.689 | 0.361 | 0.731 | 1.716
Note: In the experiments on `DSTC7_AVSD`, the response selection of our method is strengthened with an extra ranking step, which ranks the candidates according to the automatic scores and selects the top one as the final answer.
## Citation
If you find PLATO useful in your work, please cite the following arXiv paper:
```
@article{bao2019plato,
title={PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable},
author={Bao, Siqi and He, Huang and Wang, Fan and Wu, Hua and Wang, Haifeng},
journal={arXiv preprint arXiv:1910.07931},
year={2019}
}
```
## Disclaimer
This project aims to facilitate further research progress in dialogue generation. Baidu is not responsible for third-party content generated with the pre-trained system.
## Contact information
For help or issues using PLATO, please submit a GitHub issue.
For personal communication related to PLATO, please contact Siqi Bao (`baosiqi@baidu.com`), or Huang He (`hehuang@baidu.com`).
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Parse argument.
"""
import argparse
import json
def str2bool(v):
if v.lower() in ('yes', 'true', 't', 'y', '1'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError('Unsupported value encountered.')
class HParams(dict):
""" Hyper-parameters class
Store hyper-parameters in training / infer / ... scripts.
"""
def __getattr__(self, name):
if name in self.keys():
return self[name]
for v in self.values():
if isinstance(v, HParams):
if name in v:
return v[name]
raise AttributeError(f"'HParams' object has no attribute '{name}'")
def __setattr__(self, name, value):
self[name] = value
def save(self, filename):
with open(filename, "w", encoding="utf-8") as fp:
json.dump(self, fp, ensure_ascii=False,
indent=4, sort_keys=False)
def load(self, filename):
with open(filename, "r", encoding="utf-8") as fp:
params_dict = json.load(fp)
for k, v in params_dict.items():
if isinstance(v, dict):
self[k].update(HParams(v))
else:
self[k] = v
def parse_args(parser):
""" Parse hyper-parameters from cmdline. """
parsed = parser.parse_args()
args = HParams()
optional_args = parser._action_groups[1]
for action in optional_args._group_actions[1:]:
arg_name = action.dest
args[arg_name] = getattr(parsed, arg_name)
for group in parser._action_groups[2:]:
group_args = HParams()
for action in group._group_actions:
arg_name = action.dest
group_args[arg_name] = getattr(parsed, arg_name)
if len(group_args) > 0:
args[group.title] = group_args
return args
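# A minimal usage sketch (illustrative; the "Trainer" group and its options are
# made up here, not taken from the actual training scripts): each component
# registers an argument group, and parse_args collects the groups into nested
# HParams objects.
if __name__ == "__main__":
    demo_parser = argparse.ArgumentParser()
    demo_group = demo_parser.add_argument_group("Trainer")
    demo_group.add_argument("--batch_size", type=int, default=8)
    demo_group.add_argument("--use_gpu", type=str2bool, default=False)

    demo_hparams = parse_args(demo_parser)
    print(demo_hparams.Trainer.batch_size)   # group-level access
    print(demo_hparams.batch_size)           # nested lookup through __getattr__
    demo_hparams.save("demo_hparams.json")   # dump all groups to a JSON file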
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
DataLoader class
"""
import math
import paddle.fluid as fluid
import paddle.batch
from plato.args import str2bool
from plato.data.sampler import RandomSampler
from plato.data.sampler import SequentialSampler
from plato.data.sampler import SortedSampler
import plato.modules.parallel as parallel
class DataLoader(object):
""" Implement of DataLoader. """
@classmethod
def add_cmdline_argument(cls, group):
group.add_argument("--shuffle", type=str2bool, default=True)
group.add_argument("--sort_pool_size", type=int, default=0)
return group
def __init__(self, dataset, hparams, collate_fn=None, sampler=None, is_test=False, is_train=False):
self.dataset = dataset
self.collate_fn = collate_fn
self.sort_pool_size = hparams.sort_pool_size
if sampler is None:
if hparams.shuffle and not is_test:
sampler = RandomSampler(dataset)
else:
sampler = SequentialSampler(dataset)
if self.sort_pool_size > 0 and not is_test:
sampler = SortedSampler(sampler, self.sort_pool_size)
def reader():
for idx in sampler:
yield idx
self.reader = paddle.batch(reader, batch_size=hparams.batch_size, drop_last=False)
self.num_batches = math.ceil(len(dataset) / hparams.batch_size)
if hparams.use_data_distributed and parallel.Env().nranks > 1 and is_train:
self.reader = fluid.contrib.reader.distributed_batch_reader(self.reader)
self.num_batches = self.num_batches // fluid.dygraph.parallel.Env().nranks
return
def __len__(self):
return self.num_batches
def __iter__(self):
for batch_indices in self.reader():
samples = [self.dataset[idx] for idx in batch_indices]
yield self.collate_fn(samples)
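# A minimal usage sketch (illustrative; the import paths and hyper-parameter
# values are assumptions, not taken from the training scripts).
if __name__ == "__main__":
    from plato.args import HParams
    from plato.data.dataset import Dataset

    demo_hparams = HParams(shuffle=True, sort_pool_size=0,
                           batch_size=2, use_data_distributed=False)
    demo_dataset = Dataset([{"src": [[1, 2, 3]]}, {"src": [[4, 5]]},
                            {"src": [[6]]}, {"src": [[7, 8]]}])
    # Identity collate_fn: each batch is simply the list of raw samples.
    loader = DataLoader(demo_dataset, demo_hparams,
                        collate_fn=lambda samples: samples)
    print(len(loader))            # ceil(4 / 2) = 2 batches
    for batch in loader:
        print(batch)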
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Dataset class
"""
import json
class Dataset(object):
""" Basic Dataset interface class. """
@classmethod
def add_cmdline_argument(cls, parser):
group = parser.add_argument_group("Dataset")
group.add_argument("--data_dir", type=str, required=True,
help="The dataset dir.")
group.add_argument("--data_type", type=str, required=True,
choices=["multi", "multi_knowledge"],
help="The type of dataset.")
return group
def __init__(self, data):
self.data = data
def __len__(self):
return len(self.data)
def __getitem__(self, idx):
return self.data[idx]
class LazyDataset(Dataset):
"""
Lazy load dataset from disk.
Each line of data file is a preprocessed example.
"""
def __init__(self, data_file, transform=lambda s: json.loads(s)):
"""
Initialize lazy dataset.
By default, loading .jsonl format.
:param data_file
:type str
:param transform
:type callable
"""
self.data_file = data_file
self.transform = transform
self.offsets = [0]
with open(data_file, "r", encoding="utf-8") as fp:
while fp.readline() != "":
self.offsets.append(fp.tell())
self.offsets.pop()
self.fp = open(data_file, "r", encoding="utf-8")
def __len__(self):
return len(self.offsets)
def __getitem__(self, idx):
self.fp.seek(self.offsets[idx], 0)
return self.transform(self.fp.readline().strip())
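# A quick usage sketch of LazyDataset (illustrative; the file name is made up).
if __name__ == "__main__":
    with open("demo.jsonl", "w", encoding="utf-8") as fp:
        fp.write(json.dumps({"src": [[1, 2]], "tgt": [3, 4]}) + "\n")
        fp.write(json.dumps({"src": [[5]], "tgt": [6]}) + "\n")

    demo_dataset = LazyDataset("demo.jsonl")
    print(len(demo_dataset))   # 2 -- one byte offset is recorded per line
    print(demo_dataset[1])     # seeks to the second line and parses it on demand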
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Field class
"""
from itertools import chain
import json
import numpy as np
import pickle
import time
from tqdm import tqdm
from plato.args import str2bool
from plato.data.tokenizer import Tokenizer
def max_lens(X):
lens = [len(X)]
while isinstance(X[0], list):
lens.append(max(map(len, X)))
X = [x for xs in X for x in xs]
return lens
def list2np(X, padding=0, dtype="int64"):
shape = max_lens(X)
ret = np.full(shape, padding, dtype=np.int32)
if len(shape) == 1:
ret = np.array(X)
elif len(shape) == 2:
for i, x in enumerate(X):
ret[i, :len(x)] = np.array(x)
elif len(shape) == 3:
for i, xs in enumerate(X):
for j, x in enumerate(xs):
ret[i, j, :len(x)] = np.array(x)
return ret.astype(dtype)
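# Usage sketch (illustrative):
#   max_lens([[1, 2, 3], [4, 5]])            -> [2, 3]  (2 sequences, longest is 3)
#   list2np([[1, 2, 3], [4, 5]], padding=0)  -> [[1, 2, 3],
#                                                [4, 5, 0]]
#   list2np([[[1, 2], [3]], [[4]]]).shape    -> (2, 2, 2) for 3-level nesting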
class BPETextField(object):
pad_token = "[PAD]"
bos_token = "[BOS]"
eos_token = "[EOS]"
unk_token = "[UNK]"
@classmethod
def add_cmdline_argument(cls, parser):
group = parser.add_argument_group("BPETextField")
group.add_argument("--vocab_path", type=str, required=True,
help="The vocabulary file path.")
group.add_argument("--filtered", type=str2bool, default=False,
help="Whether to filter the data with too long utterance/context. "
"If the data is unfiltered, it will be truncated.")
group.add_argument("--max_len", type=int, default=256,
help="The maximum length of context or knowledges.")
group.add_argument("--min_utt_len", type=int, default=1,
help="The minimum length of utterance.")
group.add_argument("--max_utt_len", type=int, default=50,
help="The maximum length of utterance.")
group.add_argument("--min_ctx_turn", type=int, default=1,
help="The minimum turn of context.")
group.add_argument("--max_ctx_turn", type=int, default=16,
help="The maximum turn of context.")
group.add_argument("--max_knowledge_num", type=int, default=16,
help="The maximum number of knowledges.")
group.add_argument("--max_knowledge_len", type=int, default=16,
help="The maximum length of each knowledges.")
group.add_argument("--tokenizer_type", type=str, default="Bert",
choices=["Bert", "GPT2"],
help="The type of tokenizer.")
return group
def __init__(self, hparams):
special_tokens = [self.pad_token, self.bos_token, self.eos_token, self.unk_token]
self.special_tokens = special_tokens  # referenced by the num_specials property
self.tokenizer = Tokenizer(vocab_path=hparams.vocab_path,
special_tokens=special_tokens,
tokenizer_type=hparams.tokenizer_type)
self.filtered = hparams.filtered
self.max_len = hparams.max_len
self.min_utt_len = hparams.min_utt_len
self.max_utt_len = hparams.max_utt_len
self.min_ctx_turn = hparams.min_ctx_turn
self.max_ctx_turn = hparams.max_ctx_turn - 1 # subtract reply turn
self.max_knowledge_num = hparams.max_knowledge_num
self.max_knowledge_len = hparams.max_knowledge_len
return
@property
def vocab_size(self):
return self.tokenizer.vocab_size
@property
def num_specials(self):
return len(self.special_tokens)
@property
def pad_id(self):
return self.tokenizer.convert_tokens_to_ids([self.pad_token])[0]
@property
def bos_id(self):
return self.tokenizer.convert_tokens_to_ids([self.bos_token])[0]
@property
def eos_id(self):
return self.tokenizer.convert_tokens_to_ids([self.eos_token])[0]
@property
def unk_id(self):
return self.tokenizer.convert_tokens_to_ids([self.unk_token])[0]
@property
def bot_id(self):
return 0
@property
def user_id(self):
return 1
@property
def knowledge_id(self):
return 2
def numericalize(self, tokens):
assert isinstance(tokens, list)
if len(tokens) == 0:
return []
element = tokens[0]
if isinstance(element, list):
return [self.numericalize(s) for s in tokens]
else:
return self.tokenizer.convert_tokens_to_ids(tokens)
def denumericalize(self, numbers):
assert isinstance(numbers, list)
if len(numbers) == 0:
return []
element = numbers[0]
if isinstance(element, list):
return [self.denumericalize(x) for x in numbers]
else:
return self.tokenizer.decode(
numbers, ignore_tokens=[self.bos_token, self.eos_token, self.pad_token])
def save_examples(self, examples, filename):
print(f"Saving examples to '{filename}' ...")
start = time.time()
if filename.endswith("pkl"):
with open(filename, "wb") as fp:
pickle.dump(examples, fp)
elif filename.endswith("jsonl"):
with open(filename, "w", encoding="utf-8") as fp:
for ex in examples:
fp.write(json.dumps(ex) + "\n")
else:
raise ValueError(f"Unsport file format: {filename}")
elapsed = time.time() - start
print(f"Saved {len(examples)} examples (elapsed {elapsed:.2f}s)")
def load_examples(self, filename):
print(f"Loading examples from '{filename}' ...")
start = time.time()
if filename.endswith("pkl"):
with open(filename, "rb") as fp:
examples = pickle.load(fp)
else:
with open(filename, "r", encoding="utf-8") as fp:
examples = list(map(lambda s: json.loads(s.strip()), fp))
elapsed = time.time() - start
print(f"Loaded {len(examples)} examples (elapsed {elapsed:.2f}s)")
return examples
def utt_filter_pred(self, utt):
return self.min_utt_len <= len(utt) \
and (not self.filtered or len(utt) <= self.max_utt_len)
def utts_filter_pred(self, utts):
return self.min_ctx_turn <= len(utts) \
and (not self.filtered or len(utts) <= self.max_ctx_turn)
def build_example_multi_turn(self, req):
examples = []
src = [self.tokenizer.tokenize(s) for s in req["context"]]
src = [s[-self.max_utt_len:] for s in src[-self.max_ctx_turn:]]
src = [self.numericalize(s) + [self.eos_id] for s in src]
ex = {"src": src}
examples.append(ex)
return examples
def build_example_multi_turn_with_knowledge(self, req):
examples = []
src = [self.tokenizer.tokenize(s) for s in req["context"]]
src = [s[-self.max_utt_len:] for s in src[-self.max_ctx_turn:]]
src = [self.numericalize(s) + [self.eos_id] for s in src]
knowledge = [self.tokenizer.tokenize(k) for k in req["knowledge"]]
knowledge = [k[:self.max_knowledge_len] for k in knowledge]
knowledge = [self.numericalize(k) + [self.eos_id] for k in knowledge]
ex = {"src": src, "knowledge": knowledge}
examples.append(ex)
return examples
def build_examples_multi_turn(self, data_file, data_type="train"):
print(f"Reading examples from '{data_file}' ...")
examples = []
ignored = 0
with open(data_file, "r", encoding="utf-8") as f:
for line in tqdm(f, total=None):
src, tgt = line.strip("\n").split("\t")
tgt = self.tokenizer.tokenize(tgt)
src = [self.tokenizer.tokenize(s) for s in src.split(" __eou__ ")]
if (self.utts_filter_pred(src) and all(map(self.utt_filter_pred, src))
and self.utt_filter_pred(tgt)) or data_type == "test":
src = [s[-self.max_utt_len:] for s in src[-self.max_ctx_turn:]]
src = [self.numericalize(s) + [self.eos_id] for s in src]
tgt = [self.bos_id] + self.numericalize(tgt) + [self.eos_id]
if data_type != "test":
tgt = tgt[:self.max_utt_len + 2]
ex = {"src": src, "tgt": tgt}
examples.append(ex)
else:
ignored += 1
print(f"Built {len(examples)} {data_type.upper()} examples ({ignored} filtered)")
return examples
def build_examples_multi_turn_with_knowledge(self, data_file, data_type="train"):
print(f"Reading examples from '{data_file}' ...")
examples = []
ignored = 0
with open(data_file, "r", encoding="utf-8") as f:
for line in tqdm(f, total=None):
knowledge, src, tgt = line.strip("\n").split("\t")
tgt = self.tokenizer.tokenize(tgt)
knowledge = [self.tokenizer.tokenize(k) for k in knowledge.split(" __eou__ ")]
knowledge = [k[:self.max_knowledge_len]
for k in knowledge[-self.max_knowledge_num:]]
src = [self.tokenizer.tokenize(s) for s in src.split(" __eou__ ")]
if (self.utts_filter_pred(src) and all(map(self.utt_filter_pred, src))
and self.utt_filter_pred(tgt)) or data_type == "test":
src = [s[-self.max_utt_len:] for s in src[-self.max_ctx_turn:]]
src = [self.numericalize(s) + [self.eos_id] for s in src]
knowledge = [self.numericalize(k) + [self.eos_id] for k in knowledge]
tgt = [self.bos_id] + self.numericalize(tgt) + [self.eos_id]
if data_type != "test":
tgt = tgt[:self.max_utt_len + 2]
ex = {"src": src, "knowledge": knowledge, "tgt": tgt}
examples.append(ex)
else:
ignored += 1
print(f"Built {len(examples)} {data_type.upper()} examples ({ignored} filtered)")
return examples
def collate_fn_multi_turn(self, samples):
batch_size = len(samples)
src = [sp["src"] for sp in samples]
src_token, src_pos, src_turn, src_role = [], [], [], []
for utts in src:
utt_lens = [len(utt) for utt in utts]
# Token ids
src_token.append(list(chain(*utts))[-self.max_len:])
# Position ids
pos = [list(range(l)) for l in utt_lens]
src_pos.append(list(chain(*pos))[-self.max_len:])
# Turn ids
turn = [[len(utts) - i] * l for i, l in enumerate(utt_lens)]
src_turn.append(list(chain(*turn))[-self.max_len:])
# Role ids
role = [[self.bot_id if (len(utts) - i) % 2 == 0 else self.user_id] * l
for i, l in enumerate(utt_lens)]
src_role.append(list(chain(*role))[-self.max_len:])
src_token = list2np(src_token, padding=self.pad_id)
src_pos = list2np(src_pos, padding=self.pad_id)
src_turn = list2np(src_turn, padding=self.pad_id)
src_role = list2np(src_role, padding=self.pad_id)
batch = {}
batch["src_token"] = src_token
batch["src_mask"] = (src_token != self.pad_id).astype("int64")
batch["src_pos"] = src_pos
batch["src_type"] = src_role
batch["src_turn"] = src_turn
if "tgt" in samples[0]:
tgt = [sp["tgt"] for sp in samples]
# Token ids & Label ids
tgt_token = list2np(tgt, padding=self.pad_id)
# Position ids
tgt_pos = np.zeros_like(tgt_token)
tgt_pos[:] = np.arange(tgt_token.shape[1], dtype=tgt_token.dtype)
# Turn ids
tgt_turn = np.zeros_like(tgt_token)
# Role ids
tgt_role = np.full_like(tgt_token, self.bot_id)
batch["tgt_token"] = tgt_token
batch["tgt_mask"] = (tgt_token != self.pad_id).astype("int64")
batch["tgt_pos"] = tgt_pos
batch["tgt_type"] = tgt_role
batch["tgt_turn"] = tgt_turn
return batch, batch_size
def collate_fn_multi_turn_with_knowledge(self, samples):
batch_size = len(samples)
src = [sp["src"] for sp in samples]
knowledge = [sp["knowledge"] for sp in samples]
src_token, src_pos, src_turn, src_role = [], [], [], []
for utts, ks in zip(src, knowledge):
utt_lens = [len(utt) for utt in utts]
k_lens = [len(k) for k in ks]
# Token ids
token = list(chain(*utts))[-self.max_len:]
token.extend(list(chain(*ks))[-self.max_len:])
src_token.append(token)
# Position ids
pos = list(chain(*[list(range(l)) for l in utt_lens]))[-self.max_len:]
pos.extend(list(chain(*[list(range(l)) for l in k_lens]))[-self.max_len:])
src_pos.append(pos)
# Turn ids
turn = list(chain(*[[len(utts) - i] * l for i, l in enumerate(utt_lens)]))[-self.max_len:]
turn.extend(list(chain(*[[i] * l for i, l in enumerate(k_lens)]))[-self.max_len:])
src_turn.append(turn)
# Role ids
role = list(chain(*[[self.bot_id if (len(utts)-i) % 2 == 0 else self.user_id] * l
for i, l in enumerate(utt_lens)]))[-self.max_len:]
role.extend(list(chain(*[[self.knowledge_id] * l for l in k_lens]))[-self.max_len:])
src_role.append(role)
src_token = list2np(src_token, padding=self.pad_id)
src_pos = list2np(src_pos, padding=self.pad_id)
src_turn = list2np(src_turn, padding=self.pad_id)
src_role = list2np(src_role, padding=self.pad_id)
batch = {}
batch["src_token"] = src_token
batch["src_mask"] = (src_token != self.pad_id).astype("int64")
batch["src_pos"] = src_pos
batch["src_type"] = src_role
batch["src_turn"] = src_turn
if "tgt" in samples[0]:
tgt = [sp["tgt"] for sp in samples]
# Token ids & Label ids
tgt_token = list2np(tgt, padding=self.pad_id)
# Position ids
tgt_pos = np.zeros_like(tgt_token)
tgt_pos[:] = np.arange(tgt_token.shape[1], dtype=tgt_token.dtype)
# Turn ids
tgt_turn = np.zeros_like(tgt_token)
# Role ids
tgt_role = np.full_like(tgt_token, self.bot_id)
batch["tgt_token"] = tgt_token
batch["tgt_mask"] = (tgt_token != self.pad_id).astype("int64")
batch["tgt_pos"] = tgt_pos
batch["tgt_type"] = tgt_role
batch["tgt_turn"] = tgt_turn
return batch, batch_size
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Sampler class.
"""
import numpy as np
class Sampler(object):
def __init__(self):
return
def __len__(self):
raise NotImplementedError
def __iter__(self):
raise NotImplementedError
class SequentialSampler(Sampler):
def __init__(self, dataset):
self.dataset = dataset
return
def __len__(self):
return len(self.dataset)
def __iter__(self):
return iter(range(len(self)))
class RandomSampler(Sampler):
def __init__(self, dataset):
self.dataset = dataset
self.epoch = 0
return
def __len__(self):
return len(self.dataset)
def __iter__(self):
np.random.seed(self.epoch)
self.epoch += 1
return iter(np.random.permutation(len(self)))
class SortedSampler(Sampler):
""" Sorted Sampler.
Sort each block of examples by key.
"""
def __init__(self, sampler, sort_pool_size, key="src"):
self.sampler = sampler
self.sort_pool_size = sort_pool_size
self.key = lambda idx: len(self.sampler.dataset[idx][key])
return
def __len__(self):
return len(self.sampler)
def __iter__(self):
pool = []
for idx in self.sampler:
pool.append(idx)
if len(pool) == self.sort_pool_size:
pool = sorted(pool, key=self.key)
for i in pool:
yield i
pool = []
if len(pool) > 0:
pool = sorted(pool, key=self.key)
for i in pool:
yield i
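# A minimal usage sketch (illustrative; the toy dataset is made up). Sorting each
# pool of indices by source length keeps sequences of similar size in the same
# batch, which reduces padding.
if __name__ == "__main__":
    demo_data = [{"src": list(range(n))} for n in (7, 2, 9, 3, 5, 1, 8, 4)]
    base_sampler = RandomSampler(demo_data)
    sorted_sampler = SortedSampler(base_sampler, sort_pool_size=4)
    print(list(sorted_sampler))   # two pools of 4 indices, each ordered by len(src)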
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Tokenizer class.
"""
from __future__ import absolute_import, division, print_function, unicode_literals
import collections
import json
import logging
import os
import regex as re
import sys
import unicodedata
def clean_string(string):
replace_mp = {
" - ": "-",
" ' ": "'",
" n't": "n't",
" 'm": "'m",
" do not": " don't",
" 's": "'s",
" 've": "'ve",
" 're": "'re"
}
for k, v in replace_mp.items():
string = string.replace(k, v)
return string
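# For example (illustrative):
#   clean_string("it ' s a nice day , is n't it ?") -> "it's a nice day , isn't it ?"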
class Tokenizer(object):
def __init__(self, vocab_path, special_tokens=[], tokenizer_type="Bert"):
self.tokenizer_type = tokenizer_type
if tokenizer_type == "Bert":
self.spec_convert_dict = {"[BOS]": "[unused0]", "[EOS]": "[unused1]"}
self.spec_revert_dict = {v: k for k,
v in self.spec_convert_dict.items()}
special_tokens = [self.spec_convert_dict.get(tok, tok)
for tok in special_tokens]
self.special_tokens = ("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")
self.special_tokens += tuple(x for x in special_tokens if x not in self.special_tokens)
self._tokenizer = BertTokenizer(vocab_path, never_split=self.special_tokens)
for tok in self.special_tokens:
assert tok in self._tokenizer.vocab, f"special token '{tok}' is not in the vocabulary"
self.vocab_size = len(self._tokenizer.vocab)
elif tokenizer_type == "GPT2":
self.spec_convert_dict = {"[UNK]": "<unk>"}
self.spec_revert_dict = {v: k for k,
v in self.spec_convert_dict.items()}
special_tokens = [tok for tok in special_tokens
if tok not in self.spec_convert_dict]
vocab_file = os.path.join(vocab_path, "vocab.json")
merges_file = os.path.join(vocab_path, "merges.txt")
self._tokenizer = GPT2Tokenizer(vocab_file, merges_file, special_tokens=special_tokens)
self.num_specials = len(special_tokens)
self.vocab_size = len(self._tokenizer)
else:
raise ValueError
def tokenize(self, text):
return self._tokenizer.tokenize(text)
def convert_tokens_to_ids(self, tokens):
if self.tokenizer_type == "Bert":
tokens = [self.spec_convert_dict.get(tok, tok) for tok in tokens]
ids = self._tokenizer.convert_tokens_to_ids(tokens)
return ids
else:
tokens = [self.spec_convert_dict.get(tok, tok) for tok in tokens]
ids = self._tokenizer.convert_tokens_to_ids(tokens)
ids = [(i + self.num_specials) % self.vocab_size for i in ids]
return ids
def convert_ids_to_tokens(self, ids):
if self.tokenizer_type == "Bert":
tokens = self._tokenizer.convert_ids_to_tokens(ids)
tokens = [self.spec_revert_dict.get(tok, tok) for tok in tokens]
return tokens
else:
ids = [(i - self.num_specials) % self.vocab_size for i in ids]
tokens = self._tokenizer.convert_ids_to_tokens(ids)
tokens = [self.spec_revert_dict.get(tok, tok) for tok in tokens]
return tokens
def decode(self, ids, ignore_tokens=[]):
tokens = self.convert_ids_to_tokens(ids)
if len(ignore_tokens) > 0:
ignore_tokens = set(ignore_tokens)
tokens = [tok for tok in tokens if tok not in ignore_tokens]
if self.tokenizer_type == "Bert":
string = " ".join(tokens).replace(" ##", "")
else:
string = "".join(tokens)
string = bytearray([self._tokenizer.byte_decoder[c]
for c in string]).decode("utf-8")
string = clean_string(string)
return string
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""
logger = logging.getLogger(__name__)
def load_vocab(vocab_file):
"""Loads a vocabulary file into a dictionary."""
vocab = collections.OrderedDict()
index = 0
with open(vocab_file, "r", encoding="utf-8") as reader:
while True:
token = reader.readline()
if not token:
break
token = token.strip()
vocab[token] = index
index += 1
return vocab
def whitespace_tokenize(text):
"""Runs basic whitespace cleaning and splitting on a piece of text."""
text = text.strip()
if not text:
return []
tokens = text.split()
return tokens
class BertTokenizer(object):
"""Runs end-to-end tokenization: punctuation splitting + wordpiece"""
def __init__(self, vocab_file, do_lower_case=True, max_len=None, do_basic_tokenize=True,
never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")):
"""Constructs a BertTokenizer.
Args:
vocab_file: Path to a one-wordpiece-per-line vocabulary file
do_lower_case: Whether to lower case the input
Only has an effect when do_wordpiece_only=False
do_basic_tokenize: Whether to do basic tokenization before wordpiece.
max_len: An artificial maximum length to truncate tokenized sequences to;
Effective maximum length is always the minimum of this
value (if specified) and the underlying BERT model's
sequence length.
never_split: List of tokens which will never be split during tokenization.
Only has an effect when do_wordpiece_only=False
"""
if not os.path.isfile(vocab_file):
raise ValueError(
"Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained "
"model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`".format(vocab_file))
self.vocab = load_vocab(vocab_file)
self.ids_to_tokens = collections.OrderedDict(
[(ids, tok) for tok, ids in self.vocab.items()])
self.do_basic_tokenize = do_basic_tokenize
if do_basic_tokenize:
self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case,
never_split=never_split)
self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)
self.max_len = max_len if max_len is not None else int(1e12)
def tokenize(self, text):
split_tokens = []
if self.do_basic_tokenize:
for token in self.basic_tokenizer.tokenize(text):
for sub_token in self.wordpiece_tokenizer.tokenize(token):
split_tokens.append(sub_token)
else:
split_tokens = self.wordpiece_tokenizer.tokenize(text)
return split_tokens
def convert_tokens_to_ids(self, tokens):
"""Converts a sequence of tokens into ids using the vocab."""
ids = []
for token in tokens:
ids.append(self.vocab[token])
if len(ids) > self.max_len:
logger.warning(
"Token indices sequence length is longer than the specified maximum "
" sequence length for this BERT model ({} > {}). Running this"
" sequence through BERT will result in indexing errors".format(len(ids), self.max_len)
)
return ids
def convert_ids_to_tokens(self, ids):
"""Converts a sequence of ids in wordpiece tokens using the vocab."""
tokens = []
for i in ids:
tokens.append(self.ids_to_tokens[i])
return tokens
class BasicTokenizer(object):
"""Runs basic tokenization (punctuation splitting, lower casing, etc.)."""
def __init__(self,
do_lower_case=True,
never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")):
"""Constructs a BasicTokenizer.
Args:
do_lower_case: Whether to lower case the input.
"""
self.do_lower_case = do_lower_case
self.never_split = never_split
def tokenize(self, text):
"""Tokenizes a piece of text."""
text = self._clean_text(text)
# This was added on November 1st, 2018 for the multilingual and Chinese
# models. This is also applied to the English models now, but it doesn't
# matter since the English models were not trained on any Chinese data
# and generally don't have any Chinese data in them (there are Chinese
# characters in the vocabulary because Wikipedia does have some Chinese
# words in the English Wikipedia.).
text = self._tokenize_chinese_chars(text)
orig_tokens = whitespace_tokenize(text)
split_tokens = []
for token in orig_tokens:
if self.do_lower_case and token not in self.never_split:
token = token.lower()
token = self._run_strip_accents(token)
split_tokens.extend(self._run_split_on_punc(token))
output_tokens = whitespace_tokenize(" ".join(split_tokens))
return output_tokens
def _run_strip_accents(self, text):
"""Strips accents from a piece of text."""
text = unicodedata.normalize("NFD", text)
output = []
for char in text:
cat = unicodedata.category(char)
if cat == "Mn":
continue
output.append(char)
return "".join(output)
def _run_split_on_punc(self, text):
"""Splits punctuation on a piece of text."""
if text in self.never_split:
return [text]
chars = list(text)
i = 0
start_new_word = True
output = []
while i < len(chars):
char = chars[i]
if _is_punctuation(char):
output.append([char])
start_new_word = True
else:
if start_new_word:
output.append([])
start_new_word = False
output[-1].append(char)
i += 1
return ["".join(x) for x in output]
def _tokenize_chinese_chars(self, text):
"""Adds whitespace around any CJK character."""
output = []
for char in text:
cp = ord(char)
if self._is_chinese_char(cp):
output.append(" ")
output.append(char)
output.append(" ")
else:
output.append(char)
return "".join(output)
def _is_chinese_char(self, cp):
"""Checks whether CP is the codepoint of a CJK character."""
# This defines a "chinese character" as anything in the CJK Unicode block:
# https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
#
# Note that the CJK Unicode block is NOT all Japanese and Korean characters,
# despite its name. The modern Korean Hangul alphabet is a different block,
# as is Japanese Hiragana and Katakana. Those alphabets are used to write
# space-separated words, so they are not treated specially and handled
# like all of the other languages.
if ((cp >= 0x4E00 and cp <= 0x9FFF) or #
(cp >= 0x3400 and cp <= 0x4DBF) or #
(cp >= 0x20000 and cp <= 0x2A6DF) or #
(cp >= 0x2A700 and cp <= 0x2B73F) or #
(cp >= 0x2B740 and cp <= 0x2B81F) or #
(cp >= 0x2B820 and cp <= 0x2CEAF) or
(cp >= 0xF900 and cp <= 0xFAFF) or #
(cp >= 0x2F800 and cp <= 0x2FA1F)): #
return True
return False
def _clean_text(self, text):
"""Performs invalid character removal and whitespace cleanup on text."""
output = []
for char in text:
cp = ord(char)
if cp == 0 or cp == 0xfffd or _is_control(char):
continue
if _is_whitespace(char):
output.append(" ")
else:
output.append(char)
return "".join(output)
class WordpieceTokenizer(object):
"""Runs WordPiece tokenization."""
def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100):
self.vocab = vocab
self.unk_token = unk_token
self.max_input_chars_per_word = max_input_chars_per_word
def tokenize(self, text):
"""Tokenizes a piece of text into its word pieces.
This uses a greedy longest-match-first algorithm to perform tokenization
using the given vocabulary.
For example:
input = "unaffable"
output = ["un", "##aff", "##able"]
Args:
text: A single token or whitespace separated tokens. This should have
already been passed through `BasicTokenizer`.
Returns:
A list of wordpiece tokens.
"""
output_tokens = []
for token in whitespace_tokenize(text):
chars = list(token)
if len(chars) > self.max_input_chars_per_word:
output_tokens.append(self.unk_token)
continue
is_bad = False
start = 0
sub_tokens = []
while start < len(chars):
end = len(chars)
cur_substr = None
while start < end:
substr = "".join(chars[start:end])
if start > 0:
substr = "##" + substr
if substr in self.vocab:
cur_substr = substr
break
end -= 1
if cur_substr is None:
is_bad = True
break
sub_tokens.append(cur_substr)
start = end
if is_bad:
output_tokens.append(self.unk_token)
else:
output_tokens.extend(sub_tokens)
return output_tokens
def _is_whitespace(char):
"""Checks whether `chars` is a whitespace character."""
# \t, \n, and \r are technically control characters but we treat them
# as whitespace since they are generally considered as such.
if char == " " or char == "\t" or char == "\n" or char == "\r":
return True
cat = unicodedata.category(char)
if cat == "Zs":
return True
return False
def _is_control(char):
"""Checks whether `chars` is a control character."""
# These are technically control characters but we count them as whitespace
# characters.
if char == "\t" or char == "\n" or char == "\r":
return False
cat = unicodedata.category(char)
if cat.startswith("C"):
return True
return False
def _is_punctuation(char):
"""Checks whether `chars` is a punctuation character."""
cp = ord(char)
# We treat all non-letter/number ASCII as punctuation.
# Characters such as "^", "$", and "`" are not in the Unicode
# Punctuation class but we treat them as punctuation anyways, for
# consistency.
if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
(cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
return True
cat = unicodedata.category(char)
if cat.startswith("P"):
return True
return False
# Copyright 2018 The Open AI Team Authors and The HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes for OpenAI GPT."""
try:
from functools import lru_cache
except ImportError:
# Just a dummy decorator to get the checks to run on python2
# because honestly I don't want to support a byte-level unicode BPE tokenizer on python 2 right now.
def lru_cache():
return lambda func: func
@lru_cache()
def bytes_to_unicode():
"""
Returns list of utf-8 byte and a corresponding list of unicode strings.
The reversible bpe codes work on unicode strings.
This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
This is a significant percentage of your normal, say, 32K bpe vocab.
To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
And avoids mapping to whitespace/control characters the bpe code barfs on.
"""
_chr = unichr if sys.version_info[0] == 2 else chr
bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
cs = bs[:]
n = 0
for b in range(2**8):
if b not in bs:
bs.append(b)
cs.append(2**8+n)
n += 1
cs = [_chr(n) for n in cs]
return dict(zip(bs, cs))
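# For example, bytes_to_unicode()[ord(" ")] == "Ġ", which is why GPT-2 BPE tokens
# that begin a new word carry a leading "Ġ" marker.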
def get_pairs(word):
"""Return set of symbol pairs in a word.
Word is represented as tuple of symbols (symbols being variable-length strings).
"""
pairs = set()
prev_char = word[0]
for char in word[1:]:
pairs.add((prev_char, char))
prev_char = char
return pairs
class GPT2Tokenizer(object):
"""
GPT-2 BPE tokenizer. Peculiarities:
- Byte-level BPE
"""
def __init__(self, vocab_file, merges_file, errors='replace', special_tokens=None, max_len=None):
self.max_len = max_len if max_len is not None else int(1e12)
self.encoder = json.load(open(vocab_file))
self.decoder = {v:k for k,v in self.encoder.items()}
self.errors = errors # how to handle errors in decoding
self.byte_encoder = bytes_to_unicode()
self.byte_decoder = {v:k for k, v in self.byte_encoder.items()}
bpe_data = open(merges_file, encoding='utf-8').read().split('\n')[1:-1]
bpe_merges = [tuple(merge.split()) for merge in bpe_data]
self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))
self.cache = {}
# Should have added re.IGNORECASE so BPE merges can happen for capitalized versions of contractions
self.pat = re.compile(r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+""")
self.special_tokens = {}
self.special_tokens_decoder = {}
self.set_special_tokens(special_tokens)
def __len__(self):
return len(self.encoder) + len(self.special_tokens)
def set_special_tokens(self, special_tokens):
""" Add a list of additional tokens to the encoder.
The additional tokens are indexed starting from the last index of the
current vocabulary in the order of the `special_tokens` list.
"""
if not special_tokens:
self.special_tokens = {}
self.special_tokens_decoder = {}
return
self.special_tokens = dict((tok, len(self.encoder) + i) for i, tok in enumerate(special_tokens))
self.special_tokens_decoder = {v:k for k, v in self.special_tokens.items()}
logger.info("Special tokens {}".format(self.special_tokens))
def bpe(self, token):
if token in self.cache:
return self.cache[token]
word = tuple(token)
pairs = get_pairs(word)
if not pairs:
return token
while True:
bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))
if bigram not in self.bpe_ranks:
break
first, second = bigram
new_word = []
i = 0
while i < len(word):
try:
j = word.index(first, i)
new_word.extend(word[i:j])
i = j
except:
new_word.extend(word[i:])
break
if word[i] == first and i < len(word)-1 and word[i+1] == second:
new_word.append(first+second)
i += 2
else:
new_word.append(word[i])
i += 1
new_word = tuple(new_word)
word = new_word
if len(word) == 1:
break
else:
pairs = get_pairs(word)
word = ' '.join(word)
self.cache[token] = word
return word
def tokenize(self, text):
""" Tokenize a string. """
bpe_tokens = []
for token in re.findall(self.pat, text):
token = ''.join(self.byte_encoder[ord(b)] for b in token if ord(b) in self.byte_encoder)
if token == '':
continue
bpe_tokens.extend(bpe_token for bpe_token in self.bpe(token).split(' '))
return bpe_tokens
def convert_tokens_to_ids(self, tokens):
""" Converts a sequence of tokens into ids using the vocab. """
ids = []
if isinstance(tokens, str) or (sys.version_info[0] == 2 and isinstance(tokens, unicode)):
if tokens in self.special_tokens:
return self.special_tokens[tokens]
else:
return self.encoder.get(tokens, 0)
for token in tokens:
if token in self.special_tokens:
ids.append(self.special_tokens[token])
else:
ids.append(self.encoder.get(token, 0))
if len(ids) > self.max_len:
logger.warning(
"Token indices sequence length is longer than the specified maximum "
" sequence length for this OpenAI GPT model ({} > {}). Running this"
" sequence through the model will result in indexing errors".format(len(ids), self.max_len)
)
return ids
def convert_ids_to_tokens(self, ids, skip_special_tokens=False):
"""Converts a sequence of ids in BPE tokens using the vocab."""
tokens = []
for i in ids:
if i in self.special_tokens_decoder:
if not skip_special_tokens:
tokens.append(self.special_tokens_decoder[i])
else:
tokens.append(self.decoder[i])
return tokens
def encode(self, text):
return self.convert_tokens_to_ids(self.tokenize(text))
def decode(self, tokens):
text = ''.join([self.decoder[token] for token in tokens])
text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=self.errors)
return text
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Metrics class.
"""
from collections import Counter
from nltk.translate import bleu_score
from nltk.translate.bleu_score import SmoothingFunction
import numpy as np
def distinct(seqs):
""" Calculate intra/inter distinct 1/2. """
batch_size = len(seqs)
intra_dist1, intra_dist2 = [], []
unigrams_all, bigrams_all = Counter(), Counter()
for seq in seqs:
unigrams = Counter(seq)
bigrams = Counter(zip(seq, seq[1:]))
intra_dist1.append((len(unigrams)+1e-12) / (len(seq)+1e-5))
intra_dist2.append((len(bigrams)+1e-12) / (max(0, len(seq)-1)+1e-5))
unigrams_all.update(unigrams)
bigrams_all.update(bigrams)
inter_dist1 = (len(unigrams_all)+1e-12) / (sum(unigrams_all.values())+1e-5)
inter_dist2 = (len(bigrams_all)+1e-12) / (sum(bigrams_all.values())+1e-5)
intra_dist1 = np.average(intra_dist1)
intra_dist2 = np.average(intra_dist2)
return intra_dist1, intra_dist2, inter_dist1, inter_dist2
def bleu(hyps, refs):
""" Calculate bleu 1/2. """
bleu_1 = []
bleu_2 = []
for hyp, ref in zip(hyps, refs):
try:
score = bleu_score.sentence_bleu(
[ref], hyp,
smoothing_function=SmoothingFunction().method7,
weights=[1, 0, 0, 0])
except:
score = 0
bleu_1.append(score)
try:
score = bleu_score.sentence_bleu(
[ref], hyp,
smoothing_function=SmoothingFunction().method7,
weights=[0.5, 0.5, 0, 0])
except:
score = 0
bleu_2.append(score)
bleu_1 = np.average(bleu_1)
bleu_2 = np.average(bleu_2)
return bleu_1, bleu_2
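# A quick usage sketch on toy token lists (illustrative only).
if __name__ == "__main__":
    demo_hyps = [["i", "am", "fine"], ["good", "to", "see", "you"]]
    demo_refs = [["i", "am", "ok"], ["nice", "to", "see", "you"]]
    bleu_1, bleu_2 = bleu(demo_hyps, demo_refs)
    intra1, intra2, inter1, inter2 = distinct(demo_hyps)
    print(f"BLEU-1/2: {bleu_1:.3f}/{bleu_2:.3f}")
    print(f"Distinct-1/2 (inter): {inter1:.3f}/{inter2:.3f}")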
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MetricsTracker class
"""
from collections import defaultdict
import math
class MetricsTracker(object):
""" Tracking metrics. """
def __init__(self):
self.metrics_val = defaultdict(float)
self.metrics_avg = defaultdict(float)
self.num_samples = 0
def update(self, metrics, num_samples):
for key, val in metrics.items():
if val is not None:
val = float(val)
self.metrics_val[key] = val
avg_val = (self.metrics_avg.get(key, 0) * self.num_samples +
val * num_samples) / (self.num_samples + num_samples)
self.metrics_avg[key] = avg_val
self.num_samples += num_samples
def clear(self):
self.metrics_val = defaultdict(float)
self.metrics_avg = defaultdict(float)
self.num_samples = 0
def items(self):
return self.metrics_avg.items()
def get(self, name):
if self.num_samples == 0:
raise ValueError("There is no data in Metrics.")
return self.metrics_avg.get(name)
def state_dict(self):
return {
"metrics_val": self.metrics_val,
"metrics_avg": self.metrics_avg,
"num_samples": self.num_samples,
}
def load_state_dict(self, state_dict):
self.metrics_val = state_dict["metrics_val"]
self.metrics_avg = state_dict["metrics_avg"]
self.num_samples = state_dict["num_samples"]
def value(self):
metric_strs = []
for key, val in self.metrics_val.items():
metric_str = f"{key.upper()}-{val:.3f}"
metric_strs.append(metric_str)
if "token_nll" in self.metrics_val:
metric_str = f"TOKEN_PPL-{math.exp(self.metrics_val['token_nll']):.3f}"
metric_strs.append(metric_str)
metric_strs = " ".join(metric_strs)
return metric_strs
def summary(self):
metric_strs = []
for key, val in self.metrics_avg.items():
metric_str = f"{key.upper()}-{val:.3f}"
metric_strs.append(metric_str)
if "token_nll" in self.metrics_avg:
metric_str = f"TOKEN_PPL-{math.exp(self.metrics_avg['token_nll']):.3f}"
metric_strs.append(metric_str)
metric_strs = " ".join(metric_strs)
return metric_strs
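# A minimal usage sketch with made-up metric values.
if __name__ == "__main__":
    tracker = MetricsTracker()
    tracker.update({"loss": 2.31, "token_nll": 4.6}, num_samples=8)
    tracker.update({"loss": 2.05, "token_nll": 4.2}, num_samples=8)
    print(tracker.value())     # metrics of the latest update (plus TOKEN_PPL)
    print(tracker.summary())   # running averages over all 16 samples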
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Loading models.
"""
import plato.models.unified_transformer
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Generator class.
"""
import bisect
import math
import sys
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.framework import Variable
from plato.args import str2bool
import plato.modules.functions as F
def repeat(var, times):
if isinstance(var, list):
return [repeat(x, times) for x in var]
elif isinstance(var, dict):
return {k: repeat(v, times) for k, v in var.items()}
elif isinstance(var, Variable):
var = F.unsqueeze(var, [1])
expand_times = [1] * len(var.shape)
expand_times[1] = times
dtype = var.dtype
var = layers.cast(var, "float32")
var = layers.expand(var, expand_times)
shape = [var.shape[0] * var.shape[1]] + var.shape[2:]
var = layers.reshape(var, shape)
var = layers.cast(var, dtype)
return var
else:
return var
def gather(var, idx):
if isinstance(var, list):
return [gather(x, idx) for x in var]
elif isinstance(var, dict):
return {k: gather(v, idx) for k, v in var.items()}
elif isinstance(var, Variable):
out = layers.gather(var, idx)
return out
else:
return var
class Generator(object):
""" Genrator class. """
_registry = dict()
@classmethod
def register(cls, name):
Generator._registry[name] = cls
return
@staticmethod
def by_name(name):
return Generator._registry[name]
@staticmethod
def create(hparams, *args, **kwargs):
""" Create generator. """
generator_cls = Generator.by_name(hparams.generator)
return generator_cls(hparams, *args, **kwargs)
@classmethod
def add_cmdline_argument(cls, parser):
group = parser.add_argument_group("Generator")
group.add_argument("--generator", type=str, default="BeamSearch",
choices=["TopKSampling", "TopPSampling", "GreedySampling",
"BeamSearch"])
group.add_argument("--min_gen_len", type=int, default=1,
help="The minimum length of generated response.")
group.add_argument("--max_gen_len", type=int, default=30,
help="The maximum length of generated response.")
args, _ = parser.parse_known_args()
generator_cls = cls.by_name(args.generator)
generator_cls.add_cmdline_argument(group)
return group
def __init__(self, hparams, bpe):
self.vocab_size = bpe.vocab_size
self.bos_id = bpe.bos_id
self.eos_id = bpe.eos_id
self.unk_id = bpe.unk_id
self.pad_id = bpe.pad_id
self.min_gen_len = hparams.min_gen_len
self.max_gen_len = hparams.max_gen_len
assert 1 <= self.min_gen_len <= self.max_gen_len
return
def __call__(self, step_fn, state):
"""
Running generation.
@param : step_fn : decoding one step
@type : function
@param : state : initial state
@type : dict
"""
raise NotImplementedError
class Sampling(Generator):
""" Sampling Generator. """
@classmethod
def add_cmdline_argument(cls, group):
group.add_argument("--ignore_unk", type=str2bool, default=True,
help="Whether to ignore unkown token in generation.")
group.add_argument("--sampling_temperature", type=float, default=1.0)
return group
def __init__(self, hparams, bpe):
super().__init__(hparams, bpe)
self.ignore_unk = hparams.ignore_unk
self.temperature = hparams.sampling_temperature
return
def _sampling(self, scores):
""" Sampling function. """
raise NotImplementedError
def __call__(self, step_fn, state):
"""
Running generation.
@param : step_fn : decoding one step
@type : function
@param : state : initial state
@type : dict
"""
batch_size = state["batch_size"]
vocab_size = self.vocab_size
pos_index = layers.range(0, batch_size, 1, dtype="int64")
pos_index = layers.scale(pos_index, vocab_size)
# shape: [batch_size, beam_size, 1]
predictions = layers.fill_constant(shape=[batch_size, 1],
dtype="int64",
value=self.bos_id)
sequence_scores = layers.fill_constant(shape=[batch_size],
dtype="float32",
value=0.0)
unk_penalty = np.zeros(vocab_size, dtype="float32")
unk_penalty[self.unk_id] = -1e10
unk_penalty = layers.assign(unk_penalty)
eos_penalty = np.zeros(vocab_size, dtype="float32")
eos_penalty[self.eos_id] = -1e10
eos_penalty = layers.assign(eos_penalty)
scores_after_end = np.full(vocab_size, -1e10, dtype="float32")
scores_after_end[self.pad_id] = 0
scores_after_end = layers.assign(scores_after_end)
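        # Note on the penalty vectors above: adding -1e10 to a token's log-probability
        # effectively removes it from sampling. `unk_penalty` blocks [UNK],
        # `eos_penalty` blocks [EOS] until at least `min_gen_len` tokens have been
        # generated, and `scores_after_end` forces finished sequences (previous token
        # is [EOS] or [PAD]) to keep emitting [PAD] with probability close to 1.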
# initial input
for step in range(1, self.max_gen_len + 1):
pre_ids = predictions[:, -1:]
state["pred_token"] = F.unsqueeze(pre_ids, [2])
if step > 1:
state["pred_mask"] = 1 - F.equal(state["pred_token"], self.pad_id)
state["pred_pos"] = state["pred_pos"] + 1
scores, state = step_fn(state)
# Generate next
# scores shape: [batch_size, vocab_size]
if self.ignore_unk:
scores = scores + unk_penalty
if step <= self.min_gen_len:
scores = scores + eos_penalty
# previous token is [PAD] or [EOS]
# shape: [batch_size, 1]
pre_eos_mask = F.equal(pre_ids, self.eos_id) + F.equal(pre_ids, self.pad_id)
scores = scores * (1 - pre_eos_mask) + \
layers.expand(pre_eos_mask, [1, vocab_size]) * scores_after_end
scores = scores / self.temperature
preds = self._sampling(scores)
predictions = layers.concat([predictions, F.unsqueeze(preds, [1])], axis=1)
scores = layers.reshape(scores, [batch_size * vocab_size])
preds = preds + pos_index
scores = gather(scores, preds)
sequence_scores = sequence_scores + scores
results = {
"preds": predictions,
"scores": sequence_scores
}
return results
class GreedySampling(Sampling):
""" Greedy sampling. """
@classmethod
def add_cmdline_argument(cls, group):
return Sampling.add_cmdline_argument(group)
def _sampling(self, logits):
""" Implement greedy sampling. """
preds = layers.argmax(logits, axis=1)
return preds
class TopKSampling(Sampling):
""" Top-k sampling. """
@classmethod
def add_cmdline_argument(cls, group):
Sampling.add_cmdline_argument(group)
group.add_argument("--top_k_ratio", type=float, default=None)
group.add_argument("--top_k_num", type=int, default=None)
return group
def __init__(self, hparams, bpe):
super().__init__(hparams, bpe)
assert hparams.top_k_ratio is not None or hparams.top_k_num is not None
if hparams.top_k_num is not None:
self.top_k_num = hparams.top_k_num
else:
self.top_k_num = math.floor(hparams.top_k_ratio * self.vocab_size)
assert self.top_k_num >= 1
return
def _sampling(self, logits):
""" Implement top-k sampling. """
probs = layers.softmax(logits, axis=1)
probs, indices = layers.topk(probs, self.top_k_num)
probs = probs / layers.reduce_sum(probs, dim=1, keep_dim=True)
preds = []
for p, ids in zip(probs.numpy(), indices.numpy()):
o = np.random.choice(ids, p=p)
preds.append(o)
preds = np.array(preds, dtype="int64")
return fluid.dygraph.to_variable(preds)
class TopPSampling(Sampling):
""" Top-p sampling. """
@classmethod
def add_cmdline_argument(cls, group):
Sampling.add_cmdline_argument(group)
group.add_argument("--top_p_ratio", type=float, default=1.0)
return group
def __init__(self, hparams, bpe):
super().__init__(hparams, bpe)
self.top_p_ratio = hparams.top_p_ratio
return
def _sampling(self, logits):
""" Implement top-k sampling. """
probs = layers.softmax(logits, axis=1)
preds = []
for p in probs.numpy():
ids = np.argsort(-p)
p = p[ids]
c_p = np.cumsum(p)
i = bisect.bisect_right(c_p, self.top_p_ratio) + 1
o = np.random.choice(ids[:i], p=p[:i]/np.sum(p[:i]))
preds.append(o)
preds = np.array(preds, dtype="int64")
return fluid.dygraph.to_variable(preds)
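    # A small worked example of the nucleus selection above (illustrative only):
    # with sorted probabilities p = [0.5, 0.3, 0.15, 0.05] and top_p_ratio = 0.9,
    # the cumulative sums are [0.5, 0.8, 0.95, 1.0]; bisect_right returns 2 and
    # the "+ 1" keeps the token that crosses the threshold, so sampling is done
    # over the first three tokens after renormalizing their probabilities.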
class BeamSearch(Generator):
""" BeamSearch generator. """
@classmethod
def add_cmdline_argument(cls, group):
group.add_argument("--beam_size", type=int, default=5,
help="The beam size in beam search.")
group.add_argument("--length_average", type=str2bool, default=False,
help="Whether to use length average.")
group.add_argument("--length_penalty", type=float, default=-1.0,
help="The parameter(alpha) of length penalty.")
group.add_argument("--ignore_unk", type=str2bool, default=True,
help="Whether to ignore unkown token in generation.")
return group
def __init__(self, hparams, bpe):
super().__init__(hparams, bpe)
self.beam_size = hparams.beam_size
self.length_average = hparams.length_average
self.length_penalty = hparams.length_penalty
self.ignore_unk = hparams.ignore_unk
return
def __call__(self, step_fn, state):
"""
Running beam search.
@param : step_fn : decoding one step
@type : function
@param : state : initial state
@type : dict
"""
batch_size = state["batch_size"]
beam_size = self.beam_size
# shape: [batch_size, 1]
pos_index = layers.range(0, batch_size, 1, dtype="int64")
pos_index = layers.scale(pos_index, beam_size)
pos_index = F.unsqueeze(pos_index, [1])
# shape: [batch_size, beam_size, 1]
predictions = layers.fill_constant(shape=[batch_size, beam_size, 1],
dtype="int64",
value=self.bos_id)
# initial input
state["pred_token"] = predictions[:, :1]
# shape: [batch_size, vocab_size]
scores, state = step_fn(state)
unk_penalty = np.zeros(self.vocab_size, dtype="float32")
unk_penalty[self.unk_id] = -1e10
unk_penalty = layers.assign(unk_penalty)
eos_penalty = np.zeros(self.vocab_size, dtype="float32")
eos_penalty[self.eos_id] = -1e10
eos_penalty = layers.assign(eos_penalty)
scores_after_end = np.full(self.vocab_size, -1e10, dtype="float32")
scores_after_end[self.pad_id] = 0
scores_after_end = layers.assign(scores_after_end)
if self.ignore_unk:
scores = scores + unk_penalty
scores = scores + eos_penalty
# shape: [batch_size, beam_size]
sequence_scores, preds = layers.topk(scores, self.beam_size)
predictions = layers.concat([predictions, F.unsqueeze(preds, [2])], axis=2)
state = repeat(state, beam_size)
parent_idx_list = []
pred_list = []
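        # Scoring notes for the loop below (a reading aid, not part of the model):
        # - with `length_average`, finished beams keep their score while active
        #   beams keep a running mean of token log-probabilities
        #   (old_score * (1 - 1/step) + new_log_prob / step);
        # - with `length_penalty` >= 0, the update appears to apply a GNMT-style
        #   penalty incrementally: the accumulated score is rescaled by
        #   ((4 + step) / (5 + step)) ** alpha and the new token log-probability is
        #   weighted by (1 / (5 + step)) ** alpha, which is equivalent (up to a
        #   constant factor) to dividing the sum of log-probs by (5 + len) ** alpha.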
for step in range(2, self.max_gen_len + 1):
pre_ids = predictions[:, :, -1:]
state["pred_token"] = layers.reshape(pre_ids, shape=[batch_size * beam_size, 1, 1])
state["pred_mask"] = 1 - F.equal(state["pred_token"], self.pad_id)
state["pred_pos"] = state["pred_pos"] + 1
scores, state = step_fn(state)
# Generate next
# scores shape: [batch_size, beam_size, vocab_size]
if self.ignore_unk:
scores = scores + unk_penalty
if step <= self.min_gen_len:
scores = scores + eos_penalty
scores = layers.reshape(scores, shape=[batch_size, beam_size, self.vocab_size])
# previous token is [PAD] or [EOS]
pre_eos_mask = F.equal(pre_ids, self.eos_id) + F.equal(pre_ids, self.pad_id)
scores = scores * (1 - pre_eos_mask) + \
layers.expand(pre_eos_mask, [1, 1, self.vocab_size]) * scores_after_end
if self.length_average:
scaled_value = pre_eos_mask + (1 - pre_eos_mask) * (1 - 1 / step)
sequence_scores = F.unsqueeze(sequence_scores, [2]) * scaled_value
scaled_value = pre_eos_mask + (1 - pre_eos_mask) * (1 / step)
scores = scores * scaled_value
elif self.length_penalty >= 0.0:
scaled_value = pre_eos_mask + (1 - pre_eos_mask) * \
(math.pow((4 + step) / (5 + step), self.length_penalty))
sequence_scores = layers.elementwise_mul(scaled_value, sequence_scores, axis=0)
scaled_value = pre_eos_mask + (1 - pre_eos_mask) * \
(math.pow(1 / (5 + step), self.length_penalty))
scores = scores * scaled_value
scores = layers.elementwise_add(scores, sequence_scores, axis=0)
scores = layers.reshape(scores, shape=[batch_size, beam_size * self.vocab_size])
topk_scores, topk_indices = layers.topk(scores, beam_size)
vocab_size = layers.fill_constant(shape=[1], dtype="int64", value=self.vocab_size)
parent_idx = layers.elementwise_floordiv(topk_indices, vocab_size)
preds = layers.elementwise_mod(topk_indices, vocab_size)
# Gather state / sequence_scores
parent_idx = layers.elementwise_add(parent_idx, pos_index, axis=0)
parent_idx = layers.reshape(parent_idx, [batch_size * beam_size])
state = gather(state, parent_idx)
sequence_scores = topk_scores
predictions = layers.reshape(predictions, shape=[batch_size * beam_size, step])
predictions = gather(predictions, parent_idx)
predictions = layers.reshape(predictions, shape=[batch_size, beam_size, step])
predictions = layers.concat([predictions, F.unsqueeze(preds, [2])], axis=2)
pre_ids = predictions[:, :, -1]
pre_eos_mask = F.equal(pre_ids, self.eos_id) + F.equal(pre_ids, self.pad_id)
sequence_scores = sequence_scores * pre_eos_mask + layers.scale(1 - pre_eos_mask, -1e10)
_, indices = layers.argsort(sequence_scores, axis=1)
indices = indices + pos_index
indices = layers.reshape(indices, [-1])
sequence_scores = layers.reshape(sequence_scores, [batch_size * beam_size])
predictions = layers.reshape(predictions, [batch_size * beam_size, -1])
sequence_scores = gather(sequence_scores, indices)
predictions = layers.gather(predictions, indices)
sequence_scores = layers.reshape(sequence_scores, [batch_size, beam_size])
predictions = layers.reshape(predictions, [batch_size, beam_size, -1])
results = {
"preds": predictions[:, -1],
"scores": sequence_scores[:, -1]
}
return results
BeamSearch.register("BeamSearch")
GreedySampling.register("GreedySampling")
TopKSampling.register("TopKSampling")
TopPSampling.register("TopPSampling")
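# A minimal usage sketch of the registry above (hypothetical `hparams` object,
# shown only as a comment):
#
#     hparams.generator = "TopKSampling"
#     generator = Generator.create(hparams, bpe)   # -> TopKSampling instance
#     results = generator(model._decode, state)    # {"preds": ..., "scores": ...}
#
# `bpe`, `model` and `state` are assumed to come from the surrounding
# tokenizer / UnifiedTransformer code and are not defined in this file.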
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Model base
"""
import paddle.fluid as fluid
from paddle.fluid.dygraph import parallel_helper
class ModelBase(fluid.dygraph.Layer):
"""
    Basic model wrapper for static graph and dygraph.
"""
_registry = dict()
@classmethod
def register(cls, name):
ModelBase._registry[name] = cls
return
@staticmethod
def by_name(name):
return ModelBase._registry[name]
@staticmethod
def create(name_scope, hparams, *args, **kwargs):
model_cls = ModelBase.by_name(hparams.model)
return model_cls(name_scope, hparams, *args, **kwargs)
@classmethod
def add_cmdline_argument(cls, parser):
""" Add cmdline argument. """
group = parser.add_argument_group("Model")
group.add_argument("--init_checkpoint", type=str, default=None)
group.add_argument("--model", type=str, default="UnifiedTransformer",
choices=["UnifiedTransformer"])
args, _ = parser.parse_known_args()
model_cls = ModelBase.by_name(args.model)
model_cls.add_cmdline_argument(group)
return group
def __init__(self, name_scope, hparams):
super().__init__(name_scope)
self.init_checkpoint = hparams.init_checkpoint
return
def __call__(self, *args, **kwargs):
""" Re-implement __call__ function in dygraph mode. """
if not self._built:
self._build_once(*args, **kwargs)
self._built = True
outputs = self.forward(*args, **kwargs)
return outputs
def _build_once(self, inputs, *args, **kwargs):
"""
Build only once.
        1. Initialize the model's parameters.
        2. Broadcast parameters if in data parallel mode.
        3. Load saved parameters.
"""
# Initial parameters.
self._create_parameters()
if parallel_helper._is_data_parallel_mode():
parallel_helper._broadcast_parameters(self._parameters.values())
        # Load persistables
self._load_params()
return
def _create_parameters(self):
""" Create model's paramters. """
raise NotImplementedError
def _load_params(self):
""" Load saved paramters. """
raise NotImplementedError
def _forward(self, inputs, is_training):
""" Real forward process of model in different mode(train/test). """
raise NotImplementedError
def _collect_metrics(self, inputs, outputs):
""" Calculate loss function by using inputs and outputs. """
raise NotImplementedError
def _optimize(self, loss):
""" Optimize loss function and update model. """
raise NotImplementedError
def _infer(self, inputs):
""" Real inference process of model. """
raise NotImplementedError
def forward(self, inputs, is_training=False):
"""
        Forward process, including the real forward pass, metrics collection and optimization (optional).
@params : inputs : input data
@type : dict of numpy.ndarray/int/float/...
"""
if is_training:
self.train()
else:
self.eval()
outputs = self._forward(inputs, is_training)
metrics = self._collect_metrics(inputs, outputs)
loss = metrics["loss"]
if is_training:
self._optimize(loss)
metrics = {k: v.numpy() for k, v in metrics.items()}
return metrics
def infer(self, inputs):
"""
Inference process.
@params : inputs : input data
@type : dict of numpy.ndarray/int/float/...
"""
if not self._built:
self._build_once(inputs)
self._built = True
self.eval()
results = self._infer(inputs)
results = {name: results[name].numpy() for name in results}
return results
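# A rough sketch of how a ModelBase subclass is expected to be driven
# (assumptions about the caller are mine; see the Trainer class later in this
# repository for the actual loop):
#
#     metrics = model(inputs, is_training=True)    # forward + loss + optimizer step
#     metrics = model(inputs, is_training=False)   # evaluation, no parameter update
#     results = model.infer(inputs)                # generation / inference outputs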
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
UnifiedTransformer
"""
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import FC
import paddle.fluid.layers as layers
from plato.args import str2bool
from plato.modules.embedder import Embedder
import plato.modules.functions as F
from plato.modules.layer_norm import LayerNorm
from plato.modules.transformer_block import TransformerBlock
from plato.models.model_base import ModelBase
class UnifiedTransformer(ModelBase):
"""
Implement unified transformer.
"""
@classmethod
def add_cmdline_argument(cls, group):
""" Add cmdline argument. """
group.add_argument("--num_token_embeddings", type=int, default=-1,
help="The number of tokens in vocabulary. "
"It will be automatically calculated after loading vocabulary.")
group.add_argument("--num_pos_embeddings", type=int, default=512,
help="The maximum number of position.")
group.add_argument("--num_type_embeddings", type=int, default=2,
help="The number of different type of tokens.")
group.add_argument("--num_turn_embeddings", type=int, default=16,
help="The maximum number of turn.")
group.add_argument("--num_latent", type=int, default=20,
help="The number of latent.")
group.add_argument("--tau", type=float, default=0.67,
help="The parameter of gumbel softmax.")
group.add_argument("--with_bow", type=str2bool, default=True,
help="Whether to use BoW loss.")
group.add_argument("--hidden_dim", type=int, default=768,
help="The size of hidden vector in transformer.")
group.add_argument("--num_heads", type=int, default=12,
help="The number of heads in multi head attention.")
group.add_argument("--num_layers", type=int, default=12,
help="The number of layers in transformer.")
group.add_argument("--padding_idx", type=int, default=0,
help="The padding index.")
group.add_argument("--dropout", type=float, default=0.1,
help="The dropout ratio after multi head attention and feed forward network.")
group.add_argument("--embed_dropout", type=float, default=0.0,
help="The dropout ratio of embedding layers.")
group.add_argument("--attn_dropout", type=float, default=0.1,
help="The dropout ratio of multi head attention.")
group.add_argument("--ff_dropout", type=float, default=0.1,
help="The dropout ratio of feed forward network.")
group.add_argument("--use_discriminator", type=str2bool, default=False,
help="Whether to use discriminator loss.")
group.add_argument("--dis_ratio", type=float, default=1.0,
help="The ratio of discriminator loss.")
group.add_argument("--weight_sharing", type=str2bool, default=True,
help="Whether to share weight between token embedding and "
"predictor FC layer.")
group.add_argument("--pos_trainable", type=str2bool, default=True,
help="Whether to train position embeddings.")
group.add_argument("--two_layer_predictor", type=str2bool, default=False,
help="Use two layer predictor. "
"Traditional BERT use two FC layers to predict masked token.")
group.add_argument("--bidirectional_context", type=str2bool, default=True,
help="Whether to use bidirectional self-attention in context tokens.")
group.add_argument("--label_smooth", type=float, default=0.0,
help="Use soft label to calculate NLL loss and BoW loss.")
group.add_argument("--initializer_range", type=float, default=0.02,
help="Use to initialize parameters.")
group.add_argument("--lr", type=float, default=5e-5,
help="The inital learning rate for Adam.")
group.add_argument("--weight_decay", type=float, default=0.0,
help="The weight decay for Adam.")
group.add_argument("--max_grad_norm", type=float, default=None,
help="The maximum norm of gradient.")
return group
def __init__(self, name_scope, hparams, generator, dtype="float32"):
super().__init__(name_scope, hparams)
self.generator = generator
self.num_token_embeddings = hparams.num_token_embeddings
self.num_pos_embeddings = hparams.num_pos_embeddings
self.num_type_embeddings = hparams.num_type_embeddings
self.num_turn_embeddings = hparams.num_turn_embeddings
self.num_latent = hparams.num_latent
self.tau = hparams.tau
self.with_bow = hparams.with_bow
self.hidden_dim = hparams.hidden_dim
self.num_heads = hparams.num_heads
self.num_layers = hparams.num_layers
self.padding_idx = hparams.padding_idx
self.dropout = hparams.dropout
self.embed_dropout = hparams.embed_dropout
self.attn_dropout = hparams.attn_dropout
self.ff_dropout = hparams.ff_dropout
self.use_discriminator = hparams.use_discriminator
self.weight_sharing = hparams.weight_sharing
self.pos_trainable = hparams.pos_trainable
self.two_layer_predictor = hparams.two_layer_predictor
self.bidirectional_context = hparams.bidirectional_context
self.label_smooth = hparams.label_smooth
self.initializer_range = hparams.initializer_range
self.embedder = Embedder(self.full_name(),
self.hidden_dim,
self.num_token_embeddings,
self.num_pos_embeddings,
self.num_type_embeddings,
self.num_turn_embeddings,
padding_idx=self.padding_idx,
dropout=self.embed_dropout,
pos_trainable=self.pos_trainable)
self.embed_layer_norm = LayerNorm(self.full_name(),
begin_norm_axis=2,
epsilon=1e-12,
param_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)),
bias_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)))
self.layers = []
for i in range(hparams.num_layers):
layer = TransformerBlock(self.full_name(),
self.hidden_dim,
self.num_heads,
self.dropout,
self.attn_dropout,
self.ff_dropout)
self.layers.append(layer)
self.add_sublayer(f"layer_{i}", layer)
if self.num_latent > 0:
self.post_network = FC(name_scope=self.full_name() + ".post_network",
size=self.num_latent,
bias_attr=False)
if self.use_discriminator:
self.dis_ratio = hparams.dis_ratio
self.discriminator = FC(name_scope=self.full_name() + ".discriminator",
size=1,
act="sigmoid")
if self.two_layer_predictor:
self.pre_predictor = FC(name_scope=self.full_name() + ".pre_predictor",
size=self.hidden_dim,
num_flatten_dims=2,
act="gelu")
if self.num_latent > 0 and self.with_bow:
self.pre_bow_predictor = FC(name_scope=self.full_name() + ".pre_bow_predictor",
size=self.hidden_dim,
act="gelu")
if not self.weight_sharing:
self.predictor = FC(name_scope=self.full_name() + ".predictor",
size=self.num_token_embeddings,
num_flatten_dims=2,
bias_attr=False)
if self.num_latent > 0 and self.with_bow:
self.bow_predictor = FC(name_scope=self.full_name() + ".bow_predictor",
size=self.num_token_embeddings,
bias_attr=False)
self.max_grad_norm = hparams.max_grad_norm
if self.max_grad_norm is not None:
self.grad_clip = fluid.dygraph_grad_clip.GradClipByGlobalNorm(hparams.max_grad_norm)
else:
self.grad_clip = None
self.weight_decay = hparams.weight_decay
self.optimizer = fluid.optimizer.AdamOptimizer(
learning_rate=hparams.lr,
regularization=fluid.regularizer.L2Decay(self.weight_decay))
self._dtype = dtype
# DataDistributed
self.before_backward_fn = None
self.after_backward_fn = None
return
def _create_parameters(self):
""" Create model's paramters. """
if self.num_latent > 0:
self.mask_embed = self.create_parameter(
attr=fluid.ParamAttr(
name="mask_embed",
initializer=fluid.initializer.NormalInitializer(scale=self.initializer_range)),
shape=[1, 1, self.hidden_dim],
dtype=self._dtype)
self.latent_embeddings = self.create_parameter(
attr=fluid.ParamAttr(
name="latent_embeddings",
initializer=fluid.initializer.NormalInitializer(scale=self.initializer_range)),
shape=[self.num_latent, self.hidden_dim],
dtype=self._dtype)
sequence_mask = np.tri(self.num_pos_embeddings, self.num_pos_embeddings, dtype=self._dtype)
self.sequence_mask = self.create_parameter(
attr=fluid.ParamAttr(
name="sequence_mask",
initializer=fluid.initializer.NumpyArrayInitializer(sequence_mask),
trainable=False),
shape=sequence_mask.shape,
dtype=sequence_mask.dtype)
return
def _load_params(self):
""" Load saved paramters. """
if self.init_checkpoint is not None:
print(f"Loading parameters from {self.init_checkpoint}")
if hasattr(fluid, "load_dygraph"):
# >= 1.6.0 compatible
models, optimizers = fluid.load_dygraph(self.init_checkpoint)
else:
models, optimizers = fluid.dygraph.load_persistables(self.init_checkpoint)
parameters = {param.name: param for param in self.parameters()}
for name, param in models.items():
if name in parameters:
if param.shape != parameters[name].shape:
print(f"part of parameter({name}) random normlize initialize")
if hasattr(param, "numpy"):
arr = param.numpy()
else:
value = param.value()
tensor = value.get_tensor()
arr = np.array(tensor)
z = np.random.normal(scale=self.initializer_range,
size=parameters[name].shape).astype("float32")
if name == "Model/UnifiedTransformer_0/Embedder_0/Embedding_0.w_0":
z[-param.shape[0]:] = arr
else:
z[:param.shape[0]] = arr
z = fluid.dygraph.to_variable(z)
models[name] = z
for name in parameters:
if name not in models:
if parameters[name].trainable:
print(f"parameter({name}) random normlize initialize")
z = np.random.normal(scale=self.initializer_range,
size=parameters[name].shape).astype("float32")
models[name] = fluid.dygraph.to_variable(z)
else:
models[name] = parameters[name]
self.load_dict(models)
print(f"Loaded parameters from {self.init_checkpoint}")
def _create_mask(self, input_mask, append_head=False, auto_regressive=False):
"""
Create attention mask.
@param : input_mask
@type : Variable(shape: [batch_size, max_seq_len, 1])
@param : auto_regressive
@type : bool
"""
seq_len = input_mask.shape[1]
input_mask = layers.cast(input_mask, self._dtype)
mask1 = layers.expand(input_mask, [1, 1, seq_len])
mask2 = layers.transpose(mask1, [0, 2, 1])
mask = layers.elementwise_mul(mask1, mask2)
if append_head:
mask = layers.concat([mask[:, :1, :], mask], axis=1)
mask = layers.concat([mask[:, :, :1], mask], axis=2)
seq_len += 1
if auto_regressive:
seq_mask = self.sequence_mask[:seq_len, :seq_len]
mask = layers.elementwise_mul(mask, seq_mask)
mask = 1 - mask
return mask
def _join_mask(self, mask1, mask2):
""" Merge source attention mask and target attention mask.
@param : mask1 : source attention mask
@type : Variable(shape: [batch_size, max_src_len, max_src_len])
        @param : mask2 : target attention mask
@type : Variable(shape: [batch_size, max_tgt_len, max_tgt_len])
"""
batch_size = mask1.shape[0]
seq_len1 = mask1.shape[1]
seq_len2 = mask2.shape[1]
seq_len = seq_len1 + seq_len2
mask_lu = mask1
mask_ru = layers.fill_constant([batch_size, seq_len1, seq_len2], self._dtype, 1)
mask3 = layers.expand(mask2[:, :, :1], [1, 1, seq_len1])
mask4 = layers.expand(mask1[:, :1], [1, seq_len2, 1])
mask_lb = mask3 + mask4 - mask3 * mask4
mask_rb = mask2
mask_u = layers.concat([mask_lu, mask_ru], axis=2)
mask_b = layers.concat([mask_lb, mask_rb], axis=2)
mask = layers.concat([mask_u, mask_b], axis=1)
return mask
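    # Note: after `_create_mask` the convention is that 1 means "cannot attend"
    # and 0 means "can attend". `_join_mask` assembles the unified attention
    # pattern as a 2x2 block matrix over [source ; target]:
    #   - top-left: source-to-source mask (bidirectional if enabled);
    #   - top-right: all ones, so source tokens never attend to the target;
    #   - bottom-left: target tokens attend to the non-padded source tokens;
    #   - bottom-right: the auto-regressive target-to-target mask.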
def _posteriori_network(self, input_mask, embed, batch_size, src_len, tgt_len):
""" Basic posteriori network implement. """
mask_embed = self.mask_embed
mask_embed = layers.expand(mask_embed, [batch_size, 1, 1])
mask_embed = self.embed_layer_norm(mask_embed)
post_embed = layers.concat([mask_embed, embed], axis=1)
mask = self._create_mask(input_mask, auto_regressive=not self.bidirectional_context,
append_head=True)
for layer in self.layers:
post_embed = layer(post_embed, mask, None)
post_embed = post_embed[:, 0]
post_logits = self.post_network(post_embed)
post_probs = layers.softmax(post_logits, axis=-1)
post_logits = layers.log(post_probs)
return post_embed, post_probs, post_logits
def _discriminator_network(self, input_mask, embed, batch_size, src_len, tgt_len, pos_embed):
""" Basic discriminator network implement. """
# if batch_size <= 1:
# raise ValueError("Warmming: If you use discriminator loss in traning, the batch_size must be greater than 1.")
src_embed = embed[:, :src_len]
tgt_embed = embed[:, src_len:]
if batch_size > 1:
neg_tgt_embed = layers.concat([tgt_embed[1:], tgt_embed[:1]], axis=0)
else:
# Cannot train discriminator if batch_size == 1
neg_tgt_embed = tgt_embed
neg_embed = layers.concat([src_embed, neg_tgt_embed], axis=1)
# Create generation network mask
src_mask = input_mask[:, :src_len]
tgt_mask = input_mask[:, src_len:]
if batch_size > 1:
neg_tgt_mask = layers.concat([tgt_mask[1:], tgt_mask[:1]], axis=0)
else:
# Cannot train discriminator if batch_size == 1
neg_tgt_mask = tgt_mask
neg_mask = layers.concat([src_mask, neg_tgt_mask], axis=1)
mask = self._create_mask(neg_mask, auto_regressive=not self.bidirectional_context,
append_head=True)
mask_embed = self.mask_embed
mask_embed = layers.expand(mask_embed, [batch_size, 1, 1])
mask_embed = self.embed_layer_norm(mask_embed)
        neg_embed = layers.concat([mask_embed, neg_embed], axis=1)
for layer in self.layers:
neg_embed = layer(neg_embed, mask, None)
neg_embed = neg_embed[:, 0]
pos_probs = self.discriminator(pos_embed)
neg_probs = self.discriminator(neg_embed)
return pos_probs, neg_probs
def _generation_network(self, input_mask, embed, batch_size, src_len, tgt_len, latent_embed):
""" Basic generation network implement. """
if self.num_latent > 0:
latent_embed = F.unsqueeze(latent_embed, [1])
latent_embed = self.embed_layer_norm(latent_embed)
dec_embed = layers.concat([latent_embed, embed], axis=1)
else:
dec_embed = embed
# Create generation network mask
src_mask = input_mask[:, :src_len]
tgt_mask = input_mask[:, src_len:]
enc_mask = self._create_mask(src_mask, auto_regressive=not self.bidirectional_context,
append_head=self.num_latent > 0)
dec_mask = self._create_mask(tgt_mask, auto_regressive=True)
mask = self._join_mask(enc_mask, dec_mask)
for layer in self.layers:
dec_embed = layer(dec_embed, mask, None)
if self.num_latent > 0:
latent_embed = dec_embed[:, 0]
else:
latent_embed = None
dec_embed = dec_embed[:, -tgt_len:]
if self.two_layer_predictor:
dec_embed = self.pre_predictor(dec_embed)
if self.weight_sharing:
token_embedding = self.embedder.token_embedding._w
dec_logits = layers.matmul(
x=dec_embed,
y=token_embedding,
transpose_y=True
)
else:
dec_logits = self.predictor(dec_embed)
dec_probs = layers.softmax(dec_logits, axis=-1)
return latent_embed, dec_probs
def _forward(self, inputs, is_training):
""" Real forward process of model in different mode(train/test). """
outputs = {}
src_token = inputs["src_token"]
src_mask = inputs["src_mask"]
src_pos = inputs["src_pos"]
src_type = inputs["src_type"]
src_turn = inputs["src_turn"]
tgt_token = inputs["tgt_token"][:, :-1]
tgt_mask = inputs["tgt_mask"][:, :-1]
tgt_pos = inputs["tgt_pos"][:, :-1]
tgt_type = inputs["tgt_type"][:, :-1]
tgt_turn = inputs["tgt_turn"][:, :-1]
input_mask = layers.concat([src_mask, tgt_mask], axis=1)
input_mask.stop_gradient = True
src_embed = self.embedder(src_token, src_pos, src_type, src_turn)
tgt_embed = self.embedder(tgt_token, tgt_pos, tgt_type, tgt_turn)
embed = layers.concat([src_embed, tgt_embed], axis=1)
embed = self.embed_layer_norm(embed)
batch_size = src_token.shape[0]
src_len = src_token.shape[1]
tgt_len = tgt_token.shape[1]
if self.num_latent > 0:
post_embed, post_probs, post_logits = self._posteriori_network(
input_mask, embed, batch_size, src_len, tgt_len)
outputs["post_logits"] = post_logits
if self.use_discriminator:
pos_probs, neg_probs = self._discriminator_network(
input_mask, embed, batch_size, src_len, tgt_len, post_embed)
outputs["pos_probs"] = pos_probs
outputs["neg_probs"] = neg_probs
if is_training:
z = F.gumbel_softmax(post_logits, self.tau)
else:
indices = layers.argmax(post_logits, axis=1)
z = layers.one_hot(F.unsqueeze(indices, [1]), self.num_latent)
latent_embeddings = self.latent_embeddings
latent_embed = layers.matmul(z, latent_embeddings)
outputs["latent_embed"] = latent_embed
else:
latent_embed = None
latent_embed, dec_probs = self._generation_network(
input_mask, embed, batch_size, src_len, tgt_len, latent_embed)
outputs["dec_probs"] = dec_probs
if self.num_latent > 0 and self.with_bow:
if self.two_layer_predictor:
latent_embed = self.pre_bow_predictor(latent_embed)
bow_logits = self.bow_predictor(latent_embed)
bow_probs = layers.softmax(bow_logits)
outputs["bow_probs"] = bow_probs
return outputs
def _collect_metrics(self, inputs, outputs):
""" Calculate loss function by using inputs and outputs. """
metrics = {}
tgt_len = layers.reduce_sum(layers.reduce_sum(inputs["tgt_mask"], dim=1) - 1)
tgt_len.stop_gradient = True
label = inputs["tgt_token"][:, 1:]
if self.label_smooth > 0:
one_hot_label = layers.one_hot(label, self.num_token_embeddings)
smooth_label = layers.label_smooth(one_hot_label, epsilon=self.label_smooth,
dtype=self._dtype)
nll = layers.cross_entropy(outputs["dec_pred"], smooth_label, soft_label=True,
ignore_index=self.padding_idx)
else:
nll = layers.cross_entropy(outputs["dec_probs"], label, ignore_index=self.padding_idx)
nll = layers.reduce_sum(nll, dim=1)
token_nll = layers.reduce_sum(nll) / tgt_len
nll = layers.reduce_mean(nll)
metrics["nll"] = nll
metrics["token_nll"] = token_nll
loss = nll
if self.num_latent > 0 and self.with_bow:
bow_probs = F.unsqueeze(outputs["bow_probs"], [1])
bow_probs = layers.expand(bow_probs, [1, label.shape[1], 1])
if self.label_smooth > 0:
bow = layers.cross_entropy(bow_probs, smooth_label, soft_label=True,
ignore_index=self.padding_idx)
else:
bow = layers.cross_entropy(bow_probs, label, ignore_index=self.padding_idx)
bow = layers.reduce_sum(bow, dim=1)
token_bow = layers.reduce_sum(bow) / tgt_len
bow = layers.reduce_mean(bow)
metrics["bow"] = bow
metrics["token_bow"] = token_bow
loss = loss + bow
if self.num_latent > 0 and self.use_discriminator:
dis = 0.0 - (layers.log(outputs["pos_probs"]) + layers.log(1.0 - outputs["neg_probs"]))
dis = layers.reduce_mean(dis)
metrics["dis"] = dis
loss = loss + dis * self.dis_ratio
metrics["loss"] = loss
metrics["token_num"] = tgt_len
return metrics
def _optimize(self, loss):
""" Optimize loss function and update model. """
if self.before_backward_fn is not None:
loss = self.before_backward_fn(loss)
loss.backward()
if self.after_backward_fn is not None:
self.after_backward_fn()
self.optimizer.minimize(loss,
grad_clip=self.grad_clip,
parameter_list=self.parameters())
self.clear_gradients()
return
def _init_state(self, inputs):
""" Initialize decode state. """
state = {}
src_token = inputs["src_token"]
src_mask = inputs["src_mask"]
src_pos = inputs["src_pos"]
src_type = inputs["src_type"]
src_turn = inputs["src_turn"]
batch_size = src_token.shape[0]
seq_len = src_token.shape[1]
src_embed = self.embedder(src_token, src_pos, src_type, src_turn)
src_embed = self.embed_layer_norm(src_embed)
mask = self._create_mask(src_mask, append_head=self.num_latent > 0)
if self.num_latent > 0:
src_embed = F.unsqueeze(src_embed, [1])
src_embed = layers.expand(src_embed, [1, self.num_latent, 1, 1])
src_embed = layers.reshape(src_embed, [-1, seq_len, self.hidden_dim])
latent_embed = self.latent_embeddings
latent_embed = F.unsqueeze(latent_embed, [1])
latent_embed = layers.expand(latent_embed, [batch_size, 1, 1])
latent_embed = self.embed_layer_norm(latent_embed)
enc_out = layers.concat([latent_embed, src_embed], axis=1)
mask = F.unsqueeze(mask, [1])
mask = layers.expand(mask, [1, self.num_latent, 1, 1])
mask = layers.reshape(mask, [-1, seq_len + 1, seq_len + 1])
else:
enc_out = src_embed
cache = {}
for l, layer in enumerate(self.layers):
cache[f"layer_{l}"] = {}
enc_out = layer(enc_out, mask, cache[f"layer_{l}"])
state["cache"] = cache
state["mask"] = mask[:, :1]
if self.num_latent > 0:
state["batch_size"] = batch_size * self.num_latent
shape = [batch_size * self.num_latent, 1, 1]
else:
state["batch_size"] = batch_size
shape = [batch_size, 1, 1]
state["pred_mask"] = layers.ones(shape, self._dtype)
state["pred_pos"] = layers.zeros(shape, "int64")
state["pred_type"] = layers.zeros(shape, "int64")
state["pred_turn"] = layers.zeros(shape, "int64")
if "tgt_token" in inputs and self.num_latent > 0:
tgt_token = inputs["tgt_token"][:, :-1]
tgt_mask = inputs["tgt_mask"][:, :-1]
tgt_pos = inputs["tgt_pos"][:, :-1]
tgt_type = inputs["tgt_type"][:, :-1]
tgt_turn = inputs["tgt_turn"][:, :-1]
input_mask = layers.concat([src_mask, tgt_mask], axis=1)
input_mask.stop_gradient = True
src_embed = self.embedder(src_token, src_pos, src_type, src_turn)
tgt_embed = self.embedder(tgt_token, tgt_pos, tgt_type, tgt_turn)
embed = layers.concat([src_embed, tgt_embed], axis=1)
embed = self.embed_layer_norm(embed)
batch_size = src_token.shape[0]
src_len = src_token.shape[1]
tgt_len = tgt_token.shape[1]
post_embed, post_probs, post_logits = self._posteriori_network(
input_mask, embed, batch_size, src_len, tgt_len)
state["post_probs"] = post_probs
return state
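    # Note: with `num_latent > 0` the initial state is expanded so that every
    # discrete latent value decodes its own candidate response: the source is
    # encoded once per latent value (an effective batch of batch_size * num_latent,
    # each copy prefixed with a different latent embedding), and `_infer` later
    # reshapes the results back to [batch_size, num_latent, ...] before
    # (optionally) re-ranking them with the discriminator.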
def _decode(self, state):
""" Decoding one time stamp. """
# shape: [batch_size, 1, seq_len]
mask = state["mask"]
# shape: [batch_size, 1]
pred_token = state["pred_token"]
pred_mask = state["pred_mask"]
pred_pos = state["pred_pos"]
pred_type = state["pred_type"]
pred_turn = state["pred_turn"]
# list of shape(len: num_layers): [batch_size, seq_len, hidden_dim]
cache = state["cache"]
pred_embed = self.embedder(pred_token, pred_pos, pred_type, pred_turn)
pred_embed = self.embed_layer_norm(pred_embed)
# shape: [batch_size, 1, seq_len + 1]
mask = layers.concat([mask, 1 - pred_mask], axis=2)
# shape: [batch_size, 1, hidden_dim]
for l, layer in enumerate(self.layers):
pred_embed = layer(pred_embed, mask, cache[f"layer_{l}"])
# shape: [batch_size, 1, vocab_size]
if self.two_layer_predictor:
pred_embed = self.pre_predictor(pred_embed)
if self.weight_sharing:
token_embedding = self.embedder.token_embedding._w
pred_logits = layers.matmul(
x=pred_embed,
y=token_embedding,
transpose_y=True
)
else:
pred_logits = self.predictor(pred_embed)
        pred_logits = pred_logits[:, 0]
pred_probs = layers.softmax(pred_logits, axis=1)
pred_logits = layers.log(pred_probs)
state["mask"] = mask
return pred_logits, state
def _ranking(self, inputs, predictions):
""" Reranking generated responses. """
src_token = inputs["src_token"]
src_mask = inputs["src_mask"]
src_pos = inputs["src_pos"]
src_type = inputs["src_type"]
src_turn = inputs["src_turn"]
src_embed = self.embedder(src_token, src_pos, src_type, src_turn)
batch_size, num_latent, tgt_seq_len = predictions.shape
# shape: [batch_size, num_latent, seq_len, 1]
preds_token = F.unsqueeze(predictions, [3])
preds_mask = F.not_equal(preds_token, self.padding_idx, "int64")
preds_pos = layers.range(0, tgt_seq_len, 1, dtype="float32")
preds_pos = F.unsqueeze(preds_pos, [0, 0, 1])
preds_pos = layers.expand(preds_pos, [batch_size, num_latent, 1, 1])
preds_pos = layers.cast(preds_pos, "int64")
preds_type = layers.zeros_like(preds_token)
preds_turn = layers.zeros_like(preds_token)
scores = []
for i in range(num_latent):
pred_token = preds_token[:, i]
pred_mask = preds_mask[:, i]
pred_pos = preds_pos[:, i]
pred_type = preds_type[:, i]
pred_turn = preds_turn[:, i]
input_mask = layers.concat([src_mask, pred_mask], axis=1)
input_mask.stop_gradient = True
pred_embed = self.embedder(pred_token, pred_pos, pred_type, pred_turn)
embed = layers.concat([src_embed, pred_embed], axis=1)
embed = self.embed_layer_norm(embed)
mask_embed = self.mask_embed
mask_embed = layers.expand(mask_embed, [batch_size, 1, 1])
mask_embed = self.embed_layer_norm(mask_embed)
out = layers.concat([mask_embed, embed], axis=1)
mask = self._create_mask(input_mask, append_head=True)
for layer in self.layers:
out = layer(out, mask, None)
mask_embed = out[:, 0]
score = self.discriminator(mask_embed)
scores.append(score[:, 0])
scores = layers.stack(scores, axis=1)
return scores
def _infer(self, inputs):
""" Real inference process of model. """
results = {}
# Initial decode state.
state = self._init_state(inputs)
if "post_probs" in state:
results["post_probs"] = state.pop("post_probs")
# Generation process.
gen_results = self.generator(self._decode, state)
results.update(gen_results)
if self.num_latent > 0:
batch_size = state["batch_size"] // self.num_latent
results["scores"] = layers.reshape(results["scores"], [batch_size, self.num_latent])
results["log_p"] = results["scores"]
results["src"] = layers.reshape(inputs["src_token"], [batch_size, -1])
if "tgt_token" in inputs:
results["tgt"] = layers.reshape(inputs["tgt_token"], [batch_size, -1])
results["preds"] = layers.reshape(results["preds"], [batch_size, self.num_latent, -1])
if self.use_discriminator:
results["scores"] = self._ranking(inputs, results["preds"])
else:
batch_size = state["batch_size"]
if "tgt_token" in inputs:
results["tgt"] = layers.reshape(inputs["tgt_token"], [batch_size, -1])
return results
UnifiedTransformer.register("UnifiedTransformer")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Embedder class.
"""
import paddle.fluid as fluid
from paddle.fluid.dygraph import Embedding
from paddle.fluid.dygraph import Layer
import paddle.fluid.layers as layers
import plato.modules.functions as F
class Embedder(Layer):
"""
Composite embedding layer.
"""
def __init__(self,
name_scope,
hidden_dim,
num_token_embeddings,
num_pos_embeddings,
num_type_embeddings,
num_turn_embeddings,
padding_idx=None,
dropout=0.1,
pos_trainable=False):
super().__init__(name_scope)
self.token_embedding = Embedding(name_scope=self.full_name(),
size=[num_token_embeddings, hidden_dim])
self.pos_embedding = Embedding(name_scope=self.full_name(),
size=[num_pos_embeddings, hidden_dim],
param_attr=fluid.ParamAttr(trainable=pos_trainable))
self.type_embedding = Embedding(name_scope=self.full_name(),
size=[num_type_embeddings, hidden_dim])
self.turn_embedding = Embedding(name_scope=self.full_name(),
size=[num_turn_embeddings, hidden_dim])
self.dropout = dropout
return
def forward(self, token_inp, pos_inp, type_inp, turn_inp):
embed = self.token_embedding(token_inp) + \
self.pos_embedding(pos_inp) + \
self.type_embedding(type_inp) + \
self.turn_embedding(turn_inp)
embed = F.dropout(embed, self.dropout)
return embed
def main():
import numpy as np
place = fluid.CPUPlace()
with fluid.dygraph.guard(place):
model = Embedder("Embedder", 10, 20, 20, 20, 20)
token_inp = fluid.dygraph.to_variable(np.random.randint(0, 19, [10, 10, 1]).astype("int64"))
pos_inp = fluid.dygraph.to_variable(np.random.randint(0, 19, [10, 10, 1]).astype("int64"))
type_inp = fluid.dygraph.to_variable(np.random.randint(0, 19, [10, 10, 1]).astype("int64"))
turn_inp = fluid.dygraph.to_variable(np.random.randint(0, 19, [10, 10, 1]).astype("int64"))
out = model(token_inp, pos_inp, type_inp, turn_inp)
print(out)
if __name__ == "__main__":
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
FeedForward class.
"""
import paddle.fluid as fluid
from paddle.fluid.dygraph import FC
from paddle.fluid.dygraph import Layer
import paddle.fluid.layers as layers
import plato.modules.functions as F
class FeedForward(Layer):
"""
    Position-wise feed-forward layer.
"""
def __init__(self, name_scope, hidden_dim, inner_dim, dropout):
super().__init__(name_scope)
self.hidden_dim = hidden_dim
self.inner_dim = inner_dim
self.linear_hidden = FC(name_scope=self.full_name(),
size=inner_dim,
num_flatten_dims=2,
act="gelu")
self.linear_out = FC(name_scope=self.full_name(),
size=hidden_dim,
num_flatten_dims=2)
self.dropout = dropout
return
def forward(self, x):
out = self.linear_hidden(x)
out = F.dropout(out, self.dropout)
out = self.linear_out(out)
return out
def main():
import numpy as np
place = fluid.CPUPlace()
with fluid.dygraph.guard(place):
model = FeedForward("FeedForward", 10, 20, 0.5)
inp = np.random.rand(2, 3, 10).astype("float32")
inp = fluid.dygraph.to_variable(inp)
out = model(inp)
print(out)
if __name__ == "__main__":
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Helpful functions.
"""
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def unsqueeze(input, axes):
""" Implement unsqueeze in dygraph mode. """
# return layers.unsqueeze(input, axes)
    # op:unsqueeze has a bug in dygraph mode
axes = [axis if axis >= 0 else axis + len(input.shape) + 1 for axis in axes]
axes = sorted(axes, reverse=True)
shape = list(input.shape)
for axis in axes:
shape.insert(axis, 1)
return layers.reshape(input, shape)
def gumbel_softmax(input, tau=1, eps=1e-10):
""" Basic implement of gumbel_softmax. """
U = fluid.dygraph.to_variable(np.random.rand(*input.shape))
# U = layers.uniform_random(input.shape, dtype=input.dtype, min=0.0, max=1.0)
# U.stop_gradient = True
gumbel = 0.0 - layers.log(eps - layers.log(U + eps))
y = input + gumbel
return layers.softmax(y / tau)
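# Note: this follows the standard Gumbel-Softmax relaxation. With U ~ Uniform(0, 1),
# g = -log(-log(U)) is a Gumbel(0, 1) sample, and
#     y = softmax((log_probs + g) / tau)
# approaches a one-hot sample from softmax(log_probs) as tau -> 0. The `eps`
# terms only guard against log(0); callers here pass log-probabilities as `input`.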
def equal(x, y, dtype=None):
""" Implement equal in dygraph mode. """
# if not isinstance(y, fluid.framework.Variable):
# y = layers.fill_constant(x.shape, x.dtype, y)
# return layers.cast(layers.equal(x, y), dtype)
if dtype is None:
dtype = "float32"
if isinstance(x, fluid.framework.Variable):
x = x.numpy()
if isinstance(y, fluid.framework.Variable):
y = y.numpy()
out = np.equal(x, y).astype(dtype)
return fluid.dygraph.to_variable(out)
def not_equal(x, y, dtype=None):
""" Implement not_equal in dygraph mode. """
return 1 - equal(x, y, dtype)
def dropout(x, p):
""" Implement dropout function like tensorflow/pytorch. """
return layers.dropout(x, p, dropout_implementation="upscale_in_train")
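# Note: "upscale_in_train" matches the TensorFlow/PyTorch convention: activations
# are scaled by 1 / (1 - p) during training so that dropout becomes an identity
# (no rescaling) at inference time.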
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
LayerNorm layer.
"""
# from paddle.fluid.dygraph import LayerNorm
from six.moves import reduce
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.dygraph import Layer
import logging
class LayerNorm(Layer):
""" Implement LayerNorm in dygraph mode. """
def __init__(self,
name_scope,
scale=True,
shift=True,
begin_norm_axis=1,
epsilon=1e-05,
param_attr=None,
bias_attr=None,
act=None):
super().__init__(name_scope)
self._scale = scale
self._shift = shift
self._begin_norm_axis = begin_norm_axis
self._epsilon = epsilon
self._param_attr = param_attr
self._bias_attr = bias_attr
self._act = act
return
def _build_once(self, input):
""" Create parameters. """
self._dtype = self._helper.input_dtype(input)
input_shape = input.shape
param_shape = [
reduce(lambda x, y: x * y, input_shape[self._begin_norm_axis:])
]
if self._scale:
self._scale_w = self.create_parameter(
attr=self._param_attr,
shape=param_shape,
dtype=self._dtype,
default_initializer=fluid.initializer.Constant(1.0))
else:
if self._param_attr:
logging.warn("param_attr are only avaliable with scale is True")
if self._shift:
assert self._bias_attr is not False
self._bias_w = self.create_parameter(
attr=self._bias_attr,
shape=param_shape,
dtype=self._dtype,
is_bias=True)
else:
if self._bias_attr:
logging.warn("bias_attr are only avaliable with shift is True")
return
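    # The forward pass below computes standard layer normalization over the
    # trailing axes starting at `begin_norm_axis`:
    #     out = scale * (x - mean(x)) / sqrt(var(x) + epsilon) + bias
    # using rsqrt for the reciprocal square root.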
def forward(self, x):
""" Forward process of LayerNorm. """
mean = layers.reduce_mean(x,
dim=list(range(self._begin_norm_axis, len(x.shape))),
keep_dim=True)
shift_x = layers.elementwise_sub(x=x, y=mean, axis=0)
variance = layers.reduce_mean(layers.square(shift_x),
dim=list(range(self._begin_norm_axis, len(x.shape))),
keep_dim=True)
r_stdev = layers.rsqrt(variance + self._epsilon)
norm_x = layers.elementwise_mul(x=shift_x, y=r_stdev, axis=0)
out = layers.elementwise_mul(x=norm_x, y=self._scale_w, axis=-1)
out = layers.elementwise_add(x=out, y=self._bias_w, axis=-1)
return out
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MultiheadAttention class.
"""
import paddle.fluid as fluid
from paddle.fluid.dygraph import Layer
from paddle.fluid.dygraph import FC
import paddle.fluid.layers as layers
import plato.modules.functions as F
class MultiheadAttention(Layer):
"""
Multi head attention layer.
"""
def __init__(self, name_scope, hidden_dim, num_heads, dropout):
assert hidden_dim % num_heads == 0
super().__init__(name_scope)
self.hidden_dim = hidden_dim
self.num_heads = num_heads
self.head_dim = hidden_dim // num_heads
self.scale = self.head_dim ** -0.5
self.linear_qkv = FC(name_scope=self.full_name(),
size=hidden_dim * 3,
num_flatten_dims=2)
self.linear_out = FC(name_scope=self.full_name(),
size=hidden_dim,
num_flatten_dims=2)
self.dropout = dropout
return
def _split_heads(self, x, is_key=False):
x = layers.reshape(
x=x, shape=[0, 0, self.num_heads, self.head_dim]
)
x = layers.transpose(x=x, perm=[0, 2, 3, 1] if is_key else [0, 2, 1, 3])
return x
def _merge_heads(self, x):
x = layers.transpose(x=x, perm=[0, 2, 1, 3])
x = layers.reshape(x=x, shape=[0, 0, self.hidden_dim])
return x
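    # In `_attn` below, `mask` follows the convention used in the model code:
    # 1 marks positions that must not be attended to. Those scores are pushed
    # to -1e10 before the softmax, and the attention weights are zeroed again
    # afterwards as an extra safeguard.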
def _attn(self, query, key, value, mask):
# shape: [batch_size, num_head, seq_len, seq_len]
scores = layers.matmul(x=query, y=key, alpha=self.scale)
if mask is not None:
mask = F.unsqueeze(mask, [1])
mask = layers.expand(mask, [1, self.num_heads, 1, 1])
mask.stop_gradient = True
scores = (1 - mask) * scores + layers.scale(mask, scale=-1e10)
attn = layers.softmax(scores, axis=-1)
attn = F.dropout(attn, self.dropout)
if mask is not None:
attn = (1 - mask) * attn
out = layers.matmul(x=attn, y=value)
return out
def forward(self, inp, mask=None, cache=None):
""" Forward process of self attention. """
# shape: [batch_size, seq_len, 3 * hidden_dim]
qkv = self.linear_qkv(inp)
query, key, value = layers.split(qkv, num_or_sections=3, dim=2)
# shape: [batch_size, num_head, seq_len, head_dim]
query = self._split_heads(query)
# shape: [batch_size, num_head, head_dim, seq_len]
key = self._split_heads(key, is_key=True)
# shape: [batch_size, num_head, seq_len, head_dim]
value = self._split_heads(value)
if cache is not None:
if "key" in cache and "value" in cache:
key = layers.concat([cache["key"], key], axis=3)
value = layers.concat([cache["value"], value], axis=2)
cache["key"] = key
cache["value"] = value
out = self._attn(query, key, value, mask)
out = self._merge_heads(out)
out = self.linear_out(out)
return out
def main():
import numpy as np
place = fluid.CPUPlace()
with fluid.dygraph.guard(place):
model = MultiheadAttention("MultiheadAttention", 10, 2, 0.5)
inp = np.random.rand(2, 3, 10).astype("float32")
inp = fluid.dygraph.to_variable(inp)
        out = model(inp)
print(out)
if __name__ == "__main__":
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Parallel class.
"""
from collections import OrderedDict
import os
import numpy as np
from paddle.fluid import core
from paddle.fluid.dygraph import layers
from paddle.fluid.dygraph import parallel_helper
import paddle.fluid.framework as framework
from paddle.fluid.layers import collective
from paddle.fluid.dygraph.base import to_variable, no_grad
ParallelStrategy = core.ParallelStrategy
def prepare_context(strategy=None):
""" Copy codes. """
if strategy is None:
strategy = ParallelStrategy()
strategy.nranks = Env().nranks
strategy.local_rank = Env().local_rank
strategy.trainer_endpoints = Env().trainer_endpoints
strategy.current_endpoint = Env().current_endpoint
if strategy.nranks < 2:
return
assert framework.in_dygraph_mode() is True, \
"dygraph.parallel.prepare_context should be used with dygrahp mode."
place = framework._current_expected_place()
assert place is not None, \
"dygraph.parallel.prepare_context should be used in fluid.dygraph.guard(place) guard."
if isinstance(place, core.CUDAPlace):
parallel_helper._set_parallel_ctx(
core.NCCLParallelContext(strategy, place))
else:
# TODO(Yancey1989): add Gloo Parallel Context to support CPU parallel computation
assert ("Only support CUDAPlace for now.")
parallel_helper._init_parallel_ctx()
return strategy
class Env(object):
""" Copy codes. """
def __init__(self):
self._nranks = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
self._local_rank = int(os.getenv("PADDLE_TRAINER_ID", "0"))
self._dev_id = int(os.getenv("FLAGS_selected_gpus", "0"))
self._trainer_endpoints = os.getenv("PADDLE_TRAINER_ENDPOINTS",
"").split(",")
self._current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT", "")
@property
def nranks(self):
""" Copy codes. """
return self._nranks
@property
def local_rank(self):
""" Copy codes. """
return self._local_rank
@property
def dev_id(self):
""" Copy codes. """
return self._dev_id
@property
def current_endpoint(self):
""" Copy codes. """
return self._current_endpoint
@property
def trainer_endpoints(self):
""" Copy codes. """
return self._trainer_endpoints
class DataParallel(layers.Layer):
"""
Runs the module with data parallelism.
    Currently, DataParallel only supports running the dynamic graph
    with multiple processes. The usage is:
`python -m paddle.distributed.launch --gpus 2 dynamic_graph_test.py`.
And the content of `dynamic_graph_test.py` is the code of examples.
Examples:
.. code-block:: python
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from paddle.fluid.optimizer import AdamOptimizer
from paddle.fluid.dygraph.nn import FC
from paddle.fluid.dygraph.base import to_variable
place = fluid.CUDAPlace(0)
with fluid.dygraph.guard(place=place):
# prepare the data parallel context
strategy=dygraph.parallel.prepare_context()
fc_layer = FC("FC", 10, act="softmax")
adam = fluid.optimizer.AdamOptimizer()
# make the module become the data parallelism module
fc_layer = dygraph.parallel.DataParallel(fc_layer, strategy)
x_data = np.random.random(size=[10, 1]).astype(np.float32)
data = to_variable(x_data)
hidden = fc_layer(data)
avg_loss = fluid.layers.mean(hidden)
# scale the loss according to the number of trainers.
avg_loss = fc_layer.scale_loss(avg_loss)
avg_loss.backward()
# collect the gradients of trainers.
fc_layer.apply_collective_grads()
adam.minimize(avg_loss)
fc_layer.clear_gradients()
Args:
layers(Layer): The module that should be executed by data parallel.
strategy(ParallelStrategy): The strategy of data parallelism.
Returns:
Layer: The data paralleled module.
"""
def __init__(self, layers, strategy):
super(DataParallel,
self).__init__(layers.full_name() + "_data_parallel")
self._layers = layers
self._strategy = strategy
def forward(self, *inputs, **kwargs):
return self._layers(*inputs, **kwargs)
def __call__(self, *args, **kwargs):
# Reimplement __call__ function
if not self._built:
self._built = True
outputs = self.forward(*args, **kwargs)
return outputs
def scale_loss(self, loss):
"""
        Scale the loss. In data parallel mode, the loss should be scaled by
the number of trainers. If not in data parallel mode, return the loss
directly.
Args:
loss(Layer): The loss of the current Model.
Returns:
Layer: the scaled loss.
"""
if not self._is_data_parallel_mode():
return loss
loss_scale = to_variable(
np.array([self._strategy.nranks]).astype("float32"))
loss_scale.stop_gradient = True
loss = loss / loss_scale
return loss
def _coalesce_tensors(self, var_groups):
from paddle.fluid.layers import nn
coalesced_grads_and_grad_vars = []
for group_id, grad_vars in var_groups.items():
flattened_vars = []
g_var_shapes = []
for g_var in grad_vars:
g_var_shapes.append(g_var.shape)
flattened_vars.append(
nn.reshape(
x=g_var, shape=[np.prod(g_var.shape)], inplace=True))
coalesced_grad = nn.concat(flattened_vars)
coalesced_grads_and_grad_vars.append(
[coalesced_grad, grad_vars, g_var_shapes])
return coalesced_grads_and_grad_vars
def _split_tensors(self, coalesced_grads_and_grad_vars):
from paddle.fluid.layers import nn
for coalesced_grad, origin_grad_vars, grad_shapes in coalesced_grads_and_grad_vars:
grad_var_len = [np.prod(g_shape) for g_shape in grad_shapes]
self._helper.main_program.current_block().append_op(
type='split',
inputs={'X': coalesced_grad},
outputs={'Out': origin_grad_vars},
attrs={'sections': grad_var_len,
'axis': 0})
for g_var, g_shape in zip(origin_grad_vars, grad_shapes):
nn.reshape(x=g_var, shape=g_shape, inplace=True)
@no_grad
def apply_collective_grads(self):
"""
AllReduce the Parameters' gradient.
"""
if not self._is_data_parallel_mode():
return
grad_var_set = set()
grad_vars = []
for param in self._layers.parameters():
            # NOTE(zcd): The grad_ivar may not be generated.
if param.trainable and param._ivar._grad_ivar():
g_var = framework.Variable(
block=self._helper.main_program.current_block(),
name=param._ivar._grad_name(),
stop_gradient=True,
ivar=param._ivar._grad_ivar())
grad_vars.append(g_var)
assert g_var not in grad_var_set
grad_var_set.add(g_var)
# FIXME(zcd): the type of the var should be LoDTensor, i.e
# the gradients should be dense, otherwise, the following
# logic should be updated.
# 128 MB as a group
mega_bytes = 128 * 1024 * 1024
group_idx = 0
memory_counter = 0
grad_var_groups = OrderedDict()
dtype = grad_vars[0].dtype
for g_var in grad_vars:
# Note: the dtype of the same group should be the same.
bytes = np.prod(g_var.shape) * core.size_of_dtype(g_var.dtype)
if memory_counter < mega_bytes and dtype == g_var.dtype:
memory_counter += bytes
else:
memory_counter = bytes
group_idx += 1
grad_var_groups.setdefault(group_idx, []).append(g_var)
coalesced_grads_and_vars = self._coalesce_tensors(grad_var_groups)
for coalesced_grad, g_vars, g_shapes in coalesced_grads_and_vars:
collective._allreduce(
coalesced_grad, coalesced_grad, sync_mode=False)
self._split_tensors(coalesced_grads_and_vars)
def _is_data_parallel_mode(self):
return self._strategy.nranks > 1
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
TransformerBlock class.
"""
import paddle.fluid as fluid
from paddle.fluid.dygraph import FC
from paddle.fluid.dygraph import Layer
import paddle.fluid.layers as layers
from plato.modules.feedforward import FeedForward
from plato.modules.layer_norm import LayerNorm
from plato.modules.multihead_attention import MultiheadAttention
import plato.modules.functions as F
class TransformerBlock(Layer):
"""
Transformer block module.
"""
def __init__(self, name_scope, hidden_dim, num_heads, dropout, attn_dropout, ff_dropout):
super().__init__(name_scope)
self.attn = MultiheadAttention(name_scope=self.full_name(),
hidden_dim=hidden_dim,
num_heads=num_heads,
dropout=attn_dropout)
self.attn_norm = LayerNorm(name_scope=self.full_name(),
begin_norm_axis=2,
epsilon=1e-12,
param_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)),
bias_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)))
self.ff = FeedForward(name_scope=self.full_name(),
hidden_dim=hidden_dim,
inner_dim=4 * hidden_dim,
dropout=ff_dropout)
self.ff_norm = LayerNorm(name_scope=self.full_name(),
begin_norm_axis=2,
epsilon=1e-12,
param_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)),
bias_attr=fluid.ParamAttr(
regularizer=fluid.regularizer.L2Decay(0.0)))
self.dropout = dropout
return
def forward(self, inp, mask=None, cache=None):
"""
Forward process on one transformer layer.
@param : inp
@type : Variable(shape: [batch_size, seq_len, hidden_size])
@param : mask
@param : cache
"""
attn_out = self.attn(inp, mask, cache)
attn_out = F.dropout(attn_out, self.dropout)
attn_out = self.attn_norm(attn_out + inp)
ff_out = self.ff(attn_out)
ff_out = F.dropout(ff_out, self.dropout)
ff_out = self.ff_norm(ff_out + attn_out)
return ff_out
def main():
import numpy as np
place = fluid.CPUPlace()
with fluid.dygraph.guard(place):
model = TransformerBlock("TransformerBlock", 10, 2, 0.5, 0.5, 0.5)
inp = np.random.rand(2, 3, 10).astype("float32")
inp = fluid.dygraph.to_variable(inp)
out = model(inp, inp)
print(out)
if __name__ == "__main__":
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Trainer class.
"""
import json
import logging
import os
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from tqdm import tqdm
from plato.args import str2bool
from plato.data.data_loader import DataLoader
from plato.metrics.metrics_tracker import MetricsTracker
from plato.metrics.metrics import bleu
from plato.metrics.metrics import distinct
import plato.modules.parallel as parallel
def get_logger(log_path, name="default"):
logger = logging.getLogger(name)
logger.propagate = False
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(message)s")
sh = logging.StreamHandler(sys.stdout)
sh.setFormatter(formatter)
logger.addHandler(sh)
fh = logging.FileHandler(log_path, mode="w")
fh.setFormatter(formatter)
logger.addHandler(fh)
return logger
def evaluate_generation_result(results):
tgt = [result["tgt"].split(" ") for result in results]
pred = [result["preds"][np.argmax(result["scores"])]
if isinstance(result["preds"], list)
else result["preds"]
for result in results]
pred = [p.split(" ") for p in pred]
metrics = {}
metrics_tracker = MetricsTracker()
bleu1, bleu2 = bleu(pred, tgt)
metrics.update({"bleu_1": bleu1, "bleu_2": bleu2})
intra_dist1, intra_dist2, inter_dist1, inter_dist2 = distinct(pred)
metrics.update({"intra_dist_1": intra_dist1,
"intra_dist_2": intra_dist2,
"inter_dist_1": inter_dist1,
"inter_dist_2": inter_dist2})
avg_len = sum(map(len, pred)) / len(pred)
metrics.update({"len": avg_len})
metrics_tracker.update(metrics, num_samples=1)
return metrics_tracker
def save(model, model_path):
if isinstance(model, parallel.DataParallel):
model = model._layers
if hasattr(fluid, "save_dygraph"):
# >= 1.6.0 compatible
fluid.save_dygraph(model.state_dict(), model_path)
fluid.save_dygraph(model.optimizer.state_dict(), model_path)
else:
dygraph.save_persistables(model.state_dict(), model_path, optimizers=model.optimizer)
return
class Trainer(object):
@classmethod
def add_cmdline_argument(cls, parser):
""" Add the cmdline arguments of trainer. """
group = parser.add_argument_group("Trainer")
group.add_argument("--use_data_distributed", type=str2bool, default=False,
help="Whether to use data distributed for parallel training.")
group.add_argument("--valid_metric_name", type=str, default="-loss",
help="The validation metric determining which checkpoint is the best.")
group.add_argument("--num_epochs", type=int, default=10,
help="Total number of training epochs to perform.")
group.add_argument("--save_dir", type=str, required=True,
help="The output directory where the model will be saved.")
group.add_argument("--batch_size", type=int, default=8,
help="Total batch size for training/evaluation/inference.")
group.add_argument("--log_steps", type=int, default=100,
help="The number of training steps to output current metrics "
"on past training dataset.")
group.add_argument("--valid_steps", type=int, default=2000,
help="The number of training steps to perform a evaluation "
"on validation datasets.")
group.add_argument("--save_checkpoint", type=str2bool, default=True,
help="Whether to save one checkpoints for each training epoch.")
group.add_argument("--save_summary", type=str2bool, default=False,
help="Whether to save metrics summary for visualDL module.")
DataLoader.add_cmdline_argument(group)
return group
def __init__(self, model, to_tensor, hparams, logger=None):
# Use data distributed
if hparams.use_data_distributed:
strategy = parallel.prepare_context()
if strategy is not None:
parallel_model = parallel.DataParallel(model, strategy)
model.before_backward_fn = parallel_model.scale_loss
model.after_backward_fn = parallel_model.apply_collective_grads
model = parallel_model
self.model = model
self.to_tensor = to_tensor
self.is_decreased_valid_metric = hparams.valid_metric_name[0] == "-"
self.valid_metric_name = hparams.valid_metric_name[1:]
self.num_epochs = hparams.num_epochs
self.save_dir = hparams.save_dir
self.log_steps = hparams.log_steps
self.valid_steps = hparams.valid_steps
self.save_checkpoint = hparams.save_checkpoint
self.save_summary = hparams.save_summary
if not os.path.exists(self.save_dir):
os.makedirs(self.save_dir)
self.logger = logger or get_logger(os.path.join(self.save_dir, "trainer.log"), "trainer")
if self.save_summary:
from visualdl import LogWriter
self.summary_logger = LogWriter(os.path.join(self.save_dir, "summary"), sync_cycle=10000)
self.train_summary = {}
self.valid_summary = {}
self.batch_metrics_tracker = MetricsTracker()
self.token_metrics_tracker = MetricsTracker()
self.best_valid_metric = float("inf" if self.is_decreased_valid_metric else "-inf")
self.epoch = 0
self.batch_num = 0
def train_epoch(self, train_iter, valid_iter, infer_iter=None, infer_parse_dict=None):
"""
Train an epoch.
@param train_iter
@type : DataLoader
@param valid_iter
@type : DataLoader
@param infer_iter
@type : DataLoader
@param infer_parse_dict
@type : dict of function
"""
self.epoch += 1
num_batches = len(train_iter)
self.batch_metrics_tracker.clear()
self.token_metrics_tracker.clear()
times = []
for batch_id, (batch, batch_size) in enumerate(train_iter, 1):
batch = type(batch)(map(lambda kv: (kv[0], self.to_tensor(kv[1])), batch.items()))
batch["epoch"] = self.epoch
batch["num_steps"] = self.batch_num
# Do a training iteration
start_time = time.time()
metrics = self.model(batch, is_training=True)
token_num = metrics.pop("token_num", None)
elapsed = time.time() - start_time
times.append(elapsed)
batch_metrics = {k: v for k, v in metrics.items() if "token" not in k}
token_metrics = {k: v for k, v in metrics.items() if "token" in k}
self.batch_metrics_tracker.update(batch_metrics, batch_size)
self.token_metrics_tracker.update(token_metrics, token_num)
self.batch_num += 1
if self.log_steps and batch_id % self.log_steps == 0:
batch_metrics_message = self.batch_metrics_tracker.value()
token_metrics_message = self.token_metrics_tracker.value()
message_prefix = f"[Train][{self.epoch}][{batch_id}/{num_batches}]"
avg_time = f"AVG_Time-{sum(times[-self.log_steps:]) / self.log_steps:.3f}"
message = " ".join([message_prefix, batch_metrics_message, token_metrics_message,
avg_time])
self.logger.info(message)
if self.save_summary:
with self.summary_logger.mode("train"):
for k, v in self.batch_metrics_tracker.items():
if k not in self.train_summary:
self.train_summary[k] = self.summary_logger.scalar(k)
scalar = self.train_summary[k]
scalar.add_record(self.batch_num, v)
for k, v in self.token_metrics_tracker.items():
if k not in self.train_summary:
self.train_summary[k] = self.summary_logger.scalar(k)
scalar = self.train_summary[k]
scalar.add_record(self.batch_num, v)
if self.valid_steps and valid_iter is not None and \
batch_id % self.valid_steps == 0:
self.evaluate(valid_iter)
if valid_iter is not None:
self.evaluate(valid_iter)
if infer_iter is not None and infer_parse_dict is not None:
self.infer(infer_iter, infer_parse_dict)
return
def infer(self, data_iter, parse_dict, num_batches=None):
"""
Inference interface.
@param : data_iter
@type : DataLoader
@param : parse_dict
@type : dict of function
@param : num_batches : the number of batches to infer
@type : int/None
"""
self.logger.info("Generation starts ...")
infer_save_file = os.path.join(self.save_dir, f"infer_{self.epoch}.result.json")
# Inference
infer_results = []
batch_cnt = 0
begin_time = time.time()
for batch, batch_size in tqdm(data_iter, total=num_batches):
batch = type(batch)(map(lambda kv: (kv[0], self.to_tensor(kv[1])), batch.items()))
result = self.model.infer(inputs=batch)
batch_result = {}
def to_list(batch):
""" Parse list. """
return batch.tolist()
# parse
for k in result:
if k in parse_dict:
parse_fn = parse_dict[k]
else:
parse_fn = to_list
if result[k] is not None:
batch_result[k] = parse_fn(result[k])
for vs in zip(*batch_result.values()):
infer_result = {}
for k, v in zip(batch_result.keys(), vs):
infer_result[k] = v
infer_results.append(infer_result)
batch_cnt += 1
if batch_cnt == num_batches:
break
self.logger.info(f"Saved inference results to {infer_save_file}")
with open(infer_save_file, "w") as fp:
json.dump(infer_results, fp, indent=2)
infer_metrics_tracker = evaluate_generation_result(infer_results)
metrics_message = infer_metrics_tracker.summary()
message_prefix = f"[Infer][{self.epoch}]"
time_cost = f"TIME-{time.time() - begin_time:.3f}"
message = " ".join([message_prefix, metrics_message, time_cost])
self.logger.info(message)
return
def evaluate(self, data_iter, need_save=True):
"""
Evaluation interface
@param : data_iter
@type : DataLoader
@param : need_save
@type : bool
"""
if isinstance(self.model, parallel.DataParallel):
need_save = need_save and parallel.Env().local_rank == 0
# Evaluation
begin_time = time.time()
batch_metrics_tracker = MetricsTracker()
token_metrics_tracker = MetricsTracker()
for batch, batch_size in data_iter:
batch = type(batch)(map(lambda kv: (kv[0], self.to_tensor(kv[1])), batch.items()))
metrics = self.model(batch, is_training=False)
token_num = int(metrics.pop("token_num"))
batch_metrics = {k: v for k, v in metrics.items() if "token" not in k}
token_metrics = {k: v for k, v in metrics.items() if "token" in k}
batch_metrics_tracker.update(batch_metrics, batch_size)
token_metrics_tracker.update(token_metrics, token_num)
batch_metrics_message = batch_metrics_tracker.summary()
token_metrics_message = token_metrics_tracker.summary()
message_prefix = f"[Valid][{self.epoch}]"
time_cost = f"TIME-{time.time() - begin_time:.3f}"
message = " ".join([message_prefix, batch_metrics_message, token_metrics_message, time_cost])
self.logger.info(message)
if need_save:
# Check valid metric
cur_valid_metric = batch_metrics_tracker.get(self.valid_metric_name)
if self.is_decreased_valid_metric:
is_best = cur_valid_metric < self.best_valid_metric
else:
is_best = cur_valid_metric > self.best_valid_metric
if is_best:
# Save current best model
self.best_valid_metric = cur_valid_metric
best_model_path = os.path.join(self.save_dir, "best.model")
save(self.model, best_model_path)
self.logger.info(
f"Saved best model to '{best_model_path}' with new best valid metric "
f"{self.valid_metric_name.upper()}-{self.best_valid_metric:.3f}")
# Save checkpoint
if self.save_checkpoint:
model_file = os.path.join(self.save_dir, f"epoch_{self.epoch}.model")
save(self.model, model_file)
if self.save_summary:
with self.summary_logger.mode("valid"):
for k, v in batch_metrics_tracker.items():
if k not in self.valid_summary:
self.valid_summary[k] = self.summary_logger.scalar(k)
scalar = self.valid_summary[k]
scalar.add_record(self.batch_num, v)
for k, v in token_metrics_tracker.items():
if k not in self.valid_summary:
self.valid_summary[k] = self.summary_logger.scalar(k)
scalar = self.valid_summary[k]
scalar.add_record(self.batch_num, v)
return
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preprocess script.
"""
import os
import argparse
from plato.args import str2bool
from plato.args import parse_args
from plato.data.dataset import Dataset
from plato.data.field import BPETextField
def main():
parser = argparse.ArgumentParser()
BPETextField.add_cmdline_argument(parser)
Dataset.add_cmdline_argument(parser)
args = parse_args(parser)
raw_train_file = os.path.join(args.data_dir, "dial.train")
raw_valid_file = os.path.join(args.data_dir, "dial.valid")
raw_test_file = os.path.join(args.data_dir, "dial.test")
train_file = raw_train_file + f".{args.tokenizer_type}.jsonl"
valid_file = raw_valid_file + f".{args.tokenizer_type}.jsonl"
test_file = raw_test_file + f".{args.tokenizer_type}.jsonl"
bpe = BPETextField(args.BPETextField)
BUILD_EXAMPLES_FN = {
"multi": bpe.build_examples_multi_turn,
"multi_knowledge": bpe.build_examples_multi_turn_with_knowledge
}
build_examples_fn = BUILD_EXAMPLES_FN[args.data_type]
if os.path.exists(raw_valid_file) and not os.path.exists(valid_file):
valid_examples = build_examples_fn(raw_valid_file, data_type="valid")
bpe.save_examples(valid_examples, valid_file)
if os.path.exists(raw_test_file) and not os.path.exists(test_file):
test_examples = build_examples_fn(raw_test_file, data_type="test")
bpe.save_examples(test_examples, test_file)
if os.path.exists(raw_train_file) and not os.path.exists(train_file):
train_examples = build_examples_fn(raw_train_file, data_type="train")
bpe.save_examples(train_examples, train_file)
return
if __name__ == "__main__":
main()
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Running scripts.
"""
import argparse
import json
import os
import numpy as np
import paddle.fluid as fluid
from plato.args import parse_args
from plato.args import str2bool
from plato.data.data_loader import DataLoader
from plato.data.dataset import Dataset
from plato.data.dataset import LazyDataset
from plato.data.field import BPETextField
from plato.trainer import Trainer
from plato.models.model_base import ModelBase
from plato.models.generator import Generator
import plato.modules.parallel as parallel
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--do_train", type=str2bool, default=False,
help="Whether to run trainning.")
parser.add_argument("--do_test", type=str2bool, default=False,
help="Whether to run evaluation on the test dataset.")
parser.add_argument("--do_infer", type=str2bool, default=False,
help="Whether to run inference on the test dataset.")
parser.add_argument("--num_infer_batches", type=int, default=None,
help="The number of batches need to infer.\n"
"Stay 'None': infer on entrie test dataset.")
parser.add_argument("--hparams_file", type=str, default=None,
help="Loading hparams setting from file(.json format).")
BPETextField.add_cmdline_argument(parser)
Dataset.add_cmdline_argument(parser)
Trainer.add_cmdline_argument(parser)
ModelBase.add_cmdline_argument(parser)
Generator.add_cmdline_argument(parser)
hparams = parse_args(parser)
if hparams.hparams_file and os.path.exists(hparams.hparams_file):
print(f"Loading hparams from {hparams.hparams_file} ...")
hparams.load(hparams.hparams_file)
print(f"Loaded hparams from {hparams.hparams_file}")
print(json.dumps(hparams, indent=2))
if not os.path.exists(hparams.save_dir):
os.makedirs(hparams.save_dir)
hparams.save(os.path.join(hparams.save_dir, "hparams.json"))
bpe = BPETextField(hparams.BPETextField)
hparams.Model.num_token_embeddings = bpe.vocab_size
generator = Generator.create(hparams.Generator, bpe=bpe)
COLLATE_FN = {
"multi": bpe.collate_fn_multi_turn,
"multi_knowledge": bpe.collate_fn_multi_turn_with_knowledge
}
collate_fn = COLLATE_FN[hparams.data_type]
# Loading datasets
if hparams.do_train:
raw_train_file = os.path.join(hparams.data_dir, "dial.train")
train_file = raw_train_file + f".{hparams.tokenizer_type}.jsonl"
assert os.path.exists(train_file), f"{train_file} does not exist"
train_dataset = LazyDataset(train_file)
train_loader = DataLoader(train_dataset, hparams.Trainer, collate_fn=collate_fn, is_train=True)
raw_valid_file = os.path.join(hparams.data_dir, "dial.valid")
valid_file = raw_valid_file + f".{hparams.tokenizer_type}.jsonl"
assert os.path.exists(valid_file), f"{valid_file} does not exist"
valid_dataset = LazyDataset(valid_file)
valid_loader = DataLoader(valid_dataset, hparams.Trainer, collate_fn=collate_fn)
if hparams.do_infer or hparams.do_test:
raw_test_file = os.path.join(hparams.data_dir, "dial.test")
test_file = raw_test_file + f".{hparams.tokenizer_type}.jsonl"
assert os.path.exists(test_file), f"{test_file} does not exist"
test_dataset = LazyDataset(test_file)
test_loader = DataLoader(test_dataset, hparams.Trainer, collate_fn=collate_fn, is_test=hparams.do_infer)
def to_tensor(array):
array = np.expand_dims(array, -1)
return fluid.dygraph.to_variable(array)
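# Shape example for to_tensor (added comment): a numpy array of shape
# [batch_size, seq_len] becomes a dygraph Variable of shape
# [batch_size, seq_len, 1] after the trailing dimension is added.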
if hparams.use_data_distributed:
place = fluid.CUDAPlace(parallel.Env().dev_id)
else:
place = fluid.CUDAPlace(0)
with fluid.dygraph.guard(place):
# Construct Model
model = ModelBase.create("Model", hparams, generator=generator)
# Construct Trainer
trainer = Trainer(model, to_tensor, hparams.Trainer)
if hparams.do_train:
# Training process
for epoch in range(hparams.num_epochs):
trainer.train_epoch(train_loader, valid_loader)
if hparams.do_test:
# Validation process
trainer.evaluate(test_loader, need_save=False)
if hparams.do_infer:
# Inference process
def split(xs, sep, pad):
""" Split id list by separator. """
out, o = [], []
for x in xs:
if x == pad:
continue
if x != sep:
o.append(x)
else:
if len(o) > 0:
out.append(list(o))
o = []
if len(o) > 0:
out.append(list(o))
assert(all(len(o) > 0 for o in out))
return out
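# Worked example with hypothetical ids (sep=2, pad=0):
#   split([11, 12, 2, 13, 14, 2, 0, 0], sep=2, pad=0) -> [[11, 12], [13, 14]]
# Padding ids are dropped and each separator closes the current utterance.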
def parse_context(batch):
""" Parse context. """
return bpe.denumericalize([split(xs, bpe.eos_id, bpe.pad_id)
for xs in batch.tolist()])
def parse_text(batch):
""" Parse text. """
return bpe.denumericalize(batch.tolist())
infer_parse_dict = {
"src": parse_context,
"tgt": parse_text,
"preds": parse_text
}
trainer.infer(test_loader, infer_parse_dict, num_batches=hparams.num_infer_batches)
if __name__ == "__main__":
main()
#!/bin/bash
set -ux
SAVE_DIR=outputs/DSTC7_AVSD.infer
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DSTC7_AVSD
INIT_CHECKPOINT=outputs/DSTC7_AVSD/best.model
DATA_TYPE=multi_knowledge
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
python -u \
./run.py \
--do_infer true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 4 \
--num_type_embeddings 3 \
--use_discriminator true \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
#!/bin/bash
set -ux
SAVE_DIR=outputs/DSTC7_AVSD
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DSTC7_AVSD
INIT_CHECKPOINT=model/PLATO
DATA_TYPE=multi_knowledge
USE_VISUALDL=false
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
if [[ "$USE_VISUALDL" = true ]]; then
visualdl --logdir=$SAVE_DIR/summary --port=8083 --host=`hostname` &
VISUALDL_PID=$!
fi
python -u \
./run.py \
--do_train true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 4 \
--valid_steps 2000 \
--num_type_embeddings 3 \
--use_discriminator true \
--num_epoch 20 \
--lr 1e-5 \
--save_checkpoint false \
--save_summary $USE_VISUALDL \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
if [[ $USE_VISUALDL = true ]]; then
kill $VISUALDL_PID
fi
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog.baseline.infer
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=outputs/DailyDialog.baseline/best.model
DATA_TYPE=multi
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
python -u \
./run.py \
--do_infer true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 48 \
--num_latent 0 \
--num_type_embeddings 2 \
--init_checkpoint $INIT_CHECKPOINT \
--length_average true \
--save_dir $SAVE_DIR
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog.baseline
VOCAB_PATH=model-baseline/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=model-baseline/PLATO.baseline
DATA_TYPE=multi
USE_VISUALDL=false
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=2
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
if [[ "$USE_VISUALDL" = true ]]; then
visualdl --logdir=$SAVE_DIR/summary --port=8083 --host=`hostname` &
VISUALDL_PID=$!
fi
python -u \
./run.py \
--do_train true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 2 \
--valid_steps 2000 \
--num_type_embeddings 2 \
--num_latent 0 \
--num_epoch 20 \
--lr 1e-5 \
--save_checkpoint false \
--save_summary $USE_VISUALDL \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
if [[ $USE_VISUALDL = true ]]; then
kill $VISUALDL_PID
fi
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog.infer
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=outputs/DailyDialog/best.model
DATA_TYPE=multi
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
python -u \
./run.py \
--do_infer true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 4 \
--num_type_embeddings 2 \
--num_latent 20 \
--use_discriminator true \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=model/PLATO
DATA_TYPE=multi
USE_VISUALDL=false
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0,1
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
if [[ ! -e $DATA_DIR/dial.train.jsonl ]]; then
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
fi
if [[ "$USE_VISUALDL" = true ]]; then
visualdl --logdir=$SAVE_DIR/summary --port=8083 --host=`hostname` &
VISUALDL_PID=$!
fi
python -m \
paddle.distributed.launch \
--log_dir $SAVE_DIR \
--started_port 8888 \
./run.py \
--use_data_distributed true \
--do_train true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 6 \
--valid_steps 2000 \
--num_type_embeddings 2 \
--use_discriminator true \
--num_epoch 20 \
--lr 1e-5 \
--save_checkpoint false \
--save_summary $USE_VISUALDL \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
if [[ $USE_VISUALDL = true ]]; then
kill $VISUALDL_PID
fi
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog.infer
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=outputs/DailyDialog/best.model
DATA_TYPE=multi
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
if [[ ! -e $DATA_DIR/dial.test.jsonl ]]; then
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
fi
python -u \
./run.py \
--do_infer true \
--generator TopKSampling \
--top_k_num 10 \
--sampling_temperate 0.8 \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 16 \
--num_type_embeddings 2 \
--use_discriminator true \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
#!/bin/bash
set -ux
SAVE_DIR=outputs/DailyDialog
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/DailyDialog
INIT_CHECKPOINT=model/PLATO
DATA_TYPE=multi
USE_VISUALDL=false
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
if [[ "$USE_VISUALDL" = true ]]; then
visualdl --logdir=$SAVE_DIR/summary --port=8083 --host=`hostname` &
VISUALDL_PID=$!
fi
python -u \
./run.py \
--do_train true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 6 \
--valid_steps 2000 \
--num_type_embeddings 2 \
--use_discriminator true \
--num_epoch 20 \
--lr 1e-5 \
--save_checkpoint false \
--save_summary $USE_VISUALDL \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
if [[ $USE_VISUALDL = true ]]; then
kill $VISUALDL_PID
fi
#!/bin/bash
set -ux
SAVE_DIR=outputs/PersonaChat.infer
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/PersonaChat
INIT_CHECKPOINT=outputs/PersonaChat/best.model
DATA_TYPE=multi_knowledge
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
python -u \
./run.py \
--do_infer true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 2 \
--num_type_embeddings 3 \
--use_discriminator true \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
python -u ./tools/knowledge_f1.py $SAVE_DIR/infer_0.result.json $DATA_DIR/dial.test
#!/bin/bash
set -ux
SAVE_DIR=outputs/PersonaChat
VOCAB_PATH=model/Bert/vocab.txt
DATA_DIR=data/PersonaChat
INIT_CHECKPOINT=model/PLATO
DATA_TYPE=multi_knowledge
USE_VISUALDL=false
# CUDA environment settings.
export CUDA_VISIBLE_DEVICES=0
# Paddle environment settings.
export FLAGS_fraction_of_gpu_memory_to_use=0.1
export FLAGS_eager_delete_scope=True
export FLAGS_eager_delete_tensor_gb=0.0
python -u \
./preprocess.py \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE
if [[ "$USE_VISUALDL" = true ]]; then
visualdl --logdir=$SAVE_DIR/summary --port=8083 --host=`hostname` &
VISUALDL_PID=$!
fi
python -u \
./run.py \
--do_train true \
--vocab_path $VOCAB_PATH \
--data_dir $DATA_DIR \
--data_type $DATA_TYPE \
--batch_size 4 \
--valid_steps 2000 \
--num_type_embeddings 3 \
--use_discriminator true \
--num_epoch 20 \
--lr 1e-5 \
--save_checkpoint false \
--save_summary $USE_VISUALDL \
--init_checkpoint $INIT_CHECKPOINT \
--save_dir $SAVE_DIR
if [[ $USE_VISUALDL = true ]]; then
kill $VISUALDL_PID
fi
import sys
import math
import json
import numpy as np
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.meteor.meteor import Meteor
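# Descriptive header (added comment): for inference results whose "tgt" field
# may hold several references separated by '|' (e.g. DSTC7_AVSD), this script
# scores every candidate in "preds" with BLEU/METEOR/ROUGE, records which
# candidate the model itself ranked highest via "scores", reports how often
# those two choices agree, and finally evaluates the model-chosen responses
# with the full set of COCO-eval metrics.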
def_scorers = [
(Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
(Meteor(),"METEOR"),
(Rouge(), "ROUGE_L"),
(Cider(), "CIDEr")
]
best_scorers = [
(Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
(Meteor(),"METEOR"),
(Rouge(), "ROUGE_L")
]
def score_fn(ref, sample, scorers=def_scorers):
# ref and sample are both dict
final_scores = {}
for scorer, method in scorers:
# print('computing %s score with COCO-EVAL...'%(scorer.method()))
score, scores = scorer.compute_score(ref, sample)
if type(score) == list:
for m, s in zip(method, score):
final_scores[m] = s
else:
final_scores[method] = score
return final_scores
from collections import defaultdict
chosen_by_scores = defaultdict(int)
chosen_by_best = defaultdict(int)
acc = 0
with open(sys.argv[1]) as file:
datas = json.load(file)
cnt = 0
all_refs = dict()
all_cands = dict()
for data in datas:
ref = list(map(lambda x : x.strip(), data['tgt'].split('|')))
best_pred = ''
best_score = -1e9
best_idx = -1
for i, pred in enumerate(data['preds']):
refs = dict()
cands = dict()
refs[0] = ref
cands[0] = [pred]
ret = score_fn(refs, cands, best_scorers)
score = sum(map(lambda x : ret[x], ret))
if score > best_score:
best_idx = i
best_score = score
best_pred = pred
chosen_by_best[best_idx] += 1
idx = np.argmax(data['scores'])
chosen_by_scores[idx] += 1
chosen_pred = data['preds'][idx]
if idx == best_idx:
acc += 1
all_refs[cnt] = ref
all_cands[cnt] = [chosen_pred]
cnt += 1
print(f"Acc: {acc / len(datas)}")
for i in range(20):
# Guard against division by zero when index i was never the metric-best choice.
ratio = chosen_by_scores[i] / chosen_by_best[i] if chosen_by_best[i] else float("nan")
print(f"{i} {chosen_by_scores[i]} {chosen_by_best[i]}"
f" {chosen_by_scores[i] / len(datas):.4f}"
f" {ratio:.4f}")
res = score_fn(all_refs, all_cands)
for name in res:
print(f"{name}: {res[name]:.4f}")
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Calculate Knowledge f1.
"""
import sys
import json
import numpy as np
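# Metric sketch (added comment): knowledge F1 is computed per example from the
# word overlap between the selected prediction and the knowledge sentences,
# after removing stopwords; recall = overlap / |knowledge words|,
# precision = overlap / |prediction words|, F1 = 2 * P * R / (P + R), and the
# three values are averaged over all examples (zero-overlap examples count as 0).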
eval_file = sys.argv[1]
test_file = sys.argv[2]
cnt = 0
res = 0.0
r = 0.0
p = 0.0
stopwords = set()
with open("./tools/stopwords.txt") as f:
for line in f:
word = line.strip()
stopwords.add(word)
with open(eval_file) as f:
for result, line in zip(json.load(f), open(test_file)):
cnt += 1
if "scores" in result:
pred = result["preds"][np.argmax(result["scores"])]
else:
pred = result["preds"][0]
knowledges, _, reply = line.strip().split('\t')
words = set()
for sent in knowledges.split(" __eou__ "):
for word in sent.split():
words.add(word)
words = words - stopwords
k_len = len(words)
pred1 = set(pred.split())
pred1 = pred1 - stopwords
pred_len = len(pred1)
overlap = len(words & pred1)
if overlap == 0:
continue
recall = float(overlap) / k_len
r += recall
precision = float(overlap) / pred_len
p += precision
res += 2 * recall * precision / (recall + precision)
print(f"Recall:{r/cnt}")
print(f"Precison:{p/cnt}")
print(f"F1:{res/cnt}")
print("Recall/Precision/F1:{:0,.4f}/{:0,.4f}/{:0,.4f}".format(r/cnt, p/cnt, res/cnt))
a
according
about
above
across
after
again
against
all
almost
alone
along
already
also
although
always
among
an
and
another
any
are
around
as
ask
asked
asking
asks
at
away
b
back
backed
backing
backs
be
became
because
become
becomes
been
began
being
beings
between
both
but
by
c
can
cannot
certain
certainly
come
could
d
did
differ
different
differently
do
does
done
during
e
each
either
even
evenly
ever
every
f
felt
find
finds
for
from
further
furthered
furthering
furthers
g
gave
general
generally
get
gets
give
given
gives
go
going
got
h
had
has
have
having
he
her
here
herself
him
himself
his
how
however
i
if
in
into
is
it
its
itself
j
just
k
keep
keeps
kind
knew
know
known
knows
l
let
lets
likely
m
may
me
might
mostly
much
must
my
myself
n
need
needed
needing
needs
never
no
nobody
non
noone
not
nothing
now
nowhere
o
of
on
once
one
only
or
other
others
our
out
over
overall
p
per
perhaps
put
puts
q
r
rather
really
s
seem
seemed
seeming
seems
shall
she
should
showed
showing
shows
since
so
some
still
still
such
sure
t
take
taken
than
that
the
their
them
then
there
therefore
particularly
nevertheless
these
they
thing
things
think
thinks
this
those
though
thought
thoughts
through
thus
try
trying
tried
to
anyway
anymore
together
too
took
toward
u
under
until
up
upon
us
use
used
uses
v
very
w
want
wanted
wanting
wants
was
way
ways
we
well
wells
went
were
what
when
where
whether
which
while
who
whole
whose
why
will
with
within
would
x
y
yet
you
your
yours
z
.
am
like
love
favorite
work
,
enjoy
'm
're
great