Unverified commit 4748684e, authored by 0YuanZhang0, committed by GitHub

update d-net (#3591)

* Remove KD scripts

* Add ERNIE2.0 service

* Update server

* Update MTL

* Update README.md

* Update README.md for MTL

* Update README.md
Parent 2bbe37ad
# D-NET
## Introduction
D-NET is a simple pre-training and fine-tuning framework that Baidu used for the MRQA (Machine Reading for Question Answering) 2019 Shared Task, which focused on the generalization of machine reading comprehension (MRC) models. Our system ranked first among all participants in terms of averaged F1 score. Additionally, we won first place on 10 of the 12 test sets and second place on the other two in terms of F1 scores.
In this repository, we release the related code, data and model parameters used in the D-NET framework.
## Framework
An overview of the D-NET framework is shown in the figure below. To improve the generalization capability of an MRC system, we mainly use two techniques, i.e. **multi-task learning (MTL)** and **ensemble of multiple pre-trained models**.
<p align="center">
<img src="./images/D-NET_framework.png" width="500">
</p>
### D-NET includes 3 parts:
#### multi_task_learning
We use the PaddlePaddle PALM multi-task learning library [Link](https://github.com/PaddlePaddle/PALM) to train a single model for the MRQA 2019 Shared Task.
#### knowledge_distillation
A model ensemble can improve the generalization of MRC models. We use distillation to compress an ensemble of multiple models into a single model with no loss of accuracy; this addresses the slow inference of the ensemble and reduces its heavy resource usage.
#### server
The MRQA 2019 submission environment, with Baidu's BERT inference model and the XLNet inference model.
#### Multi-task learning
In addition to the MRC task, we further introduce several auxiliary tasks in the fine-tuning stage to learn more general language representations. Specifically, we have the following auxiliary tasks:
- Unsupervised task: masked language model
- Supervised tasks:
  - natural language inference
  - paragraph ranking
We use the [PALM](https://github.com/PaddlePaddle/PALM) multi-task learning library based on [PaddlePaddle](https://www.paddlepaddle.org.cn/) in our experiments, which makes the implementation of new tasks and pre-trained models much easier than from scratch. To train the MRQA data sets with MTL, please refer to the instructions [here](multi_task_learning) (under `multi_task_learning/`).
#### Ensemble of multiple pre-trained models
In our experiments, we found that an ensemble system based on different pre-trained models shows better generalization capability than systems based on a single model. In this repository, we provide the parameters of 3 models that are fine-tuned on the MRQA in-domain data, based on ERNIE2.0, XL-NET and BERT, respectively. The ensemble of these models is implemented as servers. Please refer to the instructions [here](server) (under `server/`) for more details.
## Directory structure
```
├── multi_task_learning/                        # scripts for multi-task learning
│   ├── configs/                                # PALM config files
│   ├── scripts/                                # auxiliary scripts
│   ├── wget_pretrained_model.sh                # download pretrained model
│   ├── wget_data.sh                            # download data for MTL
│   ├── run_build_palm.sh                       # MTL preparation
│   ├── run_evaluation.sh                       # evaluation
│   ├── run_multi_task.sh                       # start MTL training
├── server/                                     # scripts for the ensemble of multiple pretrained models
│   ├── ernie_server/                           # ERNIE model server
│   ├── xlnet_server/                           # XL-NET model server
│   ├── bert_server/                            # BERT model server
│   ├── main_server.py                          # main server scripts for ensemble
│   ├── client/                                 # client scripts which read examples and make requests
│   ├── wget_server_inference_model.sh          # script for downloading model parameters
│   ├── start.sh                                # script for launching all the servers
```
## Copyright and License
Copyright 2019 Baidu.com, Inc. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
# knowledge_distillation
## 1、Introduction
A model ensemble can improve the generalization of MRC models. However, such an approach is inefficient, because inference with an ensemble is slow and requires a large amount of resources. We use knowledge distillation to compress an ensemble of multiple models into a single model, which addresses the slow inference.
## 2、Quick Start
### Environment
- Python >= 2.7
- cuda >= 9.0
- cudnn >= 7.0
- PaddlePaddle >= 1.6. Please refer to the [Installation Guide](http://www.paddlepaddle.org/#quick-start).
### Data and Models Preparation
You can download the data and the trained knowledge distillation models that we provide:
```
bash wget_models_and_data.sh
```
This creates the following data and model directories:
data:
- `./data/input/mlm_data`: masked language model dataset.
- `./data/input/mrqa_distill_data`: MRQA dataset, in two parts: `mrqa_distill.json` (JSON data computed from the teacher models) and `mrqa-combined.all_dev.raw.json` (all MRQA dev sets merged).
- `./data/input/mrqa_evaluation_dataset`: MRQA evaluation data (in-domain and out-of-domain JSON data).
models:
- `./data/pretrain_model/squad2_model`: pre-trained model (Google's SQuAD 2.0 model is used as the pre-trained model, [Model Link](https://worksheets.codalab.org/worksheets/0x3852e60a51d2444680606556d404c657)).
- `./data/saved_models/knowledge_distillation_model`: the knowledge distillation model trained by Baidu.
## 3、Train and Predict
Train the knowledge distillation model and run prediction:
```
bash run_distill.sh
```
## 4、Evaluation
To evaluate the result, run
```
sh run_evaluation.sh
```
Note that we use the evaluation script for SQuAD 1.1 here, which is equivalent to the official one.
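For readers unfamiliar with the metric, SQuAD 1.1 style F1 is a token-overlap score computed after light answer normalization. The block below is a minimal sketch of that computation under this assumption; it is not the contents of `run_evaluation.sh`, and the function names `normalize_answer`, `f1_score` and `best_f1` are chosen here for illustration.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction, ground_truth):
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = float(num_same) / len(pred_tokens)
    recall = float(num_same) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, ground_truths):
    """A question may have several references; keep the best match."""
    return max(f1_score(prediction, gt) for gt in ground_truths)
```

Taking the maximum over references means a prediction is not penalized for matching only one of several acceptable answers.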
## 5、Performance
| | dev in_domain(Macro-F1)| dev out_of_domain(Macro-F1) |
| ------------- | ------------ | ------------ |
| Official baseline | 77.87 | 58.67 |
| KD (4 teacher models -> student) | 83.67 | 67.34 |
KD: knowledge distillation model (an ensemble of 4 teacher models distilled into a single student model)
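The table reports Macro-F1. As a hedged illustration, assuming the usual meaning of an unweighted mean of per-dataset F1 scores, the sketch below contrasts it with a size-weighted (micro) average. The dataset names, scores and sizes are made-up placeholders, not results from this repository.

```python
def macro_f1(per_dataset_f1):
    """Unweighted mean of per-dataset F1: every dataset counts equally."""
    return sum(per_dataset_f1.values()) / float(len(per_dataset_f1))


def micro_f1(per_dataset_f1, per_dataset_size):
    """Size-weighted mean: large datasets dominate the average."""
    total = float(sum(per_dataset_size.values()))
    return sum(per_dataset_f1[name] * per_dataset_size[name]
               for name in per_dataset_f1) / total


# Hypothetical numbers, for illustration only.
example_f1 = {"dataset_a": 80.0, "dataset_b": 70.0, "dataset_c": 60.0}
example_size = {"dataset_a": 8000, "dataset_b": 1000, "dataset_c": 1000}
```

With these placeholder values the macro average is 70.0, while the micro average is pulled towards the largest dataset.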
## Copyright and License
Copyright 2019 Baidu.com, Inc. All Rights Reserved Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
limitations under the License.
input data dir: mrqa distillation dataset and mask language model dataset
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import six
import json
import numpy as np
import paddle.fluid as fluid
from model.transformer_encoder import encoder as encoder
from model.transformer_encoder import pre_process_layer as pre_process_layer
class BertModel(object):
    def __init__(self,
                 src_ids,
                 position_ids,
                 sentence_ids,
                 input_mask,
                 config,
                 weight_sharing=True,
                 use_fp16=False,
                 model_name = ''):
        self._emb_size = config["hidden_size"]
        self._n_layer = config["num_hidden_layers"]
        self._n_head = config["num_attention_heads"]
        self._voc_size = config["vocab_size"]
        self._max_position_seq_len = config["max_position_embeddings"]
        self._sent_types = config["type_vocab_size"]
        self._hidden_act = config["hidden_act"]
        self._prepostprocess_dropout = config["hidden_dropout_prob"]
        self._attention_dropout = config["attention_probs_dropout_prob"]
        self._weight_sharing = weight_sharing
        self.model_name = model_name
        self._word_emb_name = self.model_name + "word_embedding"
        self._pos_emb_name = self.model_name + "pos_embedding"
        self._sent_emb_name = self.model_name + "sent_embedding"
        self._dtype = "float16" if use_fp16 else "float32"
        # Initialize all weights by truncated normal initializer, and all biases
        # will be initialized by constant zero by default.
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=config["initializer_range"])
        self._build_model(src_ids, position_ids, sentence_ids, input_mask, config)
    def _build_model(self, src_ids, position_ids, sentence_ids, input_mask, config):
        # padding id in vocabulary must be set to 0
        emb_out = fluid.layers.embedding(
            input=src_ids,
            size=[self._voc_size, self._emb_size],
            dtype=self._dtype,
            param_attr=fluid.ParamAttr(
                name=self._word_emb_name, initializer=self._param_initializer),
            is_sparse=False)
        self.emb_out = emb_out
        position_emb_out = fluid.layers.embedding(
            input=position_ids,
            size=[self._max_position_seq_len, self._emb_size],
            dtype=self._dtype,
            param_attr=fluid.ParamAttr(
                name=self._pos_emb_name, initializer=self._param_initializer))
        self.position_emb_out = position_emb_out
        sent_emb_out = fluid.layers.embedding(
            sentence_ids,
            size=[self._sent_types, self._emb_size],
            dtype=self._dtype,
            param_attr=fluid.ParamAttr(
                name=self._sent_emb_name, initializer=self._param_initializer))
        self.sent_emb_out = sent_emb_out
        emb_out = emb_out + position_emb_out
        emb_out = emb_out + sent_emb_out
        emb_out = pre_process_layer(
            emb_out, 'nd', self._prepostprocess_dropout, name='pre_encoder')
        if self._dtype == "float16":
            input_mask = fluid.layers.cast(x=input_mask, dtype=self._dtype)
        self_attn_mask = fluid.layers.matmul(
            x = input_mask, y = input_mask, transpose_y = True)
        self_attn_mask = fluid.layers.scale(
            x = self_attn_mask, scale = 10000.0, bias = -1.0, bias_after_scale = False)
        n_head_self_attn_mask = fluid.layers.stack(
            x=[self_attn_mask] * self._n_head, axis=1)
        n_head_self_attn_mask.stop_gradient = True
        self._enc_out = encoder(
            enc_input = emb_out,
            attn_bias = n_head_self_attn_mask,
            n_layer = self._n_layer,
            n_head = self._n_head,
            d_key = self._emb_size // self._n_head,
            d_value = self._emb_size // self._n_head,
            d_model = self._emb_size,
            d_inner_hid = self._emb_size * 4,
            prepostprocess_dropout = self._prepostprocess_dropout,
            attention_dropout = self._attention_dropout,
            relu_dropout = 0,
            hidden_act = self._hidden_act,
            preprocess_cmd = "",
            postprocess_cmd = "dan",
            param_initializer = self._param_initializer,
            name = self.model_name + 'encoder')
    def get_sequence_output(self):
        return self._enc_out

    def get_pooled_output(self):
        """Get the first feature of each sequence for classification"""
        next_sent_feat = fluid.layers.slice(
            input = self._enc_out, axes = [1], starts = [0], ends = [1])
        next_sent_feat = fluid.layers.fc(
            input = next_sent_feat,
            size = self._emb_size,
            act = "tanh",
            param_attr = fluid.ParamAttr(
                name = self.model_name + "pooled_fc.w_0",
                initializer = self._param_initializer),
            bias_attr = "pooled_fc.b_0")
        return next_sent_feat
    def get_pretraining_output(self, mask_label, mask_pos, labels):
        """Get the loss & accuracy for pretraining"""
        mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
        # extract the first token feature in each sentence
        next_sent_feat = self.get_pooled_output()
        reshaped_emb_out = fluid.layers.reshape(
            x=self._enc_out, shape = [-1, self._emb_size])
        # extract masked tokens' feature
        mask_feat = fluid.layers.gather(input = reshaped_emb_out, index = mask_pos)
        # transform: fc
        mask_trans_feat = fluid.layers.fc(
            input = mask_feat,
            size = self._emb_size,
            act = self._hidden_act,
            param_attr = fluid.ParamAttr(
                name = self.model_name + 'mask_lm_trans_fc.w_0',
                initializer = self._param_initializer),
            bias_attr = fluid.ParamAttr(name = self.model_name + 'mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(
            mask_trans_feat, 'n', name = self.model_name + 'mask_lm_trans')
        mask_lm_out_bias_attr = fluid.ParamAttr(
            name = self.model_name + "mask_lm_out_fc.b_0",
            initializer = fluid.initializer.Constant(value = 0.0))
        if self._weight_sharing:
            fc_out = fluid.layers.matmul(
                x = mask_trans_feat,
                y = fluid.default_main_program().global_block().var(
                    self._word_emb_name),
                transpose_y = True)
            fc_out += fluid.layers.create_parameter(
                shape = [self._voc_size],
                dtype = self._dtype,
                attr = mask_lm_out_bias_attr,
                is_bias = True)
        else:
            fc_out = fluid.layers.fc(input = mask_trans_feat,
                                     size = self._voc_size,
                                     param_attr = fluid.ParamAttr(
                                         name = self.model_name + "mask_lm_out_fc.w_0",
                                         initializer = self._param_initializer),
                                     bias_attr = mask_lm_out_bias_attr)
        mask_lm_loss = fluid.layers.softmax_with_cross_entropy(
            logits = fc_out, label = mask_label)
        mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
        next_sent_fc_out = fluid.layers.fc(
            input = next_sent_feat,
            size = 2,
            param_attr = fluid.ParamAttr(
                name = self.model_name + "next_sent_fc.w_0",
                initializer = self._param_initializer),
            bias_attr = self.model_name + "next_sent_fc.b_0")
        next_sent_loss, next_sent_softmax = fluid.layers.softmax_with_cross_entropy(
            logits = next_sent_fc_out, label = labels, return_softmax = True)
        next_sent_acc = fluid.layers.accuracy(
            input = next_sent_softmax, label = labels)
        mean_next_sent_loss = fluid.layers.mean(next_sent_loss)
        loss = mean_next_sent_loss + mean_mask_lm_loss
        return next_sent_acc, mean_mask_lm_loss, loss
if __name__ == "__main__":
    print("hello world!")
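The attention bias built in `_build_model` can be hard to read from the layer calls alone. The NumPy sketch below mirrors the same arithmetic outside PaddlePaddle: an outer product of the padding mask, then `(x - 1) * 10000`, which is what `scale` computes when `bias_after_scale=False`. The function name `build_attn_bias` and the toy mask are illustrative assumptions, not part of the original file.

```python
import numpy as np


def build_attn_bias(input_mask, n_head):
    """input_mask: [batch, seq_len, 1], 1.0 for real tokens, 0.0 for padding."""
    # Outer product per example: 1.0 only where both positions are real tokens.
    pair_mask = np.matmul(input_mask, input_mask.transpose(0, 2, 1))
    # Same arithmetic as scale(scale=10000.0, bias=-1.0, bias_after_scale=False):
    # 0.0 for valid pairs, -10000.0 where either side is padding.
    bias = (pair_mask - 1.0) * 10000.0
    # Every attention head receives the same bias.
    return np.stack([bias] * n_head, axis=1)


toy_mask = np.array([[[1.0], [1.0], [0.0]]])  # two real tokens, one pad
toy_bias = build_attn_bias(toy_mask, n_head=2)
```

Adding a large negative value to the attention scores of padded positions drives their softmax weight towards zero, so padding does not contribute to the attended representation.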
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import argparse
import collections
import numpy as np
import multiprocessing
from copy import deepcopy as copy
import paddle
import paddle.fluid as fluid
from model.bert import BertModel
from utils.configure import JsonConfig
class ModelBERT(object):
    def __init__(
            self,
            conf,
            name = "",
            is_training = False,
            base_model = None):
        # the name of this task
        # name is used for identifying parameters
        self.name = name
        # deep copy the configure of model
        self.conf = copy(conf)
        self.is_training = is_training
        ## the overall loss of this task
        self.loss = None
        ## outputs may be useful for the other models
        self.outputs = {}
        ## the prediction of this task
        self.predict = []

    def create_model(self,
                     args,
                     reader_input,
                     base_model = None):
        """
        given the base model, reader_input
        return the create fn for create this model
        """
        def _create_model():
            src_ids, pos_ids, sent_ids, input_mask = reader_input
            bert_conf = JsonConfig(self.conf["bert_conf_file"])
            self.bert = BertModel(
                src_ids = src_ids,
                position_ids = pos_ids,
                sentence_ids = sent_ids,
                input_mask = input_mask,
                config = bert_conf,
                use_fp16 = args.use_fp16,
                model_name = self.name)
            self.loss = None
            self.outputs = {
                "sequence_output":self.bert.get_sequence_output(),
            }
        return _create_model

    def get_output(self, name):
        return self.outputs[name]

    def get_outputs(self):
        return self.outputs

    def get_predict(self):
        return self.predict

if __name__ == "__main__":
    bert_model = ModelBERT(conf = {"json_conf_path" : "./data/pretrained_models/squad2_model/bert_config.json"})
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from model.transformer_encoder import pre_process_layer
from utils.configure import JsonConfig
def compute_loss(output_tensors, args=None):
    """Compute loss for mlm model"""
    fc_out = output_tensors['mlm_out']
    mask_label = output_tensors['mask_label']
    mask_lm_loss = fluid.layers.softmax_with_cross_entropy(
        logits=fc_out, label=mask_label)
    mean_mask_lm_loss = fluid.layers.mean(mask_lm_loss)
    return mean_mask_lm_loss

def create_model(reader_input, base_model=None, is_training=True, args=None):
    """
    given the base model, reader_input
    return the output tensors
    """
    mask_label, mask_pos = reader_input
    config = JsonConfig(args.bert_config_path)
    _emb_size = config['hidden_size']
    _voc_size = config['vocab_size']
    _hidden_act = config['hidden_act']
    _word_emb_name = "word_embedding"
    _dtype = "float16" if args.use_fp16 else "float32"
    _param_initializer = fluid.initializer.TruncatedNormal(
        scale=config['initializer_range'])
    mask_pos = fluid.layers.cast(x=mask_pos, dtype='int32')
    enc_out = base_model.get_output("sequence_output")
    # flatten the encoder output to [batch_size * seq_len, hidden_size]
    reshaped_emb_out = fluid.layers.reshape(
        x=enc_out, shape=[-1, _emb_size])
    # extract masked tokens' feature
    mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)
    num_seqs = fluid.layers.fill_constant(shape=[1], value=512, dtype='int64')
    # transform: fc
    mask_trans_feat = fluid.layers.fc(
        input=mask_feat,
        size=_emb_size,
        act=_hidden_act,
        param_attr=fluid.ParamAttr(
            name='mask_lm_trans_fc.w_0',
            initializer=_param_initializer),
        bias_attr=fluid.ParamAttr(name='mask_lm_trans_fc.b_0'))
    # transform: layer norm
    mask_trans_feat = pre_process_layer(
        mask_trans_feat, 'n', name='mask_lm_trans')
    mask_lm_out_bias_attr = fluid.ParamAttr(
        name="mask_lm_out_fc.b_0",
        initializer=fluid.initializer.Constant(value=0.0))
    # output projection shares weights with the word embedding table
    fc_out = fluid.layers.matmul(
        x=mask_trans_feat,
        y=fluid.default_main_program().global_block().var(
            _word_emb_name),
        transpose_y=True)
    fc_out += fluid.layers.create_parameter(
        shape=[_voc_size],
        dtype=_dtype,
        attr=mask_lm_out_bias_attr,
        is_bias=True)
    output_tensors = {}
    output_tensors['num_seqs'] = num_seqs
    output_tensors['mlm_out'] = fc_out
    output_tensors['mask_label'] = mask_label
    return output_tensors
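The masked language model head above gathers the encoder states at the masked positions and projects them back onto the vocabulary with the transposed word embedding matrix. The NumPy sketch below isolates those two steps. It deliberately omits the intermediate `fc` transform and layer norm, and the name `mlm_logits` and the toy shapes are assumptions made for illustration.

```python
import numpy as np


def mlm_logits(enc_out, mask_pos, word_emb, out_bias):
    """Gather masked positions and score them against tied embeddings.

    enc_out:  [batch, seq_len, hidden] encoder output
    mask_pos: [num_masked, 1] indices into the flattened batch * seq_len axis
    word_emb: [vocab, hidden] word embedding table (shared with the input side)
    out_bias: [vocab] separate output bias
    """
    hidden = enc_out.shape[-1]
    flat = enc_out.reshape(-1, hidden)           # [batch * seq_len, hidden]
    mask_feat = flat[mask_pos.reshape(-1)]       # [num_masked, hidden]
    return mask_feat.dot(word_emb.T) + out_bias  # [num_masked, vocab]


rng = np.random.RandomState(0)
toy_enc = rng.randn(2, 3, 4)        # batch=2, seq_len=3, hidden=4
toy_emb = rng.randn(5, 4)           # vocab=5
toy_pos = np.array([[1], [4]])      # example 0 token 1, example 1 token 1
toy_logits = mlm_logits(toy_enc, toy_pos, toy_emb, np.zeros(5))
```

Because positions index the flattened tensor, the reader has to offset each masked index by `example_index * seq_len`, which is why index 4 selects token 1 of the second example here.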
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
def compute_loss(output_tensors, args=None):
    """Compute loss for mrc model"""
    def _compute_single_loss(logits, positions):
        """Compute start/end loss for mrc model"""
        loss = fluid.layers.softmax_with_cross_entropy(
            logits=logits, label=positions)
        loss = fluid.layers.mean(x=loss)
        return loss
    start_logits = output_tensors['start_logits']
    end_logits = output_tensors['end_logits']
    start_positions = output_tensors['start_positions']
    end_positions = output_tensors['end_positions']
    start_loss = _compute_single_loss(start_logits, start_positions)
    end_loss = _compute_single_loss(end_logits, end_positions)
    total_loss = (start_loss + end_loss) / 2.0
    if args.use_fp16 and args.loss_scaling > 1.0:
        total_loss = total_loss * args.loss_scaling
    return total_loss

def compute_distill_loss(output_tensors, args=None):
    """Compute distillation loss for mrc model (student vs. teacher logits)"""
    start_logits = output_tensors['start_logits']
    end_logits = output_tensors['end_logits']
    start_logits_truth = output_tensors['start_logits_truth']
    end_logits_truth = output_tensors['end_logits_truth']
    input_mask = output_tensors['input_mask']
    def _mask(logits, input_mask, nan=1e5):
        # push logits of padded positions far below the real ones
        input_mask = fluid.layers.reshape(input_mask, [-1, 512])
        logits = logits - (1.0 - input_mask) * nan
        return logits
    start_logits = _mask(start_logits, input_mask)
    end_logits = _mask(end_logits, input_mask)
    start_logits_truth = _mask(start_logits_truth, input_mask)
    end_logits_truth = _mask(end_logits_truth, input_mask)
    start_logits_truth = fluid.layers.reshape(start_logits_truth, [-1, 512])
    end_logits_truth = fluid.layers.reshape(end_logits_truth, [-1, 512])
    # softmax temperature
    T = 1.0
    start_logits_softmax = fluid.layers.softmax(input=start_logits/T)
    end_logits_softmax = fluid.layers.softmax(input=end_logits/T)
    start_logits_truth_softmax = fluid.layers.softmax(input=start_logits_truth/T)
    end_logits_truth_softmax = fluid.layers.softmax(input=end_logits_truth/T)
    # teacher distributions are fixed targets
    start_logits_truth_softmax.stop_gradient = True
    end_logits_truth_softmax.stop_gradient = True
    start_loss = fluid.layers.cross_entropy(start_logits_softmax, start_logits_truth_softmax, soft_label=True)
    end_loss = fluid.layers.cross_entropy(end_logits_softmax, end_logits_truth_softmax, soft_label=True)
    start_loss = fluid.layers.mean(x=start_loss)
    end_loss = fluid.layers.mean(x=end_loss)
    total_loss = (start_loss + end_loss) / 2.0
    return total_loss
def create_model(reader_input, base_model=None, is_training=True, args=None):
    """
    given the base model, reader_input
    return the output tensors
    """
    if is_training:
        if args.do_distill:
            src_ids, pos_ids, sent_ids, input_mask, \
                start_logits_truth, end_logits_truth, start_positions, end_positions = reader_input
        else:
            src_ids, pos_ids, sent_ids, input_mask, \
                start_positions, end_positions = reader_input
    else:
        src_ids, pos_ids, sent_ids, input_mask, unique_id = reader_input
    enc_out = base_model.get_output("sequence_output")
    logits = fluid.layers.fc(
        input=enc_out,
        size=2,
        num_flatten_dims=2,
        param_attr=fluid.ParamAttr(
            name="cls_squad_out_w",
            initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
        bias_attr=fluid.ParamAttr(
            name="cls_squad_out_b", initializer=fluid.initializer.Constant(0.)))
    logits = fluid.layers.transpose(x=logits, perm=[2, 0, 1])
    start_logits, end_logits = fluid.layers.unstack(x=logits, axis=0)
    batch_ones = fluid.layers.fill_constant_batch_size_like(
        input=start_logits, dtype='int64', shape=[1], value=1)
    num_seqs = fluid.layers.reduce_sum(input=batch_ones)
    output_tensors = {}
    output_tensors['start_logits'] = start_logits
    output_tensors['end_logits'] = end_logits
    output_tensors['num_seqs'] = num_seqs
    output_tensors['input_mask'] = input_mask
    if is_training:
        output_tensors['start_positions'] = start_positions
        output_tensors['end_positions'] = end_positions
        if args.do_distill:
            output_tensors['start_logits_truth'] = start_logits_truth
            output_tensors['end_logits_truth'] = end_logits_truth
    else:
        output_tensors['unique_id'] = unique_id
        output_tensors['start_logits'] = start_logits
        output_tensors['end_logits'] = end_logits
    return output_tensors
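The distillation loss in `compute_distill_loss` is a soft-label cross-entropy between the student's and the teacher's masked, temperature-scaled softmax distributions. The NumPy sketch below restates that computation for a single set of logits (start or end). The names `softmax` and `soft_cross_entropy`, the small `eps` added for numerical safety, and the toy inputs are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def soft_cross_entropy(student_logits, teacher_logits, input_mask,
                       temperature=1.0, big=1e5):
    """Mean over the batch of -sum_i p_teacher[i] * log p_student[i].

    All arrays are [batch, seq_len]; input_mask is 1.0 for real tokens.
    """
    # Push padded positions far below real ones so they get ~zero probability.
    student = student_logits - (1.0 - input_mask) * big
    teacher = teacher_logits - (1.0 - input_mask) * big
    p_student = softmax(student / temperature)
    p_teacher = softmax(teacher / temperature)  # treated as a fixed target
    eps = 1e-12  # guards log(0) at padded positions
    return float(np.mean(-np.sum(p_teacher * np.log(p_student + eps), axis=-1)))


toy_mask = np.array([[1.0, 1.0, 0.0]])
toy_teacher = np.array([[2.0, 0.0, 5.0]])
loss_match = soft_cross_entropy(toy_teacher, toy_teacher, toy_mask)
loss_mismatch = soft_cross_entropy(np.array([[0.0, 2.0, 0.0]]), toy_teacher, toy_mask)
```

The loss is smallest when the student reproduces the teacher's distribution, and the logit at the padded third position has no influence on it.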
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import random
import numpy as np
import paddle
import paddle.fluid as fluid
from utils.placeholder import Placeholder
def repeat(reader):
    """Repeat a generator forever"""
    generator = reader()
    while True:
        try:
            yield next(generator)
        except StopIteration:
            generator = reader()
            yield next(generator)
def create_joint_generator(input_shape, generators, do_distill, is_multi_task=True):
    def empty_output(input_shape, batch_size=1):
        results = []
        for i in range(len(input_shape)):
            if input_shape[i][1] == 'int32':
                dtype = np.int32
            if input_shape[i][1] == 'int64':
                dtype = np.int64
            if input_shape[i][1] == 'float32':
                dtype = np.float32
            if input_shape[i][1] == 'float64':
                dtype = np.float64
            shape = input_shape[i][0]
            shape[0] = batch_size
            pad_tensor = np.zeros(shape=shape, dtype=dtype)
            results.append(pad_tensor)
        return results

    def wrapper():
        """wrapper data"""
        generators_inst = [repeat(gen[0]) for gen in generators]
        generators_ratio = [gen[1] for gen in generators]
        # float() avoids integer division under Python 2 when ratios are ints
        weights = [ratio/float(sum(generators_ratio)) for ratio in generators_ratio]
        run_task_id = range(len(generators))
        while True:
            idx = np.random.choice(run_task_id, p=weights)
            gen_results = next(generators_inst[idx])
            if not gen_results:
                break
            batch_size = gen_results[0].shape[0]
            results = empty_output(input_shape, batch_size)
            task_id_tensor = np.array([[idx]]).astype("int64")
            results[0] = task_id_tensor
            for i in range(4):
                results[i+1] = gen_results[i]
            if do_distill:
                if idx == 0:
                    results[5] = gen_results[4]
                    results[6] = gen_results[5]
                    results[7] = gen_results[6]
                    results[8] = gen_results[7]
                else:
                    results[9] = gen_results[4]
                    results[10] = gen_results[5]
            else:
                if idx == 0:
                    # mrc batch
                    results[5] = gen_results[4]
                    results[6] = gen_results[5]
                elif idx == 1:
                    # mlm batch
                    results[7] = gen_results[4]
                    results[8] = gen_results[5]
            # idx stands for the task index
            yield results
    return wrapper
def create_reader(reader_name, input_shape, is_multi_task, do_distill, *gens):
    """
    build reader for multi_task_learning
    """
    placeholder = Placeholder(input_shape)
    pyreader, model_inputs = placeholder.build(capacity=100, reader_name=reader_name)
    joint_generator = create_joint_generator(input_shape, gens[0], do_distill, is_multi_task=is_multi_task)
    return joint_generator, pyreader, model_inputs
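The joint generator above picks, at every step, which task supplies the next batch by sampling a task index with probability proportional to its mixing ratio. The sketch below isolates that sampling loop without the PaddlePaddle placeholders. `joint_batches`, the seeded `RandomState` and the toy string batches are assumptions for illustration, and `repeat` is simplified to a plain loop.

```python
import numpy as np


def repeat(reader):
    """Restart the reader whenever it is exhausted."""
    while True:
        produced = False
        for batch in reader():
            produced = True
            yield batch
        if not produced:  # an empty reader would otherwise loop forever
            return


def joint_batches(tasks, num_steps, seed=0):
    """tasks: list of (reader_fn, ratio). Yields (task_index, batch)."""
    rng = np.random.RandomState(seed)
    streams = [repeat(reader) for reader, _ in tasks]
    ratios = np.array([ratio for _, ratio in tasks], dtype="float64")
    weights = ratios / ratios.sum()  # normalize ratios into probabilities
    for _ in range(num_steps):
        idx = int(rng.choice(len(tasks), p=weights))
        yield idx, next(streams[idx])


# Hypothetical readers: an MRC task sampled three times as often as MLM.
toy_tasks = [
    (lambda: iter(["mrc-0", "mrc-1"]), 3),
    (lambda: iter(["mlm-0"]), 1),
]
toy_schedule = list(joint_batches(toy_tasks, num_steps=200, seed=0))
```

Sampling per step, rather than concatenating the datasets, keeps the task mix roughly constant throughout training regardless of how large each dataset is.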
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from __future__ import division
import os
import re
import six
import gzip
import types
import logging
import numpy as np
import collections
import paddle
import paddle.fluid as fluid
from utils import tokenization
from utils.batching import prepare_batch_data
class DataReader(object):
    def __init__(self,
                 data_dir,
                 vocab_path,
                 batch_size=4096,
                 in_tokens=True,
                 max_seq_len=512,
                 shuffle_files=True,
                 epoch=100,
                 voc_size=0,
                 is_test=False,
                 generate_neg_sample=False):
        self.vocab = self.load_vocab(vocab_path)
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.in_tokens = in_tokens
        self.shuffle_files = shuffle_files
        self.epoch = epoch
        self.current_epoch = 0
        self.current_file_index = 0
        self.total_file = 0
        self.current_file = None
        self.voc_size = voc_size
        self.max_seq_len = max_seq_len
        self.pad_id = self.vocab["[PAD]"]
        self.cls_id = self.vocab["[CLS]"]
        self.sep_id = self.vocab["[SEP]"]
        self.mask_id = self.vocab["[MASK]"]
        self.is_test = is_test
        self.generate_neg_sample = generate_neg_sample
        if self.in_tokens:
            assert self.batch_size >= self.max_seq_len, "The number of " \
                "tokens in batch should not be smaller than max seq length."
        if self.is_test:
            self.epoch = 1
            self.shuffle_files = False

    def get_progress(self):
        """return current progress of training data
        """
        return self.current_epoch, self.current_file_index, self.total_file, self.current_file
    def parse_line(self, line, max_seq_len=512):
        """ parse one line to token_ids, sentence_ids, pos_ids, label
        """
        line = line.strip().decode().split(";")
        assert len(line) == 4, "One sample must have 4 fields!"
        (token_ids, sent_ids, pos_ids, label) = line
        token_ids = [int(token) for token in token_ids.split(" ")]
        sent_ids = [int(token) for token in sent_ids.split(" ")]
        pos_ids = [int(token) for token in pos_ids.split(" ")]
        assert len(token_ids) == len(sent_ids) == len(
            pos_ids
        ), "[Must be true]len(token_ids) == len(sent_ids) == len(pos_ids)"
        label = int(label)
        if len(token_ids) > max_seq_len:
            return None
        return [token_ids, sent_ids, pos_ids, label]

    def read_file(self, file):
        assert file.endswith('.gz'), "[ERROR] %s is not a gzip file" % file
        file_path = self.data_dir + "/" + file
        with gzip.open(file_path, "rb") as f:
            for line in f:
                parsed_line = self.parse_line(
                    line, max_seq_len=self.max_seq_len)
                if parsed_line is None:
                    continue
                yield parsed_line
    def convert_to_unicode(self, text):
        """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
        if six.PY3:
            if isinstance(text, str):
                return text
            elif isinstance(text, bytes):
                return text.decode("utf-8", "ignore")
            else:
                raise ValueError("Unsupported string type: %s" % (type(text)))
        elif six.PY2:
            if isinstance(text, str):
                return text.decode("utf-8", "ignore")
            elif isinstance(text, unicode):
                return text
            else:
                raise ValueError("Unsupported string type: %s" % (type(text)))
        else:
            raise ValueError("Not running on Python2 or Python 3?")

    def load_vocab(self, vocab_file):
        """Loads a vocabulary file into a dictionary."""
        vocab = collections.OrderedDict()
        # the context manager closes the file once the vocabulary is read
        with open(vocab_file) as fin:
            for num, line in enumerate(fin):
                items = self.convert_to_unicode(line.strip()).split("\t")
                if len(items) > 2:
                    break
                token = items[0]
                index = items[1] if len(items) == 2 else num
                token = token.strip()
                vocab[token] = int(index)
        return vocab
def random_pair_neg_samples(self, pos_samples):
""" randomly generate negtive samples using pos_samples
Args:
pos_samples: list of positive samples
Returns:
neg_samples: list of negtive samples
"""
np.random.shuffle(pos_samples)
num_sample = len(pos_samples)
neg_samples = []
miss_num = 0
for i in range(num_sample):
pair_index = (i + 1) % num_sample
origin_src_ids = pos_samples[i][0]
origin_sep_index = origin_src_ids.index(2)
pair_src_ids = pos_samples[pair_index][0]
pair_sep_index = pair_src_ids.index(2)
src_ids = origin_src_ids[:origin_sep_index + 1] + pair_src_ids[
pair_sep_index + 1:]
if len(src_ids) >= self.max_seq_len:
miss_num += 1
continue
sent_ids = [0] * len(origin_src_ids[:origin_sep_index + 1]) + [
1
] * len(pair_src_ids[pair_sep_index + 1:])
pos_ids = list(range(len(src_ids)))
neg_sample = [src_ids, sent_ids, pos_ids, 0]
assert len(src_ids) == len(sent_ids) == len(
pos_ids
), "[ERROR] len(src_ids) == len(sent_ids) == len(pos_ids) must be True"
neg_samples.append(neg_sample)
return neg_samples, miss_num
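The pairing logic in `random_pair_neg_samples` can be sketched as a standalone function. This is a minimal illustration, not the repository's code: the name `pair_negative_samples`, the `rng` parameter and the constant `SEP_ID` are hypothetical, and the separator id 2 is assumed from the `.index(2)` lookups above.

```python
import random

SEP_ID = 2  # assumed separator token id, taken from the `.index(2)` calls above


def pair_negative_samples(pos_samples, max_seq_len, rng=random):
    """Build one negative per positive by swapping in a neighbour's second segment."""
    samples = list(pos_samples)
    rng.shuffle(samples)
    negatives, skipped = [], 0
    for i, sample in enumerate(samples):
        first = sample[0]
        other = samples[(i + 1) % len(samples)][0]
        # Sentence 1 of this sample, up to and including its first separator.
        head = first[:first.index(SEP_ID) + 1]
        # Sentence 2 of the neighbouring sample, after its first separator.
        tail = other[other.index(SEP_ID) + 1:]
        src_ids = head + tail
        if len(src_ids) >= max_seq_len:
            skipped += 1  # too long: counted as a miss, as in the original
            continue
        sent_ids = [0] * len(head) + [1] * len(tail)
        pos_ids = list(range(len(src_ids)))
        negatives.append([src_ids, sent_ids, pos_ids, 0])  # label 0 = negative
    return negatives, skipped
```

Because each sample is paired with its successor in a shuffled list, every positive contributes exactly one candidate negative, which is why the reader reports `pos_sample_num * 2` as the ideal total.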
def mixin_negtive_samples(self, pos_sample_generator, buffer=1000):
""" 1. generate negative samples by randomly grouping sentence_1 and sentence_2 of positive samples
2. combine negative samples and positive samples
Args:
pos_sample_generator: a generator producing a parsed positive sample, which is a list: [token_ids, sent_ids, pos_ids, 1]
Returns:
sample: one sample from shuffled positive samples and negative samples
"""
pos_samples = []
num_total_miss = 0
pos_sample_num = 0
try:
while True:
while len(pos_samples) < buffer:
pos_sample = next(pos_sample_generator)
label = pos_sample[3]
assert label == 1, "positive sample's label must be 1"
pos_samples.append(pos_sample)
pos_sample_num += 1
neg_samples, miss_num = self.random_pair_neg_samples(
pos_samples)
num_total_miss += miss_num
samples = pos_samples + neg_samples
pos_samples = []
np.random.shuffle(samples)
for sample in samples:
yield sample
except StopIteration:
print("stopiteration: reach end of file")
if len(pos_samples) == 1:
yield pos_samples[0]
elif len(pos_samples) == 0:
yield None
else:
neg_samples, miss_num = self.random_pair_neg_samples(
pos_samples)
num_total_miss += miss_num
samples = pos_samples + neg_samples
pos_samples = []
np.random.shuffle(samples)
for sample in samples:
yield sample
print("miss_num:%d\tideal_total_sample_num:%d\tmiss_rate:%f" %
(num_total_miss, pos_sample_num * 2,
num_total_miss / (pos_sample_num * 2)))
def data_generator(self):
"""
data_generator
"""
files = os.listdir(self.data_dir)
self.total_file = len(files)
assert self.total_file > 0, "[Error] data_dir is empty"
def wrapper():
def reader():
for epoch in range(self.epoch):
self.current_epoch = epoch + 1
if self.shuffle_files:
np.random.shuffle(files)
for index, file in enumerate(files):
self.current_file_index = index + 1
self.current_file = file
sample_generator = self.read_file(file)
if not self.is_test and self.generate_neg_sample:
sample_generator = self.mixin_negtive_samples(
sample_generator)
for sample in sample_generator:
if sample is None:
continue
yield sample
def batch_reader(reader, batch_size, in_tokens):
batch, total_token_num, max_len = [], 0, 0
for parsed_line in reader():
token_ids, sent_ids, pos_ids, label = parsed_line
max_len = max(max_len, len(token_ids))
if in_tokens:
to_append = (len(batch) + 1) * max_len <= batch_size
else:
to_append = len(batch) < batch_size
if to_append:
batch.append(parsed_line)
total_token_num += len(token_ids)
else:
yield batch, total_token_num
batch, total_token_num, max_len = [parsed_line], len(
token_ids), len(token_ids)
if len(batch) > 0:
yield batch, total_token_num
for batch_data, total_token_num in batch_reader(
reader, self.batch_size, self.in_tokens):
yield prepare_batch_data(
batch_data,
total_token_num,
voc_size=self.voc_size,
pad_id=self.pad_id,
cls_id=self.cls_id,
sep_id=self.sep_id,
mask_id=self.mask_id,
max_len=self.max_seq_len,
return_input_mask=True,
return_max_len=False,
return_num_token=False)
return wrapper
if __name__ == "__main__":
pass
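The `batch_reader` inside `data_generator` supports two batch-size interpretations: a count of examples, or a padded-token budget when `in_tokens` is set. The sketch below isolates that rule under simplifying assumptions: `batch_by_budget` is a hypothetical name, each sample is any sequence whose first element is its `token_ids`, and an extra guard (not in the original) avoids yielding an empty batch when a single sample exceeds the budget.

```python
def batch_by_budget(samples, batch_size, in_tokens):
    """Group samples by example count or by padded-token budget."""
    batch, max_len = [], 0
    for sample in samples:
        token_ids = sample[0]
        max_len = max(max_len, len(token_ids))
        if in_tokens:
            # Cost of the batch after padding every sequence to max_len.
            fits = (len(batch) + 1) * max_len <= batch_size
        else:
            fits = len(batch) < batch_size
        if fits:
            batch.append(sample)
        else:
            if batch:  # guard added in this sketch only
                yield batch
            # The sample that did not fit opens the next batch.
            batch, max_len = [sample], len(token_ids)
    if batch:
        yield batch
```

With `in_tokens` enabled, the limit applies to the padded size `(number of samples) * (longest sequence)`, so one long sequence reduces how many short ones fit alongside it.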
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Run MRQA"""
import six
import math
import json
import random
import collections
import numpy as np
from utils import tokenization
from utils.batching import prepare_batch_data
class DataProcessorDistill(object):
def __init__(self):
self.num_examples = -1
self.current_train_example = -1
self.current_train_epoch = -1
def get_features(self, data_path):
with open(data_path, 'r') as fr:
for line in fr:
yield line.strip()
def data_generator(self,
data_file,
batch_size,
max_len,
in_tokens,
dev_count,
epochs,
shuffle):
self.num_examples = len([ "" for line in open(data_file,"r")])
def batch_reader(data_file, in_tokens, batch_size):
batch = []
index = 0
for feature in self.get_features(data_file):
to_append = len(batch) < batch_size
if to_append:
batch.append(feature)
else:
yield batch
batch = [feature]  # carry the current feature into the next batch instead of dropping it
if len(batch) > 0:
yield batch
def wrapper():
for epoch in range(epochs):
all_batches = []
for batch_data in batch_reader(data_file, in_tokens, batch_size):
batch_data_segment = []
for feature in batch_data:
data = json.loads(feature.strip())
example_index = data['example_index']
unique_id = data['unique_id']
input_ids = data['input_ids']
position_ids = data['position_ids']
input_mask = data['input_mask']
segment_ids = data['segment_ids']
start_position = data['start_position']
end_position = data['end_position']
start_logits = data['start_logits']
end_logits = data['end_logits']
instance = [input_ids, position_ids, segment_ids, input_mask, start_logits, end_logits, start_position, end_position]
batch_data_segment.append(instance)
batch_data = batch_data_segment
src_ids = [inst[0] for inst in batch_data]
pos_ids = [inst[1] for inst in batch_data]
sent_ids = [inst[2] for inst in batch_data]
input_mask = [inst[3] for inst in batch_data]
start_logits = [inst[4] for inst in batch_data]
end_logits = [inst[5] for inst in batch_data]
src_ids = np.array(src_ids).astype("int64").reshape([-1, max_len, 1])
pos_ids = np.array(pos_ids).astype("int64").reshape([-1, max_len, 1])
sent_ids = np.array(sent_ids).astype("int64").reshape([-1, max_len, 1])
input_mask = np.array(input_mask).astype("float32").reshape([-1, max_len, 1])
start_logits = np.array(start_logits).astype("float32").reshape([-1, max_len])
end_logits = np.array(end_logits).astype("float32").reshape([-1, max_len])
start_positions = [inst[6] for inst in batch_data]
end_positions = [inst[7] for inst in batch_data]
start_positions = np.array(start_positions).astype("int64").reshape([-1, 1])
end_positions = np.array(end_positions).astype("int64").reshape([-1, 1])
batch_data = [src_ids, pos_ids, sent_ids, input_mask, start_logits, end_logits, start_positions, end_positions]
if len(all_batches) < dev_count:
all_batches.append(batch_data)
if len(all_batches) == dev_count:
for batch in all_batches:
yield batch
all_batches = []
return wrapper
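The distillation reader does two things: it cuts the feature stream into fixed-size batches, then releases batches only in groups of `dev_count`, so that every device gets one batch per step. A minimal sketch of both stages follows. The helper names `fixed_size_batches` and `groups_for_devices` are hypothetical, and the first helper is written so that no item is lost at a batch boundary.

```python
def fixed_size_batches(items, batch_size):
    """Yield lists of up to batch_size items without dropping any item."""
    batch = []
    for item in items:
        if len(batch) == batch_size:
            yield batch
            batch = []
        batch.append(item)
    if batch:
        yield batch


def groups_for_devices(batches, dev_count):
    """Release batches only once a full group of dev_count is available."""
    group = []
    for batch in batches:
        group.append(batch)
        if len(group) == dev_count:
            for ready in group:
                yield ready
            group = []
    # An incomplete trailing group is discarded, as in the reader above.
```

Discarding the trailing group means that up to `dev_count - 1` batches per epoch are never trained on. This is a trade-off that keeps every data-parallel step fully populated.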
#!/bin/bash
export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
if [ ! "$CUDA_VISIBLE_DEVICES" ]
then
export CPU_NUM=1
use_cuda=false
else
use_cuda=true
fi
# path of pre_train model
INPUT_PATH="data/input"
PRETRAIN_MODEL_PATH="data/pretrain_model/squad2_model"
# path to save checkpoint
CHECKPOINT_PATH="data/output/output_mrqa"
mkdir -p $CHECKPOINT_PATH
python -u train.py --use_cuda ${use_cuda} \
--batch_size 8 \
--in_tokens false \
--init_pretraining_params ${PRETRAIN_MODEL_PATH}/params \
--checkpoints $CHECKPOINT_PATH \
--vocab_path ${PRETRAIN_MODEL_PATH}/vocab.txt \
--do_distill true \
--do_train true \
--do_predict true \
--save_steps 10000 \
--warmup_proportion 0.1 \
--weight_decay 0.01 \
--sample_rate 0.02 \
--epoch 2 \
--max_seq_len 512 \
--bert_config_path ${PRETRAIN_MODEL_PATH}/bert_config.json \
--predict_file ${INPUT_PATH}/mrqa_distill_data/mrqa-combined.all_dev.raw.json \
--do_lower_case false \
--doc_stride 128 \
--train_file ${INPUT_PATH}/mrqa_distill_data/mrqa_distill.json \
--mlm_path ${INPUT_PATH}/mlm_data \
--mix_ratio 2.0 \
--learning_rate 3e-5 \
--lr_scheduler linear_warmup_decay \
--skip_steps 100
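Several of these flags (`--epoch`, `--batch_size`, `--mix_ratio`, `--warmup_proportion`) determine the step budget that `train.py` computes before training. The sketch below reproduces that arithmetic as a plain function. The name `training_schedule` is hypothetical, and the example count of 160000 in the usage note is an illustrative assumption, not the size of the released data.

```python
def training_schedule(num_examples, epochs, batch_size, dev_count,
                      mix_ratio, warmup_proportion,
                      in_tokens=False, max_seq_len=512):
    """Return (max_train_steps, warmup_steps) following train.py's formula."""
    # With in_tokens the batch size is a token budget, so convert it to examples.
    examples_per_batch = batch_size // max_seq_len if in_tokens else batch_size
    mrc_steps = epochs * num_examples // examples_per_batch // dev_count
    # MLM batches are mixed in at mix_ratio per MRC batch, which stretches the schedule.
    max_train_steps = int(mrc_steps * (1 + mix_ratio))
    warmup_steps = int(max_train_steps * warmup_proportion)
    return max_train_steps, warmup_steps
```

For example, with the script's settings on 8 GPUs and an assumed 160000 examples, `training_schedule(160000, 2, 8, 8, 2.0, 0.1)` gives 15000 total steps, of which 1500 are warmup.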
#!/usr/bin/env bash
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# path of dev data
PATH_dev=./data/input/mrqa_evaluation_dataset
# path of dev predict
KD_prediction=./prediction_results/KD_ema_predictions.json
files=$(ls ./prediction_results/*.log 2> /dev/null | wc -l)
if [ "$files" != "0" ];
then
rm prediction_results/*.log
fi
# evaluation KD model
echo "evaluate knowledge distillation model........................................."
for dataset in `ls $PATH_dev/in_domain_dev/*.raw.json`;do
echo $dataset >> prediction_results/KD.log
python ../multi_task_learning/scripts/evaluate-v1.1.py $dataset $KD_prediction >> prediction_results/KD.log
done
for dataset in `ls $PATH_dev/out_of_domain_dev/*.raw.json`;do
echo $dataset >> prediction_results/KD.log
python ../multi_task_learning/scripts/evaluate-v1.1.py $dataset $KD_prediction >> prediction_results/KD.log
done
python ../multi_task_learning/scripts/macro_avg.py prediction_results/KD.log
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
import argparse
import collections
import numpy as np
import multiprocessing
import paddle
import paddle.fluid as fluid
from utils.placeholder import Placeholder
from utils.init import init_pretraining_params, init_checkpoint
from utils.configure import ArgumentGroup, print_arguments, JsonConfig
from model import mlm_net
from model import mrqa_net
from optimizer.optimization import optimization
from model.bert_model import ModelBERT
from reader.mrqa_reader import DataProcessor, write_predictions
from reader.mrqa_distill_reader import DataProcessorDistill
from reader.mlm_reader import DataReader
from reader.joint_reader import create_reader
parser = argparse.ArgumentParser(__doc__)
model_g = ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("bert_config_path", str, None, "Path to the json file for bert model config.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("init_pretraining_params", str, None,
"Init pre-training params to fine-tune from. If the "
"arg 'init_checkpoint' has been set, this argument is ignored.")
model_g.add_arg("checkpoints", str, "checkpoints", "Path to save checkpoints.")
train_g = ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("epoch", int, 3, "Number of epochs for fine-tuning.")
train_g.add_arg("learning_rate", float, 5e-5, "Learning rate used to train with warmup.")
train_g.add_arg("lr_scheduler", str, "linear_warmup_decay",
"scheduler of learning rate.", choices=['linear_warmup_decay', 'noam_decay'])
train_g.add_arg("weight_decay", float, 0.01, "Weight decay rate for L2 regularizer.")
train_g.add_arg("use_ema", bool, True, "Whether to use ema.")
train_g.add_arg("ema_decay", float, 0.9999, "Decay rate for exponential moving average.")
train_g.add_arg("warmup_proportion", float, 0.1,
"Proportion of training steps to perform linear learning rate warmup for.")
train_g.add_arg("save_steps", int, 1000, "The steps interval to save checkpoints.")
train_g.add_arg("sample_rate", float, 0.02, "Sampling rate used to estimate the number of training examples.")
train_g.add_arg("use_fp16", bool, False, "Whether to use fp16 mixed precision training.")
train_g.add_arg("mix_ratio", float, 0.4, "batch mix ratio for masked language model task")
train_g.add_arg("loss_scaling", float, 1.0,
"Loss scaling factor for mixed precision training, only valid when use_fp16 is enabled.")
train_g.add_arg("do_distill", bool, False, "do distillation")
log_g = ArgumentGroup(parser, "logging", "logging related.")
log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
log_g.add_arg("verbose", bool, False, "Whether to output verbose log.")
data_g = ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options")
data_g.add_arg("train_file", str, None, "json data for training.")
data_g.add_arg("mlm_path", str, None, "data for masked language model training.")
data_g.add_arg("predict_file", str, None, "json data for predictions.")
data_g.add_arg("vocab_path", str, None, "Vocabulary path.")
data_g.add_arg("with_negative", bool, False,
"If true, the examples contain some that do not have an answer.")
data_g.add_arg("max_seq_len", int, 512, "Number of words of the longest sequence.")
data_g.add_arg("max_query_length", int, 64, "Max query length.")
data_g.add_arg("max_answer_length", int, 30, "Max answer length.")
data_g.add_arg("batch_size", int, 12,
"Total examples' number in batch for training. see also --in_tokens.")
data_g.add_arg("in_tokens", bool, False,
"If set, the batch size will be the maximum number of tokens in one batch. "
"Otherwise, it will be the maximum number of examples in one batch.")
data_g.add_arg("do_lower_case", bool, True,
"Whether to lower case the input text. Should be True for uncased models and False for cased models.")
data_g.add_arg("doc_stride", int, 128,
"When splitting up a long document into chunks, how much stride to take between chunks.")
data_g.add_arg("n_best_size", int, 20,
"The total number of n-best predictions to generate in the nbest_predictions.json output file.")
data_g.add_arg("null_score_diff_threshold", float, 0.0,
"If null_score - best_non_null is greater than the threshold predict null.")
data_g.add_arg("random_seed", int, 0, "Random seed.")
run_type_g = ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, True, "If set, use GPU for training.")
run_type_g.add_arg("use_fast_executor", bool, False,
"If set, use fast parallel executor (in experiment).")
run_type_g.add_arg("num_iteration_per_drop_scope", int, 1,
"The iteration intervals to clean up temporary variables.")
run_type_g.add_arg("do_train", bool, True, "Whether to perform training.")
run_type_g.add_arg("do_predict", bool, True, "Whether to perform prediction.")
args = parser.parse_args()
max_seq_len = args.max_seq_len
if args.do_distill:
input_shape = [
([1, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'), # src_ids
([-1, max_seq_len, 1], 'int64'), # pos_ids
([-1, max_seq_len, 1], 'int64'), # sent_ids
([-1, max_seq_len, 1], 'float32'), # input_mask
([-1, max_seq_len, 1], 'float32'), # start_logits_truth
([-1, max_seq_len, 1], 'float32'), # end_logits_truth
([-1, 1], 'int64'), # start label
([-1, 1], 'int64'), # end label
([-1, 1], 'int64'), # masked label
([-1, 1], 'int64')] # masked pos
else:
input_shape = [
([1, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'float32'),
([-1, 1], 'int64'), # start label
([-1, 1], 'int64'), # end label
([-1, 1], 'int64'), # masked label
([-1, 1], 'int64')] # masked pos
# yapf: enable.
RawResult = collections.namedtuple("RawResult",
["unique_id", "start_logits", "end_logits"])
def predict(test_exe, test_program, test_pyreader, fetch_list, processor, prefix=''):
if not os.path.exists(args.checkpoints):
os.makedirs(args.checkpoints)
output_prediction_file = os.path.join(args.checkpoints, prefix + "predictions.json")
output_nbest_file = os.path.join(args.checkpoints, prefix + "nbest_predictions.json")
output_null_log_odds_file = os.path.join(args.checkpoints, prefix + "null_odds.json")
test_pyreader.start()
all_results = []
time_begin = time.time()
while True:
try:
np_unique_ids, np_start_logits, np_end_logits, np_num_seqs = test_exe.run(
fetch_list=fetch_list, program=test_program)
for idx in range(np_unique_ids.shape[0]):
if np_unique_ids[idx] < 0:
continue
if len(all_results) % 1000 == 0:
print("Processing example: %d" % len(all_results))
unique_id = int(np_unique_ids[idx])
start_logits = [float(x) for x in np_start_logits[idx].flat]
end_logits = [float(x) for x in np_end_logits[idx].flat]
all_results.append(
RawResult(
unique_id=unique_id,
start_logits=start_logits,
end_logits=end_logits))
except fluid.core.EOFException:
test_pyreader.reset()
break
time_end = time.time()
features = processor.get_features(
processor.predict_examples, is_training=False)
write_predictions(processor.predict_examples, features, all_results,
args.n_best_size, args.max_answer_length,
args.do_lower_case, output_prediction_file,
output_nbest_file, output_null_log_odds_file,
args.with_negative,
args.null_score_diff_threshold, args.verbose)
def train(args):
if not (args.do_train or args.do_predict):
raise ValueError("For args `do_train` and `do_predict`, at "
"least one of them must be True.")
if args.use_cuda:
place = fluid.CUDAPlace(0)
dev_count = fluid.core.get_cuda_device_count()
else:
place = fluid.CPUPlace()
dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
exe = fluid.Executor(place)
startup_prog = fluid.default_startup_program()
if args.random_seed is not None:
startup_prog.random_seed = args.random_seed
if args.do_train:
if args.do_distill:
train_processor = DataProcessorDistill()
mrc_train_generator = train_processor.data_generator(
data_file=args.train_file,
batch_size=args.batch_size,
max_len=args.max_seq_len,
in_tokens=False,
dev_count=dev_count,
epochs=args.epoch,
shuffle=True)
else:
train_processor = DataProcessor(
vocab_path=args.vocab_path,
do_lower_case=args.do_lower_case,
max_seq_length=args.max_seq_len,
in_tokens=args.in_tokens,
doc_stride=args.doc_stride,
max_query_length=args.max_query_length)
mrc_train_generator = train_processor.data_generator(
data_path=args.train_file,
batch_size=args.batch_size,
max_len=args.max_seq_len,
phase='train',
shuffle=True,
dev_count=dev_count,
with_negative=args.with_negative,
epoch=args.epoch)
bert_conf = JsonConfig(args.bert_config_path)
data_reader = DataReader(
args.mlm_path,
vocab_path=args.vocab_path,
batch_size=args.batch_size,
in_tokens=args.in_tokens,
voc_size=bert_conf['vocab_size'],
shuffle_files=False,
epoch=args.epoch,
max_seq_len=args.max_seq_len,
is_test=False)
mlm_train_generator = data_reader.data_generator()
gens = [
(mrc_train_generator, 1.0),
(mlm_train_generator, args.mix_ratio)
]
# create joint pyreader
joint_generator, train_pyreader, model_inputs = \
create_reader("train_reader", input_shape, True, args.do_distill,
gens)
train_pyreader.decorate_tensor_provider(joint_generator)
task_id = model_inputs[0]
if args.do_distill:
bert_inputs = model_inputs[1:5]
mrc_inputs = model_inputs[1:9]
mlm_inputs = model_inputs[9:11]
else:
bert_inputs = model_inputs[1:5]
mrc_inputs = model_inputs[1:7]
mlm_inputs = model_inputs[7:9]
# create model
train_bert_model = ModelBERT(
conf={"bert_conf_file": args.bert_config_path},
is_training=True)
train_create_bert = train_bert_model.create_model(args, bert_inputs)
build_strategy = fluid.BuildStrategy()
if args.do_distill:
num_train_examples = train_processor.num_examples
print("runtime number of examples:")
print(num_train_examples)
else:
print("estimating runtime number of examples...")
num_train_examples = train_processor.estimate_runtime_examples(
args.train_file, sample_rate=args.sample_rate)
print("runtime number of examples:")
print(num_train_examples)
if args.in_tokens:
max_train_steps = args.epoch * num_train_examples // (
args.batch_size // args.max_seq_len) // dev_count
else:
max_train_steps = args.epoch * num_train_examples // (
args.batch_size) // dev_count
max_train_steps = int(max_train_steps * (1 + args.mix_ratio))
warmup_steps = int(max_train_steps * args.warmup_proportion)
print("Device count: %d" % dev_count)
print("Num train examples: %d" % num_train_examples)
print("Max train steps: %d" % max_train_steps)
print("Num warmup steps: %d" % warmup_steps)
train_program = fluid.default_main_program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
train_create_bert()
mlm_output_tensors = mlm_net.create_model(
mlm_inputs, base_model=train_bert_model, is_training=True, args=args
)
mrc_output_tensors = mrqa_net.create_model(
mrc_inputs, base_model=train_bert_model, is_training=True, args=args
)
task_one_hot = fluid.layers.one_hot(task_id, 2)
mrc_loss = mrqa_net.compute_loss(mrc_output_tensors, args)
if args.do_distill:
distill_loss = mrqa_net.compute_distill_loss(mrc_output_tensors, args)
mrc_loss = mrc_loss + distill_loss
num_seqs = mrc_output_tensors['num_seqs']
mlm_loss = mlm_net.compute_loss(mlm_output_tensors)
num_seqs = mlm_output_tensors['num_seqs']
all_loss = fluid.layers.concat([mrc_loss, mlm_loss], axis=0)
loss = fluid.layers.reduce_sum(task_one_hot * all_loss)
scheduled_lr = optimization(
loss=loss,
warmup_steps=warmup_steps,
num_train_steps=max_train_steps,
learning_rate=args.learning_rate,
train_program=train_program,
startup_prog=startup_prog,
weight_decay=args.weight_decay,
scheduler=args.lr_scheduler,
use_fp16=args.use_fp16,
loss_scaling=args.loss_scaling)
loss.persistable = True
num_seqs.persistable = True
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
ema.update()
train_compiled_program = fluid.CompiledProgram(train_program).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
if args.verbose:
if args.in_tokens:
lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program,
batch_size=args.batch_size // args.max_seq_len)
else:
lower_mem, upper_mem, unit = fluid.contrib.memory_usage(
program=train_program, batch_size=args.batch_size)
print("Theoretical memory usage in training: %.3f - %.3f %s" %
(lower_mem, upper_mem, unit))
if args.do_predict:
predict_processor = DataProcessor(
vocab_path=args.vocab_path,
do_lower_case=args.do_lower_case,
max_seq_length=args.max_seq_len,
in_tokens=args.in_tokens,
doc_stride=args.doc_stride,
max_query_length=args.max_query_length)
mrc_test_generator = predict_processor.data_generator(
data_path=args.predict_file,
batch_size=args.batch_size,
max_len=args.max_seq_len,
phase='predict',
shuffle=False,
dev_count=dev_count,
epoch=1)
test_input_shape = [
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'int64'),
([-1, max_seq_len, 1], 'float32'),
([-1, 1], 'int64')]
build_strategy = fluid.BuildStrategy()
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
placeholder = Placeholder(test_input_shape)
test_pyreader, model_inputs = placeholder.build(
capacity=100, reader_name="test_reader")
test_pyreader.decorate_tensor_provider(mrc_test_generator)
# create model
bert_inputs = model_inputs[0:4]
mrc_inputs = model_inputs
test_bert_model = ModelBERT(
conf={"bert_conf_file": args.bert_config_path},
is_training=False)
test_create_bert = test_bert_model.create_model(args, bert_inputs)
test_create_bert()
mrc_output_tensors = mrqa_net.create_model(
mrc_inputs, base_model=test_bert_model, is_training=False, args=args
)
unique_ids = mrc_output_tensors['unique_id']
start_logits = mrc_output_tensors['start_logits']
end_logits = mrc_output_tensors['end_logits']
num_seqs = mrc_output_tensors['num_seqs']
if 'ema' not in dir():
ema = fluid.optimizer.ExponentialMovingAverage(args.ema_decay)
unique_ids.persistable = True
start_logits.persistable = True
end_logits.persistable = True
num_seqs.persistable = True
test_prog = test_prog.clone(for_test=True)
test_compiled_program = fluid.CompiledProgram(test_prog).with_data_parallel(
build_strategy=build_strategy)
exe.run(startup_prog)
if args.do_train:
if args.init_checkpoint and args.init_pretraining_params:
print(
"WARNING: args 'init_checkpoint' and 'init_pretraining_params' "
"both are set! Only arg 'init_checkpoint' is made valid.")
if args.init_checkpoint:
init_checkpoint(
exe,
args.init_checkpoint,
main_program=startup_prog,
use_fp16=args.use_fp16)
elif args.init_pretraining_params:
init_pretraining_params(
exe,
args.init_pretraining_params,
main_program=startup_prog,
use_fp16=args.use_fp16)
elif args.do_predict:
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if "
"only doing prediction!")
init_checkpoint(
exe,
args.init_checkpoint,
main_program=startup_prog,
use_fp16=args.use_fp16)
if args.do_train:
train_pyreader.start()
steps = 0
total_cost, total_num_seqs = [], []
time_begin = time.time()
while True:
try:
steps += 1
if steps % args.skip_steps == 0:
if warmup_steps <= 0:
fetch_list = [loss.name, num_seqs.name]
else:
fetch_list = [
loss.name, scheduled_lr.name, num_seqs.name
]
else:
fetch_list = []
outputs = exe.run(train_compiled_program, fetch_list=fetch_list)
if steps % args.skip_steps == 0:
if warmup_steps <= 0:
np_loss, np_num_seqs = outputs
else:
np_loss, np_lr, np_num_seqs = outputs
total_cost.extend(np_loss * np_num_seqs)
total_num_seqs.extend(np_num_seqs)
if args.verbose:
verbose = "train pyreader queue size: %d, " % train_pyreader.queue.size(
)
verbose += "learning rate: %f" % (
np_lr[0]
if warmup_steps > 0 else args.learning_rate)
print(verbose)
time_end = time.time()
used_time = time_end - time_begin
print("progress: %d/%d, step: %d, loss: %f" % (steps, max_train_steps, steps, np.sum(total_cost) / np.sum(total_num_seqs)))
total_cost, total_num_seqs = [], []
time_begin = time.time()
if steps % args.save_steps == 0:
save_path = os.path.join(args.checkpoints,
"step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps == max_train_steps:
save_path = os.path.join(args.checkpoints,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
break
except paddle.fluid.core.EOFException as err:
save_path = os.path.join(args.checkpoints,
"step_" + str(steps) + "_final")
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
if args.do_predict:
if args.use_ema:
with ema.apply(exe):
predict(exe, test_compiled_program, test_pyreader, [
unique_ids.name, start_logits.name, end_logits.name, num_seqs.name
], predict_processor, prefix='ema_')
else:
predict(exe, test_compiled_program, test_pyreader, [
unique_ids.name, start_logits.name, end_logits.name, num_seqs.name
], predict_processor)
if __name__ == '__main__':
print_arguments(args)
train(args)
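In `train`, each batch carries a `task_id`, and the optimized loss is `reduce_sum(one_hot(task_id, 2) * concat([mrc_loss, mlm_loss]))`. This selects exactly one task's loss per step. The plain-Python sketch below shows the same selection. The function name `select_task_loss` is hypothetical, and the mapping of task id 0 to MRC and 1 to MLM is inferred from the concatenation order.

```python
def select_task_loss(task_id, mrc_loss, mlm_loss, num_tasks=2):
    """Pick one task's loss with a one-hot mask, mirroring the graph ops in train()."""
    one_hot = [1.0 if i == task_id else 0.0 for i in range(num_tasks)]
    all_loss = [mrc_loss, mlm_loss]  # same order as the concat in train()
    # The masked sum leaves only the entry whose mask value is 1.
    return sum(mask * loss for mask, loss in zip(one_hot, all_loss))
```

Masking rather than branching keeps a single static graph for both tasks. Only the selected task's loss contributes to the optimized value, since the other entry is multiplied by zero.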
# wget pretrain model
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/squad2_model.tar.gz
tar -xvf squad2_model.tar.gz
rm squad2_model.tar.gz
mv squad2_model ./data/pretrain_model/
# wget knowledge_distillation dataset
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/d_net_knowledge_distillation_dataset.tar.gz
tar -xvf d_net_knowledge_distillation_dataset.tar.gz
rm d_net_knowledge_distillation_dataset.tar.gz
mv mlm_data ./data/input
mv mrqa_distill_data ./data/input
# wget evaluation dev dataset
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/mrqa_evaluation_dataset.tar.gz
tar -xvf mrqa_evaluation_dataset.tar.gz
rm mrqa_evaluation_dataset.tar.gz
mv mrqa_evaluation_dataset ./data/input
# wget predictions results
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/kd_prediction_results.tar.gz
tar -xvf kd_prediction_results.tar.gz
rm kd_prediction_results.tar.gz
# wget MRQA baidu trained knowledge distillation model
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/knowledge_distillation_model.tar.gz
tar -xvf knowledge_distillation_model.tar.gz
rm knowledge_distillation_model.tar.gz
mv knowledge_distillation_model ./data/saved_models
# Multi_task_learning # Multi task learning
## 1、Introduction ## 1. Introduction
Pre-training is usually performed on corpora with restricted domains, so further pre-training on other corpora is expected to improve the generalization capability by increasing domain diversity. Hence, we incorporate a masked language model and a domain classification model, trained on corpora from various domains, as auxiliary tasks in the fine-tuning phase, along with MRC. Additionally, we explore multi-task learning by incorporating supervised datasets from other NLP tasks to learn better language representations. Multi-task learning (MTL) has been used in many NLP tasks to obtain better language representations. Hence, we experiment with several auxiliary tasks to improve the generalization capability of an MRC model. The auxiliary tasks that we use include:
- Unsupervised Task: masked Language Model
- Supervised Tasks:
- natural language inference
- paragraph ranking
In the MRQA 2019 shared task, we use [PALM](https://github.com/PaddlePaddle/PALM) v1.0 (a multi-task learning library based on PaddlePaddle) to perform multi-task training, which makes the implementation of new tasks and pre-trained models much easier than from scratch.
## 2、Quick Start
We use PaddlePaddle PALM (a multi-task learning library) to train the MRQA 2019 MRC multi-task baseline model. To download PALM:
```
git clone https://github.com/PaddlePaddle/PALM.git
```
PALM user guide: [README.md](https://github.com/PaddlePaddle/PALM/blob/master/README.md) ## 2.Preparation
### Environment ### Environment
- Python >= 2.7 - Python >= 2.7
- cuda >= 9.0 - cuda >= 9.0
- cudnn >= 7.0 - cudnn >= 7.0
- PaddlePaddle >= 1.6 Please refer to Installation Guide [Installation Guide](http://www.paddlepaddle.org/#quick-start) - PaddlePaddle 1.5.2 (Please refer to the Installation Guide [Installation Guide](http://www.paddlepaddle.org/#quick-start))
- PALM v1.0
### Data Preparation ### Install PALM
#### Get data directly: To install PALM v1.0, run the following command under `multi_task_learning/`:
Users can directly download the data we provide:
```
bash wget_data.sh
```
#### Convert MRC dataset to squad format data:
To download the MRQA datasets, run
``` ```
cd scripts && bash download_data.sh && cd .. git clone --branch v1.0 --depth 1 https://github.com/PaddlePaddle/PALM.git
``` ```
The training and prediction datasets will be saved in `./scripts/train/` and `./scripts/dev/`, respectively.
The Multi_task_learning model only supports dataset files in SQuAD format. Before running the model on MRQA datasets, one needs to convert the official MRQA data to SQuAD format. To do the conversion, run For more instructions, see the PALM user guide: [README.md](https://github.com/PaddlePaddle/PALM/blob/v1.0/README.md)
```
cd scripts && bash convert_mrqa2squad.sh && cd ..
``` ### Download data
The output files will be named as `xxx.raw.json`.
To download the MRQA training and development data, as well as other auxiliary data for MTL, run
For convenience, we provide a script to combine all the training and development data into a single file respectively.
``` ```
cd scripts && bash combine.sh && cd .. bash wget_data.sh
``` ```
The combined files will be saved in `./scripts/train/mrqa-combined.raw.json` and `./scripts/dev/mrqa-combined.raw.json`. The downloaded data will be saved into `data/mrqa` (combined MRQA training and development data), `data/mrqa_dev` (separated MRQA in-domain and out-of-domain data, for model evaluation), `mlm4mrqa` (training data for the masked language model task) and `data/am4mrqa` (training data for the paragraph matching task).
### Download pre-trained parameters
In our MTL experiments, we use BERT as our shared encoder. The parameters are initialized from the Whole Word Masking BERT (BERTwwm), further fine-tuned on the SQuAD 2.0 task with synthetically generated question answering corpora. The model parameters in TensorFlow format can be downloaded [here](https://worksheets.codalab.org/worksheets/0x3852e60a51d2444680606556d404c657). The following command can be used to convert the parameters to a format that PaddlePaddle can read.
### Models Preparation
In this competition, We use google squad2.0 model as pretrain model [Model Link](https://worksheets.codalab.org/worksheets/0x3852e60a51d2444680606556d404c657)
we provide script to convert tensorflow model to paddle model
``` ```
cd scripts && python convert_model_params.py --init_tf_checkpoint tf_model --fluid_params_dir paddle_model && cd .. cd scripts && python convert_model_params.py --init_tf_checkpoint tf_model --fluid_params_dir paddle_model && cd ..
``` ```
or user can get the pretrain model and multi-task learning trained models we provided: Alternatively, users can directly **download the parameters that we have converted**:
``` ```
bash wget_models.sh bash wget_pretrained_model.sh
``` ```
## 3、Train and Predict ## 3. Training
Preparing data, models, and task profiles for PALM In the following example, we use the PALM library to perform MTL with 3 tasks (i.e. machine reading comprehension as the main task, masked language model and paragraph ranking as auxiliary tasks). For detailed instructions on PALM, please refer to the [user guide](https://github.com/PaddlePaddle/PALM/blob/v1.0/README.md).
The PALM library requires a config file for every single task and a main config file `mtl_config.yaml`, which control the training behavior and hyper-parameters. For simplicity, we have prepared those files in the `multi_task_learning/configs` folder. To move the configuration files, data set and model parameters to the correct directory, run
``` ```
bash run_build_palm.sh bash run_build_palm.sh
``` ```
Start training: Once everything is in the right place, one can start training
``` ```
cd PALM cd PALM
bash run_multi_task.sh bash run_multi_task.sh
``` ```
The fine-tuned parameters and model predictions will be saved in `PALM/output/`, as specified by `mtl_config.yaml`.
## 4. Evaluation
The scripts for evaluation are in the folder `scripts/`. Here we provide an example for the usage of those scripts.
Before evaluation, one needs a JSON file which contains the prediction results on the MRQA dev set. For convenience, we prepare two model prediction files with different MTL configurations, which have been saved in the `prediction_results/` folder, as downloaded in section **Download data**.
## 4、Evaluation
To evaluate the result, run To evaluate the result, run
``` ```
bash run_evaluation.sh bash run_evaluation.sh
``` ```
Note that we use the evaluation script for SQuAD 1.1 here, which is equivalent to the official one. The F1 and EM score of the two model predictions will be saved into `prediction_results/BERT_MLM.log` and `prediction_results/BERT_MLM_ParaRank.log`. The macro average of F1 score will be printed on the console. The table below shows the results of our experiments with different MTL configurations.
## 5、Performance |models |in-domain dev (Macro-F1)|out-of-domain dev (Macro-F1) |
| | dev in_domain(Macro-F1)| dev out_of_domain(Macro-F1) |
| ------------- | ------------ | ------------ | | ------------- | ------------ | ------------ |
| Official baseline | 77.87 | 58.67 | | Official baseline | 77.87 | 58.67 |
| BERT | 82.40 | 66.35 | | BERT (no MTL) | 82.40 | 66.35 |
| BERT + MLM | 83.19 | 67.45 | | BERT + MLM | 83.19 | 67.45 |
| BERT + MLM + ParaRank | 83.51 | 66.83 | | BERT + MLM + ParaRank | 83.51 | 66.83 |
BERT: reading comprehension single model.
BERT + MLM: reading comprehension single model as main task, mask language model as auxiliary task.
BERT + MLM + ParaRank: reading comprehension single model as main task, mask language model and paragraph classify rank as auxiliary tasks.
BERT config: configs/reading_comprehension.yaml
MLM config: configs/mask_language_model.yaml
ParaRank config: configs/answer_matching.yaml
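The Macro-F1 columns in the table are macro averages. The sketch below assumes this means the unweighted mean of the per-dataset F1 scores; the implementation of `run_evaluation.sh` is not shown here, so treat this as an illustration only. The dataset names and scores are hypothetical placeholders, not results from our experiments.

```python
def macro_average(scores):
    """Return the unweighted mean of per-dataset scores.

    `scores` maps a dataset name to its F1 (or EM) score.
    """
    if not scores:
        raise ValueError("need at least one dataset score")
    return sum(scores.values()) / float(len(scores))


# Hypothetical per-dataset F1 values, for illustration only.
example_f1 = {"dataset_a": 80.0, "dataset_b": 70.0, "dataset_c": 90.0}
print("Macro-F1: {:.2f}".format(macro_average(example_f1)))
```

Because every dataset contributes equally, a small dataset influences the macro average as much as a large one.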
## Copyright and License ## Copyright and License
Copyright 2019 Baidu.com, Inc. All Rights Reserved Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and Copyright 2019 Baidu.com, Inc. All Rights Reserved Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and
......
...@@ -5,5 +5,4 @@ cp configs/mtl_config.yaml PALM/ ...@@ -5,5 +5,4 @@ cp configs/mtl_config.yaml PALM/
rm -rf PALM/data rm -rf PALM/data
mv data PALM/ mv data PALM/
mv squad2_model PALM/pretrain_model mv squad2_model PALM/pretrain_model
mv mrqa_multi_task_models PALM/
cp run_multi_task.sh PALM/ cp run_multi_task.sh PALM/
...@@ -2,6 +2,3 @@ wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/squad2_model.t ...@@ -2,6 +2,3 @@ wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/squad2_model.t
tar -xvf squad2_model.tar.gz tar -xvf squad2_model.tar.gz
rm squad2_model.tar.gz rm squad2_model.tar.gz
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/mrqa_multi_task_models.tar.gz
tar -xvf mrqa_multi_task_models.tar.gz
rm mrqa_multi_task_models.tar.gz
# server # ensemble server system
This directory contains the ensemble system for the three models that are fine-tuned on the MRQA in-domain data (i.e. models based on ERNIE2.0, XL-NET and BERT). The architecture of the ensemble system is shown in the figure below. We first start 3 independent model servers for ERNIE, XL-NET and BERT. We then start a main server to receive client requests, invoke the model servers and ensemble the model results.
For convenience, users can explore **any ensemble combination** (e.g. ERNIE+XL-NET, BERT+XL-NET) by simply modifying the configurations.
## Introduction <p align="center">
MRQA 2019 Shared Task submission will be handled through the [Codalab](https://worksheets.codalab.org/) platform: see [these instructions](https://worksheets.codalab.org/worksheets/0x926e37ac8b4941f793bf9b9758cc01be/). <img src="../images/D-NET_server.png" width="500">
</p>
We provided D-NET models submission environment for MRQA competition. it includes two server: bert server and xlnet server, we merged the results of two serves.
## Inference Model Preparation ## Environment
Download bert inference model and xlnet inferece model In our test environment, we use
- Python 2.7.13
- PaddlePaddle 1.5.2
- sentencepiece 0.1.83
- flask 1.1.1
- Cuda 9.0
- CuDNN 7.0
## Download model parameters
To download the model parameters that are fine-tuned on the MRQA in-domain data, run
``` ```
bash wget_server_inference_model.sh bash wget_server_inference_model.sh
``` ```
A folder named `infer_model` will appear in `ernie_server/`, `xlnet_server/` and `bert_server/`.
## Start server ## Start servers
Before starting the server, please make sure the ports `5118` to `5121` are available, and specify the `gpu_id` in `start.sh` (by default `GPU 0` on the machine will be used).
To start the servers, run
We can set GPU card for bert server or xlnet server, By setting variable CUDA_VISIBLE_DEVICES:
```
export CUDA_VISIBLE_DEVICES=1
```
In main_server.py file we set the server port for bert and xlnet model, as shown below, If the port 5118 or 5120 is occupied, please set up an idle port.
``` ```
url_1 = 'http://127.0.0.1:5118' # url for model1 bash start.sh
url_2 = 'http://127.0.0.1:5120' # url for model2
``` ```
start server The log for the main server will be saved in `main_server.log`, and the logs for the 3 model servers will be saved in `ernie_server/ernie.log`, `xlnet_server/xlnet.log` and `bert_server/bert.log`.
By default, the main server will ensemble the results from ERNIE and XL-NET. To explore other ensemble combinations, one can change the configuration in `start.sh` (e.g. `python main_server.py --ernie --xlnet --bert` for 3 models, `python main_server.py --bert --xlnet` for BERT and XL-NET only).
Note that in our test environment, we use a Tesla K40 (12G) and the three models are able to fit in a single card. For GPUs with less memory, one can choose to put the three models on different cards by modifying the configurations in `start.sh`.
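To illustrate how flags such as `--ernie`, `--xlnet` and `--bert` can select an ensemble combination, here is a minimal sketch. It is not the code of `main_server.py`: the function names are hypothetical and the URLs are placeholders (the real ports are configured in `start.sh`).

```python
import argparse

# Placeholder URLs: the actual port of each model server is set in start.sh.
MODEL_URLS = [
    ("ernie", "http://127.0.0.1:5118"),
    ("xlnet", "http://127.0.0.1:5119"),
    ("bert", "http://127.0.0.1:5120"),
]


def build_parser():
    """Create one boolean flag per model, e.g. --ernie."""
    parser = argparse.ArgumentParser("ensemble selection sketch")
    for name, _ in MODEL_URLS:
        parser.add_argument("--" + name, action="store_true",
                            help="include the %s model server" % name)
    return parser


def selected_servers(argv):
    """Return the (name, url) pairs of the models enabled on the command line."""
    args = build_parser().parse_args(argv)
    chosen = [(name, url) for name, url in MODEL_URLS if getattr(args, name)]
    if not chosen:
        raise ValueError("select at least one model")
    return chosen


print(selected_servers(["--bert", "--xlnet"]))
```

A main server built this way would send each request only to the selected model servers before combining their answers.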
## Send requests
Once the servers are successfully launched, one can use the client script to send requests.
``` ```
bash start.sh cd client
python client.py demo.txt results.txt 5121
``` ```
This will read the examples in `demo.txt`, send requests to the main server, and save the results into `results.txt`. The input file (i.e. `demo.txt`) must be in the [MRQA official format](https://github.com/mrqa/MRQA-Shared-Task-2019).
\ No newline at end of file
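As an illustration of the input format, the sketch below writes and re-reads a small file with one JSON object per line: an optional header line followed by examples that carry a `context` and a list of `qas`. The field names follow the bundled demo data and the client's handling of the `header` line; the helper names, header contents and `qid` value are made up for this example, and the servers may need fields that are not shown here.

```python
import json
import os
import tempfile


def write_mrqa_file(path, examples, header):
    """Write a header line followed by one JSON example per line."""
    with open(path, "w") as f:
        f.write(json.dumps({"header": header}) + "\n")
        for example in examples:
            f.write(json.dumps(example) + "\n")


def read_mrqa_examples(path):
    """Read the examples back, skipping the header line."""
    examples = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if "header" in record:
                continue
            examples.append(record)
    return examples


# Hypothetical example; the qid is a placeholder, not a real MRQA id.
example = {
    "id": "",
    "context": "The American Football Conference (AFC) champion Denver "
               "Broncos defeated the National Football Conference (NFC) "
               "champion Carolina Panthers.",
    "qas": [{
        "qid": "demo-question-1",
        "question": "Which NFL team represented the AFC at Super Bowl 50?",
        "answers": ["Denver Broncos"],
    }],
}

path = os.path.join(tempfile.mkdtemp(), "my_demo.txt")
write_mrqa_file(path, [example], header={"dataset": "demo", "split": "dev"})
loaded = read_mrqa_examples(path)
print(len(loaded), loaded[0]["qas"][0]["question"])
```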
#encoding=utf8
import os
import sys
import argparse
from copy import deepcopy as copy
import numpy as np
import paddle
import paddle.fluid as fluid
import collections
import multiprocessing
from pdnlp.nets.bert import BertModel
from pdnlp.toolkit.configure import JsonConfig
class ModelBERT(object):

    def __init__(
            self,
            conf,
            name = "",
            is_training = False,
            base_model = None):

        # the name of this task
        # name is used for identifying parameters
        self.name = name

        # deep copy the configuration of the model
        self.conf = copy(conf)

        self.is_training = is_training

        ## the overall loss of this task
        self.loss = None

        ## outputs may be useful for the other models
        self.outputs = {}

        ## the prediction of this task
        self.predict = []

    def create_model(self,
                     args,
                     reader_input,
                     base_model = None):
        """
        given the base model and reader_input,
        return the function that creates this model
        """

        def _create_model():

            src_ids, pos_ids, sent_ids, input_mask = reader_input

            bert_conf = JsonConfig(self.conf["bert_conf_file"])
            self.bert = BertModel(
                src_ids = src_ids,
                position_ids = pos_ids,
                sentence_ids = sent_ids,
                input_mask = input_mask,
                config = bert_conf,
                use_fp16 = args.use_fp16,
                model_name = self.name)

            self.loss = None
            self.outputs = {
                "sequence_output": self.bert.get_sequence_output(),
                # "pooled_output": self.bert.get_pooled_output()
            }

        return _create_model

    def get_output(self, name):
        return self.outputs[name]

    def get_outputs(self):
        return self.outputs

    def get_predict(self):
        return self.predict


if __name__ == "__main__":
    # create_model() reads the key "bert_conf_file", so use the same key here
    bert_model = ModelBERT(conf = {"bert_conf_file" : "./data/pretrained_models/squad2_model/bert_config.json"})
...@@ -12,8 +12,6 @@ import argparse ...@@ -12,8 +12,6 @@ import argparse
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from task_reader.mrqa import DataProcessor, get_answers from task_reader.mrqa import DataProcessor, get_answers
from bert_model import ModelBERT
import mrc_model
ema_decay = 0.9999 ema_decay = 0.9999
verbose = False verbose = False
......
# encoding=utf8
import paddle.fluid as fluid
def compute_loss(output_tensors, args=None):
    """Compute loss for mrc model"""

    def _compute_single_loss(logits, positions):
        """Compute start/end loss for mrc model"""
        loss = fluid.layers.softmax_with_cross_entropy(
            logits=logits, label=positions)
        loss = fluid.layers.mean(x=loss)
        return loss

    start_logits = output_tensors['start_logits']
    end_logits = output_tensors['end_logits']
    start_positions = output_tensors['start_positions']
    end_positions = output_tensors['end_positions']
    start_loss = _compute_single_loss(start_logits, start_positions)
    end_loss = _compute_single_loss(end_logits, end_positions)
    total_loss = (start_loss + end_loss) / 2.0
    # args defaults to None, so guard before reading its attributes
    if args is not None and args.use_fp16 and args.loss_scaling > 1.0:
        total_loss = total_loss * args.loss_scaling

    return total_loss


def create_model(reader_input, base_model=None, is_training=True, args=None):
    """
    given the base model, reader_input
    return the output tensors
    """
    if is_training:
        src_ids, pos_ids, sent_ids, input_mask, \
            start_positions, end_positions = reader_input
    else:
        src_ids, pos_ids, sent_ids, input_mask, unique_id = reader_input

    enc_out = base_model.get_output("sequence_output")
    # project every token representation to a (start, end) logit pair
    logits = fluid.layers.fc(
        input=enc_out,
        size=2,
        num_flatten_dims=2,
        param_attr=fluid.ParamAttr(
            name="cls_squad_out_w",
            initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
        bias_attr=fluid.ParamAttr(
            name="cls_squad_out_b", initializer=fluid.initializer.Constant(0.)))

    logits = fluid.layers.transpose(x=logits, perm=[2, 0, 1])
    start_logits, end_logits = fluid.layers.unstack(x=logits, axis=0)

    batch_ones = fluid.layers.fill_constant_batch_size_like(
        input=start_logits, dtype='int64', shape=[1], value=1)
    num_seqs = fluid.layers.reduce_sum(input=batch_ones)

    output_tensors = {}
    output_tensors['start_logits'] = start_logits
    output_tensors['end_logits'] = end_logits
    output_tensors['num_seqs'] = num_seqs
    if is_training:
        output_tensors['start_positions'] = start_positions
        output_tensors['end_positions'] = end_positions
    else:
        output_tensors['unique_id'] = unique_id

    return output_tensors
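For readers who want to check the span loss by hand, the quantity built by `compute_loss` (the mean softmax cross-entropy over start positions and over end positions, averaged) can be reproduced without PaddlePaddle. This is a pure-Python sketch with hypothetical function names; it ignores the optional fp16 loss scaling.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def span_loss(start_logits, end_logits, start_positions, end_positions):
    """Average of the mean start loss and the mean end loss over a batch."""
    start_losses = [softmax_cross_entropy(l, p)
                    for l, p in zip(start_logits, start_positions)]
    end_losses = [softmax_cross_entropy(l, p)
                  for l, p in zip(end_logits, end_positions)]
    start_loss = sum(start_losses) / len(start_losses)
    end_loss = sum(end_losses) / len(end_losses)
    return (start_loss + end_loss) / 2.0


# One example with 4 tokens and uniform logits: each loss equals log(4).
loss = span_loss([[0.0] * 4], [[0.0] * 4], [1], [2])
print(loss)
```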
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from functools import partial
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
def multi_head_attention(queries,
                         keys,
                         values,
                         attn_bias,
                         d_key,
                         d_value,
                         d_model,
                         n_head=1,
                         dropout_rate=0.,
                         cache=None,
                         param_initializer=None,
                         name='multi_head_att'):
    """
    Multi-Head Attention. Note that attn_bias is added to the logit before
    computing softmax activation to mask certain selected positions so that
    they will not be considered in attention weights.
    """
    keys = queries if keys is None else keys
    values = keys if values is None else values

    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
        raise ValueError(
            "Inputs: queries, keys and values should all be 3-D tensors.")

    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
        """
        Add linear projection to queries, keys, and values.
        """
        q = layers.fc(input = queries,
                      size = d_key * n_head,
                      num_flatten_dims = 2,
                      param_attr = fluid.ParamAttr(
                          name = name + '_query_fc.w_0',
                          initializer = param_initializer),
                      bias_attr = name + '_query_fc.b_0')
        k = layers.fc(input = keys,
                      size = d_key * n_head,
                      num_flatten_dims = 2,
                      param_attr = fluid.ParamAttr(
                          name = name + '_key_fc.w_0',
                          initializer = param_initializer),
                      bias_attr = name + '_key_fc.b_0')
        v = layers.fc(input = values,
                      size = d_value * n_head,
                      num_flatten_dims = 2,
                      param_attr = fluid.ParamAttr(
                          name = name + '_value_fc.w_0',
                          initializer = param_initializer),
                      bias_attr = name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
        """
        Reshape the last dimension of input tensor x so that it becomes two
        dimensions and then transpose. Specifically, input a tensor with shape
        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
        with shape [bs, n_head, max_sequence_length, hidden_dim].
        """
        hidden_size = x.shape[-1]
        # The value 0 in shape attr means copying the corresponding dimension
        # size of the input as the output dimension size.
        reshaped = layers.reshape(
            x = x, shape = [0, 0, n_head, hidden_size // n_head], inplace=True)

        # permute the dimensions into:
        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])

    def __combine_heads(x):
        """
        Transpose and then reshape the last two dimensions of input tensor x
        so that it becomes one dimension, which is reverse to __split_heads.
        """
        if len(x.shape) == 3: return x
        if len(x.shape) != 4:
            raise ValueError("Input(x) should be a 4-D Tensor.")

        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
        # The value 0 in shape attr means copying the corresponding dimension
        # size of the input as the output dimension size.
        return layers.reshape(
            x = trans_x,
            shape = [0, 0, trans_x.shape[2] * trans_x.shape[3]],
            inplace = True)

    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
        """
        Scaled Dot-Product Attention
        """
        scaled_q = layers.scale(x = q, scale = d_key**-0.5)
        product = layers.matmul(x = scaled_q, y = k, transpose_y = True)
        if attn_bias:
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
            weights = layers.dropout(
                weights,
                dropout_prob=dropout_rate,
                dropout_implementation="upscale_in_train",
                is_test=False)
        out = layers.matmul(weights, v)
        return out

    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)

    if cache is not None:  # use cache and concat time steps
        # Since the inplace reshape in __split_heads changes the shape of k and
        # v, which is the cache input for next time step, reshape the cache
        # input from the previous time step first.
        k = cache["k"] = layers.concat(
            [layers.reshape(
                cache["k"], shape=[0, 0, d_model]), k], axis=1)
        v = cache["v"] = layers.concat(
            [layers.reshape(
                cache["v"], shape=[0, 0, d_model]), v], axis=1)

    q = __split_heads(q, n_head)
    k = __split_heads(k, n_head)
    v = __split_heads(v, n_head)

    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
                                                  dropout_rate)

    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
    proj_out = layers.fc(input = out,
                         size = d_model,
                         num_flatten_dims = 2,
                         param_attr=fluid.ParamAttr(
                             name = name + '_output_fc.w_0',
                             initializer = param_initializer),
                         bias_attr = name + '_output_fc.b_0')
    return proj_out
def positionwise_feed_forward(x,
                              d_inner_hid,
                              d_hid,
                              dropout_rate,
                              hidden_act,
                              param_initializer=None,
                              name='ffn'):
    """
    Position-wise Feed-Forward Networks.
    This module consists of two linear transformations with an activation
    (given by hidden_act) in between, which is applied to each position
    separately and identically.
    """
    hidden = layers.fc(input=x,
                       size=d_inner_hid,
                       num_flatten_dims=2,
                       act=hidden_act,
                       param_attr=fluid.ParamAttr(
                           name=name + '_fc_0.w_0',
                           initializer=param_initializer),
                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
        hidden = layers.dropout(
            hidden,
            dropout_prob=dropout_rate,
            dropout_implementation="upscale_in_train",
            is_test = False)
    out = layers.fc(input = hidden,
                    size = d_hid,
                    num_flatten_dims = 2,
                    param_attr=fluid.ParamAttr(
                        name = name + '_fc_1.w_0',
                        initializer = param_initializer),
                    bias_attr = name + '_fc_1.b_0')
    return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.,
                           name=''):
    """
    Add residual connection, layer normalization and dropout to the out tensor
    optionally according to the value of process_cmd.
    This will be used before or after multi-head attention and position-wise
    feed-forward networks.
    """
    for cmd in process_cmd:
        if cmd == "a":  # add residual connection
            out = out + prev_out if prev_out else out
        elif cmd == "n":  # add layer normalization
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x = out, dtype = "float32")
            out = layers.layer_norm(
                out,
                begin_norm_axis=len(out.shape) - 1,
                param_attr=fluid.ParamAttr(
                    name = name + '_layer_norm_scale',
                    initializer = fluid.initializer.Constant(1.)),
                bias_attr=fluid.ParamAttr(
                    name = name + '_layer_norm_bias',
                    initializer = fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x = out, dtype = "float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
                out = layers.dropout(
                    out,
                    dropout_prob = dropout_rate,
                    dropout_implementation = "upscale_in_train",
                    is_test = False)
    return out


pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
def encoder_layer(enc_input,
                  attn_bias,
                  n_head,
                  d_key,
                  d_value,
                  d_model,
                  d_inner_hid,
                  prepostprocess_dropout,
                  attention_dropout,
                  relu_dropout,
                  hidden_act,
                  preprocess_cmd="n",
                  postprocess_cmd="da",
                  param_initializer=None,
                  name=''):
    """The encoder layers that can be stacked to form a deep encoder.
    This module consists of a multi-head (self) attention followed by
    position-wise feed-forward networks, and both components are accompanied
    by the post_process_layer to add residual connection, layer normalization
    and dropout.
    """
    attn_output = multi_head_attention(
        pre_process_layer(
            enc_input,
            preprocess_cmd,
            prepostprocess_dropout,
            name=name + '_pre_att'),
        None,
        None,
        attn_bias,
        d_key,
        d_value,
        d_model,
        n_head,
        attention_dropout,
        param_initializer = param_initializer,
        name = name + '_multi_head_att')
    attn_output = post_process_layer(
        enc_input,
        attn_output,
        postprocess_cmd,
        prepostprocess_dropout,
        name = name + '_post_att')
    ffd_output = positionwise_feed_forward(
        pre_process_layer(
            attn_output,
            preprocess_cmd,
            prepostprocess_dropout,
            name = name + '_pre_ffn'),
        d_inner_hid,
        d_model,
        relu_dropout,
        hidden_act,
        param_initializer = param_initializer,
        name = name + '_ffn')
    return post_process_layer(
        attn_output,
        ffd_output,
        postprocess_cmd,
        prepostprocess_dropout,
        name = name + '_post_ffn')
def encoder(enc_input,
            attn_bias,
            n_layer,
            n_head,
            d_key,
            d_value,
            d_model,
            d_inner_hid,
            prepostprocess_dropout,
            attention_dropout,
            relu_dropout,
            hidden_act,
            preprocess_cmd="n",
            postprocess_cmd="da",
            param_initializer=None,
            name='',
            return_all = False):
    """
    The encoder is composed of a stack of identical layers returned by calling
    encoder_layer.
    """
    enc_outputs = []
    for i in range(n_layer):
        enc_output = encoder_layer(
            enc_input,
            attn_bias,
            n_head,
            d_key,
            d_value,
            d_model,
            d_inner_hid,
            prepostprocess_dropout,
            attention_dropout,
            relu_dropout,
            hidden_act,
            preprocess_cmd,
            postprocess_cmd,
            param_initializer = param_initializer,
            name = name + '_layer_' + str(i))
        enc_input = enc_output
        if i < n_layer - 1:
            enc_outputs.append(enc_output)

    enc_output = pre_process_layer(
        enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")
    enc_outputs.append(enc_output)

    if not return_all:
        return enc_output
    else:
        return enc_output, enc_outputs
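The core of `multi_head_attention` is scaled dot-product attention with an additive bias for masking. The following pure-Python sketch shows that computation for a single head on plain lists; it is an illustration under simplifying assumptions (no batching, no dropout, no learned projections), not a replacement for the PaddlePaddle code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, attn_bias=None):
    """Single-head attention.

    q: list of query vectors, k: list of key vectors, v: list of value
    vectors. attn_bias[i][j] is added to the score of query i and key j,
    so a large negative value masks position j for query i.
    """
    d_key = len(q[0])
    scale = d_key ** -0.5
    out = []
    for i, q_i in enumerate(q):
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        if attn_bias is not None:
            scores = [s + b for s, b in zip(scores, attn_bias[i])]
        weights = softmax(scores)
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [3.0, 2.0]]
# A zero query gives uniform weights, so the output is the mean of the values.
uniform = scaled_dot_product_attention([[0.0, 0.0]], keys, values)
# A large negative bias masks the second position.
masked = scaled_dot_product_attention([[0.0, 0.0]], keys, values,
                                      attn_bias=[[0.0, -1e9]])
print(uniform, masked)
```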
#encoding=utf8
import os
import sys
import random
import numpy as np
import paddle
import paddle.fluid as fluid
from pdnlp.toolkit.placeholder import Placeholder
def repeat(reader):
    """Repeat a generator forever"""
    generator = reader()
    while True:
        try:
            yield next(generator)
        except StopIteration:
            generator = reader()
            yield next(generator)


def create_joint_generator(input_shape, generators, is_multi_task=True):

    def empty_output(input_shape, batch_size=1):
        results = []
        for i in range(len(input_shape)):
            if input_shape[i][1] == 'int32':
                dtype = np.int32
            if input_shape[i][1] == 'int64':
                dtype = np.int64
            if input_shape[i][1] == 'float32':
                dtype = np.float32
            if input_shape[i][1] == 'float64':
                dtype = np.float64
            # copy the shape so that input_shape itself is not modified
            shape = list(input_shape[i][0])
            shape[0] = batch_size
            pad_tensor = np.zeros(shape=shape, dtype=dtype)
            results.append(pad_tensor)
        return results

    def wrapper():
        """wrapper data"""
        generators_inst = [repeat(gen[0]) for gen in generators]

        generators_ratio = [gen[1] for gen in generators]
        # float() avoids integer division under Python 2
        weights = [ratio / float(sum(generators_ratio)) for ratio in generators_ratio]
        run_task_id = range(len(generators))
        while True:
            idx = np.random.choice(run_task_id, p=weights)
            gen_results = next(generators_inst[idx])
            if not gen_results:
                break
            batch_size = gen_results[0].shape[0]
            results = empty_output(input_shape, batch_size)

            task_id_tensor = np.array([[idx]]).astype("int64")
            results[0] = task_id_tensor
            for i in range(4):
                results[i+1] = gen_results[i]
            if idx == 0:
                # mrc batch
                results[5] = gen_results[4]
                results[6] = gen_results[5]
            elif idx == 1:
                # mlm batch
                results[7] = gen_results[4]
                results[8] = gen_results[5]
            elif idx == 2:
                # MNLI batch
                results[9] = gen_results[4]
            else:
                raise RuntimeError('Invalid task ID - {}'.format(idx))
            # idx stands for the task index
            yield results

    return wrapper


def create_reader(reader_name, input_shape, is_multi_task, *gens):
    """
    build reader for multi_task_learning
    """
    placeholder = Placeholder(input_shape)
    pyreader, model_inputs = placeholder.build(capacity=100, reader_name=reader_name)
    joint_generator = create_joint_generator(input_shape, gens[0], is_multi_task=is_multi_task)

    return joint_generator, pyreader, model_inputs
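The joint generator picks, for every batch, which task to draw from with probability proportional to that task's mixing ratio. The sketch below reproduces this sampling step with the standard library only; the function names and the example ratios are hypothetical, and `float()` is used so that integer ratios are not truncated under Python 2.

```python
import random


def normalize_ratios(ratios):
    """Turn mixing ratios into sampling probabilities that sum to 1."""
    total = float(sum(ratios))
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    return [r / total for r in ratios]


def sample_task_ids(ratios, num_batches, seed=0):
    """Draw one task index per batch according to the mixing ratios."""
    weights = normalize_ratios(ratios)
    rng = random.Random(seed)
    task_ids = []
    for _ in range(num_batches):
        draw = rng.random()
        cumulative = 0.0
        chosen = len(weights) - 1  # fallback guards against rounding error
        for idx, weight in enumerate(weights):
            cumulative += weight
            if draw < cumulative:
                chosen = idx
                break
        task_ids.append(chosen)
    return task_ids


# Hypothetical ratios: the main task is sampled twice as often as each
# auxiliary task.
ratios = [2, 1, 1]
print(normalize_ratios(ratios))
print(sample_task_ids(ratios, num_batches=10))
```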
export FLAGS_fraction_of_gpu_memory_to_use=0.1 export FLAGS_fraction_of_gpu_memory_to_use=0.1
python start_service.py ./infer_model 5118 & port=$1
gpu=$2
export CUDA_VISIBLE_DEVICES=$gpu
python start_service.py ./infer_model $port
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
"""Provide MRC service for TOP1 short answer extraction system """
Note the services here share some global pre/post process objects, which BERT model service
are **NOT THREAD SAFE**. Try to use multi-process instead of multi-thread
for deployment.
""" """
import json import json
import sys import sys
......
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Query the MRQA model server to generate predictions.
"""
import argparse
import json
import requests
import time
if __name__ == '__main__':
    parse = argparse.ArgumentParser("")
    parse.add_argument("dataset")
    parse.add_argument("output_file")
    parse.add_argument("port", type=int)
    args = parse.parse_args()

    all_predictions = {}
    contexts = []
    # read one JSON example per line, skipping the header line
    with open(args.dataset) as f:
        for example in f:
            context = json.loads(example)
            if 'header' in context:
                continue
            contexts.append(context)

    results = {}
    cnt = 0
    for context in contexts:
        cnt += 1
        start = time.time()
        pred = requests.post('http://127.0.0.1:%d' % args.port, json=context)
        result = pred.json()
        results.update(result)
        end = time.time()
        print('----- request cnt: {}, time elapsed: {:.2f} ms -----'.format(cnt, (end - start)*1000))
        for qid, answer in result.items():
            print('{}: {}'.format(qid, answer.encode('utf-8')))
    with open(args.output_file, 'w') as f:
        json.dump(results, f, indent=1)
{"id": "", "context": "Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the \"golden anniversary\" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as \"Super Bowl L\"), so that the logo could prominently feature the Arabic numerals 50.", "qas": [{"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "Which NFL team represented the AFC at Super Bowl 50?", "id": "56be4db0acb8001400a502ec", "qid": "b0626b3af0764c80b1e6f22c114982c1", "question_tokens": [["Which", 0], ["NFL", 6], ["team", 10], ["represented", 15], ["the", 27], ["AFC", 31], ["at", 35], ["Super", 38], ["Bowl", 44], ["50", 49], ["?", 51]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["Carolina Panthers", "Carolina Panthers", "Carolina Panthers"], "question": "Which NFL team represented the NFC at Super Bowl 50?", "id": "56be4db0acb8001400a502ed", "qid": "8d96e9feff464a52a15e192b1dc9ed01", "question_tokens": [["Which", 0], ["NFL", 6], ["team", 10], ["represented", 15], ["the", 27], ["NFC", 31], ["at", 35], ["Super", 38], ["Bowl", 44], ["50", 49], ["?", 51]], "detected_answers": [{"text": "Carolina Panthers", "char_spans": [[249, 265]], "token_spans": [[44, 45]]}]}, {"answers": ["Santa Clara, California", "Levi's Stadium", "Levi's Stadium in the San Francisco Bay Area at Santa Clara, California."], "question": "Where did Super Bowl 50 take place?", 
"id": "56be4db0acb8001400a502ee", "qid": "190fdfbc068243a7a04eb3ed59808db8", "question_tokens": [["Where", 0], ["did", 6], ["Super", 10], ["Bowl", 16], ["50", 21], ["take", 24], ["place", 29], ["?", 34]], "detected_answers": [{"text": "Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.", "char_spans": [[355, 426]], "token_spans": [[66, 80]]}]}, {"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "Which NFL team won Super Bowl 50?", "id": "56be4db0acb8001400a502ef", "qid": "e8d4a7478ed5439fa55c2660267bcaa1", "question_tokens": [["Which", 0], ["NFL", 6], ["team", 10], ["won", 15], ["Super", 19], ["Bowl", 25], ["50", 30], ["?", 32]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["gold", "gold", "gold"], "question": "What color was used to emphasize the 50th anniversary of the Super Bowl?", "id": "56be4db0acb8001400a502f0", "qid": "74019130542f49e184d733607e565a68", "question_tokens": [["What", 0], ["color", 5], ["was", 11], ["used", 15], ["to", 20], ["emphasize", 23], ["the", 33], ["50th", 37], ["anniversary", 42], ["of", 54], ["the", 57], ["Super", 61], ["Bowl", 67], ["?", 71]], "detected_answers": [{"text": "gold", "char_spans": [[521, 524]], "token_spans": [[99, 99]]}]}, {"answers": ["\"golden anniversary\"", "gold-themed", "\"golden anniversary"], "question": "What was the theme of Super Bowl 50?", "id": "56be8e613aeaaa14008c90d1", "qid": "3729174743f74ed58aa64cb7c7dbc7b3", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["theme", 13], ["of", 19], ["Super", 22], ["Bowl", 28], ["50", 33], ["?", 35]], "detected_answers": [{"text": "\"golden anniversary", "char_spans": [[487, 505]], "token_spans": [[93, 95]]}]}, {"answers": ["February 7, 2016", "February 7", "February 7, 2016"], "question": "What day was the game played on?", "id": "56be8e613aeaaa14008c90d2", "qid": "cc75a31d588842848d9890cafe092dec", "question_tokens": [["What", 0], 
["day", 5], ["was", 9], ["the", 13], ["game", 17], ["played", 22], ["on", 29], ["?", 31]], "detected_answers": [{"text": "February 7, 2016", "char_spans": [[334, 349]], "token_spans": [[60, 63]]}]}, {"answers": ["American Football Conference", "American Football Conference", "American Football Conference"], "question": "What is the AFC short for?", "id": "56be8e613aeaaa14008c90d3", "qid": "7c1424bfa53a4de28c3ec91adfbfe4ab", "question_tokens": [["What", 0], ["is", 5], ["the", 8], ["AFC", 12], ["short", 16], ["for", 22], ["?", 25]], "detected_answers": [{"text": "American Football Conference", "char_spans": [[133, 160]], "token_spans": [[26, 28]]}]}, {"answers": ["\"golden anniversary\"", "gold-themed", "gold"], "question": "What was the theme of Super Bowl 50?", "id": "56bea9923aeaaa14008c91b9", "qid": "78a00c316d9e40e69711a9b5c7a932a0", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["theme", 13], ["of", 19], ["Super", 22], ["Bowl", 28], ["50", 33], ["?", 35]], "detected_answers": [{"text": "gold", "char_spans": [[521, 524]], "token_spans": [[99, 99]]}]}, {"answers": ["American Football Conference", "American Football Conference", "American Football Conference"], "question": "What does AFC stand for?", "id": "56bea9923aeaaa14008c91ba", "qid": "1ef03938ae3848798b701dd4dbb30bd9", "question_tokens": [["What", 0], ["does", 5], ["AFC", 10], ["stand", 14], ["for", 20], ["?", 23]], "detected_answers": [{"text": "American Football Conference", "char_spans": [[133, 160]], "token_spans": [[26, 28]]}]}, {"answers": ["February 7, 2016", "February 7", "February 7, 2016"], "question": "What day was the Super Bowl played on?", "id": "56bea9923aeaaa14008c91bb", "qid": "cfd440704eee420b9fdf92725a6cdb64", "question_tokens": [["What", 0], ["day", 5], ["was", 9], ["the", 13], ["Super", 17], ["Bowl", 23], ["played", 28], ["on", 35], ["?", 37]], "detected_answers": [{"text": "February 7, 2016", "char_spans": [[334, 349]], "token_spans": [[60, 63]]}]}, {"answers": ["Denver 
Broncos", "Denver Broncos", "Denver Broncos"], "question": "Who won Super Bowl 50?", "id": "56beace93aeaaa14008c91df", "qid": "ca4749d3d0204f418fbfbaa52a1d9ece", "question_tokens": [["Who", 0], ["won", 4], ["Super", 8], ["Bowl", 14], ["50", 19], ["?", 21]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["Levi's Stadium", "Levi's Stadium", "Levi's Stadium in the San Francisco Bay Area at Santa Clara"], "question": "What venue did Super Bowl 50 take place in?", "id": "56beace93aeaaa14008c91e0", "qid": "c2c7e5d3fb87437c80d863d91f8a4e21", "question_tokens": [["What", 0], ["venue", 5], ["did", 11], ["Super", 15], ["Bowl", 21], ["50", 26], ["take", 29], ["place", 34], ["in", 40], ["?", 42]], "detected_answers": [{"text": "Levi's Stadium in the San Francisco Bay Area at Santa Clara", "char_spans": [[355, 413]], "token_spans": [[66, 77]]}]}, {"answers": ["Santa Clara", "Santa Clara", "Santa Clara"], "question": "What city did Super Bowl 50 take place in?", "id": "56beace93aeaaa14008c91e1", "qid": "643b4c1ef1644d18bf6866d95f24f900", "question_tokens": [["What", 0], ["city", 5], ["did", 10], ["Super", 14], ["Bowl", 20], ["50", 25], ["take", 28], ["place", 33], ["in", 39], ["?", 41]], "detected_answers": [{"text": "Santa Clara", "char_spans": [[403, 413]], "token_spans": [[76, 77]]}]}, {"answers": ["Super Bowl L", "L", "Super Bowl L"], "question": "If Roman numerals were used, what would Super Bowl 50 have been called?", "id": "56beace93aeaaa14008c91e2", "qid": "fad596c3f0e944abae33bf99ceccfbd6", "question_tokens": [["If", 0], ["Roman", 3], ["numerals", 9], ["were", 18], ["used", 23], [",", 27], ["what", 29], ["would", 34], ["Super", 40], ["Bowl", 46], ["50", 51], ["have", 54], ["been", 59], ["called", 64], ["?", 70]], "detected_answers": [{"text": "Super Bowl L", "char_spans": [[693, 704]], "token_spans": [[131, 133]]}]}, {"answers": ["2015", "the 2015 season", "2015"], "question": "Super Bowl 50 decided 
the NFL champion for what season?", "id": "56beace93aeaaa14008c91e3", "qid": "97f0c1c69a694cc8bc9edd41dd4c42be", "question_tokens": [["Super", 0], ["Bowl", 6], ["50", 11], ["decided", 14], ["the", 22], ["NFL", 26], ["champion", 30], ["for", 39], ["what", 43], ["season", 48], ["?", 54]], "detected_answers": [{"text": "2015", "char_spans": [[116, 119]], "token_spans": [[22, 22]]}]}, {"answers": ["2015", "2016", "2015"], "question": "What year did the Denver Broncos secure a Super Bowl title for the third time?", "id": "56bf10f43aeaaa14008c94fd", "qid": "d14fc2f7c07e4729a02888b4ee4c400c", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["the", 14], ["Denver", 18], ["Broncos", 25], ["secure", 33], ["a", 40], ["Super", 42], ["Bowl", 48], ["title", 53], ["for", 59], ["the", 63], ["third", 67], ["time", 73], ["?", 77]], "detected_answers": [{"text": "2015", "char_spans": [[116, 119]], "token_spans": [[22, 22]]}]}, {"answers": ["Santa Clara", "Santa Clara", "Santa Clara"], "question": "What city did Super Bowl 50 take place in?", "id": "56bf10f43aeaaa14008c94fe", "qid": "4297cde9c23a4105998937901a7fd3f6", "question_tokens": [["What", 0], ["city", 5], ["did", 10], ["Super", 14], ["Bowl", 20], ["50", 25], ["take", 28], ["place", 33], ["in", 39], ["?", 41]], "detected_answers": [{"text": "Santa Clara", "char_spans": [[403, 413]], "token_spans": [[76, 77]]}]}, {"answers": ["Levi's Stadium", "Levi's Stadium", "Levi's Stadium"], "question": "What stadium did Super Bowl 50 take place in?", "id": "56bf10f43aeaaa14008c94ff", "qid": "da8f425e541a46c19be04738f41097b3", "question_tokens": [["What", 0], ["stadium", 5], ["did", 13], ["Super", 17], ["Bowl", 23], ["50", 28], ["take", 31], ["place", 36], ["in", 42], ["?", 44]], "detected_answers": [{"text": "Levi's Stadium", "char_spans": [[355, 368]], "token_spans": [[66, 68]]}]}, {"answers": ["24\u201310", "24\u201310", "24\u201310"], "question": "What was the final score of Super Bowl 50? 
", "id": "56bf10f43aeaaa14008c9500", "qid": "f944d4b2519b43e4a3dd13dda85495fc", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["final", 13], ["score", 19], ["of", 25], ["Super", 28], ["Bowl", 34], ["50", 39], ["?", 41]], "detected_answers": [{"text": "24\u201310", "char_spans": [[267, 271]], "token_spans": [[46, 46]]}]}, {"answers": ["February 7, 2016", "February 7, 2016", "February 7, 2016"], "question": "What month, day and year did Super Bowl 50 take place? ", "id": "56bf10f43aeaaa14008c9501", "qid": "adff197d69764b7fbe2a6ebaae075df4", "question_tokens": [["What", 0], ["month", 5], [",", 10], ["day", 12], ["and", 16], ["year", 20], ["did", 25], ["Super", 29], ["Bowl", 35], ["50", 40], ["take", 43], ["place", 48], ["?", 53]], "detected_answers": [{"text": "February 7, 2016", "char_spans": [[334, 349]], "token_spans": [[60, 63]]}]}, {"answers": ["2015", "2016", "2016"], "question": "What year was Super Bowl 50?", "id": "56d20362e7d4791d009025e8", "qid": "c5187d183b494ccf969a15cd0c3039e2", "question_tokens": [["What", 0], ["year", 5], ["was", 10], ["Super", 14], ["Bowl", 20], ["50", 25], ["?", 27]], "detected_answers": [{"text": "2016", "char_spans": [[346, 349]], "token_spans": [[63, 63]]}]}, {"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "What team was the AFC champion?", "id": "56d20362e7d4791d009025e9", "qid": "6288b96ce9944dc1b391ff08b6bd8386", "question_tokens": [["What", 0], ["team", 5], ["was", 10], ["the", 14], ["AFC", 18], ["champion", 22], ["?", 30]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["Carolina Panthers", "Carolina Panthers", "Carolina Panthers"], "question": "What team was the NFC champion?", "id": "56d20362e7d4791d009025ea", "qid": "80edad8dc6254bd680100e36be2cfa98", "question_tokens": [["What", 0], ["team", 5], ["was", 10], ["the", 14], ["NFC", 18], ["champion", 22], ["?", 30]], "detected_answers": [{"text": "Carolina 
Panthers", "char_spans": [[249, 265]], "token_spans": [[44, 45]]}]}, {"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "Who won Super Bowl 50?", "id": "56d20362e7d4791d009025eb", "qid": "556c5788c4574cc78d53a241004c4e93", "question_tokens": [["Who", 0], ["won", 4], ["Super", 8], ["Bowl", 14], ["50", 19], ["?", 21]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["2015", "the 2015 season", "2015"], "question": "Super Bowl 50 determined the NFL champion for what season?", "id": "56d600e31c85041400946eae", "qid": "18d7493cca8a44db945ff16a2949e26d", "question_tokens": [["Super", 0], ["Bowl", 6], ["50", 11], ["determined", 14], ["the", 25], ["NFL", 29], ["champion", 33], ["for", 42], ["what", 46], ["season", 51], ["?", 57]], "detected_answers": [{"text": "2015", "char_spans": [[116, 119]], "token_spans": [[22, 22]]}]}, {"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "Which team won Super Bowl 50.", "id": "56d600e31c85041400946eb0", "qid": "6392df5f107a4acf9d96321f1e0c177d", "question_tokens": [["Which", 0], ["team", 6], ["won", 11], ["Super", 15], ["Bowl", 21], ["50", 26], [".", 28]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}, {"answers": ["Santa Clara, California.", "Levi's Stadium", "Levi's Stadium"], "question": "Where was Super Bowl 50 held?", "id": "56d600e31c85041400946eb1", "qid": "81485c83e23a45448e2b9d31a679d73b", "question_tokens": [["Where", 0], ["was", 6], ["Super", 10], ["Bowl", 16], ["50", 21], ["held", 24], ["?", 28]], "detected_answers": [{"text": "Levi's Stadium", "char_spans": [[355, 368]], "token_spans": [[66, 68]]}]}, {"answers": ["Super Bowl", "Super Bowl", "Super Bowl"], "question": "The name of the NFL championship game is?", "id": "56d9895ddc89441400fdb50e", "qid": "5668cdd5c25b4549856d628a3ec248d9", "question_tokens": [["The", 0], ["name", 4], ["of", 
9], ["the", 12], ["NFL", 16], ["championship", 20], ["game", 33], ["is", 38], ["?", 40]], "detected_answers": [{"text": "Super Bowl", "token_spans": [[0, 1], [86, 87], [51, 52], [114, 115], [131, 132]], "char_spans": [[0, 9], [449, 458], [293, 302], [609, 618], [693, 702]]}]}, {"answers": ["Denver Broncos", "Denver Broncos", "Denver Broncos"], "question": "What 2015 NFL team one the AFC playoff?", "id": "56d9895ddc89441400fdb510", "qid": "52d6568dd0b74a99866cad2599161a4a", "question_tokens": [["What", 0], ["2015", 5], ["NFL", 10], ["team", 14], ["one", 19], ["the", 23], ["AFC", 27], ["playoff", 31], ["?", 38]], "detected_answers": [{"text": "Denver Broncos", "char_spans": [[177, 190]], "token_spans": [[33, 34]]}]}], "context_tokens": [["Super", 0], ["Bowl", 6], ["50", 11], ["was", 14], ["an", 18], ["American", 21], ["football", 30], ["game", 39], ["to", 44], ["determine", 47], ["the", 57], ["champion", 61], ["of", 70], ["the", 73], ["National", 77], ["Football", 86], ["League", 95], ["(", 102], ["NFL", 103], [")", 106], ["for", 108], ["the", 112], ["2015", 116], ["season", 121], [".", 127], ["The", 129], ["American", 133], ["Football", 142], ["Conference", 151], ["(", 162], ["AFC", 163], [")", 166], ["champion", 168], ["Denver", 177], ["Broncos", 184], ["defeated", 192], ["the", 201], ["National", 205], ["Football", 214], ["Conference", 223], ["(", 234], ["NFC", 235], [")", 238], ["champion", 240], ["Carolina", 249], ["Panthers", 258], ["24\u201310", 267], ["to", 273], ["earn", 276], ["their", 281], ["third", 287], ["Super", 293], ["Bowl", 299], ["title", 304], [".", 309], ["The", 311], ["game", 315], ["was", 320], ["played", 324], ["on", 331], ["February", 334], ["7", 343], [",", 344], ["2016", 346], [",", 350], ["at", 352], ["Levi", 355], ["'s", 359], ["Stadium", 362], ["in", 370], ["the", 373], ["San", 377], ["Francisco", 381], ["Bay", 391], ["Area", 395], ["at", 400], ["Santa", 403], ["Clara", 409], [",", 414], ["California", 416], [".", 426], ["As", 428], 
["this", 431], ["was", 436], ["the", 440], ["50th", 444], ["Super", 449], ["Bowl", 455], [",", 459], ["the", 461], ["league", 465], ["emphasized", 472], ["the", 483], ["\"", 487], ["golden", 488], ["anniversary", 495], ["\"", 506], ["with", 508], ["various", 513], ["gold", 521], ["-", 525], ["themed", 526], ["initiatives", 533], [",", 544], ["as", 546], ["well", 549], ["as", 554], ["temporarily", 557], ["suspending", 569], ["the", 580], ["tradition", 584], ["of", 594], ["naming", 597], ["each", 604], ["Super", 609], ["Bowl", 615], ["game", 620], ["with", 625], ["Roman", 630], ["numerals", 636], ["(", 645], ["under", 646], ["which", 652], ["the", 658], ["game", 662], ["would", 667], ["have", 673], ["been", 678], ["known", 683], ["as", 689], ["\"", 692], ["Super", 693], ["Bowl", 699], ["L", 704], ["\"", 705], [")", 706], [",", 707], ["so", 709], ["that", 712], ["the", 717], ["logo", 721], ["could", 726], ["prominently", 732], ["feature", 744], ["the", 752], ["Arabic", 756], ["numerals", 763], ["50", 772], [".", 774]]}
{"id": "", "context": "The Broncos took an early lead in Super Bowl 50 and never trailed. Newton was limited by Denver's defense, which sacked him seven times and forced him into three turnovers, including a fumble which they recovered for a touchdown. Denver linebacker Von Miller was named Super Bowl MVP, recording five solo tackles, 2\u00bd sacks, and two forced fumbles.", "qas": [{"answers": ["Von Miller", "Von Miller", "Miller"], "question": "Who was the Super Bowl 50 MVP?", "id": "56be4eafacb8001400a50302", "qid": "fd7bfb38f688441087d80a0351b57a67", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["Super", 12], ["Bowl", 18], ["50", 23], ["MVP", 26], ["?", 29]], "detected_answers": [{"text": "Miller", "char_spans": [[252, 257]], "token_spans": [[47, 47]]}]}, {"answers": ["2", "two", "two"], "question": "How many fumbles did Von Miller force in Super Bowl 50?", "id": "56be4eafacb8001400a50303", "qid": "5b79e7b38c4144318840802650a9dad7", "question_tokens": [["How", 0], ["many", 4], ["fumbles", 9], ["did", 17], ["Von", 21], ["Miller", 25], ["force", 32], ["in", 38], ["Super", 41], ["Bowl", 47], ["50", 52], ["?", 54]], "detected_answers": [{"text": "two", "char_spans": [[328, 330]], "token_spans": [[63, 63]]}]}, {"answers": ["Broncos", "The Broncos", "Broncos"], "question": "Which team held the scoring lead throughout the entire game?", "id": "56be4eafacb8001400a50304", "qid": "8eb67b9ad5dc44d0b807662d713368df", "question_tokens": [["Which", 0], ["team", 6], ["held", 11], ["the", 16], ["scoring", 20], ["lead", 28], ["throughout", 33], ["the", 44], ["entire", 48], ["game", 55], ["?", 59]], "detected_answers": [{"text": "Broncos", "char_spans": [[4, 10]], "token_spans": [[1, 1]]}]}, {"answers": ["linebacker Von Miller", "Von Miller", "Miller"], "question": "Which Denver linebacker was named Super Bowl MVP?", "id": "56beab833aeaaa14008c91d2", "qid": "bfa155f66d054ed8a2bd324ffd07f306", "question_tokens": [["Which", 0], ["Denver", 6], ["linebacker", 13], ["was", 
24], ["named", 28], ["Super", 34], ["Bowl", 40], ["MVP", 45], ["?", 48]], "detected_answers": [{"text": "Miller", "char_spans": [[252, 257]], "token_spans": [[47, 47]]}]}, {"answers": ["five solo tackles", "five", "five"], "question": "How many solo tackles did Von Miller make at Super Bowl 50?", "id": "56beab833aeaaa14008c91d3", "qid": "52db8fd9a50a405286d50511f0cbdc01", "question_tokens": [["How", 0], ["many", 4], ["solo", 9], ["tackles", 14], ["did", 22], ["Von", 26], ["Miller", 30], ["make", 37], ["at", 42], ["Super", 45], ["Bowl", 51], ["50", 56], ["?", 58]], "detected_answers": [{"text": "five", "char_spans": [[295, 298]], "token_spans": [[55, 55]]}]}, {"answers": ["Newton was limited by Denver's defense", "Newton", "Newton"], "question": "Who was limited by Denver's defense?", "id": "56beab833aeaaa14008c91d4", "qid": "aa128287be4c4c259f508edb9cf10649", "question_tokens": [["Who", 0], ["was", 4], ["limited", 8], ["by", 16], ["Denver", 19], ["'s", 25], ["defense", 28], ["?", 35]], "detected_answers": [{"text": "Newton", "char_spans": [[67, 72]], "token_spans": [[14, 14]]}]}, {"answers": ["seven", "seven", "seven"], "question": "How many times was Cam Newton sacked?", "id": "56beae423aeaaa14008c91f4", "qid": "346105c83f374c7d850fc3851add1c0e", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["was", 15], ["Cam", 19], ["Newton", 23], ["sacked", 30], ["?", 36]], "detected_answers": [{"text": "seven", "char_spans": [[124, 128]], "token_spans": [[25, 25]]}]}, {"answers": ["Von Miller", "The Broncos", "Miller"], "question": "Who won the Super Bowl MVP?", "id": "56beae423aeaaa14008c91f5", "qid": "419e07351e714aea8c9f8f4768193d42", "question_tokens": [["Who", 0], ["won", 4], ["the", 8], ["Super", 12], ["Bowl", 18], ["MVP", 23], ["?", 26]], "detected_answers": [{"text": "Miller", "char_spans": [[252, 257]], "token_spans": [[47, 47]]}]}, {"answers": ["three", "three", "three"], "question": "How many turnovers did Cam Newton have?", "id": 
"56beae423aeaaa14008c91f6", "qid": "e3458adb99b1445ba9f0c15bfb1a835b", "question_tokens": [["How", 0], ["many", 4], ["turnovers", 9], ["did", 19], ["Cam", 23], ["Newton", 27], ["have", 34], ["?", 38]], "detected_answers": [{"text": "three", "char_spans": [[156, 160]], "token_spans": [[31, 31]]}]}, {"answers": ["two", "two", "two"], "question": "How many fumbles did Von Miller force?", "id": "56beae423aeaaa14008c91f7", "qid": "4212722392fb404e9a51f825307ef039", "question_tokens": [["How", 0], ["many", 4], ["fumbles", 9], ["did", 17], ["Von", 21], ["Miller", 25], ["force", 32], ["?", 37]], "detected_answers": [{"text": "two", "char_spans": [[328, 330]], "token_spans": [[63, 63]]}]}, {"answers": ["Von Miller", "Von Miller", "Miller"], "question": "Who was given the esteemed status of MVP for Super Bowl 50?", "id": "56bf17653aeaaa14008c9511", "qid": "96a72d8e442a4f79a95e7a10343fa74d", "question_tokens": [["Who", 0], ["was", 4], ["given", 8], ["the", 14], ["esteemed", 18], ["status", 27], ["of", 34], ["MVP", 37], ["for", 41], ["Super", 45], ["Bowl", 51], ["50", 56], ["?", 58]], "detected_answers": [{"text": "Miller", "char_spans": [[252, 257]], "token_spans": [[47, 47]]}]}, {"answers": ["linebacker", "linebacker", "linebacker"], "question": "What position does Von Miller play for the Denver Broncos?", "id": "56bf17653aeaaa14008c9513", "qid": "45016738a27c46cc85e841bd27ddde8f", "question_tokens": [["What", 0], ["position", 5], ["does", 14], ["Von", 19], ["Miller", 23], ["play", 30], ["for", 35], ["the", 39], ["Denver", 43], ["Broncos", 50], ["?", 57]], "detected_answers": [{"text": "linebacker", "char_spans": [[237, 246]], "token_spans": [[45, 45]]}]}, {"answers": ["5", "five", "five"], "question": "What was the number of solo tackles that Von Miller had in Super Bowl 50?", "id": "56bf17653aeaaa14008c9514", "qid": "659ef466d7aa45eb9cd9074c865831e0", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["number", 13], ["of", 20], ["solo", 23], ["tackles", 28], 
["that", 36], ["Von", 41], ["Miller", 45], ["had", 52], ["in", 56], ["Super", 59], ["Bowl", 65], ["50", 70], ["?", 72]], "detected_answers": [{"text": "five", "char_spans": [[295, 298]], "token_spans": [[55, 55]]}]}, {"answers": ["2", "two", "two"], "question": "How many forced fumbles did Von Miller have during the Super Bowl 50 game?", "id": "56bf17653aeaaa14008c9515", "qid": "a0fa827ce3e94fa2be0997ba37a5cfe6", "question_tokens": [["How", 0], ["many", 4], ["forced", 9], ["fumbles", 16], ["did", 24], ["Von", 28], ["Miller", 32], ["have", 39], ["during", 44], ["the", 51], ["Super", 55], ["Bowl", 61], ["50", 66], ["game", 69], ["?", 73]], "detected_answers": [{"text": "two", "char_spans": [[328, 330]], "token_spans": [[63, 63]]}]}, {"answers": ["Von Miller", "Von Miller", "Von Miller"], "question": "Who won the MVP for the Super Bowl?", "id": "56d204ade7d4791d00902603", "qid": "277ba1c1c0b640da94f7959433306e9d", "question_tokens": [["Who", 0], ["won", 4], ["the", 8], ["MVP", 12], ["for", 16], ["the", 20], ["Super", 24], ["Bowl", 30], ["?", 34]], "detected_answers": [{"text": "Von Miller", "char_spans": [[248, 257]], "token_spans": [[46, 47]]}]}, {"answers": ["5", "five", "five"], "question": "How many tackles did Von Miller get during the game?", "id": "56d204ade7d4791d00902604", "qid": "eacb604a58ae461c8becaf241d20fc94", "question_tokens": [["How", 0], ["many", 4], ["tackles", 9], ["did", 17], ["Von", 21], ["Miller", 25], ["get", 32], ["during", 36], ["the", 43], ["game", 47], ["?", 51]], "detected_answers": [{"text": "five", "char_spans": [[295, 298]], "token_spans": [[55, 55]]}]}, {"answers": ["seven", "seven", "seven"], "question": "How many times was Cam Newton sacked in Super Bowl 50?", "id": "56d601e41c85041400946ece", "qid": "2d10f1eb017749d286b27d931c11ad1c", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["was", 15], ["Cam", 19], ["Newton", 23], ["sacked", 30], ["in", 37], ["Super", 40], ["Bowl", 46], ["50", 51], ["?", 53]], "detected_answers": 
[{"text": "seven", "char_spans": [[124, 128]], "token_spans": [[25, 25]]}]}, {"answers": ["three", "three", "three"], "question": "How many times did the Denver defense force Newton into turnovers?", "id": "56d601e41c85041400946ecf", "qid": "25e7b1192a3f46b29939ca3d24ba84d1", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["did", 15], ["the", 19], ["Denver", 23], ["defense", 30], ["force", 38], ["Newton", 44], ["into", 51], ["turnovers", 56], ["?", 65]], "detected_answers": [{"text": "three", "char_spans": [[156, 160]], "token_spans": [[31, 31]]}]}, {"answers": ["a fumble", "a fumble", "fumble"], "question": "Which Newton turnover resulted in seven points for Denver?", "id": "56d601e41c85041400946ed0", "qid": "f5a36768278949f2a141aaac2b35a90d", "question_tokens": [["Which", 0], ["Newton", 6], ["turnover", 13], ["resulted", 22], ["in", 31], ["seven", 34], ["points", 40], ["for", 47], ["Denver", 51], ["?", 57]], "detected_answers": [{"text": "fumble", "char_spans": [[185, 190]], "token_spans": [[36, 36]]}]}, {"answers": ["Von Miller", "Von Miller", "Von Miller"], "question": "Who was the Most Valuable Player of Super Bowl 50?", "id": "56d601e41c85041400946ed1", "qid": "cdf0925e33f84a06991a7a956f04aa72", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["Most", 12], ["Valuable", 17], ["Player", 26], ["of", 33], ["Super", 36], ["Bowl", 42], ["50", 47], ["?", 49]], "detected_answers": [{"text": "Von Miller", "char_spans": [[248, 257]], "token_spans": [[46, 47]]}]}, {"answers": ["linebacker", "linebacker", "linebacker"], "question": "What position does Von Miller play?", "id": "56d601e41c85041400946ed2", "qid": "056e4a0f86af48ed8e52b5a5bbb20a9e", "question_tokens": [["What", 0], ["position", 5], ["does", 14], ["Von", 19], ["Miller", 23], ["play", 30], ["?", 34]], "detected_answers": [{"text": "linebacker", "char_spans": [[237, 246]], "token_spans": [[45, 45]]}]}, {"answers": ["seven", "seven", "seven"], "question": "How many times was the Panthers' 
quarterback sacked?", "id": "56d98b33dc89441400fdb53b", "qid": "28add3f364454f83ad3c97bbeeef265e", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["was", 15], ["the", 19], ["Panthers", 23], ["'", 31], ["quarterback", 33], ["sacked", 45], ["?", 51]], "detected_answers": [{"text": "seven", "char_spans": [[124, 128]], "token_spans": [[25, 25]]}]}, {"answers": ["three", "three", "three"], "question": "How many times did the Broncos cause turnovers in the game?", "id": "56d98b33dc89441400fdb53c", "qid": "d1794bb6911644c890519cdcb177ee1f", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["did", 15], ["the", 19], ["Broncos", 23], ["cause", 31], ["turnovers", 37], ["in", 47], ["the", 50], ["game", 54], ["?", 58]], "detected_answers": [{"text": "three", "char_spans": [[156, 160]], "token_spans": [[31, 31]]}]}, {"answers": ["Von Miller", "Von Miller", "Von Miller"], "question": "What Denver player caused two fumbles for the Panthers?", "id": "56d98b33dc89441400fdb53d", "qid": "7bb2d15a021247b1b3d162659e7dd9f3", "question_tokens": [["What", 0], ["Denver", 5], ["player", 12], ["caused", 19], ["two", 26], ["fumbles", 30], ["for", 38], ["the", 42], ["Panthers", 46], ["?", 54]], "detected_answers": [{"text": "Von Miller", "char_spans": [[248, 257]], "token_spans": [[46, 47]]}]}, {"answers": ["five", "five", "five"], "question": "How many tackles did Von Miller accomlish by himself in the game?", "id": "56d98b33dc89441400fdb53e", "qid": "1e24127c2a4340c68af60b0d65bf2321", "question_tokens": [["How", 0], ["many", 4], ["tackles", 9], ["did", 17], ["Von", 21], ["Miller", 25], ["accomlish", 32], ["by", 42], ["himself", 45], ["in", 53], ["the", 56], ["game", 60], ["?", 64]], "detected_answers": [{"text": "five", "char_spans": [[295, 298]], "token_spans": [[55, 55]]}]}], "context_tokens": [["The", 0], ["Broncos", 4], ["took", 12], ["an", 17], ["early", 20], ["lead", 26], ["in", 31], ["Super", 34], ["Bowl", 40], ["50", 45], ["and", 48], ["never", 52], ["trailed", 
58], [".", 65], ["Newton", 67], ["was", 74], ["limited", 78], ["by", 86], ["Denver", 89], ["'s", 95], ["defense", 98], [",", 105], ["which", 107], ["sacked", 113], ["him", 120], ["seven", 124], ["times", 130], ["and", 136], ["forced", 140], ["him", 147], ["into", 151], ["three", 156], ["turnovers", 162], [",", 171], ["including", 173], ["a", 183], ["fumble", 185], ["which", 192], ["they", 198], ["recovered", 203], ["for", 213], ["a", 217], ["touchdown", 219], [".", 228], ["Denver", 230], ["linebacker", 237], ["Von", 248], ["Miller", 252], ["was", 259], ["named", 263], ["Super", 269], ["Bowl", 275], ["MVP", 280], [",", 283], ["recording", 285], ["five", 295], ["solo", 300], ["tackles", 305], [",", 312], ["2\u00bd", 314], ["sacks", 317], [",", 322], ["and", 324], ["two", 328], ["forced", 332], ["fumbles", 339], [".", 346]]}
{"id": "", "context": "The Panthers finished the regular season with a 15\u20131 record, and quarterback Cam Newton was named the NFL Most Valuable Player (MVP). They defeated the Arizona Cardinals 49\u201315 in the NFC Championship Game and advanced to their second Super Bowl appearance since the franchise was founded in 1995. The Broncos finished the regular season with a 12\u20134 record, and denied the New England Patriots a chance to defend their title from Super Bowl XLIX by defeating them 20\u201318 in the AFC Championship Game. They joined the Patriots, Dallas Cowboys, and Pittsburgh Steelers as one of four teams that have made eight appearances in the Super Bowl.", "qas": [{"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Which Carolina Panthers player was named Most Valuable Player?", "id": "56be4e1facb8001400a502f6", "qid": "da07218228a644c1857fd6ccb910ae72", "question_tokens": [["Which", 0], ["Carolina", 6], ["Panthers", 15], ["player", 24], ["was", 31], ["named", 35], ["Most", 41], ["Valuable", 46], ["Player", 55], ["?", 61]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["8", "eight", "eight"], "question": "How many appearances have the Denver Broncos made in the Super Bowl?", "id": "56be4e1facb8001400a502f9", "qid": "49a498b05c4a4acd85397de984cf2188", "question_tokens": [["How", 0], ["many", 4], ["appearances", 9], ["have", 21], ["the", 26], ["Denver", 30], ["Broncos", 37], ["made", 45], ["in", 50], ["the", 53], ["Super", 57], ["Bowl", 63], ["?", 67]], "detected_answers": [{"text": "eight", "char_spans": [[601, 605]], "token_spans": [[109, 109]]}]}, {"answers": ["1995", "1995", "1995"], "question": "What year was the Carolina Panthers franchise founded?", "id": "56be4e1facb8001400a502fa", "qid": "3e7b4eb8b2224ed89647e15fd9d6cd23", "question_tokens": [["What", 0], ["year", 5], ["was", 10], ["the", 14], ["Carolina", 18], ["Panthers", 27], ["franchise", 36], 
["founded", 46], ["?", 53]], "detected_answers": [{"text": "1995", "char_spans": [[291, 294]], "token_spans": [[51, 51]]}]}, {"answers": ["Arizona Cardinals", "the Arizona Cardinals", "Arizona Cardinals"], "question": "What team did the Panthers defeat?", "id": "56beaa4a3aeaaa14008c91c2", "qid": "323d496b4dbe465d9eae68f98bf610cb", "question_tokens": [["What", 0], ["team", 5], ["did", 10], ["the", 14], ["Panthers", 18], ["defeat", 27], ["?", 33]], "detected_answers": [{"text": "Arizona Cardinals", "char_spans": [[152, 168]], "token_spans": [[29, 30]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Who did the Broncos prevent from going to the Super Bowl?", "id": "56beaa4a3aeaaa14008c91c3", "qid": "ecdf89b712b84cbc8372a41495f62571", "question_tokens": [["Who", 0], ["did", 4], ["the", 8], ["Broncos", 12], ["prevent", 20], ["from", 28], ["going", 33], ["to", 39], ["the", 42], ["Super", 46], ["Bowl", 52], ["?", 56]], "detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["Arizona Cardinals", "the Arizona Cardinals", "Arizona Cardinals"], "question": "Who did the Panthers beat in the NFC Championship Game?", "id": "56bead5a3aeaaa14008c91e9", "qid": "9e40a8ebc8564507b738a65605b9a67e", "question_tokens": [["Who", 0], ["did", 4], ["the", 8], ["Panthers", 12], ["beat", 21], ["in", 26], ["the", 29], ["NFC", 33], ["Championship", 37], ["Game", 50], ["?", 54]], "detected_answers": [{"text": "Arizona Cardinals", "char_spans": [[152, 168]], "token_spans": [[29, 30]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Who lost to the Broncos in the AFC Championship?", "id": "56bead5a3aeaaa14008c91ea", "qid": "1b9a02825aa04d5081e10af76a72440f", "question_tokens": [["Who", 0], ["lost", 4], ["to", 9], ["the", 12], ["Broncos", 16], ["in", 24], ["the", 27], ["AFC", 31], ["Championship", 35], ["?", 47]], 
"detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Who were the defending Super Bowl champions?", "id": "56bead5a3aeaaa14008c91eb", "qid": "9af08f972fde41f9922d06b3f4f10c9a", "question_tokens": [["Who", 0], ["were", 4], ["the", 9], ["defending", 13], ["Super", 23], ["Bowl", 29], ["champions", 34], ["?", 43]], "detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["four", "four", "four"], "question": "How many teams have been in the Super Bowl eight times?", "id": "56bead5a3aeaaa14008c91ec", "qid": "0c53b9deba6e4052833488cdfd5104e3", "question_tokens": [["How", 0], ["many", 4], ["teams", 9], ["have", 15], ["been", 20], ["in", 25], ["the", 28], ["Super", 32], ["Bowl", 38], ["eight", 43], ["times", 49], ["?", 54]], "detected_answers": [{"text": "four", "char_spans": [[575, 578]], "token_spans": [[104, 104]]}]}, {"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Who was this season's NFL MVP?", "id": "56bead5a3aeaaa14008c91ed", "qid": "6bf2e3769c5b47d899142e101e4f7ef6", "question_tokens": [["Who", 0], ["was", 4], ["this", 8], ["season", 13], ["'s", 19], ["NFL", 22], ["MVP", 26], ["?", 29]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["15\u20131", "15\u20131", "15\u20131"], "question": "What was the win/loss ratio in 2015 for the Carolina Panthers during their regular season?", "id": "56bf159b3aeaaa14008c9507", "qid": "437d2865eb1843b9b7d2b307b47aefc0", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["win", 13], ["/", 16], ["loss", 17], ["ratio", 22], ["in", 28], ["2015", 31], ["for", 36], ["the", 40], ["Carolina", 44], ["Panthers", 53], ["during", 62], ["their", 69], ["regular", 75], ["season", 83], ["?", 89]], "detected_answers": [{"text": 
"15\u20131", "char_spans": [[48, 51]], "token_spans": [[8, 8]]}]}, {"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Which Carolina Panthers team member was picked as the team's MVP in 2015? ", "id": "56bf159b3aeaaa14008c9508", "qid": "6feefc79aadb4cc2a2dbe29b3de1932d", "question_tokens": [["Which", 0], ["Carolina", 6], ["Panthers", 15], ["team", 24], ["member", 29], ["was", 36], ["picked", 40], ["as", 47], ["the", 50], ["team", 54], ["'s", 58], ["MVP", 61], ["in", 65], ["2015", 68], ["?", 72]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["12\u20134", "12\u20134", "12\u20134"], "question": "What were the win/loss game stats for the Denver Bronco's regular season in 2015?", "id": "56bf159b3aeaaa14008c9509", "qid": "8a18fb154ca84cbdacb07e149efaaebd", "question_tokens": [["What", 0], ["were", 5], ["the", 10], ["win", 14], ["/", 17], ["loss", 18], ["game", 23], ["stats", 28], ["for", 34], ["the", 38], ["Denver", 42], ["Bronco", 49], ["'s", 55], ["regular", 58], ["season", 66], ["in", 73], ["2015", 76], ["?", 80]], "detected_answers": [{"text": "12\u20134", "char_spans": [[344, 347]], "token_spans": [[61, 61]]}]}, {"answers": ["4", "four", "four"], "question": "How many teams have played in the Super Bowl eight times?", "id": "56bf159b3aeaaa14008c950a", "qid": "909ce53c05864cc1b802bf64daa96a57", "question_tokens": [["How", 0], ["many", 4], ["teams", 9], ["have", 15], ["played", 20], ["in", 27], ["the", 30], ["Super", 34], ["Bowl", 40], ["eight", 45], ["times", 51], ["?", 56]], "detected_answers": [{"text": "four", "char_spans": [[575, 578]], "token_spans": [[104, 104]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Which team did not get a chance to defend their Super Bowl XLIX win in Super Bowl 50?", "id": "56bf159b3aeaaa14008c950b", "qid": "4415e45815e14a0985ef22441f8a6ddc", "question_tokens": [["Which", 0], ["team", 6], 
["did", 11], ["not", 15], ["get", 19], ["a", 23], ["chance", 25], ["to", 32], ["defend", 35], ["their", 42], ["Super", 48], ["Bowl", 54], ["XLIX", 59], ["win", 64], ["in", 68], ["Super", 71], ["Bowl", 77], ["50", 82], ["?", 84]], "detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Who is the quarterback for the Panthers?", "id": "56d2045de7d4791d009025f3", "qid": "e7a5911b99434f94b94b1fc214d78255", "question_tokens": [["Who", 0], ["is", 4], ["the", 7], ["quarterback", 11], ["for", 23], ["the", 27], ["Panthers", 31], ["?", 39]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["Arizona Cardinals", "the Arizona Cardinals", "Arizona Cardinals"], "question": "Who did Carolina beat in the NFC championship game?", "id": "56d2045de7d4791d009025f4", "qid": "5d1fa83a07cb49f8aac96a4d5442e160", "question_tokens": [["Who", 0], ["did", 4], ["Carolina", 8], ["beat", 17], ["in", 22], ["the", 25], ["NFC", 29], ["championship", 33], ["game", 46], ["?", 50]], "detected_answers": [{"text": "Arizona Cardinals", "char_spans": [[152, 168]], "token_spans": [[29, 30]]}]}, {"answers": ["2", "second", "second"], "question": "How many times have the Panthers been in the Super Bowl?", "id": "56d2045de7d4791d009025f5", "qid": "d4667206b27343018306c903f6bc6a99", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["have", 15], ["the", 20], ["Panthers", 24], ["been", 33], ["in", 38], ["the", 41], ["Super", 45], ["Bowl", 51], ["?", 55]], "detected_answers": [{"text": "second", "char_spans": [[227, 232]], "token_spans": [[41, 41]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Who did Denver beat in the AFC championship?", "id": "56d2045de7d4791d009025f6", "qid": "0c538848194644f9903c7fb60dd171e6", "question_tokens": [["Who", 0], ["did", 4], 
["Denver", 8], ["beat", 15], ["in", 20], ["the", 23], ["AFC", 27], ["championship", 31], ["?", 43]], "detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Who was the Most Valuable Player for the 2015 NFL season?", "id": "56d6017d1c85041400946ebe", "qid": "4d3d83197bba4f518feb689ef7f60f8a", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["Most", 12], ["Valuable", 17], ["Player", 26], ["for", 33], ["the", 37], ["2015", 41], ["NFL", 46], ["season", 50], ["?", 56]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["New England Patriots", "the New England Patriots", "New England Patriots"], "question": "Who did Denver beat in the 2015 AFC Championship game?", "id": "56d6017d1c85041400946ec1", "qid": "63bed63e1ba54b279f24036708cfddf2", "question_tokens": [["Who", 0], ["did", 4], ["Denver", 8], ["beat", 15], ["in", 20], ["the", 23], ["2015", 27], ["AFC", 32], ["Championship", 36], ["game", 49], ["?", 53]], "detected_answers": [{"text": "New England Patriots", "char_spans": [[372, 391]], "token_spans": [[67, 69]]}]}, {"answers": ["Arizona Cardinals", "the Arizona Cardinals", "Arizona Cardinals"], "question": "Who did the Carolina Panthers beat in the 2015 NFC Championship game?", "id": "56d6017d1c85041400946ec2", "qid": "4b5d0e2322fe49ccb9a5448a73d88622", "question_tokens": [["Who", 0], ["did", 4], ["the", 8], ["Carolina", 12], ["Panthers", 21], ["beat", 30], ["in", 35], ["the", 38], ["2015", 42], ["NFC", 47], ["Championship", 51], ["game", 64], ["?", 68]], "detected_answers": [{"text": "Arizona Cardinals", "char_spans": [[152, 168]], "token_spans": [[29, 30]]}]}, {"answers": ["Cam Newton", "Cam Newton", "Cam Newton"], "question": "Who was the 2015 NFL MVP?", "id": "56d98a59dc89441400fdb52a", "qid": "1765057242ee422e82a04197605bfd07", "question_tokens": [["Who", 0], ["was", 
4], ["the", 8], ["2015", 12], ["NFL", 17], ["MVP", 21], ["?", 24]], "detected_answers": [{"text": "Cam Newton", "char_spans": [[77, 86]], "token_spans": [[13, 14]]}]}, {"answers": ["Arizona Cardinals", "the Arizona Cardinals", "Arizona Cardinals"], "question": "Who did the Panthers beat to become the NFC champs?", "id": "56d98a59dc89441400fdb52b", "qid": "7cc6f913c3334f6bacff45bc1d39eb11", "question_tokens": [["Who", 0], ["did", 4], ["the", 8], ["Panthers", 12], ["beat", 21], ["to", 26], ["become", 29], ["the", 36], ["NFC", 40], ["champs", 44], ["?", 50]], "detected_answers": [{"text": "Arizona Cardinals", "char_spans": [[152, 168]], "token_spans": [[29, 30]]}]}, {"answers": ["1995.", "1995", "1995"], "question": "What year did the Carolina Panthers form?", "id": "56d98a59dc89441400fdb52e", "qid": "3ce823a0265c42138af5f46290aee69d", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["the", 14], ["Carolina", 18], ["Panthers", 27], ["form", 36], ["?", 40]], "detected_answers": [{"text": "1995", "char_spans": [[291, 294]], "token_spans": [[51, 51]]}]}], "context_tokens": [["The", 0], ["Panthers", 4], ["finished", 13], ["the", 22], ["regular", 26], ["season", 34], ["with", 41], ["a", 46], ["15\u20131", 48], ["record", 53], [",", 59], ["and", 61], ["quarterback", 65], ["Cam", 77], ["Newton", 81], ["was", 88], ["named", 92], ["the", 98], ["NFL", 102], ["Most", 106], ["Valuable", 111], ["Player", 120], ["(", 127], ["MVP", 128], [")", 131], [".", 132], ["They", 134], ["defeated", 139], ["the", 148], ["Arizona", 152], ["Cardinals", 160], ["49\u201315", 170], ["in", 176], ["the", 179], ["NFC", 183], ["Championship", 187], ["Game", 200], ["and", 205], ["advanced", 209], ["to", 218], ["their", 221], ["second", 227], ["Super", 234], ["Bowl", 240], ["appearance", 245], ["since", 256], ["the", 262], ["franchise", 266], ["was", 276], ["founded", 280], ["in", 288], ["1995", 291], [".", 295], ["The", 297], ["Broncos", 301], ["finished", 309], ["the", 318], ["regular", 322], 
["season", 330], ["with", 337], ["a", 342], ["12\u20134", 344], ["record", 349], [",", 355], ["and", 357], ["denied", 361], ["the", 368], ["New", 372], ["England", 376], ["Patriots", 384], ["a", 393], ["chance", 395], ["to", 402], ["defend", 405], ["their", 412], ["title", 418], ["from", 424], ["Super", 429], ["Bowl", 435], ["XLIX", 440], ["by", 445], ["defeating", 448], ["them", 458], ["20\u201318", 463], ["in", 469], ["the", 472], ["AFC", 476], ["Championship", 480], ["Game", 493], [".", 497], ["They", 499], ["joined", 504], ["the", 511], ["Patriots", 515], [",", 523], ["Dallas", 525], ["Cowboys", 532], [",", 539], ["and", 541], ["Pittsburgh", 545], ["Steelers", 556], ["as", 565], ["one", 568], ["of", 572], ["four", 575], ["teams", 580], ["that", 586], ["have", 591], ["made", 596], ["eight", 601], ["appearances", 607], ["in", 619], ["the", 622], ["Super", 626], ["Bowl", 632], [".", 636]]}
{"id": "", "context": "In early 2012, NFL Commissioner Roger Goodell stated that the league planned to make the 50th Super Bowl \"spectacular\" and that it would be \"an important game for us as a league\".", "qas": [{"answers": ["Roger Goodell", "Roger Goodell", "Goodell"], "question": "Who was the NFL Commissioner in early 2012?", "id": "56be53b8acb8001400a50314", "qid": "4a254acd87b748b5b285642219aa07af", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["NFL", 12], ["Commissioner", 16], ["in", 29], ["early", 32], ["2012", 38], ["?", 42]], "detected_answers": [{"text": "Goodell", "char_spans": [[38, 44]], "token_spans": [[7, 7]]}]}, {"answers": ["the 50th Super Bowl", "the 50th", "50th"], "question": "Which Super Bowl did Roger Goodell speak about?", "id": "56be53b8acb8001400a50315", "qid": "0e71ff5d91a049f2913aa1d7c79e1d4e", "question_tokens": [["Which", 0], ["Super", 6], ["Bowl", 12], ["did", 17], ["Roger", 21], ["Goodell", 27], ["speak", 35], ["about", 41], ["?", 46]], "detected_answers": [{"text": "50th", "char_spans": [[89, 92]], "token_spans": [[16, 16]]}]}, {"answers": ["2012", "2012", "2012"], "question": "In what year did Roger Goodell call Super Bowl 50 'an important game for us as a league'?", "id": "56be53b8acb8001400a50316", "qid": "9b27dff5b8064c328e37b252b1546ec7", "question_tokens": [["In", 0], ["what", 3], ["year", 8], ["did", 13], ["Roger", 17], ["Goodell", 23], ["call", 31], ["Super", 36], ["Bowl", 42], ["50", 47], ["'", 50], ["an", 51], ["important", 54], ["game", 64], ["for", 69], ["us", 73], ["as", 76], ["a", 79], ["league", 81], ["'", 87], ["?", 88]], "detected_answers": [{"text": "2012", "char_spans": [[9, 12]], "token_spans": [[2, 2]]}]}, {"answers": ["Roger Goodell", "Roger Goodell", "Goodell"], "question": "Who is the Commissioner of the National Football League?", "id": "56beafca3aeaaa14008c9207", "qid": "ab3d30def72b4174826a713e8f572f0c", "question_tokens": [["Who", 0], ["is", 4], ["the", 7], ["Commissioner", 11], ["of", 24], 
["the", 27], ["National", 31], ["Football", 40], ["League", 49], ["?", 55]], "detected_answers": [{"text": "Goodell", "char_spans": [[38, 44]], "token_spans": [[7, 7]]}]}, {"answers": ["early 2012", "In early 2012", "2012"], "question": "When did he make the quoted remarks about Super Bowl 50?", "id": "56beafca3aeaaa14008c9208", "qid": "a8ee01c022a64793bb108576e29f8586", "question_tokens": [["When", 0], ["did", 5], ["he", 9], ["make", 12], ["the", 17], ["quoted", 21], ["remarks", 28], ["about", 36], ["Super", 42], ["Bowl", 48], ["50", 53], ["?", 55]], "detected_answers": [{"text": "2012", "char_spans": [[9, 12]], "token_spans": [[2, 2]]}]}, {"answers": ["Roger Goodell", "Roger Goodell", "Goodell"], "question": "Who was the commissioner of the NFL in 2012? ", "id": "56bf42f53aeaaa14008c95a3", "qid": "badec4360c174c3aabc71178828a274c", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["commissioner", 12], ["of", 25], ["the", 28], ["NFL", 32], ["in", 36], ["2012", 39], ["?", 43]], "detected_answers": [{"text": "Goodell", "char_spans": [[38, 44]], "token_spans": [[7, 7]]}]}, {"answers": ["Roger Goodell", "Roger Goodell", "Goodell"], "question": "Who if the commissioner of the NFL?", "id": "56d2053ae7d4791d00902610", "qid": "9f3e1e612d60489d920c5bba8d856be3", "question_tokens": [["Who", 0], ["if", 4], ["the", 7], ["commissioner", 11], ["of", 24], ["the", 27], ["NFL", 31], ["?", 34]], "detected_answers": [{"text": "Goodell", "char_spans": [[38, 44]], "token_spans": [[7, 7]]}]}, {"answers": ["Roger Goodell", "Roger Goodell", "Goodell"], "question": "Who is the commissioner of the NFL?", "id": "56d6edd00d65d21400198250", "qid": "2bc59ee77b55423b803ab82039f4a8ad", "question_tokens": [["Who", 0], ["is", 4], ["the", 7], ["commissioner", 11], ["of", 24], ["the", 27], ["NFL", 31], ["?", 34]], "detected_answers": [{"text": "Goodell", "char_spans": [[38, 44]], "token_spans": [[7, 7]]}]}, {"answers": ["spectacular", "an important game for us as a league", "spectacular"], 
"question": "In early 2012, Goodell said that Super Bowl 50 would be what?", "id": "56d6edd00d65d21400198251", "qid": "d61a05770159444da1a4243481a3f2cd", "question_tokens": [["In", 0], ["early", 3], ["2012", 9], [",", 13], ["Goodell", 15], ["said", 23], ["that", 28], ["Super", 33], ["Bowl", 39], ["50", 44], ["would", 47], ["be", 53], ["what", 56], ["?", 60]], "detected_answers": [{"text": "spectacular", "char_spans": [[106, 116]], "token_spans": [[20, 20]]}]}, {"answers": ["spectacular", "spectacular", "spectacular"], "question": "What one word did the NFL commissioner use to describe what Super Bowl 50 was intended to be?", "id": "56d98d0adc89441400fdb54e", "qid": "79205f275aa3488fa6f4d8d35adc0dad", "question_tokens": [["What", 0], ["one", 5], ["word", 9], ["did", 14], ["the", 18], ["NFL", 22], ["commissioner", 26], ["use", 39], ["to", 43], ["describe", 46], ["what", 55], ["Super", 60], ["Bowl", 66], ["50", 71], ["was", 74], ["intended", 78], ["to", 87], ["be", 90], ["?", 92]], "detected_answers": [{"text": "spectacular", "char_spans": [[106, 116]], "token_spans": [[20, 20]]}]}, {"answers": ["2012", "2012", "2012"], "question": "What year did Roger Goodell announce that Super Bowl 50 would be \"important\"?", "id": "56d98d0adc89441400fdb54f", "qid": "281dd246f1eb4df29ce5fa1c9b7e2df1", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["Roger", 14], ["Goodell", 20], ["announce", 28], ["that", 37], ["Super", 42], ["Bowl", 48], ["50", 53], ["would", 56], ["be", 62], ["\"", 65], ["important", 66], ["\"", 75], ["?", 76]], "detected_answers": [{"text": "2012", "char_spans": [[9, 12]], "token_spans": [[2, 2]]}]}], "context_tokens": [["In", 0], ["early", 3], ["2012", 9], [",", 13], ["NFL", 15], ["Commissioner", 19], ["Roger", 32], ["Goodell", 38], ["stated", 46], ["that", 53], ["the", 58], ["league", 62], ["planned", 69], ["to", 77], ["make", 80], ["the", 85], ["50th", 89], ["Super", 94], ["Bowl", 100], ["\"", 105], ["spectacular", 106], ["\"", 117], ["and", 
119], ["that", 123], ["it", 128], ["would", 131], ["be", 137], ["\"", 140], ["an", 141], ["important", 144], ["game", 154], ["for", 159], ["us", 163], ["as", 166], ["a", 169], ["league", 171], ["\"", 177], [".", 178]]}
{"id": "", "context": "CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a 30-second commercial during the game. The Super Bowl 50 halftime show was headlined by the British rock group Coldplay with special guest performers Beyonc\u00e9 and Bruno Mars, who headlined the Super Bowl XLVII and Super Bowl XLVIII halftime shows, respectively. It was the third-most watched U.S. broadcast ever.", "qas": [{"answers": ["CBS", "CBS", "CBS"], "question": "Which network broadcasted Super Bowl 50 in the U.S.?", "id": "56be5333acb8001400a5030a", "qid": "0c9b4fa5b9c94a6dbb05efdc241ecaea", "question_tokens": [["Which", 0], ["network", 6], ["broadcasted", 14], ["Super", 26], ["Bowl", 32], ["50", 37], ["in", 40], ["the", 43], ["U.S.", 47], ["?", 51]], "detected_answers": [{"text": "CBS", "token_spans": [[0, 0]], "char_spans": [[0, 2]]}]}, {"answers": ["$5 million", "$5 million", "$5 million"], "question": "What was the average cost for a 30 second commercial during Super Bowl 50?", "id": "56be5333acb8001400a5030b", "qid": "b4ce2af7a31a480799a4ddf27001324b", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["average", 13], ["cost", 21], ["for", 26], ["a", 30], ["30", 32], ["second", 35], ["commercial", 42], ["during", 53], ["Super", 60], ["Bowl", 66], ["50", 71], ["?", 73]], "detected_answers": [{"text": "$5 million", "char_spans": [[67, 76]], "token_spans": [[14, 16]]}]}, {"answers": ["Coldplay", "Coldplay", "Coldplay"], "question": "Which group headlined the Super Bowl 50 halftime show?", "id": "56be5333acb8001400a5030c", "qid": "3c8f72ed38114041b2a0c784556b70af", "question_tokens": [["Which", 0], ["group", 6], ["headlined", 12], ["the", 22], ["Super", 26], ["Bowl", 32], ["50", 37], ["halftime", 40], ["show", 49], ["?", 53]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}, {"answers": ["Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars"], "question": 
"Which performers joined the headliner during the Super Bowl 50 halftime show?", "id": "56be5333acb8001400a5030d", "qid": "bd4ac80e6ab34822bd30bc62c0aa84f0", "question_tokens": [["Which", 0], ["performers", 6], ["joined", 17], ["the", 24], ["headliner", 28], ["during", 38], ["the", 45], ["Super", 49], ["Bowl", 55], ["50", 60], ["halftime", 63], ["show", 72], ["?", 76]], "detected_answers": [{"text": "Beyonc\u00e9 and Bruno Mars", "char_spans": [[233, 254]], "token_spans": [[43, 46]]}]}, {"answers": ["Super Bowl XLVII", "Super Bowl XLVII", "XLVII"], "question": "At which Super Bowl did Beyonce headline the halftime show?", "id": "56be5333acb8001400a5030e", "qid": "e10622bd50994a70ba188696f1fa950a", "question_tokens": [["At", 0], ["which", 3], ["Super", 9], ["Bowl", 15], ["did", 20], ["Beyonce", 24], ["headline", 32], ["the", 41], ["halftime", 45], ["show", 54], ["?", 58]], "detected_answers": [{"text": "XLVII", "char_spans": [[286, 290]], "token_spans": [[53, 53]]}]}, {"answers": ["CBS", "CBS", "CBS"], "question": "Who was the broadcaster for Super Bowl 50 in the United States?", "id": "56beaf5e3aeaaa14008c91fd", "qid": "9c8627c4245d40958d9eafe5b3ecb7d9", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["broadcaster", 12], ["for", 24], ["Super", 28], ["Bowl", 34], ["50", 39], ["in", 42], ["the", 45], ["United", 49], ["States", 56], ["?", 62]], "detected_answers": [{"text": "CBS", "token_spans": [[0, 0]], "char_spans": [[0, 2]]}]}, {"answers": ["$5 million", "$5 million", "$5 million"], "question": "What was the average cost of a 30-second commercial?", "id": "56beaf5e3aeaaa14008c91fe", "qid": "ed933501734a4497bdc14153e593e2e3", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["average", 13], ["cost", 21], ["of", 26], ["a", 29], ["30-second", 31], ["commercial", 41], ["?", 51]], "detected_answers": [{"text": "$5 million", "char_spans": [[67, 76]], "token_spans": [[14, 16]]}]}, {"answers": ["Beyonc\u00e9", "Beyonc\u00e9", "Beyonc\u00e9"], "question": 
"What halftime performer previously headlined Super Bowl XLVII?", "id": "56beaf5e3aeaaa14008c91ff", "qid": "6d5190021d0e47b7939a8a4098fbca30", "question_tokens": [["What", 0], ["halftime", 5], ["performer", 14], ["previously", 24], ["headlined", 35], ["Super", 45], ["Bowl", 51], ["XLVII", 56], ["?", 61]], "detected_answers": [{"text": "Beyonc\u00e9", "char_spans": [[233, 239]], "token_spans": [[43, 43]]}]}, {"answers": ["Bruno Mars", "Bruno Mars", "Mars"], "question": "What halftime performer previously headlined Super Bowl XLVIII?", "id": "56beaf5e3aeaaa14008c9200", "qid": "953fb4e5f4754283b144e9502aa611b6", "question_tokens": [["What", 0], ["halftime", 5], ["performer", 14], ["previously", 24], ["headlined", 35], ["Super", 45], ["Bowl", 51], ["XLVIII", 56], ["?", 62]], "detected_answers": [{"text": "Mars", "char_spans": [[251, 254]], "token_spans": [[46, 46]]}]}, {"answers": ["Coldplay", "Coldplay", "Coldplay"], "question": "Who was the main performer at this year's halftime show?", "id": "56beaf5e3aeaaa14008c9201", "qid": "f43c83e38d1e424ea00f8ad3c77ec999", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["main", 12], ["performer", 17], ["at", 27], ["this", 30], ["year", 35], ["'s", 39], ["halftime", 42], ["show", 51], ["?", 55]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}, {"answers": ["CBS", "CBS", "CBS"], "question": "Which network broadcasted the 50th Super Bowl game? 
", "id": "56bf1ae93aeaaa14008c951b", "qid": "f2d38fefa10543f28667c2516b598752", "question_tokens": [["Which", 0], ["network", 6], ["broadcasted", 14], ["the", 26], ["50th", 30], ["Super", 35], ["Bowl", 41], ["game", 46], ["?", 50]], "detected_answers": [{"text": "CBS", "token_spans": [[0, 0]], "char_spans": [[0, 2]]}]}, {"answers": ["$5 million", "$5 million", "$5 million"], "question": "What was the average cost for a TV ad lasting 30 seconds during Super Bowl 50?", "id": "56bf1ae93aeaaa14008c951c", "qid": "8010971a771042da83d36365518ce556", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["average", 13], ["cost", 21], ["for", 26], ["a", 30], ["TV", 32], ["ad", 35], ["lasting", 38], ["30", 46], ["seconds", 49], ["during", 57], ["Super", 64], ["Bowl", 70], ["50", 75], ["?", 77]], "detected_answers": [{"text": "$5 million", "char_spans": [[67, 76]], "token_spans": [[14, 16]]}]}, {"answers": ["Bruno Mars", "Bruno Mars", "Bruno Mars,"], "question": "Who was the male singer who performed as a special guest during Super Bowl 50?", "id": "56bf1ae93aeaaa14008c951e", "qid": "f8d1ae73efb54b31a90381ab150dbae6", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["male", 12], ["singer", 17], ["who", 24], ["performed", 28], ["as", 38], ["a", 41], ["special", 43], ["guest", 51], ["during", 57], ["Super", 64], ["Bowl", 70], ["50", 75], ["?", 77]], "detected_answers": [{"text": "Bruno Mars,", "char_spans": [[245, 255]], "token_spans": [[45, 47]]}]}, {"answers": ["third", "third", "third"], "question": "What ranking does the Super Bowl 50 halftime show have on the list of most watched TV broadcasts?", "id": "56bf1ae93aeaaa14008c951f", "qid": "7384156b154247e288e57763eca2d4c7", "question_tokens": [["What", 0], ["ranking", 5], ["does", 13], ["the", 18], ["Super", 22], ["Bowl", 28], ["50", 33], ["halftime", 36], ["show", 45], ["have", 50], ["on", 55], ["the", 58], ["list", 62], ["of", 67], ["most", 70], ["watched", 75], ["TV", 83], ["broadcasts", 86], ["?", 96]], 
"detected_answers": [{"text": "third", "char_spans": [[355, 359]], "token_spans": [[66, 66]]}]}, {"answers": ["CBS", "CBS", "CBS"], "question": "What station aired the Super Bowl?", "id": "56d2051ce7d4791d00902608", "qid": "7df46260b280403986082e27c6b39ecc", "question_tokens": [["What", 0], ["station", 5], ["aired", 13], ["the", 19], ["Super", 23], ["Bowl", 29], ["?", 33]], "detected_answers": [{"text": "CBS", "token_spans": [[0, 0]], "char_spans": [[0, 2]]}]}, {"answers": ["$5 million", "$5 million", "$5 million"], "question": "How much money did a 1/2 minute commercial cost?", "id": "56d2051ce7d4791d00902609", "qid": "e9add81e71134709ba5c5ca255265932", "question_tokens": [["How", 0], ["much", 4], ["money", 9], ["did", 15], ["a", 19], ["1/2", 21], ["minute", 25], ["commercial", 32], ["cost", 43], ["?", 47]], "detected_answers": [{"text": "$5 million", "char_spans": [[67, 76]], "token_spans": [[14, 16]]}]}, {"answers": ["Coldplay", "Coldplay", "Coldplay"], "question": "What band headlined half-time during Super Bowl 50?", "id": "56d2051ce7d4791d0090260a", "qid": "63a7e9b702d94b1994b9f543ba173e99", "question_tokens": [["What", 0], ["band", 5], ["headlined", 10], ["half", 20], ["-", 24], ["time", 25], ["during", 30], ["Super", 37], ["Bowl", 43], ["50", 48], ["?", 50]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}, {"answers": ["Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars"], "question": "What two artists came out with Coldplay during the half-time show?", "id": "56d2051ce7d4791d0090260b", "qid": "25d0c8f0803a41b5867e1e10e9d4186d", "question_tokens": [["What", 0], ["two", 5], ["artists", 9], ["came", 17], ["out", 22], ["with", 26], ["Coldplay", 31], ["during", 40], ["the", 47], ["half", 51], ["-", 55], ["time", 56], ["show", 61], ["?", 65]], "detected_answers": [{"text": "Beyonc\u00e9 and Bruno Mars", "char_spans": [[233, 254]], "token_spans": [[43, 46]]}]}, {"answers": 
["CBS", "CBS", "CBS"], "question": "Who broadcast the Super Bowl on TV?", "id": "56d602631c85041400946ed8", "qid": "4071b2abd36948da9dc8ae88d79ec150", "question_tokens": [["Who", 0], ["broadcast", 4], ["the", 14], ["Super", 18], ["Bowl", 24], ["on", 29], ["TV", 32], ["?", 34]], "detected_answers": [{"text": "CBS", "token_spans": [[0, 0]], "char_spans": [[0, 2]]}]}, {"answers": ["Coldplay", "Coldplay", "Coldplay"], "question": "Who headlined the halftime show for Super Bowl 50?", "id": "56d602631c85041400946eda", "qid": "6e5cce1881714600bdd943b40eea3b1f", "question_tokens": [["Who", 0], ["headlined", 4], ["the", 14], ["halftime", 18], ["show", 27], ["for", 32], ["Super", 36], ["Bowl", 42], ["50", 47], ["?", 49]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}, {"answers": ["Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars"], "question": "Who were special guests for the Super Bowl halftime show?", "id": "56d602631c85041400946edb", "qid": "c0a739122447497f872a3041671bc290", "question_tokens": [["Who", 0], ["were", 4], ["special", 9], ["guests", 17], ["for", 24], ["the", 28], ["Super", 32], ["Bowl", 38], ["halftime", 43], ["show", 52], ["?", 56]], "detected_answers": [{"text": "Beyonc\u00e9 and Bruno Mars", "char_spans": [[233, 254]], "token_spans": [[43, 46]]}]}, {"answers": ["Super Bowl XLVII", "Super Bowl XLVII", "Super Bowl XLVII"], "question": "Which Super Bowl halftime show did Beyonc\u00e9 headline?", "id": "56d602631c85041400946edc", "qid": "612366179dfb48acb54d038ce202389d", "question_tokens": [["Which", 0], ["Super", 6], ["Bowl", 12], ["halftime", 17], ["show", 26], ["did", 31], ["Beyonc\u00e9", 35], ["headline", 43], ["?", 51]], "detected_answers": [{"text": "Super Bowl XLVII", "char_spans": [[275, 290]], "token_spans": [[51, 53]]}]}, {"answers": ["$5 million", "$5 million", "$5 million for a 30-second"], "question": "What was the cost for a half minute ad?", "id": 
"56d98c53dc89441400fdb544", "qid": "79e082c905b147288f1fd1552221c21b", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["cost", 13], ["for", 18], ["a", 22], ["half", 24], ["minute", 29], ["ad", 36], ["?", 38]], "detected_answers": [{"text": "$5 million for a 30-second", "char_spans": [[67, 92]], "token_spans": [[14, 19]]}]}, {"answers": ["Coldplay", "Coldplay", "Coldplay"], "question": "Who lead the Super Bowl 50 halftime performance?", "id": "56d98c53dc89441400fdb545", "qid": "d43b746325264596ad5ab1cd3afd9cb6", "question_tokens": [["Who", 0], ["lead", 4], ["the", 9], ["Super", 13], ["Bowl", 19], ["50", 24], ["halftime", 27], ["performance", 36], ["?", 47]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}, {"answers": ["Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars", "Beyonc\u00e9 and Bruno Mars"], "question": "What other two famous performers were part of the Super Bowl 50 halftime?", "id": "56d98c53dc89441400fdb546", "qid": "8e4c964faecf4027892e02623dc82f33", "question_tokens": [["What", 0], ["other", 5], ["two", 11], ["famous", 15], ["performers", 22], ["were", 33], ["part", 38], ["of", 43], ["the", 46], ["Super", 50], ["Bowl", 56], ["50", 61], ["halftime", 64], ["?", 72]], "detected_answers": [{"text": "Beyonc\u00e9 and Bruno Mars", "char_spans": [[233, 254]], "token_spans": [[43, 46]]}]}, {"answers": ["Bruno Mars", "Coldplay", "Coldplay"], "question": "What performer lead the Super Bowl XLVIII halftime show?", "id": "56d98c53dc89441400fdb548", "qid": "469309ce71a9446f886ef563f7eafc66", "question_tokens": [["What", 0], ["performer", 5], ["lead", 15], ["the", 20], ["Super", 24], ["Bowl", 30], ["XLVIII", 35], ["halftime", 42], ["show", 51], ["?", 55]], "detected_answers": [{"text": "Coldplay", "char_spans": [[194, 201]], "token_spans": [[38, 38]]}]}], "context_tokens": [["CBS", 0], ["broadcast", 4], ["Super", 14], ["Bowl", 20], ["50", 25], ["in", 28], ["the", 31], ["U.S.", 35], [",", 39], 
["and", 41], ["charged", 45], ["an", 53], ["average", 56], ["of", 64], ["$", 67], ["5", 68], ["million", 70], ["for", 78], ["a", 82], ["30-second", 84], ["commercial", 94], ["during", 105], ["the", 112], ["game", 116], [".", 120], ["The", 122], ["Super", 126], ["Bowl", 132], ["50", 137], ["halftime", 140], ["show", 149], ["was", 154], ["headlined", 158], ["by", 168], ["the", 171], ["British", 175], ["rock", 183], ["group", 188], ["Coldplay", 194], ["with", 203], ["special", 208], ["guest", 216], ["performers", 222], ["Beyonc\u00e9", 233], ["and", 241], ["Bruno", 245], ["Mars", 251], [",", 255], ["who", 257], ["headlined", 261], ["the", 271], ["Super", 275], ["Bowl", 281], ["XLVII", 286], ["and", 292], ["Super", 296], ["Bowl", 302], ["XLVIII", 307], ["halftime", 314], ["shows", 323], [",", 328], ["respectively", 330], [".", 342], ["It", 344], ["was", 347], ["the", 351], ["third", 355], ["-", 360], ["most", 361], ["watched", 366], ["U.S.", 374], ["broadcast", 379], ["ever", 389], [".", 393]]}
{"id": "", "context": "The league eventually narrowed the bids to three sites: New Orleans' Mercedes-Benz Superdome, Miami's Sun Life Stadium, and the San Francisco Bay Area's Levi's Stadium.", "qas": [{"answers": ["New Orleans' Mercedes-Benz Superdome", "New Orleans' Mercedes-Benz Superdome", "Mercedes-Benz Superdome"], "question": "Which Louisiana venue was one of three considered for Super Bowl 50?", "id": "56be5438acb8001400a5031a", "qid": "a76d3c5264da4af68c11aadcbac645c1", "question_tokens": [["Which", 0], ["Louisiana", 6], ["venue", 16], ["was", 22], ["one", 26], ["of", 30], ["three", 33], ["considered", 39], ["for", 50], ["Super", 54], ["Bowl", 60], ["50", 65], ["?", 67]], "detected_answers": [{"text": "Mercedes-Benz Superdome", "char_spans": [[69, 91]], "token_spans": [[13, 16]]}]}, {"answers": ["Miami's Sun Life Stadium", "Miami's Sun Life Stadium", "Sun Life Stadium"], "question": "Which Florida venue was one of three considered for Super Bowl 50?", "id": "56be5438acb8001400a5031b", "qid": "38dec4e3dfcb48cfbaeb1d34ce1c8d4d", "question_tokens": [["Which", 0], ["Florida", 6], ["venue", 14], ["was", 20], ["one", 24], ["of", 28], ["three", 31], ["considered", 37], ["for", 48], ["Super", 52], ["Bowl", 58], ["50", 63], ["?", 65]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[102, 117]], "token_spans": [[20, 22]]}]}, {"answers": ["San Francisco Bay Area's Levi's Stadium", "San Francisco Bay Area's Levi's Stadium", "Levi's Stadium"], "question": "Which California venue was one of three considered for Super Bowl 50?", "id": "56be5438acb8001400a5031c", "qid": "a3a813c2479d435793ebceaa7017cb4a", "question_tokens": [["Which", 0], ["California", 6], ["venue", 17], ["was", 23], ["one", 27], ["of", 31], ["three", 34], ["considered", 40], ["for", 51], ["Super", 55], ["Bowl", 61], ["50", 66], ["?", 68]], "detected_answers": [{"text": "Levi's Stadium", "char_spans": [[153, 166]], "token_spans": [[31, 33]]}]}, {"answers": ["Sun Life Stadium", "Sun Life 
Stadium", "Sun Life Stadium"], "question": "What venue in Miami was a candidate for the site of Super Bowl 50?", "id": "56beb03c3aeaaa14008c920b", "qid": "af29d2ba013041048ae9c7fc80c87c69", "question_tokens": [["What", 0], ["venue", 5], ["in", 11], ["Miami", 14], ["was", 20], ["a", 24], ["candidate", 26], ["for", 36], ["the", 40], ["site", 44], ["of", 49], ["Super", 52], ["Bowl", 58], ["50", 63], ["?", 65]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[102, 117]], "token_spans": [[20, 22]]}]}, {"answers": ["Levi's Stadium", "Levi's Stadium", "Levi's Stadium"], "question": "What site is located in the San Francisco Bay Area?", "id": "56beb03c3aeaaa14008c920d", "qid": "2624809fbaba41b7a7f293ba928b0a4f", "question_tokens": [["What", 0], ["site", 5], ["is", 10], ["located", 13], ["in", 21], ["the", 24], ["San", 28], ["Francisco", 32], ["Bay", 42], ["Area", 46], ["?", 50]], "detected_answers": [{"text": "Levi's Stadium", "char_spans": [[153, 166]], "token_spans": [[31, 33]]}]}, {"answers": ["Levi's Stadium", "Levi's Stadium", "Levi's Stadium."], "question": "What is the name of San Francisco's stadium when looked at as a possibility for Super Bowl 50?", "id": "56bf3c633aeaaa14008c9580", "qid": "d3577e3a35a44b5698776546235b1afe", "question_tokens": [["What", 0], ["is", 5], ["the", 8], ["name", 12], ["of", 17], ["San", 20], ["Francisco", 24], ["'s", 33], ["stadium", 36], ["when", 44], ["looked", 49], ["at", 56], ["as", 59], ["a", 62], ["possibility", 64], ["for", 76], ["Super", 80], ["Bowl", 86], ["50", 91], ["?", 93]], "detected_answers": [{"text": "Levi's Stadium.", "char_spans": [[153, 167]], "token_spans": [[31, 34]]}]}, {"answers": ["Mercedes-Benz Superdome", "Mercedes-Benz Superdome", "Mercedes-Benz Superdome"], "question": "What was the name of New Orleans' superdome at the time that Super Bowl 50 took place?", "id": "56bf3c633aeaaa14008c9581", "qid": "8b5db421a73d4a6e8a2cc23306a86d4f", "question_tokens": [["What", 0], ["was", 5], ["the", 9], 
["name", 13], ["of", 18], ["New", 21], ["Orleans", 25], ["'", 32], ["superdome", 34], ["at", 44], ["the", 47], ["time", 51], ["that", 56], ["Super", 61], ["Bowl", 67], ["50", 72], ["took", 75], ["place", 80], ["?", 85]], "detected_answers": [{"text": "Mercedes-Benz Superdome", "char_spans": [[69, 91]], "token_spans": [[13, 16]]}]}, {"answers": ["Sun Life Stadium", "Sun Life Stadium", "Sun Life Stadium"], "question": "What was the given name of Miami's stadium at the time of Super Bowl 50?", "id": "56bf3c633aeaaa14008c9582", "qid": "9433233f7fbc417882f5b3bf324b9a18", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["given", 13], ["name", 19], ["of", 24], ["Miami", 27], ["'s", 32], ["stadium", 35], ["at", 43], ["the", 46], ["time", 50], ["of", 55], ["Super", 58], ["Bowl", 64], ["50", 69], ["?", 71]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[102, 117]], "token_spans": [[20, 22]]}]}, {"answers": ["New Orleans' Mercedes-Benz Superdome, Miami's Sun Life Stadium, and the San Francisco Bay Area's Levi's Stadium", "New Orleans' Mercedes-Benz Superdome, Miami's Sun Life Stadium, and the San Francisco Bay Area's Levi's Stadium.", "New Orleans' Mercedes-Benz Superdome, Miami's Sun Life Stadium, and the San Francisco Bay Area's Levi's Stadium."], "question": "What three stadiums did the NFL decide between for the game?", "id": "56d20564e7d4791d00902612", "qid": "2f2b6dda6f9541c6aa54e8c503c27b44", "question_tokens": [["What", 0], ["three", 5], ["stadiums", 11], ["did", 20], ["the", 24], ["NFL", 28], ["decide", 32], ["between", 39], ["for", 47], ["the", 51], ["game", 55], ["?", 59]], "detected_answers": [{"text": "New Orleans' Mercedes-Benz Superdome, Miami's Sun Life Stadium, and the San Francisco Bay Area's Levi's Stadium.", "char_spans": [[56, 167]], "token_spans": [[10, 34]]}]}, {"answers": ["three", "three", "three"], "question": "How many sites did the NFL narrow down Super Bowl 50's location to?", "id": "56d6ee6e0d65d21400198254", "qid": 
"bd6015231c15435780b020832ada86e5", "question_tokens": [["How", 0], ["many", 4], ["sites", 9], ["did", 15], ["the", 19], ["NFL", 23], ["narrow", 27], ["down", 34], ["Super", 39], ["Bowl", 45], ["50", 50], ["'s", 52], ["location", 55], ["to", 64], ["?", 66]], "detected_answers": [{"text": "three", "char_spans": [[43, 47]], "token_spans": [[7, 7]]}]}, {"answers": ["New Orleans", "New Orleans", "New Orleans'"], "question": "One of the sites, Merceds-Benz Superdome, is located where?", "id": "56d6ee6e0d65d21400198255", "qid": "b57a3b8a64f14502a58f26d61562aa17", "question_tokens": [["One", 0], ["of", 4], ["the", 7], ["sites", 11], [",", 16], ["Merceds", 18], ["-", 25], ["Benz", 26], ["Superdome", 31], [",", 40], ["is", 42], ["located", 45], ["where", 53], ["?", 58]], "detected_answers": [{"text": "New Orleans'", "char_spans": [[56, 67]], "token_spans": [[10, 12]]}]}, {"answers": ["Sun Life Stadium", "Sun Life Stadium", "Sun Life Stadium"], "question": "What is the name of the stadium in Miami that was considered?", "id": "56d6ee6e0d65d21400198256", "qid": "4bc32a6d068f45f7863322228b5222fc", "question_tokens": [["What", 0], ["is", 5], ["the", 8], ["name", 12], ["of", 17], ["the", 20], ["stadium", 24], ["in", 32], ["Miami", 35], ["that", 41], ["was", 46], ["considered", 50], ["?", 60]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[102, 117]], "token_spans": [[20, 22]]}]}, {"answers": ["San Francisco", "San Francisco", "San Francisco Bay Area's"], "question": "What was the third city that was considered?", "id": "56d6ee6e0d65d21400198257", "qid": "df37bf87d0e54df88c25d785dd8a0990", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["third", 13], ["city", 19], ["that", 24], ["was", 29], ["considered", 33], ["?", 43]], "detected_answers": [{"text": "San Francisco Bay Area's", "char_spans": [[128, 151]], "token_spans": [[26, 30]]}]}, {"answers": ["Levi's Stadium.", "Levi's Stadium", "Levi's Stadium."], "question": "What is the name of the stadium 
in San Francisco Bay Area?", "id": "56d6ee6e0d65d21400198258", "qid": "51aa32cd575e4f89939cf9279afc8463", "question_tokens": [["What", 0], ["is", 5], ["the", 8], ["name", 12], ["of", 17], ["the", 20], ["stadium", 24], ["in", 32], ["San", 35], ["Francisco", 39], ["Bay", 49], ["Area", 53], ["?", 57]], "detected_answers": [{"text": "Levi's Stadium.", "char_spans": [[153, 167]], "token_spans": [[31, 34]]}]}, {"answers": ["Sun Life Stadium", "Sun Life Stadium", "Sun Life Stadium"], "question": "What Florida stadium was considered for Super Bowl 50?", "id": "56d98db6dc89441400fdb552", "qid": "f130a64961bd4ccca57897ffeec58791", "question_tokens": [["What", 0], ["Florida", 5], ["stadium", 13], ["was", 21], ["considered", 25], ["for", 36], ["Super", 40], ["Bowl", 46], ["50", 51], ["?", 53]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[102, 117]], "token_spans": [[20, 22]]}]}, {"answers": ["Mercedes-Benz Superdome", "Mercedes-Benz Superdome", "Mercedes-Benz Superdome,"], "question": "What New Orleans stadium was considered for Super Bowl 50?", "id": "56d98db6dc89441400fdb553", "qid": "9acb55c4351a4534b7aa7553a2cbbe97", "question_tokens": [["What", 0], ["New", 5], ["Orleans", 9], ["stadium", 17], ["was", 25], ["considered", 29], ["for", 40], ["Super", 44], ["Bowl", 50], ["50", 55], ["?", 57]], "detected_answers": [{"text": "Mercedes-Benz Superdome,", "char_spans": [[69, 92]], "token_spans": [[13, 17]]}]}, {"answers": ["Levi's Stadium.", "Levi's Stadium", "Levi's Stadium."], "question": "What is the name of the stadium where Super Bowl 50 was played?", "id": "56d98db6dc89441400fdb554", "qid": "eaaeea366d5145059e51c31dfc75fb5d", "question_tokens": [["What", 0], ["is", 5], ["the", 8], ["name", 12], ["of", 17], ["the", 20], ["stadium", 24], ["where", 32], ["Super", 38], ["Bowl", 44], ["50", 49], ["was", 52], ["played", 56], ["?", 62]], "detected_answers": [{"text": "Levi's Stadium.", "char_spans": [[153, 167]], "token_spans": [[31, 34]]}]}], 
"context_tokens": [["The", 0], ["league", 4], ["eventually", 11], ["narrowed", 22], ["the", 31], ["bids", 35], ["to", 40], ["three", 43], ["sites", 49], [":", 54], ["New", 56], ["Orleans", 60], ["'", 67], ["Mercedes", 69], ["-", 77], ["Benz", 78], ["Superdome", 83], [",", 92], ["Miami", 94], ["'s", 99], ["Sun", 102], ["Life", 106], ["Stadium", 111], [",", 118], ["and", 120], ["the", 124], ["San", 128], ["Francisco", 132], ["Bay", 142], ["Area", 146], ["'s", 150], ["Levi", 153], ["'s", 157], ["Stadium", 160], [".", 167]]}
{"id": "", "context": "On May 21, 2013, NFL owners at their spring meetings in Boston voted and awarded the game to Levi's Stadium. The $1.2 billion stadium opened in 2014. It is the first Super Bowl held in the San Francisco Bay Area since Super Bowl XIX in 1985, and the first in California since Super Bowl XXXVII took place in San Diego in 2003.", "qas": [{"answers": ["May 21, 2013", "May 21, 2013", "May 21, 2013,"], "question": "When was Levi's Stadium awarded the right to host Super Bowl 50?", "id": "56be5523acb8001400a5032c", "qid": "af73833956fe4991baf9a6cef4978577", "question_tokens": [["When", 0], ["was", 5], ["Levi", 9], ["'s", 13], ["Stadium", 16], ["awarded", 24], ["the", 32], ["right", 36], ["to", 42], ["host", 45], ["Super", 50], ["Bowl", 56], ["50", 61], ["?", 63]], "detected_answers": [{"text": "May 21, 2013,", "char_spans": [[3, 15]], "token_spans": [[1, 5]]}]}, {"answers": ["NFL owners", "NFL owners", "NFL owners"], "question": "Who voted on the venue for Super Bowl 50?", "id": "56be5523acb8001400a5032d", "qid": "1c1158f70ceb460c809cd8deb51b4aa2", "question_tokens": [["Who", 0], ["voted", 4], ["on", 10], ["the", 13], ["venue", 17], ["for", 23], ["Super", 27], ["Bowl", 33], ["50", 38], ["?", 40]], "detected_answers": [{"text": "NFL owners", "char_spans": [[17, 26]], "token_spans": [[6, 7]]}]}, {"answers": ["2014", "in 2014", "2014"], "question": "When did Lev's Stadium open?", "id": "56be5523acb8001400a5032e", "qid": "4d9d4a9f586540c1b5227e8a6c96741b", "question_tokens": [["When", 0], ["did", 5], ["Lev", 9], ["'s", 12], ["Stadium", 15], ["open", 23], ["?", 27]], "detected_answers": [{"text": "2014", "char_spans": [[144, 147]], "token_spans": [[31, 31]]}]}, {"answers": ["$1.2 billion", "$1.2 billion", "$1.2 billion"], "question": "How much did it cost to build Levi's Stadium?", "id": "56be5523acb8001400a5032f", "qid": "a3bf86576c0342fca8dcc03a882cb85c", "question_tokens": [["How", 0], ["much", 4], ["did", 9], ["it", 13], ["cost", 16], ["to", 21], 
["build", 24], ["Levi", 30], ["'s", 34], ["Stadium", 37], ["?", 44]], "detected_answers": [{"text": "$1.2 billion", "char_spans": [[113, 124]], "token_spans": [[25, 27]]}]}, {"answers": ["San Diego", "San Diego", "San Diego"], "question": "What California city last hosted the Super Bowl?", "id": "56be5523acb8001400a50330", "qid": "9151a879be3d4a8593009534310b4654", "question_tokens": [["What", 0], ["California", 5], ["city", 16], ["last", 21], ["hosted", 26], ["the", 33], ["Super", 37], ["Bowl", 43], ["?", 47]], "detected_answers": [{"text": "San Diego", "char_spans": [[308, 316]], "token_spans": [[65, 66]]}]}, {"answers": ["Boston", "in Boston", "May 21, 2013"], "question": "Where did the spring meetings of the NFL owners take place?", "id": "56beb2153aeaaa14008c9225", "qid": "1e65836b1a664410966341a9178768f1", "question_tokens": [["Where", 0], ["did", 6], ["the", 10], ["spring", 14], ["meetings", 21], ["of", 30], ["the", 33], ["NFL", 37], ["owners", 41], ["take", 48], ["place", 53], ["?", 58]], "detected_answers": [{"text": "May 21, 2013", "char_spans": [[3, 14]], "token_spans": [[1, 4]]}]}, {"answers": ["May 21, 2013", "May 21, 2013", "May 21, 2013,"], "question": "On what date was Super Bowl 50 given to Levi's Stadium?", "id": "56beb2153aeaaa14008c9226", "qid": "83d81045f7744f30ba5c03f6adf08438", "question_tokens": [["On", 0], ["what", 3], ["date", 8], ["was", 13], ["Super", 17], ["Bowl", 23], ["50", 28], ["given", 31], ["to", 37], ["Levi", 40], ["'s", 44], ["Stadium", 47], ["?", 54]], "detected_answers": [{"text": "May 21, 2013,", "char_spans": [[3, 15]], "token_spans": [[1, 5]]}]}, {"answers": ["$1.2 billion", "$1.2 billion", "$1.2 billion"], "question": "How much did it cost to build Levi's Stadium?", "id": "56beb2153aeaaa14008c9227", "qid": "3c29ee3785fa46b092dc7e5034bb8ca0", "question_tokens": [["How", 0], ["much", 4], ["did", 9], ["it", 13], ["cost", 16], ["to", 21], ["build", 24], ["Levi", 30], ["'s", 34], ["Stadium", 37], ["?", 44]], "detected_answers": 
[{"text": "$1.2 billion", "char_spans": [[113, 124]], "token_spans": [[25, 27]]}]}, {"answers": ["Super Bowl XXXVII", "Super Bowl XXXVII", "XXXVII"], "question": "Prior to Super Bowl 50, what was the last Super Bowl in California?", "id": "56beb2153aeaaa14008c9228", "qid": "3d8ac2bd694a4ac7bad126269d0c92a0", "question_tokens": [["Prior", 0], ["to", 6], ["Super", 9], ["Bowl", 15], ["50", 20], [",", 22], ["what", 24], ["was", 29], ["the", 33], ["last", 37], ["Super", 42], ["Bowl", 48], ["in", 53], ["California", 56], ["?", 66]], "detected_answers": [{"text": "XXXVII", "char_spans": [[287, 292]], "token_spans": [[61, 61]]}]}, {"answers": ["San Diego", "San Diego", "San Diego"], "question": "In what city did the last Super Bowl in California occur?", "id": "56beb2153aeaaa14008c9229", "qid": "6313f86b729e4d0da3c7a7ab7fbdb6d2", "question_tokens": [["In", 0], ["what", 3], ["city", 8], ["did", 13], ["the", 17], ["last", 21], ["Super", 26], ["Bowl", 32], ["in", 37], ["California", 40], ["occur", 51], ["?", 56]], "detected_answers": [{"text": "San Diego", "char_spans": [[308, 316]], "token_spans": [[65, 66]]}]}, {"answers": ["2013", "2013", "2013"], "question": "What year did Levi's Stadium become fully approved to host Super Bowl 50?", "id": "56bf23363aeaaa14008c952f", "qid": "03910221e3584c339e1a16176c2cfe25", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["Levi", 14], ["'s", 18], ["Stadium", 21], ["become", 29], ["fully", 36], ["approved", 42], ["to", 51], ["host", 54], ["Super", 59], ["Bowl", 65], ["50", 70], ["?", 72]], "detected_answers": [{"text": "2013", "char_spans": [[11, 14]], "token_spans": [[4, 4]]}]}, {"answers": ["2014", "2014", "2014"], "question": "When did Levi's stadium open to the public? 
", "id": "56bf23363aeaaa14008c9530", "qid": "660318b1d68a45e7b9131ea05889dd92", "question_tokens": [["When", 0], ["did", 5], ["Levi", 9], ["'s", 13], ["stadium", 16], ["open", 24], ["to", 29], ["the", 32], ["public", 36], ["?", 42]], "detected_answers": [{"text": "2014", "char_spans": [[144, 147]], "token_spans": [[31, 31]]}]}, {"answers": ["$1.2 billion", "$1.2 billion", "$1.2 billion"], "question": "How much did it cost to build the stadium where Super Bowl 50 was played?", "id": "56bf23363aeaaa14008c9531", "qid": "d0d23a90127a4c3f8b3928ef185631fd", "question_tokens": [["How", 0], ["much", 4], ["did", 9], ["it", 13], ["cost", 16], ["to", 21], ["build", 24], ["the", 30], ["stadium", 34], ["where", 42], ["Super", 48], ["Bowl", 54], ["50", 59], ["was", 62], ["played", 66], ["?", 72]], "detected_answers": [{"text": "$1.2 billion", "char_spans": [[113, 124]], "token_spans": [[25, 27]]}]}, {"answers": ["1985", "1985", "1985"], "question": "What year did a Super Bowl play in the bay area around San Francisco, prior to Super Bowl 50?", "id": "56bf23363aeaaa14008c9532", "qid": "b0c8ddcc8734483f8fa0245dca48785b", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["a", 14], ["Super", 16], ["Bowl", 22], ["play", 27], ["in", 32], ["the", 35], ["bay", 39], ["area", 43], ["around", 48], ["San", 55], ["Francisco", 59], [",", 68], ["prior", 70], ["to", 76], ["Super", 79], ["Bowl", 85], ["50", 90], ["?", 92]], "detected_answers": [{"text": "1985", "char_spans": [[236, 239]], "token_spans": [[51, 51]]}]}, {"answers": ["Super Bowl XXXVII", "Super Bowl XXXVII", "XXXVII"], "question": "Which Super Bowl was hosted in San Diego in 2003? 
", "id": "56bf23363aeaaa14008c9533", "qid": "a9fdb83b628849e89bfd5ead56d2e6ee", "question_tokens": [["Which", 0], ["Super", 6], ["Bowl", 12], ["was", 17], ["hosted", 21], ["in", 28], ["San", 31], ["Diego", 35], ["in", 41], ["2003", 44], ["?", 48]], "detected_answers": [{"text": "XXXVII", "char_spans": [[287, 292]], "token_spans": [[61, 61]]}]}, {"answers": ["May 21, 2013", "May 21, 2013,", "May 21, 2013"], "question": "When was San Francisco voted to be the location for Super Bowl 50?", "id": "56d6f0770d65d21400198268", "qid": "3b37d8ac6ff6441d87bfab0589ee4a22", "question_tokens": [["When", 0], ["was", 5], ["San", 9], ["Francisco", 13], ["voted", 23], ["to", 29], ["be", 32], ["the", 35], ["location", 39], ["for", 48], ["Super", 52], ["Bowl", 58], ["50", 63], ["?", 65]], "detected_answers": [{"text": "May 21, 2013", "char_spans": [[3, 14]], "token_spans": [[1, 4]]}]}, {"answers": ["2014", "in 2014", "2014"], "question": "When did Levi's Stadium open?", "id": "56d6f0770d65d21400198269", "qid": "2b8e91f54fc241e8ab6d393ed21f4c2b", "question_tokens": [["When", 0], ["did", 5], ["Levi", 9], ["'s", 13], ["Stadium", 16], ["open", 24], ["?", 28]], "detected_answers": [{"text": "2014", "char_spans": [[144, 147]], "token_spans": [[31, 31]]}]}, {"answers": ["2003", "in 2003", "2003"], "question": "When was the last Super Bowl in California?", "id": "56d6f0770d65d2140019826a", "qid": "39c35592698d4f8b82183414e95a97e0", "question_tokens": [["When", 0], ["was", 5], ["the", 9], ["last", 13], ["Super", 18], ["Bowl", 24], ["in", 29], ["California", 32], ["?", 42]], "detected_answers": [{"text": "2003", "char_spans": [[321, 324]], "token_spans": [[68, 68]]}]}, {"answers": ["Boston", "in Boston", "Boston"], "question": "Where was the meeting held when the NFL owners voted on the location for Super Bowl 50?", "id": "56d6f0770d65d2140019826c", "qid": "123c0940045243b6a72eead43011d1fc", "question_tokens": [["Where", 0], ["was", 6], ["the", 10], ["meeting", 14], ["held", 22], ["when", 27], 
["the", 32], ["NFL", 36], ["owners", 40], ["voted", 47], ["on", 53], ["the", 56], ["location", 60], ["for", 69], ["Super", 73], ["Bowl", 79], ["50", 84], ["?", 86]], "detected_answers": [{"text": "Boston", "char_spans": [[56, 61]], "token_spans": [[13, 13]]}]}, {"answers": ["May 21, 2013", "May 21, 2013", "May 21, 2013"], "question": "When was Levi's Stadium picked for Super bowl 50?", "id": "56d98fbfdc89441400fdb562", "qid": "3c2a542e2d8d4c159215b971ad3f096d", "question_tokens": [["When", 0], ["was", 5], ["Levi", 9], ["'s", 13], ["Stadium", 16], ["picked", 24], ["for", 31], ["Super", 35], ["bowl", 41], ["50", 46], ["?", 48]], "detected_answers": [{"text": "May 21, 2013", "char_spans": [[3, 14]], "token_spans": [[1, 4]]}]}, {"answers": ["2014.", "in 2014", "2014"], "question": "When did Levi's Stadium open?", "id": "56d98fbfdc89441400fdb563", "qid": "622ba886d68649619d926563ca28eb8f", "question_tokens": [["When", 0], ["did", 5], ["Levi", 9], ["'s", 13], ["Stadium", 16], ["open", 24], ["?", 28]], "detected_answers": [{"text": "2014", "char_spans": [[144, 147]], "token_spans": [[31, 31]]}]}, {"answers": ["$1.2 billion", "$1.2 billion", "$1.2 billion"], "question": "How much did Levi's Stadium cost?", "id": "56d98fbfdc89441400fdb564", "qid": "ab581f4ebb6e4f3a97e224e9fe6a515f", "question_tokens": [["How", 0], ["much", 4], ["did", 9], ["Levi", 13], ["'s", 17], ["Stadium", 20], ["cost", 28], ["?", 32]], "detected_answers": [{"text": "$1.2 billion", "char_spans": [[113, 124]], "token_spans": [[25, 27]]}]}, {"answers": ["2003.", "2003", "2003"], "question": "When was the last time California hosted a Super Bowl?", "id": "56d98fbfdc89441400fdb565", "qid": "bb8ff77cd36a4f02bc6995da68245cc1", "question_tokens": [["When", 0], ["was", 5], ["the", 9], ["last", 13], ["time", 18], ["California", 23], ["hosted", 34], ["a", 41], ["Super", 43], ["Bowl", 49], ["?", 53]], "detected_answers": [{"text": "2003", "char_spans": [[321, 324]], "token_spans": [[68, 68]]}]}], "context_tokens": 
[["On", 0], ["May", 3], ["21", 7], [",", 9], ["2013", 11], [",", 15], ["NFL", 17], ["owners", 21], ["at", 28], ["their", 31], ["spring", 37], ["meetings", 44], ["in", 53], ["Boston", 56], ["voted", 63], ["and", 69], ["awarded", 73], ["the", 81], ["game", 85], ["to", 90], ["Levi", 93], ["'s", 97], ["Stadium", 100], [".", 107], ["The", 109], ["$", 113], ["1.2", 114], ["billion", 118], ["stadium", 126], ["opened", 134], ["in", 141], ["2014", 144], [".", 148], ["It", 150], ["is", 153], ["the", 156], ["first", 160], ["Super", 166], ["Bowl", 172], ["held", 177], ["in", 182], ["the", 185], ["San", 189], ["Francisco", 193], ["Bay", 203], ["Area", 207], ["since", 212], ["Super", 218], ["Bowl", 224], ["XIX", 229], ["in", 233], ["1985", 236], [",", 240], ["and", 242], ["the", 246], ["first", 250], ["in", 256], ["California", 259], ["since", 270], ["Super", 276], ["Bowl", 282], ["XXXVII", 287], ["took", 294], ["place", 299], ["in", 305], ["San", 308], ["Diego", 312], ["in", 318], ["2003", 321], [".", 325]]}
{"id": "", "context": "The league announced on October 16, 2012, that the two finalists were Sun Life Stadium and Levi's Stadium. The South Florida/Miami area has previously hosted the event 10 times (tied for most with New Orleans), with the most recent one being Super Bowl XLIV in 2010. The San Francisco Bay Area last hosted in 1985 (Super Bowl XIX), held at Stanford Stadium in Stanford, California, won by the home team 49ers. The Miami bid depended on whether the stadium underwent renovations. However, on May 3, 2013, the Florida legislature refused to approve the funding plan to pay for the renovations, dealing a significant blow to Miami's chances.", "qas": [{"answers": ["October 16, 2012", "October 16, 2012,", "October 16, 2012"], "question": "When were the two finalists for hosting Super Bowl 50 announced?", "id": "56be54bdacb8001400a50322", "qid": "4d621b4978da4a5dbd57791b86da9c71", "question_tokens": [["When", 0], ["were", 5], ["the", 10], ["two", 14], ["finalists", 18], ["for", 28], ["hosting", 32], ["Super", 40], ["Bowl", 46], ["50", 51], ["announced", 54], ["?", 63]], "detected_answers": [{"text": "October 16, 2012", "char_spans": [[24, 39]], "token_spans": [[4, 7]]}]}, {"answers": ["10", "10", "10"], "question": "How many times has the South Florida/Miami area hosted the Super Bowl?", "id": "56be54bdacb8001400a50323", "qid": "2e83992ae554439b8e85c47c3518977e", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["has", 15], ["the", 19], ["South", 23], ["Florida", 29], ["/", 36], ["Miami", 37], ["area", 43], ["hosted", 48], ["the", 55], ["Super", 59], ["Bowl", 65], ["?", 69]], "detected_answers": [{"text": "10", "char_spans": [[168, 169]], "token_spans": [[33, 33]]}]}, {"answers": ["Super Bowl XLIV", "Super Bowl XLIV", "2010"], "question": "What was the most recent Super Bowl hosted in the South Florida/Miami area?", "id": "56be54bdacb8001400a50324", "qid": "85b0a0b8b1ab46a68e968470f4e5a390", "question_tokens": [["What", 0], ["was", 5], ["the", 
9], ["most", 13], ["recent", 18], ["Super", 25], ["Bowl", 31], ["hosted", 36], ["in", 43], ["the", 46], ["South", 50], ["Florida", 56], ["/", 63], ["Miami", 64], ["area", 70], ["?", 74]], "detected_answers": [{"text": "2010", "char_spans": [[261, 264]], "token_spans": [[54, 54]]}]}, {"answers": ["2010", "2010", "2010"], "question": "When was the most recent Super Bowl hosted in the South Florida/Miami area?", "id": "56be54bdacb8001400a50325", "qid": "37042bcbf3774e3a8b6fc7582af90bc5", "question_tokens": [["When", 0], ["was", 5], ["the", 9], ["most", 13], ["recent", 18], ["Super", 25], ["Bowl", 31], ["hosted", 36], ["in", 43], ["the", 46], ["South", 50], ["Florida", 56], ["/", 63], ["Miami", 64], ["area", 70], ["?", 74]], "detected_answers": [{"text": "2010", "char_spans": [[261, 264]], "token_spans": [[54, 54]]}]}, {"answers": ["1985", "1985", "1985"], "question": "When did the San Francisco Bay area last host the Super Bowl?", "id": "56be54bdacb8001400a50326", "qid": "c1610b846d7f4f7a8cb684a694470ce5", "question_tokens": [["When", 0], ["did", 5], ["the", 9], ["San", 13], ["Francisco", 17], ["Bay", 27], ["area", 31], ["last", 36], ["host", 41], ["the", 46], ["Super", 50], ["Bowl", 56], ["?", 60]], "detected_answers": [{"text": "1985", "char_spans": [[309, 312]], "token_spans": [[64, 64]]}]}, {"answers": ["Sun Life Stadium", "Sun Life Stadium", "Sun Life Stadium"], "question": "What was the other finalist besides Levi's Stadium?", "id": "56beb0f43aeaaa14008c921b", "qid": "aaaee866ce0f4876bf9b5f411f80c7bb", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["other", 13], ["finalist", 19], ["besides", 28], ["Levi", 36], ["'s", 40], ["Stadium", 43], ["?", 50]], "detected_answers": [{"text": "Sun Life Stadium", "char_spans": [[70, 85]], "token_spans": [[14, 16]]}]}, {"answers": ["October 16, 2012", "October 16, 2012", "October 16, 2012,"], "question": "When were the finalists announced?", "id": "56beb0f43aeaaa14008c921c", "qid": 
"cff82fdf8f12427aab53d690ce70d30e", "question_tokens": [["When", 0], ["were", 5], ["the", 10], ["finalists", 14], ["announced", 24], ["?", 33]], "detected_answers": [{"text": "October 16, 2012,", "char_spans": [[24, 40]], "token_spans": [[4, 8]]}]}, {"answers": ["Stanford Stadium", "Stanford Stadium", "Stanford Stadium"], "question": "In what venue did Super Bowl XIX take place?", "id": "56beb0f43aeaaa14008c921d", "qid": "aa03e9f4b1354458b939ccf3b1b5ff2d", "question_tokens": [["In", 0], ["what", 3], ["venue", 8], ["did", 14], ["Super", 18], ["Bowl", 24], ["XIX", 29], ["take", 33], ["place", 38], ["?", 43]], "detected_answers": [{"text": "Stanford Stadium", "char_spans": [[340, 355]], "token_spans": [[73, 74]]}]}, {"answers": ["May 3, 2013", "May 3, 2013", "May 3, 2013"], "question": "On what date did the Florida legislature decide against the plan to renovate the Miami stadium?", "id": "56beb0f43aeaaa14008c921e", "qid": "ef692734c5774551b78c1d6a8a1aa509", "question_tokens": [["On", 0], ["what", 3], ["date", 8], ["did", 13], ["the", 17], ["Florida", 21], ["legislature", 29], ["decide", 41], ["against", 48], ["the", 56], ["plan", 60], ["to", 65], ["renovate", 68], ["the", 77], ["Miami", 81], ["stadium", 87], ["?", 94]], "detected_answers": [{"text": "May 3, 2013", "char_spans": [[491, 501]], "token_spans": [[101, 104]]}]}, {"answers": ["2010", "2010", "2010"], "question": "In what year was the Super Bowl last held in the Miami/South Florida area?", "id": "56beb0f43aeaaa14008c921f", "qid": "a70a0ccb24d74a4a8b0f2a3e6436bdff", "question_tokens": [["In", 0], ["what", 3], ["year", 8], ["was", 13], ["the", 17], ["Super", 21], ["Bowl", 27], ["last", 32], ["held", 37], ["in", 42], ["the", 45], ["Miami", 49], ["/", 54], ["South", 55], ["Florida", 61], ["area", 69], ["?", 73]], "detected_answers": [{"text": "2010", "char_spans": [[261, 264]], "token_spans": [[54, 54]]}]}, {"answers": ["two", "10", "10"], "question": "How many times has a Super Bowl taken place at Miami's Sun 
Life Stadium?", "id": "56bf21b43aeaaa14008c9525", "qid": "6d0b299bfeff4ab196ab679382d17d8f", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["has", 15], ["a", 19], ["Super", 21], ["Bowl", 27], ["taken", 32], ["place", 38], ["at", 44], ["Miami", 47], ["'s", 52], ["Sun", 55], ["Life", 59], ["Stadium", 64], ["?", 71]], "detected_answers": [{"text": "10", "char_spans": [[168, 169]], "token_spans": [[33, 33]]}]}, {"answers": ["Super Bowl XLIV", "Super Bowl XLIV", "2010"], "question": "What was the last Super Bowl that took place at Sun Life Stadium in Miami? ", "id": "56bf21b43aeaaa14008c9526", "qid": "75fe1964de0046fa99d54ac2e81be956", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["last", 13], ["Super", 18], ["Bowl", 24], ["that", 29], ["took", 34], ["place", 39], ["at", 45], ["Sun", 48], ["Life", 52], ["Stadium", 57], ["in", 65], ["Miami", 68], ["?", 73]], "detected_answers": [{"text": "2010", "char_spans": [[261, 264]], "token_spans": [[54, 54]]}]}, {"answers": ["two", "two", "two"], "question": "In 2012, how many stadiums were named as finalists for hosting Super Bowl 50 before the final stadium was chosen?", "id": "56bf21b43aeaaa14008c9528", "qid": "e1f33f1fb239449c9c2048e8e48dd40c", "question_tokens": [["In", 0], ["2012", 3], [",", 7], ["how", 9], ["many", 13], ["stadiums", 18], ["were", 27], ["named", 32], ["as", 38], ["finalists", 41], ["for", 51], ["hosting", 55], ["Super", 63], ["Bowl", 69], ["50", 74], ["before", 77], ["the", 84], ["final", 88], ["stadium", 94], ["was", 102], ["chosen", 106], ["?", 112]], "detected_answers": [{"text": "two", "char_spans": [[51, 53]], "token_spans": [[11, 11]]}]}, {"answers": ["Florida legislature", "the Florida legislature", "Florida legislature"], "question": "What was the entity that stepped in and caused Miami's Sun Life Stadium to no longer be in the running to host Super Bowl 50?", "id": "56bf21b43aeaaa14008c9529", "qid": "64bf0e133c0741b0ab7bb79e3f695356", "question_tokens": [["What", 0], ["was", 
5], ["the", 9], ["entity", 13], ["that", 20], ["stepped", 25], ["in", 33], ["and", 36], ["caused", 40], ["Miami", 47], ["'s", 52], ["Sun", 55], ["Life", 59], ["Stadium", 64], ["to", 72], ["no", 75], ["longer", 78], ["be", 85], ["in", 88], ["the", 91], ["running", 95], ["to", 103], ["host", 106], ["Super", 111], ["Bowl", 117], ["50", 122], ["?", 124]], "detected_answers": [{"text": "Florida legislature", "char_spans": [[508, 526]], "token_spans": [[107, 108]]}]}, {"answers": ["1985", "1985", "1985"], "question": "Prior to this consideration, when did San Francisco last host a Super Bowl?", "id": "56d6ef6a0d65d21400198260", "qid": "118f18b3891a4fa9b39b090ba2a49a89", "question_tokens": [["Prior", 0], ["to", 6], ["this", 9], ["consideration", 14], [",", 27], ["when", 29], ["did", 34], ["San", 38], ["Francisco", 42], ["last", 52], ["host", 57], ["a", 62], ["Super", 64], ["Bowl", 70], ["?", 74]], "detected_answers": [{"text": "1985", "char_spans": [[309, 312]], "token_spans": [[64, 64]]}]}, {"answers": ["New Orleans", "New Orleans", "New Orleans"], "question": "What other city has hosted the Super Bowl ten times?", "id": "56d6ef6a0d65d21400198262", "qid": "504beced7bd646f28d3a3d983d0019e9", "question_tokens": [["What", 0], ["other", 5], ["city", 11], ["has", 16], ["hosted", 20], ["the", 27], ["Super", 31], ["Bowl", 37], ["ten", 42], ["times", 46], ["?", 51]], "detected_answers": [{"text": "New Orleans", "char_spans": [[197, 207]], "token_spans": [[40, 41]]}]}, {"answers": ["October 16, 2012", "October 16, 2012", "October 16, 2012,"], "question": "What date were the top two stadium choices for Super Bowl 50 announced?", "id": "56d98f0ddc89441400fdb558", "qid": "6d639b775f574bd3b9955b0ab4ad61cf", "question_tokens": [["What", 0], ["date", 5], ["were", 10], ["the", 15], ["top", 19], ["two", 23], ["stadium", 27], ["choices", 35], ["for", 43], ["Super", 47], ["Bowl", 53], ["50", 58], ["announced", 61], ["?", 70]], "detected_answers": [{"text": "October 16, 2012,", 
"char_spans": [[24, 40]], "token_spans": [[4, 8]]}]}, {"answers": ["10.", "10", "10 times"], "question": "How many times prios has the Sun Life Stadium had Super Bowls?", "id": "56d98f0ddc89441400fdb559", "qid": "d2aa5e88daed457589d89095055adcbb", "question_tokens": [["How", 0], ["many", 4], ["times", 9], ["prios", 15], ["has", 21], ["the", 25], ["Sun", 29], ["Life", 33], ["Stadium", 38], ["had", 46], ["Super", 50], ["Bowls", 56], ["?", 61]], "detected_answers": [{"text": "10 times", "char_spans": [[168, 175]], "token_spans": [[33, 34]]}]}, {"answers": ["New Orleans", "New Orleans", "New Orleans"], "question": "What city is tied with Miami for hosting the Super Bowl?", "id": "56d98f0ddc89441400fdb55a", "qid": "2ee29208f7ae4be1a28fce711758a99f", "question_tokens": [["What", 0], ["city", 5], ["is", 10], ["tied", 13], ["with", 18], ["Miami", 23], ["for", 29], ["hosting", 33], ["the", 41], ["Super", 45], ["Bowl", 51], ["?", 55]], "detected_answers": [{"text": "New Orleans", "char_spans": [[197, 207]], "token_spans": [[40, 41]]}]}, {"answers": ["1985", "1985", "1985"], "question": "When was the last time San Francisco hosted a Super Bowl?", "id": "56d98f0ddc89441400fdb55b", "qid": "84c1a0498d4b4843a03ea86fa7e8a5c6", "question_tokens": [["When", 0], ["was", 5], ["the", 9], ["last", 13], ["time", 18], ["San", 23], ["Francisco", 27], ["hosted", 37], ["a", 44], ["Super", 46], ["Bowl", 52], ["?", 56]], "detected_answers": [{"text": "1985", "char_spans": [[309, 312]], "token_spans": [[64, 64]]}]}, {"answers": ["Florida legislature", "the Florida legislature", "Florida legislature"], "question": "Who decided not to approve paying for renovations at Sun Life Stadium that the league wanted for them to do to host Super Bowl 50?", "id": "56d98f0ddc89441400fdb55c", "qid": "0e0dd2d0d46c491082bc3f0196789a95", "question_tokens": [["Who", 0], ["decided", 4], ["not", 12], ["to", 16], ["approve", 19], ["paying", 27], ["for", 34], ["renovations", 38], ["at", 50], ["Sun", 53], ["Life", 
57], ["Stadium", 62], ["that", 70], ["the", 75], ["league", 79], ["wanted", 86], ["for", 93], ["them", 97], ["to", 102], ["do", 105], ["to", 108], ["host", 111], ["Super", 116], ["Bowl", 122], ["50", 127], ["?", 129]], "detected_answers": [{"text": "Florida legislature", "char_spans": [[508, 526]], "token_spans": [[107, 108]]}]}], "context_tokens": [["The", 0], ["league", 4], ["announced", 11], ["on", 21], ["October", 24], ["16", 32], [",", 34], ["2012", 36], [",", 40], ["that", 42], ["the", 47], ["two", 51], ["finalists", 55], ["were", 65], ["Sun", 70], ["Life", 74], ["Stadium", 79], ["and", 87], ["Levi", 91], ["'s", 95], ["Stadium", 98], [".", 105], ["The", 107], ["South", 111], ["Florida", 117], ["/", 124], ["Miami", 125], ["area", 131], ["has", 136], ["previously", 140], ["hosted", 151], ["the", 158], ["event", 162], ["10", 168], ["times", 171], ["(", 177], ["tied", 178], ["for", 183], ["most", 187], ["with", 192], ["New", 197], ["Orleans", 201], [")", 208], [",", 209], ["with", 211], ["the", 216], ["most", 220], ["recent", 225], ["one", 232], ["being", 236], ["Super", 242], ["Bowl", 248], ["XLIV", 253], ["in", 258], ["2010", 261], [".", 265], ["The", 267], ["San", 271], ["Francisco", 275], ["Bay", 285], ["Area", 289], ["last", 294], ["hosted", 299], ["in", 306], ["1985", 309], ["(", 314], ["Super", 315], ["Bowl", 321], ["XIX", 326], [")", 329], [",", 330], ["held", 332], ["at", 337], ["Stanford", 340], ["Stadium", 349], ["in", 357], ["Stanford", 360], [",", 368], ["California", 370], [",", 380], ["won", 382], ["by", 386], ["the", 389], ["home", 393], ["team", 398], ["49ers", 403], [".", 408], ["The", 410], ["Miami", 414], ["bid", 420], ["depended", 424], ["on", 433], ["whether", 436], ["the", 444], ["stadium", 448], ["underwent", 456], ["renovations", 466], [".", 477], ["However", 479], [",", 486], ["on", 488], ["May", 491], ["3", 495], [",", 496], ["2013", 498], [",", 502], ["the", 504], ["Florida", 508], ["legislature", 516], ["refused", 528], ["to", 536], 
["approve", 539], ["the", 547], ["funding", 551], ["plan", 559], ["to", 564], ["pay", 567], ["for", 571], ["the", 575], ["renovations", 579], [",", 590], ["dealing", 592], ["a", 600], ["significant", 602], ["blow", 614], ["to", 619], ["Miami", 622], ["'s", 627], ["chances", 630], [".", 637]]}
{"id": "", "context": "For the third straight season, the number one seeds from both conferences met in the Super Bowl. The Carolina Panthers became one of only ten teams to have completed a regular season with only one loss, and one of only six teams to have acquired a 15\u20131 record, while the Denver Broncos became one of four teams to have made eight appearances in the Super Bowl. The Broncos made their second Super Bowl appearance in three years, having reached Super Bowl XLVIII, while the Panthers made their second Super Bowl appearance in franchise history, their other appearance being Super Bowl XXXVIII. Coincidentally, both teams were coached by John Fox in their last Super Bowl appearance prior to Super Bowl 50.", "qas": [{"answers": ["John Fox", "John Fox", "Fox"], "question": "Who coached each Super Bowl 50 participant in their most recent Super Bowl appearance prior to Super Bowl 50?", "id": "56be572b3aeaaa14008c9052", "qid": "c5f55f0f56704c7aa28769060e232615", "question_tokens": [["Who", 0], ["coached", 4], ["each", 12], ["Super", 17], ["Bowl", 23], ["50", 28], ["participant", 31], ["in", 43], ["their", 46], ["most", 52], ["recent", 57], ["Super", 64], ["Bowl", 70], ["appearance", 75], ["prior", 86], ["to", 92], ["Super", 95], ["Bowl", 101], ["50", 106], ["?", 108]], "detected_answers": [{"text": "Fox", "char_spans": [[641, 643]], "token_spans": [[118, 118]]}]}, {"answers": ["ten", "ten", "six"], "question": "How many NFL teams have finished the regular season with one loss?", "id": "56beb2a03aeaaa14008c922f", "qid": "b3627c8795144e2a9915e506968a0f53", "question_tokens": [["How", 0], ["many", 4], ["NFL", 9], ["teams", 13], ["have", 19], ["finished", 24], ["the", 33], ["regular", 37], ["season", 45], ["with", 52], ["one", 57], ["loss", 61], ["?", 65]], "detected_answers": [{"text": "six", "char_spans": [[219, 221]], "token_spans": [[43, 43]]}]}, {"answers": ["six", "six", "six"], "question": "How many NFL teams have gone 15-1 in one season?", "id": 
"56beb2a03aeaaa14008c9230", "qid": "d93364aeb44a49c0b8d03fcdae097770", "question_tokens": [["How", 0], ["many", 4], ["NFL", 9], ["teams", 13], ["have", 19], ["gone", 24], ["15", 29], ["-", 31], ["1", 32], ["in", 34], ["one", 37], ["season", 41], ["?", 47]], "detected_answers": [{"text": "six", "char_spans": [[219, 221]], "token_spans": [[43, 43]]}]}, {"answers": ["Carolina Panthers", "The Carolina Panthers", "Panthers"], "question": "Which team in Super Bowl 50 had a 15-1 record?", "id": "56beb2a03aeaaa14008c9231", "qid": "67f71da206e14df29c4095c51e2240a5", "question_tokens": [["Which", 0], ["team", 6], ["in", 11], ["Super", 14], ["Bowl", 20], ["50", 25], ["had", 28], ["a", 32], ["15", 34], ["-", 36], ["1", 37], ["record", 39], ["?", 45]], "detected_answers": [{"text": "Panthers", "char_spans": [[110, 117]], "token_spans": [[21, 21]]}]}, {"answers": ["Super Bowl XLVIII", "Super Bowl XLVIII", "XLVIII"], "question": "What was the last Super Bowl the Broncos participated in?", "id": "56beb2a03aeaaa14008c9232", "qid": "c6488a805a0044f2818aab09b0a77edb", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["last", 13], ["Super", 18], ["Bowl", 24], ["the", 29], ["Broncos", 33], ["participated", 41], ["in", 54], ["?", 56]], "detected_answers": [{"text": "XLVIII", "char_spans": [[455, 460]], "token_spans": [[87, 87]]}]}, {"answers": ["John Fox", "John Fox", "Fox"], "question": "Who was the head coach of the Broncos in Super Bowl XLVIII?", "id": "56beb2a03aeaaa14008c9233", "qid": "51784276fee04f33966e71320a7d87c3", "question_tokens": [["Who", 0], ["was", 4], ["the", 8], ["head", 12], ["coach", 17], ["of", 23], ["the", 26], ["Broncos", 30], ["in", 38], ["Super", 41], ["Bowl", 47], ["XLVIII", 52], ["?", 58]], "detected_answers": [{"text": "Fox", "char_spans": [[641, 643]], "token_spans": [[118, 118]]}]}, {"answers": ["eight", "eight", "eight"], "question": "What was the number of times the Denver Broncos played in a Super Bowl by the time they reached Super Bowl 50?", 
"id": "56bf28c73aeaaa14008c9539", "qid": "ddcbfe39a20540df86ccc7b713d820f6", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["number", 13], ["of", 20], ["times", 23], ["the", 29], ["Denver", 33], ["Broncos", 40], ["played", 48], ["in", 55], ["a", 58], ["Super", 60], ["Bowl", 66], ["by", 71], ["the", 74], ["time", 78], ["they", 83], ["reached", 88], ["Super", 96], ["Bowl", 102], ["50", 107], ["?", 109]], "detected_answers": [{"text": "eight", "char_spans": [[324, 328]], "token_spans": [[64, 64]]}]}, {"answers": ["ten", "ten", "ten"], "question": "How many NFL teams have had only one loss by the end of a regular season?", "id": "56bf28c73aeaaa14008c953a", "qid": "f420b5aeeea04b40aa8943b7c3cb9c66", "question_tokens": [["How", 0], ["many", 4], ["NFL", 9], ["teams", 13], ["have", 19], ["had", 24], ["only", 28], ["one", 33], ["loss", 37], ["by", 42], ["the", 45], ["end", 49], ["of", 53], ["a", 56], ["regular", 58], ["season", 66], ["?", 72]], "detected_answers": [{"text": "ten", "char_spans": [[138, 140]], "token_spans": [[26, 26]]}]}, {"answers": ["Super Bowl XXXVIII", "Super Bowl XXXVIII", "XXXVIII"], "question": "What was the first Super Bowl that the Carolina Panthers played in? 
", "id": "56bf28c73aeaaa14008c953c", "qid": "dda3b90e6e59403ab7d02ce32a8e84b5", "question_tokens": [["What", 0], ["was", 5], ["the", 9], ["first", 13], ["Super", 19], ["Bowl", 25], ["that", 30], ["the", 35], ["Carolina", 39], ["Panthers", 48], ["played", 57], ["in", 64], ["?", 66]], "detected_answers": [{"text": "XXXVIII", "char_spans": [[584, 590]], "token_spans": [[108, 108]]}]}, {"answers": ["six", "ten", "ten"], "question": "How many teams can boast a 15\u20131 regular season record?", "id": "56bf28c73aeaaa14008c953d", "qid": "2ff5748b293c472ba0800a38fc620079", "question_tokens": [["How", 0], ["many", 4], ["teams", 9], ["can", 15], ["boast", 19], ["a", 25], ["15\u20131", 27], ["regular", 32], ["season", 40], ["record", 47], ["?", 53]], "detected_answers": [{"text": "ten", "char_spans": [[138, 140]], "token_spans": [[26, 26]]}]}, {"answers": ["number one", "number one", "one"], "question": "What seed was the Carolina Panthers?", "id": "56d6f1190d65d21400198272", "qid": "9b75a742420e4c5c8f12e837ff18c25a", "question_tokens": [["What", 0], ["seed", 5], ["was", 10], ["the", 14], ["Carolina", 18], ["Panthers", 27], ["?", 35]], "detected_answers": [{"text": "one", "char_spans": [[42, 44]], "token_spans": [[8, 8]]}]}, {"answers": ["number one", "number one", "one"], "question": "What seed was the Denver Broncos?", "id": "56d6f1190d65d21400198273", "qid": "b870221781c64c0e9769b542adb9de5c", "question_tokens": [["What", 0], ["seed", 5], ["was", 10], ["the", 14], ["Denver", 18], ["Broncos", 25], ["?", 32]], "detected_answers": [{"text": "one", "char_spans": [[42, 44]], "token_spans": [[8, 8]]}]}, {"answers": ["Super Bowl XLVIII", "Super Bowl XLVIII", "Super Bowl XLVIII"], "question": "Prior to Super Bowl 50, when were the Broncos last there?", "id": "56d6f1190d65d21400198274", "qid": "3891e5bedff44b968211dcb2f9287a75", "question_tokens": [["Prior", 0], ["to", 6], ["Super", 9], ["Bowl", 15], ["50", 20], [",", 22], ["when", 24], ["were", 29], ["the", 34], ["Broncos", 38], 
["last", 46], ["there", 51], ["?", 56]], "detected_answers": [{"text": "Super Bowl XLVIII", "char_spans": [[444, 460]], "token_spans": [[85, 87]]}]}, {"answers": ["Super Bowl XXXVIII.", "Super Bowl XXXVIII", "Super Bowl XXXVIII"], "question": "Prior to Super Bowl 50, when were the Carolina Panthers last there?", "id": "56d6f1190d65d21400198275", "qid": "f3b014ecab1c4032a1c6e5fa7641555b", "question_tokens": [["Prior", 0], ["to", 6], ["Super", 9], ["Bowl", 15], ["50", 20], [",", 22], ["when", 24], ["were", 29], ["the", 34], ["Carolina", 38], ["Panthers", 47], ["last", 56], ["there", 61], ["?", 66]], "detected_answers": [{"text": "Super Bowl XXXVIII", "char_spans": [[573, 590]], "token_spans": [[106, 108]]}]}, {"answers": ["six", "ten", "ten"], "question": "How many teams have had a 15-1 record for the regular season?", "id": "56d6f1190d65d21400198276", "qid": "644501ec5c0749bbbb3b27debffca9b8", "question_tokens": [["How", 0], ["many", 4], ["teams", 9], ["have", 15], ["had", 20], ["a", 24], ["15", 26], ["-", 28], ["1", 29], ["record", 31], ["for", 38], ["the", 42], ["regular", 46], ["season", 54], ["?", 60]], "detected_answers": [{"text": "ten", "char_spans": [[138, 140]], "token_spans": [[26, 26]]}]}, {"answers": ["one", "1", "1"], "question": "How many games did the Panthers lose in the regular season before Super Bowl 50?", "id": "56d99179dc89441400fdb56c", "qid": "a4339b28775f456c94895fc5a37e77fd", "question_tokens": [["How", 0], ["many", 4], ["games", 9], ["did", 15], ["the", 19], ["Panthers", 23], ["lose", 32], ["in", 37], ["the", 40], ["regular", 44], ["season", 52], ["before", 59], ["Super", 66], ["Bowl", 72], ["50", 77], ["?", 79]], "detected_answers": [{"text": "one", "char_spans": [[42, 44]], "token_spans": [[8, 8]]}]}, {"answers": ["four", "four", "four"], "question": "How many teams up to Super Bowl 50 have been to the championship game eight times?", "id": "56d99179dc89441400fdb56d", "qid": "b1cb5895c617478aa2d30992551f903a", "question_tokens": [["How", 
0], ["many", 4], ["teams", 9], ["up", 15], ["to", 18], ["Super", 21], ["Bowl", 27], ["50", 32], ["have", 35], ["been", 40], ["to", 45], ["the", 48], ["championship", 52], ["game", 65], ["eight", 70], ["times", 76], ["?", 81]], "detected_answers": [{"text": "four", "char_spans": [[300, 303]], "token_spans": [[59, 59]]}]}, {"answers": ["John Fox", "John Fox", "John Fox"], "question": "Before Super Bowl 50, what was the coach's name that coached both teams for their last Super Bowl appearances?", "id": "56d99179dc89441400fdb570", "qid": "7311a706089c4dedaee715881d893ebc", "question_tokens": [["Before", 0], ["Super", 7], ["Bowl", 13], ["50", 18], [",", 20], ["what", 22], ["was", 27], ["the", 31], ["coach", 35], ["'s", 40], ["name", 43], ["that", 48], ["coached", 53], ["both", 61], ["teams", 66], ["for", 72], ["their", 76], ["last", 82], ["Super", 87], ["Bowl", 93], ["appearances", 98], ["?", 109]], "detected_answers": [{"text": "John Fox", "char_spans": [[636, 643]], "token_spans": [[117, 118]]}]}], "context_tokens": [["For", 0], ["the", 4], ["third", 8], ["straight", 14], ["season", 23], [",", 29], ["the", 31], ["number", 35], ["one", 42], ["seeds", 46], ["from", 52], ["both", 57], ["conferences", 62], ["met", 74], ["in", 78], ["the", 81], ["Super", 85], ["Bowl", 91], [".", 95], ["The", 97], ["Carolina", 101], ["Panthers", 110], ["became", 119], ["one", 126], ["of", 130], ["only", 133], ["ten", 138], ["teams", 142], ["to", 148], ["have", 151], ["completed", 156], ["a", 166], ["regular", 168], ["season", 176], ["with", 183], ["only", 188], ["one", 193], ["loss", 197], [",", 201], ["and", 203], ["one", 207], ["of", 211], ["only", 214], ["six", 219], ["teams", 223], ["to", 229], ["have", 232], ["acquired", 237], ["a", 246], ["15\u20131", 248], ["record", 253], [",", 259], ["while", 261], ["the", 267], ["Denver", 271], ["Broncos", 278], ["became", 286], ["one", 293], ["of", 297], ["four", 300], ["teams", 305], ["to", 311], ["have", 314], ["made", 319], ["eight", 324], 
["appearances", 330], ["in", 342], ["the", 345], ["Super", 349], ["Bowl", 355], [".", 359], ["The", 361], ["Broncos", 365], ["made", 373], ["their", 378], ["second", 384], ["Super", 391], ["Bowl", 397], ["appearance", 402], ["in", 413], ["three", 416], ["years", 422], [",", 427], ["having", 429], ["reached", 436], ["Super", 444], ["Bowl", 450], ["XLVIII", 455], [",", 461], ["while", 463], ["the", 469], ["Panthers", 473], ["made", 482], ["their", 487], ["second", 493], ["Super", 500], ["Bowl", 506], ["appearance", 511], ["in", 522], ["franchise", 525], ["history", 535], [",", 542], ["their", 544], ["other", 550], ["appearance", 556], ["being", 567], ["Super", 573], ["Bowl", 579], ["XXXVIII", 584], [".", 591], ["Coincidentally", 593], [",", 607], ["both", 609], ["teams", 614], ["were", 620], ["coached", 625], ["by", 633], ["John", 636], ["Fox", 641], ["in", 645], ["their", 648], ["last", 654], ["Super", 659], ["Bowl", 665], ["appearance", 670], ["prior", 681], ["to", 687], ["Super", 690], ["Bowl", 696], ["50", 701], [".", 703]]}
{"id": "", "context": "Despite waiving longtime running back DeAngelo Williams and losing top wide receiver Kelvin Benjamin to a torn ACL in the preseason, the Carolina Panthers had their best regular season in franchise history, becoming the seventh team to win at least 15 regular season games since the league expanded to a 16-game schedule in 1978. Carolina started the season 14\u20130, not only setting franchise records for the best start and the longest single-season winning streak, but also posting the best start to a season by an NFC team in NFL history, breaking the 13\u20130 record previously shared with the 2009 New Orleans Saints and the 2011 Green Bay Packers. With their NFC-best 15\u20131 regular season record, the Panthers clinched home-field advantage throughout the NFC playoffs for the first time in franchise history. Ten players were selected to the Pro Bowl (the most in franchise history) along with eight All-Pro selections.", "qas": [{"answers": ["DeAngelo Williams", "DeAngelo Williams", "Williams"], "question": "Whic Carolina Panthers running back was waived?", "id": "56be59683aeaaa14008c9058", "qid": "0e3b434ab0ac4318aabcd29e030523ae", "question_tokens": [["Whic", 0], ["Carolina", 5], ["Panthers", 14], ["running", 23], ["back", 31], ["was", 36], ["waived", 40], ["?", 46]], "detected_answers": [{"text": "Williams", "char_spans": [[47, 54]], "token_spans": [[6, 6]]}]}, {"answers": ["Kelvin Benjamin", "Kelvin Benjamin", "Benjamin"], "question": "Which Carolina Panthers wide receiver suffered a torn ACL before the season began?", "id": "56be59683aeaaa14008c9059", "qid": "08b9ace3723f4360b5bb00a7aea977ec", "question_tokens": [["Which", 0], ["Carolina", 6], ["Panthers", 15], ["wide", 24], ["receiver", 29], ["suffered", 38], ["a", 47], ["torn", 49], ["ACL", 54], ["before", 58], ["the", 65], ["season", 69], ["began", 76], ["?", 81]], "detected_answers": [{"text": "Benjamin", "char_spans": [[92, 99]], "token_spans": [[13, 13]]}]}, {"answers": ["7", 
"seventh", "seventh"], "question": "How many teams have won 15 regular season games since the 16-game schedule was adopted?", "id": "56be59683aeaaa14008c905a", "qid": "50f65d4d1ec6483ba2bb42031d656df0", "question_tokens": [["How", 0], ["many", 4], ["teams", 9], ["have", 15], ["won", 20], ["15", 24], ["regular", 27], ["season", 35], ["games", 42], ["since", 48], ["the", 54], ["16-game", 58], ["schedule", 66], ["was", 75], ["adopted", 79], ["?", 86]], "detected_answers": [{"text": "seventh", "char_spans": [[220, 226]], "token_spans": [[36, 36]]}]}, {"answers": ["1978", "1978", "1978"], "question": "In what year did the NFL switch to a 16-game regular season?", "id": "56beb3083aeaaa14008c923d", "qid": "d82088877d5543f694f2af057fce09a7", "question_tokens": [["In", 0], ["what", 3], ["year", 8], ["did", 13], ["the", 17], ["NFL", 21], ["switch", 25], ["to", 32], ["a", 35], ["16-game", 37], ["regular", 45], ["season", 53], ["?", 59]], "detected_answers": [{"text": "1978", "char_spans": [[324, 327]], "token_spans": [[55, 55]]}]}, {"answers": ["Carolina Panthers", "the Panthers", "Carolina"], "question": "Who had the best record in the NFC?", "id": "56beb3083aeaaa14008c923e", "qid": "4394b910b1d840e78b9ad6b8e5f31b4c", "question_tokens": [["Who", 0], ["had", 4], ["the", 8], ["best", 12], ["record", 17], ["in", 24], ["the", 27], ["NFC", 31], ["?", 34]], "detected_answers": [{"text": "Carolina", "char_spans": [[330, 337]], "token_spans": [[57, 57]]}]}, {"answers": ["Ten", "Ten", "Ten"], "question": "How many Panthers went to the Pro Bowl?", "id": "56beb3083aeaaa14008c923f", "qid": "7131e4b52ecb44f4ae14f3734f6d7c96", "question_tokens": [["How", 0], ["many", 4], ["Panthers", 9], ["went", 18], ["to", 23], ["the", 26], ["Pro", 30], ["Bowl", 34], ["?", 38]], "detected_answers": [{"text": "Ten", "char_spans": [[807, 809]], "token_spans": [[146, 146]]}]}, {"answers": ["eight", "eight", "eight"], "question": "How many Panthers were designated All-Pro?", "id": 
"56beb3083aeaaa14008c9240", "qid": "0ec83cf9a5d94c08a7f75fcfbded2057", "question_tokens": [["How", 0], ["many", 4], ["Panthers", 9], ["were", 18], ["designated", 23], ["All", 34], ["-", 37], ["Pro", 38], ["?", 41]], "detected_answers": [{"text": "eight", "char_spans": [[892, 896]], "token_spans": [[163, 163]]}]}, {"answers": ["Kelvin Benjamin", "Kelvin Benjamin", "Benjamin"], "question": "What Panther tore his ACL in the preseason?", "id": "56beb3083aeaaa14008c9241", "qid": "0515b56f5c034b8492e71266251cd34f", "question_tokens": [["What", 0], ["Panther", 5], ["tore", 13], ["his", 18], ["ACL", 22], ["in", 26], ["the", 29], ["preseason", 33], ["?", 42]], "detected_answers": [{"text": "Benjamin", "char_spans": [[92, 99]], "token_spans": [[13, 13]]}]}, {"answers": ["1978", "1978", "1978"], "question": "What year did the league begin having schedules with 16 games in them?", "id": "56bf2afe3aeaaa14008c9543", "qid": "04def93363e142fb976ded79212e9331", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["the", 14], ["league", 18], ["begin", 25], ["having", 31], ["schedules", 38], ["with", 48], ["16", 53], ["games", 56], ["in", 62], ["them", 65], ["?", 69]], "detected_answers": [{"text": "1978", "char_spans": [[324, 327]], "token_spans": [[55, 55]]}]}, {"answers": ["2009", "2009", "2009"], "question": "What year did the the Saints hit a 13-0 record?", "id": "56bf2afe3aeaaa14008c9544", "qid": "e91a3b3b72d44a4ab0de6f9cbe4de146", "question_tokens": [["What", 0], ["year", 5], ["did", 10], ["the", 14], ["the", 18], ["Saints", 22], ["hit", 29], ["a", 33], ["13", 35], ["-", 37], ["0", 38], ["record", 40], ["?", 46]], "detected_answers": [{"text": "2009", "char_spans": [[591, 594]], "token_spans": [[106, 106]]}]}, {"answers": ["2011", "2011", "2011"], "question": "When did the Packers arrive at a record of 13-0?", "id": "56bf2afe3aeaaa14008c9545", "qid": "9e6cf6de9029406a8c807f3c9dadaab0", "question_tokens": [["When", 0], ["did", 5], ["the", 9], ["Packers", 13], ["arrive", 
21], ["at", 28], ["a", 31], ["record", 33], ["of", 40], ["13", 43], ["-", 45], ["0", 46], ["?", 47]], "detected_answers": [{"text": "2011", "char_spans": [[623, 626]], "token_spans": [[112, 112]]}]}, {"answers": ["torn ACL", "a torn ACL", "torn ACL"], "question": "What injury did the Carolina Panthers lose Kelvin Benjamin to during their preseason?", "id": "56bf2afe3aeaaa14008c9547", "qid": "88138335d9724387ab75669b97131b2a", "question_tokens": [["What", 0], ["injury", 5], ["did", 12], ["the", 16], ["Carolina", 20], ["Panthers", 29], ["lose", 38], ["Kelvin", 43], ["Benjamin", 50], ["to", 59], ["during", 62], ["their", 69], ["preseason", 75], ["?", 84]], "detected_answers": [{"text": "torn ACL", "char_spans": [[106, 113]], "token_spans": [[16, 17]]}]}, {"answers": ["Kelvin Benjamin", "Kelvin Benjamin", "Benjamin"], "question": "Which player did the Panthers lose to an ACL injury in a preseason game?", "id": "56d6f2000d65d2140019827c", "qid": "8367dace42d646df8dfe429f545d2c1b", "question_tokens": [["Which", 0], ["player", 6], ["did", 13], ["the", 17], ["Panthers", 21], ["lose", 30], ["to", 35], ["an", 38], ["ACL", 41], ["injury", 45], ["in", 52], ["a", 55], ["preseason", 57], ["game", 67], ["?", 71]], "detected_answers": [{"text": "Benjamin", "char_spans": [[92, 99]], "token_spans": [[13, 13]]}]}, {"answers": ["DeAngelo Williams", "DeAngelo Williams", "Williams"], "question": "Which running back did the Panthers waive?", "id": "56d6f2000d65d2140019827d", "qid": "56ff24dfe4eb4f2186901cca0ee168f3", "question_tokens": [["Which", 0], ["running", 6], ["back", 14], ["did", 19], ["the", 23], ["Panthers", 27], ["waive", 36], ["?", 41]], "detected_answers": [{"text": "Williams", "char_spans": [[47, 54]], "token_spans": [[6, 6]]}]}, {"answers": ["1978", "1978", "1978"], "question": "When did the NFL start their 16 game seasons?", "id": "56d6f2000d65d2140019827e", "qid": "cf061826580f45908fec8617888fe6a9", "question_tokens": [["When", 0], ["did", 5], ["the", 9], ["NFL", 13], 
["start", 17], ["their", 23], ["16", 29], ["game", 32], ["seasons", 37], ["?", 44]], "detected_answers": [{"text": "1978", "char_spans": [[324, 327]], "token_spans": [[55, 55]]}]}, {"answers": ["Ten", "Ten", "Ten"], "question": "How many Panthers players were selected to the Pro Bowl?", "id": "56d6f2000d65d2140019827f", "qid": "aab9c74ec68645e79ecf9019e36ac771", "question_tokens": [["How", 0], ["many", 4], ["Panthers", 9], ["players", 18], ["were", 26], ["selected", 31], ["to", 40], ["the", 43], ["Pro", 47], ["Bowl", 51], ["?", 55]], "detected_answers": [{"text": "Ten", "char_spans": [[807, 809]], "token_spans": [[146, 146]]}]}, {"answers": ["Carolina Panthers", "the Panthers", "Carolina"], "question": "Which team had the best regular season in their history?", "id": "56d9943fdc89441400fdb576", "qid": "0833e0539eb44c2fba66625f7bdce517", "question_tokens": [["Which", 0], ["team", 6], ["had", 11], ["the", 15], ["best", 19], ["regular", 24], ["season", 32], ["in", 39], ["their", 42], ["history", 48], ["?", 55]], "detected_answers": [{"text": "Carolina", "char_spans": [[330, 337]], "token_spans": [[57, 57]]}]}, {"answers": ["1978.", "1978", "1978"], "question": "When did the league go from 15 to 16 games in the regular season?", "id": "56d9943fdc89441400fdb577", "qid": "2e29854700ab4436bf2a4fa777c84665", "question_tokens": [["When", 0], ["did", 5], ["the", 9], ["league", 13], ["go", 20], ["from", 23], ["15", 28], ["to", 31], ["16", 34], ["games", 37], ["in", 43], ["the", 46], ["regular", 50], ["season", 58], ["?", 64]], "detected_answers": [{"text": "1978", "char_spans": [[324, 327]], "token_spans": [[55, 55]]}]}, {"answers": ["Carolina Panthers", "the Panthers", "Carolina"], "question": "What team had the best start ever in the NFL?", "id": "56d9943fdc89441400fdb578", "qid": "7ded1b9fa0c4487cb947ea9653235a3f", "question_tokens": [["What", 0], ["team", 5], ["had", 10], ["the", 14], ["best", 18], ["start", 23], ["ever", 29], ["in", 34], ["the", 37], ["NFL", 41], ["?", 
44]], "detected_answers": [{"text": "Carolina", "char_spans": [[330, 337]], "token_spans": [[57, 57]]}]}, {"answers": ["Ten", "Ten", "Ten"], "question": "How many Panthers players were chosen for the 2015 season's Pro Bowl?", "id": "56d9943fdc89441400fdb57a", "qid": "8bc00cf440354fd5bd645ccad4b04290", "question_tokens": [["How", 0], ["many", 4], ["Panthers", 9], ["players", 18], ["were", 26], ["chosen", 31], ["for", 38], ["the", 42], ["2015", 46], ["season", 51], ["'s", 57], ["Pro", 60], ["Bowl", 64], ["?", 68]], "detected_answers": [{"text": "Ten", "char_spans": [[807, 809]], "token_spans": [[146, 146]]}]}], "context_tokens": [["Despite", 0], ["waiving", 8], ["longtime", 16], ["running", 25], ["back", 33], ["DeAngelo", 38], ["Williams", 47], ["and", 56], ["losing", 60], ["top", 67], ["wide", 71], ["receiver", 76], ["Kelvin", 85], ["Benjamin", 92], ["to", 101], ["a", 104], ["torn", 106], ["ACL", 111], ["in", 115], ["the", 118], ["preseason", 122], [",", 131], ["the", 133], ["Carolina", 137], ["Panthers", 146], ["had", 155], ["their", 159], ["best", 165], ["regular", 170], ["season", 178], ["in", 185], ["franchise", 188], ["history", 198], [",", 205], ["becoming", 207], ["the", 216], ["seventh", 220], ["team", 228], ["to", 233], ["win", 236], ["at", 240], ["least", 243], ["15", 249], ["regular", 252], ["season", 260], ["games", 267], ["since", 273], ["the", 279], ["league", 283], ["expanded", 290], ["to", 299], ["a", 302], ["16-game", 304], ["schedule", 312], ["in", 321], ["1978", 324], [".", 328], ["Carolina", 330], ["started", 339], ["the", 347], ["season", 351], ["14\u20130", 358], [",", 362], ["not", 364], ["only", 368], ["setting", 373], ["franchise", 381], ["records", 391], ["for", 399], ["the", 403], ["best", 407], ["start", 412], ["and", 418], ["the", 422], ["longest", 426], ["single", 434], ["-", 440], ["season", 441], ["winning", 448], ["streak", 456], [",", 462], ["but", 464], ["also", 468], ["posting", 473], ["the", 481], ["best", 485], ["start", 490], 
["to", 496], ["a", 499], ["season", 501], ["by", 508], ["an", 511], ["NFC", 514], ["team", 518], ["in", 523], ["NFL", 526], ["history", 530], [",", 537], ["breaking", 539], ["the", 548], ["13\u20130", 552], ["record", 557], ["previously", 564], ["shared", 575], ["with", 582], ["the", 587], ["2009", 591], ["New", 596], ["Orleans", 600], ["Saints", 608], ["and", 615], ["the", 619], ["2011", 623], ["Green", 628], ["Bay", 634], ["Packers", 638], [".", 645], ["With", 647], ["their", 652], ["NFC", 658], ["-", 661], ["best", 662], ["15\u20131", 667], ["regular", 672], ["season", 680], ["record", 687], [",", 693], ["the", 695], ["Panthers", 699], ["clinched", 708], ["home", 717], ["-", 721], ["field", 722], ["advantage", 728], ["throughout", 738], ["the", 749], ["NFC", 753], ["playoffs", 757], ["for", 766], ["the", 770], ["first", 774], ["time", 780], ["in", 785], ["franchise", 788], ["history", 798], [".", 805], ["Ten", 807], ["players", 811], ["were", 819], ["selected", 824], ["to", 833], ["the", 836], ["Pro", 840], ["Bowl", 844], ["(", 849], ["the", 850], ["most", 854], ["in", 859], ["franchise", 862], ["history", 872], [")", 879], ["along", 881], ["with", 887], ["eight", 892], ["All", 898], ["-", 901], ["Pro", 902], ["selections", 906], [".", 916]]}
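The records above follow the MRQA layout: each `detected_answers` entry carries `char_spans` that index into `context`, and in the sample data both ends of a span are inclusive (for example, "six" is stored as `[219, 221]`). A minimal sketch of reading such a span is shown below; the helper name `span_text` and the shortened record are illustrative assumptions, not part of the repository.

```python
# Hypothetical helper: recover the answer text from an inclusive character span.
def span_text(context, char_span):
    start, end = char_span
    # The end offset is inclusive in the sample data, so add 1 for Python slicing.
    return context[start:end + 1]


# Shortened, made-up record that uses the same keys as the data above.
record = {
    "context": "Both teams were coached by John Fox.",
    "qas": [
        {
            "qid": "example-qid",
            "question": "Who coached both teams?",
            "detected_answers": [
                {"text": "John Fox", "char_spans": [[27, 34]]}
            ],
        }
    ],
}

for qa in record["qas"]:
    for answer in qa["detected_answers"]:
        for span in answer["char_spans"]:
            # Each span should reproduce the stored answer text.
            print(span_text(record["context"], span) == answer["text"])
```

A check like this can help catch records whose offsets drifted after the context was edited or truncated.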
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""ERNIE (PaddlePaddle) model wrapper"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import json
import collections
import multiprocessing
import argparse
import numpy as np
import paddle.fluid as fluid
from pdnlp.toolkit.configure import ArgumentGroup
from task_reader.mrqa_infer import DataProcessor, get_answers
from pdnlp.toolkit.init import init_pretraining_params, init_checkpoint
ema_decay = 0.9999
verbose = False
max_seq_len = 512
max_query_length = 64
max_answer_length = 30
in_tokens = False
do_lower_case = True
doc_stride = 128
n_best_size = 20
use_cuda = True
class ERNIEModelWrapper():
    """
    Wrap an ERNIE inference model.
    The basic processes include preprocessing, running the loaded
    PaddlePaddle inference program, and postprocessing.
    """

    def __init__(self, model_dir):
        """Create the executor, the data processor and load the inference model."""
        if use_cuda:
            place = fluid.CUDAPlace(0)
            dev_count = fluid.core.get_cuda_device_count()
        else:
            place = fluid.CPUPlace()
            dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
        self.exe = fluid.Executor(place)
        self.bert_preprocessor = DataProcessor(
            vocab_path=os.path.join(model_dir, 'vocab.txt'),
            do_lower_case=do_lower_case,
            max_seq_length=max_seq_len,
            in_tokens=in_tokens,
            doc_stride=doc_stride,
            max_query_length=max_query_length)
        self.inference_program, self.feed_target_names, self.fetch_targets = \
            fluid.io.load_inference_model(dirname=model_dir, executor=self.exe)

    def preprocessor(self, samples, batch_size, examples_start_id, features_start_id):
        """Preprocess the input samples, including word seg, padding, token to ids"""
        # Tokenization and paragraph padding
        examples, features, batch = self.bert_preprocessor.data_generator(
            samples, batch_size, max_len=max_seq_len,
            examples_start_id=examples_start_id,
            features_start_id=features_start_id)
        self.samples = samples
        return examples, features, batch

    def call_mrc(self, batch, squeeze_dim0=False, return_list=False):
        """Run the MRC inference program on one batch."""
        if squeeze_dim0 and return_list:
            raise ValueError("squeeze_dim0 only works for dict-type return value.")
        src_ids = batch[0]
        pos_ids = batch[1]
        sent_ids = batch[2]
        input_mask = batch[3]
        unique_id = batch[4]
        feed_dict = {
            self.feed_target_names[0]: src_ids,
            self.feed_target_names[1]: pos_ids,
            self.feed_target_names[2]: sent_ids,
            self.feed_target_names[3]: input_mask,
            self.feed_target_names[4]: unique_id
        }
        np_unique_ids, np_start_logits, np_end_logits, np_num_seqs = \
            self.exe.run(self.inference_program, feed=feed_dict, fetch_list=self.fetch_targets)
        if len(np_unique_ids) == 1 and squeeze_dim0:
            np_unique_ids = np_unique_ids[0]
            np_start_logits = np_start_logits[0]
            np_end_logits = np_end_logits[0]
        if return_list:
            # One dict per feature in the batch
            mrc_results = [{'unique_ids': uid, 'start_logits': st, 'end_logits': end}
                           for uid, st, end in zip(np_unique_ids, np_start_logits, np_end_logits)]
        else:
            mrc_results = {
                'unique_ids': np_unique_ids,
                'start_logits': np_start_logits,
                'end_logits': np_end_logits,
            }
        return mrc_results

    def postprocessor(self, examples, features, mrc_results):
        """Extract answers.
        examples, features: outputs of preprocessor.
        mrc_results: model results from call_mrc. If mrc_results is a list,
            each element is a batch of size 1.
        """
        RawResult = collections.namedtuple("RawResult",
                                           ["unique_id", "start_logits", "end_logits"])
        results = []
        if isinstance(mrc_results, list):
            for res in mrc_results:
                unique_id = res['unique_ids'][0]
                start_logits = [float(x) for x in res['start_logits'].flat]
                end_logits = [float(x) for x in res['end_logits'].flat]
                results.append(
                    RawResult(
                        unique_id=unique_id,
                        start_logits=start_logits,
                        end_logits=end_logits))
        else:
            assert isinstance(mrc_results, dict)
            for idx in range(mrc_results['unique_ids'].shape[0]):
                unique_id = int(mrc_results['unique_ids'][idx])
                start_logits = [float(x) for x in mrc_results['start_logits'][idx].flat]
                end_logits = [float(x) for x in mrc_results['end_logits'][idx].flat]
                results.append(
                    RawResult(
                        unique_id=unique_id,
                        start_logits=start_logits,
                        end_logits=end_logits))
        answers = get_answers(
            examples, features, results, n_best_size,
            max_answer_length, do_lower_case, verbose)
        return answers
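The wrapper above passes start and end logits, `n_best_size` and `max_answer_length` to `get_answers`, whose body is not shown here. The sketch below illustrates one common way to pick an answer span from such logits. It is an assumption about the general technique, not a copy of what `get_answers` does, and the function name `best_span` is hypothetical.

```python
# Hypothetical illustration: choose the highest-scoring (start, end) token span.
def best_span(start_logits, end_logits, n_best_size=20, max_answer_length=30):
    def top_indices(logits):
        # Indices of the n_best_size largest logits, best first.
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[:n_best_size]

    best = None
    for s in top_indices(start_logits):
        for e in top_indices(end_logits):
            # Skip spans that end before they start or that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            score = start_logits[s] + end_logits[e]
            if best is None or score > best[0]:
                best = (score, s, e)
    return best  # (score, start_index, end_index), or None if no valid span


result = best_span([0.1, 2.0, 0.3, 0.0], [0.0, 0.5, 3.0, 0.1])
print(result)
```

Restricting the search to the top candidates keeps the pairwise loop small even for 512-token inputs.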
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Some utilities for MRC online service"""
import json
import sys
import logging
import time
import numpy as np
from flask import Response
from flask import request
from copy import deepcopy
verbose = False


def _request_check(input_json):
    """Check if the request json is valid"""
    if input_json is None or not isinstance(input_json, dict):
        return 'Can not parse the input json data - {}'.format(input_json)
    try:
        # Only the presence of these keys is checked; the values are unused here.
        c = input_json['context']
        qa = input_json['qas'][0]
        qid = qa['qid']
        q = qa['question']
    except KeyError as e:
        return 'Invalid request, key "{}" not found'.format(e)
    return 'OK'


def _abort(status_code, message):
    """Create custom error message and status code"""
    return Response(json.dumps(message), status=status_code, mimetype='application/json')


def _timmer(init_start, start, current, process_name):
    """Print cumulated and current elapsed time in milliseconds"""
    cumulated_elapsed_time = (current - init_start) * 1000
    current_elapsed_time = (current - start) * 1000
    print('{}\t-\t{:.2f}\t{:.2f}'.format(process_name, cumulated_elapsed_time,
                                         current_elapsed_time))


def _split_input_json(input_json):
    """Split a request into one request per question"""
    # Truncate overly long contexts
    if len(input_json['context_tokens']) > 810:
        input_json['context'] = input_json['context'][:5000]
    if len(input_json['qas']) == 1:
        return [input_json]
    else:
        rets = []
        for i in range(len(input_json['qas'])):
            temp = deepcopy(input_json)
            temp['qas'] = [input_json['qas'][i]]
            rets.append(temp)
        return rets


class BasicMRCService(object):
    """Provide basic MRC service for flask"""

    def __init__(self, name, logger=None, log_data=False):
        """ """
        self.name = name
        if logger is None:
            self.logger = logging.getLogger('flask')
        else:
            self.logger = logger
        self.log_data = log_data

    def __call__(self, model, process_mode='serial', max_batch_size=5, timmer=False):
        """
        Call the MRC model wrapper and handle exceptions.
        Args:
            process_mode: 'serial' or 'parallel'
        """
        if timmer:
            start = time.time()
        self.input_json = request.get_json(silent=True)
        try:
            if timmer:
                start_request_check = time.time()
            request_status = _request_check(self.input_json)
            if timmer:
                current_time = time.time()
                _timmer(start, start_request_check, current_time, 'request check')
            if self.log_data:
                if self.logger is None:
                    logging.info(
                        'Client input - {}'.format(json.dumps(self.input_json, ensure_ascii=False))
                    )
                else:
                    self.logger.info(
                        'Client input - {}'.format(json.dumps(self.input_json, ensure_ascii=False))
                    )
        except Exception as e:
            self.logger.error('server request checker error')
            self.logger.exception(e)
            return _abort(500, 'server request checker error - {}'.format(e))
        if request_status != 'OK':
            return _abort(400, request_status)

        # call preprocessor
        try:
            if timmer:
                start_preprocess = time.time()
            jsons = _split_input_json(self.input_json)
            processed = []
            ex_start_idx = 0
            feat_start_idx = 1000000000
            for i in jsons:
                e, f, b = model.preprocessor(
                    i,
                    batch_size=max_batch_size if process_mode == 'parallel' else 1,
                    examples_start_id=ex_start_idx,
                    features_start_id=feat_start_idx)
                ex_start_idx += len(e)
                feat_start_idx += len(f)
                processed.append([e, f, b])
            if timmer:
                current_time = time.time()
                _timmer(start, start_preprocess, current_time, 'preprocess')
        except Exception as e:
            self.logger.error('preprocessor error')
            self.logger.exception(e)
            return _abort(500, 'preprocessor error - {}'.format(e))

        def transpose(mat):
            return zip(*mat)

        # call mrc
        try:
            if timmer:
                start_call_mrc = time.time()
            self.mrc_results = []
            self.examples = []
            self.features = []
            for e, f, batches in processed:
                if verbose:
                    if len(f) > max_batch_size:
                        print("get a too long example....")
                if process_mode == 'serial':
                    self.mrc_results.extend([model.call_mrc(b, squeeze_dim0=True) for b in batches[:max_batch_size]])
                elif process_mode == 'parallel':
                    # only keep first max_batch_size features
                    # batches = batches[0]
                    for b in batches:
                        self.mrc_results.extend(model.call_mrc(b, return_list=True))
                else:
                    raise NotImplementedError()
                self.examples.extend(e)
                # self.features.extend(f[:max_batch_size])
                self.features.extend(f)
            if timmer:
                current_time = time.time()
                _timmer(start, start_call_mrc, current_time, 'call mrc')
        except Exception as e:
            self.logger.error('call_mrc error')
            self.logger.exception(e)
            return _abort(500, 'call_mrc error - {}'.format(e))

        # call post processor
        try:
            if timmer:
                start_post_precess = time.time()
            self.results = model.postprocessor(self.examples, self.features, self.mrc_results)
            # only nbest results is POSTed back
            self.results = self.results[1]
            # self.results = self.results[0]
            if timmer:
                current_time = time.time()
                _timmer(start, start_post_precess, current_time, 'post process')
        except Exception as e:
            self.logger.error('postprocessor error')
            self.logger.exception(e)
            return _abort(500, 'postprocessor error - {}'.format(e))
        return self._response_constructor()

    def _response_constructor(self):
        """construct http response object"""
        try:
            response = {
                # 'requestID': self.input_json['requestID'],
                'results': self.results
            }
            if self.log_data:
                self.logger.info(
                    'Response - {}'.format(json.dumps(response, ensure_ascii=False))
                )
            return Response(json.dumps(response), mimetype='application/json')
        except Exception as e:
            self.logger.error('response constructor error')
            self.logger.exception(e)
            return _abort(500, 'response constructor error - {}'.format(e))
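The request check above only requires `context` and a non-empty `qas` list whose first entry has `qid` and `question`, and the service then handles one question per payload. The following standalone sketch shows that per-question split without Flask. The payload contents and the helper name `split_per_question` are illustrative assumptions, and the sketch leaves out the context truncation step that `_split_input_json` performs when `context_tokens` is long.

```python
from copy import deepcopy

# Hypothetical minimal request: only the keys that the request check reads.
request_json = {
    "context": "Both teams were coached by John Fox.",
    "qas": [
        {"qid": "q1", "question": "Who coached both teams?"},
        {"qid": "q2", "question": "How many teams were coached?"},
    ],
}


def split_per_question(payload):
    """Return one deep-copied payload per entry in payload['qas']."""
    parts = []
    for qa in payload["qas"]:
        part = deepcopy(payload)
        # Each part keeps the full context but only a single question.
        part["qas"] = [qa]
        parts.append(part)
    return parts


parts = split_per_question(request_json)
print([p["qas"][0]["qid"] for p in parts])
```

Splitting this way lets every question be preprocessed with its own example and feature ID offsets, which is how the service loop above tracks them.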
from algorithm import optimization
from algorithm import multitask
from extension import fp16
from module import transformer_encoder
from toolkit import configure
from toolkit import init
from toolkit import placeholder
from nets import bert
#encoding=utf8
import os
import sys
import random
from copy import deepcopy as copy
import numpy as np
import paddle
import paddle.fluid as fluid
import multiprocessing
class Task:
    """Base class of a single task; subclasses implement the methods below."""

    def __init__(
            self,
            conf,
            name = "",
            is_training = False,
            _DataProcesser = None,
            shared_name = ""):
        self.conf = copy(conf)
        self.name = name
        self.shared_name = shared_name
        self.is_training = is_training
        self.DataProcesser = _DataProcesser

    def _create_reader(self):
        raise NotImplementedError("Task:_create_reader not implemented")

    def _create_model(self):
        raise NotImplementedError("Task:_create_model not implemented")

    def prepare(self, args):
        raise NotImplementedError("Task:prepare not implemented")

    def train_step(self, args):
        raise NotImplementedError("Task:train_step not implemented")

    def predict(self, args):
        raise NotImplementedError("Task:predict not implemented")


class JointTask:
    """Train several tasks jointly with one shared executor."""

    def __init__(self):
        self.tasks = []
        #self.startup_exe = None
        #self.train_exe = None
        self.exe = None
        self.share_vars_from = None
        self.startup_prog = fluid.Program()

    def __add__(self, task):
        assert isinstance(task, Task)
        self.tasks.append(task)
        return self

    def prepare(self, args):
        if args.use_cuda:
            place = fluid.CUDAPlace(0)
            dev_count = fluid.core.get_cuda_device_count()
        else:
            place = fluid.CPUPlace()
            dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
        #self.startup_exe = fluid.Executor(place)
        self.exe = fluid.Executor(place)
        for idx, task in enumerate(self.tasks):
            if idx == 0:
                print("for idx : %d" % idx)
                task.prepare(args, exe = self.exe)
                # Later tasks share variables with the first task's program
                self.share_vars_from = task.compiled_train_prog
            else:
                print("for idx : %d" % idx)
                task.prepare(args, exe = self.exe, share_vars_from = self.share_vars_from)

    def train(self, args):
        # One entry per training step, holding the index of the task to run
        joint_steps = []
        for i in range(0, len(self.tasks)):
            for _ in range(0, self.tasks[i].max_train_steps):
                joint_steps.append(i)
        self.tasks[0].train_step(args, exe = self.exe)
        random.shuffle(joint_steps)
        for next_task_id in joint_steps:
            self.tasks[next_task_id].train_step(args, exe = self.exe)
if __name__ == "__main__":
    basetask_a = Task(None)
    basetask_b = Task(None)
    joint_tasks = JointTask()
    joint_tasks += basetask_a
    print(joint_tasks.tasks)
    joint_tasks += basetask_b
    print(joint_tasks.tasks)
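The step scheduling inside `JointTask.train` can be looked at in isolation. The following is a minimal sketch, assuming only that each task reports a `max_train_steps` count as in the code above; the helper name `build_joint_schedule` and its `seed` parameter are hypothetical and are not part of this commit.

```python
import random
from collections import Counter


def build_joint_schedule(max_train_steps, seed=None):
    """Return a shuffled list of task indices (hypothetical helper).

    Task i appears max_train_steps[i] times, so a task with more
    training steps is visited proportionally more often.
    """
    schedule = []
    for task_id, steps in enumerate(max_train_steps):
        schedule.extend([task_id] * steps)
    # A private Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(schedule)
    return schedule


if __name__ == "__main__":
    schedule = build_joint_schedule([3, 1, 2], seed=0)
    print(schedule)           # order depends on the seed
    print(Counter(schedule))  # per-task counts match the input
```

Shuffling a flat list of task indices interleaves the tasks at step granularity while keeping the total number of steps per task fixed.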
...@@ -19,7 +19,7 @@ from __future__ import print_function ...@@ -19,7 +19,7 @@ from __future__ import print_function
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from utils.fp16 import create_master_params_grads, master_param_to_train_param from pdnlp.extension.fp16 import create_master_params_grads, master_param_to_train_param
def linear_warmup_decay(learning_rate, warmup_steps, num_train_steps): def linear_warmup_decay(learning_rate, warmup_steps, num_train_steps):
......
...@@ -18,7 +18,6 @@ from __future__ import division ...@@ -18,7 +18,6 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
from functools import partial from functools import partial
from functools import reduce
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. #encoding=utf8
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
...@@ -25,7 +12,6 @@ import json ...@@ -25,7 +12,6 @@ import json
logging_only_message = "%(message)s" logging_only_message = "%(message)s"
logging_details = "%(asctime)s.%(msecs)03d %(levelname)s %(module)s - %(funcName)s: %(message)s" logging_details = "%(asctime)s.%(msecs)03d %(levelname)s %(module)s - %(funcName)s: %(message)s"
class JsonConfig(object): class JsonConfig(object):
def __init__(self, config_path): def __init__(self, config_path):
self._config_dict = self._parse(config_path) self._config_dict = self._parse(config_path)
...@@ -62,7 +48,6 @@ class ArgumentGroup(object): ...@@ -62,7 +48,6 @@ class ArgumentGroup(object):
help=help + ' Default: %(default)s.', help=help + ' Default: %(default)s.',
**kwargs) **kwargs)
class ArgConfig(object): class ArgConfig(object):
def __init__(self): def __init__(self):
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. #encoding=utf8
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function from __future__ import print_function
import os import os
...@@ -54,6 +40,7 @@ class Placeholder(object): ...@@ -54,6 +40,7 @@ class Placeholder(object):
self.lod_levels.append(lod_level) self.lod_levels.append(lod_level)
self.names.append(name) self.names.append(name)
def build(self, capacity, reader_name, use_double_buffer = False): def build(self, capacity, reader_name, use_double_buffer = False):
pyreader = fluid.layers.py_reader( pyreader = fluid.layers.py_reader(
capacity = capacity, capacity = capacity,
...@@ -65,6 +52,7 @@ class Placeholder(object): ...@@ -65,6 +52,7 @@ class Placeholder(object):
return [pyreader, fluid.layers.read_file(pyreader)] return [pyreader, fluid.layers.read_file(pyreader)]
def __add__(self, new_holder): def __add__(self, new_holder):
assert isinstance(new_holder, tuple) or isinstance(new_holder, list) assert isinstance(new_holder, tuple) or isinstance(new_holder, list)
assert len(new_holder) >= 2 assert len(new_holder) >= 2
......
export FLAGS_fraction_of_gpu_memory_to_use=0.1
port=$1
gpu=$2
export CUDA_VISIBLE_DEVICES=$gpu
python start_service.py ./infer_model $port
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
ERNIE model service
"""
import json
import sys
import logging
logging.basicConfig(
    level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
import requests
from flask import Flask
from flask import Response
from flask import request
import mrc_service
import model_wrapper as ernie_wrapper

assert len(sys.argv) == 3 or len(sys.argv) == 4, "Usage: python start_service.py <model_dir> <port> [process_mode]"
if len(sys.argv) == 3:
    _, model_dir, port = sys.argv
    mode = 'parallel'
else:
    _, model_dir, port, mode = sys.argv

app = Flask(__name__)
app.logger.setLevel(logging.INFO)
ernie_model = ernie_wrapper.ERNIEModelWrapper(model_dir=model_dir)
server = mrc_service.BasicMRCService('Short answer MRC service', app.logger)

@app.route('/', methods=['POST'])
def mrqa_service():
    """Description"""
    model = ernie_model
    return server(model, process_mode=mode, max_batch_size=5)

if __name__ == '__main__':
    # sys.argv values are strings; convert the port to an integer explicitly
    app.run(port=int(port), debug=False, threaded=False, processes=1)
...@@ -12,7 +12,6 @@ ...@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
"""Mask, padding and batching.""" """Mask, padding and batching."""
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. # coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
...@@ -13,15 +14,15 @@ ...@@ -13,15 +14,15 @@
# limitations under the License. # limitations under the License.
"""Run MRQA""" """Run MRQA"""
import re
import six import six
import math import math
import json import json
import random import random
import collections import collections
import numpy as np import numpy as np
import tokenization
from utils import tokenization from batching import prepare_batch_data
from utils.batching import prepare_batch_data
class MRQAExample(object): class MRQAExample(object):
...@@ -94,10 +95,8 @@ class InputFeatures(object): ...@@ -94,10 +95,8 @@ class InputFeatures(object):
self.is_impossible = is_impossible self.is_impossible = is_impossible
def read_mrqa_examples(input_file, is_training, with_negative=False): def read_mrqa_examples(sample, is_training=False, with_negative=False):
"""Read a MRQA json file into a list of MRQAExample.""" """Read a MRQA json file into a list of MRQAExample."""
with open(input_file, "r") as reader:
input_data = json.load(reader)["data"]
def is_whitespace(c): def is_whitespace(c):
if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F: if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
...@@ -105,9 +104,9 @@ def read_mrqa_examples(input_file, is_training, with_negative=False): ...@@ -105,9 +104,9 @@ def read_mrqa_examples(input_file, is_training, with_negative=False):
return False return False
examples = [] examples = []
for entry in input_data: # sample = json.loads(raw_sample)
for paragraph in entry["paragraphs"]: paragraph_text = sample["context"]
paragraph_text = paragraph["context"] paragraph_text = re.sub(r'\[TLE\]|\[DOC\]|\[PAR\]', '[SEP]', paragraph_text)
doc_tokens = [] doc_tokens = []
char_to_word_offset = [] char_to_word_offset = []
prev_is_whitespace = True prev_is_whitespace = True
...@@ -122,56 +121,18 @@ def read_mrqa_examples(input_file, is_training, with_negative=False): ...@@ -122,56 +121,18 @@ def read_mrqa_examples(input_file, is_training, with_negative=False):
prev_is_whitespace = False prev_is_whitespace = False
char_to_word_offset.append(len(doc_tokens) - 1) char_to_word_offset.append(len(doc_tokens) - 1)
for qa in paragraph["qas"]: for qa in sample["qas"]:
qas_id = qa["id"] qas_id = qa["qid"]
question_text = qa["question"] question_text = qa["question"]
start_position = None start_position = None
end_position = None end_position = None
orig_answer_text = None orig_answer_text = None
is_impossible = False is_impossible = False
if is_training:
if with_negative:
is_impossible = qa["is_impossible"]
if (len(qa["answers"]) != 1) and (not is_impossible):
raise ValueError(
"For training, each question should have exactly 1 answer."
)
if not is_impossible:
answer = qa["answers"][0]
orig_answer_text = answer["text"]
answer_offset = answer["answer_start"]
answer_length = len(orig_answer_text)
start_position = char_to_word_offset[answer_offset]
end_position = char_to_word_offset[answer_offset +
answer_length - 1]
# Only add answers where the text can be exactly recovered from the
# document. If this CAN'T happen it's likely due to weird Unicode
# stuff so we will just skip the example.
#
# Note that this means for training mode, every example is NOT
# guaranteed to be preserved.
actual_text = " ".join(doc_tokens[start_position:(
end_position + 1)])
cleaned_answer_text = " ".join(
tokenization.whitespace_tokenize(orig_answer_text))
if actual_text.find(cleaned_answer_text) == -1:
print("Could not find answer: '%s' vs. '%s'",
actual_text, cleaned_answer_text)
continue
else:
start_position = -1
end_position = -1
orig_answer_text = ""
example = MRQAExample( example = MRQAExample(
qas_id=qas_id, qas_id=qas_id,
question_text=question_text, question_text=question_text,
doc_tokens=doc_tokens, doc_tokens=doc_tokens)
orig_answer_text=orig_answer_text,
start_position=start_position,
end_position=end_position,
is_impossible=is_impossible)
examples.append(example) examples.append(example)
return examples return examples
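The context handling that the new `read_mrqa_examples` performs (replacing the `[TLE]`, `[DOC]` and `[PAR]` markers with `[SEP]`, then splitting on whitespace while recording a character-to-token map) can be sketched on its own. This is a minimal illustration under the assumption that the elided parts of the loop follow the usual BERT-style reader; the function name `split_context` is hypothetical.

```python
import re

# Same pattern as in the diff above.
MARKER_PATTERN = re.compile(r'\[TLE\]|\[DOC\]|\[PAR\]')


def is_whitespace(c):
    return c in " \t\r\n" or ord(c) == 0x202F


def split_context(context):
    """Replace structural markers with [SEP], then split on whitespace.

    Returns (doc_tokens, char_to_word_offset). char_to_word_offset[i] is the
    index of the token that character i belongs to; a whitespace character
    maps to the token before it (-1 if no token has been seen yet).
    """
    text = MARKER_PATTERN.sub('[SEP]', context)
    doc_tokens = []
    char_to_word_offset = []
    prev_is_whitespace = True
    for c in text:
        if is_whitespace(c):
            prev_is_whitespace = True
        else:
            if prev_is_whitespace:
                doc_tokens.append(c)   # start a new token
            else:
                doc_tokens[-1] += c    # extend the current token
            prev_is_whitespace = False
        char_to_word_offset.append(len(doc_tokens) - 1)
    return doc_tokens, char_to_word_offset


if __name__ == "__main__":
    tokens, offsets = split_context("[TLE] Paris [PAR] Paris is big")
    print(tokens)
```

The offset list has one entry per character of the marker-substituted text, not of the raw input, which matters if answer character offsets are ever mapped back onto tokens.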
...@@ -184,13 +145,17 @@ def convert_examples_to_features( ...@@ -184,13 +145,17 @@ def convert_examples_to_features(
doc_stride, doc_stride,
max_query_length, max_query_length,
is_training, is_training,
examples_start_id=0,
features_start_id=1000000000
#output_fn #output_fn
): ):
"""Loads a data file into a list of `InputBatch`s.""" """Loads a data file into a list of `InputBatch`s."""
unique_id = 1000000000 unique_id = features_start_id
example_index = examples_start_id
for (example_index, example) in enumerate(examples): features = []
for example in examples:
query_tokens = tokenizer.tokenize(example.question_text) query_tokens = tokenizer.tokenize(example.question_text)
if len(query_tokens) > max_query_length: if len(query_tokens) > max_query_length:
...@@ -308,34 +273,6 @@ def convert_examples_to_features( ...@@ -308,34 +273,6 @@ def convert_examples_to_features(
start_position = 0 start_position = 0
end_position = 0 end_position = 0
""" """
if example_index < 3:
print("*** Example ***")
print("unique_id: %s" % (unique_id))
print("example_index: %s" % (example_index))
print("doc_span_index: %s" % (doc_span_index))
print("tokens: %s" % " ".join(
[tokenization.printable_text(x) for x in tokens]))
print("token_to_orig_map: %s" % " ".join([
"%d:%d" % (x, y)
for (x, y) in six.iteritems(token_to_orig_map)
]))
print("token_is_max_context: %s" % " ".join([
"%d:%s" % (x, y)
for (x, y) in six.iteritems(token_is_max_context)
]))
print("input_ids: %s" % " ".join([str(x) for x in input_ids]))
print("input_mask: %s" % " ".join([str(x) for x in input_mask]))
print("segment_ids: %s" %
" ".join([str(x) for x in segment_ids]))
if is_training and example.is_impossible:
print("impossible example")
if is_training and not example.is_impossible:
answer_text = " ".join(tokens[start_position:(end_position +
1)])
print("start_position: %d" % (start_position))
print("end_position: %d" % (end_position))
print("answer: %s" %
(tokenization.printable_text(answer_text)))
feature = InputFeatures( feature = InputFeatures(
unique_id=unique_id, unique_id=unique_id,
...@@ -352,8 +289,9 @@ def convert_examples_to_features( ...@@ -352,8 +289,9 @@ def convert_examples_to_features(
is_impossible=example.is_impossible) is_impossible=example.is_impossible)
unique_id += 1 unique_id += 1
features.append(feature)
yield feature example_index += 1
return features
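The new `examples_start_id` and `features_start_id` parameters let a caller convert examples over several calls while keeping `example_index` and `unique_id` distinct across calls. The sketch below only mimics how the two counters advance; `number_features` and `spans_per_example` are hypothetical names, and the assumption is that `unique_id` grows once per feature and `example_index` once per example, as the diff suggests.

```python
def number_features(spans_per_example,
                    examples_start_id=0,
                    features_start_id=1000000000):
    """Assign (example_index, unique_id) pairs (hypothetical helper).

    spans_per_example[i] is the number of document spans, and therefore
    features, produced for the i-th example.
    Returns the pairs plus the next free example index and feature id.
    """
    unique_id = features_start_id
    example_index = examples_start_id
    ids = []
    for n_spans in spans_per_example:
        for _ in range(n_spans):
            ids.append((example_index, unique_id))
            unique_id += 1
        example_index += 1
    return ids, example_index, unique_id


if __name__ == "__main__":
    first, next_example, next_feature = number_features([2, 1])
    # A later call continues from where the previous one stopped.
    second, _, _ = number_features([1], next_example, next_feature)
    print(first)
    print(second)
```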
def estimate_runtime_examples(data_path, sample_rate, tokenizer, \ def estimate_runtime_examples(data_path, sample_rate, tokenizer, \
...@@ -606,7 +544,6 @@ class DataProcessor(object): ...@@ -606,7 +544,6 @@ class DataProcessor(object):
self.current_train_epoch = -1 self.current_train_epoch = -1
self.train_examples = None self.train_examples = None
self.predict_examples = None
self.num_examples = {'train': -1, 'predict': -1} self.num_examples = {'train': -1, 'predict': -1}
def get_train_progress(self): def get_train_progress(self):
...@@ -636,42 +573,30 @@ class DataProcessor(object): ...@@ -636,42 +573,30 @@ class DataProcessor(object):
self._max_seq_length, self._doc_stride, self._max_query_length, \ self._max_seq_length, self._doc_stride, self._max_query_length, \
remove_impossible_questions=True, filter_invalid_spans=True) remove_impossible_questions=True, filter_invalid_spans=True)
def get_features(self, examples, is_training): def get_features(self, examples, is_training, examples_start_id, features_start_id):
features = convert_examples_to_features( features = convert_examples_to_features(
examples=examples, examples=examples,
tokenizer=self._tokenizer, tokenizer=self._tokenizer,
max_seq_length=self._max_seq_length, max_seq_length=self._max_seq_length,
doc_stride=self._doc_stride, doc_stride=self._doc_stride,
max_query_length=self._max_query_length, max_query_length=self._max_query_length,
examples_start_id=examples_start_id,
features_start_id=features_start_id,
is_training=is_training) is_training=is_training)
return features return features
def data_generator(self, def data_generator(self,
data_path, raw_samples,
batch_size, batch_size,
max_len=None, max_len=None,
phase='train', phase='predict',
shuffle=False, shuffle=False,
dev_count=1, dev_count=1,
with_negative=False, with_negative=False,
epoch=1): epoch=1,
if phase == 'train': examples_start_id=0,
self.train_examples = self.get_examples( features_start_id=1000000000):
data_path, examples = read_mrqa_examples(raw_samples)
is_training=True,
with_negative=with_negative)
examples = self.train_examples
self.num_examples['train'] = len(self.train_examples)
elif phase == 'predict':
self.predict_examples = self.get_examples(
data_path,
is_training=False,
with_negative=with_negative)
examples = self.predict_examples
self.num_examples['predict'] = len(self.predict_examples)
else:
raise ValueError(
"Unknown phase, which should be in ['train', 'predict'].")
def batch_reader(features, batch_size, in_tokens): def batch_reader(features, batch_size, in_tokens):
batch, total_token_num, max_len = [], 0, 0 batch, total_token_num, max_len = [], 0, 0
...@@ -704,15 +629,7 @@ class DataProcessor(object): ...@@ -704,15 +629,7 @@ class DataProcessor(object):
if len(batch) > 0: if len(batch) > 0:
yield batch, total_token_num yield batch, total_token_num
def wrapper(): features = self.get_features(examples, is_training=False, examples_start_id=examples_start_id, features_start_id=features_start_id)
for epoch_index in range(epoch):
if shuffle:
random.shuffle(examples)
if phase == 'train':
self.current_train_epoch = epoch_index
features = self.get_features(examples, is_training=True)
else:
features = self.get_features(examples, is_training=False)
all_dev_batches = [] all_dev_batches = []
for batch_data, total_token_num in batch_reader( for batch_data, total_token_num in batch_reader(
...@@ -729,32 +646,14 @@ class DataProcessor(object): ...@@ -729,32 +646,14 @@ class DataProcessor(object):
return_input_mask=True, return_input_mask=True,
return_max_len=False, return_max_len=False,
return_num_token=False) return_num_token=False)
if len(all_dev_batches) < dev_count:
all_dev_batches.append(batch_data) all_dev_batches.append(batch_data)
return examples, features, all_dev_batches
if len(all_dev_batches) == dev_count:
for batch in all_dev_batches:
yield batch
all_dev_batches = []
if phase == 'predict' and len(all_dev_batches) > 0:
fake_batch = all_dev_batches[-1]
fake_batch = fake_batch[:-1] + [np.array([-1]*len(fake_batch[0]))]
all_dev_batches = all_dev_batches + [fake_batch] * (dev_count - len(all_dev_batches))
for batch in all_dev_batches:
yield batch
return wrapper def get_answers(all_examples, all_features, all_results, n_best_size,
max_answer_length, do_lower_case,
verbose=False):
def write_predictions(all_examples, all_features, all_results, n_best_size,
max_answer_length, do_lower_case, output_prediction_file,
output_nbest_file, output_null_log_odds_file,
with_negative, null_score_diff_threshold,
verbose):
"""Write final predictions to the json file and log-odds of null if needed.""" """Write final predictions to the json file and log-odds of null if needed."""
print("Writing predictions to: %s" % (output_prediction_file))
print("Writing nbest to: %s" % (output_nbest_file))
example_index_to_features = collections.defaultdict(list) example_index_to_features = collections.defaultdict(list)
for feature in all_features: for feature in all_features:
...@@ -788,14 +687,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size, ...@@ -788,14 +687,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size,
start_indexes = _get_best_indexes(result.start_logits, n_best_size) start_indexes = _get_best_indexes(result.start_logits, n_best_size)
end_indexes = _get_best_indexes(result.end_logits, n_best_size) end_indexes = _get_best_indexes(result.end_logits, n_best_size)
# if we could have irrelevant answers, get the min score of irrelevant # if we could have irrelevant answers, get the min score of irrelevant
if with_negative:
feature_null_score = result.start_logits[0] + result.end_logits[
0]
if feature_null_score < score_null:
score_null = feature_null_score
min_null_feature_index = feature_index
null_start_logit = result.start_logits[0]
null_end_logit = result.end_logits[0]
for start_index in start_indexes: for start_index in start_indexes:
for end_index in end_indexes: for end_index in end_indexes:
# We could hypothetically create invalid predictions, e.g., predict # We could hypothetically create invalid predictions, e.g., predict
...@@ -824,14 +715,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size, ...@@ -824,14 +715,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size,
start_logit=result.start_logits[start_index], start_logit=result.start_logits[start_index],
end_logit=result.end_logits[end_index])) end_logit=result.end_logits[end_index]))
if with_negative:
prelim_predictions.append(
_PrelimPrediction(
feature_index=min_null_feature_index,
start_index=0,
end_index=0,
start_logit=null_start_logit,
end_logit=null_end_logit))
prelim_predictions = sorted( prelim_predictions = sorted(
prelim_predictions, prelim_predictions,
key=lambda x: (x.start_logit + x.end_logit), key=lambda x: (x.start_logit + x.end_logit),
...@@ -880,14 +763,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size, ...@@ -880,14 +763,6 @@ def write_predictions(all_examples, all_features, all_results, n_best_size,
start_logit=pred.start_logit, start_logit=pred.start_logit,
end_logit=pred.end_logit)) end_logit=pred.end_logit))
# if we didn't include the empty option in the n-best, include it
if with_negative:
if "" not in seen_predictions:
nbest.append(
_NbestPrediction(
text="",
start_logit=null_start_logit,
end_logit=null_end_logit))
# In very rare edge cases we could have no valid predictions. So we # In very rare edge cases we could have no valid predictions. So we
# just create a nonce prediction in this case to avoid failure. # just create a nonce prediction in this case to avoid failure.
if not nbest: if not nbest:
...@@ -921,29 +796,10 @@ def write_predictions(all_examples, all_features, all_results, n_best_size, ...@@ -921,29 +796,10 @@ def write_predictions(all_examples, all_features, all_results, n_best_size,
assert len(nbest_json) >= 1 assert len(nbest_json) >= 1
if not with_negative:
all_predictions[example.qas_id] = nbest_json[0]["text"] all_predictions[example.qas_id] = nbest_json[0]["text"]
else:
# predict "" iff the null score - the score of best non-null > threshold
score_diff = score_null - best_non_null_entry.start_logit - (
best_non_null_entry.end_logit)
scores_diff_json[example.qas_id] = score_diff
if score_diff > null_score_diff_threshold:
all_predictions[example.qas_id] = ""
else:
all_predictions[example.qas_id] = best_non_null_entry.text
all_nbest_json[example.qas_id] = nbest_json all_nbest_json[example.qas_id] = nbest_json
with open(output_prediction_file, "w") as writer: return all_predictions, all_nbest_json
writer.write(json.dumps(all_predictions, indent=4) + "\n")
with open(output_nbest_file, "w") as writer:
writer.write(json.dumps(all_nbest_json, indent=4) + "\n")
if with_negative:
with open(output_null_log_odds_file, "w") as writer:
writer.write(json.dumps(scores_diff_json, indent=4) + "\n")
def get_final_text(pred_text, orig_text, do_lower_case, verbose): def get_final_text(pred_text, orig_text, do_lower_case, verbose):
......
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved. # coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
......
...@@ -11,13 +11,13 @@ from flask import Flask ...@@ -11,13 +11,13 @@ from flask import Flask
from flask import Response from flask import Response
from flask import request from flask import request
import numpy as np import numpy as np
import argparse
from multiprocessing.dummy import Pool as ThreadPool from multiprocessing.dummy import Pool as ThreadPool
app = Flask(__name__) app = Flask(__name__)
logger = logging.getLogger('flask') logger = logging.getLogger('flask')
url_1 = 'http://127.0.0.1:5118' # url for model1
url_2 = 'http://127.0.0.1:5120' # url for model2
def ensemble_example(answers, n_models=None): def ensemble_example(answers, n_models=None):
if n_models is None: if n_models is None:
...@@ -50,32 +50,45 @@ def mrqa_main(): ...@@ -50,32 +50,45 @@ def mrqa_main():
return nbest return nbest
try: try:
input_json = request.get_json(silent=True) input_json = request.get_json(silent=True)
n_models = len(urls)
pool = ThreadPool(2) pool = ThreadPool(n_models)
res1 = pool.apply_async(_call_model, (url_1, input_json)) results = []
res2 = pool.apply_async(_call_model, (url_2, input_json)) for url in urls:
nbest1 = res1.get() result = pool.apply_async(_call_model, (url, input_json))
nbest2 = res2.get() results.append(result.get())
# print(res1)
# print(nbest1)
pool.close() pool.close()
pool.join() pool.join()
nbests = [nbest.json()['results'] for nbest in results]
nbest1 = nbest1.json()['results'] qids = list(nbests[0].keys())
nbest2 = nbest2.json()['results']
qids = list(nbest1.keys())
for qid in qids: for qid in qids:
ensemble_nbest = ensemble_example([nbest1[qid], nbest2[qid]], n_models=2) ensemble_nbest = ensemble_example([nbest[qid] for nbest in nbests], n_models=n_models)
pred[qid] = ensemble_nbest[0]['text'] pred[qid] = ensemble_nbest[0]['text']
except Exception as e: except Exception as e:
pred['error'] = 'empty' pred['error'] = 'empty'
# logger.error('Error in mrc server - {}'.format(e))
logger.exception(e) logger.exception(e)
# import pdb; pdb.set_trace() # XXX BREAKPOINT
return Response(json.dumps(pred), mimetype='application/json') return Response(json.dumps(pred), mimetype='application/json')
if __name__ == '__main__': if __name__ == '__main__':
url_1 = 'http://127.0.0.1:5118' # url for ernie
url_2 = 'http://127.0.0.1:5119' # url for xl-net
url_3 = 'http://127.0.0.1:5120' # url for bert
parser = argparse.ArgumentParser('main server')
parser.add_argument('--ernie', action='store_true', default=False, help="Include ERNIE")
parser.add_argument('--xlnet', action='store_true', default=False, help="Include XL-NET")
parser.add_argument('--bert', action='store_true', default=False, help="Include BERT")
args = parser.parse_args()
urls = []
if args.ernie:
print('Include ERNIE model')
urls.append(url_1)
if args.xlnet:
print('Include XL-NET model')
urls.append(url_2)
if args.bert:
print('Include BERT model')
urls.append(url_3)
assert len(urls) > 0, "At least one model is required"
app.run(host='127.0.0.1', port=5121, debug=False, threaded=False, processes=1) app.run(host='127.0.0.1', port=5121, debug=False, threaded=False, processes=1)
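In the new `mrqa_main` loop, `result.get()` is called right after each `apply_async`, so each request completes before the next one is submitted. If concurrent requests to the model services are the intent (an assumption about the author's goal), one option is to submit every request first and collect the results afterwards. The sketch below uses a hypothetical helper `call_all_models`; its `call_model` argument stands in for the `_call_model` function in the diff.

```python
from multiprocessing.dummy import Pool as ThreadPool


def call_all_models(call_model, urls, input_json):
    """Send input_json to every URL concurrently (hypothetical helper).

    Results are returned in the same order as urls. urls must be
    non-empty, which the server asserts at startup.
    """
    pool = ThreadPool(len(urls))
    try:
        # Submit every request first so they can run at the same time ...
        pending = [pool.apply_async(call_model, (url, input_json)) for url in urls]
        # ... and only then block on each result.
        return [p.get() for p in pending]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    def fake_call(url, payload):
        # Stand-in for an HTTP POST to a model service.
        return (url, payload["x"])

    print(call_all_models(fake_call, ["a", "b"], {"x": 1}))
```

Collecting results in submission order keeps the n-best lists aligned with the `urls` list, which is what the ensembling step relies on.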
#!/bin/bash
gpu_id=0
# start ernie service
# usage: sh start.sh port gpu_id
cd ernie_server
nohup sh start.sh 5118 $gpu_id > ernie.log 2>&1 &
cd ..
# start xlnet service
cd xlnet_server
nohup sh start.sh 5119 $gpu_id > xlnet.log 2>&1 &
cd ..
# start bert service
cd bert_server cd bert_server
export CUDA_VISIBLE_DEVICES=1 nohup sh start.sh 5120 $gpu_id > bert.log 2>&1 &
sh start.sh
cd ../xlnet_server
export CUDA_VISIBLE_DEVICES=2
sh serve.sh
cd .. cd ..
sleep 60 sleep 3
python main_server.py # start main server
# usage: python main_server.py --model_name
# the model_name specifies the model to be used in the ensemble.
nohup python main_server.py --ernie --xlnet > main_server.log 2>&1 &
wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/mrqa2019_inference_model.tar.gz wget --no-check-certificate https://baidu-nlp.bj.bcebos.com/D-Net/mrqa2019_inference_model.tar.gz
tar -xvf mrqa2019_inference_model.tar.gz tar -xvf mrqa2019_inference_model.tar.gz
rm mrqa2019_inference_model.tar.gz rm mrqa2019_inference_model.tar.gz
mv infer_model bert_server mv bert_infer_model bert_server/infer_model
mv infer_model_800_bs128 xlnet_server mv xlnet_infer_model xlnet_server/infer_model
mv ernie_infer_model ernie_server/infer_model
#!/usr/bin/env python #!/usr/bin/env python
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
"""Provide MRC service for TOP1 short answer extraction system """
Note the services here share some global pre/post process objects, which XL-NET model service
are **NOT THREAD SAFE**. Try to use multi-process instead of multi-thread
for deployment.
""" """
import json import json
import sys import sys
......
export FLAGS_sync_nccl_allreduce=0 export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1 export FLAGS_eager_delete_tensor_gb=1
export FLAGS_fraction_of_gpu_memory_to_use=0.1 export FLAGS_fraction_of_gpu_memory_to_use=0.1
port=$1
gpu=$2
export CUDA_VISIBLE_DEVICES=$gpu
python serve.py ./infer_model_800_bs128 5001 & python serve.py ./infer_model $port