# Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

This model implements the work in the following paper:

Peng Li, Wei Li, Zhengyan He, Xuguang Wang, Ying Cao, Jie Zhou, and Wei Xu. Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering. [arXiv:1607.06275](https://arxiv.org/abs/1607.06275).

If you use the dataset/code in your research, please cite the above paper:

```text
@article{li:2016:arxiv,
  author  = {Li, Peng and Li, Wei and He, Zhengyan and Wang, Xuguang and Cao, Ying and Zhou, Jie and Xu, Wei},
  title   = {Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering},
  journal = {arXiv:1607.06275v2},
  year    = {2016},
  url     = {https://arxiv.org/abs/1607.06275v2},
}
```


## Installation

1. Install PaddlePaddle v0.10.5 with one of the following commands. Note that v0.10.0 is not supported.
    ```bash
    # either one is OK
    # CPU
    pip install paddlepaddle
    # GPU
    pip install paddlepaddle-gpu
    ```
2. Download the [WebQA](http://idl.baidu.com/WebQA.html) dataset by running
   ```bash
   cd data && ./download.sh && cd ..
   ```
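
The version note in step 1 can be turned into a small check. The following is only a sketch: the function names and the three labels are hypothetical, and it makes no claim about versions the README does not mention.

```python
# Sketch only: classify a PaddlePaddle version string against this README.
# The labels returned here are illustrative and not part of PaddlePaddle.
REQUIRED = (0, 10, 5)     # version named in step 1
UNSUPPORTED = (0, 10, 0)  # version step 1 explicitly rules out


def parse_version(text):
    """Turn '0.10.5' or 'v0.10.5' into a tuple of ints such as (0, 10, 5)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def classify(text):
    """Return 'documented', 'unsupported', or 'untested' for a version string."""
    version = parse_version(text)
    if version == REQUIRED:
        return "documented"
    if version == UNSUPPORTED:
        return "unsupported"
    return "untested"  # the README says nothing about other versions


if __name__ == "__main__":
    for candidate in ("0.10.5", "v0.10.0", "0.11.0"):
        print(candidate, classify(candidate))
```

The installed version string can be read from the output of `pip show paddlepaddle` (or `pip show paddlepaddle-gpu`) and passed to `classify`.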

## Hyperparameters

All the hyperparameters are defined in `config.py`. The default values are aligned with the paper.

## Training

Training can be launched using the following command:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python train.py 2>&1 | tee train.log
```
## Validation and Test

WebQA provides two versions of the validation and test sets. Automatic validation and testing can be launched with:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python val_and_test.py models [ann|ir]
```

where

* `models`: the directory where model files are stored. You can use `models` if `config.py` is not changed.
* `ann`: using the validation and test sets with annotated evidence.
* `ir`: using the validation and test sets with retrieved evidence.

Note that validation and testing can run simultaneously with training. `val_and_test.py` handles the synchronization-related problems.

Intermediate results are stored in the directory `tmp`. You can delete them safely after validation and test.

The results should be comparable with those shown in Table 3 in the paper.
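
The command line above can also be assembled from Python, for example when scripting both evidence types. This is a minimal sketch: `build_val_and_test_command` and `build_env` are hypothetical helpers, not part of this repository, and the demo only prints the commands rather than running them.

```python
import os

EVIDENCE_TYPES = ("ann", "ir")  # the two values val_and_test.py accepts


def build_val_and_test_command(model_dir, evidence_type):
    """Return the argument list for one val_and_test.py run."""
    if evidence_type not in EVIDENCE_TYPES:
        raise ValueError("evidence_type must be 'ann' or 'ir'")
    return ["python", "val_and_test.py", model_dir, evidence_type]


def build_env(base_env=None):
    """Copy the environment and prepend data/evaluation to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    parts = ["data/evaluation"] + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for evidence_type in EVIDENCE_TYPES:
        command = build_val_and_test_command("models", evidence_type)
        print("PYTHONPATH=%s %s" % (build_env()["PYTHONPATH"], " ".join(command)))
```

A command built this way could be passed to `subprocess.check_call(command, env=build_env())` from the repository root.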

## Inferring using a Trained Model

Infer using a trained model by running:
```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  MODEL_FILE \
  INPUT_DATA \
  OUTPUT_FILE \
  2>&1 | tee infer.log
```

where

* `MODEL_FILE`: a trained model produced by `train.py`.
* `INPUT_DATA`: input data in the same format as the validation/test sets of the WebQA dataset.
* `OUTPUT_FILE`: results in the format specified in the WebQA dataset for the evaluation scripts.
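
The argument order can be captured in a small helper. This is a sketch under assumptions: `build_infer_command` and `default_output_name` are hypothetical, and the `.json.gz` to `.output.txt.gz` naming is inferred from the example commands in this README rather than required by `infer.py`.

```python
import os


def default_output_name(input_path):
    """Map 'data/data/test.ann.json.gz' to 'test.ann.output.txt.gz'."""
    name = os.path.basename(input_path)
    suffix = ".json.gz"
    if not name.endswith(suffix):
        raise ValueError("expected a .json.gz input, got %r" % name)
    return name[: -len(suffix)] + ".output.txt.gz"


def build_infer_command(model_file, input_data, output_file=None):
    """Return the infer.py arguments in the order MODEL_FILE INPUT_DATA OUTPUT_FILE."""
    if output_file is None:
        output_file = default_output_name(input_data)  # assumed naming pattern
    return ["python", "infer.py", model_file, input_data, output_file]


if __name__ == "__main__":
    command = build_infer_command(
        "pre-trained-models/params_pass_00010.tar.gz",
        "data/data/test.ann.json.gz",
    )
    print(" ".join(command))
```

As with the shell command, `PYTHONPATH` still needs to include `data/evaluation` when the command is actually run.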

## Pre-trained Models

We provide two pre-trained models: one for the validation and test sets with annotated evidence, and one for those with retrieved evidence. Each model was selected according to its performance on the corresponding version of the validation set, which is consistent with the paper.

The models can be downloaded with
```bash
cd pre-trained-models && ./download-models.sh && cd ..
```

The evaluation result on the test set with annotated evidence can be reproduced with:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  pre-trained-models/params_pass_00010.tar.gz \
  data/data/test.ann.json.gz \
  test.ann.output.txt.gz

PYTHONPATH=data/evaluation:$PYTHONPATH \
  python data/evaluation/evaluate-tagging-result.py \
  test.ann.output.txt.gz \
  data/data/test.ann.json.gz \
  --fuzzy --schema BIO2
# The result should be
# chunk_f1=0.739091 chunk_precision=0.686119 chunk_recall=0.800926 true_chunks=3024 result_chunks=3530 correct_chunks=2422
```

The evaluation result on the test set with retrieved evidence can be reproduced with:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  pre-trained-models/params_pass_00021.tar.gz \
  data/data/test.ir.json.gz \
  test.ir.output.txt.gz

PYTHONPATH=data/evaluation:$PYTHONPATH \
  python data/evaluation/evaluate-voting-result.py \
  test.ir.output.txt.gz \
  data/data/test.ir.json.gz \
  --fuzzy --schema BIO2
# The result should be
# chunk_f1=0.749358 chunk_precision=0.727868 chunk_recall=0.772156 true_chunks=3024 result_chunks=3208 correct_chunks=2335
```
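
The reported scores are consistent with the standard chunk-level definitions: precision is `correct_chunks / result_chunks`, recall is `correct_chunks / true_chunks`, and F1 is their harmonic mean. The sketch below recomputes them from the counts in the two result lines above. `chunk_metrics` and `parse_result_line` are hypothetical helpers, and the assumption that the evaluation scripts use exactly these formulas is inferred from the numbers, not from their source.

```python
def chunk_metrics(true_chunks, result_chunks, correct_chunks):
    """Standard chunk-level precision, recall, and F1 from raw counts."""
    precision = correct_chunks / float(result_chunks) if result_chunks else 0.0
    recall = correct_chunks / float(true_chunks) if true_chunks else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"chunk_precision": precision, "chunk_recall": recall, "chunk_f1": f1}


def parse_result_line(line):
    """Parse a 'key=value key=value ...' line into a dict of floats."""
    pairs = (field.split("=") for field in line.split())
    return {key: float(value) for key, value in pairs}


if __name__ == "__main__":
    reported = parse_result_line(
        "chunk_f1=0.739091 chunk_precision=0.686119 chunk_recall=0.800926 "
        "true_chunks=3024 result_chunks=3530 correct_chunks=2422"
    )
    computed = chunk_metrics(3024, 3530, 2422)
    for name in sorted(computed):
        print("%s computed=%.6f reported=%.6f" % (name, computed[name], reported[name]))
```

If a run with the pre-trained models prints different counts, recomputing the metrics this way helps separate a difference in model output from a difference in how the scores are derived.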