The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).

---

# Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

This model implements the work in the following paper:

Peng Li, Wei Li, Zhengyan He, Xuguang Wang, Ying Cao, Jie Zhou, and Wei Xu. Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering. [arXiv:1607.06275](https://arxiv.org/abs/1607.06275).

If you use the dataset/code in your research, please cite the above paper:

```text
@article{li:2016:arxiv,
  author  = {Li, Peng and Li, Wei and He, Zhengyan and Wang, Xuguang and Cao, Ying and Zhou, Jie and Xu, Wei},
  title   = {Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering},
  journal = {arXiv:1607.06275v2},
  year    = {2016},
  url     = {https://arxiv.org/abs/1607.06275v2},
}
```


## Installation

1. Install PaddlePaddle v0.10.5 with one of the following commands. Note that v0.10.0 is not supported. (An optional check of the installation is sketched after this list.)
    ```bash
    # either one is OK
    # CPU
    pip install paddlepaddle
    # GPU
    pip install paddlepaddle-gpu
    ```
2. Download the [WebQA](http://idl.baidu.com/WebQA.html) dataset by running
   ```bash
   cd data && ./download.sh && cd ..
   ```
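
After step 1, you can optionally confirm which PaddlePaddle package is installed. The following is a minimal Python sketch (not part of this repository) that relies only on standard `pip` package metadata:

```python
# Optional sanity check: report which PaddlePaddle package (CPU or GPU) is
# installed, together with its version, using pip's package metadata.
import pkg_resources

for pkg in ("paddlepaddle", "paddlepaddle-gpu"):
    try:
        print("%s %s" % (pkg, pkg_resources.get_distribution(pkg).version))
    except pkg_resources.DistributionNotFound:
        print("%s is not installed" % pkg)
```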

## Hyperparameters

All the hyperparameters are defined in `config.py`. The default values are aligned with the paper.
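
If you want a quick overview of the current settings without opening the file, a small sketch like the following (not part of the repository) imports the module and dumps its public names; the actual hyperparameter names are whatever `config.py` defines:

```python
# Dump the public names defined in config.py together with their values.
# Run from the model's root directory so that config.py is importable.
import config

for name in sorted(dir(config)):
    if not name.startswith("_"):
        print("%s = %r" % (name, getattr(config, name)))
```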

## Training

Training can be launched using the following command:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python train.py 2>&1 | tee train.log
```
## Validation and Test

WebQA provides two versions of the validation and test sets. Automatic validation and testing can be launched by

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python val_and_test.py models [ann|ir]
```

where

* `models`: the directory where model files are stored. Use `models` here if `config.py` has not been changed.
* `ann`: use the validation and test sets with annotated evidence.
* `ir`: use the validation and test sets with retrieved evidence.

Note that validation and test can run simultaneously with training; `val_and_test.py` handles the related synchronization issues.

Intermediate results are stored in the directory `tmp`; they can be safely deleted after validation and test finish.

The results should be comparable with those shown in Table 3 of the paper.

## Inferring using a Trained Model

Infer using a trained model by running:
```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  MODEL_FILE \
  INPUT_DATA \
  OUTPUT_FILE \
  2>&1 | tee infer.log
```

where

* `MODEL_FILE`: a trained model produced by `train.py`.
* `INPUT_DATA`: input data in the same format as the validation/test sets of the WebQA dataset (a preview sketch for such files follows this list).
* `OUTPUT_FILE`: results in the format specified in the WebQA dataset for the evaluation scripts.
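
The WebQA validation/test files (e.g. `data/data/test.ann.json.gz`, downloaded in the Installation step) are gzip-compressed. A minimal sketch for previewing such a file before running inference (not part of the repository; the path is just an example):

```python
# Preview the first few hundred characters of a gzip-compressed WebQA data file
# to get a feel for the input format expected by infer.py.
import gzip

path = "data/data/test.ann.json.gz"  # example path; any WebQA-format file works
with gzip.open(path, "rb") as f:
    print(f.read(500).decode("utf-8", "replace"))
```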

## Pre-trained Models

We provide two pre-trained models: one for the validation and test sets with annotated evidence, and one for those with retrieved evidence. The two models were selected according to their performance on the corresponding version of the validation set, consistent with the paper.

The models can be downloaded with
```bash
cd pre-trained-models && ./download-models.sh && cd ..
```

The evaluation result on the test set with annotated evidence can be reproduced by

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  pre-trained-models/params_pass_00010.tar.gz \
  data/data/test.ann.json.gz \
  test.ann.output.txt.gz

PYTHONPATH=data/evaluation:$PYTHONPATH \
  python data/evaluation/evaluate-tagging-result.py \
  test.ann.output.txt.gz \
  data/data/test.ann.json.gz \
  --fuzzy --schema BIO2
# The result should be
# chunk_f1=0.739091 chunk_precision=0.686119 chunk_recall=0.800926 true_chunks=3024 result_chunks=3530 correct_chunks=2422
```
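
For reference, the reported `chunk_precision`, `chunk_recall`, and `chunk_f1` follow directly from the chunk counts printed on the same line; the short sketch below (not part of the repository) reproduces the annotated-evidence numbers:

```python
# Recompute the reported metrics from the chunk counts in the output above:
# precision = correct / result, recall = correct / true, f1 = harmonic mean.
true_chunks, result_chunks, correct_chunks = 3024, 3530, 2422

precision = float(correct_chunks) / result_chunks   # 0.686119
recall = float(correct_chunks) / true_chunks        # 0.800926
f1 = 2 * precision * recall / (precision + recall)  # 0.739091

print("chunk_f1=%.6f chunk_precision=%.6f chunk_recall=%.6f" % (f1, precision, recall))
```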

The evaluation result on the test set with retrieved evidence can be reproduced by

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  pre-trained-models/params_pass_00021.tar.gz \
  data/data/test.ir.json.gz \
  test.ir.output.txt.gz

PYTHONPATH=data/evaluation:$PYTHONPATH \
  python data/evaluation/evaluate-voting-result.py \
  test.ir.output.txt.gz \
  data/data/test.ir.json.gz \
  --fuzzy --schema BIO2
# The result should be
# chunk_f1=0.749358 chunk_precision=0.727868 chunk_recall=0.772156 true_chunks=3024 result_chunks=3208 correct_chunks=2335
```