# Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

This model implements the work in the following paper:

Peng Li, Wei Li, Zhengyan He, Xuguang Wang, Ying Cao, Jie Zhou, and Wei Xu. Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering. [arXiv:1607.06275](https://arxiv.org/abs/1607.06275).

If you use the dataset/code in your research, please cite the above paper:

```text
@article{li:2016:arxiv,
    author  = {Li, Peng and Li, Wei and He, Zhengyan and Wang, Xuguang and Cao, Ying and Zhou, Jie and Xu, Wei},
    title   = {Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering},
    journal = {arXiv:1607.06275v2},
    year    = {2016},
    url     = {https://arxiv.org/abs/1607.06275v2},
}
```

# Installation

1. Install PaddlePaddle v0.10.5 with the following command. Note that v0.10.0 is not supported.

    ```bash
    # either one is OK
    # CPU
    pip install paddlepaddle
    # GPU
    pip install paddlepaddle-gpu
    ```

2. Download the [WebQA](http://idl.baidu.com/WebQA.html) dataset by running

    ```bash
    cd data && ./download.sh && cd ..
    ```

# Hyperparameters

All the hyperparameters are defined in `config.py`. The default values are aligned with the paper.

# Training

Training can be launched using the following command:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python train.py 2>&1 | tee train.log
```

# Validation and Test

WebQA provides two versions of the validation and test sets. Automatic validation and test can be launched by

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python val_and_test.py models [ann|ir]
```

where

* `models`: the directory where model files are stored. You can use `models` if `config.py` is not changed.
* `ann`: use the validation and test sets with annotated evidence.
* `ir`: use the validation and test sets with retrieved evidence.

Note that validation and test can run simultaneously with training. `val_and_test.py` will handle the synchronization-related problems.

Intermediate results are stored in the directory `tmp`. You can delete them safely after validation and test.

The results should be comparable with those shown in Table 3 of the paper.

# Inferring using a Trained Model

Infer using a trained model by running:

```bash
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  MODEL_FILE \
  INPUT_DATA \
  OUTPUT_FILE \
  2>&1 | tee infer.log
```

where

* `MODEL_FILE`: a trained model produced by `train.py`.
* `INPUT_DATA`: input data in the same format as the validation/test sets of the WebQA dataset.
* `OUTPUT_FILE`: results in the format specified in the WebQA dataset for the evaluation scripts.
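For concreteness, a minimal sketch of a full invocation is shown below. The checkpoint name and data file paths are hypothetical placeholders, not names guaranteed by this repo; the actual files depend on your training run and on where `download.sh` places the dataset.

```bash
# A hedged example invocation; the checkpoint and data file names below
# are assumptions for illustration only.
PYTHONPATH=data/evaluation:$PYTHONPATH python infer.py \
  models/params_pass_00010.tar.gz \
  data/test.ann.json.gz \
  test.ann.output.txt \
  2>&1 | tee infer.log
```

The resulting `OUTPUT_FILE` can then be scored with the evaluation scripts under `data/evaluation`, the same directory added to `PYTHONPATH` in the commands above.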