This model implements the work in the following paper:
Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. Association for Computational Linguistics (ACL), 2017
# Data Preparation
- The data used in this tutorial can be downloaded by running:
```bash
sh download.sh
```
- Each line in the data file contains one sample, and each sample consists of a source sentence and a target sentence separated by '\t'. So, to use your own data, organize it as follows (a minimal reader sketch is given after the example):
```
<source sentence>\t<target sentence>
...
...
```
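For illustration, a reader for this format could look like the sketch below. It is not the repository's actual reader; `src_dict`/`trg_dict` are assumed to be plain word-to-id mappings loaded from `./data/src_dict` and `./data/trg_dict`, and `<unk>` is an assumed unknown-word token.

```python
# Minimal sketch of a reader for the '\t'-separated format above
# (assumption: src_dict/trg_dict map words to integer ids).
def sample_reader(path, src_dict, trg_dict, unk='<unk>'):
    with open(path) as f:
        for line in f:
            src_text, trg_text = line.rstrip('\n').split('\t')
            # Map each word to its id, falling back to the unknown token.
            src_ids = [src_dict.get(w, src_dict[unk]) for w in src_text.split()]
            trg_ids = [trg_dict.get(w, trg_dict[unk]) for w in trg_text.split()]
            yield src_ids, trg_ids
```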
# Training
- Train the model by running the following command, adjusting the arguments as needed:
```bash
python train.py \
--train_data_path ./data/train \
--test_data_path ./data/test \
--src_dict_path ./data/src_dict \
--trg_dict_path ./data/trg_dict \
--enc_blocks"[(256, 3)] * 5"\
--dec_blocks"[(256, 3)] * 3"\
--emb_size 256 \
--pos_size 200 \
--drop_rate 0.2 \
--use_bn False \
--use_gpu False \
--trainer_count 1 \
--batch_size 32 \
...
...
```
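The `--enc_blocks`/`--dec_blocks` arguments take a Python expression that expands to one `(hidden_size, kernel_width)` tuple per convolutional block; whether train.py parses it exactly this way is an assumption, but the expansion itself works as shown below.

```python
# "[(256, 3)] * 5" expands to five encoder blocks, each with 256 hidden
# units and kernel width 3. eval() is needed because the expression
# contains `*`, which ast.literal_eval would reject.
enc_blocks = eval("[(256, 3)] * 5")
print(enc_blocks)
# [(256, 3), (256, 3), (256, 3), (256, 3), (256, 3)]
```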
# Inference
- Generate sequences with a trained model by running the following command:
```bash
python infer.py \
--infer_data_path ./data/dev \
--src_dict_path ./data/src_dict \
--trg_dict_path ./data/trg_dict \
--enc_blocks"[(256, 3)] * 5"\
--dec_blocks"[(256, 3)] * 3"\
--emb_size 256 \
--pos_size 200 \
--drop_rate 0.2 \
--use_bn False \
--use_gpu False \
--trainer_count 1 \
--max_len 100 \
--batch_size 256 \
--beam_size 1 \
--is_show_attention False \
--model_path ./params.pass-0.tar.gz \
1>infer_result 2>infer.log
```
# Notes
Currently, beam search forwards the encoder once for every predicted target word, which incurs redundant computation. We will fix this later.
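To illustrate, the sketch below contrasts the current per-step re-encoding with the intended one-time encoding, using greedy (beam width 1) decoding for brevity; `encode` and `decode_step` are hypothetical stand-ins, not the actual PaddlePaddle API.

```python
# Current behaviour (sketch): the encoder is re-run for every target word.
def decode_slow(src_ids, max_len, encode, decode_step, bos, eos):
    trg = [bos]
    for _ in range(max_len):
        enc_out = encode(src_ids)   # recomputed each step: redundant work
        word = decode_step(enc_out, trg)
        trg.append(word)
        if word == eos:
            break
    return trg

# Intended fix (sketch): encode once, reuse the result at every step.
def decode_fast(src_ids, max_len, encode, decode_step, bos, eos):
    enc_out = encode(src_ids)       # computed once
    trg = [bos]
    for _ in range(max_len):
        word = decode_step(enc_out, trg)
        trg.append(word)
        if word == eos:
            break
    return trg
```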
Since the current version of PaddlePaddle does not support weight normalization, we use batch normalization instead to ensure convergence when the network is deep.
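For context, weight normalization (the scheme the paper relies on) reparameterizes each weight vector w into a direction v and a learned scalar gain g:

```latex
w = \frac{g}{\lVert v \rVert} \, v
```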