diff --git a/conv_seq_to_seq/README.md b/conv_seq_to_seq/README.md
index f6a09ed22d42d6de3b0d6fd14d826cb87de822f5..5b4bcfffba268c2e9d7e00bae719a3cb25ef0b94 100644
--- a/conv_seq_to_seq/README.md
+++ b/conv_seq_to_seq/README.md
@@ -1 +1,50 @@
-[TBD]
+# Convolutional Sequence to Sequence Learning
+
+This model implements the architecture described in the following paper:
+
+Jonas Gehring, Michael Auli, David Grangier, et al. Convolutional Sequence to Sequence Learning. International Conference on Machine Learning (ICML), 2017
+
+# Training a Model
+
+- Adjust the arguments in the following command as needed and run it; all output is redirected to `train.log`. The `--enc_blocks` and `--dec_blocks` strings define the encoder and decoder convolutional blocks (see the Notes section at the end for a sketch of how they expand and how to monitor the run).
+
+  ```bash
+  python train.py \
+      --train_data_path ./data/train_data \
+      --test_data_path ./data/test_data \
+      --src_dict_path ./data/src_dict \
+      --trg_dict_path ./data/trg_dict \
+      --enc_blocks "[(256, 3)] * 5" \
+      --dec_blocks "[(256, 3)] * 3" \
+      --emb_size 256 \
+      --pos_size 200 \
+      --drop_rate 0.1 \
+      --use_gpu False \
+      --trainer_count 1 \
+      --batch_size 32 \
+      --num_passes 20 \
+      >train.log 2>&1
+  ```
+
+# Inference with a Trained Model
+
+- Run inference with a trained model as follows. Decoded translations are written to `infer_result` and diagnostics to `infer.log` (see the Notes section for a quick way to inspect them).
+
+  ```bash
+  python infer.py \
+      --infer_data_path ./data/infer_data \
+      --src_dict_path ./data/src_dict \
+      --trg_dict_path ./data/trg_dict \
+      --enc_blocks "[(256, 3)] * 5" \
+      --dec_blocks "[(256, 3)] * 3" \
+      --emb_size 256 \
+      --pos_size 200 \
+      --drop_rate 0.1 \
+      --use_gpu False \
+      --trainer_count 1 \
+      --max_len 100 \
+      --beam_size 1 \
+      --model_path ./params.pass-0.tar.gz \
+      1>infer_result 2>infer.log
+  ```
+
+# Notes
+
+Currently, beam search re-runs the forward pass of the entire network for every predicted word, which wastes computation. This will be fixed in a later update.
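+
+The block definitions passed via `--enc_blocks` and `--dec_blocks` are strings containing Python list expressions, one `(hidden_size, kernel_width)` pair per convolutional block. A minimal sketch of how such a string expands, assuming the scripts evaluate it as a Python expression (e.g. via `eval`, which is not confirmed here):
+
+```bash
+# "[(256, 3)] * 5" is plain Python list repetition: five blocks, each with
+# 256 hidden units and a convolution kernel of width 3. Whether train.py
+# parses the string with eval() exactly like this is an assumption.
+python -c 'print(eval("[(256, 3)] * 5"))'
+# [(256, 3), (256, 3), (256, 3), (256, 3), (256, 3)]
+```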
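+
+Because the training command redirects both stdout and stderr to `train.log`, progress can be followed from another shell with standard tools; nothing here is specific to this model:
+
+```bash
+# Follow training progress as it is appended to train.log, the file the
+# training command above redirects to.
+tail -f train.log
+```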
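+
+Likewise, the inference command writes decoded translations to `infer_result` and diagnostics to `infer.log`, both plain-text files that can be inspected directly:
+
+```bash
+# Peek at the first decoded translations and the tail of the log; these are
+# the files named in the redirections of the inference command above.
+head infer_result
+tail infer.log
+```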