Release training data & update README

0c0e6ded · Yibing Liu · c4d61454
4 changed files
--- a/fluid/deep_attention_matching_net/README.md
+++ b/fluid/deep_attention_matching_net/README.md
@@ -2,11 +2,11 @@
 This is the source code of Deep Attention Matching network (DAM), that is proposed for multi-turn response selection in the retrieval-based chatbot.
-DAM is a neural matching network that entirely based on attention mechanism. The motivation of DAM is to capture those semantic dependencies, among dialogue elements at different level of granularities, in multi-turn conversation as matching evidences, in order to better match response candidate with its multi-turn context. DAM will appear on ACL-2018, please find our paper at: http://acl2018.org/conference/accepted-papers/.
+DAM is a neural matching network that is entirely based on the attention mechanism. The motivation of DAM is to capture semantic dependencies among dialogue elements at different levels of granularity in multi-turn conversation as matching evidence, in order to better match a response candidate with its multi-turn context. DAM appears at ACL 2018; please find our paper at [http://aclweb.org/anthology/P18-1103](http://aclweb.org/anthology/P18-1103).
 ## __TensorFlow Version__
-DAM is originally implemented with Tensorflow, which can be found at: https://github.com/baidu/Dialogue/DAM . We highly recommend using the PaddlePaddle Fluid version here as it supports parallely training with very large corpus.
+DAM was originally implemented with TensorFlow; that version can be found at: [https://github.com/baidu/Dialogue/DAM](https://github.com/baidu/Dialogue/DAM) (in progress). We highly recommend using the PaddlePaddle Fluid version here, as it supports parallel training on very large corpora.
 ## __Network__
@@ -32,27 +32,42 @@ We test DAM on two large-scale multi-turn response selection tasks, i.e., the Ub
 ## __Usage__
-First, please download [data](https://pan.baidu.com/s/1hakfuuwdS8xl7NyxlWzRiQ "data") and unzip it:
+Take the experiment on the Ubuntu Corpus v1 as an example.
+1) Go to the `ubuntu` directory
+```
+cd ubuntu
+```
+2) Download the well-preprocessed data for training  
 ```
-cd data
+sh download_data.sh
-unzip data.zip
 ```
+3) Execute the model training and evaluation by
-If you want use well trained models directly, please download [models](https://pan.baidu.com/s/1pl4d63MBxihgrEWWfdAz0w "models") and unzip it:
 ```
-cd output
+sh train.sh
-unzip output.zip
 ```
+For a more detailed explanation of the arguments, please run
-Train and test the model by:
 ```
-sh run.sh
+python ../train_and_evaluate.py --help
 ```
+4) Run test by
+```
+sh test.sh
+```
+and test different saved models by passing a different `--model_path` argument.
+Similarly, one can carry out the experiment on the Douban Conversation Corpus by going to the `douban` directory and following the same procedure.
 ## __Dependencies__
 - Python >= 2.7.3
-- PaddlePaddle latest develop
+- PaddlePaddle latest develop branch
 ## __Citation__

--- a/fluid/deep_attention_matching_net/douban/download_data.sh
+++ b/fluid/deep_attention_matching_net/douban/download_data.sh
+url=http://dam-data.cdn.bcebos.com/douban.tar.gz
+md5=e07ca68f21c20e09efb3e8b247194405
+if [ ! -e douban.tar.gz ]; then
+    wget -c "$url"
+fi
+echo "Checking md5 sum ..."
+md5sum_tmp=$(md5sum douban.tar.gz | cut -d ' ' -f1)
+# Quote both operands so an empty checksum (e.g. after a failed download) fails the check cleanly
+if [ "$md5sum_tmp" != "$md5" ]; then
+    echo "Md5sum check failed, please remove and redownload douban.tar.gz"
+    exit 1
+fi
+echo "Untar douban.tar.gz ..."
+tar -xzvf douban.tar.gz
--- a/fluid/deep_attention_matching_net/ubuntu/download_data.sh
+++ b/fluid/deep_attention_matching_net/ubuntu/download_data.sh
+url=http://dam-data.cdn.bcebos.com/ubuntu.tar.gz
+md5=9d7db116a040530a16f68dc0ab44e4b6
+if [ ! -e ubuntu.tar.gz ]; then
+    wget -c "$url"
+fi
+echo "Checking md5 sum ..."
+md5sum_tmp=$(md5sum ubuntu.tar.gz | cut -d ' ' -f1)
+# Quote both operands so an empty checksum (e.g. after a failed download) fails the check cleanly
+if [ "$md5sum_tmp" != "$md5" ]; then
+    echo "Md5sum check failed, please remove and redownload ubuntu.tar.gz"
+    exit 1
+fi
+echo "Untar ubuntu.tar.gz ..."
+tar -xzvf ubuntu.tar.gz
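
The two `download_data.sh` scripts differ only in the archive name and checksum. As a rough sketch of how the shared download, verify, and unpack logic could be factored out, the helper below may be useful. The function names `verify_md5` and `fetch_and_unpack` are hypothetical and not part of this commit, and the sketch assumes `md5sum`, `wget`, and `tar` are available on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): compare a file's MD5
# against an expected value. Returns 0 on match, non-zero otherwise
# (including when the file does not exist).
verify_md5() {
    file=$1
    expected=$2
    [ -f "$file" ] || return 1
    actual=$(md5sum "$file" | cut -d ' ' -f1)
    [ "$actual" = "$expected" ]
}

# Hypothetical wrapper: download the archive if it is absent,
# verify its checksum, then unpack it in the current directory.
fetch_and_unpack() {
    url=$1
    md5=$2
    archive=${url##*/}   # file name = last path component of the URL
    [ -e "$archive" ] || wget -c "$url" || return 1
    echo "Checking md5 sum ..."
    if ! verify_md5 "$archive" "$md5"; then
        echo "Md5sum check failed, please remove and redownload $archive" >&2
        return 1
    fi
    echo "Untar $archive ..."
    tar -xzf "$archive"
}
```

Under these assumptions, each dataset script could reduce to a single call, for example `fetch_and_unpack http://dam-data.cdn.bcebos.com/ubuntu.tar.gz 9d7db116a040530a16f68dc0ab44e4b6`, using the URL and checksum already present in the scripts above.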
--- a/fluid/deep_attention_matching_net/ubuntu/test.sh
+++ b/fluid/deep_attention_matching_net/ubuntu/test.sh
 export CUDA_VISIBLE_DEVICES=0,1,2,3
-python -u test_and_evaluate.py --use_cuda \
+python -u ../test_and_evaluate.py --use_cuda \
                --data_path ./data/data.pkl \
                --save_path ./ \
                --model_path models/step_10000 \