The command line options for training can be listed by `python train.py -h`.
### Local Train:
```bash
python train.py \
    --train_data_path data/raw/train.txt \
    2>&1 | tee train.log
```
After training pass 1 batch 40000, the testing AUC is `0.801178` and the testing cost is `0.445196`.
### Distributed Train
Run a distributed training job with 2 pservers and 2 trainers on a single machine.
In the distributed training setting, the training data is split by trainer_id so that the shards assigned to different trainers do not overlap; a minimal sketch of this sharding follows the command below.
```bash
sh cluster_train.sh
```
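The actual sharding is done by the project's own reader and the environment set up by `cluster_train.sh`; the snippet below is only a minimal sketch, assuming a line-based text file and hypothetical `trainer_id` / `trainer_num` values, of how splitting by trainer_id keeps the per-trainer shards disjoint.

```python
# Minimal sketch (not the project's actual reader code): shard a line-based
# training file by trainer_id so that no two trainers see the same example.
# In a real run, trainer_id and trainer_num would come from the environment
# variables exported by the cluster launcher (e.g. cluster_train.sh).
def shard_by_trainer(file_path, trainer_id, trainer_num):
    def reader():
        with open(file_path) as f:
            for i, line in enumerate(f):
                # Trainer k keeps lines k, k + trainer_num, k + 2 * trainer_num, ...
                if i % trainer_num == trainer_id:
                    yield line.rstrip("\n")
    return reader

# Example: with 2 trainers, trainer 0 reads the even lines and trainer 1 the odd ones.
```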
To make inference for the test dataset:
```bash
python infer.py \
    --model_path models/ \
    --data_path data/raw/train.txt
```
Note: The AUC value in the last log line is the total AUC over the entire test dataset. Here, train.txt is split inside reader.py so that the validation data does not overlap with the training data.
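The exact split lives in reader.py itself; purely as an illustration, with a hypothetical helper name and ratio, a deterministic line-index split like the one below lets training and validation both read data/raw/train.txt without sharing any examples.

```python
# Hypothetical sketch, not the project's reader.py: derive disjoint training
# and validation subsets from the same train.txt by line index.
def split_reader(file_path, is_train, valid_every=10):
    def reader():
        with open(file_path) as f:
            for i, line in enumerate(f):
                is_valid_line = (i % valid_every == 0)  # e.g. every 10th line -> validation
                if is_train != is_valid_line:
                    yield line.rstrip("\n")
    return reader

train_reader = split_reader("data/raw/train.txt", is_train=True)
valid_reader = split_reader("data/raw/train.txt", is_train=False)
```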
## Train on Baidu Cloud
1. Please prepare some CPU machines on Baidu Cloud following the steps in [train_on_baidu_cloud](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst)