@@ -25,7 +25,7 @@ To preprocess the raw dataset, we min-max normalize continuous features to [0, 1
 Download and preprocess data:
 ```bash
-cd data && sh download_preprocess.sh && cd ..
+cd data && python download_preprocess.py && cd ..
 ```
 After executing these commands, three folders "train_data", "test_data" and "aid_data" will be generated. The folder "train_data" contains 90% of the raw data, while the remaining 10% is in "test_data". The folder "aid_data" contains a generated feature dictionary "feat_dict.pkl2".
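As a side note on the preprocessing mentioned in the hunk above, min-max normalization to [0, 1] can be sketched as below. This is a hypothetical illustration, not the repository's actual script; the function name and sample values are assumptions.

```python
# Hypothetical sketch of min-max normalizing a continuous feature column
# to the [0, 1] range, as described in the README text above.

def min_max_normalize(values):
    """Scale a list of numeric values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2.0, 4.0, 6.0]))  # -> [0.0, 0.5, 1.0]
```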
...
@@ -58,12 +58,13 @@ We emulate distributed training on a local machine. In default, we use 2 X 2,i
 ### Download and preprocess distributed demo dataset
 This small demo dataset (a few lines from the Criteo dataset) is only used to verify that distributed training runs.
 ```bash
-cd dist_data && sh dist_data_download.sh && cd ..
+cd dist_data && python dist_data_download.py && cd ..