diff --git a/demo/quick_start/preprocess.sh b/demo/quick_start/preprocess.sh index fb2bee98beb268e88d82b64332273aa10399ff42..fe2acbbd74898fa3d12ddee3271658043c43e32e 100755 --- a/demo/quick_start/preprocess.sh +++ b/demo/quick_start/preprocess.sh @@ -20,6 +20,8 @@ set -e +export LC_ALL=C + mkdir -p data/tmp python preprocess.py -i data/reviews_Electronics_5.json.gz # uniq and shuffle diff --git a/doc/demo/quick_start/index_en.md b/doc/demo/quick_start/index_en.md index ee3fa2a2166f497524663574270b239a6170ab19..e7d74512292c89233373c48d05895794d56702d8 100644 --- a/doc/demo/quick_start/index_en.md +++ b/doc/demo/quick_start/index_en.md @@ -134,7 +134,7 @@ def process(settings, file_name): You need to add a data provider definition `define_py_data_sources2` in our network configuration. This definition specifies: - The path of the training and testing data (`data/train.list`, `data/test.list`). -- The location of the data provider file (`dataprovider_pow`). +- The location of the data provider file (`dataprovider_bow`). - The function to call to get data. (`process`). - Additional arguments or data. Here it passes the path of word dictionary.