Commit 18a0bbe8 authored by JiabinYang

remove is_local for preprocess

Parent fc4fe627
@@ -23,7 +23,7 @@ cd data && ./download.sh && cd ..
 Preprocess the data to generate a dictionary.
 ```bash
-python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict --is_local
+python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
 ```
 If you want to use a custom dictionary, in a format like:
 ```bash
...
@@ -29,9 +29,16 @@ This model implement a skip-gram model of word2vector.
 Preprocess the training data to generate a word dict.
 ```bash
-python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --is_local --dict_path data/1-billion_dict
+python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
 ```
-if you would like to use our supported third party vocab, please set --other_dict_path as the directory of where you
+if you would like to use your own vocab follow the format below:
+```bash
+<UNK>
+a
+b
+c
+```
+Then, please set --other_dict_path as the directory of where you
 save the vocab you will use and set --with_other_dict flag on to using it.
 ## Train
...
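The README hunk above documents a custom vocab format: one token per line, with `<UNK>` first. A minimal sketch of reading such a file, assuming UTF-8 and mirroring the `word_count[native_to_unicode(line.strip())] = 1` assignment visible in the `preprocess` hunk. The helper name `load_other_dict` is hypothetical (not part of preprocess.py), and skipping blank lines is an added assumption.

```python
import io
import os
import tempfile


def load_other_dict(path):
    """Hypothetical helper: read a one-token-per-line vocab into a dict.

    Every token gets a count of 1, mirroring the
    `word_count[native_to_unicode(line.strip())] = 1` line in preprocess.py.
    Blank lines are skipped here (an assumption; the original loop keeps them).
    """
    word_count = {}
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            token = line.strip()
            if token:
                word_count[token] = 1
    return word_count


if __name__ == "__main__":
    # Write a vocab in the README's format: <UNK> first, then one word per line.
    with tempfile.NamedTemporaryFile(
            "w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
        tmp.write("<UNK>\na\nb\nc\n")
        vocab_path = tmp.name
    print(load_other_dict(vocab_path))  # {'<UNK>': 1, 'a': 1, 'b': 1, 'c': 1}
    os.remove(vocab_path)
```

Such a file would then be passed via `--other_dict_path` together with `--with_other_dict`, as the README text says.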
@@ -27,12 +27,6 @@ def parse_args():
 type=int,
 default=5,
 help="If the word count is less then freq, it will be removed from dict")
-parser.add_argument(
-'--is_local',
-action='store_true',
-required=False,
-default=False,
-help='Local train or not, (default: False)')
 parser.add_argument(
 '--with_other_dict',
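With the `--is_local` block deleted from `parse_args`, old command lines that still pass the flag should now be rejected by argparse. A sketch of the post-commit parser, assuming the flag set only partly: `--data_path`, `--dict_path`, `--with_other_dict` and `--other_dict_path` appear in the diff and README, but the name `--freq` for the int flag with default 5 is an assumption (the diff hides that flag's name), and all other defaults are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Sketch of preprocess.py's parser after this commit (partly assumed)."""
    parser = argparse.ArgumentParser(description="word2vec preprocess (sketch)")
    parser.add_argument("--data_path", type=str, help="corpus directory")
    parser.add_argument("--dict_path", type=str, help="output dict path")
    # Assumed flag name; the diff only shows type=int, default=5 and the help text.
    parser.add_argument(
        "--freq",
        type=int,
        default=5,
        help="If the word count is less than freq, it will be removed from dict")
    parser.add_argument("--with_other_dict", action="store_true", default=False)
    parser.add_argument("--other_dict_path", type=str, default=None)
    # No --is_local here: this commit deletes that argument.
    return parser.parse_args(argv)
```

Because argparse reports unknown options through `parser.error`, a call such as `parse_args(["--is_local"])` prints "unrecognized arguments: --is_local" and exits with status 2, so any scripts still passing the flag need updating along with the README commands.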
@@ -203,7 +197,6 @@ def preprocess(args):
 for line in f:
 word_count[native_to_unicode(line.strip())] = 1
-if args.is_local:
 for i in range(1, 100):
 with io.open(
 args.data_path + "/news.en-000{:0>2d}-of-00100".format(i),
...
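After this commit the shard loop in `preprocess` is no longer guarded by `if args.is_local:`. A minimal sketch of which files that loop touches, plus a word count with the cut-off described by the help text "If the word count is less then freq, it will be removed from dict". The function `count_words`, whitespace tokenization, and skipping missing shards are assumptions for illustration, not preprocess.py's actual code.

```python
import io
import os
from collections import Counter


def shard_paths(data_path):
    """Paths built by the format string in preprocess.py.

    i runs over range(1, 100), so this yields news.en-00001-of-00100
    through news.en-00099-of-00100 (99 paths).
    """
    return [
        data_path + "/news.en-000{:0>2d}-of-00100".format(i)
        for i in range(1, 100)
    ]


def count_words(data_path, freq=5):
    """Hypothetical sketch: count whitespace-separated tokens across shards,
    then drop words seen fewer than `freq` times.

    Missing shards are skipped (an assumption, not preprocess.py behavior).
    """
    counts = Counter()
    for path in shard_paths(data_path):
        if not os.path.exists(path):
            continue
        with io.open(path, "r", encoding="utf-8") as f:
            for line in f:
                counts.update(line.split())
    return {w: c for w, c in counts.items() if c >= freq}
```

The `{:0>2d}` spec left-pads `i` with zeros to two digits, which combined with the literal `000` prefix produces the five-digit shard numbers of the 1-billion-word benchmark files.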