Commit 41351679, author: JiabinYang

refine readme and clean code

Parent de58898f
@@ -25,6 +25,7 @@ cd data && ./download.sh && cd ..
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
If you would like to use a supported third-party vocabulary, set `--other_dict_path` to the directory where the vocabulary is stored and set `--with_other_dict` to use it.
## Train
The command line options for training can be listed by `python train.py -h`.
......
@@ -31,7 +31,8 @@ Preprocess the training data to generate a word dict.
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
If you would like to use a supported third-party vocabulary, set `--other_dict_path` to the directory where you saved it and turn on the `--with_other_dict` flag.
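The two options named above could be wired up roughly as in the following sketch. It is not taken from `preprocess.py`: it assumes the script uses `argparse`, that `--with_other_dict` is a boolean switch, and that passing the switch without a directory should be rejected. The function names `build_parser` and `validate` and the example directory `./data/third_party_vocab` are hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags named in the README."""
    parser = argparse.ArgumentParser(description="Sketch of preprocess.py options")
    parser.add_argument("--data_path", type=str, required=True,
                        help="directory holding the tokenized training text")
    parser.add_argument("--dict_path", type=str, required=True,
                        help="where the generated word dict is written")
    # Assumption: a directory that holds the third-party vocabulary.
    parser.add_argument("--other_dict_path", type=str, default="",
                        help="directory of a supported third-party vocab")
    # Assumption: a boolean switch that stays off unless it is passed.
    parser.add_argument("--with_other_dict", action="store_true",
                        help="use the vocab found under --other_dict_path")
    return parser


def validate(args):
    """Assumed rule: the switch is only meaningful with a vocab directory."""
    if args.with_other_dict and not args.other_dict_path:
        raise ValueError("--with_other_dict requires --other_dict_path")
    return args


# Parse an explicit argument list instead of sys.argv so the sketch
# runs the same way in any environment.
example = validate(build_parser().parse_args([
    "--data_path", "./data/train",
    "--dict_path", "data/1-billion_dict",
    "--other_dict_path", "./data/third_party_vocab",
    "--with_other_dict",
]))
print(example.with_other_dict, example.other_dict_path)
```

Under these assumptions, leaving out `--with_other_dict` keeps the default behavior of generating the word dict from the training data.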
## Train
The command line options for training can be listed by `python train.py -h`.
......
#!/bin/bash
wget http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
tar -zxvf 1-billion-word-language-modeling-benchmark-r13output.tar.gz
@@ -2,8 +2,6 @@ import time
import os
import paddle.fluid as fluid
import numpy as np
from Queue import PriorityQueue
import heapq
import logging
import argparse
import preprocess
......