Commit 41351679, author: JiabinYang

refine readme and clean code

Parent de58898f
......@@ -25,6 +25,7 @@ cd data && ./download.sh && cd ..
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
+If you would like to use a third-party vocabulary that we support, set --other_dict_path to the directory where that vocabulary is stored, and set --with_other_dict to use it.
## Train
The command line options for training can be listed with `python train.py -h`.
......
......@@ -31,7 +31,8 @@ Preprocess the training data to generate a word dict.
```bash
python preprocess.py --data_path ./data/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled --dict_path data/1-billion_dict
```
-if you would like to use our supported third party vocab, please set
+If you would like to use a supported third-party vocab, set --other_dict_path to the directory where you
+saved the vocab, and turn on the --with_other_dict flag to use it.
## Train
The command line options for training can be listed by `python train.py -h`.
......
#!/bin/bash
wget http://www.statmt.org/lm-benchmark/1-billion-word-language-modeling-benchmark-r13output.tar.gz
tar -zxvf 1-billion-word-language-modeling-benchmark-r13output.tar.gz
......@@ -2,8 +2,6 @@ import time
import os
import paddle.fluid as fluid
import numpy as np
-from Queue import PriorityQueue
-import heapq
import logging
import argparse
import preprocess
......