Commit f7ba0273 authored by Yibing Liu

upload the language model

Parent 6073936a
......@@ -66,12 +66,36 @@ More help for arguments:
python train.py --help
```
### Inferencing
### Preparing language model
The following steps (inference, parameter tuning, and evaluation) all require a language model during decoding.
A compressed language model is provided and can be downloaded with
```
cd ./lm
sh run.sh
```
After the download completes, return to the parent directory:
```
cd ..
```
### Inference
For GPU inference
```
CUDA_VISIBLE_DEVICES=0 python infer.py
```
For CPU inference
```
python infer.py --use_gpu=False
```
More help for arguments:
```
......@@ -92,14 +116,24 @@ python evaluate.py --help
### Parameters tuning
Parameters tuning for the CTC beam search decoder
Usually, the parameters $\alpha$ and $\beta$ for the CTC [prefix beam search](https://arxiv.org/abs/1408.2873) decoder need to be tuned after retraining the acoustic model.
For GPU tuning
```
CUDA_VISIBLE_DEVICES=0 python tune.py
```
For CPU tuning
```
python tune.py --use_gpu=False
```
More help for arguments:
```
python tune.py --help
```
Then set the decoder parameters to the tuned values before running inference or evaluation.
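As a rough illustration of what $\alpha$ and $\beta$ control, the sketch below assumes the scoring form described in the linked prefix beam search paper, where a candidate transcript is ranked by $p_{ctc} \cdot p_{lm}^{\alpha} \cdot |W|^{\beta}$, with $|W|$ the number of words. The function name and the example numbers are hypothetical, and the exact scoring used by `tune.py` may differ.

```python
import math


def combined_log_score(log_p_ctc, log_p_lm, word_count, alpha, beta):
    """Log of p_ctc * p_lm**alpha * word_count**beta (assumed scoring form).

    alpha weights the language model; beta rewards (or penalizes) longer
    transcripts. word_count must be at least 1.
    """
    return log_p_ctc + alpha * log_p_lm + beta * math.log(word_count)


# Two hypothetical candidates: (acoustic log prob, LM log prob, word count).
candidates = {
    "short": (-4.0, -6.0, 2),
    "long": (-4.5, -5.0, 4),
}


def best_candidate(alpha, beta):
    """Return the name of the highest-scoring candidate for given weights."""
    return max(
        candidates,
        key=lambda name: combined_log_score(*candidates[name], alpha, beta),
    )
```

With `alpha=0` and `beta=0` only the acoustic score matters. Raising either weight shifts the ranking toward transcripts the language model prefers or toward longer transcripts, which is why both values are typically re-tuned after the acoustic model changes.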
......@@ -62,7 +62,7 @@ parser.add_argument(
)
parser.add_argument(
"--language_model_path",
default="lm/data/1Billion.klm",
default="lm/data/common_crawl_00.prune01111.trie.klm",
type=str,
help="Path for language model. (default: %(default)s)")
parser.add_argument(
......@@ -139,6 +139,7 @@ def evaluate():
batch_reader = data_generator.batch_reader_creator(
manifest_path=args.decode_manifest_path,
batch_size=args.batch_size,
min_batch_size=1,
sortagrad=False,
shuffle_method=None)
......
......@@ -89,7 +89,7 @@ parser.add_argument(
help="Number of output per sample in beam search. (default: %(default)d)")
parser.add_argument(
"--language_model_path",
default="lm/data/1Billion.klm",
default="lm/data/common_crawl_00.prune01111.trie.klm",
type=str,
help="Path for language model. (default: %(default)s)")
parser.add_argument(
......
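For readers unfamiliar with the `%(default)s` placeholder in the help strings above: `argparse` substitutes the argument's default value when it renders the help text. The following is a minimal standalone sketch that reuses only the `--language_model_path` option from the diff; the parser itself is illustrative and does not reproduce the scripts' full argument list.

```python
import argparse

# Illustrative parser containing only the option changed in this commit.
parser = argparse.ArgumentParser(description="Illustrative parser only.")
parser.add_argument(
    "--language_model_path",
    default="lm/data/common_crawl_00.prune01111.trie.klm",
    type=str,
    help="Path for language model. (default: %(default)s)")

# No flag given: the new default path is used.
args_default = parser.parse_args([])

# The default can still be overridden on the command line.
# "lm/data/other.klm" is a hypothetical path used only for this example.
args_custom = parser.parse_args(["--language_model_path", "lm/data/other.klm"])
```

Because the help text is generated from the `default` value, changing the default path as this commit does keeps the `--help` output in sync without editing the help string.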
echo "Downloading language model."
echo "Downloading language model ..."
mkdir data
LM=common_crawl_00.prune01111.trie.klm
MD5="099a601759d467cd0a8523ff939819c5"
wget -c http://paddlepaddle.bj.bcebos.com/model_zoo/speech/$LM -P ./data
echo "Checking md5sum ..."
md5_tmp=`md5sum ./data/$LM | awk -F[' '] '{print $1}'`
if [ $MD5 != $md5_tmp ]; then
echo "Fail to download the language model!"
exit 1
fi
wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data
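The checksum step in the download script can also be sketched in Python, for example on systems without `md5sum`. This is an illustrative equivalent and not part of the commit: the function names are hypothetical, while the file name and expected digest are copied from the script above.

```python
import hashlib

# Values copied from the download script.
LM = "common_crawl_00.prune01111.trie.klm"
MD5 = "099a601759d467cd0a8523ff939819c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    a large language model file does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()
```

After the download finishes, calling `verify_md5("./data/" + LM, MD5)` mirrors the shell comparison: a `False` result corresponds to the "Fail to download the language model!" branch.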
......@@ -77,7 +77,7 @@ parser.add_argument(
help="Width for beam search decoding. (default: %(default)d)")
parser.add_argument(
"--language_model_path",
default="lm/data/1Billion.klm",
default="lm/data/common_crawl_00.prune01111.trie.klm",
type=str,
help="Path for language model. (default: %(default)s)")
parser.add_argument(
......