Commit f7ba0273, authored by Yibing Liu

upload the language model

Parent 6073936a
@@ -66,12 +66,36 @@ More help for arguments:
python train.py --help
```
-### Inferencing
+### Preparing language model
The following steps (inference, parameter tuning, and evaluation) all require a language model during decoding.
A compressed language model is provided and can be downloaded with
```
cd ./lm
sh run.sh
```
After the download completes, return to the parent directory
```
cd ..
```
### Inference
For GPU inference
```
CUDA_VISIBLE_DEVICES=0 python infer.py
```
For CPU inference
```
python infer.py --use_gpu=False
```
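Because `--use_gpu=False` arrives as a string, how the script converts it matters: with a plain `type=bool`, argparse would treat the non-empty string `"False"` as true. The sketch below shows one way such a flag can be parsed. `str2bool`, the help text, and the default of `True` are assumptions made for illustration; this commit does not show how `infer.py` actually handles the flag.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to a bool (hypothetical helper)."""
    text = value.strip().lower()
    if text in ("true", "t", "yes", "y", "1"):
        return True
    if text in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    parser = argparse.ArgumentParser(description="flag parsing sketch")
    parser.add_argument(
        "--use_gpu",
        default=True,  # assumed default: GPU unless told otherwise
        type=str2bool,
        help="Use GPU or not. (default: %(default)s)")
    return parser


# Mirrors the CPU command line shown above.
args = build_parser().parse_args(["--use_gpu=False"])
print(args.use_gpu)
```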
More help for arguments:
```
@@ -92,14 +116,24 @@ python evaluate.py --help
### Parameters tuning
-Parameters tuning for the CTC beam search decoder
+Usually, the parameters $\alpha$ and $\beta$ for the CTC [prefix beam search](https://arxiv.org/abs/1408.2873) decoder need to be tuned after retraining the acoustic model.
For GPU tuning
```
CUDA_VISIBLE_DEVICES=0 python tune.py
```
For CPU tuning
```
python tune.py --use_gpu=False
```
More help for arguments:
```
python tune.py --help
```
Then update the decoder parameters with the tuning results before running inference or evaluation.
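As a rough illustration of what tuning $\alpha$ and $\beta$ involves, the sketch below runs a plain grid search. It assumes the commonly used scoring form in which $\alpha$ weights the language model score and $\beta$ weights a word-count bonus. The function names and the toy error function are hypothetical, and nothing here is taken from `tune.py`, whose implementation is not shown in this commit.

```python
import itertools


def combined_score(log_p_ctc, log_p_lm, word_count, alpha, beta):
    # Assumed scoring form: acoustic score + weighted LM score + word bonus.
    return log_p_ctc + alpha * log_p_lm + beta * word_count


def grid_search(alphas, betas, error_fn):
    """Return (error, alpha, beta) with the lowest error over the grid."""
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        error = error_fn(alpha, beta)
        if best is None or error < best[0]:
            best = (error, alpha, beta)
    return best


def toy_error(alpha, beta):
    # Stand-in for decoding a development set and measuring the error rate.
    return abs(alpha - 0.2) + abs(beta - 0.1)


best_error, best_alpha, best_beta = grid_search(
    [0.1, 0.2, 0.3], [0.0, 0.1, 0.2], toy_error)
print(best_alpha, best_beta)
```

In practice `error_fn` would decode held-out utterances with the given pair and return an error rate, so each grid point costs one full decoding pass.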
@@ -62,7 +62,7 @@ parser.add_argument(
)
parser.add_argument(
    "--language_model_path",
-    default="lm/data/1Billion.klm",
+    default="lm/data/common_crawl_00.prune01111.trie.klm",
    type=str,
    help="Path for language model. (default: %(default)s)")
parser.add_argument(
@@ -139,6 +139,7 @@ def evaluate():
    batch_reader = data_generator.batch_reader_creator(
        manifest_path=args.decode_manifest_path,
        batch_size=args.batch_size,
        min_batch_size=1,
        sortagrad=False,
        shuffle_method=None)
......
@@ -89,7 +89,7 @@ parser.add_argument(
    help="Number of output per sample in beam search. (default: %(default)d)")
parser.add_argument(
    "--language_model_path",
-    default="lm/data/1Billion.klm",
+    default="lm/data/common_crawl_00.prune01111.trie.klm",
    type=str,
    help="Path for language model. (default: %(default)s)")
parser.add_argument(
......
-echo "Downloading language model."
+echo "Downloading language model ..."
mkdir data
LM=common_crawl_00.prune01111.trie.klm
MD5="099a601759d467cd0a8523ff939819c5"
wget -c http://paddlepaddle.bj.bcebos.com/model_zoo/speech/$LM -P ./data
echo "Checking md5sum ..."
md5_tmp=`md5sum ./data/$LM | awk -F[' '] '{print $1}'`
if [ $MD5 != $md5_tmp ]; then
echo "Fail to download the language model!"
exit 1
fi
-wget -c ftp://xxx/xxx/en.00.UNKNOWN.klm -P ./data
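The checksum step in the script above can also be done from Python, which may be handy when the model file is obtained by other means. This is a minimal sketch, not part of the commit: `md5_of_file` and `is_intact` are hypothetical helper names, and only the expected digest is taken from the script. Unlike an unquoted shell comparison, it reports a missing or unreadable file as a failure.

```python
import hashlib

# Expected digest copied from the download script above.
EXPECTED_MD5 = "099a601759d467cd0a8523ff939819c5"


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large model does not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5=EXPECTED_MD5):
    """Return True only if the file exists and its MD5 matches."""
    try:
        return md5_of_file(path) == expected_md5.lower()
    except OSError:
        # A missing or unreadable file counts as a failed download.
        return False
```

A caller could run `is_intact("lm/data/common_crawl_00.prune01111.trie.klm")` after the download and stop if it returns `False`.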
@@ -77,7 +77,7 @@ parser.add_argument(
    help="Width for beam search decoding. (default: %(default)d)")
parser.add_argument(
    "--language_model_path",
-    default="lm/data/1Billion.klm",
+    default="lm/data/common_crawl_00.prune01111.trie.klm",
    type=str,
    help="Path for language model. (default: %(default)s)")
parser.add_argument(
......
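The three hunks above make the same change: the default of `--language_model_path` now points at the downloaded file. The standalone sketch below reproduces just that argument to show how the new default is picked up when the flag is omitted and how `%(default)s` surfaces it in the help text. The parser description and the override path `my.klm` are placeholders, not values from the repository.

```python
import argparse

parser = argparse.ArgumentParser(description="default-path sketch")
parser.add_argument(
    "--language_model_path",
    default="lm/data/common_crawl_00.prune01111.trie.klm",
    type=str,
    help="Path for language model. (default: %(default)s)")

# No flag given: the new default is used.
args = parser.parse_args([])
print(args.language_model_path)

# An explicit flag still overrides it ("my.klm" is a placeholder path).
custom = parser.parse_args(["--language_model_path", "my.klm"])
print(custom.language_model_path)

# argparse substitutes %(default)s when it renders the help text.
help_text = parser.format_help()
```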