Commit 92381451, authored by Hui Zhang

format

Parent 30aba266
......@@ -11,7 +11,7 @@
## Features
-See [feature list](doc/src/feature_list.md) for more information.
+See [feature list](doc/src/feature_list.md) for more information.
## Setup
......
......@@ -179,7 +179,8 @@ class FeatureNormalizer(object):
 wav_number += batch_size
 if wav_number % 1000 == 0:
-    logger.info(f'process {wav_number} wavs,{all_number} frames.')
+    logger.info(
+        f'process {wav_number} wavs,{all_number} frames.')
 self.cmvn_info = {
     'mean_stat': list(all_mean_stat.tolist()),
......
......@@ -98,4 +98,4 @@
## Text Filter
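-* Sensitive words (pornography and violence, politically sensitive content, illegal or prohibited content, etc.)
\ No newline at end of file
+* Sensitive words (pornography and violence, politically sensitive content, illegal or prohibited content, etc.)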
......@@ -14,4 +14,3 @@ We compare the training time with 1, 2, 4, 8 Tesla V100 GPUs (with a subset of L
| 8 | 6.95 X |
`utils/profile.sh` provides a demo profiling tool; you can change it as needed.
......@@ -48,4 +48,4 @@
## Zhuyin
* [Bopomofo](https://en.wikipedia.org/wiki/Bopomofo)
-* [Zhuyin table](https://en.wikipedia.org/wiki/Zhuyin_table)
\ No newline at end of file
+* [Zhuyin table](https://en.wikipedia.org/wiki/Zhuyin_table)
......@@ -18,4 +18,4 @@
### ASR Noise
-* [asr-noises](https://github.com/speechio/asr-noises)
\ No newline at end of file
+* [asr-noises](https://github.com/speechio/asr-noises)
......@@ -58,4 +58,4 @@
### Grapheme To Phoneme
* syllable
-* phoneme
\ No newline at end of file
+* phoneme
......@@ -83,4 +83,4 @@ Please notice that the released language models only contain Chinese simplified
```
build/bin/build_binary ./result/people2014corpus_words.arps ./result/people2014corpus_words.klm
```
\ No newline at end of file
```
......@@ -76,7 +76,7 @@ pip3 install textgrid
tg.read('file.TextGrid') # 'file.TextGrid' is the file name
```
-The tg.tiers attribute:
+The tg.tiers attribute:
Prints all items in the file; the output of print(tg.tiers) is as follows:
```text
......@@ -86,7 +86,7 @@ pip3 install textgrid
Interval(1361.89250, 1362.01250, R),
Interval(1362.01250, 1362.13250, AY1),
Interval(1362.13250, 1362.16250, T),
...
]
)
......@@ -113,7 +113,7 @@ pip3 install textgrid
An Interval can be understood as a duration
```
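As a rough illustration of reading an Interval as a duration, the sketch below takes the printed `Interval(start, end, mark)` entries shown above and computes how long each one lasts. It deliberately does not call the textgrid library; it only parses the printed text with a regular expression, and the helper names `parse_intervals` and `durations` are assumptions made for this example.

```python
import re

# Matches one printed entry such as "Interval(1361.89250, 1362.01250, R)".
INTERVAL_RE = re.compile(
    r"Interval\(\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([^)]*?)\s*\)")


def parse_intervals(text):
    """Return (start, end, mark) tuples for every Interval(...) found in text."""
    return [(float(start), float(end), mark)
            for start, end, mark in INTERVAL_RE.findall(text)]


def durations(intervals):
    """Return (mark, end - start) pairs, rounded to 5 decimal places."""
    return [(mark, round(end - start, 5)) for start, end, mark in intervals]


# The three entries copied from the print(tg.tiers) output above.
printed = """
Interval(1361.89250, 1362.01250, R),
Interval(1362.01250, 1362.13250, AY1),
Interval(1362.13250, 1362.16250, T),
"""

for mark, seconds in durations(parse_intervals(printed)):
    print(f"{mark}: {seconds:.5f} s")
```

Rounding is applied because subtracting two large floating-point timestamps leaves small representation errors.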
2. Objects in the textgrid library
**IntervalTier** object:
......@@ -148,7 +148,7 @@ pip3 install textgrid
strict --> returns a bool indicating whether the strict TextGrid format is used
```
**PointTier** object:
Methods
......@@ -174,7 +174,7 @@ pip3 install textgrid
name returns the name
```
**Point** object:
Supports comparison as well as addition and subtraction
......@@ -185,7 +185,7 @@ pip3 install textgrid
time:
```
**Interval** object:
Supports comparison as well as addition and subtraction
......@@ -250,10 +250,9 @@ pip3 install textgrid
grids: --> returns the list of grids that were read
```
## Reference
* https://zh.wikipedia.org/wiki/Praat%E8%AF%AD%E9%9F%B3%E5%AD%A6%E8%BD%AF%E4%BB%B6
* https://blog.csdn.net/duxin_csdn/article/details/88966295
# Useful Tools
* [Regex visualization and common regular expressions](https://wangwl.net/static/projects/visualRegex/#)
......@@ -23,7 +23,7 @@ Therefore, procedures like stemming and lemmatization are not useful for Chinese
### Tokenization
-**Tokenizing breaks up text data into shorter pre-set strings**, which help build context and meaning for the machine learning model.
+**Tokenizing breaks up text data into shorter pre-set strings**, which help build context and meaning for the machine learning model.
These “tags” label the part of speech. There are 24 part of speech tags and 4 proper name category labels in the `**jieba**` package’s existing dictionary.
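To make "breaking text into pre-set strings" concrete, here is a minimal sketch of dictionary-based segmentation using forward maximum matching. This is a simpler, swapped-in technique for illustration only and is not how jieba works internally; the `VOCAB` dictionary, the `max_len` limit, and the function name are hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match segmentation.

    At each position, take the longest substring (up to max_len characters)
    found in vocab; if nothing matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink toward one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real tokenizer ships a far larger one.
VOCAB = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}

print(forward_max_match("我们喜欢自然语言处理", VOCAB))
```

Because the match is greedy, "自然语言" is kept as one token even though "自然" and "语言" are also in the dictionary.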
......@@ -31,7 +31,7 @@ These “tags” label the part of speech. There are 24 part of speech tags and
### Stop Words
-In NLP, **stop words are “meaningless” words** that make the data too noisy or ambiguous.
+In NLP, **stop words are “meaningless” words** that make the data too noisy or ambiguous.
Instead of manually removing them, you could import the `**stopwordsiso**` package for a full list of Chinese stop words. More information can be found [here](https://pypi.org/project/stopwordsiso/). And with this, we can easily create code to filter out any stop words in large text data.
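A stop-word filter over already tokenized text can be sketched as below. The four-word `STOP_WORDS` set is a hypothetical stand-in so that the example runs without extra packages; in practice the set would be loaded from a full list such as the one the `stopwordsiso` package provides.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in stop_words, preserving order."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical tiny sample; a real Chinese stop-word list is much longer.
STOP_WORDS = {"的", "了", "是", "在"}

tokens = ["今天", "的", "天气", "是", "晴天"]
print(remove_stop_words(tokens, STOP_WORDS))
```

Keeping the stop words in a `set` makes each membership check constant time on average, which matters when filtering large text data.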
......@@ -188,4 +188,4 @@ TN: rule-based methods
## Reference
* [Text Front End](https://slyne.github.io/%E5%85%AC%E5%BC%80%E8%AF%BE/2020/10/03/TTS1/)
* [Chinese Natural Language (Pre)processing: An Introduction](https://towardsdatascience.com/chinese-natural-language-pre-processing-an-introduction-995d16c2705f)
-* [Beginner’s Guide to Sentiment Analysis for Simplified Chinese using SnowNLP](https://towardsdatascience.com/beginners-guide-to-sentiment-analysis-for-simplified-chinese-using-snownlp-ce88a8407efb)
\ No newline at end of file
+* [Beginner’s Guide to Sentiment Analysis for Simplified Chinese using SnowNLP](https://towardsdatascience.com/beginners-guide-to-sentiment-analysis-for-simplified-chinese-using-snownlp-ce88a8407efb)