Commit 9eef84a3, authored by oyjxer

fix bug with fixed inputs

Parent bb8271ad
@@ -2,16 +2,42 @@ ERNIE-SAT is a cross-lingual speech-language cross-modal large ...
## Model framework
In ERNIE-SAT we propose two innovations:
- During pre-training, the phonemes corresponding to both Chinese and English are used as input, which yields a cross-lingual, personalized soft phoneme mapping
- Joint masked learning over language and speech aligns the two modalities
![framework](.meta/framework.png)
## Usage
-### 1. Install PaddlePaddle
+### 1. Install PaddlePaddle and environment dependencies
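- The code in this project is based on Paddle (version>=2.0)
- This project supports loading the torch version of the vocoder
    - torch version>=1.8
- Install htk: after registering at the [official site](https://htk.eng.cam.ac.uk/), you can download a recent version of htk (for example 3.4.1). [Older htk versions](https://htk.eng.cam.ac.uk/ftp/software/) are also available
    - 1. Register an account and download htk
    - 2. Extract the htk archive and **place it in the tools folder under the project root, in a folder named htk**
    - 3. **Note**: if you downloaded version 3.4.1 or later, open HTKLib/HRec.c and **modify lines 1626 and 1650**, **changing `dur<=0` to `dur<0` on both lines**, as shown below: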
```c
/* Example for htk 3.4.1 */
/* (1) Line 1626, original: */
if (dur<=0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<=0");
/* change to: */
if (dur<0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<0");
/* (2) Line 1650, original: */
if (dur<=0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<=0 ");
/* change to: */
if (dur<0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<0 ");
```
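    - 4. **Compile**: see the README in the extracted htk folder (the project cannot run correctly unless htk is compiled)
- Install ParallelWaveGAN: see the [official repository](https://github.com/kan-bayashi/ParallelWaveGAN). Following its installation instructions, git clone the ParallelWaveGAN project **in the root directory of this project** and install its dependencies.
- Install other dependencies: **sox, libsndfile**
-The code in this project is based on Paddle (version>=2.0)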
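
The HRec.c change from step 3 can also be applied with a script. The following is a minimal sketch, assuming the two lines read exactly as quoted above; `patch_hrec_source` is a hypothetical helper, not part of htk or of this project, and it works on the file contents as a string so the result can be inspected before anything is written to disk.

```python
def patch_hrec_source(source: str) -> str:
    """Relax `dur<=0` to `dur<0` on the LatFromPaths error lines only."""
    patched_lines = []
    for line in source.splitlines(keepends=True):
        # Only the two HError lines quoted in the README carry this message.
        if "LatFromPaths: Align have dur<=0" in line:
            line = line.replace("dur<=0", "dur<0")
        patched_lines.append(line)
    return "".join(patched_lines)


sample = 'if (dur<=0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<=0");\n'
patched = patch_hrec_source(sample)
print(patched, end="")
```

To use it, read `tools/htk/HTKLib/HRec.c`, pass the text through the function, write the result back, and then compile htk as described in step 4.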
### 2. Pre-trained models
@@ -21,12 +47,22 @@ In ERNIE-SAT we propose two innovations:
- [ERNIE-SAT_ZH_and_EN](http://bj.bcebos.com/wenxin-models/model-ernie-sat-base-en_zh.tar.gz)
Create a pretrained_model folder, download the ERNIE-SAT pre-trained models listed above into it, and extract them:
```bash
mkdir pretrained_model
cd pretrained_model
tar -zxvf model-ernie-sat-base-en.tar.gz
tar -zxvf model-ernie-sat-base-zh.tar.gz
tar -zxvf model-ernie-sat-base-en_zh.tar.gz
```
### 3. Downloads
1. This project uses Parallel WaveGAN as the vocoder:
    - [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)
-Create a download folder, then download and extract the pre-trained vocoder model listed above
+Create a download folder, then download and extract the pre-trained vocoder model listed above:
```bash
mkdir download
```
@@ -49,7 +85,7 @@ unzip fastspeech2_nosil_ljspeech_ckpt_0.5.zip
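### 4. Inference
This project currently open-sources the inference code for speech editing, personalized speech synthesis, and cross-lingual speech synthesis; more will be released over time.
-Note: the current vocoder version differs on English from the [version used in model training](https://github.com/kan-bayashi/ParallelWaveGAN); you can use the training-time version as your vocoder. The model will be upgraded in a later update
+Note: in the English scenario, synthesized speech uses vctk_parallel_wavegan.v1.long as the default vocoder, which can be found at [this link](https://github.com/kan-bayashi/ParallelWaveGAN); if the use_pt_vocoder parameter is set to False, the Paddle version of the vocoder is used in the English scenario
We provide specific audio files together with their corresponding text and phoneme files:
- prompt_wav: the provided audio files
@@ -59,7 +95,7 @@ unzip fastspeech2_nosil_ljspeech_ckpt_0.5.zip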
```text
prompt_wav
├── p299_096.wav      # sample audio file 1
-├── SSB03540428.wav   # sample audio file 2
+├── p243_313.wav      # sample audio file 2
└── ...
```
@@ -85,11 +121,12 @@ prompt/dev
12. `--target_language`: target language
13. `--output_name`: name of the synthesized audio
14. `--task_name`: task name; one of the speech editing, personalized speech synthesis, and cross-lingual speech synthesis tasks
15. `--use_pt_vocoder`: whether to use the torch version of the vocoder in the English scenario. Defaults to True; set it to False to use the Paddle version of the vocoder in the English scenario
Run the following scripts to try the tasks:
```shell
sh run_sedit_en.sh          # speech editing task (English)
sh run_gen_en.sh            # personalized speech synthesis task (English)
-sh run_clone_en_to_zh.sh    # cross-lingual speech synthesis task (English-to-Chinese cloning)
+sh run_clone_en_to_zh.sh    # cross-lingual speech synthesis task (English-to-Chinese voice cloning)
```
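
The run scripts in this commit pass the vocoder switch as text (`--use_pt_vocoder False` or `--use_pt_vocoder True`), while the `parse_args` hunk of inference.py declares the argument with `default=True` and no `type`. With that declaration argparse stores the raw string, and the non-empty string `'False'` is truthy unless the caller compares strings explicitly. The sketch below shows one common way to parse such a flag into a real boolean; `str2bool` is a hypothetical helper and is not part of this commit.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert common textual booleans to bool for use as an argparse type."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--use_pt_vocoder", type=str2bool, default=True)

args_false = parser.parse_args(["--use_pt_vocoder", "False"])
args_default = parser.parse_args([])
print(args_false.use_pt_vocoder, args_default.use_pt_vocoder)
```

Whether inference.py needs this depends on how it reads `args.use_pt_vocoder`, which this diff does not show.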
#!/usr/bin/env python
""" Usage:
      align_english.py wavfile trsfile outwordfile outphonefile
"""
import os
import sys
from tqdm import tqdm
import multiprocessing as mp

PHONEME = 'tools/aligner/english_envir/english2phoneme/phoneme'
MODEL_DIR = 'tools/aligner/english'
HVITE = 'tools/htk/HTKTools/HVite'
HCOPY = 'tools/htk/HTKTools/HCopy'


def prep_txt(line, tmpbase, dictfile):

    words = []

    line = line.strip()
    for pun in [',', '.', ':', ';', '!', '?', '"', '(', ')', '--', '---']:
        line = line.replace(pun, ' ')
    for wrd in line.split():
        if (wrd[-1] == '-'):
            wrd = wrd[:-1]
        # guard against a token that became empty after stripping '-'
        if wrd and (wrd[0] == "'"):
            wrd = wrd[1:]
        if wrd:
            words.append(wrd)

    ds = set([])
    with open(dictfile, 'r') as fid:
        for line in fid:
            ds.add(line.split()[0])

    unk_words = set([])
    with open(tmpbase + '.txt', 'w') as fwid:
        for wrd in words:
            if (wrd.upper() not in ds):
                unk_words.add(wrd.upper())
            fwid.write(wrd + ' ')
        fwid.write('\n')

    # generate pronunciations for unknown words using 'letter to sound'
    with open(tmpbase + '_unk.words', 'w') as fwid:
        for unk in unk_words:
            fwid.write(unk + '\n')
    try:
        os.system(PHONEME + ' ' + tmpbase + '_unk.words' + ' ' + tmpbase + '_unk.phons')
    except:
        print('english2phoneme error!')
        sys.exit(1)

    # add unknown words to the standard dictionary, generate a tmp dictionary for alignment
    fw = open(tmpbase + '.dict', 'w')
    with open(dictfile, 'r') as fid:
        for line in fid:
            fw.write(line)
    f = open(tmpbase + '_unk.words', 'r')
    lines1 = f.readlines()
    f.close()
    f = open(tmpbase + '_unk.phons', 'r')
    lines2 = f.readlines()
    f.close()
    for i in range(len(lines1)):
        wrd = lines1[i].replace('\n', '')
        phons = lines2[i].replace('\n', '').replace(' ', '')
        seq = []
        j = 0
        while (j < len(phons)):
            if (phons[j] > 'Z'):
                # lowercase letter: a single-character consonant
                if (phons[j] == 'j'):
                    seq.append('JH')
                elif (phons[j] == 'h'):
                    seq.append('HH')
                else:
                    seq.append(phons[j].upper())
                j += 1
            else:
                # uppercase letters: a two-character phone
                p = phons[j:j+2]
                if (p == 'WH'):
                    seq.append('W')
                elif (p in ['TH', 'SH', 'HH', 'DH', 'CH', 'ZH', 'NG']):
                    seq.append(p)
                elif (p == 'AX'):
                    seq.append('AH0')
                else:
                    seq.append(p + '1')
                j += 2

        fw.write(wrd + ' ')
        for s in seq:
            fw.write(' ' + s)
        fw.write('\n')
    fw.close()
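
The phone conversion loop at the end of `prep_txt` can be read in isolation. The following is a minimal sketch that lifts the same branching into a standalone function; `l2s_to_arpabet` is a hypothetical name, and the input strings are made up for illustration rather than taken from the real `phoneme` tool.

```python
def l2s_to_arpabet(phons: str) -> list:
    """Map a letter-to-sound string to dictionary phones, as prep_txt does."""
    seq = []
    j = 0
    while j < len(phons):
        if phons[j] > 'Z':
            # Lowercase letter: a one-character consonant.
            if phons[j] == 'j':
                seq.append('JH')
            elif phons[j] == 'h':
                seq.append('HH')
            else:
                seq.append(phons[j].upper())
            j += 1
        else:
            # Uppercase letters: a two-character phone.
            p = phons[j:j + 2]
            if p == 'WH':
                seq.append('W')
            elif p in ('TH', 'SH', 'HH', 'DH', 'CH', 'ZH', 'NG'):
                seq.append(p)
            elif p == 'AX':
                seq.append('AH0')
            else:
                seq.append(p + '1')  # other phones get primary stress
            j += 2
    return seq


print(l2s_to_arpabet('hEHlOW'))
print(l2s_to_arpabet('AXbAWt'))
```

Every two-character phone outside the listed consonants is given stress marker `1`, except `AX`, which becomes the unstressed `AH0`.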

def prep_mlf(txt, tmpbase):

    with open(tmpbase + '.mlf', 'w') as fwid:
        fwid.write('#!MLF!#\n')
        fwid.write('"' + tmpbase + '.lab"\n')
        fwid.write('sp\n')
        wrds = txt.split()
        for wrd in wrds:
            fwid.write(wrd.upper() + '\n')
            fwid.write('sp\n')
        fwid.write('.\n')

def gen_res(tmpbase, outfile1, outfile2):
    # `txt` and `alignfile` were undefined in this scope; the words are read
    # from the .txt file and the message names the .aligned file instead
    with open(tmpbase + '.txt', 'r') as fid:
        words = fid.readline().strip().split()
    words.reverse()

    with open(tmpbase + '.aligned', 'r') as fid:
        lines = fid.readlines()
    i = 2
    times1 = []
    times2 = []
    while (i < len(lines)):
        if (len(lines[i].split()) >= 4) and (lines[i].split()[0] != lines[i].split()[1]):
            phn = lines[i].split()[2]
            pst = (int(lines[i].split()[0])/1000+125)/10000
            pen = (int(lines[i].split()[1])/1000+125)/10000
            times2.append([phn, pst, pen])
        if (len(lines[i].split()) == 5):
            if (lines[i].split()[0] != lines[i].split()[1]):
                wrd = lines[i].split()[-1].strip()
                st = (int(lines[i].split()[0])/1000+125)/10000
                j = i + 1
                while (lines[j] != '.\n') and (len(lines[j].split()) != 5):
                    j += 1
                en = (int(lines[j-1].split()[1])/1000+125)/10000
                times1.append([wrd, st, en])
        i += 1

    with open(outfile1, 'w') as fwid:
        for item in times1:
            if (item[0] == 'sp'):
                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' SIL\n')
            else:
                wrd = words.pop()
                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + wrd + '\n')
    if words:
        print('not matched::' + tmpbase + '.aligned')
        sys.exit(1)

    with open(outfile2, 'w') as fwid:
        for item in times2:
            fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + item[0] + '\n')

def alignment(wav_path, text_string):
    tmpbase = '/tmp/' + os.environ['USER'] + '_' + str(os.getpid())

    # NOTE: os.system returns the exit status and does not raise when the
    # command fails, so the except branches below only catch Python errors.

    # prepare wav and trs files
    try:
        os.system('sox ' + wav_path + ' -r 16000 ' + tmpbase + '.wav remix -')
    except:
        print('sox error!')
        return None

    # prepare clean_transcript file
    try:
        prep_txt(text_string, tmpbase, MODEL_DIR + '/dict')
    except:
        print('prep_txt error!')
        return None

    # prepare mlf file
    try:
        with open(tmpbase + '.txt', 'r') as fid:
            txt = fid.readline()
        prep_mlf(txt, tmpbase)
    except:
        print('prep_mlf error!')
        return None

    # prepare scp
    try:
        os.system(HCOPY + ' -C ' + MODEL_DIR + '/16000/config ' + tmpbase + '.wav' + ' ' + tmpbase + '.plp')
    except:
        print('HCopy error!')
        return None

    # run alignment
    try:
        os.system(HVITE + ' -a -m -t 10000.0 10000.0 100000.0 -I ' + tmpbase + '.mlf -H ' + MODEL_DIR + '/16000/macros -H ' + MODEL_DIR + '/16000/hmmdefs -i ' + tmpbase + '.aligned ' + tmpbase + '.dict ' + MODEL_DIR + '/monophones ' + tmpbase + '.plp 2>&1 > /dev/null')
    except:
        print('HVite error!')
        return None

    with open(tmpbase + '.txt', 'r') as fid:
        words = fid.readline().strip().split()
    words = txt.strip().split()
    words.reverse()

    with open(tmpbase + '.aligned', 'r') as fid:
        lines = fid.readlines()
    i = 2
    times2 = []
    word2phns = {}
    current_word = ''
    index = 0
    while (i < len(lines)):
        splited_line = lines[i].strip().split()
        if (len(splited_line) >= 4) and (splited_line[0] != splited_line[1]):
            phn = splited_line[2]
            pst = (int(splited_line[0])/1000+125)/10000
            pen = (int(splited_line[1])/1000+125)/10000
            times2.append([phn, pst, pen])
            # splited_line[-1]!='sp'
            if len(splited_line)==5:
                current_word = str(index)+'_'+splited_line[-1]
                word2phns[current_word] = phn
                index+=1
            elif len(splited_line)==4:
                word2phns[current_word] += ' '+phn
        i+=1
    return times2,word2phns
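
The final loop of `alignment` turns HVite output into phone timings and a word-to-phones map. The sketch below reproduces that parsing on an in-memory list of lines so it can be tried without htk; `parse_aligned` and `htk_time_to_seconds` are hypothetical names, and the sample lines are invented for illustration. HTK times are in units of 100 ns, so the expression used above is equivalent to `t / 1e7 + 0.0125` seconds.

```python
def htk_time_to_seconds(htk_time: str) -> float:
    """Same conversion as the aligner: 100 ns units to seconds, plus 0.0125."""
    return (int(htk_time) / 1000 + 125) / 10000


def parse_aligned(lines):
    """Collect phone timings and a word-to-phones map from aligned lines."""
    phone_times = []
    word2phns = {}
    current_word = ''
    index = 0
    for line in lines[2:]:  # skip the MLF header and the label-file line
        fields = line.strip().split()
        if len(fields) >= 4 and fields[0] != fields[1]:
            phn = fields[2]
            phone_times.append(
                [phn, htk_time_to_seconds(fields[0]), htk_time_to_seconds(fields[1])])
            if len(fields) == 5:
                # A fifth column carries the word that this phone starts.
                current_word = str(index) + '_' + fields[-1]
                word2phns[current_word] = phn
                index += 1
            elif len(fields) == 4:
                word2phns[current_word] += ' ' + phn
    return phone_times, word2phns


sample_lines = [
    '#!MLF!#\n',
    '"/tmp/example.rec"\n',
    '0 1000000 sp -1.0 sp\n',
    '1000000 2000000 HH -2.0 HELLO\n',
    '2000000 3000000 AH0 -2.0\n',
    '3000000 3000000 sp -0.1 sp\n',  # zero duration, skipped
    '.\n',
]
phone_times, word2phns = parse_aligned(sample_lines)
print(phone_times)
print(word2phns)
```

Keys are prefixed with a running index so that repeated words in one sentence do not overwrite each other.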
#!/usr/bin/env python
""" Usage:
      align_mandarin.py wavfile trsfile outwordfile outphonefile
"""
import os
import sys
from tqdm import tqdm
import multiprocessing as mp

MODEL_DIR = 'tools/aligner/mandarin'
HVITE = 'tools/htk/HTKTools/HVite'
HCOPY = 'tools/htk/HTKTools/HCopy'


def prep_txt(line, tmpbase, dictfile):

    words = []
    line = line.strip()
    for pun in [',', '.', ':', ';', '!', '?', '"', '(', ')', '--', '---', u',', u'。', u':', u';', u'!', u'?', u'(', u')']:
        line = line.replace(pun, ' ')
    for wrd in line.split():
        if (wrd[-1] == '-'):
            wrd = wrd[:-1]
        # guard against a token that became empty after stripping '-'
        if wrd and (wrd[0] == "'"):
            wrd = wrd[1:]
        if wrd:
            words.append(wrd)

    ds = set([])
    with open(dictfile, 'r') as fid:
        for line in fid:
            ds.add(line.split()[0])

    unk_words = set([])
    with open(tmpbase + '.txt', 'w') as fwid:
        for wrd in words:
            if (wrd not in ds):
                unk_words.add(wrd)
            fwid.write(wrd + ' ')
        fwid.write('\n')
    return unk_words

def prep_mlf(txt, tmpbase):

    with open(tmpbase + '.mlf', 'w') as fwid:
        fwid.write('#!MLF!#\n')
        fwid.write('"' + tmpbase + '.lab"\n')
        fwid.write('sp\n')
        wrds = txt.split()
        for wrd in wrds:
            fwid.write(wrd.upper() + '\n')
            fwid.write('sp\n')
        fwid.write('.\n')

def gen_res(tmpbase, outfile1, outfile2):
    # `txt` and `alignfile` were undefined in this scope; the words are read
    # from the .txt file and the message names the .aligned file instead
    with open(tmpbase + '.txt', 'r') as fid:
        words = fid.readline().strip().split()
    words.reverse()

    with open(tmpbase + '.aligned', 'r') as fid:
        lines = fid.readlines()
    i = 2
    times1 = []
    times2 = []
    while (i < len(lines)):
        if (len(lines[i].split()) >= 4) and (lines[i].split()[0] != lines[i].split()[1]):
            phn = lines[i].split()[2]
            pst = (int(lines[i].split()[0])/1000+125)/10000
            pen = (int(lines[i].split()[1])/1000+125)/10000
            times2.append([phn, pst, pen])
        if (len(lines[i].split()) == 5):
            if (lines[i].split()[0] != lines[i].split()[1]):
                wrd = lines[i].split()[-1].strip()
                st = (int(lines[i].split()[0])/1000+125)/10000
                j = i + 1
                while (lines[j] != '.\n') and (len(lines[j].split()) != 5):
                    j += 1
                en = (int(lines[j-1].split()[1])/1000+125)/10000
                times1.append([wrd, st, en])
        i += 1

    with open(outfile1, 'w') as fwid:
        for item in times1:
            if (item[0] == 'sp'):
                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' SIL\n')
            else:
                wrd = words.pop()
                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + wrd + '\n')
    if words:
        print('not matched::' + tmpbase + '.aligned')
        sys.exit(1)

    with open(outfile2, 'w') as fwid:
        for item in times2:
            fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + item[0] + '\n')

def alignment_zh(wav_path, text_string):
    tmpbase = '/tmp/' + os.environ['USER'] + '_' + str(os.getpid())

    # NOTE: os.system returns the exit status and does not raise when the
    # command fails, so the except branches below only catch Python errors.

    # prepare wav and trs files
    try:
        os.system('sox ' + wav_path + ' -r 16000 -b 16 ' + tmpbase + '.wav remix -')
    except:
        print('sox error!')
        return None

    # prepare clean_transcript file
    try:
        unk_words = prep_txt(text_string, tmpbase, MODEL_DIR + '/dict')
        if unk_words:
            print('Error! Please add the following words to dictionary:')
            for unk in unk_words:
                print("Illegal words: ", unk)
    except:
        print('prep_txt error!')
        return None

    # prepare mlf file
    try:
        with open(tmpbase + '.txt', 'r') as fid:
            txt = fid.readline()
        prep_mlf(txt, tmpbase)
    except:
        print('prep_mlf error!')
        return None

    # prepare scp
    try:
        os.system(HCOPY + ' -C ' + MODEL_DIR + '/16000/config ' + tmpbase + '.wav' + ' ' + tmpbase + '.plp')
    except:
        print('HCopy error!')
        return None

    # run alignment
    try:
        os.system(HVITE + ' -a -m -t 10000.0 10000.0 100000.0 -I ' + tmpbase + '.mlf -H ' + MODEL_DIR + '/16000/macros -H ' + MODEL_DIR + '/16000/hmmdefs -i ' + tmpbase + '.aligned ' + MODEL_DIR + '/dict ' + MODEL_DIR + '/monophones ' + tmpbase + '.plp 2>&1 > /dev/null')
    except:
        print('HVite error!')
        return None

    with open(tmpbase + '.txt', 'r') as fid:
        words = fid.readline().strip().split()
    words = txt.strip().split()
    words.reverse()

    with open(tmpbase + '.aligned', 'r') as fid:
        lines = fid.readlines()
    i = 2
    times2 = []
    word2phns = {}
    current_word = ''
    index = 0
    while (i < len(lines)):
        splited_line = lines[i].strip().split()
        if (len(splited_line) >= 4) and (splited_line[0] != splited_line[1]):
            phn = splited_line[2]
            pst = (int(splited_line[0])/1000+125)/10000
            pen = (int(splited_line[1])/1000+125)/10000
            times2.append([phn, pst, pen])
            # splited_line[-1]!='sp'
            if len(splited_line)==5:
                current_word = str(index)+'_'+splited_line[-1]
                word2phns[current_word] = phn
                index+=1
            elif len(splited_line)==4:
                word2phns[current_word] += ' '+phn
        i+=1
    return times2,word2phns
import paddle
import numpy as np
import math


def pad_list(xs, pad_value):
    """Perform padding for the list of tensors.

    Args:
        xs (List): List of Tensors [(T_1, `*`), (T_2, `*`), ..., (T_B, `*`)].
        pad_value (float): Value for padding.

    Returns:
        Tensor: Padded tensor (B, Tmax, `*`).

    Examples:
        >>> x = [torch.ones(4), torch.ones(2), torch.ones(1)]
        >>> x
        [tensor([1., 1., 1., 1.]), tensor([1., 1.]), tensor([1.])]
        >>> pad_list(x, 0)
        tensor([[1., 1., 1., 1.],
                [1., 1., 0., 0.],
                [1., 0., 0., 0.]])

    """
    n_batch = len(xs)
    max_len = max(paddle.shape(x)[0] for x in xs)
    pad = paddle.full((n_batch, max_len), pad_value, dtype = xs[0].dtype)

    for i in range(n_batch):
        pad[i, : paddle.shape(xs[i])[0]] = xs[i]

    return pad
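
The padding rule in `pad_list` does not depend on Paddle. As an illustration only, here is a minimal sketch with plain Python lists; `pad_list_py` is a hypothetical name and handles one-dimensional sequences only.

```python
def pad_list_py(xs, pad_value):
    """Right-pad every sequence in xs to the length of the longest one."""
    max_len = max(len(x) for x in xs)
    return [list(x) + [pad_value] * (max_len - len(x)) for x in xs]


padded = pad_list_py([[1, 1, 1, 1], [1, 1], [1]], 0)
print(padded)
```

The result has one row per input sequence, matching the `(B, Tmax)` layout described in the docstring above.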

def pad_to_longformer_att_window(text, max_len, max_tlen, attention_window):
    # `remainder` avoids shadowing the builtin `round`
    remainder = max_len % attention_window
    if remainder != 0:
        max_tlen += (attention_window - remainder)
        n_batch = paddle.shape(text)[0]
        text_pad = paddle.zeros((n_batch, max_tlen, *paddle.shape(text[0])[1:]), dtype=text.dtype)
        for i in range(n_batch):
            text_pad[i, : paddle.shape(text[i])[0]] = text[i]
    else:
        text_pad = text[:, : max_tlen]
    return text_pad, max_tlen
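
The length arithmetic in `pad_to_longformer_att_window` can be checked on its own. The sketch below assumes `max_len` is the combined sequence length and `max_tlen` the text length, which is a reading of the argument names rather than something the source states; `padded_text_length` is a hypothetical helper.

```python
def padded_text_length(max_len: int, max_tlen: int, attention_window: int) -> int:
    """Grow max_tlen so that max_len plus the growth is a window multiple."""
    remainder = max_len % attention_window
    if remainder != 0:
        max_tlen += attention_window - remainder
    return max_tlen


# 10 is 2 past a multiple of 8, so the text length grows by 6.
print(padded_text_length(10, 4, 8))
# 16 is already a multiple of 8, so nothing changes.
print(padded_text_length(16, 4, 8))
```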

def make_pad_mask(lengths, xs=None, length_dim=-1):
    """Make mask tensor containing indices of padded part.
Args:
lengths (LongTensor or List): Batch of lengths (B,).
xs (Tensor, optional): The reference tensor.
If set, masks will be the same shape as this tensor.
length_dim (int, optional): Dimension indicator of the above tensor.
See the example.
Returns:
Tensor: Mask tensor containing indices of padded part.
dtype=torch.uint8 in PyTorch 1.2-
dtype=torch.bool in PyTorch 1.2+ (including 1.2)
Examples:
With only lengths.
>>> lengths = [5, 3, 2]
>>> make_pad_mask(lengths)
masks = [[0, 0, 0, 0 ,0],
[0, 0, 0, 1, 1],
[0, 0, 1, 1, 1]]
With the reference tensor.
>>> xs = torch.zeros((3, 2, 4))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 1],
[0, 0, 0, 1]],
[[0, 0, 1, 1],
[0, 0, 1, 1]]], dtype=torch.uint8)
>>> xs = torch.zeros((3, 2, 6))
>>> make_pad_mask(lengths, xs)
tensor([[[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1]],
[[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1]],
[[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1]]], dtype=torch.uint8)
With the reference tensor and dimension indicator.
>>> xs = torch.zeros((3, 6, 6))
>>> make_pad_mask(lengths, xs, 1)
tensor([[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1]],
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1]]], dtype=torch.uint8)
>>> make_pad_mask(lengths, xs, 2)
tensor([[[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1]],
[[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 1, 1, 1]],
[[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1]]], dtype=torch.uint8)
"""
    if length_dim == 0:
        raise ValueError("length_dim cannot be 0: {}".format(length_dim))

    if not isinstance(lengths, list):
        lengths = list(lengths)
    # print('lengths', lengths)
    bs = int(len(lengths))
    if xs is None:
        maxlen = int(max(lengths))
    else:
        maxlen = paddle.shape(xs)[length_dim]

    seq_range = paddle.arange(0, maxlen, dtype=paddle.int64)
    seq_range_expand = paddle.expand(paddle.unsqueeze(seq_range, 0), (bs, maxlen))
    seq_length_expand = paddle.unsqueeze(paddle.to_tensor(lengths), -1)
    # print('seq_length_expand', paddle.shape(seq_length_expand))
    # print('seq_range_expand', paddle.shape(seq_range_expand))
    mask = seq_range_expand >= seq_length_expand

    if xs is not None:
        assert paddle.shape(xs)[0] == bs, (paddle.shape(xs)[0], bs)

        if length_dim < 0:
            length_dim = len(paddle.shape(xs)) + length_dim
        # ind = (:, None, ..., None, :, , None, ..., None)
        ind = tuple(
            slice(None) if i in (0, length_dim) else None for i in range(len(paddle.shape(xs)))
        )
        # print('0:', paddle.shape(mask))
        # print('1:', paddle.shape(mask[ind]))
        # print('2:', paddle.shape(xs))
        mask = paddle.expand(mask[ind], paddle.shape(xs))
    return mask

def make_non_pad_mask(lengths, xs=None, length_dim=-1):
    """Make mask tensor containing indices of non-padded part.
Args:
lengths (LongTensor or List): Batch of lengths (B,).
xs (Tensor, optional): The reference tensor.
If set, masks will be the same shape as this tensor.
length_dim (int, optional): Dimension indicator of the above tensor.
See the example.
Returns:
ByteTensor: mask tensor containing indices of padded part.
dtype=torch.uint8 in PyTorch 1.2-
dtype=torch.bool in PyTorch 1.2+ (including 1.2)
Examples:
With only lengths.
>>> lengths = [5, 3, 2]
>>> make_non_pad_mask(lengths)
masks = [[1, 1, 1, 1 ,1],
[1, 1, 1, 0, 0],
[1, 1, 0, 0, 0]]
With the reference tensor.
>>> xs = torch.zeros((3, 2, 4))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 0],
[1, 1, 1, 0]],
[[1, 1, 0, 0],
[1, 1, 0, 0]]], dtype=torch.uint8)
>>> xs = torch.zeros((3, 2, 6))
>>> make_non_pad_mask(lengths, xs)
tensor([[[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0]],
[[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0]],
[[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0]]], dtype=torch.uint8)
With the reference tensor and dimension indicator.
>>> xs = torch.zeros((3, 6, 6))
>>> make_non_pad_mask(lengths, xs, 1)
tensor([[[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0]],
[[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]],
[[1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]]], dtype=torch.uint8)
>>> make_non_pad_mask(lengths, xs, 2)
tensor([[[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 0]],
[[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 0, 0, 0]],
[[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 0]]], dtype=torch.uint8)
"""
    return ~make_pad_mask(lengths, xs, length_dim)
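
For the lengths-only case, the two mask functions above reduce to a position-versus-length comparison. The following plain-Python sketch illustrates that rule; the `_py` names are hypothetical and the reference-tensor (`xs`) path is not covered.

```python
def make_pad_mask_py(lengths, maxlen=None):
    """True marks a padded position: index >= sequence length."""
    if maxlen is None:
        maxlen = max(lengths)
    return [[pos >= length for pos in range(maxlen)] for length in lengths]


def make_non_pad_mask_py(lengths, maxlen=None):
    """Element-wise negation of the pad mask."""
    return [[not flag for flag in row] for row in make_pad_mask_py(lengths, maxlen)]


pad_mask = make_pad_mask_py([5, 3, 2])
non_pad_mask = make_non_pad_mask_py([5, 3, 2])
print([[int(flag) for flag in row] for row in pad_mask])
```

The printed rows correspond to the first example in the `make_pad_mask` docstring.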

def phones_masking(xs_pad, src_mask, align_start, align_end, align_start_lengths, mlm_prob, mean_phn_span, span_boundary=None):
    bz, sent_len, _ = paddle.shape(xs_pad)
    mask_num_lower = math.ceil(sent_len * mlm_prob)
    masked_position = np.zeros((bz, sent_len))
    y_masks = None
    # y_masks = torch.ones(bz,sent_len,sent_len,device=xs_pad.device,dtype=xs_pad.dtype)
    # tril_masks = torch.tril(y_masks)
    if mlm_prob == 1.0:
        masked_position += 1
        # y_masks = tril_masks
    elif mean_phn_span == 0:
        # only speech
        length = sent_len
        mean_phn_span = min(length*mlm_prob//3, 50)
        masked_phn_indices = random_spans_noise_mask(length,mlm_prob, mean_phn_span).nonzero()
        masked_position[:,masked_phn_indices]=1
    else:
        for idx in range(bz):
            if span_boundary is not None:
                for s,e in zip(span_boundary[idx][::2], span_boundary[idx][1::2]):
                    masked_position[idx, s:e] = 1

                    # y_masks[idx, :, s:e] = tril_masks[idx, :, s:e]
                    # y_masks[idx, e:, s:e ] = 0
            else:
                length = align_start_lengths[idx].item()
                if length<2:
                    continue
                masked_phn_indices = random_spans_noise_mask(length,mlm_prob, mean_phn_span).nonzero()
                masked_start = align_start[idx][masked_phn_indices].tolist()
                masked_end = align_end[idx][masked_phn_indices].tolist()
                for s,e in zip(masked_start, masked_end):
                    masked_position[idx, s:e] = 1
                    # y_masks[idx, :, s:e] = tril_masks[idx, :, s:e]
                    # y_masks[idx, e:, s:e ] = 0
    non_eos_mask = np.array(paddle.reshape(src_mask, paddle.shape(xs_pad)[:2]).float().cpu())
    masked_position = masked_position * non_eos_mask
    # y_masks = src_mask & y_masks.bool()

    return paddle.cast(paddle.to_tensor(masked_position), paddle.bool), y_masks
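
When `span_boundary` is given, `phones_masking` reads it as a flat list of alternating start and end frame indices and masks each `[start, end)` range. A minimal sketch of that branch for a single utterance, using plain lists; `mask_from_span_boundary` is a hypothetical name.

```python
def mask_from_span_boundary(sent_len, span_boundary):
    """Return a 0/1 list with ones inside every [start, end) span."""
    mask = [0] * sent_len
    # Even positions hold starts, odd positions hold the matching ends.
    for start, end in zip(span_boundary[::2], span_boundary[1::2]):
        for pos in range(start, min(end, sent_len)):
            mask[pos] = 1
    return mask


span_mask = mask_from_span_boundary(8, [1, 3, 5, 6])
print(span_mask)
```

In the function above the resulting mask is additionally multiplied by the non-padding mask, so padded frames are never marked.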

def get_segment_pos(speech_pad, text_pad, align_start, align_end, align_start_lengths,sega_emb):
    # Paddle exposes `Tensor.shape`; `size` is an element count, not a method
    bz, speech_len, _ = speech_pad.shape
    _, text_len = text_pad.shape

    # text_segment_pos = paddle.zeros_like(text_pad)
    # speech_segment_pos = paddle.zeros((bz, speech_len),dtype=text_pad.dtype)

    text_segment_pos = np.zeros((bz, text_len)).astype('int64')
    speech_segment_pos = np.zeros((bz, speech_len)).astype('int64')

    if not sega_emb:
        text_segment_pos = paddle.to_tensor(text_segment_pos)
        speech_segment_pos = paddle.to_tensor(speech_segment_pos)
        return speech_segment_pos, text_segment_pos
    for idx in range(bz):
        align_length = align_start_lengths[idx].item()
        for j in range(align_length):
            s,e = align_start[idx][j].item(), align_end[idx][j].item()
            speech_segment_pos[idx][s:e] = j+1
            text_segment_pos[idx][j] = j+1

    text_segment_pos = paddle.to_tensor(text_segment_pos)
    speech_segment_pos = paddle.to_tensor(speech_segment_pos)

    return speech_segment_pos, text_segment_pos
\ No newline at end of file
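
`get_segment_pos` gives phone `j` the segment id `j+1` on the text side and writes the same id over the speech frames aligned to that phone, leaving 0 for unaligned positions. The sketch below shows this for one utterance with plain lists; `segment_positions` is a hypothetical name and the alignment values are invented.

```python
def segment_positions(speech_len, text_len, align_start, align_end):
    """Build per-frame and per-phone segment ids from phone alignments."""
    speech_pos = [0] * speech_len
    text_pos = [0] * text_len
    for j, (start, end) in enumerate(zip(align_start, align_end)):
        for frame in range(start, min(end, speech_len)):
            speech_pos[frame] = j + 1  # frames of phone j share id j+1
        text_pos[j] = j + 1
    return speech_pos, text_pos


speech_pos, text_pos = segment_positions(6, 3, [0, 2], [2, 5])
print(speech_pos, text_pos)
```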
This diff is collapsed.
@@ -63,11 +63,7 @@ class LogMelFBank():
window=self.window,
center=self.center,
pad_mode=self.pad_mode)
-f = open('/mnt/home/xiaoran/projects/wave_summit/espnet_dual_mask/tmp_var_stft.out.1', 'w')
-print('stft shape is', D.size())
-# for item in [round(item, 6) for item in output["speech"][0].tolist()]:
-# f.write(str(item)+'\n')
-# f.close()
return D
def _spectrogram(self, wav):
......
ou3 ou3
a3 a3
eng4 eng4
u1 u1
vn2 vn2
uang3 uang3
ang3 ang3
ua1 ua1
ou1 ou1
in3 in3
uai4 uai4
van1 van1
en2 en2
ia4 ia4
uai2 uai2
iang4 iang4
ai3 ai3
sp sp
in1 in1
uai3 uai3
ve1 ve1
ou4 ou4
d d
ang2 ang2
iang3 iang3
o1 o1
iao3 iao3
an1 an1
en5 en5
ong3 ong3
e5 e5
e3 e3
van3 van3
i3 i3
i2 i2
uo4 uo4
i1 i1
in2 in2
v1 v1
uang4 uang4
en3 en3
ian5 ian5
ie3 ie3
o2 o2
x x
iang2 iang2
ei1 ei1
uang2 uang2
t t
ao4 ao4
ch ch
o3 o3
en1 en1
ie1 ie1
uan3 uan3
uo1 uo1
iang5 iang5
iong1 iong1
l l
a5 a5
an4 an4
u2 u2
ei3 ei3
uo3 uo3
ai2 ai2
v3 v3
k k
uan4 uan4
ian2 ian2
ei2 ei2
sh sh
g g
ong2 ong2
ing1 ing1
vn3 vn3
r r
ong1 ong1
ao1 ao1
ua3 ua3
ia1 ia1
u3 u3
s s
b b
e2 e2
ua4 ua4
iang1 iang1
ie4 ie4
ou5 ou5
ing4 ing4
ai1 ai1
iong4 iong4
uo5 uo5
ei5 ei5
ueng1 ueng1
ou2 ou2
e1 e1
f f
en4 en4
v2 v2
iao2 iao2
ie2 ie2
van2 van2
eng1 eng1
ai4 ai4
uo2 uo2
iao1 iao1
in4 in4
er4 er4
e4 e4
uan1 uan1
ia3 ia3
ao2 ao2
u4 u4
ei4 ei4
eng3 eng3
z z
j j
ve3 ve3
n n
an3 an3
uan2 uan2
o5 o5
ve2 ve2
ang4 ang4
er2 er2
ia5 ia5
ian4 ian4
er5 er5
ia2 ia2
eng2 eng2
ie5 ie5
ang1 ang1
er3 er3
ian1 ian1
<unk> <unk>
c c
v4 v4
iao4 iao4
a4 a4
m m
a2 a2
ong4 ong4
q q
uang1 uang1
an2 an2
ua2 ua2
zh zh
ing2 ing2
ve4 ve4
van4 van4
vn4 vn4
iong3 iong3
i4 i4
ian3 ian3
ing3 ing3
p p
iong2 iong2
ao3 ao3
vn1 vn1
uai1 uai1
a1 a1
o4 o4
h h
uenr4 un4 ee er5
iaor3 iao3 ee er2
iour4 ii iu4 ee er2
iangr4 ii iang4 ee er5
iou3 ii iu3
sil sp
iour1 iu1 ee er5
vn5 vn1
ir1 i1 ee er2
vanr1 van1 ee er2
vanr2 van2 ee er5
air3 ai3 ee er2
uangr4 uu uang1
enr1 en1 ee er2
iour3 ii iu3 ee er5
uenr1 un1 ee er5
uenr3 un3 ee er5
or2 o2 ee er2
anr3 an3 ee er5
ai5 ai4
iaor2 iao2 ee er2
uanr3 uan3 ee er5
uanr2 uu uan4 ee er2
uen1 un1
ua5 uu ua2
uen3 uu un3
iii4 ix4
uor1 uo1 ee er5
our2 ou5 ee er2
uei1 uu ui1
vr3 v3 ee er5
uenr2 un2 ee er5
uanr5 uu uan2 ee er5
iiir4 ix4 ee er5
iiir1 ix1 ee er5
ur2 u3 ee er5
eng5 eng1
ingr1 ii ing1 ee er2
ii4 iy4
ve5 vv ve1
? <unk>
ii1 iy1
ao5 ao3
v5 vv v2
ing5 ing2
i5 i1
iou5 ii iu3
uen4 un4
our4 ou4 ee er5
io3 ii iu3
ar4 a4 ee er5
ingr2 ing2 ee er5
ingr4 ing4 ee er5
ir3 e5 ee er5
iaor4 iao4 ee er5
ii2 ix2
uanr4 uan4 ee er5
enr5 en4 ee er2
ianr3 ian3 ee er5
uei5 uu ui2
ianr4 ian4 ee er2
iar4 ia4 ee er2
uair4 uai1 ee er2
enr2 en2 ee er5
iii1 ix1
ver3 ve3 ee er2
ianr5 ian3 ee er5
ong5 ong1
air2 ai2 ee er5
angr4 ang4 ee er5
iii5 ix2
ang5 ang1
iou1 iu1
uar4 ua4 ee er5
ur4 u4 ee er5
iou4 iu4
iou2 ii iu2
in5 in1
uor2 uo2 ee er5
uar2 ua2 ee er5
uei2 uu ui2
<pad> <unk>
anr1 an1 ee er5
ar5 a1 ee er5
uen2 un2
eir4 ei4 ee er2
ingr3 ii ing3 ee er5
aor4 ao4 ee er5
enr4 en4 ee er5
iao5 ii iao2
iii2 ix2
er1 e1 ee er5
iaor1 iao1 ee er5
ueir1 ui1 ee er2
inr4 in4 ee er5
ueir2 ui4 ee er5
uan5 ai2 ee er5
ir4 i4 ee er2
ur1 u1 ee er5
iour2 iu1 ee er2
ar2 a2 ee er5
an5 an2
iii3 ix3
ver4 vv ve4 ee er2
。 <unk>
aor3 ao3 ee er5
iong5 ii iong4
u5 u4
air4 ai4 ee er5
ii3 iy3
our5 ou4 ee er5
inr1 in1 ee er5
uor3 uo3 ee er5
van5 van4
ur5 u4 ee er2
aor5 ao4 ee er5
engr4 eng4 ee er2
ueir4 ui4 ee er5
<eos> <unk>
angr2 ang2 ee er2
ii5 iy5
vnr2 vn2 ee er5
enr3 en3 ee er5
uar1 ua1 ee er2
vanr4 van4 ee er5
, <unk>
uor5 uo3 ee er5
uei4 ui4
aor1 ao1 ee er5
uen5 uu un4
anr4 an4 ee er5
iar1 ia1 ee er5
vanr3 van3 ee er5
uei3 uu ui3
! <unk>
io1 ii uo5
spl <unk>
ar3 a3 ee er5
our3 ou3 ee er5
ueir3 ui3 ee er5
ianr2 ian3 ee er5
ueng4 uu un4
ianr1 ian1 ee er5
# Speech synthesis, en --> zh
-# Synthesize '今天天气很好' from the speech of Prompt_003_new: This was not the show for me.
+# Synthesize '今天天气很好' using Prompt_003_new as the prompt speech: This was not the show for me.
# Note: new_str must consist of Chinese characters; otherwise preprocessing keeps only the Chinese characters and the preprocessed Chinese text is synthesized.
python inference.py \
    --task_name cross-lingual_clone \
-    --model_name paddle_checkpoint_ench \
+    --model_name paddle_checkpoint_dual_mask_enzh \
    --uid Prompt_003_new \
-    --new_str '今天天气很好' \
+    --new_str '今天天气很好.' \
    --prefix ./prompt/dev/ \
    --source_language english \
    --target_language chinese \
-    --output_name pred_zh.wav \
+    --output_name pred_clone.wav \
    --use_pt_vocoder False \
    --voc pwgan_aishell3 \
    --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
......
# English-only speech synthesis
-# Synthesize 'I enjoy my life.' from the speech of p299_096: This was not the show for me.
+# Example: synthesize 'I enjoy my life.' using the speech of p299_096 as the prompt speech: This was not the show for me.
python inference.py \
    --task_name synthesize \
@@ -9,7 +9,8 @@ python inference.py \
    --prefix ./prompt/dev/ \
    --source_language english \
    --target_language english \
-    --output_name pred.wav \
+    --output_name pred_gen.wav \
    --use_pt_vocoder True \
    --voc pwgan_aishell3 \
    --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
......
# English-only speech editing
-# Edit the original speech of p243_new: For that reason cover should not be given. into the speech for 'for that reason cover is impossible to be given.'
+# Example: edit the original speech of p243_new: For that reason cover should not be given. into the speech for 'for that reason cover is impossible to be given.'
# NOTE: the speech editing task currently supports replacing or inserting text at one position in the sentence
python inference.py \
    --task_name edit \
@@ -9,7 +10,8 @@ python inference.py \
    --prefix ./prompt/dev/ \
    --source_language english \
    --target_language english \
-    --output_name pred.wav \
+    --output_name pred_edit.wav \
    --use_pt_vocoder True \
    --voc pwgan_aishell3 \
    --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
    --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
......
@@ -73,8 +73,8 @@ def parse_args():
    parser.add_argument(
        "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu.")
-    parser.add_argument("--test_metadata", type=str, help="test metadata.")
+    # parser.add_argument("--test_metadata", type=str, help="test metadata.")
-    parser.add_argument("--output_dir", type=str, help="output dir.")
+    # parser.add_argument("--output_dir", type=str, help="output dir.")
    parser.add_argument("--model_name", type=str, help="model name")
    parser.add_argument("--uid", type=str, help="uid")
@@ -86,7 +86,7 @@ def parse_args():
    parser.add_argument("--target_language", type=str, help="target language")
    parser.add_argument("--output_name", type=str, help="output name")
    parser.add_argument("--task_name", type=str, help="task name")
    parser.add_argument("--use_pt_vocoder", default=True, help="use pytorch version vocoder or not. [Note: only in english condition.]")
    # pre
    args = parser.parse_args()
......
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 907.02947845804988
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
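
HTK expresses `SOURCERATE`, `TARGETRATE` and `WINDOWSIZE` in units of 100 ns, so `SOURCERATE` is the sample period rather than the sampling frequency. The sketch below shows the arithmetic as an illustration; the helper names are hypothetical. Under that reading, the `SOURCERATE` values that appear in this commit's config files (907.029..., 625.0 and 1250) correspond to 11025 Hz, 16000 Hz and 8000 Hz audio.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def source_rate_for(sample_rate_hz: int) -> float:
    """Sample period in HTK units for a given sampling frequency."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def htk_units_to_ms(value: float) -> float:
    """Convert an HTK duration (100 ns units) to milliseconds."""
    return value / 10_000


for rate in (11025, 16000, 8000):
    print(rate, source_rate_for(rate))

# TARGETRATE = 100000.0 and WINDOWSIZE = 250000.0 in milliseconds
print(htk_units_to_ms(100000.0), htk_units_to_ms(250000.0))
```

With these settings the front end uses a 25 ms analysis window and a 10 ms frame shift.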
This diff is collapsed.
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625.0
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 1250
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
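The three coding-parameter files above differ only in `SOURCERATE`. HTK expresses times in units of 100 ns, so `SOURCERATE` is the sample period of the input audio, and the values 907.029..., 625.0 and 1250 correspond to 11025 Hz, 16000 Hz and 8000 Hz recordings. The following sketch works through that arithmetic; the function names are illustrative and do not come from the repository.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns


def source_rate_for(sample_rate_hz):
    """Return the HTK SOURCERATE (sample period in 100 ns units)."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def htk_units_to_ms(value):
    """Convert an HTK duration (100 ns units) to milliseconds."""
    return value / 10_000


if __name__ == "__main__":
    for hz in (11025, 16000, 8000):
        print(hz, "Hz ->", source_rate_for(hz))
    # TARGETRATE and WINDOWSIZE from the configs above, in milliseconds.
    print("frame shift (ms):", htk_units_to_ms(100000.0))
    print("window size (ms):", htk_units_to_ms(250000.0))
```

Read this way, `TARGETRATE = 100000.0` is a 10 ms frame shift and `WINDOWSIZE = 250000.0` is a 25 ms analysis window, the same for all three sample rates.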
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.320759e-03 3.364773e-03 2.644561e-03 4.602237e-03 4.153211e-03 3.535625e-03 3.436818e-03 3.055576e-03 2.946933e-03 2.210875e-03 1.983593e-03 1.391166e-03 5.161191e-03 1.195636e-04 1.395769e-04 1.410736e-04 2.242859e-04 2.118236e-04 2.178820e-04 2.484023e-04 2.270718e-04 2.155360e-04 1.773744e-04 1.613469e-04 1.159174e-04 1.315518e-04 1.986226e-05 2.259619e-05 2.456991e-05 3.887276e-05 3.827550e-05 4.066243e-05 4.655687e-05 4.391165e-05 4.144727e-05 3.483306e-05 3.158762e-05 2.273686e-05 1.879711e-05
EH2
K
S
L
AH0
M
EY1
SH
N
P
OY2
T
OW1
Z
W
D
AH1
B
EH1
V
IH1
AA1
R
AY1
ER0
AE1
AE2
AO1
NG
G
IH0
TH
IY2
F
DH
IY1
HH
UH1
IY0
OY1
OW2
CH
UW1
IH2
EH0
AO2
AA0
AA2
OW0
EY0
AE0
AW2
AW1
EY2
UW0
AH2
UW2
AO0
JH
Y
ZH
AY2
ER1
UH2
AY0
ER2
OY0
UH0
AW0
br
cg
lg
ls
ns
sil
sp
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 907.02947845804988
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625.0
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 1250
TARGETKIND = PLP_0_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
LPCORDER = 12
USEPOWER = T
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
~v "varFloor1"
<VARIANCE> 39
2.320759e-03 3.364773e-03 2.644561e-03 4.602237e-03 4.153211e-03 3.535625e-03 3.436818e-03 3.055576e-03 2.946933e-03 2.210875e-03 1.983593e-03 1.391166e-03 5.161191e-03 1.195636e-04 1.395769e-04 1.410736e-04 2.242859e-04 2.118236e-04 2.178820e-04 2.484023e-04 2.270718e-04 2.155360e-04 1.773744e-04 1.613469e-04 1.159174e-04 1.315518e-04 1.986226e-05 2.259619e-05 2.456991e-05 3.887276e-05 3.827550e-05 4.066243e-05 4.655687e-05 4.391165e-05 4.144727e-05 3.483306e-05 3.158762e-05 2.273686e-05 1.879711e-05
EH2
K
S
L
AH0
M
EY1
SH
N
P
OY2
T
OW1
Z
W
D
AH1
B
EH1
V
IH1
AA1
R
AY1
ER0
AE1
AE2
AO1
NG
G
IH0
TH
IY2
F
DH
IY1
HH
UH1
IY0
OY1
OW2
CH
UW1
IH2
EH0
AO2
AA0
AA2
OW0
EY0
AE0
AW2
AW1
EY2
UW0
AH2
UW2
AO0
JH
Y
ZH
AY2
ER1
UH2
AY0
ER2
OY0
UH0
AW0
br
cg
lg
ls
ns
sil
sp
phoneme: phoneme.o english.o parse.o saynum.o spellword.o
head 1.1;
access;
symbols;
locks
steve:1.1; strict;
comment @ * @;
1.1
date 2009.03.13.20.13.23; author steve; state Exp;
branches;
next ;
desc
@parse.c
@
1.1
log
@Initial revision
@
text
@#include <stdio.h>
#include <ctype.h>
#define MAX_LENGTH 128
static FILE *In_file;
static FILE *Out_file;
static int Char, Char1, Char2, Char3;
/*
** main(argc, argv)
** int argc;
** char *argv[];
**
** This is the main program. It takes up to two file names (input
** and output) and translates the input file to phoneme codes
** (see english.c) on the output file.
*/
main(argc, argv)
int argc;
char *argv[];
{
if (argc > 3)
{
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit();
}
if (argc == 1)
{
fputs("Enter english text:\n", stderr);
}
if (argc > 1)
{
In_file = fopen(argv[1], "r");
if (In_file == 0)
{
fputs("Error: Cannot open input file.\n", stderr);
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit();
}
}
else
In_file = stdin;
if (argc > 2)
{
Out_file = fopen(argv[2], "w");
if (Out_file == 0)
{
fputs("Error: Cannot create output file.\n", stderr);
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit();
}
}
else
Out_file = stdout;
xlate_file();
}
outstring(string)
char *string;
{
while (*string != '\0')
outchar(*string++);
}
outchar(chr)
int chr;
{
fputc(chr,Out_file);
}
int makeupper(character)
int character;
{
if (islower(character))
return toupper(character);
else
return character;
}
new_char()
{
/*
If the cache is full of newline, time to prime the look-ahead
again. If an EOF is found, fill the remainder of the queue with
EOF's.
*/
if (Char == '\n' && Char1 == '\n' && Char2 == '\n' && Char3 == '\n')
{ /* prime the pump again */
Char = getc(In_file);
if (Char == EOF)
{
Char1 = EOF;
Char2 = EOF;
Char3 = EOF;
return Char;
}
if (Char == '\n')
return Char;
Char1 = getc(In_file);
if (Char1 == EOF)
{
Char2 = EOF;
Char3 = EOF;
return Char;
}
if (Char1 == '\n')
return Char;
Char2 = getc(In_file);
if (Char2 == EOF)
{
Char3 = EOF;
return Char;
}
if (Char2 == '\n')
return Char;
Char3 = getc(In_file);
}
else
{
/*
Buffer not full of newline, shuffle the characters and
either get a new one or propagate a newline or EOF.
*/
Char = Char1;
Char1 = Char2;
Char2 = Char3;
if (Char3 != '\n' && Char3 != EOF)
Char3 = getc(In_file);
}
return Char;
}
/*
** xlate_file()
**
** This is the input file translator. It sets up the first character
** and uses it to determine what kind of text follows.
*/
xlate_file()
{
/* Prime the queue */
Char = '\n';
Char1 = '\n';
Char2 = '\n';
Char3 = '\n';
new_char(); /* Fill Char, Char1, Char2 and Char3 */
while (Char != EOF) /* All of the words in the file */
{
if (isdigit(Char))
have_number();
else
if (isalpha(Char) || Char == '\'')
have_letter();
else
if (Char == '$' && isdigit(Char1))
have_dollars();
else
have_special();
}
}
have_dollars()
{
long int value;
value = 0L;
for (new_char() ; isdigit(Char) || Char == ',' ; new_char())
{
if (Char != ',')
value = 10 * value + (Char-'0');
}
say_cardinal(value); /* Say number of whole dollars */
/* Found a character that is a non-digit and non-comma */
/* Check for no decimal or no cents digits */
if (Char != '.' || !isdigit(Char1))
{
if (value == 1L)
outstring("dAAlER ");
else
outstring("dAAlAArz ");
return;
}
/* We have '.' followed by a digit */
new_char(); /* Skip the period */
/* If it is ".dd " say as " DOLLARS AND n CENTS " */
if (isdigit(Char1) && !isdigit(Char2))
{
if (value == 1L)
outstring("dAAlER ");
else
outstring("dAAlAArz ");
if (Char == '0' && Char1 == '0')
{
new_char(); /* Skip tens digit */
new_char(); /* Skip units digit */
return;
}
outstring("AAnd ");
value = (Char-'0')*10 + Char1-'0';
say_cardinal(value);
if (value == 1L)
outstring("sEHnt ");
else
outstring("sEHnts ");
new_char(); /* Used Char (tens digit) */
new_char(); /* Used Char1 (units digit) */
return;
}
/* Otherwise say as "n POINT ddd DOLLARS " */
outstring("pOYnt ");
for ( ; isdigit(Char) ; new_char())
{
say_ascii(Char);
}
outstring("dAAlAArz ");
return;
}
have_special()
{
if (Char == '\n')
outchar('\n');
else
if (!isspace(Char))
say_ascii(Char);
new_char();
return;
}
have_number()
{
long int value;
int lastdigit;
value = Char - '0';
lastdigit = Char;
for (new_char() ; isdigit(Char) ; new_char())
{
value = 10 * value + (Char-'0');
lastdigit = Char;
}
/* Recognize ordinals based on last digit of number */
switch (lastdigit)
{
case '1': /* ST */
if (makeupper(Char) == 'S' && makeupper(Char1) == 'T' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return;
}
break;
case '2': /* ND */
if (makeupper(Char) == 'N' && makeupper(Char1) == 'D' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return;
}
break;
case '3': /* RD */
if (makeupper(Char) == 'R' && makeupper(Char1) == 'D' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return;
}
break;
case '0': /* TH */
case '4': /* TH */
case '5': /* TH */
case '6': /* TH */
case '7': /* TH */
case '8': /* TH */
case '9': /* TH */
if (makeupper(Char) == 'T' && makeupper(Char1) == 'H' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return;
}
break;
}
say_cardinal(value);
/* Recognize decimal points */
if (Char == '.' && isdigit(Char1))
{
outstring("pOYnt ");
for (new_char() ; isdigit(Char) ; new_char())
{
say_ascii(Char);
}
}
/* Spell out trailing abbreviations */
if (isalpha(Char))
{
while (isalpha(Char))
{
say_ascii(Char);
new_char();
}
}
return;
}
have_letter()
{
char buff[MAX_LENGTH];
int count;
count = 0;
buff[count++] = ' '; /* Required initial blank */
buff[count++] = makeupper(Char);
for (new_char() ; isalpha(Char) || Char == '\'' ; new_char())
{
buff[count++] = makeupper(Char);
if (count > MAX_LENGTH-2)
{
buff[count++] = ' ';
buff[count++] = '\0';
xlate_word(buff);
count = 1;
}
}
buff[count++] = ' '; /* Required terminating blank */
buff[count++] = '\0';
/* Check for AAANNN type abbreviations */
if (isdigit(Char))
{
spell_word(buff);
return;
}
else
if (strlen(buff) == 3) /* one character, two spaces */
say_ascii(buff[1]);
else
if (Char == '.') /* Possible abbreviation */
abbrev(buff);
else
xlate_word(buff);
if (Char == '-' && isalpha(Char1))
new_char(); /* Skip hyphens */
}
/* Handle abbreviations. Text in buff was followed by '.' */
abbrev(buff)
char buff[];
{
if (strcmp(buff, " DR ") == 0)
{
xlate_word(" DOCTOR ");
new_char();
}
else
if (strcmp(buff, " MR ") == 0)
{
xlate_word(" MISTER ");
new_char();
}
else
if (strcmp(buff, " MRS ") == 0)
{
xlate_word(" MISSUS ");
new_char();
}
else
if (strcmp(buff, " PHD ") == 0)
{
spell_word(" PHD ");
new_char();
}
else
xlate_word(buff);
}
@
Final Version of
ENGLISH TO PHONEME TRANSLATION
4/15/85
Here it is one last time. I have fixed all of the bugs I
heard about and added a new feature or two (it now talks
money as well as numbers). I think that this version is
good enough for most purposes. I have proof-read the
phoneme rules (found one bug) and made the program more
"robust". I added protection against the "toupper()"
problem some people had with earlier versions.
If you make a major addition (like better abbreviation
handling or an exception dictionary) please send me a
copy. As before, this is all public domain and I make
no copyright claims on it. The part derived from the
Naval Research Lab should be public anyway. Sell it
if you can!
-John A. Wasser
Work address:
ARPAnet: WASSER%VIKING.DEC@decwrl.ARPA
Usenet: {allegra,Shasta,decvax}!decwrl!dec-rhea!dec-viking!wasser
Easynet: VIKING::WASSER
Telephone: (617)486-2505
USPS: Digital Equipment Corp.
Mail stop: LJO2/E4
30 Porter Rd
Littleton, MA 01460
The files that make up this package are:
english.c Translation rules.
phoneme.c Translate a single word.
parse.c Split a file into words.
spellwor.c Spell an ASCII character or word.
saynum.c Say a cardinal or ordinal number (long int).
/*
** English to Phoneme rules.
**
** Derived from:
**
** AUTOMATIC TRANSLATION OF ENGLISH TEXT TO PHONETICS
** BY MEANS OF LETTER-TO-SOUND RULES
**
** NRL Report 7948
**
** January 21st, 1976
** Naval Research Laboratory, Washington, D.C.
**
**
** Published by the National Technical Information Service as
** document "AD/A021 929".
**
**
**
** The Phoneme codes:
**
**      IY      bEEt            IH      bIt
**      EY      gAte            EH      gEt
**      AE      fAt             AA      fAther
**      AO      lAWn            OW      lOne
**      UH      fUll            UW      fOOl
**      ER      mURdER          AX      About
**      AH      bUt             AY      hIde
**      AW      hOW             OY      tOY
**
**      p       Pack            b       Back
**      t       Time            d       Dime
**      k       Coat            g       Goat
**      f       Fault           v       Vault
**      TH      eTHer           DH      eiTHer
**      s       Sue             z       Zoo
**      SH      leaSH           ZH      leiSure
**      HH      How             m       suM
**      n       suN             NG      suNG
**      l       Laugh           w       Wear
**      y       Young           r       Rate
**      CH      CHar            j       Jar
**      WH      WHere
**
**
** Rules are made up of four parts:
**
** The left context.
** The text to match.
** The right context.
** The phonemes to substitute for the matched text.
**
** Procedure:
**
** Separate each block of letters (apostrophes included)
** and add a space on each side. For each unmatched
** letter in the word, look through the rules where the
** text to match starts with the letter in the word. If
** the text to match is found and the right and left
** context patterns also match, output the phonemes for
** that rule and skip to the next unmatched letter.
**
**
** Special Context Symbols:
**
** # One or more vowels
** : Zero or more consonants
** ^ One consonant.
** . One of B, D, V, G, J, L, M, N, R, W or Z (voiced
** consonants)
** % One of ER, E, ES, ED, ING, ELY (a suffix)
** (Found in right context only)
** + One of E, I or Y (a "front" vowel)
**
*/
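The procedure described in the comment above can be sketched in Python as a first-match lookup over an ordered rule list. This is a minimal illustration under stated assumptions, not the program's actual matcher (which lives in phoneme.c and is not shown here): vowels are taken to be A, E, I, O, U, every other capital letter counts as a consonant, the `.` and `%` context symbols are left out, and the backtracking for `#` and `:` is a choice made for this sketch. `C_RULES` copies the `C_rules` table below, minus the one rule that needs `%`.

```python
VOWELS = "AEIOU"  # assumption: Y is treated as a consonant


def is_consonant(ch):
    return "A" <= ch <= "Z" and ch not in VOWELS


def context_matches(pattern, text):
    """True if `pattern` matches a prefix of `text` (right-context order)."""
    if not pattern:
        return True  # "Anything": no requirement left to check
    head, rest = pattern[0], pattern[1:]
    if head == "#":  # one or more vowels
        n = 0
        while n < len(text) and text[n] in VOWELS:
            n += 1
        return any(context_matches(rest, text[k:]) for k in range(n, 0, -1))
    if head == ":":  # zero or more consonants
        n = 0
        while n < len(text) and is_consonant(text[n]):
            n += 1
        return any(context_matches(rest, text[k:]) for k in range(n, -1, -1))
    if head == "^":  # exactly one consonant
        return bool(text) and is_consonant(text[0]) and context_matches(rest, text[1:])
    if head == "+":  # a "front" vowel: E, I or Y
        return bool(text) and text[0] in "EIY" and context_matches(rest, text[1:])
    # Any other character (letters, space, apostrophe) must match literally.
    return bool(text) and text[0] == head and context_matches(rest, text[1:])


# (left context, text to match, right context, phonemes)
C_RULES = [
    (" ", "CH", "^", "k"),
    ("^E", "CH", "", "k"),
    ("", "CH", "", "CH"),
    (" S", "CI", "#", "sAY"),
    ("", "CI", "A", "SH"),
    ("", "CI", "O", "SH"),
    ("", "CI", "EN", "SH"),
    ("", "C", "+", "s"),
    ("", "CK", "", "k"),
    ("", "C", "", "k"),
]


def first_matching_rule(padded, i, rules):
    """Return (phonemes, letters consumed) for position i, or None."""
    for left, match, right, phonemes in rules:
        if not padded.startswith(match, i):
            continue
        # The left context is read outward from the match, so reverse both.
        before = padded[:i][::-1]
        after = padded[i + len(match):]
        if context_matches(left[::-1], before) and context_matches(right, after):
            return phonemes, len(match)
    return None


if __name__ == "__main__":
    for word, index in ((" CITY ", 1), (" CAT ", 1), (" BACK ", 3), (" TECH ", 3)):
        print(word.strip(), first_matching_rule(word, index, C_RULES))
```

Because the rules are tried in order, the specific entries (such as `CH` after a consonant plus `E`) have to precede the catch-all `C` rule, which is why each table below ends with its most general entry.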
/* Context definitions */
static char Anything[] = ""; /* No context requirement */
static char Nothing[] = " "; /* Context is beginning or end of word */
/* Phoneme definitions */
static char Pause[] = " "; /* Short silence */
static char Silent[] = ""; /* No phonemes */
#define LEFT_PART 0
#define MATCH_PART 1
#define RIGHT_PART 2
#define OUT_PART 3
typedef char *Rule[4]; /* Rule is an array of 4 character pointers */
/*0 = Punctuation */
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule punct_rules[] =
{
{Anything, " ", Anything, Pause },
{Anything, "-", Anything, Silent },
{".", "'S", Anything, "z" },
{"#:.E", "'S", Anything, "z" },
{"#", "'S", Anything, "z" },
{Anything, "'", Anything, Silent },
{Anything, ",", Anything, Pause },
{Anything, ".", Anything, Pause },
{Anything, "?", Anything, Pause },
{Anything, "!", Anything, Pause },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule A_rules[] =
{
{Anything, "A", Nothing, "AX" },
{Nothing, "ARE", Nothing, "AAr" },
{Nothing, "AR", "O", "AXr" },
{Anything, "AR", "#", "EHr" },
{"^", "AS", "#", "EYs" },
{Anything, "A", "WA", "AX" },
{Anything, "AW", Anything, "AO" },
{" :", "ANY", Anything, "EHnIY" },
{Anything, "A", "^+#", "EY" },
{"#:", "ALLY", Anything, "AXlIY" },
{Nothing, "AL", "#", "AXl" },
{Anything, "AGAIN", Anything, "AXgEHn"},
{"#:", "AG", "E", "IHj" },
{Anything, "A", "^+:#", "AE" },
{" :", "A", "^+ ", "EY" },
{Anything, "A", "^%", "EY" },
{Nothing, "ARR", Anything, "AXr" },
{Anything, "ARR", Anything, "AEr" },
{" :", "AR", Nothing, "AAr" },
{Anything, "AR", Nothing, "ER" },
{Anything, "AR", Anything, "AAr" },
{Anything, "AIR", Anything, "EHr" },
{Anything, "AI", Anything, "EY" },
{Anything, "AY", Anything, "EY" },
{Anything, "AU", Anything, "AO" },
{"#:", "AL", Nothing, "AXl" },
{"#:", "ALS", Nothing, "AXlz" },
{Anything, "ALK", Anything, "AOk" },
{Anything, "AL", "^", "AOl" },
{" :", "ABLE", Anything, "EYbAXl"},
{Anything, "ABLE", Anything, "AXbAXl"},
{Anything, "ANG", "+", "EYnj" },
{Anything, "A", Anything, "AE" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule B_rules[] =
{
{Nothing, "BE", "^#", "bIH" },
{Anything, "BEING", Anything, "bIYIHNG"},
{Nothing, "BOTH", Nothing, "bOWTH" },
{Nothing, "BUS", "#", "bIHz" },
{Anything, "BUIL", Anything, "bIHl" },
{Anything, "B", Anything, "b" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule C_rules[] =
{
{Nothing, "CH", "^", "k" },
{"^E", "CH", Anything, "k" },
{Anything, "CH", Anything, "CH" },
{" S", "CI", "#", "sAY" },
{Anything, "CI", "A", "SH" },
{Anything, "CI", "O", "SH" },
{Anything, "CI", "EN", "SH" },
{Anything, "C", "+", "s" },
{Anything, "CK", Anything, "k" },
{Anything, "COM", "%", "kAHm" },
{Anything, "C", Anything, "k" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule D_rules[] =
{
{"#:", "DED", Nothing, "dIHd" },
{".E", "D", Nothing, "d" },
{"#:^E", "D", Nothing, "t" },
{Nothing, "DE", "^#", "dIH" },
{Nothing, "DO", Nothing, "dUW" },
{Nothing, "DOES", Anything, "dAHz" },
{Nothing, "DOING", Anything, "dUWIHNG"},
{Nothing, "DOW", Anything, "dAW" },
{Anything, "DU", "A", "jUW" },
{Anything, "D", Anything, "d" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule E_rules[] =
{
{"#:", "E", Nothing, Silent },
{"':^", "E", Nothing, Silent },
{" :", "E", Nothing, "IY" },
{"#", "ED", Nothing, "d" },
{"#:", "E", "D ", Silent },
{Anything, "EV", "ER", "EHv" },
{Anything, "E", "^%", "IY" },
{Anything, "ERI", "#", "IYrIY" },
{Anything, "ERI", Anything, "EHrIH" },
{"#:", "ER", "#", "ER" },
{Anything, "ER", "#", "EHr" },
{Anything, "ER", Anything, "ER" },
{Nothing, "EVEN", Anything, "IYvEHn"},
{"#:", "E", "W", Silent },
{"T", "EW", Anything, "UW" },
{"S", "EW", Anything, "UW" },
{"R", "EW", Anything, "UW" },
{"D", "EW", Anything, "UW" },
{"L", "EW", Anything, "UW" },
{"Z", "EW", Anything, "UW" },
{"N", "EW", Anything, "UW" },
{"J", "EW", Anything, "UW" },
{"TH", "EW", Anything, "UW" },
{"CH", "EW", Anything, "UW" },
{"SH", "EW", Anything, "UW" },
{Anything, "EW", Anything, "yUW" },
{Anything, "E", "O", "IY" },
{"#:S", "ES", Nothing, "IHz" },
{"#:C", "ES", Nothing, "IHz" },
{"#:G", "ES", Nothing, "IHz" },
{"#:Z", "ES", Nothing, "IHz" },
{"#:X", "ES", Nothing, "IHz" },
{"#:J", "ES", Nothing, "IHz" },
{"#:CH", "ES", Nothing, "IHz" },
{"#:SH", "ES", Nothing, "IHz" },
{"#:", "E", "S ", Silent },
{"#:", "ELY", Nothing, "lIY" },
{"#:", "EMENT", Anything, "mEHnt" },
{Anything, "EFUL", Anything, "fUHl" },
{Anything, "EE", Anything, "IY" },
{Anything, "EARN", Anything, "ERn" },
{Nothing, "EAR", "^", "ER" },
{Anything, "EAD", Anything, "EHd" },
{"#:", "EA", Nothing, "IYAX" },
{Anything, "EA", "SU", "EH" },
{Anything, "EA", Anything, "IY" },
{Anything, "EIGH", Anything, "EY" },
{Anything, "EI", Anything, "IY" },
{Nothing, "EYE", Anything, "AY" },
{Anything, "EY", Anything, "IY" },
{Anything, "EU", Anything, "yUW" },
{Anything, "E", Anything, "EH" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule F_rules[] =
{
{Anything, "FUL", Anything, "fUHl" },
{Anything, "F", Anything, "f" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule G_rules[] =
{
{Anything, "GIV", Anything, "gIHv" },
{Nothing, "G", "I^", "g" },
{Anything, "GE", "T", "gEH" },
{"SU", "GGES", Anything, "gjEHs" },
{Anything, "GG", Anything, "g" },
{" B#", "G", Anything, "g" },
{Anything, "G", "+", "j" },
{Anything, "GREAT", Anything, "grEYt" },
{"#", "GH", Anything, Silent },
{Anything, "G", Anything, "g" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule H_rules[] =
{
{Nothing, "HAV", Anything, "hAEv" },
{Nothing, "HERE", Anything, "hIYr" },
{Nothing, "HOUR", Anything, "AWER" },
{Anything, "HOW", Anything, "hAW" },
{Anything, "H", "#", "h" },
{Anything, "H", Anything, Silent },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule I_rules[] =
{
{Nothing, "IN", Anything, "IHn" },
{Nothing, "I", Nothing, "AY" },
{Anything, "IN", "D", "AYn" },
{Anything, "IER", Anything, "IYER" },
{"#:R", "IED", Anything, "IYd" },
{Anything, "IED", Nothing, "AYd" },
{Anything, "IEN", Anything, "IYEHn" },
{Anything, "IE", "T", "AYEH" },
{" :", "I", "%", "AY" },
{Anything, "I", "%", "IY" },
{Anything, "IE", Anything, "IY" },
{Anything, "I", "^+:#", "IH" },
{Anything, "IR", "#", "AYr" },
{Anything, "IZ", "%", "AYz" },
{Anything, "IS", "%", "AYz" },
{Anything, "I", "D%", "AY" },
{"+^", "I", "^+", "IH" },
{Anything, "I", "T%", "AY" },
{"#:^", "I", "^+", "IH" },
{Anything, "I", "^+", "AY" },
{Anything, "IR", Anything, "ER" },
{Anything, "IGH", Anything, "AY" },
{Anything, "ILD", Anything, "AYld" },
{Anything, "IGN", Nothing, "AYn" },
{Anything, "IGN", "^", "AYn" },
{Anything, "IGN", "%", "AYn" },
{Anything, "IQUE", Anything, "IYk" },
{Anything, "I", Anything, "IH" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule J_rules[] =
{
{Anything, "J", Anything, "j" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule K_rules[] =
{
{Nothing, "K", "N", Silent },
{Anything, "K", Anything, "k" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule L_rules[] =
{
{Anything, "LO", "C#", "lOW" },
{"L", "L", Anything, Silent },
{"#:^", "L", "%", "AXl" },
{Anything, "LEAD", Anything, "lIYd" },
{Anything, "L", Anything, "l" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule M_rules[] =
{
{Anything, "MOV", Anything, "mUWv" },
{Anything, "M", Anything, "m" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule N_rules[] =
{
{"E", "NG", "+", "nj" },
{Anything, "NG", "R", "NGg" },
{Anything, "NG", "#", "NGg" },
{Anything, "NGL", "%", "NGgAXl"},
{Anything, "NG", Anything, "NG" },
{Anything, "NK", Anything, "NGk" },
{Nothing, "NOW", Nothing, "nAW" },
{Anything, "N", Anything, "n" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule O_rules[] =
{
{Anything, "OF", Nothing, "AXv" },
{Anything, "OROUGH", Anything, "EROW" },
{"#:", "OR", Nothing, "ER" },
{"#:", "ORS", Nothing, "ERz" },
{Anything, "OR", Anything, "AOr" },
{Nothing, "ONE", Anything, "wAHn" },
{Anything, "OW", Anything, "OW" },
{Nothing, "OVER", Anything, "OWvER" },
{Anything, "OV", Anything, "AHv" },
{Anything, "O", "^%", "OW" },
{Anything, "O", "^EN", "OW" },
{Anything, "O", "^I#", "OW" },
{Anything, "OL", "D", "OWl" },
{Anything, "OUGHT", Anything, "AOt" },
{Anything, "OUGH", Anything, "AHf" },
{Nothing, "OU", Anything, "AW" },
{"H", "OU", "S#", "AW" },
{Anything, "OUS", Anything, "AXs" },
{Anything, "OUR", Anything, "AOr" },
{Anything, "OULD", Anything, "UHd" },
{"^", "OU", "^L", "AH" },
{Anything, "OUP", Anything, "UWp" },
{Anything, "OU", Anything, "AW" },
{Anything, "OY", Anything, "OY" },
{Anything, "OING", Anything, "OWIHNG"},
{Anything, "OI", Anything, "OY" },
{Anything, "OOR", Anything, "AOr" },
{Anything, "OOK", Anything, "UHk" },
{Anything, "OOD", Anything, "UHd" },
{Anything, "OO", Anything, "UW" },
{Anything, "O", "E", "OW" },
{Anything, "O", Nothing, "OW" },
{Anything, "OA", Anything, "OW" },
{Nothing, "ONLY", Anything, "OWnlIY"},
{Nothing, "ONCE", Anything, "wAHns" },
{Anything, "ON'T", Anything, "OWnt" },
{"C", "O", "N", "AA" },
{Anything, "O", "NG", "AO" },
{" :^", "O", "N", "AH" },
{"I", "ON", Anything, "AXn" },
{"#:", "ON", Nothing, "AXn" },
{"#^", "ON", Anything, "AXn" },
{Anything, "O", "ST ", "OW" },
{Anything, "OF", "^", "AOf" },
{Anything, "OTHER", Anything, "AHDHER"},
{Anything, "OSS", Nothing, "AOs" },
{"#:^", "OM", Anything, "AHm" },
{Anything, "O", Anything, "AA" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule P_rules[] =
{
{Anything, "PH", Anything, "f" },
{Anything, "PEOP", Anything, "pIYp" },
{Anything, "POW", Anything, "pAW" },
{Anything, "PUT", Nothing, "pUHt" },
{Anything, "P", Anything, "p" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule Q_rules[] =
{
{Anything, "QUAR", Anything, "kwAOr" },
{Anything, "QU", Anything, "kw" },
{Anything, "Q", Anything, "k" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule R_rules[] =
{
{Nothing, "RE", "^#", "rIY" },
{Anything, "R", Anything, "r" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule S_rules[] =
{
{Anything, "SH", Anything, "SH" },
{"#", "SION", Anything, "ZHAXn" },
{Anything, "SOME", Anything, "sAHm" },
{"#", "SUR", "#", "ZHER" },
{Anything, "SUR", "#", "SHER" },
{"#", "SU", "#", "ZHUW" },
{"#", "SSU", "#", "SHUW" },
{"#", "SED", Nothing, "zd" },
{"#", "S", "#", "z" },
{Anything, "SAID", Anything, "sEHd" },
{"^", "SION", Anything, "SHAXn" },
{Anything, "S", "S", Silent },
{".", "S", Nothing, "z" },
{"#:.E", "S", Nothing, "z" },
{"#:^##", "S", Nothing, "z" },
{"#:^#", "S", Nothing, "s" },
{"U", "S", Nothing, "s" },
{" :#", "S", Nothing, "z" },
{Nothing, "SCH", Anything, "sk" },
{Anything, "S", "C+", Silent },
{"#", "SM", Anything, "zm" },
{"#", "SN", "'", "zAXn" },
{Anything, "S", Anything, "s" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule T_rules[] =
{
{Nothing, "THE", Nothing, "DHAX" },
{Anything, "TO", Nothing, "tUW" },
{Anything, "THAT", Nothing, "DHAEt" },
{Nothing, "THIS", Nothing, "DHIHs" },
{Nothing, "THEY", Anything, "DHEY" },
{Nothing, "THERE", Anything, "DHEHr" },
{Anything, "THER", Anything, "DHER" },
{Anything, "THEIR", Anything, "DHEHr" },
{Nothing, "THAN", Nothing, "DHAEn" },
{Nothing, "THEM", Nothing, "DHEHm" },
{Anything, "THESE", Nothing, "DHIYz" },
{Nothing, "THEN", Anything, "DHEHn" },
{Anything, "THROUGH", Anything, "THrUW" },
{Anything, "THOSE", Anything, "DHOWz" },
{Anything, "THOUGH", Nothing, "DHOW" },
{Nothing, "THUS", Anything, "DHAHs" },
{Anything, "TH", Anything, "TH" },
{"#:", "TED", Nothing, "tIHd" },
{"S", "TI", "#N", "CH" },
{Anything, "TI", "O", "SH" },
{Anything, "TI", "A", "SH" },
{Anything, "TIEN", Anything, "SHAXn" },
{Anything, "TUR", "#", "CHER" },
{Anything, "TU", "A", "CHUW" },
{Nothing, "TWO", Anything, "tUW" },
{Anything, "T", Anything, "t" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule U_rules[] =
{
{Nothing, "UN", "I", "yUWn" },
{Nothing, "UN", Anything, "AHn" },
{Nothing, "UPON", Anything, "AXpAOn"},
{"T", "UR", "#", "UHr" },
{"S", "UR", "#", "UHr" },
{"R", "UR", "#", "UHr" },
{"D", "UR", "#", "UHr" },
{"L", "UR", "#", "UHr" },
{"Z", "UR", "#", "UHr" },
{"N", "UR", "#", "UHr" },
{"J", "UR", "#", "UHr" },
{"TH", "UR", "#", "UHr" },
{"CH", "UR", "#", "UHr" },
{"SH", "UR", "#", "UHr" },
{Anything, "UR", "#", "yUHr" },
{Anything, "UR", Anything, "ER" },
{Anything, "U", "^ ", "AH" },
{Anything, "U", "^^", "AH" },
{Anything, "UY", Anything, "AY" },
{" G", "U", "#", Silent },
{"G", "U", "%", Silent },
{"G", "U", "#", "w" },
{"#N", "U", Anything, "yUW" },
{"T", "U", Anything, "UW" },
{"S", "U", Anything, "UW" },
{"R", "U", Anything, "UW" },
{"D", "U", Anything, "UW" },
{"L", "U", Anything, "UW" },
{"Z", "U", Anything, "UW" },
{"N", "U", Anything, "UW" },
{"J", "U", Anything, "UW" },
{"TH", "U", Anything, "UW" },
{"CH", "U", Anything, "UW" },
{"SH", "U", Anything, "UW" },
{Anything, "U", Anything, "yUW" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule V_rules[] =
{
{Anything, "VIEW", Anything, "vyUW" },
{Anything, "V", Anything, "v" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule W_rules[] =
{
{Nothing, "WERE", Anything, "wER" },
{Anything, "WA", "S", "wAA" },
{Anything, "WA", "T", "wAA" },
{Anything, "WHERE", Anything, "WHEHr" },
{Anything, "WHAT", Anything, "WHAAt" },
{Anything, "WHOL", Anything, "hOWl" },
{Anything, "WHO", Anything, "hUW" },
{Anything, "WH", Anything, "WH" },
{Anything, "WAR", Anything, "wAOr" },
{Anything, "WOR", "^", "wER" },
{Anything, "WR", Anything, "r" },
{Anything, "W", Anything, "w" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule X_rules[] =
{
{Anything, "X", Anything, "ks" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule Y_rules[] =
{
{Anything, "YOUNG", Anything, "yAHNG" },
{Nothing, "YOU", Anything, "yUW" },
{Nothing, "YES", Anything, "yEHs" },
{Nothing, "Y", Anything, "y" },
{"#:^", "Y", Nothing, "IY" },
{"#:^", "Y", "I", "IY" },
{" :", "Y", Nothing, "AY" },
{" :", "Y", "#", "AY" },
{" :", "Y", "^+:#", "IH" },
{" :", "Y", "^#", "AY" },
{Anything, "Y", Anything, "IH" },
{Anything, 0, Anything, Silent },
};
/*
** LEFT_PART MATCH_PART RIGHT_PART OUT_PART
*/
static Rule Z_rules[] =
{
{Anything, "Z", Anything, "z" },
{Anything, 0, Anything, Silent },
};
Rule *Rules[] =
{
punct_rules,
A_rules, B_rules, C_rules, D_rules, E_rules, F_rules, G_rules,
H_rules, I_rules, J_rules, K_rules, L_rules, M_rules, N_rules,
O_rules, P_rules, Q_rules, R_rules, S_rules, T_rules, U_rules,
V_rules, W_rules, X_rules, Y_rules, Z_rules
};
#include <stdio.h>
#include <ctype.h>
#define MAX_LENGTH 128
static FILE *In_file;
static FILE *Out_file;
static int Char, Char1, Char2, Char3;
/*
** main(argc, argv)
** int argc;
** char *argv[];
**
** This is the main program. It takes up to two file names (input
** and output) and translates the input file to phoneme codes
** (see english.c) on the output file.
*/
main(argc, argv)
int argc;
char *argv[];
{
if (argc > 3)
{
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit(1);
}
if (argc == 1)
{
fputs("Enter english text:\n", stderr);
}
if (argc > 1)
{
In_file = fopen(argv[1], "r");
if (In_file == 0)
{
fputs("Error: Cannot open input file.\n", stderr);
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit(1);
}
}
else
In_file = stdin;
if (argc > 2)
{
Out_file = fopen(argv[2], "w");
if (Out_file == 0)
{
fputs("Error: Cannot create output file.\n", stderr);
fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
exit(1);
}
}
else
Out_file = stdout;
xlate_file();
}
outstring(string)
char *string;
{
while (*string != '\0')
outchar(*string++);
}
outchar(chr)
int chr;
{
fputc(chr,Out_file);
}
int makeupper(character)
int character;
{
if (islower(character))
return toupper(character);
else
return character;
}
new_char()
{
/*
If the cache is full of newline, time to prime the look-ahead
again. If an EOF is found, fill the remainder of the queue with
EOF's.
*/
if (Char == '\n' && Char1 == '\n' && Char2 == '\n' && Char3 == '\n')
{ /* prime the pump again */
Char = getc(In_file);
if (Char == EOF)
{
Char1 = EOF;
Char2 = EOF;
Char3 = EOF;
return Char;
}
if (Char == '\n')
return Char;
Char1 = getc(In_file);
if (Char1 == EOF)
{
Char2 = EOF;
Char3 = EOF;
return Char;
}
if (Char1 == '\n')
return Char;
Char2 = getc(In_file);
if (Char2 == EOF)
{
Char3 = EOF;
return Char;
}
if (Char2 == '\n')
return Char;
Char3 = getc(In_file);
}
else
{
/*
Buffer not full of newline, shuffle the characters and
either get a new one or propagate a newline or EOF.
*/
Char = Char1;
Char1 = Char2;
Char2 = Char3;
if (Char3 != '\n' && Char3 != EOF)
Char3 = getc(In_file);
}
return Char;
}
/*
** xlate_file()
**
** This is the input file translator. It sets up the first character
** and uses it to determine what kind of text follows.
*/
xlate_file()
{
/* Prime the queue */
Char = '\n';
Char1 = '\n';
Char2 = '\n';
Char3 = '\n';
new_char(); /* Fill Char, Char1, Char2 and Char3 */
while (Char != EOF) /* All of the words in the file */
{
if (isdigit(Char))
have_number();
else
if (isalpha(Char) || Char == '\'')
have_letter();
else
if (Char == '$' && isdigit(Char1))
have_dollars();
else
have_special();
}
}
have_dollars()
{
long int value;
value = 0L;
for (new_char() ; isdigit(Char) || Char == ',' ; new_char())
{
if (Char != ',')
value = 10 * value + (Char-'0');
}
say_cardinal(value); /* Say number of whole dollars */
/* Found a character that is a non-digit and non-comma */
/* Check for no decimal or no cents digits */
if (Char != '.' || !isdigit(Char1))
{
if (value == 1L)
outstring("dAAlER ");
else
outstring("dAAlAArz ");
return 1;
}
/* We have '.' followed by a digit */
new_char(); /* Skip the period */
/* If it is ".dd " say as " DOLLARS AND n CENTS " */
if (isdigit(Char1) && !isdigit(Char2))
{
if (value == 1L)
outstring("dAAlER ");
else
outstring("dAAlAArz ");
if (Char == '0' && Char1 == '0')
{
new_char(); /* Skip tens digit */
new_char(); /* Skip units digit */
return 1;
}
outstring("AAnd ");
value = (Char-'0')*10 + Char1-'0';
say_cardinal(value);
if (value == 1L)
outstring("sEHnt ");
else
outstring("sEHnts ");
new_char(); /* Used Char (tens digit) */
new_char(); /* Used Char1 (units digit) */
return 1;
}
/* Otherwise say as "n POINT ddd DOLLARS " */
outstring("pOYnt ");
for ( ; isdigit(Char) ; new_char())
{
say_ascii(Char);
}
outstring("dAAlAArz ");
return 1;
}
have_special()
{
if (Char == '\n')
outchar('\n');
else
if (!isspace(Char))
say_ascii(Char);
new_char();
return 1;
}
have_number()
{
long int value;
int lastdigit;
value = Char - '0';
lastdigit = Char;
for (new_char() ; isdigit(Char) ; new_char())
{
value = 10 * value + (Char-'0');
lastdigit = Char;
}
/* Recognize ordinals based on last digit of number */
switch (lastdigit)
{
case '1': /* ST */
if (makeupper(Char) == 'S' && makeupper(Char1) == 'T' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return 1;
}
break;
case '2': /* ND */
if (makeupper(Char) == 'N' && makeupper(Char1) == 'D' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return 1;
}
break;
case '3': /* RD */
if (makeupper(Char) == 'R' && makeupper(Char1) == 'D' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return 1;
}
break;
case '0': /* TH */
case '4': /* TH */
case '5': /* TH */
case '6': /* TH */
case '7': /* TH */
case '8': /* TH */
case '9': /* TH */
if (makeupper(Char) == 'T' && makeupper(Char1) == 'H' &&
!isalpha(Char2) && !isdigit(Char2))
{
say_ordinal(value);
new_char(); /* Used Char */
new_char(); /* Used Char1 */
return 1;
}
break;
}
say_cardinal(value);
/* Recognize decimal points */
if (Char == '.' && isdigit(Char1))
{
outstring("pOYnt ");
for (new_char() ; isdigit(Char) ; new_char())
{
say_ascii(Char);
}
}
/* Spell out trailing abbreviations */
if (isalpha(Char))
{
while (isalpha(Char))
{
say_ascii(Char);
new_char();
}
}
return 1;
}
have_letter()
{
char buff[MAX_LENGTH];
int count;
count = 0;
buff[count++] = ' '; /* Required initial blank */
buff[count++] = makeupper(Char);
for (new_char() ; isalpha(Char) || Char == '\'' ; new_char())
{
buff[count++] = makeupper(Char);
if (count > MAX_LENGTH-3) /* leave room for the blank and NUL written below */
{
buff[count++] = ' ';
buff[count++] = '\0';
xlate_word(buff);
count = 1;
}
}
buff[count++] = ' '; /* Required terminating blank */
buff[count++] = '\0';
/* Check for AAANNN type abbreviations */
if (isdigit(Char))
{
spell_word(buff);
return 1;
}
else
if (strlen(buff) == 3) /* one character, two spaces */
say_ascii(buff[1]);
else
if (Char == '.') /* Possible abbreviation */
abbrev(buff);
else
xlate_word(buff);
if (Char == '-' && isalpha(Char1))
new_char(); /* Skip hyphens */
}
/* Handle abbreviations. Text in buff was followed by '.' */
abbrev(buff)
char buff[];
{
if (strcmp(buff, " DR ") == 0)
{
xlate_word(" DOCTOR ");
new_char();
}
else
if (strcmp(buff, " MR ") == 0)
{
xlate_word(" MISTER ");
new_char();
}
else
if (strcmp(buff, " MRS ") == 0)
{
xlate_word(" MISSUS ");
new_char();
}
else
if (strcmp(buff, " PHD ") == 0)
{
spell_word(" PHD ");
new_char();
}
else
xlate_word(buff);
}
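The ordinal check in `have_number()` above looks only at the last digit of the number and then expects the matching two-letter suffix, immediately followed by a non-alphanumeric character. A minimal Python sketch of that rule follows; `parse_ordinal` and `SUFFIX_BY_LAST_DIGIT` are hypothetical names introduced for illustration and are not part of the commit.

```python
from typing import Optional

# Suffix expected for each final digit; every other digit expects "TH".
# Hypothetical table mirroring the switch statement in have_number().
SUFFIX_BY_LAST_DIGIT = {"1": "ST", "2": "ND", "3": "RD"}


def parse_ordinal(token: str) -> Optional[int]:
    """Return the number if token is digits plus the suffix matching its last digit."""
    end = 0
    while end < len(token) and token[end] in "0123456789":
        end += 1
    if end == 0:
        return None  # no leading digits
    digits, rest = token[:end], token[end:]
    expected = SUFFIX_BY_LAST_DIGIT.get(digits[-1], "TH")
    # Like the C code, the comparison is case-insensitive and the suffix
    # must end the token.
    if rest.upper() == expected:
        return int(digits)
    return None
```

Because only the last digit is inspected, this sketch (like the C code it mirrors) accepts "11st" and rejects "11th"; tokens that fail the check fall through to the cardinal path.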
#include <stdio.h>
#include <ctype.h>
#define FALSE (0)
#define TRUE (!0)
/*
** English to Phoneme translation.
**
** Rules are made up of four parts:
**
** The left context.
** The text to match.
** The right context.
** The phonemes to substitute for the matched text.
**
** Procedure:
**
** Separate each block of letters (apostrophes included)
** and add a space on each side. For each unmatched
** letter in the word, look through the rules where the
** text to match starts with the letter in the word. If
** the text to match is found and the right and left
** context patterns also match, output the phonemes for
** that rule and skip to the next unmatched letter.
**
**
** Special Context Symbols:
**
** # One or more vowels
** : Zero or more consonants
** ^ One consonant.
** . One of B, D, V, G, J, L, M, N, R, W or Z (voiced
** consonants)
** % One of ER, E, ES, ED, ING, ELY (a suffix)
** (Right context only)
** + One of E, I or Y (a "front" vowel)
*/
typedef char *Rule[4]; /* A rule is four character pointers */
extern Rule *Rules[]; /* An array of pointers to rules */
int isvowel(chr)
char chr;
{
return (chr == 'A' || chr == 'E' || chr == 'I' ||
chr == 'O' || chr == 'U');
}
int isconsonant(chr)
char chr;
{
return (isupper(chr) && !isvowel(chr));
}
xlate_word(word)
char word[];
{
int index; /* Current position in word */
int type; /* First letter of match part */
index = 1; /* Skip the initial blank */
do
{
if (isupper(word[index]))
type = word[index] - 'A' + 1;
else
type = 0;
index = find_rule(word, index, Rules[type]);
}
while (word[index] != '\0');
}
find_rule(word, index, rules)
char word[];
int index;
Rule *rules;
{
Rule *rule;
char *left, *match, *right, *output;
int remainder;
for (;;) /* Search for the rule */
{
rule = rules++;
match = (*rule)[1];
if (match == 0) /* bad symbol! */
{
fprintf(stderr,
"Error: Can't find rule for: '%c' in \"%s\"\n", word[index], word);
return index+1; /* Skip it! */
}
for (remainder = index; *match != '\0'; match++, remainder++)
{
if (*match != word[remainder])
break;
}
if (*match != '\0') /* found mismatch */
continue;
/*
printf("\nWord: \"%s\", Index:%4d, Trying: \"%s/%s/%s\" = \"%s\"\n",
word, index, (*rule)[0], (*rule)[1], (*rule)[2], (*rule)[3]);
*/
left = (*rule)[0];
right = (*rule)[2];
if (!leftmatch(left, &word[index-1]))
continue;
/*
printf("leftmatch(\"%s\",\"...%c\") succeeded!\n", left, word[index-1]);
*/
if (!rightmatch(right, &word[remainder]))
continue;
/*
printf("rightmatch(\"%s\",\"%s\") succeeded!\n", right, &word[remainder]);
*/
output = (*rule)[3];
/*
printf("Success: ");
*/
outstring(output);
return remainder;
}
}
leftmatch(pattern, context)
char *pattern; /* first char of pattern to match in text */
char *context; /* last char of text to be matched */
{
char *pat;
char *text;
int count;
if (*pattern == '\0') /* null string matches any context */
{
return TRUE;
}
/* point to last character in pattern string */
count = strlen(pattern);
pat = pattern + (count - 1);
text = context;
for (; count > 0; pat--, count--)
{
/* First check for simple text or space */
if (isalpha(*pat) || *pat == '\'' || *pat == ' ')
if (*pat != *text)
return FALSE;
else
{
text--;
continue;
}
switch (*pat)
{
case '#': /* One or more vowels */
if (!isvowel(*text))
return FALSE;
text--;
while (isvowel(*text))
text--;
break;
case ':': /* Zero or more consonants */
while (isconsonant(*text))
text--;
break;
case '^': /* One consonant */
if (!isconsonant(*text))
return FALSE;
text--;
break;
case '.': /* B, D, V, G, J, L, M, N, R, W, Z */
if (*text != 'B' && *text != 'D' && *text != 'V'
&& *text != 'G' && *text != 'J' && *text != 'L'
&& *text != 'M' && *text != 'N' && *text != 'R'
&& *text != 'W' && *text != 'Z')
return FALSE;
text--;
break;
case '+': /* E, I or Y (front vowel) */
if (*text != 'E' && *text != 'I' && *text != 'Y')
return FALSE;
text--;
break;
case '%':
default:
fprintf(stderr, "Bad char in left rule: '%c'\n", *pat);
return FALSE;
}
}
return TRUE;
}
rightmatch(pattern, context)
char *pattern; /* first char of pattern to match in text */
char *context; /* first char of text following the matched part */
{
char *pat;
char *text;
if (*pattern == '\0') /* null string matches any context */
return TRUE;
pat = pattern;
text = context;
for (pat = pattern; *pat != '\0'; pat++)
{
/* First check for simple text or space */
if (isalpha(*pat) || *pat == '\'' || *pat == ' ')
if (*pat != *text)
return FALSE;
else
{
text++;
continue;
}
switch (*pat)
{
case '#': /* One or more vowels */
if (!isvowel(*text))
return FALSE;
text++;
while (isvowel(*text))
text++;
break;
case ':': /* Zero or more consonants */
while (isconsonant(*text))
text++;
break;
case '^': /* One consonant */
if (!isconsonant(*text))
return FALSE;
text++;
break;
case '.': /* B, D, V, G, J, L, M, N, R, W, Z */
if (*text != 'B' && *text != 'D' && *text != 'V'
&& *text != 'G' && *text != 'J' && *text != 'L'
&& *text != 'M' && *text != 'N' && *text != 'R'
&& *text != 'W' && *text != 'Z')
return FALSE;
text++;
break;
case '+': /* E, I or Y (front vowel) */
if (*text != 'E' && *text != 'I' && *text != 'Y')
return FALSE;
text++;
break;
case '%': /* ER, E, ES, ED, ING, ELY (a suffix) */
if (*text == 'E')
{
text++;
if (*text == 'L')
{
text++;
if (*text == 'Y')
{
text++;
break;
}
else
{
text--; /* Don't gobble L */
break;
}
}
else
if (*text == 'R' || *text == 'S'
|| *text == 'D')
text++;
break;
}
else
if (*text == 'I')
{
text++;
if (*text == 'N')
{
text++;
if (*text == 'G')
{
text++;
break;
}
}
return FALSE;
}
else
return FALSE;
default:
fprintf(stderr, "Bad char in right rule:'%c'\n", *pat);
return FALSE;
}
}
return TRUE;
}
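The context symbols documented at the top of this file can be illustrated with a small Python sketch of the right-context matcher. It mirrors the greedy, non-backtracking behaviour of `rightmatch()` for literals and the symbols `#`, `:`, `^`, `.` and `+`; the suffix symbol `%` is deliberately left out to keep the sketch short. All Python names here are hypothetical and introduced only for illustration.

```python
VOWELS = set("AEIOU")
VOICED = set("BDVGJLMNRWZ")  # the '.' class
FRONT = set("EIY")           # the '+' class


def char_at(text: str, pos: int) -> str:
    """Return text[pos], or an empty string past the end (stands in for NUL)."""
    return text[pos] if pos < len(text) else ""


def is_vowel(ch: str) -> bool:
    return ch in VOWELS


def is_consonant(ch: str) -> bool:
    return "A" <= ch <= "Z" and ch not in VOWELS


def right_match(pattern: str, text: str) -> bool:
    """Match a right-context pattern against the text after the matched part."""
    pos = 0
    for sym in pattern:
        ch = char_at(text, pos)
        if sym.isalpha() or sym in "' ":      # literal letter, apostrophe or blank
            if sym != ch:
                return False
            pos += 1
        elif sym == "#":                       # one or more vowels
            if not is_vowel(ch):
                return False
            pos += 1
            while is_vowel(char_at(text, pos)):
                pos += 1
        elif sym == ":":                       # zero or more consonants (greedy)
            while is_consonant(char_at(text, pos)):
                pos += 1
        elif sym == "^":                       # exactly one consonant
            if not is_consonant(ch):
                return False
            pos += 1
        elif sym == ".":                       # one voiced consonant
            if ch not in VOICED:
                return False
            pos += 1
        elif sym == "+":                       # one front vowel
            if ch not in FRONT:
                return False
            pos += 1
        else:
            raise ValueError(f"unsupported symbol in sketch: {sym!r}")
    return True
```

Note that `:` consumes every consonant it can and never gives any back, so a pattern such as `":T"` cannot match the text `"ST"`; the C routine behaves the same way.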
#include <stdio.h>
/*
** Integer to Readable ASCII Conversion Routine.
**
** Synopsis:
**
** say_cardinal(value)
** long int value; -- The number to output
**
** The number is translated into a string of phonemes
**
*/
static char *Cardinals[] =
{
"zIHrOW ", "wAHn ", "tUW ", "THrIY ",
"fOWr ", "fAYv ", "sIHks ", "sEHvAXn ",
"EYt ", "nAYn ",
"tEHn ", "IYlEHvAXn ", "twEHlv ", "THERtIYn ",
"fOWrtIYn ", "fIHftIYn ", "sIHkstIYn ", "sEHvEHntIYn ",
"EYtIYn ", "nAYntIYn "
} ;
static char *Twenties[] =
{
"twEHntIY ", "THERtIY ", "fAOrtIY ", "fIHftIY ",
"sIHkstIY ", "sEHvEHntIY ", "EYtIY ", "nAYntIY "
} ;
static char *Ordinals[] =
{
"zIHrOWEHTH ", "fERst ", "sEHkAHnd ", "THERd ",
"fOWrTH ", "fIHfTH ", "sIHksTH ", "sEHvEHnTH ",
"EYtTH ", "nAYnTH ",
"tEHnTH ", "IYlEHvEHnTH ", "twEHlvTH ", "THERtIYnTH ",
"fAOrtIYnTH ", "fIHftIYnTH ", "sIHkstIYnTH ", "sEHvEHntIYnTH ",
"EYtIYnTH ", "nAYntIYnTH "
} ;
static char *Ord_twenties[] =
{
"twEHntIYEHTH ","THERtIYEHTH ", "fOWrtIYEHTH ", "fIHftIYEHTH ",
"sIHkstIYEHTH ","sEHvEHntIYEHTH ","EYtIYEHTH ", "nAYntIYEHTH "
} ;
/*
** Translate a number to phonemes. This version is for CARDINAL numbers.
** Note: this is recursive.
*/
say_cardinal(value)
long int value;
{
if (value < 0)
{
outstring("mAYnAHs ");
value = (-value);
if (value < 0) /* Overflow! -32768 */
{
outstring("IHnfIHnIHtIY ");
return 1;
}
}
if (value >= 1000000000L) /* Billions */
{
say_cardinal(value/1000000000L);
outstring("bIHlIYAXn ");
value = value % 1000000000;
if (value == 0)
return 1; /* Even billion */
if (value < 100) /* as in THREE BILLION AND FIVE */
outstring("AEnd ");
}
if (value >= 1000000L) /* Millions */
{
say_cardinal(value/1000000L);
outstring("mIHlIYAXn ");
value = value % 1000000L;
if (value == 0)
return 1; /* Even million */
if (value < 100) /* as in THREE MILLION AND FIVE */
outstring("AEnd ");
}
/* Thousands 1000..1099 2000..99999 */
/* 1100 to 1999 is eleven-hundred to nineteen-hundred */
if ((value >= 1000L && value <= 1099L) || value >= 2000L)
{
say_cardinal(value/1000L);
outstring("THAWzAEnd ");
value = value % 1000L;
if (value == 0)
return 1; /* Even thousand */
if (value < 100) /* as in THREE THOUSAND AND FIVE */
outstring("AEnd ");
}
if (value >= 100L)
{
outstring(Cardinals[value/100]);
outstring("hAHndrEHd ");
value = value % 100;
if (value == 0)
return 1; /* Even hundred */
}
if (value >= 20)
{
outstring(Twenties[(value-20)/ 10]);
value = value % 10;
if (value == 0)
return 1; /* Even ten */
}
outstring(Cardinals[value]);
return 1;
}
/*
** Translate a number to phonemes. This version is for ORDINAL numbers.
** Note: this is recursive.
*/
say_ordinal(value)
long int value;
{
if (value < 0)
{
outstring("mAHnAXs ");
value = (-value);
if (value < 0) /* Overflow! -32768 */
{
outstring("IHnfIHnIHtIY ");
return 1;
}
}
if (value >= 1000000000L) /* Billions */
{
say_cardinal(value/1000000000L);
value = value % 1000000000;
if (value == 0)
{
outstring("bIHlIYAXnTH ");
return 1; /* Even billion */
}
outstring("bIHlIYAXn ");
if (value < 100) /* as in THREE BILLION AND FIVE */
outstring("AEnd ");
}
if (value >= 1000000L) /* Millions */
{
say_cardinal(value/1000000L);
value = value % 1000000L;
if (value == 0)
{
outstring("mIHlIYAXnTH ");
return 1; /* Even million */
}
outstring("mIHlIYAXn ");
if (value < 100) /* as in THREE MILLION AND FIVE */
outstring("AEnd ");
}
/* Thousands 1000..1099 2000..99999 */
/* 1100 to 1999 is eleven-hundred to nineteen-hundred */
if ((value >= 1000L && value <= 1099L) || value >= 2000L)
{
say_cardinal(value/1000L);
value = value % 1000L;
if (value == 0)
{
outstring("THAWzAEndTH ");
return 1; /* Even thousand */
}
outstring("THAWzAEnd ");
if (value < 100) /* as in THREE THOUSAND AND FIVE */
outstring("AEnd ");
}
if (value >= 100L)
{
outstring(Cardinals[value/100]);
value = value % 100;
if (value == 0)
{
outstring("hAHndrEHdTH ");
return 1; /* Even hundred */
}
outstring("hAHndrEHd ");
}
if (value >= 20)
{
if ((value%10) == 0)
{
outstring(Ord_twenties[(value-20)/ 10]);
return 1; /* Even ten */
}
outstring(Twenties[(value-20)/ 10]);
value = value % 10;
}
outstring(Ordinals[value]);
return 1;
}
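The recursive decomposition used by `say_cardinal()` can be sketched in Python as follows. To keep the result readable, plain English words stand in for the phoneme strings of the C tables; that substitution, and the names `CARDINALS` and `TENS`, are assumptions made for illustration only.

```python
from typing import List

CARDINALS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def say_cardinal(value: int) -> List[str]:
    """Return the words for value, following the structure of the C routine."""
    words: List[str] = []
    if value < 0:
        words.append("minus")
        value = -value
    for scale, name in ((1_000_000_000, "billion"), (1_000_000, "million")):
        if value >= scale:
            words += say_cardinal(value // scale)
            words.append(name)
            value %= scale
            if value == 0:
                return words
            if value < 100:          # as in "three million and five"
                words.append("and")
    # 1100..1999 are spoken as "eleven hundred" .. "nineteen hundred",
    # so they skip the thousands branch.
    if 1000 <= value <= 1099 or value >= 2000:
        words += say_cardinal(value // 1000)
        words.append("thousand")
        value %= 1000
        if value == 0:
            return words
        if value < 100:
            words.append("and")
    if value >= 100:
        words.append(CARDINALS[value // 100])
        words.append("hundred")
        value %= 100
        if value == 0:
            return words
    if value >= 20:
        words.append(TENS[(value - 20) // 10])
        value %= 10
        if value == 0:
            return words
    words.append(CARDINALS[value])
    return words
```

For example, `say_cardinal(1500)` yields the words for "fifteen hundred", while `say_cardinal(3005)` yields "three thousand and five".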
#include <stdio.h>
static char *Ascii[] =
{
"nUWl ","stAArt AXv hEHdER ","stAArt AXv tEHkst ","EHnd AXv tEHkst ",
"EHnd AXv trAEnsmIHSHAXn ",
"EHnkwAYr ","AEk ","bEHl ","bAEkspEYs ","tAEb ","lIHnIYfIYd ",
"vERtIHkAXl tAEb ","fAOrmfIYd ","kAErAYj rIYtERn ","SHIHft AWt ",
"SHIHft IHn ","dIHlIYt ","dIHvIHs kAAntrAAl wAHn ","dIHvIHs kAAntrAAl tUW ",
"dIHvIHs kAAntrAAl THrIY ","dIHvIHs kAAntrAAl fOWr ","nAEk ","sIHnk ",
"EHnd tEHkst blAAk ","kAEnsEHl ","EHnd AXv mEHsIHj ","sUWbstIHtUWt ",
"EHskEYp ","fAYEHld sIYpERAEtER ","grUWp sIYpERAEtER ","rIYkAOrd sIYpERAEtER ",
"yUWnIHt sIYpERAEtER ","spEYs ","EHksklAEmEYSHAXn mAArk ","dAHbl kwOWt ",
"nUWmbER sAYn ","dAAlER sAYn ","pERsEHnt ","AEmpERsAEnd ","kwOWt ",
"OWpEHn pEHrEHn ","klOWz pEHrEHn ","AEstEHrIHsk ","plAHs ","kAAmmAX ",
"mIHnAHs ","pIYrIYAAd ","slAESH ",
"zIHrOW ","wAHn ","tUW ","THrIY ","fOWr ",
"fAYv ","sIHks ","sEHvAXn ","EYt ","nAYn ",
"kAAlAXn ","sEHmIHkAAlAXn ","lEHs DHAEn ","EHkwAXl sAYn ","grEYtER DHAEn ",
"kwEHsCHAXn mAArk ","AEt sAYn ",
"EY ","bIY ","sIY ","dIY ","IY ","EHf ","jIY ",
"EYtCH ","AY ","jEY ","kEY ","EHl ","EHm ","EHn ","AA ","pIY ",
"kw ","AAr ","EHz ","tIY ","AHw ","vIY ",
"dAHblyUWw ","EHks ","wAYIY ","zIY ",
"lEHft brAEkEHt ","bAEkslAESH ","rAYt brAEkEHt ","kAErEHt ",
"AHndERskAOr ","AEpAAstrAAfIH ",
"EY ","bIY ","sIY ","dIY ","IY ","EHf ","jIY ",
"EYtCH ","AY ","jEY ","kEY ","EHl ","EHm ","EHn ","AA ","pIY ",
"kw ","AAr ","EHz ","tIY ","AHw ","vIY ",
"dAHblyUWw ","EHks ","wAYIY ","zIY ",
"lEHft brEYs ","vERtIHkAXl bAAr ","rAYt brEYs ","tAYld ","dEHl ",
};
say_ascii(character)
int character;
{
outstring(Ascii[character&0x7F]);
}
spell_word(word)
char *word;
{
for (word++ ; word[1] != '\0' ; word++)
outstring(Ascii[(*word)&0x7F]);
}
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625.0
TARGETKIND = PLP_E_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 24
LPCORDER = 12
USEPOWER = T
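HTK expresses the time-valued parameters above in units of 100 ns. The short Python sketch below converts them into more familiar quantities and checks the feature dimension against the `<VECSIZE> 39` declared in the model macro file. It assumes the HTK default of 12 cepstral coefficients (`NUMCEPS`), which this config does not set explicitly; the variable names are hypothetical.

```python
HTK_UNITS_PER_SECOND = 10_000_000   # one HTK time unit is 100 ns
HTK_UNITS_PER_MS = 10_000

config = {
    "SOURCERATE": 625.0,      # sample period
    "TARGETRATE": 100000.0,   # frame shift
    "WINDOWSIZE": 250000.0,   # analysis window length
}

sample_rate_hz = HTK_UNITS_PER_SECOND / config["SOURCERATE"]
frame_shift_ms = config["TARGETRATE"] / HTK_UNITS_PER_MS
window_ms = config["WINDOWSIZE"] / HTK_UNITS_PER_MS

# PLP_E_D_A_Z: static coefficients plus energy (_E), then deltas (_D) and
# accelerations (_A); _Z (zero-mean) does not change the dimension.
NUMCEPS = 12                          # assumption: HTK default, not set in this config
static_dim = NUMCEPS + 1              # + energy
feature_dim = static_dim * 3          # static + delta + acceleration

print(sample_rate_hz, frame_shift_ms, window_ms, feature_dim)
```

Under these assumptions the config describes 16 kHz audio analysed with 25 ms windows every 10 ms, giving 39-dimensional feature vectors.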
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_E_D_A_Z><DIAGC>
~v "varFloor1"
<VARIANCE> 39
3.775564e-03 4.079504e-03 4.140842e-03 4.754395e-03 5.421045e-03 3.905575e-03 3.417824e-03 3.297425e-03 3.133435e-03 2.756949e-03 2.279631e-03 1.618109e-03 1.932043e-03 1.913605e-04 1.656697e-04 1.542451e-04 1.967382e-04 1.924746e-04 1.852640e-04 1.763692e-04 1.960309e-04 1.941942e-04 1.750100e-04 1.440021e-04 1.083358e-04 8.763076e-06 2.761404e-05 2.222120e-05 2.312505e-05 3.049958e-05 3.095940e-05 3.121570e-05 3.065950e-05 3.542555e-05 3.565122e-05 3.164548e-05 2.577707e-05 2.046118e-05 1.202846e-06
a1
a2
a3
a4
a5
aa
ai1
ai2
ai3
ai4
ai5
an1
an2
an3
an4
an5
ang1
ang2
ang3
ang4
ang5
ao1
ao2
ao3
ao4
ao5
b
c
ch
d
e1
e2
e3
e4
e5
ee
ei1
ei2
ei3
ei4
ei5
en1
en2
en3
en4
en5
eng1
eng2
eng3
eng4
eng5
er2
er3
er4
er5
f
g
h
i1
i2
i3
i4
i5
ia1
ia2
ia3
ia4
ia5
ian1
ian2
ian3
ian4
ian5
iang1
iang2
iang3
iang4
iang5
iao1
iao2
iao3
iao4
iao5
ie1
ie2
ie3
ie4
ie5
ii
in1
in2
in3
in4
in5
ing1
ing2
ing3
ing4
ing5
iong1
iong2
iong3
iong4
iong5
iu1
iu2
iu3
iu4
iu5
ix1
ix2
ix3
ix4
ix5
iy1
iy2
iy3
iy4
iy5
iz4
j
k
l
m
n
o1
o2
o3
o4
o5
ong1
ong2
ong3
ong4
ong5
oo
ou1
ou2
ou3
ou4
ou5
p
q
r
s
sh
sil
sp
t
u1
u2
u3
u4
u5
ua1
ua2
ua3
ua4
ua5
uai1
uai2
uai3
uai4
uai5
uan1
uan2
uan3
uan4
uan5
uang1
uang2
uang3
uang4
uang5
ueng1
ueng3
ueng4
ueng5
ui1
ui2
ui3
ui4
ui5
un1
un2
un3
un4
un5
uo1
uo2
uo3
uo4
uo5
uu
v1
v2
v3
v4
v5
van1
van2
van3
van4
van5
ve1
ve2
ve3
ve4
ve5
vn1
vn2
vn3
vn4
vn5
vv
x
z
zh
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625.0
TARGETKIND = PLP_E_D_A_Z
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
ZMEANSOURCE = T
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 24
LPCORDER = 12
USEPOWER = T
~o
<STREAMINFO> 1 39
<VECSIZE> 39<NULLD><PLP_E_D_A_Z><DIAGC>
~v "varFloor1"
<VARIANCE> 39
3.775564e-03 4.079504e-03 4.140842e-03 4.754395e-03 5.421045e-03 3.905575e-03 3.417824e-03 3.297425e-03 3.133435e-03 2.756949e-03 2.279631e-03 1.618109e-03 1.932043e-03 1.913605e-04 1.656697e-04 1.542451e-04 1.967382e-04 1.924746e-04 1.852640e-04 1.763692e-04 1.960309e-04 1.941942e-04 1.750100e-04 1.440021e-04 1.083358e-04 8.763076e-06 2.761404e-05 2.222120e-05 2.312505e-05 3.049958e-05 3.095940e-05 3.121570e-05 3.065950e-05 3.542555e-05 3.565122e-05 3.164548e-05 2.577707e-05 2.046118e-05 1.202846e-06
# Copyright 2021 Tomoki Hayashi
# Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
"""Wrapper class for the vocoder model trained with parallel_wavegan repo."""
import logging
import os
from pathlib import Path
from typing import Optional
from typing import Union
import yaml
import torch
class ParallelWaveGANPretrainedVocoder(torch.nn.Module):
    """Wrapper class to load the vocoder trained with parallel_wavegan repo."""

    def __init__(
        self,
        model_file: Union[Path, str],
        config_file: Optional[Union[Path, str]] = None,
    ):
        """Initialize ParallelWaveGANPretrainedVocoder module."""
        super().__init__()
        try:
            from parallel_wavegan.utils import load_model
        except ImportError:
            logging.error(
                "`parallel_wavegan` is not installed. "
                "Please install via `pip install -U parallel_wavegan`."
            )
            raise
        if config_file is None:
            dirname = os.path.dirname(str(model_file))
            config_file = os.path.join(dirname, "config.yml")
        with open(config_file) as f:
            config = yaml.load(f, Loader=yaml.Loader)
        self.fs = config["sampling_rate"]
        self.vocoder = load_model(model_file, config)
        if hasattr(self.vocoder, "remove_weight_norm"):
            self.vocoder.remove_weight_norm()
        self.normalize_before = False
        if hasattr(self.vocoder, "mean"):
            self.normalize_before = True

    @torch.no_grad()
    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """Generate waveform with pretrained vocoder.
        Args:
            feats (Tensor): Feature tensor (T_feats, #mels).
        Returns:
            Tensor: Generated waveform tensor (T_wav).
        """
        return self.vocoder.inference(
            feats,
            normalize_before=self.normalize_before,
        ).view(-1)
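When no `config_file` is passed, the wrapper above looks for a `config.yml` in the same directory as the model file. The lookup can be tried in isolation with the sketch below, which needs neither `torch` nor `parallel_wavegan`; `resolve_config_path` is a hypothetical helper and the checkpoint path is made up for illustration.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_config_path(
    model_file: Union[Path, str],
    config_file: Optional[Union[Path, str]] = None,
) -> str:
    """Return the config path the wrapper would open for this model file."""
    if config_file is None:
        # Same fallback as the wrapper: config.yml next to the checkpoint.
        dirname = os.path.dirname(str(model_file))
        config_file = os.path.join(dirname, "config.yml")
    return str(config_file)


# Hypothetical checkpoint location, used only to show the fallback.
print(resolve_config_path(Path("ckpt") / "voc" / "checkpoint.pkl"))
```

An explicitly supplied `config_file` is returned unchanged, so a checkpoint and its config may live in different directories.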
...@@ -38,6 +38,7 @@ import paddle.nn.functional as F
from paddlespeech.t2s.modules.nets_utils import make_pad_mask
from paddlespeech.t2s.exps.syn_utils import get_frontend
from tools.parallel_wavegan_pretrained_vocoder import ParallelWaveGANPretrainedVocoder
from sedit_arg_parser import parse_args
model_alias = {
...@@ -60,14 +61,38 @@ model_alias = {
def is_chinese(ch):
    if u'\u4e00' <= ch <= u'\u9fff':
        return True
    else:
        return False
def build_vocoder_from_file(
    vocoder_config_file = None,
    vocoder_file = None,
    model = None,
    device = "cpu",
):
    # Build vocoder
    if str(vocoder_file).endswith(".pkl"):
        # If the extension is ".pkl", the model is trained with parallel_wavegan
        vocoder = ParallelWaveGANPretrainedVocoder(
            vocoder_file, vocoder_config_file
        )
        return vocoder.to(device)
    else:
        raise ValueError(f"{vocoder_file} is not supported format.")
def get_voc_out(mel, target_language="chinese"):
    # vocoder
    args = parse_args()
    assert target_language == "chinese" or target_language == "english", "In get_voc_out function, target_language is illegal..."
-    print("current vocoder: ", args.voc)
+    # print("current vocoder: ", args.voc)
    with open(args.voc_config) as f:
        voc_config = CfgNode(yaml.safe_load(f))
    # print(voc_config)
...@@ -136,6 +161,23 @@ def get_am_inference(args, am_config):
def evaluate_durations(phns, target_language="chinese", fs=24000, hop_length=300):
    args = parse_args()
    if target_language == 'english':
        args.lang='en'
        args.am = "fastspeech2_ljspeech"
        args.am_config = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/default.yaml"
        args.am_ckpt = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/snapshot_iter_100000.pdz"
        args.am_stat = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/speech_stats.npy"
        args.phones_dict = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/phone_id_map.txt"
    elif target_language == 'chinese':
        args.lang='zh'
        args.am = "fastspeech2_csmsc"
        args.am_config="download/fastspeech2_conformer_baker_ckpt_0.5/conformer.yaml"
        args.am_ckpt = "download/fastspeech2_conformer_baker_ckpt_0.5/snapshot_iter_76000.pdz"
        args.am_stat = "download/fastspeech2_conformer_baker_ckpt_0.5/speech_stats.npy"
        args.phones_dict ="download/fastspeech2_conformer_baker_ckpt_0.5/phone_id_map.txt"
    # args = parser.parse_args(args=[])
    if args.ngpu == 0:
        paddle.set_device("cpu")
...@@ -167,6 +209,7 @@ def evaluate_durations(phns, target_language="chinese", fs=24000, hop_length=300
    phonemes = [
        phn if phn in vocab_phones else "sp" for phn in torch_phns
    ]
    phone_ids = [vocab_phones[item] for item in phonemes]
    phone_ids_new = phone_ids
    phone_ids_new.append(vocab_size-1)