fix bug with fixed inputs

9eef84a3 · oyjxer · bb8271ad
73 changed files
--- a/ernie-sat/README.md
+++ b/ernie-sat/README.md
@@ -2,16 +2,42 @@ ERNIE-SAT, which handles both Chinese and English, is a cross-lingual speech-language cross-modal large
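 ## Model Framework
 In ERNIE-SAT we propose two innovations:
- During pre-training, the corresponding Chinese and English phonemes are used as input, enabling cross-lingual, personalized soft phoneme mapping;
+- During pre-training, the corresponding Chinese and English phonemes are used as input, enabling cross-lingual, personalized soft phoneme mapping
- Joint masked learning over text and speech is used to align text and speech:
+- Joint masked learning over text and speech is used to align text and speech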
 ![framework](.meta/framework.png)
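 ## Usage
-### 1. Install PaddlePaddle
+### 1. Install PaddlePaddle and Environment Dependencies
+- The code in this project is based on Paddle (version>=2.0)
+- This project also supports loading the torch version of the vocoder
+  - torch version>=1.8
+- Install HTK: after registering at the [official site](https://htk.eng.cam.ac.uk/), you can download a recent version of HTK (e.g. 3.4.1). Older releases are available from the [HTK archive](https://htk.eng.cam.ac.uk/ftp/software/)
+    - 1. Register an account and download HTK
+    - 2. Extract the HTK archive and **place it in the tools folder under the project root, in a folder named htk**
+    - 3. **Note**: if you downloaded version 3.4.1 or later, open HTKLib/HRec.c and **edit lines 1626 and 1650**, changing **dur<=0 to dur<0 in both lines** as shown below:
+        ```text
+         Example for HTK 3.4.1:
+         (1) Line 1626: if (dur<=0 && labid != splabid) HError(8522,"LatFromPaths: Align  have dur<=0");
+         Change to:     if (dur<0 && labid != splabid) HError(8522,"LatFromPaths: Align  have dur<0");
+         (2) Line 1650: if (dur<=0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<=0 ");
+         Change to:     if (dur<0 && labid != splabid) HError(8522,"LatFromPaths: Align have dur<0 ");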
+        ```
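+    - 4. **Compile**: see the README in the extracted HTK directory for details (HTK cannot run until it has been compiled)
+- Install ParallelWaveGAN: see the [official repository](https://github.com/kan-bayashi/ParallelWaveGAN). Following its installation instructions, git clone the ParallelWaveGAN project directly **in the project root directory** and install its dependencies.
+- Install other dependencies: **sox, libsndfile**, etc.
-The code in this project is based on Paddle (version>=2.0)
 ### 2. Pre-trained Models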
@@ -21,12 +47,22 @@ In ERNIE-SAT we propose two innovations:
 - [ERNIE-SAT_ZH_and_EN](http://bj.bcebos.com/wenxin-models/model-ernie-sat-base-en_zh.tar.gz) 
+Create a pretrained_model folder, download the ERNIE-SAT pre-trained models listed above into it, and extract them:
+```bash
+mkdir pretrained_model
+cd pretrained_model
+tar -zxvf model-ernie-sat-base-en.tar.gz
+tar -zxvf model-ernie-sat-base-zh.tar.gz
+tar -zxvf model-ernie-sat-base-en_zh.tar.gz
+```
 ### 3. Downloads
 1. This project uses Parallel WaveGAN as the vocoder:
    - [pwg_aishell3_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/pwgan/pwg_aishell3_ckpt_0.5.zip)  
-Create a download folder, then download and extract the pre-trained vocoder model above
+Create a download folder, then download and extract the pre-trained vocoder model above:
 ```bash
 mkdir download
@@ -49,7 +85,7 @@ unzip fastspeech2_nosil_ljspeech_ckpt_0.5.zip
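 ### 4. Inference
 This project currently open-sources the inference code for speech editing, personalized speech synthesis, and cross-lingual speech synthesis; more will be open-sourced over time.
-Note: the vocoder currently used differs on English from the [version used when training the model](https://github.com/kan-bayashi/ParallelWaveGAN); you can use the training-time version as your vocoder instead, and the model will be upgraded in a later update.
+Note: for English, synthesized speech uses the vctk_parallel_wavegan.v1.long vocoder by default, available at [this link](https://github.com/kan-bayashi/ParallelWaveGAN); if the use_pt_vocoder parameter is set to False, the Paddle version of the vocoder is used for English instead.
 We provide specific audio files, together with their corresponding text and phoneme files:
 - prompt_wav: the provided audio files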
@@ -59,7 +95,7 @@ unzip fastspeech2_nosil_ljspeech_ckpt_0.5.zip
 ```text
 prompt_wav
 ├── p299_096.wav                 # sample audio file 1
-├── SSB03540428.wav              # sample audio file 2
+├── p243_313.wav                 # sample audio file 2
 └── ...
 ```
@@ -85,11 +121,12 @@ prompt/dev
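 12. ` --target_language`, target language
 13. ` --output_name`, name of the synthesized speech file
 14. ` --task_name`, task name: speech editing, personalized speech synthesis, or cross-lingual speech synthesis
+15. ` --use_pt_vocoder`, whether to use the torch version of the vocoder for English; defaults to True. Set it to False to use the Paddle version of the vocoder for English
 Run the following scripts to run the experiments: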
 ```shell
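 sh run_sedit_en.sh # speech editing task (English)
 sh run_gen_en.sh # personalized speech synthesis task (English)
-sh run_clone_en_to_zh.sh # cross-lingual speech synthesis task (English-to-Chinese cloning)
+sh run_clone_en_to_zh.sh # cross-lingual speech synthesis task (English-to-Chinese voice cloning)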
 ```
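Step 3 of the HTK installation above is a manual edit of `HTKLib/HRec.c`. The following is a minimal sketch that automates that edit, assuming the HTK 3.4.1 line numbers quoted in the README (1626 and 1650) and the `tools/htk` layout from step 2; `patch_hrec_text` and `patch_hrec_file` are hypothetical helpers, not part of this project. It refuses to touch a line that does not contain `dur<=0`, so a different HTK version fails loudly instead of being patched in the wrong place.

```python
from pathlib import Path

# Line numbers quoted in the README for HTK 3.4.1 (other versions may differ).
HREC_LINES = (1626, 1650)


def patch_hrec_text(text, line_numbers=HREC_LINES):
    """Return `text` with 'dur<=0' replaced by 'dur<0' on the given 1-indexed lines."""
    lines = text.split("\n")  # split on '\n' only, so numbering matches a text editor
    for n in line_numbers:
        line = lines[n - 1]
        if "dur<=0" not in line:
            raise ValueError(f"line {n} does not contain 'dur<=0': {line!r}")
        lines[n - 1] = line.replace("dur<=0", "dur<0")
    return "\n".join(lines)


def patch_hrec_file(path, line_numbers=HREC_LINES):
    """Patch HRec.c in place; latin-1 round-trips arbitrary bytes unchanged."""
    path = Path(path)
    text = path.read_bytes().decode("latin-1")
    path.write_bytes(patch_hrec_text(text, line_numbers).encode("latin-1"))
```

Under those assumptions, `patch_hrec_file("tools/htk/HTKLib/HRec.c")` would be run once before compiling HTK in step 4.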
--- a/ernie-sat/align_english.py
+++ b/ernie-sat/align_english.py
+#!/usr/bin/env python
+""" Usage:
+      align_english.py wavfile trsfile outwordfile outphonefile
+"""
+import os
+import sys
+from tqdm import tqdm
+import multiprocessing as mp
+PHONEME = 'tools/aligner/english_envir/english2phoneme/phoneme'
+MODEL_DIR = 'tools/aligner/english'
+HVITE = 'tools/htk/HTKTools/HVite'
+HCOPY = 'tools/htk/HTKTools/HCopy'
+def prep_txt(line, tmpbase, dictfile):
+    words = []
+    line = line.strip()
+    for pun in [',', '.', ':', ';', '!', '?', '"', '(', ')', '--', '---']:
+        line = line.replace(pun, ' ')
+    for wrd in line.split():
+        if (wrd[-1] == '-'):
+            wrd = wrd[:-1]
+        if wrd and wrd[0] == "'":  # a lone '-' is empty after the strip above
+            wrd = wrd[1:]
+        if wrd:
+            words.append(wrd)
+    ds = set([])
+    with open(dictfile, 'r') as fid:
+        for line in fid:
+            ds.add(line.split()[0])
+    unk_words = set([])
+    with open(tmpbase + '.txt', 'w') as fwid:
+        for wrd in words:
+            if (wrd.upper() not in ds):
+                unk_words.add(wrd.upper())
+            fwid.write(wrd + ' ')
+        fwid.write('\n')
+    # generate pronunciations for unknown words using 'letter to sound'
+    with open(tmpbase + '_unk.words', 'w') as fwid:
+        for unk in unk_words:
+            fwid.write(unk + '\n')
+    try:
+        os.system(PHONEME + ' ' + tmpbase + '_unk.words' + ' ' + tmpbase + '_unk.phons')
+    except:
+        print('english2phoneme error!')
+        sys.exit(1)
+    #add unknown words to the standard dictionary, generate a tmp dictionary for alignment 
+    fw = open(tmpbase + '.dict', 'w')
+    with open(dictfile, 'r') as fid:
+        for line in fid:
+            fw.write(line)
+    f = open(tmpbase + '_unk.words', 'r')
+    lines1 = f.readlines()
+    f.close()
+    f = open(tmpbase + '_unk.phons', 'r')
+    lines2 = f.readlines()
+    f.close()
+    for i in range(len(lines1)):
+        wrd = lines1[i].replace('\n', '')
+        phons = lines2[i].replace('\n', '').replace(' ', '')
+        seq = []
+        j = 0
+        while (j < len(phons)):
+            if (phons[j] > 'Z'):
+                if (phons[j] == 'j'):
+                    seq.append('JH')
+                elif (phons[j] == 'h'):
+                    seq.append('HH')
+                else:
+                    seq.append(phons[j].upper())
+                j += 1
+            else:
+                p = phons[j:j+2]
+                if (p == 'WH'):
+                    seq.append('W')
+                elif (p in ['TH', 'SH', 'HH', 'DH', 'CH', 'ZH', 'NG']):
+                    seq.append(p)
+                elif (p == 'AX'):
+                    seq.append('AH0')
+                else:
+                    seq.append(p + '1')
+                j += 2
+        fw.write(wrd + ' ')
+        for s in seq:
+            fw.write(' ' + s)
+        fw.write('\n')
+    fw.close()
+def prep_mlf(txt, tmpbase):
+    with open(tmpbase + '.mlf', 'w') as fwid:
+        fwid.write('#!MLF!#\n')
+        fwid.write('"' + tmpbase + '.lab"\n')
+        fwid.write('sp\n')
+        wrds = txt.split()
+        for wrd in wrds:
+            fwid.write(wrd.upper() + '\n')
+            fwid.write('sp\n')
+        fwid.write('.\n')
+def gen_res(tmpbase, outfile1, outfile2):
+    with open(tmpbase + '.txt', 'r') as fid:
+        words = fid.readline().strip().split()
+    words.reverse()
+    with open(tmpbase + '.aligned', 'r') as fid:
+        lines = fid.readlines()
+    i = 2
+    times1 = []
+    times2 = []
+    while (i < len(lines)):
+        if (len(lines[i].split()) >= 4) and (lines[i].split()[0] != lines[i].split()[1]):
+            phn = lines[i].split()[2]
+            pst = (int(lines[i].split()[0])/1000+125)/10000
+            pen = (int(lines[i].split()[1])/1000+125)/10000
+            times2.append([phn, pst, pen])
+        if (len(lines[i].split()) == 5):
+            if (lines[i].split()[0] != lines[i].split()[1]):
+                wrd = lines[i].split()[-1].strip()
+                st = (int(lines[i].split()[0])/1000+125)/10000
+                j = i + 1
+                while (lines[j] != '.\n') and (len(lines[j].split()) != 5):
+                    j += 1
+                en = (int(lines[j-1].split()[1])/1000+125)/10000
+                times1.append([wrd, st, en])
+        i += 1
+    with open(outfile1, 'w') as fwid:
+        for item in times1:
+            if (item[0] == 'sp'):
+                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' SIL\n')
+            else:
+                wrd = words.pop()
+                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + wrd + '\n')
+    if words:
+        print('not matched::' + tmpbase + '.aligned')
+        sys.exit(1)
+    with open(outfile2, 'w') as fwid:
+        for item in times2:
+            fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + item[0] + '\n')
+def alignment(wav_path, text_string):
+    tmpbase = '/tmp/' + os.environ['USER'] + '_' + str(os.getpid())
+    #prepare wav and trs files
+    # os.system() returns the exit status instead of raising, so check it explicitly
+    if os.system('sox ' + wav_path + ' -r 16000 ' + tmpbase + '.wav remix -') != 0:
+        print('sox error!')
+        return None
+    #prepare clean_transcript file
+    try:
+        prep_txt(text_string, tmpbase, MODEL_DIR + '/dict')
+    except:
+        print('prep_txt error!')
+        return None
+    #prepare mlf file
+    try:
+        with open(tmpbase + '.txt', 'r') as fid:
+            txt = fid.readline()
+        prep_mlf(txt, tmpbase)
+    except:
+        print('prep_mlf error!')
+        return None
+    #prepare scp
+    if os.system(HCOPY + ' -C ' + MODEL_DIR + '/16000/config ' + tmpbase + '.wav' + ' ' + tmpbase + '.plp') != 0:
+        print('HCopy error!')
+        return None
+    #run alignment
+    if os.system(HVITE + ' -a -m -t 10000.0 10000.0 100000.0 -I ' + tmpbase + '.mlf -H ' + MODEL_DIR + '/16000/macros -H ' + MODEL_DIR + '/16000/hmmdefs -i ' + tmpbase +  '.aligned '  + tmpbase + '.dict ' + MODEL_DIR + '/monophones ' + tmpbase + '.plp 2>&1 > /dev/null') != 0:
+        print('HVite error!')
+        return None
+    with open(tmpbase + '.txt', 'r') as fid:
+        words = fid.readline().strip().split()
+    words = txt.strip().split()
+    words.reverse()
+    with open(tmpbase + '.aligned', 'r') as fid:
+        lines = fid.readlines()
+    i = 2
+    times2 = []
+    word2phns = {}
+    current_word = ''
+    index = 0
+    while (i < len(lines)):
+        splited_line = lines[i].strip().split()
+        if (len(splited_line) >= 4) and (splited_line[0] != splited_line[1]):
+            phn = splited_line[2]
+            pst = (int(splited_line[0])/1000+125)/10000
+            pen = (int(splited_line[1])/1000+125)/10000
+            times2.append([phn, pst, pen])
+            # splited_line[-1]!='sp'
+            if len(splited_line)==5:
+                current_word = str(index)+'_'+splited_line[-1]
+                word2phns[current_word] = phn
+                index+=1
+            elif len(splited_line)==4:
+                word2phns[current_word] += ' '+phn 
+        i+=1
+    return times2,word2phns
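The tail of `alignment()` (and of `alignment_zh()` below) turns HVite's `.aligned` output into per-phone timings and a word-to-phones map. The sketch below restates that loop as a standalone function so it can be tried on a hand-written sample; the field layout (`start end phone score [word]`, with the word label only on a word's first phone) is inferred from how the code indexes the split lines, and the sample lines are illustrative, not real HVite output. HTK times are in units of 100 ns, so the conversion `(t/1000 + 125)/10000` equals `t * 1e-7` seconds plus a fixed 12.5 ms offset.

```python
def htk_to_seconds(t):
    """HTK time (100 ns units) -> seconds, with the 12.5 ms offset used by alignment()."""
    return (int(t) / 1000 + 125) / 10000


def parse_aligned(lines):
    """Return ([phone, start_s, end_s] list, {'<index>_<word>': 'ph1 ph2 ...'})."""
    times, word2phns = [], {}
    current_word, index = "", 0
    for line in lines[2:]:  # skip the '#!MLF!#' and '"<file>.lab"' header lines
        fields = line.split()
        if len(fields) < 4 or fields[0] == fields[1]:
            continue  # '.' terminator or a zero-length segment
        phn = fields[2]
        times.append([phn, htk_to_seconds(fields[0]), htk_to_seconds(fields[1])])
        if len(fields) == 5:  # first phone of a word also carries the word label
            current_word = f"{index}_{fields[4]}"
            word2phns[current_word] = phn
            index += 1
        elif len(fields) == 4:  # continuation phone of the current word
            word2phns[current_word] += " " + phn
    return times, word2phns


SAMPLE = [  # illustrative lines only
    "#!MLF!#\n",
    '"/tmp/user_1234.lab"\n',
    "0 1000000 sp -50.0 sp\n",
    "1000000 1500000 HH -30.0 HELLO\n",
    "1500000 2000000 AH0 -20.0\n",
    "2000000 2500000 L -10.0\n",
    "2500000 3000000 OW1 -10.0\n",
    ".\n",
]
```

On this sample, `parse_aligned(SAMPLE)` groups the four phones after `HELLO` under the key `1_HELLO`; the numeric prefix keeps repeated words distinct.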
--- a/ernie-sat/align_mandarin.py
+++ b/ernie-sat/align_mandarin.py
+#!/usr/bin/env python
+""" Usage:
+      align_mandarin.py wavfile trsfile outwordfile outphonefile
+"""
+import os
+import sys
+from tqdm import tqdm
+import multiprocessing as mp
+MODEL_DIR = 'tools/aligner/mandarin'
+HVITE = 'tools/htk/HTKTools/HVite'
+HCOPY = 'tools/htk/HTKTools/HCopy'
+def prep_txt(line, tmpbase, dictfile):
+    words = []
+    line = line.strip()
+    for pun in [',', '.', ':', ';', '!', '?', '"', '(', ')', '--', '---', u'，', u'。', u'：', u'；', u'！', u'？', u'（', u'）']:
+        line = line.replace(pun, ' ')
+    for wrd in line.split():
+        if (wrd[-1] == '-'):
+            wrd = wrd[:-1]
+        if wrd and wrd[0] == "'":  # a lone '-' is empty after the strip above
+            wrd = wrd[1:]
+        if wrd:
+            words.append(wrd)
+    ds = set([])
+    with open(dictfile, 'r') as fid:
+        for line in fid:
+            ds.add(line.split()[0])
+    unk_words = set([])
+    with open(tmpbase + '.txt', 'w') as fwid:
+        for wrd in words:
+            if (wrd not in ds):
+                unk_words.add(wrd)
+            fwid.write(wrd + ' ')
+        fwid.write('\n')
+    return unk_words
+def prep_mlf(txt, tmpbase):
+    with open(tmpbase + '.mlf', 'w') as fwid:
+        fwid.write('#!MLF!#\n')
+        fwid.write('"' + tmpbase + '.lab"\n')
+        fwid.write('sp\n')
+        wrds = txt.split()
+        for wrd in wrds:
+            fwid.write(wrd.upper() + '\n')
+            fwid.write('sp\n')
+        fwid.write('.\n')
+def gen_res(tmpbase, outfile1, outfile2):
+    with open(tmpbase + '.txt', 'r') as fid:
+        words = fid.readline().strip().split()
+    words.reverse()
+    with open(tmpbase + '.aligned', 'r') as fid:
+        lines = fid.readlines()
+    i = 2
+    times1 = []
+    times2 = []
+    while (i < len(lines)):
+        if (len(lines[i].split()) >= 4) and (lines[i].split()[0] != lines[i].split()[1]):
+            phn = lines[i].split()[2]
+            pst = (int(lines[i].split()[0])/1000+125)/10000
+            pen = (int(lines[i].split()[1])/1000+125)/10000
+            times2.append([phn, pst, pen])
+        if (len(lines[i].split()) == 5):
+            if (lines[i].split()[0] != lines[i].split()[1]):
+                wrd = lines[i].split()[-1].strip()
+                st = (int(lines[i].split()[0])/1000+125)/10000
+                j = i + 1
+                while (lines[j] != '.\n') and (len(lines[j].split()) != 5):
+                    j += 1
+                en = (int(lines[j-1].split()[1])/1000+125)/10000
+                times1.append([wrd, st, en])
+        i += 1
+    with open(outfile1, 'w') as fwid:
+        for item in times1:
+            if (item[0] == 'sp'):
+                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' SIL\n')
+            else:
+                wrd = words.pop()
+                fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + wrd + '\n')
+    if words:
+        print('not matched::' + tmpbase + '.aligned')
+        sys.exit(1)
+    with open(outfile2, 'w') as fwid:
+        for item in times2:
+            fwid.write(str(item[1]) + ' ' + str(item[2]) + ' ' + item[0] + '\n')
+def alignment_zh(wav_path, text_string):
+    tmpbase = '/tmp/' + os.environ['USER'] + '_' + str(os.getpid())
+    #prepare wav and trs files
+    # os.system() returns the exit status instead of raising, so check it explicitly
+    if os.system('sox ' + wav_path + ' -r 16000 -b 16 ' + tmpbase + '.wav remix -') != 0:
+        print('sox error!')
+        return None
+    #prepare clean_transcript file
+    try:
+        unk_words = prep_txt(text_string, tmpbase, MODEL_DIR + '/dict')
+        if unk_words:
+            print('Error! Please add the following words to dictionary:')
+            for unk in unk_words:
+                print("Invalid word: ", unk)
+    except:
+        print('prep_txt error!')
+        return None
+    #prepare mlf file
+    try:
+        with open(tmpbase + '.txt', 'r') as fid:
+            txt = fid.readline()
+        prep_mlf(txt, tmpbase)
+    except:
+        print('prep_mlf error!')
+        return None
+    #prepare scp
+    if os.system(HCOPY + ' -C ' + MODEL_DIR + '/16000/config ' + tmpbase + '.wav' + ' ' + tmpbase + '.plp') != 0:
+        print('HCopy error!')
+        return None
+    #run alignment
+    if os.system(HVITE + ' -a -m -t 10000.0 10000.0 100000.0 -I ' + tmpbase + '.mlf -H ' + MODEL_DIR + '/16000/macros -H ' + MODEL_DIR + '/16000/hmmdefs -i ' + tmpbase +  '.aligned '  + MODEL_DIR + '/dict ' + MODEL_DIR + '/monophones ' + tmpbase + '.plp 2>&1 > /dev/null') != 0:
+        print('HVite error!')
+        return None
+    with open(tmpbase + '.txt', 'r') as fid:
+        words = fid.readline().strip().split()
+    words = txt.strip().split()
+    words.reverse()
+    with open(tmpbase + '.aligned', 'r') as fid:
+        lines = fid.readlines()
+    i = 2
+    times2 = []
+    word2phns = {}      
+    current_word = ''
+    index = 0
+    while (i < len(lines)):
+        splited_line = lines[i].strip().split()
+        if (len(splited_line) >= 4) and (splited_line[0] != splited_line[1]):
+            phn = splited_line[2]
+            pst = (int(splited_line[0])/1000+125)/10000
+            pen = (int(splited_line[1])/1000+125)/10000
+            times2.append([phn, pst, pen])
+            # splited_line[-1]!='sp'
+            if len(splited_line)==5:
+                current_word = str(index)+'_'+splited_line[-1]
+                word2phns[current_word] = phn
+                index+=1
+            elif len(splited_line)==4:
+                word2phns[current_word] += ' '+phn
+        i+=1
+    return times2,word2phns
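Both aligners rely on `prep_txt()` to split the transcript into words and check them against the HTK pronunciation dictionary; in the Mandarin version any word missing from `tools/aligner/mandarin/dict` is only printed and alignment proceeds, so the transcript should already be segmented into dictionary words. A minimal stdlib sketch of that check follows. `tokenize`, `load_dict_words` and `unknown_words` are hypothetical names, the dictionary lines in the usage are made-up examples, and the real dictionary is assumed to be `WORD PHONE PHONE ...` per line, as the code's `line.split()[0]` suggests. The tokenizer skips tokens that become empty after a trailing `-` is stripped (for example the lone `-` that `'---'` leaves behind once `'--'` has been replaced).

```python
# Punctuation list copied from prep_txt() in align_mandarin.py.
PUNCTUATION = [',', '.', ':', ';', '!', '?', '"', '(', ')', '--', '---',
               '，', '。', '：', '；', '！', '？', '（', '）']


def tokenize(line):
    """Replace punctuation with spaces, then strip a trailing '-' and a leading "'"."""
    line = line.strip()
    for pun in PUNCTUATION:
        line = line.replace(pun, ' ')
    words = []
    for wrd in line.split():
        if wrd[-1] == '-':
            wrd = wrd[:-1]
        if wrd and wrd[0] == "'":  # guard: a lone '-' becomes '' above
            wrd = wrd[1:]
        if wrd:
            words.append(wrd)
    return words


def load_dict_words(dict_lines):
    """The first whitespace-separated field of each non-empty line is the headword."""
    return {line.split()[0] for line in dict_lines if line.strip()}


def unknown_words(text, dict_words, upper=False):
    """Words not found in the dictionary, in order of first appearance.

    upper=True mimics the English aligner, which looks words up in upper case.
    """
    seen, missing = set(), []
    for wrd in tokenize(text):
        key = wrd.upper() if upper else wrd
        if key not in dict_words and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing
```

Returning an ordered list rather than a set (as `prep_txt()` does) only makes the report deterministic; the membership test is the same.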
--- a/ernie-sat/dataset.py
+++ b/ernie-sat/dataset.py
+import paddle
+import numpy as np
+import math
+def pad_list(xs, pad_value):
+    """Perform padding for the list of tensors.
+    Args:
+        xs (List): List of Tensors [(T_1, `*`), (T_2, `*`), ..., (T_B, `*`)].
+        pad_value (float): Value for padding.
+    Returns:
+        Tensor: Padded tensor (B, Tmax, `*`).
+    Examples:
+        >>> x = [torch.ones(4), torch.ones(2), torch.ones(1)]
+        >>> x
+        [tensor([1., 1., 1., 1.]), tensor([1., 1.]), tensor([1.])]
+        >>> pad_list(x, 0)
+        tensor([[1., 1., 1., 1.],
+                [1., 1., 0., 0.],
+                [1., 0., 0., 0.]])
+    """
+    n_batch = len(xs)
+    max_len = max(paddle.shape(x)[0] for x in xs)
+    pad = paddle.full((n_batch, max_len), pad_value, dtype = xs[0].dtype)
+    for i in range(n_batch):
+        pad[i, : paddle.shape(xs[i])[0]] = xs[i]
+    return pad
+def pad_to_longformer_att_window(text, max_len, max_tlen,attention_window):
+    round = max_len % attention_window
+    if round != 0:
+        max_tlen += (attention_window - round)
+        n_batch = paddle.shape(text)[0]
+        text_pad = paddle.zeros((n_batch, max_tlen, *paddle.shape(text[0])[1:]), dtype=text.dtype)
+        for i in range(n_batch):
+            text_pad[i, : paddle.shape(text[i])[0]] = text[i]
+    else:
+        text_pad = text[:, : max_tlen]
+    return text_pad, max_tlen
+def make_pad_mask(lengths, xs=None, length_dim=-1):
+    """Make mask tensor containing indices of padded part.
+    Args:
+        lengths (LongTensor or List): Batch of lengths (B,).
+        xs (Tensor, optional): The reference tensor.
+            If set, masks will be the same shape as this tensor.
+        length_dim (int, optional): Dimension indicator of the above tensor.
+            See the example.
+    Returns:
+        Tensor: Mask tensor containing indices of padded part.
+                dtype=torch.uint8 in PyTorch 1.2-
+                dtype=torch.bool in PyTorch 1.2+ (including 1.2)
+    Examples:
+        With only lengths.
+        >>> lengths = [5, 3, 2]
+        >>> make_pad_mask(lengths)
+        masks = [[0, 0, 0, 0 ,0],
+                 [0, 0, 0, 1, 1],
+                 [0, 0, 1, 1, 1]]
+        With the reference tensor.
+        >>> xs = torch.zeros((3, 2, 4))
+        >>> make_pad_mask(lengths, xs)
+        tensor([[[0, 0, 0, 0],
+                 [0, 0, 0, 0]],
+                [[0, 0, 0, 1],
+                 [0, 0, 0, 1]],
+                [[0, 0, 1, 1],
+                 [0, 0, 1, 1]]], dtype=torch.uint8)
+        >>> xs = torch.zeros((3, 2, 6))
+        >>> make_pad_mask(lengths, xs)
+        tensor([[[0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1]],
+                [[0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1]],
+                [[0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1]]], dtype=torch.uint8)
+        With the reference tensor and dimension indicator.
+        >>> xs = torch.zeros((3, 6, 6))
+        >>> make_pad_mask(lengths, xs, 1)
+        tensor([[[0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [1, 1, 1, 1, 1, 1]],
+                [[0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1]],
+                [[0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1]]], dtype=torch.uint8)
+        >>> make_pad_mask(lengths, xs, 2)
+        tensor([[[0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1],
+                 [0, 0, 0, 0, 0, 1]],
+                [[0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1],
+                 [0, 0, 0, 1, 1, 1]],
+                [[0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1],
+                 [0, 0, 1, 1, 1, 1]]], dtype=torch.uint8)
+    """
+    if length_dim == 0:
+        raise ValueError("length_dim cannot be 0: {}".format(length_dim))
+    if not isinstance(lengths, list):
+        lengths = list(lengths)
+    # print('lengths', lengths)
+    bs = int(len(lengths))
+    if xs is None:
+        maxlen = int(max(lengths))
+    else:
+        maxlen = paddle.shape(xs)[length_dim]
+    seq_range = paddle.arange(0, maxlen, dtype=paddle.int64)
+    seq_range_expand = paddle.expand(paddle.unsqueeze(seq_range, 0), (bs, maxlen))
+    seq_length_expand = paddle.unsqueeze(paddle.to_tensor(lengths), -1)
+    # print('seq_length_expand', paddle.shape(seq_length_expand))
+    # print('seq_range_expand', paddle.shape(seq_range_expand))
+    mask = seq_range_expand >= seq_length_expand
+    if xs is not None:
+        assert paddle.shape(xs)[0] == bs, (paddle.shape(xs)[0], bs)
+        if length_dim < 0:
+            length_dim = len(paddle.shape(xs)) + length_dim
+        # ind = (:, None, ..., None, :, , None, ..., None)
+        ind = tuple(
+            slice(None) if i in (0, length_dim) else None for i in range(len(paddle.shape(xs)))
+        )
+        # print('0:', paddle.shape(mask))
+        # print('1:', paddle.shape(mask[ind]))
+        # print('2:', paddle.shape(xs))
+        mask = paddle.expand(mask[ind], paddle.shape(xs))
+    return mask
+def make_non_pad_mask(lengths, xs=None, length_dim=-1):
+    """Make mask tensor containing indices of non-padded part.
+    Args:
+        lengths (LongTensor or List): Batch of lengths (B,).
+        xs (Tensor, optional): The reference tensor.
+            If set, masks will be the same shape as this tensor.
+        length_dim (int, optional): Dimension indicator of the above tensor.
+            See the example.
+    Returns:
+        ByteTensor: mask tensor containing indices of non-padded part.
+                    dtype=torch.uint8 in PyTorch 1.2-
+                    dtype=torch.bool in PyTorch 1.2+ (including 1.2)
+    Examples:
+        With only lengths.
+        >>> lengths = [5, 3, 2]
+        >>> make_non_pad_mask(lengths)
+        masks = [[1, 1, 1, 1 ,1],
+                 [1, 1, 1, 0, 0],
+                 [1, 1, 0, 0, 0]]
+        With the reference tensor.
+        >>> xs = torch.zeros((3, 2, 4))
+        >>> make_non_pad_mask(lengths, xs)
+        tensor([[[1, 1, 1, 1],
+                 [1, 1, 1, 1]],
+                [[1, 1, 1, 0],
+                 [1, 1, 1, 0]],
+                [[1, 1, 0, 0],
+                 [1, 1, 0, 0]]], dtype=torch.uint8)
+        >>> xs = torch.zeros((3, 2, 6))
+        >>> make_non_pad_mask(lengths, xs)
+        tensor([[[1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0]],
+                [[1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0]],
+                [[1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0]]], dtype=torch.uint8)
+        With the reference tensor and dimension indicator.
+        >>> xs = torch.zeros((3, 6, 6))
+        >>> make_non_pad_mask(lengths, xs, 1)
+        tensor([[[1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [0, 0, 0, 0, 0, 0]],
+                [[1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0]],
+                [[1, 1, 1, 1, 1, 1],
+                 [1, 1, 1, 1, 1, 1],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0],
+                 [0, 0, 0, 0, 0, 0]]], dtype=torch.uint8)
+        >>> make_non_pad_mask(lengths, xs, 2)
+        tensor([[[1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0],
+                 [1, 1, 1, 1, 1, 0]],
+                [[1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0],
+                 [1, 1, 1, 0, 0, 0]],
+                [[1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0],
+                 [1, 1, 0, 0, 0, 0]]], dtype=torch.uint8)
+    """
+    return ~make_pad_mask(lengths, xs, length_dim)
+def phones_masking(xs_pad, src_mask, align_start, align_end, align_start_lengths, mlm_prob, mean_phn_span, span_boundary=None):
+    bz, sent_len, _ = xs_pad.shape
+    mask_num_lower = math.ceil(sent_len * mlm_prob)
+    masked_position = np.zeros((bz, sent_len))
+    y_masks = None
+    # y_masks = torch.ones(bz,sent_len,sent_len,device=xs_pad.device,dtype=xs_pad.dtype)
+    # tril_masks = torch.tril(y_masks)
+    if mlm_prob == 1.0:
+        masked_position += 1
+        # y_masks = tril_masks
+    elif mean_phn_span == 0:
+        # only speech 
+        length = sent_len
+        mean_phn_span = min(length*mlm_prob//3, 50)
+        masked_phn_indices = random_spans_noise_mask(length,mlm_prob, mean_phn_span).nonzero()
+        masked_position[:,masked_phn_indices]=1
+    else:
+        for idx in range(bz):
+            if span_boundary is not None:
+                for s,e in zip(span_boundary[idx][::2], span_boundary[idx][1::2]):
+                    masked_position[idx, s:e] = 1
+                    # y_masks[idx, :, s:e] = tril_masks[idx, :, s:e]
+                    # y_masks[idx, e:, s:e ] = 0
+            else:
+                length = align_start_lengths[idx].item()
+                if length<2:
+                    continue
+                masked_phn_indices = random_spans_noise_mask(length,mlm_prob, mean_phn_span).nonzero()
+                masked_start = align_start[idx][masked_phn_indices].tolist()
+                masked_end = align_end[idx][masked_phn_indices].tolist()
+                for s,e in zip(masked_start, masked_end):
+                    masked_position[idx, s:e] = 1
+                    # y_masks[idx, :, s:e] = tril_masks[idx, :, s:e]
+                    # y_masks[idx, e:, s:e ] = 0
+    non_eos_mask = paddle.reshape(src_mask, xs_pad.shape[:2]).astype('float32').numpy()
+    masked_position = masked_position * non_eos_mask
+    # y_masks = src_mask & y_masks.bool()
+    return paddle.cast(paddle.to_tensor(masked_position), paddle.bool), y_masks
+def get_segment_pos(speech_pad, text_pad, align_start, align_end, align_start_lengths,sega_emb):
+    bz, speech_len, _ = speech_pad.shape
+    _, text_len = text_pad.shape
+    # text_segment_pos = paddle.zeros_like(text_pad)
+    # speech_segment_pos = paddle.zeros((bz, speech_len),dtype=text_pad.dtype)
+    text_segment_pos = np.zeros((bz, text_len)).astype('int64')
+    speech_segment_pos = np.zeros((bz, speech_len)).astype('int64')
+    if not sega_emb:
+        text_segment_pos = paddle.to_tensor(text_segment_pos)
+        speech_segment_pos = paddle.to_tensor(speech_segment_pos)
+        return speech_segment_pos, text_segment_pos
+    for idx in range(bz):
+        align_length = align_start_lengths[idx].item()
+        for j in range(align_length):
+            s,e = align_start[idx][j].item(), align_end[idx][j].item()
+            speech_segment_pos[idx][s:e] = j+1
+            text_segment_pos[idx][j] = j+1
+    text_segment_pos = paddle.to_tensor(text_segment_pos)
+    speech_segment_pos = paddle.to_tensor(speech_segment_pos)
+    return speech_segment_pos, text_segment_pos
\ No newline at end of file
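`make_pad_mask`, `get_segment_pos` and the `span_boundary` branch of `phones_masking` all reduce to simple index bookkeeping. The stdlib sketch below restates that bookkeeping for a single utterance (no batch dimension, plain lists instead of Paddle tensors) so the conventions are easy to check: `True` in a pad mask marks padding, phone `j` (0-based) gets segment id `j + 1` on its text token and on speech frames `[start, end)`, and `span_boundary` is read as a flat `[s0, e0, s1, e1, ...]` list of frame ranges to mask. The function names are illustrative, not part of the project.

```python
def pad_mask(lengths, maxlen=None):
    """True marks padded positions (same convention as make_pad_mask)."""
    maxlen = max(lengths) if maxlen is None else maxlen
    return [[t >= n for t in range(maxlen)] for n in lengths]


def segment_positions(align_start, align_end, speech_len, text_len):
    """Segment ids for one utterance; 0 means 'not covered by any phone'."""
    speech_pos = [0] * speech_len
    text_pos = [0] * text_len
    for j, (s, e) in enumerate(zip(align_start, align_end)):
        for frame in range(s, min(e, speech_len)):  # numpy slicing also clips at the end
            speech_pos[frame] = j + 1
        text_pos[j] = j + 1
    return speech_pos, text_pos


def span_mask(span_boundary, length):
    """Mask frames inside each (start, end) pair of a flat boundary list."""
    mask = [False] * length
    for s, e in zip(span_boundary[::2], span_boundary[1::2]):
        for frame in range(s, min(e, length)):
            mask[frame] = True
    return mask
```

For lengths `[5, 3, 2]`, `pad_mask` reproduces the first docstring example of `make_pad_mask` with `False`/`True` in place of 0/1.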
--- a/ernie-sat/inference.py
+++ b/ernie-sat/inference.py
--- a/ernie-sat/paddlespeech/t2s/datasets/get_feats.py
+++ b/ernie-sat/paddlespeech/t2s/datasets/get_feats.py
@@ -63,11 +63,7 @@ class LogMelFBank():
            window=self.window,
            center=self.center,
            pad_mode=self.pad_mode)
-        f = open('/mnt/home/xiaoran/projects/wave_summit/espnet_dual_mask/tmp_var_stft.out.1', 'w')
-        print('stft shape is', D.size())
-        # for item in [round(item, 6) for item in output["speech"][0].tolist()]:
-        #     f.write(str(item)+'\n')
-        # f.close()
        return D
    def _spectrogram(self, wav):

--- a/ernie-sat/phn_mapping.txt
+++ b/ernie-sat/phn_mapping.txt
-ou3 ou3
-a3 a3
-eng4 eng4
-u1 u1
-vn2 vn2
-uang3 uang3
-ang3 ang3
-ua1 ua1
-ou1 ou1
-in3 in3
-uai4 uai4
-van1 van1
-en2 en2
-ia4 ia4
-uai2 uai2
-iang4 iang4
-ai3 ai3
-sp sp
-in1 in1
-uai3 uai3
-ve1 ve1
-ou4 ou4
-d d
-ang2 ang2
-iang3 iang3
-o1 o1
-iao3 iao3
-an1 an1
-en5 en5
-ong3 ong3
-e5 e5
-e3 e3
-van3 van3
-i3 i3
-i2 i2
-uo4 uo4
-i1 i1
-in2 in2
-v1 v1
-uang4 uang4
-en3 en3
-ian5 ian5
-ie3 ie3
-o2 o2
-x x
-iang2 iang2
-ei1 ei1
-uang2 uang2
-t t
-ao4 ao4
-ch ch
-o3 o3
-en1 en1
-ie1 ie1
-uan3 uan3
-uo1 uo1
-iang5 iang5
-iong1 iong1
-l l
-a5 a5
-an4 an4
-u2 u2
-ei3 ei3
-uo3 uo3
-ai2 ai2
-v3 v3
-k k
-uan4 uan4
-ian2 ian2
-ei2 ei2
-sh sh
-g g
-ong2 ong2
-ing1 ing1
-vn3 vn3
-r r
-ong1 ong1
-ao1 ao1
-ua3 ua3
-ia1 ia1
-u3 u3
-s s
-b b
-e2 e2
-ua4 ua4
-iang1 iang1
-ie4 ie4
-ou5 ou5
-ing4 ing4
-ai1 ai1
-iong4 iong4
-uo5 uo5
-ei5 ei5
-ueng1 ueng1
-ou2 ou2
-e1 e1
-f f
-en4 en4
-v2 v2
-iao2 iao2
-ie2 ie2
-van2 van2
-eng1 eng1
-ai4 ai4
-uo2 uo2
-iao1 iao1
-in4 in4
-er4 er4
-e4 e4
-uan1 uan1
-ia3 ia3
-ao2 ao2
-u4 u4
-ei4 ei4
-eng3 eng3
-z z
-j j
-ve3 ve3
-n n
-an3 an3
-uan2 uan2
-o5 o5
-ve2 ve2
-ang4 ang4
-er2 er2
-ia5 ia5
-ian4 ian4
-er5 er5
-ia2 ia2
-eng2 eng2
-ie5 ie5
-ang1 ang1
-er3 er3
-ian1 ian1
-<unk> <unk>
-c c
-v4 v4
-iao4 iao4
-a4 a4
-m m
-a2 a2
-ong4 ong4
-q q
-uang1 uang1
-an2 an2
-ua2 ua2
-zh zh
-ing2 ing2
-ve4 ve4
-van4 van4
-vn4 vn4
-iong3 iong3
-i4 i4
-ian3 ian3
-ing3 ing3
-p p
-iong2 iong2
-ao3 ao3
-vn1 vn1
-uai1 uai1
-a1 a1
-o4 o4
-h h
-uenr4 un4 ee er5
-iaor3 iao3 ee er2
-iour4 ii iu4 ee er2
-iangr4 ii iang4 ee er5
-iou3 ii iu3 
-sil sp
-iour1 iu1 ee er5
-vn5 vn1
-ir1 i1 ee er2
-vanr1 van1 ee er2
-vanr2 van2 ee er5
-air3 ai3 ee er2
-uangr4 uu uang1
-enr1 en1 ee er2
-iour3 ii iu3 ee er5
-uenr1 un1 ee er5
-uenr3 un3 ee er5
-or2 o2 ee er2
-anr3 an3 ee er5
-ai5 ai4
-iaor2 iao2 ee er2
-uanr3 uan3 ee er5
-uanr2 uu uan4 ee er2
-uen1 un1
-ua5 uu ua2
-uen3 uu un3 
-iii4 ix4
-uor1 uo1 ee er5
-our2 ou5 ee er2
-uei1 uu ui1
-vr3 v3 ee er5
-uenr2 un2 ee er5
-uanr5 uu uan2 ee er5
-iiir4 ix4 ee er5
-iiir1 ix1 ee er5
-ur2 u3 ee er5 
-eng5 eng1
-ingr1 ii ing1 ee er2
-ii4 iy4
-ve5 vv ve1 
-？ <unk>
-ii1 iy1
-ao5 ao3
-v5 vv v2
-ing5 ing2
-i5 i1 
-iou5 ii iu3
-uen4 un4
-our4 ou4 ee er5
-io3 ii iu3
-ar4 a4 ee er5
-ingr2 ing2 ee er5
-ingr4 ing4 ee er5
-ir3 e5 ee er5
-iaor4 iao4 ee er5 
-ii2 ix2
-uanr4 uan4 ee er5
-enr5 en4 ee er2
-ianr3 ian3 ee er5 
-uei5 uu ui2
-ianr4 ian4 ee er2
-iar4 ia4 ee er2
-uair4 uai1 ee er2
-enr2 en2 ee er5
-iii1 ix1
-ver3 ve3 ee er2
-ianr5 ian3 ee er5 
-ong5 ong1
-air2 ai2 ee er5
-angr4 ang4 ee er5
-iii5 ix2
-ang5 ang1
-iou1 iu1
-uar4 ua4 ee er5
-ur4 u4 ee er5 
-iou4 iu4
-iou2 ii iu2
-in5 in1
-uor2 uo2 ee er5
-uar2 ua2 ee er5
-uei2 uu ui2
-<pad> <unk> 
-anr1 an1 ee er5
-ar5 a1 ee er5
-uen2 un2
-eir4 ei4 ee er2
-ingr3 ii ing3 ee er5
-aor4 ao4 ee er5
-enr4 en4 ee er5 
-iao5 ii iao2 
-iii2 ix2
-er1 e1 ee er5
-iaor1 iao1 ee er5
-ueir1 ui1 ee er2
-inr4 in4 ee er5
-ueir2 ui4 ee er5
-uan5 ai2 ee er5
-ir4 i4 ee er2
-ur1 u1 ee er5
-iour2 iu1 ee er2
-ar2 a2 ee er5
-an5 an2
-iii3 ix3
-ver4 vv ve4 ee er2
-。 <unk>
-aor3 ao3 ee er5
-iong5 ii iong4
-u5 u4
-air4 ai4 ee er5
-ii3 iy3
-our5 ou4 ee er5
-inr1 in1 ee er5
-uor3 uo3 ee er5
-van5 van4
-ur5 u4 ee er2
-aor5 ao4 ee er5
-engr4 eng4 ee er2
-ueir4 ui4 ee er5
-<eos> <unk>
-angr2 ang2 ee er2
-ii5 iy5
-vnr2 vn2 ee er5
-enr3 en3 ee er5
-uar1 ua1 ee er2
-vanr4 van4 ee er5
-， <unk>
-uor5 uo3 ee er5
-uei4 ui4
-aor1 ao1 ee er5
-uen5 uu un4
-anr4 an4 ee er5
-iar1 ia1 ee er5
-vanr3 van3 ee er5
-uei3 uu ui3
-！ <unk>
-io1 ii uo5 
-spl <unk>
-ar3 a3 ee er5
-our3 ou3 ee er5
-ueir3 ui3 ee er5
-ianr2 ian3 ee er5
-ueng4 uu un4
-ianr1 ian1 ee er5
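The removed table above maps each tone-marked pinyin final to one or more target phones, one entry per line in the form `source target1 target2 ...`. For example, the erhua final `uenr4` expands to `un4 ee er5`, and `sil` becomes `sp`. Below is a minimal Python sketch of loading and applying such a table. The function names are hypothetical. Falling back to `<unk>` for unmapped symbols is an assumption, modeled on the table's own mapping of punctuation to `<unk>`.

```python
from typing import Dict, Iterable, List


def load_phone_map(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form 'source target1 target2 ...' into a dict."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()  # also discards trailing spaces such as in 'iou3 ii iu3 '
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mapping[fields[0]] = fields[1:]
    return mapping


def map_phones(phones: Iterable[str], mapping: Dict[str, List[str]],
               unk: str = "<unk>") -> List[str]:
    """Expand each source phone into its target phones; unmapped symbols become `unk`."""
    result: List[str] = []
    for phone in phones:
        result.extend(mapping.get(phone, [unk]))
    return result


TABLE = """\
sil sp
iou3 ii iu3
uenr4 un4 ee er5
"""

phone_map = load_phone_map(TABLE)
print(map_phones(["sil", "iou3", "uenr4", "xyz"], phone_map))
# ['sp', 'ii', 'iu3', 'un4', 'ee', 'er5', '<unk>']
```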
--- a/ernie-sat/prompt_wav/SSB03420111.wav
+++ b/ernie-sat/prompt_wav/SSB03420111.wav
--- a/ernie-sat/prompt_wav/SSB03540015.wav
+++ b/ernie-sat/prompt_wav/SSB03540015.wav
--- a/ernie-sat/prompt_wav/SSB03540307.wav
+++ b/ernie-sat/prompt_wav/SSB03540307.wav
--- a/ernie-sat/prompt_wav/SSB03540428.wav
+++ b/ernie-sat/prompt_wav/SSB03540428.wav
--- a/ernie-sat/prompt_wav/p323_083.wav
+++ b/ernie-sat/prompt_wav/p323_083.wav
--- a/ernie-sat/run_clone_en_to_zh.sh
+++ b/ernie-sat/run_clone_en_to_zh.sh
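 # en --> zh cross-lingual speech synthesis
-# Using the speech for Prompt_003_new ("This was not the show for me.") to synthesize: '今天天气很好' ("The weather is very nice today")
+# Using Prompt_003_new as the prompt speech ("This was not the show for me.") to synthesize: '今天天气很好' ("The weather is very nice today")
+# Note: new_str must consist of Chinese characters; otherwise preprocessing keeps only the Chinese characters, and the synthesized speech is that of the preprocessed Chinese text.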
 python inference.py \
 --task_name cross-lingual_clone \
---model_name paddle_checkpoint_ench \
+--model_name paddle_checkpoint_dual_mask_enzh \
 --uid Prompt_003_new \
---new_str '今天天气很好' \
+--new_str '今天天气很好.' \
 --prefix ./prompt/dev/ \
 --source_language english \
 --target_language chinese \
---output_name pred_zh.wav \
+--output_name pred_clone.wav \
+--use_pt_vocoder False \
 --voc pwgan_aishell3 \
 --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
 --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
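The note in this script says that `new_str` is reduced to its Chinese characters before synthesis, so `'今天天气很好.'` is synthesized as `今天天气很好`. The sketch below illustrates that kind of filtering. It is not the project's preprocessing code. Treating "Chinese characters" as the CJK Unified Ideographs block (U+4E00 to U+9FFF) is an assumption.

```python
import re

# Assumption: "Chinese characters" means the CJK Unified Ideographs block, U+4E00 to U+9FFF.
_NON_HAN = re.compile("[^\u4e00-\u9fff]")


def keep_chinese(text: str) -> str:
    """Drop every character outside the assumed Han range (punctuation, Latin letters, digits, spaces)."""
    return _NON_HAN.sub("", text)


print(keep_chinese("今天天气很好."))  # 今天天气很好
```

Under this assumption, an all-English `new_str` would be reduced to an empty string, leaving nothing to synthesize.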

--- a/ernie-sat/run_gen_en.sh
+++ b/ernie-sat/run_gen_en.sh
 # English-only speech synthesis
-# Using the speech for p299_096 ("This was not the show for me.") to synthesize: 'I enjoy my life.'
+# Example: using the speech for p299_096 as the prompt speech ("This was not the show for me.") to synthesize: 'I enjoy my life.'
 python inference.py \
 --task_name synthesize \
@@ -9,7 +9,8 @@ python inference.py \
 --prefix ./prompt/dev/ \
 --source_language english \
 --target_language english \
---output_name pred.wav \
+--output_name pred_gen.wav \
+--use_pt_vocoder True \
 --voc pwgan_aishell3 \
 --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
 --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \

--- a/ernie-sat/run_sedit_en.sh
+++ b/ernie-sat/run_sedit_en.sh
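 # English-only speech editing
-# Edit the original speech for p243_new ("For that reason cover should not be given.") into the speech for 'for that reason cover is impossible to be given.'
+# Example: edit the original speech for p243_new ("For that reason cover should not be given.") into the speech for 'for that reason cover is impossible to be given.'
+# NOTE: the speech editing task currently supports replacing or inserting text at only one position in the sentence.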
 python inference.py \
 --task_name edit \
@@ -9,7 +10,8 @@ python inference.py \
 --prefix ./prompt/dev/ \
 --source_language english \
 --target_language english \
---output_name pred.wav \
+--output_name pred_edit.wav \
+--use_pt_vocoder True \
 --voc pwgan_aishell3 \
 --voc_config download/pwg_aishell3_ckpt_0.5/default.yaml \
 --voc_ckpt download/pwg_aishell3_ckpt_0.5/snapshot_iter_1000000.pdz \
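The NOTE in this script restricts speech editing to a single replaced or inserted span. You can check a sentence pair against that restriction before running `inference.py` by diffing the two word sequences with Python's `difflib` and counting the non-matching regions. The helper below is a hypothetical pre-check, not part of the project. As a simplifying assumption, it lowercases the text and ignores punctuation.

```python
import difflib
import re


def _words(sentence: str) -> list:
    # Lowercase and keep only letters/apostrophes, so "given." matches "given".
    return re.findall(r"[a-z']+", sentence.lower())


def is_single_span_edit(old: str, new: str) -> bool:
    """True if `new` differs from `old` in exactly one contiguous region of words."""
    matcher = difflib.SequenceMatcher(a=_words(old), b=_words(new), autojunk=False)
    changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    return len(changed) == 1


old = "For that reason cover should not be given."
new = "for that reason cover is impossible to be given."
print(is_single_span_edit(old, new))  # True: "should not" -> "is impossible to"
```

Changing two separate positions (for example both articles in "the cat sat on the mat") produces two non-equal opcodes, so the check returns `False`.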

--- a/ernie-sat/sedit_arg_parser.py
+++ b/ernie-sat/sedit_arg_parser.py
@@ -73,8 +73,8 @@ def parse_args():
    parser.add_argument(
        "--ngpu", type=int, default=1, help="if ngpu == 0, use cpu.")
-    parser.add_argument("--test_metadata", type=str, help="test metadata.")
+    # parser.add_argument("--test_metadata", type=str, help="test metadata.")
-    parser.add_argument("--output_dir", type=str, help="output dir.")
+    # parser.add_argument("--output_dir", type=str, help="output dir.")
    parser.add_argument("--model_name", type=str, help="model name")
    parser.add_argument("--uid", type=str, help="uid")
@@ -86,7 +86,7 @@ def parse_args():
    parser.add_argument("--target_language", type=str, help="target language")
    parser.add_argument("--output_name", type=str, help="output name")
    parser.add_argument("--task_name", type=str, help="task name")
+    parser.add_argument("--use_pt_vocoder", default=True, help="whether to use the PyTorch version of the vocoder. [Note: English only.]")
    # pre
    args = parser.parse_args()
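The new `--use_pt_vocoder` option has a default but no `type`, so argparse stores whatever string the shell passes. `--use_pt_vocoder False` therefore yields the string `'False'`, which is truthy in Python. This diff does not show `inference.py`, so it is unclear whether that code compares against the string explicitly; if it does not, the `False` in the run scripts would behave like `True`. A common remedy is an explicit string-to-bool converter. `str2bool` and `build_parser` below are hypothetical helpers, sketched as one possible fix.

```python
import argparse


def str2bool(value):
    """Convert common textual spellings of a boolean; reject anything else."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_pt_vocoder", type=str2bool, default=True,
                        help="whether to use the PyTorch version of the vocoder (English only)")
    return parser


# Without a `type`, the flag value stays a string.
naive = argparse.ArgumentParser()
naive.add_argument("--use_pt_vocoder", default=True)
print(repr(naive.parse_args(["--use_pt_vocoder", "False"]).use_pt_vocoder))  # 'False' (a truthy str)
print(build_parser().parse_args(["--use_pt_vocoder", "False"]).use_pt_vocoder)  # False
```

Argparse applies `type` only to string defaults, so `default=True` remains a real boolean.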

--- a/ernie-sat/tmp/tmp_pkl.Prompt_003_new
+++ b/ernie-sat/tmp/tmp_pkl.Prompt_003_new
--- a/ernie-sat/tmp/tmp_pkl.p243_new
+++ b/ernie-sat/tmp/tmp_pkl.p243_new
--- a/ernie-sat/tmp/tmp_pkl.p299_096
+++ b/ernie-sat/tmp/tmp_pkl.p299_096
--- a/ernie-sat/tools/.DS_Store
+++ b/ernie-sat/tools/.DS_Store
--- a/ernie-sat/tools/aligner/english/11025/config
+++ b/ernie-sat/tools/aligner/english/11025/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 907.02947845804988
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
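HTK expresses these durations in units of 100 ns. That is why the aligner configs for 8000, 11025, and 16000 Hz differ only in `SOURCERATE`, which is the sample period (10^7 / sample rate). The shared `TARGETRATE = 100000.0` is a 10 ms frame shift, and `WINDOWSIZE = 250000.0` is a 25 ms analysis window. A small Python sketch of these conversions follows; the helper names are illustrative.

```python
HTK_TIME_UNITS_PER_SECOND = 10_000_000  # HTK durations are in units of 100 ns


def source_rate(sample_rate_hz: float) -> float:
    """Sample period in HTK units, as used for SOURCERATE."""
    return HTK_TIME_UNITS_PER_SECOND / sample_rate_hz


def htk_units_to_ms(units: float) -> float:
    """Convert an HTK duration to milliseconds (1 ms = 10,000 units of 100 ns)."""
    return units / 10_000


def window_length_samples(window_size_units: float, sample_rate_hz: float) -> float:
    """Number of waveform samples spanned by a window of the given HTK duration."""
    return window_size_units * sample_rate_hz / HTK_TIME_UNITS_PER_SECOND


for rate in (8000, 11025, 16000):
    print(rate, source_rate(rate))
print(htk_units_to_ms(100000.0), htk_units_to_ms(250000.0))  # 10.0 25.0
print(window_length_samples(250000.0, 16000))  # 400.0
```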
--- a/ernie-sat/tools/aligner/english/11025/hmmdefs
+++ b/ernie-sat/tools/aligner/english/11025/hmmdefs
--- a/ernie-sat/tools/aligner/english/11025/macros
+++ b/ernie-sat/tools/aligner/english/11025/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
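The `39` in `<VECSIZE>` and `<VARIANCE>` follows from the `PLP_0_D_A_Z` target kind in the config:

- 12 PLP cepstra plus the `_0` qualifier (the zeroth cepstral coefficient C0) give 13 static coefficients.
- `_D` (deltas) and `_A` (accelerations) each add another 13, so 13 × 3 = 39.
- `_Z` (zero-mean normalization) does not change the size.

The count of 12 cepstra assumes HTK's default `NUMCEPS`, since the config does not set it. The Python sketch below computes that size and reads the `varFloor1` variance vector to confirm its length. The function names are hypothetical.

```python
def htk_vector_size(num_ceps: int = 12, has_c0: bool = True,
                    has_delta: bool = True, has_accel: bool = True) -> int:
    """Feature dimension for a cepstral target kind; num_ceps=12 assumes HTK's NUMCEPS default."""
    static = num_ceps + (1 if has_c0 else 0)
    blocks = 1 + int(has_delta) + int(has_accel)  # static block plus delta and acceleration blocks
    return static * blocks


def read_variance_floor(macros_text: str) -> list:
    """Return the floats that follow '<VARIANCE> n', checking that exactly n are present."""
    tokens = macros_text.split()
    idx = tokens.index("<VARIANCE>")  # raises ValueError if the block is missing
    count = int(tokens[idx + 1])
    values = [float(tok) for tok in tokens[idx + 2: idx + 2 + count]]
    if len(values) != count:
        raise ValueError(f"expected {count} variances, found {len(values)}")
    return values


print(htk_vector_size())  # 39
```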
--- a/ernie-sat/tools/aligner/english/16000/config
+++ b/ernie-sat/tools/aligner/english/16000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 625.0
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
--- a/ernie-sat/tools/aligner/english/16000/hmmdefs
+++ b/ernie-sat/tools/aligner/english/16000/hmmdefs
--- a/ernie-sat/tools/aligner/english/16000/macros
+++ b/ernie-sat/tools/aligner/english/16000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
--- a/ernie-sat/tools/aligner/english/8000/config
+++ b/ernie-sat/tools/aligner/english/8000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 1250
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
--- a/ernie-sat/tools/aligner/english/8000/hmmdefs
+++ b/ernie-sat/tools/aligner/english/8000/hmmdefs
--- a/ernie-sat/tools/aligner/english/8000/macros
+++ b/ernie-sat/tools/aligner/english/8000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.320759e-03 3.364773e-03 2.644561e-03 4.602237e-03 4.153211e-03 3.535625e-03 3.436818e-03 3.055576e-03 2.946933e-03 2.210875e-03 1.983593e-03 1.391166e-03 5.161191e-03 1.195636e-04 1.395769e-04 1.410736e-04 2.242859e-04 2.118236e-04 2.178820e-04 2.484023e-04 2.270718e-04 2.155360e-04 1.773744e-04 1.613469e-04 1.159174e-04 1.315518e-04 1.986226e-05 2.259619e-05 2.456991e-05 3.887276e-05 3.827550e-05 4.066243e-05 4.655687e-05 4.391165e-05 4.144727e-05 3.483306e-05 3.158762e-05 2.273686e-05 1.879711e-05
--- a/ernie-sat/tools/aligner/english/dict
+++ b/ernie-sat/tools/aligner/english/dict
--- a/ernie-sat/tools/aligner/english/monophones
+++ b/ernie-sat/tools/aligner/english/monophones
+EH2
+K
+S
+L
+AH0
+M
+EY1
+SH
+N
+P
+OY2
+T
+OW1
+Z
+W
+D
+AH1
+B
+EH1
+V
+IH1
+AA1
+R
+AY1
+ER0
+AE1
+AE2
+AO1
+NG
+G
+IH0
+TH
+IY2
+F
+DH
+IY1
+HH
+UH1
+IY0
+OY1
+OW2
+CH
+UW1
+IH2
+EH0
+AO2
+AA0
+AA2
+OW0
+EY0
+AE0
+AW2
+AW1
+EY2
+UW0
+AH2
+UW2
+AO0
+JH
+Y
+ZH
+AY2
+ER1
+UH2
+AY0
+ER2
+OY0
+UH0
+AW0
+br
+cg
+lg
+ls
+ns
+sil
+sp
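The phone list above mixes stress-marked ARPAbet vowels (`AH0`, `AH1`, `AH2`) with unstressed consonants and the extra symbols `br`, `cg`, `lg`, `ls`, `ns`, `sil`, and `sp`. HTK tools generally fail when a dictionary pronunciation uses a phone that has no model, so it can be worth checking the dictionary against this list before alignment. The sketch below is a hypothetical pre-check. Because the `dict` contents are not shown in this diff, it assumes a dictionary format of `WORD phone1 phone2 ...`, one entry per line.

```python
from typing import Iterable, Set


def find_unknown_phones(dict_lines: Iterable[str], monophones: Iterable[str]) -> Set[str]:
    """Return phones used in the dictionary that are missing from the monophone list.

    Assumes each dictionary line is 'WORD phone1 phone2 ...'.
    """
    known = {p.strip() for p in monophones if p.strip()}
    unknown: Set[str] = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or word without a pronunciation
        unknown.update(p for p in fields[1:] if p not in known)
    return unknown


monophones = ["HH", "AH0", "AH1", "L", "OW1", "sil", "sp"]
lexicon = ["HELLO HH AH0 L OW1", "HULL HH AH1 L", "HOLLOW HH AA1 L OW0"]
print(sorted(find_unknown_phones(lexicon, monophones)))  # ['AA1', 'OW0']
```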
--- a/ernie-sat/tools/aligner/english_envir/english/11025/config
+++ b/ernie-sat/tools/aligner/english_envir/english/11025/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 907.02947845804988
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
--- a/ernie-sat/tools/aligner/english_envir/english/11025/hmmdefs
+++ b/ernie-sat/tools/aligner/english_envir/english/11025/hmmdefs
--- a/ernie-sat/tools/aligner/english_envir/english/11025/macros
+++ b/ernie-sat/tools/aligner/english_envir/english/11025/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
--- a/ernie-sat/tools/aligner/english_envir/english/16000/config
+++ b/ernie-sat/tools/aligner/english_envir/english/16000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 625.0
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
--- a/ernie-sat/tools/aligner/english_envir/english/16000/hmmdefs
+++ b/ernie-sat/tools/aligner/english_envir/english/16000/hmmdefs
--- a/ernie-sat/tools/aligner/english_envir/english/16000/macros
+++ b/ernie-sat/tools/aligner/english_envir/english/16000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.970232e-03 3.081554e-03 3.337499e-03 4.222610e-03 4.197491e-03 3.755180e-03 3.401211e-03 3.156109e-03 2.829444e-03 2.476874e-03 1.801175e-03 1.400571e-03 4.726708e-03 1.402909e-04 1.383319e-04 1.553502e-04 2.128327e-04 2.107100e-04 2.003327e-04 2.263938e-04 2.249473e-04 2.067962e-04 1.757082e-04 1.399256e-04 1.028699e-04 1.197369e-04 2.207970e-05 2.272787e-05 2.571406e-05 3.619217e-05 3.745446e-05 3.682210e-05 4.203814e-05 4.217610e-05 3.967129e-05 3.367268e-05 2.703490e-05 1.971991e-05 1.748702e-05
--- a/ernie-sat/tools/aligner/english_envir/english/8000/config
+++ b/ernie-sat/tools/aligner/english_envir/english/8000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 1250
+TARGETKIND = PLP_0_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 20
+LPCORDER = 12 
+USEPOWER = T
--- a/ernie-sat/tools/aligner/english_envir/english/8000/hmmdefs
+++ b/ernie-sat/tools/aligner/english_envir/english/8000/hmmdefs
--- a/ernie-sat/tools/aligner/english_envir/english/8000/macros
+++ b/ernie-sat/tools/aligner/english_envir/english/8000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_D_A_Z_0><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 2.320759e-03 3.364773e-03 2.644561e-03 4.602237e-03 4.153211e-03 3.535625e-03 3.436818e-03 3.055576e-03 2.946933e-03 2.210875e-03 1.983593e-03 1.391166e-03 5.161191e-03 1.195636e-04 1.395769e-04 1.410736e-04 2.242859e-04 2.118236e-04 2.178820e-04 2.484023e-04 2.270718e-04 2.155360e-04 1.773744e-04 1.613469e-04 1.159174e-04 1.315518e-04 1.986226e-05 2.259619e-05 2.456991e-05 3.887276e-05 3.827550e-05 4.066243e-05 4.655687e-05 4.391165e-05 4.144727e-05 3.483306e-05 3.158762e-05 2.273686e-05 1.879711e-05
--- a/ernie-sat/tools/aligner/english_envir/english/dict
+++ b/ernie-sat/tools/aligner/english_envir/english/dict
--- a/ernie-sat/tools/aligner/english_envir/english/monophones
+++ b/ernie-sat/tools/aligner/english_envir/english/monophones
+EH2
+K
+S
+L
+AH0
+M
+EY1
+SH
+N
+P
+OY2
+T
+OW1
+Z
+W
+D
+AH1
+B
+EH1
+V
+IH1
+AA1
+R
+AY1
+ER0
+AE1
+AE2
+AO1
+NG
+G
+IH0
+TH
+IY2
+F
+DH
+IY1
+HH
+UH1
+IY0
+OY1
+OW2
+CH
+UW1
+IH2
+EH0
+AO2
+AA0
+AA2
+OW0
+EY0
+AE0
+AW2
+AW1
+EY2
+UW0
+AH2
+UW2
+AO0
+JH
+Y
+ZH
+AY2
+ER1
+UH2
+AY0
+ER2
+OY0
+UH0
+AW0
+br
+cg
+lg
+ls
+ns
+sil
+sp
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/Makefile
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/Makefile
+phoneme: phoneme.o english.o parse.o saynum.o spellword.o
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/RCS/parse.c,v
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/RCS/parse.c,v
+head	1.1;
+access;
+symbols;
+locks
+	steve:1.1; strict;
+comment	@ * @;
+1.1
+date	2009.03.13.20.13.23;	author steve;	state Exp;
+branches;
+next	;
+desc
+@parse.c
+@
+1.1
+log
+@Initial revision
+@
+text
+@#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#define MAX_LENGTH 128
+static FILE *In_file;
+static FILE *Out_file;
+static int Char, Char1, Char2, Char3;
+/*
+** main(argc, argv)
+**	int argc;
+**	char *argv[];
+**
+**	This is the main program.  It takes up to two file names (input
+**	and output)  and translates the input file to phoneme codes
+**	(see english.c) on the output file.
+*/
+main(argc, argv)
+	int argc;
+	char *argv[];
+	{
+	if (argc > 3)
+		{
+		fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+		exit(1);
+		}
+	if (argc == 1)
+		{
+		fputs("Enter english text:\n", stderr);
+		}
+	if (argc > 1)
+		{
+		In_file = fopen(argv[1], "r");
+		if (In_file == 0)
+			{
+			fputs("Error: Cannot open input file.\n", stderr);
+			fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+			exit(1);
+			}
+		}
+	else
+		In_file = stdin;
+	if (argc > 2)
+		{
+		Out_file = fopen(argv[2], "w");
+		if (Out_file == 0)
+			{
+			fputs("Error: Cannot create output file.\n", stderr);
+			fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+			exit(1);
+			}
+		}
+	else
+		Out_file = stdout;
+	xlate_file();
+	}
+outstring(string)
+	char *string;
+	{
+	while (*string != '\0')
+		outchar(*string++);
+	}
+outchar(chr)
+	int chr;
+	{
+	fputc(chr,Out_file);
+	}
+int makeupper(character)
+	int character;
+	{
+	if (islower(character))
+		return toupper(character);
+	else
+		return character;
+	}
+new_char()
+	{
+	/*
+	If the cache is full of newline, time to prime the look-ahead
+	again.  If an EOF is found, fill the remainder of the queue with
+	EOF's.
+	*/
+	if (Char == '\n'  && Char1 == '\n' && Char2 == '\n' && Char3 == '\n')
+		{	/* prime the pump again */
+		Char = getc(In_file);
+		if (Char == EOF)
+			{
+			Char1 = EOF;
+			Char2 = EOF;
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char == '\n')
+			return Char;
+		Char1 = getc(In_file);
+		if (Char1 == EOF)
+			{
+			Char2 = EOF;
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char1 == '\n')
+			return Char;
+		Char2 = getc(In_file);
+		if (Char2 == EOF)
+			{
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char2 == '\n')
+			return Char;
+		Char3 = getc(In_file);
+		}
+	else
+		{
+		/*
+		Buffer not full of newline, shuffle the characters and
+		either get a new one or propagate a newline or EOF.
+		*/
+		Char = Char1;
+		Char1 = Char2;
+		Char2 = Char3;
+		if (Char3 != '\n' && Char3 != EOF)
+			Char3 = getc(In_file);
+		}
+	return Char;
+	}
+/*
+** xlate_file()
+**
+**	This is the input file translator.  It sets up the first character
+**	and uses it to determine what kind of text follows.
+*/
+xlate_file()
+	{
+	/* Prime the queue */
+	Char = '\n';
+	Char1 = '\n';
+	Char2 = '\n';
+	Char3 = '\n';
+	new_char();	/* Fill Char, Char1, Char2 and Char3 */
+	while (Char != EOF)	/* All of the words in the file */
+		{
+		if (isdigit(Char))
+			have_number();
+		else
+		if (isalpha(Char) || Char == '\'')
+			have_letter();
+		else
+		if (Char == '$' && isdigit(Char1))
+			have_dollars();
+		else
+			have_special();
+		}
+	}
+have_dollars()
+	{
+	long int value;
+	value = 0L;
+	for (new_char() ; isdigit(Char) || Char == ',' ; new_char())
+		{
+		if (Char != ',')
+			value = 10 * value + (Char-'0');
+		}
+	say_cardinal(value);	/* Say number of whole dollars */
+	/* Found a character that is a non-digit and non-comma */
+	/* Check for no decimal or no cents digits */
+	if (Char != '.' || !isdigit(Char1))
+		{
+		if (value == 1L)
+			outstring("dAAlER ");
+		else
+			outstring("dAAlAArz ");
+		return;
+		}
+	/* We have '.' followed by a digit */
+	new_char();	/* Skip the period */
+	/* If it is ".dd " say as " DOLLARS AND n CENTS " */
+	if (isdigit(Char1) && !isdigit(Char2))
+		{
+		if (value == 1L)
+			outstring("dAAlER ");
+		else
+			outstring("dAAlAArz ");
+		if (Char == '0' && Char1 == '0')
+			{
+			new_char();	/* Skip tens digit */
+			new_char();	/* Skip units digit */
+			return;
+			}
+		outstring("AAnd ");
+		value = (Char-'0')*10 + Char1-'0';
+		say_cardinal(value);
+		if (value == 1L)
+			outstring("sEHnt ");
+		else
+			outstring("sEHnts ");
+		new_char();	/* Used Char (tens digit) */
+		new_char();	/* Used Char1 (units digit) */
+		return;
+		}
+	/* Otherwise say as "n POINT ddd DOLLARS " */
+	outstring("pOYnt ");
+	for ( ; isdigit(Char) ; new_char())
+		{
+		say_ascii(Char);
+		}
+	outstring("dAAlAArz ");
+	return;
+	}
+have_special()
+	{
+	if (Char == '\n')
+		outchar('\n');
+	else
+	if (!isspace(Char))
+		say_ascii(Char);
+	new_char();
+	return;
+	}
+have_number()
+	{
+	long int value;
+	int lastdigit;
+	value = Char - '0';
+	lastdigit = Char;
+	for (new_char() ; isdigit(Char) ; new_char())
+		{
+		value = 10 * value + (Char-'0');
+		lastdigit = Char;
+		}
+	/* Recognize ordinals based on last digit of number */
+	switch (lastdigit)
+		{
+	case '1':	/* ST */
+		if (makeupper(Char) == 'S' && makeupper(Char1) == 'T' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return;
+			}
+		break;
+	case '2':	/* ND */
+		if (makeupper(Char) == 'N' && makeupper(Char1) == 'D' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return;
+			}
+		break;
+	case '3':	/* RD */
+		if (makeupper(Char) == 'R' && makeupper(Char1) == 'D' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return;
+			}
+		break;
+	case '0':	/* TH */
+	case '4':	/* TH */
+	case '5':	/* TH */
+	case '6':	/* TH */
+	case '7':	/* TH */
+	case '8':	/* TH */
+	case '9':	/* TH */
+		if (makeupper(Char) == 'T' && makeupper(Char1) == 'H' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return;
+			}
+		break;
+		}
+	say_cardinal(value);
+	/* Recognize decimal points */
+	if (Char == '.' && isdigit(Char1))
+		{
+		outstring("pOYnt ");
+		for (new_char() ; isdigit(Char) ; new_char())
+			{
+			say_ascii(Char);
+			}
+		}
+	/* Spell out trailing abbreviations */
+	if (isalpha(Char))
+		{
+		while (isalpha(Char))
+			{
+			say_ascii(Char);
+			new_char();
+			}
+		}
+	return;
+	}
+have_letter()
+	{
+	char buff[MAX_LENGTH];
+	int count;
+	count = 0;
+	buff[count++] = ' ';	/* Required initial blank */
+	buff[count++] = makeupper(Char);
+	for (new_char() ; isalpha(Char) || Char == '\'' ; new_char())
+		{
+		buff[count++] = makeupper(Char);
+		if (count > MAX_LENGTH-3)	/* leave room for ' ' and '\0' */
+			{
+			buff[count++] = ' ';
+			buff[count++] = '\0';
+			xlate_word(buff);
+			count = 1;
+			}
+		}
+	buff[count++] = ' ';	/* Required terminating blank */
+	buff[count++] = '\0';
+	/* Check for AAANNN type abbreviations */
+	if (isdigit(Char))
+		{
+		spell_word(buff);
+		return;
+		}
+	else
+	if (strlen(buff) == 3)	 /* one character, two spaces */
+		say_ascii(buff[1]);
+	else
+	if (Char == '.')		/* Possible abbreviation */
+		abbrev(buff);
+	else
+		xlate_word(buff);
+	if (Char == '-' && isalpha(Char1))
+		new_char();	/* Skip hyphens */
+	}
+/* Handle abbreviations.  Text in buff was followed by '.' */
+abbrev(buff)
+	char buff[];
+	{
+	if (strcmp(buff, " DR ") == 0)
+		{
+		xlate_word(" DOCTOR ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " MR ") == 0)
+		{
+		xlate_word(" MISTER ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " MRS ") == 0)
+		{
+		xlate_word(" MISSUS ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " PHD ") == 0)
+		{
+		spell_word(" PHD ");
+		new_char();
+		}
+	else
+		xlate_word(buff);
+	}
+@
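The branching in `have_dollars()` and the ordinal test in `have_number()` are easier to follow as a short Python restatement. The sketch below mirrors that logic for well-formed inputs. Instead of calling `say_cardinal()` and emitting phoneme strings, it returns plain tokens (digits and English words). The function names are hypothetical. The sketch also reproduces one quirk: the ordinal suffix is chosen from the last digit only, so `11th` is not recognized as an ordinal while `11st` is.

```python
import re


def dollar_tokens(amount: str) -> list:
    """Mirror have_dollars(): '$' + digits/commas, optionally '.' + digits."""
    m = re.fullmatch(r"\$(\d[\d,]*)(?:\.(\d+))?", amount)
    if m is None:
        raise ValueError(f"not a dollar amount: {amount!r}")
    value = int(m.group(1).replace(",", ""))  # commas are skipped, as in the C loop
    cents = m.group(2)
    if cents is None:  # no decimal part: "N dollar(s)"
        return [str(value), "dollar" if value == 1 else "dollars"]
    if len(cents) == 2:  # ".dd": "N dollar(s) and M cent(s)", silent if ".00"
        tokens = [str(value), "dollar" if value == 1 else "dollars"]
        if cents == "00":
            return tokens
        cent_value = int(cents)
        return tokens + ["and", str(cent_value), "cent" if cent_value == 1 else "cents"]
    # Any other number of decimals: "N point d d d dollars" (always plural in the C code)
    return [str(value), "point", *cents, "dollars"]


_SUFFIX_FOR_LAST_DIGIT = {"1": "ST", "2": "ND", "3": "RD"}  # 0 and 4-9 expect "TH"


def is_ordinal(digits: str, rest: str) -> bool:
    """Mirror have_number(): suffix chosen from the last digit only, and it must end the word."""
    expected = _SUFFIX_FOR_LAST_DIGIT.get(digits[-1], "TH")
    return rest[:2].upper() == expected and not rest[2:3].isalnum()


print(dollar_tokens("$1,250.50"))  # ['1250', 'dollars', 'and', '50', 'cents']
print(is_ordinal("11", "th"))      # False: the last digit 1 expects "ST"
```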
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/README
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/README
+	                Final Version of
+                ENGLISH TO PHONEME TRANSLATION
+                           4/15/85
+        Here it is one last time.  I have fixed all of the bugs I
+        heard about and added a new feature or two (it now talks
+        money as well as numbers).  I think that this version is
+        good enough for most purposes.  I have proof-read the
+        phoneme rules (found one bug) and made the program more
+        "robust".  I added protection against the "toupper()"
+        problem some people had with earlier versions.
+        If you make a major addition (like better abbreviation
+        handling or an exception dictionary) please send me a
+        copy.  As before, this is all public domain and I make
+        no copyright claims on it.  The part derived from the
+        Naval Research Lab should be public anyway.  Sell it
+        if you can!
+                -John A. Wasser
+Work address:
+ARPAnet:        WASSER%VIKING.DEC@decwrl.ARPA
+Usenet:         {allegra,Shasta,decvax}!decwrl!dec-rhea!dec-viking!wasser
+Easynet:        VIKING::WASSER
+Telephone:      (617)486-2505
+USPS:           Digital Equipment Corp.
+                Mail stop: LJO2/E4
+                30 Porter Rd
+                Littleton, MA  01460
+   The files that make up this package are:
+          english.c       Translation rules.
+          phoneme.c       Translate a single word.
+          parse.c         Split a file into words.
+          spellword.c     Spell an ASCII character or word.
+          saynum.c        Say a cardinal or ordinal number (long int).
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/english.c
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/english.c
+/*
+**	English to Phoneme rules.
+**
+**	Derived from: 
+**
+**	     AUTOMATIC TRANSLATION OF ENGLISH TEXT TO PHONETICS
+**	            BY MEANS OF LETTER-TO-SOUND RULES
+**
+**			NRL Report 7948
+**
+**		      January 21st, 1976
+**	    Naval Research Laboratory, Washington, D.C.
+**
+**
+**	Published by the National Technical Information Service as
+**	document "AD/A021 929".
+**
+**
+**
+**	The Phoneme codes:
+**
+**		IY	bEEt		IH	bIt
+**		EY	gAte		EH	gEt
+**		AE	fAt		AA	fAther
+**		AO	lAWn		OW	lOne
+**		UH	fUll		UW	fOOl
+**		ER	mURdER		AX	About
+**		AH	bUt		AY	hIde
+**		AW	hOW		OY	tOY
+**	
+**		p	Pack		b	Back
+**		t	Time		d	Dime
+**		k	Coat		g	Goat
+**		f	Fault		v	Vault
+**		TH	eTHer		DH	eiTHer
+**		s	Sue		z	Zoo
+**		SH	leaSH		ZH	leiSure
+**		HH	How		m	suM
+**		n	suN		NG	suNG
+**		l	Laugh		w	Wear
+**		y	Young		r	Rate
+**		CH	CHar		j	Jar
+**		WH	WHere
+**
+**
+**	Rules are made up of four parts:
+**	
+**		The left context.
+**		The text to match.
+**		The right context.
+**		The phonemes to substitute for the matched text.
+**
+**	Procedure:
+**
+**		Separate each block of letters (apostrophes included) 
+**		and add a space on each side.  For each unmatched 
+**		letter in the word, look through the rules where the 
+**		text to match starts with the letter in the word.  If 
+**		the text to match is found and the right and left 
+**		context patterns also match, output the phonemes for 
+**		that rule and skip to the next unmatched letter.
+**
+**
+**	Special Context Symbols:
+**
+**		#	One or more vowels
+**		:	Zero or more consonants
+**		^	One consonant.
+**		.	One of B, D, V, G, J, L, M, N, R, W or Z (voiced 
+**			consonants)
+**		%	One of ER, E, ES, ED, ING, ELY (a suffix)
+**			(Found in right context only)
+**		+	One of E, I or Y (a "front" vowel)
+**
+*/
+/* Context definitions */
+static char Anything[] = "";	/* No context requirement */
+static char Nothing[] = " ";	/* Context is beginning or end of word */
+/* Phoneme definitions */
+static char Pause[] = " ";	/* Short silence */
+static char Silent[] = "";	/* No phonemes */
+#define LEFT_PART	0
+#define MATCH_PART	1
+#define RIGHT_PART	2
+#define OUT_PART	3
+typedef char *Rule[4];	/* Rule is an array of 4 character pointers */
+/*0 = Punctuation */
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule punct_rules[] =
+	{
+	{Anything,	" ",		Anything,	Pause	},
+	{Anything,	"-",		Anything,	Silent	},
+	{".",		"'S",		Anything,	"z"	},
+	{"#:.E",	"'S",		Anything,	"z"	},
+	{"#",		"'S",		Anything,	"z"	},
+	{Anything,	"'",		Anything,	Silent	},
+	{Anything,	",",		Anything,	Pause	},
+	{Anything,	".",		Anything,	Pause	},
+	{Anything,	"?",		Anything,	Pause	},
+	{Anything,	"!",		Anything,	Pause	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule A_rules[] =
+	{
+	{Anything,	"A",		Nothing,	"AX"	},
+	{Nothing,	"ARE",		Nothing,	"AAr"	},
+	{Nothing,	"AR",		"O",		"AXr"	},
+	{Anything,	"AR",		"#",		"EHr"	},
+	{"^",		"AS",		"#",		"EYs"	},
+	{Anything,	"A",		"WA",		"AX"	},
+	{Anything,	"AW",		Anything,	"AO"	},
+	{" :",		"ANY",		Anything,	"EHnIY"	},
+	{Anything,	"A",		"^+#",		"EY"	},
+	{"#:",		"ALLY",		Anything,	"AXlIY"	},
+	{Nothing,	"AL",		"#",		"AXl"	},
+	{Anything,	"AGAIN",	Anything,	"AXgEHn"},
+	{"#:",		"AG",		"E",		"IHj"	},
+	{Anything,	"A",		"^+:#",		"AE"	},
+	{" :",		"A",		"^+ ",		"EY"	},
+	{Anything,	"A",		"^%",		"EY"	},
+	{Nothing,	"ARR",		Anything,	"AXr"	},
+	{Anything,	"ARR",		Anything,	"AEr"	},
+	{" :",		"AR",		Nothing,	"AAr"	},
+	{Anything,	"AR",		Nothing,	"ER"	},
+	{Anything,	"AR",		Anything,	"AAr"	},
+	{Anything,	"AIR",		Anything,	"EHr"	},
+	{Anything,	"AI",		Anything,	"EY"	},
+	{Anything,	"AY",		Anything,	"EY"	},
+	{Anything,	"AU",		Anything,	"AO"	},
+	{"#:",		"AL",		Nothing,	"AXl"	},
+	{"#:",		"ALS",		Nothing,	"AXlz"	},
+	{Anything,	"ALK",		Anything,	"AOk"	},
+	{Anything,	"AL",		"^",		"AOl"	},
+	{" :",		"ABLE",		Anything,	"EYbAXl"},
+	{Anything,	"ABLE",		Anything,	"AXbAXl"},
+	{Anything,	"ANG",		"+",		"EYnj"	},
+	{Anything,	"A",		Anything,	"AE"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule B_rules[] =
+	{
+	{Nothing,	"BE",		"^#",		"bIH"	},
+	{Anything,	"BEING",	Anything,	"bIYIHNG"},
+	{Nothing,	"BOTH",		Nothing,	"bOWTH"	},
+	{Nothing,	"BUS",		"#",		"bIHz"	},
+	{Anything,	"BUIL",		Anything,	"bIHl"	},
+	{Anything,	"B",		Anything,	"b"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule C_rules[] =
+	{
+	{Nothing,	"CH",		"^",		"k"	},
+	{"^E",		"CH",		Anything,	"k"	},
+	{Anything,	"CH",		Anything,	"CH"	},
+	{" S",		"CI",		"#",		"sAY"	},
+	{Anything,	"CI",		"A",		"SH"	},
+	{Anything,	"CI",		"O",		"SH"	},
+	{Anything,	"CI",		"EN",		"SH"	},
+	{Anything,	"C",		"+",		"s"	},
+	{Anything,	"CK",		Anything,	"k"	},
+	{Anything,	"COM",		"%",		"kAHm"	},
+	{Anything,	"C",		Anything,	"k"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule D_rules[] =
+	{
+	{"#:",		"DED",		Nothing,	"dIHd"	},
+	{".E",		"D",		Nothing,	"d"	},
+	{"#:^E",	"D",		Nothing,	"t"	},
+	{Nothing,	"DE",		"^#",		"dIH"	},
+	{Nothing,	"DO",		Nothing,	"dUW"	},
+	{Nothing,	"DOES",		Anything,	"dAHz"	},
+	{Nothing,	"DOING",	Anything,	"dUWIHNG"},
+	{Nothing,	"DOW",		Anything,	"dAW"	},
+	{Anything,	"DU",		"A",		"jUW"	},
+	{Anything,	"D",		Anything,	"d"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule E_rules[] =
+	{
+	{"#:",		"E",		Nothing,	Silent	},
+	{"':^",		"E",		Nothing,	Silent	},
+	{" :",		"E",		Nothing,	"IY"	},
+	{"#",		"ED",		Nothing,	"d"	},
+	{"#:",		"E",		"D ",		Silent	},
+	{Anything,	"EV",		"ER",		"EHv"	},
+	{Anything,	"E",		"^%",		"IY"	},
+	{Anything,	"ERI",		"#",		"IYrIY"	},
+	{Anything,	"ERI",		Anything,	"EHrIH"	},
+	{"#:",		"ER",		"#",		"ER"	},
+	{Anything,	"ER",		"#",		"EHr"	},
+	{Anything,	"ER",		Anything,	"ER"	},
+	{Nothing,	"EVEN",		Anything,	"IYvEHn"},
+	{"#:",		"E",		"W",		Silent	},
+	{"T",		"EW",		Anything,	"UW"	},
+	{"S",		"EW",		Anything,	"UW"	},
+	{"R",		"EW",		Anything,	"UW"	},
+	{"D",		"EW",		Anything,	"UW"	},
+	{"L",		"EW",		Anything,	"UW"	},
+	{"Z",		"EW",		Anything,	"UW"	},
+	{"N",		"EW",		Anything,	"UW"	},
+	{"J",		"EW",		Anything,	"UW"	},
+	{"TH",		"EW",		Anything,	"UW"	},
+	{"CH",		"EW",		Anything,	"UW"	},
+	{"SH",		"EW",		Anything,	"UW"	},
+	{Anything,	"EW",		Anything,	"yUW"	},
+	{Anything,	"E",		"O",		"IY"	},
+	{"#:S",		"ES",		Nothing,	"IHz"	},
+	{"#:C",		"ES",		Nothing,	"IHz"	},
+	{"#:G",		"ES",		Nothing,	"IHz"	},
+	{"#:Z",		"ES",		Nothing,	"IHz"	},
+	{"#:X",		"ES",		Nothing,	"IHz"	},
+	{"#:J",		"ES",		Nothing,	"IHz"	},
+	{"#:CH",	"ES",		Nothing,	"IHz"	},
+	{"#:SH",	"ES",		Nothing,	"IHz"	},
+	{"#:",		"E",		"S ",		Silent	},
+	{"#:",		"ELY",		Nothing,	"lIY"	},
+	{"#:",		"EMENT",	Anything,	"mEHnt"	},
+	{Anything,	"EFUL",		Anything,	"fUHl"	},
+	{Anything,	"EE",		Anything,	"IY"	},
+	{Anything,	"EARN",		Anything,	"ERn"	},
+	{Nothing,	"EAR",		"^",		"ER"	},
+	{Anything,	"EAD",		Anything,	"EHd"	},
+	{"#:",		"EA",		Nothing,	"IYAX"	},
+	{Anything,	"EA",		"SU",		"EH"	},
+	{Anything,	"EA",		Anything,	"IY"	},
+	{Anything,	"EIGH",		Anything,	"EY"	},
+	{Anything,	"EI",		Anything,	"IY"	},
+	{Nothing,	"EYE",		Anything,	"AY"	},
+	{Anything,	"EY",		Anything,	"IY"	},
+	{Anything,	"EU",		Anything,	"yUW"	},
+	{Anything,	"E",		Anything,	"EH"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule F_rules[] =
+	{
+	{Anything,	"FUL",		Anything,	"fUHl"	},
+	{Anything,	"F",		Anything,	"f"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule G_rules[] =
+	{
+	{Anything,	"GIV",		Anything,	"gIHv"	},
+	{Nothing,	"G",		"I^",		"g"	},
+	{Anything,	"GE",		"T",		"gEH"	},
+	{"SU",		"GGES",		Anything,	"gjEHs"	},
+	{Anything,	"GG",		Anything,	"g"	},
+	{" B#",		"G",		Anything,	"g"	},
+	{Anything,	"G",		"+",		"j"	},
+	{Anything,	"GREAT",	Anything,	"grEYt"	},
+	{"#",		"GH",		Anything,	Silent	},
+	{Anything,	"G",		Anything,	"g"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule H_rules[] =
+	{
+	{Nothing,	"HAV",		Anything,	"hAEv"	},
+	{Nothing,	"HERE",		Anything,	"hIYr"	},
+	{Nothing,	"HOUR",		Anything,	"AWER"	},
+	{Anything,	"HOW",		Anything,	"hAW"	},
+	{Anything,	"H",		"#",		"h"	},
+	{Anything,	"H",		Anything,	Silent	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule I_rules[] =
+	{
+	{Nothing,	"IN",		Anything,	"IHn"	},
+	{Nothing,	"I",		Nothing,	"AY"	},
+	{Anything,	"IN",		"D",		"AYn"	},
+	{Anything,	"IER",		Anything,	"IYER"	},
+	{"#:R",		"IED",		Anything,	"IYd"	},
+	{Anything,	"IED",		Nothing,	"AYd"	},
+	{Anything,	"IEN",		Anything,	"IYEHn"	},
+	{Anything,	"IE",		"T",		"AYEH"	},
+	{" :",		"I",		"%",		"AY"	},
+	{Anything,	"I",		"%",		"IY"	},
+	{Anything,	"IE",		Anything,	"IY"	},
+	{Anything,	"I",		"^+:#",		"IH"	},
+	{Anything,	"IR",		"#",		"AYr"	},
+	{Anything,	"IZ",		"%",		"AYz"	},
+	{Anything,	"IS",		"%",		"AYz"	},
+	{Anything,	"I",		"D%",		"AY"	},
+	{"+^",		"I",		"^+",		"IH"	},
+	{Anything,	"I",		"T%",		"AY"	},
+	{"#:^",		"I",		"^+",		"IH"	},
+	{Anything,	"I",		"^+",		"AY"	},
+	{Anything,	"IR",		Anything,	"ER"	},
+	{Anything,	"IGH",		Anything,	"AY"	},
+	{Anything,	"ILD",		Anything,	"AYld"	},
+	{Anything,	"IGN",		Nothing,	"AYn"	},
+	{Anything,	"IGN",		"^",		"AYn"	},
+	{Anything,	"IGN",		"%",		"AYn"	},
+	{Anything,	"IQUE",		Anything,	"IYk"	},
+	{Anything,	"I",		Anything,	"IH"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule J_rules[] =
+	{
+	{Anything,	"J",		Anything,	"j"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule K_rules[] =
+	{
+	{Nothing,	"K",		"N",		Silent	},
+	{Anything,	"K",		Anything,	"k"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule L_rules[] =
+	{
+	{Anything,	"LO",		"C#",		"lOW"	},
+	{"L",		"L",		Anything,	Silent	},
+	{"#:^",		"L",		"%",		"AXl"	},
+	{Anything,	"LEAD",		Anything,	"lIYd"	},
+	{Anything,	"L",		Anything,	"l"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule M_rules[] =
+	{
+	{Anything,	"MOV",		Anything,	"mUWv"	},
+	{Anything,	"M",		Anything,	"m"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule N_rules[] =
+	{
+	{"E",		"NG",		"+",		"nj"	},
+	{Anything,	"NG",		"R",		"NGg"	},
+	{Anything,	"NG",		"#",		"NGg"	},
+	{Anything,	"NGL",		"%",		"NGgAXl"},
+	{Anything,	"NG",		Anything,	"NG"	},
+	{Anything,	"NK",		Anything,	"NGk"	},
+	{Nothing,	"NOW",		Nothing,	"nAW"	},
+	{Anything,	"N",		Anything,	"n"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule O_rules[] =
+	{
+	{Anything,	"OF",		Nothing,	"AXv"	},
+	{Anything,	"OROUGH",	Anything,	"EROW"	},
+	{"#:",		"OR",		Nothing,	"ER"	},
+	{"#:",		"ORS",		Nothing,	"ERz"	},
+	{Anything,	"OR",		Anything,	"AOr"	},
+	{Nothing,	"ONE",		Anything,	"wAHn"	},
+	{Anything,	"OW",		Anything,	"OW"	},
+	{Nothing,	"OVER",		Anything,	"OWvER"	},
+	{Anything,	"OV",		Anything,	"AHv"	},
+	{Anything,	"O",		"^%",		"OW"	},
+	{Anything,	"O",		"^EN",		"OW"	},
+	{Anything,	"O",		"^I#",		"OW"	},
+	{Anything,	"OL",		"D",		"OWl"	},
+	{Anything,	"OUGHT",	Anything,	"AOt"	},
+	{Anything,	"OUGH",		Anything,	"AHf"	},
+	{Nothing,	"OU",		Anything,	"AW"	},
+	{"H",		"OU",		"S#",		"AW"	},
+	{Anything,	"OUS",		Anything,	"AXs"	},
+	{Anything,	"OUR",		Anything,	"AOr"	},
+	{Anything,	"OULD",		Anything,	"UHd"	},
+	{"^",		"OU",		"^L",		"AH"	},
+	{Anything,	"OUP",		Anything,	"UWp"	},
+	{Anything,	"OU",		Anything,	"AW"	},
+	{Anything,	"OY",		Anything,	"OY"	},
+	{Anything,	"OING",		Anything,	"OWIHNG"},
+	{Anything,	"OI",		Anything,	"OY"	},
+	{Anything,	"OOR",		Anything,	"AOr"	},
+	{Anything,	"OOK",		Anything,	"UHk"	},
+	{Anything,	"OOD",		Anything,	"UHd"	},
+	{Anything,	"OO",		Anything,	"UW"	},
+	{Anything,	"O",		"E",		"OW"	},
+	{Anything,	"O",		Nothing,	"OW"	},
+	{Anything,	"OA",		Anything,	"OW"	},
+	{Nothing,	"ONLY",		Anything,	"OWnlIY"},
+	{Nothing,	"ONCE",		Anything,	"wAHns"	},
+	{Anything,	"ON'T",		Anything,	"OWnt"	},
+	{"C",		"O",		"N",		"AA"	},
+	{Anything,	"O",		"NG",		"AO"	},
+	{" :^",		"O",		"N",		"AH"	},
+	{"I",		"ON",		Anything,	"AXn"	},
+	{"#:",		"ON",		Nothing,	"AXn"	},
+	{"#^",		"ON",		Anything,	"AXn"	},
+	{Anything,	"O",		"ST ",		"OW"	},
+	{Anything,	"OF",		"^",		"AOf"	},
+	{Anything,	"OTHER",	Anything,	"AHDHER"},
+	{Anything,	"OSS",		Nothing,	"AOs"	},
+	{"#:^",		"OM",		Anything,	"AHm"	},
+	{Anything,	"O",		Anything,	"AA"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule P_rules[] =
+	{
+	{Anything,	"PH",		Anything,	"f"	},
+	{Anything,	"PEOP",		Anything,	"pIYp"	},
+	{Anything,	"POW",		Anything,	"pAW"	},
+	{Anything,	"PUT",		Nothing,	"pUHt"	},
+	{Anything,	"P",		Anything,	"p"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule Q_rules[] =
+	{
+	{Anything,	"QUAR",		Anything,	"kwAOr"	},
+	{Anything,	"QU",		Anything,	"kw"	},
+	{Anything,	"Q",		Anything,	"k"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule R_rules[] =
+	{
+	{Nothing,	"RE",		"^#",		"rIY"	},
+	{Anything,	"R",		Anything,	"r"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule S_rules[] =
+	{
+	{Anything,	"SH",		Anything,	"SH"	},
+	{"#",		"SION",		Anything,	"ZHAXn"	},
+	{Anything,	"SOME",		Anything,	"sAHm"	},
+	{"#",		"SUR",		"#",		"ZHER"	},
+	{Anything,	"SUR",		"#",		"SHER"	},
+	{"#",		"SU",		"#",		"ZHUW"	},
+	{"#",		"SSU",		"#",		"SHUW"	},
+	{"#",		"SED",		Nothing,	"zd"	},
+	{"#",		"S",		"#",		"z"	},
+	{Anything,	"SAID",		Anything,	"sEHd"	},
+	{"^",		"SION",		Anything,	"SHAXn"	},
+	{Anything,	"S",		"S",		Silent	},
+	{".",		"S",		Nothing,	"z"	},
+	{"#:.E",	"S",		Nothing,	"z"	},
+	{"#:^##",	"S",		Nothing,	"z"	},
+	{"#:^#",	"S",		Nothing,	"s"	},
+	{"U",		"S",		Nothing,	"s"	},
+	{" :#",		"S",		Nothing,	"z"	},
+	{Nothing,	"SCH",		Anything,	"sk"	},
+	{Anything,	"S",		"C+",		Silent	},
+	{"#",		"SM",		Anything,	"zm"	},
+	{"#",		"SN",		"'",		"zAXn"	},
+	{Anything,	"S",		Anything,	"s"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule T_rules[] =
+	{
+	{Nothing,	"THE",		Nothing,	"DHAX"	},
+	{Anything,	"TO",		Nothing,	"tUW"	},
+	{Anything,	"THAT",		Nothing,	"DHAEt"	},
+	{Nothing,	"THIS",		Nothing,	"DHIHs"	},
+	{Nothing,	"THEY",		Anything,	"DHEY"	},
+	{Nothing,	"THERE",	Anything,	"DHEHr"	},
+	{Anything,	"THER",		Anything,	"DHER"	},
+	{Anything,	"THEIR",	Anything,	"DHEHr"	},
+	{Nothing,	"THAN",		Nothing,	"DHAEn"	},
+	{Nothing,	"THEM",		Nothing,	"DHEHm"	},
+	{Anything,	"THESE",	Nothing,	"DHIYz"	},
+	{Nothing,	"THEN",		Anything,	"DHEHn"	},
+	{Anything,	"THROUGH",	Anything,	"THrUW"	},
+	{Anything,	"THOSE",	Anything,	"DHOWz"	},
+	{Anything,	"THOUGH",	Nothing,	"DHOW"	},
+	{Nothing,	"THUS",		Anything,	"DHAHs"	},
+	{Anything,	"TH",		Anything,	"TH"	},
+	{"#:",		"TED",		Nothing,	"tIHd"	},
+	{"S",		"TI",		"#N",		"CH"	},
+	{Anything,	"TI",		"O",		"SH"	},
+	{Anything,	"TI",		"A",		"SH"	},
+	{Anything,	"TIEN",		Anything,	"SHAXn"	},
+	{Anything,	"TUR",		"#",		"CHER"	},
+	{Anything,	"TU",		"A",		"CHUW"	},
+	{Nothing,	"TWO",		Anything,	"tUW"	},
+	{Anything,	"T",		Anything,	"t"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule U_rules[] =
+	{
+	{Nothing,	"UN",		"I",		"yUWn"	},
+	{Nothing,	"UN",		Anything,	"AHn"	},
+	{Nothing,	"UPON",		Anything,	"AXpAOn"},
+	{"T",		"UR",		"#",		"UHr"	},
+	{"S",		"UR",		"#",		"UHr"	},
+	{"R",		"UR",		"#",		"UHr"	},
+	{"D",		"UR",		"#",		"UHr"	},
+	{"L",		"UR",		"#",		"UHr"	},
+	{"Z",		"UR",		"#",		"UHr"	},
+	{"N",		"UR",		"#",		"UHr"	},
+	{"J",		"UR",		"#",		"UHr"	},
+	{"TH",		"UR",		"#",		"UHr"	},
+	{"CH",		"UR",		"#",		"UHr"	},
+	{"SH",		"UR",		"#",		"UHr"	},
+	{Anything,	"UR",		"#",		"yUHr"	},
+	{Anything,	"UR",		Anything,	"ER"	},
+	{Anything,	"U",		"^ ",		"AH"	},
+	{Anything,	"U",		"^^",		"AH"	},
+	{Anything,	"UY",		Anything,	"AY"	},
+	{" G",		"U",		"#",		Silent	},
+	{"G",		"U",		"%",		Silent	},
+	{"G",		"U",		"#",		"w"	},
+	{"#N",		"U",		Anything,	"yUW"	},
+	{"T",		"U",		Anything,	"UW"	},
+	{"S",		"U",		Anything,	"UW"	},
+	{"R",		"U",		Anything,	"UW"	},
+	{"D",		"U",		Anything,	"UW"	},
+	{"L",		"U",		Anything,	"UW"	},
+	{"Z",		"U",		Anything,	"UW"	},
+	{"N",		"U",		Anything,	"UW"	},
+	{"J",		"U",		Anything,	"UW"	},
+	{"TH",		"U",		Anything,	"UW"	},
+	{"CH",		"U",		Anything,	"UW"	},
+	{"SH",		"U",		Anything,	"UW"	},
+	{Anything,	"U",		Anything,	"yUW"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule V_rules[] =
+	{
+	{Anything,	"VIEW",		Anything,	"vyUW"	},
+	{Anything,	"V",		Anything,	"v"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule W_rules[] =
+	{
+	{Nothing,	"WERE",		Anything,	"wER"	},
+	{Anything,	"WA",		"S",		"wAA"	},
+	{Anything,	"WA",		"T",		"wAA"	},
+	{Anything,	"WHERE",	Anything,	"WHEHr"	},
+	{Anything,	"WHAT",		Anything,	"WHAAt"	},
+	{Anything,	"WHOL",		Anything,	"hOWl"	},
+	{Anything,	"WHO",		Anything,	"hUW"	},
+	{Anything,	"WH",		Anything,	"WH"	},
+	{Anything,	"WAR",		Anything,	"wAOr"	},
+	{Anything,	"WOR",		"^",		"wER"	},
+	{Anything,	"WR",		Anything,	"r"	},
+	{Anything,	"W",		Anything,	"w"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule X_rules[] =
+	{
+	{Anything,	"X",		Anything,	"ks"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule Y_rules[] =
+	{
+	{Anything,	"YOUNG",	Anything,	"yAHNG"	},
+	{Nothing,	"YOU",		Anything,	"yUW"	},
+	{Nothing,	"YES",		Anything,	"yEHs"	},
+	{Nothing,	"Y",		Anything,	"y"	},
+	{"#:^",		"Y",		Nothing,	"IY"	},
+	{"#:^",		"Y",		"I",		"IY"	},
+	{" :",		"Y",		Nothing,	"AY"	},
+	{" :",		"Y",		"#",		"AY"	},
+	{" :",		"Y",		"^+:#",		"IH"	},
+	{" :",		"Y",		"^#",		"AY"	},
+	{Anything,	"Y",		Anything,	"IH"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+/*
+**	LEFT_PART	MATCH_PART	RIGHT_PART	OUT_PART
+*/
+static Rule Z_rules[] =
+	{
+	{Anything,	"Z",		Anything,	"z"	},
+	{Anything,	0,		Anything,	Silent	},
+	};
+Rule *Rules[] =
+	{
+	punct_rules,
+	A_rules, B_rules, C_rules, D_rules, E_rules, F_rules, G_rules, 
+	H_rules, I_rules, J_rules, K_rules, L_rules, M_rules, N_rules, 
+	O_rules, P_rules, Q_rules, R_rules, S_rules, T_rules, U_rules, 
+	V_rules, W_rules, X_rules, Y_rules, Z_rules
+	};
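The `Rules[]` array above is indexed by the first letter of the text being matched. Slot 0 holds `punct_rules`, and slots 1 to 26 hold `A_rules` to `Z_rules`. `xlate_word()` in phoneme.c computes that index as `word[index] - 'A' + 1`. Every table ends with a sentinel row whose MATCH_PART is 0, which `find_rule()` reports as "Can't find rule". The following is a minimal standalone sketch of that dispatch and sentinel convention. The helpers `rule_slot` and `count_rules` are hypothetical and not part of this project.

```c
#include <ctype.h>
#include <stddef.h>

typedef char *Rule[4];   /* LEFT_PART, MATCH_PART, RIGHT_PART, OUT_PART */

/* Hypothetical helper mirroring the dispatch in xlate_word():
   uppercase letters map to slots 1..26 (A_rules..Z_rules); anything
   else (apostrophe, digits, blanks) maps to slot 0 (punct_rules). */
int rule_slot(int ch)
{
    if (isupper((unsigned char)ch))
        return ch - 'A' + 1;
    return 0;
}

/* Hypothetical helper: count the rows before the sentinel row,
   i.e. the first row whose MATCH_PART is a null pointer. */
size_t count_rules(Rule *table)
{
    size_t n = 0;
    while (table[n][1] != NULL)
        n++;
    return n;
}
```

In the real tables the context strings come from the `Anything`/`Nothing` macros defined earlier in english.c. The sketch only relies on the MATCH_PART column.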
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/english.o
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/english.o
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/parse.c
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/parse.c
+#include <stdio.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <string.h>
+#define MAX_LENGTH 128
+static FILE *In_file;
+static FILE *Out_file;
+static int Char, Char1, Char2, Char3;
+/*
+** main(argc, argv)
+**	int argc;
+**	char *argv[];
+**
+**	This is the main program.  It takes up to two file names (input
+**	and output) and translates the input file to phoneme codes
+**	(see english.c), writing them to the output file.
+*/
+main(argc, argv)
+	int argc;
+	char *argv[];
+	{
+	if (argc > 3)
+		{
+		fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+		exit(1);
+		}
+	if (argc == 1)
+		{
+		fputs("Enter english text:\n", stderr);
+		}
+	if (argc > 1)
+		{
+		In_file = fopen(argv[1], "r");
+		if (In_file == 0)
+			{
+			fputs("Error: Cannot open input file.\n", stderr);
+			fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+			exit(1);
+			}
+		}
+	else
+		In_file = stdin;
+	if (argc > 2)
+		{
+		Out_file = fopen(argv[2], "w");
+		if (Out_file == 0)
+			{
+			fputs("Error: Cannot create output file.\n", stderr);
+			fputs("Usage: PHONEME [infile [outfile]]\n", stderr);
+			exit(1);
+			}
+		}
+	else
+		Out_file = stdout;
+	xlate_file();
+	}
+outstring(string)
+	char *string;
+	{
+	while (*string != '\0')
+		outchar(*string++);
+	}
+outchar(chr)
+	int chr;
+	{
+	fputc(chr,Out_file);
+	}
+int makeupper(character)
+	int character;
+	{
+	if (islower(character))
+		return toupper(character);
+	else
+		return character;
+	}
+new_char()
+	{
+	/*
+	If the look-ahead queue is full of newlines, it is time to prime it
+	again.  If an EOF is found, fill the remainder of the queue with
+	EOFs.
+	*/
+	if (Char == '\n'  && Char1 == '\n' && Char2 == '\n' && Char3 == '\n')
+		{	/* prime the pump again */
+		Char = getc(In_file);
+		if (Char == EOF)
+			{
+			Char1 = EOF;
+			Char2 = EOF;
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char == '\n')
+			return Char;
+		Char1 = getc(In_file);
+		if (Char1 == EOF)
+			{
+			Char2 = EOF;
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char1 == '\n')
+			return Char;
+		Char2 = getc(In_file);
+		if (Char2 == EOF)
+			{
+			Char3 = EOF;
+			return Char;
+			}
+		if (Char2 == '\n')
+			return Char;
+		Char3 = getc(In_file);
+		}
+	else
+		{
+		/*
+		Buffer not full of newlines, so shift the characters and
+		either get a new one or propagate a newline or EOF.
+		*/
+		Char = Char1;
+		Char1 = Char2;
+		Char2 = Char3;
+		if (Char3 != '\n' && Char3 != EOF)
+			Char3 = getc(In_file);
+		}
+	return Char;
+	}
+/*
+** xlate_file()
+**
+**	This is the input file translator.  It sets up the first character
+**	and uses it to determine what kind of text follows.
+*/
+xlate_file()
+	{
+	/* Prime the queue */
+	Char = '\n';
+	Char1 = '\n';
+	Char2 = '\n';
+	Char3 = '\n';
+	new_char();	/* Fill Char, Char1, Char2 and Char3 */
+	while (Char != EOF)	/* All of the words in the file */
+		{
+		if (isdigit(Char))
+			have_number();
+		else
+		if (isalpha(Char) || Char == '\'')
+			have_letter();
+		else
+		if (Char == '$' && isdigit(Char1))
+			have_dollars();
+		else
+			have_special();
+		}
+	}
+have_dollars()
+	{
+	long int value;
+	value = 0L;
+	for (new_char() ; isdigit(Char) || Char == ',' ; new_char())
+		{
+		if (Char != ',')
+			value = 10 * value + (Char-'0');
+		}
+	say_cardinal(value);	/* Say number of whole dollars */
+	/* Found a character that is a non-digit and non-comma */
+	/* Check for no decimal or no cents digits */
+	if (Char != '.' || !isdigit(Char1))
+		{
+		if (value == 1L)
+			outstring("dAAlER ");
+		else
+			outstring("dAAlAArz ");
+		return 1;
+		}
+	/* We have '.' followed by a digit */
+	new_char();	/* Skip the period */
+	/* If it is ".dd " say as " DOLLARS AND n CENTS " */
+	if (isdigit(Char1) && !isdigit(Char2))
+		{
+		if (value == 1L)
+			outstring("dAAlER ");
+		else
+			outstring("dAAlAArz ");
+		if (Char == '0' && Char1 == '0')
+			{
+			new_char();	/* Skip tens digit */
+			new_char();	/* Skip units digit */
+			return 1;
+			}
+		outstring("AAnd ");
+		value = (Char-'0')*10 + Char1-'0';
+		say_cardinal(value);
+		if (value == 1L)
+			outstring("sEHnt ");
+		else
+			outstring("sEHnts ");
+		new_char();	/* Used Char (tens digit) */
+		new_char();	/* Used Char1 (units digit) */
+		return 1;
+		}
+	/* Otherwise say as "n POINT ddd DOLLARS " */
+	outstring("pOYnt ");
+	for ( ; isdigit(Char) ; new_char())
+		{
+		say_ascii(Char);
+		}
+	outstring("dAAlAArz ");
+	return 1;
+	}
+have_special()
+	{
+	if (Char == '\n')
+		outchar('\n');
+	else
+	if (!isspace(Char))
+		say_ascii(Char);
+	new_char();
+	return 1;
+	}
+have_number()
+	{
+	long int value;
+	int lastdigit;
+	value = Char - '0';
+	lastdigit = Char;
+	for (new_char() ; isdigit(Char) ; new_char())
+		{
+		value = 10 * value + (Char-'0');
+		lastdigit = Char;
+		}
+	/* Recognize ordinals based on last digit of number */
+	switch (lastdigit)
+		{
+	case '1':	/* ST */
+		if (makeupper(Char) == 'S' && makeupper(Char1) == 'T' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return 1;
+			}
+		break;
+	case '2':	/* ND */
+		if (makeupper(Char) == 'N' && makeupper(Char1) == 'D' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return 1;
+			}
+		break;
+	case '3':	/* RD */
+		if (makeupper(Char) == 'R' && makeupper(Char1) == 'D' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return 1;
+			}
+		break;
+	case '0':	/* TH */
+	case '4':	/* TH */
+	case '5':	/* TH */
+	case '6':	/* TH */
+	case '7':	/* TH */
+	case '8':	/* TH */
+	case '9':	/* TH */
+		if (makeupper(Char) == 'T' && makeupper(Char1) == 'H' &&
+		    !isalpha(Char2) && !isdigit(Char2))
+			{
+			say_ordinal(value);
+			new_char();	/* Used Char */
+			new_char();	/* Used Char1 */
+			return 1;
+			}
+		break;
+		}
+	say_cardinal(value);
+	/* Recognize decimal points */
+	if (Char == '.' && isdigit(Char1))
+		{
+		outstring("pOYnt ");
+		for (new_char() ; isdigit(Char) ; new_char())
+			{
+			say_ascii(Char);
+			}
+		}
+	/* Spell out trailing abbreviations */
+	if (isalpha(Char))
+		{
+		while (isalpha(Char))
+			{
+			say_ascii(Char);
+			new_char();
+			}
+		}
+	return 1;
+	}
+have_letter()
+	{
+	char buff[MAX_LENGTH];
+	int count;
+	count = 0;
+	buff[count++] = ' ';	/* Required initial blank */
+	buff[count++] = makeupper(Char);
+	for (new_char() ; isalpha(Char) || Char == '\'' ; new_char())
+		{
+		buff[count++] = makeupper(Char);
+		if (count >= MAX_LENGTH-2)	/* leave room for ' ' and '\0' */
+			{
+			buff[count++] = ' ';
+			buff[count++] = '\0';
+			xlate_word(buff);
+			count = 1;
+			}
+		}
+	buff[count++] = ' ';	/* Required terminating blank */
+	buff[count++] = '\0';
+	/* Check for AAANNN type abbreviations */
+	if (isdigit(Char))
+		{
+		spell_word(buff);
+		return 1;
+		}
+	else
+	if (strlen(buff) == 3)	 /* one character, two spaces */
+		say_ascii(buff[1]);
+	else
+	if (Char == '.')		/* Possible abbreviation */
+		abbrev(buff);
+	else
+		xlate_word(buff);
+	if (Char == '-' && isalpha(Char1))
+		new_char();	/* Skip hyphens */
+	}
+/* Handle abbreviations.  Text in buff was followed by '.' */
+abbrev(buff)
+	char buff[];
+	{
+	if (strcmp(buff, " DR ") == 0)
+		{
+		xlate_word(" DOCTOR ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " MR ") == 0)
+		{
+		xlate_word(" MISTER ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " MRS ") == 0)
+		{
+		xlate_word(" MISSUS ");
+		new_char();
+		}
+	else
+	if (strcmp(buff, " PHD ") == 0)
+		{
+		spell_word(" PHD ");
+		new_char();
+		}
+	else
+		xlate_word(buff);
+	}
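`have_number()` chooses between `say_ordinal()` and `say_cardinal()` by comparing the last digit with the two look-ahead characters that follow the number. The following is a standalone sketch of that decision. The function name `ordinal_suffix_follows` is hypothetical. Like the original, it keys only on the last digit, so "11th", "12th" and "13th" are not recognized as ordinals; they are read as cardinals with "TH" spelled out.

```c
#include <ctype.h>

/* Returns 1 when c0 c1 form the ordinal suffix expected for the last
   digit (1 -> ST, 2 -> ND, 3 -> RD, anything else -> TH) and c2 is
   neither a letter nor a digit, mirroring the checks in have_number(). */
int ordinal_suffix_follows(int lastdigit, int c0, int c1, int c2)
{
    const char *suffix;

    switch (lastdigit) {
    case '1': suffix = "ST"; break;
    case '2': suffix = "ND"; break;
    case '3': suffix = "RD"; break;
    default:  suffix = "TH"; break;   /* '0' and '4'..'9' */
    }
    return toupper((unsigned char)c0) == suffix[0]
        && toupper((unsigned char)c1) == suffix[1]
        && !isalpha((unsigned char)c2)
        && !isdigit((unsigned char)c2);
}
```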
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/parse.o
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/parse.o
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme.c
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme.c
+#include <stdio.h>
+#include <ctype.h>
+#include <string.h>
+#define FALSE (0)
+#define TRUE (!0)
+/*
+**	English to Phoneme translation.
+**
+**	Rules are made up of four parts:
+**	
+**		The left context.
+**		The text to match.
+**		The right context.
+**		The phonemes to substitute for the matched text.
+**
+**	Procedure:
+**
+**		Separate each block of letters (apostrophes included) 
+**		and add a space on each side.  For each unmatched 
+**		letter in the word, look through the rules where the 
+**		text to match starts with the letter in the word.  If 
+**		the text to match is found and the right and left 
+**		context patterns also match, output the phonemes for 
+**		that rule and skip to the next unmatched letter.
+**
+**
+**	Special Context Symbols:
+**
+**		#	One or more vowels
+**		:	Zero or more consonants
+**		^	One consonant.
+**		.	One of B, D, V, G, J, L, M, N, R, W or Z (voiced 
+**			consonants)
+**		%	One of ER, E, ES, ED, ING, ELY (a suffix)
+**			(Right context only)
+**		+	One of E, I or Y (a "front" vowel)
+*/
+typedef char *Rule[4];	/* A rule is four character pointers */
+extern Rule *Rules[];	/* An array of pointers to rules */
+int isvowel(chr)
+	char chr;
+	{
+	return (chr == 'A' || chr == 'E' || chr == 'I' || 
+		chr == 'O' || chr == 'U');
+	}
+int isconsonant(chr)
+	char chr;
+	{
+	return (isupper(chr) && !isvowel(chr));
+	}
+xlate_word(word)
+	char word[];
+	{
+	int index;	/* Current position in word */
+	int type;	/* First letter of match part */
+	index = 1;	/* Skip the initial blank */
+	do
+		{
+		if (isupper(word[index]))
+			type = word[index] - 'A' + 1;
+		else
+			type = 0;
+		index = find_rule(word, index, Rules[type]);
+		}
+	while (word[index] != '\0');
+	}
+find_rule(word, index, rules)
+	char word[];
+	int index;
+	Rule *rules;
+	{
+	Rule *rule;
+	char *left, *match, *right, *output;
+	int remainder;
+	for (;;)	/* Search for the rule */
+		{
+		rule = rules++;
+		match = (*rule)[1];
+		if (match == 0)	/* bad symbol! */
+			{
+			fprintf(stderr,
+"Error: Can't find rule for: '%c' in \"%s\"\n", word[index], word);
+			return index+1;	/* Skip it! */
+			}
+		for (remainder = index; *match != '\0'; match++, remainder++)
+			{
+			if (*match != word[remainder])
+				break;
+			}
+		if (*match != '\0')	/* found mismatch */
+			continue;
+/*
+printf("\nWord: \"%s\", Index:%4d, Trying: \"%s/%s/%s\" = \"%s\"\n",
+    word, index, (*rule)[0], (*rule)[1], (*rule)[2], (*rule)[3]);
+*/
+		left = (*rule)[0];
+		right = (*rule)[2];
+		if (!leftmatch(left, &word[index-1]))
+			continue;
+/*
+printf("leftmatch(\"%s\",\"...%c\") succeeded!\n", left, word[index-1]);
+*/
+		if (!rightmatch(right, &word[remainder]))
+			continue;
+/*
+printf("rightmatch(\"%s\",\"%s\") succeeded!\n", right, &word[remainder]);
+*/
+		output = (*rule)[3];
+/*
+printf("Success: ");
+*/
+		outstring(output);
+		return remainder;
+		}
+	}
+leftmatch(pattern, context)
+	char *pattern;	/* first char of pattern to match in text */
+	char *context;	/* last char of text to be matched */
+	{
+	char *pat;
+	char *text;
+	int count;
+	if (*pattern == '\0')	/* null string matches any context */
+		{
+		return TRUE;
+		}
+	/* point to last character in pattern string */
+	count = strlen(pattern);
+	pat = pattern + (count - 1);
+	text = context;
+	for (; count > 0; pat--, count--)
+		{
+		/* First check for simple text or space */
+		if (isalpha(*pat) || *pat == '\'' || *pat == ' ')
+			if (*pat != *text)
+				return FALSE;
+			else
+				{
+				text--;
+				continue;
+				}
+		switch (*pat)
+			{
+		case '#':	/* One or more vowels */
+			if (!isvowel(*text))
+				return FALSE;
+			text--;
+			while (isvowel(*text))
+				text--;
+			break;
+		case ':':	/* Zero or more consonants */
+			while (isconsonant(*text))
+				text--;
+			break;
+		case '^':	/* One consonant */
+			if (!isconsonant(*text))
+				return FALSE;
+			text--;
+			break;
+		case '.':	/* B, D, V, G, J, L, M, N, R, W, Z */
+			if (*text != 'B' && *text != 'D' && *text != 'V'
+			   && *text != 'G' && *text != 'J' && *text != 'L'
+			   && *text != 'M' && *text != 'N' && *text != 'R'
+			   && *text != 'W' && *text != 'Z')
+				return FALSE;
+			text--;
+			break;
+		case '+':	/* E, I or Y (front vowel) */
+			if (*text != 'E' && *text != 'I' && *text != 'Y')
+				return FALSE;
+			text--;
+			break;
+		case '%':	/* suffixes are only valid in right contexts */
+		default:
+			fprintf(stderr, "Bad char in left rule: '%c'\n", *pat);
+			return FALSE;
+			}
+		}
+	return TRUE;
+	}
+rightmatch(pattern, context)
+	char *pattern;	/* first char of pattern to match in text */
+	char *context;	/* first char of text after the match */
+	{
+	char *pat;
+	char *text;
+	if (*pattern == '\0')	/* null string matches any context */
+		return TRUE;
+	pat = pattern;
+	text = context;
+	for (pat = pattern; *pat != '\0'; pat++)
+		{
+		/* First check for simple text or space */
+		if (isalpha(*pat) || *pat == '\'' || *pat == ' ')
+			if (*pat != *text)
+				return FALSE;
+			else
+				{
+				text++;
+				continue;
+				}
+		switch (*pat)
+			{
+		case '#':	/* One or more vowels */
+			if (!isvowel(*text))
+				return FALSE;
+			text++;
+			while (isvowel(*text))
+				text++;
+			break;
+		case ':':	/* Zero or more consonants */
+			while (isconsonant(*text))
+				text++;
+			break;
+		case '^':	/* One consonant */
+			if (!isconsonant(*text))
+				return FALSE;
+			text++;
+			break;
+		case '.':	/* B, D, V, G, J, L, M, N, R, W, Z */
+			if (*text != 'B' && *text != 'D' && *text != 'V'
+			   && *text != 'G' && *text != 'J' && *text != 'L'
+			   && *text != 'M' && *text != 'N' && *text != 'R'
+			   && *text != 'W' && *text != 'Z')
+				return FALSE;
+			text++;
+			break;
+		case '+':	/* E, I or Y (front vowel) */
+			if (*text != 'E' && *text != 'I' && *text != 'Y')
+				return FALSE;
+			text++;
+			break;
+		case '%':	/* ER, E, ES, ED, ING, ELY (a suffix) */
+			if (*text == 'E')
+				{
+				text++;
+				if (*text == 'L')
+					{
+					text++;
+					if (*text == 'Y')
+						{
+						text++;
+						break;
+						}
+					else
+						{
+						text--; /* Don't gobble L */
+						break;
+						}
+					}
+				else
+				if (*text == 'R' || *text == 'S' 
+				   || *text == 'D')
+					text++;
+				break;
+				}
+			else
+			if (*text == 'I')
+				{
+				text++;
+				if (*text == 'N')
+					{
+					text++;
+					if (*text == 'G')
+						{
+						text++;
+						break;
+						}
+					}
+				return FALSE;
+				}
+			else
+			return FALSE;
+		default:
+			fprintf(stderr, "Bad char in right rule:'%c'\n", *pat);
+			return FALSE;
+			}
+		}
+	return TRUE;
+	}
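The special context symbols listed at the top of phoneme.c are easiest to understand in isolation. The following is a reduced sketch of `rightmatch()` that handles literals, `#`, `:`, `^` and `+`. It omits the `.` and `%` cases. The name `right_context_matches` is hypothetical. Like the original, it assumes uppercase text padded with blanks.

```c
#include <ctype.h>

static int is_vowel(char c)
{
    return c == 'A' || c == 'E' || c == 'I' || c == 'O' || c == 'U';
}

static int is_consonant(char c)
{
    return isupper((unsigned char)c) && !is_vowel(c);
}

/* Match `pattern` against the text that follows the matched letters.
   Returns 1 on success, 0 on failure. */
int right_context_matches(const char *pattern, const char *text)
{
    for (; *pattern != '\0'; pattern++) {
        switch (*pattern) {
        case '#':               /* one or more vowels */
            if (!is_vowel(*text))
                return 0;
            while (is_vowel(*text))
                text++;
            break;
        case ':':               /* zero or more consonants */
            while (is_consonant(*text))
                text++;
            break;
        case '^':               /* exactly one consonant */
            if (!is_consonant(*text))
                return 0;
            text++;
            break;
        case '+':               /* front vowel: E, I or Y */
            if (*text != 'E' && *text != 'I' && *text != 'Y')
                return 0;
            text++;
            break;
        default:                /* letter, apostrophe or blank; this sketch
                                   also compares '.' and '%' literally */
            if (*pattern != *text)
                return 0;
            text++;
            break;
        }
    }
    return 1;
}
```

For example, the rule `{Anything, "G", "+", "j"}` fires for " GEM " because the text after the G starts with E. It does not fire for " GAME ".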
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme.o
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/phoneme.o
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/saynum.c
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/saynum.c
+#include <stdio.h>
+/*
+**              Integer to Readable ASCII Conversion Routine.
+**
+** Synopsis:
+**
+**      say_cardinal(value)
+**      	long int     value;          -- The number to output
+**
+**	The number is translated into a string of phonemes
+**
+*/
+static char *Cardinals[] = 
+	{
+	"zIHrOW ",	"wAHn ",	"tUW ",		"THrIY ",
+	"fOWr ",	"fAYv ",	"sIHks ",	"sEHvAXn ",
+	"EYt ",		"nAYn ",		
+	"tEHn ",	"IYlEHvAXn ",	"twEHlv ",	"THERtIYn ",
+	"fOWrtIYn ",	"fIHftIYn ", 	"sIHkstIYn ",	"sEHvEHntIYn ",
+	"EYtIYn ",	"nAYntIYn "
+	} ;
+static char *Twenties[] = 
+	{
+	"twEHntIY ",	"THERtIY ",	"fAOrtIY ",	"fIHftIY ",
+	"sIHkstIY ",	"sEHvEHntIY ",	"EYtIY ",	"nAYntIY "
+	} ;
+static char *Ordinals[] = 
+	{
+	"zIHrOWEHTH ",	"fERst ",	"sEHkAHnd ",	"THERd ",
+	"fOWrTH ",	"fIHfTH ",	"sIHksTH ",	"sEHvEHnTH ",
+	"EYtTH ",	"nAYnTH ",		
+	"tEHnTH ",	"IYlEHvEHnTH ",	"twEHlvTH ",	"THERtIYnTH ",
+	"fOWrtIYnTH ",	"fIHftIYnTH ", 	"sIHkstIYnTH ",	"sEHvEHntIYnTH ",
+	"EYtIYnTH ",	"nAYntIYnTH "
+	} ;
+static char *Ord_twenties[] = 
+	{
+	"twEHntIYEHTH ","THERtIYEHTH ",	"fAOrtIYEHTH ",	"fIHftIYEHTH ",
+	"sIHkstIYEHTH ","sEHvEHntIYEHTH ","EYtIYEHTH ",	"nAYntIYEHTH "
+	} ;
+/*
+** Translate a number to phonemes.  This version is for CARDINAL numbers.
+**	 Note: this is recursive.
+*/
+say_cardinal(value)
+	long int value;
+	{
+	if (value < 0)
+		{
+		outstring("mAYnAHs ");
+		value = (-value);
+		if (value < 0)	/* Overflow: the most negative long has no positive counterpart */
+			{
+			outstring("IHnfIHnIHtIY ");
+			return 1;
+			}
+		}
+	if (value >= 1000000000L)	/* Billions */
+		{
+		say_cardinal(value/1000000000L);
+		outstring("bIHlIYAXn ");
+		value = value % 1000000000;
+		if (value == 0)
+			return 1;		/* Even billion */
+		if (value < 100)	/* as in THREE BILLION AND FIVE */
+			outstring("AEnd ");
+		}
+	if (value >= 1000000L)	/* Millions */
+		{
+		say_cardinal(value/1000000L);
+		outstring("mIHlIYAXn ");
+		value = value % 1000000L;
+		if (value == 0)
+			return 1;		/* Even million */
+		if (value < 100)	/* as in THREE MILLION AND FIVE */
+			outstring("AEnd ");
+		}
+	/* Thousands 1000..1099 and 2000..999999 */
+	/* 1100 to 1999 is read as eleven-hundred to nineteen-hundred */
+	if ((value >= 1000L && value <= 1099L) || value >= 2000L)
+		{
+		say_cardinal(value/1000L);
+		outstring("THAWzAEnd ");
+		value = value % 1000L;
+		if (value == 0)
+			return 1;		/* Even thousand */
+		if (value < 100)	/* as in THREE THOUSAND AND FIVE */
+			outstring("AEnd ");
+		}
+	if (value >= 100L)
+		{
+		outstring(Cardinals[value/100]);
+		outstring("hAHndrEHd ");
+		value = value % 100;
+		if (value == 0)
+			return 1;		/* Even hundred */
+		}
+	if (value >= 20)
+		{
+		outstring(Twenties[(value-20)/ 10]);
+		value = value % 10;
+		if (value == 0)
+			return 1;		/* Even ten */
+		}
+	outstring(Cardinals[value]);
+	return 1;
+	} 
+/*
+** Translate a number to phonemes.  This version is for ORDINAL numbers.
+**	 Note: this is recursive.
+*/
+say_ordinal(value)
+	long int value;
+	{
+	if (value < 0)
+		{
+		outstring("mAHnAXs ");
+		value = (-value);
+		if (value < 0)	/* Overflow: the most negative long has no positive counterpart */
+			{
+			outstring("IHnfIHnIHtIY ");
+			return 1;
+			}
+		}
+	if (value >= 1000000000L)	/* Billions */
+		{
+		say_cardinal(value/1000000000L);
+		value = value % 1000000000;
+		if (value == 0)
+			{
+			outstring("bIHlIYAXnTH ");
+			return 1;		/* Even billion */
+			}
+		outstring("bIHlIYAXn ");
+		if (value < 100)	/* as in THREE BILLION AND FIVE */
+			outstring("AEnd ");
+		}
+	if (value >= 1000000L)	/* Millions */
+		{
+		say_cardinal(value/1000000L);
+		value = value % 1000000L;
+		if (value == 0)
+			{
+			outstring("mIHlIYAXnTH ");
+			return 1;		/* Even million */
+			}
+		outstring("mIHlIYAXn ");
+		if (value < 100)	/* as in THREE MILLION AND FIVE */
+			outstring("AEnd ");
+		}
+	/* Thousands 1000..1099 and 2000..999999 */
+	/* 1100 to 1999 is read as eleven-hundred to nineteen-hundred */
+	if ((value >= 1000L && value <= 1099L) || value >= 2000L)
+		{
+		say_cardinal(value/1000L);
+		value = value % 1000L;
+		if (value == 0)
+			{
+			outstring("THAWzAEndTH ");
+			return 1;		/* Even thousand */
+			}
+		outstring("THAWzAEnd ");
+		if (value < 100)	/* as in THREE THOUSAND AND FIVE */
+			outstring("AEnd ");
+		}
+	if (value >= 100L)
+		{
+		outstring(Cardinals[value/100]);
+		value = value % 100;
+		if (value == 0)
+			{
+			outstring("hAHndrEHdTH ");
+			return 1;		/* Even hundred */
+			}
+		outstring("hAHndrEHd ");
+		}
+	if (value >= 20)
+		{
+		if ((value%10) == 0)
+			{
+			outstring(Ord_twenties[(value-20)/ 10]);
+			return 1;		/* Even ten */
+			}
+		outstring(Twenties[(value-20)/ 10]);
+		value = value % 10;
+		}
+	outstring(Ordinals[value]);
+	return 1;
+	} 
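`say_cardinal()` splits a value into billions, millions, thousands, hundreds and tens. It deliberately reads 1100 to 1999 as "eleven hundred" through "nineteen hundred" instead of "one thousand ...". It also inserts "and" when fewer than 100 remain after the thousands. The following sketch reproduces that decomposition for 0 to 9999 and emits English words instead of phonemes. `cardinal_words` and its buffer interface are hypothetical. Like `outstring()` in the original, every word is followed by a blank.

```c
#include <string.h>

static const char *ones[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *tens[] = {
    "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Append a word plus a trailing blank; the caller sizes the buffer. */
static void put(char *out, const char *word)
{
    strcat(out, word);
    strcat(out, " ");
}

/* English-word analogue of say_cardinal(), restricted to 0..9999.
   `out` must hold an empty string on entry. */
void cardinal_words(long value, char *out)
{
    if ((value >= 1000 && value <= 1099) || value >= 2000) {
        put(out, ones[value / 1000]);
        put(out, "thousand");
        value %= 1000;
        if (value == 0)
            return;
        if (value < 100)            /* as in TWO THOUSAND AND FIVE */
            put(out, "and");
    }
    if (value >= 100) {
        put(out, ones[value / 100]);    /* 11..19 for 1100..1999 */
        put(out, "hundred");
        value %= 100;
        if (value == 0)
            return;
    }
    if (value >= 20) {
        put(out, tens[(value - 20) / 10]);
        value %= 10;
        if (value == 0)
            return;
    }
    put(out, ones[value]);
}
```

The original handles larger values by recursing on the quotient. The sketch stops at 9999 so that `value / 1000` always indexes the 0..19 table.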
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/saynum.o
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/saynum.o
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/spellword.c
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/spellword.c
+#include <stdio.h>
+static char *Ascii[] =
+	{
+"nUWl ","stAArt AXv hEHdER ","stAArt AXv tEHkst ","EHnd AXv tEHkst ",
+"EHnd AXv trAEnsmIHSHAXn ",
+"EHnkwAYr ","AEk ","bEHl ","bAEkspEYs ","tAEb ","lIHnIYfIYd ",
+"vERtIHkAXl tAEb ","fAOrmfIYd ","kAErAYj rIYtERn ","SHIHft AWt ",
+"SHIHft IHn ","dIHlIYt ","dIHvIHs kAAntrAAl wAHn ","dIHvIHs kAAntrAAl tUW ",
+"dIHvIHs kAAntrAAl THrIY ","dIHvIHs kAAntrAAl fOWr ","nAEk ","sIHnk ",
+"EHnd tEHkst blAAk ","kAEnsEHl ","EHnd AXv mEHsIHj ","sUWbstIHtUWt ",
+"EHskEYp ","fAYEHld sIYpERAEtER ","grUWp sIYpERAEtER ","rIYkAOrd sIYpERAEtER ",
+"yUWnIHt sIYpERAEtER ","spEYs ","EHksklAEmEYSHAXn mAArk ","dAHbl kwOWt ",
+"nUWmbER sAYn ","dAAlER sAYn ","pERsEHnt ","AEmpERsAEnd ","kwOWt ",
+"OWpEHn pEHrEHn ","klOWz pEHrEHn ","AEstEHrIHsk ","plAHs ","kAAmmAX ",
+"mIHnAHs ","pIYrIYAAd ","slAESH ",
+"zIHrOW ","wAHn ","tUW ","THrIY ","fOWr ",
+"fAYv ","sIHks ","sEHvAXn ","EYt ","nAYn ",
+"kAAlAXn ","sEHmIHkAAlAXn ","lEHs DHAEn ","EHkwAXl sAYn ","grEYtER DHAEn ",
+"kwEHsCHAXn mAArk ","AEt sAYn ",
+"EY ","bIY ","sIY ","dIY ","IY ","EHf ","jIY ",
+"EYtCH ","AY ","jEY ","kEY ","EHl ","EHm ","EHn ","OW ","pIY ",
+"kyUW ","AAr ","EHs ","tIY ","yUW ","vIY ",
+"dAHblyUW ","EHks ","wAY ","zIY ",
+"lEHft brAEkEHt ","bAEkslAESH ","rAYt brAEkEHt ","kAErEHt ",
+"AHndERskAOr ","AEpAAstrAAfIH ",
+"EY ","bIY ","sIY ","dIY ","IY ","EHf ","jIY ",
+"EYtCH ","AY ","jEY ","kEY ","EHl ","EHm ","EHn ","OW ","pIY ",
+"kyUW ","AAr ","EHs ","tIY ","yUW ","vIY ",
+"dAHblyUW ","EHks ","wAY ","zIY ",
+"lEHft brEYs ","vERtIHkAXl bAAr ","rAYt brEYs ","tAYld ","dEHl ",
+	};
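+/* Implemented elsewhere in english2phoneme; emits one phoneme string. */
+int outstring(char *string);
+/* Say a single ASCII character by name (the 8th bit is masked off). */
+int say_ascii(int character)
+	{
+	outstring(Ascii[character & 0x7F]);
+	return 0;
+	}
+/* Spell a word letter by letter. The loop skips word[0] and stops before
+ * the last character, so the caller is expected to pass the word with one
+ * padding character (presumably a space) on each side. */
+int spell_word(char *word)
+	{
+	for (word++ ; word[1] != '\0' ; word++)
+		outstring(Ascii[(*word) & 0x7F]);
+	return 0;
+	}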
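The lookup in `spellword.c` is a plain table index: each character is masked to 7 bits and used as a position in the 128-entry `Ascii[]` array, and `spell_word` drops the first and last character of its argument. The Python sketch below mirrors that behavior for lowercase letters only; the table slice is copied from the C source, while `LOWERCASE_NAMES`, `say_ascii`, and `spell_word` here are illustrative stand-ins, not part of the project.

```python
# Partial Python transcription of the Ascii[] table in spellword.c:
# only the lowercase letters 'a'..'z' (ASCII 97..122) are copied here.
LOWERCASE_NAMES = [
    "EY ", "bIY ", "sIY ", "dIY ", "IY ", "EHf ", "jIY  ",
    "EYtCH ", "AY ", "jEY ", "kEY ", "EHl ", "EHm ", "EHn ", "AA ", "pIY ",
    "kw ", "AAr ", "EHz ", "tIY ", "AHw ", "vIY ",
    "dAHblyUWw ", "EHks ", "wAYIY ", "zIY ",
]
FIRST_LOWERCASE = ord("a")


def say_ascii(ch: str) -> str:
    """Return the phoneme string for one character, masking to 7 bits like the C code."""
    code = ord(ch) & 0x7F
    index = code - FIRST_LOWERCASE
    if not 0 <= index < len(LOWERCASE_NAMES):
        raise KeyError(f"{ch!r} is outside the partial table in this sketch")
    return LOWERCASE_NAMES[index]


def spell_word(padded_word: str) -> str:
    """Mirror spell_word(): skip the first and last character, spell the rest."""
    return "".join(say_ascii(ch) for ch in padded_word[1:-1])


print(spell_word(" cat "))  # -> "sIY EY tIY "
```

Passing `" cat "` with its padding spaces matches how the C loop consumes its input; passing `"cat"` would spell only the middle letter.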
--- a/ernie-sat/tools/aligner/english_envir/english2phoneme/spellword.o
+++ b/ernie-sat/tools/aligner/english_envir/english2phoneme/spellword.o
--- a/ernie-sat/tools/aligner/english_envir/phoneme
+++ b/ernie-sat/tools/aligner/english_envir/phoneme
--- a/ernie-sat/tools/aligner/mandarin/16000/config
+++ b/ernie-sat/tools/aligner/mandarin/16000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 625.0
+TARGETKIND = PLP_E_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 24
+LPCORDER = 12
+USEPOWER = T
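HTK expresses all durations in this config in units of 100 ns, so the values can be converted directly: `SOURCERATE = 625.0` is a 62.5 µs sample period (16 kHz, matching the `16000` directory), `TARGETRATE = 100000.0` is a 10 ms frame shift, and `WINDOWSIZE = 250000.0` is a 25 ms analysis window. The sketch below does that arithmetic; `parse_htk_config` is a hypothetical, simplified reader that only handles `KEY = VALUE` lines and `#` comments.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # one HTK time unit is 100 ns
HTK_UNITS_PER_MS = HTK_UNITS_PER_SECOND // 1000

CONFIG = """\
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCERATE = 625.0
TARGETKIND = PLP_E_D_A_Z
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
"""


def parse_htk_config(text: str) -> dict:
    """Parse simple 'KEY = VALUE' lines, ignoring '#' comments and blank lines."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def htk_timing(params: dict) -> dict:
    """Convert HTK 100 ns units into sample rate, frame shift and window length."""
    source_rate = float(params["SOURCERATE"])  # sample period
    target_rate = float(params["TARGETRATE"])  # frame shift
    window_size = float(params["WINDOWSIZE"])  # analysis window length
    return {
        "sample_rate_hz": HTK_UNITS_PER_SECOND / source_rate,
        "frame_shift_ms": target_rate / HTK_UNITS_PER_MS,
        "window_ms": window_size / HTK_UNITS_PER_MS,
        "frame_shift_samples": target_rate / source_rate,
        "window_samples": window_size / source_rate,
    }


print(htk_timing(parse_htk_config(CONFIG)))
```

At 16 kHz this works out to a 160-sample hop and a 400-sample window per frame.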
--- a/ernie-sat/tools/aligner/mandarin/16000/hmmdefs
+++ b/ernie-sat/tools/aligner/mandarin/16000/hmmdefs
--- a/ernie-sat/tools/aligner/mandarin/16000/macros
+++ b/ernie-sat/tools/aligner/mandarin/16000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_E_D_A_Z><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 3.775564e-03 4.079504e-03 4.140842e-03 4.754395e-03 5.421045e-03 3.905575e-03 3.417824e-03 3.297425e-03 3.133435e-03 2.756949e-03 2.279631e-03 1.618109e-03 1.932043e-03 1.913605e-04 1.656697e-04 1.542451e-04 1.967382e-04 1.924746e-04 1.852640e-04 1.763692e-04 1.960309e-04 1.941942e-04 1.750100e-04 1.440021e-04 1.083358e-04 8.763076e-06 2.761404e-05 2.222120e-05 2.312505e-05 3.049958e-05 3.095940e-05 3.121570e-05 3.065950e-05 3.542555e-05 3.565122e-05 3.164548e-05 2.577707e-05 2.046118e-05 1.202846e-06
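The `<VECSIZE> 39` and `<VARIANCE> 39` entries follow from `TARGETKIND = PLP_E_D_A_Z`: assuming HTK's default of 12 cepstral coefficients (the config does not set `NUMCEPS`), `_E` appends log energy for 13 static values, and `_D` and `_A` add delta and acceleration copies, giving 13 × 3 = 39. `_Z` (zero mean) does not change the size. The sketch below encodes that count and checks a variance-floor vector against its declared length; both helpers are hypothetical and handle only the qualifiers named in the docstring.

```python
import re


def htk_feature_dim(target_kind: str, num_ceps: int = 12) -> int:
    """Vector size for a cepstral TARGETKIND such as PLP_E_D_A_Z.

    Handles only the _E, _0, _D, _A and _T qualifiers; _Z and the storage
    options _C/_K do not change the dimension. _N is rejected.
    """
    base, *qualifiers = target_kind.split("_")
    if base not in ("PLP", "MFCC"):
        raise ValueError(f"unsupported base kind: {base}")
    if "N" in qualifiers:
        raise ValueError("_N is not handled by this sketch")
    static = num_ceps + ("E" in qualifiers) + ("0" in qualifiers)
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers) + ("T" in qualifiers)
    return static * blocks


def parse_variance_floor(macro_text: str) -> list:
    """Return the floats that follow '<VARIANCE> n' and check there are n of them."""
    match = re.search(r"<VARIANCE>\s+(\d+)\s+([^<~]+)", macro_text)
    if match is None:
        raise ValueError("no <VARIANCE> block found")
    size = int(match.group(1))
    values = [float(tok) for tok in match.group(2).split()]
    if len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    return values


SAMPLE = '~v "varFloor1"\n<VARIANCE> 3\n 1.0e-03 2.0e-03 3.0e-03\n'
print(htk_feature_dim("PLP_E_D_A_Z"), parse_variance_floor(SAMPLE))
```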
--- a/ernie-sat/tools/aligner/mandarin/dict
+++ b/ernie-sat/tools/aligner/mandarin/dict
--- a/ernie-sat/tools/aligner/mandarin/dict4jieba.txt
+++ b/ernie-sat/tools/aligner/mandarin/dict4jieba.txt
--- a/ernie-sat/tools/aligner/mandarin/monophones
+++ b/ernie-sat/tools/aligner/mandarin/monophones
+a1
+a2
+a3
+a4
+a5
+aa
+ai1
+ai2
+ai3
+ai4
+ai5
+an1
+an2
+an3
+an4
+an5
+ang1
+ang2
+ang3
+ang4
+ang5
+ao1
+ao2
+ao3
+ao4
+ao5
+b
+c
+ch
+d
+e1
+e2
+e3
+e4
+e5
+ee
+ei1
+ei2
+ei3
+ei4
+ei5
+en1
+en2
+en3
+en4
+en5
+eng1
+eng2
+eng3
+eng4
+eng5
+er2
+er3
+er4
+er5
+f
+g
+h
+i1
+i2
+i3
+i4
+i5
+ia1
+ia2
+ia3
+ia4
+ia5
+ian1
+ian2
+ian3
+ian4
+ian5
+iang1
+iang2
+iang3
+iang4
+iang5
+iao1
+iao2
+iao3
+iao4
+iao5
+ie1
+ie2
+ie3
+ie4
+ie5
+ii
+in1
+in2
+in3
+in4
+in5
+ing1
+ing2
+ing3
+ing4
+ing5
+iong1
+iong2
+iong3
+iong4
+iong5
+iu1
+iu2
+iu3
+iu4
+iu5
+ix1
+ix2
+ix3
+ix4
+ix5
+iy1
+iy2
+iy3
+iy4
+iy5
+iz4
+j
+k
+l
+m
+n
+o1
+o2
+o3
+o4
+o5
+ong1
+ong2
+ong3
+ong4
+ong5
+oo
+ou1
+ou2
+ou3
+ou4
+ou5
+p
+q
+r
+s
+sh
+sil
+sp
+t
+u1
+u2
+u3
+u4
+u5
+ua1
+ua2
+ua3
+ua4
+ua5
+uai1
+uai2
+uai3
+uai4
+uai5
+uan1
+uan2
+uan3
+uan4
+uan5
+uang1
+uang2
+uang3
+uang4
+uang5
+ueng1
+ueng3
+ueng4
+ueng5
+ui1
+ui2
+ui3
+ui4
+ui5
+un1
+un2
+un3
+un4
+un5
+uo1
+uo2
+uo3
+uo4
+uo5
+uu
+v1
+v2
+v3
+v4
+v5
+van1
+van2
+van3
+van4
+van5
+ve1
+ve2
+ve3
+ve4
+ve5
+vn1
+vn2
+vn3
+vn4
+vn5
+vv
+x
+z
+zh
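Most entries in `monophones` are a pinyin final followed by a digit from 1 to 5, while initials and the `sil`/`sp` markers carry no digit. Reading the digit as the tone (1 to 4 for the four tones, 5 for the neutral tone) follows the usual pinyin convention and is an assumption here; the file itself does not document it. The hypothetical helpers below split an entry into base and tone and count the toned variants of each base, which makes gaps such as the missing `er1` and `ueng2` easy to spot.

```python
import re
from collections import Counter

# Assumed convention: a lowercase base followed by a single tone digit 1-5.
TONED_PHONE = re.compile(r"([a-z]+)([1-5])")


def split_tone(phone: str):
    """Split 'iang3' into ('iang', 3); untoned phones such as 'zh' or 'sil' get None."""
    match = TONED_PHONE.fullmatch(phone)
    if match is None:
        return phone, None
    return match.group(1), int(match.group(2))


def tones_per_base(phones):
    """Count how many toned variants each base phone has in the inventory."""
    return Counter(base for base, tone in map(split_tone, phones) if tone is not None)


# Usage (path taken from this diff):
#   phones = open("tools/aligner/mandarin/monophones").read().split()
#   print(tones_per_base(phones)["er"])
print(split_tone("iang3"), split_tone("sil"))
```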
--- a/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/config
+++ b/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/config
+# Coding parameters
+SOURCEKIND = WAVEFORM
+SOURCEFORMAT = WAVE
+SOURCERATE = 625.0
+TARGETKIND = PLP_E_D_A_Z
+TARGETRATE = 100000.0
+SAVECOMPRESSED = T
+SAVEWITHCRC = T
+WINDOWSIZE = 250000.0
+ZMEANSOURCE = T
+USEHAMMING = T
+PREEMCOEF = 0.97
+NUMCHANS = 24
+LPCORDER = 12
+USEPOWER = T
--- a/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/hmmdefs
+++ b/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/hmmdefs
--- a/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/macros
+++ b/ernie-sat/tools/aligner/mandarin_envir/mandarin/16000/macros
+~o
+<STREAMINFO> 1 39
+<VECSIZE> 39<NULLD><PLP_E_D_A_Z><DIAGC>
+~v "varFloor1"
+<VARIANCE> 39
+ 3.775564e-03 4.079504e-03 4.140842e-03 4.754395e-03 5.421045e-03 3.905575e-03 3.417824e-03 3.297425e-03 3.133435e-03 2.756949e-03 2.279631e-03 1.618109e-03 1.932043e-03 1.913605e-04 1.656697e-04 1.542451e-04 1.967382e-04 1.924746e-04 1.852640e-04 1.763692e-04 1.960309e-04 1.941942e-04 1.750100e-04 1.440021e-04 1.083358e-04 8.763076e-06 2.761404e-05 2.222120e-05 2.312505e-05 3.049958e-05 3.095940e-05 3.121570e-05 3.065950e-05 3.542555e-05 3.565122e-05 3.164548e-05 2.577707e-05 2.046118e-05 1.202846e-06
--- a/ernie-sat/tools/aligner/mandarin_envir/mandarin/dict
+++ b/ernie-sat/tools/aligner/mandarin_envir/mandarin/dict
--- a/ernie-sat/tools/aligner/mandarin_envir/mandarin/monophones
+++ b/ernie-sat/tools/aligner/mandarin_envir/mandarin/monophones
--- a/ernie-sat/tools/parallel_wavegan_pretrained_vocoder.py
+++ b/ernie-sat/tools/parallel_wavegan_pretrained_vocoder.py
+# Copyright 2021 Tomoki Hayashi
+#  Apache 2.0  (http://www.apache.org/licenses/LICENSE-2.0)
+"""Wrapper class for the vocoder model trained with parallel_wavegan repo."""
+import logging
+import os
+from pathlib import Path
+from typing import Optional
+from typing import Union
+import yaml
+import torch
+class ParallelWaveGANPretrainedVocoder(torch.nn.Module):
+    """Wrapper class to load the vocoder trained with parallel_wavegan repo."""
+    def __init__(
+        self,
+        model_file: Union[Path, str],
+        config_file: Optional[Union[Path, str]] = None,
+    ):
+        """Initialize ParallelWaveGANPretrainedVocoder module."""
+        super().__init__()
+        try:
+            from parallel_wavegan.utils import load_model
+        except ImportError:
+            logging.error(
+                "`parallel_wavegan` is not installed. "
+                "Please install via `pip install -U parallel_wavegan`."
+            )
+            raise
+        if config_file is None:
+            dirname = os.path.dirname(str(model_file))
+            config_file = os.path.join(dirname, "config.yml")
+        with open(config_file) as f:
+            config = yaml.load(f, Loader=yaml.Loader)
+        self.fs = config["sampling_rate"]
+        self.vocoder = load_model(model_file, config)
+        if hasattr(self.vocoder, "remove_weight_norm"):
+            self.vocoder.remove_weight_norm()
+        self.normalize_before = False
+        if hasattr(self.vocoder, "mean"):
+            self.normalize_before = True
+    @torch.no_grad()
+    def forward(self, feats: torch.Tensor) -> torch.Tensor:
+        """Generate waveform with pretrained vocoder.
+        Args:
+            feats (Tensor): Feature tensor (T_feats, #mels).
+        Returns:
+            Tensor: Generated waveform tensor (T_wav).
+        """
+        return self.vocoder.inference(
+            feats,
+            normalize_before=self.normalize_before,
+        ).view(-1)
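When `config_file` is omitted, the wrapper looks for `config.yml` in the same directory as the `.pkl` checkpoint, so the two files should be kept side by side. The lookup is isolated below as a hypothetical helper, `resolve_vocoder_config`, so that its behavior (including the case of a bare filename) can be checked without `torch` or `parallel_wavegan` installed; the example paths are placeholders.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_vocoder_config(
    model_file: Union[Path, str],
    config_file: Optional[Union[Path, str]] = None,
) -> str:
    """Return the explicit config path, else config.yml next to the checkpoint."""
    if config_file is not None:
        return str(config_file)
    return os.path.join(os.path.dirname(str(model_file)), "config.yml")


# Hypothetical checkpoint locations, for illustration only.
print(resolve_vocoder_config("exp/pwg/checkpoint-400000steps.pkl"))
print(resolve_vocoder_config("checkpoint.pkl"))  # bare filename: looks in the CWD
```

A bare filename resolves to `config.yml` relative to the current working directory, which is easy to miss when scripts are launched from a different folder.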
--- a/ernie-sat/utils.py
+++ b/ernie-sat/utils.py
@@ -38,6 +38,7 @@ import paddle.nn.functional as F
 from paddlespeech.t2s.modules.nets_utils import make_pad_mask
 from paddlespeech.t2s.exps.syn_utils import get_frontend
+from tools.parallel_wavegan_pretrained_vocoder import ParallelWaveGANPretrainedVocoder
 from sedit_arg_parser import parse_args
 model_alias = {
@@ -60,14 +61,38 @@ model_alias = {
+def is_chinese(ch):
+    # True if `ch` is in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
+    return '\u4e00' <= ch <= '\u9fff'
+def build_vocoder_from_file(
+    vocoder_config_file=None,
+    vocoder_file=None,
+    model=None,
+    device="cpu",
+):
+    """Build a vocoder from a checkpoint file and move it to ``device``.
+    Only ParallelWaveGAN checkpoints (``*.pkl``) are supported; ``model`` is unused.
+    """
+    if str(vocoder_file).endswith(".pkl"):
+        # A ".pkl" checkpoint is assumed to have been trained with the parallel_wavegan repo.
+        vocoder = ParallelWaveGANPretrainedVocoder(
+            vocoder_file, vocoder_config_file
+        )
+        return vocoder.to(device)
+    raise ValueError(f"{vocoder_file} is not a supported vocoder format.")
 def get_voc_out(mel, target_language="chinese"):
    # vocoder
    args = parse_args()
    assert target_language == "chinese" or target_language == "english", "In get_voc_out function, target_language is illegal..."
-    print("current vocoder: ", args.voc)
+    # print("current vocoder: ", args.voc)
    with open(args.voc_config) as f:
        voc_config = CfgNode(yaml.safe_load(f))
    # print(voc_config)
@@ -136,6 +161,23 @@ def get_am_inference(args, am_config):
 def evaluate_durations(phns, target_language="chinese", fs=24000, hop_length=300):
    args = parse_args()
+    # Hard-coded FastSpeech2 acoustic-model checkpoints for each target
+    # language; the paths are relative to the current working directory.
+    if target_language == 'english':
+        args.lang = 'en'
+        args.am = "fastspeech2_ljspeech"
+        args.am_config = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/default.yaml"
+        args.am_ckpt = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/snapshot_iter_100000.pdz"
+        args.am_stat = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/speech_stats.npy"
+        args.phones_dict = "download/fastspeech2_nosil_ljspeech_ckpt_0.5/phone_id_map.txt"
+    elif target_language == 'chinese':
+        args.lang = 'zh'
+        args.am = "fastspeech2_csmsc"
+        args.am_config = "download/fastspeech2_conformer_baker_ckpt_0.5/conformer.yaml"
+        args.am_ckpt = "download/fastspeech2_conformer_baker_ckpt_0.5/snapshot_iter_76000.pdz"
+        args.am_stat = "download/fastspeech2_conformer_baker_ckpt_0.5/speech_stats.npy"
+        args.phones_dict = "download/fastspeech2_conformer_baker_ckpt_0.5/phone_id_map.txt"
    # args = parser.parse_args(args=[])
    if args.ngpu == 0:
        paddle.set_device("cpu")
@@ -167,6 +209,7 @@ def evaluate_durations(phns, target_language="chinese", fs=24000, hop_length=300
    phonemes = [
        phn if phn in vocab_phones else "sp" for phn in torch_phns
    ]
    phone_ids = [vocab_phones[item] for item in phonemes]
    phone_ids_new = phone_ids
    phone_ids_new.append(vocab_size-1)
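The per-language branches added to `evaluate_durations` overwrite six fields of `args` with fixed checkpoint settings. The same selection can be expressed as a lookup table, sketched below with the values copied from the diff; `AM_SETTINGS` and `apply_am_settings` are hypothetical names, and `SimpleNamespace` stands in for the parsed arguments. One deliberate difference: an unknown language raises `ValueError` here, whereas the diff leaves `args` unchanged in that case.

```python
from types import SimpleNamespace

EN_DIR = "download/fastspeech2_nosil_ljspeech_ckpt_0.5"
ZH_DIR = "download/fastspeech2_conformer_baker_ckpt_0.5"

# Settings copied from evaluate_durations(); paths are relative to the CWD.
AM_SETTINGS = {
    "english": {
        "lang": "en",
        "am": "fastspeech2_ljspeech",
        "am_config": f"{EN_DIR}/default.yaml",
        "am_ckpt": f"{EN_DIR}/snapshot_iter_100000.pdz",
        "am_stat": f"{EN_DIR}/speech_stats.npy",
        "phones_dict": f"{EN_DIR}/phone_id_map.txt",
    },
    "chinese": {
        "lang": "zh",
        "am": "fastspeech2_csmsc",
        "am_config": f"{ZH_DIR}/conformer.yaml",
        "am_ckpt": f"{ZH_DIR}/snapshot_iter_76000.pdz",
        "am_stat": f"{ZH_DIR}/speech_stats.npy",
        "phones_dict": f"{ZH_DIR}/phone_id_map.txt",
    },
}


def apply_am_settings(args, target_language: str):
    """Overwrite the acoustic-model fields of `args` for the given language."""
    try:
        settings = AM_SETTINGS[target_language]
    except KeyError:
        raise ValueError(f"unsupported target_language: {target_language!r}") from None
    for key, value in settings.items():
        setattr(args, key, value)
    return args


args = apply_am_settings(SimpleNamespace(ngpu=0), "english")
print(args.am, args.am_ckpt)
```

Keeping the paths in one table means a checkpoint update touches a single entry instead of two parallel `if` branches.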

--- a/ernie-sat/wavs/pred_zh.wav
+++ b/ernie-sat/wavs/pred_zh.wav
--- a/ernie-sat/wavs/pred_en_edit_paddle_voc.wav
+++ b/ernie-sat/wavs/pred_en_edit_paddle_voc.wav
--- a/ernie-sat/wavs/pred.wav
+++ b/ernie-sat/wavs/pred.wav