questions.txt 1.0 KB
Corentin Jemine committed
- Page format
- Could it be that in HMM-based synthesis, the static features and their deltas (dynamic features) are the same thing as the predicted mean and variance?
- How do I aggregate the different metrics: MOS, preference score, LSD/VU error rate/F0 RMSE?

Self:
- Am I going to need a different embedding for the voice of the same speaker in two different languages? I may need to formulate a "unique encoding hypothesis", i.e. that two people with the same voice in language A would also have the same voice in language B. This hypothesis is likely not strictly true, but it is still a reasonable simplification for the voice transfer problem.
- [1409.0473] "Most of the proposed neural machine translation models belong to a family of encoder–decoders (...), with an encoder and a decoder for each language, (...)". I could do something similar: one voice encoder and one synthesizer per language, while somehow keeping a single embedding space shared by all languages. This reminds me of UNIT; I wonder whether it is applicable here. Very likely, the best way to do this lies in recent NLP methods.
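
A rough sketch of what the per-language idea above could look like as an interface. Everything here is an assumption made for illustration: the class name `LanguageModule`, the language codes, the dimensions, and the untrained random linear maps standing in for real networks. The embedding space is "shared" only in the sense that every encoder outputs a unit vector of the same size; actually aligning the spaces across languages would need a training objective that is not shown.

```python
import numpy as np

EMBED_DIM = 8   # size of the shared voice embedding (arbitrary choice)
FEAT_DIM = 20   # size of one acoustic feature frame (arbitrary choice)


class LanguageModule:
    """Hypothetical voice encoder + synthesizer pair for one language."""

    def __init__(self, rng):
        # Untrained random linear maps, placeholders for real networks.
        self.enc = rng.standard_normal((FEAT_DIM, EMBED_DIM))
        self.dec = rng.standard_normal((EMBED_DIM, FEAT_DIM))

    def encode(self, frames):
        # Average the frames of an utterance, project, then L2-normalize so
        # that every language's encoder lands on the same unit sphere.
        e = frames.mean(axis=0) @ self.enc
        return e / np.linalg.norm(e)

    def synthesize(self, embedding, n_frames):
        # Placeholder synthesizer: one decoded frame repeated n_frames times.
        return np.tile(embedding @ self.dec, (n_frames, 1))


rng = np.random.default_rng(0)
modules = {"en": LanguageModule(rng), "fr": LanguageModule(rng)}

# Encode a voice from an utterance in one language, synthesize in another.
utterance_en = rng.standard_normal((50, FEAT_DIM))
voice = modules["en"].encode(utterance_en)
frames_fr = modules["fr"].synthesize(voice, n_frames=30)
```

Under the "unique encoding hypothesis", `voice` would be usable as-is by any language's synthesizer, which is what the last two lines rely on.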