SP Module 5 Speech Synthesis – Phonemes and the Front End

杨丝儿
Published 2022-11-10 15:29:33
Collected in the column: 杨丝儿的小站

Tokenisation & normalisation

When processing almost any text, we need to find the words. This involves splitting the input character sequence into tokens and normalising each token into words.
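
The two steps can be sketched as follows. This is a minimal illustration, not the method of any particular system: the abbreviation table, the punctuation set, and the digit-by-digit reading of numbers are all assumptions made for the example.

```python
import re

# Hypothetical, tiny abbreviation table; a real front end would need far more.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def tokenise(text):
    """Split on whitespace, then strip punctuation from the ends of each token."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            # Keep the full stop: it is part of the abbreviation.
            tokens.append(chunk)
        else:
            tokens.append(chunk.strip(".,;:!?\"'()"))
    return [t for t in tokens if t]


def normalise(token):
    """Map one token to a list of words."""
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return [ABBREVIATIONS[lower]]
    if re.fullmatch(r"[0-9]+", token):
        # Simplest possible reading (an assumption): digit by digit.
        return [DIGITS[int(d)] for d in token]
    return [lower]


def front_end(text):
    """Tokenise, then normalise every token, returning a flat word list."""
    words = []
    for token in tokenise(text):
        words.extend(normalise(token))
    return words


print(front_end("Dr. Smith lives at 42 Main St."))
```

Note that the table reads "St." as "street" everywhere, although it can also mean "saint". Resolving that kind of ambiguity needs context, which this sketch ignores.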

Handwritten rules

Every user of a language holds a lot of knowledge about that language in their mind. One way to capture and make use of that knowledge is in the form of rules.
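
One common way to write such knowledge down is an ordered list of pattern rules. The sketch below classifies tokens with regular expressions. The rule set and the category names are hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical hand-written rules, tried in order; the first match wins.
RULES = [
    (re.compile(r"(1[0-9]|20)[0-9]{2}"), "year"),
    (re.compile(r"[0-9]+"), "cardinal"),
    (re.compile(r"[0-9]+(st|nd|rd|th)"), "ordinal"),
    (re.compile(r"\$[0-9]+(\.[0-9]{2})?"), "money"),
    (re.compile(r"[A-Z]{2,}"), "letter_sequence"),
]


def classify(token):
    """Return the label of the first rule whose pattern matches the whole token."""
    for pattern, label in RULES:
        if pattern.fullmatch(token):
            return label
    return "word"


for token in ["1984", "42", "3rd", "$5.99", "BBC", "hello"]:
    print(token, classify(token))
```

Rule order matters here: "1984" is labelled a year only because that rule comes before the cardinal rule, and the rules cannot tell when "1984" is meant as a quantity. This is a typical limitation of handwritten rules.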

Finite state transducer

Finite State Transducers provide general-purpose machinery for rewriting an input sequence as an output sequence. They have many uses, including verbalising non-standard words (NSWs) into natural language.
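
As a rough illustration, a deterministic transducer can be stored as a table from (state, input symbol) to (next state, output words). The sketch below verbalises two-digit numbers from 10 to 99. The state names and the table layout are assumptions for this example, not the format of any real FST toolkit.

```python
UNITS = ["", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def build_arcs():
    """Return {(state, input_symbol): (next_state, output_words)}."""
    arcs = {}
    # A leading '1' emits nothing yet: the word depends on the next digit.
    arcs[("start", "1")] = ("teen", [])
    for d in range(2, 10):
        arcs[("start", str(d))] = ("units", [TENS[d]])
    for d in range(10):
        arcs[("teen", str(d))] = ("final", [TEENS[d]])
        # A '0' in the units position produces no output word.
        arcs[("units", str(d))] = ("final", [UNITS[d]] if d else [])
    return arcs


def transduce(symbols, arcs, start="start", finals=("final",)):
    """Follow arcs over the input; return the output, or None if rejected."""
    state, output = start, []
    for symbol in symbols:
        if (state, symbol) not in arcs:
            return None  # no arc for this symbol: input rejected
        state, words = arcs[(state, symbol)]
        output.extend(words)
    return output if state in finals else None


arcs = build_arcs()
print(transduce("42", arcs))
print(transduce("17", arcs))
```

The "teen" state shows why states are needed at all: the output for the first digit has to be delayed until the second digit has been read.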

Phonemes and allophones

This video introduces the notion of the phoneme as a basic unit of phonological analysis.

The difference between phonetics and phonology:

  • two levels of representation
    • surface or allophonic level
      • represents something close to articulation
      • the phonetic descriptions that we’ve been learning so far.
    • underlying or phonemic level
      • represents abstract categories that are something like our perceptual judgments about which sounds are and are not similar to each other.
  • Somewhat confusingly, both of these levels use symbols from the IPA.
    • In order to distinguish between the two levels of representation, we use two types of brackets.
      • For the surface forms, we use square brackets,
        • [ ] can indicate varying degrees of detail
      • for the underlying forms, we use slashes.
        • / / can only indicate abstract categories of phonemic contrast

A difference at the underlying level is easy to recognize.

In English, the difference is hard to recognize, because the two sounds share one underlying form.

These surface representations, written between square brackets, are known as allophones, and they are language-specific.

In Mapudungun, we can recognize the difference.

Allophones are language-specific: in English, the two surface representations share one underlying representation, whereas in Mapudungun the two surface representations correspond to two different underlying representations.

  • Phonologists sometimes formalize this relationship between the phoneme and its allophones in a rule.
    • The arrow is read as “is realized as”
    • the slash stands for “in the environment of”.
    • The blank shows where the phoneme occurs in order for the rule to apply.
  • In order to fully define a phoneme,
    • we first need to observe the surface forms that occur, along with their environments.
    • Then, we need to describe the patterns that we see with respect to the surface forms and their phonetic environments, looking for generalizations along the way.
  • The types of generalizations that we typically mean are those having to do with shared features across the predictive environments.
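
The rule notation described above has the general shape "phoneme → allophone / left _ right". A minimal sketch of applying one such rule to an underlying form is given below. The example rule, English /p/ realized as aspirated [pʰ] at the start of a word, is a simplified textbook illustration chosen for this sketch and is not taken from the original figures. The function name and the use of "#" for the word boundary are likewise assumptions.

```python
def apply_rule(phonemes, target, allophone, left=None, right=None):
    """Realize `target` as `allophone` wherever the environment matches.

    `left` and `right` are the required neighbouring symbols in the
    underlying form; None means "any symbol", '#' means a word boundary.
    """
    padded = ["#"] + list(phonemes) + ["#"]
    surface = []
    for i in range(1, len(padded) - 1):
        symbol = padded[i]
        left_ok = left is None or padded[i - 1] == left
        right_ok = right is None or padded[i + 1] == right
        if symbol == target and left_ok and right_ok:
            surface.append(allophone)
        else:
            surface.append(symbol)
    return surface


# /p/ -> [pʰ] / # _   (simplified: word-initial position only)
print(apply_rule(["p", "ɪ", "n"], "p", "pʰ", left="#"))
print(apply_rule(["s", "p", "ɪ", "n"], "p", "pʰ", left="#"))
```

In the second call the /p/ follows /s/, so the environment does not match and the plain [p] surfaces.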

[Figure: more examples]

Pronunciation

The phoneme inventory is a design choice when we build a TTS or ASR system. The IPA is a helpful guide when making this choice, but we don’t have to obey it, and are free to make different choices.

Prosody

Prosody for Text-To-Speech can be reduced to the problem of predicting pausing, duration, and F0.

Decision tree

Because a decision tree only asks simple ‘yes or no’ questions about predictors, it works for both categorical and continuous predictors, or a mixture of both.
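
A small hand-built tree makes this concrete. The sketch below predicts whether a phrase break follows a word, using one categorical and one continuous predictor. The predictor names, the threshold, and the tree itself are invented for illustration.

```python
class Leaf:
    """Terminal node holding the predicted value."""
    def __init__(self, prediction):
        self.prediction = prediction


class Node:
    """Internal node asking one yes/no question about an example."""
    def __init__(self, question, yes, no):
        self.question = question  # function: example -> bool
        self.yes = yes
        self.no = no


def predict(tree, example):
    """Descend from the root, answering questions until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(example) else tree.no
    return tree.prediction


tree = Node(
    # Categorical predictor: is the next punctuation mark one of these?
    lambda x: x["next_punctuation"] in {",", ".", ";"},
    Leaf("break"),
    Node(
        # Continuous predictor: compared against a threshold.
        lambda x: x["words_since_break"] > 7,
        Leaf("break"),
        Leaf("no_break"),
    ),
)

print(predict(tree, {"next_punctuation": ",", "words_since_break": 2}))
print(predict(tree, {"next_punctuation": "", "words_since_break": 3}))
```

Both kinds of predictor end up as a yes/no question, which is why they can be mixed freely in one tree.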

Learning decision trees

Having defined the model, we now need an algorithm to estimate it from data. For a Decision Tree, this is a simple greedy algorithm.

[Figure: training data]

Goal: at each node, ask the question that most reduces the entropy of the probability distribution.
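
Entropy here can be computed from label counts, and a question can be scored by the size-weighted entropy of the two partitions it creates. The following is a minimal sketch, assuming training examples are stored as (features, label) pairs. The data layout and function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of the empirical distribution over labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_after_split(examples, question):
    """Size-weighted entropy of the 'yes' and 'no' partitions."""
    yes = [label for features, label in examples if question(features)]
    no = [label for features, label in examples if not question(features)]
    total = len(examples)
    return sum(len(part) / total * entropy(part) for part in (yes, no) if part)


examples = [
    ({"punct": True}, "break"),
    ({"punct": True}, "break"),
    ({"punct": False}, "no_break"),
    ({"punct": False}, "no_break"),
]
before = entropy([label for _, label in examples])
after = entropy_after_split(examples, lambda f: f["punct"])
print(before, after)
```

In this toy data the question separates the two labels perfectly, so the entropy falls from 1 bit to 0.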

Stop conditions: the resulting data set is too small, the result is acceptable, or the tree has reached its depth limit.
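
Putting the goal and the stop conditions together, the greedy procedure might look like the sketch below. It assumes (features, label) pairs, a fixed dictionary of candidate yes/no questions, and a nested-dict tree. The parameter names `max_depth`, `min_size`, and `min_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def grow(examples, questions, depth=0, max_depth=3, min_size=2, min_gain=1e-9):
    """Greedily grow a tree; leaves are labels, internal nodes are dicts."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: data set too small, node already pure, or depth limit reached.
    if len(examples) < min_size or entropy(labels) == 0 or depth >= max_depth:
        return majority
    best = None
    for name, ask in questions.items():
        yes = [e for e in examples if ask(e[0])]
        no = [e for e in examples if not ask(e[0])]
        if not yes or not no:
            continue  # this question does not split the data
        after = sum(len(part) / len(examples) * entropy([l for _, l in part])
                    for part in (yes, no))
        gain = entropy(labels) - after
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    # Stop: no question reduces entropy by a useful amount.
    if best is None or best[0] < min_gain:
        return majority
    _, name, yes, no = best
    return {
        "question": name,
        "yes": grow(yes, questions, depth + 1, max_depth, min_size, min_gain),
        "no": grow(no, questions, depth + 1, max_depth, min_size, min_gain),
    }


questions = {
    "punct": lambda f: f["punct"],
    "long": lambda f: f["words"] > 5,
}
data = [
    ({"punct": True, "words": 2}, "break"),
    ({"punct": True, "words": 8}, "break"),
    ({"punct": False, "words": 9}, "break"),
    ({"punct": False, "words": 3}, "no_break"),
    ({"punct": False, "words": 1}, "no_break"),
    ({"punct": False, "words": 7}, "no_break"),
]
tree = grow(data, questions)
print(tree["question"], tree["no"]["question"])
```

The algorithm is greedy because each node picks the locally best question and never revisits that choice. In this toy data one partition stays mixed because no remaining question can split it, so that leaf simply predicts its majority label.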

Summary

Origin: Module 5 speech synthesis – phonemes and the front end
Translation and editing: YangSier

Originally published 2022-10-20 on the author's personal site/blog.