The Inherent Temporal Precision of Phoneme Transitions

Baghai-Ravary L.


Abstract

In natural speech, some phoneme transitions correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the acoustic transition from one phoneme to the next is gradual. In this paper we determine the naturally occurring groups of phonemes (regardless of conventional phonetic categories) which show similar characteristics in such behavior. These data-driven groupings could be used in the design of decision-trees for context-dependent phoneme clustering, as used in large-vocabulary speech recognition and alignment systems, or during the design of speech databases for speech synthesis systems. We use 128 different Hidden Markov Model phoneme alignment systems and a large corpus of British English speech to assess the consistency with which different phoneme transitions can be identified. The phoneme transitions are grouped automatically so as to minimize the statistical differences in behavior between members of each group. In this way we derive two sets of phonemic classes, one for the first phoneme of each phoneme-to-phoneme transition, and another for the second. The grouping of the phonemes confirms that broad phonetic classes are a significant indicator of the accuracy with which boundaries can be identified, but there are a number of exceptions and some apparent sub-divisions and mergers of accepted phonetic classes. The automatic grouping of the second phonemes results in two singletons, /Z/ and /N/ (in SAMPA notation). Finally, statistics are presented which characterize the precision with which transitions between these automatic classes can be identified. These could provide weightings to be applied to different transitions to provide a more realistic assessment when evaluating the relative accuracies of different alignment systems.
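The idea of measuring how consistently a transition can be located, and of turning that into evaluation weightings, can be sketched roughly as follows. This is an illustration only, not the paper's procedure: the abstract does not say which statistic or weighting scheme was used, so the median absolute deviation, the inverse-spread weighting, the function names, and the toy boundary times below are all assumptions made for the example. The automatic grouping of phonemes into classes is not shown.

```python
from collections import defaultdict
from statistics import median


def boundary_spread(times):
    """Median absolute deviation (seconds) of one boundary across aligners.

    Assumed consistency measure; the paper may use a different statistic.
    """
    centre = median(times)
    return median(abs(t - centre) for t in times)


def spread_by_transition(boundaries):
    """Average the spread over every occurrence of each transition.

    `boundaries` is a list of ((first, second), [time per aligner]) tuples,
    where first/second are phoneme labels (SAMPA here).
    """
    per_pair = defaultdict(list)
    for pair, times in boundaries:
        per_pair[pair].append(boundary_spread(times))
    return {pair: sum(v) / len(v) for pair, v in per_pair.items()}


def transition_weights(spreads, floor=1e-4):
    """Hypothetical weighting: inverse spread, normalised to sum to 1.

    `floor` avoids division by zero when all aligners agree exactly.
    """
    raw = {pair: 1.0 / max(s, floor) for pair, s in spreads.items()}
    total = sum(raw.values())
    return {pair: w / total for pair, w in raw.items()}


# Toy data (invented): four aligners instead of the paper's 128 systems.
toy_boundaries = [
    (("p", "A:"), [0.100, 0.102, 0.101, 0.099]),  # abrupt transition
    (("l", "w"), [0.500, 0.530, 0.470, 0.510]),   # gradual transition
]
toy_spreads = spread_by_transition(toy_boundaries)
toy_weights = transition_weights(toy_spreads)
```

In this toy case the abrupt transition has the smaller spread and therefore receives the larger weight, which matches the abstract's suggestion that precisely locatable transitions should count for more when comparing alignment systems.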


Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on | 2013, Issue 3 | pp. 579-586 | 8 pages
Author: Baghai-Ravary L.
Affiliation: Phonetics Laboratory, University of Oxford, Oxford, UK
Original format: PDF
Language: English
Keywords: Accuracy; Acoustics; Databases; Hidden Markov models; Speech; Speech recognition; Vocabulary; Broad phonetic classes; phoneme alignment; speech analysis; speech recognition


