Journal: IEEE Transactions on Audio, Speech, and Language Processing

The Inherent Temporal Precision of Phoneme Transitions


Abstract

In natural speech, some phoneme transitions correspond to abrupt changes in the acoustic signal. Others are less clear-cut because the acoustic transition from one phoneme to the next is gradual. In this paper we determine the naturally occurring groups of phonemes (regardless of conventional phonetic categories) which show similar characteristics in such behavior. These data-driven groupings could be used in the design of decision-trees for context-dependent phoneme clustering, as used in large-vocabulary speech recognition and alignment systems, or during the design of speech databases for speech synthesis systems. We use 128 different Hidden Markov Model phoneme alignment systems and a large corpus of British English speech to assess the consistency with which different phoneme transitions can be identified. The phoneme transitions are grouped automatically so as to minimize the statistical differences in behavior between members of each group. In this way we derive two sets of phonemic classes, one for the first phoneme of each phoneme-to-phoneme transition, and another for the second. The grouping of the phonemes confirms that broad phonetic classes are a significant indicator of the accuracy with which boundaries can be identified, but there are a number of exceptions and some apparent sub-divisions and mergers of accepted phonetic classes. The automatic grouping of the second phonemes results in two singletons, /Z/ and /N/ (in SAMPA notation). Finally, statistics are presented which characterize the precision with which transitions between these automatic classes can be identified. These could provide weightings to be applied to different transitions to provide a more realistic assessment when evaluating the relative accuracies of different alignment systems.
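
The weighting idea in the last two sentences of the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes that the consistency of a transition is summarized by the standard deviation of the boundary times proposed by several aligners, and that each alignment error is weighted by the inverse of that spread. The function names, the `floor` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def transition_spread(boundaries):
    """Estimate how consistently each transition type is located.

    boundaries: iterable of (first_phoneme, second_phoneme, times), where
    times holds the boundary time in seconds proposed by each aligner for
    one occurrence of that transition.
    Returns {(first, second): mean across occurrences of the std dev across aligners}.
    """
    per_pair = defaultdict(list)
    for first, second, times in boundaries:
        per_pair[(first, second)].append(pstdev(times))
    return {pair: mean(spreads) for pair, spreads in per_pair.items()}


def weighted_boundary_error(errors, spread, floor=1e-3):
    """Weighted mean absolute boundary error.

    errors: iterable of (first_phoneme, second_phoneme, abs_error_seconds).
    Each error is weighted by 1 / spread (an assumed weighting scheme), so
    inherently gradual transitions count less than abrupt ones.
    floor avoids division by zero when all aligners agree exactly.
    """
    num = 0.0
    den = 0.0
    for first, second, err in errors:
        weight = 1.0 / max(spread[(first, second)], floor)
        num += weight * err
        den += weight
    return num / den


# Toy data, invented for illustration (SAMPA symbols, times in seconds).
toy_boundaries = [
    ("p", "A:", [0.100, 0.102, 0.098]),  # abrupt: aligners agree closely
    ("l", "@", [0.50, 0.54, 0.46]),      # gradual: aligners disagree
]
toy_spread = transition_spread(toy_boundaries)

# A system that is exact on the abrupt boundary and 20 ms off on the gradual one.
toy_errors = [("p", "A:", 0.0), ("l", "@", 0.020)]
print(weighted_boundary_error(toy_errors, toy_spread))
```

With this weighting, a system is penalized less for missing a boundary that no aligner places consistently, which is the sense in which the abstract speaks of a more realistic assessment. In the paper the statistics are reported for transitions between the automatically derived classes, so a closer analogue would key `spread` on class pairs rather than on individual phoneme pairs.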
