Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition

Abstract

The performance of speech recognition systems degrades when the basic sound units used are poorly defined or inconsistently used. Several attempts have been made to improve dictionaries automatically, either by redefining the pronunciations of words in terms of existing sound units, or by redefining the sound units themselves completely. The problem with these approaches is that the former is limited by the sound units used, while the latter discards all human information that has been incorporated into an expert-designed recognition dictionary. In this paper we propose a new merging-and-splitting algorithm that attempts to redefine the basic sound units used in the dictionary while maintaining the expert knowledge built into a manually designed dictionary. Sound units from an existing dictionary are merged based on their inherent confusability, as measured by a Monte Carlo-based metric, and subsequently split to maximize the likelihood of the training data. Experiments with the Resource Management database indicate that this approach improves recognition accuracy when context-independent models are used for recognition. When context-dependent models are used, the observed improvement is reduced.
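The abstract does not define the Monte Carlo confusability metric, so the following is only an illustrative sketch of how such a merge criterion might look, not the authors' method. It assumes each sound unit is modeled by a single one-dimensional Gaussian (a real recognizer would use HMMs over feature vectors), and the unit names, parameters, and the functions `mc_confusability` and `most_confusable_pair` are hypothetical. Confusability is estimated by sampling from one unit's model and counting how often the other unit's model scores the sample higher. The splitting step is not shown.

```python
import math
import random


def log_gauss(x, mean, std):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def mc_confusability(unit_a, unit_b, n_samples=5000, seed=0):
    """Monte Carlo estimate of how often samples drawn from one unit
    are scored higher by the other unit's model (averaged both ways).

    Each unit is a hypothetical (mean, std) pair; ties count as half an error.
    """
    rng = random.Random(seed)

    def one_way(src, other):
        errors = 0.0
        for _ in range(n_samples):
            x = rng.gauss(src[0], src[1])
            own = log_gauss(x, *src)
            rival = log_gauss(x, *other)
            if rival > own:
                errors += 1.0
            elif rival == own:
                errors += 0.5
        return errors / n_samples

    return 0.5 * (one_way(unit_a, unit_b) + one_way(unit_b, unit_a))


def most_confusable_pair(units):
    """Return (score, name_a, name_b) for the pair with the highest estimate."""
    names = sorted(units)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = mc_confusability(units[a], units[b])
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


# Toy inventory (assumed values): two heavily overlapping units and one distinct unit.
units = {"AX": (0.0, 1.0), "AH": (0.3, 1.0), "IY": (5.0, 1.0)}
score, a, b = most_confusable_pair(units)
print(f"merge candidate: {a} + {b} (estimated confusability {score:.2f})")
```

In this toy inventory the two overlapping units are selected as the merge candidate, while the well-separated unit is left alone. A merged unit would then be a candidate for splitting in whichever way most increases the training-data likelihood, as the abstract describes.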