Journal: Computer Speech and Language

A two-pass approach for handling out-of-vocabulary words in a large vocabulary recognition task

Abstract

This paper addresses the problem of recognizing a vocabulary of over 50,000 city names in a telephone-access spoken dialogue system. We adopt a two-stage framework in which only major cities are represented in the first-stage lexicon. We rely on an unknown word model encoded as a phone loop to detect out-of-vocabulary (OOV) city names (referred to as 'rare city' names). We use SpeM, a tool that can extract words and word-initial cohorts from phone graphs using a large fallback lexicon, to provide an N-best list of promising city name hypotheses on the basis of the phone graph corresponding to the OOV. This N-best list is then inserted into the second-stage lexicon for a subsequent recognition pass. Experiments were conducted on a set of spontaneous telephone-quality utterances, each containing one rare city name. SpeM was able to include nearly 75% of the correct city names in an N-best hypothesis list of 3000 city names. When the names found by SpeM were used to extend the lexicon of the second-stage recognizer, a word accuracy of 77.3% was obtained. The best one-stage system yielded a word accuracy of 72.6%. The absolute number of correctly recognized rare city names almost doubled, from 62 for the best one-stage system to 102 for the best two-stage system. However, even the best two-stage system recognized only about one-third of the rare city names retrieved by SpeM. The paper discusses ways of improving the overall performance in the context of an application.
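
The two-stage flow described in the abstract can be illustrated with a small Python sketch. It is not SpeM and not the authors' implementation: SpeM searches a phone graph with its own scoring, whereas this sketch assumes a single decoded phone string for the OOV region and ranks fallback-lexicon entries by plain edit distance. The lexicon entries, phone symbols, and function names (`n_best_candidates`, `second_stage_lexicon`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]

# Toy lexicons with made-up phone transcriptions (assumptions, not from the paper).
FIRST_STAGE: Dict[str, Phones] = {
    "boston": ("b", "ao", "s", "t", "ax", "n"),
    "austin": ("ao", "s", "t", "ax", "n"),
}
FALLBACK: Dict[str, Phones] = {
    "bolton": ("b", "ow", "l", "t", "ax", "n"),
    "dalton": ("d", "ao", "l", "t", "ax", "n"),
    "salem": ("s", "ey", "l", "ax", "m"),
}


def edit_distance(a: Phones, b: Phones) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def n_best_candidates(oov_phones: Phones,
                      fallback: Dict[str, Phones],
                      n: int) -> List[str]:
    """Rank fallback-lexicon names by phone edit distance and keep the top n.

    Stand-in for the retrieval step; ties are broken alphabetically.
    """
    ranked = sorted(
        fallback.items(),
        key=lambda item: (edit_distance(oov_phones, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:n]]


def second_stage_lexicon(first_stage: Dict[str, Phones],
                         fallback: Dict[str, Phones],
                         candidates: List[str]) -> Dict[str, Phones]:
    """Extend the first-stage lexicon with the retrieved candidate names."""
    lexicon = dict(first_stage)
    for name in candidates:
        lexicon[name] = fallback[name]
    return lexicon


# Phone string that a first-pass phone loop might output for the OOV region
# (hypothetical, with one phone misrecognized: "d" instead of "t").
oov = ("b", "ow", "l", "d", "ax", "n")
candidates = n_best_candidates(oov, FALLBACK, n=2)
lexicon = second_stage_lexicon(FIRST_STAGE, FALLBACK, candidates)
```

In this toy setting the second recognition pass would then run with `lexicon` in place of the first-stage lexicon. The abstract's setup uses an N-best list of 3000 names rather than 2, which trades a higher chance of including the correct name against a larger and more confusable second-stage vocabulary.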
