IEEE Transactions on Speech and Audio Processing

On the use of linguistic consistency in systems for human-computer dialogues

Abstract

This paper introduces new recognition strategies based on reasoning about results obtained with different language models (LMs). The strategies follow the conjecture that the degree of consensus among the results obtained with different models defines distinct situations, in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by the consensus among the recognition results produced with different LMs and by the number of unobserved trigrams in the hypothesized sentence. The diagnostic power of observed trigrams, or of their corresponding class trigrams, is compared with that of situations based on sentence posterior probabilities. To avoid or correct errors due to syntactic inconsistency in the recognized sentence, automata obtained by explanation-based learning are introduced and used under certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long-distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D show that, starting from a WER of 21.87% on a test set of 1422 sentences, the sentences can be subdivided into three sets characterized by automatically recognized situations. The first set has a coverage of 68% and a WER of 7.44%. The second contains various types of sentences with a WER of around 20%; these sentences should be processed with particular care by the dialogue interpreter, possibly by asking the user for confirmation. The third contains 13% of the sentences, which have a WER of around 49% and should be rejected.
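
The abstract defines situations from two signals: whether the hypotheses produced with different LMs agree, and how many trigrams in the hypothesized sentence were never observed. The sketch below shows one way such a rule could be written. It is only an illustration under stated assumptions: the function names, the three labels ("accept", "confirm", "reject"), the `max_unobserved` threshold, and the decision rule itself are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Trigram = Tuple[str, ...]


def trigrams(words: List[str]) -> List[Trigram]:
    """Return all consecutive word triples in a sentence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def count_unobserved(words: List[str], observed: Set[Trigram]) -> int:
    """Count trigrams of the sentence that are absent from the observed set."""
    return sum(1 for t in trigrams(words) if t not in observed)


def classify_situation(
    hypotheses: List[List[str]],
    observed: Set[Trigram],
    max_unobserved: int = 0,  # hypothetical threshold, not from the paper
) -> str:
    """Assign a situation label from LM consensus and unobserved trigrams.

    hypotheses[0] is treated as the primary hypothesis; the others come
    from alternative LMs. The rule is an assumption made for illustration:
      - full consensus and few unobserved trigrams -> "accept"
      - no consensus and many unobserved trigrams  -> "reject"
      - anything else                               -> "confirm"
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    primary = hypotheses[0]
    all_agree = all(h == primary for h in hypotheses[1:])
    unseen = count_unobserved(primary, observed)
    if all_agree and unseen <= max_unobserved:
        return "accept"
    if not all_agree and unseen > max_unobserved:
        return "reject"
    return "confirm"


# Toy usage: the "training" data is a single sentence.
train = "i want to book a flight".split()
observed_trigrams = set(trigrams(train))
good = "i want to book a flight".split()
bad = "i want two book a flight".split()
print(classify_situation([good, good], observed_trigrams))
print(classify_situation([good, bad], observed_trigrams))
print(classify_situation([bad, good], observed_trigrams))
```

In this toy rule the middle label plays the role of the second set in the abstract, where the dialogue interpreter might ask the user for confirmation. A real system would presumably tune the threshold and could also use class trigrams or sentence posterior probabilities, which this sketch does not model.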