IEICE Transactions on Information and Systems

Confidence Measure Based on Context Consistency Using Word Occurrence Probability and Topic Adaptation for Spoken Term Detection

Abstract

In this paper, we propose a novel confidence measure to improve the performance of spoken term detection (STD). The proposed measure is based on the context consistency between a hypothesized word and its context in a word lattice. The main contribution of this paper is to compute context consistency while accounting for both the uncertainty in the speech recognition results and the effect of topic. To model the uncertainty of the context, we use the word occurrence probability, which is obtained by combining overlapping hypotheses in a word posterior lattice. To handle the effect of topic, we propose a topic adaptation method. The method first classifies the spoken document by topic and then computes the context consistency of the hypothesized word with a topic-specific measure of semantic similarity. We apply this topic-specific similarity measure in two ways: one uses only the top-1 topic from topic classification, and the other uses a mixture of all topics according to the classification results. Experiments on the Hub-4NE Mandarin database show that both the occurrence probability of context words and topic adaptation are effective for the STD confidence measure. The proposed confidence measure outperforms both a measure that ignores the uncertainty of the context and one that uses a non-topic method.
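The abstract does not give formulas, so the following is only a minimal sketch of how the described pieces could fit together, under stated assumptions, and not the authors' method. Assumed (hypothetical) choices: lattice arcs are `(word, start, end, posterior)` tuples; time-overlapping arcs of the same word are merged by summing their posteriors (capped at 1.0) to get an occurrence probability; context consistency is the occurrence-weighted average of a semantic similarity between the hypothesized word and each context word; the per-topic similarity tables and topic posteriors are toy values. The `mode` parameter switches between the two topic-adaptation variants named in the abstract: top-1 topic versus a mixture of all topics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lattice arc: (word, start_frame, end_frame, posterior),
# with frames treated as a half-open interval [start, end).
Arc = Tuple[str, int, int, float]
# Hypothetical per-topic similarity table keyed by word pairs.
SimTable = Dict[Tuple[str, str], float]


def occurrence_probabilities(arcs: List[Arc]) -> Dict[str, float]:
    """Merge time-overlapping arcs of the same word by summing their posteriors.

    Assumption: a word's occurrence probability is the largest summed
    posterior among its overlap clusters, capped at 1.0.
    """
    by_word: Dict[str, List[Arc]] = defaultdict(list)
    for arc in arcs:
        by_word[arc[0]].append(arc)
    occ: Dict[str, float] = {}
    for word, word_arcs in by_word.items():
        word_arcs.sort(key=lambda a: a[1])
        best, cluster_sum, cluster_end = 0.0, 0.0, -1
        for _, start, end, post in word_arcs:
            if start < cluster_end:  # overlaps the current cluster: merge
                cluster_sum += post
                cluster_end = max(cluster_end, end)
            else:  # no overlap: close the cluster and start a new one
                best = max(best, cluster_sum)
                cluster_sum, cluster_end = post, end
        occ[word] = min(1.0, max(best, cluster_sum))
    return occ


def _lookup(table: SimTable, w1: str, w2: str) -> float:
    """Symmetric lookup; unseen pairs get similarity 0."""
    return table.get((w1, w2), table.get((w2, w1), 0.0))


def similarity(sims: Dict[str, SimTable], topic_post: Dict[str, float],
               w1: str, w2: str, mode: str) -> float:
    """Topic-specific similarity from the top-1 topic or a posterior-weighted mixture."""
    if mode == "top1":
        topic = max(topic_post, key=topic_post.get)
        return _lookup(sims[topic], w1, w2)
    if mode == "mixture":
        return sum(p * _lookup(sims[t], w1, w2) for t, p in topic_post.items())
    raise ValueError(f"unknown mode: {mode}")


def context_consistency(hyp: str, context_arcs: List[Arc],
                        sims: Dict[str, SimTable], topic_post: Dict[str, float],
                        mode: str = "mixture") -> float:
    """Occurrence-weighted average similarity between hyp and its context words."""
    occ = occurrence_probabilities(context_arcs)
    total = sum(occ.values())
    if total == 0.0:
        return 0.0
    score = sum(p * similarity(sims, topic_post, hyp, w, mode) for w, p in occ.items())
    return score / total


# Toy example: all words, times, posteriors, and similarities are made up.
arcs = [
    ("market", 0, 10, 0.5),
    ("market", 2, 12, 0.3),  # overlaps the first "market" arc
    ("weather", 20, 30, 0.4),
]
sims = {
    "finance": {("stock", "market"): 0.8, ("stock", "weather"): 0.1},
    "sports": {("stock", "market"): 0.2, ("stock", "weather"): 0.3},
}
topic_post = {"finance": 0.7, "sports": 0.3}

print(round(context_consistency("stock", arcs, sims, topic_post, "top1"), 3))     # → 0.567
print(round(context_consistency("stock", arcs, sims, topic_post, "mixture"), 3))  # → 0.467
```

In this toy setup the two overlapping "market" arcs merge into one hypothesis with occurrence probability 0.8, so an uncertain context word still contributes in proportion to its lattice evidence. The top-1 variant commits to a single topic's similarity table, while the mixture variant blends all topics by their classification posteriors, which softens the result when the classifier is unsure.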