
Order-free spoken term detection

Abstract

In this paper, we propose Time-Marked Word (TMW) lists as a replacement for the lattices and Confusion Networks (CNs) widely used as indexing vehicles for Spoken Term Detection (STD). In a TMW list, each word candidate is simply tagged with its posterior probability and time information, and the candidates are stored as one large list of words; the additional ordering present in a lattice or CN is discarded. TMW lists compactly summarize a large ASR search space. Representing a large search space is critical for STD metrics such as ATWV, which heavily penalize misses of rare keywords. Comparisons on the OpenKWS 2014 Tamil limited language pack task [1] show that the new TMW-based indexing yields better performance while being faster and having a smaller footprint.
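
As a rough illustration of the idea, the sketch below stores word candidates as a flat list tagged with times and posteriors, then looks keywords up by word identity. It is not the authors' implementation: the names `TimeMarkedWord` and `TMWIndex`, the `max_gap` parameter, the time-adjacency rule for multi-word queries, and scoring a hit by the product of posteriors are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMarkedWord:
    """One word candidate: identity, time span (seconds), posterior."""
    word: str
    start: float
    end: float
    posterior: float


class TMWIndex:
    """Hypothetical index over a flat, unordered list of candidates."""

    def __init__(self, entries):
        # No lattice arcs or CN slots are kept: only a map from each
        # word to its time-marked candidates.
        self._by_word = defaultdict(list)
        for entry in entries:
            self._by_word[entry.word].append(entry)

    def search(self, query, max_gap=0.5):
        """Return hits as (start, end, score), best score first.

        Assumption: consecutive query words must follow each other
        within max_gap seconds, and a hit's score is the product of
        the word posteriors.
        """
        words = query.split()
        if not words:
            return []
        partial = [(e.start, e.end, e.posterior)
                   for e in self._by_word.get(words[0], [])]
        for word in words[1:]:
            extended = []
            for start, end, score in partial:
                for e in self._by_word.get(word, []):
                    # Recover adjacency from time stamps alone.
                    if 0.0 <= e.start - end <= max_gap:
                        extended.append((start, e.end, score * e.posterior))
            partial = extended
        return sorted(partial, key=lambda hit: hit[2], reverse=True)


index = TMWIndex([
    TimeMarkedWord("spoken", 0.0, 0.4, 0.9),
    TimeMarkedWord("term", 0.4, 0.7, 0.5),
    TimeMarkedWord("turn", 0.4, 0.7, 0.3),   # competing candidate
    TimeMarkedWord("term", 5.0, 5.3, 0.8),
])
hits = index.search("spoken term")
```

Because competing candidates such as "term" and "turn" simply coexist in the list with their own posteriors, low-probability words remain retrievable without any graph structure, which is the property the abstract ties to metrics that penalize missed rare keywords.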
