N-Gram FST Indexing for Spoken Term Detection

Abstract

An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems, which have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in representing connectivity and measuring confidence, and may therefore perform better than searching the original lattices or their equivalent FSTs. In this paper we present an n-gram FST indexing approach that combines the flexibility of n-gram indexing with the efficiency of FST indexing. Specifically, we use n-gram indexing to relax connectivity in the original lattices and then convert the indices into an FST for online search. We demonstrate this approach on a phone-based STD task where the lattice is sparse due to strong language models. The results show that n-gram FST indexing provides not only better detection performance than lattice search, but also faster detection than both conventional n-gram and FST indexing.
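To make the idea of relaxed connectivity concrete, the following is a minimal sketch of n-gram reverse indexing over a phone lattice. It is not the authors' implementation: the `Arc` and `Hit` types, the `max_n` and `gap` parameters, the time-gap joining rule, and the product-of-posteriors score are all assumptions chosen for illustration, and a plain dictionary stands in for the FST that the paper builds from the indices.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One lattice arc (hypothetical layout, not taken from the paper)."""
    src: int      # source node id
    dst: int      # destination node id
    phone: str    # phone label
    start: float  # start time in seconds
    end: float    # end time in seconds
    post: float   # arc posterior probability


class Hit(NamedTuple):
    utt: str
    start: float
    end: float
    score: float


def build_ngram_index(lattices, max_n=2):
    """Map every phone n-gram of order 1..max_n to its time-stamped occurrences."""
    index = defaultdict(list)
    for utt, arcs in lattices.items():
        # Outgoing arcs per node, so connected paths can be followed.
        out = defaultdict(list)
        for arc in arcs:
            out[arc.src].append(arc)

        def extend(path):
            # Assumed confidence: product of arc posteriors along the path.
            score = 1.0
            for a in path:
                score *= a.post
            key = tuple(a.phone for a in path)
            index[key].append(Hit(utt, path[0].start, path[-1].end, score))
            if len(path) < max_n:
                for nxt in out.get(path[-1].dst, ()):
                    extend(path + [nxt])

        for arc in arcs:
            extend([arc])
    return index


def search(index, query, max_n=2, gap=0.05):
    """Split the query into chunks of up to max_n phones and chain their hits.

    Consecutive chunks only need to be close in time (within `gap` seconds);
    they do not need to share a lattice node. This is the relaxation step.
    """
    chunks = [tuple(query[i:i + max_n]) for i in range(0, len(query), max_n)]
    if not chunks:
        return []
    partial = list(index.get(chunks[0], []))
    for chunk in chunks[1:]:
        joined = []
        for p in partial:
            for h in index.get(chunk, []):
                if h.utt == p.utt and abs(h.start - p.end) <= gap:
                    joined.append(Hit(p.utt, p.start, h.end, p.score * h.score))
        partial = joined
    return partial


# Toy lattice: the "t" arc is NOT connected to the "ae" arc (node 2 != node 4),
# but it starts 10 ms after "ae" ends.
lattices = {
    "u1": [
        Arc(0, 1, "k", 0.00, 0.10, 0.5),
        Arc(1, 2, "ae", 0.10, 0.20, 0.5),
        Arc(4, 5, "t", 0.21, 0.30, 0.5),
    ]
}
index = build_ngram_index(lattices, max_n=2)
hits = search(index, ["k", "ae", "t"], max_n=2, gap=0.05)
print(hits)
```

In this toy lattice a strict path search would miss the query "k ae t" because no connected path spells it, whereas the n-gram index recovers it by joining the "k ae" bigram with the nearby "t" unigram. Tightening `gap` makes the join stricter and removes the hit.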