Journal: IEICE Transactions on Information and Systems
Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords

Abstract

This study presents a two-stage spoken term detection (STD) method that runs the same STD engine twice and uses a support vector machine (SVM)-based classifier to verify the terms detected by the engine. In the front-end process, the STD engine pre-indexes the target spoken documents using a keyword list built from automatic speech recognition results. The STD result consists of a set of keywords and their detection intervals (positions) in the spoken documents. For keywords with competing intervals, we rank the detections by STD matching cost and select the one with the longest duration among the competing detections. The selected keywords are registered in the pre-index and are then used to train the SVM-based classifier. In the query term search process, the query term is searched for by the same STD engine, and the output candidates are verified by the SVM-based classifier. The proposed two-stage STD method with pre-indexing was evaluated on the NTCIR-10 SpokenDoc-2 STD task, where it substantially outperformed a traditional STD method based on dynamic time warping and a confusion network-based index.
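
The pre-indexing step in the abstract, which resolves competing detections, could be sketched roughly as follows. This is only one possible reading of the abstract, not the authors' implementation. The `Detection` record, the `select_pre_index` function, the overlap-based grouping, the tie-break on lower matching cost, and the example keywords are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one STD detection (not from the paper)."""
    keyword: str
    doc_id: str
    start: float  # interval start, in seconds
    end: float    # interval end, in seconds
    cost: float   # STD matching cost; lower is assumed to be better

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_pre_index(detections: List[Detection]) -> List[Detection]:
    """Keep one detection per group of overlapping intervals.

    Assumption: detections "compete" when their intervals overlap in the
    same document (chained overlaps fall into the same group). Within a
    group, the longest detection wins; ties go to the lower matching cost.
    """
    # Sort so that overlapping detections in a document become adjacent.
    ordered = sorted(detections, key=lambda d: (d.doc_id, d.start, d.end))

    groups: List[List[Detection]] = []
    group_end = 0.0
    for det in ordered:
        same_group = (
            bool(groups)
            and groups[-1][-1].doc_id == det.doc_id
            and det.start < group_end
        )
        if same_group:
            groups[-1].append(det)
            group_end = max(group_end, det.end)
        else:
            groups.append([det])
            group_end = det.end

    # Longest duration first; a lower cost breaks ties.
    return [max(g, key=lambda d: (d.duration, -d.cost)) for g in groups]


if __name__ == "__main__":
    # Made-up detections, purely for illustration.
    example = [
        Detection("tokyo", "d1", 1.0, 1.5, 0.2),
        Detection("tokyo tower", "d1", 1.0, 2.0, 0.3),
        Detection("tower", "d1", 1.6, 2.0, 0.1),
        Detection("osaka", "d1", 5.0, 5.6, 0.4),
    ]
    for det in select_pre_index(example):
        print(det.keyword, det.start, det.end)
```

In this sketch the selected detections would form the pre-index. The abstract does not say which features of those entries are used to train the SVM-based classifier, so that step is left out.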
