Conference: International Speech Communication Association

Reducing the Effect of OOV Query Words by Using Morph-Based Spoken Document Retrieval



Abstract

Morph-based spoken document retrieval uses morpheme-like subword units both for language modeling and as index terms. Out-of-vocabulary (OOV) problems are avoided because the morph recognizer can recognize any spoken word as a sequence of subwords. The effect of previously unseen query words (i.e., words that do not occur in the language model training text) is analyzed for Finnish spoken document retrieval, and the performance of the morph-based system is compared with that of a word-based approach. Language models with artificially high OOV query word rates are built, and the results show that morph-based retrieval suffers significantly less from OOV query words than word-based retrieval. Extracting alternative recognition candidates from confusion networks further improves the results, especially for morph-based retrieval.
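The core idea, that an OOV query word can still match when both the recognizer output and the query are split into subword units, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy morph lexicon `MORPHS`, the word vocabulary `WORD_VOCAB`, the greedy longest-match `segment` function (a stand-in for whatever morph segmentation the actual system uses), and the simulated recognizers, which simply map text to units rather than decoding audio.

```python
from collections import defaultdict

# Hypothetical toy data, for illustration only.
MORPHS = {"auto", "talo", "kauppa", "ssa", "lla"}
WORD_VOCAB = {"talo", "kauppa"}  # "autossa" is out of vocabulary


def segment(word, morphs=MORPHS):
    """Greedy longest-match split into known morphs (an assumed stand-in).

    Characters covered by no morph are emitted as single-character units.
    """
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in morphs:
                units.append(word[i:j])
                i = j
                break
        else:
            units.append(word[i])
            i += 1
    return units


def word_recognize(spoken):
    """Simulated word recognizer: OOV words are lost (replaced by <unk>)."""
    return [w if w in WORD_VOCAB else "<unk>" for w in spoken.split()]


def morph_recognize(spoken):
    """Simulated morph recognizer: every word becomes a morph sequence."""
    return [m for w in spoken.split() for m in segment(w)]


def build_index(docs, recognize):
    """Inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, spoken in docs.items():
        for term in recognize(spoken):
            index[term].add(doc_id)
    return index


def search(index, query_terms):
    """Rank documents by the number of matching query terms."""
    hits = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


docs = {"d1": "autossa talo", "d2": "kauppa talo"}
query = "autossa"  # OOV for the word-based system

word_hits = search(build_index(docs, word_recognize), query.split())
morph_hits = search(build_index(docs, morph_recognize), morph_recognize(query))
print(word_hits, morph_hits)
```

In this toy setup the word-based index never contains the OOV query word, whereas the morph-based index shares the units `auto` and `ssa` with the query. The sketch does not model recognition errors or the confusion-network candidates mentioned in the abstract.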
