Journal: Pattern Recognition Letters

Exploring the use of latent topical information for statistical Chinese spoken document retrieval



Abstract

Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing attention. However, most information retrieval approaches rely primarily on literal term matching and operate deterministically. Their performance is therefore often limited by vocabulary mismatch, and they cannot be steadily improved through use. To overcome these drawbacks and enhance retrieval performance, this paper explores the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were investigated extensively. In addition, the retrieval capabilities were verified by comparison with the probabilistic latent semantic analysis (PLSA) model, the vector space model (VSM), the latent semantic indexing (LSI) model, and our previously presented HMM/N-gram retrieval model. Experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3), and noticeable improvements in retrieval performance were obtained.
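The abstract does not spell out the model. A common way to use a topical mixture model for retrieval, assumed here, is to rank each document D by the likelihood that it generates the query Q through K latent topics: P(Q|D) = prod over query words w of [ sum_k P(w|T_k) P(T_k|D) ]^c(w,Q). This lets a document score well on a query word it never contains, which is how topic models address vocabulary mismatch. The sketch below is a minimal illustration under that assumption, not the authors' implementation: the topics, probabilities, document ids, and helper names (`TOPIC_WORD`, `DOC_TOPIC`, `word_prob`, `query_log_likelihood`, `rank`) are hypothetical toy values, and model training (for example with EM) is omitted.

```python
import math
from collections import Counter

# Hypothetical toy model: P(w | T_k) for two latent topics.
# In a real system these distributions would be trained on the collection.
TOPIC_WORD = [
    {"stock": 0.4, "market": 0.4, "bank": 0.2},  # topic 0: finance-like
    {"storm": 0.5, "rain": 0.3, "flood": 0.2},   # topic 1: weather-like
]

# Hypothetical per-document topic weights P(T_k | D); each row sums to 1.
DOC_TOPIC = {
    "doc_finance": [0.9, 0.1],
    "doc_weather": [0.2, 0.8],
}

FLOOR = 1e-12  # avoids log(0) for words that no topic can generate


def word_prob(word, topic_weights):
    """P(w | D) = sum_k P(w | T_k) * P(T_k | D)."""
    return sum(
        TOPIC_WORD[k].get(word, 0.0) * weight
        for k, weight in enumerate(topic_weights)
    )


def query_log_likelihood(query_words, doc_id):
    """log P(Q | D) = sum over distinct query words of c(w, Q) * log P(w | D)."""
    weights = DOC_TOPIC[doc_id]
    counts = Counter(query_words)
    return sum(
        c * math.log(max(word_prob(w, weights), FLOOR))
        for w, c in counts.items()
    )


def rank(query_words):
    """Return document ids sorted by descending query log-likelihood."""
    return sorted(
        DOC_TOPIC,
        key=lambda d: query_log_likelihood(query_words, d),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(["flood", "rain"]))  # weather-weighted document ranks first
```

Note that neither toy document stores any literal terms: ranking is driven entirely by the shared topic distributions, so a query word such as "flood" contributes through the topic it belongs to rather than through exact term matching.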
