Conference: International Conference on Discovery Science

Passage-Based Document Retrieval as a Tool for Text Mining with User's Information Needs

Abstract

Document retrieval can be regarded as a basic but important tool for text mining that can take a user's information need into account. However, document retrieval is a hard task when long, multi-topic documents have to be retrieved with a very short description (a few keywords) of the information need. In this paper, we focus on this problem, which is typical in real-world applications. We experimentally validate that passage-based document retrieval has an advantage over conventional document retrieval in such circumstances. Passage-based document retrieval judges a document's relevance to the information need from only small fractions (passages) of the document. As the passage-based method, we employ a method based on density distributions of keywords. It is compared with three conventional document retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents have to be retrieved with short queries.
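The idea of scoring a document by the density distribution of query keywords can be illustrated with a minimal sketch. The abstract does not give the details of the method, so everything below is an assumption made for illustration: the Hanning window, the window width of 20 tokens, the per-keyword weights, the use of the peak density as the document score, and the function names `hanning_window`, `keyword_density`, and `passage_score`.

```python
import math
from typing import Dict, List, Sequence


def hanning_window(offset: int, width: int) -> float:
    """Hanning window of total width `width`, centered at offset 0.

    Assumption: the abstract does not name a window function.
    """
    if abs(offset) > width / 2:
        return 0.0
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * offset / width))


def keyword_density(
    tokens: Sequence[str],
    query_weights: Dict[str, float],
    width: int = 20,
) -> List[float]:
    """Return a density value for every token position in the document.

    Each occurrence of a query keyword spreads its weight over the
    neighboring positions according to the window function.
    """
    density = [0.0] * len(tokens)
    half = width // 2
    for pos, token in enumerate(tokens):
        weight = query_weights.get(token)
        if weight is None:
            continue  # not a query keyword
        lo = max(0, pos - half)
        hi = min(len(tokens) - 1, pos + half)
        for i in range(lo, hi + 1):
            density[i] += weight * hanning_window(i - pos, width)
    return density


def passage_score(
    tokens: Sequence[str],
    query_weights: Dict[str, float],
    width: int = 20,
) -> float:
    """Score a document by its densest passage (assumed: peak density)."""
    return max(keyword_density(tokens, query_weights, width), default=0.0)


if __name__ == "__main__":
    weights = {"passage": 1.0, "retrieval": 1.0}
    # Same keywords, but close together in one document and far apart in the other.
    clustered = ["x"] * 50 + ["passage", "retrieval"] + ["x"] * 50
    scattered = ["passage"] + ["x"] * 100 + ["retrieval"]
    print(passage_score(clustered, weights))
    print(passage_score(scattered, weights))
```

Under these assumptions, a long document in which the query keywords occur close together scores higher than one in which the same keywords are spread far apart, which is the property that makes a passage-level score attractive for long, multi-topic documents and short queries.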
