Published in: Information Sciences: An International Journal

Fast and effective cluster-based information retrieval using frequent closed itemsets


Abstract

Document information retrieval consists of finding the documents in a collection that are most relevant to a user query. Information retrieval techniques are widely used by organizations to facilitate the search for information. However, applying traditional information retrieval techniques to large document collections is time-consuming. Recently, cluster-based information retrieval approaches have been developed. Although these approaches are often much faster than traditional approaches on large document collections, the quality of the documents they retrieve is often lower. To address this drawback of cluster-based approaches, and to improve information retrieval in terms of both runtime and quality of the retrieved documents, this paper proposes a new cluster-based information retrieval approach named ICIR (Intelligent Cluster-based Information Retrieval). The proposed approach combines k-means clustering with frequent closed itemset mining to extract clusters of documents and find the frequent terms in each cluster. The patterns discovered in each cluster are then used to select the document clusters most relevant to each user query. Four alternative heuristics are proposed for selecting the most relevant clusters, and two alternative heuristics for choosing documents within the selected clusters. Thus, eight versions of the proposed approach are obtained. To validate the proposed approach, extensive experiments have been carried out on well-known document collections. Results show that the designed approach outperforms traditional and cluster-based information retrieval approaches in terms of both execution time and quality of the returned documents. © 2018 Elsevier Inc. All rights reserved.
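The pipeline outlined in the abstract (cluster the documents, mine frequent closed term sets per cluster, then use those patterns to pick clusters for a query) can be illustrated with a minimal Python sketch. This is not the authors' ICIR implementation: the toy documents, the deterministic k-means seeding, the brute-force miner, and the `rank_clusters` scoring rule are all assumptions made for illustration, and the scoring rule is not claimed to be one of the paper's four heuristics.

```python
from itertools import combinations

def kmeans(vectors, k, iters=10):
    """Plain k-means; the first k vectors seed the centroids (an assumption,
    chosen so the toy example is deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def closed_frequent_itemsets(transactions, min_support):
    """Brute-force miner: keep frequent term sets that have no proper
    superset with the same support. Only suitable for tiny vocabularies."""
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count
    return {
        s: n for s, n in support.items()
        if not any(s < other and n == m for other, m in support.items())
    }

def rank_clusters(query_terms, patterns_by_cluster):
    """Hypothetical scoring rule: count query terms shared with each
    closed pattern of a cluster, summed over the cluster's patterns."""
    def score(cluster_id):
        return sum(len(p & query_terms) for p in patterns_by_cluster[cluster_id])
    return sorted(patterns_by_cluster, key=score, reverse=True)

# Toy collection (made up for this sketch); each document is a set of terms.
docs = [
    {"cluster", "retrieval", "query"},
    {"itemset", "mining", "pattern"},
    {"cluster", "retrieval", "index"},
    {"itemset", "mining", "support"},
    {"cluster", "retrieval", "ranking"},
    {"itemset", "mining", "closed"},
]
vocab = sorted({term for doc in docs for term in doc})
vectors = [[1.0 if term in doc else 0.0 for term in vocab] for doc in docs]

k = 2
labels = kmeans(vectors, k)
patterns = {
    c: closed_frequent_itemsets(
        [frozenset(d) for d, lab in zip(docs, labels) if lab == c], min_support=2)
    for c in range(k)
}
ranking = rank_clusters({"closed", "itemset", "mining"}, patterns)
print(labels, ranking)
```

Only documents in the top-ranked clusters would then need to be scored against the query, which is where the runtime saving of a cluster-based approach comes from. A realistic implementation would use a dedicated closed-itemset miner rather than enumerating every term combination.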
