
Set-Based Model: A New Approach for Information Retrieval



Abstract

The objective of this paper is to present a new technique for computing term weights for index terms, which leads to a new ranking mechanism, referred to as the set-based model. The components in our model are no longer terms, but termsets. The novelty is that we compute term weights using a data mining technique called association rules, which is time efficient and yet yields nice improvements in retrieval effectiveness. The set-based model function for computing the similarity between a document and a query considers the termset frequency in the document and its scarcity in the document collection. Experimental results show that our model improves the average precision of the answer set for all three collections evaluated. For the TREC-3 collection, our set-based model led to a gain, relative to the standard vector space model, of 37% in average precision curves and of 57% in average precision for the top 10 documents. Like the vector space model, the set-based model has time complexity that is linear in the number of documents in the collection.
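As a rough illustration of the idea in the abstract, the sketch below scores documents over query termsets rather than single terms, combining each termset's frequency in the document with its scarcity in the collection. The abstract does not give the actual weighting formula, so everything specific here is an assumption: the function name `termset_scores`, the `min_support` and `max_size` parameters, taking a termset's in-document frequency as the minimum frequency of its member terms, and the log-based scarcity factor. It is a minimal sketch of the general technique, not the paper's method.

```python
import math
from collections import Counter
from itertools import combinations


def termset_scores(docs, query, min_support=1, max_size=3):
    """Score each document against a query using termsets (hypothetical sketch).

    Assumptions (not taken from the paper): a termset's frequency in a
    document is the minimum frequency of its member terms, and its scarcity
    is log(1 + N / df), where df is the number of documents that contain
    every term of the termset.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    query_terms = sorted(set(query.lower().split()))
    n_docs = len(docs)

    scores = [0.0] * n_docs
    for size in range(1, min(max_size, len(query_terms)) + 1):
        for termset in combinations(query_terms, size):
            # Termset frequency per document; 0 if any member term is absent.
            freqs = [min(c[t] for t in termset) for c in counts]
            df = sum(1 for f in freqs if f > 0)
            # Skip termsets below the support threshold, similar in spirit
            # to pruning infrequent itemsets in association rule mining.
            if df < max(min_support, 1):
                continue
            scarcity = math.log(1 + n_docs / df)
            for i, f in enumerate(freqs):
                if f > 0:
                    scores[i] += (1 + math.log(f)) * scarcity
    return scores
```

Calling `termset_scores(docs, "information retrieval")` on a list of document strings returns one score per document, and documents that contain none of the query terms score 0. For a fixed query, the work per termset is a single pass over the documents, which is consistent with the linear dependence on collection size mentioned in the abstract.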
