Conference: International Conference on Artificial Intelligence and Data Processing

A Comparison of Recent Information Retrieval Term-Weighting Models Using Ancient Datasets



Abstract

With the development of technology, human-computer interaction is continuously increasing. In parallel, the volume of information from websites, social media, blogs, and other applications has reached enormous dimensions, and obtaining the desired information from this mass of data has become a major problem. One way to address this problem is to index and search the information correctly using information retrieval methods. Information retrieval is the study of finding unstructured documents that satisfy users' information needs. Various term-weighting models have been proposed for information retrieval. This work analyzes and evaluates the retrieval effectiveness of recently developed term-weighting models (introduced since the 2000s) on earlier datasets (dating back as far as the 1980s), motivated by the fact that such a comparison has not been done before. The open-source library Apache Lucene is used for all experiments and evaluation. We observe that the DFIC model is, in general, more effective than the other models. We also note that a model that is the most effective on one dataset can be the least effective on another.
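The abstract does not define the DFIC weighting function. As a rough illustration only, the sketch below assumes a chi-squared divergence-from-independence weight of the kind that Lucene's `DFISimilarity` computes when paired with `IndependenceChiSquared`. The function name, parameter names, and the +1 smoothing terms are assumptions made for this example and are not taken from the paper.

```python
import math


def dfi_chi_squared_weight(term_freq, doc_len, collection_term_freq, collection_len):
    """Hypothetical helper: chi-squared divergence-from-independence term weight.

    term_freq            -- occurrences of the term in the document
    doc_len              -- number of tokens in the document
    collection_term_freq -- occurrences of the term in the whole collection
    collection_len       -- number of tokens in the whole collection
    """
    # Frequency we would expect if the term occurred independently of the
    # document (the +1 smoothing is an assumption of this sketch).
    expected = (collection_term_freq + 1) * doc_len / (collection_len + 1)

    # A term that appears no more often than expected contributes nothing.
    if term_freq <= expected:
        return 0.0

    # Standardized squared deviation from the expected frequency.
    chi_squared = (term_freq - expected) ** 2 / expected
    return math.log2(chi_squared + 1)


def score_document(query_terms, doc_term_freqs, doc_len, collection_term_freqs, collection_len):
    """Sum the per-term weights over the query terms for one document."""
    return sum(
        dfi_chi_squared_weight(
            doc_term_freqs.get(term, 0),
            doc_len,
            collection_term_freqs.get(term, 0),
            collection_len,
        )
        for term in query_terms
    )
```

Under these assumptions, a document is scored only by terms that occur more often than independence would predict, which is one way a model of this family can rank very differently across collections with different length and frequency statistics.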
