
New Weighting Schemes for Document Ranking and Ranked Query Suggestion

Abstract

Term weighting is the process of scoring and ranking a term’s relevance to a user’s information need, or its importance to a document. This thesis investigates novel term weighting methods with applications in document representation for text classification, web document ranking, and ranked query suggestion. First, it proposes a new feature for document representation under the vector space model (VSM) framework, class-specific document frequency (CSDF), which leads to a new term weighting scheme, TF-CSDF, based on term frequency (TF) and the newly proposed feature. The experimental results show that the proposed methods, CSDF and TF-CSDF, improve document classification performance compared with other widely used VSM document representations. Second, a new ranking method called GCrank is proposed for re-ranking web documents returned by search engines using document classification scores. The experimental results show that GCrank can improve the ranking of returned web documents according to several commonly used evaluation criteria. Finally, this research investigates, adapts, and combines several state-of-the-art ranked retrieval methods, leading to a new method for ranked query suggestion called Tfjac, which combines TF-IDF with the Jaccard coefficient. The experimental results show that Tfjac is the best of the methods evaluated for query suggestion: it outperforms the most widely used method, TF-IDF, in the number of highly relevant query suggestions it produces.
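To make the idea of combining TF-IDF with the Jaccard coefficient concrete, the following is a minimal sketch and not the thesis's Tfjac formula, which the abstract does not give. It assumes a weighted sum of TF-IDF cosine similarity and Jaccard overlap; the function names, the `alpha` parameter, the whitespace tokenizer, and the smoothed IDF are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def jaccard(a, b):
    # Jaccard coefficient |A ∩ B| / |A ∪ B| over two term collections.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector: raw term frequency times IDF.
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_suggestions(query, candidates, alpha=0.5):
    # Hypothetical combination: alpha * TF-IDF cosine + (1 - alpha) * Jaccard.
    # The actual Tfjac combination may differ; this is an assumption.
    docs = [tokenize(c) for c in candidates]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    # Smoothed IDF so that every known term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    q = tokenize(query)
    qv = tfidf_vector(q, idf)
    scored = []
    for text, d in zip(candidates, docs):
        score = alpha * cosine(qv, tfidf_vector(d, idf)) + (1 - alpha) * jaccard(q, d)
        scored.append((score, text))
    # Highest combined score first.
    return [text for score, text in sorted(scored, key=lambda p: -p[0])]


ranked = rank_suggestions(
    "cheap flights",
    ["cheap flights to paris", "hotel deals", "cheap hotel"],
)
print(ranked)
```

In this toy example the candidate that shares both query terms ranks first and the candidate that shares none ranks last. A weighted sum is only one plausible way to blend the two signals; a product or a rank-fusion step would be equally reasonable under the abstract's description.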

Bibliographic details

  • Author

    Plansangket Suthira

  • Year 2017
  • Original format PDF
  • Language en
