Source: Information Resources Management Association International Conference (foreign-language conference proceedings)

Improvement of Query Hit List Precision with a Document Clustering Technique

Abstract

We propose a new approach to improve query hit list precision in document information retrieval. We use the k-means clustering technique to group the documents returned in the hit list. The relevancy of each cluster is evaluated from the relevancy scores of the documents it contains. The final relevancy score of each document combines the relevancy score of its cluster with the document's own relevancy score. To form clusters whose features are more closely related to the query, we use pseudo-feedback documents to construct a latent semantic index (LSI), which transforms all the documents in the hit list into LSI feature vectors. These feature vectors, built from query-relevant features, are the input to the clustering algorithm. We show that an LSI based on relevant documents can significantly improve hit list cluster coherence, in the sense that query-relevant and irrelevant documents fall into separate clusters. We also show that the improved cluster quality, which results in better separation between relevant and irrelevant documents, can be used to significantly improve the precision of a query hit list.
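The pipeline described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how pseudo-feedback documents are chosen, how a cluster's relevancy is computed, or how the two scores are combined. The sketch assumes the top-scored hits serve as pseudo-feedback, the cluster score is the mean document score, and the final score is a linear blend weighted by a hypothetical parameter `alpha`. All function names and the toy data are illustrative.

```python
import numpy as np

def lsi_basis(feedback_tdm, rank):
    """LSI basis (terms x rank) from a term-by-document matrix of feedback docs."""
    u, _, _ = np.linalg.svd(feedback_tdm, full_matrices=False)
    return u[:, :rank]

def kmeans(x, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def rerank(hit_tdm, scores, n_feedback=2, rank=2, k=2, alpha=0.5):
    """Blend each document's score with the mean score of its cluster.

    Assumptions (not stated in the abstract): top-scored hits act as
    pseudo-feedback, cluster relevancy is the mean document score, and
    the combination is linear with weight `alpha`.
    """
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores)[:n_feedback]          # pseudo-feedback documents
    basis = lsi_basis(hit_tdm[:, top], rank)        # LSI built from feedback only
    feats = (basis.T @ hit_tdm).T                   # every hit as an LSI vector
    labels = kmeans(feats, k)
    cluster_score = np.array([
        scores[labels == j].mean() if np.any(labels == j) else 0.0
        for j in range(k)
    ])
    final = alpha * cluster_score[labels] + (1 - alpha) * scores
    return final, labels

# Toy hit list: rows are terms, columns are documents.
# Documents 0-2 share the query vocabulary; documents 3-5 do not.
hit_tdm = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 1, 2, 1],
], dtype=float)
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # initial retrieval scores

final, labels = rerank(hit_tdm, scores)
order = np.argsort(-final)                # re-ranked hit list
print(order, labels, np.round(final, 3))
```

In this toy data, document 2 initially ranks below document 3 even though it shares the vocabulary of the top hits. After the cluster score is blended in, document 2 moves ahead of document 3, which is the kind of precision gain the abstract describes.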
