
Improvement of Query Hit List Precision with a Document Clustering Technique


Abstract

We propose a new approach to improving query hit list precision in document information retrieval. We use the k-means clustering technique to group the documents in a returned hit list. The relevancy of each cluster is evaluated from the relevancy scores of the documents it contains. The final relevancy score of each document is a combination of the relevancy score of its cluster and the document's own score. To form clusters whose features are more closely related to the query, we use pseudo-feedback documents to construct a latent semantic index (LSI), which transforms all the documents in the hit list into LSI feature vectors. These feature vectors, built from relevant features, are the input to the clustering algorithm. We show that LSI based on relevant documents significantly improves hit list cluster coherence, in the sense that query-relevant and irrelevant documents fall into separate clusters. We also show that the improved cluster quality, which results in better separation between relevant and irrelevant documents, can be used to improve the precision of a query hit list significantly.
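
The clustering and score-combination steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the cluster score is taken as the mean of its members' scores, the final score is a linear mix controlled by a hypothetical weight `alpha`, the first k vectors seed the centroids, and the toy 2-D vectors merely stand in for LSI feature vectors (the LSI projection itself is not shown). The function names `kmeans` and `rerank` are likewise hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means. Seeding with the first k vectors is an assumption."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rerank(doc_scores, labels, alpha=0.5):
    """Mix each document's score with its cluster's mean score (assumed formula)."""
    totals, counts = {}, {}
    for score, label in zip(doc_scores, labels):
        totals[label] = totals.get(label, 0.0) + score
        counts[label] = counts.get(label, 0) + 1
    cluster_score = {c: totals[c] / counts[c] for c in totals}
    final = [alpha * cluster_score[label] + (1 - alpha) * score
             for score, label in zip(doc_scores, labels)]
    # Document indices sorted by final score, best first.
    order = sorted(range(len(final)), key=lambda i: final[i], reverse=True)
    return final, order


# Toy hit list: made-up feature vectors and retrieval scores.
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.85, 0.15)]
doc_scores = [0.9, 0.6, 0.8, 0.2, 0.55]

labels = kmeans(vectors, k=2)
final, order = rerank(doc_scores, labels)
print(labels)  # cluster label per document
print(order)   # re-ranked document indices
```

In this toy data, document 4 sits in the higher-scoring cluster and document 1 in the lower-scoring one, so document 4 moves ahead of document 1 after re-ranking even though its original score is lower. That is the kind of effect the abstract attributes to cluster-level relevancy.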
