Journal: Information Technology Journal

Clustering Search Results Based on Formal Concept Analysis

Abstract

This study proposes a new method based on Formal Concept Analysis (FCA) to group and organize search results. After formal concepts are extracted using FCA, the concepts most relevant to the query are selected, and a two-level hierarchy is then built and presented to the user. We refer to the proposed algorithm as CHC (Conceptual and Hierarchical Clustering). Evaluating the quality of clustering results is a non-trivial task. Two improved objective metrics of clustering quality, ANMI@K and ANCE@K, are proposed; they are based on the NMI (normalized mutual information) and NCE (normalized complementary entropy) metrics but eliminate the biases present in them. We compare CHC with three other Search Results Clustering (SRC) algorithms, Suffix Tree Clustering (STC), Lingo, and Vivisimo, using a comprehensive set of documents obtained from the Open Directory Project hierarchy as a benchmark. In addition to the comparison based on objective measures, we also subjectively analyze the properties of the cluster labels produced by the different SRC algorithms. The experimental results show that our method outperforms the other three SRC algorithms and helps users browse and locate the results of interest.
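
To make the first step of the abstract concrete, the following is a minimal sketch of how formal concepts can be extracted from a small document-term context. It is not the CHC algorithm itself: the relevance-based selection and the two-level hierarchy are not reproduced, and the function names (extent, intent, formal_concepts) and the toy search results are hypothetical, chosen only for illustration. The enumeration is brute force and exponential in the number of documents, so it is suitable only for toy inputs.

```python
from itertools import combinations


def extent(context, terms):
    """Return the documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def intent(context, docs, all_terms):
    """Return the terms shared by every document in `docs`."""
    shared = set(all_terms)
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a context.

    `context` maps each document id to its set of terms. A pair (A, B)
    is a formal concept when A is exactly the set of documents sharing
    all terms in B, and B is exactly the set of terms shared by A.
    Closing every subset of documents yields every concept.
    """
    all_terms = frozenset().union(*context.values())
    docs = list(context)
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            b = intent(context, subset, all_terms)
            a = extent(context, b)
            concepts.add((a, b))
    return concepts


# Hypothetical search results for the query "jaguar" (illustrative data only).
context = {
    "d1": {"jaguar", "car"},
    "d2": {"jaguar", "car", "price"},
    "d3": {"jaguar", "animal"},
}

concepts = formal_concepts(context)
# List the concepts from the largest extent to the smallest.
for a, b in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(a), sorted(b))
```

In this toy context, the concept with extent {d1, d2} and intent {jaguar, car} would correspond to a candidate cluster labeled by its shared terms; a method such as the one described would then rank such concepts by relevance to the query before building the hierarchy.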