
A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results

Abstract

This paper details a modular, self-contained web search results clustering system that enhances search results by (i) clustering the lists of web documents returned by queries to search engines, and (ii) ranking the results and labeling the resulting clusters, using a calculated relevance value as the degree of membership in each cluster. In addition, we demonstrate an external evaluation method based on precision for comparing fuzzy clustering techniques, as well as internal measures suitable for non-training data. The built-in label generator uses the membership degrees and relevance values to weight the most relevant results more heavily. The membership degrees of documents in fuzzy clusters also facilitate effective detection and removal of overly similar clusters. To achieve this, our transduction-based clustering algorithm (TCA) and its fuzzy counterpart (FTCA) employ a transduction-based relevance model (TRM) to consider local relationships between web documents. Tests on five real-world and synthetic datasets show favorable results compared with the established label-based clustering algorithms Suffix Tree Clustering (STC) and Lingo.
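The abstract does not say how overly similar clusters are detected, so the following is only a minimal sketch of one plausible approach, not the authors' FTCA procedure. It assumes each fuzzy cluster is represented by a vector of per-document membership degrees, compares those vectors with cosine similarity, and drops any cluster that is too close to a stronger one. The function names, the 0.9 threshold, and the sample data are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length membership vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def drop_similar_clusters(memberships, threshold=0.9):
    """Return the names of clusters to keep.

    memberships maps a cluster label to a list of membership degrees,
    one per document. Clusters are visited from largest to smallest
    total membership; a cluster is dropped when its vector is at least
    `threshold` similar to one already kept. Both the similarity
    measure and the threshold are assumptions for illustration.
    """
    ordered = sorted(memberships, key=lambda c: sum(memberships[c]), reverse=True)
    kept = []
    for name in ordered:
        vector = memberships[name]
        if all(cosine(vector, memberships[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical membership degrees of four documents in three clusters.
memberships = {
    "jaguar car": [0.9, 0.8, 0.2, 0.0],
    "jaguar cars": [0.8, 0.9, 0.1, 0.0],
    "jaguar animal": [0.0, 0.1, 0.9, 0.7],
}
kept = drop_similar_clusters(memberships)
print(kept)
```

In this toy input the two car-related clusters share almost the same documents, so only the one with the larger total membership survives alongside the animal cluster. A hard (non-fuzzy) clustering would only offer set overlap for this comparison, which is one reason graded membership degrees can make near-duplicate detection easier.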

Bibliographic details

  • Authors

    Matsumoto T; Hung E;

  • Affiliation
  • Year 2012
  • Total pages
  • Original format PDF
  • Language eng
  • CLC classification
