International World Wide Web Conference

A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results


Abstract

Organizing Web search results into a hierarchy of topics and subtopics facilitates browsing the collection and locating results of interest. In this paper, we propose a new hierarchical monothetic clustering algorithm that builds a topic hierarchy for a collection of search results retrieved in response to a query. At every level of the hierarchy, the new algorithm progressively identifies topics in a way that maximizes coverage while maintaining the distinctiveness of the topics. We refer to the proposed algorithm as DisCover. Evaluating the quality of a topic hierarchy is a non-trivial task, the ultimate test being user judgment. We use several objective measures, such as coverage and reach time, for an empirical comparison of the proposed algorithm with two other monothetic clustering algorithms to demonstrate its superiority. Even though our algorithm is slightly more computationally intensive than one of the two algorithms, it generates better hierarchies. Our user studies also show that the proposed algorithm is superior to the other algorithms as a summarizing and browsing tool.
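The abstract does not give the scoring function, so the following is only a minimal sketch of the general idea it describes: at one level of the hierarchy, greedily pick single terms as topics (monothetic clusters) so that each new topic covers many not-yet-covered results while staying distinct from the topics already chosen. The function name `select_topics`, the weight `w`, and the particular coverage and distinctiveness formulas are illustrative assumptions, not the definitions used in DisCover.

```python
from typing import Dict, List, Optional, Set


def select_topics(docs: Dict[str, Set[str]], k: int, w: float = 0.5) -> List[str]:
    """Greedily pick up to k terms as topic labels for one hierarchy level.

    Hypothetical scoring (an assumption, not the paper's formula):
      coverage = newly covered docs / all docs
      distinct = newly covered docs / docs containing the term
      score    = w * coverage + (1 - w) * distinct
    """
    all_terms: Set[str] = set().union(*docs.values()) if docs else set()
    covered: Set[str] = set()   # ids of docs already under a chosen topic
    chosen: List[str] = []

    for _ in range(k):
        best_term: Optional[str] = None
        best_score = 0.0
        # Sorted iteration makes tie-breaking deterministic (first term wins).
        for term in sorted(all_terms - set(chosen)):
            members = {d for d, terms in docs.items() if term in terms}
            new = members - covered
            if not new:
                continue  # term adds no uncovered documents
            coverage = len(new) / len(docs)
            distinct = len(new) / len(members)
            score = w * coverage + (1 - w) * distinct
            if score > best_score:
                best_term, best_score = term, score
        if best_term is None:
            break  # every remaining term only repeats covered documents
        chosen.append(best_term)
        covered |= {d for d, terms in docs.items() if best_term in terms}
    return chosen


# Toy search results: document id -> set of candidate terms (made-up data).
results = {
    "d1": {"car", "dealer"},
    "d2": {"car"},
    "d3": {"cat"},
    "d4": {"cat", "zoo"},
}
topics = select_topics(results, k=3)
print(topics)
```

On this toy input the sketch stops after two topics, because every remaining term would only repeat documents that are already covered. A full hierarchy could be obtained by applying the same selection recursively to the documents under each chosen topic.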
