Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining

Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters



Abstract

The overwhelming number of textual documents available today highlights the need for information organization and discovery. Organizing documents into a hierarchy of topics and subtopics makes it easier for users to browse them. This paper borrows community mining from social network analysis to generate a hierarchy of topically coherent document clusters, and focuses on giving these clusters descriptive labels. We propose using the betweenness centrality measure in networks of co-occurring terms to label the clusters. We also incorporate keyphrase extraction and automatic titling into cluster labeling. The results show that the cluster labeling method that uses KEA to extract keyphrases from the documents generates the best labels overall compared with the other methods and baselines.
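The abstract does not say how the term co-occurrence network is built or how many terms make up a label, so the following is only a minimal sketch of the betweenness-centrality labeling idea under stated assumptions: two terms are linked when they appear in the same document of a cluster, edges are unweighted, and the label is the `k` terms with the highest betweenness, computed with Brandes' algorithm. The names `cooccurrence_graph`, `betweenness`, `label_cluster` and the parameter `k` are hypothetical, and the sketch does not cover the KEA-based or automatic-titling variants.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected term graph. ASSUMPTION: terms co-occur if they share a document."""
    graph = {}
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            graph.setdefault(t, set())
        for a, b in combinations(terms, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def label_cluster(docs, k=3):
    """HYPOTHETICAL labeler: the k most central terms (ties broken alphabetically)."""
    scores = betweenness(cooccurrence_graph(docs))
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, with the cluster `[["neural", "network", "training"], ["network", "graph", "community"], ["graph", "centrality"]]`, "network" lies on every shortest path between {neural, training} and the remaining terms (betweenness 6) and "graph" bridges to "centrality" (betweenness 4), so `label_cluster(docs, k=2)` returns `['network', 'graph']`. Betweenness favors terms that connect otherwise separate groups of vocabulary, which is one plausible reason to prefer it over raw term frequency for summarizing a cluster.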
