Conference: SIAM International Conference on Data Mining

A Segment-based Approach To Clustering Multi-Topic Documents

Abstract

Document clustering has been recognized as a central problem in text data management, and it becomes particularly challenging when documents cover multiple topics. In this paper we address the problem of multi-topic document clustering by leveraging the natural composition of documents into text segments, each of which bears one or more topics on its own. We propose a segment-based document clustering framework, which is designed to induce a classification of documents starting from the identification of cohesive groups of segment-based portions of the original documents. We provide empirical evidence of the significance of our approach on different large collections of multi-topic documents.
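The general idea of the abstract (segment the documents, group the segments, then derive document clusters from the segment groups) can be illustrated with a minimal sketch. This is not the authors' framework, whose details the abstract does not give. Everything here is an assumption made for illustration: blank-line paragraphs stand in for segments, raw term counts stand in for the segment representation, a greedy leader clustering with a cosine threshold stands in for the segment clustering step, and a document is simply assigned to every cluster that contains one of its segments. All function names and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def split_segments(text):
    """Split a document into segments; blank-line paragraphs are an assumption."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def bag_of_words(segment):
    """Lowercased term-frequency vector for one segment."""
    return Counter(re.findall(r"[a-z]+", segment.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(vectors, threshold=0.3):
    """Greedy leader clustering (a stand-in, not the paper's algorithm):
    each segment joins its most similar centroid or starts a new cluster."""
    centroids, labels = [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # centroid keeps summed term counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


def cluster_documents(docs, threshold=0.3):
    """Map each document id to the set of segment clusters it contributes to."""
    owners, vectors = [], []
    for doc_id, text in docs.items():
        for seg in split_segments(text):
            owners.append(doc_id)
            vectors.append(bag_of_words(seg))
    labels = cluster_segments(vectors, threshold)
    doc_clusters = defaultdict(set)
    for doc_id, label in zip(owners, labels):
        doc_clusters[doc_id].add(label)
    return dict(doc_clusters)
```

Calling `cluster_documents` on a dictionary that maps document ids to raw text returns, for each id, a set of cluster labels. A document whose paragraphs fall into different segment clusters receives several labels, which is one simple way to let a multi-topic document belong to more than one group.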
