Conference: Asia-Pacific Web Conference

A Frequent Term-Based Multiple Clustering Approach for Text Documents

Abstract

With the boom of the web and social networks, the amount of generated text data has increased enormously. Text clustering methods can group text data and support data mining tasks such as information retrieval and recommendation, but they still have shortcomings. In particular, most existing text clustering methods produce either a hard partition or a hierarchy, neither of which describes the data from multiple perspectives. Multiple clustering approaches, which are designed to group data from various perspectives, face their own challenges when applied to text documents, such as high time complexity and incomprehensible results. In this paper, we propose a frequent term-based multiple clustering approach for text documents. Our approach clusters text documents from various perspectives and provides a semantic explanation for each cluster. Through a series of experiments, we show that, when applied to text documents, our method is more scalable and provides more comprehensible results than traditional multiple clustering methods such as OSCLU and ASCLU. We also find that our approach achieves better clustering quality than existing text clustering approaches such as FTC.
