A Frequent Term-Based Multiple Clustering Approach for Text Documents


Abstract

With the boom of the web and social networks, the amount of generated text data has increased enormously. On the one hand, although text clustering methods can be used to classify text data and facilitate data mining tasks such as information retrieval and recommendation, they still have evident shortcomings. In particular, most existing text clustering methods produce either a hard partition or a hierarchical result, neither of which can describe the data from various perspectives. On the other hand, multiple clustering approaches, which are proposed to classify data from various perspectives, face several challenges, such as high time complexity and incomprehensible results, when applied to text documents. In this paper, we propose a frequent term-based multiple clustering approach for text documents. Our approach classifies text documents from various perspectives and provides a semantic explanation for each cluster. Through a series of experiments, we show that our method is more scalable and provides more comprehensible results than traditional multiple clustering methods such as OSCLU and ASCLU when applied to text documents. In addition, we find that our approach achieves better clustering quality than existing text clustering approaches such as FTC.
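The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general frequent term-based idea, not the authors' method. It assumes that every set of terms occurring together in at least `min_support` documents defines one candidate cluster (the documents containing all of those terms), and that the term set itself can serve as the cluster's readable label. The function name `frequent_term_sets`, the whitespace tokenization, and the brute-force enumeration are all illustrative assumptions.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to its cover (ids of documents containing it).

    Hypothetical helper for illustration only: brute-force enumeration,
    whitespace tokenization, no pruning.
    """
    doc_terms = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            # The cover is the set of documents that contain every term.
            cover = {i for i, t in enumerate(doc_terms) if t.issuperset(terms)}
            if len(cover) >= min_support:
                result[frozenset(terms)] = cover
    return result


docs = [
    "apple banana",
    "apple cherry",
    "banana cherry",
    "apple banana cherry",
]
candidates = frequent_term_sets(docs, min_support=3, max_size=2)
for terms, cover in candidates.items():
    print(sorted(terms), sorted(cover))
```

In this toy input the last document falls into all three candidate clusters, which illustrates how overlapping, term-labelled clusters can describe the same document from more than one perspective. Selecting a final set of clusters from such candidates is the part a real method has to define, and it is not covered by this sketch.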

Bibliographic details

Source: Asia-Pacific Web Conference, 2014, 8 pages
Authors: Hai-Tao Zheng; Hao Chen; Shu-Qin Gong
Original format: PDF
Chinese Library Classification: TP393-53
Keywords: Multiple clustering; Frequent term; Text documents