International Visual Informatics Conference

Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

Abstract

Text mining, and clustering in particular, is widely used by search engines to improve the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is dynamically updated, yet relevant information on the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. In this paper, a dynamic text clustering method that utilizes the Firefly algorithm is introduced. The proposed clustering algorithm, aFA_(merge), automatically groups text documents into the appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments utilizing the proposed aFA_(merge) were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFA_(merge) generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
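
The abstract does not spell out how aFA_(merge) moves fireflies or decides which clusters to merge, so the following is only a minimal sketch of the two ingredients it names. It assumes the standard Firefly algorithm update, in which attractiveness decays as beta0 * exp(-gamma * r^2), and it pairs that with a hypothetical merging rule that fuses centroids whose cosine similarity reaches a threshold. The function names, the parameters (`beta0`, `gamma`, `alpha`, `threshold`) and the merging rule itself are illustrative assumptions, not the authors' method.

```python
import math


def attractiveness(beta0, gamma, distance):
    """Standard firefly attractiveness: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * distance ** 2)


def move_towards(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.0, rng=None):
    """One standard firefly step: move x_i towards the brighter firefly x_j.

    alpha scales an optional random perturbation drawn from rng
    (a random.Random instance); with rng=None the step is deterministic.
    """
    r = math.dist(x_i, x_j)
    beta = attractiveness(beta0, gamma, r)
    noise = [(rng.random() - 0.5) if rng else 0.0 for _ in x_i]
    return [a + beta * (b - a) + alpha * n for a, b, n in zip(x_i, x_j, noise)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_clusters(centroids, threshold=0.9):
    """Hypothetical merging rule (an assumption, not taken from the paper):
    greedily fold each centroid into the first existing group whose running
    mean has cosine similarity >= threshold, otherwise start a new group.
    """
    merged = []  # list of (mean_centroid, member_count)
    for c in centroids:
        for k, (m, n) in enumerate(merged):
            if cosine(c, m) >= threshold:
                new_mean = [(mi * n + ci) / (n + 1) for mi, ci in zip(m, c)]
                merged[k] = (new_mean, n + 1)
                break
        else:
            merged.append((list(c), 1))
    return [m for m, _ in merged]


if __name__ == "__main__":
    # Two nearly parallel centroids collapse into one; the third stays apart.
    centroids = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    print(len(merge_clusters(centroids, threshold=0.9)))
```

In a sketch like this the number of clusters is not fixed in advance: it is simply the number of centroids that survive merging, which mirrors the abstract's claim that the number of clusters is determined automatically.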
