Journal: Journal of Information Science

An improved ant algorithm with LDA-based representation for text document clustering

Abstract

Document clustering supports document organisation and browsing, as well as document summarisation and classification. Choosing an appropriate representation for textual documents is critical to the performance of clustering and classification algorithms. Textual documents suffer from high dimensionality and from irrelevant text features. In addition, conventional clustering algorithms have several shortcomings, such as slow convergence and sensitivity to initial values. To address these shortcomings, metaheuristic algorithms are frequently applied to clustering. This paper presents an improved ant clustering algorithm in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. Latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
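
The abstract reports results as F-measure values but does not say which variant is used. As a minimal sketch, assuming the common class-weighted best-match definition (each true class is paired with the cluster that gives it the highest F score, and the scores are averaged by class size), the metric could be computed as follows. The function name `clustering_f_measure` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted best-match F-measure for a flat clustering.

    Assumed definition (not confirmed by the abstract):
        F = sum_i (n_i / n) * max_j F(i, j)
    where F(i, j) is the harmonic mean of precision n_ij / n_j and
    recall n_ij / n_i for class i and cluster j.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("need at least one document")

    class_sizes = Counter(true_labels)        # n_i
    cluster_sizes = Counter(cluster_labels)   # n_j
    overlap = Counter(zip(true_labels, cluster_labels))  # n_ij

    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, so F(i, j) is 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


if __name__ == "__main__":
    # Toy example: six documents, three true classes, three found clusters.
    truth = ["sports", "sports", "sports", "politics", "politics", "tech"]
    found = [0, 0, 1, 1, 1, 2]
    print(round(clustering_f_measure(truth, found), 3))
```

A perfect clustering scores 1.0 under this definition regardless of how the cluster identifiers are named, which is why it suits comparing algorithms whose cluster labels are arbitrary.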