Journal: Mathematical Problems in Engineering: Theory, Methods and Applications

Novel Automated K-means++ Algorithm for Financial Data Sets


Abstract

The K-means algorithm has been extensively investigated in the field of text clustering because of its linear time complexity and its suitability for sparse matrix data. However, it has two main problems: determining the number of clusters and choosing the locations of the initial cluster centres. In this study, we propose the SDK-means++ algorithm, an improved K-means++ algorithm based on the Davies-Bouldin index (DBI) and the largest sum of distances. Firstly, we represent the data set using term frequency-inverse document frequency (TF-IDF). Secondly, we measure the distance between objects by cosine similarity. Thirdly, we select the initial cluster centres by comparing the distance to the existing initial cluster centres and the maximum density. Fourthly, we obtain clustering results using the K-means++ method. Lastly, we use the DBI to obtain the optimal clustering result automatically. Experimental results on real bank transaction volume data sets show that the SDK-means++ algorithm is more effective and efficient than two other algorithms in organising large financial text data sets. The F-measure of the proposed algorithm is 0.97, and its running time is 42.9% and 22.4% lower than that of the K-means and K-means++ algorithms, respectively.
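The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' SDK-means++ implementation: the abstract does not give the density and largest-sum-of-distances seeding rule in detail, so standard K-means++ seeding (sampling proportional to squared distance) is used in its place. The smoothed IDF formula, the use of cosine distance inside the DBI, the unit-length centroid update, the toy documents and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def normalise(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def tfidf(docs):
    """Unit-length TF-IDF vectors for tokenised, non-empty documents."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    # Smoothed IDF (an assumption; the abstract does not state the variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append(normalise([tf[t] / len(d) * idf[t] for t in vocab]))
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; inputs are unit length, so a dot product suffices."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(u, v)))


def seed_centres(X, k, rng):
    """Standard K-means++ seeding, standing in for the paper's seeding rule."""
    centres = [rng.choice(X)]
    while len(centres) < k:
        d2 = [min(cosine_distance(x, c) for c in centres) ** 2 for x in X]
        centres.append(rng.choices(X, weights=d2)[0] if sum(d2) > 0 else rng.choice(X))
    return centres


def assign(X, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: cosine_distance(x, centres[j])) for x in X]


def kmeans(X, k, rng, iters=20):
    """Lloyd iterations with unit-length mean directions as centres."""
    centres = seed_centres(X, k, rng)
    for _ in range(iters):
        labels = assign(X, centres)
        new = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            new.append(normalise([sum(c) for c in zip(*members)]) if members else centres[j])
        if new == centres:
            break
        centres = new
    return assign(X, centres), centres


def davies_bouldin(X, labels, centres):
    """Davies-Bouldin index (lower is better); requires at least two centres."""
    k = len(centres)
    scatter = []
    for j in range(k):
        members = [x for x, lab in zip(X, labels) if lab == j]
        scatter.append(sum(cosine_distance(x, centres[j]) for x in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / max(cosine_distance(centres[i], centres[j]), 1e-12)
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def best_k(X, k_range, seed=0):
    """Cluster once per candidate k and keep the result with the lowest DBI."""
    rng = random.Random(seed)
    scored = []
    for k in k_range:
        labels, centres = kmeans(X, k, rng)
        scored.append((davies_bouldin(X, labels, centres), k, labels))
    return min(scored)[1:]


# Hypothetical toy corpus; the paper uses real bank transaction data.
docs = [s.split() for s in [
    "loan interest rate mortgage", "mortgage loan rate", "loan interest mortgage",
    "card payment transfer fee", "transfer payment card", "card fee payment",
]]
X = tfidf(docs)
k, labels = best_k(X, range(2, 4))
print(k, labels)
```

Because the seeding is random, the chosen k and the labels depend on the seed. Running each candidate k several times and keeping the lowest DBI would make the selection more stable.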
