
An enhanced k-means clustering algorithm for pattern discovery in healthcare data


Abstract

Media sensors in health monitoring systems, medical diagnoses that produce media content (audio, video, image, and text), and health service providers generate data that are too complex and voluminous to be processed and analyzed by traditional methods. Data mining approaches offer the methodology and technology to transform these heterogeneous data into meaningful information for decision making. This paper studies data mining applications in healthcare. In particular, we study k-means clustering algorithms on large datasets and present an enhancement to k-means clustering that requires k or fewer passes over a dataset. The proposed algorithm, which we call G-means, uses a greedy approach to produce the preliminary centroids and then takes k or fewer passes over the dataset to adjust these center points. Our experiments, run incrementally on the same dataset, show that G-means outperforms k-means in terms of entropy and F-scores. The experiments also yield better results for G-means in terms of the coefficient of variance and the execution time.
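The abstract does not state the greedy rule G-means uses to pick its preliminary centroids, so the following Python sketch is only an illustration of the general shape: greedy seeding followed by at most k refinement passes. It assumes farthest-point seeding as a stand-in for the paper's rule, and the function names `greedy_seeds` and `capped_kmeans` are hypothetical, not taken from the paper.

```python
import math


def greedy_seeds(points, k):
    """Farthest-point greedy seeding (an assumed stand-in, not the paper's rule)."""
    seeds = [points[0]]
    while len(seeds) < k:
        # Greedily add the point whose nearest existing seed is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, s) for s in seeds))
        seeds.append(nxt)
    return seeds


def capped_kmeans(points, k):
    """Refine greedy seeds with at most k assignment/update passes."""
    centroids = greedy_seeds(points, k)
    labels = []
    passes = 0
    for passes in range(1, k + 1):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break  # Converged before using all k passes.
        centroids = updated
    # Labels come from the last assignment step that was run.
    return centroids, labels, passes
```

Called as `capped_kmeans(points, k)` on a list of equal-length numeric tuples, the function returns the centroids, the labels from the last assignment pass, and the number of passes used, which never exceeds k. Whether such a cap is enough in practice depends on how good the seeds are, which is the part the paper's greedy step is meant to address.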
