
A parallel clustering method combined information bottleneck theory and centroid-based clustering



Abstract

Clustering is an important research topic in data mining. Clustering methods based on information bottleneck theory are well suited to complicated clustering problems because their information loss metric can measure arbitrary statistical relationships between samples, and they have been widely applied in many areas. As information technology develops, the scale of electronic data keeps growing. The classical information bottleneck clustering method cannot handle large-scale datasets because of its expensive computational cost. Parallel clustering based on the MapReduce model is the most efficient approach to large-scale, data-intensive clustering problems. This paper develops a parallel clustering method based on the MapReduce model. In the method, a parallel information bottleneck clustering method based on MapReduce is proposed to determine the initial cluster centers. An objective method is proposed to determine the final number of clusters automatically. A parallel centroid-based clustering method is proposed to determine the final clustering result. The clustering results are visualized with the interpolation MDS dimension reduction method. The efficiency of the method is illustrated with a practical DNA clustering example.
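The centroid-based stage mentioned in the abstract can be pictured as a map step that assigns each sample to its nearest center and a reduce step that averages the samples in each group. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it runs in a single process, uses Euclidean distance (the abstract does not say which distance is used for DNA data), and the names `map_assign`, `reduce_centroid`, and `iterate` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))
    return nearest, (point, 1)


def reduce_centroid(values):
    """Reduce step (assumed form): average all points sent to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def iterate(points, centroids):
    """One centroid update: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    updated = list(centroids)  # centroids with no assigned points stay unchanged
    for key, values in groups.items():
        updated[key] = reduce_centroid(values)
    return updated


if __name__ == "__main__":
    points = [(0, 0), (0, 2), (10, 10), (10, 12)]
    initial = [(0, 0), (10, 10)]  # stand-in for centers from the initialization stage
    print(iterate(points, initial))
```

In a real MapReduce deployment the grouping by key would be done by the framework's shuffle phase, and `iterate` would be repeated until the centers stop moving.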

Bibliographic details

  • Source
    Journal of Supercomputing | 2014, Issue 1 | pp. 452-467 | 16 pages
  • Author affiliations

    Key Laboratory for Computer Network of Shandong Province, Shandong Computer Science Center, 19 Keyuan Road, Jinan 250014, Shandong, China;

    School of Informatics and Computing, Pervasive Technology Institute, Indiana University Bloomington, Bloomington, IN 47408, USA;

    Key Laboratory for Computer Network of Shandong Province, Shandong Computer Science Center, 19 Keyuan Road, Jinan 250014, Shandong, China;

    School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Clustering; Information bottleneck theory; MapReduce; Centroid-based clustering;

