Journal: International Journal of Applied Engineering Research

Generalized Jaccard Similarity Based Multilevel Threshold Affinity Propagated Clustering For Big Data Analytics


Abstract

Clustering the huge volume of information in very large datasets is a difficult data mining problem. Few research works have addressed grouping similar types of data in big datasets with the aid of different data mining concepts. The conventional affinity propagation clustering technique is computationally expensive in both memory space and time when the input dataset is large. To overcome these limitations, the Generalized Jaccard Similarity based Multilevel Threshold Affinity Propagated Clustering (GJS-MTAPC) technique is proposed. GJS-MTAPC is an improved affinity propagation (AP) algorithm that aims to increase clustering performance on big data with a minimal false positive rate and minimal computational cost. The technique first splits the big dataset to be clustered into a number of subsets. After dividing the dataset, it chooses the exemplars of each subset randomly. It then identifies the best exemplars by transmitting responsibility and availability messages among the data samples in each subset. Finally, it defines multiple threshold values in order to cluster the data samples in a large dataset precisely, based on their similarity values. As a result, GJS-MTAPC provides better big data clustering in terms of clustering accuracy, computational cost, space complexity, and false positive rate. Experimental results show that GJS-MTAPC increases clustering accuracy and reduces the computational cost of big data analytics compared to state-of-the-art works.
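The abstract does not give formulas or parameters, so the following Python sketch is only one plausible reading of the pipeline, not the authors' implementation. It assumes the common sum-of-minima over sum-of-maxima form of generalized Jaccard similarity for non-negative vectors, and it substitutes the standard affinity propagation message updates with a median preference for the random exemplar initialization the abstract mentions. The function names, the `n_subsets` and `thresholds` parameters, and the way threshold levels are assigned are all hypothetical.

```python
import numpy as np


def generalized_jaccard(x, y):
    """Generalized Jaccard similarity, sum(min) / sum(max), for non-negative vectors."""
    denom = np.maximum(x, y).sum()
    return 1.0 if denom == 0 else float(np.minimum(x, y).sum() / denom)


def similarity_matrix(X, Y):
    """Pairwise generalized Jaccard similarity between rows of X and rows of Y."""
    return np.array([[generalized_jaccard(x, y) for y in Y] for x in X])


def affinity_propagation(S, damping=0.5, iters=200):
    """Standard AP message passing on S (n >= 2); returns exemplar indices."""
    n = S.shape[0]
    S = S.copy()
    # Assumption: shared preference = median of the off-diagonal similarities.
    np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new.diagonal().copy()  # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
    score = (A + R).diagonal()
    exemplars = np.flatnonzero(score > 0)
    # Fall back to the single best candidate if no point qualifies.
    return exemplars if exemplars.size else np.array([score.argmax()])


def cluster_in_subsets(X, n_subsets=2, thresholds=(0.3, 0.6), seed=0):
    """Hypothetical pipeline: split, run AP per subset, then grade by thresholds."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_subsets)
    # Run AP independently inside each subset and collect the local exemplars.
    exemplar_ids = np.concatenate(
        [part[affinity_propagation(similarity_matrix(X[part], X[part]))] for part in parts]
    )
    # Assign every sample to its most similar exemplar across all subsets.
    to_exemplar = similarity_matrix(X, X[exemplar_ids])
    labels = exemplar_ids[to_exemplar.argmax(axis=1)]
    # Assumed multilevel step: bucket each sample by how similar it is to its exemplar.
    levels = np.digitize(to_exemplar.max(axis=1), thresholds)
    return labels, levels, exemplar_ids
```

Running affinity propagation per subset keeps each similarity matrix at roughly (n / n_subsets)² entries instead of n², which is the kind of memory and time saving the abstract attributes to splitting the dataset. The thresholds here only grade how strongly a sample belongs to its cluster; the paper may use them differently.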
