Generalized Jaccard Similarity Based Multilevel Threshold Affinity Propagated Clustering For Big Data Analytics

Maheswari K.; Ramakrishnan M.

Abstract

Clustering the huge amount of information in very large datasets is a difficult problem in data mining. Few research works have addressed grouping similar types of data in big datasets with the aid of different data mining concepts. Conventional affinity propagation clustering is expensive in both memory space and time when the input dataset is large. To overcome these limitations, the Generalized Jaccard Similarity based Multilevel Threshold Affinity Propagated Clustering (GJS-MTAPC) technique is proposed. GJS-MTAPC is an improved affinity propagation (AP) algorithm that aims to increase clustering performance on big data with a minimal false positive rate and minimal computational cost. The technique splits the big dataset to be clustered into a number of subsets. After dividing the dataset, it chooses the exemplars of each subset randomly. It then identifies the best exemplars by transmitting responsibility and availability messages among the data samples in each subset. Finally, it defines multiple threshold values in order to cluster the data samples in a large dataset precisely, based on their similarity values. As a result, GJS-MTAPC improves big data clustering in terms of clustering accuracy, computational cost, space complexity, and false positive rate. The experimental results show that GJS-MTAPC increases clustering accuracy and reduces the computational cost of big data analytics compared to state-of-the-art works.
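
The abstract gives no formulas, so the following is only a minimal Python sketch of the building blocks it names, under stated assumptions. It assumes the common definition of generalized Jaccard similarity for non-negative vectors (sum of element-wise minima over sum of element-wise maxima) and uses textbook affinity propagation message passing with a median preference; the paper may define either differently. The function names (`generalized_jaccard`, `similarity_matrix`, `affinity_propagation`, `cluster_in_subsets`) and the `damping` and `iterations` parameters are hypothetical. The random choice of initial exemplars and the multilevel threshold step are not specified in the abstract and are left out.

```python
import numpy as np


def generalized_jaccard(x, y):
    """Assumed definition: sum(min(x, y)) / sum(max(x, y)) for non-negative vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.maximum(x, y).sum()
    return 1.0 if denom == 0 else float(np.minimum(x, y).sum() / denom)


def similarity_matrix(data):
    """Pairwise generalized Jaccard similarities (diagonal is 1)."""
    n = len(data)
    s = np.ones((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            s[i, k] = s[k, i] = generalized_jaccard(data[i], data[k])
    return s


def affinity_propagation(s, damping=0.5, iterations=200):
    """Textbook AP; returns each point's exemplar index (-1 if no exemplar emerged)."""
    n = s.shape[0]
    if n < 2:
        return np.arange(n)
    s = s.copy()
    # Preference (self-similarity): median of off-diagonal values, a common default.
    np.fill_diagonal(s, np.median(s[~np.eye(n, dtype=bool)]))
    r = np.zeros((n, n))  # responsibilities r(i, k)
    a = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        total = a + s
        best = total.argmax(axis=1)
        first = total[rows, best]
        total[rows, best] = -np.inf
        second = total.max(axis=1)
        r_new = s - first[:, None]
        r_new[rows, best] = s[rows, best] - second
        r = damping * r + (1 - damping) * r_new
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, r.diagonal())
        a_new = rp.sum(axis=0)[None, :] - rp
        diag = a_new.diagonal().copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, diag)
        a = damping * a + (1 - damping) * a_new
    exemplars = np.flatnonzero((a + r).diagonal() > 0)
    if exemplars.size == 0:
        return np.full(n, -1)
    labels = s[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)  # each exemplar represents itself
    return exemplars[labels]


def cluster_in_subsets(data, n_subsets):
    """Hypothetical helper: run AP separately on contiguous subsets of the data."""
    data = np.asarray(data, dtype=float)
    out = np.empty(len(data), dtype=int)
    for idx in np.array_split(np.arange(len(data)), n_subsets):
        local = affinity_propagation(similarity_matrix(data[idx]))
        # Map subset-local exemplar positions back to global row indices.
        out[idx] = np.where(local >= 0, idx[np.maximum(local, 0)], -1)
    return out


if __name__ == "__main__":
    points = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.8, 0.0, 0.1],
                       [0.0, 0.1, 1.0], [0.1, 0.0, 0.9], [0.0, 0.2, 0.8]])
    print(affinity_propagation(similarity_matrix(points)))
    print(cluster_in_subsets(points, 2))
```

Running AP per subset keeps each similarity matrix small, which is where the memory and time savings described in the abstract would come from; how the per-subset exemplars are then merged and thresholded is not stated there, so the sketch stops at the per-subset result.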


Bibliographic details

Source: International Journal of Applied Engineering Research, 2018, Issue 1, 11 pages
Authors: Maheswari K.; Ramakrishnan M.
Affiliations:

Department of Computer Science Bharathiyar University;

School of Information Technology Madurai Kamaraj University;

Original format: PDF
Language: English
Chinese Library Classification: Basic engineering sciences
Keywords: Big data; Exemplars; Clusters; Generalized Jaccard Similarity; Multiple Threshold Values; Subsets

Similar literature

1. Generalized Jaccard Similarity Based Multilevel Threshold Affinity Propagated Clustering For Big Data Analytics [J]. Maheswari K., Ramakrishnan M. International Journal of Applied Engineering Research. 2018, Issue 11aPta1
2. Genetic Similarity of Pakistani Pea (Pisum sativum L.) Germplasm with World Collection using Cluster Analysis and Jaccard's Similarity Index [J]. Muhammad Nisar, Abdul Ghafoor, Muhammad Rashid Khan. Journal of the Chemical Society of Pakistan. 2009, Issue 1
3. A New Similarity Measure Based Affinity Propagation for Data Clustering [J]. Advanced Science Letters. 2018, Issue 2
4. Threshold based similarity clustering of medical data [C]. Morajkar Sweta C., Laxminarayana J.A. International Conference on Advanced Communication Control and Computing Technologies. 2014
5. A generalized multidimensional index structure for multimedia data to support content-based similarity searches in a collaborative search environment [D]. Chetterjee, Kasturi. 2010
6. Patent relatedness and velocity in the Chinese pharmaceutical industry: A dataset of Jaccard similarity indices [O]. Charlotte Marie Vorreuther, Thierry Warin. 2021
7. Comparison Jaccard similarity, Cosine Similarity and Combined Both of the Data Clustering With Shared Nearest Neighbor Method [O]. Zahrotun Lisna. 2016
