Conference: IEEE/ACM International Conference on Computer-Aided Design

A Streaming Clustering Approach Using a Heterogeneous System for Big Data Analysis

Abstract

Data clustering is a fundamental challenge in data analytics. It is the main task in exploratory data mining and a core technique in machine learning. As the volume, variety, velocity, and variability of data grow, we need more efficient data analysis methods that can scale to increasingly large and high-dimensional data sets. We develop a streaming clustering algorithm that is highly amenable to hardware acceleration. Our algorithm eliminates the need to store the data objects, which removes limits on the size of the data that we can analyze. It is also highly parameterizable, which allows it to fit the characteristics of the data set and scale to the available hardware resources. Our streaming hardware core can handle more than 40 Msamples/s when processing 3-dimensional streaming data and up to 1.78 Msamples/s for 70-dimensional data. To validate the accuracy and performance of our algorithm, we compare it with several common clustering techniques on several different applications. The experimental results show that it outperforms prior hardware-accelerated clustering systems.
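
The abstract does not describe the algorithm's internals, so the following is only an illustrative sketch of the general idea of clustering a stream without storing the data objects. It uses sequential (online) k-means as a stand-in technique, not the authors' method; the names `StreamingKMeans`, `update`, and `cluster_stream`, and the parameters `k` and `dim`, are hypothetical. Each sample is seen once, folded into a running centroid, and then discarded, so memory use depends on `k` and `dim` rather than on the length of the stream.

```python
from typing import Iterable, List, Sequence


class StreamingKMeans:
    """Sequential (online) k-means: an assumed stand-in, not the paper's algorithm.

    Only k centroids and k counters are kept; samples are never stored.
    """

    def __init__(self, k: int, dim: int) -> None:
        self.k = k          # number of clusters (hypothetical parameter)
        self.dim = dim      # dimensionality of each sample
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def _nearest(self, x: Sequence[float]) -> int:
        """Return the index of the centroid closest to x (squared Euclidean)."""
        best, best_dist = 0, float("inf")
        for i, c in enumerate(self.centroids):
            dist = sum((a - b) ** 2 for a, b in zip(x, c))
            if dist < best_dist:
                best, best_dist = i, dist
        return best

    def update(self, x: Sequence[float]) -> int:
        """Fold one sample into the model and return its cluster index."""
        if len(x) != self.dim:
            raise ValueError("sample has wrong dimensionality")
        # Seed the centroids with the first k samples of the stream.
        if len(self.centroids) < self.k:
            self.centroids.append([float(v) for v in x])
            self.counts.append(1)
            return len(self.centroids) - 1
        i = self._nearest(x)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]
        c = self.centroids[i]
        # Running-mean update: the centroid moves toward x by 1/count.
        for j in range(self.dim):
            c[j] += step * (x[j] - c[j])
        return i


def cluster_stream(samples: Iterable[Sequence[float]], k: int, dim: int) -> StreamingKMeans:
    """Consume an iterable (for example a generator) one sample at a time."""
    model = StreamingKMeans(k, dim)
    for x in samples:
        model.update(x)
    return model
```

Because `cluster_stream` accepts any iterable, it can be fed from a generator that reads samples off a socket or file, which is where the constant-memory property matters. Seeding from the first `k` samples is a simplification chosen for brevity; a hardware pipeline would likely make different trade-offs.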
