Scalable Online-Offline Stream Clustering in Apache Spark

Abstract

Two of the most popular approaches for dealing with big data are distributed computing and stream mining. In this paper, we combine both approaches to bring a competitive stream clustering algorithm, CluStream, into a modern framework for distributed computing, Apache Spark. CluStream is one of the most popular stream clustering approaches and the one that introduced the online-offline mining process: the online phase condenses the stream into statistical summaries, and the offline phase generates the final clusters from these summaries. We obtain a scalable stream clustering method that is open source and can be used by the Apache Spark community. Our experiments show that our adaptation achieves quality similar to the original approach while being more efficient.
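
The online-offline split described in the abstract can be illustrated with a minimal single-machine sketch in Python. This is not the authors' Spark implementation: the names `MicroCluster`, `online_update`, `offline_kmeans`, and `max_dist` are hypothetical, and a fixed distance threshold stands in for CluStream's adaptive boundary rule. The summaries kept here (count, linear sums, squared sums, timestamp sums) are additive, which is the property that makes them easy to merge across partitions in a distributed setting.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


@dataclass
class MicroCluster:
    """Additive statistical summary of the points in one micro-cluster."""
    n: int            # number of absorbed points
    ls: List[float]   # per-dimension linear sum of the points
    ss: List[float]   # per-dimension sum of squares of the points
    lt: float         # sum of timestamps
    st: float         # sum of squared timestamps

    @classmethod
    def from_point(cls, x: Sequence[float], t: float) -> "MicroCluster":
        return cls(1, list(x), [v * v for v in x], t, t * t)

    def absorb(self, x: Sequence[float], t: float) -> None:
        """Fold one point into the summary without storing the point."""
        self.n += 1
        self.ls = [a + v for a, v in zip(self.ls, x)]
        self.ss = [a + v * v for a, v in zip(self.ss, x)]
        self.lt += t
        self.st += t * t

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        """Summaries are additive, so merging is a component-wise sum."""
        return MicroCluster(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
            self.lt + other.lt,
            self.st + other.st,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]


def online_update(clusters: List[MicroCluster], x: Sequence[float],
                  t: float, max_dist: float) -> None:
    """Online phase: absorb x into the nearest summary or open a new one.

    Assumption: max_dist is a fixed threshold chosen for illustration.
    """
    if clusters:
        nearest = min(clusters, key=lambda c: euclidean(c.centroid(), x))
        if euclidean(nearest.centroid(), x) <= max_dist:
            nearest.absorb(x, t)
            return
    clusters.append(MicroCluster.from_point(x, t))


def offline_kmeans(clusters: List[MicroCluster],
                   centers: Sequence[Sequence[float]],
                   iterations: int = 10) -> List[List[float]]:
    """Offline phase: k-means over centroids, weighted by point counts."""
    current = [list(c) for c in centers]
    for _ in range(iterations):
        sums = [[0.0] * len(c) for c in current]
        weights = [0] * len(current)
        for mc in clusters:
            p = mc.centroid()
            j = min(range(len(current)), key=lambda i: euclidean(current[i], p))
            weights[j] += mc.n
            sums[j] = [s + mc.n * v for s, v in zip(sums[j], p)]
        # Keep the previous center when no micro-cluster was assigned to it.
        current = [[s / w for s in row] if w else c
                   for row, w, c in zip(sums, weights, current)]
    return current


if __name__ == "__main__":
    stream = [((0.0, 0.0), 1.0), ((0.2, 0.0), 2.0),
              ((10.0, 10.0), 3.0), ((10.2, 10.0), 4.0)]
    summaries: List[MicroCluster] = []
    for point, timestamp in stream:
        online_update(summaries, point, timestamp, max_dist=1.0)
    print(offline_kmeans(summaries, centers=[(0.0, 0.0), (10.0, 10.0)]))
```

The offline step never touches the raw stream. It works only on the summaries, so its cost depends on the number of micro-clusters rather than on the number of points seen so far.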

Bibliographic details

Source: IEEE International Conference on Data Mining Workshops, 2016, pp. 37-44 (8 pages)
Authors: Omar Backhoff; Eirini Ntoutsi
Original format: PDF
Keywords: Sparks; Clustering algorithms; Algorithm design and analysis; Data mining; Big data; Distributed databases

Similar literature

1. Adaptive performance model for dynamic scaling Apache Spark Streaming [J]. Max Petrov, Nikolay Butakov, Denis Nasonov. Procedia Computer Science, 2018, no. 1.
2. Scalability of Artificial Neural Network in Apache Spark Powered Cluster [J]. Advanced Science Letters, 2017, no. 6.
3. Big Data Processing with Apache Spark in Tertiary Institutions: Spark Streaming [J]. Emmanuel Boachie, Chunlin Li. Journal of Information Engineering and Applications, 2017, no. 6.
4. Scalable Online-Offline Stream Clustering in Apache Spark [C]. Omar Backhoff, Eirini Ntoutsi. IEEE International Conference on Data Mining Workshops, 2016.
5. Graphical Development Interface and Stream Analyzer for Apache Spark [D]. Sharma, Arun. 2018.
6. Real-Time Heart Arrhythmia Detection Using Apache Spark Structured Streaming [O]. Sadegh Ilbeigipour, Amir Albadvi, Elham Akhondzadeh Noughabi. 2021.
7. An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark [O]. Raed A. Hasan, Royida A. Ibrahem Alhayali, Nashwan Dheyaa Zaki. 2019.
