Static and Dynamic Big Data Partitioning on Apache Spark

Abstract

Many of today's large datasets are organized as graphs. Because of their size, it is often infeasible to process these graphs on a single machine. Many software frameworks and tools have therefore been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance under a naive partitioning solution against more elaborate strategies, both static and dynamic.
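
To make concrete why the choice of partitioning can matter, here is a minimal plain-Python sketch. It is not Spark code and not the strategies evaluated in the paper. It assumes a toy path graph and two hypothetical assignment functions, `hash_partition` and `range_partition`, and counts the edges whose endpoints fall in different partitions (the edge cut), a common rough proxy for cross-machine communication.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int]


def hash_partition(vertex: int, num_parts: int) -> int:
    """Naive strategy (illustrative): vertex ID modulo the partition count."""
    return vertex % num_parts


def range_partition(vertex: int, num_parts: int, num_vertices: int) -> int:
    """Alternative static strategy (illustrative): contiguous blocks of vertex IDs."""
    block = -(-num_vertices // num_parts)  # ceiling division
    return vertex // block


def edge_cut(edges: List[Edge], assign: Callable[[int], int]) -> int:
    """Count edges whose two endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if assign(u) != assign(v))


# Toy input (an assumption for illustration): the path graph 0-1-2-...-7.
num_vertices = 8
num_parts = 2
edges = [(i, i + 1) for i in range(num_vertices - 1)]

hash_cut = edge_cut(edges, lambda v: hash_partition(v, num_parts))
range_cut = edge_cut(edges, lambda v: range_partition(v, num_parts, num_vertices))

print("edge cut with modulo assignment:", hash_cut)
print("edge cut with contiguous blocks:", range_cut)
```

On this toy graph the modulo assignment separates every pair of neighbours, while contiguous blocks cut only the edge between the two blocks. Real graphs and real algorithms behave less cleanly, which is why an empirical comparison such as the one the abstract describes is needed.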

Bibliographic record

Source
International Conference series on Parallel Computing | 2016 | xx 850 pages | 10 pages in total
Authors
Massimiliano Bertolucci; Emanuele Carlini; Patrizio Dazzi; Alessandro Lulli; Laura Ricci
Original format
PDF
Chinese Library Classification
TP338.6-532
Keywords
BigData; Graph algorithms; Data partitioning; Apache Spark
