Data Mining Library for Big Data Processing Platforms: A Case Study-Sparkling Water Platform

Abstract

Nowadays, very large amounts of data are collected from millions of websites, applications, social media resources, surveys, video surveillance platforms, and many other sources. Useful information can be derived by processing the large datasets generated every day. Distributed data processing platforms are needed to handle data at this scale. For big data processing and analytics platforms such as Hadoop and Spark, there are machine learning libraries that run in a distributed manner and exploit the advantages of distributed computing. For example, the Mahout library uses the Hadoop platform, while the Spark MLlib library uses the Spark platform. However, on these platforms, the algorithms used in the individual data mining steps are either not implemented at all or implemented only for some of the steps. In this research, algorithms from different data mining steps are implemented on a big data platform and their performance is evaluated. As a case study, the Sparkling Water platform was chosen as the big data processing platform. A banking data set was used to test the implemented data mining algorithms. A software layer covering all data mining steps was developed on the Sparkling Water platform, and its performance was evaluated. The evaluation showed that distributed data processing successfully improved performance.


Bibliographic details

Source
International Conference on Computer Science and Engineering | 2018 | pp. 167-172 | 6 pages
Authors
Elif Cansu Yıldız; Mehmet S. Aktas; Oya Kalıpsız; Alper Nebi Kanlı; Umut Orçun Turgut;
Original format: PDF
Keywords
Sparks; Data mining; Data processing; Libraries; Performance evaluation; Water resources; Social network services;


Similar literature

1. Construction of an Intelligent Processing Platform for Equestrian Event Information Based on Data Fusion and Data Mining [J]. Zhong Wu, Chuan Zhou. Journal of Sensors. 2021, Issue a
2. Big data Mining Using Very-Large-Scale Data Processing Platforms [J]. Ms. K. Deepthi, Dr. K. Anuradha. International Journal of Engineering Research and Applications. 2016, Issue 2
3. Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform [J]. Zhao Xiaoyong, Yang Chunrong. The Open Automation and Control Systems Journal. 2016, Issue 1
4. Data Mining Library for Big Data Processing Platforms: A Case Study-Sparkling Water Platform [C]. Elif Cansu Yıldız, Mehmet S. Aktas, Oya Kalıpsız, International Conference on Computer Science and Engineering. 2018
5. Performance comparison of two data mining algorithms on big data platforms. [D]. Raju, Md Rajiur Rahman. 2015
6. RETINOBASE: a web database data mining and analysis platform for gene expression data on retina [O]. Ravi Kiran Reddy Kalathur, Nicolas Gagniere, Guillaume Berthommier, 2008
7. Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform [O]. Zhao Xiaoyong, Yang Chunrong. 2014
