Benchmarking Spark Machine Learning Using BigBench

Abstract

Databases such as dashDB are adding High Speed Connectors for Spark to extract large volumes of data efficiently. This allows the extracted data to be combined with other, unstructured data sources so that Machine Learning (ML) can be performed on the result. Machine Learning is a key ingredient for such use cases. To assess the performance of the data connectors and machine learning frameworks, we sought benchmarks that can scale datasets to very large volumes and apply Machine Learning algorithms. After exploring several options, we found BigBench to be a good fit. In this paper, we describe our experience of using BigBench, with a special focus on its 5 Machine Learning queries and their default implementation in Spark. We discuss how the effectiveness of BigBench for benchmarking Machine Learning could be improved by avoiding bias and including real-time analytics. We also see scope for improving the coverage of Machine Learning by adding more use cases such as Collaborative Filtering. Lastly, we share some interesting visualizations of 4 ML queries using SPSS Modeler and our experiments with different Clustering and Classification algorithms.

Bibliographic details

Source: TPC Technology Conference on Performance Evaluation and Benchmarking | 2017 | 160p | 16 pages
Author: Sweta Singh
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: Collaborative filtering using machine learning; Predicting accuracy of data sets; Visualization of BigBench machine learning queries using SPSS
