Screening hardware and volume factors in distributed machine learning algorithms on spark: A design of experiments (DoE) based approach

Rodrigues Jairson B.; Vasconcelos Germano C.; Maciel Paulo R. M.

Abstract

This paper presents an approach to investigate distributed machine learning workloads on Spark. The work analyzes how hardware and data volume factors affect time and cost performance when applying machine learning (ML) techniques in big data scenarios. The method is based on the Design of Experiments (DoE) approach and applies a randomized two-level fractional factorial design with replications to screen the most relevant factors. A Web corpus was built from 16 million webpages from Portuguese-speaking countries. The application was a binary text classification task to distinguish Brazilian Portuguese from other variations. Five different machine learning algorithms were examined: Logistic Regression, Random Forest, Support Vector Machines, Naive Bayes and Multilayer Perceptron. The data was processed using real clusters of up to 28 nodes, each composed of 12 or 32 cores, 1 or 7 SSD disks, and 3x or 6x RAM per core, for a maximum computational power of 896 cores and 5.25 TB of RAM. Linear models were applied to identify, analyze and rank the influence of factors. A total of 240 experiments were carefully organized to maximize the detection of non-confounded effects up to second order while minimizing the experimental effort. Our results include linear models to estimate time and cost performance, statistical inferences about effects, and a visualization tool based on parallel coordinates to aid decision making about cluster configuration.
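As a rough illustration of the screening idea in the abstract, the sketch below builds a replicated, randomized two-level fractional factorial design and ranks main effects. It is not the authors' code. The five factor labels, the 2^(5-1) layout with generator E = ABCD, the three replications, and the synthetic response function are all assumptions chosen for illustration; the abstract does not state them. The final lines check the cluster totals, reading "6x RAM per core" as 6 GB per core, which is also an assumption, though it reproduces the quoted 5.25 TB.

```python
import itertools
import random

# Hypothetical factor labels; the abstract does not name the factors this way.
FACTORS = ["nodes", "cores", "disks", "ram", "volume"]


def fractional_factorial_2_5_1():
    """Return the 16 runs of a 2^(5-1) design, levels coded as -1/+1.

    The fifth factor is set by the generator E = ABCD (an assumed choice),
    which keeps main effects and two-factor interactions unconfounded
    with one another.
    """
    runs = []
    for a, b, c, d in itertools.product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    return runs


def main_effect(runs, responses, j):
    """Mean response at the high level of factor j minus mean at the low level."""
    high = [y for r, y in zip(runs, responses) if r[j] == 1]
    low = [y for r, y in zip(runs, responses) if r[j] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def synthetic_time(run):
    """Made-up response used only to exercise the analysis (not measured data)."""
    a, b, c, d, e = run
    return 100 - 20 * a - 10 * b + 30 * e + 5 * a * e


design = fractional_factorial_2_5_1()

# Replicate each run three times and randomize the execution order.
replicated = design * 3
random.Random(0).shuffle(replicated)

responses = [synthetic_time(r) for r in replicated]
effects = {name: main_effect(replicated, responses, j)
           for j, name in enumerate(FACTORS)}

# Rank factors by the magnitude of their estimated main effect.
ranked = sorted(effects, key=lambda k: abs(effects[k]), reverse=True)

# Cluster totals from the abstract, assuming "6x" means 6 GB of RAM per core.
max_cores = 28 * 32
max_ram_tb = max_cores * 6 / 1024

if __name__ == "__main__":
    for name in ranked:
        print(f"{name:>6}: {effects[name]:+.1f}")
    print(max_cores, "cores,", max_ram_tb, "TB RAM")
```

Under these assumptions, 16 runs times 3 replications times 5 algorithms gives 240, which matches the experiment count quoted above, but the abstract does not confirm this breakdown.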

Bibliographic record

Source
Computing, 2021, Issue 10, pp. 2203-2225 (23 pages)
Authors
Rodrigues Jairson B.; Vasconcelos Germano C.; Maciel Paulo R. M.
Affiliations

Universidade Federal de Pernambuco, Centro de Informática, Av. Jornalista Aníbal Fernandes, s/n, Cidade Universitária, 50740-560 Recife, PE, Brazil (all three authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Design of experiments; Time and cost models; Distributed machine learning; Cloud computing; Big data; Spark;

Similar literature

1. Brain tumor segmentation approach based on the extreme learning machine and significantly fast and robust fuzzy C-means clustering algorithms running on Raspberry Pi hardware [J]. Medical Hypotheses, 2020
2. Design of experiment (DOE) based screening of factors affecting municipal solid waste (MSW) composting [J]. Khoshrooz Kazemi, Baiyu Zhang, Leonard M. Lye. Waste Management, 2016
3. Deep Learning Takes on Translation: Improvements in hardware, the availability of massive amounts of data, and algorithmic upgrades are among the factors supporting better machine translation [J]. Don Monroe. Communications of the ACM, 2017, Issue 6
4. A Design of Experiments (DOE) Approach to Parameters Optimization of Sludge Treatment System in an Enzyme Preparations Factory [C]. HaiXia Wang, Min Ji, XinYang Zhang. MEMC 2013, 2013
5. Hardware design for cryptographic protocols: An algorithmic state machine design approach [D]. Zamora Garcia, Gerardo Alejandro. 2016
6. Developing an Explainable Machine Learning-Based Personalised Dementia Risk Prediction Model: A Transfer Learning Approach With Ensemble Learning Algorithms [O]. Samuel O. Danso, Zhanhang Zeng, Graciela Muniz-Terrera. 2021
7. Solving the Capacitated Network Design Problem in Two Steps [O]. Meriem Khelifi, Mohand Yazid Saidi, Saadi Boudjit. 2017 (indexed together with the table of contents of Adv. Sci. Technol. Eng. Syst. J. 2(3), 2017, Special issue on Recent Advances in Engineering Systems)
