
Studying Recommender Systems to Enhance Distributed Computing Schedulers.


Abstract

Distributed computing frameworks belong to a class of programming models that allow developers to launch workloads on large clusters of machines. Due to the dramatic increase in the volume of data gathered by ubiquitous computing devices, data-analytic workloads have become a common case among distributed computing applications, making Data Science an entire field of Computer Science. We argue that a data scientist's concerns lie in three main components: a dataset, a sequence of operations they wish to apply to this dataset, and constraints related to their work (performance, QoS, budget, etc.). However, without domain expertise it is extremely difficult to perform data science. One needs to select the right amount and type of resources, pick a framework, and configure it. Moreover, users often run their applications in shared environments, governed by schedulers that expect them to specify their resource needs precisely. Inherent to the distributed and concurrent nature of these frameworks, monitoring and profiling are hard, high-dimensional problems that prevent users from making the right configuration choices and from determining the amount of resources they need. Paradoxically, the system gathers a large amount of monitoring data at runtime, which remains unused.

In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit monitoring data to learn about workloads and to process user requests into a tailored execution context. In this work, we study techniques that have been used to take steps toward such system awareness, and we explore a new approach by applying machine learning techniques to recommend a specific subset of system configurations for Apache Spark applications. Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights the complexity of choosing the best configuration for a given workload.
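The executor configuration space the abstract refers to is exposed through a handful of `spark-submit` parameters. A minimal illustrative sketch of the main knobs (the values are placeholders, not recommendations from the thesis):

```shell
# Illustrative spark-submit invocation showing the main executor knobs:
#   --num-executors   : number of executor JVMs launched for the application
#   --executor-cores  : concurrent tasks each executor can run
#   --executor-memory : JVM heap size allocated per executor
# spark.memory.fraction controls the share of heap reserved for
# execution and storage (unified memory management).
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.memory.fraction=0.6 \
  my_app.py
```

The same total core count can be reached with many (num-executors, executor-cores) combinations, which is one source of the configuration complexity the study examines.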

Record details

  • Author

    Demoulin, Henri Maxime.

  • Affiliation

    Duke University.

  • Degree grantor Duke University.
  • Subject Computer science.
  • Degree M.S.
  • Year 2016
  • Pages 86 p.
  • Total pages 86
  • Format PDF
  • Language eng
  • CLC classification
  • Keywords

