QDrill: Query-Based Distributed Consumable Analytics for Big Data



Abstract

Consumable analytics attempt to address the shortage of skilled data analysts in many organizations by offering analytic functionality in a form more familiar to in-house expertise. Providing consumable analytics for Big Data faces three main challenges. The first challenge is making the analytics algorithms run in a distributed fashion in order to analyze Big Data in a timely manner. The second challenge is providing an easy interface that allows in-house expertise to run these algorithms in a distributed fashion while minimizing the learning cycle and rewrites of existing code. The third challenge is running the analytics on data of different formats stored on heterogeneous data stores. In this paper, we address these challenges with the proposed QDrill. We introduce the Analytics Adaptor extension for Apache Drill, a schema-free SQL query engine for non-relational storage. The Analytics Adaptor introduces the Distributed Analytics Query Language for invoking data mining algorithms from within standard Drill SQL query statements. The adaptor allows using any sequential single-node data mining library (e.g., WEKA) and makes its algorithms run in a distributed fashion without having to rewrite them. We evaluate QDrill against Apache Mahout. The evaluation shows that QDrill outperforms Mahout in Updatable model training and in the scoring phase while keeping almost the same performance for Non-Updatable model training. QDrill is more scalable and offers an easier interface, no storage overhead, and the whole algorithm repository of WEKA, with the ability to extend to algorithms from other data mining libraries.


Bibliographic details

Source
IEEE International Congress on Big Data | 2016 | pp. 117-124 | 8 pages
Authors
Shadi Khalifa; Patrick Martin; Dan Rope; Mike McRoberts; Craig Statchuk;
Original format: PDF
Keywords
Data mining; Libraries; Big data; Database languages; Organizations; Standards organizations;


Similar literature


1. Renewable Energy-Aware Big Data Analytics in Geo-Distributed Data Centers with Reinforcement Learning [J]. Network Science and Engineering, IEEE Transactions on. 2020, No. 1
2. Private data analytics on biomedical sensing data via distributed computation [J]. Thierry Edoh. Computing Reviews. 2017, No. 8
3. On the Effectiveness of Hybrid Canopy with Hoeffding Adaptive Naive Bayes Trees: Distributed Data Mining for Big Data Analytics [J]. Mrutyunjaya Panda. International Journal of Applied Evolutionary Computation. 2017, No. 2
4. QDrill: Query-Based Distributed Consumable Analytics for Big Data [C]. Shadi Khalifa, Patrick Martin, Dan Rope, IEEE International Congress on Big Data. 2016
5. Achieving consumable big data analytics by distributing data mining algorithms [D]. Khalifa, Shady Samir Mohamed. 2017
6. Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services [O]. Dillon Chrimes, Hamid Zamani. 2017
7. Intelligent Queries over BIRN Data using the Foundational Model of Anatomy and a Distributed Query-Based Data Integration System [O]. Brinkley James F, Turner Jessica A, Detwiler Landon T, 2010
