Compile-Time Code Generation for Embedded Data-Intensive Query Languages


Abstract

Many emerging Big Data programming environments, such as Spark and Flink, provide powerful APIs that are inspired by functional programming. However, because of the complexity involved in developing and fine-tuning data analysis applications using the provided APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, current data analysis query languages, which are typically based on the relational model, cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well-integrated with the host programming language, as they are based on an incompatible data model, and are checked for correctness at run-time, which results in a significantly longer program development time. To address these shortcomings, we introduce a new query language for data-intensive scalable computing, called DIQL, that is deeply embedded in Scala, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile-time. In contrast to other query languages, our query embedding eliminates impedance mismatch as any Scala code can be seamlessly mixed with SQL-like syntax, without having to add any special declaration. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer can find any possible join in a query, including joins hidden across deeply nested queries, thus unnesting any form of query nesting. Currently, DIQL can run on three Big Data platforms: Apache Spark, Apache Flink, and Twitter's Cascading/Scalding.
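
The abstract's claim about unnesting can be illustrated with a small sketch. The following is plain Python, not DIQL syntax and not the paper's optimizer; the `customers` and `orders` data and both function names are hypothetical. It shows a correlated nested query, which rescans the inner collection for every outer element, next to an equivalent unnested form that groups the inner collection once and then looks totals up by key, the kind of join an optimizer could derive from the nested form.

```python
from collections import defaultdict

# Hypothetical sample data; names and fields are illustrative only.
customers = [
    {"cid": 1, "name": "Ada"},
    {"cid": 2, "name": "Bob"},
    {"cid": 3, "name": "Cy"},
]
orders = [
    {"cid": 1, "price": 10.0},
    {"cid": 2, "price": 5.0},
    {"cid": 1, "price": 2.5},
]


def nested_totals(customers, orders):
    """Correlated nested query: rescans all orders for every customer."""
    return [
        (c["name"], sum(o["price"] for o in orders if o["cid"] == c["cid"]))
        for c in customers
    ]


def unnested_totals(customers, orders):
    """Same result after unnesting: group orders once, then look up by key."""
    totals = defaultdict(float)
    for o in orders:
        totals[o["cid"]] += o["price"]
    # Outer-join semantics: customers without orders keep a total of 0.0.
    return [(c["name"], totals.get(c["cid"], 0.0)) for c in customers]
```

Both functions return the same list for the same input. The nested form does work proportional to the product of the two collection sizes, while the unnested form does work proportional to their sum, which is why finding such hidden joins matters on distributed platforms.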


Bibliographic record

Source
IEEE International Congress on Big Data | 2018 | 1 v. | 8 pages
Authors
Leonidas Fegaras; Md Hasanuzzaman Noor
Original format: PDF
Chinese Library Classification: Programming, software engineering
Keywords
Sparks; Database languages; Data analysis; Syntactics; Big Data; Optimization; Query processing;


Similar literature


1. Enriching Data-Intensive Domain-Ontologies for Useful Transformation of Natural Language Queries [J]. S. M. Abdullah Al-Mamun, Mohammad Moinul Hoque. International journal of computer science and network security. 2010, No. 7
2. Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language [J]. Antoine Tran Tan, Joel Falcou, Daniel Etiemble. International journal of parallel programming. 2016, No. 3
3. Automatic Query Generation and Query Relevance Measurement for Unsupervised Language Model Adaptation of Speech Recognition [J]. Akinori Ito, Yasutomo Kajiura, Motoyuki Suzuki. EURASIP journal on audio, speech, and music processing. 2009, No. 009
4. Compile-Time Code Generation for Embedded Data-Intensive Query Languages [C]. Leonidas Fegaras, Md Hasanuzzaman Noor. 2018 IEEE International Congress on Big Data. 2018
5. Compile-Time Evaluation and Code Generation for Semantics-Directed Compilers (Denotational) [D]. Appel, Andrew Wilson. 1985
6. Automatic query generation using word embeddings for retrieving passages describing experimental methods [O]. Ferhat Aydın, Zehra Melce Hüsünbeyi, Arzucan Özgür. 2017
7. SWOBE - Embedding the Semantic Web languages RDF, SPARQL and SPARUL into Java for Guaranteeing Type Safety, for Checking the Satisfiability of Queries and for the Determination of Query Result Types [O]. Jana Neumann, Volker Linnemann. 2012
8. Using Query Languages and Mobile Code to Reduce Service Invocation Costs; Conference paper [R]. Szymanski, R., Palmer, N., Chase, T. 2006
