International Conference on Very Large Data Bases

An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB

Abstract

Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models with different complexities. For enrichments that will be frequently used in the future, it can be advantageous to push their computation into the ingestion pipeline so that they can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results. In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.