Journal: Journal of computer sciences

Towards a Generic Multimodal Architecture for Batch and Streaming Big Data Integration

Abstract

Big Data are rapidly produced by various heterogeneous data sources. They are of different types (text, image, video or audio) and have different levels of reliability and completeness. One of the most interesting architectures for handling large amounts of data arriving at high velocity is the lambda architecture. It combines two processing layers, the batch layer and the speed layer, each providing specific views of the data while ensuring robust, fast and scalable data processing. However, most papers dealing with the lambda architecture focus on a single type of data, generally produced by a single data source. Moreover, the layers of the architecture are implemented independently or, at best, are combined to perform basic processing without assessing either the reliability or the completeness of the data. Therefore, inspired by the lambda architecture, we propose in this paper a generic multimodal architecture that combines batch and streaming processing in order to build a complete, global and accurate insight in near-real-time, based on the knowledge extracted from multiple heterogeneous Big Data sources. Our architecture uses batch processing to analyze the data structures and contents, build the learning models and calculate the reliability index of the sources involved, while the streaming processing uses the models built by the batch layer to process incoming data immediately and provide results rapidly. We validate our architecture in the context of urban traffic management systems, where it is used to detect congestion.
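The division of work between the two layers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the reliability index or the learning models are computed, so the formula used here (one over one plus the mean absolute error against a trusted reference speed), the congestion rule (fused speed below half of the segment baseline), and every name (`Observation`, `BatchView`, `build_batch_view`, `is_congested`, the source labels) are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    source: str       # hypothetical source label, e.g. "camera" or "gps"
    segment: str      # road segment identifier
    speed_kmh: float  # speed estimate already extracted from the raw item


@dataclass(frozen=True)
class BatchView:
    reliability: Dict[str, float]   # per-source index in (0, 1]
    baseline_kmh: Dict[str, float]  # per-segment usual speed


def build_batch_view(history: Iterable[Tuple[Observation, float]]) -> BatchView:
    """Batch layer: scan history of (observation, trusted reference speed) pairs."""
    abs_errors: Dict[str, List[float]] = defaultdict(list)
    reference: Dict[str, List[float]] = defaultdict(list)
    for obs, true_speed in history:
        abs_errors[obs.source].append(abs(obs.speed_kmh - true_speed))
        reference[obs.segment].append(true_speed)
    # Assumed reliability index: 1 / (1 + mean absolute error) per source.
    reliability = {s: 1.0 / (1.0 + sum(e) / len(e)) for s, e in abs_errors.items()}
    baseline = {seg: sum(v) / len(v) for seg, v in reference.items()}
    return BatchView(reliability, baseline)


def is_congested(view: BatchView, window: List[Observation], segment: str,
                 ratio: float = 0.5, default_reliability: float = 0.1) -> Optional[bool]:
    """Speed layer: fuse the current window using the batch-built view.

    Returns None when there is nothing to decide on for this segment.
    """
    relevant = [o for o in window if o.segment == segment]
    if not relevant or segment not in view.baseline_kmh:
        return None
    # Sources unseen by the batch layer get a low, assumed default weight.
    weights = [view.reliability.get(o.source, default_reliability) for o in relevant]
    fused = sum(w * o.speed_kmh for w, o in zip(weights, relevant)) / sum(weights)
    return fused < ratio * view.baseline_kmh[segment]
```

In this sketch the batch view would be rebuilt periodically from accumulated history, while `is_congested` runs on each small window of incoming observations, so a source that the batch layer found unreliable contributes little to the near-real-time decision.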
