Conference: International Symposium on Advanced Intelligent Systems

Performance Evaluation of Big Data Technology on Designing Big Network Traffic Data Analysis System

Abstract

Network and computer systems administrators face a serious problem in analyzing big network traffic data. It has become difficult for administrators to extract and analyze abnormal and normal patterns from large amounts of network traffic data. Traditional relational database management systems (RDBMS) are ill-suited to storing such large amounts of data because they are designed for storing and processing structured data. Hive is a data warehouse tool built on top of Hadoop for storing, processing, querying, and analyzing large amounts of data. Hive stores data in tables, similar to a relational database management system. In this paper, we propose a Hadoop-based traffic querying and analysis system that handles TCP, ICMP, and UDP analysis of big network traffic data. The system consists of six modules: the Data Collection Module, the Transferring and Storing Information Module, the Convertor Module, the Data Mining Process Module, the DM2SC Module, and the Report Module. We also ran complex search queries and compared the query response times of MySQL against Hive in a Hadoop environment. In some scenarios, MySQL outperforms a cluster of four Hive nodes when querying ICMP protocol information; however, a MySQL database storing more than about 45 million network traffic records cannot be queried for TCP protocol information. Moreover, we observed that the average query response time of Hive in a Hadoop cluster decreases steadily as nodes are added to the cluster.
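The comparison of query response times described above can be illustrated with a minimal sketch. It is not the authors' benchmark: it uses Python's built-in `sqlite3` module as a stand-in for MySQL or Hive, and the `traffic` table, its columns, the sample rows, and the `average_response_time` helper are all hypothetical names chosen for illustration. It only shows the general idea of running a per-protocol query several times and averaging the wall-clock time.

```python
import sqlite3
import time


def average_response_time(conn, query, runs=5):
    """Return the mean wall-clock time in seconds over `runs` executions."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(query).fetchall()  # fetch all rows so the query fully completes
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Hypothetical schema (an assumption, not from the paper): one row per captured packet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic (src_ip TEXT, dst_ip TEXT, protocol TEXT, length INTEGER)"
)
rows = [
    ("10.0.0.1", "10.0.0.2", "TCP", 60),
    ("10.0.0.1", "10.0.0.3", "ICMP", 84),
    ("10.0.0.4", "10.0.0.2", "UDP", 120),
    ("10.0.0.4", "10.0.0.2", "ICMP", 84),
]
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)

# Count ICMP packets per source address, then time the same query.
icmp_query = (
    "SELECT src_ip, COUNT(*) FROM traffic WHERE protocol = 'ICMP' GROUP BY src_ip"
)
icmp_counts = dict(conn.execute(icmp_query).fetchall())
mean_seconds = average_response_time(conn, icmp_query)
```

To compare two engines in this way, the same query text and the same number of runs would be used against each connection, so that only the backend differs between the measurements.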
