Conference: International Conference on Soft Computing and Intelligent Systems; International Symposium on Advanced Intelligent Systems

Performance Evaluation of Big Data Technology on Designing Big Network Traffic Data Analysis System

Abstract

Network and computer system administrators face a serious problem in analyzing big network traffic data: extracting and analyzing abnormal and normal patterns from large volumes of network traffic has become difficult. Traditional relational database management systems (RDBMS) are ill-suited to storing very large amounts of data because they are designed for storing and processing structured data. Hive is a data warehouse tool built on top of Hadoop for storing, processing, querying, and analyzing large amounts of data. It stores data in tables similar to those of an RDBMS. In this paper, we propose a Hadoop-based traffic querying and analysis system that performs TCP, ICMP, and UDP analysis of big network traffic data. The system consists of six modules: the Data Collection Module, the Transferring and Storing Information Module, the Convertor Module, the Data Mining Process Module, the DM2SC Module, and the Report Module. We also ran complex search queries and compared the query response times of MySQL and Hive in a Hadoop environment. In some scenarios, MySQL outperformed a four-node Hive cluster when querying ICMP protocol information. However, a MySQL database storing more than about 45 million network traffic records could not be queried for TCP protocol information. We also observed that the average query response time of Hive decreased steadily as nodes were added to the Hadoop cluster.
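The per-protocol analysis described above ultimately reduces to aggregate queries over a table of traffic records, and the MySQL-versus-Hive comparison reduces to timing such queries. The sketch below is illustrative only. It uses Python's built-in `sqlite3` as a stand-in for MySQL or Hive. The `traffic` table, its columns, the sample rows, and the helper names `build_sample_db` and `timed_query` are hypothetical assumptions. They are not the authors' schema, data, or implementation. The sketch runs a GROUP BY summary over TCP, UDP, and ICMP records and measures the wall-clock response time of that query.

```python
import sqlite3
import time

# Hypothetical traffic-record schema (an assumption, not the paper's schema):
# one row per captured packet or flow, with its protocol and size in bytes.
SCHEMA = """
CREATE TABLE traffic (
    src_ip    TEXT,
    dst_ip    TEXT,
    protocol  TEXT,     -- 'TCP', 'UDP', or 'ICMP'
    num_bytes INTEGER
)
"""

# Per-protocol summary in plain SQL. A query of this simple shape is also
# accepted, with at most minor changes, by MySQL and HiveQL, which is what
# makes a side-by-side response-time comparison possible.
PROTOCOL_SUMMARY = """
SELECT protocol, COUNT(*) AS record_count, SUM(num_bytes) AS total_bytes
FROM traffic
GROUP BY protocol
ORDER BY protocol
"""


def build_sample_db():
    """Create an in-memory stand-in database holding a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    sample = [
        ("10.0.0.1", "10.0.0.2", "TCP", 1500),
        ("10.0.0.1", "10.0.0.3", "TCP", 40),
        ("10.0.0.4", "10.0.0.2", "UDP", 512),
        ("10.0.0.5", "10.0.0.1", "ICMP", 84),
    ]
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", sample)
    conn.commit()
    return conn


def timed_query(conn, sql):
    """Run a query and return (rows, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = build_sample_db()
    rows, elapsed = timed_query(conn, PROTOCOL_SUMMARY)
    for protocol, record_count, total_bytes in rows:
        print(protocol, record_count, total_bytes)
    print(f"response time: {elapsed:.6f} s")
    conn.close()
```

On a real deployment, the same statement would be submitted through a MySQL client or a Hive client such as Beeline. Measuring wall-clock time around query execution is one simple way to compare response times. It is not necessarily the measurement method used in the paper.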
