
Dynamic Load Balancing and Channel Strategy for Apache Flume Collecting Real-Time Data Stream



Abstract

With the development of information technology, real-time data stream processing (RTDSP) has become a popular research topic. The first step of RTDSP is data collection, which requires a collector to receive data from a source and deliver it to a sink. Apache Flume, a distributed and reliable framework used for this purpose, has limitations in load balancing and storage. In this paper, we aim to improve the performance and availability of collecting unstable real-time big data streams. We propose a new load balancing strategy based on free memory size, and a storage strategy that integrates the memory channel with a multi-file channel to reduce disk and network overhead. Experimental results show that availability and performance improve under poor network conditions, high availability requirements, intense competition for memory resources, and large data sizes. Specifically, availability is higher than 99.999%, and performance improves by 10%-50% depending on the conditions.
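
The abstract names two ideas without giving their details: routing by free memory, and combining a memory channel with file-backed storage. The sketch below is only an illustration of those two ideas under stated assumptions, not the paper's algorithm and not Flume's API. It assumes that each downstream collector reports its free memory, that the sender picks the collector with the most free memory, and that the channel holds events in memory up to a fixed capacity before spilling to a second queue that stands in for file storage. The names `Collector`, `pick_collector`, and `HybridChannel` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Collector:
    """Hypothetical downstream agent with a self-reported free-memory figure."""
    name: str
    free_memory_mb: int


def pick_collector(collectors):
    """Return the collector reporting the most free memory (assumed rule)."""
    if not collectors:
        raise ValueError("no collectors available")
    return max(collectors, key=lambda c: c.free_memory_mb)


@dataclass
class HybridChannel:
    """Toy channel: events stay in memory until capacity is hit, then spill."""
    memory_capacity: int
    memory: deque = field(default_factory=deque)
    spill: deque = field(default_factory=deque)  # stand-in for file-backed storage

    def put(self, event):
        # Once anything has spilled, keep spilling so FIFO order is preserved.
        if not self.spill and len(self.memory) < self.memory_capacity:
            self.memory.append(event)
        else:
            self.spill.append(event)

    def take(self):
        # Memory always holds the oldest events, so drain it first.
        if self.memory:
            return self.memory.popleft()
        if self.spill:
            return self.spill.popleft()
        return None  # channel is empty
```

In this sketch, spilling happens only when memory is full, so the slower storage is touched only under pressure. A real implementation would also need durable writes, transactions, and periodic refresh of the free-memory figures, none of which is modeled here.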
