Kafka is a high-throughput distributed messaging system that is widely used in massive data processing and other fields. However, the existing load balancing algorithm for the Kafka consumer client has shortcomings: when processing massive data, it can incur excessive overhead and even produce incorrect results. This paper proposes an optimized load balancing algorithm for the Kafka consumer client. The load balancing process is controlled entirely by one consumer acting as the manager; the remaining consumers do not need to perform load balancing on their own, and the manager does not need to reassign the partitions consumed by every consumer. The system monitors the status of all consumers and rebalances promptly when a consumer crashes. Test results show that the algorithm reduces the system overhead caused by load balancing and avoids incorrect load balancing results, effectively guaranteeing the correctness of distributed scientific data processing.
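The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: a single manager handles a crash by moving only the crashed consumer's partitions, leaving every surviving consumer's existing partitions in place. The function name `rebalance_after_crash`, the dictionary layout, and the least-loaded placement rule are all assumptions made for illustration; they are not taken from the paper or from the Kafka client API.

```python
from typing import Dict, List, Set


def rebalance_after_crash(
    assignment: Dict[str, List[int]], alive: Set[str]
) -> Dict[str, List[int]]:
    """Hypothetical manager-side step: reassign only orphaned partitions.

    `assignment` maps consumer id -> partitions it currently owns.
    `alive` is the set of consumers the monitor still considers healthy.
    Surviving consumers keep every partition they already own.
    """
    if not alive:
        raise ValueError("no live consumers to take over partitions")

    # Copy the survivors' assignments unchanged (no global reshuffle).
    new_assignment = {c: list(p) for c, p in assignment.items() if c in alive}
    for consumer in alive:
        new_assignment.setdefault(consumer, [])

    # Partitions owned by consumers that are no longer alive.
    orphaned = sorted(
        p for c, parts in assignment.items() if c not in alive for p in parts
    )

    # Assumed placement rule: give each orphan to the least-loaded survivor,
    # breaking ties by consumer id so the result is deterministic.
    for partition in orphaned:
        target = min(sorted(new_assignment), key=lambda c: len(new_assignment[c]))
        new_assignment[target].append(partition)

    return new_assignment


if __name__ == "__main__":
    before = {"c1": [0, 1], "c2": [2, 3], "c3": [4, 5]}
    print(rebalance_after_crash(before, alive={"c1", "c3"}))
```

In this sketch, the surviving consumers never recompute anything themselves; they would simply receive the extra partitions chosen by the manager, which is the property the abstract emphasizes.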