Journal: Электронное моделирование: Науч.-теорет. журн.

Real-time Method of Accurate Unique IPs Counting Across High Number of Distinct Dimensions and Distinct Time Frames for Big Data Systems



Abstract

The article describes a method for counting unique IP addresses among 10 billion system events per day across a large number of distinct dimensions (tuples). Approaches based on raw logs and on probabilistic counting both gave unsatisfactory results. The proposed method avoids the excessive resource usage (RAM, CPU and persistent storage) seen with the raw-log and probabilistic methods, and it also avoids the high statistical error at low cardinality seen with the probabilistic method. The main idea is to count unique IP addresses per distinct tuple in real time, using RAM to process short data intervals and then flushing the results to persistent storage. Merge algorithms then process the 5-minute, hourly, daily, weekly and monthly interval files and store the unique IP counts in an ordinary database.
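The abstract does not specify data structures or file formats, so the following is only a minimal sketch of the general idea under stated assumptions: an in-memory set per dimension tuple for the short interval, a flush that emits sorted IP lists (standing in for interval files), and a sorted-run merge that deduplicates across intervals. The names `IntervalCounter`, `merge_snapshots` and `unique_counts` are hypothetical and are not taken from the article.

```python
import heapq
from collections import defaultdict
from ipaddress import ip_address


class IntervalCounter:
    """Collects unique IPs per dimension tuple for one short interval in RAM."""

    def __init__(self):
        # dimension tuple -> set of IPs stored as integers (assumed layout)
        self._ips = defaultdict(set)

    def add(self, dims, ip):
        """Record one event; duplicates within the interval are absorbed by the set."""
        self._ips[dims].add(int(ip_address(ip)))

    def flush(self):
        """Return {dims: sorted list of IPs} and reset the in-memory state.

        A real system would write this snapshot to persistent storage;
        here it is simply returned to keep the sketch self-contained.
        """
        snapshot = {dims: sorted(ips) for dims, ips in self._ips.items()}
        self._ips.clear()
        return snapshot


def merge_snapshots(snapshots):
    """Merge sorted per-tuple IP lists from several intervals, dropping duplicates."""
    runs_by_dims = defaultdict(list)
    for snapshot in snapshots:
        for dims, ips in snapshot.items():
            runs_by_dims[dims].append(ips)

    merged = {}
    for dims, runs in runs_by_dims.items():
        unique = []
        # heapq.merge streams the already-sorted runs in global sorted order,
        # so equal IPs arrive next to each other and are easy to skip.
        for ip in heapq.merge(*runs):
            if not unique or unique[-1] != ip:
                unique.append(ip)
        merged[dims] = unique
    return merged


def unique_counts(snapshot):
    """Exact unique-IP count per dimension tuple for one snapshot."""
    return {dims: len(ips) for dims, ips in snapshot.items()}
```

In this sketch, merging the 5-minute snapshots of one hour gives the hourly snapshot, and the same merge applied to hourly snapshots gives the daily one, and so on. The counts stay exact at every level because the merged output keeps the deduplicated IPs themselves and not just their totals.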
