Sampling Method in Traffic Logs Analyzing

Abstract

In this paper, we aim to quantify the degradation and bias that sampling introduces relative to non-sampled traffic data, taking into account different sampling rates, sampling policies, sampling populations, and analysis tasks. First, we analyze the impact of sampling on the summation task. Second, we apply sampling to the task of aggregation by a particular dimension. We find that the relative errors of different keys in the aggregation differ greatly, which severely limits the data compression that sampling can achieve when the application imposes a strict limit on the maximum relative error. We therefore implement a novel reservoir sampling policy based on our application and further optimize it by combining a static sampling policy with the reservoir sampling policy. The results demonstrate that the proposed method can effectively control the maximum relative error while maintaining a data compression rate comparable to that of existing static sampling methods. Finally, we analyze the user loss rate as a function of the sampling step under the system sampling policy. By closely inspecting the number of logs for each user, we identify why large user loss rates occur. Our results can provide a useful reference for quick approximate analysis using sampling in the area of traffic log analysis.
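
The quantities named in the abstract (per-key relative error, user loss rate, reservoir sampling) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not describe the proposed policy, so the block shows only textbook building blocks. It assumes that "system sampling" means keeping every step-th record and that each log record is represented by its user key alone. All function names and the synthetic data are hypothetical.

```python
import random
from collections import Counter


def systematic_sample(logs, step, offset=0):
    """Keep every `step`-th record, starting at index `offset` (assumed policy)."""
    return logs[offset::step]


def estimate_counts(sample, step):
    """Scale per-key sample counts by the step to estimate the true counts."""
    return {key: n * step for key, n in Counter(sample).items()}


def relative_errors(true_counts, est_counts):
    """Per-key relative error; a key missing from the sample gets error 1.0."""
    return {k: abs(est_counts.get(k, 0) - t) / t for k, t in true_counts.items()}


def user_loss_rate(logs, sample):
    """Fraction of distinct users that have no record left in the sample."""
    return 1 - len(set(sample)) / len(set(logs))


def reservoir_sample(stream, k, rng):
    """Classic Algorithm R: uniform sample of k items from a one-pass stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic logs: one heavy user plus 100 users with a single record each.
    logs = ["heavy"] * 900 + [f"u{i}" for i in range(100)]
    rng.shuffle(logs)
    step = 10
    sample = systematic_sample(logs, step)
    errors = relative_errors(Counter(logs), estimate_counts(sample, step))
    print("estimated total:", len(sample) * step, "true total:", len(logs))
    print("max relative error:", max(errors.values()))
    print("user loss rate:", user_loss_rate(logs, sample))
```

On skewed data like this, the scaled total can stay close to the true total while individual light users are either dropped entirely or overestimated by a factor of the step. That is consistent with the abstract's observation that per-key relative errors differ greatly and that inspecting the number of logs per user helps explain user loss.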
