Conference: IEEE Conference on High Performance Extreme Computing

Using a Power Law distribution to describe big data


Abstract

The gap between the rate of data production and users' ability to access, compute on, and produce meaningful results from that data calls for tools that address the challenges of big data volume, velocity, and variety. One key hurdle is the inability to methodically remove expected or uninteresting elements from large data sets, which often wastes valuable researcher and computational time on uninteresting parts of the data. Social sensors, that is, sensors that produce data based on human activity, such as Wikipedia, Twitter, and Facebook, have an underlying structure that can be thought of as following a power law distribution. Such a distribution implies that a few nodes generate large amounts of data. In this article, we propose a technique that takes an arbitrary dataset and computes a power-law-distributed background model whose parameters are based on observed statistics. This model can be used to determine whether a power law is a suitable description of the data or to automatically identify high-degree nodes for filtering, and it can be scaled to work with big data.
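The abstract does not spell out how the background model is computed, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes node degrees are already available, swaps in the standard continuous-approximation maximum likelihood estimator for the exponent of p(d) ∝ d^(-alpha), and uses hypothetical function names (`fit_power_law_exponent`, `expected_counts`, `high_degree_nodes`) and made-up toy data.

```python
import math


def fit_power_law_exponent(degrees, d_min=1):
    """Estimate alpha for p(d) ~ d**(-alpha), d >= d_min.

    Assumption: uses the continuous-approximation maximum likelihood
    estimator, alpha = 1 + n / sum(ln(d / d_min)), which is only a rough
    fit for small integer degrees.
    """
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all degrees equal d_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def expected_counts(alpha, n_nodes, d_max):
    """Expected number of nodes at each degree 1..d_max under the model."""
    weights = [d ** (-alpha) for d in range(1, d_max + 1)]
    total = sum(weights)
    return {d: n_nodes * w / total for d, w in enumerate(weights, start=1)}


def high_degree_nodes(degree_by_node, threshold):
    """Names of nodes whose degree is at least `threshold` (caller-chosen)."""
    return sorted(n for n, d in degree_by_node.items() if d >= threshold)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    degree_by_node = {"a": 1, "b": 1, "c": 1, "d": 1,
                      "e": 2, "f": 2, "g": 3, "hub": 40}
    degrees = list(degree_by_node.values())

    alpha = fit_power_law_exponent(degrees)
    model = expected_counts(alpha, n_nodes=len(degrees), d_max=max(degrees))

    print("estimated alpha:", round(alpha, 3))
    print("expected nodes with degree 1:", round(model[1], 3))
    print("candidates for filtering:", high_degree_nodes(degree_by_node, 10))
```

Comparing `expected_counts` against the observed degree histogram gives a rough sense of whether a power law is a plausible background for the data; the filtering threshold of 10 here is an arbitrary choice for the toy example.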
