Using a Power Law distribution to describe big data

Abstract

The gap between data production and users' ability to access, compute on, and produce meaningful results from data calls for tools that address the challenges of big data volume, velocity, and variety. One key hurdle is the inability to methodically remove expected or uninteresting elements from large data sets; this difficulty often wastes valuable researcher and computational time by spending resources on uninteresting parts of the data. Social sensors, that is, sensors that produce data based on human activity, such as Wikipedia, Twitter, and Facebook, have an underlying structure that can be thought of as following a power law distribution. Such a distribution implies that a small number of nodes generate a large share of the data. In this article, we propose a technique that takes an arbitrary dataset and computes a power-law-distributed background model whose parameters are based on observed statistics. This model can be used to assess whether a power law is a suitable description of the data, or to automatically identify high-degree nodes for filtering, and it can be scaled to work with big data.
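
The abstract does not give the model's formulas, so the following Python sketch is only an illustration under stated assumptions, not the authors' procedure. It assumes a discrete power law n(d) = n1 * d^(-alpha) over node out-degrees, estimates alpha from two observed endpoints (the number of degree-1 nodes, n1, and the maximum degree, d_max, assumed to hold exactly one node), uses a crude mean log-ratio as a suitability check, and derives a filtering cutoff from the model. All function names, the misfit measure, and the `max_expected_hubs` rule are hypothetical choices.

```python
import math
from collections import Counter


def out_degrees(edges):
    """Out-degree of every source node in a list of (src, dst) pairs."""
    return Counter(src for src, _dst in edges)


def fit_endpoint_power_law(deg):
    """Two-point estimate of alpha for the assumed model n(d) = n1 * d**(-alpha).

    Assumption (hypothetical): the curve passes through (1, n1) and (d_max, 1),
    so n1 * d_max**(-alpha) = 1, giving alpha = ln(n1) / ln(d_max).
    """
    hist = Counter(deg.values())  # degree -> number of nodes with that degree
    if not hist:
        raise ValueError("empty graph")
    n1 = hist.get(1, 0)
    d_max = max(hist)
    if n1 < 2 or d_max < 2:
        raise ValueError("need at least two degree-1 nodes and d_max >= 2")
    alpha = math.log(n1) / math.log(d_max)
    return n1, alpha, d_max, hist


def model_count(d, n1, alpha):
    """Background-model node count expected at degree d."""
    return n1 * d ** (-alpha)


def log_misfit(hist, n1, alpha):
    """Mean |ln(observed / model)| over degrees that actually occur.

    0 means the observed histogram lies exactly on the model curve;
    degrees with zero observed nodes are ignored (a simplification).
    """
    errs = [abs(math.log(c / model_count(d, n1, alpha))) for d, c in hist.items()]
    return sum(errs) / len(errs)


def degree_cutoff(n1, alpha, d_max, max_expected_hubs):
    """Smallest degree d_cut such that the model expects at most
    `max_expected_hubs` nodes with degree >= d_cut (hypothetical rule).
    Returns d_max + 1 (flag nothing) if even d_max exceeds the budget."""
    total = 0.0
    d_cut = d_max + 1
    for d in range(d_max, 0, -1):
        total += model_count(d, n1, alpha)
        if total > max_expected_hubs:
            break
        d_cut = d
    return d_cut


def high_degree_nodes(deg, d_cut):
    """Nodes whose observed degree is at least d_cut (candidates for filtering)."""
    return sorted(node for node, d in deg.items() if d >= d_cut)


# Toy graph built to lie exactly on n(d) = 16 * d**(-2).
edges = [("hub", f"x{i}") for i in range(4)]  # 1 node of degree 4
edges += [(f"m{j}", f"y{j}{k}") for j in range(4) for k in range(2)]  # 4 nodes of degree 2
edges += [(f"s{i}", "z") for i in range(16)]  # 16 nodes of degree 1

deg = out_degrees(edges)
n1, alpha, d_max, hist = fit_endpoint_power_law(deg)
print(round(alpha, 6))  # 2.0
print(round(log_misfit(hist, n1, alpha), 6))  # 0.0
d_cut = degree_cutoff(n1, alpha, d_max, max_expected_hubs=1.5)
print(d_cut, high_degree_nodes(deg, d_cut))  # 4 ['hub']
```

Because the toy graph was constructed to sit on the curve, the misfit is zero; on real data, a large misfit would suggest that a power law is a poor description. The two-point estimate is cheap but sensitive to noise in n1 and d_max, so a maximum-likelihood fit of alpha would be a more robust alternative under the same assumed model.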

Bibliographic details

Source
IEEE Conference on High Performance Extreme Computing | 2015 | pp. 1-5 | 5 pages
Authors
Vijay Gadepally; Jeremy Kepner
Original format: PDF
Keywords
Big Data; Facebook; Twitter; Wikipedia; data production; power law distribution; social sensors; Data models; Distributed databases; Matrix converters; Media; Signal processing; Power Law

Date added to database: 2022-08-26 15:00:51

Similar literature

1. Identification and Interpretation of Power-Law Distributions by Spectral Data of Remote Sensing [J]. Mikhail V. Artiushenko. Journal of Automation and Information Sciences, 2018, no. 12.

2. Are the discretised lognormal and hooked power law distributions plausible for citation data? [J]. Mike Thelwall. Journal of Informetrics, 2016, no. 2.

3. The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression [J]. Mike Thelwall. Journal of Informetrics, 2016, no. 2.

4. Power-Law Distribution of Long-Term Experimental Data in Swarm Robotics [C]. Farshad Arvin, Abdolrahman Attar, Ali Emre Turgut. International Conference on Swarm Intelligence; BRICS Congress on Computational Intelligence, 2015.

5. An Application of Power-Law Distributions to the Tail of Flood Frequency Data: A Search for a Physical Connection in Flood Frequency Statistics [D]. Lindsay Otto. 2020.

6. Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data [O]. Akinori Awazu, Takahiro Tanabe, Mari Kamitani.

7. Figure 5: Fitting power law distribution and normal distribution to the specificity of CST-III: the power law distribution (green curve) succeeded, while the normal distribution failed [O].

8. HMI Data Driven Magnetohydrodynamic Model Predicted Active Region Photospheric Heating Rates: Their Scale Invariant, Flare Like Power Law Distributions, and Their Possible Association With Flares [R]. M. L. Goodman, C. Kwan, B. Ayhan. 2017.
