An Information Divergence Estimation over Data Streams


Abstract

In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we proposed in prior work AnKLe, a one-pass algorithm for estimating the Kullback-Leibler divergence of an observed stream compared to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe is accurate for different adversarial settings for which the quality of other methods dramatically decreases. In the present paper, considering n as the number of distinct data items in a stream, we show that AnKLe is an (ε, δ)-approximation algorithm with a space complexity of Õ(1/ε + 1/ε²) bits in "most" cases, and Õ(1/ε + (n − ε⁻¹)/ε²) otherwise. To the best of our knowledge, an approximation algorithm for estimating the Kullback-Leibler divergence has never been analyzed before.

Yann Busnel, LINA / Université de Nantes, Nantes, France (Yann.Busnel@univ-nantes.fr)

[...] a huge amount of data with limited resources, both in space and time. AnKLe detects changes in the observed stream with respect to an expected behavior by relying on sampling techniques and information-theoretic methods. The metric used is the Kullback-Leibler (KL) divergence, which can be viewed as an extension of the Shannon entropy and is often referred to as the relative entropy [3]. In this paper, we analyze the quality of AnKLe in approximating the KL divergence between the expected stream and the observed one. An algorithm A is said to be an (ε, δ)-approximation of a function φ on σ if, for any sequence of items in the input stream σ, A outputs φ̂ such that P{|φ̂ − φ| > εφ} < δ, where ε, δ > 0 are given as parameters of the algorithm.
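As a point of reference for the quantity being estimated, the following is a minimal sketch of the exact KL divergence between an observed stream and an expected distribution, together with the relative-error condition used in the (ε, δ)-approximation definition. It is not AnKLe: it stores one counter per distinct item, which is the linear-in-n memory cost a one-pass approximation algorithm is meant to avoid. The function names, the sample stream, the uniform expected distribution, and the direction D(observed || expected) are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def empirical_distribution(stream):
    """Exact item frequencies of a stream (needs one counter per distinct item)."""
    counts = Counter(stream)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def kl_divergence(p, q):
    """D(p || q) in bits; terms with p_i = 0 contribute nothing."""
    divergence = 0.0
    for item, p_i in p.items():
        if p_i == 0:
            continue
        q_i = q.get(item, 0.0)
        if q_i == 0:
            # p puts mass where q has none: the divergence is unbounded.
            return math.inf
        divergence += p_i * math.log2(p_i / q_i)
    return divergence


def within_relative_error(estimate, exact, eps):
    """True when |estimate - exact| <= eps * exact, i.e. the 'good' event
    whose complement must have probability below delta."""
    return abs(estimate - exact) <= eps * exact


# Hypothetical data: a uniform expected distribution over four items and a
# short observed stream that is skewed toward item "a".
expected = {item: 0.25 for item in "abcd"}
observed_stream = list("aaaaaaabbbcd")
observed = empirical_distribution(observed_stream)
exact_divergence = kl_divergence(observed, expected)
```

An estimator's output can be passed to `within_relative_error` together with `exact_divergence` and a chosen ε to check a single run; the δ parameter bounds how often that check may fail over the algorithm's random choices.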

Bibliographic record

Source
2012 IEEE 11th International Symposium on Network Computing and Applications | 2012 | pp. 28-35 | 8 pages
Conference location: Cambridge, MA (US)
Authors
Anceaume, Emmanuelle; Busnel, Yann
Original format: PDF
Language: English
Chinese Library Classification: Computer networks

Similar documents


1. A Distributed Information Divergence Estimation over Data Streams [J]. Anceaume Emmanuelle, Busnel Yann. IEEE Transactions on Parallel and Distributed Systems. 2014, no. 2
2. Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation [J]. James P. McDermott, G. Jogesh Babu, John C. Liechty. Statistics and Computing. 2007, no. 4
3. Controlling View Divergence of Data Freshness in a Replicated Database System Using Statistical Update Delay Estimation [J]. Takao Yamashita, Satoshi Ono. IEICE Transactions on Information and Systems. 2005, no. 4
4. An Information Divergence Estimation over Data Streams [C]. Anceaume Emmanuelle, Busnel Yann. IEEE International Symposium on Network Computing and Applications. 2012
5. Screening-Based Bregman Divergence Estimation and the Application to Spike Train Data Analysis [D]. Chai, Yi. 2014
6. FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams [O]. Namuk Park, Songkuk Kim. 2021
7. A Distributed Information Divergence Estimation over Data Streams [O]. Anceaume, Emmanuelle, Busnel, Yann. 2014
8. Vector Splines on the Sphere with Application to the Estimation of Vorticity and Divergence from Discrete, Noisy Data [R]. Wahba, G. 1982
