
An Information Divergence Estimation over Data Streams



Abstract

In this paper, we consider the setting of large scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we have proposed in a prior work AnKLe, a one-pass algorithm for estimating the Kullback-Leibler divergence of an observed stream compared to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe is accurate for different adversarial settings for which the quality of other methods dramatically decreases. In the present paper, considering $n$ as the number of distinct data items in a stream, we show that AnKLe is an $(\varepsilon, \delta)$-approximation algorithm with a space complexity of $\tilde{O}\left(\frac{1}{\varepsilon} + \frac{1}{\varepsilon^2}\right)$ bits in "most" cases, and $\tilde{O}\left(\frac{1}{\varepsilon} + \frac{n - \varepsilon^{-1}}{\varepsilon^2}\right)$ otherwise. To the best of our knowledge, an approximation algorithm for estimating the Kullback-Leibler divergence has never been analyzed before.

Yann Busnel, LINA / Université de Nantes, Nantes, France, Yann.Busnel@univ-nantes.fr

[...] a huge amount of data with limited resources, both in space and time. AnKLe detects changes in the observed stream with respect to an expected behavior by relying on sampling techniques and information-theoretic methods. The metric used is the Kullback-Leibler (KL) divergence, which can be viewed as an extension of the Shannon entropy and is often referred to as the relative entropy [3]. In this paper, we analyze the quality of AnKLe in approximating the KL divergence between the expected stream and the observed one. An algorithm $A$ is said to be an $(\varepsilon, \delta)$-approximation of a function $\phi$ on $\sigma$ if, for any sequence of items in the input stream $\sigma$, $A$ outputs $\hat{\phi}$ such that $P\{|\hat{\phi} - \phi| > \varepsilon\phi\} < \delta$, where $\varepsilon, \delta > 0$ are given as parameters of the algorithm.
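To make the quantities in the abstract concrete, the following minimal sketch computes the exact KL divergence between the empirical distribution of an observed stream and an expected distribution, and checks the relative-error condition from the (ε, δ)-approximation definition for a single estimate. It is not the AnKLe algorithm: it stores one counter per distinct item, which is precisely the memory cost a streaming estimator is meant to avoid. The function names, the direction D(observed || expected), and the use of base-2 logarithms are assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(observed_stream, expected):
    """Exact D(q || p) in bits (base-2 log is an assumption).

    q is the empirical distribution of `observed_stream`;
    p is `expected`, a dict mapping each item to its expected probability.
    Exact, non-streaming baseline: keeps one counter per distinct item.
    """
    counts = Counter(observed_stream)
    m = sum(counts.values())
    if m == 0:
        raise ValueError("observed stream is empty")
    total = 0.0
    for item, c in counts.items():
        p = expected.get(item, 0.0)
        if p == 0.0:
            # An observed item that the expected distribution rules out
            # makes the divergence infinite.
            return math.inf
        q = c / m
        total += q * math.log2(q / p)
    return total


def within_relative_error(estimate, exact, eps):
    """True if |estimate - exact| <= eps * exact.

    An (eps, delta)-approximation requires this to fail with
    probability less than delta over the algorithm's randomness.
    """
    return abs(estimate - exact) <= eps * exact


if __name__ == "__main__":
    expected = {"a": 0.5, "b": 0.5}      # hypothetical expected behavior
    skewed = ["a", "a", "a", "b"]        # hypothetical tampered stream
    exact = kl_divergence(skewed, expected)
    print(exact, within_relative_error(exact * 1.05, exact, eps=0.1))
```

A stream whose empirical frequencies match the expected distribution yields a divergence of zero, and the value grows as the observed frequencies are skewed away from it, which is what makes the divergence usable as a measure of tampering.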
