IEEE Transactions on Parallel and Distributed Systems

A Distributed Information Divergence Estimation over Data Streams



Abstract

In this paper, we consider the setting of large-scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we propose a novel algorithm, AnKLe, for estimating the Kullback-Leibler divergence of an observed stream compared with the expected one. AnKLe combines sampling techniques and information-theoretic methods. It is very efficient in terms of both space and time complexity, and requires only a single pass over the data stream. We show that AnKLe is an $(\varepsilon,\delta)$-approximation algorithm with a space complexity of $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon} + \frac{1}{\varepsilon^2}\right)$ bits in "most" cases, and $\tilde{\mathcal{O}}\left(\frac{1}{\varepsilon} + \frac{n-\varepsilon^{-1}}{\varepsilon^2}\right)$ otherwise, where $n$ is the number of distinct data items in a stream. Moreover, we propose a distributed version of AnKLe that requires at most $\mathcal{O}(r\ell(\log n + 1))$ bits of communication between the $\ell$ participating nodes, where $r$ is the number of rounds of the algorithm. Experimental results show that the estimation provided by AnKLe remains accurate even for different adversarial settings for which the quality of other methods dramatically decreases.
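For orientation, the quantity being estimated can be computed exactly when memory is not a constraint. The sketch below is not AnKLe: it stores one counter per distinct item (space linear in $n$), whereas the abstract describes a sampling-based single-pass estimator with much smaller space. The function name `kl_divergence`, the choice of direction (observed relative to expected), the base-2 logarithm, and the uniform expected distribution in the usage lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(observed_stream, expected):
    """Exact Kullback-Leibler divergence D(p || q) in bits.

    p is the empirical distribution of `observed_stream`;
    q is `expected`, a dict mapping each item to its expected probability.
    Assumption: the divergence is taken from observed to expected.
    """
    counts = Counter(observed_stream)   # one counter per distinct item: O(n) space
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the observed stream is empty")

    divergence = 0.0
    for item, count in counts.items():
        p = count / total
        q = expected.get(item, 0.0)
        if q == 0.0:
            # An observed item that the expected distribution rules out
            # makes the divergence infinite.
            return math.inf
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical usage: the expected stream is uniform over four items.
uniform = {item: 0.25 for item in "abcd"}
balanced = list("abcd") * 25    # matches the expected distribution
flooded = ["a"] * 100           # an adversary repeats a single item
print(kl_divergence(balanced, uniform))
print(kl_divergence(flooded, uniform))
```

A stream that matches the expected distribution gives a divergence of zero, and the value grows as the stream is skewed toward particular items, which is why the divergence can serve as a measure of how much the adversary has altered the stream.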
