In this paper, we consider the setting of large scale distributed systems, in which each node needs to quickly process a huge amount of data received in the form of a stream that may have been tampered with by an adversary. In this situation, a fundamental problem is how to detect and quantify the amount of work performed by the adversary. To address this issue, we have proposed in a prior work, AnKLe, a one pass algorithm for estimating the Kullback-Leibler divergence of an observed stream compared to the expected one. Experimental evaluations have shown that the estimation provided by AnKLe is accurate for different adversarial settings for which the quality of other methods dramatically decreases. In the present paper, considering n as the number of distinct data items in a stream, we show that AnKLe is an (ε, δ)-approximation Yann Busnel LINA / Universite Ì de Nantes Nantes, France Yann. Busnel@univ-nantes. fr a huge amount of data with limited resources, both in space and timeâ" AnKLe detects changes in the observed stream with respect to an expected behavior by relying on sampling techniques and information-theoretic methods. The metric used is the Kullback-Leibler (KL) divergence, which can be viewed as an extension of the Shannon entropy and is often referred to as the relative entropy [3]. In this paper, we analyze the quality of AnKLe in approx-imating the KL divergence between the expected stream and the observed one. An algorithm A is said to be an (ε, δ)-approximation of a function Ï on Ï if for any sequence algorithm with a space complexity O Ì(1 + 1 ) bits in âmostâ ε ε2 ofitemsintheinputstreamÏ, AoutputsÏËsuchthat Ë cases, and O Ì(1 + nâ''&- mp;#x00CE;µâ''1 ) otherwise. To the best of our ε ε2 P{|Ïâ''Ï|>εÏ}0aregivenas parameters of the algorithm. knowledge, an approximation algorithm for estimating the Kullback-Leibler divergence has never been analyzed before.
展开▼