首页> 外文会议>2013 Fifth International Conference on Communication Systems and Networks. >Importance-aware Bloom Filter for managing set membership queries on streaming data
【24h】

Importance-aware Bloom Filter for managing set membership queries on streaming data

机译:具有重要性的Bloom Bloom过滤器,用于管理流数据上的集合成员资格查询

获取原文
获取原文并翻译 | 示例

摘要

In this work1, we consider a set of networking applications which generate or process a continuous stream of data items, for example, a web-cache which processes a stream of web-objects. These applications often require to answer membership queries for duplicate detection on an unbounded set of data items. Two key challenges to answer such membership queries are the limited space to store the entire stream and the different importance-levels associated with different data items. For instance, a web-caches are of finite sizes and the cost of fetching objects into the cache is proportional to the size of objects. Motivated by such examples, our work focuses on developing a time and space efficient indexing and membership query scheme which takes into account importance levels of objects in a data stream. We propose Importance-aware Bloom Filter (IBF) which provides a set of insertion and deletion algorithms to handle membership queries on a data stream. Our evaluation of IBF for a synthetic as well as a real data set of a stream of Youtube videos, shows that IBF has close to 0% error in answering membership queries on important data items, and it results in 4-fold better performance in comparison to other importance-agnostic Bloom filter-based schemes. Importantly, we also find properties of IBF analytically via a Markov model based analysis. Thus, we believe, IBF provides a practical framework to balance the application-specific requirements to index and query data streams based on the data semantics.
机译:在本工作 1 中,我们考虑了一组联网应用程序,这些应用程序生成或处理连续的数据项流,例如,处理Web对象流的Web缓存。这些应用程序通常需要回答成员资格查询,以便对无限制的一组数据项进行重复检测。回答此类成员资格查询的两个关键挑战是存储整个流的空间有限以及与不同数据项相关联的不同重要性级别。例如,网络缓存的大小是有限的,将对象提取到缓存中的成本与对象的大小成正比。受此类示例的启发,我们的工作重点是开发一种时空高效的索引和成员资格查询方案,该方案考虑了数据流中对象的重要性级别。我们提出了重要性感知布隆过滤器(IBF),它提供了一组插入和删除算法来处理数据流上的成员身份查询。我们对Youtube视频流的合成数据集和真实数据集的IBF评估显示,IBF在回答重要数据项的成员资格查询时错误率接近0%,与之相比,其性能提高了4倍到其他与重要性无关的Bloom Bloom方案。重要的是,我们还可以通过基于马尔可夫模型的分析来找到IBF的属性。因此,我们相信,IBF提供了一个实用的框架来平衡特定于应用程序的需求,以基于数据语义对索引和查询数据流进行索引。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号