Published in: Proceedings of the 5th ACM International Conference on Distributed Event-Based Systems

Complex Pattern Ranking (CPR): Evaluating Top-k Pattern Queries over Event Streams


Abstract

Most existing approaches to complex event processing over streaming data assume that matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, however, such as monitoring users' credit card purchase patterns, matches to the user queries are in fact plentiful, and the system has to sift efficiently through these many matches to locate only the few most preferable ones. In this paper, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The algorithms we develop identify the top-k matching results that satisfy both the patterns and additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain top-k results dynamically as new events arrive and old ones expire. We also develop new top-k join execution strategies that adapt to changing conditions (e.g., sorted and random access costs, join rates) without having to assume a priori the presence of distributed stream statistics. Experiments show significant improvements over existing approaches.
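As a rough illustration of the idea of maintaining top-k results as events arrive and expire, rather than recomputing them for each window, the following minimal Python sketch keeps scored matches in a time-ordered queue and a score-ordered list. It is not the paper's algorithm: the class name `SlidingTopK`, its parameters, and the assumptions that each match already carries a numeric score and that timestamps arrive in non-decreasing order are all hypothetical choices made for this example.

```python
import bisect
from collections import deque


class SlidingTopK:
    """Illustrative sketch only: keep the k best-scoring matches in a time window."""

    def __init__(self, k, window):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.window = window
        self._by_time = deque()  # (timestamp, score, match_id), arrival order
        self._by_score = []      # (score, timestamp, match_id), ascending

    def insert(self, timestamp, score, match_id):
        """Add a new match; assumes timestamps are non-decreasing."""
        self._expire(timestamp)
        self._by_time.append((timestamp, score, match_id))
        bisect.insort(self._by_score, (score, timestamp, match_id))

    def _expire(self, now):
        """Drop matches whose timestamp is at or before now - window."""
        while self._by_time and self._by_time[0][0] <= now - self.window:
            ts, score, match_id = self._by_time.popleft()
            idx = bisect.bisect_left(self._by_score, (score, ts, match_id))
            del self._by_score[idx]

    def top_k(self):
        """Return up to k (match_id, score) pairs, best score first."""
        return [(m, s) for s, _, m in reversed(self._by_score[-self.k:])]


tk = SlidingTopK(k=2, window=10)
tk.insert(1, 5.0, "a")
tk.insert(2, 9.0, "b")
tk.insert(3, 7.0, "c")
before = tk.top_k()      # "b" and "c" are the two best matches so far
tk.insert(12, 1.0, "d")  # matches with timestamp <= 2 expire here
after = tk.top_k()       # only "c" and "d" remain in the window
```

Each arrival or expiration updates the ranked list in place, so a query never rescans the whole window. The list-based deletion is linear in the window size; a balanced tree or heap with lazy deletion would be a natural substitute in a real system.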
