IEEE Transactions on Knowledge and Data Engineering

Filtering Data Streams for Entity-Based Continuous Queries

Abstract

The idea of allowing query users to relax their correctness requirements in order to improve the performance of a data stream management system (e.g., in location-based services and sensor networks) has been studied recently. By exploiting the maximum error (or tolerance) allowed in query answers, researchers have developed algorithms that reduce the use of system resources. In most of these works, however, query tolerance is expressed as a numerical value, which may be difficult to specify. We observe that in many situations, users may not be concerned with the actual value of an answer, but rather with which object satisfies a query (e.g., "Who is my nearest neighbor?"). In particular, an entity-based query returns only the names of the objects that satisfy the query. For these queries, it is possible to specify a tolerance that is "nonvalue-based." In this paper, we study fraction-based tolerance, a type of nonvalue-based tolerance in which a user specifies the maximum fractions of a query answer that may be false positives and false negatives. We develop fraction-based tolerance for two major classes of entity-based queries: 1) nonrank-based queries (e.g., range queries) and 2) rank-based queries (e.g., k-nearest-neighbor queries). These definitions give users an alternative way to specify the maximum tolerance allowed in their answers. We further investigate how these definitions can be exploited in a distributed stream environment. We design adaptive filter algorithms that allow updates to be dropped conditionally at the data stream sources without affecting the overall query correctness. Extensive experimental results show that our protocols significantly reduce the use of network and energy resources.
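As a minimal sketch, the following Python checks whether the answer to a nonrank-based (range) query meets a fraction-based tolerance. It assumes that the false-positive fraction is measured against the returned answer and the false-negative fraction against the exact answer. The abstract does not give the formal definitions, so these are assumptions. The names `within_fraction_tolerance`, `RangeFilter`, `eps_fp`, and `eps_fn` are hypothetical. The filter shown covers only the simplest zero-tolerance case: a source reports an update only when its object enters or leaves the query range. The paper's adaptive filters, which exploit a nonzero tolerance to drop further updates, are not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable


def within_fraction_tolerance(answer: Iterable[str], truth: Iterable[str],
                              eps_fp: float, eps_fn: float) -> bool:
    """Return True if `answer` meets a fraction-based tolerance.

    Assumed definitions (hypothetical, not the paper's formal ones):
      false-positive fraction = |answer - truth| / |answer|
      false-negative fraction = |truth - answer| / |truth|
    An empty answer has no false positives; an empty truth set has no false negatives.
    """
    answer, truth = set(answer), set(truth)
    fp = len(answer - truth) / len(answer) if answer else 0.0
    fn = len(truth - answer) / len(truth) if truth else 0.0
    return fp <= eps_fp and fn <= eps_fn


@dataclass
class RangeFilter:
    """Hypothetical source-side filter for a range query [lo, hi].

    Zero-tolerance case only: an update that does not change the object's
    membership in the range cannot change the set of names in an
    entity-based range-query answer, so it can be dropped at the source.
    """
    lo: float
    hi: float
    inside: bool = False  # membership last reported to the server (assumed initially outside)

    def should_send(self, value: float) -> bool:
        now_inside = self.lo <= value <= self.hi
        if now_inside != self.inside:
            self.inside = now_inside  # the server's view changes, so report the update
            return True
        return False  # membership unchanged: drop the update at the source


if __name__ == "__main__":
    truth = {"a", "b", "c", "d"}   # exact answer
    answer = {"a", "b", "c", "e"}  # answer maintained from filtered updates
    print(within_fraction_tolerance(answer, truth, 0.25, 0.25))  # True
    print(within_fraction_tolerance(answer, truth, 0.2, 0.25))   # False

    f = RangeFilter(lo=0.0, hi=10.0)
    print([f.should_send(v) for v in [5.0, 7.0, 12.0, 15.0]])  # [True, False, True, False]
```

In this example, the answer {a, b, c, e} contains one false positive among four returned names. It also misses one of the four true names. It therefore satisfies `eps_fp = eps_fn = 0.25` but not `eps_fp = 0.2`.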