Efficient publish/subscribe processing over geo-textual stream

Abstract

With the prevalence of social media and GPS-enabled devices, massive amounts of geo-textual data are continuously generated as streams. In this thesis, we study the problem of efficiently processing streaming geo-textual data in publish/subscribe (pub/sub) systems, which have broad applications in location-based advertising and information dissemination. In a spatial-keyword pub/sub system, users register their interests as spatial-keyword subscriptions (e.g., interest in nearby restaurant discounts), and a stream of geo-textual messages (e.g., geo-tagged e-coupons) released by publishers is continuously delivered to the relevant subscriptions. We comprehensively study three important aspects of spatial-keyword pub/sub systems, as follows.

Firstly, we investigate Boolean spatial-keyword pub/sub, where a message is delivered to a subscription if it contains all of the subscription's keywords and falls inside the subscription's range. We handle both stationary and moving subscriptions by proposing a novel adaptive indexing structure, which significantly reduces the processing time of incoming messages.

Secondly, we study ranking-based spatial-keyword pub/sub, where we continuously maintain the top-k most relevant messages for every subscription over a sliding window. We propose a novel index that seamlessly integrates spatial and keyword pruning rules to support efficient message dissemination, and we further develop a cost-based re-evaluation technique to reduce the number of re-evaluations. This is the first work to investigate spatial-keyword pub/sub over a sliding window.

Finally, we investigate distributed stream processing, where a continuous data stream is processed in a distributed manner. We first study distributed stream similarity join over textual data.
We develop a novel length-based distribution framework that dispatches incoming data according to the number of tokens each record contains; it incurs no data replication, has low communication cost, and achieves high throughput. We also design a bundle-based local index that facilitates the local join by grouping similar objects. We then consider geo-textual data by extending ranking-based spatial-keyword pub/sub to a distributed environment, and develop efficient distribution mechanisms that achieve load balance and high throughput. This is the first work that systematically studies ranking-based spatial-keyword pub/sub in a distributed stream environment.
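The Boolean matching condition described in the abstract (deliver a message when it contains all subscription keywords and falls inside the subscription range) can be made concrete with a small filter-and-verify sketch. It assumes axis-aligned rectangular ranges and non-empty keyword sets; the names `Subscription`, `Message`, and `KeywordIndex` are hypothetical. The index is a simple baseline that registers each subscription under a single one of its keywords; it is not the adaptive indexing structure proposed in the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass(frozen=True)
class Subscription:
    sub_id: int
    keywords: FrozenSet[str]  # assumed non-empty
    region: Rect  # assumed axis-aligned rectangle


@dataclass(frozen=True)
class Message:
    terms: FrozenSet[str]
    location: Tuple[float, float]


def inside(point: Tuple[float, float], region: Rect) -> bool:
    x, y = point
    min_x, min_y, max_x, max_y = region
    return min_x <= x <= max_x and min_y <= y <= max_y


class KeywordIndex:
    """Baseline filter-and-verify index (hypothetical, not the thesis's index)."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Subscription]] = defaultdict(list)

    def register(self, sub: Subscription) -> None:
        # Index the subscription under one keyword only; any matching message
        # must contain every subscription keyword, so it contains this one too.
        self.postings[min(sub.keywords)].append(sub)

    def deliver(self, msg: Message) -> List[int]:
        hits = []
        for term in msg.terms:  # filter: probe only posting lists of message terms
            for sub in self.postings.get(term, []):
                # verify: all keywords present AND location inside the range
                if sub.keywords <= msg.terms and inside(msg.location, sub.region):
                    hits.append(sub.sub_id)
        return sorted(hits)


idx = KeywordIndex()
idx.register(Subscription(1, frozenset({"restaurant", "discount"}), (0, 0, 10, 10)))
idx.register(Subscription(2, frozenset({"coffee"}), (0, 0, 10, 10)))
print(idx.deliver(Message(frozenset({"restaurant", "discount", "pizza"}), (3.0, 4.0))))  # [1]
```

Registering each subscription under one keyword is safe because a matching message must contain every subscription keyword, including the chosen key. Picking a rare keyword as the key would keep posting lists shorter; this sketch simply takes the lexicographically smallest one.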
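To illustrate why re-evaluation matters in the ranking-based setting, here is a naive sketch that maintains per-subscription top-k results over a time-based sliding window. The scoring function (a weighted sum of normalised spatial proximity and keyword Jaccard similarity), the constants `ALPHA` and `MAX_DIST`, and all class names are assumptions for illustration; the abstract does not specify the thesis's ranking function, index, or cost model. Whenever a message in a subscription's current top-k expires, this sketch rescans the whole window for that subscription, which is the kind of re-evaluation the cost-based technique is described as reducing.

```python
import heapq
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, FrozenSet, List, Tuple

ALPHA = 0.5      # hypothetical weight of spatial vs. textual relevance
MAX_DIST = 10.0  # hypothetical distance at which spatial proximity drops to 0


@dataclass(frozen=True)
class Msg:
    msg_id: int
    ts: float
    terms: FrozenSet[str]
    loc: Tuple[float, float]


@dataclass
class TopKSub:
    keywords: FrozenSet[str]
    loc: Tuple[float, float]
    k: int
    results: List[Tuple[float, int]] = field(default_factory=list)  # (score, msg_id), best first


def score(sub: TopKSub, msg: Msg) -> float:
    # Hypothetical relevance: weighted spatial proximity + keyword Jaccard similarity.
    spatial = max(0.0, 1.0 - math.dist(sub.loc, msg.loc) / MAX_DIST)
    union = sub.keywords | msg.terms
    textual = len(sub.keywords & msg.terms) / len(union) if union else 0.0
    return ALPHA * spatial + (1 - ALPHA) * textual


class SlidingWindowTopK:
    def __init__(self, window: float, subs: List[TopKSub]) -> None:
        self.window = window
        self.subs = subs
        self.buffer: Deque[Msg] = deque()  # live messages, oldest first
        self.reevaluations = 0

    def _reevaluate(self, sub: TopKSub) -> None:
        # Full rescan of the window for one subscription (the expensive path).
        self.reevaluations += 1
        sub.results = heapq.nlargest(sub.k, ((score(sub, m), m.msg_id) for m in self.buffer))

    def publish(self, msg: Msg) -> None:
        # Expire messages that have left the window (ts <= msg.ts - window).
        expired = set()
        while self.buffer and self.buffer[0].ts <= msg.ts - self.window:
            expired.add(self.buffer.popleft().msg_id)
        self.buffer.append(msg)
        for sub in self.subs:
            if any(mid in expired for _, mid in sub.results):
                self._reevaluate(sub)  # the rescan already covers the new message
            else:
                # Unaffected top-k: just merge the new message's score.
                candidate = (score(sub, msg), msg.msg_id)
                sub.results = heapq.nlargest(sub.k, sub.results + [candidate])


subs = [TopKSub(frozenset({"pizza"}), (0.0, 0.0), k=1)]
engine = SlidingWindowTopK(window=10.0, subs=subs)
engine.publish(Msg(1, 0.0, frozenset({"pizza"}), (0.0, 0.0)))   # score 1.0
engine.publish(Msg(2, 5.0, frozenset({"sushi"}), (5.0, 0.0)))   # score 0.25
engine.publish(Msg(3, 12.0, frozenset({"sushi"}), (5.0, 0.0)))  # msg 1 expires
print(subs[0].results, engine.reevaluations)  # [(0.25, 3)] 1
```

Merging a new message into an unaffected top-k list is cheap because every other live message already scores no higher than the list's k-th entry; only the expiration of a current top-k member forces the full rescan.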
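The length-based dispatching idea can be illustrated with the standard length filter for Jaccard similarity: since |x ∩ y| ≤ min(|x|, |y|) and |x ∪ y| ≥ max(|x|, |y|), J(x, y) ≥ t implies t·|x| ≤ |y| ≤ |x|/t, so a record only needs to meet records whose token counts fall in that interval. The sketch below assumes Jaccard similarity with threshold `t`, non-empty token sets, and a hypothetical range partitioning of token counts across workers (`LengthPartitioner`, `StreamJoin`). Each record is stored at exactly one worker, the one owning its length, while its probe is routed to every worker whose length range overlaps the filter interval; this is one plausible reading of the framework, not the thesis's actual design.

```python
import bisect
import math
from typing import FrozenSet, List, Tuple


def length_bounds(length: int, t: float) -> Tuple[int, int]:
    # Jaccard length filter: J(x, y) >= t implies t*|x| <= |y| <= |x|/t.
    # The 1e-9 slack widens the interval slightly, which is safe for a filter
    # (it can only admit extra candidates, never drop a true match).
    return math.ceil(t * length - 1e-9), math.floor(length / t + 1e-9)


class LengthPartitioner:
    """Hypothetical range partitioning of token counts across workers."""

    def __init__(self, boundaries: List[int]) -> None:
        # Sorted cut points: [5, 10] gives worker 0 lengths < 5,
        # worker 1 lengths 5..9, and worker 2 lengths >= 10.
        self.boundaries = boundaries
        self.num_workers = len(boundaries) + 1

    def owner(self, length: int) -> int:
        return bisect.bisect_right(self.boundaries, length)

    def probe_targets(self, length: int, t: float) -> range:
        lo, hi = length_bounds(length, t)
        return range(self.owner(lo), self.owner(hi) + 1)


def jaccard(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    inter = len(x & y)
    return inter / (len(x) + len(y) - inter)


class StreamJoin:
    """Streaming self-join: each arriving record is matched against earlier ones."""

    def __init__(self, partitioner: LengthPartitioner, t: float) -> None:
        self.part = partitioner
        self.t = t
        # One local store per simulated worker; every record is stored exactly once.
        self.stores: List[List[FrozenSet[str]]] = [[] for _ in range(partitioner.num_workers)]

    def insert(self, rec: FrozenSet[str]) -> List[FrozenSet[str]]:
        # Probe only workers whose length range overlaps the filter interval.
        matches = [
            other
            for w in self.part.probe_targets(len(rec), self.t)
            for other in self.stores[w]
            if jaccard(rec, other) >= self.t
        ]
        # Then store the record at the single worker that owns its length.
        self.stores[self.part.owner(len(rec))].append(rec)
        return matches


join = StreamJoin(LengthPartitioner([5, 10]), t=0.8)
a = frozenset({"a", "b", "c", "d", "e"})  # 5 tokens, stored at worker 1
b = frozenset({"a", "b", "c", "d"})       # 4 tokens, J(a, b) = 4/5
matches_a = join.insert(a)
matches_b = join.insert(b)  # probes workers 0 and 1, finds a
print(len(matches_a), len(matches_b))  # 0 1
```

In this sketch every matching pair is reported exactly once, when the later of its two records arrives, because the earlier record is stored at a single worker that always lies inside the later record's probe range.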