
Real-Time Query Systems for Complex Data Sources.


Abstract

This dissertation presents techniques for building scalable systems that allow real-time querying of complex data sources. In recent years, networking and sensing advances have dramatically increased the volume of information available to data consumers. However, coping with large scales and high data rates often requires processing data in real time, as it arrives, rather than storing it for later analysis. Our thesis is that by including the data acquisition process in the overall system design, it is possible to build scalable, real-time stream processing systems for complex data sources.

We have built two systems to demonstrate a number of unique design features required for scalable operation in our chosen domains. Cobra is a system that taps online RSS feeds (such as blogs, news articles and websites' user comments) as its data source. Cobra repeatedly crawls a set of RSS feeds, matching the contents to keyword-based user queries, similar to those used in Web search engines. As RSS-based content can change frequently, the design ensures that the latency between crawls is low, while still scaling to a large number of RSS feeds and many concurrent user queries.

Secondly, Argos is a system for widely-distributed, outdoor wireless network monitoring. Capturing 802.11 WiFi traffic across a large urban area, Argos enables a wide range of user queries, such as mobile node tracking, malware detection, and traffic characterization. Use of a wireless mesh network to connect the deployed sniffer nodes introduces additional challenges due to its limited bandwidth capacity. To address this restriction, we designed a novel in-network packet merging process and demonstrate its bandwidth savings. Additionally, Argos provides a variety of channel management schemes; 802.11 defines up to 14 radio channels but each sniffer can only capture from one channel at a time, necessitating policies for when to capture from which channel.

These systems are built around three design principles that aid in the real-time querying of complex data sources: query interfaces tailored to the application's specific data types, optimized data collection processes, and allowing queries to provide feedback to the collection process.
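As an illustration of the keyword matching the abstract attributes to Cobra, the following is a minimal sketch and not the dissertation's implementation. It assumes conjunctive semantics (a query matches an item only if every query keyword occurs in it) and a simple alphanumeric tokenizer; the names `QueryIndex`, `add_query`, `match`, and `tokenize` are hypothetical. Indexing the queries rather than the documents means each newly crawled item is checked only against queries that share at least one term with it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class QueryIndex:
    """Hypothetical index of keyword queries (an assumption, not Cobra's design)."""

    def __init__(self):
        self.query_terms = {}                     # query id -> set of required terms
        self.term_to_queries = defaultdict(set)   # term -> ids of queries using it

    def add_query(self, query_id, keywords):
        """Register a query given as a whitespace-separated keyword string."""
        terms = tokenize(keywords)
        self.query_terms[query_id] = terms
        for term in terms:
            self.term_to_queries[term].add(query_id)

    def match(self, item_text):
        """Return the sorted ids of queries whose terms all occur in item_text."""
        item_terms = tokenize(item_text)
        # Only queries sharing at least one term with the item can match.
        candidates = set()
        for term in item_terms:
            candidates |= self.term_to_queries.get(term, set())
        # Assumed conjunctive semantics: every query term must be present.
        return sorted(q for q in candidates if self.query_terms[q] <= item_terms)
```

For example, after `index.add_query("q1", "wireless mesh")` and `index.add_query("q2", "rss crawler")`, calling `index.match("New wireless mesh deployment announced")` returns `["q1"]`.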

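Bibliographic record

  • Author

    Rose, Ian Thomas

  • Author affiliation

    Harvard University

  • Degree-granting institution: Harvard University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 163 p.
  • Total pages: 163
  • Original format: PDF
  • Language of text: English (eng)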