首页> 外文学位 >Approximation algorithms for frequency related query processing on streaming data.
【24h】

Approximation algorithms for frequency related query processing on streaming data.

机译:用于流数据的频率相关查询处理的近似算法。

获取原文
获取原文并翻译 | 示例

摘要

With the emergence of more and more data stream applications, such as Internet traffic measurement, sensor network data monitoring, Web click and crawling stream analysis, financial data alert and so on, researchers realize that many data processing models and algorithms well-suited for traditional database applications are not applicable in those new streaming scenarios. To address the problem, novel data stream processing systems and algorithms have been proposed within database, theory and computer networking communities. Some of the work has led to commercial systems or algorithms which have been applied in industry. Streaming algorithms also have found their way into non-streaming environments where massive data processing is needed.; This thesis focuses on the algorithmic aspect of data stream processing, more specifically, approximation algorithms for answering frequency related queries on streaming data. Example-queries are "find the number of similar record pairs in a very large relational table", "identify URLs that appear for the first time in the crawling stream of a search engine" and "give a list of IDs which appear more frequently in a web click stream". Efficiently answering these queries with bounds on time and space costs is both important and challenging, because fast response is either required or desirable in many scenarios, and the available computing and fast storage resources are often very limited compared with the massive streaming data volume. Thus, approximation algorithms trading accuracy for computing or storage resources is a valuable option in these cases.; In this thesis, two types of data reduction techniques, namely sketching and sampling, are studied. Both techniques are suitable not only for answering frequency related queries over streaming data, but also have a wide range of other application areas. This thesis focuses on exploring these powerful techniques to answer frequency related queries under data stream scenarios. More precisely, based on well-known sketching and sampling techniques, this thesis proposes new data structures and one-pass approximation algorithms to answer membership queries, frequency queries, join/self-join size estimations, similarity join/self-join size estimations and result set size estimations for similarity searches. Both theoretical and experimental analysis show that our new techniques improve the state-of-the-art.
机译:随着越来越多的数据流应用程序的出现,例如Internet流量测量,传感器网络数据监视,Web单击和爬网分析,财务数据警报等,研究人员意识到,许多数据处理模型和算法都非常适合传统数据库应用程序不适用于那些新的流方案。为了解决该问题,在数据库,理论和计算机网络社区内已经提出了新颖的数据流处理系统和算法。一些工作导致了已经在工业中应用的商业系统或算法。流算法也已经进入需要大量数据处理的非流环境。本文着眼于数据流处理的算法方面,更具体地说,是一种用于回答流数据上与频率相关的查询的近似算法。示例查询是“在非常大的关系表中查找相似记录对的数量”,“标识在搜索引擎的抓取流中首次出现的URL”和“给出在以下情况下更频繁出现的ID列表”网络点击流”。用时间和空间成本来有效地回答这些查询既重要又具有挑战性,因为在许多情况下都需要快速响应,或者快速响应是必需的,而且与庞大的流数据量相比,可用的计算和快速存储资源通常非常有限。因此,在这些情况下,折衷计算或存储资源的准确性的近似算法是一种有价值的选择。本文研究了两种数据约简技术,即草图绘制和采样。两种技术不仅适用于回答流数据上与频率相关的查询,而且还具有广泛的其他应用领域。本文着重探讨这些强大的技术来回答数据流场景下与频率相关的查询。更准确地说,本文基于众所周知的草图绘制和采样技术,提出了新的数据结构和一遍近似算法来回答成员资格查询,频率查询,联接/自联接大小估计,相似联接/自联接大小估计以及相似搜索的结果集大小估计。理论和实验分析均表明,我们的新技术改进了最新技术。

著录项

  • 作者

    Deng, Fan.;

  • 作者单位

    University of Alberta (Canada).;

  • 授予单位 University of Alberta (Canada).;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2007
  • 页码 133 p.
  • 总页数 133
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

  • 入库时间 2022-08-17 11:39:54

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号