Optimal 'big data' aggregation systems -- From theory to practical application.


Abstract

The integration of computers into many facets of our lives has made it feasible to collect and store staggering amounts of data. On its own, however, the data is less useful to us than the analysis and manipulation that extract manageable descriptive information from it. New tools are required to extract this information from ever-growing repositories of data.

Some of these analyses take the form of a two-phase problem that is easily distributed to take advantage of available computing power. The first phase computes a descriptive partial result from a subset of the original data, and the second phase aggregates all the partial results to create a combined output. We formalize this compute-aggregate model for rigorous performance analysis, in an effort to minimize the latency of the aggregation phase with minimal intrusive analysis or modification.

Based on our model, we identify an attribute of the aggregation overlay that strongly affects aggregation latency, as well as its dependence on an easily identified trait of the aggregation. We rigorously prove this dependence and find optimal overlays for aggregation. We use the proven optima to create simple heuristics and build a system, NOAH, that takes advantage of the findings. NOAH can be used by big data analysis systems.

We also study an individual problem, top-k matching, to explore the effects of optimizing the computation phase separately from aggregation, and we create a complete distributed system to fulfill an economically relevant task.
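The two-phase compute-aggregate model described in the abstract can be illustrated with a minimal sketch. This is not NOAH and not the thesis's top-k matching system: it uses plain top-k selection as a stand-in workload, and it assumes a tree-shaped overlay whose `fan_in` is a tunable parameter (the abstract does not name the overlay attribute it studies). All function names here are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def compute_top_k(chunk: Sequence[int], k: int) -> List[int]:
    """Phase 1: partial result for one subset of the data (its k largest values)."""
    return heapq.nlargest(k, chunk)


def merge_top_k(partials: Sequence[List[int]], k: int) -> List[int]:
    """Aggregation function: combine several partial top-k lists into one."""
    return heapq.nlargest(k, (x for part in partials for x in part))


def aggregate_tree(
    partials: List[P], fan_in: int, merge: Callable[[Sequence[P]], P]
) -> P:
    """Phase 2: aggregate along a tree overlay (assumed shape).

    Each internal node merges up to `fan_in` children; levels are processed
    bottom-up until a single combined result remains.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    while len(level) > 1:
        level = [
            merge(level[i : i + fan_in]) for i in range(0, len(level), fan_in)
        ]
    return level[0]


def top_k(data: Sequence[int], k: int, num_workers: int, fan_in: int) -> List[int]:
    """Run both phases sequentially; a real system would distribute them."""
    # Split the input into one subset per (simulated) worker.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    partials = [compute_top_k(chunk, k) for chunk in chunks]
    return aggregate_tree(partials, fan_in, lambda ps: merge_top_k(ps, k))
```

Calling `top_k(list(range(100)), k=3, num_workers=8, fan_in=2)` aggregates through a binary tree, while `fan_in=8` merges all partial results at a single node. Both give the same answer; in a distributed setting they would differ in how many merge rounds occur and how much data each node receives per round.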

Bibliographic record

  • Author: Culhane, William John, IV
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 110 p. (110 total)
  • Original format: PDF
  • Language of text: English
  • Record added: 2022-08-17 11:52:48
