A Study on Hardware Systems for Data Analysis Based on Bitmap Indexes (ビットマップインデックスに基づくデータ解析のためのハードウェアシステムに関する研究)

Abstract

Recent years have witnessed massive growth in the data generated worldwide by web services, social media networks, and scientific experiments, as well as by the "tsunami" of Internet-of-Things devices. According to a Cisco forecast, total data center traffic is projected to reach 15.3 zettabytes (ZB) by the end of 2020. Gaining insight into such vast amounts of data is highly important because valuable data drive business decisions and processes, as well as scientists' exploration and discovery.

To facilitate analytics, data are usually indexed in advance. Several indexing frameworks have been proposed for different workloads, such as online transaction processing (OLTP) and online analytical processing (OLAP). Specifically, the B+-tree and hashing are two common indexing methods in OLTP, where queries and updates occur in roughly equal numbers. Unlike OLTP, OLAP concentrates on querying a huge historical store that is updated only irregularly. Most OLAP queries are also highly complex and involve aggregations, while the execution time is often limited. To address these challenges, the bitmap index (BI) was proposed and has proven to be a promising candidate for OLAP-like workloads.

A BI is a bit-level matrix whose numbers of rows and columns equal the length and the cardinality of the dataset, respectively. With a BI, answering a multi-dimensional query becomes a series of bitwise operations, e.g., AND, OR, XOR, and NOT, on bit columns. As a result, BIs have proven effective for solving complex queries in large enterprise and scientific databases. More significantly, because it relies on simple logical operators that map directly onto hardware, a BI appears to be well suited to advanced parallel-processing platforms, such as multi-core CPUs, graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).

Modern FPGAs and ASICs have become increasingly important in data analytics because they can handle both data-intensive and compute-intensive tasks effectively. Furthermore, FPGAs and ASICs can provide higher energy efficiency than CPUs and GPUs. As a result, since 2010, Microsoft has been working on the so-called Catapult project, in which FPGAs are integrated into datacenter servers to accelerate its search engine as well as AI applications. In 2016, Oracle for the first time introduced SPARC S7 and M7 processors that are used for accelerating OLTP databases. Nonetheless, the feasibility of BI-based analytics systems using FPGAs and ASICs has not yet been studied.

This dissertation therefore focuses on implementing BI-based data analytics systems in both FPGAs and ASICs. The advantages of the proposed systems include scalability, low data input/output cost, high processing throughput, and high energy efficiency. Three main modules are proposed: (1) a BI creator that indexes the given records by a list of keys and outputs the BI vectors to the external memory; (2) a BI-based query processor that employs the given BI vectors to answer users' queries and outputs the results to the external memory; and (3) a BI encoder that returns the positions of the one-bits of the bitmap results to the external memory. Six hardware systems based on these three modules are first implemented in an FPGA for functional verification and then partially in two ASICs (180-nm bulk complementary metal-oxide-semiconductor (CMOS) and 65-nm Silicon-On-Thin-Buried-Oxide (SOTB) CMOS technology) for physical design verification. Based on the experimental results, the proposed systems outperform other CPU-based and GPU-based designs, especially in terms of energy efficiency.
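
The roles of the three modules named in the abstract can be illustrated with a small software analogue. This is only a sketch under stated assumptions, not the dissertation's hardware design: Python integers stand in for bit vectors, and the function names (`create_bitmap_index`, `encode_positions`) and the toy table are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def create_bitmap_index(column: Sequence[Hashable]) -> Dict[Hashable, int]:
    """BI creator (software analogue): one bit vector per distinct key.

    Bit i of a key's vector is set when record i holds that key.
    """
    index: Dict[Hashable, int] = {}
    for row, key in enumerate(column):
        index[key] = index.get(key, 0) | (1 << row)
    return index


def encode_positions(bitmap: int) -> List[int]:
    """BI encoder (software analogue): positions of the one-bits in a bitmap."""
    positions = []
    row = 0
    while bitmap:
        if bitmap & 1:
            positions.append(row)
        bitmap >>= 1
        row += 1
    return positions


# Hypothetical toy table with two indexed attributes (six records).
colors = ["red", "blue", "red", "green", "blue", "red"]
sizes = ["S", "M", "L", "S", "S", "M"]
n_rows = len(colors)
all_rows = (1 << n_rows) - 1  # mask so that NOT stays within n_rows bits

color_bi = create_bitmap_index(colors)
size_bi = create_bitmap_index(sizes)

# Query processor (software analogue):
# (color = red OR color = green) AND NOT (size = S)
result = (color_bi["red"] | color_bi["green"]) & (all_rows & ~size_bi["S"])
matches = encode_positions(result)
print(matches)
```

For this toy table the query selects records 2 and 5. The whole query reduces to OR, AND, and NOT on bit columns, which is the property the abstract cites as making a BI a good fit for parallel hardware.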

Bibliographic details

  • Author

    Nguen Xuan Thuan;

  • Author affiliation
  • Year 2017
  • Total pages
  • Original format PDF
  • Language en
  • Chinese Library Classification
