BlueDBM: Distributed Flash Storage for Big Data Analytics

Abstract

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could easily be stored in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf flash storage packaged as SSDs does not make effective use of flash, because it incurs a large amount of additional overhead in flash device management and network access. In this article, we present BlueDBM, a new system architecture built from flash-based storage with in-store processing capability, connected by a low-latency, high-throughput inter-controller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics.
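The abstract's claim that a DRAM-centric system degrades sharply when only 5% to 10% of references go to secondary storage can be illustrated with a simple average-latency model. This sketch is not taken from the paper: the constants `DRAM_NS` and `FLASH_NS` are hypothetical round numbers chosen only for illustration, and the model ignores parallelism, queuing, and throughput effects.

```python
# Illustrative average-latency model (an assumption, not the paper's analysis).
# DRAM_NS and FLASH_NS are hypothetical round numbers, not BlueDBM measurements.
DRAM_NS = 100          # assumed DRAM random-access latency, in nanoseconds
FLASH_NS = 100_000     # assumed flash random-read latency incl. software stack, in ns


def effective_latency_ns(miss_fraction: float, fast_ns: float, slow_ns: float) -> float:
    """Average latency when `miss_fraction` of references go to the slow tier."""
    if not 0.0 <= miss_fraction <= 1.0:
        raise ValueError("miss_fraction must be in [0, 1]")
    return (1.0 - miss_fraction) * fast_ns + miss_fraction * slow_ns


def slowdown(miss_fraction: float, fast_ns: float = DRAM_NS, slow_ns: float = FLASH_NS) -> float:
    """Average latency relative to an all-DRAM system (1.0 means no slowdown)."""
    return effective_latency_ns(miss_fraction, fast_ns, slow_ns) / fast_ns


if __name__ == "__main__":
    for m in (0.0, 0.01, 0.05, 0.10):
        print(f"{m:>5.0%} of references to secondary storage -> {slowdown(m):7.1f}x slower")
```

With these assumed numbers, sending just 5% of references to flash makes the average access roughly 51 times slower than the all-DRAM case, which suggests why a system designed around flash from the start is less sensitive to this cliff than one that treats flash only as overflow for DRAM.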

Bibliographic details

  • Source
    ACM Transactions on Computer Systems | 2016, Issue 3 | pp. 7:1-7:31 | 31 pages
  • Author affiliations

    MIT, Stata Ctr, 32-G836, 32 Vassar St, Cambridge, MA 02139 USA;

    MIT, Stata Ctr, 32-G836, 32 Vassar St, Cambridge, MA 02139 USA;

    MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA | Inha Univ, Room 1010, High Tech Bldg, 100 Inharo, Incheon, South Korea;

    Quanta Res Cambridge, Cambridge, MA USA | Accelerated Tech Inc, Cambridge, MA USA;

    Quanta Res Cambridge, Cambridge, MA USA | MIT, Stata Ctr, 32-G870, 32 Vassar St, Cambridge, MA USA;

    Quanta Res Cambridge, Cambridge, MA USA | 38 Ashland St, Arlington, MA 02476 USA;

    MIT, Stata Ctr, 32-G836, 32 Vassar St, Cambridge, MA 02139 USA;

    MIT, Stata Ctr, 32-G866, 32 Vassar St, Cambridge, MA USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Wireless sensor networks; media access control; multichannel; radio interference; time synchronization;