Conference: IEEE International Congress on Big Data

Building Wrangler: A transformational data intensive resource for the open science community



Abstract

With the growth of data in science and engineering fields and the I/O-intensive technologies used to carry out research with these massive datasets, it has become clear that new solutions to support data research are required. In support of this, the Texas Advanced Computing Center presents Wrangler, the first open science research platform built from the ground up in support of data. Wrangler features a replicated 10 PB Lustre-based parallel file system, compute capacity of 120 Intel Haswell nodes, and 15 TB of RAM. In addition to the base system, Wrangler features a unique NAND flash-based storage system from DSSD, providing users with 0.5 PB of storage, 1 TB/s of bandwidth, and 250 million IOP/s across the cluster. Supporting Hadoop, but not just Hadoop, Wrangler will provide current and future researchers with an environment supporting the most I/O-intensive workflows in fields from astronomy to paleontology. With data at the forefront of Wrangler's mission, support for ETL workflows, data curation, and data publication will enable users as they both discover new results and publish their own research. Support for both SQL and NoSQL databases and GIS-based extensions will also be provided, allowing users to leverage these tools for both data cataloging and cross-study integration. Wrangler will allow users to focus more on what is most important to them, the data and the knowledge gained from its analysis, and less on the details of curation and I/O optimization.
