IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Data Transfer between Scientific Facilities – Bottleneck Analysis, Insights and Optimizations

Abstract

Wide-area file transfers play an important role in many science applications. File transfer tools typically deliver their highest performance on datasets with a small number of large files, but many science datasets consist of many small files. It is therefore important to understand the factors that degrade wide-area data transfer performance for datasets with many small files. To this end, we (i) benchmark the performance of the subsystems involved in end-to-end file transfer between two HPC facilities, using a many-file dataset that is representative of production science transfers; (ii) characterize the per-file overhead introduced by the different subsystems; (iii) identify potential dependencies and bottlenecks; (iv) study the effectiveness of transferring many files concurrently as a means of reducing per-file overhead; and (v) prototype a prefetching mechanism, as an alternative to concurrency, for reducing per-file overhead on the source storage system. We show that both concurrency and prefetching can significantly reduce per-file overhead, and that a reasonable level of concurrency combined with prefetching can bring the per-file overhead down to a negligible level.
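The central claim, that per-file overhead dominates many-small-file transfers and that concurrency and prefetching shrink it, can be illustrated with a toy cost model. The Python sketch below is an illustrative assumption, not the model used in the paper: each file pays a fixed overhead on the source storage system (`src_overhead`) plus a fixed overhead in other subsystems (`other_overhead`); with `concurrency` files in flight those overheads overlap, and prefetching hides a fraction `prefetch_hidden` of the source-side cost. The class name, its parameters, and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransferModel:
    """Hypothetical toy cost model for a many-file wide-area transfer."""
    bandwidth: float       # aggregate path bandwidth, bytes per second
    src_overhead: float    # per-file cost on the source storage system (e.g. open/stat), seconds
    other_overhead: float  # per-file cost in all other subsystems, seconds

    def total_time(self, n_files: int, file_size: float,
                   concurrency: int = 1, prefetch_hidden: float = 0.0) -> float:
        """Estimated end-to-end transfer time in seconds.

        concurrency: number of files in flight; their per-file overheads
            overlap, so overhead is paid n_files / concurrency times.
        prefetch_hidden: fraction (0..1) of the source-side overhead that a
            prefetcher hides by working ahead of the transfer.
        Simplification: overheads never overlap with data movement, and data
        movement is bounded only by the shared bandwidth.
        """
        if concurrency < 1:
            raise ValueError("concurrency must be >= 1")
        if not 0.0 <= prefetch_hidden <= 1.0:
            raise ValueError("prefetch_hidden must be in [0, 1]")
        per_file = self.src_overhead * (1.0 - prefetch_hidden) + self.other_overhead
        overhead_time = per_file * n_files / concurrency
        data_time = n_files * file_size / self.bandwidth
        return overhead_time + data_time

    def per_file_overhead(self, n_files: int, file_size: float, **kwargs) -> float:
        """Time attributable to each file beyond pure data movement, seconds."""
        data_time = n_files * file_size / self.bandwidth
        return (self.total_time(n_files, file_size, **kwargs) - data_time) / n_files


if __name__ == "__main__":
    # Hypothetical parameters: 1 GB/s path, 20 ms source overhead, 5 ms other overhead.
    model = TransferModel(bandwidth=1e9, src_overhead=0.020, other_overhead=0.005)
    for c, p in [(1, 0.0), (8, 0.0), (8, 0.9)]:
        t = model.total_time(10_000, 1e6, concurrency=c, prefetch_hidden=p)
        print(f"concurrency={c} prefetch_hidden={p}: {t:.2f} s")
```

With these hypothetical values, 10,000 files of 1 MB take about 260 s serially in this model (only 10 s of which is data movement), about 41 s with 8 concurrent files, and about 19 s when prefetching additionally hides 90% of the source-side overhead. Treating overheads as non-overlapping with data movement is a deliberate simplification; real transfer tools can overlap the two to some degree.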