Journal: IEICE Transactions on Information and Systems

High-Performance End-to-End Integrity Verification on Big Data Transfer

Abstract

The volume of scientific data generated by experimental facilities and by simulations at high-performance computing facilities has been growing rapidly with the emergence of IoT-based big data. In many cases, this data must be transmitted rapidly and reliably to remote facilities for storage, analysis, or sharing in Internet of Things (IoT) applications. To ensure its integrity, the data can be verified with a checksum after it has been written to disk at the destination. However, this end-to-end integrity verification inevitably adds overhead (extra disk I/O and more computation), which increases the overall data transfer time. In this article, we evaluate strategies to maximize the overlap between data transfer and checksum computation for astronomical observation data. Specifically, we examine file-level pipelining and block-level pipelining, the latter with various block sizes. We analyze these pipelining approaches in the context of GridFTP, a widely used protocol for scientific data transfers, and evaluate them through theoretical analysis and experiments. The results show that block-level pipelining is effective in maximizing this overlap: it reduces the overall data transfer time with end-to-end integrity verification by up to 70% compared to sequential execution of transfer and checksum, and by up to 60% compared to file-level pipelining.
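
The block-level pipelining idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' GridFTP implementation. The function name `pipelined_copy`, the use of SHA-256, the 4 MiB default block size, and the in-memory queue are all assumptions made for illustration. For simplicity the sketch hashes each block as it passes through memory, whereas the verification described above reads the data back after it has been written to disk at the destination.

```python
import hashlib
import io
import queue
import threading


def pipelined_copy(src, dst, block_size=4 * 1024 * 1024):
    """Copy src to dst block by block while a worker thread hashes each block.

    Hypothetical helper: the write to dst stands in for the network transfer.
    Returns the SHA-256 hex digest of all bytes that were copied.
    """
    blocks = queue.Queue(maxsize=8)  # bounded so the copier cannot run far ahead
    digest = hashlib.sha256()        # assumed algorithm; the source names none

    def checksum_worker():
        while True:
            block = blocks.get()
            if block is None:        # sentinel: no more blocks
                return
            digest.update(block)     # blocks arrive in order, so one digest suffices

    worker = threading.Thread(target=checksum_worker)
    worker.start()
    try:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)         # "transfer" this block
            blocks.put(block)        # its checksum overlaps with the next transfer
    finally:
        blocks.put(None)
        worker.join()
    return digest.hexdigest()


if __name__ == "__main__":
    payload = b"x" * (10 * 1024 * 1024 + 123)
    source, target = io.BytesIO(payload), io.BytesIO()
    checksum = pipelined_copy(source, target, block_size=1024 * 1024)
    # The pipelined digest must match a digest computed over the whole payload.
    assert checksum == hashlib.sha256(payload).hexdigest()
    assert target.getvalue() == payload
```

In a sequential scheme the checksum would start only after the whole transfer finished, and in file-level pipelining only after a whole file finished. Here the checksum of one block can proceed while the next block is being transferred, which is the overlap the abstract refers to. The block size controls how fine-grained that overlap is.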
