
Large-scale seismic signal analysis with Hadoop


Abstract

In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and to build sensitive detectors. Because correlated seismograms are generally found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight purposes. However, many regions have been found to produce large numbers of correlated seismograms, and there is growing interest in building next-generation pipelines that employ correlation as a core part of their operation. In an effort to better understand the distribution and behavior of correlated seismic events, we have cross correlated a global dataset consisting of over 300 million seismograms. This was done using a conventional distributed cluster and required 42 days. In anticipation of processing much larger datasets, we have re-architected the system to run as a series of MapReduce jobs on a Hadoop cluster. In doing so we achieved a factor-of-19 performance increase on a test dataset. We found that fundamental algorithmic transformations were required to achieve the maximum performance increase. Whereas in the original IO-bound implementation we went to great lengths to minimize IO, in the Hadoop implementation, where IO is cheap, we were able to greatly increase the parallelism of our algorithms by performing a tiered series of very fine-grained (highly parallelizable) transformations on the data. Each of these MapReduce jobs required reading and writing large amounts of data, but because IO is very fast, and because the fine-grained computations could be handled extremely quickly by the mappers, the net result was a large performance gain.
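The core kernel of the pipeline described above is waveform cross correlation. As a hedged illustration only (not the authors' implementation), a sliding normalized cross correlation of a short template against a longer trace can be sketched in NumPy; the function name and the epsilon guard against zero-variance windows are our own conventions:

```python
import numpy as np

def normalized_cross_correlation(template, trace):
    """Slide `template` along `trace` and return the normalized
    cross-correlation coefficient at every lag; values lie in [-1, 1],
    with 1 indicating a perfectly matching (scaled/shifted) waveform."""
    m = len(template)
    # Demean and pre-normalize the template once, folding in the 1/m factor.
    t = (template - template.mean()) / (template.std() * m)
    out = np.empty(len(trace) - m + 1)
    for i in range(len(out)):
        w = trace[i:i + m]
        # Small epsilon avoids division by zero on flat (all-zero) windows.
        out[i] = np.sum(t * (w - w.mean())) / (w.std() + 1e-12)
    return out
```

A production system would compute this in the frequency domain for long traces; the direct sliding form above is kept for clarity.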
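The MapReduce restructuring the abstract describes can be caricatured in plain Python: a mapper keys each waveform by station (since correlated seismograms occur only at small separation distances, comparisons can be restricted to co-located recordings), a shuffle groups by key, and a reducer correlates every pair within a group. The record layout, function names, and the 0.7 detection threshold here are illustrative assumptions, not the paper's actual job structure:

```python
from collections import defaultdict
from itertools import combinations

import numpy as np

def map_phase(records):
    """Mapper: emit (station, (event_id, waveform)) so the shuffle
    groups together only waveforms recorded at the same station."""
    for station, event_id, waveform in records:
        yield station, (event_id, waveform)

def shuffle(pairs):
    """Shuffle: group mapper output by key, as Hadoop does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(station, values, threshold=0.7):
    """Reducer: cross correlate every pair of waveforms at one station and
    emit event pairs whose peak normalized correlation clears the threshold."""
    for (id_a, a), (id_b, b) in combinations(values, 2):
        a0 = (a - a.mean()) / (a.std() * len(a))
        b0 = (b - b.mean()) / b.std()
        if np.max(np.correlate(a0, b0, mode="full")) >= threshold:
            yield station, (id_a, id_b)
```

Each stage is embarrassingly parallel, which is the point the abstract makes: fine-grained transformations trade extra IO between stages for much higher parallelism.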

Bibliographic information

  • Source
    Computers & Geosciences | 2014, Issue 5 | pp. 145-154 | 10 pages
  • Author affiliations

    Google Inc., 1600 Amphitheater Parkway, Mountain View, CA 94043, USA;

    Lawrence Livermore National Laboratory, 7000 East Avenue, MS 046, Livermore, CA 94550, USA;

    Lawrence Livermore National Laboratory, 7000 East Avenue, MS 046, Livermore, CA 94550, USA;

    Lawrence Livermore National Laboratory, 7000 East Avenue, MS 046, Livermore, CA 94550, USA;

  • Indexing information
  • Format: PDF
  • Language: English (eng)
  • CLC classification
  • Keywords

    Correlation; Hadoop; MapReduce; Seismology
