Venue: Conference on Survey and Other Telescope Technologies and Discoveries

Data Processing Factory for the Sloan Digital Sky Survey



Abstract

The Sloan Digital Sky Survey (SDSS) data handling presents two challenges: large data volume and timely production of spectroscopic plates from imaging data. A data processing factory, using technologies both old and new, handles this flow. Distribution to end users is via disk farms, which serve corrected images and calibrated spectra, and a database, which efficiently processes catalog queries. For distribution of modest amounts of data from Apache Point Observatory to Fermilab, scripts use rsync to update files, while larger data transfers are accomplished by shipping magnetic tapes commercially. All data processing pipelines are wrapped in scripts to address consecutive phases: preparation, submission, checking, and quality control. We constructed the factory by chaining these pipelines together while using an operational database to hold processed imaging catalogs. The science database catalogs all imaging and spectroscopic objects, with pointers to the various external files associated with them. Diverse computing systems address particular processing phases. UNIX computers handle tape reading and writing, as well as calibration steps that require access to a large amount of data with relatively modest computational demands. Commodity CPUs handle steps that require access to a limited amount of data but have more demanding computational requirements. Disk servers optimized for cost per Gbyte serve terabytes of processed data, while servers optimized for disk read speed run SQL Server software to process queries on the catalogs. This factory produced data for the SDSS Early Data Release in June 2001, and it is currently producing Data Release One, scheduled for January 2003.
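The abstract describes wrapper scripts that drive each pipeline through four consecutive phases (preparation, submission, checking, quality control), with pipelines chained together to form the factory. A minimal Python sketch of such a phase driver, assuming hypothetical phase functions — none of the names below come from the actual SDSS factory code:

```python
# Illustrative sketch of a four-phase pipeline wrapper, as described in the
# abstract. Phase names follow the abstract; the functions themselves are
# toy stand-ins for the real preparation/submission/checking/QC scripts.

PHASES = ("prepare", "submit", "check", "qc")

def run_pipeline(run_id, phase_funcs):
    """Run each phase in order for one run; stop at the first failure."""
    results = {}
    for phase in PHASES:
        ok = phase_funcs[phase](run_id)
        results[phase] = ok
        if not ok:
            break  # a failed phase halts the pipeline for this run
    return results

def make_toy_phases():
    """Toy phase implementations that always succeed."""
    return {
        "prepare": lambda run_id: True,  # stage input files for the run
        "submit":  lambda run_id: True,  # launch the processing job
        "check":   lambda run_id: True,  # verify outputs exist and parse
        "qc":      lambda run_id: True,  # apply quality-control cuts
    }

results = run_pipeline("run-756", make_toy_phases())
```

Chaining pipelines in the factory then amounts to calling `run_pipeline` for each stage in sequence, feeding one stage's checked outputs to the next.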
