Conference: International Conference on High Performance Computing Simulation

An automated infrastructure to support high-throughput bioinformatics


Abstract

The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers being among the most massive data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with articulated data repositories where objects are connected by multiple relationships, managing complex processing pipelines where each step depends on a large number of configuration parameters, and ensuring reproducibility, error control, and usability by non-technical staff. Here we describe an automated infrastructure built to address these issues in analyzing the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
