Journal: Proteomics

ProteomeGRID: towards a high-throughput proteomics pipeline through opportunistic cluster image computing for two-dimensional gel electrophoresis


Abstract

The quest for high-throughput proteomics has revealed a number of critical issues. Whilst improved two-dimensional gel electrophoresis (2-DE) sample preparation, staining and imaging issues are being actively pursued by industry, reliable high-throughput spot matching and quantification remains a significant bottleneck in the bioinformatics pipeline, thus restricting the flow of data to mass spectrometry through robotic spot excision and protein digestion. To this end, it is important to establish a full multi-site Grid infrastructure for the processing, archival, standardisation and retrieval of proteomic data and metadata. Particular emphasis needs to be placed on large-scale image mining and statistical cross-validation for reliable, fully automated differential expression analysis, and on the development of a statistical 2-DE object model and ontology that underpins the emerging HUPO PSI GPS (Human Proteome Organization Proteomics Standards Initiative General Proteomics Standards). The first step towards this goal is to overcome the computational and communications burden entailed by the image analysis of 2-DE gels with Grid-enabled cluster computing. This paper presents the proTurbo framework as part of the ProteomeGRID, which utilises Condor cluster management combined with CORBA communications and JPEG-LS lossless image compression for task farming. A novel probabilistic eager scheduler has been developed to minimise make-span, where tasks are duplicated in response to the likelihood of the Condor machines' owners evicting them. A 60-gel experiment was pair-wise image registered (3540 tasks) on a 40-machine Linux cluster. Real-world performance and network overhead were gauged, and Poisson-distributed worker evictions were simulated. Our results show a 4:1 lossless and 9:1 near-lossless image compression ratio, so network overhead did not affect other users. With 40 workers a 32× speed-up was seen (80% resource efficiency), and the eager scheduler reduced the impact of evictions by 58%.
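
The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the proTurbo implementation. All function names are hypothetical, and the threshold rule in `should_duplicate` is an assumed stand-in for the paper's probabilistic eager scheduler, whose actual policy the abstract does not specify. Note that 3540 equals 60 × 59, which suggests that every ordered pair of gels was registered; this reading is an inference from the arithmetic, not a statement from the source.

```python
import math


def pairwise_tasks(n_gels: int, ordered: bool = True) -> int:
    """Count pair-wise registration tasks among n_gels images.

    ordered=True counts (a, b) and (b, a) as separate tasks.
    """
    pairs = n_gels * (n_gels - 1)
    return pairs if ordered else pairs // 2


def resource_efficiency(speedup: float, workers: int) -> float:
    """Parallel efficiency: observed speed-up divided by worker count."""
    return speedup / workers


def eviction_probability(rate_per_hour: float, task_hours: float) -> float:
    """P(at least one eviction during a task) for Poisson-distributed evictions."""
    return 1.0 - math.exp(-rate_per_hour * task_hours)


def should_duplicate(rate_per_hour: float, task_hours: float,
                     threshold: float = 0.5) -> bool:
    """Hypothetical rule (an assumption, not the paper's policy):
    duplicate a task when its eviction probability reaches the threshold."""
    return eviction_probability(rate_per_hour, task_hours) >= threshold


if __name__ == "__main__":
    print(pairwise_tasks(60))           # ordered pairs among 60 gels
    print(resource_efficiency(32, 40))  # efficiency for 32x on 40 workers
    print(should_duplicate(1.0, 1.0))   # eviction rate 1/h, 1 h task
```

With these definitions, `pairwise_tasks(60)` reproduces the 3540 tasks and `resource_efficiency(32, 40)` reproduces the 80% efficiency reported above. The eviction rate, task duration and threshold are placeholders that would need to be measured on a real Condor pool.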
