Venue: IEEE International Conference on eScience

Data Analytics in Bioinformatics: Data Science in Practice for Genomics Analysis Workflows

Abstract

Workflow systems manage large-scale experiments and generate large volumes of provenance data. The provenance repository of these systems contains information about the workflow execution, which allows data transformations to be tracked and analyzed. However, provenance data may still be a black box when it comes to analyzing the contents of the resulting data files. Current solutions focus on coarse-grained data transformations: they point to input and output files but do not allow domain-specific data to be explored. Data analytics is essential for managing large-scale workflows executed in parallel, especially when tracking anomalous executions. In this paper, we present a data analytics approach based on provenance data enriched with domain-specific data and coupled to a data mining tool. A real bioinformatics workflow was modeled and executed in parallel on the Amazon cloud. Like many other genomic workflows, it manipulates complex biological data that is difficult to monitor. We evaluate the benefits of using domain-specific data and provenance data for user steering while monitoring the execution with detailed filters, steering on specific conditions, and evaluating performance. Results show that the provenance database coupled to workflow systems has an unexplored potential for raw data analytics, which may improve user confidence and reduce overall execution time.
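The idea of enriching provenance records with domain-specific values, so that anomalous executions can be found with a single query, might look roughly like the sketch below. This is not the system described in the paper: the table names (`activation`, `domain_data`), the columns (`elapsed_s`, `e_value`), the sample rows, and the thresholds are all hypothetical, and an in-memory SQLite database stands in for whatever provenance repository a real workflow system uses.

```python
import sqlite3

# Hypothetical schema: one provenance row per task execution, plus
# domain-specific values extracted from each task's output file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activation (
    task_id   INTEGER PRIMARY KEY,
    activity  TEXT,
    status    TEXT,
    elapsed_s REAL
);
CREATE TABLE domain_data (
    task_id INTEGER REFERENCES activation(task_id),
    e_value REAL
);
""")

# Illustrative sample data, not results from the paper.
conn.executemany(
    "INSERT INTO activation VALUES (?, ?, ?, ?)",
    [
        (1, "alignment", "FINISHED", 12.0),
        (2, "alignment", "FINISHED", 95.5),
        (3, "alignment", "FAILED", 3.1),
    ],
)
# Task 3 failed before producing output, so it has no domain row.
conn.executemany(
    "INSERT INTO domain_data VALUES (?, ?)",
    [(1, 1e-30), (2, 0.5)],
)


def anomalous_tasks(conn, max_elapsed_s, max_e_value):
    """Return tasks that failed, ran too long, or produced a poor domain value."""
    return conn.execute(
        """
        SELECT a.task_id, a.status, a.elapsed_s, d.e_value
        FROM activation AS a
        LEFT JOIN domain_data AS d ON d.task_id = a.task_id
        WHERE a.status = 'FAILED'
           OR a.elapsed_s > ?
           OR d.e_value > ?
        ORDER BY a.task_id
        """,
        (max_elapsed_s, max_e_value),
    ).fetchall()


for row in anomalous_tasks(conn, max_elapsed_s=60.0, max_e_value=1e-5):
    print(row)
```

The `LEFT JOIN` keeps failed tasks that never produced domain data, which a filter over input and output file names alone would not relate to the contents of the results.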