Journal: BMC Bioinformatics

Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data

Abstract

Background There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools.

Results Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor which has been developed for invoking this tool when it has been deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services as well as to implement a form of user interaction for selecting subsets of microarray data for analysis as part of the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV formatted data within the Taverna workbench.

Conclusion Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
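
As a rough illustration of the statistical step named in the abstract (identifying differentially-expressed genes from a table of expression values), the sketch below ranks genes by a per-gene Welch t-statistic between two groups of arrays. It is not the authors' R script and does not use Taverna, maxdBrowse, or RServe; the choice of Welch's t-test, the function names `welch_t` and `rank_genes`, and the gene names and expression values are all assumptions made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expression, control_cols, treated_cols):
    """Return (gene, t) pairs sorted by decreasing absolute t-statistic."""
    scored = []
    for gene, values in expression.items():
        control = [values[i] for i in control_cols]
        treated = [values[i] for i in treated_cols]
        scored.append((gene, welch_t(treated, control)))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical log2 expression values (not data from the paper):
# three control arrays followed by three treated arrays per gene.
expression = {
    "geneA": [5.0, 5.2, 4.8, 8.0, 8.2, 7.8],
    "geneB": [6.0, 6.5, 5.5, 6.1, 6.4, 5.6],
    "geneC": [9.0, 9.1, 8.9, 7.0, 7.3, 6.7],
}

ranking = rank_genes(expression, control_cols=[0, 1, 2], treated_cols=[3, 4, 5])
for gene, t in ranking:
    print(f"{gene}\t{t:.2f}")
```

A real analysis would also convert the statistics to p-values and correct for multiple testing; that is left out here to keep the sketch within the standard library.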
