Source: OMICS: A Journal of Integrative Biology (U.S. National Institutes of Health literature)

Redundancy Control in Pathway Databases (ReCiPa): An Application for Improving Gene-Set Enrichment Analysis in Omics Studies and Big Data Biology



Abstract

Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of “Big Data” that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily by a diverse collection of ‘omics’-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large ‘omics’ datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results of such analysis, the impact of the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as significant because of high content similarity rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p-values and overdispersion, leading to biased results. We investigated the level of redundancy in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application ReCiPa (Redundancy Control in Pathway Databases) to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association with obesity than pathways identified from the original databases.
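
The idea of controlling pathway overlap with a user-defined threshold can be illustrated with a small sketch. The abstract does not state how ReCiPa measures overlap or what it does with redundant pathways, so everything below is an assumption made for illustration: overlap is taken as the shared fraction of the smaller gene set, redundant pairs are merged greedily, and the names `overlap`, `control_redundancy`, and the toy pathways are hypothetical.

```python
from itertools import combinations


def overlap(a, b):
    """Fraction of the smaller gene set that is shared with the other set.

    This overlap measure is an assumption, not taken from the paper.
    """
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def control_redundancy(pathways, threshold):
    """Greedily merge the most-overlapping pair until no pair reaches threshold.

    pathways: dict mapping pathway name -> set of gene identifiers.
    Returns a new dict; a merged pathway is named "A + B".
    The merge strategy is hypothetical and only illustrates the idea.
    """
    merged = {name: set(genes) for name, genes in pathways.items()}
    while True:
        best = None
        for x, y in combinations(sorted(merged), 2):
            score = overlap(merged[x], merged[y])
            if score >= threshold and (best is None or score > best[0]):
                best = (score, x, y)
        if best is None:
            return merged
        _, x, y = best
        merged[f"{x} + {y}"] = merged.pop(x) | merged.pop(y)


# Toy gene sets (invented): P2 is fully contained in P1, P3 is independent.
toy = {
    "P1": {"g1", "g2", "g3", "g4"},
    "P2": {"g2", "g3", "g4"},
    "P3": {"g7", "g8"},
}
controlled = control_redundancy(toy, threshold=0.75)
```

With a threshold of 0.75, the two overlapping toy pathways collapse into a single gene set while the independent one is left untouched. Lowering the threshold merges more aggressively, which is the trade-off a user-defined threshold exposes.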
