Source: Scientific Reports (U.S. National Institutes of Health literature collection)

A data mining paradigm for identifying key factors in biological processes using gene expression data



Abstract

A large volume of biological data is being generated for studying mechanisms of various biological processes. These precious data enable large-scale computational analyses to gain biological insights. However, it remains a challenge to mine the data efficiently for knowledge discovery. The heterogeneity of these data makes it difficult to consistently integrate them, slowing down the process of biological discovery. We introduce a data processing paradigm to identify key factors in biological processes via systematic collection of gene expression datasets, primary analysis of data, and evaluation of consistent signals. To demonstrate its effectiveness, our paradigm was applied to epidermal development and identified many genes that play a potential role in this process. Besides the known epidermal development genes, a substantial proportion of the identified genes are still not supported by gain- or loss-of-function studies, yielding many novel genes for future studies. Among them, we selected a top gene for loss-of-function experimental validation and confirmed its function in epidermal differentiation, proving the ability of this paradigm to identify new factors in biological processes. In addition, this paradigm revealed many key genes in cold-induced thermogenesis using data from cold-challenged tissues, demonstrating its generalizability. This paradigm can lead to fruitful results for studying molecular mechanisms in an era of explosive accumulation of publicly available biological data.
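
The abstract does not say how "consistent signals" are scored across datasets. As a purely illustrative sketch, and not the authors' method, one simple option is a vote count: for each gene, count the datasets in which it is clearly up-regulated minus those in which it is clearly down-regulated, then rank by the size of that net vote. The function name `rank_consistent_genes`, the input format (one dict of log2 fold changes per dataset), and the threshold `min_abs_lfc` are all assumptions made for this example.

```python
from collections import defaultdict


def rank_consistent_genes(datasets, min_abs_lfc=1.0):
    """Rank genes by net same-direction votes across datasets.

    datasets: list of dicts mapping gene symbol -> log2 fold change
              (hypothetical input format, one dict per dataset).
    min_abs_lfc: assumed cutoff for calling a gene changed in a dataset.
    Returns a list of (gene, net_votes) sorted by |net_votes| descending,
    with ties broken alphabetically by gene symbol.
    """
    up = defaultdict(int)
    down = defaultdict(int)
    for lfc_by_gene in datasets:
        for gene, lfc in lfc_by_gene.items():
            if lfc >= min_abs_lfc:
                up[gene] += 1
            elif lfc <= -min_abs_lfc:
                down[gene] += 1
            # Genes below the cutoff in this dataset cast no vote.

    # Net vote: datasets showing an increase minus datasets showing a decrease.
    scores = {gene: up[gene] - down[gene] for gene in set(up) | set(down)}
    return sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
```

Called on a list of per-dataset fold-change dicts, the function puts genes that change in the same direction in most datasets at the top, while genes with conflicting directions cancel toward zero. A real analysis would also need per-dataset normalization and a statistical test, which this sketch leaves out.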
