Journal: Nucleic Acids Research

The computational analysis of scientific literature to define and recognize gene expression clusters


Abstract

A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Second, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. Where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references; the second contains newly discovered genes in Drosophila melanogaster, many of which have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention, and we identify novel clusters that were not noted by the original investigators.
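The two-step procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the linkage rule, the distance measure, or the coherence score, so the code below assumes average linkage on Pearson correlation distance, uses mean pairwise cosine similarity of word counts as a stand-in coherence score, and cuts the tree with a fixed threshold. All gene names, profiles, abstract texts, and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def hierarchical_tree(profiles):
    """Step 1 (assumed variant): average-linkage agglomerative clustering.
    Returns a binary tree as nested tuples whose leaves are gene names."""
    clusters = [(gene, [gene]) for gene in profiles]  # (subtree, member genes)

    def dist(a, b):
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(1 - pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]

def with_homolog_text(abstracts, homologs):
    """Genes without literature borrow the text of a well-studied homolog."""
    docs = dict(abstracts)
    for gene, homolog in homologs.items():
        if not docs.get(gene):
            docs[gene] = abstracts.get(homolog, "")
    return docs

def cosine(c1, c2):
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    n1 = sqrt(sum(v * v for v in c1.values()))
    n2 = sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def coherence(genes, docs):
    """Stand-in coherence score: mean pairwise cosine similarity of word counts."""
    vecs = [Counter(docs.get(g, "").lower().split()) for g in genes]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0

def leaves(tree):
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])

def resolve(tree, docs, threshold=0.5):
    """Step 2(i), assumed rule: descend the tree and stop at the first node
    whose genes are coherent enough; that node becomes a cluster."""
    genes = leaves(tree)
    if isinstance(tree, str) or coherence(genes, docs) >= threshold:
        return [genes]
    return resolve(tree[0], docs, threshold) + resolve(tree[1], docs, threshold)

# Hypothetical toy data: four genes, four conditions each.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [4.2, 2.9, 2.1, 1.0],
}
abstracts = {
    "geneA": "ribosome biogenesis rRNA processing",
    "geneB": "rRNA processing ribosome assembly",
    "geneC": "heat shock protein folding chaperone",
    "geneD": "",  # no primary literature
}
homologs = {"geneD": "geneC"}  # hypothetical homology assignment

docs = with_homolog_text(abstracts, homologs)
tree = hierarchical_tree(profiles)
clusters = resolve(tree, docs)
# Step 2(ii): rank the resolved clusters by coherence, most coherent first.
ranked = sorted(((coherence(c, docs), c) for c in clusters),
                key=lambda t: t[0], reverse=True)
for score, genes in ranked:
    print(round(score, 2), sorted(genes))
```

On this toy input the root of the tree mixes two unrelated functions, so it falls below the threshold and is split into its two children, each of which is kept as a cluster. A real analysis would need a principled coherence measure and a tuned or data-driven boundary rule rather than a fixed cutoff.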
