Journal: BMC Bioinformatics

Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model



Abstract

Background: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.

Results: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results.

Conclusions: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp
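The abstract does not spell out the mixture model, so the following is only a generic illustration of the idea, not the authors' formulation. It assumes that the per-document counts of one term across a gene list's literature come from either a known background Poisson rate or an elevated rate, and it fits the elevated component with EM. The function names, the fixed background rate `lam_bg`, and the example counts are all hypothetical.

```python
import math


def poisson_pmf(k, lam):
    # Computed via logs so that larger counts do not overflow the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def fit_poisson_mixture(counts, lam_bg, n_iter=200):
    """EM for a two-component Poisson mixture with a fixed background rate.

    Illustrative only: returns (weight of the elevated component, its rate).
    Assumes lam_bg > 0 and small-to-moderate integer counts.
    """
    w = 0.5
    lam_fg = max(max(counts), lam_bg) + 1.0  # start above the background rate
    for _ in range(n_iter):
        # E-step: probability that each count came from the elevated component.
        resp = []
        for k in counts:
            p_fg = w * poisson_pmf(k, lam_fg)
            p_bg = (1.0 - w) * poisson_pmf(k, lam_bg)
            resp.append(p_fg / (p_fg + p_bg))
        # M-step: re-estimate the mixing weight and the elevated rate.
        total = sum(resp)
        w = total / len(counts)
        if total > 0:
            weighted = sum(r * k for r, k in zip(resp, counts))
            lam_fg = max(weighted / total, 1e-9)
    return w, lam_fg


# Hypothetical term counts in ten documents linked to a gene list.
counts = [0, 0, 1, 0, 9, 11, 10, 0, 1, 8]
weight, rate = fit_poisson_mixture(counts, lam_bg=0.5)
```

In a sketch like this, a term whose fitted weight is well above zero, with an elevated rate far from the background rate, would be a candidate overrepresented concept. How the published method scores and tests terms is not stated in the abstract.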
