BMC Bioinformatics

Extending pathways based on gene lists using InterPro domain signatures


Abstract

Background: High-throughput technologies such as functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique for testing whether particular pathways are statistically significantly over-represented. A shortcoming of this method, however, is that most genes investigated in such experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. It works by comparing the InterPro domain signatures of the candidate gene lists with the domain signatures of gene sets derived from known classifications, e.g. KEGG pathways.

Results: To validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we created test gene lists by randomly selecting pathway genes, removing these genes from the known pathways, and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that pathway memberships can be recovered from the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example.

Conclusion: Results based on simulation and data analysis show that domain-based pathway enrichment analysis is a very sensitive method for testing for enrichment of pathways in sparsely annotated lists of genes. An R-based software package, domainsignatures, for routinely performing this analysis on the results of high-throughput screening is available via Bioconductor.
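The idea of comparing domain signatures can be illustrated with a small sketch. This is not the domainsignatures package (which is written in R) and does not claim to reproduce the authors' statistic. It assumes that a signature is simply the set of domain identifiers found in a gene list, that similarity is measured with a Jaccard index, and that significance is estimated by resampling random gene lists of the same size. The gene names (`g1`, ...), domain labels (`D1`, ...), and function names are all hypothetical.

```python
import random

def domain_signature(genes, gene_to_domains):
    """Union of the domain IDs annotated to any gene in the list."""
    signature = set()
    for gene in genes:
        signature |= gene_to_domains.get(gene, set())
    return signature

def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, see lead-in)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def signature_p_value(candidates, pathway_genes, gene_to_domains,
                      n_perm=1000, seed=0):
    """Observed similarity and an empirical p-value from resampling.

    Random gene lists of the same size as `candidates` are drawn from
    all annotated genes; the p-value is the fraction of random lists
    whose signature is at least as similar to the pathway signature.
    """
    rng = random.Random(seed)
    universe = sorted(gene_to_domains)
    pathway_sig = domain_signature(pathway_genes, gene_to_domains)
    observed = jaccard(domain_signature(candidates, gene_to_domains),
                       pathway_sig)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(candidates))
        sim = jaccard(domain_signature(sample, gene_to_domains), pathway_sig)
        if sim >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy annotation: gene -> set of domain labels.
TOY_DOMAINS = {
    "g1": {"D1", "D2"}, "g2": {"D2", "D3"}, "g3": {"D1"},
    "g4": {"D4"}, "g5": {"D5"}, "g6": {"D6", "D7"},
    "g7": {"D8"}, "g8": {"D9"},
}
TOY_PATHWAY = ["g1", "g2"]      # annotated pathway members
TOY_CANDIDATES = ["g3", "g4"]   # sparsely annotated candidate list

similarity, p_value = signature_p_value(TOY_CANDIDATES, TOY_PATHWAY,
                                        TOY_DOMAINS)
print(similarity, p_value)
```

Here the candidate genes are not pathway members, but `g3` shares domain `D1` with the pathway, so the two signatures overlap. With real data the gene-to-domain mapping would come from InterPro annotation and the gene sets from a resource such as KEGG.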
