Journal: BMC Bioinformatics

Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms

Abstract

Background: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat-file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries.

Results: We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (on average, one third of the selected libraries were incorrect, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reports genes from outside the selected libraries and may omit genes found within them. Other errors include incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, in which libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance, and corrected the library cut-off filter.

Conclusion: Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.