Genome Biology

Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

Abstract

Background: Accurate, automatic gene identification in eukaryotic genomic DNA is more crucial than ever for efficiently exploiting the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated by the EGASP project, in which the reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that the accuracy of human annotators could be replicated in an automatic method by formalizing the rules and decisions they use in a mathematical formalism.

Results: We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNAs, ESTs, protein alignments, and exons) and the relationships between them. The graphs are analyzed, and the information they hold is processed according to rules that replicate those used by human annotators. Simple individual objects given as input to Exogean are thus combined and synthesized into complex objects such as protein-coding transcripts.

Conclusions: In the context of the EGASP project, we show that Exogean is currently the method that best reproduces protein-coding gene annotations made by human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement.
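The abstract describes DACMs only at a high level. As a rough illustration, here is a minimal Python sketch of such a graph, assuming a simplified model in which each node is an exon-level evidence object and each edge color names a relation type. The class names `Obj` and `DACM`, the colors `"splice"` and `"same_evidence"`, and the rule "a maximal chain of splice-linked exons becomes one transcript" are all hypothetical stand-ins, not Exogean's actual data model or annotation rules. The sketch shows the three properties in the name (directed edges, acyclicity, and colored multi-edges between the same node pair) and how simple objects can be combined into a transcript-level object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    """Hypothetical biological object, e.g. an exon-sized alignment block."""
    name: str
    kind: str   # e.g. "exon", "mrna", "protein_aln", "transcript"
    start: int  # genomic start coordinate
    end: int    # genomic end coordinate


class DACM:
    """Directed acyclic colored multigraph (simplified sketch).

    Each edge carries a color naming a relation type, and the same pair of
    nodes may be linked by several edges of different colors.
    """

    def __init__(self):
        self.nodes = {}  # dict used as an insertion-ordered set of Obj
        self.edges = {}  # src Obj -> list of (dst Obj, color)

    def add_node(self, obj):
        self.nodes.setdefault(obj, None)
        self.edges.setdefault(obj, [])

    def add_edge(self, src, dst, color):
        self.add_node(src)
        self.add_node(dst)
        # Refuse any edge that would close a cycle, so the graph stays acyclic.
        if self._reachable(dst, src):
            raise ValueError(f"edge {src.name}->{dst.name} would create a cycle")
        self.edges[src].append((dst, color))

    def _reachable(self, a, b):
        """Depth-first search: is b reachable from a along edges of any color?"""
        stack, seen = [a], set()
        while stack:
            node = stack.pop()
            if node == b:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(dst for dst, _ in self.edges.get(node, []))
        return False

    def maximal_paths(self, color):
        """Enumerate maximal paths that use only edges of the given color."""
        out = {n: [d for d, c in self.edges[n] if c == color] for n in self.nodes}
        has_incoming = {d for targets in out.values() for d in targets}
        paths = []

        def walk(node, path):
            if not out[node]:
                paths.append(path)
                return
            for nxt in out[node]:
                walk(nxt, path + [nxt])

        # Start only from nodes with no incoming edge of this color.
        for n in self.nodes:
            if n not in has_incoming:
                walk(n, [n])
        return paths


def synthesize(path):
    """Merge a chain of exon objects into one transcript-level object."""
    return Obj(name="+".join(o.name for o in path), kind="transcript",
               start=min(o.start for o in path), end=max(o.end for o in path))


# Usage: three exons, two alternative splice chains, plus a second edge color.
e1 = Obj("e1", "exon", 100, 200)
e2 = Obj("e2", "exon", 300, 380)
e3 = Obj("e3", "exon", 500, 650)
g = DACM()
g.add_edge(e1, e2, "splice")
g.add_edge(e2, e3, "splice")
g.add_edge(e1, e3, "splice")         # alternative chain that skips e2
g.add_edge(e1, e2, "same_evidence")  # second, differently colored e1->e2 edge
for p in g.maximal_paths("splice"):
    print(synthesize(p))
```

Running the usage lines yields two transcript objects, `e1+e2+e3` and `e1+e3`, from the same three exons. Checking for a cycle by depth-first search on every insertion is simple but costs time proportional to the graph size per edge; a real annotation pipeline would likely exploit the genomic ordering of objects instead.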
