Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus

Abstract

Motivation: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA.

Results: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants.

Bibliographic details

  • Source
    Bioinformatics, 2004, issue 7, pp. 1157-1169 (13 pages)
  • Author affiliations

    Department of Genetics, Development and Cell Biology, Iowa State University, 2112 Molecular Biology Building, Ames, Iowa 50011–3260, USA;

    BASF Plant Science NC, 26 Davis Drive, Research Triangle Park, NC 27709-3528, USA;

    NewLink Genetics, 2901 S. Loop Dr, Ames, IA 50010, USA;

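  • Indexed in: Science Citation Index (SCI, USA); Chemical Abstracts (CA, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Biological sciences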
