Journal of Molecular Biology

IDENTIFICATION OF PROTEIN CODING REGIONS IN GENOMIC DNA

Abstract

We have developed a computer program, GeneParser, which identifies and determines the fine structure of protein genes in genomic DNA sequences. The program scores all subintervals in a sequence for content statistics indicative of introns and exons, and for sites that identify their boundaries. This information is weighted by a neural network to approximate the log-likelihood that each subinterval exactly represents an intron or exon (first, internal or last). A dynamic programming algorithm is then applied to this data to find the combination of introns and exons that maximizes the likelihood function. Using this method, we can rapidly generate ranked suboptimal solutions, each of which is the optimum solution containing a given intron-exon junction. We have tested the system on a large collection of human genes. On sequences not used in training, we achieved a correlation coefficient for exon nucleotide prediction of 0.89. For a subset of G + C-rich genes, a correlation coefficient of 0.94 was achieved. We have also quantified the robustness of the method to substitution and frame-shift errors and show how the system can be optimized for performance on sequences with known levels of sequencing errors. [References: 53]
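
The dynamic programming step described in the abstract can be illustrated with a minimal Python sketch. This is not GeneParser's implementation. It makes several assumptions: the neural-network output is abstracted as a hypothetical callable `score(label, i, j)` that returns a log-likelihood-like value for the half-open interval [i, j); the sequence is assumed to begin with a first exon and end with a last exon, with at least one intron between them; and the names `best_parse`, `toy_score`, and `truth` are illustrative only.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, int, int]  # (label, start, end) for the half-open interval [start, end)
NEG_INF = float("-inf")


def best_parse(n: int, score: Callable[[str, int, int], float]) -> Tuple[float, List[Segment]]:
    """Return the highest-scoring exon/intron parse of positions [0, n).

    Assumed grammar (a simplification): first exon, then intron, then zero or
    more (internal exon, intron) pairs, then last exon.
    """
    end_exon = [NEG_INF] * (n + 1)    # best parse of [0, j) ending in a first/internal exon
    end_intron = [NEG_INF] * (n + 1)  # best parse of [0, j) ending in an intron
    back_exon: List[Tuple[str, int]] = [("", 0)] * (n + 1)
    back_intron = [0] * (n + 1)

    for j in range(1, n + 1):
        # Option 1: [0, j) is a single first exon.
        end_exon[j] = score("first", 0, j)
        back_exon[j] = ("first", 0)
        for i in range(1, j):
            # Option 2: an internal exon [i, j) following an intron that ends at i.
            s = end_intron[i] + score("internal", i, j)
            if s > end_exon[j]:
                end_exon[j], back_exon[j] = s, ("internal", i)
            # An intron [i, j) following an exon that ends at i.
            s = end_exon[i] + score("intron", i, j)
            if s > end_intron[j]:
                end_intron[j], back_intron[j] = s, i

    # Close the parse with a last exon [i, n) that follows an intron ending at i.
    best, best_i = NEG_INF, -1
    for i in range(1, n):
        s = end_intron[i] + score("last", i, n)
        if s > best:
            best, best_i = s, i
    if best_i < 0:
        raise ValueError("no valid parse for this sequence length and scoring")

    # Trace back from the last exon to position 0.
    segments: List[Segment] = [("last", best_i, n)]
    pos, state = best_i, "intron"
    while pos > 0:
        if state == "intron":
            i = back_intron[pos]
            segments.append(("intron", i, pos))
            pos, state = i, "exon"
        else:
            label, i = back_exon[pos]
            segments.append((label, i, pos))
            pos, state = i, "intron"
    segments.reverse()
    return best, segments


# Toy stand-in for the neural-network scores (an assumption for illustration):
# +1 for each position whose hidden label matches the proposed element, -1 otherwise.
truth = "EEEIIIIEEEIIEE"


def toy_score(label: str, i: int, j: int) -> int:
    want = "I" if label == "intron" else "E"
    return sum(1 if c == want else -1 for c in truth[i:j])


total, segments = best_parse(len(truth), toy_score)
print(total, segments)
```

The sketch evaluates every subinterval once, so it makes O(n²) calls to `score`, which mirrors the abstract's statement that all subintervals are scored before the optimal combination is selected. Ranked suboptimal solutions and reading-frame constraints are not modeled here.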
