Journal: Nucleic Acids Research
Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks.

Abstract

Dynamic programming (DP) is applied to the problem of precisely identifying internal exons and introns in genomic DNA sequences. The program GeneParser first scores the sequence of interest for splice sites and for these intron- and exon-specific content measures: codon usage, local compositional complexity, 6-tuple frequency, length distribution and periodic asymmetry. This information is then organized for interpretation by DP. GeneParser employs the DP algorithm to enforce the constraints that introns and exons must be adjacent and non-overlapping, and finds the highest scoring combination of introns and exons subject to these constraints. Weights for the various classification procedures are determined by training a simple feed-forward neural network to maximize the number of correct predictions. In a pilot study, the system has been trained on a set of 56 human gene fragments containing 150 internal exons in a total of 158,691 bp of genomic sequence. When tested against the training data, GeneParser precisely identifies 75% of the exons and correctly predicts 86% of coding nucleotides as coding, while only 13% of non-exon bp are predicted to be coding. This corresponds to a correlation coefficient for exon prediction of 0.85. Because of the simplicity of the network weighting scheme, generalization performance is nearly as good as with the training set.
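
The DP step described in the abstract can be illustrated with a deliberately simplified sketch. It is not GeneParser's actual recurrence: it assumes each position already carries a single combined exon score and intron score, plus a donor or acceptor score for switching state at that position, and it ignores reading frame and length distributions. All function names, parameters and the toy scores are hypothetical. Labeling every position as exon or intron makes the segments adjacent and non-overlapping by construction, and the recurrence returns the highest scoring labeling.

```python
from typing import List, Sequence, Tuple

EXON, INTRON = 0, 1


def parse_exons_introns(
    exon_score: Sequence[float],
    intron_score: Sequence[float],
    donor: Sequence[float],
    acceptor: Sequence[float],
) -> Tuple[float, str]:
    """Label each position 'E' or 'I' so that the total score is maximal.

    exon_score[i] / intron_score[i]: content score of position i in each state.
    donor[i]: score for an exon ending at i-1 and an intron starting at i.
    acceptor[i]: score for an intron ending at i-1 and an exon starting at i.
    (Hypothetical inputs; the real program combines several weighted measures.)
    """
    n = len(exon_score)
    if n == 0:
        return 0.0, ""
    content = (exon_score, intron_score)
    # switch_in[s][i]: boundary score for entering state s at position i.
    switch_in = (acceptor, donor)
    # best[s][i]: best score of any labeling of positions 0..i that ends in state s.
    best = [[0.0] * n for _ in range(2)]
    # back[s][i]: state at position i-1 on that best labeling.
    back = [[0] * n for _ in range(2)]
    best[EXON][0] = exon_score[0]
    best[INTRON][0] = intron_score[0]
    for i in range(1, n):
        for s in (EXON, INTRON):
            other = 1 - s
            stay = best[s][i - 1]
            switch = best[other][i - 1] + switch_in[s][i]
            if stay >= switch:
                best[s][i], back[s][i] = stay + content[s][i], s
            else:
                best[s][i], back[s][i] = switch + content[s][i], other
    # Trace back from the better final state.
    state = EXON if best[EXON][n - 1] >= best[INTRON][n - 1] else INTRON
    total = best[state][n - 1]
    labels: List[str] = []
    for i in range(n - 1, -1, -1):
        labels.append("E" if state == EXON else "I")
        state = back[state][i]
    return total, "".join(reversed(labels))


def to_segments(labels: str) -> List[Tuple[str, int, int]]:
    """Collapse a label string into (state, start, end) runs, end exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


if __name__ == "__main__":
    # Toy scores: positions 2-3 look intron-like, the rest exon-like.
    exon = [1, 1, -1, -1, 1, 1]
    intron = [-1, -1, 1, 1, -1, -1]
    free = [0] * 6
    score, labels = parse_exons_introns(exon, intron, free, free)
    print(score, labels, to_segments(labels))
```

With zero-cost boundaries the toy input is labeled `EEIIEE` with a total score of 6. Setting every donor and acceptor score to -5 makes switching too expensive, and the best labeling becomes a single exon run. In this simplified picture, the relative scale of the content and boundary scores plays the role of the weights that the abstract says are set by the neural network.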