Conference: IEEE 5th International Bio-Inspired Computing: Theories and Applications

Identification of coding and non-coding sequences in a complete genome using local Hölder exponent formalism and Multi-affinity analysis

Abstract

Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of relations in coding and non-coding sequences has led to new perspectives in the understanding of DNA sequences, and this has motivated us to find new methods to distinguish coding from non-coding sequences. We first introduce a number sequence representation of DNA sequences. Multi-affinity analysis and local Hölder exponent analysis are then performed on the resulting number sequence. Three suitable exponents are selected to form a parameter space: the two exponents γ(−2) and γ(6) come from Multi-affinity analysis, and the exponent h comes from the local Hölder exponent formalism. Thus, each coding or non-coding sequence can be represented by a point in the three-dimensional parameter space. For the complete genomes of many prokaryotes, the points corresponding to coding and non-coding sequences fall roughly into different regions. If the point (γ(−2), γ(6), h) for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as coding; otherwise, it is classified as non-coding. These exponents can therefore be used to distinguish coding and non-coding sequences. Fisher's discriminant algorithm is used to give the discriminant accuracies. The average discriminant accuracies p_c, p_nc, q_c and q_nc over all 51 prokaryotes obtained by the present method reach 69.08%, 83.34%, 72.08% and 83.54%, respectively.
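
The classification step described in the abstract can be illustrated with a minimal sketch of Fisher's linear discriminant applied to points in a three-dimensional parameter space. This is not the authors' implementation. The points are synthetic stand-ins for (γ(−2), γ(6), h) triples, their means and spread are invented for illustration, the function names `fit_fisher` and `is_coding` are hypothetical, and the midpoint threshold is an assumed decision rule that the abstract does not specify.

```python
import numpy as np


def fit_fisher(coding, noncoding):
    """Fit Fisher's linear discriminant; return (direction w, threshold)."""
    m_c = coding.mean(axis=0)
    m_n = noncoding.mean(axis=0)
    # Within-class scatter matrix: sum of the two per-class scatter matrices.
    d_c = coding - m_c
    d_n = noncoding - m_n
    s_w = d_c.T @ d_c + d_n.T @ d_n
    # Fisher direction w = S_w^{-1} (m_c - m_n); coding points project higher.
    w = np.linalg.solve(s_w, m_c - m_n)
    # Assumption: cut halfway between the two projected class means.
    threshold = 0.5 * (w @ m_c + w @ m_n)
    return w, threshold


def is_coding(points, w, threshold):
    """Label each row of `points` True (coding) or False (non-coding)."""
    return points @ w > threshold


# Synthetic stand-ins for (gamma(-2), gamma(6), h); not real genome data.
rng = np.random.default_rng(0)
coding = rng.normal(loc=[0.55, 0.45, 0.60], scale=0.05, size=(200, 3))
noncoding = rng.normal(loc=[0.45, 0.55, 0.50], scale=0.05, size=(200, 3))

w, threshold = fit_fisher(coding, noncoding)
acc_coding = is_coding(coding, w, threshold).mean()
acc_noncoding = 1.0 - is_coding(noncoding, w, threshold).mean()
print(f"coding accuracy: {acc_coding:.3f}, non-coding accuracy: {acc_noncoding:.3f}")
```

The two printed fractions are per-class accuracies on the synthetic points only. They are not the p_c, p_nc, q_c and q_nc values reported above, whose exact definitions are not given in the abstract.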