Journal: Acta Agronomica

Rescaled range R/S analysis application for genes prediction in the plant genome



Abstract

Gene prediction is currently one of the main challenges in genomics. Prediction makes it possible to design experiments with a high probability of finding genes of interest and to compare DNA regions of agronomic importance across genomes; it also helps to narrow the search space in databases. A statistical procedure based on R/S analysis and the Hurst coefficient was developed to characterize and predict genes and their structural components (exons and introns) in the whole eukaryotic genomes of Arabidopsis thaliana, Oryza sativa and Mus musculus. Algorithms written in Python were developed to extract, screen and model more than 80% of the gene sequences registered for these genomes in the NCBI GenBank database. The R/S analysis showed that there is structural order in the distribution of nucleotides, with sequences predominantly exhibiting memory, or long-range dependence. The memory structure varies with sequence type and species genome. Gene and exon sequences from the plant genomes analyzed showed persistent behavior, whereas intron sequences showed anti-persistent behavior; in the animal genome, by contrast, all three types of sequence showed persistent behavior. According to the parameters obtained from the R/S analysis, the distribution pattern of genome sequences was replicated in a statistically similar manner in each chromosome of a given species, which constitutes fundamental evidence of scale invariance: each chromosome is itself a smaller-scale statistical replica of the whole genome. The parameters provided compact criteria for deriving sequence predictors (classifiers), which reached average sensitivity and specificity higher than 81% and 70%, respectively. This procedure could be tried in other genomes and used as a criterion to increase selection efficiency in plant breeding programs.
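
The abstract does not give the details of the R/S procedure, so the following is only a minimal sketch of classical rescaled range analysis, not the authors' implementation. It assumes a purine/pyrimidine encoding of the nucleotides (A, G as +1; C, T as -1), non-overlapping windows whose sizes double at each step, and a least-squares fit of log(R/S) against log(window size). The function names `encode_dna`, `rescaled_range` and `hurst_rs` are hypothetical. In the usual reading of the Hurst coefficient H, values above 0.5 indicate persistent behavior and values below 0.5 indicate anti-persistent behavior.

```python
import numpy as np


def encode_dna(seq):
    """Map a DNA string to a numeric series.

    Assumed encoding (not from the paper): purines A/G -> +1,
    pyrimidines C/T -> -1. Any other symbol (e.g. N) is skipped.
    """
    mapping = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}
    return np.array([mapping[b] for b in seq.upper() if b in mapping])


def rescaled_range(window):
    """R/S of one window: range of cumulative mean deviations / std dev."""
    window = np.asarray(window, dtype=float)
    cumulative = np.cumsum(window - window.mean())
    r = cumulative.max() - cumulative.min()
    s = window.std()
    # A constant window has zero spread, so R/S is undefined there.
    return r / s if s > 0 else np.nan


def hurst_rs(series, min_window=8):
    """Estimate the Hurst coefficient as the slope of log(R/S) vs log(n)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        # Split the series into non-overlapping windows of this size.
        windows = [series[i:i + size] for i in range(0, n - size + 1, size)]
        rs = np.array([rescaled_range(w) for w in windows])
        rs = rs[np.isfinite(rs)]
        if rs.size:
            log_sizes.append(np.log(size))
            log_rs.append(np.log(rs.mean()))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for an R/S fit")
    slope, _intercept = np.polyfit(log_sizes, log_rs, 1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, uncorrelated sequence used purely for illustration.
    random_seq = "".join(rng.choice(list("ACGT"), size=4096))
    print("H (random sequence):", hurst_rs(encode_dna(random_seq)))
```

On short series the R/S estimate is known to be biased upward, so an uncorrelated sequence will typically give a value somewhat above 0.5 rather than exactly 0.5. Any classifier built on such parameters would need thresholds calibrated on annotated sequences.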
