Journal: Procedia Computer Science

Classification of Eukaryotic Splice-junction Genetic Sequences Using Averaged One-dependence Estimators with Subsumption Resolution

Abstract

DNA is the building block of life and contains the encoded genetic instructions for building living organisms. Because proteins are constructed according to the genetic instructions encoded in DNA, errors in RNA synthesis and in translation into proteins can cause genetic disorders. Understanding and recognizing genetic sequences is therefore one step towards treating these disorders. Since the discovery of DNA, interest in the problem of genetic sequence recognition has grown, motivated by its enormous potential to cure a wide range of genetic disorders. The completion of the Human Genome Project in the last decade has generated strong demand for computational analysis techniques that can fully exploit the acquired human genome database. This paper describes a state-of-the-art machine-learning approach, averaged one-dependence estimators with subsumption resolution, for recognizing an important class of genetic sequences known as eukaryotic splice junctions. To lower the computational complexity and increase the generalization capability of the system, we employ a genetic algorithm to select the relevant nucleotides that are directly responsible for splice-junction recognition. We carried out experiments on a dataset extracted from the biological literature. The proposed system achieved an accuracy of 96.68% in classifying splice-junction genetic sequences. The experimental results demonstrate the efficacy of our framework and encourage us to apply it to other types of genetic sequences.
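
To make the classifier named in the abstract more concrete, the following is a minimal sketch of plain averaged one-dependence estimators (AODE) applied to fixed-length nucleotide strings. It is not the authors' implementation: it omits both subsumption resolution and the genetic-algorithm nucleotide selection, and the class name `AODE`, the `min_count` parameter, the Laplace smoothing, the labels `EI` and `N`, and the toy sequences are all assumptions made up for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet


class AODE:
    """Plain AODE over fixed-length sequences (illustrative sketch only)."""

    def __init__(self, min_count=1):
        # A position/value pair acts as a "super-parent" only if it was
        # seen at least min_count times in training.
        self.min_count = min_count

    def fit(self, seqs, labels):
        self.classes = sorted(set(labels))
        self.n = len(seqs)
        self.single = Counter()  # (y, i, a) -> count
        self.pair = Counter()    # (y, i, a, j, b) -> count
        self.attr = Counter()    # (i, a) -> count
        for s, y in zip(seqs, labels):
            for i, a in enumerate(s):
                self.single[(y, i, a)] += 1
                self.attr[(i, a)] += 1
                for j, b in enumerate(s):
                    if j != i:
                        self.pair[(y, i, a, j, b)] += 1
        return self

    def scores(self, s):
        k, v = len(self.classes), len(ALPHABET)
        raw = {}
        for y in self.classes:
            total = 0.0
            for i, a in enumerate(s):
                if self.attr[(i, a)] < self.min_count:
                    continue
                # Laplace-smoothed estimate of P(y, x_i)
                p = (self.single[(y, i, a)] + 1) / (self.n + k * v)
                # times the product of P(x_j | y, x_i) over all j != i
                for j, b in enumerate(s):
                    if j != i:
                        p *= (self.pair[(y, i, a, j, b)] + 1) / (
                            self.single[(y, i, a)] + v
                        )
                total += p
            raw[y] = total
        z = sum(raw.values())
        if z == 0:  # no usable super-parent: fall back to uniform scores
            return {y: 1 / k for y in self.classes}
        return {y: p / z for y, p in raw.items()}

    def predict(self, s):
        sc = self.scores(s)
        return max(sc, key=sc.get)


# Made-up toy data, not taken from the paper's dataset.
train = [("AGGT", "EI"), ("CGGT", "EI"), ("AGGT", "EI"),
         ("TCAA", "N"), ("CCAT", "N"), ("TTAC", "N")]
model = AODE().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("AGGT"), model.scores("AGGT"))
```

Each sequence position takes a turn as the single extra parent of every other position, and the resulting one-dependence estimates are summed, which relaxes the naive Bayes independence assumption without any structure search.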
