Journal: Evolutionary Bioinformatics Online (U.S. National Institutes of Health literature collection)

Decision Tree Algorithm–Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging



Abstract

DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs, which creates a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, the use of barcode sequences for identification is rarely emphasized at present. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be an ideal target for feature selection when discriminating between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To test this hypothesis, we evaluated a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. During decision tree construction, these SNPs were used to set up decision rules that assign the sequences to 2 groups, level by level. After algorithm processing, 37 nodes and 31 loci were required to discriminate the 38 species. Finally, sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating the 38 Brassicaceae species, based on the SNP pattern selected by the decision tree in the RSB method. Taken together, this study provides the rationale that the SNP aspect of the rbcL DNA barcode is a useful and effective sequence for tagging 38 Brassicaceae species.
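The level-by-level splitting described in the abstract can be illustrated with a minimal sketch. It is not the authors' RSB implementation: the toy sequences, the species names (`sp_A` to `sp_D`), the function names, and the "most balanced split" rule for choosing a locus are all assumptions made for illustration. The sketch finds variable columns in a set of aligned sequences, builds a binary tree in which each node tests one SNP locus, and reads off a short SNP barcode from the loci the tree used.

```python
from collections import Counter

# Hypothetical aligned sequences (equal length); not data from the study.
toy = {
    "sp_A": "ACGTAC",
    "sp_B": "ACGTTC",
    "sp_C": "ATGTAC",
    "sp_D": "ATGTTG",
}


def find_snp_columns(aligned):
    """Return the column indices at which the sequences do not all agree."""
    seqs = list(aligned.values())
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def build_tree(aligned, names, snps):
    """Split `names` into 2 groups per node until each group is one species.

    Assumed rule: at each node pick the SNP column whose most common base
    divides the group most evenly (the paper's criterion may differ).
    """
    if len(names) == 1:
        return names[0]  # leaf: a single species
    best = None
    for col in snps:
        counts = Counter(aligned[n][col] for n in names)
        if len(counts) < 2:
            continue  # column is constant within this group
        base, count = counts.most_common(1)[0]
        imbalance = abs(2 * count - len(names))
        if best is None or imbalance < best[0]:
            best = (imbalance, col, base)
    if best is None:
        return sorted(names)  # leaf: species these SNPs cannot separate
    _, col, base = best
    match = [n for n in names if aligned[n][col] == base]
    other = [n for n in names if aligned[n][col] != base]
    return {
        "locus": col,
        "base": base,
        "match": build_tree(aligned, match, snps),
        "other": build_tree(aligned, other, snps),
    }


def count_nodes(tree):
    """Count decision (internal) nodes."""
    if not isinstance(tree, dict):
        return 0
    return 1 + count_nodes(tree["match"]) + count_nodes(tree["other"])


def used_loci(tree):
    """Collect the SNP columns that appear in at least one decision node."""
    if not isinstance(tree, dict):
        return set()
    return {tree["locus"]} | used_loci(tree["match"]) | used_loci(tree["other"])


def barcodes(aligned, loci):
    """Concatenate each sequence's bases at the selected loci."""
    cols = sorted(loci)
    return {name: "".join(seq[c] for c in cols) for name, seq in aligned.items()}


snps = find_snp_columns(toy)
tree = build_tree(toy, sorted(toy), snps)
tags = barcodes(toy, used_loci(tree))
print(count_nodes(tree), sorted(used_loci(tree)), tags)
```

A binary tree that fully separates k species always has k - 1 decision nodes, so the 37 nodes reported for 38 species are what one would expect from two-way splits. In the toy data, 4 species need 3 nodes, and the tree uses only 2 of the 3 variable columns.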
