Journal: BioData Mining

matK-QR classifier: a patterns based approach for plant species identification


Abstract

Background: DNA barcoding is a widely used and highly efficient approach that facilitates rapid and accurate identification of plant species based on a short, standardized segment of the genome. The nucleotide sequences of the maturaseK (matK) and ribulose-1,5-bisphosphate carboxylase (rbcL) marker loci are commonly used in plant species identification. Here, we present a new and highly efficient approach for identifying a unique set of discriminating nucleotide patterns to generate a signature (i.e., a regular expression) for plant species identification.

Methods: To generate molecular signatures, we used matK and rbcL loci datasets, which encompass 125 plant species in 52 genera reported by the CBOL plant working group. Initially, we performed a Multiple Sequence Alignment (MSA) of all species, followed by construction of a Position Specific Scoring Matrix (PSSM) for both loci to estimate the percentage of discrimination among species. Further, we detected Discriminating Patterns (DP) at the genus and species level using the PSSM for the matK dataset. Combining the DPs with the distances between consecutive patterns, we generated molecular signatures for each species. Finally, we compared these signatures with existing methods, including BLASTn, Support Vector Machines (SVM), JRip-RIPPER, J48 (C4.5 algorithm), and Naïve Bayes (NB), against the NCBI GenBank matK dataset.

Results: Because matK gave higher discrimination success than rbcL, we selected the matK gene for signature generation. We generated signatures for 60 species based on the discriminating patterns identified at the genus and species level. Our comparative assessment suggests that 46 out of 60 species could be correctly identified using the generated signatures, followed by BLASTn (34 species), SVM (18 species), C4.5 (7 species), NB (4 species) and RIPPER (3 species). As a final outcome of this study, we converted the signatures into QR codes and developed a software tool, matK-QR Classifier ( http://www.neeri.res.in/matk_classifier/index.htm ), which searches for signatures in query matK gene sequences and predicts the corresponding plant species.

Conclusions: This novel approach of employing pattern-based signatures opens new avenues for the classification of species. In addition to existing methods, we believe that matK-QR Classifier would be a valuable tool for molecular taxonomists, enabling precise identification of plant species.
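
The abstract describes a signature as a regular expression built from discriminating patterns and the distances between consecutive patterns, but it does not give the exact signature format. The following Python sketch illustrates one plausible reading under stated assumptions: each distance is treated as an exact number of intervening bases, and the function names (`build_signature`, `classify`), species labels, and sequences are all hypothetical toy values, not data or code from the paper.

```python
import re


def build_signature(patterns, gaps):
    """Join discriminating patterns with fixed-length gaps into one regex.

    patterns: short nucleotide strings (toy values, not real matK patterns).
    gaps: gaps[i] is the assumed number of bases between patterns[i]
          and patterns[i + 1].
    """
    if len(gaps) != len(patterns) - 1:
        raise ValueError("need exactly one gap between each pair of patterns")
    parts = [re.escape(patterns[0])]
    for gap, pattern in zip(gaps, patterns[1:]):
        # Any base (or N for an ambiguous call), repeated exactly `gap` times.
        parts.append("[ACGTN]{%d}" % gap)
        parts.append(re.escape(pattern))
    return re.compile("".join(parts))


def classify(sequence, signatures):
    """Return the names of all species whose signature occurs in the sequence."""
    seq = sequence.upper()
    return [name for name, sig in signatures.items() if sig.search(seq)]


# Made-up signatures for illustration only.
SIGNATURES = {
    "species_A": build_signature(["ATGG", "CCTA"], [3]),
    "species_B": build_signature(["ATGG", "GGTT"], [3]),
}

if __name__ == "__main__":
    query = "ttatggacgcctaaa"
    print(classify(query, SIGNATURES))
```

Returning a list rather than a single name keeps ambiguous cases visible: a query that matches no signature yields an empty list, and one that matches several signatures reports all of them.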
