Journal: 《林业科学》 (Scientia Silvae Sinicae)

A Method for Identifying Osmanthus fragrans Cultivars Based on the Random Forest Algorithm and SRAP Molecular Markers

         

Abstract

[Objective] Osmanthus fragrans cultivars are difficult to identify. In nursery stock production and landscape application, cultivars are often mixed up or inferior stock is passed off as superior, and conventional DNA fingerprinting is poorly suited to cultivar identification. This study proposes an identification method based on the random forest algorithm and SRAP molecular markers that allows simple, rapid, and accurate identification of O. fragrans cultivars.

[Method] DNA was extracted from 45 O. fragrans cultivars or variant types and amplified by PCR with 90 SRAP primer pairs. The amplified fragments were detected by capillary electrophoresis, and primer pairs with high polymorphism and stable amplification were selected. For each selected primer pair, the polymorphism information content (PIC), number of band patterns, number of effective band patterns, discriminating power (D), chi-square value of the band pattern distribution (χ²), and number of indistinguishable sample pairs (x) were calculated. Locus data from primer-pair combinations that could discriminate all cultivars were used as training sets to build classification models based on the random forest algorithm. The best models were then selected according to their generalization ability and classification quality.

[Result] Ten SRAP primer pairs were selected, with a mean PIC of 0.26, a mean number of band patterns of 33.9, a mean number of effective band patterns of 26.6, a mean D of 0.97, a mean χ² of 21.07, and a mean x of 28.2. Eight classification models (rf1 to rf8) were built, each from a combination of two SRAP primer pairs. All eight models discriminated all cultivars completely, and their out-of-bag (OOB) error estimates ranged from 0.0044 to 0.0139. Models rf5 and rf3 had the strongest generalization ability, and rf8 the weakest. Model rf1 had the best classification quality, followed by rf3, rf4, rf5, and rf7; rf2, rf6, and rf8 had the worst.

[Conclusion] Classification models rf1, rf3, rf4, rf5, and rf7 have the best classification ability. They use the SRAP primer-pair combinations me1/em3 + me9/em6, me4/em5 + me9/em6, me4/em8 + me9/em6, me6/em9 + me9/em6, and me5/em5 + me9/em6, respectively. Besides the discriminating power of the individual primer pairs, the correlation between the selected primer pairs also significantly affects the classification ability of a model: the weaker the correlation, the stronger the classification ability. The method proposed in this study allows simple, rapid, and accurate identification of O. fragrans cultivars and meets the cultivar identification requirements of nursery stock production, promotion and application, and germplasm conservation.
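The primer screening step described in the abstract can be illustrated with a small sketch. The abstract does not give formulas for D or x, so the code below assumes a common definition: x is the number of sample pairs that share an identical band pattern, and D = 1 - x / C(N, 2) is the probability that two randomly chosen samples differ. Under this assumption the reported means are mutually consistent for N = 45, since C(45, 2) = 990 and 1 - 28.2 / 990 ≈ 0.97. All cultivar names, primer-pair names, and 0/1 band scores in the example are hypothetical toy data, not data from the study, and the random forest training step itself is not shown.

```python
from collections import Counter
from itertools import combinations
from math import comb


def indistinguishable_pairs(patterns):
    """Count sample pairs sharing an identical band pattern (x, assumed definition)."""
    counts = Counter(patterns.values())
    return sum(comb(n, 2) for n in counts.values())


def discriminating_power(patterns):
    """D = 1 - x / C(N, 2): chance that two random samples differ (assumed definition)."""
    n = len(patterns)
    return 1 - indistinguishable_pairs(patterns) / comb(n, 2)


def combine(*pattern_sets):
    """Concatenate the band patterns of several primer pairs, sample by sample."""
    names = pattern_sets[0].keys()
    return {name: sum((p[name] for p in pattern_sets), ()) for name in names}


def discriminating_combinations(primers, size=2):
    """Return primer-pair combinations whose joint patterns separate every sample."""
    found = []
    for combo in combinations(sorted(primers), size):
        joint = combine(*(primers[name] for name in combo))
        if indistinguishable_pairs(joint) == 0:
            found.append(combo)
    return found


# Hypothetical toy data: 0/1 band scores for 4 samples and 3 primer pairs.
toy = {
    "pairA": {"cv1": (1, 0), "cv2": (1, 0), "cv3": (0, 1), "cv4": (0, 1)},
    "pairB": {"cv1": (1,), "cv2": (0,), "cv3": (1,), "cv4": (0,)},
    "pairC": {"cv1": (1, 1), "cv2": (1, 1), "cv3": (1, 1), "cv4": (0, 0)},
}

for name, patterns in toy.items():
    print(name, indistinguishable_pairs(patterns), round(discriminating_power(patterns), 3))
print(discriminating_combinations(toy))
```

In this toy data no single primer pair separates all four samples, but the combination of pairA and pairB does. The joint 0/1 band matrix of such a combination is the kind of locus data that the abstract describes as the training set for a random forest model.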
