Journal: BMC Bioinformatics

An application of kernel methods to variety identification based on SSR markers genetic fingerprinting



Abstract

Background: In crop production systems, genetic markers are increasingly used to distinguish individuals within a larger population based on their genetic make-up. Supervised approaches cannot be applied directly to genotyping data because of the specific nature of those data, which are neither continuous, nor nominal, nor ordinal, but only partially ordered. A strategy is therefore needed to encode the polymorphism between samples so that known supervised approaches can be applied. Moreover, finding a minimal set of molecular markers with optimal ability to discriminate, for example between given groups of varieties, is important because the genotyping process can be costly in terms of laboratory consumables, labor, and time. This feature selection problem also needs special care because of the specific nature of the data.

Results: An approach that encodes SSR polymorphisms in a positive definite kernel is presented, which allows any kernel-based supervised method to be used. The polymorphism between samples is encoded through the Nei-Li genetic distance, which is shown to define a positive definite kernel between the genotyped samples. In addition, a greedy feature selection algorithm for selecting SSR marker kits is presented, so that economical and efficient prediction models for discrimination can be built. The algorithm is a filter method and outperforms other filter methods adapted to this setting. When combined with kernel linear discriminant analysis, or with kernel principal component analysis followed by linear discriminant analysis, the approach leads to very satisfactory prediction models.

Conclusions: The main advantage of the approach is its flexible way of encoding polymorphisms in a kernel. When combined with a feature selection algorithm that yields a few specific markers, it leads to accurate and economical identification models based on SSR genotyping.
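To make the kernel idea in the Results concrete, here is a minimal sketch of a Nei-Li similarity matrix between genotyped samples. The abstract does not give the formula or say how markers are combined, so several things here are assumptions: the standard Nei-Li (Dice) coefficient 2|A ∩ B| / (|A| + |B|) is used, each sample is represented as a pooled set of (marker, allele) pairs, and the function names and toy genotypes are invented for illustration. It is not the authors' implementation.

```python
def nei_li_similarity(a, b):
    """Nei-Li (Dice) similarity between two sets of (marker, allele) pairs."""
    if not a and not b:
        # Convention chosen for this sketch: two empty profiles are identical.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def nei_li_gram(samples):
    """Pairwise similarity matrix, usable as a precomputed kernel."""
    return [[nei_li_similarity(s, t) for t in samples] for s in samples]


# Toy genotypes (invented): markers m1 and m2, alleles as fragment sizes in bp.
samples = [
    frozenset({("m1", 150), ("m1", 154), ("m2", 201)}),
    frozenset({("m1", 150), ("m2", 201), ("m2", 205)}),
    frozenset({("m1", 158), ("m2", 209)}),
]

K = nei_li_gram(samples)                     # similarity (kernel) matrix
D = [[1.0 - k for k in row] for row in K]    # corresponding Nei-Li distance
```

A matrix such as K can be passed to any method that accepts a precomputed kernel, which is the sense in which the encoding opens the door to kernel supervised methods. Restricting the (marker, allele) pairs to a subset of markers before computing K is one simple way to evaluate a candidate marker kit.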
