Journal: Animal

Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds



Abstract

Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers that differentiate breeds or populations for several downstream applications. A few statistical approaches have been proposed to identify the most discriminating genetic markers among thousands of genotyped SNPs. In this work, we compared several methods of SNP preselection (Delta, Fst and principal component analysis (PCA)) together with Random Forest classification to analyse SNP data from six dairy cattle breeds, including cosmopolitan breeds (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated for their ability to discriminate the breeds as a whole and breed by breed, and for the linkage disequilibrium within each panel. The results showed that for the 48-SNP panels the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselecting chromosome by chromosome) identified informative SNPs that were particularly useful for assigning the minor breeds: it reached the lowest out-of-bag error even for Cinisara, whose error was quite high in all other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several of the selected SNPs are located near genes that affect breed-specific phenotypic traits (coat colour and stature) or are associated with production traits. In general, our results demonstrate the usefulness of Random Forest in combination with other reduction techniques for identifying population-informative SNPs.
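As a rough illustration of the preselection step, the sketch below ranks SNPs by two of the statistics named in the abstract. It is not the authors' pipeline: the abstract does not give the exact estimators, so the code assumes Delta is the mean absolute allele-frequency difference over breed pairs and Fst is the simple (H_T - H_S) / H_T form with equal breed weights. The function names, the 0/1/2 genotype coding and the toy data are all hypothetical.

```python
from itertools import combinations


def allele_freq(genotypes):
    """Frequency of the counted allele, from genotypes coded 0/1/2."""
    return sum(genotypes) / (2 * len(genotypes))


def mean_pairwise_delta(freqs):
    """Assumed Delta: mean |p_i - p_j| over all pairs of breeds."""
    pairs = list(combinations(freqs, 2))
    return sum(abs(p - q) for p, q in pairs) / len(pairs)


def fst(freqs):
    """Assumed Fst for one biallelic SNP: (H_T - H_S) / H_T, equal weights."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic SNP carries no information
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-breed
    return (h_t - h_s) / h_t


def rank_snps(data, score, top_k):
    """data maps snp_id -> {breed: [genotypes]}; returns the top_k SNP ids."""
    scored = {
        snp: score([allele_freq(g) for g in breeds.values()])
        for snp, breeds in data.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Hypothetical toy data: three SNPs genotyped in two breeds of four animals.
toy = {
    "snp_fixed": {"A": [2, 2, 2, 2], "B": [0, 0, 0, 0]},
    "snp_mild": {"A": [1, 1, 2, 1], "B": [1, 0, 1, 1]},
    "snp_flat": {"A": [1, 1, 1, 1], "B": [1, 1, 1, 1]},
}

panel_delta = rank_snps(toy, mean_pairwise_delta, top_k=2)
panel_fst = rank_snps(toy, fst, top_k=2)
```

In a workflow like the one described, a preselected subset of this kind would then be passed to a Random Forest classifier, with the out-of-bag error used to compare panels. That step is omitted here because the abstract does not specify the classifier settings.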
