BMC Bioinformatics

Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique


Abstract

The number of porcine single nucleotide polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. For the breed classification problem, however, one needs a much smaller set of porcine-classifying SNPs (PCSNPs) that can still accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods. We experimented with different combinations of feature selection methods, including information gain, conventional and modified genetic algorithms, and a frequency feature selection method that we developed, each paired with a common classifier, the Support Vector Machine, to evaluate performance. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia, including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand. The best combination of feature selection methods (a hybrid of information gain, modified genetic algorithm, and frequency feature selection) reduced the number of candidate PCSNPs to only 1.62% (164 PCSNPs) of the total number of SNPs (10,210 SNPs) while maintaining a high classification accuracy (95.12%). Moreover, this PCSNP set performed nearly identically to larger data sets and even to the entire data set. In addition, most PCSNPs matched a set of 94 genes in the PANTHER pathway, conforming to a suggestion by the Porcine Genomic Sequencing Initiative. The best hybrid method thus provided a sufficiently small number of porcine SNPs that accurately classified swine breeds.
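To make the first stage of the pipeline concrete, the following is a minimal sketch of ranking SNPs by information gain with respect to breed labels. It is an illustration under stated assumptions, not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_snps`), the 0/1/2 genotype coding, and the toy data are all hypothetical, and the genetic algorithm, frequency feature selection, and Support Vector Machine stages described in the abstract are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(genotypes, breeds):
    """H(breed) - H(breed | genotype) for a single SNP column."""
    total = len(breeds)
    groups = {}
    for genotype, breed in zip(genotypes, breeds):
        groups.setdefault(genotype, []).append(breed)
    # Weighted average of the breed entropy within each genotype group.
    conditional = sum(len(bs) / total * entropy(bs) for bs in groups.values())
    return entropy(breeds) - conditional


def rank_snps(snp_columns, breeds, top_k):
    """Return the names of the top_k SNPs, highest information gain first."""
    ranked = sorted(
        snp_columns,
        key=lambda name: information_gain(snp_columns[name], breeds),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up toy data: four pigs, two breeds, genotypes coded 0/1/2 (assumed coding).
breeds = ["A", "A", "B", "B"]
snps = {
    "snp_informative": [0, 0, 2, 2],  # separates the two breeds perfectly
    "snp_noise": [0, 1, 0, 1],        # carries no breed information
    "snp_partial": [0, 0, 0, 2],      # carries some breed information
}
top = rank_snps(snps, breeds, top_k=2)
print(top)
```

In a hybrid scheme of the kind the abstract describes, a ranking like this would typically serve only as a first filter, with the surviving SNPs passed on to a wrapper search and a classifier for evaluation.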
