Scientific Reports

Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants

Abstract

Although technology has succeeded in making genome sequencing routine, it has created new challenges for the data analyst. Genome-scale surveys of human variation generate far more data than can be characterized in the laboratory. Statistical learning methods that use functional annotations as predictors have been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which differ in their statistical learning algorithms and in the quantity, type and coding of their predictors. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best at prioritizing variants in data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. The results suggest that all methods achieve considerable and similar predictive accuracy in test-set data (AUCs 0.64–0.71), but performance varies more when they are applied to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations appear to have similar potential to enrich true risk variants in genome-scale datasets; however, none offers more than an incremental improvement in prediction. We discuss how methods for risk variant prediction might evolve to address the impending bottleneck created by the new generation of genome re-sequencing studies.
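The abstract does not name the three procedures or list their annotations, so the following is a hypothetical toy, not the authors' method. It illustrates the shared idea under a simple assumption: each variant is ranked by a weighted sum of binary functional annotations, and the ranking is evaluated with the AUC (the probability that a randomly chosen risk variant outscores a randomly chosen non-risk variant) and with top-k enrichment. The annotation names (`coding`, `eQTL`, `DNase_HS`, `conserved`), the weights and the labels are all made up; a real procedure would learn the weights with a statistical learning algorithm rather than fix them by hand.

```python
"""Toy sketch: rank variants by functional annotations and score the ranking.

Hypothetical throughout: annotation names, weights and labels are invented
for illustration and do not come from the paper or any published tool.
"""
from typing import Dict, List, Sequence, Tuple

# Hypothetical hand-set weights. A real procedure would learn these from
# training data (e.g. with logistic regression) instead of fixing them.
WEIGHTS: Dict[str, float] = {"coding": 1.0, "eQTL": 0.8, "DNase_HS": 0.5, "conserved": 0.7}

# Hypothetical variants: (binary annotations carried, 1 = known risk variant).
VARIANTS: List[Tuple[Dict[str, int], int]] = [
    ({"coding": 1, "conserved": 1}, 1),
    ({"eQTL": 1, "DNase_HS": 1}, 1),
    ({"DNase_HS": 1}, 0),
    ({"eQTL": 1}, 1),
    ({}, 0),
    ({"conserved": 1}, 0),
    ({"DNase_HS": 1, "conserved": 1}, 0),
    ({}, 0),
    ({}, 1),  # a risk variant that carries none of the annotations
]


def annotation_score(annotations: Dict[str, int], weights: Dict[str, float]) -> float:
    """Linear score: weighted sum of the variant's binary annotations."""
    return sum((weights.get(name, 0.0) * value for name, value in annotations.items()), 0.0)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random risk variant outscores a random non-risk
    variant (Mann-Whitney formulation); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one risk and one non-risk variant")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_k_enrichment(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Risk-variant rate among the k top-scored variants divided by the
    overall rate; values above 1 mean the ranking enriches risk variants.
    Ties at the cut-off keep input order (Python's sort is stable)."""
    if k <= 0 or sum(labels) == 0:
        raise ValueError("need k > 0 and at least one risk variant")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = sum(y for _, y in ranked[:k])
    return (hits * len(labels)) / (k * sum(labels))


if __name__ == "__main__":
    scores = [annotation_score(a, WEIGHTS) for a, _ in VARIANTS]
    labels = [y for _, y in VARIANTS]
    print(f"AUC = {auc(scores, labels):.2f}")
    print(f"top-3 enrichment = {top_k_enrichment(scores, labels, 3):.2f}")
```

For the toy data this prints `AUC = 0.75` and `top-3 enrichment = 1.50`; these numbers say nothing about the methods compared in the paper. The pairwise AUC loop is quadratic in the number of variants, so at genome scale a rank-based computation of the same quantity would be the practical choice.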