Journal: The American Journal of Human Genetics

DOMINO: Using Machine Learning to Predict Genes Associated with Dominant Disorders


Abstract

In contrast to recessive conditions with biallelic inheritance, identification of dominant (monoallelic) mutations for Mendelian disorders is more difficult, because of the abundance of benign heterozygous variants that act as massive background noise (typically, in a ~400:1 excess ratio). To reduce this overflow of false positives in next-generation sequencing (NGS) screens, we developed DOMINO, a tool assessing the likelihood for a gene to harbor dominant changes. Unlike commonly used predictors of pathogenicity, DOMINO takes into consideration features that are the properties of genes, rather than of variants. It uses a machine-learning approach to extract discriminant information from a broad array of features (N = 432), including genomic data, intra- and interspecies conservation, gene expression, protein-protein interactions, protein structure, etc. DOMINO’s iterative architecture includes a training process on 985 genes with well-established inheritance patterns for Mendelian conditions, and repeated cross-validation that optimizes its discriminant power. When validated on 99 newly discovered genes with pathogenic mutations, the algorithm displays an excellent final performance, with an area under the curve (AUC) of 0.92. Furthermore, unsupervised analysis by DOMINO of real sets of NGS data from individuals with intellectual disability or epilepsy correctly recognizes known genes and predicts 9 new candidates, with very high confidence. In summary, DOMINO is a robust and reliable tool that can infer dominance of candidate genes with high sensitivity and specificity, making it a useful complement to any NGS pipeline dealing with the analysis of the morbid human genome.
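As an aside on the AUC figure quoted above: the area under the ROC curve can be read as the probability that a randomly chosen positive (here, a dominant gene) receives a higher score than a randomly chosen negative. The sketch below illustrates that definition only. It is not DOMINO's code, and the function name `roc_auc` and the toy scores and labels are hypothetical values invented for the illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical gene-level scores (NOT DOMINO output).
# Label 1 = gene with dominant inheritance, 0 = gene without.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

On this toy input, eight of the nine positive/negative pairs are ordered correctly, so the value is 8/9. An AUC of 0.92 on the 99 validation genes would mean, under the same reading, that about 92% of such pairs are ordered correctly.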
