Pacific Symposium on Biocomputing

Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder

Abstract

Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ1-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders.
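
The classification step named in the abstract (an ℓ1-regularized logistic regression scored by AUC-ROC on a held-out set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic, and the feature encoding (centered 0/1 variant indicators), the penalty strength `lam`, the learning rate, the split sizes, and every function name are assumptions made for illustration. The fit uses proximal gradient descent, one of several ways to solve this problem.

```python
import math
import random

random.seed(0)

N_FEATURES = 20     # hypothetical number of variant sites
N_INFORMATIVE = 3   # only the first three carry signal in this toy data


def make_sample():
    """Return (features, label) for one synthetic individual."""
    y = 1 if random.random() < 0.5 else 0
    x = []
    for j in range(N_FEATURES):
        if j < N_INFORMATIVE:
            p = 0.8 if y == 1 else 0.2   # variant enriched in cases
        else:
            p = 0.5                      # uninformative site
        x.append(1.0 if random.random() < p else 0.0)
    return x, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t * |v|; this step produces exact zeros."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def predict(w, b, x):
    """Predicted probability of the positive class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_l1_logistic(X, y, lam=0.05, lr=1.0, epochs=200):
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n   # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def auc_roc(scores, labels):
    """Probability that a random case outscores a random control (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


data = [make_sample() for _ in range(500)]
train, test = data[:300], data[300:]

# Center each feature using training-set means only, then reuse them on the test set.
means = [sum(x[j] for x, _ in train) / len(train) for j in range(N_FEATURES)]


def center(x):
    return [xj - mj for xj, mj in zip(x, means)]


X_train, y_train = [center(x) for x, _ in train], [y for _, y in train]
X_test, y_test = [center(x) for x, _ in test], [y for _, y in test]

w, b = fit_l1_logistic(X_train, y_train)
test_auc = auc_roc([predict(w, b, x) for x in X_test], y_test)
selected = [j for j, wj in enumerate(w) if wj != 0.0]

print("held-out AUC-ROC:", round(test_auc, 3))
print("features with nonzero weight:", selected)
```

The ℓ1 penalty tends to drive the weights of uninformative features exactly to zero, so the list of surviving features doubles as a crude variant-selection step. The numbers this toy example prints say nothing about the performance reported in the abstract.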
