Conference: Pacific Symposium on Biocomputing

SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs

Abstract

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging because most single nucleotide polymorphisms (SNPs) fall into non-coding regions of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of the coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological functions. To address this challenge, we propose SNPs2ChIP, a method to infer the biological functions of non-coding variants through unsupervised statistical learning applied to publicly available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to 652 ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as a reference, we developed SNPs2ChIP as a pipeline that takes one or more genomic regions as input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological functions. We systematically applied SNPs2ChIP to publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach of leveraging latent patterns across genome-wide epigenetic datasets to infer biological functions will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.
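The core idea of the abstract, decomposing a ChIP-seq signal matrix with singular value decomposition and then scoring a query region against the resulting latent factors, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy matrix is random rather than real ChIP-seq data, the function names `fit_latent_factors` and `region_factor_scores` are hypothetical, and the score (each factor's share of a region's squared projection) is just one plausible choice, since the abstract does not define the quantitative score.

```python
import numpy as np


def fit_latent_factors(signal, n_factors):
    """Truncated SVD of a (regions x tracks) signal matrix.

    Returns the leading left singular vectors, singular values,
    and right singular vectors (hypothetical helper, not the authors' code).
    """
    u, s, vt = np.linalg.svd(signal, full_matrices=False)
    return u[:, :n_factors], s[:n_factors], vt[:n_factors, :]


def region_factor_scores(u, s, region_idx):
    """Share of one region's squared projection carried by each factor.

    This scoring rule is an assumption; scores are non-negative and
    sum to 1 unless the region projects to zero on every factor.
    """
    projection = u[region_idx, :] * s  # region coordinates in factor space
    squared = projection ** 2
    total = squared.sum()
    return squared / total if total > 0 else np.zeros_like(squared)


# Toy stand-in for ChIP-seq signal: 200 genomic regions x 12 tracks.
rng = np.random.default_rng(0)
signal = rng.poisson(lam=3.0, size=(200, 12)).astype(float)

u, s, vt = fit_latent_factors(signal, n_factors=5)

# Score a single query region (index 42 is arbitrary) and rank the factors.
scores = region_factor_scores(u, s, region_idx=42)
top_factors = np.argsort(scores)[::-1][:3]
print("top latent factors for region 42:", top_factors)
```

In a setting like the one described, the top-ranked factors for a query region would then be looked up in a table of precomputed functional annotations; that annotation step depends on an external enrichment tool and is not sketched here.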