Asia-Pacific Bioinformatics Conference

Pinpointing disease genes through phenomic and genomic data fusion

Abstract

Background: Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Approaches have been proposed based either on the guilt-by-association principle or on disease phenotype similarities. However, the low coverage of both diseases and genes in existing methods has prevented whole-genome scans for causative genes in a significant proportion of diseases.

Results: To overcome this limitation, we proposed a rigorous statistical method, pgFusion, that prioritizes candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) with seven types of gene functional similarity. These were calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, the highest coverage of both diseases and genes achieved thus far. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method. We then applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach to find causative genes for complex diseases. We also provide standalone software and online services for pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion.

Conclusions: pgFusion not only provides an effective way to prioritize candidate genes but also demonstrates feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applying this method in exome or whole-genome sequencing studies would accelerate the discovery of causative genes for human diseases. Other research fields in genomics could also benefit from incorporating our data fusion methodology.
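The abstract describes ranking candidate genes by combining one disease phenotype similarity with several gene functional similarities, then evaluating the ranking with leave-one-out cross-validation. The sketch below illustrates that general idea on toy data. It is not pgFusion's published algorithm. Several choices are assumptions made for illustration only: the guilt-by-association scoring rule (phenotype similarity times gene similarity, summed over known disease-gene pairs), the per-source z-score standardization used to make heterogeneous sources comparable, and the Stouffer-style combination. All function names and the data (`GENES`, `KNOWN`, `DISEASE_SIM`, `GENE_SIMS`) are hypothetical.

```python
"""Toy phenotype/genotype similarity fusion for ranking candidate genes.

Illustrative sketch only: the scoring rule, z-score standardization and
Stouffer-style combination are assumptions, not pgFusion's published method.
"""
from math import sqrt
from statistics import mean, pstdev


def sim(table, a, b):
    """Symmetric similarity lookup; an item is maximally similar to itself."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def source_scores(disease, candidates, disease_sim, gene_sim, known):
    """Guilt-by-association score of each candidate from one gene-similarity source."""
    scores = {}
    for g in candidates:
        total = 0.0
        for d2, g2 in known:
            if g2 == g:
                continue  # a gene must not vote for itself
            total += sim(disease_sim, disease, d2) * sim(gene_sim, g, g2)
        scores[g] = total
    return scores


def zscores(scores):
    """Standardize one source so heterogeneous sources share a common scale."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {g: 0.0 for g in scores}
    return {g: (v - mu) / sd for g, v in scores.items()}


def fused_scores(disease, candidates, disease_sim, gene_sims, known):
    """Stouffer-style fusion: sum the per-source z-scores, divide by sqrt(K)."""
    fused = {g: 0.0 for g in candidates}
    for gene_sim in gene_sims.values():
        z = zscores(source_scores(disease, candidates, disease_sim, gene_sim, known))
        for g in candidates:
            fused[g] += z[g]
    k = sqrt(len(gene_sims))
    return {g: s / k for g, s in fused.items()}


def rank_candidates(disease, candidates, disease_sim, gene_sims, known):
    """Candidates sorted best-first; ties broken by gene name for determinism."""
    fused = fused_scores(disease, candidates, disease_sim, gene_sims, known)
    return sorted(candidates, key=lambda g: (-fused[g], g))


def leave_one_out_ranks(candidates, disease_sim, gene_sims, known):
    """Hide each known association in turn and record its gene's rank (1 = top)."""
    ranks = {}
    for pair in known:
        disease, gene = pair
        rest = [p for p in known if p != pair]
        ranking = rank_candidates(disease, candidates, disease_sim, gene_sims, rest)
        ranks[pair] = ranking.index(gene) + 1
    return ranks


# Hypothetical toy data: 3 diseases, 5 genes, 2 gene-similarity sources.
GENES = ["G1", "G2", "G3", "G4", "G5"]
KNOWN = [("D1", "G1"), ("D2", "G2"), ("D3", "G4")]
DISEASE_SIM = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D1", "D3")): 0.1,
    frozenset(("D2", "D3")): 0.2,
}
GENE_SIMS = {
    "ppi": {frozenset(("G1", "G2")): 0.8, frozenset(("G2", "G3")): 0.6,
            frozenset(("G4", "G5")): 0.7},
    "expression": {frozenset(("G1", "G2")): 0.7, frozenset(("G1", "G3")): 0.5,
                   frozenset(("G4", "G5")): 0.9},
}

if __name__ == "__main__":
    print(leave_one_out_ranks(GENES, DISEASE_SIM, GENE_SIMS, KNOWN))
```

Running the script prints `{('D1', 'G1'): 1, ('D2', 'G2'): 1, ('D3', 'G4'): 4}`. The held-out genes for D1 and D2 are recovered at rank 1 because those two diseases are strongly similar to each other. The gene for D3 ranks only 4th of 5 because, in this toy data, D3 is only weakly similar to the other diseases. Standardizing each source before combining is one simple way to address the comparability question raised in the abstract; rank- or p-value-based transforms are common alternatives.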