Integration of text- and data-mining using ontologies successfully selects disease gene candidates

Abstract

Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
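The abstract describes the selection step only at a high level. The following is a minimal sketch of one way the idea could be realized, and it is not the authors' implementation. It assumes a toy anatomy hierarchy standing in for eVOC, a set of disease-associated tissue terms that a text-mining step would produce, and per-gene expression annotations. All names (`ONTOLOGY`, `ancestors`, `prioritize`, `GENE_A` and so on) and all tissues are hypothetical. A candidate is kept when any tissue it is expressed in, or any ancestor of that tissue, matches a disease term.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy anatomy hierarchy: each term maps to its parent (None = root).
# These terms stand in for eVOC anatomical terms; they are not taken from eVOC.
ONTOLOGY: Dict[str, Optional[str]] = {
    "body": None,
    "eye": "body",
    "retina": "eye",
    "brain": "body",
    "cerebellum": "brain",
    "liver": "body",
}


def ancestors(term: str, parents: Dict[str, Optional[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the hierarchy."""
    result: Set[str] = set()
    current: Optional[str] = term
    # Walk up parent links; the membership check guards against cycles.
    while current is not None and current not in result:
        result.add(current)
        current = parents.get(current)
    return result


def prioritize(
    candidates: Iterable[str],
    expression: Dict[str, Set[str]],
    disease_terms: Set[str],
    parents: Dict[str, Optional[str]],
) -> List[str]:
    """Keep candidates expressed in at least one disease-associated tissue."""
    selected: List[str] = []
    for gene in candidates:
        # Expand each expression term upward, so "retina" also counts as "eye".
        gene_terms: Set[str] = set()
        for term in expression.get(gene, set()):
            gene_terms |= ancestors(term, parents)
        if gene_terms & disease_terms:
            selected.append(gene)
    return selected


# Hypothetical toy data, for illustration only.
candidates = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
expression = {
    "GENE_A": {"retina"},
    "GENE_B": {"liver"},
    "GENE_C": {"cerebellum", "retina"},
    "GENE_D": set(),  # no expression annotation available
}
disease_terms = {"eye"}  # stands in for the output of text-mining the literature

kept = prioritize(candidates, expression, disease_terms, ONTOLOGY)
print(kept)  # ['GENE_A', 'GENE_C']
print(f"{len(kept) / len(candidates):.1%} of candidates retained")  # 50.0% ...
```

Expanding only the gene's terms upward is a design choice of this sketch: expression in "retina" satisfies a disease annotated to "eye", but not the reverse. Genes with no expression annotation are dropped here; a real pipeline might prefer to keep them rather than rule them out.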