Conference: Pacific Symposium on Biocomputing

A NEW RELEVANCE ESTIMATOR FOR THE COMPILATION AND VISUALIZATION OF DISEASE PATTERNS AND POTENTIAL DRUG TARGETS

Abstract

A new computational method is presented to extract disease patterns from heterogeneous, text-based data. For this study, 22 million PubMed records were mined for co-occurrences of gene name synonyms and disease MeSH terms. The resulting publication counts were entered into a matrix M_data, in which each row represents a disease, each column represents a gene, and each cell holds the publication count for the co-occurring disease-gene pair. A second matrix with identical dimensions, M_relevance, was derived from M_data: the values of M_data were normalized, and the normalized values were multiplied by the Gini coefficient calculated column-wise, that is, per gene. This multiplication yields a relevance estimator for every gene in relation to every disease. From M_relevance, the similarities between all row vectors were calculated. The resulting similarity matrix S_relevance relates 5,000 diseases to one another through the relevance estimators calculated for 15,000 genes. Three diseases were analyzed in detail to validate the disease patterns and the relevant genes. Cytoscape was used to visualize and analyze M_relevance and S_relevance together with the genes and diseases. In summary, the relevance estimator introduced here was able to detect valid disease patterns and to identify genes that encode key proteins and potential targets for drug discovery projects.
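The pipeline described in the abstract (count matrix, normalization, column-wise Gini weighting, row-vector similarity) can be sketched on a toy example. This is a minimal illustration, not the authors' implementation. The abstract does not specify the normalization scheme, whether the Gini coefficient is computed on raw or normalized counts, or which similarity measure is used. The sketch assumes row-wise normalization by each disease's total count, a Gini coefficient computed on the raw counts of each gene column, and cosine similarity between rows. All function names and the 3 x 3 count matrix are hypothetical.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = evenly spread)."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on ascending-sorted values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def relevance_matrix(m_data):
    """Row-normalize counts, then weight each gene column by its Gini value.

    ASSUMPTION: normalization divides by the row (disease) total, and the
    Gini coefficient is taken over the raw counts in each gene column.
    """
    n_genes = len(m_data[0])
    gene_gini = [gini([row[j] for row in m_data]) for j in range(n_genes)]
    result = []
    for row in m_data:
        row_total = sum(row)
        if row_total == 0:
            result.append([0.0] * n_genes)
        else:
            result.append(
                [row[j] / row_total * gene_gini[j] for j in range(n_genes)]
            )
    return result


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(m_relevance):
    """Pairwise similarity between all disease rows (ASSUMPTION: cosine)."""
    return [[cosine(u, v) for v in m_relevance] for u in m_relevance]


# Hypothetical counts: rows are diseases, columns are genes.
# The third gene co-occurs equally with every disease.
m_data = [
    [10, 0, 5],
    [0, 8, 5],
    [0, 0, 5],
]
m_relevance = relevance_matrix(m_data)
s_relevance = similarity_matrix(m_relevance)
```

In this toy matrix the third gene is mentioned equally often with every disease, so its Gini coefficient is zero and its relevance vanishes everywhere. Under the stated assumptions, this is how the weighting suppresses genes that are published on ubiquitously and favors genes whose literature is concentrated on a few diseases.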
