Journal: BMC Bioinformatics

Improving biomarker list stability by integration of biological knowledge in the learning process


Abstract

Background: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases in the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes.

Results: Biological knowledge was encoded as gene similarity matrices, and the expression data were linearly transformed so that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotation, and geodesic distance applied on protein-protein interaction networks perform best at improving list stability while maintaining almost equal prediction accuracy.

Conclusions: The analysis supports the idea that when some features are strongly correlated with each other, for example because they are close in the protein-protein interaction network, they might have similar importance and be equally relevant for the task at hand. The results can be a starting point for additional experiments on combining similarity matrices in order to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at: http://www.math.unipd.it/~dasan/biomarkers.html.
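The idea in the Results can be illustrated with a minimal sketch. It is not the authors' implementation (that is available at the link above), and it rests on several assumptions made only for illustration: the toy network and its gene names are hypothetical, similarity is taken as 1 / (1 + d) where d is the geodesic (shortest-path) distance, and the linear transformation is taken as X' = X S. The abstract does not specify either formula.

```python
from collections import deque


def geodesic_distances(adjacency):
    """All-pairs shortest-path lengths (edge counts) via BFS from every node.

    `adjacency` maps each gene to its neighbours; it is assumed to be
    symmetric and to list every gene as a key.
    """
    nodes = sorted(adjacency)
    dist = {}
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen  # unreachable genes are simply absent
    return nodes, dist


def similarity_matrix(nodes, dist):
    """Assumed similarity: S[i][j] = 1 / (1 + d(i, j)), 0 if unreachable."""
    return [
        [1.0 / (1.0 + dist[a][b]) if b in dist[a] else 0.0 for b in nodes]
        for a in nodes
    ]


def transform(expression, sim):
    """Assumed linear map X' = X S: each new feature is a
    similarity-weighted sum of the original features."""
    n = len(sim)
    return [
        [sum(row[k] * sim[k][j] for k in range(n)) for j in range(n)]
        for row in expression
    ]


# Hypothetical toy PPI network: A - B - C form a chain, D is isolated.
ppi = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
genes, dist = geodesic_distances(ppi)
S = similarity_matrix(genes, dist)

# One made-up sample in which only gene A is expressed.
X = [[1.0, 0.0, 0.0, 0.0]]
X_new = transform(X, S)
print(genes)   # ['A', 'B', 'C', 'D']
print(X_new)   # A's signal spreads to B (0.5) and C (about 0.33), not to D
```

In this sketch, genes that are close in the network receive similar transformed values, so a learner applied to `X_new` would tend to weight them similarly, which is the intuition the Conclusions describe.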
