Biomarker discovery and clinical outcome prediction using knowledge-based bioinformatics.

Abstract

Advances in high-throughput genomic and proteomic technology have led to a growing interest in cancer biomarkers. These biomarkers can potentially improve the accuracy of cancer subtype prediction and, subsequently, the success of therapy. However, identification of statistically and biologically relevant biomarkers from high-throughput data can be unreliable due to the nature of the data (e.g., high technical variability, small sample size, and high dimensionality). Because few training samples are available, data-driven machine learning methods are often insufficient without the support of knowledge-based algorithms. We investigate the benefits of using knowledge-based algorithms to solve clinical prediction problems. Because we are interested in identifying biomarkers that are also feasible in clinical prediction models, we focus on two analytical components: feature selection and predictive model selection. In addition to data variance, we must also consider the variance of analytical methods. There are many existing feature selection algorithms, each of which may produce different results. Moreover, it is not trivial to identify model parameters that maximize the sensitivity and specificity of clinical prediction. Thus, we introduce a method that uses independently validated biological knowledge to reduce the space of relevant feature selection algorithms and to improve the reliability of clinical predictors.

Biologically relevant feature selection algorithms are those that favor independently validated biomarkers. We show that using these biomarkers to guide the choice of feature ranking algorithm and its parameters improves the efficiency of detecting new biomarkers that are also likely to validate. Furthermore, the algorithm selection process evolves iteratively as it learns and incorporates new biomarkers into the knowledge set. Using both maximum likelihood and maximum a posteriori approaches, we show that the choice of an optimal or biologically relevant method changes in the presence of knowledge feedback.

The clinical utility of biomarkers depends on their feasibility in clinical prediction applications. Thus, using an approach similar to that of the FDA Microarray Quality Control (MAQC) Consortium, and in collaboration with it, we examine several microarray datasets to assess the effect of knowledge-guided feature selection on prediction accuracy. The microarray datasets in our study vary in sample size and clinical focus. For each clinical focus (renal cancer, prostate cancer, and breast cancer), we build and test classification models using independent training and testing datasets in order to reduce prediction bias. Results of these experiments indicate that knowledge-guided feature selection improves clinical prediction.

Finally, one of the primary obstacles in translating research to clinical applications is the inaccessibility of bioinformatics applications to the general community of clinicians and biologists. Therefore, we implement several functions of the knowledge-based framework as a user-friendly, web-based application called omniBiomarker. We develop functions of omniBiomarker according to standards of the NCI Cancer BioInformatics Grid (caBIG), further increasing the overall impact of this work.
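The idea of letting independently validated biomarkers guide the choice of a feature ranking algorithm can be illustrated with a small Python sketch. This is not the thesis's method: the abstract mentions maximum likelihood and maximum a posteriori approaches, whereas the sketch below simply counts how many known biomarkers each candidate ranker places in its top `top_k` features. All names (`t_like_score`, `fold_change_score`, `select_ranker`, the `GENE_*` labels) and the toy expression values are hypothetical and chosen only for illustration.

```python
import statistics

def t_like_score(pos, neg):
    # Mean difference scaled by within-class spread (a simplified t-like statistic).
    spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-9
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread

def fold_change_score(pos, neg):
    # Raw mean difference, ignoring within-class variability.
    return abs(statistics.mean(pos) - statistics.mean(neg))

def rank_features(data, labels, score_fn):
    # Return feature names ordered from highest to lowest score.
    scores = {}
    for name, values in data.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = score_fn(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)

def knowledge_agreement(ranking, known, top_k):
    # Fraction of known (validated) biomarkers found among the top_k features.
    return len(set(ranking[:top_k]) & known) / len(known)

def select_ranker(data, labels, rankers, known, top_k):
    # Pick the ranker whose top_k features best agree with prior knowledge.
    results = {
        name: knowledge_agreement(rank_features(data, labels, fn), known, top_k)
        for name, fn in rankers.items()
    }
    return max(results, key=results.get), results

# Toy data (hypothetical): 4 control samples (label 0), then 4 case samples (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0],  # small but consistent shift
    "GENE_B": [0.0, 6.0, 1.0, 5.0, 3.0, 9.0, 2.0, 8.0],  # larger shift, very noisy
    "GENE_C": [1.0, 2.0, 1.5, 1.5, 1.2, 1.8, 1.4, 1.6],  # essentially no shift
}
known = {"GENE_A"}  # assumed to be an independently validated biomarker
rankers = {"t_like": t_like_score, "fold_change": fold_change_score}

best, results = select_ranker(data, labels, rankers, known, top_k=1)
print(best, results)
```

On this toy data the variance-aware ranker places the assumed known biomarker first, while the fold-change ranker prefers the noisy gene, so the knowledge set favors the former. In an iterative setting, newly validated biomarkers would be added to `known` and the selection repeated.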

Bibliographic record

  • Author

    Phan, John H.

  • Author affiliation

    Georgia Institute of Technology

  • Degree-granting institution: Georgia Institute of Technology
  • Subject: Engineering Biomedical; Biology Bioinformatics
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 220 p.
  • Total pages: 220
  • Original format: PDF
  • Language of text: English
