Biology Direct

Integration of multiple types of genetic markers for neuroblastoma may contribute to improved prediction of the overall survival


Abstract

Modern experimental techniques deliver data sets containing profiles of tens of thousands of potential molecular and genetic markers that can be used to improve medical diagnostics. Previous studies performed with three different experimental methods on the same set of neuroblastoma patients create an opportunity to examine whether augmenting gene expression profiles with information on copy number variation can improve predictions of patient survival. We propose a methodology based on a comprehensive cross-validation protocol that includes feature selection within the cross-validation loop and classification using machine learning. We also test the dependence of the results on the feature selection process using four different feature selection methods.

The models utilising features selected on the basis of information entropy are slightly, but significantly, better than those using features obtained with the t-test. Synergy between data on genetic variation and gene expression is possible, but not confirmed. A slight, but statistically significant, increase in the predictive power of machine learning models was observed for models built on combined data sets. This increase was found both with the out-of-bag estimate and in cross-validation performed on a single set of variables. However, the improvement was smaller and non-significant when models were built within the full cross-validation procedure, which included feature selection within the cross-validation loop. Good correlation between the performance of the models in internal and external cross-validation was observed, confirming the robustness of the proposed protocol and results.

We have developed a protocol for building predictive machine learning models. The protocol can provide robust estimates of model performance on unseen data and is particularly well suited to small data sets. We have applied this protocol to develop prognostic models for neuroblastoma, using data on copy number variation and gene expression. We have shown that combining these two sources of information may increase the quality of the models. Nevertheless, the increase is small, and larger samples are required to reduce the noise and bias arising from overfitting.

This article was reviewed by Lan Hu, Tim Beissbarth and Dimitar Vassilev.
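The central idea of the protocol, repeating feature selection inside every cross-validation fold so that held-out samples never influence which features are chosen, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, the filter is a plain Welch t-statistic (only one of the selection approaches the abstract mentions), and a nearest-centroid classifier stands in for whatever machine learning models the authors actually used. All function names are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Absolute Welch t-statistic for two samples (0.0 if both variances are zero)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    denom = (va / len(a) + vb / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / denom if denom > 0 else 0.0


def select_features(X, y, k):
    """Rank features by |t| on the given samples only; return the top-k indices."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((welch_t(pos, neg), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def fit_centroids(X, y, features):
    """Per-class mean vector over the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=dist)


def cross_validate(X, y, k_features=5, n_folds=5, seed=0):
    """k-fold cross-validation with feature selection repeated inside every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        # Selection sees training samples only, so the test fold stays unseen.
        features = select_features(X_tr, y_tr, k_features)
        centroids = fit_centroids(X_tr, y_tr, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
    return correct / len(X)


def make_data(n=60, n_features=200, n_informative=5, shift=1.5, seed=1):
    """Synthetic data (assumption): only the first n_informative features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(shift * label if j < n_informative else 0.0, 1.0)
          for j in range(n_features)] for label in y]
    return X, y


X, y = make_data()
accuracy = cross_validate(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Selecting features once on the full data set and then cross-validating only the classifier would let the held-out samples leak into the selection step. This is one plausible reason why the abstract reports a smaller, non-significant improvement under the full procedure than under cross-validation on a single set of variables.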
