Journal: Journal of Computational Biology: A Journal of Computational Molecular Cell Biology

Improving eQTL Analysis Using a Machine Learning Approach for Data Integration: A Logistic Model Tree Solution

Abstract

Expression quantitative trait loci (eQTL) analysis is an emerging method for establishing the impact of genetic variations (such as single nucleotide polymorphisms) on the expression levels of genes. Although several methods for evaluating the impact of these variations have been proposed in the literature, their results largely disagree, which implies a considerable number of false-positive predictions. For this reason, we propose an approach based on Logistic Model Trees that integrates the predictions of different eQTL mapping tools to produce more reliable results. More precisely, we employ a machine learning method that uses logistic functions to fit a linear regression capable of classifying the predictions of three eQTL analysis tools (namely, R/qtl, MatrixEQTL, and mRMR). Because no reference dataset exists and computational predictions are difficult to test experimentally, the performance of our approach is assessed using data from the DREAM5 challenge. The results show that the aggregated prediction is better than the prediction of each single tool in terms of both precision and recall. We also performed a test on real data, employing genotypes and microRNA expression profiles from Caenorhabditis elegans, which showed that we correctly classified all the experimentally validated eQTLs. These good results come both from the integration of the different predictions and from the ability of this machine learning algorithm to find the best cutoff threshold for each tool. This combination makes our integration approach suitable for improving the eQTL predictions selected for laboratory testing by reducing the number of false-positive results.
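
The integration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a Logistic Model Tree is a decision tree with logistic regression models at its leaves, whereas the code below fits a single logistic regression, which corresponds to a tree with one leaf. The three score columns stand in for the outputs of R/qtl, MatrixEQTL, and mRMR, but the data are synthetic and every function name (`make_synthetic_scores`, `fit_logistic`, `precision_recall`) is a hypothetical helper introduced only for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that a candidate eQTL is true, given its tool scores."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = true eQTL)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def make_synthetic_scores(n, seed=0):
    """ASSUMPTION: synthetic stand-in data, not DREAM5 or real tool output.

    Each of three hypothetical tools emits a noisy score in [0, 1] that
    tends to be higher for true eQTLs than for false ones.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        mean = 0.65 if label else 0.35
        X.append([min(1.0, max(0.0, rng.gauss(mean, 0.2))) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_scores(300, seed=1)
    w, b = fit_logistic(X, y)

    # Baseline: trust one tool alone with a fixed 0.5 cutoff.
    single = [1 if x[0] >= 0.5 else 0 for x in X]
    # Integrated: combine all three scores through the fitted model.
    combined = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]

    print("single tool  (precision, recall):", precision_recall(y, single))
    print("integrated   (precision, recall):", precision_recall(y, combined))
```

The fitted weights and bias play the role of learned per-tool cutoffs: rather than thresholding each tool by hand, the model decides how much each score contributes to the final call. A full Logistic Model Tree would additionally split the score space and fit a separate logistic model in each region.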