Improving eQTL Analysis Using a Machine Learning Approach for Data Integration: A Logistic Model Tree Solution

Beretta Stefano; Castelli Mauro; Goncalves Ivo; Kel Ivan; Giansanti Valentina; Merelli Ivan

Abstract

Expression quantitative trait loci (eQTL) analysis is an emerging method for establishing the impact of genetic variations (such as single nucleotide polymorphisms) on the expression levels of genes. Although different methods for evaluating the impact of these variations have been proposed in the literature, their results mostly disagree, which entails a considerable number of false-positive predictions. For this reason, we propose an approach based on Logistic Model Trees that integrates the predictions of different eQTL mapping tools to produce more reliable results. More precisely, we employ a machine learning-based method that uses logistic regression functions to classify the predictions of three eQTL analysis tools (namely, R/qtl, MatrixEQTL, and mRMR). Given the lack of a reference dataset, and because computational predictions are difficult to test experimentally, we assess the performance of our approach using data from the DREAM5 challenge. The results show that the aggregated prediction is better than that obtained by each single tool in terms of both precision and recall. We also performed a test on real data, employing genotypes and microRNA expression profiles from Caenorhabditis elegans, in which we correctly classified all the experimentally validated eQTLs. These good results come both from the integration of the different predictions and from the ability of this machine learning algorithm to find the best cutoff thresholds for each tool. This combination makes our integration approach suitable for improving eQTL predictions for testing in a laboratory, reducing the number of false-positive results.
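
The integration idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it fits a single logistic regression over three per-tool scores, which is only the kind of model a Logistic Model Tree places at each leaf, and it omits the tree that partitions the score space. The data are synthetic, and the names `make_pairs`, `fit_logistic`, and the 0.5 cutoffs are assumptions made for illustration; the three score columns merely stand in for outputs of tools such as R/qtl, MatrixEQTL, and mRMR.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_pairs(n, rng):
    """Synthetic SNP-gene pairs: three tool scores in [0, 1] plus a 0/1 label.

    Assumption: true eQTLs tend to score higher in every tool, with noise.
    """
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels to keep the classes balanced
        mean = 0.7 if label else 0.3
        scores = [min(1.0, max(0.0, rng.gauss(mean, 0.15))) for _ in range(3)]
        X.append(scores)
        y.append(label)
    return X, y

def predict_proba(w, x):
    """Probability that a pair is a true eQTL; w is [bias, w1, w2, w3]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=1.0, epochs=500):
    """Batch gradient descent on the log-loss; returns [bias, w1, w2, w3]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

rng = random.Random(0)
X_train, y_train = make_pairs(400, rng)
X_test, y_test = make_pairs(400, rng)

# Integrated call: combine the three scores through the fitted model.
weights = fit_logistic(X_train, y_train)
integrated = [1 if predict_proba(weights, x) >= 0.5 else 0 for x in X_test]
print("integrated:", precision_recall(y_test, integrated))

# Baseline: call an eQTL whenever a single tool's score is at least 0.5.
for tool in range(3):
    single = [1 if x[tool] >= 0.5 else 0 for x in X_test]
    print("tool", tool, precision_recall(y_test, single))
```

Printing precision and recall for the integrated call next to each single-tool baseline mirrors the comparison described in the abstract, though on toy data only. A full Logistic Model Tree would additionally learn where to split the score space, which is how per-tool cutoff thresholds can emerge from training rather than being fixed by hand.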

Bibliographic details

Source
Journal of computational biology: A journal of computational molecular cell biology | 2018, Issue 10 | 15 pages
Authors
Beretta Stefano; Castelli Mauro; Goncalves Ivo; Kel Ivan; Giansanti Valentina; Merelli Ivan
Author affiliations

Univ Milano Bicocca Dipartimento Informat Sistemist & Comunicaz Milan Italy;

Univ Nova Lisboa NOVA IMS Campus Campolide P-1070312 Lisbon Portugal;

Univ Nova Lisboa NOVA IMS Campus Campolide P-1070312 Lisbon Portugal;

CNR Ist Tecnol Biomed Via Flli Cervi 93 I-20090 Segrate Italy;

CNR Ist Tecnol Biomed Via Flli Cervi 93 I-20090 Segrate Italy;

CNR Ist Tecnol Biomed Via Flli Cervi 93 I-20090 Segrate Italy;

Indexing information
Original format: PDF
Language: English (eng)
Chinese Library Classification: Biomathematical methods
Keywords
data integration; eQTL analysis; evolutionary algorithm; genetic programming; machine learning;

Similar literature

1. Improving eQTL Analysis Using a Machine Learning Approach for Data Integration: A Logistic Model Tree Solution [J]. Stefano Beretta, Mauro Castelli, Ivo Gonçalves, Journal of computational biology. 2018, Issue 10
2. A Time of Day Analysis of Pedestrian-Involved Crashes in California: Investigation of Injury Severity, a Logistic Regression and Machine Learning Approach Using HSIS Data [J]. Mokhtarimousavi Seyedmirsajad, ITE journal. 2019, Issue 10
3. Integrating data sources to improve hydraulic head predictions: A hierarchical machine learning approach [J]. William J. Michael, Barbara S. Minsker, David Tcheng, Water resources research. 2005, Issue 3
4. Improved Big Data Analytics Solution Using Deep Learning Model and Real-Time Sentiment Data Analysis Approach [C]. Chun-I Philip Chen, Jiangbin Zheng, International conference on brain-inspired cognitive systems. 2018
5. A spatial approach to mineral potential modelling using decision tree and logistic regression analysis. [D]. Honarvar, Pauline. 2001
6. Developing a Simulated Online Model That Integrates GNSS Accelerometer and Weather Data to Detect Parturition Events in Grazing Sheep: A Machine Learning Approach [O]. Eloise S. Fogarty, David L. Swain, Greg M. Cronin, 2021
7. The Bioinformatics Bookshelf: Teach Yourself Computational Biology? Bioinformatics: The Machine Learning Approach By Pierre Baldi and Soren Brunak Cambridge, MA: MIT Press (1998). 351 pp. $40.00; Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins Edited by Andreas D. Baxevanis and B. F. Francis Ouellette New York: Wiley-Interscience (1998). 370 pp. $59.95; Guide to Human Genome Computing, Second Edition Edited by Martin J. Bishop San Diego, CA: Academic Press (1998). 306 pp. $69.95; Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids By Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison Cambridge: Cambridge University Press (1998). 356 pp. $34.95; Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology By Dan Gusfield Cambridge: Cambridge University Press (1997). 534 pp. $59.95; Introduction to Computational Molecular Biology By Joao Setubal and Joao Meidanis Boston: PWS Publishing (1997). 296 pp. $61.95 [O]. Pickeral Oxana K, Boguski Mark S, 1999
