Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?

Touw W.G.; Bayjanov J.R.; Overmars L.; Backus L.; Boekhorst J.; Wels M.; van Hijum S.A.F.T.

Abstract

In the Life Sciences, 'omics' data is increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example: within a class of cancer patients, certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as these are implicitly taken into account by the algorithm during the creation of the classification model. This review details some of the, to the best of our knowledge, rarely or never used RF properties that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.


Bibliographic record

Source
Briefings in Bioinformatics | 2013, Issue 3 | 12 pages
Authors
Touw W.G.; Bayjanov J.R.; Overmars L.; Backus L.; Boekhorst J.; Wels M.; van Hijum S.A.F.T.
Original format: PDF
Language: English
Chinese Library Classification: Genetics
Keywords
Conditional relationships; Local importance; Proximity; Random Forest; Variable importance; Variable interaction


