A genetic algorithm approach to optimising random forests applied to class engineered data.

Abstract

In numerous applications, especially in the life sciences, examples are labelled at a coarse level of granularity. For example, binary classification dominates many of these datasets, with the positive class denoting the presence of a particular disease in medical diagnosis applications. Such labelling does not reflect the reality that the same disease can have different categories, a fact evidenced by continuing research into the root causes and varying symptoms of a number of diseases. To enhance such diagnosis, we decompose each dataset by clustering the examples of each class to reveal hidden categories, and then apply the widely adopted ensemble classification technique Random Forests. Such class decomposition has two advantages: (1) it diversifies the input, which enhances the ensemble classification; and (2) it improves class separability, easing the subsequent classification. However, applying Random Forests to such class-decomposed data requires setting three main parameters: the number of trees forming the ensemble, the number of features considered for splitting at each node, and a vector giving the number of clusters in each class. The large search space for tuning these parameters motivated the use of a genetic algorithm to optimise the solution. A thorough experimental study was conducted on 22 real datasets, predominantly from a variety of life science applications. To demonstrate the applicability of the method to other areas, the proposed method was also tested on a number of datasets from other domains. The experimental study used three variations of Random Forests, including the proposed method, as well as a boosting ensemble classifier. The results demonstrate the superiority of the proposed method in improving accuracy.
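
The pipeline described above (decompose each class by clustering, then let a genetic algorithm tune the Random Forest and the per-class cluster counts) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation, and several choices are assumptions:

- Class decomposition uses plain k-means within each class. The abstract says only "clustering".
- The genetic algorithm uses one common design: elitism, size-2 tournament selection, uniform crossover and per-gene mutation. The paper's operators may differ.
- A chromosome encodes the three parameters named in the abstract: number of trees, number of features tried at each split (called `mtry` in R's randomForest and `max_features` in scikit-learn), and the vector of cluster counts per class.
- The names `decompose_classes`, `genetic_search`, `toy_fitness`, `MAX_TREES` and `MAX_K` are hypothetical.

`toy_fitness` is only a placeholder so the example runs. In practice the fitness would plausibly be the cross-validated accuracy of a Random Forest trained on the decomposed labels. For instance, you could use scikit-learn's `RandomForestClassifier` with `n_estimators` and `max_features` taken from the chromosome. Each predicted (class, cluster) pair would then be mapped back to its class before scoring.

```python
import functools
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Genome = Tuple[int, int, Tuple[int, ...]]  # (n_trees, mtry, clusters per class)
MAX_TREES, MAX_K = 500, 5  # hypothetical search bounds, not taken from the paper


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre (squared Euclidean distance).
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Move each centre to the mean of its members; an empty cluster keeps its centre.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def decompose_classes(X: List[Point], y: List[int], k_per_class: Sequence[int]) -> List[Tuple[int, int]]:
    """Relabel each example as (class, cluster) by running k-means inside each class.

    A model trained on these labels predicts (class, cluster); index [0] gives the class back.
    """
    new_y: List[Tuple[int, int]] = [(-1, -1)] * len(y)
    for cls, k in enumerate(k_per_class):
        idx = [i for i, label in enumerate(y) if label == cls]
        if idx:
            for i, cluster in zip(idx, kmeans([X[i] for i in idx], k)):
                new_y[i] = (cls, cluster)
    return new_y


def random_genome(rng: random.Random, n_classes: int, n_features: int) -> Genome:
    return (rng.randint(10, MAX_TREES), rng.randint(1, n_features),
            tuple(rng.randint(1, MAX_K) for _ in range(n_classes)))


def crossover(rng: random.Random, a: Genome, b: Genome) -> Genome:
    # Uniform crossover on the flattened genes [n_trees, mtry, k_1, ..., k_C].
    genes = [x if rng.random() < 0.5 else z for x, z in zip([a[0], a[1], *a[2]], [b[0], b[1], *b[2]])]
    return (genes[0], genes[1], tuple(genes[2:]))


def mutate(rng: random.Random, g: Genome, n_features: int, rate: float = 0.2) -> Genome:
    # Each gene is independently resampled from its range with probability `rate`.
    trees, mtry, ks = g
    if rng.random() < rate:
        trees = rng.randint(10, MAX_TREES)
    if rng.random() < rate:
        mtry = rng.randint(1, n_features)
    ks = tuple(rng.randint(1, MAX_K) if rng.random() < rate else k for k in ks)
    return (trees, mtry, ks)


def genetic_search(fitness: Callable[[Genome], float], n_classes: int, n_features: int,
                   pop_size: int = 20, generations: int = 15, seed: int = 0) -> Genome:
    """Maximise `fitness` using elitism, size-2 tournaments, crossover and mutation."""
    fitness = functools.lru_cache(maxsize=None)(fitness)  # genomes are hashable tuples
    rng = random.Random(seed)
    pop = [random_genome(rng, n_classes, n_features) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[: max(1, pop_size // 4)]  # elite: best quarter survives unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(ranked, 2), key=fitness)
            p2 = max(rng.sample(ranked, 2), key=fitness)
            next_pop.append(mutate(rng, crossover(rng, p1, p2), n_features))
        pop = next_pop
    return max(pop, key=fitness)


def toy_fitness(g: Genome) -> float:
    # Hypothetical placeholder so the sketch runs: best at n_trees=200, mtry=3, k=(2, 3).
    trees, mtry, ks = g
    return -abs(trees - 200) / 100 - abs(mtry - 3) - sum(abs(k - t) for k, t in zip(ks, (2, 3)))


best = genetic_search(toy_fitness, n_classes=2, n_features=8)
print("best (n_trees, mtry, clusters per class):", best)
```

Memoising the fitness matters in a real run, because each evaluation would train and cross-validate a whole forest. Since chromosomes are plain tuples, `functools.lru_cache` can reuse scores for chromosomes that reappear across generations. Elitism guarantees that the best score found never decreases from one generation to the next.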
