IEEE International Conference on Big Data

Crack random forest for arbitrary large datasets

Abstract

Random Forests (RF) of tree classifiers are a state-of-the-art method for classification. RF show limited sensitivity to hyperparameters, are numerically robust, natively handle both numerical and categorical features, and are quite effective on many real-world problems compared with other state-of-the-art techniques. In this work we show how to crack RF so that they can be trained on arbitrarily large datasets. In particular, we extend ReForeSt, an Apache Spark-based RF implementation. The new version of ReForeSt supports two methodologies for distributing the data and the computation across the available machines, and automatically chooses the one able to provide the result in less time. The new ReForeSt also supports Random Rotations, a fairly recent randomization technique that can boost the accuracy of the original RF. We perform an extensive experimental comparison between ReForeSt and MLlib using the Google Cloud Platform, testing the performance and scalability of both on several real-world datasets. Results confirm that ReForeSt outperforms MLlib in terms of memory and computational efficiency as well as classification performance. ReForeSt is publicly available via GitHub.
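
The abstract only names Random Rotations without saying how they are computed. As a rough illustration of the general idea (not ReForeSt's actual code), the sketch below draws a random rotation matrix by orthonormalizing Gaussian vectors and applies it to the numerical features; in a rotation-based forest, each tree would typically be trained on data rotated by its own matrix. The function names `random_rotation`, `determinant`, and `rotate` are hypothetical and chosen only for this example.

```python
import math
import random


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def random_rotation(dim, rng):
    """Return a dim x dim random rotation matrix as a list of rows.

    Illustrative assumption: Gram-Schmidt on Gaussian vectors gives an
    orthonormal basis; the last row is negated if the determinant is -1,
    so the result is a proper rotation rather than a reflection.
    """
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Subtract the components along the rows accepted so far.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (nearly) linearly dependent draws
            rows.append([a / norm for a in v])
    if determinant(rows) < 0:
        rows[-1] = [-a for a in rows[-1]]
    return rows


def rotate(samples, rotation):
    """Apply the rotation to every sample (each a list of floats)."""
    return [
        [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in rotation]
        for x in samples
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    rotation = random_rotation(3, rng)
    print(rotate([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], rotation))
```

Because a rotation preserves distances, the rotated samples carry the same information but present different axis-aligned split candidates to each tree. The same matrix used for training a tree must also be applied to inputs at prediction time, and categorical features would need separate handling, which this sketch does not cover.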
