Methods of Information in Medicine

Probability Machines: Consistent probability estimation using nonparametric learning machines

Abstract

Background: For binary responses, most machine learning approaches provide only a classification. However, risk estimation from individual patient characteristics requires probabilities. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem.

Objectives: The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities.

Methods: Two random forest algorithms and two nearest neighbor algorithms for estimating individual probabilities are described in detail. We discuss the consistency of random forests, nearest neighbors and other learning machines. We conduct a simulation study to illustrate the validity of the methods, and we exemplify the algorithms by analyzing two well-known data sets, one on the diagnosis of appendicitis and one on the diagnosis of diabetes in Pima Indians.

Results: Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available, so all calculations can be performed using existing software.

Conclusions: Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Free implementations are available in R and may be used in applications.
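
The link between regression and probability estimation described in the abstract can be illustrated with a small sketch. For a 0/1 response, P(Y = 1 | x) equals the conditional mean E[Y | x], so a nearest neighbor regression on the 0/1 labels returns a probability estimate rather than a class label. The code below is not the authors' R sample code; it is a minimal Python illustration using only the standard library. The function names (`knn_probability`, `true_probability`, `simulate`), the logistic data model and the choice of k are assumptions made for this example only.

```python
import math
import random


def knn_probability(train_x, train_y, query, k):
    """Estimate P(Y = 1 | X = query) as the mean 0/1 response
    of the k training points nearest to the query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    neighbours = order[:k]
    # Averaging 0/1 labels is k-NN regression on the binary response.
    return sum(train_y[i] for i in neighbours) / k


def true_probability(x):
    """Hypothetical logistic model, used only to simulate data."""
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1])))


def simulate(n, seed=0):
    """Draw n points uniformly on [-2, 2]^2 with Bernoulli responses."""
    rng = random.Random(seed)
    xs = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(n)]
    ys = [1 if rng.random() < true_probability(x) else 0 for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate(2000)
    query = (1.0, 0.5)
    print("k-NN estimate:   ", knn_probability(xs, ys, query, k=100))
    print("true probability:", true_probability(query))
```

Running the script prints the estimate next to the simulated true probability at one query point. In this sketch k is fixed by hand; how k is allowed to grow with the sample size is the kind of condition the paper's consistency discussion addresses and is not reproduced here.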