International Journal of Electrical and Computer Engineering

Performance evaluation of random forest with feature selection methods in prediction of diabetes


Abstract

Data mining is the process of examining data from different angles and compiling it into useful information. Recent advances in data mining and machine learning have enabled biomedical research to improve general health care. Because misclassification leads to poor prediction, better classification is needed to improve the prediction rate on medical datasets; classification and prediction are the most important and most difficult challenges when data mining is applied to such data. In this work we evaluate the Pima Indians Diabetes dataset from the UCI repository using the Random Forest machine learning algorithm together with feature selection methods, namely forward selection and backward elimination based on an entropy evaluation method, with percentage split as the test option. The experiment was conducted on the RStudio platform and achieved a classification accuracy of 84.1%. The results indicate that Random Forest predicts diabetes better than other techniques while using fewer attributes, so that the least important tests for identifying diabetes can be avoided.
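The abstract does not spell out how the entropy-based selection was implemented, so the following is only a minimal sketch of one plausible reading, written in Python rather than the R used in the study. It ranks numeric attributes by information gain (entropy reduction at the best single threshold), runs a greedy forward selection over attribute subsets, and produces a percentage split. All function names, the 70% training fraction, and the toy rows are assumptions made for illustration; the toy rows are not the Pima data, and `evaluate` stands in for whatever score is used (for example, Random Forest accuracy on the held-out part of the split).

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric attribute at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_gain(values, labels):
    """Largest information gain over midpoints between sorted distinct values."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((information_gain(values, labels, t) for t in cuts), default=0.0)


def rank_features(rows, labels, names):
    """Order attribute names by their best single-split gain, highest first."""
    gains = {name: best_gain([r[i] for r in rows], labels)
             for i, name in enumerate(names)}
    return sorted(gains, key=gains.get, reverse=True)


def forward_selection(candidates, evaluate):
    """Greedy forward selection: repeatedly add the attribute that raises
    `evaluate(subset)` the most, and stop when no addition improves it."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


def percentage_split(n_rows, train_fraction=0.7, seed=0):
    """Shuffle row indices and split them into train and test index lists.
    The 0.7 default is an assumption; the abstract gives no percentage."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(n_rows * train_fraction)
    return idx[:cut], idx[cut:]


# Toy rows invented for illustration only (NOT the Pima dataset).
names = ["glucose", "bmi", "noise"]
rows = [(90, 22, 5), (100, 25, 1), (150, 31, 4),
        (160, 35, 2), (95, 33, 3), (170, 30, 6)]
labels = [0, 0, 1, 1, 0, 1]

ranked = rank_features(rows, labels, names)
train_idx, test_idx = percentage_split(len(rows))
print("attributes ranked by information gain:", ranked)
print("train rows:", train_idx, "test rows:", test_idx)
```

Backward elimination follows the same pattern in reverse: start from the full attribute set and repeatedly drop the attribute whose removal improves (or least harms) the score. Passing the scoring function in as `evaluate` keeps the search independent of the classifier, so the same loop could wrap a Random Forest or any other model.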
