A New Random Forest Method for Longitudinal Data Classification Using a Lexicographic Bi-Objective Approach


Abstract

Standard supervised machine learning methods often ignore the temporal information represented in longitudinal data, but that information can lead to more precise predictions in classification tasks. Data preprocessing techniques and classification algorithms can be adapted to cope directly with longitudinal data inputs, making use of temporal information such as the time-index of features and previous measurements of the class variable. In this article, we propose two changes to the classification task of predicting age-related diseases in a real-world dataset created from the English Longitudinal Study of Ageing. First, we explore adding previous measurements of the class variable as features, and estimating the missing data in those added features using intermediate classifiers. Second, we propose a new split-feature selection procedure for a random forest’s decision trees, which considers the candidate features’ time-indexes in addition to the information gain ratio. Our experiments compared the proposed approaches to baseline approaches in three prediction scenarios, varying the “time gap” for the prediction, that is, how many years in advance the class (occurrence of an age-related disease) is predicted. The experiments were performed on 10 datasets, each with a different class variable, and showed that the proposed approaches increased the random forest’s predictive accuracy.
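The split-feature selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the following is an assumption rather than the authors' procedure: candidates are ranked first by information gain ratio, and among those within a small tolerance of the best gain ratio, the feature with the most recent time-index is chosen. The names `Candidate` and `select_split_feature`, the `tolerance` parameter, the example feature names, and the preference for more recent measurements are all hypothetical.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str          # hypothetical feature name, e.g. "bmi_w6"
    gain_ratio: float  # information gain ratio at the current tree node
    time_index: int    # wave number; a larger value means a more recent wave


def select_split_feature(candidates: Sequence[Candidate],
                         tolerance: float = 0.05) -> Candidate:
    """Lexicographic choice: gain ratio first, time-index second.

    Assumption: candidates whose gain ratio is within `tolerance` of the
    best are treated as equally good, and the most recent one wins.
    """
    if not candidates:
        raise ValueError("no candidate features")
    best = max(c.gain_ratio for c in candidates)
    # Primary objective: keep only candidates close to the best gain ratio.
    near_best = [c for c in candidates if best - c.gain_ratio <= tolerance]
    # Secondary objective: prefer the latest time-index; any remaining
    # tie is broken by the higher gain ratio.
    return max(near_best, key=lambda c: (c.time_index, c.gain_ratio))


candidates = [
    Candidate("bmi_w1", 0.40, 1),
    Candidate("bmi_w6", 0.37, 6),
    Candidate("smoker_w7", 0.20, 7),
]
chosen = select_split_feature(candidates)
strict = select_split_feature(candidates, tolerance=0.0)
```

With the default tolerance, `bmi_w6` is preferred over `bmi_w1` because its gain ratio is nearly as high and it was measured later. With `tolerance=0.0` the rule reduces to a plain gain-ratio choice.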


Bibliographic details

Source
IEEE Symposium Series on Computational Intelligence | 2020 | pp. 806-813 | 8 pages
Authors
Caio Ribeiro; Alex Freitas
Original format
PDF
Keywords
Diseases; Radio frequency; Prediction algorithms; Biomedical measurement; Training; Task analysis; Decision trees

