Enhancement of hepatitis virus immunoassay outcome predictions in imbalanced routine pathology data by data balancing and feature selection before the application of support vector machines

Alice M. Richardson; Brett A. Lidbury

Abstract

Background: Data mining techniques such as support vector machines (SVMs) have been successfully used to predict outcomes for complex problems, including for human health. Much health data is imbalanced, with many more controls than positive cases.

Methods: The impact of three balancing methods and one feature selection method is explored, to assess the ability of SVMs to classify imbalanced diagnostic pathology data associated with the laboratory diagnosis of hepatitis B (HBV) and hepatitis C (HCV) infections. Random forests (RFs) for predictor variable selection, and data reshaping to overcome a large imbalance of negative to positive test results in relation to HBV and HCV immunoassay results, are examined. The methodology is illustrated using data from ACT Pathology (Canberra, Australia), consisting of laboratory test records from 18,625 individuals who underwent hepatitis virus testing over the decade from 1997 to 2007.

Results: Overall, the prediction of HCV test results by immunoassay was more accurate than for HBV immunoassay results associated with identical routine pathology predictor variable data. HBV and HCV negative results were vastly in excess of positive results, so three approaches to handling the negative/positive data imbalance were compared. Generating datasets by the Synthetic Minority Oversampling Technique (SMOTE) resulted in significantly more accurate prediction than single downsizing or multiple downsizing (MDS) of the dataset. For downsized data sets, applying a RF for predictor variable selection had a small effect on the performance, which varied depending on the virus. For SMOTE, a RF had a negative effect on performance. An analysis of variance of the performance across settings supports these findings. Finally, age and assay results for alanine aminotransferase (ALT), sodium for HBV and urea for HCV were found to have a significant impact upon laboratory diagnosis of HBV or HCV infection using an optimised SVM model.

Conclusions: Laboratories looking to include machine learning via SVM as part of their decision support need to be aware that the balancing method, predictor variable selection and the virus type interact to affect the laboratory diagnosis of hepatitis virus infection with routine pathology laboratory variables in different ways depending on which combination is being studied. This awareness should lead to careful use of existing machine learning methods, thus improving the quality of laboratory diagnosis.
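
The three balancing approaches named in the abstract (single downsizing, multiple downsizing and SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `downsize`, `multiple_downsize` and `smote` are hypothetical, the two-feature records are randomly generated stand-ins for pathology variables, and the SMOTE routine is a simplified version that only interpolates between a minority point and one of its nearest minority neighbours. It assumes Python 3.8 or later and at least two minority records.

```python
import math
import random


def downsize(majority, minority, rng):
    """Single downsizing: keep a random majority subset the size of the minority."""
    return rng.sample(majority, len(minority)) + list(minority)


def multiple_downsize(majority, minority, n_sets, rng):
    """Multiple downsizing: several independently downsized, balanced datasets."""
    return [downsize(majority, minority, rng) for _ in range(n_sets)]


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: create n_new synthetic minority points by interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Synthetic, illustrative data only: (feature_1, feature_2) per record.
rng = random.Random(0)
negatives = [(rng.gauss(40, 10), rng.gauss(25, 5)) for _ in range(200)]
positives = [(rng.gauss(50, 10), rng.gauss(60, 15)) for _ in range(10)]

balanced_once = downsize(negatives, positives, rng)
balanced_sets = multiple_downsize(negatives, positives, n_sets=5, rng=rng)
new_positives = smote(positives, n_new=len(negatives) - len(positives), k=3, rng=rng)
oversampled_positives = positives + new_positives
```

Downsizing discards most of the majority class, whereas SMOTE keeps every original record and adds synthetic minority records, which is one plausible reason the two approaches could behave differently when a classifier is trained on the result.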


Bibliographic details

Source: BMC Medical Informatics and Decision Making, 2017, Issue 1
Authors: Alice M. Richardson; Brett A. Lidbury
Original format: PDF
Chinese Library Classification: Medicine and health
Similar literature

1. Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data [J]. Alice M Richardson, Brett A Lidbury. BMC Bioinformatics, 2013, Issue 1.
2. Integration of feature vector selection and support vector machine for classification of imbalanced data [J]. Liu Jie, Zio Enrico. Applied Soft Computing, 2019.
3. Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines [J]. Sebastián Maldonado, Richard Weber, Fazel Famili. Information Sciences: An International Journal, 2014.
4. Enhancement of Hepatitis Virus Outcome Predictions with Application of K-Means Clustering [C]. G. Kurniawan, Z. Rustam. International Symposium on Current Progress in Mathematics and Sciences, 2019.
5. Active learning with support vector machines for imbalanced datasets and a method for stopping active learning based on stabilizing predictions [D]. Bloodgood, Michael. 2009.
6. Enhancement of hepatitis virus immunoassay outcome predictions in imbalanced routine pathology data by data balancing and feature selection before the application of support vector machines [O]. Alice M. Richardson, Brett A. Lidbury. 2017.
7. Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data [O]. Richardson Alice, Lidbury Brett. 2013.
