Applied Medical Informatics

Breast cancer diagnosis using feature extraction techniques with supervised and unsupervised classification algorithms

Abstract

Background: Breast cancer is a serious disease that affects women around the globe. With advances in clinical technology, many different tumor features are now collected for breast cancer diagnosis, and filtering this information down to the features that actually support a clinical diagnosis is a challenging and time-consuming task. The objective of this research was to diagnose breast cancer from extracted tumor features. The main contribution of the study is the use of multivariate techniques, namely principal component analysis, discriminant analysis, and logistic regression, for feature reduction, combined with machine learning tools to classify and predict the tumor type. A hybrid DA-LR feature reduction is proposed, and models built on the reduced features are tested by classification with Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression, and Artificial Neural Network classifiers.

Materials and Methods: Feature extraction and selection are critical to the quality of classifiers built with data mining methods. To diagnose tumors from a reduced set of features, a hybrid feature extraction approach is proposed, with the aim of predicting the disease from the relevant features in the data. The study uses the Breast Cancer Wisconsin (Diagnostic) dataset obtained from the UCI Machine Learning Repository. After data pre-processing, a correlation matrix is computed, which indicates multicollinearity among the features. Feature reduction techniques, including principal component analysis, discriminant analysis, and logistic regression, are then applied to extract features. Support Vector Machine, Naive Bayes, Decision Tree, Logistic Regression, and Artificial Neural Network classification models are built on the extracted features, and their performance is compared.

Results: The results illustrate the capability of the proposed approach for breast cancer diagnosis and also show time savings during the training phase. Physicians can also benefit from the mined abstract tumor features by gaining a better understanding of the properties of different tumor types.

Conclusion: Naive Bayes and Support Vector Machine classifiers outperform the other classification methods, and the model built with hybrid discriminant-logistic (DA-LR) feature selection performs best among all models.
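The abstract does not spell out the hybrid DA-LR procedure, so the sketch below illustrates only two generic steps it mentions: checking a correlation matrix for multicollinearity and extracting a first principal component. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. The four-column `FEATURES` table is a made-up toy (loosely named after size and texture measurements, not real dataset values). The 0.9 cut-off and the helper names (`pearson`, `correlation_matrix`, `collinear_pairs`, `first_principal_component`) are hypothetical. Power iteration stands in for a library eigen-solver.

```python
import math

# Hypothetical toy table (values made up for illustration, NOT from the real dataset):
# three size-like features that move together, plus one unrelated texture-like feature.
FEATURES = {
    "radius": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "perimeter": [63.0, 75.5, 88.0, 100.4, 113.2, 125.6],
    "area": [314.0, 452.0, 616.0, 804.0, 1018.0, 1257.0],
    "smoothness": [0.10, 0.08, 0.11, 0.09, 0.12, 0.08],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Symmetric k x k matrix of pairwise Pearson correlations between feature columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def collinear_pairs(names, corr, threshold=0.9):
    """Feature pairs whose absolute correlation exceeds `threshold` (an assumed cut-off)."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) > threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs


def first_principal_component(corr, iters=200):
    """Leading eigenvector and eigenvalue of a correlation matrix via power iteration."""
    k = len(corr)
    v = [1.0] * k  # start vector; must not be orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v (v has unit length) gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return v, eigenvalue


names = list(FEATURES)
corr = correlation_matrix([FEATURES[n] for n in names])
pairs = collinear_pairs(names, corr)
eigvec, eigval = first_principal_component(corr)

for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
# The trace of a correlation matrix equals the number of features,
# so eigval / len(names) is the share of standardized variance captured by PC1.
print("PC1 loadings:", [round(x, 3) for x in eigvec])
print(f"PC1 explains {eigval / len(names):.1%} of the standardized variance")
```

On this toy table the three size-like columns are flagged as collinear, and a single component absorbs most of their shared variance while the unrelated column gets a near-zero loading. This is the kind of redundancy that motivates feature reduction before classification. On real data one would more likely standardize the features and call a library PCA. The hand-rolled version here only keeps the sketch self-contained.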
