Conference: IEEE International Conference on Cloud Computing in Emerging Markets

A Comparative Study of Feature Selection Methods for Classification of Chest X-Ray Image as Normal or Abnormal Inside AWS ECS Cluster

Abstract

Machine learning algorithms are used to discover complex nonlinear relationships in biomedical data. However, sophisticated learning models become computationally infeasible as the dimensionality of the data increases. One solution to this problem is to use feature selection methods. Feature selection methods find an optimal feature subset, whose performance is evaluated using some evaluation criterion; these methods are categorized as Filter, Wrapper, Embedded and Hybrid approaches. Even though these methods reduce the dimensionality of the data, training time still increases as the dataset size increases. Moreover, the cloud is now the preferred place to store data, so the first step before applying machine learning algorithms is typically to copy the data to a local machine, which can take a long time if the data is large. To overcome these problems, we propose a pipeline that runs on an AWS cloud-based distributed architecture and is capable of feature selection, training and classification. We define an evaluation criterion that measures the performance of feature subsets based on classification accuracy and feature subset size. The experiments were carried out on two chest X-ray datasets (Shenzhen and NIH) clinically tested as normal or abnormal. Using a hybrid feature selection approach and the defined evaluation criterion, we achieved a classification accuracy of 84.24% on the Shenzhen dataset and 79.55% on the NIH dataset for classifying chest X-ray images as normal or abnormal, while reducing the feature subset size by more than 50%.
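The abstract does not give the exact form of the evaluation criterion, only that it combines classification accuracy with feature subset size. As a rough illustration of that idea, the sketch below scores a subset with a weighted sum of accuracy and the fraction of features removed. The function names, the weight `alpha`, and the candidate numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetResult:
    """Outcome of training a classifier on one candidate feature subset."""
    name: str          # label for the selection method (illustrative only)
    accuracy: float    # classification accuracy in [0, 1]
    n_selected: int    # number of features kept


def subset_score(accuracy: float, n_selected: int, n_total: int,
                 alpha: float = 0.9) -> float:
    """Hypothetical criterion: reward accuracy and reward smaller subsets.

    alpha weights accuracy; (1 - alpha) weights the fraction of features
    removed. This weighted sum is an assumption, not the paper's formula.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be in 1..n_total")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    reduction = 1.0 - n_selected / n_total
    return alpha * accuracy + (1.0 - alpha) * reduction


def best_subset(results, n_total: int, alpha: float = 0.9) -> SubsetResult:
    """Return the candidate with the highest combined score."""
    return max(
        results,
        key=lambda r: subset_score(r.accuracy, r.n_selected, n_total, alpha),
    )


# Made-up candidates for a dataset with 1000 features (not the paper's results).
candidates = [
    SubsetResult("filter", accuracy=0.80, n_selected=600),
    SubsetResult("wrapper", accuracy=0.83, n_selected=700),
    SubsetResult("hybrid", accuracy=0.82, n_selected=400),
]

winner = best_subset(candidates, n_total=1000)
print(winner.name)
```

With these made-up inputs, the smaller subset wins even though its accuracy is slightly lower than the most accurate candidate, which is the trade-off a criterion of this kind is meant to capture. Raising `alpha` toward 1 shifts the choice back toward raw accuracy.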
