Source: International Journal of Soft Computing

Review of the effect of feature selection for microarray data on the classification accuracy for cancer data sets



Abstract

DNA microarrays can monitor the expression levels of thousands of genes simultaneously, and gene microarray data can be used in cancer diagnosis and classification. Many machine learning techniques have been developed for the computational analysis of microarray data. A difficulty common to all of them is the large number of genes relative to the small sample size, which degrades both their speed and their accuracy. To overcome these limitations, feature selection techniques are applied to separate significant genes from redundant or irrelevant ones. Feature selection serves two main goals. The first is to identify the relationship between specific diseases and genes. The second is to examine a compact set of discriminative genes in order to develop a pattern classifier with good generalizability and limited complexity. Here, we review different feature selection methods for cancer microarray data sets and analyze their accuracy. We describe the methods commonly used for selecting significant features, including filters, wrappers, and embedded methods, categorized according to their experimental methodology. We then compare the classification accuracy of these methods on various cancer data sets, along with their time complexity, and make some suggestions regarding suitable methods for cancer data sets.
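As a rough illustration of the filter family mentioned in the abstract, the sketch below scores each gene independently of any classifier and keeps the top-ranked ones. It is not taken from the reviewed paper: the Welch t-statistic as the ranking score, the synthetic data, and the names `welch_t`, `filter_select`, and `make_toy_data` are all assumptions chosen for the example.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one gene's expression."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def filter_select(X, y, k):
    """Rank genes (columns of X) by |t| between the two classes and keep the top k."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append((welch_t(pos, neg), g))
    scores.sort(reverse=True)  # highest score first
    return [g for _, g in scores[:k]]


def make_toy_data(n_samples=20, n_genes=200, informative=(3, 57, 120),
                  shift=6.0, seed=0):
    """Synthetic stand-in for a microarray (assumption, not real data):
    few samples, many genes, and a handful of genes shifted in class 1."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0.0, 1.0) + (shift if (g in informative and label == 1) else 0.0)
          for g in range(n_genes)]
         for label in y]
    return X, y


X, y = make_toy_data()
selected = filter_select(X, y, k=3)
print("selected gene indices:", sorted(selected))
```

Because each gene is scored on its own, the cost grows linearly with the number of genes, which is why filters are usually the cheapest of the three families. A wrapper would instead retrain a classifier on candidate gene subsets, and an embedded method would perform the selection inside the classifier's own training.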
