Eighth Pacific Symposium on Biocomputing (PSB), Jan 3-7, 2003, Kauai, Hawaii

MULTICLASS CANCER CLASSIFICATION USING GENE EXPRESSION PROFILING AND PROBABILISTIC NEURAL NETWORKS


Abstract

Gene expression profiling by microarray technology has been successfully applied to classification and diagnostic prediction of cancers. Various machine learning and data mining methods are currently used for classifying gene expression data. However, these methods were not developed to address the specific requirements of gene microarray analysis. First, microarray data are characterized by a high-dimensional feature space in which the number of features often exceeds the number of samples by a factor of 100 or more. In addition, microarray data exhibit a high degree of noise. Most of these methods do not adequately address the problems of dimensionality and noise. Furthermore, although machine learning and data mining methods are based on statistics, most such techniques do not address the biologist's requirement for sound mathematical confidence measures. Finally, most machine learning and data mining classification methods fail to incorporate misclassification costs, i.e. they are indifferent to the costs associated with false positive and false negative classifications. In this paper, we present a probabilistic neural network (PNN) model that addresses all these issues. The PNN model provides sound statistical confidences for its decisions, and it is able to model asymmetrical misclassification costs. Furthermore, we demonstrate the performance of the PNN on multiclass gene expression data sets. We compare the performance of the PNN with two machine learning methods, a decision tree and a neural network. To evaluate the performance of the classifiers, we use a lift-based scoring system that allows a fair comparison of different models. The PNN clearly outperformed the other models. The results demonstrate the successful application of the PNN model for multiclass cancer classification.
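The abstract does not describe the network architecture or the cost model in detail, so the following is only a minimal sketch of the general idea, assuming the textbook Parzen-window formulation of a PNN with a Gaussian kernel. The function names (`pnn_posteriors`, `pnn_decide`), the smoothing parameter `sigma`, the toy data, and the cost matrices are all hypothetical and are not taken from the paper. It shows how per-class posterior estimates can serve as confidence values and how an asymmetric cost matrix can change the decision.

```python
import numpy as np

def pnn_posteriors(X_train, y_train, x, sigma=1.0):
    """Estimate class posteriors for one sample x (illustrative sketch only).

    Assumes a Gaussian Parzen-window PNN; sigma is a hypothetical smoothing
    parameter that would normally be tuned on held-out data.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # Pattern layer: Gaussian kernel between x and every training sample.
    d2 = np.sum((X_train - np.asarray(x, dtype=float)) ** 2, axis=1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summation layer: mean kernel activation per class (density estimate).
    dens = np.array([kernel[y_train == c].mean() for c in classes])
    # Class priors estimated from training frequencies (an assumption).
    priors = np.array([(y_train == c).mean() for c in classes])
    joint = priors * dens
    # Note: joint.sum() can underflow to 0 for points far from all samples.
    return classes, joint / joint.sum()

def pnn_decide(classes, posteriors, cost):
    """Pick the class with the lowest expected misclassification cost.

    cost[i, j] is the (hypothetical) cost of predicting classes[j]
    when the true class is classes[i].
    """
    expected_cost = posteriors @ np.asarray(cost, dtype=float)
    return classes[int(np.argmin(expected_cost))]

# Toy two-class data, invented purely for illustration.
X_train = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
y_train = [0, 0, 1, 1]
x_new = [1.4, 1.4]

classes, post = pnn_posteriors(X_train, y_train, x_new, sigma=1.0)
symmetric = [[0, 1], [1, 0]]
asymmetric = [[0, 1], [5, 0]]  # missing class 1 costs five times more

print("posteriors:", dict(zip(classes.tolist(), post.round(3).tolist())))
print("symmetric costs  ->", pnn_decide(classes, post, symmetric))
print("asymmetric costs ->", pnn_decide(classes, post, asymmetric))
```

With symmetric costs the decision reduces to picking the class with the highest posterior. In this toy case the sample lies closer to class 0, but raising the cost of missing class 1 is enough to flip the decision, which is the kind of behaviour the abstract refers to as modelling asymmetrical misclassification costs.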
