Microarray-based classification of disease states relies on the gene expression profiles of subjects. Various methods have been proposed to identify diagnostic markers that can accurately discriminate between two classes, such as case and control. Many methods that use only a subset of ranked genes in a pathway may not fully represent the classification boundary between the two disease classes. Here, negatively correlated feature sets (NCFS) are used to identify phenotype-correlated genes (PCOGs) and to infer pathway activities. The NCFS-based pathway activity inference schemes significantly improved the power of pathway markers to discriminate between normal and cancer classes, as well as between relapse and non-relapse classes, in microarray expression datasets of breast cancer. Furthermore, ranker feature selection methods using the top 3 pathway markers were shown to be suitable for both logistic and naive Bayes (NB) classifiers. In addition, the proposed single pathway classification (SPC) ranker performed similarly to the traditional SVM and Relief-F feature selection methods. Identifying PCOGs within each pathway, especially using NCFS based on correlation with ideal markers (NCFS-i), helps to minimize the effect of potentially noisy experimental data, leading to accurate and robust classification results.