Lung diseases include some of the most widespread and deadly conditions known to affect people in the US today. One of the main challenges in treating lung disease is the difficulty of diagnosis. Clinical diagnosis remains largely symptom-based, so many cases are misdiagnosed or go undiagnosed until the disease has progressed to a more severe stage. Most studies aimed at finding molecular diagnostics have focused on one or two diseases at a time, with limited success. Instead, we searched for biomarkers reflective of the global health state of the lung by studying data taken from a broad range of lung diseases. We used gene expression microarray data from five different lung diseases (lung adenocarcinoma, lung squamous cell carcinoma, malignant pleural mesothelioma, chronic obstructive pulmonary disease, and asthma), as well as a nondiseased phenotype, to train a classification tree scheme based on the Top Scoring Pair (TSP) algorithm (Geman et al., Stat Appl Genet Mol Biol. 2004; 3: Article 19). The algorithm identified 27 gene pair classifiers that classify the three cancers explicitly, and several of the markers have been previously cited in the literature as linked to these cancers. Ten-fold cross-validation yielded a classification accuracy of approximately 88%. Thus, a TSP-based classification tree scheme accurately identifies lung diseases from the relative expression of a small number of diagnostic gene pairs.
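To make the relative-expression idea concrete, the following is a minimal sketch of a single two-class top-scoring-pair classifier, not the classification tree scheme used in the study. It assumes each sample is a list of expression values and each label is 0 or 1. The function names `fit_tsp` and `predict_tsp` are hypothetical, and ties between equally scoring pairs are resolved simply by keeping the first pair found, which may differ from the tie-breaking in the cited algorithm.

```python
from itertools import combinations


def fit_tsp(samples, labels):
    """Pick the gene pair (i, j) whose ordering best separates two classes.

    samples: list of expression profiles, one list of floats per sample
    labels:  list of class labels, each 0 or 1 (assumed encoding)
    Returns (i, j, label_if_less), where label_if_less is the class
    predicted when gene i is expressed below gene j.
    """
    n_genes = len(samples[0])
    groups = {
        c: [s for s, lab in zip(samples, labels) if lab == c] for c in (0, 1)
    }
    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class with gene i below gene j
        p = {
            c: sum(s[i] < s[j] for s in groups[c]) / len(groups[c])
            for c in (0, 1)
        }
        # Pair score: how differently the two classes order this pair
        score = abs(p[0] - p[1])
        if best is None or score > best[0]:
            best = (score, i, j, 1 if p[1] > p[0] else 0)
    _, i, j, label_if_less = best
    return i, j, label_if_less


def predict_tsp(model, sample):
    """Classify one sample from the relative order of the chosen pair."""
    i, j, label_if_less = model
    return label_if_less if sample[i] < sample[j] else 1 - label_if_less
```

Because the decision depends only on which of two genes is more highly expressed within a sample, a rule of this kind is unaffected by any monotone rescaling of that sample's measurements. A multi-class scheme such as the one described above would combine several such pairwise decisions.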