Discovering molecular heterogeneity within phenotypically defined diseases is critically important both for understanding the pathogenic mechanisms of complex diseases and for finding effective treatments. It has recently been recognized that cellular phenotypes are determined by the concerted actions of many functionally related genes acting in a modular fashion. These underlying modular mechanisms should help reveal the hidden genetic heterogeneity of complex diseases. We defined a putative disease module as a functional gene group, in terms of both biological process and cellular localization, that is significantly enriched with genes whose expression varies highly across the disease samples. As a validation, we used two large cancer datasets to evaluate how well the modules partition samples correctly. We then sought subtypes of the complex disease diffuse large B-cell lymphoma (DLBCL) using a public dataset. Finally, the clinical significance of the identified subtypes was verified by survival analysis. In the two validation datasets, we achieved highly accurate partitions that best fit the clinical cancer phenotypes. For the notoriously heterogeneous DLBCL, we demonstrated that the two subtypes partitioned using an identified module (“cellular response to stress”) had very different 5-year overall survival rates (65% vs. 14%) and were highly significantly (P < 0.007) correlated with clinical survival. Finally, we built a multivariate Cox proportional-hazards prediction model that included 4 genes as risk predictors for survival in DLBCL. The proposed modular approach is a promising computational strategy for peeling off genetic heterogeneity and understanding the modular mechanisms of human diseases such as cancers.
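The module definition above rests on testing whether a functional gene group is enriched with highly variable genes. The abstract does not name the statistical test or the variability cutoff, so the following is only a minimal sketch under stated assumptions: variability is measured by population variance, the top fraction of genes is called "highly variable", and enrichment is scored with a one-sided hypergeometric test. The function names, the `fraction` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from math import comb
from statistics import pvariance


def top_variable_genes(expression, fraction=0.1):
    """Return the genes with the highest variance across samples.

    expression: dict mapping gene name -> list of expression values.
    fraction:   share of genes to keep (assumed cutoff, not from the paper).
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:keep])


def enrichment_p_value(group, variable, background):
    """One-sided hypergeometric P(X >= k).

    X is the number of highly variable genes found in a random gene set
    of the same size as `group`, drawn from `background`.
    """
    background = set(background)
    group = set(group) & background
    variable = set(variable) & background
    N, K, n = len(background), len(variable), len(group)
    k = len(group & variable)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy data: three genes vary strongly across four samples, seven do not.
expression = {
    "g1": [1.0, 9.0, 2.0, 8.0],
    "g2": [0.5, 7.5, 1.0, 8.0],
    "g3": [9.0, 1.0, 8.5, 0.5],
}
for i in range(4, 11):
    expression[f"g{i}"] = [5.0, 5.1, 4.9, 5.0]

variable = top_variable_genes(expression, fraction=0.3)
candidate_group = {"g1", "g2", "g3"}  # hypothetical functional gene group
p = enrichment_p_value(candidate_group, variable, expression.keys())
print(f"enrichment p-value: {p:.4f}")
```

In a real analysis the gene groups would come from functional annotations covering biological process and cellular localization, and the resulting p-values would need correction for multiple testing across all groups examined.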