Unsupervised multirelational learning (clustering) in non-sparse domains such as molecular biology is especially difficult as most clustering algorithms tend to produce distinct clusters in slightly different runs (either with different initializations or with slightly different training data). In this paper we develop a multirelational consensus clustering algorithm based on nonnegative decompositions, which are known to produce sparser and more interpretable clusterings than other data-oriented algorithms. We apply this algorithm to the joint analysis of the largest available gene expression datasets for leukemia and respectively normal hematopoiesis in order to develop a more comprehensive genomic characterization of the heterogeneity of leukemia in terms of 38 normal hematopoietic cell states. Surprisingly, we find unusually complex expression programs involving large numbers of transcription factors, whose further in-depth analysis may help develop personalized therapies.
展开▼