A major challenge in bioinformatics is grouping protein sequences into functionally similar families. Large-scale clustering of protein sequences may help to identify novel relationships and may also be of use in structural genomics. This paper explores the use of graph-theoretic spectral methods for clustering protein sequences. Using the leading eigenvectors of a matrix derived from similarity information between protein sequences, we were able to obtain meaningful clusters on quite diverse sets of proteins. The results presented show that this method is often able to correctly identify the superfamilies to which the sequences belong.
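As a rough illustration of the idea of clustering with the leading eigenvectors of a similarity-derived matrix, the following is a minimal sketch and not the paper's actual procedure. The abstract does not say which matrix or which final grouping step is used, so the symmetric normalisation, the row normalisation, and the small k-means step below are assumptions borrowed from common spectral clustering practice. The function name `spectral_clusters` and the toy similarity values are hypothetical and are not data from the paper.

```python
import numpy as np

def spectral_clusters(similarity, k, n_iter=50):
    """Cluster items given a symmetric, non-negative similarity matrix.

    Assumes every item has positive total similarity (for example, a
    positive self-similarity on the diagonal).
    """
    S = np.asarray(similarity, dtype=float)
    degree = S.sum(axis=1)
    # Assumed choice: symmetrically normalised matrix D^{-1/2} S D^{-1/2}.
    inv_sqrt = 1.0 / np.sqrt(degree)
    M = S * inv_sqrt[:, None] * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the last k columns
    # are the k leading eigenvectors.
    _, vecs = np.linalg.eigh(M)
    X = vecs[:, -k:]
    # Row-normalise the embedding (guarding against zero-length rows).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1e-12)
    # Deterministic farthest-point initialisation, then plain k-means.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Made-up pairwise similarities for six hypothetical sequences:
# items 0-2 resemble each other, as do items 3-5.
toy_similarity = np.array([
    [1.00, 0.90, 0.80, 0.05, 0.02, 0.01],
    [0.90, 1.00, 0.85, 0.03, 0.04, 0.02],
    [0.80, 0.85, 1.00, 0.02, 0.01, 0.05],
    [0.05, 0.03, 0.02, 1.00, 0.90, 0.70],
    [0.02, 0.04, 0.01, 0.90, 1.00, 0.80],
    [0.01, 0.02, 0.05, 0.70, 0.80, 1.00],
])
labels = spectral_clusters(toy_similarity, k=2)
print(labels)
```

In practice the similarity matrix would come from pairwise sequence comparison scores, and the number of clusters `k` would have to be chosen or estimated; both are outside what this sketch covers.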