Embodiments provide a system and method for semantic distance calculation. The method can involve receiving a plurality of documents having a set of subjects extracted through the use of latent Dirichlet allocation (LDA); for each document in the plurality of documents, generating a classification list comprising a ranking of one or more of the subjects based on the relevance of each subject to the document; for each classification list, calculating the semantic distance between each pair of subjects present on the classification list; aggregating the plurality of classification lists; and creating a distance matrix containing the relative semantic distances between each pair of members of the set of subjects.
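The abstract does not specify how relevance is scored, how distance is measured within one classification list, or how the lists are aggregated. The Python sketch below fills in those gaps with explicit assumptions and is not the claimed implementation. It assumes LDA has already been run, so each document arrives as a mapping from subject to topic weight. It also assumes the distance between two subjects on one list is the absolute difference of their rank positions, and that aggregation takes the arithmetic mean over the lists on which both subjects appear. Pairs that never co-occur are marked `math.inf`. The function names `classification_list`, `list_distances` and `distance_matrix`, and the `min_weight` cutoff, are hypothetical.

```python
import math
from itertools import combinations


def classification_list(topic_weights, min_weight=0.0):
    """Rank one document's subjects by relevance, highest first.

    Assumption: `topic_weights` maps subject -> relevance (e.g. the
    document's LDA topic proportions). `min_weight` is a hypothetical
    cutoff; subjects at or below it are left off the list.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    kept = [s for s, w in topic_weights.items() if w > min_weight]
    return sorted(kept, key=lambda s: (-topic_weights[s], s))


def list_distances(ranked):
    """Distance for every pair of subjects on one classification list.

    Assumed metric: absolute difference of the two rank positions.
    """
    return {
        frozenset((a, b)): abs(i - j)
        for (i, a), (j, b) in combinations(enumerate(ranked), 2)
    }


def distance_matrix(docs, min_weight=0.0):
    """Aggregate per-list distances into a symmetric subject x subject matrix.

    Assumed aggregation: mean distance over the lists where both subjects
    appear. The diagonal is 0.0; pairs that never co-occur stay math.inf.
    Returns (sorted subject names, matrix as a list of rows).
    """
    totals, counts = {}, {}
    for weights in docs:
        ranked = classification_list(weights, min_weight)
        for pair, d in list_distances(ranked).items():
            totals[pair] = totals.get(pair, 0) + d
            counts[pair] = counts.get(pair, 0) + 1

    subjects = sorted(set().union(*docs))
    index = {s: k for k, s in enumerate(subjects)}
    n = len(subjects)
    matrix = [[0.0 if r == c else math.inf for c in range(n)] for r in range(n)]
    for pair, total in totals.items():
        a, b = tuple(pair)
        mean = total / counts[pair]
        matrix[index[a]][index[b]] = mean
        matrix[index[b]][index[a]] = mean
    return subjects, matrix


if __name__ == "__main__":
    docs = [
        {"sports": 0.6, "finance": 0.3, "health": 0.1},
        {"finance": 0.5, "health": 0.4, "sports": 0.1},
    ]
    names, m = distance_matrix(docs)
    print(names)
    for row in m:
        print(row)
```

With the two sample documents, `finance` and `health` sit one rank apart on both lists, so their aggregated distance is 1.0. The other two pairs average to 1.5. Swapping in a different per-list metric (for example, the difference in topic weights) or a different aggregation only requires changing `list_distances` or the mean step.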