A system and method perform automated domain concept discovery and clustering using word embeddings. The method receives a set of documents for natural language processing in a domain; represents a plurality of entries in the set of documents as continuous vectors in a high-dimensional continuous space; applies a clustering algorithm based on a mutual information optimization criterion to form a set of clusters; associates each entry of the plurality of entries with each cluster in the set of clusters by formalizing an evidence-based model of each cluster given each entry; calculates a mutual information metric between each entry and each cluster using the evidence-based model; and identifies a nominal center of each cluster by maximizing the mutual information.
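The steps summarized above can be illustrated with a minimal sketch. Nothing below is taken from the disclosure itself: the toy entries and their 2-D vectors stand in for real word embeddings, plain k-means with a farthest-point initialization is swapped in for the mutual-information-based clustering criterion, the evidence model p(cluster | entry) is assumed to be a softmax over negative squared distances, and the metric is assumed to be pointwise mutual information under a uniform prior over entries. All function names are hypothetical.

```python
import numpy as np

def sq_dists(vectors, centroids):
    """Squared Euclidean distance from every vector to every centroid."""
    return ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

def farthest_point_init(vectors, k):
    """Deterministic seeding: start at entry 0, then repeatedly add the farthest entry."""
    chosen = [0]
    d2 = ((vectors - vectors[0]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(d2.argmax())
        chosen.append(nxt)
        d2 = np.minimum(d2, ((vectors - vectors[nxt]) ** 2).sum(axis=1))
    return vectors[chosen].copy()

def kmeans(vectors, k, iters=20):
    """Plain k-means, used here only as a stand-in clustering step (assumption)."""
    centroids = farthest_point_init(vectors, k)
    for _ in range(iters):
        labels = sq_dists(vectors, centroids).argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def evidence_model(vectors, centroids, temperature=1.0):
    """Assumed evidence model: p(cluster | entry) as a softmax over negative distances."""
    logits = -sq_dists(vectors, centroids) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def pointwise_mi(p_c_given_e):
    """log p(c | e) - log p(c), with p(c) averaged over entries (uniform entry prior)."""
    p_c = p_c_given_e.mean(axis=0)
    return np.log(p_c_given_e + 1e-12) - np.log(p_c + 1e-12)

def nominal_centers(entries, vectors, k):
    """Cluster the entries and pick, per cluster, the member with the highest PMI."""
    centroids = kmeans(vectors, k)
    p = evidence_model(vectors, centroids)
    pmi = pointwise_mi(p)
    labels = p.argmax(axis=1)
    centers = {}
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members):
            centers[j] = entries[int(members[pmi[members, j].argmax()])]
    return labels, centers, pmi

# Toy data (hypothetical): two loose "domains" embedded in two dimensions.
entries = ["loan", "mortgage", "credit", "fever", "cough", "flu"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
                    [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]])

labels, centers, pmi = nominal_centers(entries, vectors, k=2)
print(labels, centers)
```

Under these assumptions the nominal center is the member whose evidence most strongly favors its own cluster, which need not be the entry closest to the geometric centroid. A different evidence model or clustering criterion would change which entry is selected.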