Document clustering using a locality sensitive hashing function
Abstract
Documents from a data stream are clustered by first generating a feature vector for each document. A set of candidate cluster centroids (e.g., feature vectors representing their corresponding clusters) is then retrieved from a memory, based on the document's feature vector, using a locality sensitive hashing function. The centroids may be retrieved in two steps: a set of cluster identifiers, each indicative of a respective cluster centroid, is retrieved from a cluster table, and the cluster centroids corresponding to those identifiers are then retrieved from a memory. The document may then be assigned to one or more of the candidate clusters using distance measures from its feature vector to the retrieved centroids.
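The retrieval and assignment steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the class name `LSHClusterer`, the use of random-hyperplane (sign) hashing as the locality sensitive hash, cosine distance as the distance measure, the `threshold` parameter, and the rule that a document with no sufficiently close candidate starts a new cluster. Centroids are also kept fixed after creation to keep the example short.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


class LSHClusterer:
    """Hypothetical streaming clusterer (illustrative names and parameters)."""

    def __init__(self, dim, num_tables=4, num_bits=8, threshold=0.5, seed=0):
        rng = random.Random(seed)
        # One set of random hyperplanes per hash table (assumed LSH family).
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        # Cluster table: per hash table, hash key -> set of cluster identifiers.
        self.cluster_table = [defaultdict(set) for _ in range(num_tables)]
        # Centroid memory: cluster identifier -> centroid feature vector.
        self.centroids = {}
        self.threshold = threshold

    def _keys(self, vec):
        """Hash a feature vector to one sign-bit key per hash table."""
        return [
            tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)
            for planes in self.planes
        ]

    def candidates(self, vec):
        """Look up cluster identifiers, then fetch the matching centroids."""
        ids = set()
        for table, key in zip(self.cluster_table, self._keys(vec)):
            ids |= table.get(key, set())
        return {cid: self.centroids[cid] for cid in ids}

    def add(self, vec):
        """Assign vec to the nearest candidate cluster, or open a new one."""
        cands = self.candidates(vec)
        if cands:
            best = min(cands, key=lambda cid: cosine_distance(vec, cands[cid]))
            if cosine_distance(vec, cands[best]) <= self.threshold:
                return best
        # Assumption: no close candidate means the document seeds a new cluster.
        cid = len(self.centroids)
        self.centroids[cid] = list(vec)
        for table, key in zip(self.cluster_table, self._keys(vec)):
            table[key].add(cid)
        return cid
```

Calling `add` once per incoming feature vector returns a cluster identifier for each document. Only centroids that share a hash bucket with the document are compared, which is what keeps the per-document cost independent of the total number of clusters in this sketch.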