
Document clustering using a locality sensitive hashing function


Abstract

Documents from a data stream are clustered by first generating a feature vector for each document. A set of candidate cluster centroids (e.g., feature vectors representing their corresponding clusters) is then retrieved from a memory by applying a locality sensitive hashing function to the document's feature vector. The centroids may be retrieved by first retrieving a set of cluster identifiers from a cluster table, each cluster identifier indicating a respective cluster centroid, and then retrieving from a memory the cluster centroids corresponding to those identifiers. The document may then be clustered into one or more of the candidate clusters using distance measures from its feature vector to the retrieved cluster centroids.
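The retrieval flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name a hash family, a distance measure, or a centroid update rule, so the sketch assumes random-hyperplane (sign) hashing, cosine distance, and a running-mean centroid. The class `LSHClusterer`, its parameters (`n_tables`, `n_bits`, `threshold`), and the rule of opening a new cluster when no candidate is close enough are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


class LSHClusterer:
    """Toy streaming clusterer: LSH lookup of candidate centroids, then distance check."""

    def __init__(self, dim, n_tables=8, n_bits=4, threshold=0.3, seed=0):
        rng = random.Random(seed)
        # Assumption: one set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.threshold = threshold              # max cosine distance to join a cluster
        self.cluster_table = defaultdict(set)   # hash key -> cluster identifiers
        self.centroids = {}                     # cluster identifier -> centroid vector
        self.sizes = {}                         # cluster identifier -> document count

    def _keys(self, vec):
        """Return one bucket key per hash table for the given vector."""
        keys = []
        for t, planes in enumerate(self.planes):
            bits = tuple(
                sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes
            )
            keys.append((t, bits))
        return keys

    @staticmethod
    def _cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        if na == 0 or nb == 0:
            return 1.0
        return 1.0 - dot / (na * nb)

    def add(self, vec):
        """Assign a document feature vector to a cluster and return its identifier."""
        # Step 1: look up candidate cluster identifiers in the cluster table.
        candidates = set()
        for key in self._keys(vec):
            candidates |= self.cluster_table.get(key, set())

        # Step 2: fetch the candidates' centroids and keep the nearest one
        # that lies within the distance threshold.
        best_id, best_dist = None, self.threshold
        for cid in candidates:
            d = self._cosine_distance(vec, self.centroids[cid])
            if d <= best_dist:
                best_id, best_dist = cid, d

        if best_id is None:
            # Assumption: no close candidate means the document starts a new cluster.
            best_id = len(self.centroids)
            self.centroids[best_id] = list(vec)
            self.sizes[best_id] = 1
        else:
            # Assumption: the centroid is the running mean of its members.
            n = self.sizes[best_id]
            self.centroids[best_id] = [
                (c * n + x) / (n + 1) for c, x in zip(self.centroids[best_id], vec)
            ]
            self.sizes[best_id] = n + 1

        # Index the (possibly moved) centroid; stale keys are left in place for brevity.
        for key in self._keys(self.centroids[best_id]):
            self.cluster_table[key].add(best_id)
        return best_id


clusterer = LSHClusterer(dim=3)
first = clusterer.add([1.0, 0.0, 0.0])
second = clusterer.add([1.0, 0.0, 0.0])    # identical vector joins the same cluster
third = clusterer.add([-1.0, 0.0, 0.0])    # opposite direction starts a new cluster
print(first, second, third)
```

Only centroids that share at least one bucket with the document are compared against it, which is what keeps the per-document cost low as the number of clusters grows. The sketch assigns each document to a single cluster, whereas the abstract also allows assignment to more than one candidate cluster.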
