METHOD FOR AUTOMATIC ITERATIVE CLUSTERISATION OF ELECTRONIC DOCUMENTS ACCORDING TO SEMANTIC SIMILARITY, METHOD FOR SEARCH IN PLURALITY OF DOCUMENTS CLUSTERED ACCORDING TO SEMANTIC SIMILARITY AND COMPUTER-READABLE MEDIA

Abstract

FIELD: information technology.

SUBSTANCE: a method for automatic iterative clusterisation of electronic documents according to semantic similarity includes: converting each electronic document into a corresponding vector in a multidimensional space whose number of dimensions is determined by the terms contained in the document; finding the proximity measure between the obtained vector and each of the vectors already present in the clusters, which combine semantically similar documents processed previously; supplementing the cluster for which the found proximity measure is minimal with the document being processed; determining a new vector for the supplemented cluster; and taking as the term of the supplemented cluster the name of the document in that cluster whose vector has the minimal proximity measure to the new vector. Thus, when new electronic documents are input, each existing cluster is processed as a single document rather than as a set of documents.

EFFECT: simpler and faster processing of electronic documents, and simpler and faster search within the clustered set for documents relevant to a search request.

12 cl, 6 dwg
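The steps listed in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not specify the proximity measure, how a cluster's vector is recomputed, or when a new cluster is started. The sketch assumes cosine distance as the proximity measure (smaller means closer), takes the sum of the member vectors as the cluster vector, and opens a new cluster when the smallest distance exceeds a hypothetical `threshold`. All names (`vectorize`, `Cluster`, `add_document`, `search`) are invented for this example.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector; each distinct term is one dimension (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Assumed proximity measure: 0.0 for identical direction, 1.0 for no shared terms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


class Cluster:
    """A cluster is represented by ONE vector, so it is compared like a single document."""

    def __init__(self):
        self.docs = {}           # document name -> document vector
        self.vector = Counter()  # cluster vector (assumed: sum of member vectors)
        self.label = None        # name of the member closest to the cluster vector

    def add(self, name, vec):
        self.docs[name] = vec
        self.vector = self.vector + vec  # determine the new cluster vector
        # Label the cluster with the member whose vector is nearest the new vector.
        self.label = min(
            self.docs, key=lambda n: cosine_distance(self.docs[n], self.vector)
        )


def add_document(clusters, name, text, threshold=0.8):
    """Place one document into the nearest cluster; `threshold` is a hypothetical cutoff."""
    vec = vectorize(text)
    best = min(clusters, key=lambda c: cosine_distance(vec, c.vector), default=None)
    if best is None or cosine_distance(vec, best.vector) > threshold:
        best = Cluster()  # assumption: start a new cluster when nothing is close enough
        clusters.append(best)
    best.add(name, vec)
    return best


def search(clusters, query):
    """Assumed search step: compare the query with cluster vectors only."""
    qvec = vectorize(query)
    best = min(clusters, key=lambda c: cosine_distance(qvec, c.vector), default=None)
    return sorted(best.docs) if best else []


if __name__ == "__main__":
    clusters = []
    add_document(clusters, "cats.txt", "cat kitten purr cat")
    add_document(clusters, "dogs.txt", "dog puppy bark dog")
    add_document(clusters, "pets.txt", "cat purr kitten")
    for c in clusters:
        print(c.label, sorted(c.docs))
    print(search(clusters, "puppy bark"))
```

Because each incoming document or query is compared against one vector per cluster rather than against every stored document, the number of comparisons grows with the number of clusters, which is consistent with the faster processing the abstract claims.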