
SYSTEM AND METHOD FOR HIERARCHICALLY ORGANIZING DOCUMENTS BASED ON DOCUMENT PORTIONS


Abstract

Embodiments as disclosed may generate an organizational hierarchy based on embeddings of portions of documents. The embeddings of the document portions can be clustered using a hierarchical clustering mechanism to segment the portion space into a set of hierarchical clusters. Documents can then be assigned to these clusters: a document is assigned to a cluster when one of its portions falls within that cluster. In this manner, the documents themselves may be clustered according to the clusters formed from portions across all documents of the corpus. The clusters to which a document is assigned may also be ranked with respect to that document. Similarly, the documents assigned to a cluster can be ranked within that cluster. Additionally, in certain embodiments, names or snippets for the clusters of the hierarchy may be derived from the portions that make up each cluster.
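The pipeline in the abstract (embed portions, cluster them hierarchically, assign documents through their portions, rank, and name clusters) can be illustrated with a minimal sketch. It is not the patented method: the embedding step is stubbed out with hand-made 2-D vectors, average-linkage agglomerative clustering with cosine distance stands in for the unspecified "hierarchical clustering mechanism", and the ranking rules (portion counts within a cluster, a document's share of each cluster) and the "portion nearest the centroid" naming rule are illustrative assumptions. All names (`Portion`, `build_hierarchy`, `describe`, `rank_clusters_for_doc`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Portion:
    doc_id: str
    text: str
    vec: list  # embedding of this portion; assumed to come from an external model


@dataclass
class Cluster:
    members: list  # indices into the portion list
    children: list = field(default_factory=list)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_hierarchy(portions):
    """Average-linkage agglomerative clustering of portion embeddings (assumed method)."""
    def linkage(c1, c2):
        ds = [cosine_distance(portions[i].vec, portions[j].vec)
              for i in c1.members for j in c2.members]
        return sum(ds) / len(ds)

    active = [Cluster([i]) for i in range(len(portions))]
    while len(active) > 1:
        # Merge the closest pair of active clusters under a new parent node.
        a, b = min(((i, j) for i in range(len(active)) for j in range(i + 1, len(active))),
                   key=lambda ij: linkage(active[ij[0]], active[ij[1]]))
        parent = Cluster(active[a].members + active[b].members, [active[a], active[b]])
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [parent]
    return active[0]


def walk(cluster, depth=0):
    """Yield (depth, cluster) for every non-leaf node, parents before children."""
    if cluster.children:
        yield depth, cluster
        for child in cluster.children:
            yield from walk(child, depth + 1)


def describe(cluster, portions):
    """Assign documents to a cluster, rank them, and derive a name snippet."""
    # A document belongs to the cluster if any of its portions is a member;
    # ranking by member-portion count is an assumed heuristic.
    counts = Counter(portions[i].doc_id for i in cluster.members)
    ranked_docs = sorted(counts, key=lambda d: (-counts[d], d))
    dim = len(portions[cluster.members[0]].vec)
    centroid = [sum(portions[i].vec[k] for i in cluster.members) / len(cluster.members)
                for k in range(dim)]
    # Assumed naming rule: use the member portion closest to the centroid.
    name_idx = min(cluster.members, key=lambda i: cosine_distance(portions[i].vec, centroid))
    return {"name": portions[name_idx].text, "docs": ranked_docs,
            "counts": counts, "size": len(cluster.members)}


def rank_clusters_for_doc(descriptions, doc_id):
    """Rank a document's clusters by the share of each cluster it contributes (assumed)."""
    hits = [d for d in descriptions if doc_id in d["counts"]]
    return sorted(hits, key=lambda d: -d["counts"][doc_id] / d["size"])


# Toy corpus: hand-made 2-D vectors stand in for real portion embeddings.
portions = [
    Portion("doc1", "refund policy for returns", [1.0, 0.0]),
    Portion("doc1", "shipping times overseas", [0.0, 1.0]),
    Portion("doc2", "how to request a refund", [0.9, 0.1]),
    Portion("doc3", "international delivery delays", [0.2, 0.9]),
]
root = build_hierarchy(portions)
descriptions = [describe(c, portions) for _, c in walk(root)]
for (depth, _), d in zip(walk(root), descriptions):
    print("  " * depth + f"{d['name']!r}: {d['docs']}")
```

With these toy vectors the two refund-related portions and the two shipping-related portions merge first, so `doc1`, which has a portion in each, is assigned to both sub-clusters, while `rank_clusters_for_doc(descriptions, "doc2")` places the refund cluster ahead of the root. The pairwise merge loop is cubic in the number of portions; for a real corpus one would likely substitute a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="cosine"`.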


