Systems and methods for identifying key phrase clusters within documents

Abstract

Systems and methods are disclosed for key phrase clustering of documents. In accordance with one implementation, a method is provided for key phrase clustering of documents. The method includes obtaining a first plurality of documents based at least on a user input, obtaining a statistical model based at least on the user input, and obtaining, from content of the first plurality of documents, a plurality of segments. The method also includes identifying a plurality of clusters of segments from the plurality of segments, determining statistical significance of the plurality of clusters based at least on the statistical model and the content, and providing for display a representative cluster from the plurality of tokens, the representative cluster being determined based at least on the statistical significance. The method further includes determining a label for the representative cluster based at least on the plurality of clusters and the statistical significance.
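
The steps named in the abstract (segments, clusters of segments, statistical significance, representative cluster, label) can be illustrated with a minimal sketch. It is not the patented method: the abstract does not specify the segmentation, the clustering rule, or the statistical model. Here the segments are assumed to be word bigrams, the clustering rule is a hypothetical crude normalization (`cluster_key`), the statistical model is stood in for by a plain dictionary of background rates (`background_rate`, with an assumed `default_rate`), and the significance score is a simple log-ratio of observed to expected counts.

```python
import math
import re
from collections import Counter, defaultdict


def segments(text, n=2):
    """Split text into overlapping n-word segments (assumed: word n-grams)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def cluster_key(segment):
    """Hypothetical grouping rule: strip a trailing 's', ignore word order."""
    return tuple(sorted(
        w[:-1] if w.endswith("s") and len(w) > 3 else w for w in segment
    ))


def key_phrase_clusters(documents, background_rate, default_rate=1e-4):
    """Return (label, score, members) per cluster, most significant first.

    background_rate maps a cluster key to its expected relative frequency;
    it is an illustrative stand-in for the abstract's statistical model.
    """
    clusters = defaultdict(Counter)
    total = 0
    for doc in documents:
        for seg in segments(doc):
            clusters[cluster_key(seg)][" ".join(seg)] += 1
            total += 1

    results = []
    for key, members in clusters.items():
        observed = sum(members.values())
        expected = background_rate.get(key, default_rate) * total
        # High when the cluster occurs far more often than the background predicts.
        score = observed * math.log(observed / expected)
        # Label the cluster with its most frequent surface form.
        label = members.most_common(1)[0][0]
        results.append((label, score, dict(members)))
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = [
    "Interest rates rose sharply.",
    "The interest rate fell.",
    "Analysts watch interest rates.",
]
label, score, members = key_phrase_clusters(docs, background_rate={})[0]
print(label, members)
```

In this toy input, "interest rates" and "interest rate" fall into one cluster, which ranks first and takes the more frequent surface form as its label. A real system would presumably use a background model estimated from a reference corpus rather than a single default rate.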

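Bibliographic details

  • Publication number: US10180929B1
  • Publication date: 2019-01-15
  • Original format: PDF
  • Applicant/Assignee: PALANTIR TECHNOLOGIES INC.
  • Application number: US201615293140
  • Inventors: MAX KESIN; HEM WADHAR
  • Filing date: 2016-10-13
  • Classification: G06F17/30; G06F17/21; G06F17/27; G06F3/0481
  • Country: US
  • Date added to database: 2022-08-21 12:12:32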
