SYSTEM AND ENGINE FOR SEEDED CLUSTERING OF NEWS EVENTS

Abstract

The present invention provides a seeded news event clustering and retrieval system configured to, first, create a candidate data set of documents; second, create a set of initial clusters based on nearness or duplicate similarity status; and third, create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component, and generates sub-topic-focused clusters algorithmically. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to the similarity of evidence derived from two distinct sources: one relies on a digital signature based on the unstructured text in the document, and the other relies on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
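
The three stages and the two evidence sources described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Doc` type, the word-shingle `signature`, the Jaccard measure, the equal weighting of text and entity evidence, the average-linkage rule, and both thresholds are assumptions chosen only to make the example concrete. The candidate data set (stage one) is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    entities: frozenset = frozenset()  # named entity tags from some tagger


def signature(text, k=3):
    """Assumed stand-in for the text-based digital signature: word k-shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(d1, d2, w_text=0.5):
    """Combine the two evidence sources (the weighting is an assumption)."""
    text_sim = jaccard(signature(d1.text), signature(d2.text))
    entity_sim = jaccard(d1.entities, d2.entities)
    return w_text * text_sim + (1 - w_text) * entity_sim


def cluster_similarity(c1, c2):
    """Average-linkage similarity between two clusters (lists of Doc)."""
    return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(clusters, threshold):
    """Repeatedly merge the most similar pair until none reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if s >= best:
                    best, pair = s, (i, j)
        if pair is None:
            break
        i, j = pair
        clusters[i].extend(clusters.pop(j))
    return clusters


def seeded_clusters(seed_docs, candidates, dup_threshold=0.8, seed_threshold=0.3):
    """Stage 2: group near-duplicates; stage 3: merge groups close to the seeds."""
    initial = agglomerate([[d] for d in candidates], dup_threshold)
    aggregate = list(seed_docs)
    for cluster in initial:
        if cluster_similarity(seed_docs, cluster) >= seed_threshold:
            aggregate.extend(cluster)
    return initial, aggregate
```

A high `dup_threshold` keeps the initial clusters limited to near-duplicates, while the lower `seed_threshold` lets topically related but differently worded clusters join the seeded aggregate cluster; both values would need tuning on real news data.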

Bibliographic data

  • Publication number: US2017235820A1
  • Publication date: 2017-08-17
  • Original format: PDF
  • Applicant/patentee: JACK G. CONRAD; MICHAEL J. BENDER
  • Application number: US201715418763
  • Inventors: JACK G. CONRAD; MICHAEL J. BENDER
  • Filing date: 2017-01-29
  • Classification: G06F17/30; G06F17/22; G06F17/27
  • Country: US
  • Indexed: 2022-08-21 13:51:38
