SYSTEM AND ENGINE FOR SEEDED CLUSTERING OF NEWS EVENTS

Abstract

The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or "seed" component and generates sub-topic-focused clusters algorithmically. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources: one relies on a digital signature based on the unstructured text in the document; the other is based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
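
The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify how the digital signature is computed, how the two evidence sources are combined, or what thresholds are used. The word-shingle signature, the Jaccard measure, the equal weighting, both threshold values, and all names (`Doc`, `signature`, `similarity`, `initial_clusters`, `seeded_cluster`) are assumptions chosen only for illustration. The candidate data set (stage one) is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    entities: frozenset  # named-entity tags, e.g. supplied by an external tagger


def signature(text, k=3):
    """Hypothetical text signature: the set of k-word shingles (an assumption)."""
    words = text.lower().split()
    if len(words) < k:
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(d1, d2, w_text=0.5):
    """Blend the two evidence sources; the equal weighting is an assumption."""
    text_sim = jaccard(signature(d1.text), signature(d2.text))
    entity_sim = jaccard(d1.entities, d2.entities)
    return w_text * text_sim + (1 - w_text) * entity_sim


def initial_clusters(docs, dup_threshold=0.8):
    """Stage two: single-link agglomeration of near-duplicate documents."""
    clusters = [[d] for d in docs]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(similarity(a, b) >= dup_threshold
                       for a in clusters[i] for b in clusters[j]):
                    # Merge cluster j into cluster i, then rescan from the start.
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


def seeded_cluster(seeds, clusters, merge_threshold=0.3):
    """Stage three: fold initial clusters that resemble any seed into one aggregate."""
    aggregate, leftover = list(seeds), []
    for cluster in clusters:
        if any(similarity(s, d) >= merge_threshold for s in seeds for d in cluster):
            aggregate.extend(cluster)
        else:
            leftover.append(cluster)
    return aggregate, leftover
```

In this sketch, `initial_clusters` groups the candidate documents first, and `seeded_cluster` then returns the seed-anchored aggregate together with the initial clusters that matched no seed. A production system would more likely use a compact fingerprint rather than raw shingle sets, but the abstract leaves that choice open.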
