SYSTEM AND ENGINE FOR SEEDED CLUSTERING OF NEWS EVENTS
Abstract
The present invention provides a seeded news event clustering and retrieval system. The system is configured to do three things in order: first, create a candidate data set of documents; second, create a set of initial clusters based on near-duplicate or exact-duplicate similarity status; and third, create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label, or “seed”, component and generates sub-topic-focused clusters algorithmically. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity evidence derived from two distinct sources. One source relies on a digital signature computed from the unstructured text of the document. The other is based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger, such as the Thomson Reuters Calais engine/web service.
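The three-stage process and the two-source merge decision can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions. Word-shingle Jaccard similarity stands in for the unspecified text digital signature. Entity tags are assumed to be precomputed, for example by a tagger such as Calais. The `Doc` type, the function names, the thresholds, and the merge rule are all hypothetical. Under that rule, an unseeded cluster is merged into a seeded one when any document pair passes either the text threshold or the entity threshold.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    entities: FrozenSet[str] = frozenset()  # assumed precomputed by an entity tagger
    is_seed: bool = False  # True for editorially supplied seed documents


def shingles(text: str, k: int = 3) -> set:
    """Word k-shingles: a simple stand-in for the text 'digital signature'."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: AbstractSet, b: AbstractSet) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_sim(d1: Doc, d2: Doc) -> float:
    return jaccard(shingles(d1.text), shingles(d2.text))


def linked(c1: List[Doc], c2: List[Doc], text_thr: float, ent_thr: float) -> bool:
    """Hypothetical single-link rule: some doc pair agrees on text OR entity evidence."""
    return any(
        text_sim(a, b) >= text_thr or jaccard(a.entities, b.entities) >= ent_thr
        for a in c1
        for b in c2
    )


def has_seed(cluster: List[Doc]) -> bool:
    return any(d.is_seed for d in cluster)


def seeded_clustering(candidates: List[Doc], dup_thr: float = 0.8,
                      text_thr: float = 0.4, ent_thr: float = 0.5) -> List[List[Doc]]:
    # Stage 2: greedily group near- or exact-duplicate candidates into initial clusters.
    clusters: List[List[Doc]] = []
    for doc in candidates:
        for cluster in clusters:
            if any(text_sim(doc, other) >= dup_thr for other in cluster):
                cluster.append(doc)
                break
        else:
            clusters.append([doc])

    # Stage 3: agglomeratively merge unseeded clusters into seeded ones until
    # no linked pair remains. Two seeded clusters are never merged with each other.
    merged = True
    while merged:
        merged = False
        for i, ci in enumerate(clusters):
            if not has_seed(ci):
                continue
            for j, cj in enumerate(clusters):
                if i != j and not has_seed(cj) and linked(ci, cj, text_thr, ent_thr):
                    ci.extend(cj)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after modifying the cluster list
    return clusters
```

With a seed about a central-bank rate decision, an exact copy of the seed joins it at the duplicate stage. A differently worded article that carries the same entity tags joins at the merge stage, and an unrelated sports story stays in its own cluster. Refusing to merge two seeded clusters is one way to keep each editorially supplied topic as a separate top-level cluster.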