SYSTEM AND ENGINE FOR SEEDED CLUSTERING OF NEWS EVENTS
Abstract
The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or "seed" component and generates sub-topic-focused clusters based on an algorithm. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources: one relying on a digital signature based on the unstructured text in the document, the other based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
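The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The word-bigram "signature", the Jaccard measure, the equal weighting of the two evidence sources, both thresholds, and all names (`Doc`, `signature`, `initial_clusters`, `aggregate_cluster`) are assumptions made for illustration. The entity tags are hard-coded here, where a real system would obtain them from a named entity tagger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    entities: frozenset  # named entity tags (hard-coded in this sketch)


def signature(text, k=2):
    """Stand-in for a text signature: the set of word k-shingles (assumption)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_sim(d1, d2):
    return jaccard(signature(d1.text), signature(d2.text))


def combined_sim(d1, d2, w_text=0.5):
    """Blend the two evidence sources: text signature and entity tags."""
    entity_sim = jaccard(d1.entities, d2.entities)
    return w_text * text_sim(d1, d2) + (1 - w_text) * entity_sim


def initial_clusters(candidates, dup_threshold=0.9):
    """Stage 2: group near-duplicate candidates by text similarity alone."""
    clusters = []
    for doc in candidates:
        for cluster in clusters:
            if any(text_sim(doc, m) >= dup_threshold for m in cluster):
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def aggregate_cluster(seeds, clusters, merge_threshold=0.5):
    """Stage 3: agglomeratively merge initial clusters into the seeded cluster."""
    aggregate = list(seeds)
    remaining = list(clusters)
    merged = True
    while merged:  # repeat until a full pass merges nothing
        merged = False
        for cluster in list(remaining):
            best = max(
                (combined_sim(a, c) for a in aggregate for c in cluster),
                default=0.0,
            )
            if best >= merge_threshold:
                aggregate.extend(cluster)
                remaining.remove(cluster)
                merged = True
    return aggregate, remaining


# Stage 1: a toy candidate set plus one editorially supplied seed document.
seed = Doc("seed", "earthquake strikes coastal city", frozenset({"Chile", "USGS"}))
candidates = [
    Doc("a", "strong earthquake strikes coastal city overnight", frozenset({"Chile", "USGS"})),
    Doc("b", "strong earthquake strikes coastal city overnight", frozenset({"Chile"})),
    Doc("c", "central bank raises interest rates", frozenset({"ECB"})),
]

initial = initial_clusters(candidates)
aggregate, leftover = aggregate_cluster([seed], initial)
```

In this toy run, documents `a` and `b` have identical text and so form one initial cluster, which then merges with the seed because `a` shares both text bigrams and entity tags with it. Document `c` shares neither and stays outside the aggregate cluster.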