Interactive cleaning for automatic document clustering and categorization

Abstract

Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents, indicating how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user-selected outlier criterion. Ambiguity measures are computed for the documents, indicating the number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
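The abstract does not give formulas for these measures, so the following is only a minimal sketch under stated assumptions. It assumes each document already has a per-class similarity score in [0, 1] under the model, takes the outlier measure as one minus the best similarity, and takes the ambiguity measure as the exponential of the entropy of the normalized similarities (an "effective number of classes"). All function names, the threshold, and the sample scores are hypothetical and are not taken from the patent.

```python
import math


def outlier_measure(similarities):
    """Assumed definition: 1 minus the best class similarity (higher = worse fit)."""
    return 1.0 - max(similarities.values())


def ambiguity_measure(similarities):
    """Assumed definition: exp(entropy) of the normalized similarities.

    Roughly 1.0 when one class dominates, roughly k when k classes tie.
    Returns 0.0 when the document resembles no class at all.
    """
    total = sum(similarities.values())
    if total == 0:
        return 0.0
    probs = [s / total for s in similarities.values() if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def corrective_label(similarities, annotated):
    """Return a class that is more similar than the annotated one, else None."""
    best = max(similarities, key=similarities.get)
    if similarities[best] > similarities[annotated]:
        return best
    return None


def flag_outliers(docs, threshold):
    """List document ids whose outlier measure exceeds a user-selected threshold."""
    return [doc_id for doc_id, sims in docs.items()
            if outlier_measure(sims) > threshold]


# Hypothetical similarity scores for three documents and two classes.
docs = {
    "a": {"sports": 0.9, "politics": 0.1},
    "b": {"sports": 0.5, "politics": 0.5},
    "c": {"sports": 0.05, "politics": 0.1},
}

outliers = flag_outliers(docs, threshold=0.6)
suggestion = corrective_label({"sports": 0.2, "politics": 0.7}, "sports")
```

In this sketch, a user reviewing the flagged outliers or the suggested label would then correct or remove documents, and the clustering or categorizing would be rerun on the adjusted collection. The same functions could be applied to a new document's similarities at runtime.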

Bibliographic details

Publication number: US2008249999A1

Patent type:
Publication date: 2008-10-09

Original format: PDF
Applicant/patentee: JEAN-MICHEL RENDERS; CAROLINE PRIVAULT; LUDOVIC MENUGE
Application number: US20070784321
Inventors: LUDOVIC MENUGE; CAROLINE PRIVAULT; JEAN-MICHEL RENDERS
Filing date: 2007-04-06
Classification: G06F17/30
Country: US
Database entry time: 2022-08-21 20:11:47
