
Interactive cleaning for automatic document clustering and categorization


Abstract

Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents, indicating how well each document fits the model. Outlier documents are identified to a user based on the outlier measures and a user-selected outlier criterion. Ambiguity measures are computed for the documents, indicating the number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified when the annotated document has higher similarity with that class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
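
The abstract does not say how the outlier measure, ambiguity measure, or similarity are computed, so the following is only an illustrative sketch under stated assumptions: a bag-of-words representation, one centroid per class, and cosine similarity. All function names, the `ratio` parameter, the threshold, and the toy documents are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter

def vectorize(text):
    # Assumed representation: raw term counts of whitespace-separated tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_model(labeled_docs):
    # Assumed model: one centroid (summed term counts) per class.
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids

def similarities(model, text):
    vec = vectorize(text)
    return {label: cosine(vec, centroid) for label, centroid in model.items()}

def outlier_measure(model, text):
    # Assumed definition: 1 minus the best similarity to any class,
    # so a higher value means a worse fit to the model.
    return 1.0 - max(similarities(model, text).values())

def find_outliers(model, texts, threshold):
    # The threshold stands in for the user-selected outlier criterion.
    return [t for t in texts if outlier_measure(model, t) > threshold]

def ambiguity_measure(model, text, ratio=0.8):
    # Assumed definition: number of classes whose similarity is at least
    # `ratio` times the best similarity (0 if nothing matches at all).
    sims = similarities(model, text)
    best = max(sims.values())
    if best == 0.0:
        return 0
    return sum(1 for s in sims.values() if s >= ratio * best)

def corrective_label(model, text, annotated_label):
    # Suggest another class only if it is strictly more similar to the
    # document than the class the document was annotated with.
    sims = similarities(model, text)
    best = max(sims, key=sims.get)
    return best if sims[best] > sims[annotated_label] else None

# Toy training set; the last document is deliberately mislabeled.
docs = [
    ("printer toner cartridge ink", "hardware"),
    ("printer paper jam tray", "hardware"),
    ("invoice payment tax refund", "finance"),
    ("invoice payment bank account", "finance"),
    ("printer toner ink cartridge", "finance"),
]
model = build_model(docs)

# Flag a possible corrective label, apply it as the user's correction,
# then repeat the model building to obtain an updated model.
suggestion = corrective_label(model, docs[-1][0], docs[-1][1])
if suggestion is not None:
    docs[-1] = (docs[-1][0], suggestion)
updated_model = build_model(docs)
```

In this sketch the same `outlier_measure` and `ambiguity_measure` functions can be called on a new, unlabeled document at classification time, which mirrors the runtime use mentioned in the abstract. A real system would likely use a weighted representation and a probabilistic or trained model rather than raw counts.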
