METHOD FOR AUTOMATIC AND SEMI-AUTOMATIC CLASSIFICATION AND CLUSTERING OF NON-DETERMINISTIC TEXTS

Abstract

Non-deterministic text with an average word recognition precision below 50% is processed using non-textual differences between words or sequences of words in the text, providing more useful information to users by resolving more than two decision options. One or more indexes that indicate non-textual differences between n-word sequences, where n is a positive integer, may be generated for use in data mining that considers those differences. Alternatively, multiple indexes may be generated using different data mining techniques, which may or may not utilize non-textual differences, and the results produced by those techniques may then be merged to identify non-textual differences. These techniques may be used in classifying, labeling, categorizing, filtering, clustering, or retrieving documents, or in discovering salient terms in a set of documents.
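The idea of an index over n-word sequences that carries non-textual information can be illustrated with a minimal sketch. The abstract does not say which non-textual differences are used, so this example assumes a per-word recognition confidence score as one plausible instance. The names `build_ngram_index` and `rank`, the data layout, and the choice to weight an n-word sequence by the product of its word confidences are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from math import prod


def build_ngram_index(docs, n=1):
    """Map each n-word sequence to {doc_id: accumulated confidence weight}.

    `docs` maps a document id to a list of (word, confidence) pairs.
    The confidence is an ASSUMED example of a non-textual attribute.
    """
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, tokens in docs.items():
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            ngram = tuple(word for word, _ in window)
            # Assumption: a sequence is only as reliable as the product
            # of the confidences of the words it contains.
            weight = prod(conf for _, conf in window)
            index[ngram][doc_id] += weight
    return {ngram: dict(postings) for ngram, postings in index.items()}


def rank(index, ngram):
    """Return (doc_id, weight) pairs for an n-gram, highest weight first."""
    postings = index.get(tuple(ngram), {})
    return sorted(postings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical recognizer output for two short documents.
    docs = {
        "a": [("refund", 0.9), ("please", 0.4)],
        "b": [("refund", 0.2), ("refund", 0.3)],
    }
    unigram_index = build_ngram_index(docs, n=1)
    print(rank(unigram_index, ["refund"]))
```

In this sketch, document "b" contains "refund" twice but with low confidence, so it ranks below document "a", which contains it once with high confidence. A purely textual term count would order them the other way, which is the kind of distinction a confidence-aware index is meant to capture.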