TEXT-REPRESENTATION, TEXT-MATCHING AND TEXT-CLASSIFICATION CODE, SYSTEM AND METHOD

Abstract

Disclosed are computer-readable code, a system, and a method for representing, retrieving, and/or classifying a target document in the form of a digitally encoded natural-language text. For each of a plurality of non-generic words and/or word groups (terms) characterizing the target document, a selectivity value is determined, calculated as the frequency of occurrence of that term in a library of texts in one field relative to its frequency of occurrence in one or more other libraries of texts in one or more other fields. The document is represented as a vector of terms, where the coefficient assigned to each term is a function of the selectivity value determined for that term. For each of a plurality of sample texts having associated classification identifiers, a match score is then determined that is related to the number of descriptive terms present in or derived from that text that match those in the target text. From the selected matched texts and their associated classification identifiers, a classification of the target document may be determined.
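The pipeline in the abstract (selectivity value, term vector, match score, classification) can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: frequency is taken as the fraction of texts in a library that contain the term, selectivity as the ratio of the in-field frequency to the highest frequency in any other library (with a hypothetical `floor` to avoid division by zero), the coefficient as the selectivity value itself, the match score as the sum of coefficients of shared terms, and the classification as a score-weighted vote over the top-scoring sample texts. The names `GENERIC`, `terms`, `selectivity`, `represent`, `match_score`, and `classify` are hypothetical and do not come from the patent.

```python
from collections import Counter

# Hypothetical list of generic words to exclude (assumption, not from the patent).
GENERIC = {"a", "an", "the", "of", "and", "for", "in", "is", "to"}


def terms(text):
    """Split a text into its set of non-generic, lower-cased words."""
    return {w for w in text.lower().split() if w not in GENERIC}


def term_frequency(term, library):
    """Fraction of texts (term sets) in `library` that contain `term`."""
    if not library:
        return 0.0
    return sum(1 for text in library if term in text) / len(library)


def selectivity(term, field_library, other_libraries, floor=0.01):
    """In-field frequency relative to the highest frequency in any other field.

    `floor` is an assumed smoothing constant that keeps the ratio finite
    when the term never occurs outside the field.
    """
    in_field = term_frequency(term, field_library)
    elsewhere = max(
        (term_frequency(term, lib) for lib in other_libraries), default=0.0
    )
    return in_field / max(elsewhere, floor)


def represent(text, field_library, other_libraries):
    """Represent a text as {term: coefficient}; here coefficient = selectivity."""
    return {
        t: selectivity(t, field_library, other_libraries) for t in terms(text)
    }


def match_score(target_vector, sample_terms):
    """Sum the coefficients of target terms that also occur in the sample."""
    return sum(c for t, c in target_vector.items() if t in sample_terms)


def classify(target_vector, labeled_samples, top_n=3):
    """Score-weighted vote over the `top_n` best-matching labeled samples."""
    scored = sorted(
        ((match_score(target_vector, s), label) for s, label in labeled_samples),
        reverse=True,
    )
    votes = Counter()
    for score, label in scored[:top_n]:
        votes[label] += score
    return votes.most_common(1)[0][0] if votes else None
```

A term that is equally common inside and outside the field gets a selectivity of 1.0 under these assumptions, while a term concentrated in the field gets a larger coefficient and therefore contributes more to the match score. Any monotonic function of the selectivity value could replace the identity coefficient used here.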

Bibliographic details

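  • Publication number: WO2004006124A3
  • Publication date: 2004-05-06
  • Original format: PDF
  • Applicant/patentee: WORD DATA CORP.
  • Application number: WO2003US21243
  • Inventors: DEHLINGER PETER J.; CHIN SHAO
  • Filing date: 2003-07-02
  • Classification: G06F17/30
  • Country: WO
  • Date added to database: 2022-08-21 22:59:02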
