A text classification system and method for the analysis and management of text

Abstract

Documents are classified into one or more clusters corresponding to predefined classification categories by building a knowledge base comprising matrices of vectors which indicate the significance of terms within a corpus of text formed by the documents and classified in the knowledge base to each cluster. The significance of terms is determined assuming a standard normal probability distribution, and terms are determined to be significant to a cluster if their probability of occurrence being due to chance is low. For each cluster, statistical signatures comprising sums of weighted products and intersections of cluster terms to corpus terms are generated and used as discriminators for classifying documents. The knowledge base is built using prefix and suffix lexical rules which are context-sensitive and applied selectively to improve the accuracy and precision of classification.
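The significance test described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula, so this assumes a normal approximation to a binomial count: a term's observed count in a cluster is compared with the count expected from its corpus-wide rate, and the resulting z-score is thresholded. The function names, the tokenized-list input format, and the default threshold of 1.645 (roughly a 5% one-sided tail) are all illustrative assumptions, not the patent's method.

```python
import math
from collections import Counter


def term_z_scores(cluster_docs, corpus_docs):
    """Z-score of each cluster term's count against its corpus-wide rate.

    Assumption (not from the patent): counts are modelled as binomial and
    approximated by a normal distribution. Each document is a list of tokens,
    and the cluster documents are expected to be a subset of the corpus.
    """
    corpus_counts = Counter(t for doc in corpus_docs for t in doc)
    cluster_counts = Counter(t for doc in cluster_docs for t in doc)
    n_corpus = sum(corpus_counts.values())
    n_cluster = sum(cluster_counts.values())

    scores = {}
    for term, observed in cluster_counts.items():
        p = corpus_counts[term] / n_corpus        # corpus-wide term rate
        expected = n_cluster * p                  # count expected by chance
        std = math.sqrt(n_cluster * p * (1 - p))  # binomial standard deviation
        if std > 0:                               # skip degenerate terms
            scores[term] = (observed - expected) / std
    return scores


def significant_terms(scores, z_threshold=1.645):
    """Keep terms whose over-representation is unlikely to be due to chance."""
    return {t: z for t, z in scores.items() if z >= z_threshold}
```

A term that occurs at the same rate in the cluster as in the whole corpus scores near zero, while a term concentrated in the cluster scores high. The statistical signatures and the prefix and suffix lexical rules mentioned in the abstract are not modelled here.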

Bibliographic data

  • Publication number: NZ502332A
  • Publication date: 2002-10-25
  • Original format: PDF
  • Applicant/patentee: THE DIALOG CORPORATION
  • Application number: NZ19980502332
  • Inventor: ZHILYAEV MAXIM
  • Filing date: 1998-06-16
  • Classification: G06K9/62; G06K9/68; G06K9/70; G06K9/74
  • Country: NZ
  • Record added: 2022-08-22 00:44:01
