Automatic word classification system


Abstract

PURPOSE: To form a thesaurus for natural language processing at high speed by classifying words through repeated division into clusters, using the co-occurrence frequency vectors of the words to be classified and an information-quantity criterion. CONSTITUTION: A statistical processing part 1 extracts words from an input document, totals the co-occurrence frequency between each extracted word and a specified context of that word, and prepares the co-occurrence frequency vector of the word. An automatic word classification part 2 classifies the words using the co-occurrence frequency vectors prepared by the statistical processing part 1 and outputs a thesaurus that classifies those words. To classify the words, the automatic word classification part 2 first divides the group of words to be classified into two clusters and finds the relation between the two clusters at that point (the full description length). It then exchanges words between the two clusters so that this description length is minimized under the prescribed information-quantity criterion. Clustering is then applied again to each of the two resulting clusters, and division is repeated until the clusters cannot be divided any further.
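The role of the statistical processing part can be illustrated with a minimal Python sketch. The abstract does not say what the "specified context" is, so this sketch assumes it is the word that immediately follows within the same sentence; the function name `cooccurrence_vectors`, the `window` parameter, and the sample text are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def cooccurrence_vectors(text, window=1):
    """Map each word to a Counter of the context words that follow it.

    Assumption: the "context" of a word is the next `window` tokens in the
    same sentence. The source does not specify the context definition.
    """
    vectors = defaultdict(Counter)
    # Split into sentences so that counts do not cross sentence boundaries.
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, word in enumerate(tokens):
            for context in tokens[i + 1 : i + 1 + window]:
                vectors[word][context] += 1
    return dict(vectors)


# Made-up sample input, for illustration only.
vectors = cooccurrence_vectors(
    "Drink wine. Drink beer. Eat bread. Eat rice. Drink wine."
)
print(vectors["drink"])
```

Each word's `Counter` plays the part of its co-occurrence frequency vector; a word that never precedes another token simply has no entry.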
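The division procedure in the abstract can be sketched as a recursive two-way split. This is an illustration under stated assumptions, not the patented method: the abstract gives no formula for the description length, so the sketch assumes each cluster is encoded with one shared multinomial over contexts plus a (k/2) log2 N parameter penalty, and it moves single words between clusters as a simplification of "exchanging" words. All function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def description_length(cluster, vectors, penalty):
    """Bits to encode a cluster's pooled context counts, plus a model penalty."""
    pooled = Counter()
    for word in cluster:
        pooled.update(vectors[word])
    size = sum(pooled.values())
    data_bits = -sum(c * math.log2(c / size) for c in pooled.values())
    return data_bits + penalty


def split(words, vectors, penalty):
    """Divide `words` in two; return None if no split shortens the description."""
    if len(words) < 2:
        return None

    def dl(cluster):
        return description_length(cluster, vectors, penalty)

    left, right = set(words[::2]), set(words[1::2])  # arbitrary initial halves
    best = dl(left) + dl(right)
    improved = True
    while improved:
        improved = False
        for word in words:
            src, dst = (left, right) if word in left else (right, left)
            if len(src) == 1:
                continue  # never empty a cluster
            src.remove(word)
            dst.add(word)
            total = dl(left) + dl(right)
            if total < best - 1e-9:
                best, improved = total, True  # keep the move
            else:
                dst.remove(word)  # undo the move
                src.add(word)
    if best < dl(set(words)) - 1e-9:
        return sorted(left), sorted(right)
    return None


def build(words, vectors, penalty):
    """Split recursively until no cluster can be divided any further."""
    parts = split(words, vectors, penalty)
    if parts is None:
        return sorted(words)  # leaf cluster of the thesaurus
    return [build(part, vectors, penalty) for part in parts]


def classify(vectors):
    contexts = {c for v in vectors.values() for c in v}
    n_total = sum(sum(v.values()) for v in vectors.values())
    # Assumed penalty: (k / 2) * log2(N), with k = free multinomial parameters.
    penalty = 0.5 * (len(contexts) - 1) * math.log2(n_total)
    return build(sorted(vectors), vectors, penalty)


# Made-up toy counts: nouns described by the verbs they occur with.
toy = {
    "wine": Counter(drink=20),
    "beer": Counter(drink=18),
    "bread": Counter(eat=20),
    "rice": Counter(eat=19),
}
print(classify(toy))
```

The nested list returned by `classify` stands in for the thesaurus: each inner list is a cluster that could not be divided further without lengthening the assumed description. The greedy loop only finds a local minimum, so a different initial split may give a different result on real data.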

Bibliographic data

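  • Publication number: JP3304670B2

    Patent type

  • Publication date: 2002-07-22

    Original format: PDF

  • Applicant/patentee: 日本電気株式会社

    Application number: JP19950065716

  • Inventor: 李 航

    Filing date: 1995-03-24

  • Classification: G06F17/28

  • Country: JP

  • Database entry time: 2022-08-22 01:01:11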