METHOD AND DEVICE FOR REGISTERING UNKNOWN WORD WITH NOUN THESAURUS AND RECORDING MEDIUM WITH UNKNOWN WORD REGISTRATION PROGRAM RECORDED THEREIN

Abstract

PROBLEM TO BE SOLVED: To determine, in a statistically rigorous way, the node of a noun thesaurus under which an unknown word should be registered, namely the node whose multinomial distribution is closest to the multinomial distribution of the unknown word. Two substitutions make this rigorous. First, a Bayesian estimator, whose validity is theoretically guaranteed even for a limited sample, is used instead of raw co-occurrence frequencies. Second, the Kullback-Leibler information quantity (KL information quantity), a (non-symmetric) measure of distance between distributions in the space of probability distributions, is used instead of the cosine between vectors in a vector space.

SOLUTION: The device consists of three means. A means 100 calculates, from the document data in a corpus 120, the co-occurrence frequency of the unknown word with each verb and the co-occurrence frequency of each node of a noun thesaurus 130 with each verb. A means 200 uses this co-occurrence frequency information to calculate the Bayesian estimator of the multinomial distribution with which the unknown word co-occurs with each verb, and the Bayesian estimator of the multinomial distribution with which each node of the noun thesaurus co-occurs with each verb. A means 300 uses these Bayesian estimators and, with the KL information quantity as the criterion, outputs the node of the noun thesaurus whose multinomial distribution is closest to that of the unknown word as the registration node for the unknown word.

COPYRIGHT: (C)2000,JPO
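The three means can be illustrated with a small Python sketch. It is not the patented implementation, and several details are assumptions for illustration only: the corpus is taken to be pre-extracted (noun, verb) pairs, a thesaurus node's counts are assumed to be the sum over the nouns filed under it, the Bayesian estimator is assumed to be the posterior mean under a symmetric Dirichlet prior with a hypothetical parameter `alpha`, and the KL direction D(unknown word || node) is a choice the abstract does not specify. All function names are hypothetical.

```python
from collections import Counter
import math


def cooccurrence_counts(pairs, nouns):
    """Stage 1 (cf. means 100): per-verb counts over all pairs whose noun is in `nouns`.

    Assumption: a thesaurus node's counts are the sum over its member nouns.
    """
    return Counter(verb for noun, verb in pairs if noun in nouns)


def bayes_multinomial(counts, verbs, alpha=1.0):
    """Stage 2 (cf. means 200): Bayesian estimate of a multinomial over `verbs`.

    Assumption: posterior mean under a symmetric Dirichlet(alpha) prior,
    p(v) = (n_v + alpha) / (N + K * alpha), which is strictly positive for every verb.
    """
    total = sum(counts[v] for v in verbs)
    denom = total + alpha * len(verbs)
    return {v: (counts[v] + alpha) / denom for v in verbs}


def kl_divergence(p, q):
    """KL information quantity D(p || q) in nats; requires q[v] > 0 wherever p[v] > 0."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def register_unknown_word(word, pairs, thesaurus, alpha=1.0):
    """Stage 3 (cf. means 300): return (node, divergence) for the KL-closest node.

    Assumption: divergence is measured as D(unknown word || node).
    """
    verbs = sorted({verb for _, verb in pairs})
    p_word = bayes_multinomial(cooccurrence_counts(pairs, {word}), verbs, alpha)
    best_node, best_kl = None, math.inf
    for node, members in thesaurus.items():
        q_node = bayes_multinomial(cooccurrence_counts(pairs, set(members)), verbs, alpha)
        d = kl_divergence(p_word, q_node)
        if d < best_kl:
            best_node, best_kl = node, d
    return best_node, best_kl


# Toy usage with hypothetical data: "mango" is unknown and co-occurs only with "eat".
pairs = [
    ("apple", "eat"), ("apple", "eat"), ("bread", "eat"), ("bread", "bake"),
    ("car", "drive"), ("truck", "drive"), ("truck", "park"),
    ("mango", "eat"), ("mango", "eat"),
]
thesaurus = {"food": ["apple", "bread"], "vehicle": ["car", "truck"]}
node, divergence = register_unknown_word("mango", pairs, thesaurus)
print(node)  # food
```

The smoothing matters for the KL criterion: with raw relative frequencies, any verb the node never co-occurred with but the unknown word did would get probability zero under the node and make D(p || q) infinite, whereas the Dirichlet posterior mean keeps every probability positive.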
Bibliographic data

  • Publication number: JP2000231572A

  • Publication date: 2000-08-22

  • Applicant/patentee: NIPPON TELEGR & TELEPH CORP NTT

  • Application number: JP19990032475

  • Inventor: MAEDA YASUNARI

  • Application date: 1999-02-10

  • Classification: G06F17/30

  • Country: JP
