
EXTRACTING PROCESSING SYSTEM FOR CHARACTERISTIC VOCABULARY IN JAPANESE OBJECT SENTENCE


Abstract

PURPOSE: To automatically extract the characteristic vocabulary of a Japanese object sentence. The system converts a Japanese document into character-type code strings, extracts candidate characteristic words from those code strings, uses linguistic information to narrow the candidates down to those with higher accuracy, and outputs the words that do not exist in the Japanese analysis dictionary.

CONSTITUTION: For an input Japanese document, a code string expanding part 1 generates several types of character-type code string, assigning a code to every character in the document. A characteristic vocabulary candidate extracting part 2 extracts every character string whose code string matches an entry in an extracting character type string prescribing table 7 as a candidate characteristic word of the Japanese object sentence, and classifies the candidates according to the conditions in a classifying table 8. Next, a characteristic vocabulary language processing part 3 consults a language information table 9, processes each candidate, and keeps the candidates with higher accuracy. A characteristic vocabulary language selecting part 4 then searches an analysis dictionary 10, using the written form (character string) of each candidate from part 3 as the key. A candidate that is already registered in dictionary 10 is removed. A candidate that is not yet registered is regarded as characteristic vocabulary of the Japanese object sentence and is sent to a registering part, which writes and registers it into a file 6.
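The four stages in the CONSTITUTION paragraph map onto a small pipeline. The following Python sketch illustrates that flow under stated assumptions; it is not NTT's implementation. The abstract does not disclose the contents of tables 7 to 10, so the character-type code set, the regular expressions standing in for table 7, the class labels for table 8, the suffix list for table 9, the toy dictionary, and every function name (`char_type`, `expand`, `extract_candidates`, `classify`, `refine`, `characteristic_vocabulary`, `register`) are hypothetical.

```python
import re
from pathlib import Path

# All codes, tables and names below are hypothetical illustrations, not the patent's contents.
# Character-type codes: K = kanji, H = hiragana, C = katakana, A = ASCII letter, D = digit, S = other.
def char_type(ch):
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF or ch == "々":
        return "K"
    if 0x3041 <= cp <= 0x309F:
        return "H"
    if 0x30A1 <= cp <= 0x30FF:  # also covers the long-vowel mark "ー"
        return "C"
    if ch.isascii() and ch.isalpha():
        return "A"
    if ch.isdigit():
        return "D"
    return "S"

# Code string expanding part 1: a fine and a coarse code string with one code per
# character, so regex match offsets in either string index straight into the text.
COARSE = {"K": "X", "C": "X", "A": "X", "D": "X", "H": "-", "S": "-"}

def expand(text):
    fine = "".join(char_type(ch) for ch in text)
    return {"fine": fine, "coarse": "".join(COARSE[c] for c in fine)}

# Stand-in for the extracting character type string prescribing table 7.
EXTRACT_TABLE = [
    ("fine", r"C{2,}"),    # katakana runs (loanwords)
    ("fine", r"A{2,}"),    # ASCII-letter runs (acronyms)
    ("fine", r"K{2,}"),    # kanji compounds
    ("coarse", r"X{3,}"),  # mixed runs such as "LSI設計システム"
]

# Stand-in for the classifying table 8: set of character types -> class label.
CLASS_TABLE = {
    frozenset("C"): "katakana",
    frozenset("A"): "alphabetic",
    frozenset("K"): "kanji",
}

# Stand-in for the language information table 9.
LANGUAGE_INFO = {"suffixes": ("的", "性", "化"), "min_len": 2}

def extract_candidates(text):
    """Part 2: every substring whose code string matches a table-7 entry, deduplicated."""
    codes = expand(text)
    found = []
    for level, pattern in EXTRACT_TABLE:
        for m in re.finditer(pattern, codes[level]):
            word = text[m.start():m.end()]
            if word not in found:
                found.append(word)
    return found

def classify(word):
    """Table-8 classification; applied here to the refined word for simplicity."""
    return CLASS_TABLE.get(frozenset(char_type(ch) for ch in word), "mixed")

def refine(word):
    """Part 3: strip one suffix from table 9 if enough of the word remains."""
    for suffix in LANGUAGE_INFO["suffixes"]:
        if word.endswith(suffix) and len(word) - len(suffix) >= LANGUAGE_INFO["min_len"]:
            return word[: -len(suffix)]
    return word

def characteristic_vocabulary(text, dictionary):
    """Parts 3-4: refine candidates, then keep only words absent from the dictionary."""
    result, seen = [], set()
    for word in map(refine, extract_candidates(text)):
        if len(word) < LANGUAGE_INFO["min_len"] or word in dictionary or word in seen:
            continue
        seen.add(word)
        result.append((word, classify(word)))
    return result

def register(entries, path):
    """Registering part: append the new vocabulary to a UTF-8 text file (file 6)."""
    with Path(path).open("a", encoding="utf-8") as f:
        for word, label in entries:
            f.write(f"{word}\t{label}\n")

if __name__ == "__main__":
    toy_dictionary = {"設計", "課題", "システム", "効率"}
    print(characteristic_vocabulary(
        "新しいLSI設計システムでは、論理合成の効率化が課題である。", toy_dictionary))
    # [('LSI', 'alphabetic'), ('論理合成', 'kanji'), ('LSI設計システム', 'mixed')]
```

In this toy run, "効率化" is first reduced to "効率" by the suffix rule and then discarded because the toy dictionary already contains it, while "システム", "設計" and "課題" are dropped directly as known words. Using a coarse code string alongside the fine one is one way to read "plural types of character type code strings": it lets a single pattern catch mixed-script compounds that no single-script run covers.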

Bibliographic data

  • Publication number: JPH01266670A
  • Publication date: 1989-10-24
  • Applicant: NIPPON TELEGR & TELEPH CORP NTT
  • Application number: JP19880095096
  • Filing date: 1988-04-18
  • Inventors: OKU MASAHIRO; HIGASHIDA MASANOBU
  • Classification: G06F17/27; G06F17/30
  • Country: JP
  • Original format: PDF
  • Date added to database: 2022-08-22 06:46:00
