COMPUTER SYSTEM FOR CREATING TERM DICTIONARY WITH NAMED ENTITIES OR TERMINOLOGIES INCLUDED IN TEXT DATA, AND METHOD AND COMPUTER PROGRAM THEREFOR

Abstract

PROBLEM TO BE SOLVED: To find every word that should be registered in newly added text, without omission, and to make the work of constructing a term dictionary of word categories efficient.

SOLUTION: A computer system includes a morphological analysis unit which acquires token sequence data by performing morphological analysis on text data; a category distinguishing unit which distinguishes the category of each token in the token sequence data by using a category dictionary, and thereby extracts uncategorized words; an uncategorized-word comparing unit which compares each extracted uncategorized word with an uncategorized-word comparison rule and extracts the words matching that rule as registration candidate words; and a token-sequence comparing unit which compares token sequences in the token sequence data with a token-sequence comparison rule and extracts the token sequences matching that rule as registration candidate words. The system further comprises a permission unit which lets a user select whether to register the registration candidate words in the category dictionary.

COPYRIGHT: (C)2010, JPO&INPIT
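
The units described in the abstract can be pictured as a small pipeline. The following Python sketch is illustrative only and is not taken from the patent: the function names, the regular-expression tokenizer standing in for real morphological analysis, the form of the two comparison rules (a regular expression for single words, a list of categories for token sequences), and the `confirm` callback standing in for the user's choice are all assumptions.

```python
import re
from typing import Callable, Dict, List, Optional, Pattern, Tuple

# (surface form, category); category is None for an uncategorized word.
Tagged = Tuple[str, Optional[str]]


def analyze(text: str) -> List[str]:
    """Stand-in for morphological analysis: split text into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def categorize(tokens: List[str], category_dict: Dict[str, str]) -> List[Tagged]:
    """Look each token up in the category dictionary (hypothetical format)."""
    return [(t, category_dict.get(t.lower())) for t in tokens]


def match_uncategorized(tagged: List[Tagged], word_rule: Pattern[str]) -> List[str]:
    """Keep uncategorized words that fully match the single-word rule."""
    return [w for w, c in tagged if c is None and word_rule.fullmatch(w)]


def match_sequences(tagged: List[Tagged], seq_rule: List[Optional[str]]) -> List[str]:
    """Keep token windows whose categories equal the rule, position by position.

    A None entry in seq_rule matches an uncategorized token (an assumption).
    """
    n = len(seq_rule)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(cat == want for (_, cat), want in zip(window, seq_rule)):
            hits.append(" ".join(word for word, _ in window))
    return hits


def propose(
    text: str,
    category_dict: Dict[str, str],
    word_rule: Pattern[str],
    seq_rule: List[Optional[str]],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run the pipeline and return the candidates the user accepted."""
    tagged = categorize(analyze(text), category_dict)
    candidates = match_uncategorized(tagged, word_rule) + match_sequences(tagged, seq_rule)
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    return [c for c in unique if confirm(c)]
```

In this sketch the caller decides what registration means, for example by adding each accepted candidate to `category_dict` under a category the user picks. Passing `confirm` as a callback keeps the extraction logic separate from whatever interface asks the user.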
