
Corpus generation device, corpus generation method, and corpus generation program

Abstract

A corpus generation device according to an embodiment includes a web page acquisition unit, a reference word acquisition unit, an attachment unit and an output unit. The web page acquisition unit acquires a web page including description sentence data regarding a presentation target. The reference word acquisition unit acquires a reference word that is an attribute value regarding the presentation target from the web page. The attachment unit extracts a broader word belonging to a layer above the reference word acquired by the reference word acquisition unit from a storage unit that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaches an attribute tag corresponding to the reference word to the broader word included in the description sentence data. The output unit outputs, as corpus data, the description sentence data to which the attribute tag is attached by the attachment unit.
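
The attachment step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the `PARENT` table, the function names `broader_words` and `attach_tags`, the sample values, and the inline `<attribute>...</attribute>` tag format are all assumptions made for the example. The abstract does not specify how the hierarchy is stored or how tags are written.

```python
import re

# Hypothetical hierarchical relationship information:
# each attribute value maps to the value one layer above it.
PARENT = {
    "Bordeaux": "France",
    "France": "Europe",
}


def broader_words(reference_word, parent=PARENT):
    """Return every word above reference_word in the hierarchy, nearest first."""
    result = []
    seen = {reference_word}
    current = parent.get(reference_word)
    # Walk upward until the top of the hierarchy (or a cycle) is reached.
    while current is not None and current not in seen:
        result.append(current)
        seen.add(current)
        current = parent.get(current)
    return result


def attach_tags(description, reference_word, attribute, parent=PARENT):
    """Wrap each broader word found in description with the attribute tag."""
    words = broader_words(reference_word, parent)
    if not words:
        return description
    # Try longer words first so a long word wins over a word it contains.
    ordered = sorted(words, key=len, reverse=True)
    pattern = "|".join(re.escape(w) for w in ordered)
    return re.sub(
        pattern,
        lambda m: f"<{attribute}>{m.group(0)}</{attribute}>",
        description,
    )


if __name__ == "__main__":
    # Reference word "Bordeaux" comes from the page's attribute data;
    # the description sentence only mentions the broader word "France".
    print(attach_tags("A red wine from France.", "Bordeaux", "origin"))
```

In this sketch the reference word itself is not tagged, because the abstract only describes attaching the tag to broader words found in the description sentence data.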

Bibliographic data

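  • Publication number: JP5576003B1
  • Publication date: 2014-08-20
  • Original format: PDF
  • Applicant/patentee: 楽天株式会社 (Rakuten, Inc.)
  • Application number: JP20140508630
  • Inventor: 新里 圭司
  • Filing date: 2013-09-30
  • Classification: G06F17/21; G06F17/27; G06F17/30
  • Country: JP
  • Date indexed: 2022-08-21 16:14:34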
