Corpus generation device, corpus generation method, and corpus generation program

Abstract

The corpus generation device according to the embodiment includes a web page acquisition unit, a reference word acquisition unit, an assigning unit, and an output unit. The web page acquisition unit acquires a web page containing explanatory text data about a presentation target. The reference word acquisition unit acquires, from the web page, a reference word that is an attribute value related to the presentation target. The assigning unit extracts, from a storage unit that stores hierarchical relationship information representing the broader/narrower relationships between attribute values, a broader term that ranks above the reference word acquired by the reference word acquisition unit, and assigns an attribute tag corresponding to the reference word to a word in the explanatory text data. The output unit outputs the explanatory text data to which the assigning unit has assigned the attribute tag as corpus data.
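The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything concrete in it is an assumption made for illustration: the page is an in-memory dictionary rather than a fetched web page, the reference words are read from a hypothetical attribute table on that page, the hierarchy is a hypothetical child-to-parent mapping, and the tag is written as inline angle-bracket markup. The sketch also assumes that the attribute tag is applied both to the reference word itself and to any of its broader terms that appear in the explanatory text; the abstract does not spell out that matching rule.

```python
import re

# Hypothetical hierarchical relationship information:
# narrower attribute value -> its immediate broader term.
HIERARCHY = {"Bordeaux": "France", "Burgundy": "France", "France": "Europe"}


def broader_terms(word, hierarchy):
    """Return every term ranked above `word`, nearest first."""
    terms, seen = [], {word}
    while word in hierarchy:
        word = hierarchy[word]
        if word in seen:  # guard against cycles in the stored hierarchy
            break
        seen.add(word)
        terms.append(word)
    return terms


def assign_tags(text, reference_words, hierarchy):
    """Tag words in `text` that match a reference word or a broader term.

    `reference_words` maps an attribute name (used as the tag) to the
    reference word acquired for that attribute.
    """
    surface_to_tag = {}
    for attribute, reference in reference_words.items():
        for term in [reference] + broader_terms(reference, hierarchy):
            surface_to_tag.setdefault(term, attribute)
    if not surface_to_tag:
        return text
    # Try longer surface forms first so that they win over their substrings.
    alternatives = sorted(surface_to_tag, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in alternatives))

    def wrap(match):
        tag = surface_to_tag[match.group(0)]
        return f"<{tag}>{match.group(0)}</{tag}>"

    return pattern.sub(wrap, text)


def generate_corpus(page, hierarchy):
    """Acquire reference words from the page, tag its text, return corpus data."""
    return assign_tags(page["description"], page["attributes"], hierarchy)


# Hypothetical stand-in for an acquired web page.
page = {
    "attributes": {"region": "Bordeaux"},
    "description": "A red wine from France, made in Bordeaux.",
}
print(generate_corpus(page, HIERARCHY))
```

Under these assumptions, "France" in the description receives the region tag even though the page's attribute table lists only "Bordeaux", because the hierarchy records "France" as a broader term of "Bordeaux".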

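Bibliographic data

  • Publication number: JPWO2015045155A1

  • Publication date: 2017-03-02

  • Original format: PDF

  • Applicant/patentee: 楽天株式会社

  • Application number: JP20140508630

  • Inventor: 新里 圭司

  • Filing date: 2013-09-30

  • Classification: G06F17/21; G06F17/30

  • Country: JP

  • Date indexed: 2022-08-21 13:53:19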
