
Patent expanded retrieval via word embedding under composite-domain perspectives



Abstract

Patent prior art search uses dispersed information to retrieve, despite strong ambiguity, all relevant documents from a massive patent database. This challenging task consists of patent reduction and patent expansion. Existing studies on patent reduction ignore the relevance between technical characteristics and technical domains, which results in ambiguous queries. Work on patent expansion draws expansion terms from external resources by selecting words with either a similar distribution or similar semantics. However, this separates the distribution of a term from its semantics, although the two are related. Moreover, a general-purpose repository can hardly meet the needs of patent expansion for uncommon semantics and unusual terms. To solve these problems, we first present a novel composite-domain perspective model, which converts the technical characteristics of a query patent into a specific composite classified domain and generates aspect queries. We then implement patent expansion with double consistency by combining distribution and semantics simultaneously. We also propose training semantic vector spaces via word embedding under the specific classified domains, so as to provide domain-aware expansion resources. Finally, multiple retrieval results for the same topic are merged based on perspective weight and rank in the results. Our experimental results on CLEF-IP 2010 demonstrate that our method is very effective: it achieves an improvement of about 5.43% in recall and nearly 12.38% in PRES over the state of the art. Our work also achieves the best performance balance in terms of recall, MAP, and PRES.
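
The final merging step, which combines the result lists of several aspect queries by perspective weight and rank, might look roughly like the sketch below. The abstract does not give the actual formula, so the weighted reciprocal-rank score used here is an assumption, and the function name, perspective names, document IDs, and weights are all hypothetical.

```python
from collections import defaultdict


def merge_by_weight_and_rank(results, weights):
    """Fuse ranked lists: each perspective adds weight / rank to a document's score.

    results: dict mapping a perspective name to a ranked list of document IDs (best first)
    weights: dict mapping a perspective name to a non-negative weight
    The scoring formula is an illustrative assumption, not the paper's method.
    """
    scores = defaultdict(float)
    for perspective, ranked_docs in results.items():
        weight = weights.get(perspective, 0.0)
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] += weight / rank
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical aspect-query results for a single topic.
results = {
    "domain_A": ["EP1", "EP2", "EP3"],
    "domain_B": ["EP4", "EP2"],
}
weights = {"domain_A": 0.6, "domain_B": 0.4}
merged = merge_by_weight_and_rank(results, weights)
print(merged)
```

In this toy input, a document returned by both perspectives (EP2) accumulates score from each list, so it can outrank a document that only one lower-weighted perspective ranks first.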

Bibliographic record

  • Source
    Frontiers of Computer Science | 2019, Issue 5 | pp. 1048-1061 | 14 pages
  • Author affiliations

    Wuhan Univ Sch Comp Sci Wuhan 430072 Hubei Peoples R China|Wuhan Univ State Key Lab Software Engn Wuhan 430072 Hubei Peoples R China;

    Wuhan Univ Sch Comp Sci Wuhan 430072 Hubei Peoples R China|Wuhan Univ State Key Lab Software Engn Wuhan 430072 Hubei Peoples R China;

    Wuhan Univ Sch Comp Sci Wuhan 430072 Hubei Peoples R China|Wuhan Univ State Key Lab Software Engn Wuhan 430072 Hubei Peoples R China;

    Wuhan Univ Sch Comp Sci Wuhan 430072 Hubei Peoples R China|Wuhan Univ State Key Lab Software Engn Wuhan 430072 Hubei Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    patent retrieval; composite-domain perspective; double-consistency expansion; word embedding;

