Journal: Data Technologies and Applications

Domain-specific word embeddings for patent classification



Abstract

Purpose: Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application, it can then be compared with previously granted patents in the same class. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, patent-specific language use, such as special vocabulary and phrases, is a challenge for automation.

Design/methodology/approach: To account for this language use, the authors present domain-specific pre-trained word embeddings for the patent domain. The authors train the model on a very large data set of more than 5 million patents and evaluate it on the task of patent classification. To this end, the authors propose a deep learning approach for automatic patent classification that is based on gated recurrent units and built on the trained word embeddings.

Findings: Experiments on a standardized evaluation data set show that the approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. The authors further investigate the model's strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. Imbalanced training data and underrepresented classes remain the most difficult challenges.

Originality/value: The proposed approach fulfills the need for domain-specific word embeddings for downstream tasks in the patent domain, such as patent classification or patent analysis.
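The architecture described in the abstract, a recurrent classifier that reads a patent text through fixed, pre-trained word vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `GRUClassifier`, the toy three-dimensional embeddings, the hidden size, and the number of classes are all hypothetical, the weights are random and untrained, and bias terms are omitted for brevity. It only shows how a gated recurrent unit consumes a token sequence and produces a probability distribution over classes.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUClassifier:
    """Hypothetical, untrained GRU text classifier over frozen word embeddings."""

    def __init__(self, embeddings, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.embeddings = embeddings  # word -> vector; assumed pre-trained and frozen
        dim = len(next(iter(embeddings.values())))
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand_matrix(hidden_size, dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrh"}
        self.out = rand_matrix(num_classes, hidden_size, rng)

    def step(self, x, h):
        """One GRU time step (biases omitted)."""
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_size
        for token in tokens:
            x = self.embeddings.get(token)
            if x is None:
                continue  # skip out-of-vocabulary tokens
            h = self.step(x, h)
        logits = matvec(self.out, h)
        # Numerically stable softmax over the class logits.
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Toy stand-in for domain-specific embeddings (values are made up).
toy_embeddings = {
    "semiconductor": [0.2, -0.1, 0.4],
    "layer": [0.0, 0.3, -0.2],
    "claim": [0.1, 0.1, 0.1],
}
model = GRUClassifier(toy_embeddings, hidden_size=4, num_classes=3)
probs = model.predict_proba("a semiconductor layer".split())
```

In a real setting the embedding table would be loaded from vectors trained on patent text, and the gate and output weights would be learned from labeled patents rather than drawn at random.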
