Domain-specific word embeddings for patent classification

Risch Julian; Krestel Ralf

Abstract

Purpose: Patent offices and other stakeholders in the patent domain need to classify patent applications according to a standardized classification scheme. To examine the novelty of an application, it can then be compared with previously granted patents in the same class. Automatic classification would be highly beneficial because of the large volume of patents and the domain-specific knowledge needed to accomplish this costly manual task. However, patent-specific language use, such as special vocabulary and phrases, is a challenge for automation.

Design/methodology/approach: To account for this language use, the authors present domain-specific pre-trained word embeddings for the patent domain. The authors train the model on a very large data set of more than 5 million patents and evaluate it on the task of patent classification. To this end, the authors propose a deep learning approach for automatic patent classification that is based on gated recurrent units and built on the trained word embeddings.

Findings: Experiments on a standardized evaluation data set show that the approach increases average precision for patent classification by 17 percent compared to state-of-the-art approaches. The authors further investigate the model's strengths and weaknesses. An extensive error analysis reveals that the learned embeddings indeed mirror patent-specific language use. Imbalanced training data and underrepresented classes are the most difficult remaining challenges.

Originality/value: The proposed approach fulfills the need for domain-specific word embeddings for downstream tasks in the patent domain, such as patent classification or patent analysis.
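As a rough illustration of the kind of model the abstract describes (a gated recurrent unit reading a sequence of word embeddings and emitting class probabilities), the following is a minimal sketch using only the Python standard library. It is not the authors' architecture: the class name TinyGRUClassifier, the toy vocabulary, the dimensions, and the three output classes are all hypothetical, the weights are random and untrained, bias terms are omitted, and the random embedding table merely stands in for pre-trained patent embeddings.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible


def rand_matrix(rows, cols, scale=0.1):
    """Small random matrix; a stand-in for trained parameters."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRUClassifier:
    """Hypothetical single-layer GRU text classifier (forward pass only)."""

    def __init__(self, vocab, embed_dim, hidden_dim, num_classes):
        self.index = {word: i for i, word in enumerate(vocab)}
        self.hidden_dim = hidden_dim
        # Assumption: random vectors in place of pre-trained embeddings.
        self.embeddings = rand_matrix(len(vocab), embed_dim)
        # One input and one recurrent matrix per gate:
        # z = update gate, r = reset gate, h = candidate state.
        self.W = {gate: rand_matrix(hidden_dim, embed_dim) for gate in "zrh"}
        self.U = {gate: rand_matrix(hidden_dim, hidden_dim) for gate in "zrh"}
        self.out = rand_matrix(num_classes, hidden_dim)

    def gru_step(self, x, h):
        """One GRU update (one common convention, biases omitted)."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], reset_h))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def predict_proba(self, tokens):
        h = [0.0] * self.hidden_dim
        for token in tokens:
            if token in self.index:  # out-of-vocabulary tokens are skipped
                h = self.gru_step(self.embeddings[self.index[token]], h)
        logits = matvec(self.out, h)
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]


vocab = ["a", "method", "for", "welding", "semiconductor", "device"]
model = TinyGRUClassifier(vocab, embed_dim=8, hidden_dim=6, num_classes=3)
probs = model.predict_proba("a method for welding".split())
```

Here probs holds one probability per hypothetical class. A usable system would load embeddings trained on patent text and learn the weights from labeled patents; neither step is shown in this sketch.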


Bibliographic details

Source: Data technologies and applications, 2019, Issue 1, pp. 108-122 (15 pages)
Authors: Risch Julian; Krestel Ralf
Affiliation: Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Original format: PDF
Language: English
Keywords: Language; patenting; WordWritersPatents

