Multiclass patent document classification

Chaitanya Anne; Avdesh Mishra; Md Tamjidul Hoque; Shengru Tu


Abstract

This article addresses the problem of classifying patent documents into fifteen categories (classes), some of which overlap for practical reasons. To develop the classification model with machine learning techniques, useful features were extracted from the given documents. These features are used both to classify patent documents and to generate useful tag words. The overall objective of this work is to systematize NASA's patent management by developing a set of automated tools that help NASA manage and market its portfolio of intellectual property (IP) and make relevant IP easier for users to discover. We have identified an array of methods that can be applied, such as k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this paper are filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing candidate models with effective classifiers. Further, the obstacles associated with imbalanced data were mitigated by adding pseudo-synthetic data where appropriate, which resulted in a superior SVM-based model.
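The abstract names information gain as one of the feature analysis steps but does not say how it was computed. As a rough illustration only, the sketch below scores each term by how much knowing its presence reduces the entropy of the class labels. The toy corpus, the class names, and the function names `entropy` and `information_gain` are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(
        (len(part) / total) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of tokens with one class label.
docs = [
    {"sensor", "optical", "lens"},
    {"sensor", "thermal", "coating"},
    {"propulsion", "engine", "fuel"},
    {"propulsion", "nozzle", "thermal"},
]
labels = ["sensors", "sensors", "propulsion", "propulsion"]

# Rank the vocabulary from most to least informative about the class.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
```

In this toy data, terms that appear in only one class score highest, while a term spread evenly across both classes scores zero, so a filter that keeps the top-ranked terms would discard it. The same scores could also serve as candidates for tag words, though the abstract does not state that the authors selected tags this way.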


Bibliographic record

Source
Artificial Intelligence Research, 2018, Issue 1, 14 pages
Authors
Chaitanya Anne; Avdesh Mishra; Md Tamjidul Hoque; Shengru Tu
Affiliations

University of New Orleans;

University of New Orleans;

Computer Science University of New Orleans;

University of New Orleans;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Artificial intelligence theory
Keywords
Machine learning algorithms; Patent classification; Imbalanced data; SVM; Multi-class classification;


