Journal: Artificial Intelligence Research

Multiclass patent document classification

Abstract

This article addresses the problem of classifying patent documents into fifteen categories, some of which overlap for practical reasons. To develop the classification model using machine learning techniques, useful features were extracted from the given documents. These features are used both to classify patent documents and to generate useful tag words. The overall objective of this work is to systematize NASA's patent management by developing a set of automated tools that help NASA manage and market its portfolio of intellectual property (IP) and make it easier for users to discover relevant IP. We identified an array of applicable methods, such as k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this paper are filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing candidate models with effective classifiers. Further, the obstacles associated with imbalanced data were mitigated by adding pseudo-synthetic data wherever appropriate, which resulted in a superior SVM-based model.
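Two of the steps named in the abstract, information-gain feature ranking and padding minority classes with synthetic samples, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the term names, class labels, and toy data are hypothetical, and the interpolation-based oversampling is a SMOTE-like assumption, since the abstract does not say how the pseudo-synthetic data were generated.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def oversample(vectors, labels, rng):
    """Pad every minority class up to the size of the largest class.

    Each synthetic vector is a random interpolation between two existing
    vectors of the same class (an assumed, SMOTE-like scheme).
    """
    by_class = {}
    for vec, label in zip(vectors, labels):
        by_class.setdefault(label, []).append(vec)
    target = max(len(members) for members in by_class.values())
    out_vectors, out_labels = list(vectors), list(labels)
    for label, members in by_class.items():
        for _ in range(target - len(members)):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            out_vectors.append([x + t * (y - x) for x, y in zip(a, b)])
            out_labels.append(label)
    return out_vectors, out_labels


# Hypothetical toy corpus: 1 means the term occurs in the document, 0 means it does not.
terms = ["rocket", "sensor", "the"]
docs = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
]
labels = ["propulsion"] * 4 + ["sensors"] * 2  # imbalanced on purpose

# Rank terms by information gain, highest first.
gains = {
    term: information_gain([doc[i] for doc in docs], labels)
    for i, term in enumerate(terms)
}
ranked = sorted(terms, key=lambda term: gains[term], reverse=True)

# Balance the classes before training a classifier.
balanced_vectors, balanced_labels = oversample(docs, labels, random.Random(0))

print(ranked)
print(Counter(balanced_labels))
```

In this toy corpus, a term that appears in every document carries no information about the class, so it ranks last. A ranking of this kind is what a filter-style variable selection step would threshold before the classifiers are trained.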