Multi-label classification and interactive NLP-based visualization of electric vehicle patent data



Abstract

The objectives of this study are to (1) interactively visualize information embedded in patent texts, and (2) train a high-accuracy multi-label classification algorithm capable of assigning patents to multiple Cooperative Patent Classification (CPC) classes. The case study involved metadata and text data of 17,500 electric vehicle patents. To these ends, the following methodology was applied. First, feature engineering was based on topic extraction from patent texts using latent Dirichlet allocation (LDA) and the perplexity metric. Second, multi-label implementations of the random forest, decision tree, and k-nearest neighbors (KNN) algorithms were trained on the data to predict the multiple class labels corresponding to a given electric vehicle patent. The results were promising, with the best scores for performance metrics such as accuracy, precision, recall, F-score, and Hamming loss being 0.91, 0.92, 0.74, and 0.02 respectively. The implications of our results are twofold. First, we demonstrate the effectiveness of open-source tools for customized patent analysis pipelines that include interactive data visualization and machine learning. Second, our results provide a strong basis for automated multi-label classification of patents into CPC classes.
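The multi-label metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say which averaging scheme or accuracy definition the authors used, so the micro-averaging and exact-match (subset) accuracy below are assumptions, and the toy label matrices (4 patents, 3 hypothetical CPC classes) are invented purely for illustration.

```python
from typing import List, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # one row per patent, one 0/1 column per class


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of patents whose full label set is predicted exactly (assumed definition)."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_precision_recall_f1(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 pooled over all label slots (assumed averaging)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data, not from the study: 4 patents, 3 hypothetical CPC classes.
Y_TRUE: List[List[int]] = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_PRED: List[List[int]] = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Subset accuracy:", subset_accuracy(Y_TRUE, Y_PRED))
    print("Micro P/R/F1:", micro_precision_recall_f1(Y_TRUE, Y_PRED))
```

Hamming loss is the one metric in the list where lower is better, which is why a value near zero sits alongside the other, higher scores in the abstract.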

Bibliographic information

  • Source
    World Patent Information | 2019, Issue 9 | 101903.1-101903.10 | 10 pages
  • Author affiliations

    Department of Industrial Engineering and Operations Research, University of California, Berkeley, United States; State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China;

    Department of Industrial Engineering and Operations Research, University of California, Berkeley, United States;

    State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China;

  • Original format: PDF
  • Text language: English
