Sentence-Ranking-Enhanced Keywords Extraction from Chinese Patents


Abstract

Patent keywords, a high-level topical representation of patents, play an important role in many patent-oriented mining tasks, such as classification, retrieval and translation. However, few studies currently focus on keyword extraction for patents, and no human-annotated gold-standard datasets exist, especially for Chinese patents. This paper introduces a new human-annotated Chinese patent dataset and proposes a sentence-ranking-based Term Frequency-Inverse Document Frequency (SR-based TF-IDF) algorithm for patent keyword extraction, motivated by the idea that "the keywords are in the key sentences". In the algorithm, a sentence-ranking model is constructed to select the top-K-s percent of sentences from each patent based on a sentence semantic graph and heuristic rules. Finally, the proposed algorithm is compared with TF-IDF, TextRank, word2vec-weighted TextRank and the Patent Keyword Extraction Algorithm (PKEA) on the self-built Chinese patent dataset and several standard benchmark datasets. The experimental results indicate that the proposed algorithm effectively improves the performance of keyword extraction from Chinese patents.
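
The abstract gives only the outline of the pipeline: rank sentences, keep the top percentage, then extract keywords with TF-IDF. The following Python sketch illustrates that general idea under several assumptions that are not taken from the paper: the "semantic graph" is approximated by bag-of-words cosine similarity, sentences are scored with a plain PageRank-style iteration, the heuristic rules are omitted, and all function and parameter names (`rank_sentences`, `top_percent`, `sr_tfidf`, `k_percent`) are hypothetical. Input text is assumed to be already segmented into sentences and tokens.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iters=50):
    """Score tokenized sentences by PageRank over a similarity graph.

    Assumption: lexical cosine similarity stands in for the paper's
    sentence semantic graph, whose construction the abstract does not give.
    """
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s) for s in sentences]
    w = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def top_percent(sentences, scores, k_percent):
    """Keep the top k_percent of sentences (at least one), in original order."""
    keep = max(1, math.ceil(len(sentences) * k_percent / 100))
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:keep]
    return [sentences[i] for i in sorted(best)]


def sr_tfidf(docs, k_percent=50, top_n=3):
    """Return top_n keywords per document using TF-IDF over key sentences.

    docs: list of documents, each a list of tokenized sentences.
    """
    filtered = [top_percent(d, rank_sentences(d), k_percent) for d in docs]
    doc_terms = [Counter(t for s in d for t in s) for d in filtered]
    df = Counter(t for terms in doc_terms for t in terms)  # document frequency
    n_docs = len(docs)
    results = []
    for terms in doc_terms:
        total = sum(terms.values()) or 1
        scored = {
            t: (c / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t, c in terms.items()
        }
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        results.append(ranked[:top_n])
    return results


# Toy, made-up example: two short "patents", each with one off-topic sentence.
docs = [
    [["battery", "electrode", "lithium"],
     ["lithium", "battery", "anode"],
     ["figure", "shows", "drawing"]],
    [["engine", "piston", "valve"],
     ["piston", "engine", "cylinder"],
     ["figure", "shows", "drawing"]],
]
keywords = sr_tfidf(docs, k_percent=50, top_n=2)
```

In this toy input the off-topic sentence shares no words with the others, so it receives the lowest rank and is dropped before TF-IDF is computed. A real system for Chinese patents would additionally need word segmentation and a genuinely semantic similarity measure.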

Bibliographic record

  • Source
    Journal of Information Recording | 2019, Issue 3 | pp. 651-674 | 24 pages
  • Authors

    Wang Zhi-Hong; Guo Yi;

  • Author affiliations

    East China Univ Sci & Technol Dept Comp Sci & Engn Shanghai 200237 Peoples R China;

    East China Univ Sci & Technol Dept Comp Sci & Engn Shanghai 200237 Peoples R China|Natl Engn Lab Big Data Distribut & Exchange Techn Business Intelligence & Visualizat Res Ctr Shanghai 200436 Peoples R China|Shihezi Univ Sch Informat Sci & Technol Shihezi 8320003 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Chinese patents; key sentences; sentence-ranking model; keywords extraction; human-annotated dataset;

  • Date added: 2022-08-18 04:33:23
