SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-Trained Language Model

Sun Yi; Qiu Hangping; Zheng Yu; Wang Zhongwei; Zhang Chaoran


Abstract

In the age of social media, with vast amounts of knowledge and information available, information retrieval and natural language processing need accurate and effective keyphrase extraction methods. Traditional keyphrase extraction models struggle to incorporate large amounts of external knowledge, but the rise of pre-trained language models offers a new way to address this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction for short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and contextual word embedding alignment. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, which greatly improves its performance on long documents. Compared with other baseline models, our model achieves state-of-the-art results on three widely used datasets.
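The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: embed the document and each candidate phrase with SIF-style weights, then rank candidates by cosine similarity to the document. Everything concrete here is an assumption made for illustration: the function names, the toy two-dimensional static vectors (standing in for contextual ELMo embeddings), the toy word probabilities, and the smoothing constant `a`. The sketch also omits the common-component removal step of the original SIF method, as well as the document segmentation, embedding alignment, and position-biased weighting mentioned in the abstract.

```python
import math


def sif_weight(word, word_prob, a=1e-3):
    """SIF weight a / (a + p(w)): frequent words get small weights."""
    return a / (a + word_prob.get(word, 0.0))


def sif_embedding(tokens, vectors, word_prob, a=1e-3):
    """SIF-weighted average of the token vectors (no component removal)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        w = sif_weight(tok, word_prob, a)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
    n = max(len(tokens), 1)
    return [x / n for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(doc_tokens, candidates, vectors, word_prob, a=1e-3):
    """Score each candidate phrase against the document, best first."""
    doc_vec = sif_embedding(doc_tokens, vectors, word_prob, a)
    scored = [
        (" ".join(c), cosine(sif_embedding(c, vectors, word_prob, a), doc_vec))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): static vectors stand in for contextual embeddings.
vectors = {
    "the": [0.0, 1.0],
    "paper": [0.1, 0.9],
    "studies": [0.5, 0.5],
    "neural": [1.0, 0.0],
    "networks": [0.9, 0.1],
}
word_prob = {
    "the": 0.05,
    "paper": 0.001,
    "studies": 0.001,
    "neural": 0.0001,
    "networks": 0.0001,
}
doc_tokens = ["the", "paper", "studies", "neural", "networks"]
candidates = [["paper"], ["neural", "networks"]]

ranked = rank_candidates(doc_tokens, candidates, vectors, word_prob)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

In this toy setup the rare words dominate the document embedding, so the candidate built from them ranks first. A real pipeline would obtain candidates from a noun-phrase chunker and the vectors from a pre-trained language model.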

Bibliographic record

Source
Quality Control, Transactions | 2020, issue 2020 | pp. 10896-10906 | 11 pages
Authors
Sun Yi; Qiu Hangping; Zheng Yu; Wang Zhongwei; Zhang Chaoran
Author affiliations

Army Engn Univ PLA Command & Control Engn Coll Nanjing 210001 Peoples R China;

Army Engn Univ PLA Command & Control Engn Coll Nanjing 210001 Peoples R China;

MIIT Res Inst 5 Ceprei Nanjing Lab Nanjing 211800 Peoples R China;

Army Engn Univ PLA Command & Control Engn Coll Nanjing 210001 Peoples R China;

Army Engn Univ PLA Command & Control Engn Coll Nanjing 210001 Peoples R China;

Original format: PDF
Language: English
Keywords
Keyphrase extraction; pre-trained language model; sentence embeddings; position-biased weight; SIFRank;
