Foreign-language journal: Quality Control, Transactions
SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-Trained Language Model


Abstract

In the age of social media, faced with a huge amount of knowledge and information, accurate and effective keyphrase extraction methods are needed in information retrieval and natural language processing. Traditional keyphrase extraction models have difficulty incorporating large amounts of external knowledge, but the rise of pre-trained language models offers a new way to solve this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction for short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and contextual word embedding alignment. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, which greatly improves its performance on long documents. Compared with other baseline models, our model achieves state-of-the-art results on three widely used datasets.
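
The ranking idea described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the SIF weight is taken in its commonly used form a / (a + p(w)), candidates are scored by cosine similarity to the document embedding, and the position bias is a simple inverse of the candidate's first position. The function names (`sif_weight`, `sif_embedding`, `rank_candidates`) are hypothetical, and the word vectors are plain static vectors supplied by the caller, standing in for the contextual ELMo embeddings the paper uses.

```python
import numpy as np

def sif_weight(word, freq, a=1e-3):
    # Assumed SIF weight a / (a + p(w)); words missing from `freq`
    # are treated as having probability 0, which gives weight 1.
    return a / (a + freq.get(word, 0.0))

def sif_embedding(tokens, vectors, freq, a=1e-3):
    # SIF-weighted average of the word vectors of `tokens`.
    # `vectors` maps each token to a 1-D numpy array (a stand-in for ELMo output).
    weights = np.array([sif_weight(t, freq, a) for t in tokens])
    matrix = np.stack([vectors[t] for t in tokens])
    return weights @ matrix / weights.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(doc_tokens, candidates, vectors, freq, position_bias=False):
    # Score each candidate phrase (a list of tokens) against the document embedding.
    doc_vec = sif_embedding(doc_tokens, vectors, freq)
    scored = []
    for cand in candidates:
        score = cosine(sif_embedding(cand, vectors, freq), doc_vec)
        if position_bias:
            # Assumed bias: inverse of the 1-based position where the
            # candidate's first token first appears in the document.
            score *= 1.0 / (doc_tokens.index(cand[0]) + 1)
        scored.append((" ".join(cand), score))
    # Highest score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With `position_bias=False` the ranking depends only on semantic similarity to the document; with `position_bias=True`, candidates that first appear early in the document are favoured, which is the intuition behind the long-document variant mentioned above.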
