Automatic keyphrase extraction using word embeddings

Abstract

Unsupervised random-walk keyphrase extraction models rely mainly on the global structural information of the word graph, in which nodes represent candidate words and edges capture the co-occurrence information between candidate words. However, using word embedding methods to integrate multiple kinds of useful information into the random-walk model to improve keyphrase extraction remains relatively unexplored. In this paper, we propose a random-walk-based ranking method that uses word embeddings to extract keyphrases from text documents. Specifically, we first design a heterogeneous text graph embedding model that integrates the local context information of the word graph (i.e., the local word collocation patterns) with crucial features of the candidate words and edges of the word graph. Then, we design a novel random-walk-based ranking model that scores candidate words by leveraging the learned word embeddings. Finally, we propose a new, generic similarity-based phrase scoring model that uses word embeddings to score phrases, and we select the top-scoring phrases as keyphrases. Experimental results show that the proposed method consistently outperforms eight state-of-the-art unsupervised methods on three real datasets for keyphrase extraction.
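The general idea of a random walk over a word co-occurrence graph whose edges are informed by word embeddings can be illustrated with a small sketch. This is not the paper's model: the abstract does not give the graph embedding objective, the transition formula, or the phrase scoring function. The code below assumes a weighted PageRank-style walk, an edge weight of co-occurrence count times a rescaled cosine similarity, and a phrase score that simply sums word scores. The function names, the window size, the damping factor, and the toy two-dimensional embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(tokens, embeddings, window=2):
    """Undirected co-occurrence graph over candidate words.

    Assumed edge weight: count * (1 + cosine) / 2, so that embedding
    similarity strengthens an edge. This weighting is illustrative only.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                counts[frozenset((word, tokens[j]))] += 1.0
    graph = defaultdict(dict)
    for pair, count in counts.items():
        a, b = tuple(pair)
        weight = count * (1.0 + cosine(embeddings[a], embeddings[b])) / 2.0
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def random_walk_scores(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style power iteration over the word graph."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            incoming = sum(
                score[m] * graph[m][n] / out_weight[m]
                for m in graph[n]
                if out_weight[m] > 0
            )
            new_score[n] = (1.0 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


def score_phrase(phrase, scores):
    """Assumed phrase score: sum of the scores of its words."""
    return sum(scores.get(word, 0.0) for word in phrase)


# Toy input: tokens after candidate filtering, with made-up 2-d embeddings.
tokens = ["keyphrase", "extraction", "word", "embeddings",
          "keyphrase", "extraction", "graph"]
embeddings = {
    "keyphrase": [0.9, 0.1],
    "extraction": [0.8, 0.3],
    "word": [0.2, 0.9],
    "embeddings": [0.3, 0.8],
    "graph": [0.5, 0.5],
}
scores = random_walk_scores(build_graph(tokens, embeddings))
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists the candidate words from highest to lowest score, and `score_phrase(("keyphrase", "extraction"), scores)` gives a score for a multi-word candidate. A similarity-based phrase scorer such as the one the abstract mentions would replace the plain sum.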
