Journal: Computer Speech and Language

Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method

Abstract

Keyphrases represent the main topics of a document and serve as a compact summary of it. Among unsupervised approaches, statistical and graph-based models have been studied most. Statistical models have difficulty extracting keyphrases from a single document because most of them require statistical information from a large corpus. Graph-based models, in contrast, can extract keyphrases using only the information in a single document, but they have drawbacks of their own: a single document does not contain enough information to score the edges of the graph reliably, so the edge scores can be biased, and this bias propagates to the node scores. In this paper, we propose a method that effectively combines a statistical model, the C-value method, with a graph-based model to overcome the drawbacks of each. The graph-based model yields a new scoring method for keyphrase candidates, and the resulting scores are fed into a modified C-value method to estimate the final importance score of each candidate. The proposed model is evaluated on two datasets, SemEval 2010 and Inspec, where it outperforms the state-of-the-art unsupervised model and existing graph-based ranking models. (C) 2019 Elsevier Ltd. All rights reserved.
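The abstract does not give the formulas, so the following is only a rough sketch of how a graph-based score might be fed into a C-value-style computation. It is not the authors' method. It assumes a plain PageRank over a word co-occurrence graph, the classic C-value nesting correction (subtract the mean weight of the longer candidates that contain a phrase), and a `log2(len + 1)` length factor so that single words are not zeroed out. The way the graph score replaces raw frequency (`occurrences * sum of word ranks`), all function names, the window size, and the toy text are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph; edge weight counts co-occurrences within `window`."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if right != left:
                graph[left][right] += 1.0
                graph[right][left] += 1.0
    return graph


def pagerank(graph, damping=0.85, iters=50):
    """Weighted PageRank by power iteration (every node here has at least one edge)."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        score = {
            n: (1 - damping) / len(nodes)
            + damping * sum(score[m] * graph[m][n] / out_weight[m] for m in graph[n])
            for n in nodes
        }
    return score


def is_nested(short, long):
    """True if tuple `short` occurs as a contiguous run inside tuple `long`."""
    n = len(short)
    return any(long[i : i + n] == short for i in range(len(long) - n + 1))


def c_value(weights):
    """C-value-style score over {candidate tuple: base weight}.

    The classic method uses raw frequency as the base weight; here any
    non-negative weight can be supplied (assumption for this sketch).
    """
    scores = {}
    for cand, w in weights.items():
        containers = [
            w_b for b, w_b in weights.items()
            if len(b) > len(cand) and is_nested(cand, b)
        ]
        if containers:  # nested candidate: discount by mean weight of its containers
            w = w - sum(containers) / len(containers)
        scores[cand] = math.log2(len(cand) + 1) * w
    return scores


def count_occurrences(tokens, cand):
    """Number of times the candidate phrase appears contiguously in the token list."""
    n = len(cand)
    return sum(tuple(tokens[i : i + n]) == cand for i in range(len(tokens) - n + 1))


# Toy single "document" and hand-picked candidates (both hypothetical).
tokens = ("graph based keyphrase extraction ranks keyphrase candidates "
          "keyphrase extraction uses graph scores").split()
candidates = [("keyphrase",), ("keyphrase", "extraction"),
              ("graph",), ("keyphrase", "candidates")]

rank = pagerank(build_word_graph(tokens))
# Assumed combination: phrase frequency scaled by the summed PageRank of its words.
graph_weight = {c: count_occurrences(tokens, c) * sum(rank[w] for w in c)
                for c in candidates}
final = c_value(graph_weight)

for cand, s in sorted(final.items(), key=lambda kv: -kv[1]):
    print(" ".join(cand), round(s, 4))
```

Because the graph is built from one document only, the sketch needs no corpus statistics. The nesting correction is the part borrowed from the statistical side, and it lowers the score of a short phrase that mostly appears inside longer candidates.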
