Journal: Computer Speech and Language

Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method



Abstract

Keyphrases capture the main topic of a document and serve as a compact representation of it. Among unsupervised approaches, statistical and graph-based models have received the most attention. Statistical models struggle to extract keyphrases from a single document because most of them require statistical information from a large corpus. Graph-based models, by contrast, can extract keyphrases using only the information in a single document, but they have drawbacks of their own: a single document does not contain enough information to score the edges of a graph reliably, so the edge scores can be biased, and this bias carries over to the node scores. In this paper, we propose an effective method that combines a statistical model, the C-value method, with a graph-based model to overcome the drawbacks of each. The graph-based model provides a new scoring method for keyphrase candidates, and the resulting scores are fed into a modified C-value method to estimate the final importance score of each candidate. The proposed model is evaluated on two datasets, SemEval 2010 and Inspec, and it outperforms the state-of-the-art unsupervised model as well as existing graph-based ranking models. (C) 2019 Elsevier Ltd. All rights reserved.
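
For orientation, the statistical component named in the abstract can be illustrated with the classic C-value formula from the term-extraction literature. The sketch below is not the paper's modified C-value and does not include the graph-based scores, since the abstract does not specify either. The names `is_nested` and `c_value` and the toy frequencies are assumptions made for illustration. A candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the base-2 logarithm of its length in words.

```python
import math


def is_nested(short, long):
    """Return True if word tuple `short` occurs contiguously inside the longer tuple `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq):
    """Classic C-value for each candidate phrase (illustrative, not the paper's variant).

    `freq` maps a candidate (tuple of words) to its frequency in the document.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` as a contiguous sub-phrase.
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Discount occurrences explained by the longer candidates.
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        # Note: log2(1) = 0, so single-word candidates score zero in this form.
        scores[a] = math.log2(len(a)) * f_a
    return scores


# Hypothetical candidate frequencies from one document.
freq = {
    ("keyphrase", "extraction"): 5,
    ("unsupervised", "keyphrase", "extraction"): 2,
    ("graph", "model"): 3,
}
scores = c_value(freq)
```

Because this form depends only on within-document frequencies and nesting, it can be computed from a single document; how the paper replaces or reweights those frequencies with graph-based scores is left open here.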
