European Conference on Information Retrieval Research

A New Scheme for Scoring Phrases in Unsupervised Keyphrase Extraction

Abstract

Many unsupervised keyphrase extraction methods compute a score for each word in a document using measures such as tf-idf or the PageRank score computed over a word graph built from the document. The final score of a candidate phrase is then obtained by summing the scores of its constituent words. A potential problem with this sum-up scoring scheme is that a phrase's length strongly affects its score, since longer phrases simply accumulate more word scores. To reduce this effect and extract keyphrases of varied lengths, we propose a new phrase-scoring scheme that computes the final score as the average of the individual word scores, weighted by the frequency of the phrase in the document. We show experimentally that unsupervised approaches using this new scheme outperform their counterparts that score phrases with the sum-up scheme.
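To make the contrast between the two scoring schemes concrete, here is a minimal sketch. It assumes that per-word scores (for example tf-idf or PageRank values) have already been computed, and it reads "weighted by the frequency of the phrase" as multiplying the mean word score by the raw count of the phrase in the document; the paper's exact normalization may differ. All function names, the toy document, and the word scores are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def count_phrase(tokens: Sequence[str], phrase: Phrase) -> int:
    """Count occurrences of `phrase` as a contiguous run of tokens."""
    n = len(phrase)
    return sum(
        1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase
    )


def sum_score(phrase: Phrase, word_scores: Dict[str, float]) -> float:
    """Baseline sum-up scheme: add the scores of the constituent words."""
    return sum(word_scores.get(w, 0.0) for w in phrase)


def avg_freq_score(
    phrase: Phrase, word_scores: Dict[str, float], tokens: Sequence[str]
) -> float:
    """Assumed reading of the proposed scheme: mean word score times phrase count."""
    if not phrase:
        return 0.0
    mean = sum_score(phrase, word_scores) / len(phrase)
    return count_phrase(tokens, phrase) * mean


def rank(
    candidates: List[Phrase], scorer: Callable[[Phrase], float]
) -> List[Tuple[Phrase, float]]:
    """Sort candidate phrases by score, highest first."""
    return sorted(((p, scorer(p)) for p in candidates), key=lambda x: x[1], reverse=True)


# Hypothetical toy document and precomputed word scores.
tokens = ("unsupervised keyphrase extraction method keyphrase extraction "
          "graph keyphrase extraction").split()
word_scores = {"unsupervised": 0.25, "keyphrase": 0.5, "extraction": 0.25,
               "method": 0.25, "graph": 0.1}
candidates = [("keyphrase", "extraction"),
              ("unsupervised", "keyphrase", "extraction", "method")]

# Sum-up scheme favors the longer phrase: 1.25 vs 0.75.
print(rank(candidates, lambda p: sum_score(p, word_scores)))
# Averaged, frequency-weighted scheme favors the frequent bigram: 1.125 vs 0.3125.
print(rank(candidates, lambda p: avg_freq_score(p, word_scores, tokens)))
```

In this toy case the four-word phrase wins under summation only because it has more words, while the averaged score removes that length advantage and lets the bigram, which occurs three times, rank first.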