Conference: International Conference on Computational Linguistics

Topical Word Trigger Model for Keyphrase Extraction


Abstract

Keyphrase extraction aims to find representative phrases for a document. Keyphrases are expected to cover the main themes of a document, yet they do not necessarily occur frequently in it; this is known as the vocabulary gap between the words in a document and its keyphrases. In this paper, we propose the Topical Word Trigger Model (TWTM) for keyphrase extraction. TWTM assumes that the content and the keyphrases of a document describe the same themes but are written in different languages. Under this assumption, keyphrase extraction is modeled as a translation process from document content to keyphrases. Moreover, to better cover document themes, TWTM makes the trigger probabilities topic-specific, so that the trigger process is influenced by the document themes. On the one hand, TWTM uses latent topics to model document themes and takes the coverage of those themes into consideration; on the other hand, it uses topic-specific word triggers to bridge the vocabulary gap between the words in a document and its keyphrases. Experimental results on a real-world dataset show that TWTM outperforms existing state-of-the-art methods under various evaluation metrics.
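The idea of topic-specific word triggers can be illustrated with a small sketch. The abstract does not give the model's equations, so the scoring rule below is an assumption made purely for illustration: each candidate keyphrase is scored by summing, over topics and document words, the product of a topic weight P(z|d), an empirical word frequency P(w|d), and a trigger probability P(k|w,z). The function name `rank_keyphrases`, the topic labels, and every probability value are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_keyphrases(doc_words, topic_probs, trigger_probs, top_n=3):
    """Rank candidate keyphrases by a topic-weighted trigger score.

    Assumed scoring rule (not the paper's stated formula):
        score(k) = sum_z P(z|d) * sum_w P(w|d) * P(k|w,z)

    doc_words:     list of tokens in the document
    topic_probs:   {topic: P(topic | document)}
    trigger_probs: {topic: {word: {keyphrase: P(keyphrase | word, topic)}}}
    """
    # Empirical word distribution P(w|d) from raw counts.
    counts = defaultdict(int)
    for word in doc_words:
        counts[word] += 1
    total = len(doc_words)

    scores = defaultdict(float)
    for topic, p_topic in topic_probs.items():
        word_table = trigger_probs.get(topic, {})
        for word, count in counts.items():
            p_word = count / total
            # Words with no trigger entries under this topic contribute nothing.
            for phrase, p_trigger in word_table.get(word, {}).items():
                scores[phrase] += p_topic * p_word * p_trigger

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy inputs, invented for illustration only.
doc_words = ["neural", "network", "training", "neural"]
topic_probs = {"ml": 0.8, "bio": 0.2}
trigger_probs = {
    "ml": {
        "neural": {"deep learning": 0.7, "neuroscience": 0.1},
        "training": {"deep learning": 0.3, "optimization": 0.5},
    },
    "bio": {
        "neural": {"neuroscience": 0.8, "deep learning": 0.05},
    },
}

ranked = rank_keyphrases(doc_words, topic_probs, trigger_probs)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

In this toy run the top-ranked phrase, "deep learning", never appears in the document itself, which is the vocabulary-gap situation the abstract describes. Because the document is dominated by the hypothetical "ml" topic, the word "neural" triggers "deep learning" more strongly than "neuroscience".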
