
Keyword Extraction from a Single Document using Word Co-occurrence Statistical Information


Abstract

We present a new keyword extraction algorithm that applies to a single document without using a corpus. Frequent terms are extracted first; then, for each term, a set of co-occurrences with the frequent terms (that is, occurrences in the same sentence) is generated. The co-occurrence distribution indicates the importance of a term in the document: if the probability distribution of co-occurrence between term a and the frequent terms is biased toward a particular subset of frequent terms, then term a is likely to be a keyword. The degree of bias of the distribution is measured by the χ²-measure. Our algorithm shows performance comparable to tf-idf without using a corpus.
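
The procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the exact form of the χ² statistic, so the sketch assumes the expected probability of each frequent term g is its share of all frequent-term occurrences, and that n_w is the total number of co-occurrences between term w and the frequent terms. The function names (`split_sentences`, `tokenize`, `chi2_keywords`), the parameters `num_frequent` and `top_n`, and the naive sentence splitter are all hypothetical; preprocessing such as stop-word removal or stemming is omitted.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter (assumption): break on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens only; no stop-word removal or stemming."""
    return re.findall(r"[a-z]+", sentence.lower())


def chi2_keywords(text, num_frequent=3, top_n=5):
    """Rank terms by how biased their co-occurrence with frequent terms is."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    term_freq = Counter(t for s in sentences for t in s)

    # Step 1: take the most frequent terms as the reference set G.
    frequent = [t for t, _ in term_freq.most_common(num_frequent)]
    total = sum(term_freq[g] for g in frequent)
    # Assumed expected probability p_g: share of g among frequent-term occurrences.
    expected_p = {g: term_freq[g] / total for g in frequent}

    # Step 2: count, for each term w, the sentences it shares with each g in G.
    cooc = {w: Counter() for w in term_freq}
    for s in sentences:
        present = set(s)
        for w in present:
            for g in frequent:
                if g in present and g != w:
                    cooc[w][g] += 1

    # Step 3: chi-square between observed and expected co-occurrence counts.
    scores = {}
    for w, row in cooc.items():
        n_w = sum(row.values())
        if n_w == 0:
            continue  # w never co-occurs with a frequent term
        scores[w] = sum(
            (row[g] - n_w * expected_p[g]) ** 2 / (n_w * expected_p[g])
            for g in frequent
            if g != w
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

Calling `chi2_keywords(document_text)` returns a list of `(term, score)` pairs, highest score first. A term that appears only alongside one particular frequent term receives a higher score than a term whose co-occurrences follow the overall frequency of the frequent terms, which is the bias the abstract describes.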
