Conference on Natural Language Processing in Artificial Intelligence

The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German




This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) unsupervised methods based on information theory, namely (ⅰ) a bigram model, (ⅱ) a probabilistic parser model, and (ⅲ) a novel model that considers the topics within the discourse of target words when calculating their information content; and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we used TextRank and the TF-IDF ranking function. The topic model (A)(ⅲ) clearly outperformed all other models, including TextRank and TF-IDF. In contrast, the RNN performed poorly. We take these results as first evidence (ⅰ) that information content can be employed for keyword extraction tasks and thus clearly corresponds to the semantics of natural language, and (ⅱ) that, as a cognitive principle, the information content of words is determined by extra-sentential contexts, i.e., by the discourse in which the words occur.
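The abstract gives no formulas. As a rough illustration of the idea behind method (A)(ⅰ), a word's information content can be taken as its surprisal under a bigram model, IC(w_i) = -log2 P(w_i | w_{i-1}), and the highest-scoring words proposed as keyword candidates. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: add-alpha smoothing, averaging surprisal over a word's occurrences, whitespace tokenization, and the function names `bigram_information_content` and `top_keywords` are all hypothetical choices. It does not attempt the parser model (A)(ⅱ) or the topic model (A)(ⅲ).

```python
import math
from collections import Counter, defaultdict


def bigram_information_content(tokens, alpha=1.0):
    """Mean surprisal -log2 P(w_i | w_{i-1}) of each word type.

    Assumption: add-alpha smoothing over the observed vocabulary; the
    abstract does not say how the bigram model was estimated.
    The first token has no left context and is not scored.
    """
    vocab_size = len(set(tokens))
    context_counts = Counter(tokens[:-1])             # c(w_{i-1}) as a left context
    bigram_counts = Counter(zip(tokens, tokens[1:]))  # c(w_{i-1}, w_i)

    surprisal_sum = defaultdict(float)
    occurrences = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        # Smoothed conditional probability; sums to 1 over the vocabulary.
        p = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        surprisal_sum[word] += -math.log2(p)
        occurrences[word] += 1

    # Hypothetical choice: average the surprisal over all scored occurrences.
    return {w: surprisal_sum[w] / occurrences[w] for w in surprisal_sum}


def top_keywords(tokens, k=5, alpha=1.0):
    """Rank word types by mean information content, highest first.

    Ties are broken alphabetically so the output is deterministic.
    """
    ic = bigram_information_content(tokens, alpha)
    ranked = sorted(ic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    text = "der hund bellt der hund schläft die katze schläft"
    print(top_keywords(text.split(), k=3))  # ['bellt', 'schläft', 'der']
```

On this toy input the frequent pair "der hund" yields low surprisal for "hund", while "bellt" and "schläft" score highest. The article "der" still reaches the top three, which shows why a practical pipeline would usually filter function words before ranking; that step is likewise an assumption here.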




