Journal: Language Resources and Evaluation

Enriching news events with meta-knowledge information




Given the vast amounts of data available in digitised textual form, it is important to provide mechanisms that allow users to extract nuggets of relevant information from the ever-growing volumes of potentially important documents. Text mining techniques can help, through their ability to automatically extract relevant event descriptions, which link entities with situations described in the text. However, correct and complete interpretation of these event descriptions is not possible without considering additional contextual information that is often present in the surrounding text. This information, which we refer to as meta-knowledge, can include (but is not restricted to) the modality, subjectivity, source, polarity and specificity of the event. We have developed a meta-knowledge annotation scheme specifically tailored for news events, which includes six aspects of event interpretation. We have applied this annotation scheme to the ACE 2005 corpus, which contains 599 documents from various written and spoken news sources. We have also identified and annotated the words and phrases evoking the different types of meta-knowledge. Evaluation of the annotated corpus shows high levels of inter-annotator agreement for five meta-knowledge attributes, and a moderate level of agreement for the sixth attribute. Detailed analysis of the annotated corpus has revealed further insights into the expression mechanisms of different types of meta-knowledge, their relative frequencies and mutual correlations.


