Conference: 9th Linguistic Annotation Workshop 2015

Annotating genericity: a survey, a scheme, and a corpus


Abstract

Generics are linguistic expressions that make statements about or refer to kinds, or that report regularities of events. Non-generic expressions make statements about particular individuals or specific episodes. Generics are treated extensively in semantic theory (Krifka et al., 1995). In practice, it is often hard to decide whether a referring expression is generic or non-generic, and to date there is no data set that is both large and satisfactorily annotated. Such a data set would be valuable for creating automatic systems that identify generic expressions, in turn facilitating knowledge extraction from natural language text. In this paper we provide the next steps for such an annotation endeavor. Our contributions are: (1) we survey the most important previous projects annotating genericity, focusing on resources for English; (2) with a new agreement study, we identify problems in the annotation scheme of the largest currently available resource (ACE-2005); (3) we introduce a linguistically motivated annotation scheme for marking both clauses and their subjects with regard to their genericity; and (4) we present a corpus of MASC (Ide et al., 2010) and Wikipedia texts annotated according to our scheme, achieving substantial agreement.

Bibliographic details

  • Conference location: Denver, CO (US)
  • Author affiliations

    Department of Computational Linguistics, Universitaet des Saarlandes, Germany;

    Institut fuer Maschinelle Sprachverarbeitung, Universitaet Stuttgart, Germany;

    Department of Computational Linguistics, Universitaet des Saarlandes, Germany;

    Department of Computational Linguistics, Universitaet des Saarlandes, Germany;

  • Original format: PDF
  • Text language: English
