Conference: 11th Workshop on Multiword Expressions

A Multiword Expression Data Set: Annotating Non-Compositionality and Conventionalization for English Noun Compounds

Abstract

The scarcity of multiword expression data sets poses a fundamental challenge for evaluating systems that deal with these linguistic structures. In this work we attempt to address this problem for a subclass of multiword expressions by producing a large data set annotated by experts and validated by common statistical measures. We present a set of 1048 noun-noun compounds annotated as non-compositional, compositional, conventionalized, and not conventionalized. We build this data set following common trends in previous work while trying to address some well-known issues, such as the small number of annotated instances, the quality of the annotations, and the lack of available true negative instances.