Conference: Workshop on Multiword Expressions

A Multiword Expression Data Set: Annotating Non-Compositionality and Conventionalization for English Noun Compounds

Abstract

The scarcity of multiword expression data sets poses a fundamental challenge for evaluating systems that deal with these linguistic structures. In this work, we attempt to address this problem for a subclass of multiword expressions by producing a large data set annotated by experts and validated by common statistical measures. We present a set of 1048 noun-noun compounds annotated as non-compositional, compositional, conventionalized, and not conventionalized. We build this data set following common trends in previous work while trying to address some well-known issues, such as the small number of annotated instances, the quality of the annotations, and the unavailability of true negative instances.
