Workshop on Multiword Expressions
A Multiword Expression Data Set: Annotating Non-Compositionality and Conventionalization for English Noun Compounds
The scarcity of multiword expression data sets poses a fundamental challenge for evaluating systems that deal with these linguistic structures. In this work, we attempt to address this problem for a subclass of multiword expressions by producing a large data set annotated by experts and validated with common statistical measures. We present a set of 1048 noun-noun compounds annotated for compositionality (compositional vs. non-compositional) and for conventionalization (conventionalized vs. not conventionalized). We build this data set following common practice in previous work, while attempting to address some well-known issues: the small number of annotated instances, the quality of the annotations, and the lack of available true negative instances.
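The abstract does not name the statistical measures used for validation. One widely used measure for checking expert annotations is Cohen's kappa, which corrects raw inter-annotator agreement for the agreement expected by chance. The sketch below assumes that kind of check and is not the authors' procedure: it computes kappa for two annotators on a single binary dimension (for example compositional vs. non-compositional), and the same function could be applied separately to the conventionalization labels. The function name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotators must label the same non-empty item list")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of each
    # annotator's marginal rate for that label.
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined here, so we return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels from two annotators on six items (True = compositional);
# these are not drawn from the actual data set.
rater_1 = [True, True, False, False, True, False]
rater_2 = [True, False, False, False, True, False]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.667
```

Here the annotators agree on 5 of 6 items (observed agreement 0.833), but chance agreement from their label distributions is 0.5, so kappa is noticeably lower than raw agreement. Accounting for chance in this way is why kappa-style statistics are often preferred over percentage agreement when validating annotations.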