LREC-2012

Semantic annotation of French corpora: animacy and verb semantic classes

Abstract

This paper presents a first corpus of French annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three different corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). It is a set of parsed sentences, each containing a verbal head subcategorizing two complements, with annotations on the verb and on both complements, in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy has been annotated following the categories of Zaenen et al. (2004). Measures of inter-annotator agreement are good (Multi-π = 0.82 and Multi-κ = 0.86 (k = 3, N = 2360)). As for verb semantic classes, we used three of the five levels of classification of an existing dictionary: Les Verbes du Français (Dubois and Dubois-Charlier, 1997). For the higher level (generic classes), the measures of agreement are Multi-π = 0.84 and Multi-κ = 0.87 (k = 3, N = 1346). The inter-annotator agreements show that the annotated data are reliable for both animacy and verbal semantic classes.
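
The abstract reports two multi-annotator agreement coefficients but does not spell out how they are computed. The sketch below assumes the commonly used definitions: multi-π estimates chance agreement from the category distribution pooled over all annotators (Fleiss, 1971), while multi-κ estimates it from each annotator's own distribution, averaged over annotator pairs (Davies and Fleiss, 1982). The function names and the toy labels are hypothetical, and this is not necessarily the procedure the authors used.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(items):
    """Mean proportion of agreeing annotator pairs per item.

    `items` is a list of tuples; each tuple holds the labels that the
    annotators (always in the same order) assigned to one item.
    """
    c = len(items[0])                      # number of annotators
    ordered_pairs = c * (c - 1)
    total = 0.0
    for labels in items:
        counts = Counter(labels)
        total += sum(n * (n - 1) for n in counts.values()) / ordered_pairs
    return total / len(items)


def multi_pi(items):
    """Chance agreement from the label distribution pooled over annotators."""
    a_o = observed_agreement(items)
    pooled = Counter(label for labels in items for label in labels)
    n_labels = len(items) * len(items[0])
    a_e = sum((n / n_labels) ** 2 for n in pooled.values())
    # Undefined (division by zero) if only one category is ever used.
    return (a_o - a_e) / (1 - a_e)


def multi_kappa(items):
    """Chance agreement from each annotator's own label distribution."""
    c = len(items[0])
    n_items = len(items)
    a_o = observed_agreement(items)
    per_annotator = [Counter(labels[a] for labels in items) for a in range(c)]
    categories = set().union(*per_annotator)
    pairs = list(combinations(range(c), 2))
    a_e = sum(
        (per_annotator[m][k] / n_items) * (per_annotator[n][k] / n_items)
        for m, n in pairs
        for k in categories
    ) / len(pairs)
    # Undefined (division by zero) if only one category is ever used.
    return (a_o - a_e) / (1 - a_e)


if __name__ == "__main__":
    # Hypothetical toy data: 4 items, 3 annotators, 2 animacy labels.
    toy = [
        ("animate", "animate", "animate"),
        ("inanimate", "inanimate", "inanimate"),
        ("animate", "animate", "inanimate"),
        ("inanimate", "inanimate", "inanimate"),
    ]
    print("multi-pi:", round(multi_pi(toy), 3))
    print("multi-kappa:", round(multi_kappa(toy), 3))
```

Both coefficients share the same observed agreement and differ only in the chance-agreement term, so the gap between them reflects how far the individual annotators' label distributions diverge from one another.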
