Polish evaluation dataset for compositional distributional semantics models


Abstract

The paper presents a procedure for building an evaluation dataset for validating compositional distributional semantics models estimated for languages other than English. Because we aim to build a comparable dataset, the procedure generally follows the steps used to assemble the SICK corpus, which contains pairs of English sentences annotated for semantic relatedness and entailment. However, the implementation of particular steps differs significantly from the original SICK design assumptions. This is due both to the lack of necessary external resources for the investigated language and to the need for language-specific transformation rules. The procedure is verified on Polish, a fusional language with relatively free word order, and is used to build a Polish evaluation dataset. The resource consists of 10K sentence pairs human-annotated for semantic relatedness and entailment. The dataset may be used to evaluate compositional distributional semantics models of Polish.
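As a rough illustration of how a SICK-style dataset might be used, the sketch below scores sentence pairs with a compositional model and correlates those scores with human relatedness judgments. This is an assumption, not the paper's evaluation protocol. The choices of additive composition (summing word vectors), cosine similarity, and Pearson correlation are all illustrative. The names `compose`, `cosine`, `pearson`, `evaluate`, `TOY_VECTORS`, and `TOY_PAIRS` are hypothetical, and the toy vectors and gold scores are made up. None of them come from the dataset.

```python
import math
from statistics import mean

# Hypothetical toy word vectors (invented for illustration, not real embeddings).
TOY_VECTORS = {
    "kot": [1.0, 0.0],
    "pies": [0.8, 0.2],
    "śpi": [0.0, 1.0],
    "biegnie": [0.3, 0.7],
}

# Hypothetical (sentence_a, sentence_b, gold_relatedness) triples; scores are made up.
TOY_PAIRS = [
    ("Kot śpi", "Kot śpi", 5.0),
    ("Kot śpi", "Pies biegnie", 1.0),
    ("Pies śpi", "Kot śpi", 3.0),
]


def compose(sentence, vectors):
    """Additive composition (assumed baseline): sum the vectors of known words."""
    dim = len(next(iter(vectors.values())))  # assumes a non-empty vector table
    out = [0.0] * dim
    for word in sentence.lower().split():
        if word in vectors:  # out-of-vocabulary words are skipped
            for i, x in enumerate(vectors[word]):
                out[i] += x
    return out


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sx * sy)


def evaluate(pairs, vectors):
    """Correlate model similarities with gold relatedness scores."""
    predicted = [cosine(compose(a, vectors), compose(b, vectors)) for a, b, _ in pairs]
    gold = [g for _, _, g in pairs]
    return pearson(predicted, gold)


if __name__ == "__main__":
    print(f"Pearson r on toy pairs: {evaluate(TOY_PAIRS, TOY_VECTORS):.3f}")
```

Additive composition is only a simple baseline. Any model that maps a sentence to a vector could stand in for `compose`. The entailment annotations would instead call for a classifier evaluated with accuracy, which this sketch does not cover.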