Conference: Annual Meeting of the Association for Computational Linguistics

Polish evaluation dataset for compositional distributional semantics models


Abstract

The paper presents a procedure for building an evaluation dataset for the validation of compositional distributional semantics models estimated for languages other than English. The procedure generally builds on the steps designed to assemble the SICK corpus, which contains pairs of English sentences annotated for semantic relatedness and entailment, because we aim to build a comparable dataset. However, the implementation of particular building steps differs significantly from the original SICK design assumptions, owing both to the lack of the necessary external resources for the investigated language and to the need for language-specific transformation rules. The designed procedure is verified on Polish, a fusional language with a relatively free word order, and contributes to building a Polish evaluation dataset. The resource consists of 10K sentence pairs that are human-annotated for semantic relatedness and entailment. The dataset may be used for the evaluation of compositional distributional semantics models of Polish.
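As an illustration of how a relatedness-annotated dataset of this kind might be used, the sketch below scores a toy model against gold relatedness ratings. Everything in it is an assumption made for illustration, not the paper's method: the additive composition function, the four-dimensional toy vectors, the three sentence pairs, and their gold scores are all invented, and the helper names (`compose`, `cosine`, `pearson`, `evaluate`) are hypothetical. Pearson correlation is used because it is a common metric for SICK-style relatedness evaluation.

```python
import math

# Toy word vectors (invented for illustration; not taken from any real model).
TOY_VECTORS = {
    "pies":    [1.0, 0.0, 0.0, 0.0],  # "dog"
    "kot":     [0.0, 1.0, 0.0, 0.0],  # "cat"
    "biegnie": [0.0, 0.0, 1.0, 0.0],  # "runs"
    "śpi":     [0.0, 0.0, 0.0, 1.0],  # "sleeps"
}

# Toy sentence pairs with made-up gold relatedness scores on a 1-5 scale.
TOY_PAIRS = [
    ("pies biegnie", "pies biegnie", 5.0),
    ("pies biegnie", "kot biegnie", 3.5),
    ("pies biegnie", "kot śpi", 1.5),
]


def compose(sentence, vectors):
    """Additive composition (an assumed baseline): sum the word vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word in sentence.lower().split():
        # Unknown words contribute a zero vector.
        for i, x in enumerate(vectors.get(word, [0.0] * dim)):
            total[i] += x
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(pairs, vectors):
    """Correlate predicted similarities with gold relatedness scores."""
    predicted = [cosine(compose(a, vectors), compose(b, vectors))
                 for a, b, _ in pairs]
    gold = [score for _, _, score in pairs]
    return pearson(predicted, gold)


if __name__ == "__main__":
    print(f"Pearson r on toy data: {evaluate(TOY_PAIRS, TOY_VECTORS):.3f}")
```

With a real model, `TOY_VECTORS` would be replaced by trained Polish word embeddings and `TOY_PAIRS` by the annotated sentence pairs. A bag-of-words sum ignores word order, which is one reason such a baseline is only a starting point for a language with relatively free word order.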
