Towards a Lexicologically Informed Parameter Evaluation of Distributional Modelling in Lexical Semantics

Abstract

Distributional models of semantics have become the mainstay of large-scale modelling of word meaning in statistical NLP (see Turney and Pantel 2010 for an overview). In a Word Sense Disambiguation task, identifying semantic structure is usually seen as a clustering problem in which occurrences of a polysemous word have to be assigned to the ‘correct’ sense. As linguists, however, we are not interested solely in performance evaluation against some gold standard; rather, we want to investigate the precise relation between a word's distributional behaviour and its meaning. Given that distributional models are extremely parameter-rich, we want to assess how well and in which way a specific model can capture a lexicological description of semantic structure.

In this presentation, we discuss three tools we are developing for a lexicological assessment of distributional models. Firstly, we are creating our own lexicologically informed 'gold standard' of disambiguated noun occurrences, based on the ANW (Algemeen Nederlands Woordenboek) and a random sample from two large-scale Belgian (1.3G) and Netherlandic (500M) Dutch newspaper corpora. Secondly, we are developing a visualisation tool to analyse the impact of parameter settings on the semantic structure captured by a distributional model. Thirdly, we have adapted a clustering quality measure (McClain & Rao 1975) to assess how well a manual disambiguation is captured by a distributional model, independently of any specific clustering algorithm. Similar to Lapesa and Evert's (2013) parameter sweep for a type-level model on semantic priming data, we are striving towards a large-scale parameter evaluation for token-level models on sense-annotated occurrences.
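The third tool can be illustrated with a minimal sketch of the standard McClain-Rao index: the mean distance between token vectors that share a manually assigned sense, divided by the mean distance between token vectors with different senses. Lower values mean the model's space separates the annotated senses more cleanly, and no clustering algorithm is involved. This is the textbook form of the index, not the authors' adaptation, which the abstract does not specify. The cosine distance, the toy two-dimensional vectors, the sense labels and the function names are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mcclain_rao(vectors, senses, distance=cosine_distance):
    """Mean within-sense distance divided by mean between-sense distance."""
    within, between = [], []
    for i, j in combinations(range(len(vectors)), 2):
        d = distance(vectors[i], vectors[j])
        # Route the pair by whether the two tokens share a manual sense label.
        (within if senses[i] == senses[j] else between).append(d)
    if not within or not between:
        raise ValueError("need both within-sense and between-sense pairs")
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Hypothetical token vectors for four occurrences of one polysemous noun.
token_vectors = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]

# An annotation that matches the geometry, and one that cuts across it.
matching_senses = ["sense_1", "sense_1", "sense_2", "sense_2"]
crossing_senses = ["sense_1", "sense_2", "sense_1", "sense_2"]

score_matching = mcclain_rao(token_vectors, matching_senses)
score_crossing = mcclain_rao(token_vectors, crossing_senses)
print(score_matching < score_crossing)
```

In a parameter sweep of the kind described, each parameter setting would produce its own set of token vectors for the same annotated occurrences, and the index would be recomputed per setting so that settings can be ranked by how well they reflect the manual disambiguation.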