
Mapping Semantic Space in Comparable Corpora. Token-level semantic vector spaces as an analysis tool for lexical variation


Abstract

Conceptual space can be carved up linguistically in different ways. The mapping between a set of related concepts and a set of forms need not be one-to-one and can differ both between varieties of the same language and between different languages. Recently, a number of studies have combined quantitative corpus analysis with visualization techniques to study form-meaning mappings on the exemplar level, both cross-linguistically and within one language: Wälchli (2010) used distributional similarity in parallel corpora and Multidimensional Scaling (MDS) to visualize how the exemplars of local phrase markers divide up the semantic space between themselves in different languages. Levshina (2011) coded exemplars of Dutch causative constructions for many different features in comparable corpora of different varieties and then used MDS to visualize how they carve up the causativity space. In this study, we present such an exemplar-level analysis and visualization for referentially rich lexical categories, rather than the less referential, grammatical categories studied by Wälchli and Levshina. We argue that the rich semantics of full lexical categories can be captured in a bottom-up, automatic way by token-level Semantic Vector Spaces (Turney & Pantel 2010; Heylen, Speelman & Geeraerts 2012), and we visualize how the individual occurrences of a set of near-synonyms carve up their concept's semantic space in a comparable corpus of different language varieties. As a case study, we look at all the occurrences of lexemes used to refer to the concept IMMIGRANT in a 1.3-million-word corpus of Dutch and Belgian newspapers from 1999 to 2005. A token-level Semantic Vector Space (Heylen, Speelman & Geeraerts 2012) is then used to structure these occurrences semantically, based on the similarity of their contextual usage. Multidimensional Scaling allows us to represent these contextual similarities in a two-dimensional semantic space.
With an interactive visualization, we can analyze the different dimensions in the semantic space and their contextual realization, as well as the differences in form-meaning mapping between the Netherlands and Belgium and between different newspapers. We also look at how the space and the form-meaning mappings change over the period 1999-2005.
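
The pipeline described in the abstract (one context vector per occurrence, pairwise contextual similarity, then MDS down to two dimensions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy English sentences stand in for the Dutch newspaper corpus, the vectors are plain first-order counts of context words in a three-word window (the token-level vectors of the cited work may be built differently), the distance is the Euclidean distance between length-normalised vectors (a monotone function of cosine similarity), and the MDS is the classical (Torgerson) variant computed by power iteration. The function names `context_vector`, `chord_distance` and `classical_mds` are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=3):
    """Count the words within `window` positions of the target token."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(tokens[lo:index] + tokens[index + 1:hi])


def chord_distance(a, b):
    """Euclidean distance between length-normalised vectors: sqrt(2 - 2*cos)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values()) * sum(c * c for c in b.values()))
    cosine = dot / norm if norm else 0.0
    return math.sqrt(max(0.0, 2.0 - 2.0 * cosine))


def classical_mds(dist, dims=2, iters=500):
    """Classical (Torgerson) MDS: double centring, then power iteration."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Gram matrix of the centred configuration: -1/2 * J * D^2 * J
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [math.sin(i + k + 1.0) for i in range(n)]  # fixed start vector
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        eigval = sum(v[i] * w[i] for i in range(n)) / vv  # Rayleigh quotient
        scale = math.sqrt(max(eigval, 0.0) / vv)
        for i in range(n):
            coords[i][k] = v[i] * scale
            for j in range(n):
                gram[i][j] -= eigval * v[i] * v[j] / vv  # deflate this component
    return coords


# Toy stand-in data (hypothetical): two near-synonyms in three kinds of context.
sentences = [
    "the immigrant found work in the city",
    "the migrant found work in the city",
    "police arrested the immigrant at the border",
    "police arrested the migrant at the border",
    "the immigrant applied for asylum last year",
]
targets = {"immigrant", "migrant"}

labels, vectors = [], []
for sentence in sentences:
    tokens = sentence.split()
    for i, token in enumerate(tokens):
        if token in targets:
            labels.append(token)
            vectors.append(context_vector(tokens, i))

dist = [[chord_distance(a, b) for b in vectors] for a in vectors]
coords = classical_mds(dist)

for label, (x, y) in zip(labels, coords):
    print(f"{label:10s} {x: .3f} {y: .3f}")
```

In this toy setting, occurrences that share a context coincide in the plane regardless of which lexeme is used, while occurrences from different contexts are placed apart. Colouring the points by lexeme, country or newspaper is what would then expose differences in form-meaning mapping of the kind the abstract describes.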
