
Lexical acquisition and semantic space models: Learning the semantics of unknown words

Abstract

Recent studies have shown that, in several semantic analysis tasks, syntax-based semantic space models outperform models that represent context as a bag of words. This advantage is generally attributed to the fact that syntax-based models use corpora that have been syntactically annotated by a parser and a computational grammar. However, if the processed corpora contain words that are unknown to the parser and the grammar, a syntax-based model may lose its advantage, since the syntactic properties of such words are unavailable. Bag-of-words models, on the other hand, do not face this issue: they operate on raw, unannotated corpora and are therefore more robust. In this paper, we compare the performance of syntax-based and bag-of-words models on the task of learning the semantics of unknown words. In our experiments, unknown words are those that are not known to the Alpino parser and grammar of Dutch, and the semantics of an unknown word is defined by finding its most similar word in CORNETTO, a Dutch lexico-semantic hierarchy. We show that for unknown words the syntax-based model performs worse than the bag-of-words approach. Furthermore, we show that if the syntactic properties of unknown words are first learned by an appropriate lexical acquisition method, the syntax-based model does outperform the bag-of-words approach. We conclude that, for words unknown to a given grammar, a bag-of-words model is more robust than a syntax-based model, but that the combination of lexical acquisition and syntax-based semantic models is best suited for learning the semantics of unknown words.
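The difference between the two kinds of context can be illustrated with a minimal sketch. It assumes toy tokenized sentences and already-parsed (head, relation, dependent) triples; the function names, the window size, the relation labels, and the use of raw counts with cosine similarity are illustrative assumptions, not the setup used in the paper. The nearest-neighbour lookup over a small dictionary of known words stands in for the search over CORNETTO.

```python
from collections import Counter
import math


def bow_vector(target, sentences, window=2):
    """Bag-of-words context: count the words within `window` tokens of `target`.

    Needs only raw tokens, so it works for any word, known or unknown.
    """
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def syntax_vector(target, triples):
    """Syntax-based context: count (relation, other word) features.

    `triples` are hypothetical parser output as (head, relation, dependent).
    If the parser produced no analysis involving `target`, the vector is empty.
    """
    vec = Counter()
    for head, rel, dep in triples:
        if head == target:
            vec[(rel, dep)] += 1
        elif dep == target:
            # Mark the inverse direction so "X is subject of Y" differs from "X has subject Y".
            vec[(rel + "^-1", head)] += 1
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(unknown_vec, lexicon_vecs):
    """Return the known word whose context vector is closest to `unknown_vec`."""
    return max(lexicon_vecs, key=lambda w: cosine(unknown_vec, lexicon_vecs[w]))


if __name__ == "__main__":
    # Toy corpus; "blorp" plays the role of a hypothetical unknown word.
    sents = [["de", "kat", "drinkt", "melk"],
             ["de", "hond", "drinkt", "water"],
             ["de", "blorp", "drinkt", "melk"]]
    lexicon = {w: bow_vector(w, sents) for w in ("kat", "hond")}
    print(most_similar(bow_vector("blorp", sents), lexicon))  # kat
```

In this sketch, if the parser yields no triples involving the unknown word, `syntax_vector` returns an empty vector and every cosine is 0, which is one way to picture the robustness gap described above; the bag-of-words vector needs nothing beyond the raw tokens.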

Bibliographic record

  • Source
    Natural Language Engineering | 2014, Issue 4 | pp. 537-555 | 19 pages
  • Author

    Kostadin Cholakov

  • Affiliation

    University of Groningen, Oude Kijk in 't Jatstraat 26, 9712EK Groningen, The Netherlands

  • Original format: PDF
  • Language: English
