6th Workshop on Balto-Slavic Natural Language Processing

Clustering of Russian Adjective-Noun Constructions Using Word Embeddings

Abstract

This paper presents a method for automatically extracting constructions from a large corpus of Russian. The term 'construction' here means a multi-word expression in which a variable can be replaced with another word from the same semantic class, for example, a glass of [water/juice/milk]. We deal with constructions that consist of a noun and its adjective modifier. We propose a method for grouping such constructions into semantic classes via two-step clustering of word vectors in distributional models. We compare it with other clustering techniques and evaluate it against A Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns denoting human body parts. The best-performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. The results of this procedure are publicly available and can be used to build a Russian construction dictionary, to accelerate theoretical studies of constructions, and to facilitate teaching Russian as a foreign language.
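
The abstract does not say what the two clustering steps are, so the following is only a rough sketch under stated assumptions, not the authors' procedure. It assumes that step 1 groups nouns by vector similarity and step 2 groups the adjectives attested with each noun group. The clustering routine is a simple greedy threshold ("leader") clustering chosen for brevity, the vectors are hand-made toy values rather than output of a real distributional model, and the names `leader_cluster`, `two_step_cluster`, `TOY_VECTORS`, and `BIGRAMS` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_cluster(words, vectors, threshold):
    """Greedy one-pass clustering (an assumed stand-in algorithm).

    A word joins the first cluster whose leader (its first member) has
    cosine similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for word in words:
        for cluster in clusters:
            if cosine(vectors[word], vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def two_step_cluster(bigrams, vectors, noun_thr=0.8, adj_thr=0.8):
    """Group (adjective, noun) bigrams into candidate construction classes."""
    # Step 1 (assumption): cluster the nouns.
    nouns = sorted({noun for _, noun in bigrams})
    noun_groups = leader_cluster(nouns, vectors, noun_thr)

    classes = []
    for group in noun_groups:
        members = set(group)
        # Step 2 (assumption): cluster adjectives seen with this noun group.
        adjectives = sorted({adj for adj, noun in bigrams if noun in members})
        for adj_cluster in leader_cluster(adjectives, vectors, adj_thr):
            classes.append((adj_cluster, group))
    return classes


# Toy 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "hand": (1.0, 0.1, 0.0),
    "arm": (0.9, 0.2, 0.0),
    "strong": (0.0, 1.0, 0.1),
    "powerful": (0.1, 0.9, 0.1),
    "left": (0.0, 0.1, 1.0),
    "right": (0.0, 0.2, 0.9),
}
BIGRAMS = [("strong", "hand"), ("powerful", "arm"),
           ("left", "hand"), ("right", "arm")]

for adjectives, nouns in two_step_cluster(BIGRAMS, TOY_VECTORS):
    print(adjectives, "+", nouns)
```

On real data the toy dictionary would be replaced by vectors from a trained embedding model, and the thresholds (or the clustering algorithm itself) would need tuning against a gold standard such as the dictionary mentioned in the abstract.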