A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling the semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the additive Skip-gram models correlate well with human judgments (ρ = 0.50). When framed as a classification task, the model achieves an accuracy of 0.64.
Multiword expressions, semantic compositionality, distributional semantics, Croatian (March 30, 2015).
Povzetek (Slovenian summary): A method for the decomposition of the Croatian language has been developed.