A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art

Lastra-Diaz Juan J.; Goikoetxea Josu; Taieb Mohamed Ali Hadj; Garcia-Serrano Ana; Ben Aouicha Mohamed; Agirre Eneko

Abstract

Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, methods for estimating the degree of similarity and relatedness between words and concepts have been a very active line of research in artificial intelligence, information retrieval and natural language processing, among other fields. The main approaches proposed in the literature fall into two large families: (1) Ontology-based semantic similarity Measures (OM) and (2) distributional measures, whose most recent and successful methods are based on Word Embedding (WE) models. However, the lack of a deep analysis of both families of methods slows down the advance of this line of research and its applications. This work introduces the largest reproducible and detailed experimental survey of OM measures and WE models reported in the literature, based on the evaluation of both families of methods on the same software platform, with the aim of elucidating the state of the problem. We show that WE models which combine distributional and ontology-based information get the best results and, in addition, we show for the first time that a simple average of two best-performing WE models with other ontology-based measures or WE models is able to improve the state of the art by a large margin. We also provide a very detailed reproducibility protocol, together with a collection of software tools and datasets as supplementary material, to allow the exact replication of our results.
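
As a rough illustration of the "simple average" idea mentioned in the abstract, the sketch below averages the scores of two similarity measures and compares each one against human ratings using Pearson correlation. It is not the authors' code and does not use the HESML API: every vector, score and rating is a made-up toy value, the function names are hypothetical, and it assumes both measures already produce scores on comparable scales.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_scores(scores_a, scores_b):
    """Pairwise mean of two score lists (assumes comparable scales)."""
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy word embeddings (hypothetical values, not taken from any real model).
embeddings = {
    "car": [0.9, 0.1, 0.3],
    "automobile": [0.8, 0.2, 0.4],
    "journey": [0.5, 0.6, 0.1],
    "banana": [0.1, 0.9, 0.7],
}

# Word pairs with toy human ratings and toy ontology-based scores,
# all assumed to lie in [0, 1].
pairs = [("car", "automobile"), ("car", "journey"), ("car", "banana")]
human = [0.95, 0.40, 0.05]
ontology_scores = [1.00, 0.20, 0.10]

# Embedding-based scores via cosine similarity.
embedding_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]

# Simple average of the two measures.
combined = average_scores(embedding_scores, ontology_scores)

for name, scores in [
    ("embedding", embedding_scores),
    ("ontology", ontology_scores),
    ("average", combined),
]:
    print(name, round(pearson(scores, human), 3))
```

With only three toy pairs the correlations carry no meaning; the sketch only shows the mechanics of combining two measures and scoring them against the same human judgements.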

Bibliographic record

Source
Engineering Applications of Artificial Intelligence | 2019, Issue 10 | pp. 645-665 | 21 pages
Authors
Lastra-Diaz Juan J.; Goikoetxea Josu; Taieb Mohamed Ali Hadj; Garcia-Serrano Ana; Ben Aouicha Mohamed; Agirre Eneko
Affiliations

UNED ETSI Informat NLP & IR Res Grp Juan Rosal 16 Madrid 28040 Spain;

Univ Basque Country Fac Informat IXA NLP Grp Manuel Lardizabal 1 Donostia San Sebastian 20018 Basque Country Spain;

Fac Sci Sfax Sfax Tunisia;

Original format: PDF
Language: English
Keywords
Ontology-based semantic similarity measures; Word embedding models; Information Content models; WordNet; Experimental survey; HESML

Similar documents

1. A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art [J]. Lastra-Diaz Juan J., Goikoetxea Josu, Taieb Mohamed Ali Hadj, Engineering Applications of Artificial Intelligence. 2019, Oct.

2. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [J]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Data in Brief. 2019, Issue 1

3. A large reproducible benchmark of ontology-based methods and word embeddings for word similarity [J]. Lastra-Diaz Juan J., Goikoetxea Josu, Taieb Mohamed Ali Hadj, Information Systems. 2021, Feb.

4. Sub-Word Similarity based Search for Embeddings: Inducing Rare-Word Embeddings for Word Similarity Tasks and Language Modelling [C]. Mittul Singh, Clayton Greenberg, Youssef Oualil, International conference on computational linguistics. 2016

5. Improved GloVe Word Embedding Using Linear Weighting Scheme for Word Similarity Tasks [D]. Lu, Qinglan. 2021

6. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [O]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, 2019
