Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity


Abstract

This data article introduces a reproducibility dataset that allows the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which presents the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in the HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). The raw data consist of a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. The raw data files were processed by running an R-language script that computes all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, and automatically generates all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools needed to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform that allows users to run new word similarity benchmarks by setting up new experiments that include methods or word similarity benchmarks not considered in the survey.

Keywords: Ontology-based semantic similarity measures, Word embedding models, Information content models, WordNet, Experimental survey, HESML, Reprozip
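As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes Pearson and Spearman correlation and a harmonic score for one method on one benchmark. It is not the authors' R script: it is written in Python, the function names and the sample values are hypothetical, and it assumes that the harmonic score is the harmonic mean of the Pearson and Spearman correlations. Statistical significance p-values are left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def harmonic_score(r, rho):
    """Harmonic mean of Pearson (r) and Spearman (rho); assumed definition."""
    return 2 * r * rho / (r + rho)


if __name__ == "__main__":
    # Hypothetical values: human similarity judgements for five word pairs
    # and the raw similarity values returned by one method for the same pairs.
    human = [0.5, 1.2, 2.8, 3.1, 3.9]
    method = [0.10, 0.35, 0.40, 0.80, 0.95]
    r = pearson(human, method)
    rho = spearman(human, method)
    print(f"Pearson={r:.3f} Spearman={rho:.3f} harmonic={harmonic_score(r, rho):.3f}")
```

In this sketch each raw data file would supply the `method` column for one benchmark, and the benchmark's human judgements would supply `human`; how the actual files are laid out is not specified here.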
