Journal: Information Systems

HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset


Abstract

This work is a detailed companion reproducibility paper for the methods and experiments proposed by Lastra-Diaz and Garcia-Serrano (2015, 2016) [56-58]. It makes the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which adapts the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library based on PosetHERep, called the Half-Edge Semantic Measures Library (HESML), which implements most of the ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible word similarity experiments, built with HESML and ReproZip, which aim to exactly reproduce the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, designed to support the exact replication of most methods reported in the literature; and (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of current semantic measures libraries, especially their performance and scalability, and the difficulty of evaluating new methods and replicating most previous ones. The reproducible experiments are motivated by the lack of large, self-contained and easily reproducible experiments for replicating and confirming previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and of difficulties in reproducing previously reported methods and experiments.
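
The idea of adapting a half-edge structure to a taxonomy can be illustrated with a minimal sketch. This is not HESML's API and not the exact PosetHERep layout: the class and method names (`HalfEdgeTaxonomy`, `addIsA`, `parents`, `ancestorsInclusive`) are hypothetical, and the design (one outgoing half-edge stored per vertex, a circular `next` list around each vertex, opposite half-edges created with consecutive ids) is an assumption chosen to show why storage grows linearly with the number of concepts and is-a links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical half-edge-style taxonomy (illustrative only; not the HESML API).
 * Each is-a link child -> parent is stored as two half-edges with consecutive
 * ids 2k (child -> parent) and 2k+1 (parent -> child), so opposite(e) = e ^ 1.
 * Each vertex keeps one outgoing half-edge; the half-edges leaving a vertex
 * form a circular list via next. Storage is O(|V| + |E|).
 * Boxed lists are used for brevity; primitive arrays would be more compact.
 */
public class HalfEdgeTaxonomy {
    private final List<Integer> target = new ArrayList<>();    // vertex each half-edge points to
    private final List<Integer> next = new ArrayList<>();      // next half-edge around the same source vertex
    private final List<Boolean> toParent = new ArrayList<>();  // true if the half-edge points to a parent
    private final List<Integer> firstEdge = new ArrayList<>(); // one outgoing half-edge per vertex, -1 if none

    public int addVertex() {
        firstEdge.add(-1);
        return firstEdge.size() - 1;
    }

    /** Adds the link "child is-a parent" as a pair of opposite half-edges. */
    public void addIsA(int child, int parent) {
        addHalfEdge(child, parent, true);
        addHalfEdge(parent, child, false);
    }

    /** Opposite half-edge: pairs are always created with consecutive ids. */
    public static int opposite(int halfEdge) {
        return halfEdge ^ 1;
    }

    /** Splices a new half-edge source -> dest into the circular list of source. */
    private void addHalfEdge(int source, int dest, boolean isParentLink) {
        int id = target.size();
        target.add(dest);
        toParent.add(isParentLink);
        int first = firstEdge.get(source);
        if (first == -1) {
            next.add(id);              // a loop of length one
            firstEdge.set(source, id);
        } else {
            next.add(next.get(first)); // new half-edge points to first's old successor
            next.set(first, id);       // first now points to the new half-edge
        }
    }

    /** Direct parents of v, found by walking the circular list around v. */
    public List<Integer> parents(int v) {
        List<Integer> result = new ArrayList<>();
        int first = firstEdge.get(v);
        if (first == -1) {
            return result;
        }
        int e = first;
        do {
            if (toParent.get(e)) {
                result.add(target.get(e));
            }
            e = next.get(e);
        } while (e != first);
        return result;
    }

    /** All ancestors of v, including v itself (iterative DFS over parent links). */
    public Set<Integer> ancestorsInclusive(int v) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(v);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (visited.add(u)) {
                for (int p : parents(u)) {
                    stack.push(p);
                }
            }
        }
        return visited;
    }
}
```

For example, after `addIsA(animal, entity)` and `addIsA(dog, animal)`, `parents(dog)` returns `[animal]` and `ancestorsInclusive(dog)` returns the set {dog, animal, entity}. Walking the loop around a vertex touches only its incident links, which is what keeps per-vertex queries cheap in this kind of layout.
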
PosetHERep provides a memory-efficient representation for taxonomies that scales linearly with taxonomy size and supports efficient implementations of most of the taxonomy-based algorithms used by semantic measures and IC models, whilst HESML provides an open framework that aids research in the area through a simpler and more efficient software architecture than current software libraries. Finally, we demonstrate that HESML outperforms the state-of-the-art libraries, and that their performance and scalability could be significantly improved, without caching, by using PosetHERep. © 2017 The Authors. Published by Elsevier Ltd.
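
As a concrete, hedged example of a taxonomy-based algorithm of the kind referred to above, the sketch below pairs one well-known intrinsic IC model, that of Seco et al. (2004), IC(c) = 1 - log(hypo(c) + 1) / log(N), with Resnik's similarity, the IC of the most informative common ancestor. The class name `IcSimilaritySketch` and the parent-list input format are hypothetical; HESML's own classes and its PosetHERep-based traversal are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the HESML API): intrinsic IC of Seco et al. (2004)
 * and Resnik similarity over a taxonomy given as parent lists.
 */
public class IcSimilaritySketch {
    private final int[][] parents;    // parents[v] = direct parents (hypernyms) of concept v
    private final double[] icValues;  // IC of each concept, computed once

    public IcSimilaritySketch(int[][] parents) {
        int n = parents.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two concepts");
        }
        this.parents = parents;
        // hypo[c] = number of distinct descendants (hyponyms) of c
        int[] hypo = new int[n];
        for (int v = 0; v < n; v++) {
            for (int a : ancestors(v)) {
                if (a != v) {
                    hypo[a]++;
                }
            }
        }
        icValues = new double[n];
        for (int c = 0; c < n; c++) {
            // IC(c) = 1 - log(hypo(c) + 1) / log(N), with N = number of concepts
            icValues[c] = 1.0 - Math.log(hypo[c] + 1) / Math.log(n);
        }
    }

    /** Ancestors of v including v itself, via iterative DFS. */
    public Set<Integer> ancestors(int v) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(v);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            if (visited.add(u)) {
                for (int p : parents[u]) {
                    stack.push(p);
                }
            }
        }
        return visited;
    }

    public double ic(int c) {
        return icValues[c];
    }

    /** Resnik similarity: IC of the most informative common ancestor; 0 if none is shared. */
    public double resnik(int a, int b) {
        Set<Integer> common = ancestors(a);
        common.retainAll(ancestors(b));
        double best = 0.0;
        for (int c : common) {
            best = Math.max(best, icValues[c]);
        }
        return best;
    }
}
```

With the parent lists `{{}, {0}, {0}, {1}, {1}}` (entity, animal, plant, dog, cat), the root gets IC 0, each leaf gets IC 1, and `resnik(3, 4)` (dog, cat) equals the IC of animal, 1 - ln 3 / ln 5, about 0.317. In this sketch the IC values are computed once per taxonomy, while ancestor sets are recomputed on every similarity query rather than cached.
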
