HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset

Lastra-Diaz Juan J.; Garcia-Serrano Ana; Batet Montserrat; Fernandez Miriam; Chirigati Fernando

Abstract

This work is a detailed companion reproducibility paper for the methods and experiments proposed by Lastra-Diaz and Garcia-Serrano in 2015 and 2016 [56-58]. It introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to represent discrete manifolds and planar graphs; (2) a new Java software library called the Half-Edge Semantic Measures Library (HESML), based on PosetHERep, which implements most of the ontology-based semantic similarity measures and Information Content (IC) models reported in the literature; (3) a set of reproducible experiments on word similarity based on HESML and ReproZip, aimed at exactly reproducing the experimental surveys in the three aforementioned works; (4) a replication framework and dataset, called WNSimRep v1, whose aim is to assist in the exact replication of most methods reported in the literature; and finally, (5) a set of scalability and performance benchmarks for semantic measures libraries. PosetHERep and HESML are motivated by several drawbacks of current semantic measures libraries, especially in performance and scalability, as well as in the evaluation of new methods and the replication of most previous methods. The reproducible experiments introduced herein are motivated by the lack of a set of large, self-contained and easily reproducible experiments with which to replicate and confirm previously reported results. Likewise, the WNSimRep v1 dataset is motivated by the discovery of several contradictory results and of difficulties in reproducing previously reported methods and experiments.
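
The half-edge idea behind PosetHERep can be illustrated with a small Python sketch. This is not HESML's implementation (HESML is a Java library), and every class and method name below is hypothetical. The layout is a simplifying assumption that may differ from the one in the paper: each IS-A edge is split into two directed half-edges that reference each other through `opposite`, every vertex stores one outgoing half-edge, and `next` links all half-edges leaving the same vertex into a cycle, so parents and children are found by walking that cycle. Storage is one record per concept plus two per edge, i.e. linear in the size of the taxonomy.

```python
from collections import deque


class HalfEdge:
    """One direction of an IS-A edge (hypothetical layout, not HESML's)."""

    __slots__ = ("target", "opposite", "next", "to_parent")

    def __init__(self, target, to_parent):
        self.target = target        # vertex this half-edge points to
        self.opposite = None        # twin half-edge pointing back the other way
        self.next = None            # next half-edge leaving the same source vertex
        self.to_parent = to_parent  # True if it points from a child to its parent


class Vertex:
    __slots__ = ("name", "first")

    def __init__(self, name):
        self.name = name
        self.first = None  # any one outgoing half-edge, or None if isolated


class Taxonomy:
    def __init__(self):
        self.vertices = {}

    def vertex(self, name):
        if name not in self.vertices:
            self.vertices[name] = Vertex(name)
        return self.vertices[name]

    def _attach(self, source, half_edge):
        # Splice the half-edge into the circular list around `source`.
        if source.first is None:
            half_edge.next = half_edge
            source.first = half_edge
        else:
            half_edge.next = source.first.next
            source.first.next = half_edge

    def add_is_a(self, child_name, parent_name):
        child, parent = self.vertex(child_name), self.vertex(parent_name)
        up = HalfEdge(parent, to_parent=True)     # child -> parent
        down = HalfEdge(child, to_parent=False)   # parent -> child
        up.opposite, down.opposite = down, up
        self._attach(child, up)
        self._attach(parent, down)

    def _neighbours(self, vertex, to_parent):
        # Walk the cycle of outgoing half-edges exactly once.
        start = vertex.first
        if start is None:
            return []
        result, h = [], start
        while True:
            if h.to_parent == to_parent:
                result.append(h.target.name)
            h = h.next
            if h is start:
                break
        return result

    def parents(self, name):
        return self._neighbours(self.vertices[name], to_parent=True)

    def children(self, name):
        return self._neighbours(self.vertices[name], to_parent=False)

    def ancestors(self, name):
        """`name` plus all of its ancestors, found breadth-first."""
        seen, queue = {name}, deque([name])
        while queue:
            for p in self.parents(queue.popleft()):
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
        return seen


# Toy taxonomy, made up for illustration.
t = Taxonomy()
for child, parent in [("dog", "canine"), ("cat", "feline"),
                      ("canine", "carnivore"), ("feline", "carnivore"),
                      ("carnivore", "animal")]:
    t.add_is_a(child, parent)

print(sorted(t.children("carnivore")))  # ['canine', 'feline']
print(sorted(t.ancestors("dog")))       # ['animal', 'canine', 'carnivore', 'dog']
```

The `opposite` link is not needed for the traversals shown; it is kept because it is what lets a half-edge structure move from one endpoint of an edge to the other in constant time, for example when deleting an edge.
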
PosetHERep proposes a memory-efficient representation for taxonomies that scales linearly with the size of the taxonomy and provides an efficient implementation of most of the taxonomy-based algorithms used by the semantic measures and IC models, whilst HESML provides an open framework to aid research in the area by offering a simpler and more efficient software architecture than current software libraries. Finally, we demonstrate that HESML outperforms the state-of-the-art libraries, and that their performance and scalability could be significantly improved, without caching, by using PosetHERep. (C) 2017 The Authors. Published by Elsevier Ltd.
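
As a hedged illustration of the taxonomy-based algorithms that IC models and similarity measures rely on, the sketch below computes the intrinsic IC model of Seco et al., IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N the number of concepts, together with the Resnik and Lin similarity measures. These are standard examples from the literature chosen here for illustration; the function names and the dictionary-based taxonomy are hypothetical and do not reflect HESML's API. It assumes a single-rooted taxonomy with at least two concepts.

```python
import math


def descendants(children, concept):
    """All proper descendants (hyponyms) of `concept`, via depth-first search."""
    seen, stack = set(), list(children.get(concept, ()))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(children.get(c, ()))
    return seen


def ancestors(parents, concept):
    """`concept` plus all of its ancestors (hypernyms)."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def seco_ic(parents):
    """Intrinsic IC: 1 - log(hypo(c) + 1) / log(N) for every concept c."""
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    n = len(concepts)  # assumed >= 2, so log(n) > 0
    return {c: 1.0 - math.log(len(descendants(children, c)) + 1) / math.log(n)
            for c in concepts}


def resnik(parents, ic, a, b):
    """IC of the most informative common ancestor (MICA) of a and b.

    Assumes a and b share at least one ancestor (e.g. a single root).
    """
    common = ancestors(parents, a) & ancestors(parents, b)
    return max(ic[c] for c in common)


def lin(parents, ic, a, b):
    """Lin similarity: 2 * IC(MICA) / (IC(a) + IC(b))."""
    denom = ic[a] + ic[b]
    return 2.0 * resnik(parents, ic, a, b) / denom if denom > 0 else 0.0


# Toy IS-A taxonomy (child -> list of parents), made up for illustration.
toy = {"dog": ["canine"], "cat": ["feline"], "canine": ["carnivore"],
       "feline": ["carnivore"], "carnivore": ["animal"]}
toy_ic = seco_ic(toy)
print(lin(toy, toy_ic, "dog", "canine"), lin(toy, toy_ic, "dog", "cat"))
```

In this toy taxonomy, "dog" and "canine" score higher than "dog" and "cat", because their shared ancestor is deeper and therefore more informative. Recomputing ancestor and descendant sets on every call, as done here for brevity, is exactly the kind of cost that an efficient taxonomy representation or a cache is meant to avoid.
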

Bibliographic details

Source
Information Systems | 2017, No. 6 | pp. 97-118 | 22 pages
Authors
Lastra-Diaz Juan J.; Garcia-Serrano Ana; Batet Montserrat; Fernandez Miriam; Chirigati Fernando
Affiliations

Univ Nacl Educ Distancia, ETSI Informat, NLP & IR Res Grp, C Juan Rosal 16, E-28040 Madrid, Spain;

Univ Nacl Educ Distancia, ETSI Informat, NLP & IR Res Grp, C Juan Rosal 16, E-28040 Madrid, Spain;

Univ Oberta Catalunya, Internet Interdisciplinary Inst IN3, Av Carl Friedrich Gauss 5, Castelldefels 08860, Spain;

Open Univ, Knowledge Media Inst, Walton Hall, Milton Keynes MK7 6AA, England;

NYU, Dept Comp Sci & Engn, New York, NY USA;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
HESML; PosetHERep; Semantic measures library; Ontology-based semantic similarity measures; Intrinsic and corpus-based Information Content models; Reproducible experiments on word similarity; WNSimRep v1 dataset; ReproZip; WordNet-based semantic similarity measures

Similar documents

1. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [J]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb. Data in Brief, 2019, No. 1

2. From Ontology to Semantic Similarity: Calculation of Ontology-Based Semantic Similarity [J]. Mingxin Gan, Xue Dou, Rui Jiang. ScientificWorldJournal, 2013, No. 3

3. Comparison of Ontology-Based Semantic-Similarity Measures in the Biomedical Text [J]. Ahmad Fayez S. Althobaiti. Journal of Computer and Communications, 2017, No. 2

4. Ontology-Based Semantic Similarity Approach for Biomedical Dataset Retrieval [C]. Xu Wang, Zhisheng Huang, Frank van Harmelen. International Conference on Health Information Science, 2020

5. Ontology-based Semantic Harmonization of HIV-associated Common Data Elements for Integration of Diverse HIV Research Datasets [D]. Brown, William, III. 2016

6. Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity [O]. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb. 2019

7. HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset [O]. Lastra-Díaz Juan J., García-Serrano Ana, Batet Montserrat. 2017
