Protocol for a reproducible experimental survey on biomedical sentence similarity

Alicia Lara-Clares; Juan J. Lastra-Díaz; Ana Garcia-Serrano


Abstract

Measuring semantic similarity between sentences is a significant task in Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, sentence similarity methods for the biomedical domain have attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced, for reasons that include the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup. As a consequence of this reproducibility gap, the state of the problem cannot be elucidated, nor can new lines of research be soundly set. There are other significant gaps in the literature on biomedical sentence similarity: (1) several unexplored sentence similarity methods that deserve study have not been evaluated; (2) a benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR), remains unexplored; (3) the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of sentence similarity methods has not been studied; and (4) software and data resources for reproducing methods and experiments in this line of research are lacking. Having identified these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, most up-to-date and, for the first time, reproducible experimental survey on biomedical sentence similarity.
Our experimental survey will be based on our own software replication and on the evaluation of all methods under study on a single software platform, which will be developed specifically for this work and will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.


Bibliographic record

Source: PLoS One, 2021, Issue 3, 28 pages
Authors: Alicia Lara-Clares; Juan J. Lastra-Díaz; Ana Garcia-Serrano
Original format: PDF
Chinese Library Classification: Medicine, Health
