Journal of biomedical informatics

Semantic similarity estimation in the biomedical domain: an ontology-based information-theoretic perspective.

Abstract

Semantic similarity estimation is an important component of analysing natural language resources like clinical records. Proper understanding of concept semantics allows for improved use and integration of heterogeneous clinical sources as well as higher information retrieval accuracy. Semantic similarity has been the focus of much research, which has led to the definition of heterogeneous measures using different theoretical principles and knowledge resources in a variety of contexts and application domains. In this paper, we study several of these measures, in addition to other similarity coefficients (not necessarily framed in a semantic context) that may be useful in determining the similarity of sets of terms. In order to make them easier to interpret and improve their applicability and accuracy, we propose a framework grounded in information theory that allows the measures studied to be uniformly redefined. Our framework is based on approximating concept semantics in terms of Information Content (IC). We also propose computing IC in a scalable and efficient manner from the taxonomical knowledge modelled in biomedical ontologies. As a result, new semantic similarity measures expressed in terms of concept Information Content are presented. These measures are evaluated and compared to related works using a benchmark of medical terms and a standard biomedical ontology. We found that an information-theoretical redefinition of well-known semantic measures and similarity coefficients, and an intrinsic estimation of concept IC result in noticeable improvements in their accuracy.
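
As an illustration of the idea of estimating concept IC intrinsically from a taxonomy and then expressing a similarity measure in terms of IC, here is a minimal Python sketch. It is not the authors' method: the abstract does not state its formulas, so the sketch assumes one well-known intrinsic formulation (Seco-style, based on hyponym counts) and Lin's IC-based similarity. The toy taxonomy and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from any real ontology.
PARENT = {
    "disease": None,
    "cardiovascular_disease": "disease",
    "respiratory_disease": "disease",
    "myocardial_infarction": "cardiovascular_disease",
    "heart_failure": "cardiovascular_disease",
    "asthma": "respiratory_disease",
}


def ancestors(concept):
    """Return the concept followed by its subsumers, ordered up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Count the strict descendants of a concept in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Assumed Seco-style intrinsic IC: 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(LCS(a, b)) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    subsumers_a = ancestors(a)
    # Least common subsumer: first ancestor of b that also subsumes a.
    lcs = next(c for c in ancestors(b) if c in subsumers_a)
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * intrinsic_ic(lcs) / denom if denom > 0 else 0.0
```

In this toy taxonomy the root carries no information (IC of 0) and leaves carry the maximum (IC of 1), so two sibling heart conditions score higher than a heart condition paired with asthma, whose only shared subsumer is the root. No text corpus is needed, which is the appeal of an intrinsic estimate.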
