Journal: IEEE Journal of Biomedical and Health Informatics

Semantic Similarity Measures in the Biomedical Domain by Leveraging a Web Search Engine


Abstract

Considerable research has been devoted to web-based semantic similarity measures. However, measuring the semantic similarity between two terms remains a challenging task. Traditional ontology-based methods are limited in that both concepts must reside in the same ontology tree(s); in practice, this assumption does not always hold. Corpus-based methods can overcome this limitation, provided the corpus is sufficiently large, and the web is an enormous, continuously growing corpus. This paper therefore proposes a method for estimating semantic similarity that exploits the page counts returned by the Google AJAX web search engine for two biomedical concepts. Features are extracted as co-occurrence patterns of two given terms P and Q by querying P, Q, and P AND Q, together with the web search hit counts of defined lexico-syntactic patterns. The similarity scores of the different patterns are evaluated with support vector machines adapted for classification, in order to make the semantic similarity measure more robust. Experimental results, validated against two datasets (dataset 1 provided by A. Hliaoutakis and dataset 2 provided by T. Pedersen), are presented and discussed. On dataset 1, the proposed approach achieves the best correlation coefficient (0.802) under SNOMED-CT. On dataset 2, the proposed method obtains the best correlation coefficients with physician scores (SNOMED-CT: 0.705; MeSH: 0.723) compared with other methods. However, the correlation coefficients with coder scores (SNOMED-CT: 0.496; MeSH: 0.539) show the opposite outcome. In conclusion, the semantic similarity results of the proposed method are close to physicians' ratings. Furthermore, the study provides a foundational investigation for extracting fully relevant information from digitized, free-text medical records in the National Taiwan University Hospital database.
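
The abstract does not state which co-occurrence formulas are computed from the page counts of P, Q, and P AND Q. As a minimal sketch only, the Python below shows three measures that are commonly derived from such counts (a Jaccard-style ratio, a Dice-style ratio, and pointwise mutual information). The function names, the choice of measures, and the corpus-size parameter `n` are assumptions made for illustration and are not taken from the paper; the hit counts in the usage lines are invented numbers, not search results.

```python
import math


def web_jaccard(p: int, q: int, pq: int) -> float:
    """Jaccard-style overlap: pages containing both terms over pages containing either."""
    denom = p + q - pq
    return pq / denom if pq > 0 and denom > 0 else 0.0


def web_dice(p: int, q: int, pq: int) -> float:
    """Dice-style overlap: twice the joint count over the sum of the single counts."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0


def web_pmi(p: int, q: int, pq: int, n: int) -> float:
    """Pointwise mutual information (base 2); n is an assumed total number of indexed pages."""
    if min(p, q, pq) <= 0 or n <= 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def cooccurrence_features(p: int, q: int, pq: int, n: int) -> list[float]:
    """Bundle the scores into a feature vector that a classifier could consume."""
    return [web_jaccard(p, q, pq), web_dice(p, q, pq), web_pmi(p, q, pq, n)]


# Hypothetical hit counts for P, Q, and "P AND Q"; not real search results.
features = cooccurrence_features(p=100, q=50, pq=25, n=10_000)
print(features)
```

A vector like `features`, possibly extended with hit counts for lexico-syntactic patterns, is the kind of input the abstract describes passing to a support vector machine; how the paper actually builds and weights its features is not specified here.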