Source: Journal of the American Society for Information Science and Technology

Word sense disambiguation by selecting the best semantic type based on journal descriptor indexing: Preliminary experiment

Abstract

An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning that is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity transport has two meanings: "Biological Transport" assigned the ST Cell Function and "Patient transport" assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing transport and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873 compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve performance of JDI and test its use in applications.
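The selection step described in the abstract (score each candidate semantic type against the text, then return the meaning tied to the highest-scoring type) can be illustrated with a minimal Python sketch. This is not the JDI implementation: the word-to-ST scores in `WORD_ST_SCORES` are invented for illustration, the averaging scheme is an assumption, and the function names are hypothetical. Only the two meanings of "transport" and their semantic types come from the abstract.

```python
# Hypothetical word-to-semantic-type association scores.
# These numbers are invented for illustration; in JDI such associations
# would be derived statistically from MEDLINE training citations.
WORD_ST_SCORES = {
    "membrane": {"Cell Function": 0.9, "Health Care Activity": 0.1},
    "ion": {"Cell Function": 0.8, "Health Care Activity": 0.1},
    "ambulance": {"Cell Function": 0.05, "Health Care Activity": 0.9},
    "hospital": {"Cell Function": 0.1, "Health Care Activity": 0.8},
}

# Candidate meanings of the ambiguous string "transport", keyed by
# semantic type (this pairing is taken from the abstract).
CANDIDATES = {
    "Cell Function": "Biological Transport",
    "Health Care Activity": "Patient transport",
}


def score_semantic_types(text, candidates=CANDIDATES, table=WORD_ST_SCORES):
    """Average the per-word ST scores over the words found in the table.

    Averaging is an assumption of this sketch, not a documented JDI detail.
    """
    totals = {st: 0.0 for st in candidates}
    known_words = 0
    for raw in text.lower().split():
        word = raw.strip(".,;:()")
        if word in table:
            known_words += 1
            for st in totals:
                totals[st] += table[word].get(st, 0.0)
    if known_words == 0:
        return totals  # no evidence: every ST scores 0.0
    return {st: total / known_words for st, total in totals.items()}


def disambiguate(text, candidates=CANDIDATES, table=WORD_ST_SCORES):
    """Return the meaning whose semantic type scores highest for the text."""
    scores = score_semantic_types(text, candidates, table)
    best_st = max(scores, key=scores.get)
    return candidates[best_st], scores


if __name__ == "__main__":
    for sentence in (
        "Ion transport across the membrane",
        "Ambulance transport to the hospital",
    ):
        meaning, scores = disambiguate(sentence)
        print(sentence, "->", meaning, scores)
```

In this sketch the ambiguous word itself contributes nothing to the score; only the surrounding context words do, which mirrors the abstract's idea that the text as a whole determines the winning semantic type.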
