The Effect of Word Sense Disambiguation Accuracy on Literature Based Discovery

Abstract

Background: The volume of research published in the biomedical domain has increasingly led to researchers focussing on specific areas of interest, and to connections between findings being missed. Literature based discovery (LBD) attempts to address this problem by searching for previously unnoticed connections between published information (also known as "hidden knowledge"). A common approach is to identify hidden knowledge via shared linking terms. However, biomedical documents are highly ambiguous, which can lead LBD systems to overgenerate hidden knowledge by hypothesising connections through different meanings of linking terms. Word Sense Disambiguation (WSD) aims to resolve ambiguities in text by identifying the meaning of ambiguous terms. This study explores the effect of WSD accuracy on LBD performance.

Methods: An existing LBD system is employed, and four approaches to WSD of biomedical documents are integrated with it. The accuracy of each WSD approach is determined by comparing its output against a standard benchmark. The LBD output is evaluated using a time-slicing approach, in which hidden knowledge is generated from articles published before a certain cutoff date and a gold standard is extracted from publications after the cutoff date.

Results: WSD accuracy varies depending on the approach used. The relationship between the performance of the LBD and WSD systems is analysed and reveals a correlation between WSD accuracy and LBD performance.

Conclusion: This study reveals that LBD performance is sensitive to WSD accuracy. It is therefore concluded that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated. It is also suggested that further improvements in WSD accuracy have the potential to improve LBD accuracy.
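
The mechanism the abstract describes (hidden knowledge found through shared linking terms, then scored against a time-sliced gold standard) can be illustrated with a toy sketch. It assumes a simple co-occurrence A-B-C model in the style of Swanson's open discovery, invented documents, and hypothetical sense tags such as `cold#illness` standing in for WSD output. It is not the LBD system or any of the four WSD approaches used in the study, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy "documents" (each a list of concepts), NOT data from the study.
# Hypothetical sense tags such as "cold#illness" stand in for WSD output.
PRE_CUTOFF = [
    ["zinc", "cold#illness"],
    ["cold#illness", "rhinovirus"],
    ["cold#temperature", "frostbite"],
]
POST_CUTOFF = [["zinc", "rhinovirus"]]


def strip_senses(docs):
    """Drop sense tags to simulate an LBD system that links on surface terms."""
    return [[term.split("#")[0] for term in doc] for doc in docs]


def cooccurrence(docs):
    """Map each concept to every concept it shares a document with."""
    links = defaultdict(set)
    for doc in docs:
        for a, b in combinations(set(doc), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def hidden_knowledge(docs, start):
    """Open-discovery A-B-C search: C terms reached from `start` (A) via a
    shared linking term B, excluding terms already linked directly to A."""
    links = cooccurrence(docs)
    direct = links[start]
    found = set()
    for b in direct:
        for c in links[b]:
            if c != start and c not in direct:
                found.add(c)
    return found


def time_sliced_gold(pre_docs, post_docs, start):
    """Connections to `start` that first appear after the cutoff date."""
    before = cooccurrence(strip_senses(pre_docs))[start]
    after = cooccurrence(strip_senses(post_docs))[start]
    return after - before


def precision_recall(predicted, gold):
    """Score predicted hidden knowledge against the gold standard."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


gold = time_sliced_gold(PRE_CUTOFF, POST_CUTOFF, "zinc")
for label, docs in [("surface terms", strip_senses(PRE_CUTOFF)),
                    ("sense-tagged", PRE_CUTOFF)]:
    # Compare on surface forms so both runs are scored against the same gold set.
    found = {term.split("#")[0] for term in hidden_knowledge(docs, "zinc")}
    p, r = precision_recall(found, gold)
    print(f"{label}: {sorted(found)} precision={p:.2f} recall={r:.2f}")
```

Running it prints `surface terms: ['frostbite', 'rhinovirus'] precision=0.50 recall=1.00` and `sense-tagged: ['rhinovirus'] precision=1.00 recall=1.00`. Linking through the undisambiguated term `cold` conflates two senses and produces the spurious `frostbite` connection, which is the kind of overgeneration the abstract attributes to ambiguity.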

Bibliographic details

  • Authors

    Preiss J.; Stevenson R.M.

  • Year: 2016
  • Original format: PDF
  • Language of full text: English

