Unsupervised Medical Subject Heading Assignment Using Output Label Co-occurrence Statistics and Semantic Predications

Abstract

Librarians at the National Library of Medicine (NLM) tag each biomedical abstract to be indexed by their PubMed information system with terms from the Medical Subject Headings (MeSH) terminology. The MeSH terminology has over 26,000 terms, and indexers look at each article's full text to assign the set of terms most suitable for indexing it. Several recent automated attempts have focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Most of these approaches used supervised machine learning techniques that rely on already indexed articles and their corresponding MeSH terms. In this paper, we present a novel unsupervised approach using named entity recognition, relationship extraction, and output label co-occurrence frequencies of MeSH term pairs from the existing set of 22 million articles already indexed with MeSH terms by librarians at NLM. The main goal of our study is to gauge the potential of output label co-occurrence statistics and relationships extracted from free text in unsupervised indexing approaches. In biomedical domains especially, output label co-occurrences are generally easier to obtain than training data involving document and label set pairs, owing to the sensitive nature of textual documents containing protected health information. Our methods achieve a micro F-score that is comparable to those obtained using supervised machine learning techniques with training data consisting of document and label set pairs. Baseline comparisons reveal strong prospects for further research in exploiting label co-occurrences and relationships extracted from free text in recommending terms for indexing biomedical articles.
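
To make the idea of output label co-occurrence statistics concrete, here is a minimal sketch in Python. It is not the authors' method, since the abstract does not specify how candidates are scored. It assumes that a few seed terms have already been found in the text (for example by named entity recognition) and ranks other terms by their average conditional co-occurrence with those seeds. The function names `build_cooccurrence` and `rank_candidates`, the scoring rule, and the toy label sets are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(label_sets):
    """Count single-term and unordered term-pair frequencies over label sets."""
    term_counts = Counter()
    pair_counts = Counter()
    for labels in label_sets:
        unique = sorted(set(labels))  # sorted so each pair has one canonical order
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts


def rank_candidates(seed_terms, term_counts, pair_counts, top_k=5):
    """Rank non-seed terms by mean P(candidate | seed) over the known seeds.

    This scoring rule is an assumption made for illustration only.
    """
    seeds = [s for s in set(seed_terms) if term_counts[s] > 0]
    scores = {}
    for cand in term_counts:
        if cand in seeds:
            continue
        total = 0.0
        for s in seeds:
            pair = tuple(sorted((s, cand)))
            total += pair_counts[pair] / term_counts[s]
        if seeds and total > 0:
            scores[cand] = total / len(seeds)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy label sets standing in for already indexed articles (made-up data).
label_sets = [
    ["Humans", "Diabetes Mellitus", "Insulin"],
    ["Humans", "Diabetes Mellitus", "Blood Glucose"],
    ["Humans", "Neoplasms"],
    ["Insulin", "Blood Glucose"],
]
term_counts, pair_counts = build_cooccurrence(label_sets)
ranked = rank_candidates(["Diabetes Mellitus"], term_counts, pair_counts)
print(ranked)
```

Note that only the label sets are needed to build these counts, not the documents themselves, which is the property the abstract highlights for settings with protected health information.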
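
The abstract reports results as a micro F-score. As a reminder of what that measure computes, the following sketch pools true positives, false positives, and false negatives across all articles before computing precision and recall. The function name `micro_f_score` and the example label sets are hypothetical, and this is the standard definition of the micro-averaged F1 rather than code from the paper.

```python
def micro_f_score(gold_sets, predicted_sets):
    """Micro-averaged F1: pool counts over all documents, then compute P, R, F."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # terms correctly predicted
        fp += len(pred - gold)   # predicted terms not in the gold set
        fn += len(gold - pred)   # gold terms that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up gold and predicted term sets for two articles.
gold = [{"Humans", "Insulin"}, {"Neoplasms"}]
pred = [{"Humans"}, {"Neoplasms", "Blood Glucose"}]
score = micro_f_score(gold, pred)
print(round(score, 3))
```

Because counts are pooled before averaging, articles with many assigned terms influence the micro F-score more than articles with few.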

Bibliographic record

  • Journal name: other
  • Volume: 7934 (year and issue not given)
  • Pages: 176–188
  • Total pages: 15
  • Original format: PDF
