Source: U.S. National Institutes of Health literature, NPJ Digital Medicine

VetTag: improving automated veterinary diagnosis coding via large-scale language modeling



Abstract

Unlike human medical records, most veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning efforts have been limited to predicting 42 top-level diagnosis categories from veterinary notes. Here we develop a large-scale algorithm to automatically predict all 4577 standard veterinary diagnosis codes from free text. We train our algorithm on a curated dataset of over 100K expert-labeled veterinary notes and over one million unlabeled notes. Our algorithm is based on an adapted Transformer architecture, and we demonstrate that large-scale language modeling on the unlabeled notes, used both for pretraining and as an auxiliary objective during supervised learning, greatly improves performance. We systematically evaluate the performance of the model and several baselines in challenging settings where algorithms trained on one hospital are evaluated at a different hospital with substantial domain shift. In addition, we show that hierarchical training can address severe data imbalances for fine-grained diagnoses that have only a few training cases, and we provide an interpretation of what is learned by the deep network. Our algorithm addresses an important challenge in veterinary medicine, and our model and experiments add insights into the power of unsupervised learning for clinical natural language processing.
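
The idea of using language modeling as an auxiliary objective during supervised learning can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a multi-label setup in which each diagnosis code gets an independent probability, and it assumes the two losses are combined as a weighted sum. The function names and the `lm_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy over diagnosis codes (assumed multi-label)."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def lm_negative_log_likelihood(next_token_probs):
    """Mean negative log-likelihood the model assigns to the observed next tokens."""
    return -sum(math.log(p) for p in next_token_probs) / len(next_token_probs)


def joint_loss(code_probs, code_labels, next_token_probs, lm_weight=0.5):
    """Supervised loss plus a weighted auxiliary language-modeling loss.

    lm_weight is a hypothetical hyperparameter; setting it to 0 recovers
    purely supervised training.
    """
    supervised = binary_cross_entropy(code_probs, code_labels)
    auxiliary = lm_negative_log_likelihood(next_token_probs)
    return supervised + lm_weight * auxiliary


# Toy example: two diagnosis codes, three observed tokens in the note.
loss = joint_loss([0.9, 0.2], [1, 0], [0.5, 0.25, 0.5], lm_weight=0.5)
```

In this sketch the auxiliary term penalizes the model for predicting the note text poorly, so the shared representation stays tuned to the language of the notes while the classifier is trained.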
