
Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients


Abstract

Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose the following solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary containing input words/n-grams. We aim to test the performance of models with and without PHI in a use case for an opioid misuse classifier.

An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. Case-control stratified sampling (n = 1000) was performed to build an annotated dataset as a reference standard of cases and non-cases of opioid misuse. Models were trained and tested with CUI-code, character-based, and n-gram features. The models applied were machine learning models (neural networks and logistic regression) as well as an expert-consensus, rule-based model for opioid misuse. The areas under the receiver operating characteristic curve (AUROC) were compared between models for discrimination. The Hosmer-Lemeshow test and visual plots were used to assess model fit and calibration.

Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top-performing models, with AUROCs > 0.90, included CUI codes as inputs to a convolutional neural network, a max pooling network, and a logistic regression model. The best-calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The top-weighted CUI codes in the logistic regression had the related terms ‘Heroin’ and ‘Victim of abuse’.
We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
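
The two evaluation measures named in the abstract, AUROC for discrimination and the Hosmer-Lemeshow statistic for calibration, can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names `auroc` and `hosmer_lemeshow`, the grouping into equal-sized risk bins, and the toy labels and scores are all assumptions made for the example, and none of the numbers are results from the study.

```python
from math import fsum


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen case receives a
    higher score than a randomly chosen non-case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic: sort by predicted risk,
    split into roughly equal-sized groups, and compare observed with
    expected case counts in each group."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = fsum(p for p, _ in chunk)
        # n_g * mean_risk * (1 - mean_risk), written in terms of `expected`
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


# Made-up toy data (1 = opioid misuse case, 0 = non-case); illustrative only.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print("AUROC:", auroc(toy_labels, toy_scores))  # 0.75 on this toy data
print("Hosmer-Lemeshow statistic:", hosmer_lemeshow(toy_labels, toy_scores, groups=2))
```

A Hosmer-Lemeshow statistic near zero means observed and expected case counts agree within each risk group. To obtain a p-value, the statistic is conventionally compared against a chi-square distribution with `groups - 2` degrees of freedom, which this stdlib-only sketch leaves out.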
