Published in: Neuro-oncology Advances

MLTI-05. IDENTIFYING BRAIN METASTATIC CASES FROM FREE TEXT CLINICAL NARRATIVES WITH REFINEMENT OF SEMANTIC HETEROGENEITY USING MACHINE LEARNING


Abstract

INTRODUCTION: Brain metastatic disease (BM) is ripe for discovery using computational tools like machine learning (ML) due to disease complexity and multidimensional critical data (imaging, genomics, primary disease, drug exposures). Leveraging real-world evidence (RWE) from routine health data to inform clinical management is hindered by fragmented unstructured data and semantic heterogeneity. Clinical data in EHR and institutional registries are typically free-text narratives lacking common data elements (CDE). Curating existing data into CDE with ML may inform contemporary approaches (RWE, N-of-1 trials, and precision medicine) that depend on large, high-quality datasets. Harvesting existing institutional registries may expand demographic representation, confirm benchmarks of established treatments, and provide a test environment for prospective ML applications.

METHOD: An R-based deep convolutional neural network (DNN), built with Keras as an API for TensorFlow in Python, was trained on physician narratives of 2000 BM cases and 8000 other CNS conditions, labeled by diagnosis and spanning 17 years. The ML model was tested with 405 non-labeled narratives to: A) identify BM among other CNS conditions (i.e. glioma, meningioma, non-tumor); B) evaluate word embedding using GloVe to standardize abbreviations and misspellings by assigning terms to CDE, training the model to plot “mets”, “metastases” and “spine” with their 20 most contextually similar words.

RESULTS: The DNN architecture achieved 97% accuracy in distinguishing BM (n=178) from others (n=227). “Mets” and “metastasis” have a connected contextual network, suggesting shared meaning, whereas “spine” did not share a network.

CONCLUSIONS: ML can identify BM cases in free-text registries, which can serve as a quality control measure and aid data aggregation. Standardizing shorthand terminology to CDE with a DNN trained in word embedding can possibly address semantic heterogeneity and facilitate data automation. Solutions are needed to compile and automate quality BM data across institutions to achieve the volume and complexity required for contemporary analysis using ML.
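
The word-embedding step described in the METHOD (finding the most contextually similar words for a term such as “mets”) can be sketched as a cosine-similarity nearest-neighbor lookup. This is only an illustrative sketch, not the authors' implementation: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for the example (they are not GloVe output and do not come from the study's data), and the helper names `cosine_similarity` and `nearest_neighbors` are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. Real GloVe
# vectors have many more dimensions and are learned from a text corpus.
TOY_EMBEDDINGS = {
    "mets": [0.90, 0.10, 0.05],
    "metastases": [0.85, 0.15, 0.10],
    "metastasis": [0.88, 0.12, 0.08],
    "spine": [0.05, 0.90, 0.30],
    "lumbar": [0.10, 0.85, 0.35],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(term, embeddings, k=20):
    """Return up to k (word, similarity) pairs most similar to `term`."""
    query = embeddings[term]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != term  # a word is trivially closest to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for word, score in nearest_neighbors("mets", TOY_EMBEDDINGS, k=2):
        print(f"{word}: {score:.3f}")
```

With these toy vectors, the neighbors of “mets” are the two spellings of “metastasis”, while “spine” sits closest to “lumbar”. That mirrors, in miniature, the kind of separation the RESULTS describe; the default `k=20` follows the 20 similar words mentioned in the METHOD.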
