IEEE International Conference on Healthcare Informatics

Using Clinical Narratives and Structured Data to Identify Distant Recurrences in Breast Cancer

Abstract

Accurately identifying distant recurrences of breast cancer from Electronic Health Records (EHR) is important for both clinical care and secondary analysis. Although multiple applications have been developed for computational phenotyping in breast cancer, identifying distant recurrences still relies heavily on manual chart review. In this study, we aim to develop a model that identifies distant recurrences in breast cancer using both clinical narratives and structured data from the EHR. We apply MetaMap to extract features from clinical narratives and also retrieve structured clinical data from the EHR. Using these features, we train a support vector machine (SVM) model to identify distant recurrences in breast cancer patients. We train the model on 1,396 double-annotated subjects and validate it on a held-out set of 599 double-annotated subjects. In addition, as a generalization test, we validate the model on a set of 4,904 single-annotated subjects. The model achieved an area under the curve (AUC) of 0.92 (SD = 0.01) in cross-validation on the training dataset, and AUCs of 0.95 and 0.93 on the held-out test set (599 subjects) and the generalization test set (4,904 subjects), respectively. By combining features extracted from unstructured clinical narratives with structured clinical data, our model can accurately and efficiently identify distant recurrences in breast cancer.
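The abstract does not describe how the two feature sources are encoded or how AUC is computed. As a minimal sketch, assuming MetaMap output has already been reduced to a list of UMLS concept identifiers (CUIs) per patient and the structured data to a dictionary of numeric values, the Python below shows one way to concatenate both sources into a single feature vector and to compute a rank-based AUC. The function names, the count-based encoding, the feature keys, and the placeholder CUIs are all hypothetical; the SVM itself is omitted, since any classifier that produces a real-valued decision score per patient can feed `auc`.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_feature_vector(
    cuis: Sequence[str],
    structured: Dict[str, float],
    cui_vocab: List[str],
    structured_keys: List[str],
) -> List[float]:
    """Concatenate bag-of-concepts counts with structured EHR values.

    Hypothetical layout (an assumption, not the paper's encoding): one count
    per CUI in cui_vocab, followed by one value per key in structured_keys.
    Missing structured values default to 0.0.
    """
    counts = Counter(cuis)
    text_part = [float(counts[c]) for c in cui_vocab]
    struct_part = [float(structured.get(k, 0.0)) for k in structured_keys]
    return text_part + struct_part


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. The pairwise loop is quadratic in the number
    of subjects, which is fine for illustration but slow for large cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder CUIs and feature names, for illustration only.
    vocab = ["CUI_A", "CUI_B"]
    keys = ["age_at_diagnosis", "tumor_stage"]
    x = build_feature_vector(["CUI_B", "CUI_B"], {"age_at_diagnosis": 54.0}, vocab, keys)
    print(x)  # [0.0, 2.0, 54.0, 0.0]
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Fold-level AUCs from cross-validation could then be summarized as a mean and standard deviation, as in the 0.92 (SD = 0.01) figure reported above; the abstract does not say which standard-deviation estimator was used.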