
A Clustering Algorithm Based on Document Embedding to Identify Clinical Note Templates



Abstract

This paper proposes a novel unsupervised clustering algorithm, based on document embedding, that generates clinical note templates. We adapted Charikar's SimHash to embed each clinical document as a vector. We modified the traditional K-means algorithm to merge any two clusters whose centroids are very close. Under the K-means paradigm, our algorithm designates as each cluster's prototype template the document whose vector is closest to the cluster centroid. On a corpus of clinical notes, we evaluated the feasibility of applying our algorithm at the level of individual authors. The corpus contains 1,063,893 clinical notes from 19,146 unique providers, written between January 2011 and July 2016. Our algorithm achieved more than 80% precision and runs in O(n) time. We further validated our algorithm with human annotators, who reported that it efficiently detects a real clinical document that can represent the other documents in the same cluster, at both the department level and the individual clinician level.
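The abstract describes two changes to K-means: merging clusters whose centroids are very close, and choosing as each cluster's prototype template the document nearest its centroid. The paper's merge criterion and thresholds are not given here, so the following is a hypothetical sketch that assumes Euclidean distance, a fixed `merge_threshold`, random initialization from the data, and a fixed number of iterations. Names such as `kmeans_with_merge` and `prototypes` are illustrative. For 0/1 document vectors, the Euclidean distance is the square root of the Hamming distance.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (Euclidean) for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def _update(X, labels, n_clusters):
    """Mean of each non-empty cluster; clusters left empty are dropped."""
    return np.array([X[labels == j].mean(axis=0)
                     for j in range(n_clusters) if np.any(labels == j)])


def _merge_close(centroids, threshold):
    """Keep a centroid only if it is at least `threshold` away from every kept one.
    Members of a dropped centroid are reassigned to their nearest remaining
    centroid at the next assignment step, which is how this sketch merges."""
    kept = []
    for c in centroids:
        if all(np.linalg.norm(c - m) >= threshold for m in kept):
            kept.append(c)
    return np.array(kept)


def kmeans_with_merge(X, k, merge_threshold, n_iter=20, seed=0):
    """Hypothetical K-means variant: assign, update, then merge close centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        centroids = _update(X, labels, len(centroids))
        centroids = _merge_close(centroids, merge_threshold)
    return _assign(X, centroids), centroids


def prototypes(X, labels, centroids):
    """Map each non-empty cluster id to the index of the document nearest its centroid."""
    X = np.asarray(X, dtype=float)
    result = {}
    for j in np.unique(labels):
        members = np.flatnonzero(labels == j)
        d = np.linalg.norm(X[members] - centroids[j], axis=1)
        result[int(j)] = int(members[d.argmin()])
    return result
```

On six toy points forming two tight groups, `[[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]`, calling `kmeans_with_merge(docs, k=4, merge_threshold=3.0)` collapses the four initial clusters into two, and `prototypes` returns documents 0 and 3, the points nearest each group's mean. Each pass costs O(n·k·d) for n documents of dimension d, so with k, d and the iteration count held fixed the runtime grows linearly in n. This sketch does not reproduce the paper's implementation.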

Bibliographic details

  • Source
    Annals of Data Science | 2021, Issue 3 | pp. 497-515 | 19 pages
  • Author affiliations

    Division of General Internal Medicine and Primary Care, Harvard Medical School, Brigham and Women's Hospital, Boston, MA 02115, USA

    Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai 201203, China

  • Original format: PDF
  • Language: English
  • Keywords

    Cluster analysis; Medical informatics applications; Electronic health records; Embedding methods; Documentation

