Journal of Biomedical Informatics

Text de-identification for privacy protection: A study of its impact on clinical text information content


Abstract

As electronic clinical information becomes easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can ease this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of the informativeness and formatting of clinical notes, and ended with a detailed study of the overlap between select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. These systems altered informativeness only minimally, and only one system modified formatting. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information.
To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that the impact of automated text de-identification on clinical information is small, but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
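
The exact and partial overlap figures reported above can be illustrated with a minimal sketch. It assumes annotations are stored as character-offset spans `(start, end)` with an exclusive end, that "exact" means identical boundaries, and that "partial" means any other intersection. The abstract does not state the study's precise matching rules, so these definitions, along with the names `classify_overlap` and `overlap_rates`, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def classify_overlap(concept: Span, phi_spans: List[Span]) -> str:
    """Return 'exact', 'partial', or 'none' for one clinical concept span."""
    result = "none"
    for phi in phi_spans:
        if concept == phi:
            # Identical boundaries: the whole concept was tagged as PHI.
            return "exact"
        # Two half-open intervals intersect when each starts before the other ends.
        if concept[0] < phi[1] and phi[0] < concept[1]:
            result = "partial"
    return result


def overlap_rates(concepts: List[Span], phi_spans: List[Span]) -> Dict[str, float]:
    """Percentage of concept spans in each overlap class."""
    counts = {"exact": 0, "partial": 0, "none": 0}
    for concept in concepts:
        counts[classify_overlap(concept, phi_spans)] += 1
    total = len(concepts)
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}


# Toy data, not taken from the study.
concepts = [(0, 5), (10, 20), (30, 40), (50, 60)]
phi = [(0, 5), (15, 25)]
print(overlap_rates(concepts, phi))
# {'exact': 25.0, 'partial': 25.0, 'none': 50.0}
```

A concept that matches one PHI span exactly is counted as exact even if it also intersects other PHI spans. This is a design choice of the sketch, not something the abstract specifies.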
