BMC Bioinformatics

A system for de-identifying medical message board text

Abstract

Users seeking support and information on a wide range of medical conditions have made millions of public posts to medical message boards. It has been shown that these posts can be used to gain a greater understanding of patients’ experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or to a particular message board, and mentions of the authors’ personal lives. The main contribution of this paper is a system that automatically de-identifies the authors of message board posts, taking these challenges into account. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text such as operative reports, pathology reports, discharge summaries, or newswire.
