BMC Bioinformatics

Redundancy in electronic health record corpora: analysis, impact on text mining performance and mitigation strategies

Abstract

Background: The increasing availability of Electronic Health Record (EHR) data, and specifically of free-text patient notes, presents opportunities for phenotype extraction. Text-mining methods in particular can support disease modeling by mapping named-entity mentions to terminologies and by clustering semantically related terms. EHR corpora, however, exhibit distinct statistical and linguistic characteristics when compared with corpora in the biomedical literature domain. We focus on copy-and-paste redundancy: clinicians typically copy and paste information from previous notes when documenting a current patient encounter. Thus, within a longitudinal patient record, one expects to observe heavy redundancy. In this paper, we ask three research questions: (i) How can redundancy be quantified in large-scale text corpora? (ii) Conventional wisdom holds that larger corpora yield better results in text mining. But how does the observed EHR redundancy affect text mining? Does such redundancy introduce a bias that distorts learned models? Or does it introduce benefits by highlighting stable and important subsets of the corpus? (iii) How can one mitigate the impact of redundancy on text mining?
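The abstract does not state how the authors measure redundancy, so the following is only a minimal sketch of one common approach to questions (i) and (iii), not the paper's method. It assumes that word n-gram overlap (shingling) is an acceptable proxy for copy-and-paste: each note in a patient's chronologically ordered record is scored by the fraction of its n-grams already seen in earlier notes, and notes above a cut-off are dropped. The function names, the default n = 5 and the 0.8 threshold are hypothetical choices for illustration.

```python
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 5) -> Set[Shingle]:
    """Return the set of lower-cased word n-grams (shingles) in a note.

    Notes shorter than n tokens yield an empty set.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def redundancy_scores(notes: List[str], n: int = 5) -> List[float]:
    """Score each note (in chronological order) by the fraction of its
    shingles that already occurred in any earlier note of the same patient.

    0.0 means entirely new text; 1.0 means every shingle was seen before.
    """
    seen: Set[Shingle] = set()
    scores: List[float] = []
    for note in notes:
        sh = shingles(note, n)
        scores.append(len(sh & seen) / len(sh) if sh else 0.0)
        seen |= sh  # earlier text counts as "seen" even if the note is later dropped
    return scores


def drop_redundant(notes: List[str], threshold: float = 0.8, n: int = 5) -> List[str]:
    """Hypothetical mitigation: keep only notes whose redundancy score is below threshold."""
    return [note for note, s in zip(notes, redundancy_scores(notes, n)) if s < threshold]


if __name__ == "__main__":
    record = [
        "patient reports chest pain since yesterday no fever",
        "patient reports chest pain since yesterday no fever",  # pasted copy
        "new complaint of headache and nausea today after fall",
    ]
    print(redundancy_scores(record, n=3))
    print(drop_redundant(record, threshold=0.8, n=3))
```

With n = 3, the pasted second note scores 1.0 and is filtered out, while the first and third notes score 0.0 and are kept. Shingle overlap only catches verbatim or near-verbatim reuse; alignment-based similarity would be more tolerant of small edits at a higher computational cost.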