Journal: BMC Research Notes

An efficient prototype method to identify and correct misspellings in clinical text

Abstract

Objective: Misspellings in clinical free text present challenges to natural language processing. To identify misspellings and their corrections, we developed a prototype spelling analysis method that combines Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two corpora extracted from Veterans Health Administration resources: surgical pathology reports, and emergency department progress and visit notes. We evaluated performance by measuring positive predictive value and by performing an error analysis of false positive output, using four classifications. We also analyzed the spelling errors in each corpus, using common error classifications.

Results: In this small-scale study of 76,786 clinical notes in total, the prototype method achieved positive predictive values of 0.9057 for the surgical pathology reports and 0.8979 for the emergency department progress and visit notes in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar across the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform adequately in identifying misspellings in other clinical document types.
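The abstract names four ingredients (embedding neighbors, an edit distance constraint, a lexicon, and corpus term frequencies) but does not say how they are combined. The Python sketch below shows one plausible combination and is not the authors' implementation. The function names, the `max_distance=2` threshold, the rule that a correction must be more frequent than the misspelling, and the toy data are all assumptions made for illustration. The embedding neighbors are passed in as precomputed lists, so no Word2Vec library is needed. The `ppv` helper computes positive predictive value as true positives divided by all proposed pairs.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def propose_corrections(neighbors, lexicon, term_freq, max_distance=2):
    """Map suspected misspellings to a proposed correction.

    neighbors: dict of token -> list of embedding-space neighbors
               (assumed to come from a Word2Vec-style model).
    lexicon:   set of known-good terms (the lexical resource).
    term_freq: Counter of corpus term frequencies.
    All selection rules here are illustrative assumptions.
    """
    corrections = {}
    for token, candidates in neighbors.items():
        if token in lexicon:
            continue  # a known term is not treated as a misspelling
        viable = [
            c for c in candidates
            if c in lexicon
            and 1 <= levenshtein(token, c) <= max_distance
            and term_freq[c] > term_freq[token]
        ]
        if viable:
            # Prefer the most frequent in-lexicon candidate.
            corrections[token] = max(viable, key=lambda c: term_freq[c])
    return corrections


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Toy data, invented for illustration only.
lexicon = {"biopsy", "excision", "carcinoma", "adenocarcinoma",
           "patient", "patent"}
term_freq = Counter({"biopsy": 500, "biospy": 3, "patient": 900,
                     "patent": 40, "patint": 5, "carcinoma": 300})
neighbors = {
    "biospy": ["biopsy", "excision"],
    "carcinoma": ["adenocarcinoma"],
    "patint": ["patient", "patent"],
}

print(propose_corrections(neighbors, lexicon, term_freq))
```

In this sketch the lexicon check decides which tokens are candidate misspellings, the edit distance bound rejects neighbors that are semantically close but orthographically unrelated, and term frequency breaks ties between several close lexicon terms.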
