Validation of an improved computer-assisted technique for mining free-text electronic medical records

Abstract

Background: The use of electronic medical records (EMRs) offers opportunities for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used.

Objective: The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs.

Methods: The dataset used for the validation process included lifelong EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, ~2.6 million rows of data) and obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was the definition of inclusion dictionaries to identify cases, including terms identifying a condition of interest. Words in inclusion dictionaries were selected from the list of all words in the dataset obtained in SS/WS. The second step consisted of defining an exclusion dictionary, including combinations of words to remove cases erroneously classified by the inclusion dictionary alone. The third step was the definition of a reinclusion dictionary to reinclude cases that had been erroneously classified by the exclusion dictionary. Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were subsequently reincluded using R v3.0.2 (R Foundation for Statistical Computing, Vienna, Austria).
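
The three dictionary steps described above can be illustrated with a minimal sketch. It is not the study's SS/WS and R workflow: the regular-expression patterns, the example rows, and the function names `matches` and `classify` are all hypothetical, and restricting the reinclusion check to rows that were already excluded is an assumption, since the abstract does not say which rows the reinclusion dictionary was run against.

```python
import re

# Hypothetical dictionaries for a single condition (colic). The study's
# actual SS/WS dictionaries are not given in the abstract.
INCLUSION = [r"\bcolic\b"]                   # step 1: terms that flag a row
EXCLUSION = [r"\bno colic\b"]                # step 2: word combinations marking false positives
REINCLUSION = [r"\bcolic signs returned\b"]  # step 3: rescue rows excluded in error

# Hypothetical free-text rows, one per EMR entry.
ROWS = [
    "Presented with colic, given flunixin.",
    "Routine vaccination, no colic.",
    "No colic overnight but colic signs returned this morning.",
    "Dental check.",
]


def matches(text, patterns):
    """Return True if any pattern is found in the text (case-insensitive)."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def classify(rows):
    """Return the set of row indices classified as cases."""
    included = {i for i, t in enumerate(rows) if matches(t, INCLUSION)}
    excluded = {i for i in included if matches(rows[i], EXCLUSION)}
    # Assumption: only rows removed by the exclusion step are candidates
    # for reinclusion.
    reincluded = {i for i in excluded if matches(rows[i], REINCLUSION)}
    return (included - excluded) | reincluded


cases = classify(ROWS)
print(sorted(cases))
```

In this toy data, the second row is removed by the exclusion pattern, while the third row is first excluded and then rescued by the reinclusion pattern. The final set arithmetic mirrors the order given in the Methods: exclusion cases are subtracted from inclusion cases, then reinclusion cases are added back.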
Manual analysis was performed as a separate process by a single experienced clinician reading through the dataset once and classifying each row of data based on the interpretation of the free-text notes. Validation was performed by comparison of the computer-assisted method with manual analysis, which was used as the gold standard. Sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and F values of the computer-assisted process were calculated by comparing them with the manual classification.

Results: The lowest sensitivity, specificity, PPVs, NPVs, and F values were 99.82% (1128/1130), 99.88% (16410/16429), 94.6% (223/239), 100.00% (16410/16412), and 99.0% (100×2×0.983×0.998/[0.983+0.998]), respectively. The computer-assisted process required a few seconds to run, although an estimated 30 h were required for dictionary creation. Manual classification required approximately 80 man-hours.

Conclusions: The critical step in this work is the creation of accurate and inclusive dictionaries to ensure that no potential cases are missed. It is significantly easier to remove false positive terms from an SS/WS-selected subset of a large database than to search that original database for potential false negatives. The benefits of using this method are proportional to the size of the dataset to be analyzed.
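
The validation metrics named above follow from a 2×2 comparison of the computer-assisted classification against the manual gold standard. The sketch below shows one way to compute them, with the F value taken as the harmonic mean of PPV and sensitivity, matching the expression quoted in the Results. The function name `validation_metrics` is hypothetical. The counts are assembled from fractions quoted in the Results (1128/1130, 16410/16429, 16410/16412); treating them as a single confusion matrix for one condition is an assumption, not something the abstract states.

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute validation metrics from confusion-matrix counts.

    tp/fp/tn/fn are rows classified by the computer-assisted method,
    judged against the manual (gold standard) classification.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # F value as the harmonic mean of PPV and sensitivity.
    f_value = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f": f_value,
    }


# Assumed counts, derived from fractions quoted in the Results:
# 1128 of 1130 true cases detected, 16410 of 16429 non-cases rejected.
metrics = validation_metrics(tp=1128, fp=19, tn=16410, fn=2)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because the F value here is computed from unrounded PPV and sensitivity, it can differ in the last digit from a figure obtained by plugging in rounded inputs such as 0.983 and 0.998.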
