Journal: BMC Medical Research Methodology

Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool


Abstract

Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We evaluated a machine learning and text mining tool’s ability to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification.

For 75 randomized trials, we manually extracted and verified data for 21 data elements. We uploaded the randomized trials to an online machine learning and text mining tool, and quantified performance by evaluating its ability to identify the reporting of data elements (reported or not reported), and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and to review and amend the data extracted by the tool. We calculated the median (interquartile range [IQR]) time for manual and semi-automated data extraction, and overall time savings.

The tool identified the reporting (reported or not reported) of data elements with median (IQR) 91% (75% to 99%) accuracy. Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88% (83% to 99%) of cases. Among a median (IQR) 90% (86% to 97%) of relevant sentences, pertinent fragments had been highlighted by the tool; exact matches were unreliable (median [IQR] 52% [33% to 73%]). A median 48% of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in modest time savings compared with manual extraction by a single reviewer (17.9 vs. 21.6 h total extraction time across 75 randomized trials).

Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. The tool’s ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.
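
To put the reported totals in perspective, the arithmetic behind the time savings can be sketched as follows. Only the three constants (21.6 h, 17.9 h, 75 trials) come from the abstract; the function names, the sample per-trial times, and the choice of quartile method are assumptions made for illustration, since the abstract does not state how the quartiles were computed.

```python
from statistics import quantiles

# Totals reported in the abstract.
MANUAL_TOTAL_H = 21.6    # manual extraction by a single reviewer
ASSISTED_TOTAL_H = 17.9  # tool-assisted extraction
N_TRIALS = 75


def time_savings(manual_h, assisted_h, n_trials):
    """Summarize the absolute, relative, and per-trial time saved."""
    saved_h = manual_h - assisted_h
    return {
        "saved_hours": round(saved_h, 1),
        "saved_percent": round(100 * saved_h / manual_h, 1),
        "saved_minutes_per_trial": round(60 * saved_h / n_trials, 1),
    }


def median_iqr(values):
    """Return (median, (Q1, Q3)) for a list of per-trial measurements.

    The 'inclusive' quartile method is an assumption; the abstract does
    not say which definition the authors used.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


if __name__ == "__main__":
    print(time_savings(MANUAL_TOTAL_H, ASSISTED_TOTAL_H, N_TRIALS))
    # Hypothetical per-trial times in minutes, not data from the study.
    print(median_iqr([10, 12, 14, 16, 18]))
```

On the reported totals this works out to 3.7 h saved overall, about 17% of the manual time, or roughly 3 minutes per trial.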
