
A text-mining approach to obtain detailed treatment information from free-text fields in population-based cancer registries: A study of non-small cell lung cancer in California

Abstract

Background: Population-based cancer registries have treatment information for all patients making them an excellent resource for population-level monitoring. However, specific treatment details, such as drug names, are contained in a free-text format that is difficult to process and summarize. We assessed the accuracy and efficiency of a text-mining algorithm to identify systemic treatments for lung cancer from free-text fields in the California Cancer Registry.

Methods: The algorithm used Perl regular expressions in SAS 9.4 to search for treatments in 24,845 free-text records associated with 17,310 patients in California diagnosed with stage IV non-small cell lung cancer between 2012 and 2014. Our algorithm categorized treatments into six groups that align with National Comprehensive Cancer Network guidelines. We compared results to a manual review (gold standard) of the same records.

Results: Percent agreement ranged from 91.1% to 99.4%. Ranges for other measures were 0.71-0.92 (Kappa), 74.3%-97.3% (sensitivity), 92.4%-99.8% (specificity), 60.4%-96.4% (positive predictive value), and 92.9%-99.9% (negative predictive value). The text-mining algorithm used one-sixth of the time required for manual review.

Conclusion: SAS-based text mining of free-text data can accurately detect systemic treatments administered to patients and save considerable time compared to manual review, maximizing the utility of the extant information in population-based cancer registries for comparative effectiveness research.
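The approach summarized in the abstract (regular-expression search over free text, followed by comparison with manual review) can be illustrated with a minimal sketch. This is not the authors' SAS code: it swaps in Python's `re` module for the Perl regular expressions in SAS 9.4, and the category names, drug lists, and the helper names `categorize` and `agreement_metrics` are hypothetical choices for illustration. The study's actual six treatment groups and patterns are not given in the abstract.

```python
import re

# Hypothetical categories and drug lists, for illustration only; they are
# NOT the study's six groups or its actual search patterns.
PATTERNS = {
    "platinum": re.compile(r"\b(carboplatin|cisplatin)\b", re.IGNORECASE),
    "egfr_tki": re.compile(r"\b(erlotinib|tarceva|gefitinib|afatinib)\b", re.IGNORECASE),
    "bevacizumab": re.compile(r"\b(bevacizumab|avastin)\b", re.IGNORECASE),
}


def categorize(text):
    """Return the set of category names whose pattern matches the text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def agreement_metrics(predicted, reference):
    """Compare binary algorithm flags with manual-review (gold standard) flags."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    n = tp + tn + fp + fn
    observed = _ratio(tp + tn, n)
    # Chance agreement expected from the marginal totals (Cohen's kappa).
    expected = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "percent_agreement": observed,
        "kappa": _ratio(observed - expected, 1 - expected),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Made-up free-text records; the last one contains a misspelling that a
    # human reviewer would catch but an exact-match pattern would miss.
    records = [
        "Pt started CARBOPLATIN + Avastin 3/2013",
        "Tarceva 150 mg daily",
        "No chemo given",
        "carboplatn x4 cycles",
    ]
    manual_platinum = [True, False, False, True]
    predicted_platinum = ["platinum" in categorize(r) for r in records]
    print(agreement_metrics(predicted_platinum, manual_platinum))
```

Each category is scored separately against the manual review, which is why the abstract reports a range for every measure rather than a single value. The misspelled record in the usage example shows one way sensitivity can fall below specificity for exact-match patterns.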
