Journal: JMIR Medical Informatics

Natural Language Processing for Surveillance of Cervical and Anal Cancer and Precancer: Algorithm Development and Split-Validation Study

Abstract

Background: Accurate identification of new diagnoses of human papillomavirus–associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research.

Objective: This study's objective was to develop and assess the accuracy of a natural language processing algorithm for the identification of individuals with cancer or precancer of the cervix and anus.

Methods: A pipeline-based natural language processing algorithm was developed, which incorporated machine learning and rule-based methods to extract diagnostic elements from the narrative pathology reports. To test the algorithm's classification accuracy, we used a split-validation study design. Full-length cervical and anal pathology reports were randomly selected from 4 clinical pathology laboratories. Two study team members, blinded to the classifications produced by the natural language processing algorithm, manually and independently reviewed all reports and classified them at the document level according to 2 domains (diagnosis and human papillomavirus testing results). Using the manual review as the gold standard, the algorithm's performance was evaluated using standard measurements of accuracy, recall, precision, and F-measure.

Results: The natural language processing algorithm's performance was validated on 949 pathology reports. The algorithm demonstrated accurate identification of abnormal cytology, histology, and positive human papillomavirus tests with accuracies greater than 0.91. Precision was lowest for anal histology reports (0.87, 95% CI 0.59-0.98) and highest for cervical cytology (0.98, 95% CI 0.95-0.99). The natural language processing algorithm missed 2 out of the 15 abnormal anal histology reports, which led to a relatively low recall (0.68, 95% CI 0.43-0.87).

Conclusions: This study outlines the development and validation of a freely available and easily implementable natural language processing algorithm that can automate the extraction and classification of clinical data from cervical and anal cytology and histology.
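
The evaluation measures named in the Methods (accuracy, recall, precision, and F-measure) can be illustrated with a minimal Python sketch. It is not the study's code: the function name `evaluate`, the `Metrics` container, and the toy labels are hypothetical and chosen only for illustration. It assumes each report carries one binary document-level label (True = abnormal) from manual review and one from the algorithm, and it takes the F-measure to be the balanced F1 score.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    recall: float
    precision: float
    f_measure: float


def evaluate(gold: list[bool], predicted: list[bool]) -> Metrics:
    """Score predicted document-level labels against gold-standard labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Confusion-matrix counts (True = abnormal report).
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, recall, precision, f_measure)


# Toy data (hypothetical, not from the study): 10 reports, 4 truly abnormal.
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
result = evaluate(gold, predicted)
print(result)
```

With these toy labels there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so a missed abnormal report lowers recall while a false alarm lowers precision.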
