Natural Language Processing of Large-Scale Structured Radiology Reports to Identify Oncologic Patients With or Without Splenomegaly Over a 10-Year Period

Simon Sun; Kaelan Lupton; Karen Batch; Huy Nguyen; Lior Gazit; Natalie Gangai; Jessica Cho; Kevin Nicholas; Farhana Zulkernine; Varadan Sevilimedu; Amber Simpson; Richard K. G. Do

Abstract

PURPOSE To assess the accuracy of a natural language processing (NLP) model in identifying splenomegaly, as described in structured computed tomography (CT) radiology reports, in patients with cancer.

METHODS This retrospective study included 387,359 consecutive structured radiology reports for CT scans of the chest, abdomen, and pelvis performed between July 2009 and April 2019 in 91,665 patients spanning 30 cancer types. A random sample of 2,022 reports from patients with colorectal cancer, hepatobiliary cancer (HB), leukemia, Hodgkin lymphoma (HL), or non-HL was manually annotated as positive or negative for splenomegaly. The NLP model was trained on 1,617 of these reports and tested on the remaining 405, and a separate validation set of 400 reports drawn from all cancer subtypes was used to measure its accuracy, precision, and recall. Overall survival was compared between patients with and without splenomegaly using Kaplan-Meier curves.

RESULTS The final cohort included 387,359 reports from 91,665 patients (mean age, 60.8 years; 51.2% women). In the testing set, the model achieved an accuracy of 92.1%, a precision of 92.2%, and a recall of 92.1% for splenomegaly. In the validation set, accuracy, precision, and recall were 93.8%, 92.9%, and 86.7%, respectively. Across the entire cohort, splenomegaly was most frequent in patients with leukemia (32.5%), followed by HB (17.4%), non-HL (9.1%), colorectal cancer (8.5%), and HL (5.6%). A splenomegaly label was associated with an increased risk of mortality in the entire cohort (hazard ratio, 2.10; 95% CI, 1.98 to 2.22; P < .001).

CONCLUSION Automated splenomegaly labeling by NLP of radiology reports demonstrates good accuracy, precision, and recall. Splenomegaly is most frequently reported in patients with leukemia, followed by patients with HB.
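The abstract reports accuracy, precision, and recall for the splenomegaly label in the testing and validation sets. As a sketch of how such figures are derived from report-level labels, the code below counts a confusion matrix and computes the metrics. The label encoding (1 = splenomegaly present, 0 = absent) and the function names are assumptions for illustration, not taken from the study. The abstract does not state whether precision and recall refer to the splenomegaly-positive class alone or are averaged over both classes, so both variants are returned; one general property worth knowing is that support-weighted recall is always identical to accuracy.

```python
"""Sketch: report-level classification metrics for a binary splenomegaly label."""
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/FN/TN with 1 = splenomegaly present, 0 = absent (assumed encoding)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is absent or never predicted.
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    n = sum(c.values())
    # Positive-class (splenomegaly) precision and recall.
    prec_pos = _safe_div(c["tp"], c["tp"] + c["fp"])
    rec_pos = _safe_div(c["tp"], c["tp"] + c["fn"])
    # Negative-class metrics: the same formulas with the label roles swapped.
    prec_neg = _safe_div(c["tn"], c["tn"] + c["fn"])
    rec_neg = _safe_div(c["tn"], c["tn"] + c["fp"])
    # Support = number of reports whose reference label is each class.
    sup_pos = c["tp"] + c["fn"]
    sup_neg = c["tn"] + c["fp"]
    return {
        "accuracy": _safe_div(c["tp"] + c["tn"], n),
        "precision": prec_pos,
        "recall": rec_pos,
        "weighted_precision": _safe_div(sup_pos * prec_pos + sup_neg * prec_neg, n),
        # Equals accuracy by construction: sum_c (n_c / n) * (TP_c / n_c) = correct / n.
        "weighted_recall": _safe_div(sup_pos * rec_pos + sup_neg * rec_neg, n),
    }
```

For example, with hypothetical labels `y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]` and predictions `y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]`, `binary_metrics` gives an accuracy of 0.8 and a positive-class precision and recall of 2/3 each. Reporting which averaging was used alongside such numbers avoids ambiguity when the positive class is rare.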

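Overall survival in the abstract is compared using Kaplan-Meier curves. The product-limit estimator behind such curves is S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of deaths at t_i and n_i the number of patients still at risk. The dependency-free sketch below implements that estimator and fits one curve per group. The function names, the assumption of non-negative follow-up times, and the idea of grouping by a per-patient splenomegaly flag are illustrative assumptions; the abstract does not say how the patient-level flag was derived, and the hazard ratio it reports comes from a separate survival model that is not reproduced here.

```python
"""Sketch: Kaplan-Meier (product-limit) survival curves, optionally per group."""
from typing import Dict, Hashable, List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (time, estimated survival) steps


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Product-limit estimate S(t) = prod over event times t_i <= t of (1 - d_i / n_i).

    events[i] is 1 if death was observed at times[i] and 0 if the patient was
    censored (alive at last follow-up). Patients censored at an event time are
    still counted as at risk at that time, the usual convention. Times are
    assumed to be non-negative, so the curve starts at (0, 1).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    if any(e not in (0, 1) for e in events):
        raise ValueError("events must be 0 (censored) or 1 (death)")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: Curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censorings at t both leave the risk set afterwards.
        n_at_risk -= removed
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Read the step function: survival at the last step time <= t."""
    s = 1.0
    for step_time, step_surv in curve:
        if step_time > t:
            break
        s = step_surv
    return s


def km_by_group(
    times: Sequence[float], events: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, Curve]:
    """Fit one curve per group label (e.g. a hypothetical per-patient splenomegaly flag)."""
    if not (len(times) == len(events) == len(groups)):
        raise ValueError("times, events and groups must have the same length")
    buckets: Dict[Hashable, Tuple[List[float], List[int]]] = {}
    for t, e, g in zip(times, events, groups):
        g_times, g_events = buckets.setdefault(g, ([], []))
        g_times.append(t)
        g_events.append(e)
    return {g: kaplan_meier(ts, es) for g, (ts, es) in buckets.items()}
```

For instance, calling `km_by_group` on toy follow-up times (in months), death indicators, and a `True`/`False` splenomegaly flag per patient returns two step curves that can be plotted side by side; `survival_at` reads off the estimated survival at any time point. A production analysis would typically also report confidence bands and a formal comparison such as a log-rank test, which this sketch omits.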