Identification of Malignancies from Free-Text Histopathology Reports Using a Multi-Model Supervised Machine Learning Approach

Abstract

We explored various machine learning (ML) models to evaluate how each performs at classifying histopathology reports. We trained, optimized, and performed classification with Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Adaptive Boosting (AB), Decision Trees (DT), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), and a Dummy classifier. We started with 60,083 histopathology reports, which were reduced to 60,069 after pre-processing. The F1-scores for SVM, SGD, KNN, RF, DT, LR, AB, and GNB were 97%, 96%, 96%, 96%, 92%, 96%, 84%, and 88%, respectively, while the misclassification rates were 3.31%, 5.25%, 4.39%, 1.75%, 3.5%, 4.26%, 23.9%, and 19.94%, respectively. The approximate run times were 2 h, 20 min, 40 min, 8 h, 40 min, 10 min, 50 min, and 4 min, respectively. RF had the longest run time but the lowest misclassification rate on the labeled data. Our study demonstrated the possibility of applying ML techniques to the processing of free-text pathology reports for cancer registries, for cancer incidence reporting in a Sub-Saharan Africa setting. This is an important consideration for resource-constrained environments, which can leverage ML techniques to reduce workloads and improve the timeliness of reporting of cancer statistics.
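The two metrics quoted in the abstract can be illustrated with a short, dependency-free sketch. It makes several assumptions that the abstract does not confirm: the abstract does not say how F1 was averaged, so the function below computes plain binary F1 for one positive class; the function name and the toy labels are hypothetical; and the run times are paired with models in the same order as the F1-scores, which is how "respectively" reads and is consistent with RF having the longest run time.

```python
def f1_and_misclassification(y_true, y_pred, positive=1):
    """Return (binary F1 for the positive class, misclassification rate)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    errors = sum(1 for t, p in pairs if t != p)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
    f1 = 2 * tp / denom if denom else 0.0
    return f1, errors / len(pairs)


# Toy labels (hypothetical, not from the study): 1 = malignant, 0 = non-malignant
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_f1, toy_error = f1_and_misclassification(toy_true, toy_pred)
print(f"toy F1 = {toy_f1:.2f}, toy misclassification rate = {toy_error:.2f}")

# Figures as reported in the abstract:
# model -> (F1 %, misclassification rate %, approximate run time in minutes)
REPORTED = {
    "SVM": (97, 3.31, 120),
    "SGD": (96, 5.25, 20),
    "KNN": (96, 4.39, 40),
    "RF": (96, 1.75, 480),
    "DT": (92, 3.5, 40),
    "LR": (96, 4.26, 10),
    "AB": (84, 23.9, 50),
    "GNB": (88, 19.94, 4),
}

lowest_error_model = min(REPORTED, key=lambda m: REPORTED[m][1])
slowest_model = max(REPORTED, key=lambda m: REPORTED[m][2])
print(f"lowest misclassification rate: {lowest_error_model}")
print(f"longest run time: {slowest_model}")
```

Pairing the three reported lists per model makes the trade-off stated in the abstract easy to check: RF is both the slowest model and the one with the lowest misclassification rate.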

Bibliographic details

Authors
Victor Olago; Mazvita Muchengeti; Elvira Singh; Wenlong C. Chen;
Year 2020
Original format PDF
Language English

Similar literature

1. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [J]. Rong Xu, QuanQiu Wang. BMC Bioinformatics. 2015, issue SUPPLEMENTa5
2. How to Conduct Rigorous Supervised Machine Learning in Information Systems Research: The Supervised Machine Learning Report Card [J]. Niklas Kühl, Robin Hirt, Lucas Baier. Communications of the Association for Information Systems. 2021, issue a
3. Comparison of machine learning classifiers for influenza detection from emergency department free-text reports [J]. Journal of biomedical informatics. 2015
4. Evaluation of Hybrid Unsupervised and Supervised Machine Learning Approach to Detect Self-Reporting of COVID-19 Symptoms on Twitter [C]. Mingxiang Cai, Jiawei Li, Matthew Nali. IEEE International Conference on Communications Workshops. 2021
5. Metabolite in silico identification software (MetISIS): A machine learning approach to tandem mass spectral identification of metabolites. [D]. Kangas, Lars J. 2012
6. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [O]. Rong Xu, QuanQiu Wang. 2015
7. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [O]. 2015
