Journal: Ophthalmology

Grader Variability and the Importance of Reference Standards for Evaluating Machine Learning Models for Diabetic Retinopathy



Abstract

Purpose: Use adjudication to quantify errors in diabetic retinopathy (DR) grading based on individual graders and majority decision, and to train an improved automated algorithm for DR grading.

Design: Retrospective analysis.

Participants: Retinal fundus images from DR screening programs.

Methods: Images were each graded by the algorithm, U.S. board-certified ophthalmologists, and retinal specialists. The adjudicated consensus of the retinal specialists served as the reference standard.

Main Outcome Measures: For agreement between different graders as well as between the graders and the algorithm, we measured the (quadratic-weighted) kappa score. To compare the performance of different forms of manual grading and the algorithm for various DR severity cutoffs (e.g., mild or worse DR, moderate or worse DR), we measured area under the curve (AUC), sensitivity, and specificity.

Results: Of the 193 discrepancies between adjudication by retinal specialists and majority decision of ophthalmologists, the most common were missing microaneurysms (MAs) (36%), artifacts (20%), and misclassified hemorrhages (16%). Relative to the reference standard, the kappa ranged from 0.82 to 0.91 for individual retinal specialists and from 0.80 to 0.84 for individual ophthalmologists, and was 0.84 for the algorithm. For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981. The algorithm had a sensitivity of 0.971, specificity of 0.923, and AUC of 0.986. For mild or worse DR, the algorithm had a sensitivity of 0.970, specificity of 0.917, and AUC of 0.986. By using a small number of adjudicated consensus grades as a tuning dataset and higher-resolution images as input, the algorithm improved in AUC from 0.934 to 0.986 for moderate or worse DR.

Conclusions: Adjudication reduces the errors in DR grading. A small set of adjudicated DR grades allows substantial improvements in algorithm performance. The resulting algorithm's performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.
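
The outcome measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes grades are integers on a 5-level severity scale (0 = no DR through 4 = proliferative DR), which the abstract does not specify, and the function names and example grades below are hypothetical. It computes a quadratic-weighted kappa between two sets of grades, and the sensitivity and specificity of one grader against a reference standard at a chosen severity cutoff.

```python
from collections import Counter


def quadratic_weighted_kappa(a, b, num_grades=5):
    """Quadratic-weighted Cohen's kappa for two lists of integer grades
    in the range 0..num_grades-1 (assumed 5-level scale)."""
    if len(a) != len(b) or not a:
        raise ValueError("grade lists must be non-empty and of equal length")
    n = len(a)
    observed = Counter(zip(a, b))        # joint counts of (grade_a, grade_b)
    hist_a, hist_b = Counter(a), Counter(b)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            # Disagreement weight grows with the squared grade distance.
            w = (i - j) ** 2 / (num_grades - 1) ** 2
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * hist_a[i] * hist_b[j] / (n * n)
    if weighted_expected == 0:
        return float("nan")  # undefined when both raters use one identical grade
    return 1.0 - weighted_observed / weighted_expected


def sensitivity_specificity(reference, predicted, cutoff):
    """Binarize both grade lists at `grade >= cutoff` (e.g. cutoff=2 for
    'moderate or worse DR' on the assumed scale) and compare to the reference."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        ref_pos, pred_pos = ref >= cutoff, pred >= cutoff
        if ref_pos and pred_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical grades for ten images: adjudicated reference vs. one grader.
reference = [0, 0, 1, 2, 2, 3, 4, 0, 1, 2]
grader = [0, 0, 0, 2, 1, 3, 4, 0, 1, 2]

kappa = quadratic_weighted_kappa(reference, grader)
sens, spec = sensitivity_specificity(reference, grader, cutoff=2)
print(f"kappa={kappa:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Quadratic weighting penalizes a two-step disagreement four times as heavily as a one-step disagreement, so a grader who confuses adjacent severity levels scores higher than one who confuses distant levels.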
