The effect of sample size and disease prevalence on supervised machine learning of narrative data.

Abstract

This paper examines the independent effects of outcome prevalence and training sample size on inductive learning performance. We trained 3 inductive learning algorithms (MC4, IB, and Naïve-Bayes) on 60 simulated datasets of parsed radiology text reports labeled with 6 disease states. Datasets were constructed to define positive outcome states at 5 prevalence rates (1, 5, 10, 25, and 50%) in training set sizes of 200 and 2,000 cases. We found that the effect of outcome prevalence is significant when outcome classes drop below 10% of cases. The effect appeared independent of sample size, induction algorithm used, or class label. Work is needed to identify methods of improving classifier performance when output classes are rare.
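The design described in the abstract (6 disease states, the five prevalence values listed, and 2 training set sizes) works out to 6 × 5 × 2 = 60 training sets. The following is a minimal sketch of how such a grid could be enumerated and how one training set could be drawn at a fixed prevalence. The abstract does not say how the authors sampled their data, so the function names, the stratified sampling approach, and the use of integer indices for the disease states are all assumptions made for illustration.

```python
import itertools
import random

# Values taken from the abstract.
PREVALENCES = [0.01, 0.05, 0.10, 0.25, 0.50]
TRAIN_SIZES = [200, 2000]
N_LABELS = 6  # six disease states; their names are not given in the abstract


def design_grid():
    """Enumerate every (label index, prevalence, training size) combination."""
    return list(itertools.product(range(N_LABELS), PREVALENCES, TRAIN_SIZES))


def sample_at_prevalence(positives, negatives, size, prevalence, rng):
    """Draw `size` cases with the requested share of positive cases.

    Hypothetical helper: sampling without replacement from separate
    pools of positive and negative cases is an assumption, not the
    authors' documented procedure.
    """
    n_pos = round(size * prevalence)
    n_neg = size - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("pool too small for requested size and prevalence")
    sample = rng.sample(positives, n_pos) + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample
```

At 1% prevalence a 200-case training set contains only 2 positive cases, which gives some intuition for why performance might degrade when the outcome class is rare.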

Bibliographic details

Journal: AMIA Annual Symposium Proceedings
Authors: Lawrence K. McKnight; Adam Wilcox; George Hripcsak
Year: 2002
Pages: 519–522
Total pages: 4
Original format: PDF

Similar literature

1. Using Machine Learning to Predict Geomorphic Disturbance: The Effects of Sample Size, Sample Prevalence, and Sampling Strategy [J]. Perry George L. W., Dickson Mark E. Journal of Geophysical Research: Earth Surface. 2018, No. 11
2. Supervised machine learning and heterotic classification of maize (Zea mays L.) using molecular marker data [J]. Ornella L., Tapia E. Computers and Electronics in Agriculture. 2010, No. 2
3. Large sample size, wide variant spectrum, and advanced machine-learning technique boost risk prediction for inflammatory bowel disease [J]. Wei Z., Wang W., Bradfield J., The American Journal of Human Genetics. 2013, No. 6
4. Sample sizes for declaring disease freedom with uncertain diagnostic test performance and prevalence [C]. K. Mintiens, Y. Balavarca, D. Verloo, Meeting of the Society for Veterinary Epidemiology and Preventive Medicine. 2007
5. Lightly Supervised Machine Learning for Classifying Online Social Data [D]. Mohammady Ardehaly, Ehsan. 2017
6. Large Sample Size Wide Variant Spectrum and Advanced Machine-Learning Technique Boost Risk Prediction for Inflammatory Bowel Disease [O]. Zhi Wei, Wei Wang, Jonathan Bradfield, 2013
7. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data [O]. Enrico Glaab, Jaume Bacardit, Jonathan M Garibaldi, 2012