Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms

Abstract

Background: Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of signal-to-noise ratio in the dataset, imbalance in class distribution and choice of metric for quantifying performance of the classifier. To guide study design, we present a summary of the key characteristics of 'omics' data profiled in several human or animal model experiments utilizing high-content mass spectrometry and multiplexed immunoassay-based techniques.
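The abstract names class imbalance and the choice of performance metric as factors under study. The following is a minimal sketch, not taken from the paper, of why the metric matters when classes are imbalanced. The class sizes (10 cases, 90 controls), the always-predict-control classifier, and the names `Metrics` and `binary_metrics` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    balanced_accuracy: float


def binary_metrics(y_true, y_pred):
    """Compute basic metrics for a binary outcome coded 1 (case) / 0 (control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return Metrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=sensitivity,
        specificity=specificity,
        balanced_accuracy=(sensitivity + specificity) / 2,
    )


# Hypothetical imbalanced study: 10 cases and 90 controls (illustrative only).
y_true = [1] * 10 + [0] * 90
# A trivial classifier that labels every subject as a control.
y_pred = [0] * 100

m = binary_metrics(y_true, y_pred)
print(m)
```

In this illustrative setup the trivial classifier scores high on accuracy while detecting no cases at all, whereas balanced accuracy stays at chance level. That gap is one reason a comparison of classifiers can rank them differently depending on the metric chosen.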

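Bibliographic record

Journal: BMC Bioinformatics
Authors: Yu Guo; Armin Graber; Robert N McBurney; Raji Balasubramanian
Year (volume), issue: 2010 (11), -1
Year: 2010
Page: 447
Total pages: 19
Original format: PDF
Chinese Library Classification: applied microbiology; biochemical genetics; biochemical pharmacology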

Similar literature

1. Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms [J]. Yu Guo, Armin Graber, Robert N McBurney, et al. BMC Bioinformatics. 2010, issue 1
2. Microbial Diversity in Clinical Microbiome Studies: Sample Size and Statistical Power Considerations [J]. Gastroenterology. 2020, issue 6
3. Statistical power calculation and sample size determination for environmental studies with data below detection limits [J]. Quanxi Shao, You-Gan Wang. Water Resources Research. 2009, issue 9
4. A Comparative Study of Manual Wagon Top Sampling and Auto Mechanical Sampling of 200 mm Size Coal with Respect to Stopped Belt Sampling of Thermal Coal at Indian Super Thermal Powerplants [C]. K M K Sinha, K K Sharma, G S Jha. Sampling Conference. 2014
5. Alleviating class imbalance using data sampling: Examining the effects on classification algorithms [D]. Napolitano, Amri E. 2006
6. Lung nodule malignancy classification using only radiologist-quantified image features as inputs to statistical learning algorithms: probing the Lung Image Database Consortium dataset with two statistical learning methods [O]. Matthew C. Hancock, Jerry F. Magnan. 2016
7. Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms [O]. Yu Guo, Armin Graber, Robert N McBurney, et al. 2010
