Source: BMC Bioinformatics

Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms

Abstract

Background: Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant to the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of signal-to-noise ratio in the dataset, imbalance in class distribution and choice of metric for quantifying performance of the classifier. To guide study design, we present a summary of the key characteristics of 'omics' data profiled in several human or animal model experiments utilizing high-content mass spectrometry and multiplexed immunoassay-based techniques.
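The kind of experiment described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' protocol. It assumes Gaussian features, a hypothetical `simulate` helper in which a few informative features are shifted by `effect` in the positive class, and a plain nearest-centroid classifier standing in for Prediction Analysis for Microarrays (which is a nearest shrunken centroid method; the shrinkage step is omitted here). All sample sizes, feature counts and effect sizes are illustrative assumptions.

```python
import random

def simulate(n_pos, n_neg, n_features, n_informative, effect, rng):
    """Hypothetical generator: unit-variance Gaussian noise, with the first
    n_informative features shifted by `effect` in the positive class."""
    X, y = [], []
    for label, n in ((1, n_pos), (0, n_neg)):
        for _ in range(n):
            X.append([rng.gauss(effect if label == 1 and j < n_informative else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Per-class mean of every feature (nearest centroid, no shrinkage)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids

def predict(centroids, X):
    """Assign each sample to the class whose centroid is closest."""
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: sqdist(x, centroids[lab])) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (sensitivity and specificity).
    Assumes both classes are present in y_true."""
    recalls = []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / 2

rng = random.Random(0)  # fixed seed so the run is repeatable
# Many more features (500) than training subjects (40), with a 1:3 class imbalance.
X_train, y_train = simulate(10, 30, 500, 20, 1.0, rng)
X_test, y_test = simulate(50, 150, 500, 20, 1.0, rng)

pred = predict(fit_centroids(X_train, y_train), X_test)
print("nearest centroid:", accuracy(y_test, pred), balanced_accuracy(y_test, pred))

# Baseline that always predicts the majority class.
majority = [0] * len(y_test)
print("majority class:  ", accuracy(y_test, majority), balanced_accuracy(y_test, majority))
```

On this 1:3 test set the majority-class baseline scores an accuracy of 0.75 but a balanced accuracy of 0.5, which is one reason the choice of performance metric matters when classes are imbalanced. Varying `effect`, `n_informative` and the class sizes gives a rough way to explore signal-to-noise ratio and imbalance.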