Source: PLoS Genetics (NIH literature collection)

Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

Abstract

Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, especially microarray and genome-wide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate this empirical approach to statistics by considering a common task in analyzing high-dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high-dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have so far been compared using simulated data, simulating data that realistically reflect an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE for which some truth about the parameters of interest is known. We use the procedure to compare estimates of the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of the FDR, as obtained from 15 different statistical methods.
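The core idea, using real data in which some truth is known to check how well a multiple-testing method estimates the proportion of true nulls and the FDR, can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' procedure or any of the 15 methods they compared. It assumes one common way to build a plasmode: take real data measured under a single condition, randomly split the samples into two pseudo-groups (so every gene is truly null), then add a known shift to a randomly chosen subset of genes. The function names `make_plasmode`, `storey_pi0`, and `qvalues`, the fixed λ = 0.5, the effect size, and the random stand-in matrix are all hypothetical choices for illustration. Local FDR is not shown.

```python
import numpy as np
from scipy import stats


def make_plasmode(null_data, n_effect, effect_size, rng):
    """Hypothetical plasmode builder: null_data is a genes x samples matrix
    measured under ONE condition, so no gene truly differs between random halves."""
    n_genes, n_samples = null_data.shape
    # Randomly split the samples into two pseudo-groups; any difference
    # between them is noise, so every hypothesis starts out truly null.
    perm = rng.permutation(n_samples)
    half = n_samples // 2
    group_a = null_data[:, perm[:half]].copy()
    group_b = null_data[:, perm[half:2 * half]].copy()
    # Add a known shift to randomly chosen genes in group B: these are the
    # true alternatives, and we know exactly which ones they are.
    is_alt = np.zeros(n_genes, dtype=bool)
    is_alt[rng.choice(n_genes, size=n_effect, replace=False)] = True
    group_b[is_alt] += effect_size
    return group_a, group_b, is_alt


def storey_pi0(pvals, lam=0.5):
    """Storey-style estimate of the proportion of true nulls at one fixed lambda."""
    return min(1.0, float(np.mean(pvals > lam)) / (1.0 - lam))


def qvalues(pvals, pi0=1.0):
    """Step-up FDR adjustment: pi0 = 1 gives Benjamini-Hochberg adjusted
    p-values; plugging in an estimated pi0 < 1 gives Storey-style q-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pi0 * pvals[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


rng = np.random.default_rng(0)
# Stand-in for a real single-condition data set (1000 genes x 20 arrays).
# A genuine plasmode would use measured data here, not random numbers.
real_like = rng.normal(size=(1000, 20))
a, b, is_alt = make_plasmode(real_like, n_effect=200, effect_size=1.5, rng=rng)

p = stats.ttest_ind(a, b, axis=1).pvalue  # one two-sample t-test per gene
pi0_hat = storey_pi0(p)
q = qvalues(p, pi0_hat)
called = q <= 0.05

# Because the truth is known, the realized false discovery proportion
# can be computed directly and compared with the nominal level.
fdp = np.sum(called & ~is_alt) / max(1, int(np.sum(called)))
print(f"true pi0 = {1 - is_alt.mean():.2f}, estimated pi0 = {pi0_hat:.2f}")
print(f"discoveries = {int(called.sum())}, false discovery proportion = {fdp:.3f}")
```

Repeating the random split and spike many times, and averaging the gap between each method's estimates and the known π0 and realized false discovery proportion, would give an empirical comparison of the kind the abstract describes.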