Comparison of parametric and machine methods for variable selection in simulated Genetic Analysis Workshop 19 data

Abstract

Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), an RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of “true” functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.
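
The two selection rules contrasted in the abstract can be illustrated with a minimal Python sketch. The Bonferroni part is the standard rule (per-test threshold = alpha divided by the number of tests). The importance-based part is only an assumption about how a recurrence-style rule might look, since the abstract does not spell out the r2VIM procedure: each variant's importance is divided by the absolute value of the smallest importance in that run, and a variant is kept only if this ratio reaches a chosen factor in every run. All variant names, p-values, importance scores, and the factor below are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def select_by_bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Return variants whose p-value is below alpha / (number of tests)."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < threshold)


def select_by_recurrent_importance(runs: List[Dict[str, float]], factor: float) -> List[str]:
    """Keep variants whose relative importance is >= factor in every run.

    ASSUMPTION (not stated in the abstract): relative importance is the raw
    importance divided by the absolute value of the minimum importance
    observed in the same run.
    """
    selected = None
    for importances in runs:
        floor = abs(min(importances.values()))
        if floor == 0:
            raise ValueError("minimum importance is zero; ratio is undefined")
        passing = {name for name, score in importances.items() if score / floor >= factor}
        selected = passing if selected is None else selected & passing
    return sorted(selected or [])


# Hypothetical p-values from single-variant linear regressions.
p_values = {"var_a": 1e-6, "var_b": 0.02, "var_c": 0.4, "var_d": 0.0001}

# Hypothetical importance scores from two forests grown with different seeds.
runs = [
    {"var_a": 0.9, "var_b": 0.15, "var_c": -0.1, "var_d": 0.35},
    {"var_a": 0.8, "var_b": 0.05, "var_c": -0.2, "var_d": 0.7},
]

regression_hits = select_by_bonferroni(p_values, alpha=0.05)
importance_hits = select_by_recurrent_importance(runs, factor=3.0)
print(regression_hits, importance_hits)
```

In this toy input both rules happen to keep the same variants; the factor plays the role of the importance score threshold mentioned in the abstract, and a different choice could change which variants pass.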

Bibliographic record

Journal: BMC Proceedings
Authors: Emily R. Holzinger; Silke Szymczak; James Malley; Elizabeth W. Pugh; Hua Ling; Sean Griffith; Peng Zhang; Qing Li; Cheryl D. Cropp; Joan E. Bailey-Wilson
Year (volume), issue: 2016 (10), Suppl 7
Pages: 147–152
Total pages: 6
Original format: PDF
Indexed: 2022-08-17 14:09:44
