High dimensional model representation of log likelihood ratio: binary classification with SNP data

Foroughi pour Ali; Pietrzak Maciej; Sucheston-Campbell Lara E.; Karaesmen Ezgi; Dalton Lori A.; Rempała Grzegorz A.

Abstract

Developing binary classification rules based on SNP observations has been a major challenge for many modern bioinformatics applications, e.g., predicting the risk of future disease events in complex conditions such as cancer. The small-sample, high-dimensional nature of SNP data, the weak effect of each SNP on the outcome, and highly non-linear SNP interactions are several key factors complicating the analysis. Additionally, SNPs take a finite number of values, which may be best understood as ordinal or categorical variables but are treated as continuous by many algorithms.

We use the theory of high dimensional model representation (HDMR) to build appropriate low-dimensional glass-box models, allowing us to account for the effects of feature interactions. We compute the second-order HDMR expansion of the log-likelihood ratio to account for the effects of single SNPs and their pairwise interactions. We propose a regression-based approach, called linear approximation for block second order HDMR expansion of categorical observations (LABS-HDMR-CO), to approximate the HDMR coefficients. We show how HDMR can be used to detect pairwise SNP interactions, and propose the fixed pattern test (FPT) to identify statistically significant pairwise interactions.

We apply LABS-HDMR-CO and FPT to synthetically generated HAPGEN2 data as well as to two GWAS cancer datasets. In these examples, LABS-HDMR-CO achieves higher accuracy than several algorithms used for SNP classification, while also taking pairwise interactions into account. When the false discovery rate (FDR) is bounded at 5%, FPT declares very few significant interactions in the small-sample GWAS datasets, due to the large number of tests performed. On the other hand, LABS-HDMR-CO utilizes a large number of SNP pairs to improve its prediction accuracy. In the larger HAPGEN2 dataset, FPT declares a larger portion of the SNP pairs used by LABS-HDMR-CO as significant.
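For categorical inputs, a second-order expansion log LR(x) ≈ f_0 + Σ_i f_i(x_i) + Σ_{i<j} f_ij(x_i, x_j) is linear in indicator variables: each f_i is a lookup table over the three genotypes, and each f_ij is a 3 × 3 table over genotype patterns. The Python sketch below builds that indicator design matrix and fits it by ridge least squares to +1/-1 class labels. It is a minimal illustration under stated assumptions, not the authors' LABS-HDMR-CO: the 0/1/2 genotype coding, the ridge penalty `lam`, the least-squares target, and the helper names `encode_second_order`, `fit_surrogate` and `predict` are all hypothetical choices made here.

```python
import itertools

import numpy as np

# Illustrative sketch only, NOT the authors' LABS-HDMR-CO.
# Assumptions: genotypes coded 0/1/2; all helper names are hypothetical.
LEVELS = 3


def encode_second_order(G):
    """Indicator design matrix for a truncated second-order expansion.

    Columns: intercept (f_0), one indicator per (SNP, genotype) for f_i,
    and one indicator per (SNP pair, genotype pattern) for f_ij.
    """
    G = np.asarray(G, dtype=int)
    n, p = G.shape
    blocks = [np.ones((n, 1))]
    for i in range(p):
        # f_i(x_i): a lookup table over the 3 genotypes -> 3 columns.
        blocks.append(np.eye(LEVELS)[G[:, i]])
    for i, j in itertools.combinations(range(p), 2):
        # f_ij(x_i, x_j): a 3 x 3 table over genotype patterns -> 9 columns.
        pattern = G[:, i] * LEVELS + G[:, j]
        blocks.append(np.eye(LEVELS * LEVELS)[pattern])
    return np.hstack(blocks)


def fit_surrogate(X, y, lam=1e-3):
    """Ridge least squares of +1/-1 class targets on the indicator columns."""
    t = np.where(np.asarray(y) == 1, 1.0, -1.0)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ t)


def predict(X, w):
    """Class 1 when the fitted score is positive (the analogue of log LR > 0)."""
    return (X @ w > 0).astype(int)


# Toy data: all 27 genotype combinations of 3 SNPs; the label depends only
# on whether SNPs 0 and 1 share a genotype (a purely pairwise effect).
G = np.array(list(itertools.product(range(LEVELS), repeat=3)))
y = (G[:, 0] == G[:, 1]).astype(int)
X = encode_second_order(G)
w = fit_surrogate(X, y)
print(X.shape)                      # (27, 37)
print((predict(X, w) == y).mean())  # 1.0
```

The pair block has 9·p(p−1)/2 columns, so the design grows quadratically with the number of SNPs p; the toy run uses p = 3 so that every genotype combination appears. The intercept and the one-hot blocks are collinear, which the ridge term keeps solvable, and the sketch does not impose the orthogonality constraints that make HDMR components unique.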
LABS-HDMR-CO and FPT are interesting methods for designing prediction rules and detecting pairwise feature interactions in SNP data. Reliably detecting pairwise SNP interactions and taking advantage of potential interactions to improve prediction accuracy are two different objectives, addressed by FPT and LABS-HDMR-CO respectively. While the large number of potential SNP interactions may result in low detection power, potentially interacting SNP pairs, many of which might be false alarms, can still be used to improve prediction accuracy.
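The point that a large number of tests lowers detection power can be made concrete with an FDR-controlling procedure. The sketch below assumes the Benjamini-Hochberg step-up rule at q = 0.05 (the abstract states the 5% bound but not the procedure) and uses made-up p-values; it stands in for the multiple-testing step only and does not implement the fixed pattern test. `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up rule.

    Assumed procedure for illustration; not the paper's FPT.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with k * q / m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank meeting the bound
        reject[order[: k + 1]] = True   # reject every p-value up to that rank
    return reject


# Hypothetical p-values for 10 SNP-pair tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals).sum())  # 2

# The same p-value of 0.008 among 10,000 tests no longer survives.
many = [0.008] + [0.5] * 9999
print(benjamini_hochberg(many).sum())   # 0
```

A p-value declared significant among 10 tests is not significant among 10,000, because the smallest p-value must now fall below q/m = 5 × 10⁻⁶; this is the mechanism behind the few FPT discoveries in small-sample GWAS data.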


Bibliographic details

Source: BMC Medical Genomics | 2020, no. 9 | 22 pages
Original format: PDF
Chinese Library Classification: Genetics
Keywords: Single nucleotide polymorphism; Binary classification; High dimensional model representation; Pairwise SNP interactions; Log likelihood ratio


Similar documents

1. High dimensional model representation of log-likelihood ratio: binary classification with expression data [J]. Ali Foroughi pour, Maciej Pietrzak, Lori A. Dalton. BMC Bioinformatics, 2020, no. 1.
2. A composite likelihood approach to computer model calibration with high-dimensional spatial data [J]. Won Chang, Murali Haran, Roman Olson. Statistica Sinica, 2015, no. 1.
3. Gaussian mixture models for the classification of high-dimensional vibrational spectroscopy data [J]. Julien Jacques, Charles Bouveyron, Stephane Girard. Journal of Chemometrics, 2010, no. 11-12.
4. An associative classification based approach for detecting SNP-SNP interactions in high dimensional genome [C]. Suneetha Uppu, Aneesh Krishna, Raj P. Gopalan. IEEE International Conference on Bioinformatics and Bioengineering, 2014.
5. Monitoring Markov dependent binary observations with a log-likelihood ratio based CUSUM control chart [D]. Shabnam Modarres-Mousavi, 2006.
6. High dimensional model representation of log likelihood ratio: binary classification with SNP data [O]. Ali Foroughi pour, Maciej Pietrzak, Lara E. Sucheston-Campbell, 2020.
7. Effect of radiance-to-reflectance transformation and atmosphere removal on maximum likelihood classification accuracy of high-dimensional remote sensing data [R]. J. P. Hoffbeck, D. A. Landgrebe, 1994.
