Journal: BMC Medical Genomics

High dimensional model representation of log likelihood ratio: binary classification with SNP data



Abstract

Developing binary classification rules based on SNP observations has been a major challenge for many modern bioinformatics applications, e.g., predicting the risk of future disease events in complex conditions such as cancer. The small-sample, high-dimensional nature of SNP data, the weak effect of each SNP on the outcome, and highly non-linear SNP interactions are key factors complicating the analysis. Additionally, SNPs take a finite number of values and may be best understood as ordinal or categorical variables, yet many algorithms treat them as continuous. We use the theory of high dimensional model representation (HDMR) to build appropriate low-dimensional glass-box models, which allows us to account for the effects of feature interactions. We compute the second-order HDMR expansion of the log-likelihood ratio to account for the effects of single SNPs and their pairwise interactions. We propose a regression-based approach, called linear approximation for block second order HDMR expansion of categorical observations (LABS-HDMR-CO), to approximate the HDMR coefficients. We show how HDMR can be used to detect pairwise SNP interactions, and propose the fixed pattern test (FPT) to identify statistically significant pairwise interactions. We apply LABS-HDMR-CO and FPT to synthetically generated HAPGEN2 data as well as to two GWAS cancer datasets. In these examples LABS-HDMR-CO achieves higher accuracy than several algorithms used for SNP classification, while also taking pairwise interactions into account. Because of the large number of tests performed, FPT declares very few interactions significant in the small-sample GWAS datasets when the false discovery rate (FDR) is bounded by 5%. LABS-HDMR-CO, on the other hand, uses a large number of SNP pairs to improve its prediction accuracy. In the larger HAPGEN2 dataset, FPT declares a larger proportion of the SNP pairs used by LABS-HDMR-CO as significant.
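As a rough illustration of the expansion referred to above, a generic second-order HDMR truncation of a log-likelihood ratio can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: x = (x_1, ..., x_d) is the vector of SNP genotypes, y in {0, 1} is the class label, and f_0, f_i, f_{ij} are the zeroth-, first-, and second-order component functions.

```latex
% Second-order HDMR truncation of the log-likelihood ratio (illustrative notation).
\[
  \log \frac{p(x \mid y = 1)}{p(x \mid y = 0)}
  \;\approx\;
  f_0
  + \sum_{i=1}^{d} f_i(x_i)
  + \sum_{1 \le i < j \le d} f_{ij}(x_i, x_j)
\]
```

In this form each f_i captures the contribution of a single SNP and each f_{ij} the contribution of one SNP pair, which is what makes the model inspectable term by term.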
LABS-HDMR-CO and FPT are methods for designing prediction rules and detecting pairwise feature interactions in SNP data. Reliably detecting pairwise SNP interactions and exploiting potential interactions to improve prediction accuracy are two different objectives addressed by these methods. While the large number of potential SNP interactions may result in low detection power, potentially interacting SNP pairs, many of which might be false alarms, can still be used to improve prediction accuracy.
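The loss of detection power under a large number of tests can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure as the FDR-controlling rule; the abstract does not say which procedure the authors use, and the function names `n_pairs` and `benjamini_hochberg` are hypothetical helpers, not part of the paper.

```python
def n_pairs(d):
    """Number of distinct SNP pairs among d SNPs."""
    return d * (d - 1) // 2


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here, not confirmed
    by the source). Returns a list of booleans: True where the hypothesis
    is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value is below its threshold q * rank / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# The same p-value (1e-4) among 100 tests versus among 100,000 tests;
# all other p-values are set to 0.5 (clearly non-significant).
small = [1e-4] + [0.5] * 99
large = [1e-4] + [0.5] * 99_999

print(n_pairs(1000))                   # 499500 pairs from only 1000 SNPs
print(sum(benjamini_hochberg(small)))  # 1: the pair is declared significant
print(sum(benjamini_hochberg(large)))  # 0: the same p-value no longer passes
```

With only 1000 SNPs there are already 499,500 candidate pairs, so a p-value that would pass comfortably in a small family of tests falls below the detection threshold once every pair is tested.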
