Generalized Hierarchical Sparse Model for Arbitrary-Order Interactive Antigenic Sites Identification in Flu Virus Data

Abstract

Recent statistical evidence has shown that a regression model that incorporates interactions among the original covariates (features) can significantly improve the interpretability of biological data. One major challenge is that the feature space expands exponentially when high-order feature interactions are added to the model. To tackle this dimensionality, hierarchical sparse models (HSM) enforce sparsity under heredity structures in the interactions among the covariates. However, existing methods consider only pairwise interactions, which leaves the discovery of important high-order interactions a non-trivial open problem. In this paper, we propose a generalized hierarchical sparse model (GHSM) that generalizes HSM to arbitrary-order interactions. The GHSM applies the ℓ1 penalty to all model coefficients under the constraint that, for any covariate, if none of its associated kth-order interactions contributes to the regression model, then neither do its associated higher-order interactions. The resulting objective function is non-convex, and the main difficulty lies in the coupled variables that appear in the arbitrary-order hierarchical constraints; we devise an efficient optimization algorithm to solve it directly. Specifically, we decouple the variables in the constraints via the general iterative shrinkage and thresholding (GIST) method and the alternating direction method of multipliers (ADMM), yielding three subproblems, each of which is proved to admit an efficient analytical solution. We evaluate GHSM on both a synthetic problem and the antigenic site identification problem for influenza virus data, where we expand the feature space up to 5th-order interactions. Empirical results demonstrate the effectiveness and efficiency of the proposed method, and the learned high-order interactions exhibit meaningful synergistic covariate patterns in influenza virus antigenicity.
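The two ideas in the abstract that are easiest to make concrete are the growth of the feature space and the hierarchy rule on the coefficients. The sketch below is only an illustration, not the paper's algorithm: it does not implement GIST or ADMM. It assumes that a kth-order interaction is a product of k distinct covariates (so order 1 is the covariate itself), and the function names `num_interaction_features` and `satisfies_hierarchy` are hypothetical.

```python
from math import comb


def num_interaction_features(p, max_order):
    """Count interaction terms of order 1..max_order over p covariates.

    Assumption: an order-k term is a product of k distinct covariates.
    """
    return sum(comb(p, k) for k in range(1, max_order + 1))


def satisfies_hierarchy(support, p, max_order):
    """Check the heredity rule as worded in the abstract.

    support: set of frozensets; each frozenset holds the covariate indices
             of one term whose coefficient is nonzero.
    Rule: for every covariate j and every order k, if no order-k term
          containing j is active, no term of order > k containing j is active.
    """
    for j in range(p):
        # Orders at which covariate j appears in at least one active term.
        active_orders = {len(term) for term in support if j in term}
        for k in range(1, max_order):
            if k not in active_orders and any(o > k for o in active_orders):
                return False
    return True


# Feature count for 10 covariates expanded up to 5th-order interactions.
print(num_interaction_features(10, 5))

# Main effects x0, x1 plus their pairwise interaction: allowed.
ok = {frozenset({0}), frozenset({1}), frozenset({0, 1})}
print(satisfies_hierarchy(ok, p=3, max_order=3))

# Pairwise interaction without either main effect: violates the rule.
bad = {frozenset({0, 1})}
print(satisfies_hierarchy(bad, p=3, max_order=3))
```

Even with only 10 covariates, expanding to 5th order gives several hundred candidate terms, which is why a structured sparsity constraint of this kind is needed to keep the search tractable.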