Conference: International Conference on Machine Learning

Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms

Abstract

Mixture-of-Experts (MoE) is a widely popular model for ensemble learning and is a basic building block of highly successful modern neural networks as well as a component in Gated Recurrent Units (GRU) and Attention networks. However, present algorithms for learning MoE, including the EM algorithm and gradient descent, are known to get stuck in local optima. From a theoretical viewpoint, finding an efficient and provably consistent algorithm to learn the parameters has remained a long-standing open problem for more than two decades. In this paper, we introduce the first algorithm that learns the true parameters of a MoE model for a wide class of non-linearities with global consistency guarantees. While existing algorithms jointly or iteratively estimate the expert parameters and the gating parameters in the MoE, we propose a novel algorithm that breaks the deadlock and can directly estimate the expert parameters by sensing their echo in a carefully designed cross-moment tensor between the inputs and the output. Once the experts are known, the recovery of gating parameters still requires an EM algorithm; however, we show that the EM algorithm for this simplified problem, unlike the joint EM algorithm, converges to the true parameters. We empirically validate our algorithm on both synthetic and real data sets in a variety of settings, and show performance superior to standard baselines.
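The second stage described in the abstract (recovering the gating parameters by EM once the experts are known) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a two-expert model with Gaussian inputs, a sigmoid gate with parameter `w`, linear experts `a1` and `a2` (the paper covers a wider class of non-linearities), and Gaussian noise with known standard deviation `sigma`. The function names `sample_moe` and `em_gating` are hypothetical, and the M-step is approximated by a few gradient-ascent steps (a generalized EM variant).

```python
import numpy as np

def sigmoid(t):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -50.0, 50.0)))

def sample_moe(n, a1, a2, w, sigma, rng):
    """Draw n samples from an assumed 2-expert MoE with linear experts."""
    d = a1.shape[0]
    x = rng.standard_normal((n, d))
    pick_first = rng.random(n) < sigmoid(x @ w)   # gate selects expert 1
    mean = np.where(pick_first, x @ a1, x @ a2)
    y = mean + sigma * rng.standard_normal(n)
    return x, y

def em_gating(x, y, a1, a2, sigma, n_iter=50, m_steps=20, lr=1.0):
    """Estimate the gating vector w by EM, treating the experts as known."""
    n, d = x.shape
    w = np.zeros(d)
    # Gaussian log-likelihoods of y under each known expert (up to a constant).
    ll1 = -0.5 * ((y - x @ a1) / sigma) ** 2
    ll2 = -0.5 * ((y - x @ a2) / sigma) ** 2
    for _ in range(n_iter):
        # E-step: posterior probability that expert 1 generated each sample,
        # written in log-odds form for numerical stability.
        r = sigmoid(x @ w + ll1 - ll2)
        # M-step (approximate): gradient ascent on the weighted
        # logistic-regression objective with soft labels r.
        for _ in range(m_steps):
            w = w + lr * (x.T @ (r - sigmoid(x @ w)) / n)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a1 = np.array([2.0, 1.0, 0.0])
    a2 = np.array([-1.0, 2.0, 1.0])
    w_true = np.array([1.0, -1.0, 0.5])
    x, y = sample_moe(20000, a1, a2, w_true, sigma=0.5, rng=rng)
    print("estimated w:", em_gating(x, y, a1, a2, sigma=0.5))
    print("true w:     ", w_true)
```

Because the experts are fixed and distinct, the E-step is informative even from the initial guess `w = 0`, which is one intuition for why this reduced problem is easier than joint EM over experts and gate.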