International Conference on Machine Learning

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods


Abstract

Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
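
To illustrate how a closed-form estimate can replace an iterative fit, here is a minimal sketch of a triplet-style estimator. It is an illustration only, not FlyingSquid's API: the function names `triplet_accuracies` and `simulate` are hypothetical. It assumes three sources that vote in {-1, +1} on a balanced binary label Y, are conditionally independent given Y, and are each better than random. Under those assumptions E[λi·λj] = ai·aj, where ai = E[λi·Y], so each ai can be solved for directly from three pairwise agreement rates computed on unlabeled data.

```python
import math
import random


def triplet_accuracies(votes):
    """Estimate a_i = E[lambda_i * Y] for three sources from unlabeled votes.

    votes: list of (l1, l2, l3) tuples with entries in {-1, +1}.
    Assumption (for this sketch): sources are conditionally independent
    given Y and each is better than random, so a_i > 0.
    """
    n = len(votes)
    # Empirical second moments E[l_i * l_j]; no ground-truth labels needed.
    m12 = sum(v[0] * v[1] for v in votes) / n
    m13 = sum(v[0] * v[2] for v in votes) / n
    m23 = sum(v[1] * v[2] for v in votes) / n
    # With E[l_i * l_j] = a_i * a_j, each a_i has a closed form.
    a1 = math.sqrt(abs(m12 * m13 / m23))
    a2 = math.sqrt(abs(m12 * m23 / m13))
    a3 = math.sqrt(abs(m13 * m23 / m12))
    return a1, a2, a3


def simulate(p_correct, n, seed=0):
    """Draw n synthetic vote triples; source i is correct with prob p_correct[i]."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n):
        y = rng.choice((-1, 1))  # hidden true label, never shown to the estimator
        votes.append(tuple(y if rng.random() < p else -y for p in p_correct))
    return votes


if __name__ == "__main__":
    votes = simulate((0.9, 0.8, 0.7), 100_000)
    print(triplet_accuracies(votes))
```

A source that is correct with probability p has a = 2p - 1, so on the synthetic data above the estimates should land near 0.8, 0.6, and 0.4. The estimator makes a single pass over the votes with no learning rate or iteration count to tune.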
