Conference: International Conference on Very Large Data Bases
Snorkel: Rapid Training Data Creation with Weak Supervision

Abstract

Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
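
To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library. The task (flagging sentences that assert a causal relation), the three heuristics, and every name in the code are hypothetical and chosen only for illustration. The combiner is a simple majority vote, which is a deliberately simplified stand-in: the abstract describes a data programming model that estimates the unknown accuracies and correlations of the labeling functions, and this sketch does not attempt that.

```python
from collections import Counter

# Label conventions assumed for this sketch (not taken from the paper).
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


# Hypothetical labeling functions: each encodes one noisy heuristic and
# may abstain when the heuristic does not apply.
def lf_contains_causes(text):
    return POSITIVE if "causes" in text.lower() else ABSTAIN


def lf_contains_not(text):
    # Pad with spaces so "not" is matched as a whole word.
    return NEGATIVE if " not " in f" {text.lower()} " else ABSTAIN


def lf_contains_induced(text):
    return POSITIVE if "induced" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_causes, lf_contains_not, lf_contains_induced]


def apply_lfs(texts, lfs):
    """Build the label matrix: one row per text, one column per function."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(votes):
    """Combine one row of votes; abstain when there are no votes or a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


texts = [
    "Aspirin causes stomach bleeding in some patients.",
    "The rash was not related to the drug.",
    "The drug causes no harm and was not stopped.",
    "The patient was discharged on Tuesday.",
]

label_matrix = apply_lfs(texts, LABELING_FUNCTIONS)
labels = [majority_vote(row) for row in label_matrix]

for text, row, label in zip(texts, label_matrix, labels):
    print(label, row, text)
```

The third sentence shows why a plain vote is a weak combiner: two heuristics conflict, and with no estimate of which one is more accurate the only safe output is to abstain. Weighting votes by learned accuracy is the gap that the denoising step described in the abstract is meant to fill.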