Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers?

Abstract

Machine learning (ML) over relational data is a booming area of data management. While there is a lot of work on scalable and fast ML systems, little work has addressed the pains of sourcing data for ML tasks. Real-world relational databases typically have many tables (often, dozens), and data scientists often struggle to even obtain all tables for joins before ML. In this context, Kumar et al. recently showed that key-foreign key dependencies (KFKDs) between tables often let us avoid such joins without significantly affecting prediction accuracy, an idea they called "avoiding joins safely." While initially controversial, this idea has since been used by multiple companies to reduce the burden of data sourcing for ML. But their work applied only to linear classifiers. In this work, we verify whether their results hold for three popular high-capacity classifiers: decision trees, non-linear SVMs, and ANNs. We conduct an extensive experimental study using both real-world datasets and simulations to analyze the effects of avoiding KFK joins on such models. Our results show that these high-capacity classifiers are, surprisingly and counter-intuitively, more robust to avoiding KFK joins than linear classifiers, refuting an intuition from the prior work's analysis. We explain this behavior intuitively and identify open questions at the intersection of data management and ML theoretical research.
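
To make "avoiding a KFK join" concrete, here is a minimal Python sketch over hypothetical tables. The names EMPLOYERS, CUSTOMERS, joined_model, and fk_equivalent, and all values, are illustrative assumptions, not taken from the paper. Because the foreign key employer_id references the primary key of the dimension table, it functionally determines that table's attributes, so any classifier over the joined features can be rewritten as one that reads only the fact table's features plus the foreign key. The sketch shows only this representational point; whether a learner can recover such per-key rules from finite training data without losing accuracy is the empirical question the study addresses.

```python
# Minimal sketch: under a key-foreign key dependency (KFKD), the foreign key
# functionally determines the dimension table's attributes. All tables, columns,
# and values below are hypothetical illustrations.
from typing import Callable, Dict, List, Tuple

# Hypothetical dimension table R, keyed by its primary key employer_id.
EMPLOYERS: Dict[str, Tuple[str, str]] = {
    "e1": ("WI", "large"),
    "e2": ("CA", "small"),
    "e3": ("WI", "small"),
}

# Hypothetical fact table S: (customer_age, employer_id), employer_id is an FK into R.
CUSTOMERS: List[Tuple[int, str]] = [(25, "e1"), (40, "e2"), (33, "e3"), (58, "e1"), (22, "e3")]

JoinedX = Tuple[int, str, str]  # (age, state, size)
NoJoinX = Tuple[int, str]       # (age, employer_id)


def join_features(row: Tuple[int, str]) -> JoinedX:
    """Features after the KFK join: S's attributes plus R's attributes."""
    age, fk = row
    state, size = EMPLOYERS[fk]
    return (age, state, size)


def nojoin_features(row: Tuple[int, str]) -> NoJoinX:
    """Features when the join is avoided: the FK stands in for R's attributes."""
    age, fk = row
    return (age, fk)


def joined_model(x: JoinedX) -> int:
    """An arbitrary hypothetical classifier over the joined features."""
    age, state, size = x
    return int(state == "WI" and (size == "large" or age > 30))


def fk_equivalent(model: Callable[[JoinedX], int]) -> Callable[[NoJoinX], int]:
    """Rewrite `model` as one age-only rule per FK value.

    Each FK value fixes (state, size), so the joined model collapses to a
    per-key rule. The returned predictor reads only (age, employer_id) at
    prediction time; R is consulted once, here, just to build the rules.
    """
    rules: Dict[str, Callable[[int], int]] = {
        fk: (lambda age, attrs=attrs: model((age, *attrs)))
        for fk, attrs in EMPLOYERS.items()
    }

    def predict(x: NoJoinX) -> int:
        age, fk = x
        return rules[fk](age)

    return predict


fk_model = fk_equivalent(joined_model)
joined_preds = [joined_model(join_features(r)) for r in CUSTOMERS]
nojoin_preds = [fk_model(nojoin_features(r)) for r in CUSTOMERS]
assert joined_preds == nojoin_preds
print(joined_preds)  # [1, 0, 1, 1, 0]
```

In this sketch the FK-based predictor is a per-key lookup of rules, the kind of structure a decision tree splitting on the FK could represent; it is an illustration under the stated assumptions, not the paper's explanation of its results.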

Bibliographic Record

Source
International conference on very large data bases, 2018, pp. 366-379 (14 pages)
Authors
Vraj Shah; Arun Kumar; Xiaojin Zhu
Original format: PDF

Similar Literature

1. Using Machine Learning and a Combination of Respiratory Flow, Laryngeal Motion, and Swallowing Sounds to Classify Safe and Unsafe Swallowing [J]. Katsufumi Inoue, Michifumi Yoshioka, Naomi Yagi. IEEE Transactions on Biomedical Engineering, 2018, No. 11

2. Joined-up thinking is key to building safely [J]. Nick Atkinson. Architectural Technology, 2020, No. 134

3. 'Tempos' management in primary care: A key factor for classifying adverse events, and improving quality and safety [J]. R. Amalberti, J. Brami. BMJ Quality & Safety, 2012, No. 9

4. Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? [C]. Vraj Shah, Arun Kumar, Xiaojin Zhu. International conference on very large data bases, 2018

5. Development and evaluation of a personal particle sampler and a mobile high-capacity particle size classifier [D]. Chang, Ming-chih. 2000

6. ‘Tempos’ management in primary care: a key factor for classifying adverse events and improving quality and safety [O]. R. Amalberti, J. Brami

7. Are Key-Foreign Key Joins Safe to Avoid when Learning High-Capacity Classifiers? [O]. Vraj Shah, Arun Kumar, Xiaojin Zhu. 2017
