Journal: International Journal of Data Science and Analytics

Large-scale predictive modeling and analytics through regression queries in data management systems

Abstract

Regression analytics has been the standard approach to modeling the relationship between input and output variables, while recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute a novel predictive analytics model and an associated statistical learning methodology, which are efficient, scalable and accurate in discovering piecewise linear dependencies among variables by observing only the regression queries issued to a DMS and their answers. We focus on in-DMS piecewise linear regression, specifically on predicting the answers to mean-value aggregate queries, identifying the piecewise linear dependencies between variables and delivering them in response to regression queries, and predicting the data dependent variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover, only through query-answer pairs, a piecewise linear approximation of the underlying data function that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated and compared with the exact solution and with near-perfect approximations of the underlying relationships among variables, achieving orders of magnitude improvement in analytics processing.
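
To make the idea of learning from query-answer pairs concrete, the following is a minimal Python sketch, not the authors' method. It assumes a one-dimensional dataset, mean-value queries defined by a hypothetical (center, radius) pair, and segment breakpoints that are known in advance; the abstract does not specify any of these details, and the function names are illustrative only. The sketch computes exact answers to a set of training queries, fits one least-squares line per segment to the (center, answer) pairs, and then predicts the answer to an unseen query without touching the data.

```python
import random

def mean_value_query(data, center, radius):
    """Exact answer: average of y over rows with |x - center| <= radius."""
    ys = [y for x, y in data if abs(x - center) <= radius]
    return sum(ys) / len(ys) if ys else None

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def fit_piecewise(pairs, breakpoints):
    """Fit one line per interval of query centers.

    Assumption: breakpoints are given; a real system would have to learn them.
    """
    edges = [float("-inf")] + list(breakpoints) + [float("inf")]
    models = []
    for lo, hi in zip(edges, edges[1:]):
        segment = [(c, a) for c, a in pairs if lo <= c < hi]
        if segment:  # skip intervals that received no training queries
            models.append((lo, hi, fit_line(segment)))
    return models

def predict(models, center):
    """Predict a mean-value answer from the fitted lines only."""
    for lo, hi, (a, b) in models:
        if lo <= center < hi:
            return a + b * center
    raise ValueError("no model covers this query center")

# Synthetic, noise-free data (an assumption for illustration):
# y = 2x for x < 5 and y = 15 - x otherwise.
data = [(i / 100, 2 * (i / 100) if i < 500 else 15 - i / 100)
        for i in range(1001)]

# Training set: 200 random queries with a fixed radius and their exact answers.
rng = random.Random(0)
radius = 0.25
centers = [rng.uniform(0.5, 9.5) for _ in range(200)]
pairs = [(c, mean_value_query(data, c, radius)) for c in centers]

models = fit_piecewise(pairs, breakpoints=[5.0])
print(predict(models, 2.0), mean_value_query(data, 2.0, radius))
print(predict(models, 8.0), mean_value_query(data, 8.0, radius))
```

In this toy setting the predicted answers should track the exact ones closely away from the breakpoint, while queries whose range straddles the breakpoint mix both linear regimes and introduce a small bias into each fitted line.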
