Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality

Anagnostopoulos Christos; Triantafillou Peter

Abstract

Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces defined by query selections over datasets. This is crucial for data analysts engaged in, e.g., interactive data subspace exploration, data subspace visualization, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too expensive, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain and maintain, or (iii) infeasible, e.g., because of privacy constraints. We contribute a novel, query-driven function estimation model of analyst-defined data subspace cardinality. The proposed model is highly accurate in its predictions and accommodates two well-known types of selection query: multi-dimensional range queries and distance-near-neighbors (radius) queries. Our function estimation model (i) quantizes the vectorial query space by learning the analysts' access patterns over a data space, (ii) associates query vectors with the cardinalities of the corresponding analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, which facilitates the scaling-out of such predictive analytics queries. The research significance of the model is that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it outperforms data-driven approaches.
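
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea in steps (i) to (iii), not the authors' model. It assumes radius queries encoded as vectors (center coordinates followed by the radius), a simple online quantizer that spawns a new prototype when a query is farther than a threshold from all existing ones, and inverse-squared-distance weighting for prediction. The class name, the `spawn_distance` and `learning_rate` parameters, and the weighting scheme are all hypothetical choices made for illustration. Step (iv), change detection via optimal stopping, is not covered.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class QueryDrivenCardinalityEstimator:
    """Illustrative sketch only (hypothetical design, not the paper's model).

    Learns from (query vector, observed cardinality) pairs and never
    touches the underlying data.
    """

    def __init__(self, spawn_distance, learning_rate=0.1):
        self.spawn_distance = spawn_distance  # assumed quantization threshold
        self.learning_rate = learning_rate    # assumed step size for updates
        self.prototypes = []      # quantized query vectors
        self.cardinalities = []   # cardinality estimate tied to each prototype

    def observe(self, query, cardinality):
        """Steps (i) and (ii): quantize the query and associate its cardinality."""
        query = [float(v) for v in query]
        if self.prototypes:
            dists = [euclidean(p, query) for p in self.prototypes]
            i = min(range(len(dists)), key=dists.__getitem__)
            if dists[i] <= self.spawn_distance:
                # Move the winning prototype and its cardinality toward the sample.
                lr = self.learning_rate
                self.prototypes[i] = [
                    p + lr * (q - p) for p, q in zip(self.prototypes[i], query)
                ]
                self.cardinalities[i] += lr * (cardinality - self.cardinalities[i])
                return
        # No prototype is close enough: start a new one at this query.
        self.prototypes.append(query)
        self.cardinalities.append(float(cardinality))

    def predict(self, query):
        """Step (iii): similarity-weighted prediction for an unseen query."""
        if not self.prototypes:
            raise ValueError("no queries observed yet")
        weights = []
        for p, c in zip(self.prototypes, self.cardinalities):
            d = euclidean(p, query)
            if d == 0.0:
                return c  # exact match with a prototype
            weights.append(1.0 / (d * d))
        total = sum(weights)
        return sum(w * c for w, c in zip(weights, self.cardinalities)) / total


# Usage: two past radius queries (x, y, radius) with their observed counts.
estimator = QueryDrivenCardinalityEstimator(spawn_distance=0.5)
estimator.observe((0.0, 0.0, 1.0), 10)
estimator.observe((5.0, 5.0, 1.0), 100)
midpoint_estimate = estimator.predict((2.5, 2.5, 1.0))
```

Because the estimator only stores prototypes and their associated counts, it needs no access to the data or to data-side statistics at prediction time, which is the property the abstract highlights for costly, unreliable, or privacy-restricted settings.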

Bibliographic record

Source: ACM Transactions on Knowledge Discovery from Data, 2017, Issue 4, pp. 47.1-47.46 (46 pages)
Authors: Anagnostopoulos Christos; Triantafillou Peter
Author affiliations:

Univ Glasgow, Sch Comp Sci, Sir Alwyn Williams Bldg, Glasgow G12 8RZ, Lanark, Scotland;

Univ Glasgow, Sch Comp Sci, Sir Alwyn Williams Bldg, Glasgow G12 8RZ, Lanark, Scotland;

Original format: PDF
Language: English
Keywords:
Predictive analytics; predictive learning; data subspace exploration; analytics selection queries; vector regression quantization; optimal stopping theory;
