Journal: Data & Knowledge Engineering

SubspaceDB: In-database subspace clustering for analytical query processing

Abstract

High-dimensional data analysis within relational database management systems (RDBMS) is challenging because SQL offers inadequate support for it. Currently, subspace clustering of high-dimensional data is implemented either outside the DBMS using wrapper code or inside the DBMS using SQL User-Defined Functions/Aggregates (UDFs/UDAs). However, both approaches have potential disadvantages in performance, resource usage, and security for voluminous and frequently updated data. Hence, we propose an efficient querying system, named SubspaceDB, that implements subspace clustering directly within an RDBMS. SubspaceDB provides a novel set of query operators, each with an optimization objective, to facilitate interactive analysis for subspace clustering. The query operators focus on retrieving optimal answers to four key query types that aid the formation of subspace clusters: (a) medoid queries, (b) neighbourhood queries, (c) partial similarity queries, and (d) prominence queries. Experimental studies on real and synthetic databases of 15M tuples and 104 attributes show that SubspaceDB can be over 10 times faster than a conventional wrapper-based or SQL UDF approach. The proposed approach is also efficient when retrieving at least 50% of the data, with a performance improvement of at least 25%.
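To make the first two query types concrete, here is a minimal Python sketch of a medoid query and a neighbourhood query restricted to a subspace (a chosen subset of attributes). It uses the generic textbook definitions of these terms and is not SubspaceDB's operators, which the abstract does not specify. The function names, the Euclidean distance, the `eps` radius, and the sample rows are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def project(row, dims):
    """Keep only the attributes (by index) that define the subspace."""
    return tuple(row[d] for d in dims)

def medoid(rows, dims):
    """Return the row with the smallest summed distance to all rows,
    where distance is measured only over the subspace attributes."""
    pts = [project(r, dims) for r in rows]
    costs = [sum(dist(p, q) for q in pts) for p in pts]
    return rows[costs.index(min(costs))]

def neighbourhood(rows, centre, dims, eps):
    """Return every row within distance eps of centre in the subspace."""
    c = project(centre, dims)
    return [r for r in rows if dist(project(r, dims), c) <= eps]

# Hypothetical data: the first two attributes form a tight group,
# while the third attribute is unrelated noise.
rows = [
    (1.0, 1.0, 90.0),
    (1.2, 0.9, 10.0),
    (0.9, 1.1, 50.0),
    (8.0, 8.0, 49.0),
]
subspace = (0, 1)

m = medoid(rows, subspace)
near = neighbourhood(rows, m, subspace, eps=0.5)
```

In this toy data the first three rows are neighbours of the medoid in the subspace `(0, 1)`, but not in the full attribute space, where the third attribute pushes them far apart. Clusters of this kind, visible only in a subset of attributes, are what subspace clustering looks for.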
