Conference: ACM SIGMOD international conference on Management of data
Data densification in a relational database system

Abstract

Data in a relational data warehouse is usually sparse. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. Densities of 0.1-2% are very common. However, users may want to view the data in a dense form, with rows for all combinations of dimension values displayed even when no fact data exists for them. For example, if a product did not sell during a particular time period, users may still want to see the product for that time period with a zero sales value next to it. Moreover, analytic window functions [1] and the SQL model clause [2] can more easily express time series calculations if data is dense along the time dimension, because dense data will fill a consistent number of rows for each period.

Data densification is the process of converting sparse data into dense form. The current SQL technique for densification (using the combination of DISTINCT, CROSS JOIN and OUTER JOIN operations) is extremely unintuitive, difficult to express and inefficient to compute. Hence, we propose an extension to the ANSI SQL join operator, referred to as "PARTITIONED OUTER JOIN", which allows for a succinct expression of densification along the dimensions of interest. We also present various algorithms to evaluate the new join operator efficiently and compare it with existing methods of doing the equivalent operation. We also define a new window function "LAST_VALUE (IGNORE NULLS)" which is very useful with partitioned outer join.
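The existing technique the abstract criticizes (DISTINCT, CROSS JOIN and OUTER JOIN combined) can be illustrated with a small sketch. The example below is not taken from the paper: the `sales` and `times` tables, their columns, the sample rows, and the helper `densify_sales` are all hypothetical, and SQLite is used only because it ships with Python. It builds every product/period combination and then outer-joins the sparse facts onto it, filling missing sales with zero.

```python
import sqlite3


def densify_sales(rows, periods):
    """Return one (product, period, amount) row per product/period pair.

    Hypothetical helper: `rows` are sparse fact rows, `periods` is the full
    time dimension. Missing combinations come back with amount 0.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, period INTEGER, amount REAL)")
    conn.execute("CREATE TABLE times (period INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO times VALUES (?)", [(p,) for p in periods])

    query = """
        SELECT d.product, d.period, COALESCE(s.amount, 0) AS amount
        FROM (
            -- Step 1: DISTINCT products, CROSS JOINed with every period,
            -- give the dense grid of dimension combinations.
            SELECT p.product, t.period
            FROM (SELECT DISTINCT product FROM sales) AS p
            CROSS JOIN times AS t
        ) AS d
        -- Step 2: OUTER JOIN the sparse facts onto the dense grid.
        LEFT OUTER JOIN sales AS s
            ON s.product = d.product AND s.period = d.period
        ORDER BY d.product, d.period
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Assumed sample data: 3 fact rows out of 6 possible product/period pairs.
SPARSE_SALES = [("apple", 1, 10.0), ("apple", 3, 7.0), ("pear", 2, 4.0)]
dense = densify_sales(SPARSE_SALES, periods=[1, 2, 3])
for row in dense:
    print(row)
```

Even in this two-dimension case the query needs a nested subquery and reads the fact table twice, which gives a sense of why the abstract calls the approach unintuitive and inefficient. The proposed PARTITIONED OUTER JOIN is not available in SQLite, so it is not shown here.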