Method and apparatus for optimizing and structuring data by designing a cube forest data structure for hierarchically split cube forest template

Abstract

The paradigmatic view of data in typical decision support applications divides the attributes (or fields) in the data records into two groups: dimensional attributes and value attributes. The dimensional attributes classify the record, while the value attributes indicate a measured quantity. The dimensional attributes can be partitioned into a set of dimensions, which are orthogonal descriptions of the record. The attributes within a dimension form hierarchies of descriptions of the record, ranging from a coarse to a fine description.

For example, the database might consist of records of retail sales collected from individual stores and brought together into a central data warehouse. This database might have three dimensions: store location, product, and time of sale. The value attribute might be the dollar value of the sale. A dimension might contain several attributes. For example, the store location dimension might consist of country, region, state, county, and zip code. These attributes form a hierarchy because knowing the value of a fine attribute (e.g., zip code) tells you the value of a coarse attribute (e.g., country). The attributes in the time dimension might be year, month, week, day, and hour. This dimension has multiple hierarchies because months do not contain an integral number of weeks.

A large class of decision support queries ask for the aggregate value of one or more value attributes, where the aggregation ranges over all records whose dimensional attributes satisfy a selection predicate. For example, a query might be to find the sum of all sales of blue polo shirts in Palm Beach during the last quarter. A data table that can be described in terms of dimensions and value attributes is often called a "data cube." The records in our retail sales example can be imagined to exist in a three-dimensional cube, whose axes are the three dimensions. Queries, such as the example query, can be thought of as corresponding to sums over regions of the data cube.

We describe herein a file structure (i.e., the Cube Forest) for storing a data cube that ensures fast response to the queries. The algorithms included herein are: (1) algorithms to load data into a cube forest; (2) algorithms to obtain an aggregate from the cube forest in response to a query; and (3) algorithms that compute an optimal cube forest structure.
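The kind of query described in the abstract can be illustrated with a minimal Python sketch. It is not the Cube Forest structure itself, which the abstract does not specify. It only shows (a) an aggregate query as a sum over the records that satisfy a selection predicate, and (b) a simple pre-aggregation along one hierarchy, one plausible reason precomputed aggregates can answer such queries quickly. The record fields, the sample data, and the function names `aggregate_sum` and `build_rollup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records. "state", "city", "product", "color" and
# "quarter" are dimensional attributes; "amount" is the value attribute.
SALES = [
    {"state": "FL", "city": "Palm Beach", "product": "polo shirt",
     "color": "blue", "quarter": "1997Q2", "amount": 40.0},
    {"state": "FL", "city": "Palm Beach", "product": "polo shirt",
     "color": "blue", "quarter": "1997Q2", "amount": 60.0},
    {"state": "FL", "city": "Palm Beach", "product": "polo shirt",
     "color": "red", "quarter": "1997Q2", "amount": 25.0},
    {"state": "FL", "city": "Miami", "product": "polo shirt",
     "color": "blue", "quarter": "1997Q2", "amount": 30.0},
    {"state": "FL", "city": "Palm Beach", "product": "polo shirt",
     "color": "blue", "quarter": "1997Q1", "amount": 15.0},
]


def aggregate_sum(records, predicate):
    """Sum the value attribute over all records matching the predicate.

    This is a full scan: every query touches every record.
    """
    return sum(r["amount"] for r in records if predicate(r))


def build_rollup(records, hierarchy):
    """Precompute sums for every prefix of a coarse-to-fine hierarchy.

    For hierarchy ("state", "city") the result holds one total for (),
    one per state, and one per (state, city) pair. This is an assumed
    illustration of pre-aggregation, not the patent's file structure.
    """
    sums = defaultdict(float)
    for r in records:
        key = tuple(r[attr] for attr in hierarchy)
        for depth in range(len(hierarchy) + 1):
            sums[key[:depth]] += r["amount"]
    return dict(sums)


# The example query: blue polo shirts in Palm Beach during one quarter.
blue_polo_total = aggregate_sum(
    SALES,
    lambda r: (r["city"] == "Palm Beach" and r["product"] == "polo shirt"
               and r["color"] == "blue" and r["quarter"] == "1997Q2"),
)

# Pre-aggregated totals along the store location hierarchy.
location_rollup = build_rollup(SALES, ("state", "city"))
```

With the sample data, `blue_polo_total` is 100.0 and `location_rollup[("FL", "Palm Beach")]` is 140.0. The rollup answers any query on a prefix of the hierarchy with a single lookup, at the cost of extra storage and load-time work, which is the general trade-off that a structure-optimizing algorithm would have to balance.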

Bibliographic details

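  • Publication number: US6141655A

  • Patent type: (not listed)

  • Publication date: 2000-10-31

  • Original format: PDF

  • Applicant/assignee: AT&T CORP

  • Application number: US19970936000

  • Inventors: THEODORE JOHNSON; DENNIS SHASHA

  • Filing date: 1997-09-23

  • Classification: G06F17/30

  • Country: US

  • Added to database: 2022-08-22 01:35:46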
