A data warehouse is an integrated database primarily used in organizational decision making. Although the deployment of data warehouses is current practice in modern information technology landscapes, the methodical schema design for such databases has only been studied cursorily. In this paper we pursue schema design for data warehouses in the spirit of classical database design, organized as a sequence of requirement analysis and specification to collect user requirements, conceptual design to model the data warehouse as a multidimensional database independently from implementation issues, logical design to transform the conceptual data warehouse schema into the target data model, and physical design to maximize performance with respect to a target database system In short, from a conceptual point of view a data warehouse is a multidimensional database, and fact schemata, such as the one shown in Figure 1, represent such databases conceptually. In Figure 1, we have a fact schema Account from the banking domain, where measures Balance, BalanceClass, and NoOfTransactions are shown in a two dimensional content of dimensions Time and Account Each dimension is specified by means of a lattice of dimension levels, whose bottom element is called terminal dimension level (here: Day, AccID). Importantly, domains of optional dimension levels may contain inapplicable null values, and context dependencies specify contexts of validity for optional dimension levels (e.g., Age is applicable to private customers, whereas Legal Form is applicable to capital companies, and annotations in the schema indicate these facts). Given a fact schema F, we call the attributes occurring in F the universe of F, denoted by UF, and we note that there is a set of functional dependencies over UF, called the functional dependencies implied by F, denoted by FDF (e.g., the set of terminal dimension levels functionally determines the set of measures, and each edge in a dimension lattice indicates a functional dependency).
展开▼