<<

. 100
( 137 .)



>>

instance, OLAP is a good tool for visualizing trends over time, such as for sales
or financial data. A specific date in this case potentially represents information
along several dimensions, as shown in Figure 15.7:
Day of the week
––

Month
––


Quarter
––


Calendar year
––


One approach is to represent each of these as a different dimension. In other
words, there would be four dimensions, one for the day of the week, one for
the month, one for the quarter, and one for the calendar year. The data for Jan­
uary 2004, then would be the subcube where the January dimension intersects
the 2004 dimension.
This is not a good approach. Multidimensional modeling recognizes that
time is an important dimension, and that time can have many different attrib­
utes. In addition to the attributes described above, there is also the week of the
year, whether the date is a holiday, whether the date is a work day, and so on.
Such attributes are stored in reference tables, called dimension tables. Dimen­
sion tables make it possible to change the attributes of the dimension without
changing the underlying data.




Team-Fly®
Data Warehousing, OLAP, and Data Mining 503


Date
(7 March 1997)




Day of the Day of the Day of the
Month
Week Month Year
(Mar)
(Friday) (7) (67)




Year
(1997)


Figure 15.7 There are multiple hierarchies for dates.



WA R N I N G Do not take shortcuts when designing the dimensions for an
OLAP system. These are the skeleton of the data mart, and a weak skeleton will
not last very long.

Dimension tables contain many different attributes describing each value of
the dimension. For instance, a detailed geography dimension might be built
from zip codes and include dozens of summary variables about the zip codes.
These attributes can be used for filtering (“How many customers are in high-
income areas?”). These values are stored in the dimension table rather than the
fact table, because they cannot be aggregated correctly. If there are three stores
in a zip code, a zip code population fact would get added up three times”
multiplying the population by three.
Usually, dimension tables are kept up to date with the most recent values for
the dimension. So, a store dimension might include the current set of stores
with information about the stores, such as layout, square footage, address, and
manager name. However, all of these may change over time. Such dimensions
are called slowly changing dimensions, and are of particular interest to data
mining because data mining wants to reconstruct accurate histories. Slowly
changing dimensions are outside the scope of this book. Interested readers
should review Ralph Kimball™s books.
504 Chapter 15


Conformed Dimensions
As mentioned earlier, data warehouse systems often contain multiple OLAP
cubes. Some of the power of OLAP arises from the practice of sharing dimen­
sions across different cubes. These shared dimensions are called conformed
dimensions and are shown in Figure 15-8; they help ensure that business
results reported through different systems use the same underlying set of busi­
ness rules.


Merchandizing
View
Shop




Marketing
View t
uc
d
ro
P
Weeks
Customer




Finance
View
Region

t
uc
od t­
r
ar
P
ep ent
Days
Dm
Weeks




Different users have different views of
the data, but they often share
dimensions.

The hierarchy for the time dimension
time
needs to cover days, weeks, months,
and quarters.

The hierarchy for region starts at the
shop
shop level and then includes
metropolitan areas and states.

product The hierarchy for product includes the
department.

The hierarchy for the customer might
customer include households.
Figure 15.8 Different views of the data often share common dimensions. Finding the
common dimensions and their base units is critical to making data warehousing work well
across an organization.
Data Warehousing, OLAP, and Data Mining 505


A good example of a conformed dimension is the calendar dimension,
which keeps track of the attributes of each day. A calendar dimension is so
important that it should be a part of every data warehouse. However, different
components of the warehouse may need different attributes. For instance, a
multinational business might include sets of holidays for different countries,
so there might be a flag for “United States Holiday,” “United Kingdom
Holiday,” “French Holiday,” and so on, instead of an overall holiday flag.
January 1st is a holiday in most countries; however, July 4th is mostly celebrated
in the United States.
One of the challenges in building OLAP systems is designing the conformed
dimensions so that they are suitable for a wide variety of applications. For
some purposes geography might be best described by city and state; for
another, by county; for another, by census block group; and for another by zip
code. Unfortunately, these four descriptions are not fully compatible, since
there can be several small towns in a zip code, and there are five counties in
New York City. Multidimensional modeling helps resolve such conflicts.


Star Schema
Cubes are easily stored in relational databases, using a denormalized data
structure called the star schema, developed by Ralph Kimball, a guru of OLAP.
One advantage of the star schema is its use of standard database technology to
achieve the power of OLAP.
A star schema starts with a central fact table that corresponds to facts about a
business. These can be at the transaction level (for an event cube), although
they are more often low-level summaries of transactions. For retail sales, the
central fact table might contain daily summaries of sales for each product in
each store (shop-SKU-time). For a credit card company, a fact table might con­
tain rows for each transaction by each customer or summaries of spending by
product (based on card type and credit limit), customer segment, merchant
type, customer geography, and month. For a diesel engine manufacturer inter­
ested in repair histories, it might contain each repair made on each engine or a
daily summary of repairs at each shop by type of repair.
Each row in the central fact table contains some combination of keys that
makes it unique. These keys are called dimensions. The central fact table also
has other columns that typically contain numeric information specific to each
row, such as the amount of the transaction, the number of transactions, and so
on. Associated with each dimension are auxiliary tables called dimension tables,
which contain information specific to the dimensions. For instance, the dimen­
sion table for date might specify the day of the week for a particular date, its
month, year, and whether it is a holiday.
506 Chapter 15


In diagrams, the dimension tables are connected to the central fact table,
resulting in a shape that resembles a star, as shown in Figure 15.9.


Dept Description
01 CORE FRAGRANCE

02 MISCELLANEOUS

05 GARDENS

06 BRIDAL

10 ACCESSORIES



SKU Description Dept Color Description
0001 V NECK TEE 70 01 BLACK

0002 PANTYHOSE 65 02 IVORY

0003 TUXEDO PJ 60 03 TAYLOR GREEN

0004 NOVELTY T SHIRT 70 04 STILETTO

0005 VELOUR JUMPSUIT 76 05 BLUE TOPAZ




Shop SKU Color Date Count Sales Cost Returns
0001 0001 01 000001 5 $50 $20 0

0001 0002 02 000001 12 $240 $96 0

0001 0002 03 000001 4 $80 $32 1

0001 0002 04 000001 12 $240 $96 0
0001 0003 09 000001 19 $85 $19 2

0001 0003 01 000001 5 $25 $5 0

0150 0001 01 000001 31 $310 $134 2



Date Year Month Day
000001 1997 01 01

Shop Reg State City Sq Ft 000002 1997 01 01

0001 J CA San Francisco 3,141 000003 1997 01 01

0007 A MA Central Boston 1,026 000004 1997 01 01

0034 E FL Miami 5,009 000005 1997 01 01

0124 H MN Minneapolis 1,793

Date Hol? Date DoW
0150 B NY New York City 6,400

000001 Y 000001 Wed

000002 N 000002 Thu

Reg Name 000003 N 000003 Fri

A Northeast 000004 N 000004 Sat

<<

. 100
( 137 .)



>>