Link to home
Start Free TrialLog in
Avatar of SybaseUk
SybaseUk

asked on

Question on datawarehousing with Sybase IQ

Hi,

Could you kindly explain what 'Facts' and 'Dimensions' are in relation to data warehousing?

I've come across these terms but i'm trying to understand what they mean, i'm totally lost to this concept...

THANKS
Avatar of grant300
grant300

Facts and Dimensions are part of what is generally called the Dimensional Model for data warehousing.  You sometimes here it referred to as a "Star Schema" with the fact being the sun and the dimensions being planets.  The father and prime formalizer of this school is Ralph Kimball.  If you are involved with designing and or maintaining a data warehouse, you should have a copy of the seminal work, The Data Warehouse Toolkit.

Facts are generally irreducible pieces of data.  The dimensions are the normalizable and sometimes hierarchical details associated with the fact.  The "granularity" of the fact is defined by the dimensions that identify it.

As an example, let's take a typical retail star schema for a grocery store.  Lets further assume that the fact is each sale of each individual item.  So your dimensions might be:
 - Period : there is always a hierarchical time dimension so that you can browse and group by certain periods, e.g. portion of the day, day of the week, weekly, monthly, holidays, etc.  This dimension should have every time/calendar specific piece of information like holidays (they sell a lot of food then).  Even though you may have the date and time to the milisecond, your period dimension probably has a granularity of say quarter hours.
 - Store : assuming this is a chain and not just one store, you will want a dimension that identifies each.  The dimension should probably have geographic and organizational hierarchies, e.g. city, state, zip code, MMA, as well as zone and region that the company uses to organize stores.  A more sophisticated model might also include information about operating hours for example so that you can adjust reports to take that into account.
 - Product : this will have a row for each possible SKU the store handles.  You will track manufacturer, size, may be case size; essentially every static piece of information about that product.
 - Customer : if you have ever wondered why the grocery stores are so hot to get you to apply for a loyalty card, now you know.  The loyalty card is the best way for them to build a customer dimension, track demographics and do householding.  That is why they as things like your family income and whether you rent or own a home.  Even if you don't have a loyalty card, when you use a credit or debit card, they can attach a unique identifier that they have a high probability of seeing again the next time you purchase something.

So now your fact table has 4 surrogate keys (Period_ID, Store_ID, SKU, and Customer_ID), some "facts" like price, quantity, margin (if it is a really smart system), information about any coupon that was used, etc, and some details.  The details are generally non-numeric items (non-additive facts) that you still might want to know like the register ID, clerk ID, etc.

When you are loading data into the data warehouse, you have to make sure the dimensions are taken care of first since you will need the surrocate keys to load the fact.  In this example, Period and Store are pretty stable and are pre-loaded with known values.  Most of the Product table will be static and preloaded however, you have to be able to handle new and regional items on the fly.  These including things like microbrews that are purchased and handled by only a handful of stores.  Even if you have to go back later and fill in the details for the SKU, you have to create a dimension record and the surrogate key.  Customer will be a much more dynamic dimension with new identifiers all the time.  You may also have group customer records pre-populated to deal with those that can not be identified.  You can group them by method of payment, e.g. check, cash, food stamps, other, etc. and perhaps total sale buckets, e.g. $0-$5, $5-$10, $10-$20, $20-$50, $50-$75, $75-$100, $100-$150, $150-----etc.

I could go on and write a book, or at least a few chapters, but this gives you the idea.  As I said before, the real way to understand this technique in depth is to get a copy of "The Data Warehouse Toolkit".  It is an easy read, and, unlike many technical books, is organized buy the type of business problem you are trying to solve more than the subsection of the technology.  Learning with practical examples you can actually identify with is much easier.

Regards,
Bill
Avatar of SybaseUk

ASKER

Thanks for that Bill!

Where can i get 'The Data Warehouse Toolkit' from? Is this free or is there a charge?

On a sidenote, going back to the question you answered earlier.....Is it fair to say that Sybase IQ's vertical storage is only good for single column "Selects" because it can pin-point only the necessary column/page as opposed to "Select All" columns??

Thanks again!
ASKER CERTIFIED SOLUTION
Avatar of grant300
grant300

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial