Question on datawarehousing with Sybase IQ

Posted on 2009-02-19
Last Modified: 2012-05-06

Could you kindly explain what 'Facts' and 'Dimensions' are in relation to data warehousing?

I've come across these terms but i'm trying to understand what they mean, i'm totally lost to this concept...

Question by:SybaseUk
    LVL 19

    Expert Comment

    Facts and Dimensions are part of what is generally called the Dimensional Model for data warehousing.  You sometimes here it referred to as a "Star Schema" with the fact being the sun and the dimensions being planets.  The father and prime formalizer of this school is Ralph Kimball.  If you are involved with designing and or maintaining a data warehouse, you should have a copy of the seminal work, The Data Warehouse Toolkit.

    Facts are generally irreducible pieces of data.  The dimensions are the normalizable and sometimes hierarchical details associated with the fact.  The "granularity" of the fact is defined by the dimensions that identify it.

    As an example, let's take a typical retail star schema for a grocery store.  Lets further assume that the fact is each sale of each individual item.  So your dimensions might be:
     - Period : there is always a hierarchical time dimension so that you can browse and group by certain periods, e.g. portion of the day, day of the week, weekly, monthly, holidays, etc.  This dimension should have every time/calendar specific piece of information like holidays (they sell a lot of food then).  Even though you may have the date and time to the milisecond, your period dimension probably has a granularity of say quarter hours.
     - Store : assuming this is a chain and not just one store, you will want a dimension that identifies each.  The dimension should probably have geographic and organizational hierarchies, e.g. city, state, zip code, MMA, as well as zone and region that the company uses to organize stores.  A more sophisticated model might also include information about operating hours for example so that you can adjust reports to take that into account.
     - Product : this will have a row for each possible SKU the store handles.  You will track manufacturer, size, may be case size; essentially every static piece of information about that product.
     - Customer : if you have ever wondered why the grocery stores are so hot to get you to apply for a loyalty card, now you know.  The loyalty card is the best way for them to build a customer dimension, track demographics and do householding.  That is why they as things like your family income and whether you rent or own a home.  Even if you don't have a loyalty card, when you use a credit or debit card, they can attach a unique identifier that they have a high probability of seeing again the next time you purchase something.

    So now your fact table has 4 surrogate keys (Period_ID, Store_ID, SKU, and Customer_ID), some "facts" like price, quantity, margin (if it is a really smart system), information about any coupon that was used, etc, and some details.  The details are generally non-numeric items (non-additive facts) that you still might want to know like the register ID, clerk ID, etc.

    When you are loading data into the data warehouse, you have to make sure the dimensions are taken care of first since you will need the surrocate keys to load the fact.  In this example, Period and Store are pretty stable and are pre-loaded with known values.  Most of the Product table will be static and preloaded however, you have to be able to handle new and regional items on the fly.  These including things like microbrews that are purchased and handled by only a handful of stores.  Even if you have to go back later and fill in the details for the SKU, you have to create a dimension record and the surrogate key.  Customer will be a much more dynamic dimension with new identifiers all the time.  You may also have group customer records pre-populated to deal with those that can not be identified.  You can group them by method of payment, e.g. check, cash, food stamps, other, etc. and perhaps total sale buckets, e.g. $0-$5, $5-$10, $10-$20, $20-$50, $50-$75, $75-$100, $100-$150, $150-----etc.

    I could go on and write a book, or at least a few chapters, but this gives you the idea.  As I said before, the real way to understand this technique in depth is to get a copy of "The Data Warehouse Toolkit".  It is an easy read, and, unlike many technical books, is organized buy the type of business problem you are trying to solve more than the subsection of the technology.  Learning with practical examples you can actually identify with is much easier.


    Author Comment

    Thanks for that Bill!

    Where can i get 'The Data Warehouse Toolkit' from? Is this free or is there a charge?

    On a sidenote, going back to the question you answered earlier.....Is it fair to say that Sybase IQ's vertical storage is only good for single column "Selects" because it can pin-point only the necessary column/page as opposed to "Select All" columns??

    Thanks again!
    LVL 19

    Accepted Solution

    "The Data Warehouse Toolkit" is available from and maybe stocked at your local book store.

    No, that is not fair to say.  You are taking two extremes; select * versus select MyColumn, neither of which is very realistic, particularly in a data warehouse and analytic queries.

    You are missing the purpose of a data warehouse.  It is not designed so you can pick out one complete record at a time.  That is just a simple repository and usually pretty worthless for doing analytics.

    A data warehouse lets you answer questions about the data, not simply regurgitate rows.  It is pointless to structure a query against our fictitious grocery store sales DW in the form...

    select *
      from Period, Product, Customer, Store, Fact
     where Fact.PID = Period.PID
        and Fact.SKU = Product.SKU
        and Fact.CustID = Customer.CustID
        and Fact.StoreID = Store.StoreID

    Even if you add clauses to reduce the amount of data, what you have is still just a pile of raw data and no value beyond having a flat file.  You can always go back to the POS system if you want to see an individual record, e.g. to handle a single customer complaint about an item.

    The power of the data warehouse and OLAP (On Line Analytic Processing) is that you can quickly characterize the data and discover trends and patterns in mountains of data you could never manipulate one record at a time.  A famous data mining case study for a convenience store chain discovered there was a correlation between late evening beer and displosable diaper sales.  It turns out that wives send their husbands out to get diapers when they run out and, kind of as a consolation prize, the husbands pick up some beer while they are there.  The chain capitolized on this by putting premium beer on the end panels of the isles with the diapers.

    The point is that nobody gets smarter by looking at individual rows of data.  It was the correlation that mattered.  Likewise, if you have a trend of customer complaints, you might look for a pattern to the geography, product(s), or cuatomer demographics.  Again, SELECT * is not going to be appropriate, much less helpful.


    Featured Post

    Why You Should Analyze Threat Actor TTPs

    After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

    Join & Write a Comment

    Suggested Solutions

    It can often be challenging to stay relevant in the rapidly evolving world of technology. This can make recruiting talent difficult for companies of all sizes.
    Outlook Free & Paid Tools
    Need more eyes on your posted question? Go ahead and follow the quick steps in this video to learn how to Request Attention to your question. *Log into your Experts Exchange account *Find the question you want to Request Attention for *Go to the e…
    Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

    733 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    23 Experts available now in Live!

    Get 1:1 Help Now