Link to home
Start Free TrialLog in
Avatar of johnftamburo
johnftamburo

asked on

Demographic: Dimension or fact

I have a fairly simple star.  7 dimensions that are easily set up.  Fact table with various dates and counts.  It contains customer ID  Customer is a dimension with primary key customer ID.

There is a demographic table.  I have denormalized it to this:
ID                  customer id  Demo                          Demo Value
46965000      274701      Employee Size      250 to 499
46355000      274701      Industry      Computer Manufacturer
45745000      274701      Job Title      Corporate Management
49145000      274701      Opt-out      Do Not Promote
43880000      274701      Original Product      Brand A Seminar
43880001      274701      Original Product      Brand B Print
43880002      274701      Original Product      Brand B Seminar
43880003      274701      Original Product      Brand C Print
43880004      274701      Original Product      Brand C Newsletter
47575000      274701      Purchasing Budget      less than $5,000,000
48185000      274701      Purchasing Scope      Technical Decision Maker
43270000      274701      Status      Incomplete
46965001      274702      Employee Size      250 to 499
46355001      274702      Industry      Financial/Banking/Insurance/Real Estate/Legal
45745001      274702      Job Title      Sales/Marketing
49145001      274702      Opt-out      Newsletter A Opt-out
43880005      274702      Original Product      Brand C Tradeshow
43880006      274702      Original Product      Brand C Seminar
43880007      274702      Original Product      Brand C Newsletter
47575001      274702      Purchasing Budget      $10,000,000 to $24,999,999
48185001      274702      Purchasing Scope      Technical Decision Maker
48185002      274702      Purchasing Scope      Financial Decision Maker
43270001      274702      Status      Reserve
46965002      274703      Employee Size      5,000 to 9,999
46355002      274703      Industry      Travel/Hospitality/Recreation
45745002      274703      Job Title      Purchasing/Buyer
43880008      274703      Original Product      Brand A Tradeshow
43880009      274703      Original Product      Brand A Newsletter
47575002      274703      Purchasing Budget      $150,000,000+
48185003      274703      Purchasing Scope      Evaluate Products/Services
43270002      274703      Status      Reserve
46965003      274704      Employee Size      500 to 999
46355003      274704      Industry      Transportation
45745003      274704      Job Title      Operating Management
49145002      274704      Opt-out      Do Not Promote
43880010      274704      Original Product      Brand A Print
43880011      274704      Original Product      Brand A Seminar
43880012      274704      Original Product      Brand B Print
43880013      274704      Original Product      Brand B Newsletter
47575003      274704      Purchasing Budget      $5,000,000 to $9,999,999
48185004      274704      Purchasing Scope      Technical Decision Maker

Each customer can have an infinite number of demographics.  Many already have 1500+ defined.

Management wants to drill into demographics and see the facts for it.  Normally one would spread the demographics into columns but there are too many demographics to do so.  But we run into column limitations.

Is there a way to define this as a fact table or something that would allow my managers to slice and dice the other fact table as if this demographic table were a Dimension?

I wish there were 500,000 points for this one!
Avatar of RWrigley
RWrigley
Flag of Canada image

Me, I'd just define the demographic data as an attribute.  Then you can use it to "Filter" the results using the MDX filter function.  But it you want a junk dimension for it, your best bet is to define a alphabet tree on top of it.  Create a calculated field int he database that breaks out the first letter of the demographic (and if there's a lot, maybe the second letter too), and use those as the first and second level of your hierarchy.  In the final cube, they'll be able to drill down on the letters to the demographic they want.
Avatar of nmcdermaid
nmcdermaid

As you have correctly identified, the root of the problem is that you have a 'fan trap' - a Many > One > Many join going on, which in turns results in double counting facts (when a customer is in more than one demographic)
The only bits of advice I can give are:
1. How do they want to see a single fact figure against two demographics - split, weighted, constant?
2. Cognos 8 (and possible other BI tools) automatically identifies these kind of joins and stops double counting occuring
Different BI toolshandle this in different ways, can you tell us whether you are using ROLAP or OLAP, and which BI platform you are using? I can give advice for Cognos but unfortunately not for Analysis Services.
Avatar of johnftamburo

ASKER

We're using SSAS 2005 cube.

They would like to do things like count customers in transportation industry with company size 1000-4999, or to see how many emails were clicked by customers in that delta (stored in the other of the two fact tables).

I beleive having more than one fact table feeding a cube is a new feature in 2005. But I don't know enough to give you any more advice except to say that I suggest you research how SSAS 2005 handles these kinds of fan trap (or many to many) joins when using more than one fact table.
Even though the demographics dimension is not a fact table, it is treated that way from a double counting perspective.
Sorry I can't help further. If I had time I would sit down and read some doco. I am interested in the outcome though.
It appears to me that the answer lies in the tool recognising and allowing for this situation. As you have stated its not practical to pivot these demographics into an attribute.
Fro a demo I pivoted everything.  The demo was a smash, to my great chagrin, since I now must resolve the fan trap! (:-S)

I will research and continue to look; any advice would be truly appreciated.
Glad you demo is a smash. Its amazing to think that some users still haven't come accross this stuff, they're stuck in Excel land - crazy!
I suggest you thoroughly read any information you can find in SSAS doco about multiple fact tables feeding a cube (even though this is a dimension table)
I have more research.  The more I review this the more I am convinced that I may have managed to combine a fantrap and a chasm trap!!!  

Fantrap:  The customer number can have 1...n demographics, which in turn can have 1...n answers per demographic.  Customer_id is on fact table but is also a dimension.

Chasm Trap: in SSAS you cannot have a dimension linked to a fact except by dimension's primary key (duh).  So Customer dimension is linked to fact, and demographic is linked to customer.  customer = 1...n facts and customer = 1...n demographic/answer combinations.

There is no way, for example, to drill into the facts as follows:  Drill into corporate management and then into employee size.  So if they want the aggregations of facts for corporate management for 1-50 and 51-100 employees, it cannot be done in SSAS using this configuration.

I still cannot figure out how to resolve this dilemma.
Most times the answer to these traps is that you have to 'impose' a particular 'join' upon the data which in turn resolves the many to many join.
In pure SQL, the answer would be to do something like this:
SELECT C.Customer_ID, SUM(R.Revenue)
FROM factRevenue R
INNER JOIN
dimCustomer C
ON R.CustomerID = C.CustomerID
INNER JOIN
(
  SELECT DISTINCT CustomerID FROM dim_Demographic
  WHERE Demo IN (DemoList) AND DemoValue .......
) DU
ON DU.CustomerID = C.CustomerID
GROUP BY C.Customer_ID
...I think that would be something that would stop double counting. I'n not sure if it is exaclty what you're after. However the trick is to get SSAS to work this statement out automatically on the fly.
Cognos can recognise this stuff and do it automatically to a certain extent and I am interested in whether SSAS can too.
Unfortunately I still have no solid answer for you!
Wow, thanks.  You have been helpful.  Very helpful.  I will go find out more.

Thanks!
Are you treating your demographic table as a fact or as a dimension. It looks like it could be used as a "factless" fact table to me. (meaning that the existence of the row is itself the fact).

Make two named queries in the DSV like this:
DimDemographicCategory
SELECT DISTINCT [Demo] AS DemographicCategory FROM [That table above]

DimDemographic
SELECT DISTINCT [Demo Value] AS DemographicValue FROM [That table above]

Build two separate single attribute dimensions from these and join them to [that table above]. Create a calculated column or add a column to the demographic table above that contains only the number 1 cast as an integer and add it as a measure.

You'll be able then to count the number of customers falling into whatever variety of demographic categories and values if that gets you any closer to what you're looking for. You will have to make sure that both dimensions are used in every query for the results to be trustworthy.
If this makes any sense at all go ahead and give it a try if you already haven't had success. Otherwise I can try to clarify more if this seems obscure.  Or maybe I'm just all wet.
ASKER CERTIFIED SOLUTION
Avatar of agandau
agandau

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks; I will look into this right away.
Very interesting read, so it seems SSAS can handle these many to many relationships. I'm be interested to know what actual effect it has - does it actually recognise that it will cause double counting, and stop the double counting occur?

It won't always suppress double counting, but that is expected. A many-to-many implies that double counting is sometimes appropriate.

Looking at the data above, suppose the sales for some time period were:
Customer ID    Sales
274701           100,000
274702           300,000
274703           500,000

The following queries would return as follows.  I'm just using some made up metadata which I hope is clear enough.  Hopefully this formats well or I will try to re-post.

SELECT [Measures].[Sales] on columns, [Customer].[CustomerID] on rows FROM [mycube]

274701           100,000
274702           300,000
274703           500,000

SELECT [Measures].[Sales] on columns, [Demographics].[Category].[Employee Size].Children on rows

250 to 499            400,000
5,000 to 9,999      500,000

SELECT [Measures].[Sales] on columns, [Demographics].[Category].[Original Product].Children on rows
-- this may be what one would consider double counting, but provided you know what you're looking for you can get exactly what you're looking for.

Brand A Newsletter         500,000        
Brand A Seminar             100,000        
Brand A Tradeshow        500,000        
Brand B Print                    100,000        
Brand B Seminar              100,000        
Brand C Newsletter         400,000   (sum of 274701 and 274702)
Brand C Print                    100,000        
Brand C Seminar              300,000        
Brand C Tradeshow        300,000        

SELECT [Measures].[Sales] on columns, CROSSJOIN([Demographics].[Category].[Employee Size].Children,  [Customer].[CustomerID]) on rows

250 to 499           274701           100,000
250 to 499           274702           300,000
5,000 to 9,999     274703           500,000

Does this answer your question?
Wow!

I have used EE for years, but I never knew how much of an awesome asset it could be to OLAP/Warehousing issues!

I will post up in the next couple days after I implement the suggestions in the cited web page.

Thanks! (:-D)
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I am sorry I have not posted back up on this; I intend to find out by Friday how this works.
I am working this this week.  Finally we will have closure.
I wish I could award 1000 points!