johnftamburo
asked on
Demographic: Dimension or fact
I have a fairly simple star. 7 dimensions that are easily set up. Fact table with various dates and counts. It contains customer ID Customer is a dimension with primary key customer ID.
There is a demographic table. I have denormalized it to this:
ID customer id Demo Demo Value
46965000 274701 Employee Size 250 to 499
46355000 274701 Industry Computer Manufacturer
45745000 274701 Job Title Corporate Management
49145000 274701 Opt-out Do Not Promote
43880000 274701 Original Product Brand A Seminar
43880001 274701 Original Product Brand B Print
43880002 274701 Original Product Brand B Seminar
43880003 274701 Original Product Brand C Print
43880004 274701 Original Product Brand C Newsletter
47575000 274701 Purchasing Budget less than $5,000,000
48185000 274701 Purchasing Scope Technical Decision Maker
43270000 274701 Status Incomplete
46965001 274702 Employee Size 250 to 499
46355001 274702 Industry Financial/Banking/Insuranc e/Real Estate/Legal
45745001 274702 Job Title Sales/Marketing
49145001 274702 Opt-out Newsletter A Opt-out
43880005 274702 Original Product Brand C Tradeshow
43880006 274702 Original Product Brand C Seminar
43880007 274702 Original Product Brand C Newsletter
47575001 274702 Purchasing Budget $10,000,000 to $24,999,999
48185001 274702 Purchasing Scope Technical Decision Maker
48185002 274702 Purchasing Scope Financial Decision Maker
43270001 274702 Status Reserve
46965002 274703 Employee Size 5,000 to 9,999
46355002 274703 Industry Travel/Hospitality/Recreat ion
45745002 274703 Job Title Purchasing/Buyer
43880008 274703 Original Product Brand A Tradeshow
43880009 274703 Original Product Brand A Newsletter
47575002 274703 Purchasing Budget $150,000,000+
48185003 274703 Purchasing Scope Evaluate Products/Services
43270002 274703 Status Reserve
46965003 274704 Employee Size 500 to 999
46355003 274704 Industry Transportation
45745003 274704 Job Title Operating Management
49145002 274704 Opt-out Do Not Promote
43880010 274704 Original Product Brand A Print
43880011 274704 Original Product Brand A Seminar
43880012 274704 Original Product Brand B Print
43880013 274704 Original Product Brand B Newsletter
47575003 274704 Purchasing Budget $5,000,000 to $9,999,999
48185004 274704 Purchasing Scope Technical Decision Maker
Each customer can have an infinite number of demographics. Many already have 1500+ defined.
Management wants to drill into demographics and see the facts for it. Normally one would spread the demographics into columns but there are too many demographics to do so. But we run into column limitations.
Is there a way to define this as a fact table or something that would allow my managers to slice and dice the other fact table as if this demographic table were a Dimension?
I wish there were 500,000 points for this one!
There is a demographic table. I have denormalized it to this:
ID customer id Demo Demo Value
46965000 274701 Employee Size 250 to 499
46355000 274701 Industry Computer Manufacturer
45745000 274701 Job Title Corporate Management
49145000 274701 Opt-out Do Not Promote
43880000 274701 Original Product Brand A Seminar
43880001 274701 Original Product Brand B Print
43880002 274701 Original Product Brand B Seminar
43880003 274701 Original Product Brand C Print
43880004 274701 Original Product Brand C Newsletter
47575000 274701 Purchasing Budget less than $5,000,000
48185000 274701 Purchasing Scope Technical Decision Maker
43270000 274701 Status Incomplete
46965001 274702 Employee Size 250 to 499
46355001 274702 Industry Financial/Banking/Insuranc
45745001 274702 Job Title Sales/Marketing
49145001 274702 Opt-out Newsletter A Opt-out
43880005 274702 Original Product Brand C Tradeshow
43880006 274702 Original Product Brand C Seminar
43880007 274702 Original Product Brand C Newsletter
47575001 274702 Purchasing Budget $10,000,000 to $24,999,999
48185001 274702 Purchasing Scope Technical Decision Maker
48185002 274702 Purchasing Scope Financial Decision Maker
43270001 274702 Status Reserve
46965002 274703 Employee Size 5,000 to 9,999
46355002 274703 Industry Travel/Hospitality/Recreat
45745002 274703 Job Title Purchasing/Buyer
43880008 274703 Original Product Brand A Tradeshow
43880009 274703 Original Product Brand A Newsletter
47575002 274703 Purchasing Budget $150,000,000+
48185003 274703 Purchasing Scope Evaluate Products/Services
43270002 274703 Status Reserve
46965003 274704 Employee Size 500 to 999
46355003 274704 Industry Transportation
45745003 274704 Job Title Operating Management
49145002 274704 Opt-out Do Not Promote
43880010 274704 Original Product Brand A Print
43880011 274704 Original Product Brand A Seminar
43880012 274704 Original Product Brand B Print
43880013 274704 Original Product Brand B Newsletter
47575003 274704 Purchasing Budget $5,000,000 to $9,999,999
48185004 274704 Purchasing Scope Technical Decision Maker
Each customer can have an infinite number of demographics. Many already have 1500+ defined.
Management wants to drill into demographics and see the facts for it. Normally one would spread the demographics into columns but there are too many demographics to do so. But we run into column limitations.
Is there a way to define this as a fact table or something that would allow my managers to slice and dice the other fact table as if this demographic table were a Dimension?
I wish there were 500,000 points for this one!
Me, I'd just define the demographic data as an attribute. Then you can use it to "Filter" the results using the MDX filter function. But it you want a junk dimension for it, your best bet is to define a alphabet tree on top of it. Create a calculated field int he database that breaks out the first letter of the demographic (and if there's a lot, maybe the second letter too), and use those as the first and second level of your hierarchy. In the final cube, they'll be able to drill down on the letters to the demographic they want.
As you have correctly identified, the root of the problem is that you have a 'fan trap' - a Many > One > Many join going on, which in turns results in double counting facts (when a customer is in more than one demographic)
The only bits of advice I can give are:
1. How do they want to see a single fact figure against two demographics - split, weighted, constant?
2. Cognos 8 (and possible other BI tools) automatically identifies these kind of joins and stops double counting occuring
Different BI toolshandle this in different ways, can you tell us whether you are using ROLAP or OLAP, and which BI platform you are using? I can give advice for Cognos but unfortunately not for Analysis Services.
The only bits of advice I can give are:
1. How do they want to see a single fact figure against two demographics - split, weighted, constant?
2. Cognos 8 (and possible other BI tools) automatically identifies these kind of joins and stops double counting occuring
Different BI toolshandle this in different ways, can you tell us whether you are using ROLAP or OLAP, and which BI platform you are using? I can give advice for Cognos but unfortunately not for Analysis Services.
ASKER
We're using SSAS 2005 cube.
They would like to do things like count customers in transportation industry with company size 1000-4999, or to see how many emails were clicked by customers in that delta (stored in the other of the two fact tables).
They would like to do things like count customers in transportation industry with company size 1000-4999, or to see how many emails were clicked by customers in that delta (stored in the other of the two fact tables).
I beleive having more than one fact table feeding a cube is a new feature in 2005. But I don't know enough to give you any more advice except to say that I suggest you research how SSAS 2005 handles these kinds of fan trap (or many to many) joins when using more than one fact table.
Even though the demographics dimension is not a fact table, it is treated that way from a double counting perspective.
Sorry I can't help further. If I had time I would sit down and read some doco. I am interested in the outcome though.
It appears to me that the answer lies in the tool recognising and allowing for this situation. As you have stated its not practical to pivot these demographics into an attribute.
Even though the demographics dimension is not a fact table, it is treated that way from a double counting perspective.
Sorry I can't help further. If I had time I would sit down and read some doco. I am interested in the outcome though.
It appears to me that the answer lies in the tool recognising and allowing for this situation. As you have stated its not practical to pivot these demographics into an attribute.
ASKER
Fro a demo I pivoted everything. The demo was a smash, to my great chagrin, since I now must resolve the fan trap! (:-S)
I will research and continue to look; any advice would be truly appreciated.
I will research and continue to look; any advice would be truly appreciated.
Glad you demo is a smash. Its amazing to think that some users still haven't come accross this stuff, they're stuck in Excel land - crazy!
I suggest you thoroughly read any information you can find in SSAS doco about multiple fact tables feeding a cube (even though this is a dimension table)
I suggest you thoroughly read any information you can find in SSAS doco about multiple fact tables feeding a cube (even though this is a dimension table)
ASKER
I have more research. The more I review this the more I am convinced that I may have managed to combine a fantrap and a chasm trap!!!
Fantrap: The customer number can have 1...n demographics, which in turn can have 1...n answers per demographic. Customer_id is on fact table but is also a dimension.
Chasm Trap: in SSAS you cannot have a dimension linked to a fact except by dimension's primary key (duh). So Customer dimension is linked to fact, and demographic is linked to customer. customer = 1...n facts and customer = 1...n demographic/answer combinations.
There is no way, for example, to drill into the facts as follows: Drill into corporate management and then into employee size. So if they want the aggregations of facts for corporate management for 1-50 and 51-100 employees, it cannot be done in SSAS using this configuration.
I still cannot figure out how to resolve this dilemma.
Fantrap: The customer number can have 1...n demographics, which in turn can have 1...n answers per demographic. Customer_id is on fact table but is also a dimension.
Chasm Trap: in SSAS you cannot have a dimension linked to a fact except by dimension's primary key (duh). So Customer dimension is linked to fact, and demographic is linked to customer. customer = 1...n facts and customer = 1...n demographic/answer combinations.
There is no way, for example, to drill into the facts as follows: Drill into corporate management and then into employee size. So if they want the aggregations of facts for corporate management for 1-50 and 51-100 employees, it cannot be done in SSAS using this configuration.
I still cannot figure out how to resolve this dilemma.
Most times the answer to these traps is that you have to 'impose' a particular 'join' upon the data which in turn resolves the many to many join.
In pure SQL, the answer would be to do something like this:
SELECT C.Customer_ID, SUM(R.Revenue)
FROM factRevenue R
INNER JOIN
dimCustomer C
ON R.CustomerID = C.CustomerID
INNER JOIN
(
SELECT DISTINCT CustomerID FROM dim_Demographic
WHERE Demo IN (DemoList) AND DemoValue .......
) DU
ON DU.CustomerID = C.CustomerID
GROUP BY C.Customer_ID
...I think that would be something that would stop double counting. I'n not sure if it is exaclty what you're after. However the trick is to get SSAS to work this statement out automatically on the fly.
Cognos can recognise this stuff and do it automatically to a certain extent and I am interested in whether SSAS can too.
Unfortunately I still have no solid answer for you!
In pure SQL, the answer would be to do something like this:
SELECT C.Customer_ID, SUM(R.Revenue)
FROM factRevenue R
INNER JOIN
dimCustomer C
ON R.CustomerID = C.CustomerID
INNER JOIN
(
SELECT DISTINCT CustomerID FROM dim_Demographic
WHERE Demo IN (DemoList) AND DemoValue .......
) DU
ON DU.CustomerID = C.CustomerID
GROUP BY C.Customer_ID
...I think that would be something that would stop double counting. I'n not sure if it is exaclty what you're after. However the trick is to get SSAS to work this statement out automatically on the fly.
Cognos can recognise this stuff and do it automatically to a certain extent and I am interested in whether SSAS can too.
Unfortunately I still have no solid answer for you!
ASKER
Wow, thanks. You have been helpful. Very helpful. I will go find out more.
Thanks!
Thanks!
Are you treating your demographic table as a fact or as a dimension. It looks like it could be used as a "factless" fact table to me. (meaning that the existence of the row is itself the fact).
Make two named queries in the DSV like this:
DimDemographicCategory
SELECT DISTINCT [Demo] AS DemographicCategory FROM [That table above]
DimDemographic
SELECT DISTINCT [Demo Value] AS DemographicValue FROM [That table above]
Build two separate single attribute dimensions from these and join them to [that table above]. Create a calculated column or add a column to the demographic table above that contains only the number 1 cast as an integer and add it as a measure.
You'll be able then to count the number of customers falling into whatever variety of demographic categories and values if that gets you any closer to what you're looking for. You will have to make sure that both dimensions are used in every query for the results to be trustworthy.
If this makes any sense at all go ahead and give it a try if you already haven't had success. Otherwise I can try to clarify more if this seems obscure. Or maybe I'm just all wet.
Make two named queries in the DSV like this:
DimDemographicCategory
SELECT DISTINCT [Demo] AS DemographicCategory FROM [That table above]
DimDemographic
SELECT DISTINCT [Demo Value] AS DemographicValue FROM [That table above]
Build two separate single attribute dimensions from these and join them to [that table above]. Create a calculated column or add a column to the demographic table above that contains only the number 1 cast as an integer and add it as a measure.
You'll be able then to count the number of customers falling into whatever variety of demographic categories and values if that gets you any closer to what you're looking for. You will have to make sure that both dimensions are used in every query for the results to be trustworthy.
If this makes any sense at all go ahead and give it a try if you already haven't had success. Otherwise I can try to clarify more if this seems obscure. Or maybe I'm just all wet.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks; I will look into this right away.
Very interesting read, so it seems SSAS can handle these many to many relationships. I'm be interested to know what actual effect it has - does it actually recognise that it will cause double counting, and stop the double counting occur?
It won't always suppress double counting, but that is expected. A many-to-many implies that double counting is sometimes appropriate.
Looking at the data above, suppose the sales for some time period were:
Customer ID Sales
274701 100,000
274702 300,000
274703 500,000
The following queries would return as follows. I'm just using some made up metadata which I hope is clear enough. Hopefully this formats well or I will try to re-post.
SELECT [Measures].[Sales] on columns, [Customer].[CustomerID] on rows FROM [mycube]
274701 100,000
274702 300,000
274703 500,000
SELECT [Measures].[Sales] on columns, [Demographics].[Category].
250 to 499 400,000
5,000 to 9,999 500,000
SELECT [Measures].[Sales] on columns, [Demographics].[Category].
-- this may be what one would consider double counting, but provided you know what you're looking for you can get exactly what you're looking for.
Brand A Newsletter 500,000
Brand A Seminar 100,000
Brand A Tradeshow 500,000
Brand B Print 100,000
Brand B Seminar 100,000
Brand C Newsletter 400,000 (sum of 274701 and 274702)
Brand C Print 100,000
Brand C Seminar 300,000
Brand C Tradeshow 300,000
SELECT [Measures].[Sales] on columns, CROSSJOIN([Demographics].[
250 to 499 274701 100,000
250 to 499 274702 300,000
5,000 to 9,999 274703 500,000
Does this answer your question?
ASKER
Wow!
I have used EE for years, but I never knew how much of an awesome asset it could be to OLAP/Warehousing issues!
I will post up in the next couple days after I implement the suggestions in the cited web page.
Thanks! (:-D)
I have used EE for years, but I never knew how much of an awesome asset it could be to OLAP/Warehousing issues!
I will post up in the next couple days after I implement the suggestions in the cited web page.
Thanks! (:-D)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I am sorry I have not posted back up on this; I intend to find out by Friday how this works.
ASKER
I am working this this week. Finally we will have closure.
ASKER
I wish I could award 1000 points!