I'm following Kimball Fact - Dimension design principles, but I've encountered a scenario that lends itself to a Snowflake structure, which I'm trying to avoid.
In our ERP, we have a table that contains both customer and vendor information. Here is a simplified version:
BusinessPartner Table
BP_ID,
BP_Type, --either Customer or Vendor
BP_Name,
SalesPersonCode, -- this only occurs on Customer records
AssociatedCustomerID --on some Vendors, this will be the BP_ID of an associated Customer
So, for example, we have:
BP_ID | BP_Type | BP_Name | SalesPersonID | AssociatedCustomerID
1 | C | FooBar | 123 | NULL
2 | V | BlueBar | NULL | NULL
3 | V | ShoeBar | NULL | 1
Certain accounting transactions will reference the Vendor ID, but these need to be associated with a customer ID in order to tie them to a salesperson. To help explain the relationship here is the SQL Query I'd use against the ERP table:
SELECT v.BP_ID, v.BP_Name, c.SalesPersonID
FROM BusinessPartner v
INNER JOIN BusinessPartner c ON c.BP_Type = 'C' AND c.BP_ID = v.AssociatedCustomerID
WHERE v.BP_Type = 'V'
note: this is NOT the query I'd use to load data for the datawarehouse
1. Should I split this BusinessPartner table up into a Customer Dimension table and Vendor Dimension table or keep them in one Business Partner Dimension table?
2. How do I structure this so that I'll be able to avoid a Snowflake design but still maintain this association I need? My guess is that I would omit the association from the dimension (or dimensions) altogether and handle the Salesperson association in the query that loads the GL Fact table (which has yet to be designed). But I'm not sure.
You could choose to still have the business partner dimension and additional vendor and customer dimensions. The association of customer and vendor is symmetric, so you'd have mirror facts, but that only means you can look at the data from two more perspectives.
Could you explain what design problems you face with all three dimensions?