Solved

EAV model used in datawarehouse design?

Posted on 2009-05-05
4
937 Views
Last Modified: 2012-08-14
I am designing a database for a potentially large data warehouse.   Several of the fact tables may contain over a hundred columns.  Some of these columns may be sparse since data extracted from various sources may have different attribute.   We are debating between

1)  Keep all columns and deal with sparseness.  A few colleagues of mine worried about tables being sparse, but I don't see any problem with it. Perhaps I misses some important thing here.  Would the width (number of columns) of a table have any impact on performance that much?

or

2)  Move sparse columns into an EAV table.  Queries will be more expensive, but how much more?  Would you use this model in your own data warehouse?

I am not familiar enough with datawarehouse performance and think I would post this question ot the gurus in this community.


Thanks for your comments in advance!
pax
0
Comment
Question by:cpeters5
  • 2
4 Comments
 
LVL 22

Assisted Solution

by:dportas
dportas earned 100 total points
ID: 24303298
SQL Server has support for sparse numerics (ie. variable size storage for numerics) and page or row compression. So size alone should not be an issue.

Assuming you want a normalized DW design, consider decomposing the "sparse" attributes into a new table (ie. vertical partitioning).
0
 

Author Comment

by:cpeters5
ID: 24303445
Thanks dportas,
We did start with normalized schema, then we denormalized to form a wide fact table.  We wanted to avoid table joins.  The frontend will be a BI product, probably Business Object.

As per NULL values, it has alwasy been my understanding that using variable size variable should remove this concern.  I just wanted to confirm my understanding with the expert here.

I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

0
 
LVL 22

Expert Comment

by:dportas
ID: 24303812
>> We did start with normalized schema, then we denormalized to form a wide fact table.

That's unfortunate. Are you familiar with DW design practices? Are you following a model such as Inmon or Kimball?

>> I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

No to both of those things.
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 400 total points
ID: 24315989
EAV is designed to accommodate dynamic attributes, or minimally, variable attributes over time where adding columns is not likely. The SQL to return data is more involved and can get complex (or maybe confusing), sometimes requiring recursion to solve.

The argument that is saves space is no longer as relevant as it once used to be. The argument of having variable or "as yet to be qualified" attributes is not a good thing for a data warehouse.

So, having tables help describe and manage both metadata elements and values are not the best thing for a data warehouse. Maybe SPARQL will fix that, but not yet.

Entity Attribute Value does have certain attractions and flexibility, so could be used for semi-structured data, or user defined data, and better suited to that transactional type space where such queries are often smaller and better defined than those we encounter in a datawarewouse.

There is a finite number of columns in a table, and in very large datawarehouse projects, it is sometimes neccessary to move some columns or data elements into another table... In those cases, it can help to adopt a quasi-eav leaving out the dynamic attribute (let that be the table) and adopt a datatype more closely resembling the nature and type of data in that sparse column.

SQL2008 is supporting sparese data a lot more where you can specify columns as spares, and create indexes on the as well. They also support an XML "column-set" for grouping otherwise sparse data (or maybe just felixbly trying to cater for an in line name-value).

It is more a "price" question, flexibility versus size, with the real "cost" being speed, data typing, excessive storage, SQL code complexity.

My vote ? Whilst not a DW purist (therfore not dead against EAV), and in fact a bit of a fan of EAV in appropriate places, would not use it in a datawarehouse.
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

CCModeler offers a way to enter basic information like entities, attributes and relationships and export them as yEd or erviz diagram. It also can import existing Access or SQL Server tables with relationships.
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now