EAV model used in datawarehouse design?

Posted on 2009-05-05
Medium Priority
Last Modified: 2012-08-14
I am designing a database for a potentially large data warehouse.   Several of the fact tables may contain over a hundred columns.  Some of these columns may be sparse since data extracted from various sources may have different attribute.   We are debating between

1)  Keep all columns and deal with sparseness.  A few colleagues of mine worried about tables being sparse, but I don't see any problem with it. Perhaps I misses some important thing here.  Would the width (number of columns) of a table have any impact on performance that much?


2)  Move sparse columns into an EAV table.  Queries will be more expensive, but how much more?  Would you use this model in your own data warehouse?

I am not familiar enough with datawarehouse performance and think I would post this question ot the gurus in this community.

Thanks for your comments in advance!
Question by:cpeters5
  • 2
LVL 22

Assisted Solution

dportas earned 400 total points
ID: 24303298
SQL Server has support for sparse numerics (ie. variable size storage for numerics) and page or row compression. So size alone should not be an issue.

Assuming you want a normalized DW design, consider decomposing the "sparse" attributes into a new table (ie. vertical partitioning).

Author Comment

ID: 24303445
Thanks dportas,
We did start with normalized schema, then we denormalized to form a wide fact table.  We wanted to avoid table joins.  The frontend will be a BI product, probably Business Object.

As per NULL values, it has alwasy been my understanding that using variable size variable should remove this concern.  I just wanted to confirm my understanding with the expert here.

I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

LVL 22

Expert Comment

ID: 24303812
>> We did start with normalized schema, then we denormalized to form a wide fact table.

That's unfortunate. Are you familiar with DW design practices? Are you following a model such as Inmon or Kimball?

>> I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

No to both of those things.
LVL 51

Accepted Solution

Mark Wills earned 1600 total points
ID: 24315989
EAV is designed to accommodate dynamic attributes, or minimally, variable attributes over time where adding columns is not likely. The SQL to return data is more involved and can get complex (or maybe confusing), sometimes requiring recursion to solve.

The argument that is saves space is no longer as relevant as it once used to be. The argument of having variable or "as yet to be qualified" attributes is not a good thing for a data warehouse.

So, having tables help describe and manage both metadata elements and values are not the best thing for a data warehouse. Maybe SPARQL will fix that, but not yet.

Entity Attribute Value does have certain attractions and flexibility, so could be used for semi-structured data, or user defined data, and better suited to that transactional type space where such queries are often smaller and better defined than those we encounter in a datawarewouse.

There is a finite number of columns in a table, and in very large datawarehouse projects, it is sometimes neccessary to move some columns or data elements into another table... In those cases, it can help to adopt a quasi-eav leaving out the dynamic attribute (let that be the table) and adopt a datatype more closely resembling the nature and type of data in that sparse column.

SQL2008 is supporting sparese data a lot more where you can specify columns as spares, and create indexes on the as well. They also support an XML "column-set" for grouping otherwise sparse data (or maybe just felixbly trying to cater for an in line name-value).

It is more a "price" question, flexibility versus size, with the real "cost" being speed, data typing, excessive storage, SQL code complexity.

My vote ? Whilst not a DW purist (therfore not dead against EAV), and in fact a bit of a fan of EAV in appropriate places, would not use it in a datawarehouse.

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
Blockchain technology enhances society similar to the Internet. Its effects are broad, disruptive, and will boost global productivity.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
Planning to migrate your EDB file(s) to a new or an existing Outlook PST file? This video will guide you how to convert EDB file(s) to PST. Besides this, it also describes, how one can easily search any item(s) from multiple folders or mailboxes…

621 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question