Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win


EAV model used in datawarehouse design?

Posted on 2009-05-05
Medium Priority
Last Modified: 2012-08-14
I am designing a database for a potentially large data warehouse.   Several of the fact tables may contain over a hundred columns.  Some of these columns may be sparse since data extracted from various sources may have different attribute.   We are debating between

1)  Keep all columns and deal with sparseness.  A few colleagues of mine worried about tables being sparse, but I don't see any problem with it. Perhaps I misses some important thing here.  Would the width (number of columns) of a table have any impact on performance that much?


2)  Move sparse columns into an EAV table.  Queries will be more expensive, but how much more?  Would you use this model in your own data warehouse?

I am not familiar enough with datawarehouse performance and think I would post this question ot the gurus in this community.

Thanks for your comments in advance!
Question by:cpeters5
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
LVL 22

Assisted Solution

dportas earned 400 total points
ID: 24303298
SQL Server has support for sparse numerics (ie. variable size storage for numerics) and page or row compression. So size alone should not be an issue.

Assuming you want a normalized DW design, consider decomposing the "sparse" attributes into a new table (ie. vertical partitioning).

Author Comment

ID: 24303445
Thanks dportas,
We did start with normalized schema, then we denormalized to form a wide fact table.  We wanted to avoid table joins.  The frontend will be a BI product, probably Business Object.

As per NULL values, it has alwasy been my understanding that using variable size variable should remove this concern.  I just wanted to confirm my understanding with the expert here.

I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

LVL 22

Expert Comment

ID: 24303812
>> We did start with normalized schema, then we denormalized to form a wide fact table.

That's unfortunate. Are you familiar with DW design practices? Are you following a model such as Inmon or Kimball?

>> I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

No to both of those things.
LVL 51

Accepted Solution

Mark Wills earned 1600 total points
ID: 24315989
EAV is designed to accommodate dynamic attributes, or minimally, variable attributes over time where adding columns is not likely. The SQL to return data is more involved and can get complex (or maybe confusing), sometimes requiring recursion to solve.

The argument that is saves space is no longer as relevant as it once used to be. The argument of having variable or "as yet to be qualified" attributes is not a good thing for a data warehouse.

So, having tables help describe and manage both metadata elements and values are not the best thing for a data warehouse. Maybe SPARQL will fix that, but not yet.

Entity Attribute Value does have certain attractions and flexibility, so could be used for semi-structured data, or user defined data, and better suited to that transactional type space where such queries are often smaller and better defined than those we encounter in a datawarewouse.

There is a finite number of columns in a table, and in very large datawarehouse projects, it is sometimes neccessary to move some columns or data elements into another table... In those cases, it can help to adopt a quasi-eav leaving out the dynamic attribute (let that be the table) and adopt a datatype more closely resembling the nature and type of data in that sparse column.

SQL2008 is supporting sparese data a lot more where you can specify columns as spares, and create indexes on the as well. They also support an XML "column-set" for grouping otherwise sparse data (or maybe just felixbly trying to cater for an in line name-value).

It is more a "price" question, flexibility versus size, with the real "cost" being speed, data typing, excessive storage, SQL code complexity.

My vote ? Whilst not a DW purist (therfore not dead against EAV), and in fact a bit of a fan of EAV in appropriate places, would not use it in a datawarehouse.

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows how to get a list of available printers for display in a drop-down list, and then to use the selected printer to print an Access report or a Word document filled with Access data, using different syntax as needed for working with …
Instead of error trapping or hard-coding for non-updateable fields when using QODBC, let VBA automatically disable them when forms open. This way, users can view but not change the data. Part 1 explained how to use schema tables to do this. Part 2 h…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question