EAV model used in datawarehouse design?

Posted on 2009-05-05
Last Modified: 2012-08-14
I am designing a database for a potentially large data warehouse.   Several of the fact tables may contain over a hundred columns.  Some of these columns may be sparse since data extracted from various sources may have different attribute.   We are debating between

1)  Keep all columns and deal with sparseness.  A few colleagues of mine worried about tables being sparse, but I don't see any problem with it. Perhaps I misses some important thing here.  Would the width (number of columns) of a table have any impact on performance that much?


2)  Move sparse columns into an EAV table.  Queries will be more expensive, but how much more?  Would you use this model in your own data warehouse?

I am not familiar enough with datawarehouse performance and think I would post this question ot the gurus in this community.

Thanks for your comments in advance!
Question by:cpeters5
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
LVL 22

Assisted Solution

dportas earned 100 total points
ID: 24303298
SQL Server has support for sparse numerics (ie. variable size storage for numerics) and page or row compression. So size alone should not be an issue.

Assuming you want a normalized DW design, consider decomposing the "sparse" attributes into a new table (ie. vertical partitioning).

Author Comment

ID: 24303445
Thanks dportas,
We did start with normalized schema, then we denormalized to form a wide fact table.  We wanted to avoid table joins.  The frontend will be a BI product, probably Business Object.

As per NULL values, it has alwasy been my understanding that using variable size variable should remove this concern.  I just wanted to confirm my understanding with the expert here.

I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

LVL 22

Expert Comment

ID: 24303812
>> We did start with normalized schema, then we denormalized to form a wide fact table.

That's unfortunate. Are you familiar with DW design practices? Are you following a model such as Inmon or Kimball?

>> I can see that EAV is appropriate when the attributes may change over time.  Is there any other reason why we should use EAV in a datawarehose project?

No to both of those things.
LVL 51

Accepted Solution

Mark Wills earned 400 total points
ID: 24315989
EAV is designed to accommodate dynamic attributes, or minimally, variable attributes over time where adding columns is not likely. The SQL to return data is more involved and can get complex (or maybe confusing), sometimes requiring recursion to solve.

The argument that is saves space is no longer as relevant as it once used to be. The argument of having variable or "as yet to be qualified" attributes is not a good thing for a data warehouse.

So, having tables help describe and manage both metadata elements and values are not the best thing for a data warehouse. Maybe SPARQL will fix that, but not yet.

Entity Attribute Value does have certain attractions and flexibility, so could be used for semi-structured data, or user defined data, and better suited to that transactional type space where such queries are often smaller and better defined than those we encounter in a datawarewouse.

There is a finite number of columns in a table, and in very large datawarehouse projects, it is sometimes neccessary to move some columns or data elements into another table... In those cases, it can help to adopt a quasi-eav leaving out the dynamic attribute (let that be the table) and adopt a datatype more closely resembling the nature and type of data in that sparse column.

SQL2008 is supporting sparese data a lot more where you can specify columns as spares, and create indexes on the as well. They also support an XML "column-set" for grouping otherwise sparse data (or maybe just felixbly trying to cater for an in line name-value).

It is more a "price" question, flexibility versus size, with the real "cost" being speed, data typing, excessive storage, SQL code complexity.

My vote ? Whilst not a DW purist (therfore not dead against EAV), and in fact a bit of a fan of EAV in appropriate places, would not use it in a datawarehouse.

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Server maintenance plan 8 54
SQL works but want to get the XML node data separately 11 30
SQL profiler 3 19
Estimating my database size 7 17
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question