asked on

How to best represent Inheritance in a MS Sql Server database?

Say i have two classes that inherit from the same base class. The base class has members A, B, and C. Class one has members W and X. Class two has members Y and Z.

What would be the best way to structure my database? My main concerns are simplifying future updates, simplifying my queries in general and performance.

Kyle Abrahams, PMP

Your database structure has nothing to do with your classes.

Your database structure should be based on the data your storing (google database normalization) . . . your classes just interact with that data.

abemiester

ASKER

I want to store the data from instances of class one and class two in a database. Hence the question...

Kyle Abrahams, PMP

post what data you want to store among all classes. How do they relate?

I would do a union of all columns and store it in a table.

abemiester

ASKER

I want to store all data.
For class 1 this would be: A,B,C,W, and X

For class 2 this would be A,B,C,Y, and Z

Please keep in mind that I have simplified this example. In reality the two classes could have 95% of their members in common with only 5% being unique.

Kyle Abrahams, PMP

I would do
ABC W X Y Z in a table with WXYZ being nullable.

Kyle Abrahams, PMP

scratch that .

ID A B C in Table Widget

setup a foreign key to another table
ID, FK_WidgetID, W,X Extended Widget

ID, FK_WidgetID, Y,Z Ultra Widget.

IF (big IF) W,X and Y,Z are of the same datatype and are always going to be the same datatype, you can combine in one table.

ID, FK_Widget, W/Y, X/Z, Type_Of_Widget

abemiester

ASKER

yeah those were the two possible structures i came up with too. I had started to lean towards the one table solution with nullable columns because it wouldn't require joins when selecting data. Also i believe inserts and updates would be simpler.

Kyle Abrahams, PMP

Don't be lazy, do what is right not what is easy.

abemiester

ASKER

How am i being lazy? Could you please explain the benefits of your solution vs. mine? In my experience simpler solutions are almost always better and joins can hurt performance...

Kyle Abrahams, PMP

Primary keys give you more physical indexes which are better for seeking data.

Say you have the same ABC but different values for each XY
in your table
ID,ABC X1W1Y1Z1
ID,ABC X1W2Y1Z1
etc.

in the 2 table system
ID,ABC

ID, FKID, X1 Y1
ID, FKID, X1 Y2

The data is normalized so you are not repeating any information. Data size will grow less rapidly and searches will be faster. Proper joins don't take a hit on performance as much as parsing that many more N rows.

abemiester

ASKER

Ahh i see what you are saying. Let me clarify my example:
ABC are members of the base class, not the same values. WX are members of class1 and YZ are members of class 2.

maybe this real life example will help:

class Shape //base class
{
double area; //Member A
string color; //member B
int perimeter; //Member C
}

class PolyGon : Shape // class 1
{
int numberOfSides; //Member W
string someOtherMember; //Member X
}

class Circle : Shape // class 2
{
double radius; //Member Y
double diameter; //Member Z
}

Above would be an example of the class structure i have.
Now lets say i have the following code snippit

Circle c = new circle();
c.radius = 1.0;
c.diameter = 2.0;
c.SaveToDatabase();

Polygon p = new Polygon();
p.numberOfSides = 5;
p.someOtherMember = "who cares";
p.SaveToDatabase();

This code would ( and should) make two different entries in my database. I am almost positive that hitting one table instead of hitting three tables with joins will be faster. And again, the inserts and updates will be simpler...

Kyle Abrahams, PMP

Would do 3 tables in this case.

Table Shape
ID, area, string, permiter,

Table PolyGon
Id, fk_shape_id, numSides, oMember

Table Circle
id, fk_shape_id, radius, diameter.

My lazy statement was not meant as an insult, but as encouragement . . . it's simpler/easier to do the 1 table approach, however you're still going to have nulls for circle if you have a polygon and vice versa . . . which in DB land is a "bad" thing if can be avoided. (Sometimes it can't be, but most times there are ways around it.)

Just my advice though, do as you like.

dportas

I second what Ged has said. The principle at work here is to create one table for each distinct sub class but then to push each attribute up to the highest level that it can belong in. That's the way to minimise redundancy in the hierarchy.

abemiester

ASKER

Thank you for the input Dportas. Do you think there is anything to the last paragraph on my previous comment:

"This code would ( and should) make two different entries in my database. I am almost positive that hitting one table instead of hitting three tables with joins will be faster. And again, the inserts and updates will be simpler..."

If I made those entries into one table I still wouldn't be repeating any data. There would just be columns with null values...

Ged:
Why are nulls in a database a bad thing? I'm not trying to be a smart ass I just really didn't know that :)

ASKER CERTIFIED SOLUTION

Kyle Abrahams, PMP

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

abemiester

ASKER

Thanks for the engaging discussion! I think i will use the multi-table solution after all.

dportas

>> Nulls in the "truest" sense represent an unknown

I don't know what "Nulls in the truest sense" means but presumably it does not refer to SQL. Nulls in SQL are not a true and accurate representation of something unknown. If X is an unknown quantity then we still know that X=X is true but that isn't so in SQL: X=X is neither true or false. So if you choose to use nulls to indicate "unknown" in SQL then you are still forced to create extra logic to re-define SQL's intended meaning of null if you want correct answers from your database.

Kyle Abrahams, PMP

http://en.wikipedia.org/wiki/Null_(SQL)#Mathematical_and_string_concatenation

"Null was introduced by E. F. Codd as a method of representing missing data in the relational model. Codd later reinforced his requirement that all RDBMS' support Null to indicate missing data in a two-part series published in ComputerWorld magazine.[1][2] Codd also introduced a ternary (three-valued) logic, consisting of the truth values True, False, and Unknown, which is closely tied to the concept of Null. The Unknown truth value is generated whenever Null is compared with any data value, or with another Null.

Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate the reason why data is missing. These two Null-type markers are commonly referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively.[3] Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance."

Null is accepted for the A-Value . . . there is no use in keeping track of the I-value in a relational database when it can be avoided.

null can't be related to X because X is a representation of a value in the range of unknowns.

dportas

Codd's model of nulls was no better than SQL's model. In both cases many-valued logic demonstrably fails to represent the semantics of "unknown" according to every-day maths, logic or common sense - a fact which Codd admits in his book. Codd's choice of two null marks was not a sound one because there are more than two possible reasons for data to be missing. On the other hand, having only one such mark in SQL isn't helpful either - as evidence the multiple and contradictory uses that are made of null in the SQL language and in SQL databases.

Nulls don't add to the expressive power of any system, ie: there is no information expressible with nulls that isn't equally (or better) expressible without them. Nulls only add complexity and cause the system to generate wrong answers to some queries. Unfortunately the constraints of SQL can make it very hard (though not impossible) to manage without nulls. In a truly relational (non-SQL) system all data can be represented as (non null) values within tuples within relations and the problems inherent in SQL nulls go away.