Link to home
Start Free TrialLog in
Avatar of aikimark
aikimarkFlag for United States of America

asked on

Database (1NF) Normalization examples wanted

I'm preparing a user group presentation on database normalization.  I would like to show examples of different challenging 1NF conversions.

Tackling 1NF conversions of unstructured/delimited data is a different animal that I will tackle separately.  What I would like to see from you are different repeated group column names that you've encountered.

Sequential numbering is usually what the students see.  I'd like to show them more challenging examples from your real-world experiences.  Things that come to mind are year and quarter values, week values, month name values, and day name values.  Other challenges might be values that aren't at the end of a column name, capitalization, and no delimiter between the root column name and the serialization values.
Avatar of Jim Dettman (EE MVE)
Jim Dettman (EE MVE)
Flag of United States of America image

Someone who lists what's been sold as:

2 20oz Hammers, 1 13mm combo wrench, and a set of 8" curved vice grips.

Jim.
cell phone / work phone / ... phone

The numbered values are still quite common in the real world, such as a list of classes a student is taking (yes, I have seen "table" "designs" that consisted of ( semester_id, class_1, class_2, class_3, ... )
Avatar of aikimark

ASKER

@Jim

Would the column names/headers be
20oz Hammers
13mm combo wrench
8" curved vice grips
Or something different?
These 1NF examples can be taken from imported data, such as CSV or Excel files.
<<Or something different?>>
 
 Example of both not being atomic (should be qty and item), and having repeating groups (Qty1, Item1, Qty 2, Item 2...)

Jim.
Here is one I see often:
 User generated image
Here the Time increments are both inconsistent, and need to be in their own field.
Or like this, when the word "Type" is Redundant for each value
User generated image(Here, obviously, the Blood "Type" is implied through the field Name)
Jeffrey, that's not 1NF. 1NF has to have with repeating columns. Scott Pletcher gave good examples as why you can put the numerous telephone numbers in a single table like contacts. This is also good for addresses (home address, work address, delivery address, ...) or for customers (customer1, customer2, ...) or suppliers (supplier1, supplier2, ....), classes (Math, English, History, ...) and so on and so on.
1NF also deals with multiple values being combined into a single value.

For example, something like:
person_name
with values like 'John Smith' or 'Smith, John' is not really 1NF, because it contains multiple discrete values (first name, last name).
Similarly, to me, for email addresses:
somename@somedomain.com
fails 1NF (for me) because it contains both the local part and the domain in a single column, when they should be separate columns.

Of course opinions on those are not universal.  They should be a good examples if you want a way to get a discussion going about  "1NF" more broadly.
The classic example of storing first and last name in the same field.

You can argue this both ways.
In some cases, ...the distinction between first and last name may never be drawn.

For example, ...in one of my databases I have a "Manager" field
In it, I store not only the full name, ...but multiple names.
ex.:
    John Smith, Gloria Jones, Kevin Holmes
Here I will never be asked to sort, filter or group on first or last name.
Also note that this is not a field in the "Employee" table (where I would argue for normalizing it), ...but it is simply a notes field in a distantly related table.
I use this field just to search on, (*SomePartOfName*)
 ...because I will frequently be asked questions like:
Did you call Barbra back?
What house is Kevin the manager at?
Whats the manager's name at the Elm street location.
How many managers named Lisa do we have?
...etc

In other cases you must "Build with the bricks you are given"
Meaning, you have an SQL table in the BE with only one Name field, ...and you are not allowed to make any table design changes.

So this might be an example of a rare case where you might not normalize (at the 1NF level)

I am sure other experts will add other examples..
;-)

Jeff
.in one of my databases I have a "Manager" field
In it, I store not only the full name, ...but multiple names.
ex.:
    John Smith, Gloria Jones, Kevin Holmes

Unquestionably that is the wrong way to store such data, period.  It's extraordinarily difficult to use and nearly impossible to properly manage that way.

Given the complexity of names, and the potential for name changes, people's names should be encoded (to a numeric code) in all tables except their base source table.


Meaning, you have an SQL table in the BE with only one Name field, ...and you are not allowed to make any table design changes.
Potentially you could even in that case use a number value there to represent the name, and use a view or a similar mechanism to virtualize the combined name column for any query that needed it.
@Vitor Montalvão
The Q here was based on the OP giving a "presentation"
I'm preparing a user group presentation on database normalization.  I would like to show examples of different challenging 1NF conversions.

So my first two examples were designed to keep things simple (for the varying skill levels possibly attending this presentation) and illustrate examples tables that did not follow 1NF, strictly (in the classic sense)
My examples illustrate a different strain of 1NF inconsistency that a person might encounter when working with data from word tables, text files or Excel sheets
;-)

The "problem" addressed by 1NF is having more than one value in a field.
The "solution" is to have separate fields.

As Scott pointed out, ...we can split hairs on this all day,...
So I did not want this thread to be sidetracked with a side discussion on the minutiae of Normal Forms...

Again, this was just to illustrate different types of 1NF scenarios an lay person might encounter.
;-)

Jeff Coachman
@ Scott
Unquestionably that is the wrong way to store such data, period.  It's extraordinarily difficult to use and nearly impossible to properly manage that way.

Yes, ...but my example here was just to illustrate the fact that, ...*sometimes* in "rare" cases, ...normalizing out the name may not be needed, ..as I stated, ...this is not an "employee" table (where indeed the names should be split), ...but a mere "Notes" field in a distantly related table.

Again, ...for a "presentation", ...we don't know the skill level of the attendees, or how deep down the Normalization "rabbit hole" they would like to go...

We all here can agree that at some point in database design, the line between Art and Science can become blurred...
We could all stomp out feet and demand full normalization,

We can all agree that each DBA has to draw that line where it best fits into the parameters that they have to work inside.

But again, ...(as I read the Q), ...this was just for a "Presentation".
So I don't want the thread to turn into an in depth discussion of Normal forms.

;-)

Jeff
I'm mostly interested in field/column names that are in repeated groups.  I'm not currently interested in unstructured/delimited text in a field/column.
ASKER CERTIFIED SOLUTION
Avatar of PatHartman
PatHartman
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
So far, Pat is the only expert to address my 1NF repeating groups example request.

I thought that you'd have encountered plenty of examples of repeating groups from EE questions.
Another less than obvious one is types of pay - Regular, OT, Holiday, Hazardous Duty, Jury Duty

And then there are the non-atomic abominations that mush all the parts of a name into a single field.  This of course works fine until someone wants to send a letter and personalize it or sort by last name.  Address is less of a problem because unless you are implementing a direct mail application, no one ever cares to separate house number, street name, street designation, compass designation, and several other parts that I don't even remember any more.

I'm helping someone now who seems to have status (open/closed) in the same column as Assigned To and they wanted to know how to use the employee table rather than keep updating their custom table so rather than convince them to split the column into two separate parts, some expert posted code that used a union query to get the open and closed into the list of employee names.  Who needs RI?  Just load the bullets into the gun.
Another less than obvious one is types of pay - Regular, OT, Holiday, Hazardous Duty, Jury Duty
I'm not sure I'd classify that as purely a 'repeating group'
Why Not?  Each is a type of payment that almost certainly uses a different hourly rate.  What about Vacation and PTO.  There's two more types.  Depending on what business you are in, there are likely to be others.  Will we just keep adding new columns and changing formulas as we would with Excel?  If you managed a payroll system and you just found out that you needed to add Wait Time as a new payment type for your drivers so you could pay them less for sitting in line at the terminal than actually driving would you rather have a flat table that you needed to modify in addition to  all the queries and calculations or a payment type table that just required a new row and rate adjustment?

Basically regular time, Vacation, and PTO are 1 * hourly rate * hours, OT is 1.5 * hourly rate * hours; Holiday is 2 * hourly rate * hours,  Wait Time is .9 * hourly rate * hours, etc.  They are all the same type of data used in an identical calculation but which use a different rate in the calculation.  They are separated in some cases because the rate will be different and in others simply for book keeping reasons.
It is for the purpose of this question that I don't consider it purely a 'repeating group'
<<It is for the purpose of this question that I don't consider it purely a 'repeating group'>>

  The record as Pat laid it out can't pass the "shape" test.   In that for every given record, can I ever enter a value for each and every field?  The answer is no.    I can't for example have a record with both Regular and vacation pay filled in.

  That tells you that all the different types are really a repeat of the same thing.

  "pay", boils down to a record of Type, Rate, and Qty.

  It's no different than the earlier example of  Home Phone, Work Phone, Cell Phone, etc.

<<So far, Pat is the only expert to address my 1NF repeating groups example request.>>

  ah, no, and I think your question has been more than answered.

Jim.
So far, Pat is the only expert to address my 1NF repeating groups example request.

Fascinating.  Then you seem to have wanted something different than what you asked for.  Separate rows of data that have repeating values are not "repeating groups" within a column, as is relevant for normalization.

Good luck with your project.
Maybe I haven't been clear.  I'm looking for examples of column names you've encountered that were part of repeating groups.  From my question text:
Things that come to mind are year and quarter values, week values, month name values, and day name values

Your examples don't have to be these, but I had hoped that that description would have adequately described the solution set (request/goal).
It looks like that depending on your source, there are three definitions for repeating group.  My definition is #1 and that is how I responded.  Apparently others have different opinions of what a repeating group is.

1. Multiple columns with the same type of data that should instead be multiple rows.  Storing this type of data as columns forces you to limit the instances such as 6 children, 10 different expense types, 5 different pay types.  Each column may be given unique name or a common name with a numeric suffix.

2.  Non atomic data.  This can be different fields such as first, middle, and last all stuffed into a single column or it can be "repeating" values such as a list of children'ts names or team members.

3.  Repeating data. This is more an indication of a violation of second normal form rather than a violation of first normal form.  Looking at data contents, you would see  multiple rows having the exact same value.  This happens when data such as company name and billing address  is repeated on every order rather than simply keeping a FK to the company table.
<<2.  Non atomic data.  This can be different fields such as first, middle, and last all stuffed into a single column or it can be "repeating" values such as a list of children'ts names or team members.>>

  On this, it depends on which set of definitions you subscribe to.   For some, forming a relation is something you do before you apply the normalization rules.  The two things required to do that is that data needs to be atomic and there is a primary key (you can tell one row apart from another).

 But most lump that in with 1NF.

Jim.
If it helps communicate what I'm looking for.  Here is an example of column headers (field names) that constitute a REPEATING GROUP.  Each group is a three-tuple of data.

2015Q1VOLUME
2015Q1SALESAMT
2015Q1TAXES
2015Q2VOLUME
2015Q2SALESAMT
2015Q2TAXES
2015Q3VOLUME
2015Q3SALESAMT
2015Q3TAXES
2015Q4VOLUME
2015Q4SALESAMT
2015Q4TAXES
2016Q1VOLUME
2016Q1SALESAMT
2016Q1TAXES
2016Q2VOLUME
2016Q2SALESAMT
2016Q2TAXES
2016Q3VOLUME
2016Q3SALESAMT
2016Q3TAXES
2016Q4VOLUME
2016Q4SALESAMT
2016Q4TAXES
2017Q1VOLUME
2017Q1SALESAMT
2017Q1TAXES
2017Q2VOLUME
2017Q2SALESAMT
2017Q2TAXES
2017Q3VOLUME
2017Q3SALESAMT
2017Q3TAXES
2017Q4VOLUME
2017Q4SALESAMT
2017Q4TAXES


While I can go through the different schemes I've already alluded to in my question text (online Format command and Excel autofill sequences), I thought I would exchange some points for really interesting real world examples the EE experts have encountered in questions or at work/client sites.  I'm not particularly interested in groups that only have single item.  Since sequential numbers are so commonly used examples, I'm not particularly interested in those examples, even though they do qualify as repeating groups.

I don't participate or monitor every question in my zones like I did when I was a ZA.  I haven't seen what other experts have seen.  Thus, I posted this question about what you've seen.  I don't need to learn about normalization.  I'm ONLY looking for examples of field/column names that constitute repeating groups of non-trivial tuple size.

Hope this clarification is helpful.
Excuse me.  I was trying to explain why you were getting different answers from different experts.
I don't need explanations.  I need examples that meet my expectations.  I've been trying to communicate my expectations since I crafted this question.

Hey experts,
Have I made myself clear on what I'm looking for now?

...and you've been given what you asked for.   You have been given several examples.  The fact that what has been provided is not interesting enough to meet your expectations for presenting is not something we can address.  It's not like anyone makes a note in their mind that "hey, this is a great example of breaking 1NF" when working on something.   You just know it when you see it and fix it.

 I don't see any point in continuing.

Jim.
Thank you for your participation, Jim
Thank you for your example, Pat
You're welcome.