Solved

How to normalize data - sql - vb6

Posted on 2013-11-23
16
355 Views
Last Modified: 2013-12-02
I have to develop a table that contains the records of students and each day of the month that they had hours in attendance.  Here are the columns:


Last Name
First Name
1
2
3
....
30
31


When I pull up the data in a datagrid within VB6 I would like for a grid to pop up that shows all the days of the month as columns so that I can enter the hours for each student.

What is the proper to normalize the table so that the application can pull up the columns accordingly?
0
Comment
Question by:al4629740
  • 4
  • 4
  • 3
  • +3
16 Comments
 
LVL 21

Assisted Solution

by:oleggold
oleggold earned 50 total points
Comment Utility
You have to design simple sql , just wanted to understand the task , actually looks like what You need is de-normalize the normalized data or rather pivot it to show each day of the month as column?
0
 
LVL 21

Expert Comment

by:oleggold
Comment Utility
if that what You need to do , just use appropriate function in sql server.
0
 

Author Comment

by:al4629740
Comment Utility
What is an example of pivoting the data?
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 50 total points
Comment Utility
for a normalised solution

i'd expect as a minimum you had the following tables

student   -  id,name.dob,....
attendance - id,date,hours,status
0
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 100 total points
Comment Utility
You should also have a course table that ties to the attendance table. Then you will probably need an instructor table.

There is a decision point in the course table as well. Do you want duplicate courses with different instructors? Or do you want a single course name/ID and use a cross reference table to the instructors that teach it?

And how do you handle substitute instructors?
0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 50 total points
Comment Utility
students ( student_id int, last_name varchar(nnn), first_name varchar(nnn), ... )
-- student_id = unique key


attendance ( date, student_id, hours, ... )
-- ( date, student_id ) = unique key

OR, if you need to record different blocks of hours per student per day, then this:

attendance ( date, student_id, code, hours, ... )
-- ( date, student_id, code ) = unique key
0
 

Author Comment

by:al4629740
Comment Utility
So if I had two tables,

student   -  id,name.dob,....
attendance - id,date,hours,status

Then could someone provide an sql statement that would take the data from the two tables and output them as a grid in my datagrid in vb6?  I am assuming I would use an innerjoin statement somehow.

As you can tell I am new to pivoting the tables to accomplish this task.
0
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 100 total points
Comment Utility
The ID should really something like StudentID and then the Attendance would have an AttID as the PK, and the StudentID as a separate column. The reason is that when you start using just ID it gets confusing really quick.

But the query would be something like:
select  student.id, student.name, student.dob, attendance.date, attendance.hours, attendance.status
from student
left join attendance 
      on student.id = attendance.id

Open in new window


And just a side comment. This sounds like you are working out of a book. Probably the best method to get it all down faster is to try coding the query yourself and then show us what you're having issues with.
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 51

Accepted Solution

by:
Mark Wills earned 250 total points
Comment Utility
Yeah, seems to be a missing column in attendance... You do need StudentID in there.

student   -  StudenttID,name.dob,....
attendance - attendanceID,StudentID,date,hours,status

Because there will be multiple dates for a Student, so, need StudentID in there (maybe that's what "ID" was, but lets clarify)

Could use StudentID + Date as the unique key, or, add in a separate unique key. I would be inclined to have the separate / surrogate key and just an index over StudentID + Date

Now, you don't mention what version of SQL you are using, so will assume minimum 2005.

To get the combined data...

Select *
From Student S
Inner Join Attendance A on S.studentID = A.studentID

Open in new window


BUT the problem with that is what about a new period ? Then the Attendance row wont exist, unless, you pre-populate, or, you have a "school calendar" table

So, would suggest you build a calendar table as well...

Calendar - Date, Day, Month, Year, term, ...

Then your join would look like :

select * 
from Student S
Cross join Calendar C
left join Attendance A on S.studentID = A.studentID AND C.date = S.date

Open in new window


And then just add in a WHERE clause to decide the month / year

select * 
from Student S
Cross join Calendar C
left join Attendance A on S.studentID = A.studentID AND A.date = C.date
where year = 2013
and month = 9

Open in new window


So, lets now start to construct some data so we can test out our query...

-- first Student and Attendance
create table student (StudentID int Identity primary key,Name Varchar(100),DOB datetime)
create table attendance (attendanceID int identity primary key,StudentID int,[date] datetime,[hours] int, [status] int)

-- now some values
 
insert student values ('Mark','19220607')
insert student values ('Jim','19200607')

Insert attendance values (1, '20130906',6,1)
Insert attendance values (1, '20130915',5,1)

-- note we don't have any attendance for Jim at this point

Open in new window


and now out calendar table which is going to be a bit boring because I am not doing any checks like holidays, school term, etc...

create table Calendar ([Date] datetime primary key, [Day] int, [Month] int, [Year] int, [WeekDay] Char(1))

declare @d datetime = '20130101'
While @d < '20140101'
begin

   insert calendar
   select @d, day(@d), month(@d), year(@d), case when left(datename(dw,@d),1) <> 'S' then 'Y' else 'N' end 
   set @d = @d +1

end

Open in new window


There is an article about building a calendar table : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/A_12267-Date-Fun-Part-One-Build-your-own-SQL-calendar-table-to-perform-complex-date-expressions.html

So, now lets convert our select statement to get the "raw" info required...

select S.StudentID, S.Name, C.Day, isnull(a.hours,0) as attended 
from Student S
Cross join Calendar C
left join Attendance A on S.studentID = A.studentID AND A.date = C.date
where year = 2013
and month = 9

Open in new window


Now we can see that the cross join has created entries for each day, and attendance (hours) are appearing only where there is attendance hours. So, even Jim is appearing despite not having any attendance.

But we do want to pivot. So we use the above and pivot student and hours over days as columns

select * 
from 
     (select S.StudentID, S.Name, C.Day, isnull(a.hours,0) as attended 
      from Student S
      Cross join Calendar C
      left join Attendance A on S.studentID = A.studentID AND A.date = C.date
      where year = 2013
      and month = 9) src
PIVOT
     (sum(attended) for day in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19],[20],[21],[22],[23],[24],[25],[26],[27],[28],[29],[30],[31])) pvt

Open in new window


The above will give you 31 columns everytime. to get those columns to be date sensitive, you will need to get into dynamic SQL.

For that, please read : http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/A_653-Dynamic-Pivot-Procedure-for-SQL-Server.html
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
You do NOT need NOR WANT an "attendance id".  You have natural keys for that table.

More generally, for the sake of your database designs and performance, get away from the dopey, lazy notion that "every table should/must have an identity".
0
 
LVL 38

Expert Comment

by:Jim P.
Comment Utility
Scott,

I agree that the composite key will eventually be StudentID, AttendanceDate, CourseID and maybe InstructorID depending on design.

I disagree that a simple index is not needed. I have had too many issues with doing bug fixes from programmers that I have had to get into a long and complicated queries building up what to update or delete. Where if I had an ID number then I could have dumped the rows into a temp table where StudentID = "####", then delete out non valid rows from the temp table, and then go back  into the production table and just delete where InexNum in (select InexNum from temptable).
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
For this table, the composite key must itself be unique.  Thus, there is absolutely no need for an identity key.  An identity "crutch" added to this table will likely cause problems by allowing the natural key values to be repeated when they should not be (because when a nice, simple key is available, that gets used for the PK, which becomes the clustered index, and "messy", i.e. non-simple, indexes are then not created).
0
 
LVL 38

Expert Comment

by:Jim P.
Comment Utility
(because when a nice, simple key is available, that gets used for the PK,

I'm not advocating for it to be the primary key. I'm advocating to have it available as an easier way for doing data manipulation.
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Scott,

I clearly said that StudentID + Date can serve as a natural key.

It is not a dopey / lazy "crutch" to add an Identity.

The reality is the design is very far from complete, and can well imagine additional requirements in Attendance such as Course and other potential unique identifiers needed for referential integrity (such s courseID).

You are quite wrong saying that a surrogate (automatically) leads to duplication of natural keys. I also mentioned adding in a unique key. There are constraints such as "UNIQUE" that is available. And if student + date is "nice and simple" as a PK, then that exact combo cannot become a "messy non-simple" (unique) index for no other reason than the introduction of an identity. It is still a "nice and simple" unique index.

Quite frankly, your reaction and tone with the bold comments was very disappointing to see and read, and assume the comments were targeted to discredit or raise doubts about my considered post.

While I understand and will often support the use of natural keys, when dates are involved and progressively being added to, by student, there is naturally a strong tendency of fragmentation and page shuffles due to dates being inserted across the range of students.

There is no real chronology other than the sequence in which rows are being added leading to built in inserts between pre-existing rows over time.

So, considering the page structure using natural keys, I was quite deliberate when being "inclined" to use an attendanceID as a surrogate key. And clearly said there was a choice. And clearly referred to uniqueness of Student + Date

To say it was "dopey, lazy" is simply wrong.

Let's also consider (or simply remember the old education days) that there could quite easily be additional requirements in "attendance" to reflect curriculum such as course, maybe year, electives, subject matter. We have yet to be told about any frequency of "marking the role" and who does that, or, if this requirement is a different kind of planner for student volumes.

Now, I will agree that making StudentID an identity was rather lazy, but also convenient for the example. The reality is that studentID has (more than likely) some other method of being created or assigned which becomes integral with institutional identity cards (library, travel etc) for that student.

My objective was two fold.

1) to show the pivot to present normalised data as 31 columns
2) to show the inclusion of the Calendar Table to provide the "template" regardless of actual attendance

That was my understanding of the brief in the question header.

If the question was more about designing a student system, then there is considerable more work that might well change the very scant design being presented.
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Apologies Scott, upon reflection it may not have been directed purely at my post.
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
I was very specific as well: what I said was "dopey and lazy" was the idea that essentially every table should have an identity, and that it should be the clustering key.  

Yes, it's certainly possible that one could add an identity column, say as a PK, but still properly define and use the natural keys as clustered and unique.  I've just seen in practice that rarely ever happens.  Once an identity appears in a table, it is usually very quickly designated the clustered PK and the more natural keys are ignored.  That can, of course, lead to duplicate natural key values where, from a business standpoint, none should be allowed.

I point this out so dramatically because I've seen it on literally hundreds of tables.  The identity-as-default-primary-clustering-key is the single most common mistake seen in table design.  And the worst performance loss is caused by clustering by identity instead of choosing the single best clustering key.
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now