historical database design

Hello,
I am designing a database, and one of the requirements is to store row history. I have the following tables:
employees (id, name, surname, dob, etc. )
employee_docs_rel (id, emp_id, doc_id, document (varbinary(max)), has_document) documents (id, doc_name)

I created another table for employees_history(id, master_id, name, surname, dob, etc., user_id, timestamp, start_date, end_date)

I need to save the history of the documents brought by the employee. Which is the best approach that i should follow in order to retrieve the changing history of each employee and also to retrieve the employee and documents version in a given time.

Thank you in advance
rilindAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ButlerTechnologyCommented:
I would recommend using stored procedures as the only method that your users/applications have access to the tables.  Each stored procedure (Update, Insert, and Delete) would have additional code that would handle entries to the history table(s).  It is also an efficient method of securing the database as you are controlling the work flow with the stored procedures.  You can also create a List and Display stored procedure --  The LIST would show all of the entries for a table including any joins and the DISPLAY would show a single entry.

You could also use triggers to handle the additional code to populate the history table.  I am not a fan of triggers, but they have their uses.  They would be needed if the users/application have the rights to use DML commands (Update, Insert, and Delete) against the tables.

Tom
0
rilindAuthor Commented:
Thank you Tom. Can you recommend me an approach (database structure) on how to store the employee documents history, because if I create another table for the employee_docs_rel_history then the joins to retrieve the changing history of each employee and also to retrieve the employee and documents version in a given time would become very complex
0
Scott PletcherSenior DBACommented:
Do you have Enterprise Edition?  If so, I'd suggest you strongly consider Change Data Capture (CDC).  You'll need to change the default retention, which is only 3 days (72 hours), but you can increase it to up to ~100 years (which I've done on some of my tables, since I need some history "forever").

Also, you'll want to write your own stored procs to access the historical data.  The ones SQL provides are a royal pain to use, and can give odd and unexpected errors from, say, requesting a date from the proc that was before CDC was activated.


If you can't/don't want to use CDC, I suggest creating "history" tables using exactly the same structure as the current tables -- that allows existing queries to be used on history almost as easily as on current data.
0
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

ButlerTechnologyCommented:
Adding to Scott's recommendation  -- I usually add a few extra-fields to history tables like date/time of change, who made the change, and what type of change.

I am not 100% sure on the changing of documents that you are asking about.  Can you provide a little more detail and perhaps an example.

Tom
0
Scott PletcherSenior DBACommented:
I actually go to the hassle of keeping the additional info -- who changed, when, etc. -- in a separate table (when I'm not overridden by the programmers, which sadly happens much more often than it should to DBAs).  

My preference is, again, to keep the history table exactly the same as the base table.  If the base table changes, the history tables are changed as well.
0
rilindAuthor Commented:
Thank you Scott, but for the moment I have Standard Edition, and I can't switch to Enterprise due to company policy, consequently I am looking for a resolution with history tables. I was thinking to create a table

employee_docs_rel_history (id, master_id, emp_id, doc_id, document (varbinary(max)), has_document, start_date, end_date, user_id, machine_name)

In this situation I would have two main tables  

employees (id, name, surname, dob, etc. )
employee_docs_rel (id, emp_id, doc_id, document (varbinary(max)), has_document)

and two history tables. For each update in the main tables, a new row is inserted to the history tables, then the main tables are updated. For example, we have a row in the employee table and the corresponding row in the history table:

before update
employees(1, Name, Surname, 01/01/1980, other data)

employees_history(1,1,Name,Surname, 01/01/1980,other data, start_date, null, user_id)

If we change the surname of the employee after marriage then we get

after update
employees_history
(1,1,Name,Surname, 01/01/1980,other data, start_date, end_date, user_id)
(2,1,Name,Surname v2, 01/01/1980,other data, start_date, null, user_id)

employees(1, Name, Surname v2, 01/01/1980, other data)


the same case goes for the documents too.


before update
employee_docs_rel (id, emp_id, doc_id, document (varbinary(max)), has_document)
employee_docs_rel (1, 1, 1, doc1, true)

employee_docs_rel_history (id, emp_id, doc_id, document (varbinary(max)), has_document, start_date, end_date, user_id)

(1, 1, 1, doc1, true, start_date, null, user_id)

after update
employee_docs_rel_history
(1, 1, 1, doc1, true, start_date, end_date, user_id)
(1, 1, 1, doc2, true, start_date, null, user_id)

employee_docs_rel
(1, 1, 1, doc2, true)

We have a version v1 then we update it and add a version v2. The query to get the docs history is a simple select. The same goes for the employees history. Now, since I have lost the references with the history tables the query to get the employee data and the employee docs at a given time becomes too complicated. I was trying to find a better solution.
0
Scott PletcherSenior DBACommented:
You could add an id to the history tables that relates to when, who, etc., made the change.

employee_changes ( change_id int identity(1, 1), when datetime, who varchar(...), description varchar(...), ... )


employees_history ( ..., change_id int FK --> employee_changes )
employee_docs_rel_history ( ..., change_id int FK --> employee_changes )
0
rilindAuthor Commented:
Oh, sorry i forgot to put it in the example but i already have the foreign keys

employees_history(id, master_id (FK to the master table), name, surname, dob, etc., user_id, timestamp, start_date, end_date)

The idea is that i can't link the employee version with the documents version. The only way to link them is the date. I don't like the query (and I think that it will be slow) that will get all the employees with their documents version at a given time. Another downside, from what i have read, is that history tables should not have foreign keys and indexes because they will cause problems in the future
0
Scott PletcherSenior DBACommented:
That fk is completely within the history tables, NOT back to the original table.  It provides a history datetime maker but taking less bytes.

I agree that you don't want to link back to the original table -- wasted overhead.

You could cluster the history tables by the change_id (or modify datetime, if you prefer to store the actual change info in the row itself), which would keep the changes in order, and allow easy and efficient joins between history tables for a given period of time.  But it would not allow you to query a single employee's changes efficiently.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
rilindAuthor Commented:
Thank you for your help Scott. Now I'm closer to the final approach.

Correct me if i'm wrong. In the example above, when I change the surname of an employee,
I should add a row at the employee_changes table and then a row at the employee_history and another row at the employee_docs_rel_history although the documents weren't changed? (Database may grow too large because of docs)
After that i'll have to change the employee row at the master table? Right now i'm saving the employee documents as varbinary(max) filestream.
0
Scott PletcherSenior DBACommented:
No, you'd only add a row to a history table if the row actually changes.

If you want to reconstruct the rows for all tables at a given period of time, you'd still have to write code for that, since CDC is not available to you (CDC has functions that can do the equivalent of that for you).
0
rilindAuthor Commented:
As this was my question from the beginning, can you give me a simple example on how to reconstruct the rows for all tables at a given period of time? Is this the best approach to have each history in a separate table?
0
Scott PletcherSenior DBACommented:
Sorry, but I don't think there's a "simple" way to do that.  It's a complex task to create a point-in-time look at a given row's history across even a single table, let alone multiple rows across multiple tables.

Initially, you might even want to use a cursor(s) [yikes!] until you fully understand all the logic required, then convert it into set-based code.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.