Row versioning data inside a table

Hi guys,

Loooking for view / opinions on the following ( got a debate with someone so thought Id ask everyone here)...

Im going to comapre two ways in which i can see i could version groups of data within a table.

The business need here is that we have forms that have fields which contains values from lookup fields. We are basically creating a table for lookup fields, the requirement is that we should be able to see the easily version these lookups. so when one or more values change we create a new version of our available lookup fields.

Consider the following two tables...

CREATE TABLE [dbo].[Method1](
    [TypeID] [int] IDENTITY(1,1) NOT NULL,
    [TypeName] [nvarchar](100) NOT NULL,
    [Version] [int] NOT NULL,
    [Value] [nvarchar](100) NOT NULL
)

CREATE TABLE [dbo].[Method2](
    [TypeID] [int] IDENTITY(1,1) NOT NULL,
    [TypeName] [nvarchar](100) NOT NULL,
    [ValueID] [int] NOT NULL,
    [Value] [nvarchar](100) NOT NULL,
    [Deleted] [bit] NOT NULL
)

Now in table Method1 data will look like this....

TypeID TypeName Version Value
------- ----------- ---------- -------------------------------------
1      Type 1       1        Some Value 1
2      Type 1       1        Some Value 2
3      Type 2       1        Some Value 1
4      Type 2       1        Some Value 2

To add / remove or even edit values of a given TypeName you create a new version of my TypeName

ie...

TypeID TypeName Version Value
------- ----------- ---------- -------------------------------------
1       Type 1       1       Some Value 1
2       Type 1       1       Some Value 2
5       Type 1       2       Some Value 1
6       Type 1       2       Some Value 2
7       Type 1       2       Some Value 3
3       Type 2       1       Some Value 1
4       Type 2       1       Some Value 2

I kinda see method one as explicitly saying this is my TypeName, i.e. these are its versions and these are its values.

Method2 data looks like this....

TypeID TypeName ValueID Value Deleted
-------- ------------ ---------- --------- ---------
1       Type 1       1       Value 1       0
2       Type 1       2       Value 2       0
3       Type 2       1       Value 1       0
4       Type 2       2       Value 2       0

To add data - create a new row of a given type and new value id

TypeID TypeName ValueID Value Deleted
-------- ------------ ---------- --------- ---------
1       Type 1       1       Value 1       0
2       Type 1       2       Value 2       0
3       Type 2       1       Value 1       0
4       Type 2       2       Value 2       0
5       Type 1       3       Value 3       0

To edit data - create a new row of a given type and reuse the value id

TypeID TypeName ValueID Value Deleted
 -------- ------------ ---------- --------- ---------
1       Type 1       1       Value 1       0
2       Type 1       2       Value 2       0
3       Type 2       1       Value 1       0
4       Type 2       2       Value 2       0
5       Type 1       1       Value 1a      0

To remove data - inset the type name again and mark as deleted

TypeID TypeName ValueID Value Deleted
-------- ------------ ---------- --------- ---------
1       Type 1       1       Value 1       0
2       Type 1       2       Value 2       0
3       Type 2       1       Value 1       0
4       Type 2       2       Value 2       0
5       Type 1       1       Value 1       1

Now i see method two (for better or worse at this stage) as kinda implicit versioning in that you having to rely on the developer knowing that the max type id for typname / valueid pair is the most recent value.

As much as i dislike duplicating data I cant help but feel method one is easier to understand and has benefits in making it easier to report on.

What are peoples views on how to keep versions of groups of data in the table (based on these examples or other methods).

Views much appreciated.
noel58Asked:
Who is Participating?
 
SharathConnect With a Mentor Data EngineerCommented:
We used the second approach in our databse design (Deleted flag). We are comfirtable with this approach. we are dealing with huge volumes (Millions) of data.
In SQL Server 2008, you have the concept of CDC (Change Data Capture). CDC is also uses the same approach. FOr each table, SQL Server will create a CT table which contains the records which are modifed. The field __$operation will have the following values.
1 - Delete
2 - Insert
3 - Old Value (For Update)
4 - New Value (For Update)
0
 
MacNuttinCommented:
The deleted feild is nice because it allow dual use of the table. That is to say you can always rely on deleted flag to be 0 for only one TypeName so JOIN Method2  m2 on xx.other = m2.typename and m2 deleted = 0

Also the Lookup table can show a complete version history.
0
 
noel58Author Commented:
Hi, yeah the delete column could I guess be applied to both.

When you menntion a complete version history - you regarding to method 1?

I guess this is all really boiling down to if its best to copy all the rows or not for each new version.

Regards
0
Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

 
MacNuttinCommented:
Ya from what I can see both methods give you a complete version history. Just that method2 is easier to code to get the current verion with the delete flag Method 1 as you pointed out can too but it has to use select MAX(date) where....
0
 
noel58Author Commented:
where...?

The other thing i also considered is if i wanted to see a version at a particular point in time.

Method 1 could have a date time lookup based on the type name & version number

Method 2 would require each and every row in the type table to have a datetime.

I guess the other thing to consider is that i can get a unique clustered index on cols TypeName, Version, Value in method 1 where as i can't in method 2.

Im feeling to go with version 1 even though id have to work out the max version id for the latest version. I feels it clearer on what the data in the table is trying to express.

What mehod would you personally go for?

Regards
0
 
MacNuttinCommented:
Oh sorry you have no date column select MAX(version) from Method1 where typeName =

My point is you don't want to do that if you have the deleted column - Which in my opinion is better.


I would call it activeFlag or currentStableVersion
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.