When do record changes occur?

A client recently asked me to start uploading a text file into a SQL Server database each day.  This file is provided by a third party, and reflects current values, not changes.  I've got this upload process working as a simple append query, but it is resulting in a bunch of duplicate values (records that are identical except for the upload_date field).

The SQL table contains fields like:

Upload_Date, Company_ID, Column1, Column2, Column3

I'm seriously thinking about modifying the upload, so that it is not a simple append, but a merge which would use all of the columns to perform the merge and if the record is matched, (everything but the upload_date), then I would simply update a counter field, but if the record is unmatched, I would perform the append, but I need to discuss this with the client first.

In the meantime, what I need to do is create a query which will identify the dates a companies data (Column1, Column2, or Column3) changed. Assuming these 5 columns, with data similar to

UPload_Date          Company_ID         Column1     Column2     Column3
2018-01-01                      1                         a                     b                  c
2018-01-02                      1                         a                     b                  c
2018-01-03                      1                         a                     b                  d
2018-01-04                      1                         a                     b                  d
2018-01-05                      1                         z                      b                 d

I would like a recodset which looks like:

Company_ID      Upload_Date         PrevChange
1                           2018-01-05             2018-01-03
1                           2018-01-03             2018-01-01
1                           2018-01-01             NULL

Any recommendations
LVL 52
Dale FyeOwner, Developing Solutions LLCAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Olaf DoschkeSoftware DeveloperCommented:
You can do this with over, first_value/last_value,lag/lead:

declare @data as table (upload_date date, company_id int, column1 char(1), column2 char(1), column3 char(1))
insert into @data values ('20180101', 1, 'a','b','c'), 
('20180102', 1, 'a','b','c'), ('20180103', 1, 'a','b','d'), 
('20180104', 1, 'a','b','d'), ('20180105', 1, 'z','b','d');

Select  * from @data;

with changes as (select distinct company_id, 
first_value(upload_date) over (partition by company_id, column1, column2, column3 order by upload_date) as change_date
   from @data)

Select *, lag(change_date) over (partition by company_id order by change_date) as prev_change  from changes order by change_date desc;

Open in new window

The "changes" CTE query already gives you the change dates, depending on your exact plans, it might not be necessary to add the PrevChange column, it contains redundant data you already have in the list of change dates.

Bye, Olaf.
Dale FyeOwner, Developing Solutions LLCAuthor Commented:

Thanks, I'll take a look at that.  The reason I'm looking for the two dates is that I want to use this as a subquery, to join back to my data table like

Select Q.Company_ID, Q.Upload_Date, Q.PreviousChange, T1.*, T2.*
FROM subQuery as Q
LEFT JOIN myTable as T1 ON T1.Company_ID = Q. Company_ID AND T1.UPload_Date = Q.Upload_Date
LEFT JOIN myTable as T2 ON T2.Company_ID = Q.Company_ID and T2.Upload_Date = Q.PreviousChange

This will then give me the ability to identify which field(s) changed and the values that changed on those dates.

Ultimately, what I'm looking for is something like:

Company_ID   Upload_Date     ColumnChanged    OldValue          NewValue

So for 1/1/18, it would list all three columns, NULL as the OldValue and the NewValue.
For 1/3/18, it would list a single record with

1                          2018-01-03           Column3                 c                            d

one for 1/5/18:

1                           2018-01-03           Column1                a                            z
Olaf DoschkeSoftware DeveloperCommented:
To know what changed when, it's best to do something like change data capture (CDC), temporal tables or change tracking.

CDC offers a simple way to determine which columns changed, as it sets bits in an _$update_mask binary field for each column changed.

You could also do updates instead of append data and use a "before update" trigger making use of COLUMNS_UPDATED() could store the previous record only in case the bits of that value signal changes.

Bye, Olaf.
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

Dale FyeOwner, Developing Solutions LLCAuthor Commented:

Finally getting a chance to test.  Received error message:

Msg 195, Level 15, State 10, Line 11
'first_value' is not a recognized built-in function name.
Msg 11305, Level 15, State 10, Line 14
The Parallel Data Warehouse (PDW) features are not enabled.

Forgot to mention in the original post that my client is running SQL Server 2008
Olaf DoschkeSoftware DeveloperCommented:
Well, First_Value is one of the "new" window functions introduced in 2012. That's really unfortunate. You can't simply replace that with GROUP BY as col1-3 might change from a,b,c to a,b,d, and back to a,b,c and you still want to get three groups and two change dates from this.

So if you're bound to this older version CDC would be an option, but only available in the enterprise editions.

Temporal tables are even newer and so change tracking seems the obvious solution, see https://technet.microsoft.com/en-us/library/cc280358(v=sql.105).aspx

Anyway, this means a totally different solution strategy and approach by "blindly" applying all data changes to find out when what changed as result. You don't optimize workload of changes to a minimum this way, you just find out the net changes afterwards.

So whether that works for you depends on what's your main goal.

Bye, Olaf.
Dale FyeOwner, Developing Solutions LLCAuthor Commented:
Thanks, Olaf.

I may go to using a merge and do an update if all of the fields (other than the Upload_Date) are the same, and an append if one or more of those fields changes.  This way, every record would represent a change in one value or another.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Olaf DoschkeSoftware DeveloperCommented:
Well, appending is the strategy making your table a history of records.

Using any technique I listed including change tracking you'd simply use updates if only company_id matches and only insert new companies. The history of record states then will result from that. You don't build the history, the history is built for you.

Bye, Olaf.
Dale FyeOwner, Developing Solutions LLCAuthor Commented:
Thanks for your input Olaf,

The client wanted this table to basically be a history table, so I created a MERGE statement, comparing the latest (max UploadDate) record for each of the companies against the newly uploaded recordset.  If all of the non-date fields match, then I simply update a single field in the target recordset, but if any of the non-date fields differ, I create a new record.  This gives me the ability to view the entire history of a company by simply filtering on the CompanyID field.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.