Solved

Modify Query to allow Drift in DateTime for join

Posted on 2012-03-15
30
269 Views
Last Modified: 2012-05-07
Hi,

A while back I posted a question that I thought was complete, I've now found that it does not return data as expected.

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_26855116.html

Bascially, I need the query to Join the 2 columns even if the Timestamp is different by 2mins, so If there is TLInstance 1 with timestamp '24/02/2012 16:45' and TLInstance 2 with timestamp '24/02/2012 14:46' the Timestamp os TLInstance 1 is used as the Timestamp for the returned Data, and the 'TLData.Data' of both is joined on a single row against the '24/02/2012 16:45' Timestamp.

Currently I get 2 rows returned for very similar Timestamps, as seen in the attached data.
CalLabTrendReport.xls
0
Comment
Question by:wint100
  • 9
  • 9
  • 7
  • +2
30 Comments
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Are those the only columns you are joining on?

The simplest solution would be to subtract the time between these two columns and add a condition in the Where clause to accept rows where this difference is between 0 and 2 minutes.

A bit more efficient would be to join the tables on the date value of this timestamp and further limit the rows with the <= 2 minute time difference.
0
 
LVL 34

Expert Comment

by:James0628
Comment Utility
First of all, the 2 times in your post were '24/02/2012 16:45' and '24/02/2012 14:46'.  I'm assuming that the second one should be 16:46.

 My first impression was that you were trying to Join 2 tables when the times weren't quite the same.  That's what "Join the 2 columns even if the Timestamp is different by 2mins" sounds like to me.

 But now I think you're trying to combine rows with times that are "similar".

 Or is it both?

 Either way, how do you define "similar"?  You said if they're different by 2 minutes, but what if you have 24/02/2012 16:45, 24/02/2012 16:46, 24/02/2012 16:47, 24/02/2012 16:48 and 24/02/2012 16:49?  Which of those are "similar", and how would the query know which ones belong together?

 James
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Combining rows with timestamp +/-2mins would work. The system logging data logs every 15mins so any value within the 2 min window can be considered the same row.

The colums are currently joined on TLinstance and timestamp. If the timestamps are within a minute the rows join fine, it is only when the minute is different that the problem arises.

Open to any suggestions.
0
 
LVL 35

Expert Comment

by:David Todd
Comment Utility
Hi,

psuedo code
select blah
from blahblah
join blahblahblah
    on abs( timestamp - TLinstance ) < 2minutes
where
    somewhereclause

Note that this might cause the join to perform rather badly. I'd prefer to use datediff if both columns where datetime.

HTH
  David
0
 
LVL 35

Expert Comment

by:David Todd
Comment Utility
PS Should the drift be one way, then could dispense with the abs function, which might improve performance. Or rather, the query thingys ability to find a more optimal plan.
0
 
LVL 34

Expert Comment

by:James0628
Comment Utility
Will there never be more than 2 rows that need to be combined, or could there be more, as in my example?

 If there could be more, I'm not sure how you would define your 2 minute windows.  For example, 16:47 is within 2 minutes of all of the times in my example, but 16:45 is only within 2 minutes of 16:46 and 16:47.

 James
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
There can be up to 6, but I think we need to set the first TLInstance as the base timestamp, so if the Timestamp of TLInstance1 is 16:45, there others are +/- 2 mins of this timestamp
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
There are several different approaches to similar data 'windowing' and matching problems.  
* When you find two items, you might combine them and average their time, returning these to the pool for subsequent rounds of match-and-combine processes.
* Find a window/episode where there is a clearly defined first and last item in the chain
* Employ graph theory techniques to traverse the data
* Use some extra data to help relate/correlate the data
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Hmmmm.... looking at the other question and the sample spreadsheet and trying to digest / reconcile the two.

What if you were to "round" dates to the same five,ten or fifteen minutes ? Meaning, if it is OK to assume the value of the first one, then would it be OK to assume an appropriate global interval ?

Looking at the sample data from the report / spreadsheet it would appear that you want the two lines :

03/02/2012 17:22      20.0402                              
03/02/2012 17:23                  34.2513                  

to appear as the one.

Now, considering they are both using TLDATA to get the timestamp, then can only assume that the difference in timestamps is because of the subtle difference between the TLINSTANCES and ideally they all have the same timestamps and intervals.

What I am unsure of is why is there a union all query just for tl.name ? meaning the name is going to be the same for any one specific instance and isnt it going to repeat the same value vertically under (say for example) the A_Name column ? Wouldnt it be better to have the column name of the actual data element be the tlinstance name ?

Is this a straight query, or, an expreact from a procedure (or, is it OK to have a procedure) ?
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
The global interval is usually 15mintues for all samples.

Yes, I do need those lines to appear on the same line, with 17:22 as the timestamp.

The difference in timestamp is because the TLInstance defines a different sample. In the supplied report, TLInstance1 is sampling Temperature data and TLInstance 2 is sampling Humidity data. Each has its own timestamp and we then need to join the 2 together into the report.

The TL table holds the Name of the sample, and has a reference for TLInstance so we can lookup the name of each TLInstance to display in thre report.

This should ideally be a straight query, as it is embedded into an ASP.NET class file.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I would recommend that you process the raw data on the server, aligning the rows before rendering them at the browser.
0
 
LVL 35

Expert Comment

by:David Todd
Comment Utility
Hi,

Just a thought based on the above comment:
>> rendering them at the browser.

What is the resolution you require at the browswer? Is it fifeteen minutes? Why not average the values on 15 min windows say, and then join them.

eg
group by dateadd( minute, round( datediff(  minute, 0, loggeddatetime ) / 15, 0, 1 ) * 15 , 0 )

Note that the thrird parameter in the rounding, instead of a default of 0, any other value creates truncation, which is what we want here.

HTH
  David
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
OK,

And what I really meant was "Stored Procedure" (because of the variables/params in the query) and have no idea what "expreact" means, should have been "extract", but I digress...

If you round back to the nearest quarter hour, would that work for you ?

So that time like 11:15:00 and 11:16:00 and 11:17:00 and 11:29:00 all 'belong' to 11:15:00

But 11:29:00 and 11:31:00 belong to two different quarter hours (11:15 and 11:30 respectively).

If so, then have a look at the timestamps produced by :

select dateadd(minute,(datediff(minute,0,timestamp)/15)*15,0), timestamp
from tldata

And then, no need for string truncation in timestamp formatting either - or - leave it to the final selection or display (although, with pivot, you do need the column names).
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
The reason for the comment about the SP is you could do something like :

declare @sql varchar(8000)
declare @ins varchar(8000)

select @ins = isnull(@ins+',','') + convert(varchar,tl.tlinstance) from tl where TLInstance in (@TLInstance1,@TLInstance2,@TLInstance3,@TLInstance4,@TLInstance5,@TLInstance6) group by tl.tlinstance
select @sql = isnull(@sql+',','') + '['+tl.name+']' from tl where TLInstance in (@TLInstance1,@TLInstance2,@TLInstance3,@TLInstance4,@TLInstance5,@TLInstance6) group by tl.name

set @sql = 'select * from (  
SELECT dateadd(minute,(datediff(minute,0,timestamp)/15)*15,0) as timestamp, tldata.data, tl.name 
FROM TLData
INNER JOIN TL on TLDATA.TLINSTANCE = TL.TLINSTANCE
WHERE TLDATA.TLINSTANCE in ('+@ins+') and isnumeric(data)=1 and Type=0 and (timestamp>'''+convert(varchar,@SDate,13)+''' and Timestamp < '''+convert(varchar,@eDate,13)+''')

) srce
PIVOT
( max(data) for name in ('+@sql+')) pvt'

exec (@sql)

Open in new window


which is more in keeping with the spreadsheet output (and works a bit quicker).
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Oops, sorry dtodd, didnt see your post and just going through emails now... GMTA, similar approach, but dont think the group by is needed because of the pivot.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@mark

Would any of the windowing keywords, such as OVER, be helpful?

Although 15 minute windows would be helpful, the problem description (as I read it) seemed to indicate that the different 15 minute windows weren't synced to the same 15 minute start/end points.
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
@aikimark,

Dont think so. The basic problem is a disparity between the timestamp of the individual instances, and the need to transpose the instance into column names with the row being identified by that (occassionally disparate) timestamp.

So, the underlying challenge is to make sure the raw data is going to have the same row identifiers across the range of instances.

Now, if there was every single timestamp such the the first time for each instance corresponded and the 10th or 15th or 19th, 30th etc where also all in perfect symmetry, then you could consider using some of the ranking type window functions (ie use the over clause). But if instance 4 (for example) was missing the time at 11:35 am then the rest of the times for that instance become misaligned. Combine that with a couple of other instances then it becomes a dogs breakfast.

Also consider if one of the instances changed frequency of count from 15 minutes to (say) every 10 minutes, or half hour. There could be huge discrepancies.

But if they do correspond and there is a huge (100%) reliability of data such that there were no chances of misalignment, then, something like:

 Row_number() over (partition by tlinstance order by timestamp)

could then be used - but - every instance must have the same number of and sequence of timestamps to consider breaking away from trying to align by using timestamp itself.

Does that make sense ?
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Sorry for my absence, I'm back on this now.

I can't really round 11:29 to the 11:15 timestamp as it will be too far off the actual timestamp. It may have just be a 2min window around the timestamp of the first timestamp.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
That brings me back to my prior question about other criteria used for grouping
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
These to optiosn sound interesting:

* When you find two items, you might combine them and average their time, returning these to the pool for subsequent rounds of match-and-combine processes.
* Find a window/episode where there is a clearly defined first and last item in the chain
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
From the sample data provided it seems to be a regular 15 minute interval, and the second instance for whatever reason deviates ever so slightly but when measuring in minutes might well be (could be) a matter of seconds / milliseconds to go to the extra minute. Doesnt appear to be multiple minutes (though might be over a greater range of instances).

Also appears that there is a corresponding / matching entry for each time slot.

Do any time slots ever start exactly on the quarter hour (whereby rounding has a greater impact) ?

Is there going to be a single most appropriate entry for all instances (ie matching give or take a minute or so) based on the time of the first (or any particular instance) ?

Is there going to be a time at each and every interval for each and every instance albeit just very slightly different ?

Is there a constant and regular interval from the first instance that all others *should* have a corresponding entry ? Or can that interval change as well (like being normally 15 minutes apart then sometimes 13 or 17 minutes or even half hour for other instances)

If the expected interval is fifteen minutes, can a deviation ever exceed (say) 7 minutes whereby an incorrect match *might* happen ?

Is it possible to get a set of *worse case* sample data with expected results ?
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Max deviation is only 2-3min max.

Also appears that there is a corresponding / matching entry for each time slot. - TRUE

Do any time slots ever start exactly on the quarter hour (whereby rounding has a greater impact) ? - SOME DO YES, ANY NEW REPORTS I START DO EXACTLY ON THE 15MIN PAST THE HOUR, BUT LEGACY ARCHIVES ARE JUST EVERY 15MINS FROM WHEN THEY STARTED

Is there going to be a single most appropriate entry for all instances (ie matching give or take a minute or so) based on the time of the first (or any particular instance) ? - THE TIMESTAMP OF THE FIRST TLINSTANCE SHOULD BE THE 'BASE' TIMESTAMP, WHEREBY ALL OTHER SAMPLES CAN DEVIATE +/- 2MINS

Is there going to be a time at each and every interval for each and every instance albeit just very slightly different ? - YES

Is there a constant and regular interval from the first instance that all others *should* have a corresponding entry ? Or can that interval change as well (like being normally 15 minutes apart then sometimes 13 or 17 minutes or even half hour for other instances) - SHOULD ALWAYS BE THE SAME INTERVAL

If the expected interval is fifteen minutes, can a deviation ever exceed (say) 7 minutes whereby an incorrect match *might* happen ? - IF THE DEVIATION IS GREATER THAN 2 MINS, NO MATCH SHOULD BE MADE, AND A NEW ROW ADDED AS IN THE SAMPLE DATA, WE CAN'T REALLY MATCH UP SAMPLES > 2MINS OUT OR IT IS TOO 'FAKED'

I've attached a report for Feb2012 that has 3 columns per page, but 6 TLInstances in total. (The other 3 continue after the first 3 have reached 29th Feb). You can see the sampels here.
BMSReport.xls
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Well, I hope a Stored Procedure is OK because I reckon it is by far the better approach and generates the output more in keeping with the spreadsheet.

Have a look at :

create procedure usp_get_tlinstance_pivot_data(@TLinstance1 int, @TLinstance2 int, @TLinstance3 int, @TLinstance4 int, @TLinstance5 int, @TLinstance6 int, @start_date datetime, @end_date datetime, @precision int)
as
BEGIN

-- should really validate the params first, then populate internal use variables

   declare @I1 int
   declare @I2 int
   declare @I3 int
   declare @I4 int
   declare @I5 int
   declare @I6 int
   
   declare @sdate datetime
   declare @edate datetime
   
   declare @sql varchar(8000)
   declare @ins varchar(8000)

   declare @p char(1)

   set @sdate = @start_date
   set @edate = @end_date

   set @I1 = @TLinstance1
   set @I2 = @TLinstance2
   set @I3 = @TLinstance3
   set @I4 = @TLinstance4
   set @I5 = @TLinstance5
   set @I6 = @TLinstance6

-- @p = precision in terms of how many minutes our deviation can tolerate

   set @p = '3'

   if isnull(@precision,0) between 1 and 5 set @p = @precision

   set nocount ON

-- Now get a list of resulting column names based on the TLINSTANCE name

   select @sql = isnull(@sql+',','') + '['+tl.name+']' from TL where TLInstance in (@I1,@I2,@I3,@I4,@I5,@I6) group by tl.name

-- Now get a list of TLINSTANCE's to use in our dynamic sql below - doing a select also validates legitimate TLINSTANCES, not just numeric ones.

   select @ins = isnull(@ins+',','') + convert(varchar,tl.tlinstance) from TL where TLInstance in (@I1,@I2,@I3,@I4,@I5,@I6) group by tl.tlinstance

-- Now we can build our SQL procedure
-- Basically a CTE query with a PIVOT

   set @sql = ';with TLC as
   (select TL1.tlinstance as TL_Instance, TL1.timestamp as TL_Timestamp, dateadd(minute,-'+@p+',TL1.timestamp) as TL_lower, dateadd(minute,'+@p+',TL1.timestamp) as TL_upper, TL1.data as TL_data, TL.name as TL_name 
    from  tldata TL1 
    inner join TL on TL1.tlinstance = TL.tlinstance
    Where TL1.tlinstance in ('+@ins+')
    and isnumeric(TL1.data)=1 
    and TL1.Type=0 
    and (timestamp>'''+convert(varchar,@SDate,13)+''' and Timestamp < '''+convert(varchar,@eDate,13)+'''))
   Select * from
   (Select tldata.Timestamp as Sample_Time,tlc.TL_name ,tlc.TL_Data
    from TLC
    inner join tldata on tldata.tlinstance = '+convert(varchar,@tlinstance1)+' and tldata.timestamp between TL_lower and TL_upper) src
   PIVOT
   (max(tl_data) for tl_name in ('+@sql+')) pvt'

-- Then we execute our query

   exec (@sql)
END
GO

-- now we can test our stored procedure

exec usp_get_tlinstance_pivot_data 1,11,21,12,NULL,NULL,'20100101','20100201',3

Open in new window


Now, we might need to do some more work on sequencing the column names in the correct order (ie matching the sequence requested by ordinal position in sp params)
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
1. Looking at your data, I would first delete or exclude rows where the 8000_Space Temp 2 TL Archive value was Null or an empty string.  That reduces some of the workload by 359 rows.

2. Do you want to add the values or other operation?

3. Should the timestamp for the combined row be the lower or the higher time value?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I dumped one of your worksheets into Access and tried to run the following query.  Unfortunately, Access doesn't consider this as an updateable result.  Maybe SQL Server will be more understanding.

UPDATE BMS1, BMS1 AS BMS1_1 
SET BMS1.[8000_Space Temp 2 TL Archive] = [bms1].[8000_Space Temp 2 TL Archive]+[bms1_1].[8000_Space Temp 2 TL Archive], 
BMS1.[8000_Space Hum 2 TL Archive] = [bms1].[8000_Space Hum 2 TL Archive]+[bms1_1].[8000_Space Hum 2 TL Archive], 
BMS1.[8000_Space Temp 3 TL Archive] = [bms1].[8000_Space Temp 3 TL Archive]+[bms1_1].[8000_Space Temp 3 TL Archive]
WHERE (((BMS1.id)=[bms1_1].[id]-1) AND ((BMS1_1.[Sample Time]) Between [bms1].[Sample Time]+(1/1440) And [bms1].[Sample Time]+(3/1440)) AND ((BMS1_1.[8000_Space Temp 2 TL Archive]) Is Not Null));

Open in new window


I also tried it using a (Select ...) statement on the right side of the = operator.
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Sorry for the delay, I've been unable to connect to site for a while.

The SP seems to work well, so my only question is on this comment:

**Now, we might need to do some more work on sequencing the column names in the correct order (ie matching the sequence requested by ordinal position in sp params)**

The columns do seemt o come out in reverse order, so TLInstance6 in columns 1, TLInstance5 in column 2 etc..
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Yeah, thought that might be the case.

So, the correct sequence is the sequence as entered (e.g. the 1st instance param should appear first, and the 4th instance param should be 4th etc) ?
0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Yes, that's right
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
Comment Utility
I am so sorry wint100, I totally lost sight of this...

Here is the correct sequencing of tlinstances :

if object_id('usp_get_tlinstance_pivot_data','P') is not null drop procedure usp_get_tlinstance_pivot_data
go

create procedure usp_get_tlinstance_pivot_data(@TLinstance1 int, @TLinstance2 int, @TLinstance3 int, @TLinstance4 int, @TLinstance5 int, @TLinstance6 int, @start_date datetime, @end_date datetime, @precision int)
as
BEGIN

-- should really validate the params first, then populate internal use variables

   declare @I1 int
   declare @I2 int
   declare @I3 int
   declare @I4 int
   declare @I5 int
   declare @I6 int
   
   declare @sdate datetime
   declare @edate datetime
   
   declare @sql varchar(8000)
   declare @ins varchar(8000)

   declare @p char(1)

   set @sdate = @start_date
   set @edate = @end_date

   set @I1 = @TLinstance1
   set @I2 = @TLinstance2
   set @I3 = @TLinstance3
   set @I4 = @TLinstance4
   set @I5 = @TLinstance5
   set @I6 = @TLinstance6

-- @p = precision in terms of how many minutes our deviation can tolerate

   set @p = '3'

   if isnull(@precision,0) between 1 and 5 set @p = @precision

   set nocount ON

-- Now get a list of resulting column names based on the TLINSTANCE name

   declare @tmp_instances table (param_seq int identity, tlparam int)
   insert @tmp_instances (tlparam) values (@I1) 
   insert @tmp_instances (tlparam) values (@I2)
   insert @tmp_instances (tlparam) values (@I3)
   insert @tmp_instances (tlparam) values (@I4)
   insert @tmp_instances (tlparam) values (@I5)
   insert @tmp_instances (tlparam) values (@I6)
   
   select @sql = isnull(@sql+',','') + '['+tl.name+']' from TL inner join @tmp_instances on tlparam = TLInstance group by tl.name, param_seq order by param_seq

-- Now get a list of TLINSTANCE's to use in our dynamic sql below - doing a select also validates legitimate TLINSTANCES, not just numeric ones.

   select @ins = isnull(@ins+',','') + convert(varchar,tl.tlinstance) from TL where TLInstance in (@I1,@I2,@I3,@I4,@I5,@I6) group by tl.tlinstance

-- Now we can build our SQL procedure
-- Basically a CTE query with a PIVOT

   set @sql = ';with TLC as
   (select TL1.tlinstance as TL_Instance, TL1.timestamp as TL_Timestamp, dateadd(minute,-'+@p+',TL1.timestamp) as TL_lower, dateadd(minute,'+@p+',TL1.timestamp) as TL_upper, TL1.data as TL_data, TL.name as TL_name 
    from  tldata TL1 
    inner join TL on TL1.tlinstance = TL.tlinstance
    Where TL1.tlinstance in ('+@ins+')
    and isnumeric(TL1.data)=1 
    and TL1.Type=0 
    and (timestamp>'''+convert(varchar,@SDate,13)+''' and Timestamp < '''+convert(varchar,@eDate,13)+'''))
   Select * from
   (Select tldata.Timestamp as Sample_Time,tlc.TL_name ,tlc.TL_Data
    from TLC
    inner join tldata on tldata.tlinstance = '+convert(varchar,@tlinstance1)+' and tldata.timestamp between TL_lower and TL_upper) src
   PIVOT
   (max(tl_data) for tl_name in ('+@sql+')) pvt'

-- Then we execute our query

   exec (@sql)
END
GO

Open in new window

0
 
LVL 1

Author Comment

by:wint100
Comment Utility
Thanks Mark
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

Introduced in Microsoft SQL Server 2005, the Copy Database Wizard (http://msdn.microsoft.com/en-us/library/ms188664.aspx) is useful in copying databases and associated objects between SQL instances; therefore, it is a good migration and upgrade tool…
Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
Viewers will learn how the fundamental information of how to create a table.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now