Hi,

I have a query that inserts data into a table from a data logging system. My issue is that when the system is offline for a prolonged period, I get gaps in the data. The data is an accumulating meter reading, and my query calculates the change between samples and saves it in the DataChange column. When a gap is present, the DataChange is huge for the first sample DateTime after the system comes back online.

I've attached a sample data file to this question to show my issue. If you look at 11/02/2014 (which is when the system went offline), samples are missing up to 8/3/2014, producing a huge DataChange.

What I need to do is run an insert query on this table and insert rows at 15-minute intervals up until 15 minutes prior to 8/3/2014, when the system came back online. I also need to populate the DataChange and Data columns. To populate these, I'd calculate the difference between the Data values (in this example, 443023 - 428475), divide it by the number of inserted rows, use that value as the DataChange for each inserted row, and accumulate the Data column by adding that value row by row.

SampleData.txt
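The gap-filling steps described above can be sketched in a few lines. This is a minimal illustration, not the final SQL: the function name `fill_gap` and the sample timestamps are hypothetical, and it uses pure integer apportionment of the meter difference across the missing 15-minute slots.

```python
from datetime import datetime, timedelta

def fill_gap(start_time, start_data, end_time, end_data, interval_mins=15):
    """Generate the 15-minute rows needed to bridge a logging gap.

    Returns (timestamp, data, datachange) tuples for the inserted rows only;
    the existing row at end_time is left in place to absorb any remainder.
    """
    step = timedelta(minutes=interval_mins)
    blocks = int((end_time - start_time) / step)     # whole 15-min blocks in the gap
    per_row = (end_data - start_data) // blocks      # integer datachange per inserted row
    rows, data, t = [], start_data, start_time + step
    for _ in range(blocks - 1):                      # exclude the existing end row
        data += per_row
        rows.append((t, data, per_row))
        t += step
    return rows

# Hypothetical gap: offline from 13:00 (reading 413157) until 14:30 (reading 413220)
rows = fill_gap(datetime(2014, 2, 11, 13, 0), 413157,
                datetime(2014, 2, 11, 14, 30), 413220)
```

The real insert would do the same arithmetic in SQL against the meter table, but the shape of the output (evenly spaced timestamps, evenly apportioned Data values) is the same.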
Even though ID might be unique, it is not sequential by timestamp (per meterID) and that is what is needed for our joins. A predictable sequential number that can be pointed to with precision.

You'll note that point 8 above says "thought it best to use meterID with row_number()"

And I used this in the import, and it looks like we should have it in the CTE. The column 'rn' represents the computed row number.


As Anthony previously indicated, the row_number() provides us with a means to create a sequential integer with increments of 1 on the fly.
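To make that concrete, here is a small sketch of what `ROW_NUMBER() OVER (PARTITION BY meterID ORDER BY SampleDateTime)` produces. The sample rows and column order are made up for illustration; the point is that each meter gets its own 1, 2, 3... sequence regardless of gaps in ID.

```python
from itertools import groupby
from operator import itemgetter

samples = [  # (meter_id, sample_dt, data) -- illustrative values only
    (1, "2014-02-11 13:00", 413157),
    (1, "2014-02-11 14:30", 413220),
    (2, "2014-02-11 13:00", 428475),
]

# Equivalent of ROW_NUMBER() OVER (PARTITION BY meterID ORDER BY SampleDateTime):
# sort by meter then timestamp, and number within each meter's group from 1.
numbered = []
for meter_id, group in groupby(sorted(samples), key=itemgetter(0)):
    for rn, row in enumerate(group, start=1):
        numbered.append(row + (rn,))
```

With `rn` in the CTE, joining a row to its predecessor is simply `a.rn = b.rn + 1` within the same meterID, which is exactly the "predictable sequential number" the joins need.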

In terms of rounding, while we could calculate down to a decimal place or two, so far the DATA column looks very much like an integer, so I was keeping that very much in mind. It will still be very accurate, because we are apportioning integer increments over the newly inserted rows, and the following (real) row retains its correct value, effectively absorbing the remaining difference left over from the inserted values.

e.g. from the data samples...


With time rounding, we have 15:30 - 13:00.

Based on the difference in timestamp, we get 6 blocks of 15 minutes.

So we take DATA difference and get 413220 - 413157 = 63

That means (integer arithmetic) 63 / 6 = 10 remainder 3

So we need to insert another 5 new rows (calculated as 6 but we must exclude current)
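Putting those three steps together (the DATA difference, the integer division, and the exclusion of the current row), a quick sanity check of the arithmetic:

```python
prev_data, next_data = 413157, 413220
blocks = 6  # 15-minute blocks between the two real samples

# Integer apportionment: 63 / 6 = 10 remainder 3
per_row, remainder = divmod(next_data - prev_data, blocks)

# 5 fake readings (6 blocks, but the current/real row is excluded)
inserted = [prev_data + per_row * i for i in range(1, blocks)]

# The untouched last row's datachange picks up the remainder
last_datachange = next_data - inserted[-1]
```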


The last row is untouched, but it effectively gets a small inflation (i.e. 13 instead of 10) in its difference from the prior row, because that difference includes the remainder.

I did assume (without consultation) that a small skew towards that last row is worthwhile because of the "fake" readings in between, and that last row being a real one.

By using decimal arithmetic we can minimise that last gap; even then, if there is any rounding, that's where the remainder will end up.
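For comparison, the same example using decimal arithmetic. Here 63 / 6 divides exactly, so the last row's datachange matches the per-row value; with a less convenient difference, any rounding residue would still land on that last real row.

```python
from decimal import Decimal

prev_data, next_data = Decimal(413157), Decimal(413220)
blocks = 6

per_row = (next_data - prev_data) / blocks  # 10.5, exact in this example

inserted = [prev_data + per_row * i for i in range(1, blocks)]
last_datachange = next_data - inserted[-1]  # no skew: same 10.5 as the fake rows
```

Whether that extra precision is worth changing the DATA column's data type is the judgment call discussed above; if DATA stays integer, the integer apportionment with the remainder on the last row is the simpler choice.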

So, incorporating that initial layer to calculate RN, leaving any DATA calculations to follow the underlying data type, and excluding ID from the insert, our query would now look like:


And the prior comments about performance, indexes, and the DataChange calculation (update) for the "next" row in the sequence all still apply.