SQL scripted data correction

Using SQL Server 2012, I perform a manual process each month to clean up erroneous data captured by the system. Yes, it would be better to stop the bad data at the source, but that's harder than it sounds.
I would like to script this entire process so I can run it with a single call, or even automate it so that it runs before the SSRS report is generated. But I can't figure out how to put all the hard pieces together with what I have now.

An example from today's work: I receive a trend for the month with unexplained spikes in the data. I note the date and run the following:
DECLARE @StDate  DATETIME
DECLARE @EndDate DATETIME
SET @StDate = '6/06/2015 00:00:00.000'       -- <<< SET BEGIN DATE
--SET @EndDate = DATEADD(month, 1, @StDate)  -- One month
SET @EndDate = DATEADD(day, 1, @StDate)      -- One day

SELECT DATEADD(day, DATEDIFF(day, 0, TimeAndDate), 0) AS ReadingDate,
       TimeAndDate, Value, id, tagid, _COMMENT
FROM MyTable
WHERE TagID = 3004
  AND Value > 0.06  -- <<< Varies; set to a reasonable cutoff based on surrounding values
  AND TimeAndDate >= @StDate
  AND TimeAndDate < @EndDate



The results look like this:
ReadingDate	TimeAndDate	Value	id	tagid
2015-06-06 00:00:00.000	2015-06-06 20:00:00.953	0.0713222	8253107	3004
2015-06-06 00:00:00.000	2015-06-06 20:04:59.963	0.07127935	8253134	3004
2015-06-06 00:00:00.000	2015-06-06 20:09:59.947	0.07303441	8253161	3004
2015-06-06 00:00:00.000	2015-06-06 20:30:00.117	0.07020926	8253269	3004
2015-06-06 00:00:00.000	2015-06-06 20:35:00.107	0.1464455	8253296	3004 <<<<< Invalid Data Captured
2015-06-06 00:00:00.000	2015-06-06 20:40:00.193	0.07341963	8253323	3004
2015-06-06 00:00:00.000	2015-06-06 20:50:00.287	0.07256353	8253377	3004
2015-06-06 00:00:00.000	2015-06-06 21:50:00.563	0.07243514	8253701	3004
2015-06-06 00:00:00.000	2015-06-06 22:10:00.280	0.07252073	8253809	3004



So I find the ID number of the bad record and run this:
UPDATE MyTable
SET Value = 0.073,  -- <<< SET CALCULATED (reasonable) VALUE HERE
    -- Preserve the original reading in the comment field (only if no comment exists yet)
    _COMMENT = LTRIM(ISNULL(_COMMENT, 'Orig Value ' + CAST(Value AS VARCHAR(20))))
WHERE id = 8253296  -- <<< ENTER ID OF INVALID DATA HERE
GO
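If it helps to confirm the fix in the same batch, SQL Server's OUTPUT clause can return the before and after values from the UPDATE itself. A small sketch, assuming the question's MyTable and its columns:

```sql
-- Report what changed as part of the UPDATE itself.
-- All column references in SET and OUTPUT deleted.* see the pre-update values.
UPDATE MyTable
SET Value = 0.073,
    _COMMENT = LTRIM(ISNULL(_COMMENT, 'Orig Value ' + CAST(Value AS VARCHAR(20))))
OUTPUT deleted.id, deleted.Value AS OldValue, inserted.Value AS NewValue
WHERE id = 8253296;
```

This removes the need for a separate SELECT afterward just to verify the single-row change.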


Which then produces this:
ReadingDate	TimeAndDate	Value	id	tagid	_COMMENT
2015-06-06 00:00:00.000	2015-06-06 20:00:00.953	0.0713222	8253107	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:04:59.963	0.07127935	8253134	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:09:59.947	0.07303441	8253161	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:15:00.040	0.06931031	8253188	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:20:00.037	0.06853986	8253215	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:25:00.130	0.06926751	8253242	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:30:00.117	0.07020926	8253269	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:35:00.107	0.073		8253296	3004	Orig Value 0.146445 <<< CORRECTED VALUE
2015-06-06 00:00:00.000	2015-06-06 20:40:00.193	0.07341963	8253323	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:45:00.183	0.06935316	8253350	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:50:00.287	0.07256353	8253377	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 20:55:00.267	0.06931031	8253404	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 21:00:00.253	0.06776935	8253431	3004	NULL
2015-06-06 00:00:00.000	2015-06-06 21:05:00.350	0.06717008	8253458	3004	NULL



The issue I would like to solve is that the value I enter is an approximation based on the values before and after. It just has to be close; we are looking for a trend, not exact numbers.
How would I automatically calculate a value to place in that single record, and how would I do that for a month's worth of data?
I would need to run through the month's values looking for a number that is WAY higher than both its predecessor and its successor, then work out a reasonable mid value that fits the direction the data is trending.

Does this make sense? It does in my head. :)
Salad-Dodger (Instrumentation) asked:

chaau commented:
Yes, you can do this, as demonstrated by this SQL Fiddle.

Use the LAG and LEAD functions to get the previous and next values:
WITH a AS (
    SELECT *,
        LAG(value)  OVER (PARTITION BY tag_id ORDER BY id) AS preValue,
        LEAD(value) OVER (PARTITION BY tag_id ORDER BY id) AS postValue
    FROM MyTable
)
SELECT * FROM a


Now, use a condition to locate the values that are way greater than the average of the previous and next values:
WITH a AS (
    SELECT *,
        LAG(value)  OVER (PARTITION BY tag_id ORDER BY id) AS preValue,
        LEAD(value) OVER (PARTITION BY tag_id ORDER BY id) AS postValue
    FROM MyTable
)
SELECT * FROM a
WHERE preValue IS NOT NULL AND postValue IS NOT NULL
  AND ((preValue + postValue) / 2.0) * 1.5 < value;


You can adjust the coefficient 1.5 to catch all the incorrect values.
Then run the UPDATE command to fix the data (using the same coefficient):
WITH a AS (
    SELECT *,
        LAG(value)  OVER (PARTITION BY tag_id ORDER BY id) AS preValue,
        LEAD(value) OVER (PARTITION BY tag_id ORDER BY id) AS postValue
    FROM MyTable
)
UPDATE a
SET value = (preValue + postValue) / 2.0
WHERE preValue IS NOT NULL AND postValue IS NOT NULL
  AND ((preValue + postValue) / 2.0) * 1.5 < value
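One thing the original manual process preserved that this UPDATE does not is the old reading in the _COMMENT field. A sketch that combines the two, using the same column names as the answer's fiddle (the question spells the tag column `tagid`); note that all column references in the SET clause see the pre-update values, so _COMMENT receives the original reading:

WITH a AS (
    SELECT *,
        LAG(value)  OVER (PARTITION BY tag_id ORDER BY id) AS preValue,
        LEAD(value) OVER (PARTITION BY tag_id ORDER BY id) AS postValue
    FROM MyTable
)
UPDATE a
SET _COMMENT = ISNULL(_COMMENT, 'Orig Value ' + CAST(value AS VARCHAR(20))),
    value = (preValue + postValue) / 2.0
WHERE preValue IS NOT NULL AND postValue IS NOT NULL
  AND ((preValue + postValue) / 2.0) * 1.5 < value;

Updating through a CTE like this is allowed in SQL Server as long as the columns being set map to a single base table, which they do here.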


All fixed:
|            TimeAndDate |       Value |      id | tag_id |
|------------------------|-------------|---------|--------|
| June, 06 2015 20:00:00 |   0.0713222 | 8253107 |   3004 |
| June, 06 2015 20:04:59 |  0.07127935 | 8253134 |   3004 |
| June, 06 2015 20:09:59 |  0.07303441 | 8253161 |   3004 |
| June, 06 2015 20:30:00 |  0.07020926 | 8253269 |   3004 |
| June, 06 2015 20:35:00 | 0.071814445 | 8253296 |   3004 |
| June, 06 2015 20:40:00 |  0.07341963 | 8253323 |   3004 |
| June, 06 2015 20:50:00 |  0.07256353 | 8253377 |   3004 |
| June, 06 2015 21:50:00 |  0.07243514 | 8253701 |   3004 |
| June, 06 2015 22:10:00 |  0.07252073 | 8253809 |   3004 |


Vitor Montalvão (MSSQL Senior Engineer) commented:
Salad-Dodger, do you still need help with this question?
Salad-Dodger (Instrumentation, Author) commented:
Apologies, it was a holiday weekend here and I (and my wife) were determined not to let work intrude this time.
It looks like chaau has provided an intriguing solution, but I have not yet seen if I can implement it. I will do that early in the week. I'm using 2012, so I hope that doesn't present a problem; SQL Fiddle doesn't provide a 2012 option.

Did you have an alternative solution?
Vitor Montalvão (MSSQL Senior Engineer) commented:
Not really. The LAG and LEAD functions were introduced in SQL Server 2012 to make this kind of solution easier.
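As a quick illustration of the two functions (a sketch against the question's MyTable): both take an optional offset and a default value, so the NULLs at the edges of a partition can be avoided if that is ever useful:

```sql
SELECT id, Value,
       -- previous row's value, falling back to the row's own value on the first row
       LAG(Value, 1, Value)  OVER (ORDER BY id) AS prevValue,
       -- next row's value, falling back to the row's own value on the last row
       LEAD(Value, 1, Value) OVER (ORDER BY id) AS nextValue
FROM MyTable
WHERE TagID = 3004;
```

The accepted answer deliberately leaves the defaults off and filters out the NULL edge rows instead, since an edge reading has no neighbour on one side to average against.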
Salad-Dodger (Instrumentation, Author) commented:
Oh good. 2012 is what I use, because I used the PERCENTAGE function elsewhere.
I noticed SQL Fiddle only had 2008 and 2014 as options, so I got a little concerned that I would be on an island with this version.
Vitor Montalvão (MSSQL Senior Engineer) commented:
Use 2014 then. It's almost a tuned 2012 ;)
Salad-Dodger (Instrumentation, Author) commented:
Nice answer. Quick, too. I'm having a little trouble integrating it into my system, but that's clearly my issue and I shouldn't keep the thread open because of that. Thank you a great deal for the help.
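For the automation goal in the original question (a single call, runnable before the SSRS report), the accepted answer's detection-and-fix logic could be wrapped in a stored procedure and scheduled, for example as a SQL Server Agent job step. This is a sketch only: the procedure name and the @Coeff parameter are assumptions, and the table and column names are taken from the question:

```sql
-- Hypothetical wrapper procedure; name and @Coeff default are assumptions.
CREATE PROCEDURE dbo.FixTagSpikes
    @TagID   INT,
    @StDate  DATETIME,
    @EndDate DATETIME,
    @Coeff   FLOAT = 1.5  -- spike threshold from the accepted answer
AS
BEGIN
    SET NOCOUNT ON;
    -- Filtering inside the CTE means LAG/LEAD only see rows in this
    -- tag/date window, so readings at the window edges are left alone.
    WITH a AS (
        SELECT *,
            LAG(Value)  OVER (PARTITION BY tagid ORDER BY id) AS preValue,
            LEAD(Value) OVER (PARTITION BY tagid ORDER BY id) AS postValue
        FROM MyTable
        WHERE TagID = @TagID
          AND TimeAndDate >= @StDate AND TimeAndDate < @EndDate
    )
    UPDATE a
    SET _COMMENT = ISNULL(_COMMENT, 'Orig Value ' + CAST(Value AS VARCHAR(20))),
        Value = (preValue + postValue) / 2.0
    WHERE preValue IS NOT NULL AND postValue IS NOT NULL
      AND ((preValue + postValue) / 2.0) * @Coeff < Value;
END
```

A monthly run would then be a single call, e.g. `EXEC dbo.FixTagSpikes @TagID = 3004, @StDate = '20150601', @EndDate = '20150701';`, which an Agent job could execute ahead of the report.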