keep a historical record of failed SQL job

Dear all,

right now we tried to see how frequent if a SQL failed in daily/weekly/monthly basis, we can create a table to store the result.

the table which store the result can have this schema:

gmt Date/Time
Job id/name
Job Desc
Failure Desc
Executed By User

any script/tools  you all using to do this?
LVL 1
marrowyungSenior Technical architecture (Data)Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Vitor MontalvãoMSSQL Senior EngineerCommented:
Why not query directly from msdb?
SELECT h.server, j.name, h.step_name, h.message, h.run_date, h.run_time, h.run_duration
FROM sysjobs j
	INNER JOIN sysjobhistory h ON j.job_id=h.job_id
WHERE h.run_status=0 -- Failed
ORDER BY h.run_date, h.run_time

Open in new window

0
marrowyungSenior Technical architecture (Data)Author Commented:
this script don't run well on my side so I change it to:

SELECT h.server, j.name, h.step_name, h.message, h.run_date, h.run_time, h.run_duration
FROM msdb..sysjobs j
	INNER JOIN msdb..sysjobhistory h ON j.job_id=h.job_id
WHERE h.run_status=0 -- Failed
ORDER BY h.run_date, h.run_time

Open in new window


as we want to see how frequent is the failure and we prefer to store the failure record to a table with time so that we can measure the frequency  of it.
0
marrowyungSenior Technical architecture (Data)Author Commented:
basically I am testing this script to check the last running result of a job:

;WITH jobListCTE as
(
  SELECT j.name as job_name, 
         msdb.dbo.agent_datetime(run_date, run_time) AS run_datetime,
         RIGHT('000000' + CONVERT(varchar(6), run_duration), 6) AS run_duration,
         message
    FROM msdb..sysjobhistory h
   INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
   WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
) 
,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

SELECT * FROM CTE2 T1
WHERE RN = 1
order by [JobStep]

Open in new window


but your script return more information like failure detail, how to integrate the job steps and the failure message of each steps of it ?
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Vitor MontalvãoMSSQL Senior EngineerCommented:
this script don't run well on my side so I change it to:
You'll need to run it in the msdb context but I saw that you added the reference to msdb to the table name and that's also fine.
0
marrowyungSenior Technical architecture (Data)Author Commented:
yeah.

is there anyway to integrate the job steps and the failure message of each steps of it if it is failed ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
is there anyway to integrate the job steps and the failure message of each steps of it if it is failed ?
JobHistory table has that information but your query is only filtering by succeeded job executions: WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
0
marrowyungSenior Technical architecture (Data)Author Commented:
"JobHistory table has that information but your query is only filtering by succeeded job executions: WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */ "

but I can also filter out only failed record by changing run_status to 0, right?

if I remove run_status as the condition then it return all, agree?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
but I can also filter out only failed record by changing run_status to 0, right?
 if I remove run_status as the condition then it return all, agree?
Right and Agree :)
0
marrowyungSenior Technical architecture (Data)Author Commented:
but how can I find out the failed job each month if i only query the msdb ? must change the log history to make it store record for a least a month ?

from out point of view, creating a table can store as much record as we can only for failed job and we just then query that table without touching the msdb configuration ? agree ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Yes, querying msdb depends on how long it can keeps job execution history so if you need it to keep more records than the current configuration why not increase it instead of building a new table and creating all process to copy data from msdb to the new table?
You're only moving records from place so why not keep them in the original tables but for longer?
0
marrowyungSenior Technical architecture (Data)Author Commented:
so you also in the same page that I can increase the number of log msdb by increasing the number in that page?

SQL job log history
and then just use this script ?

;WITH jobListCTE as
(
  SELECT j.name as job_name, 
         msdb.dbo.agent_datetime(run_date, run_time) AS run_datetime,
         RIGHT('000000' + CONVERT(varchar(6), run_duration), 6) AS run_duration,
         message
    FROM msdb..sysjobhistory h
   INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
   WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
) 
,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

Open in new window

0
marrowyungSenior Technical architecture (Data)Author Commented:
how can I include the "Executed By User" in the above script ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
The "Executed by user" isn't stored in the tables. There's the job owner but I don't know if it useful for you.
0
marrowyungSenior Technical architecture (Data)Author Commented:
tks.

how can we make it works? create a new table now and join one more table and then insert the result to this newly created table will works?

what table we can join to see the result ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
So, you still thinking in going ahead with replicating the msdb tables into a new table.
This will give you some work (not really to create it but to manage it). You already have the initial query and now you'll need to find a way to insert only new rows. Maybe something like running a daily job at midnight (00:00) and insert only rows from the previous date (run_date>=GETDATE()-1).
0
marrowyungSenior Technical architecture (Data)Author Commented:
"So, you still thinking in going ahead with replicating the msdb tables into a new table."

nono. what I need is the information on who execute the job and not the job owner information.

as default information from MSDB don't show it, it makes me think I need an extra table and extra join query to implement this.

I think this is the last information I need from the result.
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
nono. what I need is the information on who execute the job and not the job owner information.
I couldn't find that information nowhere. Will try to dig on this and back to you.
0
marrowyungSenior Technical architecture (Data)Author Commented:
one interest thing I found out is, if the job has many steps and only one of the steps failed, it can't show you what is the error message.

it just said job failed at step (X) and don't show you what the error is . agree?

I think we are close as long as the script just return who executed it manually with error message on the "message" column will be good ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Well, the error message have that information but then you can only store it when an error occurs (and you'll need to extract it from a text field). So, for the success steps you won't be able to register who ran the job.
0
marrowyungSenior Technical architecture (Data)Author Commented:
but we have a job yesterday failed and when using the above script I can't see which steps is failing and who did this, any idea ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Well, you're filtering by jobs that has been succeeded:
WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
 so you'll never get the failed ones.
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
And I already called your attention for that in my comment #41149484
0
marrowyungSenior Technical architecture (Data)Author Commented:
one thing, I am quite busy on how to monitor replication as you can see I just post one more.

our DBA in US is going to leave at the end of this week so I am focusing on merge replication monitoring now.

but for this, any way to change the script so that we can see how many failed/success during last week/ last 2 weeks/ last months and last 2 months ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
You'll need to group by status and add a filter criteria to only bring the wanted period.
0
marrowyungSenior Technical architecture (Data)Author Commented:
"add a filter criteria to only bring the wanted period. "

where by date and what should be the format like ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
run_datetime is already DATETIME format so you can use DATEDIFF function. Examples:
run_datetime > DATEADD(week, GETDATE(), -2) -- Last 2 weeks
run_datetime > DATEADD(month, GETDATE(), -1) -- Last month
0
marrowyungSenior Technical architecture (Data)Author Commented:
tks victor, I will test it.

"but we have a job yesterday failed and when using the above script I can't see which steps is failing and who did this, any idea ? "

We have one other job failed but what I want to say here, from the message field output, I can't see what is the steps it failes and I can't see what the error message is.

any idea?
0
marrowyungSenior Technical architecture (Data)Author Commented:
I have to put the date inside that loop:

CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

Open in new window


like this :
CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
  where run_datetime > DATEADD(week, GETDATE(), -2) -- Last 2 weeks
)

Open in new window


?

it gives error:

Msg 8116, Level 16, State 1, Line 1
Argument data type datetime is invalid for argument 2 of dateadd function.

Open in new window

0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Strange. Try with DATEDIFF instead:
CTE2 as
(
SELECT job_name as [JobStep],
       run_datetime as [StartDateTime],
       SUBSTRING(run_duration, 1, 2) + ':' +
       SUBSTRING(run_duration, 3, 2) + ':' +
       SUBSTRING(run_duration, 5, 2) as [Duration],
       message,
       [Job description]
         ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
  where DATEDIFF(week, run_datetime, GETDATE()) <= 2 -- Last 2 weeks
)
0
marrowyungSenior Technical architecture (Data)Author Commented:
so last 2 months is :

,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
 where DATEDIFF(month, run_datetime, GETDATE()) <= 2 -- Last 2 months 

)

Open in new window


sorry, last 2 x days is :

,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE

 where DATEDIFF(day, run_datetime, GETDATE()) <= 2 

)

Open in new window


?
0
marrowyungSenior Technical architecture (Data)Author Commented:
so the detail step failure message and who run the job cant' be include?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Step failure message can be included ( h.run_status = 0) but who run the job can't.
0
marrowyungSenior Technical architecture (Data)Author Commented:
"Step failure message can be included ( h.run_status = 0)"

from time to time I run this it doesn't show up, it just say error.

"but who run the job can't. "

it will relies on the message column in the output.
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
from time to time I run this it doesn't show up, it just say error.
Check which columns are you retrieving from.
0
marrowyungSenior Technical architecture (Data)Author Commented:
we have an example just here:

from the script:

  The job failed.  The Job was invoked by Schedule 35 (Process Mail Schedule).  The last step to run was step 1 (Process Mail).  NOTE: Failed to notify 'SQL DBA' via email.

from SQL job history of that failed steps:


 Executed as user: xxx\yyy. Attachment file \\zzzz\data\uncleLeo\DocuShare\ELHScannedDocs\BOL-ELHEGLV142553325801LAX-EVERLIBERAL10132015-2.pdf.pdf is invalid. [SQLSTATE 42000] (Error 22051).  The step failed.
0
marrowyungSenior Technical architecture (Data)Author Commented:
basically by the script, that one only show what is in steps 0, it don't show what is the failed steps and what is the failed steps message.

this is what worry me.
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Once again, check what are you filtering:
WHERE h.step_name = '(Job outcome)'

This won't return the error message since you're only filtering step_name = '(Job outcome)' .
Try without the filtering (remove or comment the following WHERE):
WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
marrowyungSenior Technical architecture (Data)Author Commented:
ok, you are right!

but one thing, can we show both the result from condition with WHERE h.step_name = '(Job outcome)'  and without that?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
but one thing, can we show both the result from condition with WHERE h.step_name = '(Job outcome)'  and without that?
Sure. Just remove that condition so will bring all.
0
marrowyungSenior Technical architecture (Data)Author Commented:
non, I mean can we have both message?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Yes you can but as I said the only way to achieve that is to remove the filter otherwise will bring only the successfully steps.
0
marrowyungSenior Technical architecture (Data)Author Commented:
yes. but how can we give both condition, which means diff result for the SAME record, agree ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Sorry, I'm not following you in this one.
Did you run without the filter so you can compare?
0
marrowyungSenior Technical architecture (Data)Author Commented:
oh ,no . I mean they both return as message, so how can I just change the filter and return them both ?
0
Vitor MontalvãoMSSQL Senior EngineerCommented:
Give it a try, analyze it and try to understand the changes. If you still have doubts then come back and make more questions :)
0
marrowyungSenior Technical architecture (Data)Author Commented:
yes and yes sir :) bathroom now.
0
marrowyungSenior Technical architecture (Data)Author Commented:
0
marrowyungSenior Technical architecture (Data)Author Commented:
you are very good helper .
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.