Avatar of marrowyung
marrowyung

asked on 

keep a historical record of failed SQL job

Dear all,

right now we tried to see how frequent if a SQL failed in daily/weekly/monthly basis, we can create a table to store the result.

the table which store the result can have this schema:

gmt Date/Time
Job id/name
Job Desc
Failure Desc
Executed By User

any script/tools  you all using to do this?
Microsoft SQL ServerMicrosoft SQL Server 2008

Avatar of undefined
Last Comment
marrowyung
Avatar of Vitor Montalvão
Vitor Montalvão
Flag of Switzerland image

Why not query directly from msdb?
SELECT h.server, j.name, h.step_name, h.message, h.run_date, h.run_time, h.run_duration
FROM sysjobs j
	INNER JOIN sysjobhistory h ON j.job_id=h.job_id
WHERE h.run_status=0 -- Failed
ORDER BY h.run_date, h.run_time

Open in new window

Avatar of marrowyung
marrowyung

ASKER

this script don't run well on my side so I change it to:

SELECT h.server, j.name, h.step_name, h.message, h.run_date, h.run_time, h.run_duration
FROM msdb..sysjobs j
	INNER JOIN msdb..sysjobhistory h ON j.job_id=h.job_id
WHERE h.run_status=0 -- Failed
ORDER BY h.run_date, h.run_time

Open in new window


as we want to see how frequent is the failure and we prefer to store the failure record to a table with time so that we can measure the frequency  of it.
Avatar of marrowyung
marrowyung

ASKER

basically I am testing this script to check the last running result of a job:

;WITH jobListCTE as
(
  SELECT j.name as job_name, 
         msdb.dbo.agent_datetime(run_date, run_time) AS run_datetime,
         RIGHT('000000' + CONVERT(varchar(6), run_duration), 6) AS run_duration,
         message
    FROM msdb..sysjobhistory h
   INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
   WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
) 
,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

SELECT * FROM CTE2 T1
WHERE RN = 1
order by [JobStep]

Open in new window


but your script return more information like failure detail, how to integrate the job steps and the failure message of each steps of it ?
this script don't run well on my side so I change it to:
You'll need to run it in the msdb context but I saw that you added the reference to msdb to the table name and that's also fine.
Avatar of marrowyung
marrowyung

ASKER

yeah.

is there anyway to integrate the job steps and the failure message of each steps of it if it is failed ?
is there anyway to integrate the job steps and the failure message of each steps of it if it is failed ?
JobHistory table has that information but your query is only filtering by succeeded job executions: WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
Avatar of marrowyung
marrowyung

ASKER

"JobHistory table has that information but your query is only filtering by succeeded job executions: WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */ "

but I can also filter out only failed record by changing run_status to 0, right?

if I remove run_status as the condition then it return all, agree?
but I can also filter out only failed record by changing run_status to 0, right?
 if I remove run_status as the condition then it return all, agree?
Right and Agree :)
Avatar of marrowyung
marrowyung

ASKER

but how can I find out the failed job each month if i only query the msdb ? must change the log history to make it store record for a least a month ?

from out point of view, creating a table can store as much record as we can only for failed job and we just then query that table without touching the msdb configuration ? agree ?
Yes, querying msdb depends on how long it can keeps job execution history so if you need it to keep more records than the current configuration why not increase it instead of building a new table and creating all process to copy data from msdb to the new table?
You're only moving records from place so why not keep them in the original tables but for longer?
Avatar of marrowyung
marrowyung

ASKER

so you also in the same page that I can increase the number of log msdb by increasing the number in that page?

User generated image
and then just use this script ?

;WITH jobListCTE as
(
  SELECT j.name as job_name, 
         msdb.dbo.agent_datetime(run_date, run_time) AS run_datetime,
         RIGHT('000000' + CONVERT(varchar(6), run_duration), 6) AS run_duration,
         message
    FROM msdb..sysjobhistory h
   INNER JOIN msdb..sysjobs j ON h.job_id = j.job_id
   WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
) 
,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

Open in new window

Avatar of marrowyung
marrowyung

ASKER

how can I include the "Executed By User" in the above script ?
The "Executed by user" isn't stored in the tables. There's the job owner but I don't know if it useful for you.
Avatar of marrowyung
marrowyung

ASKER

tks.

how can we make it works? create a new table now and join one more table and then insert the result to this newly created table will works?

what table we can join to see the result ?
So, you still thinking in going ahead with replicating the msdb tables into a new table.
This will give you some work (not really to create it but to manage it). You already have the initial query and now you'll need to find a way to insert only new rows. Maybe something like running a daily job at midnight (00:00) and insert only rows from the previous date (run_date>=GETDATE()-1).
Avatar of marrowyung
marrowyung

ASKER

"So, you still thinking in going ahead with replicating the msdb tables into a new table."

nono. what I need is the information on who execute the job and not the job owner information.

as default information from MSDB don't show it, it makes me think I need an extra table and extra join query to implement this.

I think this is the last information I need from the result.
nono. what I need is the information on who execute the job and not the job owner information.
I couldn't find that information nowhere. Will try to dig on this and back to you.
Avatar of marrowyung
marrowyung

ASKER

one interest thing I found out is, if the job has many steps and only one of the steps failed, it can't show you what is the error message.

it just said job failed at step (X) and don't show you what the error is . agree?

I think we are close as long as the script just return who executed it manually with error message on the "message" column will be good ?
Well, the error message have that information but then you can only store it when an error occurs (and you'll need to extract it from a text field). So, for the success steps you won't be able to register who ran the job.
Avatar of marrowyung
marrowyung

ASKER

but we have a job yesterday failed and when using the above script I can't see which steps is failing and who did this, any idea ?
Well, you're filtering by jobs that has been succeeded:
WHERE h.step_name = '(Job outcome)' and h.run_status = 1 /* run status: 1=successs, 0= failed */
 so you'll never get the failed ones.
And I already called your attention for that in my comment #41149484
Avatar of marrowyung
marrowyung

ASKER

one thing, I am quite busy on how to monitor replication as you can see I just post one more.

our DBA in US is going to leave at the end of this week so I am focusing on merge replication monitoring now.

but for this, any way to change the script so that we can see how many failed/success during last week/ last 2 weeks/ last months and last 2 months ?
You'll need to group by status and add a filter criteria to only bring the wanted period.
Avatar of marrowyung
marrowyung

ASKER

"add a filter criteria to only bring the wanted period. "

where by date and what should be the format like ?
run_datetime is already DATETIME format so you can use DATEDIFF function. Examples:
run_datetime > DATEADD(week, GETDATE(), -2) -- Last 2 weeks
run_datetime > DATEADD(month, GETDATE(), -1) -- Last month
Avatar of marrowyung
marrowyung

ASKER

tks victor, I will test it.

"but we have a job yesterday failed and when using the above script I can't see which steps is failing and who did this, any idea ? "

We have one other job failed but what I want to say here, from the message field output, I can't see what is the steps it failes and I can't see what the error message is.

any idea?
Avatar of marrowyung
marrowyung

ASKER

I have to put the date inside that loop:

CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
)

Open in new window


like this :
CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
  where run_datetime > DATEADD(week, GETDATE(), -2) -- Last 2 weeks
)

Open in new window


?

it gives error:

Msg 8116, Level 16, State 1, Line 1
Argument data type datetime is invalid for argument 2 of dateadd function.

Open in new window

Strange. Try with DATEDIFF instead:
CTE2 as
(
SELECT job_name as [JobStep],
       run_datetime as [StartDateTime],
       SUBSTRING(run_duration, 1, 2) + ':' +
       SUBSTRING(run_duration, 3, 2) + ':' +
       SUBSTRING(run_duration, 5, 2) as [Duration],
       message,
       [Job description]
         ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
  where DATEDIFF(week, run_datetime, GETDATE()) <= 2 -- Last 2 weeks
)
Avatar of marrowyung
marrowyung

ASKER

so last 2 months is :

,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE
 where DATEDIFF(month, run_datetime, GETDATE()) <= 2 -- Last 2 months 

)

Open in new window


sorry, last 2 x days is :

,CTE2 as
(
SELECT job_name as [JobStep], 
       run_datetime as [StartDateTime], 
       SUBSTRING(run_duration, 1, 2) + ':' + 
       SUBSTRING(run_duration, 3, 2) + ':' + 
       SUBSTRING(run_duration, 5, 2) as [Duration], 
       message,
       [Job description]
	   ,ROW_NUMBER()over(partition by job_name order by run_datetime desc) RN
  FROM jobListCTE

 where DATEDIFF(day, run_datetime, GETDATE()) <= 2 

)

Open in new window


?
Avatar of marrowyung
marrowyung

ASKER

so the detail step failure message and who run the job cant' be include?
Step failure message can be included ( h.run_status = 0) but who run the job can't.
Avatar of marrowyung
marrowyung

ASKER

"Step failure message can be included ( h.run_status = 0)"

from time to time I run this it doesn't show up, it just say error.

"but who run the job can't. "

it will relies on the message column in the output.
from time to time I run this it doesn't show up, it just say error.
Check which columns are you retrieving from.
Avatar of marrowyung
marrowyung

ASKER

we have an example just here:

from the script:

  The job failed.  The Job was invoked by Schedule 35 (Process Mail Schedule).  The last step to run was step 1 (Process Mail).  NOTE: Failed to notify 'SQL DBA' via email.

from SQL job history of that failed steps:


 Executed as user: xxx\yyy. Attachment file \\zzzz\data\uncleLeo\DocuShare\ELHScannedDocs\BOL-ELHEGLV142553325801LAX-EVERLIBERAL10132015-2.pdf.pdf is invalid. [SQLSTATE 42000] (Error 22051).  The step failed.
Avatar of marrowyung
marrowyung

ASKER

basically by the script, that one only show what is in steps 0, it don't show what is the failed steps and what is the failed steps message.

this is what worry me.
ASKER CERTIFIED SOLUTION
Avatar of Vitor Montalvão
Vitor Montalvão
Flag of Switzerland image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of marrowyung
marrowyung

ASKER

ok, you are right!

but one thing, can we show both the result from condition with WHERE h.step_name = '(Job outcome)'  and without that?
but one thing, can we show both the result from condition with WHERE h.step_name = '(Job outcome)'  and without that?
Sure. Just remove that condition so will bring all.
Avatar of marrowyung
marrowyung

ASKER

non, I mean can we have both message?
Yes you can but as I said the only way to achieve that is to remove the filter otherwise will bring only the successfully steps.
Avatar of marrowyung
marrowyung

ASKER

yes. but how can we give both condition, which means diff result for the SAME record, agree ?
Sorry, I'm not following you in this one.
Did you run without the filter so you can compare?
Avatar of marrowyung
marrowyung

ASKER

oh ,no . I mean they both return as message, so how can I just change the filter and return them both ?
Give it a try, analyze it and try to understand the changes. If you still have doubts then come back and make more questions :)
Avatar of marrowyung
marrowyung

ASKER

yes and yes sir :) bathroom now.
Avatar of marrowyung
marrowyung

ASKER

you are very good helper .
Microsoft SQL Server
Microsoft SQL Server

Microsoft SQL Server is a suite of relational database management system (RDBMS) products providing multi-user database access functionality.SQL Server is available in multiple versions, typically identified by release year, and versions are subdivided into editions to distinguish between product functionality. Component services include integration (SSIS), reporting (SSRS), analysis (SSAS), data quality, master data, T-SQL and performance tuning.

171K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo