SCOM Agent Health - Heartbeat Alert fun with Orchestrator

We were having a lot of "Heartbeat Alerts" in our SCOM environment. For those of you who might not be familiar with SCOM, a "heartbeat" is a packet of data that the agent sends to the management server at a regular interval, basically letting the management server know, “Hey, I am still here and OK.” By default the interval is 60 seconds, but it is customizable. When the guys checked these agents, they were online and functioning with no apparent issues, which made me think the only explanation could be stale alerts or something wrong with the agents themselves.
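
As a rough illustration of the heartbeat idea, here is a minimal Python sketch of a missed-heartbeat check. The 60-second default comes from the text above; the `allowed_misses` threshold and both function names are hypothetical, and this is not how SCOM implements the check internally.

```python
from datetime import datetime, timedelta

# Default heartbeat interval described in the article (customizable in SCOM).
HEARTBEAT_INTERVAL = timedelta(seconds=60)


def missed_heartbeats(last_seen: datetime, now: datetime,
                      interval: timedelta = HEARTBEAT_INTERVAL) -> int:
    """Return how many whole heartbeat intervals have elapsed since last_seen."""
    if now <= last_seen:
        return 0
    # timedelta / timedelta gives a float; int() drops the partial interval.
    return int((now - last_seen) / interval)


def is_heartbeat_failure(last_seen: datetime, now: datetime,
                         allowed_misses: int = 3,
                         interval: timedelta = HEARTBEAT_INTERVAL) -> bool:
    """Hypothetical rule: flag the agent once more than allowed_misses beats are missed."""
    return missed_heartbeats(last_seen, now, interval) > allowed_misses
```

A stale alert is the opposite case: the agent's heartbeats have resumed, but the alert raised earlier is still open, which is what the runbook below cleans up.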

Now, I love automation and any excuse to use Orchestrator, so when the guys asked if I could make it easier for them to automatically find and repair these agents, I jumped at it, literally.
System Center Orchestrator is Microsoft’s workflow management solution that lets you automate the creation, deployment, and monitoring of resources in your environment. You will notice that I mention the word “runbooks”: a runbook contains the individual instructions for your automation process, each step is called an activity, and each activity has configurable settings.
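
To make the runbook/activity relationship concrete, here is a minimal Python model, loosely mirroring the way Orchestrator activities hand their published data to the activities after them. The `Activity` and `Runbook` classes and the dictionary-based data passing are hypothetical; real runbooks are built in the Orchestrator Runbook Designer, not in code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: each activity receives its own settings plus the data
# published by earlier activities, and returns new data to publish.
Step = Callable[[Dict[str, str], Dict[str, str]], Dict[str, str]]


@dataclass
class Activity:
    name: str
    run: Step
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Runbook:
    name: str
    activities: List[Activity] = field(default_factory=list)

    def start(self) -> Dict[str, str]:
        """Run the activities in order, accumulating their published data."""
        published: Dict[str, str] = {}
        for activity in self.activities:
            published.update(activity.run(activity.settings, published))
        return published
```

Each activity only sees its own settings and what earlier activities published, which is the same contract the "Link" lines express in the designer.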

While creating mine, I found this article by Nathan Olmstead: http://blogs.technet.com/b/systemcenterramblings/archive/2014/03/22/runbook-for-persisting-stale-heartbeat-alerts-in-scom.aspx

In mine, I needed to check DNS, ping the server, and update my alerts, as well as notify the BackOffice team of any failures during the remediation process so that they could act accordingly, so you will see a few more activities added.
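
The order of checks described above can be sketched in Python as a simple "stop and notify on first failure" flow. This is an illustration under assumptions, not the runbook itself: `dns_ok`, `ping_ok`, `start_health_service`, and `notify` are hypothetical stand-ins for the Orchestrator activities, passed in as functions so the flow can be exercised without real servers.

```python
from typing import Callable

# Hypothetical stand-ins for runbook activities; each returns True on success.
Check = Callable[[str], bool]


def remediate(server: str, dns_ok: Check, ping_ok: Check,
              start_health_service: Check,
              notify: Callable[[str], None]) -> bool:
    """Run the checks in order; on the first failure, notify and stop.

    Returns True only if every step succeeded, i.e. the alert can be updated.
    """
    steps = [("DNS check", dns_ok), ("Ping", ping_ok),
             ("Start Health Service", start_health_service)]
    for label, step in steps:
        if not step(server):
            notify(f"{label} failed for {server}; manual follow-up needed.")
            return False
    return True
```

Stopping at the first failure keeps the notification specific: the BackOffice team is told which step broke, rather than receiving a generic "remediation failed".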

I am also in the process of adding HP Service Manager integration, which will let Orchestrator log an incident (service ticket) automatically instead of our Helpdesk logging it for us, saving time and giving us reporting. It will also give us an additional notification channel, making sure that nothing is missed.

Here is a view of the runbook:

Stale-Alerts---Orchestrator.PNG

I have attached an Activity Reference to give a little more information on each activity:

Activity-Snip.PNG

Here is a quick look at a few of the individual activities and their configuration, just to give an idea of what they look like.

Monitor Alert

This activity gets the alert from SCOM; you can see the filters below:

Monitor-SCOM-Activity.PNG
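
The kind of filtering the Monitor Alert activity performs can be sketched in Python as below. The filter values are assumptions for illustration (the actual ones are in the screenshot above): the alert name "Health Service Heartbeat Failure" and resolution state 0 ("New"). The `Alert` class is a hypothetical simplification of a SCOM alert.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed filter values; replace with the ones configured in your activity.
HEARTBEAT_ALERT_NAME = "Health Service Heartbeat Failure"
RESOLUTION_NEW = 0  # SCOM resolution state "New" (255 is "Closed")


@dataclass
class Alert:
    name: str
    server: str
    resolution_state: int


def heartbeat_alerts(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep only open heartbeat alerts, as a Monitor Alert filter might."""
    return [a for a in alerts
            if a.name == HEARTBEAT_ALERT_NAME
            and a.resolution_state == RESOLUTION_NEW]
```
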
DNS Check

As mentioned above, this activity checks DNS: it runs the "nslookup" command against the server name it receives from the previous activity, "Monitor Alert".
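
For comparison, a DNS check like this can be sketched in Python. Note that this swaps `nslookup` for the system resolver (`socket.gethostbyname`), so hosts-file entries also count, whereas `nslookup` queries the DNS server directly; the `dns_check` name is hypothetical.

```python
import socket
from typing import Optional


def dns_check(server_name: str) -> Optional[str]:
    """Resolve server_name to an IPv4 address, or return None if the lookup fails.

    Uses the system resolver rather than nslookup, so results can differ
    when a hosts-file entry exists for the name.
    """
    try:
        return socket.gethostbyname(server_name)
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError subclass; UnicodeError covers invalid labels.
        return None
```

A `None` result would take the failure leg (notify the BackOffice team) instead of continuing to the ping step.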

DNSCheck-Activity.PNG

Mail Activity

Here you add the recipients, subject, and body of the mail to be sent.

Send-Mail-Activity-1.PNG

Mail Activity (continued)

The mail server settings, where you add the mail server to use for the SMTP connection as well as the sender address:
Send-Mail-Activity-2.PNG

Each activity in Orchestrator can pass relevant data on to the next activity where it is required and configured to be used. Activities are connected by the "Link" lines you see between them.
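
The Send Mail settings (recipients, subject, body, SMTP server, and sender), combined with data passed along from earlier activities, can be sketched with Python's standard `email` and `smtplib` modules. This is an illustration only, not what Orchestrator runs internally; the addresses, `smtp_host`, and the `server_name`/`failed_step` values are hypothetical placeholders for data published by earlier activities.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_failure_mail(sender: str, recipients: List[str],
                       server_name: str, failed_step: str) -> EmailMessage:
    """Compose the notification; server_name and failed_step come from earlier activities."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"SCOM agent remediation failed: {server_name}"
    msg.set_content(
        f"The '{failed_step}' step failed for {server_name}.\n"
        "Please investigate the agent manually."
    )
    return msg


def send_mail(msg: EmailMessage, smtp_host: str, port: int = 25) -> None:
    """Deliver the message over plain SMTP, like the activity's server settings."""
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)
```

Calling `send_mail(msg, "smtp.example.com")` would deliver it; keeping composition separate from delivery means the message can be checked without a mail server.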
 
Another option is to disable the default "Heartbeat Alerts" monitor and create your own custom monitor. Instead of having a "Monitor Alert" activity, you could then use SCOJobrunner to start the runbook from the SCOM side as soon as your custom alert is triggered. SCOJobrunner is a command-line tool for triggering Orchestrator runbooks, so you could call it from a diagnostic and recovery command within your custom monitor. This also means your runbook does not have to be running all the time, which uses less overhead.
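
The overhead argument can be illustrated with a conceptual Python sketch comparing an always-running poller with a runbook that only runs when triggered. It does not call SCOJobrunner or SCOM; the class names and the tick-based polling are hypothetical, purely to show where the idle work goes.

```python
from typing import Callable, List


class PollingMonitor:
    """Always-on model: checks for new alerts on every tick, even when there are none."""

    def __init__(self, fetch_alerts: Callable[[], List[str]],
                 handle: Callable[[str], None]) -> None:
        self.fetch_alerts = fetch_alerts
        self.handle = handle
        self.checks = 0  # every tick costs a check, alert or not

    def tick(self) -> None:
        self.checks += 1
        for alert in self.fetch_alerts():
            self.handle(alert)


class TriggeredRunbook:
    """Push model: work happens only when the monitor fires and calls trigger()."""

    def __init__(self, handle: Callable[[str], None]) -> None:
        self.handle = handle
        self.runs = 0

    def trigger(self, alert: str) -> None:
        self.runs += 1
        self.handle(alert)
```

With one alert over five polling intervals, the poller does five checks to handle it, while the triggered version runs once.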

Yes, you could clean it up by using child runbooks, and I could add more failure checks, such as a leg on the "Start Health Service" activity that notifies or remediates if starting the agent's Health Service fails again. But it is working perfectly for us, and automating the agent repair is helping our guys a great deal, freeing them to focus on other things.
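
A failure leg for "Start Health Service" might look like this Python sketch: retry a few times, then notify. The retry count, delay, and the `start`/`notify` callables are hypothetical stand-ins for the Orchestrator activity and the mail step; in a real runbook this could live in a child runbook.

```python
import time
from typing import Callable


def start_health_service(server: str, start: Callable[[str], bool],
                         notify: Callable[[str], None],
                         attempts: int = 3, delay_s: float = 0.0) -> bool:
    """Try to start the agent's Health Service, notifying if every attempt fails.

    `start` stands in for the runbook's "Start Health Service" activity.
    """
    for attempt in range(1, attempts + 1):
        if start(server):
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # wait before the next attempt
    notify(f"Could not start the Health Service on {server} "
           f"after {attempts} attempts.")
    return False
```
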

I hope this has been useful. If you have any questions, please don’t hesitate to contact me.

Thanks, 

Leon