On Beyond Tools
A conversation I recently had with the DevOps manager of a major online retailer really made me think about DevOps monitoring tools. The manager and I discussed how several DevOps shops seem to define themselves by the number of tools they have monitoring their build and IT stack. He went on to make this point:
You can go up and down the aisles at a conference with the corporate credit card and buy every tool in sight, but all those purchases don’t make you a DevOps. All it makes you is the owner of many tools.
The manager’s point is that being an effective DevOps shop or IT service provider means going beyond owning tools. You have to fold those tools into a meaningful DevOps philosophy, backed by proper tool management and proper team integration. And, from my humble perspective, you also need proper alerting.
DevOps, as a philosophy, encourages shifting left and putting testing earlier into the process so that teams can be proactive in their support rather than reactive to problems. So, how does a DevOps team enable this shift in thinking from reactive to proactive? Read on to find out.
DevOps monitoring tools – a love affair
DevOps is about bringing development and operations teams together. To some extent, tools can be a way to improve this relationship. A recent whitepaper from Puppet describes how:
Adopting DevOps practices usually means embracing automation as a default solution to many problems.
And indeed, every developer and ops engineer loves shiny new toys. Tools do allow for faster builds, quicker deployment, greater visibility and faster feedback.
Puppet, for example, can be used for server configuration management. Nagios is a favorite for infrastructure monitoring. Jenkins is great for continuous integration and can be used to build code, create Docker containers and push code to production. Many enjoy the integration provided by our friends at Logz.io because it collects logs from all services, applications, networks, tools, servers, and more in an environment into a single, centralized location for processing and analysis.
Yet these tools, as strong as they are at dealing with reams of data, do not on their own alert the right person, be it Dev or Ops, when a real issue arises. For the most part, they will not solve the underlying issues that arise in any operation, such as failed deployments, security issues or scaling problems. Instead, those types of issues need to be alerted on and responded to appropriately.
Don’t forget the alerting
If DevOps teams relied on their tools alone, they would be left always reacting to situations rather than being proactive. The metrics provided by all the shiny DevOps tools enable us to measure and observe various components of the operation. But it is alerting that draws attention to the particular systems that require observation, inspection, and intervention. It is alerting that furthers proactive management.
By putting alerting earlier in the monitoring process, DevOps teams take the true meaning of shift left to heart. When software doesn’t deploy as expected, an alert tells the proper team members early on. Similarly, when a security vulnerability is detected, an alert reaches the engineers who can react appropriately and intervene.
Not all alerts are created equal
Even though most DevOps teams have adopted alerting, their practices are often far from alerting best practices. It’s not enough to just have an alerting tool. Like a monitoring tool, an alerting tool left uncalibrated will simply produce a sea of noise. Instead, teams should calibrate alerts so that they are meaningful.
For example, a meaningful alert might be something along the lines of “web requests are taking more than x seconds to process and respond” or “new servers are failing to spin up as expected.” These are great examples of what could be high priority alerts for a company. The Ops team, in these cases, can then investigate based on specific information rather than complaints from end users.
Alternatively, a less urgent alert, such as “server is 90% full,” can be a low priority alert that is forwarded to the on-call engineer but doesn’t rise to the level of a 2am wakeup call. In OnPage, you can send this low priority alert to the engineer’s account but ensure the account notifies the engineer only during normal business hours.
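To make the high versus low priority distinction concrete, here is a minimal Python sketch. It is not OnPage’s API. The threshold values, the 9-to-5 weekday business hours, and the names `classify` and `delivery_time` are all assumptions made up for illustration; your own baseline should drive the real numbers.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional

# Hypothetical thresholds and hours; calibrate these against your own baseline.
SLOW_REQUEST_SECONDS = 2.0
DISK_FULL_PERCENT = 90.0
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


@dataclass
class Alert:
    message: str
    priority: str  # "high" or "low"


def classify(metric: str, value: float) -> Optional[Alert]:
    """Return an Alert when a metric crosses its threshold, otherwise None."""
    if metric == "request_seconds" and value > SLOW_REQUEST_SECONDS:
        return Alert(f"Web requests are taking {value:.1f}s", "high")
    if metric == "disk_percent" and value >= DISK_FULL_PERCENT:
        return Alert(f"Server is {value:.0f}% full", "low")
    return None


def next_business_start(now: datetime) -> datetime:
    """Find the next weekday opening time at or after `now`."""
    day = now.date()
    if now.time() >= BUSINESS_START:
        day += timedelta(days=1)  # today's opening has already passed
    while day.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        day += timedelta(days=1)
    return datetime.combine(day, BUSINESS_START)


def delivery_time(alert: Alert, now: datetime) -> datetime:
    """High priority notifies immediately; low priority waits for business hours."""
    if alert.priority == "high":
        return now
    in_hours = now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END
    return now if in_hours else next_business_start(now)
```

With these assumed settings, a 92% disk reading at 2am on a weekday is held until 9am that morning, while a slow-request alert at the same moment goes out right away.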
6 steps to alerting best practices
It’s an important realization that not all alerting needs to wake up an engineer. Successful adoption of DevOps means planning ahead and providing meaningful alerts when issues do occur. To this end, OnPage has the following alerting best practices which have been vetted by our numerous end users:
- Make sure your alerts are calibrated. Establish a baseline so you know how your systems are supposed to work.
- Ensure alerts are tied to a schedule. As weird as it sounds, some shops just alert everyone. You never want to alert everyone. Make sure your alerts are tied to a schedule so that one person is alerted. If the engineer is unavailable, then escalate to the next person on call.
- Ensure alerts are actionable. Who wants to be woken up by a pointless message such as “there’s a problem with deployment in the test environment”? Instead, ensure each alert carries a specific piece of information that needs to be investigated and resolved.
- Develop run books. Publish operating procedures so on-call response becomes more standardized.
- Review audit trails. Make sure alerts went to the person on the team who is best able to resolve the issue.
- Review on call at weekly meetings. Review alerts that were received during the week to ensure sufficient information is arriving with alerts and that alerts are actionable. If they are not, then alter the alert messaging so it is more effective.
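The schedule, escalation and audit trail steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how OnPage implements it: the function name `page_with_escalation`, the `acknowledges` callback (standing in for “did this engineer respond in time?”), and the engineer names are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# One audit row: (engineer, alert message, outcome)
AuditRow = Tuple[str, str, str]


def page_with_escalation(
    alert: str,
    on_call: List[str],
    acknowledges: Callable[[str, str], bool],
    audit: List[AuditRow],
) -> Optional[str]:
    """Page one engineer at a time, in schedule order, until someone acknowledges.

    `on_call` is ordered: primary first, then each escalation contact.
    Every attempt is appended to `audit` so it can be reviewed later.
    Returns the engineer who took the alert, or None if nobody responded.
    """
    for engineer in on_call:
        acked = acknowledges(engineer, alert)
        audit.append((engineer, alert, "acknowledged" if acked else "no response"))
        if acked:
            return engineer
    return None


# Usage: the primary ("ana") does not respond, so the alert escalates to "ben".
audit_trail: List[AuditRow] = []
owner = page_with_escalation(
    "New servers are failing to spin up",
    ["ana", "ben", "chi"],
    lambda engineer, alert: engineer == "ben",
    audit_trail,
)
```

Only one person is paged at a time, and the audit trail records who was tried and who responded, which gives the weekly review something concrete to look at.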
By following these steps, your DevOps team will begin to think from a proactive rather than a reactive position.
DevOps monitoring tools are powerful instruments. However, they need to be paired with proper alerting tools and procedures to enable proactive engineering. OnPage’s cloud-based alerting tool helps ensure the right information gets to the right engineer at the right time.
See what proper alerting can do to help your team’s monitoring. Schedule a demo with OnPage today.