Change management best practice

Posted on 2016-09-21
Last Modified: 2016-09-27
Hi all, I was hoping you could offer some best-practice advice on change management where multiple services are affected.

We currently have a change management process in place, but where a change encompasses numerous services (i.e. 20+), we have been creating one overarching request for change (RFC) form. This works fine in the majority of cases, but I'm concerned about the lack of detail in this form in terms of implementation plans, testing, risk assessments, etc. For example, the implementation plan might say that server x needs to be shut down or moved to another DR site, and that's it. Whereas if this were done on its own, the plan would cover all the steps required to do it.
To overcome this, surely you would need an RFC form for each affected service, which would be time-consuming.

What's the best way of managing these situations? We have had issues arise because correct procedures weren't followed, or testing wasn't completed, because the work was part of the overarching RFC.
Question by:jdc1944
LVL 27

Assisted Solution

aburr earned 150 total points
ID: 41811179
"To overcome this surely you would need an RFC form for each affected service which  would be time consuming."
If indeed it is "surely", then you must do it. Turn your attention to making sure that the required steps and approvals are done quickly. Study the bottlenecks and eliminate them.
LVL 25

Accepted Solution

Cyclops3590 earned 350 total points
ID: 41812405
Chg mgmt is a major pain and I'm sure you'll find no one does it perfectly. Why? As you've alluded to: imperfect information with which to make a fully informed decision.

With that said, chg mgmt boils down to the following:
1) Identify the change that is desired to be made (this involves an IVB: implementation plan, verification plan, and backout plan)
1b) Part of this is the risk of the change going bad, whether backout is even possible, and how long it'd take.
2) Identify what depends on the item being changed.
3) Identify the verification and recovery for those potentially affected services, as well as the risks/time frame associated with recovery
3b) Part of this is: what is the business impact if an outage is caused? For example, if a non-essential system is taken out for a day, who cares. If your ecommerce system is taken out for even 30 minutes, it can even have customer image ramifications. The risk is defined purely by mgmt; the possibility, by the engineers.
4) Mgmt needs to decide if the change has enough business value to do, if there are things that should be done to minimize potential impact, or if the change can be broken up.
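
To tie this back to the original question, here is a rough sketch of how one overarching RFC could still carry an IVB per affected service, so a missing verification or backout plan is visible before approval. This is only an illustration: the class names, field names, and the sample services are made up for the example and are not from ITIL or any particular change management tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceChange:
    """One affected service inside an overarching RFC (hypothetical structure)."""
    service: str
    implementation_plan: List[str] = field(default_factory=list)
    verification_plan: List[str] = field(default_factory=list)
    backout_plan: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        # A section counts as missing when it has no steps at all.
        sections = {
            "implementation_plan": self.implementation_plan,
            "verification_plan": self.verification_plan,
            "backout_plan": self.backout_plan,
        }
        return [name for name, steps in sections.items() if not steps]


@dataclass
class ChangeRequest:
    """The overarching RFC, holding one ServiceChange per affected service."""
    title: str
    services: List[ServiceChange] = field(default_factory=list)

    def incomplete_services(self) -> Dict[str, List[str]]:
        # Map each service with gaps to the IVB sections it still lacks.
        gaps = {}
        for change in self.services:
            missing = change.missing_sections()
            if missing:
                gaps[change.service] = missing
        return gaps


# Sample data, invented for the example.
rfc = ChangeRequest(
    title="DR site move",
    services=[
        ServiceChange("server-x", ["Shut down server x", "Move to DR site"]),
        ServiceChange("mail", ["Repoint MX"], ["Send test mail"], ["Restore old MX"]),
    ],
)
print(rfc.incomplete_services())
```

Here "server-x" has an implementation plan but nothing for verification or backout, which is exactly the kind of gap that gets lost in a single high-level form.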

Sorry, but without specifics, this is very much a philosophical/ideological question that is heavily based on experience. ITIL outlines good chg mgmt processes IMHO.

However, here are a couple of quick examples from my own experience:
1) Upgrading network devices
Even if they are core devices, the risk should be medium to low. The reason is that chances are it is HA, so you can do rolling upgrades. However, this can sometimes mean that traffic in flight gets forcibly reset. While most apps should recover, it's not guaranteed. In this case, there is impact. But should you really try to identify every single app that could possibly be affected? Kind of. Only for business-critical systems (systems mgmt has identified as requiring five 9's uptime). Then mgmt makes the decision. This would typically just be done by notifying the team to ensure there aren't multiple changes going on that increase risk, but beyond that, notification is good enough. So email, and do it. App teams then know about it and know to watch their systems carefully during that time frame.
So yes, systems can be impacted, but upgrading for bug fixes or needed features helps the business more than slowing things down so much by being crippled by FUD that something might go wrong.
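
As a small illustration of the "make sure there aren't multiple changes going on at once" check, here is a minimal sketch that flags change windows that overlap in time. The tuple layout, the function name, and the sample changes are assumptions made up for the example, not part of any change management tool.

```python
from datetime import datetime
from typing import List, Tuple

# (change name, window start, window end); a hypothetical layout for the example.
Window = Tuple[str, datetime, datetime]


def overlapping_changes(windows: List[Window]) -> List[Tuple[str, str]]:
    """Return pairs of change names whose scheduled windows overlap."""
    ordered = sorted(windows, key=lambda w: w[1])
    clashes = []
    for i, (name_a, _, end_a) in enumerate(ordered):
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later window can overlap this one
            clashes.append((name_a, name_b))
    return clashes


# Sample schedule, invented for the example.
windows = [
    ("core-switch-upgrade", datetime(2016, 9, 27, 22, 0), datetime(2016, 9, 27, 23, 0)),
    ("db-patch", datetime(2016, 9, 27, 22, 30), datetime(2016, 9, 27, 23, 30)),
    ("dns-change", datetime(2016, 9, 27, 23, 30), datetime(2016, 9, 28, 0, 0)),
]
print(overlapping_changes(windows))
```

A window that starts exactly when another ends is treated as not overlapping, which may or may not suit your environment.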

2) Re-architecting dynamic routing over entire company
You are completely redoing how BGP is architected to be used. This is high risk, because if things go sideways and you don't have an out-of-band connection, you could take down the entire network without any ability to quickly get it back up. So what do you do? Well, basically the same as in #1. The major difference is higher risk. All that means is that you may have to spend more time planning, since if things go bad, the backout plan is extremely undesirable and time-intensive. You also change when it may be done, and communicate a lot ahead of time so that others don't schedule changes in the meantime. All of this is up to mgmt to help coordinate though.

In summary, there are 3 things to remember, again IMHO.
1) Engineers identify technical risks, recovery plans, and the possibility of failures
2) Mgmt identifies business risks and solutions to minimize those risks
3) Don't be crippled by fear of breaking stuff. Things break, it's inevitable. Prepare for things breaking so you can fix them quickly. Simply put, even in high-risk changes, at some point you have to pull the trigger.

Question has a verified solution.
