Troubleshooting a problem which might have adverse effects on the entire company

Hi guys. I would like to seek advice from all of you gurus here.

When troubleshooting a problem, whether it involves AD, DNS, DHCP or something related, how do you ensure that changing a setting here or there will not affect the entire business operation? For example, AD replication or DNS changes.

I am a fresh MCSE and I know that it's an entirely different level when dealing with servers rather than client workstations.

One wrong move and it could cripple the entire business operation.
Questions that I would like to ask:

1. What methodology or ways do you use to prevent, or rather minimise, the impact of what you do to the server?
2. Which has the most impact? AD? DNS?
3. What should I take note of?
4. And if it really happens, how do you rectify the problem ASAP?

Any further advice you can offer would be welcome.
Your advice is very important to me.


This is a pretty difficult question. Basically I think the answer lies in being cautious in a production environment and possibly implementing a change control board or, at a minimum, a change logging system.

What this means is that every time you make a change you obtain approval from a group of people; some of these members may be technical and others may be business users. Keeping track of everything you change is very important, so that if something does go wrong you know exactly what was done.

The biggest cause of most issues in a stable environment is someone changing something. If you don't have a log of all the changes made to the system, finding the cause can be more difficult. I also schedule changes one at a time. I don't change 10 things on 10 different servers at the same time. If you had an issue, how would you know what caused it?
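To make the change log idea concrete, here is a minimal sketch in Python. It is only an illustration, not a real change management tool: the file format (one JSON record per line), the field names and the functions `log_change` and `recent_changes` are all assumptions made up for this example.

```python
import json
from datetime import datetime, timezone


def log_change(log_path, server, description, rollback, approved_by=None):
    """Append one change record to a JSON-lines file and return the record.

    All field names here are hypothetical; adapt them to your own process.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "server": server,
        "description": description,
        "rollback": rollback,        # how to undo the change
        "approved_by": approved_by,  # None if there is no approval step
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def recent_changes(log_path, server=None):
    """Return logged changes, newest first, optionally filtered by server."""
    with open(log_path, encoding="utf-8") as f:
        entries = [json.loads(line) for line in f if line.strip()]
    if server is not None:
        entries = [e for e in entries if e["server"] == server]
    return list(reversed(entries))
```

When something breaks, calling `recent_changes("changes.jsonl", server="dc01")` (a hypothetical file and server name) lists the latest changes to that server first, which is usually where the cause is.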

This works well in big environments, but if you only have a small setup then it might not be practical; i.e. if you are the only IT person, who could be part of the approval process?

In this situation I would advise further caution and, at a minimum, set up some virtual machines with the same basic setup as your production systems. For example, if you use Exchange, have an Exchange VM running with the same versions, service packs, etc. The same goes for other applications. Once you have this set up, use it for testing. If you are not sure what could happen when you make a certain change, then test it out in the virtual machine lab that you have set up.

The simple answer is: before you change things in production, make sure you have a good understanding of what the change will do and why you are doing it. If you don't understand it and think there may be some risk, test it somewhere else first and research it. You can always ask here or on other forums.

1. What methodology or ways do you prevent or rather minimise the impact of what you do to the server.
Implement a good change management process, which should include but not be limited to an impact assessment and a roll-back plan. Take a look at ITIL Service Delivery - change management process.

2. Which impact the most? AD? DNS?
You should know this as an MCSE - it depends what you change. If DNS is offline or broken, you've got no AD. You need to consider each change and its potential impact.

3. What to take note of.
What you've changed and when it was done - even if you're fixing it.

4. and if it really happens, how do you rectify the problem ASAP.
Implement your roll back plan from step 1.

Not all changes require this kind of detail, e.g. adding a user account. In ITIL you'd set this up as a standard change with a change model, carry out the change and just log that it was done. It's common sense really.

Also make sure that you always have good backups. If you're making a major change and it all goes wrong, a backup is always useful.
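
The backup and roll-back advice can be sketched as a simple pattern: back up, change, verify, and restore if the check fails. The Python below is only an illustration under a big assumption, namely that the thing being changed is a single file. Real AD or DNS changes are not plain file edits and need their own backup and restore procedures. The function `apply_with_rollback` and its parameters are hypothetical names made up for this example.

```python
import shutil
from pathlib import Path


def apply_with_rollback(config_path, change, verify):
    """Back up a file, apply a change, and restore the backup if the check fails.

    change(path) makes the modification; verify(path) returns True if the
    result is acceptable. Returns True if the change was kept, False if it
    was rolled back.
    """
    config_path = Path(config_path)
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # take the backup first
    try:
        change(config_path)
        ok = bool(verify(config_path))
    except Exception:
        ok = False  # a crash during the change counts as a failed change
    if not ok:
        shutil.copy2(backup_path, config_path)  # the roll-back plan
    return ok
```

The point of the design is that the roll-back step is decided before the change is made, not improvised afterwards.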

Excellent comments from these experts for a great question.  It is true that your approach and the dynamic of your support function will vary based on whether you run the show or work as part of a support team.  A bad supervisor or support culture can make things like a "changelog" system completely out of the question, which, as I am sure the other experts will agree, is the worst thing for the health of a network.  I have seen it happen and had to live with supporting environments like this...much better when you call the shots or at least have management with half a brain for approaching support.

The biggest challenge I found myself facing was security.  I am constantly learning more about it.  The applications, services and directories all have plenty of documents explaining how to fix things when they go wrong.  Security is another world, where I often found myself fighting to stay one step ahead of attack vectors and actual live attacks...there is nothing scarier or more troubling than knowing that someone has breached your system.  It's like being captain of a ship that sank.  I hope you never have to deal with that!

You may know of it already, but the SANS Institute has a lot of good stuff to read on security.

Supporting live environments over the past few years at all levels, some better than others, has taught me that, though we may try to avoid issues that impact business, it *will* happen.  Do not get stale in your technical aptitude, set up that test environment and blow it up constantly; sometimes with a live production system it seems easier to say "well, it's working...don't mess with it!"  Better to know how to fix things quickly when they break...because they will...or the users will claim that they have and you have to know exactly what you're talking about or else they will mob you like wolves.  Been there too!  Speak softly and carry a big Google!

A few last words of advice:  A good admin always leaves a back door.  If you do something more than once, script it.  Write documents for repeating support issues and distribute them amongst your team and/or company.  Standardize as much as you can...whether tuning company OS images for workstations or building out your server infrastructure...the more you keep things unified and arranged, the easier it will be to keep things running right and fix problems when they arise.  Good luck!
Windows Server 2003