Troubleshooting a problem which might have adverse effects on the entire company

Posted on 2008-02-01
Medium Priority
Last Modified: 2010-04-18
Hi guys. I would like to seek advice from all of you gurus here.

When troubleshooting a problem, whether it involves AD, DNS, DHCP or something related, how do you ensure that changing a setting here or there will not affect the entire business operation? For example, changes that touch AD replication or DNS.

I am a fresh MCSE and I know that it's an entirely different level when dealing with servers rather than client workstations.

One wrong move and it could cripple the entire business operation.
Questions that I would like to ask

1. What methodology or approach do you use to prevent, or at least minimise, the impact of what you do to the server?
2. Which has the most impact? AD? DNS?
3. What should I take note of?
4. If it really happens, how do you rectify the problem ASAP?
I can give more details if that would help you advise me further.
Your advice is very important to me.
Question by:moneywell
LVL 29

Accepted Solution

mass2612 earned 800 total points
ID: 20795024

This is a pretty difficult question. Basically I think the answer lies in being cautious in a production environment and possibly implementing a change control board or at a minimum a change logging system.

What this means is that every time you make a change you obtain approval from a group of people. Some of these members may be technical and others may be business users. Either way, keeping track of everything you change is very important: if something does go wrong, you know exactly what was done.

The biggest cause of issues in a stable environment is someone changing something. If you don't have a log of all the changes made to the system, finding the cause can be more difficult. I also schedule changes one at a time. I don't change 10 things on 10 different servers at the same time. If you did that and had an issue, how would you know what caused it?
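As a rough illustration of the change logging idea above, here is a minimal Python sketch that appends one record per change to a text file and lets you pull up everything that was done to a given server. The field names, the file format (one JSON object per line) and the function names are my own assumptions, not part of any particular change management product.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_change(log_path, server, description, rollback, author):
    """Append one change record to a JSON-lines file and return it."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "server": server,
        "what": description,
        "rollback": rollback,  # how to undo the change if it goes wrong
        "who": author,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def changes_for(log_path, server):
    """Return every logged change for one server, oldest first."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle if line.strip()]
    return [entry for entry in entries if entry["server"] == server]
```

Calling `log_change("changes.jsonl", "dc01", "what I changed", "how to undo it", "my name")` before each change, one change at a time, gives you the trail to check when something breaks.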

This works well in big environments, but if you only have a small setup it might not be practical. If you are the only IT person, who else could be part of the approval process?

In this situation I would advise further caution and, at a minimum, setting up some virtual machines with the same basic setup as your production systems. For example, if you use Exchange, have an Exchange VM running with the same versions, service packs, etc. The same goes for other applications. Once you have this set up, use it for testing. If you are not sure what could happen when you make a certain change, test it out in the virtual machine lab that you have set up.
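A test lab is only useful if it really matches production. As a sketch of how you might check that, the Python below compares two hand-maintained inventories and reports every component where the lab differs. The component names and version strings are made-up placeholders; you would fill them in from your own servers.

```python
def find_drift(production, lab):
    """Return {component: (production_value, lab_value)} for every mismatch.

    A component missing on one side is reported with None for that side.
    """
    drift = {}
    for component in sorted(set(production) | set(lab)):
        prod_value = production.get(component)
        lab_value = lab.get(component)
        if prod_value != lab_value:
            drift[component] = (prod_value, lab_value)
    return drift


# Hypothetical inventories, for illustration only.
production = {"os": "Windows Server 2003 SP2", "exchange": "2003 SP2", "antivirus": "8.1"}
lab = {"os": "Windows Server 2003 SP2", "exchange": "2003 SP1"}

for component, (prod_value, lab_value) in find_drift(production, lab).items():
    print(f"{component}: production={prod_value!r} lab={lab_value!r}")
```

Anything this prints is a difference to fix in the lab before you trust a test result from it.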

The simple answer is: before you change things in production, make sure you have a good understanding of what the change will do and why you are doing it. If you don't understand it and think there may be some risk, test it somewhere else first and research it. You can always ask here or on other forums.

Assisted Solution

Spot_The_Cat earned 400 total points
ID: 20795275
1. What methodology or ways do you prevent or rather minimise the impact of what you do to the server.
Implement a good change management process, which should include but not be limited to an impact assessment and a roll-back plan. Take a look at ITIL Service Delivery - change management process.

2. Which impact the most? AD? DNS?
You should know this as an MCSE - it depends what you change. If DNS is offline and broken, you've got no AD. You need to consider each change and its potential impact.

3. What to take note of.
What you've changed and when it was done - even if you're fixing it.

4. and if it really happens, how do you rectify the problem ASAP.
Implement your roll back plan from step 1.
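On point 2, since a broken DNS takes AD down with it, one cheap safeguard is to record what your important names resolve to before a change and compare afterwards. This is only a sketch under my own assumptions: the function names are hypothetical, the host names in the usage note are placeholders, and it uses Python's standard `socket.getaddrinfo`, which asks whatever resolver the machine running the script is configured to use.

```python
import socket


def resolve(name):
    """Return the sorted addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def snapshot(names, resolver=resolve):
    """Map each name to the addresses it currently resolves to."""
    return {name: resolver(name) for name in names}


def changed_names(before, after):
    """Return the names whose answers differ between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))
```

Take `before = snapshot(["dc01.example.local", "mail.example.local"])` ahead of the change, take a second snapshot afterwards, and `changed_names(before, after)` lists any name that now resolves differently or not at all.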

Not all changes require this kind of detail, e.g. adding a user account. In ITIL you'd set this up as a standard change with a change model, carry out the change and just log that it was done. It's common sense really.

Also make sure that you always have good backups. If you're making a major change and it all goes wrong a backup is always useful.
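To make the roll-back idea concrete, here is a small Python sketch of the pattern: apply the change, verify it, and undo it if the check fails. The function name and the three callables are my own illustration, not an ITIL-defined procedure, and a real roll-back for a major change may well be "restore from backup" rather than a few lines of code.

```python
def apply_change(apply, verify, rollback):
    """Run a change, check it, and roll back if the check fails.

    apply, verify and rollback are callables supplied by the caller;
    verify must return True when the system is healthy after the change.
    Returns "applied" or "rolled back".
    """
    try:
        apply()
        ok = verify()
    except Exception:
        # Treat any error during the change or the check as a failure.
        ok = False
    if ok:
        return "applied"
    rollback()
    return "rolled back"
```

The point of the design is that the roll-back step is written down before the change starts, so undoing it is not something you improvise while the business is waiting.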

Assisted Solution

antioed earned 800 total points
ID: 20796115
Excellent comments from these experts for a great question.  It is true that your approach and the dynamic of your support function will vary based on whether you run the show or work as part of a support team.  A bad supervisor or support culture can make things like a "changelog" system completely out of the question, which, as I am sure the other experts will agree, is the worst thing for the health of a network.  I have seen it happen and had to live with supporting environments like this...much better when you call the shots or at least have management with half a brain for approaching support.

The biggest challenge I found myself facing was security.  I am constantly learning more about it.  The applications, services and directories all have plenty of documents explaining how to fix things when they go wrong.  Security is another world where I often found myself fighting to stay one step ahead of attack vectors and actual live attacks...there is nothing scarier and more troubling than knowing that someone has breached your system.  It's like being captain of a ship that sunk.  I hope you never have to deal with that!

You may know of it already but the SANS Institute has a lot of good stuff to read on security:  http://www.sans.org/

Supporting live environments over the past few years at all levels, some better than others, has taught me that, though we may try to avoid issues that impact business, it *will* happen.  Do not get stale in your technical aptitude, set up that test environment and blow it up constantly; sometimes with a live production system it seems easier to say "well, it's working...don't mess with it!"  Better to know how to fix things quickly when they break...because they will...or the users will claim that they have and you have to know exactly what you're talking about or else they will mob you like wolves.  Been there too!  Speak softly and carry a big Google!

A few last words of advice:  A good admin always leaves a back door.  If you do something more than once, script it.  Write documents for repeating support issues and distribute them amongst your team and/or company.  Standardize as much as you can...whether tuning company OS images for workstations or building out your server infrastructure...the more you keep things unified and arranged, the easier it will be to keep things running right and fix problems when they arise.  Good luck!
