Troubleshooting is the set of steps you take to resolve an issue. It is perhaps the single most important skill an IT person can possess. The ability to break down any problem into its individual components is required before any true resolution can be determined. Troubleshooting is not confined to the computer field. We troubleshoot car problems, vacuums, electrical issues, and so on. It is simply a skill set that applies to any field.
No one taught me how to troubleshoot. I never read an FAQ or a HOW-TO. For me it was all learn by doing. Over time I realized the steps I took to resolve an issue were the same. The problems were different, though the process more or less stayed the same.
These steps are grouped into three broad phases. You likely will not follow the steps within each phase exactly as I've listed them, though you will find yourself moving through the three phases in order.
I've equated the three phases to one of the fundamental planning steps a programmer might utilize, which is to determine the Input, the Process, and the Output required by a module within their application. Troubleshooting and programming are essentially the same. With both you are presented with a problem for which you must arrive at a solution. The three phases are:
1. INPUT PHASE: The Input Phase is the information gathering portion of your troubleshooting.
2. PROCESS PHASE: The Process Phase is when you formulate and put in place a solution utilizing the information you gathered from the Input Phase.
3. OUTPUT PHASE: The Output Phase is clean up. You are ensuring the solution you put forth is the correct one.
Mindset: Stop for a moment. Think about the situation. Don't panic. Don't attempt the first fix that pops into your head just for the sake of doing something. Any problem that warrants your attention also warrants your best effort, so make sure you approach the next steps logically and with a cool head.
Be confident. You are in charge of the problem. Not the other way around. If you don't think you can fix a problem, then you won't. It is okay to make a mistake or even fail. It happens. Just make sure you learn from it.
Listen: If you are working with a customer then listen to them with intent. Don't fiddle with the mouse or type on the keyboard. Listen to the customer no matter how inconsequential you believe the information to be.
Recent Changes: Asking yourself (or the customer) what has recently changed, early in the troubleshooting process, can save you a lot of time and headaches. Keep in mind that customers will often initially reply that nothing has changed or that they have done nothing, so be prepared to do a bit of investigative questioning. Your goal here is to find cause rather than accuse. Consider questions such as "When was the last time this worked?" and "Have you recently changed your workflow?"
Diagram: Sometimes drawing the problem (especially if it is complex), along with the flow of information, can help break a process down into its component parts. Diagramming a problem gives you a visual to look at rather than relying upon your brain to sort everything out. Your brain has more important things to do. Flowcharting is a fundamental technical skill. Learn it and love it.
Scope: Determine how far-reaching your problem truly is. Often a problem will be reported by a single customer, though with a bit of investigative work you'll find that three other customers are experiencing the exact same issue. Don't hesitate to ask other customers whether they are seeing it too.
Research: Research can take the form of message boards, manuals, IT professional websites, white papers, FAQs, or consulting co-workers. Use them. Read them. Study them. Though I have included Research in the Input Phase, it could just as easily be in the Process Phase and it is probably fair to say this step bridges the gap between the two phases.
Get Dirty: Don't be afraid to get your hands dirty. Developing IT skills, or any skills for that matter, is all about making mistakes. You are effectively learning what not to do. The only way to do this is to work on solving problems on your own. Asking for help is fine, though if the situation allows, use the opportunity to learn. You will not learn if your first step in troubleshooting is to depend upon someone else.
A word of caution: Learning by doing does not mean deleting user accounts on the fly, rebooting production servers during work hours, running potentially harmful scripts untested, et cetera. Use common sense. Set up a test environment if resources permit.
Records: Record any modifications you make as you work. This can be a lifesaver if your troubleshooting path starts off on the wrong foot, which isn't uncommon for any of us. Ensuring you have an accurate log of any changes you made will help you not make the problem worse.
I record every modification we make to our XenDesktop environment. When a new problem surfaces, one of my first steps is to look back at our change log to determine the last modification we made. When I need to make changes to our telephone auto attendant system, I take screenshots of the before and after configuration screens so that I can easily revert if necessary.
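A change log does not need to be elaborate. The following is a minimal sketch in Python of the kind of record I am describing, not a description of any particular tool. The names Change, ChangeLog, record, and last_change are hypothetical, as are the example system names and settings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Change:
    """One modification, with enough detail to revert it later."""
    system: str        # which system was touched (hypothetical label)
    description: str   # what was changed and why
    before: str        # the setting as it was before the change
    after: str         # the setting as it is after the change
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ChangeLog:
    """An in-memory, append-only list of changes, oldest first."""

    def __init__(self) -> None:
        self._entries: List[Change] = []

    def record(self, system: str, description: str,
               before: str, after: str) -> Change:
        """Append a change and return the stored entry."""
        change = Change(system, description, before, after)
        self._entries.append(change)
        return change

    def last_change(self, system: str) -> Optional[Change]:
        """Return the most recent change to a system, or None."""
        for change in reversed(self._entries):
            if change.system == system:
                return change
        return None


if __name__ == "__main__":
    log = ChangeLog()
    # Example values are made up for illustration.
    log.record("xendesktop", "raised idle timeout", "30 min", "60 min")
    latest = log.last_change("xendesktop")
    if latest is not None:
        print(f"To revert: set back to {latest.before}")
```

Storing the before value alongside the after value is the point of the exercise. It plays the same role as my before and after screenshots.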
Reproduce: Reproduce the issue. Assuming the issue isn't persistent, can you reproduce it? Being able to reproduce an elusive problem is a key to solving it. You will often require more than verbal assistance from the customer. Shadow them or stand behind them as they walk through, one by one, the steps they took to arrive at the error.
This will also reveal whether you are dealing with a technical issue or a training issue. For example, one of our nurses recently put in a helpdesk ticket stating that our electronic medical records software was not properly switching between her login and the physician's. After observing the nurse's workflow we discovered she was not pressing the proper button to initiate the user switch.
KISS: Keep it simple. Computer problems are notorious for having a dozen different possible causes. For example, a workstation may fail to connect to the network for any one of a dozen reasons, including NIC drivers, the workstation's domain account, TCP/IP configuration, et cetera. Start with the causes that are most likely and easiest to test. This doesn't always mean you'll have the problem resolved quickly, but odds are you'll do so faster than if your troubleshooting starts with the least likely and most complicated possible cause.
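One rough way to put "most likely and easiest to test" into practice is to rank each suspected cause by its likelihood divided by the effort needed to check it. The sketch below is only an illustration of that idea in Python. The Cause type, the order_checks function, and every likelihood and effort figure are assumptions made up for the example, and the ratio is just one reasonable scoring choice.

```python
from typing import List, NamedTuple


class Cause(NamedTuple):
    name: str
    likelihood: float      # rough guess between 0 and 1
    effort_minutes: float  # rough time needed to test this cause


def order_checks(causes: List[Cause]) -> List[Cause]:
    """Sort causes so likely, cheap-to-test ones come first."""
    for cause in causes:
        if cause.effort_minutes <= 0:
            raise ValueError(f"effort must be positive: {cause.name}")
    return sorted(
        causes,
        key=lambda c: c.likelihood / c.effort_minutes,
        reverse=True,
    )


if __name__ == "__main__":
    # Guesses for the workstation-off-the-network example.
    suspects = [
        Cause("NIC drivers", likelihood=0.3, effort_minutes=20),
        Cause("workstation domain account", likelihood=0.3, effort_minutes=10),
        Cause("TCP/IP configuration", likelihood=0.4, effort_minutes=5),
    ]
    for cause in order_checks(suspects):
        print(cause.name)
```

The numbers will never be precise. The value is in forcing yourself to write the suspects down and compare them before you start.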
Root Cause: Look for a root cause. This works hand-in-hand with the scope determination from the Input Phase and you can easily consider them one and the same. Determine if more than one customer or workstation is experiencing the same problem or anything similar. Multiple customers experiencing the same issue will likely rule out quite a few suspects, narrowing down your search.
A few years ago one of my customers began complaining of intermittent drops to our Citrix servers. Over the course of several days my technicians attempted a multitude of corrections, including changing network cables, switch ports, network card settings, and even the workstation itself. Unfortunately the problem persisted. No one thought to ask if anyone else was experiencing the same problem until several days had passed. Only then did we discover several other customers in the same department were also experiencing dropped connections. With a new direction we quickly discovered that our cable vendor had bound the network cables too tightly together, causing bleedover. If we had done a proper job of discovery early in the troubleshooting process we would have saved ourselves some time and embarrassment.
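If your helpdesk tickets can be exported, even a few lines of code can surface what the affected customers have in common. The following Python sketch assumes each ticket is a plain dictionary. The common_factors function, the field names, and the sample values are all hypothetical and do not reflect any particular ticketing system.

```python
from collections import Counter
from typing import Dict, Iterable, List


def common_factors(tickets: List[Dict[str, str]],
                   keys: Iterable[str],
                   min_reports: int = 2) -> Dict[str, Dict[str, int]]:
    """For each field, list the values shared by several tickets."""
    found: Dict[str, Dict[str, int]] = {}
    for key in keys:
        # Count how many tickets carry each value of this field.
        counts = Counter(t[key] for t in tickets if key in t)
        repeated = {v: n for v, n in counts.items() if n >= min_reports}
        if repeated:
            found[key] = repeated
    return found


if __name__ == "__main__":
    # Made-up tickets loosely modeled on the dropped-connection story.
    tickets = [
        {"customer": "A", "department": "dept-1", "switch": "sw-1"},
        {"customer": "B", "department": "dept-1", "switch": "sw-2"},
        {"customer": "C", "department": "dept-1", "switch": "sw-3"},
        {"customer": "D", "department": "dept-2", "switch": "sw-4"},
    ]
    print(common_factors(tickets, ["department", "switch"]))
```

In this made-up data the switch differs from ticket to ticket while the department repeats, which is the kind of pattern that would have pointed us toward the shared cabling much sooner.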
Vendor Support: Leverage vendor support. You aren't in this alone. Vendors work with their products eight hours every working day. You don't. Utilize their knowledge and resources.
Recently I was testing a production release of Citrix's XenDesktop 3.0 product. I spent hours attempting to get pass-through authentication working with Web Interface, which I had easily configured during the beta release. I finally relented and put a call in to Citrix support, only to learn that pass-through authentication to Web Interface was deprecated (citing security risks) with the 3.0 release. The moral of this particular story is that there is nothing wrong with trying, though you must know when to cut your losses and contact support.
Symptom vs. Cause: When putting a solution in place, ask yourself if you are treating the symptom or the cause. Are you treating a chronic problem with an acute solution? Let's say you wake up every morning absolutely exhausted. You find yourself resorting to the snooze button three or four times, causing you to be late to work. At work the exhaustion makes concentration difficult. Perhaps you find yourself dozing off in meetings. You decide that kick-starting your day with an energy drink will help, and it does, for a few hours anyway. After a while, though, you crash, causing you to reach for another energy drink. By resorting to the energy drinks you are not truly addressing the problem. You are addressing the symptom rather than the cause. You are addressing "I am tired," rather than "Why am I tired?" Perhaps by eating better, exercising, or simply getting more sleep you could better correct the cause.
Technical work is no different. We can apply quick fixes or we can apply solutions. Sometimes we have to do both. The quick fix is occasionally necessary to get the customer(s) back up and running, though we must follow up with a long-term solution. So next time, before you simply reboot, ask yourself if you are addressing the symptom rather than the cause.
Test: Test your solution to the best of your ability to ensure your remedy is the correct one. If it is a persistent problem then you should know right away. If the problem is intermittent then you might not know for a few hours, days, or even weeks. Communicate this to the customer very clearly. Assuming the problem is fixed without properly testing and communicating with the customer, only to learn at a later date that it still exists, is a black eye.
Follow Up: Follow up with the customer who reported the problem. Ensure they are aware the issue is corrected and are satisfied with the solution. With some issues it is also helpful to follow up several times over the course of the day, week, or even month, particularly if you are working to correct an intermittent issue. Follow up gives the customer a warm and fuzzy.
Document: A great deal of our work is routine. We unlock accounts and delete stuck print jobs every day. These are simple tasks that require little to no documentation. However, we also deal with issues in which after-action documentation is absolutely mandatory, either due to the complexity of the problem or the rarity with which it is encountered. Documenting a solution to a complex issue ensures that when another customer is dealing with the exact same problem two years later you won't have to spend hours of research to retrace your steps.