Troubleshooting Methodology

Problems in a computer system are inevitable.  With a complex implementation of Windows, as we have it here at Bethel, problems can occur on many levels.  Finding and resolving a problem (a process often referred to as "troubleshooting") would be like finding a needle in a haystack if one were to approach the situation by haphazardly trying random solutions.  Effective troubleshooting requires a combination of knowledge of the system, solid deductive reasoning, and a structured approach to the problem.  While knowledge and experience will vary from person to person, we can all benefit from a structured approach.

The structured approach recommended by experienced industry professionals involves five steps that should lead to the solution of nearly any technical problem.

  1. Set the problem's priority.
  2. Collect information to identify the symptoms.
  3. Develop a list of possible causes.
  4. Test to isolate the cause.
  5. Study the results of the test to identify a solution.

The problem with attempting to find a needle in a haystack comes from trying to search the entire haystack.  Deduction means reasoning from the general to the specific: looking at all of the possibilities and then ruling them out one by one until you reach the correct conclusion.

Setting Priorities
"When it rains, it pours."  Problems rarely occur alone.   Generally, others have been reported or are being reported at the same time that a problem call comes in.  When resources are limited, the work must be scheduled, usually according to priority.

Setting priorities is generally done by assessing each problem's impact.  A problem that halts work would usually take priority over a request that would just enhance work.  In the end, each support organization and even each individual support technician must answer the questions:  What is a true emergency?  What is a "mission critical" operation?  What types of problems will take precedence over others?
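
As a rough illustration of scheduling by impact, the sketch below sorts a queue of problem reports so that work-halting problems come first.  The impact levels, the Ticket fields, and the tie-break on number of users affected are all assumptions made for this example; each support organization would substitute its own answers to the questions above.

```python
from dataclasses import dataclass

# Hypothetical impact levels, lowest number = highest priority.
IMPACT_RANK = {"work_halted": 0, "work_degraded": 1, "enhancement": 2}


@dataclass
class Ticket:
    summary: str
    impact: str          # one of the IMPACT_RANK keys
    users_affected: int


def schedule(tickets):
    """Order tickets by impact first, then by how many users are affected."""
    return sorted(tickets, key=lambda t: (IMPACT_RANK[t.impact], -t.users_affected))


if __name__ == "__main__":
    queue = [
        Ticket("Request for a second monitor", "enhancement", 1),
        Ticket("File server unreachable", "work_halted", 40),
        Ticket("Printing is slow", "work_degraded", 5),
    ]
    for ticket in schedule(queue):
        print(ticket.impact, "-", ticket.summary)
```

A problem that halts work sorts ahead of a request that would only enhance work, regardless of the order in which the calls came in.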

Collecting Information
The information that you collect will provide the basis for isolating the problem.  It is a good idea to have baseline information as a point of reference, so that you can recognize the difference between an aberration and what is normal.  Knowledge and experience play an important role in the information gathering phase.  Collecting insufficient or inappropriate information could lead to a dead end or, worse, a wrong conclusion.  Remember that the symptoms will "describe" the problem for you.  Your job will be to find the symptoms and then make sense out of what they reveal.

Look first for the obvious things:  Is everything where it should be?  Is everything plugged in?  Are all of the needed files in place?  Review the existing documentation to see if the problem has occurred before and if there is a recorded solution.  Even if we have not experienced a particular problem, likely someone has.  Remember to check a computer support database like Microsoft TechNet.

Ask the Right Questions: Regardless of their technical know-how, users can be extremely helpful in your information gathering.  In many cases, the user is your key witness to the problem.  The user may have seen something that will provide the very clue that you need.  If the problem is intermittent, the system will often work perfectly for the troubleshooter.  The trick is to ask the right questions, to ascertain from the user what the computer or network is doing that makes the user think it is not performing correctly.

   Useful Observations to look for:
  • "I can't log in."
  • "I cannot print."
  • "The network is really slow."
  • "I was connected to the server but I lost the connection."
  • "My computer freezes every time that I ..."
  • "One of my applications will not run."
  • "I've had this problem ever since they installed ..."
  • "I've had this problem ever since they moved ..."

Questions that reveal:

   Questions to isolate the problem:
  • How many are affected by the problem?  One, a specific area or group, everyone, or were users affected randomly?
  • When did the problem begin?
  • Has anything changed since the last time that it worked properly?
  • Does the problem occur constantly or is it intermittent?
  • Does the problem appear with all applications or just one?
  • What versions of the software and the operating system are being used?
  • Has a new service pack or ROM pack been used?
  • Are there new users on the network?
  • Is there new equipment on the network?
  • Was there a new application installed before the problem occurred?
  • Has any of the equipment been moved lately?
  • Is there a pattern among certain vendors and certain components such as cards, disk drives, software, hubs, or network operating software?
  • Is this similar to a previous problem?
  • Has anyone else attempted to fix the problem?

Most new problems are just variant recurrences of the same old problems that have been handled before.  In time, support technicians and engineers become so familiar with their system that they generally know where to look when something goes wrong and are able to isolate a problem quickly.

Divide Your System: Some problems are not so quickly localized.  Particularly in a large environment, it may become necessary to troubleshoot on a much larger scale.  It is not advisable to troubleshoot an entire large network all at once; instead, look at the system in smaller segments.

There are different ways that you can logically segment the network, but one way that is highly recommended is to look at network components by class.   Each class of components would operate essentially the same way and would thus be subject to the same types of problems.

  
  • Clients
  • Adapters
  • Hubs
  • Cabling and connectors
  • Servers
  • Connectivity components such as repeaters, bridges, routers, brouters, and gateways
  • Protocols
    Protocols make troubleshooting difficult because their built-in fault tolerance features can mask the symptoms of a problem.

Again, questions are helpful in narrowing down the possibilities and isolating the problem.  The critical point during this phase is to fully understand how each component is supposed to work and how each component can fail.  Comparing current performance information with baseline information obtained from network performance monitoring utilities can also help identify problems at this phase.

  
  • Which computers are able to function on the network?
  • If a computer cannot function on the network, can it function as a stand-alone computer?
  • If the computer cannot function on the network, is the computer's NIC working?
  • Is there a normal amount of traffic on the network?
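
The baseline comparison described above can be sketched in a few lines.  This is a minimal illustration, not a monitoring tool: the metric names, the sample values, and the 25 percent tolerance are all made up for the example, and real figures would come from whatever performance monitoring utility is in use.

```python
def find_aberrations(baseline, current, tolerance=0.25):
    """Return metrics whose current value differs from the baseline
    by more than `tolerance`, expressed as a fraction of the baseline."""
    flagged = {}
    for name, normal in baseline.items():
        if name not in current:
            continue  # nothing was measured for this metric
        observed = current[name]
        if normal == 0:
            # Any non-zero reading against a zero baseline is an aberration.
            deviation = float("inf") if observed != 0 else 0.0
        else:
            deviation = abs(observed - normal) / abs(normal)
        if deviation > tolerance:
            flagged[name] = (normal, observed)
    return flagged


if __name__ == "__main__":
    # Hypothetical readings; the names are illustrative only.
    baseline = {"utilization_pct": 20.0, "errors_per_min": 2.0}
    current = {"utilization_pct": 65.0, "errors_per_min": 2.2}
    for metric, (normal, observed) in find_aberrations(baseline, current).items():
        print(f"{metric}: baseline {normal}, now {observed}")
```

Only the metrics that stray well outside their normal range are reported, which helps answer the question of whether traffic on the network is normal.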

Sometimes during this long, involved process of information gathering you find and solve the problem on the spot.  Other times it is a much more painstaking process of testing each of the possibilities until you find the true cause of the problem and its solution.  This involves the next three steps of the structured approach.

List Possible Causes
After gathering the various possibilities, develop a list and rank them from most likely to least likely.  This is important because the symptoms may point to several possible causes.
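
One simple way to rank candidates, offered here only as a sketch, is to count how many of the observed symptoms each suspected cause would explain.  The causes and symptoms below are invented for the example, and counting matches is just one possible scoring rule; experience with the system may suggest a better one.

```python
def rank_causes(observed, known_causes):
    """Rank causes by the number of observed symptoms each would explain.

    observed:     set of symptom strings reported for this problem
    known_causes: dict mapping a cause to the set of symptoms it produces
    """
    scored = []
    for cause, symptoms in known_causes.items():
        matches = len(observed & symptoms)
        if matches:
            scored.append((cause, matches))
    # Most matches first; ties are broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical symptom table, for illustration only.
    known = {
        "bad patch cable": {"cannot log in", "cannot print", "lost connection"},
        "printer offline": {"cannot print"},
        "expired password": {"cannot log in"},
    }
    for cause, score in rank_causes({"cannot log in", "cannot print"}, known):
        print(score, cause)
```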

Test the Possibilities
In attempting to make a diagnosis, a doctor will look at a set of signs and symptoms, call to mind the various conditions that would exhibit those symptoms and then order various tests to either confirm a suspicion or "rule out" one of the possibilities.  Hopefully, he will have the least expensive or least painful tests done first.  Similarly, at this phase it is time to test and see if one of your suspected causes is actually the cause of the problem.  Like the doctor, unless you were very sure of the cause, you wouldn't want to do the most "painful" test first.  In other words, if the problem seemed to be a bad physical connection of a workstation to the network, your first test shouldn't be to knock a hole in the wall to check the cable.  You would first test the patch cable, the NIC and use various diagnostic tools on the connection.  Pulling the cable or opening the wall would be your very last resort.
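
The "least painful test first" idea can be sketched as a loop that runs checks in order of cost and stops at the first one that fails.  The cost numbers and the check functions are placeholders assumed for this example; in practice each check would be a real diagnostic, and the cost would reflect time, expense, or disruption.

```python
def isolate_cause(tests):
    """Run checks from least to most costly and return the first suspected
    cause whose check fails, or None if every check passes.

    tests: list of (cause, cost, check) tuples, where check() returns True
           when the component is healthy.
    """
    for cause, _cost, check in sorted(tests, key=lambda t: t[1]):
        if not check():
            return cause
    return None


if __name__ == "__main__":
    # Placeholder checks that simulate a faulty NIC.
    tests = [
        ("cable inside the wall", 10, lambda: True),
        ("patch cable", 1, lambda: True),
        ("NIC", 2, lambda: False),
    ]
    print(isolate_cause(tests))
```

Because the loop stops at the first failure, the expensive check (opening the wall) is never run when a cheaper one has already found the fault.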

Study the Results
This phase is a brief point of evaluation.  Hopefully, one of your tests will lead to a resolution of the problem.  Other times, you will have to go back to your list of possibilities or even expand your list.  It may even be necessary to return to the information gathering phase.  If none of your options seem to lead anywhere, perhaps it is time to seek assistance.

Ask for Help
The best of the industry's computer support people are generally known as those who know where to go for help.  There is no shame in benefiting from someone else's perspective or experience.

Last Modified 5/1/99
© 1999 Advanced Methods Computer Applications. All rights reserved.