Performance tuning, discussed in Chapter 35, "Tools and Strategies for Optimizing Windows 98," is one of the most interesting and rewarding kinds of advanced work you can do with Windows 98. Troubleshooting, on the other hand, can sometimes be rewarding and interesting, but it also offers vast potential for frustration. This is particularly true when you're unable to resolve a problem and conclude that it's better to throw in the towel and reinstall all the software on the computer, replace the computer, or whatever else you might have to do.
The good news is that Windows 98 includes a number of powerful new tools to add to your troubleshooting toolbox. With these new tools, you'll find that troubleshooting Windows 98 is much more productive than troubleshooting Windows 95. In this chapter, you learn about the tools included with Windows 98 that can help your troubleshooting efforts, along with strategies for their use.
System trouble comes in all shapes and sizes. You might have erratic behavior from a device, seemingly random application or system crashes, or bizarre behavior from applications that don't do what they're supposed to do. There are common threads to solving all of these problems, though, and the following sections discuss good troubleshooting practices that will help you.
Usually, the problems that you'll be called on to solve aren't your own, but are problems of the users you're supporting. This adds an entirely new dimension to most troubleshooting endeavors and is an area where many systems people are sadly deficient. You'll find, however, that mastering this tricky area results in a number of payoffs, including
The most successful troubleshooters have mastered the skill of communicating with users. In my experience, this skill is somewhat more valuable than top-notch technical skills. Although you obviously also need to know your stuff technically, you'll waste a lot of time and be less effective than otherwise unless you become good at working with users. The upshot is that it's sometimes just as enjoyable to troubleshoot users as it is to troubleshoot systems.
Doing a good job communicating with users boils down to several skills. The first one is calming the user. Often, users won't come to you for help until they've almost literally beaten their heads against the wall trying to resolve the problem themselves. (Although some users will involve you for trivial problems without falling back on their own resources first, most users do try to do things on their own.) When they contact you, they're likely to be frustrated, and some people will be looking for someone (anyone!) to blame for their frustration. Moreover, you'll have a difficult time getting accurate information from them if they're in a frustrated state of mind.
The first skill, therefore, is to calm the user. When it becomes necessary to do this, you'll find that the most important things are the following:
It's worthwhile to establish some trust with the person and calm them down before moving on to tackling the actual problem. You'll solve problems much more quickly when you do.
The second skill is communicating accurately. The difficulty here is that often users have strange ways of referring to things on their computer or describing events. It's important that you get them to describe their problem in sensory-specific language. This means that you get them to offer you information about exactly what they did, saw, and heard when they experienced the problem. With some people, this is difficult, because they seem to want to interpret everything they do; it's almost as if they're incapable of simply telling you what keys they pressed, what commands they selected, and what happened. With these sorts of people, things are tricky because you'll have to feed them possibilities for confirmation, and it's really easy to let this sound condescending and make them defensive. About 20 percent of the population seems to have trouble describing things in a way that will be useful for you, so you're going to have to learn how to tease out the information you need from such people.
"What's a Hard Drive?!?"
One amusing thing that I experienced was an organization where the users kept persisting in calling their system units hard drives. Even when I explained the difference to them, this strange nomenclature kept cropping up. It was almost like there was some sort of weird rampant language virus running amok in the company.
One day I realized what was going on. When support people worked with users and observed their systems, they looked at the hard disk indicator light on the system unit and said things like, "Wow, your hard drive seems awfully busy." The users, not realizing that the light indicated a device within the system unit, figured out (incorrectly) that the light indicated the activity of their entire system unit, and therefore the system unit must be called a hard drive.
Another barrier to accurate communication occurs when users have high opinions of their own computer skills, and want to offer you their diagnosis rather than the detailed information that you actually need. You should develop strategies for handling such people. The key is to avoid contradicting their opinion while gathering the information you need. You can say things like, "Yes, I had a similar thing happen the other day, but it actually turned out to be something else. What, exactly, are you doing and seeing?"
Part of accurate communication often involves having the user show you what's happening. Because there will be times when people won't be able to communicate their problem to you effectively, get them to demonstrate the problem for you. By watching them, you'll be able to see what's really happening. When troubleshooting over the phone, you can do much the same thing by having them go through each step and report what they see, and then having them type or do exactly what you tell them.
The last skill involves something that will improve your troubleshooting over the long term. Build trust with your users. This means several things: Be honest when you don't know what's causing a problem. Follow up with them when you say you'll follow up with them even if it's only to tell them that you haven't yet solved their problem. Treat them the same way that you would like to be treated in similar circumstances. A helpful idea to keep in mind when you have trouble respecting a user's problem is this: What's important to people is what they think is important to them.
Before you can solve a problem, you have to know exactly what the problem is. This first means knowing if you can reproduce a problem. Unfortunately, you'll see many problems in your career that can't be reproduced. They're usually caused by possibilities such as bugs in software that cause seemingly random behavior, misreads or miswrites of data, power surges, or--for all you know--cosmic rays. It's been said that 90 percent of computer problems are easily solved by rebooting the system, and it's true. Unless a particular system is experiencing a high rate of such problems, it's not going to be worth your time to chase these troubleshooting ghosts. Instead, restart the system and move on. Sometimes you have to explain this to your users, too. You have to tell them that sometimes, weird things happen for no reason, but to let you know if they see any other odd behavior in the next few days.
After determining that a problem is repeatable, the next step in identification is isolation. Is the problem with a modem, or the telephone connection, or the serial port, or the cable from the serial port to the modem? Is the problem with an errant software application, or the RAM in the system? Isolation involves systematically eliminating possible sources for the problem. Isolating a problem might mean trying to reproduce it on an identical system, to provoke a similar problem with another application, or even systematically replacing components of a system until you find the cause. Isolating problems requires patience. Over time, however, you'll gain experience so that you can narrow down a set of possibilities based on similar problems you've seen and resolved.
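The idea of eliminating possible sources one at a time can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a tool included with Windows 98: the function name `isolate_by_substitution`, the component list, and the simulated retest are all assumptions made for the example. In practice the "retest" is you swapping in a known-good spare and trying to reproduce the problem.

```python
from typing import Callable, Optional, Sequence


def isolate_by_substitution(
    components: Sequence[str],
    problem_persists: Callable[[str], bool],
) -> Optional[str]:
    """Swap one component at a time; return the first swap that clears the problem."""
    for component in components:
        # Replace only this component with a known-good spare, then retest.
        if not problem_persists(component):
            return component  # the problem went away after this swap
    return None  # no single swap cleared the problem


# Hypothetical simulation: here the serial cable is the faulty part.
def simulated_retest(swapped: str) -> bool:
    return swapped != "serial cable"


chain = ["modem", "telephone line", "serial port", "serial cable"]
suspect = isolate_by_substitution(chain, simulated_retest)
print(suspect)
```

Changing only one thing per retest is the point of the sketch: if two components are swapped at once, a cleared problem no longer tells you which one was at fault.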
The final overall skill needed to become a strong troubleshooter involves approaching problems systematically. People who jump around a problem and try everything that pops into their heads at once invariably end up making the problem worse or, at best, solving the problem without knowing the cause or the actual resolution.
Instead, you need to work systematically, both when isolating a problem and when trying possible solutions to the problem. With complex problems, you might want to make a list of the steps to take, and then work through those steps one by one. Not only will you avoid frustration this way, but you'll learn more about the problem and its resolution than otherwise.
System problems can manifest in many different ways, and yet they all resolve down to one of several possibilities. Keep this in mind when identifying and isolating a problem, because one of the troubleshooting steps you can take involves eliminating these different possibilities and narrowing down your list of suspects.
In addition to the previously listed problems, there are sometimes random, one-time problems that clear up when a system is restarted.
The troubleshooting tools included in Windows 98 are substantially improved over those in Windows 95. Not only have some familiar tools been reworked and improved, but many new tools have been added to solve problems that occur regularly enough to warrant them. In the following sections, you learn how to use the troubleshooting tools that come with Windows 98 and solve common problems with them.
You might remember a tool called MSD in Windows 95, which reported on hardware configuration, installed Windows-based drivers, and so on. It has been replaced in Windows 98 with an entirely new tool called Microsoft System Information (MSI). You can start MSI from the Start/Programs/Accessories/System Tools folder, or by entering the filename MSINFO32.EXE in the Run command of the Start menu. Figure 37.1 shows the main screen in MSI.
Figure 37.1 The
Microsoft System Information tool is new for Windows 98.
MSI serves two valuable functions. First, it displays a plethora of information about
the computer hardware and its configuration, about Windows 98 itself, and about the
software configured on the system. Second, it serves as a launch pad for other troubleshooting
tools, listed on its Tools menu. When accessing most of the tools discussed
in this chapter, you'll find it's easiest to first start MSI and then run the other
tools from MSI.
MSI displays three different classes of information, listed in the left pane. The Hardware Resources branch displays information about IRQs, DMAs, memory, I/O port addresses, and so on. The Components branch lists information about different types of components in the system, such as its display, keyboard, multimedia devices, and so on. The Software Environment branch shows information about drivers and modules loaded and running in the system. Figure 37.2 shows MSI with all of its branches open and the Ports category of Components selected.
Many of the pages displayed in MSI let you select the information you want to
see using option buttons at the top of the display page in the right pane. Typically,
you can choose to display either Basic Information, Advanced Information, or History.
Basic Information overviews the selected page's data; Advanced Information shows
you much more detailed information. The History selection shows the configuration
history of the selected information category. Table 37.1 lists the different information
pages available in MSI.
Figure 37.2 MSI
with one of its information pages displayed.
Page | Describes |
Hardware Resources Branch |
Conflicts/Sharing | All IRQs on the system that are being shared or are in conflict; this is an important page to view if a system is experiencing problems that might be due to hardware conflicts. |
DMA | Displays assigned DMA channels. |
Forced Hardware | Any hardware devices that have had their configuration forced through Device Manager settings. |
I/O | All assigned I/O memory addresses in the system. |
IRQs | All assigned IRQs in the system. |
Memory | Hardware memory area assignments. |
Components Branch |
Multimedia | Information about installed multimedia devices. Sub-branches include information on audio and video codecs and any installed CD-ROM devices. |
Display | Information about video cards and attached monitors. |
Infrared | Information about any installed infrared interface devices. |
Input | Information about installed keyboards and pointing devices. |
Miscellaneous | Information about printers and tape backup devices. |
Modem | Information about installed modems. |
Network | All installed networking software: network interface cards, protocols, clients, and file and print sharing drivers. A sub-branch shows information about Windows Sockets (WINSOCK). |
Ports | Installed serial and parallel ports. |
Storage | Installed storage interface cards (floppy, EIDE, and SCSI) and attached devices. |
Printing | Overall printing configuration. |
Problem Devices | A list of any devices reporting a problem state. |
USB | Universal Serial Bus devices. |
History | The history of all configuration changes to the system (hardware configuration and driver changes) since Windows 98 was initially installed. This is a valuable page when troubleshooting. |
System | Motherboard devices and settings. |
Software Environment Branch |
Drivers/Kernel Drivers | Installed kernel-level drivers. |
Drivers/MS-DOS Drivers | Installed MS-DOS device drivers. |
Drivers/User Mode Drivers | Installed user-level drivers. |
16-bit Modules Loaded | All 16-bit software modules running on the system. |
32-bit Modules Loaded | All 32-bit software modules running on the system. |
Running Tasks | All tasks running on the system. |
Startup Programs | All programs that are started at various times in the system. Shows more detailed startup program information than that shown in the Startup menu. |
System Hooks | Any running software that has hooked system resources. |
OLE Registration | Registered OLE clients and servers loaded through .INI files and through the Registry. |
If one of the previous pages shows some kind of trouble, other detailed pages of MSI might yield additional information. For example, if you see in the History page that the driver for the hard disk subsystem has recently been changed, you can then examine the Storage page for additional details about those devices.
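To make the idea behind the Conflicts/Sharing page concrete, here is a minimal Python sketch that groups devices by IRQ and reports any IRQ claimed by more than one device. It does not read anything from MSI; the device names and IRQ numbers in `sample` are hypothetical, and the function name `find_shared_irqs` is an assumption for the example. Note that a shared IRQ is not always a conflict, so treat the output as a list of places to look, not a diagnosis.

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_irqs(assignments: Dict[str, int]) -> Dict[int, List[str]]:
    """Return each IRQ that is assigned to more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    # Keep only the IRQs with two or more devices attached.
    return {irq: sorted(devs) for irq, devs in by_irq.items() if len(devs) > 1}


# Hypothetical assignments, copied by hand from an IRQ listing.
sample = {
    "COM1": 4,
    "COM3": 4,
    "LPT1": 7,
    "Sound card": 5,
    "Network adapter": 10,
}
shared = find_shared_irqs(sample)
print(shared)
```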
After examining the information pages in MSI, use the MSI Tools menu to launch any of the following troubleshooting tools:
The following sections discuss most of these Windows 98 troubleshooting tools. (See Chapter 36, "The Windows 98 Boot Process and Emergency Recovery," for a description of System Configuration Utility.)
The Signature Verification Tool lets you search for files on the system that have
been signed or not signed by their publishers. This can be useful when searching
for Internet-downloaded modules that might be causing problems. The Signature Verification
Tool is simply a modified version of the standard Find dialog box in Windows 98 used
to search for filenames. Figure 37.3 shows the Signature Verification Tool.
Figure 37.3 The
Signature Verification Tool is new in Windows 98.
The Windows Report Tool gathers information on a bug in Windows itself. It automatically
includes copies of all key system configuration files, system settings, and descriptions
of the problem that you describe. A Microsoft Support Engineer might request that
you prepare a report file for examination when trying to debug a problem that you
report. Figure 37.4 shows a sample report.
Figure 37.4 The
Windows Report Tool is new in Windows 98.
You can customize the information included within the Windows Report file, and
might be requested to do so by Microsoft. First, access the Collected
Information command in the Options menu, which displays the Collected
Information dialog box shown in Figure 37.5. Select or deselect the files to be included
with the check boxes shown. You can also include files not listed by clicking the
Add button and then selecting any files that should be included, such
as application-specific .INI files.
Figure 37.5 The
Collected Information dialog box.
Before creating the report file and sending it, also access the User
Information command in the Options menu. It lists your name, address,
and telephone contact information. This information should be correct in case you
need to be contacted further to resolve the problem.
After completing the information, use the Save command on the File menu to prepare the report file. Report files are compressed into .CAB files. You can then email the .CAB file as directed, and the receiving engineer extracts its contents.
When Windows 98 is installed onto a functioning system, it backs up the key system
.DLL files it finds before replacing them with its own versions. This might cause
problems for some installed applications. Additionally, you might have had some installed
.DLL files that were a later version than the ones included with Windows 98 and that
might be required by your applications. The Version Conflict Manager shows you which
.DLL files that were backed up have a higher version number than the ones installed
by Windows 98. You can select .DLL files from the list and restore them from the
backup, replacing the Windows 98 versions of those files. This should be done with
great care because the non-Windows 98 .DLL files might cause other problems in the
system. Figure 37.6 shows the Version Conflict Manager.
Figure 37.6 The Version
Conflict Manager is new for Windows 98.
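The comparison that Version Conflict Manager presents, backed-up files whose version number is higher than the installed one, can be sketched in Python as follows. This is an illustration only: the file names and version strings are hypothetical, the functions `parse_version` and `newer_backups` are assumptions for the example, and nothing here reads real version resources from .DLL files.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '4.10.1998' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def newer_backups(backed_up: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List files whose backed-up version is higher than the installed version."""
    result = []
    for name, old_version in backed_up.items():
        current = installed.get(name)
        if current is not None and parse_version(old_version) > parse_version(current):
            result.append(name)
    return sorted(result)


# Hypothetical data for two files that setup replaced.
backed_up = {"EXAMPLE1.DLL": "4.72.3110.0", "EXAMPLE2.DLL": "4.0.950.0"}
installed = {"EXAMPLE1.DLL": "4.10.1998.0", "EXAMPLE2.DLL": "4.10.1998.0"}
print(newer_backups(backed_up, installed))
```

Comparing tuples of integers rather than raw strings matters here: as text, "4.9" sorts after "4.10", which would rank the versions the wrong way around.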
One of the biggest troubleshooting problems in previous versions of Windows was the inability to verify the integrity of the operating system's files. Their integrity could be damaged in several ways: by bad sectors on the disk, by an incorrect copy from the original CD-ROM during installation, or by being overwritten by subsequently installed applications. Corrupted system files can play havoc with a system, resulting in application or operating system crashes, or erratic behavior that is difficult to pin down. Now, with Windows 98's System File Checker, you can scan all of the Windows 98 files to ensure that they haven't become corrupted or been replaced by incorrect versions. When you start System File Checker, you see the window shown in Figure 37.7.
Figure 37.7 The System
File Checker is new for Windows 98.
Generally, you can click the Start button to start the scan of files.
System File Checker compares the CRC values and version numbers of the installed
Windows 98 files with a database of expected values. When it finds a problem, a dialog
box appears asking you what action it should take with the file in question. Figure
37.8 shows such a dialog box.
Figure 37.8 The
File Changed dialog box in System File Checker.
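The general technique of checking files against a database of expected CRC values can be sketched in Python. This is not how System File Checker is implemented internally, and it ignores version numbers; it only shows the principle. The function names `file_crc32` and `check_files`, and the idea of holding the database as a simple name-to-CRC dictionary, are assumptions for the example.

```python
import zlib
from pathlib import Path
from typing import Dict


def file_crc32(path: Path) -> int:
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def check_files(expected: Dict[str, int], root: Path) -> Dict[str, str]:
    """Compare files under root with expected CRCs; report changed or deleted files."""
    findings = {}
    for relative_name, expected_crc in expected.items():
        target = root / relative_name
        if not target.is_file():
            findings[relative_name] = "deleted"
        elif file_crc32(target) != expected_crc:
            findings[relative_name] = "changed"
    return findings
```

A baseline for `expected` could be built by calling `file_crc32` on each file of a known-good installation, which mirrors the idea of creating a verification database while the system is healthy.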
When System File Checker finds a questionable file, you can choose from among the
following actions:
Before running System File Checker, you can adjust its options. From the main window, click the Settings button to open the dialog box. Figure 37.9 shows the Settings tab of the System File Checker Settings dialog box.
On the Settings tab, you can choose how flagged files are backed up. You can always
back them up to a directory before restoring different versions from the installation
media. You can be prompted on a case-by-case basis for whether they should be backed
up, or you can choose to never back them up. You can also change the location to
which the files will be backed up before System File Checker restores them. Additionally,
on the Settings tab, you can control how the System File Checker log is maintained,
and you can view the log file from previous uses of System File Checker. Finally,
you can direct whether System File Checker looks for changed or deleted files with
the appropriate check boxes.
Figure 37.9 System
File Checker Settings dialog box's Settings tab.
The Search Criteria tab (see Figure 37.10) defines which directories are checked
by System File Checker. If you are creating a new verification database, you can
define which directories are tracked and can include application directories if needed
by using this tab. You can also define which file extensions are checked.
Figure 37.10 System
File Checker Settings dialog box's Search Criteria tab.
The final tab, Advanced (see Figure 37.11), lets you choose which verification database
is used by System File Checker; it also lets you create a new database. Should you
require, you can also select the original verification database installed with Windows
98 by clicking the Restore Defaults button.
When installing Windows 98 workstations into an organization, it makes sense to run System File Checker as a final step (after all applications and services have been installed and configured). Additionally, you should create a new verification database, using the Search Criteria tab to select all the directories that contain files that make up the default system configuration. This gives you two benefits that can help future troubleshooting efforts: It gives you a baseline from which to measure changes to the system, and it also monitors application files and settings to ensure that they do not become damaged. After creating the database with the Advanced tab and selecting all the appropriate file types and directories with the Search Criteria tab, run System File Checker and choose Update Verification Information for All Changed Files when you are prompted about the first file.
Figure 37.11 System
File Checker Settings dialog box's Advanced tab.
It's unfortunate, but sometimes a system's Registry becomes corrupted. Because the Registry contains so many vital system settings, corruption in either the SYSTEM.DAT or USER.DAT files can wreak havoc with a system. The trouble is, you don't often know whether or not the Registry's corrupted, and you might not know if the backups you have of the Registry are corrupted, either. Windows 98 includes a new tool called the Registry Checker, which quickly scans the Registry files for corrupted data. Note that Registry Checker cannot find incorrect settings within the Registry, just corrupted Registry files.
When you run Registry Checker, you see a simple bar chart showing its progress. It typically finishes checking the Registry within 15-30 seconds. If the Registry has not been backed up on the day on which you run Registry Checker, it prompts you with a message box asking if you would like it backed up immediately.
Some device failures cause Windows 98 to run improperly. When this type of failure is detected, you can use the Automatic Skip Driver Agent (ASD) to control whether that device will be started for future Windows 98 startups.
Starting ASD from MSI shows you any critical driver failures on the machine and allows you to designate their new startup status. If there are no driver failures recorded, you see a message to that effect.
Dr. Watson is a program that runs in the background and can detect system errors when they occur. When this happens, Dr. Watson creates a log file that contains the state of the system. Dr. Watson log files might be requested by Microsoft Technical Support to help diagnose a tricky problem in the operating system.
Dr. Watson has been reworked for Windows 98 and now includes a number of system
state information tabs when it is run in its Advanced mode. Upon starting Dr. Watson,
you see its Diagnosis page (see Figure 37.12). On this tab, any peculiarities about
the state of the system detected by Dr. Watson are reported.
Figure 37.12 Dr.
Watson is updated for Windows 98.
If the information tabs indicated in Figure 37.12 are not visible, access the View
menu and choose Advanced View to display them. Each of the remaining
tabs shows information about various aspects of the system. These information tabs
are similar to the information pages shown in MSI, although they are less extensive.
To control Dr. Watson's settings, access the View menu and choose
Options. You see the Dr. Watson Options dialog box shown in Figure
37.13. You can control how many log entries are maintained by Dr. Watson, the directory
in which they are stored, as well as the number of instructions and stack frames
that will be disassembled in the Dr. Watson log files. You can also choose whether
to open the Dr. Watson window in Standard or Advanced
view. (The difference is the access to the additional information tabs.)
Figure 37.13 Dr.
Watson Options dialog box.
The ScanDisk utility is unchanged from Windows 95. You use it to detect directory
structure problems on a disk, and you can perform detailed disk surface analysis
testing. Figure 37.14 shows the ScanDisk utility.
Figure 37.14 ScanDisk
is the same as it was in Windows 95.
You should run ScanDisk prior to doing more detailed troubleshooting on Windows 98
systems. Many system problems are caused by lost clusters or other errors in the
disk system that ScanDisk can detect and fix.
The ScanDisk Advanced Options dialog box lets you choose how ScanDisk operates, and is shown in Figure 37.15. You can control whether summary displays are shown after ScanDisk finishes, how ScanDisk maintains its log file (usually C:\SCANDISK.LOG), how ScanDisk handles cross-linked files, how it handles lost clusters, and what types of tests it performs on the directories it examines.
Problems with Windows 98 systems or applications are often caused by inappropriate changes to files. Perhaps a file has been changed by an error, by a virus, or for some other reason. A utility that is located on the Windows 98 CD-ROM can help you detect such problems.
Figure 37.15 ScanDisk
Advanced Options dialog box.
Located in the \Tools\Reskit\Diagnose directory, FileWise (FILEWISE.EXE) lets you
select files or entire directory structures and then displays detailed information
about all of the files it finds. It then can be made to calculate Cyclic Redundancy
Check (CRC) values for all of the files, and the resulting information can be saved
into a text file. You can then use FileWise to perform a similar test of a supposedly
matching set of files, and then compare the two text files for differences that should
not exist. For example, you might compare the files stored in the \Windows directory
of two different machines, one of which is having strange problems, but both of which
are configured the same. In this context, FileWise can help you identify files that
don't match, and might help you find the source of the problem. Another way to use
FileWise is to make a "snapshot" of a set of files prior to making some
change to a system, and then use FileWise to make another "snapshot"
after the change is complete. You can then compare the resulting saved text files
in order to learn which files changed as a result of the modification.
FileWise is shown in Figure 37.16.
Figure 37.16 FileWise
is on the Windows 98 CD-ROM.
After starting FileWise, open its File menu and choose Add a File or Add a Directory. Select the files or directories that you want FileWise to analyze. After it finishes loading all of the file information, click the Generate CRCs button on its toolbar. Calculating CRCs might take a while for very large sets of files. When finished, use the Save command (in the File menu) to save a text file containing all of the file details. Follow the same steps with another machine, a copy of a directory on the same machine, or after some change that you want to analyze. You can then compare the two text files to learn which files were changed between the two FileWise analyses.
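The snapshot-and-compare workflow described above can be approximated in Python to show what the comparison step involves. This sketch does not use FileWise and does not reproduce its file format: the tab-separated layout (name, size, CRC) and the function names `snapshot`, `parse_snapshot`, and `compare_snapshots` are assumptions made for the example.

```python
import zlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> str:
    """Describe every file under root as 'name<TAB>size<TAB>CRC', one per line."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        crc = zlib.crc32(data) & 0xFFFFFFFF
        lines.append(f"{path.relative_to(root).as_posix()}\t{len(data)}\t{crc:08X}")
    return "\n".join(lines)


def parse_snapshot(text: str) -> Dict[str, str]:
    """Map each file name in a snapshot to its 'size:CRC' fingerprint."""
    entries = {}
    for line in text.splitlines():
        name, size, crc = line.split("\t")
        entries[name] = f"{size}:{crc}"
    return entries


def compare_snapshots(before: str, after: str) -> Dict[str, List[str]]:
    """Report files added, removed, or changed between two snapshots."""
    old, new = parse_snapshot(before), parse_snapshot(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }
```

Saving the string returned by `snapshot` to a text file before and after a change, and then passing both texts to `compare_snapshots`, follows the same before-and-after pattern as the two FileWise analyses.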
You've learned about general troubleshooting practice and about the troubleshooting tools included with Windows 98. Armed with this knowledge, you can start to work through most Windows 98 system problems. However, until you gain experience dealing with a variety of problems, you might spend unnecessary amounts of time looking in areas that the symptoms don't point to. The following sections discuss a number of common types of problems that you might be called on to solve under Windows 98. They discuss the causes of these problems and show you which tools to use (and in which order) to rapidly resolve them.
NOTE Problems with system boot or startup are discussed in Chapter 36, "The Windows 98 Boot Process and Emergency Recovery."
Before pursuing any troubleshooting activity on a system, there are some steps that you should always take. Not only do these steps eliminate some common problems, but they are easily performed:
After performing these steps, if the problem has not been revealed or corrected, proceed with the following sections.
Problems with applications crashing on a regular basis can be common. The first step is to determine if only one application is crashing, if a set of common applications are crashing, or if all applications are crashing.
Problems with a single application crashing are almost always related to one of several possible causes:
When you have trouble with a group of applications, the trick is to find out what is common among the applications. There is almost always a common .DLL, system service, or device upon which all the applications depend. Use MSI to look for devices that are in conflict or are reporting trouble, and then use System File Checker to check the Windows 98 .DLL files and system services. Replace or repair any devices that aren't working properly or are in conflict. Reinstalling one of the crashing applications might resolve the problem for all of them. As a last resort, you can try reinstalling Windows 98 itself.
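Finding what a group of crashing applications has in common is essentially a set intersection, which the following Python sketch illustrates. The application names, component names, and the functions `common_dependencies` and `ranked_suspects` are all hypothetical; in practice you would assemble the dependency lists yourself from what MSI and the application vendors tell you.

```python
from typing import Dict, List, Set


def common_dependencies(depends_on: Dict[str, Set[str]], crashing: Set[str]) -> Set[str]:
    """Return the components that every crashing application depends on."""
    shared = None
    for app in crashing:
        deps = depends_on.get(app, set())
        shared = set(deps) if shared is None else shared & deps
    return shared or set()


def ranked_suspects(depends_on: Dict[str, Set[str]], crashing: Set[str]) -> List[str]:
    """Order shared components so those unused by healthy applications come first."""
    shared = common_dependencies(depends_on, crashing)
    healthy = set(depends_on) - crashing
    used_by_healthy = set().union(*(depends_on[app] for app in healthy))
    return sorted(shared, key=lambda dep: (dep in used_by_healthy, dep))


# Hypothetical applications and the components they rely on.
depends_on = {
    "WordProc": {"SHAREDUI.DLL", "PRINTSVC"},
    "Spreadsheet": {"SHAREDUI.DLL", "MATHLIB.DLL", "PRINTSVC"},
    "Mail": {"PRINTSVC", "NETLIB.DLL"},
}
crashing = {"WordProc", "Spreadsheet"}
print(ranked_suspects(depends_on, crashing))
```

A component that the crashing applications share but that a healthy application also uses without trouble is a weaker suspect, which is why the sketch sorts it last.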
WARNING I hope that you think this goes without saying: All important data files on a system should be regularly backed up, before trouble strikes. You should also make an extra backup of the system before doing most troubleshooting work.
If all applications on the system are crashing regularly, then the problem is not with the applications, but with the operating system or the computer hardware. See the following section for information on resolving these problems.
Problems where the system is crashing regularly can be extremely difficult to resolve because there are many possible sources, and the trouble might be intermittent. Moreover, the system might report all sorts of different problems over time, none of which will tell you what's really causing the trouble. Chasing down these gremlins can be time-consuming and may become expensive if the cause seems rooted in hardware and you don't have spare parts with which to experiment. Proceed as follows:
Hard disks usually fail in one of two ways: Either they go completely dead and never work again, or they start to develop bad sectors, often at an increasing rate. There's not much you can do about the former except to replace the drive and start over, restoring your data from backup. In the latter case, you should regularly run ScanDisk; if errors keep cropping up on the surface scan, then make a fresh backup of your data and replace the drive as soon as possible. Cases where hard drives are failing by sectors going bad over time do not usually stabilize, so it's not a matter of fixing the bad sectors and continuing.
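If you note the number of bad sectors reported after each surface scan, a simple trend check like the following Python sketch can make the "increasing rate" pattern explicit. ScanDisk does not provide this; the counts are ones you would record by hand, and both the function name `bad_sectors_growing` and the choice of three consecutive scans as the threshold are assumptions for the example.

```python
from typing import Sequence


def bad_sectors_growing(counts: Sequence[int], min_scans: int = 3) -> bool:
    """Return True if the most recent scans each found more bad sectors than the last."""
    if len(counts) < min_scans:
        return False  # not enough history to call it a trend
    recent = counts[-min_scans:]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical totals noted after four successive surface scans.
history = [0, 2, 5, 11]
if bad_sectors_growing(history):
    print("Bad sectors are increasing: back up now and plan to replace the drive.")
```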
Problems with modem connections can be very tricky to resolve because although you will usually know that something is wrong with the dial-up connection, there are many possible sources of failure, mirrored on both sides of the connection. Check the following to see if the problem is on a particular side:
As pointed out in the chapter introduction, troubleshooting tricky problems can be both frustrating and rewarding. In this chapter, you learned about general troubleshooting practice (including how to gather troubleshooting information from users), about the troubleshooting tools included with Windows 98, and about some specific advice for troubleshooting common problems on Windows 98 systems.
Although this chapter should serve to help you solve a problem, don't forget other resources that are available to you. Sometimes people spend inordinate amounts of time troubleshooting some problem, when they could have solved the problem in one-tenth the time if they had used a different resource. Application vendors and hardware manufacturers are invaluable sources of assistance when troubleshooting, and most maintain Web pages that contain troubleshooting information. For Windows 98 problems, don't forget to consult the Microsoft Knowledge Base, which lists problems encountered and solved by Microsoft Software Engineers.
© Copyright, Macmillan Publishing. All rights reserved.