This chapter is devoted entirely to the defragmenter as a solution to the fragmentation problem.
Nevertheless, it was a start, and a crack in the walls of Digital
Equipment Corporation's system software fortress.
Before 1986, Digital had a virtual monopoly on system software
for the VAX. But, when it came to defragmentation, Digital's official
view was that on-line defragmentation was "not an easy problem."
Each year, the Digital Equipment Computer Users Society (DECUS), a large
organization of Digital's customers, surveys its members to determine
the things that most need improvement in the OpenVMS operating system.
This survey is called the System Improvement Request (SIR) ballot. In
the 1985 SIR ballots, on-line disk compression (another term for
defragmentation) placed first by a large margin in the United States and second
in Europe. Digital customarily responds to the top ten items on
the SIR ballot. In Europe, Digital's response took the form of
a talk by Andy Goldstein, a top guru for Digital's revered VMS
Central Engineering group, in which he said:
In the U.S., Digital's official response appeared in the February
1986 DECUS newsletter:
These statements by Digital were alarming. Here we had a very
large group of customers on two continents declaring this to be
their single biggest problem with Digital's flagship product, and
Digital's official response was that it had no current plans
to deal with it!
It would appear that the roadblock in Digital's path was the fixed
idea that on-line defragmentation was an "n-squared order
problem." This idea is basically a declaration of the impossibility
of solving the problem in an acceptable way. Fortunately for the
customers, there is nothing like a public declaration of impossibility
from an expert to prompt entrepreneurs into action.
The second quote, in fact, was a driving factor in convincing
more than one software company to throw its hat into the defragmenter
ring. It seemed to be assurance that Digital would not be coming
along anytime soon to steamroller the market with its own solution
in the form of an "official" Digital defragmenter, even
in the face of huge customer demand. True enough, it was not until
more than five years later, on December 9, 1991, that Digital
announced a disk defragmentation product. During those five years,
Digital repeatedly declared its policy on the subject of defragmenters
to be that the possibility of obtaining performance gains by use
of a disk defragmenting utility was a "misconception."
The official Digital solution was to back up the disk to tape and then restore it.
The policy statement went on to speculate on the horrors that
might befall a defragmenter user if something went wrong.
This policy was based on unsound advice. The policy statement
ignores the risk inherent in backup and restore: that of unreadable
tape. This is not an insignificant risk. What happens if you back up
a disk to tape, reinitialize it and then find that the tape is
unreadable? You're up the river without a paddle, as they say.
That data is gone and it's gone for good. Less obvious, but just
as serious, is the fact that the backup and restore process is so
tedious and so time-consuming that one-quarter of the System Managers
surveyed recently said they have never bothered to do it. How
good can a solution be if it is never used at all?
On top of that, Digital's policy statement was not backed by any serious
testing to determine whether the facts on which it was based were
true. I know, because I did serious testing. The best that
could be said for Digital would be that Digital had lumped all
defragmenters together and assumed that what is true for one must
be true for the rest. Don't get the idea that I think badly of
Digital. I admire the company in many ways. But Digital has over
95,000 employees and sometimes some of these employees don't communicate
very well with each other.
So, first Digital said it couldn't be done. Then, when it was
done by others, Digital said its usefulness was a misconception.
Now Digital is fielding its own defragmenter product. You draw
your own conclusion.
In any event, in 1986 the defragmenter happened. Now there was
a tool for the specific purpose of defragmenting disks. Best of
all, an on-line variety of defragmenter was developed that did
everything itself, automatically.
An on-line defragmenter is distinguished from an off-line defragmenter
by the fact that you do not have to shut down the system, kick
the users off or take a disk out of service to use the defragmenter.
An automatic on-line defragmenter goes a step further and includes
a mechanism for determining when to defragment. The non-automatic
version is manual in this regard - it requires the System Manager
to decide when to defragment. The automatic version decides for
itself - whether by sensing the state of fragmentation, by measuring
the degree of fragmentation between defragmentation passes, or simply
by waiting a certain time and then cleaning things up again.
Ideally, the on-line defragmenter keeps a record of how badly
fragmented the disk is every time it makes a defragmentation run
and, based on that information, increases or decreases the time
intervals between runs.
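To make that idea concrete, here is a minimal sketch in Python - purely illustrative, with thresholds and routine names that are my own assumptions rather than anything taken from OpenVMS or any actual product - of how such a schedule might stretch or shrink the interval between runs based on what the last run found:

    # Illustrative only: adjust the interval between defragmentation runs
    # based on the average fragments-per-file found on the previous run.
    # The thresholds here are assumptions, not measured values.

    MIN_INTERVAL_HOURS = 1      # never run more often than hourly
    MAX_INTERVAL_HOURS = 168    # never wait longer than a week

    def next_interval(current_hours, avg_fragments_per_file):
        """Return how many hours to wait before the next pass."""
        if avg_fragments_per_file > 1.10:
            # Fragmentation built up quickly; come back sooner.
            return max(MIN_INTERVAL_HOURS, current_hours / 2)
        if avg_fragments_per_file < 1.02:
            # The disk stayed clean; back off and spend less overhead.
            return min(MAX_INTERVAL_HOURS, current_hours * 2)
        return current_hours    # building at a manageable rate; keep the pace

    # A disk that came back at 1.15 fragments per file after a 24-hour
    # wait would be rechecked in 12 hours.
    print(next_interval(24, 1.15))   # -> 12.0

The hard part, of course, is getting a trustworthy measure of fragmentation in the first place.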
This is not a simple problem. Perhaps the single greatest contributing
factor to the problem is the fact that OpenVMS includes no mechanism
for determining how often a particular file is accessed. If you
could find out how often a file is accessed, you would know how
critical that file is to system performance and thus how much
attention the defragmenter should give to keeping that particular
file contiguous. Another important factor is the lack of any mechanism
to determine whether a file has been accessed at all. Without
either mechanism, we are reduced to looking mostly at indirect
measures of the impact of fragmentation on the system. These measures
are described in Chapter 3, "What's Wrong With Fragmentation?"
Once it has been determined, by whatever mechanism, that it is
time to defragment a disk, the defragmenter has to determine whether
defragmentation can be done safely at that time or whether it
should wait for a better time to do it. It also has to determine
whether the defragmentation activity would degrade system performance
unacceptably at that time. It even has to check whether defragmentation
will do any good; maybe the disk is in such good shape that defragmentation
would be a waste of resources. In this case it would just wait
for a while longer and check again later on.
When the defragmenter has determined that the time is right, the
next question is, "What files should be defragmented?"
On even the most badly fragmented disk, some files are contiguous.
Attempting to defragment these would be a waste of time and resources,
so some means is needed of detecting them and causing the defragmenter
to skip processing those files - all with a minimum of overhead,
of course. Other files (such as INDEXF.SYS) must not be defragmented
because to do so would interfere with the proper operation of
the operating system.
Then, amongst the files that should be defragmented, some
determination must be made as to which should go first. Should
the files be processed in order of appearance in their directory
files? Should the most fragmented files be processed first? Or
is some other order best? Does it matter at all?
When a file is selected for defragmenting, the defragmenter has
to determine where on the disk to put the newly-created defragmented
file. The wrong location might do nothing to improve the condition
of free space fragmentation or might even make it worse. Some
intelligence is required to choose the best place on the disk
for the defragmented file, keeping in mind the gaps that will
be created when the fragmented version is deleted.
Then there is the question: what if no single place on the disk
is suitable? Could the file be split into two or more pieces and
still be better off than in its present fragmented state? Maybe.
Maybe not.
Suppose the disk is so messed up that it is going to take two
or more defragmentation passes to get the disk into good shape.
How does this affect decisions on where to put a defragmented
file? Could it profitably be put into a worse position
on the first pass, anticipating movement to its ideal position
in a subsequent pass?
When a new location is selected, how exactly should the copying
be done? Should the file be copied directly into the new location
or should it be copied first into a temporary location and then
moved to its final destination? And how do you deal with user
attempts to access that file while it is in the middle of being
relocated? What if someone attempts to modify the original file
after the copy has been made, but before the new file formally
takes the place of the old one? And, not the least of our worries,
what if the system goes down right in the middle of all this?
An automatic on-line defragmenter also has a quality control problem.
How can it be sure the file was copied correctly and users can
now access every bit of the new file exactly as they did the old?
These are the obvious problems in the simple case. No mention
has been made of problems unique to multi-header files, files
that span disk volumes, or files that are held open by the system
in a way that completely bypasses the usual procedures, so there
is no way to tell whether such a file is in use or not.
OK, gentle reader, the scary part is over. I hope you are still
reading so you can receive the news that there are solutions to
all these problems. My purpose in rattling off these problems
is to show you that defragmentation is a complicated undertaking,
that it has been thought through and we defragmenter vendors are
not playing around with your irreplaceable data with our heads
stuck in the sand. The important thing to know is that the computer
system itself has the answers to these questions within it or,
at least, it has the data from which answers can be formulated.
An automatic on-line defragmenter, then, is one that uses the
data already available within the computer, without the need for
operator intervention, to determine when to defragment disks and
disk files, which files to defragment, the order in which to defragment
them, where to put the new, contiguous files, and whether to defragment
a particular file completely or only partially - and it does all this
without interfering with user access to the disk and disk files,
with absolute, 100% guaranteed data integrity. Yes,
this can be done and, at this writing, is being done on tens of
thousands of OpenVMS systems around the world.
Done right, this solution to fragmentation requires no attention
from the System Manager or operations staff at all. It is a complete
elimination of fragmentation as a problem for that OpenVMS system.
Such an automatic solution to a problem inherent in the operating
system is called an operating system enhancement, as opposed
to the manual, tool-variety solution, which is called a utility.
A good on-line defragmenter does not just provide a means for
recovery of user data in the event of a system crash during the
defragmentation process; it actually processes files in such a
way that no data can be lost. It is possible and practical to
create the new, defragmented version of the file, verify its accuracy
and replace the old version with the new in between user accesses
to the file, all the while guaranteeing that directory and INDEXF.SYS
file information refers to a flawless copy of the file. With such
a method, there is no window for error in which a file of user
data can be lost, even when the system crashes at the worst possible
moment.
Apparently, it was concern about this potential window for error
and uncertainty about its handling that kept Digital out of the
defragmenter business until 1991. In that year Digital incorporated
a mechanism into OpenVMS version 5.5 to relocate a file without
any window for error. The mechanism adopted
by Digital, called MOVEFILE, is similar to the mechanism a leading
defragmenter had been using since 1986. When MOVEFILE appeared,
Digital's public concerns about the safety of on-line defragmenters
ceased, at least for those defragmenters that used the official
Digital mechanism for moving files!
The solution is easily explained. Relocating a file on the disk
for purposes of defragmenting is a multi-step process. Doing some
of the steps without doing the rest can result in a file that
is confused, damaged or even lost. The solution is to isolate
the critical steps that must either all be done completely or not
be done at all, and to treat them as a single step. Such a group
of steps treated as a unit is called a primitive. In version
5.5 of OpenVMS, this operation is called the movefile primitive.
It moves a file from one location on a disk to another, guaranteeing
that all the steps necessary to move the file will be fully complete,
or none of the steps will be done at all. Thus, you can be sure
that the file is either fully moved intact or remains fully intact
at its old location. No in-between state can exist. Therefore,
even if the system crashes, you can be confident that all your
data still exists without corruption of any kind.
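The following Python sketch is only an analogy for this all-or-nothing behavior - it uses an in-memory list as the "disk" and invented names, and it is not the actual OpenVMS implementation - but it shows why a single-step commit leaves no in-between state:

    # Conceptual analogy of a movefile-style primitive, not OpenVMS internals.
    # The "disk" is a Python list of blocks; the header is just a list of
    # block numbers.

    class FileHeader:
        def __init__(self, blocks):
            self.blocks = blocks          # which disk blocks hold the file

    def movefile(disk, header, new_blocks):
        old_blocks = header.blocks

        # Step 1: copy the data. If we stop here, the header still points
        # at the old, intact copy, so nothing is lost.
        for old, new in zip(old_blocks, new_blocks):
            disk[new] = disk[old]

        # Step 2: verify the copy before committing anything.
        if any(disk[o] != disk[n] for o, n in zip(old_blocks, new_blocks)):
            return False                  # abandon; the file is untouched

        # Step 3: the commit - repoint the header in one atomic step.
        # Before this line the file is the old copy; after it, the new one.
        header.blocks = new_blocks

        # Step 4: only now may the old blocks be reused.
        for old in old_blocks:
            disk[old] = None
        return True

    # A three-block "file" moved from blocks 0-2 to blocks 5-7.
    disk = ["A", "B", "C", None, None, None, None, None]
    header = FileHeader([0, 1, 2])
    movefile(disk, header, [5, 6, 7])
    print(header.blocks)                  # [5, 6, 7]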
This is not a very safe rule for a defragmenter, however, and
any defragmenter should honor file placement control just in case
someone really does need that file to be in that exact place.
These are not usually viable options, so system disks were originally
excluded from processing by defragmenters. Some defragmenters
skirted the issue by excluding all files in a system root directory,
processing only files that reside on the system disk outside those
directories reserved for the OpenVMS operating system. The remarkable
aspect of this is that user files residing on the system disk
probably cost more performance than moderate levels of fragmentation.
The System Manager could get a bigger performance boost by moving
those files off the system disk than by defragmenting them,
bigger perhaps than defragmenting all the fragmented user files
on user disks.
A good defragmenter knows exactly which files can be moved and
which cannot, so it can defragment a system disk just as freely
as any user disk without risk to the system. The same is true
for a common system disk, where the system files of two or more
different systems reside, though extra care must be taken by the
defragmenter to ensure that another system's unmovable files are
left in place even though they will not appear unmovable to the
system on which the defragmenter is running.
A quorum disk is one which substitutes for a VAX or Alpha AXP
computer, acting as a node in a cluster. The reasons for this
are complex and not important to defragmentation. The important
thing is that, as on a system disk, certain files on a quorum
disk must not be moved. The defragmenter has to take this into
account.
The main reason, if not the only reason, for defragmenting a disk
is performance. If you know nothing of fragmentation, you can
become quite perplexed watching a system's performance become
worse and worse, week after week, month after month, with exactly
the same hardware configuration, the same software applications
and the same user load. It's almost as if the machine left the
factory with planned obsolescence built into it.
How can the performance of the exact same system be so much worse
if nothing has changed? Well something did change: the files and
disk free space fragmented with use. The proof? Defragment the
disks, and the system performs like new.
The sinister side of this is that fragmentation occurs so gradually
that you might not notice the creeping degradation. If system
response worsens by only a fraction of a second each day, no one
is likely to notice from one day to the next. Then, weeks or months
later, you realize that system response is intolerable. What happened?
Your system has caught the fragmentation disease.
The first rule of performance management
is that the cure must not be worse than the disease.
This is the rule that killed off-line defragmenters. Here's how
it works.
Let's say, for the sake of argument, that your system is losing
10% of its performance to fragmentation. That is to say, jobs
take 10% longer to run than they should or, put another way, only
90% of a day's work can get done in a day. Ten percent of a 24-hour
day is 2.4 hours. The solution to your fragmentation problem has
to consume less than 2.4 hours per day or it just isn't worth
it.
Seems simple, doesn't it? Well, shutting down the system or taking
a disk out of service to defragment it is a 100% degradation of
performance. Performance just doesn't get any worse than "the
system is down." So an off-line defragmenter that costs you
three or four hours of computer time a day is more costly than
the losses to fragmentation. The cure is worse than the disease.
The computer resources consumed by a defragmenter must be less,
much less, than the performance losses due to fragmentation.
The best way to violate this rule is to defragment using a method
that requires taking the disk out of service. So a good defragmenter
works on a disk while the system is up and while the disk being
defragmented is being accessed by user applications. After safety,
this is the most important feature of a defragmenter.
A secondary aspect of this same disk access
feature is that the files on the disk must be available to user
applications. It is not enough to allow access only to the free
space on the disk for the creation of new files. User applications
must be able to access existing files as well. And while the defragmenter
may be accessing only a single file out of perhaps 10,000 on a
disk, guess which file some user's application is most likely
to want to read? Yes, it is the very file that happens to be
undergoing defragmentation at that moment. Murphy's law strikes again.
So an on-line defragmenter must assume that there will be contention
for access to the files being defragmented. Other programs will
want to get at those files and will want to get at them at the
same time as the defragmenter. The defragmenter,
therefore, must have some means of detecting such an access conflict
and responding in such a way that user access is not denied. The
defragmenter has to give way. Technologically, this is tricky,
but it can be done and is done by a good defragmenter.
Another aspect of defragmenter performance is the amount of time
and resources consumed in finding a file to defragment. Scanning
through some 10,000 files by looking up file names in directories
and subdirectories is out of the question. The time it takes to
do this is a blatant violation of Rule One - it outweighs the
performance gains likely to be obtained by defragmenting.
A much better way to rapidly find files for defragmenting is by
reading the INDEXF.SYS file directly. The index file contains
the file headers for all the files on the disk and within each
file header is contained all the information a defragmenter needs
to know about the state of fragmentation of a file. Specifically,
the header tells how many fragments there are, where each is located
on the disk and how big each one is. So a defragmenter can zip
through the index file, picking out those files that need defragmenting,
consuming typically only one disk access per file checked. Better
yet, by reading several headers at once, multiple files
can be checked for fragmentation with each disk access. A good
defragmenter uses the index file to find files to process.
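As a rough illustration - using a made-up header record rather than the real ODS-2 on-disk format - the selection step amounts to little more than this:

    # Illustrative sketch: pick out fragmented files by examining header
    # records instead of walking directories. FileHeader here is a
    # hypothetical stand-in for a real header in INDEXF.SYS.

    from collections import namedtuple

    FileHeader = namedtuple("FileHeader", ["name", "extents"])  # extents: (lbn, count) pairs

    def fragmented_files(headers, skip=("INDEXF.SYS",)):
        """Yield (name, number_of_fragments) for files worth processing."""
        for hdr in headers:
            if hdr.name in skip:
                continue                  # system files are left strictly alone
            if len(hdr.extents) > 1:      # more than one extent means fragmented
                yield hdr.name, len(hdr.extents)

    headers = [
        FileHeader("INDEXF.SYS", [(10, 5000)]),
        FileHeader("PAYROLL.DAT", [(100, 50), (900, 50), (4000, 25)]),
        FileHeader("REPORT.TXT", [(200, 120)]),
    ]
    print(list(fragmented_files(headers)))    # [('PAYROLL.DAT', 3)]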
After a file has been selected for defragmentation, the overhead
involved in the defragmentation process itself can be significant.
If the file is large, it can be very significant. After
all, it is usually necessary to copy the file in its entirety
to make it contiguous. As many as 200 disk accesses may be required
to copy a 100-block file (100 reads and 100 writes). These two
hundred disk accesses at 25 milliseconds apiece would consume
five seconds. With this kind of overhead, processing even a fraction
of 10,000 files on a disk holding hundreds of megabytes can be
a time-consuming activity. Fortunately, the OpenVMS file system
is more efficient than these figures would imply. Only the smallest
disks, for example, have a cluster size of 1, so disk reads and
writes generally move 2 or 3 or more blocks at once. Further,
regular defragmentation holds down the amount of activity required.
It is worth noting that performance worsens geometrically as the
degree of fragmentation increases, so catching fragmentation early
and defragmenting often requires fewer resources overall than occasional
massive defragmentation.
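For those who like to check the arithmetic, here is the back-of-the-envelope calculation in Python. The 25-millisecond access time and the cluster sizes are the same assumptions used above, not measurements of any particular drive:

    # Rough estimate of the disk time needed to copy a file during
    # defragmentation. Access time and cluster size are assumptions.

    def copy_cost_seconds(file_blocks, access_time_ms=25, cluster_size=3):
        """Seconds of disk time to read and rewrite a file."""
        transfers_each_way = -(-file_blocks // cluster_size)   # ceiling division
        total_accesses = 2 * transfers_each_way                # reads plus writes
        return total_accesses * access_time_ms / 1000.0

    print(copy_cost_seconds(100, cluster_size=1))   # worst case: 5.0 seconds
    print(copy_cost_seconds(100, cluster_size=3))   # about 1.7 seconds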
The defragmenter itself can do a lot to lessen the impact of defragmentation
overhead. A throttling mechanism, for example, can reduce defragmentation
I/O during times of intense disk activity and increase it during
slack times. This mechanism gives the appearance of greatly reduced
overhead by scheduling the overhead at a time when the resource
is not needed anyway. Using idle time in this way can make the
defragmenter invisible to users of the system.
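A throttle of this kind can be sketched in a few lines. The busy threshold, the activity sampler and the back-off period below are all assumptions chosen for illustration; a real product would get its activity figures from the operating system:

    # Illustrative throttle: defer the defragmenter's own I/O while user
    # disk activity is high, and proceed during slack periods.

    import time

    BUSY_THRESHOLD = 30          # user I/Os per second considered "busy"

    def throttled_copy(blocks, read_block, write_block, user_io_rate, backoff=1.0):
        """Copy blocks one at a time, yielding to user I/O while the disk is busy."""
        for old, new in blocks:
            while user_io_rate() > BUSY_THRESHOLD:
                time.sleep(backoff)      # defer our own I/O to a slack moment
            write_block(new, read_block(old))

    # Tiny demonstration with an in-memory "disk" and a canned activity sample.
    disk = {0: "A", 1: "B", 2: None, 3: None}
    samples = iter([50, 40, 10, 5])      # pretend the disk quiets down
    throttled_copy([(0, 2), (1, 3)],
                   read_block=lambda n: disk[n],
                   write_block=lambda n, v: disk.__setitem__(n, v),
                   user_io_rate=lambda: next(samples, 0),
                   backoff=0.01)
    print(disk)                          # {0: 'A', 1: 'B', 2: 'A', 3: 'B'}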
Perhaps the worst source of excess overhead for a disk defragmenter
is the attempt to analyze an entire disk before defragmenting
and plan a "perfect" defragmentation pass based on the
results of this analysis. The idea is that a defragmenter can
calculate the ideal position for each file, then move each file
to the best position on the disk. This is a holdover from the
off-line defragmenter days and, besides carrying the risks described
in Chapter 5, it is enormously expensive in terms of overhead.
Such an analysis requires examining literally every file on the
disk. On top of that, the analysis becomes obsolete instantly
if there is any activity on the disk other than the defragmenter.
A good defragmenter, then, should approach the process one file
at a time and not require the overhead of analyzing every file
on a disk in order to defragment only a few files.
After safety and performance, you should look for basic functionality
in a defragmenter.
The most basic functionality is the decision of what to defragment
and what to leave alone. Not in all cases is it desirable to defragment
everything. Some selectivity is required.
A defragmenter has to exclude from its processing certain system
files, like INDEXF.SYS. It should be wary of placed files and
files that have allocation errors. It should also have the capability
of excluding a list of files provided by the System Manager. You
might also look for the ability to include certain files
in processing (or exclude "all files except ______")
and possibly the ability to force immediate defragmentation of
a particular file or group of files.
Perhaps the most important basic functionality of a defragmenter
is determining whether a disk is safe to defragment or not. It
is possible, even commonplace, for a disk to get into a state
where the data on the disk is not exactly where the file headers
in the index file indicate it should be. When this occurs, it
is extremely important for the matter to be corrected before any
file involved is deleted, as deleting a file (using the erroneous
information in the header from the index file) might cause the
wrong data on the disk to be deleted! A good defragmenter must
detect this condition and alert the System Manager to it so it
can be corrected before defragmentation begins.
It is also possible for a good defragmenter to detect and isolate
certain types of problems on a disk and avoid those areas while
continuing to safely defragment the rest of the disk.
To answer the question of how often to defragment with a numeric
quantity, like "every week" or "every two weeks," you have to know how
long it takes for fragmentation to build up to a level where performance
suffers noticeably. You can use a disk analysis utility or a performance
monitor to measure the level of fragmentation on your system periodically,
perhaps daily. Then, when performance begins to suffer noticeably,
you can take note of what level of fragmentation you have. Let's
say this happens when fragmentation reaches an average of 1.1
fragments per file (10% fragmentation). Thereafter, you can periodically
measure fragmentation and when it gets to, say, 1.05, defragment
the disk.
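The measurement itself is simple arithmetic. Here is a small sketch using made-up extent counts; in practice the counts would come from the file headers:

    # Illustrative: compute the average fragments-per-file figure described
    # above and decide whether it is time to defragment. The extent counts
    # here are invented for the example.

    def average_fragments_per_file(extent_counts):
        return sum(extent_counts) / len(extent_counts)

    def time_to_defragment(extent_counts, threshold=1.05):
        return average_fragments_per_file(extent_counts) >= threshold

    # 20 files, two of which have split into 2 pieces: 22/20 = 1.10
    extent_counts = [1] * 18 + [2, 2]
    print(average_fragments_per_file(extent_counts))   # 1.1
    print(time_to_defragment(extent_counts))           # True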
An automatic on-line defragmenter includes a mechanism to measure
fragmentation and schedule defragmentation passes accordingly.
The ideal automatic on-line defragmenter would detect performance
drains attributable to fragmentation and eliminate the causes
before the drains became noticeable.
It is one thing to ramble on about the workings of a defragmenter
in an ideal, laboratory environment, but it is quite another thing
to see one working in the real world. One of the tricks played
on us System Managers in the real world is full disks. Somehow,
despite our best efforts, disks that really ought to remain 80%
full or less drift up to 98%, 99% or even 100% full. Sure, you
can spot this and take appropriate steps to handle it, but what
happens to your defragmenter during that critical window of time
between the disk going 100% full and your clearing off some space?
Look for a defragmenter that survives this circumstance intact
and leaves every bit of your user data equally intact.
A defragmenter can't do much in the way of defragmenting with
zero free space on the disk. The point is, if it can't defragment,
it shouldn't consume overhead either. So a good defragmenter should
do nothing at all when there is nothing useful it can do.
Another side of the same coin is the fragmented file that is larger
than the largest free space on the disk. Suppose, for example,
you have 10,000 blocks of free space, all in one place, but there
is a 12,000 block fragmented file. How does the defragmenter deal
with that?
Older defragmenters used to rely on scratch space on a second
disk to handle this problem, but that proved so unreliable that
it has disappeared as a practice. Some defragmenters don't deal
with the problem at all; they just ignore the file. A good defragmenter
will partially defragment the file, giving you the best
result it can within the constraints of space available, and then
return to the file later for complete defragmenting when sufficient
space has been freed up.
Another one of those real world tricks that
doesn't show up in the test lab is the file that is held open
all the time, leaving no "downtime" for that
file in which to defragment it. Database files are prime candidates
for this trick, particularly large database files. And
why not? That big database probably contains most of the data
that justifies the computer's existence. It ought to be
in use around the clock. A defragmenter needs to take such files
into account and provide a means of dealing with them safely.
Besides the always-open file, there is also the one file that
a user application happens to need at the instant the defragmenter
is working on it. What happens in that case? Does the defragmenter
give way? Does it even notice? Or does the application program
trip, fail and abort with a file access error?
The minimum proper action is for the defragmenter to 1) notice
that an attempt is being made to access the file, 2) abort its
own operation safely and quickly, and 3) try again later. The
ideal defragmenter would process files in such a way that no user
application could ever falter from or even detect an access conflict.
In other words, the defragmenter should have enough control over
system operation to move the file at a time when no user is attempting
access and in such a way that no attempted access by an application
would ever fail.
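At minimum, the "give way" behavior looks something like the following sketch. The try_lock and unlock calls are invented stand-ins, not the real OpenVMS lock manager interface:

    # Illustrative sketch of the minimum "give way" behavior: try for
    # exclusive access, back out immediately if anyone else wants the
    # file, and come back later.

    import time

    def defragment_with_retries(filename, try_lock, unlock, relocate,
                                attempts=5, wait_seconds=60):
        for _ in range(attempts):
            if try_lock(filename):       # got the file with no contention
                try:
                    relocate(filename)   # safe to move it now
                    return True
                finally:
                    unlock(filename)
            time.sleep(wait_seconds)     # a user has it; give way, retry later
        return False                     # leave the file for a future pass

    # Example: a file that is busy on the first two attempts, then free.
    busy = iter([False, False, True])
    ok = defragment_with_retries("PAYROLL.DAT",
                                 try_lock=lambda f: next(busy, True),
                                 unlock=lambda f: None,
                                 relocate=lambda f: print("moved", f),
                                 wait_seconds=0.01)
    print(ok)   # moved PAYROLL.DAT, then True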
Another simple but important piece of basic functionality is the
preservation of file creation and modification dates. You can
defragment a file quite easily by simply using the
DCL command COPY /CONTIGUOUS. If there is a free space of sufficient
size available, DCL will make a contiguous copy of the file for
you. The problem with this method is that it gives the copy a
new file creation date. You might not care whether the date is
changed or not, but the VMS BACKUP utility will. The next time
you do an incremental backup, the copied file will be saved even
though it was saved on an earlier backup. The reason is the new
date given to the file by the COPY command. For a single file,
this may be no big deal, but clearly a defragmenter cannot go
around changing file creation dates wholesale. Nor can the file's
modification date or date of last backup be changed. Either action
would cause your incremental backups to explode from relatively
small savesets to ones rivaling full backups in size.
A good defragmenter should not change file creation dates, file
modification dates, file backup dates or any other information
in the file header except the size and location of the extents
(fragments) that make up the file.
Directory files never become fragmented, but they are otherwise
just like any other file. Directory files do fragment the disk's
free space, however. Directory files present a special problem
for a defragmenter in that while a defragmenter has a directory
file locked or held open for relocation, not only is that directory
file inaccessible to users, so is every file in that directory
and every file in every subdirectory below it. To access a file
by name, a user application must go through the directory or directories
containing that file's name. If the directory is locked, the user
application gets an access conflict error. If the user application
is not expecting an access conflict error or is not designed to
deal with such errors, the application may abort.
A good defragmenter is designed with this problem in mind and
moves directory files without any restrictions whatsoever on user
access to files in that directory or its subdirectories. It is
no solution to just ignore directory files, as this leaves your
free space badly fragmented.
First of all, optimization has nothing to do with defragmentation.
Defragmentation is the solution to the problem created by fragmented
files and disk free space. Your system is slow. The real reason
it is slow is that files and disk free space are fragmented. The
solution is to make the files contiguous (not fragmented) and
group the free space together. That's it.
Where does optimization come in? Well, this is a different subject
altogether. The concept of disk optimization supposedly accelerates
file access even when all the files are contiguous and all the
free space is grouped together. Disk optimization is an attempt
to speed up file access by forcing certain files to be permanently
located in certain positions on the disk. The theory goes that
if you put the INDEXF.SYS file in the middle of the disk and group
the most frequently accessed files around it, the disk heads will
generally have to travel a shorter distance than if these files
were located randomly around the disk.
There are some major holes in this theory - so major, in fact,
that I think the "optimization" proponents either don't fully
understand the OpenVMS system or are just using optimization as
a marketing gimmick. Let's look at the holes one by one.
Hole number one: There is no standard, supported way on
an OpenVMS system to tell which files are most frequently accessed.
In fact, there is no way to tell which files are frequently accessed
or even which files have ever been accessed. You can tell
which files have been written and when they were last written,
but not when they were read. The only thing that comes close to
providing this information is the enabling of volume retention
dates, but enabling this feature consumes more overhead than you
are likely to get back by "optimizing" file placement.
The cure is worse than the disease.
Hole number two: Extensive analysis of real-world computer
sites shows that it is not commonplace for entire files to be
accessed all at once. It is far more common for only a few blocks
of a file to be accessed at a time. Consider a database application,
for example. User applications rarely, if ever, search or update
the entire database. They access only the particular records desired.
Thus locating the entire database in the middle of a disk is wasteful
at best and possibly destructive as far as performance is concerned.
Hole number three: File placement capability in OpenVMS
was designed for the realtime laboratory environment in which
a single process has continuous control of the computer system.
In such a system, the time consumed by head movement from one
particular file to another particular file can be critical to
the success of the process. The system designer can minimize that
critical time lag by calculating the ideal location for the second
file in relation to the first and forcing the two files to exact
locations. Then, when the process has completed reading the first
file, access to the second is effected with minimal delay.
By comparison, consider the typical interactive user environment.
Dozens or even hundreds of interactive users might be logged on
and active at any moment, running who knows what applications,
accessing innumerable files willy-nilly in every conceivable part
of a disk. How can one even hope to guess where the disk's read-write
head might be at any given time? With this extremely random mode
of operation, how can a disk optimizer state flatly that positioning
such-and-such a file at such-and-such an exact location will reduce
disk access times? It seems to me that such a statement is foolish
and such file positioning is just as likely to worsen system
performance as to improve it. Even if the two conditions balance
out at zero, the overhead involved gives you a net loss.
Hole number four: When you force a file to a specific position
on the disk by specifying exact LBNs, how do you know where it
really is? You have to take into account the difference between
logical block numbers (LBNs) and physical block
numbers (PBNs). These two are not the same thing. LBNs are assigned
to PBNs by the disk's controller. Disks supplied by Digital Equipment
Corporation often have as many as 10% more physical blocks than
logical blocks. The LBNs are assigned to most of the physical
blocks and the remainder are used as spares and for maintenance
purposes. You see, magnetic disks are far from perfect and blocks
sometimes "go bad." In fact, it is a rarity for a magnetic
disk to leave the manufacturer without some bad blocks.
When the disk is formatted by Digital or by the customer, the
bad blocks are detected and "revectored" to spares.
Revectored means that the LBN assigned to that physical
block is reassigned to some other physical block. This revectoring
can also be done on the fly while your disk is in use. The new
block after revectoring might be on the same track and physically
close to the original, but then again it might not. Thus, all
LBNs do not correspond to the physical block of the same number
and two consecutive LBNs may actually be widely separated on the
disk.
So I ask again, "When you force a file to a specific position
on the disk, how do you know where it really is?" You may
be playing probabilities and perhaps you should think twice before
gambling with user data and system performance.
Hole number five: Where is the "middle" of a
disk? Obviously, no one is suggesting that the geometric center
of the round disk platter, like the hole in a phonograph record,
is the "middle." Of course not. We are talking about
data storage. The middle is the point halfway between LBN zero
(the "beginning" of the disk) and the highest LBN on
that disk volume (the "end" of the disk). Right?
Well, maybe not. We have already seen that LBNs do not necessarily
correspond to the physical disk block of the same number. But
what about a multi-spindle disk (one with two or more sets of
platters rotating on separate spindles)? There are several different
types of multi-spindle disks. Besides the common volume sets and
stripesets, there are also disks that use multiple spindles for
speed and reliability yet appear to OpenVMS as a single disk drive.
Where is the "middle" of such a disk? I think you will
agree that, while the location of the apparent middle can be calculated,
the point accessed in the shortest average time is certainly not
the point halfway between LBN zero and the last LBN. This halfway
point would be on the outermost track of one platter or on the
innermost track of another - not on the middle track of either
one. Such disk volumes actually have several "middles"
when speaking in terms of access times.
There are even disks that have no performance middle at all. I
am thinking of electronic (semiconductor) disks, which have no
heads and thus no head movement. With an electronic disk, all
overhead associated with "optimizing" file placement
is wasted time and lost performance.
Hole number six: With regular defragmentation, a defragmenter
needs to relocate only a tiny percentage of the files on a disk;
perhaps even less than one percent. "Optimization" requires
moving virtually all the files on the disk, every time you optimize.
Moving 100 times as many files gives you 100 times the opportunity
for error and 100 times the overhead. Is the result worth the
risk and the cost?
Hole number seven: What exactly is the cost of optimizing
a disk and what do you get for it? The costs of fragmentation
are enormous. A file fragmented into two pieces can take twice
as long to access as a contiguous file. A three-piece file can
take three times as long, and so on. Some files fragment into
hundreds of pieces in a few days' use. Imagine the performance
cost of 100 disk accesses where only one would do! Defragmentation
can return a very substantial portion of your system to productive
use.
Now consider optimization. Suppose, for the sake of argument,
that disk data block sequencing really did correspond to physical
block locations and you really could determine which files are
accessed most frequently and you really knew the exact sequence
of head movement from file to file. By carefully analyzing the
entire disk and rearranging all the files on the disk, you could
theoretically reduce the head travel time. The theoretical maximum
reduction in average travel time is one-quarter the average head
movement time, after subtracting the time it takes to start and
stop the head. If the average access time is 32 milliseconds (for
an RA82 model disk) and 24 milliseconds of this is head travel
time, the best you can hope for is a 6 millisecond reduction for
each file that is optimized. On a faster disk, such as the RA71
(12.5 milliseconds), the potential for reduction is proportionately
less - about 2 milliseconds. Taking rotational latency into account,
your savings may be even less.
Each defragmented file, on the other hand, saves potentially one
disk access (32 milliseconds) per fragment. That's over five
times the optimization savings, even with the bare minimum
level of fragmentation. With badly fragmented files, the difference
is astounding.
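To put numbers on that comparison, here is the arithmetic using the figures from the text (32 milliseconds per access on the example drive, a theoretical 6-millisecond "optimization" saving, and one access saved per eliminated fragment):

    # Quick check of the comparison above, using the text's own figures.

    ACCESS_MS = 32               # one disk access on the example drive
    OPTIMIZATION_SAVING_MS = 6   # theoretical best case per optimized file

    def defrag_saving_ms(fragments):
        """Milliseconds saved per access by removing the extra fragments."""
        return (fragments - 1) * ACCESS_MS

    print(defrag_saving_ms(2) / OPTIMIZATION_SAVING_MS)   # about 5.3 times
    print(defrag_saving_ms(10))                           # 288 ms for a badly split file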
On top of all that, what do you suppose it costs your system to
analyze and reposition every file on your disk? When you subtract
that from the theoretical optimization savings, it is probably
costing you performance to "optimize" the files.
The fact is that it takes only a tiny amount of fragmentation,
perhaps only one day's normal use of your system, to undo the
theoretical benefits of optimizing file locations. While "optimization"
is an elegant concept to the uninitiated, it is no substitute
for defragmentation, it is unlikely to improve the performance
of your system at all, and it is more than likely to actually
worsen performance in a large number of cases.
In summary, file placement for purposes of optimizing disk performance
is a red herring. It is not technologically difficult to do. It
is just a waste of time.
What should the end result of defragmentation be? What, exactly,
is the product of a defragmenter's efforts?
How about a perfect disk? Wouldn't that be reassuring,
to know that your disks are "perfect"?
A perfect disk, in terms of fragmentation, is a thing of beauty. It is a disk which has each and every file in a perfectly contiguous state, with every bit of free space all collected together in one spot, preferably at the beginning (near LBN 0) of the physical disk.
This seems straightforward and well-grounded in sensible reasoning,
yet there is quite a controversy over the matter. Why?
Well, there are other factors that need to be taken into consideration.
Some say that free space on the disk should not be organized at
the beginning of the disk; that putting the free space there does
no good because new files are allocated from blocks pointed to
by the extent cache (blocks recently freed up by file deletions)
instead of from repeated scanning of the storage bitmap.
This may be true, but it is also true that the extent cache is
loaded first from the storage bitmap and then added to as files
are deleted. It is also true that the location of blocks freed
up by deletions is relatively random across the disk. A defragmentation
strategy that groups files near the beginning will merely reinforce
the random nature of the extent cache because holes caused by
deletions will appear near the beginning (as well as everywhere
else) and the extent cache will be loaded with lots of little
pieces. On the other hand, if the defragmentation strategy grouped
free space near the beginning, the extent cache would be loaded
initially with a large amount of contiguous free space. This would
then result in newly created files being more likely to be created
contiguously in the first place and reduce the need for defragmentation.
In other words, performance would degrade less (remain high) through
prevention.
I must admit that when we are talking about where to consolidate
free space, we are splitting hairs. The performance to be gained
from consolidating free space in a particular area of the disk
is slight, even under the best of circumstances. Moreover, consolidation
of free space is overrated. While badly fragmented free
space is likely to cause performance problems indirectly by forcing
new files to be created in a fragmented state, slightly fragmented
free space does not affect performance at all. In the absence
of an absolute requirement for large contiguous files, there is
no performance benefit whatsoever to a single large contiguous
free space over as many as a few hundred smaller free spaces.
Any resources expended consolidating a few free spaces into one
large one are likely to be wasted. The important number to look
at is the percentage of free space that is consolidated into a
few large spaces.
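That figure is easy to compute. Here is an illustrative calculation with made-up free-space extent sizes:

    # Illustrative calculation of the percentage of all free space held
    # in the few largest free areas. The extent sizes are invented.

    def percent_in_largest(free_extents, how_many=5):
        """Percentage of total free space contained in the largest extents."""
        total = sum(free_extents)
        largest = sum(sorted(free_extents, reverse=True)[:how_many])
        return 100.0 * largest / total

    free_extents = [40000, 12000, 3000, 800, 500] + [50] * 60   # blocks
    print(round(percent_in_largest(free_extents), 1))           # about 94.9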
Some say that free space on a disk should be grouped around the
index file in the middle of the disk. Their argument is that,
by placing free space near the index file, the disk head will
have less distance to travel to and from the INDEXF.SYS file when
creating new files. By keeping the head near the middle tracks
of the disk, the greatest overhead factor in disk performance,
head movement, is reduced.
This is certainly true under certain specific circumstances, but
it is decidedly not true under others. For example, while
this technique is sensible for creating new files, what about
accessing the existing files? By relegating these files
to the outside edges of the disk (the lowest and highest LBNs),
the distance the head must travel to access these files is increased.
Should we assume that any file created before the defragmentation
will never be accessed again? Or that such "old" files
are not accessed often enough to matter? Surely somebody accesses
data files that are more than a day or two old. Under a scheme
such as this, those data file accesses are going to be slowed,
not speeded.
There is also the question of where the INDEXF.SYS file is
located. This scheme assumes that it is located in the middle
of the disk. But OpenVMS allows for the INDEXF.SYS file to be
located at the beginning, at the end, or at any specific block
location the System Manager might choose. What happens to your
system's performance if the INDEXF.SYS file is positioned at the
beginning of the disk and the defragmenter groups all the free
space in the middle? Performance gets clobbered, that's what happens.
The problem with schemes like this is that they are based on theoretical
assumptions rather than on real-world observation of disks in
use on live computer systems. The worst assumption of all is that
disks have a "beginning," an "end" and a "middle."
As noted earlier, we talk of disk locations in terms of LBNs,
or Logical Block Numbers. While a logical block ordinarily
corresponds to a physical block, this is not always so. The OpenVMS
disk architecture allows for logical block numbers to be
reassigned, if necessary, to any physical disk block at
all. This allows a disk to continue to be used even though one
or more physical disk blocks are unusable. The corresponding LBNs
are merely reassigned to other physical blocks. It also allows
for widely varying physical disk architectures that can be treated
identically by the I/O subsystem of OpenVMS.
Take a multi-spindle disk as an example. A disk with a single
platter on a single spindle, with only one side used, is easy
to discuss in terms of "beginning," "middle"
and "end." Here is a diagram showing such a disk:
[Diagram: a single-platter, single-spindle disk, with its "beginning" (LBN 0), "middle" and "end" marked.]
Now here is a diagram of a two-spindle disk with LBN numbers and
arrows indicating the "beginning," "middle"
and "end."
[Diagram: a two-spindle disk, with LBN numbers and arrows marking the "beginning," "middle" and "end."]
Note that the "middle" of this disk spans the innermost
tracks of one platter and the outermost tracks of the other. What
will happen to head movement if the INDEXF.SYS file is placed
at the "middle" of this multi-spindle disk and the free
space is all grouped on either side of it? It will be a disaster
performance-wise.
Your disk might not look like this. But it also might not look
like the idealized conceptual scheme used by designers of such
a system. The point is that, to be useful, the design of a defragmentation
strategy must be based on the logical architecture of the
disk and must work well regardless of different physical structures.
A strategy that assumes one particular physical architecture may
be devastating for other types of disks.
Grouping free space near the logical beginning of a disk
(LBN 0) is guaranteed to reduce time spent scanning the storage
bitmap for free clusters. It is also guaranteed to maximize the
probability that newly created files will be created contiguously
or, at least, minimize the number of fragments in newly created
files, regardless of the physical architecture of the disk
involved.
The final and worst problem with a "perfect" disk is
that its perfection doesn't last. Moments after achieving that
exalted state, some user application has created, deleted or extended
a file and the perfection is history. You now have an imperfect
disk that is well on its way to becoming less and less perfect.
So just how valuable is that fleeting moment of "perfection"?
I submit that a perfect disk is the wrong goal. Elegance, beauty
and perfection in disk file arrangement are not the ideals to
which a System Manager aspires. You cannot take them to
management and show them off as a demonstration of your value
to the organization. You cannot even use them to keep yourself
employed. How many System Manager resumes include a section for
"beauty, elegance and perfection" of disk file arrangement?
None.
The true goal, of course, is performance. Now there's something
you can take to the bank. A system that is so slow you can't log
on this week is a system that will get you fired. A system that
performs even better than the expectations of users and management
might just get you a raise or a promotion or both. So let's talk
about better performance of the system.
The mistake made by the "perfect disk" advocates is
that of viewing a disk and even a VAX (or Alpha AXP) computer
system as a static, unchanging thing. Static, unchanging VAX systems
don't need defragmenters. They need a lift to the nearest scrap
yard.
A real VAX or Alpha AXP system is dynamic, in motion, changing
continuously - almost alive with activity. Real perfection,
in my mind, would be having all this activity streamlined for
efficiency and directed at the specific production goals of the
organization and the individuals who make it up. Let's see a computer
system that can be pointed at a computational problem and vaporize
that problem in a flash. The culprit we are targeting is slowness,
sluggishness, the "can't-help-you-now-I'm-busy" machine.
Let us take a careful look, then, at exactly what factors are
important to deliver this capability. What really counts when
it comes to defragmentation results?
There must be no compromises with safety, either. "99% safe"
is not a feature. "Able to recover most lost data" does
not go over too well with users and management. What you want
is absolute, 100% guaranteed safety; no chance whatsoever of any
user data being lost at all.
Even the ability to recover lost data in the event of a catastrophe
is dicey. The minute you hear that data is lost, even temporarily,
your confidence is weakened. The thread of recovery is too tenuous
for you to sit back, relax and coolly await the recovery of the
lost data by the very tool that trashed it moments ago. OK, maybe
it's better than a poke in the eye with a sharp stick, but it's
still not a very comforting situation.
Earlier, we covered the troublesome defragmentation techniques
that consumed more system resources than they gave back - the
"cure" that is worse than the disease. We answered the
question, "How long should a defragmenter take to do its
job?" with "Less than the time and resources being lost
to fragmentation."
While it is a definite improvement for a defragmenter to spend
19% of the system's resources to get a 20% improvement in performance,
it is a much better improvement for a defragmenter to spend
only 2% of the system's resources to get a 20% improvement in
performance. As a rule of thumb, your defragmenter should consume
not more than two percent (2%) of the system. You can measure
this by using the DCL command SHOW PROCESS or looking in ACCOUNTING
records, then dividing the resources consumed (CPU time, for example)
by the total resources available during the same time period.
For example, if the defragmenter consumed two hours of CPU time
out of every 100, that would be 2 divided by 100, or 2% of the
available CPU time.
The ideal defragmenter would not only consume less than 2% of
the system resources, it would take its 2% from resources that
are otherwise idle, such as CPU idle time and unused disk I/O
bandwidth.
Sometimes, in a highly technical subject, it is easy to get so
caught up in the details that one overlooks things of major importance.
Like the disk that is 99% full. Defragmenting such a disk is a
complete waste of time. Except in extreme cases, the performance
benefits you might get from defragmenting such a disk are insignificant
when compared to the performance benefits available from simply
clearing off 10% or 20% of the disk. When a disk is very nearly
full, OpenVMS spends so much time looking for free space, waiting
for the disk to spin around to bring a tiny free space under the
head, and allocating files in lots of little pieces all over the
place, it is a wonder that any useful work gets done at all.
Similarly, as mentioned earlier, the defragmenter that works hard
to consolidate every bit of free space into a single contiguous
area is wasting its time. Rigorous testing shows clearly that
an OpenVMS disk with up to 63 free spaces performs just as well
as a disk with one large free space. So all the overhead expended
by the defragmenter consolidating free space into less than 63
areas is wasted. And it is your computer resources that
are being wasted.
A defragmenter should address itself to real problems instead
of theoretical ones and deliver immediate, real solutions to them.
The single most important piece of advice I can give on the subject
of defragmenters is to distinguish between the idealized results
obtained in a perfect laboratory environment and what can be expected
in your system under real-world conditions.
According to Computer Intelligence Corporation, a respected source
of information about the VAX market, only 1% of all VAX sites
had a defragmenter installed in October 1987. By October 1988,
the number had grown to 4%. By October 1989, it had shot to 11%.
The survey was not done in 1990, but in October 1991, market research
showed 18% of all sites running a disk defragmenter.
For comparison, the next graph shows the increase in disk capacity
over the same time period (these figures are also from Computer
Intelligence):
Sites with many users: The average VAX site has 102 users.
Two-thirds of all defragmenters are installed at sites with 50
or more users. Twenty-six percent of sites with 200 or more users
have a defragmenter. That's substantially more than the 18% of
all sites that have a defragmenter.
Sites with many disks: The average VAX site has 7.5 disks.
(Don't you love statistics that imply that someone has half a
disk?) Sixty percent of all defragmenters are installed at sites
with six or more disks. Twenty-four percent of sites with 10 or
more disks use a defragmenter. Again, substantially more than
average.
One more interesting statistic is that 62.5% of System Managers
who have attended training classes at Digital Equipment Corporation
use defragmenters.
In looking at the distribution of defragmenters amongst VAX sites,
it can be seen that defragmenters are not distributed evenly across
all sites. More sites than you would expect from an even distribution
have a defragmenter when the site has larger VAX models, many users,
or more than one VAX.
Specifically, survey results from Computer Intelligence show that
the sites running the big VAXes have 26% to 33% more defragmenters
than a random distribution would predict; sites with 500 or more
users are 36% more likely to have a defragmenter, and sites with
three or more VAXes are 131% more likely to be running a defragmenter.
Also, as you might expect, the survey results show that System
Managers with the most experience and training are substantially
more likely to use a defragmenter than inexperienced, untrained
System Managers.
The conclusion is plain: the people who buy defragmenters are
the more experienced and trained System Managers, the ones with
many users, many disks and many VAXes, and particularly those
with the big machines.
In surveying defragmenter buyers as to why they made the purchase,
the overwhelming response is that they had to handle the fragmentation
problem, but backing up to tape and restoring each disk was far
too tedious and time-consuming. The defragmenter frees the System
Manager from this unpleasant chore and saves time so he or she
can get more done.
Finally, it should be noted that ninety percent of defragmenter
buyers place the safety of the product above performance in importance,
but three-quarters of them expect a performance improvement as
a result of defragmenting.
The people who don't buy defragmenters are the sites that have
more disk space, by far, than they really need. Files tend to
be created contiguously in the first place, as there is plenty
of free space to do so. By keeping lots of free disk space available,
these sites suffer very little from fragmentation.
Naturally, these tend to be very small sites. At a larger site,
the cost of a defragmenter is a very small portion of the computer
budget.
As we have seen from the Computer Intelligence data, inexperienced
System Managers don't buy defragmenters. These folks either don't
understand the cause of their system's slower and slower performance
or they lack the expertise to demonstrate the problem and the
need for a solution to those who hold the purse strings. This
book is the answer to both problems, the former in the main body
of the book and the latter in the appendices.
Reverting to a file system that allows only contiguous files is
no solution for the nineties. The automatic on-line defragmenter
is the ideal solution for now. But what does the future hold?
Can we envision what disks might be like ten years from now? What
new and even more capacious forms of storage might come along
and how would these impact the problem of fragmentation and its
defragmenter solution?
We have already seen the introduction of another deliberate form
of fragmentation: disk striping. With striping, files are deliberately
fragmented across two or more disks in "stripes" of
data that can be retrieved from the multiple disks much faster
than they could be retrieved from any one disk. Extensions of
this technology to large arrays of disks could dramatically counter
the fragmentation problem.
Electronic "disks" have made their debut and, with falling
prices for semiconductor memory chips, could become a viable form
of mass storage. Even before seeing electronic storage broadly
in use, however, we will see more of the hybrids, which combine
electronic storage with magnetic. All that is needed is some mechanism
for sorting out the performance-critical data from the non-critical
and this system becomes very cost effective. We are seeing this
type of system now with data caching, particularly as caches are
built into disk controllers to provide better performance.
This path can be extrapolated to intelligent disk subsystems that
determine for themselves where best to store data and have the
data all ready to return promptly when needed.
We can also envision a new file allocation strategy that is not
sensitive to fragmentation. Thinking of files as "flat"
one- or two-dimensional objects leads us to think of the parts
of files as being "close together" or "separate."
A collection of data can also be thought of as a three- or more-dimensional
pile of associated records that can be accessed in any old way.
More elaborate indexing methods give us faster access to the data.
Larger and faster storage devices allow for more elaborate indexing
methods.
This all culminates in a vision of data storage as a completely
automatic mechanism without any files at all. You put data
in and you get information out. Don't ask me how it works,
but that's where I think we are headed. Naturally, without files,
there can be no file fragmentation, so the problem is gone altogether,
no doubt to be replaced by some new, even more perplexing problem
that will keep us system programmers in business for a long time
to come.
It should be clear by now that fragmentation is well understood
and a good solution is available in the form of an automatic on-line
disk defragmenter. It should also be clear that a defragmenter
designed for a static laboratory environment won't cut the mustard
in the real world where things are changing continuously. It is
vital both for performance and for the safety of your data that
the defragmenter be able to deal with a disk that is organized
differently from defragmentation pass to defragmentation pass,
with files that appear and disappear and with heavy user loads
at unexpected times. In choosing a defragmenter, look for one
that addresses known, demonstrable performance issues and not
just theoretical problems demonstrable only in a laboratory environment.
I have taken a great deal of time in this chapter to explain the
ins and outs of defragmentation so that you will be well-informed
and better able to judge your needs and the best solution for
you. I believe that the more you know about fragmentation and
defragmentation, the easier your job will be. How much do you
need to know? Enough to handle the problem once and for all and
get on with more important and interesting things.