CHAPTER 6

GETTING THE COMPUTER TO CLEAN UP AFTER ITSELF


This chapter is devoted entirely to the defragmenter as a solution to the fragmentation problem.

History

Defragmenters are not new, as things go in the computer industry. The first one became available for VAX/VMS in 1986. The concept was a simple one: get the computer to clean up after itself, saving the System Manager from the drudgery of doing it himself. I hasten to point out that this simplicity and clarity of purpose was not as obvious at first as it seems in looking back. In fact, the first defragmenter for VAX/VMS took longer to run and had a higher risk of data corruption than backing up the entire disk to tape and restoring it. It had the sole distinction of being the first software product designed specifically for defragmenting disks and disk files. It achieved this distinction by virtue of being a conversion of a defragmenter from another computer and operating system to VAX/VMS. To be first, it paid the price of insufferably slow performance and severe risk of data loss. Needless to say, this product and the company that produced it have long since vanished from the market.

Nevertheless, it was a start, and a crack in the walls of Digital Equipment Corporation's system software fortress. Before 1986, Digital had a virtual monopoly on system software for the VAX. But, when it came to defragmentation, Digital's official view was that on-line defragmentation was "not an easy problem."

Each year, the Digital Equipment Computer Users Society (DECUS), a large group of Digital's customers, surveys its members to determine the things that most need improvement in the OpenVMS operating system. This survey is called the System Improvement Request (SIR) ballot. In the 1985 SIR ballots, on-line disk compression (another term for defragmentation) placed first in the survey by a large margin in the United States and second in Europe. Digital customarily responds to the top ten items on the SIR ballot. In Europe, Digital's response took the form of a talk by Andy Goldstein, a top guru in Digital's revered VMS Central Engineering group, in which he said:

We agree that's a fine idea. . . We certainly intend to take a good hard look at this problem. I have some experience with it. I actually worked for RSX back in the old days when we had a gadget called DCU which some of you may remember. I think DCU was a very good demonstration to us all that this is not an easy problem. One of the interesting problems, other than simply interpreting the file structure correctly, is coordinating the compression with ongoing file activity, and of course clusters add a unique twist to that. Also, the performance problem is quite difficult because in-place disk compression is to the best of my understanding an n-squared problem and so with some very large disks you run into some very serious performance problems. Obviously you can do intelligent things to make it run faster, and we are going to have to spend some considerable time and attention in getting reasonable performance out of it.

--DECUS, Cannes, France, Sept 1985

In the U.S., Digital's official response appeared in the February 1986 DECUS newsletter:

While the importance of on-line compression is obvious in retrospect, this is a new request for us in that it has never appeared so prominently in past SIR ballots. We do not have any current plans to build such a facility. However, we understand its importance and we understand that it will increase as time goes on.

There are a number of difficult problems to deal with, including coordinating ongoing file activity with the compression process, and solving the performance problem on large disks (since disk reorganization is inherently an n-squared order problem).

We will investigate this for future VMS development.


--PAGESWAPPER, Feb 1986


These statements by Digital were alarming. Here we had a very large group of customers on two continents declaring this to be their single biggest problem with Digital's flagship product, and Digital's official response was that it had no current plans to deal with it!

It would appear that the roadblock in Digital's path was the fixed idea that on-line defragmentation was an "n-squared order problem." This idea is basically a declaration of the impossibility of solving the problem in an acceptable way. Fortunately for the customers, there is nothing like a public declaration of impossibility from an expert to prompt entrepreneurs into action.

The second quote, in fact, was a driving factor in convincing more than one software company to throw its hat into the defragmenter ring. It seemed to be assurance that Digital would not be coming along anytime soon to steamroller the market with its own solution in the form of an "official" Digital defragmenter, even in the face of huge customer demand. True enough, it was not until more than five years later, on December 9, 1991, that Digital announced a disk defragmentation product. During those five years, Digital repeatedly declared its policy on the subject of defragmenters to be that the possibility of obtaining performance gains by use of a disk defragmenting utility was a "misconception." The official Digital solution was:

Digital recommends that disk volumes be saved and restored on a regular basis, using the VMS Backup Utility. When the volume is restored, all the files will be allocated contiguously, and the free space on the disk will be collapsed into one area. Plus, a backup copy of the disk will exist.

--VMS Systems Dispatch, August 1989, Digital Equipment Corporation

The policy statement went on to speculate on the horrors that might befall a defragmenter user if something went wrong.

This policy was based on unsound advice. The policy statement ignores the risk inherent in backup and restore: that of an unreadable tape. This is not an insignificant risk. What happens if you back up a disk to tape, reinitialize the disk and then find that the tape is unreadable? You're up the creek without a paddle, as they say. That data is gone, and it's gone for good. Less obvious, but just as serious, is the fact that the backup and restore process is so tedious and so time-consuming that one-quarter of the System Managers surveyed recently said they have never bothered to do it. How good can a solution be if it is never used at all?

On top of that, Digital's policy statement was not backed by any serious testing to determine whether the facts on which it was based were true. I know, because I did serious testing. The best that could be said for Digital is that it lumped all defragmenters together and assumed that what is true for one must be true for the rest. Don't get the idea that I think badly of Digital. I admire the company in many ways. But Digital has over 95,000 employees, and sometimes some of those employees don't communicate very well with each other.

So, first Digital said it couldn't be done. Then, when it was done by others, Digital said its usefulness was a misconception. Now Digital is fielding its own defragmenter product. You draw your own conclusion.

In any event, in 1986 the defragmenter happened. Now there was a tool for the specific purpose of defragmenting disks. Best of all, an on-line variety of defragmenter was developed and did everything itself, automatically.


An On-Line Defragmenter

Laying aside the off-line variety as an unsatisfactory solution, let's take a look at what an on-line defragmenter is, exactly, and how it works.

An on-line defragmenter is distinguished from an off-line defragmenter by the fact that you do not have to shut down the system, kick the users off or take a disk out of service to use it. An automatic on-line defragmenter goes a step further and includes a mechanism for deciding when to defragment - whether by sensing the state of fragmentation, by measuring the degree of fragmentation between defragmentation passes or simply by waiting a certain time and then cleaning things up again. The non-automatic version is manual in this regard: it requires the System Manager to decide when to defragment.

Ideally, the on-line defragmenter keeps a record of how badly fragmented the disk is every time it makes a defragmentation run and, based on that information, increases or decreases the time intervals between runs.
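
To make the idea concrete, here is a minimal sketch of such an adaptive schedule in Python. It is an illustration only, not any vendor's actual algorithm; the threshold values, the interval limits and the "fragments per file" measure are assumptions made up for the example.

MIN_INTERVAL_HOURS = 6      # never run more often than this
MAX_INTERVAL_HOURS = 168    # never wait longer than a week

def next_interval(current_interval_hours, fragments_per_file):
    """Shorten the wait when fragmentation is building up quickly,
    lengthen it when the disk is staying clean."""
    if fragments_per_file > 1.10:        # badly fragmented: run sooner
        new_interval = current_interval_hours / 2
    elif fragments_per_file < 1.02:      # nearly clean: run less often
        new_interval = current_interval_hours * 2
    else:                                # acceptable: keep the same pace
        new_interval = current_interval_hours
    return max(MIN_INTERVAL_HOURS, min(MAX_INTERVAL_HOURS, new_interval))

# Example: the last pass found an average of 1.15 fragments per file,
# so a 48-hour interval is cut to 24 hours.
print(next_interval(48, 1.15))   # -> 24.0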

This is not a simple problem. Perhaps the single greatest contributing factor to the problem is the fact that OpenVMS includes no mechanism for determining how often a particular file is accessed. If you could find out how often a file is accessed, you would know how critical that file is to system performance and thus how much attention the defragmenter should give to keeping that particular file contiguous. Another important factor is the lack of any mechanism to determine whether a file has been accessed at all. Without either mechanism, we are reduced to looking mostly at indirect measures of the impact of fragmentation on the system. These measures are described in Chapter 3.

The Problems a Defragmenter Must Solve

Once it has been determined, by whatever mechanism, that it is time to defragment a disk, the defragmenter has to determine whether defragmentation can be done safely at that time or whether it should wait for a better time to do it. It also has to determine whether the defragmentation activity would degrade system performance unacceptably at that time. It even has to check whether defragmentation will do any good; maybe the disk is in such good shape that defragmentation would be a waste of resources. In this case it would just wait for a while longer and check again later on.

When the defragmenter has determined that the time is right, the next question is, "What files should be defragmented?" On even the most badly fragmented disk, some files are contiguous. Attempting to defragment these would be a waste of time and resources, so some means is needed of detecting them and causing the defragmenter to skip processing those files - all with a minimum of overhead, of course. Other files (such as INDEXF.SYS) must not be defragmented because to do so would interfere with the proper operation of the operating system.

Then, amongst the files that should be defragmented, some determination must be made as to which should go first. Should the files be processed in order of appearance in their directory files? Should the most fragmented files be processed first? Or is some other order best? Does it matter at all?

When a file is selected for defragmenting, the defragmenter has to determine where on the disk to put the newly-created defragmented file. The wrong location might do nothing to improve the condition of free space fragmentation or might even make it worse. Some intelligence is required to choose the best place on the disk for the defragmented file, keeping in mind the gaps that will be created when the fragmented version is deleted.

Then there is the question of what to do if no single place on the disk is suitable. Could the file be split into two or more pieces and still be better off than in its present fragmented state? Maybe. Maybe not.

Suppose the disk is so messed up that it is going to take two or more defragmentation passes to get the disk into good shape. How does this affect decisions on where to put a defragmented file? Could it profitably be put into a worse position on the first pass, anticipating movement to its ideal position in a subsequent pass?

When a new location is selected, how exactly should the copying be done? Should the file be copied directly into the new location or should it be copied first into a temporary location and then moved to its final destination? And how do you deal with user attempts to access that file while it is in the middle of being relocated? What if someone attempts to modify the original file after the copy has been made, but before the new file formally takes the place of the old one? And, not the least of our worries, what if the system goes down right in the middle of all this?

An automatic on-line defragmenter also has a quality control problem. How can it be sure the file was copied correctly and users can now access every bit of the new file exactly as they did the old?

These are the obvious problems in the simple case. No mention has been made of problems unique to multi-header files, files that span disk volumes, or files that are held open by the system in a way that completely bypasses the usual procedures, so that there is no way to tell whether the file is in use or not.

OK, gentle reader, the scary part is over. I hope you are still reading so you can receive the news that there are solutions to all these problems. My purpose in rattling off these problems is to show you that defragmentation is a complicated undertaking, that it has been thought through and we defragmenter vendors are not playing around with your irreplaceable data with our heads stuck in the sand. The important thing to know is that the computer system itself has the answers to these questions within it or, at least, it has the data from which answers can be formulated.

An automatic on-line defragmenter, then, is one which uses the data already available within the computer, without the need for operator intervention, to determine when to defragment disks and disk files, which files to defragment, the order in which to defragment them, where to put the new, contiguous files, and whether to completely or only partially defragment a particular file - and to do all this without interfering with user access to the disk and disk files, with absolute 100% guaranteed data integrity. Yes, this can be done and, at this writing, is being done on tens of thousands of OpenVMS systems around the world.

Done right, this solution to fragmentation requires no attention from the System Manager or operations staff at all. It is a complete elimination of fragmentation as a problem for that OpenVMS system. Such an automatic solution to a problem inherent in the operating system is called an operating system enhancement, as opposed to the manual, tool-variety solution, which is called a utility.


Safety

A good on-line defragmenter does not just provide a means for recovery of user data in the event of a system crash during the defragmentation process; it actually processes files in such a way that no data can be lost. It is possible and practical to create the new, defragmented version of the file, verify its accuracy and replace the old version with the new in between user accesses to the file, all the while guaranteeing that directory and INDEXF.SYS file information refers to a flawless copy of the file. With such a method, there is no window for error in which a file of user data can be lost, even when the system crashes at the worst possible moment.

Apparently, it was concern about this potential window for error and uncertainty about its handling that kept Digital out of the defragmenter business until 1991. In that year Digital incorporated a mechanism into OpenVMS version 5.5 to relocate a file without any window for error. The mechanism adopted by Digital, called MOVEFILE, is similar to the mechanism a leading defragmenter had been using since 1986. When MOVEFILE appeared, Digital's public concerns about the safety of on-line defragmenters ceased, at least for those defragmenters that used the official Digital mechanism for moving files!

The solution is easily explained. Relocating a file on the disk for purposes of defragmenting is a multi-step process. Doing some of the steps without doing the rest can result in a file that is confused, damaged or even lost. The solution is to isolate the critical steps that must be all completely done or none done at all and treat these as a single step. Such a group of steps treated as a unit is called a primitive. In version 5.5 of OpenVMS, this operation is called the movefile primitive. It moves a file from one location on a disk to another, guaranteeing that all the steps necessary to move the file will be fully complete, or none of the steps will be done at all. Thus, you can be sure that the file is either fully moved intact or remains fully intact at its old location. No in-between state can exist. Therefore, even if the system crashes, you can be confident that all your data still exists without corruption of any kind.
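
The following toy sketch in Python illustrates the shape of such a primitive. It is emphatically not the OpenVMS MOVEFILE implementation (those details are internal to the file system); the ToyDisk and FileHeader classes are invented solely to show that a crash before the single commit step leaves the old mapping, and therefore the data, untouched.

class ToyDisk:
    def __init__(self, size):
        self.blocks = [None] * size       # block contents
        self.free = set(range(size))      # free block numbers

    def allocate(self, count):
        chosen = sorted(self.free)[:count]
        self.free -= set(chosen)
        return chosen

    def release(self, block_numbers):
        self.free |= set(block_numbers)

class FileHeader:
    def __init__(self, block_numbers):
        self.map = block_numbers          # where the file's data lives

def movefile(header, disk):
    old = header.map
    new = disk.allocate(len(old))

    for src, dst in zip(old, new):        # 1. copy the data
        disk.blocks[dst] = disk.blocks[src]

    if [disk.blocks[b] for b in new] != [disk.blocks[b] for b in old]:
        disk.release(new)                 # 2. verify; back out on a mismatch
        return False

    # 3. The commit: one indivisible update of the mapping pointers.
    #    A crash at any earlier point leaves the old mapping untouched.
    header.map = new

    disk.release(old)                     # 4. the old blocks become free space
    return True

# Example: a three-block file scattered at blocks 7, 3 and 9 is moved
# to the lowest free blocks, with the data intact throughout.
disk = ToyDisk(12)
for i, b in enumerate((7, 3, 9)):
    disk.blocks[b] = f"record {i}"
    disk.free.discard(b)
f = FileHeader([7, 3, 9])
movefile(f, disk)
print(f.map)                              # -> [0, 1, 2]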

Unmovable Files

Some files are unmovable, either because the OpenVMS operating system depends upon the file being at a fixed location or because an application program has fixed the file at a certain location and requires it to remain there. The defragmenter must take care to detect such files and leave them where they are. It is worth noting, however, that programmer errors in applications and even in certain OpenVMS utilities, such as the MAIL facility, sometimes result in a file having the "fixed placement" attribute when there is no reason or intention for that file to remain fixed in place. So how do you know whether a file is supposed to be placed or not? The rule I use as a System Manager is that no one has any business fixing files in place except the System Manager; at least not without his or her knowledge and permission. So, if the file is not an OpenVMS system file and you, as the System Manager, did not put it there, the file is erroneously marked "placed" and you can feel free to move it.

This is not a very safe rule for a defragmenter, however, and any defragmenter should honor file placement control just in case someone really does need that file to be in that exact place.

System Disks and Common System Disks

System disks present special problems for a defragmenter and common system disks are even harder to deal with. Until recently, there were exactly nine files on an OpenVMS system disk that could never be defragmented while the system was up. Now there are many more, though the number varies from system to system. The only ways to process these untouchable files are to shut down the system and use Stand-Alone Backup to back up and restore the system disk, or to arrange for the system disk to be accessible from a VAX or Alpha AXP as a user disk and defragment it off-line.

These are not usually viable options, so system disks were originally excluded from processing by defragmenters. Some defragmenters skirted the issue by excluding all files in a system root directory, processing only files that reside on the system disk outside those directories reserved for the OpenVMS operating system. The remarkable aspect of this is that user files residing on the system disk probably cost more performance than moderate levels of fragmentation. The System Manager could get a bigger performance boost by moving those files off the system disk than by defragmenting them, bigger perhaps than defragmenting all the fragmented user files on user disks.

A good defragmenter knows exactly which files can be moved and which cannot, so it can defragment a system disk just as freely as any user disk without risk to the system. The same is true for a common system disk, where the system files of two or more different systems reside, though extra care must be taken by the defragmenter to ensure that another system's unmovable files are left in place even though they will not appear unmovable to the system on which the defragmenter is running.

Quorum Disks

A quorum disk is one which substitutes for a VAX or Alpha AXP computer, acting as a node in a cluster. The reasons for this are complex and not important to defragmentation. The important thing is that, as on a system disk, certain files on a quorum disk must not be moved. The defragmenter has to take this into account.


Performance

The main reason, if not the only reason, for defragmenting a disk is performance. If you know nothing of fragmentation, you can become quite perplexed watching a system's performance become worse and worse, week after week, month after month, with exactly the same hardware configuration, the same software applications and the same user load. It's almost as if the machine left the factory with planned obsolescence built into it. How can the performance of the exact same system be so much worse if nothing has changed? Well something did change: the files and disk free space fragmented with use. The proof? Defragment the disks, and the system performs like new.

The sinister side of this is that fragmentation occurs so gradually that you might not notice the creeping degradation. If system response worsens by only a fraction of a second each day, no one is likely to notice from one day to the next. Then, weeks or months later, you realize that system response is intolerable. What happened? Your system has caught the fragmentation disease.

Disk Access

The first rule of performance management is that the cure must not be worse than the disease. This is the rule that killed off-line defragmenters. Here's how it works.

Let's say, for the sake of argument, that your system is losing 10% of its performance to fragmentation. That is to say, jobs take 10% longer to run than they should or, put another way, only 90% of a day's work can get done in a day. Ten percent of a 24-hour day is 2.4 hours. The solution to your fragmentation problem has to consume less than 2.4 hours per day or it just isn't worth it.

Seems simple, doesn't it? Well, shutting down the system or taking a disk out of service to defragment it is a 100% degradation of performance. Performance just doesn't get any worse than "the system is down." So an off-line defragmenter that costs you three or four hours of computer time a day is more costly than the losses to fragmentation. The cure is worse than the disease.

The computer resources consumed by a defragmenter must be less, much less, than the performance losses due to fragmentation. The best way to violate this rule is to defragment using a method that requires taking the disk out of service. So a good defragmenter works on a disk while the system is up and while the disk being defragmented is being accessed by user applications. After safety, this is the most important feature of a defragmenter.

File Availability

A secondary aspect of this same disk access feature is that the files on the disk must be available to user applications. It is not enough to allow access only to the free space on the disk for the creation of new files. User applications must be able to access existing files as well. And while the defragmenter may be accessing only a single file out of perhaps 10,000 on a disk, guess which file some user's application is most likely to want to read? Yes, the one file that happens to be undergoing defragmentation at that very moment. Murphy's law strikes again.

So an on-line defragmenter must assume that there will be contention for access to the files being defragmented. Other programs will want to get at those files and will want to get at them at the same time as the defragmenter. The defragmenter, therefore, must have some means of detecting such an access conflict and responding in such a way that user access is not denied. The defragmenter has to give way. Technologically, this is tricky, but it can be done and is done by a good defragmenter.

Locating Files

Another aspect of defragmenter performance is the amount of time and resources consumed in finding a file to defragment. Scanning through some 10,000 files by looking up file names in directories and subdirectories is out of the question. The time it takes to do this is a blatant violation of Rule One - it outweighs the performance gains likely to be obtained by defragmenting.

A much better way to rapidly find files for defragmenting is by reading the INDEXF.SYS file directly. The index file contains the file headers for all the files on the disk and within each file header is contained all the information a defragmenter needs to know about the state of fragmentation of a file. Specifically, the header tells how many fragments there are, where each is located on the disk and how big each one is. So a defragmenter can zip through the index file, picking out those files that need defragmenting, consuming typically only one disk access per file checked. Better yet, by reading several headers at once, multiple files can be checked for fragmentation with each disk access. A good defragmenter uses the index file to find files to process.
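
As a sketch of the selection logic only (the real on-disk format of INDEXF.SYS headers is not shown here), the following Python fragment represents each header as a list of extents and picks out the fragmented files a batch at a time, mirroring the idea of checking several headers per disk access.

from collections import namedtuple

# Each extent is (starting LBN, length in blocks); a contiguous file has one.
Header = namedtuple("Header", ["name", "extents"])

def needs_defragmenting(header):
    return len(header.extents) > 1        # more than one extent = fragmented

def select_candidates(headers, batch_size=4):
    """Scan headers a batch at a time (as a defragmenter might read several
    headers per disk access) and yield only the fragmented files."""
    for i in range(0, len(headers), batch_size):
        for header in headers[i:i + batch_size]:
            if needs_defragmenting(header):
                yield header.name

index_file = [
    Header("LOGIN.COM",   [(100, 5)]),
    Header("SALES.DAT",   [(200, 50), (900, 50), (1500, 20)]),
    Header("REPORT.TXT",  [(300, 10)]),
    Header("PAYROLL.DAT", [(400, 30), (2500, 30)]),
]

print(list(select_candidates(index_file)))   # -> ['SALES.DAT', 'PAYROLL.DAT']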

The Defragmentation Process

After a file has been selected for defragmentation, the overhead involved in the defragmentation process itself can be significant. If the file is large, it can be very significant. After all, it is usually necessary to copy the file in its entirety to make it contiguous. As many as 200 disk accesses may be required to copy a 100-block file (100 reads and 100 writes). Those two hundred disk accesses at 25 milliseconds apiece would consume 5 seconds. With this kind of overhead, processing even a fraction of 10,000 files on a disk holding hundreds of megabytes can be a time-consuming activity. Fortunately, the OpenVMS file system is more efficient than these figures would imply. Only the smallest disks, for example, have a cluster size of 1, so disk reads and writes generally move 2 or 3 or more blocks at once. Further, regular defragmentation holds down the amount of activity required. It is worth noting that performance worsens geometrically as the degree of fragmentation increases, so catching fragmentation early and defragmenting often requires fewer resources overall than occasional massive defragmentation.

The defragmenter itself can do a lot to lessen the impact of defragmentation overhead. A throttling mechanism, for example, can reduce defragmentation I/O during times of intense disk activity and increase it during slack times. This mechanism gives the appearance of greatly reduced overhead by scheduling the overhead at a time when the resource is not needed anyway. Using idle time in this way can make the defragmenter invisible to users of the system.
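
Here is a minimal Python sketch of such a throttle, with invented thresholds and a simulated load reading standing in for whatever I/O statistics the operating system actually provides.

import time

BUSY_THRESHOLD = 30     # user I/Os per second above which we back off
IDLE_DELAY = 0.1        # pause between defragmenter I/Os when the disk is quiet
BUSY_DELAY = 1.0        # how long to wait before re-checking a busy disk

def defragment_with_throttle(work_items, user_io_rate):
    """user_io_rate is a callable returning the recent user I/O rate;
    on a real system this figure would come from operating system counters."""
    for item in work_items:
        while user_io_rate() > BUSY_THRESHOLD:
            time.sleep(BUSY_DELAY)    # the disk is busy: stay out of the way
        item()                        # the disk is quiet: do one unit of work
        time.sleep(IDLE_DELAY)

# Example with a simulated, steadily falling load and trivial work items.
readings = iter([45, 40, 12, 8, 5, 3, 2, 1])
defragment_with_throttle(
    work_items=[lambda i=i: print(f"relocated file {i}") for i in range(3)],
    user_io_rate=lambda: next(readings, 0),
)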

Perhaps the worst source of excess overhead for a disk defragmenter is the attempt to analyze an entire disk before defragmenting and plan a "perfect" defragmentation pass based on the results of this analysis. The idea is that a defragmenter can calculate the ideal position for each file, then move each file to the best position on the disk. This is a holdover from the off-line defragmenter days and, besides carrying the risks described in Chapter 5, it is enormously expensive in terms of overhead. Such an analysis requires examining literally every file on the disk. On top of that, the analysis becomes obsolete instantly if there is any activity on the disk other than the defragmenter.

A good defragmenter, then, should approach the process one file at a time and not require the overhead of analyzing every file on a disk in order to defragment only a few files.


Basic Functionality

After safety and performance, you should look for basic functionality in a defragmenter.

The most basic functionality is the decision of what to defragment and what to leave alone. It is not always desirable to defragment everything. Some selectivity is required.

A defragmenter has to exclude from its processing certain system files, like INDEXF.SYS. It should be wary of placed files and files that have allocation errors. It should also have the capability of excluding a list of files provided by the System Manager. You might also look for the ability to include certain files in processing (or exclude "all files except ______") and possibly the ability to force immediate defragmentation of a particular file or group of files.

Disk Integrity Checks

Perhaps the most important basic functionality of a defragmenter is determining whether a disk is safe to defragment or not. It is possible, even commonplace, for a disk to get into a state where the data on the disk is not exactly where the file headers in the index file indicate it should be. When this occurs, it is extremely important for the matter to be corrected before any file involved is deleted, as deleting a file (using the erroneous information in the header from the index file) might cause the wrong data on the disk to be deleted! A good defragmenter must detect this condition and alert the System Manager to it so it can be corrected before defragmentation begins.

It is also possible for a good defragmenter to detect and isolate certain types of problems on a disk and avoid those areas while continuing to safely defragment the rest of the disk.

Frequency

How often should you defragment a disk? "Often enough so performance does not suffer noticeably," is the simple answer. Of course, by the time your system's performance is suffering "noticeably," it's too late, so this is not a workable answer.

To answer this question with a numeric quantity, like "every week" or "every two weeks," you have to know how long it takes for fragmentation to build up to a level where performance suffers noticeably. You can use a disk analysis utility or a performance monitor to measure the level of fragmentation on your system periodically, perhaps daily. Then, when performance begins to suffer noticeably, you can take note of what level of fragmentation you have. Let's say this happens when fragmentation reaches an average of 1.1 fragments per file (10% fragmentation). Thereafter, you can periodically measure fragmentation and when it gets to, say, 1.05, defragment the disk.
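
The measurement itself is simple arithmetic. The short Python sketch below, with invented fragment counts, works out the "average fragments per file" figure discussed above and applies a trigger set at 1.05.

def average_fragments_per_file(fragment_counts):
    return sum(fragment_counts) / len(fragment_counts)

def time_to_defragment(fragment_counts, threshold=1.05):
    """Defragment once the average creeps above the chosen threshold,
    which should sit below the level at which users notice slowness."""
    return average_fragments_per_file(fragment_counts) >= threshold

survey = [1, 1, 1, 3, 1, 2, 1, 1, 1, 1]      # ten files, two of them fragmented
print(average_fragments_per_file(survey))    # -> 1.3
print(time_to_defragment(survey))            # -> True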

An automatic on-line defragmenter includes a mechanism to measure fragmentation and schedule defragmentation passes accordingly. The ideal automatic on-line defragmenter would detect performance drains attributable to fragmentation and eliminate the causes before the drains became noticeable.

Full Disks

It is one thing to ramble on about the workings of a defragmenter in an ideal, laboratory environment, but it is quite another thing to see one working in the real world. One of the tricks played on us System Managers in the real world is full disks. Somehow, despite our best efforts, disks that really ought to remain 80% full or less drift up to 98%, 99% or even 100% full. Sure, you can spot this and take appropriate steps to handle it, but what happens to your defragmenter during that critical window of time between the disk going 100% full and your clearing off some space? Look for a defragmenter that survives this circumstance intact and leaves every bit of your user data equally intact. A defragmenter can't do much in the way of defragmenting with zero free space on the disk. The point is, if it can't defragment, it shouldn't consume overhead either. So a good defragmenter should do nothing at all when there is nothing useful it can do.

Large Files

Another side of the same coin is the fragmented file that is larger than the largest free space on the disk. Suppose, for example, you have 10,000 blocks of free space, all in one place, but there is a 12,000 block fragmented file. How does the defragmenter deal with that?

Older defragmenters used to rely on scratch space on a second disk to handle this problem, but that proved so unreliable that it has disappeared as a practice. Some defragmenters don't deal with the problem at all; they just ignore the file. A good defragmenter will partially defragment the file, giving you the best result it can within the constraints of space available, and then return to the file later for complete defragmenting when sufficient space has been freed up.

Always-Open Files

Another one of those real world tricks that doesn't show up in the test lab is the file that is held open all the time, leaving no "downtime" for that file in which to defragment it. Database files are prime candidates for this trick, particularly large database files. And why not? That big database probably contains most of the data that justifies the computer's existence. It ought to be in use around the clock. A defragmenter needs to take such files into account and provide a means of dealing with them safely.

Besides the always-open file, there is also the one file that a user application happens to need at the instant the defragmenter is working on it. What happens in that case? Does the defragmenter give way? Does it even notice? Or does the application program trip, fail and abort with a file access error?

The minimum proper action is for the defragmenter to 1) notice that an attempt is being made to access the file, 2) abort its own operation safely and quickly, and 3) try again later. The ideal defragmenter would process files in such a way that no user application could ever falter from or even detect an access conflict. In other words, the defragmenter should have enough control over system operation to move the file at a time when no user is attempting access and in such a way that no attempted access by an application would ever fail.
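
A sketch of that minimum behaviour follows. The try_exclusive_access routine is a made-up stand-in for whatever file-locking interface the system provides - it is not a real OpenVMS call - and the retry timings are arbitrary.

import random, time

def try_exclusive_access(filename):
    # Pretend some user application grabs the file about half the time.
    return random.random() > 0.5

def defragment_one_file(filename, attempts=5, retry_delay=0.2):
    for attempt in range(attempts):
        if try_exclusive_access(filename):
            print(f"{filename}: relocated on attempt {attempt + 1}")
            return True
        # A user wants the file: give way immediately and try again later.
        time.sleep(retry_delay)
    print(f"{filename}: left alone for now, to be revisited on a later pass")
    return False

random.seed(7)
defragment_one_file("SALES.DAT")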

File Creation and Modification Dates

Another simple but important piece of basic functionality is the preservation of file creation and modification dates. You can defragment a file quite easily by simply using the DCL command COPY /CONTIGUOUS. If there is a free space of sufficient size available, DCL will make a contiguous copy of the file for you. The problem with this method is that it gives the copy a new file creation date. You might not care whether the date is changed or not, but the VMS BACKUP utility will. The next time you do an incremental backup, the copied file will be saved even though it was saved on an earlier backup. The reason is the new date given to the file by the COPY command. For a single file, this may be no big deal, but clearly a defragmenter cannot go around changing file creation dates wholesale. Nor can the file's modification date or date of last backup be changed. Either action would cause your incremental backups to explode from relatively small savesets to ones rivaling full backups in size.

A good defragmenter should not change file creation dates, file modification dates, file backup dates or any other information in the file header except the size and location of the extents (fragments) that make up the file.

Directory Files

Directory files never become fragmented, but they are otherwise just like any other file. Directory files do fragment the disk's free space, however. Directory files present a special problem for a defragmenter in that while a defragmenter has a directory file locked or held open for relocation, not only is that directory file inaccessible to users, so is every file in that directory and every file in every subdirectory below it. To access a file by name, a user application must go through the directory or directories containing that file's name. If the directory is locked, the user application gets an access conflict error. If the user application is not expecting an access conflict error or is not designed to deal with such errors, the application may abort.

A good defragmenter is designed with this problem in mind and moves directory files without any restrictions whatsoever on user access to files in that directory or its subdirectories. It is no solution to just ignore directory files, as this leaves your free space badly fragmented.


Red Herrings

"Optimization" by File Placement

"Optimization" of disk access by deliberate placement of files in certain locations is a red herring - an attempt to draw your attention away from the real issues of defragmentation and onto something else entirely.

First of all, optimization has nothing to do with defragmentation. Defragmentation is the solution to the problem created by fragmented files and disk free space. Your system is slow. The real reason it is slow is that files and disk free space are fragmented. The solution is to make the files contiguous (not fragmented) and group the free space together. That's it.

Where does optimization come in? Well, this is a different subject altogether. The concept of disk optimization supposedly accelerates file access even when all the files are contiguous and all the free space is grouped together. Disk optimization is an attempt to speed up file access by forcing certain files to be permanently located in certain positions on the disk. The theory goes that if you put the INDEXF.SYS file in the middle of the disk and group the most frequently accessed files around it, the disk heads will generally have to travel a shorter distance than if these files were located randomly around the disk.

There are some major holes in this theory - so major, in fact, that I think the "optimization" proponents either don't fully understand the OpenVMS system or are just using optimization as a marketing gimmick.

Hole number one: There is no standard, supported way on an OpenVMS system to tell which files are most frequently accessed. In fact, there is no way to tell which files are frequently accessed or even which files have ever been accessed. You can tell which files have been written and when they were last written, but not when they were read. The only thing that comes close to providing this information is the enabling of volume retention dates, but enabling this feature consumes more overhead than you are likely to get back by "optimizing" file placement. The cure is worse than the disease.

Hole number two: Extensive analysis of real-world computer sites shows that it is not commonplace for entire files to be accessed all at once. It is far more common for only a few blocks of a file to be accessed at a time. Consider a database application, for example. User applications rarely, if ever, search or update the entire database. They access only the particular records desired. Thus locating the entire database in the middle of a disk is wasteful at best and possibly destructive as far as performance is concerned.

Hole number three: File placement capability in OpenVMS was designed for the realtime laboratory environment in which a single process has continuous control of the computer system. In such a system, the time consumed by head movement from one particular file to another particular file can be critical to the success of the process. The system designer can minimize that critical time lag by calculating the ideal location for the second file in relation to the first and forcing the two files to exact locations. Then, when the process has completed reading the first file, access to the second is effected with minimal delay.

By comparison, consider the typical interactive user environment. Dozens or even hundreds of interactive users might be logged on and active at any moment, running who knows what applications, accessing innumerable files willy-nilly in every conceivable part of a disk. How can one even hope to guess where the disk's read-write head might be at any given time? With this extremely random mode of operation, how can a disk optimizer state flatly that positioning such-and-such a file at such-and-such an exact location will reduce disk access times? It seems to me that such a statement is foolish and that such file positioning is just as likely to worsen system performance as to improve it. Even if the two effects balance out at zero, the overhead involved gives you a net loss.

Hole number four: When you force a file to a specific position on the disk by specifying exact LBNs, how do you know where it really is? You have to take into account the difference between logical block numbers (LBNs) and physical block numbers (PBNs). These two are not the same thing. LBNs are assigned to PBNs by the disk's controller. Disks supplied by Digital Equipment Corporation often have as many as 10% more physical blocks than logical blocks. The LBNs are assigned to most of the physical blocks and the remainder are used as spares and for maintenance purposes. You see, magnetic disks are far from perfect and blocks sometimes "go bad." In fact, it is a rarity for a magnetic disk to leave the manufacturer without some bad blocks. When the disk is formatted by Digital or by the customer, the bad blocks are detected and "revectored" to spares. Revectored means that the LBN assigned to that physical block is reassigned to some other physical block. This revectoring can also be done on the fly while your disk is in use. The new block after revectoring might be on the same track and physically close to the original, but then again it might not. Thus, not all LBNs correspond to the physical block of the same number, and two consecutive LBNs may actually be widely separated on the disk.
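
To picture why, consider this toy Python model of an LBN-to-physical mapping with a couple of revectored blocks. The geometry and the revector table are pure invention - real controllers keep this mapping internally and do not expose it this way - but the effect on "consecutive" blocks is plain.

TRACK_SIZE = 100                      # physical blocks per track (made up)

# Normally LBN n maps to physical block n. Two blocks that "went bad" have
# been revectored to spare physical blocks near the end of the disk.
revector_table = {5001: 74990, 5002: 74991}

def physical_block(lbn):
    return revector_table.get(lbn, lbn)

def track_of(pbn):
    return pbn // TRACK_SIZE

# Two consecutive LBNs that look adjacent can be hundreds of tracks apart.
for lbn in (5000, 5001, 5002, 5003):
    pbn = physical_block(lbn)
    print(f"LBN {lbn} -> physical block {pbn} (track {track_of(pbn)})")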

So I ask again, "When you force a file to a specific position on the disk, how do you know where it really is?" You may be playing probabilities and perhaps you should think twice before gambling with user data and system performance.

Hole number five: Where is the "middle" of a disk? Obviously, no one is suggesting that the geometric center of the round disk platter, like the hole in a phonograph record, is the "middle." Of course not. We are talking about data storage. The middle is the point halfway between LBN zero (the "beginning" of the disk) and the highest LBN on that disk volume (the "end" of the disk). Right?

Well, maybe not. We have already seen that LBNs do not necessarily correspond to the physical disk block of the same number. But what about a multi-spindle disk (one with two or more sets of platters rotating on separate spindles)? There are several different types of multi-spindle disks. Besides the common volume sets and stripesets, there are also disks that use multiple spindles for speed and reliability yet appear to OpenVMS as a single disk drive. Where is the "middle" of such a disk? I think you will agree that, while the location of the apparent middle can be calculated, the point accessed in the shortest average time is certainly not the point halfway between LBN zero and the last LBN. This halfway point would be on the outermost track of one platter or on the innermost track of another - not on the middle track of either one. Such disk volumes actually have several "middles" when speaking in terms of access times.

There are even disks that have no performance middle at all. I am thinking of electronic (semiconductor) disks, which have no heads and thus no head movement. With an electronic disk, all overhead associated with "optimizing" file placement is wasted time and lost performance.

Hole number six: With regular defragmentation, a defragmenter needs to relocate only a tiny percentage of the files on a disk; perhaps even less than one percent. "Optimization" requires moving virtually all the files on the disk, every time you optimize. Moving 100 times as many files gives you 100 times the opportunity for error and 100 times the overhead. Is the result worth the risk and the cost?

Hole number seven: What exactly is the cost of optimizing a disk and what do you get for it? The costs of fragmentation are enormous. A file fragmented into two pieces can take twice as long to access as a contiguous file. A three-piece file can take three times as long, and so on. Some files fragment into hundreds of pieces in a few days' use. Imagine the performance cost of 100 disk accesses where only one would do! Defragmentation can return a very substantial portion of your system to productive use.

Now consider optimization. Suppose, for the sake of argument, that disk data block sequencing really did correspond to physical block locations and you really could determine which files are accessed most frequently and you really knew the exact sequence of head movement from file to file. By carefully analyzing the entire disk and rearranging all the files on the disk, you could theoretically reduce the head travel time. The theoretical maximum reduction in average travel time is one-quarter the average head movement time, after subtracting the time it takes to start and stop the head. If the average access time is 32 milliseconds (for an RA82 model disk) and 24 milliseconds of this is head travel time, the best you can hope for is a 6 millisecond reduction for each file that is optimized. On a faster disk, such as the RA71 (12.5 milliseconds), the potential for reduction is proportionately less - about 2 milliseconds. Taking rotational latency into account, your savings may be even less.

Each defragmented file, on the other hand, saves potentially one disk access (32 milliseconds) per fragment. That's over five times the optimization savings, even with the bare minimum level of fragmentation. With badly fragmented files, the difference is astounding.

On top of all that, what do you suppose it costs your system to analyze and reposition every file on your disk? When you subtract that from the theoretical optimization savings, it is probably costing you performance to "optimize" the files.

The fact is that it takes only a tiny amount of fragmentation, perhaps only one day's normal use of your system, to undo the theoretical benefits of optimizing file locations. While "optimization" is an elegant concept to the uninitiated, it is no substitute for defragmentation, it is unlikely to improve the performance of your system at all, and it is more than likely to actually worsen performance in a large number of cases.

In summary, file placement for purposes of optimizing disk performance is a red herring. It is not technologically difficult to do. It is just a waste of time.

The "Perfect" Disk

What should the end result of defragmentation be? What, exactly, is the product of a defragmenter's efforts?

How about a perfect disk? Wouldn't that be reassuring, to know that your disks are "perfect"?

A perfect disk, in terms of fragmentation, is a thing of beauty. It is a disk which has each and every file in a perfectly contiguous state, with every bit of free space all collected together in one spot, preferably at the beginning (near LBN 0) of the physical disk.

This seems straightforward and well-grounded in sensible reasoning, yet there is quite a controversy over the matter. Why?

Well, there are other factors that need to be taken into consideration.

Some say that free space on the disk should not be organized at the beginning of the disk; that putting the free space there does no good because new files are allocated from blocks pointed to by the extent cache (blocks recently freed up by file deletions) instead of from repeated scanning of the storage bitmap.

This may be true, but it is also true that the extent cache is loaded first from the storage bitmap and then added to as files are deleted. It is also true that the location of blocks freed up by deletions is relatively random across the disk. A defragmentation strategy that groups files near the beginning will merely reinforce the random nature of the extent cache because holes caused by deletions will appear near the beginning (as well as everywhere else) and the extent cache will be loaded with lots of little pieces. On the other hand, if the defragmentation strategy grouped free space near the beginning, the extent cache would be loaded initially with a large amount of contiguous free space. This would then result in newly created files being more likely to be created contiguously in the first place and reduce the need for defragmentation. In other words, performance would degrade less (remain high) through prevention.

I must admit that when we are talking about where to consolidate free space, we are splitting hairs. The performance to be gained from consolidating free space in a particular area of the disk is slight, even under the best of circumstances. Moreover, consolidation of free space is overrated. While badly fragmented free space is likely to cause performance problems indirectly by forcing new files to be created in a fragmented state, slightly fragmented free space does not affect performance at all. In the absence of an absolute requirement for large contiguous files, there is no performance benefit whatsoever to a single large contiguous free space over as many as a few hundred smaller free spaces. Any resources expended consolidating a few free spaces into one large one are likely to be wasted. The important number to look at is the percentage of free space that is consolidated into a few large spaces.

Some say that free space on a disk should be grouped around the index file in the middle of the disk. Their argument is that, by placing free space near the index file, the disk head will have less distance to travel to and from the INDEXF.SYS file when creating new files. By keeping the head near the middle tracks of the disk, the greatest overhead factor in disk performance, head movement, is reduced.

This is certainly true under certain specific circumstances, but it is decidedly not true under others. For example, while this technique is sensible for creating new files, what about accessing the existing files? By relegating these files to the outside edges of the disk (the lowest and highest LBNs), the distance the head must travel to access these files is increased. Should we assume that any file created before the defragmentation will never be accessed again? Or that such "old" files are not accessed often enough to matter? Surely somebody accesses data files that are more than a day or two old. Under a scheme such as this, those data file accesses are going to be slowed, not speeded.

There is also the question of where the INDEXF.SYS file is located. This scheme assumes that it is in the middle of the disk. But OpenVMS allows the INDEXF.SYS file to be located at the beginning, at the end, or at any specific block location the System Manager might choose. What happens to your system's performance if the INDEXF.SYS file is positioned at the beginning of the disk and the defragmenter groups all the free space in the middle? Performance gets clobbered, that's what happens.

The problem with schemes like this is that they are based on theoretical assumptions rather than on real-world observation of disks in use on live computer systems. The worst assumption of all is that disks have a "beginning," an "end" and a "middle." As noted earlier, we talk of disk locations in terms of LBNs, or Logical Block Numbers. While a logical block ordinarily corresponds to a physical block, this is not always so. The OpenVMS disk architecture allows for logical block numbers to be reassigned, if necessary, to any physical disk block at all. This allows a disk to continue to be used even though one or more physical disk blocks are unusable. The corresponding LBNs are merely reassigned to other physical blocks. It also allows for widely varying physical disk architectures that can be treated identically by the I/O subsystem of OpenVMS.

Take a multi-spindle disk as an example. A disk with a single platter on a single spindle, with only one side used, is easy to discuss in terms of "beginning," "middle" and "end." Here is a diagram showing such a disk:

Figure 6-1 Single Disk

Now here is a diagram of a two-spindle disk with LBN numbers and arrows indicating the "beginning," "middle" and "end."

Figure 6-2 Two-Spindle Disk

Note that the "middle" of this disk spans the innermost tracks of one platter and the outermost tracks of the other. What will happen to head movement if the INDEXF.SYS file is placed at the "middle" of this multi-spindle disk and the free space is all grouped on either side of it? It will be a disaster performance-wise.

Your disk might not look like this. But it also might not look like the idealized conceptual scheme used by designers of such a system. The point is that, to be useful, the design of a defragmentation strategy must be based on the logical architecture of the disk and must work well regardless of different physical structures. A strategy that assumes one particular physical architecture may be devastating for other types of disks.

Grouping free space near the logical beginning of a disk (LBN 0) is guaranteed to reduce time spent scanning the storage bitmap for free clusters. It is also guaranteed to maximize the probability that newly created files will be created contiguously or, at least, minimize the number of fragments in newly created files, regardless of the physical architecture of the disk involved.

The final and worst problem with a "perfect" disk is that its perfection doesn't last. Moments after achieving that exalted state, some user application has created, deleted or extended a file and the perfection is history. You now have an imperfect disk that is well on its way to becoming less and less perfect. So just how valuable is that fleeting moment of "perfection"?


The True Goal Of Defragmentation

I submit that a perfect disk is the wrong goal. Elegance, beauty and perfection in disk file arrangement are not the ideals to which a System Manager aspires. You cannot take them to management and show them off as a demonstration of your value to the organization. You cannot even use them to keep yourself employed. How many System Manager resumes include a section for "beauty, elegance and perfection" of disk file arrangement? None.

The true goal, of course, is performance. Now there's something you can take to the bank. A system that is so slow you can't log on this week is a system that will get you fired. A system that performs even better than the expectations of users and management might just get you a raise or a promotion or both. So let's talk about better performance of the system.

The mistake made by the "perfect disk" advocates is that of viewing a disk and even a VAX (or Alpha AXP) computer system as a static, unchanging thing. Static, unchanging VAX systems don't need defragmenters. They need a lift to the nearest scrap yard.

A real VAX or Alpha AXP system is dynamic, in motion, changing continuously - almost alive with activity. Real perfection, in my mind, would be having all this activity streamlined for efficiency and directed at the specific production goals of the organization and the individuals who make it up. Let's see a computer system that can be pointed at a computational problem and vaporize that problem in a flash. The culprit we are targeting is slowness, sluggishness, the "can't-help-you-now-I'm-too-busy-doing-something-else" of a mismanaged computer system. The ideal defragmenter would vaporize that problem and give you the laser-fast precision instrument you expect.

Let us take a careful look, then, at exactly what factors are important to deliver this capability. What really counts when it comes to defragmentation results?

Factor Number One: Safety

Your defragmenter must be safe. Above all other requirements, features and benefits, safety of user data is the most important. One trashed disk of user data, one crashed system, even one lost user file can outweigh all the performance benefits in the world. What good does it do to talk of performance increases when the system is down and user data forever lost?

There must be no compromises with safety, either. "99% safe" is not a feature. "Able to recover most lost data" does not go over too well with users and management. What you want is absolute, 100% guaranteed safety; no chance whatsoever of any user data being lost at all.

Even the ability to recover lost data in the event of a catastrophe is dicey. The minute you hear that data is lost, even temporarily, your confidence is weakened. The thread of recovery is too tenuous for you to sit back, relax and coolly await the recovery of the lost data by the very tool that trashed it moments ago. OK, maybe it's better than a poke in the eye with a sharp stick, but it's still not a very comforting situation.

Factor Number Two: Low Overhead

Earlier, we covered the troublesome defragmentation techniques that consumed more system resources than they gave back - the "cure" that is worse than the disease. We answered the question, "How long should a defragmenter take to do its job?" with "Less than the time and resources being lost to fragmentation."

While it is a definite improvement for a defragmenter to spend 19% of the system's resources to get a 20% improvement in performance, it is a much better improvement for a defragmenter to spend only 2% of the system's resources to get a 20% improvement in performance. As a rule of thumb, your defragmenter should consume not more than two percent (2%) of the system. You can measure this by using the DCL command SHOW PROCESS or looking in ACCOUNTING records, then dividing the resources consumed (CPU time, for example) by the total resources available during the same time period. For example, if the defragmenter consumed two hours of CPU time out of every 100, that would be 2 divided by 100, or 2% of the available CPU time.
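
The arithmetic is worth spelling out. A short sketch of the calculation, with invented figures:

def overhead_percentage(consumed_cpu_hours, elapsed_hours):
    """Resources consumed by the defragmenter, as a percentage of the
    total available over the same period."""
    return 100.0 * consumed_cpu_hours / elapsed_hours

# A defragmenter that used 2 CPU-hours out of a 100-hour period sits
# right at the 2% rule of thumb; one that used 5 is too expensive.
print(overhead_percentage(2, 100))    # -> 2.0
print(overhead_percentage(5, 100))    # -> 5.0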

The ideal defragmenter would not only consume less than 2% of the system resources, it would take its 2% from resources that are otherwise idle, such as CPU idle time and unused disk I/O bandwidth.

Factor Number Three: Worst Problems Handled First

Sometimes, in a highly technical subject, it is easy to get so caught up in the details that one overlooks things of major importance. Like the disk that is 99% full. Defragmenting such a disk is, except in extreme cases, a waste of time: the performance benefits you might get from defragmenting it are insignificant compared to the benefits available from simply clearing off 10% or 20% of the disk. When a disk is very nearly full, OpenVMS spends so much time looking for free space, waiting for the disk to spin around and bring a tiny free space under the head, and allocating files in lots of little pieces all over the place that it is a wonder any useful work gets done at all.
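
As a quick sanity check before reaching for a defragmenter at all, you can ask OpenVMS how full a disk really is. The DCL sketch below uses the F$GETDVI lexical function to work out the percentage of the volume in use; the device name DUA0: is only an example, and the 90% threshold is a rough rule of thumb rather than a hard limit.

$ ! Report how full a disk is. DUA0: is an example device name.
$ disk  = "DUA0:"
$ free  = F$GETDVI(disk,"FREEBLOCKS")
$ total = F$GETDVI(disk,"MAXBLOCK")
$ used_pct = ((total - free) * 100) / total
$ WRITE SYS$OUTPUT "''disk' is ''used_pct'% full"
$ IF used_pct .GE. 90 THEN WRITE SYS$OUTPUT "Clear off space before defragmenting"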

Similarly, as mentioned earlier, the defragmenter that works hard to consolidate every bit of free space into a single contiguous area is wasting its time. Rigorous testing shows clearly that an OpenVMS disk with up to 63 free spaces performs just as well as a disk with one large free space. So all the overhead expended by the defragmenter consolidating free space into fewer than 63 areas is wasted. And it is your computer resources that are being wasted.

A defragmenter should address itself to real problems instead of theoretical ones and deliver immediate, real solutions to them. The single most important piece of advice I can give on the subject of defragmenters is to distinguish between the idealized results obtained in a perfect laboratory environment and what can be expected in your system under real-world conditions.

Who Buys a Defragmenter?

According to Computer Intelligence Corporation, a respected source of information about the VAX market, only 1% of all VAX sites had a defragmenter installed in October 1987. By October 1988, the number had grown to 4%. By October 1989, it had shot to 11%. The survey was not done in 1990, but in October 1991, market research showed 18% of all sites running a disk defragmenter.

Graph 6-1 Percentage Of VAX Sites With A Defragmenter

For comparison, the next graph shows the increase in disk capacity over the same time period (these figures are also from Computer Intelligence):

Graph 6-2 Average Disk Capacity Per System (Megabytes)

Experienced System Managers: The average System Manager has three years' experience. Fully two-thirds of the defragmenters in existence are in the hands of the more experienced half of the System Managers. A much higher percentage of System Managers with eight or more years of experience use a defragmenter than System Managers with less experience.

Sites with many users: The average VAX site has 102 users. Two-thirds of all defragmenters are installed at sites with 50 or more users. Twenty-six percent of sites with 200 or more users have a defragmenter. That's substantially more than the 18% of all sites that have a defragmenter.

Sites with many disks: The average VAX site has 7.5 disks. (Don't you love statistics that imply that someone has half a disk?) Sixty percent of all defragmenters are installed at sites with six or more disks. Twenty-four percent of sites with 10 or more disks use a defragmenter. Again, substantially more than average.

One more interesting statistic is that 62.5% of System Managers who have attended training classes at Digital Equipment Corporation use defragmenters.

In looking at the distribution of defragmenters amongst VAX sites, it can be seen that defragmenters are not distributed evenly across all sites. More sites than you would expect from an even distribution have a defragmenter when the site runs the bigger VAX models, supports many users, has many disks and multiple VAXes, or is run by an experienced, trained System Manager.

Specifically, survey results from Computer Intelligence show that sites running the big VAXes have 26% to 33% more defragmenters than a random distribution would predict; sites with 500 or more users are 36% more likely to have a defragmenter; and sites with three or more VAXes are 131% more likely to be running one.

Also, as you might expect, the survey results show that System Managers with the most experience and training are substantially more likely to use a defragmenter than inexperienced, untrained System Managers.

The conclusion is plain: the people who buy defragmenters are the more experienced and trained System Managers, the ones with many users, many disks and many VAXes, and particularly those with the big machines.

When defragmenter buyers are surveyed as to why they made the purchase, the overwhelming response is that they had to handle the fragmentation problem, but backing each disk up to tape and restoring it was far too tedious and time-consuming. The defragmenter frees the System Manager from this unpleasant chore and saves time so he or she can get more done.

Finally, it should be noted that ninety percent of defragmenter buyers place the safety of the product above performance in importance, but three-quarters of them expect a performance improvement as a result of defragmenting.


Who Does Not Buy a Defragmenter?

The sites that don't buy defragmenters are those with far more disk space than they really need. Files tend to be created contiguously in the first place, as there is plenty of free space to do so. By keeping lots of free disk space available, these sites suffer very little from fragmentation.

Naturally, these tend to be very small sites. At a larger site, the cost of a defragmenter is a very small portion of the computer budget.

As we have seen from the Computer Intelligence data, inexperienced System Managers don't buy defragmenters. These folks either don't understand the cause of their system's slower and slower performance or they lack the expertise to demonstrate the problem and the need for a solution to those who hold the purse strings. This book is the answer to both problems, the former in the main body of the book and the latter in the appendices.


What Does the Future Hold for Defragmentation?

We have seen, much earlier in this book, that fragmentation did not just happen. Rather, it was deliberately introduced as a solution to an earlier problem. The file structure for the OpenVMS operating system and its predecessor, RSX-11, was purposefully designed to allow fragmentation so users would not have the more serious problem of running out of file space prematurely. Then, as disk capacities grew to proportions previously unimagined, fragmentation came to be a problem in its own right.

Reverting to a file system that allows only contiguous files is no solution for the nineties. The automatic on-line defragmenter is the ideal solution for now. But what does the future hold? Can we envision what disks might be like ten years from now? What new and even more capacious forms of storage might come along and how would these impact the problem of fragmentation and its defragmenter solution?

We have already seen the introduction of another deliberate form of fragmentation: disk striping. With striping, files are deliberately fragmented across two or more disks in "stripes" of data that can be retrieved from the multiple disks much faster than they could be retrieved from any one disk. Extensions of this technology to large arrays of disks could dramatically counter the fragmentation problem.

Electronic "disks" have made their debut and, with falling prices for semiconductor memory chips, could become a viable form of mass storage. Even before seeing electronic storage broadly in use, however, we will see more of the hybrids, which combine electronic storage with magnetic. All that is needed is some mechanism for sorting out the performance-critical data from the non-critical and this system becomes very cost effective. We are seeing this type of system now with data caching, particularly as caches are built into disk controllers to provide better performance.

This path can be extrapolated to intelligent disk subsystems that determine for themselves where best to store data and have the data all ready to return promptly when needed.

We can also envision a new file allocation strategy that is not sensitive to fragmentation. Thinking of files as "flat" one- or two-dimensional objects leads us to think of the parts of files as being "close together" or "separate." A collection of data can also be thought of as a three- or more-dimensional pile of associated records that can be accessed in any old way. More elaborate indexing methods give us faster access to the data. Larger and faster storage devices allow for more elaborate indexing methods.

This all culminates in a vision of data storage as a completely automatic mechanism without any files at all. You put data in and you get information out. Don't ask me how it works, but that's where I think we are headed. Naturally, without files, there can be no file fragmentation, so the problem is gone altogether, no doubt to be replaced by some new, even more perplexing problem that will keep us system programmers in business for a long time to come.


Conclusion

It should be clear by now that fragmentation is well understood and a good solution is available in the form of an automatic on-line disk defragmenter. It should also be clear that a defragmenter designed for a static laboratory environment won't cut the mustard in the real world where things are changing continuously. It is vital both for performance and for the safety of your data that the defragmenter be able to deal with a disk that is organized differently from defragmentation pass to defragmentation pass, with files that appear and disappear and with heavy user loads at unexpected times. In choosing a defragmenter, look for one that addresses known, demonstrable performance issues and not just theoretical problems demonstrable only in a laboratory environment. I have taken a great deal of time in this chapter to explain the ins and outs of defragmentation so that you will be well-informed and better able to judge your needs and the best solution for you. I believe that the more you know about fragmentation and defragmentation, the easier your job will be. How much do you need to know? Enough to handle the problem once and for all and get on with more important and interesting things.
