


FREQUENCY HISTOGRAMS

The frequency histogram is one tool that helps us keep track of variation. As we mentioned earlier, a frequency histogram is a "snapshot" of a process that shows the spread of measurements and how many of each measurement there are.
In the figure below, which is a frequency histogram, the lower edge of the chart, called the horizontal scale, records the sizes of men's sport shoes sold during a week's time. Notice that the sizes are listed in order from left to right. In this example, each number really represents a group, or class: 6 represents sizes 6 and 6½, 7 represents sizes 7 and 7½, and so on.
The vertical scale along the left-hand edge records how often each size sold. As we can see, size 6 sold 10 times; size 7 sold 16 times; size 8 sold 24 times; and so on.
The figure above tells us a great deal about variation. The sizes vary from 6 for the smallest size to 14 for the largest. The most frequent size sold is in the size 10 group. The average size sold is about 10½. This frequency histogram shows us all these things quickly and easily without formulas or tables.
Still, frequency histograms don't tell us everything about variation. The histogram in the figure above, for example, does not tell us whether the variations come from just one store or from more than one. Also, it does not show us any pattern over time. That is, we can't tell from the figure whether the first pair of size 6 was sold on Monday or on another day.
But before we discuss what a frequency histogram can or cannot do, we will see how to construct one.

CONSTRUCTING A FREQUENCY HISTOGRAM

In this section, we will go through the steps for constructing a frequency histogram. At each step, we will describe the step and give an example.

Step 1. Collect data.

If we are lucky, we can find data already collected in reports, data files, and elsewhere. If not, we may have to collect them ourselves. Divide the data into fairly small groups to make them easy to work with.
For example, a bank is studying processing times for accounts of medium-sized businesses in order to distribute the workload equitably. A bank employee has recorded the times for 50 such transactions. (See the table below.)

Minutes to process transactions
84	54	50	58	34	26	72	58	74	56	
22	72	16	36	24	10	70	36	70	72
52	36	44	42	48	54	62	56	60	58
42	50	42	68	80	76	44	20	46	58
64	86	50	60	46	32	60	56	48	34

Variation in a process can be measured. The table above shows the variation in minutes for the processing of these accounts. Think of the 10 columns in this table as 10 groups of data. There are only 5 numbers in each group, so the groups are easy to work with.

Step 2. Find and mark the largest and smallest number in each group.

Circle the largest numbers and draw boxes around the smallest. Check our work. In the table above, we circled the largest number in each column and drew a box around the smallest. Then we checked each column.

Step 3. Find the largest and the smallest numbers in the whole set.

Double circle the very largest and draw a double box around the very smallest. Check our work. In step 2, we worked on small groups of measurements, and it was easy to find the largest and the smallest in each group. Now, in step 3, look only at the numbers with circles and boxes around them. In the table above, 86 is the largest of all the circled numbers. Put a second circle around it. The 10 is the smallest of the numbers in boxes. Draw a second box around it. Now check to make sure we did it correctly.
We may feel that this procedure takes us through too much detail, but keep two things in mind. First, I expect that you are learning this technique for the first time. Once you have learned and practiced it, you will be able to zip through the steps. Second, I am trying to give you details that will make it easy to use the technique and prevent you from making mistakes. Breaking this set of times into small groups makes it simple to find the largest and the smallest times in each group. Then it is easy to locate the very largest and the very smallest. Because we broke the times down into small groups, we can check our work easily and reduce the chance of making errors.

Step 4. Calculate the range of the data.

Subtract the very smallest number from the largest. The largest number is 86 and the smallest is 10, so the range is 76.
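
Steps 2 through 4 can also be checked with a few lines of code. The following is only a minimal sketch in Python; the variable names (times, columns, data_range and so on) are our own choices for illustration, and the numbers are copied from the example table.

```python
# Processing times in minutes, copied row by row from the example table.
times = [
    [84, 54, 50, 58, 34, 26, 72, 58, 74, 56],
    [22, 72, 16, 36, 24, 10, 70, 36, 70, 72],
    [52, 36, 44, 42, 48, 54, 62, 56, 60, 58],
    [42, 50, 42, 68, 80, 76, 44, 20, 46, 58],
    [64, 86, 50, 60, 46, 32, 60, 56, 48, 34],
]

# Step 2: treat each column of the table as one small group of 5 readings.
columns = list(zip(*times))
column_max = [max(col) for col in columns]  # the "circled" numbers
column_min = [min(col) for col in columns]  # the "boxed" numbers

# Step 3: the very largest and very smallest readings in the whole set.
largest = max(column_max)
smallest = min(column_min)

# Step 4: the range is the largest reading minus the smallest.
data_range = largest - smallest

print(largest, smallest, data_range)  # 86 10 76
```

Working column by column mirrors the hand method: the small groups are easy to check, and the overall largest and smallest are then picked from the marked numbers only.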

Step 5. Determine the intervals (also known as class intervals) for our frequency histogram.

From the previous steps, we know that the measurements cover an interval from 10 to 86. Now divide this large interval into a number of smaller intervals of equal width. One rule of thumb is to use about 10 intervals, but this number doesn't always work. See the table below for guidelines.

Guidelines for determining the number of intervals.

Number of readings			Number of intervals
Fewer than 50					5 to 7
50 to 100					6 to 10
101 to 150					7 to 12
more than 150					10 to 12

It is important to choose the right number of intervals for the number of readings. Too few intervals sometimes hide valuable information. Too many intervals may give such a flat histogram that we miss something important. We need to be skillful in picking the right number of intervals so that the information in the data will show up in the histogram. This skill comes with practice.

Let's try 8 class intervals for our data, because the table above recommends 6 to 10 intervals for 50 readings.
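
The guideline table can be written as a small lookup. This is a sketch only: the function name recommended_intervals is hypothetical, and the ranges are taken directly from the guideline table above.

```python
def recommended_intervals(n_readings):
    """Return the (fewest, most) intervals the guideline table suggests."""
    if n_readings < 50:
        return (5, 7)
    if n_readings <= 100:
        return (6, 10)
    if n_readings <= 150:
        return (7, 12)
    return (10, 12)


fewest, most = recommended_intervals(50)
chosen = 8  # the number of intervals tried in our example
print(fewest, most)              # 6 10
print(fewest <= chosen <= most)  # True
```

The lookup only gives a starting range. Picking the actual number within that range is still a matter of judgment.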

Step 6. Determine intervals, boundaries and midpoints.

First, divide the range of the data by the desired number of intervals. Round off this result for convenience. This gives the width of each interval.
The range of the set of 50 observations is 76. When we divide this range by 8 (the desired number of intervals), the result is 9.5.

76/8 = 9.5

If we round off 9.5 to 10, which will be much easier to work with, we can group our data into eight intervals, each 10 units wide.
Next, set up boundaries for the intervals. Each reading must fall between two boundaries, for reasons we will discuss in what follows.
Since the smallest reading is 10, we may want to make the first interval go from a lower endpoint of 10 to an upper endpoint of 20, the second from 20 to 30, and so on, because we decided to make the intervals 10 units wide. But if we have a measurement of exactly 20, we will have the problem of deciding whether to put the 20 into the first interval (10 to 20) or the second (20 to 30).
Boundaries solve this problem. Set up boundaries between the intervals and make the boundaries such that no readings can fall on them. The easy way to do this is to carry the boundaries one decimal place further than the data. Since the data in our example table have no decimal places, subtract 0.5 from the endpoint of each interval. This changes the 10 endpoint to 9.5, the 20 endpoint to 19.5, and so on. In this case, we have subtracted, but we could just as easily have added 0.5 to the endpoints of the intervals.
Now, no observations can fall on the boundaries, and the problem is solved. The first interval runs from 9.5 to 19.5, the second from 19.5 to 29.5, and so on.
Finally, set a midpoint at the center of each interval. (A little rounding off is all right.) The first interval runs from 9.5 to 19.5, a width of 10 units. Half this width is 5. Add 5 to the lower boundary, 9.5. The result is 14.5, the midpoint of the first interval. Round up to 15. Set all the other midpoints in the same way to obtain midpoints of 25, 35, and so on.
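
The arithmetic of Step 6 can be sketched as follows. The variable names are our own, and the rounded width of 10 is entered by hand, as in the text, rather than computed by a rounding rule.

```python
import math

smallest, largest = 10, 86
n_intervals = 8

# Divide the range by the desired number of intervals.
raw_width = (largest - smallest) / n_intervals  # 76 / 8 = 9.5
width = 10  # 9.5 rounded off for convenience, as in the text

# Shift every endpoint down by 0.5 so no whole-number reading can fall on a boundary.
boundaries = [smallest - 0.5 + i * width for i in range(n_intervals + 1)]

# The exact midpoint is the lower boundary plus half the width: 14.5, 24.5, ...
exact_midpoints = [lower + width / 2 for lower in boundaries[:-1]]

# Round the halves up to get the labels 15, 25, ...
# (Python's built-in round() would turn 14.5 into 14, so floor(x + 0.5) is used.)
midpoint_labels = [math.floor(m + 0.5) for m in exact_midpoints]

print(raw_width)        # 9.5
print(boundaries)       # [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
print(midpoint_labels)  # [15, 25, 35, 45, 55, 65, 75, 85]
```

Note that the eight intervals of width 10 reach up to 89.5, a little past the largest reading of 86. That is the price of rounding 9.5 up to a convenient width.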

Step 7. Determine the frequencies.

Tally the data in each class interval and check the tallies. Now add them, and list the totals under "Frequency". As a final check, add all the numbers in the "Frequency" column. This final total should equal the total number of readings.
Make a tally mark beside the interval where each number fits. The first number in the bottom left corner of the example table is 64, so make a tally mark beside the interval that is defined by the "59.5-69.5" boundaries.
When we look at the completed results, we will see that there are two tally marks in the first interval of 9.5-19.5. In the third interval, 29.5-39.5, we made four tally marks and drew a fifth one horizontally through them. Later we added one more tally mark to make a total of 6 for this interval. This method makes it easy to count the tallies and reduces errors.
Once all the tallies are done, check by doing them again. Then total the tallies for each interval under the "Frequency" column.
We can check our work in two ways. First, count the tally and the tally check to make sure each gives the same result. Then add up the entries in the "Frequency" column. The sum is 50, which we already know is the total number of readings given in our example table.
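
The tallying can also be done by a short program, which is a handy second check on the hand count. This is a sketch under the choices made in our example (eight intervals, 10 units wide, starting at 9.5); the names are our own.

```python
# The 50 processing times from the example table, row by row.
times = [
    84, 54, 50, 58, 34, 26, 72, 58, 74, 56,
    22, 72, 16, 36, 24, 10, 70, 36, 70, 72,
    52, 36, 44, 42, 48, 54, 62, 56, 60, 58,
    42, 50, 42, 68, 80, 76, 44, 20, 46, 58,
    64, 86, 50, 60, 46, 32, 60, 56, 48, 34,
]

# Boundaries 9.5, 19.5, ..., 89.5 define eight intervals of width 10.
boundaries = [9.5 + 10 * i for i in range(9)]

# One counter per interval; each reading adds one "tally mark".
frequencies = [0] * (len(boundaries) - 1)
for reading in times:
    for i in range(len(frequencies)):
        if boundaries[i] < reading < boundaries[i + 1]:
            frequencies[i] += 1
            break

for i, count in enumerate(frequencies):
    print(f"{boundaries[i]}-{boundaries[i + 1]}: {count}")

# Final check: the frequencies must add up to the number of readings.
print(sum(frequencies) == len(times))  # True
print(frequencies)  # [2, 4, 6, 9, 13, 6, 7, 3]
```

Because no reading can fall exactly on a boundary, every reading lands in exactly one interval, and the total comes out to 50.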

Step 8. Prepare the frequency histogram.

There are two main principles to follow in preparing the frequency histogram. It should:

· Tell the story of the data, no more and no less
· Be neat and easy to read
In drawing the frequency histogram we must:
· Mark and label the vertical scale
· Mark and label the horizontal scale
· Draw in the bars according to the tallies
· Label the histogram

The figure below shows the tallies in the form of a frequency histogram.
The vertical scale is labeled "Frequency" and the horizontal scale is labeled "Times to Process Transactions". The intervals are identified by their midpoints. (The 10-20 interval is labeled 15, and so on.) The bar for the 15 midpoint is two big units high, the bar for the 25 midpoint is four units high, and so on. Each bar has a width of 10 units; the first goes from 10 to 20, the second from 20 to 30, and so forth. Finally, the label in the upper left-hand corner, "Frequency Histogram of Minutes to Process Transactions", identifies the histogram.
Does the histogram tell the story of the data? Let's see. First, there are eight bars in the histogram. The recommendation is 6 to 10 intervals for a set of 50 readings, so 8 intervals are O.K. As far as we know, there is nothing unusual about these data that doesn't show up on the histogram, so we think this histogram tells the story of the data.
Is this histogram neat and easy to read? We can gain eye appeal in several ways, as we did here: paste graph paper onto a white background and write on the background; make the frequencies and midpoints easy to read; don't make the histogram too tall, too short, too wide or too narrow. Above all, keep the histogram simple. Don't include extra information or superimpose another histogram on this one.
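
When graph paper is not at hand, a rough histogram can be printed as plain text. The sketch below is only one possible layout: it turns the histogram on its side, with one row per interval, and the function name render_histogram is hypothetical. The frequencies are the ones obtained by tallying the example table into the eight intervals.

```python
# Interval midpoints and the frequencies tallied from the example table.
midpoints = [15, 25, 35, 45, 55, 65, 75, 85]
frequencies = [2, 4, 6, 9, 13, 6, 7, 3]


def render_histogram(midpoints, frequencies, title, axis_label):
    """Build a plain-text histogram with one labeled row per interval."""
    lines = [title, f"{axis_label} (interval midpoint) | Frequency"]
    for midpoint, count in zip(midpoints, frequencies):
        # One '#' per reading, followed by the count so the bar is easy to read.
        lines.append(f"{midpoint:>4} | {'#' * count} ({count})")
    return "\n".join(lines)


chart = render_histogram(
    midpoints,
    frequencies,
    title="Frequency Histogram of Minutes to Process Transactions",
    axis_label="Times to Process Transactions",
)
print(chart)
```

The sketch follows the same principles as the hand-drawn version: both scales are labeled, the bars follow the tallies, the histogram has a title, and nothing extra is added.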

SOME CAUTIONS

In preparing a frequency histogram, we must be careful about several things so that the histogram will tell the story of the data it represents. The following guidelines will help us to accomplish this.
1. Use equal-width intervals. Unequal-width intervals tend to be confusing.
2. Do not use open intervals. That is, make sure every interval has definite boundaries.
3. Do not make any breaks in the vertical or horizontal scales. If we do, they may be overlooked.
4. Do not have too few or too many intervals. With too few intervals, for example, a histogram can hide the fact that there is one high reading that is very different from all the others.
5. Do not put too much information on one histogram. It can be confusing.
6. Give everything needed to identify all the information completely and to make the graph understandable.

WHAT FREQUENCY HISTOGRAMS TELL US ABOUT UNDERLYING FREQUENCY DISTRIBUTIONS

We learned at the beginning of this page that there are variations between individual units, such as rifle target shots, temperatures at a resort, and times to order and pick up a meal at the restaurant. We learned that a good way to describe these variations is to build a frequency histogram.
The frequency histograms that we develop on our job will usually be based on samples. Even though they may come from the same process, our histograms will look different because the samples are different.
Suppose we fill a bucket with 1000 small metal disks coated with plastic and stir them thoroughly. We draw out a sample of 10 disks and measure the thickness of the coating on each one. We put the sample back, stir the bucket, and take another sample of 10. The frequency histograms for these two samples are different.
The pattern created by taking all 1000 disks as our sample is called the underlying frequency distribution. The underlying frequency distribution will always create the same pattern because it includes all the disks, which are the same every time.
By contrast, the two 10-disk samples are so small that they don't give a clear idea of the underlying distribution. In addition, the samples are so small that the two histograms are different from each other. Histograms based on small samples like this will usually differ from each other because it is not likely that we will pick the same 10 disks twice.
The bigger the sample, the more the histogram will look like the underlying distribution and show what is really "in the bucket". For this reason, we recommend that you take samples of at least 50 pieces. A sample of 100 is even better.
Frequency histograms that are based on small samples will tell us something about the averages, even though they don't show the underlying frequency distributions. When we compare the histograms of small sample sizes to the underlying frequency distribution, we can see that the averages of both histograms are about the same.
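
The bucket experiment can be imitated on a computer. The sketch below is an illustration under stated assumptions, not real measurements: the coating thicknesses are made-up numbers drawn from a bell-shaped distribution with an assumed mean of 50 and spread of 5 (in arbitrary units), and the names are our own.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the experiment can be repeated

# Assumed "bucket": 1000 coating thicknesses (made-up values, arbitrary units).
bucket = [rng.gauss(50, 5) for _ in range(1000)]


def draw_sample(n):
    """Draw n disks without replacement; the bucket itself is left unchanged,
    which plays the role of putting the sample back and stirring."""
    return rng.sample(bucket, n)


sample_a = draw_sample(10)
sample_b = draw_sample(10)
sample_big = draw_sample(100)

print("whole bucket mean: ", round(statistics.mean(bucket), 2))
print("first sample of 10:", round(statistics.mean(sample_a), 2))
print("second sample of 10:", round(statistics.mean(sample_b), 2))
print("sample of 100:     ", round(statistics.mean(sample_big), 2))
```

Running this with different seeds shows the point of the section: the two small samples rarely contain the same disks, so their histograms differ, yet their averages tend to stay fairly close to the average of the whole bucket, and the larger sample tends to stay closer still.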


© 1997 say_kian@hotmail.com
