If we are lucky, we can find data already collected in reports, data files, and elsewhere. If not, we may have to collect them ourselves. Divide the data into fairly small groups to make them easy to work with.
For example, a bank is studying processing times for the accounts of medium-sized businesses in order to distribute the workload equitably. A bank employee has recorded the times for 50 such transactions (see the table below).
Minutes to process transactions

84  54  50  58  34  26  72  58  74  56
22  72  16  36  24  10  70  36  70  72
52  36  44  42  48  54  62  56  60  58
42  50  42  68  80  76  44  20  46  58
64  86  50  60  46  32  60  56  48  34
Variation in a process can be measured. The table above shows the variation in minutes for the processing of these accounts. Think of the 10 columns in this table as 10 groups of data. There are only 5 numbers in each group, so each group is easy to work with.
Step 2. Find and mark the largest and smallest number in each group.
Circle the largest numbers and draw boxes around the smallest. Check our work. In the table above, we circled the largest number in each column and drew a box around the smallest. Then we checked each column.
Step 3. Find the largest and the smallest numbers in the whole set.
Double circle the very largest and draw a double box around the very smallest. Check our work. In step 2, we worked on small groups of measurements, and it was easy to find the largest and the smallest in each group. Now, in step 3, look only at the numbers with circles and boxes around them. In the table above, 86 is the largest of all the circled numbers. Put a second circle around it. The 10 is the smallest of the numbers in boxes. Draw a second box around it. Now check to make sure we did it correctly.
We may feel that this procedure takes us through too much detail, but keep two things in mind. First, I expect that you are learning this technique for the first time. Once you have learned and practiced it, you will be able to zip through the steps. Second, I am trying to give you details that will make it easy to use the technique and prevent you from making mistakes. Breaking this set of times into small groups makes it simple to find the largest and the smallest times in each group. Then it is easy to locate the very largest and the very smallest. Because we broke the times down into small groups, we can check our work easily and reduce the chance of making errors.
Step 4. Calculate the range of the data.
Subtract the very smallest number from the largest. The largest number is 86 and the smallest is 10, so the range is 76.
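Steps 2 through 4 can also be sketched in a few lines of Python. This is only an illustration of the hand procedure, assuming the 50 times are laid out as 5 rows by 10 columns as in the table; the variable names are made up for this example.

```python
# The 50 processing times, laid out as 5 rows by 10 columns.
times = [
    [84, 54, 50, 58, 34, 26, 72, 58, 74, 56],
    [22, 72, 16, 36, 24, 10, 70, 36, 70, 72],
    [52, 36, 44, 42, 48, 54, 62, 56, 60, 58],
    [42, 50, 42, 68, 80, 76, 44, 20, 46, 58],
    [64, 86, 50, 60, 46, 32, 60, 56, 48, 34],
]

# Step 2: treat each column as one small group of 5 readings.
columns = list(zip(*times))
column_max = [max(col) for col in columns]  # the "circled" numbers
column_min = [min(col) for col in columns]  # the "boxed" numbers

# Step 3: look only at the circled and boxed numbers.
largest = max(column_max)
smallest = min(column_min)

# Step 4: the range is the largest number minus the smallest.
data_range = largest - smallest

print(column_max)
print(column_min)
print(largest, smallest, data_range)  # 86 10 76
```

Working column by column mirrors the manual check: each small group can be verified on its own before the overall extremes are picked out.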
Step 5. Determine the intervals (also known as class intervals) for our frequency histogram.
From the previous steps, we know that the measurements cover an interval from 10 to 86. Now divide this large interval into a number of smaller intervals of equal width. One rule of thumb is to use about 10 intervals, but this number of intervals doesn't always work. See the table below for guidelines.
Guidelines for determining the number of intervals.
Number of readings    Number of intervals
Fewer than 50         5 to 7
50 to 100             6 to 10
101 to 150            7 to 12
More than 150         10 to 12
It is important to choose the right number of intervals for the number of readings. Too few intervals sometimes hide valuable information. Too many intervals may give such a flat histogram that we miss something important. We need to be skillful in picking the right number of intervals so that the information in the data will show up in the histogram. This skill comes with practice.
Let's try 8 class intervals for our data, because the table above recommends 6 to 10 intervals for 50 readings.
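The guideline table can be written as a small lookup. The function name below is hypothetical and the code is only a sketch of the table, not a rule for every data set.

```python
def suggested_intervals(num_readings):
    """Return the (low, high) number of intervals from the guideline table."""
    if num_readings < 50:
        return (5, 7)
    if num_readings <= 100:
        return (6, 10)
    if num_readings <= 150:
        return (7, 12)
    return (10, 12)


low, high = suggested_intervals(50)
print(low, high)         # 6 10
print(low <= 8 <= high)  # True: 8 intervals is within the guideline
```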
Step 6. Determine intervals, boundaries and midpoints.
First, divide the range of the data by the desired number of intervals. Round off this result for convenience. This gives the width of each interval.
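The range of the set of 50 observations is 76. When we divide this range by 8 (the desired number of intervals), the result is 9.5. Rounding this off gives a convenient interval width of 10. Starting the first interval at 9.5, the boundaries are 9.5-19.5, 19.5-29.5, and so on up to 79.5-89.5.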
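The arithmetic for this step can be sketched as follows. Two assumptions are made here: the raw width of 9.5 is rounded up to 10 (which matches the 10-unit bar width used in the histogram), and the first boundary is placed half a unit below the smallest reading, at 9.5, so that no whole-minute reading falls exactly on a boundary. The midpoint is taken as the average of the two boundaries; the histogram described later labels its bars with round values such as 15 and 25 instead.

```python
import math

smallest, largest = 10, 86
num_intervals = 8

data_range = largest - smallest          # 76
raw_width = data_range / num_intervals   # 9.5
width = math.ceil(raw_width)             # assumption: round up, giving 10

# Assumption: start half a unit below the smallest reading.
first_boundary = smallest - 0.5          # 9.5

boundaries = [first_boundary + i * width for i in range(num_intervals + 1)]
intervals = list(zip(boundaries[:-1], boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in intervals]

for (lo, hi), mid in zip(intervals, midpoints):
    print(f"{lo}-{hi}  midpoint {mid}")
```

Rounding up rather than down keeps the 8 intervals wide enough to cover the whole range: they run from 9.5 to 89.5, which includes both 10 and 86.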
Step 7. Determine the frequencies.
Tally the data in each class interval and check the tallies. Now add them, and list the totals under "Frequency". As a final check, add all the numbers in the "Frequency" column. This final total should equal the total number of readings.
For each number, make a tally mark beside the interval in which it falls. The number in the bottom left corner of the example table is 64, so make a tally mark beside the interval that is defined by the "59.5-69.5" boundary.
When we look at the completed results, we will see that there are two tally marks in the first interval of 9.5-19.5. In the third interval, 29.5-39.5, we made four tally marks and drew a fifth one horizontally through them. Later we added one more tally mark to make a total of 6 for this interval. This method makes it easy to count the tallies, and reduces errors.
Once all the tallies are done, check by doing them again. Then total the tallies for each interval under the "Frequency" column.
We can check our work in two ways. First, count the tally and the tally check to make sure each gives the same result. Then add up the entries in the "Frequency" column. The sum is 50, which we already know is the total number of readings given in our example table.
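The tally and its two checks can be sketched in Python. This assumes the boundaries 9.5, 19.5, ..., 89.5 with a width of 10; the function name and the second counting method are illustrative choices, not part of the hand procedure.

```python
from collections import Counter

times = [
    84, 54, 50, 58, 34, 26, 72, 58, 74, 56,
    22, 72, 16, 36, 24, 10, 70, 36, 70, 72,
    52, 36, 44, 42, 48, 54, 62, 56, 60, 58,
    42, 50, 42, 68, 80, 76, 44, 20, 46, 58,
    64, 86, 50, 60, 46, 32, 60, 56, 48, 34,
]
boundaries = [9.5 + 10 * i for i in range(9)]  # 9.5, 19.5, ..., 89.5


def tally(values, boundaries):
    """Count how many values fall between each pair of boundaries."""
    counts = [0] * (len(boundaries) - 1)
    for value in values:
        for i in range(len(counts)):
            if boundaries[i] < value < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = tally(times, boundaries)

# First check: count again by a different route (interval index by division).
recount = Counter(int((value - boundaries[0]) // 10) for value in times)
tally_check = [recount[i] for i in range(len(frequencies))]

# Second check: the frequencies must add up to the number of readings.
print(frequencies)                 # [2, 4, 6, 9, 13, 6, 7, 3]
print(frequencies == tally_check)  # True
print(sum(frequencies))            # 50
```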
Step 8. Prepare the frequency histogram.
There are two main principles to follow in preparing the frequency histogram. It should:
· Tell the story of the data, no more and no less
· Be neat and easy to read
In drawing the frequency histogram we must:
· Mark and label the vertical scale
· Mark and label the horizontal scale
· Draw in the bars according to the tallies
· Label the histogram
The figure below shows the tallies in the form of a frequency histogram.
The vertical scale is labeled "Frequency" and the horizontal scale is labeled "Times to Process Transactions". The intervals are identified by their midpoints (the 10-20 interval is labeled 15, and so on). The bar for the 15 midpoint is two units high, the bar for the 25 midpoint is four units high, and so on. Each bar has a width of 10 units; the first goes from 10-20, the second from 20-30, and so forth. Finally, the label in the upper left-hand corner, "Frequency Histogram of Minutes to Process Transactions", identifies the histogram.
Does the histogram tell the story of the data? Let's see. First, there are eight bars in the histogram. The recommendation is 6 to 10 intervals for a set of 50 readings, so 8 intervals are O.K. As far as we know, there is nothing unusual about these data that doesn't show up on the histogram, so we think this histogram tells the story of the data.
Is this histogram neat and easy to read? We can gain eye appeal in several ways, as we did here: paste graph paper onto a white background and write on the background; make the frequencies and midpoints easy to read; don't make the histogram too tall, too short, too wide or too narrow. Above all, keep the histogram simple. Don't include extra information or superimpose another histogram on this one.
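For readers without graph paper, a rough text version of the histogram can be produced with a short Python sketch. The midpoint labels and the two axis labels are the ones described above; the frequencies are those obtained by tallying the 50 times against the boundaries 9.5, 19.5, ..., 89.5. The function name and the layout are invented for this example.

```python
midpoints = [15, 25, 35, 45, 55, 65, 75, 85]  # bar labels used in the text
frequencies = [2, 4, 6, 9, 13, 6, 7, 3]       # one count per interval


def text_histogram(midpoints, frequencies, title):
    """Build a simple vertical bar chart out of text characters."""
    lines = [title, "Frequency"]
    # Vertical scale: one row per unit of frequency, tallest level first.
    for level in range(max(frequencies), 0, -1):
        row = "".join("  # " if f >= level else "    " for f in frequencies)
        lines.append(f"{level:>3} |{row}")
    # Horizontal scale: an axis line, the midpoints, and the label.
    lines.append("    +" + "----" * len(frequencies))
    lines.append("     " + "".join(f"{m:>3} " for m in midpoints))
    lines.append("     Times to Process Transactions")
    return "\n".join(lines)


chart = text_histogram(
    midpoints,
    frequencies,
    "Frequency Histogram of Minutes to Process Transactions",
)
print(chart)
```

The sketch follows the same checklist as the hand-drawn version: both scales are marked and labeled, the bars follow the tallies, and the histogram carries a title and nothing extra.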
WHAT FREQUENCY HISTOGRAMS TELL US ABOUT UNDERLYING FREQUENCY DISTRIBUTIONS
We learned at the beginning of this page that there are variations between individual units, such as rifle target shots, temperatures at a resort, and times to order and pick up a meal at the restaurant. We learned that a good way to describe these variations is to build a frequency histogram.
The frequency histograms that we develop on our job will usually be based on samples. Even though they may come from the same process, our histograms will look different because the samples are different.
Suppose we fill a bucket with 1000 small metal disks coated with plastic and stir them thoroughly. We draw out a sample of 10 disks and measure the thickness of the coating on each one. We put the sample back, stir the bucket, and take another sample of 10. The frequency histograms for these two samples are different.
The pattern created by taking all 1000 disks as our sample is called the underlying frequency distribution. The underlying frequency distribution will always create the same pattern because it includes all the disks, which are the same every time.
By contrast, the two 10-disk samples are so small that they don't give a clear idea of the underlying distribution. In addition, the samples are so small that the two histograms are different from each other. Histograms based on small samples like this will usually differ from each other because it is not likely that we will pick the same 10 disks twice.
The bigger the sample, the more the histogram will look like the underlying distribution and show what is really "in the bucket". For this reason, we recommend that you take samples of at least 50 pieces. A sample of 100 is even better.
Frequency histograms that are based on small samples will tell us something about the averages, even though they don't show the underlying frequency distributions. When we compare the histograms of small sample sizes to the underlying frequency distribution, we can see that the averages of both histograms are about the same.
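The bucket experiment can be imitated with a small simulation. Everything numerical here is hypothetical: the 1000 coating thicknesses are invented with a random number generator (arbitrary units, centered on 50), and the function name is made up for the example. The sketch only illustrates the idea that averages from small samples land near the bucket average, and that larger samples vary less from one draw to the next.

```python
import random
import statistics

random.seed(1997)  # fixed seed so the run is repeatable

# Hypothetical bucket: 1000 coating thicknesses, invented for illustration.
bucket = [random.gauss(50.0, 5.0) for _ in range(1000)]


def spread_of_sample_means(population, sample_size, repeats=200):
    """Draw many samples and report how much their averages vary."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)


bucket_mean = statistics.mean(bucket)

# Two samples of 10 disks, each drawn from the full, stirred bucket.
sample_a = random.sample(bucket, 10)
sample_b = random.sample(bucket, 10)

print("bucket average:  ", round(bucket_mean, 1))
print("sample averages: ", round(statistics.mean(sample_a), 1),
      round(statistics.mean(sample_b), 1))

spread_10 = spread_of_sample_means(bucket, 10)
spread_100 = spread_of_sample_means(bucket, 100)
print("averages vary less with 100 disks:", spread_100 < spread_10)
```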
© 1997 say_kian@hotmail.com