## Quantitative Frequency Distributions and Histograms

 Contents A. Making a Quantitative Frequency Distribution B. Making a Histogram from a Quantitative Frequency Distribution C. Good Histograms, Bad Histograms and Non-histograms D. Analyzing Histograms

A. Making a Quantitative Frequency Distribution

To create a histogram, you first must make a quantitative frequency distribution. The following list of steps allows you to construct a perfect quantitative frequency distribution every time. Other methods may work sometimes, but they may not work every time.

1. Find the smallest data value (low score) and the largest data value (high score).

2. Select the number of classes you want. Usually, this number is between 3 and 7. (The number of classes may be given in the instructions to the problem.)

3. Determine the accuracy of the data. That is, look at the data to see how many places to the right of the decimal point are used.

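4. Compute the following two numbers:

   lower bound = (high score - low score) / (number of classes)

   upper bound = (high score - low score) / (number of classes - 1)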

5. The class width is now chosen to be any number greater than the lower bound, but not more than the upper bound. The class width may have more accuracy than the original data, but should be easy to use in calculations. Since there may be more than one possible class width, there can be many correct frequency distributions with the same number of classes.

6. Next you compute the lower class limits. Starting with the low score, repeatedly add the class width until - including the low score - you have one lower class limit for each class.

7. The upper class limit for the first class is the biggest number below the second lower class limit with the same accuracy as the class width. To obtain the other upper class limits, you repeatedly add the class width to the first upper class limit until - including the first upper class limit - you have one upper class limit for each class.

8. For each class, count the number of data values in the class. This is the class frequency. You can do this by going through the data values one by one and making a tally mark next to the class where the data value occurs. Counting up the tallies for each class gives the class frequency. The class frequencies should be recorded in their own column.

Tally marks are optional, but you must show the class frequencies. The frequencies of the first and last class must be greater than zero. The frequency of any other class may be zero. If you tallied correctly, the sum of all the frequencies should equal the total number of data values.
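Steps 4 through 7 can be sketched in Python. This is a minimal sketch, not the only way to do it. It assumes the two numbers in step 4 are the range divided by the number of classes and the range divided by one less than the number of classes, which agrees with the worked example in this section. The function names `width_bounds` and `class_limits` and the `unit` parameter are made up for illustration. `Decimal` is used so that repeated addition does not pick up floating-point rounding error.

```python
from decimal import Decimal

def width_bounds(low, high, num_classes):
    """Assumed step-4 bounds: range / k and range / (k - 1); needs k >= 2."""
    data_range = high - low
    return data_range / num_classes, data_range / (num_classes - 1)

def class_limits(low, width, num_classes, unit):
    """Steps 6 and 7: one (lower limit, upper limit) pair per class.

    `unit` is the smallest step at the chosen accuracy, e.g. Decimal("0.01").
    """
    lowers = [low + i * width for i in range(num_classes)]
    first_upper = lowers[1] - unit  # biggest value below the second lower limit
    uppers = [first_upper + i * width for i in range(num_classes)]
    return list(zip(lowers, uppers))

# Values taken from the can-weight example in this section.
low, high = Decimal("11.86"), Decimal("12.10")
lower_bound, upper_bound = width_bounds(low, high, 4)

width = Decimal("0.08")  # step 5: any convenient value in (lower_bound, upper_bound]
assert lower_bound < width <= upper_bound

limits = class_limits(low, width, 4, Decimal("0.01"))
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

A width greater than the lower bound guarantees that the high score falls inside the last class, and a width of at most the upper bound guarantees that the last class is not empty.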

Example: The following data represents the actual liquid weight in 16 "twelve-ounce" cans. Construct a frequency distribution with four classes from this data.

 11.95 11.91 11.86 11.94 12.00 11.93 12.00 11.94 12.10 11.95 11.99 11.94 11.89 12.01 11.99 11.94

Solution: First we use the steps listed above to construct the frequency distribution.

Step 1: low score = 11.86, high score = 12.10

Step 2: number of classes = 4 (given in problem)

Step 3: The accuracy is two decimal places.

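Step 4: Compute the lower and upper bounds. The range is 12.10 - 11.86 = 0.24, so the lower bound is 0.24 / 4 = 0.06 and the upper bound is 0.24 / (4 - 1) = 0.08.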

Step 5: We can use any number bigger than 0.06, but not more than 0.08. If we restrict our attention to the simplest numbers, either 0.07 or 0.08 will work. I chose 0.08 because I think it is easier to work with than 0.07.

Step 6: By adding 0.08 to 11.86 repeatedly, we obtain the lower class limits: 11.86, 11.94, 12.02, 12.10. Notice there are 4 numbers because we want 4 classes.

Step 7: The first upper class limit is the largest number with the same accuracy as the data that is just below the second lower class limit. In this case, the number is 11.93. The other upper class limits are found by adding 0.08 repeatedly to 11.93, until there are 4 upper class limits: 11.93, 12.01, 12.09, 12.17.

Step 8: Next, for each member of the data set, we decide which class contains it and then put a tally mark by that class. The numbers corresponding to these tallies give us the class frequencies.

| Class | Tally | Frequency |
|---|---|---|
| 11.86-11.93 | \|\|\|\| | 4 |
| 11.94-12.01 | \|\|\|\|\| \|\|\|\|\| \| | 11 |
| 12.02-12.09 | | 0 |
| 12.10-12.17 | \| | 1 |

The tallies in the last step are optional, but the frequency column is required. Notice that the frequency of the third class is zero. Since this is not the first or last class, this is not a problem. Notice also that the sum of the frequencies is 16, which is the same as the number of data values.
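The tally in Step 8 and the checks described above can be reproduced with a short script. This is only a sketch: the variable names are illustrative, and the class limits are rebuilt from the low score 11.86, the class width 0.08 and the two-decimal accuracy used in the solution.

```python
from decimal import Decimal

# The 16 can weights from the example, written to two decimal places.
data = [Decimal(x) for x in (
    "11.95 11.91 11.86 11.94 12.00 11.93 12.00 11.94 "
    "12.10 11.95 11.99 11.94 11.89 12.01 11.99 11.94"
).split()]

low, width, num_classes = Decimal("11.86"), Decimal("0.08"), 4
unit = Decimal("0.01")  # two-decimal accuracy

# Each class runs from its lower limit to just below the next lower limit.
classes = []
for i in range(num_classes):
    lower = low + i * width
    classes.append((lower, lower + width - unit))

# Count the data values that fall inside each class.
frequencies = [sum(lower <= x <= upper for x in data) for lower, upper in classes]

for (lower, upper), freq in zip(classes, frequencies):
    print(f"{lower}-{upper}  {'|' * freq:<11}  {freq}")

# The checks from the text: end classes are not empty, and nothing was missed.
assert frequencies[0] > 0 and frequencies[-1] > 0
assert sum(frequencies) == len(data)
```

The computed frequencies are 4, 11, 0 and 1, matching the table.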

For more examples of making a quantitative frequency distribution, go to the GeoGebra applet Quantitative Frequency Distributions.

B. Making a Histogram from a Quantitative Frequency Distribution

To make a histogram, you must first create a quantitative frequency distribution. We will make a histogram from the quantitative frequency distribution constructed in part A, a copy of which is shown below.

| Class | Frequency |
|---|---|
| 11.86-11.93 | 4 |
| 11.94-12.01 | 11 |
| 12.02-12.09 | 0 |
| 12.10-12.17 | 1 |

First, set up a coordinate system with a uniform scale on each axis (See Figure 1 below). The data axis is marked here with the lower class limits. Note that the last number is 12.10 + 0.08 = 12.18, which is not in the frequency table, but it keeps the scale uniform. You could also mark the data axis using the upper class limits, or you could mark it with the class midpoints. Whatever method you use, the data axis will always have a uniform numeric scale with tick marks at regular intervals and numbers next to the tick marks.

If the data axis doesn’t look like a number line, then you don’t have a histogram.
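The tick positions for the data axis can be computed directly from the low score and the class width. The sketch below marks the axis with the lower class limits plus one extra tick, as described above. The function name `data_axis_ticks` is hypothetical, and the check at the end simply confirms that the spacing is uniform.

```python
from decimal import Decimal

def data_axis_ticks(low, width, num_classes):
    """Lower class limits plus one extra tick that closes the last class."""
    return [low + i * width for i in range(num_classes + 1)]

ticks = data_axis_ticks(Decimal("11.86"), Decimal("0.08"), 4)

# A uniform scale means every pair of neighboring ticks is the same distance apart.
spacings = {b - a for a, b in zip(ticks, ticks[1:])}
assert spacings == {Decimal("0.08")}

print([str(t) for t in ticks])
```

For the example this gives ticks at 11.86, 11.94, 12.02, 12.10 and 12.18.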

Frequency scales always start at zero, so the frequency scale must extend from 0 to at least 11 in this case. I went up to 12, so that I could use multiples of 4 on the vertical axis. You could use multiples of three, multiples of two or mark each number from 0 to 11 on the frequency scale, but the smaller the multiple, the more work you will have to do. As with the data axis, the frequency scale should have tick marks at regular intervals and numbers next to the tick marks.
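One way to pick the frequency-scale ticks is to start at zero and count up in a fixed multiple until the largest frequency is covered. The helper below is an illustrative sketch (the name `frequency_ticks` is made up), and it assumes whole-number frequencies and a whole-number step.

```python
import math

def frequency_ticks(max_frequency, step):
    """Ticks from 0 up to the first multiple of `step` that reaches max_frequency."""
    top = math.ceil(max_frequency / step) * step
    return list(range(0, top + step, step))

# The largest frequency in the example is 11.
print(frequency_ticks(11, 4))       # multiples of four
print(frequency_ticks(11, 3))       # multiples of three
print(len(frequency_ticks(11, 1)))  # marking every number takes the most ticks
```

With multiples of 4 the scale needs only four ticks (0, 4, 8, 12), while marking every number from 0 to 11 needs twelve.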

Once the scales are set up, you draw a bar for each class with a frequency greater than zero (See Figure 2 below). Each bar will cover the interval from its lower class limit to the next lower class limit to the right. The height of the bar is the same as the frequency for that class.
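The position and size of each bar follow from the class width and the frequencies. The sketch below computes a (left edge, right edge, height) triple for every class with a frequency greater than zero. The variable names are illustrative, and the result could be handed to any plotting tool.

```python
from decimal import Decimal

low, width = Decimal("11.86"), Decimal("0.08")
frequencies = [4, 11, 0, 1]  # from the frequency distribution in part A

bars = []
for i, freq in enumerate(frequencies):
    if freq > 0:  # empty classes get no bar
        left = low + i * width
        # Each bar runs from its lower class limit to the next lower class limit.
        bars.append((left, left + width, freq))

for left, right, height in bars:
    print(f"bar from {left} to {right}, height {height}")
```

Because each bar ends exactly where the next class begins, bars for neighboring non-empty classes touch, which is why a histogram has no space between adjacent bars.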

Finally, add a label for each axis. The vertical axis can always be labeled "Frequency". The label on the horizontal axis just describes the original data set.

For more examples of making a histogram, go to the GeoGebra applet Histograms.

C. Good Histograms, Bad Histograms and Non-histograms

Figure 1: GOOD Histogram

This graph has all the characteristics of a good histogram. Both axes are labeled and a title is given. The data axis has a uniform number scale to label the bars. The scales on both the frequency and the data axes cover the data values and not much more. Finally, there are no gaps between the bars.

The problem with the histogram in Figure 2 is that there are gaps between the bars. The gaps make it appear that some values like 44 and 45 or 53 and 54 never occur.

In Figure 3, the problem is that the frequency scale goes too high. One bar has a frequency of 5 and all the rest have frequencies of 3 or lower, so there is no reason to extend the frequency scale above 6. Notice that the bars in Figure 1 are twice as tall as the bars in Figure 3, even though Figure 1 is about the same size as Figure 3.

In Figure 4, the data scale extends too far to the left. The smallest data value is 27 and everything else is bigger, so there is no reason for the data scale to go below 25. Notice that the bars in Figure 1 are wider than those in Figure 4, even though Figure 1 is about the same size as Figure 4.

Figure 5: NOT a Histogram

This chart is NOT a histogram because there is not a uniform number scale on the data axis. The chart is best described as a quantitative bar chart. Without a uniform number scale on the data axis, reading the chart and analyzing the results become much more difficult.

D. Analyzing Histograms

The following are questions that a statistician should be able to answer about any histogram.

* What is the maximum data value as shown on the histogram? (What is the largest value on the data axis?)

* What is the minimum data value as shown on the histogram? (What is the smallest value on the data axis?)

* Is the histogram symmetric, skewed to the left, skewed to the right, bell-shaped, uniform or does it have no special shape? (Because real data rarely results in perfectly uniform, bell-shaped, or symmetric histograms, anything close to these shapes can be classified as such.)

* How many peaks does the histogram have, and where are they located? (A peak is a bar with shorter bars on each side. A first bar that is taller than the second bar, or a last bar that is taller than the preceding bar, is also called a peak. Two or more adjacent bars of the same height with shorter neighboring bars - a plateau - count as one peak.)

* Does the histogram have any gaps, and if so, where are they located? (Gaps are empty classes with bars on both sides.)

* Does the histogram have any extreme values, and if so, where are they located? (An extreme value is a bar with a large gap - two or more classes - between it and the other bars.)

Notice that to answer all of these questions you only need to look at the numbers on the data axis of the histogram - not the frequency axis. The questions are not numbered - in fact they can be asked in any order - so placing a number or letter next to an answer does not identify the question. You should give enough information in your answer that the reader does not even have to know there was a question. The eventual goal is for you to combine the answers to all of these questions in a paragraph.
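The definitions of peaks, gaps and extreme values given in the questions above can be turned into a rough sketch that works on a list of class frequencies. This is one possible reading of those definitions, not an official algorithm. The function names and the sample frequencies are invented for illustration, and positions are reported as class indices (0 for the first class), which you would translate back to values on the data axis.

```python
def runs(freqs):
    """Group consecutive equal frequencies into (start, end, height) runs."""
    out, start = [], 0
    for i in range(1, len(freqs) + 1):
        if i == len(freqs) or freqs[i] != freqs[start]:
            out.append((start, i - 1, freqs[start]))
            start = i
    return out

def peaks(freqs):
    """Bars (or plateaus) that are taller than every neighbor they have."""
    found = []
    for start, end, height in runs(freqs):
        left = freqs[start - 1] if start > 0 else None
        right = freqs[end + 1] if end + 1 < len(freqs) else None
        if height == 0 or (left is None and right is None):
            continue  # empty classes and one-run histograms have no peak here
        if (left is None or left < height) and (right is None or right < height):
            found.append((start, end))
    return found

def gaps(freqs):
    """Runs of empty classes with bars on both sides."""
    return [(s, e) for s, e, h in runs(freqs)
            if h == 0 and s > 0 and e + 1 < len(freqs)]

def extreme_values(freqs, min_gap=2):
    """Bars whose nearest `min_gap` classes on each side are all empty."""
    has_other_bars = sum(1 for f in freqs if f > 0) > 1
    found = []
    for i, f in enumerate(freqs):
        if f == 0 or not has_other_bars:
            continue
        nearby = freqs[max(0, i - min_gap):i] + freqs[i + 1:i + 1 + min_gap]
        if all(g == 0 for g in nearby):
            found.append(i)
    return found

sample = [1, 0, 0, 3, 5, 5, 2, 0, 4]  # made-up class frequencies
print(peaks(sample))
print(gaps(sample))
print(extreme_values(sample))
```

For the made-up frequencies, the sketch reports peaks at the first class, at the plateau in classes 4 and 5, and at the last class; gaps at classes 1 to 2 and at class 7; and one extreme value, the first class. Judging the overall shape (symmetric, skewed, bell-shaped) is left to the reader, since "close to" a shape is a judgment call.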

Exercise: Answer the questions listed above for each of the histograms shown below.