
Modified Boxplots

Contents
A. Modified Boxplot Construction
B. Using TI 83/84 Calculator as an Aid in Modified Box Plot Construction
C. Good Modified Boxplots, Bad Modified Boxplots, and Unmodified Boxplots
D. Comparing Boxplots


A. Modified Boxplot Construction

The following steps can be used to construct a modified box plot.

1. Put the data values in order.

2. Find the median, i.e. the middle data value when the scores are put in order. If there is an even number of data values, the median is the average of the two middle values.

3. Find the median of the data values below the median. This is the first quartile, Q1.

4. Find the median of the data values above the median. This is the third quartile, Q3.

5. Find the minimum, or smallest, data value, and the maximum, or largest, data value.

6. Find Q3 minus Q1. This is the interquartile range, denoted IQR.

7. Multiply the IQR by 1.5. This is the maximum whisker length, denoted MWL.

8. Subtract the MWL from Q1. This is the Lower Fence. Reasonable data values should be at or above the Lower Fence.

9. Add the MWL to Q3. This is the Upper Fence. Reasonable data values should be at or below the Upper Fence.

10. Mark any data values below the Lower Fence or above the Upper Fence as possible outliers.

11. If the minimum is a possible outlier, replace it by the smallest data value that is not a possible outlier. Call this the new minimum.

12. If the maximum is a possible outlier, replace it by the largest data value that is not a possible outlier. Call this the new maximum.

13. Draw a number line that extends from the original minimum data value to the original maximum data value.

14. Mark the new minimum, Q1, the median, Q3, and the new maximum as short vertical line segments of the same length above their corresponding positions on the number line. Use the minimum if there is no new minimum. Use the maximum if there is no new maximum.

15. Connect the segments for Q1, the median, and Q3 with horizontal lines through their top points and their bottom points.

16. Draw a line from the middle of the segment for Q1 to the middle of the segment for the new minimum (if you have one) or otherwise to the middle of the segment for the minimum.

17. Draw a line from the middle of the segment for Q3 to the middle of the segment for the new maximum (if you have one) or otherwise to the middle of the segment for the maximum.

18. Mark the location of all your possible outliers with asterisks (*).

The result is a modified box plot of the data set.
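Steps 1 through 12 are pure arithmetic, so they can be checked by computer. The following is a minimal Python sketch. It assumes the "median of each half" quartile rule from Steps 3 and 4, in which the median itself belongs to neither half when the number of values is odd. The function name `modified_boxplot_numbers` and its dictionary keys are invented for illustration. The function needs at least two data values.

```python
from statistics import median


def modified_boxplot_numbers(values):
    """Compute the numbers needed for a modified boxplot (Steps 1-12).

    Hypothetical helper. Quartiles use the median-of-each-half rule:
    when the count is odd, the median itself is left out of both halves.
    """
    data = sorted(values)                      # Step 1
    n = len(data)
    med = median(data)                         # Step 2
    q1 = median(data[: n // 2])                # Step 3: lower half
    q3 = median(data[(n + 1) // 2 :])          # Step 4: upper half
    iqr = q3 - q1                              # Step 6
    mwl = 1.5 * iqr                            # Step 7
    lower_fence = q1 - mwl                     # Step 8
    upper_fence = q3 + mwl                     # Step 9
    # Step 10: values outside the fences are possible outliers
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Steps 11-12: whisker tips are the extreme values that are not outliers
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0], "max": data[-1],       # Step 5
        "q1": q1, "median": med, "q3": q3,
        "iqr": iqr, "mwl": mwl,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Tornado data from the example that follows
result = modified_boxplot_numbers([79, 47, 55, 83, 145, 44, 61, 18, 78, 62])
print(result["q1"], result["median"], result["q3"])          # 47 61.5 79
print(result["lower_fence"], result["upper_fence"])          # -1.0 127.0
print(result["whisker_low"], result["whisker_high"], result["outliers"])  # 18 83 [145]
```

The whisker tips are taken from the values that lie inside the fences. This matches Steps 11 and 12: if no value is an outlier, the whisker tips are simply the minimum and maximum.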

Example: We will use the following data, representing the number of tornadoes per year in Oklahoma from 1995 through 2004 (Sullivan, 2nd edition, p. 167), to construct a modified box plot.

79 47 55 83 145 44 61 18 78 62

Step 1: The data is put in order from smallest to largest.

18 44 47 55 61 62 78 79 83 145

Step 2: There are ten data values, so the median is the average of the two middle values: (61 + 62)/2 = 61.5.

Step 3: Data values below 61.5 are: 18, 44, 47, 55, 61. Their median is at the middle: Q1 = 47.

Step 4: Data values above 61.5 are: 62, 78, 79, 83, 145. Their median is at the middle: Q3 = 79.

Step 5: The minimum is 18 and the maximum is 145.

Step 6: Now find the interquartile range: IQR = Q3 - Q1 = 79 - 47 = 32.

Step 7: Next we find the maximum whisker length: MWL = IQR x 1.5 = 32 x 1.5 = 48.

Step 8: The lower fence for reasonable data is Q1 - MWL = 47 - 48 = -1.

Step 9: The upper fence for reasonable data is Q3 + MWL = 79 + 48 = 127.

Step 10: Nothing is below the lower fence, but 145 is above the upper fence. Thus 145 is a possible outlier.

Step 11: Since there are no data values below the lower fence, we leave the minimum unchanged.

Step 12: The original maximum was a possible outlier, so we use the maximum of the remaining data, 83, as the new maximum.

Step 13: Draw a number line with a uniform scale that extends at least from the original minimum to the original maximum, but not much farther.

Step 14: Mark the locations of the following five values with vertical line segments all having the same length: the minimum, the first quartile, the median, the third quartile, and the new maximum.

Step 15: Connect the tops of the line segments for the median and the other quartiles, and then connect the bottoms of the same line segments to make the box.

Steps 16 and 17: Draw a line from the first quartile to the minimum and another from the third quartile to the new maximum to make the whiskers.

Step 18: Mark the location of the possible outlier at 145 with an asterisk.

Your modified boxplot is finished.
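Steps 13 through 18 are about drawing, but a rough text version can help you check where each piece belongs. The Python sketch below is hypothetical: the function name, its parameters, and the 61-character width are illustrative choices. It uses the values from this example. Each value is placed on a uniform character scale from the original minimum to the original maximum. The edges of the box are drawn with "-" and "+", the vertical segments with "|", and the possible outlier with "*".

```python
def draw_modified_boxplot(whisker_low, q1, med, q3, whisker_high, outliers,
                          axis_min, axis_max, width=61):
    """Return a text sketch of a modified boxplot (Steps 13-18)."""

    def col(x):
        # Step 13: map a value onto a uniform scale from axis_min to axis_max
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    top = [" "] * width
    mid = [" "] * width
    # Step 15: the top and bottom edges of the box run from Q1 to Q3
    for c in range(col(q1), col(q3) + 1):
        top[c] = "-"
    # Steps 16-17: whiskers join each quartile to its whisker tip
    for c in range(col(whisker_low), col(q1)):
        mid[c] = "-"
    for c in range(col(q3) + 1, col(whisker_high) + 1):
        mid[c] = "-"
    # Step 14: vertical segments at Q1, the median, Q3 and both whisker tips
    for x in (q1, med, q3):
        top[col(x)] = "+"
        mid[col(x)] = "|"
    mid[col(whisker_low)] = "|"
    mid[col(whisker_high)] = "|"
    # Step 18: an asterisk for each possible outlier
    for x in outliers:
        mid[col(x)] = "*"
    edge = "".join(top).rstrip()
    labels = str(axis_min).ljust(width - len(str(axis_max))) + str(axis_max)
    return "\n".join([edge, "".join(mid).rstrip(), edge, labels])


# Tornado example: minimum 18, Q1 47, median 61.5, Q3 79,
# new maximum 83, possible outlier 145
print(draw_modified_boxplot(18, 47, 61.5, 79, 83, [145], axis_min=18, axis_max=145))
```

In a text sketch like this, the segments at the whisker tips take up only one row. On paper, all five segments should have the same length, as in Step 14.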

B. Using TI 83/84 Calculator as an Aid in Modified Box Plot Construction

To demonstrate this procedure, we will use the same data from the example in Part A.

79 47 55 83 145 44 61 18 78 62

Press "STAT" and "ENTER", and enter the numbers shown above under L1. For information on how to do this, go to the Data Entry webpage under TI 83/84 Statistics.

If you have not created a statistics plot with the calculator recently, press "Y=" and clear everything to the right of "Y1 =", "Y2 =", "Y3 =", etc. If you do not clear the right-hand sides of these equations, your box plot may not appear at all, or spurious lines and curves may cut through it. To clear the right side of an equation, put the cursor on the right side and press the "CLEAR" button. Use the down-arrow button to make sure there is nothing next to "Y8 =", "Y9 =" or "Y0 =".

Next press the "2ND", "Y=" and "ENTER" buttons. This will bring up the plots screen with "Plot1" highlighted. To make a selection on this screen, highlight the option you want and press the "ENTER" button.

Make sure the plot is "On" as opposed to "Off". Next to "Type", choose the icon in the lower left corner which looks like a box plot without a middle bar and has two dots off to the right. DO NOT choose the icon that looks like an unmodified box plot.

Next to "Xlist", L1 should be entered, and next to "Freq", 1 should be entered. If you had entered a frequency distribution with the numbers in L1 and the frequencies in L2, then you would put L2 next to "Freq".

"Mark" is the symbol used for outliers. The default setting is fine, though any of the symbols could be used.

When your settings on the plot screen are correct, press the "ZOOM" button and choose the "ZoomStat" option. You will have to use the down-arrow button to find this option. Then press "ENTER". A modified boxplot should appear on the screen.

To see where the whisker tips, the box ends, the median and possible outliers are located, press the "TRACE" button. By using the left and right arrow buttons, you can move the cursor from the left whisker tip through the box to the right whisker tip and out to the possible outliers, and the values of these points will be shown at the bottom of the screen.

Moving the cursor from left to right through the modified box plot created from the list of numbers given above, the calculator gives the values as minX = 18, Q1 = 47, Med = 61.5, Q3 = 79, X = 83, maxX = 145. The plain X in this case is the new maximum, since it is at the right whisker tip. The other numbers are the five-number summary of the original data set.

To create the modified box plot on your own paper, set up a uniform number scale extending from the minimum, 18, to the maximum, 145, but not much farther. Put vertical lines of the same length above each of the quartiles and the minimum and new maximum. Connect the middle three lines across the top and bottom to make the box. Draw a horizontal line connecting the minimum to the first quartile, and another horizontal line connecting the new maximum to the third quartile. Finally, mark the possible outlier at 145 with an asterisk.
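The calculator's quartiles (Q1 = 47, Q3 = 79) agree with the hand method of Part A. However, other software may use a different quartile convention. As a hedged illustration, Python's standard-library `statistics.quantiles` interpolates between data values. For the same ten numbers, it gives slightly different quartiles with either of its two methods, "exclusive" and "inclusive":

```python
from statistics import median, quantiles

tornadoes = [79, 47, 55, 83, 145, 44, 61, 18, 78, 62]
data = sorted(tornadoes)
n = len(data)

# Median-of-each-half rule from Part A (matches the calculator's trace values)
hand = (median(data[: n // 2]), median(data), median(data[(n + 1) // 2 :]))

# Python's two interpolating quartile methods
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print(hand)       # (47, 61.5, 79)
print(exclusive)  # [46.25, 61.5, 80.0]
print(inclusive)  # [49.0, 61.5, 78.75]
```

All three conventions give the same median, but Q1 and Q3 differ, so the fences differ slightly too. If a program's box ends do not match your hand calculations, check which quartile convention it uses before assuming an arithmetic error.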

C. Good Modified Boxplots, Bad Modified Boxplots, and Unmodified Boxplots

Figure 1: GOOD Modified Boxplots

This is a GOOD graph. The number scale is uniform and extends from the minimum of the data values to the maximum of the data values. The top boxplot is clearly identified as being for the female actors and the bottom boxplot is clearly identified as being for the male actors. Outliers are marked for each of the two groups of actors.

Figure 2: BAD Modified Boxplots

This is a BAD graph because the data source for each boxplot is coded, making it harder for the reader to understand what each boxplot represents. Notice that Figure 1 provided the same information in less space without using coding.

Figure 3: BAD Modified Boxplots

This is a BAD graph because the scale extends too far in both directions. Because the scale covers so much more ground, the boxplots must be smaller than they are in Figure 1. The youngest actor is older than 20 and the oldest actor is no more than 80 years old, so there is no reason for the scale to extend below 20 or above 80. Figure 1 provides the same information in the same space, with much larger boxplots.

Figure 4: UNMODIFIED Boxplots

These are NOT modified boxplots because the left whiskers are too long. If a whisker is more than 1.5 times as long as the box, then the whisker contains one or more possible outliers. Notice that in Figure 1, each boxplot has one or more outliers identified and the whiskers are no more than 1.5 times as long as the box. The graphs in Figure 4 would be called unmodified boxplots. I will NEVER ask you to construct unmodified boxplots in this course.
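The whisker-length rule in this paragraph can be written as a short check. In the minimal Python sketch below, the function name `whiskers_within_limit` is hypothetical, and the arguments are the plotted whisker tips and quartiles. A plot that fails the check has a whisker containing a possible outlier, so it cannot be a modified boxplot. The example uses the tornado data from Part A.

```python
def whiskers_within_limit(whisker_low, q1, q3, whisker_high):
    """Check that neither whisker is longer than 1.5 times the box length."""
    max_whisker = 1.5 * (q3 - q1)   # the maximum whisker length, MWL
    left = q1 - whisker_low
    right = whisker_high - q3
    return left <= max_whisker and right <= max_whisker


# Tornado data drawn as a modified boxplot (right whisker ends at 83)
print(whiskers_within_limit(18, 47, 79, 83))   # True
# Same data drawn as an unmodified boxplot (right whisker runs to 145)
print(whiskers_within_limit(18, 47, 79, 145))  # False
```

Passing the check does not by itself prove that a plot is modified. If a data set has no possible outliers, its modified and unmodified boxplots are identical.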


D. Comparing Boxplots

Two different boxplots plotted above the same axis can be readily compared visually. A statistician should be able to answer the following questions about such a pair of boxplots.

  • Which plot is more skewed? (Which whiskers are more lopsided? If the whiskers are about the same, which vertical line in the boxes is more off center?)
  • Which plot has the biggest value? (Which right whisker tip or outlier is farther right?)
  • Which plot has the smallest value? (Which left whisker tip or outlier is farther left?)
  • Which plot has the largest median? (Which vertical line in the boxes is farther right?)
  • Which plot has the largest range? (Which has the greatest distance from its leftmost point to its rightmost point, counting outliers?)
  • Which plot has the largest interquartile range? (Which box is longer?)

Note that if you cannot decide which box plot meets a criterion, a good answer is "They are about the same." Because the questions are not numbered, you can answer them in any order, but each answer should give enough information to let readers know which question it answers.
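The six questions can also be answered from numbers read off the two plots. The Python sketch below is one possible translation, not a standard procedure, and it rests on these assumptions:

- The `Summary` fields and the function names are hypothetical.
- "Biggest value", "smallest value" and "range" count possible outliers as data values.
- "Lopsided" is measured as the difference between the two whisker lengths, and "off center" as the difference between the two halves of the box.
- `tol` is an arbitrary cutoff, in data units, below which two numbers count as "about the same".

```python
from typing import NamedTuple

SAME = "They are about the same."


class Summary(NamedTuple):
    """Values read off one modified boxplot (hypothetical structure)."""
    lowest: float        # leftmost point: left whisker tip or leftmost outlier
    whisker_low: float   # left whisker tip
    q1: float
    median: float
    q3: float
    whisker_high: float  # right whisker tip
    highest: float       # rightmost point: right whisker tip or rightmost outlier


def larger(a_val, b_val, names, tol):
    """Name the plot whose value is larger, or report a near tie."""
    if abs(a_val - b_val) <= tol:
        return SAME
    return names[0] if a_val > b_val else names[1]


def compare(a, b, names=("Plot 1", "Plot 2"), tol=0.0):
    """Answer the six comparison questions for two boxplots on one axis."""

    def lopsided(s):
        # Difference between the two whisker lengths
        return abs((s.whisker_high - s.q3) - (s.q1 - s.whisker_low))

    def off_center(s):
        # Difference between the two halves of the box; 0 if the median is centered
        return abs((s.q3 - s.median) - (s.median - s.q1))

    skew = larger(lopsided(a), lopsided(b), names, tol)
    if skew == SAME:
        # Whiskers about equally lopsided: fall back to the median line
        skew = larger(off_center(a), off_center(b), names, tol)
    return {
        "more skewed": skew,
        "biggest value": larger(a.highest, b.highest, names, tol),
        # Negate so that the farther-left (smaller) value wins
        "smallest value": larger(-a.lowest, -b.lowest, names, tol),
        "largest median": larger(a.median, b.median, names, tol),
        "largest range": larger(a.highest - a.lowest,
                                b.highest - b.lowest, names, tol),
        "largest IQR": larger(a.q3 - a.q1, b.q3 - b.q1, names, tol),
    }
```

For example, the tornado boxplot from Part A would be entered as `Summary(18, 18, 47, 61.5, 79, 83, 145)`. Calling `compare` with that summary and a second one returns a dictionary with one answer per question. Choosing `tol` is still a judgment call, just as deciding "about the same" is when reading the plots by eye.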

Exercise: Answer the six questions above for each of the examples shown below.