Chi-Square Goodness of Fit Test
Brief Instructions
Enter the observed values under L1. Divide the sample size by the number of possible outcomes. The result is the expected value.
Put the cursor on L2 in the data entry window. Press "(" (left parenthesis), 2ND, "1" and "-" (minus sign). Enter the expected value. Then press ")" (right parenthesis), "x²", "÷", and enter the expected value again. Pressing ENTER will fill in L2.
Now press STAT and choose CALC. Press ENTER, 2ND, "2"
and ENTER. The test statistic is next to Σx = .
Subtract one from the number of possible outcomes to obtain the
degrees of freedom. Next press 2ND and VARS, choose χ²cdf(
and press ENTER. Enter the test statistic, 1E99, and the number of possible
outcomes minus one. Put commas between these three numbers and a right
parenthesis after the last number. Pressing ENTER gives the P-value.
Detailed Instructions
This test is
used to decide whether the outcomes of an event occur in the expected
proportions. Here we will always test whether all outcomes are equally
likely. To demonstrate the procedure, we will test at 98% confidence whether
pedestrian deaths are equally likely on each day of the week, using 300
randomly selected pedestrian deaths (Sullivan, Fundamentals of
Statistics, 2nd ed., Pearson Education, Inc., 2008, p. 561). The
data are shown below.
Day | Sun. | Mon. | Tues. | Wed. | Thurs. | Fri. | Sat. |
Frequency | 39 | 40 | 30 | 40 | 41 | 49 | 61 |
Assuming the
days of the week are equally likely, you would expect 300/7 ≈ 42.9 deaths
on average for each of the days. The test statistic for this test is given by
the formula
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (1) \]
where k is the number of possible outcomes, O_i
is the observed frequency, and E_i
is the expected frequency. In this situation, k = 7 since there are 7 days of the week, the O_i are given in the table shown above, and since we are
assuming each day is equally likely, E_i
≈ 42.9 for each i. To aid in
computing, we will rename the second row and add a third row to the table
above.
Day | Sun. | Mon. | Tues. | Wed. | Thurs. | Fri. | Sat. |
O_i | 39 | 40 | 30 | 40 | 41 | 49 | 61 |
E_i | 42.9 | 42.9 | 42.9 | 42.9 | 42.9 | 42.9 | 42.9 |
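As a cross-check on the expected frequency, the division can be sketched in a few lines of Python. This is only an illustration outside the calculator workflow; the variable names are made up for this example.

```python
# Observed pedestrian deaths, Sunday through Saturday (from the table above).
observed = [39, 40, 30, 40, 41, 49, 61]

n = sum(observed)   # sample size: 300
k = len(observed)   # number of possible outcomes: 7

# Under the "equally likely" hypothesis every day has the same expected count.
expected = n / k                        # 42.857142...
expected_rounded = round(expected, 1)   # 42.9, the value used in this example

print(n, k, expected_rounded)
```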
To save time, we
will use the "list" operations of the calculator to do the
necessary calculations. Press "STAT" and "ENTER". Clear
lists L1 and L2. Enter the O_i
under L1. When you are finished, the data entry screen should look like the
following.
L1 | L2 |
39 | |
40 | |
30 | |
40 | |
41 | |
49 | |
61 | |
Next put the cursor on L2 in the data entry window, and press "(" (left parenthesis), "2ND", "1", "-" (minus sign), "4", "2", "." (decimal point), "9", ")" (right parenthesis), "x²", "÷", "4", "2", "." (decimal point), "9". At this point, the formula at the bottom of the data entry window should look like:
L2 = (L1 - 42.9)² / 42.9
Now if you press "ENTER", the L2 list will be filled as shown below.
L1 | L2 |
39 | .35455 |
40 | .19604 |
30 | 3.879 |
40 | .19604 |
41 | .08415 |
49 | .86737 |
61 | 7.6366 |
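The list operation on L2 can be mirrored off the calculator. The following Python sketch is an illustration only (the names are hypothetical); it applies the same (L1 - 42.9)² / 42.9 rule to each observed value and uses the rounded expected value 42.9, as in the example.

```python
# Column L1: observed frequencies, Sunday through Saturday.
l1 = [39, 40, 30, 40, 41, 49, 61]

# Rounded expected frequency used in this example (300/7 is about 42.9).
expected = 42.9

# Column L2: one (O - E)^2 / E contribution per day.
l2 = [(o - expected) ** 2 / expected for o in l1]

for o, contribution in zip(l1, l2):
    print(o, round(contribution, 5))
```

The printed contributions should agree with the L2 column to the number of digits the calculator displays.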
The numbers under L2 are the numbers to be summed in the right
hand side of formula (1) above. To find the sum, press "STAT" and
choose "CALC". Then press "ENTER", "2ND",
"2" and "ENTER". The test statistic is found to the right
of Σx = , and in this case is about 13.21.
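The sum can also be checked with a short Python sketch. The helper name `chi_square_statistic` is invented for this illustration. It computes the statistic twice: once with the rounded expected value 42.9 used in this example, and once with the unrounded 300/7, to show how much the rounding affects the result.

```python
observed = [39, 40, 30, 40, 41, 49, 61]

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E when every outcome has the same expected count."""
    return sum((o - expected) ** 2 / expected for o in observed)

# Rounded expected value, as entered on the calculator in this example.
stat_rounded = chi_square_statistic(observed, 42.9)

# Unrounded expected value 300/7, for comparison.
stat_exact = chi_square_statistic(observed, sum(observed) / len(observed))

print(round(stat_rounded, 2), round(stat_exact, 2))  # 13.21 13.23
```

The two values differ only in the second decimal place, so the rounding of the expected value does not change the conclusion of this example.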
The final step is to find the P-value for this test. To do this
press "2ND" and "VARS", choose "χ²cdf(", and press "ENTER". You should see
"χ²cdf(" on your screen. The P-value in this case is the
probability of obtaining a test statistic at least as big as 13.21 assuming
that the null hypothesis holds. So we will enter the test statistic as the
lower bound, "1E99" as the upper bound, and then the degrees of
freedom with commas between the three numbers and a right parenthesis on the
end.
For this
example, there are 7 possible outcomes, one for each day of the week. The
degrees of freedom is thus 7 - 1 = 6, and we would fill in the χ²cdf(
command as follows.
χ²cdf(13.21, 1E99, 6)
Now pressing the
"ENTER" button will produce the P-value. In this case, the P-value is about
0.0398. Since we are testing at 98% confidence, the significance level is 1 - 0.98
= 0.02. The P-value is greater than the significance level, so we
fail to reject the null hypothesis. Hence, the evidence is not strong enough at the 0.02 significance level to conclude that the days of the week are not all equally likely.
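The P-value and the decision can be reproduced without the calculator. The sketch below is not how the calculator computes χ²cdf(; it uses a closed form for the upper-tail chi-square probability that holds only when the degrees of freedom are even, which happens to be the case here (6). The function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable; closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)^j / j!
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

statistic = 13.21   # test statistic from this example
df = 7 - 1          # number of possible outcomes minus one
alpha = 1 - 0.98    # significance level for 98% confidence

p_value = chi2_upper_tail_even_df(statistic, df)
reject_null = p_value < alpha

print(round(p_value, 4), reject_null)  # 0.0398 False
```

For odd degrees of freedom this shortcut does not apply, and a statistics library or the calculator's own function would be the safer choice.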