Contingency Tables

Contents

Objective
Contingency Tables
Chi-Square Statistic
Chi Square Test of Independence
Example-Undergraduate Students
Chi Square Test of Homogeneity
Example-No Tax Increase

^ Objective

Upon completion of this module you should be able to perform two chi-square tests, the chi-square test of independence and the chi-square test of homogeneity, through the use of contingency tables.

^ Contingency Tables

Contingency tables are two-way frequency tables used to organize data that has been collected on two categorical random variables. We are interested in discovering whether the two variables are dependent upon one another.

There are two types of contingency tables that are most often used. The first type organizes the data we collect or observe. We call this the table of observed counts.

Observed Counts

                          Variable 1
                          Level a       Level b       Total
Variable 2    Level a     o11           o12           Row 1 total
              Level b     o21           o22           Row 2 total
              Total       Col 1 total   Col 2 total   Grand total (n = sample size)

The second type, the table of expected counts, organizes our expected counts in the same way as the table of observed counts. The expected counts are the values that we would expect to obtain for each cell under the assumption that the categorical variables are independent.

 

Expected Counts

                          Variable 1
                          Level a       Level b       Total
Variable 2    Level a     e11           e12           Row 1 total
              Level b     e21           e22           Row 2 total
              Total       Col 1 total   Col 2 total   Grand total (n = sample size)

Each expected count is computed from the table of observed counts by multiplying the corresponding row total by the corresponding column total and then dividing by the grand total, that is, the total number of observations: eij = (row i total × column j total) / n. Here eij is the expected frequency for the cell located in row i and column j.
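The expected-count rule above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` and the 2x2 counts are hypothetical and do not come from the module.

```python
def expected_counts(observed):
    """Expected counts under independence:
    e_ij = (row i total * column j total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total = sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical table of observed counts (illustrative values only).
observed = [[30, 20],
            [10, 40]]

expected = expected_counts(observed)
for row in expected:
    print(row)
# [20.0, 30.0]
# [20.0, 30.0]
```

Note that the expected table has the same row and column totals as the observed table, which is a quick way to check the arithmetic.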

Notice in the tables above that the contingency tables should always have row and column labels along with the type of table, either observed or expected.

^ Chi-Square Statistic

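The chi-square statistic measures the amount of disagreement between the observed frequencies and the expected frequencies. It is the sum, over all cells of the table, of the squared difference between the observed and expected counts divided by the expected count: χ² = Σ (oij - eij)² / eij. Large values indicate that the observed counts depart substantially from the counts expected under independence.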

Degrees of freedom = df = (r - 1)*(c - 1) where r = number of rows and c = number of columns in the contingency table
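As a rough sketch, the statistic and its degrees of freedom can be computed together. The code assumes the usual Pearson form of the statistic, the sum over all cells of (observed - expected)² / expected. The function name and the counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Return (statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count e_ij
            statistic += (o - e) ** 2 / e

    # df = (r - 1) * (c - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical 2x2 table (illustrative values only).
statistic, df = chi_square_statistic([[30, 20],
                                      [10, 40]])
print(round(statistic, 3), df)  # 16.667 1
```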

Critical value: χ²(α, df), where α = the chosen significance level and df = the degrees of freedom. The null hypothesis is rejected when the computed statistic exceeds this critical value.
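Comparing the statistic with the critical value is equivalent to comparing the upper-tail probability (the p-value) with α. The sketch below approximates that tail probability using only the standard library, through a series expansion of the regularized incomplete gamma function. The helper names `chi2_sf` and `reject_null` are hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees
    of freedom, via the series for the lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def reject_null(statistic, df, alpha=0.05):
    """Reject H0 when the p-value is below alpha, which is the same
    decision as 'statistic exceeds the critical value'."""
    return chi2_sf(statistic, df) < alpha


# 3.841 is the tabled critical value for alpha = 0.05 and df = 1,
# so its tail probability should be close to 0.05.
print(round(chi2_sf(3.841, 1), 3))  # 0.05
print(reject_null(16.667, 1))       # True
```

The series is adequate for the moderate statistics and degrees of freedom typical of small contingency tables. It loses accuracy for very large statistics, where the p-value is extremely small.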

^ Chi Square Test of Independence

^ Example-Undergraduate Students

A random sample of 400 undergraduate college students was classified according to GPA classification and study habits (Good, Average, Bad). An A corresponds to a GPA of 3.5 or greater, a B is a GPA between 2.5 and 3.5, a C is a GPA between 1.5 and 2.5, and a D or F corresponds to a GPA of 1.5 or less. From the data in the contingency table below, test to see whether the two classifications are independent.

In this applet, the observed counts in the ovals are determined from the problem description and cannot be changed. Select the Expect radio button in the bottom panel to display the expected values. Click in the rectangular regions and fill in the missing counts. This is like a crossword puzzle in which the row and column sums must equal the marginal totals. Clicking the Calc button computes the value of the test statistic and its associated p-value. The hypotheses to be tested and the results of the test are also displayed.

H0: GPA and Study Habits are independent
H1: GPA and Study Habits are dependent
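The fill-in step described for the applet, in which missing counts are recovered from the marginal totals, can be sketched as follows. This is not the applet's own code. The function name, the use of `None` for an empty cell, and the counts are all hypothetical.

```python
def fill_missing(table, row_totals, col_totals):
    """Fill cells marked None wherever a row or column has exactly one
    missing value, repeating until nothing more can be filled."""
    table = [row[:] for row in table]  # work on a copy
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(table):
            missing = [j for j, v in enumerate(row) if v is None]
            if len(missing) == 1:
                known = sum(v for v in row if v is not None)
                row[missing[0]] = row_totals[i] - known
                changed = True
        for j in range(len(col_totals)):
            col = [row[j] for row in table]
            missing = [i for i, v in enumerate(col) if v is None]
            if len(missing) == 1:
                known = sum(v for v in col if v is not None)
                table[missing[0]][j] = col_totals[j] - known
                changed = True
    return table


# Hypothetical partially filled 2x3 table (illustrative values only).
partial = [[20, None, 30],
           [None, 25, None]]
filled = fill_missing(partial, row_totals=[90, 110], col_totals=[70, 65, 65])
print(filled)  # [[20, 40, 30], [50, 25, 35]]
```

Cells that cannot be determined from the totals alone are left as `None`.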

^ Chi Square Test of Homogeneity

Homogeneity means that all units within a group are alike in their characteristics. Here we are interested in whether the populations sampled are the same with respect to the classification levels of a single categorical variable. In this test we have two or more samples, one from each of the populations being compared, and the populations are compared according to the levels of that categorical variable.

^ Example-No Tax Increase

Statistics for Research, 2nd edition, Dowdy and Wearden, pages 122-123

(Used with permission from the authors)

A political scientist is interested in determining whether the promise of no tax increase is equally important to voters of different political affiliations. Using voter registration lists, she chooses a random sample of 100 voters from each of three groups (Democrats, Republicans, and Independents) and asks the subjects to rate the importance of no tax increase on the following scale: very important, somewhat important, not too important, not at all important. The results are listed in the table below.

H0: Members of the three parties agree on the importance of no tax increase (homogeneity)
H1: Members of the three parties do not agree on the importance of no tax increase (lack of homogeneity)
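The computation for a homogeneity test can be sketched as below. The counts are hypothetical placeholders, not the Dowdy and Wearden data, and are chosen only so that each party's row sums to the fixed sample size of 100. The function name is also an assumption. The arithmetic is the same as for the test of independence. Only the sampling design differs, because the row totals are fixed in advance.

```python
def homogeneity_statistic(observed):
    """Return (statistic, df) for samples (rows) classified by one
    categorical variable (columns)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Under homogeneity every population shares the pooled column
            # proportion col_totals[j] / n, so e_ij = row total * proportion.
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# HYPOTHETICAL counts (illustrative only). Columns: very important,
# somewhat important, not too important, not at all important.
ratings = [
    [40, 30, 20, 10],  # Democrats
    [50, 30, 10, 10],  # Republicans
    [30, 30, 30, 10],  # Independents
]
assert all(sum(row) == 100 for row in ratings)  # fixed sample sizes

statistic, df = homogeneity_statistic(ratings)
print(round(statistic, 3), df)  # 15.0 6
```

For these made-up counts the statistic of 15.0 exceeds 12.59, the tabled critical value for α = 0.05 with 6 degrees of freedom, so homogeneity would be rejected. The real example should be worked from the table in the source.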
