Statistics Fundamentals Succinctly®
by Katie Kormanik

CHAPTER 8

Tabulated Data


Test for significance with tabulated data

You’ve learned how to determine whether or not two or more samples comprised of continuous data are significantly different. We can also perform hypothesis tests for tabulated data (a tally of subjects that fits into various categories) to determine if proportions significantly differ and whether or not the number of values in subsets of data significantly differs from the expected number of values. The former involves a z-test; the latter involves a chi-square test.

Difference between proportions

The z-test is similar to the test for proportions you learned in Chapter 5. However, this time we’re comparing the proportions from two samples rather than comparing one proportion to an expected proportion. The null and alternative hypotheses are:

H0: p1 = p2
Ha: p1 ≠ p2

For example, let’s say you want to know if divorce is more likely to occur among urban or suburban professionals. You send out surveys to randomly selected professionals aged 30-50 in major cities and various suburbs across the U.S. asking if they’ve ever been divorced (response is “yes” or “no”). You get back 1032 responses from urban professionals and 865 responses from suburban professionals. The results are tabulated in Table 11.

Table 11: Survey results for use in a z-test.

Have you ever been divorced?

            Yes     No
Urban       187    845
Suburban     62    803

You can use a z-test to determine if these results are significant and if one group has higher divorce rates than the other.

As we learned in Chapter 7, we need to calculate a z-score by finding the difference between the two proportions (the proportion of urban professionals who responded one way and the proportion of suburban professionals who also responded that way) and dividing this difference by the standard error.

Let’s analyze the proportion that responded “yes.” For urban professionals, p1 = 187/1032 = 0.18. For suburban professionals, the proportion that responded “yes” is p2 = 62/865 = 0.07. So we’ll look at the difference between 0.18 and 0.07 and divide this difference by the standard error.

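In this case, the standard error changes because we need to account for the two sample proportions as well as the two sample sizes. To do this, we calculate a pooled sample proportion, p̂:

p̂ = (p1·n1 + p2·n2) / (n1 + n2)

We then use p̂ to calculate the standard error, SE:

SE = √[ p̂(1 − p̂)(1/n1 + 1/n2) ]

Now, we can calculate our z-statistic:

z = (p1 − p2) / SE

Let’s perform this z-test for proportions with our example, starting with calculating the pooled sample proportion. Because p1·n1 and p2·n2 are simply the numbers of “yes” responses in each group, we have:

p̂ = (187 + 62) / (1032 + 865) = 249/1897 ≈ 0.131

We can now calculate the standard error.

SE = √[ 0.131 × 0.869 × (1/1032 + 1/865) ] ≈ 0.0156

Finally, we can calculate the z-score.

z = (0.1812 − 0.0717) / 0.0156 ≈ 7.0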

Because this z-score is far greater than 1.96, the z-critical value for a two-tailed test at α = 0.05, we’ll reject the null and can conclude that the two proportions are significantly different—urban professionals are more likely to get divorced.
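The arithmetic for this test can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original text; the function name two_proportion_z_test is a hypothetical helper introduced here for illustration, and the two-sided p-value is taken from the standard normal distribution via math.erfc.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 = p2.

    x1, x2 are the "yes" counts and n1, n2 the sample sizes.
    Hypothetical helper written for this example.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion: all "yes" responses over all responses.
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under the null hypothesis.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Urban: 187 of 1032 said "yes"; suburban: 62 of 865 said "yes".
z, p_value = two_proportion_z_test(187, 1032, 62, 865)
print(f"z = {z:.2f}")
print("Reject H0 at alpha = 0.05:", abs(z) > 1.96)
```

Carrying unrounded values through the calculation gives a z-score of roughly 7.04, slightly different from a hand calculation that rounds the proportions first.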

Chi-square test

We can analyze this same tabulated data with a chi-square test. It differs from the z-test in that it compares the observed frequencies of occurrence to the frequencies we would expect if the two factors were independent, i.e., if we couldn’t predict the level of one factor by knowing the other.

H0: The two factors are independent.
Ha: The two factors are not independent (we can predict the frequency of one factor by knowing that of the other). 

Table 12: Survey results for use in a chi-square test.

Have you ever been divorced?

            Yes      No    Total
Urban       187     845     1032
Suburban     62     803      865
Total       249    1648     1897

If the two factors are independent, we would expect the proportion of urbanites who have been divorced to equal the overall proportion of respondents who have been divorced (249/1897 ≈ 0.13). So, if about 13% of all respondents have been divorced and we have 1032 urbanite responses, we would expect that 0.13 × 1032 = 134.16 urbanites have been divorced (using the unrounded proportion gives 135.46). Looking at our data, we see a higher number of divorces (187) than this expected value. We want to determine if this difference is significant. Let’s first go a little more in-depth with how we find the expected values.

To simplify the procedure for finding expected values, for each cell we multiply its row total by its column total and divide by the grand total (1,897). For example, the expected number of divorced urbanites is 1032 × 249 / 1897 ≈ 135.46.

Table 13: Expected values (in parentheses) are found by multiplying each cell’s row total by its column total and dividing by the grand total.

Have you ever been divorced?

            Yes             No              Total
Urban       187 (135.46)    845 (896.54)     1032
Suburban     62 (113.54)    803 (751.46)      865
Total       249            1648              1897
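The expected values can be reproduced directly from the marginal totals. The following is a small Python sketch, not part of the original text; the variable names (observed, row_totals, col_totals, expected) are chosen here for illustration.

```python
# Observed counts from the survey, keyed by location and response.
observed = {
    "Urban": {"Yes": 187, "No": 845},
    "Suburban": {"Yes": 62, "No": 803},
}

# Marginal totals for each row (location) and column (response).
row_totals = {loc: sum(counts.values()) for loc, counts in observed.items()}
col_totals = {}
for counts in observed.values():
    for answer, count in counts.items():
        col_totals[answer] = col_totals.get(answer, 0) + count
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    loc: {ans: row_totals[loc] * col_totals[ans] / grand_total for ans in col_totals}
    for loc in observed
}

for loc, counts in expected.items():
    for ans, value in counts.items():
        print(f"{loc:9s} {ans:3s} expected = {value:.2f}")
```

Each row and column of expected values sums to the same marginal total as the observed data, which is a quick way to check the calculation.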

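After calculating expected values, we compute a chi-square (χ²) statistic:

χ² = Σ (fO − fE)² / fE

where fO is the observed value and fE is the expected value, and the sum runs over every cell in the table. Here is our example:

χ² = (187 − 135.46)²/135.46 + (845 − 896.54)²/896.54 + (62 − 113.54)²/113.54 + (803 − 751.46)²/751.46 ≈ 19.61 + 2.96 + 23.40 + 3.53 ≈ 49.50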

Again, we use another table to determine if our results (i.e., the difference between our observed and expected values) are significant. Degrees of freedom are equal to (n-1)(m-1), where n is the number of categories for Factor 1 and m is the number of categories for Factor 2. In this case, there are two categories for location (urban and suburban) and two categories for divorce (yes and no). Therefore, df = (2-1)(2-1) = 1. The chi-square table tells us that for df = 1 and α = 0.05, the critical χ² value is 3.84. Because our computed χ² statistic is greater than the critical value, we reject the null hypothesis and conclude that location (urban vs. suburban) and whether or not someone has been divorced are not independent of one another.
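The whole chi-square procedure can be sketched in Python as a check on the hand calculation. This is an illustrative sketch rather than the book’s own code: chi_square_independence is a hypothetical helper, the critical value 3.84 is the one quoted above for df = 1 and α = 0.05, and no continuity correction is applied (some statistical packages apply one by default for 2×2 tables, so their output may differ slightly).

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows; hypothetical helper written for this example.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            # Expected count if the two factors are independent.
            f_e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (f_o - f_e) ** 2 / f_e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Rows: urban, suburban. Columns: yes, no.
observed = [[187, 845], [62, 803]]
chi2, df = chi_square_independence(observed)

CRITICAL_VALUE = 3.84  # chi-square critical value for df = 1, alpha = 0.05
reject_null = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0 (factors are independent):", reject_null)
```

For a 2×2 table like this one, the uncorrected chi-square statistic equals the square of the z-score from the two-proportion z-test, so the two tests lead to the same conclusion.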
