Why do we use chi square

You must enter at least one Column variable. Also note that if you specify one row variable and two or more column variables, SPSS will print crosstabs for each pairing of the row variable with the column variables. The same is true if you have one column variable and two or more row variables, or if you have multiple row and column variables.

A chi-square test will be produced for each table. Additionally, if you include a layer variable, chi-square tests will be run for each pair of row and column variables within each level of the layer variable.

C Layer: An optional "stratification" variable. If you have turned on the chi-square test results and have specified a layer variable, SPSS will subset the data with respect to the categories of the layer variable, then run chi-square tests between the row and column variables.

This is not equivalent to testing for a three-way association, or testing for an association between the row and column variable after controlling for the layer variable.

D Statistics: Opens the Crosstabs: Statistics window, which contains fifteen different inferential statistics for comparing categorical variables.

E Cells: Opens the Crosstabs: Cell Display window, which controls which output is displayed in each cell of the crosstab.
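The layer behaviour described above (subset the data by layer category, then test row against column within each subset) can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the group labels, and the helper names `chi_square` and `crosstab` are hypothetical, and the code is not SPSS's implementation.

```python
from collections import Counter, defaultdict


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


def crosstab(pairs):
    """Build a table of counts from (row_value, column_value) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical records (layer, row variable, column variable); not real data.
records = (
    [("Group A", "Male", "Smoker")] * 10
    + [("Group A", "Male", "Nonsmoker")] * 30
    + [("Group A", "Female", "Smoker")] * 20
    + [("Group A", "Female", "Nonsmoker")] * 20
    + [("Group B", "Male", "Smoker")] * 15
    + [("Group B", "Male", "Nonsmoker")] * 15
    + [("Group B", "Female", "Smoker")] * 15
    + [("Group B", "Female", "Nonsmoker")] * 15
)

# Subset the data by layer category.
by_layer = defaultdict(list)
for layer, row_value, col_value in records:
    by_layer[layer].append((row_value, col_value))

# One separate chi-square statistic per layer category.
results = {layer: chi_square(crosstab(pairs)) for layer, pairs in by_layer.items()}

for layer, statistic in sorted(results.items()):
    print(layer, round(statistic, 3))
```

Each layer category gets its own independent test, which is why the result says nothing about a three-way association. The sketch assumes every expected count is nonzero.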

Note: in a crosstab, the cells are the inner sections of the table. They show the number of observations for a given combination of the row and column categories. There are three options in this window that are useful but optional when performing a Chi-Square Test of Independence: Observed counts, Expected counts, and Unstandardized residuals. The Observed option is enabled by default.

F Format: Opens the Crosstabs: Table Format window, which specifies how the rows of the table are sorted.

In the sample dataset, respondents were asked their gender and whether or not they were a cigarette smoker.

There were three answer choices: Nonsmoker, Past smoker, and Current smoker. Before we test for "association", it is helpful to understand what an "association" and a "lack of association" between two categorical variables looks like.

One way to visualize this is using clustered bar charts. Let's look at the clustered bar chart produced by the Crosstabs procedure. This is the chart that is produced if you use Smoking as the row variable and Gender as the column variable (running the syntax later in this example). The "clusters" in a clustered bar chart are determined by the row variable (in this case, the smoking categories). The color of the bars is determined by the column variable (in this case, gender). The height of each bar represents the total number of observations in that particular combination of categories.

This type of chart emphasizes the differences within the categories of the row variable. Notice how within each smoking category, the heights of the bars (i.e., the counts of males and females) are similar. That is, there are approximately equal numbers of male and female nonsmokers, of male and female past smokers, and of male and female current smokers. If there were an association between gender and smoking, we would expect these counts to differ between groups in some way.
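To make the contrast between "association" and "lack of association" concrete, here is a small sketch that computes within-row percentages for two tables. The counts are hypothetical (they are not the sample dataset), and `row_percentages` is a helper name invented for this illustration.

```python
def row_percentages(table):
    """Percentage of each row's total that falls in each column."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Hypothetical counts. Rows: Nonsmoker, Past smoker, Current smoker.
# Columns: Male, Female.
no_association = [[100, 100], [30, 30], [20, 20]]
association = [[100, 60], [30, 30], [20, 60]]

# With no association, the male/female split is the same in every row.
print(row_percentages(no_association))

# With an association, the split changes from one smoking category to the next.
print(row_percentages(association))
```

In the first table every smoking category is split 50/50 by gender, which is what equal bar heights within each cluster look like numerically. In the second table the split shifts across categories, which is the kind of pattern an association would produce.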

The first table is the Case Processing summary, which tells us the number of valid cases used for analysis. Only cases with nonmissing values for both smoking behavior and gender can be used in the test. When the chi-square test is not statistically significant, we do not conclude that the two variables are unrelated. Rather, we conclude that there is not enough evidence to suggest an association between gender and smoking.

Recall that the column percentages of the crosstab appeared to indicate that upperclassmen were less likely than underclassmen to live on campus. The clustered bar chart from the Crosstabs procedure can act as a complement to the column percentages. Let's look at the chart produced by the Crosstabs procedure for this example. The "clusters" are formed by the row variable (in this case, class rank).

This type of chart emphasizes the differences within the underclassmen and upperclassmen groups. Here, the difference between the number of students living on campus and the number living off campus is much starker within the class rank groups.

Only cases with nonmissing values for both class rank and living on campus can be used in the test. The next table is the crosstabulation. If you elected to check off the boxes for Observed Count, Expected Count, and Unstandardized Residuals, you should see all three values in each cell of the table. With the Expected Count values shown, we can confirm that all cells have an expected value greater than 5. This test is also known as the Chi-Square Test of Association.
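The three cell values mentioned above (observed count, expected count, unstandardized residual) and the "expected value greater than 5" check can be sketched as follows. The counts are hypothetical stand-ins, not the values from the sample dataset, and `cell_display` is a name made up for this example.

```python
def cell_display(observed):
    """Return (expected, residuals) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Unstandardized residual: observed minus expected.
    residuals = [
        [obs - exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    return expected, residuals


# Hypothetical counts. Rows: Underclassman, Upperclassman.
# Columns: Off-campus, On-campus.
observed = [[30, 70],
            [60, 40]]

expected, residuals = cell_display(observed)

# Check that every expected count is greater than 5.
all_expected_above_5 = all(exp > 5 for row in expected for exp in row)

print(expected)
print(residuals)
print(all_expected_above_5)
```

The residuals in each row and each column sum to zero, so a positive residual in one cell is always balanced by negative residuals elsewhere in the table.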

There are two limitations to the chi-square test about which you should be aware. First, the chi-square test is very sensitive to sample size.

With a large enough sample, even trivial relationships can appear to be statistically significant. When using the chi-square test, you should keep in mind that "statistically significant" doesn't necessarily mean "meaningful."
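The sensitivity to sample size is easy to demonstrate numerically. The sketch below uses hypothetical counts: the second table has exactly the same proportions as the first, with every cell multiplied by 100. The helper names are invented for this illustration, and the p-value shortcut used is valid only for 1 degree of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )


def p_value_df1(statistic):
    """Upper-tail chi-square probability; this shortcut holds for df = 1 only."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical 2x2 table with a trivial difference (52% vs 48%).
small = [[52, 48], [48, 52]]
# Same proportions, 100 times as many cases.
large = [[cell * 100 for cell in row] for row in small]

chi2_small = chi_square(small)
chi2_large = chi_square(large)

print(chi2_small, p_value_df1(chi2_small))
print(chi2_large, p_value_df1(chi2_large))
```

The proportions are identical in both tables, yet the statistic grows in direct proportion to the sample size, so the same 52/48 split goes from clearly nonsignificant to highly significant.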

Second, the chi-square test can only tell us whether two variables are related to one another. It does not necessarily imply that one variable has any causal effect on the other. In order to establish causality, a more detailed analysis would be required.

Now click on "Statistics" and check the box next to "Chi-Square." After looking at the output, some of you are probably wondering why SPSS provides you with a two-tailed p-value when chi-square is always a one-tailed test.

In all honesty, I don't know the answer to that question. However, all is not lost. Because two-tailed tests are always more conservative than one-tailed tests, a result that is statistically significant with the two-tailed p-value will also be significant with a one-tailed test. If you're highly motivated, you can compare the obtained statistic from your output to the critical statistic found on a chi-square chart.

The Chi-Square Test for Independence

Learning Objectives
Understand the characteristics of the chi-square distribution
Carry out the chi-square test and interpret its results
Understand the limitations of the chi-square test

Key Terms
Chi-Square Distribution: a family of asymmetrical, positively skewed distributions, the exact shape of which is determined by their respective degrees of freedom
Observed Frequencies: the cell frequencies actually observed in a bivariate table
Expected Frequencies: the cell frequencies that one might expect to see in a bivariate table if the two variables were statistically independent

Overview
The primary use of the chi-square test is to examine whether two variables are independent or not.

The Chi-Square Distribution

The chi-square distribution, like the t distribution, is actually a series of distributions, the exact shape of which varies according to their degrees of freedom. The shape of the chi-square distribution changes as the degrees of freedom k increase.

[Figure: chi-square distributions for increasing degrees of freedom k]

The Chi-Square Test

Earlier in the semester, you familiarized yourself with the five steps of hypothesis testing:
1. making assumptions
2. stating the null and research hypotheses and choosing an alpha level
3. selecting a sampling distribution and determining the test statistic that corresponds with the chosen alpha level
4. calculating the test statistic
5. interpreting the results
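As a rough numerical companion to the description of the chi-square distribution above, the sketch below evaluates the standard chi-square density for a few degrees of freedom and locates its peak on a grid. It also checks the commonly tabulated critical value for 1 degree of freedom. The alpha level of 0.05 is an assumption made for this example, and the function names are invented here.

```python
import math


def chi2_pdf(x, k):
    """Chi-square probability density at x > 0 with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


def peak_location(k):
    """Grid point (step 0.1 on (0, 20]) where the density is highest."""
    grid = [i / 10 for i in range(1, 201)]
    return max(grid, key=lambda x: chi2_pdf(x, k))


# The peak moves to the right as the degrees of freedom increase.
for k in (4, 8, 12):
    print(k, peak_location(k))


def p_value_df1(statistic):
    """Upper-tail chi-square probability; this shortcut holds for df = 1 only."""
    return math.erfc(math.sqrt(statistic / 2))


# Assumed alpha of 0.05: the tabulated critical value for df = 1 is 3.841.
critical_value = 3.841
print(round(p_value_df1(critical_value), 3))
```

An obtained statistic larger than the critical value corresponds to an upper-tail probability smaller than the chosen alpha, which is the comparison made in step 5.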

Below is the table documenting the raw scores of boys and girls and their respective behavior issues (or lack thereof):

Gender and Getting in Trouble at School

         Got in Trouble   Did Not Get in Trouble   Total
Boys           46                  71               117
Girls          37                  83               120
Total          83                 154               237

To examine statistically whether boys got in trouble in school more often, we need to frame the question in terms of hypotheses.

In this case, the specific hypotheses are:

H0: There is no relationship between gender and getting in trouble at school
H1: There is a relationship between gender and getting in trouble at school

As is customary in the social sciences, we'll set our alpha level at 0.

We do the same thing for the other three cells and end up with the following expected counts (in parentheses next to each raw score):

Gender and Getting in Trouble at School

         Got in Trouble   Did Not Get in Trouble   Total
Boys           46

Then we add all of the terms together (there will be four, one for each cell). Each term is the squared difference between the observed and expected count, divided by the expected count. After we've crunched all those numbers, we end up with an obtained statistic of 1.
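The calculation described above can be reproduced from the four observed cell counts in the gender and trouble table. This is a minimal sketch rather than the course's own worked solution: the variable names are invented, and the p-value shortcut applies only because a 2x2 table has 1 degree of freedom.

```python
import math

# Observed counts from the table above.
# Rows: Boys, Girls. Columns: Got in Trouble, Did Not Get in Trouble.
observed = [[46, 71],
            [37, 83]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# One term per cell: (observed - expected)^2 / expected, then add them up.
chi2 = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; erfc(sqrt(x / 2)) is valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(row_totals, col_totals, n)
print([[round(exp, 2) for exp in row] for row in expected])
print(round(chi2, 2), df, round(p_value, 3))
```

For tables larger than 2x2 the statistic is computed the same way, but the p-value needs the general chi-square distribution rather than the `erfc` shortcut.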

And, because we have cleaned up s1q62a, we are ready to run our chi square test. Select Analyze, Descriptive Statistics, and then Crosstabs. Find s1q62a in the variable list on the left, and move it to the Column(s) box. Your output should look like the table on the left. Take a look at the Asymptotic Significance of this chi square test. Using this information, what can we say about the relationship between paternal degree and full time enrolment in education after secondary school?

Before you run the chi square, make sure to check the frequencies in s1q62b and make any corrections you think are necessary.

Is there a statistically significant relationship between maternal degree and full time education after secondary school? Remember that all you can say now is that paternal degree and Year 11 truancy both have relationships with respondent enrolment in full time education after secondary school. You cannot say, for example, that a paternal degree causes enrolment in full time education.
