chi-square_test [2016/05/16 07:31] hkimscil / chi-square_test [2016/05/16 08:21] (current) hkimscil
{{keywords>"Chi-square test" statistics "research methods"}}
====== Short Explanation ======
To be filled...
====== Chi-square test, explanation ======
This is a rather redundant, long description of the chi-square test.
Let's start with what we know first.

Two variables: Let's say you are interested in the relationship between types of religion and opinions about abortion.

You have a hunch that people with different religions will have different opinions about abortion. This reveals that you think having a particular religion affects what a person thinks about abortion. Therefore, religion will be the __IndependentVariable__ (IV), and opinion about abortion will be the __DependentVariable__ (DV).
{{19-419434-a-tshirt-camp-scils-rutgers.jpg?202 |Dean of SCILS at Rutgers promoting RUSURE program}}

We have two variables here. What kinds of values (attributes) do you see for the first variable, type of religion? It is a __Nominal variable__. Your initial response would be Catholic, Protestant, Judaism, and so on. At this point you might not be sure whether you have thought of every religion -- yes, certainly, there is Buddhism, too, and many others. Now you use your judgment about this __exhaustiveness__ problem. With your rationale, you can decide to make the categories as follows: Variable 1: Catholic, Protestant, and Others. The last one, Others, covers everything else, right? So you have just escaped the exhaustiveness problem. As a side note, you may say, "Wait, my religion is not there... and it's unfair!" That's why I said you use your judgment.
The rationale for the choice was to see the difference between the two religions and the rest -- Catholic versus Protestant versus everything else. Alternatively, when gathering your sample, you may want to choose only those who are Catholic or Protestant. Basically, it is your (the researcher's) choice. But as far as the sampling method is concerned, it should be random.

Now we need to take care of the other variable, opinion about abortion. The values of this variable may be rather simple:
<WRAP clear />
{{ 11-419460-rusure.jpg?202|Dr Mokros at RuSure Campaign SCILS Rutgers 2000}}
Chi-square test: This is how the Chi-square method was involved in ... Anyway, now let's think about the chi-square test. Previously, you wanted to know whether there are differences in abortion opinions among religious groups, and you got the frequency table. Now think about what the table would look like if there were no differences in abortion opinion among the religious groups.

We already know from the first contingency (frequency, or bivariate analysis) table that 50 people were for legal abortion and 50 people were against it -- the ratio of each category was YES: 50 out of 100; NO: 50 out of 100. And there were 40 Catholics, 42 Protestants, and 18 others.
|                 |        |  Religion  |||        |
|                 |        | Catholic  | Protestant  | Other  | Total  |
| Legal Abortion  | yes    |           |             |        | 50     |
| :::             | no     |           |             |        | 50     |
|                 | Total  | 40        | 42          | 18     | 100    |
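If there were no difference among the religious groups, each group would split 50/50 just like the whole sample, so each expected count can be computed from the margins alone: (row total x column total) / grand total. The short Python sketch below illustrates that arithmetic; it is not part of the original lecture, and the function name `expected_counts` is a hypothetical helper. The margins are the ones in the table above (50 yes, 50 no; 40 Catholic, 42 Protestant, 18 other).

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under "no difference" (independence):
    E[i][j] = row_totals[i] * col_totals[j] / grand_total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add up to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Margins from the table: rows are yes/no, columns are Catholic/Protestant/Other.
expected = expected_counts([50, 50], [40, 42, 18])
for label, row in zip(["yes", "no"], expected):
    print(label, row)
# yes [20.0, 21.0, 9.0]
# no [20.0, 21.0, 9.0]
```

The same helper gives the expected values of the second example on this page when called as `expected_counts([55, 45], [40, 40, 20])`.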
  
{{ 19-419667-rusure.jpg?202|Brad Crownover (Comm Dep advisor) at RuSURE campaign SCILS Rutgers 2000}}
{{ 20-419434-a-tshirt-camp-scils-rutgers.jpg?202|RUSURE campaign SCILS Rutgers 2000}} You may want to look this up in your textbook. The sign $\textstyle\sum$ is called Sigma and means that you sum up all the values involved in the calculation. In our example, there are 6 cells, and each cell has two values (expected and observed). For each cell, you take the difference between the observed and the expected value, square the result, and divide it by the expected value. You repeat this calculation for the other five cells and sum up the results.
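Written as a formula, the procedure just described looks like this. The notation here is my own sketch: $O_{ij}$ is the observed count in row $i$ (yes or no) and column $j$ (Catholic, Protestant, Other), and $E_{ij}$ is the corresponding expected (theoretical, T) count.

$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

The double sum simply runs over all 2 x 3 = 6 cells, and each term is divided by that cell's own expected value.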
  
Let's see the table.
|  religion vs. abortion opinion  ||||||
|    |  Catholic  |  Protestant  |  Other  |  Total  |  Chi-square  |
|  yes  |  5  |  32  |  13  |  50  |    |
|  Expected value  |  (20)  |  (21)  |  (9)  |  50  |    |
|  (O-T)<sup>2</sup> / T  |  (-15)<sup>2</sup>/20=11.25  |  (11)<sup>2</sup>/21=5.76  |  (4)<sup>2</sup>/9=1.78  |    |  18.79  |
|  no  |  35  |  10  |  5  |  50  |    |
|  Expected value  |  (20)  |  (21)  |  (9)  |  50  |    |
|  (O-T)<sup>2</sup> / T  |  (15)<sup>2</sup>/20=11.25  |  (-11)<sup>2</sup>/21=5.76  |  (-4)<sup>2</sup>/9=1.78  |    |  18.79  |
|  Total  |  40  |  42  |  18  |  100  |  37.58  |
Chi-square value = 37.58. \\
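The same calculation can be written as a short Python sketch. It is not from the original text, and `chi_square_statistic` is a hypothetical helper name. It computes each expected count from the margins and then adds up (O - E)<sup>2</sup> / E over every cell, dividing each term by that cell's own expected value.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under "no difference": row total x column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    return statistic


# Observed counts from the table: rows are yes/no, columns are Catholic/Protestant/Other.
observed = [[5, 32, 13],
            [35, 10, 5]]
print(round(chi_square_statistic(observed), 2))  # 37.58
```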
I do not know exactly why the degrees of freedom are important in a conceptual way, so I have difficulty explaining it. But the idea behind it is this: if you know the column and row totals and the values of two cells (the minimum requirement, which is called the degrees of freedom), you can obtain the values of the other four cells without consulting the actual observed values.
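To make this concrete, here is a small Python sketch. It is an illustration rather than part of the original text, and both function names are hypothetical. Given the row and column totals of the 2 x 3 table above and only the two "yes" cells for Catholic and Protestant, the other four cells follow by subtraction. In general, df = (rows - 1) x (columns - 1).

```python
def fill_2x3_table(yes_catholic, yes_protestant, row_totals, col_totals):
    """Recover all six cells of a 2 x 3 table from two known cells plus the margins.

    Hypothetical helper: rows are (yes, no); columns are (Catholic, Protestant, Other).
    """
    yes_total, no_total = row_totals
    catholic_total, protestant_total, other_total = col_totals
    # The yes row must add up to its total, which fixes the third yes cell.
    yes_other = yes_total - yes_catholic - yes_protestant
    # Each column must add up to its total, which fixes every no cell.
    no_row = [catholic_total - yes_catholic,
              protestant_total - yes_protestant,
              other_total - yes_other]
    if sum(no_row) != no_total:
        raise ValueError("row and column totals are inconsistent")
    return [[yes_catholic, yes_protestant, yes_other], no_row]


def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


print(fill_2x3_table(5, 32, row_totals=(50, 50), col_totals=(40, 42, 18)))
# [[5, 32, 13], [35, 10, 5]]
print(degrees_of_freedom(2, 3))  # 2
```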
  
{{07-419434-a-tshirt-camp-scils-rutgers.jpg?202 |RUSURE campaign SCILS Rutgers 2000}} Anyway, you have just obtained the chi-square value (37.58) and the degrees of freedom (2). Now look up the chi-square table in your textbook: (1) find your degrees of freedom (2), that is, the second row of the table; (2) decide the probability level you want to use (usually .05 or .01); (3) write down the numbers; and (4) compare them to your chi-square value (see [[:chi-square distribution table]]).
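If the textbook is not at hand, there is a shortcut for this particular case: for df = 2, and only for df = 2, the chi-square distribution has a simple closed form, P(chi-square > x) = e<sup>-x/2</sup>, so the critical value at probability alpha is -2 ln(alpha). The Python sketch below relies on that assumption; it is not from the original text, and the function name is hypothetical. It will not give correct critical values for other degrees of freedom.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2 ONLY.

    With two degrees of freedom, P(chi-square > x) = exp(-x / 2),
    so the critical value solves exp(-x / 2) = alpha, i.e. x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)


chi_square = 37.58  # value obtained for the religion vs. abortion opinion table
for alpha in (0.05, 0.01):
    critical = critical_value_df2(alpha)
    decision = "reject" if chi_square > critical else "fail to reject"
    print(f"alpha={alpha}: critical value={critical:.3f}, {decision} 'no difference'")
# alpha=0.05: critical value=5.991, reject 'no difference'
# alpha=0.01: critical value=9.210, reject 'no difference'
```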
  
The numbers you obtain from the book are 5.991 for the 0.05 probability level and 9.210 for the 0.01 probability level. They are called critical values. So the critical values are:
{{ princeton-nassau-inn-1.jpg?202|Nassau Inn near Princeton University}}
  
Chi-square test, example: Let's do an exercise with the chi-square test. I hope you remember the part below, which is from the last essay I wrote. Let's go through it first. Here is another table; this time I will show percentages, that is, the ratio of the opinions within each religious group.
|  Abortion opinion and Religion  |||||
|    | Catholic  | Protestant  | other  | Total  |
This may lead you to the idea that you need some kind of method to aid your decision. And this "some kind of method" is what we call statistics. Note the sentence, "Sure 45% and 62.5% are different." Let's see whether it turns out to be right. Below is the table with raw numbers -- the observed values. I would like you to fill in the expected value under each observed value -- 6 cells in total.

|  Abortion opinion and Religion  |||||
|    | Catholic  | Protestant  | other  | total  |
| yes  | 18  | 25  | 12  | 55  |
How about the [c] cell? There are 20 people in total in the "other" category (an attribute of the independent variable). So, 0.55 X 20 = 11. Lastly, how about the [y] cell? In this case, we are talking about the "no" attribute, and the ratio of the "no" attribute in the total sample was 45/100. The total number of Protestants is 40 -- note that it is not the number of Catholics. So the expected value for the cell [y] is 0.45 X 40 = 18.

__Note:__ Hey! The calculated expected values are all whole numbers! Guess how much time I spent making up the example table! My point is that you might not get whole numbers (such as 1, 2, 3, 4, ...) as the expected values in your own table. Anyway, can you fill in all the cells now? It should look like the table below.
|  Abortion opinion and Religion  |||||
|    | Catholic  | Protestant  | other  | total  |
| yes  | 18  | 25  | 12  | 55  |
E is the expected value. It is sometimes called the theoretical value (T), which is the symbol used in the table below.
<WRAP clear />
|  abortion opinion and religion  ||||||
|    | Catholic  | Protestant  | Other  | Total  | Chi-square  |
| yes  | 18  | 25  | 12  | 55  |    |
| Expected Value  | (22)  | (22)  | (11)  | (55)  |    |
| (O-T)<sup>2</sup> / T  | (-4)<sup>2</sup>/22=0.73  | (3)<sup>2</sup>/22=0.41  | (1)<sup>2</sup>/11=0.09  |    | 1.23  |
| no  | 22  | 15  | 8  | 45  |    |
| Expected Value  | (18)  | (18)  | (9)  | (45)  |    |
| (O-T)<sup>2</sup> / T  | (4)<sup>2</sup>/18=0.89  | (-3)<sup>2</sup>/18=0.5  | (-1)<sup>2</sup>/9=0.11  |    | 1.5  |
| Total  | 40  | 40  | 20  | 100  | 2.73  |
**Chi-square value = the sum of the six (O-T)<sup>2</sup> / T cells = 2.73.** \\
**Degrees of Freedom (df) = (the # of columns - 1) x (the # of rows - 1) = (3-1) x (2-1) = 2 x 1 = 2.** \\
\\
Look up the values in your textbook; they are called "critical values."
\\
They are:
5.991 (0.05 probability)
In the first place, to get the expected values, you assumed that there would be no differences on the abortion issue among the religious groups. Then you compared the expected values to the observed values. In other words, you tested your survey result (the observed values) against the idea of "no difference." Your plan was: if the sum of the comparisons (chi-square) is big enough, you would say that the idea of "no difference" is not likely true; if the sum of the comparisons (chi-square) is small enough, you would say that there is no reason to reject the idea of "no difference." What you concluded from this test was that you failed to reject the idea of no differences.
  
{{raritan-river-01.jpg?132 |Princeton Park river}} __Why null?__ -- Someone in the class cleverly asked why we should use the null hypothesis in the first place. As you can see above, it would be harder to test directly whether the researcher is right. Most statistical methods (chi-square, t-test, ANOVA, and others) test against the idea of 0 (zero -- no difference). Therefore, it would not have been safe to say, "Sure 45% and 62.5% are different."
  
__Another note:__ You might have a question... Hey, wait a minute... If I pick some other numbers from the chi-square distribution table, the result would be totally different!
<WRAP clear />
For your information, the table looks as follows. And the chi-square value you got from your data was 2.73 (see [[:Chi-square distribution table]]).

| df  | .30  | .20  | .10  | .05  | .02  | .01  | .001  |
| 1  | 1.074  | 1.642  | 2.706  | 3.841  | 5.412  | 6.635  | 10.827  |
So, basically, choosing the probability means you decide the certainty of your decision.
  
{{princeton_park_river_1.jpg?202 |Princeton Park river}} By the same token, since the chi-square value is even bigger than the critical value at the 0.01 probability level, you can state that there is indeed a difference in legal abortion opinions among the religious groups. The chance that you are wrong about this decision is at most 0.01 out of 1 (1 out of 100; 1%). That is, even though the gap between the chi-square value and the critical value seems to be due to the religious groups indeed being different on the legal abortion issue, there is still a slight chance that such a big chi-square value is due to random error in your sampling procedure. And that chance is 1 out of 100.
  
Yes, choosing the probability means that you decide the certainty of your decision. The chance of your being wrong in your statement -- that there are differences on the abortion issue among the religious groups -- was 3 out of 10. Unlike the probability of 0.05 or 0.01, this risk is too big to take; in other words, such a conclusion would be rather meaningless. This is why Professor White said that social scientists usually take 0.05 as the criterion for their statistical tests.
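For the second example you can also compute the exact upper-tail probability instead of bracketing it between table columns. The Python sketch below assumes the closed form that holds for df = 2 only, P(chi-square > x) = e<sup>-x/2</sup>; it is an illustration rather than part of the original lecture, and `p_value_df2` is a hypothetical name. The result falls between the .20 and .30 columns, which is consistent with the "3 out of 10" figure above.

```python
import math


def p_value_df2(statistic):
    """Upper-tail probability P(chi-square > statistic) for df = 2 ONLY (closed form exp(-x / 2))."""
    return math.exp(-statistic / 2.0)


p = p_value_df2(2.73)   # chi-square value of the second example
print(f"{p:.2f}")       # 0.26
print(0.20 < p < 0.30)  # True: between the .20 and .30 columns of the table
print(p < 0.05)         # False: not significant at the 0.05 level
```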
  
  