Testing Statistical Significance - Hypothesis Testing and the Chi-Square Goodness of Fit Test


By Jon Baker, National Marine Fisheries Service and BioLab


When designing and conducting an experiment, a biologist will have a predicted or expected outcome based on his or her hypothesis.  However, the results do not always match the expected outcome perfectly, and the problem of how to interpret them arises.  That is why biologists turn to statistics to help them make decisions about the results of their experiments.  For example, suppose you are performing a guinea pig heredity experiment.  You crossed a pure-breeding black guinea pig with a pure-breeding white guinea pig and all the progeny were black.  You then crossed the offspring of the first mating with one another and, knowing Mendel’s principles, predicted that a 3:1 ratio of black to white should result.  After the guinea pigs are born you count 164 guinea pigs, 117 black and 47 white.  This is not exactly a 3:1 ratio.  Given that 164 guinea pigs were born, you would expect 123 black (three quarters of 164) and 41 white (one quarter of 164).  The question then becomes: are these data sufficiently close to the predicted result to say that they agree?  This is when statistics must be used.
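The expected counts come from applying the predicted ratio to the total number of offspring. Here is a minimal Python sketch of that arithmetic. The function name `expected_counts` is an illustrative choice, not part of the original exercise.

```python
def expected_counts(total, ratio):
    """Split `total` observations according to an expected ratio such as (3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

observed = [117, 47]                       # black, white (counts from the text)
total = sum(observed)                      # 164 guinea pigs
expected = expected_counts(total, (3, 1))  # 3 black : 1 white
print(expected)                            # [123.0, 41.0]
```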


The first thing to do is form a statement of the hypothesis being tested.  In this experiment you believe that the guinea pig cross will result in a 3:1 ratio, and you write a hypothesis that predicts just that.  It is called the null hypothesis (Ho) because it predicts no difference between the experimental results and your prediction.  Write a very precise and clear statement such as, “The sample data come from a population having a 3:1 ratio of black to white guinea pigs”.  You also write an alternate hypothesis (HA) that states there is a difference.  For example, “The sample data come from a population not having a 3:1 ratio of black to white guinea pigs”.  If your statistical analysis leads you to reject Ho, then HA is accepted in its place.  You must state a null hypothesis and an alternate hypothesis for every statistical test you perform.  This ensures that all possible outcomes are accounted for by the two hypotheses.


The statistic to use when analyzing genetic data of this type, when we are comparing two distributions of data, is the Chi-Square ($\chi^2$) Goodness of Fit.  The two distributions of data are the observed experimental results (117 black and 47 white) and the expected hypothetical distribution (3:1, or 123 black and 41 white).  This statistic measures how far a sample distribution deviates from the hypothetical distribution.  Let's examine the equation and see how it is used.  Here is the equation.


$$\chi^2 = \sum \frac{(O - E)^2}{E}$$


O = observed value of each class of data

E = expected value of each class of data


Our data set has two classes, black guinea pigs and white guinea pigs, and we know the observed and expected numbers for each.  All there is to do is plug in the numbers!

$$\chi^2 = \frac{(117 - 123)^2}{123} + \frac{(47 - 41)^2}{41}$$

$$\chi^2 = \frac{(-6)^2}{123} + \frac{(6)^2}{41}$$

$$\chi^2 = \frac{36}{123} + \frac{36}{41}$$

$$\chi^2 = 0.29 + 0.88$$

$$\chi^2 = 1.17$$
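The same hand calculation can be checked with a few lines of Python. This is only a sketch: the helper name `chi_square` is made up for illustration, and it assumes the observed and expected counts are listed in the same class order.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every class of data."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [117, 47]   # black, white
expected = [123, 41]   # 3:1 ratio applied to 164 offspring
statistic = chi_square(observed, expected)
print(round(statistic, 2))   # 1.17
```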


Notice what happened: we found the arithmetic difference between the observed and expected values for each class of data, and because each difference was squared, the negative sign was eliminated.  The number 1.17 is the test statistic; it is compared to values in a table to determine whether to reject the null hypothesis.  The example below demonstrates the proper way to present and perform a statistical test on paper.



Ho : The sample data come from a population having a 3:1 ratio of black to white guinea pigs.


HA : The sample data come from a population not having a 3:1 ratio of black to white guinea pigs.















n = total number of samples


$$\chi^2 = \frac{(-6)^2}{123} + \frac{(6)^2}{41}$$

$$\chi^2 = 0.29 + 0.88$$

$$\chi^2 = 1.17$$


$$\nu = k - 1$$

$$\nu = 2 - 1$$

$$\nu = 1$$


0.25 < P < 0.50


Whoa!  It was going well until all this nonsense at the end.  Well, it is not that bad; let me explain.  First of all, $\nu$ (the Greek letter nu) is something known as degrees of freedom.  It is calculated by subtracting 1 from the number of classes in the data set (k).  In the example above there are two classes of data, black and white guinea pigs, so there is 1 degree of freedom.  (We will not take the time to explain degrees of freedom here because it is a complicated concept best saved for a college statistics class.  Just know that you must calculate and use it to find your place in the Chi-square critical values table, which will be demonstrated next.)


Once the test statistic ($\chi^2$) and degrees of freedom ($\nu$) have been calculated, it is time to compare the value to the Chi-square critical values table.  This table will tell us the probability of seeing a difference at least as large as the one between our data and the expected results if chance alone were at work (that is, if the null hypothesis were true).  Generally, if that probability turns out to be less than 0.05, it is agreed that chance alone is an unlikely explanation for the difference.  Now there is nothing special about 0.05; it is just the level that most scientists have agreed on.  This is what is meant by statistical significance.  If the probability is less than 0.05, scientists will agree that chance probably is not the cause of the difference, so some other mechanism is working to produce the results.  It is up to you to determine what that mechanism is; the statistics will not tell you.  In the example above we have calculated a test statistic of 1.17 and 1 degree of freedom.


Look at the Chi-square critical values table your teacher has provided.  You will first notice that there are many columns.  The first column is headed with $\nu$ and the others with the numbers 0.999, 0.995 and so on, ending with 0.001.  The $\nu$ column is the degrees of freedom and all other columns are probabilities.  To find the probability of obtaining a $\chi^2$ statistic of 1.17, read down the degrees of freedom column until you reach the degrees of freedom calculated for your test.  Then read across that row until you reach the number you calculated.  It is unlikely that you will find your exact number; rather, it will be bracketed between two values.  In this example, our value of 1.17 is between 0.455 and 1.323.  Reading the top of the columns we find the probabilities 0.50 and 0.25 respectively.  This means that our statistic has a probability somewhere between 0.50 and 0.25.  Stated symbolically, it is 0.25 < P < 0.50.  What this means is that, if the null hypothesis is true, random sampling alone would produce a difference at least this large between 25% and 50% of the time.  A more detailed table will be needed to acquire a more precise probability.
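The table lookup described above can be sketched in Python. Treat this as an illustration only: the row below holds just a handful of columns for 1 degree of freedom. The values 0.455 and 1.323 are the ones quoted in the text, the other three are standard table values added here for illustration, and the names `ROW_DF1` and `bracket_p` are made up.

```python
# One row (1 degree of freedom) of a chi-square critical values table,
# stored as (probability, critical value) pairs in increasing critical value.
# 0.455 and 1.323 are quoted in the text; the other entries are standard
# table values added here as an assumption for illustration.
ROW_DF1 = [
    (0.50, 0.455),
    (0.25, 1.323),
    (0.10, 2.706),
    (0.05, 3.841),
    (0.01, 6.635),
]

def bracket_p(statistic, row):
    """Return (lower, upper) probabilities whose critical values bracket the statistic."""
    upper = None  # probability heading the last column already passed
    for prob, critical in row:
        if statistic < critical:
            return prob, upper
        upper = prob
    return None, upper  # statistic lies beyond the last column of the row

low, high = bracket_p(1.17, ROW_DF1)
print(f"{low} < P < {high}")   # 0.25 < P < 0.5
```

A result of `None` on either side means the statistic falls off that end of this shortened row, and a fuller table would be needed.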


Now, how to interpret the results?  With a P value of 0.25 < P < 0.50, the Chi-square test reveals that chance alone could easily be responsible for the difference between the observed and expected results.  This means that our experimental results agree with the predicted outcome.  Therefore we fail to reject the null hypothesis (Ho): the data give us no reason to doubt that the sample comes from a population having a 3:1 ratio of black to white guinea pigs.  This is consistent with a Mendelian explanation for the data.
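For readers who want a more precise probability than the table gives, here is a hedged sketch that works only for 1 degree of freedom. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the tail probability can be written with the complementary error function from Python's standard library. The names `p_value_df1` and `ALPHA` are illustrative.

```python
import math

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # Valid for 1 degree of freedom only: chi-square is then the square of a
    # standard normal variable, so P = erfc(sqrt(statistic / 2)).
    return math.erfc(math.sqrt(statistic / 2))

ALPHA = 0.05                      # conventional significance level
p = p_value_df1(1.17)
decision = "reject Ho" if p < ALPHA else "fail to reject Ho"
print(round(p, 2), decision)      # 0.28 fail to reject Ho
```

The result falls inside the 0.25 < P < 0.50 bracket read from the table, so the decision is the same either way.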





Now do a thought experiment and change the observed frequency values:


  1. Increase the disagreement between the observed and expected results in the guinea pig example to 110 black and 54 white.  Perform the statistical test on the new data.  What did you find?


  2. Now let’s imagine a new data set with the same proportion of deviation but a total of 1000 observations.  This means that we observed 713 black and 287 white guinea pigs while expecting 750 black and 250 white guinea pigs.  Perform the statistical test on the new data.  What did you find?


  3. Explain the effect of sample size on an experiment.






































What happens to the Chi-Square ($\chi^2$) statistic as the disagreement between the observed and the expected results becomes larger or smaller?  What do you find?  Try it again while changing the total number of samples.
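One way to explore these questions after working them by hand is a small Python sketch like the one below. It is an illustration under stated assumptions: the observed proportion of black guinea pigs is held at 117/164 while the total changes, fractional counts are allowed for simplicity, and the function names are made up.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every class of data."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def scaled_statistic(total, fraction_black=117 / 164):
    """Chi-square against a 3:1 ratio when the observed proportion stays fixed."""
    black = total * fraction_black        # may be fractional; fine for exploring
    white = total - black
    expected = [total * 3 / 4, total * 1 / 4]
    return chi_square([black, white], expected)

# Same proportions, different sample sizes.
for total in (164, 500, 1000, 5000):
    print(total, round(scaled_statistic(total), 2))
```

Compare the printed statistics with the critical values in your table and note at which sample sizes the same proportional deviation would count as significant.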






In hypothesis testing there are four possible outcomes.  See the table below.


                              Do not reject Ho      Reject Ho

Null Hypothesis is true       Correct Decision      Type I error

Null Hypothesis is false      Type II error         Correct Decision


Before you carry out a test you must first decide how large a risk of Type I error you will accept.  A Type I error is one in which you reject a true null hypothesis (see the table).  By doing this you are setting the significance level.  This is the cutoff at which you will reject or fail to reject the null hypothesis, and it is the level that you will have to defend to peers reviewing your work.  A typical level in the sciences is 0.05, or 5%.  What this says is that "if the null hypothesis is true, a difference as large as the one observed in my data would arise from random error (chance) with a probability of only 0.05 (or less)".  Stated another way, "chance alone would produce a difference this large no more than 5% of the time."  So if the P value for your Chi-Square ($\chi^2$) statistic is 0.05 or less, you reject the null hypothesis and say that the difference you see is likely due to some factor other than chance.
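To see what a 5% Type I error rate looks like in practice, the sketch below simulates many guinea pig crosses in which the 3:1 hypothesis really is true and counts how often the test rejects it anyway. This is an illustration, not part of the original exercise. It assumes 164 offspring per experiment, uses 3.841 (the standard table value for P = 0.05 with 1 degree of freedom) as the cutoff, and the function names are made up. Because counts are whole numbers, the simulated rate lands near 0.05 rather than exactly on it.

```python
import random

CRITICAL_005_DF1 = 3.841   # standard table value: P = 0.05, 1 degree of freedom

def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every class of data."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_type_i_rate(trials=2000, offspring=164, p_black=0.75, seed=1):
    """Fraction of simulated experiments that reject a TRUE 3:1 hypothesis."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    expected = [offspring * p_black, offspring * (1 - p_black)]
    rejections = 0
    for _ in range(trials):
        # Each offspring is black with probability 0.75, as the 3:1 ratio predicts.
        black = sum(rng.random() < p_black for _ in range(offspring))
        observed = [black, offspring - black]
        if chi_square(observed, expected) > CRITICAL_005_DF1:
            rejections += 1     # a Type I error: Ho is true but was rejected
    return rejections / trials

rate = simulate_type_i_rate()
print(rate)   # expected to be near 0.05
```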


Now your teacher will show you how to calculate a Chi-Square ($\chi^2$) statistic.