Comparing Two Populations
By Jon Baker, National Marine Fisheries Association and BioLab
You have learned how to perform a Chi-square analysis and test Hardy-Weinberg expectations. In both cases, a single experiment was performed and the experimental results were compared to expected results for one population. But what if we want to determine whether or not two populations have similar allele frequencies? Let’s say that we are wondering if a hatchery stock has the same distribution of alleles as a wild stock. The problem is that we do not have a theoretical distribution to which we can compare our data. Or do we? To answer this question we must rely on a slightly different statistical method. This statistical analysis will allow us to determine if the gene frequencies between the two populations are significantly different.
We will still use the Chi-square statistic, but since we will be comparing two populations simultaneously we arrange the data differently. The data are arranged into what is known as a contingency table. Because we have two populations and one locus with two alleles, the table below is a 2 x 2 contingency table.
The test begins with the table, which is easy to fill in. There is a column for each allele and a row for each population. Imagine that we discovered that the Green River population had 212 A alleles and 388 B alleles, while the Rapid River population had 157 A alleles and 243 B alleles. These are the observed data; find them in the table below.
Number of alleles at locus 1

| Population | | A | B | Total |
|---|---|---|---|---|
| Green River | Observed | 212 | 388 | ? |
| | Expected | ? | ? | |
| Rapid River | Observed | 157 | 243 | ? |
| | Expected | ? | ? | |
| Total | | ? | ? | ? |
The next thing to do is calculate the row and column totals; they go at the far right of each row and the bottom of each column. Also notice that there is a grand total in the bottom right corner. The row totals and the column totals should each sum to this same grand total. This is the total number of alleles counted, and in this example that number is 1000.
Alleles at locus 1

| Population | | A | B | Total |
|---|---|---|---|---|
| Green River | Observed | 212 | 388 | 600 (=R1) |
| | Expected | | | |
| Rapid River | Observed | 157 | 243 | 400 (=R2) |
| | Expected | | | |
| Total | | 369 (=C1) | 631 (=C2) | 1000 (=n) |
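The totals can also be checked with a few lines of code. This is only a minimal sketch in Python, using the observed counts 212, 388, 157 and 243 from the example; the variable names are made up for illustration.

```python
# Observed allele counts (rows: populations, columns: alleles A and B).
observed = {
    "Green River": {"A": 212, "B": 388},
    "Rapid River": {"A": 157, "B": 243},
}

# Row totals (R1, R2): all alleles counted in each population.
row_totals = {pop: sum(counts.values()) for pop, counts in observed.items()}

# Column totals (C1, C2): each allele summed over both populations.
col_totals = {}
for counts in observed.values():
    for allele, count in counts.items():
        col_totals[allele] = col_totals.get(allele, 0) + count

# Grand total (n): the row totals and the column totals must both sum to it.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print(row_totals, col_totals, grand_total)
```

If the row totals and the column totals do not add up to the same grand total, a count was entered incorrectly somewhere in the table.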
Now recall that Chi-square compares observed results to expected results. But what are the expected results for a contingency test? Just as with the other tests we have done, you must consider the null hypothesis to answer this question. A contingency table tests for independence, in this case whether allele frequency is independent of population, so the null hypothesis is that there is no difference in allele frequencies between the populations. Stated another way, the proportion of each allele is the same in both populations. In terms of our example, the null hypothesis states that the proportion of A alleles in the Green River population (212 out of 600) is the same as the proportion of A alleles in the Rapid River population (157 out of 400). Similarly, the proportion of B alleles in the Green River population (388 out of 600) is the same as the proportion in the Rapid River population (243 out of 400).
Let’s determine the expected allele counts under the null hypothesis; the calculation is very simple. Notice that the Green River population has 600 of the 1000 alleles in the study. In other words, 6/10 of the alleles in the study are from the Green River population. Also notice that between the two populations there are a total of 369 A alleles. Under a null hypothesis of no difference in allele frequencies between populations, the A alleles should be divided between the populations in these same proportions. Numerically, then, 6/10 of the 369 A alleles should belong to the Green River population and 4/10 of the 369 A alleles should belong to the Rapid River population. The calculation 369 x 6/10 = 221.4 gives the expected number of A alleles for the Green River population. Symbolically it looks like this,
C1 x R1/ n
Interpreted, C1 is the total of observed A alleles in both populations or Column 1 total, and R1 is total of the Green River A and B alleles counted or Row 1 total. These are multiplied and divided by n, which is the total number of alleles counted in the study.
The expected number of A alleles for the Rapid River population is C1 x R2/ n, which is 369 x 400/1000 or, simplified, 369 x 4/10 = 147.6 . To determine the expected numbers of B alleles, perform the same calculations, but using the C2 totals.
Alleles at locus 1

| Population | | A | B | Total |
|---|---|---|---|---|
| Green River | Observed | 212 | 388 | 600 (=R1) |
| | Expected | C1 x R1/ n (221.4) | C2 x R1/ n (378.6) | |
| Rapid River | Observed | 157 | 243 | 400 (=R2) |
| | Expected | C1 x R2/ n (147.6) | C2 x R2/ n (252.4) | |
| Total | | 369 (=C1) | 631 (=C2) | 1000 (=n) |
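The rule "column total x row total / n" is the same for every cell, so it can be written once and applied to the whole table. The sketch below is one possible way to do that in Python; the function name `expected_counts` is hypothetical, and the observed counts are the 212, 388, 157 and 243 of the example.

```python
def expected_counts(observed):
    """Expected count for each cell: column total x row total / n."""
    row_totals = [sum(row) for row in observed]        # R1, R2
    col_totals = [sum(col) for col in zip(*observed)]  # C1, C2
    n = sum(row_totals)                                # grand total
    return [[c * r / n for c in col_totals] for r in row_totals]


# Rows: Green River, Rapid River. Columns: allele A, allele B.
observed = [[212, 388], [157, 243]]
expected = expected_counts(observed)

for population, row in zip(["Green River", "Rapid River"], expected):
    print(population, [round(e, 1) for e in row])
```

Notice that the expected counts in each row add up to the observed row total, and the expected counts in each column add up to the observed column total.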
Now that you understand how to set up a contingency table, it is a simple matter to perform the Chi-square test. Read the following example.
Ho : The allele frequencies are the same for both populations.
HA : The allele frequencies differ between the two populations.
Alleles at locus 1

| Population | | A | B | Total |
|---|---|---|---|---|
| Green River | Observed | 212 | 388 | 600 (=R1) |
| | Expected | (221.4) | (378.6) | |
| Rapid River | Observed | 157 | 243 | 400 (=R2) |
| | Expected | (147.6) | (252.4) | |
| Total | | 369 (=C1) | 631 (=C2) | 1000 (=n) |

n = total number of alleles counted
Recall that:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

so, using the observed counts 212, 388, 157 and 243 (which give the row totals 600 and 400),

$$\chi^2 = \frac{(212 - 221.4)^2}{221.4} + \frac{(157 - 147.6)^2}{147.6} + \frac{(388 - 378.6)^2}{378.6} + \frac{(243 - 252.4)^2}{252.4}$$

$$\chi^2 = \frac{(-9.4)^2}{221.4} + \frac{(9.4)^2}{147.6} + \frac{(9.4)^2}{378.6} + \frac{(-9.4)^2}{252.4}$$

$$\chi^2 = 0.3991 + 0.5986 + 0.2334 + 0.3501$$

$$\chi^2 = 1.5812$$

In the case of contingency tables the degrees of freedom are calculated differently than for Goodness of Fit tests:

$$\nu = (\text{rows} - 1)(\text{columns} - 1) = (2 - 1)(2 - 1) = 1$$

We take our $\chi^2$ value to the critical value table below. With 1 degree of freedom, 1.5812 is smaller than 2.706, the critical value for a probability of 0.10, so the probability of exceeding our $\chi^2$ value is greater than 0.10.
Probability of exceeding the critical value

d.f.    0.10     0.05     0.025    0.01     0.001
----------------------------------------------------------------
1       2.706    3.841    5.024    6.635    10.828
2       4.605    5.991    7.378    9.210    13.816
3       6.251    7.815    9.348    11.345   16.266
4       7.779    9.488    11.143   13.277   18.467
5       9.236    11.070   12.833   15.086   20.515
If we had an expanded table we would find that 0.10 < P < 0.25.
Decision: we fail to reject the null hypothesis; the data provide no evidence of a difference in allele frequencies at locus 1 between these two populations.
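The whole test can be retraced numerically. The sketch below is an illustration only: it uses the observed counts 212, 388, 157 and 243 (the ones that match the row totals 600 and 400), takes the critical value 3.841 for 1 degree of freedom from the table above, and assumes a significance level of 0.05, which this lesson does not state explicitly. The probability is computed with the complementary error function, a shortcut that holds only when there is exactly 1 degree of freedom.

```python
import math

# Observed and expected counts, cell by cell:
# Green River A, Green River B, Rapid River A, Rapid River B.
observed = [212, 388, 157, 243]
expected = [221.4, 378.6, 147.6, 252.4]

# Chi-square statistic: sum of (O - E)^2 / E over the four cells.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for 1 d.f. at the assumed 0.05 level (from the table above).
critical_value = 3.841
reject_null = chi_square > critical_value

# Probability of exceeding the statistic; this formula is valid for 1 d.f. only.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 4), round(p_value, 3), reject_null)
```

The statistic falls below the critical value, so the null hypothesis is not rejected, and the computed probability lies between 0.10 and 0.25, in agreement with the expanded table.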
Did you notice that the degrees of freedom and the Chi-square statistic were calculated a bit differently? To calculate the degrees of freedom for a contingency test you multiply the number of rows minus 1 by the number of columns minus 1. As for the Chi-square statistic, you now have four classes of data rather than two as in an earlier lesson. The deviation from expected is calculated four times, once for each cell of the table, and the results are summed to arrive at the Chi-square statistic.
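Both differences, the degrees of freedom and the one-term-per-cell sum, carry over to tables with more rows or columns. As a rough sketch, the hypothetical function below (its name and layout are assumptions, not part of this lesson) accepts a table of any size and returns the statistic together with its degrees of freedom; here it is applied to the 2 x 2 example with observed counts 212, 388, 157 and 243.

```python
def contingency_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = col_totals[j] * row_totals[i] / n  # expected count for this cell
            statistic += (o - e) ** 2 / e          # one term per cell

    # Degrees of freedom: (rows - 1) x (columns - 1).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


statistic, df = contingency_chi_square([[212, 388], [157, 243]])
print(round(statistic, 4), df)
```

For the 2 x 2 example the loop adds four terms and the degrees of freedom come out to 1; a table with three populations and two alleles would add six terms and have 2 degrees of freedom.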