Comparing Two Populations
By Jon Baker, National Marine Fisheries Association and BioLab
You have learned how to perform a Chi-square analysis and test Hardy-Weinberg expectations. In both cases, a single experiment was performed and the experimental results were compared to expected results for one population. But what if we want to determine whether or not two populations have similar allele frequencies? Suppose we want to know whether a hatchery stock has the same distribution of alleles as a wild stock. The problem is that we do not have a theoretical distribution to which we can compare our data. Or do we? To answer this question we must rely on a slightly different statistical method. This analysis will allow us to determine whether the allele frequencies of the two populations are significantly different.
We will still use the Chi-square statistic, but because we are comparing two populations simultaneously we arrange the data differently. The data are arranged into what is known as a contingency table. Because we have two populations and one locus with two alleles, the table below is a 2 x 2 contingency table. The test begins with the table, which is easy to fill in: there is a column for each allele and a row for each population. Imagine that we discovered that the Green River population had 212 A alleles and 388 B alleles, while the Rapid River population had 157 A alleles and 243 B alleles. These are the observed data; find them in the table below.
                       # of alleles at locus 1
Population             A              B              Total
Green River
          Observed:    212            388            ?
          Expected:    ?              ?
Rapid River
          Observed:    157            243            ?
          Expected:    ?              ?
Total:                 ?              ?              ?
The next step is to calculate the row and column totals, which go at the far right of each row and at the bottom of each column. Notice that there is also a grand total in the bottom right corner. The sum of the row totals and the sum of the column totals should be the same: both equal the total number of alleles counted, which in this example is 1000.
                       Alleles at locus 1
Population             A              B              Total
Green River
          Observed:    212            388            600 (=R1)
          Expected:
Rapid River
          Observed:    157            243            400 (=R2)
          Expected:
Total:                 369 (=C1)      631 (=C2)      1000 (=n)
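The totals can also be computed rather than added by hand. The following is a minimal Python sketch, assuming the observed counts 212 and 388 for Green River and 157 and 243 for Rapid River; the variable names (observed, row_totals, col_totals) are illustrative and not part of the original exercise.

```python
# Observed allele counts: one entry per population, columns are alleles A and B.
observed = {
    "Green River": [212, 388],
    "Rapid River": [157, 243],
}

# Row totals (R1, R2): alleles counted in each population.
row_totals = {pop: sum(counts) for pop, counts in observed.items()}

# Column totals (C1, C2): copies of each allele across both populations.
col_totals = [sum(col) for col in zip(*observed.values())]

# Grand total n: must be the same whether summed by rows or by columns.
n = sum(row_totals.values())
assert n == sum(col_totals)

print(row_totals)  # {'Green River': 600, 'Rapid River': 400}
print(col_totals)  # [369, 631]
print(n)           # 1000
```

The internal check that the row totals and column totals give the same grand total is a quick way to catch a data-entry mistake.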
Now recall that Chi-square compares observed results to expected results. But what are the expected results for a contingency test? Just as with the other tests we have done, you must consider the null hypothesis to answer this question. A contingency table tests for independence, in this case whether allele frequency is independent of population, so the null hypothesis is that there is no difference in allele frequencies between the populations. Stated another way, the proportion of each allele is the same in both populations. In terms of our example, the null hypothesis states that the proportion of A alleles in the Green River population (212 out of 600) is the same as the proportion of A alleles in the Rapid River population (157 out of 400). Similarly, the proportion of B alleles in the Green River population (388 out of 600) is the same as the proportion of B alleles in the Rapid River population (243 out of 400).
Let’s determine the expected allele counts under the null hypothesis. The calculation is very simple. Notice that the Green River population has 600 of the 1000 alleles in the study. In other words, 6/10 of the alleles in the study are from the Green River population. Also notice that between the two populations there are a total of 369 A alleles. If the null hypothesis of no difference in allele frequencies between populations is true, the A alleles should be divided between the populations in the same proportions. Numerically, then, 6/10 of the 369 A alleles should belong to the Green River population and 4/10 of the 369 A alleles should belong to the Rapid River population. The calculation 369 x 6/10 = 221.4 gives the expected number of A alleles for the Green River population. Symbolically it looks like this:
C1 x R1 / n
Here C1 is the total number of observed A alleles in both populations (the Column 1 total), and R1 is the total number of A and B alleles counted in the Green River population (the Row 1 total). These are multiplied together and divided by n, the total number of alleles counted in the study.
The expected number of A alleles for the Rapid River population is C1 x R2 / n, which is 369 x 400/1000 or, simplified, 369 x 4/10 = 147.6. To determine the expected numbers of B alleles, perform the same calculations using the C2 total.
                       Alleles at locus 1
Population             A                  B                  Total
Green River
          Observed:    212                388                600 (=R1)
          Expected:    C1 x R1 / n        C2 x R1 / n
                       (221.4)            (378.6)
Rapid River
          Observed:    157                243                400 (=R2)
          Expected:    C1 x R2 / n        C2 x R2 / n
                       (147.6)            (252.4)
Total:                 369 (=C1)          631 (=C2)          1000 (=n)
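The expected counts follow mechanically from the row and column totals. Here is a short Python sketch of the rule "column total x row total / n", assuming the observed counts 212, 388, 157 and 243; the list-of-rows layout and the variable names are illustrative choices, not part of the original exercise.

```python
observed = [[212, 388],   # Green River: A, B
            [157, 243]]   # Rapid River: A, B

row_totals = [sum(row) for row in observed]        # R1, R2
col_totals = [sum(col) for col in zip(*observed)]  # C1, C2
n = sum(row_totals)                                # grand total

# Expected count for each cell: column total x row total / n.
expected = [[c * r / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
# [221.4, 378.6]
# [147.6, 252.4]
```

Note that the expected counts in each row and column add up to the same totals as the observed counts, which is a useful check on the arithmetic.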
Now that you understand how to set up a contingency table, it is a simple matter to perform the Chi-square test. Read the following example.
H0: The allele frequencies are the same in both populations.
HA: The allele frequencies differ between the two populations.
                       Alleles at locus 1
Population             A              B              Total
Green River
          Observed:    212            388            600 (=R1)
          Expected:    (221.4)        (378.6)
Rapid River
          Observed:    157            243            400 (=R2)
          Expected:    (147.6)        (252.4)
Total:                 369 (=C1)      631 (=C2)      1000 (=n)
n = total number of alleles counted
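Recall that:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
so, using the observed counts that sum to the row totals of 600 and 400 (388 B alleles for Green River and 243 B alleles for Rapid River),
$$\chi^2 = \frac{(212 - 221.4)^2}{221.4} + \frac{(157 - 147.6)^2}{147.6} + \frac{(388 - 378.6)^2}{378.6} + \frac{(243 - 252.4)^2}{252.4}$$
$$\chi^2 = \frac{(-9.4)^2}{221.4} + \frac{(9.4)^2}{147.6} + \frac{(9.4)^2}{378.6} + \frac{(-9.4)^2}{252.4}$$
$$\chi^2 = 0.3991 + 0.5986 + 0.2334 + 0.3501$$
$$\chi^2 = 1.5812$$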
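The same sum can be checked in a few lines of Python. This is a sketch only: it assumes the observed counts 212 and 388 (Green River) and 157 and 243 (Rapid River), which are the counts that add up to the row totals of 600 and 400, together with the expected counts from the table. The variable names are illustrative.

```python
observed = [[212, 388],      # Green River: A, B
            [157, 243]]      # Rapid River: A, B
expected = [[221.4, 378.6],  # expected counts from the contingency table
            [147.6, 252.4]]

# Sum (O - E)^2 / E over all four cells of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi_square, 4))  # 1.5812
```

In a 2 x 2 table every cell has the same absolute deviation (here 9.4), because the row and column totals are fixed; that is a handy check when doing the calculation by hand.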
In the case of contingency tables the degrees of freedom are calculated differently than for Goodness of Fit tests:
$$\nu = (\text{rows} - 1)(\text{columns} - 1)$$
$$\nu = (2 - 1)(2 - 1) = 1 \times 1$$
$$\nu = 1$$
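We take our calculated Chi-square value to the table of critical values below. For 1 degree of freedom, our calculated value is smaller than 2.706, the critical value for a probability of 0.10, so the probability of exceeding our calculated value is greater than 0.10.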
            Probability of exceeding the critical value
d.f.     0.10      0.05      0.025     0.01      0.001
----------------------------------------------------------------
1        2.706     3.841     5.024     6.635     10.828
2        4.605     5.991     7.378     9.210     13.816
3        6.251     7.815     9.348     11.345    16.266
4        7.779     9.488     11.143    13.277    18.467
5        9.236     11.070    12.833    15.086    20.515
If we had an expanded table we would find that:
0.10 < P < 0.25
Decision: We fail to reject the null hypothesis and conclude that there is no significant difference in allele frequencies at locus 1 between these two populations.
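The table lookup and the decision can be sketched in Python as well. Two things here are assumptions rather than part of the original exercise: the significance level of 0.05, and the exact P calculation, which uses the identity that for 1 degree of freedom the upper-tail probability of a Chi-square value x equals erfc(sqrt(x / 2)). That shortcut does not apply to other degrees of freedom. The statistic 1.5812 is the value obtained from the observed counts 212, 388, 157 and 243.

```python
import math

rows, columns = 2, 2
df = (rows - 1) * (columns - 1)  # degrees of freedom for a contingency table

# Critical values for 1 degree of freedom, copied from the table above.
critical_values_df1 = {0.10: 2.706, 0.05: 3.841, 0.025: 5.024,
                       0.01: 6.635, 0.001: 10.828}

def p_value_df1(chi_square):
    """Probability of exceeding chi_square; valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi_square / 2))

chi_square = 1.5812  # statistic for the counts 212/388 and 157/243
alpha = 0.05         # assumed significance level, not stated in the text

reject_null = chi_square > critical_values_df1[alpha]
p = p_value_df1(chi_square)

print(df)               # 1
print(reject_null)      # False: we fail to reject the null hypothesis
print(0.10 < p < 0.25)  # True
```

Comparing the statistic with the critical value and comparing P with the significance level are two views of the same decision, so they always agree.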
Worksheet for Comparing Two Populations
Ho=
HA=
Alleles at locus (e.g. prolactin2 or gonadotropin)
Population                           A               B               Total
1. ___________________ Observed:     ________        ________        _______(=R1)
                       Expected:     ________        ________        _______
2. ___________________ Observed:     ________        ________        _______(=R2)
                       Expected:     ________        ________        _______
TOTAL:                               __________(=C1) ________(=C2)   ________(=n)
Ask: Is the proportion of each allele the same in the two populations? e.g.
Expected A's in Pop.1 = R1 x C1 / n (proportion of alleles in Pop. 1 x total A's)
Expected A's in Pop.2 = R2 x C1 / n (proportion of alleles in Pop. 2 x total A's)
$\chi^2 = \sum (O - E)^2 / E$ = _________ = calculated value
$\nu$ = (rows - 1)(columns - 1) = _______ degrees of freedom
Go to Chi Square table of critical values (see section above)
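For checking a completed worksheet, the steps can be wrapped in one small Python function. This is a hedged sketch, not a prescribed method: the function name contingency_chi_square and its list-of-rows input are hypothetical, and it simply chains the worksheet steps (totals, expected counts, Chi-square sum, degrees of freedom). The example call reuses the Green River and Rapid River counts 212, 388, 157 and 243.

```python
def contingency_chi_square(observed):
    """Return (expected, chi_square, df) for a table given as a list of rows.

    Each row is one population; each column is one allele.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count per cell: row total x column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square: sum of (O - E)^2 / E over every cell.
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1)(columns - 1).
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return expected, chi_square, df


observed = [[212, 388],   # Population 1 (Green River): A, B
            [157, 243]]   # Population 2 (Rapid River): A, B
expected, chi_square, df = contingency_chi_square(observed)
print(round(chi_square, 4), df)  # 1.5812 1
```

The calculated value and degrees of freedom it returns are then taken to the table of critical values exactly as in the worksheet.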