Comparing Two Populations

By Jon Baker, National Marine Fisheries Association and BioLab

 

You have learned how to perform a Chi-square analysis and test Hardy-Weinberg expectations.  In both cases, a single experiment was performed and the experimental results were compared to expected results for one population.  But what if we want to determine whether or not two populations have similar allele frequencies?  Let’s say that we are wondering if a hatchery stock has the same distribution of alleles as a wild stock.  The problem is that we do not have a theoretical distribution to which we can compare our data.  Or do we?  To answer this question we must rely on a slightly different statistical method.  This statistical analysis will allow us to determine if the gene frequencies between the two populations are significantly different.

 

We will still use the Chi-square statistic, but because we are comparing two populations simultaneously we arrange the data differently, in what is known as a contingency table.  Look at the table below.  Because we have two populations and one locus with two alleles, this is a 2 x 2 contingency table.  The test begins with the table, which is easy to fill in: there is a column for each allele and a row for each population.  Imagine that we discovered that the Green River population had 212 A alleles and 388 B alleles, while the Rapid River population had 157 A alleles and 243 B alleles.  These are the observed data; find them in the table below.

 

 

                          # of alleles at locus 1
Population                A              B              Total
----------------------------------------------------------------
Green River
  Observed:               212            388            ?
  Expected:               ?              ?
Rapid River
  Observed:               157            243            ?
  Expected:               ?              ?
Total:                    ?              ?              ?

 

 

The next thing to do is calculate the row and column totals.  They go at the far right of each row and at the bottom of each column.  Also notice that there is a grand total in the bottom right corner.  The sum of the row totals and the sum of the column totals should be the same.  This is the total number of alleles counted, and in this example that number is 1000.

 

                          Alleles at locus 1
Population                A              B              Total
----------------------------------------------------------------
Green River
  Observed:               212            388            600 (=R1)
  Expected:
Rapid River
  Observed:               157            243            400 (=R2)
  Expected:
Total:                    369 (=C1)      631 (=C2)      1000 (=n)
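The totals can also be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the counts used are 212, 388, 157 and 243, the ones that add up to the row totals of 600 and 400.

```python
# Observed allele counts: one entry per population, listing [A, B].
# These are the counts that add up to the row totals of 600 and 400.
observed = {
    "Green River": [212, 388],
    "Rapid River": [157, 243],
}

# Row totals (R1, R2): alleles counted in each population.
row_totals = {pop: sum(counts) for pop, counts in observed.items()}

# Column totals (C1, C2): copies of each allele across both populations.
col_totals = [sum(col) for col in zip(*observed.values())]

# Grand total n: the same whether we add up the rows or the columns.
n = sum(row_totals.values())
assert n == sum(col_totals)

print(row_totals)  # {'Green River': 600, 'Rapid River': 400}
print(col_totals)  # [369, 631]
print(n)           # 1000
```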

 

Now recall that Chi-square compares observed results to expected results.  But what are the expected results for a contingency test?  Just as with the other tests we have done, you must consider the null hypothesis to answer this question.  Because a contingency test checks whether allele frequency is independent of population, the null hypothesis is that there is no difference in allele frequencies between the populations.  Stated another way, the proportion of each allele is the same in both populations.  In terms of our example, the null hypothesis states that the proportion of A alleles in the Green River population (observed as 212 out of 600) is the same as the proportion of A alleles in the Rapid River population (observed as 157 out of 400).  Similarly, the proportion of B alleles in the Green River population (388 out of 600) is the same as the proportion of B alleles in the Rapid River population (243 out of 400).

 

Let's determine the expected allele counts under the null hypothesis.  The calculation is very simple.  Notice that the Green River population has 600 of the 1000 alleles in the study.  In other words, 6/10 of the alleles in the study are from the Green River population.  Also notice that between the two populations there are a total of 369 A alleles.  If the null hypothesis of no difference in allele frequencies between populations is true, the A alleles should be divided between the populations in these same proportions.  Numerically, then, 6/10 of the 369 A alleles should belong to the Green River population and 4/10 of the 369 A alleles should belong to the Rapid River population.  The calculation 369 x 6/10 = 221.4 gives the expected number of A alleles for the Green River population.  Symbolically it looks like this:

 

C1 x R1/ n 

 

Interpreted, C1 is the total of observed A alleles in both populations (the Column 1 total), and R1 is the total of A and B alleles counted in the Green River population (the Row 1 total).  These are multiplied together and divided by n, which is the total number of alleles counted in the study.

 

The expected number of A alleles for the Rapid River population is C1 x R2 / n, which is 369 x 400/1000 or, simplified, 369 x 4/10 = 147.6.  To determine the expected numbers of B alleles, perform the same calculations using the C2 total: 631 x 6/10 = 378.6 for the Green River population and 631 x 4/10 = 252.4 for the Rapid River population.

 

                          Alleles at locus 1
Population                A              B              Total
----------------------------------------------------------------
Green River
  Observed:               212            388            600 (=R1)
  Expected:               C1 x R1 / n    C2 x R1 / n
                          (221.4)        (378.6)
Rapid River
  Observed:               157            243            400 (=R2)
  Expected:               C1 x R2 / n    C2 x R2 / n
                          (147.6)        (252.4)
Total:                    369 (=C1)      631 (=C2)      1000 (=n)
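The expected counts follow directly from the rule "column total x row total / n". The sketch below applies that rule to every cell. The variable names are illustrative only, and the observed counts are the ones that match the row totals of 600 and 400.

```python
# Observed counts: rows are populations, columns are alleles A and B.
observed = [
    [212, 388],  # Green River
    [157, 243],  # Rapid River
]

row_totals = [sum(row) for row in observed]        # R1, R2
col_totals = [sum(col) for col in zip(*observed)]  # C1, C2
n = sum(row_totals)                                # grand total

# Expected count for each cell = column total x row total / n.
expected = [[c * r / n for c in col_totals] for r in row_totals]

for name, row in zip(["Green River", "Rapid River"], expected):
    print(name, [round(e, 1) for e in row])
# Green River [221.4, 378.6]
# Rapid River [147.6, 252.4]
```

Note that the expected counts in each row add up to the same row total as the observed counts, which is a quick way to catch arithmetic slips.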

 

 

Now that you understand how to set up a contingency table, it is a simple matter to perform the Chi-square test.  Read the following example.

 

Ho: The allele frequencies are the same in both populations.

HA: The allele frequencies differ between the two populations.

 

 

                          Alleles at locus 1
Population                A              B              Total
----------------------------------------------------------------
Green River
  Observed:               212            388            600 (=R1)
  Expected:               (221.4)        (378.6)
Rapid River
  Observed:               157            243            400 (=R2)
  Expected:               (147.6)        (252.4)
Total:                    369 (=C1)      631 (=C2)      1000 (=n)

n = total number of alleles counted

 

Recall that:

χ² = Σ (O - E)² / E

so,

χ² = (212 - 221.4)² / 221.4 + (157 - 147.6)² / 147.6 + (388 - 378.6)² / 378.6 + (243 - 252.4)² / 252.4

χ² = (-9.4)² / 221.4 + (9.4)² / 147.6 + (9.4)² / 378.6 + (-9.4)² / 252.4

χ² = 0.3991 + 0.5986 + 0.2334 + 0.3501

χ² = 1.5812

In the case of contingency tables the degrees of freedom (d.f.) are calculated differently than for Goodness of Fit tests:

d.f. = (rows - 1) (columns - 1)

d.f. = (2 - 1) (2 - 1) = 1 x 1

d.f. = 1

We take our calculated χ² value to the table of χ² critical values below.  With 1 degree of freedom, 1.5812 is smaller than 2.706, the critical value for a probability of 0.10, so the probability of obtaining a χ² value this large or larger is greater than 0.10.
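The statistic and the degrees of freedom can be reproduced in Python. This is a sketch rather than a definitive implementation: the variable names are illustrative, and it uses the counts 212, 388, 157 and 243, which are the ones consistent with the row totals of 600 and 400. With those counts every deviation from expected is 9.4 in size, and the statistic comes to about 1.58.

```python
# Observed counts: rows are populations, columns are alleles A and B.
observed = [
    [212, 388],  # Green River
    [157, 243],  # Rapid River
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under the null hypothesis: column total x row total / n.
expected = [[c * r / n for c in col_totals] for r in row_totals]

# Chi-square: sum of (O - E)^2 / E over all four cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a contingency table: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 2), df)  # 1.58 1
```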

 

Probability of exceeding the critical value

d.f.          0.10      0.05     0.025      0.01     0.001

----------------------------------------------------------------

  1          2.706     3.841     5.024     6.635    10.828

  2          4.605     5.991     7.378     9.210    13.816

  3          6.251     7.815     9.348    11.345    16.266

  4          7.779     9.488    11.143    13.277    18.467

  5          9.236    11.070    12.833    15.086    20.515

 

If we had an expanded table we would find that:

 

0.10 < P < 0.25

 

Decision: We fail to reject the null hypothesis and conclude that the allele frequencies at locus 1 are not significantly different between these two populations.
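When an expanded table is not at hand, the probability for 1 degree of freedom can be computed directly. The sketch below relies on a standard identity: a Chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the probability of exceeding a value x equals erfc(sqrt(x / 2)). The function name is hypothetical, and the formula applies only when d.f. = 1.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Probability of a Chi-square value this large or larger, d.f. = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# Sanity check against the critical value table: 2.706 and 3.841 should
# give probabilities very close to 0.10 and 0.05.
print(round(chi_square_p_value_df1(2.706), 3))  # 0.1
print(round(chi_square_p_value_df1(3.841), 3))  # 0.05

# Statistic obtained from the counts 212, 388, 157, 243 (about 1.58).
p = chi_square_p_value_df1(1.5812)
print(0.10 < p < 0.25)  # True
```

For tables with more than 1 degree of freedom this shortcut does not hold, and a statistics library or a fuller table would be needed.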

Did you notice that the degrees of freedom and the Chi-square statistic were calculated a bit differently?  To calculate the degrees of freedom for a contingency test you multiply the number of rows minus 1 by the number of columns minus 1.  As for the Chi-square statistic, you now have four classes of data rather than two as in an earlier lesson.  You can see that the deviation from expected is calculated four times and the results are summed to arrive at the Chi-square statistic.

 

Worksheet for Comparing Two Populations

Ho=

HA=

 

                                        Alleles at locus (e.g. prolactin2 or gonadotropin)

   Population                           A                B                Total

1. ___________________    Observed:     ________         ________         ________ (=R1)

                          Expected:     ________         ________         ________

2. ___________________    Observed:     ________         ________         ________ (=R2)

                          Expected:     ________         ________         ________

                          TOTAL:        ________ (=C1)   ________ (=C2)   ________ (=n)

 

Ask:  Is the proportion of each allele the same in the two populations?  e.g.

 

Expected A's in Pop.1 = R1 x C1 / n     (% alleles in Pop. 1 x total A's)                                  

 

Expected A's in Pop.2 = R2 x C1 / n      (% alleles in Pop. 2 x total A's)                                       

 

χ² = Σ (O - E)² / E =  _________  =  calculated value (compare with the critical value)

d.f. = (rows - 1)(columns - 1) = _______ degrees of freedom

 

Go to Chi Square table of critical values (see section above)
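The worksheet steps can be gathered into one small helper that works for any table of counts with 1 to 5 degrees of freedom. This is a sketch under stated assumptions: the function name `compare_populations` is hypothetical, the critical values are copied from the table in this lesson, and the default significance level of 0.05 is an assumption, since the lesson does not fix one.

```python
# Critical values copied from the table in this lesson (d.f. 1 to 5).
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.025: 5.024, 0.01: 6.635, 0.001: 10.828},
    2: {0.10: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210, 0.001: 13.816},
    3: {0.10: 6.251, 0.05: 7.815, 0.025: 9.348, 0.01: 11.345, 0.001: 16.266},
    4: {0.10: 7.779, 0.05: 9.488, 0.025: 11.143, 0.01: 13.277, 0.001: 18.467},
    5: {0.10: 9.236, 0.05: 11.070, 0.025: 12.833, 0.01: 15.086, 0.001: 20.515},
}

def compare_populations(observed, alpha=0.05):
    """Run the worksheet on a table of counts (rows = populations).

    alpha=0.05 is an assumed default; it must be a column of the table above.
    Returns (chi_square, df, critical_value, reject_null).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected = R x C / n
            chi_square += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    critical = CRITICAL_VALUES[df][alpha]
    return chi_square, df, critical, chi_square > critical

chi_square, df, critical, reject = compare_populations([[212, 388], [157, 243]])
print(df, critical, reject)  # 1 3.841 False
```

The null hypothesis is rejected only when the calculated value exceeds the critical value, which mirrors the last two lines of the worksheet.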