Let’s take a moment to explain some mark-recapture basics. In my last blog post, I spoke about using computer vision to help identify and re-identify individuals from photographs. Being able to identify individuals in a population by their natural markings opens the door to mark-recapture methods for estimating important parameters like abundance.
In mark-recapture experiments, a portion of a population is captured, marked, and released back into the population. In the simplest scenario, a second sample is then captured. The proportion of marked individuals in the second sample should be representative of the proportion of marked individuals in the entire population. This is the idea behind the Lincoln-Petersen estimator.
Grab yourself a shuffled deck of playing cards and deal 10 cards from the top. There’s a nice simulator here. Make a note of which cards you draw; these will be our marked individuals. After returning them to the deck, shuffle and draw 10 more. This is our second sample. We can estimate the number of cards in the deck using the Lincoln-Petersen estimator, N = (n1 × n2) / m2, where n1 and n2 are the number of cards dealt in each hand (10 each) and m2 is the number of “marked” cards from the first hand that are seen again in the second hand. On average m2 will be about two cards, which gives an estimate of 10 × 10 / 2 = 50 cards in the deck, close to the 52 in a standard deck.
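If you don’t have a deck handy, the experiment can be sketched in a few lines of Python. This is a minimal illustration that assumes a standard 52-card deck; the function names `lincoln_petersen` and `deal_two_hands` are made up for this example, and cards are represented simply as the numbers 0 to 51.

```python
import random


def lincoln_petersen(n1, n2, m2):
    """Lincoln-Petersen estimate of population size: N = (n1 * n2) / m2."""
    if m2 == 0:
        # With no recaptures the estimator divides by zero.
        raise ValueError("no marked cards were recaptured; estimate undefined")
    return n1 * n2 / m2


def deal_two_hands(deck_size=52, n1=10, n2=10, rng=random):
    """Deal two hands from a reshuffled deck; return the number of recaptures m2."""
    deck = list(range(deck_size))         # assumed deck: cards labelled 0..51
    first = set(rng.sample(deck, n1))     # first hand: our "marked" cards
    second = rng.sample(deck, n2)         # second hand, drawn after reshuffling
    return sum(card in first for card in second)


rng = random.Random(1)                    # fixed seed so the run is repeatable
recaptures = [deal_two_hands(rng=rng) for _ in range(10_000)]
mean_m2 = sum(recaptures) / len(recaptures)
print("average recaptures:", mean_m2)     # should sit near 10 * 10 / 52

print(lincoln_petersen(10, 10, 2))        # 50.0
```

Note that a single pair of hands can easily produce zero recaptures, in which case this simple estimator cannot be computed at all; the sketch raises an error rather than returning a number.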
Statistically, this is equivalent to estimating the number of cards that would need to be in the deck to explain the number of successes we observe (seeing a previously drawn card) in a sequence of 10 new cards sampled without replacement. According to the hypergeometric distribution, the expected number of successes in those 10 cards is the number of draws, n2, multiplied by the proportion of successes in the population, n1/N. Setting the observed m2 equal to this expectation, m2 = n2 × n1 / N, and rearranging gives our estimator, N = (n1 × n2) / m2. We’ve just created our first closed-population model. It is closed because our estimator relies on N being a fixed value, so we cannot add or remove cards from the deck between dealing the two hands.
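The hypergeometric claim is easy to check numerically. The sketch below, again assuming a 52-card deck, builds the probability of each possible recapture count from binomial coefficients and confirms that the expected value matches n2 × n1 / N. The helper name `hypergeom_pmf` is just for illustration.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k marked cards in n draws without replacement),
    for a deck of N cards of which K are marked."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, n1, n2 = 52, 10, 10   # assumed deck size and the two hand sizes

# Probability of seeing k = 0, 1, ..., n2 marked cards in the second hand.
pmf = [hypergeom_pmf(k, N, n1, n2) for k in range(n2 + 1)]

expected_m2 = sum(k * p for k, p in enumerate(pmf))
print(expected_m2, n2 * n1 / N)   # the two values agree

# Chance that the second hand contains no marked cards at all.
print(pmf[0])
```

The last line is worth a look: it is the probability that the estimator cannot be computed for a given pair of hands.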
Another way to consider our capture information is in the form of a capture history for each card that we have encountered. For example, if we encounter the queen of hearts in the first hand and again in the second hand, we would denote this as a capture history of {11}: present on both occasions. Alternatively, we could have seen the card for the first time in the second hand, which would give a capture history of {01}. With two sampling occasions, there are three possible capture histories that we can observe, {11, 10, 01}, and a fourth that we can’t, {00}. We can model the observed frequencies of each capture history with the multinomial distribution, which lets us estimate how many cards in the deck we didn’t see, that is, those with a capture history of {00}. Deriving this ultimately leads to the same estimator for N as above. The benefit, however, is that the extension to multiple sampling occasions is straightforward, and we can also begin to design models with parameters that estimate things like survival between sampling occasions and immigration into the population. The math involved becomes a little more difficult to demonstrate by hand, but the concepts are similar.
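To make the capture histories concrete, here is a small Python sketch that tallies them for two hands and estimates the number of unseen {00} cards. The two hands are invented for illustration, and so are the function names. The sketch does not fit a multinomial model; it uses the shortcut that follows from rearranging the estimator above: subtracting the cards we did see from N = (n1 × n2) / m2 leaves x10 × x01 / x11, where each x is the count of cards with that history.

```python
from collections import Counter


def capture_histories(first_hand, second_hand):
    """Count how many distinct cards have each observable history: 11, 10, 01."""
    first, second = set(first_hand), set(second_hand)
    counts = Counter()
    for card in first | second:
        history = ("1" if card in first else "0") + ("1" if card in second else "0")
        counts[history] += 1
    return counts


def estimate_unseen(counts):
    """Estimated number of {00} cards implied by the Lincoln-Petersen estimator."""
    if counts["11"] == 0:
        raise ValueError("no card was seen on both occasions; estimate undefined")
    return counts["10"] * counts["01"] / counts["11"]


# Hypothetical hands: QH and 8S appear in both.
first_hand = ["QH", "2C", "7D", "KS", "9H", "4S", "JC", "3D", "AH", "8S"]
second_hand = ["QH", "8S", "5C", "10D", "6H", "KC", "2S", "JD", "7S", "AC"]

counts = capture_histories(first_hand, second_hand)
unseen = estimate_unseen(counts)
seen = sum(counts.values())

print(dict(counts))          # 2 cards with history 11, 8 with 10, 8 with 01
print(unseen)                # 32.0 cards estimated with history 00
print(seen + unseen)         # 50.0, matching (10 * 10) / 2
```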
Now, think about how our deck of cards relates to what is going on in the wild. How do we ensure our deck is being shuffled? Are we adding cards to the deck or removing them?