AIMs Overestimate Admixture

September 8, 2014

AIMs (ancestry informative markers) are a subset of SNPs chosen for their informativeness about ancestry, and geneticists often use them instead of genome-wide data to save time and money. However, according to Galanter et al. (2010), relying on them can introduce errors and overestimates of admixture, especially when the panel of AIMs is very small:

Ancestry informative markers (AIMs) have been used as a cost-effective way to estimate individual ancestral proportions in admixed populations such as African Americans and Latinos. [...] We compared differences in ancestry estimated with different size AIMs panels with ancestry estimated from genomewide markers. [...] There was an inverse correlation between the number of AIMs used to estimate ancestry and mean and standard deviation of the error in ancestry estimation. Using AIMs, African ancestry was consistently overestimated, while the major ancestral component (European in Puerto Ricans and Native American in Mexicans) was systematically underestimated. Using 300 or fewer AIMs consistently produced a standard deviation of ancestry estimation error of 10% or greater. [...] There is both systematic bias resulting in overestimation of African ancestry (and underestimation of other continental ancestry) and random error. Such error is inversely proportional to the number of AIMs used.
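
To make the random-error part of that finding concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions, not the method used by Galanter et al.: two source populations, independent markers whose allele frequencies differ by a fixed hypothetical amount (`delta`), and a simple least-squares ancestry estimate. The function name and all parameter values are made up for illustration, and the resulting numbers are not comparable to the paper's figures. The toy model also does not attempt to reproduce the systematic overestimation of African ancestry that the authors report.

```python
import random
import statistics


def simulate_error_sd(n_markers, true_q=0.2, n_individuals=200, delta=0.4, seed=0):
    """Return the SD of (estimated - true) ancestry across simulated individuals.

    All parameters are hypothetical: `true_q` is the true ancestry fraction
    from population A, and `delta` is the allele-frequency gap between the
    two source populations at every marker.
    """
    rng = random.Random(seed)
    # Made-up allele frequencies for source populations A and B.
    p_a = [rng.uniform(0.1, 0.5) for _ in range(n_markers)]
    p_b = [p + delta for p in p_a]

    errors = []
    for _ in range(n_individuals):
        num = 0.0
        den = 0.0
        for pa, pb in zip(p_a, p_b):
            # Expected allele frequency for an individual with ancestry true_q from A.
            p_mix = true_q * pa + (1 - true_q) * pb
            # Genotype = number of allele copies (0, 1 or 2), two independent draws.
            g = sum(rng.random() < p_mix for _ in range(2))
            # Least-squares fit of g/2 = pb + q * (pa - pb).
            num += (g / 2 - pb) * (pa - pb)
            den += (pa - pb) ** 2
        # Ancestry is a proportion, so clamp the estimate to [0, 1].
        q_hat = min(1.0, max(0.0, num / den))
        errors.append(q_hat - true_q)
    return statistics.pstdev(errors)


if __name__ == "__main__":
    for m in (20, 100, 500):
        print(m, "markers -> error SD", round(simulate_error_sd(m), 3))
```

Under these assumptions the spread of the estimation error shrinks as markers are added, which is the general pattern the abstract describes. Real AIM panels differ in marker informativeness, linkage, and the number of source populations, so this should be read as an illustration of sampling noise only.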

Bauchet et al. (2007) found that even larger panels of AIMs, while somewhat more accurate, still lead to a loss of structure, and therefore an overestimation of admixture, compared with using the full SNP data set:

Using <1,200 EuroAIMs of the type available in this panel gradually leads to loss of consistent structure and a corresponding increase in misclassification of individual origins (fig. 7C).

While the number of AIMs used is clearly a major factor in the accuracy of results, another problem is that AIMs may not even be as informative about ancestry as their name suggests, according to Bolnick et al. (2007):

Furthermore, some of the most "informative" AIMs involve loci that have undergone strong selection, which makes it unclear whether these markers indicate shared ancestry or parallel selective pressures (such as similar environmental exposures in different geographic regions) or both.

Hopefully, these criticisms will get more notice, and geneticists will stop trying to cut corners by using these inferior markers to quantify individual ancestry.

Related: Overestimated Admixture in Brisighelli (2012)