Update 20/02/2013: Ancient Amerindian-like admixture in Europe - something doesn't add up
There are a lot of people out there who believe that programs like ADMIXTURE and STRUCTURE can accurately measure exotic admixtures in their genomes. But this is not true.
If run properly, and with the right reference samples, ADMIXTURE and STRUCTURE do indeed show very high accuracy in classifying the ethnic origins of individuals, even at intra-national level. However, this process relies on finding relative differences between the samples in the given run, based on modern allele frequencies, and doesn’t usually provide true admixture rates.
So the fact that someone gets, say, 100% European and 0% East Asian, doesn’t mean they don’t have East Asian admixture. That’s because the European cluster is unlikely to be purely European, but rather a composite of all the things that make up modern Europeans.
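To put some numbers on that, here is a back-of-the-envelope sketch in Python. The 5% figure, the cluster labels and the function name are all invented for illustration; none of them is an estimate from any study. It shows only that a 100% score for a composite cluster is compatible with a non-zero share of whatever that cluster is made of.

```python
def implied_ancestry(cluster_membership, cluster_composition):
    """Combine one person's cluster memberships with the (hypothetical)
    deeper make-up of each inferred cluster."""
    totals = {}
    for cluster, share in cluster_membership.items():
        for source, fraction in cluster_composition[cluster].items():
            totals[source] = totals.get(source, 0.0) + share * fraction
    return totals

# What an ADMIXTURE-style run might report for one individual.
reported = {"European": 1.0, "East Asian": 0.0}

# Assumed make-up of the clusters themselves (made-up numbers).
composition = {
    "European": {"West Eurasian": 0.95, "East Asian-related": 0.05},
    "East Asian": {"East Asian-related": 1.0},
}

result = implied_ancestry(reported, composition)
print(result)
```

The software never sees the second table, which is exactly the problem: it can only report membership in the clusters it infers from the samples in the run.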
I recently e-mailed David Reich, a well-known population geneticist, asking him to give me some tips on how to find “true” levels of East Asian ancestry in Northern Europeans using the ADMIXTURE software. He was kind enough to reply, but basically said that he couldn’t help because he wasn’t an expert on ADMIXTURE. Also, he said that ADMIXTURE wasn’t a formal mixture test, and thus could easily give false results.
The quote below, from one of David Reich’s studies, Reconstructing Indian population history, explains the concept of the formal mixture test in more detail.
We developed a model to study the historical relationship of Indian groups to those worldwide, on the basis of the hypothesis that most groups can be approximated as a mixture of two ancestral populations followed by group-specific drift. To fit the model to the data, we computed the squared allele frequency difference between all pairs of groups, and chose parameters by minimizing the difference between observation and expectation (Supplementary Note 4). The idea of fitting allele frequency differentiation to historical models was first explored by Cavalli-Sforza and Edwards, and here we extend it to trees with mixture. This approach contrasts with the STRUCTURE algorithm, which fits data without a tree, or a tree in which many groups split simultaneously from an ancestral population followed by mixture. Although STRUCTURE is accurate for estimating individual mixture proportions in recently mixed groups, it is not clear whether its estimates of ancient mixture are biased because it does not model hierarchical relationships among groups, which could lead to inaccurate estimates of allele frequencies in ancestral populations. In contrast, we use a more realistic tree model, and provide a test of fit.
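The basic quantity in that quote, the squared allele frequency difference between pairs of groups, is easy to sketch. The snippet below is my own toy illustration, not the authors' code: the frequencies are invented, the function names are mine, and it ignores the sample-size corrections that a real analysis would need. The model-fitting step is reduced to a single misfit function that compares observed values with whatever a candidate tree predicts.

```python
from itertools import combinations

def f2(freqs_a, freqs_b):
    """Mean squared allele frequency difference between two groups."""
    return sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)

# Toy allele frequencies at four SNPs (invented values).
groups = {
    "A": [0.10, 0.50, 0.80, 0.30],
    "B": [0.20, 0.40, 0.70, 0.30],
    "C": [0.60, 0.10, 0.20, 0.90],
}

# Observed statistic for every pair of groups.
observed = {
    pair: f2(groups[pair[0]], groups[pair[1]])
    for pair in combinations(groups, 2)
}

def misfit(expected):
    """Sum of squared gaps between observed and model-expected values.
    A fitting routine would search for the parameters that minimize this."""
    return sum((observed[pair] - expected[pair]) ** 2 for pair in observed)

for pair, value in observed.items():
    print(pair, round(value, 4))
```

A model with mixture would supply the expected values as a function of branch lengths and mixture proportions; how well the best-fitting parameters reproduce the observed values is what makes the approach a test rather than just a description.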
Another recent paper, The History of African Gene Flow into Southern Europeans, Levantines, and Jews, directly compared the results from a formal mixture test to those from STRUCTURE. Note, for instance, the large discrepancy between the Sub-Saharan admixture scores obtained for the Sardinian sample: 2.9 from the formal test vs. 0.2 from STRUCTURE.
Indeed, it seems we can’t really be sure of results from PCA and MDS plots either. Here’s a quote from David Reich’s latest article, Reconstructing Native American population history.
In the Saqqaq genome paper, the authors co-analyzed the data they collected with data from diverse present-day populations from Siberia and the America. Based on the patterns that they observed in Principal Component Analysis, they argued that the Saqqaq have ancestry from a different stream of gene flow into America than Eskimo-Aleut speakers, Na-Dene speakers, and Southern Native Americans. However, this is not a formal test: the failure to cluster together in the first few principal components does not necessarily imply that populations are unrelated; just that they do not share much genetic drift on their common ancestral lineage.
This brings me to the last point, which is that there are some major surprises on the way about the genetic origins and structure of Europeans, because it seems we’ve learned very little from non-formal mixture analyses to date.
That latest David Reich paper mentions that all Europeans carry East/Central Asian admixture, with Northern Europeans having more of it than Sardinians. Remarkably, it also says that unadmixed non-Arctic rather than unadmixed Arctic Native Americans are genetically closer to Europeans, and this is due to the aforementioned Asian admixture in Europeans. The quotes below come from the supplementary information to the study.
A complication in computing this statistic is that Native American, Siberian, and East Asian populations are not all equally genetically related to West Eurasian populations, as we can see empirically from 4 Population Tests of the proposed tree (Yoruba, (French, (East Asian, Native American))) failing dramatically whether the East Asian population is Han, Chukchi, Naukan and Koryak. The explanation for this is outside the scope of this study (it has to do with admixture events in Europe, as we explain in another paper in submission). In practice, however, it means that we cannot simply use a European population like French to represent West Eurasians in Equation S3.2, since if we do this, Equation S3.2 may have a non-zero value for a Native American population, even without recent European admixture.
To address this complication, we took advantage of the fact that east/central Asian admixture has affected northern Europeans to a greater extent than Sardinians (in our separate manuscript in submission, we show that this is a result of the different amounts of central/east Asian-related gene flow into these groups). To quantify this, we computed the statistic f4(San, West Eurasian; Pop1, Pop2) for West Eurasian = Sardinian and West Eurasian = French, and for 24 Siberian and Native American populations (Pop1 and Pop2) (Figure S3.2). Figure S3.2 shows a scatterplot for all 190=20×19/2 possible pairs of these populations. Within non-Arctic Native populations, and within Arctic populations (East Greenland Inuit, Chukchi, Naukan and Koryak), the statistics are close to zero, consistent with their being (approximate) clades relative to West Eurasians. In contrast, there are deviations from zero when the comparisons are between non-Arctic Native and Arctic populations, with non-Arctic Native populations showing consistent evidence of being genetically closer to West Eurasians.
The observation of non-zero statistics when one of the Native populations is Arctic and the other is a more southern Native American population is a complication, since we would like Ancestry Subtraction to work not just for southern Native American populations, but also for northern North Americans who have inherited genetic material from multiple streams of Asian migration. However, the fact that Sardinian statistics are smaller than the French statistics by a constant factor (0.75), allows us to adjust for this difference by regression. Specifically, we can compute a linear combination S2 of the French and Sardinian statistics that subtracts out the effect of central/east Asian gene flow into West Eurasians and has an expected value of zero.
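For readers who want to see the mechanics, here is a rough Python sketch of the two ideas in those quotes. The f4 statistic is the average over SNPs of the product of two allele frequency differences. Everything else is my assumption: the frequencies are invented, the Sardinian value is simply set to 0.75 times the French one to mimic the quoted constant factor, and the function I call adjusted is just one linear combination with the stated property, not necessarily the S2 defined in the supplement.

```python
from itertools import combinations

def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(p_a)
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n

# Unordered pairs among 20 populations, matching the quoted 20 x 19 / 2.
n_pairs = len(list(combinations(range(20), 2)))

# Invented allele frequencies at three SNPs.
san = [0.5, 0.2, 0.9]
french = [0.1, 0.6, 0.5]
pop1 = [0.3, 0.4, 0.6]
pop2 = [0.2, 0.5, 0.5]

f4_french = f4(san, french, pop1, pop2)

# Assumption: the Sardinian statistic is exactly 0.75 of the French one.
f4_sardinian = 0.75 * f4_french

def adjusted(f4_sard, f4_fr, factor=0.75):
    """Hypothetical combination that cancels a signal shared by both
    statistics in the ratio `factor`."""
    return f4_sard - factor * f4_fr

print(n_pairs, f4_french, adjusted(f4_sardinian, f4_french))
```

If the only thing pushing the statistics away from zero is the Asian-related gene flow into West Eurasians, a combination like this comes out at zero, so anything left over can be put down to something else, such as recent European admixture in the Native American sample.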
Now, unless I’ve misread what’s being said there, it seems that Europeans mostly carry the type of East Asian ancestry that was present in the first human migration wave from Asia to the New World, which moved across the Bering Strait about 15,000 years ago.
So, did a migration wave from the same source also move into Europe at about the same time? If so, this would indicate that the East Asian admixture in Europeans found by David Reich is very old. Perhaps that’s why it’s not possible to measure it accurately using standard ancestry tools?
Reich D, et al. (2009) Reconstructing Indian population history. Nature 461 (24 September 2009). doi:10.1038/nature08365
Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, et al. (2011) The History of African Gene Flow into Southern Europeans, Levantines, and Jews. PLoS Genet 7(4): e1001373. doi:10.1371/journal.pgen.1001373
Reich D, et al. (2012) Reconstructing Native American population history. Nature. doi:10.1038/nature11258