Monday, January 11, 2016
The Poltavka outlier
Anyone who still thinks that Y-chromosome haplogroup R1a originated in South Asia should burn this map into their brains. It'll come in useful over the next few years as we learn from ancient DNA about the conquest of the Indian subcontinent, and indeed much of Asia, by pastoralists from the western Russian and Ukrainian steppes.
X marks the spot of the burial site of Poltavka sample I0432 from the Mathieson et al. 2015 dataset. This individual belongs to Y-chromosome haplogroup R1a-Z93(Z94+), which today accounts for well over 90% of the R1a lineages in Asia and peaks in frequency at over 60% in the northern parts of South Asia.
Moreover, the dating of his burial site, 2925-2536 calBCE, suggests that he lived not long after the Z93 and Z94 mutations came into existence. That's because Z93 doesn't appear to be much older than 5,000 years based on full Y-chromosome sequence data (see here and here, including the comments).
So I0432 could well turn out to be a crucial piece in the puzzle of the peopling of South Asia.
Interestingly, this individual was flagged as an outlier in the Poltavka sample set by Mathieson et al., hence his other moniker: the Poltavka outlier. However, this wasn't because of any ancestry from South or even Central Asia. In fact, it was because he was too western.
Principal Component Analyses (PCA) featuring a wide range of present-day and ancient samples from Europe and Asia, like the one below, show that Poltavka outlier clusters further west than most Corded Ware individuals from Germany.
In the past, using qpAdm, I modeled Poltavka outlier as 63.7% Yamnaya Samara and 36.3% German Middle Neolithic (see here). This is probably not very far from the truth, but qpAdm offers a supervised mixture test in which the results are heavily reliant on the choice of outgroups, so I thought I'd revisit the issue with TreeMix, which allows an unsupervised analysis.
In a dataset including seven relatively high coverage Copper Age (CA), Early Bronze Age and Middle Neolithic (MN) European genomes, TreeMix picked out Poltavka outlier as the most likely sample to be admixed, showing a mixture edge of 33% from the base of the branch leading to the Iberian MN individual to that of Poltavka outlier.
This outcome is very similar to my qpAdm model, but it suggests an even more western source of admixture in Poltavka outlier. Could this admixture actually be from Iberia? I wouldn't discount this possibility, considering the presence of Bell Beaker communities, possibly of Atlantic or even Iberian origin, as far east as present-day Poland. Indeed, according to Cassidy et al. 2015, German Beakers show high affinity to MN and CA Iberians (see page 51 in the supp info here).
I double checked my TreeMix result with D-stats, and yep, when placed in a clade with Poltavka or Samara Yamnaya, Poltavka outlier shows the strongest signal of admixture from the Iberia MN individual.
At the same time, however, the signal from the Early Neolithic (EN) Iberian fails to reach significance (Z<3). This suggests that TreeMix and D-stats might be seeing the Iberia MN sample as the most attractive mixture source because of her high level of Western European hunter-gatherer (WHG) ancestry, which Poltavka outlier also has plenty of, rather than because of anything specific to Iberia.
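For readers who haven't run D-stats themselves, here's a rough sketch of what the test and its Z-score boil down to. It assumes one common frequency-based form of D(W, X; Y, Z) and a plain, unweighted delete-one-block jackknife for the standard error. The function names and the toy allele frequencies are made up for illustration; this is not the ADMIXTOOLS code, and the numbers are not from the Mathieson et al. dataset.

```python
import math


def d_terms(w, x, y, z):
    """Per-SNP numerator and denominator of D(W, X; Y, Z) from allele frequencies."""
    num = (w - x) * (y - z)
    den = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num, den


def d_statistic(snps):
    """D over a list of (w, x, y, z) allele-frequency tuples (ratio of sums)."""
    num = den = 0.0
    for freqs in snps:
        n, d = d_terms(*freqs)
        num += n
        den += d
    return num / den if den else 0.0


def d_with_z(snps, n_blocks=5):
    """Return (D, standard error, Z) using a simple delete-one-block jackknife.

    Assumption: SNPs are already ordered along the genome, so contiguous
    slices stand in for genomic blocks.
    """
    if len(snps) < n_blocks:
        raise ValueError("need at least one SNP per block")
    d_all = d_statistic(snps)
    size = len(snps) // n_blocks
    leave_one_out = []
    for b in range(n_blocks):
        start = b * size
        # The last block absorbs any leftover SNPs.
        end = (b + 1) * size if b < n_blocks - 1 else len(snps)
        leave_one_out.append(d_statistic(snps[:start] + snps[end:]))
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((t - mean) ** 2 for t in leave_one_out)
    se = math.sqrt(var)
    z = d_all / se if se > 0 else float("nan")
    return d_all, se, z


# Hypothetical toy frequencies, purely illustrative.
toy = [
    (0.9, 0.1, 0.8, 0.2),
    (0.7, 0.3, 0.6, 0.1),
    (0.2, 0.6, 0.5, 0.4),
    (0.8, 0.4, 0.9, 0.3),
    (0.5, 0.5, 0.7, 0.2),
    (0.6, 0.2, 0.3, 0.1),
]
d, se, z = d_with_z(toy, n_blocks=3)
print(f"D={d:.3f} SE={se:.3f} Z={z:.2f} significant={abs(z) >= 3}")
```

The |Z| >= 3 cutoff in the last line is the same rule of thumb I'm using above. Real runs use hundreds of thousands of SNPs and blocks defined by genetic distance, so treat this as a way to see the moving parts rather than something to run on actual data.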
In any case, it's clear enough that Poltavka outlier was the result of mixture between Yamnaya-related western steppe pastoralists and the descendants of Middle Neolithic Europeans with a high ratio of WHG ancestry. Where this admixture actually took place and which archaeological cultures were involved will have to be resolved with further sampling of ancient remains from Central and Eastern Europe.
However, it's already impossible to place the origin of Poltavka outlier anywhere in Asia, which suggests that both Z93 and Z94 are also from well inside the generally accepted borders of Europe.
This obviously has implications for the origins of the Indo-Iranians, because the widespread presence of these mutations in Asia gels very nicely with the idea, and indeed academic consensus, that Indo-Iranian languages expanded rapidly from the Eurasian steppe into Asia during the Bronze Age.
Considering that Poltavka outlier came from a Kurgan burial, and was therefore an individual of some social standing, he might be the direct ancestor of many millions of present-day Asians. If so, this won't be very difficult to prove in the near future as ancient DNA research revs up a few notches.
On a related note, apparently there's a paper on the way with ancient DNA results from Rakhigarhi, a Harappan site in Haryana, northern India (see here). As far as I know, the results will include Y-chromosome haplogroups of three males, but I don't think we'll see any decent genome-wide data at this stage. However, hopefully I'm wrong and the paper will come out with full ancient genomes.
Feel free to post your predictions in the comments. I'm tentatively expecting a couple of instances of J2 and maybe an L or H. Razib made basically the same prediction recently so I'm not being original. What I do know is that we won't see any R1a-Z93. The only way that might happen is if, say, someone coughed or sneezed on the Harappan remains.
Data source and reference...
Mathieson et al., Genome-wide patterns of selection in 230 ancient Eurasians, Nature, 528, 499–503 (24 December 2015), doi:10.1038/nature16152