Monday, April 30, 2018

Zoroastrian genetic origins revisited

About a year ago I found that the ancestry of present-day Iranians was best explained as largely a mixture between early Anatolian and Iranian farmers and Sarmatians from the Pontic-Caspian steppe (see here).

Things have now changed somewhat after the release of several hundred ancient samples from across Eurasia. Below are the best qpAdm models that I was able to find for various Iranian ethnic/regional populations based on my new dataset.

Ganj_Dareh_N 0.363±0.031
Hajji_Firuz_ChL 0.481±0.029
Karagash_MLBA 0.156±0.019
tail: 0.753635

Ganj_Dareh_N 0.056±0.042
Hajji_Firuz_ChL 0.883±0.039
Karagash_MLBA 0.061±0.027
tail: 0.862141

Ganj_Dareh_N 0.598±0.048
Hajji_Firuz_ChL 0.244±0.045
Karagash_MLBA 0.158±0.030
tail: 0.604908

Dashti_Kozy_BA 0.143±0.025
Ganj_Dareh_N 0.286±0.034
Hajji_Firuz_ChL 0.571±0.029
tail: 0.994129

Ganj_Dareh_N 0.309±0.035
Hajji_Firuz_ChL 0.556±0.029
Yamnaya_Samara 0.134±0.019
tail: 0.383344

Ganj_Dareh_N 0.279±0.045
Hajji_Firuz_ChL 0.600±0.048
Yamnaya_Samara 0.073±0.048
West_Siberia_N 0.048±0.033
tail: 0.413456

Ganj_Dareh_N 0.417±0.033
Hajji_Firuz_ChL 0.464±0.031
Karagash_MLBA 0.120±0.020
tail: 0.777933

Bustan_BA 0.352±0.053
Dashti_Kozy_BA 0.168±0.031
Hajji_Firuz_ChL 0.480±0.036
tail: 0.921955

However, all of the Iranian groups still score a fair amount of ancient steppe ancestry, with the Zoroastrians ahead of the rest. This is potentially important, because the Zoroastrians are basically a relict population from pre-Islamic Persia, so their higher score might be betraying stronger ties to pre-Turkic, early Indo-Iranian Central Asia relative to the other Iranians. Also worth noting:

- As far as I can see, the Zoroastrians are the only Iranians in this analysis that really benefit from the addition of a Bactria Margiana Archaeological Complex (BMAC) reference population to their model, which might also be important, for the same reason outlined above

- There's no point modeling most of the Iranian groups as partly of Western Siberian forager (West_Siberia_N) origin, except perhaps the Mazandarani Iranians

- Indeed, Mazandarani Iranians are also the only group better modeled as part Yamnaya rather than Steppe_MLBA, which might be explained by Yamnaya-related incursions into what is now Northwestern Iran during the Early Bronze Age (see here)

- No matter what, I can't find a working model (P-value >0.05) for the Bandari Iranians using the new set of right pops aka outgroups, probably because the Bandaris harbor recent admixture from outside of Iran, including from Africa
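The screening used above (a "working" model needs a tail P-value above 0.05) can be sketched in a few lines of Python, together with the common convention that no mixture weight should be negative. This is a hypothetical helper for reading pasted qpAdm output, not part of qpAdm itself (which is run from ADMIXTOOLS). The rule that flags a source as weak when its weight sits within two standard errors of zero is an assumption, offered as one simple way to see why a minor component such as West_Siberia_N adds little to a model.

```python
from dataclasses import dataclass


@dataclass
class QpAdmModel:
    """Hypothetical container for one qpAdm result as printed in this post."""
    sources: dict[str, tuple[float, float]]  # source name -> (weight, standard error)
    tail: float  # tail P-value of the model


def is_working(model: QpAdmModel, alpha: float = 0.05) -> bool:
    """A model 'works' if its tail P-value exceeds alpha and no weight is negative."""
    return model.tail > alpha and all(w >= 0 for w, _ in model.sources.values())


def weak_sources(model: QpAdmModel, z_cutoff: float = 2.0) -> list[str]:
    """Sources whose weight lies within z_cutoff standard errors of zero (assumed heuristic)."""
    return [name for name, (w, se) in model.sources.items() if w / se < z_cutoff]


# The four-way model with West_Siberia_N from the output above.
model = QpAdmModel(
    sources={
        "Ganj_Dareh_N": (0.279, 0.045),
        "Hajji_Firuz_ChL": (0.600, 0.048),
        "Yamnaya_Samara": (0.073, 0.048),
        "West_Siberia_N": (0.048, 0.033),
    },
    tail=0.413456,
)
print(is_working(model), weak_sources(model))  # True ['Yamnaya_Samara', 'West_Siberia_N']
```

The model passes the P-value test, but both of its minor steppe/forager components are statistically hard to distinguish from zero on their own.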

On a related note, there's yet another feature in the Indian media about the impending publication of ancient DNA from the Harappan burial site at Rakhigarhi (see here). I've lost count of how many articles like this I've read over the last few years. But unlike the rest, this one actually reveals some specific information about the results: no Y-haplogroup R1a and no steppe ancestry in the Harappan sample or samples. So this time, I'd say that we're only days or weeks away from the publication of the relevant paper.

My final prediction in this context is that we'll see an ancient genome, or, hopefully, genomes, basically identical to the Indus_Periphery samples from Narasimhan et al. 2018 (see here). And then, apart from a few crazy people still shouting online that we need many more Harappan genomes because almost anything is still possible, it'll be game over.

Friday, April 27, 2018

The mystery of the Sintashta people

During the Middle to Late Bronze Age, the steppes southeast of the Ural Mountains, in what is now Russia, were home to communities of metallurgists who buried their warriors with horses and the earliest examples of the spoked-wheel battle chariot.

We don't know what they called themselves, because they didn't leave any written texts, but their archaeological culture is commonly known as Sintashta. It was named after a river near one of their main settlements, an elaborate fortified town that has also been described as an ancient metallurgical industrial center. Another of their well-known settlements, very similar to Sintashta, is Arkaim, pictured below courtesy of Wikipedia.

Sintashta is arguably one of the coolest ancient cultures ever discovered by archaeologists. It's also generally accepted to be the Proto-Indo-Iranian culture, and thus linguistically ancestral to a myriad of present-day peoples of Asia, including Indo-Aryans and Persians. No wonder, then, that its origin, and that of its population, have been hotly debated issues.

The leading hypothesis based on archaeological data is that Sintashta is largely derived from the more westerly and warlike Abashevo culture, which occupied much of the forest steppe north of the Black and Caspian Seas. In turn, Abashevo is usually described as an eastern offshoot of the Late Neolithic Corded Ware Culture (CWC), which is generally seen as the first Indo-European archaeological culture in Northern Europe (see here).

Below is a Principal Component Analysis (PCA) featuring 38 Sintashta individuals from the recent Narasimhan et al. 2018 preprint. Note that the main Sintashta cluster overlaps almost perfectly with the main CWC cluster. The relevant datasheet is available here.

Moreover, many ancient and present-day South and Central Asians, particularly those identified with or speaking Indo-Iranian languages, appear to be strongly attracted to the main Sintashta cluster, forming an almost perfect cline between this cluster and the likely Indus Valley diaspora individuals who show no evidence of steppe ancestry.

This is in line with mixture models based on formal statistics showing significant Sintashta-related ancestry in Indo-Iranian-speakers (for instance, see here), and high frequencies of Y-haplogroup R1a-Z93 in both the Sintashta and many Indo-Iranian-speaking populations.

Some of the Sintashta samples are outliers from the main Sintashta cluster, and that's because they harbor elevated levels of ancestry related to the Mesolithic and Neolithic foragers of Eastern Europe and/or Western Siberia. This is especially true of a pair of individuals who belong to Y-haplogroup Q. However, this doesn't contradict archaeological data, which suggest that the Sintashta community may have been multi-cultural and multi-lingual. Indeed, it's generally accepted based on historical linguistics data that there were fairly intense contacts in North Eurasia between the speakers of Proto-Indo-Iranian, Proto-Uralic and Yeniseian languages.

Thus, it appears that there's not much left to debate because ancient DNA has seemingly backed up the most widely accepted hypotheses about the origin of Sintashta and its people, and their identification mainly as Proto-Indo-Iranian-speakers.

However, a sample from a Sredny Stog II culture burial on the North Pontic steppe, in what is now eastern Ukraine, has complicated matters somewhat. This individual, known as Ukraine_Eneolithic I6561, not only clusters very strongly with the most typical Sintashta samples, but also belongs to Y-haplogroup R1a-Z93. On the other hand, none of the CWC remains sequenced to date belong to this particular subclade of R1a (although, obviously, they do belong to a host of near and far related R1a subclades).

I've never seen anyone worth reading propose that Sintashta might derive from Sredny Stog II instead of Abashevo. And no wonder, because Sredny Stog II was long gone when Sintashta appeared in the archaeological record.

However, if CWC remains continue to fail to produce R1a-Z93, while, at the same time, the steppes of eastern Ukraine and surrounds are shown to be a hotbed of R1a-Z93 from the Sredny Stog to the Sintashta periods, which I think is possible, then ancient DNA might well force a serious re-examination of how the awesome Sintashta culture and people came to be.

Sunday, April 22, 2018

Likely Yamnaya incursion(s) into Northwestern Iran

Despite being stratigraphically dated to 5900-5500 BCE (i.e. the Chalcolithic period), ancient sample Hajji_Firuz I2327 from Narasimhan et al. 2018 belongs to Y-haplogroup R1b-Z2103 and shows minor, but unambiguous, Yamnaya-related ancestry on the autosomes. Why is this a problem? Because both R1b-Z2103 and the Yamnaya culture are dated to the Bronze Age, and Yamnaya samples from Kalmykia and Samara are exceptionally rich in R1b-Z2103.

Hence, pending a successful radiocarbon (C14) dating analysis, it seems unlikely that Hajji_Firuz I2327 was alive during the Chalcolithic. Rather, it appears that he's partly of Yamnaya origin and has been wrongly dated. His remains are likely to be from a secondary burial from the Bronze Age that collapsed into the layer below, right into a Chalcolithic bin ossuary burial full of much older bones.

This scenario is strongly corroborated by data from two other ancient individuals from what is now Northwestern Iran:

- Hajji_Firuz_BA I4243 (also from Narasimhan et al. 2018 and from the same site as Hajji_Firuz I2327) was initially also stratigraphically dated to the Chalcolithic, but is now labeled as a Bronze Age sample after a radiocarbon (C14) analysis of the remains revealed a date of 2465-2286 calBCE. Moreover, this individual packs around 50% Yamnaya-related ancestry.

- Iran_IA F38 (from Broushaki et al. 2016) from an Iron Age burial at Tepe Hasanlu, which is just a few miles from Hajji Firuz, also belongs to Y-haplogroup R1b-Z2103 and harbors some sort of steppe ancestry on the autosomes (see here).

Below is a Principal Component Analysis (PCA) showing how this trio compares in terms of genome-wide ancestry to C14-dated Chalcolithic samples from Hajji Firuz and the nearby Seh Gabi. The relevant datasheet is available here.

Clearly, they're shifted "north" relative to the Chalcolithic group and thus closer to the Eneolithic/Bronze Age steppe cluster, suggesting that they carry steppe ancestry that was missing, or at least much less pronounced, in the region before the Bronze Age. I can use qpAdm and Global25/nMonte to double-check this and also to estimate their levels of Yamnaya-related admixture more precisely.

Afanasievo 0.172±0.033
Hajji_Firuz_ChL 0.313±0.156
Seh_Gabi_ChL 0.515±0.158
tail: 0.668410201 (full output)

Hajji_Firuz_ChL 0.484±0.033
Yamnaya_Samara 0.516±0.033
tail: 0.26511852 (full output)

Considering the standard errors and statistical fits, qpAdm and Global25/nMonte have produced very similar results for both samples, which cannot be explained away as coincidental outcomes. I think these are signals of a population movement or movements from the Pontic-Caspian steppe into the South Caspian region, probably across the Caucasus, and most likely during the Bronze Age rather than the Chalcolithic.
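As I understand it, Global25/nMonte models a target's PCA coordinates as a weighted mix of source populations' coordinates, with the weights non-negative, summing to 1, and chosen to minimize the distance to the target. The Python sketch below is a simplified, deterministic stand-in that searches a grid of weights rather than reproducing nMonte's own Monte Carlo procedure; the function names and the 2-D coordinates in the usage line are hypothetical (real Global25 rows have 25 values).

```python
import math
from collections.abc import Iterator


def compositions(k: int, steps: int) -> Iterator[tuple[int, ...]]:
    """Yield every k-tuple of non-negative integers that sums to steps."""
    if k == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in compositions(k - 1, steps - first):
            yield (first,) + rest


def fit_mixture(
    target: list[float], sources: dict[str, list[float]], steps: int = 100
) -> tuple[dict[str, float], float]:
    """Grid-search weights (>= 0, summing to 1) minimizing Euclidean distance to target."""
    names = list(sources)
    best_weights: dict[str, float] = {}
    best_dist = math.inf
    for combo in compositions(len(names), steps):
        weights = [c / steps for c in combo]
        # Coordinates of the weighted mix of source populations, dimension by dimension.
        mix = [
            sum(w * sources[n][i] for w, n in zip(weights, names))
            for i in range(len(target))
        ]
        dist = math.dist(target, mix)
        if dist < best_dist:
            best_weights, best_dist = dict(zip(names, weights)), dist
    return best_weights, best_dist


weights, dist = fit_mixture([0.3, 0.2], {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]})
print(weights, dist)  # {'A': 0.5, 'B': 0.3, 'C': 0.2} 0.0
```

An exhaustive grid is only practical for a handful of sources at a 1% step, which is one reason a stochastic search like nMonte's is used for larger models.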

I don't have a clue who these people were. It's rather unlikely that they were the early Iranians, who probably arrived in the region from Central Asia during the Late Bronze Age or even Iron Age (for instance, see here). Perhaps they were the Hittites? Indeed, in his book In Search of the Indo-Europeans, archaeologist James Mallory suggested that the ancestors of the Hittites and other Anatolian-speakers entered the Near East via the Caucasus route:

Most arguments for an Indo-European invasion from the northeast concern the appearance of a new burial rite at the end of the fourth and through the third millennium BC. At that time, both north of the Black Sea and the Caucasus, burials on the Russian-Ukrainian steppe were typically placed in an underground shaft and covered with a mound (kurgan in Russian). Before 3000 BC there begin to appear in the territory of the indigenous Transcaucasian (Kuro-Araxes) culture somewhat similar burials such as the royal tomb of Uch-Tepe on the Milska steppe. As tumulus burials are previously unknown in this region, some would explain their appearance by an intrusion of steppe pastoralists who migrated through the Caucasus and subjugated the local Early Bronze Age culture. More importantly, a status burial inserted into a mound at the site of Korucu Tepe in eastern Anatolia has been compared with somewhat similar burials both in the Caucasus and the Russian steppe. The discovery of horse bones on several sites of east Anatolia such as Norsun Tepe and Tepecik are seen to confirm a steppe intrusion since, as mentioned earlier, the horse, long known in the Ukraine and south Russia, is not attested in Anatolia prior to the Bronze Age.

Another option, however, is that they belonged to some other extinct Indo-European group, such as the Gutians (see here). In any case, keep an eye out for more Bronze Age samples from this part of the world. I have a strong feeling that, unlike their Neolithic and Chalcolithic predecessors, they will be rich in steppe ancestry and R1b-Z2103.

Wednesday, April 18, 2018

Protohistoric Swat Valley peoples in qpGraph

If I were to add one thing to the Narasimhan et al. 2018 preprint, it'd be a series of uncomplicated qpGraph trees that back up, very simply and directly, the main conclusions in the manuscript. Such as this:

If some of you think that it's possible to show pretty much anything in these sorts of graphs, then you're wrong. For instance, it's not possible to swap West_Siberia_N for Sintashta, because the highest Z score usually blows out from almost nothing to well over five. And it's not possible to push Sintashta-related ancestry into Dravidian-speakers from South India. But if you think it is, then, by all means, have a go. The graph file is here.
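The "highest Z score" that qpGraph reports comes from comparing each f-statistic predicted by the graph with the one observed in the data, scaled by its standard error, and keeping the worst one. A minimal Python sketch of that comparison, assuming the observed values, fitted values and standard errors are already in hand; the numbers below are made up for illustration, and treating a worst |Z| above about 3 as a rejected graph is a common rule of thumb rather than a fixed rule.

```python
def worst_z(
    observed: list[float], fitted: list[float], std_errors: list[float]
) -> tuple[int, float]:
    """Return (index, Z) of the f-statistic whose Z = (fitted - observed) / SE is largest in magnitude."""
    zs = [(f - o) / se for o, f, se in zip(observed, fitted, std_errors)]
    worst = max(range(len(zs)), key=lambda i: abs(zs[i]))
    return worst, zs[worst]


# Made-up f-statistics: the third is badly fitted by the graph.
idx, z = worst_z([0.010, -0.002, 0.005], [0.011, -0.002, 0.0], [0.001, 0.001, 0.001])
print(idx, round(z, 2))  # 2 -5.0
print("graph rejected" if abs(z) > 3 else "graph fits")
```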

Friday, April 13, 2018

On the doorstep of India

One of the most remarkable discoveries in the recent Narasimhan et al. 2018 preprint has to be the presence of what are essentially Eastern European migrant populations within the Inner Asian Mountain Corridor (IAMC) during the Middle to Late Bronze Age (MLBA). Remarkable for so many reasons, but seemingly under-appreciated by a lot of people, judging by the online discussions that I've seen on the preprint, and even, I'd say, the authors themselves.

Narasimhan et al. labeled these groups as belonging to the "forest/steppe MLBA" complex (for instance, see the main figure from the preprint here). This is indeed what they are in terms of their genetic structure, but certainly not geography, because the IAMC is well south of the steppe. Thus, in my Principal Component Analysis (PCA) I'm going to label them as part of the "post-steppe herder expansion Turan" complex.

Strikingly, most of these people cluster with Bronze Age Eastern Europeans, and even some Bronze Age Central Europeans. They're also sitting very close to the more easterly present-day Slavic-speakers from Russia and Ukraine, and indeed closer to the bulk of the European cluster than some present-day Turkic and Uralic groups from the Volga-Ural region. Even I never predicted such an outcome. Sure, I was expecting to see ancient genomes from South Central Asia with some very heavy steppe influence, but not this. The relevant datasheet is available here.

Two of the MLBA IAMC individuals are from Kashkarchi in the Ferghana Valley, in what is now Uzbekistan, and basically on the doorstep of the Indian subcontinent. I've made special mention of them on the plot, and I've also highlighted a pair of individuals from the Bronze Age Central Asian sites of Gonur Tepe and Shahr-i Sokhta, who are, in all likelihood, unadmixed migrants from the Indus Valley (for more on that, see here).

It's surely not a coincidence that the ancient and present-day South Asians on the plot (including those from Pakistan's Swat Valley dated to the Iron Age) form an almost perfect cline between these two pairs of individuals. It's also surely not a coincidence that the MLBA IAMC groups are rich in Y-haplogroup R1a-M417, and in particular its R1a-Z93 subclade, which is today an especially frequent marker in Indo-European-speaking South Asians.
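One simple way to put numbers on a cline like this is to project each sample onto the straight line between the two end-point groups in PC space: the position t along the line (0 at one end, 1 at the other) and the perpendicular distance from it. The Python sketch below does that; reading t as a rough admixture fraction assumes that mixing is approximately linear in PC coordinates, which is only an approximation, and the function name and 2-D coordinates are hypothetical.

```python
import math


def project_onto_cline(
    point: list[float], start: list[float], end: list[float]
) -> tuple[float, float]:
    """Return (t, off_cline_distance) for a point relative to the line start -> end.

    t = 0 at start and 1 at end; the distance is measured perpendicular to the line.
    """
    direction = [e - s for s, e in zip(start, end)]
    offset = [p - s for s, p in zip(start, point)]
    length_sq = sum(d * d for d in direction)
    t = sum(o * d for o, d in zip(offset, direction)) / length_sq
    closest = [s + t * d for s, d in zip(start, direction)]
    return t, math.dist(point, closest)


t, off = project_onto_cline([0.5, 0.1], [0.0, 0.0], [2.0, 0.0])
print(round(t, 3), round(off, 3))  # 0.25 0.1
```

Samples that sit close to the line (small off-cline distance) are consistent with a two-way mix of the end points; a source that pulls samples well off the line would show up as a large distance.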

Forget about the pre-MLBA populations from the forests, steppe, or IAMC, like those represented by Dali_EBA; they're practically irrelevant to this story. How do I know? Because they have little to no impact on the above mentioned cline. And this can be easily verified with mixture models based on multiple Principal Components (PCs) and formal statistics (for instance, see here).

Clearly, many populations in South Asia, particularly those speaking Indo-European languages, derive the bulk of their steppe-related ancestry from the peoples of the MLBA IAMC, and/or their very close relatives. And if you do believe that this inference is just based on coincidences, then I'm sorry to say this, but obviously a new, much less mentally challenging, hobby or profession beckons. All the best with that.

Just to help put all of this in a geographic perspective, here's a topographical map of Eurasia. I've marked the location of the Ferghana Valley. The close relatives of Kashkarchi_BA most likely skirted their way around those winding high mountains and slipped into India via the Khyber Pass, which I've also marked on the map.

And the rest, as they say, is history, including the history described in the ancient Indo-Aryan Sanskrit texts known as the Vedas. I'm sure we'll soon be learning about these events in great detail when many more ancient samples from Pakistan and, hopefully, the first ancient samples from India, are published.


Narasimhan et al., The Genomic Formation of South and Central Asia, posted March 31, 2018, doi:

Wednesday, April 11, 2018

Bronze Age Central Asia: terra incognita no longer

I've updated my Global25 datasheets with the samples from the Narasimhan et al. 2018 preprint (look for these labels). Feel free to use this output for anything you like, and please show us the results in the comments below.

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)
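For anyone wanting to build their own population averages from the per-sample datasheet, here is a minimal Python sketch. It assumes each row is a label of the form Population:Sample followed by comma-separated coordinates; check that against the actual files before relying on it. The function name and the sample rows in the usage line are hypothetical.

```python
import csv
import io
from collections import defaultdict


def population_averages(text: str) -> dict[str, list[float]]:
    """Average the coordinates of all rows sharing the same 'Population' label prefix."""
    sums: dict[str, list[float]] = {}
    counts: defaultdict[str, int] = defaultdict(int)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        label, values = row[0], [float(v) for v in row[1:]]
        population = label.split(":", 1)[0]  # assumed "Population:Sample" labels
        if population not in sums:
            sums[population] = [0.0] * len(values)
        sums[population] = [s + v for s, v in zip(sums[population], values)]
        counts[population] += 1
    return {pop: [s / counts[pop] for s in totals] for pop, totals in sums.items()}


# Hypothetical rows with two coordinates each (real rows would have 25).
avgs = population_averages("Pop_A:s1,0.1,0.2\nPop_A:s2,0.3,0.4\nPop_B:s3,-0.1,0.0\n")
print({pop: [round(v, 3) for v in vals] for pop, vals in avgs.items()})
```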

Also, here's my Principal Component Analysis (PCA) of ancient West Eurasia featuring most of the new samples. Note the cline made up of ancient and present-day South Asians running from the likely Indus Valley diaspora individuals (from the Gonur Tepe and Shahr-i Sokhta archaeological sites, in present-day Turkmenistan and Iran, respectively) towards the Bronze Age steppe. The relevant datasheet is available here.

I have little doubt that these are indeed migrants from the Indus Valley Civilization (IVC). Their relatively unusual genetic structure - which includes ancestry from a West Eurasian ghost population that is inferred to have been exceedingly poor in Anatolian-related ancestry, as well as significant indigenous South Asian ancestry - leaves little scope for plausible alternatives. If you're wondering what they may have been doing so far north of the IVC, Frenez 2018 has a detailed discussion on the topic. From the paper:

An alternative and intriguing hypothesis is instead supported by significant archaeological and textual data from comparable socio-economic or geographical contexts, which suggest that the likely high commercial and ideological value of ivory and of the expertise required to carve it made also possible and economically profitable the presence in Central Asia of independent itinerant ivory carvers native to or trained in the Indus Valley. These itinerant artisans might have provided at the same time both the raw material and the unique skills to transform it into finished objects.


Moreover, the existence of itinerant ivory workers in ancient South Asia is also described in a few literary sources. The Guttila Jātaka mentions a group of ivory carvers who traveled from Benares to Ujjain to offer their products and skills to the local elites (Pal, 1978: 46), while a Buddhist Sanskrit Vinaya tells the story of an Indian master ivory carver who traveled “up to the land of the Yavanas”, most likely the Hellenistic Bactria, to put his superior expertise at the service of a renown local artist (Dwivedi, 1976: 19).

Citation: Frenez, D., Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia. Evidence from Gonur Depe (Margiana, Turkmenistan), Archaeological Research in Asia (2017),

