

Saturday, June 25, 2016

D-stats/nMonte open thread #3

For the latest datasheets with D-stats of the form D(Chimp,Columns)(Mbuti.DG,Rows), featuring samples from Lazaridis et al. 2016, see here, here and here.

Datasheets with D-stats of the form D(Chimp,Rows)(Mbuti.DG,Columns) are available here, here and here. D-stats 1 and 1b include Iran_Chalcolithic in both the rows and columns, while D-stats 3 and 3b have Eastern_HG in both the rows and columns.

The interesting question is, which of these sheets is the best for estimating admixture proportions, primarily in populations from West Eurasia?
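One way to frame that question is to treat each population's row of D-stats as a coordinate vector and ask which weighted average of source rows best reproduces a target row. Below is a minimal sketch of that idea, assuming the rows have already been read out of one of the datasheets. It uses a plain grid search over non-negative weights that sum to 1, not nMonte's own Monte Carlo routine, and the population names (Pop_A, Pop_B) and numbers are made up for illustration.

```python
from itertools import product
from math import sqrt

def fit_mixture(target, sources, step=0.01):
    """Grid-search non-negative weights summing to 1 that minimise the
    Euclidean distance between the target row and the weighted source rows."""
    names = list(sources)
    n_steps = round(1 / step)
    best = None
    # Enumerate integer weight counts for all but the last source;
    # the last source takes whatever is left over.
    for parts in product(range(n_steps + 1), repeat=len(names) - 1):
        if sum(parts) > n_steps:
            continue
        counts = list(parts) + [n_steps - sum(parts)]
        weights = [c / n_steps for c in counts]
        fitted = [sum(w * sources[n][i] for w, n in zip(weights, names))
                  for i in range(len(target))]
        dist = sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))
        if best is None or dist < best[0]:
            best = (dist, dict(zip(names, weights)))
    return best

# Hypothetical D-stat rows (not taken from the datasheets)
sources = {"Pop_A": [0.02, 0.05, -0.01], "Pop_B": [0.06, -0.02, 0.03]}
target = [0.6 * a + 0.4 * b for a, b in zip(sources["Pop_A"], sources["Pop_B"])]

dist, weights = fit_mixture(target, sources)
print(weights)  # {'Pop_A': 0.6, 'Pop_B': 0.4}
```

The grid step trades precision for speed; with three or more sources the number of grid points grows quickly, which is one reason tools in this space use random search or constrained least squares instead.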

Thursday, June 23, 2016

A moment of clarity

A lot of things now make so much more sense thanks to all of the recently published ancient DNA. For instance, in the Principal Component Analysis (PCA) below, South Central Asians (SC_Asia) finally look like a three-way mixture of Bronze Age steppe pastoralists, early farmers from Iran and surrounds, and indigenous South Asians, which is exactly what they are.

By the way, I also ran a global analysis but didn't get the chance to make a decent plot. However, the datasheet is available for download here. The samples are from a variety of recent DNA papers and freely available at the Reich Lab site here.

Wednesday, June 22, 2016

Yamnaya =/= Eastern Hunter-Gatherers + Iran Chalcolithic

The fully public version of the Lazaridis et al. 2016 dataset is now available for download at the Reich Lab website here. Many thanks to the authors for releasing their data before formal publication, and in fact apparently even before the end of the peer review.

As usual from this team, it's high quality stuff with hundreds of thousands of SNPs genotyped in most of the 45 ancient samples from the Near East. And to think that only a couple of years ago the idea of getting genome-wide data from even a single ancient individual from hot places like the Near East was just that, an idea.

I'm planning to do a lot with this data, but the first issue I want to tackle is the genetic structure of the Yamnaya pastoralists from the Early Bronze Age (EBA) European steppes.

Lazaridis et al. show that Early to Middle Bronze Age steppe groups, including Yamnaya, tagged by them as Steppe EMBA, are best modeled with formal statistics as a mixture of Eastern European Hunter-Gatherers (EHG) and Chalcolithic farmers from western Iran. The mixture ratios are 56.8/43.2, respectively.

However, they add that a model of Steppe EMBA as a three-way mixture between EHG, the Chalcolithic farmers and Caucasus Hunter-Gatherers (CHG) is also a good fit and plausible.
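The two-way model rests on simple arithmetic: an admixed group's allele frequencies are a weighted average of its sources' frequencies. A sketch of how that turns into an admixture estimate is below, using an f4-ratio. That's a simpler estimator than the formal modeling in Lazaridis et al., swapped in here purely for illustration. All frequencies are hypothetical, the target is built as an exact 56.8/43.2 mix, and ref_a stands in for a population that shares drift with the first source but not the second.

```python
def f4(a, b, c, d):
    """Average over SNPs of (a - b) * (c - d), from per-SNP allele frequencies."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def f4_ratio(x, b, c, a, o):
    """Share of target X's ancestry from the B side of a B/C mixture:
    alpha = f4(A, O; X, C) / f4(A, O; B, C), where A shares drift with B
    but not C, and O is an outgroup."""
    return f4(a, o, x, c) / f4(a, o, b, c)

# Hypothetical allele frequencies at five SNPs (not real data)
ehg_like  = [0.9, 0.2, 0.6, 0.1, 0.7]
iran_like = [0.3, 0.5, 0.1, 0.6, 0.4]
ref_a     = [0.8, 0.1, 0.7, 0.2, 0.6]
outgroup  = [0.5, 0.5, 0.5, 0.5, 0.5]

alpha = 0.568
target = [alpha * e + (1 - alpha) * i for e, i in zip(ehg_like, iran_like)]
print(round(f4_ratio(target, ehg_like, iran_like, ref_a, outgroup), 3))  # 0.568
```

Because the target here is an exact mix of the two sources, the ratio recovers alpha algebraically. With real data, the estimate is only as good as the choice of sources and reference populations, which is exactly where the EHG/Iran versus EHG/CHG debate lives.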

I've looked at the topic before and concluded that Yamnaya had to be in large part of CHG origin, with only minor admixture from early farmers, probably from Eastern Europe (see here and here). After having a chance to study the data from Lazaridis et al. 2016, I stand by my earlier results.

Below are a couple of TreeMix graphs featuring Yamnaya alongside a variety of modern and ancient groups, including several potentially relevant to its ancestry, such as Armenia_Chalcolithic and Iran_Chalcolithic from Lazaridis et al. 2016. The full output is available for download here.

Now, it is true that TreeMix is a temperamental algorithm. It can react in extreme ways to the types of samples chosen by the user, often showing results that might appear wrong, or at the very least counter-intuitive. On the other hand, my experience shows that it's also exceptionally effective at picking up and characterizing significant and relatively sudden pulses of admixture. Moreover, unlike modeling with formal stats, it's an unsupervised test.

Clearly, the graphs below are very much at odds with the claim that Yamnaya might be in large part of Iranian Chalcolithic or similar ancestry. As per my earlier tests, it appears to be overwhelmingly a mixture between EHG and CHG.

It's also important to note that the uniparental marker data in Lazaridis et al. firmly back up my TreeMix output, with the Steppe EMBA groups showing starkly different Y-chromosome and mitochondrial (mtDNA) haplogroups from the ancient samples from Iran.

Indeed, mtDNA haplogroup U7 is an excellent diagnostic marker for ancestry from the southern Caspian region, and, sure enough, it appears in the Iranian Chalcolithic set. Conversely, it's conspicuous by its absence from all Bronze Age steppe remains tested to date.
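To put a number on how telling an absence like this is, one can ask how likely it is to draw zero carriers from a population in which a marker sits at some frequency f. If the samples are treated as independent draws, that probability is (1 - f)^n. This is a simplification, since ancient samples are often related or come from only a few sites. The 10% frequency and the sample sizes below are hypothetical, not estimates from the steppe or Iranian data.

```python
def prob_zero_carriers(freq, n_samples):
    """Chance of seeing no carriers in n independent draws at carrier frequency freq."""
    return (1 - freq) ** n_samples

def samples_needed(freq, max_prob):
    """Smallest n for which the chance of zero carriers drops below max_prob."""
    if not (0 < freq <= 1 and 0 < max_prob <= 1):
        raise ValueError("freq and max_prob must be in (0, 1]")
    n = 0
    while prob_zero_carriers(freq, n) >= max_prob:
        n += 1
    return n

# Hypothetical: a marker at 10% frequency in the source population
for n in (10, 30, 60):
    print(n, round(prob_zero_carriers(0.10, n), 4))
# 10 0.3487
# 30 0.0424
# 60 0.0018
```

Under these assumptions, about 29 unrelated samples would be needed before a complete absence of a 10% marker becomes a less than 5% chance event.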

Admittedly, it's still extremely difficult to be precise about the source of the southern admixture in Yamnaya without lots of high quality samples from all over the steppe and surrounds. But Iran already looks like a highly unlikely proposition.

See also...

Indian genetic history in three simple graphs

Monday, June 20, 2016

Ancient genomes from the Himalayas

Open access at PNAS:

Abstract: The high-altitude transverse valleys [>3,000 m above sea level (masl)] of the Himalayan arc from Arunachal Pradesh to Ladakh were among the last habitable places permanently colonized by prehistoric humans due to the challenges of resource scarcity, cold stress, and hypoxia. The modern populations of these valleys, who share cultural and linguistic affinities with peoples found today on the Tibetan plateau, are commonly assumed to be the descendants of the earliest inhabitants of the Himalayan arc. However, this assumption has been challenged by archaeological and osteological evidence suggesting that these valleys may have been originally populated from areas other than the Tibetan plateau, including those at low elevation. To investigate the peopling and early population history of this dynamic high-altitude contact zone, we sequenced the genomes (0.04×–7.25×, mean 2.16×) and mitochondrial genomes (20.8×–1,311.0×, mean 482.1×) of eight individuals dating to three periods with distinct material culture in the Annapurna Conservation Area (ACA) of Nepal, spanning 3,150–1,250 y before present (yBP). We demonstrate that the region is characterized by long-term stability of the population genetic make-up despite marked changes in material culture. The ancient genomes, uniparental haplotypes, and high-altitude adaptive alleles suggest a high-altitude East Asian origin for prehistoric Himalayan populations.

Jeong et al., Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc, PNAS June 20, 2016, doi: 10.1073/pnas.1520844113

Saturday, June 18, 2016

Genetics of an early Neolithic pastoralist from western Iran (Gallego Llorente et al. preprint)

Just in at bioRxiv:

Abstract: The agricultural transition profoundly changed human societies. We sequenced and analysed the first genome (1.39x) of an early Neolithic woman from Ganj Dareh, in the Zagros Mountains of Iran, a site with early evidence for an economy based on goat herding, ca. 10,000 BP. We show that Western Iran was inhabited by a population genetically most similar to hunter-gatherers from the Caucasus, but distinct from the Neolithic Anatolian people who later brought food production into Europe. The inhabitants of Ganj Dareh made little direct genetic contribution to modern European populations, suggesting they were somewhat isolated from other populations in the region. Runs of homozygosity are of a similar length to those from Neolithic Anatolians, and shorter than those of Caucasus and Western Hunter-Gatherers, suggesting that the inhabitants of Ganj Dareh did not undergo the large population bottleneck suffered by their northern neighbours. While some degree of cultural diffusion between Anatolia, Western Iran and other neighbouring regions is possible, the genetic dissimilarity of early Anatolian farmers and the inhabitants of Ganj Dareh supports a model in which Neolithic societies in these areas were distinct.

Gallego Llorente et al., The genetics of an early Neolithic pastoralist from the Zagros, Iran, bioRxiv preprint, posted June 18, 2016, doi:
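Runs of homozygosity are stretches of the genome where both copies are identical, and their length distribution reflects past population size. Long runs point to recent inbreeding or bottlenecks, while their absence points to a larger population. Below is a toy sketch of how runs are called from a diploid genotype sequence. It assumes genotypes coded 0/1/2 at ordered SNP positions in base pairs and a hypothetical minimum run length. Real callers, such as PLINK's --homozyg, also tolerate occasional heterozygous or missing calls, and low-coverage genomes need special handling. This sketch ignores all of that.

```python
def find_roh(positions, genotypes, min_length_bp):
    """Return (start, end) positions of maximal runs of homozygous
    genotypes (0 or 2) spanning at least min_length_bp base pairs."""
    runs = []
    start = None
    last = None
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if start is None:
                start = pos  # open a new run
            last = pos
        else:
            # A heterozygous call (1) closes the current run
            if start is not None and last - start >= min_length_bp:
                runs.append((start, last))
            start = None
    # Close a run that reaches the end of the sequence
    if start is not None and last - start >= min_length_bp:
        runs.append((start, last))
    return runs

# Hypothetical genotypes at eight SNPs
positions = [100, 5_000, 20_000, 60_000, 90_000, 120_000, 150_000, 400_000]
genotypes = [0, 2, 0, 1, 2, 2, 0, 2]
print(find_roh(positions, genotypes, 50_000))  # [(90000, 400000)]
```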

The same individual, tagged as GD13a by Gallego Llorente et al., is also featured in the new Lazaridis et al. preprint as I1290 (see here). Not sure if there's much point in two different papers on the same sample, but at least the sequences are different.

In any case, here's a very interesting part from the paper dealing with the population history of South Asia:

It is possible that farmers related to GD13a contributed to the eastern diffusion of agriculture from the Near East that reached Turkmenistan (34) by the 6th millennium BP, and continued further east to the Indus Valley (35). However, detecting such a contribution is complicated by a later influx from Steppe populations with Caucasus Hunter-Gatherer ancestry during the Bronze Age. We tested whether the Western Eurasian component found in Indian populations can be better attributed to either of these two sources, GD13a and Kotias (a Caucasus Hunter-Gatherer), using D-statistics to detect gene flow into an ancestral Indian component (represented by the Onge). For all tests where a difference could be detected, Kotias acted a better proxy than GD13a (Fig. S9 and Table S6). This result implies that the majority of the West Eurasian component seen in India derives from the Bronze age migrations; this interpretation is supported by dating of last contact based on patterns of Linkage Disequilibrium (36).

Dating admixture events with Linkage Disequilibrium patterns is somewhat controversial, but what they're saying is more or less in agreement with what I've been bleating about on this blog for the last few years (for instance, see here). So I'm very happy to finally see others noticing the same thing.
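The logic of LD-based dating, as in tools like ALDER and ROLLOFF, is that admixture creates correlations between nearby markers that decay roughly as exp(-t·d). Here t is the number of generations since admixture and d is genetic distance in Morgans. The sketch below recovers t from a simulated, noiseless decay curve with a log-linear least-squares fit, then converts generations to years. The amplitude, the 120-generation date and the 29-year generation time are arbitrary assumptions. Real curves are noisy, include a background term, and are usually fitted nonlinearly, which is part of why these dates are debated.

```python
from math import exp, log

def fit_decay_rate(distances_morgans, ld_values):
    """Least-squares slope of log(LD) against distance. For LD = A*exp(-t*d)
    the slope is -t, so the negated slope estimates t in generations."""
    xs = distances_morgans
    ys = [log(v) for v in ld_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Simulated, noiseless curve: admixture 120 generations ago (hypothetical)
true_t = 120
d = [0.005 * i for i in range(1, 21)]  # 0.5 cM to 10 cM, in Morgans
ld = [0.002 * exp(-true_t * x) for x in d]

t_hat = fit_decay_rate(d, ld)
print(round(t_hat), "generations, ~", round(t_hat * 29), "years")  # 120 generations, ~ 3480 years
```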

However, I have to say that the Principal Component Analysis (PCA) in this paper is off the wall. Many of the ancient samples don't appear to cluster where they should. For instance, most of the European Hunter-Gatherers (HGs) are way too close to present-day samples, and in fact in some cases they're overlapping with them, which is just wrong. Note also the unusual elongated cluster formed by the Neolithic Anatolians, stretching all the way from where they should all cluster (south of the Sardinians), to where they really shouldn't (right next to the North Caucasians).

My bet is that the results are affected by missing markers, with the ancient samples with high rates of missing markers being pulled into the middle of the plot (towards 0.00 in both dimensions).
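The suspected mechanism is easy to illustrate. Suppose ancient samples are projected onto principal components with missing genotypes filled in by the SNP mean. That's an assumed shortcut here, and it's why smartpca users commonly switch on lsqproject for ancient samples. Every missing SNP then contributes zero to the centred score, so a sample's coordinates shrink toward 0 roughly in proportion to its missingness. The loadings and genotypes below are made up.

```python
def project(genotypes, means, loading):
    """Score on one PC: sum of loading * (genotype - SNP mean).
    Missing genotypes (None) are mean-imputed, so they add nothing."""
    score = 0.0
    for g, m, w in zip(genotypes, means, loading):
        if g is None:
            g = m  # mean imputation: the centred value becomes 0
        score += w * (g - m)
    return score

means = [1.0] * 8
loading = [0.35] * 8  # hypothetical equal loadings on 8 SNPs
full = [2, 2, 2, 2, 2, 2, 2, 2]

for missing in (0, 4, 6):
    sample = [None] * missing + full[missing:]
    print(missing, "missing ->", round(project(sample, means, loading), 2))
# 0 missing -> 2.8
# 4 missing -> 1.4
# 6 missing -> 0.7
```

The same sample drifts toward the middle of the plot as its missingness rises, even though its genotypes at the observed SNPs haven't changed.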

Below is a similar PCA that I ran using my dataset minus GD13a, which I don't yet have access to. Note the relatively tight clusters formed by all of the ancient populations. The European Hunter-Gatherers are distinct from present-day Europeans, while none of the Neolithic Anatolians (Anatolia_N) fall near the North Caucasians in dimension 2. This is of course in line with a wide range of formal stats.

Friday, June 17, 2016

The genetic structure of the world's first farmers (Lazaridis et al. preprint)

Huge one from the Laz at bioRxiv:

We report genome-wide ancient DNA from 44 ancient Near Easterners ranging in time between ~12,000-1,400 BCE, from Natufian hunter-gatherers to Bronze Age farmers. We show that the earliest populations of the Near East derived around half their ancestry from a 'Basal Eurasian' lineage that had little if any Neanderthal admixture and that separated from other non-African lineages prior to their separation from each other. The first farmers of the southern Levant (Israel and Jordan) and Zagros Mountains (Iran) were strongly genetically differentiated, and each descended from local hunter-gatherers. By the time of the Bronze Age, these two populations and Anatolian-related farmers had mixed with each other and with the hunter-gatherers of Europe to drastically reduce genetic differentiation. The impact of the Near Eastern farmers extended beyond the Near East: farmers related to those of Anatolia spread westward into Europe; farmers related to those of the Levant spread southward into East Africa; farmers related to those from Iran spread northward into the Eurasian steppe; and people related to both the early farmers of Iran and to the pastoralists of the Eurasian steppe spread eastward into South Asia.

Lazaridis et al., The genetic structure of the world's first farmers, bioRxiv preprint, posted June 16, 2016, doi:

And here's a list of the Y-chromosome haplogroups for the new samples in this paper:

Armenia_ChL (Chalcolithic Armenia)

I1407: L1a
I1632: L1a
I1634: L1a


I1635: R1b1-M415(xM269)

Iran_Mesolithic (Hotu Cave)

I1293: J(xJ2a1b3, J2b2a1a1)


I1945: P1(xQ, R1b1a2, R1a1a1b1a1b, R1a1a1b1a3a, R1a1a1b2a2a)

My guess here is that this is R2, and hopefully we shall see when the BAM files are released.

I1949: CT


I1671: G2a1(xG2a1a)

Iran_ChL (Chalcolithic Iran)

I1662: J(xJ1a, J2a1, J2b)
I1674: G1a(xG1a1)


I0861: E1b1b1b2(xE1b1b1b2a, E1b1b1b2b)
I1069: E1b1(xE1b1a1, E1b1b1b1)
I1072: E1b1b1b2(xE1b1b1b2a, E1b1b1b2b)
I1685: CT
I1690: CT


I0867: H2 (PPNB)
I1414: E(xE2, E1a, E1b1a1a1c2c3b1, E1b1b1b1a1, E1b1b1b2b) (PPNB)
I1415: E1b1b1 (PPNB)
I1416: CT (PPNB)
I1707: T(xT1a1, T1a2a) (PPNB)
I1710: E1b1b1(xE1b1b1b1a1, E1b1b1a1b1, E1b1b1a1b2, E1b1b1b2a1c) (PPNB)
I1727: CT(xE, G, J, LT, R, Q1a, Q1b) (PPNB)
I1700: CT (PPNC)


I1705: J1(xJ1a)
I1730: J(xJ1, J2a, J2b2a)
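The "(x...)" notation in the list above means the sample is positive for the named clade but negative for each listed subclade. For anyone wanting to work with these calls programmatically, below is a small hypothetical helper (parse_call is my own illustrative name, not part of any published tool). It splits a call into the positive clade, the excluded subclades and any trailing period tag such as (PPNB), and it tolerates stray spaces after the x.

```python
import re

# Hypothetical pattern: clade, optional "(x...)" exclusions, optional tag like "(PPNB)"
CALL_RE = re.compile(
    r"^(?P<clade>[^\s(]+)"               # positive clade, e.g. G2a1 or R1b1-M415
    r"\s*(?:\(x\s*(?P<excl>[^)]*)\))?"   # optional exclusions, e.g. (xG2a1a)
    r"\s*(?:\((?P<tag>[^)]*)\))?$"       # optional period tag, e.g. (PPNB)
)

def parse_call(text):
    """Return (clade, excluded_subclades, tag) for one haplogroup call."""
    m = CALL_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised haplogroup call: {text!r}")
    excl = [s.strip() for s in (m["excl"] or "").split(",") if s.strip()]
    return m["clade"], excl, m["tag"]

line = "I1727: CT(xE, G, J, LT, R, Q1a, Q1b) (PPNB)"
sample_id, call = line.split(": ", 1)
print(sample_id, parse_call(call))
# I1727 ('CT', ['E', 'G', 'J', 'LT', 'R', 'Q1a', 'Q1b'], 'PPNB')
```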

See also...

A moment of clarity

Monday, June 13, 2016

Another ancient genome from Iran coming soon

Very interesting abstract here from a recent genomics conference in London. I do have a few more details from this presentation, but I guess they're not yet online for a reason, so let's just wait until the paper comes out. Suffice to say, the data in this paper is going to be very useful in the Proto-Indo-European (PIE) homeland debate. I just hope that the mixture model chosen by the authors is really solid and doesn't leave too much to the imagination.

Iran is considered a pivotal region in the Fertile Crescent, occupying a central space between Africa and Eurasia, and has thus been extensively studied to infer the development of the earliest human civilizations and farming settlements. From a historical and cultural perspective, this region is also of great interest as the cradle of Zoroastrianism. With reported roots dating back to the second millennium BC in Iran, Zoroastrianism is one of the oldest religions in the world and is now mainly concentrated in India, Iran, and Southern Pakistan. In this work we present novel genotype data from present-day Zoroastrians from Iran and India, along with a high coverage (10x) early Neolithic sample from Iran (7,455-7,082 BC), comparing these samples to publicly available genome-wide genotypes from >200 modern and ancient groups worldwide to elucidate patterns of shared ancestry. We apply a novel Bayesian mixture model to represent the DNA from modern and ancient groups or individuals as mixtures of that from other sampled groups or individuals, using a haplotype-based approach that is more powerful than commonly-used algorithms. Our mixture model identifies which sampled groups are most related to one another genetically, reflecting shared common ancestry relative to other groups due to e.g. admixture (i.e. intermixing of genetically distinct groups) or other historical processes. Interestingly, analysis of ancestry patterns revealed strong affinities of the Neolithic Iranian sample to modern-day Pakistani and Indian populations, and particularly to Iranian Zoroastrians, in stark contrast to Neolithic samples from Europe. We also identify, describe and date recent admixture events in modern-day Iranian groups that have altered their current genetic make-up relative to these ancient origins.

Saioa López et al., The genetic landscape of Iran and the legacy of Zoroastrianism: Comparing haplotype sharing patterns among ancient and modern-day samples using a mixture model, poster presentation, Quantitative Genomics 2016, University College London (UCL)

See also...

Neolithic genome from Iran SMBE 2016 teaser

Thursday, June 9, 2016

The discrepancy

I posted this in the comments in the previous blog entry [here], but it hasn't received the attention that it deserves, so it's now getting an entry all of its own. I'm also posting Matt's reply, since he motivated me to look at the formal stats from Hofmanova et al. in more detail. I'd like to get to the bottom of this. Any ideas?

Davidski said...


How would you interpret these sets of f4 and D statistics?

The f4 stats are from Hofmanova et al., while the D stats were run by me. The first set of D stats uses the highest quality Anatolia Neolithic sample from Barcin from Mathieson et al. and CHG genotypes from Fu Q et al., and the second uses the same Barcin sample plus CHG genotypes from Jones et al.

Also, keep in mind that, as far as I can tell, the Barcin genomes from Hofmanova et al. and Mathieson et al. date to the same period.

f4 Corded_Ware_LN Bar8 Satsurblia Khomani -0.0367 -8.145
f4 Corded_Ware_LN Bar8 Kotias Khomani -0.0193 -3.437

f4 Spain_MN Bar8 Satsurblia Khomani -0.0327 -5.385
f4 Spain_MN Bar8 Kotias Khomani -0.0182 -3.136


D Corded_Ware_Germany BAR20_I0709 Satsurblia Khomani 0.0215 4.299
D Corded_Ware_Germany BAR20_I0709 Kotias Khomani 0.0205 4.408

D Iberia_MN BAR20_I0709 Satsurblia Khomani -0.0003 -0.06
D Iberia_MN BAR20_I0709 Kotias Khomani -0.0017 -0.339


D Corded_Ware_Germany BAR20_I0709 Satsurblia2 Khomani 0.0224 4.38
D Corded_Ware_Germany BAR20_I0709 Kotias2 Khomani 0.0226 5.168

D Iberia_MN BAR20_I0709 Satsurblia2 Khomani 0.0068 1.222
D Iberia_MN BAR20_I0709 Kotias2 Khomani 0 0.004

Clearly, something's horribly wrong. If I made a mistake, my apologies. But I'm pretty sure I didn't make any mistakes. I checked the datasets that I'm using for consistency with the f4 and D stats published in Mathieson et al. and Fu et al., so I can say with confidence that my D stats should not be much different from correctly run f4 stats using the same ancient samples.

June 8, 2016 at 7:06 PM

Matt said...

@ Davidski, yeah I see what's going on there with the D stats giving a result we would expect from previous work - Anatolia Neolithic and Iberia_MN equally related to CHG, Corded Ware more to CHG - with the stats from this paper being different - Bar8 being more related to CHG than Iberia_MN, and Bar8 even more strongly related to CHG relative to Corded_Ware, also implying Iberia_MN more related to CHG than Corded Ware is. I don't know that there's anything about f4 vs D stats themselves that would explain that difference, and as you say, yours are consistent with the previously published.

This is really stuff that should have come up and been resolved at the preprint stage. That's the whole point of the process!

June 9, 2016 at 12:13 AM

Update 12/06/2016: Here are f4 stats using the same data as for the D stats above. They look basically the same as the D stats.

Corded_Ware_Germany BAR20_I0709 Satsurblia Khomani 0.002073 4.294
Corded_Ware_Germany BAR20_I0709 Kotias Khomani 0.001997 4.402

Iberia_MN BAR20_I0709 Satsurblia Khomani -0.00003 -0.06
Iberia_MN BAR20_I0709 Kotias Khomani -0.000157 -0.34

Corded_Ware_Germany BAR20_I0709 Satsurblia2 Khomani 0.0021 4.388
Corded_Ware_Germany BAR20_I0709 Kotias2 Khomani 0.002031 4.971

Iberia_MN BAR20_I0709 Satsurblia2 Khomani 0.000613 1.221
Iberia_MN BAR20_I0709 Kotias2 Khomani 0.000002 0.004
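On Matt's point about whether the choice of f4 or D could explain the gap: for populations W, X, Y and Z, both statistics are built on the same per-SNP product (w - x)(y - z). f4 averages it, while D divides its sum by a positive normalising term. So for the same data and population order the two can differ in scale but not in sign, which is what the f4 stats in the update show. Their Z-scores also track closely, because those come from a block jackknife on the same numerator. A minimal sketch with made-up allele frequencies is below. The D denominator shown is the frequency-based form used for ADMIXTOOLS-style D-statistics, stated here as an assumption, and the jackknife is left out.

```python
def f4_stat(w, x, y, z):
    """Mean over SNPs of (w - x)(y - z)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z)) / len(w)

def d_stat(w, x, y, z):
    """Same numerator as f4 (summed), divided by a positive normaliser."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    return num / den

# Hypothetical frequencies, in the (W, X; Y, Z) order used in the tables above
w = [0.7, 0.4, 0.9, 0.2, 0.5]
x = [0.6, 0.5, 0.7, 0.3, 0.5]
y = [0.8, 0.3, 0.6, 0.1, 0.4]
z = [0.5, 0.5, 0.5, 0.5, 0.5]

f4_val, d_val = f4_stat(w, x, y, z), d_stat(w, x, y, z)
print(round(f4_val, 4), round(d_val, 4))
```

Swapping W and X flips the sign of both statistics together, so a sign disagreement between published f4 stats and independently run D stats on the same samples points to differences in the data or population labels, not to the choice of statistic.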

Monday, June 6, 2016

Comic relief from Hofmanova et al. at PNAS

PNAS has a new paper on the Neolithic transition in Europe. I don't know what the authors were puffing on when they computed the inferred mixture coefficients, but they look like crap, with Loschbour-related admixture (in other words, indigenous European ancestry) peaking near the Caspian Sea, including among North Caucasians and Kalmyks. The Kalmyks today live just northwest of the Caspian, but they're recent migrants to Europe from Mongolia.

Moreover, western Turks appear to show fairly even ratios of Loschbour and early Aegean farmer admixture, which is also strange.

I pointed out this problem when the paper was posted for review at bioRxiv (see here), but my comments were ignored. The co-authors responsible for this analysis are Lucy van Dorp, Saioa Lopez and Garrett Hellenthal. The paper was edited by Eske Willerslev of the University of Copenhagen. Holy shit!

Zuzana Hofmanová et al., Early farmers from across Europe directly descended from Neolithic Aegeans, PNAS June 6, 2016, doi: 10.1073/pnas.1523951113

See also...

The discrepancy

Friday, June 3, 2016

Neolithic genome from Iran SMBE 2016 teaser

Update 18/06/2016: Genetics of an early Neolithic pastoralist from western Iran (Gallego Llorente et al. preprint)


The paper is probably coming very soon. This is an abstract of an SMBE 2016 talk to be held in four weeks (emphasis is mine):

The shift from hunter-gathering to food production, the so-called Neolithic Revolution, profoundly changed human societies. Whilst much is known about the mode of spread of people and domesticates into Europe during the Neolithic period, the origin of this cultural package in the Ancient Near East and Anatolia is poorly understood. By sequencing the whole genome (1.39x) of an early Neolithic woman from Ganj Dareh, in the Zagros Mountains of Iran, we show that the eastern part of the Ancient Near East was inhabited by a population genetically most similar to hunter-gatherers from the Caucasus but distinct from the Neolithic Anatolian people who later brought food production into Europe. Despite their key role in developing the Neolithic package, the inhabitants of Ganj Dareh made little direct genetic contribution to modern European populations, suggesting they were somewhat isolated from other populations in this region. Their high frequency of short runs of homozygosity, comparable to other early Neolithic farmers, suggests that they overwintered the Last Glacial Maximum in a climatically favourable area, where they may have received a genetic contribution from a population basal to modern Eurasians. Thus, the Neolithic package was developed by at least two genetically-distinct groups which coexisted next to each other, implying a degree of cultural yet little genetic exchange among them.

Gallego Llorente et al., The Neolithic Revolution developed among geographically adjacent but genetically distinct populations, oral presentation (35146), Society for Molecular Biology and Evolution Conference 2016

See also...

Another ancient genome from Iran coming soon

On crop dispersal in prehistoric Central Asia