

Tuesday, August 26, 2014

Guessing game

Whose results are these? Feel free to post your guesses in the comments section below. I'll reveal the answer and make the sample available online in a couple of days.

Eurogenes K15 results

North_Sea 5.81
Atlantic 0
Baltic 0.13
Eastern_Euro 32.22
West_Med 0
West_Asian 40.49
East_Med 0
Red_Sea 0
South_Asian 11.7
Southeast_Asian 0
Siberian 1.15
Amerindian 7.41
Oceanian 1.1
Northeast_African 0
Sub-Saharan 0

4 Ancestors Oracle results

1 MA-1+Tabassaran+Tabassaran+Tabassaran @ 7.771513
2 Kalash+MA-1+Tabassaran+Tabassaran @ 7.785069
3 Lezgin+MA-1+Tabassaran+Tabassaran @ 7.960974
4 Kalash+Lezgin+MA-1+Tabassaran @ 7.96793
5 Kalash+Kalash+MA-1+Tabassaran @ 8.119039

Full output
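For anyone wondering what the 4 Ancestors Oracle is doing, here's a rough sketch of the idea. It isn't the Oracle's actual code. Every combination of four reference populations is averaged with equal 25% weights, and each mix is ranked by its distance to the target's K15 proportions, so lower numbers (like the 7.77 above) mean better fits. I'm assuming a plain Euclidean distance here, and the function name `four_ancestors` and the toy input are made up for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def four_ancestors(target, refs, top=5):
    """Rank every 4-way, equal-weight mix of reference populations by its
    Euclidean distance to the target's admixture proportions.

    target: sequence of K component percentages (e.g. the 15 K15 values)
    refs:   dict mapping population name -> sequence of K percentages
    """
    target = np.asarray(target, dtype=float)
    names = sorted(refs)  # sorted input => each combo is listed alphabetically
    scored = []
    for combo in combinations_with_replacement(names, 4):
        # Each "ancestor" contributes 25%, so the mix is a plain average.
        mix = np.mean([np.asarray(refs[n], dtype=float) for n in combo], axis=0)
        scored.append(("+".join(combo), float(np.linalg.norm(target - mix))))
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:top]


if __name__ == "__main__":
    toy_refs = {"A": [100, 0, 0], "B": [0, 100, 0], "C": [0, 0, 100]}
    for rank, (mix, dist) in enumerate(four_ancestors([50, 25, 25], toy_refs), 1):
        print(rank, mix, "@", round(dist, 6))
```

With real data, `refs` would hold the K15 averages of each reference population, which is why the output above lists names like Kalash, Lezgin, MA-1 and Tabassaran in alphabetical order within each mix.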

Update 27/08/2014: OK, the sample is a composite of two Lezgins, a people from the Northeast Caucasus, and two Ancient North Eurasian (ANE) genomes from Upper Paleolithic Siberia: Mal'ta boy or MA-1 and Afontova Gora-2 or AG-2. It can be downloaded here.

I chose these two Lezgins because they showed higher-than-average levels of ANE ancestry (well over 30% in most tests). Basically, I wanted to see where a Lezgin-like individual with unusually high ANE, as well as a dab of WHG, would land on a Principal Component Analysis (PCA) or genetic map of West Eurasia. That's because I now believe that a population like this played a key role in the formation of the modern European gene pool during the early metal ages.

My rough estimate is that the composite genome is around 50% ANE, around 40% early European farmer (EEF), and a few per cent Western European hunter-gatherer (WHG). For a detailed description of these three ancestral components see here.

The outcome is very interesting, because it puts the composite more or less between the Maris and North Caucasians, which roughly translates to the Russo-Kazakh border. This is an area generally accepted to be part of the Proto-Indo-European (PIE) homeland, and fits with a recent theory that populations expanding from this region after the Neolithic might be responsible for the widespread occurrence of ANE across Europe today (see here).

However, formal statistics, rather than PCA, are the favored method for studying ancient genomes in scientific literature. So I thought I'd run f3 and D-statistics to see whether this composite was indeed the closest thing to a PIE individual in my dataset.

I picked a set of French samples as the test group, and chose French Basques as the main reference group, alongside the composite and a variety of populations that are documented or suspected of carrying high levels of ANE. The assumption I made was that the French used to be like the French Basques, their non-Indo-European neighbors, before someone pushed in from the east and changed both their language and genomes.

The results can be seen in the spreadsheet below. Please note, if the f3-statistic is negative, then the target group is inferred to be admixed. Moreover, for a D-statistic of the form D(W, X; Y, Z), a positive Z-score means that gene flow occurred either between W and Y or between X and Z, while a negative Z-score means that it occurred either between W and Z or between X and Y.
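To make those sign conventions a bit more concrete, below is a minimal Python sketch that computes f3(C; A, B) and D(W, X; Y, Z) directly from population allele frequencies, with a Z-score from a delete-one-block jackknife. It's a simplified illustration, not the software behind the spreadsheet: the f3 here skips the usual small-sample correction, the jackknife uses equal-sized SNP blocks rather than blocks of fixed genetic length, and all function names are hypothetical.

```python
import numpy as np


def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    c, a, b are allele-frequency arrays. A clearly negative value suggests C
    is a mix of populations related to A and B. This naive version omits the
    finite-sample heterozygosity correction used by dedicated software.
    """
    return float(np.mean((c - a) * (c - b)))


def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from allele frequencies.

    Positive: excess allele sharing between W and Y (or X and Z).
    Negative: excess sharing between W and Z (or X and Y).
    """
    num = np.sum((w - x) * (y - z))
    den = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return float(num / den)


def jackknife_z(stat, arrays, n_blocks=20):
    """Z-score of stat(*arrays) from a delete-one-block jackknife.

    SNPs are split into n_blocks contiguous, roughly equal blocks; real
    analyses usually define blocks by genetic length (e.g. 5 cM) instead.
    """
    n_snps = len(arrays[0])
    estimate = stat(*arrays)
    replicates = []
    for block in np.array_split(np.arange(n_snps), n_blocks):
        keep = np.ones(n_snps, dtype=bool)
        keep[block] = False  # drop one block, recompute on the rest
        replicates.append(stat(*(arr[keep] for arr in arrays)))
    replicates = np.asarray(replicates)
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((replicates - replicates.mean()) ** 2))
    return estimate / se


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    a, b = rng.uniform(0.05, 0.95, n), rng.uniform(0.05, 0.95, n)
    c = 0.5 * a + 0.5 * b  # toy C: a 50/50 mix of A and B
    print("f3(C; A, B) =", f3(c, a, b))
    x, y, z = (rng.uniform(0.05, 0.95, n) for _ in range(3))
    w = 0.7 * x + 0.3 * y  # toy W: X-like with 30% gene flow from Y
    print("D(W, X; Y, Z) Z-score =", jackknife_z(d_stat, [w, x, y, z]))
```

In the toy example, C built as a 50/50 mix of A and B gives a negative f3(C; A, B), and W carrying 30% of its ancestry from Y gives a positive D(W, X; Y, Z), matching the rules of thumb above.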

f3 & D-statistics

Thus, the results suggest that the French are indeed admixed compared to the French Basques. In most cases the composite genome is the preferred surrogate for the population that brought this admixture to Western Europe, which is really cool. However, Karitiana Indians from the Amazon, and even Kets and Selkups from Siberia, are apparently the best proxies, which isn't the first time that's happened (see here).

Sunday, August 24, 2014

Genetic structure in the Western Balkans

PLoS ONE has a new paper on the genetic structure of Western Balkan populations. Here's the abstract:

Contemporary inhabitants of the Balkan Peninsula belong to several ethnic groups of diverse cultural background. In this study, three ethnic groups from Bosnia and Herzegovina - Bosniacs, Bosnian Croats and Bosnian Serbs - as well as the populations of Serbians, Croatians, Macedonians from the former Yugoslav Republic of Macedonia, Montenegrins and Kosovars have been characterized for the genetic variation of 660 000 genome-wide autosomal single nucleotide polymorphisms and for haploid markers. New autosomal data of the 70 individuals together with previously published data of 20 individuals from the populations of the Western Balkan region in a context of 695 samples of global range have been analysed. Comparison of the variation data of autosomal and haploid lineages of the studied Western Balkan populations reveals a concordance of the data in both sets and the genetic uniformity of the studied populations, especially of Western South-Slavic speakers. The genetic variation of Western Balkan populations reveals the continuity between the Middle East and Europe via the Balkan region and supports the scenario that one of the major routes of ancient gene flows and admixture went through the Balkan Peninsula.

Among the most eye-catching figures from the study is this TreeMix graph with ten migration edges or admixture events. Note the 44% migration edge running from the base of the Eastern European branch to the French. Is this perhaps a legacy of the Proto-Celts and early Germanics? In any case, something similar can be seen on this TreeMix graph from the supplementary PDF to Skoglund et al. 2014, where a French genome is modeled as a clade closely related to Upper Paleolithic Siberian forager MA-1, but with considerable Sardinian admixture.

Also, the position of the Poles at the tip of the tree, and thus near the North Russians, is somewhat curious. However, I know that several of these individuals are ethnic Poles from Estonia, so that might be the problem.
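As for what a 44% migration edge means, TreeMix models the receiving population as drawing a fraction w of its ancestry from the source branch and 1 - w from its own position on the tree. The toy Python sketch below recovers w by least squares from allele frequencies, assuming the parental frequencies are known exactly. TreeMix itself fits a full covariance model with drift along each branch, so treat this only as an illustration of the mixture equation; `admixture_weight` is a hypothetical name.

```python
import numpy as np


def admixture_weight(target, tree_parent, source):
    """Least-squares estimate of w in: target ~ (1 - w) * tree_parent + w * source.

    All three arguments are arrays of allele frequencies over the same SNPs.
    Rearranged, target - tree_parent ~ w * (source - tree_parent), so w is the
    projection of one difference vector onto the other.
    """
    t = np.asarray(target, dtype=float)
    p = np.asarray(tree_parent, dtype=float)
    s = np.asarray(source, dtype=float)
    diff = s - p
    return float(np.dot(t - p, diff) / np.dot(diff, diff))
```

For example, a toy target built as 0.56 times one parental frequency vector plus 0.44 times another returns a weight of 0.44, the same kind of number printed on the French migration edge.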

Update 25/08/2014: Here's a typical Eurogenes Principal Component Analysis (PCA) of West Eurasia with the new samples from this paper (Bosnians, Kosovars, Macedonians, Montenegrins and Serbs).


Citation: Kovacevic L, Tambets K, Ilumäe A-M, Kushniarevich A, Yunusbayev B, et al. (2014) Standing at the Gateway to Europe - The Genetic Structure of Western Balkan Populations Based on Autosomal and Haploid Markers. PLoS ONE 9(8): e105090. doi:10.1371/journal.pone.0105090

Tuesday, August 19, 2014

Complex paternal origins of the Han Chinese

There's an intriguing new paper at the AJHB on the paternal ancestry of a population from Iron Age China. It argues that the Han Chinese are the result of fairly recent admixture events, with Y-chromosome haplogroup Q1a1 entering the ancestral territory of the Han, the Central Plain of China, only around 3,000 years ago from the northwest. It's probably a sign of things to come, not only for the Han but many populations generally thought to be genetically homogeneous.

Note also how the Y-chromosome haplogroups appear to be associated with different burial customs and inferred social status. Q1a1 was found in the remains of three aristocrats and eight commoners, most of them buried in the extended prostrate position typical of Bronze and Iron Age steppe nomads of what is now western China. Most of the other remains were buried in the extended supine position, characteristic of the populations of the Chinese Central Plain at the time. I've put the details into a spreadsheet here.

It'll be interesting to learn about the genome-wide genetic structure of the people who introduced haplogroup Q1a1 into the ancestral Han gene pool. Were they perhaps in large part of Ancient North Eurasian (ANE) origin? I ask because Q is the most common Y-chromosome haplogroup in the Americas, where ANE peaks today. It's also the sister clade of haplogroup R, which is the paternal marker of Mal'ta boy, or the MA-1 genome, the main reference sample for ANE.

Indeed, haplogroup R was expanding in a big way across Europe and West and Central Asia at about the same time as Q1a1 in China. It also probably came from the steppe and was in all likelihood associated with the spread of ANE deep into Europe.

Objectives: Y chromosome haplogroup Q1a1 is found almost only in Han Chinese populations. However, it has not been found in ancient Han Chinese samples until now. Thus, the origin of haplogroup Q1a1 in Han Chinese is still obscure. This study attempts to provide answer to this question, and to uncover the origin and paternal genetic structure of the ancestors of the Han Chinese.

Methods: Eighty-nine ancient human remains that were excavated from the presumed geographic source of the Han Chinese and dated to approximately 3,000 years ago were treated by the amelogenin gene polymerase chain reaction test, to determine their sex. Then, Y chromosome single nucleotide polymorphisms were subsequently analyzed from the samples detected as male.

Results: Samples from 27 individuals were successfully amplified. Their haplotypes could be attributed to haplogroups N, O*, O2a, O3a, and Q1a1. Analyses showed that the assigned haplogroup of each sample is correlated to the suspected social status and observed burial custom associated with the sample.

Conclusions: The origins of the observed haplotypes and their distribution in present day Han Chinese and in the samples suggest that haplogroup Q1a1 was probably introduced into the Han Chinese population approximately 3,000 years ago.


Yong-Bin Zhao et al., Ancient DNA evidence reveals that the Y chromosome haplogroup Q1a1 admixed into the Han Chinese 3,000 years ago, American Journal of Human Biology, Article first published online: 18 AUG 2014, DOI: 10.1002/ajhb.22604

See also...

Lots of ancient Y-DNA from China

First genome of an Upper Paleolithic human (Mal'ta boy)

Ancient European admixture in the Americas, or ancient Amerindian admixture in Europe?

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

Friday, August 15, 2014

Near Eastern-like mtDNA from Chalcolithic Spain

Ancient DNA studies based solely on low resolution mtDNA sequences aren't exactly cutting edge science nowadays, but this one is still interesting and somewhat surprising in that it describes an unusually Near Eastern-like population from post-Neolithic Iberia. These people were probably either fresh-off-the-boat colonists from the Near East, or, alternatively, the descendants of Neolithic farmers from the Near East who were yet to begin mixing with other distinct populations to produce the modern Iberian mtDNA gene pool. The authors of this paper favor the second scenario:

Abstract: Previous mitochondrial DNA analyses on ancient European remains have suggested that the current distribution of haplogroup H was modeled by the expansion of the Bell Beaker culture (ca 4,500–4,050 years BP) out of Iberia during the Chalcolithic period. However, little is known on the genetic composition of contemporaneous Iberian populations that do not carry the archaeological tool kit defining this culture. Here we have retrieved mitochondrial DNA (mtDNA) sequences from 19 individuals from a Chalcolithic sample from El Mirador cave in Spain, dated to 4,760–4,200 years BP and we have analyzed the haplogroup composition in the context of modern and ancient populations. Regarding extant African, Asian and European populations, El Mirador shows affinities with Near Eastern groups. In different analyses with other ancient samples, El Mirador clusters with Middle and Late Neolithic populations from Germany, belonging to the Rössen, the Salzmünde and the Baalberge archaeological cultures but not with contemporaneous Bell Beakers. Our analyses support the existence of a common genetic signal between Western and Central Europe during the Middle and Late Neolithic and points to a heterogeneous genetic landscape among Chalcolithic groups.

Of course, these results don't debunk in any way the generally accepted theory that the enigmatic Bell Beakers first expanded from what is now Portugal. Indeed, a Principal Component Analysis (PCA) from the paper shows Bell Beaker mtDNA from Germany (BBC) right next to mtDNA from late Neolithic Portugal (NPO). On the other hand, the El Mirador cave sample (MIR) appears most similar to mtDNA from Germany belonging to the middle Neolithic Salzmünde culture (SMC).


Gómez-Sánchez D, Olalde I, Pierini F, Matas-Lalueza L, Gigli E, et al. (2014) Mitochondrial DNA from El Mirador Cave (Atapuerca, Spain) Reveals the Heterogeneity of Chalcolithic Populations. PLoS ONE 9(8): e105105. doi:10.1371/journal.pone.0105105

Sunday, August 10, 2014

Ancient Middle Easterner (AME)

For a while now I've been trying to put together an ancestry test that splits apart with fair accuracy the three main ancestral European components as per the Lazaridis et al. preprint. Of course, these components are West European Hunter-Gatherer (WHG), Early European Farmer (EEF) and Ancient North Eurasian (ANE). Unfortunately, it's proved to be an impossible task, probably because I'm missing Stuttgart, the only genuine EEF genome sequenced to date.

Stuttgart will be publicly available when Lazaridis et al. is eventually published in a journal. However, I thought it might be worth sharing the results from a supervised ADMIXTURE analysis in which I used a subset of Bedouins from the HGDP as proxies for Stuttgart.

K=6 spreadsheet

And here's a Principal Component Analysis (PCA) of Europe based on the data, thanks to Eurogenes Project member PL16.
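For those curious about what a supervised run does, the gist is that the reference samples (e.g. MA-1 for ANE, La Brana-1 for WHG and the Bedouins for AME) are pinned to their own components, and every other sample is then modeled as a mix of those components. The Python sketch below is a simplified version of the second step only: it holds the reference allele frequencies fixed and estimates one individual's ancestry proportions with a standard EM algorithm. ADMIXTURE itself uses a different optimizer and estimates the allele frequencies jointly, so this is an illustration under those assumptions, and `supervised_q` is a hypothetical name.

```python
import numpy as np


def supervised_q(genotypes, ref_freqs, n_iter=1000, tol=1e-10):
    """EM estimate of one individual's ancestry proportions with fixed references.

    genotypes: (M,) counts of the counted allele at each SNP (0, 1 or 2)
    ref_freqs: (K, M) frequency of that allele in each reference component,
               kept strictly between 0 and 1
    Returns a length-K vector of proportions summing to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    p = np.asarray(ref_freqs, dtype=float)
    k, m = p.shape
    q = np.full(k, 1.0 / k)  # start from equal proportions
    for _ in range(n_iter):
        # E-step: share of each allele copy attributed to each component.
        p_mix = q @ p  # P(counted allele) under the current q
        from_counted = q[:, None] * p * (g / p_mix)
        from_other = q[:, None] * (1 - p) * ((2 - g) / (1 - p_mix))
        # M-step: new proportions = expected fraction of the 2M allele copies.
        q_new = (from_counted + from_other).sum(axis=1) / (2 * m)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q
```

On simulated genotypes drawn from a known 50/30/20 mix of three random reference frequency vectors, the estimate lands close to those proportions, which is the behavior a supervised run relies on when it reports, say, a sample's AME share.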

What's so special about these Bedouins, you might ask? Well, as far as I can tell, just like Stuttgart they don't show any ANE admixture, which is very unusual for a present-day Middle Eastern group. However, unlike Stuttgart they also don't carry much WHG-like ancestry. This suggests that, in terms of genome-wide genetic structure, they might well be more similar to ancient (pre-Bronze Age?) Middle Easterners than Stuttgart is, and more similar than any other present-day population. Hence, I labelled their component AME (Ancient Middle Eastern).

Please note also that the WHG in the spreadsheet is not the same as the Lazaridis et al. WHG. The former is probably more representative of West Eurasian hunter-gatherers in general rather than just West European hunter-gatherers, even though I used La Brana-1, a Mesolithic hunter-gatherer genome from Iberia, to create it. On the other hand, the ANE component, created with the MA-1 genome from Upper Paleolithic Siberia, looks basically identical to the ANE in the scientific literature.

Now, Lazaridis et al. seem very confident that their WHG is not found outside of Europe. However, they also admit that some sort of WHG-like ancestry probably does exist in the Near East. This makes good sense because, for instance, Y-chromosome haplogroup I, which is the most common haplogroup among European hunter-gatherer genomes sequenced to date, is the sister clade of haplogroup J, a marker clearly native to the Near East.

Below are a couple of PCA plots illustrating the relationships between the ancient genomes featured in Lazaridis et al. and between the components in my ADMIXTURE run, respectively. The first PCA is from page 74 of the latest version of the Lazaridis et al. preprint (see link below).

Note that Loschbour, the WHG genome, is closer in dimension one to Stuttgart than to MA-1 (the main ANE proxy). However, the West Eurasian hunter-gatherers are closer to the Ancient North Eurasians than to the Ancient Middle Easterners. That's probably because, as per above, Stuttgart is more mixed than my AME component, with significant WHG and WHG-like ancestry, both from Europe and the Near East.


Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, arXiv, April 2, 2014, arXiv:1312.6639v2

Wednesday, August 6, 2014

Haplotype-based PCA of West Eurasia and Europe

The Principal Component Analyses (PCA) below were produced with matrices of pairwise centiMorgan (cM) values inferred with fastIBD (see here). My aim was to create PCA that took into account haplotype information to see how they might differ from similar plots based on unlinked loci (such as here).

Clearly, they're less reflective of geography and isolation-by-distance, and instead more profoundly influenced by relatively recent isolation, founder effects and/or rapid expansions, especially in Northern and Eastern Europe, and in particular among the Finns, Balts and East Slavs. Unfortunately, I don't have time to say much more about these results. But feel free to post any questions or observations in the comments below. I have done something very similar in the past, but with far fewer samples (see here).
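For those who'd like to try something similar, here's one simple way to turn a pairwise sharing matrix into PCA coordinates: treat each individual's row of summed IBD lengths (in cM) with everyone else as its feature vector, centre the columns, and take the leading singular vectors. This is a sketch under my own assumptions rather than the exact recipe behind these plots. In particular, setting self-sharing on the diagonal to zero is an arbitrary choice, and `ibd_pca` is a hypothetical name.

```python
import numpy as np


def ibd_pca(shared_cm, n_components=2):
    """PCA on the rows of a symmetric pairwise IBD-sharing matrix (in cM).

    Each row is one individual's total shared cM with every sample; columns
    are mean-centred and the top principal-component scores are returned.
    """
    X = np.asarray(shared_cm, dtype=float)
    X = X - X.mean(axis=0)  # centre each column
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]  # PC scores per individual


if __name__ == "__main__":
    # Toy matrix: two groups of three, sharing 50 cM within and 5 cM between.
    S = np.full((6, 6), 5.0)
    S[:3, :3] = 50.0
    S[3:, 3:] = 50.0
    np.fill_diagonal(S, 0.0)  # self-sharing set to zero (an assumption)
    print(ibd_pca(S)[:, 0])
```

In the toy example the two groups end up on opposite sides of PC1, which is the kind of separation that recent isolation and founder effects produce in the real plots.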

Please note, to ensure that the PCA were as informative as possible I was forced to drop several populations that produced unusual results, probably because of extreme founder effects. This is why, for instance, there are no Ashkenazi Jews on any of the plots, and the only Finns you'll find come from western Finland.

I'll try this again on a much larger dataset when more samples come in, and also include populations from Central and South Asia.

Update 7/8/2014: Apparently some people are wondering what the plots with Finns and Jews look like. Here you go...

Thursday, July 31, 2014

Turks probably came from south Siberia

The good people at the Estonian Biocentre have just put out a preprint at bioRxiv focusing on the genetic origins of Turkic-speaking nomads. It's a solid effort based on a wide range of samples and several standard analyses, including a massive fastIBD run. The authors' conclusions are very sensible and probably correct:

Most of the Turkic peoples studied, except those in Central Asia, genetically resembled their geographic neighbors, in agreement with the elite dominance model of language expansion. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. The observed excess of long chromosomal tracts IBD (> 1cM) between populations from SSM and Turkic peoples across West Eurasia was statistically significant. Finally, we used the ALDER method and inferred admixture dates (~9th–17th centuries) that overlap with the Turkic migrations of the 5th–16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.
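For readers unfamiliar with how ALDER-style dates are obtained: admixture generates linkage disequilibrium that decays with genetic distance d (in Morgans) roughly as a·exp(-n·d) + c, where n is the number of generations since the admixture event. The Python sketch below fits n by a grid search, solving for a and c by least squares at each candidate, and then converts generations to years. It leaves out ALDER's allele-frequency weighting and jackknife standard errors, the 29-year generation time is my assumption, and the function names are hypothetical.

```python
import numpy as np


def fit_ld_decay(d_morgans, ld, n_grid=range(1, 201)):
    """Fit ld ~ a * exp(-n * d) + c and return (n, a, c) with the lowest SSE.

    d_morgans: genetic distances between SNP pairs, in Morgans
    ld:        (weighted) LD measured at each distance
    n_grid:    candidate numbers of generations since admixture
    """
    d = np.asarray(d_morgans, dtype=float)
    y = np.asarray(ld, dtype=float)
    best = None
    for n in n_grid:
        # For a fixed n the model is linear in a and c.
        X = np.column_stack([np.exp(-n * d), np.ones_like(d)])
        (a, c), *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ np.array([a, c]) - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, int(n), float(a), float(c))
    return best[1:]


def generations_to_years(n, years_per_generation=29):
    """Convert generations to years; the 29-year default is an assumption."""
    return n * years_per_generation
```

On a noise-free toy curve generated with n = 40, the fit returns 40 generations, or 1,160 years at 29 years per generation. Real dates also depend on the choice of generation time, which is one reason published admixture dates come as fairly wide ranges like the ~9th–17th centuries above.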

However, even though the paper includes a lot of detail, I still find it somewhat underwhelming. The blame lies with Lazaridis et al. 2013/2014, which really raised the bar for this type of work, using several ancient genomes and very sophisticated techniques to try and unravel the deep ancestry of Europeans (see here and here). It's probably unreasonable of me to expect most population genetics papers to be so thorough, but it's still disappointing when they're not.

Also, thanks to Lazaridis et al. as well as a few other recent ancient DNA studies, we now know that the prehistory of Eurasia was more complex than anyone had imagined only a few years ago. Once upon a time it was OK to blame any sort of seemingly eastern genetic signals in Europe on Genghis Khan or Attila the Hun. These days you'd look like a bit of an idiot trying that sort of thing.

So yes, in this case the authors probably got it right, and they probably did pick up signals of Turkic migrations from south Siberia and surrounds. But let's wait and see what a good number of ancient genomes reveal about the origins, direction and time frames of population movements across the Eurasian steppe and taiga belt.


Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, et al., The Genetic Legacy of the Expansion of Turkic-Speaking Nomads Across Eurasia, bioRxiv posted online July 30, 2014