

Tuesday, August 19, 2014

Complex paternal origins of the Han Chinese

I haven't read this yet, but it looks like an intriguing paper. It's probably a sign of things to come, not only for the origins of the Han Chinese but for many populations generally thought to be genetically homogeneous. Note also how each of the identified haplogroups (N, O*, O2a, O3a, and Q1a1) appears to be associated with specific burial customs and inferred social status. Fascinating stuff.

Objectives: Y chromosome haplogroup Q1a1 is found almost only in Han Chinese populations. However, it has not been found in ancient Han Chinese samples until now. Thus, the origin of haplogroup Q1a1 in Han Chinese is still obscure. This study attempts to provide answer to this question, and to uncover the origin and paternal genetic structure of the ancestors of the Han Chinese.

Methods: Eighty-nine ancient human remains that were excavated from the presumed geographic source of the Han Chinese and dated to approximately 3,000 years ago were treated by the amelogenin gene polymerase chain reaction test, to determine their sex. Then, Y chromosome single nucleotide polymorphisms were subsequently analyzed from the samples detected as male.

Results: Samples from 27 individuals were successfully amplified. Their haplotypes could be attributed to haplogroups N, O*, O2a, O3a, and Q1a1. Analyses showed that the assigned haplogroup of each sample is correlated to the suspected social status and observed burial custom associated with the sample.

Conclusions: The origins of the observed haplotypes and their distribution in present day Han Chinese and in the samples suggest that haplogroup Q1a1 was probably introduced into the Han Chinese population approximately 3,000 years ago.

It'll be interesting to learn about the genome-wide genetic structure of the population that introduced haplogroup Q1a1 into the ancestral Han gene pool. Were they perhaps in large part Ancient North Eurasian (ANE)? The reason I say this is because Q is the most common Y-chromosome haplogroup in the Americas, where ANE peaks today. It's also the sister clade of haplogroup R, which is the paternal marker of Mal'ta boy, or the MA-1 genome, the main reference sample for ANE.


Yong-Bin Zhao et al., Ancient DNA evidence reveals that the Y chromosome haplogroup Q1a1 admixed into the Han Chinese 3,000 years ago, American Journal of Human Biology, Article first published online: 18 AUG 2014, DOI: 10.1002/ajhb.22604

See also...

First genome of an Upper Paleolithic human (Mal'ta boy)

Ancient European admixture in the Americas, or ancient Amerindian admixture in Europe?

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

Friday, August 15, 2014

Near Eastern-like mtDNA from Chalcolithic Spain

Ancient DNA studies based solely on low resolution mtDNA sequences aren't exactly cutting edge science nowadays, but this one is still interesting and somewhat surprising in that it describes an unusually Near Eastern-like population from post-Neolithic Iberia. These people were probably either fresh off the boat colonists from the Near East, or, alternatively, the descendants of Neolithic farmers from the Near East who were yet to begin mixing with other distinct populations to produce the modern Iberian mtDNA gene pool. The authors of this paper favor the second scenario:

Abstract: Previous mitochondrial DNA analyses on ancient European remains have suggested that the current distribution of haplogroup H was modeled by the expansion of the Bell Beaker culture (ca 4,500–4,050 years BP) out of Iberia during the Chalcolithic period. However, little is known on the genetic composition of contemporaneous Iberian populations that do not carry the archaeological tool kit defining this culture. Here we have retrieved mitochondrial DNA (mtDNA) sequences from 19 individuals from a Chalcolithic sample from El Mirador cave in Spain, dated to 4,760–4,200 years BP and we have analyzed the haplogroup composition in the context of modern and ancient populations. Regarding extant African, Asian and European populations, El Mirador shows affinities with Near Eastern groups. In different analyses with other ancient samples, El Mirador clusters with Middle and Late Neolithic populations from Germany, belonging to the Rössen, the Salzmünde and the Baalberge archaeological cultures but not with contemporaneous Bell Beakers. Our analyses support the existence of a common genetic signal between Western and Central Europe during the Middle and Late Neolithic and points to a heterogeneous genetic landscape among Chalcolithic groups.

Of course, these results don't debunk in any way the generally accepted theory that the enigmatic Bell Beakers first expanded from what is now Portugal. Indeed, a Principal Component Analysis (PCA) from the paper shows Bell Beaker mtDNA from Germany (BBC) right next to mtDNA from late Neolithic Portugal (NPO). On the other hand, the El Mirador cave sample (MIR) appears most similar to mtDNA from Germany belonging to the middle Neolithic Salzmünde culture (SMC).


Gómez-Sánchez D, Olalde I, Pierini F, Matas-Lalueza L, Gigli E, et al. (2014) Mitochondrial DNA from El Mirador Cave (Atapuerca, Spain) Reveals the Heterogeneity of Chalcolithic Populations. PLoS ONE 9(8): e105105. doi:10.1371/journal.pone.0105105

Sunday, August 10, 2014

Ancient Middle Easterner (AME)

For a while now I've been trying to put together an ancestry test that splits apart with fair accuracy the three main ancestral European components as per the Lazaridis et al. preprint. Of course, these components are West European Hunter-Gatherer (WHG), Early European Farmer (EEF) and Ancient North Eurasian (ANE). Unfortunately, it's proved to be an impossible task, probably because I'm missing Stuttgart, the only genuine EEF genome sequenced to date.

Stuttgart will be publicly available when Lazaridis et al. is eventually published in a journal. However, I thought it might be worth sharing the results from a supervised ADMIXTURE analysis in which I used a subset of Bedouins from the HGDP as proxies for Stuttgart.

K=6 spreadsheet

And here's a Principal Component Analysis (PCA) of Europe based on the data, thanks to Eurogenes Project member PL16.

What's so special about these Bedouins, you might ask? Well, as far as I can tell, just like Stuttgart they don't show any ANE admixture, which is very unusual for a present-day Middle Eastern group. However, unlike Stuttgart they also don't carry much WHG-like ancestry. This suggests that they might well be more similar in terms of genome-wide genetic structure to ancient (pre-Bronze Age?) Middle Easterners than Stuttgart and also anyone currently alive. Hence, I labelled their component AME (Ancient Middle Eastern).

Please note also that the WHG in the spreadsheet is not the same as the Lazaridis et al. WHG. The former is probably more representative of West Eurasian hunter-gatherers rather than just West European hunter-gatherers, even though I used La Brana-1, a Mesolithic hunter-gatherer genome from Iberia, to create it. On the other hand, the ANE component, created with the MA-1 genome from Upper Paleolithic Siberia, looks basically identical to the ANE in scientific literature.

Now, Lazaridis et al. seem very confident that their WHG is not found outside of Europe. However, they also admit that some sort of WHG-like ancestry probably does exist in the Near East. This makes good sense because, for instance, Y-chromosome haplogroup I, which is the most common haplogroup among European hunter-gatherer genomes sequenced to date, is the sister clade of haplogroup J, a marker clearly native to the Near East.

Below are a couple of PCA illustrating the relationships between the ancient genomes featured in Lazaridis et al. and the components in my ADMIXTURE run, respectively. The first PCA is from page 74 of the latest version of the Lazaridis et al. preprint (see link below).

Note that Loschbour, the WHG genome, is closer in dimension one to Stuttgart than to MA-1 (the main ANE proxy). However, the West Eurasian hunter-gatherers are closer to the Ancient North Eurasians than to the Ancient Middle Easterners. That's probably because, as per above, Stuttgart is more mixed than my AME component, with significant WHG and WHG-like ancestry, both from Europe and the Near East.


Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, arXiv, April 2, 2014, arXiv:1312.6639v2

Wednesday, August 6, 2014

Haplotype-based PCA of West Eurasia and Europe

The Principal Component Analyses (PCA) below were produced with matrices of pairwise centiMorgan (cM) values inferred with fastIBD (see here). My aim was to create PCA that took into account haplotype information to see how they might differ from similar plots based on unlinked loci (such as here).
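For anyone wanting to try something along these lines, here's a minimal sketch of one way to get from a pairwise sharing matrix to plot coordinates. It is not necessarily the procedure used for the plots in this post. It assumes the fastIBD output has already been summed into a symmetric matrix of total shared cM per pair, and it swaps in classical multidimensional scaling (PCoA) on a distance defined as the maximum sharing minus the observed sharing. The function name, the distance transform and the toy numbers are all made up for illustration.

```python
import numpy as np

def pcoa_from_sharing(sharing_cm, n_axes=2):
    """Classical MDS (PCoA) on a symmetric matrix of pairwise IBD sharing in cM."""
    sharing = np.asarray(sharing_cm, dtype=float)
    n = sharing.shape[0]
    # Assumption: more sharing means closer, so distance = max sharing - sharing.
    dist = sharing.max() - sharing
    np.fill_diagonal(dist, 0.0)
    # Double-centre the squared distances (Gower's transformation).
    centre = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centre @ (dist ** 2) @ centre
    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_axes]
    # Non-Euclidean distances can give negative eigenvalues; clip them to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Toy matrix (invented values): samples 0 and 1 share a lot with each other,
# as do samples 2 and 3, but sharing between the two pairs is low.
toy_cm = np.array([
    [0.0, 40.0, 5.0, 4.0],
    [40.0, 0.0, 6.0, 5.0],
    [5.0, 6.0, 0.0, 35.0],
    [4.0, 5.0, 35.0, 0.0],
])
coords = pcoa_from_sharing(toy_cm)
print(coords)
```

In the toy example the first axis puts the two high-sharing pairs on opposite sides, which is the kind of recent-drift signal that a haplotype-based plot tends to emphasize.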

Clearly, they're less reflective of geography and isolation-by-distance, and instead more profoundly influenced by relatively recent isolation, founder effects and/or rapid expansions, especially in Northern and Eastern Europe, and in particular among the Finns, Balts and East Slavs. Unfortunately, I don't have time to say much more about these results. But feel free to post any questions or observations in the comments below. I have done something very similar in the past, but with far fewer samples (see here).

Please note, to ensure that the PCA were as informative as possible I was forced to drop several populations that produced unusual results, probably because of extreme founder effects. This is why, for instance, there are no Ashkenazi Jews on any of the plots, and the only Finns you'll find come from western Finland.

I'll try this again on a much larger dataset when more samples come in, and also include populations from Central and South Asia.

Update 7/8/2014: Apparently some people are wondering what the plots with Finns and Jews look like. Here you go...

Thursday, July 31, 2014

Turks probably came from south Siberia

The good people at the Estonian Biocentre have just put out a preprint at bioRxiv focusing on the genetic origins of Turkic-speaking nomads. It's a solid effort based on a wide range of samples and several standard analyses, including a massive fastIBD run. The authors' conclusions are very sensible and probably correct:

Most of the Turkic peoples studied, except those in Central Asia, genetically resembled their geographic neighbors, in agreement with the elite dominance model of language expansion. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. The observed excess of long chromosomal tracts IBD (> 1cM) between populations from SSM and Turkic peoples across West Eurasia was statistically significant. Finally, we used the ALDER method and inferred admixture dates (~9th–17th centuries) that overlap with the Turkic migrations of the 5th–16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.

However, even though the paper includes a lot of detail, I still find it somewhat underwhelming. The blame lies with Lazaridis et al. 2013/2014, which really raised the bar for this type of work, using several ancient genomes and very sophisticated techniques to try and unravel the deep ancestry of Europeans (see here and here). It's probably unreasonable of me to expect most population genetics papers to be so thorough, but it's still disappointing when they're not.

Also, thanks to Lazaridis et al. as well as a few other recent ancient DNA studies, we now know that the prehistory of Eurasia was more complex than anyone had imagined only a few years ago. Once upon a time it was OK to blame any sort of seemingly eastern genetic signals in Europe on Genghis Khan or Attila the Hun. These days you'd look like a bit of an idiot trying that sort of thing.

So yes, in this case the authors probably got it right, and they probably did pick up signals of Turkic migrations from south Siberia and surrounds. But let's wait and see what a good number of ancient genomes reveal about the origins, direction and time frames of population movements across the Eurasian steppe and Taiga belt.


Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, et al., The Genetic Legacy of the Expansion of Turkic-Speaking Nomads Across Eurasia, bioRxiv posted online July 30, 2014

Monday, July 28, 2014

Shared drift between eleven ancient genomes and 217 present-day populations

I've just figured out a more effective way of running lots of f3-statistics, using the 3-Population (qp3Pop) Test offered as part of the Admixtools package. I'll be updating this post as new ancient genomes are published, but here's what I've got so far:

AG-2 (Upper Paleolithic Siberian hunter-gatherer)

Ajvide52 (Late Neolithic Gotland hunter-gatherer)

Ajvide58 (Late Neolithic Gotland hunter-gatherer)

Anzick-1 (Late Pleistocene North Amerindian)

Gokhem2 (Late Neolithic Swedish farmer)

Gokhem4 (Late Neolithic Swedish farmer)

Ire8 (Late Neolithic Gotland hunter-gatherer)

La Brana-1 (Mesolithic Iberian hunter-gatherer)

MA-1 (Upper Paleolithic Siberian hunter-gatherer)

Saqqaq (~4,000 year-old Palaeo-Eskimo from Greenland)

StoraFörvar11 (Mesolithic Gotland hunter-gatherer)

I'll also throw in the results for Karitiana Indians and Dai from southern China: see here and here. These should prove useful to anyone wanting to analyze the MA-1 output in more detail.
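Batching a large number of these tests mostly comes down to generating the list of population triples that qp3Pop reads. Below is a rough sketch of that step only, assuming the list file takes one "Source1 Source2 Target" triple per line (check this against the README that ships with your copy of Admixtools). The function name, the "Mbuti" outgroup label and the population labels are placeholders, not the ones used for the results in this post.

```python
def outgroup_f3_triples(ancient, tests, outgroup="Mbuti"):
    """Build 'Source1 Source2 Target' lines for f3(outgroup; ancient, test)."""
    lines = []
    for test in tests:
        # Skip degenerate triples where a population would appear twice.
        if test in (ancient, outgroup):
            continue
        lines.append(f"{ancient} {test} {outgroup}")
    return lines

# Hypothetical labels; they must match the labels in your own sample file.
triples = outgroup_f3_triples("MA-1", ["Karitiana", "Dai", "Sardinian", "MA-1"])
print("\n".join(triples))
```

Writing the returned lines to a text file gives one list per ancient genome, so each genome can be run as a separate job.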

Below are a couple of graphs based on the shared drift stats for MA-1, Gokhem2 and La Brana-1, featuring samples with the highest SNP counts (less than 5% of missing markers in each of the three tests). The datasheets with the full keys can be downloaded here and here. You can open them with any text editor, but they're best viewed with Past3, which is freely available here.

See also...

f3-stats: 100 present-day populations plus MA-1

The Gokhem2 factor

Saturday, July 26, 2014

The Gokhem2 factor

Gokhem2 is a late Neolithic genome from Sweden published by Skoglund et al. earlier this year. It's a very important sample because it probably represents the typical Western and Central European of its time; mostly of ancient Near Eastern origin but with substantial (perhaps as much as 25%) indigenous Western European Hunter-Gatherer (WHG) ancestry. Moreover, in all likelihood it belonged to one of the last people alive just before much of Europe experienced large-scale shifts in material culture and DNA during the early metal ages.

I ran an f3 analysis of 65 present-day populations plus Gokhem2 to see whether the descendants and/or close relatives of this 5,000 year-old individual made a significant impact on the modern European gene pool. The results suggest that indeed they did.

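f3-statistics are used to test for admixture; if the f3 statistic for a test group and two reference populations is significantly negative, then the test group is considered to be admixed between populations related to those two references.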
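To see why a negative value points to admixture, here's a bare-bones sketch of the quantity being estimated: the average over SNPs of (c - a)(c - b), where c is the allele frequency in the test group and a and b are the frequencies in the two references. This is the naive form only. As far as I know qp3Pop also applies a sample-size correction and gets its standard errors from a block jackknife, neither of which is shown here, and the frequencies below are invented.

```python
def f3_naive(test, ref1, ref2):
    """Naive f3(test; ref1, ref2): mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b) for c, a, b in zip(test, ref1, ref2)]
    return sum(products) / len(products)

# Invented allele frequencies at four SNPs.
ref1 = [0.10, 0.80, 0.30, 0.60]
ref2 = [0.90, 0.20, 0.70, 0.40]
# A test group that is an exact 50/50 mix of the two references.
mixed = [0.5 * a + 0.5 * b for a, b in zip(ref1, ref2)]

f3_mixed = f3_naive(mixed, ref1, ref2)      # test group is the mixture
f3_unmixed = f3_naive(ref1, mixed, ref2)    # test group is one of the sources
print(f3_mixed, f3_unmixed)
```

When the test group sits between the two references at most SNPs, the two differences have opposite signs and the mean goes negative. Swap an unadmixed source into the test position and the mean comes out positive.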

In my analysis the lowest f3 means for almost all West Eurasians, except some Northeast Europeans, involve the Mbuti Pygmies and Gokhem2. This pairing seems to represent something very basal, and if we ignore it, we find that Northern Europeans are best characterized as Gokhem2 plus Amerindians or Siberians, and most Southern Europeans as Gokhem2 plus North Indians or Pakistanis.

Below are the five lowest f3 means (along with the standard errors and Z-scores) for several present-day European groups, after ignoring the Mbuti/Gokhem2 pairing. The full output from this test can be downloaded here.

Belorussian;Karitiana,Gokhem2 -0.002573 0.000774451 -3.32236
Belorussian;Gokhem2,Chukchi -0.00225896 0.000646711 -3.493
Belorussian;Gokhem2,Pima -0.00216931 0.000670227 -3.23669
Belorussian;Selkup,Gokhem2 -0.00203724 0.000592401 -3.43896
Belorussian;Shors,Gokhem2 -0.00192983 0.0005887 -3.27813

Bulgarian;Gokhem2,Chukchi -0.00354553 0.000601482 -5.89465
Bulgarian;Karitiana,Gokhem2 -0.00348264 0.000702038 -4.96076
Bulgarian;Gujarati3,Gokhem2 -0.00334141 0.000350863 -9.52341
Bulgarian;Gokhem2,Koryak -0.00332069 0.000647444 -5.12892
Bulgarian;Gujarati2,Gokhem2 -0.00331262 0.000331569 -9.99073

Chuvash;Gokhem2,Chukchi -0.00686169 0.000556132 -12.3383
Chuvash;Gokhem2,Koryak -0.00669996 0.000567184 -11.8127
Chuvash;Sardinian,Koryak -0.00604931 0.000186753 -32.392
Chuvash;Sardinian,Chukchi -0.0059954 0.000175401 -34.1812
Chuvash;French_Basque,Koryak -0.00598742 0.000179266 -33.3996

East_Sicilian;Gujarati3,Gokhem2 -0.0027439 0.000414587 -6.6184
East_Sicilian;Gujarati2,Gokhem2 -0.00268618 0.000403019 -6.66514
East_Sicilian;Sindhi,Gokhem2 -0.00262624 0.000353489 -7.42948
East_Sicilian;Balochi,Gokhem2 -0.00244354 0.000355967 -6.86452
East_Sicilian;Gujarati1,Gokhem2 -0.00235444 0.000412289 -5.71065

French_Basque;Shors,Gokhem2 -9.07978E-005 0.000580612 -0.156383
French_Basque;Gujarati3,Gokhem2 -4.85325E-005 0.000398641 -0.121745
French_Basque;Gujarati2,Gokhem2 -3.56909E-005 0.000386198 -0.092416
French_Basque;Sindhi,Gokhem2 4.45144E-006 0.000364349 0.0122175
French_Basque;Karitiana,Gokhem2 0.00010138 0.000748248 0.13549

Orcadian;Karitiana,Gokhem2 -0.00203301 0.000747568 -2.7195
Orcadian;Gokhem2,Pima -0.00178257 0.000658361 -2.70759
Orcadian;Gokhem2,Chukchi -0.00158611 0.000640651 -2.47578
Orcadian;Shors,Gokhem2 -0.00156046 0.000566159 -2.75623
Orcadian;Selkup,Gokhem2 -0.00151986 0.00058037 -2.61877

Portuguese;Karitiana,Gokhem2 -0.00355338 0.000769315 -4.61889
Portuguese;Gujarati3,Gokhem2 -0.00349678 0.000497986 -7.02185
Portuguese;Gujarati2,Gokhem2 -0.00335917 0.00048259 -6.96072
Portuguese;Shors,Gokhem2 -0.00322054 0.000638617 -5.04299
Portuguese;Sindhi,Gokhem2 -0.00320858 0.000469164 -6.83894

Swedish;Karitiana,Gokhem2 -0.00262634 0.000716148 -3.66732
Swedish;Gokhem2,Pima -0.00243545 0.000623376 -3.90688
Swedish;Gokhem2,Chukchi -0.0023131 0.000623257 -3.71131
Swedish;Shors,Gokhem2 -0.00226576 0.000577058 -3.9264
Swedish;Selkup,Gokhem2 -0.00223314 0.000570344 -3.91542
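The rows above follow the pattern "Test;Ref1,Ref2 f3 stderr Z", and the Z-score is just the f3 mean divided by its standard error. Here's a small sketch for parsing rows like these and rechecking that relationship. The cutoff of -3 is a common rule of thumb for calling a result significant, not something taken from the output itself, and the function name is made up.

```python
def parse_f3_row(row):
    """Split a 'Test;Ref1,Ref2 f3 stderr Z' row into a dictionary."""
    label, f3, stderr, z = row.split()
    test, refs = label.split(";")
    ref1, ref2 = refs.split(",")
    return {"test": test, "ref1": ref1, "ref2": ref2,
            "f3": float(f3), "stderr": float(stderr), "z": float(z)}

rows = [
    "Belorussian;Karitiana,Gokhem2 -0.002573 0.000774451 -3.32236",
    "French_Basque;Shors,Gokhem2 -9.07978E-005 0.000580612 -0.156383",
    "Chuvash;Sardinian,Chukchi -0.0059954 0.000175401 -34.1812",
]
parsed = [parse_f3_row(r) for r in rows]

Z_CUTOFF = -3.0  # assumed rule-of-thumb threshold
for rec in parsed:
    recomputed_z = rec["f3"] / rec["stderr"]
    flag = "significant" if rec["z"] < Z_CUTOFF else "not significant"
    print(rec["test"], rec["ref1"], rec["ref2"], round(recomputed_z, 3), flag)
```

With that cutoff, the Belorussian and Chuvash rows pass while the French Basque row doesn't come close.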

The reasons for the strong showing by the Karitiana should be obvious by now; Amazon Indians carry high levels of Ancient North Eurasian (ANE) ancestry, so they're simply the best proxies for the ANE-rich people who apparently pushed deep into Europe during the early metal ages. Pakistanis and North Indians usually produce lower f3 means for Southern Europeans probably because they carry lower levels of ANE than the Karitiana, and also harbor significant Neolithic ancestry from the Near East, which is much more important in Southern Europe than Northern Europe.

Unfortunately, this doesn't get us any closer to knowing the precise genetic composition of the aforementioned post-Neolithic invaders of Europe. They certainly weren't like the Karitiana, who derive almost 60% of their genetic structure from East Asia, because East Asian admixture is basically lacking in most of Europe apart from some current and former Turkic and Uralic-speaking regions (see here). They also couldn't have been exactly like Pakistanis and Indians, who carry South Asian admixture which, again, is essentially missing from Europe.

Hopefully the ancient genomes from the Samara Valley currently being analyzed at the Reich lab are as useful in helping to solve this riddle as they're shaping up to be (see here).

Interestingly, the f3-statistics from my latest experiment correlate very nicely with this Principal Component Analysis (PCA) of West Eurasia that I ran a few weeks ago to test the present-day genetic affinities of Gokhem2. Note that most Europeans on this plot can basically be described as a two-way mixture between Gokhem2 and something from the east.

Also worth noting is that Western Europeans are involved in the lowest f3 means for a few groups from deep in Asia. Again, the underlying cause for this is the correct balance of ancient components among the reference samples. At this stage, we can only speculate why in these instances that balance is the correct one.

Gujarati1;Dai,Orcadian -0.00268514 0.00020487 -13.1066
Gujarati1;Dai,Cyprian -0.00267843 0.000199391 -13.4331
Gujarati1;Dai,North_Italian -0.0026674 0.000203923 -13.0804
Gujarati1;Dai,Tuscan -0.00265071 0.000223236 -11.874
Gujarati1;Dai,Greek -0.00264462 0.000182486 -14.4922

Hakas;Sardinian,Koryak -0.00506239 0.000211893 -23.8912
Hakas;German,Yukaghir -0.00497669 0.000269175 -18.4887
Hakas;Irish,Yukaghir -0.00496753 0.000259862 -19.116
Hakas;French_Basque,Koryak -0.00492849 0.000215496 -22.8704
Hakas;French_Basque,Yukaghir -0.00490055 0.000247992 -19.7609

Shors;French_Basque,Koryak -0.00197323 0.00041609 -4.74232
Shors;Sardinian,Koryak -0.00193602 0.000408341 -4.74119
Shors;French_Basque,Yukaghir -0.00185561 0.000425466 -4.36137
Shors;Irish,Yukaghir -0.00173416 0.00044614 -3.88703
Shors;French,Yukaghir -0.00168648 0.000416114 -4.05293

By the way, shared drift statistics with Gokhem2 using f3(Mbuti;Gokhem2,Test) are available in a spreadsheet here. As expected, Sardinians easily top the list followed by the French Basques.

See also...

f3-stats: 100 present-day populations plus MA-1

More ancient genomes from Sweden: Pitted Ware forager Ajvide58 and TRB farm girl Gokhem2