Tuesday, February 20, 2018

Migration of the Bell Beakers—but not from Iberia (Olalde et al. 2018)


At last, after many months of waiting, the paper that I've been calling the Bell Beaker Behemoth will finally appear at Nature today or tomorrow, depending on your time zone [Update: the paper is here]. The accompanying dataset is already online, and it's twice as big as what the paper's bioRxiv preprint promised, packing 400 new samples from Neolithic, Copper Age and Bronze Age Europe (freely available via the Reich Lab here).

I'll incorporate these samples into my collection of ancients very shortly, and then put them through their paces in the usual and new ways.

Nevertheless, despite the much larger and more varied new dataset, I know for a fact that the conclusions in the paper are the same as those in the preprint (which we discussed here). The authors tentatively accept the archaeologically-based academic consensus that the Bell Beaker phenomenon originated in Copper Age Iberia. But they admit that they can't find evidence in their ancient DNA data that its expansion across much of the rest of Europe was accompanied by significant gene flow from Iberia, and thus driven by migration.

However, they do see in their data a large-scale migration of Central European Beakers to Western Europe around 2500 BC, bringing with them, amongst other things, steppe or Yamnaya-related admixture to the region for the first time. Many of the new samples are from the British Isles - where the impact of this migration was profound, resulting in roughly a 90% turnover of the population - and they appear to have been collected specifically to reaffirm this conclusion.

How exactly this massive population turnover came about isn't known yet. But early indications from other parts of Europe, where similar population shifts have been inferred from ancient DNA for the Late Neolithic/Early Bronze Age period, are that plague epidemics and deadly violence may have been important factors (see here and here).

I don't have a strong opinion about the place of origin of the Beaker cultural package, and I don't find the Iberian model entirely satisfying, mostly because it doesn't gel with the latest ancient DNA data. On the other hand, I've made up my mind as to who the Central European Beakers rich in steppe ancestry and also Y-haplogroup R1b-M269 were, and you can read about that here.

What are your thoughts after looking over the new samples? It's a big dataset alright, but does it do justice to the massive and complex Bell Beaker phenomenon? If not, then what's missing? Who's actually happy that the puzzle of the origin of the Beakers has now been solved? Feel free to let me know in the comments below.

Update 21/02/2018: I've updated my Global 25 datasheets with most of the ancient samples from Olalde et al. 2018 and Mathieson et al. 2018 (see list here).

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)

See also...

Who's your (proto) daddy Western Europeans?

Sunday, February 18, 2018

C for Cheddar Man (?)


A new preprint has just appeared at bioRxiv on the Mesolithic to Neolithic transition and resulting massive population shift in Britain. It features genome-wide data from six Mesolithic and 67 Neolithic individuals, including the famous Cheddar Man.

Population Replacement in Early Neolithic Britain by Brace et al.

The peculiar thing about this preprint is that it doesn't list the Y-haplogroups of the male ancients. However, it's been rumored for a while that Cheddar Man belongs to Y-haplogroup C (for instance, see here). Has this now been confirmed officially anywhere?

On a related note, the guys at DNAGeeks have been working on a range of Cheddar Man products (see here). So for a few bucks you can get yourself a Cheddar Man tee or wall print based on this arty depiction of the Mesolithic British forager. Yes, his resemblance to pop icon Prince is indeed uncanny.


Thursday, February 15, 2018

Modeling genetic ancestry with Davidski: step by step


There are many different ways to model your genetic ancestry. I prefer the Global25/nMonte method (see here). This is a step by step guide to modeling ancient ancestry proportions with this simple but powerful method using my own genome.


As far as I know, the vast majority of my recent ancestors came from the northern half of Europe. This may or may not be correct, but it gives me somewhere to start, so that I can come up with a coherent model. If you don't have this sort of information, because, perhaps, you were adopted, then just look in the mirror, and work from there. Like I say, it's not imperative that you know anything whatsoever about your ancestry, because your genetic data will do the talking, but you do need a model when modeling.

In scientific literature nowadays, Northern Europeans are often described as a three-way mixture between Yamnaya-related pastoralists, Anatolian-derived early farmers, and Western European Hunter-Gatherers (WHG). So let's see if this model works for me. Obviously, if it does, then it'll confirm the information that I have about my origins, but it might also reveal finer details that I'm not aware of. The datasheet that I'm using for this model is available here.

[1] distance%=6.9025 / distance=0.069025

Davidski

Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0

Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from scientific literature and the plethora of analyses that I've featured at this blog on the topic. Please note that I've kept things very simple, using only four reference populations and individuals as proxies for four distinct streams of ancestry. But I've put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane.
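For readers who want to see what sort of calculation sits behind output like this, here is a minimal sketch in Python. It is not the nMonte code, which is an R script with its own Monte Carlo search. It only illustrates the general idea of finding non-negative ancestry proportions that sum to 100% and minimise the Euclidean distance between the target and the weighted mix of references. The function names, the hill-climbing search and the toy 3-D coordinates are all assumptions made up for illustration; they are not real Global25 data.

```python
import math


def distance(target, refs, weights):
    """Euclidean distance between the target and the weighted mix of references."""
    mix = [sum(w * ref[k] for w, ref in zip(weights, refs))
           for k in range(len(target))]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))


def fit_mixture(target, refs, step=0.25, min_step=1e-6):
    """Find non-negative weights summing to 1 that minimise the distance.

    Deterministic hill climbing (a stand-in for nMonte's own search):
    move `step` of weight from one reference to another, keep the move
    if the fit improves, and halve the step once no move helps.
    """
    n = len(refs)
    weights = [1.0 / n] * n
    best = distance(target, refs, weights)
    while step >= min_step:
        improved = False
        for i in range(n):
            for j in range(n):
                if i == j or weights[i] <= 0.0:
                    continue
                delta = min(step, weights[i])  # never push a weight below zero
                trial = list(weights)
                trial[i] -= delta
                trial[j] += delta
                d = distance(target, refs, trial)
                if d < best - 1e-12:
                    weights, best, improved = trial, d, True
        if not improved:
            step /= 2.0
    return weights, best


# Toy references (invented coordinates, NOT real Global25 values).
refs = [
    [0.10, 0.02, -0.01],    # toy reference A
    [-0.05, 0.08, 0.00],    # toy reference B
    [0.00, -0.03, 0.09],    # toy reference C
    [-0.08, -0.07, -0.06],  # toy reference D, unrelated to the target
]
# Build a target that is exactly 50% A, 30% B, 20% C and 0% D.
true_weights = [0.5, 0.3, 0.2, 0.0]
target = [sum(w * r[k] for w, r in zip(true_weights, refs)) for k in range(3)]

weights, dist = fit_mixture(target, refs)
print("distance%%=%.4f / distance=%.6f" % (100 * dist, dist))
for name, w in zip("ABCD", weights):
    print(name, round(100 * w, 2))
```

Because the toy target is built as an exact mixture, the search should recover the known proportions and give the unrelated reference a weight of about zero. With real data the distance never drops to zero, which is why it's worth watching as references are swapped in and out.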

Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model's distance is a little higher than what I normally see for really strong models. So let's check if I can get a better fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.

[1] distance%=5.9331 / distance=0.059331

Davidski

Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0

The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome.

What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins. Indeed, let's follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Holy shit! To be honest, I wasn't expecting this sort of resolution and accuracy, and I can't promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn't a fluke. It can't be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here).

But of course, the Baltic and nearby regions haven't been isolated from migrations and invasions since the Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking peoples moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It's generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongolians. Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let's explore this possibility by adding a few more reference populations.

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic people from Siberia), and Sarmatians to the model. But what about if I throw in the only ancient Slav in my datasheet?

[1] distance%=2.9904 / distance=0.029904

Davidski

Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Considering that the vast majority of my recent ancestors were Poles, thus a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance! But the problem now is that I'm overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I'll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounds for meaningful analyses of this type.
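One rough way to spot this sort of problem before running a model is to check how close the reference populations are to each other in the same coordinate space. The sketch below is just a heuristic that I'm assuming for illustration; it is not part of nMonte, and the labels, coordinates and threshold are all hypothetical. It flags any pair of references that sit closer together than a chosen cutoff.

```python
import math
from itertools import combinations


def close_reference_pairs(refs, threshold):
    """Return (name_a, name_b, distance) for reference pairs closer than `threshold`."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(sorted(refs.items()), 2):
        d = math.dist(a, b)  # Euclidean distance, Python 3.8+
        if d < threshold:
            flagged.append((name_a, name_b, d))
    return flagged


# Hypothetical references with invented 2-D coordinates.
refs = {
    "Ref_A": (0.10, 0.02),
    "Ref_B": (0.11, 0.025),  # deliberately placed very close to Ref_A
    "Ref_C": (-0.05, 0.08),
}

# The threshold is an arbitrary illustrative value, not a recommended cutoff.
flagged = close_reference_pairs(refs, threshold=0.02)
for name_a, name_b, d in flagged:
    print(name_a, name_b, round(d, 4))
```

A flagged pair doesn't automatically make a model invalid, but it's a hint that the algorithm may be splitting essentially the same ancestry between two labels.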

For a more technical guide to running Global25-type data with nMonte, please refer to this post at my other blog by regular commentator Onur: An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests.

Tuesday, February 6, 2018

Unleash the power: Global 25 test drive thread


Ancestry modeling enthusiasts, feel free to do your best (or worst) with these datasheets and share the output, whatever it might be, in the comments below:

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)

Global 25 PAST datasheet

The Global 25 is a more powerful version of the Global 10 ancestry analysis (see here). If all goes well in the comments, it'll soon be offered for free to those who already have Global 10 coordinates. After that, we'll see what happens.


Below is a quick attempt to model Samara Yamnaya with its Global 25 coordinates using nMonte2. Hmmm...interesting stuff.

[1] distance%=2.4936 / distance=0.024936

Yamnaya_Samara

Samara_Eneolithic 71.15
Armenia_EBA 13.6
CHG 12.6
Iran_LN 2.65
ALPc_MN 0
Anatolia_BA 0
Anatolia_ChL 0
Armenia_ChL 0
Baden_LCA 0
Balaton_Lasinja_CA 0
Baltic_HG 0
Barcin_N 0
Blatterhole_HG 0
Blatterhole_MN 0
Boncuklu_N 0
EHG 0
Greece_N 0
Greece_Peloponnese_N 0
Iran_ChL 0
Iran_N 0
Koros_EN 0
Koros_HG 0
LBKT_MN 0
LBK_EN 0
Levant_BA 0
Levant_N 0
Narva_Estonia 0
Narva_Lithuania 0
SHG 0
Starcevo_EN 0
TDLN 0
Tepecik_Ciftlik_N 0
Tianyuan 0
Tisza_LN 0
Tiszapolgar_ECA 0
Vinca_MN 0
WHG 0

Obviously, this makes a lot of sense, but it's somewhat different from my recent models of Samara Yamnaya using methods based on formal stats (see here and here). In the end, only ancient DNA from the steppes and Caucasian-Caspian region will settle this issue when enough of it is sampled.

Update 10/02/2018: As per our discussion in the comments, in most cases it might be useful to restore the variance of the raw data (like in the datasheets here and here). This can be done with the EigenScale.R script available here. You'll also need a text file of the relevant eigenvalues, available here. Instructions on how to call the R script are here. Below is the same model of Samara Yamnaya as above, except using the "restored" data. The result is very similar, albeit a little cleaner.

[1] distance%=3.8086 / distance=0.038086

Yamnaya_Samara

Samara_Eneolithic 70.35
Armenia_EBA 14.9
CHG 14.75
ALPc_MN 0
Anatolia_BA 0
Anatolia_ChL 0
Armenia_ChL 0
Baden_LCA 0
Balaton_Lasinja_CA 0
Baltic_HG 0
Barcin_N 0
Blatterhole_HG 0
Blatterhole_MN 0
Boncuklu_N 0
EHG 0
Greece_N 0
Greece_Peloponnese_N 0
Iran_ChL 0
Iran_LN 0
Iran_N 0
Koros_EN 0
Koros_HG 0
LBKT_MN 0
LBK_EN 0
Levant_BA 0
Levant_N 0
Narva_Estonia 0
Narva_Lithuania 0
SHG 0
Starcevo_EN 0
TDLN 0
Tepecik_Ciftlik_N 0
Tianyuan 0
Tisza_LN 0
Tiszapolgar_ECA 0
Vinca_MN 0
WHG 0
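As for what "restoring the variance" in the update above might look like in practice, here is a minimal Python sketch. I'm assuming that it means multiplying each dimension by the square root of its eigenvalue, which is a common way to rescale PCA coordinates. I haven't reproduced EigenScale.R here, so treat this as an assumption about the general technique rather than a description of that script. The coordinates and eigenvalues are invented toy values.

```python
import math


def restore_variance(coords, eigenvalues):
    """Scale each PCA coordinate by the square root of its eigenvalue.

    Assumption: dimension k of `coords` corresponds to `eigenvalues[k]`.
    """
    if len(coords) != len(eigenvalues):
        raise ValueError("need exactly one eigenvalue per dimension")
    return [c * math.sqrt(ev) for c, ev in zip(coords, eigenvalues)]


# Toy example: three dimensions with invented eigenvalues.
raw = [0.10, -0.20, 0.05]
toy_eigenvalues = [4.0, 1.0, 0.25]
scaled = restore_variance(raw, toy_eigenvalues)
print(scaled)
```

The practical effect is that dimensions with larger eigenvalues count for more when distances are calculated, so the leading dimensions carry more weight in the fit.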

See also...

Modeling genetic ancestry with Davidski: step by step

Mitogenomes from the Iron Age South Baltic (Stolarek et al. 2018)


Over at Scientific Reports at this LINK. And yes, full genomes of many of the samples are on the way. Emphasis is mine:

Abstract: Despite the increase in our knowledge about the factors that shaped the genetic structure of the human population in Europe, the demographic processes that occurred during and after the Early Bronze Age (EBA) in Central-East Europe remain unclear. To fill the gap, we isolated and sequenced DNAs of 60 individuals from Kowalewko, a bi-ritual cemetery of the Iron Age (IA) Wielbark culture, located between the Oder and Vistula rivers (Kow-OVIA population). The collected data revealed high genetic diversity of Kow-OVIA, suggesting that it was not a small isolated population. Analyses of mtDNA haplogroup frequencies and genetic distances performed for Kow-OVIA and other ancient European populations showed that Kow-OVIA was most closely linked to the Jutland Iron Age (JIA) population. However, the relationship of both populations to the preceding Late Neolithic (LN) and EBA populations were different. We found that this phenomenon is most likely the consequence of the distinct genetic history observed for Kow-OVIA women and men. Females were related to the Early-Middle Neolithic farmers, whereas males were related to JIA and LN Bell Beakers. In general, our findings disclose the mechanisms that could underlie the formation of the local genetic substructures in the South Baltic region during the IA.

Stolarek et al., A mosaic genetic structure of the human population living in the South Baltic region during the Iron Age, Scientific Reports volume 8, Article number: 2455 (2018) doi:10.1038/s41598-018-20705-6

Friday, February 2, 2018

Early Baltic Corded Ware form a genetic clade with Yamnaya, but...


This is what Mittnik et al. 2018 say about a couple of their Corded Ware, or Baltic Late Neolithic (Baltic_LN), samples from what is now Lithuania:

Computing D-statistics for each individual of the form D(Baltic LN, Yamnaya; X, Mbuti), we find that the two individuals from the early phase of the LN (Plinkaigalis242 and Gyvakarai1, dating to ca. 3200–2600 calBCE) form a clade with Yamnaya (Supplementary Table 7), consistent with the absence of the farmer-associated component in ADMIXTURE (Fig. 2b). Younger individuals share more alleles with Anatolian and European farmers (Supplementary Table 7) as also observed in contemporaneous Central European CWC individuals [2].

We can add a third early Baltic Corded Ware sample, Latvia_LN1, to this list, because this individual was also shown to lack the above mentioned farmer-associated component in ADMIXTURE by Jones et al. 2017.
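To make the D-statistic in the quote above a little more concrete, here is a small Python sketch of the usual allele-frequency form of D(P1, P2; P3, P4). It isn't the software used by Mittnik et al., and the function name and toy frequencies are assumptions for illustration. Sign conventions also differ between programs. Real analyses use hundreds of thousands of SNPs and a block jackknife to get standard errors and Z-scores, which this sketch leaves out.

```python
def d_statistic(p1, p2, p3, p4):
    """D(P1, P2; P3, P4) from per-SNP allele frequencies in four populations.

    A value near zero is consistent with P1 and P2 forming a clade
    relative to P3 and P4.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if den == 0.0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy allele frequencies at two SNPs (invented values, not real data).
test_pop = [0.9, 0.1]  # stands in for X
outgroup = [0.0, 1.0]  # stands in for the outgroup
pop_a = [0.5, 0.5]

# Identical frequencies in P1 and P2: the two behave as a clade.
d_clade = d_statistic(pop_a, pop_a, test_pop, outgroup)

# P1 shares more alleles with X than P2 does.
pop_b = [0.9, 0.1]
d_shift = d_statistic(pop_b, pop_a, test_pop, outgroup)
print(d_clade, d_shift)
```

Note that a result like "forms a clade with Yamnaya" only means that D wasn't significantly different from zero for the test populations that were tried, so it depends on the power of the data.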

However, in my Principal Component Analysis (PCA) of ancient West Eurasia, all three samples fall just "northwest" of Yamnaya, along with one German Corded Ware outlier, and form a separate cluster that is shifted slightly closer to European hunter-gatherers and farmers. Hence, Plinkaigalis242 and Gyvakarai1 only form a clade with Yamnaya to the limit of the resolution in the analysis by Mittnik et al., but aren't exactly identical to Yamnaya. The relevant datasheet is available here.


So what might this mean? Possibly that the ancestors of this Corded Ware trio "absorbed" trace forager and/or farmer admixture as they migrated from the Pontic-Caspian steppe to the East Baltic. Or it could mean that they came from a more westerly part of the Pontic-Caspian steppe where people harbored slightly elevated forager and/or farmer ancestry relative to Yamnaya.

More sampling of Eneolithic and Early Bronze Age (EBA) burial sites on the Pontic-Caspian steppe, particularly north of the Black Sea, will probably solve this mystery. Please note, however, that we already have an Eneolithic sample from the Pontic-Caspian steppe that not only packs extra farmer admixture over Yamnaya, but also belongs to Y-haplogroup R1a-M417, which is a marker intimately associated with the Corded Ware expansion (see here).

By the way, this is how the Corded Ware set from Mittnik et al. behaves in another of my PCAs, which is designed to focus on ethno-linguistic-specific genetic drift in Northern Europe. I don't usually run samples older than the Bronze Age in this analysis, the reason being that they often don't share enough genetic drift with modern-day Europeans to produce meaningful output. And to be honest, I'm not quite sure what to make of these results. But it's probably not a coincidence that the Scandinavian Corded Ware (CWC_Battle_Axe) individual clusters so strongly with the Nordic Iron Age and modern-day Scandinavian samples. The relevant datasheet is here.


See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Modern-day Poles vs Bronze Age peoples of the East Baltic

The genetic history of Northern Europe (or rather the South Baltic)