Saturday, July 9, 2016

Modeling Steppe_EMBA

Lazaridis et al. showed that their Steppe_EMBA grouping, which included Afanasievo, Poltavka and Yamnaya, as well as two Potapovka samples, one Russia_EBA sample and one Srubnaya_outlier sample, was best modeled in the following two ways using qpAdm:

Eastern Hunter-Gatherer (EHG) 0.568
Iran Chalcolithic (Iran_ChL) 0.432

Caucasus Hunter-Gatherer (CHG) 0.181
Eastern Hunter-Gatherer (EHG) 0.527
Iran Chalcolithic (Iran_ChL) 0.292

I'm not a huge fan of either of these models, but especially the first one, even though I understand that they're both statistically very sound. For one, the uniparental markers don't match, and two, TreeMix seems to disagree (see here).

So let's try something a little different and see what happens when I model Steppe_EMBA as EHG, CHG, and Anatolia Chalcolithic.


Anatolia Chalcolithic (Anatolia_ChL) 0.128
Caucasus Hunter-Gatherer (CHG) 0.375
Eastern Hunter-Gatherer (EHG) 0.497

As far as I can tell, it's a very decent fit, especially considering that I'm using 12 outgroups and three reference populations. To me, at least, the standard errors look surprisingly low for such a complex model: 0.033, 0.046 and 0.020, respectively.
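As a rough sanity check on those numbers, the sketch below confirms that the three coefficients sum to one and expresses each as a multiple of its standard error. It's plain Python written for illustration, not part of qpAdm; the dictionary layout and the function name are assumptions of mine, and the pairing of errors to sources simply follows the "respectively" order above.

```python
# Coefficients and standard errors quoted above for the three-way model.
# The pairing of errors to sources follows the "respectively" order in the text.
model = {
    "Anatolia_ChL": (0.128, 0.033),
    "CHG": (0.375, 0.046),
    "EHG": (0.497, 0.020),
}


def coefficient_z_scores(fit):
    """Return coefficient / standard error for every source (hypothetical helper)."""
    return {name: coef / se for name, (coef, se) in fit.items()}


total = sum(coef for coef, _ in model.values())
z_scores = coefficient_z_scores(model)

print(f"sum of coefficients: {total:.3f}")
for name, z in z_scores.items():
    print(f"{name}: {z:.1f} standard errors from zero")
```

All three sources come out more than three standard errors from zero, which is one informal way of saying that none of them looks like noise in this particular setup.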

Now, I'm not arguing here that Chalcolithic Anatolia is the answer. What I'm saying is that multiple lines of evidence do not support Chalcolithic Iran as a real source of admixture for Steppe_EMBA, and I'm offering what I see as a plausible alternative among the currently available samples.

I know that this is a work in progress for the Broad MIT/Harvard team, and we'll have to wait for more ancient samples and another paper or two before a consensus is reached on the topic.

But here's my prediction: Steppe_EMBA only has 10-15% admixture from the post-Mesolithic Near East not including the North Caucasus, and basically all of this comes via female mediated gene flow from farming communities in the Caucasus and perhaps present-day Ukraine.


Olympus Mons said...

See, what basically you all do here, and it's alright, is feed confirmation-bias inputs to people who already agree with you. If the results do not fit, you just run different setups until they do.

Having said that! - humpft. I agree with you. Let's run the story (my story!).

1- Apparently (judging by cattle and goats) a population moved from eastern Anatolia by 9000 BC into the far southern Caucasus, where CHG already lived north of the Kura river. Either those were already R1b (P25+) or they got infiltrated by R1b coming from near the margins of the Black Sea, where you later see Maykop. So maybe that was the Anatolia component.

2 - Ubaid ophidians (hussana, Samarra, Uruk, sumeria, Ur, Elamite..) replaced the region's DNA admixture by 4,500 BC, and in doing so geographically formed two blobs of people with Anatolia DNA (take your pick which, N or C), by then already mixed with CHG and some original EHG. One, which I don't care about, moved north by the Black Sea shores into Nalchik and later up the samarra river, and is what you guys love to talk about as Yamnaya. The other, which I do care about, moved to eastern Anatolia, where the highest diversity of R1b still exists. These are the ones that ended up moving to Merimde as M269. We will see, maybe sooner than you all think.

3 - So those in the north mixed with people heavy on EHG, and that is why you model them, R1b Yamnaya, as 50% Anatolia_ChL and CHG.

Chad Rohlfsen said...


Did you combine Karelia and Samara in EHG, also Satsurblia and Kotias for your CHG? Also, is WHG a mix of Bichon, Loschbour, and Villabruna? If so, here are a few models I'll run down with your outgroups.


best coefficients: 0.529 0.165 0.305

std. errors: 0.033 0.055 0.080

fixed pat wt dof chisq tail prob
000 0 9 13.987 0.122797


best coefficients: 0.484 0.231 0.285

std. errors: 0.025 0.073 0.087

fixed pat wt dof chisq tail prob
000 0 9 15.827 0.070578
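The tail prob column looks like the upper-tail probability of a chi-square distribution with the listed dof. That this is how qpAdm derives the column is an assumption here, though the numbers agree. A minimal stdlib-only sketch to recompute it, with a hypothetical function name:

```python
import math


def chi2_tail_prob(chisq, dof):
    """Upper-tail chi-square probability via the series for the lower incomplete gamma."""
    a = dof / 2.0
    x = chisq / 2.0
    if x <= 0:
        return 1.0
    # Series: gamma(a, x) = x^a * e^-x * sum_n x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    # Regularize by Gamma(a); work in logs to avoid overflow.
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower


# chisq and dof copied from the two models above
for chisq, dof in [(13.987, 9), (15.827, 9)]:
    print(dof, chisq, round(chi2_tail_prob(chisq, dof), 6))
```

A larger tail probability means the model is harder to reject; a cutoff of 0.05 is a common convention rather than anything the output itself enforces.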

Chad Rohlfsen said...

Adding a European MN sample with decent coverage


best coefficients: 0.493 0.115 0.321 0.071

std. errors: 0.026 0.179 0.123 0.077

fixed pat wt dof chisq tail prob
0000 0 8 11.941 0.153858

Chad Rohlfsen said...

With Iran_EN in the pright


best coefficients: 0.483 0.033 0.387 0.097

std. errors: 0.024 0.212 0.127 0.095

fixed pat wt dof chisq tail prob
0000 0 9 16.021 0.0664511


best coefficients: 0.509 0.053 0.359 0.079

std. errors: 0.039 0.147 0.090 0.102

fixed pat wt dof chisq tail prob
0000 0 9 11.779 0.226059

Chad Rohlfsen said...

Using Khvalynsk as a base


best coefficients: 0.631 0.084 0.209 0.077

std. errors: 0.032 0.189 0.116 0.088

fixed pat wt dof chisq tail prob
0000 0 9 10.069 0.344961

Chad Rohlfsen said...


best coefficients: 0.654 0.138 0.171 0.038

std. errors: 0.032 0.194 0.107 0.089

fixed pat wt dof chisq tail prob
0000 0 9 7.023 0.634752

Alberto said...

With nMonte the one that seems to work better is Hungary_CA:

"Eastern_HG" 55.1
"Satsurblia" 27.65
"Hungary_CA" 17.25
"Hungary_EN" 0
"Anatolia_Chalcolithic" 0
"Anatolia_Neolithic" 0
"Hungary_HG" 0

No idea how that would work in qpAdm, though.

Chad Rohlfsen said...

I'm trying to stick to pre-Yamnaya samples. Baden is not that great for coverage anyway.

Alberto said...

Sorry, off topic, but I was looking at models with Spanish and realized that the reason why they improve with a bit of SSA admixture is not because that fits better their Yoruba relatedness, but because it decreases affinity to other Eurasians. This is something that can be seen in the PCAs from the D-stats, and it becomes very evident in populations with higher SSA admixture, which become very distant from other Eurasians even by having ~5% SSA.

So I'm wondering, if Basal Eurasian is something that decreases affinity to the rest of Eurasians, but it's not something African because Basal Eurasian rich populations are not closer to Africans than other Eurasians, meaning that a D-stat like:

D(Chimp, Yoruba; Anatolia_Neolithic, Loschbour)

Is not significant, then I'm guessing that someone actually cared to run stats in the form:

D(Chimp, African1; African2, Loschbour)

And that those are significant, right? Because if, by the same measurements, Africans are not closer to Africans, then where's the logic in Basal Eurasian? A bit of African admixture would be enough to explain that decrease in affinity to other Eurasians, and from every other point of view it's a much better option (the idea of a Basal Eurasian population is not only at odds with archaeology, but even with common sense, so alternative explanations would be preferred).

The last paper in pre-print uses the same method to prove that Natufians are not closer to Africans, so again I have to assume that those stats with Africans are significant, but would it be possible to actually see those results? Could someone run a few D-stats like:

Chimp Mbuti Loschbour Yoruba
Chimp Mbuti Loschbour Masai_Kinyawa
Chimp Yoruba Loschbour Ju_Hoan_North
Chimp Mota Loschbour Esan_Nigeria

If all those are significantly positive (as should be expected, someone must have checked this before) then I guess all is good. But just to make sure this is solid enough (I have reasonable doubts based on what I've seen, but I haven't seen much of those).

Alberto said...

Yes, Hungary_CA is roughly contemporary with Yamnaya, but that doesn't look too anachronistic. Though yes, the sample is not great quality, that's true.

Davidski said...


This is what they used...

CHG is Kotias + Satsurblia
EHG is Karelia_HG (2) and Samara_HG
WHG is Loschbour, Iberia_Mesolithic and Hungary_HG

Btw, from memory Hungary_CA maybe works, but it's definitely a worse fit than Hungary_EN.

Chad Rohlfsen said...

result: Chimp Mbuti Loschbour Yoruba -0.0035 -1.200 28849 29052 596893
result: Chimp Mbuti Loschbour Masai 0.0111 3.888 27704 27095 596893
result: Chimp Yoruba Loschbour Ju_hoan_North -0.1414 -44.648 27858 37032 596893
result: Chimp Mota Loschbour Esan -0.0709 -16.715 28750 33137 596642
result: Han Mbuti Loschbour Yoruba 0.2732 84.168 45170 25786 610774
result: Han Mbuti Loschbour Masai 0.2160 66.203 41436 26714 610774
result: Han Yoruba Loschbour Ju_hoan_North 0.2634 85.033 44913 26189 610775
result: Han Mota Loschbour Esan 0.2178 52.835 42179 27093 610438
result: Onge Mbuti Loschbour Yoruba 0.2688 76.875 44805 25819 610765
result: Onge Mbuti Loschbour Masai 0.2128 60.537 41120 26687 610765
result: Onge Yoruba Loschbour Ju_hoan_North 0.2586 77.419 44560 26250 610767
result: Onge Mota Loschbour Esan 0.2139 48.431 41723 27021 610429
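Assuming the columns in each result line are D, Z, the BABA count, the ABBA count and the number of SNPs (my reading of the output, not something stated above), the D value can be recomputed from the two counts as (BABA - ABBA) / (BABA + ABBA). A small sketch with a hypothetical helper name:

```python
def d_from_counts(baba, abba):
    """D statistic from the two site-pattern counts (hypothetical helper)."""
    return (baba - abba) / (baba + abba)


# (label, reported D, first count, second count) copied from the lines above
rows = [
    ("Chimp Mbuti Loschbour Yoruba", -0.0035, 28849, 29052),
    ("Chimp Mbuti Loschbour Masai", 0.0111, 27704, 27095),
    ("Chimp Yoruba Loschbour Ju_hoan_North", -0.1414, 27858, 37032),
    ("Chimp Mota Loschbour Esan", -0.0709, 28750, 33137),
]

for label, reported, baba, abba in rows:
    recomputed = d_from_counts(baba, abba)
    print(f"{label}: reported {reported:+.4f}, recomputed {recomputed:+.4f}")
```

The recomputed values agree with the reported ones to the four decimals shown, which at least supports that reading of the columns.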

Chad Rohlfsen said...

Hmm. I wonder why they'd use LaBrana with having significantly more Aurignacian, and KO1 looks like he has BE and extra EHG.

Davidski said...

I don't think it really matters in this case, because they're using WHG as an outgroup, and they also used Anatolia_Neolithic as an outgroup.

KO1 might have a bit of extra EHG, but it's probably not recent.

Alberto said...


Thanks for those stats. So in the ones with Chimp as an outgroup, only Masai appears closer to another African population. In the other 3, Loschbour is closer to Africans than the other Africans are.

So I don't know what the point is of using those kinds of stats to prove that populations with Basal Eurasian are not closer to Africans than those without it. Or the last ones about Natufians.

A much more simple explanation would be a late (during or after the LGM) African admixture in populations closer to Africa. Just some ~3% would be enough, and that makes everything so much easier. And as long as the putative African population mixed with other Africans after that, you wouldn't be able to find the source of it with D-stats.

ryukendo kendow said...

@ Alberto

I take it that you are postulating Basal Eurasian as a group apart from present-day Africans, which is still African in the geographical sense?

Because the stats only show that Basal Eurasian do not share drift with present-day/currently known Africans, which can also be seen from the previous image with the shared drift paths between populations and columns, where Basal-admixed populations shared no more drift with Mota and Yoruba than other Eurasians did. Treemix also finds the same thing, or very much the same thing where the Basal edges are from the trunk, or share a very minute amount of drift with Mota at most.

So even if Basal Eurasian is African it would have to not bias Basal-admixed populations towards any currently-known Africans. As for the amount of African ancestry needed, a ~3% figure would need to be from a population between Mbuti and Yoruba, closer to Mbuti's split point, to fit drift measures with Ust Ishim, Kostenki, WHG etc. Not impossible, but a somewhat surprising scenario.

ryukendo kendow said...

^^Admittedly, this is the scenario I proposed for Natufians. So maybe more agnosticism is warranted on my side as well.

Alberto said...


Yes, exactly. According to D-stats the term African does not seem to define one specific clade. So whatever African branch that might have admixed with Eurasians to create this decrease in affinity to other Eurasians might no longer exist in its pure form. So D-stats with modern Africans are not going to be informative about it.

As for the amount it doesn't have to be 3% exactly. I don't know how much recent African admixture Palestinians or Jordanians might have. But probably not far from ~5%. And I think the effect is already quite bigger than what we see with Anatolia_Neolithic vs. Loschbour. IOW, I think (can't test it) that in stats like:

D(Chimp, Ust-Ishim; Anatolia_Neolithic, Loschbour)
D(Chimp, Ust-Ishim; Palestinian, Anatolia_Neolithic)

The second one might be more significant. And I don't know exactly what kind of recent SSA Admixture Palestinians might have, but I don't think it's from Pygmies or San populations. More likely east African? But anyway, 3 or 6%, doesn't matter. As long as we don't need 50-60% admixture from a ghost Basal Eurasian population that must have been in Eurasia since 70 Kya but hiding somewhere, it's already a big step forward.

All the rather failed attempts in the last Lazaridis et al. pre-print to estimate Basal Eurasian admixture only make me more sceptical about the existence of such a population. I don't know, is there any other hint about it apart from the D-stats showing that Basal Eurasian populations are not closer to a few modern Africans?

ryukendo kendow said...

@ Alberto
If a population is from Africa, but not related to present day Africans, and was in contact with Eurasians, I suppose there is still some chance that the population split off before Mota and thus is nested within variation properly defined as 'African', but then there are some parameters which delimit our models already, at least with respect to Ust Ishim, Villabruna, Natufians, Iran_N, MA-1, Kostenki, et cetera, i.e. all the populations they include in the largest ADMIXTUREGRAPH figure. There's the matter of how basal the contribution is, which they display with a curve in the supp info: if a contribution comes from a population that is as far away from Eurasians as Mota is, then the contribution is already ~10%, and if before Mota and after Yoruba the proportion drops steadily to ~2%, which sounds fine but gets us back to the question of how likely a population more basal than Mota is to be found close to the Sinai or the Red sea straits.

For the bad modelling, a lot of it comes from using Kostenki and Loschbour as the estimator, which produces bad estimates for all groups with ancestry from outside (Kostenki-Loschbour), such as SHG and EHG. They attempt to resolve this by using the curve-fitting method in the ADMIXTUREGRAPH as well, and it turns out that trees can be fit with 0% Basal in these groups, unlike the case for Iran_N and Natufian.

I think Basal is probably real, especially as we haven't tried an f3 to see if e.g. Natufian, Iran_N and Anatol_N share more drift with each other than any do with K14. I think they would, but I also suspect that the 60% figure is too high.

Chad Rohlfsen said...

In the sense that no BE carrying pops are closer to Africans, but share more drift with each other than with hunters, and less with East Asians it makes sense.

Alberto said...

Looking at the f3 of shared drift with Ust-Ishim, the difference between WHG and LBK_EN is slightly bigger than between LBK_EN and Jordanians and Palestinians, but about half as much as between LBK_EN and Tunisians and Algerians. Since Levant_Neolithic is already quite more Basal than LBK_EN, it would probably require a similar amount of admixture from a similarly divergent African branch as whatever Jordanians and Palestinians have. So that looks realistic enough.

But I wonder, do Palestinians and Jordanians appear closer to Africans than other Eurasians? And to which Africans exactly? At least that could give some realistic point of comparison to substantiate that Basal Eurasian admixed populations are not closer to Africans by those same stats that prove that Jordanians are.

Another thing that puzzles me is the behaviour of D-stats with Africans. Has someone ever attempted to run more stats with them to build some tree or see which clades they form or whatever?

And why are results so different from what we see with IBS? By this Yoruba IBS that Kurd did a while back it looks really different from D-stats:

So maybe IBS is a better method to check if one Eurasian population is closer to African than another?

Shaikorth said...

Alberto, supplementary table 2 here has a large set of pairwise allele sharing distance comparisons between Eurasians and Africans. That method should correspond to IBS. Lower distance = closer.

human443 said...

I'm with Ryukendo and Alberto on this one. I think the figures for basal are too high, and the split too recent.

If you imagine a stat in the form
Mbuti test K14 Ust_Ishim
A test population with 100% basal, ENA, or any extra crown Eurasian branch should get a value of 0.
If we use Vestonice16 as our upper bound in K14 relatedness, we get a D value of -.914.

Here are some values (and upper bound percentages based on Vestonice16)...
BedouinB -.538 (41.1%)
Sardinian -.618 (32.4%)

So BedouinB's combined Basal, ENA, extra crown Eurasian, and African can only add up to 41.1%. Most sources list them as what, 8% African? Throw in 3.1% of combined ENA and extra Crown (shouldn't be unreasonable), and that puts us at only 30% max. It should also be noted that any West Eurasian that branches off before the V16-K14 split will also inflate this percentage, seeing as our ANE reference (AG3) gets only a -.507 D value.

With that noted, we know ANE affinity increased in Epipaleolithic Europe, decreased a bit in the neolithic ('neolithic anatolian farmers have whg like ancestry farther along the whg-ehg cline than whg'), then rose drastically in the bronze for Europe, we will get more accurate basal values setting our upper bound to Villabruna (-.752).

French -.633 (15.8%)
Italian Tuscan -.628 (16.5%)
Lithuanian -.653 (13.2%)
Sardinian -.618 (17.8%)
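The percentages in brackets follow from treating the reference's D value as the 0% end and a D of 0 as the 100% end, i.e. 1 - D_test / D_reference. A short sketch of that arithmetic (the function and constant names are just for illustration):

```python
def upper_bound_pct(d_test, d_reference):
    """Upper bound on non-reference-like ancestry, in percent (illustrative helper)."""
    return 100.0 * (1.0 - d_test / d_reference)


# Upper-bound references and their D values, as quoted above
VESTONICE16 = -0.914
VILLABRUNA = -0.752

print(round(upper_bound_pct(-0.538, VESTONICE16), 1))  # BedouinB
print(round(upper_bound_pct(-0.633, VILLABRUNA), 1))   # French
print(round(upper_bound_pct(-0.653, VILLABRUNA), 1))   # Lithuanian
```

Swapping the reference from Vestonice16 to Villabruna is what moves Sardinian from 32.4% down to 17.8%, so the choice of upper bound matters at least as much as the test population's own D value.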

Alberto said...


Thanks! Though I'm not sure I'm understanding those numbers correctly. For example, Georgians, Armenians or Adyghe are about 0.0004 distance from Africans, but Jordanians are 0.0015. That means that Jordanians are very significantly further away from Africans than Caucasus populations?

Shaikorth said...

Lower diagonal is standard errors I think.
So Armenian-Yoruba is 0.28569 and Jordanian-Yoruba 0.27960.

Subtract those from 1 and the number should be the IBS they get with Yoruba using that set.

Matt said...

@ Alberto:
Another thing that puzzles me is the behaviour of D-stats with Africans. Has someone ever attempted to run more stats with them to build some tree or see which clades they form or whatever?

My understanding is, besides the East Africans whose admixture we know about (from both West Africa and West Eurasia), African patterns go like this:

Ju_Hoan_North: All groups are a clade to them except Levant_Neolithic and related groups. That is D(Chimp, JHN, Yoruba, Han) or D(Chimp, JHN, Yoruba, Mbuti) or D(Chimp, JHN, Mota, Mbuti) would be roughly 0.

Mbuti: They are a clade with Eurasians to JHN, Yoruba and Eurasians form a clade to them.

Yoruba: 1) All Eurasian groups form a clade to them, 2) they form a clade with Eurasians to the exclusion of Mbuti or Ju_Hoan_North, 3) Mota and Eurasians are a clade to them.

Mota: 1) Eurasians are all a clade to them, 2) they are a clade with Eurasians to all other Africans

The geneflow from Levant_Neolithic related groups to JHN makes this slightly more complicated but I think those are the patterns.

And these are the patterns I think we see in treemix (for example), plus some weak low level geneflow edges.

This could all be tested explicitly though!

Davidski kindly ran off a set of African based D-stats for me when the Mota paper was released, but they didn't include the specific statistics. Generally all that is applicable from them to the above is that the pattern of increasing shared drift with Eurasians goes JHN->Mbuti->West Africans->Mota and the admixed East Africans overlap with Mota. Although this said, there was a finding there of Mota and Yoruba both sharing more with Eurasians than each other. However, that was before the Mota sample was corrected for genotyping / SNP errors I think, so probably does not quite stand.

In theory this recapitulates a population history of

1) All other humans break off from JHN
2) pygmy groups break off from all other humans
3) West-Central African break off from all other humans
4) Ancient East Africans (represented by Mota) break off from all other humans
5) All other humans (non-Africans) then split into different Eurasian related groups, with the BEu splitting off first.

Plus admixture from clades which are basal to all humans (Neanderthal, Denisovan, etc.).
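To make the expectations concrete, here is a toy sketch that ignores every admixture edge and treats the split order above as a simple ladder. Under that assumption, D(Chimp, A; B, C) should be roughly 0 whenever A branches off before both B and C. The population labels standing in for each step (Mbuti for pygmy groups, Yoruba for West-Central Africans, Han for non-Africans) and the function name are placeholders of mine.

```python
# Split order from the list above, earliest branch first (admixture ignored).
SPLIT_ORDER = ["Ju_hoan_North", "Mbuti", "Yoruba", "Mota", "Han"]
RANK = {pop: i for i, pop in enumerate(SPLIT_ORDER)}


def expect_zero(a, b, c):
    """True if D(Chimp, a; b, c) is expected to be ~0 on the ladder tree."""
    # b and c form a clade relative to a only if a split off before both.
    return RANK[a] < min(RANK[b], RANK[c])


print(expect_zero("Ju_hoan_North", "Yoruba", "Han"))  # a is the most basal branch
print(expect_zero("Yoruba", "Ju_hoan_North", "Han"))  # a sits inside the clade with Han
```

Any real statistic that is significantly non-zero where this returns True would point to gene flow or to a wrong split order, which is exactly what explicit testing would show.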

Matt said...

@ Alberto again, just to visually describe the clade pattern I described from my previous post:

This contrasts the kind of models that fall out of software like treemix, where there is a differential clading of different Africans with Eurasians, with a model that would have all Africans form a clade together against Eurasians.

(note low level admixture edges like those that link Eurasia->Ju_Hoan_North and probably link Yoruba->Mbuti and Eurasian->Neanderthal are not present).

Alberto said...


Thanks. So those numbers do seem to support a Basal Eurasian population that is not closer to Africans, since Armenians and Han are equidistant to Africans. And also it shows that populations with African admixture are closer to Africans (Jordan, Yemen, Bedouin). Also it does show that Africans are significantly closer to each other than to Eurasians. So all that looks consistent, at least.


Thanks for that explanation. It all sounds reasonable, but did the D-stats confirm those patterns? From the 4 run by Chad above 3 would be consistent with them, but this one doesn't seem to be:

Chimp Mbuti Loschbour Masai 0.0111 3.888 27704 27095 596893

And when you introduce related populations it works? Like Gambian forming a clade with Yoruba to everyone else?

Shaikorth said...

Alberto, may not be that simple as Europeans are more distant from Africans than Armenians and Han in that table.

Occasionally we see African edges to East Asians in TreeMix too...

ryukendo kendow said...

Nice tree Matt, more or less in perfect agreement. Looking at how consistently Treemix produces this, there may be another possibility that Mota may branch off rather close to Eurasians, less than half of the way back from Ust-Ishim to Yoruba, and have 20% contribution from a Ju-Hoan-like population.

About the stats Chad produced, it's interesting that the sharing between Mbuti and Maasai is greater than the sharing between Mbuti and Yoruba. I wonder why that is?

Chad, how many populations of Africans do you have in your dataset?

Chad Rohlfsen said...

Tons. 41 from the non-public set, plus another 40 or so from a low coverage set with Sudanese, Fulani, and Ethiopian groups. Of my 4000 samples, probably half are Africans, but some are very admixed of course.

ryukendo kendow said...

Understand this will take quite some effort, but do you mind running stats of the form:

Chimp Ju_Hoan Denisovan X
Chimp Mbuti Denisovan X
Chimp Mota Denisovan X
Chimp Esan_Nigeria Denisovan X
Chimp Anatolia_Neolithic Denisovan X
Chimp Iran_Neolithic Denisovan X
Chimp Levant_Neolithic Denisovan X
Chimp Natufian Denisovan X
Chimp BedouinB Denisovan X
Chimp Jordan_EBA Denisovan X
Chimp Ust_Ishim Denisovan X

Where X is Africans? No need to include all of them, maybe include a few populations each from W Africa Niger-Congo speakers like Yoruba and Mandenka, non-Semitic Afroasiatics such as Fulani, Oromo, Somali, Ari etc, some Ethiosemitics like Tigray and Amhara, some N Africans like Mozabite and Moroccan, some Nilotes and Sudanese, S African HGs, E African Bantus, Central African HGs and Hadza/Sandawe?

The drift path graphing will be very illuminating w respect to whether the clade-like splitting in Treemix is true, and PCA and neighbour-joining and cluster analysis will allow us to explore patterns independently of ADMIXTURE analyses and haplotype-based PCAs and chunk-based analyses which are the only explorations I've seen so far iirc. There doesn't seem to be a formal-stats based analysis yet.

Particularly interested in how the N and E Africans will behave with Ancient Near Easterners, which ones each group prefers.

Alberto said...


Yes, there are some small differences, but the populations that have some SSA admixture are clearly more significant. So with that data at least I see more consistency and could buy the Basal Eurasian hypothesis. Though it would have been nice to see some East African population included to see how those ones behave.


Yes, would be great to check that as time permits. Right now it seems we're quite in the dark with formal stats with Africans. Apparently everyone, because in this last pre-print they run something like:

Chimp Mbuti Natufian WHG

To show that Natufian is not closer to Africans. But when Yoruba in the place of Natufian gives a non-significant result too, the stat seems pretty uninformative. Same for the ones with Ju_hoan_North and Yoruba. Only with Mota it seems to be relevant in that case, but even then you'd need to test many more putative sources, knowing which ones are possibly relevant and which ones aren't. And even then the doubt of later admixture into the putative branch would make the results inconclusive without ancient DNA, but at least they could say they tried all the meaningful possibilities.

Matt said...

@ Alberto: Thanks for that explanation. It all sounds reasonable, but did the D-stats confirm those patterns? From the 4 run by Chad above 3 would be consistent with them, but this one doesn't seem to be:

Chimp Mbuti Loschbour Masai 0.0111 3.888 27704 27095 596893

And when you introduce related populations it works? Like Gambian forming a clade with Yoruba to everyone else?

I don't think I've ever seen it tested by D-stats. I think any violations to that model should've been big news, but it would be nice to explicitly test. I think these - would be able to test the basic features of that model, if Davidski or Chad or another observer had any time to run them.

Same re: testing whether populations that seem similar to Yoruba, like Gambian, form a clade with any other African populations, or whether the main trunk of their ancestry just branches off the tree in model A at a higher / lower position than Yoruba. I don't actually know what happens there with D-stats.

Shaikorth said...


Those differences don't seem to be so significant, actually. For instance, Orcadian-Chinese ASD to Mbuti is comparable to Chinese-Palestinian ASD to Mbuti.

Davidski said...

Matt, you wrote Chimo instead of Chimp. Haha.

Matt said...

Thanks David. Chimo, what a fail...

So, these violate my expectations in numerous ways:

D (Chimp Ju_hoan_North Yoruba Han) = 0.019 Z= 9.023
D (Chimp Ju_hoan_North Mota Han) = 0.007 Z= 2.095
D (Chimp Ju_hoan_North Mbuti.DG Han) = 0.0192 Z= 6.646
D (Chimp Ju_hoan_North Biaka Han) = 0.0258 Z= 10.992
D (Chimp Ju_hoan_North Somali Han) = -0.0047 Z= -2.423

D (Chimp Ju_hoan_North Yoruba French) = 0.0247 Z= 12.797
D (Chimp Ju_hoan_North Mota French) = 0.0127 Z= 3.889
D (Chimp Ju_hoan_North Mbuti.DG French) = 0.0245 Z= 8.749
D (Chimp Ju_hoan_North Biaka French) = 0.0312 Z= 13.947
D (Chimp Ju_hoan_North Somali French) = 0.0014 Z= 0.914

D (Chimp Ju_hoan_North Yoruba GujaratiD) = 0.0219 Z= 10.168
D (Chimp Ju_hoan_North Mota GujaratiD) = 0.0098 Z= 2.883
D (Chimp Ju_hoan_North Mbuti.DG GujaratiD) = 0.0219 Z= 7.395
D (Chimp Ju_hoan_North Biaka GujaratiD) = 0.0286 Z= 11.802
D (Chimp Ju_hoan_North Somali GujaratiD) = -0.0017 Z= -0.937

Ju_Hoan closer to all Eurasians in this comparison than to all Africans (except Somali). Levant_N farmer ancestry must have a very strong impact on Ju_Hoan_North.

Though at the same time:

One pair of stats I do find a little tricky to understand is:

D (Chimp Yoruba Ju_hoan_North Han) = 0.1439 Z= 55.941
D (Chimp Ju_hoan_North Yoruba Han) = 0.019 Z= 9.023

It's tough for me to understand quite what it means for both of those to hold at the same time.


D (Chimp Ju_hoan_North Yoruba Mota) = 0.0124 Z= 4.36
D (Chimp Ju_hoan_North Mbuti.DG Mota) = 0.0131 Z= 3.974
D (Chimp Ju_hoan_North Biaka Mota) = 0.0197 Z= 6.711
D (Chimp Ju_hoan_North Somali Mota) = -0.0116 Z= -3.952


indicates there is some clade structure where Ju_Hoan_North is closer to ancient and modern East Africans than West Africans.


D (Ju_hoan_North Mbuti.DG Yoruba Han) = -0.0118 Z= -6.633
D (Ju_hoan_North Mbuti.DG Mota Han) = 0.0125 Z= 4.669
D (Ju_hoan_North Mbuti.DG Biaka Han) = -0.013 Z= -6.16
D (Ju_hoan_North Mbuti.DG Somali Han) = -0.0033 Z= -2.017
D (Ju_hoan_North Mbuti.DG Yoruba Mota) = -0.0247 Z= -11.105
D (Ju_hoan_North Mbuti.DG Biaka Mota) = -0.0259 Z= -11.034
D (Ju_hoan_North Mbuti.DG Somali Mota) = -0.016 Z= -7.078


West African pygmy and Yoruba do seem to form a clade together to the exclusion of Ju_Hoan_North and Mota.

These patterns forming makes it tougher to understand the patterns in the other stats, which I wrote out on different assumptions.

Maybe Chimp (chimo) might be a pretty bad outgroup to make sense of Africa. Unfortunately we don't have anything else...

Alberto said...

I don't think that anyone can make much sense of those stats. I kind of expected that kind of result, and I think that adding more populations will only make things worse.

In any case, this is why I'm a bit sceptical about Basal Eurasian not being closer to Africans as part of the definition of Basal Eurasian. Being "closer to Africans" is a very vague concept in the light of the stats we see above.

So I think that for the time being I prefer to think about Basal Eurasian as a late Out of Africa instead of it as being part of the early (main) Out of Africa. It just makes more sense overall. I guess we have to wait for some DNA from the Antelian period (30-18 K BCE) of the levant to really know.

ryukendo kendow said...

...Hmm Matt and Alberto, the stats seem reasonably clear, in fact they do not contradict the tree structure you(Matt) proposed earlier at all.

We had
(Ju_Hoan (Yoruba (Mota/Somali (Eurasian))))
With some Eurasian and East African gene flow to Ju_Hoan, as found in Reich et al, and Yoruba gene flow to Mbuti as found in ADMIXTURE and Treemix, so Ju Hoan being closer to Eurasian than to Yoruba is expected since the ancestors of Yoruba and Eurasian split off from Ju Hoan at the same time, so both are the same distance to Ju_Hoan in the absence of gene flow.

Then for this:
D (Chimp Yoruba Ju_hoan_North Han) = 0.1439 Z= 55.941
D (Chimp Ju_hoan_North Yoruba Han) = 0.019 Z= 9.023

Is also expected because Yoruba and Han form a clade against Ju_Hoan, so Yoruba<<>>Han to the exclusion of Ju_Hoan, while Ju_Hoan is equidistant from Yoruba and Han without gene flow from either, so a small amount of gene flow from Eurasians biases Ju_Hoan towards Han.

Nevertheless, 'Africans' aren't monophyletic, so it makes sense to question what it means to have 'African' gene flow, especially as it's possible to have gene flow from geographic 'Africans' without being biased to any present-day African clades at all. But any such clade would have to be hypothetical.

Alberto said...

But there are too many problems there. If this:

Chimp Ju_hoan_North Mota French 0.0127 3.889 28311 27604 593430

Is because Ju_hoan_North have a significant amount of Eurasian admixture, and closer to French admixture than to Han admixture, because:

Chimp Ju_hoan_North Mota Han 0.0070 2.095 28213 27822 593430

Then Ju_hoan_North is not a good outgroup to Eurasians because probably comparing to Levant_Neolithic and Han, it will reach significance. So suddenly Basal Eurasian will indeed make populations closer to Africans. At least to Ju_hoan_North.

And then:

Chimp Yoruba Mota French 0.0559 18.321 32116 28714 593430

Yoruba has actually a lot more Eurasian admixture than Ju_hoan_North!

Ju_hoan_North Biaka Mota French 0.0212 10.370 29340 28120 593430

Apparently Biaka also has a lot more Eurasian admixture than Ju_hoan_North!

Ju_hoan_North Mbuti.DG Mota French 0.0120 4.754 28560 27880 593418

Mbuti also more, though not so much more.

And French consistently represents that Eurasian admixture across Africa better. French has the highest Basal Eurasian too. So one way or the other there was gene flow between Africans and Basal Eurasian, which contradicts the idea that Basal Eurasian does not get you closer to Africans (unless it's because of the WHG in French, but in that case it would be WHG that is the one closer to Africans).

At least, Masai has more Mbuti admixture than Mbuti has Eurasian:

Chimp Mbuti Loschbour Masai 0.0111 3.888 27704 27095 596893

(or alternatively Mbuti has more Masai than Eurasian admixture).

This is just with a few populations. I'm guessing that if you add 30 African populations you will run out of ideas very fast to explain what's going on.

Shaikorth said...

And with Wong et al's D-stats using high coverage genomes French are not closer to San, Dinka or Mbuti than East Asians are so we can't say there is significant French-like mixture around Sub-Saharan Africa.

ryukendo kendow said...

I really don't think so, especially because we already see repeatedly that Mota has ~30% gene flow from a point even more basal than Ju_Hoan, so in stats with Mota in 3rd position and the most basal population in 1st there will be an attraction between 2nd and 4th positions which will take over the stat, which is why the original paper with Mota did not use stats of that form to discover gene flow into Africans.

I don't think there's much that's surprising. The stats are highly conformable, like those for the EEF populations in the Kumtepe paper, and unlike the case in Eurasia right now; they seem very consistent with the results of Treemix and ADMIXTURE so far.