Tuesday, February 6, 2018

Unleash the power: Global 25 test drive thread


Ancestry modeling enthusiasts, feel free to do your best (or worst) with these datasheets and share the output, whatever it might be, in the comments below:

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)

Global 25 PAST datasheet

The Global 25 is a more powerful version of the Global 10 ancestry analysis (see here). If all goes well in the comments, it'll soon be offered for free to those who already have Global 10 coordinates. After that, we'll see what happens.


Below is a quick attempt to model Samara Yamnaya with its Global 25 coordinates using nMonte2. Hmmm...interesting stuff.

[1] distance%=2.4936 / distance=0.024936

Yamnaya_Samara

Samara_Eneolithic 71.15
Armenia_EBA 13.6
CHG 12.6
Iran_LN 2.65
ALPc_MN 0
Anatolia_BA 0
Anatolia_ChL 0
Armenia_ChL 0
Baden_LCA 0
Balaton_Lasinja_CA 0
Baltic_HG 0
Barcin_N 0
Blatterhole_HG 0
Blatterhole_MN 0
Boncuklu_N 0
EHG 0
Greece_N 0
Greece_Peloponnese_N 0
Iran_ChL 0
Iran_N 0
Koros_EN 0
Koros_HG 0
LBKT_MN 0
LBK_EN 0
Levant_BA 0
Levant_N 0
Narva_Estonia 0
Narva_Lithuania 0
SHG 0
Starcevo_EN 0
TDLN 0
Tepecik_Ciftlik_N 0
Tianyuan 0
Tisza_LN 0
Tiszapolgar_ECA 0
Vinca_MN 0
WHG 0
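
For anyone wondering what a fit like this involves: the output is a set of non-negative percentages that sum to 100, plus the Euclidean distance between the target and the weighted average of the reference coordinates. The Python sketch below is a toy illustration of that idea only. It is not nMonte2's actual algorithm; the function name fit_mixture, the step size and the three 2D reference points are all made up for the example.

```python
import math

def mix(weights, refs):
    """Weighted average of the reference coordinate vectors."""
    dims = len(refs[0])
    return [sum(w * r[k] for w, r in zip(weights, refs)) for k in range(dims)]

def distance(a, b):
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_mixture(target, refs, units=200):
    """Greedy search over mixtures in steps of 100/units percent.

    Starts with everything assigned to the first reference, then keeps
    moving one unit between references for as long as that lowers the
    distance between the mixture and the target.
    """
    counts = [units] + [0] * (len(refs) - 1)
    best = distance(target, mix([c / units for c in counts], refs))
    improved = True
    while improved:
        improved = False
        best_move = None
        for i in range(len(refs)):
            if counts[i] == 0:
                continue
            for j in range(len(refs)):
                if i == j:
                    continue
                trial = counts[:]
                trial[i] -= 1
                trial[j] += 1
                d = distance(target, mix([c / units for c in trial], refs))
                if d < best - 1e-12:
                    best, best_move, improved = d, trial, True
        if best_move is not None:
            counts = best_move
    return [100.0 * c / units for c in counts], best

# Made-up 2D coordinates for three hypothetical references and one target.
refs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.0]
percentages, dist = fit_mixture(target, refs)
print(percentages, dist)
```

The greedy search is only there to keep the example short and deterministic. With 30+ references, as in the run above, many different mixtures can land at nearly the same distance.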

Obviously, this makes a lot of sense, but it's somewhat different from my recent models of Samara Yamnaya using methods based on formal stats (see here and here). In the end, only ancient DNA from the steppes and Caucasian-Caspian region will settle this issue when enough of it is sampled.

Update 10/02/2018: As per our discussion in the comments, in most cases it might be useful to restore the variance of the raw data (like in the datasheets here and here). This can be done with the EigenScale.R script available here. You'll also need a text file of the relevant eigenvalues, available here. Instructions on how to call the R script are here. Below is the same model of Samara Yamnaya as above, except using the "restored" data. The result is very similar, albeit a little cleaner.

[1] distance%=3.8086 / distance=0.038086

Yamnaya_Samara

Samara_Eneolithic 70.35
Armenia_EBA 14.9
CHG 14.75
ALPc_MN 0
Anatolia_BA 0
Anatolia_ChL 0
Armenia_ChL 0
Baden_LCA 0
Balaton_Lasinja_CA 0
Baltic_HG 0
Barcin_N 0
Blatterhole_HG 0
Blatterhole_MN 0
Boncuklu_N 0
EHG 0
Greece_N 0
Greece_Peloponnese_N 0
Iran_ChL 0
Iran_LN 0
Iran_N 0
Koros_EN 0
Koros_HG 0
LBKT_MN 0
LBK_EN 0
Levant_BA 0
Levant_N 0
Narva_Estonia 0
Narva_Lithuania 0
SHG 0
Starcevo_EN 0
TDLN 0
Tepecik_Ciftlik_N 0
Tianyuan 0
Tisza_LN 0
Tiszapolgar_ECA 0
Vinca_MN 0
WHG 0

See also...

Modeling genetic ancestry with Davidski: step by step

334 comments:

Chad Rohlfsen said...

Uh oh.. looks like my qpGraph LOL

Gotta give you shit David

Chad Rohlfsen said...

Did I do the Globe10? I don't remember.

Davidski said...

No, but I can send you the Global 25 coordinates tomorrow after I run a few more ancient samples.

Chad Rohlfsen said...

Cool, thanks!

Matt said...

Sweet. Not had time to do any nMonte stuff yet, here are a few simple neighbour joining trees and distance co-plots based on the full 25 dimensions and population averages:

https://imgur.com/a/KNsnD

In the plots, open circles are ancient, filled are modern. Colour scheme is pretty obvious (only really distinguishes West Eurasian populations since I'm reusing one from an old West Eurasian plot and supplementing with PAST3's K-means cluster analysis).

Plots above focus on distinguishing affinities which would've been tough to separate in Global10, mainly Bronze Age populations who have fairly overlapping positions on basic West Eurasian 2D PCA.

(Aside, Tajiks+North Caucasus come out marginally closest to Yamnaya among moderns over all 25 dimensions, kind of like in the high dimension West Eurasian Ancient 67 plots.)

(Differentiation within populations is v. interesting as well - Iranian Bandari stood out to me on that one.)

Re: whether this is scaled properly for Alberto / Sein, the distances look relatively right for it to be, to me. Couldn't confirm the way I normally do, by just re-processing the PCA through PAST3 again, but the distances def. look right.

human443 said...

So, let me get this right, you modeled Yamnaya as part Armenia_EBA and are quoted saying "Obviously, this makes a lot of sense" despite vehemently denying anything of the sort up until now. What changed? Have to agree with Chad here, shit giving seconded.

Alberto said...

@Matt

Thanks for all the graphs. It looks good, retaining fine structure. I didn't have time to test it myself, but a quick look makes me think that the dimensions are normalized. Turkish being closer to Bantu_Kenya than to Saudi or Koros_EN, much closer to Yoruba than to WHG...

Might not matter too much for Europeans, but probably people trying more distant populations will still get those strange results (Natufian in Native Americans, IIRC, etc...).

Rob said...

Dave, nice work
It’ll be good for more recent historic stuff methinks

Davidski said...

@human443

So, let me get this right, you modeled Yamnaya as part Armenia_EBA and are quoted saying "Obviously, this makes a lot of sense" despite vehemently denying anything of the sort up until now. What changed?

Nothing's changed.

I was in a hurry and went a little crazy, packing the model with well over 30 reference samples. This is not a sensible modeling strategy, because it's likely to lead to overfitting.

Nevertheless, the algorithm spat out a fairly coherent model, that did make sense, more or less, not only in terms of genetic affinities, but also geography and chronology.

Of course, this model does contradict my latest models based on formal stats. So pending more work with the Global 25 and arrival of more ancient samples from the steppe and Caucasus, I suspect that it is indeed overfitted and/or missing something.

So I'm really not sure why you've got your panties in a knot, carrying on as if this was some sort of ideological struggle, rather than an open analysis and discussion, which is what it is.

Davidski said...

Same sloppy methodology, but check out the results! This is impressive.

[1] distance%=1.4372 / distance=0.014372

CWC_Germany
Yamnaya_Kalmykia 69.85
Tisza_LN 18.65
Blatterhole_HG 8.4
Blatterhole_MN 3.1
ALPc_MN 0
Anatolia_BA 0
Anatolia_ChL 0
Armenia_ChL 0
Armenia_EBA 0
Baden_LCA 0
Balaton_Lasinja_CA 0
Boncuklu_N 0
CHG 0
EHG 0
Greece_N 0
Greece_Peloponnese_N 0
Iran_ChL 0
Iran_LN 0
Iran_N 0
Koros_EN 0
Koros_HG 0
Levant_BA 0
Levant_N 0
Narva_Estonia 0
Narva_Lithuania 0
Samara_Eneolithic 0
SHG 0
Sweden_EN 0
Sweden_MN 0
TDLN 0
Tepecik_Ciftlik_N 0
Tianyuan 0
Tiszapolgar_ECA 0
Vinca_MN 0
Yamnaya_Samara 0

Samuel Andrews said...

Baltic_BA:Turlojiske3
"Yamnaya_Samara" 54.65
"Germany_MN" 24.6
"Narva_Estonia" 20.75
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0

Baltic_BA:Kivutkalns19
"Yamnaya_Samara" 42.1
"Narva_Estonia" 35.25
"Germany_MN" 22.65
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0

Lithuanian
"Yamnaya_Samara" 45.9
"Germany_MN" 31.7
"Narva_Estonia" 21
"Nganassan" 1.4
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0

Estonian
"Yamnaya_Samara" 49.95
"Germany_MN" 30.15
"Narva_Estonia" 16.4
"Nganassan" 3.5
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0

Finnish
"Yamnaya_Samara" 45.65
"Germany_MN" 29.85
"Narva_Estonia" 9.5
"EHG:I0124" 7.8
"Nganassan" 7.2
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0
"EHG:I0061" 0
"EHG:UzOO77" 0

Samuel Andrews said...

Saami. Two different samples. Using D-stats, Saami always get about 13% SHG. Btw Mittnik reported U5b1b1a in Narva. It is actually U5b1b1. U5b1b1 has also been found in Mesolithic Poland. Today, U5b1b1* is everywhere while U5b1b1a is only in eastern Europe.

Saami:GS000035025
"Nganassan" 27.35
"Yamnaya_Samara" 22.65
"EHG:I0061" 25
"Germany_MN" 15.1
"Narva_Estonia" 9.85

Saami:GS000035026
"Yamnaya_Samara" 25
"Sweden_MN" 24.3
"EHG:I0124" 23.55
"Nganassan" 21.7
"Narva_Lithuania" 5.45

When I take out EHG....

Saami:GS000035025
"Yamnaya_Samara" 39.05
"Nganassan" 29.7
"Narva_Estonia" 21.8
"Germany_MN" 9.45
"Narva_Lithuania" 0
"SHG" 0
"Sweden_MN" 0

Saami:GS000035026
"Yamnaya_Samara" 39.15
"Nganassan" 22.8
"Sweden_MN" 20.3
"Narva_Estonia" 17.65
"SHG" 0

Samuel Andrews said...

Norse originate in the southern tip of Scandinavia which is why they are typical northern Europeans with no excess HG admixture.

Saami and Finns are the only Fennoscandians native to former HG territory. Finns clearly have mostly mainland European ancestry as well, probably a lot from people similar to Baltic BA, which includes significant HG admixture from Baltic HGs.

Saami have a huge chunk of Siberian ancestry as well as a dose of Fennoscandian HG admixture. Before Finns and Karelians expanded east and north, you have to think lots of Uralic people like Saami lived there and probably had more EHG/SHG than modern Saami.

Modern Balts and Estonians do have noticeable Baltic HG admixture but it looks like the Baltic HGs may have disappeared to a similar degree as the Stonehenge farmers in Britain. Steppe-heavy mainland Europeans later moved deep into Scandinavia & Karelia where they also mostly "replaced" whatever HG-rich people lived there.

This is interesting because potentially northern Europe could have become much more diverse than it is today. You could have had pure EEFs in Ireland, pure HGs in Karelia. But for independent random reasons, just about everyone along the northern half of Europe became roughly the same Yamnaya, EEF, WHG mix.

You could never predict that there were once really distinct people living side by side in regions today which are genetically uniform.

Chad Rohlfsen said...

Rob,

Did you see that Iran model on the earlier thread?

Chad Rohlfsen said...

Sam, you might wanna check Scandinavians again. There likely is HG and Siberian ancestry via Finns and Saami.

Rob said...

Chad, yep just looked now
Looks good thanks

Chad Rohlfsen said...

You bet!

Arvid E said...

It should be noted that the word Finn comes from the Old Norse word "Finnr", meaning someone who finds, and this word refers either to the Sami or more broadly to hunter/gatherers. As far as I can gather the Sami did not start herding reindeer until relatively recently. It should also be noted that the steppe incursion into Scandinavia predates that of the Sami.

aniasi said...

@Davidski

Probably irrelevant, but don't know where else to ask.

Do you have the info on the genetic sequencing from Cheddar Man?

Seinundzeit said...

David,

Thanks!

Good stuff. It'll be quite fun to work with this data.

If possible, could you also post the eigenvalues for all 25 PCs? I'd greatly appreciate it.

Thanks in advance.

human443 said...

@Dave

So Armenia_EBA doesn't have any Iran_Chl ancestry?

Davidski said...

@aniasi

http://www.dailymail.co.uk/sciencetech/article-5358699/First-Brit-dark-skinned-blue-eyed.html

@Sein

https://drive.google.com/file/d/1uDZxMPkY-52Gef_DWCYJj3V-MAthyxqE/view?usp=sharing

@human443

So Armenia_EBA doesn't have any Iran_Chl ancestry?

It might have, but probably not. See here...

http://eurogenes.blogspot.com/2017/08/the-iron-age-iranian.html

Then again, I haven't tried modeling it yet with the Global 25.

Ric Hern said...

@ Davidski

I wonder what they mean by "...would be considered ‘black’ if he lived today." ?

This is a very broad range from Light Brown to actual Black. Do they mean what Geneticists would describe as Black ? Was he darker than Loschbour?

It is interesting that the reconstruction shows a very Dark Red person. I have seen this type of colour amongst some Africans but not all.

Samuel Andrews said...

"http://www.dailymail.co.uk/sciencetech/article-5358699/First-Brit-dark-skinned-blue-eyed.html"

I just really hope leftists don't get a hold of this and politicize it. Human genetic history is more complex than "Humans arrived in China in 40,000 BC. Nothing changed for 40,000 years, now we have modern Chinese." But many average people think it is that simple. Over the course of 1,000s of years, some discontinuation and natural selection changing phenotype is expected.

Leftists will hold Europeans to an unfair standard: if everyone who lived on your land in the last 40,000 years wasn't exactly like you, then you have no real claim to your land. Or if some of your ancestors lived in exotic lands (Middle East) then you're not actually European.

Ric Hern said...

@ Rob

Thanks. I do not worry. I'm just trying to figure out what the actual consensus is about Black.

I wonder if he was darker than Loschbour ?

Most Cushitic peoples look like the skin colour in the reconstruction.
And some ancient Sahara rock art shows some reddish and some pitch black individuals...

Samuel Andrews said...

@Ric Hern,

Not enough markers related to skin color are known to precisely measure skin color with DNA. All we know is most WHGs from western Europe lacked the two DNA markers associated with light skin in modern Europeans. That's it.

Davidski said...

@All

Enough about Cheddar man.

Seinundzeit said...

David,

Awesome; thank you very much!

With scaling, the output is quite impressive.

To start things off, I did some very basic models for the Srubnaya_outlier, Yamnaya_Kalmykia, and CWC_Baltic_early.

Srubnaya_outlier:

42% EHG
35% MA1
23% CHG

distance%=0.5171

As would be expected, she turns out to be an extremely ANE-shifted sample with minor/moderate CHG-related admixture.

Yamnaya_Kalmykia:

57.4% EHG
30.4% CHG
7.7% Barcin_N
4.6% Iran_ChL

distance%=0.5048

CWC_Baltic_early

54.30% EHG
24.00% CHG
12.35% Barcin_N
7.70% Blatterhole_MN
1.65% Iran_N

distance%=0.5769

Again, sensible results.

So, then I tried some basic models for Iran, southern Central Asia, and South Asia.

Iranian_Persian:

58.95% Iran_ChL + 7.30% Levant_BA + 4.30% Natufian + 1.25% CHG + 1.05% Iran_N
10.65% CWC_Baltic_early + 7.60% Scythian_AldyBel + 4.80% Andronovo
3.85% Bonda:ORI34
0.25% South_Africa_2000BP

distance%=0.0779

(overfitted, but still very reasonable)

Tajik_Shugnan:

15.25% Srubnaya_outlier + 13.35% Scythian_AldyBel + 10.70% CWC_Baltic_early + 8.75% Sarmatian_Pokrovka + 6.25% Andronovo + 5.45% Barcin_N + 0.05% Iberia_ChL
17.20% Iran_N + 15.60% Iran_ChL
7.10% Bonda:ORI34
0.30% Mongola

distance%=0.1284

Kalash:

19.6% Andronovo + 19.0% Srubnaya_outlier + 4.4% CHG
35.8% Iran_N + 5.2% Iran_ChL
15.9% Bonda:ORI34

distance%=0.3408

Pashtun:Pashtun2_8Af:

23.8% Iran_Chl + 20.2% Iran_N
24.1% Andronovo + 9.3% Srubnaya_outlier + 7.3% Scythian_AldyBel + 2.2% CHG
13.2% Bonda:ORI34

distance%=0.2549

Brahmin:

16.85% Srubnaya_outlier + 6.50% Andronovo + 5.45% CWC_Baltic_early + 4.55% Iberia_ChL
34.65% Bonda:ORI34
31.90% Iran_N + 0.05% Iran_ChL + 0.05% MA1

distance%=0.6607

Using the Bonda isn't ideal, as the sample in question is anywhere between 10%-20% West Eurasian. Due to this, in terms of "ASI", the percentages are likely to be slightly inflated.

Nevertheless, that's not the point.

Rather, I think it's quite exciting to see Steppe_MLBA + Srubnaya_outlier for South Central Asians and Brahmins.

If true, this would explain the Steppe_EMBA effect, and with it the R1a-Z93 paradox; it's just an artifact, due to the mixed nature of the steppe-related genetic contribution to South Central Asians and "Upper Caste" North Indians (I mean, a mix of something similar to Sintashta/Andronovo + something similar to Srubnaya_outlier could easily be conflated with either Yamnaya or Sarmatian_Pokrovka, in formal analyses).

Davidski said...

I've noticed myself that there's no decent proxy for the Ancestral South Indian (ASI) component in this test yet.

Onge clearly inflates ASI, while Dai dampens it. Aeta seems to work OK, but is still far from ideal.

We'll just have to wait for Mesolithic genomes from South Asia to get these models right for just about anyone with significant South Asian admix.

Rob said...

My run:

Yamnaya_Samara:I0357
"EHG:I0124" 55.4
"CHG:KK1" 22.15
"Armenia_ChL:I1631" 15.45
"Tepecik_Ciftlik_N:Tep002" 4.85
"Iran_N:I1290" 2.15
"Villabruna:I9030" 0
"ALPc_MN:I1498" 0
d: 0.46%

Pretty sure KK/ Arm Chalc/ Tepecik will disappear when Meshoko-Majkop genomes available.

Rob said...

CWC_Germany:I0049
"Yamnaya_Samara:I0231" 76.1
"Tiszapolgar_ECA:I2354" 18.8
"Narva_Lithuania:Kretuonas1" 5.1
"Villabruna:I9030" 0
"CHG:KK1" 0
"Germany_MN:I0172" 0
d: 0.30%

Bell_Beaker_Germany:I0108
"Yamnaya_Samara:I0231" 38.1
"Tiszapolgar_ECA:I2354" 30.95
"Blatterhole_MN:I1593" 28.8
"Villabruna:I9030" 2.15
"CHG:KK1" 0
d: 0.36%

Adding CWC, Yamnaya is displaced for BB run

Bell_Beaker_Germany:I0108
"CWC_Germany:I0049" 47.95
"Tiszapolgar_ECA:I2354" 24.9
"Blatterhole_MN:I1593" 23.3
"Villabruna:I9030" 3
.....
"Yamnaya_Samara:I0231" 0
d: 0.30%

Very nice, IMO. So CWC is Yamnaya plus north Balkan/ Carpathian Eneolithic. BB = CWC + extra MN, from both C-B & local German MN.

Alberto said...

Yes, looks very nice. Still need to run many more models to see, but I wanted to test the Baltic_BA to compare with Global 10 first. I used the eigenvalue scaling.

Baltic_BA
CWC_Baltic 62.75 %
Narva_Estonia 20.65 %
Hungary_BA 16.55 %
Narva_Lithuania 0.05 %

I like that it took CWC_Baltic instead of CWC_Germany (that was in a long list of omitted populations). Global 10 was not picking CWC_Baltic even in the absence of CWC_Germany, and doing strange things instead. Nice!

Rob said...

Seems to have good diffn.

Well I be ...

Sintashta:RISE392
"CWC_Germany:I0049" 58.55
"Yamnaya_Samara:I0231" 22.4
"Germany_MN:I0172" 16.45
"Narva_Lithuania:Kretuonas1" 1.95
"EBA_Armenia" 0.0

d 0.26%

Seinundzeit said...

Alberto,

I completely concur; with eigenvalue scaling, everything so far looks very nice.

David,

I dream of Mesolithic South Asian genomes (lol). Such samples would clarify so many questions.

With that being said, I've found that when one applies scaling to this data, the Onge and Bonda yield somewhat sensible percentages (a tad bit inflated, but only by a slight amount).

And for whatever it's worth, I was just able to produce an ASI simulation that works very well (so far).

Tomorrow, when I can utilize it with the South Central Asian data I've sent your way, I'll post the results.

Matt said...

@Alberto, you're correct mate, well spotted. I'd just noted that some African samples were the most distant, rather than the Nganasan or anything like that, and then left it at that, but it seems like you're correct to me.

Some evidence for this, correlation of scaled and unscaled distances with Fst: https://imgur.com/a/3unA6

(Where scaled means "multiplying each dimension by the square root of the eigenvalue").

You can see that while there is correlation in the unscaled specification, sometimes it can be quite low. Conversely correlation (r2) with Fst hits ranges in 0.88-0.97 for the scaled specification. This strongly suggests that "scaling" improves fit to real distance (assuming Fst as a stand in for real distance).

This is also notably as true or marginally more so when considering squared euclidean distance, which drives nMonte: https://imgur.com/a/cjP79

One thing I'd note as well is that the correlation with Fst looks waaay better than when I did a similar exercise with Global10. This Global25 probably much more accurately captures differentiation? And particularly better for Africans vs other - IIRC Global10 gave way higher differentiation for West African vs other pops relative to their intra differentiation than actually existed, probably because dimension 1 took up a higher share of the differentiation.

There are still some populations who seem under and overdifferentiated compared to Fst, and generally these are relatively isolated populations with independent drift (Kalash, Natufian, Ust Ishim, WHG, SHG, CHG, etc.) who are probably not going to be well described within the first 25. To the extent the 25 capture them, it's actually probably capturing a slightly more general population that contributed to modern people (e.g. our sampling of CHG, for'ex, is two samples who probably have a bit of drift that would be smoothed out in a more general Caucasus HG subpopulation that actually existed and contributed to people).

Also by the way, squared Euclidean distance on these PCA converts to Fst at about a ratio of 0.26-0.28 in the plots I've tried, so if you want to try and think about nMonte3 distances in terms of Fst and compared to real population Fst, that may be useful?

@Sam / David, it may be worth repeating those fits for the modern Baltic and the Yamnaya with the scaled dimensions. In the case of the Baltic, the distance between Baltic to Nganassan will be underspecified in the unscaled and this will mean that nMonte may be taking more of it in the fit. I expect that the Nganassan in Lithuanians will evaporate in the scaled data.

Davidski said...

@Matt

It may be worth repeating those fits for the modern Baltic and the Yamnaya with the scaled dimensions.

How are you scaling this?

Chad Rohlfsen said...

I'm not sure the Nganasan is wrong though. Look at all the N1 in Lithuanians. They have East Asian mtDNA too.

Matt said...

@Davidski: Scaled means "multiplying each dimension (column) by the square root of the eigenvalue". I'll put a screenshot up in a min.

Distance co-plots and NJ tree using the scaled data: https://imgur.com/a/oinmB

You can see that the distances have become much more smooth to the deep ancestry represented in lower PCs (so that Baltic_BA distinctiveness does not take NE European populations as far from Srubnaya or Yamnaya as it did in the unscaled distances), but these still represent the fine structure as well.

(It's a bit marginal to adna, however, looks like some nice structure between Sino-Tibetan related vs Japan-Korea, so any users with ancestry from East Asia could find this more useful than Global10, I'd guess. See - https://imgur.com/a/5UAAp)

Eren said...

@David:
Exactly as Matt explained. But after taking the square roots of the Eigenvalues, it might be preferable to divide each of those square roots by the square root of the first Eigenvalue. So, instead of multiplying the columns by 11.38231, 10.15529, 3.771207 ... you multiply by 1.0, 0.8921997, 0.3313217 ...
It doesn't make a difference for the models produced, but the distances will look more familiar, i.e. smaller, which helps with interpretation of the results.

You can do this either in Excel by hand, or in R. I wrote a small script that does that: https://drive.google.com/open?id=1M-MLj2TPyGvSmSXxRxfS0vMt_JcjAWLd
It takes as input the PCA spreadsheet and a text file with the Eigenvalues (only the Eigenvalues, comma separated, like this: https://drive.google.com/open?id=1Xw9xEmhIyPItWc9zQJMGrAQnmcQt2q2R)
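
The two steps above can be sketched in a few lines. The Python below is an illustration only and is not the R script linked above; the names column_factors and scale_rows and the sample row are made up, and the eigenvalues are obtained simply by squaring the three multipliers quoted in this comment.

```python
import math

def column_factors(eigenvalues, normalize=True):
    """Per-column multipliers: sqrt(eigenvalue), optionally divided by the first."""
    roots = [math.sqrt(ev) for ev in eigenvalues]
    if normalize:
        roots = [r / roots[0] for r in roots]
    return roots

def scale_rows(rows, factors):
    """Multiply column k of every sample row by factors[k]."""
    return [[x * f for x, f in zip(row, factors)] for row in rows]

# Multipliers quoted in the comment; squaring them recovers the eigenvalues.
quoted_roots = [11.38231, 10.15529, 3.771207]
eigenvalues = [r ** 2 for r in quoted_roots]

factors = column_factors(eigenvalues)   # about 1.0, 0.8922, 0.3313
sample = [[0.01, -0.02, 0.03]]          # made-up coordinates for one sample
print(factors)
print(scale_rows(sample, factors))
```

Calling column_factors(eigenvalues, normalize=False) skips the division by the first square root and gives back the larger multipliers.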

Unknown said...

I know WHG can be modeled as part Vestonice16 and part ANE, and El Miron as part GoyetQ116-1 and part WHG, and also Vestonice16 and Sunghir as part Villabruna but what about admixture from Paleolithic into younger populations?

Could someone test if oldest available Paleolithic genomes like Oase1/2, Ust'-Ishim, Kostenki12/14, Sunghir1/2/3, Muierii1/2, Cioclovina1, Ostuni1/2, GoyetQ116-1, many different Vestonices, Pagliccis, and Goyets, Krems-Wachtberg, LaRochette and Rigney1
can be used to model any modern, mesolithic and neolithic populations as directly admixed with them? Also could someone test how much Tianyuan-like admixture does GoyetQ116-1 have and if it's Sunghir/Vestonice who have WHG admixture or the other way around?

huijbregts said...

Actually it is not a scaling but an unscaling, because you multiply by the square root of the eigenvalues. Mathematically this is equivalent to restoring the variance of the raw data.
If there is compelling empirical evidence that this improves the geometry, it would be convenient if Davidski sets the scale switch of his PCA software to false.
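
A quick numerical check of the variance point: multiplying a column by the square root of an eigenvalue multiplies its variance by that eigenvalue. The sketch below uses a made-up column of PC scores that happens to have unit variance and a hypothetical eigenvalue of 4.0; neither comes from the Global 25 data.

```python
import math
import statistics

def rescale(column, eigenvalue):
    """Multiply every score in a PC column by sqrt(eigenvalue)."""
    return [x * math.sqrt(eigenvalue) for x in column]

column = [-1.5, -0.5, 0.0, 0.5, 1.5]   # made-up PC scores with variance 1
eigenvalue = 4.0                        # hypothetical eigenvalue

before = statistics.pvariance(column)
after = statistics.pvariance(rescale(column, eigenvalue))
print(before, after, after / before)
```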

Matt said...

@Davidski, yes, as Eren says.

@Eren, you could do that additional scaling as well, the only thing I would say about this is that, although I haven't tested it extensively, from memory of this and Global10, it seems like when the raw sqrt of eigenvalues are used (rather than normalizing the sqrt of eigenvalues such that the sqrt eigenvalue of PC1 = 1), then the outcome *seems* to be that maximum population Euclidean distance across populations within the set is approximately 1.

So it may depend on if we think this is a worthwhile property to retain, where maximum between pair distance = 1, and if that helps us to interpret what relative distance between different PCA means?
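
Whether the maximum really comes out near 1 is easy to check on any datasheet. Below is a minimal Python sketch, assuming the coordinates have already been read into a dictionary of population name to coordinate list; the three populations and their 2D coordinates are made up, chosen so that the largest distance is 1.

```python
import math
from itertools import combinations

def max_pairwise_distance(coords):
    """Return (distance, name_a, name_b) for the most distant pair."""
    best = (0.0, None, None)
    for (name_a, a), (name_b, b) in combinations(coords.items(), 2):
        d = math.dist(a, b)   # Euclidean distance, Python 3.8+
        if d > best[0]:
            best = (d, name_a, name_b)
    return best

# Hypothetical population averages, not real Global 25 values.
coords = {
    "pop_A": [0.0, 0.0],
    "pop_B": [0.6, 0.8],
    "pop_C": [0.3, 0.1],
}
print(max_pairwise_distance(coords))
```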

Eren said...

@huijbregts:
Indeed, we are talking about restoring the original variance of the raw data.
But I've seen an instance of the same lingo used in an academic paper (Behar et al. 2010):

https://www.nature.com/articles/nature09103#supplementary-information
"Supplementary Figure 2
This file shows the Principal Component Analysis of the Old World High-Density Array Data. a, Scatter plot of Old World individuals, showing the first two principal components. Here, the first PC (4.2% of variation, vertical axis) captures primarily differences between sub-Saharan Africans and the rest of the Old World. The second PC (3.4% of variation, horizontal axis) differentiates West Eurasians from South and East Asians. Axes of variation were scaled according to eigenvalues. Each letter code (Supplementary Table 1) corresponds to one individual and the colour indicates population origin. b, Scatter plot of Old World individuals, showing PC1 and PC3. c, Scatter plot of Old World individuals, showing PC1 and PC4. Note that eigenvalues for PC3 and PC4 are ~8 times smaller than for PC1 and 2."

@Matt:
Ah okay, I haven't paid attention. That would be an interesting property actually. If the max distance is approximately 1 that could be used as a benchmark for the goodness of fit.
I actually hadn't used that normalization again until today, when I've seen Sein's scaled results. The small distances he achieved caught my attention, then I remembered about the normalization step that achieves this.

Unknown said...

>I've noticed myself that there's no decent proxy for the Ancestral South Indian (ASI) component in this test yet.
>Onge clearly inflates ASI, while Dai dampens it. Aeta seems to work OK, but is still far from ideal.

Have you tried Oase1/2, Tianyuan and Ust'-Ishim? They are likely not directly ancestral, too old and archaic admixed but may show some interesting results.

Matt said...

Produced some fits: https://pastebin.com/61wY2vYa

Target - Modern European population averages
Valid ancestor set - All West Eurasian ancient samples between Copper and Bronze Age; All World population averages and ancient

Fits for North European populations make a reasonable amount of sense (including, really no East Asian ancestry greater than 0.5% in East Central Europe-Baltic, unless any within Baltic_BA, Copper Age Hungary). Fits for Southern Europe are more complex and will need more recent ancient dna over time to be able to fit recent people well with a small number of ancestor populations.

(Also a few more graphics: https://imgur.com/a/R03Ly.

First a NJ tree with only the ancients and modern population averages (since it's impossible to fit *all* samples, even all broadly West Eurasian samples on a tree within PAST3).

Second, some graphics for a reprocessed / meta-PCA, where I've taken the PCA data for broadly West Eurasian samples, then placed through another PCA to show the structure that is more or less present for them alone in the whole Global25).

Antoni Małkowski said...

Are these two gentlemen included in this analysis?
Vucedol [ I3499]
Szigetszentmiklós [I2787]

Matt said...

Best fit for a couple of early Steppe related population allowing all West Eurasian ancients pre-Bronze Age, all world ancients, and all world population averages outside of West Eurasia:

Yamnaya_Samara:
Samara_Eneolithic,69.6, CHG,19.8, Armenia_ChL,10, Tisza_LN,0.6
(W/out Samara_Eneolithic) EHG,56.6, CHG,29, Armenia_ChL,12.6, Baden_LCA,1.4, Sweden_EN,0.4

CWC_Baltic_Early:
Samara_Eneolithic,64.8, CHG,10.2, Baden_LCA,7.8, Sweden_EN,7.6, Armenia_ChL,6.2, Iran_LN,2.8, Narva_Lithuania,0.6
(W/out Samara_Eneolithic) EHG,52.8, CHG,19.4, Armenia_ChL,10, Sweden_EN,9.2, Baden_LCA,6.6, Iran_LN,1.4, Narva_Estonia,0.6

Comparing Yamnaya_Samara to CWC_Baltic_Early, it seems like the formula basically switches about 5% Samara_Eneolithic + 13% CHG/Armenia_ChL for about 15% EEF + 3% Iran.

I guess the Iran_N to be noise, but otherwise seems sensible for a population which split off a little earlier in the Chalcolithic from Samara_E, before it had picked up as much southern CHG/Armenia related ancestry, and then picked up a little EEF, poss in Ukraine. (Rather than a straight up Yamnaya_Samara clone + WHG). I think that fits sensibly what we know about early Corded Ware?

Davidski said...

@Matt & Eren

I'd like to give people a choice on what type of PCA data to use. So Eren, are you happy with the script that you posted, and can I link to it in the blog post? Matt, have you had a chance to look at the results from Eren's script? I just got up, so I'll check it out in a few hours.

@Antoni and everyone else asking me about samples from preprints

The samples that you're looking for are from the Olalde et al. preprint, which hasn't yet been formally published, so they're not available yet. They will be available as soon as Olalde et al. is published in a journal.

@All

I said enough about Cheddar Man.

Samuel Andrews said...

@David,

Global25 gives odd results for northwest Europeans. Something like 55% Yamnaya, 40% Barcin, 5% WHG.

MomOfZoha said...

Jesus H Christ.

This whole damn time all that "nMonte" analysis was just based on a bunch of vectors??

And, here I thought that nMonte itself does some fancy chromosome painting or allele sharing stuff (too lazy to look up exactly what software that is despite catching the obvious "Monte" of "Monte Carlo").

Wowsers.

Well, MATT, I can show you a method that has no dependence on number of dimensions and gives beautiful visualization to boot. Not to mention the clustering extended to amazing variations. :)

See, I am no good at manipulating the data at the SNP level (I just never did), and way too lazy to download all those genomes to start with.

BUT, I *do* know *something* about vectors, and graphs even better than that. I analyze way more complicated networks regularly than the tiny little graphs that come up here.

The only question is if I should be patient and publish, or blurt things out here instead. I cannot do both because a paper has my name attached, and there are waaay too many fascists grazing around.

Big conundrum. :/

Just for the sake of Matt and Chad, I am leaning towards impatience. I suppose David is ok too since he hasn't banned me yet. :)

Let's see what happens tomorrow. I have no time today and am not on the right machine anyhoo. Tomorrow tomorrow.... may blowwww your miiiind awaaaayyyyy.....

Matt said...

@Davidski, yeah the script works completely fine, and is a nice quick way to get the result.

It's got that extra step Eren talked about, where the sqrt of each eigenvalue is effectively normalised by the sqrt of the first eigenvalue (and this cancels for the first PC so it is effectively not scaled), but apart from that it's the straight scaling by sqrt of eigenvalue we've talked about. (I'm guessing you can just delete the row "eVs <- eVs/eVs[1]" from the script to remove that extra step if you don't want it?).

...

Also all, couple of fits for South Asia, allowing all ancients and all non-West Eurasian moderns, except South Asians apart from Paniya:

Brahmin: Paniya,46.6, Iran_N,21.4, Srubnaya_outlier,9.2, Yamnaya_Samara,7.2, England_IA,5.2, CWC_Baltic,4.6, Armenia_EBA,2.4, Srubnaya,1.8, LBK_EN,1.4, Armenia_ChL,0.2

Pashtun: Iran_N,31.6, Paniya,26.6, CWC_Germany,8.6, Armenia_ChL,6.8, Scythian_AldyBel,5, CWC_Baltic_early,4.4, Potapovka,4.4, Srubnaya,4.4, Srubnaya_outlier,3.6, Armenia_EBA,2.4, Minoan_Lasithi,2

On the assumption that the European related ancestry (IA, EEF, etc.) is an historically awkward way of trying to best fit Steppe_MLBA ancestry (with their EEF related ancestry), removing all European populations:

Brahmin: Paniya,47.2, Iran_N,19.4, Steppe_MLBA 12.8 (Srubnaya,7.4, Potapovka,5.4), Yamnaya_Samara,11.6, Srubnaya_outlier,4.6, ArmeniaBAChl 4.4 (Armenia_EBA,3, Armenia_ChL,1.4)

Pashtun: Iran_N,30.6, Paniya,26.4, Steppe_MLBA 21.2 (Srubnaya,14, Potapovka,7.2), ArmeniaBAChl 11.8 (Armenia_EBA,4.8, Armenia_ChL,7), Scythian_AldyBel,5.8, Poltavka,2, Srubnaya_outlier,2, Korean,0.2

Anthro Survey said...

@All

For the heck of it, made a simple 2D analogy of what happens when a PCA is (un)scaled.

Initial PCA. Let L be the distance between our two inputs. Let D be the Euclidean distance from the fit to the target.
https://justpaste.it/1gsh7

(un)scaled PCA---dimensions multiplied by different numbers (representing the different square roots of 2 eigenvalues). As seen, the old "d" segment no longer intersects at 90 degrees, and we have a new d. Hence, a new fit: the "ancestral" proportions DO change somewhat.
https://justpaste.it/1gsh4

How about the old d/L ratio vs new d/L ratio? Not the same, as demonstrated by triangles which aren't similar.
https://justpaste.it/1gshb

As for the PC1 normalization step? Since all the dimensions of the PCA are divided by the SAME value, it changes neither the fit (proportions), nor the d/L ratio. Just a smaller version of the previous PCA. This effect is pretty intuitive and no demo is necessary.
But, yes, the utility is in its smaller and, hence, more "familiar" numbers. It doesn't bother me either way, though.
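For anyone who wants numbers rather than drawings, here is a small Python sketch of the same 2D analogy. The points, the helper names `two_source_fit` and `rescale`, and the scaling factors are all invented for illustration; the fit is just the closest point on the segment between two sources.

```python
import math

def two_source_fit(a, b, target):
    """Return (share of b, d/L) for the closest point on segment a-b."""
    ab = [q - p for p, q in zip(a, b)]
    at = [q - p for p, q in zip(a, target)]
    length_sq = sum(v * v for v in ab)
    # Orthogonal projection of the target onto the line, clamped to the segment.
    share_b = sum(u * v for u, v in zip(at, ab)) / length_sq
    share_b = min(1.0, max(0.0, share_b))
    fit = [p + share_b * v for p, v in zip(a, ab)]
    d = math.sqrt(sum((f - t) ** 2 for f, t in zip(fit, target)))
    return share_b, d / math.sqrt(length_sq)

def rescale(point, factors):
    """Multiply each dimension of a point by its own factor."""
    return [x * f for x, f in zip(point, factors)]

# Made-up inputs: two sources and one target in two dimensions.
a, b, target = [0.0, 0.0], [4.0, 2.0], [1.0, 3.0]

base = two_source_fit(a, b, target)
# Same factor on every dimension (like the PC1 normalization step).
same = (0.5, 0.5)
uniform = two_source_fit(rescale(a, same), rescale(b, same), rescale(target, same))
# Different factor per dimension (like scaling by sqrt of each eigenvalue).
diff = (2.0, 1.0)
uneven = two_source_fit(rescale(a, diff), rescale(b, diff), rescale(target, diff))
```

Comparing `base`, `uniform` and `uneven` shows the point made above: dividing everything by one value leaves both the proportions and d/L alone, while per-dimension factors move both.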

Rob said...

@ Matt

Interesting, I get a different result from you; the same old thing I've been getting for years now.

Brahmin:Brahmin4
"Ust_Ishim:Ust_Ishim" 46.8
"Yamnaya_Samara:I0231" 29.7
"Iran_N:I1290" 23.5
"Villabruna:I9030" 0

Using U-I as a fill-in for the unknowns. CWC and Sintashta are in there, scoring zero; and no Euro MNE of any sort.

Also I'm wary of using the CHG heavy Khvalynsk, as it gives a false impression and undercalls the broader processes occurring between Eurasia & the steppe.

Davidski said...

@Samuel Andrews

Global25 gives odd results for northwest Europeans. Something like 55% Yamnaya, 40% Barcin, 5% WHG.

You might need better proxies for the forager and farmer admix in Northwest Europeans to make sure that the steppe ancestry proportion isn't inflated.

This sort of thing happens a lot in this test if you don't get the reference samples right. Of course, in some cases the right references might not be available yet.

Davidski said...

@Eren & huijbregts

Any way to combine the modeling and scaling, the latter as an option, in one script? Options are always good, just in case.

Matt said...

@MomOfZoha, yeah, nMonte is just a Monte Carlo distance minimization algorithm (repeated cumulative distance improving iteration with randomized starting conditions over millions of steps, Ger Huijbregts who designed it may correct me on this).
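To make "Monte Carlo distance minimization" concrete, here is a toy Python sketch of the general idea. It is an assumption-laden illustration and not nMonte's actual code: the function names, the number of slots, the step count and the three made-up source points are all hypothetical. Random one-slot swaps between sources are kept only when they bring the mixture closer to the target.

```python
import math
import random

def distance(weights, sources, target):
    """Euclidean distance between the weighted mix of sources and the target."""
    mix = [sum(w * s[k] for w, s in zip(weights, sources))
           for k in range(len(target))]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))

def monte_carlo_fit(sources, target, slots=100, steps=20000, seed=1):
    """Random search over mixtures made of `slots` equal-sized pieces."""
    rng = random.Random(seed)
    n = len(sources)
    counts = [0] * n
    for _ in range(slots):                 # randomized starting mixture
        counts[rng.randrange(n)] += 1
    best = distance([c / slots for c in counts], sources, target)
    for _ in range(steps):
        give, take = rng.randrange(n), rng.randrange(n)
        if give == take or counts[give] == 0:
            continue
        counts[give] -= 1                  # move one slot between two sources
        counts[take] += 1
        trial = distance([c / slots for c in counts], sources, target)
        if trial < best:
            best = trial                   # keep the improvement
        else:
            counts[give] += 1              # otherwise undo the move
            counts[take] -= 1
    return [c / slots for c in counts], best

# Made-up 2D sources; the target is an exact 50/30/20 mix of them.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.2]
weights, best = monte_carlo_fit(sources, target)
```

The real datasheets have 25 dimensions and many more candidate rows, but the loop has the same shape: propose a change, measure the distance, keep it only if the fit improves.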

We're using it for the most part on spreadsheets of Principal Components Analysis data that Davidski has run over genotype data, finding dimensions and vectors which explain the covariance of genotypes (and so population structure). For the most part, none of the rest of us are actually handling any of the genotype data directly ourselves...

@Rob, Ust-Ishim is probably going to crop up if we're restricting ourselves purely to ancient samples, as it's the closest (very distant!) thing to the Paniya among ancients (see most of the neighbour joining trees).

Though probably for quite different reasons - Ust Ishim is Paniya like because it's fairly equidistant between West and East Eurasia, belonging to neither, and because it has not undergone any heavy process of drift that later Eurasians see. (There was a trend to describe South Asia like this years ago; effectively as "Basal Crown Eurasian" like Ust-Ishim, but few believe this today, rather than that South Asia is simply the product of admixture between more differentiated Basal, West, East Eurasian groups.) Additionally, any actual additional private drift Ust Ishim has is probably not that well described by these dimensions.

Anthro Survey said...

@Davidski

Forgot to ask----You were more stringent with the coverage for this Global25, right? I noticed a few samples from Global10 are absent here. I recall one of the Blatterhole_MN being of poor coverage, for example.

Davidski said...

@Anthro Survey

You were more stringent with the coverage for this Global25, right?

Yes, although I plan on adding a few of the ancients that missed out initially because of the more stringent criteria. In other words, the sheet will be updated with a few more ancient samples soon.

Matt said...

@Sam / David: Using scaled data, no evidence for me that WHG/VB is low in nMonte fits. See - https://pastebin.com/JJ6kfAm2.

E.g. In simple Yamnaya+Barcin_N+Villabruna model (and allowing all various other possible ancients but no SHG, Baltic_HG, Narva, etc.):

English: Yamnaya 48.2 (Kalmykia 38.8, Samara 9.4), Barcin_N: 37.8, Villabruna: 14

Icelandic: Yamnaya 53 (Kalmykia: 37.8, Samara: 15.2), Barcin_N: 32.8, WHG: 14.2

Polish: Yamnaya 41 (Kalmykia: 0.2, Samara: 40.8), Barcin_N: 38.2, WHG: 12.6, EHG: 8.2

Lithuanian: Yamnaya 36.4 (Samara), Barcin_N: 32.6, WHG: 15.6, EHG: 15.4

Low WHG in unscaled fits probably relates to WHG being excessively far from other West Eurasians in the unscaled scenario, per Alberto's first comment above.

Overall these aren't very good fits compared to including later populations with more of the right drift, but they are the best fits with the available sources. The fits for the NE European populations may be "reaching" for EHG to try and fit the mix of heavier HG balance and the East Europe specific Baltic_BA dimension (which is lacking in Global10, for example); bug or feature depends on what you're using it for.

Rob said...

@ Matt

I am not attached to U-I. Adding Paniya doesn't change much overall. Rather, the main thrust of the query was how you're obtaining MLBA in your results and I am not?
Essentially, I think these results suggest a direct movement from steppe to South Asia, with no additional MNE component.

Matt said...

@Rob, are you using the scaled or unscaled datasheet?

Rob said...

Scaled (by Alberto)

Matt said...

Can't explain it then! If we're using the same software, datasheet scaled the same way, same calc populations, the outcome should not be different.

Here are a few pastebins:

Calc: https://pastebin.com/QBLWbzRa, Target (Brahmin Average): https://pastebin.com/dats59hs, Target (Pashtun Average): https://pastebin.com/NLk6Mb36

These are there if you're interested in running them in nMonte and checking that you get the same results, and in comparing against the same rows to check the scaling has been run identically.

Seinundzeit said...

Rob,

I think Matt has a point with Ust-Ishim; the similarity with the Paniya is likely to be an artifact of our Paniya samples being an ENA-"West Eurasian" (a term without much phylogenetic meaning, but we do need to use it) mixture, since Ust-Ishim is approximately equidistant to West Eurasians (ones lacking BEA) and ENA populations.

That being said, I'm not a fan of using the Paniya either, considering that their 20%-40% West Eurasian admixture (probably 30%) eats into the Iran_N-related percentages.

At the end of the day, if one must use a contemporary Indian population to capture South Asian/southern Central Asian/Iranian-specific ENA admixture, the Bonda are probably best, as they are the most ENA-like population in South Asia (at most, 20% West Eurasian, but probably closer to 10%-5%).

All,

I've finally found a setup I truly like; the results are perfectly in tune with the hints I've heard concerning Central Asian/South Asian aDNA, and it solves the Steppe_MLBA vs Steppe_EMBA conundrum for southern Central Asia/South Asia.

If it succeeds in every circumstance, I'll post the output.

Davidski said...

@All

Everyone needs to scale the datasheet in the same way, otherwise this whole thing will get very messy.

What's the easiest and most accurate way to do this? Past? Eren's script? Excel?

Anthro Survey said...

Dave's right. We need a common tongue. lol

Created an (un)scaled datasheet earlier today, multiplying dimensions by sq. rt. of the corresponding eigenvalue.

BUT, if everyone else decides to add the extra PC1 normalization step, I'll follow suit, too.

What's the consensus?

P.S. Personally, not as script-oriented as some people here are and prefer using excel for this, but that's just me.

huijbregts said...

@Davidski
Integrating the scaling into nMonte is not hard, but would demand an extra set of input data and be very inconvenient when you use nMonte on other datasets.
I think the easiest way to enforce uniformity in scaling is that you switch off the scaling of the raw data in your PCA software. Most of the PCA programs have that option.

Eren said...

@David:
The script does the job, it's a very simple little function really. Having the confirmation from Matt, it would be fine by me if you link to it.
But as Matt pointed out, I've included a line that does the additional step of normalizing by the sqrt of the first eigenvalue. This line can simply be removed if so desired. It doesn't change the modeling results, only the distances (i.e. bigger vs smaller). If Matt is correct, then not doing that step might preserve an interesting property that might help with interpreting results (see his previous post).

As to your question if it is possible to include scaling in nMonte. Yes, it's possible. But that's something that huijbregts would have to do, if he so decides to.

The easiest and most accurate way of doing it is with a script. But it's also possible to do in Excel, just a bit inconvenient. In PAST3 it's not possible afaik.

Davidski said...

@huijbregts

I think the easiest way to enforce uniformity in scaling is that you switch off the scaling of the raw data in your PCA software. Most of the PCA programs have that option.

I'm hesitant to do this for two reasons. I've already run a lot of people with the current Global 25 setup, and I'm getting good feedback. Also, it's not yet certain that the scaled output is better for all populations, especially for as yet unsampled ancient populations.

Hence, I think it's better to have two options: unscaled and scaled, and let people try both.

@Eren

Can you please upload a version of your script without the normalization feature, and I'll link to it in an update to the post after you're done. Also, can you include a short guide how to call the script, as this will probably save me from replying to at least a few e-mails about how to call the script.

@All

We'll be using Eren's soon to be linked script as the default scaling script. If you're intent on using a different scaling method, please always specify that you're doing so along with the models that you post here. Thanks in advance.

Unknown said...

>huijbregts

Is nMonte open source and where can I download this program?

Eren said...

@David:
I've updated the script, it can be found here:
https://drive.google.com/open?id=1M-MLj2TPyGvSmSXxRxfS0vMt_JcjAWLd

Instructions:
The function takes two inputs:
firstly a PCA datasheet (e.g. Global_25_PCA.txt);
secondly a simple text file with the eigenvalues, comma separated
(https://drive.google.com/open?id=1oIdH8UStJqqwER6yHnn2l2yinLs5VWqO)

There is a third parameter "writeFile", default value is TRUE, meaning it will by default create a scaled datasheet with the suffix "_scaled" in the same folder. If you don't want that, simply call the function with writeFile=F.

Example usage:
> source("EigenScale.R")
> EigenScale("Global_25_PCA.txt","Eigenvalues.txt")

This will create a scaled datasheet in the same folder named "Global_25_PCA_scaled.csv".

Davidski said...

@Unknown

nMonte

https://www.dropbox.com/sh/1iaggxyc2alafow/AACIjLtnkuaNNsJ5oKME_3XHa?dl=0

@Eren

Thanks, I'll try it on Yamnaya tomorrow and will update my post with the results.

Rob said...

Can you add Motala, Latvia and Ukraine HG?

Davidski said...

@Rob

Motala is in the datasheet. It's labeled SHG.

I can't add Latvian and Ukrainian HGs to this just yet. But I will as soon as I get the Mathieson et al. dataset.

For now, Narva_Estonia is probably the most similar pop.

Unknown said...


>Though probably for quite different reasons - Ust Ishim is Paniya like because it's fairly equidistant between West and East Eurasia, belonging to neither, and because it has not undergone any heavy process of drift that later Eurasians see. (There was a trend to describe South Asia like this years ago; effectively as "Basal Crown Eurasian" like Ust-Ishim, but few believe this today, rather than that South Asia is simply the product of admixture between more differentiated Basal, West, East Eurasian groups.) Additionally, any actual additional private drift Ust Ishim has is probably not that well described by these dimensions.

Ust'-Ishim does have elevated IBD with South and Southeast Asians. GoyetQ116-1 surprisingly has the highest IBD with Poland. In fact Kostenki14 and GoyetQ116-1 both have higher IBD with modern populations than Loschbour does, and Goyet higher than even Villabruna.

Multiple

https://imgur.com/a/7TyrC

Goyet

https://imgur.com/a/P4GZc

Villabruna

https://imgur.com/a/ndegO

Although, I also have another picture with different IBD for Ishim.

https://imgur.com/a/1fPKr

>There are still some populations who seem under and overdifferentiated compared to Fst, and generally these are relatively isolated populations with independent drift (Kalash, Natufian, Ust Ishim, WHG, SHG, CHG, etc.)

Do you have any graph for FST of Ust'-Ishim (and Oase1 if you have it) and other populations?

Roy King said...

@Davidski
You may want to do a new posting, but Cheddar Man is Y-chromosome haplogroup C!

Roy King said...

My bad--the paper is not yet out!

Alberto said...

Sorry for late replies. Just had time to go fast through the comments now.

About scaling: Yes, I agree we should all use the same method. I was using Eren's method with normalizing the sqrt of the eigenvalues before multiplying by each dimension because this is what I think all others were doing (when scaling). The results of the models are exactly the same, but I agree with Matt that the distances are more compatible with the unscaled version if we don't normalize them. So I'd go for that method instead.

@Rob/Matt

Some quick runs also got me different results than the ones Matt was getting. This is probably because Matt is using nMonte3, which runs with all the individuals and afterwards aggregates them into a group. I think this method is good for modern samples (say, an American of mixed European origin wanting to find the best fit with modern European populations to know about his ancestry), but I'm less sure it's a good idea with ancient samples, where some low coverage/quality ones could get in and give strange results (the worst part is that we don't even know it, since the individuals chosen are hidden in the output). I'd rather run with either averages (when a large ratio of the samples of a given population are of decent quality) or pick individuals of good quality by hand. This way results will not only be generally better, but also reproducible and transparent for everyone.
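The "fit on individuals, then aggregate into a group" step can be pictured with a short Python sketch. It only illustrates the bookkeeping and is not taken from nMonte3; the function name, the sample IDs and the percentages are hypothetical, and it assumes rows are labelled "Population:SampleID" as in the datasheets.

```python
from collections import defaultdict

def aggregate_by_population(individual_weights):
    """Sum per-individual percentages into per-population totals."""
    totals = defaultdict(float)
    for label, weight in individual_weights.items():
        # Everything before the first colon is treated as the population name.
        population = label.split(":", 1)[0]
        totals[population] += weight
    # Largest contribution first, as in the posted outputs.
    return dict(sorted(totals.items(), key=lambda item: -item[1]))

# Hypothetical individual-level fit; sample IDs and numbers are invented.
individual_fit = {
    "Unetice:sample_A": 20.0,
    "Unetice:sample_B": 16.2,
    "Ireland_EBA:sample_C": 26.8,
    "Baden_LCA:sample_D": 8.4,
}
aggregated = aggregate_by_population(individual_fit)
```

After aggregation only the population totals remain, which is exactly why the output no longer shows which individuals carried the weight.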

Re: high steppe. Yes, in models using less proximate sources than BA, the EHG proportions are quite high (35% EHG for Scottish, for example). In Europe, getting the right balance between the 2 main axes (AN-EHG and WHG-CHG) is quite tricky. Global 10 has a bias towards inflating WHG-CHG, but it seems that Global 25 has a bigger one of inflating AN-EHG instead. No idea why. I'll try to post a comparison when I get the time.

Matt said...

@Alberto, I don't agree about low coverage/quality clearly being better diluted as an issue by using averages, rather than allowing nMonte3 to select samples and then average out (it's not clear that "avoiding" quality by averaging out is better than by selecting for subsets that optimize distance then aggregate).

But I do agree with the proposition re: transparency of using population averages where populations have large heterogeneity of position (e.g. if nMonte3 gives 20% Corded_Ware_Germany, it's entirely ambiguous if this is from lower or higher steppe samples). The flipside to this though, is that using only averages sharply reduces the set of ancient points nMonte has to choose from, so will probably result in some less optimal fits.

That said, in this case, using the nMonte3 workflow with post fit aggregation does not really affect the outcome re: whether Steppe_MLBA is detected in the two SCA populations I used. Using nMonte3 and only population average rows:

Pashtun: Paniya,26, Iran_N 29.4 (Iran_N,25.2, Iran_LN,4.2), Steppe_MLBA 18.4 (Potapovka,10.8, Andronovo,7.6), ArmeniaBAChl 11.2 (Armenia_EBA,6.8, Armenia_ChL,4.4), Scythian_AldyBel,6.2, Srubnaya_outlier,5.6, Steppe_EMBA 3.2 (Poltavka)

Brahmin: Paniya,46.6, Iran_N 21.6 (Iran_LN,14.4, Iran_N,7.2), Steppe_MLBA 15 (Potapovka), Srubnaya_outlier,8.6, Steppe_EMBA 8.2 (Poltavka)

(Per Sein, using Bonda in place of Paniya:

Pashtun: Iran_N,35.8, Steppe_MLBA 22.6 (Potapovka,22.4, Andronovo,0.2), Bonda,21.4, Armenia_ChL,10.4, Srubnaya_outlier,9.8

Brahmin: Bonda,34.2, Iran_N,31.4, Steppe_MLBA 23.6 (Potapovka), MA1,5.6, Srubnaya_outlier,5.2

Fits with Bonda are slightly worse prob. as Paniya better captures real ancestry than the Iran_N+Bonda combination)

I'd also say I don't really agree that the steppe samples are necessarily being fit as too high here. How do we know what is the "right" level and whether G10 / G25 are closer to it? 50% Yamnaya is plausible for Northern Europe, and with Yamnaya 44% CHG+ArmeniaChl is also well within the range of formal models. We can only say whether these are fairly consistent with formal models and this appears to me to be the case. Not clear G10 performs better.

Alberto said...

@Matt

No idea about the SCA models, really. I haven't tested any of those yet. But I noticed the discrepancy with your models when running a few European moderns. For example, for Scottish I was getting:

[1] "distance%=1.6979 / distance=0.016979"


Scottish
"Ireland_EBA" 53.55
"BenzigerodeHeimburg_LN" 34.2
"Protoboleraz_LCA" 10.65
"Iberia_BA" 1.45
"Han" 0.05
"Iberia_EN" 0.05
"Unetice" 0.05

While you got:

[1] "distance%=1.3368"

Scottish:ScottishAverage

Unetice,36.2
Ireland_EBA,26.8
BenzigerodeHeimburg_LN,11.8
Iberia_BA,9.6
Baden_LCA,8.4
CWC_Germany,3.4
Portugal_MBA,3.4
Iberia_ChL,0.4

I was wondering why you got a different result and better fit. And then I noticed you're not running averages (as I was), but individuals that afterwards get aggregated into their group.

My problem with this when using ancients is that worse quality samples can change things. Here for example you get high Unetice, but I don't. I remember that some Unetice samples (from the Reich lab) are UDG treated, while the ones from RISE aren't. I once made this test with all those samples:

https://docs.google.com/spreadsheets/d/15FZpOVsnm3l017CFWMCQtaJX1viKmR2SJPSwJ8oSA-Q/edit?usp=sharing

This was to explain why I avoid using dubious samples as source populations, specifically this was about the Remedello ones. The Remedello samples are problematic for various reasons (different periods, low quality..) and they're not UDG treated. If you look at the results, the Unetice samples from RISE which are also non-UDG treated are attracted to those Remedello ones, while the UDG treated samples aren't.

So back to your model, I couldn't see if you were getting high Unetice from the "good" samples or from the "bad" ones. (This as you agree affects transparency, but I also think it affects the quality of the output - not necessarily in your model above that's run against a modern population, but more clearly when running against other ancients).

I actually never ran averages with ancients until now. I always preferred to hand pick the highest quality ones as source and run against all individuals as targets (so we can spot the differences and the possible quality issues).

Regarding the amount of steppe, I guess it will matter less when you use BA sources, since they're already balanced (or unbalanced, however you want to think of it) themselves and will affect the output less. But for comparison, these are results from Global 25 for Irish using a standard set of pops that I used a lot in Global 10 before:

Irish
Germany_MN:I0172 46 %
EHG:I0061 35.3 %
CHG:KK1 11.8 %
Barcin_N:I0707 6.7 %
WHG:Loschbour 0.2 %
Nganassan 0 %
Mozabite 0 %
Iran_N:I1290 0 %
Paniya 0 %
Dai 0 %
Natufian:I1072 0 %

With Global 10:

Irish
Esperstedt_MN:I0172 32.1 %
Kotias:KK1 25.4 %
Loschbour:Loschbour 23.4 %
Karelia_HG:I0061 12.9 %
Barcin_N:I0707 6.2 %
Nganassan 0 %
Mozabite 0 %
Iran_N:I1290 0 %
Paniya 0 %
Dai 0 %
Natufian:I1072 0 %


So there's a clear shift from a likely inflated WHG-CHG in Global 10 to an even more likely inflated EHG-AN in Global 25. That's why I said it's tricky to get the right balance there. I try to use the formal stats with WHG-EHG to see where the balance should be. CW is clearly EHG:

Loschbour EHG Corded_Ware_LN Chimp -0.0189 -3.32 341342

But modern North Europeans are clearly WHG:

Loschbour EHG Scottish Chimp 0.0202 3.771 341763

And the model from Global 25 looks a bit incompatible with those stats. (Not that Global 10 was perfect in this either, though I think it was closer to the right point. That, of course, does not preclude that this new Global 25 is clearly superior for fine structure with more proximate sources).
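The way the two D-stat lines above are being read can be spelled out in a small Python sketch. Several things here are assumptions rather than facts from the thread: that the three trailing columns are D, Z and SNP count, that |Z| of 3 or more counts as significant (a common convention), and the sign reading itself, which simply follows the interpretation given in this comment (negative means closer to EHG, positive means closer to Loschbour). The function name `read_dstat` is hypothetical.

```python
def read_dstat(line, z_threshold=3.0):
    """Parse 'pop1 pop2 test outgroup D Z nSNPs' and report the nearer pole."""
    pop1, pop2, test, outgroup, d, z, snps = line.split()
    d, z = float(d), float(z)
    if abs(z) < z_threshold:
        closer = None                      # not significant under this threshold
    else:
        # Sign reading as used in the comment: positive leans to pop1.
        closer = pop1 if d > 0 else pop2
    return {"test": test, "D": d, "Z": z, "snps": int(snps), "closer_to": closer}

corded_ware = read_dstat("Loschbour EHG Corded_Ware_LN Chimp -0.0189 -3.32 341342")
scottish = read_dstat("Loschbour EHG Scottish Chimp 0.0202 3.771 341763")
```

Under those assumptions `corded_ware["closer_to"]` comes out as EHG and `scottish["closer_to"]` as Loschbour, matching the reading in the text.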

Matt said...

@Alberto, that's a fair comment on the European fits, they are slightly different between your run and mine. I'd actually thought you were commenting on the SCA models since your reply was to Rob and given the context.

I would say that qualitatively, both results we've got do seem overwhelmingly similar. I wouldn't actually count on this PCA to be able to *that* reliably distinguish between the different streams of LNBA ancestry for Ireland_EBA and Unetice related ancestry.

There are further correlations in position visible in David's North Europe specific PCA, for sure, between Ireland_EBA and Irish/Scottish, and the tree structure here is *mostly* pretty good at landing ancients close to moderns from roughly the same region. But I would be pretty surprised if it could reliably distinguish between Ireland_EBA and Unetice related ancestry with a high degree of accuracy.

Qualitatively both the fits tell us that the Scottish average is similar to the NW European Bronze Age, with some additional EEF related ancestry, which is represented slightly differently in your fit than mine. I really don't have confidence that the data in Global25 can distinguish reliably between the scenarios either offers, and that's kind of the spirit in which I posted them up. Like, this PCA cannot really tell us the exact degree of post early Bronze Age migration involving Ireland/Scotland.

Similarly, I think it's a really admirably detail oriented approach to be actually going in and checking which ones of the Unetice samples were and weren't UDG treated, and how this could affect very fine relatedness with other samples. But unless they actually behave quite differently in these dimensions, it doesn't seem so relevant here, and there doesn't seem to be any substructure among the Unetice samples that would reflect quality.

If you have these very slight degrees to which position in the PCA is affected by UDG treatment, and this has systematic but very slight difference on attraction to samples which almost overlap exactly themselves (e.g. comparing Remedello and MN Europeans, Latvia_HG and Loschbour, etc.), then that's kind of beyond the resolution of what I think even these 25 dimension PCA are going to be picking up with accuracy via nMonte...

Regarding the amount of steppe, I guess it will matter less when you use BA sources, since they're already balanced (or unbalanced, however you want to think of it) themselves and will affect less the output.

Yamnaya_Samara in my models, though, I would not say is actually unbalanced: it's in the range of 29% or 34% CHG ancestry (depending on how we treat Armenia_ChL), then probably 10-15% Barcin related, and the remainder EHG related.

If you assume for Irish a similar Steppe share to English-Icelandic of 50%, and feed that through, you get 14.5-17% CHG. Not *too* radically far from the 11.8% in your model... Assume something more like CWC_Baltic_Early rather than Samara_Yamnaya, which may be historically more reasonable, and 10-12.5% CHG would result.
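The "feed that through" step is just multiplying two shares. A tiny Python sketch of that arithmetic follows, using the 50% steppe share and the 29% to 34% CHG range quoted above; the function name `implied_share` is made up.

```python
def implied_share(steppe_share, component_within_steppe):
    """Share of a component inherited via a steppe-derived source."""
    return steppe_share * component_within_steppe

# 50% steppe ancestry, with the steppe source itself 29% to 34% CHG.
chg_low = implied_share(0.50, 0.29)
chg_high = implied_share(0.50, 0.34)
```

Here `chg_low` and `chg_high` are 0.145 and 0.17 (up to floating-point rounding), i.e. the 14.5-17% range.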

I guess the point I'm making here is: a) proportions CHG for Yamnaya_Samara are within the range of plausible models, b) proportions for Irish are still roughly consistent with plausible models for Yamnaya or early Corded Ware (latter may be more likely to represent real ancestors).

They're consistent with a strong stream of CHG related ancestry as the dominant southern component in early Steppe related populations (with variance by CWC or Yamnaya subpopulation) and with an early Steppe related component as a strong stream of ancestry in Europeans. Beyond that, I'm uncertain if relatively slight differences between these results and similar scenarios in Global10 indicate that Global25 has more fine grained accuracy, or as you say, reflect some measurement error.

MomOfZoha said...

@Matt:
Thank you, Matt.

Partly thanks to an independent communication with AnthroSurvey, I see that the main purpose of nMonte is to help discover the ancestral/other population proportions whose admixture gives rise to a target population. I haven't looked at the README/code just yet (thanks for sharing the dropbox link, David), but IF conditional upon some s_1, s_2, ..., s_k source populations and a target population with samples t_1, t_2, ... t_n and sample average t, it seems clear that the min-distance weight distribution over the s_i is just the location in the convex hull of the {s_i} that is of minimum distance to vector t. In the worst case, the convex hull of {s_i} does not even intersect the convex hull of {t_i}.

But, that doesn't require Monte Carlo, so nMonte must of course be attempting to find the set of sources {s_i} whose convex hull is of minimum distance to the average target vector t. Yeah, that is not an easy combinatorial optimization problem in general, though one might not need such a general method for this particular dataset...
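For a fixed set of sources, the "closest point in the convex hull" problem can indeed be solved without any randomness. Below is a minimal Python sketch using a Frank-Wolfe style iteration with exact line search. This is a swapped-in textbook technique chosen for illustration, not what nMonte does, and the function name, iteration count and toy points are all assumptions.

```python
import math

def closest_mix(sources, target, iterations=2000):
    """Weights (non-negative, summing to 1) whose mix is nearest the target."""
    n, dims = len(sources), len(target)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        mix = [sum(w * s[k] for w, s in zip(weights, sources)) for k in range(dims)]
        residual = [m - t for m, t in zip(mix, target)]
        # Pick the source vertex that most reduces the squared distance.
        scores = [sum(r * s[k] for k, r in enumerate(residual)) for s in sources]
        j = min(range(n), key=scores.__getitem__)
        direction = [sources[j][k] - mix[k] for k in range(dims)]
        denom = sum(v * v for v in direction)
        if denom == 0.0:
            break
        # Exact line search for a quadratic objective, clamped to [0, 1].
        gamma = -sum(r * v for r, v in zip(residual, direction)) / denom
        gamma = min(1.0, max(0.0, gamma))
        if gamma == 0.0:
            break                          # no descent direction left
        weights = [(1.0 - gamma) * w for w in weights]
        weights[j] += gamma
    return weights

def mix_distance(weights, sources, target):
    """Euclidean distance between the weighted mix and the target."""
    mix = [sum(w * s[k] for w, s in zip(weights, sources)) for k in range(len(target))]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))

# Toy 2D sources; one target inside their hull, one outside it.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
inside = closest_mix(sources, [0.3, 0.2])
outside = closest_mix(sources, [1.0, 1.0])
```

When the target lies outside the hull, as in the second call, the residual distance stays above zero no matter how the weights are chosen, which is the "worst case" mentioned above.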

Well, having hopefully understood that correctly, let me first give a major DISCLAIMER that none of "my" methods (not that I "own" graph theory, but just talking in this context) would be so useful towards finding those source populations s to the fine-tuned granularity that is sought in these pop gen discussions. If y'all are arguing about differences of 0.01 versus 0.007 then forget about it.

However, the usefulness of graph theoretic methods arises from the representation itself based on *relative* sample distances, with great robustness against many types of "scaling" perturbations that you are all worrying over here. There are also many other forms of analysis that may be performed, and observations to be made directly from the connectivity information itself. Mind you, unlike all the directed acyclic admixture networks that I have seen used here, I am not talking about a small directed graph, but a huge and highly undirected graph simply indicating relatedness between pairwise samples.

But, instead of diving deeper into this, let me just state that if Global 25 is based on a PCA of some raw-er data, then that "raw" data itself must have been in some similarity matrix (or distance matrix) format to begin with. PCA is not just magically defined on SNP data. Dave has shared his full IBD matrix data of pairwise shared cMs between samples, over that set of 3K+ modern and 50+ ancient samples. I could, and would, create a graph from that pure form of the data itself.

However, there were lots of issues like some people sharing HUGE cMs with others and other people sharing teensy cMs with others. While a k-nearest neighbors graph (unlike a geometric graph) could still be relatively robust against this variance in max-cM sharing per sample, it still made me question the meaningfulness of the measure. Dave himself also confirmed that I should "clean" the data to remove highly "inbred" individuals sharing vast cMs within their group. But, as I also did not want to remove such groups completely, it became difficult to know the precise pruning method for this.
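As a rough picture of what a k-nearest neighbors graph looks like in this setting, here is a minimal Python sketch built on Euclidean distances between coordinate rows. It is an assumption for illustration only: the function name, the labels and the 2D points are invented, and it works on PCA-style coordinates rather than on the shared-cM matrix discussed above.

```python
import math

def knn_graph(coords, k):
    """Map each label to the labels of its k nearest rows."""
    graph = {}
    for name, point in coords.items():
        # Distance from this row to every other row, nearest first.
        others = sorted(
            (math.dist(point, other), other_name)
            for other_name, other in coords.items()
            if other_name != name
        )
        graph[name] = [other_name for _, other_name in others[:k]]
    return graph

# Invented 2D coordinates; real rows would have 25 dimensions.
coords = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 3.0],
    "d": [5.0, 5.0],
}
graph = knn_graph(coords, k=1)
```

Because each row keeps only its own k nearest rows, a sample with very large distances (or very large sharing) to everyone does not distort the rest of the graph, which is the robustness being described. Note that the links are one-way as built here; an undirected graph would need an extra symmetrizing step.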

Then I thought that perhaps I should simply use the vector of *ancient cMs* for each modern sample, though that too was less useful in the case of highly recombinant/admixed populations (like my own population).

Anyway, wow, I wrote too much again, and still didn't deliver on the promised visualization. :)

Well, I'll run it for this Global 25 and the linked Global 10 datasets later today. I need to MOVE right now!

Rob said...

Haha thanks Roy

epoch2013 said...

@Unknown

K14 does not only show higher affinity in Europe, but also surprisingly high affinity in the Middle East. You'd almost think he *did* have Basal admixture. But it could be he was just as close to the UHG as he was to WHG, UHG being the mysterious admixture in Anatolian, the non-Basal part. In which case his affinity would have been brought to Europe via more than one way.

Maybe a similar thing is the case with Goyet, with the link to Tianyuan and all..

Samuel Andrews said...

Guys, post results; don't write books in the comment section.

Eren said...

@MomOfZoha: looking forward to what you're gonna come up with.


I tried some models for my grandparents earlier. One based on HG-Neolithic + East-Eurasian samples and another based on BA-IA samples. Paternal grandpa is from Sakarya, grandma from Giresun. Maternal grandparents are from Trabzon.

[1] "distance%=2.4704 / distance=0.024704"

eren_pgrandpa:ID001

Barcin_N:I0745 26.45
CHG:KK1 19.90
EHG:I0124 12.25
Ulchi:Ul16 12.00
Barcin_N:I0708 8.75
Iran_N:WC1 8.15
Barcin_N:I1098 6.35
Yakut:358_B 3.40
Natufian:I1072 2.20
EHG:I0061 0.55


[1] "distance%=3.4365 / distance=0.034365"

eren_pgrandma:ID001

Barcin_N:I0746 24.75
CHG:KK1 19.85
Barcin_N:I0745 17.25
Iran_N:WC1 15.60
Ulchi:Ul19 8.30
EHG:I0061 8.00
Yakut:358_B 6.25


[1] "distance%=3.4517 / distance=0.034517"

eren_mgrandpa:ID001

CHG:KK1 37.60
Barcin_N:I0708 26.20
Barcin_N:I0707 17.15
Iran_N:WC1 12.10
Natufian:I1072 4.90
Barcin_N:I0746 2.05


[1] "distance%=3.4933 / distance=0.034933"

eren_mgrandma:ID001

CHG:KK1 35.10
Barcin_N:I0707 28.90
Iran_N:WC1 17.20
Barcin_N:I0745 10.75
Natufian:I1072 4.60
Barcin_N:I1580 3.45



And these are the BA-IA fits:

[1] "distance%=1.8369 / distance=0.018369"

eren_pgrandpa:ID001

Armenia_EBA:I1635 23.45
Scythian_Pazyryk:I0563 19.40
Anatolia_BA:I2495 14.30
Armenia_MLBA:RISE423 13.75
Hungary_BA:I1502 7.90
Anatolia_BA:I2683 7.55
Scythian_Pazyryk:I0562 6.10
Minoan_Lasithi:I0074 5.85
Levant_BA:I1705 1.70
Armenia_EBA:I1658 0.00


[1] "distance%=2.0889 / distance=0.020889"

eren_pgrandma:ID001

Minoan_Lasithi:I0074 26.40
Armenia_EBA:I1633 16.80
Scythian_Pazyryk:I0563 16.40
Scythian_AldyBel:I0577 12.45
Armenia_EBA:I1635 12.00
Iran_ChL:I1662 11.00
Armenia_MLBA:RISE423 4.95


[1] "distance%=2.4095 / distance=0.024095"

eren_mgrandpa:ID001

Armenia_EBA:I1635 43.1
Armenia_EBA:I1658 34.4
Minoan_Lasithi:I0074 17.9
Levant_BA:I1730 4.6


[1] "distance%=1.7881 / distance=0.017881"

eren_mgrandma:ID001

Armenia_EBA:I1635 57.90
Minoan_Lasithi:I9005 16.45
Iran_ChL:I1662 9.55
Armenia_EBA:I1633 7.70
Levant_BA:I1730 6.15
Armenia_EBA:I1658 2.25

Chad Rohlfsen said...

@ Alberto,

Single Dstats are not going to tell you if they have more WHG or EHG, really. That's not what they are for. When you do model NW Euros, and basically Europeans in general, they do like more EHG than CHG. Often I get a higher rate of EHG to Caucasus/NEast in Euros than Yamnaya. Anyway, you can trust me on this and if you like, I can post some qpAdm and qpGraphs that show Euros have higher EHG than CHG-like stuff.

Chad Rohlfsen said...

Another thing is that you shouldn't compare samples with different capture methods. Loschbour is a different method than the EHG and other WHG. Loschbour is closer to ENA and EHG, compared to Bichon. Anyway, I wouldn't use Loschbour as your WHG stand-in. Bichon and Rochedane as a combo should serve you well.

Davidski said...

Loschbour is a different method than the EHG and other WHG.

Anyway, I wouldn't use Loschbour as your WHG stand-in. Bichon and Rochedane as a combo should serve you well.

Bichon is SG, like Loschbour, while Rochedane is capture, like EHG.

Anyway, I'll be adding more foragers to the datasheet soon, from those that I already have, plus those that are on the way. But Blatterhole_HG and Koros_HG are decent forager samples as well, even if they have a bit of EEF in them.

Alberto said...

@Matt

I checked the datasheet with individuals more closely, and Davidski's already done a good job of pruning the bad quality samples, so there's less to be concerned about (for example, the Unetice samples from RISE are not included). Still, I think it can be confusing when someone posts a result that has, for example, Samara_Eneolithic and you don't know if it's the average of both samples included (the R1a and R1b guys) or if it's only one or the other because they get aggregated into one.

@Chad

Sure I'm not going to know how much EHG or WHG each population has with a direct D-stat. But it will tell me the relative balance between both points. And really, if you look at the model for Irish with 35% EHG but 0% Loschbour, it's difficult to make that compatible with the D-stat (46% Esperstedt_MN wouldn't be enough to turn that stat into positive, even slightly).

(I didn't run those D-stats, so didn't choose Loschbour myself. But in any case the same samples are used in both D-stats, so whatever effect Loschbour has in one, it will have in the other).

Chad Rohlfsen said...

There were other issues with Loschbour, but I can't remember them off the top of my head. It might've been a Lipson paper where they only wanted to use about 300K sites.

Chad Rohlfsen said...

Ranchot88 and Rochedane. Those are the two that are from the 1240K and have okay coverage.

Davidski said...

I'll see if I can add Ranchot88 and Rochedane to the datasheet.

But one thing to keep in mind with this method is that it's most useful when you're looking for more proximate sources of admixture, and it actually relies on such proximate sources being present in the datasheet.

If you want to model more distant ancestry, then formal stats-based models are the best, because they largely ignore recent drift.
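
To make the point about proximate sources concrete, here is a minimal sketch of the kind of fit involved: find non-negative weights that sum to 100% and bring the weighted average of the source coordinates as close as possible to the target. This is not nMonte itself, just a simple random hill-climb standing in for it. The function names, the 200-slot resolution and the toy 2D coordinates are all assumptions made up for illustration.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(sources, counts, total):
    """Weighted average of the source vectors; counts are integer slots out of total."""
    dims = len(next(iter(sources.values())))
    out = [0.0] * dims
    for name, c in counts.items():
        for i, v in enumerate(sources[name]):
            out[i] += v * c / total
    return out

def fit(target, sources, slots=200, iters=20000, seed=1):
    """Move one slot at a time between sources, keeping moves that lower the distance."""
    rng = random.Random(seed)
    names = sorted(sources)
    counts = {n: 0 for n in names}
    counts[names[0]] = slots  # start with everything in one source
    best = distance(target, mix(sources, counts, slots))
    for _ in range(iters):
        src, dst = rng.sample(names, 2)
        if counts[src] == 0:
            continue
        counts[src] -= 1
        counts[dst] += 1
        d = distance(target, mix(sources, counts, slots))
        if d < best:
            best = d
        else:  # undo the move
            counts[src] += 1
            counts[dst] -= 1
    return {n: 100.0 * c / slots for n, c in counts.items()}, best

# Hypothetical toy data: the target is built as 50% A + 30% B + 20% C.
toy_sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
toy_target = [0.3, 0.2]
weights, dist = fit(toy_target, toy_sources)
print(weights, "distance%%=%.4f" % (100 * dist))
```

If the true source is missing from the sheet, a fit like this can only spread the weight over whatever is nearest, which is why the method depends on proximate sources actually being present.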

Simon_W said...

@ David
I can't wait to have the coordinates for me and my three related kits. Do we have to send you the raw data again?

@ Matt

"a few simple neighbour joining trees and distance co-plots based on the full 25 dimensions and population averages: https://imgur.com/a/KNsnD"

Interesting how Anatolia_BA and Anatolia_Chl are in a different cluster than modern Cypriots. In the 2D PCA they look so close. But in your tree Cypriots appear in a cluster with more Levantine ancestry.

Another thing that caught my eye: There are two Sephardic samples, and one clusters with Turks, whereas the other one does so with Ashkenazim. I'd guess the latter clustering should be more typical for these people overall.

Seinundzeit said...

Here are a few basic models for various northern South Asian and southern Central Asian populations.

Northern South Asia:

Chamar

53.45% ASI
29.95% Iran_N
10.85% Srubnaya_outlier + 5.00% Steppe_MLBA + 0.75% Andronovo_outlier

Kshatriya

36.20% ASI
35.75% Iran_N
15.15% Srubnaya_outlier + 8.95% Steppe_MLBA + 3.95% Andronovo_outlier

Brahmin

18.6% Srubnaya_outlier + 15.6% Steppe_MLBA
32.2% Iran_N + 1.8% Armenia_ChL/EBA
31.7% ASI

Punjabi_Lahore

42.6% ASI
35.7% Iran_N
12.7% Srubnaya_outlier + 5.7% Steppe_MLBA + 3.3% Andronovo_outlier

Sindhi

46.05% Iran_N + 3.45% Armenia_ChL/EBA
16.00% Steppe_MLBA + 9.10% Srubnaya_outlier + 3.40% Andronovo_outlier
22.00% ASI

Southern Central Asia:

Kalash

38.6% Iran_N + 9.7% Armenia_ChL/EBA + 0.6% CHG
24.1% Srubnaya_outlier + 12.4% Steppe_MLBA + 0.6% Andronovo_outlier
13.9% ASI

Pashtun:HGDP00259

37.25% Iran_N + 12.55% Armenia_ChL/EBA + 0.85% CHG
17.20% Steppe_MLBA + 12.45% Srubnaya_outlier + 5.90% Andronovo_outlier
12.70% ASI
1.10% Mongola

Pashtun_Af

29.4% Iran_N + 24.1% Armenia_ChL/EBA
19.2% Steppe_MLBA + 12.9% Srubnaya_outlier
10.7% ASI
3.6% Mongola

(The Pashtun average is rather odd in this sheet; it's mostly composed of the most northwest Indian/Pakistani-like HGDP Pashtuns, alongside only 3 of the 5 Afghan Pashtun samples we've seen in other analyses. HGDP00259 is the only sample that resembles the usual HGDP Pashtun average, and one can see its results above. Pashtun_Af is just an average of Pashtun2_22Af and Pashtun2_8Af)

Karlani Pashtun, Central Highlands

28.40% Iran_N + 23.05% Armenia_ChL/EBA
22.65% Steppe_MLBA + 9.95% Srubnaya_outlier + 3.00% Andronovo_outlier
10.20% ASI
2.75% Mongola

Batanri Pashtun, Nomadic

34.60% Iran_N + 21.15% Armenia_ChL/EBA
18.05% Steppe_MLBA + 12.65% Srubnaya_outlier
10.80% ASI
2.75% Mongola

Sarbani Pashtun, Southwestern Plateau

36.20% Iran_N + 23.35% Armenia_ChL/EBA
16.50% Steppe_MLBA + 11.40% Srubnaya_outlier
8.00% ASI
4.55% Mongola

Gharghakht Pashtun (Northeastern Highlands) + Panjsheri Tajik

34.30% Iran_N + 20.14% Armenia_ChL/EBA
11.40% Andronovo_outlier + 11.05% Steppe_MLBA + 6.50% Srubnaya_outlier + 2.70% Scythian_Pazyryk
12.85% ASI
1.05% Ulchi

Myself

37.95% Iran_N + 10.80% Armenia_ChL/EBA
13.45% Srubnaya_outlier + 11.10% Andronovo_outlier + 10.20% Steppe_MLBA
13.75% ASI
2.75% Ulchi

Tajik_Ishkashim

24.1% Iran_N + 19.2% Armenia_ChL/EBA
22.4% Steppe_MLBA + 19.5% Srubnaya_outlier + 1.2% Scythian_Pazyryk
9.2% ASI
4.4% Mongola

Tajik_Shugnan

27.50% Steppe_MLBA + 15.95% Srubnaya_outlier + 4.40% Andronovo_outlier + 0.05% Scythian_Pazyryk
20.40% Iran_N + 15.50% Armenia_ChL/EBA + 5.70% Iran_ChL
5.05% Mongola + 0.95% Ulchi
4.50% ASI

Tajik_Rushan

34.4% Steppe_MLBA + 14.4% Srubnaya_outlier + 0.8% Scythian_Pazyryk
20.9% Iran_N + 20.1% Armenia_ChL/EBA
6.4% Mongola
3.0% ASI

Tajik_Yaghnobi

26.00% Armenia_ChL/EBA + 18.20% Iran_N + 7.55% Iran_ChL
32.55% Steppe_MLBA + 9.65% Srubnaya_outlier
6.05% Mongola

Steppe_MLBA is everywhere, often accompanied by Andronovo_outlier, and always alongside Srubnaya_outlier.

For what it's worth, a mixed population approximated by Steppe_MLBA + Srubnaya_outlier-related ancestry seems like a solid explanation for the unusual preference displayed by South Central Asians and South Asians for either Steppe_EMBA or Sarmatian_Pokrovka in formal analyses.

Also, a question for Rob:

What do the Armenia_ChL/EBA percentages imply for South Central Asians (seems to be missing in India)?

Davidski said...

@Simon_W

I don't usually store data files, so you'll have to send them again. But please note that there's a bit of a queue forming, so it might take a few days.

Anthro Survey said...

@Simon_W

Remember how we were talking about distance matrices that one time? Made this xls matrix for Global25(w/scaling) to more directly visualize distances:

https://drive.google.com/file/d/1CgSS0W6rdgYL-3SfkoDzxw64unJFAaXM/view

No, Anatolia_BA/Chl aren't QUITE as close as modern Levantines are, but still relatively close, rank-wise. It makes sense, though, because Cypriots do have more recent shared drift with the Levantines, as both history and archaeology suggest, even if their overall ancestry is quite similar to post-neo Anatolians. Therein lies the beauty of multi-d PCAs.
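
For anyone who wants to rebuild that kind of matrix, here is a rough sketch of the calculation: plain Euclidean distances over all dimensions, then a ranking of the nearest populations. The labels and the 3D coordinates below are placeholders made up for illustration, not values from the Global 25 datasheet.

```python
import math

def euclid(a, b):
    """Euclidean distance across all dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(coords):
    """Full symmetric matrix as a dict of dicts: matrix[p][q] = distance."""
    return {p: {q: euclid(v, w) for q, w in coords.items()} for p, v in coords.items()}

def nearest(coords, name):
    """Other populations sorted from closest to furthest from `name`."""
    pairs = [(other, euclid(coords[name], vec))
             for other, vec in coords.items() if other != name]
    return sorted(pairs, key=lambda pair: pair[1])

# Placeholder coordinates (hypothetical, three dimensions instead of 25).
toy_coords = {
    "pop_a": [0.10, 0.02, 0.00],
    "pop_b": [0.11, 0.03, 0.00],
    "pop_c": [0.10, 0.02, 0.05],
    "pop_d": [0.30, 0.10, 0.01],
}
ranking = nearest(toy_coords, "pop_a")
matrix = distance_matrix(toy_coords)
for other, d in ranking:
    print(other, round(d, 4))
```

Note how pop_c sits on top of pop_a in the first two dimensions and only separates in the third, which is the sort of thing a 2D plot hides.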

Anthro Survey said...

@Seinundzeit

Keep it up, man. I know MLBA HAS TO be the ancestral layer associated w/steppe ancestry in Greater India and Iranosphere.

Though, I've considered another alternative as to why we see artifacts like Srubnaya_outlier and Samara434(global 10).

As you know, previously I suspected admixture of MLBAs with some mystery Central Asian groups rich in some sort of ANE-related ancestry along the way. I still do, but maybe pre-IA India and BMAC were partially enriched in some ANE-like forager ancestry as Rob and others suggested? Either way, the PCA and formal models would have to compensate with an artifact due to present lack of relevant samples.

Simon_W said...

@ David

Alright, I'll enter the queue in a minute!

@ Anthro Survey

True, valid point. As far as pure distances are concerned the distance matrix is better than the neighbour joining tree, and there, many of those in the same macrocluster as Cypriots are more distant than Anatolia_Chl & _BA.

@ Onur Dincer

I answered your question in the other blog, but it has yet to be approved.

Alberto said...

@Anthro

Yes, I definitely agree that steppe_MLBA is what should be used to estimate steppe admixture in SC Asia rather than Yamnaya. I don't even think Srubnaya_outlier kind of ancestry could arrive together with steppe_MLBA. Just look at the ratio of steppe_MLBA+Andronovo_outlier to Srubnaya_outlier. They don't go hand in hand. The ratio clearly increases going north and decreases going south. It's not parsimonious to think that Srubnaya_outlier kind of ancestry wasn't there before any steppe_MLBA (keeping in mind that everything is quite speculative here, and I wouldn't bet money on anything specific, to be honest).
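
The ratio in question can be worked out directly from the percentages Seinundzeit posted above. A small sketch, using only a subset of his models (the figures are copied from his comment; the function name is made up):

```python
# (Steppe_MLBA, Andronovo_outlier, Srubnaya_outlier) percentages copied from
# Seinundzeit's models earlier in the thread. 0.0 means the component was not listed.
models = {
    "Chamar":         (5.00, 0.75, 10.85),
    "Kshatriya":      (8.95, 3.95, 15.15),
    "Punjabi_Lahore": (5.7, 3.3, 12.7),
    "Sindhi":         (16.00, 3.40, 9.10),
    "Kalash":         (12.4, 0.6, 24.1),
    "Tajik_Rushan":   (34.4, 0.0, 14.4),
    "Tajik_Yaghnobi": (32.55, 0.0, 9.65),
}

def steppe_ratio(mlba, andronovo_outlier, srubnaya_outlier):
    """(Steppe_MLBA + Andronovo_outlier) divided by Srubnaya_outlier."""
    return (mlba + andronovo_outlier) / srubnaya_outlier

ratios = {pop: round(steppe_ratio(*vals), 2) for pop, vals in models.items()}
for pop, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{pop:15s} {r:.2f}")
```

This only restates the posted numbers as a ratio; it doesn't say anything about which explanation for the pattern is right.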

Matt said...

Eren: @MomOfZoha: looking forward to what you're gonna come up with.

Yeah, will second that.

MomOfZoha: But, instead of diving deeper into this, let me just state that if Global 25 is based on a PCA of some raw-er data, then that "raw" data itself must have been in some similarity matrix (or distance matrix) format to begin with. PCA is not just magically defined on SNP data.

I believe generating a covariance matrix on the SNP data is part of the workflow of the PCA software Davidski is using? Rather than that he has processed and stores the matrix as a separate data file. You may be best off chatting directly though.

@Sein, following your example generated an ASI and AustroAsiatic simulation using the intersection between the SA cline and the Austroasiatic India cline:

Scaled: https://pastebin.com/876sBTVN
Unscaled: https://pastebin.com/EBjEZkiJ
Examples of fits: https://pastebin.com/hf4TgVN6
Graphics of fits: https://imgur.com/a/cjhPS

This one is not designed to remove ENA related element from SA cline, just to find the intersection between the two main clines in SA (a SA population which proto-Austroasiatic and various streams of West Eurasian ancestry could mix with to generate SA diversity).

Seems like still fairly significant variance within West Eurasian ancestry once my "ASI" was controlled for (e.g. Velamas, Gond, Brahmin_TN, Brahmin_UP).

You're right about the Pashtun samples as well - some are Sindhi-like and some more Tajik-like. Likewise with the Sindhi samples: some are part of the main cluster and one looks to sit with the Baloch / Makrani / Brahui cluster. May be best not to run the average.

Rob said...

@ Sein
Thanks. Just before I look at it more, what did you use here as “ASI”?

Davidski said...

@All

If you sent your data to my hotmail account, please send it to my gmail account. My hotmail account doesn't appear to be working.

Seinundzeit said...

Anthro Survey,

"Keep it up, man."

Thanks brother; I appreciate that.

" maybe pre-IA India and BMAC were partially enriched in some ANE-like forager ancestry as Rob and others suggested?"

In my estimation, this is well-grounded speculation.

For what it's worth, the Paniya often show hints of ANE enrichment. Additionally, as per raw Fst distances, it seems that South Central Asians and South Asians are the closest contemporary populations to MA1 and AfontovaGora3.

I guess we'll find out eventually (at some point, we'll have Mesolithic data from both Central and South Asia).

Matt,

"You're right about the Pashtun samples as well - some are Sindhi-like and some more Tajik-like. Likewise with the Sindhi samples: some are part of the main cluster and one looks to sit with the Baloch / Makrani / Brahui cluster. May be best not to run the average."

Ah, I should have taken a look at the individual Sindhi samples.

I do agree; better to run either specific subsets, or just individual samples (in this case, for Pashtuns and Sindhis).

Alberto,

Although I do think it would be pretty exciting if Srubnaya_outlier was a pre-IE Central Asian, it seems unlikely to me, considering the current genetic evidence.

I mean, this sample only takes CHG (20%-25%), no Iran_N/Iran_ChL. For Mesolithic Central Asian-related ancestry (Pre-BMAC and Pre-Indo-Iranian people from contemporary Tajikistan, Turkmenistan, Uzbekistan, Afghanistan, etc), I would expect something like Iran_Hotu, but with more ANE-related ancestry.

Also, is there any evidence that the Srubnaya people were tied to a robust network of cultural exchange which reached deep into Central Asia (and that this could explain the presence of a Central Asian female in their midst)? I doubt it.

And, even if they had a Central Asian cultural connection, it just seems more parsimonious to assume that the Srubnaya_outlier had her origins in some nearby people who possessed a socio-political connection to the "Srubnaya Culture" (if I'm not mistaken, she had a "high-status" burial, and was the biological mother of a few other Srubnaya samples). In addition, the Sarmatians and Western Scythians show a strong component of Srubnaya_outlier-related ancestry. So, that's another indication that people like her weren't living too far away from the western steppe.

With regard to the north-south distinction, we could just chalk that up to the high ASI in the "Scheduled Caste" Indians, which might "force" their steppe admixture to seem more "eastern".

Although, the Kalash aren't even South Asians. Despite that, they still skew towards Srubnaya_outlier. So, perhaps the cline you noted is a real phenomenon, and tracks an earlier steppe-related layer?

Or, perhaps the Srubnaya_outlier percentages appear because we are missing that Central Asian "population X" (Iran_Hotu-like, but far less basal), so the ANE-like Srubnaya_outlier is the next best thing to represent this kind of ancestry (in which case, the Srubnaya_outlier percentages truly are reflective of a pre_IE substrate)?

As you've noted, it's all speculative, until we see more aDNA from the region.

Rob,

It's a simulation I've produced.

There are many ways of doing this, and I've tried them all.

In general terms, no matter how you arrive at it, the percentages don't change. Basically, 1%-8% in Iran (minimum in Lor, maximum in Bandari), 0%-15% in southern Central Asia (0% in Yaghnobi, maximum in Burusho), around 20% among the Sindhi, around 30%-35% in UP Brahmin, a range of 50%-60% in non-tribal South Indians (Brahmins are a bit unique; they seem to be around 35%-40%), 70% in Paniya, etc.

You get somewhat similar proportions by just using the Bonda, or even the Onge, although the fits aren't as good.

All,

I tried to broaden the setup, so that it could accommodate any population I throw at it.

Seems to have worked out. At some point, I'll post the output.

Anthro Survey said...

@Matt

Cool stuff! Speaking of your cline intersection approach, I had a similar idea(I think) back in Global10, but was/am too lazy to implement it. Is what you're doing similar to this by chance?:

Plz refer to this pic for the terms:
https://justpaste.it/1e9rd

Basically, C is the "ASI" or some other mystery population of interest.

A and A' are assumed to mainly vary in C and smth else(say, Iran_N-like). B and B' are assumed to mainly vary in C and another component(e.g. other ENA). 4 starting pops, basically.

So, using a fraction of the delta(s)/runs/rises, between A and A'(in each PC), a set of "extrapolated pops" can be generated. This set can then be compared to the set for B and B' to get two CLOSEST pops(wouldn't expect to get a perfect intersection in 10D or 25D, obv) supposedly in the vicinity of C.

I just wasn't sure which sets of S and SE Asian populations to pick to generate the clines.
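
The extrapolation idea described above can be sketched without generating whole sets of candidate points: treat each cline as a straight line through its two populations and solve for the pair of points, one on each line, that are closest to each other. This is only one possible reading of the approach. The function names and the toy 3D coordinates are made up, and real clines in 25D won't be perfectly straight.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def closest_points(a, a2, b, b2):
    """Closest points between line a->a2 and line b->b2 in any number of dimensions.

    Returns (s, t, p, q) where p = a + s*(a2 - a) and q = b + t*(b2 - b).
    """
    u = [y - x for x, y in zip(a, a2)]   # direction of the first cline
    v = [y - x for x, y in zip(b, b2)]   # direction of the second cline
    w = [x - y for x, y in zip(a, b)]
    uu, uv, vv = dot(u, u), dot(u, v), dot(v, v)
    uw, vw = dot(u, w), dot(v, w)
    denom = uu * vv - uv * uv
    if abs(denom) < 1e-15:
        raise ValueError("clines are parallel, no unique closest pair")
    s = (uv * vw - vv * uw) / denom
    t = (uu * vw - uv * uw) / denom
    p = [x + s * d for x, d in zip(a, u)]
    q = [x + t * d for x, d in zip(b, v)]
    return s, t, p, q

def estimate_c(a, a2, b, b2):
    """Midpoint of the two closest points, as a rough stand-in for C."""
    _, _, p, q = closest_points(a, a2, b, b2)
    return [(x + y) / 2 for x, y in zip(p, q)]

# Hypothetical toy example: two lines in 3D that pass within 1 unit of each other.
A, A2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
B, B2 = [2.0, -1.0, 1.0], [2.0, 0.0, 1.0]
s, t, p, q = closest_points(A, A2, B, B2)
c_est = estimate_c(A, A2, B, B2)
print(s, t, c_est)
```

Here s and t say how far past A' and B' the extrapolation has to go, and the gap between p and q gives a feel for how badly the two clines miss each other.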

namedguest said...

If you guys are looking for some mysterious ANE component in South Asians, I think it might be the case that the ASI had an ancestral component from a population related to the Tianyuan man. You can see this same component in Ust'-Ishim and Oase 1, and it is different from the one found in MA1 and Afontova Gora 3.

This component, from what I can see, makes the Gond, Ho and Khonda Dora the best matches for the ASI group.

Anthro Survey said...

@All
Had the Khwarizmians been around today as a population, would you expect a higher or lower Steppe_MLBA proportion relative to that in Pamiris from Shugnan, Rushan, etc.?

@Alberto
There doesn't really seem to be a clean cut ratio, that's true. So, yeah, if only one of the alternatives is correct, it's that a substrate with similar ancestry to the Srub. outlier existed.
If both were the case, though, it's a bit more tricky to assess the total population turnover.

Rob said...

Sein,
I essentially got similar results as you, when throwing in all samples.
So the issue of Arm_Chalc in Tajiks vs its lack in, say, Brahmins is an interesting one, and something we've been noticing for a while now.

Back to basics, Tajiks are Iranics, right? Where is their presumed ethnogenesis (Northern Iran / SCA)?
I can't help but speculate if Arm-Chalc is ~ BMAC. If so, why do Indians lack it?
But it could be later inflow (why / when)?

Chad Rohlfsen said...

Sein,
I'm also of the opinion that the Srubnaya outlier is covering unaccounted-for ANE. It isn't really admixture from something like this sample.

Seinundzeit said...

Rob,

"I can't help but speculate if Arm-Chalc is ~ BMAC."

I find this to be a very sensible idea; but the notion of later inflow also seems to be equally compelling.

If the latter, perhaps a reflection of an "isolation by distance"-type dynamic with populations further west across the Iranian plateau?

Chad,

I do agree that it's possible.

We'd be in a much better position to decide, if we had an ANE sample considerably younger than MA1 and AG3 (admittedly, if Srubnaya_outlier wasn't 20%-30% CHG-admixed, she'd be the closest thing we have to a young ANE sample).

Rob said...

About Srubnaya Outlier

Srubnaya_outlier
"EHG" 41
"MA1" 35.35
"CHG" 20.75
"Armenia_ChL" 2.9
d 5.8%


adding later sources:

Srubnaya_outlier
"Yamnaya_Samara" 55.7
"MA1" 42.5
"EHG" 1.8
d 4.6%

So an early Yamnaya migration to Central Asia?

Arza said...

@ Anthro Survey

That's how I generate most of my ghosts. And if you spot the right clines it works amazingly well.

@ Matt

Just like in Global 10 Austroasiatic and South Asian clines do not intersect.

PC 2,3,9
https://s6.postimg.org/fqn1tvs9d/SA_AA.png

From AA to SA you jump along a third cline by adding 10-20% of Iran_N. Pulliyar (IIRC) on the other hand are sitting on a fourth cline between AA and SA.

This is in line with the "Dead cat" slide that showed ~15% Iran_N in North India.

Arza said...

Brahmin
ASI_Sim_Final 45.2
Iran_N:AH1 23.45
CWC_Baltic_early:Gyvakarai1 21.65
Srubnaya_outlier:I0354 2.95
Tisza_LN:I2358 2.45
MA1:MA1 1.55
CWC_Germany:I0104 1.45
Starcevo_EN:I1880 1.3

distance%=1.1593 / distance=0.011593

not (un)scaled

MomOfZoha said...

@Eren, @Matt:

Sorry for the lateness -- did not expect to be hit by major headaches... As I hate to break promises, please pretend that the "later today" was a biblical day.

With slightly less of a headache/nausea, I quickly ran Dave's PCA 25 population average data through my graph theoretic pipeline, just on some default parameter settings and quick-and-dirty clustering (randomized modularity), with such results:

https://drive.google.com/open?id=1qMzXxVqzdcnvbjVwYFQtZ46e_n7-1Gsb

I'd have to tinker more to replace the numeric labels with population labels, and am not in any mood to do that now. Anyway, the numeric ids respect the original order of the populations in the original PCA25 Pop averages datasheet EXCEPT that they start from 0 instead of 1. That is, Abkhasian has ID 0, Adygei ID 1, and so forth. It's easy to do the mapping in excel.
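
Outside of Excel, the same mapping takes a couple of lines. A minimal sketch, assuming the datasheet is comma-separated with the population label in the first column; whether there is a header row is also an assumption, hence the flag. The two labels come from the sheet order mentioned above, but the coordinates in the sample are placeholders.

```python
import csv
import io

def id_map(datasheet_text, has_header=False):
    """Return {zero_based_id: label} using the first CSV column in file order."""
    rows = [row for row in csv.reader(io.StringIO(datasheet_text)) if row]
    if has_header:
        rows = rows[1:]
    return {i: row[0] for i, row in enumerate(rows)}

# Placeholder rows: real labels, made-up coordinates.
sample = "Abkhasian,0.1,0.2\nAdygei,0.1,0.3\n"
ids = id_map(sample)
print(ids)
```

The IDs only line up with the graph if the sheet has the same rows in the same order as the version the graph was built from.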

The notable nodes then are those that appear larger due to higher betweenness centralities. Namely: Biaka (51), Lipka Tatar (372), Tianyuan (377), Turkmen (387), Ust Ishim (393), Turkish (386), Hazara (146), et al...

As I was writing the above, looking up each ID at a time, I myself am surprised that many of those high centrality nodes are "Turkic" or other Euro-Asian mixes. Even though I totally should not be surprised at this because it confirms what I hypothesize betweenness centrality to imply in such representations (in addition to what I know about my own ancestry).

One of the side effects of high betweenness is that the node lies "between clusters" rather than belonging to one cluster alone. In such a situation, the node may represent either an admixture of its adjacent clusters, or alternatively a common ancestral source (in the case of ancient nodes, perhaps). For moderns, the former would tend to be the case.
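
For readers new to the term, betweenness centrality counts how many shortest paths between other pairs of nodes run through a given node. The sketch below is a generic illustration using Brandes' algorithm on an unweighted graph. It is not the pipeline used for the picture above, and the toy graph (two tight clusters joined by one "mix" node) is made up.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbours)}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, furthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical toy graph: two triangles bridged by a single node.
toy_graph = {
    "a1": {"a2", "a3", "mix"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2"},
    "b1": {"b2", "b3", "mix"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
    "mix": {"a1", "b1"},
}
scores = betweenness(toy_graph)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In the toy graph the bridging node scores highest even though it belongs to neither cluster, which is the "between clusters" effect described above.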

If there were Puerto Rican, Madagascan, or North Indian samples in Dave's PCA 25 spreadsheet, they would probably have even higher betweenness centrality than some of the above Eurasian/Turkic populations.

@Eren specifically:
Selam! It looks like your mom is autochthonous Trabzonlu while your dad has some major Siberian admixture there. That seems surprising given the Sakarya and Giresun origins, unless you are aware of recent Tatar or Nogay ancestry? Unfortunately, all but one of my grandparents are deceased, and I have not had luck convincing my great uncles to DNA test. That is awesome that you can model your grandparents.

@Matt specifically, and Dave:
Well, WOULDN'T IT BE NICE if Dave made his covariance matrix available for all? :)

Chad Rohlfsen said...

@Unknown,

Can you email me at chadrohlfsen@gmail.com I'd like to discuss those IBD runs.

Arza said...

@ MomOfZoha

Can you post your spreadsheet? It seems that one row is misplaced. I have Biaka at 51, but Tatar_Lipka at 373 (spreadsheet downloaded right now).

epoch2013 said...

@David

Are there any Iron_Gates samples out yet? If so, could you do:

D(Mbuti, Anatolia_Neolithic, WHG, Iron_Gates_HG)

as well as

D(Mbuti, Natufian, WHG, Iron_Gates_HG)?

Alberto said...

@Sein

Or, perhaps the Srubnaya_outlier percentages appear because we are missing that Central Asian "population X" (Iran_Hotu-like, but far less basal), so the ANE-like Srubnaya_outlier is the next best thing to represent this kind of ancestry (in which case, the Srubnaya_outlier percentages truly are reflective of a pre_IE substrate)?

Yes, this is what I meant. Not that Srubnaya_outliers were walking around North India, but that the native population was Iran_N + ANE (and that extra ANE is fulfilled by the Srubnaya_outlier). Basically the same that Anthro, Chad or Rob are suggesting and that we were testing long ago with MA1 instead of Srubnaya_outlier.

Let's hope that aDNA finally comes out and we can sort this out.

Davidski said...

@epoch

There are no Iron Gates samples in my dataset yet. There probably will be when the Mathieson 2017 preprint is formally published.

Davidski said...

@All

I've added more ancient samples to the Global 25 datasheets, including AfontovaGora3, one Comb Ceramic, the rest of the Mycenaeans, the other Narva Estonia, and Rochedane.

Rochedane does appear to be a better WHG reference than Loschbour in this test. I also tried adding Ranchot88, but that didn't work out, with all sorts of crazy shit going on in the higher dimensions.

More updates coming soon. Happy modeling...

epoch2013 said...

@Davidski


Never mind, we'll just have to wait then.

It's just that I thought another paper had samples of them too. SC1_meso and SC2_meso from this paper:
http://www.cell.com/current-biology/fulltext/S0960-9822(17)30559-6

Matt said...

@Arza, it looks like there is an intersection between a South Asian and Austroasiatic cline in 2v3, 2v9 and 3v9, quite approximately - https://imgur.com/a/FA4pt

I do mean clines in the roughest of possible sense though. The intersection is not perfect across all dimensions, but it shouldn't be if there is *any* variance within or between ASI / ANI / East Asia in South Asian / Austroasiatic cline populations - this is where nMonte decomposing at least the ANI side into other populations comes in...

@David, the Rochedane sample looks to have about the right kind of drift in east-west related dimensionality as Loschbour, and less extreme dimensionality in north-south. New samples look like Rochedane + AfontovaGora3 + Comb_Ceramic + additional for some other populations.

@Anthro Survey, more or less what I did was:
1) Feed the Global25 PCA selected South Asian results (inc. Austroasiatic) into another PCA, inc. Dai, 2) Take first two dimensions of this PCA, add clines and use to estimate an extension along the Austroasiatic cline which intersects roughly with other SA clines (https://imgur.com/a/7g2nO), 3) use distance along this cline to estimate "ASI" in Austroasiatic (in this case ASI vs an extrapolated Austroasiatic, but could have used Bonda as the "max", if all I wanted to do was find an ASI point) - this would be proportion A-C or B-C in your diagram, then 4) feed these proportions and positions in all 25 dimensions into a univariate regression in PAST3 to project the intersect (e.g. when Austroasiatic is at 0 and ASI at 100).

It could've been better to use multivariate and multiple clines at once, but the univariate result gets near enough anyway. (I then also used a little nMonte to mix down both projected with other populations in 1-2% fractions to get better fits on a variety of SA populations.)
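
Step 4 above can be sketched as follows: for each PCA dimension, fit a straight line of coordinate against estimated ASI proportion, then read the line off at 100 (and at 0 for the other end). This is a plain least-squares stand-in for what was done in PAST3, not a reproduction of it. The function names, proportions and 2D coordinates are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def project(proportions, coords, at):
    """Regress each dimension on the proportions and evaluate at proportion `at`."""
    dims = len(coords[0])
    out = []
    for d in range(dims):
        intercept, slope = fit_line(proportions, [row[d] for row in coords])
        out.append(intercept + slope * at)
    return out

# Hypothetical toy data: three samples lying exactly on a line between two end points.
end_0 = [0.00, 0.10]     # position at 0% (placeholder values)
end_100 = [0.05, -0.02]  # position at 100% (placeholder values)
props = [20.0, 40.0, 60.0]
samples = [[a + p / 100 * (b - a) for a, b in zip(end_0, end_100)] for p in props]

asi_point = project(props, samples, at=100.0)
other_point = project(props, samples, at=0.0)
print(asi_point, other_point)
```

With real samples the points won't sit exactly on a line, so the projected end point carries whatever error is in the proportion estimates.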

Eren wrote a nice little PCA projection script for R, mentioned in another thread, that could make this all a bit quicker once you have the proportions. Though we discussed using it to project from one PCA to another, I think he also used it to project the Basal-Rich K7 onto Global10 as a test, so there's no reason you couldn't do that with any set of proportions. There might be some scripting possible to automate the proportion estimation function as well.

Rob said...

Some quick models for SEE/ Aegean

Mycenaean
"Greece_N" 50.65
"Armenia_EBA" 15.75 + "Tepecik_Ciftlik_N" 15.6
"Yamnaya_Samara" 11.75
"Levant_N" 4.3
"Srubnaya" / "Srubnaya_outlier"/ "Andronovo" 0
"Levant_BA" 0
d 2.3%

Minoan_Lasithi
"Greece_N" 60.8
"Anatolia_ChL" 24.55
"Tepecik_Ciftlik_N" 9.35
"Armenia_EBA" 3.75
"Iran_ChL" 1.55



Iberia_BA
"Iberia_ChL" 59.65
"Yamnaya_Samara" 20.4 + "Tiszapolgar_ECA" 19.55
d 3.0 %

Hungary_BA:I1502
"Tiszapolgar_ECA" 39.45
"Narva_Lithuania" 31.3
"Yamnaya_Samara" 18.95
"Greece_N" 10.3
d 3.6%
Perhaps the inflated WHG in Mako came from middle Dnieper region.


huijbregts said...

Good to know: the clustering software mclust classifies Loschbour, Villabruna, SHG(5), Narva(8), Koros_HG, ElMiron, Blatterhole_HG, Baltic_HG and Rochedane all in the same cluster (Nclusters=50); Bichon and Hungary_HG are missing from the dataset.
This 'WHG' cluster has no false positives or false negatives; indeed a tight cluster.
Even so, a separate multidimensional scaling of these 20 samples shows that ElMiron.damage is an outlier. The multidimensional scaling does not show a central position of Rochedane. Maybe due to the overrepresentation of Narva.
This analysis was done without a rescaling; I don't expect that this affects the results.

Unknown said...

>ElMiron.damage is an outlier

Is data of the Chan_Mesolithic sample from this study available anywhere online? I wonder if it would cluster with El Miron.


https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5483232/

Unknown said...


BTW, Vestonice16 IBD.

https://imgur.com/a/DjHLc

Chad Rohlfsen said...

@Unknown,

Message me at chadrohlfsen@gmail.com regarding your IBD runs

MomOfZoha said...

@Arza:
"@ MomOfZoha

Can you post your spreadsheet? It seems that one row is misplaced. I have Biaka at 51, but Tatar_Lipka at 373 (spreadsheet downloaded right now)."

Sure, it's this:
https://drive.google.com/open?id=1LmYVoWH0sUlHDdQhz_ggHkonx-LvfbOj

It seems that David is adding stuff to this dataset dynamically, so certainly this might not be the most updated version, but that's consistent with the IDs in the graph.

Eren said...

@MomOfZoha:
Selam, first off, very interesting looking graph you produced there. This is based on the Global 25 PCA datasheet, not IBS, right? As a noob in graph theory, and without labels, it’s hard for me to understand what it means.


Regarding my grandparents’ results, the amount of Siberian admixture is not surprising given their locations. The whole Anatolian coastline starting from Giresun in the north-east and extending all the way to the southern coasts is rather high in Turkic admixture, with strongholds in the Central Black Sea region(Giresun), the West (Canakkale) and South-West (Mugla).

Academic samples of Turkey are often quite mixed and therefore not representative. Some friends of mine are running a Turkish DNA project, collecting autosomal and Y-DNA results from regional natives from all around the country. You can check out the results if you want, they have a blog in Turkish: https://oghuzturksdna.blogspot.com, and in English (not up-to-date): https://turkishdna.blogspot.com.

Here, to prove my point about mixed or unrepresentative samples, I took a look at the Turkish samples in the Global 25 datasheet. These samples are from the Human Origins dataset of the Reich Lab, if I’m not mistaken. I ran a PCA on the Turkish samples, together with my grandparents, myself, and other relevant populations, which you can see here: https://abload.de/img/turkish_pcatckcb.png .

The non-mixed Turkish samples from Adana, Aydin, Balikesir, Istanbul, as well as my paternal grandparents, all cluster together. Kayseri_out1 and Adana_out2 cluster with Kurds. Kayseri_out4 clusters with Albanians, and Kayseri_out3 and Istanbul_out1 seem to be Balkan Turks, or Balkan admixed. My grandparents from Trabzon cluster with Georgian Laz, instead of with the Trabzon samples, who seem to be less Caucasus shifted. Interestingly, 4 Trabzon samples cluster close to the main Turkish cluster, meaning they do have Turkic admixture. Probably from western Trabzon.

The most interesting/strange finding concerns the Kayseri cluster, though. Those samples seem not to be Turks at all. After pruning the regional samples from outliers, I created averages and ran them through nMonte with Chal and BA samples. Here are the results:

Turkish_Adana
Armenia_EBA,17.6
Sarmatian_Pokrovka,16.7
Minoan_Lasithi,13.7
Levant_BA,12.9
Ulchi,8.9
Armenia_ChL,7.6
Mycenaean,6.6
Iran_ChL,6.1
Srubnaya,5
Anatolia_ChL,4.9
[1] "distance%=0.7289"

Turkish_Aydin
Armenia_ChL,25
Anatolia_BA,14.1
Ulchi,12.3
Iran_ChL,11.7
Minoan_Lasithi,11.2
Srubnaya,9.7
Armenia_EBA,7.2
Sarmatian_Pokrovka,4
Levant_BA,3.7
Yoruba,1
Sintashta,0.1
[1] "distance%=0.982"

Turkish_Balikesir
Minoan_Lasithi,20.6
Armenia_MLBA,16.3
Ulchi,12
Levant_BA,11.7
Srubnaya,8.7
Iran_ChL,7.8
Armenia_ChL,7.6
Poltavka,5
Sarmatian_Pokrovka,4.8
Armenia_EBA,4.5
Yoruba,0.6
Mycenaean,0.4
[1] "distance%=0.9614"

Turkish_Istanbul
Armenia_EBA,18.5
Minoan_Lasithi,17.1
Armenia_ChL,16
Srubnaya,13.2
Ulchi,12.2
Iran_ChL,9
Levant_BA,6.3
Poltavka,4.2
Mycenaean,3.5
[1] "distance%=0.8439"

Turkish_Kayseri
Levant_BA,30.3
Mycenaean,22.9
Minoan_Lasithi,13.1
Anatolia_BA,10.6
Potapovka,10.5
Sintashta,7.3
Armenia_ChL,4
Yoruba,1.1
Iran_ChL,0.1
Ulchi,0.1
[1] "distance%=1.3202"

Turkish_Kayseri_outlier2
Anatolia_BA,31
Armenia_EBA,24.9
Srubnaya,21.2
Ulchi,7.3
Armenia_ChL,6.3
Levant_BA,5.7
Sarmatian_Pokrovka,2.3
Minoan_Lasithi,1.3
[1] "distance%=1.5589"

Turkish_Trabzon
Armenia_EBA,67
Minoan_Lasithi,22.4
Levant_BA,7.3
Armenia_ChL,3.1
Ulchi,0.2
[1] "distance%=1.0087"

Turkish_Trabzon_outlier
Armenia_EBA,21.3
Armenia_ChL,18.4
Iran_ChL,10.2
Minoan_Lasithi,10.2
Levant_BA,9.5
Mycenaean,8.3
Poltavka,6.3
Ulchi,6.2
Anatolia_BA,6.1
Srubnaya,2.5
Scythian_Samara,1
[1] "distance%=0.692"

The Kayseri results look super weird: 30% Levant_BA + 35% Mycenaean+Minoan. Almost no admixture from ancient Armenia, low Anatolia_BA, no East-Eurasian. They must have mixed up the labels or something.
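
For anyone wanting to repeat the "prune outliers, then average" step before modeling, here is a minimal sketch. It assumes each sample is a (sample ID, population label, coordinate list) triple; the function name, the sample IDs, and the 3D coordinates are made up for illustration, and this is not the script used for the runs above.

```python
from collections import defaultdict


def population_averages(rows, outliers=frozenset()):
    """Average coordinate vectors per population label, skipping outlier sample IDs."""
    sums = {}
    counts = defaultdict(int)
    for sample_id, population, coords in rows:
        if sample_id in outliers:
            continue  # pruned samples do not contribute to the average
        if population not in sums:
            sums[population] = [0.0] * len(coords)
        for i, value in enumerate(coords):
            sums[population][i] += value
        counts[population] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


# Hypothetical samples with made-up 3D coordinates (real Global 25 rows have 25).
rows = [
    ("Kayseri_1", "Turkish_Kayseri", [0.10, 0.02, -0.01]),
    ("Kayseri_2", "Turkish_Kayseri", [0.12, 0.04, -0.03]),
    ("Kayseri_out1", "Turkish_Kayseri", [0.30, 0.20, 0.10]),
]
averages = population_averages(rows, outliers={"Kayseri_out1"})
print(averages["Turkish_Kayseri"])
```

The resulting average row can then be used as the target in whatever modeling tool you prefer.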

Arza said...

@ MomOfZoha

Thank you. It's simply beautiful.
And I've managed to replace the numbers with names. I hope you will not mind if I share it now.

(3620 x 2713, 900KB)
https://s6.postimg.org/e2x36vt73/MOZvis.jpg

MomOfZoha said...

@Arza:
"@ MomOfZoha

Thank you. It's simply beautiful.
And I've managed to replace the numbers with names. I hope you will not mind if I share it now."

Well, thank *you*, Arza. WHAT did you do there -- on a PDF file no less! :) I would have shared with you (separately, via email) the original graph file format had I known you were going to enhance the meaning of the picture. I am very happy that you appreciate the beauty of this representation. :)

@Eren:
No worries, I am an "anthro noob" myself, not to mention just generally impatient. My undergrad math degree did not increase my patience in looking at pages of numbers. Symbols, pictures, patterns -- another story... Having said that, I'm going to start by assuming that your models are accurate to a reasonable extent.

About Kayseri: In some previous lifetimes, I have had some friends (unrelated to each other) from Kayseri, coincidentally. Literally every single one of them directly descended from different Caucasus groups who established villages in Kayseri (Ossetian, Kabardian, and general Circassian). One could say that I have never met an actual "Turk" from Kayseri then!

About the Black Sea coast, it is still interesting to me that the region would appear more "Turkic" than Central Anatolia. E.g. my Konyali family (dad's village borders Karaman too) also registers more Siberian admixture than the average Turk, but that makes sense given actual Turcoman village names and the Seljuk and Karamanid history of the region.

North of the Black Sea has Crimean Tatars as well as a history of other Turko-Ugric as well as Mongolic groups. While the Oguz have of course made an important mark, one cannot assume that Black Sea Turks (north or south) are of primarily Oguz descent in their Turkic component. Not that anyone knows what an "Oguz" Turk was like (we don't have that kind of recent-historic "ancient" samples!), but I would posit their greater similarity to Turkmenistan-like Central Asians.

I have really enjoyed reading about all the various Turkic/Ugric speaking groups in the recent translation of Ibn Fadlan and other writers in the book "Ibn Fadlan and the Land of Darkness". Ibn Fadlan is actually incredibly objective, even self-deprecating in his humorous (and kind of dangerous) interaction with the Bulgar Turkic ruler (himself extremely funny and clever). There are other historic writers in the book too, which is amazing in noting the change in all the different groups of Turks from the 900s to the 1100s time frame.

***

My daughter awoke from her nap, so I have to tend to a VIP now.

Best wishes...

Rob said...

more comments on my own run:

- surprising the rel. low steppe (15%) admixture in Mako- which is contemporary to late Yamnaya - Early B.B.; and adjacent to Hungarian Yamnaya

- main thrust to mainland Greece was via South Caucasus, with small steppe picked up via ?Thrace and Dalmatia

- Minoans different; migration from Anatolia

I’d say steppe admixture was mostly relevant for Northern Europe
In SEE, it’s lower than the MNE admixture into Beaker

Jortita said...

Hi Davidski, I am interested in my results being generated using my Living DNA raw data file and also Ancestry DNA. Please let me know how I can send you the files and also the payment. Thanks, Jortita

Samuel Andrews said...

@Rob,

"- main thrust to mainland Greece was via South Caucasus, with small steppe picked up via ?Thrace and Dalmatia "

That is such BS and you know it. Steppe wasn't "picked up." A Steppe-heavy, Indo European speaking population moved directly into Greece.

"I’d say steppe admixture was mostly relevant for Northern Europe "

Generally speaking, that's correct. BA Croatia though had a decent chunk of Steppe. BA Iberia probably had mostly R1b P312. R1b P312 became big in Italy before the Romans.

So, Steppe-rich groups did go into southern Europe, hence most southern Europeans in 0 AD spoke an Indo European language.

Rob said...

@ Sam

First off, watch your mouth. I am an expert, and not a biased noob like you.
Secondly, look at the data and deal with it.
Maybe someone with patience can break it down for you because I'm above your crap. For a while there I thought you'd finally grown a brain.

Samuel Andrews said...

@Rob,

Also, Rob you obviously emotionally don't like the idea of recent Mid-eastern admixture in southeastern Europe because that's where you are from. Angela, a Tuscan at Eupedia, was the same way. She hated the idea Tuscany has anything other than EEF, WHG, and Steppe.

Remember when I mentioned the consistent presence of typical SW Asian/Arabian R0a1a in southeastern Europe? It hasn't been found in EEF. Yet you claimed it descends from EEF. It shows how little you know about mtDNA.

R0a1a is just one of many mtDNA haplogroups which suggest there's recent SW Asian stuff in southeastern Europe.

Samuel Andrews said...

Point is, Rob you carry your own biases. And it is about time you understand "Steppe proponents" aren't ugly losers who seek self-esteem in their ancestors. That's a very simplistic, erroneous assessment. I know none of us on this blog fit that description except maybe Shah. And the fact you give people such horrendous insults and then brag about your education demonstrates what a terrible person you are.

Samuel Andrews said...

@Rob,

"Secondly, look at the data and deal with it."

Um, Mycenaeans had *significant* Steppe ancestry, spoke an Indo European language. Minoans lacked Steppe ancestry, spoke a non-Indo European language.

Do you really think that's a coincidence, genius? Or do you just not like the idea archaeologically "insignificant" people spread the Indo European language family? Would you rather Indo European languages emerged in some telepathic trade deal between the Balkans and Caucasus?

Read my blog post about Steppe mtDNA again. The genetic evidence in support of the Kurgan hypothesis is coming from all angles.

Anthro Survey said...

@Matt

Ah, you use the coordinates from selected Global25 samples to generate another 25D PCA in PAST3, then.

So, the 25 coordinates you end up obtaining w/regression in step 4 for ASI are specific to that new PCA?
If so, do you "convert" these into Global25 or merely use that new PCA for all subsequent ASI ancestry estimation/fit?

Anthro Survey said...

@Arza

Oh really? Or is your method more like the one Matt's using?
(Yeah, finding the right cline is always the million dollar question!)

If like the one I described:
When you do find two extrapolated populations with a small distance between them (obv impossible to get a perfect intersection) suspected to be in the vicinity of your ghost, do you then take their average?

Aside from ASI, I'd love to find some basal-rich ghosts.
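
To make the "take their average" question concrete, here is a minimal sketch of one way to read it. It assumes each cline is treated as a straight line in PCA space (a start point plus a direction), finds where two such lines pass closest to each other, and returns the midpoint of that gap as the candidate ghost. The function name and the 3D numbers are made up, and this is not necessarily what Arza or Matt actually do.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extrapolated_meeting_point(start1, dir1, start2, dir2):
    """Return (midpoint, gap) at the closest approach of two straight clines.

    Each cline is the infinite line start + t * direction, so t may be negative.
    """
    w = [a - b for a, b in zip(start1, start2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("clines are parallel; no unique closest approach")
    t = (b * e - c * d) / denom  # position along the first cline
    s = (a * e - b * d) / denom  # position along the second cline
    p1 = [x + t * dx for x, dx in zip(start1, dir1)]
    p2 = [x + s * dx for x, dx in zip(start2, dir2)]
    midpoint = [(x + y) / 2 for x, y in zip(p1, p2)]
    return midpoint, math.dist(p1, p2)


# Made-up 3D example; real Global 25 coordinates would have 25 dimensions.
midpoint, gap = extrapolated_meeting_point([0, 0, 0], [1, 0, 0], [2, 1, -1], [0, 0, 1])
print(midpoint, gap)
```

The returned gap is a rough sanity check: if the two clines never come close, the midpoint is probably not a meaningful ghost.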

Rob said...

@ Sam

You're a silly child from Bumfuck Idaho suffering Dunning-Kruger, so don't project your agendas on me.
Secondly, I stated Mycenaeans have steppe admixture - 10%, so as usual you are making strawman arguments because you lack the intellectual capacity to engage in upfront discussion, and also because you're a whiny pussy.

Here it is again:
Mycenaean
"Greece_N" 50.65
"Armenia_EBA" 15.75 + "Tepecik_Ciftlik_N" 15.6
"Yamnaya_Samara" 11.75
"Levant_N" 4.3
"Srubnaya" / "Srubnaya_outlier"/ "Andronovo" 0
"Levant_BA" 0
d 2.3%

Sure, 12% is "significant"


Secondly, you call the major thrust from the Caucasus "BS" because you're clueless. The archaeology is clear, but feel free to also check the Lazaridis paper; which by the way, I did not link to PIE coming from Armenia. So again, that reflects your paranoid biases.

As for your blog, as if I'd waste my time.

Onur Dincer said...

@MomOfZoha

"North of the Black Sea has Crimean Tatars as well as a history of other Turko-Ugric as well as Mongolic groups. While the Oguz have of course made an important mark, one cannot assume that Black Sea Turks (north or south) are of primarily Oguz descent in their Turkic component. Not that anyone knows what an "Oguz" Turk was like (we don't have that kind of recent-historic "ancient" samples!), but I would posit their greater similarity to Turkmenistan-like Central Asians."

You are right about the Turkic groups coming to the north of the Black Sea, but for the Turkic groups coming to the south of the Black Sea, i.e., northern Anatolia, I should say that they were primarily from the Oghuz/Turcoman groups just like the ones coming to the rest of Anatolia, the Caucasus, the Near East and the southern half of Central Asia (including what is now Turkmenistan) during the Seljuk times; historical sources are clear about that.

As for the genetic makeup of the Oghuz/Turcoman groups coming to Anatolia, the Caucasus, the Near East and the southern half of Central Asia from the northern steppe parts of Central Asia (in what is now Kazakhstan) with the Seljuks and their subsequent empire, we cannot be certain about exactly how it was without ancient DNA results from them, but we can be almost certain that they were not like the Turkmenistan Turkmen samples in the academic studies, whose genetic results show clear signs of high amounts of native Iranic southern Central Asian ancestry (almost certainly due to assimilating and mixing with those native Iranic groups during the last 1000 years since the Seljuk times) compared to the results of other modern populations with ancestry from the Oghuz/Turcomans of the medieval times such as Turks from Turkey and Azeris, who both lack or have much less of the southern Central Asian Iranic ancestry and show high amounts of native ancestry from their respective locations (Anatolia, the Balkans, the Caucasus and the Near East) instead. My guess is that the genetic composition of the Oghuz/Turcoman groups coming to Anatolia, the Caucasus, the Near East and the southern half of Central Asia 1000-800 years ago was somewhere between modern Uzbeks and modern Kazakhs, and probably not much different in their proportion of West Eurasian and East Eurasian genetic elements from modern Uyghurs, but that is just my guess and it is largely based on modern genetic results and should be regarded as speculative due to the current lack of relevant ancient DNA samples.

"About the Black Sea coast, it is still interesting to me that the region would appear more "Turkic" than Central Anatolia. E.g. my Konyali family (dad's village borders Karaman too) also registers more Siberian admixture than the average Turk, but that makes sense given actual Turcoman village names and the Seljuk and Karamanid history of the region."

Eren is right that on average Turks with origins from the coastal parts of western Anatolia and parts of coastal northern Anatolia show more signs of Turkic ancestry in their genetics than Turks from the interior regions of Anatolia. But by "coastal" I do not necessarily mean to say lands adjacent to the sea but lands outside the generally arid interior zone of Anatolia. This result makes historical sense when we look at the regions where the Turcoman principalities (beyliks) that were founded in Anatolia following the decline of the Seljuk Sultanate of Rum were concentrated and also where in Ottoman Anatolia Yoruk/Turcoman tribes were concentrated according to the tax registers of the Ottoman Empire.

Hope this helps.

Onur Dincer said...

@Eren

"The most interesting/strange finding concerns the Kayseri cluster, though. Those samples seem not to be Turks at all. After pruning the regional samples from outliers, I created averages and ran them through nMonte with Chal and BA samples. Here are the results:

...

The Kayseri results look super weird: 30% Levant_BA + 35% Mycenaean+Minoan. Almost no admixture from ancient Armenia, low Anatolia_BA, no East-Eurasian. They must have mixed up the labels or something."


Don't know which samples David has among the samples labeled as "Turkish_Kayseri" in his database, but the samples labeled as "Turkish_Kayseri" in academic studies are not weird at all, for example, they show about 7.5% East Eurasian element according to MDLP K23b on average and cluster with other Anatolian Turks in PCAs and other analyses. You should ask David to check whether he uses the right samples for the Turkish_Kayseri samples or accidentally mislabeled some other samples as "Turkish_Kayseri".

Onur Dincer said...

@MomOfZoha

"About Kayseri: In some previous lifetimes, I have had some friends (unrelated to each other) from Kayseri, coincidentally. Literally every single one of them directly descended from different Caucasus groups who established villages in Kayseri (Ossetian, Kabardian, and general Circassian). One could say that I have never met an actual "Turk" from Kayseri then!"

For Kayseri see my above post to Eren.

MomOfZoha said...

@Onur:

I understand that modern Turkmen from Turkmenistan have a high Iranic component, as likely did Turkmen from Turkmenistan prior to the time of the Oguz migrations into Anatolia --actually, the Iranic component was likely *higher* in Turkmenistan prior to that time. My comment on "Oguz" was more directed towards the name of the project that Eren linked, which I found to be a strange naming unless its intent is to exclude people who do not speak some Oguz Turkic language?

I am not attached to any theory on Turkic origins, as the more I read the more complicated it gets. That is why I refer to the translation of the Arab travelers from 900s-1100s. The first time that the "Oguz" appear in writing is exactly Ibn Fadlan's travel diary in which he refers to them as (transliterated) "Ghuzz" -- and they are not yet Muslim. When al-Andalusi encounters the Oguz two centuries later, they have already become the powerful Seljuks...

I ultimately do not care to attempt to specify whether Oguz are this or Oguz are that as I very well understand that "Turk" has often come to mean a conglomerate of multi-lingual nations, usually with an admixed Siberian or East Asian component, even in cases where such component is far from the majority of average ancestry. If you read any historic account in depth, you will understand that Turks have always acted in a way to create a greater trust and security, with little or no care for racial purity whatsoever. The biggest mechanism of securing such expanded trust has in turn been strategic conversion, the idea being that co-religionism automatically confers some fundamental similarity of mind. All the racial stuff is post-19th century, largely inspired by European racialists as usual.

Anyway, Onur, thank you for your input, and I know you have looked at a lot of Turkish DNA, including my own. :) However: *Especially* in situations where there is notable descent from the "Oguz" Turkmen of ~1000 years ago, who settled and admixed with autochthonous Anatolian populations (not to mention some effects of later Ottoman migrations across the empire), then we are talking about centuries of recombination involving genetically distant groups. Why should anyone expect this (i.e. discerning the precise ancestral paths) to be an invertible problem -- in lieu of massive grave digging?

I am capable of finding and weighing the available information. And, anyway, personally, I am more interested in other aspects of Anatolian dynamics, especially including the role of the autochthonous populations, such as Armenians and Greeks, often hand-in-hand with the Turks. In this regard, I have my own theories of the initial "Islamification" of Anatolia via a syncretic proto-Alevism, which -- in my opinion -- was initially spread by the union of Armenians and Turkmen of the Karamanid times. The role of new Armenian converts in this was also -- in my opinion -- quite substantial... Such a dynamics is far more interesting to me than the precise genetic composition of this or that, ALTHOUGH the genetics *might* help me to better understand the likelihood of my theories.

And, with regard to my own ancestry, although the Siberian and Turkic parts are mentioned above and probably obvious to anyone who knows me, I have also expressed happiness in discovering my significant shared ancestry with the following peoples -- from at least the first three of whom I also surely descend: Iraqi Kurds, Ubykh, Armenians, Lebanese Christians, Ashkenazi Jews, Zaza, Hungarians, Bulgarians, Greeks...

Anthro Survey said...

@Rob-ski,

A few q's.

This is the model you favor, right?:
https://justpaste.it/1gxoi

(not this)
https://justpaste.it/1gxol

If so, then:
1) Would this Mycenaean wave be the sole significant source of Anatolia_BA-like admixture in Greece? Or do you still think a previous 2200-2000BC wave w/no steppe took place?

(described here): http://www.ufg.uni-kiel.de/dateien/dateien_studium/Archiv/Kirleis/200100_Kirleis_DalCorso/Oxford%20Handbook%20of%20European%20Bronze%20Age/Chapter%203%20Heyd_Europe_2500_to_2200_BC_Bet.pdf

2)Would this be a black-sea coastal route as shown?

Don't see a prob with it, btw, just wanted to get more deets.
------------------------------

As for steppe and N/S Europe--
Depends on what we call significant and the boundaries of S. Europe, but North Italy, Southern France, Balkans north of ~Via Egnatia likely had considerable steppe by 1000BC. Maybe ~20%, on average. Greece may have remained rather Mycenaean-like, though. Same with low-lying S.Italy(as opp. to Osco-Umbrian hills).

As for ME admixture---
Post-Neolithic flow is important in and mainly restricted to SE Europe. In the Balkans, it's the Anatolia_BA-type, mainly.
In mainland S. Italy, there's said admixture as well as some recent Roman-age Syrian input. I've suspected this even before getting into genetics due to frequently overlapping appearances of Balkaners with Italians and of these two with West Asians.

Angela is a girl from Eupedia who gets triggered when you suggest these post-Neolithic movements into Italy. Afaik, she threatened to kick people out over this. Heck, I myself have gotten into many heated exchanges with Italiacists in discussing this subject.

Samuel Andrews said...

@Rob,
"so as usual you are making strawman arguments because you lack the intellectual capacity to engage in upfront discussion, and also because you're a whiny pussy."

I never claimed you said Mycenaeans lacked Steppe admixture. No strawman argument.

Explain this....
Mycenaeans had Steppe admixture. Mycenaeans spoke an Indo European language.
Minoans lacked Steppe admixture. Minoans spoke a non-Indo European language.

Ancient DNA from Europe & Asia suggests Indo European languages spread alongside Steppe admixture. Can it really be a coincidence Mycenaeans spoke an Indo European language and had Steppe admixture?

Knowing Steppe peoples were moving around a lot around the time of the Mycenaeans, do you really think they just "picked up" Steppe admixture?

Rob, this is pretty simple. You dodge this logic because for emotional reasons you don't like the Kurgan hypothesis.

"Secondly, you call the major thrust from the Caucasus "BS" because you're clueless."

I never argued a thrust from the Caucasus didn't happen! Look who is making straw man arguments!

"I did not link to PIE coming from Armenia. So again, that reflects your paranoid biases."

I never said you did!

Samuel Andrews said...

@Everyone,

This is for anybody who still thinks Rob is a decent human being.

"You're a silly child from Bumfuck Idaho suffering Dunning-Kruger"

Anthro Survey said...

@MomOfZoha

The Turkmen residing in T-stan likely did not have a lot of NATIVE Iranic ancestry on the onset of migration, as opposed to modern Turkmen living there who would have absorbed, over the centuries, the types of folks living (large) in Merv, Tus and Nishapur.

As for the pre-Turkic composition of that whole region, I'd say Khwarezmians packed the most steppe_MLBA ancestry. Heck, I bet they'd model mostly as steppe_MLBA+Srubnaya_outlier in our setups with ~10-15% Iran_ChL/N, but this is speculative. To my knowledge, the Aral sea region didn't feature mud-brick urban complexes like Bactria-Margiana did.

Samuel Andrews said...

@Rob,

Anybody can skew nMonte models to fit their favored narrative if they choose the right reference populations because all ancient West Eurasians have intertwined ancestry. Using Srubnaya and taking out ancient Armenians with EHG admixture, the Mycenaeans' Steppe admixture goes up to 20%.

Anyways, the amount of Steppe admixture in Mycenaeans doesn't matter. All that matters is they had it.

To argue their Steppe admixture has no connection to their Indo European speech, considering the proven link (Y DNA & mtDNA) between so many modern Indo European speakers with ancient Steppe folk, is plain ridiculous.

For several years now, you have tried and tried to poke holes in the Steppe narrative. I remember you suggested western Europe has pseudo Steppe admixture from Neolithic Balkan farmers rich in HG admixture and R1b. That's laughable now.

I think it is pretty obvious you are biased at some level.
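
The point about reference populations can be shown with a toy example. This is a brute-force grid search over whole-percent mixtures in a made-up 2D space, not nMonte itself; the population names and coordinates are hypothetical and chosen so that the "ArmeniaLike" source sits partway toward the "Steppe" source. The only claim is that dropping a partly overlapping reference pushes more of the fit onto the remaining one.

```python
import math


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def best_mixture(target, sources):
    """Return (distance, {name: percent}) for the closest whole-percent mixture."""
    names = list(sources)
    best = None
    for weights in compositions(100, len(names)):
        mix = [
            sum(w / 100 * sources[name][i] for w, name in zip(weights, names))
            for i in range(len(target))
        ]
        dist = math.dist(mix, target)
        if best is None or dist < best[0]:
            best = (dist, dict(zip(names, weights)))
    return best


# Hypothetical coordinates: the target is 60% Farmer, 10% Steppe, 30% ArmeniaLike.
sources = {"Farmer": [0.0, 0.0], "Steppe": [1.0, 0.0], "ArmeniaLike": [0.5, 0.5]}
target = [0.25, 0.15]

full = best_mixture(target, sources)
reduced = best_mixture(target, {k: v for k, v in sources.items() if k != "ArmeniaLike"})
print(full)
print(reduced)
```

In this toy setup the Steppe share rises once ArmeniaLike is removed, and the fit distance gets worse, so the distance is worth reporting alongside any percentage.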

Onur Dincer said...

@MomOfZoha

"I understand that modern Turkmen from Turkmenistan have a high Iranic component, as likely did Turkmen from Turkmenistan prior to the time of the Oguz migrations into Anatolia --actually, the Iranic component was likely *higher* in Turkmenistan prior to that time. My comment on "Oguz" was more directed towards the name of the project that Eren linked, which I found to be a strange naming unless its intent is to exclude people who do not speak some Oguz Turkic language?"

"Turkmen from Turkmenistan prior to the time of the Oguz migrations into Anatolia" would be a misnomer since the Oghuz/Turcoman migrations to what is now Turkmenistan only began during the first half of the 11th century, so immediately before the beginning of the Oghuz/Turcoman migrations to Anatolia and environs during the middle of the same century. What was happening was a fast series of long-distance migrations that went from the Oghuz/Turcoman lands in the Aral steppe region of what is now Kazakhstan to a very wide geography stretching all the way from what is now Uzbekistan in the east to Anatolia in the west in the blink of an eye figuratively speaking.

"I am not attached to any theory on Turkic origins, as the more I read the more complicated it gets. That is why I refer to the translation of the Arab travelers from 900s-1100s. The first time that the "Oguz" appear in writing is exactly Ibn Fadlan's travel diary in which he refers to them as (transliterated) "Ghuzz" -- and they are not yet Muslim. When al-Andalusi encounters the Oguz two centuries later, they have already become the powerful Seljuks..."

I read Ibn Fadlan's account of the Oghuz. Though, to my knowledge, the first known person to mention the Oghuz people is the Arab geographer Yaqubi (9th century).

"Anyway, Onur, thank you for your input, and I know you have looked at a lot of Turkish DNA, including my own. :) However: *Especially* in situations where there is notable descent from the "Oguz" Turkmen of ~1000 years ago, who settled and admixed with autochthonous Anatolian populations (not to mention some effects of later Ottoman migrations across the empire), then we are talking about centuries of recombination involving genetically distant groups. Why should anyone expect this (i.e. discerning the precise ancestral paths) to be an invertible problem -- in lieu of massive grave digging?"

You are welcome, MoZ. As you well said, generations of recombination make it very hard to discern the genetic segments coming from the native populations and those coming from the Oghuz/Turcoman incomers, at least when it comes to the West Eurasian genetic segments. That is why ancient DNA data from the Oghuz/Turcoman incomers and their immediate descendants are of utmost importance for estimating the degree of Oghuz/Turcoman genetic input among modern populations such as Turks, Azeris and Turkmens, especially given the lack of sufficiently representative modern populations for the Oghuz/Turcoman incomers.

"And, with regard to my own ancestry, although the Siberian and Turkic parts are mentioned above and probably obvious to anyone who knows me, I have also expressed happiness in discovering my significant shared ancestry with the following peoples -- from at least the first three of whom I also surely descend: Iraqi Kurds, Ubykh, Armenians, Lebanese Christians, Ashkenazi Jews, Zaza, Hungarians, Bulgarians, Greeks..."

Since you are from Konya, we can expect that your native ancestry largely comes from Anatolian Greeks as Konya/Ikonion was a largely Greek-speaking area since at least the Byzantine times. Anatolian Greeks are genetically much closer to Anatolian Turks and Armenians than to Balkan Greeks, and unfortunately they are not represented well in academic studies and GEDmatch calculators, so their genetics is not widely known among the general public that have some knowledge of human population genetics. But of course experts of the subject like us have sufficient knowledge about their genetics.

Samuel Andrews said...

@Everyone,

This is for anybody who still thinks Rob is a decent human being. Btw, Rob this is why I hate you. People like you lack basic human sympathy.

"You're a silly child from Bumfuck Idaho suffering Dunning-Kruger"

Samuel Andrews said...

@Rob,

Sociopath might be a good way to describe you. I value a retard a whole lot more than a sociopath. But of course, you probably lack the capability to understand why.

It's strange. The worst human beings I have ever interacted with have been on DNA forums and blogs. They are either the crazy racist(s) or snobby educated posters.

Anthro Survey said...

@Sam

I don't think Rob disagrees with steppe-DNA people ultimately being responsible for IE's spread. It just seems he favors a steppe--->caucasus---->anatolia--->Greece model for Greek/Mycenaean specifically (as opposed to steppe--->balkans--->greece). Refer to the diagram I made in my comment addressed to him above. In other words, steppe DNA (as well as R-Z103) arrived in Greece (supposedly packing EEF+some Anatolia_BA at the time) in tandem with some additional Anatolia_BA/Armenia-like admixture.

Samuel Andrews said...

@Rob,

Or maybe you are realizing you wasted your life studying archaeology! Nobody gives a shit about the Balkan Chalcolithic trade routes you scream about all the time.

Samuel Andrews said...

@Anthro Survey,

I am not closed to that idea. How can it explain Mycenaeans having only a fraction of the CHG that Minoans do? Or that Minoans have CHG but no Steppe?

Onur Dincer said...

@Anthro Survey

"The Turkmen residing in T-stan likely did not have a lot of NATIVE Iranic ancestry on the onset of migration, as opposed to modern Turkmen living there who would have absorbed, over the centuries, the types of folks living (large) in Merv, Tus and Nishapur."

See my reply to MomOfZoha on the source location of the Oghuz/Turcoman migrations of the Seljuk times:

""Turkmen from Turkmenistan prior to the time of the Oguz migrations into Anatolia" would be a misnomer since the Oghuz/Turcoman migrations to what is now Turkmenistan only began during the first half of the 11th century, so immediately before the beginning of the Oghuz/Turcoman migrations to Anatolia and environs during the middle of the same century. What was happening was a fast series of long-distance migrations that went from the Oghuz/Turcoman lands in the Aral steppe region of what is now Kazakhstan to a very wide geography stretching all the way from what is now Uzbekistan in the east to Anatolia in the west in the blink of an eye figuratively speaking."

In other words, the location of the Oghuz Yabgu State:

https://en.wikipedia.org/wiki/Oghuz_Yabgu_State

"The Oguz Yabgu State (Oguz il, meaning Oguz Land, Oguz Country, 750–1055) was a Turkic state, founded by Oguz Turks in 766, located geographically in an area between the coasts of the Caspian and Aral Seas. Oguz tribes occupied a vast territory in Kazakhstan along the Irgiz, Yaik, Emba, and Uil rivers, the Aral Sea area, the Syr Darya valley, the foothills of the Karatau Mountains in Tien-Shan, and the Chui River valley (see map). The Oguz political association developed in the 9th and 10th centuries in the basin of the middle and lower course of the Syr Darya and adjoining the modern western Kazakhstan steppes."

I agree with you on your other points, Anthro Survey.

MomOfZoha said...

@Onur

"Since you are from Konya, we can expect that your native ancestry largely comes from Anatolian Greeks as Konya/Ikonion was a largely Greek-speaking area since at least the Byzantine times. Anatolian Greeks are genetically much closer to Anatolian Turks and Armenians than to Balkan Greeks, and unfortunately they are not represented well in academic studies and GEDmatch calculators, so their genetics is not widely known among the general public that have some knowledge of human population genetics. But of course experts of the subject like us have sufficient knowledge about their genetics."

Onur, everyone knows that Konya was Ikonion and that the whole region was Greek speaking due to the Byzantines at any rate. While the Greeks were successful imperializers of Anatolia, they were not the only ones. My father's region in particular is also Cilicia+Isauria, and he is from an isolated village region there. So, please, despite your great expertise, allow some room for further knowledge. A variety of peoples have passed through Anatolia, and every single village has its own history.

As for the Oguz, the question remains as to when and where did Oguz become Oguz. The main Oguz migration into Anatolia was of the Seljuk variety, and the Seljuks were clearly Persianate in culture by that time -- who knows of their genetics by the point of entry...

At any rate, you continue to claim that moderns are good enough proxies for recent ancients (past millennium). And, I respectfully disagree, despite your great expertise...

Anthro Survey said...

@Sam

Well, two variables have to be in the right ranges for that idea to work---
-Degree of dilution en route from Southern Russia through Caucasus-Anatolia.
-The degree to which pre-Mycenaean waves from Anatolia affected Greece

IIRC, Mycenaeans still have a lot extra CHG/"West_Asian" shift once we remove the steppe ancestry, btw. It's less than that of Minoans, sure, but that's not an issue.

Anthro Survey said...

@Onur

Yes, I was a bit imprecise but nonetheless referring to these first Turkmen tribes to have set foot on Turkmenistan's territory:
"...only began during the first half of the 11th century, so immediately before the beginning of the Oghuz/Turcoman migrations to Anatolia"

The point here is that Seljuk migrants to Anatolia likely would not have Turkmen as their best proxies, but rather Kazakhs, as you put it, from what the records suggest. Plus, I remember you posted a link to a Seljuk girl's skull once. Of course, we can't judge ancestral composition solely from this (let alone a single such sample), but hard to see her being influenced by SC Iranian admixture, tbh.

MomOfZoha said...

@AnthroSurvey:
"The point here is that Seljuk migrants to Anatolia likely would not have Turkmen as their best proxies, but rather Kazakhs, as you put it, from what the records suggest. Plus, I remember you posted a link to a Seljuk girl's skull once. Of course, we can't judge ancestral composition solely from this (let alone a single such sample), but hard to see her being influenced by SC Iranian admixture, tbh."

My dad's sister can pass for a Kazakh too, which does not prove shit, excuse my French.

Balaji said...

Epicycles upon epicycles were invented in an attempt to preserve the Ptolemaic geocentric theory. Similarly, in an attempt to have Andronovo be the “proto-Indo-Iranians”, we have suggestions that all the Andronovo burials to date have been elite ones and that the elite Andronovo were genetically different from the more numerous commoners. Another suggestion is that the Andronovo, on their way to doing the Aryan invasion, somehow met up with people like Srubnaya_Outlier. Yet another is of a mystery Central Asian group high in ANE that merged with the Andronovo to become the “Indo-Iranians”. Such hypotheses will likely be falsified when the Rakhigarhi results are published and show as much or more “steppe-related” ancestry in the ancient inhabitants there as in the modern inhabitants. The Euro-centric AIT will go the way of the geocentric Ptolemaic theory.

Samuel Andrews said...

@ROb,

"People, - and Im not talking about steppe tards like Sam and Shah"

How many fucking times do I have to tell you I am not a 'steppe tard'? I just go with the evidence, Rob. Did you read my responses to Shah? I banned him from my blog.

"Sam can cry all he wants about the 10% steppe admixture , but it doesn;t matter to me. If 4500 BC Bulgaria had steppe admixture, of course Greece would by 1700 BC. Crete is an island not connected to Europe, and it had it's own history.
"

Most people in Chalcolithic Bulgaria had 0% Steppe admixture!! Things don't just happen. Myceneans didn't get Steppe admixture from random admixture. That's a cop-out argument.

Funnel Beaker had no SHG admixture (as far as we know). Globular Amphora had 0% Steppe admixture.

Significant admixture rarely just happens. It usually happens because of some kind of a migration.

"The reality is that the events which catalysed the Copper Age processes began elsewhere, and continued onto the steppe in earnest, propagating all the way to the Urals: the culmination of EEF and CHG expansion all the way to Urals and Altai. "

I'm fully aware of all that and will give it attention when it's relevant. Did you read my blog post which claimed most European mtDNA is from Neolithic Anatolia? Don't you dare tell me I'm a racist who hates EEF & CHG.

In the 3rd millennium BC the significant widespread migrations all came out of the Steppe.

Steppe>Northeast Europe (Corded Ware)
Steppe>Northwest Europe (Bell Beaker)
Steppe>Asia (Sintashta, Andronovo)

We also see that by at least 1500 BC Steppe admixture existed in Iberia, Croatia, and Greece. Because of the dominance of R1b we know the Steppe admixture into Iberia wasn't just from random admixture.

Myceneans lived right after this era of mass Steppe migrations. Yet, you think they "picked up" Steppe admixture instead of receiving it from migration.

Kanishka said...

@Anthro Survey I certainly do not agree with you there on the amount of Steppe ancestry harbored by the Khwarezmians. The reason for this is that the Khwarezmians were also descendants of the BMAC culture and Andronovo synthesis.

See here:

https://en.wikipedia.org/wiki/Khwarezm#Early_people
https://en.wikipedia.org/wiki/Kelteminar_culture

From Wiki:

"Like Soghdiana, Khwarezm was an expansion of the BMAC culture during the Bronze Age which later fused with Indo-Iranians during their migrations around 1000 BC. Early Iron Age states arose from this cultural exchange. List of successive cultures in Khwarezm region 3000–500 BC:[12]"

I think they had a lot of Steppe_MLBA ancestry, perhaps as much as 70%, but nonetheless they were another Iranic extension of the BMAC cultures, though more Steppe-derived than any other post-BMAC Iranic culture.

So, in all likelihood, they probably did have European features.

Samuel Andrews said...

@Rob,
"So the point is - and its very simple and clear- the EHG/ original steppe component is minor in the entire process, and its role pushed by all you blokes is rather dubious & forced."

I could care less if in the 5th millennium BC Russia was home to primitive EHGs who received a migration from CHGs.

It doesn't change the fact that in the 3rd millennium BC people in Russia, the Steppe, migrated into lots of different places where they made huge genetic impacts. Greece was one of the places they migrated into.

Anthro Survey said...

@Rob

Fully aware that Chalcolithic Caucasus exerted both a profound cultural and demographic impact upon the Pontic steppe without which the Sredny Stog & Yamna horizons and the "steppe component" would have been unthinkable. Nor am I averse to the idea that IE may have ultimately arisen in the Caucasus. Perhaps Proto-Anatolian was a result of this early split and may have only had CHG initially associated with its spread, but Greek/Armenian split off later.

I wouldn't say I ignore, in any way, the presence of EEF(Old European ancestry) in BB, Unetice, etc, nor the Iran_Chl, ASI, etc. in Pashtuns, Pamiris, and Indians. In fact, I strongly emphasize the cardinal importance of all of these substrates as well as the formative processes occurring AFTER the hybridization took place. This is why I don't refer to steppe as "Europeans". Modern Europe wasn't born yet. Neither was India.

So, no, Greek culture does not "come from the steppe[& let's just leave it at that]" as a now banned guy said/implied once. Nor does Indian culture. It's laughable to suggest otherwise.

Although the importance of the steppe shouldn't be overstated, as you said, the steppe component is still a useful marker to track the spread of IE and other processes accompanying it.

Rob said...

@ Sam

Yes they picked it up off the ground
Oh wait- people need to move and have sex ?

Anthro Survey said...

@Kanishka

You misunderstood me somewhat.

To emphasize again, they should NOT be able to model as 70-80% Steppe_MLBA, but as 70-80% steppe_MLBA+Srubna_outlier. Perhaps 35-40% (intrusive) steppe_MLBA, which still seems higher, mind you, than most extant Iranic populations.
The "srubnaya outlier" signal is suspected to be an artifact resulting from a presence of ANE-rich ancestry across Central and South Asia.

Yes, it's true that the Aral sea region experienced some southerly influence, but pre-Iranic Suyarganovo substratum there would have featured a lot more of some ANE-rich(Kelteminar-related?) ancestry than in BMAC proper. Good western analogy: Iberia_Chl and Blatterhole_MN vs Hungarian LBK.

Kanishka said...

@Anthro Survey Great point. So, do you think that Al-Khwarezmi and Al-Biruni would genetically cluster with Europeans or S/C Asians?

I think we cannot jump to any serious conclusions unless we see evidence from the sites, if we can still recover ancient DNA from that area. I would love to see the genetic composition of the Khwarezmians, Parthians, Sogdians, etc. Personally, I feel that anything in the range of the BMAC horizon had significant BMAC related admixture, possibly between 30 and 50%. I think the Yaghnobi peoples are an excellent indicator of what the Sogdian genetic makeup would have been.

One thing I must also add is that, as you have put it, we must not be too hasty to assign "Steppe origins" to civilizations which had not yet even formed and set in stone. For instance, the impact of BMAC and similar cultures on the Indo-Iranians is significant, and while their initial beginnings may have been on the Steppe, they nonetheless benefited substantially from their interactions with the heavily urbanized peoples of the BMAC. I would hope that we get some BMAC samples really soon in order to assess the degree to which the Andronovo culture had hybridized with the natives.

Do you expect to see Steppe ancestry in any upcoming sequenced genomes from BMAC?

Onur Dincer said...

@Anthro Survey

Yes, I was a bit imprecise but nonetheless referring to these first Turkmen tribes to have set foot on Turkmenistan's territory:
"...only began during the first half of the 11th century, so immediately before the beginning of the Oghuz/Turcoman migrations to Anatolia"

The point here is that Seljuk migrants to Anatolia likely would not have Turkmen as their best proxies, but rather Kazakhs, as you put it, from what the records suggest. Plus, I remember you posted a link to a Seljuk girl's skull once. Of course, we can't judge ancestral composition solely from this (let alone a single such sample), but it's hard to see her being influenced by SC Iranian admixture, tbh.


In that case I do not find anything to disagree with you, but like I said before, Uyghurs may be a better proxy than Kazakhs, at least in terms of West Eurasian / East Eurasian proportions.

Onur Dincer said...

@MomOfZoha

Onur, everyone knows that Konya was Ikonion and that the whole region was Greek speaking due to the Byzantines at any rate. While the Greeks were successful imperializers of Anatolia, they were not the only ones. My father's region in particular is also Cilicia+Isauria, and he is from an isolated village region there. So, please, despite your great expertise, allow some room for further knowledge. A variety of peoples have passed through Anatolia, and every single village has its own history.

I am going by what you have told me and others on your known ancestry. Indeed, several times you have mentioned that your father is from a village likely connected with Isaurians. If that is the case, you should also know that Isaurians had totally switched to the Greek language by the early Byzantine times, so had been Hellenized practically speaking. So when the Oghuz/Turcomans arrived in Anatolia there was no one around speaking the ancient Isaurian language for some centuries. In fact, Hellenization of much of Anatolia, especially the interior parts, happened with no mixing with Greek colonists at all. Cilicia experienced a later Armenization, so it is quite likely that you have some Armenian ancestry from there.

As for the Oguz, the question remains as to when and where did Oguz become Oguz. The main Oguz migration into Anatolia was of the Seljuk variety, and the Seljuks were clearly Persianate in culture by that time -- who knows of their genetics by the point of entry...

Seljuks were only a small elite or rather a dynasty. The vast majority of Oghuz/Turcomans were then nomads and had very little interaction with city life if at all. As for the origin of the Oghuz people, it is recorded by both Mahmud of Kashgar and Ibn al-Athir that the Oghuz migrated to the Aral steppe lands from an area around the Altai and Tengri (Tian Shan) mountains, Ibn al-Athir even specifies the time of that migration: during the reign of the Abbasid caliph Al-Mahdi (775-785). That date makes perfect sense given the ensuing chaos in the Eurasian steppes following the collapse of the Gokturk Khaganate around the middle of the 8th century.

At any rate, you continue to claim that moderns are good enough proxies for recent ancients (past millenium). And, I respectfully disagree, despite your great expertise...

Where did I claim that? In some cases they can be considered good enough proxies, such as modern Anatolian Greeks as a proxy for Byzantine Anatolian Greeks, but for Oghuz/Turcomans of 1000 years ago there is no modern proxy we can confidently point to because of all those migrations, mixing, etc.

Kanishka said...

@Onur Dincer I personally think that the Yakut would be excellent proxies for the early Turkic populations which moved into the territories formerly inhabited by Eastern Iranics, and the Kyrgyz for the early Turkic migrants to Anatolia. Keep in mind that at the time of the great Turkic migrations, it is likely that the Turks moving westwards were not significantly admixed with the Central Asian Iranic speakers. It was not until the Mongol invasions that the Turkic groups in Central Asia began assimilating the native Iranic speakers, likely due to a desire to replenish their numbers after the Mongol invasions devastated both them and their neighbours. Obviously, then you had the migration of the Uzbeks from the former Chagatai domains, which further led to assimilation of the native Iranics. Unfortunately, the Mongols exterminated many of these great peoples with a rich history, culture, and traditions, and we will likely never know what might have been...

In fact, one can notice that this assimilation by the Turkic speakers was not complete, as even today you can find Turkmen who are mostly genetically Iranic (90%+), though, it is very rare.

Onur Dincer said...

@Kanishka

I personally think that the Yakut would be excellent proxies for the early Turkic populations which moved into the territories formerly inhabited by Eastern Iranics, and the Kyrgyz for the early Turkic migrants to Anatolia. Keep in mind that at the time of the great Turkic migrations, it is likely that the Turks moving westwards were not significantly admixed with the Central Asian Iranic speakers. It was not until the Mongol invasions that the Turkic groups in Central Asia began assimilating the native Iranic speakers, likely due to a desire to replenish their numbers after the Mongol invasions devastated both them and their neighbours. Obviously, then you had the migration of the Uzbeks from the former Chagatai domains, which further led to assimilation of the native Iranics. Unfortunately, the Mongols exterminated many of these great peoples with a rich history, culture, and traditions, and we will likely never know what might have been...

In fact, one can notice that this assimilation by the Turkic speakers was not complete, as even today you can find Turkmen who are mostly genetically Iranic (90%+), though, it is very rare.


Yakuts are unlikely to be a good proxy for Proto-Turks because Yakuts themselves are products of a relatively recent Turkic migration to their current territory, due to the pressures of the Mongol Empire, from a region around Lake Baikal, and have been mixing with their Tungusic neighbors since then. If you want to see a relatively good modern proxy for Proto-Turks, I think you should look at populations such as Mongols and Tuvinians; they live in and around the Proto-Turkic lands at least and have been living quite similar lives to them for most of their history.

Kanishka said...

@Onur Dincer Good points, well stated. Why do you think this is though, and where does most of the recent admixture in Yakuts come from?

Anthro Survey said...

@Kanishka

"So, do you think that Al-Khwarezmi and Al-Biruni would genetically cluster with Europeans or S/C Asians? "

Intermediate between extant SC Asian groups and Sintashta/Andronovo.

& I expect little to no true steppe admixture in BMAC. Maybe some pseudo-steppe effects of some kind due to presence of UHG and/or some ANE-like ancestry?

Regarding what you said about having European features----yeah, if my predictions hold, they'd be expected to overlap more with present-day N and E Euros than any other S/C Asian group(s).
A couple of years back, they were going to cast DiCaprio as Rumi(of Balkhi origins). They've now reconsidered and I'm glad because of it. Though far from ideal, casting him as either of the two polymaths instead would have been more logical.

Onur Dincer said...

@Kanishka

Good points, well stated. Why do you think this is though, and where does most of the recent admixture in Yakuts come from?

You are welcome. Yakuts have been mixing with Evenk and Even natives since their arrival in their current territories as can be seen from their genetics. Recently they also have been mixing with Russians.

Anthro Survey said...

@Rob

I can't find the article in this thread. Can you post the link again please? :-)

Elite dominance isn't a far-fetched idea, though. Sure, people overstate the influence these elites had, but it holds water wrt transmission... Having said this, I highly doubt there was a huge difference in the steppe content of the Mycenaean elites and commoners.

Kanishka said...

@Anthro Survey Good points you have made there, I must say. Though, what makes you so sure that there will be little to no Steppe admixture in the potentially upcoming BMAC samples? Personally, I think there should be some at least if these samples are from the time period in which BMAC overlapped with the Andronovo culture, wouldn't you say? Also, I agree with you, casting him as either Khwarezmi or Biruni would certainly have been more logical.

Really? Yeah, I guess I have to agree with you there. You have made some excellent points here.

@Onur Dincer Thank you, and interesting to know. It is unfortunate that we do not have any ancient Turkic samples available to us at this moment in time.

Onur Dincer said...

@Kanishka

Thank you, and interesting to know. It is unfortunate that we do not have any ancient Turkic samples available to us at this moment in time.

Actually we have a Gokturk-era Turkic ancient autosomal result from the Altai region, it seems most similar to the Kyrgyz results among modern results.

Seinundzeit said...

Kanishka,

"So, do you think that Al-Khwarezmi and Al-Biruni would genetically cluster with Europeans or S/C Asians?"

Is that a serious question?

Obviously, they would cluster with South Central Asians, and probably very close to Yaghnobi people.

Just imagine contemporary Tajiks (not Pamiris, but Farsi/Dari speaking people of Tajikistan), but without the recent layer of Turko-Mongolic admixture.

You have to look at the temporal scale which you're examining.

I mean, Al-Khwarezmi and Al-Biruni are fairly recent individuals; some magical Iran_N/Iran_ChL/Armenia_EBA genetic reemergence didn't happen between their existence and the present day.

Kanishka said...

@Onur Dincer Thanks, I did not know about this sample. Would you mind linking the study in question?

@Seinundzeit Pretty sure your assessment is on point, and I was just curious because of what Anthro said about them having around 85% Steppe-related admixture. On a side note, how to get that excellent ASI proxy of yours on David's K25? It looked very nice.

Anthro Survey said...

@Sein

Regarding those actual Farsi-speaking Tajiks(if you've looked into it)---
I was always curious about this. Their Turkic admixture layer doesn't consist exclusively of Buryat-like ancestry, but has a West Eurasian component, too. Think Scythian_Pazyryk, Russia_IA from the Altai or Kyrgyz. Probably DECENT proxies for early Turkics and Turkic admixture.

So, after subtracting this source of steppe ancestry, do they still have comparable steppe ancestry to Yaghnobis and Pamiris, on average? Or less?

@Kanishka

I think there's pretty good material evidence to suggest that, up until their demise, BMAC communities did not mingle much with people from those "kurgan"-derived/influenced complexes---be they Tazabagyab or Vakhsh.

Rob said...

@ Anthro

Sure, here

"Emergence of the Ideology of the Warrior in the Western Mediterranean during the second Half of the fourth Millennium BC, Eurasia Antiqua. Zeitschrift für Archäologie Eurasiens 14 (2014), p. 171-184." C. Jeunesse

Also, as per my original post (which Sam obviously misunderstood)

- there is relatively low steppe (15%) admixture in Mako- which is contemporary to late Yamnaya - Early B.B.; and right next to Hungarian Yamnaya

- there is only very modest steppe admixture in Greece, which probably arrived via Thrace and Dalmatia

Also, they are actually smaller than the inverse impact of EEF in MBA steppe.
Elite dominance you might argue, but I see no evidence even for that.
Sure, there was a steppe-rich Z2013 guy in Hungary BB, but buried as a commoner.

Anthro Survey said...

"Anthro said about them having around 85% Steppe-related admixture"

Never said this. Again, read carefully what I wrote. 40% steppe_MLBA. It's just that the substratum would not be so Iran_Chl/N heavy but would consist more of something ANE-rich, resulting in that artifact. Good reason to think this considering latitude and the preceding Kelteminar nearby.

Seinundzeit said...

Anthro Survey,

Honestly, I can't recall seeing Farsi-speaking Tajikistanis in David's various PCAs.

But based on the picture seen with ADMIXTURE, minus the Turkic infusion, they would probably have a similar ratio of steppe-related to Iranian plateau-related ancestry as Yaghnobi people.

Rob said...

@ Anthro

I think most people see this
https://www.google.com.au/imgres?imgurl=https://i.ytimg.com/vi/nvQXXtgcEF0/maxresdefault.jpg&imgrefurl=https://www.youtube.com/watch?v%3DnvQXXtgcEF0&h=1080&w=1920&tbnid=2i261YB_EVDw1M:&tbnh=160&tbnw=285&usg=__NqTePG7C2sT4MeaobdenbsDzUhs%3D&vet=10ahUKEwir8Ifwv53ZAhWFrJQKHV4aBfkQ9QEIMDAA..i&docid=cUdWMgV7AWkiZM&sa=X&ved=0ahUKEwir8Ifwv53ZAhWFrJQKHV4aBfkQ9QEIMDAA


But i see this (not including south asia)
https://i.imgur.com/9A2XrFV.png

Anthro Survey said...

@Rob

Thanks for the article.

"there is only very modest steppe admixture in Greece, which probably arrived via Thrace and Dalmatia"

In other words, you don't think steppe ancestry in the Mycenaeans corresponds to the transmission of Mycenean Greek speech, but to speakers of other languages who later ended up assimilating into "Greek" society?
If so, plz diagram how early Greek arrived there/by whom. Picture=1000 words.

Mako---the Hungary_BA sample close to modern Iberians on a 2D PCA? I remember it was of questionable coverage, but I'll double check. At any rate, I don't know why Vucedol and Hungary_BA don't seem to be as steppe-heavy as German beakers and more French-like as Simon_W said. May have to do with migration specifics.

Rob said...

@ Anthro

Picture done

Of course, if Mycenean shaft graves come back full of R1a or R1b, then it's done and dusted, and the classic Kurgan is proven. There'll be no arguments there.

Till then, I'm still up in the air. And note, I am merely analysing movements and cultural shifts. I am always careful not to make any sweeping linguistic assertions, although sometimes people imagine I do.

Anthro Survey said...

@Rob

blue 5000BC----Balkan EEF expansion? No/minuscule CHG associated with it, right?

3300BC----proto-Greek speakers(or some upstream language)? Do you see them as carrying EHG-related ancestry or just CHG?

I do strongly agree with your treatment of Russian Yamnayans(purple). Likewise don't see them as ancestral to modern Europeans.

Eren said...

@Matt:
Regarding the creation of simulated/ghost pops, I haven’t dealt with that myself yet. But maybe we can try to automate the process with a script, if you think that’s possible (workflow etc).

@MomOfZoha:
VIPs should always come first. Hopefully my reply didn’t come off as impolite, as I only focused on the part of your post about Siberian admixture. I was in a hurry and used your remark as a segue to post my analyses that I wanted to share.
Sorry, about your grandparents. I was lucky enough to test mine when they were all still alive. My maternal grandpa has passed since then. Being able to model my ancestry at the level of my grandparents is definitely great. Don’t give up trying to convince your great uncles.
Now with the labels that Arza added to your graph (great job btw), a lot of it makes sense. A very different kind of visualization than I’m used to. Very nice.
About the Turkic ancestry in northern Anatolia, and Anatolia in general I would have to agree with Onur here.
I might look up that book recommendation of yours later. AFAIK the movie “The 13th Warrior” is based on Ibn Fadlan’s encounters with the Vikings.

@Onur
Regarding the Kayseri samples, the observation was specifically about the ones in the current Global 25 data set. Below are the top 15 closest populations to that sample. Based on that it definitely looks like a mislabeling. The Turkish samples in the ancient 67 PCA (also from Kayseri) on the other hand look normal (cluster with me).

@David, do you have any idea of what happened with the Kayseri samples?

Ashkenazi_Jew 0.01953279
Sephardic_Jew 0.02091176
Maltese 0.02405819
Moroccan_Jew 0.02722527
Sicilian_East 0.0338304
Libyan_Jew 0.03457357
Cypriot 0.03508615
Italian_South 0.03512949
Tunisian_Jew 0.0355724
Sicilian_West 0.03829895
Druze 0.04758212
Lebanese_Druze 0.04873213
Lebanese_Muslim 0.04884838
Lebanese_Christian 0.04969837
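
A ranking like the one above can be reproduced by sorting population averages by their Euclidean distance to the sample in Global 25 space. Below is a minimal Python sketch, assuming the coordinates are already loaded into plain lists. The helper name `closest_populations`, the labels and the three-dimensional toy values are all hypothetical and not taken from the datasheets (real rows have 25 columns).

```python
import math

def closest_populations(target, references, top=15):
    """Return (label, distance) pairs sorted by Euclidean distance to target."""
    ranked = []
    for label, coords in references.items():
        if len(coords) != len(target):
            raise ValueError(f"{label}: dimension mismatch")
        ranked.append((label, math.dist(target, coords)))
    ranked.sort(key=lambda pair: pair[1])  # smallest distance first
    return ranked[:top]

# Toy 3-dimensional coordinates (hypothetical values).
references = {
    "Pop_A": [0.10, 0.02, -0.01],
    "Pop_B": [0.12, 0.03, -0.02],
    "Pop_C": [-0.30, 0.10, 0.05],
}
sample = [0.11, 0.02, -0.01]

for label, dist in closest_populations(sample, references):
    print(label, round(dist, 8))
```

If a sample's nearest neighbours all come from an unexpected region, as with the Kayseri sample here, that is a hint of mislabeling, not proof of it.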

Anthro Survey said...

@Sein

It's possible, of course, but think it's probably a bit lower.

Lowlands---denser Iran_Chl settlement, easier for steppe_MLBA to have been diluted there from the get-go. Plus, there were minor movements of people from further west down the line into the region, be it the Sassanian era or due to the cosmopolitan nature of Farsi cities like Tus, etc. in medieval times. I've heard Tajiks from Mazar claim ancestry from Samarkand, for ex, and vice versa.

Seinundzeit said...

Anthro Survey,

Sure; like a 5% or so difference, so fairly close.

For what it's worth, the Yaghnobi have noticeably less steppe-related admixture in comparison to the Pamiri peoples.

Rob said...

@ Anthro

Light Blue: Yes, classic EEF
Black: Hypothetical source of CHG
Red: Steppe EBA (EHG/CHG), to early CWC and Afanasievo
Purple: Sintashta: MNE containing CWC/ west Yamnaya moving East.
Brown: Migration from Plateau to Anatolia & Aegean, and significant in Myceneans but not Minoans.

If we observe, we can argue 3 hypotheses.
1) There is an expansion of ANF/ EEF from Anatolia (8000 BC) to Greece (7000 BC), Balkans (south 6000 BC-> south 4500 BC), steppe (Eneolithic 4500 BC -> west Yamnaya 3200-2600 BC), then Andronovo / Sintashta (2200 BC) via Poltavka Outlier, late CWC, etc and if one argues, South Asia but was 'washed away' by the time it arrived.
If this occurred, then east Yamnaya / Afanasievo could be non-IE. Tocharian becomes problematic.

2) There is an expansion from Caucasus, and perhaps beyond, with a southern route via Anatolia and northern route, then spreading from the steppe as late PIE.

3) Impacts from 1 & 2 were mere influences, and PIE was essentially a (relatively) sudden expansion of native steppe groups, which would be demonstrated by contextual steppe admixture in Myceneans, post-Harappa, Anatolia, etc.

I do not promote any one in particular, but only recognise that 3 possibilities exist for the linguistic question. But it's important to also recognise the cultural and ideological changes associated with the entire pre-Yamnaya period. That is the key.


BTW: The I1502 is very good coverage and we no longer have to describe it by modern analogy.

Chad Rohlfsen said...

Sein,
Here is a run with my own ASI. I'm not sure about the result, but it is interesting with all the many choices available. Excuse the methodology, but just testing it out.

Paniya
"ASI" 54.5
"Iran_N" 25.3
"GoyetQ116-1" 13.55
"MA1" 5.7
"Papuan" 0.95
"Loschbour" 0
"Yoruba" 0
"ElMiron" 0
"Kostenki14" 0
"Narva_Estonia" 0
"Narva_Lithuania" 0
"Vestonice16" 0
"Villabruna" 0
"Natufian" 0
"Yamnaya_Kalmykia" 0
"Yamnaya_Samara" 0
"Srubnaya" 0
"Scythian_AldyBel" 0
"Scythian_Pazyryk" 0
"Scythian_Samara" 0
"Scythian_ZevakinoChilikta" 0
"Poltavka" 0
"Murut" 0
"Mongolian" 0
"Lebbo" 0
"Levant_BA" 0
"Levant_N" 0
"Kinh_Vietnam" 0
"Karasuk" 0
"Karasuk_outlier" 0
"Iran_ChL" 0
"Iran_IA" 0
"Iran_LN" 0
"Igorot" 0
"Dusun" 0
"Dinka" 0
"Dai" 0
"Andronovo" 0
"Andronovo_outlier" 0
"Armenia_ChL" 0
"Armenia_EBA" 0
"Armenia_MLBA" 0
"Altai_IA" 0
"Aeta" 0
"Afanasievo" 0
"Agta" 0


Funny enough, look at the closest single pops to the Paniya

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
ASI GoyetQ116-1 Kostenki14
0.2720454 0.2970452 0.3012848
Aeta Agta Scythian_ZevakinoChilikta
0.3125922 0.3142808 0.3167663
Altai_IA Scythian_Pazyryk
0.3201339 0.3310801

Onge are closer still at around .17, but I think that needs to be resolved. My ASI looks similar to an Onge-shifted Negrito, minus some Denisova. Here is a run without the UP or Mesolithic samples, except for MA1.

[1] "distance%=20.3246 / distance=0.203246"


Paniya
"ASI" 57.8
"Iran_N" 28.2
"MA1" 12.6
"Papuan" 1.4
"Yamnaya_Kalmykia" 0
"Yamnaya_Samara" 0
"Srubnaya" 0
"Scythian_AldyBel" 0
"Scythian_Pazyryk" 0
"Scythian_Samara" 0
"Scythian_ZevakinoChilikta" 0
"Poltavka" 0
"Karasuk" 0
"Karasuk_outlier" 0
"Iran_ChL" 0
"Iran_IA" 0
"Iran_LN" 0
"Andronovo" 0
"Andronovo_outlier" 0
"Armenia_ChL" 0
"Armenia_EBA" 0
"Armenia_MLBA" 0
"Altai_IA" 0
"Aeta" 0
"Afanasievo" 0
"Agta" 0

Chad Rohlfsen said...

That's just kind of playing around a little bit, nothing serious. If you want to use this in any way. Here are the scaled numbers I used.

ASI,-0.01,-0.33,-0.09,0.04,0.07,0.001,-0.012,0.005,0.03,0.015,0.027,0.001,-0.001,0.004,-0.01,-0.01,0.01,0.001,-0.003,0.03,0.000001,0.0001,-0.02,-0.0005,0.0015
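
For anyone who wants to paste a row like this into their own scripts, here is a small Python sketch that splits it into a label plus coordinates and checks that all 25 dimensions are present. The function name `parse_row` is hypothetical, and the comma-separated layout is assumed from the row as posted.

```python
def parse_row(line, dims=25):
    """Split a 'label,v1,...,vN' row into (label, [floats]) and check its width."""
    label, *fields = line.strip().split(",")
    coords = [float(x) for x in fields]
    if len(coords) != dims:
        raise ValueError(f"{label}: expected {dims} values, got {len(coords)}")
    return label, coords

# The simulated ASI row exactly as posted above.
row = ("ASI,-0.01,-0.33,-0.09,0.04,0.07,0.001,-0.012,0.005,0.03,0.015,0.027,"
       "0.001,-0.001,0.004,-0.01,-0.01,0.01,0.001,-0.003,0.03,0.000001,0.0001,"
       "-0.02,-0.0005,0.0015")

label, coords = parse_row(row)
print(label, len(coords))
```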

Seinundzeit said...

Chad,

Very interesting!

I'll definitely give this simulation a spin.

Onur Dincer said...

@Kanishka

Here is the paper in question: https://www.nature.com/articles/nature14507

The sample is RISE504.

Onur Dincer said...

@Eren

Thanks for the info.

Simon_W said...

@ Anthro
Yeah, the Mako sample Rob mentioned is I1502 aka BR1. It's not close to Iberians in 2D PCA, but excessively WHG shifted, like one of the Vatya samples.

Simon_W said...

Some Global 25 models of my close ancestors (run with nMonte 1.0 and without scaling):

maternal grandmother (ancestry from Esslingen am Neckar (D), Biberach an der Riss (D) and Liestal (CH)):

"French_South" 24.3
"Italian_Tuscan" 23.25
"Nordic_IA" 21.6
"Hungary_IA" 20.4
"Halberstadt_LBA" 8.45
"French_East" 2

paternal grandfather (ancestry from Hasel (D) and Münchwilen AG (CH))
(based on extrapolated coordinates via my father and my paternal grandmother):

"Nordic_IA" 71
"Hungary_IA" 16.6
"French_South" 9.35
"Italian_Tuscan" 3.05
"Halberstadt_LBA" 0
"French_East" 0

So, the components are similar to my maternal grandmother, the difference being mostly in the proportions. These models look sensible I'd say. I'm just surprised by the substantial Hungary_IA in both.
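
The extrapolation mentioned above (the grandfather's coordinates via the father and the grandmother) presumably treats a child's coordinates as the midpoint of the two parents' coordinates. A minimal Python sketch under that assumption follows. The function name and the toy three-dimensional values are hypothetical, and real Global 25 rows have 25 columns.

```python
def extrapolate_parent(child, known_parent):
    """Estimate the missing parent as 2*child - known_parent, per dimension.

    Assumes child = (parent_1 + parent_2) / 2, which is only an approximation.
    """
    if len(child) != len(known_parent):
        raise ValueError("coordinate vectors must have the same length")
    return [2 * c - k for c, k in zip(child, known_parent)]

# Toy values (hypothetical).
father = [0.120, 0.130, 0.010]
paternal_grandmother = [0.125, 0.128, 0.016]

paternal_grandfather = extrapolate_parent(father, paternal_grandmother)
print([round(v, 3) for v in paternal_grandfather])
```

Because the child's coordinates are doubled, any noise in them is doubled in the estimate too, so inferred coordinates deserve a bit more caution than directly measured ones.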

paternal grandmother (from Pettelkau, East Prussia):

"Nordic_IA" 66.05
"Baltic_BA:Turlojiske3" 19.2
"England_Anglo-Saxon" 9.6
"Slav_Bohemia:RISE569" 5.15
"Halberstadt_LBA" 0
"Hungary_BA:I1502" 0
"Polish" 0

distance%=1.6795

Makes complete sense. About 3/4 North German, nearly 20% Prussian Balt and about 5% Slavic admixture. No Hungary_BA I1502 needed here.
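
For readers wondering how fits like these are produced: the sketch below is not nMonte itself, only a simplified Monte Carlo illustration of the general idea, which is to find non-negative percentages of the source populations whose weighted average lies as close as possible (in Euclidean distance) to the target. The function name, the parameters and the two-dimensional toy coordinates are all hypothetical.

```python
import math
import random

def fit_mixture(target, sources, slots=200, rounds=20000, seed=1):
    """Approximate target as a non-negative mixture of source coordinates.

    Keeps `slots` draws from the sources and repeatedly swaps one draw for a
    random source, keeping the swap only if the distance to target shrinks.
    """
    rng = random.Random(seed)
    labels = list(sources)
    dims = len(target)
    picks = [rng.choice(labels) for _ in range(slots)]
    # Running per-dimension sum of the coordinates of all current picks.
    total = [sum(sources[p][d] for p in picks) for d in range(dims)]

    def distance(vec_sum):
        return math.dist([v / slots for v in vec_sum], target)

    best = distance(total)
    for _ in range(rounds):
        i = rng.randrange(slots)
        new, old = rng.choice(labels), picks[i]
        if new == old:
            continue
        trial = [t - sources[old][d] + sources[new][d] for d, t in enumerate(total)]
        trial_dist = distance(trial)
        if trial_dist < best:  # accept only improving swaps
            picks[i], total, best = new, trial, trial_dist
    weights = {lab: 100.0 * picks.count(lab) / slots for lab in labels}
    return weights, best

# Toy 2-dimensional example: the target is built as 70% Src_A + 30% Src_B.
sources = {"Src_A": [0.10, 0.00], "Src_B": [0.00, 0.10], "Src_C": [-0.10, -0.10]}
target = [0.07, 0.03]

weights, dist = fit_mixture(target, sources)
print(weights, dist)
```

With 200 slots the percentages come in steps of 0.5. A low distance only says the mixture is geometrically close to the target, not that the listed sources are the true ancestors.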

What I inherited from my mother (coordinates inferred via mine and my father's coordinates):

"French_East" 61.2
"French_South" 21.9
"Cypriot" 7.15
"Minoan_Lasithi" 6.55
"Mozabite" 3.2
"Hungary_BA:I1504" 0
"Remedello_BA:RISE489" 0
"Mycenaean" 0
"Anatolia_BA" 0
"Anatolia_ChL" 0
"Levant_BA" 0
"England_Roman_outlier:3DT26" 0

distance%=1.7242

This would mean that my 1/4 Italian ancestry (from Cesena, Montiano, Meldola, Forlimpopoli) is like:

"French_East" 24.5
"French_South" 42.6
"Cypriot" 13.9
"Minoan_Lasithi" 12.74
"Mozabite" 6.2

So maybe a mix of Celts, Ligurians, "Tyrsenians" and considerable Roman Age gene flow from the East Med and North Africa. Makes sense, I think.
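
How the Italian quarter was isolated from the maternal half is not spelled out above. One plausible way, shown here purely as an assumption, is to attribute a fixed share of the maternal contribution to the Italian side, treat the remainder as entirely French_East-like, subtract that background and rescale. The function name and the 50% share are hypothetical. With exactly 50% this gives French_East around 22 and French_South around 44, which is close to the posted figures but not identical, so the actual calculation probably used a slightly different share or method.

```python
def isolate_share(mixture, share, filler):
    """Remove an assumed `filler` background from a mixture and rescale to 100%.

    mixture: percentages summing to about 100 for the whole contribution.
    share:   fraction (0..1) attributed to the ancestry of interest.
    filler:  component assumed to make up all of the remaining (1 - share).
    """
    background = 100.0 * (1.0 - share)
    result = {}
    for label, pct in mixture.items():
        if label == filler:
            pct = pct - background  # strip the assumed non-Italian part
        result[label] = pct / share
    return result

# Non-zero components of the maternal model posted above.
maternal = {
    "French_East": 61.2,
    "French_South": 21.9,
    "Cypriot": 7.15,
    "Minoan_Lasithi": 6.55,
    "Mozabite": 3.2,
}

italian = isolate_share(maternal, share=0.5, filler="French_East")
for label, pct in italian.items():
    print(label, round(pct, 2))
```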

I can model my maternal coordinates also like this:

"French_East" 59.65
"Italian_Bergamo" 16.65
"Sicilian_West" 15.7
"Sicilian_East" 8
"Italian_South" 0
"Italian_Tuscan" 0

distance%=1.7737

So my 1/4 Italian ancestry would be like:

"French_East" 21.5
"Italian_Bergamo" 32.4
Sicilian-like 46.1

The fit of the model is only slightly worse than with the ancient pops. This is quite remarkable, almost 50% Sicilian-like in spite of the purely North Italian origin. But makes sense from what I heard about Romagnols.
