Friday, May 22, 2015

A few ESHG 2015 abstracts

Apart from maybe the first abstract below, I haven't been able to find anything jaw-dropping yet. If anyone wants to help out, the abstract search engine is here.

Spatial variation of the Y-chromosome: The global patterns and correlations with other genetic systems, linguistic and geography

Balanovsky et al.

We developed the “Y-base” database, which includes frequencies of 500 Y-chromosomal haplogroups in 4200 populations worldwide, with total sample size 142,000. 130,000 Y-chromosomes came from 300 published papers and remaining 12,000 are our unpublished data.

Using this dataset we created the world spatial distribution maps of 230 haplogroups. This World Atlas of Y-chromosomal variation was created by GeneGeo software, which we developed for digital map analysis in gene geography. The zones of sharp changes in frequencies were interpreted as genetic boundaries; the main boundary crosses Eurasia and includes not only mountain (Himalayas and Caucasus) but also steppe segments.

The question arises to which degree patterns of Y-chromosomal variation agree with data on other genetic systems. To answer, we characterized all extant ethnic groups speaking Balto-Slavic languages by mitochondrial DNA (N=6,876), Y-chromosome (N=6,079) and genome-wide SNPs (N=296). We found that genetic distances, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography but slightly lower correlation (0.7) with the mitochondrial DNA and linguistic affiliation.

The high-throughput sequencing of the Y-chromosome reveals thousands phylogenetically informative SNPs. Population screening for these markers subdivides old haplogroups with subcontinental zones of spread into multiple young haplogroups with restricted areas - thus providing excellent tools for reconstructing population history. This approach allowed us successfully subdivide C2-M217, N1c-M178, and G1-M285 into 35 new subhaplogroups, to create their frequency distribution maps and estimate the SNP and STR mutation rate on the Y-chromosome.

Detection of mitochondrial haplogroups variability of small population living in 9th century based on analysis of ancient DNA

Šebest et al.

Introduction: Ancient DNA (aDNA) represents all types of DNA that can be recovered from archaeological and palaeontological material or museum specimens. Information from aDNA is very useful in phylogenetics, paleoanthropology or genealogy. The isolation and analysis of aDNA is accompanied by two major problems: low quality and quantity of aDNA and the risk of contamination with modern DNA. Therefore, several strict laboratory and methodological criteria must be followed. The aim of this study is to isolate and analyze aDNA from human remains of the small Avar-Slavic population living in 9th century and to determine mitochondrial haplogroups in order to estimate the ratio of haplogroups typical for these two ethnicities.

Material and methods: The 50 samples of human teeth and bones were used for the isolation of aDNA in this experiment. The samples were excavated from Avar-Slavic burial site located near Cífer-Pác (Slovakia). Isolation of aDNA were performed in recommended conditions. Mitochondrial haplogroups were determined by sequencing of the HVRI of mtDNA followed by analysis of polymorphisms in this region.

Results: Despite the fact that the graves of mentioned burial place contained Avar artefacts and some remains showed mongoloid cranial features, majority of detected mitochondrial haplogroups belong to the common lineages of the Slavic populations and only presence of haplogroup U7 (typical for region of Near East) indicate the Avar origin. Conclusion: Our results suggest that the assimilation between Avars and other neighbour ethnicities was too extensive in 9th century and, therefore the presence of haplogroups characteristic for Avars is very rare.

Analysis of the Y-chromosome in the Volga-Ural region populations from Russia

Trofimova et al.

We analyzed a sample of the Volga-Ural region, including 462 individuals from 8 populations: Udmurts, Komi, Mordvinians, Mari, Besermyans, Chuvashes, Tatars, Bashkirs. We have shown that the major proportion of Y-chromosome haplogroups in the studied populations accounted for the four branches (R1b-M269, R1a-M198, N1c1-Tat and N1c2-P43), which together make up from 51% to 100% of the patrilineal genetic diversity in the studied region.

We have shown that West Asian and Central Asian Y-chromosome haplogroup R1a-Z2125 in the Volga-Ural region occurs with the greatest frequency in Bashkirs (31%), which is the dominant subgroup of haplogroup R1a-M198 in this population despite the fact that in other populations Eastern European R1a-M558 and R1a-M458 are the dominant lines. This fact indicates that different haplogroup R1a-M198 lines in the populations of the Volga-Ural region have different sources.

The Eastern European influence in the population can be also seen in Tatars from Tuimasinsky district of Bashkortostan in which typical for Central Europe haplogroup R1b-M405 is the predominant line of the haplogroup R1b-M343. According to the PCA analysis based on the Y-chromosome haplogroups distribution, Bashkirs show the greatest separation from other populations of the region. The reason is the presence with the high frequency of Asian lineages in their gene pool.

Phylogeographic refinement of human Y chromosome haplogroup E provides new insights into the early dispersal of herders in sub-Saharan Africa

Trombetta et al.

Recently, a high number of Y chromosome SNPs has been discovered through next generation sequencing studies, but the geographic distribution for most of these variants remains largely unexplored.

Haplogroup E is the most common human Y chromosome clade within Africa and its internal branches have been linked to a wide range of human movements. To increase the level of resolution of haplogroup E, we disclosed the phylogenetic relationships among 729 mutations found in 33 haplogroup DE Y-chromosomes sequenced at high coverage in previous studies and further dissected the E-M35 subclade by genotyping 62 informative markers in about 5000 samples from 118 worldwide populations.

The phylogeny of haplogroup E showed novel features compared to the previous topology, including a new basal clade. Within haplogroup E-M35, we resolved basal polytomies and assigned all the E-M35* chromosomes to different new monophyletic clades. Through a Bayesian phylogeographic analysis, we associated each node of the tree to specific geographic areas. By this analysis, we identified a new E-M35 sub-Saharan clade, which originated about 11 kya in the northern part of the Horn of Africa. SNP-based dating, phylogenetic structuring and geographic distribution of this clade (and its sub-clades) are consistent with a multi-step dispersal of herders within eastern Africa and its subsequent diffusion to sub-equatorial areas.

Our results provide new insights into the evolutionary hypotheses about the spread of pastoralism in Africa and increase the discriminative power of the E-M35 haplogroup for use in forensic genetics through the identification of new ancestry informative markers.

Tuesday, May 19, 2015

Large-scale recent expansion of European patrilineages

Open access at Nature Communications today: Large-scale recent expansion of European patrilineages shown by population resequencing by Batini et al.

It's a shame the authors failed to sample any Eastern European populations, but it's still a very useful effort which moves us in the right direction in an area of study that has floundered from the start, largely due to the widespread use of bikini STR haplotypes and faulty methodology. From the paper:

Here, we use targeted NGS of European and Middle Eastern populations to show that Europe was affected by a major continent-wide expansion in patrilineages that post-dates the Neolithic transition. Resequencing at high coverage of 3.7 Mb of MSY DNA, in each of 334 males comprising 17 population samples, defines an unbiased phylogeny containing 5,996 high-confidence single-nucleotide polymorphisms (SNPs). Dating indicates that three major lineages (I1, R1a and R1b), accounting for 64% of the sampled chromosomes, have very recent coalescent times, ranging between 3.5 and 7.3 KYA. In demographic reconstructions (17) a continuous swathe of 13/17 populations from the Balkans to the British and Irish Isles share similar histories featuring a minimum effective population size ~2.1–4.2 KYA, followed by expansion to the present. Together with other data on maternally inherited mtDNA (16, 18) and autosomal DNA (19), our results indicate a recent widespread male-specific phenomenon that may point to social selection, and refocuses interest on the social and population structure of Bronze Age Europe.


The shapes of different clades within the tree (Fig. 1a) vary greatly. Haplogroups E1b-M35, G2a-L31, I2-P215, J2-M172, L-M11 and T-M70 contain long branches with deep-rooting nodes, whereas I1-M253, N1c-M178, R1a-M198 and R1b-M269 show much shallower genealogies.

The recent and rapid continent-wide demographic changes we observe suggest a remarkably widespread transition affecting paternal lineages. This picture is confirmed in an independent analysis of MSY diversity in the pooled HGDP CEPH panel European samples (16), and is compatible with current (n=98) ancient DNA data for MSY (Fig. 3; Supplementary Table 8), in which hgs R1a, R1b and I1 are absent or rare in sites dating before 5 KYA, whereas hgs G2a and I2 are prevalent.

The period 4–5 KYA (the Early Bronze Age) is characterized by rapid and widespread change, involving changes in burial practices that might signify an emphasis on individuals or kin groups, the spread of horse riding, and the emergence of elites and developments in weaponry (35). In principle male-driven social selection (36) associated with these changes could have led to rapid local increases in the frequencies of introgressing haplogroups (34), and subsequent spread, as has been suggested for Asia (37). However, cultures across Europe remain diverse during this period; clarifying the temporal and geographical pattern of the shift will rely heavily on additional ancient DNA data.

Batini, C. et al. Large-scale recent expansion of European patrilineages shown by population resequencing. Nat. Commun. 6:7152 doi: 10.1038/ncomms8152 (2015).

See also...

R1a1a from an Early Bronze Age warrior grave in Poland

The LN/EBA: like one big party

Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)

The Origins of Proto-Indo-European: The Caucasian Substrate Hypothesis

Some interesting stuff here from Allan R. Bomhard, especially in light of the ancient DNA we've seen recently from Late Neolithic/Early Bronze Age Europe.

ABSTRACT: There have been numerous attempts to find relatives of Proto-Indo-European, not the least of which is the Indo-Uralic Hypothesis. According to this hypothesis, Proto-Indo-European and Proto-Uralic are alleged to descend from a common ancestor. However, attempts to prove this hypothesis have run into numerous difficulties. One difficulty concerns the inability to reconstruct the ancestral morphological system in detail, and another concerns the rather small shared vocabulary. This latter problem is further complicated by the fact that many scholars think in terms of borrowing rather than inheritance. Moreover, the lack of agreement in vocabulary affects the ability to establish viable sound correspondences and rules of combinability. This paper will attempt to show that these and other difficulties are caused, at least in large part, by the question of the origins of the Indo-European parent language. Evidence will be presented to demonstrate that Proto-Indo-European is the result of the imposition of a Eurasiatic language — to use Greenberg’s term — on a population speaking one or more primordial Northwest Caucasian languages.

Allan R. Bomhard, The Origins of Proto-Indo-European: The Caucasian Substrate Hypothesis. Paper to be presented at “The Precursors of Proto-Indo-European: the Indo-Hittite and Indo-Uralic Hypotheses”. Workshop to be held at the Leiden University Centre for Linguistics, Leiden, The Netherlands, 9—11 July 2015.

See also...

Tracking ASI on the ancient steppe

(Note here the high level of the K=12 Abkhazian/Georgian-centered Transcaucasian component among the Yamnaya and Corded Ware genomes.)

Modeling Yamnaya with qpAdm

First look at Bell Beaker, Corded Ware and Yamnaya genomes

Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)

Sunday, May 17, 2015

Tracking ASI on the ancient steppe

Below is a series of ADMIXTURE results featuring an Ami-centered Southeast Asian component from K=7 to K=11 that appears to effectively track both East Asian and Ancestral South Indian (ASI) ancestry. The K=12 run features an Indian-specific cluster that might also be useful in tracking ASI. Admittedly though, it looks a little noisy, probably because it packs a good deal of West Eurasian admixture.

K=7 spreadsheet

K=8 spreadsheet

K=9 spreadsheet

K=10 spreadsheet

K=11 spreadsheet

K=12 spreadsheet

The suggestion that ancient South Asians migrated to the European steppe has been made a few times in the comments section at this blog, but it doesn't appear to be valid. Based on the above results I'm willing to wager that Mesolithic, Neolithic and Bronze Age populations of the western steppe lacked ASI.

The Yamnaya and other Late Neolithic/Early Bronze Age (LN/EBA) European samples do show high levels of the South Central Asian component in most of the runs. However, this is most likely a signal of shared Ancient North Eurasian (ANE) or MA1-related ancestry via the Asian steppe rather than South Central Asian admixture in Europe.

By the way, if anyone wants to turn these spreadsheets into 4mix files and try a little modeling, please do so, and let us know what you find in the comments below.
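If the spreadsheets hold one row per individual, the first step would probably be collapsing them into population averages. Here's a rough sketch of that step in Python. The layout it writes (a label column followed by one column per component) is only an assumption about what a 4mix data sheet looks like, so check it against the instructions bundled with the script. The function names `population_averages` and `to_csv` and the toy rows are made up for the example.

```python
import csv
import io
from collections import defaultdict


def population_averages(rows):
    """Average ADMIXTURE proportions per population.

    rows -- iterable of (population_label, [proportion, ...]) pairs,
            one pair per individual.
    Returns {population_label: [mean proportion, ...]}.
    """
    sums = {}
    counts = defaultdict(int)
    for pop, props in rows:
        if pop not in sums:
            sums[pop] = [0.0] * len(props)
        elif len(props) != len(sums[pop]):
            raise ValueError(f"inconsistent number of components for {pop}")
        for k, p in enumerate(props):
            sums[pop][k] += p
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}


def to_csv(averages, component_names):
    """Render the averages as CSV text (ASSUMED layout: label, then components)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Population", *component_names])
    for pop, means in averages.items():
        writer.writerow([pop, *(f"{m:.4f}" for m in means)])
    return out.getvalue()


# Toy individuals, invented for illustration only (not real ADMIXTURE output).
rows = [
    ("Pop_X", [0.2, 0.8]),
    ("Pop_X", [0.4, 0.6]),
    ("Pop_Y", [1.0, 0.0]),
]
averages = population_averages(rows)
print(to_csv(averages, ["Comp1", "Comp2"]))
```

Keeping the averaging separate from the CSV writing means the same averages can be rewritten in whatever layout the script turns out to need.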

See also...

The enigma of the Kalash

Monday, May 11, 2015

4mix: four-way mixture modeling in R

Thanks to Eurogenes project member DESEUK1. A zip file with the R script, instructions and a couple of data sheets is available here.

So let's model Poles as a mix of four ancient reference populations from Central and Eastern Europe, using output from my K8 analysis.

Copy & Paste: source('4mix.r')


Copy & Paste: getMix('K8avg.csv', 'target.txt', 'HungaryGamba_EN', 'HungaryGamba_HG', 'Karelia_HG', 'Corded_Ware_LN')


After a few seconds you should see the results...

Target = 19% HungaryGamba_EN + 14% HungaryGamba_HG + 2% Karelia_HG + 65% Corded_Ware_LN @ D = 0.0062

Obviously the script can use ancestry proportions and/or population averages from any test, provided they're formatted properly. The accuracy of the modeling will depend on the quality of the input.
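For anyone curious about what a four-way fit like this involves, here's a minimal sketch in Python. It is not the 4mix code, just an illustration under stated assumptions: it tries every combination of four percentages that sum to 100 and keeps the one whose weighted average of the source profiles lies closest to the target, with closeness taken to be Euclidean distance. Whether 4mix computes its D this way isn't stated here. The function name `four_way_mix`, the `step` parameter and the toy profiles are all hypothetical.

```python
from itertools import product
from math import sqrt


def four_way_mix(target, sources, step=1):
    """Brute-force the best four-way mix of `sources` for `target`.

    target  -- list of component proportions for the target population
    sources -- dict mapping four source names to lists of the same length
    step    -- grid resolution in percentage points
    Returns ({source_name: percent}, distance).
    """
    names = list(sources)
    if len(names) != 4:
        raise ValueError("exactly four source populations are required")
    profiles = [sources[n] for n in names]
    if any(len(p) != len(target) for p in profiles):
        raise ValueError("all profiles must have as many components as the target")

    best_weights, best_dist = None, float("inf")
    grid = range(0, 101, step)
    for a, b, c in product(grid, repeat=3):
        d = 100 - a - b - c  # the fourth weight is whatever is left over
        if d < 0:
            continue
        weights = (a, b, c, d)
        squared = 0.0
        for k, t in enumerate(target):
            # Mixture value for component k: weighted average of the sources.
            mixed = sum(w * p[k] for w, p in zip(weights, profiles)) / 100
            squared += (mixed - t) ** 2
        dist = sqrt(squared)  # ASSUMPTION: Euclidean distance as the fit score
        if dist < best_dist:
            best_weights, best_dist = dict(zip(names, weights)), dist
    return best_weights, best_dist


# Toy profiles, invented for illustration only (not real ADMIXTURE output).
sources = {
    "SRC_A": [0.7, 0.1, 0.1, 0.1],
    "SRC_B": [0.1, 0.7, 0.1, 0.1],
    "SRC_C": [0.1, 0.1, 0.7, 0.1],
    "SRC_D": [0.1, 0.1, 0.1, 0.7],
}
# Built as a 20/10/5/65 mix of the four toy sources.
target = [0.22, 0.16, 0.13, 0.49]
weights, dist = four_way_mix(target, sources, step=5)
print(weights, round(dist, 4))
```

A full grid at 1% resolution is only about 177,000 valid combinations, so brute force is fast enough and avoids any dependence on an optimizer's starting point.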

Update 19/05/2015: A new version of the 4mix script that can run multiple targets is available here, courtesy of Open Genomes.

Saturday, May 9, 2015

The time and place of European gene flow into Ashkenazi Jews

It looks like we're about to see yet another paper on the origins of Ashkenazi Jews. A poster on the topic was presented this week at the Biology of Genomes conference, and is available for download here.

It's a very reasonable effort, perhaps the best one so far. However, I'm of the opinion that the genetic structure of the Near East has changed significantly over the millennia. If that's correct, then using modern samples from the Near East to estimate Near Eastern ancestry in Ashkenazi Jews might not work too well.

For instance, let's assume, just for the sake of argument, that ~2,000 years ago the Levant harbored populations that were genetically almost indistinguishable from present-day Cretans. This might mean that Ashkenazi Jews are much less than 50% European. But we won't know until we see some ancient DNA from the Near East, including from the remains of early Jews.


James Xue, Itsik Pe’er, and Shai Carmi, The time and place of European gene flow into Ashkenazi Jews, Biology of Genomes 2015 poster presentation.

Friday, May 8, 2015

Ancient DNA from an Upper Paleolithic European with recent Neanderthal ancestry

Arguably the biggest news from this week's Biology of Genomes conference at the Cold Spring Harbor Laboratory is this:

Team Characterizing DNA from Ancient Human with Recent Neanderthal Ancestry

The article is free, but you might need to register at GenomeWeb to access it. This is basically what it says:

- the male sample, Oase 1, is 37-42K years old and comes from the Pestera cu Oase site in southwestern Romania

- it's estimated to harbor 5-11% of genome-wide Neanderthal ancestry, with as much as 50% on chromosome 12

- the admixture is in relatively long stretches, which suggests that the mixture took place four to six generations before Oase 1 was alive

- it's similar to present-day Eurasians, and apparently from a population that was on the way to becoming European, but probably did not contribute in any significant way to the genetic makeup of present-day Europeans

- thus far, the capture method used by the scientists has managed to snag around 78,000 genome-wide sites (presumably SNPs), but the sequencing efforts continue, and we might see a Y-haplogroup result in the near future
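The link between long admixture stretches and a recent mixture event, mentioned in the third point above, can be put into rough numbers. The sketch below uses a common rule of thumb, not the method used by the researchers: after a single pulse of admixture, recombination breaks introgressed segments down to an average of about 100/g centimorgans after g generations. It ignores the admixture proportion and chromosome ends, and the function name is hypothetical.

```python
def expected_tract_length_cm(generations):
    """Rough mean length, in centimorgans, of an introgressed segment
    `generations` after a single admixture pulse (rule of thumb: 100 / g)."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    return 100.0 / generations


# Four to six generations, as suggested for Oase 1.
for g in (4, 5, 6):
    print(g, round(expected_tract_length_cm(g), 2))
```

On this approximation, four to six generations corresponds to average segments of roughly 17 to 25 cM, which is why unusually long stretches point to a very recent Neanderthal ancestor.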

It sounds to me as if Oase 1 is similar to Kostenki14 (see here), so I'd say there's a good chance its Y-chromosome belongs to haplogroup C-M130.