Saturday, July 29, 2017

Speakers per language diagram & International Linguistics Olympiad memes

Hello readers of Humans Who Read Grammars,

As well as writing on this blog, I also work with the International Linguistics Olympiad (IOL*). The IOL is a contest where secondary school students from all over the world get to compete in solving linguistic puzzles. Normally, in order to explain what the contest is all about, I send people to the page with old problem sets, but there's a hip IOL-meme page that's produced some very apt memes that may do a better job of explaining the contest to linguists. I'll paste them in below. (Remember how we started as a meme-based blog for typologists?)

I recently made a post on our blog over there about the dominance of European countries in the contest and language diversity. For that post, I derived a little data visualisation of speaker populations per language (based on the 19th edition of Ethnologue) with infogram. I thought y'all might like it as well, so I'm sharing it here too.

By the way, if you're a linguist who'd like to help keep the contest strong and encourage clever youngsters to get into linguistics, get in touch! There's a lot of countries where there is no contest, or where the contest could well do with some help in thinking of clever problems based on small languages, lecturing etc. Talk to us and we'll figure something out.

Here is a table from Ethnologue that tries to explain this as well, a bit niftier but perhaps less pretty.

Table from Ethnologue summarising the number of speakers per language.

* Yes, the International Linguistics Olympiad is abbreviated "IOL". It's a thing about neutrality, don't worry about it.

Wednesday, July 5, 2017

What languages are grammars of the world written in?

Humans have been writing grammars for a long time. The serious expansion into non-European languages is fairly recent though, and associated with colonialism and Christian missionary work. Because of this, it's interesting to see what language grammars are written in (meta-language) as well as what language they're about (target-language). In the map above, this is precisely what we see: the meta-languages of the language descriptions in Glottolog.

There's roughly 7,000 languages in the world alive today, and we have some kind of description of approximately 4,000 of them. If you want to find them, go and search Glottolog.

Harald Hammarström, one of the editors of Glottolog, recently shared with me some interesting data on these descriptions that I want to share with all of you. In Glottolog, descriptive references are tagged for which language they're in (meta-language) as well as which language they are about (target-language)*. The map above gives the distribution of meta-languages of the descriptions of 4,005 languages in Glottolog. Each language on the map has only one dot with only one color. The color shows the meta-language of the Most Extensive Description for that language**.

In this map we can clearly see the domination of English as a world language, but we can also see the prevalence of French in former French colonies in Africa and, naturally, the national languages of modern nation states like Brazil (Portuguese) and Indonesia (Indonesian).

If we look a bit closer at this data we can see exactly how many target-languages there are per meta-language in total, as well as how many documents in Glottolog there are per meta-language. For those documents where it's possible, Hammarström has also compiled a corpus of the actual content text per document and calculated how many types and tokens there are therein.
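If you're wondering what counting types and tokens looks like in practice, here is a minimal sketch in Python. It is not Hammarström's actual pipeline: the function name `count_types_and_tokens` and the tokenisation rule (lowercased runs of word characters) are my own assumptions for illustration.

```python
import re


def count_types_and_tokens(text):
    """Return (types, tokens) for a piece of text.

    Assumption: a token is any run of word characters, compared
    case-insensitively. Real corpus work would need a more careful
    tokeniser, especially for texts full of glossed examples.
    """
    tokens = re.findall(r"\w+", text.lower())
    # Tokens are all running words; types are the distinct word forms.
    return len(set(tokens)), len(tokens)


types, tokens = count_types_and_tokens(
    "The grammar describes the verb and the noun."
)
print(types, tokens)
```

In the toy sentence, "the" occurs three times, so it counts as three tokens but only one type.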

The table below summarizes this information for all references in Glottolog, i.e. not only the Most Extensive Description per language. There's a total of 96 meta-languages in Glottolog; the table summarizes the 9 most common.
Here is an interactive graphic showing the same data as the table above:

We hope you enjoyed that, be sure to explore Glottolog yourself if you haven't already!

* In BibTeX entries for Glottolog references, meta-languages have the entry field "inlg" and target-languages have "lgcode".
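For the curious, here is a rough sketch of how one might tally those two fields from a BibTeX file with Python. Only the field names "inlg" and "lgcode" come from Glottolog; the sample entries, the layout of the field values, the regular expression and the function name `field_counts` are all invented for illustration, and a real export may well need a proper BibTeX parser.

```python
import re
from collections import Counter

# Invented sample entries: the titles, keys and field values are
# placeholders, not real Glottolog references.
SAMPLE_BIB = """
@book{example_a,
  title = {A Grammar of Language A},
  inlg = {English [xx1]},
  lgcode = {Language A [xx2]}
}
@book{example_b,
  title = {Grammaire de la langue B},
  inlg = {French [xx3]},
  lgcode = {Language B [xx4]}
}
@article{example_c,
  title = {A Sketch of Language B},
  inlg = {English [xx1]},
  lgcode = {Language B [xx4]}
}
"""

# Assumption: each field sits on its own line and its value is wrapped
# in a single pair of braces with no nested braces.
FIELD = re.compile(r"^\s*(inlg|lgcode)\s*=\s*\{([^}]*)\}", re.MULTILINE)


def field_counts(bib_text):
    """Count how often each inlg and lgcode value occurs."""
    counts = {"inlg": Counter(), "lgcode": Counter()}
    for name, value in FIELD.findall(bib_text):
        counts[name][value.strip()] += 1
    return counts


counts = field_counts(SAMPLE_BIB)
print(counts["inlg"].most_common())
print(counts["lgcode"].most_common())
```

Counting "inlg" values gives documents per meta-language; counting "lgcode" values gives documents per target-language.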

** The Most Extensive Description is picked by sorting first by descriptive type (Grammar > Grammar Sketch > etc.), then by number of pages and lastly by publication year.
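The sorting rule in the second footnote can be sketched as a sort key in Python. This is only an illustration under assumptions, not Glottolog's own code: I'm assuming that more pages beat fewer and that a newer publication year beats an older one (the footnote doesn't say which way those go), every descriptive type other than grammar and grammar sketch is lumped together, and the `Description` class and sample records are made up.

```python
from dataclasses import dataclass

# Only the two types named in the footnote are ranked; anything else
# is lumped together below them (an assumption for this sketch).
TYPE_RANK = {"grammar": 0, "grammar_sketch": 1}
OTHER_RANK = len(TYPE_RANK)


@dataclass
class Description:
    title: str
    doctype: str
    pages: int
    year: int
    meta_language: str


def med_sort_key(d):
    # Lower tuples sort first: best descriptive type, then most pages,
    # then most recent year (both directions are assumptions).
    return (TYPE_RANK.get(d.doctype, OTHER_RANK), -d.pages, -d.year)


def most_extensive_description(descriptions):
    """Pick the single best description for one target-language."""
    return min(descriptions, key=med_sort_key)


# Made-up candidates for one hypothetical language.
candidates = [
    Description("Sketch", "grammar_sketch", 300, 2010, "English"),
    Description("Old grammar", "grammar", 150, 1985, "French"),
    Description("New grammar", "grammar", 150, 2001, "Portuguese"),
    Description("Wordlist", "wordlist", 900, 2015, "English"),
]

best = most_extensive_description(candidates)
print(best.title, best.meta_language)
```

Under these assumptions the two grammars outrank the longer sketch and wordlist, and the tie on pages is broken by year, so the dot for this made-up language would get the color for Portuguese.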