TL;DR: Wikidata is an ambitious enterprise, but social ontologies are never language-agnostic, so the project risks perpetuating rather than transcending the worldviews most prevalent in current Wikipedia databases. Broadly speaking, that means global north, Anglo, western, white, cishet, male worldviews. I think Wikidata may be promising for brute physical facts like the periodic table and biochemistry. But the social facts we live with, from politics to personhood and from kinship to currency, are never fully language-independent, so any single ontology will be biased and incomplete.

Computer scientist Denny Vrandečić has proposed an interesting idea: Abstract Wikipedia, an initiative that would use “structured data from Wikidata to create a multilingual, machine-driven knowledge platform” (source). As a linguist, I am skeptical. I first recorded my skepticism on Twitter, where I wrote:

For a limited class of ‘brute facts’ this may help; but the structural and semantic differences between, say, Chukchi and Farsi surely exceed those between C# and F# languages… and that’s even before considering cultural and communicative context

To this, Vrandečić responded with a challenge: “Can you find a concrete example in Wikipedia, where a sentence or short paragraph wouldn’t be expressible in two languages of your choice? I would be curious to work through that.”

I decided to follow up on this, using an example Vrandečić himself uses: the word “mayor”. This seems a clear enough, easily definable term, as in “The current mayor of San Francisco is London Breed” (source). So what would the maximally abstract, language-agnostic content of this concept be, and how would we use it to autotranslate content into other languages?

Let’s start easy by staying within the Germanic languages. In German, do you pick Bürgermeister or Oberbürgermeister? That depends, among other things, on the kind of analogical mapping you want to make (and there are multiple possibilities, none of them neutral). Or take Swedish, where the cognate term borgmästare was actually abandoned in the 1970s. Perhaps this one’s easy: bilingual as they are, Swedes may well just use “mayor” for the mayor of San Francisco. But that’s a boring solution, and issues would still arise when translating or generating historical content.

Moving on to the Slavic languages, how about Polish? We’ll probably use burmistrz, a loanword from German, but the semantic space is again warped by alternatives, and the partial incommensurability shows in the key terms left untranslated in this academic paper on the topic.

Colonialism has an ugly habit of erasing cultural institutions, indigenous voices and vocabulary. Even then, the result is rarely simple translational equivalence, as seen in the use of Spanish alcalde, which in the Americas may mean ‘judge’ or ‘mayor’ (or something else) depending on the historical context. Here, mechanically picking whatever the most common English term happens to be leads to serious historical distortions and problems of interpretability:

After examining the innumerable occurrences of the term in the historical literature of the last 50 years, it seems that its translation as mayor has occurred not because of the essential nature of the office, but because in the modern world the term alcalde is used where in English one would say mayor. Future scholars of the colonial period would be well served if they considered very seriously the essential function of the office before simply translating the word to a modern equivalent. (Schwaller 2015)

Moving further afield, let’s take Samoan, where the office of pulenu’u is an odd mesh of locally organized administration and colonial-era divide-and-conquer policies (So’o & Laking 2008). “Mayor” might be translated as pulenu’u, but the translation would definitely carry a semantic accent (on the very useful notion of “semantic accent”, see Werner 1993). It is this kind of careful anthropological linguistic work that most strongly brings home to me the (partial) incommensurability of the worlds we build with words.

I have focused here on languages for which there at least appears to be an available (if not fully equivalent) translation, but of course there will also be languages that simply haven’t lexicalised the concept: think of languages spoken by egalitarian hunter-gatherers. One might object that their speakers could easily borrow the concept and the term from another language. Sure. And there’s the rub: no natural language ontology is neutral. The English term “mayor” is supported by and realized in its own linguistically and culturally relative ontology.

While I’ve mostly taken the English-to-other-language direction here (which may seem easier because of globalization, cultural diffusion, calquing, etc.), the problems are clearly at least as bad if you go in the other direction, starting from other culturally relative notions.

If even a seemingly innocuous term like “mayor” is subject to this kind of warping of semantic spaces (if it’s available at all), that doesn’t bode well for many other concepts. This is why, even though I can see the attraction, I’m skeptical about a concrete Abstract Wikipedia.