Concrete reasons to be skeptical about an ‘Abstract Wikipedia’

Computer scientist Denny Vrandečić has proposed the interesting idea of developing Abstract Wikipedia, an initiative that would use “structured data from Wikidata to create a multilingual, machine-driven knowledge platform” (source). As a linguist, I am skeptical. I first recorded my skepticism about this on Twitter, where I wrote:

For a limited class of ‘brute facts’ this may help; but the structural and semantic differences between, say, Chukchi and Farsi surely exceed those between C# and F# languages… and that’s even before considering cultural and communicative context

To this, Vrandečić responded with a challenge: “Can you find a concrete example in Wikipedia, where a sentence or short paragraph wouldn’t be expressible in two languages of your choice? I would be curious to work through that.”

I decided to follow up on this, using one of Vrandečić’s own examples: the word “mayor”. This is seemingly a clear enough, easily definable term. As in, “The current mayor of San Francisco is London Breed” (source). So, what would the maximally abstract, language-agnostic content of this concept be, and how would we use it to autotranslate content into other languages?

Staying within Germanic languages, let’s start easy, with German. Do you pick Bürgermeister or Oberbürgermeister? That depends, among other things, on the kind of analogical mapping you want to do (and there are multiple possibilities, none neutral). Or take Swedish, where the cognate term borgmästare was actually abandoned in the 1970s. Perhaps this one’s easy: bilingual as they are, Swedes may well just use “mayor” for the mayor of San Francisco. But that’s boring, and issues would still arise with historical articles.

Moving to Slavic, how about Polish? We’ll probably use a calque from German (‘burmistrz’) but the semantic space is again being warped by alternatives, and partial incommensurability is demonstrated by key terms remaining untranslated in this academic paper on the topic.

Colonialism has an ugly habit of erasing cultural institutions and indigenous voices and vocabulary — and even then the result is rarely simple translational equivalence, as seen in the use of Spanish alcalde (judge/mayor/{…}) in the Americas (Schwaller 2015).

Moving further afield, let’s take Samoan, where the office of pulenu’u is a weird mesh of locally organized administration and colonial-era divide-and-conquer policies. “Mayor” might be translated as pulenu’u but it would definitely have a semantic accent. On the very useful notion of “semantic accent”, see Werner 1993. It is this kind of careful anthropological linguistic work that most strongly brings home, to me, the (partial) incommensurability of the worlds we build with words.

I have here focused on languages for which there at least appears to be an available (if not fully equivalent) translation, but of course there will also be those that simply haven’t lexicalised the concept — think of languages spoken by egalitarian hunter-gatherers. One might say that they could surely adopt the concept & term from another language and get it. Sure. And there’s the rub: ontologies are rarely neutral. The English term “mayor” is supported by and realized in its own linguistically & culturally relative ontology.
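For the programmers among my readers, here is a minimal sketch of what a naive concept-to-term lookup would look like for the examples above. Everything structural here is my own assumption for illustration: the identifier `concept:mayor`, the `Rendering` class, and the `LEXICON` table are hypothetical and are not how Wikidata or Abstract Wikipedia actually store things. The terms and caveats are just the ones discussed in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendering:
    term: str
    caveat: str  # what this post says is warped or lost for this term


# Hypothetical concept identifier; NOT a real Wikidata ID.
MAYOR = "concept:mayor"

# Hypothetical lookup table keyed by (concept, language code).
LEXICON = {
    (MAYOR, "de"): [
        Rendering("Bürgermeister", "choice depends on the analogical mapping"),
        Rendering("Oberbürgermeister", "choice depends on the analogical mapping"),
    ],
    (MAYOR, "sv"): [Rendering("borgmästare", "term abandoned in the 1970s")],
    (MAYOR, "pl"): [Rendering("burmistrz", "calque from German; alternatives exist")],
    (MAYOR, "sm"): [Rendering("pulenu’u", "carries a semantic accent")],
}


def render(concept: str, lang: str) -> list[Rendering]:
    """Return every candidate term; an empty list means not lexicalised."""
    return LEXICON.get((concept, lang), [])


def is_neutral(concept: str, lang: str) -> bool:
    """True only if there is exactly one candidate and it has no caveat."""
    options = render(concept, lang)
    return len(options) == 1 and not options[0].caveat


for lang in ("de", "sv", "pl", "sm", "xx"):  # "xx" stands in for a language with no entry
    terms = [r.term for r in render(MAYOR, lang)]
    print(lang, terms, "neutral" if is_neutral(MAYOR, lang) else "not neutral")
```

The point of the sketch is that a lookup which returns a single string per language has to throw away exactly the parts that matter: the one-to-many choices, the caveats, and the empty cases.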

While I’ve mostly taken the English -> other language direction here (which may seem easier because of globalization, cultural diffusion, calquing, etc.), clearly the problems are at least as bad if you try going the other direction, starting from other culturally relative notions.

In sum (TL;DR): Wikidata may help us autofill some slots, but social ontologies are never language-agnostic, so the project risks perpetuating rather than transcending the worldviews most prevalent in current Wikipedia databases, which means, broadly speaking, global north, Anglo, western worldviews. I think Wikidata is perhaps promising for brute physical facts like the periodic table and biochemistry. But the social facts we live with (from politics to personhood and kinship to currency) are never fully language-independent, so any single ontology will be biased & incomplete.

If even a seemingly innocuous term like “mayor” is subject to this kind of warping of semantic spaces (if it’s available at all), that doesn’t bode well for many other concepts. Which is why, even if I like the idea, I’m skeptical about a concrete Abstract Wikipedia.
