Sometimes precision gained is freedom lost

Part of the struggle of writing in a non-native language is that it can be hard to intuit the strength of one’s writing. Perhaps this is why it is especially gratifying when generous readers lift out precisely those lines that took hard work to streamline. Belated thanks!

Interestingly, the German translation for Tech Review needed double the number of words for the same point: “Ein Mehr an Präzision bedeutet manchmal ein Weniger an Freiheit.” I’m still wondering whether that makes it more precise or less.

  • Dingemanse, M. (2020, August). Why language remains the most flexible brain-to-brain interface. Aeon. doi: 10.5281/zenodo.4014750

The perils of edited volumes

Ten years ago, fresh out of my PhD, I completed three papers. One I submitted to a regular journal; it came out in 2012. One was for a special issue; it took until 2017 to appear. One was for an edited volume; the volume is yet to appear.

These may be extreme cases, but I think they reflect quite well the relative risks for early career researchers (in linguistics & perhaps more widely) of submitting to regular journals vs special issues vs edited volumes.

Avoiding the latter is not always possible; in linguistics, handbooks still have an audience. If I could advise my 2012 self, I’d say: 1. always preprint your work; 2. privilege online-first & open access venues; 3. use #RightsRetention statements to keep control over your work.

A natural experiment

Anyway, these three papers also provide an interesting natural experiment on the role of availability for reach and impact. The first, Advances in the cross-linguistic study of ideophones, now has >400 cites according to Google Scholar, improbably making it one of the most cited papers in its journal. This paper has done amazingly well.

The second, Expressiveness and system integration, has >50 cites and was scooped by a paper on Japanese that I wrote with Kimi Akita. We wrote that second paper two years after the first, but it appeared one year before it, if you still follow the chronology. As linguistics papers go, I don’t think it has done all that badly, especially considering that its impact was stunted by being in editorial purgatory for 4 years.

The third, “The language of perception in Siwu”, has only been seen by three people and cited by one of them (not me). I am not sure if or when it will see the light of day.

Some ACL2022 papers of interest

Too much going on at #acl2022nlp for live-tweeting, but I’ll do a wee thread on 3 papers I found thought-provoking: one on robustness probing by @jmderiu et al.; one on underclaiming by @sleepinyourhat; and one on bots for psychotherapy by Das et al.

Deriu et al. stress-test automated metrics for evaluating conversational dialogue systems. They use Blenderbot to identify local maxima in trained metrics and so identify blatantly nonsensical response types that reliably lead to high scores

As they write, "there are no known remedies to this problem". My conjecture (also see Goodhart's law): any automated metric will be affected by this as long as we're training on form alone. It's a thought-provoking paper, go read it

Next! Bowman acknowledges the harms of hype but focuses on the inverse: overclaiming the scope of work on limitations (='underclaiming'). I think his argument underestimates the enormous asymmetry of these cases and therefore may overclaim the harms?

I did wonder whether @sleepinyourhat is playing 4D chess here by writing a paper that's likely to attract citations from work that may have an incentive to overclaim the harms of underclaiming 🤯😂 #acl2022nlp

Third is Das et al., who propose to expose psychologically vulnerable people to conversational bots trained on Reddit, which frankly is every bit as bad an idea as it sounds (the words "ethics" and "risk" do not occur in the paper 🤷) #acl2022nlp #bionlp

There’s been loads more interesting and intriguing work at #acl2022nlp and I have particularly enjoyed the many talks in the theme track sessions on linguistic diversity. Check out the hundreds of papers (8831 pages) in the @aclanthology here:

Okay because @KLM has decided to cancel my flight and delay the next one, some quick notes from the liminality of Dublin Airport on a few more #acl2022nlp papers I found interesting, revealing, or thought-provoking

Ung et al. (Facebook AI Research) train chatbots to say sorry in nicer ways, though without addressing the underlying problems that make them say offensive things in the 1st place. I thought this was both interesting and revealing of FB's priorities. Paper:

Room for improvement: throughout, Ung et al. remove "stop words", but as conversation analysts can tell you, turn prefaces like uh, um, well, etc. often signal interactionally delicate matters, i.e. precisely the stuff they're hoping to track here 😬

Further, feedback is seen as strictly individual, whereas in normal human interaction it (also) reinforces *social* norms. Consider: those offended may not always have the social capital, privilege or energy to speak out ➡️ FB's bots will blithely continue to offend them 🤷

Deep learning, image generation, and the rise of bias automation machines

DALL-E, a new image generation system by OpenAI, does impressive visualizations of biased datasets. I like how the first example that OpenAI used to present DALL-E to the world is a meme-like koala dunking a baseball leading into an array of old white men — representing at one blow the past and future of representation and generation.

It’s easy to be impressed by cherry-picked examples of DALL•E 2 output, but if the training data is web-scraped image+text data (of course it is) the ethical questions and consequences should command much more of our attention, as argued here by Abeba Birhane and Vinay Uday Prabhu.

Suave imagery makes it easy to miss what #dalle2 really excels at: automating bias. Consider what DALL•E 2 produces for the prompt “a data scientist creating artificial general intelligence”:

When the male bias was pointed out to AI lead developer Boris Power, he countered that “it generates a woman if you ask for a woman”. Ah yes, what more could we ask for? The irony is so thicc on this one that we should be happy to have ample #dalle2 generated techbros to roll eyes at. It inspired me to make a meme. Feel free to use this meme to express your utter delight at the dexterousness of DALL-E, cream of the crop of image generation!

The systematic erasure of human labour

It is not surprising that glamour magazines like Cosmopolitan, self-appointed suppliers of suave imagery, are the first to fall for the gimmicks of image generation. As its editor Karen Cheng found out after thousands of tries, it generates a woman if you ask for “a female astronaut with an athletic feminine body walking with swagger” (Figure 3).

I also love this triptych because of the evidence of human curation in the editor’s tweet (“after thousands of options, none felt quite right…”) — and the glib erasure of exactly that curation in the subtitle of the magazine cover: “and it only took 20 seconds to make”.

The erasure of human labour holds for just about every stage of the processing-to-production pipeline of today’s image generation models: from data collection to output curation. Believing in the magic of AI can only happen because of this systematic erasure.

Figure 3

‘From text to talk’, ACL 2022 paper

📣New! From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology — very happy to see this position paper with Andreas Liesenfeld accepted to ACL 2022. This paper is one of multiple coming out of our @NWO_SSH Vidi project ‘Elementary Particles of Conversation’ and presents a broad-ranging overview of our approach, which combines comparison, computation and conversation.

More NLP work on diverse languages is direly needed. Here we identify a type of data that will be critical to the field’s future and yet remains largely untapped: linguistically diverse conversational corpora. There’s more of it than you might think! Large conversational corpora are still rare & ᴡᴇɪʀᴅ* but granularity matters: even an hour of conversation easily means 1000s of turns with fine details on timing, joint action, incremental planning, & other aspects of interactional infrastructure (*Henrich et al. 2010).

We argue for a move from monologic text to interactive, dialogical, incremental talk. One simple reason this matters: the text corpora that feed most language models & inform many theories woefully underrepresent the very stuff that streamlines & scaffolds human interaction. Text is atemporal, depersonalized, concatenated, monologic — it yields readily to our transformers, tokenizers, taggers, and classifiers. Talk is temporal, personal, sequentially contingent, dialogical. As Wittgenstein would say, it’s a whole different ball game.

Take turn-taking. Building on prior work, we find that across unrelated languages people seem to aim for rapid transitions on the order of 0~200ms, resulting in plenty of small gaps and overlaps, one big reason most voice UIs today feel stilted and out of sync. This calls for incremental architectures (as folks like David Schlangen, Gabriel Skantze and Karola Pitsch have long pointed out). Here, cross-linguistically diverse conversational corpora can help to enable local calibration & to identify features that may inform TRP projection.
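To make the notion of a transition concrete, here is a minimal sketch of how gaps and overlaps can be read off time-aligned turns. The `Turn` record, the field names and the toy timings are all made up for illustration; this is not the pipeline used in the paper, just one plain way of computing the offset between the end of one turn and the start of the next turn by a different speaker.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical record layout; real corpora will differ.
    speaker: str
    start_ms: int
    end_ms: int


def transition_offsets(turns):
    """Return offsets (ms) at speaker changes.

    Positive values are gaps, negative values are overlaps,
    zero is a perfectly latched transition.
    """
    ordered = sorted(turns, key=lambda t: t.start_ms)
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        # Only count transitions where the floor changes hands.
        if prev.speaker != nxt.speaker:
            offsets.append(nxt.start_ms - prev.end_ms)
    return offsets


# Toy data (invented), not taken from any corpus.
turns = [
    Turn("A", 0, 1200),
    Turn("B", 1350, 2400),  # starts 150 ms after A ends: a gap
    Turn("A", 2300, 3100),  # starts 100 ms before B ends: an overlap
    Turn("A", 3300, 3900),  # same speaker continues: not a transition
    Turn("B", 3900, 4500),  # starts exactly when A ends
]
offsets = transition_offsets(turns)
```

A density plot of such offsets per corpus is the kind of view that makes cross-linguistic comparison of timing possible.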

Turns come in sequences. It’s alluring to see exchanges as slot filling exercises (e.g. Q→A), but most conversations are way more open-ended and fluid. Promisingly, some broad activity type distinctions can be made visible in language-agnostic ways & are worth examining. This bottom-up view invites us to think about languages less in terms of tokens with transition probabilities, and more as tools for flexible coordination games. Look closely at just a minute of quotidian conversation in any language (as #EMCA does) and you cannot unsee this. What’s more, even seemingly similar patterns can harbour diversity. While English & Korean both use a minimal continuer form mhm/응, we find that response tokens are about twice as frequent in the latter (and more often overlapped), with implications for parsers & interaction design.

Finally, we touch on J.R. Firth — not his NLP-famous dictum on distributional semantics, but his lesser known thoughts on conversation, which according to him holds “the key to a better understanding of what language really is and how it works” (1935, p. 71). As Firth observed, talk is more orderly and ritualized than most people think. We rightly celebrate the amazing creativity of language, but tend to overlook the extent to which it is scaffolded by recurrent turn formats — which, we find, may make up ~20% of turns at talk.

Do recurrent turn formats follow the kind of rank/frequency distribution we know from tokenised words? We find that across 22 languages, it seems they do — further evidence they serve as interactional tools (making them a prime example of Zipf’s notion of tools-for-jobs).
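For readers who want to try this kind of check on their own data, here is a rough sketch of a rank/frequency count over whole turns. Treating a "turn format" as an exact match after lowercasing is my simplifying assumption here, not the operationalisation from the paper, and the function names and toy data are invented. The slope of log frequency against log rank gives a crude indication of how Zipf-like the distribution is (a slope near -1 is the classic case).

```python
import math
from collections import Counter


def rank_frequency(turns):
    """Count identical turns (after light normalisation) and rank them by frequency."""
    counts = Counter(t.strip().lower() for t in turns)
    return [
        (rank, form, freq)
        for rank, (form, freq) in enumerate(counts.most_common(), start=1)
    ]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) on log(rank); needs at least two ranks."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy data (invented): frequencies 12, 6, 4, 3 are proportional to 1/rank.
toy_turns = ["mhm"] * 12 + ["yeah"] * 6 + ["oh"] * 4 + ["okay"] * 3
ranked = rank_frequency(toy_turns)
slope = loglog_slope([freq for _, _, freq in ranked])
```

On real corpora the fit will of course be noisier, and a least-squares line on a log-log plot is only a first look, not a proper test of a power law.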

We ignore these turn formats at our own peril. Text erases them; tokenisation obscures them; dialog managers stumble over them; ASR misses them — and yet we can’t have a conversation for even thirty seconds without them. Firth was not wrong in saying they’re key.


So, implications! How can linguistically diverse conversational corpora help us do better language science and build more inclusive language technologies? We present three principles: aim for ecological validity, represent interactional infrastructure, design for diversity

Ecological validity is a hard sell because incentives in #nlproc and #ML —size, speed, SOTA-chasing— work against a move from text to talk. However, terabytes of text cannot replace the intricacies of interpersonal interaction. Data curation is key, we say with Anna Rogers. Pivoting from text to talk means taking conversational infrastructure seriously, also at the level of data structures & representations. Flattened text is radically different from the texture of turn-organized, time-bound, tailor-made talk — it takes two to tango.

A user study (Hoegen et al. 2019) provides a fascinating view of what happens when interactional infrastructure is overlooked. People run into overlap when talking with a conversational agent; the paper proposes this may be solved by filtering out “stop words and interjections”. This seems pretty much the wrong way round to us. Filtering out interjections to avoid overlap is like removing all pedestrian crossings to give free rein to self-driving cars. It’s robbing people of self-determination & agency just because technology can’t cope.

Our 3rd recommendation is to design for diversity. As the case studies show, we cannot assume that well-studied languages tell us everything we need to know. Extending the empirical and conceptual foundations of #nlproc and language technologies will be critical for progress.


Voice user interfaces are ubiquitous, yet still feel stilted; text-based LMs have many applications, yet can’t sustain meaningful interaction; and crosslinguistic data & perspectives are in short supply. Our paper sits right at the intersection of these challenges. If you’re going to be at ACL you’ll find our talk on Underline, but here’s a public version of the 12min pre-recorded talk with corrected captions for accessibility:

Why it is useful to distinguish iconicity from indexicality

Every once in a while I come across work that conflates iconicity and indexicality, or lumps them together under a broad label of motivation (often in opposition to ‘arbitrariness’). Even if I tend to advocate for treating terminology lightly, I think there are many cases where it does pay off to maintain this distinction, and conflating the two comes at a cost.

Not distinguishing iconicity and indexicality means losing the ability to explain how and why some linguistic resources differ in markedness & morphosyntactic behaviour, as I point out for the analogous issue of ideophones vs interjections here. A related case is transparent compounds, which naïve raters (under some instructions) also rate as highly iconic, yet for which it helps to be able to articulate how they differ from the kind of form-meaning resemblance usually targeted by the technical term iconicity.

There are also deeper evolutionary implications you’d lose sight of without the distinction. If an ancestral pain vocalization underlies interjections like ‘ow’, that makes for a different causal story than cross-linguistic similarities that can be ascribed to (possibly convergent) iconic mappings. So to explain why today’s languages are the way they are, a distinction like this comes in useful.

But for my money, the most interesting questions lie in where iconic vs indexical motivations overlap and where they diverge, and how this influences learning, processing, and cultural evolution. We can’t see those questions if we lump the notions together, nor when we dichotomize them.

New paper: Trilled /r/ is associated with roughness

Very happy to see this paper out! We combine comparative, lexical, historical, and psycholinguistic evidence for an in-depth look at a pervasive form of cross-modal iconicity.

For me, this goes back to ~2011, when I wondered why Siwu ideophones for roughness like wòsòròò, safaraa and dɛkpɛrɛɛ (all with trilled /r:/) felt so… rough. So something clicked when Bodo Winter told me about an intriguing link between /r/ & roughness in English in 2015

Many email threads, conversations, github commits and submissions & revisions later, we have this beast of a paper where we look at /r/~rough in sensory adjectives in English & Hungarian, trace it across hundreds of languages worldwide, and even peer back some 6 millennia in Proto-Indo-European.

It’s been such a pleasure to be part of this endeavour alongside Bodo Winter, Márton Sóskuthy and Marcus Perlman. Do check out Bodo’s excellent summary in the thread linked above. And find the paper (open access!) here:

Oh, by the way, one crunchy factoid about this paper (which Marcus Perlman pointed out to us) is that the r-for-rough link persists in present-day English variants where /r/ is no longer trilled — and that it can be awakened, like a sleeping beauty, as in this ad for Ruffles chips.

Coordinating social action

📣New! Coordinating social action: A primer for the cross-species investigation of communicative repair. Very happy to present this work w/ stellar coauthors @rapha_heesen @MarlenFroehlich Christine Sievers @mariekewoe, accepted in PhilTrans B 🧵

In this paper we consider the awesome flexibility of communicative repair in human interaction and take a peek under the hood. We ask: what elementary building blocks make this possible?

We find that several of the building blocks are found across species —from gibbons apparently self-correcting to chimps & bonobos showing persistence and elaboration— and introduce a conceptual framework that we hope will foster further comparative work

I've been interested in this topic ever since observing that ways of dealing with communicative trouble pattern within & across species in interesting ways. This year serendipity struck and we were able to get to it with an interdisciplinary team

It was great to work on this with @rapha_heesen @MarlenFroehlich Christine Sievers and @mariekewoe — between us, we represent (at least) psychology, anthropology, primatology, philosophy, psychobiology and the language sciences, which made things all the more fun and interesting

Anyway, while we still seem to have a joint focus of attention, let me just drop this link here again, which (as you can read in the paper) may be a form of persistence if not elaboration — go check it out!

One thing we found is that outside primates, research on sequentially organized social interaction is still rare — most work focuses on acoustics, song structure & ethograms rather than on contingency, sequence & interactional achievement. Lots of opportunities for exciting work!

As we point out in the paper, sequential analysis allows us to unify work on persistence & elaboration in great apes w/ work on repair in humans; and to identify possible continuities or bridging contexts, such as the freeze-look described by @elycorman and @njenfield

One risk of introducing a 'framework' is that it may be interpreted as proposing a simple matrix of ready-to-use labels for reified phenomena. Our goal here is different: we seek to make visible a space of possibilities with room for diversity & gradience

Finally out in print! 📄
🔓 PDF:

Cover page of paper showing title "Coordinating social action: a primer for the cross-species investigation of communicative repair", by authors Raphaela Heesen, Marlen Fröhlich, Christine Sievers, Marieke Woensdregt and Mark Dingemanse

Primates 🦧🧑 are cool, but if there is one thing that I hope our paper will help contribute to it would be a broader interactive turn in communicative ethology across species 🐳🐠🐘🐦🦇 : from signals and their properties to sequential exchanges as an interactional achievement.

Always plot your data

Always plot your data. We're working with conversational corpora and looking at timing data. Here's a density plot of the timing of turn-taking for three corpora of Japanese and Spanish. At least 3 of the distributions look off (non-normal). But why?

Plotting turn duration against offset provides a clue: in the weird looking ones, there’s a crazy number of turns whose negative offset is equal to their duration — something that can happen if consecutive turns in the data share the exact same end time (very unlikely in actual data).
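The same check can be done without a plot. Below is a minimal sketch that flags turns whose offset is exactly the negative of their duration, which is what you get when a turn ends at the same moment as the preceding one. The tuple layout, the function name and the toy timings are invented for illustration; offset is taken here as the turn's start minus the previous turn's end.

```python
def flag_suspect_turns(turns, tolerance_ms=0):
    """Return indices of turns whose offset equals minus their duration.

    `turns` is a list of (start_ms, end_ms) tuples sorted by start time.
    offset   = start of this turn - end of the previous turn
    duration = end of this turn - start of this turn
    offset == -duration exactly when both turns share the same end time.
    """
    suspects = []
    for i in range(1, len(turns)):
        prev_end = turns[i - 1][1]
        start, end = turns[i]
        offset = start - prev_end
        duration = end - start
        if abs(offset + duration) <= tolerance_ms:
            suspects.append(i)
    return suspects


# Toy data (invented): turns 1 and 3 end exactly when their predecessor ends.
toy = [(0, 1000), (400, 1000), (1100, 1800), (1500, 1800), (2000, 2600)]
suspects = flag_suspect_turns(toy)
share = len(suspects) / (len(toy) - 1)  # proportion of flagged transitions
```

A handful of such cases may be coincidence; a large share of them in one corpus is a sign that something is off in the segmentation.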

Plotting the actual timing of the turns as a piano roll shows what’s up: the ways in which turns are segmented and overlap are highly improbable. Imagine a conversation that goes like this! (in red are data points on the diagonal lines above)

Fortunately some of the corpora we have for these languages don’t show this — so we’re using those. If we hadn’t plotted the data in a few different ways it would have been pretty hard to spot, with consequences down the line. So: always plot your data.

The Gruner Map: a 1913 map of the Togo Plateau in present-day Ghana

Few historical maps of Ghana’s Volta and Oti regions have been invested with so much political and sociohistorical meaning as Hans Gruner’s 1913 map of the Togo Plateau. Gruner, stationed for over twenty years at Misahöhe in present-day Togo, was a long-time colonial administrator known for his ethnographical and historical knowledge of the area. His name is still known in most localities depicted on the map, as I attested in Akpafu myself (I’ve written about the map on this blog before). Besides Akpafu, we find the communities Santrokofi, Gbi, Alavanyo, Nkonya, and Bowiri on this map.

The map is not uncontroversial and is in the first place a political object, serving a double goal of documentation and geopolitical regimentation. Gruner worked with the communities bordering on the Togo Plateau and saw to it that all of them received an official copy, some of which still survive. It was widely accepted by most of the communities, was adopted and used by British colonial authorities in the 1920s, and has since been upheld by the Ghana High Court numerous times as the definitive demarcation for settling land claims and boundary disputes, though the Nkonya-Alavanyo border remains disputed, with conflicts flaring up every now and then (Penu & Essaw 2019).

Gruner is still a household name in part because he was a petty tyrant with a powerful grip on ‘his’ Misahöhe district. In the 1890s he played a key role in expanding the German colonial sphere and violently subjugating people that were in the way of German commercial and political interests. He led the infamous 1894/95 Togo Hinterland expedition, which sought to extend Germany’s sphere of influence ostensibly under the goal of building scientific and ethnographic collections (the latter obtained by buying or by looting). The influential Dente Bosomfo at Kete-Krachi was executed in public by firing squad under Gruner’s direction in November 1894, and Gruner subsequently oversaw the plundering of the Dente shrine (Maier 1980, Hüsgen 2020). He was stationed at Misahöhe between 1896 and 1914 and was presumptuous enough to give himself the title of “Graf von Misahöhe”.

Obscure and hard to find

Despite its historical significance and continuing local geopolitical relevance, access to the Gruner Map has been severely restricted for over a hundred years, and interested parties have been pointed to archival copies in the custody of local authorities or in libraries in Europe that carry copies of Mitteilungen aus den deutschen Schutzgebieten, the obscure and long-defunct German colonial-era journal in which the map was originally published as a supplement. Here’s a photograph of one of the copies surviving in Ghana:

Original copy of Gruner’s map as photographed by Penu & Essaw 2019 ‘during fieldwork in April 2015’

Now in the public domain

This situation is far from desirable: material of such significance should be available freely and at the highest possible quality to anyone interested. Fortunately, digitisation puts early sources within reach of anyone with an internet connection, and it has been possible for a while now to find low-resolution copies online. But we can do better. Therefore I am making available a new high-resolution scan of the map that I made myself with the help of the librarians at the MPI for Psycholinguistics. Here it is:

If ~5000x7500px is too large for you, try this slightly more reasonably sized one at 2000×2700 pixels, sourced from the Bayerische Staatsbibliothek: Gruner map, 1913, JPG (2Mb, 2000×2707 pixels). Given that the map is from 1913 and its makers died in 1928 (Sprigade) and 1943 (Gruner), I consider it to be in the public domain.

To be clear: I take no position in any territorial disputes in which this old map may or may not be relevant. My position is that information wants to be free. For an overview of the Nkonya/Alavanyo conflicts, the contested role of the Gruner Map, and alternative ways of determining the relevant boundaries, see Penu & Essaw 2019 (PDF).

Historical & ethnographical interest

The communities featured on the map are, in clockwise order from top right and by their present-day designations: Akpafu, Santrokofi, Gbi, Alavanyo, Nkonya, and Bowiri. Gruner used a Germanized spelling, as seen in Sandrokofi and Kunja, and was somewhat erratic in keeping (Egbi) or leaving out (Lavanyo, Kunja) presyllabic vowels and nasals.

Even though the map is mostly known for its local geopolitical significance, there is another reason to share it: it has great historical-descriptive value. Just taking the Akpafu area (which I know best), it is clear that placenames have been faithfully recorded in such a way that we can recognise and even parse many local toponyms (e.g., Eprimkato = Iprimu-kato ‘the top of Iprimu’, Klasereré = ɔkàlà-sɛrɛrɛ ‘steep sleeping mat’, and so on). Moreover, many abandoned settlements in the Kùbe mountains, of great historical and archaeological significance because of the famed local iron industry, are indicated on the map.

In short, the Gruner map offered unprecedented detail for its time and was underpinned by considerable geological, ethnographical and sociological research. The research underlying the map was amply documented in an often-overlooked 12-page treatise that accompanies the map and that is also made available digitally here, perhaps for the first time (Gruner 1913):

While I have published translations of German early sources on this blog before, and in general I try to go out of my way to make early work accessible to as many readers as possible, I don’t quite have the resources right now to commit to a translation of this 12-page treatise, which is all the more reason to make the original German available. Perhaps others will beat me to translating it.

References & further reading

