Deep learning, image generation, and the rise of bias automation machines

DALL-E, a new image generation system by OpenAI, does impressive visualizations of biased datasets. I like how the first example that OpenAI used to present DALL-E to the world is a meme-like koala dunking a baseball leading into an array of old white men — representing at one blow the past and future of representation and generation.

It’s easy to be impressed by cherry-picked examples of DALL•E 2 output, but if the training data is web-scraped image+text data (of course it is) the ethical questions and consequences should command much more of our attention, as argued here by Abeba Birhane and Vinay Uday Prabhu.

Suave imagery makes it easy to miss what #dalle2 really excels at: automating bias. Consider what DALL•E 2 produces for the prompt “a data scientist creating artificial general intelligence”:

When the male bias was pointed out to AI lead developer Boris Power, he countered that “it generates a woman if you ask for a woman”. Ah yes, what more could we ask for? The irony is so thicc on this one that we should be happy to have ample #dalle2 generated techbros to roll eyes at. It inspired me to make a meme. Feel free to use this meme to express your utter delight at the dexterousness of DALL-E, cream of the crop of image generation!

The systematic erasure of human labour

It is not surprising that glamour magazines like Cosmopolitan, self-appointed suppliers of suave imagery, are the first to fall for the gimmicks of image generation. As its editor Karen Cheng found out after thousands of tries, it generates a woman if you ask for “a female astronaut with an athletic feminine body walking with swagger”.

I also love this triptych because of the evidence of human curation in the editor’s tweet (“after thousands of options, none felt quite right…”) — and the glib erasure of exactly that curation in the subtitle of the magazine cover: “and it only took 20 seconds to make”.

The erasure of human labour holds for just about every stage of the processing-to-production pipeline of today’s image generation models: from data collection to output curation. Believing in the magic of AI can only happen because of this systematic erasure.

Based on a thread originally tweeted by (@DingemanseMark) on April 7, 2022.

Over-reliance on English hinders cognitive science

Been reading this paper by @blasi_lang @JoHenrich @EvangeliaAdamou Kemmerer & @asifa_majid and can recommend it — Figure 1 is likely to end up in many lecture slides

Naturally I was interested in what the paper says about conversation. The claim about indirectness in Yoruba and other languages is sourced to a very nice piece by Felix Ameka and Marina Terkourafi.

The paper also devotes some attention to the importance of linguistic diversity in computer science and NLP, a key theme in the new language diversity track at #acl2022nlp, where another paper by Blasi and colleagues stood out. (The relevance of cross-linguistically diverse corpora for NLP was also a focus in this ACL paper of ours, where we argue such data is crucial for diversity-aware modelling of dialogue and conversational AI.)

I do have a nitpick about Blasi &al’s backchannel claim. They note many languages have minimal forms (citing a study of ours that provides evidence on this for 32 languages) and add, “However, listeners of Ruruuli … repeat whole words said by the speaker” — seeming to imply they rarely produce such minimal forms and (tend to) repeat words instead. Or at least I’m guessing that would be most people’s reading of this claim.

The source given for this idea is Zellers 2021. However, this actually paints a very different picture: in fact, ~87% of relevant utterances (1325 out of 1517) do consist of minimal forms like the ‘nonlexical’ hmm and the ‘short lexical’ eeh ‘yes’, against <9% featuring repetition, as seen in this table from Zellers:

I don’t think anyone has done the relevant comparison for other languages yet, but it seems safe to say that Ruruuli/Lunyala does in fact mostly use “the minimal mm-hmm”, and that repetition, while certainly worthy of more research, is one of the minority strategies for backchanneling in the language.

Despite this shortcoming, the relevance of cross-linguistic diversity in this domain can be supported by a different observation: the relative frequency and points of occurrence of ‘backchannels’ do seem to differ across languages — as shown in our ACL paper for English versus Korean. And the work on repetition is fascinating in itself — it is certainly possible that repetition is used in a wider range of interactional practices in some languages, with possible effects on transmission & lg structure as suggested in work by Sonja Gipper.

Originally tweeted by (@DingemanseMark) on October 17, 2022.

A serendipitous wormhole into the history of Ethnomethodology and Conversation Analysis (EMCA)

A serendipitous wormhole into #EMCA history. I picked up Sudnow’s piano course online and diligently work through the lessons. Guess what he says some time into the audio-recorded version of his 1988 Chicago weekend seminar (see lines 7-11)

[Chicago, 1988. Audio recording of David Sudnow’s weekend seminar]

We learn too quickly and cannot afford to contaminate a movement by making a mistake.

People who type a lot have had this experience. You type a word and you make a mistake.

I have been involved, uh of late, in: a great deal of correspondence in connection with uh a deceased friend’s archives of scholarly work and what should be done with that and his name is Harvey. And about two months ago or three months ago when the correspondence started I made a mistake when I ( ) taped his name once and I wrote H A S R V E Y, >jst a mistake<.

I must’ve written his name uh two hundred times in the last few months in connection with all the letters and the various things they were doing. Every single time I do that I get H A S R V E Y and I have to go back and correct the S. I put it in the one time and my hands learned a new way of spelling Harvey. I call ‘m Harvey but my hands call ‘m Hasrvey.

And they learned it that one time. Right then and there, the old Harvey got replaced and a new Harvey, spelled H A S R V E Y got put in. So we learn very fast.

Folks who know #EMCA history will notice this is right at the height of the activity of the Harvey Sacks Memorial Association, when Sudnow, Jefferson, Schegloff, and others were exchanging letters on Sacks’ Nachlass, intellectual priority in CA, and so on

We have here a rare first person record of the activity that Gail Jefferson obliquely referred to in her acknowledgement to the posthumously published Sacks lectures (“With thanks to David Sudnow who kick-started the editing process when it had stalled”), and much more explicitly in a 1988 letter (paraphrased in Button et al. 2022).

Historical interest aside, I like how the telling demonstrates Sudnow’s gift for first-person observation — a powerful combination of ethnomethodology and phenomenology that is also on display in his books, Pilgrim in the Microworld and Ways of the Hand #EMCA

Originally tweeted by (@DingemanseMark) on October 6, 2022.

The perils of edited volumes

Ten years ago, fresh out of my PhD, I completed three papers. One I submitted to a regular journal; it came out in 2012. One was for a special issue; it took until 2017 to appear. One was for an edited volume; the volume is yet to appear.

These may be extreme cases, but I think they reflect quite well the relative risks for early career researchers (in linguistics & perhaps more widely) of submitting to regular journals vs special issues vs edited volumes.

Avoiding the latter is not always possible; in linguistics, handbooks still have an audience. If I could advise my 2012 self, I’d say: 1. always preprint your work; 2. privilege online-first & open access venues; 3. use #RightsRetention statements to keep control over your work.

A natural experiment

Anyway, these three papers also provide an interesting natural experiment on the role of availability for reach and impact. The first, Advances in the cross-linguistic study of ideophones, now has >400 cites according to Google Scholar, improbably making it one of the most cited papers in its journal. This paper has done amazingly well.

The second, Expressiveness and system integration, has >50 cites and was scooped by a paper on Japanese that I wrote with Kimi Akita. We wrote that second paper two years after the first, but it appeared one year before it, if you still follow the chronology. As linguistics papers go, I don’t think it has done all that badly, especially considering that its impact was stunted by being in editorial purgatory for 4 years.

The third, “The language of perception in Siwu”, has only been seen by three people and cited by one of them (not me). I am not sure if or when it will see the light of day.

Some ACL2022 papers of interest

Too much going on at #acl2022nlp for live-tweeting, but I’ll do a wee thread on 3 papers I found thought-provoking: one on robustness probing by @jmderiu et al.; one on underclaiming by @sleepinyourhat; and one on bots for psychotherapy by Das et al.

Deriu et al. stress-test automated metrics for evaluating conversational dialogue systems. They use Blenderbot to identify local maxima in trained metrics and so identify blatantly nonsensical response types that reliably lead to high scores

As they write, "there are no known remedies to this problem". My conjecture (also see Goodhart's law): any automated metric will be affected by this as long as we're training on form alone. It's a thought-provoking paper, go read it

Next! Bowman acknowledges the harms of hype but focuses on the inverse: overclaiming the scope of work on limitations (='underclaiming'). I think his argument underestimates the enormous asymmetry of these cases and therefore may overclaim the harms?

I did wonder whether @sleepinyourhat is playing 4D chess here by writing a paper that's likely to attract citations from work that may have an incentive to overclaim the harms of underclaiming 🤯😂 #acl2022nlp

Third is Das et al., who propose to expose psychologically vulnerable people to conversational bots trained on Reddit, which frankly is every bit as bad an idea as it sounds (the words "ethics" and "risk" do not occur in the paper 🤷) #acl2022nlp #bionlp

There’s been loads more interesting and intriguing work at #acl2022nlp and I have particularly enjoyed the many talks in the theme track sessions on linguistic diversity. Check out the hundreds of papers (8831 pages) in the @aclanthology here:

Okay because @KLM has decided to cancel my flight and delay the next one, some quick notes from the liminality of Dublin Airport on a few more #acl2022nlp papers I found interesting, revealing, or thought-provoking

Ung et al. (Facebook AI Research) train chatbots to say sorry in nicer ways, though without addressing the underlying problems that make them say offensive things in the 1st place. I thought this was both interesting and revealing of FB's priorities. Paper:

Room for improvement: throughout, Ung et al remove "stop words" — but as conversation analysts can tell you, turn prefaces like uh, um, well, etc. often signal interactionally delicate matters, i.e. precisely the stuff they're hoping to track here 😬

Further, feedback is seen as strictly individual, whereas in normal human interaction it (also) reinforces *social* norms. Consider: those offended may not always have the social capital, privilege or energy to speak out ➡️ FB's bots will blithely continue to offend them 🤷

Originally tweeted by (@DingemanseMark) on May 25, 2022.

‘From text to talk’, ACL 2022 paper

(this post originated as a twitter thread)

📣New! From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology — very happy to see this position paper w/ @a_liesenfeld accepted to #acl2022nlp — Preprint 📜:

Screenshot of cover page of article. Abstract: "Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future."

This paper is one of multiple coming out of our @NWO_SSH Vidi project 'Elementary Particles of Conversation' and presents a broad-ranging overview of our approach, which combines comparison, computation and conversation

More NLP work on diverse languages is direly needed. In this #acl2022nlp position paper we identify a type of data that will be critical to the field's future and yet remains largely untapped: linguistically diverse conversational corpora. There's more of it than you might think!

World map showing the location of 63 spoken languages included in the curated collection considered in the paper: 1 Arapaho 2 Cora 3 English 4 Otomi 5 Ulwa 6 Kichwa 7 Siona 8 Tehuelche 9 Br. Portuguese 10 Kakabe 11 Minderico 12 Spanish 13 Siwu 14 Catalan 15 French 16 Dutch 17 Akpes 18 Hausa 19 Danish 20 Zaar 21 Baa 22 German 23 Italian 24 Sakun 25 Czech 26 Croatian 27 Limassa 28 ǂAkhoe 29 Saami 30 Laal 31 Polish 32 N|uu 33 Hungarian 34 Juba Creole 35 Arabic 36 Siputhi 37 Farsi 38 Chitkuli 39 Gutob 40 Nganasan 41 Yakkha 42 Anal 43 Zauzou 44 Kerinci 45 Duoxu 46 S. Qiang 47 Nasal 48 Sambas 49 Kelabit 50 Mandarin 51 Totoli 52 Kula 53 Jejueo 54 Korean 55 Pagu 56 Ambel 57 Gunwinggu 58 Japanese 59 Wooi 60 Yali 61 Heyo 62 Yélî Dnye 63 Vamale.

Large conversational corpora are still rare & ᴡᴇɪʀᴅ* but granularity matters: even an hour of conversation easily means 1000s of turns with fine details on timing, joint action, incremental planning, & other aspects of interactional infrastructure (*Henrich et al. 2010)

Language resources (corpora) and their size in relation to global language diversity. >7000 languages, >180 with some form of corpus resources, ~70 with conversational corpora of casual talk.

We argue for a move from monologic text to interactive, dialogical, incremental talk. One simple reason this matters: the text corpora that feed most language models & inform many theories woefully underrepresent the very stuff that streamlines & scaffolds human interaction

Diagram showing the words and expressions most distinctive of talk (compared to text): interjections like hhuh, hm, mhm, wow, um, yeah, etc.

Text is atemporal, depersonalized, concatenated, monologic — it yields readily to our transformers, tokenizers, taggers, and classifiers. Talk is temporal, personal, sequentially contingent, dialogical. As Wittgenstein would say, it's a whole different ball game

Take turn-taking. Building on prior work, we find that across unrelated lgs people seem to aim for rapid transitions on the order of 0~200ms, resulting in plenty of small gaps and overlaps, which is one big reason most voice UIs today feel stilted and out of sync

The timing of turn transitions in dyadic interactions in 24 languages around the world, replicating earlier findings and extending the evidence for the interplay of universals and cultural variation in turn-taking (n = number of turn transitions per corpus). Positive values represent gaps between turns; negative values represent overlaps. Across languages, the mean transition time is 59ms, and 46% of turns are produced in (slight) terminal overlap with a prior turn
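To make the sign convention concrete, here is a minimal Python sketch of how such transition offsets could be computed from a list of timed turns. The `Turn` record, its field names and the toy data are assumptions for illustration; this is not the pipeline used in the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    speaker: str
    begin: int  # start time in ms (hypothetical field name)
    end: int    # end time in ms (hypothetical field name)


def transition_offsets(turns):
    """Return the offset in ms at each speaker change.

    Offset = begin of the next turn minus end of the previous turn.
    Positive values are gaps, negative values are overlaps.
    """
    ordered = sorted(turns, key=lambda t: t.begin)
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:  # only count actual transitions
            offsets.append(nxt.begin - prev.end)
    return offsets


# Toy dyadic exchange, invented for illustration only.
turns = [
    Turn("A", 0, 1200),
    Turn("B", 1350, 2400),  # starts 150 ms after A ends: a gap
    Turn("A", 2300, 3900),  # starts 100 ms before B ends: an overlap
    Turn("B", 3900, 4500),  # starts exactly when A ends
]

offsets = transition_offsets(turns)
mean_offset = mean(offsets)
overlap_share = sum(o < 0 for o in offsets) / len(offsets)
print(offsets, round(mean_offset, 1), round(overlap_share, 2))
```

On real corpora one would run this per conversation and then compare the distributions across languages, which is where local calibration comes in.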

This calls for incremental architectures (as folks like @davidschlangen @GabrielSkantze @KarolaPitsch have long pointed out). Here, cross-linguistically diverse conversational corpora can help to enable local calibration & to identify features that may inform TRP projection

Turns come in sequences. It's alluring to see exchanges as slot filling exercises (e.g. Q→A), but most conversations are way more open-ended and fluid. Promisingly, some broad activity type distinctions can be made visible in language-agnostic ways & are worth examining

Two types of conversational activity in 6 unrelated languages, showing the viability of identifying broad activity types using ebbs and flows in amount of talk contributed (time in ms). Panel A: a 'piano roll' display of turns by two participants as they unfold over time. Tellings (‘chunks’) are characterized by highly skewed relative contributions, with one participant serving as teller and the other taking on a recipient role (roles may switch, as in the Japanese example). Panel B. In ‘chat’ segments, turns and speaking time are distributed more evenly. Panel C. Shifts from one state to another are interactionally managed by participants.

This bottom-up view invites us to think about languages less in terms of tokens with transition probabilities, and more as tools for flexible coordination games. Look closely at just a minute of quotidian conversation in any language (as #EMCA does) and you cannot unsee this

Even seemingly similar patterns can harbour diversity. While English & Korean both use a minimal continuer form mhm/응, we find that response tokens are about twice as frequent in the latter (and more often overlapped), with implications for parsers & interaction design

Finally, we touch on J.R. Firth — not his NLP-famous dictum on distributional semantics, but his lesser known thoughts on conversation, which according to him holds "the key to a better understanding of what language really is and how it works" (1935, p. 71)

Quote from Firth (1935): "Neither linguists nor psychologists have begun the study of conversation; but it is here we shall find the key to a better understanding of what language really is and how it works"

As Firth observed, talk is more orderly and ritualized than most people think. We rightly celebrate the amazing creativity of language, but tend to overlook the extent to which it is scaffolded by recurrent turn formats — which, we find, may make up ~20% of turns at talk

a look at conversational data shows that many turns are not one-offs: at least 28% of the utterances in our sample (436 367 out of 1 532 915 across 63 languages) occur more than once, and over 21% (329 548) occur more than 20 times. Many of these recurring turn formats are interjections and other pragmatic devices that help manage the flow of interaction and calibrate understanding
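As a rough sketch of how shares like these can be computed, the Python below counts how many utterances have a form that recurs in a sample. It assumes the shares are counted over utterance tokens and that forms are compared after trimming and lowercasing; both are my assumptions here, and the toy sample is invented.

```python
from collections import Counter


def recurrence_shares(utterances, high=20):
    """Share of utterances whose form occurs more than once, and more than `high` times."""
    forms = [u.strip().lower() for u in utterances]  # assumed normalisation
    counts = Counter(forms)
    total = len(forms)
    more_than_once = sum(c for c in counts.values() if c > 1)
    more_than_high = sum(c for c in counts.values() if c > high)
    return more_than_once / total, more_than_high / total


# Toy sample: 30 recurring turns and 10 one-offs.
sample = (
    ["mhm"] * 25
    + ["yeah"] * 3
    + ["oh really"] * 2
    + [f"one-off turn number {i}" for i in range(10)]
)
share_once, share_high = recurrence_shares(sample)

# The proportions reported in the excerpt above, recomputed from its counts.
reported_once = 436_367 / 1_532_915
reported_20 = 329_548 / 1_532_915
print(share_once, share_high, reported_once, reported_20)
```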

Do recurrent turn formats follow the kind of rank/frequency distribution we know from tokenised words? We find that across 22 languages, it seems they do — further evidence they serve as interactional tools (making them a prime example of Zipf's notion of tools-for-jobs)
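For readers who want to try this on their own data, here is a small Python sketch of a rank/frequency check: rank turn formats by frequency and estimate the slope of log frequency against log rank. An idealised Zipfian distribution gives a slope near -1. The least-squares fit and the toy counts are my illustration, not necessarily the method used in the paper.

```python
import math
from collections import Counter


def rank_frequency(items):
    """Return (rank, frequency) pairs, most frequent item first."""
    freqs = sorted(Counter(items).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(rank_freq):
    """Ordinary least-squares slope of log(frequency) on log(rank)."""
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data built to be exactly Zipfian: frequency = 120 / rank.
toy_counts = {"mhm": 120, "yeah": 60, "oh": 40, "okay": 30, "huh": 24}
toy_turns = [form for form, n in toy_counts.items() for _ in range(n)]

slope = loglog_slope(rank_frequency(toy_turns))
print(round(slope, 3))
```

Real conversational data will be noisier than this, so it is worth plotting the ranks on log-log axes as well as fitting a slope.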

We ignore these turn formats at our own peril. Text erases them; tokenisation obscures them; dialog managers stumble over them; ASR misses them — and yet we can't have a conversation for even thirty seconds without them. Firth was not wrong in saying they're 🔑


I've been slow-threading my way through some of our empirical results and will be adding a bunch more tweets on the implications. If you're hopping on, this is work with the amazing @a_liesenfeld, preprinted at & to be presented at #acl2022nlp soon

So, implications! How can linguistically diverse conversational corpora help us do better language science and build more inclusive language technologies? We present three principles: aim for ecological validity, represent interactional infrastructure, design for diversity

Ecological validity is a hard sell because incentives in #nlproc and #ML —size, speed, SOTA-chasing— work against a move from text to talk. However, terabytes of text cannot replace the intricacies of interpersonal interaction. Data curation is key, we say with @annargrs

Pivoting from text to talk means taking conversational infrastructure seriously, also at the level of data structures & representations. Flattened text is radically different from the texture of turn-organized, time-bound, tailor-made talk — it takes two to tango


A user study (Hoegen et al. 2019) provides a fascinating view of what happens when interactional infrastructure is overlooked. People run into overlap when talking with a conversational agent; the paper proposes this may be solved by filtering out "stop words and interjections"

This seems pretty much the wrong way round to us. Filtering out interjections to avoid overlap is like removing all pedestrian crossings to give free rein to self-driving cars. It's robbing people of self-determination & agency just because technology can't cope
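A toy Python sketch shows what this kind of filtering does to a stretch of talk. The stop list and the example turns are hypothetical and chosen by me for illustration; they are not taken from Hoegen et al. or from any particular toolkit.

```python
# Hypothetical stop list; real stop lists differ per toolkit.
STOP_WORDS = {"uh", "um", "well", "oh", "yeah", "mhm", "huh"}


def filter_turn(turn):
    """Lowercase, strip basic punctuation, and drop anything on the stop list."""
    tokens = [w.strip("?,.!").lower() for w in turn.split()]
    return [w for w in tokens if w and w not in STOP_WORDS]


# Invented turns: a continuer, a repair initiator, a hedged answer, an agreement.
turns = ["mhm", "huh?", "well uh I am not sure", "yeah"]

filtered = [filter_turn(t) for t in turns]
emptied = sum(1 for f in filtered if not f)
print(filtered, emptied)
```

In this toy case three of the four turns vanish entirely, and the one that survives loses the preface that marked it as hesitant.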

Our 3rd recommendation is to design for diversity. As the case studies show, we cannot assume that well-studied languages tell us everything we need to know. Extending the empirical and conceptual foundations of #nlproc and language technologies will be critical for progress

To escape the reign of the resourceful few, use linguistically diverse data and anticipate a combination of universal and language-specific design principles. This not only ensures broad empirical coverage and enables new discoveries; it also benefits diversity and inclusion, as it enables language technology development that serves the needs of diverse communities and makes technology more inclusive, more humane and more convivial for a larger range of possible users (Munn, 2018; Voinea, 2018). Localizing user interface elements is only a first step; diversity in how and when basic interactional structures are deployed must ultimately be reflected in the design of conversational user interfaces. In the rush for better language technology we should avoid being driven into the arms of only the

Voice user interfaces are ubiquitous, yet still feel stilted; text-based LMs have many applications, yet can't sustain meaningful interaction; and crosslinguistic data & perspectives are in short supply. Our #acl2022nlp paper sits right at the intersection of these challenges

Still of video showing opening page of paper, which is available here:

Cleaning up the youtube autocaptions for our #acl2022nlp preview, it is really uncanny how accurate it is at *not ever transcribing* interjections like "m-hm", "huh?" — a neat illustration of our point that ASR often misses these words

Revisiting this thread to record the official link to the paper in the ACL Anthology (for those of you who like official page numbers):

If you're going to be at ACL you'll find our talk on Underline, but here's a public version of the 12min pre-recorded talk with corrected captions for accessibility — w/ @a_liesenfeld #ACL2022 #nlproc #ACL2022nlp

Sweet: this line from our paper's conclusions was highlighted by @thamar_solorio as a key take-away message at the #acl2022nlp Next Big Ideas plenary session. Here's to more room for linguistic agency and diversity in NLP

By the way, one of the more puzzling #acl2022nlp reviewer comments we got was precisely about that line (among others), and featured a serious charge that @a_liesenfeld and I now often lob at each other: 🚨 "figurative language in evidence" 🚨

Originally tweeted by (@DingemanseMark) on March 23, 2022.

Coordinating social action

📣New! Coordinating social action: A primer for the cross-species investigation of communicative repair. Very happy to present this work w/ stellar coauthors @rapha_heesen @MarlenFroehlich Christine Sievers @mariekewoe, accepted in PhilTrans B 🧵

In this paper we consider the awesome flexibility of communicative repair in human interaction and take a peek under the hood. We ask: what elementary building blocks make this possible?

We find that several of the building blocks are found across species —from gibbons apparently self-correcting to chimps & bonobos showing persistence and elaboration— and introduce a conceptual framework that we hope will foster further comparative work

I've been interested in this topic ever since observing that ways of dealing with communicative trouble pattern within & across species in interesting ways. This year serendipity struck and we were able to get to it with an interdisciplinary team

It was great to work on this with @rapha_heesen @MarlenFroehlich Christine Sievers and @mariekewoe — between us, we represent (at least) psychology, anthropology, primatology, philosophy, psychobiology and the language sciences, which made things all the more fun and interesting

Anyway, while we still seem to have a joint focus of attention, let me just drop this link here again, which (as you can read in the paper) may be a form of persistence if not elaboration — go check it out!

One thing we found is that outside primates, research on sequentially organized social interaction is still rare — most work focuses on acoustics, song structure & ethograms rather than on contingency, sequence & interactional achievement. Lots of opportunities for exciting work!

As we point out in the paper, sequential analysis allows us to unify work on persistence & elaboration in great apes w/ work on repair in humans; and to identify possible continuities or bridging contexts, such as the freeze-look described by @elycorman and @njenfield

One risk of introducing a 'framework' is that it may be interpreted as proposing a simple matrix of ready-to-use labels for reified phenomena. Our goal here is different: we seek to make visible a space of possibilities with room for diversity & gradience

Finally out in print! 📄
🔓 PDF:

Cover page of paper showing title "Coordinating social action: a primer for the cross-species investigation of communicative repair", by authors Raphaela Heesen, Marlen Fröhlich, Christine Sievers, Marieke Woensdregt and Mark Dingemanse

Primates 🦧🧑 are cool, but if there is one thing that I hope our paper will help contribute to it would be a broader interactive turn in communicative ethology across species 🐳🐠🐘🐦🦇 : from signals and their properties to sequential exchanges as an interactional achievement.

Originally tweeted by (@DingemanseMark) on December 15, 2021.

Always plot your data

Always plot your data. We're working with conversational corpora and looking at timing data. Here's a density plot of the timing of turn-taking for three corpora of Japanese and Spanish. At least 3 of the distributions look off (non-normal). But why?

Plotting turn duration against offset provides a clue: in the weird looking ones, there’s a crazy number of turns whose negative offset is equal to their duration — something that can happen if consecutive turns in the data share the exact same end time (very unlikely in actual data).
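The check behind that plot is simple enough to sketch in Python. With offset defined as a turn's start minus the previous turn's end, a negative offset equal to the turn's duration means the two turns end at the same moment. The tuple layout and the toy timings below are assumptions for illustration.

```python
def suspicious_turns(turns):
    """Return indices of turns whose negative offset equals their duration.

    `turns` is a list of (begin_ms, end_ms) tuples in corpus order.
    offset == -duration is equivalent to sharing an end time with the previous turn.
    """
    flagged = []
    for i in range(1, len(turns)):
        prev_end = turns[i - 1][1]
        begin, end = turns[i]
        offset = begin - prev_end
        duration = end - begin
        if duration > 0 and offset == -duration:
            flagged.append(i)
    return flagged


# Toy timings: turns 2 and 4 end at exactly the same time as the turn before them.
turns = [
    (0, 1000),
    (1100, 2000),
    (1500, 2000),
    (2200, 3000),
    (2600, 3000),
    (3100, 3500),
]

flagged = suspicious_turns(turns)
share = len(flagged) / (len(turns) - 1)
print(flagged, share)
```

A high share of flagged turns in a corpus is a sign that the segmentation, not the conversation, is producing the pattern.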

Plotting the actual timing of the turns as a piano roll shows what’s up: the ways turns are segmented and overlap are highly improbable. Imagine a conversation that goes like this! (in red are data points on the diagonal lines above)

Fortunately some of the corpora we have for these languages don’t show this — so we’re using those. If we hadn’t plotted the data in a few different ways it would have been pretty hard to spot, with consequences down the line. So: always plot your data.

Originally tweeted by (@DingemanseMark) on November 6, 2021.

From meaningless numbers to an evidence-based CV

Worth reading: a group of young medics takes a stand against the marketing contest that, in their view, narrative CVs can degenerate into. It is the latest contribution to the Erkennen & Waarderen (Recognition & Rewards) debate. But nothing is what it seems. On evidence-based CVs, quality & quantification

First this: the letter names the risk that narrative CVs turn into a kind of competition between stories. That can certainly happen while the conventions of the genre have not yet crystallized, as I already wrote in 2019, when NWO introduced it. Nobody wants a pretty-stories contest; we agree on that. For that matter, I also agree with what is perhaps the most important point of the first letter, led by Raymond Poot: to measure is to know. You just have to know what it is you are measuring. That is also what this piece is about.

The medics (both these younger colleagues and the seniors led by @raymondpoot in the opening salvo) mainly seem to object to the term “narrative CV”. Appearances are against it too, of course: are we going to sit around the campfire telling each other tall tales? Surely not! According to the letter writers, a scientist in the new system has to write something about her background & achievements “in a distinctive way” and “without using quantitative measures”. Fact check: ❌ Quantification in the narrative CV is fine, even desirable!

Let’s pull up the NWO call itself: here is the PDF. The part that matters (§3.4.1, sections 1 and 2) I paste below

Evidence-based CV

If the term “narrative CV” is not to your liking, you can also call it an evidence-based CV: instead of context-free lists & numbers, the aim is to see arguments for the excellence of the candidate & her work, backed up by qualitative and quantitative evidence of impact.

Because look: both quantitative and qualitative indicators are explicitly allowed. You would not have gathered that from the letters by Raymond Poot & fellow medics. The crucial difference is that indicators must clearly relate to specific items: “All types of quality indicators may be mentioned, as long as they relate to only one output item.”

What is good about this, 1: Where previously the complete publication list could be dumped in (which mainly benefits prolific writers), this format asks for a motivated selection of 10 items: the n-best method that is common at Ivy League universities. Nothing wrong with that!

What is good about this, 2: Where previously you could show off with journal-level metrics like the IF (statistically no more than a dressed-up halo effect), you now have to deliver hard evidence for the impact & importance of your work.

What is good about this, 3: Where previously you could flaunt a high h-index (not corrected for head starts due to age, co-authorship, self-citations & other biases), you now get to show which of your papers really are that brilliant & original.

That quantification is no longer allowed is nonsense

To my mind these are also 3 ways in which an evidence-based CV offers more opportunities precisely for the ‘vulnerable groups’ the letter mentions. (And also: 3 ways in which the head start of the traditionally privileged is evened out somewhat. Isn’t that also part of the pain?)

In short, the idea that quantification would no longer be allowed is nonsense. You just can no longer get away with the most indirect figures (which mainly say something about privileges, patronage and co-authors); instead you now have to deliver hard evidence for the impact & importance of your work.

I must say: the misunderstandings in the letters do not come entirely out of the blue. “Narrative CV” is not the best of terms, and there is apparently a lack of strong examples of responsible & nuanced quantification at the article level. Work to do for Erkennen en Waarderen (Recognition & Rewards) and NWO!

Finally: all the letter writers, from @raymondpoot and colleagues to @DeJongeAkademie, @RadboudYA etc. to the young medics, agree that the steady depletion of funding is the real death blow for top science in our country: more investment in fundamental research is crucial

Addition, 11 May 2022:

Well, my argument in this thread, or at least the term ‘evidence-based CV’, seems to have found an audience at NWO. Where on my CV shall I put that? 😃

Originally tweeted by (@DingemanseMark) on July 27, 2021.

Linguistic roots of connectionism

This Lingbuzz preprint by Baroni is a nice read if you’re interested in linguistically oriented deep net analysis. I did feel it’s a bit hampered by the near-exclusive equation of linguistic theory with generative/Chomskyan approaches. (I know it makes a point of claiming a “very broad notion of theoretical linguistics”, but it doesn’t really demonstrate this, and throughout the implicit notion of theory is near-exclusively aligned with GG and its associated concerns of competence, poverty of the stimulus, et cetera).

For instance, it notes (citing Lappin) that theoretical linguistics “played no role” in deep learning for NLP, but while this may hold for generative grammar (GG), linguistic theorizing was much broader than that right at the start of connectionism and RNNs, e.g. in Elman 1991.

In fact, just look at the bibliography of Elman’s classic RNN work and tell us again how exactly theoretical linguistics “played no role”: Bates & MacWhinney, Chomsky, Fillmore, Fodor, Givón, Hopper & Thompson, Lakoff, Langacker, they’re all there. Elman’s bibliography is a virtual Who’s Who of big tent linguistics at the start of the 1990s. The only way to give any content to Lappin’s claim (and by extension, Baroni’s generalization) is to give the notion of “theoretical linguistics” the narrowest conceivable reading.

However, Baroni’s point may generalize: perhaps modern-day usage-based, functional, and cognitive approaches to ling theory aren’t drawing as heavily on current NLP/ML/DL work as they could either. Might a lack of reciprocity play a role? After all, the well known ahistoricism and lack of interdisciplinary engagement of NLP today does not exactly invite productive exchange. (Though some of us try.)

The theory=Chomsky equation also makes its appearance at the end, where Baroni muses about incorporating storage, retrieval, gating and attention in theories of language. Outside the confines of Chomskyan linguistics folks have long been working on precisely such things. One might think work by Joan Bybee, Maryellen MacDonald, Morten Christiansen, and others might merit a mention!

In sum, Baroni’s piece provides an informative if partial review of recent work and includes bold proposals (e.g., deep nets as algorithmic linguistic theories), worth reading if you’re interested in a particular kind of linguistics. Consider pairing it with this well-aged bottle of Elman 1991!


  • Baroni, M. (2021, June). On the proper role of linguistically-oriented deep net analysis in linguistic theorizing. LingBuzz. Retrieved from
  • Bybee, J. L. (2010). Language, Usage, and Cognition. Cambridge: Cambridge University Press.
  • Christiansen, M. H., & Chater, N. (2017). Towards an integrated science of language. Nature Human Behaviour, 1, 0163. doi: 10.1038/s41562-017-0163
  • Elman, J. L. (1991). Distributed Representations, Simple Recurrent Networks, And Grammatical Structure. Machine Learning, 7, 195–225. doi: 10.1023/A:1022699029236
  • Lappin, S. (2021). Deep learning and linguistic representation. Boca Raton: CRC Press.
  • MacDonald, M. C., & Christiansen, M. H. (2002). Reassessing working memory: Comment on Just and Carpenter (1992) and Waters and Caplan (1996). Psychological Review, 109(1), 35–54. doi: 10.1037/0033-295X.109.1.35

Originally tweeted by (@DingemanseMark) on June 17, 2021.