Thinking visually with Remarkable

Sketches, visualizations and other forms of externalizing cognition play a prominent role in the work of just about any scientist. It’s why we love using blackboards, whiteboards, notebooks and scraps of paper. Many folks who had the privilege of working with the late Pieter Muysken fondly remember his habit of grabbing any old piece of paper that came to hand, scribbling while talking, then handing it over to you.

Since the summer of 2021 I have owned a Remarkable, and it has become an essential part of my scientific workflow because it seamlessly bridges this physical form of thinking with the digital world of drafts, files and emails. I rarely rave about tools (to each their own, etc.) but this is one of those that has changed my habits for the better in several ways: I’ve been reading more, taking more notes, writing more, and also doodling and sketching more. As a cognitive scientist I would describe it as a distraction-free piece of technology with just the right affordances for powerful forms of extended cognition (it is probably no coincidence that it was recommended to me by fellow traveller Sébastien Lerique, whose interests range from embodied rationality to interaction).

One of the ways in which the Remarkable has changed my workflow and my collaborations is that it is much easier to sketch a basic idea for a visualization and share it digitally. We use this during brainstorms to produce first impressions or visualize hypotheses. Often such a rough sketch then functions as a placeholder in a draft until we’ve made an actual version based on our data.

The above example from a recent paper with Andreas Liesenfeld shows this process: first my rough sketch of what the plot might look like, which fuels our discussion and helps me to express how to transform our source data in R. Then a ggplot version I made in R that preserves the key idea and adds some bells and whistles like loess lines and colour.
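For what it’s worth, the recipe for going from sketch to figure is short in ggplot. Here is a minimal sketch with toy data; the column names (x, y, language) are placeholders rather than the variables from our actual paper:

```r
library(ggplot2)

# Toy data standing in for the real source data; columns are hypothetical
set.seed(42)
df <- data.frame(
  x        = rep(1:100, 2),
  y        = c(cumsum(rnorm(100)), cumsum(rnorm(100, mean = 0.1))),
  language = rep(c("A", "B"), each = 100)
)

# Points for the raw observations, a loess trend line per group, colour to tell groups apart
ggplot(df, aes(x, y, colour = language)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +
  theme_minimal()
```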

I want to credit my collaborator Andreas Liesenfeld for pushing me to do more of this visual-first way of thinking. One of the things Andreas often asks when brainstorming about a new paper is: “okay but what’s the visual?”. Thinking early about compelling visualizations has made our papers more tightly integrated hybrids of text and visuals than they might otherwise have been. For instance, our ACL paper has 7 figures, approximately one to a page, that support the arguments, help organize the flow, and generally make for a nicer reading experience.

Conceptual frameworks

Sketches can also be useful to work out conceptual frameworks. In a recent collaboration with Raphaela Heesen, Marlen Fröhlich, Christine Sievers and Marieke Woensdregt we spent a lot of time talking about ways to characterize various types of communicative “redoings” across species. A key insight was that the variety of terms used in different literatures (e.g. primatology vs. human interaction) could actually be linked by looking more closely at the sequential structure of communicative moves. I sent off a quick Saturday morning doodle to my collaborators, and ultimately we published a polished version of it in our paper on communicative redoings across species (PDF here).

Finally, sketches are useful to express ideas and hypotheses visually even before the data is in. For instance, in current work with Bonnie McLean and Michael Dunn we’re thinking a lot about transmission biases and how they influence cultural evolution over time. Bonnie’s dataset looks at biases and rates of change in how concepts relate to phonemic features. Sketching has helped me express my thinking on this visually, and I can’t wait to see what Bonnie ultimately comes up with. (This visualization is inspired in part by something I read about parallax in Nick Sousanis’ amazing book Unflattening.)

Sketch showing three panels side by side. On the left, a plot showing a time series with a multitude of grey lines in the lower range and a single black line rising above the grey mass to occupy a distinctly higher position on the Y axis.

In the middle, a skewed square with points corresponding to the end points of all the lines in the left panel, suggesting that it is a sliver of the end of the first plot.

On the right, the middle panel turned towards the reader into a square X-Y plot with a mass of grey dots joined by isolines roughly in the middle and a solitary black dot in the top right.

Not a review

This is not a review of the Remarkable — just a reflection on how it’s changed my academic life for the better. Every device has pros and cons. For instance, I don’t particularly love the overpriced stylus (‘Marker plus’) or how they sell Connect subscriptions for slightly better syncing options — though you should be aware you don’t need a subscription to do any of the things I’ve described in this post. And on the other hand, I absolutely do love the litheness of this device, the just-right friction when writing, and the fact that it has no backlight. The design in general strikes me as a perfect embodiment of what philosopher Ivan Illich has called ‘convivial tools’: tech that is sophisticated yet also responsibly limited in ways that support human flourishing. Anyway, there’s a good Remarkable subreddit if you’re in the market for a device like this.

Note. Remarkable has a referral program that gives you a $40 (or equivalent) discount if you use this link to purchase one. If you like the device and keep it, that would also mean I earn $40, which I would use to treat my team to fancy coffee and cakes!

Monetizing uninformation: a prediction

Over two years ago I wrote about the unstoppable tide of uninformation that follows the rise of large language models. With ChatGPT and other models bringing large-scale text generation to the masses, I want to register a dystopian prediction.

Of course OpenAI and other purveyors of stochastic parrots are keeping the receipts of what they generate (perhaps full copies of generated output, perhaps clever forms of watermarking or hashing). They are doing so for two reasons. First, to mitigate the (partly inevitable) problem of information pollution. With the web forming a large part of the training data for large language models you don’t want these things to feed on their own uninformation. Or at least I hope they’re sensible enough to want to avoid that.

But the second reason is to enable a new form of monetization. Flood the zone with bullshit (or facilitate others doing so), then offer paid services to detect said bullshit. (I use bullshit as a technical term for text produced without commitment to truth values; see Frankfurt 2009.) It’s guaranteed to work because as I wrote, the market forces are in place and they will be relentless.

Universities will pay for it to check student essays, as certification is more important than education. Large publishers will likely want it as part of their plagiarism checks. Communication agencies will want to claim they offer certified original human-curated content (while making extra money with a cheaper tier of LLM-supported services, undercutting their own business). Google and other behemoths with an interest in high quality information will have to pay to keep their search indexes relatively LLM-free and fight the inevitable rise of search engine optimized uninformation.

Meanwhile, academics will be antiquarian dealers of that scarce good of human-curated information, slowly and painstakingly produced. My hope is that they will devote at least some of their time to what Ivan Illich called counterfoil research:

Present research is overwhelmingly concentrated in two directions: research and development for breakthroughs to the better production of better wares and general systems analysis concerned with protecting [hu]man[ity] for further consumption. Future research ought to lead in the opposite direction; let us call it counterfoil research. Counterfoil research also has two major tasks: to provide guidelines for detecting the incipient stages of murderous logic in a tool; and to devise tools and tool-systems that optimize the balance of life, thereby maximizing liberty for all.

Illich, Tools for Conviviality, p. 92

  • Frankfurt, H. G. (2009). On Bullshit. Princeton University Press. doi: 10.1515/9781400826537
  • Illich, I. (1973). Tools for conviviality. London: Calder and Boyars.

Deep learning, image generation, and the rise of bias automation machines

DALL-E, a new image generation system by OpenAI, does impressive visualizations of biased datasets. I like how the first example that OpenAI used to present DALL-E to the world is a meme-like koala dunking a basketball leading into an array of old white men — representing at one blow the past and future of representation and generation.

It’s easy to be impressed by cherry-picked examples of DALL•E 2 output, but if the training data is web-scraped image+text data (of course it is) the ethical questions and consequences should command much more of our attention, as argued here by Abeba Birhane and Vinay Uday Prabhu.

Suave imagery makes it easy to miss what #dalle2 really excels at: automating bias. Consider what DALL•E 2 produces for the prompt “a data scientist creating artificial general intelligence”:

When the male bias was pointed out to AI lead developer Boris Power, he countered that “it generates a woman if you ask for a woman”. Ah yes, what more could we ask for? The irony is so thicc on this one that we should be happy to have ample #dalle2 generated techbros to roll eyes at. It inspired me to make a meme. Feel free to use this meme to express your utter delight at the dexterousness of DALL-E, cream of the crop of image generation!

The systematic erasure of human labour

It is not surprising that glamour magazines like Cosmopolitan, self-appointed suppliers of suave imagery, are the first to fall for the gimmicks of image generation. As Karen Cheng, who created the cover, found out after thousands of tries, it generates a woman if you ask for “a female astronaut with an athletic feminine body walking with swagger” (Figure 3).

I also love this triptych because of the evidence of human curation in Cheng’s tweet (“after thousands of options, none felt quite right…”) — and the glib erasure of exactly that curation in the subtitle of the magazine cover: “and it only took 20 seconds to make”.

The erasure of human labour holds for just about every stage of the processing-to-production pipeline of today’s image generation models: from data collection to output curation. Believing in the magic of AI can only happen because of this systematic erasure.

Figure 3

Based on a thread originally tweeted by @dingemansemark@scholar.social (@DingemanseMark) on April 7, 2022.

Sometimes precision gained is freedom lost

Part of the struggle of writing in a non-native language is that it can be hard to intuit the strength of one’s writing. Perhaps this is why it is especially gratifying when generous readers lift out precisely those lines that took hard work to streamline — belated thanks!

Interestingly, the German translation for Tech Review needed twice as many words for the same point: “Ein Mehr an Präzision bedeutet manchmal ein Weniger an Freiheit.” I’m still wondering whether that makes it more precise or less.

  • Dingemanse, M. (2020, August). Why language remains the most flexible brain-to-brain interface. Aeon. doi: 10.5281/zenodo.4014750

Talk, tradition, templates: a meta-note on building scientific arguments

Chartres cathedral (Gazette Des Beaux-Arts, 1869)

Reading Suchman’s classic Human-machine reconfigurations: plans and situated actions, I am struck by what she writes on the performative and interactional achievement of the construction of gothic cathedrals, as studied by David Turnbull. In brief, the intriguing point is that no blueprints or technical drawings or even sketches are known to have existed for any of the early gothic cathedrals, like that of Chartres. Instead, Turnbull proposes, their construction was massively iterative and interactional, requiring —he says— three main ingredients: “talk, tradition, templates”. Each of these is well summarized by Suchman. This sounds like an account worth reading; indeed perhaps also worth emulating or building on. In the context of the language sciences, an analogue readily suggests itself. Aren’t languages rather like cathedrals — immense, cumulative, complex outcomes of iterative human practice?

Okay nice. At such a point you can go (at least) two ways. You can take the analogy and run with it, taking Turnbull’s nicely alliterative triad and asking, what are “talk, traditions, and templates” for the case of language? It would be a nice enough paper. The benefit would be that you make it recognizably similar and so if the earlier analysis made an impact, perhaps some of its success may rub off on yours. The risk is that you’re buying into a triadic structure devised for a particular rhetorical purpose in the context of one particular scientific project.

Going meta

The second way is to ‘go meta’ and ask, if this triad is a useful device to neatly and mnemonically explain something as complex as gothic cathedrals, what is the kind of rhetorical structure we need to make a point that is as compelling as this (in both form and content) for the domain we are interested in (say, language)? See, and I like that second move a lot more. Because you’ve learnt from someone else’s work, but on a fairly abstract level, without necessarily reifying the particular distinctions or terms they brought to bear on their phenomenon.

While writing these notes I realise that in my reading and reviewing practice, I also tend to judge scientific work on these grounds (among others). Does it work with (‘apply’) reified distinctions in an unexamined way, or does it go a level up and truly build on others’ work? Does it treat citations perfunctorily and take frameworks as given, or does it reveal deep reading and critical engagement with the subject matter? The second approach, to me, is not only more interesting — it is also more likely to be novel, to hold water, to make a real contribution.

Further reading

  • Gould, S. J. (1997). The exaptive excellence of spandrels as a term and prototype. Proceedings of the National Academy of Sciences, 94(20), 10750–10755. doi: 10.1073/pnas.94.20.10750
  • Suchman, L. A. (2007). Human-machine reconfigurations: Plans and situated actions (2nd ed.). Cambridge; New York: Cambridge University Press.
  • Turnbull, D. (1993). The Ad Hoc Collective Work of Building Gothic Cathedrals with Templates, String, and Geometry. Science, Technology, & Human Values, 18(3), 315–340. doi: 10.1177/016224399301800304

A serendipitous wormhole into the history of Ethnomethodology and Conversation Analysis (EMCA)

A serendipitous wormhole into #EMCA history. I picked up Sudnow’s piano course online and have been diligently working through the lessons. Guess what he says some time into the audio-recorded version of his 1988 Chicago weekend seminar (see lines 7-11)

[Chicago, 1988. Audio recording of David Sudnow’s weekend seminar]

We learn too quickly and cannot afford to contaminate a movement by making a mistake.

People who type a lot have had this experience. You type a word and you make a mistake.

I have been involved, uh of late, in: a great deal of correspondence in connection with uh a deceased friend’s archives of scholarly work and what should be done with that and his name is Harvey. And about two months ago or three months ago when the correspondence started I made a mistake when I ( ) taped his name once and I wrote H A S R V E Y, >jst a mistake<.

I must’ve written his name uh two hundred times in the last few months in connection with all the letters and the various things they were doing. Every single time I do that I get H A S R V E Y and I have to go back and correct the S. I put it in the one time and my hands learned a new way of spelling Harvey. I call ‘m Harvey but my hands call ‘m Hasrvey.

And they learned it that one time. Right then and there, the old Harvey got replaced and a new Harvey, spelled H A S R V E Y got put in. So we learn very fast.

Folks who know #EMCA history will notice this is right at the height of the activity of the Harvey Sacks Memorial Association, when Sudnow, Jefferson, Schegloff, and others were exchanging letters on Sacks’ Nachlass, intellectual priority in CA, and so on

We have here a rare first person record of the activity that Gail Jefferson obliquely referred to in her acknowledgement to the posthumously published Sacks lectures (“With thanks to David Sudnow who kick-started the editing process when it had stalled”), and much more explicitly in a 1988 letter (paraphrased in Button et al. 2022).

Historical interest aside, I like how the telling demonstrates Sudnow’s gift for first-person observation — a powerful combination of ethnomethodology and phenomenology that is also on display in his books, Pilgrim in the Microworld and Ways of the Hand #EMCA

Originally tweeted by @dingemansemark@scholar.social (@DingemanseMark) on October 6, 2022.

The perils of edited volumes

Ten years ago, fresh out of my PhD, I completed three papers. One I submitted to a regular journal; it came out in 2012. One was for a special issue; it took until 2017 to appear. One was for an edited volume; the volume is yet to appear.

These may be extreme cases, but I think they reflect quite well the relative risks for early career researchers (in linguistics & perhaps more widely) of submitting to regular journals vs special issues vs edited volumes.

Avoiding the latter is not always possible; in linguistics, handbooks still have an audience. If I could advise my 2012 self, I’d say: 1. always preprint your work; 2. privilege online-first & open access venues; 3. use #RightsRetention statements to keep control over your work.

A natural experiment

Anyway, these three papers also provide an interesting natural experiment on the role of availability for reach and impact. The first, Advances in the cross-linguistic study of ideophones, now has >400 cites according to Google Scholar, improbably making it one of the most cited papers in its journal. This paper has done amazingly well.

The second, Expressiveness and system integration, has >50 cites and was scooped by a paper on Japanese that I wrote with Kimi Akita. We wrote that second paper two years after the first, but it appeared one year before it, if you still follow the chronology. As linguistics papers go, I don’t think it has done all that bad, especially considering that its impact was stunted by being in editorial purgatory for 4 years.

The third, “The language of perception in Siwu”, has only been seen by three people and cited by one of them (not me). I am not sure if or when it will see the light of day.

Always plot your data

Always plot your data. We're working with conversational corpora and looking at timing data. Here's a density plot of the timing of turn-taking for three corpora of Japanese and Spanish. At least 3 of the distributions look off (non-normal). But why?

Plotting turn duration against offset provides a clue: in the weird looking ones, there’s a crazy number of turns whose negative offset is equal to their duration — something that can happen if consecutive turns in the data share the exact same end time (very unlikely in actual data).
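A minimal sketch of that check, assuming hypothetical begin/end columns for turn boundaries in milliseconds (the real corpora will have their own annotation format):

```r
library(dplyr)

# Toy data; turn 2 starts before turn 1 ends and shares its end time
turns <- data.frame(
  begin = c(0, 700, 1500, 2100),
  end   = c(1000, 1000, 2300, 2600)
)

turns <- turns %>%
  arrange(begin) %>%
  mutate(duration = end - begin,
         offset   = begin - lag(end))  # gap (+) or overlap (-) with the previous turn

# Flag turns whose negative offset equals their duration,
# i.e. turns that end at exactly the same time as the preceding turn
filter(turns, offset == -duration)
```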

Plotting the actual timing of the turns as a piano roll shows what’s up: turns are segmented and overlap in highly improbable ways — imagine a conversation that goes like this! (In red are the data points on the diagonal lines above.)
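And a sketch of the piano-roll view, again with toy data and hypothetical column names:

```r
library(ggplot2)

# Each turn drawn as a horizontal bar on its speaker's row
turns <- data.frame(
  speaker = c("A", "B", "A", "B"),
  begin   = c(0, 700, 1500, 2100),
  end     = c(1000, 1000, 2300, 2600)
)

ggplot(turns, aes(x = begin, xend = end, y = speaker, yend = speaker)) +
  geom_segment(linewidth = 4) +
  labs(x = "time (ms)", y = NULL) +
  theme_minimal()
```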

Fortunately some of the corpora we have for these languages don’t show this — so we’re using those. If we hadn’t plotted the data in a few different ways it would have been pretty hard to spot, with consequences down the line. So: always plot your data.

Originally tweeted by @dingemansemark@scholar.social (@DingemanseMark) on November 6, 2021.

Why article-level metrics are better than JIF if you value talent over privilege

I’ve been caught up in a few debates recently about Recognition and Rewards, a series of initiatives in the Netherlands to diversify the ways in which we recognize and reward talent in academia. One flashpoint was the publication of an open letter signed by ~170 senior scientists (mostly from medical and engineering professions), itself written in response to two developments. First, the 2019 shift towards a “narrative CV” format in grant applications for the Dutch Research Council (NWO), as part of which applicants are asked to show evidence of the excellence, originality and impact of their work using article-level metrics instead of journal level metrics like the Journal Impact Factor (JIF). Second, the recent announcement of Utrecht University (signatory of the Declaration on Research Assessment) to abandon the JIF in its hiring and promotion processes (see coverage).

Why funders in search of talent are ditching the JIF

Some background will be useful. The decision to not use JIF for the evaluation of researchers and their work is evidence-based. There is a lot of work in bibliometrics and beyond showing that a 2-year average of a skewed citation distribution is an imperfect measure of journal quality, a driver for perverse incentives, and a misleading proxy for the quality of individual papers. Indeed Clarivate itself, the for-profit provider of the JIF metric, has this to say about it: “In the case of academic evaluation for tenure, it is inappropriate to use a journal-level metric as a proxy measure for individual researchers, institutions, or articles”.

Despite this evidence, JIFs have long been widely used across the sciences not just as a way to tell librarians which journals are making waves (= what they were designed for) but also as a quick heuristic to judge the merits of work appearing in them or people publishing in them. As they say, ‘Don’t judge a book by its cover, but do judge scientific work by its JIF’. There is a considerable halo-effect attached to JIFs, whereby an article that ends up in a high IF journal (whether by sheer brilliance or simply knowing the right editor, or both) is treated, unread, with a level of veneration normally reserved for Wunderkinder. Usually this is done by people totally oblivious to network effects, gatekeeping and institutional biases.

It appears that the decision to explicitly outlaw the use of JIFs now has people coming out of the woodwork to protest. The first letter (and also another one by a number of junior medical scientists) is aimed specifically at the prohibition against using the JIF, which is (incorrectly) framed as a ban on all quantification. The feeling is that this deprives us of a valuable (if inexact) metric that has long been used as a quick heuristic of the ‘value’ or ‘quality’ of work.

‘Halo? What halo?’

Raymond Poot, main author of the first letter, strongly believes that the JIF, even if inexact, should not be ditched. Saying, “Let’s talk JIF”, he provides this diagram of citation distributions in support:

The diagram compares the citation distributions of Nature and PLOS ONE (an open access megajournal). Poot’s argument, if I understand it well, is that even if Nature’s JIF is skewed by a few highly cited papers, the median number of citations is still higher, at 13, than the median number of cites that PLOS ONE papers receive (which looks like 1). As Poot says in reference to an earlier tweet of mine on the halo-effect, ‘Halo? What halo?’.

We want to identify and reward good work wherever it appears

We’ll get to that halo. First things first. We’re talking about whether using the JIF (a journal’s 2-year citation average) is a good idea if you want to identify and reward good individual work. And especially whether using the JIF is better or worse than using article-level metrics. Another assumption: we care about top science so we would like to identify good work by talented people wherever it appears. Analogy: we want to scout everywhere, not just at the fancy private school where privileges and network can obscure diverse and original talent.

Let’s assume the figure represents the citation distributions reasonably well (I’m going to ignore the obvious folly of taking an average of a very skewed and clearly not unimodal distribution). Where is the JIF halo? Right in front of you, where it says, for publication numbers, “in thousands for PLOS ONE”. Publication volume differs by an order of magnitude. This diagram hides that by heavily compressing the PLOS distribution, which is never good practice for responsible visualization, so let’s fix that. We’ll lose exact numbers (they’re hard to get) but the difference is large enough for this to work whatever the numbers.

The enormous difference in sheer volume means that an OA megajournal is likely to have quite a few papers with more cites than the Nature median — high impact work that we would miss entirely if we focused only on the JIF. The flip side is where we find the halo effect: there are, in any given year, hundreds of Nature papers that underperform quite a bit relative to the IF (indeed half of them underperform relative to the median). This —the skewed distributions for both the megajournal and the glamour journal— shows why it is a bad idea to ascribe properties to individual papers based on how other papers published under the same flag have been cited.
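To see how this plays out, it can help to play with made-up numbers. The simulation below uses invented citation distributions and volumes (nothing here is real data); the only point is that once publication volume differs by an order of magnitude, the megajournal ends up with plenty of papers above the selective journal’s median, while roughly half of the selective journal’s papers sit below their own median by definition:

```r
set.seed(1)

# Invented numbers, chosen only to mimic the qualitative picture:
# a small selective journal vs. a megajournal with ~20x the volume
selective   <- rnbinom(1000,  mu = 20, size = 1.2)   # ~1,000 papers
megajournal <- rnbinom(20000, mu = 3,  size = 0.8)   # ~20,000 papers

median(selective)                      # the JIF-style midpoint of the selective journal
sum(megajournal > median(selective))   # megajournal papers outperforming that midpoint
mean(selective < median(selective))    # share of selective-journal papers below their own median
```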

On average, my paper is better than yours

“But still, surely on average Nature papers are…” — besides the point. I would rather give a talent grant to the bright student who made their way through the public school system (an over-performing PLOS paper) than to the one who dangles at the bottom of the distribution of their privileged private school (an underperforming Nature paper). Identifying talent on the basis of JIF instead of content or impact is like handing out bonus points to private school essays in the central exam. “But on average those elite schools do tend to do better don’t they?” Unsurprisingly, they do, and if you think such differences are meaningful or worth further reinforcement, it’s worth reading some more sociology, starting perhaps with the diversity-innovation paradox.

There are other issues with our hyperfocus on glamour journals. These journals like to publish good work but they also apply some highly subjective filters (selecting for ‘broad appeal’ or ‘groundbreaking’ research — phrases that will sound familiar from the desk rejects that, statistically speaking, many readers in academia will have seen). Nature prides itself on an 8% acceptance rate, the same chances that we rightly call a lottery when it concerns grant proposals. Being overly selective inevitably means that you’ll miss out on top performers. One recent study concluded that this kind of gate-keeping often leads us to miss highly impactful ideas and research:

However, hindsight reveals numerous questionable gatekeeping decisions. Of the 808 eventually published articles in our dataset, our three focal journals rejected many highly cited manuscripts, including the 14 most popular; roughly the top 2 percent. Of those 14 articles, 12 were desk-rejected. This finding raises concerns regarding whether peer review is ill-suited to recognize and gestate the most impactful ideas and research.

Gatekeepers of course also introduce their own networks, preferences and biases with regard to the disciplines, topics, affiliations, and genders they’re more likely to favour. In this context, Nature has acknowledged the sexism of how its editorial boards are constituted, and as the New York Times wrote last year, the publishing process at top journals is “deeply insular, often hinging on personal connections between journal editors and the researchers from whom they solicit and receive manuscripts”.

From smoke and mirrors to actual article-level impact

“But doesn’t my Nature paper count for anything?” I sure hope it does. And the neat thing is, under the new call for evidence-based CVs you can provide evidence and arguments instead of relying on marketing or association fallacies. Do show us what’s so brilliant and original about it. Do tell us about your contributions to the team, about the applications of your work in industry, and about the relative citation ratio of your paper. Indeed, such article-level metrics are explicitly encouraged as a direct indicator of impact and originality. To spell it out: a PLOS ONE paper that makes it to the 1% most cited papers of its field is more telling than, say, a Nature paper of the same age that has managed to accrue a meagre 30 cites. An evidence-based CV format can show this equally for any type of scientific output, without distracting readers with the smoke and mirrors of the JIF.

Scientists are people, and people are easily fooled by marketing. That is going to be the case whether we mention the JIF or not. (The concerned medical scientists writing the letters know full well that most grant reviewers will know the “top” journals and make inferences accordingly.) The purpose of outlawing the JIF is essentially a nudge, designed to make evaluators reflect on this practice, and inviting them to look beyond the packaging to the content and its actual impact. I can only see this as an improvement — if the goal is to identify truly excellent, original and impactful work. Content over silly bean counts. True impact over halo effects.

If you want to find actual impact, look beyond the JIF

I have focused so far on PLOS ONE and Nature because that’s the example provided by Raymond Poot. However, arguably these are two extremes in a very varied publishing landscape. Most people will seek more specialised venues or go for other multidisciplinary journals. But the basic argument easily generalizes. Most journals’ citation distributions will overlap more than those of PLOS ONE and Nature. For instance, let’s take three multidisciplinary journals titrated along the JIF ranks: Science Advances (14.1), PNAS (11.1), and Scientific Reports (4.4). Set up by Nature to capture some of the market share of OA megajournals, Scientific Reports is obviously less artificially selective than the other two. And yet its sheer publication volume means that a larger number of high impact papers appear in Scientific Reports than in PNAS and Science Advances combined! This means, again, that if you want to find high impact work and you’re just looking at high IF journals, you’re missing out.

Trying to find good or impactful work on the basis of the JIF is like searching for your keys under the streetlight because that’s where the light is. Without it, we stand a better chance of identifying truly groundbreaking work across the board — and fostering diversity and innovation in the process.

Caveats. I’ve used article-level citations as a measure of impact here because they most directly relate to the statistically illiterate but widespread use of the JIF to make inferences about individual work or individual researchers. However, citations come with a time lag, are subject to known biases against underrepresented minorities, and are only one of multiple possible measures of originality, reach and impact. Of course, to the extent that you think this makes actual citations problematic as indicators of article-level impact or importance, it means the JIF is even more problematic.

From meaningless numbers to an evidence-based CV

Worth reading: a group of junior medical researchers objects to the marketing contest that, in their view, narrative CVs can degenerate into. It is the latest contribution to the Recognition & Rewards debate. But nothing is what it seems. A thread on evidence-based CVs, quality & quantification.

First this: the letter names the risk that narrative CVs turn into a kind of competition between stories. That can certainly happen as long as the conventions of the genre have not yet crystallized, as I wrote back in 2019 when NWO introduced the format. Nobody wants a best-story contest; on that we agree. In that respect I also agree with perhaps the most important point of the first letter led by Raymond Poot: to measure is to know. You just have to know what it is you are measuring. That is what this piece is about, too.

The medical researchers (both these junior colleagues and the seniors led by @raymondpoot in the opening salvo) seem to object above all to the term “narrative CV”. Admittedly the term has appearances against it: are we now going to sit around the campfire telling each other tall tales? Surely not! According to the letter writers, in the new system a scientist has to write about her background & achievements “in a distinctive way” and “without using quantitative measures”. Fact check: ❌ Quantification in the narrative CV is fine, indeed welcome!

Let’s just pull up the NWO call itself: here is the PDF; the relevant passage (§3.4.1, sections 1 and 2) is pasted below.

Evidence-based CV

If the term “narrative CV” doesn’t sit well with you, you can also call it an evidence-based CV: instead of context-free lists & numbers, the panel wants to see arguments for the excellence of the candidate & her work, backed up by qualitative and quantitative evidence of impact.

Because look: both quantitative and qualitative indicators are explicitly allowed. You would not have gathered that from the letters by Raymond Poot & his fellow medics. The crucial difference is that indicators must clearly pertain to specific items: “All types of quality indicators may be mentioned, as long as they pertain to just one output item.”

What’s good about this, 1: Where previously you could dump your complete publication list (which mainly benefits prolific writers), this format asks for a motivated selection of 10 items: the n-best method that is common at Ivy League institutions. Nothing wrong with that!

What’s good about this, 2: Where previously you could show off with journal-level metrics like the IF (statistically speaking no more than a dressed-up halo effect), you now have to provide hard evidence for the impact & importance of your work.

What’s good about this, 3: Where previously you could parade a high h-index (not corrected for head starts due to age, co-authorship, self-citations & other biases), you can now show which of your papers really are that brilliant & original.

The claim that quantification is no longer allowed is nonsense

To my mind these are also 3 ways in which an evidence-based CV offers more opportunities precisely to the ‘vulnerable groups’ the letter mentions. (And also: 3 ways in which the head start of the traditionally privileged is somewhat evened out; isn’t that part of the pain as well?)

In short, the claim that quantification is no longer allowed is nonsense. You just can no longer get away with the most indirect numbers (which mainly reflect privileges, connections and co-authors); instead you now have to provide hard evidence for the impact & importance of your work.

I should say: the misunderstandings in the letters do not come out of nowhere. “Narrative CV” is not a great term, and there is evidently a lack of strong examples of responsible & nuanced quantification at the article level. Work to do for Recognition & Rewards and NWO!

Finally: all of the letter writers, from @raymondpoot and colleagues to @DeJongeAkademie, @RadboudYA and the junior medics, agree that the erosion of research funding is the real death blow for top science in this country: more investment in fundamental research is crucial.

Addendum, 11 May 2022:

Well, my argument in this thread, or at any rate the term ‘evidence-based CV’, seems to have found a receptive ear at NWO. Where on my CV should I put that? 😃

Originally tweeted by @dingemansemark@scholar.social (@DingemanseMark) on July 27, 2021.