Putting interaction centre-stage

I’ve been taking part (virtually) in a workshop today at the Cognitive Science conference in Sydney entitled “Putting interaction center-stage for the study of knowledge structures and processes”.

I kicked off the workshop with a summary of our Beyond Single-Mindedness manifesto. This was followed by Nick Enfield, who argued that concepts are necessarily social-relational, and by Joanna Rączaszek-Leonardi & Julian Zubek, who drew attention to the importance of first-person experiences of active agents as they couple with others in interaction. These talks were bundled in a ‘theoretical’ block, though each of them also had empirical components.

One feeling I had during the discussion following the talks is that it’s too easy to get muddled in theoretical distinctions and philosophical musings, and THAT is where interaction offers firm empirical grounding. If I look at the work of Lucy Suchman, Ed Hutchins, Gail Jefferson, or Linda Smith — it’s the direct empirical grounding afforded by looking at rich records of interaction that makes it possible to achieve real theoretical and conceptual progress.

For instance, Lucy Suchman (1987), by carefully studying how people interact with photocopiers, was able to single-handedly upend the classical cognitive science agenda of plans as individual representations and instead show compellingly how they emerge as situated actions. From that empirical work we can then derive concepts like the contingent co-production of a shared material world (Suchman’s definition of interaction).

As we write in Beyond Single-Mindedness (following Wittgenstein), interaction offers a form of direct empirical access to the multiscale dynamics of cognition that is hard to get otherwise. In that sense, interaction is a privileged locus for cognitive scientists interested in interactional resources and cognitive processes. Looking at it and closely observing it ought to be our first stop, not an afterthought.


The second half of the workshop was devoted to methodological contributions: Michael Richardson & Rachel Kallen on nonlinear modelling and machine learning approaches to predicting movement and action; Veronica Romero, Alexandra Paxton and Tahiya Chowdhury presenting a range of tools including OpenPose and openSMILE, and recommending Whisper ASR for automatic transcription. Finally, Kristian Tylén showed how coordinated epistemic interaction makes cognition a ‘public process’, and Hadar Karmazyn-Raz & Linda Smith presented work on the dynamics of caregiver-infant interactions.

I liked seeing this work presented, and I learned new things. At the same time, I have some doubts about unintended side-effects of some of these methods. I should clarify that we’ve used kinematics, unsupervised machine learning and speech recognition ourselves, so I’m aware of their utility. My own experience is that such methods are cool and potentially useful, but they also risk being “methods in search of questions”, and moreover methods that put us at a larger distance from the actual empirical data. After all, Lucy Suchman didn’t need models of nonlinear coupled oscillators to turn the classical cogsci take on planning on its head. Gail Jefferson didn’t need recurrence quantification analysis to bring to light the turn-taking system in what would become the most cited paper in linguistics: she pioneered a system for the detailed transcription and analysis of interactive behaviour that is still the standard in conversation analysis.

New methods create new affordances and allow different types of analyses. But as I use them in my own work, as a complement to the rigorous and systematic qualitative ways of looking at data furnished by ethnomethodology and conversation analysis, I do sometimes feel these new methods may have the effect of putting us at a larger distance from the data as it unfolds in the lived experience of the participants themselves.

A very simple illustration of this problem is the recommendation to use OpenAI’s Whisper ASR for automatic transcription. Our own recent research has shown that Whisper, like most if not all currently available ASR solutions, is terrible at representing timing and overlap, and erases many of the little words that are interactionally important. By our measure, using such ASR systems erases 15% of speech, or more than 1 out of every 8 words, and the words erased are some of the most interactionally consequential ones. So if you use transcripts or timing data coming out of Whisper without labour-intensive human quality control and correction, they’re nearly useless for fine-grained work on timing, alignment, and intersubjectivity. You’d be working with a funhouse-mirror version of your ‘data’, and you wouldn’t notice unless you dove into the output yourself and compared it to the actual conversations. Unexamined use of technology has a way of putting us out of touch with the realities of interactions as they unfold.
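To make the step of comparing ASR output to the actual conversation a little more concrete, here is a minimal Python sketch of one way to see which words an automatic transcript drops. It assumes you already have a careful human transcript of the same stretch of talk to serve as the reference, and it uses a standard minimum-edit-distance (Levenshtein) word alignment, the same kind of alignment that underlies word error rate. It is not the procedure used in Liesenfeld et al. (2023). The function names and the `INTERACTIONAL_TOKENS` set are hypothetical and purely illustrative, and the sketch ignores timing and overlap entirely, which would need a separate comparison of timestamps.

```python
import string

# Hypothetical, illustrative set of small interactional tokens (not from the paper).
INTERACTIONAL_TOKENS = {"uh", "um", "mm", "mhm", "yeah", "huh", "oh", "okay"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def deleted_words(reference: list[str], hypothesis: list[str]) -> list[str]:
    """Return reference words that a minimum-edit-distance alignment marks as deleted."""
    n, m = len(reference), len(hypothesis)
    # dp[i][j] = cost of aligning reference[:i] with hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion: reference word missing from ASR
                dp[i][j - 1] + 1,         # insertion: extra word in ASR output
            )
    # Trace back along one optimal path, collecting deleted reference words.
    deleted = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            deleted.append(reference[i - 1])
            i -= 1
        else:
            j -= 1
    deleted.reverse()
    return deleted


def deletion_report(reference_text: str, asr_text: str):
    """Deletion rate, deleted words, and which of those are interactional tokens."""
    ref = tokenize(reference_text)
    hyp = tokenize(asr_text)
    deleted = deleted_words(ref, hyp)
    rate = len(deleted) / len(ref) if ref else 0.0
    interactional = [w for w in deleted if w in INTERACTIONAL_TOKENS]
    return rate, deleted, interactional


if __name__ == "__main__":
    human = "Uh yeah, I mean, mm, we could go."
    asr = "Yeah I mean we could go."
    rate, deleted, interactional = deletion_report(human, asr)
    print(f"deletion rate: {rate:.0%}")                   # deletion rate: 25%
    print(f"interactional tokens lost: {interactional}")  # ['uh', 'mm']
```

On this toy example the ASR output drops “uh” and “mm”, a quarter of the reference words. Substitutions are deliberately not counted as deletions here; a full WER-style tally would report those too.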


We can also frame that as a question: as our methods become more technologically sophisticated, is there not a risk of losing sight of the value of careful qualitative observation of interaction? If we replace ANOVAs and 2×2 designs with RQAs and nonlinear modelling, what have we gained?

In response, Michael Richardson countered that you can use some of these methods to “uncover structures that you cannot uncover by observation”. He agreed, though, that these methods are no silver bullet: you cannot simply apply them and get something out; “you still need to ground them theoretically”.

I would add that the grounding needs to be empirical as well (and perhaps in the first place). There is an unnecessary rift in cognitive science discourse (in general; I’m not talking about Richardson’s point here) where theory is cast as high-minded conceptual work and empirical research (especially of the observational kind) is seen more as grunt work, paving the way for the real (often experimental and computational) work. That is not how things work in my experience at all: there is a direct line between empirical, data-driven observation and theory development that does not always need to be mediated by experiments. Some of the strongest theoretical claims in my work (and some of the most replicable ones) derive directly from fine-grained empirical observation of co-present interaction.

Experiments are nice for checking hunches; computational models are good for forcing oneself to specify things unambiguously; but careful, systematic, disciplined observational work forms the empirical backbone of much of the most consequential research on human interaction over the past five decades. I’m deliberately putting things strongly here; of course there is a place for all these things alongside one another, and I have used all of them in my own work. But the grounding, ultimately, has to come from the ground: the earthy, artisanal reality of everyday interaction.


  • Rączaszek-Leonardi, Joanna, Kristian Tylén, Mark Dingemanse, Linda Smith, Hadar Karmazyn-Raz, Nick Enfield, Rachel W. Kallen, et al. 2023. “Putting Interaction Center-Stage for the Study of Knowledge Structures and Processes.” Proceedings of the Annual Meeting of the Cognitive Science Society 45 (45). https://escholarship.org/uc/item/8571r2dz.
  • Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “The Timing Bottleneck: Why Timing and Overlap Are Mission-Critical for Conversational User Interfaces, Speech Recognition and Dialogue Systems.” SIGDIAL 2023 (arXiv: https://doi.org/10.48550/arXiv.2307.15493)
