A variety of vocal depictions: Notes on non-lexical vocalisations, I

Last week I was happy to present my work at a workshop on Ideophones and nonlexical vocalisations in Linköping, Sweden, organised by Leelo Keevallik and Emily Hofstetter. This was the kick-off for a new project on “Non-lexical vocalisations”. It was my first time in Linköping and it was great getting to know the vibrant community of interaction researchers from across departments. Also, I kind of fell in love with the Key Huset building and its light-flooded, wood-toned spaces.

The workshop was thought-provoking in many ways. This is the first of two posts in which I share some of my notes. It’s a personal take, not at all intended as a comprehensive summary, if only because I had to leave early to pick up my daughter from daycare back in Nijmegen and therefore missed the last third of the workshop, which (judging from Emily Hofstetter’s live tweeting) was just as interesting as the first two thirds. A central concern of the larger project hosting the workshop is to “problematise the traditional boundaries of linguistics”. This is something I’m sympathetic to, if only because my own work on ideophones and interjections has made me acutely aware of the subjectiveness of our notions of what is marginal and what is core in language.

Rara versus marginalia

In thinking about marginality, I find it useful to distinguish two ways in which things may be peripheral: rara and marginalia (see Dingemanse 2017). Rara are truly rare linguistic phenomena that are interesting precisely because they are so out of the ordinary: things like click phonemes, nominal tense, or affixation by place of articulation. Marginalia are common phenomena that just don’t happen to be part of the traditional interests of linguistics: things like gesture, ideophones, or indeed “non-lexical” vocalisations.

The crucial difference between rara and marginalia lies in the subjectivity of the latter. We can objectively tell whether something is truly rare or exceptional. But many classifications of things as peripheral or marginal are much more subjective. What we think of as marginal is determined by our data, methods, and theories; and in addition to that, by our own linguistic experience and language ideologies. There is nothing wrong with declaring some things peripheral to your current interests: time is limited and we all have to make choices. But it is always useful to be aware of how you come to such choices, and to reflect on whether your interests (or methods, theories, ideologies) might benefit from a bit of recalibration.

Many of the phenomena in focus during the workshop were not rara but marginalia in this subjective sense: they occur all the time in language use and might tell us interesting things about language structure — but they’ve been mostly treated as marginal to the concerns of mainstream linguistics. However, the tide may be turning for at least some marginalia: work on ideophones is clearly on the rise, and initiatives such as Martina Wiltschko’s Eh lab at UBC and this new nonlexical vocalizations project at Linköping University show there is significant interest in this area.

Vocal depiction is rampant

One thing that struck me during the workshop is how common it is to use the voice to depict meaning, often in contexts where other means of communication may be much less efficient or effective. Whether it’s during lindy hop learning sessions (as in Leelo Keevallik’s work) or band practice (as in Agnes Löfgren’s data), in professional choreography rehearsals (as in Johanna Skubisz’s work) or in everyday interaction in Siwu (as in my work on ideophones), people use vocal depictions (often in multimodal ensembles) to evoke perceptual experiences and coordinate bodily behaviour.

One thing all kinds of vocal depictions have in common is that they show rather than tell. It is incredibly hard to tell a dancer to execute a movement in a certain way; it is much easier to show it, either by means of a bodily demonstration or by means of gestural and vocal depictions. Or to take an example from my own research in Ghana, it is quite hard to explain how you can visually tell a real batch of gunpowder from a counterfeit one, but if you manage to depict its particular sheen using gestures and an ideophone like kɛlɛŋkɛlɛŋkɛlɛŋ (as in example 11.11 here), you can go a long way.

Depictions construe a likeness or a replica of some sensory scene (Clark 2016), making aspects of it more directly accessible and manipulable than would be the case if the scene were merely described in arbitrary words. This is what makes them useful in a wide range of communicative contexts. In my own work on creative vocal depictions I mentioned settings as diverse as storytelling, joint work in animation studios, and interaction in music and dance lessons. During the workshop we saw further examples from band practice, choreography rehearsals, multilingual conversations, and doctor-patient interaction. This diversity of contexts brings home the versatility of depiction as a communicative practice.

Versions of the ‘same’ thing are analytical rich points

Some of the richest opportunities for analysis come from cases where the interaction provides multiple versions of some behaviour designed to represent ostensibly the same scene. For instance, in Agnes Löfgren’s extract from a band rehearsal, we heard a bass player convey (to the drummer) a particular rhythmic structure he had in mind for this piece. The bass player produced at least four versions of ostensibly the same content. The versions can be seen as escalations or upgrades, in part shaped by the drummer’s responses which ranged from ‘isn’t that what I’m doing now’ to ‘alright okay’ to ‘I don’t see it yet’ to ‘like it actually gets kind of cool’:

  1. a prose description (‘so it’s like you play fou- a four against our three’)
  2. a depiction in syllables (du du ka du du ka du ka) with the foot doubling as bass drum
  3. a short rhythmic phrase played on the bass, soon abandoned
  4. an actual demonstration on the drum set

Cases like this raise many intriguing questions, some inspired by Clark & Gerrig’s (1990) classic work on quotations as demonstrations. How do we decide between modalities (or combinations of modalities) in designing depictions? What determines the ordering of strategies seen in successive pursuits? What is the role of recipient design in choosing one strategy over another? How do we select the aspects of a scene that we are going to depict, and how do we map these to the depictive means at hand? How is the design of our depictions shaped and constrained by the affordances of meaning and modality? And so on.

We saw more examples in Leelo Keevallik’s lindy hop data. In one memorable case, a lindy hop learner asks a question about a possibly problematic element of a dance move, referring to it using the creative vocal depiction “zup↑pum↑”. The teachers decide to show rather than tell by actually executing the moves, and in synchrony with this they produce vocalisations that depict some of the rhythmic and kinetic aspects of the dance, including a piece that structurally is recognisable (for us analysts as well as, presumably, for the learner asking the question) as the relevant referent of “zup↑pum↑”. Also during the dance, the other teacher produces ‘nonlexical’ syllables like chigi digi digi in sync with the beat and with his movements, and after completing the dance, adds, “So yeah, it’s just a nice little jigijigijigi”, simultaneously depicting some of the kinetic aspects of the dance in voice and hands.

Versions of ostensibly the same thing are crucial because they give us more material to work with if we want to understand the link between the depiction and the depicted scene — often a challenge not just for the analyst but also for the recipient in interaction. Versions give us analytical purchase in two key ways: they show multiple iterations of ostensibly the same action, and if we’re lucky, they also give us multiple takes on the material by the recipient, providing crucial interactional evidence of the success or failure of depictive stretches of behaviour.

One type of useful interactional evidence is when different participants provide takes on ostensibly the same scene that demonstrate (rather than just claim) their understanding or expertise. With ideophones, I have found that when one participant produces an ideophone evoking a scene (e.g., munyɛmunyɛ ‘sparkling’), in second position another participant may then produce another ideophone (e.g., gelegele ‘shiny’) as if to say, I agree with you, and here is how I see it. This is where vocal depictions in interaction touch on matters of epistemics and authority.

A key challenge when working with creative depictions is that it can be hard for the analyst to even know what they are supposed to depict. Here, another type of interactional evidence can be particularly useful: when a recipient formulates their understanding of the depiction. In my talk at the workshop I discussed a case from my study on creative vocal depictions where one person’s creative ideophone kpaw is followed by the other’s interpretation in next turn: “the gun didn’t go off”:

  1. A:  lopɛ↑kpaw↑
         I fired ↑kpaw↑
  2.      (1.2)
  3. B:  kùdu leiba inɔ̀
         the gunpowder didn’t go through
  4. A:  kùdu leiba- kɔ
         the gunpowder didn’t go- gee!

What B does in line 3 is take A’s creative depiction and formulate an understanding of it in descriptive terms. This is analytically very useful, because it saves us the trouble of speculating what the depiction was supposed to evoke. B’s interpretation is ratified by A when he repeats it and continues the telling.

It is kind of wonderful that we can create and interpret vocal depictions just like that. What cases like this show is that interactional evidence can help us crack some of the most intriguing questions about creative vocal depictions. Their interpretation is scaffolded by context, supported by people’s familiarity with (conventional) depictive strategies, and ratified in interaction by these kinds of understandings.

(An interesting boundary case comes from Hannah Pelikan’s work on interaction with a Nao robot. She recorded games of charades. Nao would produce a pre-programmed ‘depiction’ (e.g. playing a plane sound and visually imitating wings with arms) and a participant would produce a verbal guess, which was then treated as right or wrong by Nao depending on a pre-programmed set of answers. Hannah’s data shows that people are pretty graceful even when perfectly reasonable guesses are dismissed by Nao, and rapidly adapt to the limited agency displayed by the robot. What’s potentially interesting here is that we could get multiple takes on what is guaranteed to be the exact same depiction. Holding one side of the equation still, as it were, to see what the other, more flexible human side makes of it. However, due to the restricted format of the charades game, usually there was only one guess and no opportunities for redress.)

In closing

One thing that is so fascinating about marginalia is the combination of relatively common occurrence with a striking lack of systematic attention from linguists and interaction researchers. It means that there are lots of things still to find out about some of the most fundamental aspects of how we use language, and how language is shaped by and for social interaction. In the next installment I’ll explore some other themes from the workshop, focusing on the question: what does it mean to call something “non-lexical”?


  • Clark, Herbert H., and Richard J. Gerrig. 1990. “Quotations as Demonstrations.” Language 66 (4): 764–805.
  • Clark, Herbert H. 2016. “Depicting as a Method of Communication.” Psychological Review 123 (3): 324–47. https://doi.org/10.1037/rev0000026.
  • Dingemanse, Mark. 2014. “Making New Ideophones in Siwu: Creative Depiction in Conversation.” Pragmatics and Society 5 (3): 384–405. https://doi.org/10.1075/ps.5.3.04din.
  • Dingemanse, Mark. 2017. “On the Margins of Language: Ideophones, Interjections and Dependencies in Linguistic Theory.” In Dependencies in Language, edited by N. J. Enfield, 195–202. Berlin: Language Science Press. https://doi.org/10.5281/zenodo.573781.
  • Keevallik, Leelo. 2010. “Bodily Quoting in Dance Correction.” Research on Language & Social Interaction 43 (4): 401–26. https://doi.org/10.1080/08351813.2010.518065.
  • Keevallik, Leelo. 2014. “Turn Organization and Bodily-Vocal Demonstrations.” Journal of Pragmatics, A body of resources – CA studies of social conduct, 65 (May): 103–20. https://doi.org/10.1016/j.pragma.2014.01.008.

