What is ‘non-lexical’? Notes on non-lexical vocalisations, II

Dingemanse, Mark. 2020. Between Sound and Speech: Liminal Signs in Interaction. Research on Language and Social Interaction 53(1), 188–196. doi:10.1080/08351813.2020.1712967 (PDF)

TL;DR: Non-lexical is a term people use for things that seem borderline linguistic, like sniffs, coughs, and grunts. However, it’s rarely a great idea to define things in terms of what they are not. In fact we can think of many of these things as liminal signs: signs that can but need not be used communicatively because they occupy the borderland between sound and speech.

Order at all points

One of the nice features of the workshop was the “rapid data session” format, which enabled analysts to make available one or two data extracts (often with audio, video and transcripts) for repeated inspection, allowing everyone in the audience to study them and make observations or ask questions. In this way we discussed data featuring vocalisations including ermmmnrrrnuh::, ʔouiʔ, ha:i: (sighed), du du ka du du du ka, k’hohhh, zuppum, hop-paa, and many more.

But there is method to the madness. For instance, talking on the topic of “How to audibly not say something with clicks”, Richard Ogden (York) showed how English speakers use various click sounds for double entendres, collusions and in general things that are treated as best left off the record. He also made a convincing case for a systematic, conventionalised contrast between lateral and central click sounds, which maps onto a contrast in social actions. Despite English not being generally known as a click language, English speakers have no trouble mastering this contrast and use it in everyday interaction (some details are in Richard’s 2013 paper on clicks).

When speech sounds are distinctive in this way, linguists often use that as evidence to argue for phonemic status: the contrastive sounds earn their place in the phonology of the language. These conversational clicks form an interesting test case. Is a single systematic contrast, or even a small number of similarly contrasting items, sufficient for admission to the phoneme inventory, or is there some kind of threshold we use to determine this?

I think it is fair to point out that the majority of English words don’t feature contrastive click phonemes, and so this could be a reason to say they are not part of English phonology. But such frequency-based arguments can be slippery. Given that phonemes show a Zipfian distribution, we expect there to be relatively rare phonemes. Are clicks simply one extreme of this continuum? I can’t bring myself to agree with this either, if only because their distribution (in terms of places where they occur) seems quite different.

Most importantly, in English, these click sounds don’t seem to be contrastive within words the way p/t or k/g are, but instead are contrastive as stand-alone items. So on a generous reading of ‘word’, these items are words, or at least lexical items, or at least conventionalised linguistic items. Which brings me to the key question I was left with after the workshop.

What, if anything, is “non-lexical”?

Throughout the workshop we faced the challenge of how exactly to refer to the various things we studied. One term used widely, mostly for want of a better one, was non-lexical vocalisations. While it may be the best we have currently, there are several issues with it.

First, it’s never great to define by negation. Is not being lexical the key feature setting apart these vocalisations as a phenomenon? What would lexical vocalisations be, anyway? We have the term ‘word’, so using an alternative like ‘vocalisation’ already implies some relevant difference between the items in focus and run-of-the-mill words like ‘cat’ or ‘mat’. And as we saw above for the clicks, a case could be made that even these phonologically outlandish items have some recognised (or at least recognisable) status as conventionalised items in a larger system of practices, i.e., a lexicon.

Second, calling them “non-lexical” implies that the lexicality of these items is somehow lacking or in doubt. True, these items are unlikely to be found in traditional lexicons; but the arbitrary constraints of printed dictionaries will never be a reliable guide for linguistic questions. In any case, the label doesn’t help if we want to argue (as several of us did during the workshop) that the shape of these items can to an important degree be conventionalised, or that they may draw on partly conventionalised inventories of depictive practices, or that they are used in systematic ways, or that they form paradigmatic relations within larger systems of practices. All of these point to a conventionalised, and therefore possibly lexicalised, status of these things.

Depictions and displays

Before we worry about lexicality, it’s worth asking whether there is a unified phenomenon here in need of a single label like “non-lexical vocalisations”, or whether there are multiple distinct phenomena. I think there may be at least two clear groups of phenomena worth distinguishing:

1. Vocal depictions (≈ Clark’s ‘demonstrations’, Güldemann’s ‘mimesis’)

These are vocalisations typically presented as depictions of sensory scenes that enable others to perceive for themselves the scene depicted. Examples include ideophones, creative vocal imitations of sounds, movements and other sensory scenes. In Peircean terms, their mode of signification is primarily iconic. For example, a vocalisation like wop pa da PUM can iconically depict aspects of the temporal and kinetic dynamics of a sequence of dance moves (Keevallik 2010). Like all signs, vocal depictions may also have symbolic and indexical properties.

While most English speakers won’t feel that wop or pa da PUM are words, one could make a case for a degree of conventionalisation in particular communities of practice. For instance, dancers or musicians who work closely together likely converge on a small set of vocalisations they use in this way (Sundberg’s 1994 syllabling). From here it’s not far at all to the larger inventories of conventionalised vocal depictions we call ideophones. Indeed one place where we find ideophones is precisely in situations where there is a premium on sharing and calibrating sensory perceptions and achieving bodily coordination, as Elena Mihas (2013) has shown for ideophones in Ashéninka Perené. (Some of these uses of ideophones are reviewed in a forthcoming article for the Oxford Research Encyclopedia of Linguistics; preprint here.) So I see vocal depictions as an overarching category that includes creative vocalisations as well as conventionalised ideophones, and everything in between.

2. Vocal displays (≈ Goffman’s ‘response cries’, Kockelman’s ‘interjections’ — I’m not sure whether ‘display’ is the best term here)

These are vocalisations typically produced as indexical signs of emotion, effort, or evaluation. They are presented less as depictions of events than as responses to events. Examples include strain grunts, pain cries, yawns, interjections of disgust, vocal signs of cognitive effort, etc. For Goffman these would present themselves more as “giving off” than “giving” information, though of course precisely this opens up the possibility for people to produce or treat them as doing other things ostensibly off record. In Peircean terms, their mode of signification is primarily indexical. For instance, the phonetic form of a strain grunt does not itself present a resemblance to its ascribed meaning of ‘effort’; rather, it can be seen to indexically show that effort. Like all signs, vocal displays may also have symbolic and iconic properties.

I’m trying to be careful here in saying that vocal displays are “typically produced as indexical signs”. An inbreath or a click sound can be ‘merely’ an index of the physical process of preparing to speak, involuntarily produced; but that it regularly occurs in this indexical relationship means that we can also use it in a more controlled way to display imminent speakership, and therefore do interactional work. Likewise, something like um can be ‘merely’ an index of the cognitive process of starting to formulate a turn but not being ready to speak yet; but that it regularly occurs in this function makes it possible for us to do interactional work with it, for instance, buy ourselves time at interactionally fraught moments (Clark & Fox Tree 2003).

(Non)lexicality is an orthogonal issue

The two groups, vocal depictions and vocal displays, are united at least in being commonly treated as marginalia in the subjective sense (Dingemanse 2017). Further, vocal depictions and vocal displays are both more ‘showing’ than ‘telling’, though for different reasons: depictions because they iconically create a likeness (Donald 1998), displays because they indexically provide evidence of some inner feeling or state (Wharton 2003). Both groups also appear to allow a degree of gradience that seems to be less typical for more descriptive vocabulary: depictions because modifications in form analogically correspond to modifications in meaning, and displays because they are productively combined with a wide range of prosodic resources in the service of showing stance and streamlining interaction. All of these things may justify grouping them together as “vocalisations”. But I wouldn’t want to call them “non-lexical” across the board.

The reason is that lexicality is an orthogonal matter. Lexicality is a graded property (something can be more or less lexical) and it runs through both groups: in both, we have fully conventionalised lexical items like ideophones or the word “um”; and items that are less clearly conventionalised and linguistically integrated, like the vocal depiction “pa da PUM” or a vocal display like an inbreath. And there are going to be lots of intermediate forms as well.

There are yet other things that have been called “nonlexical” or variations thereof, that may or may not be groupable with either of these two broad categories. For instance, Nigel Ward has an interesting line of work on continuers, backchannels and the like, which he calls “nonlexical conversational sounds” (Ward 2006). Despite an interesting degree of formal gradience, I think the claim of nonlexicality here is premature, and may be too strong. Likewise, Schegloff has described the interjection Huh?, used to initiate repair, as a “virtually pre-lexical grunt” (Schegloff 1997). Comparative interactional linguistic research has since shown that many languages have an interjection of this kind, and while it may not be the most prototypical lexical item, it certainly is a word rather than a grunt: it is integrated in terms of phonology and interrogative prosody, and its cross-linguistic commonalities notwithstanding, the actual realisations show enough language-specificity that they have to be learned.

Some of these items may be close to the vocal displays above, a link that is alluring because they don’t sound like many other words. But I would hesitate to identify them with response cries, exclamations or grunts; as I have argued elsewhere, perhaps their peculiar shapes are not so much because they originate as involuntary grunts, but because they are optimally adapted to the exigencies of conversation (as we have argued in detail for “Huh?”). That topic is at the core of my newest research project on Elementary particles of conversation. More about that on some other occasion.


