This is part II of my notes on the “Ideophones and non-lexical vocalisations” workshop. Part I is here. Note: the conceptual distinctions set out here represent work in progress and may be published at some point. I will update these posts with the reference if that happens.
Order at all points
One of the nice features of the workshop was the “rapid data session” format, which enabled analysts to make available one or two data extracts (often with audio, video and transcripts) for repeated inspection, allowing everyone in the audience to study them and make observations or ask questions. In this way we discussed data featuring vocalisations including ermmmnrrrnuh::, ʔouiʔ, ha:i: (sighed), du du ka du du du ka, k’hohhh, zuppum, hop-paa, and many more.
But there is method to the madness. For instance, talking on the topic of “How to audibly not say something with clicks”, Richard Ogden (York) showed how English speakers use various click sounds for double entendres, collusions and, in general, things that are treated as best left off the record. He also made a convincing case for a systematic, conventionalised contrast between lateral and central click sounds, which maps onto a contrast in social actions. Although English is not generally known as a click language, English speakers have no trouble mastering this contrast and use it in everyday interaction (some details are in Richard’s 2013 paper on clicks).
When speech sounds are distinctive in this way, linguists often use that as evidence to argue for phonemic status: the contrastive sounds earn their place in the phonology of the language. These conversational clicks form an interesting test case. Is a single systematic contrast, or even a small number of similarly contrasting items, sufficient for admission to the phoneme inventory, or is there some kind of threshold we use to determine this?
I think it is fair to point out that the majority of English words don’t feature contrastive click phonemes, and so this could be a reason to say they are not part of English phonology. But such frequency-based arguments can be slippery. Given that phonemes show a Zipfian distribution, we expect there to be relatively rare phonemes. Are clicks simply one extreme of this continuum? I can’t bring myself to agree with this either, if only because their distribution (in terms of places where they occur) seems quite different.
Most importantly, in English these click sounds don’t seem to be contrastive within words the way p/t or k/g are; instead, they are contrastive as stand-alone items. So on a generous reading of ‘word’, these items are words, or at least lexical items, or at least conventionalised linguistic items. Which brings me to the key question I was left with after the workshop.
What, if anything, is “non-lexical”?
Throughout the workshop we faced the challenge of how exactly to refer to the various things we studied. One term used widely, mostly for want of a better one, was non-lexical vocalisations. While it may be the best we have currently, there are several issues with it.
First, it’s never great to define by negation. Is not being lexical the key feature setting apart these vocalisations as a phenomenon? What would lexical vocalisations be, anyway? We have the term ‘word’, so using an alternative like ‘vocalisation’ already implies some relevant difference between the items in focus and run-of-the-mill words like ‘cat’ or ‘mat’. And as we saw above for the clicks, a case could be made that even these phonologically outlandish items have some recognised (or at least recognisable) status as conventionalised items in a larger system of practices, i.e., a lexicon.
Second, calling them “non-lexical” implies that the lexicality of these items is somehow lacking or in doubt. True, these items are unlikely to be found in traditional lexicons; but the arbitrary constraints of printed dictionaries will never be a reliable guide for linguistic questions. In any case, the label doesn’t help if we want to argue (as several of us did during the workshop) that the shape of these items can to an important degree be conventionalised, that they may draw on partly conventionalised inventories of depictive practices, that they are used in systematic ways, or that they form paradigmatic relations within larger systems of practices. All of these point to a conventionalised, and therefore possibly lexicalised, status of these things.
Depictions and displays
Before we worry about lexicality, it’s worth asking whether there is a unified phenomenon here in need of a single label like “non-lexical vocalisations”, or whether there are multiple distinct phenomena. I think there may be at least two clear groups of phenomena worth distinguishing:
1. Vocal depictions (≈ Clark’s ‘demonstrations’, Güldemann’s ‘mimesis’)
These are vocalisations typically presented as depictions of sensory scenes that enable others to perceive for themselves the scene depicted. Examples include ideophones, creative vocal imitations of sounds, movements and other sensory scenes. In Peircean terms, their mode of signification is primarily iconic. For example, a vocalisation like wop pa da PUM can iconically depict aspects of the temporal and kinetic dynamics of a sequence of dance moves (Keevallik 2010). Like all signs, vocal depictions may also have symbolic and indexical properties.
While most English speakers won’t feel that wop or pa da PUM are words, one could make a case for a degree of conventionalisation in particular communities of practice. For instance, dancers or musicians who work closely together are likely to converge on a small set of vocalisations they use in this way (what Sundberg 1994 calls ‘syllabling’). From here it’s not far at all to the larger inventories of conventionalised vocal depictions we call ideophones. Indeed, one place where we find ideophones is precisely in situations where there is a premium on sharing and calibrating sensory perceptions and achieving bodily coordination, as Elena Mihas (2013) has shown for ideophones in Ashéninka Perené. (Some of these uses of ideophones are reviewed in a forthcoming article for the Oxford Research Encyclopedia of Linguistics; preprint here.) So I see vocal depictions as an overarching category that includes creative vocalisations as well as conventionalised ideophones, and everything in between.
2. Vocal displays (≈ Goffman’s ‘response cries’, Kockelman’s ‘interjections’ — I’m not sure whether ‘display’ is the best term here)
These are vocalisations typically produced as indexical signs of emotion, effort or evaluation. They are presented less as depictions of events than as responses to them. Examples include strain grunts, pain cries, yawns, interjections of disgust, vocal signs of cognitive effort, etc. For Goffman these would present themselves more as “giving off” than “giving” information, though of course precisely this opens up the possibility for people to produce them, or treat them, as ostensibly off-record ways of doing other things. In Peircean terms, their mode of signification is primarily indexical. For instance, the phonetic form of a strain grunt does not itself resemble its ascribed meaning of ‘effort’; rather, it can be seen to indexically show that effort. Like all signs, vocal displays may also have symbolic and iconic properties.
I’m trying to be careful here in saying that vocal displays are “typically produced as indexical signs”. An inbreath or a click sound can be ‘merely’ an involuntary index of the physical process of preparing to speak; but the fact that it regularly occurs in this indexical relationship means that we can also use it in a more controlled way to display imminent speakership, and therefore to do interactional work. Likewise, something like um can be ‘merely’ an index of the cognitive process of starting to formulate a turn without yet being ready to speak; but the fact that it regularly occurs in this function makes it possible for us to do interactional work with it, for instance to buy ourselves time at interactionally fraught moments (Clark & Fox Tree 2002).
(Non)lexicality is an orthogonal issue
The two groups, vocal depictions and vocal displays, are united at least in being commonly treated as marginalia in the subjective sense (Dingemanse 2017). Further, vocal depictions and vocal displays are both more ‘showing’ than ‘telling’, though for different reasons: depictions because they iconically create a likeness (Donald 1998), displays because they indexically provide evidence of some inner feeling or state (Wharton 2003). Both groups also appear to allow a degree of gradience that seems to be less typical for more descriptive vocabulary: depictions because modifications in form analogically correspond to modifications in meaning, and displays because they are productively combined with a wide range of prosodic resources in the service of showing stance and streamlining interaction. All of these things may justify grouping them together as “vocalisations”. But I wouldn’t want to call them “non-lexical” across the board.
The reason is that lexicality is an orthogonal matter. Lexicality is a graded property (something can be more or less lexical), and it runs through both groups: in both, we have fully conventionalised lexical items, like ideophones or the word “um”, as well as items that are less clearly conventionalised and linguistically integrated, like the vocal depiction “pa da PUM” or a vocal display like an inbreath. And there are going to be lots of intermediate forms as well.
There are yet other things that have been called “nonlexical” or variations thereof, which may or may not be groupable with either of these two broad categories. For instance, Nigel Ward has an interesting line of work on continuers, backchannels and the like, which he calls “nonlexical conversational sounds” (Ward 2006). Despite a notable degree of formal gradience in these items, I think the claim of nonlexicality here is premature, and may be too strong. Likewise, Schegloff has described the interjection Huh?, used to initiate repair, as a “virtually pre-lexical grunt” (Schegloff 1997). Comparative interactional linguistic research has since shown that many languages have an interjection of this kind, and while it may not be the most prototypical lexical item, it certainly is a word rather than a grunt: it is integrated in terms of phonology and interrogative prosody, and despite its cross-linguistic commonalities, its actual realisations show enough language-specificity that they have to be learned.
Some of these items may be close to the vocal displays above, a link that is alluring because they don’t sound like many other words. But I would hesitate to identify them with response cries, exclamations or grunts; as I have argued elsewhere, perhaps their peculiar shapes arise not so much because they originate as involuntary grunts, but because they are optimally adapted to the exigencies of conversation (as we have argued in detail for “Huh?”). That topic is at the core of my newest research project on Elementary particles of conversation. More about that on some other occasion.
- Akita, Kimi, and Mark Dingemanse. 2019. “Ideophones (Mimetics, Expressives).” In: Oxford Research Encyclopedia of Linguistics. Preprint: https://ling.auf.net/lingbuzz/004347
- Clark, Herbert H., and Jean E. Fox Tree. 2002. “Using Uh and Um in Spontaneous Speaking.” Cognition 84: 73–111.
- Dingemanse, Mark. 2014. “Making New Ideophones in Siwu: Creative Depiction in Conversation.” Pragmatics and Society 5 (3): 384–405. https://doi.org/10.1075/ps.5.3.04din.
- Dingemanse, Mark. 2017. “On the Margins of Language: Ideophones, Interjections and Dependencies in Linguistic Theory.” In Dependencies in Language, edited by N. J. Enfield, 195–202. Berlin: Language Science Press. https://doi.org/10.5281/zenodo.573781.
- Donald, Merlin. 1998. “Mimesis and the Executive Suite: Missing Links in Language Evolution.” In Approaches to the Evolution of Language: Social and Cognitive Bases, edited by James R. Hurford, Michael Studdert-Kennedy, and Chris Knight, 44–67. Cambridge: Cambridge University Press.
- Goffman, Erving. 1978. “Response Cries.” Language 54 (4): 787–815.
- Keevallik, Leelo. 2010. “Bodily Quoting in Dance Correction.” Research on Language & Social Interaction 43 (4): 401–26. https://doi.org/10.1080/08351813.2010.518065.
- Keevallik, Leelo. 2014. “Turn Organization and Bodily-Vocal Demonstrations.” Journal of Pragmatics, A body of resources – CA studies of social conduct, 65 (May): 103–20. https://doi.org/10.1016/j.pragma.2014.01.008.
- Kockelman, Paul. 2003. “The Meanings of Interjections in Q’eqchi’ Maya: From Emotive Reaction to Social and Discursive Action.” Current Anthropology 44 (4): 467–97.
- Mihas, Elena. 2013. “Composite Ideophone-Gesture Utterances in the Ashéninka Perené ‘Community of Practice’, an Amazonian Arawak Society from Central-Eastern Peru.” Gesture 13 (1): 28–62. https://doi.org/10.1075/gest.13.1.02mih.
- Ogden, Richard. 2013. “Clicks and Percussives in English Conversation.” Journal of the International Phonetic Association 43 (3): 299–320. https://doi.org/10.1017/S0025100313000224.
- Schegloff, Emanuel A. 1997. “Practices and Actions: Boundary Cases of Other-Initiated Repair.” Discourse Processes 23 (3): 499–545. https://doi.org/10.1080/01638539709545001.
- Ward, Nigel. 2006. “Non-Lexical Conversational Sounds in American English.” Pragmatics & Cognition 14 (1): 129–82. https://doi.org/10.1075/pc.14.1.08war.
- Wharton, Tim. 2003. “Interjections, Language, and the ‘Showing/Saying’ Continuum.” Pragmatics & Cognition 11 (1): 39–91. https://doi.org/10.1075/pc.11.1.04wha.