Malinowski (1922) on Large Language Models

It’s easy to forget amidst a rising tide of synthetic text, but language is not actually about strings of words, and language scientists would do well not to chain themselves to models that presume so. For apt and timely commentary we turn to Bronislaw Malinowski, who wrote:

there is a series of phenomena of great importance which cannot possibly be recorded by questioning or computing documents, but have to be observed in their full actuality. Let us call them the imponderabilia of actual life.

In follow-up work, Malinowski critiqued the unexamined use of decontextualised strings of words as a proxy for Meaning:

To define Meaning, to explain the essential grammatical and lexical characters of language on the material furnished by the study of [written records], is nothing short of preposterous in the light of our argument. Yet it would be hardly an exaggeration to say that 99 per cent of all linguistic work has been inspired by the study of dead languages or at best of written records torn completely out of any context of situation.

Malinowski did not write this on his Substack, in an op-ed in the New York Times, or in a preprint. He spent time doing primary fieldwork, lived with people whose language he learned, and, based on close observation of language in everyday use, came to an informed critique of his contemporaries’ extreme reliance on strings of text.

He did all this over a century ago, and yet here we are, running in circles around stochastic text generators or text regurgitators, as we may call the LLMs that today excel in next-token prediction. Makes me think of something Wittgenstein wrote in another context, for a similar problem: “A picture held us captive. And we could not get outside it, for it lay in our language and language seemed to repeat it to us.”

  • Malinowski, B. (1922). Argonauts of the Western Pacific. London: Routledge & Kegan Paul.
  • Malinowski, B. (1923). The problem of meaning in [underdescribed*] languages. In C. K. Ogden & I. A. Richards (Eds.), The meaning of meaning (pp. 296–336). London: Kegan Paul.

* I write [underdescribed] where Malinowski had ‘primitive’ to draw attention to the following: Malinowski wrote at a time when scientific racism meant that “modern” or “civilized” languages were habitually contrasted with “primitive” or “savage” ones — even as his own work helped demolish that distinction and showed the primacy of language use in everyday life across societies.

Update July 2023: If, despite this, you’re interested in “Large Language Models”, we have some relevant work for you: Opening up ChatGPT (and accompanying paper, and blog post).

