Large language models make it entirely trivial to generate endless amounts of seemingly plausible text. There’s no need to be cynical to see the virtual inevitability of unending waves of algorithmically tuned AI-generated uninformation: the market forces are in place and they will be relentless.
I say uninformation against the backdrop of Bateson’s tongue-in-cheek definition of information as ‘a difference that makes a difference’. If we don’t know (or can’t tell) the difference anymore, we are literally un-informed.
It is likely that a company like OpenAI sees some of this, and that they are keeping, for instance, time-stamped samples of AI-hallucinated content to enable some degree of textual provenance. But given how hard it already is to deal with content farms, I think there is little reason to be optimistic.
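To make that idea concrete, here is a minimal sketch, in Python, of what such a time-stamped registry could look like. Everything in it is hypothetical (the `ProvenanceRegistry` class, the fingerprinting scheme are my own illustration, not anything OpenAI has described), and note how exact fingerprints collapse under even light paraphrase, which is part of why provenance is hard:

```python
# Hypothetical sketch: store a time-stamped fingerprint of every generated
# text, then check candidate web text against the store. Exact hashing like
# this breaks under even trivial paraphrase.

import hashlib
from datetime import datetime, timezone


def fingerprint(text: str) -> str:
    """Normalize whitespace and case before hashing, so reflowed copies still match."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ProvenanceRegistry:
    """Maps fingerprints of generated text to the time they were produced."""

    def __init__(self) -> None:
        self._store: dict[str, datetime] = {}

    def record(self, generated_text: str) -> None:
        # setdefault keeps the earliest timestamp if the same text recurs
        self._store.setdefault(fingerprint(generated_text), datetime.now(timezone.utc))

    def generated_before(self, candidate_text: str) -> datetime | None:
        """Return the generation timestamp if this text was produced here, else None."""
        return self._store.get(fingerprint(candidate_text))


registry = ProvenanceRegistry()
registry.record("Large language models produce plausible text.")
print(registry.generated_before("Large  language models produce plausible text."))  # matches
print(registry.generated_before("A lightly paraphrased version of that sentence."))  # None
```

Even a registry like this only catches verbatim reposting; the moment uninformation is reworded, summarized, or remixed, the trail goes cold.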
Which has an important consequence. The web makes up a large chunk of the data feeding GPT-3 and kin. Posting the output of large language models online creates a feedback loop that cannot improve quality (unless we have mechanisms for textual provenance), and so will lead to uninformation feeding on uninformation.
All the ingredients for an information heat death are on hand. True human-generated and human-curated information, of the kind produced, for instance, by academics through painstaking observation and publication, will become scarcer, and therefore more valuable. Counterintuitively, there has never been a better time to be a scholar.
- Bateson, G. (1979). Mind and Nature: A Necessary Unity. New York: E. P. Dutton.