Thinking visually with Remarkable

Sketches, visualizations and other forms of externalizing cognition play a prominent role in the work of just about any scientist. It’s why we love using blackboards, whiteboards, notebooks and scraps of paper. Many folks who had the privilege of working with the late Pieter Muysken fondly remember his habit of grabbing any old piece of paper that came to hand, scribbling while talking, then handing it over to you.

Since the summer of 2021 I have owned a Remarkable, and it has become an essential part of my scientific workflow because it seamlessly bridges this physical form of thinking with the digital world of drafts, files and emails. I rarely rave about tools (to each their own, etc.) but this is one of those that has changed my habits for the better in several ways: I’ve been reading more, taking more notes, writing more, and also doodling and sketching more. As a cognitive scientist I would describe it as a distraction-free piece of technology with just the right affordances for powerful forms of extended cognition (it is probably no coincidence that it was recommended to me by fellow traveller Sébastien Lerique, whose interests range from embodied rationality to interaction).

One of the ways in which the Remarkable has changed my workflow and my collaborations is that it is much easier to sketch a basic idea for a visualization and share it digitally. We use this during brainstorms to produce first impressions or visualize hypotheses. Often such a rough sketch then functions as a placeholder in a draft until we’ve made an actual version based on our data.

The above example from a recent paper with Andreas Liesenfeld shows this process: first my rough sketch of what the plot might look like, which fuels our discussion and helps me to express how to transform our source data in R. Then a ggplot version I made in R that preserves the key idea and adds some bells and whistles like loess lines and colour.

I want to credit my collaborator Andreas Liesenfeld for pushing me to do more of this visual-first way of thinking. One of the things Andreas often asks when brainstorming about a new paper is: “okay but what’s the visual?”. Thinking early about compelling visualizations has made our papers more tightly integrated hybrids of text and visuals than they might otherwise have been. For instance, our ACL paper has 7 figures, approximately one to a page, that support the arguments, help organize the flow, and generally make for a nicer reading experience.

Conceptual frameworks

Sketches can also be useful to work out conceptual frameworks. In a recent collaboration with Raphaela Heesen, Marlen Fröhlich, Christine Sievers and Marieke Woensdregt we spent a lot of time talking about ways to characterize various types of communicative “redoings” across species. A key insight was that the variety of terms used in different literatures (e.g. primatology vs. human interaction) could actually be linked by looking more closely at the sequential structure of communicative moves. I sent off a quick Saturday morning doodle to my collaborators, and ultimately we published a polished version of it in our paper on communicative redoings across species (PDF here).

Finally, sketches are useful to express ideas and hypotheses visually even before the data is in. For instance, in current work with Bonnie McLean and Michael Dunn we’re thinking a lot about transmission biases and how they influence cultural evolution over time. Bonnie’s dataset looks at biases and rates of change in how concepts relate to phonemic features. It’s helped me to express my thinking on this visually, and I can’t wait to see what Bonnie ultimately comes up with. (This visualization is inspired in part by something I read about parallax in Nick Sousanis’ amazing book Unflattening.)

Sketch showing three panels side by side. On the left, a plot showing a time series with a multitude of grey lines in the lower range and a single black line rising above the grey mass to occupy a distinctly higher position on the Y axis.

In the middle, a skewed square with points corresponding to the end points of all the lines in the left panel, suggesting that it is a sliver of the end of the first plot.

On the right, the middle panel turned towards the reader into a square X-Y plot with a mass of grey dots joined by isolines roughly in the middle and a solitary black dot in the top right.

Not a review

This is not a review of the Remarkable, just a reflection on how it’s changed my academic life for the better. Every device has pros and cons. For instance, I don’t particularly love the overpriced stylus (‘Marker plus’) or how they sell Connect subscriptions for slightly better syncing options, though you should be aware you don’t need a subscription to do any of the things I’ve described in this post. And on the other hand, I absolutely do love the litheness of this device, the just-right friction when writing, and the fact that it has no backlight. The design in general strikes me as a perfect embodiment of what philosopher Ivan Illich has called ‘convivial tools’: tech that is sophisticated yet also responsibly limited in ways that support human flourishing. Anyway, there’s a good remarkable subreddit if you’re in the market for a device like this.

Note. Remarkable has a referral program that gives you a $40 (or equivalent) discount if you use this link to purchase one. If you like the device and keep it, that would also mean I earn $40, which I would use to treat my team to fancy coffee and cakes!

Always plot your data

Always plot your data. We're working with conversational corpora and looking at timing data. Here's a density plot of the timing of turn-taking for three corpora of Japanese and Spanish. At least 3 of the distributions look off (non-normal). But why?

Plotting turn duration against offset provides a clue: in the weird looking ones, there’s a crazy number of turns whose negative offset is equal to their duration — something that can happen if consecutive turns in the data share the exact same end time (very unlikely in actual data).
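The check behind this observation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis code behind the plots: turns are hypothetical (begin, end) pairs in milliseconds sorted by begin time, offset is taken to be a turn's begin time minus the previous turn's end time, and the name `shared_end_turns` is made up for illustration.

```python
def shared_end_turns(turns, tolerance=0):
    """Return indices of turns whose negative offset equals their duration.

    turns: list of (begin, end) pairs in ms, sorted by begin time (assumed format).
    tolerance: allowed difference in ms, for data with rounded timestamps.
    """
    flagged = []
    for i in range(1, len(turns)):
        prev_end = turns[i - 1][1]
        begin, end = turns[i]
        offset = begin - prev_end   # negative when the turn starts in overlap
        duration = end - begin
        # offset == -duration is algebraically the same as end == prev_end
        if duration > 0 and abs(offset + duration) <= tolerance:
            flagged.append(i)
    return flagged


# Hypothetical toy data: the third turn ends exactly when the second one does.
toy_turns = [(0, 1200), (1000, 2500), (1800, 2500), (2700, 3100)]
suspicious = shared_end_turns(toy_turns)
print(suspicious)
```

A high share of flagged turns in a corpus would correspond to the diagonal line in the duration-by-offset plot.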

Plotting the actual timing of the turns as a piano roll shows what’s up: the ways in which turns are segmented and overlap are highly improbable. Imagine a conversation that goes like this! (In red are the data points on the diagonal lines above.)

Fortunately some of the corpora we have for these languages don’t show this — so we’re using those. If we hadn’t plotted the data in a few different ways it would have been pretty hard to spot, with consequences down the line. So: always plot your data.

Originally tweeted by (@DingemanseMark) on November 6, 2021.


Via Language Log, a nice tutorial titled Interactive Visualization for Computational Linguistics [PDF, 13,1 Mb] by Christopher Collins, Gerald Penn, and Sheelagh Carpendale. Includes not only lots of wonderful visualizations, but also a lot of background information on Gestalt perception, visualizations as ‘external cognition’, preattentive processing, info on a case study (slide 196ff.), and ample examples of different kinds of visualization software. See also InfoVis:Wiki — Linguistic Visualization.

Wordle now does Extended Latin and diacritics

Great news for those who are into visual corpus linguistics but don’t work on SAE languages: since July, Wordle handles alphabets in the Extended Latin ranges; and today its maker, Jonathan Feinberg, added support for combining diacritics. That means that you can now feed Wordle texts from languages that use tone marks and other diacritics in their orthographies. Like Siwu.

Wordle based on some ten minutes of spontaneous conversation in Siwu.

The Wordle above displays the most common words in some ten minutes of spontaneous conversation in Siwu, one of the fruits of my last fieldtrip. The conversation has four participants. Nothing groundbreaking about this particular Wordle, it’s just a nice word cloud starring: Continue reading

More visualizations


A visualization of the previous two posts on Many Eyes and Siwu ne

Because recursivity is a Good Thing, here is a visualization of the previous two posts on visualizing linguistic data with Many Eyes. The astute reader will note that the strange loop is not perfect since I didn’t use Many Eyes for the visualization — that is because nothing can do a simple visualization as beautifully as Wordle.

Unfortunately, Wordle doesn’t seem to handle Unicode outside the basic Latin range very well (probably because of the fancy fonts), otherwise I would’ve fed it some Siwu text, too. (I think Wordle could be made to work with SIL’s freely available Unicode fonts.)

Queensland grammar scandal at a glance

As an added bonus, here are the 75 most common content words from the recent discussion of the Queensland grammar scandal (sampled from three verbose posts at Language Log and from matjjin-nehen, including comments). It won’t help the debate, but it does give you the brouhaha at a glance. Another different something function. Grammatical grammar. Australian grammar errors indeed.


The 75 most common content words about the Queensland grammar brouhaha in the linguablogosphere

(Link to Wordle found in Cornelis Puschmann‘s feed.)

Many Eyes on Siwu ne

Lots of readers looked at the challenge I posted last week (my blog statistics say more than 450 views for the post alone, so that’s many eyes indeed). A few of you were even daring enough to come up with a story on the various functions of Siwu ne. The challenge was probably a bit too difficult (involving an untranslated text in an as of yet undescribed Niger-Congo language), which makes those few attempts all the more heroic. So what did they see?


ne and its right periphery; “ne, …” accounts for almost half of the tokens

Brett was the first to bite the bullet, providing some statistics on the use of ne. He noted that “it occurs sentence initially 158 times (out of 1161) and sentence terminally 83 times. (…) It often seems to bracket a whole clause and it can even be doubled. The ne kama ne string is quite common.” Ray Girvan didn’t trust the visualization and inspected the raw text instead. He discovered that the text contains some dialogues “as well as as a complete song/poem with multiple uses of “ne” in a question”; that the construct Si …. ne occurs frequently in it; and that the text probably consisted of several different text types. On came Jason with a number of rather detailed observations: Continue reading
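Positional counts of this kind are easy to approximate in code. The following Python sketch is only an illustration under assumptions of my own: sentences are split naively on ., ! and ?, tokens on whitespace, the function name `positional_counts` is hypothetical, and the sample string is invented placeholder text rather than actual Siwu.

```python
import re


def positional_counts(text, word):
    """Count sentences in which `word` is the first or the last token."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    initial = final = 0
    for sentence in sentences:
        # Strip surrounding punctuation so that "ne," still counts as "ne".
        tokens = [t.strip(",;:\"'()").lower() for t in sentence.split()]
        tokens = [t for t in tokens if t]
        if not tokens:
            continue
        if tokens[0] == word:
            initial += 1
        if tokens[-1] == word:
            final += 1
    return {"sentences": len(sentences), "initial": initial, "final": final}


# Invented placeholder text, not real Siwu data.
sample = "ne a b c. d e ne. f ne g. ne h ne!"
counts = positional_counts(sample, "ne")
print(counts)
```

Real orthographies and transcription conventions would need a more careful tokenizer than this.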

Visual corpus linguistics with Many Eyes

I recently came across Many Eyes, a nifty data visualisation tool by IBM’s Visual Communication Lab. It has lots of options to handle tabular data, but (more interesting to linguists) it can also handle free text. The two visualization options it currently offers for text are a tag cloud and a so-called ‘word tree’. The former visualizes simple token frequency, the latter displays the occurrences of a given word (or phrase) in a branching view. It is the latter that I find the most exciting feature, because it allows for rapid visual exploration of linguistic patterns in a text.
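To make the two views concrete, here is a rough Python sketch of the counts that underlie them. It is not how Many Eyes works internally (I don't know its implementation); the tokenizer is deliberately naive, the function names are made up for illustration, and the sample is invented English rather than Siwu.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if token:
            tokens.append(token)
    return tokens


def token_frequencies(tokens):
    """The numbers behind a tag cloud: how often each token occurs."""
    return Counter(tokens)


def right_contexts(tokens, word, width=2):
    """The branches of a word tree: the tokens following each occurrence of `word`.

    Note: sentence boundaries are ignored in this sketch.
    """
    contexts = Counter()
    for i, token in enumerate(tokens):
        if token == word:
            contexts[tuple(tokens[i + 1:i + 1 + width])] += 1
    return contexts


# Made-up sample text for illustration only.
sample = "the cat sat on the mat. the cat ate. the dog sat on the cat."
tokens = tokenize(sample)
print(token_frequencies(tokens).most_common(3))
print(right_contexts(tokens, "the").most_common())
```

Grouping the right contexts by their first token, then by their second, and so on gives the branching structure that a word tree draws.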

Take for instance the Siwu locative marker i. Before today I vaguely knew where it usually occurs (before an NP and after a VP, more or less). Now I know (1) that it also occurs sentence initially, as in I Ɔtuka ame, … {LOC Lolobi inside} ‘In Lolobi country, …’; (2) that it often precedes a deictic, as in …i mmɔ {LOC there} ‘over there’; and (3) that one can have nested occurrences, as in ma-sɛ ma-a-su kaku i ngbe-gɔ i ɔturi ɔ-kpi mmɔ {they-HAB they-FUT-take funeral LOC here-REL LOC person he-died there} ‘they usually will hold the funeral there where (‘in the place in which’) the person died’. The next step is to look more carefully into these particular constructions and improve my grammatical analysis. I might conclude, for example, that the distal deictic mmɔ is more nouny than I had taken it to be.

Of course, I would have discovered these facts eventually after carefully analyzing enough Siwu texts, but the point is that right now, finding and comparing these patterns took only five minutes of playing around with the word tree above. Cool, isn’t it? Let’s call it visual corpus linguistics.