What sound symbolism can and cannot do: new paper in Language

We have a new paper out in Language:

Dingemanse, Mark, Will Schuerman, Eva Reinisch, Sylvia Tufvesson, and Holger Mitterer. 2016. “What Sound Symbolism Can and Cannot Do: Testing the Iconicity of Ideophones from Five Languages.” Language 92 (2): e117–33. doi:10.1353/lan.2016.0034

The basic finding is this: people are sensitive to the meaning of ideophones they’ve never heard, even when they are produced out of context by a computer voice in a difficult forced choice task. Yet there is also reason for caution: the effect is not nearly as strong as what people have found for pseudowords like bouba and kiki.

As we note in the introduction, “there appears to be a tendency to either underplay or exaggerate the significance of iconicity in the study of language and mind”. In this paper we chart a middle way between these extremes. Here’s a quick summary in 3×3 points:

What we did:

  1. Sound symbolism (iconicity in spoken language) is usually studied using hand-crafted pseudowords in binary forced choice experiments (think bouba and kiki, as reviewed here), but there are three problems with such experimental designs: (i) they run the risk of inflating effect sizes, (ii) it is unclear how they relate to natural languages, and (iii) they usually don’t control for prosody.
  2. We designed a study to tackle these problems by (i) adjusting the binary choice task to be more realistic and harder, (ii) using real words and meanings from natural languages, and (iii) teasing apart prosody and segmental features. Essentially, we bring linguistic insights to bear on the psychological study of sound symbolism.
  3. We take 203 ideophones —lexical sound-symbolic words— from 5 languages and 5 semantic domains and present them to 80 participants in 4 versions: (i) full original recording, (ii) full speech synthesized version, (iii) prosody-only condition and (iv) phonemes-only condition. The versions help us control for variation due to different speakers and help us examine the contributions of prosody and segmental features.

What we found:

  1. People can choose the correct translation of ideophones at a level significantly above chance. So ideophones in Japanese, Korean, Semai, Ewe and Siwu are not fully arbitrary, as is normally assumed of words; they contain iconic cues that even people who don’t speak these languages can pick up.
  2. Sound ideophones are easiest to guess, but the other semantic domains (movement, texture, color/visual appearance, and shape) come out significantly above chance as well. However, the effect is much more modest than in most bouba/kiki studies: in the best versions, people score 57.2% on average (where 50% would be chance level), quite different from the 95% that has sometimes been claimed for pseudoword studies.
  3. Performance for the original and resynthesised stimuli is indistinguishable, so our speech synthesis method works. Performance is significantly better for the full versions (i-ii) than for the reduced versions (iii-iv), so both prosody and phonemes contribute to the effect (and neither alone is sufficient).
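To get a feel for how a score of 57.2% can be "significantly above chance" and still count as a modest effect, here is a minimal sketch of a one-sided exact binomial test. The trial counts are hypothetical, chosen only to match the 57.2% figure; this is not the analysis reported in the paper, just an illustration of the underlying logic.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= successes) if every answer is a guess."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical numbers for illustration only (not the study's data).
trials = 500
successes = 286  # 286 / 500 = 57.2% correct

p = binomial_p_value(successes, trials)
print(f"accuracy = {successes / trials:.1%}, one-sided p = {p:.2g}")
```

With enough trials, even a small departure from 50% yields a small p-value, which is why it helps to report the size of the effect alongside its significance.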

What we conclude:

  1. Findings based on pseudowords like bouba/kiki cannot be automatically translated into claims about natural languages. Ideophones combine iconicity and arbitrariness, and lexical iconicity in ideophones is best characterised as a weak bias, which is supported by multimodal performances in actual use and which may be amplified in cultural evolution (cf our TiCS paper).
  2. Prosody is just as important as segmental information in supporting iconic interpretations (as predicted here). Prior work, which has rarely controlled for prosody, likely overestimates the role of speech sounds and underestimates the role of intonation, duration and other prosodic cues.
  3. Speech synthesis offers a viable way to achieve experimental control in the study of sound symbolism. To stimulate its wider use we’re making available all stimulus materials, including the diphone synthesis source files we used to create them. Get them at MUSE or OSF.

Here’s the abstract:

Sound symbolism is a phenomenon with broad relevance to the study of language and mind, but there has been a disconnect between its investigations in linguistics and psychology. This study tests the sound-symbolic potential of ideophones—words described as iconic—in an experimental task that improves over prior work in terms of ecological validity and experimental control. We presented 203 ideophones from five languages to eighty-two Dutch listeners in a binary-choice task, in four versions: original recording, full diphone resynthesis, segments-only resynthesis, and prosody-only resynthesis. Listeners guessed the meaning of all four versions above chance, confirming the iconicity of ideophones and showing the viability of speech synthesis as a way of controlling for segmental and suprasegmental properties in experimental studies of sound symbolism. The success rate was more modest than prior studies using pseudowords like bouba/kiki, implying that assumptions based on such words cannot simply be transferred to natural languages. Prosody and segments together drive the effect: neither alone is sufficient, showing that segments and prosody work together as cues supporting iconic interpretations. The findings cast doubt on attempts to ascribe iconic meanings to segments alone and support a view of ideophones as words that combine arbitrariness and iconicity. We discuss the implications for theory and methods in the empirical study of sound symbolism and iconicity.

Some things you need to know about Google Scholar

Summary: Google Scholar is great, but its inclusiveness and its mix of automatically updated and hand-curated profiles mean you should never take any of its numbers at face value. Case in point: the power couple Dr. A. Author and Prof. Et Al. If you have a Scholar profile, make sure you don’t let Scholar update the publication list automatically without checking and cleaning up regularly. If you’re looking at somebody else’s profile, take it with a big pinch of salt, especially when they have a reasonably common name or when duplicate entries or weird citation distributions indicate that it is being automatically updated.

Update July 1st: Google Scholar has now manually blocked Prof. et al. from appearing in top rankings for her disciplines. They probably thought her too prominent a reminder of the gameability of their system (how long will it take before they silence her next of kin?). This doesn’t solve the real problem, noted below, of auto-updating profiles like Yi Zhang and John A. Smith diluting top rankings. In fact, even in scientometrics, it looks like there are at least 3 or 4 auto-updating profiles in the top 10.

I love Google Scholar. Like many scientists, I use it all the time for finding scientific literature online, and it is more helpful and comprehensive than services like PubMed, Sciencedirect, or JSTOR. I like that Google Scholar rapidly delivers scholarly papers as well as information about how these papers are cited. I also like its no-nonsense author profiles, which enable you to find someone’s most influential publications and gauge their relative influence at a glance. These are good things. But they are also bad things. Let’s consider why.

Three good things about Google Scholar

  1. Google Scholar is inclusive. It finds scholarly works of many types and indexes material from scholarly journals, books, conference proceedings, and preprint servers. In many disciplines, books and peer-reviewed proceedings are as highly valued and as influential as journal publications. Yet services like Web of Science and PubMed focus on indexing only journals, making Google Scholar a preferred tool for many people interested in publication discovery and citation counts.
  2. Its citation analysis is automated. Citations are updated continuously, and with Google indexing even the more obscure academic websites, keeping track of the influence of scholarly work has become easier than ever. You can even ask Scholar to send you an email when there are new citations of your work. There is very little selection, no hand-picking, and no influence from questionable measures like impact factor: only citations, pure and simple, determine the order in which papers are listed.
  3. Its profiles are done by scholars. No sane person wants to disambiguate the hundreds of scholars named Smith or clean up the mess of papers without named authors, titles or journals. Somebody at Google Scholar had the brilliant idea that this work can be farmed out to people who have a stake in it: individual scholars who want to make sure their contributions are presented correctly and comprehensively. So while citations are automated, the publication lists in Google Scholar profiles are at least potentially hand-curated by the profile owners. Pretty useful. But wait…

Three bad things about Google Scholar

The classic 'Title of paper', 1995

  1. Google Scholar is inclusive. It will count anything that remotely looks like an article, including the masterpiece “Title of article” (with 128 citations) by A. Author. It will include anything it finds on university web domains, so anyone with access to such a domain can easily game the system. Recently it has started to index stuff on academia.edu, a place without any quality control where anybody can upload anything for dissemination.
  2. Its citation analysis is automated. There are no humans pushing buttons, making decisions and filtering stuff. This means rigorous quality control is impossible. That’s why publications in the well-known “Name of journal” are counted as contributing bona fide citations, and indeed how “Title of article” can have 128 citations so far. It’s also why the recent addition of academia.edu content has resulted in an influx of duplicate citations due to poor metadata.
  3. Its profiles are done by scholars. Scholars have incentives to appear influential. H-indexes and citation counts play a role in how their work is evaluated and enter into funding and hiring decisions. Publications and co-authors can be added to Google Scholar manually without any constraints or control mechanism, an opportunity for gaming the system that some may find hard to resist. But forget malicious intent: scholars are people, and people are lazy. If Google Scholar tells them it can update their publications lists automatically, they’ll definitely do so — with consequences that can be as hilarious as harmful, as we’ll see below.

To illustrate these points, let’s have a look at the Google Scholar profiles of two eminent scholars, Dr. Author and Prof. Et Al.

Dr. Author

Enter dr. A. Author. Ranking second in the field of citation analysis, he has an h-index of 30 and over 3500 citations. Among his most influential papers are “Title of article” with 159 citations and “Title of paper” with 128 citations to date. It is a matter of some regret to him that his 1990 “Instructions to authors” has been less influential, but perhaps its time is yet to come. Dr. Author is active across a remarkable range of fields. He likes to write templates, editorials, and front matter but has also been known to produce peer-reviewed papers. His first name is variously spelled Andrew, Albert or Anonymous, but most people just call him “A.” and Google Scholar happily accepts that.

Dr. Author reminds us that Google Scholar citations are done by an automated system, and so will be necessarily noisy. His profile simply gathers anything attributed to “A. Author”, a listing that is automatically updated in accordance with Google Scholar’s recommended settings. How pieces like “Title of article” can accrue >100 citations is a bit of a mystery, especially since only a few of the citing articles are other templates. Some of A. Author’s highly cited papers seem to be due to incomplete metadata from the source; others seem to be simply misparses; some are correct in the sense that editorials are often authored by “anonymous author”. At any rate, this shows there are a lot of ghost publications and citations out there, some of which may easily be attributed to people or publications they don’t belong to.

But surely these are just quirks due to bad data — garbage in, garbage out, as they say. Actual scientists maintaining a profile can count on more reliable listings. Or can they?

Prof. Et Al

Enter prof. Et Al. With an h-index of 333 and over 1 million citations, she is the world’s most influential scientist, surpassing such highly cited scholars as Freud, Foucault, and Frith (what is it with F?). She has an Erdős number of 1 and ranks first in the disciplines of scientometrics, bibliometrics, quality control and performance assessment; in fact in any discipline she would care to associate herself with. How did she reach this status? Simply by (i) creating a profile under her name, (ii) blindly adding the publications that Google Scholar suggested were hers, and (iii) allowing Scholar to update her profile automatically, as recommended. Oh, and just because Google Scholar allows her to, she also manually added some more papers she was sure she wrote (including with her good friend Paul Erdős).

Prof. Al reminds us that Google Scholar profiles are made by scholars. Scholars, being people, are mostly well-intentioned — but they can also be unsuspecting, lazy or worse. Prof. Al started out by simply doing what most scholars do when they create a new profile: following the instructions and recommended settings. If you do this blindly, Google Scholar will just add anything to your profile that comes remotely close to your name, and there is almost a guarantee that you’ll end up with a profile that way overestimates your scientific contributions.

Real-life examples

It is not that hard to find real examples of profiles getting a lot of extra padding because of Scholar’s automatic updating feature. Take Yi Zhang at Georgia Tech, who must surely be the most accomplished PhD student ever with 40,000+ citations and an h-index of 70. This is Google Scholar’s recommended “automatic updating” feature going bananas with what must be a very common name. Indeed, there is another Yi Zhang, ranking 4th in syntax just after Chomsky, Sag, and Kiparsky. His top cited paper has 306 citations and yet the sum of his work (a well-rounded total of 1000 publications) has somehow received over 23,000 citations. (Note that #5 and #6 in syntax are also auto-updating profiles.)

All this is mostly harmless fun, until you realise that a profile may be claiming the publications and citations of another one without either of them noticing. Case in point: the profile of Giovanni Arturo Rossi, an expert on respiratory diseases, is consistently hoovering up publications by my colleague Giovanni Rossi, who works on social interaction. Scholar auto-links author names to profiles in search results, preventing people from finding the real Rossi from his publications unless he actively and manually adds those Arturo-claimed publications to his profile.

Bottom line: if you have a common name, you’ll have to take control of every new publication manually, since otherwise Rossi (or Smith, or Zhang) is going to get it added automatically to their profile. Also, if you have a common name and you blindly follow Google Scholar’s recommended settings, you may be very pleased with your h-index, but probably for the wrong reasons (hello there John A. Smith, independent scholar, 23,428 citations, h-index 64!). So my most general recommendation would be: don’t let Google Scholar update your profile automatically, and if you must, clean up regularly to avoid looking silly.
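To see how quickly a namesake's papers can inflate a profile, recall that the h-index is the largest number h such that h of your papers have at least h citations each. Here is a minimal sketch of that calculation; the citation counts are made up for illustration and do not describe any real profile.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only go down from here, so h cannot grow further
    return h


# Made-up citation counts, for illustration only.
own_papers = [25, 12, 8, 5, 3, 1]
namesake_papers = [40, 30, 22, 9, 7, 6]

print("curated profile:     ", h_index(own_papers))
print("auto-updated profile:", h_index(own_papers + namesake_papers))
```

Merging in just six papers by somebody else raises the h-index in this toy example, and neither scholar has to do anything wrong for it to happen.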

Know what you’re doing

So far, the examples arise simply from Google Scholar’s recommended setting to automatically update publication lists. It doesn’t look like any of these authors (well, except maybe dr. Author and prof. Et Al) have done anything like actively adding publications that aren’t theirs, or claiming they’ve worked with Paul Erdős. But here’s the thing: these things are not just possible, they are really easy, as prof. Et Al’s superstar profile shows. And with hundreds of thousands of active profiles, there are bound to be some bad apples among them.

What are the consequences? Nothing much if you take Google Scholar for what it is: a useful but imperfect tool. Yet many take it more seriously. If you’re in the business of comparing people (for instance while reviewing job applications or when looking for potential conference speakers), the metrics provided by Google Scholar are some of the first ones you’ll come across and it will be very tempting to use them. There is even an R package that will help you extract citation data and compare scholars based solely on citation numbers and h-indexes. All this is perilous business, considering these ranks are diluted with auto-updating ghost profiles.

Let me end by reiterating that I love Google Scholar and I use it all the time. It can be a tremendously useful tool. Like all tools, it can also be misinterpreted, misused and even gamed. If you know what you’re doing you should be fine. But if you think you can blindly trust it, take another look at the work of dr. A. Author and prof. dr. Et Al.

Oh, one more thing. If you’re organising a conference on scientometrics or bibliometrics, or looking to hire a new person in quality control or performance assessment, definitely have a look at the top ranking people in those fields. I believe you will find one individual who tops them all, and she’d definitely be worthy of an invitation.

Notes

The “A. Author” and “Et Al” profiles were created in June 2016 by Mark Dingemanse to illustrate the points made in this post. Thanks to Seán Roberts for suggesting that A. Author should co-author with Et Al. Just in case Google Scholar follows up with some manual quality control and some of these profiles or publications disappear, screenshots document all the relevant profiles and pages.

There is something of a tradition of creating Google Scholar profiles to make a point; see here and here, for example.

[Edit June 29, 2016: While my goal here is simply to promote mindful use of technology by noting some problems with Google Scholar profiles (as opposed to citations, the focus of most prior research), let me note there is of course a large scholarly literature in bibliometrics and scientometrics on the pros and cons of Google Scholar. Google Scholar Digest offers a comprehensive bibliography.]

 

Arbitrariness, iconicity and systematicity in language

Just out in Trends in Cognitive Sciences: a review paper by yours truly with Damián Blasi, Gary Lupyan, Morten Christiansen and Padraic Monaghan. It is titled “Arbitrariness, iconicity and systematicity in language”. You can download it here (PDF). Here is a simple summary:

An important principle in linguistics is that words show no predictable relation between their form and their meaning (arbitrariness). Yet this principle does not have exclusive reign. Some words have forms that suggest aspects of their meaning (iconicity). Some groups of words have subtle statistical properties that give away something about their grammatical function (systematicity). To fully explain how words work, we need to recognise that the principle of arbitrariness is not the whole story, and that words can additionally show degrees of iconicity and systematicity.

Here are some of the main points made in the paper:

  1. Often, arbitrariness is thought to be not just necessary but also sufficient to explain how words work. The paper shows this is not the case: non-arbitrary patterns in language are more common than assumed, and they have implications for how we learn, process and use language.
  2. Often, arbitrariness and iconicity are pitted against each other. The paper shows this is an oversimplification: iconic words have a degree of arbitrariness and the two do not exclude each other.
  3. Often, the role of iconicity in language is thought to be minimal. The paper shows that it can differ dramatically across languages and also varies as a function of meaning and modality (e.g. signed or spoken).
  4. Sometimes, iconicity and systematicity have been confused. The paper shows that distinguishing them helps us to better understand vocabulary structure, by showing why we may expect iconicity to show certain universal patterns while systematicity allows more language-specific patterns.
  5. Sometimes, we may forget that words are not abstract ideas but tools that have their own history. The paper argues that the way words are learned and used influences their form, and that this may help explain how arbitrariness, iconicity and systematicity pattern the way they do.
  6. Sometimes, language scientists make far-reaching claims based on studying a small portion of the vocabulary, or a small number of (typically Western) languages. The paper argues that we can get a better picture of language by looking at a wider range of evidence.

Dingemanse, Mark, Damián E. Blasi, Gary Lupyan, Morten H. Christiansen, and Padraic Monaghan. 2015. “Arbitrariness, Iconicity and Systematicity in Language.” Trends in Cognitive Sciences 19 (10): 603–15. doi:10.1016/j.tics.2015.07.013.

Folk Definitions in Linguistic Fieldwork

Another extensively revised chapter from my thesis sees the light: Folk definitions in linguistic fieldwork, in which I discuss a procedure that is part of many fieldwork routines but seldom appreciated as a method in its own right. Abstract:

Informal paraphrases by native speaker consultants are crucial tools in linguistic fieldwork. When recorded, archived, and analysed, they offer rich data that can be mined for many purposes, from lexicography to semantic typology and from ethnography to the investigation of gesture and speech. This paper describes a procedure for the collection and analysis of folk definitions that are native (in the language under study rather than the language of analysis), informal (spoken rather than written), and multi-modal (preserving the integrity of gesture-speech composite utterances). The value of folk definitions is demonstrated using the case of ideophones, words that are notoriously hard to study using traditional elicitation methods. Three explanatory strategies used in a set of folk definitions of ideophones are examined: the offering of everyday contexts of use, the use of depictive gestures, and the use of sense relations as semantic anchoring points. Folk definitions help elucidate word meanings that are hard to capture, bring to light cultural background knowledge that often remains implicit, and take seriously the crucial involvement of native speaker consultants in linguistic fieldwork. They provide useful data for language documentation and are an essential element of any toolkit for linguistic and ethnographic field research.

Dingemanse, Mark. 2015. “Folk Definitions in Linguistic Fieldwork.” In Language Documentation and Endangerment in Africa, edited by James Essegbey, Brent Henderson, and Fiona McLaughlin, 215–38. Amsterdam: John Benjamins. (PDF)

Bruner on language learning

Jerome Bruner (who turns 100 today!) writes in his 1983 autobiography (emphasis in original):

“How puzzling that there should be so much emphasis … on the underlying genetic program that makes language acquisition possible and so little on the ways in which the culture, the parents and more “expert” speakers (including other, older children) help the genetic program to find expression in actual language use. The educational level of parents deeply affects how well, richly and abstractly their children will talk (and listen). It is not just the grammar of sentences that is at issue, but discourse, dialogue, the capacity to interpret spoken and written language.

In the end, I came to the conclusion that the need to use language fully as an instrument for participating in a complex culture (just as the infant uses it to enter the simple culture of his surround) is what provides the engine for language acquisition. The genetic ‘program’ for language is only half the story. The support system is the other half.”

Three decades later, proposals for the other half, what Bruner calls “the engine for language acquisition”, have become increasingly well-articulated and supported by rich empirical data (cf., for instance, all the research reviewed in Tomasello’s (2003) Constructing a Language). But the two halves (genetic underpinnings and cultural scaffolding) are still not regularly talking to each other. Indeed they’re frequently pretending that the other half has no story at all… Why?

Bruner, Jerome S. 1983. In Search of Mind: Essays in Autobiography. Alfred P. Sloan Foundation Series. New York: Harper & Row.

Pragmatic Typology: invited panel at IPrA 2015 in Antwerp

Together with Giovanni Rossi I’ve organised an invited panel at the 14th International Pragmatics Conference in Antwerp, July 2015. Contributors include Jörg Zinken & Arnulf Deppermann; Sandy Thompson & Yoshi Ono; Stef Spronck; Giovanni Rossi, Simeon Floyd, Julija Baranova, Joe Blythe, Mark Dingemanse, Kobin Kendrick & N.J. Enfield; Ilana Mushin; and Mark Dingemanse. More information here.

Hockett on open-mindedness in the language sciences

Hockett's design features (1960 version)

Charles F. Hockett (1916-2000) is well-known for his work on the design features of language. Many linguists will know his 1960 article in Scientific American in which thirteen design features are nicely illustrated (though Hockett himself preferred the more developed 1968 version co-authored with Altmann).

Hockett worked in many areas of linguistics, from phonology to morphology and from linguistic anthropology to semantics. One of his later books (which I came across while doing research for our new book series Conceptual Foundations of Language Science) has the intriguing, slightly cumbersome title “Refurbishing our Foundations: Elementary Linguistics from an Advanced Point of View”.

In this book, written towards the end of a long career, Hockett takes a bird’s-eye view of the field of linguistics and presents his own perspective, which is often sensible, sometimes a bit idiosyncratic, and always interesting. The introduction is pleasantly constructive, in contrast to some other approaches (Hornstein’s “Against open-mindedness” comes to mind). Hockett’s observations on the “eclipsing stance” are as relevant today as they were three decades ago. So here is Hockett on open-mindedness:

No one in any culture known to us denies the importance of language. Partly because it is important, partly just because, like Mount Everest, it is there, we should like to know how it works. To that end, people from time immemorial have examined it or speculated about it, trying to come up with cogent commentary.

What one sees of language, as of anything, depends on the angle of view, and different explorers approach from different directions. Unfortunately, sometimes they become so enamored of their particular approach that they incline to scoff at any other, so that instead of everybody being the richer for the variety, everybody loses. That attitude has been called the “eclipsing stance.”

The early followers of Noam Chomsky adopted this stance, but they were by no means the first: some of us post-Bloomfieldians came close to it in the 1940s (though Leonard Bloomfield himself never did), and so, apparently, did the Junggrammatiker in the late 1870s. But it is a wrong position to take, even toward those who have themselves assumed it. It is obviously impossible to see all of anything from a single vantage point. So it is never inappropriate to seek new perspectives, and always unseemly to derogate those favored by others. Or, to use a different figure: the blind man touching the tail has reason to say an elephant is like a rope, but no right to claim an elephant is not also like a wall or a tree-trunk or a snake.

I don’t mean we shouldn’t be critical. I do mean we should try to be most wary just of those propositions that we ourselves hold, or have held, closest to our hearts — above all, those we come to realize we have been taking for granted. Scientific hypotheses are formulated not to be protected but to be attacked. The good hypothesis defends itself, needing no help from enthusiastic partisans.

References

  • Hockett, Charles F. “The Origin of Speech.” Scientific American 203, no. 3 (1960): 89–96.
  • Hockett, Charles F., and Stuart A. Altmann. “A Note on Design Features.” In Animal Communication: Techniques of Study and Results of Research, edited by Thomas Sebeok, 61–72. Bloomington: Indiana University Press, 1968.
  • Hockett, Charles F. Refurbishing Our Foundations: Elementary Linguistics from an Advanced Point of View. Amsterdam: John Benjamins, 1987.

Conceptual Foundations of Language Science publishes its first book

Two months ago we started a new book series with the innovative open access publisher Language Science Press: Conceptual Foundations of Language Science. We’re proud to announce that the series published its first book this week. The book, Natural causes of language, is introduced here by Nick Enfield:

You can download your own copy of the book directly from Language Science Press: http://langsci-press.org/catalog/book/48. If you prefer a print copy, you can order one through Amazon.

About the series

Conceptual Foundations of Language Science publishes short and accessible books that explore well-defined topics in the conceptual foundations of language science. The series provides a venue for conceptual arguments and explorations that do not require the traditional book-length treatment, yet that demand more space than a typical journal article allows. Books in the series are peer-reviewed, ensuring high scholarly quality; and they are open access, ensuring universal availability.

The editorial board of the series spans the full diversity of the language sciences, from phonology to syntax and semantics, from grammar to discourse, and from generative to functional and typological approaches to language: Balthasar Bickel (University of Zürich), Claire Bowern (Yale University), Elizabeth Couper-Kuhlen (University of Helsinki), William Croft (University of New Mexico), Rose-Marie Déchaine (University of British Columbia), William A. Foley (University of Sydney), William F. Hanks (University of California at Berkeley), Paul Kockelman (Yale University), Keren Rice (University of Toronto), Sharon Rose (University of California at San Diego), Frederick J. Newmeyer (University of Washington), Wendy Sandler (University of Haifa), and Dan Sperber (Central European University, Budapest).

Two basic ideas underlie the series. The first is that in times of empirical advances and methodological innovations, it is especially important to be clear and explicit about conceptual foundations. As we write in the series blurb, “In language science, our concepts about language underlie our thinking and organize our work. They determine our assumptions, direct our attention, and guide our hypotheses and our reasoning. Only with clarity about conceptual foundations can we pose coherent research questions, design critical experiments, and collect crucial data.”

The second idea is to take advantage of the affordances of open access publishing and step into a market gap left by commercial publishers. As we explain: “Traditional publishers tend not to publish very short books. The reasons are economic. With open-access, the problem does not arise. One benefit of the short format is that the book is accessible and quickly readable. Another is that authors will find writing such a book attractive because it is manageable, given the usual time constraints, especially for more senior authors.”

Do you have an idea for a book, or do you have a manuscript which would fit the goals of the series? Consider submitting it to Conceptual Foundations of Language Science. You’ll find further information on the website. Also check out Language Science Press, the visionary open access publishing house that hosts our series as well as a dozen others.

Editorial Manager and password security for academics

Today, Nature published a news feature by Cat Ferguson, Adam Marcus and Ivan Oransky (Retraction Watch) in which I am quoted about some problems with Editorial Manager (EM). This post provides the background to what I say there. Disclaimer: I am not a security expert, though the basic problems should be obvious to anyone caring about security and privacy on the web.

Editorial Manager (EM), the submission and reviewing software used by thousands of academic journals, routinely throws around passwords in plaintext. If you publish with any of the journals using EM, you’ll get emails with your password in plain text, even if you didn’t ask for it. Some configurations of EM even display the password in plain view on the user account page. This means that Editorial Manager does not safely encrypt passwords, which presents a massive security risk. Aries Systems, the firm behind Editorial Manager, defends itself by saying that (1) journal editors want these options and (2) they don’t collect financial information anyway. Those replies skirt the real issue: Editorial Manager, trusted by millions of academic authors and reviewers, fails to implement some of the most basic rules for the secure and responsible handling of passwords and user accounts.

Editorial manager

Every academic will run into Editorial Manager and its kin sooner or later. This is a piece of web-based software that helps the editors of academic journals to manage the submission and review procedure. Literally thousands of journals across all disciplines use it, from well-known interdisciplinary ones like PLOS ONE to niche journals like Policy and Society and Frontiers in Neuroendocrinology. Elsevier has its own branded version (EES) of what is essentially the same software.

EM requires authors, reviewers and editors to register an account. With such an account, users can submit and review manuscripts. The registration procedure asks for the usual username / email / password combination — nothing very special so far. Until you start using the system a bit more and you discover that it handles your password in, shall we say, a very casual way.

Take PLOS ONE (though note that any other journal using EM is vulnerable in the same ways). Say you submit a paper, or get a request to review one. You’ll get an email notification — with your password. You didn’t ask for this. In fact, even if you did, the system shouldn’t be able to give it to you; at most it should offer to let you set or reset it. Many of us haven’t seen plaintext passwords since the early 2000s; in the last decade, better and safer methods have been introduced everywhere, except at Editorial Manager.

Editorial Manager for PLoS ONE sends you your password in plain text, even if you don’t ask for it

Packet sniffing and password reuse

I shouldn’t really have to explain why this is seriously problematic in multiple ways. Indeed, the reasons are written up all over the internet (1 2 3 4 5). The fact that a system sends you your password by email means that your password could be intercepted by any old packet sniffer on a network that you’re using (think free wifi). It also means, obviously, that your password can be retrieved by anyone who manages to get access to your email, whether by looking over your shoulder, by rummaging in your inbox while you’re away from your computer, or by more sophisticated means.

Worse, the fact that the system can send out passwords means that passwords are stored in plaintext form, or using easily reversible encryption (which, experts say, comes down to the same). As plaintextoffenders puts it, the password is there on the server, waiting for someone to come and take it. And not only your password is there. It’s the passwords of the millions of users of the thousands of academic journals using this centrally hosted service. That, coupled with the knowledge that about 60% of users reuse passwords across different web services, means a security risk of massive proportions. Check out this XKCD comic for the basics on password reuse.

So EM freely shares passwords in emails. It also displays passwords on profile pages, offering further proof of the lack of encryption (or the use of reversible encryption). Thousands of academic journals crucially depend on this same system, making their hundreds of thousands of peer reviewers and authors sign up for it. You would think that with “periodic third party security and infrastructure audits” (according to Aries’ hosting checklist), Aries Systems would at least have ensured that the most basic lessons in user account security are taken care of. Apparently not.

Aries Systems: ‘It’s optional, so don’t worry’

While preparing a report on this matter in May 2013, I communicated my findings to the Editorial Manager team, because I thought it would be reasonable to give them the chance to respond to the issue. After a first email went unanswered, a reminder email led to an email exchange with Jennifer Fleet, Director of Client Services at Aries Systems, the company behind EM. Here are the most crucial excerpts from what she wrote to me (she graciously gave me permission to cite this):

Our software (Editorial Manager) has a variety of configuration options that are made available to our publishing customers. The inclusion of credentials in emails is an optional configuration choice. The configuration option to include log-in credentials in emails is desired by some publishers because of the high convenience factor it provides to end users who infrequently access the system. However, inclusion of credentials in emails can also be entirely suppressed and many publishers in fact do not include credentials in emails. We have a wide variety of publishing customers and each is empowered by the administrative capabilities in the system to make their own choices concerning this type of policy.

While honest, this reply suggests that Aries Systems doesn’t realise how important it is to handle user information in a responsible way. The defense is basically: our clients want it, so we do it. But clients should never dictate security design. As a common principle in web development states, your responsibility goes beyond your application.

Modern, safe, and user-friendly ways of handling user account security (involving hashed+salted storage of passwords with tokenised ways of resetting (but never retrieving!) them) have been available for at least five years now. People have thought long and hard about this problem. Repeated breaches have shown how dangerous it is to use anything less than secure encryption and robust ways of resetting passwords. It is remarkable to see such a blatant disregard of industry-standard security measures.
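To make the contrast concrete, here is a minimal sketch of what salted hashing plus tokenised resets can look like, using only the Python standard library. It is an illustration under assumptions, not a description of how any particular system works: the function names, the iteration count and the one-hour token lifetime are all hypothetical choices made for this example. The point is that the server keeps only a salt and a digest, so it can check a password but has nothing it could email back to you.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple

# Illustrative work factor only (an assumption for this sketch, not a recommendation).
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values would be stored."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


def issue_reset_token(ttl_seconds: int = 3600) -> Tuple[str, bytes, float]:
    """Create a one-off reset token.

    The plain token goes to the user (e.g. inside a reset link);
    the server keeps only its hash and an expiry time.
    """
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode("utf-8")).digest()
    return token, token_hash, time.time() + ttl_seconds


def redeem_reset_token(token: str, token_hash: bytes, expires_at: float,
                       now: Optional[float] = None) -> bool:
    """Accept the token only if it matches the stored hash and has not expired."""
    now = time.time() if now is None else now
    presented = hashlib.sha256(token.encode("utf-8")).digest()
    return now < expires_at and hmac.compare_digest(presented, token_hash)
```

In this sketch a forgotten password is never retrieved: the user redeems a short-lived token and then sets a new password, which is hashed afresh with `hash_password`.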

Aries Systems continues by saying, “We do encrypt all passwords.” Here is a simple technical fact: any system that offers the possibility to switch on bizarre options like sending out or displaying plaintext passwords has to store its passwords in such a way that they can be easily fetched and decrypted. No matter how creatively you define “encryption”, you can’t get around that as long as you offer this ‘service’ to your customers.

Aries Systems: ‘We don’t collect financial information’

Aries Systems: “Also, Editorial Manager does not collect or store any financial information.

I would be happy to discuss this with you further by telephone or email, and I hope you will understand the dynamics and trade-offs under consideration.”

The final defense is that Editorial Manager doesn’t store financial information. This sounds like, “You haven’t given us your credit card, so we’ll just handle your user accounts in an irresponsible way.” This disregards the fundamental principle, mentioned above, that your responsibility goes beyond your application. It is also a mitigating circumstance in appearance only: it is widely known that about 60% of users reuse passwords across websites.

If malicious hackers were to get access to an EM server, how many of the emails and passwords would match with accounts on other services that do allow financial transactions? The userbase of EM consists of highly educated people in academia. They have credit cards, Amazon accounts, PayPal wallets, iTunes IDs, et cetera. A significant chunk of them may use the same password for some of those services. Put these things together and suddenly Editorial Manager becomes a very interesting hacking target. (I would not say this out loud here if I had not communicated all this to Aries Systems well over a year ago.)

It’s not just about financial information

The problem is not just about credit cards and such, but about the security of the very process of scholarly publishing. Editorial Manager is easily one of the weakest links in the chain of peer review. What if you could easily get access to someone’s account — pass yourself off as a peer reviewer, say, or get access to an editor’s account to invite your own friends (or yourself) to peer review your own papers?

This is not mere conjecture. It’s happened already, as documented by Retraction Watch: Elsevier’s editorial system (a branded version of EM) was hacked, leading to a peer review scandal and ultimately to a couple of retractions. The details of the case aren’t known, but with a link in the chain as weak as EM’s lighthearted handling of password security, I wouldn’t be surprised if some form of password hacking played a role; with the lax security of Editorial Manager, getting access to passwords is child’s play.

Let me end on a positive note. The journal Language recently transitioned to the open source Open Journal Systems, which, as far as I’ve been able to ascertain, handles passwords and account information in a much more secure and modern way. Such is the power of open source. Of course, this doesn’t really help us end users: we are unlikely to choose a publication venue on the basis of the manuscript management software they force us to use. But it does show that there are good models out there. Let’s hope that the dust kicked up by the Nature news story will bring some change for the better. Meanwhile, if you’re forced to use Editorial Manager, use disposable passwords and write to the editor to tell them of the risks of the system.

 

Mark Dingemanse
Max Planck Institute for Psycholinguistics, Nijmegen

Hockett on arbitrariness and iconicity

Charles Hockett had interesting views on the relation between iconicity and arbitrariness. Here is a key quote:

The difference of dimensionality means that signages can be iconic to an extent to which languages cannot; and they are, even though, as Frishberg (1975) tells us, the trend in Ameslan for over a century has been towards more and more conventionalization.

Now, while arbitrariness has its points (see, e.g., Hockett 1960a, p. 212), it also has drawbacks (Hewes, ANYAS, p. 495), so that it is perhaps more revealing to put the difference the other way around, as a limitation of spoken languages.

Indeed, the dimensionality of signing is that of life itself, and it would be stupid not to resort to picturing, pantomiming, or pointing whenever convenient. (Even when speaking we do this: for example, we utter a demonstrative such as there, which indicates relative distance but not direction, and supplement it by a pointing gesture that indicates direction but not distance.)

But when a representation of some four-dimensional hunk of life has to be compressed into the single dimension of speech, most iconicity is necessarily squeezed out. In one-dimensional projection, an elephant is indistinguishable from a woodshed. Speech perforce is largely arbitrary; if we speakers take pride in that, it is because in 50,000 years or so of talking we have learned to make a virtue of necessity (cf. Hill 1972, pp. 313-15).

Linearity means that single devices must serve multiple functions, whereupon structural ambiguity becomes par for the course (see C. R. Peters, Origins, pp. 83-102). We hear Carbon fourteen, Strontium ninety; out of context, we do not know whether this is mention of two radioactive isotopes, or a roadside marker giving the distance to two towns on the road ahead, or the final score in the game between Carbon Free Academy and Strontium Senior High. 

It is such ambiguities, forced by limited dimensionality if by nothing else, that have given rise to the notion of “surface versus deep structure,” which Stokoe evokes for the remark of which the present paragraph is an expanded paraphrase – his trenchant observation (ANYAS, p. 510) that in sign, as over against speech, “surface and depth more nearly coincide.” (pp. 264-5)

Hockett, C. F. 1978. “In Search of Jove’s Brow.” American Speech 53 (4): 243–313. doi:10.2307/455140.