Why article-level metrics are better than JIF if you value talent over privilege

I’ve been caught up in a few debates recently about Recognition and Rewards, a series of initiatives in the Netherlands to diversify the ways in which we recognize and reward talent in academia. One flashpoint was the publication of an open letter signed by ~170 senior scientists (mostly from medical and engineering professions), itself written in response to two developments. The first was the 2019 shift towards a “narrative CV” format in grant applications for the Dutch Research Council (NWO), under which applicants are asked to show evidence of the excellence, originality and impact of their work using article-level metrics instead of journal-level metrics like the Journal Impact Factor (JIF). The second was the recent announcement by Utrecht University (a signatory of the Declaration on Research Assessment) that it will abandon the JIF in its hiring and promotion processes (see coverage).

Why funders in search of talent are ditching the JIF

Some background will be useful. The decision not to use the JIF for the evaluation of researchers and their work is evidence-based. There is a lot of work in bibliometrics and beyond showing that a 2-year average of a skewed citation distribution is an imperfect measure of journal quality, a driver of perverse incentives, and a misleading proxy for the quality of individual papers. Indeed, Clarivate itself, the for-profit provider of the JIF metric, has this to say about it: “In the case of academic evaluation for tenure, it is inappropriate to use a journal-level metric as a proxy measure for individual researchers, institutions, or articles”.
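To make the "average of a skewed distribution" point concrete, here is a minimal sketch. The ten citation counts are invented for illustration and do not describe any real journal. The calculation follows the usual JIF recipe in spirit: citations received in one year to items from the two preceding years, divided by the number of those items.

```python
from statistics import median

# Hypothetical citation counts for ten papers a journal published in the
# two preceding years (invented numbers, deliberately right-skewed).
citations = [0, 0, 1, 1, 2, 2, 3, 4, 5, 120]

# JIF-style figure: total citations divided by number of citable items.
jif_like = sum(citations) / len(citations)

# The median describes the typical paper far better than the mean does.
typical = median(citations)

# Share of papers that fall below the journal-level average.
share_below = sum(c < jif_like for c in citations) / len(citations)

print(f"JIF-style mean: {jif_like:.1f}")
print(f"median citations: {typical}")
print(f"share of papers below the mean: {share_below:.0%}")
```

In this toy set a single highly cited paper pulls the average far above what nine out of ten papers actually achieve, which is why the journal-level number says little about any individual article.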

Despite this evidence, JIFs have long been widely used across the sciences not just as a way to tell librarians which journals are making waves (= what they were designed for) but also as a quick heuristic to judge the merits of work appearing in them or people publishing in them. As they say, ‘Don’t judge a book by its cover, but do judge scientific work by its JIF’. There is a considerable halo effect attached to JIFs, whereby an article that ends up in a high IF journal (whether by sheer brilliance or simply knowing the right editor, or both) is treated, unread, with a level of veneration normally reserved for Wunderkinder. Usually this is done by people totally oblivious to network effects, gatekeeping and institutional biases.

It appears that the decision to explicitly outlaw the use of JIFs now has people coming out of the woodwork to protest. The first letter (and also another one by a number of junior medical scientists) is aimed specifically at the prohibition against using the JIF, which is (incorrectly) framed as a ban on all quantification. The feeling is that this deprives us of a valuable (if inexact) metric that has long been used as a quick heuristic of the ‘value’ or ‘quality’ of work.

‘Halo? What halo?’

Raymond Poot, main author of the first letter, strongly believes that the JIF, even if inexact, should not be ditched. Saying, “Let’s talk JIF”, he provides this diagram of citation distributions in support:

The diagram compares the citation distributions of Nature and PLOS ONE (an open access megajournal). Poot’s argument, if I understand it correctly, is that even if Nature’s JIF is skewed by a few highly cited papers, its median number of citations is still higher, at 13, than the median number of cites that PLOS ONE papers receive (which looks like 1). As Poot says in reference to an earlier tweet of mine on the halo effect, ‘Halo? What halo?’.

We want to identify and reward good work wherever it appears

We’ll get to that halo. First things first. We’re talking about whether using the JIF (a journal’s 2-year citation average) is a good idea if you want to identify and reward good individual work. And especially whether using the JIF is better or worse than using article-level metrics. Another assumption: we care about top science so we would like to identify good work by talented people wherever it appears. Analogy: if articles are like schoolkids, we want to scout for talent everywhere, not just at the fancy private school where privileges and network can obscure diverse and original talent.

Let’s assume the figure represents the citation distributions reasonably well (I’m going to ignore the obvious folly of taking an average of a very skewed and clearly not unimodal distribution). Where is the JIF halo? Right in front of you, where it says, for publication numbers, “in thousands for PLOS ONE”. Publication volume differs by an order of magnitude. This diagram hides that by heavily compressing the PLOS distribution, which is never good practice for responsible visualization, so let’s fix that. We’ll lose exact numbers (they’re hard to get) but the difference is large enough for this to work whatever the numbers.

The enormous difference in sheer volume means that an OA megajournal is likely to have quite a few papers with more cites than the Nature median — high impact work that we would miss entirely if we focused only on the JIF. The flip side is where we find the halo effect: there are, in any given year, hundreds of Nature papers that underperform quite a bit relative to the IF (indeed half of them underperform relative to the median). This —the skewed distributions for both the megajournal and the glamour journal— shows why it is a bad idea to ascribe properties to individual papers based on how other papers published under the same flag have been cited.
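The volume argument can be sketched as a back-of-the-envelope calculation. Only the two medians (13 and roughly 1) come from the diagram. The publication volumes, the log-normal shape and the spread `SIGMA` are assumptions made purely for illustration, and `expected_above` is a hypothetical helper, not an established bibliometric method.

```python
from math import log
from statistics import NormalDist

def expected_above(volume, median_cites, sigma, threshold):
    """Expected number of papers cited more than `threshold` times,
    ASSUMING citation counts are log-normal with the given median."""
    z = (log(threshold) - log(median_cites)) / sigma
    return volume * (1 - NormalDist().cdf(z))

NATURE_MEDIAN = 13       # read off the diagram
PLOS_MEDIAN = 1          # "looks like 1" in the diagram
NATURE_VOLUME = 1_000    # assumption: order of magnitude only
PLOS_VOLUME = 20_000     # assumption: order of magnitude only
SIGMA = 1.2              # assumption: spread of log-citations

# Megajournal papers that out-perform the glamour journal's median.
plos_hits = expected_above(PLOS_VOLUME, PLOS_MEDIAN, SIGMA, NATURE_MEDIAN)

# Glamour-journal papers at or below their own journal's median.
nature_misses = NATURE_VOLUME - expected_above(
    NATURE_VOLUME, NATURE_MEDIAN, SIGMA, NATURE_MEDIAN
)

print(f"PLOS ONE papers above the Nature median: about {plos_hits:.0f}")
print(f"Nature papers at or below the Nature median: about {nature_misses:.0f}")
```

Under these made-up parameters the megajournal contributes a few hundred papers that beat the Nature median, while by definition about half of the Nature papers do not. Different assumptions shift the counts, but as long as the volume gap is an order of magnitude the qualitative picture stays the same.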

On average, my paper is better than yours

“But still, surely on average Nature papers are…” That is beside the point. I would rather give a talent grant to the bright student who made their way through the public school system (an over-performing PLOS paper) than to the one who dangles at the bottom of the distribution of their privileged private school (an underperforming Nature paper). Identifying talent on the basis of JIF instead of content or impact is like handing out bonus points to private school essays in the central exam. “But on average those elite schools do tend to do better, don’t they?” Unsurprisingly, they do, and if you think such differences are meaningful or worth further reinforcement, it’s worth reading some more sociology, starting perhaps with the diversity-innovation paradox.

There are other issues with our hyperfocus on glamour journals. These journals like to publish good work but they also apply some highly subjective filters (selecting for ‘broad appeal’ or ‘groundbreaking’ research — phrases that will sound familiar from the desk-rejects that statistically speaking many readers from academia will have seen). Nature prides itself on an 8% acceptance rate, the same chances that we rightly call a lottery when it concerns grant proposals. Being overly selective inevitably means that you’ll miss out on top performers. One recent study concluded that this kind of gate-keeping often leads us to miss highly impactful ideas and research:

However, hindsight reveals numerous questionable gatekeeping decisions. Of the 808 eventually published articles in our dataset, our three focal journals rejected many highly cited manuscripts, including the 14 most popular; roughly the top 2 percent. Of those 14 articles, 12 were desk-rejected. This finding raises concerns regarding whether peer review is ill-suited to recognize and gestate the most impactful ideas and research.

Gatekeepers of course also introduce their own networks, preferences and biases with regard to the disciplines, topics, affiliations, and genders they’re more likely to favour. In this context, Nature has acknowledged the sexism of how its editorial boards are constituted, and as the New York Times wrote last year, the publishing process at top journals is “deeply insular, often hinging on personal connections between journal editors and the researchers from whom they solicit and receive manuscripts”.

From smoke and mirrors to actual article-level impact

“But doesn’t my Nature paper count for anything?” I sure hope it does. And the neat thing is, under the new call for evidence-based CVs you can provide evidence and arguments instead of relying on marketing or association fallacies. Do show us what’s so brilliant and original about it. Do tell us about your contributions to the team, about the applications of your work in industry, and about the relative citation ratio of your paper. Indeed, such article-level metrics are explicitly encouraged as a direct indicator of impact and originality. To spell it out: a PLOS ONE paper that makes it to the 1% most cited papers of its field is more telling than, say, a Nature paper of the same age that has managed to accrue a meagre 30 cites. An evidence-based CV format can show this equally for any type of scientific output, without distracting readers with the smoke and mirrors of the JIF.
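One simple article-level indicator of this kind is a paper's standing within a comparison set of same-field, same-age papers. The sketch below is a plain percentile-style calculation, not the relative citation ratio mentioned above. The comparison set and the helper `share_cited_less` are invented for illustration.

```python
def share_cited_less(paper_cites, field_cites):
    """Fraction of papers in the comparison set with fewer citations."""
    below = sum(c < paper_cites for c in field_cites)
    return below / len(field_cites)

# Hypothetical comparison set: 1,000 same-field, same-age papers
# (invented counts, skewed the way citation data usually are).
field = [0] * 400 + [2] * 300 + [8] * 200 + [30] * 90 + [150] * 10

# A paper with 200 cites versus one with 30 cites in this invented field.
high = share_cited_less(200, field)
modest = share_cited_less(30, field)

print(f"200 cites: more than {high:.0%} of the comparison set")
print(f"30 cites: more than {modest:.0%} of the comparison set")
print("200-cite paper in top 1%:", high >= 0.99)
print("30-cite paper in top 1%:", modest >= 0.99)
```

Note that the journal name never enters the calculation: the comparison depends only on how the paper itself has been cited relative to its peers.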

Scientists are people, and people are easily fooled by marketing. That is going to be the case whether we mention the JIF or not. (The concerned medical scientists writing the letters know full well that most grant reviewers will know the “top” journals and make inferences accordingly.) The purpose of outlawing the JIF is essentially a nudge, designed to make evaluators reflect on this practice, and inviting them to look beyond the packaging to the content and its actual impact. I can only see this as an improvement — if the goal is to identify truly excellent, original and impactful work. Content over silly bean counts. True impact over halo effects.

If you want to find actual impact, look beyond the JIF

I have focused so far on PLOS ONE and Nature because that’s the example provided by Raymond Poot. However, arguably these are two extremes in a very varied publishing landscape. Most people will seek more specialised venues or go for other multidisciplinary journals. But the basic argument easily generalizes. Most journals’ citation distributions will overlap more than those of PLOS ONE and Nature. For instance, let’s take three multidisciplinary journals titrated along the JIF ranks: Science Advances (14.1), PNAS (11.1), and Scientific Reports (4.4). Set up by Nature to capture some of the market share of OA megajournals, Scientific Reports is obviously less artificially selective than the other two. And yet its sheer publication volume means that a larger number of high impact papers appear in Scientific Reports than in PNAS and Science Advances combined! This means, again, that if you want to find high impact work and you’re just looking at high IF journals, you’re missing out.

Trying to find good or impactful work on the basis of the JIF is like searching for your keys under the streetlight because that’s where the light is. Without it, we stand a better chance of identifying truly groundbreaking work across the board — and fostering diversity and innovation in the process.

Caveats. I’ve used article-level citations as a measure of impact here because they most directly relate to the statistically illiterate but widespread use of the JIF to make inferences about individual work or individual researchers. However, citations come with a time lag, are subject to known biases against underrepresented minorities, and are only one of multiple possible measures of originality, reach and impact. Of course, to the extent that you think this makes actual citations problematic as indicators of article-level impact or importance, it means the JIF is even more problematic.
