Some things you need to know about Google Scholar

Summary: Google Scholar is great, but its inclusiveness and its mix of automatically updated and hand-curated profiles mean you should never take any of its numbers at face value. Case in point: the power couple Prof. Et Al and Dr. A. Author, whose profiles I created following Scholar’s recommended settings (and a bit of manual embellishment). If you have a Scholar profile, make sure you don’t let Scholar update the publication list automatically without checking and cleaning up regularly. If you’re looking at somebody else’s profile, take it with a big pinch of salt, especially when they have a reasonably common name or when duplicate entries or weird citation distributions indicate that it is being automatically updated.

Update July 1st: Google Scholar has now manually blocked Prof. Et Al from appearing in top rankings for her disciplines. They probably thought her too prominent a reminder of the gameability of their system (how long will it take before they silence her next of kin?). This doesn’t solve the real problem, noted below, of auto-updating profiles like Yi Zhang and John A. Smith diluting top rankings. In fact, even in scientometrics, it looks like there are at least 3 or 4 auto-updating profiles in the top 10.

I love Google Scholar. Like many scientists, I use it all the time for finding scientific literature online, and it is more helpful and comprehensive than services like PubMed, Sciencedirect, or JSTOR. I like that Google Scholar rapidly delivers scholarly papers as well as information about how these papers are cited. I also like its no-nonsense author profiles, which enable you to find someone’s most influential publications and gauge their relative influence at a glance. These are good things. But they are also bad things. Let’s consider why.

Three good things about Google Scholar

  1. Google Scholar is inclusive. It finds scholarly works of many types and indexes material from scholarly journals, books, conference proceedings, and preprint servers. In many disciplines, books and peer-reviewed proceedings are as highly valued and as influential as journal publications. Yet services like Web of Science and PubMed focus on indexing only journals, making Google Scholar a preferred tool for many people interested in publication discovery and citation counts.
  2. Its citation analysis is automated. Citations are updated continuously, and with Google indexing even the more obscure academic websites, keeping track of the influence of scholarly work has become easier than ever. You can even ask Scholar to send you an email when there are new citations of your work. There is very little selection, no hand-picking, and no influence from questionable measures like impact factor: only citations, pure and simple, determine the order in which papers are listed.
  3. Its profiles are done by scholars. No sane person wants to disambiguate the hundreds of scholars named Smith or clean up the mess of papers without named authors, titles or journals. Somebody at Google Scholar had the brilliant idea that this work can be farmed out to people who have a stake in it: individual scholars who want to make sure their contributions are presented correctly and comprehensively. So while citations are automated, the publication lists in Google Scholar profiles are at least potentially hand-curated by the profile owners. Pretty useful. But wait…

Three bad things about Google Scholar

The classic 'Title of paper', 1995

  1. Google Scholar is inclusive. It will count anything that remotely looks like an article, including the masterpiece “Title of article” (with 128 citations) by A. Author. It will include anything it finds on university web domains, so anyone with access to such a domain can easily game the system. Recently it has started to index stuff on academia.edu, a place without any quality control where anybody can upload anything for dissemination.
  2. Its citation analysis is automated. There are no humans pushing buttons, making decisions and filtering stuff. This means rigorous quality control is impossible. That’s why publications in the well-known “Name of journal” are counted as contributing bona fide citations, and indeed how “Title of article” can have 128 citations so far. It’s also why the recent addition of academia.edu content has resulted in an influx of duplicate citations due to poor metadata.
  3. Its profiles are done by scholars. Scholars have incentives to appear influential. H-indexes and citation counts play a role in how their work is evaluated and enter into funding and hiring decisions. Publications and co-authors can be added to Google Scholar manually without any constraints or control mechanism, an opportunity for gaming the system that some may find hard to resist. But forget malicious intent: scholars are people, and people are lazy. If Google Scholar tells them it can update their publications lists automatically, they’ll definitely do so — with consequences that can be as hilarious as harmful, as we’ll see below.

To illustrate these points, let’s have a look at the Google Scholar profiles of two eminent scholars, Dr. Author and Prof. Et Al.

Dr. Author

Enter dr. A. Author. Ranking second in the field of citation analysis, he has an h-index of 30 and over 3500 citations. Among his most influential papers are “Title of article” with 159 citations and “Title of paper” with 128 citations to date. It is a matter of some regret to him that his 1990 “Instructions to authors” has been less influential, but perhaps its time is yet to come. Dr. Author is active across a remarkable range of fields. He likes to write templates, editorials, and front matter but has also been known to produce peer-reviewed papers. His first name is variously spelled Andrew, Albert or Anonymous, but most people just call him “A.” and Google Scholar happily accepts that.

Dr. Author reminds us that Google Scholar citations are done by an automated system, and so will be necessarily noisy. His profile simply gathers anything attributed to “A. Author”, a listing that is automatically updated in accordance with Google Scholar’s recommended settings. How pieces like “Title of article” can accrue >100 citations is a bit of a mystery, especially since only a few of the citing articles are other templates. Some of A. Author’s highly cited papers seem to be due to incomplete metadata from the source; others seem to be simply misparses; some are correct in the sense that editorials are often authored by “anonymous author”. At any rate, this shows there are a lot of ghost publications and citations out there, some of which may easily be attributed to people or publications they don’t belong to.

But surely these are just quirks due to bad data — garbage in, garbage out, as they say. Actual scientists maintaining a profile can count on more reliable listings. Or can they?

Prof. Et Al

Enter prof. Et Al. With an h-index of 333 and over 2 million citations, she is the world’s most influential scientist, surpassing such highly cited scholars as Freud, Foucault, and Frith (what is it with F?). She has an Erdős number of 1 and ranks first in the disciplines of scientometrics, bibliometrics, quality control and performance assessment; in fact in any discipline she would care to associate herself with. How did she reach this status? Simply by (i) creating a profile under her name; (ii) blindly adding the publications that Google Scholar suggested were hers; and (iii) allowing Scholar to update her profile automatically, as recommended. Oh, and just because Google Scholar allows her to, she also manually added some more papers she was sure she wrote (including with her good friend Paul Erdős).

Prof. Al reminds us that Google Scholar profiles are made by scholars. Scholars, being people, are mostly well-intentioned — but they can also be unsuspecting, lazy or worse. Prof. Al started out by simply doing what most scholars do when they create a new profile: following the instructions and recommended settings. If you do this blindly, Google Scholar will just add anything to your profile that comes remotely close to your name, and there is almost a guarantee that you’ll end up with a profile that way overestimates your scientific contributions.
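To make the inflation concrete, here is a minimal sketch in Python. It is a toy model, not Google Scholar's actual matching logic (which is not public): it assumes that an auto-updating profile claims every paper whose author shares a first initial and surname with the profile owner. The names and citation counts are hypothetical, loosely inspired by the three spellings of Dr. Author's first name.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def name_key(full_name):
    """Reduce a name to first initial plus surname: 'Andrew Author' -> 'a author'.

    This crude key is an assumption made for illustration only.
    """
    parts = full_name.lower().split()
    return f"{parts[0][0]} {parts[-1]}"


# Hypothetical records: (author name as printed on the paper, citation count).
records = [
    ("Andrew Author", 12), ("Andrew Author", 7), ("Andrew Author", 3),
    ("Albert Author", 25), ("Albert Author", 9), ("Albert Author", 6),
    ("Anonymous Author", 15), ("Anonymous Author", 8),
    ("A. Author", 10), ("A. Author", 5),
]

# A hand-curated profile: only the papers Andrew actually wrote.
own = [cites for name, cites in records if name == "Andrew Author"]

# A loosely matched profile: everything that looks like "A. Author".
target = name_key("Andrew Author")
merged = [cites for name, cites in records if name_key(name) == target]

print("curated h-index:", h_index(own))
print("auto-updated h-index:", h_index(merged))
```

In this made-up data set the curated list gives an h-index of 3, while the loosely matched list gives 7. Nobody did anything malicious; the name matching alone more than doubled the number.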

Real-life examples

It is not that hard to find real examples of profiles getting a lot of extra padding because of Scholar’s automatic updating feature. Take Yi Zhang at Georgia Tech, who must surely be the most accomplished PhD student ever with 40,000+ citations and an h-index of 70. This is Google Scholar’s recommended “automatic updating” feature going bananas with what must be a very common name. Indeed, there is another Yi Zhang, ranking 4th in syntax just after Chomsky, Sag, and Kiparsky. His top cited paper has 306 citations and yet the sum of his work, a well-rounded total of 1000 publications, has somehow received over 23,000 citations. (Note that #5 and #6 in syntax are also auto-updating profiles.)

All this is mostly harmless fun, until you realise that a profile may be claiming the publications and citations of another one without either of them noticing. Case in point: the profile of Giovanni Arturo Rossi, an expert on respiratory diseases, is consistently hoovering up publications by my colleague Giovanni Rossi, who works on social interaction. Scholar auto-links author names to profiles in search results, preventing people from finding the real Rossi from his publications unless he actively and manually adds those Arturo-claimed publications to his profile.

Bottom line: if you have a common name, you’ll have to take control of every new publication manually, since otherwise Rossi (or Smith, or Zhang) is going to get it added automatically to their profile. Also, if you have a common name and you blindly follow Google Scholar’s recommended settings, you may be very pleased with your h-index, but probably for the wrong reasons (hello there John A. Smith, independent scholar, 23428 citations, h-index 64!). So my most general recommendation would be: don’t let Google Scholar update your profile automatically, and if you must, clean up regularly to avoid looking silly.
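Part of that clean-up is spotting duplicate entries, one of the telltale signs of an auto-updated profile. The sketch below shows one simple way to flag likely duplicates in a list of titles you have copied out of a profile by hand. The helper names and the sample titles are made up for illustration, and the normalisation is deliberately crude: it only handles case and punctuation, and it would mangle titles in non-Latin scripts.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(titles):
    """Group titles that are identical after normalisation; return groups of 2 or more."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical publication list, typed in for the example.
publications = [
    "Title of article",
    "Title of Article.",
    "Instructions to authors",
    "Title of paper",
    "TITLE OF PAPER",
]

for group in find_duplicates(publications):
    print("possible duplicates:", group)
```

Anything this flags still needs a human look: two different papers can share a title, and the same paper can appear under slightly different titles that a check like this will miss.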

Know what you’re doing

So far, the examples arise simply from Google Scholar’s recommended setting to automatically update publication lists. It doesn’t look like any of these authors (well, except maybe dr. Author and prof. Et Al) have done anything like actively adding publications that aren’t theirs, or claiming they’ve worked with Paul Erdős. But here’s the thing: these things are not just possible, they are really easy, as prof. Et Al’s superstar profile shows. And with hundreds of thousands of active profiles, there are bound to be some bad apples among them.

What are the consequences? Nothing much if you take Google Scholar for what it is: a useful but imperfect tool. Yet many take it more seriously. If you’re in the business of comparing people (for instance while reviewing job applications or when looking for potential conference speakers), the metrics provided by Google Scholar are some of the first ones you’ll come across and it will be very tempting to use them. There is even an R package that will help you extract citation data and compare scholars based solely on citation numbers and h-indexes. All this is perilous business, considering these ranks are diluted with auto-updating ghost profiles.

Let me end by reiterating that I love Google Scholar and I use it all the time. It can be a tremendously useful tool. Like all tools, it can also be misinterpreted, misused and even gamed. If you know what you’re doing you should be fine. But if you think you can blindly trust it, take another look at the work of dr. A. Author and prof. dr. Et Al.

Notes

The “A. Author” and “Et Al” profiles were created in June 2016 by Mark Dingemanse to illustrate the points made in this post. Thanks to Seán Roberts for suggesting that A. Author should co-author with Et Al. Just in case Google Scholar follows up with some manual quality control and some of these profiles or publications disappear, screenshots document all the relevant profiles and pages.

There is something of a tradition of creating Google Scholar profiles to make a point; see here and here, for example. While my goal here is simply to promote mindful use of technology by noting some problems with Google Scholar profiles (as opposed to citations, the focus of most prior research), let me note there is of course a large scholarly literature in bibliometrics and scientometrics on the pros and cons of Google Scholar. Google Scholar Digest offers a comprehensive bibliography.

5 thoughts on “Some things you need to know about Google Scholar”

  1. what must be a very common name

    Understatement! Zhāng 张 is the third most common surname in the PRC (87.5 million people, 6.83 % of the population) as of 2007 and the fourth most common one (traditional character: 張) in the ROC (5.26 % of the population) as of 2010, says Wikipedia; its exact homophone 章 is much rarer (not in the top 100), but does include the actress Zhāng Zǐyí for example. Single-syllable given names are less common in China, but yi is four common syllables, and Google Scholar probably auto-adds every “Y. Zhang” just in case.

  2. But even far away from such extreme cases, there’s at least one fellow biologist who shares my exact full name, and there are other scientists with my last name whose first names begin with D. Google Scholar suggests all their publications to me. On top of that, my homonym seems not to have a Google Scholar profile. He’s lucky I haven’t turned auto-updating on.

  3. Wow! I guess I’m lucky with a relatively unique name (though there is a “Maria A Dingemanse” whose publications are suggested to me by Google Scholar). Auto-updating really should not be the recommended setting.

  4. Totally agree. The automatic update takes away the moral responsibility of a person claiming other people’s work. I had not looked at my Google Scholar profile for a while, and the citation count had increased a lot. It turned out that someone else’s work was listed. Probably Google wants us to be glued to this site. But using this method is not very good. Having the biggest surname in China and a common initial, hundreds of papers would be added to your profile in a couple of days. One can get tired of it very easily and abandon the app. It seriously undermines the credibility of Google Scholar.

    The core problem is that Google Scholar simply misunderstands the incentives. Academic people have every incentive to publicise their work. Just letting them add their work voluntarily should be good enough. At the same time, they would have to bear the responsibility of making sure that what they add is truly their own work. Now with this automatic function, no one will be responsible and everyone can claim that they do not know what’s going on. Not trustworthy at all. Sad
