Some things you need to know about Google Scholar

Summary: Google Scholar is useful, but its inclusiveness and its mix of automatically updated and hand-curated profiles mean you should never take any of its numbers at face value. Case in point: the power couple Prof. Et Al and Dr. A. Author, whose profiles I created following Scholar’s recommended settings (plus a bit of manual embellishment). If you have a Scholar profile, make sure you don’t let Scholar update the publication list automatically. If you’re looking at somebody else’s profile, take it with a big pinch of salt, especially when they have a reasonably common name or when messy entries or weird citation distributions indicate that it is being automatically updated.

Update: Prof. Et Al and Dr. Author are still very much alive, but Google Scholar has manually blocked them from appearing in the top rankings for their disciplines. Google probably found them too prominent a reminder of how gameable its topic rankings are. This doesn’t solve the underlying problem of auto-updating profiles like those of Yi Zhang and John A. Smith diluting top rankings. Moral: never take Google Scholar’s top rankings by keyword at face value.

I love Google Scholar. Like many scientists, I use it all the time for finding scientific literature online. For many disciplines, it is more helpful and comprehensive than services like PubMed, ScienceDirect, or JSTOR. I like that Google Scholar rapidly delivers scholarly papers as well as information about how these papers are cited. I also like its no-nonsense author profiles, which enable you to find someone’s most influential publications and gauge their relative influence at a glance. These are good things. But they are also bad things. Let’s consider why.

Three good things about Google Scholar

  1. Google Scholar is inclusive. It finds scholarly works of many types and indexes material from scholarly journals, books, conference proceedings, and preprint servers. In many disciplines, books and peer-reviewed proceedings are as highly valued and as influential as journal publications. Services like Web of Science and PubMed focus on indexing only a disciplinarily narrow subset of journals, making Google Scholar a preferred tool for many people interested in publication discovery and citation counts.
  2. Its citation analysis is automated. Citations are updated continuously, and with Google indexing even the more obscure academic websites, keeping track of the influence of scholarly work has become easier than ever. You can even ask Scholar to send you an email when there are new citations of your work. There is very little selection, no hand-picking, and no influence from questionable measures like impact factor: only citations, pure and simple, determine the order in which papers are listed.
  3. Its profiles are done by scholars. No sane person wants to disambiguate the hundreds of scholars named Smith or clean up the mess of papers without named authors, titles or journals. Somebody at Google Scholar had the brilliant idea that this work can be farmed out to people who have a stake in it: individual scholars who want to make sure their contributions are presented correctly and comprehensively. So while citations are automated, the publication lists in Google Scholar profiles are at least potentially hand-curated by the profile owners. Pretty useful. But wait…

Three bad things about Google Scholar

[Image: the classic “Title of paper”, 1995]
  1. Google Scholar is inclusive. It will count anything that remotely looks like an article, including the masterpiece “Title of paper” (with 128 citations at the time of writing) by A. Author. It will include anything it finds on university web domains, so anyone with access to such a domain can easily game the system. Recently it has started to index stuff on academia.edu, a place without any quality control where anybody can upload anything for dissemination.
  2. Its citation analysis is automated. There are no humans pushing buttons, making decisions and filtering stuff. This means rigorous quality control is impossible. That’s why publications in the well-known “Name of journal” are counted as contributing bona fide citations, and indeed how “Title of paper” can have 128 citations so far. It’s also why the recent addition of academia.edu content has resulted in an influx of duplicate citations due to poor metadata.
  3. Its profiles are done by scholars. Scholars have incentives to appear influential. H-indexes and citation counts play a role in how their work is evaluated and enter into funding and hiring decisions. Publications and co-authors can be added to Google Scholar manually without any constraints or control mechanism, an opportunity for gaming the system that some may find hard to resist. But forget malicious intent: scholars are people, and people are lazy. If Google Scholar tells them it can update their publication lists automatically, they’ll definitely let it, with consequences that can be as hilarious as they are harmful, as we’ll see below.

To illustrate these points, let’s have a look at the Google Scholar profiles of two eminent scholars, Dr. Author and Prof. Et Al.

Dr. Author

Enter Dr. A. Author. Ranking second in the field of citation analysis, he has an h-index of 30 and over 3500 citations. Among his most influential papers are “Title of article” with 159 citations and “Title of paper” with 128 citations to date. It is a matter of some regret to him that his 1990 “Instructions to authors” has been less influential, but perhaps its time is yet to come. Dr. Author is active across a remarkable range of fields. He likes to write templates, editorials, and front matter but has also been known to produce peer-reviewed papers. His first name is variously spelled Andrew, Albert or Anonymous, but most people just call him “A.” and Google Scholar happily accepts that.

Dr. Author reminds us that Google Scholar citations are done by an automated system, and so will be necessarily noisy. His profile simply gathers anything attributed to “A. Author”, a listing that is automatically updated in accordance with Google Scholar’s recommended settings. How pieces like “Title of article” can accrue >100 citations is a bit of a mystery, especially since only a few of the citing articles are other templates. Some of A. Author’s highly cited papers seem to be due to incomplete metadata from the source; others seem to be simply misparses; some are correct in the sense that editorials are often authored by “anonymous author”. At any rate, this shows there are a lot of ghost publications and citations out there, some of which may easily be attributed to people or publications they don’t belong to.

But surely these are just quirks due to bad data — garbage in, garbage out, as they say. Actual scientists maintaining a profile can count on more reliable listings. Or can they?

Prof. Et Al

Enter Prof. Et Al. With an h-index of 333 and over 2 million citations, she is the world’s most influential scientist, surpassing such highly cited scholars as Freud, Foucault, and Frith (what is it with F?). She has an Erdős number of 1 and ranks first in the disciplines of scientometrics, bibliometrics, quality control and performance assessment; in fact in any discipline she would care to associate herself with. How did she reach this status? Simply by (i) creating a profile under her name; (ii) blindly adding the publications that Google Scholar suggested were hers; and (iii) allowing Scholar to update her profile automatically, as recommended. Oh, and just because Google Scholar allows her to, she also manually added some more papers she was sure she wrote (including with her good friend Paul Erdős).

Prof. Al reminds us that Google Scholar profiles are made by scholars. Scholars, being people, are mostly well-intentioned — but they can also be unsuspecting, lazy or worse. Prof. Al started out by simply doing what most scholars do when they create a new profile: following the instructions and recommended settings. If you do this blindly, Google Scholar will just add anything to your profile that comes remotely close to your name, and there is almost a guarantee that you’ll end up with a profile that way overestimates your scientific contributions.

Real-life examples

It is not that hard to find real examples of profiles getting a lot of extra padding because of Scholar’s automatic updating feature. Take Yi Zhang at Georgia Tech, who must surely be the most accomplished PhD student ever with 40,000+ citations and an h-index of 70. This is Google Scholar’s recommended “automatic updating” feature going bananas with what must be a very common name. Indeed, there is another Yi Zhang, ranking 4th in syntax just after Chomsky, Sag, and Kiparsky. His top cited paper has 306 citations and yet the sum of his work (a well-rounded total of 1000 publications) has somehow received over 23,000 citations. (Note that #5 and #6 in syntax are also auto-updating profiles.)

[edit Aug 2017: I’m pleased to see that both Zhangs have updated their profiles so I have removed the links. I won’t be playing whack-a-mole with this indefinitely, but it’s trivial to find other examples.]

All this is mostly harmless fun, until you realise that a profile may be claiming the publications and citations of another one without either of them noticing. Case in point: the profile of Giovanni Arturo Rossi, an expert on respiratory diseases, is constantly hoovering up publications by my colleague Giovanni Rossi, who works on social interaction. Scholar auto-links author names to profiles in search results, preventing people from finding the real Rossi from his publications unless he actively and manually adds those Arturo-claimed publications to his profile.

Bottom line: if you have a common name, you’ll have to take control of every new publication manually, since otherwise Rossi (or Smith, or Zhang) is going to get it added automatically to their profile. Also, if you have a common name and you blindly follow Google Scholar’s recommended settings, you may be very pleased with your h-index, but probably for the wrong reasons (hello there John A. Smith, independent scholar, 23428 citations, h-index 64!). So my most general recommendation would be: don’t let Google Scholar update your profile automatically. If you really think you must, clean up regularly to avoid looking silly.
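To see why common names are so vulnerable, here is a minimal sketch in Python of what naive name matching does. Google Scholar’s actual matching rules are not public, so the `name_key` rule below (first initial plus surname) is purely an assumption for illustration, and “Yan Zhang” is a made-up name added to show the effect.

```python
from collections import defaultdict


def name_key(full_name):
    """Reduce a name to 'first initial + surname' (an assumed, naive rule)."""
    parts = full_name.lower().split()
    return f"{parts[0][0]} {parts[-1]}"


def find_collisions(authors):
    """Group authors by their naive key and keep only keys shared by several people."""
    groups = defaultdict(list)
    for name in authors:
        groups[name_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


authors = [
    "Giovanni Arturo Rossi",
    "Giovanni Rossi",
    "Yi Zhang",
    "Yan Zhang",  # hypothetical name, for illustration only
    "Mark Dingemanse",
]

collisions = find_collisions(authors)
for key, names in collisions.items():
    print(key, "->", names)
```

Under this assumed rule the two Rossis become indistinguishable, as does every “Y. Zhang”, while a rarer surname stays unambiguous. Any matcher that works from names alone will run into some version of this, which is why manual curation matters most for people with common names.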

Know what you’re doing

So far, the examples arise simply from Google Scholar’s recommended setting to automatically update publication lists. It doesn’t look like any of these authors (well, except maybe Dr. Author and Prof. Et Al) have done anything like actively adding publications that aren’t theirs, or claiming they’ve worked with Paul Erdős. But here’s the thing: these things are not just possible, they are really easy, as Prof. Et Al’s superstar profile shows. And with hundreds of thousands of active profiles, there are bound to be some bad apples among them.

What are the consequences? Nothing much if you take Google Scholar for what it is: a useful but imperfect tool. Yet many take it more seriously. If you’re in the business of comparing people (for instance while reviewing job applications or when looking for potential conference speakers), the metrics provided by Google Scholar are some of the first ones you’ll come across and it will be very tempting to use them. There is even an R package that will help you extract citation data and compare scholars based solely on citation numbers and h-indexes. All this is perilous business, considering these rankings are diluted by auto-updating ghost profiles.

Let me end by reiterating that I love Google Scholar and I use it all the time. It can be a tremendously useful tool. Like all tools, it can also be misinterpreted, misused and even gamed. If you know what you’re doing you should be fine. But if you think you can blindly trust it, take another look at the work of Dr. A. Author and Prof. Dr. Et Al.

Notes

The “A. Author” and “Et Al” profiles were created in June 2016 by Mark Dingemanse to illustrate the points made in this post. Thanks to Seán Roberts for suggesting that A. Author should co-author with Et Al. Just in case Google Scholar follows up with some manual quality control and some of these profiles or publications disappear, screenshots document all the relevant profiles and pages.

There is something of a tradition of creating Google Scholar profiles to make a point; see here and here, for example. While my goal here is simply to promote mindful use of technology by noting some problems with Google Scholar profiles (as opposed to citations, the focus of most prior research), let me note there is of course a large scholarly literature in bibliometrics and scientometrics on the pros and cons of Google Scholar. Google Scholar Digest offers a comprehensive bibliography.

27 thoughts on “Some things you need to know about Google Scholar”

  1. what must be a very common name

    Understatement! Zhāng 张 is the third most common surname in the PRC (87.5 million people, 6.83 % of the population) as of 2007 and the fourth most common one (traditional character: 張) in the ROC (5.26 % of the population) as of 2010, says Wikipedia; its exact homophone 章 is much rarer (not in the top 100), but does include the actress Zhāng Zǐyí for example. Single-syllable given names are less common in China, but yi is four common syllables, and Google Scholar probably auto-adds every “Y. Zhang” just in case.

  2. But even far away from such extreme cases, there’s at least one fellow biologist who shares my exact full name, and there are other scientists with my last name whose first names begin with D. Google Scholar suggests all their publications to me. On top of that, my homonym seems not to have a Google Scholar profile. He’s lucky I haven’t turned auto-updating on.

  3. Wow! I guess I’m lucky with a relatively unique name (though there is a “Maria A Dingemanse” whose publications are suggested to me by Google Scholar). Auto-updating really should not be the recommended setting.

  4. Totally agree. Automatic updating takes away the moral responsibility of a person claiming other people’s work. I had not looked at my Google Scholar profile for a while, and the citations had increased a lot. It turned out that someone else’s work was listed. Probably Google wants us to be glued to this site. But using this method is not very good. If you have the most common surname in China and a common initial, hundreds of papers will be added to your profile in a couple of days. One can get tired of it very easily and abandon the app. It seriously undermines the credibility of Google Scholar.

    The core problem is that Google Scholar simply misunderstands the incentives. Academics have every incentive to publicise their work. Just letting them add their work voluntarily should be good enough. At the same time, they would have to bear the responsibility of making sure that what they add is truly their own work. Now, with this automatic function, no one is responsible and everyone can claim that they do not know what’s going on. Not trustworthy at all. Sad

  5. Google Scholar is very helpful. However, I constantly see people claiming others’ work in spite of NOT having a very common surname. Looking at others, I used to wonder how on earth these folks get over 1000 citations, or how some of my peers are jumping up the citation ladder pretty much every day. Then I looked closely and found that more than 50% of the cited work doesn’t even belong to them. I can only check for people I know; it is extremely difficult to establish the actual number of papers for scientists whose body of work I don’t know. Sometimes it is hard to check back on PubMed too, as it has some loopholes around exactly how the name is spelled and things like that. In today’s world citations are kind of “very important”, and quite a few of these researchers are either not checking diligently or are quite happy to see the magic number grow without bothering to find out how it’s happening. Is there a way for Google to rectify this?

  6. Is there a way for Google to rectify this?

    No, there isn’t, and that is one of the reasons I wrote this post. Google Scholar has made the fundamental choice to trust those who make profiles. This saves them an enormous amount of work. For conscientious scientists, it is attractive because they are given the tools to manage and clean up their own profiles. But it does come with the disadvantages described in this post.

    Perhaps social accountability can do some of the work here: we shouldn’t shy away from telling people that it looks weird to have papers on their profile they didn’t author. But fundamentally, the problem is unsolvable because the system relies on trust. All the more reason to assume good faith, yet be skeptical of unanalysed citation counts.

  7. Hi,

    Articles that are not mine are automatically added to my profile. How can I remove them from my Google Scholar profile and stop GS from adding them automatically?

    Thanks.
    Dharmesh

  8. I hate that anonymous people can follow my Google Scholar profile (and receive notifications upon my new publications), without me being able to know them, or block them. It is very annoying.

  9. This is very interesting, Mark. Thanks.
    Perhaps you can tell me something – I have an edited volume and chapter with a publisher and though they have been cited, and listed on my page, the citations are not being picked up by google. Is this because there is no electronic version of the book? If the publisher puts citation information for each chapter on its website, would that be picked up? Advice welcome!

  10. It’s hard to predict what will and will not be picked up by Google Scholar. It doesn’t look at citation information provided by publishers; instead, it indexes fulltext sources, to which Google usually has access even if we personally may not. So over time it does tend to discover the majority of citations, though some will inevitably be missed.

  11. Hi, on my GS profile, I have articles which are followed by an ” * “. I checked all the articles listed and they all cite my paper. How can I validate the articles so that I don’t have anymore the ” * ” ?

  12. This happens when you have manually merged items which may differ in some bibliographical details. The only way to get rid of the “*” is to unmerge the item and choose to display only one version of it (rather than the merged version).

  13. What are the implications for one’s h index if entries such as conference presentations or brief conference abstracts are included, given in my field few people cite either of these sources? Should these types of entries be deleted in my profile?

  14. No implications. Only cited publications count towards h-index. It’s more a question of how helpful the profile is for others. I wouldn’t list conference presentations or abstracts because others will rarely be looking for them, and they are not the kind of substantive contributions you expect on a publication list.
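To make this concrete, here is a minimal sketch in Python of the standard h-index definition (the largest h such that h publications have at least h citations each). The citation counts are invented for illustration and not taken from any real profile.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


papers = [25, 8, 5, 3, 3]            # hypothetical citation counts
with_abstracts = papers + [0, 0, 0]  # same list plus three uncited abstracts

print(h_index(papers))
print(h_index(with_abstracts))
```

Both calls give the same value, because entries with zero citations sort to the bottom of the ranking and can never satisfy the “at least h citations” condition.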

  15. Dear Sir/Madam,
    I am not able to add more than 20 co-authors to my account. Can you answer me?
    Thank you.

  16. This is correct. Or rather, you are able to add >20 but Google Scholar will only display the first 20. And you can’t order them yourself.

  17. I have deleted my Google Scholar account because it has such an obviously wrong citation at the top of my articles, but I can still see my account. What is wrong? How can I delete it once and for all?

  18. Indeed, that profile has not been deleted, as I can also see it. If you have access to it, I would recommend removing only the inaccurate publications and switching off auto-updating. Google Scholar profiles are pretty useful when tended well.
