Update: Now also on Academia.edu
A while back some low-quality citations started showing up on Google Scholar. They had titles like “CHAPTER 2 draft — email email@example.com” and it was hard to find actual bibliographic metadata. Google Scholar seemed to have scraped random PDFs uploaded on Academia.edu and decided it was worth counting the citations in them even in the absence of proper metadata. I shared this on Twitter and promptly forgot about it.
Then I got an email from someone asking me to say a bit more about my concerns with poor metadata. I decided to write it up in a blog post. I’m afraid it turned into a bit of a rant about how Academia.edu seems built not so much for sharing scientific information as for playing to our vanity. Sorry about that. Let’s start with the poor metadata issue, which turns out to be rather pervasive.
Academia.edu has a massive metadata problem
- Academia.edu doesn’t record any metadata except title and author for the bulk of papers, and doesn’t expose any metadata using standard formats like RDF/unAPI.
Ever tried to figure out how to cite a paper on Academia.edu? This is hard because most of the metadata is missing. Reference managers like Zotero or Mendeley cannot detect papers there and save them for citation. For those hoping to cite works uploaded there, this makes life more difficult than it needs to be. For users of Academia.edu, this hurts citability. Yes, I know Academia.edu touts a 69% citation advantage. See here for a discussion of some concerns about that study. My point here is simply that whoever wants to cite papers found on Academia.edu currently has to get the metadata from elsewhere.
- Academia.edu doesn’t comply with Google Scholar guidelines for exposing metadata.
Google Scholar has started to index the heap of PDFs on Academia.edu, but because there is no metadata, it has to resort to scraping only the most superficial information available, usually from the PDF itself. This is the number one reason for the junk citations in Google Scholar that started this story. It means people’s work is misrepresented and makes it harder to figure out how to cite it. That is bad news, because Google Scholar is very widely used. For users of Academia.edu, this hurts both the findability and the citability of their work. For users of Google Scholar, this adds more noise to an already noisy system.
- Academia.edu is built for single-authored papers, and its handling of multi-authored papers is surprisingly poor.
The default way of scraping author names leads to many errors, which can only be fixed manually. Take the paper Academia.edu staff published on ‘discoverability’: the authors are all jumbled up. Only the original uploader owns the item and can add or fix bibliographic metadata, and other authors can’t easily see who the owner is. There is no system for duplicate detection and resolution. It is too easy for multiple authors to upload the same paper with slight differences in bibliographic metadata, and too hard to clean up the mess and make sure there is only one good version of record. This affects people’s profiles and has undesirable knock-on effects for the points above.
- The process of adding papers is geared towards enriching Academia.edu content rather than towards promoting the sharing of correct and complete scientific information.
After Academia.edu gets your PDF (and that’s a requirement for adding a paper), there are very few opportunities for providing metadata, and the primary upload interface cares more about fluff like ‘research interests’ than about getting the basic bibliographic metadata right. There is no way to import via DOI or PMID (which would prevent many errors), or even to record these identifiers: a fatal and rather surprising lack of concern for interoperability. A user interface should make it easy for people to get things right, and hard to get them wrong. The current user interface for adding papers does the exact opposite (see annotated screenshot).
- It is surprisingly (and needlessly) hard to add crucial bibliographic information like journal, DOI and URL.
More details can only be added after importing papers, which simply means most users won’t do it. As far as I can see, the only way to do it is to go back to your publication list, hover over the Edit button, and find other fields to edit. Even here, there appears to be no place for identifiers like DOI or PMID. Page numbers and so on are hidden in an “Other” field. Any user interface designer will tell you that stuff buried this deeply might as well be left out: only a negligible number of users will find and use it. Anyway, if you’ve succeeded in adding some of this metadata, congratulations on completing a futile exercise. The information you painstakingly entered is not exposed anywhere, so it cannot be reused or exported, except, again, manually. Quite remarkable in the age of APIs and interoperability.
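To make “exposing metadata” concrete, here is a minimal Python sketch of the kind of thing a paper page could emit: the `citation_*` `<meta>` tags (the Highwire Press tag set) described in Google Scholar’s inclusion guidelines, which Zotero can also pick up from a page. The record layout, key names like `first_page`, and the example values are hypothetical choices for this sketch; they are not Academia.edu’s data model, and a real page would use whatever fields it actually stores.

```python
from html import escape

# Hypothetical record keys for this sketch, mapped to Highwire Press tag names.
FIELDS = [
    ("date", "citation_publication_date"),
    ("journal", "citation_journal_title"),
    ("volume", "citation_volume"),
    ("issue", "citation_issue"),
    ("first_page", "citation_firstpage"),
    ("last_page", "citation_lastpage"),
    ("doi", "citation_doi"),
    ("pdf_url", "citation_pdf_url"),
]


def meta(name, value):
    """Render one <meta> tag, escaping the value for use in an HTML attribute."""
    return f'<meta name="{name}" content="{escape(str(value), quote=True)}">'


def scholar_meta_tags(record):
    """Turn a dict of bibliographic metadata into citation_* <meta> tags."""
    lines = [meta("citation_title", record["title"])]
    # One citation_author tag per author, keeping the byline order intact.
    for author in record.get("authors", []):
        lines.append(meta("citation_author", author))
    # Optional fields: emit a tag only when the record actually has a value.
    for key, name in FIELDS:
        if record.get(key):
            lines.append(meta(name, record[key]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "title": "An Example Paper",
        "authors": ["Doe, Jane", "Roe, Richard"],
        "date": "2015",
        "journal": "Journal of Examples",
        "doi": "10.1234/example",
    }
    print(scholar_meta_tags(example))
```

Emitting one `citation_author` tag per author, in order, is exactly the kind of detail that would avoid the jumbled author lists mentioned above; and if a DOI import existed, a record like this could be filled in automatically rather than typed by hand.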
How Academia.edu nudges us towards narcissism
My conclusion from these points: Academia.edu seems not to care about promoting the curation and sharing of correct, high-quality metadata for scientific publications. One might counter that this is not the goal of the network, and that the content of the papers is what’s most important anyway. But peer-reviewed publications are still the main vehicle for advancing scientific results, and citations are still the main currency of cumulative science. So getting bibliographic metadata right is key to promoting science as a cumulative enterprise. Nor should this be hard in the era of DOIs and PMIDs, which makes it all the more surprising that Academia.edu doesn’t care about them.
If there really is a problem, why do relatively few people complain and why are so many users seemingly happy with the service? There are several reasons. Not every academic has access to a personal website or an open academic repository, and Academia.edu presents itself as one of the easiest options to make one’s work visible online (never mind the fact that it doesn’t actually make it easily citable, and laces it with ads to boot). It may be a way to keep up with colleagues. Also, I’ve heard people are happy with its “sessions” as a way to get interactive feedback on a paper. But there’s one important reason that I haven’t seen commented upon often: Academia.edu plays to our vanity. Many elements of its design are built to satisfy and amplify our craving for external validation.
Judging from the navigation menu, “analytics” is one of the most important elements of Academia.edu. Upload papers, tag them with research interests, and they generate paper views. Follow people and they’ll follow you back, generating profile views. Tomorrow your paper may be in the top 5%! Next week you might be crowned as the 1%! Look, your paper was just read by someone from Vienna! Your work is being read in 27 countries! You’re being followed by someone you barely know! All those things are nicely presented in spiffy graphs — evidently a part of Academia.edu that a lot of design resources have been devoted to.
Note that some of the design here is cynical. The only two time windows offered are 30 days and 60 days, inviting you to come back at least this often to keep up with the stats (yes, you can download a CSV for more, but once again that is one of those power-user features that will rarely be used). Views are promoted over actual downloads, while bounce rates (basically, how many people leave after a quick glance, usually the majority) are concealed. The most prominent metadata for papers (again, just taking the design as a measure of what Academia.edu promotes as important) is this mostly meaningless view count. Not where it was published, not how to cite it, certainly not where to find it off Academia.edu: just how many people had a look.
Academia.edu doesn’t take academia seriously
Does this mean everybody on Academia.edu is a narcissist? Of course not. My point is not about users; it is about the design of the service. User interface design is not innocent: as a recent Medium essay noted, technology hijacks our minds, constraining our options and nudging us in ways that often elude our awareness. Not everybody on Academia.edu is a narcissist, but many aspects of its design make it easy to become one. (The emails! Don’t get me started about the emails. By default, Academia.edu will send you an email whenever someone stumbles upon your profile or one of your papers. Just look at this Twitter feed to see how creepy people find that feature. You might even spot a few who have come to like it, Stockholm syndrome-style.)
I find @academia's "Someone Searched for You/Your Paper" instant updates a little unnerving but no way in hell am I disabling them.
— Mark Sussman (@marksussman) May 5, 2015
On balance, I feel Academia.edu doesn’t really take us seriously as academics. It takes our work to make a profit (for instance by putting advertisements around it), totally botches the metadata and tries to appease us by offering stats and social rankings that promote constant comparison. And nothing in its design suggests a regard for getting even the most basic bibliographic information about our scientific work right — even though that would be one way to turn page views into citations. This is one of the reasons the only paper I’ve uploaded there for years has been one pointing people to where they can find all my papers freely and without hassle.
To end on a slightly more optimistic note: at least the poor metadata problem can be solved. As far as I can see, nothing in Academia.edu’s business model turns on proliferating poor and incomplete metadata. The citation advantage it likes to claim could increase significantly if it started exposing metadata in ways that are compatible with widely used tools like Google Scholar and Zotero. It still won’t be a service I’m keen on using, but I do hope it will get better at promoting cumulative science rather than cynically playing to our vanity.