Open Access News pointed out a very interesting article in the Journal of Cell Biology, Show Me the Data. Written by that journal’s executive editor, the executive editor of Journal of Experimental Medicine, and the Executive Director of The Rockefeller University Press, it first reiterates many quality issues with journal impact factors that seem to be well-known among biologists, but I suspect that they are news to many economists. Many of these issues also hold for citation rankings for individuals. Beyond that, there are other issues that make citation data suspect. Fortunately, there are potential solutions to many of these problems.
First, it helps to describe impact factors as Thomson Scientific (previously the Institute for Scientific Information, or ISI) calculates them. A journal's impact factor in year t is the number of citations in year t to articles the journal published in years t-1 and t-2, divided by the number of research or review articles it published in those two years. Criticisms include:
- the numerator and denominator are inconsistent: citations to all items in the journal count in the numerator, but only research and review articles count in the denominator
- Thomson is unclear on what exactly defines a research or review article
- some journals have negotiated with Thomson on exactly what defines the article type
- retracted papers are not excluded
- of course, the mean is inflated by a few star papers
- editors can game the system; apparently some do and some don’t (I’ve even seen this in the Wall Street Journal)
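As a quick illustration, the calculation described above and the star-paper problem can be sketched in a few lines of Python; all the citation counts here are invented:

```python
from statistics import mean, median

def impact_factor(cites_in_t_to_prior_two_years, articles_in_prior_two_years):
    """Impact factor in year t: citations in year t to items published in
    t-1 and t-2, divided by the research/review articles from those years."""
    return cites_in_t_to_prior_two_years / articles_in_prior_two_years

# Hypothetical journal: 420 citations in 2007 to its 2005-06 papers,
# of which 120 were counted as research or review articles.
print(impact_factor(420, 120))  # -> 3.5

# The mean is easily inflated by a few star papers:
cites_per_paper = [0, 0, 1, 1, 2, 3, 150]  # one heavily cited outlier
print(mean(cites_per_paper), median(cites_per_paper))  # mean ~22.4, median 1
```

The last two lines show why a handful of heavily cited papers can dominate a journal's reported figure even when the typical paper is rarely cited.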
The authors go on to say that they contacted Thomson and received some of their data. They found numerous errors in how articles were categorized. Further, "The total number of citations for each journal was substantially fewer than the number published" as reported by Thomson. When they requested further data from Thomson, the data still didn't add up. They conclude "It became clear that Thomson Scientific could not or (for some as yet unexplained reason) would not sell us the data used to calculate their published impact factor."
Their bottom line is even more clear: “If an author is unable to produce original data to verify a figure in one of our papers, we revoke the acceptance of the paper. We hope this account will convince some scientists and funding organizations to revoke their acceptance of impact factors as an accurate representation of the quality—or impact—of a paper published in a given journal. Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific’s impact factor, which is based on hidden data.”
Besides the points reiterated and brought up in the Journal of Cell Biology, there are further accuracy issues with Thomson data. For example, to identify authors, they use only the initials of the author's first and middle names. As they pool papers from all fields, this is a more severe error than one might first guess. Thomson reports that Kit Baum (known to Thomson as CF Baum) has publications in the Fordham Law Review (on nuclear waste) and the Sociology of Education (on group leadership).
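To see why initials-only keys misattribute papers, consider this small sketch. The two names colliding with Kit Baum's are invented for illustration; only the collapsing of full names into an initials-plus-surname key mirrors the indexing practice described above:

```python
# Illustration of how initials-only author keys collide across fields.
# The law and sociology authors below are hypothetical.
records = [
    {"author": "Christopher F. Baum", "field": "Economics"},
    {"author": "Carol F. Baum",       "field": "Law"},
    {"author": "Charles F. Baum",     "field": "Sociology"},
]

def initials_key(name):
    """Reduce a full name to 'INITIALS Surname', as an initials-only index would."""
    parts = name.replace(".", "").split()
    return "".join(p[0] for p in parts[:-1]) + " " + parts[-1]

keys = {initials_key(r["author"]) for r in records}
print(keys)  # all three names collapse to the single key 'CF Baum'
```

Three distinct people in three fields become one "author," which is exactly the kind of cross-field pollution the Baum example shows.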
A further issue is Thomson's coverage; EconLit lists some 1,240 journals in our field while the last time I checked Thomson covered but a fraction of these. I don't have recent data for their coverage, but in total Thomson covers 8,700 journals encompassing all academic fields, so it seems doubtful that Thomson has substantially changed its economics coverage.
A further problem plaguing all citation analysis is simply extracting citation data with software. After all, citations are written for people, not machines. I haven’t seen data for Thomson on this (one wonders if it is public), but I do know that CitEc has faced a very real challenge here.
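A toy example suggests the difficulty: a parser tuned to one citation style silently fails on another. Both the pattern and the citation strings below are invented for illustration:

```python
import re

# A naive parser for one citation style: "Author (Year), rest of citation."
PATTERN = re.compile(r"^(?P<author>[^(]+)\((?P<year>\d{4})\),\s*(?P<rest>.+)$")

def parse(citation):
    """Return the parsed fields, or None if the citation doesn't match."""
    m = PATTERN.match(citation)
    return m.groupdict() if m else None

ok = parse("Smith, J. (2001), A Hypothetical Paper, Journal of Examples 12(3), 1-10.")
# Same reference, different (also common) style: author-first, year at the end.
bad = parse("J. Smith. A Hypothetical Paper. Journal of Examples, 2001.")
print(ok is not None, bad is None)  # True True
```

Multiply this across thousands of journals, each with its own house style plus author typos, and the scale of the extraction problem becomes clear.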
There would seem to be several solutions to these problems. First, all of us should treat impact factors and citation data with considerable caution. Basing journal rankings, tenure, promotion, and raises on uncritical acceptance of this data is a poor idea. In the extreme, one could imagine legal action in a tenure case.
Second, as the authors of the Journal of Cell Biology argue, this data should be public, just as research findings should be. One initiative here is a Petition for OA [open access] to bibliographic data. My understanding is that through a “RePEc service” like EconPapers or IDEAS, raw CitEc data can be accessed by the public. Further, CitEc works with RePEc Author Services to correct citations. Here’s one more reason to join those 15,000 who have registered with it!
Third, we should investigate putting unique identifiers into each reference so that software can easily read them. That is, besides listing the journal, its volume, and so on, each reference would also include a unique identifier for the cited paper. DOIs are one possibility, but it is prohibitively expensive to get a license to dispense DOIs. However, "RePEc handles," which identify papers in RePEc, are permanent and also cover working papers. Thus, we might start including them in each reference. This highlights a further issue: there is little incentive for authors to add this to their citations as it aids others. Perhaps one step in this direction would be for sites like IDEAS, which provide references for papers in formats like BibTeX or EndNote, to include the RePEc handle along with the current author, title, journal, and so on.
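As a sketch of what such an export could look like, here is a small generator that adds the handle as an extra BibTeX field. The entry, the handle, and the field name "repec" are all my own assumptions, not an existing IDEAS feature:

```python
# Sketch: a BibTeX export carrying a machine-readable identifier.
# "repec" is not a standard BibTeX field, just one way a service
# like IDEAS could embed the handle; the entry data are hypothetical.

def to_bibtex(entry):
    fields = "\n".join(f"  {k} = {{{v}}}," for k, v in entry.items() if k != "key")
    return "@article{%s,\n%s\n}" % (entry["key"], fields)

paper = {
    "key": "smith2001",
    "author": "Smith, John",
    "title": "A Hypothetical Paper",
    "journal": "Journal of Examples",
    "year": "2001",
    "repec": "RePEc:abc:wpaper:123",  # hypothetical handle
}
print(to_bibtex(paper))
```

Since authors mostly copy references straight out of such exports, identifiers embedded there would propagate into bibliographies with no extra effort on the author's part.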