Open Access News pointed out a very interesting article in the Journal of Cell Biology, Show Me the Data. Written by that journal’s executive editor, the executive editor of the Journal of Experimental Medicine, and the Executive Director of The Rockefeller University Press, it first reiterates many quality issues with journal impact factors. These seem to be well known among biologists, but I suspect they are news to many economists. Many of these issues also hold for citation rankings of individuals. Beyond that, there are other issues that make citation data suspect. Fortunately, there are potential solutions to many of these problems.

First, it helps to describe impact factors as they are calculated by Thomson Scientific (previously the Institute for Scientific Information, or ISI). A journal’s impact factor in year t is the number of cites received in year t by all articles that journal published in years t-1 and t-2, divided by the number of research or review articles it published in those two years. Criticisms include:

  • the data in the denominator and numerator are not consistent
  • Thomson is unclear on what exactly defines a research or review article
  • some journals have negotiated with Thomson on exactly what defines the article type
  • retracted papers are not excluded
  • of course, the mean is inflated by a few star papers
  • editors can game the system; apparently some do and some don’t (I’ve even seen this in the Wall Street Journal)
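To make the first and fifth criticisms concrete, here is a minimal sketch of the calculation with invented numbers. It assumes one common reading of the numerator/denominator inconsistency: cites to every item (including editorials) count in the numerator, while only research and review articles count in the denominator. The journal, the citation counts, and the function name `impact_factor` are all hypothetical.

```python
from statistics import mean, median

def impact_factor(citations_in_year_t, citable_items):
    """Cites received in year t by items published in t-1 and t-2,
    divided by the number of research/review articles from t-1 and t-2."""
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations_in_year_t / citable_items

# Hypothetical journal: 10 research articles and 2 editorials over two years.
article_cites = [0, 1, 1, 2, 2, 3, 3, 4, 4, 80]  # one "star" paper
editorial_cites = [5, 5]  # assumed to count in the numerator only

numerator = sum(article_cites) + sum(editorial_cites)
denominator = len(article_cites)  # editorials are not counted here

print("impact factor:", impact_factor(numerator, denominator))
print("mean cites per article:", mean(article_cites))
print("median cites per article:", median(article_cites))
```

With these made-up figures the impact factor is 11.0, the mean number of cites per research article is 10, and the median is 2.5. A single heavily cited paper and a handful of cited editorials are enough to put the headline number far above what a typical article in the journal receives.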

The authors go on to say that they contacted Thomson and received some of their data. They found numerous errors in how articles were categorized. Further, “The total number of citations for each journal was substantially fewer than the number published” as reported by Thomson. When they requested further data from Thomson, the data still didn’t add up. They conclude “It became clear that Thomson Scientific could not or (for some as yet unexplained reason) would not sell us the data used to calculate their published impact factor.”

Their bottom line is even more clear: “If an author is unable to produce original data to verify a figure in one of our papers, we revoke the acceptance of the paper. We hope this account will convince some scientists and funding organizations to revoke their acceptance of impact factors as an accurate representation of the quality—or impact—of a paper published in a given journal. Just as scientists would not accept the findings in a scientific paper without seeing the primary data, so should they not rely on Thomson Scientific’s impact factor, which is based on hidden data.”

Besides the points reiterated and brought up in the Journal of Cell Biology, there are further accuracy issues with Thomson data. For example, to identify authors, they use only the initials of the first and middle names. Because they pool papers from all fields, this is a more severe error than one might first guess. Thomson reports that Kit Baum (known to Thomson as CF Baum) has publications in the Fordham Law Review (on nuclear waste) and the Sociology of Education (on group leadership).
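The collision problem is easy to reproduce. The sketch below is an illustration only, not Thomson's actual matching procedure: it assumes authors are keyed by surname plus the initials of their given names, and every name and field in it is invented.

```python
from collections import defaultdict

def initials_key(full_name):
    """Reduce 'Given Middle Surname' to 'SURNAME GM' (initials only)."""
    parts = full_name.split()
    surname, given = parts[-1], parts[:-1]
    return surname.upper() + " " + "".join(p[0].upper() for p in given)

# Hypothetical authors from unrelated fields.
authors = [
    ("Alice B Smith", "economics"),
    ("Andrew B Smith", "law"),
    ("Anna B Smith", "sociology"),
    ("Carl D Jones", "economics"),
]

# Group everyone who ends up with the same key.
groups = defaultdict(list)
for name, field in authors:
    groups[initials_key(name)].append((name, field))

# Keys shared by more than one person are false merges.
collisions = {key: people for key, people in groups.items() if len(people) > 1}
for key, people in collisions.items():
    print(key, "->", people)
```

Here three different people collapse into the single record "SMITH AB", so their papers would be credited to one author. The more fields a database pools, the more often this happens.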

A further issue is Thomson’s coverage; EconLit lists some 1,240 journals in our field, while the last time I checked Thomson covered but a fraction of these. I don’t have recent data for their coverage, but in total Thomson covers 8,700 journals encompassing all academic fields, so it seems doubtful that Thomson has substantially changed its economics coverage.

A further problem plaguing all citation analysis is simply extracting citation data with software. After all, citations are written for people, not machines. I haven’t seen data for Thomson on this (one wonders if it is public), but I do know that CitEc has faced a very real challenge here.

There would seem to be several solutions to these problems. First, all of us should treat impact factors and citation data with considerable caution. Basing journal rankings, tenure, promotion, and raises on uncritical acceptance of this data is a poor idea. In the extreme, one could imagine legal action in a tenure case.

Second, as the authors of the Journal of Cell Biology article argue, this data should be public, just as research findings should be. One initiative here is a Petition for OA [open access] to bibliographic data. My understanding is that through a “RePEc service” like EconPapers or IDEAS, raw CitEc data can be accessed by the public. Further, CitEc works with RePEc Author Services to correct citations. Here’s one more reason to join those 15,000 who have registered with it!

Third, we should investigate putting unique identifiers into each reference so that software can easily read it. That is, besides listing the journal, its volume, and so on, a reference would also include a unique identifier for the cited paper. DOIs are one possibility, but it is prohibitively expensive to get a license to dispense DOIs. However, “RePEc handles,” which identify papers in RePEc, are permanent and also cover working papers. Thus, we might start including them in each reference. This highlights a further issue: there is little incentive for authors to add this to their citations, as it mainly aids others. Perhaps one step in this direction would be for sites like IDEAS, which provide references for papers in different formats like BibTeX or EndNote, to include the RePEc handle along with the current author, title, journal, etc.
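As a rough illustration of why an embedded identifier helps, the sketch below pulls handles out of a free-text reference and writes one into a BibTeX entry. It makes several assumptions: the regular expression is a loose guess at what a handle looks like (the prefix "RePEc:" followed by colon-separated parts), the handle and the paper are invented, and storing the handle in BibTeX's `note` field is just one possible convention, not something IDEAS is known to do.

```python
import re

# Loose, assumed pattern: "RePEc:" plus at least three colon-separated parts.
HANDLE_RE = re.compile(r"RePEc:[A-Za-z0-9]+:[A-Za-z0-9]+:[^\s,;]+")

def extract_handles(reference):
    """Return all handle-like strings in a reference, minus a trailing period."""
    return [m.rstrip(".") for m in HANDLE_RE.findall(reference)]

def bibtex_with_handle(key, author, title, journal, year, handle):
    """Build a BibTeX @article entry that carries the handle in 'note'."""
    fields = {
        "author": author,
        "title": title,
        "journal": journal,
        "year": str(year),
        "note": handle,  # hypothetical convention for carrying the handle
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"

# Invented reference with an invented handle.
reference = ("Doe, J. (2007). An Example Paper. "
             "Journal of Examples 1(2), 3-4. RePEc:xxx:yyyyyy:12345.")

handles = extract_handles(reference)
print(handles)
print(bibtex_with_handle("doe2007", "Doe, Jane", "An Example Paper",
                         "Journal of Examples", 2007, handles[0]))
```

Matching one identifier with a pattern like this is far less error-prone than parsing author names, titles, and journal abbreviations out of a string that was formatted for human readers.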

7 Responses to Citation Accuracy

  1. While this editorial in the Journal of Cell Biology raises some red flags about the Thomson Scientific impact factors, these have been in doubt for a long time, due to their lack of coverage, as Bill Goffe mentions. We should ask ourselves, as RePEc, how we can improve on this.

    RePEc already supplies impact factors on its rankings page. How are they better than those from Thomson Scientific? They cover more years, which is important given the publication delays in Economics. They are computed for more journals. How are they worse? References from many journals are missing in the analysis, in major part because some publishers prohibit us from using them. This is partially compensated by the analysis of working papers, which actually makes RePEc impact factors more current. Both systems seem to suffer from the same problem with false positives. But at least RePEc is doing something about it with the RePEc Author Service by taking care of homonyms and allowing authors to correct the data further.

    To make RePEc impact factors a truly superior product, I think we need to work on two fronts: 1) convince more publishers to release data about their references or at least allow us to use those we can find. This follows the petition for open access to bibliographic data. 2) Make it a culture in Economics to use RePEc handles in bibliographies.

  2. I am appalled at how the Thomson Scientific impact factors are a fraud. I am not using them anymore and will use the RePEc ones. Continue with the good, open work!

    The Economic Logic blog on this.

  3. JMBC says:

    Following the “open” philosophy of RePEc, I would like to note that all data about citations generated by CitEc is available in the public domain. Anyone interested in reproducing the citation rankings or implementing new value-added services is welcome to use the data. At the moment it’s available in AMF (Academic Metadata Format) at the URL:

  4. Thomson Scientific offers a correction to the Journal of Cell Biology editorial. Whether this is satisfying or not, it is clear that laying open all your data and procedures and allowing people to freely replicate the results should remove questions.

  5. [...] Not really, according to the RePEc blog (via Newmark). Thompson (formerly ISI) uses an imprecise and inconsistent method to compute journal impact factors and, even worse, refuses to release the raw data so that scores can be independently verified. One response: “[A]ll of us should treat impact factors and citation data with considerable caution. Basing journal rankings, tenure, promotion, and raises on uncritical acceptance of [these] data is a poor idea.” [...]

  6. [...] Peter Klein and the RePEc blog, in this article a team of biologists take on Thomson’s venerable citation impact factor – [...]

  7. [...] the impact? The RePEc blog chimes in: Besides the points reiterated and brought up in the Journal of Cell Biology, there are further [...]

