About RePEc impact factors

July 27, 2009

Impact factors have always been a popular way to measure the influence of academic journals. They were popularized by ISI, now part of Thomson. RePEc also provides impact factors, and this post explains the differences between the two.

ISI takes a sample of journals and analyzes the citations across those journals. To be eligible, a citation has to appear within two years of the publication of the cited article, the cited article must be printed (not forthcoming, a working paper or a manuscript), and the cited article must be among the analyzed journals (286 in Economics). ISI is currently experimenting with a five-year window, in addition to the existing two-year window.

RePEc considers all publications listed in its bibliographic database. Thus, it also covers publication forms other than journal articles: close to 1000 journals and 2600 working paper series. It imposes no time window: citations of any age qualify. In most cases, a citation of a working paper will count towards its published form once the article is included in RePEc, possibly after the original citation (condition: at least one author has both versions in his/her RePEc profile). This implies that working paper series and book series can also have impact factors. RePEc is thus more comprehensive.

However, the pool of citations RePEc is drawing from is different. It relies very much on working papers (which can later be published), as they are typically openly accessible. Some publishers also provide references in the bibliographic metadata, but not all. One implication of this is that RePEc is more current, as it includes citations to and from research that is not yet published. As research gets published, this data gets updated. But as references from many journals are missing, RePEc citation data must still be treated as experimental. Whether these omissions matter remains to be seen. After all, impact factors always have to be considered in relative terms, not in absolute terms, and if omissions were not biased, they would not matter.

Another major difference is that RePEc excludes self-citations. This is an important issue as some journals, explicitly or implicitly, encourage authors to cite other articles published within the two-year window in the same journal. Thus, just as self-citations are excluded for authors, they are excluded for journals. And this can matter a lot.

Finally, the impact factor is determined by dividing the eligible citations by the number of eligible articles. ISI itself determines which articles are eligible for the denominator, and this can even be negotiated with the publisher. In RePEc’s case, if an article (or a working paper) is listed, it counts without adjustment.
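To make the arithmetic concrete, here is a minimal sketch in Python of a "simple" impact factor computed along the lines described above: every listed item counts in the denominator, citations of any age count in the numerator, and citations coming from the same series are dropped. The function name, the data layout and the way a self-citation is detected (same series on both sides) are assumptions made for illustration, not RePEc's actual code.

```python
from collections import Counter


def simple_impact_factors(items, citations):
    """Citations per listed item, for every series.

    items:     dict mapping an item id to the series (journal or
               working paper series) it belongs to
    citations: iterable of (citing_item_id, cited_item_id) pairs
    """
    # Denominator: every listed item counts, without adjustment.
    n_items = Counter(items.values())

    # Numerator: citations received, any age, minus self-citations.
    received = Counter()
    for citing, cited in citations:
        if citing not in items or cited not in items:
            continue  # one side is not in the database
        if items[citing] == items[cited]:
            continue  # assumed rule: same series means self-citation
        received[items[cited]] += 1

    return {series: received[series] / n for series, n in n_items.items()}


# Hypothetical data: two journals and one working paper series.
example_items = {"a1": "J1", "a2": "J1", "b1": "J2", "w1": "WP"}
example_citations = [("b1", "a1"), ("w1", "a1"), ("a2", "a1"), ("a1", "b1")]
print(simple_impact_factors(example_items, example_citations))
```

In this made-up example the citation from a2 to a1 is ignored because both items sit in the same journal.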

RePEc also publishes variations on the “simple” impact factor: recursive impact factors, where every citation is weighted by the impact factor of the citing publication, which favors impact over numbers; discounted impact factors, where the weight of a citation decays with time (regardless of the age of the cited item); and a combination of the two, discounted recursive impact factors. Finally, there is now also the h-index. Each variation tells a different story about the publication, and RePEc leaves the choice to the reader.
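Two of these variations are easy to sketch in Python. The h-index below follows the usual definition (the largest h such that h items have at least h citations each). The discounting function is only an assumption: the post does not give the decay formula, so the sketch divides each citation by its age in years plus one, where age is measured from the year of the citing item.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def discounted_citations(citing_years, current_year):
    """Sum of citation weights that decay with the age of the citation.

    Assumed decay: weight = 1 / (age in years + 1). The age of the
    cited item plays no role, only the year of the citing item.
    """
    return sum(1.0 / (current_year - year + 1) for year in citing_years)


# A series whose items have 10, 8, 5, 4 and 3 citations.
print(h_index([10, 8, 5, 4, 3]))
# One citation from this year and one from last year.
print(discounted_citations([2009, 2008], current_year=2009))
```

Dividing the discounted sum by the number of items in the series would give a discounted impact factor under the same assumption.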


Parsing citations

November 22, 2008

One of the services RePEc offers to authors is the discovery of citations, CitEc. This is a difficult undertaking as it needs to be done entirely automatically. As project leader José Manuel Barrueco Cruz discusses in a previous post, the reference section of a paper is extracted through a series of steps: PDF download, file conversion to PostScript, further conversion to plain text, identification of the reference section. In each of these steps there are losses.

But even once the reference section is in hand, we are not out of trouble. One needs to identify where each reference starts and ends, then try to match it with something already in RePEc. Considering all the different citation styles, typos, and plain errors, this is a daunting task. Matches that are sufficiently close are counted as citations, while matches that are in some grey zone are fed to the RePEc Author Service to solicit the author’s help in sorting them out. Below are a few examples of what is offered to authors, for the case of a classic article by Gary Becker, Kevin Murphy and Robert Tamura, Human capital, fertility and economic growth:


  • [3] Becker, G.; Murphy, K. ald Tamura, R. (1993)Humall capital, fertility ald ecollomic growth 01 Humall Capital, third editioll, Gary Becker.
  • Becker, Gary S.; Murphy, Kevin M.; and Tamura, Robert. Human Capital, Fertility, and Econonric Growth, Journal of Political Economy, October 1990 98(5) Part 2, pp. S12-S37.
  • Becker GS, Murphy KM, Tamura R (1990) Human capital, fertility and economic growth. J Polit Econ 98:S12–S37.
  • 1-25. Kevin M. Murphy, and Robert Tamura, Human Capital, Fer- tility and Economic Growth, Journal of Political Economy, October
  • BECKER, 0. S., K. M. MURPHY and R. TAMURA (1990) Human Capital, Fertility and Economic Growth, Journal of Political Economy 98, S 12-37.
  • [6] Becker, G., Murphy, K. and Tamura, R. (1990), Human capital, fertility, and economic growth, Journal of Political Economy, vol. XCVIII, pp.12-37.
  • Population and Development Review, Vol.12, Supplement: Below-Replacement Fertility in Industrial Societies: Causes, Consequences, Policies, pp. 69-76. Becker, Gary; Kevin Murphy, y Robert Tamura. (1990). Human Capital, Fertility and Economic Growth. The Journal of Political Economy, Vol.98, No.5, Part 2: The Problem of Development: A Conference of the Institute for the Study of Free Enterprise System, S12-S37.
  • (March/April 1973 Supplement), S279-88. ______________ Kevin M. Murphy, and Robert Tamura, Human Capital, Fertility, and Economic Growth, Journal of Political Economy, XCVIII
  • Becker, S. Gary, Kevin, M. Murphy and Tamura, Robert (1990). `Human Capital, Fertility, and Economic Growth The Journal of Political Economy, Vol. 98, Issue 5, Part 2, Oct. 1990, pp. S12-S37.
  • Bankconference on developmenteconomics. ecker, Gary, KevinMurphy, and RobertTamura. 1990. Human Capital, Fertility, and EconomicGrowth., Journal of PoliticalEconomy 98, 5, Part 2, pp. S12-S37.

These examples show what can go wrong in the file conversion and how citing authors can make mistakes. Still, CitEc has been able to recognize these references, but is not sure enough about them.

This also highlights that we try to minimize errors, even if this means leaving good citations out. Other citation services may have a different approach.
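The idea of counting close matches, sending grey-zone matches to the author and dropping the rest can be sketched with Python's standard difflib module. This is not CitEc's algorithm: the similarity measure, the two thresholds (0.95 and 0.75) and the function names are all assumptions chosen for illustration, and the sketch compares titles only, whereas a real matcher would also have to split the reference into authors, year and source.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def classify_match(reference_title, known_title, accept=0.95, review=0.75):
    """Return 'match', 'review' or 'reject' for a candidate pair.

    accept and review are hypothetical thresholds: a high accept value
    keeps wrong citations out, at the cost of leaving good ones in the
    grey zone until an author confirms them.
    """
    score = SequenceMatcher(
        None, normalize(reference_title), normalize(known_title)
    ).ratio()
    if score >= accept:
        return "match"
    if score >= review:
        return "review"  # grey zone: ask the author
    return "reject"


KNOWN = "Human Capital, Fertility, and Economic Growth"
print(classify_match("Human capital, fertility and economic growth", KNOWN))
print(classify_match("Humall capital, fertility ald ecollomic growth", KNOWN))
```

With these thresholds, a title that differs only in case and punctuation is accepted outright, while the title damaged in file conversion (first example in the list above) lands in the grey zone.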


RePEc as a bibliographic tool

September 14, 2008

RePEc is a scheme to collect bibliographic information about publications and pre-publications in Economics. Publishers provide all the relevant information, which is then displayed in various ways by RePEc services. This gives users access to this data. While it is useful to find items of research while browsing or searching through these services, it is even better when one can upload the relevant bibliographic data directly into one’s bibliographic tool.

Every abstract page on IDEAS has links that allow you to download such bibliographic information in various formats: as an HTML citation, a plain text citation, the BibTeX entry familiar to LaTeX users, the RIS format used in various software like EndNote, and the ReDIF format used by RePEc. For registered authors, it is also possible to obtain these records for all their publications in one download. If other formats are used in the research community, they can be provided as well. Just ask.
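For readers who have not met BibTeX, the short Python sketch below shows what such a record looks like when built from a handful of bibliographic fields. The function name and the citation key are made up for the example, and the output is not claimed to be identical to what IDEAS serves. The bibliographic details are those of the Becker, Murphy and Tamura article used as an example in an earlier post.

```python
def to_bibtex(key, record, entry_type="article"):
    """Format a dict of bibliographic fields as a BibTeX entry."""
    fields = [f"  {name} = {{{value}}}" for name, value in record.items()]
    return "@" + entry_type + "{" + key + ",\n" + ",\n".join(fields) + "\n}"


record = {
    "author": "Becker, Gary S. and Murphy, Kevin M. and Tamura, Robert",
    "title": "Human Capital, Fertility, and Economic Growth",
    "journal": "Journal of Political Economy",
    "year": 1990,
    "volume": 98,
    "number": 5,
    "pages": "S12--S37",
}

# "becker1990" is a hypothetical citation key.
entry = to_bibtex("becker1990", record)
print(entry)
```

A file of such entries can be loaded directly into LaTeX or imported by most reference managers.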


Classifying authors

March 16, 2008

A difficult task librarians often face in the classification of items is determining whether authors with similar names are the same person. Indeed, bibliographic records are most of the time very limited in author identification. Take the case of Adam Smith. He may be listed under his full name, which is by no means unique, or, worse, only as A. Smith, which is easily confused with others. Librarians then rely on context and additional information gathered outside of the bibliographic record to attribute the work to the right person, hopefully without error.

With the large number of works now available, such laborious categorization becomes unfeasible, and automatic classification makes numerous errors. Within RePEc, we rely on the authors themselves to perform the classification. When they register in the RePEc Author Service, they have the opportunity to enter all the possible name variations under which they may be listed in a bibliographic record. For John Maynard Keynes (who is not registered), such name variations could be:

John Maynard Keynes
John M. Keynes
John Keynes
J. M. Keynes
J. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, John
Keynes, J. M.
Keynes, J.

In addition, an author may have changed names (through marriage), be listed with a title (Prof., Sir) or with a suffix (Jr, Sr, III). Variations multiply if names have accents, which some publishers do not take into account or encode in the wrong character set. The possibilities are numerous. The registered author is first offered suggestions of works that match the name variations, and then suggestions that are only a close match to them (typographical errors happen). The author can then accept these works or reject them.
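The ten Keynes variations listed above follow a regular pattern: five ways of writing the given names, each in direct and in inverted order. A minimal Python sketch of that pattern follows. The function is hypothetical and says nothing about how the RePEc Author Service builds its suggestions, and it ignores titles, suffixes, accents and name changes.

```python
def name_variations(first, middle, last):
    """Basic variations for an author with one middle name."""
    given_forms = [
        f"{first} {middle}",          # John Maynard
        f"{first} {middle[0]}.",      # John M.
        first,                        # John
        f"{first[0]}. {middle[0]}.",  # J. M.
        f"{first[0]}.",               # J.
    ]
    direct = [f"{given} {last}" for given in given_forms]
    inverted = [f"{last}, {given}" for given in given_forms]
    return direct + inverted


for variation in name_variations("John", "Maynard", "Keynes"):
    print(variation)
```

Called with John, Maynard and Keynes, this reproduces the list above in the same order.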

The RePEc Author Service has so far managed to collect data from close to 16,000 authors who have claimed over 300,000 works as theirs. This data is used in particular to increase the accuracy of various rankings. And within this set of authors, there is already a large number of homonyms, even when one looks beyond the initial of the first name, which is as far as the precision of some other services goes.

If you know of other homonyms in the profession, encourage them to register!