Parsing citations

November 22, 2008

One of the services RePEc offers to authors is CitEc, the discovery of citations. This is a difficult undertaking because it has to be done entirely automatically. As project leader José Manuel Barrueco Cruz discusses in a previous post, the reference section of a paper is extracted through a series of steps: PDF download, file conversion to PostScript, further conversion to plain text, and identification of the reference section. Each of these steps incurs losses.
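To give a flavour of the last step, here is a minimal Python sketch of locating the reference section in the plain text of a paper. It is only an illustration: the heading patterns and the function name `find_reference_section` are assumptions made for this example, not a description of what CitEc actually does.

```python
import re

# Hypothetical heading patterns; the rules CitEc really uses are not
# described in this post.
HEADING = re.compile(
    r"^\s*(references|bibliography|literature cited|works cited)\s*:?\s*$",
    re.IGNORECASE,
)


def find_reference_section(text):
    """Return the text after the last reference-style heading, or None."""
    lines = text.splitlines()
    # Scan from the end: the reference list normally closes the paper, and
    # the word "references" may also appear earlier (e.g. in a table of
    # contents).
    for i in range(len(lines) - 1, -1, -1):
        if HEADING.match(lines[i]):
            return "\n".join(lines[i + 1:]).strip()
    return None


sample = "Introduction\nSome text.\nReferences\nBecker (1990)\nSmith (2001)"
print(find_reference_section(sample))
```

A paper whose heading was mangled during conversion returns `None` here, which is one way a whole reference list can be lost before any matching is attempted.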

But even once the reference section is in hand, we are not out of trouble. One needs to identify where each reference starts and ends, then try to match it with something already in RePEc. Considering all the different citation styles, typos, and plain errors, this is a daunting task. Matches that are sufficiently close are counted as citations, while matches that fall in a grey zone are fed to the RePEc Author Service to solicit the author’s help in sorting them out. Below are a few examples of what is offered to authors, for the case of a classic article by Gary Becker, Kevin Murphy and Robert Tamura, Human capital, fertility and economic growth:


  • [3] Becker, G.; Murphy, K. ald Tamura, R. (1993)Humall capital, fertility ald ecollomic growth 01 Humall Capital, third editioll, Gary Becker.
  • Becker, Gary S.; Murphy, Kevin M.; and Tamura, Robert. Human Capital, Fertility, and Econonric Growth, Journal of Political Economy, October 1990 98(5) Part 2, pp. S12-S37.
  • Becker GS, Murphy KM, Tamura R (1990) Human capital, fertility and economic growth. J Polit Econ 98:S12–S37.
  • 1-25. Kevin M. Murphy, and Robert Tamura, Human Capital, Fer- tility and Economic Growth, Journal of Political Economy, October
  • BECKER, 0. S., K. M. MURPHY and R. TAMURA (1990) Human Capital, Fertility and Economic Growth, Journal of Political Economy 98, S 12-37.
  • [6] Becker, G., Murphy, K. and Tamura, R. (1990), Human capital, fertility, and economic growth, Journal of Political Economy, vol. XCVIII, pp.12-37.
  • Population and Development Review, Vol.12, Supplement: Below-Replacement Fertility in Industrial Societies: Causes, Consequences, Policies, pp. 69-76. Becker, Gary; Kevin Murphy, y Robert Tamura. (1990). Human Capital, Fertility and Economic Growth. The Journal of Political Economy, Vol.98, No.5, Part 2: The Problem of Development: A Conference of the Institute for the Study of Free Enterprise System, S12-S37.
  • (March/April 1973 Supplement), S279-88. ______________ Kevin M. Murphy, and Robert Tamura, Human Capital, Fertility, and Economic Growth, Journal of Political Economy, XCVIII
  • Becker, S. Gary, Kevin, M. Murphy and Tamura, Robert (1990). `Human Capital, Fertility, and Economic Growth The Journal of Political Economy, Vol. 98, Issue 5, Part 2, Oct. 1990, pp. S12-S37.
  • Bankconference on developmenteconomics. ecker, Gary, KevinMurphy, and RobertTamura. 1990. Human Capital, Fertility, and EconomicGrowth., Journal of PoliticalEconomy 98, 5, Part 2, pp. S12-S37.

These examples show what can go wrong in the file conversion and how citing authors can make mistakes. Still, CitEc has been able to recognize these references, but is not sure enough about them to count them without the author's confirmation.

This also highlights that we try to minimize errors, even if this means leaving good citations out. Other citation services may take a different approach.
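The three-way split between accepted citations, grey-zone candidates and rejected matches can be sketched as follows. This is a simplified illustration, not CitEc's algorithm: it assumes the title has already been isolated from the reference string, it uses Python's standard `difflib` as a stand-in similarity measure, and the two thresholds are made-up values.

```python
import re
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen only for illustration.
ACCEPT = 0.9   # at or above: counted as a citation
GREY = 0.7     # between GREY and ACCEPT: ask the author


def normalize(title):
    """Lowercase and reduce punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def similarity(a, b):
    """Similarity in [0, 1] between two titles after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def classify(score):
    """Map a similarity score to one of the three outcomes."""
    if score >= ACCEPT:
        return "citation"
    if score >= GREY:
        return "ask author"
    return "no match"


known = "Human capital, fertility and economic growth"
candidates = [
    "Human Capital, Fertility, and Economic Growth",
    "Humall capital, fertility ald ecollomic growth",
    "Below-Replacement Fertility in Industrial Societies",
]
for candidate in candidates:
    score = similarity(known, candidate)
    print(f"{score:.2f}  {classify(score)}  {candidate}")
```

Raising `ACCEPT` trades missed citations for fewer false ones, which is the conservative choice described above.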


Looking for a deep link?

November 21, 2008

If you followed a link and expected to find a specific post on the RePEc blog: we unfortunately had to move to a different host, and links were broken. Please look for your post in the archives. If you were using one of the RSS feeds, please use the new ones: entries or comments. We apologize for the inconvenience.


The blog has moved to a new host

November 19, 2008

Due to chronic problems with DoS attacks and spamming that have crippled the host server several times, the RePEc blog has now moved to a new host. It is still available under the old https://blog.repec.org/ address, but no longer under the alternative http://repec.org/blog/. Also, the addresses within the blog have all changed, which breaks deep links. Finally, old RSS feeds may still work as they are redirected, but it is safer to recreate them.

Users who created accounts at the old location will have to create new ones, unless they already have one on WordPress. I am very sorry for the trouble, and especially for the violation of the RePEc principle that links should never break. But I think we now have a permanent home for this blog and this should not happen again.


RePEc in October 2008

November 5, 2008

The major development this past month is that the contents of AgEcon Search are now listed on RePEc. About 30,000 works will gradually be integrated over the coming weeks. Also, October is traditionally a busy month, which is reflected in a large number of new participants (authors and institutions) and high traffic. We recorded 701,893 file downloads and 2,757,234 abstract views. In addition, the following publishers joined us during this month: British University in Egypt, Migration Letters, Universitatea “Al. I. Cuza”, Université du Littoral, AgEcon Search, EERI, University of Suceava, Econometica, Spiru Haret University, WorldFish Center, Université Libre de Bruxelles, esocialsciences.com, Scuola Superiore Sant’Anna, Tufts University.

In terms of thresholds passed this month, we have:
150,000,000 cumulative abstract views
25,000,000 year-to-date abstract views
640,000 items listed
375,000 articles listed
18,000 authors registered
3,000 series and journals indexed
2,500 book chapters listed