How data is assembled in RePEc

March 30, 2008

RePEc is essentially a large bibliographic database, so it needs data about bibliographic items. As RePEc has no employees and relies entirely on volunteers, it had to find a way to keep the cost of data input to a minimum. It succeeded by shifting this cost to those who benefit most from having their publications listed on RePEc: the publishers. We call such publishers “RePEc archive maintainers.” They can be commercial publishers, university presses, economics departments, research centers, central banks, societies or other organizations that have some form of publication relevant to Economics.

This is how RePEc archive maintainers proceed: they maintain sets of flat text files in a particular format called ReDIF, with a different template for each type of document. For example, the template describing a working paper would look like this:

Template-Type: ReDIF-Paper 1.0
Author-Name: Hildegrund Muesli
Author-Workplace-Name: University of Upper Elbonia
Author-Name: Adalbrecht Vollkorn
Author-Workplace-Name: Institute for Grandiose Research
Title: The Economics of Gizmos: Grandiose Results
Abstract: Gizmos have become more common with the advent of cybermarkets. This paper explains how banking regulation, demographics and global climate change have increased the demand for gizmos.
Classification-JEL: Z00
Keywords: Location, Location, Location
Number: 0803
Creation-Date: 2008-02
Handle: RePEc:uel:papers:0803
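Because a template is just a sequence of "Field: value" lines, reading one takes very little code. The Python sketch below is a simplified illustration, not the official ReDIF parser. It assumes one field per line, keeps repeated fields such as Author-Name in order, treats any line without a "Field:" prefix as a continuation of the previous value, and does not attach each Author-Workplace-Name to its author. It also splits a handle into its archive, series and item codes. The function names parse_template and split_handle are hypothetical.

```python
from collections import defaultdict

# A trimmed copy of the working paper template above.
SAMPLE = """\
Template-Type: ReDIF-Paper 1.0
Author-Name: Hildegrund Muesli
Author-Workplace-Name: University of Upper Elbonia
Author-Name: Adalbrecht Vollkorn
Author-Workplace-Name: Institute for Grandiose Research
Title: The Economics of Gizmos: Grandiose Results
Classification-JEL: Z00
Number: 0803
Creation-Date: 2008-02
Handle: RePEc:uel:papers:0803
"""


def parse_template(text: str) -> dict[str, list[str]]:
    """Collect 'Field: value' lines into a dict of value lists (simplified)."""
    fields = defaultdict(list)
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip()
        if sep and key and " " not in key:
            # New field; repeated fields such as Author-Name keep their order.
            last_key = key
            fields[key].append(value.strip())
        elif last_key is not None:
            # Assumption: anything else continues the previous field's value.
            fields[last_key][-1] += " " + line.strip()
    return dict(fields)


def split_handle(handle: str) -> tuple[str, str, str]:
    """Split RePEc:<archive>:<series>:<item> into its three codes."""
    prefix, archive, series, item = handle.split(":", 3)
    if prefix.lower() != "repec":
        raise ValueError(f"not a RePEc handle: {handle}")
    return archive, series, item


record = parse_template(SAMPLE)
print(record["Author-Name"])  # ['Hildegrund Muesli', 'Adalbrecht Vollkorn']
print(split_handle(record["Handle"][0]))  # ('uel', 'papers', '0803')
```

Splitting on the first colon only is what keeps titles such as "The Economics of Gizmos: Grandiose Results" intact.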

There are other templates for articles, chapters, books, software components, series and archives. For internal use, RePEc also has templates for people and institutions. All templates carry unique identifiers (handles) that allow them to be cross-linked. Publishers place these templates on their website or anonymous FTP site, and RePEc services visit them regularly, typically daily, to check for changes. This allows for very fast turnaround times.
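How exactly RePEc services detect changes is not described here. One simple approach, sketched below purely as an assumption, is to store a content hash for each template file and compare it on the next visit. The dictionaries stand in for a real download step, and the file names and the detect_changes function are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """SHA-256 fingerprint of a file's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict[str, str], current_files: dict[str, bytes]):
    """Compare digests stored on the last visit with freshly fetched files.

    previous: file name -> digest recorded on the previous visit.
    current_files: file name -> raw bytes fetched on this visit.
    Returns sorted lists of new, changed and removed file names,
    plus the digest map to store for the next visit.
    """
    current = {name: digest(data) for name, data in current_files.items()}
    new = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(n for n in set(current) & set(previous) if current[n] != previous[n])
    return new, changed, removed, current


day1 = {"wpaper.rdf": b"Template-Type: ReDIF-Paper 1.0\n"}
_, _, _, stored = detect_changes({}, day1)

day2 = {
    "wpaper.rdf": b"Template-Type: ReDIF-Paper 1.0\nNumber: 0803\n",
    "journl.rdf": b"Template-Type: ReDIF-Article 1.0\n",
}
new, changed, removed, stored = detect_changes(stored, day2)
print(new, changed, removed)  # ['journl.rdf'] ['wpaper.rdf'] []
```

Only the files reported as new or changed would then need to be parsed again, which is what keeps a daily visit cheap.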

Complete instructions for opening a RePEc archive can be found here. If your institution is not yet among the roughly 900 participating archives, consider following these instructions.

The Budapest Open Access Initiative

March 22, 2008

The Budapest Open Access Initiative (BOAI) was signed on 14 February 2002. Its goal is to encourage an international effort to make research in all academic fields freely available on the internet. It defines open access as “the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.” The initiative is not limited to articles published in journals but also encompasses pre-prints (discussion or working papers, as we call them in Economics).

The Directory of Open Access Journals (DOAJ) lists (as of this writing) 3,289 journals, including 68 in Economics, that satisfy the requirements of the BOAI. While Economics is relatively underrepresented, the working paper culture in our field makes it possible to find open-access versions of many, if not most, of the articles published in non-open-access journals (RePEc tries very hard to identify links between different versions of the same work). In fact, most publishers explicitly allow authors to post pre-prints or post-prints of their articles in institutional repositories, including working paper series. A good list of publisher policies can be found at RoMEO.

If you think this is a good initiative, you can sign the BOAI here. Above all, make sure your publications are freely available through a working paper series or, absent this option, through the Munich Personal RePEc Archive. In particular, authors should in most cases not remove their working papers once they are published in journals.

Classifying authors

March 16, 2008

A difficult task librarians often face when classifying items is determining whether authors with similar names are the same person. Indeed, bibliographic records usually contain very little information to identify authors. Take the case of Adam Smith. He may be listed under his full name, which is by no means unique, or worse, only as A. Smith, which is easily confused with others. Librarians then rely on context and on information gathered outside the bibliographic record to attribute the work to the right person, hopefully without error.

With the large number of works now available, such laborious classification becomes infeasible, and automatic classification makes numerous errors. Within RePEc, we rely on the authors themselves to perform the classification. When they register with the RePEc Author Service, they can enter all the name variations under which they may be listed in a bibliographic record. For John Maynard Keynes (who is not registered), such name variations could be:

John Maynard Keynes
John M. Keynes
John Keynes
J. M. Keynes
J. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, John
Keynes, J. M.
Keynes, J.
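Most of these variants follow mechanically from the first, middle and last names. The hypothetical helper below reproduces the ten forms listed above, in the same order. It is only an illustration: in the RePEc Author Service the author enters the variations, and this sketch ignores titles, suffixes and name changes.

```python
def forename_forms(first: str, middle: str) -> list[str]:
    """Forename variants, from most to least complete."""
    forms = []
    if middle:
        forms.append(f"{first} {middle}")          # John Maynard
        forms.append(f"{first} {middle[0]}.")      # John M.
    forms.append(first)                            # John
    if middle:
        forms.append(f"{first[0]}. {middle[0]}.")  # J. M.
    forms.append(f"{first[0]}.")                   # J.
    return forms


def name_variations(first: str, middle: str, last: str) -> list[str]:
    """'Forename Surname' forms followed by 'Surname, Forename' forms."""
    forms = forename_forms(first, middle)
    direct = [f"{f} {last}" for f in forms]
    inverted = [f"{last}, {f}" for f in forms]
    return direct + inverted


for variant in name_variations("John", "Maynard", "Keynes"):
    print(variant)
```

Without a middle name, as for Adam Smith, the helper yields only four forms: Adam Smith, A. Smith, Smith, Adam and Smith, A.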

In addition, an author may have changed names (for example, through marriage) or be listed with a title (Prof., Sir) or a suffix (Jr, Sr, III). Variations multiply if names have accents, which some publishers ignore or encode in the wrong character set. The possibilities are numerous. The registered author is first offered works that match the name variations exactly, and then works that closely match them (typographical errors happen). The author can then accept or reject each of these works.
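The two-stage suggestion, exact matches first and close matches second, can be sketched with Python's standard library. This is not the Author Service's actual matching code. Accents are folded away with unicodedata, and difflib's similarity ratio with an arbitrary 0.85 cutoff stands in for whatever notion of a close match the service really uses. Names garbled by a wrong character set would need separate repair. The names fold and suggest are hypothetical.

```python
import difflib
import unicodedata


def fold(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def suggest(variations: list[str], record_authors: list[str], cutoff: float = 0.85):
    """Split author strings from records into exact and close matches."""
    wanted = {fold(v) for v in variations}
    candidates = sorted(wanted)  # deterministic order for difflib
    exact, close = [], []
    for author in record_authors:
        key = fold(author)
        if key in wanted:
            exact.append(author)  # first stage: exact match after folding
        elif difflib.get_close_matches(key, candidates, n=1, cutoff=cutoff):
            close.append(author)  # second stage: likely typo
    return exact, close


variants = ["Hildegrund Muesli", "H. Muesli", "Muesli, Hildegrund"]
records = ["H. Muesli", "Hildegrund Mueslli", "Hildégrund Muesli", "Adalbrecht Vollkorn"]
print(suggest(variants, records))
# (['H. Muesli', 'Hildégrund Muesli'], ['Hildegrund Mueslli'])
```

The author would then accept or reject each suggestion; the cutoff only trades more candidates to review against more missed works.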

The RePEc Author Service has so far collected data from close to 16,000 authors, who have claimed over 300,000 works as theirs. This data is used in particular to increase the accuracy of various rankings. Within this set of authors there is already a large number of homonyms, even when one looks beyond the first-name initial, which is the only precision some other services offer.

If you know of other homonyms in the profession, encourage them to register!

The RePEc budget for 2008

March 9, 2008
            Budget 2007   Actual 2007   Budget 2008
Expenses    US$0.00       US$0.00       US$0.00
Revenues    US$0.00       US$0.00       US$0.00

Thanks to all our volunteers!

RePEc in February 2008

March 2, 2008

Every month, a short summary of what happened with RePEc is sent to the RePEc-announce mailing list. I also put that message, slightly adapted, on this blog.

This month, IDEAS moved to a new server sponsored by the Society for Economic Dynamics. It is still hosted by the University of Connecticut and now has a faster connection to the Internet.

In terms of traffic, 613,984 file downloads and 2,246,241 abstract views were recorded within the month, once more significantly up from a year ago. This month we passed the following thresholds:

40,000,000 cumulative article abstract views on all RePEc services
25,000,000 cumulative abstract views on EconPapers
300,000 items claimed by registered authors
100,000 papers with JEL codes
20,000 unique subscribers in NEP
2,800 journals and series