Keeping RePEc participants in the loop

November 21, 2009

Over 12 years, RePEc has grown into a large community, with 1100 participating archives and over 22000 registered authors. But a community like this is only useful if it is dynamic: participants participate and users use it. It is easy to register and then forget about it. This is why archive and series maintainers, editors and authors receive an individualized email every month as a reminder that they are part of the community, along with individualized news about their participation.

For editors and for archive and series maintainers, the email contains the latest statistics about their publications (downloads, abstract views and impact factors) as well as any error messages about the syntax of their templates, the URLs of the full texts, or our attempts to reach their data. Maintainers may also receive messages in between if issues arise.

For authors, monthly emails also contain downloads and abstract views, as well as any new citations discovered by the CitEc project. Also, the email contains a personalized link leading to a ranking analysis of the author according to over 30 criteria. Authors also get separate emails alerting them of potential works they can add to their portfolio.

Keeping regular contact with participants is essential to ensure continued alertness about the project and to keep the collected data as fresh and accurate as possible.


RePEc in October 2009

November 4, 2009

This was again a very busy month, with 878,635 file downloads and 3,199,663 abstract views, numbers that are close to records. We also had 16 new participating archives: Université d’Orléans, Finanzas Públicas México, “Constantin Brancusi” University of Targu-Jiu, BEPress (II), University of Bacau, Ryerson University, Tbilisi State University, Seoul National University, Athens University of Economics and Business, University of Manchester (II), Imperial College, Université Catholique de Lille, ETH Zurich (IV, V), Towson University, European Commission Joint Research Centre.

In terms of thresholds passed, we report:

30000000 cumulative downloads through IDEAS
750000 cumulative software component downloads
450000 works listed in author profiles
300000 cumulative chapter downloads
1100 participating archives


Good practices for RePEc archive maintainers

October 27, 2009

The bibliographic data displayed in RePEc services originates in about 1100 participating archives, each maintained by a volunteer (see here for instructions to start a RePEc archive). The quality of the data in RePEc thus depends on the quality of what is entered at the archive level, and there are obviously some variations. In general, we recommend providing as many bibliographic details as possible, to improve the chances that each work is found in user searches. While missing fields are sometimes frustrating for users, the incorrect use of bibliographic fields is more so. This post provides some advice to RePEc archive maintainers regarding the most frequent violations of RePEc taxonomy.


  • It is always a good idea to check your series from time to time on EconPapers and IDEAS. A good opportunity is when you get your monthly email. This often uncovers errors. Also, use the syntax checker on EconPapers, which usually reveals why some item is not showing up on RePEc.
  • The most frequent imprecision in RePEc data is the abuse of the Author-Name field. It should contain only the name of the author, not the author's affiliation (which belongs in Author-Workplace-Name) or email address (Author-Email). Also, there should be only one author per Author-Name field. With multiple authors, repeat the field.

    Correct use of the Author-Name field is important, because it allows works to be attributed to the appropriate authors in the RePEc Author Service. It frustrates authors when they do not find their own works due to miscodings.

  • Generally, put in the field what the field calls for. There is a surprising number of Title fields that actually contain abstracts, for example. And keywords or classification codes do not belong in the abstract, but in Keywords and Classification-JEL.
  • Make sure to provide a date for your bibliographic item. Without a date, it cannot be displayed in chronological order, and a working paper cannot be considered for diffusion through NEP because it cannot be established whether it is new. For working papers, use the Creation-Date field, with a syntax like yyyy, yyyy-mm or yyyy-mm-dd. For articles use Year. The relevant date is the one at which the work was written, not when the bibliographic record was created.
  • Links to online texts are provided with the File-URL field. It should link directly to the pdf file, not to an intermediate abstract page. There are two reasons: First, users already see an abstract page on the RePEc service. Second, we need a direct link to perform the citation analysis.
  • The easiest way to include an abstract in a bibliographic record is to cut-and-paste from the pdf file. In some cases, some characters do not travel well. This is especially the case for ligatures like “ff”, “fl”, “fi”, and the like. Also, end-of-line hyphenations need to be removed from abstracts. Thus, always read through an abstract after pasting it.
  • Never, never recycle handles. Handles are unique identifiers that are used throughout RePEc, for example to assign papers to authors, relate references and determine what is a new record. Avoid changing handles, as this ruptures all these relations, which then need to be reestablished. But never, never reassign an existing handle to a different item, because this renders existing relations erroneous.
  • Bibliographic records should not contain any HTML encoding. If a special character needs to be displayed, say an accented character, use UTF-8 encoding. The usual text editors will provide the byte-order mark (BOM) at the start of the file indicating that it is UTF-8 encoded. But if you generate the files through scripts, the scripts need to add the BOM explicitly.
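Some of the advice above can be checked mechanically before an archive is updated. The following Python sketch is only an illustration and not a RePEc tool: the function names are hypothetical, and it covers just three points from the list (the Creation-Date syntax, cleaning an abstract pasted from a pdf file, and writing a UTF-8 file with a BOM from a script).

```python
import re
import unicodedata

# Accepts yyyy, yyyy-mm or yyyy-mm-dd, the Creation-Date syntax described above.
DATE_RE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

def is_valid_creation_date(value):
    """Return True if value looks like yyyy, yyyy-mm or yyyy-mm-dd."""
    return DATE_RE.fullmatch(value.strip()) is not None

def clean_pasted_abstract(text):
    """Expand ligature characters and undo end-of-line hyphenation."""
    # NFKC normalization turns single ligature characters (for example
    # U+FB01) into plain letters ("fi"). It also normalizes other
    # compatibility characters, so the result should still be read through.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words split by a hyphen at the end of a line. This also joins
    # genuine compounds that happen to break there, so a human check remains
    # necessary.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse the remaining line breaks and runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()

def write_with_bom(path, content):
    """Write content as UTF-8 with a byte-order mark at the start."""
    # Python's "utf-8-sig" codec adds the BOM when writing.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(content)
```

For instance, is_valid_creation_date("2009-10") accepts the value, while a date written as 27/10/2009 is rejected.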


RePEc in September 2009

October 6, 2009

Now that vacations are over, activity on RePEc is as high as ever. Several new features were introduced in September: a Facebook application that displays one’s latest research, and experimental blogs by NEP editors discussing research in some fields. Traffic has picked up again, with 763,583 file downloads and 2,735,405 abstract views. Also, 11 new archives joined: University of Bath, Australian Journal of Labour Economics, University of Luxembourg, University of Pécs, University of Tsukuba, Bar-Ilan University, Australian National University (IV), c.MET-05, University of Natural Resources and Applied Life Sciences Vienna, Kenyon College, International Association for Energy Economics.

Finally, we passed some thresholds, including some major ones:
800,000 works listed
250,000 online working papers
200,000 article abstracts
25,000 NEP reports


How abstract views and downloads are counted

September 19, 2009

Authors and RePEc archive maintainers receive monthly emails with various statistics, and among the most anticipated statistics are our abstract views and download counts. It is important to understand how those statistics are collected and what they measure (and do not measure). Full statistics are available on the LogEc website managed by Sune Karlsson from Örebro University (Sweden).

Participating RePEc services (EconPapers, IDEAS, NEP and Socionet) keep a log of all activity on their sites. This allows us to count page views for the abstract page of each item in the database (excluding NEP, as abstracts are listed in emails). Logs also record outclicks as users leave the RePEc services for the sites containing the full texts they seek to download. This allows us to count “downloads”. Quotation marks are required as it is impossible to record whether the download was successful, for example in the case of gated publisher sites. Note also that downloads that have not transited through a RePEc service cannot be counted, as we do not have access to local logs.

LogEc gathers the logs from the participating services and aggregates the statistics. This involves much more than bean counting, though. Indeed, one first needs to exclude robot activity, as only human activity is of interest. Some robots declare themselves as such, but others hide their identity. One thus has to infer from various patterns which IP addresses are likely robots. This is an important step, as robots typically represent 75% of raw abstract views. Robots include spiders from many search engines as well as other initiatives on the Internet.
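To make the idea of robot exclusion concrete, here is a minimal Python sketch. It is not how LogEc works: the marker strings, the hit threshold and the function names are all assumptions chosen for illustration. It flags IP addresses whose user agent declares a robot, and also IP addresses with an implausibly high number of hits, as a crude stand-in for inferring undeclared robots from patterns.

```python
from collections import Counter

# Assumed substrings by which a robot might declare itself in its user agent.
BOT_MARKERS = ("bot", "crawler", "spider")

def likely_robot_ips(hits, max_hits_per_ip=500):
    """Return the set of IPs that look like robots.

    hits is an iterable of (ip, user_agent) pairs. The threshold
    max_hits_per_ip is a hypothetical value, not one used by LogEc.
    """
    robots = set()
    counts = Counter()
    for ip, agent in hits:
        counts[ip] += 1
        # Self-declared robots: the user agent contains a known marker.
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            robots.add(ip)
    # Undeclared robots: far more hits than a human reader would produce.
    robots.update(ip for ip, n in counts.items() if n > max_hits_per_ip)
    return robots

def human_hits(hits, max_hits_per_ip=500):
    """Keep only the hits whose IP was not flagged as a likely robot."""
    hits = list(hits)
    robots = likely_robot_ips(hits, max_hits_per_ip)
    return [hit for hit in hits if hit[0] not in robots]
```

A single volume threshold like this would misclassify busy shared addresses, which is one reason real filtering has to look at several patterns.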

One also needs to weed out multiple views or downloads by the same user. This brings us to detecting attempts by authors to increase their counts. Obviously, we cannot reveal here how this is done, but let it be known that we have detected fraud even by authors using multiple Internet service providers. The methods used lead to some undercounting, though. Multiple users behind the same cache server may be counted only once, as may happen, for example, with employees of the US Federal Reserve Banks who use RePEc.
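The simplest form of such deduplication can be sketched as follows. This is a deliberately naive illustration, not the undisclosed method used by LogEc: it assumes each hit is an (IP address, handle) pair for one counting period, and the function name is hypothetical. It counts each item at most once per IP address, which also shows where the undercounting comes from: all users behind one shared address collapse into a single view.

```python
def count_unique_views(hits):
    """Count views per item, at most one per (ip, handle) pair.

    hits is an iterable of (ip, handle) pairs for one counting period.
    """
    seen = set()
    counts = {}
    for ip, handle in hits:
        if (ip, handle) in seen:
            # Repeat view of the same item from the same address: ignore it.
            continue
        seen.add((ip, handle))
        counts[handle] = counts.get(handle, 0) + 1
    return counts
```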

And we are still not done pruning. LogEc then checks for additional patterns that need to be vetted by a human eye. Unusual activity is then checked and often reconciled with traffic from popular blogs, magazines and newspapers. But on other occasions, traffic surges cannot be explained in licit ways and need to be cleaned out.

After all these manipulations, statistics are published and disseminated. And despite substantial pruning, RePEc services still get over 2,000,000 abstract views and 600,000 downloads every month. See LogEc for details.


RePEc in August 2009

September 3, 2009

The quietest month of the year still brought some important news. RePEc now carries bibliographic information about 1,000 journals and 300,000 working papers. We counted 647,942 file downloads and 2,213,814 abstract views for the month. For working papers, this adds up to 25 million downloads since we started counting!

In terms of developments, the RePEc Input Service now also allows journals that for some reason cannot open their own RePEc archive to index their articles in RePEc. Also, EconPapers allows users to download bibliographic data in various formats for their own databases. Both developments are due to Sune Karlsson, who also moved EconPapers and LogEc to new hardware.

During August 2009, the following archives joined RePEc: Sam Houston State University, arXiv, National Insurance Institute of Israel, University of Rome Sapienza (II), Economic Research Institute for ASEAN and East Asia, EPFL (II), BBVA, Journal of Transport Economics and Policy, Queens University of Charlotte. A special mention regarding arXiv: it is a very large and popular archive in Physics, Mathematics and Computer Sciences that is now feeding its Quantitative Finance content to RePEc.

Finally, here is the traditional list of thresholds passed during the last month:
25,000,000 cumulative working paper downloads
300,000 working papers
150,000 working papers with references
60,000 articles with references
1,000 journals


MPRA, the Munich Personal RePEc Archive

August 27, 2009

The Munich Personal RePEc Archive (MPRA) was started three years ago. It has developed into one of the largest archives within the RePEc network, comprising roughly 9000 items at the time of writing. Christian Zimmermann has suggested that I share some thoughts about its history and functioning.

The initial idea occurred to me when I heard that the Economics Working Paper Archive (EconWPA), run by Bob Parks, was discontinued in 2005. EconWPA offered the possibility for individual authors to make their contributions accessible to the community through the RePEc network, given that only institutions can set up RePEc archives. Although we have in Munich our discussion paper series integrated into RePEc, not all economists are so fortunate, and the need for a personal archive (as distinct from an institutional archive) was apparent.

Given that we had successfully established our department’s discussion paper series with the EPrints software, it appeared technically feasible to clone the software and use it for a personal RePEc archive. Discussion on the internal RePEc list led to the name “Munich Personal RePEc Archive,” the main concern being to clarify that the archive was intended as a RePEc service, rather than something original, and that the name would not exclude other personal RePEc archives in other locations. (If one of the other Munich universities wants to start another personal archive, we may get into a problem…)

I asked Volker Schallehn from the University Library, who had implemented the EPrints software for our university archives, whether he could help with such a project. He agreed to help. The next step was to convince the president of the university as well as the director of the library to agree to dedicate some resources to an endeavor that would not serve people from Munich at all. They were in favor, and so we got started on September 19, 2006.

From a technical point of view the main problem was to automate as much as possible, as we could not supply manpower: the generation of title pages, the creation of metadata in the ReDIF format required by the RePEc harvester, and the linking to the RePEc Author Service. With the help of Thomas Krichel, Christian Zimmermann, Kit Baum, Sune Karlsson, Ivan Kurmanov, and others we managed to solve these problems and set up the website. We found editors. They do the main job now. The English editors often handle more than 50 submissions per day.

As the EPrints software permits series in different languages, we decided to use this feature and to offer the service in all languages for authors who deal with country-specific issues and want to make their research available in their local language. However, we require English abstracts for all submissions so that all users can get an impression of what economists writing in other languages do and, if necessary, contact them. This feature has led to quite a number of submissions in languages like Spanish or French, and to some smaller sets in Turkish, Arabic, and others. (Some of them look extremely pretty.) Maybe this feature creates a sense that all economists world-wide see themselves as members of a community with the common purpose of helping to improve living conditions around the globe.

A central motivation for establishing a pre-print archive like MPRA was to enable authors to secure the copyrights for their pre-print versions in case the copyright for the final article goes to the publisher. This permits open access to their work, even if publishers try to make the final work inaccessible for the non-paying public. This is a great convenience for academics and, I hope, generates a countervailing power that keeps a check on journal prices. Further, this arrangement provides a means for the authors to make their work accessible to others through the RePEc services.

As an unintended by-product, some authors have received requests from publishers to publish their contribution in a volume or journal. This may indicate a trend for the future: whereas authors used to submit their works to publishers (and pay for it), in the future you may simply put your work on the net, and publishers will approach you in order to create collections that generate value added beyond mere publication, such that people and libraries are willing to pay for them. If MPRA could contribute to such a development, this would be nice.

It is quite astonishing to me how many good papers we obtain, in spite of the fact that we do no refereeing at all. (The editors check only some formal aspects, making sure that the submission is of academic nature, and a certain convention has emerged in this respect.)

MPRA offers a public forum for publishing papers, but not only that: It offers the possibility to publish comments on papers in the archive. This feature is not used. Maybe somebody has a suggestion how to organize discussions around papers such that people actually feel inclined to use such a feature.

So much about MPRA. If you have any suggestions, please feel free to communicate and discuss them on this blog.


EconPapers and LogEc on new hardware

August 12, 2009

Thanks to the continued support of the Swedish Business School at Örebro University, EconPapers and LogEc are now running on new and upgraded hardware. This will allow for the smooth running of these services over the next few years as the coverage of RePEc continues to grow and new features are added to the services.

EconPapers is a website that displays all the bibliographic data collected through RePEc. Contents can be browsed in various ways. A powerful search engine is also available. LogEc collects and displays statistics about abstract views and downloads from EconPapers and other participating RePEc services. Both EconPapers and LogEc are run by Sune Karlsson.


RePEc in July 2009

August 4, 2009

The month of July is generally calm. Regular classes are not in session on campuses and researchers are on vacation or at conferences, so it is to be expected that RePEc sees little new material or traffic. We counted 674,639 file downloads and 2,287,995 abstract views, relatively modest numbers, and saw only six new archives: Universidad de los Andes, Katholieke Universiteit Leuven (II), Spiru Haret University Brasov, Austrian Academy of Sciences, ETH Zürich (III), German Council for Social and Economic Data. The first added Venezuela to our list of participating countries, which is now at 68.

We still managed to pass a few thresholds:

400000 online articles
12500 listed book chapters
5000 subscribers to NEP-HIS, the largest subscriber base in NEP


RePEc in June 2009

July 10, 2009

What’s up at RePEc? We are happy to see an ever increasing popularity of our services, which manifested itself last month with a record number of newly participating archives, 26, or one every working day: Universidad de San Andres, Universität Marburg (II), Intervention, University of Florence (II), Bucharest University of Economics (III), University of Warsaw, Lucian Blaga University, University of Queensland (II), Universidade Federal de Goias, Banque de France, National Bank of Poland, Griffith University, University of Malaya, Associazione Rossi Doria, Superintendencia de Valores y Seguros, Basque Institute of Competitiveness, Bancaria, Jerusalem Institute for Market Studies, University of Buckingham Press, Sacred Heart University, Cahiers d’Économie Politique, Universidad Nacional de La Plata, University of California Riverside, Osaka University, Asociación Española de Historia Económica, Red Iberoamericana de Economía Ecológica.

In terms of traffic, we counted 725,569 file downloads and 2,588,500 abstract views. This allowed us to break the mark of 40 million downloads since we started counting them. As usual, many details are available at LogEc.

In terms of thresholds, we are proud to announce the following for June 2009:

60,000,000 article abstract views
40,000,000 downloads
15,000,000 article downloads
650,000 items available online
450,000 listed journal articles
250,000 book chapter downloads
10,000 listed book chapters
10,000 online book chapters