Understanding update delays in RePEc

One of the advantages of RePEc is its extremely rapid update cycle. Once a paper has been added to a RePEc archive, RePEc services like EconPapers and IDEAS usually pick it up and list it within 24 hours. The paper is also disseminated through the NEP email notification service within two weeks. This longer delay comes from the weekly frequency of the emails, the various checks performed before the paper is submitted to editors, and the availability of editors. Papers are also available the next day for authors to claim into their profile on the RePEc Author Service. However, authors may not be notified about this until it is their turn in the “automatic search” that rotates through all registered people, currently 27,000. A full cycle there takes close to three months. Indeed, such searches impose quite a burden on the machine, and most of the time they yield no result. When authors know a new paper is up, we encourage them to log in and do a “manual search”.
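To give a sense of scale for the automatic search rotation, here is a rough back-of-the-envelope sketch. It assumes that "close to three months" means about 90 days and that the rotation is spread evenly over the cycle; both are illustrative assumptions, and the function names are hypothetical, not part of any RePEc software.

```python
# Rough arithmetic for the "automatic search" rotation.
# Assumptions (not stated in the post): a 90-day cycle and an even spread.

REGISTERED_PEOPLE = 27_000   # figure given in the post
CYCLE_DAYS = 90              # assumed value for "close to three months"


def searches_per_day(registered_people: int, cycle_days: int) -> float:
    """Average number of author profiles searched per day."""
    return registered_people / cycle_days


def average_wait_days(cycle_days: int) -> float:
    """Average wait before an author's turn, if a paper appears at a
    random point in the cycle (half the cycle length)."""
    return cycle_days / 2


if __name__ == "__main__":
    print(searches_per_day(REGISTERED_PEOPLE, CYCLE_DAYS))
    print(average_wait_days(CYCLE_DAYS))
```

Under these assumptions, an author who waits for the automatic search would wait about a month and a half on average, which is why a manual search is the faster route.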

Other aspects take longer. Citation extraction can take several weeks, as the data about the paper needs to be gathered, analyzed and then compared with existing material in the database. And once the results are returned to RePEc services, those may take a while to display them, too. While EconPapers tries to add the new data within a day at all the right spots (cited papers, citing papers), IDEAS prefers to work on a rolling basis, refreshing paper pages every 30 days (or sooner if a change was made in the originating RePEc archive). Indeed, IDEAS also needs to adjust author citation pages, recompute statistics for authors and recalculate impact factors. As authors are usually quite impatient, the refresh cycle for authors is 18 days, unless they just modified their profile.

Download and abstract view statistics on LogEc are refreshed once a month, within the first days of the month. A higher frequency is not possible due to the many checks those statistics go through, some of which are performed manually. For the same reason, rankings are released only once a month, except for the impact factors on series and journals, as well as the citation rankings for individual items, which are recomputed along the refresh cycle mentioned above for IDEAS.

And there are many other updates that can take a few days as the data bounces from one service to the other. For example, EDIRC houses data about institutions, and any update there will be reflected in author affiliations as their profiles get refreshed, which can take some time. And EDIRC gets data about affiliated people from the RePEc Author Service, which can take a pass or two before being visible on the web. Or, information about papers disseminated through NEP is passed on to RePEc services. IDEAS, for example, uses this to categorize authors into topical fields.

RePEc is not a centralized service. It has servers, data gatherers, analyzers and users spread around the globe. They exchange data, but not in real time. Consider that there are over 1000 archives, and that RePEc services are spread over a dozen different machines. Still, update times are much, much faster than what one would have expected from any bibliographic service before RePEc came into existence.
