Keeping RePEc participants in the loop

November 21, 2009

Over 12 years, RePEc has grown into a large community, with 1100 participating archives and over 22000 registered authors. But a community like this is only useful if it is dynamic: participants participate and users use it. It is easy to register and then forget about it. This is why archive and series maintainers, editors and authors receive an individualized email every month as a reminder that they are part of the community, along with individualized news about their participation.

For editors and archive and series maintainers, the email contains the latest statistics about their publications (downloads, abstract views and impact factors) as well as error messages about the syntax of their templates, the URLs of the full texts, or our attempts to reach their data. Maintainers may also get messages in between if issues arise.

For authors, monthly emails also contain downloads and abstract views, as well as any new citations discovered by the CitEc project. The email also contains a personalized link leading to a ranking analysis of the author according to over 30 criteria. Authors also get separate emails alerting them to works they may be able to add to their portfolio.

Keeping regular contact with participants is essential to ensure continued alertness about the project and to keep the collected data as fresh and accurate as possible.


CitEc machine moves

November 11, 2009

On 2009-11-10, the Instituto Valenciano de Investigaciones Económicas took over mutabor, the machine that makes CitEc, from the Universidad Politécnica de Valencia. The RePEc community is grateful to Fernando Ferrer, who helped run the machine at the Universidad Politécnica de Valencia. We cheer Rodrigo Aragón Rodríguez, who will be helping to maintain the machine at its new location.

CitEc is the citation analysis project within RePEc. At the time of this writing, it has analysed 230,279 documents, finding 5,130,205 references and 2,176,994 citations. The software side of the project is maintained by José Manuel Barrueco Cruz.


RePEc in October 2009

November 4, 2009

This was again a very busy month, with 878,635 file downloads and 3,199,663 abstract views, numbers that are close to records. We also had 16 new participating archives: Université d’Orléans, Finanzas Públicas México, “Constantin Brancusi” University of Targu-Jiu, BEPress (II), University of Bacau, Ryerson University, Tbilisi State University, Seoul National University, Athens University of Economics and Business, University of Manchester (II), Imperial College, Université Catholique de Lille, ETH Zurich (IV, V), Towson University, European Commission Joint Research Centre.

In terms of thresholds passed, we report:

30,000,000 cumulative downloads through IDEAS
750,000 cumulative software component downloads
450,000 works listed in author profiles
300,000 cumulative chapter downloads
1100 participating archives


Good practices for RePEc archive maintainers

October 27, 2009

The bibliographic data displayed in RePEc services originates in about 1100 participating archives, each maintained by a volunteer (see here for instructions to start a RePEc archive). The quality of the data in RePEc thus depends on the quality of what is entered at the archive level, and there are obviously some variations. In general, we recommend providing as many bibliographic details as possible, to improve the chances of each work being found in user searches. While missing fields are sometimes frustrating for users, the incorrect use of bibliographic fields is more so. This post provides some advice to RePEc archive maintainers regarding the most frequent violations of RePEc taxonomy.


  • It is always a good idea to check your series from time to time on EconPapers and IDEAS. A good opportunity is when you get your monthly email. This often uncovers errors. Also, use the syntax checker on EconPapers, which usually reveals why some item is not showing up on RePEc.
  • The most frequent imprecision in RePEc data is the abuse of the Author-Name field. It should only contain the name of the author, and neither the affiliation (which belongs in Author-Workplace-Name) nor the email address (Author-Email). Also, there should be only one author per Author-Name field. With multiple authors, repeat the field.

    Correct use of the Author-Name field is important, because it allows works to be attributed to the appropriate authors in the RePEc Author Service. It frustrates authors when they do not find their own works due to miscodings.

  • Generally, put in the field what the field calls for. There is a surprising number of Title fields that actually contain abstracts, for example. And keywords or classification codes do not belong in the abstract, but in Keywords and Classification-JEL.
  • Make sure to provide a date for your bibliographic item. Without a date, it cannot be displayed in chronological order. Working papers without a date cannot be considered for diffusion through NEP, as it cannot be established whether they are new. For working papers, use the Creation-Date field, with a syntax like yyyy, yyyy-mm or yyyy-mm-dd. For articles, use Year. The relevant date is the one at which the work was written, not when the bibliographic record was created.
  • Links to online texts are provided with the File-URL field. It should link directly to the PDF file, not to an intermediate abstract page. There are two reasons: First, users already see an abstract page on the RePEc service. Second, we need a direct link to perform the citation analysis.
  • The easiest way to include an abstract in a bibliographic record is to cut-and-paste from the PDF file. In some cases, some characters do not travel well. This is especially the case for ligatures like “ff”, “fl”, “fi”, and the like. Also, end-of-line hyphenations need to be removed from abstracts. Thus, always read through an abstract after pasting it.
  • Never, never recycle handles. Handles are unique identifiers that are used throughout RePEc, for example to assign papers to authors, relate references and determine what is a new record. Avoid changing handles, as this ruptures all these relations, which then need to be reestablished. But never, never reassign an existing handle to a different item, because this renders existing relations erroneous.
  • Bibliographic records should not contain any HTML encoding. If a special character needs to be displayed, say an accented character, use UTF-8 encoding. The usual text editors will provide the byte-order mark (BOM) at the start of the file indicating that it is UTF-8 encoded. But if you generate the files through scripts, they need to explicitly add the BOM.
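
Several of the points above can be checked by a script before a record is published. The following Python sketch is only an illustration and is not the EconPapers syntax checker: the function names, the warning texts, the sample record and the heuristics (for example, treating a File-URL that does not end in ".pdf" as suspicious) are assumptions made for this example. Field names are those mentioned in the list above.

```python
import re
import unicodedata

# Creation-Date shapes named above: yyyy, yyyy-mm or yyyy-mm-dd.
DATE_RE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")
# Rough pattern for HTML entities and tags (illustrative, not exhaustive).
HTML_RE = re.compile(r"&[A-Za-z]+;|&#\d+;|<[^>]+>")


def check_record(pairs):
    """Return warnings for one working-paper record.

    `pairs` is a list of (field, value) tuples, so repeated fields such
    as Author-Name are preserved. All checks are heuristics.
    """
    warnings = []
    for name, value in pairs:
        if name == "Author-Name":
            if "@" in value:
                warnings.append("Author-Name contains an email address")
            if re.search(r"\band\b| & ", value):
                warnings.append("Author-Name seems to list several authors")
        elif name == "Creation-Date" and not DATE_RE.match(value):
            warnings.append("Creation-Date is not yyyy, yyyy-mm or yyyy-mm-dd")
        elif name == "File-URL" and not value.lower().endswith(".pdf"):
            warnings.append("File-URL may not point directly to a PDF")
        if HTML_RE.search(value):
            warnings.append(f"{name} contains HTML encoding")
    if all(name != "Creation-Date" for name, _ in pairs):
        warnings.append("no Creation-Date given")
    return warnings


def clean_abstract(text):
    """Expand ligature characters and undo end-of-line hyphenation."""
    # NFKC normalization maps ligatures such as U+FB01 to plain letters.
    # It also rewrites other compatibility characters, so reread the result.
    text = unicodedata.normalize("NFKC", text)
    # Join words split as "hyphen + newline". This also joins genuine
    # hyphenated compounds that happened to break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces.
    return " ".join(text.split())


def encode_with_bom(text):
    """Encode text as UTF-8 bytes preceded by the byte-order mark."""
    return text.encode("utf-8-sig")


# Hypothetical record showing several of the mistakes discussed above.
record = [
    ("Author-Name", "Jane Doe and John Roe, jane@example.org"),
    ("Title", "A Title"),
    ("Creation-Date", "2009-13"),
    ("File-URL", "http://example.org/abstract/123"),
]
for warning in check_record(record):
    print(warning)
```

A script that generates template files could write the bytes returned by encode_with_bom to disk in binary mode. None of this replaces reading the record as it is displayed on EconPapers or IDEAS.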


International Open Access Week

October 20, 2009

RePEc is highlighted in the Boston College Libraries’ newsletter, special issue for OA Week:
http://www.bc.edu/libraries/newsletter/


Polls on ranking disclosures

October 15, 2009

Rankings have become an important part of RePEc and we regularly get requests about non-published rankings. Indeed, depending on the ranking in question, only the top 5%, 10% or 20% among authors or institutions are displayed, depending on the geographic or field aggregation. Given the insistence of some requests, I am now considering whether RePEc rankings should be disclosed in a more extensive fashion. Before making any changes, I am seeking the opinion of users.

But first, let me explain the reasons for the limited disclosure so far. Our interest is to have as many institutions and people as possible participate in RePEc and keep their data there current. Rankings provide the right incentives for this. Thus RePEc participation is our focus, and rankings are an accessory (and we still consider them to be experimental, as the data is still far from complete). We know, however, that at least some people do not like their poor rankings exposed and would thus remove their registration in RePEc if these were disclosed. Thus, disclosing rankings too extensively would defeat their purpose. But I have no idea how widespread this would be. The second reason for limited disclosure is that rankings become less reliable as one goes further down the list. Consider, for example, that 28% of all authors have no recorded citation. Third, full disclosure will create a lot of large files and tables. We have about 22000 authors and 4500 institutions to rank…

The following polls are not binding. Their results will help to define what users want. Feel free to discuss aspects that go beyond the options of the polls in the comment section (of this post, not of the individual polls). I will then decide what to do. For both author and institution rankings, the options are: 1) keep things as is, 2) disclose all the way to the top half, 3) keep things as is, but provide rankings for the following ones in clusters (for example, rank the top 5% as now, then have a list of the top 6-10%, another for the top 10-15%), 4) provide full rankings. Polls will be open until November 21, 2009.




RePEc in September 2009

October 6, 2009

Now that vacations are over, activity on RePEc is as high as ever. Several new features were introduced in September: a Facebook application that displays one’s latest research, and experimental blogs by NEP editors discussing research in some fields. Traffic has picked up again, with 763,583 file downloads and 2,735,405 abstract views. Also, 11 new archives joined: University of Bath, Australian Journal of Labour Economics, University of Luxembourg, University of Pécs, University of Tsukuba, Bar-Ilan University, Australian National University (IV), c.MET-05, University of Natural Resources and Applied Life Sciences Vienna, Kenyon College, International Association for Energy Economics.

Finally, we passed some thresholds, including some major ones:
800,000 works listed
250,000 online working papers
200,000 article abstracts
25,000 NEP reports


A new initiative on research blogging

September 30, 2009

We have discussed new means of research dissemination and peer review on this blog on various occasions. One way that could show promise is blogs that discuss research. There are still only a few of them, and their readership is still rather small compared to the big current events blogs (and it may be better so). We want to explore whether blogs can become a sustainable and useful way of disseminating, discussing and even advancing research in Economics.

To this end, a blog aggregator specializing in research blogs in Economics was created last year: EconAcademics. Some NEP editors are now starting an experiment whereby they highlight and open for discussion one working paper a week. This paper is taken from their weekly list of new working papers, specific to their field. For now, two such NEP blogs are in place: NEP-DGE (Dynamic General Equilibrium) and NEP-OPM (Open Macroeconomics). Others may follow soon and will be listed both in the side bar of this blog and on EconAcademics.

We hope that the selected papers will generate some interesting discussion. We will see whether the profession is ready for this type of discussion. Earlier attempts with the defunct WoPEc and with the Society of Labor Economists failed. A current initiative at the Economics E-Journal seems to work somewhat. Watch and participate in the NEP blogs!


How abstract views and downloads are counted

September 19, 2009

Authors and RePEc archive maintainers receive monthly emails with various statistics, and among the most anticipated statistics are our abstract views and download counts. It is important to understand how those statistics are collected and what they measure (and do not measure). Full statistics are available on the LogEc website managed by Sune Karlsson from Örebro University (Sweden).

Participating RePEc services (EconPapers, IDEAS, NEP and Socionet) keep a log of all activity on their sites. This allows us to count page views for the abstract pages of each item in the database (excluding NEP, as abstracts are listed in emails). Logs also record outclicks as users leave the RePEc services for the sites containing the full texts they seek to download. This allows us to count “downloads”. Quotation marks are required as it is impossible to record whether the download was successful, for example in the case of gated publisher sites. Note also that this means that downloads that have not transited through a RePEc service cannot be counted, as we do not have access to local logs.

LogEc gathers the logs from the participating services and aggregates the statistics. This involves much more than bean counting, though. Indeed, one first needs to exclude robot activity, as only human activity is of interest. Some robots declare themselves as such, but others hide their identity. One thus has to infer from various patterns which IP addresses are likely robots. This is an important step, as robots typically represent 75% of raw abstract views. Robots include spiders from many search engines as well as other initiatives on the Internet.

One also needs to weed out multiple views or downloads by the same user. This brings us to detecting attempts by authors to inflate their counts. Obviously, we cannot reveal here how this is done, but let it be known that we have detected fraud even by authors using multiple Internet service providers. The methods used lead to some undercounting, though. Multiple users behind the same cache server may be counted only once, as may happen, for example, with employees of the US Federal Reserve Banks who use RePEc.

And we are still not done pruning. LogEc then checks for additional patterns that need to be vetted by a human eye. Unusual activity is then checked and often reconciled with traffic from popular blogs, magazines and newspapers. But on other occasions, traffic surges cannot be explained in licit ways and need to be cleaned out.

After all these manipulations, statistics are published and disseminated. And despite substantial pruning, RePEc services still get over 2,000,000 abstract views and 600,000 downloads every month. See LogEc for details.
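
To make the first two pruning steps concrete, here is a deliberately naive Python sketch. It is not LogEc's method, which is not published: the log layout (address, user agent, item), the robot markers and the item identifiers are all assumptions made for this illustration. It drops robots that declare themselves in their user agent, then counts each address only once per item.

```python
from collections import Counter

# Hypothetical substrings that self-declared robots put in their user agent.
ROBOT_MARKERS = ("bot", "crawler", "spider")


def count_views(log_entries):
    """Count abstract views per item from (address, user_agent, item) tuples.

    Self-declared robots are dropped, and each (address, item) pair is
    counted once. Robots that hide their identity are NOT caught here.
    """
    seen = set()
    counts = Counter()
    for address, user_agent, item in log_entries:
        agent = user_agent.lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # declared robot: not human activity
        if (address, item) in seen:
            continue  # repeat view from the same address
        seen.add((address, item))
        counts[item] += 1
    return counts


# Hypothetical log excerpt; addresses are from a documentation range.
log = [
    ("192.0.2.1", "Mozilla/5.0", "paper-A"),
    ("192.0.2.1", "Mozilla/5.0", "paper-A"),    # same reader again
    ("192.0.2.2", "Googlebot/2.1", "paper-A"),  # declared robot
    ("192.0.2.3", "Mozilla/5.0", "paper-A"),
    ("192.0.2.1", "Mozilla/5.0", "paper-B"),
]
views = count_views(log)
print(views["paper-A"], views["paper-B"])
```

Counting by address also shows where the undercounting mentioned above comes from: all readers behind one cache server share an address and collapse into a single view.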


RePEcFB – An integration of your RePEc data into your Facebook profile

September 9, 2009

Following a suggestion on this blog and the creation of a RePEc Facebook group, we are happy to announce that a new service went online last week. The Facebook application RePEcFB allows Facebook users to integrate their RePEc data into Facebook. Economists on Facebook can create a small profile box listing their recent work, or a “My research” tab in the Facebook profile giving information about their working papers, publications and other research output. Users can list their affiliations and professional contact data, announce recent papers authored by their Facebook friends, or inform about conferences and other academic events they are going to attend. New papers or affiliations can be directly posted to the Wall and can be commented on by friends.

To use the application you need both a Facebook account and a RePEc author profile with the RePEc Author Service. Detailed instructions can be found on the Notes tab of the application’s homepage.

RePEcFB was written by Ben Greiner with the help of László Kóczy, Sune Karlsson, and Thomas Krichel. The software is hosted on Sune’s server at Örebro University. The software is under ongoing development, so feel free to send comments to the author.