Suggestion box

May 23, 2009

RePEc is entirely driven by volunteers, who are also users. Most current volunteers came to RePEc either because they wanted to help with a current project or because they had an idea they wanted implemented in RePEc. We are opening this suggestion box for several reasons: to encourage feedback, to encourage more volunteers to come forward and pick up a suggestion, and finally to let users and RePEc team members discuss the proposed suggestions.

At RePEc, we like to be open. After all, we are creating open bibliographies using open source software, and we encourage open access. RePEc is there for you, so tell us how you want it to be by making your suggestion in the comment section below.

Institutional data in RePEc

December 19, 2008

RePEc gathers information not only about publications and authors, but also about institutions. Specifically, the EDIRC project (Economics Departments, Institutes and Research Centers) has, since 1995, catalogued all academic and government institutions that employ a significant share of economists, including think tanks and associations. For-profit organizations (banks, consultants, etc.) are listed if they contribute their publications to RePEc. As of today, 11,000 institutions are listed, including over 600 associations. Over 4,000 have at least one registered author, and about 1,000 have some publication in RePEc.

The collected institutional data is used and displayed in various ways throughout RePEc. Authors use it when they register to specify their affiliations, and RePEc archives use it for their publications. Author and institution data are combined on EDIRC to compile the publication output of every institution. Combined with citation data from CitEc and download data from LogEc, this output determines the institutional rankings.

Note that all the information about institutions has been gathered with the help of a lot of people.

RePEc as a bibliographic tool

September 14, 2008

RePEc is a scheme to collect bibliographic information about publications and pre-publications in economics. Publishers provide all the relevant information, which is then displayed in various ways by RePEc services, giving users access to this data. While it is useful to find research items by browsing or searching through these services, it is even better to be able to import the relevant bibliographic data directly into one's bibliographic tool.

Every abstract page on IDEAS has links for downloading this bibliographic information in various formats: an HTML citation, a plain text citation, the BibTeX entry familiar to LaTeX users, the RIS format used by software such as EndNote, and the ReDIF format used by RePEc. Registered authors can also obtain these records for all their publications in a single download. If the research community uses other formats, they can be provided as well. Just ask.
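To illustrate how the same record maps onto these formats, here is a minimal Python sketch that reads a simplified ReDIF paper record and emits BibTeX and RIS entries. It assumes one `Field: value` pair per line and handles only a few fields (`Author-Name`, `Title`, `Creation-Date`, `Number`); the handle, names and citation key are invented for illustration, and this is not the converter IDEAS actually uses.

```python
SAMPLE = """
Template-Type: ReDIF-Paper 1.0
Author-Name: Doe, Jane
Author-Name: Smith, John
Title: An Example Working Paper
Creation-Date: 2008-09
Number: 0001
Handle: RePEc:xyz:wpaper:0001
"""


def parse_redif(text):
    """Collect 'Field: value' lines into a dict of lists (field names lowercased)."""
    record = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # this sketch ignores multi-line (continuation) values
        key, value = line.split(":", 1)
        record.setdefault(key.strip().lower(), []).append(value.strip())
    return record


def first(record, field):
    """Return the first value of a field, or an empty string if it is missing."""
    return record.get(field, [""])[0]


def to_bibtex(record, key):
    authors = " and ".join(record.get("author-name", []))
    return (
        f"@techreport{{{key},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{first(record, 'title')}}},\n"
        f"  year = {{{first(record, 'creation-date')[:4]}}},\n"
        f"  number = {{{first(record, 'number')}}}\n"
        "}"
    )


def to_ris(record):
    lines = ["TY  - RPRT"]  # RPRT = report, the closest RIS type to a working paper
    lines += [f"AU  - {name}" for name in record.get("author-name", [])]
    lines.append(f"TI  - {first(record, 'title')}")
    lines.append(f"PY  - {first(record, 'creation-date')[:4]}")
    lines.append("ER  - ")
    return "\n".join(lines)


record = parse_redif(SAMPLE)
print(to_bibtex(record, "doe2008"))
print(to_ris(record))
```

Real ReDIF records carry many more fields (abstracts, JEL codes, file URLs), and a production converter would also escape special LaTeX characters in titles.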

NEP alerts now available through RSS

August 13, 2008

NEP (New Economics Papers) is an email service that alerts subscribers to new online working papers in their areas of interest. About 80 fields are currently available, and the roughly weekly emails are sent free of charge. While the RePEc team thought email dissemination was sufficient, there also appears to be demand for RSS feeds, like those offered by this and other blogs. Such feeds are now available: subscribe to one by clicking on the relevant field report on the NEP home page.
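For readers who prefer to process the feeds programmatically, a minimal sketch using Python's standard library can list the entries of an RSS 2.0 document. The sample feed and its URLs are made up, and the exact structure of the NEP feeds is an assumption here beyond the standard RSS 2.0 `<item>`, `<title>` and `<link>` elements.

```python
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example NEP report</title>
    <item><title>Paper A</title><link>https://example.org/a</link></item>
    <item><title>Paper B</title><link>https://example.org/b</link></item>
  </channel>
</rss>"""


def feed_items(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```

To read a live feed, one could pass the bytes returned by `urllib.request.urlopen(feed_url).read()` to `feed_items`, where `feed_url` stands for the address of the field report you subscribed to.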

This new feature was added in typical RePEc fashion: David Hugh-Jones inquired with Marco Novarese why there was no RSS feed, Thomas Krichel encouraged David to set it up, and two days later, it was up.

If you think new features should be added to RePEc, we always welcome suggestions, especially if you are willing to implement them yourself… much like many NEP editors, who are volunteers that simply wanted a particular field to be covered.

Using RePEc for syllabi, bibliographies and publication lists

July 13, 2008

As highlighted in a recent post, we encourage deep linking in RePEc services. This is particularly useful for reading lists and syllabi. In fact, IDEAS provides simple tools to create such lists on its web site.

The first lets you create reading lists by writing code that is similar to HTML and includes the handles of items listed in RePEc. Each of these items is then automatically matched with its other versions, making it possible to find a free version of a password-protected article, or the latest version of a working paper as published in a journal. Different layouts are possible, including one for course syllabi and one for reading lists.

The second lets you create a list of publications from a set of authors registered with RePEc. Existing examples include expatriates from some countries, graduates from particular programs, prize winners, etc. Note that such lists are computed automatically for members of research units or departments; see the listings on EDIRC. For other lists, this tool comes in handy.

Why hotlinking to a RePEc service makes sense

June 27, 2008

Hotlinking is the practice of linking to a web page deep in a web site, instead of its front page. This practice is discouraged by many news sites, both because they prefer users to browse through the site and because links may become obsolete.

At RePEc, we actually encourage hotlinking. Links in RePEc services are designed to stay current (in principle). Also, abstract pages on EconPapers or IDEAS are much more stable than a PDF file on a researcher's web page, which may disappear. In addition, these abstract pages may provide links to other versions of the paper. This is particularly useful if the user does not have access to a password-protected article from a commercial publisher, or wishes to know whether the paper has been published. Other links on the abstract page can also be valuable, such as those to author profiles, references, citations and related works. Finally, authors appreciate having downloads of their papers counted towards their statistics, and RePEc can only monitor traffic routed through its services.
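As an illustration of how such deep links can be built from RePEc handles, the sketch below turns a handle into an IDEAS abstract-page URL. The URL pattern (`/p/` for working papers, `/a/` for articles, then archive, series and item, with the colons inside the item identifier dropped) is an assumption based on how IDEAS addresses currently look, not a documented API, and the example handles are illustrative.

```python
IDEAS_BASE = "https://ideas.repec.org"

# Assumed path prefixes on IDEAS: "p" for working papers, "a" for journal articles.
TYPE_PREFIX = {"paper": "p", "article": "a"}


def ideas_url(handle, item_type):
    """Build an IDEAS abstract-page URL from a RePEc handle (assumed URL pattern)."""
    parts = handle.split(":")
    if len(parts) < 4 or parts[0].lower() != "repec":
        raise ValueError(f"not a RePEc handle: {handle!r}")
    archive, series = parts[1], parts[2]
    # Article handles such as v:1:y:2008:i:2:p:3-4 become v1y2008i2p3-4 in the URL.
    item = "".join(parts[3:])
    return f"{IDEAS_BASE}/{TYPE_PREFIX[item_type]}/{archive}/{series}/{item}.html"


print(ideas_url("RePEc:nbr:nberwo:1234", "paper"))
print(ideas_url("RePEc:xyz:journl:v:1:y:2008:i:2:p:3-4", "article"))
```

Linking to the abstract page rather than to the PDF keeps the version matching and download counting described above in play.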

Therefore, we encourage hotlinks to RePEc services on blogs, online syllabi, personal web pages, online bibliographies, etc.

The citation extraction process in CitEc

January 16, 2008

CitEc is an experimental autonomous citation index, that is, a software system that automatically extracts references from the full texts of documents and creates links between citing references and cited papers.

With its last update, the CitEc database reached almost three million references and more than one million citations between documents available in RePEc. This is an important threshold, but it is still far from a complete set of citations. The reference extraction process has several limits:

First, the system needs open access to an electronic version of each document's full text. Many journals listed in RePEc have restricted access and are therefore excluded from CitEc unless they grant special access or push their citations to RePEc in other ways. Some publishers kindly provide us with metadata about references, and we try to get as many publishers on board as possible, but unfortunately not all of them are willing to collaborate with us at this time. As a result, the data set is still made up mainly of references extracted from working papers. This has the advantage of providing the most up-to-date citation data, since working papers contain the most recent research results.

Second, the URL provided by the RePEc archive maintainer must be correct and must point to the PDF file containing the document's full text, not to an intermediate abstract page or similar. Some archives provide this kind of link to force researchers to pass through their institutional web pages. The system is unable to follow such links to the hidden papers, so these are missed in the reference extraction process.

The third limit is more technical. To extract references, the PDF files need to be converted into plain ASCII text. This step is key to completing the process successfully, since a good-quality text representation of the document makes it easier to identify references. PDF files are created in a wide variety of ways, and not all of them can be converted.

Finally, the system parses the references section, which first needs to be isolated, to identify each reference and split it into its parts: title, author, year, etc. The parsing uses pattern-matching techniques, which in some cases fail to identify the full list of references.
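The pattern-matching step can be sketched as follows. This is a deliberately simplified Python example, not CitEc's actual parser: it assumes the references section starts after a line reading "References" or "Bibliography", that each reference sits on a single line, and that references follow an "Authors (Year). Title. Rest" layout. Lines that do not fit the pattern end up in an `unparsed` list, which is exactly the kind of loss described above.

```python
import re

# Simplifying assumptions: a "References"/"Bibliography" heading on its own line,
# one reference per line, and an "Authors (Year). Title. Rest" layout.
HEADING = re.compile(r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE)
ENTRY = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)[.,]?\s*(?P<title>[^.]+)\.(?P<rest>.*)$"
)


def isolate_references(text):
    """Return the text after the last references heading, or '' if there is none."""
    headings = list(HEADING.finditer(text))
    return text[headings[-1].end():] if headings else ""


def parse_references(text):
    """Split the references section into parsed fields and lines the pattern missed."""
    parsed, unparsed = [], []
    for line in isolate_references(text).splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            parsed.append({k: v.strip() for k, v in match.groupdict().items()})
        else:
            unparsed.append(line)
    return parsed, unparsed


DOC = """Some body text.

References
Doe, J. and Smith, J. (2005). A model of something. Journal of Examples 12, 1-20.
Roe, R. (1999) Another title. Working paper.
Malformed entry without a year.
"""

parsed, unparsed = parse_references(DOC)
print(parsed)
print(unparsed)
```

Real reference lists wrap across lines, use many citation styles, and contain titles with periods, so a practical parser needs far more patterns than this.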

As of the last update, on December 31, 2007, the CitEc numbers are as follows: 527,357 articles and working papers are available in RePEc. Of these, 343,441 cannot be processed by the system due to the limitations mentioned in the first two points above, namely:

  • 101,886 have no electronic version
  • 216,110 have restricted access
  • 19,174 have no direct link to the document's full text
  • 6,271 have a wrong URL

That leaves 183,916 documents available to be processed by CitEc. Of these, the process was successfully completed for 134,130 papers, that is, 73% of the available documents. The complete list of sources and the number of processed documents for each series or journal is available here.
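The figures above can be checked with a few lines of Python. The last line adds one derived number not stated in the text: the successfully processed papers amount to about a quarter of everything listed in RePEc.

```python
total = 527_357  # articles and working papers in RePEc, December 31, 2007
excluded = {
    "no electronic version": 101_886,
    "restricted access": 216_110,
    "no direct link to the full text": 19_174,
    "wrong URL": 6_271,
}
processed = 134_130  # papers for which extraction succeeded

n_excluded = sum(excluded.values())
available = total - n_excluded
print(f"excluded: {n_excluded:,}")                               # excluded: 343,441
print(f"available: {available:,}")                               # available: 183,916
print(f"success rate: {processed / available:.0%}")              # success rate: 73%
print(f"processed share of all items: {processed / total:.0%}")  # processed share of all items: 25%
```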

All the previous considerations should be taken into account when CitEc data is used for scientific evaluation purposes. We still consider the data to be experimental.

RePEc archive maintainers can take a few basic steps to improve the situation. For example, they can:

  • provide direct and correct URLs to the documents' full texts
  • use the X-File-Ref field to give the system an ASCII version of the references section of a particular document
  • help us lobby the publishers and editors of restricted journals to send us metadata about references.
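Archive maintainers who want to check the first point could test whether each full-text URL really returns a PDF rather than an HTML landing page. The Python sketch below assumes that finding the `%PDF-` signature in the first kilobyte of the response is a good enough test; the function names are hypothetical and not part of any RePEc tool, and `check_file_url` makes a live network request.

```python
import urllib.request

PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(first_bytes):
    """Heuristic: PDF files carry the %PDF- signature near the start of the file."""
    return PDF_SIGNATURE in first_bytes[:1024]


def check_file_url(url, timeout=10):
    """Fetch the first kilobyte at `url`; return (is_pdf, Content-Type header)."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        head = response.read(1024)
        content_type = response.headers.get("Content-Type", "")
    return looks_like_pdf(head), content_type


# Offline examples of the heuristic itself:
print(looks_like_pdf(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3"))  # True
print(looks_like_pdf(b"<!DOCTYPE html><html><body>"))  # False
```

Running `check_file_url` over the File-URL entries of an archive would flag links that lead to an intermediate page (a `False` result), while a broken link makes `urlopen` raise `urllib.error.HTTPError`, which points to a wrong URL.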

15,000 authors on the RePEc Author Service

December 15, 2007

The 15,000th author recently registered with the RePEc Author Service (which also has another 5,000 registrants without any works in their profiles). See a list of all registered authors at EconPapers or IDEAS. This gives us the opportunity to reflect on the coverage of this service: what proportion of academic economists is covered? Let me offer a few ways to estimate it.

Assume that the works listed in RePEc provide a representative sample of all the works written by economists. Then determine how many of these works are listed in the profile of a registered author. By that account, about 40.1% have been claimed, and thus about 40% of the profession would be registered with RePEc. The true number is higher, due to several biases: a) some authors are no longer alive and cannot register; b) some registered authors have the unfortunate habit of removing working papers from their profiles once they are published; c) some works listed are not written by economists, and their authors are less likely to register with RePEc.

Alternatively, estimate the number of authors in the world from membership in academic societies. I guess the three largest societies are the American Economic Association (18,000 members), the European Economic Association (2,300 members) and the Econometric Society (5,500 members). Obviously, their memberships overlap, and not all of their members are authors. But not every economist is a member either. If adding up their membership numbers corrects for all these mismeasurements, then the RePEc Author Service covers 58% of the profession (15,000 out of 25,800).
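The membership-based estimate is simple arithmetic. In Python, counting only the 15,000 authors with works in their profiles and, as assumed above, treating overlaps and non-members as cancelling out:

```python
registered_with_works = 15_000
society_members = {
    "American Economic Association": 18_000,
    "European Economic Association": 2_300,
    "Econometric Society": 5_500,
}
# Simply adding memberships, as in the text, assumes overlaps and omissions cancel out.
profession_estimate = sum(society_members.values())
coverage = registered_with_works / profession_estimate
print(f"{profession_estimate:,} economists, coverage {coverage:.0%}")  # 25,800 economists, coverage 58%
```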

One can also look at a specific subsample of economists: those listed among the top 1000 by Tom Coupé. There, the RePEc Author Service covers 75% of the top 1000 by publications and 65% of the top 1000 by citations (a list that includes quite a few non-economists). But we have good reasons to believe these proportions are higher than for the whole population. Indeed, coverage is significantly higher for the better-ranked economists within this sample, so we can extrapolate that those outside the top 1000 are less well represented in the RePEc Author Service.

In summary, the RePEc Author Service covers between 40% and 75% of the profession. Possibly less, possibly more, likely in between.

Categorizing Authors

October 27, 2007

We are trying to find a way to categorize authors registered with RePEc into fields. There are two obvious ways to do so that we did not like. We went for a third.

Self-categorization at registration

This would allow authors, when they create or update their profiles at the RePEc Author Service, to declare the field(s) they work in. We see two problems with this: 1) it is not implemented in the current service; 2) self-categorization is not necessarily accurate, as authors may not make consistent choices.

Using JEL codes of works

Authors have works in their profiles that can help categorize them. One way to do so is to use JEL codes. Given their number (over 900), you obviously do not want to use the full set of codes. But this is not the real problem. A major issue is that relatively few papers and articles in RePEc are JEL-coded (as of today, 109,085 of 543,566, or one fifth). Given the wealth of data, this small proportion is not that problematic. However, items are coded very inconsistently: some publishers do not use JEL codes at all, others assign a large number of codes to each item, some use just the top-level codes (in some cases the same codes for all papers in a series), and some go with very fine codes. As authors tend to publish more with some publishers than others (think of working paper series), all sorts of biases can creep in. Also, these codes are typically self-declared, which can also be problematic.

Using NEP data

Our suggestion is to use data collected with NEP. This project catalogs new working papers by field, with the results announced by email (subscribe to the report in your field if you have not done so yet). The cataloging is done by human editors helped by a nifty expert system, so we do not have the problem of self-declaration. Currently, there are 79 active NEP reports, and they have dealt with over 90,000 papers, which have been categorized about 260,000 times; indeed, the same paper can appear in multiple reports. We think that NEP editors categorize works more consistently than publishers do. Finally, NEP reports correspond more closely to fields as they are used every day: a report may encompass several top-level JEL codes or only part of one. (By the way, if you think a field is not represented, volunteer to edit a report for it. It is less work than you think.)

Recent working papers of registered authors are disseminated through NEP, so we can use this data to categorize authors. The subjective part is deciding whether an author is a specialist in a field. One may work in several fields, so there should certainly be no expectation that all of one's papers fit in the same field. The NEP editor may also have missed some papers. The current implementation applies the following rule: an author is considered a specialist in a particular field if at least 25% of all her papers announced through NEP were announced in the relevant NEP report. She is also a specialist if at least 5 of her papers were announced in that report.


Why 25%? Requiring a majority of papers in a field would be too high a hurdle for those who work in several fields. One should also factor in that NEP editors may have missed some papers.


Why 5? In many cases, one needs roughly that many papers to obtain tenure, and one obtains tenure when considered a valuable researcher in a field.
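The rule can be written down as a short function. This Python sketch illustrates the stated rule and is not the code RePEc runs: it assumes the input maps each of an author's papers to the set of NEP reports that announced it (the report codes below are only examples), and it counts only papers announced in at least one report in the denominator.

```python
from collections import Counter


def specialist_fields(announcements, share=0.25, min_papers=5):
    """Return the NEP reports in which an author counts as a specialist.

    announcements maps each paper to the set of NEP reports that announced it.
    A report qualifies if it announced at least `share` of the author's
    NEP-announced papers, or at least `min_papers` of them.
    """
    announced = [reports for reports in announcements.values() if reports]
    if not announced:
        return set()
    counts = Counter(report for reports in announced for report in reports)
    return {
        report
        for report, n in counts.items()
        if n / len(announced) >= share or n >= min_papers
    }


example = {
    "p1": {"nep-mac", "nep-mon"},
    "p2": {"nep-mac"},
    "p3": {"nep-lab"},
    "p4": {"nep-mac"},
    "p5": {"nep-ecm"},
    "p6": {"nep-mac"},
    "p7": set(),  # never announced, so it does not enter the denominator
}
print(specialist_fields(example))  # {'nep-mac'}
```

Changing `share` or `min_papers` shows how the two thresholds interact: for an author with 25 papers in one report and 5 in another, the second report falls below 25% (5 of 30) but still qualifies through the 5-paper clause.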

Use of this data

How does the categorization pan out with these specifications? See the author list. To see how the fields of an author have been determined, go to the very bottom of her profile. Ultimately, we may use this data to rank authors within fields, and to do the same for institutions. We will discuss this later.

Our question to you

What do you think of the choice of 25% and 5? Please discuss this in the comment section, we truly value your input.

