Categorizing Authors

October 27, 2007

We are trying to find a way to categorize authors registered with RePEc into fields. There are two obvious ways to do so that we did not like. We went for a third.

Self-categorization at registration

This would allow authors, when they create or update their profile at the RePEc Author Service, to declare the field(s) in which they work. We see two problems with that: 1) This is not implemented in the current service; 2) Self-categorization is not necessarily accurate, as authors may not make consistent choices.

Using JEL codes of works

Authors have works in their profiles that can help in categorizing them. One way to do so is to use the JEL codes. Given their number (over 900), you obviously do not want to use the full set of codes. But this is not the real problem. A major issue is that relatively few papers and articles are JEL-coded in RePEc (as of today, 109’085 of 543’566, or one fifth). Given the wealth of data, the small proportion is not that problematic. However, items are very inconsistently coded: some publishers do not use JEL codes at all, others put a large number of codes on each item, some use just the top-level codes (in some cases the same codes for all papers in a series), and some use very fine codes. As authors tend to publish more with some publishers than others (think of working paper series), all sorts of biases can creep in. Also, these codes are typically self-declared, which can also be problematic.

Using NEP data

Our suggestion is to use data collected with NEP. This project catalogs new working papers by field, the results being announced through emails (subscribe to the report in your field if you have not done so yet). The cataloging is done by human editors helped by a nifty expert system. Thus we do not have the problem of self-declaration. Currently, there are 79 active NEP reports, and they have dealt with over 90’000 papers, which have been categorized about 260’000 times. Indeed, the same paper can appear in multiple reports. We think that the categorization of works is more consistently performed by NEP editors than by publishers. Finally, NEP reports correspond more closely to fields as they are used every day: they may encompass several or only part of the top JEL codes. (By the way, if you think a field is not represented, volunteer to edit one. It is less work than you think.)

Recent working papers of registered authors are disseminated through NEP, thus we can use this data to categorize authors. The subjective factor is now how to define whether an author is a specialist in a field. Indeed, one may work in different fields, so there should certainly not be an expectation that all papers fit in the same field. And the NEP editor may also have missed some. In the current implementation, the following rule is applied: an author is considered a specialist in a particular field if, amongst all her papers announced through NEP, at least 25% were announced in the relevant NEP report. She is also a specialist if at least 5 of her papers were announced in that report.


Why 25%? Requiring a majority of the papers to be in a field would be too high a hurdle for those who work in several fields. One should also factor in that some papers may have been missed by NEP editors.


Why 5? Say that one needs, in many cases, about that many papers to obtain tenure. You obtain tenure when you are considered to be a valuable researcher in a field.
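
To make the rule concrete, here is a minimal sketch in Python. It is not the code that runs on RePEc: the function name, the input layout (one set of report codes per paper) and the report codes in the example are assumptions chosen for illustration. It only encodes the two thresholds described above, with the share computed over the author's papers that were announced in at least one NEP report.

```python
from collections import Counter

SHARE_THRESHOLD = 0.25  # at least 25% of the author's NEP-announced papers
COUNT_THRESHOLD = 5     # or at least 5 papers announced in the report


def specialist_fields(papers):
    """Return the NEP reports in which an author counts as a specialist.

    `papers` is a list with one entry per paper; each entry is the set of
    NEP report codes in which that paper was announced (hypothetical layout).
    Papers never announced through NEP (empty set) are ignored.
    """
    announced = [reports for reports in papers if reports]
    total = len(announced)
    if total == 0:
        return set()

    # Count, for each report, how many of the author's papers it announced.
    counts = Counter(code for reports in announced for code in set(reports))

    return {
        code
        for code, n in counts.items()
        if n / total >= SHARE_THRESHOLD or n >= COUNT_THRESHOLD
    }


# Illustrative author: 24 announced papers plus one never picked up by NEP.
example = (
    [{"nep-mac"}] * 18
    + [{"nep-mac", "nep-dge"}]
    + [{"nep-lab"}] * 5
    + [set()]
)
print(sorted(specialist_fields(example)))
```

In this made-up profile, 19 of 24 announced papers are in "nep-mac", so the 25% rule applies. Only 5 of 24 (about 21%) are in "nep-lab", which misses the share threshold but meets the count threshold. The single paper in "nep-dge" satisfies neither.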

Use of this data

How does the categorization pan out with these specifications? See the author list. To see how the fields of an author have been determined, go to the very bottom of her profile. Ultimately, we may use this data to rank authors within fields, and do so as well for institutions. We will discuss this later.

Our question to you

What do you think of the choice of 25% and 5? Please discuss this in the comment section. We truly value your input.