Categorizing Authors

We are trying to find a way to categorize authors registered with RePEc into fields. There are two obvious ways to do so that we did not like. We went for a third.

Self-categorization at registration

This would allow authors, when they generate or update their profile at the RePEc Author Service, to declare the field(s) in which they work. We see two problems with that: 1) this is not implemented in the current service; 2) self-categorization is not necessarily accurate, as authors may not make consistent choices.

Using JEL codes of works

Authors have works in their profiles that can help in categorizing them. One way to do so is to use the JEL codes. Given their number (over 900), you obviously do not want to use the full set of codes. But this is not the real problem. A major issue is that relatively few papers and articles are JEL-coded in RePEc (as of today, 109’085 of 543’566, or one fifth). Given the wealth of data, the small proportion is not that problematic. However, items are coded very inconsistently: some publishers do not use codes at all, others put a large number of codes on each item, some put just the top-level codes (in some cases the same codes on all papers in a series), and some go with very fine codes. As authors tend to publish more with some publishers than others (think of working paper series), all sorts of biases can creep in. Also, these codes are typically self-declared, which can also be problematic.

Using NEP data

Our suggestion is to use data collected with NEP. This project catalogs new working papers by field and announces the results through emails (subscribe to the report in your field if you have not done so yet). The cataloging is done by human editors helped by a nifty expert system, so we do not have the problem of self-declaration. Currently, there are 79 active NEP reports, and they have dealt with over 90’000 papers, which have been categorized about 260’000 times. Indeed, the same paper can appear in multiple reports. We think that NEP editors categorize works more consistently than publishers do. Finally, NEP reports correspond more closely to fields as they are used every day: they may encompass several top-level JEL codes or only part of one. (By the way, if you think a field is not represented, volunteer to edit one. It is less work than you think.)

Recent working papers of registered authors are disseminated through NEP, so we can use this data to categorize authors. The subjective part is now how to decide whether an author is a specialist in a field. Indeed, one may work in different fields, so there should certainly not be an expectation that all papers fit in the same field. And the NEP editor may also have missed some. In the current implementation, the following rule is applied: an author is considered a specialist in a particular field if at least 25% of all her papers announced through NEP were announced in the relevant NEP report. She is also a specialist if at least 5 of her papers were announced in that report.
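The rule above can be sketched in Python as follows. This is a minimal illustration, not RePEc's actual implementation: it assumes each of an author's NEP-announced papers is represented by the set of reports that announced it, and the function name, constants, and report labels are hypothetical. The share is measured against the number of papers announced, as the rule is worded.

```python
from collections import Counter

# Thresholds from the post; the constant names are hypothetical.
MIN_SHARE = 0.25   # minimum share of the author's NEP-announced papers
MIN_PAPERS = 5     # minimum number of papers announced in one report


def specialist_fields(papers: list[set[str]]) -> set[str]:
    """Return the NEP reports in which an author counts as a specialist.

    `papers` has one entry per NEP-announced paper: the set of reports that
    announced it. A paper may appear in several reports, and each report is
    counted at most once per paper.
    """
    if not papers:
        return set()
    total = len(papers)
    counts = Counter(report for reports in papers for report in reports)
    return {
        report
        for report, n in counts.items()
        # Either condition suffices: relative share OR absolute count.
        if n / total >= MIN_SHARE or n >= MIN_PAPERS
    }


# Hypothetical author with 8 papers; the report labels are made up.
author = [
    {"macro"}, {"macro", "money"}, {"macro"}, {"growth"},
    {"macro", "growth"}, {"energy"}, {"macro"}, {"money"},
]
print(sorted(specialist_fields(author)))  # ['growth', 'macro', 'money']
```

Here "money" and "growth" each hold 2 of 8 papers (exactly 25%) and qualify through the relative condition, "macro" qualifies through both, and "energy" (1 of 8) qualifies through neither.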

25% and 5

Why 25%? Requiring a majority of the papers in a field would be too high a hurdle for those who work in several fields. One should also factor in that some papers may have been missed by NEP editors.

Why 5? In many cases, one needs about that many papers to obtain tenure, and you obtain tenure when you are considered a valuable researcher in a field.

Use of this data

How does the categorization pan out with these specifications? See the author list. To see how the fields of an author have been determined, go to the very bottom of her profile. Ultimately, we may use this data to rank authors within fields, and to do the same for institutions. We will discuss this later.

Our question to you

What do you think of the choice of 25% and 5? Please discuss this in the comment section; we truly value your input.

This entry was posted on Saturday, October 27th, 2007 at 3:11 pm and is filed under RePEc features.

5 Responses to Categorizing Authors

Richard Tol says:

October 28, 2007 at 4:58 am

The need for a double rule is obvious. A relative rule alone would fail to recognise the minor specialisations of prolific authors, while an absolute rule alone would disenfranchise authors at the start of their career. The only real alternative would be a sliding scale from absolute to relative numbers as authors publish more, but that would be complicated and opaque, and, if done through some discrete approximation, would induce nasty border effects. So, let’s stick to these rules.
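Richard's two failure cases can be checked with a small sketch. The numbers are hypothetical, chosen only to trigger each case, and the function names are illustrative rather than anything RePEc uses.

```python
MIN_SHARE = 0.25  # relative threshold from the post
MIN_PAPERS = 5    # absolute threshold from the post


def relative_rule(in_field: int, total: int) -> bool:
    """Specialist if at least 25% of the author's papers are in the field."""
    return in_field / total >= MIN_SHARE


def absolute_rule(in_field: int) -> bool:
    """Specialist if at least 5 papers are in the field."""
    return in_field >= MIN_PAPERS


def double_rule(in_field: int, total: int) -> bool:
    """The combined rule: either condition suffices."""
    return relative_rule(in_field, total) or absolute_rule(in_field)


# Hypothetical cases: (papers in the field, papers announced through NEP)
cases = {
    "prolific author, minor field": (8, 60),
    "early-career author, main field": (3, 4),
}
for label, (in_field, total) in cases.items():
    print(
        f"{label}: relative={relative_rule(in_field, total)}, "
        f"absolute={absolute_rule(in_field)}, "
        f"double={double_rule(in_field, total)}"
    )
```

Each single rule misses one of the two authors (8 of 60 is about 13%; 3 papers is fewer than 5), while the double rule recognizes both.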

5 and 25% are arbitrary numbers, but no more arbitrary than any other choice.

I did the obvious thing: I checked my own classification and the fields I know best, environmental economics and energy economics.

My specialisations are correctly recognised, although some only through co-authorship. One focus area is missing, but that is because there is no NEP report on water. This is an incentive to start one.

The lists of environmental economics and energy economics have many of the usual suspects.

There are some unexpected names, but in the cases I checked that is because these people did not update their profile for years.

Some of the big names are missing, but in most cases I checked, that is because these people did not register. Publishing rankings by field would give them an additional incentive.

A few people are misclassified. In the cases I checked, that is because the NEP reports cover only their recent papers. As far as I know, NEP reports do cover recently added historical material. However, it would be good to do a retrospective NEP report on all the material that was published before the first NEP report in an area.

scionescire says:

October 28, 2007 at 7:47 am

Dear Christian, first of all, let me again thank you for this initiative. Next, I’d like to repeat the example which (IMHO) demonstrates that the 25-percent criterion is of little value.

Consider a prolific author who has published 100 papers, which have received 300 entries in several reports. Now assume that this author has dispersed interests, so these 300 entries are distributed among, say, 10 reports. With an average of 30 papers, he would not be considered an expert in any of these fields. Moreover, even if he has contributed 70 entries in one of the 10 reports, this would not be sufficient to call him an expert in this field, according to the 25-percent criterion.

Compare this non-expert to another researcher who has contributed just one paper to NEP, which was announced in 4 reports. This renders him an expert in all four fields.

These two cases could easily be handled by the 5-paper criterion alone: the first researcher would then be considered an expert in many of his 10 fields, while the second would have to wait a little longer.
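Roland's two authors can be run through both rules in a small sketch; the function names are hypothetical. His percentages are taken over report entries (30 of 300 is 10%, 70 of 300 about 23%), whereas the post measures the share over papers announced (30 of 100 is 30%), so the sketch prints both shares. Under either reading, the combined rule and the 5-paper rule agree on the prolific author and differ only on the newcomer.

```python
MIN_SHARE = 0.25  # relative threshold from the post
MIN_PAPERS = 5    # absolute threshold from the post


def combined_rule(in_report: int, denominator: int) -> bool:
    """Specialist if the share reaches 25% or the count reaches 5."""
    return in_report / denominator >= MIN_SHARE or in_report >= MIN_PAPERS


def count_only_rule(in_report: int) -> bool:
    """Roland's proposal: keep only the 5-paper criterion."""
    return in_report >= MIN_PAPERS


# Roland's two authors: (papers in one report, papers, report entries)
authors = {
    "prolific": (30, 100, 300),  # 100 papers, 300 entries over 10 reports
    "newcomer": (1, 1, 4),       # 1 paper announced in 4 reports
}
for name, (k, papers, entries) in authors.items():
    print(
        name,
        f"share of papers={k / papers:.0%}",
        f"share of entries={k / entries:.0%}",
        f"combined={combined_rule(k, papers)}",
        f"count only={count_only_rule(k)}",
    )
```

The prolific author passes the combined rule through its 5-paper arm even when the share is computed over entries, so the 25% criterion only changes the outcome for authors with fewer than 5 papers in a report.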

Hence, while applying the 5-paper criterion alone would really make sense, the 25-percent criterion provides no additional insight. On the contrary, it may rule out very productive writers and benefit the less prolific ones. I guess this was not the intention ;-)

All the best
Roland

schlicht says:

October 28, 2007 at 3:43 pm

It would be good to learn something about the intended uses of such lists. People could also make suggestions for other uses. This would make it easier to comment on the proposed criteria.

Ekkehart

Christian Zimmermann says:

October 28, 2007 at 5:53 pm

Let me respond to the comments so far.

Richard, the primary purpose of NEP is to announce new working papers. The categorization I propose just piggybacks on NEP. Thus, I do not think it appropriate to burden editors with the task of classifying older papers.

Roland, I see your point that someone with only a single paper could falsely be considered a specialist in a field. But keep in mind that NEP only has a snapshot of the literature, and a recent one at that. Only the more productive authors will have five or more papers in their field. For example, I consider myself a macroeconomist but would be excluded under the 5-paper criterion alone.

Ekkehart, this is a good point that any good economist should make: define the goal, then we can talk about the tools. Now that we approach 15’000 registered authors, my goal is to provide sublists by field of manageable size. A by-product is that those lists could be used for rankings within fields. There are various ways of doing the latter that I want to discuss later, and we should not be constrained by the construction of rankings.

The RePEc blog » Blog Archive » Ranking Institutions Within Fields says:

November 9, 2007 at 10:21 pm

[…] previous posts, we discussed how to categorize authors by field and then how to rank them within fields. These discussions are still open and I can still be […]

The RePEc Blog