Poll on internal disclosure of author statistics

January 14, 2010

The statistics compiled by RePEc are used for various rankings, for example of authors. While we still consider these statistics to be experimental, in particular those pertaining to citations, these numbers are increasingly used for evaluation purposes. We value the privacy of authors and disclose the statistics only for the top-ranked ones. Each month, authors receive a link to their statistics containing a code that is valid for only one month. This prevents a link that has been disclosed once from remaining visible forever.

We get more and more requests from department heads to obtain the data for members of their department. Our typical response is to have them ask the members of their department directly to forward their monthly RePEc emails. But of course, we could also provide all the relevant information directly. The purpose of this poll is to see whether participants in RePEc would favor such a disclosure. The conditions would be:

  1. The request must come directly from the head of the relevant department, by email to Christian Zimmermann.
  2. The request must contain a link that allows us to verify that this person is indeed the head.
  3. Statistics would be disclosed only for the members of the department who are currently affiliated with the department, as indicated in their RePEc profiles.
  4. Those with invalid email addresses would be excluded from the analysis, under the presumption that their affiliation may not be current.
  5. The department head would be provided with a link containing the analysis, a link that expires with the next monthly update of the rankings.
  6. The department head needs to be a registered author.
  7. A department would receive at most one disclosure a year.
  8. We reserve the right to refuse disclosing statistics.
  9. By department, we mean any unit with a separate entry in EDIRC.

If you have an opinion about this, please vote below and/or leave a comment. Given the nature of the question, we would require significantly more than a simple majority (two thirds) to offer this service. The poll closes on February 21, 2010.

Update (22/2/2010): This poll is now closed. There was little interest, with only 89 votes, which split 52-37 in favor of the disclosure. With an approval rate of 58%, it fell short of the substantial majority I consider necessary for implementation. Thus, there will be no disclosure to department heads.

Polls on ranking disclosures

October 15, 2009

Rankings have become an important part of RePEc, and we regularly get requests about unpublished rankings. Indeed, only the top 5%, 10% or 20% of authors or institutions are displayed, depending on the geographic or field aggregation of the ranking in question. Given the insistence of some requests, I am now considering whether RePEc rankings should be disclosed more extensively. Before making any changes, I am seeking the opinion of users.

But first, let me explain the reasons for the limited disclosure so far. Our interest is to have as many institutions and people as possible participate in RePEc and keep their data current. Rankings provide the right incentives for this. Thus RePEc participation is our focus, and rankings are an accessory (and we still consider them to be experimental, as the data is still far from complete). We know, however, that at least some people do not like having their poor rankings exposed and would remove their RePEc registration if they were. Thus, too extensive a ranking disclosure would defeat the purpose of the rankings. But I have no idea how widespread this reaction would be. The second reason for limited disclosure is that rankings become less reliable further down the list. Consider, for example, that 28% of all authors have no recorded citation. Third, full disclosure would create many large files and tables. We have about 22000 authors and 4500 institutions to rank…

The following polls are not binding. Their results will help to define what users want. Feel free to discuss aspects that go beyond the options of the polls in the comment section (of this post, not of the individual polls). I will then decide what to do. For both author and institution rankings, the options are: 1) keep things as is, 2) disclose rankings all the way down to the top half, 3) keep things as is, but provide rankings for the next ones in clusters (for example, rank the top 5% as now, then have a list of the top 6-10%, another for the top 10-15%), 4) provide full rankings. Polls will be open until November 21, 2009.

Update: Polls are now closed. A forthcoming post will discuss the results as well as various adjustments to the rankings.

About RePEc impact factors

July 27, 2009

Impact factors have always been a popular way to measure the influence of academic journals. They were popularized by ISI, now part of Thomson. RePEc also provides impact factors, and this post explains the differences between the two.

ISI takes a sample of journals and analyzes the citations across those journals. To be eligible, a citation has to appear within two years of the publication of the cited article, the cited article must be printed (not forthcoming, a working paper or a manuscript), and the cited article must be among the analyzed journals (286 in Economics). ISI is currently experimenting with a five-year window, in addition to the existing two-year window.
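To make these rules concrete, here is a minimal Python sketch of a two-year impact factor under simplifying assumptions: each item is reduced to a hypothetical key, a venue, a year and a printed flag, the sample is a set of journal names, and the denominator simply counts printed items from the two preceding years. The names Item, Citation and isi_style_impact_factor are hypothetical; this is not how ISI actually computes its figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str       # hypothetical identifier
    venue: str     # journal (or series) the item appears in
    year: int      # publication year
    printed: bool  # False for forthcoming articles, working papers, manuscripts


@dataclass(frozen=True)
class Citation:
    citing: Item
    cited: Item


def isi_style_impact_factor(journal, year, items, citations, sample):
    """Simplified two-year impact factor of `journal` for `year`.

    Counts citations made in `year` by journals in `sample` to printed
    items of `journal` published in the two preceding years, divided by
    the number of such printed items.
    """
    window = {year - 1, year - 2}
    denominator = sum(
        1 for it in items
        if it.venue == journal and it.printed and it.year in window
    )
    if denominator == 0:
        return 0.0
    numerator = sum(
        1 for c in citations
        if c.cited.venue == journal
        and c.cited.printed
        and c.cited.year in window
        and c.citing.year == year
        and c.citing.venue in sample
    )
    return numerator / denominator
```

Citations to older items, to unprinted items, or from journals outside the sample are all dropped, which is where the two approaches start to diverge.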

RePEc considers all publications listed in its bibliographic database. It thus also considers publication forms other than journal articles, covering close to 1000 journals and 2600 working paper series. It imposes no time window: citations of any age qualify. In most cases, a citation of a working paper will count toward its published form once the article is included in RePEc, possibly after the original citation was made (condition: at least one author has both versions in his/her RePEc profile). This implies that working paper series and book series can also have impact factors. RePEc is thus more comprehensive.

However, the pool of citations RePEc draws from is different. It relies heavily on working papers (which may later be published), as they are typically openly accessible. Some publishers also provide references in the bibliographic metadata, but not all. One implication is that RePEc is more current, as it includes citations to and from research that is not yet published. As research gets published, this data gets updated. But as references from many journals are missing, RePEc citation data must still be treated as experimental. Whether these omissions matter remains to be seen. After all, impact factors always have to be considered in relative terms, not in absolute terms, and if omissions were unbiased, they would not matter.

Another major difference is that RePEc excludes self-citations. This is an important issue, as some journals, explicitly or implicitly, encourage authors to cite other articles published in the same journal within the two-year window. Thus, just as self-citations are excluded for authors, they are excluded for journals. And this can matter a lot.

Finally, the impact factor is determined by dividing the eligible citations by the number of eligible articles. ISI itself determines which articles are eligible for the denominator, and this can even be negotiated with the publisher. In RePEc’s case, if an article (or a working paper) is listed, it counts without adjustment.
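By contrast, the RePEc rules described above (no time window, every listed item in the denominator, journal self-citations excluded) fit in a few lines of Python. This is only an illustration: citations are reduced to (citing series, cited series) pairs, the function name simple_impact_factors is hypothetical, and the matching of working papers to their published versions is left out.

```python
from collections import Counter


def simple_impact_factors(listed_items, citations):
    """RePEc-style simple impact factors (illustrative sketch).

    listed_items: dict mapping a journal or series to its number of listed items
    citations: iterable of (citing_series, cited_series) pairs; citations of
        any age count, since no time window is imposed
    """
    # Drop self-citations, i.e. a series citing its own items.
    received = Counter(cited for citing, cited in citations if citing != cited)
    # Every listed item counts in the denominator, without adjustment.
    return {s: received[s] / n for s, n in listed_items.items() if n > 0}
```

For example, a series with 4 listed items that receives 2 citations from other series gets an impact factor of 0.5, however many times it cites itself.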

RePEc also publishes variations on the “simple” impact factor: recursive impact factors, where every citation is weighted by the impact factor of the citing publication, which favors impact over numbers; discounted impact factors, where the impact of a citation decays with time (regardless of the age of the cited item); and a combination of the two, discounted recursive impact factors. Finally, there is now also the h-index. Each variation tells a different story about the publication, and RePEc offers the reader the choice.
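Two of these variations are easy to illustrate. The h-index of a publication (or an author) is the largest h such that h of its items have at least h citations each. For discounting, the post does not give the decay function, so the Python sketch below takes it as a parameter and uses 1/(1 + age) purely as a placeholder. The function names are hypothetical, and the recursive variants are not shown.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def placeholder_decay(age):
    # Hypothetical weight; not the decay RePEc actually uses.
    return 1.0 / (1 + age)


def discounted_citations(citation_years, current_year, decay=placeholder_decay):
    """Sum of citation weights, each depending only on how long ago the
    citation was made (not on the age of the cited item)."""
    return sum(decay(current_year - year) for year in citation_years)
```

For instance, items cited 10, 8, 5, 4 and 3 times give an h-index of 4: four items have at least four citations, but not five items with at least five.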

The best top level institutions in Economics

June 21, 2009

RePEc rankings are surprisingly popular despite their experimental status; in fact, they are the most-read topic on this blog. So to cater to the interest of our users, let us add another ranking… RePEc has been ranking institutions for quite a while now, using the institutions listed in EDIRC. This ranking operates at the department level, not at the university level. This is detrimental to institutions where economists are scattered across various departments, in particular departments that are not listed in EDIRC, for example law, political science and statistics. A new ranking is now computed that assembles all authors under the top-level institution of their affiliation(s), say a university, a government, etc. Current results are here.

The methodology is the following. For affiliations listed in EDIRC, the top level is used. That would typically be a university. For affiliations not listed in EDIRC, the homepage domain of the institution submitted by the author is matched against the institutions listed in EDIRC. If no match is found, the affiliation is taken as is. Finally, as usual with multiple affiliations, a weighting scheme is used to distribute the author’s score across all affiliations.
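These steps can be sketched in Python as follows. Everything here is an assumption for illustration: EDIRC is reduced to a parent map (each unit mapped to its parent, None at the top) and a domain index (web domain mapped to an EDIRC unit), domains are matched by trying successively shorter suffixes of the homepage host name, and each author's affiliation weights are taken as given and summing to one. The function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def top_level(unit, parents):
    """Follow parent links up to the top-level institution (e.g. a university)."""
    while parents.get(unit) is not None:
        unit = parents[unit]
    return unit


def match_domain(host, domain_index):
    """Try the host name, then shorter suffixes (law.univ.edu, univ.edu)."""
    labels = host.lower().split(".")
    for i in range(len(labels) - 1):
        unit = domain_index.get(".".join(labels[i:]))
        if unit is not None:
            return unit
    return None


def resolve(affiliation, parents, domain_index):
    """affiliation is ("edirc", unit_id) or ("homepage", url)."""
    kind, value = affiliation
    if kind == "edirc":
        return top_level(value, parents)
    host = urlparse(value).hostname or value
    unit = match_domain(host, domain_index)
    # No match in EDIRC: the institution is taken as is.
    return top_level(unit, parents) if unit is not None else host


def top_level_scores(authors, parents, domain_index):
    """authors: list of (score, [(affiliation, weight), ...]) tuples."""
    totals = defaultdict(float)
    for score, affiliations in authors:
        for affiliation, weight in affiliations:
            totals[resolve(affiliation, parents, domain_index)] += weight * score
    return dict(totals)
```

Suffix matching is what lets, say, a law school homepage land under the same university as an EDIRC-listed economics department; the actual matching rule RePEc uses may differ.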

Note a few particularities. All components of the University of London (LSE, Imperial College, etc.) are merged into one. All subdivisions of a national government are also merged. US Federal Reserve Banks, however, are not merged, as they are top level in their respective states.

Tips for authors to improve their RePEc ranking

April 16, 2009

By far the most popular topic on this blog is rankings. People love to know who the best are and how they themselves fare. This post is about optimizing one’s ranking within RePEc, and doing so in a way that does not trigger our safeguards against cheating. It turns out that all of the following points are things we want to encourage anyway, as they improve the quality of the data collected in RePEc.

As an author, here is what you can do once you have logged into the RePEc Author Service:

  1. Make sure all your works listed in RePEc are actually in your profile. In particular, do not remove working papers that have since been published from your profile. Some working paper series have higher impact factors than many journals, and working papers are downloaded much more often than articles. In addition, if all versions are in your profile, we can link between them. (If you previously refused items that were yours, you can recover them by clicking on the “refused” tab in your research page, unrefusing the relevant items, and then redoing the search.)
  2. Make sure the name variations listed in your profile really encompass all possible ways a publisher may have listed your name. The automatic search is only going to find works with such names.
  3. There may be additional citations waiting for your approval. These are citations for which we are less confident that they pertain to the right work. Click on the “citation” tab in your author account.
  4. Link to your profile on EconPapers or IDEAS from your homepage or email signature.
  5. When referring to your works on a web page, link to their pages on EconPapers or IDEAS. We cannot count downloads that do not transit through RePEc services.
  6. Make sure all your works are listed on RePEc. For the missing ones, encourage the publisher to list them, or get your department to open a working paper series, or upload your works on the Munich Personal RePEc Archive.

As an institution, you can optimize your ranking by making sure your registered authors follow the advice from above and:

  1. Make sure everyone is registered and maintains his/her profile.
  2. Make sure everyone gives the proper affiliation. You can check who is listed with you by finding your institution on EDIRC.
  3. Have your working paper series listed on RePEc. Instructions are here.

If everyone optimizes like this, RePEc data will be more complete, current and useful. Help us make it better!

The best young economists?

March 25, 2009

Who are the best young economists? RePEc publishes all sorts of rankings based on its data, but has so far lacked one that highlights the best young economists. Indeed, they are typically absent from the general rankings, as it takes many years to build up the body of work and citations required to be featured among the top economists.

Unfortunately, authors registering with RePEc do not supply their year of birth or the year they obtained their last graduate degree. However, RePEc has information about the date of most publications, which makes it possible to determine (roughly) when a career started. Here, we do not distinguish between types of publication (article vs. working paper, for example), as the goal is to approximate when the economist started being active in research.

Based on this criterion, two groups of economists are selected: those whose first publication, whatever the medium, appeared less than five years ago, and those whose first publication appeared less than ten years ago. Quite obviously, there is considerably more measurement error than in the general ranking, first because of the imperfect measure of the start of the career, second because the body of work is typically much smaller. But we hope people will still find these rankings useful.
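A minimal Python sketch of this selection, assuming each publication has been reduced to an (author, year) pair and that "less than five years ago" means the current year minus the first publication year is below 5. The function names are hypothetical, and ranking the selected groups is not shown.

```python
def first_publication_years(publications):
    """Earliest dated publication per author, whatever the medium.

    publications: iterable of (author, year) pairs; year may be None
    for works without a known date, which are skipped.
    """
    first = {}
    for author, year in publications:
        if year is None:
            continue
        first[author] = min(year, first.get(author, year))
    return first


def young_cohort(first_years, current_year, max_years):
    """Authors whose first publication is less than `max_years` old."""
    return sorted(a for a, y in first_years.items() if current_year - y < max_years)
```

Calling young_cohort with max_years=5 and max_years=10 yields the two groups; an author without any dated work never enters either one.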

Call for comments: modifications in the rankings of institutions

October 19, 2008

One feature of RePEc is its ability to rank researchers and the institutions they are affiliated with. Researchers create a list of affiliations when they register in the RePEc Author Service. However, this system was devised before rankings started to be computed, and some unforeseen consequences have emerged for authors with multiple affiliations. As there is no way to determine which affiliation is the main one, or what share economists would allocate to each, we are forced to treat each affiliation equally for ranking purposes. In several cases, this leads to institutional rankings being “hijacked” by organizations that offer secondary affiliations. See, for example, the overall ranking of institutions. Another consequence can be found in the regional rankings, where individuals whose main affiliation is elsewhere may take the place of legitimate insiders. Prime examples are Massachusetts, the United Kingdom and Germany.

What are the solutions? The obvious one is to modify the RePEc Author Service scripts to allow the declaration of a main affiliation or of affiliation shares. We have pondered that for some time now but find it very difficult to implement, especially as the main resource person for this project is no longer with us. Thus we need to find some way to proxy the affiliation shares. I want to propose one way to do this here and open it for discussion, with the goal of having a formula in place for the January 2009 rankings.

The logic of the proposed formula is that if many people are affiliated with a particular institution, then most of them must have courtesy or secondary affiliations. If person A is affiliated with institutions 1 and 2, and institution 1 has many people registered while institution 2 has few, then the ranking scores of person A should count more toward institution 2 than toward institution 1. Of course, such a distribution scheme pertains only to authors with multiple affiliations.

To be precise, let I be the set of affiliations of an author. For each i in I, let Si be the number of authors affiliated with institution i. Compute S as the sum of all Si. The weight of each affiliation is Ti=S/Si. These weights are then normalized to sum to one.

Take the following example. Economist A is affiliated with the Harvard Economics Department (46 registrants), the NBER (324 registrants) and the CEPR (262 registrants). Given that 46+324+262=632, the respective Ti would be 632/46=13.74, 632/324=1.95, and 632/262=2.41, which sum to 18.10. After normalizing the T's to sum to one, Economist A’s ranking scores would count 13.74/18.10=75.9% toward the Harvard Economics Department, 1.95/18.10=10.8% toward the NBER and 2.41/18.10=13.3% toward the CEPR. For regional rankings, 86.7% (75.9% + 10.8%) of his scores would count in Massachusetts and 13.3% in the United Kingdom. Under current rules, scores are distributed fully to each affiliated institution and count fully in each region.

This is much simpler than I manage to explain here… But a few additional details are in order. First, some variations in definitions can be discussed: Si can represent the number of registrants, the number of authors (registrants with works) or the number of works of the authors. The last option would keep institutions from (erroneously) discouraging young faculty with few works from signing up. I favor the number of authors. Second, we need to deal with affiliations that are not listed in the database (EDIRC) and thus do not have a defined number of registrants. One solution is simply to ignore such affiliations. The drawback is that the relevant authors may not get ranked in some regions where they are genuinely affiliated. Thus I propose to use, for those institutions, the average Si of the author's other affiliations. If no affiliation is in the database, all get the same weight.
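The proposed formula, including the fallback for affiliations not listed in EDIRC, can be sketched in Python as below. The function name affiliation_weights is hypothetical, and unlisted institutions are represented by None; this illustrates the proposal rather than the production code.

```python
def affiliation_weights(sizes):
    """Weights for one author's affiliations (October 2008 proposal).

    sizes: dict mapping each affiliation to Si, the number of authors
    affiliated with it, or None if the institution is not in EDIRC.
    """
    known = [s for s in sizes.values() if s is not None]
    if not known:
        # No affiliation is in the database: all get the same weight.
        return {inst: 1 / len(sizes) for inst in sizes}
    average = sum(known) / len(known)
    # Unlisted institutions get the average Si of the other affiliations.
    filled = {inst: (s if s is not None else average) for inst, s in sizes.items()}
    total = sum(filled.values())                           # S
    raw = {inst: total / s for inst, s in filled.items()}  # Ti = S / Si
    norm = sum(raw.values())
    return {inst: t / norm for inst, t in raw.items()}     # normalized to sum to one
```

Running it on the Harvard/NBER/CEPR example reproduces the 75.9%, 10.8% and 13.3% shares. Since S is common to all of an author's affiliations, it cancels in the normalization, so the weights are simply proportional to 1/Si.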

I now welcome comments on how to proceed and hope to implement the new scheme for the January 2009 rankings, which will be released in the first days of February 2009.

January 18, 2009 Update: The new ranking method for institutions has now been programmed and is ready for the early February release. The formula discussed above has been adopted with two amendments. The first was discussed in the comments: 50% of the weight is allocated to the institution with the same domain name as the author’s email address. The remaining 50% is allocated across all affiliated institutions by the formula given above. The second amendment pertains to the weights of institutions that are not listed in EDIRC. As there is no author count for them, I set the default at the average number of authors per listed institution, currently 4.55.
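The amended rule can be sketched in Python as below. The sketch assumes that the domain comparison is an exact string match, that several matching institutions would split the 50% equally, and that an author with no matching institution is weighted by the formula alone; the update does not say how those cases are handled. The names amended_weights and DEFAULT_SIZE are hypothetical.

```python
DEFAULT_SIZE = 4.55  # average authors per listed EDIRC institution, January 2009


def amended_weights(affiliations, email_domain):
    """Weights for one author's affiliations under the January 2009 rule.

    affiliations: dict mapping each institution to (domain, author_count);
        author_count is None for institutions not listed in EDIRC.
    email_domain: domain of the author's email address.
    """
    sizes = {i: (n if n is not None else DEFAULT_SIZE) for i, (_, n) in affiliations.items()}
    total = sum(sizes.values())                     # S
    raw = {i: total / s for i, s in sizes.items()}  # Ti = S / Si
    norm = sum(raw.values())
    formula = {i: t / norm for i, t in raw.items()}
    home = [i for i, (domain, _) in affiliations.items() if domain == email_domain]
    if not home:
        # Assumption: without a domain match, the formula decides everything.
        return formula
    # The remaining 50% goes over all affiliations by the formula...
    weights = {i: 0.5 * w for i, w in formula.items()}
    # ...and 50% goes to the institution(s) sharing the email domain.
    for i in home:
        weights[i] += 0.5 / len(home)
    return weights
```

Under these assumptions, Economist A from the example above, with a harvard.edu email address, would count about 88% toward the Harvard Economics Department (50% plus half of 75.9%).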

February 3, 2009 Update: I am receiving many questions about the sudden changes in the rankings within countries. As authors with multiple affiliations no longer count fully in each location, their rankings have worsened. Similarly, institutions that have many members with multiple affiliations now look worse. Note also that a few small errors crept in; they will be corrected for the February ranking.