Classifying authors

March 16, 2008

A difficult task librarians often face when classifying items is determining whether authors with similar names are the same person. Bibliographic records usually say very little that identifies an author. Take the case of Adam Smith. He may be listed under his full name, which is by no means unique, or worse, only as A. Smith, which is easily confused with others. Librarians then rely on context and on information gathered outside the bibliographic record to attribute the work to the right person, hopefully without error.

With the large number of works now available, such laborious categorization becomes infeasible, and automatic classification makes numerous errors. Within RePEc, we rely on the authors themselves to perform the classification. When they register in the RePEc Author Service, they have the opportunity to enter all the name variations under which they may be listed in a bibliographic record. For John Maynard Keynes (who is not registered), such name variations could be:

John Maynard Keynes
John M. Keynes
John Keynes
J. M. Keynes
J. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, John
Keynes, J. M.
Keynes, J.

In addition, an author may have changed names (through marriage), be listed with a title (Prof., Sir) or with a suffix (Jr, Sr, III). Variations multiply if names have accents, which some publishers ignore or encode in the wrong character set. The possibilities are numerous. The registered author is first offered suggestions of works that match the name variations exactly, and then suggestions that closely match them (typographical errors happen). The author can then accept or reject these works.
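As an illustration of the two steps described here, the following is a minimal sketch, not the RePEc Author Service's actual code. It generates the ten forms listed for Keynes and then classifies a listed name as an exact or a close match. The function names, the accent folding and the 0.85 similarity threshold are all assumptions made for the example.

```python
import unicodedata
from difflib import SequenceMatcher


def name_variations(first, middle, last):
    """Return the ten forms shown in the Keynes list for a first/middle/last name."""
    fi, mi = first[0] + ".", middle[0] + "."
    given_forms = [f"{first} {middle}", f"{first} {mi}", first, f"{fi} {mi}", fi]
    direct = [f"{g} {last}" for g in given_forms]      # "John M. Keynes"
    inverted = [f"{last}, {g}" for g in given_forms]   # "Keynes, John M."
    return direct + inverted


def fold(name):
    """Lower-case a name and strip accents so that accented and bare forms compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def match_kind(listed, variations, threshold=0.85):
    """Return 'exact', 'close' or None for a name as listed in a bibliographic record.

    The threshold is a hypothetical value chosen for this sketch.
    """
    target = fold(listed)
    folded = [fold(v) for v in variations]
    if target in folded:
        return "exact"
    best = max(SequenceMatcher(None, target, v).ratio() for v in folded)
    return "close" if best >= threshold else None


variations = name_variations("John", "Maynard", "Keynes")
print(match_kind("Keynes, J. M.", variations))        # exact
print(match_kind("John Maynard Kaynes", variations))  # close (one-letter typo)
print(match_kind("Adam Smith", variations))           # None
```

Titles, suffixes and name changes are not handled in this sketch. In any case the final decision stays with the author, who accepts or rejects each suggestion.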

The RePEc Author Service has so far managed to collect data from close to 16,000 authors who have claimed over 300,000 works as theirs. Such data is used in particular to increase the accuracy of various rankings. And within this set of authors, there is already a large number of homonyms, even when one looks beyond the initial of the first name, which is as far as some other services go.

If you know of other homonyms in the profession, encourage them to register!


The RePEc budget for 2008

March 9, 2008
            Budget 2007    Effective 2007    Budget 2008
Expenses    US$0.00        US$0.00           US$0.00
Revenues    US$0.00        US$0.00           US$0.00

Thanks to all our volunteers!


RePEc in February 2008

March 2, 2008

Every month, a short summary of what happened with RePEc is sent to the RePEc-announce mailing list. I also put that message, slightly adapted, on this blog.

During this month, IDEAS moved to a new server sponsored by the Society for Economic Dynamics. It continues to be hosted by the University of Connecticut and is now located on a faster line to the Internet.

In terms of traffic, 613,984 file downloads and 2,246,241 abstract views were recorded within the month, once more significantly up from a year ago. This leads us to the thresholds we have passed this month:

40,000,000 cumulative article abstract views on all RePEc services
25,000,000 cumulative abstract views on EconPapers
300,000 items claimed by registered authors
100,000 papers with JEL codes
20,000 unique subscribers in NEP
2,800 journals and series


Volunteer recognition: Thomas Krichel

February 21, 2008

Thomas Krichel is not just a RePEc volunteer, he is RePEc. In 1991, as a research assistant at the Economics Department of Loughborough University, he saw the potential that the Internet offered for the dissemination of research in Economics, but could not get hold of good data about new working papers. In February 1993, on a lectureship at the University of Surrey, he was luckier and teamed up with Féthy Mili, Economics librarian at the Université de Montréal, who contributed data on 250 series, and Hans Amman (University of Amsterdam), who let Thomas use his coryfee mailing list. Bob Parks soon joined with his Economics Working Paper Archive at Washington University. Thus the NetEc project was launched. It moved to a gopher server at the Manchester Computing Centre in 1993, and then to the web. That year, Thomas also got help in collecting data from José Manuel Barrueco Cruz, Economics librarian at the University of Valencia. But soon they realized that there was too much information out on the Internet for just the two of them to collect.

This is when Thomas suggested the creation of RePEc, which would completely decentralize the data input: the publishers, who benefit the most from having their papers listed on web indexes, were to index the works themselves. With the collaboration of Sune Karlsson (SWoPEc, Stockholm School of Economics), Bob Parks and Corry Stuyts (DEGREE, Netherlands), José and Thomas then launched RePEc in June 1997. It still works on the same principles, with great success.

Thomas is still the heart and soul of RePEc. He has his hand in almost every project that is undertaken. After completing his Economics PhD at the University of Surrey, he moved to Long Island University to take a position of assistant professor in … Library Studies. Now tenured, he is an eminence grise in the online provision of bibliographic data and is pushing the RePEc concept into other fields. Within RePEc, most of his attention is currently directed towards NEP, the email notification service on new working papers.


World Ranking of Repositories, RePEc is #2

February 14, 2008

The Webometrics Ranking of World Universities is an initiative that tries to establish which universities provide the most content on the web and gain visibility from it. The ranking of universities is based on the size of the web domain (20%), the number of rich files available (PDF, RTF, etc., 15%), research on Google Scholar (15%), and link visibility (50%). Not surprisingly, US universities monopolize the first 24 spots, led by MIT.
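To make the weighting concrete, here is a minimal sketch of a weighted composite using the four percentages quoted above. How Webometrics actually measures and combines the criteria is not described in this post. The sketch simply assumes that each criterion has already been turned into a score between 0 and 1, and the example scores are invented purely for illustration.

```python
# Weights as quoted in the post; they add up to 100%.
WEIGHTS = {
    "size": 0.20,        # size of the web domain
    "rich_files": 0.15,  # PDF, RTF, etc.
    "scholar": 0.15,     # research on Google Scholar
    "visibility": 0.50,  # link visibility
}


def composite_score(scores):
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, not real Webometrics data.
example = {"size": 0.6, "rich_files": 0.8, "scholar": 0.7, "visibility": 0.9}
print(round(composite_score(example), 3))
```

With half of the weight on link visibility, an institution that is widely linked to can outrank one that merely hosts more files.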

Webometrics also ranks repositories, the criteria being the same as for universities. The ranking is led by arXiv, the grand-daddy of all repositories, covering much of Physics and Mathematics. RePEc is number 2, followed by E-LIS, a repository in Library Sciences founded by Thomas Krichel, who is also at the origin of RePEc!

Other notables down the list: HAL, a French repository that feeds into RePEc, at number 9; CDLIB, the University of California repository and a RePEc participant, at number 19; SSRN, not in RePEc, at number 37; the Munich Personal RePEc Archive, barely a year old, already at number 56; and AgEconSearch, not in RePEc, at number 126.


Society for Economic Dynamics sponsors new server for IDEAS

February 7, 2008

IDEAS just moved to a new server sponsored by the Society for Economic Dynamics. The old server, which was sponsored by the College of Liberal Arts and Sciences at the University of Connecticut, had been running almost flawlessly since October 2002, but was starting to get overwhelmed by the amount of material now in RePEc and by the heavy traffic and number crunching it entails. While the amount of material more than tripled, the complexity of the data increased much more than that, given the links with authors, references, citations, JEL codes, NEP reports, rankings, institutions, publication compilations, and reading lists.

The new server has more computational power, more memory and especially more disk space. As before, it is hosting IDEAS, EDIRC and QM&RBC. It also hosts the website of the Society for Economic Dynamics, which was willing to sponsor it because it was looking for space to host the datasets and program codes used for articles published in the Review of Economic Dynamics. The server is also set up to provide limited emergency support in case another RePEc service fails. The hosting continues to be provided by the College of Liberal Arts and Sciences at the University of Connecticut. In particular, Tim Ruggieri from the CLAS Computer Support Group helped with the configuration of the server.

RePEc relies entirely on the support of volunteers in its operations. Contact us if you want to help in one way or another.


RePEc in January 2008

February 2, 2008

Every month, a short summary of what happened with RePEc is sent to the RePEc-announce mailing list. I also put that message, slightly adapted, on this blog.

The RePEc Author Service was unfortunately down for 10 days. We hope this was only a temporary problem, and that full functionality will be restored soon. The RePEc Blog was very helpful in keeping users abreast of the situation.

Contentwise, a notable addition has been the complete listing of the Journal of Political Economy, starting in 1893. In terms of traffic, 552,272 file downloads and 1,946,427 abstract views were recorded within the month, significantly up from a year ago. This leads us to the thresholds we have passed this month:

90,000,000 abstract views on IDEAS
450,000 online items
275,000 paper announcements through NEP
175,000 items with citations
170,000 online papers
170,000 papers with abstract
80,000 papers with citations
30,000 articles with references
900 books online


RePEc Author Service is down

January 18, 2008

The RePEc Author Service is down at the time of this writing. IT personnel from the College of Liberal Arts and Sciences at the University of Connecticut are currently looking into the issue.

Update (Friday 18:00 EST): The service is back up, after an interruption of about 18 hours. It should be fully functional. Please do not hesitate to report any issues. We are sorry for the inconvenience.

Update 2 (Saturday 7:00 EST): The server went down again, and will not be back up before Monday.

Update 3 (Monday 19:00 EST): The server is still down. When it is back up, we will keep it offline to investigate the problem.

Update 4 (Thursday 18:00 EST): The machine is running (for the moment…) but we are still keeping it offline to work on it.

Update 5 (Tuesday 10:00 EST): All tests have been passed successfully, and we are progressively reestablishing all services.

Update 6 (Tuesday 14:00 EST): Everything is looking good so far. Expect the service to be available tomorrow if tests continue to look positive.

Update 7 (Wednesday 14:00 EST): The RePEc Author Service is back online. Please report anything unusual. It is to be expected that some data is out of date, in particular citation data. Sorry for the inconvenience, and let us hope everything works fine now that the service is live.


75% of the top 1000 economists are now registered with RePEc

January 8, 2008

The RePEc Author Service recently surpassed 15,000 registered authors, and the post reporting this mentions the high coverage among top-ranked economists. To document this, take one popular ranking, the one by Tom Coupé that is based on publications from 1990 to 2000. Tom Coupé has two rankings, one where publications are weighted by the impact factors of the journals, the other where citations are counted. According to the “publications” ranking, 75% of the top 1000 economists are now registered with RePEc, and 65% according to the other. The difference comes from the fact that the latter also includes non-economists (political scientists, statisticians, demographers, law scholars, and sociologists) who are cited in Economics journals.

One particularly interesting aspect of these rankings is how the proportion of registered authors declines as one moves down the rankings:

Ranks       Registered,            Registered,
            publication ranking    citation ranking
1-100       93                     77
101-200     81                     72
201-300     78                     69
301-400     73                     76
401-500     77                     66
501-600     71                     61
601-700     73                     54
701-800     77                     55
801-900     62                     62
901-1000    65                     60
Total       750                    652
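The totals and the decline can be checked with a short tally. This is only a sketch on the counts in the table above. The split into a top and a bottom half is a choice made here for illustration, not part of the original rankings.

```python
# Registered authors per band of 100 ranks, copied from the table above.
BANDS = ["1-100", "101-200", "201-300", "301-400", "401-500",
         "501-600", "601-700", "701-800", "801-900", "901-1000"]
PUBLICATION = [93, 81, 78, 73, 77, 71, 73, 77, 62, 65]
CITATION = [77, 72, 69, 76, 66, 61, 54, 55, 62, 60]


def share(counts, per_band=100):
    """Fraction of ranked economists who are registered."""
    return sum(counts) / (per_band * len(counts))


def halves(counts):
    """Registered authors in the top half and in the bottom half of the ranking."""
    mid = len(counts) // 2
    return sum(counts[:mid]), sum(counts[mid:])


for label, counts in (("publication", PUBLICATION), ("citation", CITATION)):
    top, bottom = halves(counts)
    print(f"{label}: total {sum(counts)}, share {share(counts):.1%}, "
          f"ranks 1-500: {top}, ranks 501-1000: {bottom}")
```

The tally gives 402 against 348 registered authors for the two halves of the publication ranking, and 360 against 292 for the citation ranking. The decline is not monotonic from band to band, but it is clear between the halves.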

How can we explain this pattern? Are registered authors more likely to publish well or be cited? This may be true for more recent measures of visibility, but in 1990-2000, the RePEc Author Service was not yet functional. Are better-ranked authors then more likely to care about their visibility, and thus more likely to register?


RePEc in December 2007, and what we have done over Year 2007

January 2, 2008

Every month, a short summary of what happened with RePEc is sent to the RePEc-announce mailing list. I will also put that message, slightly adapted, on this blog.

The major event this month is that we passed three important thresholds: 15,000 authors, 80% of the material now online, and 1/8 billion abstract views. For some hints at what 15,000 authors represent in the Economics profession, see elsewhere on the blog. Also, we have now released rankings for the most cited recent papers and articles.

As the year 2007 is now over, we can reflect on what RePEc has achieved over that year. 158 archives were added, and the current total of 844 archives added 108,000 bibliographic items to RePEc, a 24% growth, with 240 new working paper series and 130 new journals. 105,000 new items are online, a 31% growth. 3,500 authors registered, almost ten a day, a 30% growth. Citation analysis coverage increased by 39%.
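As a rough consistency check on these figures, the following sketch backs out the stock at the start of 2007 that each growth rate implies. It assumes that each percentage is measured against the start-of-year stock. Since the percentages are rounded in the text, the results are approximate.

```python
def implied_base(added, growth):
    """Start-of-year stock implied by `added` new items and a growth rate (0.24 = 24%)."""
    return added / growth


# (new in 2007, growth rate) as stated in the paragraph above.
figures = {
    "bibliographic items": (108_000, 0.24),
    "online items": (105_000, 0.31),
    "registered authors": (3_500, 0.30),
}

for name, (added, growth) in figures.items():
    start = implied_base(added, growth)
    print(f"{name}: about {start:,.0f} at the start of 2007, "
          f"about {start + added:,.0f} at the end")

# "Almost ten a day": registrations per day over a 365-day year.
print(f"registrations per day: {3_500 / 365:.1f}")
```

The implied year-end count of roughly 15,200 registered authors is in line with the 15,000 threshold passed this month, and 3,500 registrations over 365 days comes to about 9.6 a day.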

In 2007, we also added a few new features:

  • Compilations by institutions of all publications from affiliated and registered authors (find institutions on EDIRC)
  • Customized publication compilations: by defining a list of authors or by creating a reading list
  • Registered authors can now manage citations at the RePEc Author Service: delete erroneous ones and approve citations that were deemed dubious matches.
  • Rankings have been improved with more criteria, with rankings within fields and with citation rankings for recent items only.
  • The RePEc blog was inaugurated.

Finally, RePEc celebrated its 10th year in its current form. I think this was an impressive year, and I am looking forward to an even better year 2008!

In terms of traffic, December is, as expected, calmer, but we still managed record numbers for the month: 1,822,061 abstract views and 504,315 downloads. This leads us to the thresholds we have passed this month:

125,000,000 cumulative abstract views
275,000 online articles
130,000 items with references
15,000 registered authors
1,900 working paper series
80% of all items available online