Wikipedia and RePEc

March 19, 2012

Wikipedia is a well-known crowd-sourced encyclopedia. It has an incredible wealth of knowledge, which is often backed up by appropriate citations. Those citations may lead to material listed on RePEc. In fact, Wikipedia is currently the most important referrer to IDEAS (excluding search engines) and there are currently 1516 links to IDEAS and EconPapers, mainly on Wikipedia, and also on a few other projects, like Wikibooks, Wikiversity and Wiktionary. This number is gathered from the 57 languages with the most pages on Wikipedia. Of these links, 1363 resolve to author, book, article, chapter, software component or paper pages on IDEAS or EconPapers. The rest are mostly to service portals or to rankings.

The fact that a paper is mentioned in Wikipedia is not unlike a citation. Hence, IDEAS now links back to the appropriate Wikipedia page whenever possible. This can be found on the “lists” subfield on every IDEAS page. And for those curious about the distribution by language for the back-links: English 574, German 165, Spanish 83, Norwegian 48, French 48, Japanese 44, Bulgarian 41, Turkish 36.


A new RePEc service: CollEc

March 11, 2012

A new RePEc service is now on-line, CollEc. The main goal of this initiative is to analyze co-authorship networks within Economics. To this end, it collects all the authorship data from the RePEc Author Service and computes the shortest path through co-authorship relationships between any two registered economists. From all this data, two “features” are computed.

First, a closeness score and a betweenness score are computed for every economist. Closeness measures how close one is to everyone else. Betweenness measures how frequently shortest paths have a particular economist as a node. Of course, economists can be ranked according to both criteria.

Second, the website can display the shortest paths between any two economists, and one can be surprised at how short they often are. To play with this, either navigate the lists on CollEc or find the direct link to an author’s page on IDEAS (author profile, under “statistics”), then enter the name of another author.

Note that only authors registered with RePEc are considered. Also, not every registered author is part of this global network of co-authorship. For example, an author without a (registered) co-author is excluded. Also, an economist at the end of a path, most likely someone with a single (registered) co-author, cannot have a betweenness score.
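The measures described above can be illustrated with a small sketch. This is not CollEc's actual code: the graph, the author labels and the function names are all made up for illustration, closeness is taken here to be the average shortest-path length to every other author (lower means closer), and betweenness is computed by brute-force enumeration, which is only practical for tiny networks.

```python
from collections import deque
from itertools import combinations

# Hypothetical co-authorship network; labels and links are illustrative only.
COAUTHORS = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def bfs_distances(graph, source):
    """Number of co-authorship steps from source to every reachable author."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def all_shortest_paths(graph, source, target):
    """Every shortest path from source to target, as lists of authors."""
    dist = bfs_distances(graph, source)
    if target not in dist:
        return []
    paths = []

    def walk_back(node, suffix):
        # Step from the target back toward the source, one BFS layer at a time.
        if node == source:
            paths.append([source] + suffix)
            return
        for neighbor in graph[node]:
            if dist.get(neighbor) == dist[node] - 1:
                walk_back(neighbor, [node] + suffix)

    walk_back(target, [])
    return paths

def closeness(graph, author):
    """Average shortest-path length from author to all other reachable authors."""
    dist = bfs_distances(graph, author)
    others = [d for name, d in dist.items() if name != author]
    return sum(others) / len(others)

def betweenness(graph, author):
    """Sum, over pairs of other authors, of the share of shortest paths through author."""
    score = 0.0
    others = sorted(name for name in graph if name != author)
    for s, t in combinations(others, 2):
        paths = all_shortest_paths(graph, s, t)
        if paths:
            score += sum(author in p for p in paths) / len(paths)
    return score

print(all_shortest_paths(COAUTHORS, "A", "E"))  # [['A', 'B', 'D', 'E']]
print(closeness(COAUTHORS, "A"))                # 2.0
print(betweenness(COAUTHORS, "B"))              # 3.0
print(betweenness(COAUTHORS, "A"))              # 0.0
```

In this toy network, author A has a single co-author and therefore never lies in the interior of a shortest path, which is why the betweenness score comes out as zero.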


Important upgrade for the RePEc Author Service

February 28, 2012

The RePEc Author Service just underwent a major upgrade. One important aspect of it is the treatment of multiple affiliations. There are also other changes of a more cosmetic nature that should help users avoid some common mistakes as well as some administrative and management improvements that a typical user would not notice.

Multiple affiliations

The most requested change was to allow authors with multiple affiliations to either select an order of importance for the affiliations or to select weights for each. This has become important for the rankings, as authors are allocated to their respective affiliations. Some weighting scheme had to be put in place, and the one in place so far was guessing the probability that a particular affiliation is the main one. The risk of error is of course large. With the revision, authors now have to choose the proper weights themselves. This now applies to any author changing affiliations, and to any new registrant. Anybody getting on the affiliation page also has to choose weights. These weights will be enforced for the March 2012 rankings released in early April 2012. Note that weights are public information.
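To make the role of the weights concrete, here is a minimal sketch of how an author's score might be split across affiliations in proportion to self-declared weights. The function name, the institution labels and the proportional rule are assumptions for illustration; the actual formula used in the RePEc rankings is not described here.

```python
def allocate_score(score, weights):
    """Split an author's score across affiliations in proportion to the weights.

    Hypothetical helper: `weights` maps an affiliation label to the weight the
    author chose; weights need not sum to 100 since they are normalized here.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one affiliation needs a positive weight")
    return {aff: score * w / total for aff, w in weights.items()}

# An author with a score of 10 who puts 75% on one institution and 25% on another.
print(allocate_score(10.0, {"University X": 75, "Institute Y": 25}))
# {'University X': 7.5, 'Institute Y': 2.5}
```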

Clearer claim choices

This change has actually been in place for a few weeks. When authors were offered choices of research items to claim as theirs, quite a few got confused and did the exact opposite of what they wanted to do: claim works of others and refuse their own. The form is now much clearer, with green and red backgrounds for the choices. Our observation so far has been that the error rate has been dramatically reduced.

Better action alerts

When an author logs in, he/she will immediately see whether research items or citations are waiting to be claimed. Bright red numbers are then present next to the relevant links. This feature will gradually roll in as author accounts are refreshed.

Avoid duplicate entries

Upon registration, there is now a check to prevent someone from registering a second time. Indeed, when moving or changing email address, it is much better to update an existing account than to create a second one, not least because this preserves links throughout the RePEc system. Of course, it is still possible that a homonym is registering, so the check can be bypassed.

Name variations

Research items are linked to authors using the name variations they supply. During registration, the system makes suggestions that a registrant can amend (for example: Adam Smith; Smith, Adam; A. Smith; Smith A.). Unfortunately, it was noticed that a not insignificant share of users were deleting valid name variations, in particular to keep just one. The system was then unable to link them with appropriate results. It is now impossible for a new registrant to have fewer than four name variations.
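As an illustration of why the variations matter, the sketch below generates the four forms from the example above and checks an author string against them. The function names and the matching rule (case-insensitive, whitespace-normalized exact match) are assumptions, not the RePEc Author Service's actual logic.

```python
def name_variations(first, last):
    """Return the four suggested forms from the example: full and initialed, both orders."""
    initial = first[0]
    return [
        f"{first} {last}",      # Adam Smith
        f"{last}, {first}",     # Smith, Adam
        f"{initial}. {last}",   # A. Smith
        f"{last} {initial}.",   # Smith A.
    ]

def normalize(name):
    """Lowercase and collapse runs of whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())

def matches_author(author_string, variations):
    """True if the author string on a research item equals one of the variations."""
    known = {normalize(v) for v in variations}
    return normalize(author_string) in known

variations = name_variations("Adam", "Smith")
print(variations)                                 # ['Adam Smith', 'Smith, Adam', 'A. Smith', 'Smith A.']
print(matches_author("smith,  adam", variations)) # True
print(matches_author("A. Smith", ["Adam Smith"])) # False
```

The last call shows the problem described above: an author who keeps only one variation is no longer matched to works listed under the other forms.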

Better treatment of deceased authors

Unfortunately, some authors pass away, and by now the list has become significant. Instead of leaving those accounts orphaned, they are now aggregated into a master account that can manage them. Indeed, a deceased author may still have new works added to RePEc, and have new citations discovered. Such an account continues to provide useful information and should not be deleted. By the way, this is an opportunity for volunteers to get involved in helping RePEc.

Better monitoring

There is now also better monitoring of the activity on the RePEc Author Service to prevent abuse and errors. This also releases more time for the administrator to deal with other tasks.

Problems?

Should anything appear amiss, do not hesitate to contact the administrator listed on the RePEc Author Service website.

PS: Secure HTTP

The site is now also served under secure http (https), to increase the security of transactions.


30,000 authors now registered with RePEc

October 29, 2011

We are continually amazed at how RePEc has grown since its inception in 1997 (with a precursor starting in 1992). One example is that we now have 30,000 authors registered with the RePEc Author Service, averaging 23 listed works each. If we can call this a community, it is the largest in the profession, as it outnumbers the membership of the largest societies in Economics combined. It is also remarkable that only 1% of the accounts have expired email addresses, showing that authors maintain their entries. This does not include the small but unfortunately growing number of deceased authors.

This is also a good opportunity to mention that the RePEc Author Service is now hosted by the Economic Research Division of the Federal Reserve Bank of St. Louis. We are currently working on a few innovations that will make the service more useful to the profession as well as facilitate its maintenance.


Why discussion paper archives should not allow the removal of items

August 20, 2011

The archives listed in RePEc differ in their policies regarding withdrawal of items, or replacement of an old item by a newer one. Some archives, like NBER, permit withdrawals and replacements, while others, like IZA or MPRA, permit neither withdrawals nor replacements. (ArXiv, the leading archive for physics, has adopted a no withdrawal policy as well.)

I am managing MPRA, which publishes unrefereed discussion papers in economics. In the following, I detail the reasoning underlying MPRA’s policy choice. As the case for prohibiting withdrawals seems to be strong, it is hoped that other RePEc archives adopt a similar policy if they have not done so already.

Discussion papers are preliminary versions of articles that may appear in their final form in the future. Discussion of these preliminary versions serves to improve them.

Discussion of a discussion paper requires that it can be cited. Citation requires that you can find the cited item, and even the cited phrase at the page given in the citation. In short: The cited item must remain reliably unchanged and retrievable.

In the old days, you mailed typed manuscripts to colleagues, and successively revised your papers in response to their suggestions and criticism. This entailed the problem that your colleagues would refer to different versions. In order to correctly grasp their points, you had to keep track of the different versions you had mailed around. (I never managed.) With a stable Internet address for each version, this tracking can be done over the Internet with ease. Permitting substitution of old versions by new versions under the same Internet address would invite confusion and would make citations unreliable.

So the alternative seems to be: Either you keep your papers private and have your discussion in form of private correspondence, or you put them on the Net for public discussion. The second alternative is implied by placing the paper in a discussion paper archive, and this seems to require that identifiable versions remain accessible concurrently.

In addition, there are further reasons for favoring a “no withdrawal” policy by archive maintainers.

– If the final version of a paper ends up in a toll-gated journal, this excludes the majority of economists from reading the final version. The presence of a preliminary version mitigates the problem.

– If a preliminary version that is referred to by a hyperlink is withdrawn, the reference becomes largely useless. NEP reports will, for instance, show dead links in such cases. This is a nuisance.

– If problems about priority of findings arise, these may be settled more easily if all versions are available on the Net.

– For archive maintainers, the manual handling of withdrawals requires considerable work. This speaks against the possibility of withdrawals as well. (For large archives, this reason is overwhelming. At MPRA we initially permitted withdrawals, but this proved impracticable and provided the proximate cause for adopting the no-withdrawal policy.)

– Further, the fight against plagiarism is eased by adopting a non-withdrawal policy. Typically, plagiarizers ask for removal of their contribution if detection is imminent. This tends to obscure the case. If a plagiary remains in the archive, the case remains transparent. If an item is identified as a plagiary, it is to be marked as such, and the original source indicated. This has additional advantages:

– the interested reader is referred to the original source

– the plagiarizer cannot undo the plagiary and thereby hide the offense from scrutiny by potential future employers

– because of that threat, plagiarism becomes more risky and is discouraged.

– problems with plagiarism may be settled more easily and be handled more transparently if all versions are available on the Net. Otherwise, a paper may be plagiarized, the original paper substituted by a revised version, and priority will go to the plagiary, while the revised version will be counted as a result of plagiarism! This ought to be avoided.

The common objection against a no withdrawal policy is that authors would prefer readers to read the newest version. Yet RePEc provides information about all versions, and the metadata at IDEAS or EconPapers provide alerts about other existing versions. So the readers may choose the most recent one. (Such problems occur all the time, but it would be impractical to introduce the possibility of withdrawing everything, including published papers. For example, I have recently updated a paper published in a journal in 2008 and would like to refer the reader to the new version in the format of a discussion paper which contains important improvements and new material, but there is no way to do that, other than hoping that the reader searches through RePEc or sees the different versions in Google.)

There is, thus, a conflict between the interest of the author to have only his or her favorite version on the Net, and the public that is interested in transparency and unmanipulated documentation. At MPRA, we try to take account of that by indicating if a paper is superseded by a newer version. Further, we offer the possibility to watermark papers as withdrawn by the author, but leave them in the archive.


Three new fields covered by NEP

July 25, 2011

NEP (New Economics Papers) is the RePEc service in charge of disseminating recent working papers that are available online. This dissemination occurs through email lists and RSS feeds. Given the large number of them, about 400-500 a week, they are split into field specific reports, each headed by an editor who chooses what is relevant to the field of interest, aided by an expert system. About 90 fields are currently covered, and volunteers are welcome to edit any area that is currently not represented.

We take this opportunity to highlight three new reports that have recently been opened:

  • NEP-DEM (Demographic Economics), edited by Clarence Nkengne Tsimpo (Université de Montréal and World Bank). Note that there is also a report for migration (NEP-MIG).
  • NEP-IUE (Informal and Underground Economics), edited by Catalina Granda Carvajal (Universidad de Antioquia).
  • NEP-LMA (Labor Markets: Supply, Demand, and Wages), edited by Erik Jonasson (Lund University). There is also a general labor economics report (NEP-LAB) and one dedicated to unemployment, inequality and poverty (NEP-LTV).

Subscriptions are of course free, as is everything in RePEc. Details are available at NEP, including for the many other reports.


NEP: 30000 reports and going

November 24, 2010

NEP (New Economics Papers) is an important element in the collection of services that use RePEc data. It disseminates through email and RSS weekly reports about new working papers in 85 different fields, each compiled by volunteer editors. This project has recently surpassed 30000 reports sent since 1998. It currently serves over 60000 email subscriptions from close to 30000 unique email addresses, and has announced over 150000 papers, each in two field reports on average.

The quantity of information digested by this project has grown considerably over the years. Currently about 500 new papers a week are analyzed, a number too large for editors to manage. Thus several years ago an expert system was put in place that learns from the choices of the editors and offers them every week the complete list of papers for selection, but placing the most likely choices first. It is remarkable how well this works, thereby saving our volunteers considerable time.

Volunteers are still welcome, for example to help with the general management of the project, help with existing reports or open new reports in fields not yet covered. Interested people should contact Marco Novarese.

