1 million works available online through RePEc

November 26, 2011

RePEc is about facilitating the diffusion of research in Economics. It does this through an open bibliography, which allows anyone to have their works listed, and anyone to use the bibliography. But of course, this is more powerful when the works are not just listed, but also available online with a direct link.

RePEc now has links to over a million works covering Economics and Finance, about half of which are in open access. While a majority are from journals (61%), online working papers are much more popular. While an article is downloaded on average once every two months, working papers are downloaded close to once a month.

PS: RePEc volunteer and NEP-OPM editor Martin Berka is about to start a month-long rowing expedition from Sydney (Australia) to Auckland (New Zealand). You can follow the progress of his team here.


Why discussion paper archives should not allow the removal of items

August 20, 2011

The archives listed in RePEc differ in their policies regarding withdrawal of items, or replacement of an old item by a newer one. Some archives, like NBER, permit withdrawals and replacements, while others, like IZA or MPRA, permit neither withdrawals nor replacements. (ArXiv, the leading archive for physics, has adopted a no withdrawal policy as well.)

I am managing MPRA, which publishes unrefereed discussion papers in economics. In the following, I detail the reasoning underlying MPRA’s policy choice.  As the case for prohibiting withdrawals seems to be strong, it is hoped that other RePEc archives adopt a similar policy if they have not done so already.

Discussion papers are preliminary versions of articles that may appear in their final form in the future. Discussion of these preliminary versions serves to improve them.

Discussion of a discussion paper requires that it can be cited. Citation requires that you can find the cited item, and even the cited phrase at the page given in the citation. In short: The cited item must remain reliably unchanged and retrievable.

In the old days, you mailed typed manuscripts to colleagues, and successively revised your papers in response to their suggestions and criticism. This entailed the problem that your colleagues would refer to different versions. In order to correctly grasp their points, you had to keep track of the different versions you had mailed around. (I never managed.) With a stable Internet address for each version, this tracking can be done over the Internet with ease. Permitting substitution of old versions by new versions under the same Internet address would invite confusion and would make citations unreliable.

So the alternative seems to be: Either you keep your papers private and have your discussion in form of private correspondence, or you put them on the Net for public discussion. The second alternative is implied by placing the paper in a discussion paper archive, and this seems to require that identifiable versions remain accessible concurrently.

In addition, there are further reasons for favoring a “no withdrawal” policy by archive maintainers.

– If the final version of a paper ends up in a toll-gated journal, this excludes the majority of economists from reading the final version. The presence of a preliminary version mitigates the problem.

– If the preliminary version is referred to by a hyperlink, the reference becomes largely useless. NEP reports will, for instance, show dead links in such cases. This is a nuisance.

– If problems about priority of findings arise, these may be settled more easily if all versions are available on the Net.

– For archive maintainers, the manual handling of withdrawals requires considerable work. This speaks against the possibility of withdrawals as well. (For large archives, this reason is overwhelming. At MPRA we initially permitted withdrawals, but this proved impracticable and provided the proximate cause for adopting the no-withdrawal policy.)

– Further, the fight against plagiarism is eased by adopting a non-withdrawal policy. Typically, plagiarizers ask for removal of their contribution if detection is imminent. This tends to obscure the case. If a plagiary remains in the archive, the case remains transparent. If an item is identified as a plagiary, it is to be marked as such, and the original source indicated. This has additional advantages:

– the interested reader is referred to the original source

– the plagiarizer cannot undo the plagiarism, and so cannot hide the offense from scrutiny by potential future employers

– because of that threat, plagiarism becomes more risky and is discouraged.

– problems with plagiarism may be settled more easily and be handled more transparently if all versions are available on the Net. Otherwise, a paper may be plagiarized, the original paper substituted by a revised version, and priority will go to the plagiary, while the revised version will be counted as a result of plagiarism! This ought to be avoided.

The common objection against a no withdrawal policy is that authors would prefer readers to read the newest version. Yet RePEc provides information about all versions, and the metadata at IDEAS or EconPapers provide alerts about other existing versions. So the readers may choose the most recent one. (Such problems occur all the time, but it would be impractical to introduce the possibility of withdrawing everything, including published papers. For example, I have recently updated a paper published in a journal in 2008 and would like to refer the reader to the new version in the format of a discussion paper which contains important improvements and new material, but there is no way to do that, other than hoping that the reader searches through RePEc or sees the different versions in Google.)

There is, thus, a conflict between the interest of the author to have only his or her favorite version on the Net, and the public that is interested in transparency and unmanipulated documentation. At MPRA, we try to take account of that by indicating if a paper is superseded by a newer version. Further, we offer the possibility to watermark papers as withdrawn by the author, but leave them in the archive.


Little known features on RePEc sites

July 28, 2010

Various sites display information collected by RePEc, and they do so in ways that are not always similar. In particular, there are features that may not be noticeable to the casual user. Here are some featured on EconPapers, EconomistsOnline and IDEAS.


  1. EconPapers and IDEAS allow users to download bibliographic records in various formats, such as BibTeX, RIS (used by EndNote, ProCite and RefMan), plain text or HTML. IDEAS also provides this for all works of a registered author.
  2. RePEc services link different versions of a paper and article, as long as at least one of the authors has them listed in his/her profile and the titles are close. Contact RePEc for cases where titles differ.
  3. URLs on RePEc services are permanent, and can thus safely be used for referencing.
  4. EconomistsOnline allows users to dynamically refine search results.
  5. Some services have advanced search features: EconPapers, IDEAS.
  6. One can navigate EconomistsOnline in four languages.
  7. IDEAS has tools to create reading lists and publication compilations of a group of people.
  8. EDIRC lists all publications of authors affiliated with an indexed institution.
  9. EconPapers provides a syntax and URL checker for the metadata submitted to RePEc.
  10. Both EconPapers and IDEAS provide links to citing and cited papers on each abstract page.
  11. Download statistics for series, journals, papers and authors are available at LogEc.


How to improve citation coverage in RePEc

April 28, 2010

One aspect of RePEc that has grown in importance over the last few years is its citation analysis, provided by the CitEc project, in particular because of its use in rankings. Citation extraction is a complex process. First, one needs to be able to access texts and find where references are (see details), then one needs to be able to interpret those references and match them with some work already listed in RePEc (see details). At this time, 5,400,000 references could be extracted from 240,000 works, with 2,300,000 matched to an item listed in RePEc. While these numbers may sound impressive, it still means that only about a third of online texts could be parsed successfully. To improve on this we rely on the RePEc archive maintainers to help us do a better job. Here is some advice in this regard that they should heed, as any linked reference allows links back and forth between the citing and cited works, thus increasing visibility.


  1. Check out how successful CitEc is in extracting references from your series and journals. Maintainers receive monthly statistics about coverage that they can monitor. In addition, they can look up on CitEc the reasons why some items were not processed. For the series with the best coverage, see here.
  2. Make sure links in the metadata go directly to a pdf file, and not to an intermediate abstract page. CitEc does not go further than the link that is provided to it. If you really want the abstract page present in the metadata, provide it as a second link.
  3. Make sure that CitEc is actually allowed to get to the pdf. If the pdfs are gated, consider allowing CitEc to access with its IP, which will be provided upon request.
  4. If the above are not possible, or if for some other reason references cannot be parsed, one can also transfer references to CitEc by using the X-File-Ref construct in the metadata, as described here.
  5. For larger archives, an alternative way of transferring references can be arranged.
  6. Also, CitEc sometimes grabs too many references. This happens for working papers when a list of other papers in the series is appended. This is also a waste of paper. We strongly recommend not to have such lists and, where they are present, to alert CitEc so that these errors can be remedied.

Any request should be sent to José Manuel Barrueco, who is in charge of the CitEc project.


Using RePEc as a search tool

April 11, 2010

Different people experience RePEc through its different uses, sometimes without being aware of its other uses. The purpose of this post is to highlight the use of RePEc services for bibliographic searches.

Currently, there are three different websites that offer bibliographic searches based on the data collected by RePEc: EconPapers, IDEAS and EconomistsOnline. Why use them instead of simply Google or Google Scholar? First, RePEc services allow fielded search: given the structure of the underlying metadata, it is possible to separate search results by authors, topical area, date, publication type and other attributes. EconomistsOnline goes furthest here, allowing users to narrow result sets successively according to various criteria. Second, the database and the search engines are updated as soon as publishers post new material, thus search results always reflect current holdings. Finally, as RePEc is not a spider but rather a catalog indexed directly by publishers, contents are known to be related to research in Economics. Thus, there are no irrelevant search results.

In addition, there are plug-ins available for most popular browsers for both EconPapers and IDEAS. They allow users to search RePEc directly from the search bar in a browser.


Why Journals?

December 16, 2009

When I started studying economics in the ‘sixties, there was no Xerox. Journals were printed, and then mailed. Because printing (type-setting by hand, no computer at that time) was expensive, only selected articles were distributed through journals, and journal editors had to select carefully. Researchers and even students subscribed to journals in order to have articles of interest available; otherwise they had to copy them by hand, or excerpt them, or go to the library to have a look. Distribution by print was the cheapest and most economic way of distributing research.

Hence the journals had a dual function: 1.) They selected research articles and 2.) they distributed them. The first function (selection) was necessary because printing–especially printing of mathematical formulae–was quite expensive. So the bundling of selection and distribution had an economic reason.

This reason has vanished. It is possible to distribute practically for free (through MPRA for instance). So the question is: Do we need journals simply for the purpose of selecting articles, now that the function of distributing articles is redundant? Let me share some thoughts on the issue. I concentrate on research journals, whether open access or not. Survey journals like the Journal of Economic Perspectives or commentary journals like Economists’ Voice are another matter.

Do We Need Quality Stamps?

Some people argue that journals provide a “quality stamp” for scientific contributions, just like rating agencies assess firms or assets. We know that rating agencies may induce unwarranted herding effects, yet the point that journals perform a rating function is true. But is it needed? And if needed, can’t it be provided more cheaply?

As to whether a quality stamp is needed: This may be different for different groups of users. So look at different groups that may benefit from a quality stamp.

Researchers

In my fields of research, I certainly do not use journal names for selecting articles. I search the Web and have my subscription to NEP. Most articles (99%) in top-ranking journals are of no concern to me because they discuss issues I don’t work on and seem too specialized, technical and boring as to make it worthwhile to read more than the abstract. But I get the abstract much earlier through NEP and other services. And further, I obtain the articles I am really interested in much earlier (one or two years earlier) over the net than through the journals. (Note that the articles in good journals are typically available on the Net at the time of publication.) If I find an article on the net that I like, typically a pre-print, and see it later published in a good journal, I feel a kind of satisfaction about the journal, but this does not seem to justify the existence of journals.

Further, I am not interested in seeing only the good papers that some referees approve of. As I know my field, I do not think that referees know better. Actually many papers in top journals are not so good, and mediocre journals publish excellent papers. Further, many rejected papers are rejected for reasons such as being badly written, ill organized, or employing faulty reasoning, but they often do contain useful references and interesting ideas, and therefore they interest me as much as a superbly crafted paper elaborating on rather sterile detail.

Yet there may be the benefit coming with having an article revised during the refereeing process. The probability that the mathematics are correct is slightly increased. Typically, the exposition is improved, too. Further, the references are enlarged by adding some quite relevant stuff, but also by adding things suggested by the referees for sundry reasons that hurt overall consistency. But this does not hurt much.

The benefits going with having an article refereed carry side-effects, however: Sometimes the editors’ and referees’ demands make papers worse. In the same vein, have a look at Bruno Frey’s amusing paper, and especially at what he reports about Robert Frank.

Regarding the publishing of my own research I see that publishing in a journal does not affect citations, but making a paper available on the net does so. Hence journal publication is of very limited value to me (but I don’t have to care about the journals I am publishing in because I am close to retirement).

So, overall, I think that researchers do not benefit significantly from journals that publish research papers.

Hiring Committees

A benefit from having quality stamps is that this helps hiring committees to select candidates under conditions of ignorance. This may be true, but I would consider this a dysfunction: In the first place, hiring committees should comprise knowledgeable members; otherwise you would not need hiring committees and leave the decision to bureaucrats; and second, citation numbers are much better indicators for the impact of an author’s work than the journals the author has published in. So ignorant hiring committees may better resort to RePEc citation scores, rather than being enthused by journal titles. (But then they will end up with hiring candidates who work in fields many people work in. So they end up with conventional candidates, rather than creative ones. But this will be the case whenever you have incompetent hiring committees.) In any case, hiring committees won’t need journals, as RePEc citation scores are independent of journal names and do not rely on the existence of journals.

However, the reliance of hiring committees on journal rankings may entail strictly negative consequences. I read, for instance, that Notre Dame University intends to dissolve the department of economic history because the economic historians do not publish in mainstream journals.

It seems to me that hiring committees do not benefit from the existence of journals either.

Libraries

It is sometimes said that journals permit journal rankings, and this is a help for librarians for deciding which journal to subscribe to. This is, of course, not an argument for supporting journals. Without journals, there would be no problem of selecting journals, and the librarians could concentrate on selecting books.

So I conclude that libraries would perform better if we had no journals.

Economics Without Journals

Imagine we had no economics journals. What would happen? Presumably people would write more books. I would consider this an advantage, as knowledge is much too fragmented at the moment. Further, institutions would be in demand to channel the flow of information better than possible through journals, such as blogs specializing on some topic or another, and meta-blogs like Econ Academics. I could imagine that collections of papers on certain topics would emerge. The Special Issues feature of the economics E-journal provides an example.

A Suggestion for a Next Step

My impression is that the existence of journals is a feature of the past. Journals will die, and this will be an improvement for academic economics. The process will be sped up if new ways of channeling information are devised. So here is just one idea:

I could imagine RePEc devising a feature that lists papers related to any given paper. Google Scholar has a feature like that, but I think that could be improved tremendously for our specific purposes. An easy way would be to look at the citations of any given paper and give all papers with similar citations. This could, theoretically, be achieved by building on the citation data created by the CitEc project. If someone with programming expertise could adopt such a project, this would be a great help for economists world-wide. (As a side effect, such a feature would put pressure on Elsevier to release its citation data.)
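To make the idea concrete, here is a minimal sketch in Python. It reads "similar citations" as overlapping reference lists and scores overlap with the Jaccard index; both choices are assumptions, as are the function names and the toy handles. It does not describe anything CitEc actually provides.

```python
def jaccard(a, b):
    """Share of references two papers have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_papers(target, references, top_n=5):
    """Rank the other papers by how much their reference lists overlap
    with the reference list of the target paper."""
    target_refs = references[target]
    scores = [
        (other, jaccard(target_refs, refs))
        for other, refs in references.items()
        if other != target
    ]
    # Drop papers with no shared reference, then sort best match first
    scores = [(paper, score) for paper, score in scores if score > 0.0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical data: paper handle -> set of handles it cites
references = {
    "paper:A": {"ref:1", "ref:2", "ref:3"},
    "paper:B": {"ref:2", "ref:3", "ref:4"},
    "paper:C": {"ref:3"},
    "paper:D": {"ref:9"},
}

for paper, score in related_papers("paper:A", references):
    print(paper, round(score, 2))
```

In the toy data, paper B shares two of four distinct references with paper A and ranks first, paper C shares one of three, and paper D shares none and is dropped. A real implementation would need to cope with the incomplete coverage of citation data.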

There are certainly many more suggestions. I am looking forward to seeing them, perhaps in comments to this blog. And certainly my general point must be controversial. I must have overlooked some important aspects. The world cannot be as inefficient as I portray it. Otherwise we would have no journals right now.

Maybe we can have an exchange of ideas.


About RePEc impact factors

July 27, 2009

Impact factors have always been a popular way to measure the influence of academic journals. They have been popularized by ISI, now part of Thomson. RePEc also provides impact factors, and this post is about explaining the differences between the two.

ISI takes a sample of journals and analyzes the citations across those journals. To be eligible, a citation has to appear within two years of the publication of the cited article, the cited article must be printed (not forthcoming, a working paper or a manuscript), and the cited article must be among the analyzed journals (286 in Economics). ISI is currently experimenting with a five-year window, in addition to the existing two-year window.

RePEc considers all publications listed in its bibliographic database. Thus, it also considers other publication forms than journal articles: close to 1000 journals and 2600 working paper series. It imposes no time window, citations of any age qualify. In most cases, a citation of a working paper will count towards its published form once the article is included in RePEc, possibly after the original citation (condition: at least one author has both versions in his/her RePEc profile). This implies that working paper series and book series can also have impact factors. RePEc is thus more comprehensive.

However, the pool of citations RePEc is drawing from is different. It relies very much on working papers (which can later be published), as they are typically openly accessible. Some publishers also provide references in the bibliographic metadata, but not all. One implication of this is that RePEc is more current as it includes citations to and from research that is not yet published. As research gets published, this data gets updated. But as references from many journals are missing, RePEc citation data must still be treated as experimental. Whether these omissions matter remains to be seen. After all, impact factors always have to be considered in relative terms, not in absolute terms, and if omissions were not biased, they would not matter.

Another major difference is that RePEc excludes self-citations. This is an important issue as some journals, explicitly or implicitly, encourage authors to cite other articles published within the two year window in the same journal. Thus, just as self-citations are excluded for authors, they are excluded for journals. And this can matter a lot.

Finally, the impact factor is determined by dividing the eligible citations by the number of eligible articles. ISI determines itself what articles are eligible for the denominator, and this can even be negotiated with the publisher. In RePEc’s case, if an article (or a working paper) is listed, it counts without adjustment.
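As an illustration only, the RePEc-style calculation described above (no time window, self-citations of the journal or series excluded, every listed item counted) could be sketched as follows. The function name, the data layout and the handles are hypothetical, and this is not the code RePEc runs.

```python
def simple_impact_factor(series, items_in_series, citations):
    """Citations received by a series from other series, per listed item.

    citations: iterable of (citing_series, cited_series) pairs.
    """
    if items_in_series == 0:
        return 0.0
    eligible = sum(
        1
        for citing, cited in citations
        # Count citations to this series, skipping self-citations
        if cited == series and citing != series
    )
    return eligible / items_in_series


# Hypothetical citation pairs: (citing series, cited series)
citations = [
    ("wp:X", "jnl:Y"),
    ("jnl:Y", "jnl:Y"),  # self-citation, excluded
    ("jnl:Z", "jnl:Y"),
    ("jnl:Y", "wp:X"),
]

print(simple_impact_factor("jnl:Y", 4, citations))
```

With four listed items and two eligible citations, the hypothetical journal gets an impact factor of 0.5. The self-citation is dropped from the numerator, while all four items stay in the denominator.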

RePEc also publishes variations on the “simple” impact factor: recursive impact factors, where every citation is weighted by the impact factor of the citing publication, which favors impact over numbers; discounted impact factors, where the impact of a citation decays with time (regardless of the age of the cited item); and a combination of the two, discounted recursive impact factors. Finally, there is now also the h-index. All variations have a different story to tell about the publication, and RePEc offers the reader the choice.
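For readers unfamiliar with the h-index, here is a minimal sketch of the standard definition: the largest h such that h items have at least h citations each. Applying it to the items of a journal or series is an assumption made for illustration, and the details of RePEc's own computation are not covered here.

```python
def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the items of one series
print(h_index([10, 8, 5, 4, 3]))
```

In this example four items have at least four citations each, but there are not five items with at least five, so the h-index is 4.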


Parsing citations

November 22, 2008

One of the services RePEc offers to authors is the discovery of citations, CitEc. This is a difficult undertaking as this needs to be done entirely automatically. As project leader José Manuel Barrueco Cruz discusses in a previous post, the reference section of a paper is extracted through a series of steps: pdf download, file conversion to PostScript, further conversion to plain text, identifying reference section. In each of these steps there are losses.

But even once the reference section is in hand, we are not out of trouble. One needs to identify where each reference starts and ends, then try to match it with something already in RePEc. Considering all the different citation styles, typos, and plain errors, this is a daunting task. Matches that are sufficiently close are counted as citations, matches that are in some grey zone are fed to the RePEc Author Service to solicit the author’s help in sorting them out. Below are a few examples of what is offered to authors, for the case of a classic article by Gary Becker, Kevin Murphy and Robert Tamura, Human capital, fertility and economic growth:


  • [3] Becker, G.; Murphy, K. ald Tamura, R. (1993)Humall capital, fertility ald ecollomic growth 01 Humall Capital, third editioll, Gary Becker.
  • Becker, Gary S.; Murphy, Kevin M.; and Tamura, Robert. Human Capital, Fertility, and Econonric Growth, Journal of Political Economy, October 1990 98(5) Part 2, pp. S12-S37.
  • Becker GS, Murphy KM, Tamura R (1990) Human capital, fertility and economic growth. J Polit Econ 98:S12–S37.
  • 1-25. Kevin M. Murphy, and Robert Tamura, Human Capital, Fer- tility and Economic Growth, Journal of Political Economy, October
  • BECKER, 0. S., K. M. MURPHY and R. TAMURA (1990) Human Capital, Fertility and Economic Growth, Journal of Political Economy 98, S 12-37.
  • [6] Becker, G., Murphy, K. and Tamura, R. (1990), Human capital, fertility, and economic growth, Journal of Political Economy, vol. XCVIII, pp.12-37.
  • Population and Development Review, Vol.12, Supplement: Below-Replacement Fertility in Industrial Societies: Causes, Consequences, Policies, pp. 69-76. Becker, Gary; Kevin Murphy, y Robert Tamura. (1990). Human Capital, Fertility and Economic Growth. The Journal of Political Economy, Vol.98, No.5, Part 2: The Problem of Development: A Conference of the Institute for the Study of Free Enterprise System, S12-S37.
  • (March/April 1973 Supplement), S279-88. ______________ Kevin M. Murphy, and Robert Tamura, Human Capital, Fertility, and Economic Growth, Journal of Political Economy, XCVIII
  • Becker, S. Gary, Kevin, M. Murphy and Tamura, Robert (1990). `Human Capital, Fertility, and Economic Growth The Journal of Political Economy, Vol. 98, Issue 5, Part 2, Oct. 1990, pp. S12-S37.
  • Bankconference on developmenteconomics. ecker, Gary, KevinMurphy, and RobertTamura. 1990. Human Capital, Fertility, and EconomicGrowth., Journal of PoliticalEconomy 98, 5, Part 2, pp. S12-S37.

These examples show what can go wrong in the file conversion and how citing authors can make mistakes. Still, CitEc has been able to recognize these references, but is not sure enough about them.

This also highlights that we try to minimize errors, even if this means leaving good citations out. Other citations services may have a different approach.
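The three-way decision described above (count as a citation, ask the author, or discard) can be illustrated with a small sketch. It uses Python's standard difflib to score the similarity of two title strings. The thresholds, the function names and the choice of similarity measure are assumptions made for illustration, not CitEc's actual algorithm.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters, digits and single spaces."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def classify_match(extracted_title, catalog_title, accept=0.9, grey=0.6):
    """Decide what to do with a candidate match between a title pulled
    from a reference list and a title already in the catalog."""
    score = SequenceMatcher(
        None, normalize(extracted_title), normalize(catalog_title)
    ).ratio()
    if score >= accept:
        return "citation"    # close enough to count automatically
    if score >= grey:
        return "ask author"  # grey zone: let the author confirm
    return "no match"        # too different: leave it out


catalog = "Human Capital, Fertility, and Economic Growth"
print(classify_match("Human capital, fertility and economic growth", catalog))
print(classify_match("Humall capital, fertility ald ecollomic growth", catalog))
```

Raising the accept threshold trades missed citations for fewer false ones, which is the conservative choice described above.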


RePEc as a bibliographic tool

September 14, 2008

RePEc is a scheme to collect bibliographic information about publications and pre-publications in Economics. Publishers provide all the relevant information, which is then displayed in various ways by RePEc services. This allows users to have access to this data. While it is useful to find items of research while browsing or searching through these services, it is even better when one can upload the relevant bibliographic data directly into one’s bibliographic tool.

Every abstract page on IDEAS has links that allow users to download such bibliographic information in various formats: as an HTML citation, a plain text citation, the BibTeX entry familiar to LaTeX users, the RIS format used in various software like EndNote, and the ReDIF format used by RePEc. For registered authors, it is also possible to obtain these records for all their publications in one download. If other formats are used in the research community, they can be provided as well. Just ask.


Classifying authors

March 16, 2008

A difficult task librarians often face in the classification of items is determining whether authors with similar names are the same person. Indeed, bibliographic records are most of the time very limited in author identification. Take the case of Adam Smith. He may be listed under his full name, which is by no means unique, or worse only as A. Smith, which is easily confused with others. Librarians then rely on context and additional information gathered outside of the bibliographic record to attribute the work to the right person, hopefully without error.

With the large number of works now available, such laborious categorization becomes unfeasible, and automatic classification makes numerous errors. Within RePEc, we rely on the authors themselves to perform the classification. When they register in the RePEc Author Service, they have the opportunity to enter all the possible name variations under which they may be listed in a bibliographic record. For John Maynard Keynes (who is not registered), such name variations could be:

John Maynard Keynes
John M. Keynes
John Keynes
J. M. Keynes
J. Keynes
Keynes, John Maynard
Keynes, John M.
Keynes, John
Keynes, J. M.
Keynes, J.

In addition, an author may have changed names (through marriage), be listed with a title (Prof., Sir) or with a suffix (Jr, Sr, III). Variations multiply if names have accents, which some publishers do not take into account or encode in the wrong character set. The possibilities are numerous. The registered author is then offered first suggestions of works that match the name variations and then suggestions that offer some close match to name variations (typographical errors happen). The author can then accept these works or reject them.
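The ten basic variations listed for Keynes follow a regular pattern: five ways of writing the given names, each in natural and in inverted order. The sketch below generates them for a three-part name. The function name is hypothetical, and the sketch deliberately ignores titles, suffixes, name changes and accents, so it is not a description of how the RePEc Author Service builds its suggestions.

```python
def name_variations(first, middle, last):
    """Return the ten basic spellings of a first-middle-last name."""
    first_initial = first[0] + "."
    middle_initial = middle[0] + "."
    given_forms = [
        f"{first} {middle}",                  # John Maynard
        f"{first} {middle_initial}",          # John M.
        first,                                # John
        f"{first_initial} {middle_initial}",  # J. M.
        first_initial,                        # J.
    ]
    natural = [f"{given} {last}" for given in given_forms]
    inverted = [f"{last}, {given}" for given in given_forms]
    return natural + inverted


for variation in name_variations("John", "Maynard", "Keynes"):
    print(variation)
```

Called with John, Maynard and Keynes, this reproduces the list given for Keynes in the same order.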

The RePEc Author Service has so far managed to collect data from close to 16,000 authors who have claimed over 300,000 works as theirs. Such data is in particular used to increase the accuracy of various rankings. And within this set of authors, there is already a large number of homonyms, even when one looks beyond the initial of the first name, which is the precision that some other services have.

If you know of other homonyms in the profession, encourage them to register!

