How abstract views and downloads are counted

September 19, 2009

Authors and RePEc archive maintainers receive monthly emails with various statistics, and among the most anticipated statistics are our abstract views and download counts. It is important to understand how those statistics are collected and what they measure (and do not measure). Full statistics are available on the LogEc website managed by Sune Karlsson from Örebro University (Sweden).

Participating RePEc services (EconPapers, IDEAS, NEP and Socionet) keep a log of all activity on their sites. This allows us to count page views for the abstract page of each item in the database (excluding NEP, as abstracts are listed in emails). Logs also record outclicks as users leave the RePEc services for the sites containing the full texts they seek to download. This allows us to count “downloads”. Quotation marks are required as it is impossible to record whether the download was successful, for example in the case of gated publisher sites. Note also that this means that downloads that have not transited through a RePEc service cannot be counted, as we do not have access to local logs.

LogEc gathers the logs from the participating services and aggregates the statistics. This involves much more than bean counting, though. Indeed, one first needs to exclude robot activity, as only human activity is of interest. Some robots declare themselves as such, but others hide their identity. One thus has to infer from various patterns which IP addresses are likely robots. This is an important step, as robots typically represent 75% of raw abstract views. Robots include spiders from many search engines as well as other initiatives on the Internet.
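The two kinds of robots described above, declared and undeclared, can be illustrated with a minimal sketch. This is not LogEc's actual procedure: the log record layout (`ip`, `user_agent`), the marker substrings and the hit threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed substrings that self-declared robots often include in their user agent.
ROBOT_MARKERS = ("bot", "spider", "crawler")


def is_declared_robot(user_agent):
    """True if the user agent string announces itself as a robot."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def likely_robot_ips(hits, max_hits_per_ip=1000):
    """Return IPs that declare themselves as robots or exceed a hit threshold.

    The threshold is a crude stand-in for the "various patterns" mentioned
    in the text; the real criteria are not public.
    """
    counts = Counter(hit["ip"] for hit in hits)
    declared = {hit["ip"] for hit in hits if is_declared_robot(hit["user_agent"])}
    heavy = {ip for ip, n in counts.items() if n > max_hits_per_ip}
    return declared | heavy


def human_hits(hits, max_hits_per_ip=1000):
    """Keep only the hits whose IP was not flagged as a likely robot."""
    robots = likely_robot_ips(hits, max_hits_per_ip)
    return [hit for hit in hits if hit["ip"] not in robots]
```

A fixed threshold like this would both miss slow robots and flag busy shared proxies, which is one reason pattern-based inference needs more care than a single rule.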

One also needs to weed out multiple views or downloads by the same user. This brings us to detecting attempts by authors at increasing their counts. Obviously, we cannot reveal here how this is done, but let it be known that we have detected fraud even by authors using multiple Internet service providers. The methods used lead to some undercounting, though. Multiple users behind the same cache server may be counted only once, as may happen, for example, to employees of the US Federal Reserve Banks who use RePEc.
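To see why removing repeat views leads to undercounting, consider a deliberately naive sketch that counts each combination of IP address, item and month only once. The record fields (`ip`, `item`, `month`) are assumptions, and this is not the undisclosed method LogEc uses.

```python
def count_unique_views(hits):
    """Count each (ip, item, month) combination once per item."""
    seen = set()
    counts = {}
    for hit in hits:
        key = (hit["ip"], hit["item"], hit["month"])
        if key in seen:
            continue  # repeat view from the same address: ignored
        seen.add(key)
        counts[hit["item"]] = counts.get(hit["item"], 0) + 1
    return counts
```

Under such a rule, two different readers behind one cache server share an IP address and are counted as a single view, which is exactly the undercounting described above.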

And we are still not done pruning. LogEc then checks for additional patterns that need to be vetted by a human eye. Unusual activity is then checked and often reconciled with traffic from popular blogs, magazines and newspapers. But on other occasions, traffic surges cannot be explained in licit ways and need to be cleaned out.

After all these manipulations, statistics are published and disseminated. And despite substantial pruning, RePEc services still get over 2,000,000 abstract views and 600,000 downloads every month. See LogEc for details.


RePEcFB – An integration of your RePEc data into your Facebook profile

September 9, 2009

Following a suggestion on this blog and the creation of a RePEc Facebook group, we are happy to announce that a new service went online last week. The Facebook application RePEcFB allows Facebook users to integrate their RePEc data into Facebook. Economists on Facebook can create a small profile box listing their recent work, or a “My research” tab in the Facebook profile giving information about their working papers, publications and other research output. Users can list their affiliations and professional contact data, announce recent papers authored by their Facebook friends, or inform about conferences and other academic events they are going to attend. New papers or affiliations can be directly posted to the Wall and can be commented on by friends.

To use the application you need both a Facebook account and a RePEc author profile with the RePEc Author Service. Detailed instructions can be found on the Notes tab of the application’s homepage.

RePEcFB was written by Ben Greiner with the help of László Kóczy, Sune Karlsson, and Thomas Krichel. The software is hosted on Sune’s server at Örebro University. The software is under ongoing development, so feel free to send comments to the author.


MPRA, the Munich Personal RePEc Archive

August 27, 2009

The Munich Personal RePEc Archive (MPRA) was started three years ago. It has developed into one of the largest archives within the RePEc network, comprising roughly 9000 items at the time of writing. Christian Zimmermann has suggested that I share some thoughts about its history and functioning.

The initial idea occurred to me when I heard that the Economics Working Paper Archive (EconWPA), run by Bob Parks, was discontinued in 2005. EconWPA offered the possibility for individual authors to make their contributions accessible to the community through the RePEc network, given that only institutions can set up RePEc archives. Although we have in Munich our discussion paper series integrated into RePEc, not all economists are so fortunate, and the need for a personal archive (as distinct from an institutional archive) was apparent.

Given that we had successfully established our department’s discussion paper series with the EPrints software, it appeared technically feasible to clone the software and use it for a personal RePEc archive. Discussion on the internal RePEc list led to the name “Munich Personal RePEc Archive,” the main concern being to clarify that the archive was intended as a RePEc service, rather than something original, and that the name would not exclude other personal RePEc archives in other locations. (If one of the other Munich universities wants to start another personal archive, we may get into a problem…)

I asked Volker Schallehn from the University Library, who has implemented the EPrints software for our university archives, whether he could help with such a project. He agreed to help. The next step was to convince the president of the university as well as the director of the library to agree to dedicate some resources to an endeavor that would not serve people from Munich at all. They were in favor, and so we got started on September 19, 2006.

From a technical point of view the main problem was to automate as much as possible, as we could not supply manpower: the generation of title pages, the creation of metadata in the ReDIF format required by the RePEc harvester, and the linking to the RePEc Author Service. With the help of Thomas Krichel, Christian Zimmermann, Kit Baum, Sune Karlsson, Ivan Kurmarov, and others we managed to solve these problems and set up the website. We found editors. They do the main job now. The English editors often handle more than 50 submissions per day.
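For readers unfamiliar with ReDIF, the following minimal sketch shows what generating a working-paper template might look like. It is an illustration only, not the code MPRA runs: the function name and its parameters are hypothetical, and only a handful of common ReDIF-Paper fields are emitted.

```python
def redif_paper(handle, title, authors, abstract, date, file_url, number):
    """Build a minimal ReDIF-Paper template as plain text.

    All arguments are strings, except authors, which is a list of names.
    """
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    lines += [f"Author-Name: {name}" for name in authors]
    lines += [
        f"Title: {title}",
        f"Abstract: {abstract}",
        f"Creation-Date: {date}",
        f"File-URL: {file_url}",
        "File-Format: application/pdf",  # assumes the full text is a PDF
        f"Number: {number}",
        f"Handle: {handle}",
    ]
    return "\n".join(lines) + "\n"
```

A call such as `redif_paper("RePEc:xxx:wpaper:1", "A Title", ["Doe, Jane"], "An abstract.", "2009-08", "http://example.org/1.pdf", "1")` returns one template; the handle and URL here are placeholders, not real records.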

As the EPrints software makes it possible to establish series in different languages, we decided to use this feature and to offer the service in all languages for authors who deal with country-specific issues and want to make their research available in their local language. However, we require English abstracts for all submissions so that all users can obtain an impression of what economists writing in other languages do and, if necessary, contact them. This feature has led to quite a number of submissions in languages like Spanish or French, and to some smaller sets in Turkish, Arabic, and others. (Some of them look extremely pretty.) Maybe this feature creates a sense that all economists world-wide see themselves as members of a community with the common purpose of helping to improve living conditions around the globe.

A central motivation for establishing a pre-print archive like MPRA was to enable authors to secure the copyrights for their pre-print versions in case the copyright for the final article goes to the publisher. This permits open access to their work, even if publishers try to make the final work inaccessible for the non-paying public. This is a great convenience for academics and, I hope, generates a countervailing power that keeps a check on journal prices. Further, this arrangement provides a means for the authors to make their work accessible to others through the RePEc services.

As an unintended by-product some authors have received requests from publishers to publish their contribution in a volume or journal. This may indicate a trend for the future: while authors used to submit their works to publishers (and pay for it), in the future you may simply put your stuff on the net, and publishers will approach you in order to create collections that generate value added beyond mere publication, such that people and libraries are willing to pay for it. If MPRA could contribute to such a development, this would be nice.

It is quite astonishing to me how many good papers we obtain, in spite of the fact that we do no refereeing at all. (The editors check only some formal aspects, making sure that the submission is of academic nature, and a certain convention has emerged in this respect.)

MPRA offers a public forum for publishing papers, but not only that: It offers the possibility to publish comments on papers in the archive. This feature is not used. Maybe somebody has a suggestion how to organize discussions around papers such that people actually feel inclined to use such a feature.

So much about MPRA. If you have any suggestions, please feel free to communicate and discuss them on this blog.


On versioning in RePEc

August 21, 2009

RePEc carries research in various formats. While journal articles are unique (with very few exceptions), working papers, as they are pre-prints, may be duplicates of listed articles, and they may even appear in different versions, either because they are published in different series, or because there may be updates within a series. We believe that it is important to carry all versions, not just the last one, for the following reasons.


  1. Time-stamps: A working paper makes it possible to establish when some research was conducted and thus to determine the precedence of research ideas. Given publication delays in Economics, this can be important.
  2. Open access: Many journal articles have gated access. Such restrictions can be bypassed by reading working papers, which are mostly open access.
  3. Link to published version: It is still preferred to use published versions in citations, especially once a paper is accepted in a journal. The originally cited working paper is often linked to its published version.
  4. Visibility: Working papers are read much more than journal articles, both because they are more current and because they are freely available. In addition, working papers are disseminated through NEP.

The process of linking the various versions of the same work is not obvious, however. With about 800,000 works in RePEc, performing matches on titles is a daunting task, especially as fuzzy matching is necessary due to slight variations in punctuation and spelling. For this reason, we do the matching only across the works listed in an author’s profile. This ensures that the likelihood of two matched works being different versions of the same one is very close to 100%. But this also means that such matching cannot be done for works where none of the authors is registered, or where a registered author did not add all versions to the profile, thereby indicating, rightly or wrongly, that he/she is not the author of this particular version.
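As an illustration of fuzzy title matching restricted to a single author's profile, here is a minimal sketch using Python's standard `difflib`. The normalization rules, the 0.9 threshold and the function names are assumptions; RePEc's actual matching procedure may differ.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def likely_versions(profile, threshold=0.9):
    """Return pairs of handles in one profile whose titles nearly match.

    profile maps each handle to its title; threshold is an assumed
    similarity cut-off between 0 and 1.
    """
    pairs = []
    for (h1, t1), (h2, t2) in combinations(profile.items(), 2):
        ratio = SequenceMatcher(None, normalize(t1), normalize(t2)).ratio()
        if ratio >= threshold:
            pairs.append((h1, h2))
    return pairs
```

Because the comparison runs only over the works in one profile, the number of pairs stays small and a near-identical title is strong evidence of a shared work, whereas comparing all 800,000 titles against each other would be both costly and error-prone.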

In some cases, titles change across versions, or journal editors require a title change. In such cases, a manual link between versions can be added: just contact a member of the RePEc team with the relevant RePEc handles.


Suggestion box

May 23, 2009

RePEc is entirely driven by volunteers, who are also users. Most current volunteers came to RePEc either because they wanted to help with a current project or because they had some idea they wanted implemented in RePEc. We are opening this suggestion box for several reasons: as a way to encourage feedback, to encourage more volunteers to come forward and pick a suggestion, and finally to have users and RePEc team members discuss the proposed suggestions.

At RePEc, we like to be open. After all, we are creating open bibliographies using open source software, and we encourage open access. RePEc is there for you, so tell us how you want it to be. So, make your suggestion in the comment section below.


Institutional data in RePEc

December 19, 2008

RePEc gathers information not only about publications and authors, but also about institutions. Specifically, the EDIRC project (Economics Departments, Institutes and Research Centers) has catalogued since 1995 all academic and government institutions that employ a significant share of economists, including think tanks and associations. For-profit organizations (banks, consultants, etc.) are listed if they contribute their publications to RePEc. As of today, 11,000 institutions are listed, including over 600 associations. Over 4000 have at least one registered author and about 1000 have some publication in RePEc.

The collected institutional data is used and displayed in various ways throughout RePEc. Authors use it when they register to determine their affiliations. So do RePEc archives for their publications. Author and institution data are combined on EDIRC to compile the publication output of all institutions. This is then combined with citation data from CitEc and download data from LogEc to determine institutional rankings.

Note that all the information about institutions has been gathered with the help of a lot of people.


RePEc as a bibliographic tool

September 14, 2008

RePEc is a scheme to collect bibliographic information about publications and pre-publications in Economics. Publishers provide all the relevant information, which is then displayed in various ways by RePEc services. This allows users to have access to this data. While it is useful to find items of research while browsing or searching through these services, it is even better when one can upload the relevant bibliographic data directly into one’s bibliographic tool.

Every abstract page on IDEAS has links for downloading such bibliographic information in various formats: as an HTML citation, a plain text citation, the BibTeX entry familiar to LaTeX users, the RIS format used in various software like EndNote, and the ReDIF format used by RePEc. For registered authors, it is also possible to obtain these records for all their publications in one download. If other formats are used in the research community, they can be provided as well. Just ask.
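To give an idea of what such an export involves, here is a minimal sketch that formats a working-paper record as a BibTeX `@techreport` entry. The record layout and the function name are hypothetical, and the output is not necessarily identical to what IDEAS produces.

```python
def to_bibtex(key, record):
    """Format a working-paper record (a dict of strings) as BibTeX.

    record["authors"] is a list of names; the other values are strings.
    """
    fields = {
        "author": " and ".join(record["authors"]),
        "title": "{" + record["title"] + "}",  # extra braces preserve capitalization
        "institution": record["institution"],
        "type": record["series"],
        "number": record["number"],
        "year": record["year"],
    }
    lines = [f"@techreport{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)
```

Pasting the returned string into a `.bib` file makes the entry citable from LaTeX under the chosen key.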


NEP alerts now available through RSS

August 13, 2008

NEP (New Economics Papers) is an email service that alerts subscribers to new online working papers in their area of interest. About 80 fields are currently available, and the roughly weekly emails are sent free of charge. While the RePEc team thought email dissemination was sufficient, there also appears to be demand for RSS feeds as for this and other blogs. This is now available, and the RSS feeds can be subscribed to by clicking on the relevant field report on the NEP home page.

This new feature was added in typical RePEc fashion: David Hugh-Jones inquired with Marco Novarese why there was no RSS feed, Thomas Krichel encouraged David to set it up, and two days later, it was up.

If you think new features should be added to RePEc, we always welcome suggestions, especially if you are willing to do it yourself… much like many of the available NEP editors have been volunteers who just wanted a particular field to be covered.


Using RePEc for syllabi, bibliographies and publication lists

July 13, 2008

As highlighted in a recent post, we encourage deep linking in RePEc services. This is particularly useful for reading lists and syllabi. In fact, IDEAS provides simple tools to create such lists on its web site.

The first one creates reading lists from code that is similar to HTML and includes handles of items listed in RePEc. Each of these items is then automatically matched with other versions, making it possible to find a free version of a password-protected article, or to find the latest version of a working paper as published in a journal. Different layouts are possible: one for a course syllabus, one for reading lists.

The second one creates a list of publications from a set of authors registered on RePEc. Existing examples include ex-pats from some countries, graduates from programs, winners of prizes, etc. Note that such lists are automatically computed for members of research units or departments; see the listings on EDIRC. For other lists, this tool comes in handy.


Why hotlinking to a RePEc service makes sense

June 27, 2008

Hotlinking is the practice of linking to a web page deep in a web site, instead of its front page. This practice is discouraged by many news sites, both because they prefer users to browse through the site and because links may become obsolete.

At RePEc, we actually encourage hotlinking. Links in RePEc services are designed to stay current (in principle). Also, instead of linking to a PDF file on a researcher’s web page, which may disappear, abstract pages on EconPapers or IDEAS are much more stable. In addition, these abstract pages may provide links to other versions of the paper. This proves particularly useful if the user does not have access to a password protected article from a commercial publisher, or if the user wishes to know whether the paper has been published. Other links on the abstract page can also be valuable, like those to author profiles, references, citations and related works. Finally, authors always appreciate when paper downloads are counted towards their statistics. Indeed, RePEc can only monitor traffic routed through its services.

Therefore, we encourage hotlinks to RePEc services on blogs, online syllabi, personal web pages, online bibliographies, etc.

