One benefit for participants in RePEc, be they publishers or authors, is that they get some statistics on the popularity of their works. Beyond citation counts, RePEc also provides download counts and what we call abstract views, that is, the number of times web pages displaying paper abstracts have been viewed on participating RePEc services. This post explains how these last two statistics are computed.
LogEc is the site that holds these statistics. Not all sites using RePEc data report the necessary data, but some of the most popular ones do: EconPapers, IDEAS, NEP, and Socionet. The statistics are based on web server logs, but those contain a lot of unwanted traffic: robots that populate search engines, scripts that scrape the sites, other automated processes, and various abuses. The idea is to count only traffic from humans interested in the indexed papers, so the logs need to be trimmed.
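As a rough illustration of this kind of trimming, here is a minimal Python sketch of a first filtering pass. It is not LogEc's actual code. It assumes a hypothetical, already parsed log record (`Hit`), treats any user agent containing a few common crawler tokens as a self-declared robot, and approximates a "user" by IP address plus user agent. The handles, IP addresses, and token list are illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified log record; real server logs carry more fields."""
    ip: str
    user_agent: str
    handle: str  # identifier of the paper whose page was requested
    kind: str    # "abstract" or "download"


# Illustrative tokens only: many search-engine crawlers declare themselves
# with words like these in their User-Agent string.
SELF_DECLARED_BOT = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def first_pass(hits):
    """Drop self-declared robots and count each (user, paper, kind) only once.

    Deduplication spans whatever batch of hits is passed in (for example,
    one month of logs); that time window is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for hit in hits:
        if SELF_DECLARED_BOT.search(hit.user_agent):
            continue  # robot that announces itself
        # Approximating a "user" by IP address plus User-Agent is an assumption.
        key = (hit.ip, hit.user_agent, hit.handle, hit.kind)
        if key in seen:
            continue  # repeated view or download by the same user
        seen.add(key)
        kept.append(hit)
    return kept
```

Fed a repeated abstract view from the same visitor, a request whose user agent contains `Googlebot`, and one download, this sketch keeps one abstract view and the download. Scripts that do not announce themselves slip through a filter like this, which is why further passes are needed.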
We are going to take the example of IDEAS, the most popular of the reporting sites. In October 2021, IDEAS got 43,643,586 page views. Of those, 30,336,150 were views of abstract pages and 565,241 were full-text downloads. A first pass removes obvious robots (mostly self-declared robots from search engines) as well as multiple views or downloads from the same user. This brings us down to 20,573,559 abstract views and 375,028 downloads. The next pass trims further, mostly removing additional accesses by various scripts. We now have 2,123,054 abstract views and 339,384 downloads. We are, however, not done yet. A number of red flags have been identified, that is, problematic cases that need to be checked by hand against the server logs. After vetting those, we arrive at the final numbers of 1,986,493 abstract views and 339,039 downloads. In other words, we counted only 6.5% of the raw abstract views and 60% of the raw downloads. In some months the proportion is even lower, for example when IDEAS has been subjected to denial-of-service attacks.
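To make the trimming funnel explicit, the short Python script below takes the IDEAS figures for October 2021 given above and computes, for each stage, the share of raw abstract views and downloads that remain. The stage labels are our own shorthand; the numbers are the ones reported in this post.

```python
# IDEAS, October 2021: (stage, abstract views, downloads), as reported above.
STAGES = [
    ("raw logs", 30_336_150, 565_241),
    ("after first pass", 20_573_559, 375_028),
    ("after second pass", 2_123_054, 339_384),
    ("after manual vetting", 1_986_493, 339_039),
]


def retention(stages):
    """Return (stage, % of raw abstract views kept, % of raw downloads kept)."""
    raw_views, raw_downloads = stages[0][1], stages[0][2]
    return [
        (name, 100 * views / raw_views, 100 * downloads / raw_downloads)
        for name, views, downloads in stages
    ]


if __name__ == "__main__":
    for name, views_pct, downloads_pct in retention(STAGES):
        print(f"{name:22} {views_pct:6.1f}% {downloads_pct:6.1f}%")
```

The last row reproduces the 6.5% and 60% quoted above. The intermediate rows show where the cuts happen: the second pass is what removes most abstract views (from about 67.8% to about 7.0% of the raw count), while downloads lose most of their volume in the first pass.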