Back in May 2012, we were complaining that reported traffic on RePEc sites was declining. This trend has continued and we need to revisit the issue.
Looking at the graphs at LogEc, it is quite obvious that traffic is not increasing as you would expect, especially considering that there is more and more material indexed on RePEc. Before looking for the reasons, we need to explain how these statistics are computed.
Only a limited set of RePEc services report the detailed traffic statistics needed to compute these figures: EconPapers, IDEAS, NEP and Socionet. Aggregate numbers are not sufficient for other RePEc services to report: one needs a lot of detail to determine whether traffic is robotic or human, to remove duplicates and to detect fraud attempts. In fact, about 90% of total traffic is rejected for statistical purposes on those grounds. This complexity explains why several sites that use RePEc data report nothing about their traffic, including EconLit, EconStor, Google Scholar, Inomics, Microsoft Academic Search, OAISter/WORLDCAT, Scirus, Sciverse, and very likely more. The fact that the data collected by RePEc is used in many places is not contrary to our mission: we want to improve the dissemination of research in Economics. But we seem able to track only a fraction of its use. As the number of RePEc services reporting statistics has not increased while the number of sites using RePEc data has, the decrease in reported traffic could be explained by cannibalization. Overall use may have increased, and user satisfaction with it, but we cannot demonstrate it.
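To give a sense of the kind of processing involved, here is a minimal sketch of log filtering of the sort described above. The field names, robot markers and deduplication rule are illustrative assumptions, not the actual LogEc rules.

```python
# Illustrative sketch of usage-log filtering: drop robotic hits and
# count at most one view per (IP, item, day). The field names and the
# robot markers are assumptions for this example, not LogEc's real rules.
from collections import Counter

ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent markers

def count_human_views(log_entries):
    """Return per-item view counts after robot and duplicate filtering."""
    seen = set()
    views = Counter()
    for entry in log_entries:
        agent = entry["user_agent"].lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue  # skip traffic that looks robotic
        key = (entry["ip"], entry["item"], entry["date"])
        if key in seen:
            continue  # skip repeat views by the same user on the same day
        seen.add(key)
        views[entry["item"]] += 1
    return views

log = [
    {"ip": "1.2.3.4", "item": "paper-A", "date": "2014-07-01",
     "user_agent": "Mozilla/5.0"},
    {"ip": "1.2.3.4", "item": "paper-A", "date": "2014-07-01",
     "user_agent": "Mozilla/5.0"},   # duplicate: dropped
    {"ip": "5.6.7.8", "item": "paper-A", "date": "2014-07-01",
     "user_agent": "Googlebot/2.1"}, # robot: dropped
]
print(count_human_views(log))  # only the first entry is counted
```

Even in this toy example, two of the three raw hits are rejected, which illustrates how a large share of raw traffic can disappear from the reported numbers.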
Of course, given that we filter the traffic statistics, we may be filtering too much, and increasingly so. We have indeed tightened some rules over time, mostly to avoid counting new traffic patterns that are visibly not legitimate. For example, in July 2014 IDEAS threw out 3.4 million abstract views (or two per listed abstract) thanks to a single pattern rule introduced about a year ago. But this pattern was not previously problematic, so it is difficult to conclude that such tightening can explain a reduction in traffic. It remains a fact that the proportion of traffic that is excluded is steadily increasing. In raw numbers, IDEAS keeps breaking records; in filtered numbers, traffic is declining. Is it because there really are more and more robots out there?
The same applies to other potential explanations. Several institutions cache our websites, and several have all their members access the web through a single IP address, making them indistinguishable to us. In both cases, downloads by different users look to us like they come from the same person and are counted only once. Is this more prevalent than before? Yes in both cases, but caching is very minor, and IP bundling pertains mostly to governmental institutions and corporate networks. How much this matters is difficult to evaluate.
The elephant in the room is traffic coming from search engines, most importantly Google, which has changed its ranking criteria over time. Google Scholar started privileging the original source over aggregators like RePEc several years ago, and the impact has grown as more publishers give Google Scholar direct access to their repositories. The same applies to the general Google search engine. For example, traffic from Google to IDEAS dropped by a third from one day to the next on May 22, 2014, after Google decided to penalize the search ranking of aggregator web pages.
Finally, we cannot exclude that RePEc services are indeed less popular, which would be bad. But if this is because people are finding what they are looking for more easily, then this is good, as the core mission of RePEc is to improve the dissemination of research in economics.