Note: this post has an update below (2025-12-08).
Over the past year, various RePEc services have been struggling with issues stemming from artificial intelligence robots (AI bots). This post is a short summary of what is happening and where we stand now.
About one year ago, RePEc services started noticing a marked increase in robot traffic. We are used to search engine robots that scour our sites. They are welcome, first because they make our content discoverable on search platforms, second because they behave well, that is, they obey the voluntary Robots Exclusion Protocol. This protocol allows webmasters to declare where robots may and may not go, and at what frequency.
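To illustrate the protocol, here is a minimal sketch of how a well-behaved crawler consults a site's robots.txt before fetching pages, using Python's standard urllib.robotparser. The rules and bot names are hypothetical examples, not RePEc's actual configuration.

```python
from urllib import robotparser

# Hypothetical robots.txt content; real sites serve this at /robots.txt.
rules = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: GreedyAIBot
Disallow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler checks permission before every fetch...
print(rp.can_fetch("FriendlySearchBot", "https://example.org/paper/1"))    # True
print(rp.can_fetch("FriendlySearchBot", "https://example.org/cgi-bin/x"))  # False
print(rp.can_fetch("GreedyAIBot", "https://example.org/paper/1"))          # False

# ...and honors the requested pacing between requests.
print(rp.crawl_delay("FriendlySearchBot"))  # 10
```

The bots causing trouble simply skip this check, which is why the protocol alone no longer protects the sites.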
The new traffic was not obeying that protocol. It was also massive, and it kept inventing new ways of avoiding being blocked. These are AI bots looking for material to learn from, and RePEc material is very interesting to them. As with search bots, we do not have a problem with them as long as they behave well; after all, our mission is to enhance the dissemination of economic research. But when they start having an impact on our human users, we have a problem.
EconPapers started getting so much traffic that it was bringing down the campus network, and it had to be shut down for a week. IDEAS search was getting hit so hard that it was not functional. After a months-long cat-and-mouse game, some search features had to be removed so that humans could discover the economic literature again. CitEc has also been under pressure at various times. After much optimization, we are now in a state where we can appropriately serve human users.
RePEc is not unique in facing these problems. Research libraries and digital archives all over the world have faced the same issues and were often forced to implement costly protective measures to keep the robots at bay while still serving humans. It is now routine to have to pass a test before accessing content.
RePEc does not have the means of acquiring such protection and, as mentioned, we are OK with material being discoverable. We are thus currently in a situation where the sites are still openly accessible, but some features are disabled. And new disruptions may happen.
All this robotic traffic has also made computing usage statistics much more challenging. Most AI bots do not identify themselves. Worse, they look for ways to hide themselves by masquerading as human users. We report those usage statistics on the LogEc site and use them for various statistics and rankings, so it is important that we get them right. We have spent more and more effort cleaning the data, to the point that in September 2025, more than 99.5% of the traffic on IDEAS was thrown out. In addition to identifying robots, we also look automatically and then manually at all outliers, typically finding a couple hundred highly suspicious cases each month.
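To give a flavor of what this cleaning involves, below is a minimal sketch, assuming a combined-format access log: it drops requests whose user agent identifies a robot, then flags IP addresses with outlier request volumes. The log format, patterns, and threshold are illustrative assumptions, not LogEc's actual pipeline.

```python
import re
from collections import Counter

# Combined log format: IP ... [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<req>[^"]*)" '
                    r'\d+ \S+ "(?P<ref>[^"]*)" "(?P<ua>[^"]*)"')

# Self-identified crawlers; many AI bots do NOT identify themselves.
BOT_UA = re.compile(r'bot|crawler|spider|scraper', re.IGNORECASE)

def clean(lines, max_hits_per_ip=500):
    """Drop declared robots, then drop IPs with outlier request counts."""
    kept, hits = [], Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or BOT_UA.search(m.group("ua")):
            continue  # unparsable, or a robot that declares itself
        kept.append(m)
        hits[m.group("ip")] += 1
    # Crude outlier rule: this many requests from one IP in one period
    # is unlikely to be a single human reader.
    suspicious = {ip for ip, n in hits.items() if n > max_hits_per_ip}
    return [m for m in kept if m.group("ip") not in suspicious], suspicious

if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [01/Oct/2025:12:00:00 +0000] "GET /p/abc.html HTTP/1.1" '
        '200 5120 "https://www.google.com/" "Mozilla/5.0"',
        '198.51.100.9 - - [01/Oct/2025:12:00:01 +0000] "GET /p/abc.html HTTP/1.1" '
        '200 5120 "-" "ExampleBot/1.0 (+https://example.com/bot)"',
    ]
    human, flagged = clean(sample)
    print(len(human), "kept;", len(flagged), "IPs flagged")
```

The hard part, as noted above, is that most AI bots pass the first filter by sending browser-like user agents, so the outlier vetting and manual review carry most of the weight.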
For October 2025, though, we have not been as successful. After vetting, traffic on IDEAS and EconPapers is double that of the previous month, which we find suspicious. We do see a noticeable increase in referral traffic from AI sites, indicating that they link to our pages and that people follow those links. We are obviously happy about that. But it cannot justify a doubling, even though, strangely, our tests revealed that several AI sites hide those referrals. Note that Google Analytics, which is used for IDEAS and is supposed to filter out robots, also finds a doubling of traffic.
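For context, one way to measure such referral traffic is to bucket the Referer header of each request by domain. A minimal sketch follows; the domain list is a hypothetical example, not the list actually used by RePEc, and hidden referrals simply land in the "none" bucket.

```python
from urllib.parse import urlparse

# Hypothetical AI-assistant referrer domains, for illustration only.
AI_REFERRERS = {"chatgpt.com", "perplexity.ai", "gemini.google.com"}

def classify_referrer(referer: str) -> str:
    """Bucket a Referer header value: ai, search, other, or none."""
    if not referer or referer == "-":
        return "none"  # absent or deliberately hidden referrer
    host = urlparse(referer).hostname or ""
    if any(host == d or host.endswith("." + d) for d in AI_REFERRERS):
        return "ai"
    if "google." in host:
        return "search"
    return "other"

print(classify_referrer("https://chatgpt.com/"))           # ai
print(classify_referrer("https://www.google.com/search"))  # search
print(classify_referrer("-"))                              # none
```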
As far as we can tell, this extra traffic is not benefiting any particular item or author: we have again vetted the outliers. Thus, comparisons between items, series, journals, or authors within a month are still valid. Comparisons from one month to the next may not be, though. Time will tell whether this is a one-time problem.
2025-12-08 Update
The problems got worse with the analysis of November 2025 traffic. While October had twice the expected abstract views after vetting, November is close to three times. We do notice a continuous, though not this abrupt, increase in identified traffic coming from users of AI tools. However, we find it hard to believe that it would rival Google and Google Scholar as a source of traffic. Thus, we believe we still have a problem properly differentiating human traffic from AI robot traffic.
Unless there is a dramatic reversal for the December traffic analysis, we will drop abstract views from the list of criteria used for the author and institution rankings, as a consequence of our lack of confidence in those numbers. The numbers will still be reported, though.