Wiki.ros.org now even slower in Europe

wiki.ros.org has never been a cheetah when accessed from Europe. I was used to page load times of 2–10 seconds. But lately, after the Noetic EOL, it seems to be even slower, dangerously nearing various browser and protocol timeout limits.

Currently, I’m at 83 seconds.

This is clearly far from usable. Could someone please have a look at it?

2 Likes

We’re getting hammered by crawlers and it’s taking the load average through the roof on the server. It’s serving a lot of content, but the bots are getting most of it. We’re not alone in these issues: Open source devs say AI crawlers dominate traffic, forcing blocks on entire countries - Ars Technica :frowning:

Specifically, watching the logs, they’re hitting expensive actions such as querying each diff of every page on the wiki sequentially, but are doing it from rotating IPs, which defeats the per-IP rate limits that MoinMoin can provide. I’ve reduced the access limits; hopefully the wiki will still be functional for real users.
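To illustrate why rotating IPs defeat this kind of throttling, here is a minimal sliding-window rate limiter keyed by client IP. This is a sketch, not MoinMoin's actual implementation; the class name, limits, and addresses are all illustrative.

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Sliding-window limiter keyed by client IP (illustrative sketch)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Evict timestamps that fell out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # throttled
        q.append(now)
        return True

limiter = PerIPRateLimiter(max_requests=5, window_seconds=60)

# A single IP hammering the server gets cut off after 5 requests...
single_ip = [limiter.allow("198.51.100.7", now=t) for t in range(10)]

# ...but a crawler that presents a fresh IP on every request is never throttled,
# because each IP's counter starts at zero.
rotating = [limiter.allow(f"203.0.113.{t}", now=t) for t in range(10)]

print(single_ip.count(True), rotating.count(True))  # prints: 5 10
```

Any per-client counter has this blind spot, which is why the practical options come down to lowering global limits or blocking the expensive actions outright.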

It looks like it’s doing better now, with load averages coming down, but they look to be creeping back up, so I don’t know how long it will last.

If it gets bogged down again, I would suggest looking at one of the mirrors, which conveniently have the list of mirrors mirrored. Some of the European ones don’t appear to be online currently, but the static sites should load relatively quickly.

or

The rate of change of wiki content is very low now so the mirrors will have effectively the same content.

The plan is to switch the wiki to a static archive of the content in the near future too, as we don’t have effective means to keep up this service quality. But if you need information in the meantime, the mirrors are likely your best bet to get it quickly.

4 Likes

Ouch… meet the dark side of LLMs… I hope the static archive will be much more agile. Wouldn’t it be possible to block at least these diff requests somewhere in .htaccess? I guess no real person needs them now.
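For what it’s worth, a block like that is possible in principle. MoinMoin exposes diffs via an `action=diff` query parameter, so an `.htaccess` fragment along these lines could refuse those requests; this is an untested sketch assuming `mod_rewrite` is enabled, not the wiki’s actual configuration:

```apache
# Sketch: return 403 for MoinMoin diff requests (assumes mod_rewrite is enabled)
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)action=diff(&|$) [NC]
RewriteRule ^ - [F]
```

The query-string condition is needed because `RewriteRule` itself only matches the URL path, not the `?action=diff` part.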

Thanks for the list of mirrors.

Or deny the diff access to bots in the robots.txt
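Something like the following could express that; note that wildcard and query-string matching in `Disallow` is a de-facto extension honored by the major crawlers rather than part of the original robots.txt convention, and the `action=info` line (MoinMoin’s page-history action) is an extra guess:

```
User-agent: *
Disallow: /*?action=diff
Disallow: /*?action=info
```
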

On a more serious note, the entities responsible for these crawlers typically don’t seem to respect robots.txt: Michael Tsai - Blog - AI Companies Ignoring Robots.txt

2 Likes

It seems to be much better now, thanks!

1 Like