Crawlers

Posted: 30 Aug 2024, 10:42
by Duke
Considering the nature and content of this forum, it might be a good idea to block search engine crawlers like Ahrefs [Bot], Bing [Bot], Google [Bot], and Semrush [Bot] from crawling this forum ;)

An example of robots.txt from another forum:

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: SemanticScholarBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
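
To see how a parser actually reads rules like these, here is a minimal sketch using Python's standard `urllib.robotparser`. It is only an illustration: the rule set is a shortened excerpt of the file above, and `load_rules`, `SomeOtherBot`, and the path `/some-thread` are hypothetical names made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the rules quoted above (only a few groups kept).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /search.php

User-agent: *
Disallow: /search.php
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "SomeOtherBot" and "/some-thread" are hypothetical, for illustration only.
for agent in ("GPTBot", "AhrefsBot", "SomeOtherBot"):
    for path in ("/some-thread", "/search.php"):
        verdict = "allowed" if rules.can_fetch(agent, path) else "blocked"
        print(f"{agent:12} {path:14} {verdict}")

# Crawl-delay is read per group; AhrefsBot has its own group here.
print("AhrefsBot crawl delay:", rules.crawl_delay("AhrefsBot"))
```

Keep in mind that robots.txt is only a request: a crawler that ignores it has to be stopped some other way.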

Posted: 30 Aug 2024, 11:27
by the_r3dacted
And kill discoverability? no lol

Posted: 30 Oct 2024, 14:23
by Duke
Google [Bot] ok, but AdsBot [Google] and Amazon [Bot] :evil:

Are you sure you want those? They don't bring you any discoverability at all, just crap :thumbdown:

Posted: 30 Oct 2024, 15:04
by the_r3dacted
too busy and too lazy

Posted: 12 Nov 2024, 18:40
by the_r3dacted
Since the server likes to die every day, I wanted to tackle some of the potential reasons. That included finally implementing some sort of robots.txt.

I didn't feel like outright blocking search engines like Yandex, or AI bots, other than those from companies I hate, like Meta. Also, some SEO tools seem useless and could spam my shit, so I blocked them outright. For example, Ahrefs is now blocked both in robots.txt and at the firewall level.

So using your robots.txt as a base and with the help of a friend, I came up with this:
https://board.eclipse.cx/robots.txt

Posted: 12 Nov 2024, 23:13
by Duke
Well done! :thumbup:

Posted: 13 Nov 2024, 04:12
by Compa
Duke wrote: 12 Nov 2024, 23:13 Well done! :thumbup:
It took me about an hour to convince him to do a proper job of it.
Thanks for providing a nice template for phpBB though, that really helped us. :)

Posted: 20 Mar 2025, 23:23
by Duke
AI crawler attacks and abuse:
https://news.ycombinator.com/item?id=43422413

Posted: 25 Feb 2026, 14:06
by Duke
About the server slowness, downtime, and other issues: maybe you should really consider using a filter like Anubis:
https://github.com/TecharoHQ/anubis

It's used on Mozillazine.org, but there are other options.
Many forums are experiencing the same slowness or access issues these days because of AI crawlers, regardless of who hosts them or where.
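
For anyone wondering what such a filter does: as far as I understand it, the visitor's browser has to solve a small proof-of-work puzzle before the real page is served, which is cheap for one person but expensive for a crawler making thousands of requests. Below is a rough sketch of that general idea only, not Anubis's actual code; `solve`, `verify`, the challenge string, and the difficulty value are all assumptions made up for the illustration.

```python
import hashlib


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one cheap hash to check the client's answer."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    # The answer is valid if the hex digest starts with `difficulty` zeros.
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


# Hypothetical values; a real filter would issue a fresh challenge per visitor.
challenge = "demo-challenge"
difficulty = 3

nonce = solve(challenge, difficulty)
print("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty))
```

Each extra zero of difficulty makes solving roughly 16 times more work on average, while checking the answer stays a single hash.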

Posted: 27 Feb 2026, 15:09
by the_r3dacted
Duke wrote: 25 Feb 2026, 14:06 About the server slowness, downtime, and other issues: maybe you should really consider using a filter like Anubis:
https://github.com/TecharoHQ/anubis
Not going to do that. https://github.com/Eclipse-Community/r3dfox/issues/30

Posted: 27 Feb 2026, 20:27
by Duke
the_r3dacted wrote: 27 Feb 2026, 15:09 Not going to do that.
Your choice. But it really has kept many forums from being overloaded.