Page 1 of 1

Crawlers

Posted: 30 Aug 2024, 10:42
by Duke
Considering the nature and the content of this forum, it might be a good idea to block the search engines crawlers like Ahrefs [Bot], Bing [Bot], Google [Bot], Semrush [Bot] from browsing this forum ;)

An example of robots.txt from another forum:

Code: Select all

User-agent: Amazonbot 
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: SemanticScholarBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

Crawlers

Posted: 30 Aug 2024, 11:27
by K4sum1
And kill discoverability? no lol

Crawlers

Posted: 30 Oct 2024, 14:23
by Duke
Google [Bot] ok, but AdsBot [Google] and Amazon [Bot] :evil:

Are you sure you want those ? They don't bring you discoverability at all but crap :thumbdown:

Crawlers

Posted: 30 Oct 2024, 15:04
by K4sum1
too busy and too lazy

Crawlers

Posted: 12 Nov 2024, 18:40
by K4sum1
Since the server likes to die every day, I wanted to try to tackle some of the potential reasons. That included finally implementing some sort of robots.txt

I didn't feel like outright blocking search engines like Yandex or AI bots, other than those from companies I hate like Meta. Also some SEO tools seem useless and could spam my shit so I blocked them outright. For example Ahrefs is blocked both in robots.txt and on a firewall level now.

So using your robots.txt as a base and with the help of a friend, I came up with this:
https://board.eclipse.cx/robots.txt

Crawlers

Posted: 12 Nov 2024, 23:13
by Duke
Well done! :thumbup:

Crawlers

Posted: 13 Nov 2024, 04:12
by Compa
Duke wrote: 12 Nov 2024, 23:13 Well done! :thumbup:
It took me about an hour to convince him to do a proper job of it.
Thanks for providing a nice template for phpBB though, that really helped us. :)

Crawlers

Posted: 20 Mar 2025, 23:23
by Duke
AI crawlers attacks and abuse:
https://news.ycombinator.com/item?id=43422413