Crawlers

Site suggestions.
Duke
Full Moderator
Posts: 161
Joined: 16 Mar 2024, 13:32
OS: Windows 8.1 x64
Has thanked: 36 times
Been thanked: 28 times

Considering the nature and content of this forum, it might be a good idea to block search engine crawlers such as Ahrefs [Bot], Bing [Bot], Google [Bot], and Semrush [Bot] from browsing this forum ;)

An example of robots.txt from another forum:

Code:

User-agent: Amazonbot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: SemanticScholarBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
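
For anyone who wants to check how a crawler would read rules like the ones above, here is a minimal sketch using Python's standard urllib.robotparser module. It only uses a shortened copy of the rules, and "/index.php" and "SomeOtherBot" are made-up examples, not something taken from the other forum. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules quoted above (only a few entries kept).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /search.php
Disallow: /memberlist.php

User-agent: *
Disallow: /search.php
Disallow: /memberlist.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# GPTBot has "Disallow: /", so every path is off limits for it.
print(parser.can_fetch("GPTBot", "/index.php"))         # False

# AhrefsBot has its own group: normal pages are fine, listed scripts are not.
print(parser.can_fetch("AhrefsBot", "/index.php"))      # True
print(parser.can_fetch("AhrefsBot", "/search.php"))     # False

# A crawler without its own group falls back to the "*" group.
print(parser.can_fetch("SomeOtherBot", "/search.php"))  # False

# Crawl-delay is only defined for AhrefsBot in this sample.
print(parser.crawl_delay("AhrefsBot"))                  # 10
```

Note that all of this only describes what a well-behaved crawler is asked to do; nothing in robots.txt enforces it.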

K4sum1
Lazy Owner
Posts: 1082
Joined: 11 Jan 2021, 07:40
Location: ur dads house
OS: Windows 8.1 x64
Has thanked: 722 times
Been thanked: 359 times
United States of America

And kill discoverability? no lol
I don't know what I'm doing hit album by Brad Sucks

Duke

Google [Bot] ok, but AdsBot [Google] and Amazon [Bot] :evil:

Are you sure you want those? They don't bring you any discoverability, only crap :thumbdown:

K4sum1

too busy and too lazy

K4sum1

Since the server likes to die every day, I wanted to try to tackle some of the potential causes. That included finally implementing some sort of robots.txt.

I didn't feel like outright blocking search engines like Yandex or AI bots, other than those from companies I hate, like Meta. Also, some SEO tools seem useless and could spam my shit, so I blocked them outright. For example, Ahrefs is now blocked both in robots.txt and at the firewall level.

So using your robots.txt as a base and with the help of a friend, I came up with this:
https://board.eclipse.cx/robots.txt
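
Since robots.txt is only a request that a crawler can ignore, a block enforced on the server side is what actually keeps it out. How the firewall rule here is set up isn't stated in this thread, so the following is only a rough sketch of the general idea, written as a Python WSGI wrapper that rejects requests by User-Agent. BLOCKED_AGENT_TOKENS, is_blocked, block_crawlers, forum_app and status_for are all hypothetical names, the User-Agent strings are shortened samples, and a real firewall may well match on IP ranges instead.

```python
# Hypothetical deny list: the thread only confirms that Ahrefs is blocked.
BLOCKED_AGENT_TOKENS = ("ahrefsbot",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked token."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_TOKENS)


def block_crawlers(app):
    """Wrap a WSGI app so blocked crawlers get a 403 instead of a page."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def forum_app(environ, start_response):
    """Stand-in for the real forum application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"forum page\n"]


def status_for(user_agent: str) -> str:
    """Send one fake request through the wrapped app and return its status."""
    seen = []
    wrapped = block_crawlers(forum_app)
    body = wrapped({"HTTP_USER_AGENT": user_agent},
                   lambda status, headers: seen.append(status))
    list(body)  # consume the response body like a WSGI server would
    return seen[0]


print(status_for("Mozilla/5.0 (compatible; AhrefsBot/7.0)"))  # 403 Forbidden
print(status_for("Mozilla/5.0 (X11; Linux x86_64)"))          # 200 OK
```

Unlike the robots.txt entry, this kind of check does not depend on the crawler cooperating, although a crawler that fakes its User-Agent would still get through.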

Duke

Well done! :thumbup:

Compa
Posts: 502
Joined: 13 Jan 2021, 08:09
Has thanked: 24 times
Been thanked: 5 times

Duke wrote: 12 Nov 2024, 23:13
Well done! :thumbup:

It took me about an hour to convince him to do a proper job of it.
Thanks for providing a nice template for phpBB though, that really helped us. :)
