Eclipse Community

Crawlers

Post Reply   Page 1 of 1  [ 8 posts ]
Duke
Post subject: Crawlers
Posted: 30 Aug 2024, 10:42
Full Moderator
Posts: 312
Joined: 16 Mar 2024, 13:32
OS: Windows 8.1 x64
 
Considering the nature and content of this forum, it might be a good idea to block search engine and SEO crawlers such as Ahrefs [Bot], Bing [Bot], Google [Bot] and Semrush [Bot] from browsing it ;)

An example of robots.txt from another forum:
User-agent: Amazonbot 
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: SemanticScholarBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
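robots.txt rules are grouped by User-agent, and the `*` group only applies to crawlers that have no group of their own. One way to sanity-check a file like the one above before deploying it is Python's standard-library `urllib.robotparser`. This is a minimal sketch using an abbreviated copy of the rules (only a few groups and paths kept); the test URLs such as `/viewtopic.php?t=1` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the robots.txt above: a few groups and paths only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /login.php
Disallow: /search.php

User-agent: *
Disallow: /login.php
Disallow: /search.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # (user agent, path) pairs to check; the paths are illustrative.
    checks = [
        ("GPTBot", "/viewtopic.php?t=1"),
        ("AhrefsBot", "/viewtopic.php?t=1"),
        ("AhrefsBot", "/search.php"),
        ("Googlebot", "/viewtopic.php?t=1"),
        ("Googlebot", "/login.php"),
    ]
    for agent, path in checks:
        print(f"{agent:<10} {path:<20} allowed={rp.can_fetch(agent, path)}")
    # Groups without a Crawl-delay line return None here.
    print("AhrefsBot crawl delay:", rp.crawl_delay("AhrefsBot"))
```

Real crawlers use their own parsers, so treat this as a rough check. Python's parser applies the first rule in a group that matches, whereas Google documents most-specific-match precedence; since this file only uses Disallow lines, both approaches give the same answer here.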


K4sum1
Post subject: Crawlers
Posted: 30 Aug 2024, 11:27
Lazy Owner
Posts: 1185
Joined: 11 Jan 2021, 07:40
Location: ur dads house
OS: Windows 8.1 x64
 
And kill discoverability? no lol

_________________

I don't know what I'm doing hit album by Brad Sucks


Duke
Post subject: Crawlers
Posted: 30 Oct 2024, 14:23
Full Moderator
Posts: 312
Joined: 16 Mar 2024, 13:32
OS: Windows 8.1 x64
 
Google [Bot] ok, but AdsBot [Google] and Amazon [Bot] :evil:

Are you sure you want those? They bring you no discoverability at all, just crap :thumbdown:


K4sum1
Post subject: Crawlers
Posted: 30 Oct 2024, 15:04
Lazy Owner
Posts: 1185
Joined: 11 Jan 2021, 07:40
Location: ur dads house
OS: Windows 8.1 x64
 
too busy and too lazy

_________________

I don't know what I'm doing hit album by Brad Sucks


K4sum1
Post subject: Crawlers
Posted: 12 Nov 2024, 18:40
Lazy Owner
Posts: 1185
Joined: 11 Jan 2021, 07:40
Location: ur dads house
OS: Windows 8.1 x64
 
Since the server likes to die every day, I wanted to tackle some of the potential causes. That included finally implementing some sort of robots.txt.

I didn't feel like outright blocking search engines like Yandex, or AI bots other than those from companies I hate, like Meta. Some SEO tools also seem useless and could spam my shit, so I blocked them outright. For example, Ahrefs is now blocked both in robots.txt and at the firewall level.

So using your robots.txt as a base and with the help of a friend, I came up with this:
https://board.eclipse.cx/robots.txt
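robots.txt is only a request; a crawler that ignores it has to be refused by the server itself, which is why blocking Ahrefs at the firewall as well makes sense. As an illustration of the same idea at the application level, here is a minimal sketch of WSGI middleware in Python that answers 403 to deny-listed User-Agents. It is not the board's actual setup (the post describes a firewall rule), and `BLOCKED_AGENT_TOKENS`, `BlockCrawlers` and `hello_app` are hypothetical names chosen for the example.

```python
from typing import Callable, Iterable

# Hypothetical deny-list: lowercase substrings matched against User-Agent.
BLOCKED_AGENT_TOKENS = ("ahrefsbot", "semrushbot", "dotbot", "bytespider")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any deny-listed token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BlockCrawlers:
    """WSGI middleware that answers 403 to deny-listed crawlers."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else falls through to the wrapped application.
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site, used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


if __name__ == "__main__":
    app = BlockCrawlers(hello_app)
    for ua in ("Mozilla/5.0 (compatible; AhrefsBot/7.0)", "Mozilla/5.0 Firefox/115.0"):
        statuses = []
        # Call the app directly with a minimal environ; no server needed.
        app({"HTTP_USER_AGENT": ua}, lambda status, headers: statuses.append(status))
        print(ua, "->", statuses[0])
```

User-Agent strings are self-reported and easy to fake, so this only stops crawlers that identify themselves honestly; IP or network-level rules like the firewall block mentioned above catch more.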

_________________

I don't know what I'm doing hit album by Brad Sucks


Duke
Post subject: Crawlers
Posted: 12 Nov 2024, 23:13
Full Moderator
Posts: 312
Joined: 16 Mar 2024, 13:32
OS: Windows 8.1 x64
 
Well done! :thumbup:


Compa
Post subject: Crawlers
Posted: 13 Nov 2024, 04:12
Banned
Posts: 498
Joined: 13 Jan 2021, 08:09
 
Duke wrote (12 Nov 2024, 23:13):
"Well done! :thumbup:"

It took me about an hour to convince him to do a proper job of it.
Thanks for providing a nice template for phpBB, though; that really helped us. :)


Duke
Post subject: Crawlers
Posted: 20 Mar 2025, 23:23
Full Moderator
Posts: 312
Joined: 16 Mar 2024, 13:32
OS: Windows 8.1 x64
 
AI crawler attacks and abuse:
https://news.ycombinator.com/item?id=43422413


Return to “Suggestions”