  while ($making_other_plans) { life(); }

Monday, August 06 2012

Bad Bots Must Be Punished.

I periodically look through my web server logs to pick out things that are not as they should be. You might recall from previous blog entries that I operate spam traps and so on. Last night I picked out of my server logs that some critter calling itself MJ12bot was going where no legitimate bots belong. But it's apparently trying to be a good bot, because it leaves its calling card:

- - [06/Aug/2012:01:28:42 -0600] "GET /robots.txt HTTP/1.0" 200 1247 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.3;")"

So off I go to that URI, and find that the folks who run the thing have said "If you have reason to believe that MJ12bot did NOT obey your robots.txt commands, then please let us know via email..." And so I did.

We discussed the matter via email a bit, and it seems probable that their bot encountered some kind of network error when it tried to grab my robots.txt file. Not an error response from my server, but a failure to even contact my server. To my way of thinking, in a case like that a properly designed bot will try again to get that file, and will not crawl the site until it gets either the file or a verifiable 404 Not Found. Not MJ12bot, though. The network failure is treated as if it were a 404, and is taken to mean that the whole darn site is wide open to them. Here's what their guy Alex said:

Sadly it's very difficult for us to diagnose this case - as you can see from your logs our bot grabs robots.txt, so we are not intentionally breaking your directives, it's just if bot could not get robots.txt then it could not obey it :(

Huh? Your bot encounters a network error and that gives you license to crawl my site in violation of my terms of service? It seems to me that if you know your crawler is broken in that way, which you do now, and you continue to run it knowing it's broken in that way, then what you're doing is willful negligence and that makes it intentional.
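To make the point concrete, here is a minimal sketch of the behavior I'm arguing for: retry the robots.txt fetch, and stay off the site unless you get either the file or a verifiable 404. It is not MJ12bot's code, and it is not anything I run here. The names `RobotsOutcome` and `fetch_robots`, the retry count, and the delay are all made up for illustration, and treating every non-404 HTTP error as "don't crawl" is my own conservative assumption.

```python
from enum import Enum
import time
import urllib.error
import urllib.request


class RobotsOutcome(Enum):
    RULES = "rules"                # got the file: parse it and obey it
    ALLOW_ALL = "allow_all"        # verifiable 404: the site has no robots.txt
    DO_NOT_CRAWL = "do_not_crawl"  # never found out: stay off the site


def fetch_robots(url, attempts=3, delay=5.0,
                 opener=urllib.request.urlopen, sleep=time.sleep):
    """Return (outcome, body). A network failure is never treated as a 404."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=10) as resp:
                body = resp.read().decode("utf-8", "replace")
                return RobotsOutcome.RULES, body
        except urllib.error.HTTPError as err:
            # The server answered. Only a real 404 means "no rules here".
            if err.code == 404:
                return RobotsOutcome.ALLOW_ALL, ""
            # Any other HTTP error (assumption): unknown, so retry.
        except (urllib.error.URLError, OSError):
            # Could not even reach the server: unknown, so retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)
    # Every attempt failed, so the rules are unknown. Do not crawl.
    return RobotsOutcome.DO_NOT_CRAWL, ""
```

A crawler built this way would only start fetching pages on `RULES` (after parsing the body) or `ALLOW_ALL`. On `DO_NOT_CRAWL` it would put the site back in the queue and try again later.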

No worries here. I've informed the folks behind the thing that their bot is no longer welcome here and any connections it makes will be considered trespass. The fun part? When their bot comes around it will not see my web site. It will instead see a very, very long joke that will be delivered very, very slowly. How slowly? From start to finish will take from an hour and a half to more than six hours.
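For the curious, the general idea of a slow-drip response can be sketched in a few lines. This is not the application I actually use, just an illustration of the technique. The function names, the chunk size, and the delay values are all assumptions picked so the arithmetic lands in the hour-and-a-half to six-hour range.

```python
import random
import time


def drip(text, chunk_size=1, min_delay=1.0, max_delay=4.0,
         sleep=time.sleep, rng=random.uniform):
    """Yield text in small chunks, pausing a random interval before each one."""
    for start in range(0, len(text), chunk_size):
        sleep(rng(min_delay, max_delay))
        yield text[start:start + chunk_size]


def delivery_time_range(length, chunk_size, min_delay, max_delay):
    """Return (shortest, longest) total delivery time in seconds."""
    chunks = -(-length // chunk_size)  # ceiling division
    return chunks * min_delay, chunks * max_delay


# Hypothetical numbers: a 5400-character joke sent one character at a time,
# with a 1 to 4 second pause before each character.
low, high = delivery_time_range(5400, 1, 1.0, 4.0)
print(low / 3600, high / 3600)  # 1.5 6.0 (hours)
```

A real server would write each chunk from `drip()` to the bot's connection and flush it, so the bot keeps getting just enough data to stay on the line.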

If you've seen a bad bot in your logs and want to punish it in this way, feel free to hit my contact form to inquire about it. It's a freebie if all you need is the application itself and very minimal installation/configuration instructions. After all: Bad Bots Must Be Punished!

→ committed: 8/6/2012 17:36:54
