abusive crawler
This is a stupid spam harvester hitting my site right now. For those of you webmasters who read this blog, you can go ahead and be a bit proactive by blocking the 208.66.195.0/28 range from your site right now. This abusive crawler, originating from a West Coast cogentco colo, has been hitting my site at a rate of one new request every four seconds for the past couple of minutes.
Not only is it a bad crawler for its abusive crawling activities, but the user agent is spoofing Internet Explorer. Not surprisingly, Spamhaus has more here and here.
Host: 208.66.195.7
/2005/06/07/i-owe/
Http Code: 200 Date: Aug 27 11:33:07 Http Version: HTTP/1.1 Size in Bytes: 14175
Referer: -
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
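If you want to check your own logs against that range before blocking it, here is a rough sketch in Python using the standard `ipaddress` module. The names `BLOCKED_NETWORK`, `is_blocked`, and `requests_per_day` are made up for illustration, and this only tests addresses; how you actually enforce the block depends on your own server or firewall setup.

```python
import ipaddress

# The range named in the post. A /28 covers 16 addresses (.0 through .15).
BLOCKED_NETWORK = ipaddress.ip_network("208.66.195.0/28")


def is_blocked(host: str) -> bool:
    """Return True if the given IPv4 address falls inside the blocked range."""
    return ipaddress.ip_address(host) in BLOCKED_NETWORK


def requests_per_day(seconds_between_requests: float) -> float:
    """Convert a steady request interval into requests per day."""
    return 86400 / seconds_between_requests


if __name__ == "__main__":
    # The host from the log entry above sits inside the /28.
    print(is_blocked("208.66.195.7"))
    # One request every four seconds, if it kept up around the clock.
    print(requests_per_day(4))
```

Run against the log entry above, the host 208.66.195.7 falls inside the range, and one request every four seconds works out to 21,600 requests a day if the crawler never lets up.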









August 27th, 2006 at 6:54 pm
Thanks for sharing. In exchange, here’s my current /etc/hosts.deny sshd collection:
http://www.mcarthurweb.com/archive.php?item=224
August 29th, 2006 at 9:15 pm
If I may ask, what’s so abusive about once every 4 seconds? In all seriousness, that sounds like a request rate that a Commodore 64 ought to be able to keep up with.
I mean, sure, spam harvesting = teh suck, but I’m referring only to the crawler’s behavior itself.
August 29th, 2006 at 9:23 pm
Thad, because too many requests per minute will take down the site. This is especially true of database-driven sites like this one.
I don’t need a non-legit source of traffic to add to the load.
August 29th, 2006 at 11:20 pm
OK. The only web content I’ve ever done has been static, so I don’t have any feel for how much load a database-driven site can create. I’ll believe it when you say an incessant request every 4 seconds could contribute to hosing a database-driven site.
I mean, after all, there were web servers for 8-bit embedded processors back in ’99, devices that are fast only when compared to geological measures. I looked into this for my company at that time. (In that application, the content would have been dynamically generated, though not from a database.) Given what an eternity 4 seconds is for even a lowly 1GHz processor these days….
So, note to self: re-write spam-bot so it pauses for 5 seconds…. beta test on cleverhack to see if it passes muster.
;-)
August 30th, 2006 at 11:51 pm
Thad,
Well, think about it this way: every server has finite resources. So, if you’re running a Web server, a database, and a mail server (MTA), the server processes add up.
In addition, I’m on a shared server, so I’m sure my hosting provider would rather have me ban a bad bot than have it not only stress my site, but potentially stress the rest of the server. Read here for what can happen in a case like that.