<?xml version="1.0" encoding="utf-8"?><!-- generator="wordpress/2.0.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: abusive crawler</title>
	<link>http://cleverhack.com/2006/08/27/abusive-crawler/</link>
	<description>A Blog About Technology, Search Engine Optimization (SEO), Internet Marketing And More.</description>
	<pubDate>Sat, 10 Jan 2009 00:32:39 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.2</generator>

	<item>
		<title>by: joy</title>
		<link>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24545</link>
		<pubDate>Thu, 31 Aug 2006 03:51:31 +0000</pubDate>
		<guid>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24545</guid>
					<description>Thad,

Well, if you think about it this way, every server has finite resources. So if you're running a web server, a database, and a mail server (MTA), the server processes add up. 

In addition, I'm on a shared server, so I'm sure my hosting provider would rather have me ban a bad bot than let it stress not only my site but potentially the rest of the server. &lt;a href=&quot;http://cleverhack.com/2006/08/30/my-wp-corrupted/&quot; rel=&quot;nofollow&quot;&gt;Read here for what can happen in a case like that&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Thad,</p>
<p>Well, if you think about it this way, every server has finite resources. So if you&#8217;re running a web server, a database, and a mail server (MTA), the server processes add up. </p>
<p>In addition, I&#8217;m on a shared server, so I&#8217;m sure my hosting provider would rather have me ban a bad bot than let it stress not only my site but potentially the rest of the server. <a href="http://cleverhack.com/2006/08/30/my-wp-corrupted/" rel="nofollow">Read here for what can happen in a case like that</a>.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Thad</title>
		<link>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24539</link>
		<pubDate>Wed, 30 Aug 2006 03:20:47 +0000</pubDate>
		<guid>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24539</guid>
					<description>OK. The only web content I've ever done has been static, so I don't have any feel for how much load a database-driven site can create. I'll believe it when you say an incessant request every 4 seconds could contribute to hosing a database-driven site.

I mean, after all, there were web servers for 8-bit embedded processors back in '99, devices that are fast only when compared to geological measures. I looked into this for my company at that time. (In that application, the content would have been dynamically generated, though not from a database.) Given what an eternity 4 seconds is for even a lowly 1GHz processor these days....

So, note to self: re-write spam-bot so it pauses for 5 seconds.... beta test on cleverhack to see if it passes muster.

;-)</description>
		<content:encoded><![CDATA[<p>OK. The only web content I&#8217;ve ever done has been static, so I don&#8217;t have any feel for how much load a database-driven site can create. I&#8217;ll believe it when you say an incessant request every 4 seconds could contribute to hosing a database-driven site.</p>
<p>I mean, after all, there were web servers for 8-bit embedded processors back in &#8217;99, devices that are fast only when compared to geological measures. I looked into this for my company at that time. (In that application, the content would have been dynamically generated, though not from a database.) Given what an eternity 4 seconds is for even a lowly 1GHz processor these days&#8230;.</p>
<p>So, note to self: re-write spam-bot so it pauses for 5 seconds&#8230;. beta test on cleverhack to see if it passes muster.</p>
<p>;-)</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: joy</title>
		<link>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24536</link>
		<pubDate>Wed, 30 Aug 2006 01:23:58 +0000</pubDate>
		<guid>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24536</guid>
					<description>Thad, because too many requests per minute will take down the site. This is especially true of database-driven sites like this one.

I don't need a non-legit source of traffic to add to the load.</description>
		<content:encoded><![CDATA[<p>Thad, because too many requests per minute will take down the site. This is especially true of database-driven sites like this one.</p>
<p>I don&#8217;t need a non-legit source of traffic to add to the load.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Thad</title>
		<link>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24535</link>
		<pubDate>Wed, 30 Aug 2006 01:15:27 +0000</pubDate>
		<guid>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24535</guid>
					<description>If I may ask, what's so abusive about once every 4 seconds? In all seriousness, that sounds like a request rate that a Commodore 64 ought to be able to keep up with.

I mean, sure, spam harvesting = teh suck, but I'm referring only to the crawler's behavior itself.</description>
		<content:encoded><![CDATA[<p>If I may ask, what&#8217;s so abusive about once every 4 seconds? In all seriousness, that sounds like a request rate that a Commodore 64 ought to be able to keep up with.</p>
<p>I mean, sure, spam harvesting = teh suck, but I&#8217;m referring only to the crawler&#8217;s behavior itself.</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Don McArthur</title>
		<link>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24509</link>
		<pubDate>Sun, 27 Aug 2006 22:54:02 +0000</pubDate>
		<guid>http://cleverhack.com/2006/08/27/abusive-crawler/#comment-24509</guid>
					<description>Thanks for sharing. In exchange, here's my current /etc/hosts.deny sshd collection:

http://www.mcarthurweb.com/archive.php?item=224</description>
		<content:encoded><![CDATA[<p>Thanks for sharing. In exchange, here&#8217;s my current /etc/hosts.deny sshd collection:</p>
<p><a href='http://www.mcarthurweb.com/archive.php?item=224' rel='nofollow'>http://www.mcarthurweb.com/archive.php?item=224</a></p>
]]></content:encoded>
				</item>
</channel>
</rss>
