Archive for the 'Internet' Category

Google Android User Agent

Monday, December 21st, 2009

Looks like someone with an Android handset visited cleverhack earlier today… Notice that Google has a special version of the search engine interface for Android (hint: click on the referrer). This seems to be the latest build of Android, 2.0.1. I had no idea Google was using the AppleWebKit framework, though. The screen size is generous, too.

Resolution : 854 x 480
Color Depth : 32 bits

Http Code: 200 Date: Dec 21 14:23:49 Http Version: HTTP/1.1 Size in Bytes: 13396
Agent: Mozilla/5.0 (Linux; U; Android 2.0.1; en-us; Droid Build/ESD56) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17
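
If you want to pull the same details out of your own logs, here is a minimal Python sketch. It assumes your log entries look like the "Http Code / Date / Http Version / Size in Bytes" line above; the function names and regular expressions are my own illustration, not part of any particular stats package.

```python
import re

# Matches the status line format shown in the log excerpts on this page.
STATUS_LINE = re.compile(
    r"Http Code: (?P<code>\d{3}) "
    r"Date: (?P<date>\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"Http Version: (?P<version>HTTP/\d\.\d) "
    r"Size in Bytes: (?P<size>\d+|-)"
)

# Matches the "Android 2.0.1" token inside a user agent string.
ANDROID_VERSION = re.compile(r"Android (?P<version>\d+(?:\.\d+)*)")


def parse_status_line(line):
    """Return the status line fields as a dict, or None if the line does not match."""
    match = STATUS_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    # A size of "-" (e.g. on a 304 response) means no body was sent.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields


def android_version(user_agent):
    """Return the Android version found in a user agent string, or None."""
    match = ANDROID_VERSION.search(user_agent)
    return match.group("version") if match else None
```

Feeding it the Droid entry above should give you the HTTP status as an integer and the Android version as a string.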

Cuil referrer info (just because you like it)

Monday, July 28th, 2008

Because of the hoopla around cuil today, I thought I’d take a peek at this newest search engine’s referrers.

Cuil crawler info. I know I’ve been seeing this bot for the past year or so. Cuil’s crawler is apparently called twiceler (is that a pun?) and the user agent string uses a URL which 302 redirects to the domain. As of this writing, the cuil Webmaster info URL has been updated from what is in the bot’s user agent string.




Http Code: 200 Date: Jul 28 15:02:12 Http Version: HTTP/1.0 Size in Bytes: 68965

Referer: -

Agent: Mozilla/5.0 (Twiceler-0.9

As for cuil visitor referrer info, here you go…

[Visitor’s IP Address]



Http Code: 200 Date: Jul 28 17:31:24 Http Version: HTTP/1.1 Size in Bytes: 17773


Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv: Gecko/2008070206 Firefox/3.0.1

If you happen to see a “&sl=long” appended to the referrer, it indicates that the visitor was using the two column layout. If cuil ever gets significant market share, you can bet there will be SEOs stressing about how their sites show in the two column vs. three column layout.
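
For the log watchers, here is a rough Python sketch of how you might check a referrer for that flag. The function name and the example referrer URL are hypothetical (the host below is a placeholder, not the real cuil search URL); the only thing taken from my logs is the "sl=long" parameter itself.

```python
from urllib.parse import urlsplit, parse_qs


def has_long_layout_flag(referrer):
    """Return True if the referrer's query string carries sl=long."""
    query = parse_qs(urlsplit(referrer).query)
    return "long" in query.get("sl", [])


# Hypothetical referrer; search.example is a placeholder host.
print(has_long_layout_flag("http://search.example/search?q=cleverhack&sl=long"))
```

Parsing the query string rather than doing a plain substring match avoids false hits when "sl=long" happens to show up inside the search terms.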

Otherwise, a cuil visitor presents in your visitor logs pretty much as any other visitor from the big search engines. The IP address belongs to the user (not a proxy), and so does the user agent.

As for my thoughts about cuil, I am not impressed with the image thumbnails shown with the search results, as nearly all I have seen so far have been wildly inappropriate for the results. As for information volume, I haven’t done a statistical survey, but Google still seems to present a larger volume of results than cuil.

Rogue SEO spells out oh so not awesome

Sunday, July 6th, 2008

So earlier today I was doing some catching up on Google Alerts for some domains that I manage.

And I kept on finding pages which were unusually formatted.

When I first noticed these pages in the middle of last week, I took them for the work of a stupidly overzealous SEO who was planting link farms on sites he owns.

Now, I don’t think so - after examining a number of these rogue SEO pages, it looks like someone is taking advantage of an exploit in Apache to post directories full of these rogue SEO pages, to boost their page rank (while adding outside links on these rogue pages to, I guess, appear genuine).

All of the pages I’ve found are on machines running Apache in shared hosting settings with poorly maintained / designed parent sites. That sure as heck points to exploit.

Take for example the page I posted above. The full URL looks like

Since, like I noted before, the site is poorly maintained, you can go ahead and browse the parent directories. The main Web site seems to be a homepage (created in Microsoft FrontPage) for a concert promoter in Allentown, PA. The hosting provider is E-Commerce, Inc. And this was just one out of a number of pages that I found hosted by E-Commerce, Inc. I also found other pages on sites hosted by The Planet and, irony abounding, The Institute for Intelligence Studies at Mercyhurst College.

So, just who is planting these pages and why?

Don’t like Shyftr? Block the IP.

Saturday, April 12th, 2008

This past weekend there’s been a conversation about Shyftr, a new RSS service that allows people to read and comment on full text stories on the Shyftr site, rather than making the reader click through to the originating blog to comment. The thought is that folks who care about pageviews for advertising will lose out in such a scenario.

So, in the spirit of helping the wider, feathers in a ruffle, blogging community out, I’ve pasted the Shyftr RSS bot info below. The good news is that you can block the Shyftr IP address from accessing your blog (if you already have that capability through your blog hosting solution, etc.). As of present, the IP address is

Unlike other annoying bots, I would not block the user agent in your .htaccess file as the RSS bot software the Shyftr folks are using is the generic MagpieRSS toolset, which is used by other RSS services. Hopefully, the people at Shyftr will rename the user agent to something more uniquely identifiable in the future so you can block via .htaccess.

(Note: Blocking a future unique Shyftr user agent via robots.txt probably won’t work as the crawler would need to fetch the robots.txt file first before fetching your feed and I didn’t see that behavior tonight.)

Http Code: 200 Date: Apr 12 19:48:28 Http Version: HTTP/1.0 Size in Bytes: 6244
Referer: -
Agent: MagpieRSS/0.72 (+
Http Code: 200 Date: Apr 12 19:48:28 Http Version: HTTP/1.0 Size in Bytes: 1406
Referer: -
Agent: -
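
If your blog platform lets you run your own request filter, the IP check could look something like this Python sketch. The address below is a placeholder from the reserved documentation range, not the actual Shyftr address, and the names are my own illustration.

```python
import ipaddress

# Placeholder only: 203.0.113.10 is a reserved documentation address.
# Swap in the address you actually see in your own logs.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.10/32")]


def is_blocked(client_ip, blocked=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked)
```

Matching on the IP rather than the user agent keeps other MagpieRSS-based readers working, which is the whole point of not blocking the agent string.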

Some real people feedback about bookmarklets…

Sunday, January 20th, 2008

On the MSNBC developer blog, the question was posed: “How do you share?” Not in the grade school way, but in the newfangled Web 2.0 way.

Overall, the comments from MSNBC readers were pretty… negative. Aside from the “I’ll just paste the link I want to share in an email” or the “I’ll just add the page to my browser bookmarks” or the “they’re tracking your habits for nefarious purposes” comments, other commenters cited just one or two social bookmarking sites (the most popular seeming to be either or And a few other commenters wondered, “Hey, MSNBC, don’t you own Newsvine?”

It appears that the zen habits of social bookmarking haven’t been widely accepted by the Internet populace at large.

Guy loses his domain due to a Gmail exploit

Saturday, December 29th, 2007

Has anyone else read this story of David Airey’s domains being stolen from him because of a Gmail exploit?

Both of David’s domains have been subsequently restored, thanks to the publicity he received this week.

Netscape Navigator End of Lifed, The Rest of Us Get A Little Nostalgic

Friday, December 28th, 2007

Let’s all take a moment and remember the good old days of the Internet in the 1990s … the Netscape Web browser is being end of lifed as of Feb 2008.

If you didn’t catch Code Rush, a documentary on Netscape which was shown on PBS in 2000, I highly recommend you do so.

SWSE - Semantic Web Search Engine

Sunday, December 23rd, 2007

This particular crawler is being deployed from the Semantic Web Search Engine (SWSE) project, which is attempting to crawl the nascent Semantic Web, including RSS and FOAF data.

This is yet another reason why deploying RSS is a good idea for any Web presence.

Here’s a link to the SWSE search demo.

Http Code: 304 Date: Dec 18 14:56:27 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: multicrawler (+

MSN Live Search - New activity

Monday, December 17th, 2007

Has anyone else seen some different activity coming from MSN? What I mean is that I’m seeing the following entries in my search logs, but it doesn’t look like traditional MSNBot crawler behavior.

Why this activity is different:
1) The originating IP address is from the MSN netblock.
2) There is an alleged referrer that looks like it is from an MSN search.
3) The user agent is showing as a browser.
4) This activity is showing very close to when I see MSNBot entries in my logs.

And no, the behavior does not appear to be a real life user.

Http Code: 200 Date: Dec 17 02:59:16 Http Version: HTTP/1.0 Size in Bytes: 40839
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

Http Code: 200 Date: Dec 17 03:13:02 Http Version: HTTP/1.0 Size in Bytes: 43238
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
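
The four signals listed above could be checked against your own logs with something like the Python sketch below. Everything specific in it is an assumption: the netblock and the referrer host are placeholders (reserved documentation values, not real MSN ones), the 30 minute window is a guess at what "very close" means, and the entry format is a plain dict I made up for illustration.

```python
import ipaddress
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# Placeholders only; substitute the netblock and search host you actually observe.
SUSPECT_NETBLOCK = ipaddress.ip_network("198.51.100.0/24")
SEARCH_REFERRER_HOSTS = {"search.example"}


def looks_like_pseudo_visit(entry, crawler_times, window=timedelta(minutes=30)):
    """Return True when a log entry matches all four signals from the list above."""
    # 1) The originating IP is inside the suspect netblock.
    from_netblock = ipaddress.ip_address(entry["ip"]) in SUSPECT_NETBLOCK
    # 2) The referrer claims to come from the search site.
    host = urlsplit(entry["referrer"]).hostname
    search_referrer = host in SEARCH_REFERRER_HOSTS
    # 3) The user agent presents as a browser, not a self-declared bot.
    agent = entry["agent"]
    browser_agent = agent.startswith("Mozilla/") and "bot" not in agent.lower()
    # 4) The hit lands close in time to a known crawler visit.
    near_crawler = any(abs(entry["time"] - t) <= window for t in crawler_times)
    return from_netblock and search_referrer and browser_agent and near_crawler
```

Requiring all four signals at once keeps ordinary visitors who merely arrive from a search page out of the flagged set.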

Radian6 monitors you!

Sunday, December 16th, 2007

New crawler in my logs from an outfit called Radian6. From the Web site, they look to be a social media monitoring service for the Google Alerts challenged, I guess much in the same way as those other pre-existing social media monitoring services.

Http Code: 200 Date: Dec 16 16:52:32 Http Version: HTTP/1.1 Size in Bytes: 7365
Referer: -
Agent: R6_FeedFetcher_(

Hack Yahoo Fantasy Football

Monday, December 10th, 2007

Heh. If you happen to Google “hack yahoo fantasy football”, you get a SERP where, well, Google isn’t quite sure what to display. Cleverhack is currently showing as #3 on the SERP, but with no related text. I’ve never seen a SERP quite like this one before.

You see, I’ve been getting a bunch of hits for this particular keyword combination, but what is most interesting is that up until now, I’ve never used the keywords in a sequential phrase. Sure, the word hack is in the URL, I list Yahoo IM info on my sidebar and I’ve written about fantasy football before (and I’m in the playoffs for two leagues this year!), but never sequentially, until now.

Kind of interesting, and yes I’m ruining the results because I’m blogging about it. And for you guys looking for how to hack Yahoo fantasy football, I don’t have that information on this blog…all I can say is that Randy Moss helped me out and LT has been mediocre up until recently.


Feed Each Other

Monday, December 10th, 2007

Feed Each Other is yet another online feed reader released in late September. According to one of the developers behind it, the difference between Feed Each Other and other online feed services is that Feed Each Other “lets you harness the power of your network of friends and colleagues to help you filter and explore the web in an fun, enlightening, efficient way.”

It seems like the idea is more of a feature than a standalone feed reading service.

However, I will admit that I do like the strict XHTML they’re using…quite nice.

Http Code: 200 Date: Dec 09 19:31:02 Http Version: HTTP/1.1 Size in Bytes: 6916
Referer: -
Agent: FeedEachOther :) +