Archive for the 'Internet' Category

Explicit Gmail referrer

Monday, November 27th, 2006

Recently, I had a visitor who clicked through a link in an email he/she received via Gmail. That isn’t so unusual; what was unusual was that the referrer read like this…

http://mail.google.com/mail/?account_id=username%40gmail.com

And thereby the visitor’s Gmail address was shared with me, merely by the visitor clicking through an email link. Normally, Gmail referrers look something like the example below. (And yes, other Web mail providers, including Yahoo Mail, AIM and MSN, have similarly obscured referrers.)

https://mail.google.com/mail/?auth=somereallylongalphanumericstring

I suspect that the visitor was reading his/her Gmail through something other than an official Gmail client, but I don’t know what.
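For anyone who wants to check their own logs for this kind of leak, here is a minimal Python sketch using only the standard library. It pulls the `account_id` parameter out of a referrer URL and decodes it. The function name `leaked_account` is made up for illustration, and the two URLs are the examples from this post.

```python
from urllib.parse import urlsplit, parse_qs

def leaked_account(referrer):
    """Return the decoded account_id value from a referrer URL, or None."""
    query = urlsplit(referrer).query
    # parse_qs percent-decodes values, so %40 comes back as "@"
    values = parse_qs(query).get("account_id")
    return values[0] if values else None

explicit = "http://mail.google.com/mail/?account_id=username%40gmail.com"
obscured = "https://mail.google.com/mail/?auth=somereallylongalphanumericstring"

print(leaked_account(explicit))  # username@gmail.com
print(leaked_account(obscured))  # None
```

The obscured referrer carries only an opaque `auth` string, so there is nothing readable to extract from it.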

[tags]Gmail, referrer, Web mail [/tags]

Competitio.us

Sunday, November 26th, 2006

So, this morning I got a referrer I didn’t recognize from a Web site called Competitio.us, and I just had to check it out. Competitio.us is another brand tracking Web site service currently in beta. From what I could gather on the Web site, you (as a brand or project manager) sign up for the service and then set up Competitio.us to monitor news and traffic statistics about your…competition. Get it?

As for design, Competitio.us is using XHTML Transitional and their markup looks pretty clean. The underlying technology platform appears to be Ruby on Rails on Linux. Their color scheme and font choices just work. I’m going to give them bonus points for minimizing the use of images on their public pages and their harnessing of the H1 and H2 tags. I think their front page looks really good in terms of how much copy they’ve integrated, although they’re currently not showing on Google for “competitor research,” a term I’d think they’d want to show for. They may want to look at the keywords in their copy and maybe introduce some link anchor text with those keywords. In addition, their images have no alt attributes, the title tag on their pages could use some keywords, and there are no meta description tags.

As for “hacking” Competitio.us, even with the referrer information I have, I cannot view how or why my blog is being tracked. The referrer only links to this blog’s main URL. For giggles, I even signed up for a Competitio.us account and tried the URL in the referrer, and only saw an ugly Application error (Rails) message. Someone at Competitio.us may want to redirect a request like that to a page with information. So, one cannot see URLs not associated with one’s own account.

Host: [User’s IP Address]
/
Http Code: 200 Date: Nov 26 10:52:58 Http Version: HTTP/1.1 Size in Bytes: 15447
Referer: http://competitio.us/project/5217/competitor/23433
Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0

[tags]Competitio.us, ruby on rails, competitor research, competitor tracking, Web 2.0 [/tags]

Elements of Web 2.0 Design

Friday, November 24th, 2006

The Visual Design of Web 2.0 discusses some aspects of Web 2.0 design and how these aspects are implemented.

[tags]Web 2.0, Web Design, rounded corners, bright colors [/tags]

Amazon.com DDoS’ed by Customers Vote Winner

Thursday, November 23rd, 2006

In case you were hoping to take advantage of the Amazon Customers Vote deal for a $100 Xbox 360 on Thanksgiving, Amazon.com was reportedly not reachable from at least 2:00 to 2:15pm EST (11:00 to 11:15am PST). Presumably, the traffic caused by the $100 Xbox seekers was simply too much.

Some people are complaining that they couldn’t even load the Amazon homepage…I tried to around 2:10 EST and couldn’t get a response from the Web server.

Update: There are over 500 comments in a thread on the Amazon Customers Vote Forum with disgruntled customers chiming in, in addition to other blogs which have noted the outage. Plenty of people are not happy and some are filing Better Business Bureau complaints.

The Black Friday / Cyber Monday Fallacy

Tuesday, November 21st, 2006

Since some are speculating about shoppers’ habits this upcoming weekend, I’d like to chime in. Speaking as a person formerly* involved in E-commerce, I can tell you that Black Friday and the upcoming Cyber Monday won’t be all that. Sure, the weekend will be good for online retailers, but these won’t be the busiest days for online retailers during the 2006 Holiday Season.

In fact, the busiest days for online retailers this year will be December 12th and 13th. The reason? Christmas is on a Monday this year, and that pushes up the last day for a package to ship via UPS Ground within the Continental US to Friday, December 15th. (You can spec out a shipment on the UPS Web site to see what I mean.) Many retailers will use the 14th as an extra day for shipping and handling and to sort out issues on their end.
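The December 15th date can be sanity-checked with a little date arithmetic. The Python sketch below is only a rough model: it assumes a worst-case ground transit of five business days and no weekend or holiday deliveries. Both are assumptions made for illustration, not figures taken from UPS, and the helper names are made up.

```python
from datetime import date, timedelta

def previous_business_day(day):
    """Step back to the nearest earlier weekday (Mon-Fri), ignoring holidays."""
    day -= timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day

def ground_cutoff(holiday, transit_days=5):
    """Last ship date that still arrives on a business day before `holiday`.

    transit_days is an assumed worst-case transit time, not a UPS figure.
    """
    day = previous_business_day(holiday)  # last delivery day before the holiday
    for _ in range(transit_days):
        day = previous_business_day(day)
    return day

christmas = date(2006, 12, 25)
cutoff = ground_cutoff(christmas)
print(christmas.strftime("%A"), cutoff.isoformat())  # Monday 2006-12-15
```

Under those assumptions the last delivery day is Friday, December 22nd, and counting back five business days lands on Friday, December 15th.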

The bulk of e-commerce shoppers are those who would rather not go shopping in brick and mortar stores, hence the waiting until the second week of December to shop, but who also don’t want to be bothered with expedited shipping expenses and/or waiting for a package to arrive at the last minute.

Shipping is a big deal to consumers, as we’ve seen how consumers would rather take a “free shipping” deal over a percentage off deal - even when the percentage off will help consumers save more. This shopper psychology is why sites like Amazon pull stunts like “free expedited shipping” in the last days before Christmas - they’re trying to convince folks to stick around and buy and not worry about shipping.

*Formerly as in, currently advising.

Google

Tuesday, November 21st, 2006

Well, since Google stock hit $500 today, I thought I’d do a Google post.

-I had someone pulling my feed who was on Google wifi.

-My Dad just had to compare Microsoft Virtual Earth vs. Google Earth. He seemed to like the zooming action of both, but thought it was difficult to tilt the view within Google Earth.

-I am administering a Google AdWords campaign which is unfortunately showing ads on a number of those cruddy waste of bandwidth Made for Adsense (MFA) sites. While I don’t want to take the ads off the content network, the only solution I can think of is to Google for the advertised URL and manually remove the MFA sites from the campaign. What a pain.

[tags]Google, Google Wifi, Google Earth, Google AdWords, Google AdSense [/tags]

Entireweb Speedy Spider

Monday, November 20th, 2006

Speedy Spider is the crawler for the Sweden-based search engine Entireweb. I have not seen any referrers from Entireweb, but the Speedy Spider User Agent includes a URL to the informative Speedy Spider FAQ. In addition, Speedy Spider is quite polite for a bot, only crawling one or two pages per visit.

Host: 62.13.25.220
/robots.txt
Http Code: 200 Date: Nov 19 06:57:54 Http Version: HTTP/1.0 Size in Bytes: 6702
Referer: -
Agent: Speedy Spider (Entireweb; Beta/1.0; http://www.entireweb.com/about/search_tech/speedyspider/)

[tags]Search Engine, bot, crawler, spider, Entireweb, Speedy Spider [/tags]

BuzzLogic

Sunday, November 19th, 2006

Oh, wait, just when you thought we were done here with research services for the Google impaired, there is yet another one. BuzzLogic has been sending out their crawler for the past few weeks to this blog and by happenstance currently has a private beta for companies.

What is different about BuzzLogic’s crawler though is that it’s revealing a referrer which really, honestly should not be seen in the Web logs. Also, their crawler does not have any identifying information in the User Agent field. Here’s an example.

The questionable referrer, which I am seeing via Sitemeter, looks like this:

file:///data/thumbnailer/work/home-2006-11-17-17:21:16.438/2006-11-19-07:37:13.838-in.html

If I had to guess, BuzzLogic compiles the collected data into a static HTML file. I’ve seen that static HTML file change day by day, with a different time/date stamp each time the crawler hits my Web server.

This is what I see via my Web logs.
Host: 64.34.246.44 (I was only able to connect this to BuzzLogic through a traceroute of the IP address. The BuzzLogic Web server is hosted on what seems to be a completely different hosting provider.)
/wp-content/plugins/sociable/images/reddit.png (This crawler is hitting my image files for some reason.)
Http Code: 200 Date: Nov 19 10:37:14 Http Version: HTTP/1.1 Size in Bytes: 5943
Referer: -
Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)

[tags]bot, crawler, scraper, buzzlogic, brand monitoring services, search engine challenged PR firms [/tags]

Webclipping

Saturday, November 18th, 2006

Yet another “monitoring the Web just for you, your brand and your PR department which can’t use Google” service: a bot from Webclipping was spied hitting my RSS feeds recently. Clicking around the Webclipping site (which doesn’t look all that hot in Firefox 2.0), the service seems to be similar to other monitoring outfits including brandimensions. (A side note: brandimensions, which I’ve written about before, charmingly has a Flash-only-I-don’t-really-want-to-be-found-by-search-engines homepage. As you can see, I’m not exactly a fan of a service which compiles my content and doesn’t allow me to see the context.)

Host: 38.144.36.19
/blog/blog.rdf
Http Code: 302 Date: Nov 18 18:08:51 Http Version: HTTP/1.1 Size in Bytes: 224
Referer: -
Agent: Mozilla/4.0 (Webclipping.com)

[tags]bot, crawler, scraper, webclipping, brand monitoring services, search engine challenged PR firms [/tags]

Hoopla

Saturday, November 18th, 2006

A service still in private beta, Hoopla purports to be “the next big portal that renders other online news and blog services obsolete.” There’s also an accompanying blog, currently with only one entry.

I found Hoopla via my usual discovery method, my Web site logs where the crawler was hitting my RSS feeds. It appears they need to be crawling the Web and blogosphere for a bit in order to collect content for their portal. I can’t tell if the folks behind Hoopla are American and/or German though. It looks like the anonymous WHOIS registration is for an American company, and the language on the Hoopla parked page is definitely colloquial American English, but the crawler is from a German IP.

Host: 82.165.243.217
/blog/blog.rdf
Http Code: 302 Date: Nov 17 12:37:08 Http Version: HTTP/1.1 Size in Bytes: 224
Referer: -
Agent: http://www.hoopla.com/; tracker@hoopla.com - Hoopla.com honors robots.txt; Hoopla.com Tracker; Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050402 Firefox/1.0.2

[tags]hoopla, RSS, beta, crawler, portal, portal page, Web 2.0 [/tags]

Three RSS applications - FeedSweep, Fatcast, Wefeelfine

Thursday, November 16th, 2006

FeedSweep provides a way to display syndicated RSS content on your site. So, for example, you could show the cleverhack feed if you really, really wanted to. One gripe: I couldn’t find an Add Feed to FeedSweep button.

<script src="http://www.feedsweep.com/products/feedsweep/producer.aspx?feeds=http%3A%2F%2Fcleverhack%2Ecom%2Ffeed%2F"></script>

Fatcast is an online RSS reader similar to Bloglines. Again, I could not find an Add Feed to Fatcast button. However, the service does allow one to share their list of feeds if they wish, so of course I made one exclusively with cleverhack feeds.

My Fatcast feed list.

Last, but certainly not without emotion, is WeFeelFine. It appears to be a research project which searches the blogosphere for words or phrases on how the blogosphere feels. The applet which displays the emotional information looks quite cool (click on the We Feel Fine link on the home page) and allows one to search via demographics - age, sex, location. (Warning, the applet seems to take a bit of memory in Firefox.) Anyone up for reading about how some emo twentysomethings from Seattle feel?

And how did I find WeFeelFine? Through their crawler, which didn’t have any identifying info in the User Agent, although a lookup on the IP provided the domain name.

Host: 128.177.11.193
/2006/11/10/rutgers-9-0/
Http Code: 200 Date: Nov 10 00:05:56 Http Version: HTTP/1.1 Size in Bytes: 16477
Referer: -
Agent: Mozilla/4.0
00000000000000000000000000000000000000

[tags] RSS, RSS feeds, RSS Syndication, RSS Readers, RSS Research, FeedSweep, Fatcast, WeFeelFine [/tags]

Identify your (Web) spider

Wednesday, November 15th, 2006

Today Slashdot had a front page article about how to create a Web spider on Linux. Aside from the fact that the subject matter just totally excites my inner nerd, I wanted to make a point especially for those who would be writing a spider, bot or crawler for fun and profit.

I have this true story about how, not so long ago, I was a Webmaster. One very busy morning, I had a crawler that was hitting my site and it was annoying the heck out of me as it was a little too aggressive. I really wanted to ban it, but I saw a URL in the User Agent, and so I tracked down the source. The homepage for the bot at the time looked like this and the site it was crawling for wasn’t live yet. At that point, I had a choice - I could just ban the bot and be done with it or allow the bot to run and hope that the not yet live site would someday provide some benefit.

As it turns out, I held my nose and allowed the bot to run. In fact, a few weeks later, it did slow down and was friendlier - so I didn’t mind it as much. The other part to this story is that the site in question went live in April 2006 - and it did show the crawled content.

In other words, if your bot is legit, identify it or face the chance that you could be banned from the very sites you want to crawl. While the shopwiki example isn’t the best example of a parked page, at least I had some information to go on as a Webmaster.
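For those writing a spider, here is a minimal Python sketch of what "identify it" could look like in practice: a User Agent string that names the bot and points to an info page, plus a robots.txt check before crawling. The bot name `ExampleBot`, the example.com URLs and the sample robots.txt are all hypothetical placeholders. The request is only built, never sent, so the sketch runs without touching the network.

```python
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical bot name and info page; substitute your own.
BOT_NAME = "ExampleBot"
USER_AGENT = f"{BOT_NAME}/0.1 (+http://example.com/bot.html)"

def build_request(url):
    """Build (but do not send) a request that identifies the crawler."""
    return Request(url, headers={"User-Agent": USER_AGENT})

def allowed(robots_txt, url):
    """Check an already-fetched robots.txt body for permission to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(BOT_NAME, url)

# Made-up robots.txt content for the demonstration.
sample_robots = """\
User-agent: ExampleBot
Disallow: /private/
"""

req = build_request("http://example.com/feed/")
print(req.get_header("User-agent"))  # urllib stores the header name as "User-agent"
print(allowed(sample_robots, "http://example.com/feed/"))
print(allowed(sample_robots, "http://example.com/private/page"))
```

A real crawler would fetch each site's robots.txt first and pause between requests. The point here is simply that a Webmaster reading the logs gets a name and a URL to go on.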

[tags]Spider, Bot, Crawler, User Agent, Webmaster, Web Admin [/tags]