The Next Big Web Thing - Loving Teh Web !!!1!1!11


Thursday, December 21, 2006

The Problem With Authority - Google And Comment Spam

If you're an average web surfer, you'll have little idea about the battle that's been raging over the years between Google and web spammers. The spammers want their sites to rank highest so that they get the traffic and the financial gains that come with it. Big search engines want to keep them out, as they want to offer the best results and spam sites often have little to offer the web surfer. It's one hell of a tussle.

I've been following events myself over the last couple of years, as I enjoy the mass of information and disinformation that comes with the coverage of this behind-the-scenes war. In some cases I've had to learn to protect against the tools being used by spammers in their battle; in others the information has helped me to gain an edge without stepping over the line. Once in a while, I've suffered as a result of going that step too far and then had to learn how to get a site back in line with what the engines see as acceptable - you have to love this stuff and keep on top of it if you spend so much time working on the web.

Over the years, blackhat SEOs and spammers have developed various methods to help slide sites up the search results. These range from sneaky cloaking, keyword stuffing and hidden text to cross site scripting, all used to gain that all-important edge in the results. The methods have changed and developed alongside the changes in the ranking algorithms of the search engines.

In recent years, if you've owned a blog, forum or guestbook, you've no doubt come across comment spamming. Not so long ago, Google put a lot of weight on the number and quality of links pointing at your site when ranking it, with each link counting as a vote. Comment spam was generally an automated method of boosting this vote for a site. If you've seen waves of nonsense or heavily linked comments on your blog, you've been visited by spam bots trying to influence the search engines.

Google tried to combat this by encouraging blogs to use the sometimes misunderstood nofollow attribute. Most blogs now nofollow the links in their comments, with little success in stemming the tide - comment spam comes as thick and fast as it ever did, but at least where it's nofollowed, it's not passing any benefit to the sites it links to.
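For anyone who hasn't seen it in the flesh, nofollow is just a rel="nofollow" value on the link tag. Here's a minimal sketch of how a blog might force it onto every link in a submitted comment, using only Python's standard library. The names NofollowRewriter and nofollow_links are made up for this example, and it isn't how any particular blog platform does it.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit an HTML fragment, forcing rel="nofollow" on every <a> tag.

    Hypothetical helper for illustration only. HTML comments and
    doctypes in the input are dropped, which is fine for a blog comment.
    """

    def __init__(self):
        # Keep entities such as &amp; untouched so they pass straight through.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closer=">"):
        if tag == "a":
            # Drop any rel value the commenter supplied, then add nofollow.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closer}")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, closer=" />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def nofollow_links(comment_html):
    """Return comment_html with rel="nofollow" set on all of its links."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(nofollow_links('<a href="http://example.com/">cheap phones</a>'))
```

The link still works for a human who clicks it. The attribute only tells the search engines not to count it as a vote, which is why the spam keeps coming even though it no longer helps the spammer's rankings.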

The Increasing Importance Of Authority

Most of what I'm talking about here concerns Google, thanks mainly to its massive share of searches and the traffic it generates, but the effects do happen with the other main engines as well. So, one thing Google has done over the last year to try and improve its algorithm is shift the emphasis away from links and towards authority.

This means some sites are more trusted than others, thanks to a number of factors that decide what makes a trusted domain. These factors include the age of the domain, the purpose of the domain and the quality of the links pointed at it. I believe the theory is that blending the results using quality and links as a yardstick is a potentially more valuable method of ranking sites in the search results. Educational and government sites are generally at the top of this authority hierarchy. It's widely believed that a link or two from a .gov (government) or a .edu (education) site will be a supercharged vote in your site's favour, and as such these links have become highly prized.

This is all well and good, but as in any arms race, spammers are on the ball and are comment spamming those sites too, especially given the power of these links. But this new move towards domain authority is causing some new ranking problems - for starters, if a site with a lot of authority gets a page on something unrelated to its main theme, it can rank highly for something it really shouldn't, thanks to its authority in the eyes of the Google algorithm.

As an example, summer 2006 saw David Naylor and Danny Sullivan ranking highly for searches on Sky HD - not because they were authorities on the new technology, but because they'd mentioned in their blogs they'd got their hands on one. I get the same problem on one of my blogs - it ranks for stuff it shouldn't over other sites of mine that are bang on target for the theme, but not as old and established as my old personal blog.
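To make the effect concrete, here's a toy sketch of what happens when authority is weighted too heavily against relevance. To be loud about it: this is not Google's algorithm, which nobody outside Google has seen. The toy_score function, the authority_weight parameter and all of the numbers are invented purely for illustration.

```python
def toy_score(relevance, authority, authority_weight):
    """Blend relevance and authority (both 0..1) into one ranking score.

    Purely hypothetical: a weighted average standing in for whatever
    the real engines do.
    """
    return authority_weight * authority + (1 - authority_weight) * relevance


# Made-up pages: an old, trusted blog that mentions the topic in passing,
# and a newer site that is bang on target for the theme.
pages = {
    "old-personal-blog": {"relevance": 0.2, "authority": 0.9},
    "new-on-topic-site": {"relevance": 0.9, "authority": 0.2},
}


def rank(pages, authority_weight):
    """Return page names ordered from best to worst toy score."""
    return sorted(
        pages,
        key=lambda name: toy_score(
            pages[name]["relevance"], pages[name]["authority"], authority_weight
        ),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(pages, authority_weight=0.3))  # relevance dominates
    print(rank(pages, authority_weight=0.7))  # authority dominates
```

With the weight at 0.3 the on-topic site comes first. Push it to 0.7 and the old blog jumps ahead, despite barely touching the subject. That flip is the whole problem in miniature.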

Blog Spam Content Now More Relevant Than Real Content

Ok - if you're still with us, this background info leads to my concern with this authority-based approach. I've spotted something over the last couple of days that makes me think the authority and trust slant in Google's algorithm has gone too far down the domain authority road when it comes to how it ranks sites in the results. Tying together these ideas of domain authority and blog spam, Google is starting to rank the actual comment spam content on authority sites as more relevant than real content. And it's not just comment spam it's doing it with - it's comment feeds as well.

Ok, so let's look at this. I only spotted it because I saw a drop in my traffic on some nice terms that I had highly relevant and authoritative content on. I've not done a lot more research, but I've seen the same thing happening across a few terms, and also on Yahoo and MSN. Furthermore, it's the fault of spammers but, ironically, it's not doing them any good.

Now - a robotic spam attack will leave the same comments on a range of sites. Look at this page - and note the .edu domain and the theme of the site. It screams authority. However, scroll down and look at the comments, it's been spammed the hell out of on a load of terms designed to boost the rank of the sites it's pointing at.

No fear though - the links are all nofollowed so the spammers are getting no love or power from the links. But, this has led to a curious side effect - the actual comment spam content itself ranks highly because the site is such an authority site. Not only does it rank highly, it was spidered very quickly - check the dates and the content of the comments below.

Comment spam on the .edu page

So, all of a sudden, the page about a jobs fair on the Stanford Computer Forum is ranking for all sorts of terms, and highly. If it was one site it wouldn't be an issue - but let's not forget this is automated, so this has happened across a whole range of sites and they've all ping-ponged to the top of the search results, driving extremely relevant but not as 'domain authoritative' sites out of the SERPs, and yes - that includes one of mine.

Just look at the results for the following two search terms, "12 Contract Month Nokia Vodafone" and "Galabingo Sharon", and the sites at the top of them.

Search result for 12 Contract Month Nokia Vodafone

Search result for Galabingo Sharon

Notice how the top sites are all either RSS feeds or comment spam. On the Gala Bingo search the top page is even more spammed than on the Nokia search, and the Stanford site's a bit further down the results. Worse still, there's not a single relevant site ranked on the first page, or indeed the next 5 pages (I gave up there) - they're all comment spam pages.

Conclusion

OK, this is quite a limited set of results. I haven't explored it much further than searching for some of the terms on the Stanford site - the same happened on a number of them, not all, but enough to cause me concern. It looks like the spammers have scored a big hit here. Ok, so they're not going to get the link love they were after, but at the very least, they've got the possibility of the click-through traffic and the knowledge they've pushed relevant sites out of viewers' eyes.

Google, on the other hand, has some work to do on the amount of weight it gives to a domain's authority. I don't claim to be any sort of expert on their search engine, but up until two days ago it was doing something right, as I could actually get to the proper content (not just my own) on the one search term. The sites in the SERPs then at least all had something to do with the subject. Now, none of them do. It looks like the spammers have won this battle for now, but as to whether they'll win the war... Time will tell.


2 Comments:

Justin said...

Nice, informative post. But what do you propose as a real solution? I mean, most bloggers now have anti-spam on their comments, but how does that prevent non-automated spammers?

3:36 PM  
Anthony D'Elia said...

The answer lies in context. Google must give more weight to the theme of the page and the site. If the keyword phrase is out of context, the page on which it resides should not rank well for it--regardless of the authority of the given domain.

6:54 PM  
