Uninstalled: 05/01/2009

The Top Five Myths of SEO (IMHO) - Myth #2

Tuesday, May 26, 2009

Continuing in my short series of five big SEO myths, this one is perhaps the most controversial of the concepts I’m going to tackle.

In the first post in the series, I laid into the discredited but still apparently widespread practice of stuffing keywords into the meta tags of a web page. My research into how keywords are used by search engines also led me to take a long hard look at the notion of Keyword Density and the idea that there is some magic optimum number that will make all the difference between search engine success and failure.

For those of you who already know what Keyword Density is and why it’s deemed so important, I might as well get this out of the way right up front: frankly, I'm just not buying it.

Quick disclaimers:

i. As with all the posts in this series, I'm writing from the perspective of a Public Relations bloke. My observations relate to how news releases and editorial copy perform in search engine terms; the same thoughts are not necessarily going to hold water when looked at from a broader web content perspective.

ii. I still have a lot to learn about all this stuff. If I get things wrong (as I inevitably will) I will add updated and corrected info in future posts.

OK. Onwards. If you want the really short version:

From what I've learned, Keyword Density is not entirely irrelevant, but it’s far from being the most important determinant of SEO success.

Rather than worry about achieving an optimum density percentage, people would do a lot better to focus on writing good, interesting copy.

[Note: I'm drawing heavily on the fact that I spent many years working in the knowledge management software business before moving into PR. I would never have considered myself a true KM expert, and I'm certainly not an expert in SEO - I'm a mere flack, after all - but I think I learned enough about keyword-based indexing and search techniques to be mildly dangerous. I've also dredged up from memory some of the old examples and thought models we used to use back in my KM days. Grateful credit to a number of my old KM buddies for seeding the dark and dusty corners of my mind with some of these still useful examples.]

Keyword Density is, according to Wikipedia's simple definition:

...the percentage of times a keyword or phrase appears on a web page compared to the total number of words on the page.

Let's say you're searching for the keyword "bogus" and you come across a 100-word document that happens to include that keyword six times -- that document has a density of six out of 100 for the keyword "bogus", or:

6/100 = 0.06 - or, expressed as a percentage, a keyword density of 6%
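For anyone who prefers to see the sum as code, here's a minimal sketch of that calculation. The function name and the crude word-splitting rule are my own assumptions for illustration; real density calculators will differ in how they count words and phrases.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Return occurrences of `keyword` divided by the total word count."""
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# A made-up 100-word document that uses "bogus" six times.
sample = " ".join(["bogus"] * 6 + ["filler"] * 94)
density = keyword_density(sample, "bogus")
print(f"{density:.0%}")  # six uses out of 100 words
```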

The same document would probably have a totally different keyword density for other words, obviously. It's all relative. This density thing is considered important to SEO experts for all kinds of purportedly good reasons. Let's dig into it and I'll try to explain how I think this stuff works...

Think of the way a search engine functions. A potential customer sitting in front of the search engine is trying to find information that is important to them. As a search engine developer, you want to offer up useful and meaningful results when they search. Using only the simple keywords the user provides, somehow you have to try to figure out what information would matter most to that individual right now.

This is a massively hard thing for any computer system to do. Most of us aren't really terribly good at searching -- it's hard for us to translate the concepts and ideas we're looking for into simple keywords.

At the other end of the search pipe, it's almost indescribably challenging to build a computer system that can understand what all the stuff out there on the Web is about. And "aboutness" is really, really important. To a computer, the words and phrases in a document are just bits: ones and zeroes. They have no meaning; the computer doesn't know what the document is about.

People know that a certain arrangement of words on a page, with spaces and punctuation just so, will turn a set of otherwise random characters into something that has meaning; that has aboutness.

Think of it this way: say you've forgotten both the name and the author of an old poem you remember learning as a child. You recall the sense of the thing, but you can't remember how it went.

So you wander into a favourite second-hand bookstore to see if you can find a copy. Without even the poet's name, though, you're going to be kind of hosed.

Luckily, the ancient shopkeeper (let's call him Mr. Ptolemy) is both exceptionally well-read and has a prodigious memory.

Trying to describe the poem to our friendly bookstore owner, you mention that it's about the choices we all have to make in life, and the consequences we will inevitably face from those choices as we grow older.

Somehow, splendid chap that he is, Mr. Ptolemy is able to discern that you're talking about Robert Frost's "The Road Not Taken".

He understood precisely what you meant and, as he recites a couple of favourite lines ("...Two roads diverged in a wood, and I-- I took the one less traveled by, And that has made all the difference"), it all snaps into place. Yes! That's exactly the poem I'm looking for!

Now try to imagine sitting in front of the Web version of Google and achieving the same result. What keywords would you have used? "Life" and "Choices" perhaps? Neither of those words appears anywhere in the poem. So where are you going to start?*

You have the sum of all human knowledge at your fingertips, but all you can do is describe what the document you want is broadly about. And all the computer can do is a kind of textual number-crunching based on word frequency, link relationships and keyword concepts.

Do you see how hard this stuff is for the people who build search engines?

Without getting deep into the kind of incredibly clever semantic search stuff my friends at TextWise do (disclosure: they're a client), it's really quite amazingly hard for most software systems to understand in any real way what even a simple document is about. So search engines were built around certain compromises.

Typically, in documents, web pages and things like that, there is going to be some kind of discernible relationship between the words they contain and what the document is actually about (unless, it seems, we're looking at poetic metaphors). A document that uses the word "astrophysics" several times is likely (but far from certain) to have something to do with the general topic of astrophysics.

From this, we can infer that a whole bunch of documents and web pages with many similar words (astrophysics, astrophysicists, cosmologists, cosmology, etc.) are more likely to be about the same thing than documents with no similar words. This is useful, because it means we can start grouping stuff together into clusters of inferred aboutness.

(Homonyms tend to bugger this all up, I'm afraid. Our astrophysicist would mean something quite specific if she searched for "stars". To a teenage celebrity gossip junkie, the same keyword means something entirely different. And a poor chap who just had difficulty spelling the word "asterisk" would be even more confused. But let's not get too far down that path - semantic disambiguation blows my mind.)

By now, you should have already figured out how some of the earliest search engines worked.

- Build a really, really big index of words and pointers to where they appear in lots and lots of documents.
- Use the frequency of word-use as a guide to which documents are most likely to be about the topics your searcher is interested in.
- Layer on some synonym cleverness and you've got the start of a workable way to navigate through an ever-expanding online corpus of knowledge.
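Those three steps can be sketched in a few lines of Python. This is a toy under my own assumptions (the documents, the function names and the hand-written synonym table are all made up), not a description of how any real search engine was built, but it does show how easily a frequency-only ranking gets fooled.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: number of times the word appears there}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(tokenize(text)).items():
            index[word][doc_id] = count
    return index

def search(index, query, synonyms=None):
    """Rank documents by raw frequency of the query word plus any synonyms."""
    query = query.lower()
    terms = [query] + list((synonyms or {}).get(query, []))
    scores = Counter()
    for term in terms:
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]

# Three made-up documents: a real lecture, some gossip, and a stuffed page.
docs = {
    "lecture": "Astrophysics is the physics of stars. Astrophysics students study cosmology.",
    "gossip": "The stars walked the red carpet last night.",
    "spam": "astrophysics astrophysics astrophysics astrophysics buy now",
}
index = build_index(docs)
results = search(index, "astrophysics", synonyms={"astrophysics": ["cosmology"]})
print(results)  # the stuffed page outranks the lecture
```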

It's from this approach that the notion of keyword density rose to prominence in the SEO world.

Unscrupulous marketers in the early days of the web figured out that early, dumb search engines could be fooled. A document that included the word "astrophysics" in every second sentence might, the theory went, end up being ranked as the single most relevant and useful document about astrophysics in the entire universe. (It wasn't really quite this unsubtle, but you get my drift).

Having worked out the importance of density, web marketing monkeys started stuffing their pages with hidden keywords. Remember that old practice of embedding white text on the white background of a page? That was a density game.

The search companies quickly caught on though, as the Wikipedia entry notes:

In the late 1990s, which was the early days of search engines, keyword density was an important factor in how a page was ranked. However, as webmasters discovered this and the implementation of optimum keyword density became widespread, it became a minor factor in the rankings. Search engines began giving priority to other factors that are beyond the direct control of webmasters. Today, the overuse of keywords, a practice called keyword stuffing, will cause a web page to be penalized.

If you do any research into this stuff at all, you'll soon see that there's something of a balancing act going on. On the one hand, you don't want to get downranked as a spammer for having too many keywords stuffed into your web pages. On the other, you don't want to run the risk of ranking too low by not including enough keywords.

There's a two-step consulting process taking place out there:

i. Help the client figure out the most important keywords that will attract the right audience to their web pages (e.g. people who want to buy a couch in Canada are probably searching for "chesterfield" not "settee");

ii. Optimize all web content to hit the right proportion of keywords-to-text throughout.

The general consensus right now seems to be that maintaining a keyword density of between 2-3% in your web content is optimal.

Any higher than 3% and you might get marked as spam; any lower than 2% and you're just not even on the radar. These numbers vary widely, mind: I've seen optimal density recommendations as high as 8% - which seems insane to me.

Think about this in PR terms for a second: to achieve 2-3% recommended density in a short, 400-word news release, you’d need to repeat the chosen keyword 8-12 times. We've all read news releases like that - the ones that sound like they were written by robots.
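The arithmetic, for the sceptical, as a quick sketch (the function name is just something I made up for illustration):

```python
def repetitions_needed(word_count: int, density_percent: int) -> int:
    """Number of keyword uses needed to hit a given density, rounded down."""
    return word_count * density_percent // 100

release_length = 400  # words, as in the news release example
low = repetitions_needed(release_length, 2)
high = repetitions_needed(release_length, 3)
print(f"{low}-{high} repetitions")
```

Plug in the 8% figure some people recommend and the same 400-word release would need the keyword 32 times.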

Here's the thing, though: other than a relatively small group of real experts (the people who actually build the search engine algorithms at Google and elsewhere) no one really seems to know whether keyword density has any impact on search engine results.

In fact, I've been unable to find a single shred of evidence that any major search engine in use today gives preference to a particular ratio of keywords in web pages.

There are a lot of conflicting opinions out there, and I could be 100% wrong about this, but stick with me...

In all of the reading I've been doing on this topic, it was one particular comment from Eric Brantner at the site Reve News (geddit?) that really sparked my skepticism. In a piece titled "Keyword Density: The SEO Myth that Never Dies", Eric writes:

The simple truth is search engines are far too advanced to be tricked by something as basic as an optimal keyword density

...and that makes a great deal of sense to me.

As an aside, I think one part of the problem is that people often completely misinterpret the idea behind those optimal density numbers. It's easy to assume "recommended density" should be taken as a guide to add more keywords into a web page until you hit the magic ratio, and there are scores of online keyword density calculators that promise to help you figure out your sweet spot.

In fact, if keyword density measures are important at all, they're primarily useful in helping to manage keyword overload -- to ensure your content doesn’t get discounted as spam.

Optimal density is something you're encouraged to work down to, not up towards. There’s a good article on this topic at the delightfully snarky SEOElite blog and another useful analysis on the well-known SEO Tools site.

Getting back to the main point, though, I’ve come across a number of sources making the (entirely believable) assertion that keyword density on a single document doesn’t actually matter much at all. And here's why: keyword density is an internal measure. It ignores the fact that no web page is an island.

In other words: assessing keyword density can only tell you something about the individual web page (and its numeric placement in a simple ranking table) - it's a way of analyzing word frequency in a document in relation only to the document itself.

Think of a great long list of documents, arranged in order of percentage density for the keyword "street".

- At the top of the list is a document that has a very high density, as it contains the keyword many thousands of times in a 2,000 page file (let's say it has a density of around 8%).
- Way further down the list is a web site that mentions the word fifty times out of 35,000 words (0.14% density).
- Somewhere in the middle is a Wikipedia entry with 133 uses of the keyword out of 2,700 words (5% density).

So which of these is actually the most relevant document? The answer, of course, all depends on what you're looking for.

That first document in the list includes the word "street" thousands of times because it's the Yellow Pages. Probably not what you had in mind.

The web site with a keyword density of less than 1% is the hip young online magazine you're looking for - the one that just happens to be about all things "Street", but is way too fearsomely cool to use the word more than a handful of times in its masthead and elsewhere.

At this point, the logic of my analogy crumbles and leaks rather, but you get the point. Just because a document uses the same word lots of times (or even just enough times) does not mean it's the most relevant and useful document for every search.
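To make the "street" list concrete, here's a rough sketch that ranks the three documents purely by density. The figures are the ones from my example; for the Yellow Pages I only have the "around 8%" guess, so that number is hard-coded rather than calculated.

```python
street_docs = {
    "Yellow Pages": 0.08,                 # only "around 8%" is given, so assumed
    "Street magazine site": 50 / 35_000,  # fifty uses in 35,000 words
    "Wikipedia entry": 133 / 2_700,       # 133 uses in 2,700 words
}

# Sort purely by density, highest first, the way a naive ranking would.
ranked = sorted(street_docs.items(), key=lambda item: item[1], reverse=True)
for name, density in ranked:
    print(f"{name}: {density:.2%}")
```

Sorted this way, the phone book comes out on top and the magazine you actually wanted lands at the bottom.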

It’s like: if I stood in front of an audience for an hour and dropped the word “astrophysics” into every fifth sentence, a completely unsophisticated listener might assume that I know something about astrophysics just because I used the word a lot.

But, as I understand the linguistics research, frequency on its own is a poor guide to relevance - and it doesn't take any kind of research to prove that I know the square root of bugger all about astrophysics (nor about SEO, for that matter).

The best and most advanced search engine algorithms (such as those in place at Google, for example) are designed to index and “understand” words in a document in the context of the index in which that document appears. The ultimate search engine, perhaps, would be one that (amongst its weaponry) had the ability to understand the true relevance of any single document when compared with every single other document in the known dataverse.
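One textbook way of weighing a word in the context of the whole index is TF-IDF (term frequency times inverse document frequency), which I remember from my KM days. To be clear, this is a swapped-in illustration and not a claim about what Google actually does; the tiny corpus and the function names below are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(term, doc_id, corpus):
    """Term frequency in one document, scaled by how rare the term is overall."""
    words = tokenize(corpus[doc_id])
    if not words:
        return 0.0
    tf = Counter(words)[term] / len(words)
    # Count how many documents in the corpus contain the term at all.
    containing = sum(1 for text in corpus.values() if term in tokenize(text))
    idf = math.log(len(corpus) / containing) if containing else 0.0
    return tf * idf

corpus = {
    "a": "the street the market the street",
    "b": "the quiet lane by the river",
    "c": "the hill above the town",
}
print(tf_idf("the", "a", corpus))     # zero: "the" appears in every document
print(tf_idf("street", "a", corpus))  # positive: "street" is only in document "a"
```

The word "the" is the densest word in document "a", yet it scores nothing because every other document uses it too. That's the sense in which density is only an internal measure.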

Again: the fact that a particular document happens to use a certain keyword a dozen times does not necessarily mean it is an authoritative source of info related to that keyword. Good search engines know this and have largely devalued keyword density as a ranking parameter. It’s still used, but it is not nearly as important a measure as it was way back at the dawn of the Web.

In short: frequency is not the same as relevance.

SEO efforts that focus too slavishly on achieving the optimum keyword density run the risk of creating dry, robotic copy that's a nightmare for human visitors to read, and may even be down-ranked by sophisticated search engines.

Perhaps I'm being naive here, but I can't help thinking that the goal of the search engines is to work the way our Mr. Ptolemy does in the bookstore example above. The search engine tries to understand what it is you're really interested in, and offer that stuff up to you through the browser.

Google uses more than 200 different signals to try to determine the best information to offer up for any search, and they change their algorithms (by some accounts) several times a week. In the midst of all this high-power computing, what they're trying to do is mimic a really good human guide. They do this by looking for the cues to what other people deem to be the most valuable, relevant, useful and interesting content on any topic - using all kinds of different "signals".

With all that sophistication going on, I can't help but think that such a simplistic notion as "keyword density" is a real red herring. Good content, well written, is as important today as it has always been. Write something useful, meaningful, intelligent, newsworthy or just genuinely interesting (or all of these), and the search engines will find you.

Before I shut up about this, a final thought on keywords. I've laid into them pretty hard in the first couple of posts here, and I don't want anyone getting the wrong idea. While I'm just not ready to go along with the magic "optimal keyword density" malarkey, I'm still a firm believer in the importance and value of using the right keywords for the audience you hope to attract.

Keywords are, after all, the simple inputs we use to search - so it's important to research and understand the words, phrases, synonyms and circuitous routes that bring people to your site. Studying your site analytics can be great for this.

In the last 24 hours, I know that people have come to my blog through searching for me by name (with all kinds of creative misspellings) or by searching for such diverse things as:

uninstalled
social media experts
future of branding
twitter policy
hohoto
the machine stops
i hate vista

(I'm still the #1 ranked site in Canada for this last example, btw - and do you think Microsoft has ever reached out to me in any way?)

Studying the keywords people use to find you can teach you a lot. They're still the key drivers of search and any professional communicator will want to be sure they're using the same kind of vocabulary as the potential audience they're seeking to engage. Again, there are a lot of online tools you can use to experiment with keywords. Go Google.

Just don't get too hung up on any spurious notions of optimal keyword density, OK?

*[In case you're wondering, if you Google "poem about life choices", without the quotation marks, one of the top five results just happens to be a link to Robert Frost's poem. Darn it. This doesn't mean that any part of my argument is necessarily invalid, though. It simply proves that I'm not very good at coming up with illustrative examples for some of my points.]

Back to Myth #1: The Importance of Keyword Meta Tags
Next up - Myth #3: On-page optimization is the thing

The Top Five Myths of SEO (IMHO) - Intro and Myth #1

Monday, May 25, 2009

This is the first in a short series of posts exploring what I believe are some of the top myths in Search Engine Optimization. I was going to throw all five myths into a single post, but then I realised that would make for an even more than usually lengthy piece, so I've split the whole thing up into (slightly) shorter chunks.

I've been doing a great deal of reading about Search Engine Optimization (SEO) in the last few months, partly out of general professional interest, and partly in order to better understand certain aspects for some of our client work.

There's a necessary and logical connection between Public Relations and SEO. Search engines like news - frequently updated, fresh content. This is the rationale behind Google News and the Yahoo! home page looking a lot like an online newspaper. As a flack, I'm kind of in the business of news and, more particularly, in the business of helping clients to get their news in front of as many of the right people as possible.

This is a deliberate over-simplification, but one of the primary tools we use in PR to convey a client's story is, of course, the news release. It's been said before that in the old days 80 to 90 per cent of the expected audience for a news release was members of the media. With the disintermediating effect of the Internet, the thinning out of media, and the growth of online audiences, as much as 50 per cent of the audience for any news release comes directly to the release through search. It's direct-to-consumer PR, in other words.

The main news wire services have seen this in the growth of direct traffic to their websites. News feeds that once ran directly into the specialised editorial systems in traditional news rooms, available only to journalists, stock traders and a select few others, are now widely available online for anyone to see just by visiting CNW Group, Marketwire, Businesswire or one of the newer, online-only distribution services. [Disclosure: I should probably mention, just in case, that CNW Group continues to be a valued and valuable client].

With news going direct to consumers, and directly into the indexes of the main search engines, it makes sense that the issuing organizations should pay attention to the way those search engines handle their news. If you think of yourself as one of the leading sources on a particular subject, you want to make sure your sage pronouncements and carefully-crafted messages are showing up high and bright in Google searches to do with that subject.

Our opinions today are formed and shaped by what we learn online. The vast majority of product purchase decisions are supported by online research, as are investment decisions and service choices. In this research-driven market, it's increasingly important to rank at or near the top of search results. I've seen comments suggesting that if you are on the second or third page of results you might only get one per cent of the search traffic that the top ranked site gets - and I can well believe that.

Hence, there is a natural relationship between the practice of Search Engine Optimization and the business of PR. Really good PR is, I think, a form of story-telling. Good SEO, it seems, is the practice of ensuring those stories reach the right ears (or eyes).

After months of online and offline research, soaking up as much information as I've been able to handle in spare hours, I still feel I've only just scratched the surface of this weird and nebulous topic. It's a moving target, that much is clear. As the major search engines continue to refine their algorithms to produce ever better results, the paid optimization consultants flex and respond in efforts to keep their client content as close to the top of the search results as possible.

I'm looking forward to the upcoming Search Engine Strategies conference, coming to Toronto in early June - hoping to learn a lot more from some of the most active participants in the field, including the luminously intelligent Andrew Goodman of Page Zero Media and a host of other interesting speakers and search technology experts.

One thing I'm keen to test is a personal theory I've arrived at through research and analysis over the past couple of months. I'm hoping to engage some of the speakers and attendees at the conference to see if what I've come to understand about the current state of SEO is true. In particular, I've synthesized a set of what I believe are giant myths about the way SEO works - ill-founded claims that still keep popping up all over the place but, from what I've learned, can't possibly be valid - even if they once were.

Obvious, up-front caveat: just in case it's not clear enough already, I'm really not an expert in this stuff. It's entirely possible I could be talking out of my ningnong here, but this stuff seems to make sense with what I've been able to learn and test in the last couple of months.

MYTH #1: The importance of keyword meta tags

I'm going to start with something that should be really basic, 101 level stuff to many of you - but it's startling how many people who seem interested in SEO don't know about this.

If you look at the source code of just about any web page, you'll see a whole bunch of special code elements called the "meta tags". I'm not going to go into detail about them here; you can learn a ton of information about meta tags on some much better sites than this one, if you're interested.

Suffice to say, the meta tags are, as the name suggests, a kind of special metadata that can be used to describe the content and structure of the page. The "Title" tag, for example (strictly speaking its own element rather than a meta tag, though it usually gets lumped in with them), determines what text appears in your browser's title bar as you're viewing the page. There are meta tags for Description, Language, and so on.

One of these meta elements, the "Keywords" tag, is a relic of the early architecture of the World Wide Web, from way back in the pre-Google days. The first search engines (WebCrawler, Magellan, Alta Vista, Lycos, and others) looked for this hidden tag as a key set of clues to the topic of your website. Webmasters were supposed to use the Keywords meta tag to list some of the main subject keywords describing the content of the page - like a library index card describing what the page was about.
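If you'd rather not squint at page source, here's a small sketch that pulls the title and meta tags out of a page using Python's standard `html.parser` module. The page itself is made up for illustration, and the `MetaTagReader` class is my own hypothetical helper, not part of any SEO tool.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collect <meta name="..." content="..."> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# A made-up page head, for illustration only.
page = """<html><head>
<title>Acme Couches</title>
<meta name="description" content="Hand-built couches, shipped across Canada.">
<meta name="keywords" content="couch, chesterfield, settee, sofa">
</head><body></body></html>"""

reader = MetaTagReader()
reader.feed(page)
print(reader.title)
print(reader.meta["keywords"])
```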

Of course, many people quickly caught on to the idea that this could be gamed. Stuffing a competitor's product names into your keywords was a quick and dirty way to try to steal some of their attention. Listing multiple synonyms for topics of interest to your target customers was another common form of "keyword stuffing" - trying to artificially increase the rank of your page by making it appear more relevant to a broad array of topics.

This kind of abuse became so rampant that it quickly led to the Keywords meta tag being all but ignored by modern search engines. Although many people still use it, and a lot of self-proclaimed SEO experts still seem to recommend it, the Keywords meta tag seems to be as vestigial as your appendix.

From what I've been able to glean, Yahoo! is alone among the major search engines in still giving this meta tag some (minor) weight. Google, it seems, has never put any value on the information in this tag. In discussing this with others, I had a couple of people question whether there was any evidence to this effect, so I went hunting.

It's hard to find any concrete word from Google on this subject, but here's something useful. In the comments of this post on the Official Google Webmaster Central blog, you'll find that the blog author, Google employee John Mueller, says:

...we generally ignore the contents of the "keywords" meta tag. As with other possible meta tags, feel free to place it on your pages if you can use it for other purposes - it won't count against you.

Also, the Wikipedia page about meta tags states:

With respect to Google, thirty-seven leaders in search engine optimization concluded in April 2007 that the relevance of having your keywords in the meta-attribute keywords is little to none

This is a reference, btw, to an excellent study published at SEOmoz, one of the definitive pieces on search engine ranking factors.

So you'd think by now the word would be out and people would have stopped going on about the Keywords meta tag. And yet I have direct experience of "experts" who are actively charging clients for stuffing words into this part of their web page source code, claiming that it will help improve their ranking in the search engines.

It won't. Try this yourself: run a Google search for "keywords meta tag" - without the quotation marks. I don't want to link this, I want you to run the search for yourself. Now read what the first three or four articles that come up have to say about the subject.

Better? Good - now stop paying your SEO consultant for something that's just plain useless.

In short: using the Keywords meta tag in your web pages won’t necessarily hurt your rank in search engine results, but it absolutely won’t help either.

Next Myth: The Magic Keyword Density Percentage
