5 Typical Website Obstacles: 2008 Version

From years of troubleshooting websites for companies, I have consistently run up against the same problems. The size of the company, or of the website for that matter, has very little to do with these problems; however, I usually find them at higher rates in larger companies than in smaller ones.

Each of these problems can kill your internet marketing campaign in the search engines. They are critical to allowing the search engines to properly download your website to their servers and then analyze all of the information in your webpage’s code. Having your website downloaded correctly to the search engines is the first step in organic search marketing, and these issues are at the foundation of the organic strategy.

Web Programming for Marketers
A better title for this might be IT for Marketers, as both groups need to work together for a successful strategy. As much as IT needs to understand the goals of the marketing campaign, marketing needs to understand the many details and constraints of the IT group.

1. Robots.txt File
This is the first place that I look for problems in a website. It is such a simple file, and webmasters are usually the primary people who deal with it. In larger organizations, it can be years before anyone checks this file for accuracy, or even verifies that it exists.

The purpose of the robots.txt file is explained in a past article, and it is a critical file for the search engines, as they request this file before downloading the pages of your website. If they request it, there should be a file there. But be careful. One misplaced forward slash can make your site invisible in the search engines.
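To illustrate how much one forward slash matters, here is a minimal sketch using Python's standard robots.txt parser. The two file bodies, the Googlebot user agent, and the www.domain.com page are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file contents. The only difference is a single forward slash.
BLOCKS_EVERYTHING = "User-agent: *\nDisallow: /\n"
BLOCKS_NOTHING = "User-agent: *\nDisallow:\n"

page = "http://www.domain.com/products.html"
print(can_crawl(BLOCKS_EVERYTHING, page))  # the whole site is off limits
print(can_crawl(BLOCKS_NOTHING, page))     # the whole site is open
```

An empty Disallow line allows everything, while a Disallow line with a lone slash blocks every page on the site.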

2. Homepage Redirects
Redirects are often the product of a content management system. You can spot this if you’ve ever typed in the domain of a website, www.domain.com, and the page that you end up on is www.domain.com/base/index.com (or something similar), a few levels deep.

Here is what happened:
The root level page is what you requested: www.domain.com. However, a redirect forwards the user to the actual content page, which is no longer on the root level, but in a subfolder (/base/).

A redirect is like a forwarding address. When you move to a new house, you fill out a mail forwarding slip to notify the post office that you’ve moved. A redirect is the same thing. The redirect notifies the person (or search engine) who requested the page that the page is in a new location, and sends them there.

There are two main types of redirects: 301 and 302. A 301 redirect means that the page has moved permanently. A 302 redirect means that the move is only temporary. 301s are the preferred method of redirecting.
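If you want to see which type of redirect a page returns, a short script can ask the server directly. This is a rough sketch using Python's standard http.client module, which does not follow redirects on its own; the function names and the wording of the descriptions are my own assumptions.

```python
import http.client
from urllib.parse import urlsplit

# The two redirect types discussed above.
REDIRECT_MEANINGS = {301: "permanent (preferred)", 302: "temporary"}


def describe_status(status):
    """Turn an HTTP status code into a short plain-language description."""
    if status in REDIRECT_MEANINGS:
        return f"{status} redirect: {REDIRECT_MEANINGS[status]}"
    if 300 <= status < 400:
        return f"{status}: other redirect type"
    return f"{status}: not a redirect"


def check_redirect(url, timeout=10):
    """Request `url` without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


print(describe_status(301))
print(describe_status(302))
```

Calling check_redirect("http://www.domain.com") with your own domain returns the status code and the address the server forwards to, if any.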

A slight tangent
Sometimes, a hosting company will tell you that they cannot do a redirect on the server. They will tell you to do an on-the-page redirect. This is not a recommended method of redirecting, as the search engines do not like this type of redirect. It was once used as the primary means of fooling people into visiting adult sites in the late 1990s. I never recommend using this method, as the 301 redirect on the server is the best method to use. Anything else can bring additional problems and is not as “clean” as the 301.
back on track . . .

If your hosting company tells you that they cannot perform a 301 redirect on the server, get another hosting company – it’s that simple.

Here’s why the homepage redirect causes problems. Most people link to the root level domain (www.domain.com). However, if your page is not there, because it has moved to a new location, the links are all pointing to the wrong place. The search engines will see the 301 redirect and will usually assign the link value to the new page. However, the impact is not as great. The page with the content is not the page where people are linking. That lessens the value of both the content and the links.

3. JavaScript Navigation
I don’t see this much, but when I do, it’s usually done by a big company with a kludgy interface. Take www.coca-cola.com for example (which uses a horrible redirect sequence as well).

The primary navigation is built with JavaScript. The Corporate Links have an actual HTML link in the script. However, none of the primary country links have an actual URL that can be followed by the search engines. If JavaScript is all script and no links, then there are no page links for the search engine to find. You can spot this by looking for the href= prefix in the script. The link to the page should follow in quotes.

If the link simply has a “#” where the URL should be, then that’s a sure sign of JavaScript. It’s not a page request; it’s script. If you think you can’t understand HTML code, try looking at JavaScript. No wonder the search engines avoid it.
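The href check described above can be roughed out in a few lines. This sketch uses Python's standard HTML parser to split the links on a page into those with a real URL and those that depend on script; the class name, the function name, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Sort <a> tags into links with a real URL and links that rely on script."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.script_only = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        # Assumption: a missing href, a "#" href, or a "javascript:" href
        # gives a search engine no page to request. A "#" link can also be a
        # legitimate in-page anchor, so treat the results as leads to review.
        if not href or href.startswith("#") or href.lower().startswith("javascript:"):
            self.script_only.append(href)
        else:
            self.crawlable.append(href)


def audit_links(html):
    audit = LinkAudit()
    audit.feed(html)
    return audit.crawlable, audit.script_only


# Hypothetical navigation markup for illustration.
sample = (
    '<a href="/corporate/index.html">Corporate</a>'
    '<a href="#" onclick="go(\'us\')">United States</a>'
)
crawlable, script_only = audit_links(sample)
print("Links a search engine can follow:", crawlable)
print("Links that depend on script:", script_only)
```

This only inspects the HTML as delivered. Links that a script writes into the page after it loads will not appear in either list.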

4. Canonical Domains
This problem leads us back to the forwarding address illustration. Let’s say you have a home, but you decide to put three, or four, or maybe ten mailboxes out in front of your house. Which one will the post office deliver to? Will it be the same one every time? Or will they just pick one, use it, and ignore the rest? If so, will it be the mailbox you wanted them to use?

This is what happens when a page on a website can be seen with more than one address (URL). A classic example is Brookstone.com. The homepage is accessed at multiple URLs:

http://www.brookstone.com
http://brookstone.com
http://www.brookstone.com/world.asp?cmid=hdr_hmpg&cm_re=A_Hdr*Home*BKST

Here is a case of three mailboxes for one home (homepage). Which one is the *real* homepage?
. . . and because the navigation builds dynamically, each page can have potentially hundreds of URL combinations. But that gets really tricky to explain. Suffice it to say that it’s a BIG problem for Brookstone in the search engines.
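One way to picture the fix is a function that maps every variation of an address to the single one you want the search engines to use. The sketch below is only an illustration: it assumes that the www version is the preferred host and that the cmid and cm_re parameters carry tracking data only, neither of which I can confirm for Brookstone.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only track clicks and do not change the page.
TRACKING_PARAMS = {"cmid", "cm_re"}


def canonical_url(url, preferred_host="www.brookstone.com"):
    """Map the www/non-www and tracking-parameter variants of a URL to one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


for address in (
    "http://www.brookstone.com",
    "http://brookstone.com",
    "http://www.brookstone.com/world.asp?cmid=hdr_hmpg&cm_re=A_Hdr*Home*BKST",
):
    print(canonical_url(address))
```

Under these assumptions the first two addresses collapse into one. The third still differs by its /world.asp path, which is the kind of duplicate that needs a 301 redirect on the server rather than a cleanup rule.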

5. Legacy Spam
This issue is not limited to big companies, but I am always surprised to see it. I’ve found too much of it on Fortune 500 company websites to dismiss it as an accident. Usually it was done by a third-party SEO company (and I use the term ‘SEO Company’ loosely). Because the website was not being spidered by Google, usually because of the four problems outlined above, a company decided to create new pages on the website in a structure that the search engines could find.

This is usually done with what we call invisible text. It’s usually white text on a white background, or other similar combinations. The link text is the same color as the background so that it is hidden from users, but it is still followed by search engines. The links go to “doorway pages” via a simple link structure, and content is provided to the search engines, which then gets published in the search results. Users either end up on that doorway page or, through detection, end up somewhere in the main site.
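Reading the code for this by hand gets tedious on a large site, so here is a rough sketch of a first-pass check. It only looks at inline style attributes and only compares color values as plain text, and it assumes a white page background unless told otherwise. All of the names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Split an inline style attribute into a {property: value} dictionary."""
    rules = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            rules[name.strip().lower()] = value.strip().lower()
    return rules


class HiddenTextFinder(HTMLParser):
    """Flag tags whose inline text color matches their background color."""

    def __init__(self, page_background="#ffffff"):
        super().__init__()
        # Assumption: the page background is white unless stated otherwise.
        self.page_background = page_background.lower()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        rules = parse_style(dict(attrs).get("style") or "")
        color = rules.get("color")
        background = rules.get("background-color", self.page_background)
        # Plain text comparison: "white" and "#fff" are NOT treated as equal to
        # "#ffffff", and colors set in external stylesheets are not seen at all.
        if color is not None and color == background:
            self.suspects.append((tag, color))


# Hypothetical markup for illustration.
sample = (
    '<a href="/doorway1.html" style="color: #FFFFFF">cheap widgets</a>'
    '<p style="color: #000000">Welcome to our store</p>'
)
finder = HiddenTextFinder()
finder.feed(sample)
print(finder.suspects)
```

A clean result here does not prove a page is free of hidden text. It is a quick filter before reading the code and stylesheets yourself.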

The rule of thumb is that if the page is intended for search engines, and not for humans, you are crossing the line, according to the search engines. I’ve had to help many companies recover from the penalty imposed on them by both Google and Yahoo when they used these types of tactics.

The best means of avoiding this is to remove the “invisible” links and fix the architecture of the website. Having the search engines find your website and spider the content naturally is the best means of getting in and staying in the search engines’ databases. Creating doorway pages and new links is only a band-aid for a larger problem, and never a good substitute.

Hopefully, you can use this article to evaluate your own website. Here are some helpful tools to evaluate these issues:

Robots.txt
SearchStatus Firefox plug-in
Google Search Console

JavaScript Navigation
Browser Status Bar
Developer Toolbar for Firefox (disables JavaScript)

Invisible links
Read the code
Highlight all of the text on the page (Ctrl+A)