

SEO Guide: Canonical Domains, Apache & HTTP 301 Redirects


By hagrin - Posted on 05 January 2007

Posted By: hagrin
Create Date: 14 December 2005
Last Updated: 4 January 2006

Overview:
Search Engine Optimization (SEO) remains the ultimate goal of the webmaster, blog publisher, e-commerce seller, AdSense user and pageview junkie. By tweaking a website's layout, design and content, a domain owner can improve the site's ranking when terms are searched on the major search engines (for the purpose of these articles, the major search engines are Google, Yahoo! and MSN). One SEO issue that website owners need to deal with is the duplicate content penalty that can result from a canonical domain problem. This article explains what exactly this problem is and how to resolve it.

What Exactly is a Canonical Domain Name?:
Webopedia defines a canonical name (CNAME) as:

Short for canonical name, also referred to as a CNAME record, a record in a DNS database that indicates the true, or canonical, host name of a computer that its aliases are associated with. A computer hosting a Web site must have an IP address in order to be connected to the World Wide Web. The DNS resolves the computer’s domain name to its IP address, but sometimes more than one domain name resolves to the same IP address, and this is where the CNAME is useful. A machine can have an unlimited number of CNAME aliases, but a separate CNAME record must be in the database for each alias.

I'm sure many of you are saying "English (or your first language) please!". Basically, when you purchased your domain name (for instance, I bought hagrin.com), you also gained the ability to add CNAME records (sometimes called "parking a subdomain"). The "www" alias is usually created for your domain automatically when you purchase it. Therefore, right away, users have two ways of navigating to your site: through http://www.hagrin.com (with the "www") and http://hagrin.com (just the domain name). Giving users two ways to reach your site seems beneficial and free of drawbacks. However, if users can get to your site by two different URLs, search engine crawlers can also crawl your content by both URLs. If this occurs (and you have no preventive measures in place), search engines may collect two copies of the same data at two different links, potentially causing a "duplicate content" penalty for your site.

How do I know if I have a problem? Well, you can use the Search Engine Friendly Redirect Checker to diagnose any potential problems your site may have. As a note, don't test only the home page; also test some pages that are not in the root directory to make sure all of your URLs redirect in a search engine friendly manner. So how can you prevent this from happening, or fix it once you have diagnosed a problem?
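If you would rather script this kind of check than rely on an online tool, the idea can be sketched in a few lines of Python. This is only a rough illustration, not how the Redirect Checker itself works: the function names (fetch_status_and_location, classify_redirect, check_urls) and the wording of the labels are made up for this example. It sends a single request without following redirects, then looks at the status code and the Location header.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url, timeout=10.0):
    """Send one HEAD request and return (status, Location header).

    http.client does not follow redirects, so we see the raw response.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify_redirect(status, location, canonical_host):
    """Label a response received from the non-canonical host (hypothetical labels)."""
    if status == 301 and location:
        target_host = urlsplit(location).hostname or ""
        if target_host.lower() == canonical_host.lower():
            return "ok: permanent redirect to canonical host"
        return "warning: 301 points somewhere else"
    if status in (302, 303, 307):
        return "warning: temporary redirect"
    if status == 200:
        return "problem: content served on both hosts"
    return "unknown: status %d" % status


def check_urls(urls, canonical_host):
    """Check several URLs, not just the home page, and print one line per URL."""
    for url in urls:
        status, location = fetch_status_and_location(url)
        print(url, "->", classify_redirect(status, location, canonical_host))
```

For example, calling check_urls(["http://hagrin.com/", "http://hagrin.com/rss/hagrin_atom.xml"], "www.hagrin.com") would test both the home page and a page outside the root directory, as suggested above.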

The Fix:
I encountered this problem recently and wanted to make sure that my site wasn't being split in two, or my content duplicated, causing me to drop in the search rankings. Therefore, I started looking for a way to redirect my users from the plain hagrin.com to www.hagrin.com for all documents on my server. Hagrin.com runs on a Linux machine using Apache as its web server software, so the fix below is specific to Apache. After browsing the web for a few hours, I came to the conclusion that I needed to perform an HTTP 301 redirect from my hagrin.com pages to the matching www.hagrin.com links. Knowing that I was using Apache, I created a .htaccess file in the web root directory (/www) of my web server and added the following lines of code:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^hagrin\.com$ [NC]
RewriteRule ^(.*)$ http://www.hagrin.com/$1 [R=301,L]

So what exactly does this code do? Well, if a user requests http://hagrin.com/rss/hagrin_atom.xml, the user is redirected to http://www.hagrin.com/rss/hagrin_atom.xml instead. This makes both requests, hagrin.com and www.hagrin.com, lead to the same URL and prevents any duplicate content penalties. If you aren't using Apache, the fix for this issue may be very different, and I would suggest doing a Google search on HTTP 301 redirects to resolve any canonical domain name issues you may be having.
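To make the logic of the three lines concrete, here is a small Python imitation of what they do. It is a simplified sketch and not how Apache is implemented: the function name apply_rules is made up for this example, and query strings are ignored. It assumes the rules live in a .htaccess file, where mod_rewrite matches the request path without its leading slash.

```python
import re

# RewriteCond %{HTTP_HOST} ^hagrin\.com$ [NC]   (NC = case-insensitive)
HOST_PATTERN = re.compile(r"^hagrin\.com$", re.IGNORECASE)
# RewriteRule ^(.*)$ http://www.hagrin.com/$1 [R=301,L]
PATH_PATTERN = re.compile(r"^(.*)$")


def apply_rules(host, request_path):
    """Return (301, new_url) if the rules would redirect, otherwise None."""
    # In .htaccess context the leading slash is removed before matching.
    if request_path.startswith("/"):
        relative_path = request_path[1:]
    else:
        relative_path = request_path

    # The condition must hold, otherwise the rule is skipped entirely.
    if not HOST_PATTERN.search(host):
        return None

    match = PATH_PATTERN.search(relative_path)
    if match is None:
        return None

    # $1 in the substitution is the first captured group.
    return 301, "http://www.hagrin.com/" + match.group(1)


print(apply_rules("hagrin.com", "/rss/hagrin_atom.xml"))
print(apply_rules("www.hagrin.com", "/rss/hagrin_atom.xml"))
```

The first call prints the 301 status together with the www URL; the second prints None, because the host condition only matches the bare domain. That is also why the redirect cannot loop: once the user is on www.hagrin.com, the condition no longer applies.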

Resources:

  1. Webopedia CName Definition
  2. Search Engine Friendly Redirect Checker
  3. Social Patterns - "Cleaning Up Canonical URLs With Redirects"
  4. Matt Cutts on Canonical Domain Issues

Version Control:

  1. Version 1.1 - 4 January 2006 - Updated Resources to include Matt Cutts' Canonical Domain Issues post
  2. Version 1.0 - 14 December 2005 - Original Article