Drupal SEO: How Duplicate Content Hurts Drupal Sites

Drupal's clean URLs give it a good reputation when it comes to SEO, but there's a lot more you can do under the hood to improve Drupal's search engine friendliness. Today I will show you some Drupal SEO tips to help you avoid duplicate content and boost your search engine ranking.

[Image: Google traffic chart]

Proper search engine optimization allows you to tap into a significant source of new visitors. If you rank well for your keywords, you may find that search engine hits account for more traffic than all your other referrals combined. Unfortunately, most Drupal sites aren't performing as well as they could due to duplicate content.

Drupal's Duplicate Content Problem

Let's take a look at two URLs:

http://blamcast.net/articles/drupal-seo
http://blamcast.net/articles/drupal-seo/

On a normal Drupal site with clean URLs enabled, these two addresses are interchangeable. Technically, the trailing slash makes the second one look like a directory rather than a file, but Drupal will serve the same content for both. This is convenient, because people can link to either one without hitting a 404 error.
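You can see the duplication for yourself by requesting the headers of both addresses (shown here with my URLs; substitute your own domain and path). Before any fix is in place, both typically come back with a 200 status, meaning search engines see two separate, identical pages:

$ curl -I http://blamcast.net/articles/drupal-seo
HTTP/1.1 200 OK

$ curl -I http://blamcast.net/articles/drupal-seo/
HTTP/1.1 200 OK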

But, when it comes to getting a good search engine ranking, having two pages with the exact same content can hurt you. In the SEO world, this is known as Duplicate Content, and it's something you want to avoid as much as possible. In some cases, people have reported their pages disappearing completely from the search results because of duplicate content.

At the very least, duplicate content can decrease your ranking. Search engines treat links to your page as a vote of confidence. If the links are split between two different addresses, you're potentially throwing away half your votes.

Luckily, there's a simple solution to this common SEO problem.

Redirecting Drupal with the .htaccess File

Drupal uses a file called .htaccess to tell your web server how to handle URLs. This is the same file that enables Drupal's clean URL magic. By adding a simple redirect rule to your .htaccess file (just after the RewriteEngine on line is a good spot), you can make the server automatically strip trailing slashes:

# Redirect any URL with a trailing slash to the slash-less version (301)
RewriteCond %{HTTP_HOST} ^(www\.)?blamcast\.net$ [NC]
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]

This is the code I'm using to eliminate trailing slashes on Blamcast; you'll need to change the domain name to match your site. Back up your .htaccess file before making any changes, because a mistake here will leave your site throwing errors until it's fixed. If you need more help, check out this htaccess tutorial.

The code's function is very simple: it tells browsers and search engines that the page they're looking for has moved to a new location (one without a slash on the end). The user (or Googlebot) will be automatically forwarded to the correct URL.
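A quick way to check that the rule is working is to request the slashed URL and look for the 301 response (again with my domain; swap in yours):

$ curl -I http://blamcast.net/articles/drupal-seo/
HTTP/1.1 301 Moved Permanently
Location: http://blamcast.net/articles/drupal-seo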

Essentially, all the "votes" for your content are now being redirected to a single page, and that page will rank higher because of this.

Using Drupal's robots.txt File to Hide Duplicates

We're not done yet; there's still some duplicate content to take care of. Here's our next example:

http://blamcast.net/articles/drupal-seo
http://blamcast.net/node/44

Again, we've got two addresses with the exact same content. The solution this time is simply to tell search engines to ignore anything under the "node" path. To accomplish this, we'll add a line to the end of our robots.txt file:

Disallow: /node/
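One caveat: crawlers only honor a Disallow line when it appears inside a User-agent block. If you're appending to a robots.txt that already ends with a User-agent: * section (as Drupal's stock file does), you're set; a minimal standalone file would look like this:

User-agent: *
Disallow: /node/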

Now, whenever a search engine finds a link starting with "blamcast.net/node/", it will simply ignore it, eliminating any duplicate content issues the URL might have caused.

There is one potential issue with this solution: Any content that's only accessible from a /node/ URL will no longer get indexed by search engines. This shouldn't be a problem if you're already SEO-conscious and creating proper URL aliases for all your content. If not, it's time to start! The pathauto module can bulk-generate aliases for your articles based on their titles. Afterwards, you should tweak them to better reflect your keywords.
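For example, a pathauto pattern along these lines (hypothetical; the exact token names depend on your pathauto and token module versions, and "articles/" is just an illustrative prefix) would turn a post titled "Drupal SEO" into articles/drupal-seo:

articles/[title-raw]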

Also, keep in mind that your robots.txt file has to be in your domain's root directory for search engines to find it. If your Drupal installation lives in a subdirectory, move robots.txt up to the site root and adjust its paths accordingly, as shown below. There's more information about robots.txt on Wikipedia.
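For instance, if your Drupal installation lived at blamcast.net/drupal/ (a hypothetical layout), the robots.txt at the site root would need the path prefixed accordingly:

Disallow: /drupal/node/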

The Global Redirect Module

Alternatively, if you don't mind installing a new module, the Global Redirect module will forward all of your /node/ URLs to the proper alias, and it even removes trailing slashes for you.

If you've recently made the switch from numbered /node/ URLs to SEO-friendly aliased URLs, Global Redirect is probably your best choice. Search engines will learn your new URLs by following the redirects, and update their links accordingly.

On the other hand, if you've been using aliased URLs from the start, this won't be an issue for you.

Personally, I prefer setting up .htaccess and robots.txt rather than having a module parse the URL on every page load: there's less overhead, one less module to keep up to date, and no risk of conflicts with other modules.

Drupal SEO: More Than Just Clean URLs

So, as you can see, there's more to Drupal SEO than just enabling clean URLs. By decreasing the amount of duplicate content on your site, you drive visitors and links to a smaller, more focused group of pages, leading to better search engine ranking and more traffic.

If you liked this article, be sure to check out how to survive traffic spikes with Drupal.

Posted by John on 2007-03-25