7 Ways to Find (and Fix!) Duplicate Content
You may be aware that content theft can create duplicate content and search engine penalties for your website or blog.
But sometimes we inadvertently create duplicate content on our own websites by not following content creation best practices!
Recent search engine updates have made it more important than ever to identify and address duplicate content … before it impacts your organic search ranking or link popularity.
Here’s how to detect and fix both types:
Detect and Fix Onsite Duplicate Content
Webmaster tools provide easy ways to help address duplicate content issues within your site or blog. Here’s a game plan:
- Top Level Domains: Select Preferred URL Structure
To avoid creating multiple URLs delivering the same content, decide what your preferred URL structure is.
For example, should our website address be ? Or should it be simply ? For top level domains (tLDs), do you want the site to appear with or without the ‘www’? It doesn’t matter which you pick, as long as you pick and use one or the other.
Once decided, this preference can be set within Google’s free Webmaster Tools under ‘Configuration’.
With the TLD sorted, you can now address individual instances of duplicate content.
- URL Structure at the Page Level: Unique and Consistent
Determine which unique URL you would prefer to use for each piece of content. Flatter URL structures that keep the content closer to the root domain are better for SEO and can help influence a higher rate of clicks on calls-to-action.
Once you’ve selected a preferred URL structure, be consistent. Use the preferred URLs throughout site navigation, in anchor text links and in sitemap files.
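A sitemap file, for example, should list only the preferred URL for each page. Here is a minimal sketch in the standard sitemap XML format; the domain and path are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page, using only the preferred URL -->
  <url>
    <loc>http://www.XYZClothing.com/dresses</loc>
  </url>
</urlset>
```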
- Apply 301 Redirects
A 301 redirect points search engines from a page’s old URL to the page’s new URL, so that search engines do not perceive duplicate content. It’s a fantastic way to reunify duplicate content.
If you find duplication based on earlier URL structures, use a 301 redirect to indicate that the content has permanently moved.
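On an Apache server, for instance, 301 redirects can be declared in the site’s .htaccess file. This is only a sketch under assumed names: the paths and the XYZClothing domain (borrowed from the examples later in this post) are hypothetical, and your host’s configuration may differ:

```apache
# Permanently redirect an old, deeper URL to its new, flatter one
Redirect 301 /products/women/clothing/dresses /dresses

# Send the non-www host to the preferred www host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^xyzclothing\.com$ [NC]
RewriteRule ^(.*)$ http://www.xyzclothing.com/$1 [R=301,L]
```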
- Implement Canonical Tag
301 redirects are server settings.
If you’re not comfortable implementing a server change or your hosting provider does not support the use of 301 redirects, the canonical tag can also be used.
All major search engines, including Google, Bing and Yahoo, currently support the canonical tag (rel="canonical"), which can point search engines to your preferred URLs when inserted into the page code of your site.
The official Google Webmaster Tools documentation includes an explanation and an example.
(P.S. Here’s an explanation and example of this in use, for commenter Bruce:
There’s a very simple way out of this: the canonical tag. In the <head> of the page, you put the preferred URL in a tag like this (this is a sample from our own blog):
<link rel="canonical" href="https://info.icopyright.com/in-our-opinion/how-to-make-your-blog-content-theft-worthy" />
The href should be the URL you like most for the content. If a search engine sees duplicate content on the site, it’ll use the canonical URL as the one and only page on the site; there will be no duplicate content penalty.)
- URL Parameter Handling Tool
For duplicate content issues that arise due to multiple URLs with query string parameters, consider using the URL parameter handling tool within Google Webmaster Tools.
(A query string is the part of a URL that contains data to be passed to web applications. Query strings contain parameters or variables. Sometimes these parameters impact the content of the page).
To clarify, here’s an example:
www.XYZClothing.com/products/women?category=dresses&color=green is a query string with parameters for ‘Category’ and ‘Color’.
Other URL parameters do not impact page content and are solely for tracking (like a session ID) or sorting purposes.
www.XYZClothing.com/products/women?category=dresses&sort=price_ascending delivers the same content as www.XYZClothing.com/products/women?category=dresses.
Use Google’s Webmaster Tools to detect and clarify these situations.
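The idea behind parameter handling can be sketched in a few lines of Python (standard library only). The split between ‘content’ parameters and ignorable ones here is hypothetical; in practice you declare it in the Webmaster Tools interface rather than in code:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical split: these parameters change what the page shows...
CONTENT_PARAMS = {"category", "color"}
# ...anything else (sort order, session IDs, tracking tags) is ignored.

def canonicalize(url):
    """Collapse duplicate URLs by dropping parameters that
    do not affect page content."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k in CONTENT_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))

a = canonicalize("http://www.XYZClothing.com/products/women"
                 "?category=dresses&sort=price_ascending")
b = canonicalize("http://www.XYZClothing.com/products/women?category=dresses")
print(a == b)  # True: both URLs collapse to the same canonical form
```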
- Reconsider Robots.txt Use
Of note, Google no longer recommends blocking access to duplicate content with a robots.txt file.
(This file sits in the root directory of the website (www.XYZClothing.com/robots.txt) and instructs search bots on which parts of the site they may crawl).
If you are currently managing onsite duplicate content with a robots.txt file, consider switching to one of the methods above instead, such as 301 redirects or the canonical tag.
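For reference, a robots.txt file is a plain-text file that looks something like this (the disallowed path is hypothetical); per Google’s advice above, use it to keep bots out of genuinely private areas, not to hide duplicate content:

```
User-agent: *
Disallow: /admin/

Sitemap: http://www.XYZClothing.com/sitemap.xml
```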
Detect and Fix Offsite Duplicate Content
What about offsite issues that create duplicate content?
Guest blogging, article syndication and maliciously pirated or scraped content can all negatively impact your organic search ranking, author rank and most importantly – control over your own original content.
An advanced plagiarism detection tool such as iCopyright’s premium Discovery™ service continuously scans the web to protect your content.
Discovery crawls the internet every day looking for duplicates of your content.
And, unlike simple plagiarism detection tools, Discovery automates monitoring AND the entire infringement resolution process, from first contact to sending the DMCA takedown notice should that be needed to resolve the situation.
Takeaway for bloggers, writers and publishers:
You work hard to create unique content. Make sure it appears both on site and across the web as it should!
- Use Webmaster tools to identify where the issues exist and fix duplicate content, both on and off your website or blog.
- Using an advanced duplicate content detection and resolution tool can make this job much easier.
Attending to these 7 suggestions now will pay big search result dividends later!
To find Offsite Duplicate Content, Try Discovery Free For 30 Days!
Content theft is a reality in today’s online world.
To learn more about how iCopyright’s digital copyright solution can help you protect your online content, watch a quick video … then take advantage of our free 30-day trial!
Bruce Smeaton
Jul 08 @ 14:17:01
Hi Rhonda
Now THAT is what I call “a resourceful article!” Interestingly, after reading it I realized that I actually have a major duplicate content issue going on with one of my client sites that WASN’T addressed in your article - for no other reason than it is a ‘really rare case’.
Let me explain…
Initially, most of the page URLs within the site were built in such a manner that every part of the URL beyond the domain (i.e. after the “/”) contained Upper and Lower Case letters:
Example: myclientsite.com/Diesel-Engines
Later on, the webmaster decided to change the URL structure so that everything appeared in lower case:
Example: myclientsite.com/diesel-engines
Naturally enough, this led to TWO different URLs pointing to the same content page (and when you multiply out the number of content pages involved, it amounted to 27% of the entire site).
I wasn’t fazed by it as I assumed 301 redirects would completely solve the problem, i.e. myclientsite.com/Diesel-Engines >>> myclientsite.com/diesel-engines
But alas! The particular version of the CMS (Contegro) that this site was built on, isn’t capable of allowing the creation of 301 redirects for the purpose of redirecting uppercase URLs to lower case ones. In other words, it doesn’t differentiate between upper and lower case URLs. So, any time someone tries to 301 redirect the upper case URLs to the lower case ones, a 301 loop is created. Not exactly the epitome of a good user experience!
While later versions of this CMS do force upper case URLs to appear as lower case ones, this doesn’t help my client’s problem of duplicate content (as the damage has already been done). For the record, it is disheartening going into Google Analytics and seeing traffic data applicable to both versions of the URLs. Even more disheartening when I find links pointing to them!
Bottom line… if 301 redirects cannot be used to resolve this problem (for the reason I’ve just stated), and Webmaster Tools URL Removal service can’t be used because the offending URLs still point to “live” pages (as opposed to deleted or blocked ones), what can I do?
Is this client’s site destined for eternal damnation from Google’s condemnation of duplicate content? NOTE: Ever since the roll-out of Penguin 2.0, the site has slowly been dropping in the rankings. After exhaustive site audits and reviews, the ONLY real issue I can find that is remotely related to what the common understanding of Penguin 2.0 is about, is the issue of duplicate content (i.e. duplicate page titles, descriptions, body copy etc.) applicable to the issues I’ve outlined above.
Love to hear your thoughts … and even better still, “LOVE TO KNOW HOW TO FIX IT!”
Cheers
Bruce
Rhonda Hurwitz
Jul 09 @ 21:52:57
Hi Bruce -
Thanks for posing an interesting challenge.
I actually forwarded your question to two different SEO experts that we know. If they come up with anything for you, I will add their suggestion to this comment. I’d expect to hear back within 24 hrs. Thanks for reading — to be continued
Rhonda and Team iCX
Rhonda Hurwitz
Jul 10 @ 08:07:00
Hey Bruce,
Here’s an answer from our resident expert, Jon:
There’s a very simple way out of this: the canonical tag. In the of the page, you put the preferred URL in a tag like this (this is a sample from our own blog):
The href should be the URL you like most for the content. If a search engine sees duplicate content on the site, it’ll use the canonical URL as the one and only page on the site; there will be no dup content penalty.
Here is Google’s page on the issue:
Hope this helps,
Jon
Bruce Smeaton
Jul 10 @ 16:40:50
Hey Rhonda
Thanks so much for going the extra mile on this - very much appreciated. One little thing though… I see you accidentally forgot to actually include the URL you were referring to when you stated the following: “In the of the page, you put the preferred URL in a tag like this (this is a sample from our own blog):”
If you get a chance, I’d love to see that URL
Thanks again
Bruce
Rhonda Hurwitz
Jul 11 @ 03:15:12
Bruce, don’t know how I forgot that … here it is:
<link rel="canonical" href="https://info.icopyright.com/in-our-opinion/how-to-make-your-blog-content-theft-worthy" />
hope that helps!
Bruce Smeaton
Jul 11 @ 03:19:10
Hi again, Rhonda
Believe it or not, there isn’t any link showing - just a space where it should be. I am thinking that each time you publish a reply, “something” is stripping out the links. Really weird!
Rhonda Hurwitz
Jul 11 @ 04:00:36
OK … guess WordPress doesn’t want to show any html, so as a workaround I added an image of the link into the blog post under “implement canonical tag” … re-read the post and see if it shows up:)
Bruce Smeaton
Jul 11 @ 04:12:29
Yesssssss!!! Your perseverance won out in the end!!! The WordPress gremlins have been defeated - for now. I see that graphic loud and clear!!!
Thanks so much, Rhonda
Rhonda Hurwitz
Jul 12 @ 10:03:45
You’re welcome! let us know how it works out:)
riz1
Sep 03 @ 08:35:07
Nice Work Rhonda