What is duplicate content?
Duplicate content issues occur when multiple Web sites contain identical or similar content. Search engines have implemented filters specifically to detect duplicate content. There is some flexibility in how similar two pages can be, but the percentage of similarity that constitutes duplicate content is open to debate. Even if you are sure that you have not copied anyone else’s content, you must still be aware of the issue, because someone might copy yours.
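To make the idea of a "percentage of similarity" concrete, here is a minimal Python sketch that compares two texts by their overlapping three-word sequences ("shingles"). This is one common way to measure textual overlap, not the method any search engine is known to use; the function names, the shingle size of 3 and the sample sentences are assumptions chosen for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, as a percentage."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 100.0  # two empty texts are trivially identical
    return 100.0 * len(sa & sb) / len(sa | sb)


# Illustrative sample texts (assumptions, not real pages).
original = ("Duplicate content issues occur when multiple Web sites "
            "contain identical or similar content.")
copy = ("Duplicate content issues occur when several Web sites "
        "contain identical or similar content.")

print(f"{similarity_percent(original, copy):.1f}% similar")
```

Changing a single word lowers the score noticeably because every shingle that contains the word changes, which is one reason no single threshold is agreed upon.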
What constitutes duplicate content?
Instances of duplicate content arise in two primary ways: you copy someone else’s content for use on your page, or someone else copies your content for use on their page. Other common sources of duplicate content:
• pages used purely for print formatting
• pages with URL parameters for style, formatting, etc.
• pages with similar content, i.e. different URLs but the same text on those pages
• pages with different URLs that all redirect to the same page
• Content Management Systems (CMS), which often generate an incredible amount of duplicate content if not used correctly
Duplicate content issues:
• the following URLs are different but show the same content: example.com/, example.com/?, example.com/index.html, example.com/Home.aspx, www.example.com/, www.example.com/?, www.example.com/index.html, www.example.com/Home.aspx. Google will recognize that they are the same and will try to pick the right one, although it sometimes picks the wrong one.
• with multiple versions of the same thing, Google will spend more time crawling the same content, leaving it less time to go deeper into your site, so you run the risk that some content does not get indexed.
• your link popularity will be diluted. Backlinks pointing to several different URL versions of the same content make it harder to accumulate link juice for one URL.
• having a spider index duplicate content on your Web site unnecessarily depletes your server resources (e.g. processing power), potentially slowing traversal of the website.
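The eight URL variants listed above can be collapsed into a single form with a small normalization routine. The following Python sketch is one possible approach under stated assumptions: that index.html and Home.aspx are default documents serving the home page, that the host without "www." is the preferred one, and that "https" is the scheme to assume when none is given. The function name canonicalize is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names all serve the directory's default page.
DEFAULT_DOCUMENTS = {"index.html", "home.aspx"}


def canonicalize(url: str) -> str:
    """Map equivalent URL variants to one preferred form."""
    if "://" not in url:
        url = "https://" + url  # assumption: https is the preferred scheme
    parts = urlsplit(url)

    # Assumption: the bare host (without "www.") is the preferred form.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Drop a trailing default document such as /index.html.
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    # An empty query string ("?") and any fragment are dropped here.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "example.com/", "example.com/?",
    "example.com/index.html", "example.com/Home.aspx",
    "www.example.com/", "www.example.com/?",
    "www.example.com/index.html", "www.example.com/Home.aspx",
]
for url in variants:
    print(url, "->", canonicalize(url))
```

All eight variants map to https://example.com/. A routine like this is useful for generating internal links consistently, so that crawlers and backlinks concentrate on one URL per page.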
How to avoid duplicate content:
Avoiding duplicate content helps you avoid the penalties that search engines apply when they discover duplicate content issues.
• use the tool at www.copyscape.com, which lets you search for copies of your content elsewhere on the Web
• use different content, i.e. modify your content so that it is noticeably different from the copies
• use a robots.txt file or a robots meta tag to keep search engines from crawling or indexing the duplicate versions of a page
• use a "canonical" version of the URL, meaning the simplest, most significant form. Pick one for each page and link consistently within your site. You can also use the rel="canonical" link element.
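The last two techniques in the list can be sketched in Python. The example below checks a set of robots.txt rules with the standard library's urllib.robotparser and builds a rel="canonical" link element. The /print/ path, the example.com URLs and the helper names is_crawlable and canonical_link are assumptions for illustration, not rules from any real site.

```python
from html import escape
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers away from print-only pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, agent: str = "*") -> bool:
    """Return True if the rules above allow the agent to fetch the URL."""
    return parser.can_fetch(agent, url)


def canonical_link(url: str) -> str:
    """Build the link element that goes in the <head> of each duplicate page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(is_crawlable("https://example.com/print/article"))  # blocked by /print/
print(is_crawlable("https://example.com/article"))        # allowed
print(canonical_link("https://example.com/article"))
```

A robots meta tag such as <meta name="robots" content="noindex"> is the per-page alternative to a robots.txt rule. Whichever method is chosen, the canonical link element should point at the single URL picked for that page.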