Sunday, February 15, 2009

Solution for Duplicate page listings - 'canonical' attribute in head tag

One sort of duplicate content results from plagiarism, copying to forums, or copying of articles to blogs (see How Google Treats Duplicate Content). In those cases, there really are several physical instances of a page.
 
But there is another sort of "duplicate content" that is often just an artifact of how the Web works and, in part, a bug of search engines. Usually it is not duplicate content at all, but rather duplicate URLs for the same physical content.
 
Suppose you have a page at http://seo.yu-hu.com. Just one physical page. This one page can be reached in four different ways.
 
That is a simple case for a site that uses physical files, not pages generated from a database.
A site run by a content management system, however, may generate exactly the same content in dozens of ways, at different URLs under the "products," "catalog," or "archives" sections. It is still the same physical content, coming from the database.
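
To make the idea concrete, here is a minimal Python sketch of how several URL variants can collapse to one address. The host name, the list of default documents, and the four sample variants are all assumptions chosen for illustration; they are not taken from any particular site or from how a search engine actually works.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rules, for illustration only; a real site picks its own.
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCS = ("index.html", "index.htm", "index.php")


def canonical_url(url: str) -> str:
    """Map a URL variant onto the one address we want indexed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == "example.com":
        host = PREFERRED_HOST
    # An empty path and "/" point at the same page.
    path = parts.path or "/"
    # Strip a trailing default document such as /index.html.
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break
    # Keep the query string, drop any fragment.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


# Four assumed ways of reaching the same physical page.
variants = [
    "http://example.com",
    "http://example.com/index.html",
    "http://www.example.com/",
    "http://WWW.example.com/index.html",
]
unique = {canonical_url(u) for u in variants}
print(unique)
```

All four sample variants reduce to a single address here, which is the address you would want a search engine to treat as the real one.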
 
Google and other search engines decide that the additional pages are "duplicate content." They really are.
It is not clear whether this penalizes your site or, if it does, how.
 
Google and Yahoo! now let you tell them which URL to index for the page. You do it by putting a dummy link tag with rel="canonical" in the head section of the page, like this:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
 
The result should be that all the PageRank and other goodies will be given to the version of the page specified. See here for more details.
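
If you want to check that a page really carries the tag, a short script can read it back. The following is a rough sketch using only Python's standard library; the class name `CanonicalFinder`, the helper `find_canonical`, and the sample page are made up for this example, and the script does not verify that the tag sits inside the head section.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page, reusing the example tag shown above.
page = """<html><head>
<title>Swedish Fish</title>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

Run against each of the duplicate URLs of a page, every one of them should report the same canonical address.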
 
 

4 comments:

Anonymous said...

When I put site:mysite.com into Google, I get 65 pages, but only 46 indexed. I have only 46 pages on my site, so where have the other 19 pages come from?

News Service said...

Look carefully at the extra page listings. Some are http://mysite.com. A second set is http://www.mysite.com. A third bunch may be printer pages. We can help you better if you provide an email address.

Anonymous said...

Thanks, My email is moore321@freeuk.com.

The problem is that I can't see these additional pages, as Google doesn't show them, even though it clearly says there are about 65 pages indexed. It only shows the 46 I have on the site.

Weird, huh?

Cheers!

Madness222 said...

This is a nice story because, as far as I knew, there was no solution for duplicate page listings, but you show that there is always a solution to such a problem.
