Saturday, February 14, 2009

How Google treats Duplicate Content

Duplicate content is a headache for search engine positioning. It is created in these situations:
  • Syndicated news items that appear in many Web sites
  • Sites that legitimately archive news items
  • Blogs that legitimately quote part of an article, or provide an entire article for reference, and link to the original under the "fair use" doctrine.
  • "Classic" articles that are copied at numerous sites.
  • Classic literature and poetry.
  • Plagiarism - A takes the content of B without permission and without giving credit to B, the originator, and puts it at their own Web site, or copies it to some Web forum or community Web site.
Very often, online journal articles disappear from the Web after a few years, and journals routinely do not reply to requests for syndication or quoting. In such cases it is legitimate to quote large parts of an article, or all of it, to keep the record intact, especially if your web log commentary refers to the article and makes no sense without it.
Plagiarism, of course, is a different matter.


Different search engines may deal with duplicate content in different ways. Google uses a patented algorithm for finding duplicate content. It can put all or most duplicate content into "supplemental results" that are not even shown unless requested. Other engines may not list those pages at all. The big problem is determining which page "deserves" to be listed at the top of the SERP (Search Engine Results Page) - most people are going to click on the top listing.
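
As of this writing, you can ask Google to show the pages it has filtered as duplicates by clicking the "repeat the search with the omitted results included" link at the bottom of the last results page, or by appending &filter=0 to the search URL. A made-up example:

    http://www.google.com/search?q=%22an+exact+sentence+from+the+article%22&filter=0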


Obviously, the Web site that has the oldest file is probably the originator and should be listed first. But that is frequently NOT what happens. Plagiarism is the sincerest form of flattery. I wrote a rather successful article about an issue. It was promptly copied to a large Web site, and the listing of that Web site's copy pushed my own page way down in Google's results. In a different case of plagiarism, material published at our website was copied by a major journal and by a major news service, neither of which gave any credit to the original and both of which claimed they had copyrighted our material!


If you are only "in business" to influence political opinion, then of course you are willing to sacrifice the popularity of your own article in order to spread the word to the largest number of people. But in the long run, you still want your Web site to get more traffic, as that is what will help your cause most.


In another case, I searched for an item by keyword and found a prominently listed page at a closed, keyword-protected Web site. As it turns out, the original article is in the public domain and is freely available at another Web site, but that copy could only be found by searching the supplemental results.

Google (if they are listening) should look into this problem, as it reduces the quality of their results, and in the long run it will reduce the quality of materials on the Web. There is no practical way to prevent copying of materials, but these copies should all credit the original version and link to it. Authors should not be cheated out of credit for their work - that will not promote the creation of quality materials. If you own or control a Web site or archiving forum, insist that any duplicate material you copy must link to the original version on the Web. That will not necessarily ensure that the original is listed at the top of the SERPs, but it will help. It will also reward the originator by providing the originating site with the all-important link popularity.
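
For example, a copied article might carry an attribution line like this one (the URL, site name, and class name here are placeholders, not a prescribed format):

    <p class="attribution">
      This article originally appeared at
      <a href="http://www.example.com/original-article.html">Example Site</a>.
    </p>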

There is one exception - if you are quoting an item as an example of hate propaganda, it seems to me that you are not morally obligated to provide a live link to the original site and help their website popularity. You can provide the text of the URL without a live link, or use a nofollow attribute.
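
The markup could look roughly like this (the URL is a placeholder):

    <!-- A link marked rel="nofollow", which tells search engines not to credit the target: -->
    <a href="http://www.example.com/hate-page.html" rel="nofollow">the propaganda page</a>

    <!-- Or give the address as plain, unlinked text: -->
    http://www.example.com/hate-page.html
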
Ami Isseroff

Duplicate and Triplicate Google AdSense advertisements

The recession is upon us. That seems to mean that for many topics in many locales, AdSense may display the same advertisement in more than one ad slot on a page. Of course, this reduces the Click-Through Rate (CTR - the percentage of visitors to a page who click on advertisements), because nobody will click on the same ad twice, and people who are not interested in finding out how to get a flat stomach might be interested in finding out where to get gourmet foods. Variety in advertisements should obviously increase CTR. Yet Google often puts duplicate ads on a page even when different ads (also duplicated) appear on other, similar pages!

The way Google decides which ads to put on a page, based on its content, is somewhat mysterious. You may have a whole page about astrophysics, but if for some reason there is a single link to a poetry website on that page, they may put an advertisement for poetry there. Their algorithm may be a bit primitive.
You would have thunk that if Google AdSense knows about your page content, it also knows what advertisements it puts there. Since Google gets revenue from the advertisements, it should be interested in maximizing the click-through rate, right? I have not seen anyone who got an interview with a Google guru ask about this problem.

Until Google acknowledges the plight of its suffering publishers and fixes the problem, you can help remedy this condition a bit. Differently shaped ad slots will draw different ads (though large horizontal and vertical graphic ads seem to draw the same content). You can also specify that one slot accepts graphics while another is text-only, as in the sketch below. What a pity that Google has that rigid rule of three ad units per page, whether they are big ad units or little ones, and whether it is a huge page or a little one.
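
As an illustration, here is roughly what two differently shaped ad units look like with the standard AdSense snippet. The publisher and slot IDs below are placeholders; note that whether a slot is text-only or also accepts image ads is configured in your AdSense account when you create the ad unit, not in this code:

    <!-- A wide "leaderboard" unit; this slot could be configured as text-only. -->
    <script type="text/javascript">
      google_ad_client = "pub-0000000000000000";  // placeholder publisher ID
      google_ad_slot = "1111111111";              // placeholder slot ID
      google_ad_width = 728;
      google_ad_height = 90;
    </script>
    <script type="text/javascript"
      src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>

    <!-- A tall "skyscraper" unit elsewhere on the page, which should draw a different ad mix. -->
    <script type="text/javascript">
      google_ad_client = "pub-0000000000000000";
      google_ad_slot = "2222222222";
      google_ad_width = 160;
      google_ad_height = 600;
    </script>
    <script type="text/javascript"
      src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>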

Sunday, February 8, 2009

When did it happen?

Take a look at this news item:
 
Screen resolution 800 x 600 significantly decreased for exploring the internet according to OneStat.com
 
Amsterdam - July 25 - OneStat.com ( www.onestat.com ), the number one provider of real-time intelligence web analytics, today reported that more and more internet users choose for screen resolution 1024 x 768 which is the most popular screen resolution for exploring the internet.
 
The finding has important implications for web site designers because most web sites are designed for a screen resolution of 800 x 600 pixels.
 

The screen resolution 1024 x 768 has reached an all time high and has risen from 54.02 percent in June 2004 to 57.38 percent. Users with monitors set to the most common resolution 800 x 600 for web sites have an approximate 18.23 percent global usage share. A year ago this percentage was 24.66 percent.

Only one detail is missing from this page - the year. When did this happen? We can guess from the last paragraph that the article was published in 2005, but it does not say that.
 
Don't forget to put dates on time-locked materials.
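
For example, a dateline that includes the year keeps an item meaningful years later (the class name is just an illustration, and the year here is only our guess for this particular item):

    <!-- Always spell out the year, even if it seems obvious at publication time. -->
    <p class="dateline">Amsterdam - July 25, 2005 - OneStat.com today reported that ...</p>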