Wednesday, December 3, 2008

Showstopper Google bug?

All search engines are based on the concept of the authority of Web pages and Websites. The pages and sites that are deemed to have the highest authority are returned at the very top of SERPs (Search Engine Result Pages). Google based its success on having the best measure of Website authority: the Google PageRank algorithm. The rationale for this intellectual and technical feat, and its mechanism, are described here: The PageRank Citation Ranking: Bringing Order to the Web.
Let's see how good it is. The best authority on the Web for what it says in the Bible is a copy of the Bible, no? Anyone who quotes from the Bible might make a mistake, but the Bible is infallible about the Bible, I would think.
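To make the idea of link-based authority concrete, here is a minimal sketch of a simplified PageRank computed by repeated iteration. It is only an illustration under stated assumptions, not Google's actual ranking code: the function name pagerank, the toy link graph, the page names, and the damping value of 0.85 are all hypothetical choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    links maps each page name to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with authority spread evenly over all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its authority equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its authority evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: three pages that quote or cite an original text.
toy_web = {
    "bible_text": [],
    "magazine_article": ["bible_text"],
    "song_lyrics": ["magazine_article"],
    "blog_post": ["magazine_article", "bible_text"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page that the others link to, directly or indirectly, ends up with the highest score. That is the behavior one would hope for from an authority measure: the original text should outrank the pages that merely quote it.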
Here is a quote from the King James Bible:
If then God so clothe the grass, which is to day in the field, and to morrow is
cast into the oven; how much more will he clothe you, O ye of little faith?
(Luke 12:28)
The phrase "O ye of little faith" appears in the book of Matthew as well. I searched Google for this phrase, in quotes. The first results that were actually quotes from the Bible appeared on the eighth page of results! Items like Time magazine articles, song lyrics and an article on one of my Web sites accounted for the first 70 or so results. Google claims it has about 36,000 results for this phrase.
"To be, or not to be" is the overly famous quote from Shakespeare's Hamlet. When this is searched in Google, a lone result from the play appears in third place in my part of the world, behind two Wikipedia articles. It is not from a Web site with the whole play, just a fragment with the soliloquy. The next result that is really from Shakespeare's play appears, again, only around position 75. Is there a 70-position penalty for having the original text (like the 30 Penalty)? Google has about 2.5 million pages with this phrase, or so it claims.
Part of the problem is that I used phrases that are extremely popular. I looked up "which is to day in the field" in Google and indeed, the very first result retrieved was from the Bible. But it was the only one actually from the Bible on that page of results! Google reported about 25,000 pages for this phrase.
I also tried a different phrase from the same Shakespeare soliloquy, "But that the dread of something after death" - not all that famous. Not a single one of the first 10 results was a link to the original Shakespeare text. The text of the "Tragedie of Hamlet" was first listed as result number 26! There were 23,700 results for this phrase.
If we cannot rely on "authority" to get search engines to deliver the authentic and authoritative origins of quotes at the top of search results, then that measure of authority doesn't seem to be worth much.
I had better luck with this line: "somewhere i have never travelled,gladly". It is the title of a poem by e.e. cummings, who is apparently not quite as famous as the Bible or Shakespeare, and therefore he is allowed to be more of an authority on his own work than William Shakespeare or the King James Bible. For this quote, Google claimed about 9,000 pages. The entire first page of results and more were filled with links to the poem itself.
For "How do I love thee? Let me count the ways," the start of Sonnet 43 from Elizabeth Barrett Browning's "Sonnets from the Portuguese," the first three entries retrieved by Google were the actual poem. That's fair enough. More than that would be useless, since someone might be searching for a different page.
