Tuesday, January 5, 2010

Cloud Computing: Don't junk your PC just yet

Self-styled financial mavens and fools, motley and otherwise, are insisting that the "wave of the future" is cloud computing. The bulky and expensive PC will be replaced by a low-powered, simple computer running a "thin client" program connected to an Internet software services provider, such as Google. The provider will run the latest versions of spreadsheets, word processors, desktop publishing applications, CAD/CAM programs, customer relationship management (CRM) programs, presentation software, database programs and anything else you can imagine, which your business can access for a nominal fee. No more expensive hardware upgrades, no more expensive software upgrades, and every function will be available everywhere on a netbook or equivalent computer. An end to the PC and the beginning of computing nirvana. Well, maybe. Or maybe not.
 
If you have nothing to hide, you have nothing to fear... But everyone has something to hide.
 
There are a few flies in the hypothetical ointment of the computing cloud. The first is that quite a few firms and private individuals are perfectly content to run hand-me-down versions of operating systems and software that are just as good for their purposes as the new, updated ones, and cost nothing at all to keep. And though we won't talk about it (not "nice"), some people actually steal software. None of those people or firms are going to be interested in paying a "nominal fee" each year for the version of whatever it is that includes the latest bugs, when they have a perfectly good working version of a word processor, spreadsheet, presentation manager and database, which they know how to use and which is compatible with the files they have already created.
 
Moreover, not all updates can be done smoothly or without changing hardware. The changeover from 32-bit to 64-bit (and perhaps someday 128-bit) computers and operating systems will require new hardware and new locally installed software. Each major technological change is still going to require junking the old computer, and these changes will continue to happen every few years to provide better displays, better mice (or no mouse at all, which would be the best sort of mouse!), bigger and better disks working on new principles, and numerous other innovations. Remember when a computer display took up an entire desk, in the "bad old days" of five years ago? Or when a 100 MB hard disk was really huge? I remember when 10 MB was a lot of hard disk too. Innovations like these will require new hardware no matter what you do. The hardware industry won't stand still, and people will still need some or all of these innovations in the thinnest of hypothetical thin clients.
 
Another problem is dependence. Everyone who remembers the bad old days of the big central computer also remembers the announcements that flashed across the screen as you were frantically trying to meet a deadline: "Acme central computer will be shutting down in five minutes for two hours of maintenance. Please save your work and exit all active programs." The PC freed everyone from that. Supposedly that won't happen in cloud computing. Or will it? And nobody guarantees that the cloud computing provider is always going to provide the latest hardware or the full amount of computing capacity that you need. Everything has limits and everything costs money, and it is always cheaper to provide less rather than more, and to maintain a capacity that is adequate for most of the day and most of the time, but not necessarily enough for peak use.
 
However, the biggest problem of cloud computing, the showstopper that will make it a no-go for most firms and individuals, is data security. The enthusiasts of cloud computing, asked about security, will spout reams of gibberish about HTTPS and unique private keys and prime numbers and 128-bit encryption: technojargon to awe the uninitiated. The truth is more prosaic. There is no way to protect data against a determined attacker.
 
As a cloud computing executive said, "If you have nothing to hide, you have nothing to fear." But everyone has something to hide. Every firm has its latest designs in its databases and word processor files. It has all the contact information of its customers in its CRM database. It has the amounts it bid on various sealed-bid contracts. It has the studies that show its weaknesses vis-a-vis its competitors. All of this is quite interesting information for competitors and industrial spies. All the fancy security protocols and encryption schemes in the world depend on you being you and not someone else, and they are all vulnerable to identity theft. Identity theft is a multi-billion dollar industry, and it is growing despite the best efforts of ingenious firms to foil it. Once a thief has your information and can make the system think he or she is you, they can access anything you can access. A disgruntled former employee of your firm, a disgruntled former employee of the cloud computing service, a dishonest employee or a determined industrial spy can and will get the passwords and counter-passwords by phishing schemes, by stealing Wi-Fi signals or by hacking databases. There is no system so foolproof that it cannot be fooled; it happens all the time. And what happens to your data privacy if the Internal Revenue Service (or your wife's divorce lawyer) forces the cloud computing service provider to produce the records?
 
Do you really have nothing to hide? Are you sure? I thought so. Don't throw away your PC.
 
Ami Isseroff
  
 

Friday, August 21, 2009

The end of free Internet news?

Rupert Murdoch and others have decided it is time to end free news content on the Web. One of the reasons someone like Murdoch could make such a proposition is that he admittedly knows nothing about the Internet or how it works. I think there is no way at all to really end free content, and that paid content will never be able to compete with free. Consider first all the legitimate primary sources of Internet news that are always going to be free: government Web sites, government broadcasters and NGOs. Governments and NGOs want you to see their content. A large part of international news consists of refurbished government press agency announcements or NGO press releases. "The prominent NGO Birdwatchers International has released a new study showing..." and the rest of the item just quotes the information in the press release.
 
Now consider also the question of copyright and "fair use." A blogger subscribes to a paid service and copies the main content of an article to their Web log. As long as they comment on the article and are not a for-profit organization, they can claim "fair use for educational purposes." Attempts to stop them will be stymied, because they will be labelled attempts to stifle freedom of the press.
 
The claim of publishers that they produce "quality content" that people will want to pay for is also highly dubious. During the Iraq war, the Second Lebanon war and other such events, the press often published government or terrorist propaganda indiscriminately. A CNN report described how dramatic footage of ambulances rushing to the rescue was staged by Hezbollah for the benefit of the press. A Reuters photo of smoke over Beirut was shown by a blogger to be a fake, and bloggers exposed many other instances in which the commercial press was fooled by biased stringers or interested parties into passing off fabrications as fact; the French footage of the alleged killing of Muhammad al-Dura was one such instance. Consider also stories like Sy Hersh's allegations of an imminent US attack on Iran that never materialized. These stories appeared over and over, though they had no basis in fact. The same was true of Judith Miller's New York Times stories about WMD in Iraq. If you want to lie to me for free, that's fine, but I won't pay for it.
 
There are so many ways for good free content to get to the Web and be available to all that it is really doubtful many people will want to pay for it, especially considering the poor quality of a lot of commercial journalism.
 
Ami Isseroff

Friday, May 22, 2009

Decline of Dmoz: Schadenfreude and sadness

As a long-time frustrated user, submitter and ex-editor of the Open Directory (a.k.a. Dmoz), I had feelings of Schadenfreude mixed with sadness when I learned of its decline. It has lost a lot of its viewing public, mostly because search engines do the search job better, but it has also stopped accumulating new listings. Triplicate listings of garbage pages, editors who tyrannize people with other political viewpoints and confine their directories to polemical articles or to their friends' Web sites, and arbitrary and capricious editing rules used to keep out sites that editors don't like all detract from the quality of Dmoz. The lack of quality ratings and quality criteria is also a problem.
 
There are a few articles on the decline of Dmoz around the Web, and they have attracted quite a lot of comment. Dmoz editors keep writing to say how great they are, without any understanding of what the statistics are telling them. Frustrated users are venting, but Dmoz editors are never going to take them seriously, and that's a big part of the problem: contempt for users, arrogance, the elitism of a closed group. But there are a lot of good editors at Dmoz, and it is worth saving from itself.
 
Some enterprising people made a dmozsucks.org Web site. God bless 'em, but directories like Dmoz serve an important function if they are run right, because they can provide information about the quality of Web pages to search engines. My detailed thoughts about this are at: The Decline of Dmoz.
 
Ami Isseroff

Sorting media garbage from media information - with special application to the Web and Internet

Writing the article The Decline of Dmoz got me thinking about how to give raters of Web pages and other media objective criteria for deciding whether an article or other media item is useful or good, or whether it is flotsam to be ignored. If there were a directory of "everything," how would you keep out the flood of garbage on the Internet, in printed matter, in video and on TV? And how could you spot and highlight the really superb new articles or books that deserve emphasis? It's not as easy as you think. Leonard and Virginia Woolf had a publishing business, and one of the surprising things they found was that, in the long run, the books and poems and articles that were least popular at their initial publication often became best sellers. Indeed, their tiny, romantic, hopeless venture, the Hogarth Press, which operated from a hand press, produced some of the greatest classics of the twentieth century. But those great artists sold pitifully small numbers of books when their works first appeared.
 
A scale of quality would be useful for consumers as well, since it would give them a better idea of how much reliance to place on a Web page, article or newscast. For that, we would have to eliminate some of the most obviously useless categories I mention below, and provide more detail on how to judge the less bad material.
 
This scale is going to need a lot of work, but here is a first go, from the bottom (or near it) to the top.
 
Web sites that should not be indexed at all:
 
Web sites that have taken over domain names and use them for porn, gambling or other exploitation.
 
Parked domains.
 
Gimmick sites that are just search engines or advertising.
 
Plagiarized material - material that is taken verbatim from another Web site without a link to the original, often without specifying the author or giving credit, and posted to another Web site, Web log or forum. I have been, and am, a victim of this sort of thing, and I am not the only one. The people who do it invariably have an excuse.
 
Racist and hate sites, videos, etc. - I think Google's policy is wrong. Web sites like Stormfront, Jew Watch and the IHR have no place on the Internet. Spreading disinformation and hate is not doing anyone a service. Incitement to genocide is a crime under the international Genocide Convention - really, it is.
 
Spam and confidence schemes should be banned from the mails and the Internet.
 
Search engines index a surprising quantity of such sites.
 
Lowest Quality Materials
 
Anonymous e-mails, and Web logs or sites that post and re-post copies of anonymous material that never had an author willing to acknowledge it. Such material is almost invariably a hoax.
 
"news" reports based on anonymous sources and without confirmation - whether they are on the Web or in other media, are of the same approximate quality as anonymous email hoaxes about the latest email
 
Opinion pieces or news items that rely on attribution to non-authoritative sources to establish facts, such as "A guy I met told me that Google no longer uses PageRank for anything and it is not important." The author didn't say it, and it is probably not true; they hide behind the non-authoritative source to perpetrate a falsehood intentionally. This is done all the time by supposedly serious journals.
 
Videos and similar material that are so poorly produced that you cannot hear what people are saying. It is beyond me why people post such things to YouTube.
 
Materials that are just copies of articles published elsewhere, properly attributed. These have some utility, especially if the original is obscure or has been removed from the Web by the publisher. On the Web, it is generally considered legitimate to post whole articles provided you give due credit to the original and aren't just duplicating someone else's Web site to steal their income. Usually, though, it is best to go to the original source and quote only parts of it, if you can be sure the source will still be there in five years. On the Web, you cannot be too sure.
 
Conspiracy theories that are not verified from other sources. There are whole Web sites devoted to the most fantastic ideas, usually based on total disinformation and often involving race hate and paranoia. The FBI and the Mossad did not cause the 9/11 attacks or the attacks in Mumbai. The Federal Reserve System is not a plot to steal your money and give it to rich bankers.
 
 
Materials to be treated with due caution
 
Claims made by commercial sources that are selling a product.
 
Sources with an obvious political bias.
 
Materials that use adjectives or hype to describe products or political issues. If words like "right wing" or "left wing" or "progressive" appear too often in an article or report, you have to ask yourself whether this person is telling you facts or trying to convince you of their opinion.
 
Differential treatment of subjects - for example, a publication that regularly uses adjectives like "right wing" or "extremist" to describe politicians on one side of a conflict, but refrains from using any adjectives to describe leaders of the other side.
 
Materials that are unsourced.
 
Assertions from publications or authors who have a poor track record for accuracy. Certain people, for example, regularly predict that Iran will explode a nuclear weapon in a few months, or that Israel or the US will attack Iran, but it never happens. If they were ignored, they could not make a living by spreading disinformation in that way.
 
An article or publication that omits important facts that you know to be true is probably trying to create bias.
 
An article or publication that intentionally distorts a quote or lies about a fact should not be trusted about other facts and assertions.
 
An article or book that has more than a few ellipses ("...") in its quotes is probably distorting the meaning of the quotes. This is a favorite technique of certain politicians, and it is useful for dishonest commercial purposes as well.
 
Information you can rely on
 
Source is generally known to be correct.
 
Information is confirmed by other reports.
 
The report is plausible based on scientific evidence and common sense.
 
Source has no reason to lie.
 
There's a lot less of that around than you might think. Remember the Iraq WMD that weren't? Or the Israeli bio-weapon hoax that was reported in numerous respected journals?

Monday, March 16, 2009

Google Keyword Search Frequency mystery, or "Is Sex going out of style?"

According to Google's AdWords tool, sex may be going out of style. That is, Google's data supposedly show that over a 12-month period there were, on average, 124,000,000 (one hundred and twenty-four million) searches for the keyword "sex" each month, whereas in the month of February only 90,500 (ninety thousand five hundred) people searched for sex. No mistake about the number of zeros anywhere, either; I double-checked. All the rest of the people looking for sex must have found it. But sex was not the only keyword affected. Every keyword I checked except "Facebook" had a lower search frequency in February than the average over the last 12 months. The size of the drop was not consistent, however; some words dropped much more than others. Is search going out of style, is there just something wrong with Google's reporting, or what?
 

What happened to Jewwatch.com?

A political-social search engine optimization issue developed around the hate Web site jewwatch.com. For many years, searches in Google for the keyword "Jew" returned this odious site at the top of the listings or among the first ten. Jewwatch.com features standard anti-Semitic fare, including "ZOG," the "Zionist Occupied Government," and the forged Protocols of the Elders of Zion. Attempts to get Google to ban the site failed in the past, but now Jewwatch is gone from the top listings for the keyword "Jew." The question is: what changed?

Sunday, February 15, 2009

Solution for duplicate page listings - the 'canonical' attribute in the head tag

One sort of duplicate content happens because of plagiarism, copying to forums, and copying of articles to blogs (see How Google Treats Duplicate Content). In those cases there really are several physical instances of a page, for various reasons.
 
But there is another sort of "duplicate content" that is often just an artifact of how the Web works, and in part a bug of search engines. Usually it is not duplicate content at all, but rather duplicate URLs for the same physical content.
 
Suppose you have a page at http://seo.yu-hu.com. Just one physical page. This one page can be reached in four different ways.
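Most likely (assuming the server serves index.html as its default page, and that the bare domain and the www subdomain both resolve to the same site), those four ways are:
 
http://seo.yu-hu.com
http://seo.yu-hu.com/index.html
http://www.seo.yu-hu.com
http://www.seo.yu-hu.com/index.html
 
To a search engine, each of these URLs is a separate page, even though all four serve the same file.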
 
That is a simple case, for a site that uses physical files rather than pages generated from a database. A site that is run by a content management system, however, may generate the exact same content in dozens of ways, at different URLs under its "products" or "catalog" or "archives" sections. It is still the same physical content, coming from the database.
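For instance (these URLs are hypothetical), a content management system might serve the very same article at all of these addresses:
 
http://www.example.com/products/widget-2000
http://www.example.com/catalog/item.php?id=17
http://www.example.com/archives/2009/widget-2000.html
 
One database record, three different URLs.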
 
Google and other search engines decide that the additional pages are "duplicate content." They really are. It is not clear whether this penalizes your site, or how.
 
Google and Yahoo! now let you tell them how to index the page. You do it by putting a "canonical" attribute in a dummy link tag in the head section of the page, like this:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
 
The result should be that all the PageRank and other goodies will be credited to the specified version of the page, no matter which URL a visitor or crawler arrives at.
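To make the placement concrete, here is a minimal sketch of a page that declares its canonical URL. The title and content are hypothetical; the link tag is the same one shown above, and it would go in the head of every URL variant that serves this content:
 
<html>
<head>
<title>Swedish Fish</title>
<!-- tells search engines which URL is the "real" one for this content -->
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head>
<body>
...product page content...
</body>
</html>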