Friday, May 22, 2009

Decline of Dmoz: Schadenfreude and sadness

As a longtime frustrated user, submitter and ex-editor of the Open Directory (AKA Dmoz), I had feelings of Schadenfreude mixed with sadness when I learned of its decline. It has lost a lot of its viewing public, mostly because search engines do the search job better, but it has also stopped accumulating new listings. Triplicate listings of garbage pages, editors who tyrannize people with other political viewpoints and confine their directories to polemical articles or to their friends' Web sites, and arbitrary and capricious editing rules used to keep out sites that editors don't like all detract from the quality of Dmoz. The lack of quality ratings and quality criteria is also a problem.
There are a few articles on the decline of Dmoz around the Web, and they have attracted quite a lot of comment. Dmoz editors keep writing to say how great they are, without any understanding of what the statistics are telling them. Frustrated users are venting, but Dmoz editors are never going to take them seriously, and that's a big part of the problem - contempt for users, arrogance, the elitism of a closed group. But there are a lot of good editors at Dmoz, and it is worth saving from itself.
Some enterprising people made a Web site. God bless 'em, but directories like Dmoz serve an important function if they are run right, because they can provide information about the quality of Web pages to search engines. My detailed thoughts about this are at: The Decline of Dmoz.
Ami Isseroff

Sorting media garbage from media information - with special application to the Web and Internet

When I wrote the article The Decline of Dmoz, it got me thinking about how to give raters of Web pages and other media objective criteria for deciding whether an article or other media item is useful or good, or whether it is flotsam to be ignored. If there were a directory for "everything," how would you keep out the flood of garbage on the Internet, in printed matter, and in video and TV, and how could you spot and highlight the really superb new articles or books? It's not as easy as you think. Leonard and Virginia Woolf had a publishing business, and one of the surprising things they found was that in the long run, the books and poems and articles that were least popular on their initial publication often became best sellers. Indeed, their tiny, romantic, hopeless venture, Hogarth Press, which operated from a hand press, produced some of the greatest classics of the twentieth century. But these great artists sold pitifully small numbers of books when their works first appeared.
A scale of quality would be useful for consumers as well, since it would give them a better idea of how much reliance to place on a Web page, article or newscast. For that, we would have to eliminate some of the most obviously useless categories, which I will mention below, and provide more details of how to judge the less bad material.
This scale is going to need a lot of work, but here is a first go, from the bottom (or near it) to the top.
Web sites that should not be indexed at all:
Web sites that have taken over domain names and use them for porn, gambling or other exploitation.
Parked Domains
Gimmick sites that are just search engines or advertising
Plagiarized material - material that is taken verbatim from another Web site without a link to the original, often without naming or crediting the author, and posted to another Web site, Web log or forum. I have been, and am, the victim of this sort of thing and I am not the only one. The people who do it invariably have an excuse.
Racist and hate sites, videos etc. - I think Google's policy is wrong. Web sites like Stormfront, Jew Watch and IHR have no place on the Internet. Spreading disinformation and hate is not doing anyone a service. Incitement to genocide is a crime under the international Genocide Convention - really it is.
Spam and confidence schemes should be banned from the mails and the Internet.
Search engines index a surprising quantity of such sites.
Lowest Quality Materials
Anonymous emails, and Web logs or sites that post and re-post copies of anonymous material that never had an author who would acknowledge it. They are almost invariably hoaxes.
"News" reports based on anonymous sources and without confirmation - whether they are on the Web or in other media - are of approximately the same quality as anonymous email hoaxes about the latest email.
Opinion pieces or news items that rely on attribution to non-authoritative sources to establish facts, such as "A guy I met told me that Google no longer uses PageRank for anything and it is not important." The author didn't say it himself, and it is probably not true. He hid behind the non-authoritative source to intentionally perpetrate a falsehood. This is done all the time by supposedly serious journals.
Videos and similar material that are so poorly produced that you cannot hear what people are saying. It is beyond me why people post such things to YouTube.
Materials that are just copies of articles published elsewhere, properly attributed. These have some utility, especially if the original may be obscure or removed from the Web by the publisher. On the Web, it is generally considered legitimate to post whole articles provided you give due credit to the original, and aren't just duplicating someone else's Web site to steal their income. Usually though, it is best to go to the original source and to quote only parts of it, if you can be sure the source will still be there in five years. On the Web, you cannot be too sure.
Conspiracy theories that are not verified from other sources. There are whole Web sites devoted to the most fantastic ideas, usually based on total disinformation and often involving race hate and paranoia. The FBI and the Mossad did not cause the 9-11 attacks or the attack in Mumbai. The Federal Reserve system is not a plot to steal your money and give it to rich bankers.
Materials to be treated with due caution
Claims made by commercial sources who are selling a product
Sources with an obvious political bias.
Materials that use adjectives or hype to describe products or political issues. If words like "right wing" or "left wing" or "progressive" appear too often in an article or report, you have to ask yourself if this person is telling you facts or trying to convince you of their opinion.
Differential treatment of subjects - for example, a publication that will regularly use adjectives like "right wing" or "extremist" to describe politicians on one side of a conflict, but refrains from using any adjectives to describe leaders of the other side.
Materials that are unsourced.
Assertions from publications or authors who have a poor track record for accuracy. Certain people, for example, regularly predict that Iran will explode a nuclear weapon in a few months, or that Israel or the US will attack Iran, but it never happens. If they were ignored, they could not make a living by spreading disinformation in that way.
An article or publication that omits important facts that you know to be true is probably trying to create bias.
An article or publication that intentionally distorts a quote or lies about a fact should not be trusted about other facts and assertions.
An article or book that has more than a few ellipses ("...") in quotes is probably distorting the meaning of the quotes. This is a favorite technique of certain politicians, and is useful for dishonest commercial purposes as well.
Information you can rely on
Source is generally known to be correct
Information is confirmed by other reports
The report is plausible based on scientific evidence and common sense.
Source has no reason to lie
There's a lot less of that around than you might think. Remember the Iraq WMD that weren't? The Israeli bio-weapon hoax that was reported in numerous respected journals?
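For raters who want something more mechanical, the four "rely on" criteria above could be turned into a simple checklist. What follows is only a rough sketch, not a finished rating tool: the names SourceReport, criteria_met and is_reliable are made up for illustration, and the rule that all four criteria must hold is my own assumption, not something the list itself demands.

```python
from dataclasses import dataclass


@dataclass
class SourceReport:
    """One rater's answers for a single report (all field names are hypothetical)."""
    source_generally_correct: bool    # source is generally known to be correct
    confirmed_by_other_reports: bool  # information is confirmed elsewhere
    plausible: bool                   # fits scientific evidence and common sense
    source_has_reason_to_lie: bool    # True if the source has a motive to lie


def criteria_met(report: SourceReport) -> int:
    """Count how many of the four criteria the report satisfies (0 to 4)."""
    return sum([
        report.source_generally_correct,
        report.confirmed_by_other_reports,
        report.plausible,
        not report.source_has_reason_to_lie,
    ])


def is_reliable(report: SourceReport, required: int = 4) -> bool:
    """Assumed rule: rely on a report only if it meets `required` criteria."""
    return criteria_met(report) >= required


# Usage: a well-confirmed report versus an unconfirmed one from an interested party.
confirmed = SourceReport(True, True, True, False)
unconfirmed = SourceReport(True, False, True, True)
print(criteria_met(confirmed), is_reliable(confirmed))
print(criteria_met(unconfirmed), is_reliable(unconfirmed))
```

The answers still have to come from a human rater; the code only keeps the bookkeeping honest and makes it obvious which criterion a report failed.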