“Statistics are used as a drunk uses lampposts – for support, not illumination.” – anonymous, provided by A Word A Day.
All Posts by Bob Lewis
Information glut or nonsense glut? (first appeared in InfoWorld)
Evolutionary theory has to account for all the bizarre complexity of the natural world: the tail feathers of peacocks; the mating rituals of praying mantises; the popularity of Beavis and Butt-Head. One interesting question: Why do prey animals herd?
Herds are easy targets for predators. So why do animals join them?
One ingenious theory has it that even though the herd as a whole makes an easy target, each individual member is less likely to get eaten – they can hide behind the herd. One critter – usually old or infirm – gets eaten and the rest escape. When you’re solitary, your risk goes up.
Predators hunt in packs for entirely different reasons. Human beings, as omnivores, appear to have the instincts of both predators and prey: We hunt in packs, herd when in danger.
Which explains the popularity of “research reports” showing how many of our peers are adopting some technology or other. These reports show us how big our herd is and where it seems to be going. Infused with this knowledge we can stay in the middle of our herd, safely out of trouble.
And so it was that I found myself reading an “executive report” last week with several dozen bar charts. A typical chart segmented respondents into five categories, and showed how many of the twenty or so “yes” responses fell into each one.
Academic journals impose a discipline – peer review – which usually catches egregious statistical nonsense. But while academic publication requires peer review, business publication requires only a printing press.
Which led to this report’s distribution to a large number of CIOs. I wonder how many of them looked at the bar charts, murmured, “No error bars,” to themselves, and tossed this information-free report into the trash.
We read over and over again about information glut. I sometimes wonder if what we really have is nonsense glut, with no more actual new information each year than a century ago.
Bar charts without error bars – those pesky black lines that show how uncertain we are about each bar’s true value – are only one symptom of the larger epidemic. We’re inundated with nonsense because we not only tolerate it, we embrace it.
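To see how wide those pesky black lines would be on a chart built from twenty or so responses, here is a minimal sketch. The per-category counts are invented for illustration (the report's actual numbers aren't reproduced here), and the Wilson score interval is just one common way to put an error bar on a proportion, not necessarily what a statistician would choose for this survey.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical counts: 20 "yes" responses spread over five categories.
counts = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2}
total = sum(counts.values())

intervals = {name: wilson_interval(k, total) for name, k in counts.items()}
for name, (low, high) in intervals.items():
    share = counts[name] / total
    print(f"{name}: {counts[name]}/{total} = {share:.0%}, "
          f"95% interval {low:.0%} to {high:.0%}")
```

With these made-up counts, the interval for the tallest bar overlaps the interval for the shortest one, which is the whole problem: the chart appears to rank the categories, but the data can't support the ranking.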
Don’t believe me? Here’s a question: faced with a report like this and a critique by one of your analysts pointing out its deficiencies, would you say, “Thanks for the analysis,” as you shred the offending pages, or would you say, “Well, any information is better than none at all”?
Thomas Jefferson once said, “Ignorance is preferable to error,” and as usual, Tom is worth listening to. Next time you’re faced with some analysis or other, take the time to read it critically. Look for sample sizes so small that comparisons are meaningless, like the bar charts I’ve been complaining about.
Also look for leading questions, like, “Would you prefer a delicious, flame-broiled hamburger, or a greasy, nasty looking fried chunk of cow?” (If your source has an axe to grind and doesn’t tell you the exact question asked, you can be pretty sure of the phrasing.)
Look for graphs presenting “data” with no hint as to how items were scored. How many graphs have you seen that divide the known universe into quadrants? You know the ones: every company is given a dot, the dots are all over the landscape, the upper right quadrant is “good”, and you have no clue why each dot landed where it did because the two axes both represent matters of opinion (“vendor stability” or “industry presence”).
Readers David Cassell and Tony Olsen, both statisticians, recently acquainted me with two measures, Data Density and the Data-Ink Ratio, from Edward Tufte’s wonderful book, The Visual Display of Quantitative Information.
To calculate the Data Density, divide the number of data points by the total graph area. You express the result in dpsi – data per square inch.
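As a quick sketch of that arithmetic, assuming a rectangular graph measured in inches (the function name and the 4-by-3-inch chart are hypothetical, chosen only for illustration):

```python
def data_density(num_data_points, width_in, height_in):
    """Data points per square inch (dpsi) of a rectangular graph."""
    area = width_in * height_in
    if area <= 0:
        raise ValueError("graph area must be positive")
    return num_data_points / area

# Hypothetical example: a bar chart showing 20 responses in a 4 x 3 inch frame.
density = data_density(20, 4.0, 3.0)
print(f"{density:.2f} dpsi")
```

A chart like that carries fewer than two data points per square inch, which is a lot of paper for not much information.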
You calculate the Data-Ink Ratio by dividing the amount of ink used to display non-redundant data by the total ink used to print the graph. Use care when scraping the ink off the page – one sneeze and you’re out of luck.
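For those who would rather not scrape, here is a rough sketch that counts characters in a text-mode chart instead of weighing ink. Everything about it is an assumption made for illustration: `#` stands for ink that shows data, `.` is blank paper, and any other character (frames, gridlines) counts as non-data ink. It also treats every `#` as non-redundant, which a real chart wouldn't guarantee.

```python
def data_ink_ratio(chart_rows, data_mark="#", blank="."):
    """Share of a chart's 'ink' (non-blank cells) that displays data."""
    data_ink = sum(row.count(data_mark) for row in chart_rows)
    total_ink = sum(len(row) - row.count(blank) for row in chart_rows)
    if total_ink == 0:
        raise ValueError("chart has no ink")
    return data_ink / total_ink

# Hypothetical chart: three bars inside a heavy frame.
chart = [
    "+-----+",
    "|..#..|",
    "|#.#..|",
    "|#.#.#|",
    "+-----+",
]
ratio = data_ink_ratio(chart)
print(f"Data-Ink Ratio: {ratio:.2f}")
```

In this toy chart the frame uses more ink than the bars do, so most of the ink tells the reader nothing.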