“A book lying idle on a shelf is wasted ammunition.” — Henry Miller
Month: May 2020
Modeling is always in fashion
Rank the most-reported aspects of COVID-19, in descending order of worst-explained-ness. Modeling is, if not at the top, close to it.
That’s a shame. Beyond improving our public policy discussions, better coverage would help those of us who think in terms of business strategy and tactics think more deeply and, perhaps, more usefully about the role modeling might play in business planning.
For those interested in the public-health dimension, “COVID-19 Models: Can They Tell Us What We Want to Know?” (Josh Michaud, Jennifer Kates, and Larry Levitt, Kaiser Family Foundation (KFF), Apr 16, 2020) provides a useful summary. It discusses three types of model that, translated into business planning terms, we might call actuarial, simulation, and multivariate-statistical.
Actuarial models divide a population into groups (cohorts) and move numbers of members of each cohort to other cohorts based on a defined set of rules. If you run an insurance company that needs to price risk (there’s no other kind), actuarial models are a useful alternative to throwing darts.
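To make the cohort idea concrete, here is a minimal sketch of moving members between cohorts by rule. The cohort names, the monthly transition rates, and the starting population are all invented for illustration; a real actuarial model would derive them from experience data.

```python
# Hypothetical rules: (from_cohort, to_cohort) -> share of the source cohort
# that moves each month. The rates are made up for illustration only.
TRANSITIONS = {
    ("healthy", "sick"): 0.05,
    ("sick", "healthy"): 0.30,
    ("sick", "claim_filed"): 0.10,
}


def step(cohorts, transitions):
    """Advance one period by moving a share of each cohort along each rule."""
    nxt = dict(cohorts)
    for (src, dst), rate in transitions.items():
        moved = cohorts[src] * rate  # based on the cohort size at period start
        nxt[src] -= moved
        nxt[dst] += moved
    return nxt


def project(cohorts, transitions, periods):
    """Apply the rules repeatedly and return the cohort sizes at the end."""
    for _ in range(periods):
        cohorts = step(cohorts, transitions)
    return cohorts


start = {"healthy": 10_000.0, "sick": 0.0, "claim_filed": 0.0}
after_year = project(start, TRANSITIONS, 12)
print(after_year)
```

Nobody enters or leaves this toy population, so the cohort sizes always add up to the starting total. That makes a handy sanity check on the rules.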
Imagine that instead you’re responsible for managing a business process of some kind. A common mistake process designers make is describing processes as collections of interconnected boxes.
It’s a mistake because most business processes consist of queues, not boxes. Take a six-step process, where each step takes an hour to execute. Add the steps and the cycle time should be six hours.
Measure cycle time and it’s more likely to be six days. That’s because each item tossed into a queue has to wait its turn before anyone starts to work on it.
Think of these queues as actuarial cohorts and you stand a much better chance of accurately forecasting process cycle time and throughput — an outcome process managers presumably might find useful.
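A rough way to see the gap between touch time and cycle time is to simulate the queues. The sketch below is not a Lean technique or anyone's official method. It assumes one worker per step, first-come-first-served queues, and randomly varying arrival gaps and work times (exponentially distributed, averaging 1.1 hours between arrivals and 1 hour per step). Those numbers are assumptions chosen to keep each worker busy about 90% of the time.

```python
import random


def simulate_line(n_items, n_steps=6, mean_service=1.0, mean_gap=1.1, seed=1):
    """Return (average cycle time, average touch time) in hours."""
    rng = random.Random(seed)
    arrival = 0.0
    prev_finish = [0.0] * n_steps  # when each step finished its previous item
    cycle_times, touch_times = [], []
    for _ in range(n_items):
        arrival += rng.expovariate(1.0 / mean_gap)
        ready = arrival  # moment the item lands in the next step's queue
        touch = 0.0
        for s in range(n_steps):
            # The item waits in the queue until the step's worker is free.
            start = max(ready, prev_finish[s])
            service = rng.expovariate(1.0 / mean_service)
            ready = start + service
            prev_finish[s] = ready
            touch += service
        cycle_times.append(ready - arrival)
        touch_times.append(touch)
    return sum(cycle_times) / n_items, sum(touch_times) / n_items


mean_cycle, mean_touch = simulate_line(2000)
print(f"touch time: {mean_touch:.1f} h, cycle time: {mean_cycle:.1f} h")
```

Touch time averages about six hours by construction. Whatever cycle time exceeds that is time spent sitting in queues, and it grows quickly as the assumed arrival gap shrinks toward the work time.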
Truth in advertising: I don’t know if anyone has ever tried applying actuarial techniques to process analysis. But queue-to-queue vs box-to-box process analysis? It’s one of Lean’s most important contributions.
Simulation models are as the name implies. They define a collection of “agents” that behave like entities in the situation being simulated. The more accurately they describe agent behaviors, estimate the numbers of each type of agent, the probability distributions of different behaviors for each type, and the outcomes of these behaviors … including the outcomes of encounters among agents … the more accurate the model’s predictions.
For years, business strategists have talked about a company’s “business model.” These have mostly been narratives rather than true models. That is, they’ve been qualitative accounts of the buttons and levers business managers can push and pull to get the outcomes they want.
There’s no reason to think sophisticated modelers couldn’t develop equivalent simulation models to forecast the impact of different business strategies and tactics on, say, customer retention, mindshare, and walletshare.
If one of your modeling goals is understanding how something works, simulation is just the ticket.
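As a toy illustration of agents, types, and behavior probabilities, here is a sketch that forecasts customer retention under two tactics. The agent types, their shares of the customer base, their monthly chances of leaving, and the effect of a discount are all hypothetical. It also leaves out encounters among agents, which a serious simulation would include.

```python
import random

# Hypothetical agent types: (share of customer base, monthly chance of leaving)
AGENT_TYPES = {
    "loyal": (0.6, 0.01),
    "price_sensitive": (0.4, 0.08),
}


def simulate_retention(n_customers, months, discount_effect=0.0, seed=42):
    """Return the fraction of customers still active after `months`.

    discount_effect is an assumed reduction in the monthly leave chance for
    price-sensitive agents; loyal agents ignore it.
    """
    rng = random.Random(seed)
    type_names = list(AGENT_TYPES)
    shares = [AGENT_TYPES[name][0] for name in type_names]
    agents = rng.choices(type_names, weights=shares, k=n_customers)
    active = 0
    for kind in agents:
        leave_chance = AGENT_TYPES[kind][1]
        if kind == "price_sensitive":
            leave_chance = max(0.0, leave_chance - discount_effect)
        # The agent stays only if it decides not to leave in every month.
        stayed = all(rng.random() >= leave_chance for _ in range(months))
        active += stayed
    return active / n_customers


baseline = simulate_retention(10_000, 12)
with_discount = simulate_retention(10_000, 12, discount_effect=0.05)
print(f"retention: {baseline:.1%} baseline, {with_discount:.1%} with discount")
```

The value of a model like this comes less from the forecast than from being forced to write down who the agents are and why they behave as they do.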
The third type of model, multivariate-statistical, applies such techniques as multiple regression analysis, analysis of variance, and multidimensional scaling to large datasets to determine how strongly different hypothesized input factors correlate with the outputs that matter. For COVID-19, input factors are such well-known variables as adherence to social distancing, use of masks and gloves, and not pressuring a cohabiter to join you in your kale and beet salad diet. Outputs are correlations to rates of infection and strangulation.
In business, multivariate-statistical modeling is how most analytics gets done. It’s also more or less how neural-network-based machine learning works. It works better for interpolation than extrapolation, and depends on figuring out which way the arrow of causality points when an analysis discovers a correlation.
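The interpolation-versus-extrapolation point is easy to demonstrate with a deliberately simplified case: a one-variable least-squares line rather than a true multivariate analysis. The sketch assumes a hypothetical business in which sales respond to ad spend with diminishing returns (a square-root curve, chosen only for illustration) and data collected only for spend levels 1 through 10.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_response(spend):
    """Assumed real-world relationship: diminishing returns on spend."""
    return math.sqrt(spend)


observed_spend = list(range(1, 11))  # we only have data for spend 1..10
observed_sales = [true_response(x) for x in observed_spend]
slope, intercept = fit_line(observed_spend, observed_sales)


def prediction_error(spend):
    """Absolute gap between the fitted line and the assumed true response."""
    return abs(slope * spend + intercept - true_response(spend))


interpolation_error = prediction_error(5.5)  # inside the observed range
extrapolation_error = prediction_error(40)   # far outside it
print(interpolation_error, extrapolation_error)
```

Inside the observed range the straight line is a decent stand-in for the curve. Well outside it, the line keeps climbing while the assumed real response flattens out, and the error grows accordingly.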
As with all programming, model value depends on testing, although model testing is more about consistency and calibration than defect detection. And COVID-19 models have brought the impact of data limitations on model outputs into sharp focus.
For clarity’s sake: Models are consistent when output metrics improve and get worse in step with reality. They’re calibrated when the output metrics match real-world measurements.
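One way to turn those two definitions into numbers is sketched below. The two metrics are my own simple stand-ins, not standard named tests: consistency is scored as the share of period-to-period moves where forecast and measurement go in the same direction, and calibration as the average absolute gap between them. The data are hypothetical.

```python
def consistency(forecast, actual):
    """Share of period-to-period moves where both series move the same way."""
    def direction(a, b):
        return (b > a) - (b < a)  # +1 up, -1 down, 0 flat

    pairs = zip(zip(forecast, forecast[1:]), zip(actual, actual[1:]))
    agree = [direction(*f) == direction(*a) for f, a in pairs]
    return sum(agree) / len(agree)


def calibration_error(forecast, actual):
    """Mean absolute gap between forecast and measurement."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


actual = [100, 120, 150, 140, 110]   # hypothetical weekly measurements
forecast = [2 * a for a in actual]   # tracks every move, but always double

print(consistency(forecast, actual))        # perfectly consistent
print(calibration_error(forecast, actual))  # badly calibrated
```

A forecast that is always double the measurement is perfectly consistent and badly calibrated, which is why the two properties have to be tested separately.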
With COVID-19, testers have to balance clinical and statistical needs. Clinically, testing is how physicians determine which disease they’re treating, leading to the exact opposite of random sampling. With non-random samples, testing for consistency is possible, but calibration testing is, at best, contorted.
Testing capacity falls short of clinical demand, which for most of us must come first as an ethical necessity, so modelers are left to de-bias their non-random datasets. This is an inexact practice at best, and it limits their ability to calibrate models. That the models yield different forecasts is unsurprising.
And guess what: Your own data scientists face a similar challenge. Their datasets are piles of business transactions that are, by their very nature, far from random.
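For a feel of what de-biasing involves, here is a sketch of one common approach: reweighting each group's results by its share of the whole population. Every number is invented, and the sketch assumes the population shares are known, which in practice is the hard part.

```python
# Hypothetical test results by group: (tests given, positives found).
# Testing is skewed toward people with symptoms, as clinical need dictates.
tested = {
    "symptomatic": (800, 240),
    "asymptomatic": (200, 4),
}

# Assumed shares of each group in the whole population (rarely known well).
population_share = {
    "symptomatic": 0.10,
    "asymptomatic": 0.90,
}


def naive_rate(tested):
    """Positive rate computed straight from the skewed sample."""
    tests = sum(n for n, _ in tested.values())
    positives = sum(p for _, p in tested.values())
    return positives / tests


def reweighted_rate(tested, population_share):
    """Weight each group's positive rate by its share of the population."""
    return sum(
        population_share[group] * positives / tests
        for group, (tests, positives) in tested.items()
    )


print(f"naive: {naive_rate(tested):.1%}")
print(f"reweighted: {reweighted_rate(tested, population_share):.1%}")
```

The two estimates differ by a wide margin, and the reweighted one is only as good as the assumed population shares. Two modelers making different assumptions will get different answers from the same tests.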
Exercise suitable caution.