
Modeling is always in fashion


Rank the most-reported aspects of COVID-19, in descending order of worst-explained-ness. Modeling is, if not at the top, close to it.

Which is a shame, because beyond improving our public-policy discussions, better coverage would also help those of us who think in terms of business strategy and tactics consider more deeply and, perhaps, more usefully, the role modeling might play in business planning.

For those interested in the public-health dimension, “COVID-19 Models: Can They Tell Us What We Want to Know?” (Josh Michaud, Jennifer Kates, and Larry Levitt, KFF [Kaiser Family Foundation], April 16, 2020) provides a useful summary. It discusses three types of model that, translated into business-planning terms, we might call actuarial, simulation, and multivariate-statistical.

Actuarial models divide a population into groups (cohorts) and move numbers of members of each cohort to other cohorts based on a defined set of rules. If you run an insurance company that needs to price risk (there’s no other kind), actuarial models are a useful alternative to throwing darts.
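If you want to see the mechanics, here's a minimal sketch of a cohort-transition model in Python. The cohorts, rules, and rates are invented for illustration; a real actuary would laugh at them:

```python
# A minimal sketch of an actuarial-style cohort model: the population is
# divided into cohorts, and each period a defined set of rules moves some
# members from one cohort to another. All names and rates are invented.

def step(cohorts, rules):
    """Apply one period's transition rules to the cohort counts."""
    moves = {}
    for (src, dst), rate in rules.items():
        moves[(src, dst)] = cohorts[src] * rate
    for (src, dst), n in moves.items():
        cohorts[src] -= n
        cohorts[dst] += n
    return cohorts

# Hypothetical insurance-flavored cohorts and monthly transition rates.
cohorts = {"healthy": 10_000, "at_risk": 500, "claiming": 50}
rules = {
    ("healthy", "at_risk"): 0.02,   # 2% of healthy become at-risk each month
    ("at_risk", "claiming"): 0.10,  # 10% of at-risk file a claim
    ("at_risk", "healthy"): 0.30,   # 30% recover
    ("claiming", "healthy"): 0.50,  # 50% of claims resolve
}

for month in range(12):
    cohorts = step(cohorts, rules)

print({k: round(v) for k, v in cohorts.items()})
```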

Imagine that instead you’re responsible for managing a business process of some kind. A common mistake process designers make is describing processes as collections of interconnected boxes.

It’s a mistake because most business processes consist of queues, not boxes. Take a six-step process, where each step takes an hour to execute. Add the steps and the cycle time should be six hours.

Measure cycle time and it’s more likely to be six days. That’s because each item tossed into a queue has to wait its turn before anyone starts to work on it.

Think of these queues as actuarial cohorts and you stand a much better chance of accurately forecasting process cycle time and throughput — an outcome process managers presumably might find useful.

Truth in advertising: I don’t know if anyone has ever tried applying actuarial techniques to process analysis. But queue-to-queue vs box-to-box process analysis? It’s one of Lean’s most important contributions.
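For the curious, here's a back-of-the-envelope simulation of the six-step process above, treating each step as a FIFO queue. The arrival and work-time figures are invented; the point is only that waiting, not working, dominates:

```python
import random

# Six steps, each averaging one hour of actual work per item, with work
# arriving just a bit slower than each step can handle (roughly 91%
# utilization). Items wait their turn at every step.

random.seed(1)
N_STEPS, N_ITEMS = 6, 500
MEAN_WORK = 1.0          # hours of touch time per step
MEAN_INTERARRIVAL = 1.1  # hours between arriving items

t, arrivals = 0.0, []
for _ in range(N_ITEMS):
    t += random.expovariate(1 / MEAN_INTERARRIVAL)
    arrivals.append(t)

ready = arrivals[:]              # when each item reaches the next step
for _ in range(N_STEPS):
    free_at, done = 0.0, []
    for r in ready:
        start = max(r, free_at)  # wait in the queue until the step is free
        free_at = start + random.expovariate(1 / MEAN_WORK)
        done.append(free_at)
    ready = done

avg_cycle = sum(d - a for a, d in zip(arrivals, ready)) / N_ITEMS
print(f"touch time:         {N_STEPS * MEAN_WORK:.0f} hours")
print(f"average cycle time: {avg_cycle:.0f} hours")
```

Run it and the average cycle time comes out several times the six hours of touch time, which is exactly the gap queue-to-queue analysis explains.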

Simulation models are just what the name implies. They define a collection of “agents” that behave like entities in the situation being simulated. The more accurately a model estimates the numbers of each type of agent, the probability distributions of each type’s behaviors, and the outcomes of those behaviors … including the outcomes of encounters among agents … the more accurate its predictions.

For years, business strategists have talked about a company’s “business model.” These have mostly been narratives rather than true models. That is, they’ve been qualitative accounts of the buttons and levers business managers can push and pull to get the outcomes they want.

There’s no reason to think sophisticated modelers couldn’t develop equivalent simulation models to forecast the impact of different business strategies and tactics on, say, customer retention, mindshare, and walletshare.
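As a sketch of what such a model might look like, consider a toy agent-based simulation of customer retention. Every number in it is invented; the point is the shape of the thing, not the forecasts:

```python
import random

# A toy agent-based "business model": each month, every customer-agent
# independently decides to stay or churn, and a hypothetical retention
# program shifts the odds. All probabilities are invented.

random.seed(1)

def simulate(n_customers, months, churn_prob, retention_boost=0.0):
    """Return how many of n_customers remain after the given months."""
    remaining = n_customers
    for _ in range(months):
        survivors = 0
        for _ in range(remaining):
            if random.random() > (churn_prob - retention_boost):
                survivors += 1
        remaining = survivors
    return remaining

base = simulate(10_000, 12, churn_prob=0.05)
with_program = simulate(10_000, 12, churn_prob=0.05, retention_boost=0.02)
print(f"baseline customers after a year: {base}")
print(f"with retention program:          {with_program}")
```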

If one of your modeling goals is understanding how something works, simulation is just the ticket.

The third type of model, multivariate-statistical, applies such techniques as multiple regression analysis, analysis of variance, and multidimensional scaling to large datasets to determine how strongly different hypothesized input factors correlate with the outputs that matter. For COVID-19, input factors are such well-known variables as adherence to social distancing, use of masks and gloves, and not pressuring a cohabiter to join you in your kale and beet salad diet. Outputs are correlations to rates of infection and strangulation.

In business, multivariate-statistical modeling is how most analytics gets done. It’s also more or less how neural-network-based machine learning works. It works better for interpolation than extrapolation, and depends on figuring out which way the arrow of causality points when an analysis discovers a correlation.
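Here's a minimal example of the technique: an ordinary-least-squares regression of an outcome on two hypothesized input factors. The dataset is synthetic, constructed so both factors genuinely matter:

```python
import numpy as np

# A minimal multivariate-statistical sketch: regress an outcome on
# several hypothesized input factors and read off how strongly each
# correlates with it. Data are synthetic, invented for illustration.

rng = np.random.default_rng(0)
n = 500
distancing = rng.uniform(0, 1, n)  # hypothetical input factors
mask_use = rng.uniform(0, 1, n)
noise = rng.normal(0, 0.1, n)

# Synthetic "infection rate", built so both factors truly matter.
infection_rate = 0.5 - 0.3 * distancing - 0.2 * mask_use + noise

X = np.column_stack([np.ones(n), distancing, mask_use])
coef, *_ = np.linalg.lstsq(X, infection_rate, rcond=None)
print(f"intercept:  {coef[0]:+.2f}")
print(f"distancing: {coef[1]:+.2f}")
print(f"mask use:   {coef[2]:+.2f}")
```

Real analytics adds significance tests, holdout samples, and a healthy fear of confounders, but the mechanics are the same.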

As with all programming, model value depends on testing, although model testing is more about consistency and calibration than defect detection. And COVID-19 models have brought the impact of data limitations on model outputs into sharp focus.

For clarity’s sake: Models are consistent when output metrics improve and get worse in step with reality. They’re calibrated when the output metrics match real-world measurements.
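In code, the distinction looks something like this, with invented numbers: a model that tracks reality's ups and downs (consistent) while running at twice its level (uncalibrated):

```python
# Invented figures: model output moves in step with reality but at
# roughly double the level, so it is consistent but not calibrated.

reality = [100, 120, 150, 140, 180]
model_out = [210, 240, 310, 290, 370]

# Consistency: do model and reality rise and fall together?
directions_match = all(
    (m2 - m1) * (r2 - r1) > 0
    for (m1, m2), (r1, r2) in zip(zip(model_out, model_out[1:]),
                                  zip(reality, reality[1:]))
)

# Calibration: how far off are the levels themselves?
mean_abs_error = sum(abs(m - r) for m, r in zip(model_out, reality)) / len(reality)

print(f"consistent: {directions_match}")             # True: trends agree
print(f"mean absolute error: {mean_abs_error:.0f}")  # large: not calibrated
```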

With COVID-19, testers have to balance clinical and statistical needs. Clinically, testing is how physicians determine which disease they’re treating, leading to the exact opposite of random sampling. With non-random samples, testing for consistency is possible, but calibration testing is, at best, contorted.

Lacking enough testing capacity to satisfy clinical demands (which, as an ethical necessity, must come first for most of us), modelers are left to de-bias their non-random datasets, an inexact practice at best that limits their ability to calibrate their models. That different models yield different forecasts is unsurprising.
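To give a flavor of what de-biasing involves, here's a minimal post-stratification reweighting sketch, one common technique among several. The groups, sample sizes, and rates are all invented:

```python
# Post-stratification reweighting: observations from over-sampled groups
# are down-weighted to match known population shares. All numbers invented.

# Hypothetical clinical-style sample: severe cases are over-represented.
sample = {"severe": {"n": 800, "positive_rate": 0.60},
          "mild":   {"n": 200, "positive_rate": 0.10}}

# Known (assumed) population shares of each group.
population_share = {"severe": 0.05, "mild": 0.95}

# Naive estimate: just pool the sample.
total = sum(g["n"] for g in sample.values())
naive = sum(g["n"] * g["positive_rate"] for g in sample.values()) / total

# Reweighted estimate: weight each group by its population share instead.
reweighted = sum(population_share[k] * g["positive_rate"]
                 for k, g in sample.items())

print(f"naive estimate:      {naive:.2f}")       # biased high
print(f"reweighted estimate: {reweighted:.2f}")  # closer to the population
```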

And guess what: Your own data scientists face a similar challenge: Their datasets are piles of business transactions that are, by their very nature, far from random.

Exercise suitable caution.

Comments (6)

  • As far as I can tell, the University of Washington (to take one example) modeling has been very good. Given that our only effective weapon against COVID-19 is human behavior, probably for the next 2 years, the model, which assumed a criteria-based shelter-in-place strategy universally applied until “the curve was bent,” put US deaths at around 60,000 and was looking pretty accurate.
    Then, a majority of states abandoned shelter-in-place before the testing, tracking, and tracing programs were even close to being in place, and before their curves had bent down for 14 days. Thus, the projected outcome has gone up to 130,000+ deaths.
    The number of completely new kinds of inputs seems to me to put the COVID-19 business impact way outside the usual kinds of business models.
    I’m not sure it is possible to model the business impact of COVID-19 until the science of COVID-19 becomes sufficiently clear, which is not currently the case, and when that will happen can’t be predicted.
    A question: Is it possible to model dysfunction?

    • As I understand it, the U. of Washington model is a multivariate-statistical model. That would make it better when changing assumptions are within the extremes of its data – when it’s used for interpolation. It should be useful for predicting the impact of changed values of factors like social distancing and mask-wearing, but not for predicting future levels of these same factors.

      I’m not sure any other sort of model would be better. I suppose a sufficiently sophisticated simulation might be built to include factors like social-distancing-fatigue, but I’d be way out of my depth trying to evaluate them on this basis.

      As for the business impact, I’ve read about passionate speeches on the subject, but I haven’t run across anything purporting to be an economic model that predicts it.

  • In your “spare time,” it might be interesting to ask how the folks at the University of Minnesota do their modeling, as they (along with various U of C’s and Stanford) seem to be doing a good job of modeling possible COVID-19 death ranges.
    The science I was specifically thinking of has to do with modeling 1) asymptomatic infected patients who show symptoms late, and 2) asymptomatic individuals who never show symptoms though they carry the virus and infect others.
    Thanks for an interesting, though challenging, article.

  • Here in the UK, the most influential model has been that of Prof Neil Ferguson (Imperial College London). But the code has belatedly been released (after some tidying up by the helpful folks at GitHub), and it’s rather alarming for something on which (many) lives may depend: 15,000 lines of unstructured C that doesn’t even produce the same results when run on different systems (multi-threading issues). Several (damning) code reviews are online if you search for them.

    Apart from the (lack of) code quality, the huge problem is that many of the fundamental parameters – the number of existing infections, mortality rates, etc. – are unknown, even to within an order of magnitude. That’s not a criticism – this is probably the best we can do with the current state of the art.

  • > ….and not pressuring a cohabiter to join you in your kale and beet salad diet.

    Sounds like only half a story here, Bob. So sorry for you. Need a bigger kitchen?

    Otherwise, a great column, as usual. Problem is, we often forget that models (of all kinds) are not reality. Regardless of the problem or the input data or the p-values, I think we’re better off when we expect that models merely provide different kinds of insight. That way, people running a business or responding to a novel threat have informed perspectives and can make wise decisions based on different kinds of system predicates and different kinds of human behaviors. (Ironic, since in real internet life, AI is ubiquitous and influential in framing our “choices.”)

    Carry on, good sir!
