While many are still recovering from the CrowdStrike patch situation, it is worth reflecting that the outcome could have been worse.

It was fascinating to see that certain industries suffered more, including travel and healthcare. It seems that tribal affiliations or very targeted marketing (or both) gave CrowdStrike several industry-specific installation bases that were impacted significantly. Most of us have at least one story of a friend or colleague who couldn't catch a flight home on Friday, or a medical procedure that couldn't be performed.

Monoculture IT has the same problems as monocultures in agriculture, politics, and communities: vulnerabilities are amplified and their consequences are magnified. We can't buy Gros Michel ("Big Mike") bananas these days, and some of my ancestors moved here because of a potato fungus. In both cases, farmers were growing the exact same genetic sibling or clone, only to be wiped out by a highly specialized threat, at a huge cost to everybody. We live in a time when deliberately or accidentally bad software has replaced fungi as a daily threat. The analogy is shockingly close. Maybe we should stop talking about viruses and call them fungi instead?

Luckily, our diverse systems saved us from worse, and I think there are some lessons here for IT leaders.

Lesson #1- OS Diversity is a good thing.

The multitudinous flavors of Linux have more or less replaced the commercial Unix operating systems and are varied enough to keep pests and predators at bay. Windows systems without CrowdStrike were not affected at all, and neither were the remaining Big Iron systems. This isn't a "Windows is bad / Linux is good" situation. We can predict that next time the culprit will be a Linux, legacy Unix, or Apple OS.

When you evaluate the alternatives – OS diversity vs OS monoculture – monoculture makes technical architecture management easier. Diversity makes risk management easier. See Lesson #3.

My opinion: the management challenges of multiple OS platforms seem to be worth it. And to be sure we're clear: I'm advocating OS diversity, not obsolescence. Lifecycle management still matters, as we found out from one big airline that's still running some sort of patched-up version of Windows 3.1 (!)

 

Lesson #2- DevOps needs multi-environment testing and promotion.

Like many of you, we adopted a containerized, Kubernetes-based DevOps approach over the years. Bad releases make people unhappy and erode confidence, and vendor patches still need to be tested before promotion to Production, just as if they were your own development work. It was surprising to see how many companies seem to have dropped multiple levels of environments and testing, and allowed the CrowdStrike patch to be installed directly in Production. I am scratching my head about why this would happen. Was it because this was felt to be an especially trusted vendor? Was it because of the perceived risks of a Zero Day vulnerability? What did CrowdStrike communicate to them about the release? Are there companies running enterprise systems without a "Dev/Test/QA/Prod" environment architecture, based on some perceived cost savings? We often see environment architecture compromised to save money.

Maybe it was a matter of confusing "Continuous Integration / Continuous Delivery" (good) with "Continuous Integration / Continuous Deployment" (bad). Or maybe it was just plain old complacency.
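To make that distinction concrete, here is a minimal sketch, in plain Python rather than any particular pipeline tool, of what the extra gate looks like. The environment names, the test stub, and the release label are all invented for illustration; the point is simply that a vendor patch flows automatically through the lower environments and then stops for a human decision before Production.

```python
# Hypothetical sketch of the Continuous Delivery gate, not anyone's real pipeline.
# A release is promoted automatically through the lower environments, but the
# final hop to Production waits for an explicit human approval. Continuous
# Deployment would remove that last gate.

ENVIRONMENTS = ["Dev", "Test", "QA", "Prod"]

def run_smoke_tests(env: str, release: str) -> bool:
    """Stand-in for whatever automated checks actually run in each environment."""
    print(f"Running smoke tests for {release} in {env}...")
    return True  # pretend they passed

def promote(release: str) -> None:
    for env in ENVIRONMENTS:
        if env == "Prod":
            # The Continuous Delivery gate: Prod is always *ready* to receive
            # the release, but a human pulls the trigger.
            answer = input(f"Promote {release} to Prod? [y/N] ")
            if answer.strip().lower() != "y":
                print("Stopping before Production.")
                return
        if not run_smoke_tests(env, release):
            print(f"Tests failed in {env}; promotion halted.")
            return
        print(f"{release} deployed to {env}.")

promote("vendor-patch-example")
```

Whether that approval is a chat prompt, a change ticket, or a formal sign-off matters less than the fact that it exists at all.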

Regardless, IT leaders need to take this lesson to heart now.

 

Lesson #3- Making things easier often makes them riskier.

This week's column was originally going to be about AT&T and Snowflake. Data warehouse security really should be at the top of the list of considerations (note to self: fix this column). In both cases, we see how the promise of "easy" management of key systems introduced risks that were pronounced and, unfortunately, realized. This seems to be the case with CrowdStrike customers who assumed that the automatic patching system could be entrusted with a Production system. We are likely running similar risks right now with all the cloud-hosted offerings in our lives, whose inner workings the vendors keep hidden from view.

There may not be a great alternative to this risk, but awareness is a good place to start. Or, if you're big enough to have the requisite clout, ask your cloud-hosting vendors how often they undergo an ITSM audit, and ask them to share the results of the most recent one.

Just because the root cause of the CrowdStrike mess was bad patch management, that doesn't mean patch management is the only poorly implemented ITSM practice that can create vulnerabilities.

Bob says:

Now I’m not claiming to be original in what follows, but to define “artificial intelligence” we need to agree on (1) what “artificial” means; and (2) what “intelligence” means.

"Intelligence" first. The problem I see with defining it using human behavior as the benchmark is Daniel Kahneman's Thinking, Fast and Slow. Thinking fast is how humans recognize faces. Thinking slow is how humans solve x=34*17. The irony here is that thinking slow is the reliable way to make a decision, but thinking fast is what neural networks do. It's intrinsically unreliable, made worse by its reliance on equating correlation with causation.

To finish our definition of AI we need to define "artificial," something that's less obvious than you might think. "Built by humans" is a good start, but a question: is a hydroponic tomato artificial? Then there's AI's current dependence on Large Language Models. They're convincing, but to a significant extent whoever loads the large language model shapes its responses. In that sense it's really just a different way to program.

 

Greg says:

AI's defining feature (IMHO) is that we are training a piece of software to make decisions more or less the way we would make the same decisions, using probability models to do so (it is a machine, after all).

AI has different levels of intelligence and gradations of capability. A "smart" microwave oven that can learn the optimal power level for popping popcorn isn't the same thing as what a radiologist might use for automatic feature extraction in cancer detection, but both might use self-learning heuristics to get smarter. Speaking of self-learning:

Self-learning software isn't necessarily AI, and AI may not be self-learning. If you want to have a flashback to proto-AI from 40 years ago, be my guest here. This is fun, but trust me, she isn't learning anything.
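For the curious, here is a minimal sketch of what a self-learning heuristic like the microwave's could look like under the hood: a simple epsilon-greedy learner that gradually settles on whichever power level earns the best feedback. The power levels, the feedback function, and the scoring are all invented for illustration; a real appliance would be fed actual sensor data.

```python
import random

# Purely illustrative: an epsilon-greedy learner that settles on whichever
# power level earns the best feedback. Power levels, feedback, and scoring
# are all made up for this sketch.

POWER_LEVELS = [6, 7, 8, 9, 10]           # hypothetical microwave settings
scores = {p: 0.0 for p in POWER_LEVELS}   # running average "batch quality"
counts = {p: 0 for p in POWER_LEVELS}

def rate_batch(power: int) -> float:
    """Stand-in for real feedback (burnt smell, unpopped kernels, etc.)."""
    ideal = 8
    return max(0.0, 1.0 - 0.3 * abs(power - ideal)) + random.uniform(-0.1, 0.1)

def choose_power(epsilon: float = 0.2) -> int:
    """Mostly exploit the best-known setting, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(POWER_LEVELS)
    return max(POWER_LEVELS, key=lambda p: scores[p])

for _ in range(200):                       # 200 simulated batches of popcorn
    p = choose_power()
    reward = rate_batch(p)
    counts[p] += 1
    scores[p] += (reward - scores[p]) / counts[p]   # incremental mean

print("Learned best power level:", max(POWER_LEVELS, key=lambda p: scores[p]))
```

Trivial as it is, this is already more learning than our proto-AI friend above ever managed: the program's behavior changes based on the outcomes it observes.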

 

Bob says:

I think you win the prize for best AI use case with your microwave popcorn example. I could have used this once upon a time, when I trusted my microwave’s Popcorn setting. Getting rid of the over-nuked popcorn smell took at least a week.

I won’t quibble about your definition of AI. Where I will quibble is the question of whether teaching computers to make decisions as we humans do is a good idea. I mean … we already have human beings to do that. When we train a neural network to do the same thing we’re just telling the computer to “trust its gut” – a practice whose value has been debunked over and over again when humans do it.

Having computers figure out new ways to make decisions, on the other hand, would be a truly interesting feat. Maybe, if we find ways to meld AI and quantum computing, we'll make some progress on this front.

Or else I’m just being Fully Buzzword Compliant.

 

Greg says:

You hit on the big question of whether it is a good idea or not, and, to sound like Bob Lewis for a second, I think the answer is "It depends."

If we are using AI tools that know how to make human-style decisions, but faster and better, for feature extraction from fire department imagery or 911 call center dispatching, the answer is clearly "Yes!"

In these cases, we are gaining a teammate who can help us resolve ambiguity and make better decisions.

To test this, I was thinking about a disabled relative of mine, confined to a wheelchair and facing some big limits on quality of life. Used well, AI has the potential to enable this loved one to lead a much more fulfilling life by coming alongside them.

But if we are using AI in ways that encourage our inner sloth and our decline toward Idiocracy, we will couch it as "trusting our computer gut" and suffer the outcomes.

Used poorly, further enabling our collective dopamine addictions? No thanks; we have enough of that.

 

Bob says:

And so, a challenge. If I’m prone to asserting “it depends,” and AI is all about getting computers to behave the way we humans behave, what has to happen so AIs answer most questions with “it depends,” given that this is the most accurate answer to most questions?

A hint: the starting point is what's known as "explainable AI," whose purpose is to get AIs to answer the question, "Why do you think so?" That's a useful starting point, but it's far from the finish line, as "it depends" is about context, not algorithms.