While many are still recovering from the CrowdStrike patch incident, it is worth reflecting that the outcome could have been worse.

It was fascinating to see that certain industries suffered more, including travel and healthcare. It seems that tribal affiliations or very targeted marketing (or both) gave CrowdStrike several industry-specific install bases that were hit hard. Most of us have at least one story of a friend or colleague who couldn’t catch a flight home on Friday, or of a medical procedure that couldn’t be performed.

Monoculture IT has the same problems as monocultures in agriculture, politics and communities: vulnerabilities are amplified and their consequences multiply. We can’t buy “Big Mike” bananas these days, and some of my ancestors moved here because of a potato fungus. In both cases, farmers were growing the exact same genetic sibling or clone, only to be taken out by a highly specialized threat, at a huge cost to everybody. We live in a time when deliberately or accidentally bad software has replaced fungi as a daily threat. The analogy is shockingly close. Maybe we should stop talking about viruses and call them fungi instead?

Luckily, our diverse systems saved us from worse, and I think there are some lessons here for IT leaders.

Lesson #1: OS diversity is a good thing.

The multitudinous flavors of Linux have more or less replaced all the commercial Unix operating systems and are varied enough to keep pests and predators at bay. Windows systems without CrowdStrike were not affected at all, and neither were the remaining Big Iron systems. This isn’t a “Windows is bad / Linux is good” situation. We can predict that next time the culprit will be a Linux distribution, a legacy Unix, or an Apple OS.

When you evaluate the alternatives – OS diversity vs OS monoculture – monoculture makes technical architecture management easier. Diversity makes risk management easier. See Lesson #3.

My opinion: the management challenges of multiple OS platforms seem to be worth it. And to be clear: I’m advocating OS diversity, not obsolescence. Lifecycle management still matters, as one big airline that is reportedly still running some patched-up version of Windows 3.1 found out (!)
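To make that tradeoff a little more concrete, here is a minimal sketch of how you might put a rough number on monoculture risk from a fleet inventory. The `fleet` data, the `monoculture_score` function, and the use of a Herfindahl-style concentration index are my own illustrative assumptions, not anything from CrowdStrike or a specific inventory tool.

```python
from collections import Counter

# Hypothetical fleet inventory: hostname -> OS platform.
# In practice this would come from your CMDB or asset inventory.
fleet = {
    "web-01": "Windows Server 2022",
    "web-02": "Windows Server 2022",
    "edge-01": "Windows Server 2022",
    "db-01": "RHEL 9",
    "db-02": "Ubuntu 22.04",
    "batch-01": "AIX 7.3",
}

def monoculture_score(inventory: dict) -> float:
    """Herfindahl-style concentration index over OS share.

    Ranges from 1/N (evenly spread across N platforms, so a single
    OS- or agent-specific failure hits only part of the fleet) up to
    1.0 (pure monoculture, where one bad patch can take everything down).
    """
    counts = Counter(inventory.values())
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

if __name__ == "__main__":
    score = monoculture_score(fleet)
    print(f"OS concentration: {score:.2f}")  # closer to 1.0 = more monoculture risk
```

The same arithmetic applies one layer down: if every one of those platforms runs the same endpoint agent, the effective concentration is higher than the OS mix alone suggests.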


Lesson #2: DevOps needs multi-environment testing and promotion.

Like many of you, we adopted a containerized, Kubernetes-based DevOps approach over the years. Bad releases make people unhappy and erode confidence, and vendor patches still need to be tested before promotion to Production as if they were your own code. It was surprising to see so many companies that seem to have dropped multiple levels of environments and testing, and allowed the CrowdStrike patch to be installed directly on Production environments. I am scratching my head about why this happened. Was it because CrowdStrike was seen as an especially trusted vendor? Was it because of the perceived risk of a zero-day vulnerability? What did CrowdStrike communicate to them about the release? Are there companies running enterprise systems without a “Dev/Test/QA/Prod” environment architecture, based on some perceived cost savings? We often see environment architecture compromised to cut costs.

Maybe it was confusing “Continuous Integration / Continuous Delivery” (good) with “Continuous Integration / Continuous Deployment” (bad). With continuous delivery, every build is kept ready to ship, but someone or something still decides when Production gets it; with continuous deployment, it just lands there automatically. Or maybe it was just plain old complacency.
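For what that difference can look like in practice, here is a minimal sketch of a promotion flow that treats a vendor patch like your own release: automated checks through the lower environments, with an explicit approval gate before Production. The environment names and the `run_smoke_tests` and `approved_for_prod` functions are hypothetical placeholders, not any particular CI/CD product’s API.

```python
ENVIRONMENTS = ["dev", "test", "qa", "prod"]

def run_smoke_tests(env: str, patch: str) -> bool:
    """Placeholder: run whatever automated checks you trust for this environment."""
    print(f"Running smoke tests for {patch} in {env}...")
    return True  # assume the checks pass in this sketch

def approved_for_prod(patch: str) -> bool:
    """Placeholder for a human or change-board approval gate.

    This is the line between continuous *delivery* and continuous *deployment*:
    the artifact is always ready to ship, but a person decides when prod gets it.
    """
    answer = input(f"Promote {patch} to prod? [y/N] ")
    return answer.strip().lower() == "y"

def promote(patch: str) -> None:
    """Promote a patch one environment at a time; stop at the first failure."""
    for env in ENVIRONMENTS:
        if env == "prod" and not approved_for_prod(patch):
            print(f"{patch} is built and tested, but held short of Production.")
            return
        print(f"Deploying {patch} to {env}...")
        if not run_smoke_tests(env, patch):
            print(f"{patch} failed checks in {env}; promotion stopped.")
            return

if __name__ == "__main__":
    promote("vendor-patch-1.2.3")
```

The point of the gate isn’t bureaucracy; it is that touching Production remains a deliberate decision, made with whatever the lower environments just told you.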

Regardless, IT leaders need to take this lesson to heart now.


Lesson #3: Making things easier often makes them more risky.

This week’s column was originally going to be about AT&T and Snowflake. Data warehouse security really should be at the top of the list of considerations (note to self: fix this column). In both cases, we see how the promise of “easy” management of key systems introduced risks that were pronounced and, unfortunately, realized. This seems to be the case with CrowdStrike customers who assumed that the automatic patching system could be entrusted with a Production system. We are likely running similar risks with all the cloud-hosted offerings in our lives right now, many of which the vendors keep hidden from view.

There may not be a great alternative to this risk, but awareness is a good place to start. Or, if you’re big enough to have the requisite clout, ask your cloud-hosting vendors how often they undergo an ITSM audit, and ask them to share the results of the most recent one.

Just because the root cause of the CrowdStrike mess was bad patch management doesn’t mean patch management is the only poorly implemented ITSM practice that can create vulnerabilities.