While many are still recovering from the CrowdStrike patch situation, it's worth reflecting that the outcome could have been worse.

It was fascinating to see that certain industries, including travel and healthcare, suffered more than others. It seems that tribal affiliations or very targeted marketing (or both) gave CrowdStrike several industry-specific installed bases, and those industries were hit especially hard. Most of us have at least one story of a friend or colleague who couldn’t catch a flight home on Friday, or whose medical procedure couldn’t be performed.

Monoculture IT has the same problems as monocultures in agriculture, politics and communities: vulnerabilities are amplified, and so are their consequences. We can’t buy Gros Michel (“Big Mike”) bananas these days, and some of my ancestors moved here because of potato blight. In both cases, farmers were growing genetically identical clones, only to have them taken out by a highly specialized threat, at huge cost to everybody. We live in a time when malicious or simply defective software has replaced fungi as a daily threat. The analogy is shockingly close. Maybe we should stop talking about viruses and call them fungi instead?

Luckily, our diverse systems saved us from worse, and I think there are some lessons here for IT leaders.

Lesson #1- OS Diversity is a good thing.

The multitudinous flavors of Linux have more or less replaced all the commercial Unix operating systems, and they are varied enough to keep pests and predators at bay. Windows systems not running CrowdStrike weren’t affected at all. Nor were the remaining Big Iron systems. This isn’t a “Windows is bad / Linux is good” situation. Next time, the culprit may well be Linux, a legacy Unix, or an Apple OS.

When you weigh the alternatives, OS diversity vs. OS monoculture, monoculture makes technical architecture management easier, while diversity makes risk management easier. See Lesson #3.

My opinion: The management challenges of multiple OS platforms seem to be worth it. And to be clear: I’m advocating OS diversity, not obsolescence. Lifecycle management still matters, a point driven home when we found out (!) that one big airline is still using some sort of patched-up version of Windows 3.1.

Lesson #2- DevOps needs multi-environment testing and promotion.

Like many of you, we have adopted a containerized, Kubernetes-based DevOps approach over the years. But bad releases still make people unhappy and erode confidence, and vendor patches still need to be tested before promotion to Production as if they were your own development. It was surprising to see how many companies seem to have dropped their multiple environment tiers and testing, and allowed the CrowdStrike patch to be installed directly on Production. I’m scratching my head over why this happened. Was it because this was felt to be an especially trusted vendor? Was it because of the perceived risks of a Zero Day vulnerability? What did CrowdStrike communicate to them about the release? Are there companies running enterprise systems without a “Dev/Test/QA/Prod” environment architecture, based on some perceived cost savings? We often see environment architecture compromised due to cost.

Maybe someone confused “Continuous Integration / Continuous Delivery” (good) with “Continuous Integration / Continuous Deployment” (bad). Or maybe it was just plain old complacency.

Regardless, IT leaders need to take this lesson to heart now.
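To make the delivery-vs.-deployment distinction concrete, here is a minimal Python sketch of a promotion gate. It assumes a simple Dev/Test/QA/Prod ladder. The `Release` class, the `promote()` function and the `prod_approved` flag are hypothetical illustrations, not any vendor's or CI tool's API. Under continuous delivery, a release that passes its tests still waits for explicit human sign-off before it reaches Production. Defaulting `prod_approved` to `True` would turn the same pipeline into continuous deployment.

```python
from dataclasses import dataclass

# Hypothetical environment ladder, lowest to highest.
ENVIRONMENTS = ("dev", "test", "qa", "prod")


@dataclass
class Release:
    """A release (your own build or a vendor patch) and where it is deployed now."""
    name: str
    stage: int = 0  # index into ENVIRONMENTS; every release starts in dev

    @property
    def environment(self) -> str:
        return ENVIRONMENTS[self.stage]


def promote(release: Release, tests_passed: bool, prod_approved: bool = False) -> str:
    """Move a release up one environment, if its gates allow it.

    tests_passed: did the release pass testing in its current environment?
    prod_approved: explicit human sign-off, required only for the final step
    into prod (continuous delivery). Defaulting it to True would amount to
    continuous deployment.
    """
    if release.stage == len(ENVIRONMENTS) - 1:
        return f"{release.name} is already in prod"
    if not tests_passed:
        return f"{release.name} blocked: tests failed in {release.environment}"
    target = ENVIRONMENTS[release.stage + 1]
    if target == "prod" and not prod_approved:
        return f"{release.name} held in {release.environment}: prod needs sign-off"
    release.stage += 1
    return f"{release.name} promoted to {target}"


patch = Release("vendor-patch")
print(promote(patch, tests_passed=True))   # vendor-patch promoted to test
print(promote(patch, tests_passed=True))   # vendor-patch promoted to qa
print(promote(patch, tests_passed=True))   # vendor-patch held in qa: prod needs sign-off
print(promote(patch, tests_passed=True, prod_approved=True))  # vendor-patch promoted to prod
```

The design choice worth noticing is that the step into Production is a deliberate decision, not a side effect of a passing test.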

Lesson #3- Making things easier often makes them riskier.

This week’s column was originally going to be about AT&T and Snowflake. Data warehouse security really should be at the top of the list of considerations (note to self: fix this column). In both cases, we see how the promise of “Easy” management of key systems introduced risks that were pronounced and, unfortunately, realized. This seems to be the case with CrowdStrike customers who assumed the automatic patching system could be entrusted with a Production system. We are likely running similar risks with all the cloud-hosted offerings in our lives right now, risks that some cloud-hosted vendors hide from view.

There may not be a great alternative to this risk, but awareness is a good place to start. Or, if you’re big enough to have the requisite clout, ask your cloud-hosting vendors how often they undergo an ITSM audit, and ask them to share the results of the most recent one.

The root cause of the CrowdStrike mess was bad patch management, but that doesn’t mean patch management is the only poorly implemented ITSM practice that can create vulnerabilities.

Greg says:

I’ve been hearing concerns in multiple organizations from people who work remotely, whether “remote” is a branch office or home office.

The complaints? That remote colleagues are missing out on important conversations that only happen in hallways, company break rooms, or around the foosball table.

Looking through the looking glass, the situation has a managerial aspect that, perhaps surprisingly, amounts to a breakdown of the old RACI chart. If you aren’t familiar with the framework, RACI is an account of who does what on every project task: who performs the work (Responsible), who decides (Accountable), who influences (Consulted), and who cares (Informed, except when the “I” stands for “Ignored”).

Virtualizing the workforce has revealed that RACI is no longer complete, and probably never was. RACI, as it turns out, is limited to a transactional view of employee interrelationships: many project decisions are made “around the water cooler,” beyond the reach of project task assignments. To manage well, we need another “I”: “Informal.”

Bob says:

Maybe this is just a tangent, but one of the great leadership challenges virtualizing the workforce creates is that “What employees want” is only exceeded in its fogginess by “What management wants.”

As you point out, employees miss the watercooler effect and all the related socializing, informal brainstorming and so on that remote work has left behind. At the same time, they like the convenience of not having to commute to a centralized office.

Meanwhile, managers want to establish a consistent business culture (a goal that was already difficult in branch offices before Remote Work became a thing), while also keeping management/employee interactions relational rather than letting them deteriorate into a purely transactional mode.

And on top of all this, they want to keep their workforce happy with its work situation.

So fess up, Greg. You manage people. How do you handle, and encourage them to handle, the growing gap between what Zoom provides and that missing RACI entry?

Greg says:

To be honest, there doesn’t seem to be a magic bullet, at least not yet. What seems to work best so far is regular face-to-face interaction, where people get the watercooler conversations they need. When technology has been tried, such as tablet-based telepresence robots or collaborative smart boards, it generally ends up collecting dust. When Google tried to replicate the sense of being in a room and working together to solve a problem, it ultimately gave up.

I am cautiously optimistic that AI tools will help us sift through our communications and find the important nuggets of information that lead to feeling “Consulted” and “Informed.”

Bob says:

I keep wondering if some of the solution is as prosaic as the Surface Pro stylus, coupled with a decently intuitive whiteboard app, sufficient training in its use, and leadership commitment to actually using it. The goal is to replicate the chemistry of a bunch of people in a room together, surrounded by whiteboards and fresh dry-erase markers.

Or am I just engaging in optimism bias, with a generous dose of wishful thinking?