Don’t apply traditional IT portfolio governance to shadow IT.

Why? Start with two iron-clad rules of IT portfolio governance:

  • No business sponsor, no project. With shadow IT, sponsorship is automatic. If nobody in the business cared at sponsorship levels, the project wouldn’t be happening. A formal governance check is superfluous.
  • The only legitimate outcomes are “no” and “when.” There’s no maybe, and no priority scoring. With formal IT, approved projects are those added to the master schedule, to start when the necessary staff are available to work on them.

With shadow IT, the whole point is that the business area undertaking the project isn’t willing to wait.

The entire reason businesses need so-called IT governance … really, business change governance … is scarcity. In most companies there just aren’t enough IT developers to satisfy the business appetite for change and improvement.

Scarcity isn’t part of the shadow IT equation. With shadow IT, business units don’t have to fight for their slice of the IT developer pie. They own the whole pie.

Does this mean shadow IT should be governance-free?

Yes … but no.

“Governance” almost always means “a committee must review this.”

That’s the no. Just as we have to restrain our natural impulse to strangle idiots, so, in business, we should restrain the impulse to form committees.

The yes comes from what we might call the John B. Finch principle: “Your right to swing your arm leaves off where my right not to have my nose struck begins.” (Oliver Wendell Holmes usually gets the credit.)

Shadow IT projects (really, shadow business-change projects) have three noses in range:

Re-keying: Shadow IT tends to deliver “islands of automation,” unintegrated with the rest of the company’s applications and information portfolios. That often results in the need to re-key data from other systems into the shadow-IT system, and to re-key data from it back into the company’s systems of record.

The former isn’t a problem. If the department head decides re-keying into the system is just fine, it is, by definition, just fine.

But the latter? If the re-keying must be performed by employees who report elsewhere, the project might violate the Finch Principle. It doesn’t if they’re keying the data now and this just changes the source. But if one manager is shoveling new work into a different department’s inbox, it’s a Finch Principle violation.

Fortunately, there’s a committee-free solution readily available. All the manager whose nose has been put out of joint has to do is document the offense and its budget impact. The result: The Finch Principle violator has to bear the expense. The company budgeting process obviates the need for separate governance.

Data exposure: Imagine the point of a shadow-IT system is to facilitate working with an external partner. Now imagine it inadvertently exposes sensitive data beyond the walls of the corporation. It’s a clear Finch Principle violation.

Happily, there’s a committee-free solution to this too. It’s a policy, probably an existing policy: InfoSec reviews all systems that are used by or expose data to outsiders.

Data governance: A popular approach to IT integration is the “federated architecture” — an environment that looks like a unified whole even though under the covers it’s been assembled from a variety of COTS and SaaS solutions.

A major challenge for federated architectures is reconciling data definitions so that data entered into one system can flow into the corresponding fields in other systems without mishap. As those in the trenches know while those who rely on PowerPoint do not, semantic mismatch, not field-level synchronization, is the big challenge to systems integration.

Shadow IT systems will often ignore the usual controls for this because it’s an eye-glazingly boring problem. Unless, that is, you’re either responsible for the integrity of the IT architecture, or genetically predisposed to lexicography.

Even if integration happens through re-keying, semantic mismatches can pollute data in the company’s systems of record.
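To make the distinction concrete, here’s a minimal sketch of a field-level sync that succeeds while the semantics quietly fail. Everything in it is hypothetical: the two record layouts, the `customer_count` field, and the `sync_field` helper are invented for illustration and don’t come from any particular product.

```python
# Hypothetical records. Both systems have a field named "customer_count",
# but each department means something different by it.
shadow_system = {"region": "West", "customer_count": 1200}    # every contact ever entered
system_of_record = {"region": "West", "customer_count": 450}  # accounts with an active contract


def sync_field(source: dict, target: dict, field: str) -> dict:
    """Field-level synchronization: copy a value when name and type line up.

    This is the only kind of check a naive integration (or a person
    re-keying data) performs. Nothing here knows what the field means.
    """
    if field not in source or field not in target:
        raise KeyError(f"field {field!r} missing from one of the systems")
    if type(source[field]) is not type(target[field]):
        raise TypeError(f"field {field!r} has different types in the two systems")
    updated = dict(target)  # leave the original record untouched
    updated[field] = source[field]
    return updated


result = sync_field(shadow_system, system_of_record, "customer_count")
print(result)
```

Every check passes, and the system of record now reports nearly three times as many customers as it has. The field names matched. The definitions didn’t.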

The solution, to the extent there is one, doesn’t require a committee, although regrettably, many companies rely on committees to handle data governance.

Companies that want consistent data create and maintain some sort of glossary (data dictionary or encyclopedia) for that purpose. Everyone who uses data relies on it.

Which in turn probably means you’ll need to create some form of “Glossary Police” to make sure even shadow IT projects adhere to its definitions. But please … don’t make the Glossary Police a committee.

You’re better than that.

Evidence-based decision-making is superior to intuition-based decision-making. If you disagree, please feel free to build a bridge or skyscraper on footings designed by engineers who prefer gut feel to empirically tested formulas.

And then come all the caveats, because as much as KJR has been a strong proponent of evidence-based decision-making, there are plenty of ways to go about it that are far inferior to, not to mention far more expensive than, your average Magic 8 Ball®.

The most obvious (and not this week’s topic) is the popular pastime of solving for the number — of hiding intuition-based decision-making inside evidence-oriented clothing. Before big-data analytics became popular, Excel was the preferred tool for this job.

The Hadoop ecosystem includes far more sophisticated ways to reach the same foregone conclusions. Apply the right filters and shop around among the different statistical tests available to you in even the sparsest of statistical packages, and if you still can’t come up with the answer you want, you aren’t using your imagination.

But even with the best of intentions and no desire to distort, conscious or otherwise, statistical analysis holds plenty of pitfalls, even for professionals.

Take this recent correction request, filed by CNN with the Pew Research Center. As reported by the Washington Post’s Erik Wemple, a recent Pew study concluded that last January, Foxnews.com had more unique visitors than CNN.com.

CNN’s complaint: Pew’s analysis …

Uses a custom entity, [E] Foxnews.com, for Fox News against raw site-level property metrics, [S] for CNN.com. This is not an apples-to-apples comparison since a custom entity may contain a collection of other URLs that remain hidden. As it turns out, we learned from our inquiry to comScore that Fox News’ custom entity is also comprised of a variety off-site traffic assignment letters (TALs) and, as such, is not truly the audience of foxnews.com but instead is assigned traffic from other sites that is reallocated back to Fox News even though the visitor did not consume said content on foxnews.com.

I won’t comment as to whether the use of TALs is legitimate or not, on the grounds that I’m not remotely qualified to do so. If you’re interested, here’s a link for more on the topic.

Presumably, Pew’s analysts are properly qualified, but (1) might not have been aware that comScore included TALs in its Foxnews.com tallies; or (2) might have concluded that including them in web traffic statistics is legitimate.
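For anyone who wants to see how much the choice of metric can matter, here’s a small sketch. The numbers are made up, the two sites are placeholders rather than stand-ins for any real site’s traffic, and the `TrafficReport` class is hypothetical. It also assumes assigned traffic simply adds to on-site traffic, which real unique-visitor counts, with their de-duplication, won’t do so neatly.

```python
from dataclasses import dataclass


@dataclass
class TrafficReport:
    """Hypothetical monthly audience figures, in millions of visitors."""

    name: str
    on_site_visitors: int        # visitors who consumed content on the site itself
    assigned_visitors: int = 0   # traffic from other sites reallocated to this entity

    def site_level(self) -> int:
        """Raw site-level metric: on-site audience only."""
        return self.on_site_visitors

    def custom_entity(self) -> int:
        """Custom-entity metric: on-site audience plus assigned traffic.

        Simplifying assumption: the two audiences do not overlap.
        """
        return self.on_site_visitors + self.assigned_visitors


# Illustrative numbers only, not real measurements.
site_a = TrafficReport("Site A", on_site_visitors=20, assigned_visitors=8)
site_b = TrafficReport("Site B", on_site_visitors=25)

# Mixed comparison: custom entity for A against site-level for B.
mixed_winner = site_a.name if site_a.custom_entity() > site_b.site_level() else site_b.name

# Like-for-like comparison: site-level for both.
like_for_like_winner = site_a.name if site_a.site_level() > site_b.site_level() else site_b.name

print("Mixed metrics:", mixed_winner)
print("Like for like:", like_for_like_winner)
```

Same underlying data, two different winners. Neither calculation contains an arithmetic error. The only thing that changed is a definitional decision somebody made, or didn’t know they were making.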

Which gets us to your big-data repository. One of the attractions of NoSQL technologies like Hadoop is that you can pretty much dump data into them without worrying too much about how the data are organized. That’s addressed during the analysis phase, which is why another descriptor for this family of technologies is “schema on demand.”

It’s reasonably well-known that this also means a lot of the data being dumped into these “data lakes” has not been subjected to much in the way of cleansing. That’s almost the point of it: Hadoop and its brethren are adept at storing huge streams of inbound data (hence “big data”). They wouldn’t be so adept at it if some pre-processor had to cleanse it all first.

You have to pay the piper sometime, of course. In this case, it means you’ve shifted work from those who program data-loading into traditional data warehouses to those who analyze data stored in non-traditional NoSQL data lakes.
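Here’s a rough sketch of what that shift looks like in practice. What this column calls “schema on demand” is more commonly labeled “schema on read.” The sample records and the `read_with_schema` function are hypothetical, and a plain Python list stands in for the data lake. The point is only that every cleansing decision lives in the analyst’s code, not in a loading process.

```python
import json

# Raw, uncleansed lines, stored exactly as they arrived (hypothetical data).
raw_lake = [
    '{"user": "a1", "amount": "19.99", "country": "US"}',
    '{"user": "b2", "amount": 5, "country": "us"}',
    '{"user": "c3", "amount": null}',
    "not even json",
]


def read_with_schema(lines):
    """Apply a schema at analysis time and return (rows, rejected).

    Each choice below is a decision the analyst makes, knowingly or not:
    coerce amounts to float, upper-case country codes, default a missing
    country to "UNKNOWN", and reject anything that cannot be coerced.
    """
    rows, rejected = [], []
    for line in lines:
        try:
            record = json.loads(line)
            row = {
                "user": record["user"],
                "amount": float(record["amount"]),
                "country": str(record.get("country", "unknown")).upper(),
            }
        except (ValueError, TypeError, KeyError):
            # Malformed JSON, a null amount, or a missing required field.
            rejected.append(line)
            continue
        rows.append(row)
    return rows, rejected


rows, rejected = read_with_schema(raw_lake)
print(len(rows), "usable rows;", len(rejected), "rejected")
```

A different analyst might reasonably treat a null amount as zero instead of rejecting the record, and would get different totals from the same lake.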

What’s less well recognized is what Pew’s analysts either did or didn’t address with the TAL question: With traditional data warehouses, professional analysts make decisions like this as a conscious part of designing their extract, transform and load (ETL) processes.

They might miss something subtle too, of course … there never are any guarantees when those pesky human beings are involved … but at least there’s a defined task and assigned responsibility for taking care of the problem.

Not that this means it’s all taken care of: Whatever filtering decisions data warehouse analysts might consciously make while implementing the system will usually turn into hidden assumptions inherited by those who analyze the data stored there later on.

At least with schema-on-demand, analysts have to make these decisions consciously, and so are aware of them.

If, that is, they’re knowledgeable enough to be aware of the need to make them all.

Which is why, whether your analytics strategy is built on a traditional data warehouse or a schema-on-demand data lake, you need the services of a professional data scientist.

Or, as we used to call them, statisticians.