Contrary to what you might have heard, we are stuck with Data Warehouses, whether we like them or not.

Let’s not get stuck in the differences between a “Data Lake”, “Warehouse”, “Silo”, “Data Intelligence Platform” or “Kevin” (the real name of a system out there). If it (1) merges (2) scrubbed data (3) in a form that makes analysis easy, and (4) with high performance, it’s a data warehouse.

And data warehouses aren’t going away anytime soon, because the problems they solve haven’t gone away. We still need a place to store data from a number of different systems so we can represent and reuse it relatively easily for reports, business intelligence, dashboards, and the like. And even when these tools are offered “in the cloud” as some sort of SaaS solution, they are still data warehouses, smelling just as sweet as anything called a “warehouse” is going to smell.

Why do we need them?

It pretty much always comes down to the same incident: an executive in the business is struggling for the insights or information needed to make a decision, and the existing enterprise systems can’t quite provide that information at a glance, or, worse, they can, but their reports disagree.

The discussion after this incident generally sounds something like this:

  • The IT team demonstrates that they’re collecting the right data, but it isn’t in the right order, timeframe or format, and doesn’t live inside a single system.
  • The Business executive asks searching questions about why, after all the company’s investments in CRM, ERP, reporting tools and more, they still can’t easily answer simple-sounding questions.
  • Someone remembers conversations about how these systems were supposed to provide fantastic reporting and even better dashboards.
  • Frustration ensues. But eventually, blamestorming fatigue sets in and everyone involved figures out that the company needs a coherent analytics strategy, supported by IT in the form of … some form of data warehouse.
  • Ideally, this will all be accompanied by a more collaborative working relationship between business executive management and the IT organization.

Occasionally, there is resistance. Most often it’s rooted in a blame-oriented business culture – the need to spend time and money, and to incur opportunity costs, must be someone’s fault. And as a general rule, whenever there’s fault to be assigned, IT is the logical and convenient scapegoat.

Cut this off at the knees early and often. Explain that there’s no fault to be assigned. New requirements need new solutions, and new solutions aren’t free. Let’s talk about some of the questions you may get, and how to deal with them head-on.

  • This seems like a big project.

Bite the bullet early and acknowledge that data warehousing projects, whether built around highly structured “snowflake” data models or data-dump-based HDFS data lakes, are never small and simple. But they don’t have to be unmanageable. There are ways to stage implementations so delivery happens at a satisfactory cadence. The complete analytics roadmap can get big, but the tools, technologies, and practices needed to support the effort are better than ever.
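To make “snowflake” concrete: the idea is a central fact table whose dimensions are normalized out into their own linked tables. Here’s a minimal sketch using Python’s built-in sqlite3 module – the table and column names (sales_fact, dim_product, dim_category) are invented for illustration, not a recommendation.

```python
import sqlite3

# A minimal "snowflake" schema sketch: one fact table, with its
# dimensions normalized into a chain of linked lookup tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (
        category_id   INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE sales_fact (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_date  TEXT,
        amount     REAL
    );
""")

# Analysis then becomes a join across the dimension chain:
query = """
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN dim_product p  ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name
"""
print(conn.execute(query).fetchall())  # [] until the warehouse is loaded
```

The point of the sketch isn’t the SQL; it’s that each dimension table is a deliverable in its own right, which is what makes staged implementation possible.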

  • Aren’t we doing extra reporting work?

Not really; you’re just doing the right work to create effective outcomes. The CRM/HRMS/whatever systems by themselves aren’t designed to give you the right intelligence at the right time – not through any inherent deficiency of transactional management systems, but because no one system has all the data needed to support the desired analyses.

  • I don’t really want Procurement/HR/Supply Chain (read: other executives) to see my data.

Your response: “That’s a great point. As part of the roadmap we’ll definitely want to put the right mechanisms and processes in place so that only the right people in the right roles have access to the right data.”
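What “right people, right roles, right data” can look like in practice is role-based filtering at the query layer. A minimal sketch follows – the role names and column lists are hypothetical, and real implementations would lean on the warehouse’s own access controls rather than application code.

```python
# Hypothetical role-based column filtering for warehouse query results.
# Role names and visible columns are illustrative assumptions.
ROLE_VISIBLE_COLUMNS = {
    "hr_analyst":        {"employee_id", "department", "headcount"},
    "procurement_lead":  {"supplier_id", "po_amount", "delivery_date"},
    "finance_executive": {"department", "headcount", "po_amount"},
}

def filter_row(role: str, row: dict) -> dict:
    """Return only the fields this role is entitled to see."""
    visible = ROLE_VISIBLE_COLUMNS.get(role, set())
    return {k: v for k, v in row.items() if k in visible}

row = {"employee_id": 1042, "department": "Ops", "headcount": 37, "po_amount": 9100.0}
print(filter_row("hr_analyst", row))
# {'employee_id': 1042, 'department': 'Ops', 'headcount': 37}
```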

Next week: Specific tips and tricks for getting organized for success.

Often, when something new comes along, the skills you have to jettison outweigh the new ones you have to acquire.

I am, of course, writing about artificial intelligence and what IT has to do to cope with it. Are there any other topics for a Recognized Industry Pundit (RIP) to write about right now?

Sure there are, but not this week. This week’s topic is AI, and specifically the AI-driven need to rewrite the rules of IT quality assurance.

As an IT professional you’re familiar with software quality assurance (SQA) and its role in making sure the organization’s applications do what they’re supposed to do.

You’re also familiar with DQA – data quality assurance – although you might not use the acronym in everyday conversation. You should, because what seems to be missing in IT AI methodology-land is the complete rewrite the DQA handbook needs.

Do some googling (or co-piloting, or whatever) and you’ll find quite a few suggestions for using AI to improve your DQA practices. But these get things backward.

In pre-AI IT, quality (to oversimplify) comes from SQA, a search for situations in which a program doesn’t turn its inputs into the right outputs.
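In code terms, SQA boils down to checks like this minimal sketch – the discount function and its expected outputs are invented for illustration:

```python
# SQA in miniature: deterministic logic, checked input by input.
def volume_discount(order_total: float) -> float:
    """Apply a 10% discount to orders of $1,000 or more."""
    return order_total * 0.9 if order_total >= 1000 else order_total

# The test suite hunts for inputs the program turns into wrong outputs.
assert volume_discount(500) == 500    # below threshold: no discount
assert volume_discount(1000) == 900   # at threshold: discounted
assert volume_discount(2000) == 1800  # above threshold: discounted
```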

Bring generative AI into the conversation and the day-to-day need for SQA goes away. Generative AI’s neural-network-based application logic is fixed once the model is trained – neural network nodes are, to oversimplify some more, multivariate correlation engines.
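To see why, here’s a single neural-network node, oversimplified to match. The weights and inputs are made-up numbers; the point is that once training ends, the weights are frozen – the “program” never changes, so there’s no new logic for SQA to test.

```python
import math

def node(inputs: list[float], weights: list[float], bias: float) -> float:
    """One neural-network node: a weighted sum squashed by a sigmoid.
    After training, weights and bias are frozen; only the inputs vary."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical frozen weights; only the data flowing in changes.
print(node([0.2, 0.7], weights=[1.5, -0.8], bias=0.1))
```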

With generative AI it’s the data, not application logic, that drives output quality.

Trying to override this dynamic can be a cure that’s worse than the disease, as Google recently discovered to its corporate embarrassment.

When old-school DQA was in charge, biased data meant the company’s data repositories didn’t accurately reflect the underlying statistical universe.

What ran Google’s Gemini off the road was Google’s attempt to inject its own bias into Gemini’s outputs.

The problem Gemini ran afoul of was that The World isn’t what we want it to be. With Gemini, Google tried to fix what’s wrong with The World by superimposing its preferences on Gemini’s outputs.

As explained by Prabhakar Raghavan, Google’s executive in charge:

Three weeks ago, we launched a new image generation feature for the Gemini conversational app (formerly known as Bard), which included the ability to create images of people.

It’s clear that this feature missed the mark. Some of the images generated are inaccurate or even offensive. We’re grateful for users’ feedback and are sorry the feature didn’t work well.

I’m pretty sure the situation is much, much worse than Raghavan’s apology suggests, because we can expect future image, video, audio, and text generation products to be just as problematic as Gemini is.

Fixing Gemini and its generative AI brethren amounts to trying to fix The World.

Imagine you asked Gemini, as The Verge did, to “… generate an image of a 1943 German Soldier. It should be an illustration.” Programmed to avoid generating offensively biased images, Gemini produced a picture of a demographically diverse WWII-era German military workforce.

Raghavan was right that the output was offensive (or, more accurately, that it would offend some viewers). But it wasn’t Gemini that was offensive. What ended up being offensive was how Google tried to teach Gemini to respond when The World is offensive.

It could have worked, that is, if it weren’t for two thorny questions: (1) who gets to define how The World “ought to be”? and (2) if we’re going to tell the AI what the right answer is, what’s the point?

We already have AI systems where humans tell the AI the right answer. They’re called “expert systems,” and they’ve been around since the 1970s.

One way of looking at generative AI is that (oversimplifying yet again) it’s just like expert systems except we’re trying to make machines the experts. In traditional analytics, data quality is something you take care of so you can draw reliable conclusions when you analyze the data with programs you’ve subjected to software quality assurance.

Data quality isn’t what it once was. Now, it’s what you need so that the data whose quality you’re assuring properly trains your generative AI.

In generative AI, that is, the data aren’t something you process with programmed logic. In a very real sense, the data are the program logic.
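A minimal sketch of that point: identical code, trained on different data, behaves like two different programs. The classifier and datasets here are toy examples invented for illustration.

```python
# Same "program," different data, different behavior: a one-nearest-
# neighbor classifier whose only logic is the training set itself.
def predict(training_data: list[tuple[float, str]], x: float) -> str:
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

dataset_a = [(1.0, "approve"), (5.0, "reject")]
dataset_b = [(1.0, "reject"), (5.0, "approve")]

# The code is identical; swapping the data swaps the answer.
print(predict(dataset_a, 2.0))  # approve
print(predict(dataset_b, 2.0))  # reject
```

Which is why assuring the quality of the training data matters in a way it never did before: it’s the only place the “program” can go wrong.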

Bob’s last word: One more thing. The Gemini team produced its problematic results despite having Google’s resources to draw on. But AI vendors are starting to peddle the benefits of connecting your company’s internal data to the same AI technologies. It’s tempting, but if Google, with far deeper pockets than its customers have, couldn’t figure out the DQA practices it needed to stay out of trouble, how are its customers supposed to do so?

And while we’re on the subject, this week CIO.com’s CIO Survival Guide is “A CIO primer on addressing perceived AI risks.” It’s about real and perceived AI risks you probably haven’t read about anyplace else.