Data Warehouse technology is going through a bit of a renaissance, with more and better options for hosting, performance, accessibility, and data transformation and processing, at lower cost and with fewer headaches. Is it perfect? No, but at this point, the gaps are more about us humans than the technology.

Call it data warehousing’s third wave. The first wave relied on rigidly structured data warehouses, sometimes packaged into multiple smaller “data marts.” The second wave used hyperscale architectures to support “schema on demand” analytics without the first wave’s up-front detailed planning.

The third wave hasn’t arrived, but we can anticipate it: using artificial intelligence regimes to automate the schema on demand data-structuring and filtering process, greatly increasing data warehousing agility.

Assume, for the sake of argument, that you’re partway between the second and third waves. Let’s talk about some reasonable planning considerations to make adoption and the transition easier.

Governance—We are still talking about an enterprise system that everybody will be using and benefiting from. More so: your collective data warehouses have, in the aggregate, a broader array of stakeholders than anything else in your portfolios. So, you are going to be dealing with your colleagues a lot, and there may be a lot of reasonable (and unreasonable) questions and requests.

The best thing to do is to get ahead of it and come up with a governance mechanism that works for your organization.  This governance body needs to be in reasonable agreement as to the budget, scope, serviceability and needs.  Bonus points if you already have this worked out in your organization!  You are well ahead of the problem.  More bonus points if your governance solution recognizes the distinction between committees and councils. More bonus points if you have a single governance solution for each architectural grouping, as opposed to separate governance for each application, suite, and platform family.

Data ownership – One of the topics we alluded to last week is data confidentiality and user roles and rules within this shared system. All the leaders need to respect each other’s need to publish their data to the rest of the company WHEN AGREED UPON, but not before. Nothing destroys trust like using half-completed data to make assumptions about your colleagues’ work. Strict management of user roles and access is incredibly important for building and maintaining trust.

The tricky part is finding the right balance between supporting “ownership” (or, better, “stewardship”) and embracing the dysfunction of organizational silos.
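The publish-when-agreed rule and the role restrictions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Dataset` class, the `can_read` function, and the role names are all hypothetical, and a real warehouse would enforce the same idea through its own grants and views rather than application code.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical record of who stewards a dataset and who may read it."""
    name: str
    steward_role: str                      # role accountable for the data
    published: bool = False                # flipped only when the steward agrees
    reader_roles: set = field(default_factory=set)


def can_read(dataset: Dataset, user_roles: set) -> bool:
    """Return True if a user holding user_roles may query the dataset."""
    # The steward can always see their own data, finished or not.
    if dataset.steward_role in user_roles:
        return True
    # Everyone else waits for publication and needs an approved role.
    return dataset.published and bool(dataset.reader_roles & user_roles)


forecast = Dataset("q3_sales_forecast", steward_role="sales_ops",
                   reader_roles={"finance", "executive"})

print(can_read(forecast, {"sales_ops"}))    # steward: allowed
print(can_read(forecast, {"finance"}))      # not yet published: denied

forecast.published = True                   # the steward signs off
print(can_read(forecast, {"finance"}))      # now allowed
print(can_read(forecast, {"procurement"}))  # role never approved: denied
```

The design choice worth noticing is that publication is a separate switch from role membership, so half-completed data stays invisible even to colleagues who will eventually be entitled to see it.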

Data quality – We cannot build a trusted single source of the truth with garbage data coming in. We must insist that every system that contributes data provides data that are:

Clean (passes bounds checks, makes logical sense, and reflects the data in the transactional system)

Complete (the warehouse receives all of the data from the transactional system)

Documented enough (so that it can be found and used appropriately, and not outside its limits)

Statistically legitimate (tested for flaws such as autocorrelation, heteroskedasticity, and insufficient sample size)
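As a rough illustration of the first three criteria, here is a minimal Python sketch of an intake check for one incoming feed. Everything in it is an assumption made for the example: the `check_feed` function, the `QualityReport` class, the column names, and the bounds are hypothetical. The statistical tests are deliberately left out, since they depend on the analysis the data will feed.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    clean: bool
    complete: bool
    documented: bool

    @property
    def passed(self) -> bool:
        return self.clean and self.complete and self.documented


def check_feed(rows, source_row_count, bounds, descriptions):
    """Run three of the checks listed above on one incoming feed.

    rows: list of dicts loaded into the warehouse
    source_row_count: row count reported by the transactional system
    bounds: {column: (low, high)} acceptable ranges (hypothetical)
    descriptions: {column: text} documentation per column (hypothetical)
    """
    # Clean: every bounded value is present and inside its range.
    clean = all(
        row.get(col) is not None and low <= row[col] <= high
        for row in rows
        for col, (low, high) in bounds.items()
    )
    # Complete: the warehouse received as many rows as the source holds.
    complete = len(rows) == source_row_count
    # Documented: every column that arrived has a non-empty description.
    columns = {col for row in rows for col in row}
    documented = all(descriptions.get(col, "").strip() for col in columns)
    return QualityReport(clean, complete, documented)


rows = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": -5.0},    # fails the bounds check
]
report = check_feed(
    rows,
    source_row_count=3,                 # the source system holds three rows
    bounds={"amount": (0.0, 1_000_000.0)},
    descriptions={"order_id": "Order key from the ERP",
                  "amount": "Order total in USD"},
)
print(report.passed, report)
```

In this example the feed is rejected on two counts: one amount is out of bounds, and a row went missing between the source system and the warehouse.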

Metadata – We live in 2024. If you must think too much about how to manage metadata, you have the wrong platform. That doesn’t mean you’re actually managing metadata, though. That’s a process question. Please consider the need for metadata management to be your technical tip of the week.

Platform and hosting – There are great cloud options, managed by companies valued in the billions. There are also really creative solutions like Doris that are open source and can be hosted wherever you like. I think the real question is how mature your organization’s ability to support this mission-critical system is. Look yourself in the mirror for a moment and ask whether your team can support your data warehousing platforms internally, whether you have a good Enterprise Managed Services partner, or whether you feel you can work with a provider that truly manages this mission for you as a service.

Machine learning and generative AI – Opinion: while these will become your data warehouse’s most important consumers sometime in the indefinite future, they won’t be ready for prime time until Explanatory AI has matured.

Build with the end in mind – Perhaps the most important consideration is to know what kinds of decisions people (leaders as well as staff) are trying to make. If you know what kinds of decisions need to be made, you can start to offer options regarding the presentation of the data and, more importantly, the synthesized information that best helps people make those decisions.

After you know these points, you can make intelligent decisions about the schema and performance optimizations, how often the upstream systems need to update your Data Warehouse, which data elements are necessary and where they come from, and, finally, how to test the quality of the data coming in.

Working backwards (in this case) turns out to be going forwards, actually.

Contrary to what you might have heard, we are stuck with Data Warehouses, whether we like them or not.

Let’s not get stuck in the differences between a “Data Lake,” “Warehouse,” “Silo,” “Data Intelligence Platform,” or “Kevin” (the real name of a system out there). If it holds (1) merged, (2) scrubbed data, (3) in a form that makes analysis easy, and (4) with high performance, it’s a data warehouse.

And data warehouses aren’t going away anytime soon, because the problems they solve haven’t gone away. We still need a place to store data from a number of different systems that we can represent and reuse relatively easily for reports, business intelligence, dashboards and the like. And, even when these tools are offered “In the cloud”, as some sort of SaaS solution, they are still Data Warehouses, smelling just as sweet as anything called a “warehouse” is going to smell.

Why do we need them?

It pretty much always comes down to the same incident: an executive in the Business is struggling for insights or information to make a decision, and the existing enterprise systems can’t quite provide the needed information at a glance, or, worse, they can but their reports disagree.

The discussion after this incident generally sounds something like this:

  • The IT team demonstrates that they’re collecting the right data, but it isn’t in the right order, timeframe or format, and doesn’t live inside a single system.
  • The Business executive asks searching questions about why, after all the company’s investments in CRM, ERP, reporting tools and more, they still can’t easily answer simple-sounding questions.
  • Everyone remembers conversations about how these systems were supposed to provide fantastic reporting and even better dashboards.
  • Frustration ensues. But eventually, blamestorming fatigue sets in and everyone involved figures out that the company needs a coherent analytics strategy, supported by IT in the form of … some form of data warehouse.
  • Ideally, this will all be accompanied by a more collaborative working relationship between business executive management and the IT organization.

 

Occasionally, there is resistance. Most often it’s rooted in a blame-oriented business culture – the need to spend time, money, and opportunity costs must be someone’s fault. And as a general rule, whenever there’s fault to be assigned, IT is the logical and convenient scapegoat.

Cut this off at the knees early and often. Explain that there’s no fault to be assigned. New requirements need new solutions, and new solutions aren’t free.  Let’s talk about some of the questions that you may get, and how to deal with them head on.

  • This seems like a big project.

Bite the bullet early and acknowledge that data warehousing projects, whether built around highly structured “snowflake” data models or data-dump-based HDFS data lakes, are never small and simple. But they don’t have to be unmanageable. There are ways to stage implementations so delivery happens at a satisfactory cadence. The complete analytics roadmap can get big, but the tools, technologies, and practices needed to support the effort are better than ever.

  • Aren’t we doing extra reporting work?

Not really. You are just doing the right work to create effective outcomes. The CRM/HRMS/whatever systems by themselves aren’t designed to give you the right intelligence at the right time – not through any inherent deficiency of transactional management systems, but because no one system has all the data needed to support the desired analyses.

  • I don’t really want Procurement/HR/Supply Chain (read: other executives) to see my data.

Your response: “That’s a great point. As part of the roadmap we’ll definitely want to make sure the right mechanisms and processes are in place to make sure only the right people in the right roles have access to the right data.”

Next week: Specific tips and tricks for getting organized for success.