Unstructured data design – a modest proposal. Let’s try this again

I’m giving myself a Memorial Day break. I didn’t post anything yesterday, and today’s post is a re-run from, as the Beatles might have sung, ten years ago today.

It’s a bit esoteric, but even more relevant today than when I first wrote it.

Take a look and let me know what you think about it.

– Bob

# # #

If there’s one certainty in our business, it’s that useful, lightweight frameworks turn into bloated, productivity-destroying methodologies.

And so it was with considerable trepidation last week that I suggested we need another methodology, to do for Content Management Systems (CMSs — the technologies we use to manage unstructured information) what normalization and related techniques do for relational database management systems (“Unstructured data design — the missing methodology,” KJR, 5/17/2010).

But we do. As evidence, I offer many of the comments and e-mails I received suggesting we don’t: Most pointed to the existence of well-developed tools that allow us to attach metadata to “unstructured content objects” (documents, spreadsheets, presentations, digital photos, videos and such, which we might as well acronymize now and have done with it: For our purposes they’re now officially “UCOs”).

And many more pointed out that search obviates the need for categorization.

Let’s handle search first, because it’s easier: Search is Google listing 16,345,321,568 UCOs that might have what you’re looking for. It’s also what leads to (for example) searches for “globe,” “sphere,” “orb,” and “ball” yielding entirely different results.

Search is what you do when you don’t have useful categories.
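The difference can be made concrete with a tiny controlled vocabulary, a standard library-science device for steering different words toward one agreed-upon category. This is a minimal Python sketch, not something the column proposes: the term-to-category mapping and the file names in `CATALOG` are hypothetical.

```python
# Hypothetical controlled vocabulary: variant query terms map to one
# preferred category, so differently worded requests land in the same place.
PREFERRED_TERM = {
    "globe": "spheres",
    "sphere": "spheres",
    "orb": "spheres",
    "ball": "spheres",
}

# Hypothetical catalog: each preferred category lists the UCOs filed under it.
CATALOG = {
    "spheres": ["geometry-primer.pdf", "planetarium-photos.zip"],
}


def lookup(term: str) -> list[str]:
    """Resolve a user's word to its preferred category, then return that category's UCOs."""
    category = PREFERRED_TERM.get(term.lower())
    if category is None:
        return []  # unknown term: no category, so nothing to return
    return CATALOG.get(category, [])


# All four wordings resolve to the same category and the same UCOs.
for word in ("globe", "sphere", "orb", "ball"):
    print(word, lookup(word))
```

Unlike keyword search, the work here happens once, when someone decides which terms belong to which category, rather than every time someone types a query.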

And then there’s metadata — the subject that proves we don’t have what we need. Because while we have the ability to attach metadata to UCOs, we have only ad hoc methods for deciding what that metadata should be.

Some readers suggested this might be a solved problem. Books are UCOs, and librarians have been categorizing them for centuries. Between the Dewey Decimal System and the Library of Congress Classification, surely there’s a sound basis on which to build.

Maybe there is. I’m skeptical but not knowledgeable enough to state with confidence they won’t work. I’m skeptical because their primary purpose is to place books in known locations in the library so they can be readily found, which means they’re probably similar to the single folder trees we need to move beyond.

What we need, that is, is the ability to place one UCO in as many different locations as anyone might logically expect to find it. To use one of my own books as an example, Leading IT: The Toughest Job in the World would fit in at least these categories: Leadership, Information Technology, Staffing, Decision-making, Motivation, Culture change, and Communication skills.
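In data-structure terms, this is the difference between a tree, where each file has exactly one parent folder, and a many-to-many index. The following is a minimal Python sketch under that assumption; the class name `FacetedIndex` and its methods are hypothetical, and only the book title and its categories come from the example above.

```python
from collections import defaultdict


class FacetedIndex:
    """A UCO can be filed under any number of categories (many-to-many),
    unlike a folder tree, where each file lives in exactly one folder."""

    def __init__(self) -> None:
        # Two maps kept in sync so lookups work in both directions.
        self._by_category: dict[str, set[str]] = defaultdict(set)
        self._by_uco: dict[str, set[str]] = defaultdict(set)

    def file(self, uco: str, *categories: str) -> None:
        """Record the UCO under every category it belongs to."""
        for category in categories:
            self._by_category[category].add(uco)
            self._by_uco[uco].add(category)

    def in_category(self, category: str) -> set[str]:
        """Return a copy of the UCOs filed under one category."""
        return set(self._by_category.get(category, set()))

    def categories_of(self, uco: str) -> set[str]:
        """Return a copy of every category one UCO is filed under."""
        return set(self._by_uco.get(uco, set()))


book = "Leading IT: The Toughest Job in the World"
index = FacetedIndex()
index.file(
    book,
    "Leadership", "Information Technology", "Staffing",
    "Decision-making", "Motivation", "Culture change", "Communication skills",
)
print(sorted(index.categories_of(book)))
print(index.in_category("Motivation"))
```

The book is stored once but is reachable from all seven categories, which is exactly what a single folder tree cannot do without copies or shortcuts.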

The official name for the discipline of defining knowledge domains … what we’re trying to do … is ontology. It’s an active area of development, including the creation of standards (such as OWL, which puzzlingly stands for “Web Ontology Language” instead of “Ontology Web Language,” but let it pass).

From what I’ve been able to determine, though, it appears everything being developed thus far falls under the heading of tools, with a useful methodology nowhere in sight. This is, perhaps, unsurprising as it appears philosophers have been discussing the subject at least since Aristotle first introduced it 2,350 years ago or so, without yet arriving at a consensus.

It could be a while.

Of course, philosophers are obliged to develop systems so universal they apply, not only to our universe, but to all possible universes. Such is the nature of universal truth.

We don’t need to be quite so ambitious. We merely need to categorize information about our businesses. To get the ball rolling, I’ll offer up the framework we’ve synthesized at IT Catalysts. It enumerates ten topics that together completely describe any business — five internal and five external. They are:

Internal

People: The individual human beings who staff a business.
Processes: How people do their work.
Technologies: The tools people use to perform the roles they play in business processes.
Structure: Organizational structures, facilities, governance, accounting, and compensation — how the business is put together and interconnected.
Culture: The learned behavior people exhibit in response to their environment, and the shared attitudes that underlie it.

External

Products: Whatever the business sells to generate profitable revenue.
Customers: Whoever makes or influences buying decisions about the products a business sells.
Pricing: What the business charges for its products, terms and conditions of purchase, and the underlying principles that lead to them.
Marketplace: The business “ecosystem” in which the company exists, including customer groupings, competitors, partners, and suppliers.
Messages: How and what the business communicates with its marketplace.

There you go — a free gift, if you’ll forgive the redundancy. Just break these topics down into sub-topics and sub-sub-topics. The result should be a workable classification scheme.
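One way to start that breakdown is to write the framework down as a nested structure and enumerate every category path it yields. This is a minimal Python sketch, assuming a simple dictionary-of-dictionaries representation; the ten topics are the framework's own, while the few sub-topics are lifted from the definitions above purely for illustration and are not the author's proposed breakdown. The names `TAXONOMY` and `category_paths` are hypothetical.

```python
# Ten top-level topics from the framework; the sub-topics are illustrative,
# taken from the wording of the definitions, and would be extended in practice.
TAXONOMY = {
    "Internal": {
        "People": {},
        "Processes": {},
        "Technologies": {},
        "Structure": {
            "Organizational structures": {},
            "Facilities": {},
            "Governance": {},
            "Accounting": {},
            "Compensation": {},
        },
        "Culture": {},
    },
    "External": {
        "Products": {},
        "Customers": {},
        "Pricing": {"Terms and conditions": {}},
        "Marketplace": {
            "Customer groupings": {},
            "Competitors": {},
            "Partners": {},
            "Suppliers": {},
        },
        "Messages": {},
    },
}


def category_paths(tree: dict, prefix: tuple[str, ...] = ()) -> list[str]:
    """Walk the taxonomy depth-first and return every node as a slash-separated path."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append("/".join(path))
        paths.extend(category_paths(children, path))
    return paths


for p in category_paths(TAXONOMY):
    print(p)
```

The tree only defines the categories. Consistent with the argument that one UCO belongs in many places, each UCO would then be tagged with as many of these paths as apply rather than stored under just one.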

Let me know when you’re done.

joe sixpak May 26, 2020 at 1:48 pm

Bob,

We already have cross-indexing. Your idea still adds nothing new.
Maybe newer, more complex media mean we need more of it, but the idea is already in use.

Tony Kenck May 27, 2020 at 7:47 pm

I developed an OLAP database many years ago. That work led me to think about exactly this problem. Wouldn’t it be wonderful if we had n-dimensional filing systems in our directories?

Some of the fields, like date and application, would fill in automatically; others could be check boxes.

I love the idea, but at the time I could not find any such implementation.

So in addition to or instead of deep hierarchies, there might be times when people would simply want to tack on another dimension.

I’m confident that there would be a million implementation issues, but I really like the idea.
Stonebreaker May 29, 2020 at 7:59 pm

Twenty years ago, the Polar Bear Book (https://intertwingled.org/the-polar-bear-book/) was flying off the shelves, for exactly these reasons. And Peter and Lou really knew what to do with an ontology – still an incredibly rare skill.

Your instincts were evidently excellent, even though the discipline of information architecture was still in its infancy.