App Dev Methodologies – IS Survivor Publishing

Intelligence tests

Faced with a discipline that looks too much like hard work, I generally compromise by memorizing a handful of magic buzzwords and their definitions. That lets me acknowledge the discipline’s importance without having to actually learn a trade that looks like it would give me a migraine were I to pursue it.

Which gets us to testing … software quality assurance (SQA) … which I know consists of unit testing, integration testing, regression testing, user acceptance testing, and stress testing.

Although from the developer’s perspective, user acceptance testing and stress testing are one and the same thing – developers tend to find watching end-users try to use their software deeply stressful.

More to the point, I also “know” test automation is a key factor in successful SQA, even though I have no hands-on experience with it at all.

Speaking of no hands-on experience with testing stuff, the headline read, “Bombshell Stanford study finds ChatGPT and Google’s Bard answer medical questions with racist, debunked theories that harm Black patients.” (Garance Burke, Matt O’Brien and the Associated Press, October 20, 2023).

Which gets us to this week’s subject, AI testing. Short version: It’s essential. Longer version: For most IT organizations it’s a new competency, one that’s quite different from what we’re accustomed to. In particular, unlike app dev, where SQA is all about making sure the code does what it’s supposed to do, SQA for the current crop of AI technologies isn’t really SQA at all. It’s “DQA” (Data Quality Assurance) because, as the above-mentioned Stanford study documents, when AI reaches the wrong conclusion it isn’t because of bad code. It’s because the AI is being fed bad data.

In this, AI resembles human intelligence.

If you’re looking for a good place to start putting together an AI testing regime, Wipro has a nice introduction to the subject: “Testing of AI/ML-based systems,” (Sanjay Nambiar and Prashanth Davey, 2023). And no, I’m not affiliated or on commission.

Rather than continuing down the path of AI nuts and bolts, some observations:

Many industry commentators are fond of pointing out that “artificial intelligence” doesn’t really deal with intelligence, because what machines do doesn’t resemble human thinking.

Just my opinion: This is both bad logic and an incorrect statement.

The bad logic part is the contention that what AI does doesn’t resemble human thinking. The fact of the matter is that we don’t have a good enough grasp of how humans think to be so certain it isn’t what machines are doing when it looks like they’re thinking.

It’s an incorrect statement because decades ago, computers were able to do what we humans do when we think we’re thinking.

Revisit Thinking, Fast and Slow, (Daniel Kahneman, 2011). Kahneman identifies two modes of cognition, which he monosyllabically labels “fast” and “slow.”

The fast mode is the one you use when you recognize a friend’s face. You don’t expend much time and effort to think fast, which is why it’s fast. But you can’t rely on its results, something you’d find out if you tried to get your friend into a highly secure facility on the strength of you having recognized their face.

In security circles, identification and authentication are difficult to do reliably, specifically because doing them the fast way isn’t a reliable way to determine what access rights should be granted to the person trying to prove who they are.

Fast thinking, also known as “trusting your gut,” is quick but unreliable, unlike slow thinking, which is what you do when you apply evidence and logic to try to reach a correct conclusion.

One of life’s little ironies is that just about every bit of AI research and development is invested in achieving fast thinking – the kind of thinking whose results we can’t actually trust.

AI researchers aren’t focused on slow thinking – what we do when we say, “I’ve researched and thought about this a lot. Here’s what I concluded and why I reached that conclusion.” They aren’t because we already won that war. Slow thinking is the kind of artificial intelligence we achieved with expert systems in the late 1980s with their rule-based processing architectures.
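For anyone who never met one, here is a minimal sketch of the rule-based style of processing, assuming a simple forward-chaining loop. The rule set, the fact names, and the `forward_chain` function are all hypothetical illustrations, not a reconstruction of any particular 1980s product. The point it shows is the “here’s what I concluded and why” part: every conclusion comes with a trace of the rules that produced it.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived.

    Returns the full set of known facts plus a trace explaining
    each derived conclusion.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only if all its premises are known
            # and it would add something new.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                trace.append(f"{' and '.join(premises)} -> {conclusion}")
                changed = True
    return known, trace


# Hypothetical rules for the secure-facility example: recognizing
# a face is not among the accepted evidence.
RULES = [
    (("badge_valid", "pin_correct"), "identity_verified"),
    (("identity_verified", "on_access_list"), "grant_access"),
]

if __name__ == "__main__":
    known, trace = forward_chain(
        {"badge_valid", "pin_correct", "on_access_list"}, RULES
    )
    for step in trace:
        print(step)
```

Leave out `pin_correct` and the system never reaches `grant_access`, no matter how familiar the visitor looks.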

Bob’s last word: For some reason, we shallow human beings want fast thinking to win out over slow thinking, whether it’s advising someone faced with a tough decision to “trust your gut,” Obi-Wan Kenobi telling Luke to shut off his targeting computer, or some beer-sodden opinionator at your local watering hole sharing what they incorrectly term their “thinking” on a subject. When we aren’t careful we end up promulgating the wit and wisdom of Spiro Agnew. “Ah,” he once rhetorically asked, “What do the experts know?”

Bob’s bragging rights: I just learned that TABPI – the Trade Association Business Publications International – has awarded Jason Snyder, my long-suffering editor at CIO.com, and me a Silver Tabbie Award for our monthly feature, the CIO Survival Guide. Regarding the award, they say, “This blog scores highly for the consistent addressing of the readers’ challenges, backed by insightful examples and application to current events.”

Gratifying.

Speaking of which, on CIO.com’s CIO Survival Guide: “The CIO’s fatal flaw: Too much leadership, not enough management.” Its point: Compared to management, leadership is what has the mystique. But mystique isn’t what gets work out the door.

The unglamour of SQA

Before you can be strategic you have to be competent.

That’s according to Keep the Joint Running: A Manifesto for 21st Century Information Technology, (me, 2012), the source of all IT management wisdom worth wisdoming.

An unglamorous but essential ingredient of IT organizational competence is software quality assurance (SQA), the nuts-and-bolts discipline that makes sure a given application does what it’s supposed to do and doesn’t do anything else.

SQA isn’t just one practice. It’s several. It checks:

Software engineering – whether code adheres to the overall system architecture, is properly structured, and conforms to coding style standards.

Unit testing – whether a module correctly turns each possible input into the expected output.

Integration testing – whether a module interacts properly with all the other modules the team is creating.

Regression testing – whether the new modules break anything that’s already in production.

Stress testing – whether the whole system will perform well enough once everyone starts to bang on it.

User acceptance – whether the new modules are aesthetically pleasing enough; also, whether they do what the business needs them to do – do they, that is, effectively support, drive, and manage the business processes they’re supposed to support, drive, and manage.
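To make the first few of these concrete, here is a minimal sketch in Python. The two functions under test, `apply_discount` and `invoice_total`, are hypothetical stand-ins for real modules, and the checks are deliberately tiny. It illustrates the difference between testing one module in isolation and testing two modules working together, nothing more.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical module under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def invoice_total(prices, percent):
    """Hypothetical second module that depends on the first."""
    return round(sum(apply_discount(p, percent) for p in prices), 2)


def run_checks():
    """Run each check in turn and return the names of those that passed."""
    passed = []

    # Unit test: one module, one input, one expected output.
    assert apply_discount(200.0, 25) == 150.0
    passed.append("unit")

    # Unit test: the module does not do anything else,
    # such as accepting a nonsensical discount.
    try:
        apply_discount(200.0, 150)
    except ValueError:
        passed.append("unit_invalid_input")

    # Integration test: the two modules interact properly.
    assert invoice_total([100.0, 50.0], 10) == 135.0
    passed.append("integration")

    return passed


if __name__ == "__main__":
    print(run_checks())
```

Keep the same checks around and rerun them after every later change, and they become a regression test.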

Ideally, IT’s SQA function will establish and maintain automated test suites for all production applications and keep them current, to ensure efficient and correct unit, integration, regression, and stress testing.

In practice, creating and managing automated test suites is really, really hard.

This looks like a fabulous opportunity for generative AI, doesn’t it? Instead of asking it to generate a mathematical proof in the style of William Shakespeare, point your generative AI tool of choice to your library of production application code and tell it to … generate? … an automated test suite.

Generative AI, that is, could take one of the most fundamental but time-consuming and expensive aspects of IT competence and turn it into a button-push.

Brilliant!

Except for this annoying tidbit that’s been an issue since the earliest days of “big data,” generative AI’s forgotten precursor: How to perform SQA on big data analytics, let alone on generative AI’s responses to the problems assigned to it.

Way, way, way back we had data warehouses. Data warehouses started with data cleansing, so your business statisticians could rely on both the content and architecture of the data they analyzed.

But data warehouse efforts were bulky. They took too long, were anything but flexible, and frequently collapsed under their own weight, which is why big data, in the form of Hadoop and its hyperscale brethren, became popular. You just dumped your data into some data lakes, deferring data cleansing and structuring … turning that data into something analyzable … until the time came to analyze it. It was schema on demand, shifting responsibility from the IT-based data warehouse team to the company’s newly re-named statisticians, now “data scientists.”
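A minimal sketch of what deferring the cleansing looks like, assuming raw records land in the lake as JSON text and nobody imposes structure until someone reads them. The sample records, the field names, and the `read_with_schema` function are hypothetical, made up for illustration.

```python
import json

# Hypothetical raw "lake": records dumped as-is, with no cleansing on the way in.
RAW_LAKE = [
    '{"customer": "Acme", "amount": "120.50"}',
    '{"customer": "  acme ", "amount": 80}',
    '{"customer": "Globex", "amount": "n/a"}',
]


def read_with_schema(raw_records):
    """Impose structure at read time; return (clean, rejected) records."""
    clean, rejected = [], []
    for line in raw_records:
        record = json.loads(line)
        try:
            clean.append({
                # Cleansing happens here, at analysis time.
                "customer": str(record["customer"]).strip().title(),
                "amount": float(record["amount"]),
            })
        except (KeyError, ValueError, TypeError):
            # Whoever reads the data decides what counts as bad data.
            rejected.append(line)
    return clean, rejected


if __name__ == "__main__":
    clean, rejected = read_with_schema(RAW_LAKE)
    print(clean)
    print(rejected)
```

Notice who is making the cleansing decisions in this sketch: the person writing the analysis, not a separate team whose job is getting the data right.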

The missing piece: SQA.

In scientific disciplines, researchers rely on the peer review process to spot bad statistics, along with all the other flaws they might have missed.

In a business environment, responsibility for detecting even such popular and easily anticipated management practices as solving for the number has no obvious organizational home.

Which gets us to this week’s conundrum. We might call it SQA*2. Imagine you ask your friendly generative AI to automagically generate an automated test suite. It happily complies. The SQA*2 challenge? How do you test the generative AI’s automated test suite to make sure the flaws it uncovers are truly flaws, and that it doesn’t miss some flaws that are present – feed it into another generative AI?
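For what it’s worth, there is a long-standing, non-AI technique that speaks to half of this question, the “doesn’t miss some flaws” half. It’s called mutation testing: plant deliberate bugs in the code and see whether the test suite notices. What follows is a minimal sketch under stated assumptions, not an answer to the conundrum. The function under test, the two hand-written mutants, and both suites are hypothetical.

```python
def is_adult(age):
    """Hypothetical function under test."""
    return age >= 18


def mutant_boundary(age):
    """Deliberately planted bug: wrong comparison at the boundary."""
    return age > 18


def mutant_always_true(age):
    """Deliberately planted bug: ignores the input entirely."""
    return True


def weak_suite(fn):
    """A test suite that checks only one comfortable input."""
    return fn(30) is True


def stronger_suite(fn):
    """A test suite that also checks both sides of the boundary."""
    return fn(30) is True and fn(18) is True and fn(17) is False


def mutation_score(suite, original, mutants):
    """Fraction of planted bugs the suite catches (1.0 means all of them)."""
    if not suite(original):
        raise ValueError("suite must pass on the original code")
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    mutants = [mutant_boundary, mutant_always_true]
    print(mutation_score(weak_suite, is_adult, mutants))
    print(mutation_score(stronger_suite, is_adult, mutants))
```

The weak suite catches neither planted bug; the stronger one catches both. It says nothing about the other half of the question, whether the flaws a suite reports are truly flaws, and it assumes the original code is right to begin with.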

Bob’s last word: It’s easy, and gratifying, to point out all the potential gaps, defects, fallacies, and pitfalls embedded in generative-AI implementations. In the generative-AI vs human beings competition, we can rely on confirmation bias to assure ourselves that generative-AI’s numerous flaws will be thoroughly explored.

But even at the technology’s current level of development, we Homo sapiens need to consider the don’t-have-to-outrun-the-bear aspect of the situation:

Generative-AI doesn’t have to be perfect. It just has to be better at solving a problem than the best human beings are.

This week’s SQA*2 example … the automated generation of automated test suites … exemplifies the challenge we carbon-based technologies are going to increasingly face as we try to justify our existence given our silicon-based competition.

Bob’s sales pitch: You are required to read Isaac Asimov’s short story in which he predicts the rise of generative AI. Titled “Jokester,” it’s classic Asimov, and well worth your time and attention (and yes, I did say the same thing twice).

Now on CIO.com’s CIO Survival Guide: “5 IT management practices certain to kill IT productivity.” What’s it about? The headline is accurate.