The IT industry has not approached the subject of service and how to measure it with much subtlety. Quite the opposite — we simply established service level measurement as a sort of tradition, to be respected and adhered to whether or not it fits the situation very well.
Service levels originated in the telecommunications industry, where customers needed some way beyond “I hope the lines stay up” to define and manage vendor performance. From this modest beginning sprang more consulting fees than you can shake a stick at, if shaking sticks at consulting fees is your idea of time well spent.
Service levels never strayed far from their origin, though, which means the hidden assumption that providers are vendors and users are customers is not very well hidden. But as long-time readers of Keep the Joint Running know, when internal IT defines itself as a vendor to its internal customers, it stops delivering the advantages to be had from working with internal IT and replaces them with the disadvantages of working with a second-rate outsourcer (the first-rate ones being, by definition, in actual for-profit business with multiple clients).
Service levels are, if you aren’t familiar with the concept, two-part measures. The first part defines the minimum acceptable level of performance. The second tracks how often a provider meets or exceeds that performance level. It’s the reason I’ve been skeptical of their usefulness when you aren’t a vendor providing contractual protections to customers: You should be trying to improve performance while reducing variability — goals measured by entirely different formulas (for more, read “Measuring service,” Keep the Joint Running, 2/16/2004).
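If it helps to see the arithmetic, here is a minimal sketch (in Python, with a made-up response-time threshold and a made-up 95-percent target; nothing below comes from any real contract) of the two parts: the first is the threshold, the second is how often you clear it.

    # A minimal sketch of the two-part service-level measure described above.
    # The threshold and the 95% target are illustration numbers, not anything
    # drawn from this column or a real agreement.
    def service_level_attainment(measurements, minimum_acceptable):
        """Fraction of measurements that meet or beat the minimum acceptable level.

        Here "better" means lower (e.g., response time in seconds); flip the
        comparison for measures where higher is better.
        """
        met = sum(1 for m in measurements if m <= minimum_acceptable)
        return met / len(measurements)

    # Part one: the minimum acceptable level of performance.
    response_time_threshold_seconds = 2.0   # hypothetical
    # Part two: how often the provider has to meet or exceed it.
    target_attainment = 0.95                # hypothetical

    samples = [1.4, 1.9, 2.6, 1.2, 1.8, 2.1, 1.5, 1.7]   # made-up response times
    attainment = service_level_attainment(samples, response_time_threshold_seconds)
    print(f"Met the threshold {attainment:.0%} of the time; "
          f"service level {'met' if attainment >= target_attainment else 'missed'}.")

Notice that nothing in that calculation tells you whether performance is improving or becoming less variable, which is exactly the point.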
Service levels do have their place within internal IT. They are, as it turns out, just the ticket for whatever isn’t important enough to continuously improve. More precisely, they’re the right measure for every parameter of a system that isn’t an optimization target.
You can’t optimize everything. The reason is well known: There comes a point in any optimization process where further improving one system parameter, say cycle time, can only come at the expense of another system parameter, such as defect rate (unless, that is, you change the game by instituting an entirely different methodology or process design; if you do, all bets are off).
That isn’t necessarily the wrong choice to make. It should, however, be an informed choice. That’s where service levels come in. First you choose the one or two system parameters that matter most — whose performance you’ll work to continue improving. Then you establish service levels for the rest to make sure you don’t kill them in the process.
An example, to illustrate: You institute a standard software quality assurance methodology. The goal of the process is, unsurprisingly, to reduce the number of software defects that make it into production (that is, the goal is to improve quality). That’s the most important optimization parameter. And, you decide, you want the process to have a short cycle time, to the extent you can do so without jeopardizing quality.
That leaves four process parameters unspecified — fixed cost, unit cost, throughput, and excellence (which in this case means adaptability). And since they’re unspecified they’re likely to get out of hand, because …
The easiest way to improve quality is to increase the number of test cases. That, however, increases cycle time. To avoid that outcome you’d have to add staff. But that increases unit cost. Is that acceptable or not?
The answer to the question is defined by a service level — in this case, the maximum acceptable level of spending per deployed module, or some similar measure.
Or, you decide to improve quality by implementing some testing software. That lets you explore more test cases, and automate regression testing and stress testing, without adding staff.
The software isn’t cheap, though, so up goes your fixed cost. Is that okay? It depends on the service level associated with fixed costs, sometimes referred to as the capital budget.
In IT we face these trade-offs over and over again. Tune servers for performance and they fail more often. If performance is your goal you need a service level for reliability, or vice versa. Tune servers for performance while improving reliability by adding redundancy and up go your costs. That’s right — your budget is just another service level.
It’s really pretty simple. Decide what matters most and improve it. Set boundaries on everything else so it doesn’t get out of hand.
We sure do make a big fuss over it all, though, don’t we?