Where to Start? (2 of 2) Metrics & Measurements

Bullet point #1 in your executive-friendly PowerPoint about “Achieving Operational Excellence in IT” covers Process and Procedure; so how do we measure our effectiveness? I’m a big proponent of Metrics and Measurements as well – but oftentimes the biggest challenge is knowing where to start.

Measure the Unmeasured

In most organizations (especially manufacturers!), the business has plenty of Key Performance Indicators (KPIs) that tell them how much productivity they are seeing, how much money they are saving, and how they are driving out variable costs. IT metrics don’t need to focus on those things – and it’s often difficult to get the business to share their control of the message.

Better to just focus on the behavior, performance, and availability of the system. For a start, try tracking uptime/availability. What percent of the year is the system available, with no problems? To be fair, you should define actual expected hours of service; is the system really expected to be 24×7? Or is it critical that it be available and working from 4am to 9am – to get the day started, schedule the shift, print the reports, etc.? These metrics help tremendously when the system does experience a hiccup; for end users up in arms over the lack of computing services this morning, point out that the system has had 99.9% uptime (“three 9’s”) over the past year or so. Most folks understand the “five 9’s” concept, where each additional 9 of uptime costs an order of magnitude more $$.
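To make the arithmetic concrete, here is a minimal Python sketch of availability measured against defined service hours rather than the whole calendar year. The function names, the 12×5 service window, and the 3-hour outage are illustrative assumptions, not figures from any real system.

```python
# Sketch only: names and sample numbers are illustrative assumptions.

def expected_service_hours(hours_per_day: float, days_per_week: int, weeks: int = 52) -> float:
    """Hours per year the system is actually expected to be available."""
    return hours_per_day * days_per_week * weeks


def availability_pct(expected_hours: float, outage_hours: float) -> float:
    """Percent of expected service hours during which the system was up."""
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if not 0 <= outage_hours <= expected_hours:
        raise ValueError("outage_hours must fall within the expected hours")
    return 100.0 * (expected_hours - outage_hours) / expected_hours


def downtime_budget_hours(expected_hours: float, target_pct: float) -> float:
    """Outage hours allowed per year while still meeting target_pct."""
    return expected_hours * (100.0 - target_pct) / 100.0


around_the_clock = expected_service_hours(24, 7)  # 8736 hours in a 52-week year
weekday_shift = expected_service_hours(12, 5)     # 3120 hours in a 52-week year

# Each extra 9 shrinks the allowed downtime by a factor of ten.
for target in (99.0, 99.9, 99.99, 99.999):
    print(
        f"{target}% target: "
        f"{downtime_budget_hours(around_the_clock, target):.2f} h/yr at 24x7, "
        f"{downtime_budget_hours(weekday_shift, target):.2f} h/yr at 12x5"
    )

# Only outages inside the service window count against the metric.
print(f"{availability_pct(weekday_shift, 3.0):.2f}% available")
```

Counting only outages that fall inside the agreed service window keeps the number honest in both directions: a 2am patch window doesn’t hurt the score, and a 6am outage hurts it a lot.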

For example … this system is only used from 6 to 6, and never on the weekends. You didn’t budget for high-availability / clustered / failover / megaservers, did you?

[Figure: WhyHigh]

Another trick you can do with a usage report: if 30% of the reports on your server never get executed, consider taking the first set of reporting requirements for your next project, asking the user to prioritize it all – and postponing work on the bottom 30% of the reports! You will cut a ton of time off the development phase of your project, and the metrics suggest that most of the stuff you cut would never be used anyway! Note that I said we’d postpone the work – we can always go back and add critical missing reports later.
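As a rough sketch of that trick in Python: assume the report server can export a run count per report (the `run_counts` dictionary below, with made-up report names and numbers) and that the user has handed back a priority-ordered list of requested reports. Both inputs and all function names are hypothetical.

```python
# Sketch only: report names, counts, and function names are made up for illustration.

def unused_reports(run_counts: dict[str, int]) -> list[str]:
    """Reports that were never executed in the measured period."""
    return sorted(name for name, runs in run_counts.items() if runs == 0)


def unused_share(run_counts: dict[str, int]) -> float:
    """Fraction (0.0 to 1.0) of reports that were never executed."""
    if not run_counts:
        return 0.0
    return len(unused_reports(run_counts)) / len(run_counts)


def split_by_priority(ranked: list[str], postpone_fraction: float) -> tuple[list[str], list[str]]:
    """Split a priority-ordered list (most important first) into build-now and postpone."""
    postponed = round(len(ranked) * postpone_fraction)
    cut = len(ranked) - postponed
    return ranked[:cut], ranked[cut:]


# Hypothetical usage export from the existing report server.
run_counts = {"daily_shift": 412, "scrap_by_line": 37, "old_audit": 0, "legacy_kpi": 0}
share = unused_share(run_counts)
print(f"{share:.0%} of existing reports never ran: {unused_reports(run_counts)}")

# Apply the same fraction to the user's prioritized list for the next project.
requested = [f"report_{n}" for n in range(1, 11)]
build_now, postpone = split_by_priority(requested, 0.3)
print("Build now:", build_now)
print("Postpone:", postpone)
```

The postponed list is kept rather than thrown away, so anything that turns out to be critical can be added back later.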

Visibility: as Important as Readability

This framework should give you a jump start on what to measure; now you really need to focus on how you will deliver your pictures to the target eyeballs. Nice stats, but how are you going to let folks know the score? If you are fortunate enough to have a robust portal environment, and can configure plug-ins with graphs and such, your job should be easy. You’ll still have to learn to configure the plug-ins, feed them the data, and automate the updates; if you don’t, the administrivia hassle will lead to neglect.

If your portal platform doesn’t do the graph thing, or if the plug-in renders unreadable graphics (go read Tufte!), you may need to fall back on charts driven from spreadsheets. These can look great, but the mechanics of getting the finished picture on a web page can be a bit tricky. Start small: take your first baby steps with a simple uptime graph, and figure out how to publish and distribute it with minimal effort. Once you get the hang of it, you can move on to more challenging metrics / communications.

Lies, Damn Lies, and Statistics

When dealing with metrics, you need to be careful and thoughtful when drawing conclusions or postulating cause-and-effect. Consider this first picture, showing the breakdown of help desk tickets between “just-in-time training” and true break-fix issues.

RootCauseBad

One might infer that the user base has slowly but surely devolved over time. Trained employees leave the company or move on to other roles, new folks take their place. Training classes no longer exist, and little knowledge transfer takes place. The company is getting progressively dumber, and no one can stop the madness.

Well, maybe not. Let’s look at the same metrics, but presented as actual volumes …

RootCauseGood

This is a completely different picture; the marked decline is almost entirely in “break-fix” issues. Clearly, IT has been spending much of their time fixing the nagging little bugs and annoyances that lead to user problems. The number of “How-to” calls has been reasonably steady … maybe this means that IT could stop programming and start working with the business on knowledge capture and retention …
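The effect is easy to reproduce with a few lines of Python. The ticket counts below are invented purely to illustrate the point (they are not the data behind the charts): “how-to” calls hold steady while break-fix tickets fall, and the “how-to” share of the total climbs anyway.

```python
# Sketch only: these ticket counts are invented for illustration.

def how_to_share(how_to: int, break_fix: int) -> float:
    """Percent of all tickets in a period that are how-to calls."""
    total = how_to + break_fix
    return 100.0 * how_to / total if total else 0.0


# (period, how-to tickets, break-fix tickets) -- hypothetical values
tickets = [
    ("Q1", 40, 160),
    ("Q2", 40, 120),
    ("Q3", 40, 80),
    ("Q4", 40, 40),
]

for period, how_to, break_fix in tickets:
    print(
        f"{period}: how-to {how_to:3d}, break-fix {break_fix:3d}, "
        f"how-to share {how_to_share(how_to, break_fix):.1f}%"
    )
```

Same numbers, two stories: the percentage view says the users are getting worse, while the volume view says the bugs are going away. Publishing both views side by side is a cheap guard against the wrong conclusion.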
