Sunday, April 3, 2011

Root Cause Analysis Toolkit now ready!

Today I finished the Root Cause Analysis Toolkit and have placed it on the ITIL page for download. If you do buy the toolkit please provide feedback on this blog as this will be used to continually improve it. The toolkit is a collection of information from numerous RCA exercises and many people's input.

Root Cause Analysis Toolkit
The Root Cause Analysis (RCA) Toolkit contains everything needed to learn how to conduct an effective RCA. It explains what an RCA is, illustrates the benefits, defines terms, and includes templates and how-tos.

The RCA Toolkit includes:
  • RCA Overview
  • RCA checklist
  • Causal map examples
  • Event/Cause map examples
  • Fault tree analysis 
  • RCA glossary
  • RCA templates for all activities
  • RCA - Return on Investment (ROI)
  • A review of the 5 whys
  • An outline of the entire RCA process

Friday, February 4, 2011

Benjamin Franklin's 5 Why Analysis



For want of a nail, a shoe was lost.
For want of a shoe, a horse was lost.
For want of a horse, a rider was lost.
For want of a rider, an army was lost.
For want of an army, a battle was lost.
For want of a battle, the war was lost.
For want of the war, the kingdom was lost.
...and all for the want of a little horseshoe nail.

Monday, January 31, 2011

Why Service Level Agreements Fail

Service Level Agreements (SLAs) can be a great tool if done right. Most of the time, however, SLAs are written just so it can be said that such a tool exists, and the details of these poorly thought-out agreements can cause all sorts of chaos. Remember that an agreement is just that: it must benefit both parties and be constantly reviewed and updated.
Here are just some of the mistakes that are made in SLAs:
  1. Wrong Metrics: Parties enter into SLAs with a purpose in mind. This is usually to improve one or more business objectives. This includes cutting costs, improving customer service, reducing risk, increasing capacity or simply becoming more efficient. To have an SLA with real value it must have metrics that show performance of these objectives. If the metric(s) show a decline in performance then action should be taken to avoid violation of the SLA. Likewise, if performance is too high action should be taken to avoid excessive waste. Are you measuring what the SLA was designed to improve?
  2. Wrong Goals: You might have the right metric(s) but the wrong goal (or both might be wrong). A good SLA should have both metrics and goals or targets. For example, a typical ISP SLA states it will keep up-time at 99.999%, which sounds great. It is only when you read the fine print that you realize this goal applies only to their core network, which is fully redundant. It does not mean your business will have 99.999% up-time. In fact, the SLA likely has no goal for that.
  3. Poor Design: An SLA needs to be comprehensive enough to cover more than a few key metrics. It must have enough detail for both parties to be able to measure and improve key performance issues. Continuing with the ISP example, the SLA should contain not only up-time but also connection quality. While up-time is nice, the line could be saturated and dropping packets, which is not technically "down" but makes doing business extremely difficult. Look for complementary metrics that can be used to "troubleshoot" the service should it degrade.
  4. Poor Management: Creating a good SLA is hard work. Don't waste all that work by not monitoring the performance against the SLA. SLAs not only need up front investment to create but ongoing investment to monitor and improve throughout the life cycle of the contract.
  5. Low Value and Misused Penalties and Incentives: Quite often parties want an SLA because they are unhappy with existing service levels and want to punish the other party when the SLA is violated. This creates a poisonous service relationship and ultimately results in either a termination of service or legal action. Instead, SLAs should be designed as a tool to help highlight opportunities where focus can be applied to ensure service levels remain adequate. This often means an investment by the service provider and could result in an increase in cost for the customer to receive the increase in service. In the ISP example, a customer receives a free day of service when there is a loss of service. This penalty is inconsequential to the provider and is also meaningless to the business, which may have lost thousands of dollars due to the outage.
  6. Creating a Static Document: Things change. What is acceptable today won't be tomorrow. Don't write an SLA that does not consider this. As funding for a service changes, its service level will fluctuate. Both parties must work together to ensure desired changes in service levels are matched with a sufficient investment. Any agreement is a promise between two parties. You are making a promise to deliver or receive for a monetary amount. Ensure you are able to do this knowing things change.
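To make the 99.999% figure from the ISP example concrete, it helps to convert an up-time percentage into the downtime it actually permits. The following is a minimal sketch, assuming a 365-day year; the function name and the list of targets are illustrative and do not come from any particular SLA.

```python
# Minutes in a 365-day year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an up-time percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Illustrative targets only; a real SLA defines its own.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% up-time allows about {minutes:.1f} minutes of downtime per year")
```

At 99.999% the budget is a little over five minutes a year, which is why it matters so much which part of the service the target actually covers.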

Sunday, January 30, 2011

Six Sigma - What is it?

Hundreds of books have been written on this subject. It has changed organizational cultures and revolutionized the manufacturing industry. It is a source of revenue for thousands of consultants and consulting firms. It is an industry in and of itself. For those who are not familiar with Six Sigma, the following paragraphs give a brief overview.


The Six Sigma system is a way to improve processes in work and manufacturing and its main goal is to eliminate defects. The Six Sigma methodology has been widely used by many Fortune 500 corporations with amazing results and can be used in small groups to achieve goals or on a corporate level affecting tens of thousands of workers. The short definition of the Six Sigma system is a set of practices that improve efficiency and remove defects.

The Six Sigma system has been around for over 20 years and was built upon the TQM (total quality management) and Zero Defect principles. It strives to achieve high quality manufacturing and business processes by continued efforts to reduce variations.

The major methodology of Six Sigma states that in order to eliminate defects or variations, processes used in both business and manufacturing must be measured, analyzed, controlled and improved upon. In addition, Six Sigma requires a sustained commitment from a small group or an entire organization.

Six Sigma refers to a defect level of no more than 3.4 defects per million opportunities. Both its name and its practices aim at high-quality output. The Six Sigma methodology has been extremely successful throughout the business world and has helped companies save billions of dollars through enhanced productivity and a reduction of defects. The Six Sigma system was originally started by Motorola and is a trademark of the Motorola Corporation.
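The 3.4 figure is expressed in defects per million opportunities (DPMO), which is simple to compute from inspection counts. Here is a rough sketch; the function names and the sample numbers are hypothetical and are only meant to show the arithmetic.

```python
# Six Sigma target: no more than 3.4 defects per million opportunities.
SIX_SIGMA_DPMO = 3.4


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities for a batch of inspected units."""
    total_opportunities = units * opportunities_per_unit
    return defects * 1_000_000 / total_opportunities


def meets_six_sigma(defects: int, units: int, opportunities_per_unit: int) -> bool:
    """True when the observed defect rate is at or below the Six Sigma target."""
    return dpmo(defects, units, opportunities_per_unit) <= SIX_SIGMA_DPMO


# Hypothetical batch: 17 defects found in 50,000 units,
# each unit having 10 opportunities for a defect.
rate = dpmo(defects=17, units=50_000, opportunities_per_unit=10)
print(f"{rate} DPMO, meets target: {meets_six_sigma(17, 50_000, 10)}")
```

In this made-up batch the rate works out to 34 DPMO, ten times the target, even though only 17 defects were found.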

For more information on Six Sigma, Wikipedia is a good starting point.

Saturday, January 29, 2011

Understanding the Pareto Principle - The 80/20 Rule


The Pareto Principle originally referred to the observation that 80% of Italy’s wealth belonged to only 20% of the population.

Today, the Pareto Principle is the observation (not law) that most things in life are not distributed evenly. It is often summarized with statements such as the following:
  • 20% of the input creates 80% of the result
  • 20% of the employees produce 80% of the result
  • 20% of the customers create 80% of the revenue
  • 20% of the bugs cause 80% of the crashes
  • 20% of the features cause 80% of the usage
Management must always be aware of this concept to ensure effort is placed in that highest-rewarding 20% area. We must also realize, however, that the two numbers (typically 20 and 80) do not have to add up to 100, even though we often assume they do.

Most people are surprised by this but when we think about it, it becomes obvious. Twenty percent of the employees could create 10% of all results, or 50%, or 80% or 99%, or even 100%. Remember that the 80/20 rule is a rough guide about typical distributions. The numbers don’t have to be “20%” and “80%” exactly. The key point of the 80/20 rule is that most things in life (effort, reward, output) are not distributed evenly, some contribute more than others.

The Pareto Principle helps you realize that the majority of results come from a minority of inputs. With this in mind, it is important for management to know and act as follows:
  • Know: 20% of employees contribute 80% of results - Act: Reward those employees.
  • Know: 20% of bugs contribute 80% of crashes - Act: Fix these bugs first.
  • Know: 20% of customers contribute 80% of revenue - Act: Satisfy those customers.
The key point of Pareto is to realize that you can often concentrate effort on the 20% that makes a difference, instead of the 80% that doesn’t add much.
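One way to check how uneven a distribution really is, rather than assuming 80/20, is to measure the share of the total produced by the top slice of contributors. The sketch below assumes a non-empty list of non-negative values; the function name and the revenue figures are made up for illustration.

```python
def top_share(values, top_fraction=0.2):
    """Return the fraction of the total contributed by the largest items.

    `top_fraction` is the slice of contributors to look at (0.2 = top 20%).
    """
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("values must be non-empty with a positive total")
    ordered = sorted(values, reverse=True)
    # Always look at one contributor at minimum.
    top_count = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_count]) / total


# Hypothetical revenue per customer.
revenue = [500, 300, 50, 40, 30, 25, 20, 15, 12, 8]
print(f"Top 20% of customers bring in {top_share(revenue):.0%} of revenue")
print(f"Top 50% of customers bring in {top_share(revenue, 0.5):.0%} of revenue")
```

Running the same check with a different `top_fraction` shows where the return starts to flatten out, which is more useful than holding to 20% as a fixed rule.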

In economics terms, there is diminishing marginal benefit. This is related to the law of diminishing returns: each additional hour of effort returns a diminishing reward. By the end, you are spending lots of time on the minor details.

In conclusion, I use the Pareto Principle to home in on value, but I am careful not to be strict about the percentages. I may focus on 5% or 50% depending on the return on my investment.

The Simple Truth of Service

I often talk about empowerment and how that is not something anyone can give you but instead a mindset an individual must create. This video is a great example of that. Johnny the bagger looks at what he can do to increase the quality of service provided. Even though his part in the company is small and normally overlooked he finds a way to increase quality and it has a huge impact.



Friday, January 28, 2011

The 5 Why’s

One of my favorite root cause analysis techniques (or simply problem-solving techniques) is to teach people to ask, "why?" Our kids learn this trick and it drives us nuts, but it gets them a lot of information. We can learn from them and use it when we need to get to the bottom of an issue.
  • The 5 why's refers to the practice of asking, five times, why the failure has occurred in order to get to the root cause/causes of the problem.
  • It illustrates the importance of digging down beneath the most obvious cause of the problem.
  • Failure to determine the root cause assures that you will be treating the symptoms of the problem instead of its cause, in which case, the disease will return, and you will continue to have the same problems over and over again.
Notes:
  • The actual number of why's is not important as long as you get to the root cause.
  • There can be more than one cause to a problem.

Example 1
Problem Statement:
You are on your way home from work and your car stops in the middle of the road.
  1. Why did your car stop? - Because it ran out of gas.
  2. Why did it run out of gas? - Because I didn't buy any gas on my way to work.
  3. Why didn't you buy any gas this morning? - Because I didn't have any money.
  4. Why didn't you have any money? - Because I lost it all last night in a poker game.
  5. Why did you lose your money in last night's poker game? - Because I'm not very good at "bluffing" when I don't have a good hand.

Example 2
Problem Statement:
The Washington Monument is disintegrating.
  1. Why is the Monument disintegrating? - Because of the use of harsh chemicals
  2. Why are harsh chemicals being used? - To clean pigeon poop
  3. Why are there so many pigeons? - They eat spiders and there are a lot of spiders at the monument
  4. Why so many spiders? - They eat gnats and there are lots of gnats at the monument
  5. Why so many gnats? - They are attracted to the light at dusk.
Solution: Turn on the lights at a later time.
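The chain in Example 2 can also be written down as data and walked mechanically, which is handy when recording an RCA. This is only a sketch under a simplifying assumption: each effect has exactly one cause, whereas a real problem can have several. The dictionary wording and the function name are my own.

```python
# Each key is an observed effect; its value is the answer to "why?".
# Assumption: one cause per effect (real problems may branch).
causes = {
    "The monument is disintegrating": "Harsh chemicals are used",
    "Harsh chemicals are used": "Pigeon droppings must be cleaned off",
    "Pigeon droppings must be cleaned off": "There are many spiders for pigeons to eat",
    "There are many spiders for pigeons to eat": "There are many gnats for spiders to eat",
    "There are many gnats for spiders to eat": "The lights attract gnats at dusk",
}


def find_root_cause(problem, causes):
    """Follow the chain of causes until no further 'why' is recorded.

    Returns the last cause reached and the number of whys asked.
    """
    seen = set()
    current = problem
    whys = 0
    # Stop at the end of the chain, or if the chain loops back on itself.
    while current in causes and current not in seen:
        seen.add(current)
        current = causes[current]
        whys += 1
    return current, whys


root, asked = find_root_cause("The monument is disintegrating", causes)
print(f"Root cause after {asked} whys: {root}")
```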

Try asking "why" the next time you're presented with a problem to see how many whys it takes to find the solution.