Around 100 itSMF members met at Chelsea Football Club to learn about ‘Proactive Problem Management’ from a variety of industry specialists.
Firstly, a quick summary of the sessions (Football Clichés A Go-Go):
FOX IT – GENTLE STRETCHES TO WARM UP
John Griffiths from Fox IT explored the human elements of problem management, the communication channels that exist between incident capture and problem resolution and the interpretation and translation that must happen via the service desk.
SERVICENOW – OWN GOAL
In many ways this event felt a little like a ServiceNow user group, but when the SaaS vendor took centre stage to deliver some thought leadership we were given an undiluted sales pitch.
I’ve had the pleasure of meeting David D’Agostino before and know him to be clever, funny and articulate, so I had high expectations for this session. This was an opportunity missed. The itSMF need to be brutal with their editorial; in the end it’s the vendor who came off worst.
KEPNER TREGOE – THE CROWD ARE ON THE PITCH…
A great session from Steve White at Kepner Tregoe. Steve hosted an interactive whiteboard session on defining proactive problem management. For me and for the other delegates I spoke to this was the highlight of the day. More like this please itSMF! It would have been interesting to perhaps walk through some real life scenarios and discuss options with the audience using this open forum approach.
PINK ELEPHANT – A HEARTY PERFORMANCE
Unfortunately I missed parts of Vawns Guest’s session, but from what I saw and from the feedback of others, Vawns gave a passionate lesson on the relationship between incident, problem and availability management.
OASIS HEALTHCARE – END-TO-END ACTION
This was an interesting case study from Mike Evans from ITS and Rich Starkey from Oasis Healthcare. The double act provided a before and after picture of progress at Oasis Healthcare, a network of over 200 UK dental practices. It was also great to see an organization sharing business benefits and return on investment for their project.
Is Honesty The Best Policy?
An interesting point was made during one of the sessions regarding honesty with problems – i.e. do we tell the customer we’re experiencing a problem?
There were mixed views on this. Do we keep our problems to ourselves for fear of the organization using them against us, or do we openly admit that we’re human, mistakes happen and we’re doing everything we can to resolve them?
In my view, how an organization answers this question gives a good insight into its culture and maturity. I’m sure that at times there are perfectly good reasons for keeping schtum, but I think honesty is the best policy.
Whether you are running trains on time, hosting services in a datacenter or delivering fruit and vegetables, a bit of honesty from your provider strengthens the relationship and gives the impression that you are not just being fobbed off.
Overall I would definitely recommend this seminar, which had some interactive sessions with lots of questions. I look forward to attending future itSMF seminars this year (further info here).
Finally, Colin Rudd asked the audience if there was interest in rejuvenating the Problem Management SIG and the response was positive – contact itSMF to learn more.
Recently I’ve been working on Incident Management, and specifically on Major Incident planning.
During my time in IT Operations I saw teams handle Major Incidents in a number of different ways. I actually found that in some cases all process and procedure went out of the window during a Major Incident, which has a horrible irony about it. Logically it would seem that this is the time that applying more process to the situation would help, especially in the area of communications.
For example, in an organisation I worked in previously we had a run of Storage Area Network outages. The first couple caused absolute mayhem, and I could see people pushing back against the idea of breaking out the process-book because all that mattered was finding the technical fix and getting the storage back up and running.
At the end of the Incident, once we’d restored the service, we found that we had, maybe unsurprisingly, a lot of unhappy customers! Our retrospective on that Incident showed us that taking just a short time at the beginning of the outage to sort out our communications plan would have helped the users a lot.
ITIL talks about Major Incident planning in a brief but fairly helpful way:
A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. A definition of what constitutes a major incident must be agreed and ideally mapped on to the overall incident prioritization system – such that they will be dealt with through the major incident process.
So, the first thing to note is that we don’t need a separate ITIL process for handling Major Incidents. The aim of the Incident Management process is to restore service to the users of a service, and that outcome suits us fine for Major Incidents too.
The Incident model, its categories and states ( New > Work In Progress > Resolved > Closed ) all work fine, and we shouldn’t be looking to stray too far from what we already have in terms of tools and process.
What is different about a Major Incident is that both its urgency and its impact are higher than those of a normal day-to-day Incident. Typically you might also say that a Major Incident affects multiple customers.
Working with a Major Incident
When working on a Major Incident we will probably have to think about communications a lot more, as our customers will want to know what is going on and rough timings for restoration of service.
Where a normal Incident will be handled by a single person (The Incident Owner) we might find that multiple people are involved in a Major Incident – one to handle the overall co-ordination for restoring service, one to handle communications and updates and so on.
Having a named person as a point of contact for users is a helpful trick. In my experience the one thing that users hate more than losing their service is not knowing when it will be restored, or receiving confusing or conflicting information. With one person responsible for both the technical fix and user communications this is bound to happen – split those tasks.
If your ITSM suite has functionality for a news ticker, or a SocialIT feed it might be a good idea to have a central place to update customers about the Major Incident you are working on. If you run a service for the paying public you might want to jump onto Twitter to stop the Twitchfork mob discussing your latest outage without you being part of the conversation!
What is a Major Incident?
It is up to each organisation to clearly define what constitutes a Major Incident. Doing so is important, otherwise the team won’t know under what circumstances to start the process. Without clear guidance you might also find that a team treats a server outage as Major one week (with excellent communications) but not the next week (with poor communications).
Having this defined is an important step, but will vary between organisations.
Roughly speaking, a generic definition of a Major Incident could be:
An Incident affecting more than one user
An Incident affecting more than one business unit
An Incident on a device of a certain type, such as a core switch, access router or Storage Area Network
Complete loss of a service, rather than degradation
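To make the definition above a little more concrete, here is a minimal sketch of how those criteria might be written down as a check. Everything in it is an assumption for illustration: the Incident fields, the list of critical device types and the minimum_matches threshold are hypothetical names, not something taken from ITIL or from any particular ITSM tool. Whether one matching criterion is enough, or several are needed, is exactly the decision each organisation has to make for itself.

```python
from dataclasses import dataclass

# Hypothetical list of device types that an organisation treats as critical.
CRITICAL_DEVICE_TYPES = {"core switch", "access router", "storage area network"}


@dataclass
class Incident:
    """Illustrative Incident record; the field names are assumptions."""
    users_affected: int
    business_units_affected: int
    device_type: str = ""
    complete_loss: bool = False


def major_incident_reasons(incident):
    """Return the criteria from the generic definition that this Incident meets."""
    reasons = []
    if incident.users_affected > 1:
        reasons.append("more than one user affected")
    if incident.business_units_affected > 1:
        reasons.append("more than one business unit affected")
    if incident.device_type.lower() in CRITICAL_DEVICE_TYPES:
        reasons.append("critical device type")
    if incident.complete_loss:
        reasons.append("complete loss of service")
    return reasons


def is_major_incident(incident, minimum_matches=1):
    """Decide whether to start the Major Incident procedure.

    minimum_matches is an assumed tuning knob: each organisation chooses
    how many criteria must be met before the procedure is invoked.
    """
    return len(major_incident_reasons(incident)) >= minimum_matches


# Example: a complete Storage Area Network outage hitting several business units.
san_outage = Incident(users_affected=40, business_units_affected=3,
                      device_type="Storage Area Network", complete_loss=True)
# Example: one user with a degraded laptop.
single_user = Incident(users_affected=1, business_units_affected=1,
                       device_type="laptop", complete_loss=False)
```

Returning the list of reasons, rather than just a yes or no, gives the team something to quote in the first customer communication about why the procedure was started.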
Is a P1 Incident a Major Incident?
No, although I would say that every Major Incident would be a P1. An urgent Incident affecting a single user might not be a Major Incident, especially if the Incident has a documented workaround or can be fixed straight away.
Confusing P1 Incidents with Major Incidents would be a mistake. Priority is a calculation of Impact and Urgency, and the Major Incident plan needs to be reserved for the absolute maximum examples of both, and probably where the impact is over multiple users.
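The difference between a P1 and a Major Incident can be sketched as a small lookup. This is only an illustration under stated assumptions: the 3x3 matrix, the 1 to 3 scales (1 being the highest) and the function names are hypothetical, and your own prioritization scheme will almost certainly differ.

```python
# Hypothetical 3x3 matrix mapping (impact, urgency) to priority.
# On all three scales, 1 is the highest value.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}


def priority(impact, urgency):
    """Priority is looked up from Impact and Urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


def invoke_major_incident_plan(impact, urgency, users_affected):
    """Reserve the plan for maximum impact and urgency across multiple users.

    The multiple-users condition is an assumption drawn from the
    'probably where the impact is over multiple users' guidance.
    """
    return impact == 1 and urgency == 1 and users_affected > 1


# A P1 for a single user is still a P1, but it does not invoke the plan.
single_user_is_p1 = priority(1, 1) == 1
single_user_is_major = invoke_major_incident_plan(1, 1, users_affected=1)
```

With these assumed rules every Major Incident is a P1, but a P1 only becomes a Major Incident when more than one user is affected.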
Do I need a single Incident or multiple Incidents for logging a Major Incident?
This question might depend on your ITSM toolset, but my preference is to open a separate Incident for each affected user when they contact the service desk.
The reason for this is that different users will be impacted in different ways. A user heading off to a sales pitch will have different concerns to a user just about to go on holiday for 2 weeks. We might want to apply different treatment to these users (get the sales pitch user some sort of service straight away) and this becomes confusing when you work in a single Incident record.
If you have a system of Hierarchical escalation you might find that one customer would escalate the Major Incident (to their sales rep for example) where another customer isn’t too bothered because they use the affected service less frequently.
Having an Incident opened for each user/customer allows you to judge exactly the severity of the Incident. The challenge then becomes to manage those Incidents easily, and be able to communicate consistently with your customers.
Is a Major Incident a Problem?
No, although if we don’t already have a Problem record open for this Major Incident I think we should probably open one.
Remember the intended outcome of the Incident and Problem Management processes:
Incident Management: The outcome is a restoration of service for the users
Problem Management: The outcome is the identification and possibly removal of the causes of Incidents
The Major Incident procedure is started when an Incident matches our definition of a Major Incident. Its outcome is to restore service and to handle the communication with multiple affected users. That restoration of service could come from a number of different sources: the removal of the root cause, a documented Workaround, or a new Workaround that we have to find.
Although the Major Incident plan and the Problem Management process will probably work closely together, it is not true to say that a Major Incident IS a Problem.
How can I measure my Major Incident Procedure?
I have some metrics for measuring the Major Incident procedure and I’d love to know your thoughts in the comments for this article.
Number of Incidents linked to a Major Incident: Where we are creating Incidents for each customer affected by a Major Incident, we should be able to measure the relative impact of each occurrence.
The number of Major Incidents: We’d like to know how often we invoke the Major Incident plan.
Mean Time Between Major Incidents: How much time elapses between Major Incidents being logged. This would be interesting in an organisation with service delivery issues, which would hope to see Major Incidents happen less frequently.
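As a rough sketch of how these three metrics might be calculated, here is a short example. The record layout, the identifiers and the dates are all made-up sample data for illustration; in practice the figures would come from whatever reporting your ITSM tool provides.

```python
from datetime import datetime, timedelta

# Made-up sample data: each Major Incident with the time it was logged
# and the number of per-customer Incidents linked to it.
major_incidents = [
    {"id": "MI-1", "logged_at": datetime(2013, 1, 7, 9, 0), "linked_incidents": 42},
    {"id": "MI-2", "logged_at": datetime(2013, 1, 17, 9, 0), "linked_incidents": 5},
    {"id": "MI-3", "logged_at": datetime(2013, 2, 6, 9, 0), "linked_incidents": 17},
]


def linked_incident_counts(records):
    """Number of Incidents linked to each Major Incident (relative impact)."""
    return {record["id"]: record["linked_incidents"] for record in records}


def number_of_major_incidents(records):
    """How often the Major Incident plan was invoked."""
    return len(records)


def mean_time_between(records):
    """Average gap between consecutive Major Incidents being logged.

    Returns None when there are fewer than two Major Incidents,
    because no gap can be measured.
    """
    logged = sorted(record["logged_at"] for record in records)
    if len(logged) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(logged, logged[1:])]
    return sum(gaps, timedelta()) / len(gaps)


mtbmi = mean_time_between(major_incidents)
print("Major Incidents:", number_of_major_incidents(major_incidents))
print("Linked Incidents:", linked_incident_counts(major_incidents))
print("Mean Time Between Major Incidents:", mtbmi)
```

In this sample the gaps are 10 and 20 days, so the mean is 15 days. A rising figure over time is what an organisation with service delivery issues would hope to see.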
There you go. In summary, handling Major Incidents isn’t a huge leap from the method that you use to handle day-to-day Incidents. It requires enhanced communication and possibly measurement.
This is a quick review of Rob England’s book ‘Basic Service Management’.
You can find out more about Rob’s book and the TIPU method here: www.basicsm.com. If you want to share your own review please add a comment below.
In my opinion this is a well written introduction to service management.
This book might have also been called:
‘Service Management in a nutshell’
‘An introduction to Service Management’
‘Service Management for Business Owners’
‘The book on Service Management that you buy for your boss’ or
‘How to introduce someone to service management without scaring the bejesus out of them by banging on about ITIL or other IT geekery’
I read this in one sitting and I’m not a fast reader. It is quick, accessible and thought-provoking.
It is not an ITSM or IT book per se; in fact, I think the best recipient of this book is a non-IT business owner or service owner who wants to appreciate the benefits of service management.
As an ITSM professional, this is the sort of book you need to send to those you wish to educate and influence about your chosen profession. Or as one Amazon reviewer put it: “I recommend reading it before you get lost in ITIL”. This would also be useful to an entrepreneur looking to start or scale their business.
Why Service Management?
“If you are reading this book, you probably don’t manage your services so much. That gives you an opportunity to increase revenues and profitability: improving your service brings increased efficiency and effectiveness. That means increased returns for much less investment than from improving your products or equipment”.
Rob is a great wordsmith and well respected in the ITSM industry. My only criticism of this book is that I wish he had used the power of metaphor, storytelling or examples to describe his seven practice areas. The second half of the book tends to slide into a glossary of his basic service management terms and bullet points. I thought this might have been a perfect opportunity for Rob to use some examples in order to reinforce his message and walk the reader through his ‘Seven Areas’ rather than explaining principles in purely theoretical terms.
In the ‘How to Use this Book’ section Rob urges the reader to “Read it, It is short”. In a similar fashion my advice to you as an ITSM professional is, “Buy it, it is good”.
Have you read Rob’s book? Please share your opinion in the comments below.