Thursday 14 October 2010

Keeping customers in the loop in an IT meltdown

IT critical incident communication by the University of Wisconsin Milwaukee

In April 2009 they had just drafted their communications processes when those processes got a good test: they lost their email and calendaring for 4 days.

The IT alert systems project had asked customers: in an IT incident, what do you need to know, when do you need to know it, and how do you want to be told? They discovered that campus wanted to know about an incident within 10 mins of it happening, and wanted to be told via the university home page. That was problematic because the home page is primarily for marketing. So, a compromise: a small icon in the bottom right of the home page which links to the IT status page. It's a small red triangle with an exclamation mark.

First step was to define roles and responsibilities. Not staff, but roles. Developed a repeatable process that can be applied to every incident. Drafted documentation for all staff to see. Then trained the technical staff who can post to the status page in when to do it, what to include and how to write it. Then had sessions for all staff in the IT department to help them understand internal processes in an incident. Had to be all staff because they represent the department, eg when they walk across campus people will ask them what's going on.

Helpdesk are key players. They are the first point of contact: they collect information from customers, assess it and determine whether they have to escalate. They are also key points of contact during an incident.

Incident coordinator only exists if there's an incident. First point of escalation and can post to the status web page. Can also contact the supervisor on call and call in technical staff.

Incident communications coordinator. Deals with incident comms internally and is the only person allowed to talk to technical staff. Also lets senior management know about the incident and deals with comms to IT support people in depts.

IT strategic communications manager is brought in if incident is serious enough. Deals with comms to customers, eg FAQs, scripts for helpdesk, web pages, broadcast emails.
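The four roles above can be written down as a small lookup, which is one way to keep "roles, not staff" explicit. This is only a sketch based on these notes, not the university's actual process: the names `Role`, `RESPONSIBILITIES` and `roles_involved` are made up for illustration, and it assumes the communications coordinator, like the incident coordinator, is only active once an incident has been declared.

```python
from enum import Enum
from typing import List


class Role(Enum):
    HELPDESK = "Helpdesk"
    INCIDENT_COORDINATOR = "Incident coordinator"
    COMMS_COORDINATOR = "Incident communications coordinator"
    STRATEGIC_COMMS_MANAGER = "IT strategic communications manager"


# Responsibilities per role, paraphrased from the notes above.
RESPONSIBILITIES = {
    Role.HELPDESK: [
        "collect information from customers",
        "assess and escalate if needed",
        "act as point of contact during an incident",
    ],
    Role.INCIDENT_COORDINATOR: [
        "first point of escalation",
        "post to status web page",
        "contact supervisor on call",
        "call in technical staff",
    ],
    Role.COMMS_COORDINATOR: [
        "handle internal incident comms",
        "be the only person talking to technical staff",
        "inform senior management",
        "inform IT support people in depts",
    ],
    Role.STRATEGIC_COMMS_MANAGER: [
        "customer comms: FAQs, helpdesk scripts, web pages, broadcast emails",
    ],
}


def roles_involved(is_incident: bool, is_serious: bool) -> List[Role]:
    """Return the roles active for a given situation.

    Assumption: helpdesk is always active; the two coordinators exist only
    during an incident; the strategic comms manager joins only if the
    incident is serious enough (a judgement call, not a strict definition).
    """
    roles = [Role.HELPDESK]
    if is_incident:
        roles += [Role.INCIDENT_COORDINATOR, Role.COMMS_COORDINATOR]
        if is_serious:
            roles.append(Role.STRATEGIC_COMMS_MANAGER)
    return roles


if __name__ == "__main__":
    for role in roles_involved(is_incident=True, is_serious=True):
        print(role.value, "->", "; ".join(RESPONSIBILITIES[role]))
```

Whether an incident counts as serious is left as a plain boolean on purpose, since that decision is made by people using common sense rather than by a rule.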


Lessons learned:

Process is critical for coordinated and consistent comms. That way everyone knows what their role is and steps aren't forgotten.

Be open to feedback

All staff have to understand their roles and understand the process ahead of time.

Have dedicated staff for strategic comms. Frees up operational people.

Managing campus expectations contributes to success. Be consistent. Then they know what to expect. Always be truthful. If it's a serious problem tell them, and if you don't know the fix time tell them that too. All you have is your integrity.

Update update update. No news is news. Update status web page regularly or they will think you've forgotten.

Upstreaming communications is vital. Tell the senior management as soon as possible. Tell external comms people in case media get hold of it.

Don't over communicate. Tailor the message to the audience. For example, local IT professionals need more technical info than campus in general. Senior management will want info on scope and impact.
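As a rough illustration of tailoring one set of incident facts to different audiences, here is a minimal sketch. Everything in it is an assumption made for the example: the `IncidentFacts` fields, the audience labels and the wording of the messages are invented, and the sample values are placeholders rather than details from the real outage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentFacts:
    service: str
    scope: str
    impact: str
    technical_detail: str
    estimated_fix: Optional[str] = None  # None means the fix time is unknown


def message_for(audience: str, facts: IncidentFacts) -> str:
    """Build a message containing only what the given audience needs."""
    # Be truthful: if the fix time isn't known, say so.
    fix = facts.estimated_fix or "not yet known"
    base = f"{facts.service} is currently unavailable. Estimated fix time: {fix}."
    if audience == "local_it":
        # Local IT professionals get the extra technical info.
        return f"{base} Technical detail: {facts.technical_detail}"
    if audience == "senior_management":
        # Senior management want scope and impact.
        return f"{base} Scope: {facts.scope}. Impact: {facts.impact}."
    # Campus in general gets the short version.
    return base


# Placeholder values for illustration only.
example = IncidentFacts(
    service="Email and calendaring",
    scope="all campus users",
    impact="no email or calendar access",
    technical_detail="(placeholder) mail store offline",
)

if __name__ == "__main__":
    for audience in ("campus", "local_it", "senior_management"):
        print(f"[{audience}] {message_for(audience, example)}")
```

All three messages come from the same facts, which keeps them consistent with each other while varying the level of detail.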

Post event evaluation is critical for continued improvement.

After the 4 day outage in April 2009, the post-event evaluation showed that customers were not happy about the outage but were happy with the comms. Used 10 different comms media, including web page, Twitter and voicemail on the helpdesk phone, and sent 67 different messages.

Use common sense to determine level of incident including assessing impact and numbers affected. Don't rely on strict definitions.
