Monday, 12 November 2007

Waves of Power


Incidents have a habit of happening when I’m not around – even when Sheffield flooded back in June, I was away on a course in the Lake District, in beautiful sunshine. So it shouldn’t have surprised me too much when I received a call on my mobile which started with those immortal words – “I think we’ve got a problem” – just as I had arrived in Scarborough for a long weekend!

On Thursday evening at 1720, a fire in a local substation caused a power outage to large parts of the University – including our main data centre. We’ve just spent a lot of money installing a diesel-powered generator to protect us against things like this, but it isn’t fully commissioned yet as we need to install a new UPS. We knew we had a two-week window in which, although the generator would cut in and provide power within seconds of a power loss, we didn’t have those few minutes of power from a UPS to allow that to happen. I mean, what were the chances of a power cut occurring in those two weeks? As it happened, the power was only off for about 20 seconds before the generator kicked in, but 20 seconds is no different to 20 minutes or 20 hours to a computer. You know what a state your home PC is in if you just pull the plug out without shutting it down properly? Now imagine a room full of network and communications equipment, and about 70 servers, fallen over in a heap!

I’m lucky in that we have a team of excellent staff who worked until about 2330 to bring services back – a highly complex operation because so many processes depend on one another, often in unknown and unforeseen ways. Many systems also need to be started in a particular order. Mains power came back on at 0600, and the generator did as it should and shut down, allowing the mains to take over.

For the rest of the day many areas had to remain closed, as we were operating on reduced power, and there were a number of other building and security issues to be resolved. Work continued to make sure our systems were working correctly, and in the afternoon it was agreed to switch the data centre back over to the generator to fix some high voltage problems with the substation. Unfortunately, this didn’t work as expected and we lost power again – only for 10 seconds this time (I’m told that it was as long as it took the engineer to realise what had happened, utter an expletive, and flick the switch again) – and completely self-induced! Cue all systems being restarted again – slightly quicker this time after Thursday night’s rehearsal.

As I type this, everything appears to be back to normal, and our Business Continuity and Incident Plan has had a good test. Obviously nothing went as well as it could have done, and there are many lessons to learn, but I would like to put on record my thanks to all of the staff involved over the period who worked really hard.

Oh, and while I was keeping in touch on my mobile with what was going on here, I was watching the highest tide Scarborough has had in 50 years, with some fantastic waves. But the donkeys were back on the beach the next day!
