Friday 13 May 2011

Eduserv Symposium, Above the Clouds

Armando Fox, from UC Berkeley, gave the final keynote at the Eduserv Symposium, entitled Above the Clouds, A View from Academia.

The report, Above the Clouds, on which the talk was based is available here.

The talk centred on how his lab had used the cloud over the past few years, and some of the issues associated with that. It was a fast-paced talk, full of good information, and I'm just going to post a few key points. The video of the whole thing will be up on the Eduserv web site soon.

One of the things that had interested him and his lab was whether they could demonstrate that, using cloud-based services, an entrepreneur could prototype a great web app over a long weekend and then deploy it at scale. eBay had supposedly been developed over this time scale, but had had to be re-architected many times since to cope with scale problems.

They had moved their services to Amazon's EC2 in 2008, and since then have spent $350,000 on Amazon Web Services. That's about 1/3 of a PhD student a month. It's allowed them to carry out many experiments (100 to 300 nodes most common, 900 max), have large-scale storage and carry out cloud programming.
They have done work that they could not have done without cloud services, and it has acted as a research accelerator, at a lower cost.
It has given students an experience they would not otherwise have been able to have. Administering, provisioning, sizing and delivering courses have been much easier on the public cloud than using UC instructional computing.

In terms of costs, capital, hardware, networking and power are 5 to 7 times cheaper at 100k scale, i.e. when you have data centres that have at least a hundred thousand servers in them.
Cloud operations are heavily automated, with 1000s of machines looked after by 1 FTE admin.
This scale makes availability affordable, with wide-area disaster recovery facilities.

It's hard to compete on cost with cloud providers, and that's even with their margins, which are estimated to be big! However, more competition may bring costs down.

Cloud allows you to smooth out peaks and troughs. Not waiting in a queue accelerates research. You can run several experiments simultaneously each using 100s of machines for 1 to 2 hours, without queuing up.
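A rough sketch of why this works out, assuming simple pay-per-machine-hour pricing (the price used here is a made-up placeholder, not a quoted AWS rate, and the function name is hypothetical): under that assumption, running many machines for a short time costs the same as running one machine for a long time, so the only thing that changes is how long you wait for the answer.

```python
def experiment_cost(machines: int, hours: float, price_per_machine_hour: float) -> float:
    """Cost of an experiment under flat per-machine-hour pricing (an assumption)."""
    machine_hours = machines * hours
    return machine_hours * price_per_machine_hour


# Hypothetical price, purely for illustration.
PRICE = 0.10

# 100 machines for 2 hours versus 1 machine for 200 hours:
wide = experiment_cost(100, 2, PRICE)
narrow = experiment_cost(1, 200, PRICE)

# Same machine-hours, same bill; the wide run simply finishes 100x sooner.
print(wide == narrow)
```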

On the other hand, a lot of data is generated. For example, the LHC generates 60TB per day. All of this data needs moving. In the US, long-haul networking is the most expensive cloud resource. UC Berkeley have found that it's easier, cheaper and quicker to ship the drives to Amazon in the post. In the UK, we are lucky to have JANET, but we need to combine this with cloud providers, i.e. get them direct links to JANET.
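To get a feel for why posting drives can win, here is a back-of-the-envelope sketch. The link speeds and the two-day shipping time are my own illustrative assumptions, not figures from the talk; only the 60TB comes from the LHC example above.

```python
def transfer_days(data_tb: float, link_mbps: float) -> float:
    """Days needed to move data_tb terabytes over a link sustaining link_mbps megabits/s."""
    bits = data_tb * 1e12 * 8              # decimal terabytes -> bits
    seconds = bits / (link_mbps * 1e6)     # megabits/s -> bits/s
    return seconds / 86_400                # seconds -> days


def faster_to_ship(data_tb: float, link_mbps: float, shipping_days: float) -> bool:
    """True if posting the drives beats sending the data over the network."""
    return shipping_days < transfer_days(data_tb, link_mbps)


# Assumed sustained link speeds in megabits/s (hypothetical values).
for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mbit/s: {transfer_days(60, mbps):.1f} days for 60TB")

# Assumed two-day postal delivery (hypothetical).
print(faster_to_ship(60, 1_000, shipping_days=2))
```

At a sustained 1 Gbit/s, one day's worth of LHC data takes more than five days to send, so the network never catches up, whereas a box of drives in the post arrives in a fixed time however much data is on them.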

Does cloud create a single point of failure?
There was a 30-hour Amazon outage in April 2011, triggered by human error during a network configuration change. Good test case!
Netflix were largely unaffected, even though they are one of Amazon's largest customers, because Netflix had re-architected their software to think about how to deal with failure.
Non-redundant services were screwed, with catastrophic outages. Cloud does not buy you redundancy.
Would more operational expertise have resolved the outage faster? Should they have been able to recover faster? Interesting question!

Keeping up with innovation can be an issue with cloud - AWS has deployed 1 new service every two months.
