At a RUGIT (Russell Group IT Directors) meeting in Liverpool for the next couple of days. Two interesting sessions today. The first was on how we can use less power, especially for research computing. One of the scenarios examined was using PCs (e.g. student open-access PCs) to run High Throughput Computing tasks while they're not otherwise in use. This often uses a piece of software called Condor, and Condor clusters are in use at many UK universities. One of the debates is around energy use – such PCs are usually in labs or classrooms, not the air-conditioned environment of a data centre, so less power is consumed on cooling. On the other hand, we encourage everyone to turn their PCs off at night, so leaving them on so that jobs can run on them increases power use. There is some research suggesting that using PCs in this way still uses less power than dedicated computers in data centres, but it is still at an early stage.
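For readers unfamiliar with how this works in practice, here is a minimal sketch of handing a batch job to a Condor pool. It assumes an HTCondor installation with the condor_submit command available; the script name, arguments and file names are purely illustrative.

```python
import subprocess
import textwrap

# A minimal Condor submit description for a single batch job.
# The executable, arguments and file names below are illustrative only.
submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = run_analysis.sh
    arguments  = dataset_01
    output     = job.out
    error      = job.err
    log        = job.log
    queue
""")

with open("job.sub", "w") as sub_file:
    sub_file.write(submit_description)

# condor_submit hands the job to the scheduler, which matches it to an idle
# machine in the pool (for example an unused open-access PC) and runs it there.
subprocess.run(["condor_submit", "job.sub"], check=True)
```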
We also looked at how power usage from our data centres (machine rooms) can be reduced. Basically, computers produce a lot of heat which has to be removed, usually by air conditioning. Using DC power (rather than AC), water cooling, fresh-air cooling instead of air conditioning, and using the waste heat to heat buildings are all under consideration.
The second session was a look into the future – what will IT be like in 5, 10 and 15 years' time, and what effect will it have on our services? After an introductory talk looking at things like Moore's Law, we split into groups to look at specific areas. My group had the task of looking at what consumer devices might be like in 5 years' time – what will replace the iPod, iPhone etc., and what features will they have? We all agreed that they will be always on and always connected to “a network”, though we weren't sure what that would be. They'll have tactile interfaces and presence awareness, and will communicate with each other. They'll support high-definition video and have a bigger virtual presence than their physical form, with either roll-out screens, virtual reality or video projection. They'll know who we are, where we are and what we're doing. IT Services will have to accept that students will bring these devices to University and expect them to work and be supported. There'll probably be no need to provide University email systems or wireless networks, as they'll be available from so many other sources. Identity management and authentication will be key issues.
2 comments:
A fully optimised and well-utilised compute cluster with a well-configured scheduling system can be an energy-efficient solution. With the new generation of multi-core CPUs allowing power management, the energy efficiency of such compute clusters can be improved.
At Sheffield University, the Research Computing Group looked at Condor as far back as four years ago.
As well as teaching Condor as part of the WR-Grid training programme, Mike Griffiths also set up a Condor cluster containing the K17 teaching-lab PCs as well as a few desktop PCs. There is currently work being done at York University to use Condor as a meta-scheduler for WRG.
The High Throughput Computing Week had good discussions on the use of Condor.
There are a number of obstacles to "Condor-clustered" Windows PCs providing the same service as more traditional HPC clusters.
(1) Much of today's CPU-intensive software is written with Linux platforms in mind.
(2) Taking advantage of compute-serving clusters such as Condor requires batch-mode operation. Linux programs tend to be easy to run in batch mode, whereas Windows users would rather click the mouse here and there than type or remember any commands.
Forcing someone who is used to operating with a mouse to run his/her jobs in batch mode is a form of torture! It is therefore unrealistic to expect Windows machines to be running Windows jobs during non-office hours.
(3) For any Condor cluster to have a realistic chance of being fully utilised, it should provide a Unix/Linux service. This may be possible for some software by using a Unix-like environment such as Cygwin. Another option would be to dual-boot the PCs into Linux at night.
(4) As for the power and efficiency considerations, the major failing of the Condor solution is its current inability to wake-on-LAN any of its workers. I mentioned this to the Condor team during the Edinburgh meeting and asked them to give high priority to such a feature. As Condor is USA-based, they are still well behind on energy-conservation considerations. The energy saving will only be real if Condor-pooled worker nodes can be woken up when there is work to do; a sketch of what such a wake-up packet involves is shown below.
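For illustration, a Wake-on-LAN "magic packet" is just a UDP broadcast containing six 0xFF bytes followed by the target machine's MAC address repeated sixteen times. A minimal Python sketch (the MAC address and broadcast settings here are placeholders) might look like this:

```python
import socket

def wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times, broadcast over UDP."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# Example: wake a pooled worker PC before dispatching a job to it
# (the MAC address here is purely illustrative).
wake_on_lan("00:11:22:33:44:55")
```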
There is one "energy usage" aspect of distributing HPC jobs to PCs around the campus that seems attractive to me and that is space heating of buildings.
Sometimes there is no better way of looking for a solution than looking back to the past.
In Europe, many peasant houses are built as two-storey buildings with the animals kept at the lower level. Reason: Natural Central Heating. How about adapting that model for our HPC facilities?
Last year, after a bit of encouragement, Steve Beck of Mech Eng. dedicated a research student to looking into heating and cooling in the Computer Centre Building. I hope that from there we can perhaps move on to looking at capturing the heat generated by computers to heat offices.
Perhaps at that point the ideal HPC configuration will be a flock of Condor PCs stabled on the ground floor of each building, keeping us warm with the brain-power of the research workers!