Cloud bursting for higher education

Mahesh Pancholi, Research Computing Specialist at OCF, discusses the pros and cons of cloud bursting vs. in-house services for universities

Cloud bursting is revolutionising research at universities. Many universities now use cloud bursting and regularly take advantage of the public cloud infrastructure made widely available by large companies such as Amazon, Google and Microsoft. The concept essentially came out of the spare capacity that Amazon had on the massive server farms running its websites. These server farms were built to meet particularly high demand at times like Christmas and Black Friday, but they sat largely idle the rest of the time, so the idea arose to sell that spare capacity.

This has since grown into a whole business known as ‘infrastructure as a service’ (IaaS). Instead of having to buy your own kit and run your own services, you can rent time on someone else’s servers and use their data centre resources. There is no longer a need to worry about power costs, data centre space or system administrators’ fees, as you pay a subscription to the IaaS provider, which does it all for you.

Cloud bursting in universities

Universities have already taken up the public cloud, particularly for core IT services. By using Office 365 rather than an in-house email server, a university is utilising capacity in the cloud: instead of maintaining a rack of servers and system administrators to run email, it receives a full service from the public cloud for all of its users. That is probably where the biggest uptake started, and since then there has been a realisation that, at some point in the future, these cloud services are likely to be cheaper than buying your own equipment and running it all in-house.

In general, there has certainly been enthusiasm for moving towards cloud services, and out of that came the OpenStack revolution, which many see as the best of both worlds. You get a ‘cloud-like’ service with the ability to provision whatever type of server you want as a virtual machine, but with the advantage of it being onsite, giving you control, privacy and data sovereignty.

For example, many organisations prefer not to put HR data on the cloud. With OpenStack onsite, you have a flexible compute platform where the HR system can sit idle for most of the month and then, for the five days it has to work hard, burst out to the rest of the infrastructure, helping everything run more efficiently and quickly for that crucial time of the month. Providers of research computing infrastructure have been keen to take advantage of the flexibility and security OpenStack provides, and projects such as eMedLab and CLIMB are two very successful examples, showing that private cloud has an established use for many universities. But what about public cloud?

Does it come at a price?

With cloud bursting, there can be attractive initial rates for running the servers, but on top of that come additional costs which, unless you are experienced at running IT or cloud infrastructure, might not be noticed at the outset. Data egress is one example: most providers will say it is free to put your data into the cloud, but there is then a monthly cost to store the data and a further cost to access it. Essentially, you are paying for the bandwidth each time you bring your data back out of the cloud.

When conducting a scientific experiment that involves huge amounts of data, you need to access terabytes of data continuously to run analyses, and you will be charged every time you retrieve that data. In some instances, even running a search through your file structure counts towards data egress charges, as you are still accessing the data. Unless all the potential scenarios have been considered before adopting cloud bursting, these sorts of costs can sneak up on you, and it can become very expensive very quickly.
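To make the point about hidden charges concrete, the short Python sketch below adds up one month of compute, storage and egress costs. It is purely illustrative: the function name and every rate and volume in it are made-up assumptions, not any provider's actual pricing, and real tariffs are usually tiered and more complicated.

```python
def monthly_cloud_cost(compute_hours, stored_tb, egress_tb,
                       compute_rate_per_hour, storage_rate_per_tb,
                       egress_rate_per_tb):
    """Break one month's bill into compute, storage and egress.

    All rates are supplied by the caller; nothing here reflects a
    real provider's price list.
    """
    compute = compute_hours * compute_rate_per_hour
    storage = stored_tb * storage_rate_per_tb
    egress = egress_tb * egress_rate_per_tb
    return {
        "compute": compute,
        "storage": storage,
        "egress": egress,
        "total": compute + storage + egress,
    }


# Hypothetical scenario: a 50 TB dataset that is pulled back out of the
# cloud four times in a month (200 TB of egress) for repeated analyses.
costs = monthly_cloud_cost(
    compute_hours=1000,
    stored_tb=50,
    egress_tb=200,
    compute_rate_per_hour=0.5,   # assumed rate
    storage_rate_per_tb=20,      # assumed rate
    egress_rate_per_tb=80,       # assumed rate
)

for item, amount in costs.items():
    print(f"{item:>8}: {amount:>10,.2f}")
```

With these invented figures, the headline compute rate is the smallest part of the bill and repeated data retrieval is by far the largest, which is exactly the kind of cost that is easy to miss at the outset.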

This has been recognised by both the public cloud providers and Jisc, the UK’s provider of digital solutions for education and research. There are ongoing efforts to provide special pricing agreements for universities, waivers for certain charges, and even large amounts of credits to help demonstrate how public cloud services can enhance and expand universities’ research capabilities. Public cloud providers are surveying the market and partnering with companies like OCF, which has a pedigree of many years providing cutting-edge solutions to the UK research computing community, to help universities take advantage of their products by integrating them with existing infrastructure such as HPC clusters.

This leads us to the real argument for the public cloud. You no longer have to build a system big enough for your largest workload. Black Friday is a good example: you don’t want to have to build a server farm big enough for Black Friday. You build a server farm big enough for your ‘normal Wednesday’ and burst out to the public cloud to deal with the additional workload. If there is no such thing as a ‘normal Wednesday’ for your organisation (such as a consultancy with a very peaky workload), it makes even more sense for the whole workload to be in the cloud, so that you only pay for what you consume rather than having too much or, worse, too little compute available to you.
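The ‘normal Wednesday’ argument can be sketched as a simple comparison in Python: size an in-house system for the peak, or size it for the baseline and rent the overflow. The function names, node counts and per-node-day costs below are hypothetical assumptions chosen only to illustrate the reasoning, and the cloud is deliberately assumed to cost three times as much per node-day as owned hardware.

```python
def peak_sized_cost(daily_demand, owned_cost_per_node_day):
    """Cost of owning enough nodes to cover the busiest day, every day."""
    return max(daily_demand) * owned_cost_per_node_day * len(daily_demand)


def burst_cost(daily_demand, baseline_nodes,
               owned_cost_per_node_day, cloud_cost_per_node_day):
    """Cost of owning a baseline and renting any overflow from the cloud."""
    owned = baseline_nodes * owned_cost_per_node_day * len(daily_demand)
    # Node-days of demand that exceed what the in-house system can serve.
    overflow_node_days = sum(max(0, d - baseline_nodes) for d in daily_demand)
    return owned + overflow_node_days * cloud_cost_per_node_day


# Hypothetical month: 25 'normal' days needing 100 nodes and
# 5 peak days needing 400 nodes.
demand = [100] * 25 + [400] * 5

build_for_peak = peak_sized_cost(demand, owned_cost_per_node_day=10)
build_and_burst = burst_cost(demand, baseline_nodes=100,
                             owned_cost_per_node_day=10,
                             cloud_cost_per_node_day=30)

print(f"Sized for the peak:       {build_for_peak:,}")
print(f"Sized for normal + burst: {build_and_burst:,}")
```

Under these assumptions, bursting comes out cheaper even though each rented node-day costs more, because the peak-sized system pays for idle capacity on the other 25 days. The balance shifts as the peaks become longer or the cloud rate rises, so the comparison is worth re-running with your own figures.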

People power

There is a noticeable increase in awareness among universities of the benefits of public cloud bursting, particularly in research computing. Whilst no one is replacing their on-premises HPC system with the public cloud yet, it is recognised that bursting into the public cloud is incredibly useful for giving researchers access to the latest technologies, extra capacity and expertise.

There will need to be a culture shift, both in universities and in how research computing is funded, before HPC in the cloud is fully accepted. Most importantly, the costs of using the public cloud will need to be driven down further for wider adoption, but the trends suggest that this is not too far off.