How a continually evolving HPC environment can drive university research forward

Simon Thompson, research computing infrastructure architect at the University of Birmingham, discusses how high-performance computing is a key factor in achieving greater research impact

As a leading research-led institution, it is important that we are at the forefront of data-driven research. However, modern artificial intelligence (AI), high-performance computing (HPC) and analytics workloads are driving an ever-growing set of data-intensive challenges.

With the sheer amount of data that is being produced and processed by our researchers, the common questions we hear are ‘how can we analyse it fast enough?’ and ‘how can we make the process even quicker?’. These challenges can only be met with accelerated infrastructure.

At Birmingham, we have developed the Birmingham Environment for Academic Research (known locally as BEAR). This system brings together HPC, data storage and private cloud in a tightly integrated environment built around the principle of data access everywhere.

We are therefore always looking for ways to make research quicker and more accessible to our researchers through our infrastructure. Working with our partner OCF, an HPC, storage and data analytics integrator, gives us the skills and knowledge we need, and access to the right suppliers, to continually update BEAR and keep pace with escalating demand.

Recently, we realised that we needed more computational power tailored to the ever-increasing AI workloads generated by the university’s researchers. The latest upgrade to BEAR therefore involves integrating a total of 11 IBM POWER9 AC922-based servers, the same type of system installed in the number one and number two machines on the Top500 list of supercomputers.

In the past few years we’ve seen massive growth in life sciences at Birmingham, and the university has supported the resulting computational demands with a multi-million-pound strategic investment in computing facilities, which has funded this upgrade to BEAR.

This significant enhancement to BEAR will mean an even more powerful and versatile computing environment for our researchers. For example, Birmingham’s fellows working with The Alan Turing Institute on early diagnosis of, and new therapies for, heart disease and cancer will use AI to run faster diagnostics in the future. This could help support transformation in the healthcare sector, potentially saving lives.

Researchers in the physical sciences are similarly using machine learning and data science approaches to quantify the 4D (3D plus time) microstructures of advanced materials collected at large national synchrotron facilities such as the Diamond Light Source. This research expects to use the large-model support delivered in IBM’s PowerAI software stack to analyse the terabytes of data being generated daily, a task that is currently almost impossible.
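To give a flavour of the memory pressure behind this, the sketch below shows one generic way of analysing a 4D (3D plus time) volume piece by piece so that only a small tile sits in GPU memory at any moment. It is plain PyTorch rather than the PowerAI large-model-support feature itself (which swaps tensors between GPU and host memory automatically), and the tile size, model and array layout are hypothetical.

```python
# Illustrative sketch only: tiling a large 4D (T, Z, Y, X) NumPy volume so a
# 3D model can analyse it without holding the whole dataset in GPU memory.
# The tile size, model and volume are placeholders, not BEAR specifics.
import numpy as np
import torch

def analyse_volume(volume, model, tile=128, device="cuda"):
    """Run `model` over a (T, Z, Y, X) volume one spatial tile at a time."""
    model = model.to(device).eval()
    t_steps, zdim, ydim, xdim = volume.shape
    results = []
    with torch.no_grad():
        for t in range(t_steps):                      # one time step at a time
            for z0 in range(0, zdim, tile):
                for y0 in range(0, ydim, tile):
                    for x0 in range(0, xdim, tile):
                        block = volume[t, z0:z0 + tile, y0:y0 + tile, x0:x0 + tile]
                        # add batch and channel dimensions for a 3D CNN: (1, 1, z, y, x)
                        x = torch.from_numpy(np.ascontiguousarray(block)).float()
                        x = x.unsqueeze(0).unsqueeze(0).to(device)
                        results.append(model(x).cpu())  # keep outputs on the host
    return results
```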

Using HPC to solve these sorts of challenges necessarily consumes large amounts of power. At Birmingham, we take this seriously, and the energy efficiency of our whole environment is key.

In 2018, the university opened a state-of-the-art £5.5M ‘hot’ data centre, the UK’s first purpose-built, water-cooled, research-focused data centre, in which 85% of the heat is recovered directly through the water-cooled systems, delivering impressive energy savings by minimising cooling overheads.

Our HPC and GPU-accelerated servers include warm-water-cooled nodes, where water is taken directly across the CPUs and GPUs at temperatures of up to 35°C. Our on-premise private cloud deployment uses the same technology.

A unique installation in the UK, the data centre doesn’t use any air-cooling systems and accommodates the IBM systems running alongside ‘direct-to-node’ water-cooled technology from Lenovo.

The climate in the UK means we can cool the systems without the need for compressor-based cooling, and for much of the year we benefit from dry-air cooling.

The use of water cooling gives us an added advantage. As the CPUs are so efficiently cooled, we see sustained ‘turbo mode’ on the CPUs, meaning we get extra speed just through careful use of cooling technologies. Looking into the future, we’re not seeing the thermal demands of CPUs dropping and so we’re really well-placed to deliver dense HPC solutions, keeping both our space and energy footprint as low as possible.

In building our GPU capacity, we’re acutely aware of the need to ensure we can feed the systems with our researchers’ data; there’s no point in having super-fast technology if we can’t utilise it well.

We’ve always taken the approach that data should be pervasive across our solutions and have deployed some of the most advanced software-defined storage available, which helps us target data placement. We’re currently evaluating options for NVMe-based storage to help deliver high-speed access into the GPU systems.
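As a rough illustration of what ‘feeding the GPUs’ means in practice, the sketch below uses a generic PyTorch data loader in which several worker processes read and decode samples in parallel while the GPU computes, and pinned host memory allows asynchronous copies to the device. The dataset class, file paths and parameter values are placeholders, not details of the BEAR storage environment.

```python
# Minimal sketch of keeping a GPU fed from fast storage: loader workers
# overlap I/O with GPU compute, and pinned memory enables async transfers.
import torch
from torch.utils.data import DataLoader, Dataset

class VolumeDataset(Dataset):
    """Hypothetical dataset: one pre-processed sample per file on fast storage."""
    def __init__(self, files):
        self.files = files

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        sample = torch.load(self.files[idx])          # read from NVMe / parallel FS
        return sample["data"], sample["label"]

loader = DataLoader(
    VolumeDataset(files=["/scratch/sample_0.pt"]),    # placeholder path
    batch_size=8,
    num_workers=8,        # parallel readers keep storage busy while the GPU computes
    pin_memory=True,      # page-locked buffers allow asynchronous copies to the GPU
    prefetch_factor=4,    # queue batches ahead of the GPU
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for data, labels in loader:
    data = data.to(device, non_blocking=True)
    # ... model forward/backward pass would go here ...
```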

We are faced with a data explosion, and we need the appropriate computational resources to be able to process this data. We’ve never been hung up on the number of cores in our HPC cluster, but we do care about providing a reliable system, with cutting-edge HPC tools, for our researchers from both traditional and non-traditional disciplines.

We want them to be able to process data faster and generate new findings, achieving greater research impact. And in a field as highly competitive as academic research, providing superior HPC services that can process large quantities of data quickly helps to attract world-class researchers, as well as much-needed grants and funding to move our research forward.
