Supporting research through HPC investment

By Dr. Andrew Richards, Head of Advanced Research Computing at the University of Oxford

At the University of Oxford we have a long history in High Performance Computing. Part of the Advanced Research Computing (ARC) facility at the University, our HPC resources help researchers access advanced research computing facilities locally, nationally and internationally. 

Research at the University is rich and diverse, with all four Divisions of the University – Mathematical, Physical and Life Sciences; Medical Sciences; Social Sciences; and Humanities – having access to, and making use of, the HPC cluster. Researchers use it for computational chemistry, engineering, financial modeling, and data mining of ancient documents. They also collaborate both nationally and internationally on projects like the T2K experiment, which uses the J-PARC accelerator in Japan. The cluster is even used by anthropologists to study religious groups using agent-based modeling.

With HPC resources being important to so many researchers, we recently launched a new, upgraded cluster that we nicknamed ARCUS Phase B, to complement our existing ARCUS Phase A cluster. The new cluster also replaced an older SGI-based system that was coming to the end of its life and really wasn’t as efficient as it used to be.

We worked with OCF, a specialist integrator of HPC, big data management, storage and analytics, to design and build the system, which uses Lenovo NeXtScale servers as the main compute power in the cluster. 

HPC at the University level has become much more commoditised than it used to be. New machines feel much more ‘off the shelf’ than before [when you felt like you were building a machine from the ground up]. Having said that, it’s still a challenge for companies like OCF, and we had some very specific requirements around integrating the new cluster with existing storage, as well as the general design of the machine, which features three different Intel CPUs as well as NVIDIA GPUs. The general design of the machine therefore had to match the workloads that would run on it, and that’s where working with an integrator and vendor partnership like OCF and Lenovo really works. They’re the experts that understand the mix of different workloads best and so can make recommendations on the best technology for the HPC cluster.

From an operational point of view, while we did need someone to ‘rack and stack’ the cluster and get it operational, we have a team of people within ARC to look after it on a day-to-day basis, so we needed to make sure the machine would also fit in with our operational requirements and job scheduling methodology. That’s where the value comes from, over and above the actual hardware and installation.

One of the areas we’ve been trying to improve over the last few years is the idea of co-investing in HPC resources, rather than individual departments buying their own smaller clusters. Instead, departments can invest their capital with us and we put it towards buying a larger cluster. They benefit from a much larger machine and can get priority access when needed.

The Networked Quantum Information Technologies Hub (NQIT) at the University did actually co-invest in ARCUS Phase B and, at their request, 20 NVIDIA GPUs were added to the machine. NQIT get the benefits of a much larger machine, and the whole University benefits from the additional GPUs, which wouldn’t have been there without the co-investment.

NQIT are a great example of Government-funded activity around developing quantum technology and they’re helping the University of Oxford take a leading role in that development. What they’re trying to model is quantum computing – looking at how you design the components that would make up a quantum computer. They do a lot of numerical analysis and statistical modeling on how quantum computers behave. Ultimately, they’re using the University’s HPC resources to effectively build a working quantum computer that would be resilient to errors and actually be usable.

As a central resource for the entire University, we really see ourselves as the first stepping stone into HPC. From PhD students upward, people who haven’t used HPC before are the ones we really want to engage with. I don’t just see our facility as running a big machine; we’re here to help people do their research. It’s about offering researchers more than just the hardware: we have a team of full-time staff who specialise in HPC hardware and software to support students. That’s our value proposition.