Building a career in HPC

Working in supercomputing will always be a race to the top, says Durham University’s Dr Lydia Heck

Supercomputing is one of those industries that never stands still. It is driven by a race to the top: everyone wants the most powerful computer, the most energy-efficient one, the one with the smallest carbon footprint, the most users, the broadest user base, the best I/O and more. Nobody ever builds an average supercomputer.

For a computer manager, it is like being the mechanic in the pit lane at a Formula 1 race: you know that you’re working on the best equipment money can buy, and that means you have to be the best too.

The supercomputers that I have bought and managed for research groups at Durham University, in particular for the Institute for Computational Cosmology over the past 14 years, have been some of the most powerful ever built in the UK. My first system delivered a single megaflop per CPU (central processing unit). Now, 28 years later, my current system delivers 20.8 gigaflops on a single core [roughly 20,000 times faster per core], and the system as a whole is 35 million times more powerful. In between, I have built and managed more than 10 supercomputers.

Right now, I’m personally responsible for the specification, procurement, installation and maintenance [wielding a screwdriver when necessary] of Durham University’s most famous HPC asset: COSMA. Originally 64 workstations with a Myrinet interconnect and 128 cores, this system evolved first to 528 cores, then 800 cores, then 3,000 cores and finally 9,856 cores, with COSMA 4 and COSMA 5 continuing to operate simultaneously!

There are six things that have helped in my career: 

1.     A good education always helps when mapping out a career. I hold the degree of Diplom-Physikerin (equivalent to a research master’s in physics) and a PhD in theoretical high-energy physics. I have worked in four different research areas in physics, from particle physics to cosmology. This understanding of science, my solid mathematical background and my experience in research have given me the domain knowledge to understand my users and their codes more completely, and to understand what they want and need from a High Performance Computing (HPC) system.

2.     My path into the supercomputing world was slightly unusual in that I didn’t train for or apply for a job managing a supercomputer. I worked for someone who was already using such a system and when it failed, I stepped forward to repair it. That was the theme of my first few roles – stepping forward and getting my hands dirty and at the same time doing research into physics. Ultimately, putting myself forward has provided opportunities along the way.

3.     I learned very early on that you need to keep up with technical developments. Every single day I research the best ways of delivering compute power to those who need it. I meet lots of tech-savvy people and stay in touch with hardware and software vendors like IBM, Lenovo, Atos-Bull, SGI, HP, Cray and others, not forgetting to build strong, ongoing relationships with integrators like OCF. Through these contacts I can stay ahead of the game.

4.     Don’t be afraid to try out what you have learned – it’s the fastest way to improve. I am continuously learning on the job. Sometimes, I’m applying knowledge to COSMA that I only learned myself two weeks before. That’s the high performance world; I can’t stand still!   

5.     You must get to know and understand the users of your supercomputer [see my point about having domain knowledge]. Day by day, I work with researchers to understand their code: how to get it running more efficiently, why it doesn’t work, and why it stopped working when it was fine previously! I can do this because I have a science degree, a PhD and a passion for HPC.

6.     Lastly, value people for their brain power; don’t judge a book by its cover. There is always lots of debate in the industry around the role of women in IT. I think in this job, and in any other for that matter, knowledge, passion, experience and attitude reign supreme.

ABOVE: Dr Lydia Heck is currently manager of the DiRAC Data Centric system (https://www.dirac.ac.uk) at Durham University and responsible for all of the COSMA High Performance Computing System.
