
Experiences of Czech scientists pilot testing the GPU partition of LUMI

LUMI supercomputer

The world’s third most powerful supercomputer, and Europe’s number one, is fully operational. The second pilot phase of the GPU-based LUMI supercomputer has been completed; LUMI was accepted in February 2023 and is now officially ready to serve European scientists, including Czech scientists. LUMI contains a total of 10 240 graphics processors.

What were the first experiences of Czech scientists from IT4Innovations in pilot testing the GPU partition of the LUMI supercomputer?

David Číž explores the viability of training machine learning models on AMD graphics cards in an HPC environment using the TensorFlow machine learning framework. He created benchmarks that will serve as a baseline for LUMI users and compared them with the same benchmarks run on the Karolina supercomputer. On the pilot testing of LUMI, he adds:

–From start to finish, I was impressed with how smooth the experience already was for a pilot project. The documentation is nicely organized, with clear instructions and helpful examples, which makes connecting to LUMI and using its vast resources quick and easy. Creating the required environments with the appropriate software was made simple with the pre-made EasyBuild recipes, and the support team was always quick to help and answer questions. The system runs smoothly and quickly and is intuitive to use.

Sergiu Arapan focuses on two-dimensional van der Waals materials, which are promising candidates for future thermoelectric materials and compact spintronic applications. He uses state-of-the-art computational methods to study the structural and physical properties of these materials. On his first experience with the LUMI supercomputer, he says:

–Using the GPU nodes can speed up our electronic structure calculations by orders of magnitude and thus accelerate the computational design of new materials with desired properties. We have been successfully running our code on the Karolina supercomputer, which has a different GPU architecture than LUMI. The LUMI-G pilot phase offered us the opportunity to deploy our code on AMD GPUs. Though we still need to rework some parts of the code to make full use of the capabilities LUMI provides, we noticed that the Cray compiler and the available mathematical libraries produce more performant code. We would also like to give credit to the professionalism of the LUMI support staff and the clear, easy-to-follow online documentation.

Oldřich Plchot from the Faculty of Information Technology of the Brno University of Technology was also allocated time on LUMI in the GPU pilot phase for a project entitled Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries.

–This project is specific in its requirements for the amount of audio data to be processed, which need not be strictly annotated. Our goal was to train an embedding extractor used in biometric applications for speaker verification. In simple terms, for each pair of speech utterances, we extract two embeddings (high-dimensional vectors produced by a neural network) and then compare them using, for example, their cosine distance to decide whether the embeddings are from the same speaker or from two different speakers. The embedding extractor is a deep convolutional neural network, and our proposed algorithm for its training can exploit weakly annotated data while optimising the objective function for speaker identification.

–By weakly annotated data, we mean training recordings that can contain an arbitrary number of speakers; during training, we only know for each recording whether a given speaker appears somewhere in it. This approach makes it possible to take advantage of the large amount of data freely available on the Internet, circumventing the current fundamental problem of having only small amounts of training data for increasingly large neural networks. Acquiring and augmenting such data is significantly cheaper than having the data annotated and segmented manually. Thus, the computing power was essentially required to train this large neural network on roughly ten times the amount of data that is typical for manually annotated data. As the algorithm iteratively refined its estimate of where individual speakers occur in the weakly annotated data, a significantly larger number of iterations was also required.
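The comparison step described in the quote above can be illustrated with a minimal sketch. It is not the project's code: the function names, the toy three-dimensional vectors, and the decision threshold of 0.5 are all hypothetical, and it scores with cosine similarity (one minus the cosine distance mentioned in the quote). Real embeddings would come from the trained neural network and have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a: Sequence[float], emb_b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the pair as one speaker if the similarity reaches the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out verification trials.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings standing in for the output of an embedding extractor.
enrolment = [0.9, 0.1, 0.3]
test_same = [0.8, 0.2, 0.35]   # points in nearly the same direction
test_diff = [-0.2, 0.9, -0.4]  # points in a very different direction

if __name__ == "__main__":
    print(same_speaker(enrolment, test_same))
    print(same_speaker(enrolment, test_diff))
```

Because only the angle between the vectors matters, the decision does not depend on the length of either embedding.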

Plchot's experience with the LUMI supercomputer was mostly positive:

–We only used nodes with graphics accelerators for our computations, and we encountered several challenges in our initial work with LUMI and AMD accelerators during the pilot testing. For example, on Karolina, we often mount squashfs or tar files via fusermount, and this feature is missing on LUMI; we were promised that it would be implemented. We are glad for the opportunity to use the remaining computing time on LUMI until the end of March.

Author: Zuzana Červenková, IT4Innovations

Image: Fade Creative