The NeurIPS 2023 Paper Awards honor researchers who used the LUMI supercomputer's massive AMD GPU resources

The LUMI data center

Neural Information Processing Systems (NeurIPS) is the leading forum for cutting-edge research in artificial intelligence. The committee announced the award-winning papers for the NeurIPS 2023 conference on December 11th. Out of 13,321 submissions, four main conference papers were selected for this prestigious award.

One of the winners is “Scaling Data-Constrained Language Models”, a work conducted by a group of researchers on the AMD GPU partition of the LUMI supercomputer. The focal point of their research is the impending challenge of running out of training data for large language models. The group investigated scaling language models in data-constrained regimes, running large sets of experiments that varied the extent of data repetition and the compute budget, ranging up to 900 billion training tokens and 9-billion-parameter models.

Their findings revealed that, with constrained data for a fixed compute budget, training with up to four epochs of repeated data yields negligible changes to loss compared to training on unique data. However, as repetition increases further, the value of adding compute eventually decays to zero. They also investigated augmenting the dataset with code and revisiting filtering strategies as ways to scale further when data is limited. The full preprint is freely available on arXiv, and the accompanying resources on GitHub.
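To make the data-constrained setting concrete, here is a minimal sketch (not from the paper; the function name and figures are illustrative assumptions) of how a fixed token budget translates into epochs of repetition over a limited pool of unique tokens:

```python
# Hypothetical helper: given a fixed training-token budget (a proxy for
# compute) and a limited pool of unique tokens, how many passes (epochs)
# over the unique data does training imply?
def repetition_epochs(token_budget: int, unique_tokens: int) -> float:
    """Number of passes over the unique data implied by the budget."""
    return token_budget / unique_tokens

# Example with assumed numbers: a 900B-token budget over a 225B-token
# unique dataset implies 4 epochs of repetition -- the regime the paper
# found to be roughly as good as training on fully unique data.
print(repetition_epochs(900_000_000_000, 225_000_000_000))  # 4.0
```

Beyond roughly this level of repetition, the paper reports that additional compute spent on further repeats yields rapidly diminishing returns.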

– LUMI enabled scaling up to 2,200 nodes at one point, a world record for AI research on AMD hardware, said Niklas Muennighoff, first author of the winning paper.

– With its massive computational resources, LUMI is the ideal platform for large-scale cross-organizational collaborations, stated Sampo Pyysalo, member of the awarded research group.

The work was conducted by Niklas Muennighoff (Hugging Face), Alexander Rush (Cornell University, Hugging Face), Boaz Barak (Harvard University), Teven Le Scao (Hugging Face), Nouamane Tazi (Hugging Face), Aleksandra Piktus (Hugging Face), Sampo Pyysalo (University of Turku), Thomas Wolf (Hugging Face), and Colin Raffel (University of Toronto, Hugging Face).

See also: Niklas Muennighoff explaining the research paper in a video on LinkedIn