
[Completed] Extension of the service break until Monday 16 September

Monday 16 September 11:45 CEST (12:45 EEST)

The maintenance of LUMI has ended, and LUMI is back in production.

All parts of the system have received updates. The updates that affect running software the most are the update of the OS to SUSE Enterprise Linux 15 SP5 on the login nodes, with the matching version of Cray Operating System on the compute nodes, and the update of ROCm to 6.0.3, together with an update of the programming environment to version 24.03, which fully supports ROCm 6.0. As this is a major version update of ROCm, it will have consequences for other software on the system. The LUST has prepared a web page listing the updates and their consequences at: https://lumi-supercomputer.github.io/update-202409. We will keep updating the list in the next few weeks. Please read the update list carefully before restarting your work on LUMI and keep an eye on it in the coming weeks.

A large percentage of GPU jobs submitted before the maintenance break are expected to fail due to changes in the software environment. These jobs have therefore been placed on hold to give you time to inspect them before deciding whether to cancel them or, if you wish, release them and monitor their progress. After a one-week grace period, any jobs still on hold will be removed from the queues. Instructions can be found on https://lumi-supercomputer.github.io/update-202409. Some CPU-based jobs may also fail, but this failure is expected to be immediate and not to waste billing units, so these jobs will be released by the system administrators.

The visualization nodes in the lumi-d partition will become available at a later date. All other services are operational, in some cases with reduced capacity.

If you experience problems, do not hesitate to contact LUMI User Support via the web forms on https://lumi-supercomputer.eu/user-support/need-help/.

Thursday 12 September 14:00 CEST (15:00 EEST)

To guarantee optimal stability and performance, the opening of the system has been postponed until the afternoon of Monday 16 September. We are addressing a power delivery issue affecting the LUMI data center by restoring additional redundancy for the Lustre power rails. We are also using this time to implement critical system enhancements and perform additional validation. This includes ensuring reliability through configuration improvements and verifying functionality after major software and firmware upgrades.

Thank you for your understanding.

Friday 6 September 14:30 CEST (15:30 EEST)

Although the LUMI system upgrade was completed successfully, a failed power supply in the data center where LUMI is located has affected the filesystems. We unfortunately need to extend the service break until Thursday 12 September.
Because of this same problem, the system will run with reduced capacity after coming back online. We will keep you updated on any progress.
We apologize for the inconvenience caused by this delay.