
[Update] Lustre filesystem performance

Friday 13 February 06:45 CET (07:45 EET)

HPE has identified a potential root cause of the recent Lustre filesystem issues. The problem was traced to a bug in the current Lustre version, affecting Lustre P2 and P4. The LUMI admin team has now implemented the proposed fixes on both partitions.
Identifying the cause required significant effort from HPE experts and LUMI admins, and once confirmed the fixes were deployed promptly. Thank you for your patience while this was investigated and resolved.
We expect these changes to improve filesystem stability. Please let us know via the support form if you still experience any issues: https://lumi-supercomputer.eu/user-support/need-help/

Friday 6 February 13:15 CET (14:15 EET)

System administrators and HPE are still working on the issues many of you have reported, which affect file system performance. The root cause is still unknown, and unfortunately we cannot yet provide an accurate estimate of when this will be fixed.
In the meantime, a few workarounds should temporarily deal with these issues:
  • lustre-f seems to be more reliable than lustre-p. However, please note that this could simply be due to the lower utilization of lustre-f compared with lustre-p.
  • Working from compute nodes rather than login nodes should improve workflows in most cases.
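The second workaround can be followed using Slurm's interactive allocation commands. This is a generic sketch only; the account, partition, and resource values below are placeholders and must be replaced with ones valid for your project:

```shell
# Hypothetical sketch: request an interactive shell on a compute node
# instead of working on a login node. Account and partition names here
# are placeholders, not actual LUMI values.
salloc --account=project_XXXXXXXXX --partition=small \
       --nodes=1 --ntasks=1 --time=01:00:00
# Once the allocation is granted, start an interactive shell in it:
srun --pty bash
```

Work done inside the `srun` shell then runs on the allocated compute node rather than on a shared login node.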
We will keep you informed of any new developments regarding these issues, and we appreciate your understanding and patience.

Monday 2 February 15:30 CET (16:30 EET)

Lustre filesystem performance is still very slow despite some visible improvements before the weekend. This can lead to software seemingly hanging, command prompts not returning any output for a while, login sessions hanging, or data transfers timing out.
The admin team together with HPE are still actively working on improving Lustre filesystem performance. As soon as we get some positive results, we will communicate them to you.

Friday 30 January 12:30 CET (13:30 EET)

System administrators together with HPE have managed to improve the performance of the Lustre file system and increase the number of available nodes. However, some issues still need to be addressed to reach better system stability.
If you still experience issues, do not hesitate to contact LUMI User Support via the web forms: https://lumi-supercomputer.eu/user-support/need-help/. Thank you.

Thursday 29 January 15:00 CET (16:00 EET)

The admin team together with HPE are still actively working on improving Lustre filesystem performance. As soon as we get some positive results, we will communicate them to you here and by email using the LUMI users mailing list.

Tuesday 27 January 13:00 CET (14:00 EET)

Lustre filesystem performance is still very slow and can affect all Lustre filesystem partitions. System administrators together with the LUMI vendor are working to find a solution as fast as possible. We’ll keep you updated on any new developments.

Friday 23 January 16:00 CET (17:00 EET)

Lustre filesystem performance is very slow at the moment, in particular for lustrep3. This can lead to software seemingly hanging, command prompts not returning any output for a while, login sessions hanging, or data transfers timing out. It is not limited to the login nodes but also affects the compute nodes. Users who have their home or project directory on lustrep3 will be particularly affected, but other users will also experience issues if they run on a compute node that gets its software from lustrep3, or use login node uan02, as that node also gets its software stack from lustrep3.
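Whether a given directory (for example, your home or project directory) is served by lustrep3 can be checked from its mount information. The sketch below uses the generic GNU `df` tool; the actual device and partition names shown in the output depend on the site's mount configuration:

```shell
# Print the filesystem device (source) and mount point (target)
# backing a directory. On a Lustre system, the source column
# indicates which Lustre partition serves the path.
df --output=source,target "$HOME" | tail -n 1
```

If the reported source corresponds to lustrep3, work under that path is likely to be affected by the slowdown described above.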