GPU outage on Kebnekaise (resolved)

  • Posted on: 25 September 2018
  • By: zao

The GPU nodes of Kebnekaise have been temporarily unavailable since the evening of Monday 2018-09-24. Jobs that started after that time may have failed with messages about mismatched driver versions or missing GPUs.

We are addressing the problem. While work progresses, nodes will be unavailable and the queue may report that jobs are blocked due to resources. We will update this news entry when the problem is resolved.

Update 2018-09-26: We are patching nodes as they become available after job completion and returning them to the pool.
The job queue is making forward progress, but queue times may be slightly longer than usual until this is fully resolved.

Update 2018-10-02: All nodes are back in service and the queue should be processing at the usual rate.
Please note that there may still be some job backlog from the earlier lack of resources.

Updated: 2021-11-11, 13:50