#12377 Build restarting on s390x builder
Opened 2 months ago by catanzaro. Modified a month ago

Describe what you would like us to do:


Please investigate this build which is restarting on s390x (presumably out of memory): https://koji.fedoraproject.org/koji/taskinfo?taskID=128394527

When do you need this to be done by? (YYYY/MM/DD)


Preferably 2025/01/27


Ugh.

So, some of the s390x builders (16-21) have fewer CPUs and less memory. I wonder if it hit one of those for the first cycle(s).

It's on 03 now, which has more memory and CPUs.
But it's been building for like 10 hours. ;(

I guess if it fails this time I will scrap some smaller builders and build up a larger one. ;(

Presumably the smaller builders should just not be assigned to the heavybuilder channel, right?

And looking... they aren't, so it's the 'larger' ones that are no longer able to build it. ;(

So, I scrapped 21, added its memory to 20, and made that one the only one in the heavybuilder channel.

In fact it failed right around then and restarted and moved to 20. ;)

I guess let's see...

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: high-gain, medium-trouble, ops

2 months ago

Build succeeded (after 78 hours :) so I think that worked. Thanks!

With only one heavybuilder there is generally going to be a long queue for WebKitGTK and Chromium builds, but I certainly do prefer waiting to build vs. unreliable or restarting builds.

Unfortunately I see builds restarting again:

I see the builder has 3 vCPUs and about 44 GB of RAM, so roughly 15 GB of RAM per vCPU, which should be far more RAM than required. But I think it might be running two jobs at once? Even so, it should still have more than enough RAM/vCPU to compile WebKitGTK. It might not be enough to link two WebKitGTKs at the same time, though? Linking requires a huge amount of RAM even without any parallelism.
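As a rough back-of-the-envelope check of those numbers, here is a small sketch. The 3 vCPUs and 44 GB come from the builder as described above; the two-concurrent-jobs case is only a guess, and the function names are made up for illustration.

```python
def ram_per_vcpu(total_ram_gb: float, vcpus: int) -> float:
    """RAM available per vCPU if memory were split evenly."""
    return total_ram_gb / vcpus


def ram_per_job(total_ram_gb: float, concurrent_jobs: int) -> float:
    """RAM available per build job if jobs share memory evenly."""
    return total_ram_gb / concurrent_jobs


# Figures from the comment above: 3 vCPUs, about 44 GB of RAM.
TOTAL_RAM_GB = 44
VCPUS = 3

print(round(ram_per_vcpu(TOTAL_RAM_GB, VCPUS), 1))  # 14.7 GB per vCPU
print(ram_per_job(TOTAL_RAM_GB, 2))                 # 22.0 GB per job (assumed: 2 jobs at once)
```

So if two builds really do run at once, each link step would have roughly 22 GB to work with rather than the full 44 GB. Whether that is enough for a WebKitGTK link is not something this arithmetic can answer.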

The problem is that the host is OOM killing the VM.

So, I need to try and figure out why that's happening. ;( I might need to drop another 'normal' VM to make sure there's enough free memory for it.


Metadata
Boards 1
ops Status: Backlog