#11889 Likely OOM issue on ppc64le builder
Closed: Fixed 4 months ago by kevin. Opened 6 months ago by catanzaro.

Describe what you would like us to do:


I have a ppc64le build https://koji.fedoraproject.org/koji/taskinfo?taskID=116569426 that appears to be continuously restarting.

Each time it restarts, the build log is lost, so it's not possible to know for sure why it's restarting. But presumably it is running out of memory.

When do you need this to be done by? (YYYY/MM/DD)


2024/04/18 (since you require a date, I picked yesterday :)


It looks like there is only 15 GiB of RAM available on this builder. That's probably too low regardless of parallelism level; there is %limit_build in the spec file requesting 32 GB for the debuginfo stage, with a comment saying that 16 GB is not enough.
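To illustrate why 15 GiB is too low at any parallelism level, here is a rough sketch of how a memory-based job limit behaves. This is a simplified model only, not the actual implementation of the %limit_build macro; the function name `limit_jobs` and the CPU count of 8 are assumptions made up for the example.

```python
def limit_jobs(ncpus: int, total_mem_mb: int, mem_per_job_mb: int) -> int:
    """Hypothetical model: cap parallel jobs by the memory each job needs."""
    # How many jobs fit into RAM if each one needs mem_per_job_mb.
    by_memory = total_mem_mb // mem_per_job_mb
    # Never go below one job: the build still has to run somehow.
    return max(1, min(ncpus, by_memory))


builder_mem_mb = 15 * 1024   # the 15 GiB builder from this ticket
requested_mb = 32 * 1024     # the 32 GB requested for the debuginfo stage

# Assumed CPU count of 8, purely for illustration.
jobs = limit_jobs(8, builder_mem_mb, requested_mb)
print(jobs)
```

In this model the limit bottoms out at a single job, and that one job still has less than half of the memory it asked for, so lowering parallelism further cannot help.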

Metadata Update from @zlopez:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: low-trouble, medium-gain, ops

6 months ago

So, the build did finally finish. I was tailing the logs and it was at the end in debuginfo that it was having problems.

After freeze is over, I plan to try reinstalling some of the power9 hosts to see if I can get a more performant setup.

The ppc64le builders all have a capacity of 4.0. webkitgtk has a weight of 6.0, so if it's running on a builder, koji shouldn't schedule anything more on that builder (since it's already over capacity).
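The capacity reasoning above can be sketched roughly as follows. This is a simplified model under stated assumptions, not koji's actual scheduler code; the `Builder` class and its `accepts_more` method are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Builder:
    """Hypothetical builder: a capacity plus the weights of running tasks."""
    capacity: float
    running: List[float] = field(default_factory=list)

    @property
    def load(self) -> float:
        # Current load is the sum of the weights of all running tasks.
        return sum(self.running)

    def accepts_more(self) -> bool:
        # Assumption: a builder only takes new work while below capacity.
        return self.load < self.capacity


ppc64le = Builder(capacity=4.0)
ppc64le.running.append(6.0)  # webkitgtk's weight from this ticket

print(ppc64le.load, ppc64le.accepts_more())
```

Under this model a single webkitgtk build already exceeds the builder's capacity, so nothing else should land next to it and compete for memory.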

I am unsure what we can do downstream here to improve things aside from making builders more performant somehow.

I suggest a minimum of 32 GB of RAM for the heavybuilder channel. Pretty sure 16 GB is just not enough.

Well, I have no endless bag of memory... ;) So, the only way I can make a bigger one is by scrapping some others... which isn't great since most packages are small and having more smaller builders is better than fewer larger builders, but sure, we can look at that after freeze is over I suppose.

I'm also seeing if perhaps we could get more memory in next year's budget rounds...

I've been playing with different configurations to see what I can manage to make things more performant and better balanced.

ok, finally getting back to this.

I have set up 3 ppc64le builders with 32 GB of memory and put them in the heavybuilder channel.

So, please do look at the next few builds and see if things are better?

I'd expect that will be enough. Will complain here if not.

Metadata Update from @kevin:
- Issue assigned to kevin

4 months ago

Looks like the builds on the 9th went ok.

Closing this for now, feel free to re-open or file a new ticket if you see problems again.

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 months ago

