#9282 chromium builds hanging on aarch64 builders in koji
Closed: Fixed 4 years ago by kevin. Opened 4 years ago by spot.

Currently, the majority (though not all) of chromium builds attempted through koji hang on the aarch64 builder, with the builder state at "free".

When I look at the aarch64 build log from a build in this state (https://koji.fedoraproject.org/koji/taskinfo?taskID=50384894) there are no errors; it just seems to be paused in an early build state. Perhaps the aarch64 builders (due to their slow nature) are timing out somehow?

While I recognize that chromium may be one of the largest packages in Fedora (from a build/resources perspective), this is a new failure state. Chromium is already significantly behind from a security perspective, so there is some urgency in getting builds done. I've been able to build successfully using a "local" ECS aarch64 instance of Fedora, so I feel confident this is not a chromium issue.


The problem is that there are no heavybuilder aarch64 boxes working currently. The only two boxes which should be 'enabled' are the two listed as .iad2. and they say they are not available. All of the ones which say they are enabled for .arm.fedoraproject.org should be disabled.

buildhw-a64-19.iad2.fedoraproject.org      
buildhw-a64-20.iad2.fedoraproject.org      
buildvm-aarch64-24.arm.fedoraproject.org       
buildvm-aarch64-25.arm.fedoraproject.org       
buildvm-armv7-10.arm.fedoraproject.org     
buildvm-armv7-11.arm.fedoraproject.org     
buildvm-armv7-12.arm.fedoraproject.org     
buildvm-armv7-13.arm.fedoraproject.org     
buildvm-armv7-14.arm.fedoraproject.org     
buildvm-armv7-15.arm.fedoraproject.org     
buildvm-armv7-16.arm.fedoraproject.org     
buildvm-armv7-17.arm.fedoraproject.org     
buildvm-armv7-18.arm.fedoraproject.org     
buildvm-armv7-19.arm.fedoraproject.org     
buildvm-armv7-20.arm.fedoraproject.org     

Yes, there are only 2 aarch64 builders in the heavybuild channel: buildhw-a64-19 and buildhw-a64-20.

Neither was checking in. The OOM killer had killed kojid on both of them. :(

I've restarted them, but I think the builds failed because the buildroot they were trying to use has since expired on the hub (1 day).

I think to mitigate this we should probably set the kojid unit file to restart automatically on OOM.

Please let me know when I should restart chromium builds.

Anytime. Should be back up and ready to process...

Metadata Update from @mohanboddu:
- Issue priority set to: Waiting on Assignee (was: Needs Review)
- Issue tagged with: medium-gain, medium-trouble, ops

4 years ago

I've restarted kojid on them again... and have submitted a freeze break to add 'Restart=on-failure' to the kojid unit.

https://koji.fedoraproject.org/koji/taskinfo?taskID=50584507

It seems to never have gone past "free" for the aarch64 builder.

The Restart=on-failure change has been pushed out. Hopefully that will keep kojid from getting OOM killed and then never restarting.
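
A minimal sketch of the kind of change described above, assuming the unit is named kojid.service and the setting is delivered as a systemd drop-in (the ticket does not show the actual file that was pushed). The `DROP_IN` text and the `restart_policy` helper are hypothetical; the helper only reads the `Restart=` value back with Python's configparser, which copes with simple unit files but not every systemd syntax feature.

```python
import configparser

# Hypothetical drop-in content, e.g. for a file under
# /etc/systemd/system/kojid.service.d/ (the path and file are assumptions).
DROP_IN = """\
[Service]
Restart=on-failure
"""


def restart_policy(unit_text: str) -> str:
    """Return the Restart= value from unit-file text, or systemd's default "no"."""
    # strict=False tolerates repeated keys, which unit files allow.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    # Unit-file keys are case-sensitive, so keep them as written.
    parser.optionxform = str
    parser.read_string(unit_text)
    return parser.get("Service", "Restart", fallback="no")


if __name__ == "__main__":
    print(restart_policy(DROP_IN))
```

With Restart=on-failure, systemd restarts a service whose process is killed by a signal such as the SIGKILL sent by the kernel OOM killer. It does not stop the OOM kill itself; it only brings kojid back afterwards.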

The reason that one is free is that you are doing 4 chromium builds. We have exactly 2 aarch64 builders in the heavybuild channel. Those builders are running 2 chromium builds right now, but they cannot run 4 at once without likely OOMing all of them. I still think this is a win if it takes an hour or two to finish those builds and then do the next 2, compared with the 'normal' builders that were taking many, many times that.

Does that make sense? I don't think there's much left I can do, aside from more hardware (of the right type) magically showing up.
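
The capacity limit described above can be sketched as a toy model: 4 chromium builds, 2 heavybuild aarch64 builders, and an assumed limit of one chromium build per builder at a time. The `assign_builds` function and the build names are hypothetical, and Koji's real scheduler works on task weights and builder capacity rather than this simple slot count.

```python
def assign_builds(builds, builders, builds_per_builder=1):
    """Split builds into those that get a builder now and those left waiting."""
    # One slot per build a builder is allowed to run at the same time.
    slots = [b for b in builders for _ in range(builds_per_builder)]
    # zip stops at the shorter list, so only len(slots) builds get a builder.
    running = dict(zip(builds, slots))
    # Everything beyond the available slots stays unassigned ("free").
    waiting = list(builds[len(slots):])
    return running, waiting


if __name__ == "__main__":
    builds = [f"chromium-build-{n}" for n in range(1, 5)]  # hypothetical names
    builders = ["buildhw-a64-19", "buildhw-a64-20"]
    running, waiting = assign_builds(builds, builders)
    print("running:", running)
    print("waiting:", waiting)
```

In this model the third and fourth builds have no builder until one of the first two finishes, which matches an aarch64 task sitting at "free".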

Metadata Update from @kevin:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago

It does. I thought the other builds had failed.

