To be fair, I don't blame it, but it would be nice to be able to build Chromium again.
Currently, it fails while opening 29 buildroots without any obvious failure in the build logs, see:
https://koji.fedoraproject.org/koji/taskinfo?taskID=82556973
https://koji.fedoraproject.org/koji/taskinfo?taskID=82572340
It does seem specific to x86_64; the F35 aarch64 build succeeded. nirik suggested that perhaps the OOM killer is killing kojid. Can AWS buy some more memory for your koji servers? :D
So, I forced the f35 one onto a buildhw (hardware box) and it finished.
The others finally failed... I am going to resubmit one and watch its logs to see if I can tell what's going on.
Metadata Update from @zlopez: - Issue tagged with: koji, medium-gain, medium-trouble, ops
Looks like the EL7 build is having the same failures.
Yeah, it's puzzling. I did a tail -f of the build log on a builder, and it progressed normally until it just stopped. No errors or anything in the build log or in the builder logs. :(
Will keep digging.
Metadata Update from @kevin: - Issue assigned to kevin - Issue priority set to: Waiting on Assignee (was: Needs Review)
ok, it is being oom killed... by systemd-oomd, which I thought I had disabled. ;(
Feb 15 06:09:11 buildvm-x86-25.iad2.fedoraproject.org systemd-oomd[612]: Killed /system.slice/kojid.service due to memory used (15411052544) / total (15704350720) and swap used (10640809984) / total (11811151872) being more than 90.00%
Feb 15 06:09:12 buildvm-x86-25.iad2.fedoraproject.org systemd[1]: kojid.service: systemd-oomd killed 36 process(es) in this unit.
Feb 15 06:09:12 buildvm-x86-25.iad2.fedoraproject.org systemd[1]: kojid.service: Main process exited, code=killed, status=9/KILL
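To double-check the numbers in that systemd-oomd message, here is a small Python sketch that pulls the "used (N) / total (M)" pairs out of the log line and recomputes the percentages. It only re-derives the ratios quoted in the message against the 90.00% threshold it mentions; it is not a model of systemd-oomd's actual kill-selection logic. The regex and the usage_ratios helper are illustrative assumptions, not part of any systemd tooling.

```python
import re

# The systemd-oomd line from the builder's journal, joined onto one line.
LOG_LINE = (
    "Killed /system.slice/kojid.service due to memory used (15411052544) / "
    "total (15704350720) and swap used (10640809984) / total (11811151872) "
    "being more than 90.00%"
)

# Hypothetical pattern: captures the resource name plus its used/total byte counts.
PAIR_RE = re.compile(r"(memory|swap) used \((\d+)\) / total \((\d+)\)")


def usage_ratios(line: str) -> dict:
    """Return used/total as a fraction for each resource named in the line."""
    return {name: int(used) / int(total) for name, used, total in PAIR_RE.findall(line)}


if __name__ == "__main__":
    limit = 0.90  # the 90.00% threshold quoted in the log message
    for name, ratio in usage_ratios(LOG_LINE).items():
        status = "over" if ratio > limit else "under"
        print(f"{name}: {ratio:.2%} ({status} {limit:.0%})")
```

Run as a script, it reports memory at roughly 98% and swap just above 90%, so both figures in the message really were past the threshold when kojid was killed.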
I disabled it on the builder that it's running on now and will see how it does in the morning...
OK, that didn't help any.
I think the problem is one we often hit with these big projects: the buildvm-x86 VMs have 5 CPUs and 15GB of memory. So that's 3GB each for 5 threads running at the same time. If one of those threads goes over 3GB of memory use, boom, the OOM killer takes it out.
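The per-thread arithmetic can be sketched in Python as below. It assumes memory is split evenly across parallel compile jobs (one per vCPU), which is a simplification: real jobs share memory unevenly and swap also plays a role. The total comes from the systemd-oomd message on buildvm-x86-25; the 4 GiB peak used in max_parallel_jobs is a hypothetical figure for illustration, not a measured Chromium value, and both helper names are made up for this sketch.

```python
GIB = 1024 ** 3

# Total memory reported by systemd-oomd on buildvm-x86-25, in bytes.
TOTAL_MEM = 15704350720
JOBS = 5  # one compile job per vCPU on the buildvm-x86 VMs


def per_job_budget(total_bytes: int, parallel_jobs: int) -> float:
    """Memory each job gets if the total is split evenly (simplifying assumption)."""
    return total_bytes / parallel_jobs


def max_parallel_jobs(total_bytes: int, peak_per_job: int) -> int:
    """How many jobs of a given peak size fit in memory at once (at least 1)."""
    return max(1, total_bytes // peak_per_job)


if __name__ == "__main__":
    print(f"{per_job_budget(TOTAL_MEM, JOBS) / GIB:.2f} GiB per job")
    # Hypothetical: if one compile step peaked at 4 GiB, only 3 could run safely.
    print(max_parallel_jobs(TOTAL_MEM, 4 * GIB), "jobs fit at a 4 GiB peak")
```

The point of the second helper is the trade-off behind the workaround: either give the builder more memory per job (the buildhw boxes) or run fewer jobs at once.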
I am not sure why it would suddenly be happening now, though. Are there any changes in Chromium that would cause the compile to take a lot more memory?
In any case, as a workaround, I removed the buildvms from the channel Chromium uses, so it's just the buildhw-x86 boxes in there now. Can you resubmit and confirm that they all complete now?
I went ahead and resubmitted that epel7 one. The older failed ones looked like they might have failed for other reasons, so I left them alone.
I see successful builds... so I think this works around the problem for now. Please re-open or file a new ticket if it doesn't...
Metadata Update from @kevin: - Issue close_status updated to: Fixed - Issue status updated to: Closed (was: Open)