The builds are getting 'unauthorized: authentication required' when writing to candidate-registry.fedoraproject.org, e.g. https://koji.fedoraproject.org/koji/taskinfo?taskID=92264779
2022-09-22 09:03:54,284 - atomic_reactor.plugins.tag_and_push - INFO - Registry candidate-registry.fedoraproject.org secret /var/run/secrets/atomic-reactor/v2-registry-dockercfg
2022-09-22 09:03:54,284 - atomic_reactor.plugins.tag_and_push - INFO - Calling: skopeo copy --authfile=/var/run/secrets/atomic-reactor/v2-registry-dockercfg/.dockercfg oci:/tmp/tmpewkf29io/flatpak-oci-image:app/org.mozilla.Thunderbird/x86_64/stable docker://candidate-registry.fedoraproject.org/thunderbird:f36-flatpak-candidate-12525-20220922090100-x86_64
2022-09-22 09:03:54,618 - atomic_reactor.plugins.tag_and_push - ERROR - push failed with output: b'Getting image source signatures\nCopying blob sha256:5cd45b51aac524c9c9430447182f81bcbe5aa95fe43347dc9af07b5a856af178\ntime="2022-09-22T09:03:54Z" level=fatal msg="writing blob: initiating layer upload to /v2/thunderbird/blobs/uploads/ in candidate-registry.fedoraproject.org: unauthorized: authentication required"\n'
Metadata Update from @phsmoura: - Issue priority set to: Waiting on Assignee (was: Needs Review) - Issue tagged with: medium-gain, medium-trouble, ops
This is all very puzzling.
Nothing should have changed here. I can see the credentials are the same as they have been since 2020 and look right.
When was the last working build? I'll keep trying to dig...
The last one I did that worked was 3 days ago, https://koji.fedoraproject.org/koji/buildinfo?buildID=2064679
I did upgrade proxies on the 20th (day after the successful build). I can't see how it would have changed anything much, but we also noticed that in error_logs on the proxy there is:
[Sat Oct 01 01:31:13.438405 2022] [auth_basic:error] [pid 3405590:tid 3408454] [client 10.3.169.115:56484] AH01618: user  not found: /v2/thunderbird/blobs/uploads/
Note the "user  not" (two spaces), as if the username is empty. Like it's not sending or getting the user?
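To make the empty-username theory concrete, here is a minimal sketch of how a Basic auth header with no username would yield a log line with two spaces. It assumes the proxy formats the message as "user <name> not found: <uri>"; the helper names `basic_auth_user` and `log_line` and the sample password are hypothetical and not taken from the proxy configuration.

```python
import base64


def basic_auth_user(header: str) -> str:
    """Return the username carried in an HTTP Basic Authorization header value."""
    scheme, _, payload = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(payload).decode("utf-8")
    # Credentials are encoded as "user:password"; split on the first colon.
    user, _, _password = decoded.partition(":")
    return user


def log_line(user: str, uri: str) -> str:
    # Assumed message shape, modelled on the AH01618 line quoted above.
    return f"AH01618: user {user} not found: {uri}"


# Hypothetical header with an empty username and a placeholder password.
header = "Basic " + base64.b64encode(b":placeholder").decode("ascii")
print(log_line(basic_auth_user(header), "/v2/thunderbird/blobs/uploads/"))
```

If that assumption holds, an empty username is enough to explain the doubled space, whether the client sent it that way or it was lost along the way.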
Would it be possible to test this theory somehow? Like, downgrading the proxies and seeing if that makes it work? If we can't get this to work before the release then F37 Silverblue is sadly going to ship with apps preinstalled that are stuck at F36 versions :(
Or can we ask someone to help who understands how the auth is supposed to work there?
@dustymabe Do you by any chance have ideas what's going on here?
Err, sorry Dusty, I wanted to ask @cverna here :) Not sure how I managed to pick you.
This is now finally fixed. Many many thanks to @darknao for helping debug it.
The problem is that the osbs playbooks had a race condition. If they were run without limiting to prod or staging, whichever host happened to finish writing the local file last determined the file that all masters got. I.e., the last time this ran, one of the staging hosts happened to finish writing the local file last, so all 4 masters (stg and prod) got the staging credentials for the registry secret. I have a PR to fix this, but in the meantime prod should be back to being correct and working. :)
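For anyone hitting something similar, the last-writer-wins behaviour described above can be sketched roughly as follows. This is a simulation only, not the playbook code: the master names, file names, and credential strings are hypothetical, and using a separate local file per environment is just one possible fix, which may differ from what the PR does.

```python
import tempfile
from pathlib import Path

# Hypothetical masters: two prod and two staging, as in "all 4 masters".
MASTERS = {
    "prod-master01": "prod",
    "prod-master02": "prod",
    "stg-master01": "stg",
    "stg-master02": "stg",
}


def local_name(env: str, per_env_files: bool) -> str:
    # Buggy layout: one shared local file. Possible fix: one file per environment.
    return f"registry-secret-{env}" if per_env_files else "registry-secret"


def run_playbook(finish_order, workdir: Path, per_env_files: bool = False) -> dict:
    """Write each environment's secret locally, then copy it to every master."""
    for env in finish_order:
        # Each write overwrites the shared file, so the last finisher wins.
        (workdir / local_name(env, per_env_files)).write_text(f"{env}-credentials")
    return {
        master: (workdir / local_name(env, per_env_files)).read_text()
        for master, env in MASTERS.items()
    }


def demo(per_env_files: bool) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        # Staging happens to finish writing last in this run.
        return run_playbook(["prod", "stg"], Path(tmp), per_env_files)


print(demo(per_env_files=False))
print(demo(per_env_files=True))
```

With the shared file, every master ends up with the staging credentials when staging finishes last; with per-environment files, the finish order no longer matters.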
Metadata Update from @kevin: - Issue close_status updated to: Fixed with Explanation - Issue status updated to: Closed (was: Open)
Excellent! Thank you for figuring it out.