#7410 500 Error uploading to candidate-registry.fedoraproject.org
Closed: Fixed 5 years ago
Opened 5 years ago by otaylor.

https://kojipkgs.fedoraproject.org//work/tasks/4246/31164246/x86_64.log shows the failure:

INFO - Calling: skopeo copy --dest-creds=<HIDDEN> oci:/tmp/tmpitbg3wm1/flatpak-oci-image:runtime/org.fedoraproject.Platform/x86_64/f29 docker://candidate-registry.fedoraproject.org/flatpak-runtime:f29-flatpak-candidate-99023-20181128203614-x86_64
ERROR - push failed with output:
Getting image source signatures
Copying blob sha256:e5f4436db0856c251ba39ba0753bee276767b5c61bd6dcbdcb18d87703faa5bd
 0 B / 339.39 MB  0s
time="2018-11-28T20:54:48Z" level=fatal msg="Error writing blob: Error initiating layer upload to /v2/flatpak-runtime/blobs/uploads/ in candidate-registry.fedoraproject.org: received unexpected HTTP status: 500 Internal Server Error"

First step is probably to check the logs of the candidate registry and see if anything shows up there.


Metadata Update from @cverna:
- Issue assigned to cverna

5 years ago

Just as another datapoint, https://kojipkgs.fedoraproject.org//work/tasks/5405/31165405/x86_64.log shows the same failure with a much smaller upload (1.38MB instead of 339MB)

@otaylor do you know if we have your patched version of the docker registry in production?

cc @codeblock

Yes - there are previously built OCI images on the candidate registry - e.g.:
https://candidate-registry.fedoraproject.org/v2/eog/tags/list

Ok, then we need someone to check the logs on the registry. You might want to ask the oncall to do that.

I don't have the correct permissions to log in to the registry boxes.

I'll poke people about this tomorrow - need to knock off work for the day.

time="2018-11-28T21:54:28Z" level=error
msg="response completed with error" err.code=unknown err.detail="filesystem: mkdir /srv/registry/docker/registry/v2/rep
ositories/eog/_uploads/447a5b2a-0b3c-4d91-a95a-4ffb913ebca8: no space left on device"

and indeed it is full:

/dev/mapper/GuestVolGroup01-srv 89G 89G 24K 100% /srv

running registry garbage-collect --dry-run /etc/docker-distribution/registry/config.yml
ends with:

4190 blobs marked, 0 blobs eligible for deletion

Happy to clear space if someone can tell me how to do so

I suspect all that space is actually used by the builds in the candidate registry. So to free some up, we need to delete builds. A basic plan of attack that could be used:

  • Use regindexer to dump info on all the images in the candidate registry (did this).
  • Write some python code to use the labels in the json to look up the corresponding koji builds, see whether they are untagged or have the 'trashcan' tag, and dump out repository/hash pairs (mostly did this - but koji is disconnecting - need to add retries).
  • Use curl or Python to send the appropriate DELETE HTTP requests to delete those images (see the sketch after this list).

This should easily give 300-400 images that could be deleted. There are also images of various types where the label => koji build lookup fails - base images, Flatpaks, stray builds of unknown origin, etc. - but cleaning up the low-hanging fruit would be a good start.
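
For the last step, a minimal sketch of what such a deletion against the registry's v2 HTTP API could look like - the credentials and the function shown here are placeholders (the actual scripts are attached later in this thread), and note that the v2 API only accepts manifest DELETEs by digest, so a tag has to be resolved first:

    import requests

    REGISTRY = "https://candidate-registry.fedoraproject.org"
    AUTH = ("user", "password")  # placeholder credentials
    MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"

    def delete_image_manifest(repository, tag):
        # Resolve the tag to its manifest digest; the registry returns
        # it in the Docker-Content-Digest response header.
        resp = requests.head(f"{REGISTRY}/v2/{repository}/manifests/{tag}",
                             headers={"Accept": MANIFEST_V2}, auth=AUTH)
        resp.raise_for_status()
        digest = resp.headers["Docker-Content-Digest"]

        # DELETE must use the digest, not the tag; this also requires
        # the registry to be configured with deletes enabled.
        resp = requests.delete(f"{REGISTRY}/v2/{repository}/manifests/{digest}",
                               auth=AUTH)
        resp.raise_for_status()

Deleting a manifest only unreferences its blobs; the space itself is reclaimed when the garbage collector runs afterwards.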

So we previously deleted all the tags from the f25 and f26 images, but they are still showing up in the registry, not sure why.

Since f27 is EOL, we could do the same here.

I attached the script I used to delete the tags.
clean-registry.py

So it seems the way to delete an entry from the catalog is: once all of its manifests have been deleted and garbage collection has run, we can delete the repository directory from disk. For example:

sudo rm -rf docker/registry/v2/repositories/f25/cockpit
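
If a whole EOL release namespace is being retired, that last step could be scripted. A hypothetical sketch - not taken from the attached scripts - assuming the repository layout shown above and that the manifests were already deleted and garbage-collected:

    import os
    import shutil

    # Assumed storage root, matching the paths in the error message above.
    BASE = "/srv/registry/docker/registry/v2/repositories"

    def remove_release_namespace(release):
        # e.g. remove_release_namespace("f25") removes f25/cockpit, etc.
        path = os.path.join(BASE, release)
        for repo in os.listdir(path):
            shutil.rmtree(os.path.join(path, repo))
        os.rmdir(path)  # remove the now-empty namespace directory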

Deleting all the f27 builds would certainly be a fast way to free up some space and allow builds to proceed, and I don't see any downsides to doing that for the candidate registry. Long term, for both the candidate and final registries, perhaps we should tie deleting images to the deletion of the corresponding builds in Koji - either by extending the deletion process, or with a garbage-collection step where we scan for images with deleted builds. I'll try playing around a little bit with your script and see if I can make it detect images in the deleted state.

Here's the script adapted to find and delete images where the koji build is deleted:

clean-deleted-images.py

Probably good to comment out the delete_image_manifest() call and do some spot verification before running it.

Since the script reverse-engineers the koji package name from the repository name, one thing that could go wrong here is if the name and com.redhat.component labels don't match - if name=f27/mysql and com.redhat.component=mysql-container, it's conceivable that the script could find the wrong koji build, decide it is deleted, and delete the image. It's much more likely, however, that the koji build simply won't be found. This is particularly the case because for more recent builds, the n-v-r is something like mysql-1-2.fc28container, with the dist tag.
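
For reference, a minimal sketch of the koji side of that lookup - the hub URL is an assumption here, and the real logic (including the repository-name reverse-engineering and retries) lives in the attached clean-deleted-images.py:

    import koji

    session = koji.ClientSession("https://koji.fedoraproject.org/kojihub")

    def build_is_deleted(nvr):
        # getBuild() returns None when no build matches the NVR; in that
        # case, leave the image alone rather than guessing.
        build = session.getBuild(nvr)
        if build is None:
            return False
        return build["state"] == koji.BUILD_STATES["DELETED"]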

@otaylor Thanks, I'll play around with it and look at configuring a cron job that runs maybe weekly.

In the meantime I have cleaned up the f27 builds, so let's close this ticket.

Metadata Update from @cverna:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

5 years ago
