#6999 A static index for registry.fedoraproject.org [placeholder]
Closed: Fixed 6 years ago Opened 7 years ago by otaylor.

In order for GNOME Software to discover what Flatpaks are available and offer them to users for installation, it needs to be able to obtain:

  • A list of repositories in the registry with selected metadata from OCI annotations such as installed size.
  • Appstream data for the repositories in the registry to obtain titles, descriptions, icons, and so forth.

The idea is to generate a static file. Compared to a query API, this puts less load on the server, and clients will typically need all of the information up front anyway. Static file generation could be done simply from a cron job, or by using Docker registry webhook notifications (https://docs.docker.com/registry/notifications/) to get faster and more efficient updates.

The Cockpit project has similar needs, and we are coordinating to figure out a common solution.


@cverna Could you talk to @puiterwijk and @bowlofeggs (note, he's on vacation until Wednesday but Patrick probably can still advise) and see if there's a way you could help with this piece of our registry infrastructure?

I would like to point out that we have a piece of software that will generate a static index, though for now it's human-readable HTML.
If you can tell us what information you need and in what format (json, xml, ...), I can easily add it to that same system...

So using the Docker registry HTTP APIs we should be able to get something. After digging a little bit we have:

Regarding the appstream metadata, I am not sure yet how we could get it. Any ideas?

@otaylor Do you know exactly which OCI metadata gnome-software would need? Maybe you already have an example of a JSON file used by gnome-software?

I actually got pretty far with this in the fall - there's a fairly complete solution at:

https://github.com/owtaylor/metastore/

I started off looking at 'reg' or writing a similar static approach by hand, but I wasn't convinced that it would scale even to the anticipated needs of registry.fedoraproject.org. The basic problem is that you need a lot of HTTP requests to get all the data you need:

A) List all the repositories
B) List all the tags in each repository
C) Get the manifest for each tag
D) Get the config for the manifest to find out labels
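Steps A) to D) map onto the standard registry HTTP API (v2) endpoints roughly as follows. This is only a minimal sketch, not code from metastore or 'reg': the function names are made up for illustration, and it assumes an anonymous registry, no pagination, and single-architecture manifests that carry a `config` entry.

```python
import json
from typing import Callable, Dict, Iterator, Tuple
from urllib.request import Request, urlopen

# Media types we are willing to accept; manifest requests need these.
ACCEPT = ", ".join([
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/json",
])


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (hypothetical helper, no auth)."""
    with urlopen(Request(url, headers={"Accept": ACCEPT})) as resp:
        return json.load(resp)


def scan_registry(
    base: str, get_json: Callable[[str], dict] = http_get_json
) -> Iterator[Tuple[str, str, Dict[str, str]]]:
    """Yield (repository, tag, labels) for every tag in the registry."""
    # A) List all the repositories
    catalog = get_json(f"{base}/v2/_catalog")
    for repo in catalog.get("repositories", []):
        # B) List all the tags in the repository
        tags = get_json(f"{base}/v2/{repo}/tags/list").get("tags") or []
        for tag in tags:
            # C) Get the manifest for the tag
            manifest = get_json(f"{base}/v2/{repo}/manifests/{tag}")
            # D) Get the config blob the manifest points at and read its labels
            digest = manifest["config"]["digest"]
            config = get_json(f"{base}/v2/{repo}/blobs/{digest}")
            labels = config.get("config", {}).get("Labels") or {}
            yield repo, tag, labels
```

Each tag costs two requests (manifest plus config) on top of the per-repository tag listing, which is where the request count discussed next comes from.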

I want to get to the point where we have ~1000 flatpaks in the Fedora registry, in addition to all the server side containers - multiply that out by different version tags, etc, and we're talking 10k+ requests at a minimum to do a complete scan. That's going to put a lot of load on the registry for multiple minutes, by my estimation. (A complete scan of the current registry contents to this level of detail takes several minutes, though it would likely be faster if the scanner were colocated with the registry.)
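To make the multiplication concrete, here is a back-of-the-envelope count. The figure of 5 tags per repository is purely an assumption for illustration, and pagination and the server side containers are ignored, so the real number would be higher.

```python
def scan_request_count(repositories: int, tags_per_repository: int) -> int:
    """Requests needed for one full scan following steps A) to D)."""
    catalog = 1                      # A) one catalog listing
    tag_lists = repositories         # B) one tag listing per repository
    tags = repositories * tags_per_repository
    per_tag = 2 * tags               # C) manifest + D) config for every tag
    return catalog + tag_lists + per_tag


# ~1000 flatpaks with an assumed 5 tags each
print(scan_request_count(1000, 5))  # prints 11001
```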

So how often could we run such a scan? Every 30 minutes? Every hour? For the candidate registry, we'd like new builds to be visible not in an hour, but ideally within seconds, so that after a build a maintainer can immediately test whether it works.

The other problem is that you end up hardcoding in the server config the metadata and containers you need, or generating gigantic output files with ALL the metadata for ALL the containers.

The basic principles of metastore are:

  • Harvest metadata out of the registry via the http API
  • Update it incrementally triggered by webhooks
  • Store it in a database
  • Allow arbitrary queries, but pay a lot of attention to being able to cache them with a frontend cache
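As an illustration of the webhook-triggered incremental update, the sketch below works out which repositories need re-harvesting from a registry notification body. It is not taken from metastore: the function name is hypothetical, and it relies only on the `events`, `action`, and `target.repository` fields of the notification envelope described in the Docker registry notifications documentation.

```python
import json
from typing import Set


def repositories_to_rescan(body: bytes) -> Set[str]:
    """Return the repositories touched by a registry notification envelope."""
    envelope = json.loads(body)
    changed: Set[str] = set()
    for event in envelope.get("events", []):
        # Pulls do not change any metadata, so only pushes and deletes matter.
        if event.get("action") not in ("push", "delete"):
            continue
        repo = event.get("target", {}).get("repository")
        if repo:
            changed.add(repo)
    return changed
```

Only the returned repositories would then need steps B) to D) repeated, rather than the whole registry.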

I've been holding off on pushing to deploy this to try and get some feedback internally/externally first, but haven't succeeded so far - partly because of lack of bandwidth to bug people.

In terms of appstream, I think we just put it into an OCI annotation at build time - though digging inside the container to extract it when indexing is theoretically possible.

Thanks @otaylor, I'll look at the metastore and try to get it running on my machine.

Thanks @cverna! The included docker compose should work out of the box, and hopefully is useful documentation of how things need to be wired together. Let me know if anything is unclear or you hit any snags.

Hi @cverna, any progress on this ticket?

I could be wrong but I think this was covered by https://pagure.io/fedora-infrastructure/issue/7157

@otaylor Am I correct? If so, I think we can close this ticket.

Metadata Update from @syeghiay:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

6 years ago
