#12565 Fedora Kinoite Rawhide aarch64 builds failing since 2025-01-31
Closed: Fixed with Explanation 17 days ago by siosm. Opened 4 months ago by siosm.

  • Describe the issue

Fedora Kinoite Rawhide aarch64 builds have been failing since 2025-01-31. The error is odd and suggests a potential issue with the repo, which would be bad:

+ pungi-make-ostree tree --repo=/mnt/koji/compose/ostree/repo/ --log-dir=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2 --treefile=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/work/ostree-2/config_repo/kinoite-ostree.yaml --version=Rawhide.20250206.n.0 --extra-config=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/work/ostree-2/extra_config.json '--ostree-ref=fedora/rawhide/${basearch}/kinoite' --force-new-commit --unified-core
COMMAND: rpm-ostree compose tree --repo=/mnt/koji/compose/ostree/repo/ --write-commitid-to=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2/commitid.log --touch-if-changed=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2/commitid.log.stamp --unified-core --add-metadata-string=version=Rawhide.20250206.n.0 --force-nocache /mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/work/ostree-2/config_repo/kinoite-ostree.json
-------------------------------------------------------------------------------
rpm-ostree version: 2025.4
Previous commit: d376d4347dfee44409c081c871b94e239041fdf4430d55ea20605681bc31bb94
error: Installing packages: Loading previous sepolicy: No such metadata object d376d4347dfee44409c081c871b94e239041fdf4430d55ea20605681bc31bb94.commit
Traceback (most recent call last):
  File "/usr/bin/pungi-make-ostree", line 33, in <module>
    sys.exit(load_entry_point('pungi==4.8.0', 'console_scripts', 'pungi-make-ostree')())
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/site-packages/pungi/ostree/__init__.py", line 202, in main
    func()
    ~~~~^^
  File "/usr/lib/python3.13/site-packages/pungi/ostree/tree.py", line 157, in run
    self._make_tree()
    ~~~~~~~~~~~~~~~^^
  File "/usr/lib/python3.13/site-packages/pungi/ostree/tree.py", line 62, in _make_tree
    shortcuts.run(
    ~~~~~~~~~~~~~^
        cmd,
        ^^^^
    ...<4 lines>...
        errors="replace",
        ^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3.13/site-packages/kobo/shortcuts.py", line 408, in run
    raise exc
RuntimeError: ERROR running command: rpm-ostree compose tree --repo=/mnt/koji/compose/ostree/repo/ --write-commitid-to=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2/commitid.log --touch-if-changed=/mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2/commitid.log.stamp --unified-core --add-metadata-string=version=Rawhide.20250206.n.0 --force-nocache /mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/work/ostree-2/config_repo/kinoite-ostree.json
For more details see /mnt/koji/compose/rawhide/Fedora-Rawhide-20250206.n.0/logs/aarch64/Kinoite/ostree-2/create-ostree-repo.log
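
A small sketch for pulling the missing commit checksum out of a compose log like the one above, so it can be compared against the ref files. The function name is hypothetical, and the pattern assumes the error text reads exactly as in this log:

```shell
# Print the checksum from the first "No such metadata object <sha256>.commit"
# error found in a log read from stdin. Prints nothing if no such line exists.
missing_commit_from_log() {
    sed -n 's/.*No such metadata object \([0-9a-f]\{64\}\)\.commit.*/\1/p' | head -n 1
}
```

It could be fed the create-ostree-repo.log mentioned above, for example with `missing_commit_from_log < create-ostree-repo.log`.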

This happened in the past, but we did not find the source:
- https://gitlab.com/fedora/ostree/sig/-/issues/49

  • When do you need this? (YYYY/MM/DD)

No urgency, it's Rawhide only right now.

  • When is this no longer needed or useful? (YYYY/MM/DD)

N/A

  • If we cannot complete your request, what is the impact?

We don't get Kinoite Rawhide builds for aarch64.


Options that come to mind, not sure how feasible:
- Restore the file from backups? (do we have some?)
- Run a full ostree fsck on the repo?
- Remove commits until we get one that does not include the offending file.

Metadata Update from @phsmoura:
- Issue tagged with: low-gain, medium-trouble, ops

4 months ago

Thanks for the report @siosm. We'll get one of our releng folks to investigate the logs to see what's happening along with you or someone else within the atomic desktop folks.

/cc @ancarrol

I don't see that commit offhand in old snapshots... :(

Do we have any info on that commit? Why is it looking for it, what's in it, and when was it made?

The command is rpm-ostree compose tree --repo=/mnt/koji/compose/ostree/repo/ ... so the missing commit would be from the repo, and it's likely that the kinoite aarch64 ref in this repo points to it.

So could you look at the content of /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite?

Querying it remotely does not give me that, but it's the summary file, not the head itself:

ostree remote summary fedora | grep rawhide/aarch64/kinoite -A4
* fedora/rawhide/aarch64/kinoite
    Latest Commit (85,4 kB):
      122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
    Version (ostree.commit.version): Rawhide.20250131.n.0
    Timestamp (ostree.commit.timestamp): 2025-01-31T06:51:40+01
# cat /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
d376d4347dfee44409c081c871b94e239041fdf4430d55ea20605681bc31bb94
# ls -la /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
-rw-r--r--. 1 263 263 65 Feb  1 06:01 /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite

I assume that should normally be in repo/objects/d3/76d4347dfee44409c081c871b94e239041fdf4430d55ea20605681bc31bb94*

which of course it's not, but I can't seem to see it in the snapshots for 2025-02-01 or 2025-02-02 either.
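
For reference, a minimal sketch of how a commit checksum maps to a loose object path, assuming the usual ostree layout where the first two hex characters form the directory name and the remaining 62 form the file name (the `ls` on objects/12/2a88... later in this ticket follows the same layout). The function name is hypothetical:

```shell
# Print the path where a loose commit object for CHECKSUM would live in REPO.
# Layout: objects/<first 2 hex chars>/<remaining 62 hex chars>.commit
commit_object_path() {
    repo=$1
    checksum=$2
    prefix=$(printf '%s' "$checksum" | cut -c1-2)
    rest=$(printf '%s' "$checksum" | cut -c3-)
    # Strip one trailing slash from the repo path to avoid "//" in the output.
    printf '%s/objects/%s/%s.commit\n' "${repo%/}" "$prefix" "$rest"
}
```

Passing the result to `ls` shows whether the commit object is actually on disk.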

OK, so this is exactly as I "expected".

Can you replace this commit with the latest valid one from the aarch64 kinoite ref: 122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421 ?

I don't know if there are good ways to do that with ostree rather than just manually replacing the content of the file.

I can... but why was it wrong?

So done now.

I don't really know why this ended up in this weird state.

The last build failed on a different commit now, so this is some form of progress:

rpm-ostree version: 2025.5
Previous commit: 122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
error: Installing packages: Loading previous sepolicy: No such metadata object 122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit

And this is the one we set there, so I don't understand why we have this issue.

So what does it mean there by 'previous sepolicy'? The selinux-policy package in that commit?

I don't know why we are hitting this and it's hard to debug remotely. I think we should give up on the history for this ref and just remove it and let the next compose re-create it.

ok. I am happy to try and debug more locally if you can think of what to gather?

But sure, we can just punt I guess.

So, that would be:

rm -f /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite

?
And do we need to also remove the /mnt/koji/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite one? Or will the sync handle that?

> ok. I am happy to try and debug more locally if you can think of what to gather?

We could do that, but that would take time that I would prefer to use working on other bugs.

> So, that would be:
>
> rm -f /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
>
> ?

That should do it.

> And do we need to also remove the /mnt/koji/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite one? Or will the sync handle that?

That's another detail of how this is set up in the Fedora infra that I don't know.

I don't know where the repo is / which one is the one exposed to users / why we have two.

We compose to /mnt/koji/compose/ostree/ (internalish compose only repo), then we sync to /mnt/koji/ostree (public, everyone repo).

This is to allow us to control when things appear. If, for example, a compose fails, we don't want to sync things. We also don't want ostrees to just update as they appear: we want the rpms and ostrees to update at around the same time (once we know the rpm is released/composed/ready).
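
A minimal sketch for checking whether a ref has the same head in both repos, by reading the refs/heads files directly as done elsewhere in this ticket. The function name is hypothetical, and it assumes the ref is stored as a plain file under refs/heads in each repo:

```shell
# Compare what REF points to in two repos by reading refs/heads/<REF> directly.
# Prints "in-sync <commit>" or "differs <compose-commit> <prod-commit>";
# a missing or unreadable ref file is reported as "none".
compare_ref_heads() {
    compose=${1%/}
    prod=${2%/}
    ref=$3
    a=$(cat "$compose/refs/heads/$ref" 2>/dev/null) || a=none
    b=$(cat "$prod/refs/heads/$ref" 2>/dev/null) || b=none
    if [ "$a" = "$b" ]; then
        printf 'in-sync %s\n' "$a"
    else
        printf 'differs %s %s\n' "$a" "$b"
    fi
}
```

For example: `compare_ref_heads /mnt/koji/compose/ostree/repo /mnt/koji/ostree/repo fedora/rawhide/aarch64/kinoite`. Matching heads only mean the refs agree; they say nothing about whether the objects behind them are present.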

This is becoming a problem as we now have x86_64 Kinoite Rawhide builds impacted:
- https://pagure.io/releng/failed-composes/issue/8158
- https://kojipkgs.fedoraproject.org//work/tasks/3207/132063207/runroot.log

ok, so shall I:

rm -f /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
rm -f /mnt/koji/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite

and then we see what the next compose does?

Could I get ssh access to a server with read only access to those repos? That would help with debugging.

I'm starting to worry that if we don't find a good resolution here, we will have to delete more and more things and that will not fly for stable refs.

If we can find some time to look at it together that would be good.

Otherwise we can try cleaning the refs and seeing what happens in the next compose.

ok. I added you to our sysadmin-troubleshoot group and gave that group non-sudo access on the releng-compose hosts.

See https://docs.fedoraproject.org/en-US/infra/sysadmin_guide/sshaccess/ on how to set up ssh to proxy via our bastion hosts.
Then, you should be able to ssh to branched-compose01.iad2.fedoraproject.org and have ro access to the mounts.

Let me know if you can find anything...

Thanks, taking a look. Note for reference: host is compose-branched01.iad2.fedoraproject.org :)

So, somehow, some files are missing in the compose repo:

$ cat /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite 
122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
$ cat /mnt/koji/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite 
122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
$ ls /mnt/koji/compose/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
ls: cannot access '/mnt/koji/compose/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit': No such file or directory
$ ls /mnt/koji/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
/mnt/koji/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
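
The manual check above could be wrapped in a small helper. This is only a sketch under the same assumptions as the commands above (ref stored as a file under refs/heads, commit stored as a loose object); the function name is hypothetical. It checks the commit object only, not the trees and content it references, so a full `ostree fsck` would still be the thorough option.

```shell
# Exit 0 if the commit that REF points to exists as a loose object in REPO,
# 1 if the ref file is readable but the commit object is missing,
# 2 if the ref file itself cannot be read.
ref_head_present() {
    repo=${1%/}
    ref=$2
    head=$(cat "$repo/refs/heads/$ref" 2>/dev/null) || return 2
    # objects/<first 2 hex chars>/<remaining chars>.commit
    obj="$repo/objects/$(printf '%s' "$head" | cut -c1-2)/$(printf '%s' "$head" | cut -c3-).commit"
    [ -f "$obj" ]
}
```

Run against both /mnt/koji/compose/ostree/repo and /mnt/koji/ostree/repo with the same ref to compare the two repos.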

We should be able to pull from the release repo to the compose one, which should pull all the missing objects/commits. Probably something like:

$ cd /mnt/koji/compose/ostree/repo
$ ostree pull-local /mnt/koji/ostree/repo fedora/rawhide/aarch64/kinoite

Big question here is how we ended up in this state as this is really weird. Maybe this is related to the garbage collection process as that is the only thing I can think of that would clean up commits in a repo.

CC @dustymabe @jlebon

ok, so should I try the ostree pull-local then? Is there a 'dry-run' way to do it to confirm it's going to copy the right stuff?

Unfortunately I don't think there is a dry-run command :/

The fedora-ostree-pruner is configured to delete commits in the history of refs in the compose repo after 90 days.

https://github.com/coreos/fedora-coreos-releng-automation/blob/449db4ec0c3cd1749efad390857c0d1ad17fcbb4/fedora-ostree-pruner/fedora-ostree-pruner#L43

The compose repo is just used for new composes and then that content gets synced to the prod repo. The compose repo shouldn't have the full history for every ref. That's what the prod repo is for (at least for the refs that get synced there).

Note IIUC the pruner shouldn't delete any content that was referenced by commits less than 90 days old.
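
To make the 90-day window concrete, here is a sketch of the age comparison only. It is not the pruner's actual code; the function name and the use of Unix timestamps are assumptions for illustration.

```shell
# Exit 0 if a commit made at COMMIT_EPOCH is more than DAYS whole days old
# at NOW_EPOCH. All times are Unix timestamps in seconds.
older_than_days() {
    commit_epoch=$1
    now_epoch=$2
    days=$3
    [ $(( (now_epoch - commit_epoch) / 86400 )) -gt "$days" ]
}
```

The last good commit on this ref is dated 2025-01-31 and the failing compose is 20250206, about six days apart, so an age-based prune at 90 days would not be expected to touch it.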

> $ cd /mnt/koji/compose/ostree/repo
> $ ostree pull-local /mnt/koji/ostree/repo fedora/rawhide/aarch64/kinoite

This looks reasonable, especially since it's just rawhide I'm less concerned about things going wrong.

I'll also mention. If this was the pruner I would expect other Atomic Desktops to be having a similar problem. i.e. all of them would start failing at the same time and not just a single arch of a single variant.

> I'll also mention. If this was the pruner I would expect other Atomic Desktops to be having a similar problem. i.e. all of them would start failing at the same time and not just a single arch of a single variant.

That is.. unless the ostree prune code has a bug, which is possible ;)

> $ cd /mnt/koji/compose/ostree/repo
> $ ostree pull-local /mnt/koji/ostree/repo fedora/rawhide/aarch64/kinoite

> This looks reasonable, especially since it's just rawhide I'm less concerned about things going wrong.

Thanks Dusty. Let's start with that and see if that improves things?

Kevin ran the command and things look in order now:

[siosm@compose-branched01 ~][PROD-IAD2]$ cat /mnt/koji/compose/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
[siosm@compose-branched01 ~][PROD-IAD2]$ cat /mnt/koji/ostree/repo/refs/heads/fedora/rawhide/aarch64/kinoite
122a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421
[siosm@compose-branched01 ~][PROD-IAD2]$ ls /mnt/koji/compose/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
/mnt/koji/compose/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
[siosm@compose-branched01 ~][PROD-IAD2]$ ls /mnt/koji/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit
/mnt/koji/ostree/repo/objects/12/2a88f9fb5f5f15f7d76dd8f1cff50f9bf3bc6eb431cf07923f3d1996a5a421.commit

It worked! :tada:

@kevin can you fix the x86_64 ref as well? Thanks!

$ cd /mnt/koji/compose/ostree/repo
$ ostree pull-local /mnt/koji/ostree/repo fedora/rawhide/x86_64/kinoite
# sudo -u ftpsync ostree pull-local /mnt/koji/ostree/repo fedora/rawhide/x86_64/kinoite
673 metadata, 1148 content objects imported; 0 bytes content written   

Thanks! This worked. We are now back on track for Rawhide builds.

Metadata Update from @siosm:
- Issue close_status updated to: Fixed with Explanation
- Issue status updated to: Closed (was: Open)

17 days ago

