#4594 Need RHEL7 VM for Fedora atomic composes
Closed: Fixed · Opened 10 years ago by lmacken.

We need a RHEL7 box for Fedora atomic tree composes. I wrote an atomic-composer ansible role, which is currently running on the composer.stg box, but we need something for production.

This box will most likely need read/write access to /mnt/koji/mash/atomic as well.


Which composes will happen there?

My understanding was that official composes just happen on the regular compose boxes?

And branched/rawhide happen on those boxes?

Replying to [comment:1 kevin]:

Which composes will happen there?
And branched/rawhide happen on those boxes?

At least updates & updates-testing composes, and it is trivial to enable rawhide & branched as well if we wanted, since it is fedmsg-driven.

My understanding was that official composes just happen on the regular compose boxes?

We need the OSTree stack to compose trees, which from what I understand doesn't work on RHEL6. I've been using builds from my copr on composer.stg (after signing them and putting them in our infra repo) https://copr.fedoraproject.org/coprs/lmacken/fedmsg-atomic-composer/

This seems to be the wrong approach to creating updates trees. We need it to be tightly coupled with updates pushes: if the tree compose fails, we need to fail the updates push and fix whatever is causing the failure. We should run rpm-ostree compose in a mock chroot of the target OS, i.e. a Fedora 21 chroot for Fedora 21, not on RHEL. If there are issues running things in a chroot on a RHEL 6 box, we can use the builders via the existing ssh setup we have for pungify in the nightly composes.

Replying to [comment:3 ausil]:

This seems to be the wrong approach to creating updates trees. We need it to be tightly coupled with updates pushes: if the tree compose fails, we need to fail the updates push and fix whatever is causing the failure.

This will add hours to the updates push process. Are we sure we want to block all updates repos from syncing to the mirror until all atomic trees are composed? If something breaks the atomic composes we'd most likely catch it in updates-testing first and be able to easily fix it. I'm a fan of the decoupled approach, but if we want it tightly coupled, that's feasible but it'll have drawbacks.

We should run rpm-ostree compose in a mock chroot of the target OS, i.e. a Fedora 21 chroot for Fedora 21, not on RHEL. If there are issues running things in a chroot on a RHEL 6 box, we can use the builders via the existing ssh setup we have for pungify in the nightly composes.

Composing in a mock chroot should be easy enough. My fedmsg-driven composer solution is based on how atomic01.qa is set up, and relies on the rpm-ostree-toolbox and the systemd journal, but if we want it tightly coupled into the bodhi masher then we'll have to scrap my solution and start from scratch.
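To make the mock-chroot proposal concrete, here is a minimal sketch of how the compose step could be assembled. The mock root name, treefile path, and output repo path are hypothetical placeholders, not the actual configuration used by fedmsg-atomic-composer; in practice the output repo would also need to be bind-mounted into the chroot.

```python
# Sketch: build the mock invocations for composing an ostree inside a
# chroot of the target release (e.g. Fedora 21), rather than on the host.
# All names and paths here are illustrative assumptions.

def mock_compose_commands(release, treefile, repo_path):
    """Return the sequence of mock commands for an rpm-ostree tree compose."""
    root = "fedora-%s-x86_64" % release
    init = ["mock", "-r", root, "--init"]
    install = ["mock", "-r", root, "--install", "rpm-ostree"]
    # Bind-mounting repo_path into the chroot is elided for brevity.
    compose = ["mock", "-r", root, "--chroot",
               "rpm-ostree compose tree --repo=%s %s" % (repo_path, treefile)]
    return [init, install, compose]

for cmd in mock_compose_commands("21", "fedora-atomic-docker-host.json",
                                 "/mnt/koji/mash/atomic/21"):
    print(" ".join(cmd))
```

Each command would be run in order via subprocess, failing the compose if any step exits non-zero.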

The updates process is already very lengthy; what's the reasoning behind making it longer?

Are there potential failures in the Atomic compose that would not also happen in a normal updates tree compose?

If so, what's the negative effect of an Atomic compose that fails separately (asynchronously) from an updates tree compose?

My understanding (and Dennis, please correct me):

  • We have to mash the updates first in order to have the updated repo to point rpm-ostree to anyhow, no? We could fire off each atomic compose after the updates tree it needs is done?

  • If we treat the atomic compose and updates push as separate, that could mean one or the other was 'newer' or contained things the other didn't. We wouldn't want to push out, say, x86_64 rpms when the i686 repo failed to compose... is this different somehow?

  • Say updates compose finished fine, but atomic failed. Would we want to push the updates without fixing the atomic compose so they stay in sync?

I guess it comes down to whether the atomic composes are an actual deliverable we want to keep in sync with updates, or just a side thing where we don't care if they get out of sync or don't happen on the same schedule.

For F21, let's do whatever is the most straightforward and gets us to the release easiest.

For the future, maybe we actually want to look at making more of the process decoupled, so more can happen in parallel -- or be restarted or debugged independently.

Replying to [comment:6 kevin]:

My understanding (and Dennis, please correct me):

  • We have to mash the updates first in order to have the updated repo to point rpm-ostree to anyhow, no? We could fire off each atomic compose after the updates tree it needs is done?

Right, bodhi needs to mash the repos first. The question then is whether to
fire off the atomic tree compose as soon as the updates repos are done rsyncing
(after the push is complete), or to block the entire push until the
atomic trees compose successfully. Since updates pushes are currently
done as giant batches containing multiple releases/types/severities, this could
block them all if a single release/arch/repo atomic tree fails. It would also slow
down the process of pushing out security updates in general.

My initial approach was to create a modular service that triggered composes as
soon as the releng tools finish rsyncing repos. The goal was to avoid having to
modify bodhi at all, but if we want it to block the push until the atomic tree
has successfully composed, it'll require modifications to bodhi1 that would
then need to be ported to bodhi2.

In bodhi2 we have been discussing fedmsg-driven gating, where we could wait for
a fedmsg from taskotron or the atomic composer, for example, before making the
repos public. This will be easy in bodhi2, since it's a fedmsg consumer and can
react accordingly. To do this with bodhi1 we'd need some extra code to kick off
threads for each tree compose and wait for them to all succeed before flipping the
repo symlinks.
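The bodhi1-side gating described above could be sketched roughly as follows: kick off one thread per tree compose, wait for all of them, and only flip the repo symlinks if every compose succeeded. `compose_tree` and `flip_symlink` are hypothetical stand-ins for the real masher/composer calls, not actual bodhi APIs.

```python
import threading

def gate_push_on_composes(trees, compose_tree, flip_symlink):
    """Block the push until every atomic tree compose succeeds.

    Runs one compose per thread; flips the repo symlinks only if all
    composes succeed, otherwise leaves the push blocked.
    """
    results = {}

    def worker(tree):
        try:
            compose_tree(tree)
            results[tree] = True
        except Exception:
            results[tree] = False

    threads = [threading.Thread(target=worker, args=(t,)) for t in trees]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if all(results.get(t) for t in trees):
        for t in trees:
            flip_symlink(t)
        return True
    return False  # block the push; symlinks stay untouched
```

This is the "all or nothing" behavior: a single failed tree holds back the entire batch.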

  • If we treat the atomic compose and updates push as separate, that could mean one or the other was 'newer' or contained things the other didn't. We wouldn't want to push out, say, x86_64 rpms when the i686 repo failed to compose... is this different somehow?

With either approach, the atomic repos would never be able to be newer than the
rpm repos. In the case of triggering the atomic compose based on the
bodhi.updates.fedora.sync/compose.{branched,rawhide}.rsync.complete fedmsgs,
the rpm repos would remain newer for as long as it takes to compose the
corresponding atomic trees.
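The fedmsg-driven trigger could be sketched as a small dispatcher keyed on the topics named above. The topic strings follow the pattern in this comment; the message payload fields (`release`, `repo`, `branch`) and the `compose` callback are assumptions for illustration, not the real fedmsg schemas.

```python
# Sketch: dispatch an incoming fedmsg to the atomic composer when a
# repo rsync finishes. Payload fields are hypothetical.

def handle_message(topic, msg, compose):
    """Kick off an atomic tree compose for matching rsync-complete topics."""
    if topic.endswith("bodhi.updates.fedora.sync"):
        # e.g. msg = {"release": "21", "repo": "updates-testing"}
        compose(msg["release"], msg["repo"])
        return True
    if topic.endswith(("compose.branched.rsync.complete",
                       "compose.rawhide.rsync.complete")):
        compose(msg.get("branch", "rawhide"), None)
        return True
    return False  # not a trigger topic; ignore
```

In the real service this logic would live in a fedmsg consumer rather than a bare function.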

  • Say updates compose finished fine, but atomic failed. Would we want to push the updates without fixing the atomic compose so they stay in sync?

I'm leaning more towards yes at the moment. Catching regressions within software in our repos is what updates-testing is for.

Let's say we ship a new rpm-ostree update with a bug and it hits the
updates-testing repo. The atomic composer then fails, and the normal updates
repos + trees go out in the meantime. The bug then gets fixed upstream,
the broken update is obsoleted in bodhi, and a new tree gets composed as soon
as the new updates-testing repo is out.

In this case the updates-testing tree compose would hit the bug, and the repo
would be out of sync with the atomic tree for as long as it takes to push a new
update into testing.

I guess it comes down to whether the atomic composes are an actual deliverable we want to keep in sync with updates, or just a side thing where we don't care if they get out of sync or don't happen on the same schedule.

I guess the question is, does "treating as a real deliverable" imply "keeping
rpm repos & atomic trees in perfect sync, all or nothing"?

A fedmsg-driven approach could get all rawhide/updates/branched repos out
faster in general, but if we want to maintain perfect consistency between our repos & ostrees, then we'll have to take the performance hit.

Replying to [comment:8 lmacken]:

Replying to [comment:6 kevin]:

My understanding (and Dennis, please correct me):

  • We have to mash the updates first in order to have the updated repo to point rpm-ostree to anyhow, no? We could fire off each atomic compose after the updates tree it needs is done?

Right, bodhi needs to mash the repos first. The question then is whether to
fire off the atomic tree compose as soon as the updates repos are done rsyncing
(after the push is complete), or to block the entire push until the
atomic trees compose successfully. Since updates pushes are currently
done as giant batches containing multiple releases/types/severities, this could
block them all if a single release/arch/repo atomic tree fails. It would also slow
down the process of pushing out security updates in general.

We want to block the final rsync on the atomic tree being successful; we want to ensure consistency across the different methods of delivery. It will slow down the process, but the slowdown is not really that much: tree compose time is much faster than mashing the tree.

My initial approach was to create a modular service that triggered composes as
soon as the releng tools finish rsyncing repos. The goal was to avoid having to
modify bodhi at all, but if we want it to block the push until the atomic tree
has successfully composed, it'll require modifications to bodhi1 that would
then need to be ported to bodhi2.

In bodhi2 we have been discussing fedmsg-driven gating, where we could wait for
a fedmsg from taskotron or the atomic composer, for example, before making the
repos public. This will be easy in bodhi2, since it's a fedmsg consumer and can
react accordingly. To do this with bodhi1 we'd need some extra code to kick off
threads for each tree compose and wait for them to all succeed before flipping the
repo symlinks.

I really do not want any of the release process to rely on fedmsg as it does not have guaranteed delivery.

  • If we treat the atomic compose and updates push as separate, that could mean one or the other was 'newer' or contained things the other didn't. We wouldn't want to push out, say, x86_64 rpms when the i686 repo failed to compose... is this different somehow?

With either approach, the atomic repos would never be able to be newer than the
rpm repos. In the case of triggering the atomic compose based on the
bodhi.updates.fedora.sync/compose.{branched,rawhide}.rsync.complete fedmsgs,
the rpm repos would remain newer for as long as it takes to compose the
corresponding atomic trees.

There would be a window where things are out of sync, and that's not acceptable.

  • Say updates compose finished fine, but atomic failed. Would we want to push the updates without fixing the atomic compose so they stay in sync?

I'm leaning more towards yes at the moment. Catching regressions within software in our repos is what updates-testing is for.

The answer here is no; inconsistency is not acceptable.

Let's say we ship a new rpm-ostree update with a bug and it hits the
updates-testing repo. The atomic composer then fails, and the normal updates
repos + trees go out in the meantime. The bug then gets fixed upstream,
the broken update is obsoleted in bodhi, and a new tree gets composed as soon
as the new updates-testing repo is out.

In this case the updates-testing tree compose would hit the bug, and the repo
would be out of sync with the atomic tree for as long as it takes to push a new
update into testing.

I guess it comes down to whether the atomic composes are an actual deliverable we want to keep in sync with updates, or just a side thing where we don't care if they get out of sync or don't happen on the same schedule.

I guess the question is, does "treating as a real deliverable" imply "keeping
rpm repos & atomic trees in perfect sync, all or nothing"?

The answer here is yes: they have to be kept in sync; it is all or nothing.

A fedmsg-driven approach could get all rawhide/updates/branched repos out
faster in general, but if we want to maintain perfect consistency between our repos & ostrees, then we'll have to take the performance hit.

We need to keep the consistency.

Replying to [comment:9 ausil]:

I guess the question is, does "treating as a real deliverable" imply "keeping
rpm repos & atomic trees in perfect sync, all or nothing"?

The answer here is yes: they have to be kept in sync; it is all or nothing.

Okay, I'll make it so my composer can be imported and run by bodhi, which will
block the syncing of the repos until the atomic composes have successfully
completed.

I already have it composing within a mock chroot, so I'll close this ticket
since we don't need a new VM.
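The integration described above, with bodhi importing the composer and blocking the repo sync on a successful compose, could look roughly like this. The function and callback names here are hypothetical stand-ins, not the real bodhi or fedmsg-atomic-composer APIs.

```python
# Sketch: block the final rsync until every atomic compose succeeds.
# mash, compose_atomic, and rsync are illustrative callbacks.

class ComposeFailure(Exception):
    """Raised to abort the push when an atomic tree compose fails."""

def mash_and_sync(releases, mash, compose_atomic, rsync):
    # Mash first, so the atomic composes have updated repos to point at.
    for release in releases:
        mash(release)
    # Gate: any compose failure aborts the push before anything syncs out.
    for release in releases:
        if not compose_atomic(release):
            raise ComposeFailure("atomic compose failed for %s" % release)
    rsync(releases)
```

A `ComposeFailure` leaves the repos unsynced, matching the all-or-nothing requirement from comment 9.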

Replying to [comment:9 ausil]:

I really do not want any of the release process to rely on fedmsg as it does not have guaranteed delivery.

This is basically a misconception at this point, as all of our fedmsg consumers sync up with datanommer to ensure they haven't missed anything. We have also had zero cases of dropped fedmsgs (outside of firewall misconfiguration), which makes fedmsg more reliable than other things our releng process relies on, like gluster or nfs.

As I mentioned earlier, we have been planning to rely on fedmsg triggers in bodhi2 for certain things like blocking on taskotron checks, and in this case atomic tree composes.
