#708 Change Updates Policy to avoid packages like grub being pushed in <24 hours and breaking our distro
Opened 2 years ago by kparal. Modified 9 months ago

There has been a series of critical packages pushed to stable too fast, breaking our distro. I failed to write them down in the past, but I'm starting today.

The latest example is grub2 from last week. It spent a mere 13 hours in updates-testing and was then pushed because of +3 karma. Just 1 hour afterwards a negative comment was added, but it was too late. The update ended up in stable and probably broke booting to Windows for all our users on UEFI. The details are here:
https://bodhi.fedoraproject.org/updates/FEDORA-2022-8ffd58c713
https://bugzilla.redhat.com/show_bug.cgi?id=2115202

While we can blame the package maintainer for keeping the default "Stable by Karma = 3" for such a critical package like grub (can you imagine a more critical package?), instead of raising it to something like 15 or disabling it completely, I believe this is a policy failure. In particular this policy:
https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/
and especially this section:
https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/#karma-requirements

The defaults are completely wrong for critical packages like grub. Three thumbs-up in quick succession shouldn't mean that grub gets pushed immediately and automatically to millions of users. Yet, if the package maintainer doesn't pay attention to every single update they create and reconfigure it to something more reasonable every single time, that's exactly what happens. And I believe it's QA's job to come up with a better policy with better defaults and convince FESCo that it needs to be changed.

Let's use this ticket to track this issue, and let's collect thoughts on what a better policy should look like, either here or on the mailing lists.


Probably setting a new record, the updated grub package spent just 7 hours in testing. This is a recipe for disaster, which has already happened (on a smaller scale) and will inevitably happen on a bigger scale as well if we don't do something about it.

As a continuation of this story, while the very next grub update had karma autopush disabled, the one after that was again published with just the default karma=3 autopush, and will likely get autopushed in <24 hours once again. I tried to reason with the grub maintainer in the Bodhi update's comments section (linked in the comment above), and as you can see, I was clearly unsuccessful.

This flurry of "testing changes in production" resulted not just in a broken boot for Windows users, but also completely broke Silverblue, Kinoite and IoT updates for stable F36 users, with a manual fix necessary.

I believe this nicely demonstrates the need for a policy change. We can't just rely on maintainers doing the right thing, because some of them don't, even when it is explained to them (or they can simply make mistakes), which means we need more safeguards so that our users' systems can't be broken so easily.

An idea I just came up with in the update, copied here:

"One thing I can think of is we could put a delay on autopush. It could be set not to happen until 24 or 48 hours after the update hit updates-testing, and be cancelled if any negative karma appeared during that period. This wouldn't require maintainers to do anything manually in the 'success case'."

Forgot to mention, the benefit of that versus just a flat minimum period in testing before going stable is we could still manually push e.g. critical CVE updates stable earlier.
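
To make the delayed-autopush idea above concrete, here is a rough sketch of the decision logic. It is only an illustration under stated assumptions, not Bodhi code: the names `Karma`, `Decision` and `autopush_decision`, the 24-hour default delay, and the rule that negative karma counts only if it lands before the delay expires are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Decision(Enum):
    WAIT = "wait"            # not eligible yet, check again later
    PUSH = "push"            # autopush to stable may proceed
    CANCELLED = "cancelled"  # autopush disabled, a human has to decide


@dataclass(frozen=True)
class Karma:
    value: int       # +1 or -1 (hypothetical representation of a vote)
    when: datetime   # when the feedback was left


def autopush_decision(entered_testing, now, karma,
                      stable_karma=3, delay=timedelta(hours=24)):
    """Decide whether an update may be autopushed at time `now`."""
    earliest_push = entered_testing + delay
    seen = [k for k in karma if k.when <= now]  # ignore feedback from the future

    # Any negative karma left during the delay window cancels autopush.
    if any(k.value < 0 and k.when <= earliest_push for k in seen):
        return Decision.CANCELLED

    # The delay has not elapsed yet, so karma alone cannot trigger a push.
    if now < earliest_push:
        return Decision.WAIT

    # After the delay, the usual karma threshold applies.
    if sum(k.value for k in seen) >= stable_karma:
        return Decision.PUSH
    return Decision.WAIT
```

With timings like the grub2 case (+3 karma within 13 hours, a negative comment an hour later), this sketch would keep the update waiting at the 13-hour mark and cancel autopush once the negative karma arrived. A manual push, e.g. for a critical CVE fix, is deliberately outside this function, so it would not be blocked by the delay.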

Metadata Update from @kparal:
- Issue set to the milestone: Undefined Future (was: Fedora 37)

a year ago

It looks like https://bodhi.fedoraproject.org/updates/FEDORA-2023-bbb8d72c6f is another case, where it broke the KDE spin but still passed within the same day. A bunch of negative karma arrived once people noticed the issue.

That's quite an interesting case: it's an extremely important security fix so we specifically wanted to push it out very quickly. On balance (IMHO) it's still better that the security-fixed version is stable, even with that bug.

It looks like https://bodhi.fedoraproject.org/updates/FEDORA-2023-bbb8d72c6f is another case

Thanks for the comment. This case is a bit different - my ticket was only concerned with stable updates for currently supported releases. Releases in development should probably have different requirements, and that wasn't my concern when creating this ticket.

so...are we doing anything here?

ironically I have a two month old bodhi git branch where I am trying to clean up all the logic and code around "does this update 'meet requirements'" (in various senses) and actually pushing updates stable. it's already pretty complicated. implementing the mechanism I blithely suggested above would make it...more so.

changing the updates policy is more within FESCo's purview than ours, so I'm not sure this should be a ticket on our team? or was the idea to work up a proposal for fesco?

Yeah, the idea was to discuss it in our team, find a proposal that we stand behind as a team, and propose it to FESCo. I expect tough opposition from package maintainers, though. "The spice and the updates must flow" sentiment. That's why I feel this is going to be lots of work for potentially zero gain. I still think we should do it, though. So far, this ticket has only served as a collection place for my rumbling :-/
