Issue #708: Change Updates Policy to avoid packages like grub being pushed in <24 hours and breaking our distro - fedora-qa

Opened 2 years ago by kparal. Modified 2 months ago

There has been a series of critical packages pushed stable too fast and breaking our distro. I failed to write them down in the past, but I'm starting today.

The latest example is grub2 from last week. It spent a mere 13 hours in updates-testing and then it was pushed because of +3 karma. Just 1 hour afterwards a negative comment was added, but it was too late. The update ended up in stable and probably broke booting to Windows for all our users on UEFI. The details are here:
https://bodhi.fedoraproject.org/updates/FEDORA-2022-8ffd58c713
https://bugzilla.redhat.com/show_bug.cgi?id=2115202

While we can blame the package maintainer for keeping the default "Stable by Karma = 3" for a package as critical as grub (can you imagine a more critical package?), instead of raising it to something like 15 or disabling it completely, I believe this is a policy failure. In particular this policy:
https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/
and especially this section:
https://docs.fedoraproject.org/en-US/fesco/Updates_Policy/#karma-requirements

The defaults are completely wrong for critical packages like grub. Three thumbs-up in quick succession shouldn't mean that grub gets pushed immediately and automatically to millions of users. Yet, if the package maintainer doesn't pay attention to every single update they create and reconfigure it to something more reasonable every single time, that's exactly what happens. And I believe it's QA's job to come up with a better policy with better defaults and convince FESCo that it needs to be changed.

Let's use this ticket to track this issue, and let's collect thoughts on what a better policy should look like, either here or on the mailing lists.

kparal commented 2 years ago

Probably setting a new record, the updated grub package spent just 7 hours in testing. This is a recipe for disaster, which already happened (on a smaller scale) and will inevitably happen on a bigger scale as well, if we don't do something about it.

kparal commented 2 years ago

As a continuation of this story, while the very next grub update had karma autopush disabled, the one after that was again published with just the default karma=3 autopush, and will likely get autopushed in <24 hours once again. I tried to reason with the grub maintainer in the comments section of the Bodhi update (linked in the comment above), and as you can see, I was clearly unsuccessful.

This flurry of "testing changes in production" resulted not just in a broken boot for Windows users, but also completely broke Silverblue, Kinoite and IoT updates for stable F36 users, with a manual fix necessary.

I believe this nicely demonstrates the need for a policy change. We can't just rely on maintainers doing the right thing, because some of them don't, even when it's explained to them (or they can make mistakes), which means we need more safeguards so that our users' systems can't be broken so easily.

adamwill commented 2 years ago

An idea I just came up with in the update, copied here:

"One thing I can think of is we could put a delay on autopush. It could be set not to happen until 24 or 48 hours after the update hit updates-testing, and be cancelled if any negative karma appeared during that period. This wouldn't require maintainers to do anything manually in the 'success case'."

adamwill commented 2 years ago

Forgot to mention, the benefit of that versus just a flat minimum period in testing before going stable is we could still manually push e.g. critical CVE updates stable earlier.
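
The delayed-autopush idea above can be sketched as follows. This is only a minimal illustration, not Bodhi's actual code or data model: the `Update` class, the `may_autopush` function, and the `delay` parameter are hypothetical names made up for this example, and the vote times are invented to roughly match the grub timeline described in this ticket.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Update:
    """Hypothetical stand-in for an update in updates-testing (not Bodhi's model)."""
    entered_testing: datetime
    stable_karma: int = 3  # the default "Stable by Karma" threshold from this ticket
    # Each vote is (timestamp, +1 or -1).
    votes: List[Tuple[datetime, int]] = field(default_factory=list)


def may_autopush(update: Update, now: datetime,
                 delay: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the update may be pushed stable automatically at `now`."""
    seen = [value for when, value in update.votes if when <= now]
    if any(value < 0 for value in seen):
        return False  # any negative karma seen so far cancels the autopush
    if sum(seen) < update.stable_karma:
        return False  # karma threshold not reached yet
    # Threshold reached: still wait until the delay since entering testing has passed.
    return now - update.entered_testing >= delay


# Illustrative timeline: three +1 votes within 13 hours, one -1 an hour later.
t0 = datetime(2022, 1, 1)  # arbitrary start time, chosen only for the example
grub = Update(entered_testing=t0,
              votes=[(t0 + timedelta(hours=h), 1) for h in (5, 9, 13)])

print(may_autopush(grub, t0 + timedelta(hours=13), delay=timedelta(0)))  # no delay
print(may_autopush(grub, t0 + timedelta(hours=13)))                      # 24h delay
grub.votes.append((t0 + timedelta(hours=14), -1))
print(may_autopush(grub, t0 + timedelta(hours=24)))                      # after the -1
```

With no delay, the check passes at hour 13 as soon as the third +1 arrives, which is what happened to grub. With a 24-hour delay, the same check fails at hour 13, and the negative vote at hour 14 cancels the autopush before the delay runs out. A manual push (e.g. for a critical CVE update) is deliberately not modelled here, so it could still happen earlier.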

Metadata Update from @kparal:
- Issue set to the milestone: Undefined Future (was: Fedora 37)

7 months ago

redstrate commented 6 months ago

It looks like https://bodhi.fedoraproject.org/updates/FEDORA-2023-bbb8d72c6f is another case, where it broke the KDE spin but still passed within the same day. There's a bunch of negative karma from when people noticed the issue.

adamwill commented 6 months ago

That's quite an interesting case: it's an extremely important security fix so we specifically wanted to push it out very quickly. On balance (IMHO) it's still better that the security-fixed version is stable, even with that bug.

Edited 6 months ago by adamwill

kparal commented 6 months ago

"It looks like https://bodhi.fedoraproject.org/updates/FEDORA-2023-bbb8d72c6f is another case"

Thanks for the comment. This case is a bit different: my ticket was only concerned with stable updates for currently supported releases. Releases in development should probably have different requirements, and that wasn't my concern when creating this ticket.

adamwill commented 2 months ago

so...are we doing anything here?

ironically I have a two month old bodhi git branch where I am trying to clean up all the logic and code around "does this update 'meet requirements'" (in various senses) and actually pushing updates stable. it's already pretty complicated. implementing the mechanism I blithely suggested above would make it...more so.

changing the updates policy is more within FESCo's purview than ours, so I'm not sure this should be a ticket on our team? or was the idea to work up a proposal for fesco?

kparal commented 2 months ago

Yeah, the idea was to discuss it in our team, find a proposal that we stand behind as a team, and propose it to FESCo. I expect tough opposition from package maintainers, though. "The spice and the updates must flow" sentiment. That's why I feel this is going to be lots of work for potentially zero gain. I still think we should do it, though. So far, this ticket has only served as a collection place for my rumbling :-/

Metadata

Assignee: kparal
Tags: (none listed)
Blocking: None
Depending on: None
Priority: normal
Milestone: Undefined Future
