#288 GUI-based recovery environment
Closed: Won't fix 6 months ago by catanzaro. Opened 2 years ago by cereal-lava-planet.

Currently the CLI recovery environment can be complicated and unappealing. A GUI recovery environment could improve the chances of fixing the system and give newcomers and people with different levels of CLI knowledge easier ways to repair their devices. Other distros could benefit from this too. Here's what I thought:

Features

  • Restoring the entire system by downloading the system image, with the option to wipe personal files or keep them (like in Ubiquity);
  • Basic functionality tools to fix the system such as Terminal, Déjà-Dup or Pika Backup, Fedora Media Writer (if the first feature can't be implemented);
  • A user-friendly Boot Repair-like solution (the environment could still be accessed through the BIOS boot menu);
  • Maybe Firefox could be included so the user could search for help documentation.

Cons

  • Unauthorized access
    Possible solution: the environment could be LUKS-encrypted with the root password. A similar thing exists in Windows: when attempting to trigger System Restore from the recovery options, the user is asked for their password.
  • Disk consumption
    Possible solution: only what is strictly necessary would be included.

Thanks for these ideas! I agree that the recovery prompt isn't friendly.

Personally, I'd approach this by looking at the specific errors that can cause someone to end up at the recovery prompt, and what we can do about each of them, as opposed to trying to design a generic environment in which someone has to somehow figure out how to fix their system.

o/

Something like the Startup Repair tool from Windows? It collects data and then tries to fix the OS.

We discussed this ticket at today's WG meeting.

We agreed that there would be value in having a graphical recovery environment. Just being able to boot to a generic desktop would be better than what we have now, and would at least allow someone to use the web or prepare install media to allow reinstallation.

If we were to include recovery tools, we'd need to have a better idea of what the most common issues are. Ideally we'd be able to automatically diagnose those common issues.

The other thing we discussed is the importance of having the ability to repair an existing installation from the installer. This would overwrite the existing install while preserving /home and recreating the existing user accounts.

Two actions agreed:

  • @ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
  • @aday to add repair installation to our installer requirements document

Metadata Update from @aday:
- Issue tagged with: pending-action

2 years ago

Something to consider, as I noticed the upgrade to F36 moved my kernel installs on this system to /boot/efi, there is barely enough room for the existing kernels there. I don't think you could fit a gui initramfs in there even if you deleted all of the kernels.

Yikes, yeah, we'd have to put it on /boot or finagle some other way to get it...

A few data points from me:

  • I often use System Rescue CD since it has gparted
  • I saw this on the Fedora reddit but never actually used it: https://sourceforge.net/projects/boot-repair/
  • GUI or CLI I don't care, but I get tired of looking up how to mount a full system in a chroot. A helper script could pretty much automate this maybe if only the root device was specified (everything else can be determined through fstab?) and certainly /dev, /proc, /sys can 100% be automated.
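
The chroot helper described in the last point could look roughly like the sketch below. It is a dry run: it only prints the mount commands it would issue, derived from an fstab file, so it can be tried without root. The function name `plan_chroot_mounts`, the `/mnt/sysroot` target, and the premise that the root filesystem is already mounted there (so its fstab can be read) are assumptions for illustration. LUKS, LVM, and swap activation are deliberately left out, and fstab is assumed to list parent mount points before their children.

```shell
#!/bin/sh
# Dry-run sketch: print the commands needed to assemble a chroot from an fstab.
# Assumptions (not from the thread): the root filesystem is already mounted at
# $target, and fstab entries are plain "spec mountpoint type options" lines
# listed in mount order (e.g. /boot before /boot/efi).

plan_chroot_mounts() {
    fstab=$1      # path to the installed system's fstab
    target=$2     # where the root filesystem is mounted, e.g. /mnt/sysroot

    # Remaining filesystems from fstab: skip comments, blanks, root, and swap.
    while read -r spec mnt fstype opts _rest; do
        case $spec in ''|'#'*) continue ;; esac
        case $fstype in swap) continue ;; esac
        case $mnt in
            /) continue ;;      # root is already mounted at $target
            /*) ;;              # absolute mount point: keep
            *) continue ;;      # "none" or anything else: skip
        esac
        echo "mount -t $fstype -o $opts $spec $target$mnt"
    done < "$fstab"

    # The kernel API filesystems never depend on fstab.
    echo "mount -t proc proc $target/proc"
    echo "mount --rbind /sys $target/sys"
    echo "mount --rbind /dev $target/dev"
    echo "chroot $target"
}
```

After mounting the root device by hand, something like `plan_chroot_mounts /mnt/sysroot/etc/fstab /mnt/sysroot` would print the plan for review before any of it is run.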

I think that the live image could be improved to help restore the system, i.e. if there was an easy way to access the system drive in some chroot or toolbx environment ...

@jforbes

I noticed the upgrade to F36 moved my kernel installs on this system to /boot/efi

This is happening recently?

@hobbes1069

A helper script could pretty much automate this

This functionality is built into Anaconda/blivet. The magic is how it goes about finding /etc/fstab, but then it just assembles things accordingly. Maybe that can be repurposed. Curiously, this is only available in Anaconda's rescue mode, which isn't offered when launching Anaconda from Live media; only netinstallers and DVDs have this option.

@jforbes kernels moving to /boot/efi is not intentional, AFAIK; you probably hit some iteration of the systemd machine-id mess when you upgraded to f36.

Check if you have a /boot/efi/<machine-id>, where <machine-id> is the ID from /etc/machine-id. If so I think the recovery is to wipe that directory entirely, and re-run the correct kernel-install commands for your installed kernels. Something along those lines.
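
A small sketch of the check described above. It only reports whether the stray directory exists and prints, without running, the kind of kernel-install call that might follow. The function names, the overridable path arguments, and the assumption that each kernel image lives at /lib/modules/<version>/vmlinuz are illustrative, so verify against your own system before deleting anything.

```shell
#!/bin/sh
# Sketch: detect a machine-id directory on the ESP. Paths are parameters so the
# check can be tried against a scratch directory first.

check_esp_machine_id() {
    id_file=${1:-/etc/machine-id}
    esp=${2:-/boot/efi}

    machine_id=$(cat "$id_file") || return 2
    if [ -d "$esp/$machine_id" ]; then
        echo "found: $esp/$machine_id"
        return 0
    fi
    echo "not found: $esp/$machine_id"
    return 1
}

# Print (do not run) a reinstall command for each installed kernel version.
# Assumption: kernel images live at <modules_dir>/<version>/vmlinuz.
plan_kernel_reinstall() {
    modules_dir=${1:-/lib/modules}
    for dir in "$modules_dir"/*/; do
        [ -f "${dir}vmlinuz" ] || continue
        version=$(basename "$dir")
        echo "kernel-install add $version ${dir}vmlinuz"
    done
}
```

Both functions are read-only; removing the directory and running the printed commands is left as a manual step.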

I should probably test that this doesn't happen on a fresh install/upgrade. I don't think it should any more, but best to be safe...

It is entirely possible that this is something wonky with my machine, which is why I didn't post it.
That machine was installed on Fedora 22 I think and has been upgraded with dnf ever since. I have replaced the motherboard and CPU, but not the root disk. I am going to verify on another old install that I have soonish, but I need a couple of other bugs to be fixed in beta first.
I did recover the machine the way I wanted once I figured out what it was doing. That machine was upgraded Tuesday of this week, so yes, it was recently.

Oh dear, I was hoping you'd upgraded earlier, before the grub change to ignore machine-ids...

No, it was after that change. I just noticed newly installed machines have a larger /boot/efi as well.

yeah, we changed it from 200M to 500M at some point I believe. I have an old system with only 200M like you, heh.

We could implement a Pop!_OS-style recovery, like the one System76 ships. Is that possible, or would you want a recovery engineered completely from scratch rather than based on another distro's (in this case Pop!_OS's)?

Pop!_OS recovery is literally just a copy of the install ISO laid down as another partition: https://support.system76.com/articles/pop-recovery/

So there's nothing special there. The "Refresh install" mode is something that's built into their installer, and Anaconda is capable of the same thing, there's just no UI for it.

@ngompa: Okay, so we just need to write a UI for the Anaconda installer that does the recovery part, right?

(Not a new) idea: have the "rescue" boot entry point to a read-only snapshot of the system root, with a volatile overlay for persistence so the environment doesn't get mad at the total lack of write access.

  • near-term, the snapshot is made from the root subvolume created during installation
  • initially the root.rescue snapshot takes up no additional space; the space it consumes is proportional to what packages are updated on the root subvolume as time goes by
  • user could choose to delete root.rescue if they prefer to have the space rather than the feature; no partitioning questions
  • root.rescue pins the rescue kernel's modules in the snapshot, so we can always boot this older kernel to a GUI even when the kernel is removed in the root subvolume
  • with additional work, it'd be straightforward to update root.rescue periodically, and even make its footprint smaller by removing non-essential packages
  • the user could opt for a "reset", simplistically something like:
mv root root.broken
btrfs subvolume snapshot root.rescue root
reboot
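
As a hedged sketch of how the snapshot might be created and how the reset steps above could be guarded, the functions below print the commands as a dry run rather than executing them. They assume the top-level Btrfs volume is mounted at a path supplied by the caller and that the subvolumes are named root and root.rescue as in the proposal; the function names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch of the root.rescue idea: nothing here modifies the filesystem.
# Assumption: $top is where the top-level Btrfs volume is mounted, so the
# subvolumes show up as $top/root and $top/root.rescue.

plan_rescue_snapshot() {
    top=$1
    # -r makes the snapshot read-only, matching the proposal.
    echo "btrfs subvolume snapshot -r $top/root $top/root.rescue"
}

plan_reset() {
    top=$1
    # Refuse to plan a reset when there is nothing to restore from.
    if [ ! -d "$top/root.rescue" ]; then
        echo "error: $top/root.rescue does not exist" >&2
        return 1
    fi
    echo "mv $top/root $top/root.broken"
    # A snapshot taken without -r is writable, so the new root boots normally.
    echo "btrfs subvolume snapshot $top/root.rescue $top/root"
    echo "reboot"
}
```

Keeping root.broken around rather than deleting it leaves the failed system available for later inspection.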

Just my 2¢, but I like Chris' not-new idea much better than having another partition to deal with. If vendors want some way that users can factory-reset their devices, I'd much rather see them provide that as a downloadable thumb drive image than reserving space on the HD. Or even provide such a (readonly?) bootable thumb drive with the PC when it is purchased. Providing an external factory-reset device would seem to have the additional benefit of working in cases where the user needed to replace or upgrade their HD.

While I do like it as an option, I also think if it is the only option, it becomes problematic for people who do not have physical access to machines they are administering. There are other options there as well, but putting something on the HD should at least be an option.

Putting something on the HD is fine. Just don't make it a separate partition. Put it in file(s) somewhere (or snapshots as Chris suggested) instead. Adding a unique extra partition creates something that has to be worked around in more complex multiple-drive configurations. In particular, partitions mirrored across disks have to be equal in size. So you have to "waste" space on the other disks and worry about partition numbers not lining up and such. Maybe vendors don't provide such configurations out-of-the-box today, but I hope they will at some point and I wouldn't want the recovery partition to be something that needs to be worked around.

I don't really see requiring physical access to the machines in a recovery scenario as being significant. It would be nice not to need physical access to the machine, but there are always cases where that is necessary (e.g. a failed hard drive). And the snapshot recovery option should work without needing physical access in most cases (such remotely-administered machines typically allow access to the console -- boot menu and all -- over the network).

I suggest integrating Timeshift or snapper to take automatic backups, like they do in Garuda Linux, and giving the user the choice to restore one from GRUB if anything goes wrong. Garuda also uses Btrfs.

Pop!_OS has a recovery partition, which stores a full copy of the installation media. It's an interesting idea! Pop!_OS also lets the user refresh the system without losing files, both from the installer and from GNOME Settings:

a.png

It's accessible by bringing up the systemd-boot menu:

To boot into recovery mode, bring up the systemd-boot menu by holding down SPACE while the system is booting, or by holding/tapping any function keys NOT used to Access the BIOS/Boot Menu (On non-System76 hardware, try the keys F1 through F12):

b.png

Two actions agreed:

  • @ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
  • @aday to add repair installation to our installer requirements document

Hi, can you post progress updates please?

Maybe we need to list all the features, and then figure out a way to order them in terms of how worth the effort they are? Or in terms of resources needed vs resources available to make them happen? This includes a list of what the environment should be capable of doing. The potential for scope creep seems high though: everything we think of would be awesome to have, but it rapidly becomes complex as we work out how to make it happen in the Fedora releng/compose system and how QA would test it, and the more UI/UX there is, the more we need usability testing, translations, etc. How do we get to a minimum viable feature that we could also build on without tearing it all down to get to a 2.0 version?

We discussed this ticket at yesterday's WG meeting.

@ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
@aday to add repair installation to our installer requirements document

Both of these things happened. The mailing list thread didn't produce much in the way of useful information.

A preference was expressed amongst the group to continue researching this topic, with a view to identifying:

  1. the most common failure states that we'd be trying to recover from
  2. technical design options for the environment itself

@chrismurphy has kindly volunteered to take this task on.

Metadata Update from @aday:
- Issue assigned to chrismurphy

2 years ago

Fedora users mailing list thread, specific to the ways in which Fedora tends to break:
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org/thread/P4XLPR4MRPMTOPX5A6H326W3E4RZUOAT/

Multiple mentions of:
- nouveau kernel regression
- system update related, in which the system boots OK but user experiences anomalous behavior once logged in
- existing rescue option isn't used or isn't helpful
- generally pleased with reliability

I've started a new thread with followup questions about how helpful (a) a graphical rescue environment would be; (b) a snapshot+rollback mechanism; (c) which of the two would be more helpful.
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org/thread/3I5ZLFVFAXDLXM6LZMN5FI63FRW7KKXJ/

@chrismurphy I wonder if this is really a realistic project to take on. Are you still planning to work on this?

I suspect the live image is perfectly adequate as a graphical rescue environment?

Any more recent thoughts on having a snapshot and rollback mechanism?

If this is too far out of scope, maybe an app could be developed and pre-installed in the live media (and in the "final" installation) to create and manage Btrfs snapshots. Timeshift currently doesn't work in Fedora because it needs a specific Btrfs setup, iirc.

We discussed this issue at last week's WG meeting (23 May 2023). @ngompa is continuing to think about the technical design, and in that respect it remains active.

There are some open questions regarding the scope of the recovery features, which need to be settled. At a minimum, we need a way to support factory restore for OEM cases.

There's no particular time frame for this task, so removing pending-action.

Metadata Update from @aday:
- Issue untagged with: pending-action

11 months ago

I think nobody is interested in actually working on this, so I'm tempted to close it.

Is there anything pending Working Group attention here?

Nothing is pending for the WG but I think this is a bit more than an aspirational idea, though I don't know that anyone is working on it.

Seems to me the prerequisites include a design, and also a pretty significant decision about any disk layout changes: partitions, subvolumes, whatever else. Probably bootloader changes too, including setting boot next in the GNOME session prior to a reboot rather than depending on a pre-boot menu. That's a lot of work, so I'm not attached either way to preserving this issue. It'll come back one of these days I'm sure.

OK, I'm going to be bold and close this because nobody is working on it.

If somebody wants to work on a GUI recovery environment, feel free to reopen this issue.

Metadata Update from @catanzaro:
- Issue close_status updated to: Won't fix
- Issue status updated to: Closed (was: Open)

6 months ago
