Currently the CLI recovery environment can be complicated and unappealing. A GUI recovery environment can increase the possibilities of fixing the system and provide easier ways for newcomers and people with different levels of CLI knowledge to repair their devices. Also, other distros could benefit from this. Here's what I thought:
Thanks for these ideas! I agree that the recovery prompt isn't friendly.
Personally, I'd approach this by looking at the specific errors that can cause someone to end up at the recovery prompt, and what we can do about each of them, as opposed to trying to design a generic environment in which someone has to somehow figure out how to fix their system.
o/
Something like the Startup Repair tool from Windows? It collects data and then tries to fix the OS.
We discussed this ticket at today's WG meeting.
We agreed that there would be value in having a graphical recovery environment. Just being able to boot to a generic desktop would be better than what we have now, and would at least allow someone to use the web or prepare install media to allow reinstallation.
If we were to include recovery tools, we'd need to have a better idea of what the most common issues are. Ideally we'd be able to automatically diagnose those common issues.
The other thing we discussed is the importance of having the ability to repair an existing installation from the installer. This would overwrite the existing install while preserving /home and recreating the existing user accounts.
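Two actions agreed:
- @ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
- @aday to add repair installation to our installer requirements document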
Metadata Update from @aday: - Issue tagged with: pending-action
Thread started on devel@: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/TMRL4DLKKZZR4FP3HTO7YLR5Y4CVFNGB/
Also, related ticket: fedora-btrfs/project#23
Something to consider: I noticed the upgrade to F36 moved my kernel installs on this system to /boot/efi, and there is barely enough room for the existing kernels there. I don't think you could fit a GUI initramfs in there even if you deleted all of the kernels.
Yikes, yeah, we'd have to put it on /boot or finagle some other way to get it...
A few data points from me:
I think that the live image could be improved to help restore the system, i.e. if there was an easy way to access the system drive in some chroot or toolbx environment ...
@jforbes
I noticed the upgrade to F36 moved my kernel installs on this system to /boot/efi
This is happening recently?
@hobbes1069
A helper script could pretty much automate this
This functionality is built into Anaconda/blivet. The magic is how it goes about finding /etc/fstab, but then it just assembles things accordingly. Maybe that can be repurposed. Curiously, this is only available in Anaconda's rescue mode, which isn't available when launching Anaconda from Live media; only netinstallers and DVDs have this option.
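As a rough illustration of the "find /etc/fstab, then assemble things accordingly" idea, here is a minimal shell sketch. It is not how Anaconda or blivet do it; the function name `plan_mounts` and the default target `/mnt/sysroot` are assumptions made up for this example. It only prints the mount commands it would run, parents first, and does not mount anything.

```shell
#!/bin/sh
# plan_mounts FSTAB [TARGET]
# Hypothetical helper: read an fstab file from a broken system and print
# the mount commands needed to assemble it under TARGET.
# Nothing is mounted; the output is a plan to review.
plan_mounts() {
    fstab=$1
    target=${2:-/mnt/sysroot}

    # Keep entries whose mount point is an absolute path (this skips
    # comments, blank lines and swap), then sort by mount point so that
    # "/" comes before "/boot", which comes before "/boot/efi".
    awk '!/^[[:space:]]*#/ && NF >= 4 && $2 ~ /^\// { print $2, $1, $3, $4 }' "$fstab" |
        LC_ALL=C sort -k1,1 |
        while read -r mnt spec fstype opts; do
            printf 'mount -t %s -o %s %s %s%s\n' \
                "$fstype" "$opts" "$spec" "$target" "$mnt"
        done
}
```

For example, `plan_mounts /mnt/broken/etc/fstab` (the path is a placeholder) would list one `mount` line per filesystem, which could then be checked by hand before entering a chroot.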
@jforbes kernels moving to /boot/efi is not intentional, AFAIK; you probably hit some iteration of the systemd machine-id mess when you upgraded to f36.
Check if you have a /boot/efi/<machine-id>, where <machine-id> is the ID from /etc/machine-id. If so I think the recovery is to wipe that directory entirely, and re-run the correct kernel-install commands for your installed kernels. Something along those lines.
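The check described above could be scripted roughly as follows. This is only a sketch: the function name `check_machine_id_dir` is made up for this example, and it deliberately stops at reporting whether the directory exists. It does not wipe anything or run kernel-install.

```shell
#!/bin/sh
# check_machine_id_dir [ESP_DIR] [MACHINE_ID_FILE]
# Hypothetical helper: report whether ESP_DIR/<machine-id> exists.
# Returns 0 if the directory is present, 1 if it is absent,
# and 2 if the machine ID cannot be read.
check_machine_id_dir() {
    esp_dir=${1:-/boot/efi}
    id_file=${2:-/etc/machine-id}

    if [ ! -r "$id_file" ]; then
        echo "cannot read $id_file" >&2
        return 2
    fi

    # The file holds a single ID; strip the trailing newline.
    machine_id=$(tr -d '[:space:]' < "$id_file")
    if [ -z "$machine_id" ]; then
        echo "$id_file is empty" >&2
        return 2
    fi

    if [ -d "$esp_dir/$machine_id" ]; then
        echo "found: $esp_dir/$machine_id"
        return 0
    fi
    echo "not found: $esp_dir/$machine_id"
    return 1
}
```

Running `check_machine_id_dir` with no arguments checks the default paths named in the comment above; the cleanup itself is left as a manual step.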
I should probably test that this doesn't happen on a fresh install/upgrade. I don't think it should any more, but best to be safe...
It is entirely possible that this is something wonky with my machine, which is why I didn't post it. That machine was installed on Fedora 22 I think and has been upgraded with dnf ever since. I have replaced the motherboard and CPU, but not the root disk. I am going to verify on another old install that I have soonish, but I need a couple of other bugs to be fixed in beta first. I did recover the machine the way I wanted once I figured out what it was doing. That machine was upgraded Tuesday of this week, so yes, it was recently.
Oh dear, I was hoping you'd upgraded earlier, before the grub change to ignore machine-ids...
No, it was after that change, I just noticed new installed machines have a larger /boot/efi as well.
yeah, we changed it from 200M to 500M at some point I believe. I have an old system with only 200M like you, heh.
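For anyone wanting to see how much room their own /boot/efi has left, a small sketch using POSIX `df` follows. The function names `free_mib` and `has_room` are invented for this example, and any threshold passed to `has_room` is an arbitrary illustration, not a measured initramfs size.

```shell
#!/bin/sh
# free_mib [DIR]
# Print the free space, in MiB, of the filesystem that holds DIR.
# Uses the POSIX "df -P" format, where field 4 of the second line is
# the available space in 1024-byte blocks.
free_mib() {
    df -Pk "${1:-/boot/efi}" | awk 'NR == 2 { print int($4 / 1024) }'
}

# has_room DIR NEEDED_MIB
# Succeed if at least NEEDED_MIB MiB are free on DIR's filesystem.
# Fails if df produced no usable output.
has_room() {
    free=$(free_mib "$1")
    [ -n "$free" ] && [ "$free" -ge "$2" ]
}
```

A check such as `has_room /boot/efi 100 || echo "ESP is tight"` would then flag the small 200M layouts mentioned above.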
We could implement a Pop!_OS-style recovery, like the one System76 makes for Pop!_OS. Is that possible, or would you want a recovery environment engineered completely from scratch instead of basing it on another distro's (in this case Pop!_OS)?
Pop!_OS recovery is literally just a copy of the install ISO laid down as another partition: https://support.system76.com/articles/pop-recovery/
So there's nothing special there. The "Refresh install" mode is something that's built into their installer, and Anaconda is capable of the same thing, there's just no UI for it.
@ngompa: Okay, so we just need to write a UI for the Anaconda installer that does the recovery part, right?
(Not a new) idea: have the "rescue" boot entry point to a read-only snapshot of the system root, with a volatile overlay for persistence so the environment doesn't get mad at the total lack of write access.
With the system root in a subvolume named root and the rescue snapshot named root.rescue, recovery would be:
    mv root root.broken
    btrfs subvolume snapshot root.rescue root
    reboot
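The rollback commands above could be wrapped in a small dry-run helper, sketched below. The function name `rescue_plan` is made up for this example, and the `create` step is an assumption about how the read-only snapshot might be taken (using the `-r` flag of `btrfs subvolume snapshot`); the comment above does not say how root.rescue is created. The helper only prints commands, which are meant to be run from the top level of the Btrfs volume.

```shell
#!/bin/sh
# rescue_plan create|rollback [ROOT] [RESCUE]
# Hypothetical dry-run helper: print, but do not execute, the commands
# for taking a read-only rescue snapshot or rolling back to it.
rescue_plan() {
    action=$1
    root=${2:-root}
    rescue=${3:-${root}.rescue}

    case $action in
        create)
            # Assumption: the rescue snapshot is a read-only (-r) snapshot.
            echo "btrfs subvolume snapshot -r $root $rescue"
            ;;
        rollback)
            # Same three steps as in the comment above.
            echo "mv $root $root.broken"
            echo "btrfs subvolume snapshot $rescue $root"
            echo "reboot"
            ;;
        *)
            echo "usage: rescue_plan create|rollback [ROOT] [RESCUE]" >&2
            return 2
            ;;
    esac
}
```

Printing the plan first keeps the broken root around as root.broken and lets the commands be reviewed before anything is renamed.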
Just my 2¢, but I like Chris' not-new idea much better than having another partition to deal with. If vendors want some way that users can factory-reset their devices, I'd much rather see them provide that as a downloadable thumb drive image than reserving space on the HD. Or even provide such a (readonly?) bootable thumb drive with the PC when it is purchased. Providing an external factory-reset device would seem to have the additional benefit of working in cases where the user needed to replace or upgrade their HD.
While I do like it as an option, I also think if it is the only option, it becomes problematic for people who do not have physical access to machines they are administering. There are other options there as well, but putting something on the HD should at least be an option.
Putting something on the HD is fine. Just don't make it a separate partition. Put it in file(s) somewhere (or snapshots as Chris suggested) instead. Adding a unique extra partition creates something that has to be worked around in more complex multiple-drive configurations. In particular, partitions mirrored across disks have to be equal in size. So you have to "waste" space on the other disks and worry about partition numbers not lining up and such. Maybe vendors don't provide such configurations out-of-the-box today, but I hope they will at some point and I wouldn't want the recovery partition to be something that needs to be worked around.
I don't really see requiring physical access to the machines in case of a recovery scenario as being significant. It would be nice not to need to physically access the machine, but there are always cases where that is necessary (e.g. a failed hard drive). And the snapshot recovery option should work without needing physical access in most cases (such remotely-administered machines typically allow access to the console, boot menu and all, over the network).
I suggest integrating Timeshift or snapper to take automatic backups, like they do in Garuda Linux, and giving the user the choice to restore from GRUB if anything goes wrong. Garuda also uses Btrfs.
Pop!_OS has a recovery partition, which stores a full copy of the installation media. It's an interesting idea! Also, Pop!_OS allows the user to refresh the system, without losing files, from the installer and also from GNOME Settings:
<img alt="a.png" src="/fedora-workstation/issue/raw/files/374a10da11f852ff7d02c86438350cf6e3412cb60dfb70de99fb8a7caeebb71e-a.png" />
It's accessible by bringing up the systemd-boot menu:
To boot into recovery mode, bring up the systemd-boot menu by holding down SPACE while the system is booting, or by holding/tapping any function keys NOT used to Access the BIOS/Boot Menu (On non-System76 hardware, try the keys F1 through F12):
<img alt="b.png" src="/fedora-workstation/issue/raw/files/5fd11a3cead167fbcd022be17d65ff2251035f828f3832823a1bddd34f00c1f1-b.png" />
Two actions agreed:
- @ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
- @aday to add repair installation to our installer requirements document
Hi, can you post progress updates please?
Maybe we need to list all the features, and then figure out a way to order them in terms of how worth the effort they are? Or in terms of resources needed vs resources available to make them happen? This includes a list of what the environment should be capable of doing. The potential for scope creep seems high though, in that everything we think of would be awesome to have but then rapidly becomes complex as we have to work out how to make it happen in the Fedora releng/compose system, and then QA for testing it, and the more UI/UX is there we run into a need for some usability testing and translations, etc. How to get to a minimum viable feature that's also one we could build on without tearing it all down to get to a 2.0 version?
We discussed this ticket at yesterday's WG meeting.
- @ngompa to start a mailing list thread about the possibilities for booting a graphical recovery environment
- @aday to add repair installation to our installer requirements document
Both of these things happened. The mailing list thread didn't produce much in the way of useful information.
A preference was expressed amongst the group to continue researching this topic, with a view to identifying:
@chrismurphy has kindly volunteered to take this task on.
Metadata Update from @aday: - Issue assigned to chrismurphy
Fedora users mailing list thread, specific to the ways in which Fedora tends to break: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org/thread/P4XLPR4MRPMTOPX5A6H326W3E4RZUOAT/
Multiple mentions of:
- nouveau kernel regression
- system update related, in which the system boots OK but user experiences anomalous behavior once logged in
- existing rescue option isn't used or isn't helpful
- generally pleased with reliability
I've started a new thread with followup questions about how helpful (a) a graphical rescue environment would be; (b) a snapshot+rollback mechanism; (c) which of the two would be more helpful. https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org/thread/3I5ZLFVFAXDLXM6LZMN5FI63FRW7KKXJ/
@chrismurphy I wonder if this is really a realistic project to take on. Are you still planning to work on this?
I suspect the live image is perfectly adequate as a graphical rescue environment?
Any more recent thoughts on having a snapshot and rollback mechanism?
If this is too out of scope, maybe an app could be developed and pre-installed in the live media (and in the "final" installation) to create and manage Btrfs snapshots. Timeshift currently doesn't work in Fedora because it needs a specific Btrfs setup, iirc.
We discussed this issue at last week's WG meeting (23 May 2023). @ngompa is continuing to think about the technical design, and in that respect it remains active.
There are some open questions regarding the scope of the recovery features, which need to be settled. At a minimum, we need a way to support factory restore for OEM cases.
There's no particular time frame for this task, so removing pending-action.
Metadata Update from @aday: - Issue untagged with: pending-action
I think nobody is interested in actually working on this, so I'm tempted to close it.
Is there anything pending Working Group attention here?
Nothing is pending for the WG but I think this is a bit more than an aspirational idea, though I don't know that anyone is working on it.
Seems to me the prerequisites include a design, and also a pretty significant decision about any disk layout changes: partitions, subvolumes, whatever else. Probably bootloader changes too, including setting boot next in the GNOME session prior to a reboot rather than depending on a pre-boot menu. That's a lot of work, so I'm not attached either way to preserving this issue. It'll come back one of these days, I'm sure.
OK, I'm going to be bold and close this because nobody is working on it.
If somebody wants to work on a GUI recovery environment, feel free to reopen this issue.
Metadata Update from @catanzaro: - Issue close_status updated to: Won't fix - Issue status updated to: Closed (was: Open)
GNOME is investigating this here