Issue #6: enable discard=async by default - project

fedora-btrfs / project

Closed: Fixed a year ago by ngompa. Opened 3 years ago by chrismurphy.

We should consider discard=async by default on all baremetal installations, and coordinate with upstream about making it a default there too.

Summary:

Facebook has been using it across the board on consumer-level hardware for more than 6 months, with no regressions; it has solved many problems.
Issuing too many discards too quickly overwhelms weak SSD firmware.
Too few discards can sometimes result in too few "ready" erased blocks, forcing the firmware to work hard to erase blocks on demand, which can mean moving SSD pages around so that an erase block can be erased. This can significantly slow down an SSD.
discard=async gives the right amount of hinting about what's ready for erasure.
VM images probably should not use discards, to avoid becoming sparse files and therefore fragmented. But whether to pass down discards is best left as a policy decision for the VM manager.
References:
https://www.redhat.com/archives/libvir-list/2020-August/msg00341.html
https://bugzilla.redhat.com/show_bug.cgi?id=1860720#c8

How to implement:

We can do this in kickstart using --fsoptions discard=async, which will enable it during installation and add it to /etc/fstab.
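
As a rough illustration of the end result in /etc/fstab, here is a minimal Python sketch that appends discard=async to the options field of an fstab entry. The helper name add_mount_option and the sample entry (including its UUID) are hypothetical, and this is not how the installer itself writes the file.

```python
def add_mount_option(fstab_line: str, option: str = "discard=async") -> str:
    """Return the fstab entry with `option` present in its options field.

    fstab fields: device, mount point, fs type, options, dump, pass.
    Whitespace between fields is normalized to single spaces.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fstab fields")
    options = fields[3].split(",")
    if option not in options:
        # Keep existing options and append the new one at the end.
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical entry, for illustration only.
entry = "UUID=hypothetical-uuid / btrfs subvol=root,compress=zstd:1 0 0"
print(add_mount_option(entry))
```

Calling the helper a second time on its own output leaves the entry unchanged, so it is safe to run repeatedly.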

Overly verbose background and extras in this now closed RFE: kickstart option to control discard configuration

atim commented 3 years ago

Should we use discard=async for NVMe SSDs as well? Or is periodic TRIM preferable for NVMe devices?

chrismurphy commented 3 years ago

The discard mount option (both sync and async) will discard metadata blocks, and it happens fairly soon after blocks are freed. Those blocks might be needed by 'usebackuproot' if there's a crash or power failure. It's trivially reproducible (locally) to discover that those backup roots are all zeros if either discard mount option is enabled.

I'm inclined to say we should stick with just fstrim.timer as the primary way discards get issued. Once a week is surely often enough for the vast majority of workloads. And leave enabling the discard mount option to be decided on a case-by-case basis.

If it's possible to issue discards only to data blocks, that would be better.
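
For anyone deciding case by case, it helps to see which Btrfs mounts already have a discard option. A minimal sketch, assuming text in the /proc/mounts format and assuming sync discard shows up there as a bare "discard" option; the function name btrfs_discard_modes and the sample lines are hypothetical.

```python
from typing import Dict


def btrfs_discard_modes(mounts_text: str) -> Dict[str, str]:
    """Map each Btrfs mount point to 'async', 'sync', or 'none'."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, ...
        if len(fields) < 4 or fields[2] != "btrfs":
            continue
        options = fields[3].split(",")
        if "discard=async" in options:
            mode = "async"
        elif "discard" in options or "discard=sync" in options:
            mode = "sync"
        else:
            mode = "none"
        modes[fields[1]] = mode
    return modes


# Hypothetical sample; on a real system, pass the contents of /proc/mounts.
sample = (
    "/dev/sda2 / btrfs rw,relatime,discard=async,subvol=/root 0 0\n"
    "/dev/sda1 /boot ext4 rw,relatime 0 0\n"
)
print(btrfs_discard_modes(sample))
```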

Edited 3 years ago by chrismurphy

kparal commented 3 years ago

Neither the discard section nor the usebackuproot section in man 5 btrfs mentions this. I wonder if it is intentional. Perhaps it would be worth the effort to file a request for btrfs developers to either implement a data-only discard option, or to make sure that backuproot metadata is not discarded immediately (but later, compared to all other blocks). It might even be a bug.

chrismurphy commented 3 years ago

Upstream definitely knows about it. I may have overstated the value of usebackuproot because it shouldn't ever be needed; it's really a workaround for buggy firmware, to try to get back to a (hopefully) known good root in case write ordering got messed up by the firmware.

While most drives do honor write ordering properly and can use discard safely, I think the overwhelming majority of users are adequately served by the default weekly fstrim, and the incremental improvement from discard applies to few cases in Fedora compared to the risk for those who have buggy drives and an untimely crash or power failure. I am really on the fence, and not opposed to discard by default, but I'm inclined to be more conservative and let upstream lead on when discard should become the default. And right now it isn't.

Perhaps this is best dealt with through documentation of mount options that aren't enabled by default but are recommended for specific use cases?

Metadata Update from @ngompa:
- Issue set to the milestone: Future Release (was: Fedora 34)

3 years ago

ngompa commented 2 years ago

At today's Fedora Hatch event, @masonchrislo was talking about discard=async and how he considers it pretty reasonable to activate by default when we detect that Btrfs is being put onto an SSD. We may want to revisit this and see if we can make fstrim.timer more intelligent to not collide with Btrfs.
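
One possible way to approximate "is this an SSD" is the kernel's rotational flag in sysfs. The sketch below is only a heuristic under that assumption: the function name is_non_rotational is hypothetical, the device name is an example, and virtual or otherwise unusual devices may also report 0 without being SSDs.

```python
from pathlib import Path
from typing import Optional


def is_non_rotational(device: str, sysfs_root: str = "/sys") -> Optional[bool]:
    """Return True if the block device reports rotational == 0.

    Returns None when the flag cannot be read (for example, the device
    does not exist or sysfs is unavailable).
    """
    flag = Path(sysfs_root) / "block" / device / "queue" / "rotational"
    try:
        value = flag.read_text().strip()
    except OSError:
        return None
    return value == "0"


# Example device name; adjust for the system being inspected.
print(is_non_rotational("nvme0n1"))
```

The sysfs_root parameter exists only so the check can be pointed at a test directory instead of the live /sys.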