#228 Enable full preemption
Opened 3 years ago by chrismurphy. Modified 6 months ago

Now that a single kernel can use preemption none/voluntary/full, we should evaluate the pros and cons of preempt=voluntary (the current default) versus preempt=full.

Relevant message in the devel@ thread: Dynamic preemption support in Linux 5.12 kernel.

This would be configured with a boot parameter, so it could be an edition/spin specific option. But we need to figure out how to add it in a configurable way. Maybe it can be done with a dracut drop-in config file?

There's other work happening with resource control, so part of the investigation should consider any conflicts, or other downsides.

@benzea


Metadata Update from @chrismurphy:
- Issue tagged with: pending-action

3 years ago

I would prefer that if we decide to do this, we should make this a default for all variants unless someone specifically wants to opt-out. Pretty much everything I have read on this seems to indicate the downsides are relatively minimal across the board.

I believe that this issue can be considered entirely independent from other resource control problems.

Personally, I am wondering a bit how exactly preemption helps here. Is the problem here that spinning up another core is costly, so we should preempt kernel tasks in order to run userspace earlier? i.e.:

CPU0: [kernel input] -> [other kernel task] -> userspace input -> ...
vs. with preemption:
CPU0: [kernel input] -> userspace input -> [other kernel task] -> ...

I've asked Tejun Heo about this, and he's confirmed there shouldn't be any impact on resource control specifically (and if there is, it's a bug we should fix).

I expect for the desktop case, it is a win across the board. There are certain server workloads where it can be a problem. I am not particularly comfortable switching defaults in the middle of a stable release though, particularly when we have so much testing with preempt voluntary (RHEL uses voluntary as well, and does a lot more performance testing than we do). Hopefully we can get a patch which lets us keep the default and still use dynamic so that F33/F34 users will benefit.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

BTW, this is how to change the preemption mode on the fly:

# cat /sys/kernel/debug/sched_preempt
(none) voluntary full

# echo full > /sys/kernel/debug/sched_preempt

# cat /sys/kernel/debug/sched_preempt
none voluntary (full) 

Would be nice to make preemptctl.
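A rough sketch of what such a helper could look like, wrapping the debugfs file shown above. The `preemptctl` name comes from the wish above; the function names and the `PREEMPT_KNOB` override are hypothetical and not an existing tool. It assumes the file lists all modes with the active one in parentheses, as in the output above, and it needs root and readable debugfs.

```shell
#!/bin/sh
# Hypothetical "preemptctl" sketch; none of these names exist in Fedora today.

# Print the active mode from a line such as "none voluntary (full)".
active_preempt_mode() {
    printf '%s\n' "$1" | sed -n 's/.*(\([a-z]*\)).*/\1/p'
}

# Succeed only for the three modes discussed in this ticket.
valid_preempt_mode() {
    case "$1" in
        none|voluntary|full) return 0 ;;
        *) return 1 ;;
    esac
}

preemptctl() {
    # PREEMPT_KNOB is an illustrative override; the default is the path used above.
    knob=${PREEMPT_KNOB:-/sys/kernel/debug/sched_preempt}
    case "${1:-get}" in
        get)
            line=$(cat "$knob") || return 1
            active_preempt_mode "$line"
            ;;
        set)
            valid_preempt_mode "$2" || { echo "unknown mode: $2" >&2; return 2; }
            echo "$2" > "$knob"
            ;;
        *)
            echo "usage: preemptctl [get|set MODE]" >&2
            return 2
            ;;
    esac
}
```

Usage would be `preemptctl get` or `preemptctl set full` after sourcing the file.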

Sounds like Workstation should switch to preempt for F35.

I agree with Justin that there's no strong reason to make major changes like this in a stable release.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

We do not install tuned in Workstation.

We do not install tuned in Workstation.

why not? It's easy to do this.

Nobody has ever proposed it afaik.

/sys/kernel/debug/sched_preempt

debugfs is subject to kernel lockdown, which is in effect for UEFI Secure Boot systems, so currently it can't be checked or changed during runtime.

[  531.180062] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7
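Whether lockdown is active can be checked before touching debugfs at all. A minimal sketch, assuming the kernel exposes `/sys/kernel/security/lockdown` with the active level in square brackets (for example `none [integrity] confidentiality`). The function names are hypothetical.

```shell
#!/bin/sh
# Print the active lockdown level from a line such as
# "none [integrity] confidentiality".
active_lockdown_level() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Succeed when the given lockdown line reports any level other than "none",
# in which case debugfs reads and writes are expected to be refused.
debugfs_likely_restricted() {
    level=$(active_lockdown_level "$1")
    [ -n "$level" ] && [ "$level" != "none" ]
}
```

For example: `debugfs_likely_restricted "$(cat /sys/kernel/security/lockdown)" && echo "use the preempt= boot parameter instead"`.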

Hmm, apparently there's now a GUI applet for tuned made by @xvitaly: https://github.com/EasyCoding/tuned-switcher

Hmm, apparently there's now a GUI applet for tuned made by @xvitaly: https://github.com/EasyCoding/tuned-switcher

System tray applet is fully functional:

  • full D-Bus support;
  • profile monitoring (will report if another application changes the profile);
  • easy profile switching;
  • shows currently selected profile;
  • supports automatic mode.

Screenshots:

Screenshot

GUI version is still in development.

Packaged and can be installed via sudo dnf install tuned-switcher.

The workstation UI for such things is in the control-center

The workstation UI for such things is in the control-center

The power-profiles-daemon developers said that it will not work on desktop Ryzen PCs, for example, and desktop Ryzens are not the only systems where it doesn't work. This may be temporary, so this is just FYI.

This ticket is tagged with pending-action and the F35 milestone. What needs to be done?

Nothing yet, as the patch that allows us to switch preempt mode from where we are doesn't seem to exist yet. I pinged Lutro on the fedora-devel thread to see if it is in progress. Once that patch goes in (and the resulting kernel is available as a rebase target for stable Fedora), that is the time to start testing switching the workstation defaults. Users on stable Fedora will be able to manually make the switch, and then you can make a decision on setting a default for the next version. As it doesn't seem to be in 5.14, it will likely not be on the install kernel for F35.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 36 (was: Fedora 35)

2 years ago

Summary of a chat with @jforbes:

  • Fedora kernels have, and will continue to have, voluntary preemption as the default
  • CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window, and is needed to change to full preemption using kernel parameter preempt=full
  • We would need anaconda to add preempt=full to boot loader configuration, so that it's applied to new desktop installations
  • Discuss whether to run some desktop only RPM, that's just a scriptlet which has grubby set the boot parameter, so it's applied to upgrades
  • changing it via debugfs might be possible, but is untested, including whether it'll give the same results as the command line parameter

kernel crystal ball says:

the v5.16 kernel predictions: merge window closes on Sunday, 2021-11-14 and release on Sunday, 2022-01-09
the v5.17 kernel predictions: merge window closes on Sunday, 2022-01-23 and release on Sunday, 2022-03-20

Fedora 36 schedule key says:

Branch Fedora Linux 36 from Rawhide     Tue 2022-02-08

So it does seem plausible to get it done for Fedora 36, but it needs testing, and a decision about whether to enable it on new installations only or also on upgrades.

http://phb-crystal-ball.org/
https://fedorapeople.org/groups/schedule/f-36/f-36-key-tasks.html

Importantly, running with preempt=full at all needs some significant testing across desktop use cases as all Fedora and RHEL kernels have shipped with preempt=voluntary for a very long time. I honestly do not expect there to be issues, but we need to be sure. FWIW, I plan to switch over to 5.16-rc kernels for desktop use on at least one machine here to give preempt=full some testing as soon as I think things have stabilized enough (the 5.16 MR has been more painful than average so far). But I am probably not representative of the average Fedora workstation user as I still spend the majority of my time in terminals, and I don't have a btrfs filesystem anywhere here.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

Yes, but no. It did, but in a way that I was not willing to enable (it required preempt=full as the default). There was a long thread about it on fedora-devel. 5.16 is where the options came in that actually allow me to enable it for Fedora.

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

I am not too concerned about "many desktop distros". I know what Fedora and Red Hat testing looks like; what the rest of the world does is not of much concern to me. RHEL in particular has a fairly substantial performance testing setup, and preempt=full was never tested there. Voluntary has been the default for everything but s390 for a very long time. I want to see real testing. Luckily F38 is still a ways away, so we have time to do some of that testing.

From pykickstart docs, looks like it would be

bootloader --append "preempt=full"

If sysfs or debugfs, it could be done with a /etc/udev/rules.d drop-in.

We discussed this ticket at yesterday's working group meeting. The group is generally positive about pursuing full preemption. @chrismurphy has kindly agreed to monitor the situation and keep us informed.

5.16.0-0.rc4.29.fc36.x86_64+debug

# cat /sys/kernel/debug/sched/preempt 
cat: /sys/kernel/debug/sched/preempt: Operation not permitted
# dmesg
...
[  189.486988] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

I think with kernel lockdown in place due to UEFI Secure Boot being enabled here, it's not possible to change this via sysfs, and it'll need to be done by kernel boot parameter which means we need to do this via anaconda kickstart.

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for workstation. I don't know that server is interested given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that cloud would be either. The main benefit is for interactive/desktop type workloads. I do plan to add it to the test plan for 5.16 kernel test-week, and possibly even do a preempt=full ISO.

Sample size 1: I've been running it for a few days now along with bcc-tools fileslower, and I'm not seeing anything out of the ordinary compared to voluntary. Of course, fileslower is only looking at IO latency at the VFS layer, so it's a very narrow view.

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for workstation. I don't know that server is interested given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that cloud would be either. The main benefit is for interactive/desktop type workloads. I do plan to add it to the test plan for 5.16 kernel test-week, and possibly even do a preempt=full ISO.

Well, from the Cloud side (cc: @davdunc), I can say we're interested in any change that might potentially benefit us. From the Server side, data is always interesting when trying to figure out how to make things better. Doing that lets us figure out whether there's value or difference in being different across variants on this point.

Personally, I prefer to minimize differences across Fedora variants if it's reasonably possible. So testing across all the workload types makes sense to see if it makes sense to maintain a delta between desktop and non-desktop variants.

Metadata Update from @chrismurphy:
- Issue set to the milestone: Fedora 37 (was: Fedora 36)

2 years ago

Working full preemption is available with preempt=full boot parameter in 5.16+ kernels. I've been using it since available, and I think it's ready for wider testing. How to do that?

Discussed at meeting, chrismurphy will pull a system wide change proposal together for F37. And coordinate with QA for a test day(s).

Just to be clear, this system wide change proposal is for workstation only.

No, we're planning to propose this change for all editions. Certainly there's no reason for Workstation to differ from other desktop editions. As for server editions, I'll let Neal or Chris comment.

There absolutely is, which is why I did not make preempt=full the default.

I'm having some trouble with the backlight on my laptop when preempt=full is set. The brightness level has dropped noticeably, although the slider is at 100%. I have an OLED display and use the ICC Brightness tool: https://github.com/udifuchs/icc-brightness

preempt=full works great on my PC. Much better experience. I haven't noticed any issues, at least not yet.

I wonder whether preempt=full could affect battery life on laptops?

There absolutely is, which is why I did not make preempt=full the default.

There isn't sufficient proof to indicate we should not at least try across the board. I've discussed it with @davdunc, @dcavalca, and @salimma in Fedora Cloud/Server and we are at least open to trying it. The data we have says it will at worst be net-neutral.

FWIW, I've been running preempt=full on my F36 desktop for a while now, working and gaming without any issue.

The WG discussed this issue during yesterday's meeting.

It seems that more benchmarking is required to try and clear up any uncertainty around using preempt=full on server and cloud. @chrismurphy is going to look into making this happen.

I think we're in agreement that this proposal needs more work, and so won't be ready for F37 - adjusting the milestone to F38.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 38 (was: Fedora 37)

2 years ago

I switched to preempt=full to give it a try on my desktop:

[    0.102309] Dynamic Preempt: full

If I do run into bugs/problems, how would they manifest themselves on my workstation?
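The mode the kernel actually booted with can also be read from the log in a script rather than by eye. A minimal sketch, assuming the boot log contains a `Dynamic Preempt: <mode>` line like the one quoted above. The function name is hypothetical.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the mode from the last
# "Dynamic Preempt: <mode>" line, or nothing if no such line is present.
dynamic_preempt_from_log() {
    sed -n 's/.*Dynamic Preempt: \([a-z]*\).*/\1/p' | tail -n 1
}
```

For example, `journalctl -k -b | dynamic_preempt_from_log` or `dmesg | dynamic_preempt_from_log` (either may need elevated privileges).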

It seems that more benchmarking is required to try and clear up any uncertainty around using preempt=full on server and cloud. @chrismurphy is going to look into making this happen.

Hi @chrismurphy, are you still planning to work on this?

Also found this document, which is old but has a nice summary of the impact on slide 68. Notably, average latency is increased, but maximum latency and latency variability are reduced. (It also reduces throughput, but this is less interesting for desktops.)

I wonder if full preemption might be overkill? There is better documentation of the various options here and it sure sounds like the "Low-Latency Desktop" option might be more in line with what we're looking for...?

After two weeks of preempt=full, I can't find any noticeable changes (better or worse).

None of this may help, but: I'm on 6.2.10-200 (Fedora 37) with a Ryzen 3700X and a Radeon RX 6600. I have two 4K displays and I run some fairly memory-hungry applications.

Also found this document, which is old but has a nice summary of the impact on slide 68. Notably, average latency is increased, but maximum latency and latency variability are reduced. (It also reduces throughput, but this is less interesting for desktops.)

I wonder if full preemption might be overkill? There is better documentation of the various options here and it sure sounds like the "Low-Latency Desktop" option might be more in line with what we're looking for...?

Just to clarify, the request in this Pagure issue is for "full preempt" aka "Preemptible Kernel (Low-Latency Desktop)" to be the default. The names are unfortunately confusing, but this issue is not asking for any of the "RT" modes to be default - either "Preemptible Kernel (Basic RT)" or "Fully Preemptible Kernel (RT)"

My anecdata: I have been using "preempt=full" with Linux kernel 5.19.x (Ubuntu 22.04 HWE) and Linux kernel 6.1.x (latest longterm kernel) on different systems since they were available to me. Neither has had problems with this setting, and both perform well.

I cannot claim that I notice the difference in responsiveness, even subjectively.

I'd like to do it but I'm not sure how to do it. It's unpopular to add another kernel parameter, since there's limited space for it and isn't user facing at all. But the sysfs switch isn't available to us when UEFI Secure Boot is enabled due to debugfs being (mostly?) disabled.

# cat /sys/kernel/debug/sched/preempt
[68690.741678] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

If it must be done with a boot parameter, it would need to be done in Anaconda. And since it probably needs to be limited to desktops, Anaconda needs a way to set edition/spin specific kernel parameters. That's quite a bit beyond my abilities to figure out, but I'll help where I can.

My own anecdote(s), no problem using preempt=full since it was introduced. But also it isn't something I think anyone would ever notice except when there's cpu latency pressure. There have been a few user reports from audiophiles solving "stuttering" (for lack of a proper sophisticated term) by enabling full preemption.

I think perhaps it needs to mature to the point where kernel developers expose a runtime configuration in sysfs outside debugfs. And then we could implement an edition/spin specific systemd unit of some sort that flips it on boot, and can easily be disabled/enabled by users Fedora wide.

I think perhaps it needs to mature to the point where kernel developers expose a runtime configuration in sysfs outside debugfs. And then we could implement an edition/spin specific systemd unit of some sort that flips it on boot, and can easily be disabled/enabled by users Fedora wide.

Well I bet the anaconda developers can figure out whether or not to set this parameter if we ask them to, but then it would only take effect for new installs, which doesn't sound good to me.

It kinda sounds like this ticket is not yet actionable for us if it's going to require further kernel changes? Maybe we can say we'd like to switch to "Low-Latency Desktop" if an easier way to enable it is added in the future?

Metadata Update from @catanzaro:
- Issue untagged with: pending-action

a year ago

So, after more than a year of this, I have been running preempt=full for quite a while now. I started with it on just one desktop and now I run it on my desktops. I still do not run it on the development boxes, as I see little value. I would be completely comfortable with workstation setting this as the default.

What I have not seen, and I would like to, is a good benchmark on some server workloads with it. It is kind of expected that we will lose a bit of throughput on heavy server workloads. How much? How much do we lose if we are running a desktop with server workloads in the background? How much do we lose if we are not running a desktop really, and run exclusively server workloads? Those are the results that would make me comfortable changing the default in kernel, which would force it across all editions. FWIW, I expect we would lose a measurable amount if we are running a desktop with server workloads in the background. I expect it might be "noise" when running server workloads without a desktop, but I could be wrong.

We discussed this issue at yesterday's working group call.

It seems that there's a consensus that we want to enable full preemption for workstation. @chrismurphy has volunteered to write a change proposal for F39.

It would be good if we could convert existing installs to use full preemption on upgrade. However, this would require some additional work. It would be great if someone wanted to take this on.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 39 (was: Fedora 38)

a year ago

Without the possibility of getting support for adjusting this on the fly with sysfs, would the other option be a modification like this:

sudo grubby --update-kernel=ALL --args="preempt=full"

I wasn't sure if there was a more creative way to get it enabled on the next boot. 😉
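If the grubby route were taken, it would probably want a guard so that an existing preempt= choice is never overridden. A minimal dry-run sketch, assuming the running kernel's `/proc/cmdline` is a good enough proxy for the configured boot entries (it may not be). The function names are hypothetical, and the sketch only prints the grubby command rather than running it.

```shell
#!/bin/sh
# Succeed if the given kernel command line already carries a preempt= argument.
cmdline_has_preempt() {
    case " $1 " in
        *" preempt="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Dry run: report what would be done for the given command line
# (defaults to the running kernel's command line).
add_preempt_full() {
    cmdline=${1:-$(cat /proc/cmdline)}
    if cmdline_has_preempt "$cmdline"; then
        echo "preempt= already set; leaving boot entries untouched"
        return 0
    fi
    echo "would run: grubby --update-kernel=ALL --args=preempt=full"
}
```

Keeping this as a dry run makes it easy to see which machines would be touched before anything modifies boot entries.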

It seems that there's a consensus that we want to enable full preemption for workstation. @chrismurphy has volunteered to write a change proposal for F39.

Did this happen?

No, I lost track of it. The most recent discussion thread on devel@ didn't address the questions I'd asked in May, which are similar to the questions asked by @jforbes in this issue.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/DET63XCM3R37IQXCOKLWGATFQMP4KGNX

So I just bumped that thread.

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/ECVEGQWF34DQIXIF2B2AIVIXOGYH6L4V/

sudo grubby --update-kernel=ALL --args="preempt=full"

Getting this feature into upgrades is much easier if it's a kernel config change, than if it's a boot parameter change.

I'm skeptical that grubby is a reliable way of doing this, because it's strictly a legacy user-facing tool. Nothing else uses it anymore, therefore we no longer know the extent of its liabilities.

I know grubby behaves poorly in the same situation and manner as this grub2-mkconfig bug. However, I can't find the grubby specific bug report.
https://bugzilla.redhat.com/show_bug.cgi?id=2120845

grubby doesn't seem to do sufficient sanity checking. It reads /etc/kernel/cmd, adds the args options passed by the user, and then steps on every single bootloader snippet in /boot/loader/entries including snippets it doesn't own by machine ID. This behavior is contrary to the Boot Loader Specification. It's not likely to affect many people in this example case, but I have no idea of the scope of customizations people might have done that grubby will proceed to break if we use it on everyone's machine during an upgrade. I mean, writing that in this issue just now, I started to involuntarily shake my head. I do not feel comfortable writing up a change proposal that depends on using grubby for this.

Metadata Update from @aday:
- Issue set to the milestone: None (was: Fedora 39)

11 months ago

For what it's worth, I've been having crashes under load every few days, every one mentioning something about "preemption". They happen every 24 hours or so like clockwork.

Screenshot_from_2023-09-12_00-39-58.png

I wondered what happened recently to start this behavior, and then I remembered I had recently removed "preempt=none" from my kernel command line, so it was back to the Fedora default preemption settings for Fedora 38.

I've switched back to "preempt=none" and I've not had a crash yet, but it's only been a day so far. I actually had to go to "preempt=none" sometime after the original change because of some major performance regressions, so I'd be concerned if full preemption is going to be a new default. I've never had any luck with any system using those settings, as it very negatively impacts both performance and stability.

I guess the crash in the screenshot is more of an issue for Mesa to look at, but, as it is now, I'd say that preempt=full needs way more testing before it should be the default on anything (and users must be informed how to disable it, and it must be very easy to disable.)

@jforbes

Those are the results that would make me comfortable changing the default in kernel, which would force it across all editions. FWIW, I expect we would lose a measurable amount if we are running a desktop with server workloads in the background.

I would need to test this again (we now use RHEL for CI/CD hosts), but, my workload was mostly Docker containers for a CI system, running on many systems, including Fedora systems otherwise used as desktops.

I don't have the details going back that far anymore, but I investigated the performance regressions in our CI/CD pipelines (which can take 5-6 hours to run), thinking there was a problem with our software, as they were taking a longer period of time to complete - this was absolutely happening, not my imagination, and it was statistically significant and measurable. I investigated and changed to using "preempt=none" on the Fedora desktop systems and this solved it.

I also know people who use Fedora, never change the default settings, don't even know what preemption is, and they have no complaints, but they aren't running server-like workloads.

Sorry for such anecdotal "evidence" but I thought I might as well mention my experiences.

I should also clarify that my observations above correspond to the change to kernels when they started using "PREEMPT_DYNAMIC" (and its default settings), and not explicitly running systems with "preempt=full" -- and having much better luck with explicitly setting "preempt=none". I think preempt=full as default could only be worse - but I've not measured this.

For what it's worth, I've been having crashes under load every few days, every one mentioning something about "preemption". They happen every 24 hours or so like clockwork.

Screenshot_from_2023-09-12_00-39-58.png

I wondered what happened recently to start this behavior, and then I remembered I had recently removed "preempt=none" from my kernel command line, so it was back to the Fedora default preemption settings for Fedora 38.

I've switched back to "preempt=none" and I've not had a crash yet, but it's only been a day so far.

It's almost certainly unrelated.

I actually had to go to "preempt=none" sometime after the original change because of some major performance regressions, so I'd be concerned if full preemption is going to be a new default. I've never had any luck with any system using those settings, as it very negatively impacts both performance and stability.

I guess the crash in the screenshot is more of an issue for Mesa to look at, but, as it is now, I'd say that preempt=full needs way more testing before it should be the default on anything (and users must be informed how to disable it, and it must be very easy to disable.)

It's expected that preemption will reduce bandwidth ("server performance") but increase responsiveness ("desktop performance"). i.e. it's not surprising that disabling it is better for your CI workload.

I should also clarify that my observations above correspond to the change to kernels when they started using "PREEMPT_DYNAMIC" (and its default settings), and not explicitly running systems with "preempt=full" -- and having much better luck with explicitly setting "preempt=none". I think preempt=full as default could only be worse - but I've not measured this.

Dynamic preemption didn't actually change how things get scheduled, though. It just means it's possible to change between preemption modes using kernel command line instead of having to recompile the kernel to do so.

It's almost certainly unrelated.

Maybe so. I admittedly don't have enough data to say either way yet.

It's expected that preemption will reduce bandwidth ("server performance") but increase responsiveness ("desktop performance"). i.e. it's not surprising that disabling it is better for your CI workload.

I guess what I should emphasize is that the "background" CI performance was noticeably worse (noticeable enough from casual observation alone, without any real measurement, to warrant further investigation), which made me wonder what had changed, while these newer preemption modes don't seem to bring any improvement to the "desktop experience" on any of my machines whatsoever.

Perhaps I'm not a typical user and not running typical workloads, but if the "desktop experience" performance improvements are seemingly nonexistent (for me), and the only noticeable change is that it makes everything else run slower, that's cause for concern.

Also, honestly, it's all a bit frustrating, especially since we've had recent firmware updates and software updates patching things -- and slowing down these Core i7 / Skylake systems even more.

I should add the systems I normally use aren't cutting edge but they aren't that old either - mostly Core i7 and Core i9 systems from 2017 through 2019.

@chrismurphy I'm tempted to close this because we do not seem to have a good understanding of the implications of changing the preemption mode. What do you think?

Also, I think it should be uncontroversial for us to decline to add a new kernel command line parameter. If changing preemption mode is desired, then it should be done without modifying the kernel command line. Do we agree on that? If so, that basically also ends this proposal.

Do you want me to schedule this for a meeting?

The sticking point in the past has been lack of an appropriate testing method to expose issues on servers.

To mitigate that problem, the idea has been to enable it on workstation and disable it on servers.

The gotcha with that is how to do it, because with UEFI Secure Boot enabled, kernel lockdown prevents using debugfs to switch between preempt full vs voluntary. The only switch we have right now is a kernel command line option.

Probably the easiest change is to just do it in Workstation (and desktop spins) using anaconda kickstart bootloader command with the --append option. That way the kernel default remains voluntary across Fedora variants. And the fact preempt full is enabled is exposed on the kernel command line - where users could choose to (manually) disable it.

Metadata Update from @catanzaro:
- Issue tagged with: meeting-request

8 months ago

Metadata Update from @catanzaro:
- Issue untagged with: meeting-request
- Issue tagged with: meeting

8 months ago

Metadata Update from @catanzaro:
- Issue untagged with: meeting

7 months ago

I sent an email inquiry to Michal Hocko mhocko@suse.com, who replied and cc'd Frederic Weisbecker fweisbecker@suse.de

Frederic was looking into a more dynamic way to change the preemption model IIRC. There were some issues with that. Our main goal was to reduce distributed kernel flavors by allowing different preemption models in a single binary. We didn't really have any request for changing it during runtime (or after boot process).

I think our top impediment right now for using a kernel parameter, is the lack of any way of changing it on upgrades. It isn't merely that we don't have a good way to do it, we have none (as far as I'm aware). So how solvable is that?
