#228 Enable full preemption
Opened 2 years ago by chrismurphy. Modified 8 days ago

Now that a single kernel can use preemption none/voluntary/full, we should evaluate the pros and cons of preempt=voluntary (the current default) versus preempt=full.

Relevant message in the devel@ thread: Dynamic preemption support in Linux 5.12 kernel.

This would be configured with a boot parameter, so it could be an edition/spin specific option. But we need to figure out how to add it in a configurable way. Maybe it can be done with a dracut drop-in config file?

There's other work happening with resource control, so part of the investigation should consider any conflicts, or other downsides.

@benzea


Metadata Update from @chrismurphy:
- Issue tagged with: pending-action

2 years ago

I would prefer that if we decide to do this, we should make this a default for all variants unless someone specifically wants to opt-out. Pretty much everything I have read on this seems to indicate the downsides are relatively minimal across the board.

I believe that this issue can be considered entirely independent from other resource control problems.

Personally, I am wondering a bit how exactly preemption helps here. Is the problem here that spinning up another core is costly, so we should preempt kernel tasks in order to run userspace earlier? i.e.:

CPU0: [kernel input] -> [other kernel task] -> userspace input -> ...
vs. with preemption:
CPU0: [kernel input] -> userspace input -> [other kernel task] -> ...

I've asked Tejun Heo about this, and he's confirmed there shouldn't be any impact on resource control specifically (and if there is, it's a bug we should fix).

I expect for the desktop case, it is a win across the board. There are certain server workloads where it can be a problem. I am not particularly comfortable switching defaults in the middle of a stable release though, particularly when we have so much testing with preempt voluntary (RHEL uses voluntary as well, and does a lot more performance testing than we do). Hopefully we can get a patch which lets us keep the default and still use dynamic so that F33/F34 users will benefit.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

BTW, this is how to change preempt on the fly:

# cat /sys/kernel/debug/sched_preempt
(none) voluntary full

# echo full > /sys/kernel/debug/sched_preempt

# cat /sys/kernel/debug/sched_preempt
none voluntary (full) 

Would be nice to make preemptctl.
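
A minimal sketch of what such a helper might look like, assuming the debugfs knob shown above is readable and writable (it is not under kernel lockdown). The name preemptctl is taken from the wish above and is not an existing tool; the PREEMPT_KNOB variable and the active_mode function are made up for illustration, and the knob path is an assumption that may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical "preemptctl" helper; not an existing tool.
# PREEMPT_KNOB is an assumption: override it if your kernel exposes
# the knob at a different debugfs path.
PREEMPT_KNOB="${PREEMPT_KNOB:-/sys/kernel/debug/sched_preempt}"

# Print the active mode, i.e. the token wrapped in parentheses.
# $1: a line such as "none voluntary (full)"
active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*(\([a-z]*\)).*/\1/p'
}

preemptctl() {
    case "$1" in
        get) active_mode "$(cat "$PREEMPT_KNOB")" ;;
        set) printf '%s\n' "$2" > "$PREEMPT_KNOB" ;;
        *)   echo "usage: preemptctl get | set none|voluntary|full" >&2
             return 2 ;;
    esac
}
```

Usage would be along the lines of `preemptctl get` or, as root, `preemptctl set full`. Both operations simply fail if debugfs access is restricted.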

Sounds like Workstation should switch to preempt=full for F35.

I agree with Justin that there's no strong reason to make major changes like this in a stable release.

Wouldn't it make sense for these to be configurable through the tuning daemon (tuned)? That seems like what this is for...

We do not install tuned in Workstation.

why not? It's easy to do this.

Nobody has ever proposed it afaik.

/sys/kernel/debug/sched_preempt

debugfs is subject to kernel lockdown, which is in effect for UEFI Secure Boot systems, so currently it can't be checked or changed during runtime.

[  531.180062] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

Hmm, apparently there's now a GUI applet for tuned made by @xvitaly: https://github.com/EasyCoding/tuned-switcher

System tray applet is fully functional:

  • full D-Bus support;
  • profile monitoring (will report if another application changes the profile);
  • easy profile switching;
  • shows currently selected profile;
  • supports automatic mode.

GUI version is still in development.

Packaged and can be installed via sudo dnf install tuned-switcher.

The workstation UI for such things is in the control-center

The power-profiles-daemon developers said that it will not work on desktop Ryzen PCs, for example, and desktop Ryzens are not the only hardware where it doesn't work. This may only be the case right now and turn out to be temporary, so this is just FYI.

This ticket is tagged with pending-action and the F35 milestone. What needs to be done?

Nothing yet, as the patch that allows us to switch preempt mode from where we are doesn't seem to exist yet. I pinged Lutro on the fedora-devel thread to see if it is in progress. Once that patch goes in (and the resulting kernel is available as a rebase target for stable Fedora), it will be time to start testing a switch of the Workstation defaults. Users on stable Fedora will be able to make the switch manually, and then you can make a decision on setting a default for the next version. As it doesn't seem to be in 5.14, it will likely not be in the install kernel for F35.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 36 (was: Fedora 35)

2 years ago

Summary of a chat with @jforbes:

  • Fedora kernels use voluntary preemption by default and will continue to do so
  • CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window, and is needed to change to full preemption using kernel parameter preempt=full
  • We would need anaconda to add preempt=full to boot loader configuration, so that it's applied to new desktop installations
  • Discuss whether to run some desktop only RPM, that's just a scriptlet which has grubby set the boot parameter, so it's applied to upgrades
  • changing it via debugfs might be possible, but is untested, including whether it'll give the same results as the command line parameter
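
A rough sketch of the decision logic such an upgrade scriptlet might use, assuming it should only add the parameter when no preempt= argument is present, so that an explicit user choice is left alone. The function name needs_preempt_arg is hypothetical, and checking /proc/cmdline (the running kernel's command line) is only a proxy for the boot loader configuration.

```shell
#!/bin/sh
# Decide whether a kernel command line still lacks a preempt= argument.
# Returns 0 when it should be added, 1 when some preempt= value
# (full, voluntary or none) is already present.
# $1: a kernel command line, e.g. the contents of /proc/cmdline
needs_preempt_arg() {
    for arg in $1; do
        case "$arg" in
            preempt=*) return 1 ;;
        esac
    done
    return 0
}

# Sketch of the scriptlet body (requires root, so left commented out):
# if needs_preempt_arg "$(cat /proc/cmdline)"; then
#     grubby --update-kernel=ALL --args="preempt=full"
# fi
```

Skipping systems that already carry any preempt= value is a design choice made for this sketch; a real scriptlet could equally decide to overwrite it.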

kernel crystal ball says:

the v5.16 kernel predictions: merge window closes on Sunday, 2021-11-14 and release on Sunday, 2022-01-09
the v5.17 kernel predictions: merge window closes on Sunday, 2022-01-23 and release on Sunday, 2022-03-20

Fedora 36 schedule key says:

Branch Fedora Linux 36 from Rawhide     Tue 2022-02-08

So it does seem plausible to get it done for Fedora 36, but it needs testing, and a decision about whether to enable it on new installations only or also on upgrades.

http://phb-crystal-ball.org/
https://fedorapeople.org/groups/schedule/f-36/f-36-key-tasks.html

Importantly, running with preempt=full at all needs some significant testing across desktop use cases as all Fedora and RHEL kernels have shipped with preempt=voluntary for a very long time. I honestly do not expect there to be issues, but we need to be sure. FWIW, I plan to switch over to 5.16-rc kernels for desktop use on at least one machine here to give preempt=full some testing as soon as I think things have stabilized enough (the 5.16 MR has been more painful than average so far). But I am probably not representative of the average Fedora workstation user as I still spend the majority of my time in terminals, and I don't have a btrfs filesystem anywhere here.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

CONFIG_PREEMPT_DYNAMIC just arrived in the 5.16 merge window

As far as I know, CONFIG_PREEMPT_DYNAMIC arrived in the 5.12 merge window:

found in Linux kernels: 5.12–5.15, 5.15+HEAD

https://cateee.net/lkddb/web-lkddb/PREEMPT_DYNAMIC.html

Yes, but no. It did, but in a form that I was not willing to enable (it required preempt=full as the default). There was a long thread about it on fedora-devel. 5.16 is where the options came in that actually allow me to enable it for Fedora.

running with preempt=full at all needs some significant testing across desktop use cases

Many desktop distros successfully use CONFIG_PREEMPT by default. Seems like full preemption is already well-tested.

I am not too concerned about "many desktop distros". I know what Fedora and Red Hat testing looks like; what the rest of the world does is not of much concern to me. RHEL in particular has a fairly substantial performance testing setup, and preempt=full was never tested there. Voluntary has been the default for everything but s390 for a very long time. I want to see real testing. Luckily F38 is still a ways away, so we have time to do some of that testing.

From pykickstart docs, looks like it would be

bootloader --append "preempt=full"

If sysfs or debugfs, it could be done with a /etc/udev/rules.d drop-in.

We discussed this ticket at yesterday's working group meeting. The group is generally positive about pursuing full preemption. @chrismurphy has kindly agreed to monitor the situation and keep us informed.

5.16.0-0.rc4.29.fc36.x86_64+debug

# cat /sys/kernel/debug/sched/preempt 
cat: /sys/kernel/debug/sched/preempt: Operation not permitted
# dmesg
...
[  189.486988] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

I think that with kernel lockdown in place, due to UEFI Secure Boot being enabled here, it's not possible to change this at runtime via debugfs. It will need to be done with a kernel boot parameter, which means we need to do this via an anaconda kickstart.

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for workstation. I don't know that server is interested, given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that cloud would be either. The main benefit is for interactive/desktop type workloads. I do plan to add it to the test plan for the 5.16 kernel test week, and possibly even do a preempt=full ISO.

Sample size of 1: I've been running it for a few days now along with bcc-tools fileslower, and I'm not seeing anything out of the ordinary compared to voluntary. Of course, fileslower is only looking at IO latency at the VFS layer, so it's a very narrow view.

Can we have OpenQA run through testing all variants with preempt=full by default? Or organize some kind of test week to test server, cloud, and workstation workloads with this mode?

This would certainly be a good idea before changing the default for workstation. I don't know that server is interested, given that RHEL seems happy with the voluntary default for kernel-ark right now, and I am not sure that cloud would be either. The main benefit is for interactive/desktop type workloads. I do plan to add it to the test plan for the 5.16 kernel test week, and possibly even do a preempt=full ISO.

Well, from the Cloud side (cc: @davdunc), I can say we're interested in any change that might potentially benefit us. From the Server side, data is always interesting when trying to figure out how to make things better. Doing that lets us figure out whether there's value or difference in being different across variants on this point.

Personally, I prefer to minimize differences across Fedora variants if it's reasonably possible. So testing across all the workload types makes sense to see if it makes sense to maintain a delta between desktop and non-desktop variants.

Metadata Update from @chrismurphy:
- Issue set to the milestone: Fedora 37 (was: Fedora 36)

a year ago

Working full preemption is available with preempt=full boot parameter in 5.16+ kernels. I've been using it since available, and I think it's ready for wider testing. How to do that?

Discussed at the meeting: chrismurphy will pull together a system-wide change proposal for F37 and coordinate with QA on test day(s).

Just to be clear, this system wide change proposal is for workstation only.

No, we're planning to propose this change for all editions. Certainly there's no reason for Workstation to differ from other desktop editions. As for server editions, I'll let Neal or Chris comment.

There absolutely is, which is why I did not make preempt=full the default.

I had some trouble with the backlight on my laptop with preempt=full set: the brightness level dropped noticeably, even though the slider was at 100%. I have an OLED display and use the ICC Brightness tool: https://github.com/udifuchs/icc-brightness

preempt=full works great on my PC. Much better experience. I haven't noticed any issues, at least not yet.

I wonder, could preempt=full affect battery life on laptops?

There absolutely is, which is why I did not make preempt=full the default.

There isn't sufficient proof to indicate we should not at least try across the board. I've discussed it with @davdunc, @dcavalca, and @salimma in Fedora Cloud/Server and we are at least open to trying it. The data we have says it will at worst be net-neutral.

FWIW, I've been running preempt=full on my F36 desktop for a while now, working and gaming without any issue.

The WG discussed this issue during yesterday's meeting.

It seems that more benchmarking is required to try and clear up any uncertainty around using preempt=full on server and cloud. @chrismurphy is going to look into making this happen.

I think we're in agreement that this proposal needs more work, and so won't be ready for F37 - adjusting the milestone to F38.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 38 (was: Fedora 37)

10 months ago

I switched to preempt=full to give it a try on my desktop:

[    0.102309] Dynamic Preempt: full
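
For anyone wanting to confirm the mode the same way, a small sketch that pulls the value out of a kernel log line like the one above. The function name dynamic_preempt_mode is made up for illustration, and it assumes the log line has exactly the "Dynamic Preempt: <mode>" shape shown here.

```shell
#!/bin/sh
# Extract the mode from a kernel log line such as
# "[    0.102309] Dynamic Preempt: full".
# Prints nothing if the line does not contain the marker.
dynamic_preempt_mode() {
    printf '%s\n' "$1" | sed -n 's/.*Dynamic Preempt: \([a-z]*\).*/\1/p'
}

# Typical use (reading the kernel log may need root):
# dynamic_preempt_mode "$(dmesg | grep 'Dynamic Preempt')"
```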

If I do run into bugs/problems, how would they manifest themselves on my workstation?

It seems that more benchmarking is required to try and clear up any uncertainty around using preempt=full on server and cloud. @chrismurphy is going to look into making this happen.

Hi @chrismurphy, are you still planning to work on this?

Also found this document which is old but has a nice summary of impact on slide 68. Notably, average latency is increased, but maximum latency and latency variability is reduced. (It also reduces throughput, but this is less interesting for desktops.)

I wonder if full preemption might be overkill? There is better documentation of the various options here and it sure sounds like the "Low-Latency Desktop" option might be more in line with what we're looking for...?

After two weeks of preempt=full, I can't find any noticeable changes (better or worse).

None of this may help, but: I'm on 6.2.10-200 (Fedora 37) with a Ryzen 3700X and a Radeon RX 6600. I have two 4K displays and I run some fairly memory-hungry applications.

Also found this document which is old but has a nice summary of impact on slide 68. Notably, average latency is increased, but maximum latency and latency variability is reduced. (It also reduces throughput, but this is less interesting for desktops.)

I wonder if full preemption might be overkill? There is better documentation of the various options here and it sure sounds like the "Low-Latency Desktop" option might be more in line with what we're looking for...?

Just to clarify, the request in this Pagure issue is for "full preempt" aka "Preemptible Kernel (Low-Latency Desktop)" to be the default. The names are unfortunately confusing, but this issue is not asking for any of the "RT" modes to be default - either "Preemptible Kernel (Basic RT)" or "Fully Preemptible Kernel (RT)"

My anecdata: I have been using "preempt=full" with Linux kernel 5.19.x (Ubuntu 22.04 HWE) and Linux kernel 6.1.x (latest longterm kernel) on different systems since they became available to me. Neither has experienced problems with this setting, and both perform well.

I cannot claim that I notice the difference in responsiveness, even subjectively.

I'd like to do it, but I'm not sure how. Adding another kernel parameter is unpopular, since command-line space is limited and the parameter isn't user-facing at all. But the runtime switch isn't available to us when UEFI Secure Boot is enabled, because debugfs is (mostly?) disabled.

# cat /sys/kernel/debug/sched/preempt
[68690.741678] Lockdown: cat: debugfs access is restricted; see man kernel_lockdown.7

If it must be done with a boot parameter, it would need to be done in Anaconda. And since it probably needs to be limited to desktops, Anaconda needs a way to set edition/spin specific kernel parameters. That's quite a bit beyond my abilities to figure out, but I'll help where I can.

My own anecdote(s), no problem using preempt=full since it was introduced. But also it isn't something I think anyone would ever notice except when there's cpu latency pressure. There have been a few user reports from audiophiles solving "stuttering" (for lack of a proper sophisticated term) by enabling full preemption.

I think perhaps it needs to mature to the point where kernel developers expose a runtime configuration in sysfs outside debugfs. And then we could implement an edition/spin specific systemd unit of some sort that flips it on boot, and can easily be disabled/enabled by users Fedora wide.

Well I bet the anaconda developers can figure out whether or not to set this parameter if we ask them to, but then it would only take effect for new installs, which doesn't sound good to me.

It kinda sounds like this ticket is not yet actionable for us if it's going to require further kernel changes? Maybe we can say we'd like to switch to "Low-Latency Desktop" if an easier way to enable it is added in the future?

Metadata Update from @catanzaro:
- Issue untagged with: pending-action

2 months ago

So, after more than a year of this, I have been running preempt=full for quite a while now. I started with it on just one desktop and now I run it on my desktops. I still do not run it on the development boxes, as I see little value. I would be completely comfortable with workstation setting this as the default.

What I have not seen, and would like to, is a good benchmark on some server workloads with it. It is kind of expected that we will lose a bit of throughput on heavy server workloads. How much? How much do we lose if we are running a desktop with server workloads in the background? How much do we lose if we are not really running a desktop, and run exclusively server workloads? Those are the results that would make me comfortable changing the default in the kernel, which would force it across all editions. FWIW, I expect we would lose a measurable amount if we are running a desktop with server workloads in the background. I expect it might be "noise" when running server workloads without a desktop, but I could be wrong.

We discussed this issue at yesterday's working group call.

It seems that there's a consensus that we want to enable full preemption for workstation. @chrismurphy has volunteered to write a change proposal for F39.

It would be good if we could convert existing installs to use full preemption on upgrade. However, this would require some additional work. It would be great if someone wanted to take this on.

Metadata Update from @aday:
- Issue set to the milestone: Fedora 39 (was: Fedora 38)

9 days ago

Without the possibility of getting support for adjusting this on the fly with sysfs, would the other option be a modification like this:

sudo grubby --update-kernel=ALL --args="preempt=full"
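
A small sketch built around the grubby command above, adding a way back to the kernel's built-in default and a dry-run switch. The function name set_preempt and the DRY_RUN variable are hypothetical; only the grubby options --update-kernel, --args and --remove-args are real.

```shell
#!/bin/sh
# Add or remove preempt=full on all installed kernels via grubby.
# $1: "full" to add the argument, "default" to remove it again
set_preempt() {
    case "$1" in
        full)    set -- --args="preempt=full" ;;
        default) set -- --remove-args="preempt=full" ;;
        *)       echo "usage: set_preempt full|default" >&2
                 return 2 ;;
    esac
    # With DRY_RUN=1 the command is only printed; otherwise grubby
    # is executed, which needs root. The change applies on next boot.
    ${DRY_RUN:+echo} grubby --update-kernel=ALL "$@"
}
```

Running `DRY_RUN=1 set_preempt full` shows the command without touching the boot loader entries, and `grubby --info=ALL` can be used afterwards to check the resulting arguments.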

I wasn't sure if there was a more creative way to get it enabled on the next boot. 😉
