#71 Should we include thermald by default?
Closed: Fixed 4 years ago by aday. Opened 6 years ago by orschiro.

Dear all,

thermald is a thermal daemon that can be found in the default repositories and can noticeably impact CPU heat production and thus fan noise.

I am wondering why it is not installed by default? Are there any objections I am not aware of?

Thank you!


The GitHub repository seems to be really lacking in documentation that goes into why. Some background:

https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon

@labbott - do you know anything about thermald? Is it a useful adjunct to what the kernel/BIOS does?

This has been brought up a few times before, see https://bugzilla.redhat.com/show_bug.cgi?id=1440479 and https://pagure.io/fesco/issue/1698 for some past history. https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/PR2QS3PMGQ26D3TM6HFE3XRATWOYRMDX/ also has some more discussion. Basically, nobody has really done enough work to show that it should be enabled by default on all systems. That said, I think if someone provided enough data and worked through the Fedora change process we could fix that.

Thank you @labbott and @otaylor, for your insights!

That said, I think if someone provided enough data and worked through the Fedora change process we could fix that.

Alright, let's see if someone can jump in here.

Yours,

Robert

I've been using thermald on an Intel NUC and HP Spectre for some time without any problems. However, recently I ran into a goofy problem with Firefox and Chrome, kidle-inject making the system unusable. [1]

I disabled thermald, and the problem is solved. That's only correlation; I haven't done enough testing since then to rule out some kind of kernel and thermald interaction.

If there is a way to show thermald is preferring kidle-inject to prevent the laptop from getting warm enough to need fans, that might be OK behavior for a laptop running exclusively on battery; or perhaps only once it has less than 30% battery remaining? But definitely not when plugged into AC.

[1]
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/BK2DVB4PA37FMB2DTMUI4YBM7YDGSJTK/
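The battery-dependent behavior suggested in this comment could be sketched roughly as follows. This is only an illustration of the proposed policy, not anything thermald implements: the function name, its parameters, and the optional threshold are all hypothetical.

```python
from typing import Optional


def idle_injection_allowed(on_ac: bool,
                           battery_percent: float,
                           low_battery_threshold: Optional[float] = None) -> bool:
    """Hypothetical policy: is aggressive idle injection acceptable right now?"""
    if on_ac:
        # Plugged into AC: never trade responsiveness for lower temperature.
        return False
    if low_battery_threshold is None:
        # Variant 1: acceptable whenever the laptop runs on battery.
        return True
    # Variant 2: acceptable only once the battery drops below the threshold.
    return battery_percent < low_battery_threshold
```

With a threshold of 30, `idle_injection_allowed(False, 25.0, 30.0)` allows idle injection, while the same call with `on_ac=True` does not.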

Setting aside the problem Chris discovered, which needs a bug report....

After skimming through https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon, I'm left with a question for kernel people. Is thermald an elaborate workaround for kernel or hardware problems (incorrect ACPI data)? Or is it a technically-elegant solution to problems that cannot be better solved elsewhere in the stack?

A userspace daemon to manage such a low-level issue as thermal performance just seems rather suspect to me. I believe we rejected some custom power management daemon or configuration not too long ago, which feels similar.

I don't have a problem including it by default, but enabling it by default requires analysis. I've definitively traced slowdowns due to kernel kidle-inject processes to thermald. With it running, almost anything that would make the laptop work hard enough to cause the fans to run triggers kidle-inject to soak up the CPU, leaving very few resources for user space. On the plus side, the laptop never gets hot, doesn't run its fans, and has great battery life. On the other hand, any appreciable CPU workload is slow to stalled and unresponsive.

I think it could be a good option if it were started on demand, e.g. when on battery. And if there were some kind of simplistic performance slider interface to let the user choose their consequences.
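For anyone trying to reproduce the kidle-inject diagnosis described in this comment, a minimal sketch for spotting the idle-injection kernel threads is shown here. It assumes the threads' names start with `kidle_inj` (the exact name varies between kernel versions, so the prefix is an assumption) and that process names can be read from `/proc/<pid>/comm`. Finding the threads only shows that idle injection is set up; how much CPU time they consume still has to be checked with a tool such as top.

```python
from pathlib import Path
from typing import List, Tuple


def find_idle_injection_threads(proc_root: str = "/proc",
                                prefix: str = "kidle_inj") -> List[Tuple[int, str]]:
    """Return (pid, name) for every task whose name starts with `prefix`."""
    matches = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the task exited or is unreadable while we scanned
        if name.startswith(prefix):
            matches.append((int(entry.name), name))
    return sorted(matches)
```

Calling `find_idle_injection_threads()` with thermald running and again with it stopped gives a quick before/after comparison.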

CC @benzea @gicmo

This was discussed on fedora-devel recently. Final comment there is not encouraging.

Is thermald really ready? My instinct is to punt on this for further development?

@catanzaro, I think it is pretty ready overall.

Sorry, seems like I missed the message about bad performance on the HP notebook. From what I had gathered, so far we had not seen any performance regression. And Intel upstream has always been good at addressing issues we found so far.

That said, I kind of agree that we should at least figure out what is going on on that HP machine. I would say that thermald is more likely to give a boost rather than this regression.

@chrismurphy, could you provide the exact model? Are you able to provide logs from thermald with --loglevel=debug?

@benzea I've been running thermald on Fedora 31 for a few months now, and haven't experienced any downside with this second go-around. Thus the original report is stale/obsolete. But I have filed a bug about some spurious warnings and complaints I see in the journal, to see whether this is intended or correctable.
https://bugzilla.redhat.com/show_bug.cgi?id=1793688

I think it is pretty ready overall.

OK, is there an updated change page? I understand the change was never approved by FESCo?

Right, I now understand what happened. I don't remember why neither of us responded to that mail, and I never CC'ed myself on fesco#2241 and had not realised it got rejected there :-(

Seems like we need to update the page and try to re-submit it.

Yup. Importantly, it was rejected because you hadn't had a chance to investigate Chris's bug report, not for any other reason. So while it's too late for F32, you can still propose a change for F33.

I still have an outstanding question from above:

After skimming through https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon, I'm left with a question for kernel people. Is thermald an elaborate workaround for kernel or hardware problems (incorrect ACPI data)? Or is it a technically-elegant solution to problems that cannot be better solved elsewhere in the stack?

A userspace daemon to manage such a low-level issue as thermal performance just seems rather suspect to me. I believe we rejected some custom power management daemon or configuration not too long ago, which feels similar.

thermald is developed by Intel in conjunction with the corresponding kernel modules. I believe the reason is that it tries to apply machine-dependent thermal rules, and it is neither feasible nor desirable to have all that logic in kernel space. Also, Intel does develop both thermald and the kernel drivers together (and e.g. kernel changes have been made specifically for thermald).

Now, to adjust to each machine, it reads an XML configuration. In the "usual" scenario, i.e. as currently intended by Intel, this configuration is generated from ACPI DPTF tables using dptfxtract (which is in rpmfusion). However, that is not necessarily the case, and one could configure it using different methods in principle.
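To give a feel for what such a configuration contains, here is a small sketch that lists the trip points in a thermald-style XML file. The sample document is illustrative only: its element names are modeled on the thermal-conf.xml format as I remember it and should be checked against the man page, and the assumption that temperatures are given in millidegrees Celsius should be verified as well. `list_trip_points` is a hypothetical helper, not part of thermald.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Illustrative sample only; not generated by dptfxtract or taken from a real machine.
SAMPLE_CONFIG = """
<ThermalConfiguration>
  <Platform>
    <Name>Example platform (illustrative)</Name>
    <ProductName>*</ProductName>
    <ThermalZones>
      <ThermalZone>
        <Type>cpu</Type>
        <TripPoints>
          <TripPoint>
            <SensorType>x86_pkg_temp</SensorType>
            <Temperature>75000</Temperature>
            <type>passive</type>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
"""


def list_trip_points(xml_text: str) -> List[Tuple[str, str, float]]:
    """Return (zone type, trip type, temperature in deg C) for each trip point."""
    root = ET.fromstring(xml_text)
    result = []
    for zone in root.iter("ThermalZone"):
        zone_type = zone.findtext("Type", default="unknown")
        for trip in zone.iter("TripPoint"):
            # Assumed unit: millidegrees Celsius.
            millidegrees = int(trip.findtext("Temperature", default="0"))
            trip_type = trip.findtext("type", default="unknown")
            result.append((zone_type, trip_type, millidegrees / 1000.0))
    return result
```

Running `list_trip_points(SAMPLE_CONFIG)` on the sample yields one passive trip point for the cpu zone at 75 degrees.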

I think running thermald can already give advantages even without any machine specific configuration (i.e. removal of default RAPL limits as long as the CPU is in a reasonable temperature range). But I would need to double check this.
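One way to double check this would be to compare the RAPL power limits with and without thermald running. The sketch below reads them from the kernel's powercap sysfs interface; it assumes the package zone is exposed at `/sys/class/powercap/intel-rapl:0` and that the files are readable by the current user, and `read_rapl_limits` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def read_rapl_limits(zone_dir: str = "/sys/class/powercap/intel-rapl:0") -> Dict[str, float]:
    """Map each RAPL constraint name (e.g. long_term) to its power limit in watts."""
    zone = Path(zone_dir)
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        index = name_file.name.split("_")[1]
        limit_file = zone / f"constraint_{index}_power_limit_uw"
        try:
            name = name_file.read_text().strip()
            microwatts = int(limit_file.read_text().strip())
        except (OSError, ValueError):
            continue  # skip constraints that are missing or unreadable
        limits[name] = microwatts / 1_000_000  # microwatts to watts
    return limits
```

If the limits reported with thermald active are higher than those seen after a fresh boot without it, that would support the claim above; identical values would suggest that the machine-specific configuration is needed.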

In the "usual" scenario, i.e. as currently intended by Intel, this configuration is generated from ACPI DPTF tables using dptfxtract (which is in rpmfusion).

Well that sounds pretty bad?

RPM Fusion non-free
/var/cache/dnf/rpmfusion-nonfree-updates-ae9d50128db39ba7/packages/dptfxtract-1.4.2-1.fc31.x86_64.rpm

I think we'd like to do this for Fedora 33, if possible? @benzea can you give a status report, either in the ticket or at the 10 Mar meeting (your choice)?

Metadata Update from @chrismurphy:
- Issue set to the milestone: Fedora 33

5 years ago

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

5 years ago

I do not think a lot has changed since.

We do know that thermald is really important on certain machines. For example Dell XPS13 models can easily get a >30% performance boost using thermald and other machines have similar issues (e.g. throttled). Lenovo has worked on an in-firmware solution, but we cannot rely on something like that for all vendors.

I really don't like the whole situation with dptfxtract and I think it would be great to figure out a way to collect configurations for existing machines. But I fear thermald is the best/only option we have to address these thermal management issues and we really need to do so to be able to compete performance wise.

I'll try to go over the change proposal before the meeting to see what needs to be updated there.

As I have been using throttled for more than a year and a half, I'm really advocating for any solution that we ship out of the box. Are there any particular reasons for not shipping throttled instead of thermald, as the former doesn't depend on anything proprietary (as far as I know)?

Yes, at least two that I know of:

  1. throttled only addresses one specific aspect of the problem,
  2. throttled only works with legacy boot (this might have changed).

thermald on the other hand can do a lot more and will adjust further parameters to ensure the system is stable and performs well at the same time. In general, it also supports different power profiles, I believe.

Thank you for the clarification, Benjamin!

throttled only addresses one specific aspect of the problem,
throttled only works with legacy boot (this might have changed).

I do have a UEFI boot here and it works.

thermald on the other hand can do a lot more and will adjust further parameters to ensure the system is stable and performs well at the same time. In general, it also supports different power profiles, I believe.

Ok. But from my POV it looks like we can use throttled and fix one aspect or use thermald that won't be usable out of the box as it will depend on a package from rpmfusion.

Ok. But from my POV it looks like we can use throttled and fix one aspect or use thermald that won't be usable out of the box as it will depend on a package from rpmfusion.

As I see it, the big problem with that is that this is inherently unsafe. Quoting the thermald README:

  • The major change in this version is the active power limits adjustment.
    This will be useful to improve performance on some newer platform. But
    this will will lead to increase in CPU and other temperatures. Hence this
    is important to run dptfxtract version 1.4.1 tool to get performance
    sensitive thermal limits (https://github.com/intel/dptfxtract/commits/v1.4.1).
    If the default configuration picked up by thermald is not optimal, user
    can select other less aggressive configuration. Refer to the README here
    https://github.com/intel/dptfxtract/blob/master/README.txt

Now, I do expect that we are fine at least on e.g. Thinkpad machines. But I have no way of confirming this for consumer products and I feel much more confident leaving this to some Intel people instead of possibly overheating random laptop models.

So yes, unfortunately lifting the limits requires dptfxtract or a configuration file shipped/created in some other way. But I just don't feel that it is sane to ship something by default that may be unsafe.

At meeting:

Summary is, dptfxtract is essentially a black box that eats ACPI data and produces a machine specific configuration which is then used by thermald.

Decision: Generally no as described due to the non-free dptfxtract requirement; but explore the possibility of shipping and maintaining some subset of configuration files.

Actions:

  • @chrismurphy to attempt to start a discussion on 01.org where thermald is maintained;
  • ? to start a discussion on the devel@ list specifically about the configuration files path, emphasizing the benefit to Fedora users (30-40% better battery life without negatively impacting performance), whether such a package of configurations is maintainable, and whether other distributions are already using thermald.

Metadata Update from @catanzaro:
- Issue untagged with: meeting

5 years ago

Looks like @mjg59 forked thermald and works on removing the dptfxtract dependency. Maybe we could consider this in the future (when Matthew says that it's ready).

https://github.com/mjg59/thermal_daemon
https://mjg59.dreamwidth.org/54923.html

Even better would be to help him

We were a bit surprised to find the change has been proposed again here: https://fedoraproject.org/wiki/Changes/ThermalManagementWS

Discussed it today and our conclusion was: further discussion required

To be clear, the Working Group's previous decision (https://pagure.io/fedora-workstation/issue/71#comment-634021) was to reject thermald due to its dptfxtract dependency.

Metadata Update from @chrismurphy:
- Issue tagged with: meeting

4 years ago

Discussed at today's meeting. We've reversed our previous decision based on the understanding that (a) dptfxtract is not actually strictly required, and (b) Ubuntu has been shipping thermald without dptfxtract for six years. thermald is approved.

Action items:

  • Benjamin or Christian to submit pull request to comps to add thermald to @workstation-product
  • Benjamin or Christian to submit pull request to fedora-release to update presets

Metadata Update from @catanzaro:
- Issue untagged with: meeting

4 years ago

(Also, since this was a change proposal submitted to FESCo, we should probably wait for FESCo to approve as well.)

The WG's role is done here. @benzea, please reopen if you can't get the changes in for F33.

Metadata Update from @aday:
- Issue close_status updated to: Fixed
- Issue status updated to: Closed (was: Open)

4 years ago
