Kevin Boone

Using thinkfan for fan control on Lenovo (Linux) laptops

A problem that affects high-performance laptop computers is that they tend to either run hot, or generate a lot of fan noise, or both. I have several Lenovo ThinkPad laptops, all with H-class Intel CPUs, all rated to run continuously at 65C. Some of these have discrete GPUs, with similar temperature ratings.

The default behaviour of the cooling system on my P53 is not to turn the fans on at all until the CPU/GPU temperatures reach about 60C, and then turn both the fans on at high speed. This is probably safe, given the ratings of the components, but I’d prefer to have a bit more fan noise, and not burn my fingers on the keyboard.

So I’ve started experimenting with the thinkfan utility, as a way to get some control over the fan curve, that is, the relationship between temperatures and fan speed.

Note:
You can probably break a computer if you fiddle with its cooling behaviour without proper care and attention. It’s something to be done cautiously, with careful observation.

thinkfan isn’t the only utility for temperature control on Linux, and it’s probably not limited to ThinkPads. It’s reasonably well documented but, as with all Linux documentation, it’s written for people who don’t need it. thinkfan is far from a zero-configuration utility, and setting it up for particular hardware can be rather fiddly.

In this article I explain how temperature monitoring and fan control work in Linux, and then describe how I configure thinkfan, using my Lenovo P53 as an example. The P53 is a big, heavy, industrial-grade laptop, that can generate an awful lot of heat.

Temperature monitoring in modern Linux systems

Temperature monitoring is part of the ‘hwmon’ subsystem, which the kernel exposes in the pseudo-directory /sys/class/hwmon. Various kernel modules, like ‘coretemp’, create pseudo-files in that directory, indicating temperature and many other things.

If you look in the hwmon directory, you’ll see it contains only subdirectories, named hwmon0, hwmon1, etc. These subdirectories are, in fact, links to other directories, usually under /sys/devices. Unfortunately, the numbering of the directories in /sys/class/hwmon isn’t guaranteed to be stable, which can be a problem. In practice, I’ve found that it’s reasonably stable on the same hardware, but I don’t think you should really rely on this if you don’t have to – and most of the time you don’t (more below).

All the temperature sensors have pseudo-files with names of the form tempNN_input, where NN is a number that starts at 1 (not 0).

To see them all, try this:

$ find -H /sys/class/hwmon/* -name 'temp*_input'

You’ll need the -H here because the directories are symbolic links. On my Lenovo P53 I see 23 temperature sensors. Do I know what they all are? Nope – not even close. That’s also a problem.
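Each tempNN_input file holds a plain integer in millidegrees Celsius, so 45000 means 45C. As a minimal sketch (the function name show_temps is my own invention, and the optional argument exists only so that the function can be pointed at some other directory), this prints every sensor with its reading in whole degrees:

```shell
# show_temps [BASE]: print each temp*_input file under BASE with its value in
# whole degrees Celsius. BASE defaults to /sys/class/hwmon.
show_temps() (
    base=${1:-/sys/class/hwmon}
    find -H "$base"/* -name 'temp*_input' 2>/dev/null | sort |
    while read -r f; do
        # A sensor may return an error when read; skip it if so.
        raw=$(cat "$f" 2>/dev/null) || continue
        [ -n "$raw" ] || continue
        # hwmon reports millidegrees Celsius: 45000 means 45C.
        echo "$f $((raw / 1000))C"
    done
)
```

Running show_temps with no arguments reads the real hwmon directory.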

I have some idea what they are, because some tempNN_input files are accompanied by a label. So, for example:

$ cat /sys/class/hwmon/hwmon10/temp1_label 
Package id 0

This is only partially useful, because I don’t know what specific hwmon driver corresponds to hwmon10. There are two ways I might find out.

$ ls -l /sys/class/hwmon/ | grep hwmon10
lrwxrwxrwx 1 root root 0 Aug 25 19:24 hwmon10 -> ../../devices/platform/coretemp.0/hwmon/hwmon10

You’ll see coretemp in the directory name. Another method is to look at the name file in the hwmon directory:

$ cat /sys/class/hwmon/hwmon10/name 
coretemp

In both cases I get coretemp as the name. It turns out that coretemp is a driver that measures temperatures on the individual cores of Intel-like CPUs. On an ARM device you might have cpu-thermal instead.

To see all the various driver names, and the directories in hwmon they correspond to (on this current boot), try this:

$ find -H /sys/class/hwmon/* -name name -printf "%p " -exec cat {} \;
/sys/class/hwmon/hwmon0/name acpitz
/sys/class/hwmon/hwmon1/name BAT0
...

In short, each subdirectory of /sys/class/hwmon has a file called name, whose contents indicate what its driver is, and a bunch of tempNN_input files that indicate temperatures associated with that driver. If we’re lucky, each tempNN_input file has a tempNN_label file that says what that sensor is.
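Putting those three kinds of file together gives a quick overview of the whole system. This is only a sketch: hwmon_summary is a name I made up, and the tab-separated output format is an arbitrary choice.

```shell
# hwmon_summary [BASE]: one tab-separated line per sensor, in the form
#   hwmonN  driver  tempN  label  degrees
# A sensor with no label file is shown with "-" in the label column.
hwmon_summary() (
    base=${1:-/sys/class/hwmon}
    for dir in "$base"/hwmon*; do
        [ -r "$dir/name" ] || continue
        driver=$(cat "$dir/name")
        for input in "$dir"/temp*_input; do
            [ -e "$input" ] || continue
            sensor=${input##*/}          # e.g. temp1_input
            sensor=${sensor%_input}      # e.g. temp1
            label=$(cat "$dir/${sensor}_label" 2>/dev/null) || label=
            raw=$(cat "$input" 2>/dev/null) || continue
            [ -n "$raw" ] || continue
            printf '%s\t%s\t%s\t%s\t%sC\n' \
                "${dir##*/}" "$driver" "$sensor" "${label:--}" "$((raw / 1000))"
        done
    done
)
```

The unlabelled sensors stand out immediately in the output, which at least tells you where the guesswork will be needed.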

So far, so good; but how do we know which are useful drivers and temperature metrics to use for fan control? The simple answer is: we don’t. The coretemp module is almost always going to be useful, because CPU temperature is so significant. But what about disk temperatures? Or GPU temperatures?

There’s really no systematic solution here, apart from a heap of web searching. NVME disks, for example, are monitored by the nvme driver. On Lenovo Thinkpads, the driver thinkpad provides a temperature indicator whose label is GPU, which does exactly what the name suggests. If you have an NVidia GPU, it’s possible to get information directly from the proprietary NVidia kernel module, if you’re using it.

The thinkpad driver on my P53 exposes a bunch of temperature metrics with no labels. What do they do? I don’t know. Some websites provide information that might be relevant to some Lenovo models. In general, enthusiasts have worked out what the sensors do on some models by trial-and-error, using tactics like directing a cooling spray at various components, while watching the temperature metrics.

What about the temperature of the wifi adapter? On my P53, this information comes from the iwlwifi_1 hwmon driver but, again, this could well be different on another system.

The sensors utility will dump a heap of useful information from hwmon in a human-readable format; I found this to be a useful starting point, when figuring out what sensors to monitor. In the end, CPU temperature and GPU temperature (if your machine has a specific GPU) are likely to be the significant metrics; I’m also concerned about my NVME drives, which do tend to get quite warm under load.

There are (at least) a couple of other things to consider, when choosing which sensors to use for fan control.

First, not all sensors (not all tempNN_input files) correspond to dynamic values. Some represent other metrics of interest to the manufacturer. It’s important to realize that, just because an hwmon driver can read a sensor, that doesn’t mean its value can, or should, be interpreted.

A good example of this is the long-term temperature metrics stored by certain storage drives. Some sensors indicate average temperatures, or lifetime maximum temperatures. While these are important metrics for assessing the health of the drive, they’re useless for fan control.

Second, there’s no point using a metric for fan control if the fan speed makes little or no difference to it. If the relevant component isn’t in the airflow path of the fan, then the fan will have little effect on its temperature. For example, in my P53 the temperature of the wifi adapter only falls by a few degrees when running the fan at full speed for an extended duration. Using the wifi temperature sensor for fan control more-or-less guarantees that the fans will run all the time, even when the CPU/GPU temperatures are in the thirties, and the wifi adapter won’t run significantly cooler as a result.

In short, it’s worth monitoring all the temperatures, and seeing which ones change if you force the fan to run at full speed. If a temperature changes only a little, question whether it’s of any value for fan control. If a metric doesn’t change even when you cool the computer with a powerful external fan, then most likely it isn’t an instantaneous temperature measurement at all.
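One way to run that experiment is to record every sensor before and after a spell at full fan speed, and then compare the two recordings. The sketch below assumes nothing beyond the hwmon layout already described; the function names and the snapshot file format are my own.

```shell
# snapshot_temps [BASE]: print "path millidegrees" for every readable sensor.
snapshot_temps() (
    base=${1:-/sys/class/hwmon}
    for input in "$base"/hwmon*/temp*_input; do
        raw=$(cat "$input" 2>/dev/null) || continue
        if [ -n "$raw" ]; then
            echo "$input $raw"
        fi
    done
)

# compare_snapshots BEFORE AFTER: for each sensor present in both snapshot
# files, print the change in whole degrees (negative means it got cooler).
compare_snapshots() (
    while read -r path before; do
        after=$(awk -v p="$path" '$1 == p { print $2 }' "$2")
        [ -n "$after" ] || continue
        echo "$path $(( (after - before) / 1000 ))"
    done < "$1"
)

# Typical use, as root, on a machine with /proc/acpi/ibm/fan:
#   snapshot_temps > /tmp/before
#   echo level full-speed > /proc/acpi/ibm/fan; sleep 300
#   snapshot_temps > /tmp/after
#   echo level auto > /proc/acpi/ibm/fan
#   compare_snapshots /tmp/before /tmp/after
```

Sensors that show a change of zero, or close to it, are the ones to be suspicious of.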

Fan control on modern Linux systems

Linux (at least on Intel hardware) has essentially two methods for controlling built-in fans. The more modern method uses pseudo-files in the hwmon subsystem, which you can find using essentially the same method as for temperature sensors. You’re looking for files called pwmNN, where NN is a number starting at 1. There are some subtleties here but, essentially, you set the fan speed by writing a number 0-255 to the relevant file.
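As a rough illustration of the hwmon method, the sketch below finds the pwmNN files and converts a percentage to the 0-255 scale. Both function names are mine. One of the subtleties is that, with most drivers, you have to write 1 to the matching pwmNN_enable file to select manual control before a value written to pwmNN has any effect; I haven’t checked how every driver behaves, so treat that as a general rule rather than a guarantee.

```shell
# list_pwm [BASE]: print the pwmN control files, leaving out the related
# files such as pwm1_enable.
list_pwm() (
    base=${1:-/sys/class/hwmon}
    for f in "$base"/hwmon*/pwm[0-9]*; do
        case ${f##*/} in
            *_*) continue ;;           # skip pwm1_enable and similar
        esac
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
)

# pct_to_pwm PERCENT: convert 0-100 percent to the 0-255 scale, rounding down.
pct_to_pwm() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 100 ]; then
        echo "percent must be 0-100" >&2
        return 1
    fi
    echo $(( $1 * 255 / 100 ))
}

# Typical use, as root, with a path taken from list_pwm:
#   echo 1 > /sys/class/hwmon/hwmonN/pwm1_enable    # manual control
#   pct_to_pwm 50 > /sys/class/hwmon/hwmonN/pwm1    # roughly half speed
```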

The other method is to use the older interface /proc/acpi/ibm/fan. If you cat this file, you’ll get the current status of the fan, and brief instructions on what ‘commands’ to send, that is, what to write to the file, to control the fan. To set the fan speed, you’ll need to write level X to /proc/acpi/ibm/fan. X isn’t necessarily a number.

On my P53, the supported levels are 0-7, auto, disengaged, and full-speed. auto effectively puts control back in the hands of the machine’s firmware. On my machine level 7 and full-speed have the same effect. disengaged is subtly different – this disables the feedback system that regulates the fan speed. The result may (on some systems) be faster than full-speed, with the disadvantage that you could lose the ability to monitor the fan speed. Unless you’re in a very noisy environment, however, you’ll probably know that the fans are spinning at full speed, just because of the turbine roar they make.

On Thinkpad machines, fan control is done via the thinkpad_acpi module. You’ll need to load this module with the option fan_control=1 to enable non-default fan speeds.
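For experimenting by hand, before involving thinkfan at all, a small wrapper helps to avoid typing mistakes. This is a sketch: set_fan_level is a made-up name, the list of accepted levels is simply the one my P53 reports, and the optional second argument is there only so the function can be tried out on an ordinary file.

```shell
# To allow manual control at all, thinkpad_acpi needs fan_control=1. One way
# is a line like this in a file under /etc/modprobe.d (file name of your choice):
#   options thinkpad_acpi fan_control=1

# set_fan_level LEVEL [FILE]: write "level LEVEL" to the fan control file,
# after checking LEVEL against the values my P53 reports as supported.
set_fan_level() {
    case $1 in
        [0-7]|auto|disengaged|full-speed) ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
    echo "level $1" > "${2:-/proc/acpi/ibm/fan}"
}

# Typical use, as root:
#   set_fan_level 3       # fixed, moderate speed
#   set_fan_level auto    # hand control back to the firmware
```

It’s worth finishing any manual experiment with set_fan_level auto, so the firmware is back in charge.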

About thinkfan

thinkfan is a simple utility for controlling fan speed, originally on Thinkpad devices, but probably these days on others. It was originally written by Victor Mataré. It’s been around for more than ten years, and it’s still kind-of maintained, although the peak of activity was back in 2020. There are 36 open bugs on the project’s GitHub page, but that’s probably not a huge number, compared to the number of people using it. It’s in most of the standard Linux repositories.

The purpose of thinkfan is to modify the default fan/temperature response, that is, to change the fan speed for particular temperatures. It’s not clear to me whether careless use, or misbehaviour, of thinkfan could actually break a specific machine. I would imagine that thermal CPU/GPU throttling would act to reduce the amount of heat generated if the fan speed were inadequate. However, it’s not something to be complacent about, and I’ve monitored my laptops’ temperatures quite carefully when they’re working hard, before deciding that my fan control settings are adequate.

thinkfan won’t do anything useful without a YAML configuration file. Personally, I loathe YAML, because layout is part of the syntax. Still, it is what it is. You’ll need to specify what sensors to read, what method of fan control to use, and what mapping to make between the temperature and fan speed. thinkfan supports hwmon interfaces and also /proc/acpi/ibm/fan. It has a number of other capabilities, too, which I haven’t used, and can’t comment on.

thinkfan has a ‘simple’ mode of operation, and a ‘detailed’ mode. In simple mode, the temperature used for fan control is just the highest value from among all the sensors listed in the configuration. In ‘detailed’ mode, each sensor is given its own temperature range, and the fan speed is determined from the sensor that reads the highest value within its own range. You probably need ‘detailed’ mode if you have components that run at very different temperatures. My CPU and GPU can stand temperatures of above 90C, for example, but a magnetic hard drive would struggle at temperatures above 60C.

An alternative to using ‘detailed’ mode is to apply a fixed temperature correction to specific sensors. You could, for example, add 20C to the temperature of a magnetic disk, so that a disk reading of 60C is treated as 80C, the temperature at which you might want full fan speed for the CPU.
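The arithmetic behind simple mode with corrections is easy to sketch. To be clear, this is not thinkfan’s code, just my illustration of the idea: each reading has its correction added, and the highest corrected value is the one that would drive the fan. The function name and the READING:CORRECTION argument format are invented for the example.

```shell
# hottest READING:CORRECTION...: add each correction to its reading (both in
# whole degrees) and print the highest corrected value.
hottest() (
    max=
    for pair in "$@"; do
        reading=${pair%%:*}
        correction=${pair#*:}
        corrected=$((reading + correction))
        if [ -z "$max" ] || [ "$corrected" -gt "$max" ]; then
            max=$corrected
        fi
    done
    [ -n "$max" ] && echo "$max"
)

# A CPU at 70C with no correction, and a disk at 55C with +20C:
#   hottest 70:0 55:20
# The disk wins, because 55 + 20 = 75 is more than 70.
```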

I haven’t experimented much with any of these more sophisticated modes of operation, as I have no particular reason to think that the components in my laptops have different safe temperature ranges.

thinkfan sensor configuration

For the record, this is the configuration I’m using on my P53:

sensors:
  - hwmon: /sys/class/hwmon
    name: coretemp
    indices: [1, 2, 3, 4, 5, 6, 7] # CPU package and cores
  - hwmon: /sys/class/hwmon
    name: thinkpad
    indices: [1, 2]  # CPU aggregate and GPU
  - hwmon: /sys/devices/pci0000:00/0000:00:1b.0/0000:02:00.0/nvme/nvme0/hwmon2/temp1_input # SSD1
  - hwmon: /sys/devices/pci0000:00/0000:00:1d.0/0000:55:00.0/nvme/nvme1/hwmon3/temp1_input # SSD2

You’ll see that it’s possible to specify hwmon sensors in different ways. The more elegant is to use the name and indices format, with a specific base directory. With this format, thinkfan will enumerate the directories under the base directory, looking for a name file that matches the supplied name. indices refers to the specific tempNN_input file and, again, values start at 1, not 0. You’ll need to inspect the individual tempNN_label files to find the sensor indices to include. The advantage of this approach is that it won’t break if the entries in /sys/class/hwmon get renumbered.
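The same lookup is easy to do by hand, which is a useful check before committing a name to the configuration. Again, this is my own sketch of the idea and not how thinkfan itself is written; find_hwmon_by_name is a hypothetical helper.

```shell
# find_hwmon_by_name NAME [BASE]: print every hwmon directory under BASE whose
# name file contains exactly NAME.
find_hwmon_by_name() (
    base=${2:-/sys/class/hwmon}
    for dir in "$base"/hwmon*; do
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$1" ]; then
            echo "$dir"
        fi
    done
)

# Typical use:
#   find_hwmon_by_name coretemp
#   find_hwmon_by_name thinkpad
```

If a name produces more than one directory, the name and indices format won’t be able to tell the matches apart.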

In my case, temp1_input in the coretemp driver is the aggregate package temperature, labeled Package id 0. In my tests, this temperature invariably reads a bit higher than the other cores, which are temp2_input, etc. I’m not entirely sure, therefore, whether it’s necessary to record the additional core temperatures, along with the package temperature. I’ve tried both ways, and it doesn’t make a huge difference.

I’m also using the CPU and GPU sensors from the thinkpad driver. The CPU temperature is always a degree or two lower than the Package id 0 value so, again, I don’t really know whether this metric needs to be included. I’d certainly want to include the GPU temperature, though.

In my P53, recording the drive temperature is somewhat fiddly. There are two NVME drives, and one SATA. I’m not worried about the SATA drive, since it doesn’t get much use, and it never seems to get warm. It doesn’t even have ventilation slots in the chassis, so there’s probably little point increasing the fan speed even if it does warm up. The NVME drives, however, do have ventilation, and they do tend to get warm under load.

The problem with having two NVME drives is that they’re handled by two different instances of the nvme driver – but both are just called nvme. It would be nice to be able to specify:

  - hwmon: /sys/class/hwmon
    name: nvme 
    indices: [1] 

and have thinkfan read the temperatures for both drives. Sadly, that doesn’t work: it just reports duplicate drivers.

So I’ve had to use the full path to the corresponding temp1_input files for each NVME drive. So far the pathnames have remained consistent across reboots but, as I said, this is perhaps not something to rely on.
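Finding those full paths by hand is tedious, so here is a sketch that prints them. It assumes GNU readlink, for the -f option, and sensor_paths is a name I made up. Note that the resolved path still ends in a hwmonN component, so this doesn’t remove the renumbering risk; it just saves some typing.

```shell
# sensor_paths NAME [BASE]: for each hwmon directory whose driver is NAME,
# print the fully resolved path of its temp1_input file.
sensor_paths() (
    base=${2:-/sys/class/hwmon}
    for dir in "$base"/hwmon*; do
        [ "$(cat "$dir/name" 2>/dev/null)" = "$1" ] || continue
        [ -e "$dir/temp1_input" ] || continue
        # The entries in BASE are symbolic links; -f follows them.
        readlink -f "$dir/temp1_input"
    done
)

# Typical use:
#   sensor_paths nvme
```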

I’m not using any of the other temperature metrics, even those I can interpret. The ones I’m using are the ones that seem to increase under load. I’ve had to figure out what these are by extensive monitoring, along with trial-and-error. In short, while the thinkfan configuration is tricky, mastering it alone doesn’t mean the job’s done.

thinkfan fan configuration

You can use the same configuration syntax as for sensors but, since my P53 has the old /proc/acpi/ibm/fan interface, it’s easier just to use that. So the entire fan configuration is this:

fans:
  - tpacpi: /proc/acpi/ibm/fan

thinkfan fan curve

We have a way to read the temperature, and a way to set the fan speed. Now we need a way to link these two metrics together. There’s plenty of scope for creativity here, and I’m still experimenting, to see what gets the best results. My feeling is that, so long as you assign maximum fan speed to any temperature that is likely to be high enough to lead to severe throttling, you have a reasonably safe configuration. The creative part lies in deciding what to do with temperatures lower than this.

In my case, I want the fan to remain off completely at temperatures below 45C, then increase speed rapidly beyond that point. With the other settings I have, I know that all the routine stuff I do – editing text, looking at websites – doesn’t increase the temperature more than this, and I want the fan to be mostly off under light load. At greater than light loads, I want the fan to be on, and working hard: I feel this will increase the lifespan of the hardware, as well as making the machine more comfortable to use. If you prefer heat to fan noise, however, you’ll probably prefer a different configuration to mine.

This is the configuration I currently have:

levels:
  - [0, 0, 45]              # Fan off at less than 45C
  - ["level 1", 42, 50]     # Gentle
  - ["level 3", 48, 60]     # Moderate
  - ["level 5", 55, 70]     # High
  - ["level 6", 68, 78]     # Higher 
  - ["level 7", 75, 255]    # Max speed

Note that the temperature bands overlap: if they don’t, you’ll have the fans rapidly switching on and off when the temperature is near the band edges.
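To see why the overlap matters, it helps to trace the table by hand. The sketch below encodes my six rows and applies the rule as I understand it: move up a row when the temperature reaches the current row’s upper limit, and move down when it falls below the current row’s lower limit. The exact treatment of the boundaries (whether a reading equal to a limit counts) is my assumption, and the function names are invented.

```shell
# limits INDEX: print "LOWER UPPER LEVEL" for one row of my levels table.
limits() {
    case $1 in
        0) echo "0 45 0" ;;
        1) echo "42 50 1" ;;
        2) echo "48 60 3" ;;
        3) echo "55 70 5" ;;
        4) echo "68 78 6" ;;
        5) echo "75 255 7" ;;
    esac
}

# next_index CURRENT TEMP: print the index of the row that should be active
# after a reading of TEMP, starting from row CURRENT.
next_index() (
    i=$1
    temp=$2
    while :; do
        set -- $(limits "$i")
        lower=$1
        upper=$2
        if [ "$temp" -ge "$upper" ] && [ "$i" -lt 5 ]; then
            i=$((i + 1))          # too hot for this row: step up
        elif [ "$temp" -lt "$lower" ] && [ "$i" -gt 0 ]; then
            i=$((i - 1))          # cool enough to leave this row: step down
        else
            break
        fi
    done
    echo "$i"
)

# At 44C the answer depends on where we came from:
#   next_index 0 44    # fan was off, and stays off
#   next_index 1 44    # fan was at level 1, and stays there until below 42C
```

With that rule, a temperature hovering around 44C leaves the fan in whatever state it was already in, which is exactly the behaviour the overlap is there to produce.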

Dual fan issues

The P53 has separate fans for the CPU and GPU. In principle, they can be controlled individually, but the thinkpad driver doesn’t expose a way to do this. Even if it did, thinkfan doesn’t support multiple fans.

In practice, setting the fan speed via /proc/acpi/ibm/fan sets the speeds of both fans. They don’t run at exactly the same speed, whatever fan level you set, but they’re similar.

Closing remarks

thinkfan isn’t perfect. Sometimes it doesn’t start at boot time, and sometimes it crashes when waking up from sleep. The configuration is fiddly and, even though I understand it, I still needed a lot of trial-and-error to get the behaviour I wanted.

Still, it seems to do what it should, and it makes my laptops more agreeable to use.