Command-line hacking: displaying system temperature
This is another in my series of articles on doing unusual and, perhaps, interesting things with Linux command-line tools and scripts. The purpose of this week's exercise is to create a formatted display of system temperatures from the various sensors, using only Bash constructs. This example is significantly simpler than some of my earlier ones, although it demonstrates basic file and string handling in Bash.
The output of the script is something like this:
    zone            temp
    ------          -----
    acpitz          46
    x86_pkg_temp    46
    pch_skylake     52
I'll touch on interpretation of the zone (sensor) names later.
As ever, full source code is available in my GitHub repository, although this example is simple enough that I'll include most of the code in the article.
Modern CPUs and motherboards have multiple temperature sensors. Drivers in the Linux kernel make their readings available as pseudo-files in the directories /sys/class/thermal/thermal_zoneNNN/. Unfortunately, there's no single pseudo-file that gives a nice, formatted display of all the temperatures, which is what this script tries to provide.
Within each of the thermal_zoneNNN directories are two pseudo-files that are relevant here: type gives the name of the temperature sensor, while temp is the temperature itself.
The temperature is, according to the kernel documentation, supplied in millidegrees Celsius. At ordinary operating temperatures (10 to 99 degrees) that means a string of five decimal digits. So, to get an ordinary Celsius temperature, we must divide by 1000. That's easy: just take the top two digits of the five, and ignore the rest. This process will round the temperature down, so a temperature of 49900 will be reported as 49 degrees when, ideally, it should be rounded to the nearest integer. Alternatively, we could display the first three digits with a decimal point between the second and third ("49.9"). Both these approaches are left as an exercise to the reader -- simply inserting the decimal point is straightforward in a Bash script, but rounding to the nearest integer requires some real math. I'm happy with a result that is within a degree or so of the true value, and these sensors aren't hugely accurate, anyway.
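For readers who would rather see one possible answer to that exercise, here is a minimal sketch using Bash integer arithmetic. The function names round_celsius and tenths_celsius are my own, not part of the script, and the sketch assumes the reading is a non-negative integer with no leading zeros.

```shell
#!/bin/bash
# round_celsius MILLI: whole degrees, rounded to the nearest integer.
# Adding 500 before the integer division turns truncation into rounding.
# Assumes a non-negative integer with no leading zeros.
round_celsius() {
    echo $(( ($1 + 500) / 1000 ))
}

# tenths_celsius MILLI: degrees with one decimal place (truncated, not
# rounded), built from the quotient and the remainder.
tenths_celsius() {
    echo "$(( $1 / 1000 )).$(( ($1 % 1000) / 100 ))"
}

round_celsius 49900     # prints 50
tenths_celsius 49900    # prints 49.9
```

Because this divides rather than slicing the string, it does not depend on the reading having exactly five digits.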
The first task in the script is to gather a list of the thermal_zoneNNN directories. The way I've chosen to do this is shown below, but I'm sure there are many other possibilities.
    SCT=/sys/class/thermal
    # Quote the pattern so the shell passes it to find unexpanded
    for zone in `find $SCT -name 'thermal_zone*' -exec basename {} \;`
    do
      ...
    done
There's no functional need to assign /sys/class/thermal to a variable but, as it will be used many times, doing this tidies up the script a little. The for loop will execute once for each thermal_zoneNNN directory, with zone set to the directory name.
If any of the directory names might have contained spaces, we'd have to be a lot more careful about how the names are handled. In this case, however, there cannot be spaces in any of the directory names.
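As one of those other possibilities, the shell's own filename expansion can produce the list without running find or basename at all. This is only a sketch: list_zones is a hypothetical helper name, and the base directory is passed as an argument so the function can be tried against any directory.

```shell
#!/bin/bash
# list_zones DIR: print the name of each thermal_zone* entry in DIR.
list_zones() {
    local dir
    for dir in "$1"/thermal_zone*; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$dir" ] || continue
        # Strip everything up to the last slash, much as basename would.
        echo "${dir##*/}"
    done
}

SCT=/sys/class/thermal
for zone in $(list_zones "$SCT"); do
    echo "found $zone"
done
```

The expansion is sorted as text, so thermal_zone10 would be listed before thermal_zone2. That only matters if the order of the rows is important.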
Within each thermal_zoneNNN directory, we need to read temp and type into variables. A common way to read a file into a variable is:
my_var=`cat /path/to/file`
Bash experts discourage this kind of thing, because it spawns an extra process, when there's no real need to. I confess that I use this construct all the time, because it's so easy to read. However, let's try to be more modern.
    type=$(<$SCT/$zone/type)
    temp=$(<$SCT/$zone/temp)
This approach uses the < syntax, so it isn't totally unfamiliar to a shell programmer. Still, I confess that I often forget the exact syntax.
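If a pseudo-file cannot be read, the shell prints an error message when it evaluates $(<...). One way to keep that message off the screen, sketched here with a hypothetical helper called read_quietly, is to wrap the assignment in braces and redirect the error output of the whole group.

```shell
#!/bin/bash
# read_quietly FILE: print the contents of FILE, or nothing at all if it
# cannot be read. The braces let us redirect any error message that the
# $(<...) read produces.
read_quietly() {
    local value
    { value=$(<"$1"); } 2>/dev/null
    printf '%s' "$value"
}

SCT=/sys/class/thermal
zone=thermal_zone0      # example zone name; it may not exist everywhere
type=$(read_quietly "$SCT/$zone/type")
temp=$(read_quietly "$SCT/$zone/temp")
echo "type='$type' temp='$temp'"
```

Calling the helper through $(...) gives up the saving from not running cat, so in a real script it is probably simpler to put the braces and the redirection directly around the two assignments.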
This is where things get a little interesting, from a shell programming point of view. On some Linux systems, the temp pseudo-file does not supply a temperature, and trying to read it will result in an error. This is the case even though the file exists, and has read permissions. So what does temp end up containing?
It turns out to be an empty string -- which is fine in this simple example. A valid temperature will never be an empty string, so we don't have to try to distinguish between valid empty strings and errors. But what happens if a script has to read a file or pseudo-file that might actually validly be empty? When we use the < syntax, I'm not sure there's a way to find out whether the file was present but empty, or whether a read error occurred.
If we do need to make this distinction, one possibility might be to use the read built-in, which returns an exit status. Here is one way to tell whether anything can actually be read from a file:

    ok=""
    # Caution: read also returns non-zero at end-of-file, so a genuinely
    # empty file fails this test as well as an unreadable one
    read -n 1 < /path/to/file >& /dev/null && ok="1"
    if [ -n "$ok" ] ... #At least one character could be read
This is a rather ugly construction, and I'd be interested to know whether there's a nicer one. In practice, when writing shell scripts, it's usually fine to assume that if a file exists, and the user has read permission, a read operation will succeed. We don't normally need to be more paranoid than that. However, when handling files that are actually interfaces to drivers, oddities can arise, that need to be handled. Happily, in this simple example, it's safe to assume that reading an empty string represents an error condition, so we don't have to be too paranoid.
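One candidate for a nicer construction, offered as a sketch rather than a definitive answer, is to go back to cat and look at its exit status. This is zero for a file that is readable but empty, and non-zero when either the open or the read fails. The function name classify_file is my own invention, and the approach costs the extra process that the $(<...) syntax was meant to avoid.

```shell
#!/bin/bash
# classify_file FILE: print "data", "empty", or "error".
# An assignment from a command substitution takes the exit status of the
# command inside it, so the "if" below is testing whether cat succeeded.
classify_file() {
    local contents
    if contents=$(cat -- "$1" 2>/dev/null); then
        if [ -n "$contents" ]; then
            echo "data"
        else
            echo "empty"
        fi
    else
        echo "error"
    fi
}

classify_file /sys/class/thermal/thermal_zone0/temp
```

Command substitution strips trailing newlines, so a file containing nothing but newlines is also reported as "empty".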
With the zone name and the temperature in variables, we just need to format them nicely. To reduce the temperature to an integer, we just extract the first two digits, like this:
temp=${temp:0:2}
This construct means 'take two characters, starting at position zero'. Exercise for the reader: does this still work for multibyte characters? We don't have to worry about that here, but sometimes we do.
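Taking the first two characters only works while the reading has exactly five digits. A variation that makes no assumption about the length is to remove the last three characters instead. The sketch below uses a hypothetical function name, whole_degrees, and assumes the value is at least three characters long.

```shell
#!/bin/bash
# whole_degrees MILLI: drop the last three digits of a millidegree value.
# Assumes the value has at least three characters; shorter values would
# be left unchanged by the pattern.
whole_degrees() {
    local whole=${1%???}    # remove the final three characters
    echo "${whole:-0}"      # a value below 1000 leaves nothing; show 0
}

whole_degrees 46000     # prints 46
whole_degrees 105000    # prints 105
whole_degrees 9500      # prints 9
```

Like the two-character version, this rounds down rather than to the nearest degree.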
To format the display so that the temperatures are aligned, we can use the printf command.
printf "%20s %s\n" "$type" "$temp"
This command should be familiar to C programmers. The first argument controls the format of the arguments that follow. %s in general denotes a simple text string; %20s pads a string on the left with spaces so that it is at least twenty characters wide. Longer strings are not truncated.
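As a rough sketch of how the pieces discussed so far might fit together (this is not the script from the repository), the function below prints a header and one row per zone. The name show_temps, the left-justified %-20s column, the use of filename expansion rather than find, and the decision to skip zones whose temperature cannot be read are all my own assumptions.

```shell
#!/bin/bash
# show_temps DIR: print a table of the zone names and temperatures found
# under DIR (normally /sys/class/thermal).
show_temps() {
    local dir type temp
    # %-20s pads on the right (left-justified); %5s pads on the left.
    printf '%-20s %5s\n' "zone" "temp"
    printf '%-20s %5s\n' "------" "-----"
    for dir in "$1"/thermal_zone*; do
        [ -e "$dir" ] || continue
        { type=$(<"$dir/type"); temp=$(<"$dir/temp"); } 2>/dev/null
        # An empty string means the sensor could not be read; skip it.
        [ -n "$temp" ] || continue
        printf '%-20s %5s\n' "$type" "${temp:0:2}"
    done
}

show_temps /sys/class/thermal
```

Passing the directory as an argument makes it possible to try the function on a directory of ordinary files laid out in the same way.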
Finally, how do we know how to interpret the sensor names? The sad fact is that, on the whole, we don't. These names are supplied by drivers in the kernel, based on what sensors they detect by probing the motherboard. From experience I know that x86_pkg_temp is usually the CPU die temperature on x86 systems, while acpitz usually comes from a motherboard sensor mounted close to the CPU. ARM Linux systems that use the 'big.LITTLE' architecture seem to report the 'Big', 'Middle', and 'Little' cores with their own thermal_zone entries. g3d is probably an integrated graphics controller.
For the record, in my example pch_skylake is the Platform Controller Hub. This would once have been a separate chip on Intel-compatible motherboards but, these days, is often part of the CPU package. It isn't unusual for this to record a higher temperature than x86_pkg_temp or, of course, not to be present at all: the exact sensors will depend on the motherboard and CPU design.
In short, interpretation of the thermal zone names will likely take a bit of web searching.