Exploring Java 17’s container-awareness features

This article is about container-awareness in the Java virtual machine, particularly in Java 17. Container-awareness is the ability of the JVM to detect the resources it has available when running in a container, and to adjust itself accordingly. In this article I will discuss only memory resources. Everything here applies equally well to CPU resources; I’m using memory simply because it’s easier to monitor than CPU.

The ultimate goal of this article is to show how OpenShift resource limits and requests affect the JVM. However, to get to that point, we need to start by looking at simpler scenarios.

The JVM heap and its sizing

The Java virtual machine uses a flexible heap with garbage collection for its object management. Broadly speaking, the JVM allocates newly-created objects in the heap, where they remain until they are eligible to be removed by the garbage collector. The heap can expand between its initial and maximum values, according to the number of objects in play at a particular time.

It’s always been possible for the application designer to set the initial and maximum Java heap size using the -Xms and -Xmx command-line switches. For example:

$ java -Xms512m -Xmx2048m ...

This is a reasonable way to control the heap size on a traditional operating system, that is, one that is not based on container technology. The operating system has a specific, well-known amount of memory available to it, and it’s unlikely to change much from day to day. Changing the JVM’s memory allocation amounts to editing a configuration file or script.

Container technology, however, provides for flexible memory management. Administrators of container-based systems like Kubernetes or OpenShift expect to be able to tune an application’s memory usage by tuning the amount of memory assigned to its container. Moreover, tuning memory usage at the application level might be complicated, if the configuration is buried somewhere inside a container image.

Later versions of Java have a container-awareness feature, which allows the JVM to learn (or, at least, guess) the amount of memory present in the container. By ‘later’ I mean, perhaps confusingly, later releases of Java 1.8, Java 11, and Java 17: the changes were back-ported from newer Java versions to subsequent releases of the older ones. The point I’m making is that container awareness is available on all Java platforms today, even as far back as Java 8.

Note: despite all this, there are subtle differences between Java versions, and in this article I use Java 17 exclusively.

As a result, we generally advise against using fixed heap settings (-Xmx and -Xms) when running Java in a container platform.

But what happens if the designer sets no heap limits at all? The JVM has to allocate some heap range, but how? And how will the JVM’s behaviour depend on the container’s memory settings?

In general, if no specific heap values are given, the Java JVM will apply defaults. Historically, Java had a rather complicated algorithm for working out which heap defaults to use; it depended on whether we were running in ‘server’ or ‘client’ mode, the amount of platform virtual memory, whether we were running a 32-bit or 64-bit platform, the garbage collection algorithm, and so on. The defaults changed from one Java version to another and, although they were documented, it wasn’t always easy to find the documentation. It made sense, in the pre-container days, to use at least the -Xmx switch for all non-trivial applications.

These days, for most reasonable platform configurations, the defaults are much simpler: the maximum heap is 1/4 of the platform’s reported memory size, and the initial heap is 1/64, up to a limit of 512MB. There are complications related to set-ups with very large, and very small, memory sizes; but these won’t affect most installations.
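If you want to check what the JVM has actually chosen on a particular machine, you can ask it from code. This one-liner is my own sketch, not part of the article’s test program:

```java
public class HeapDefaults {
    public static void main(String[] args) {
        // With no -Xmx on the command line, maxMemory() reports the
        // ergonomic default – normally 1/4 of the platform's memory
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("Default max heap: %.1f GB%n", max / (1024.0 * 1024 * 1024));
    }
}
```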

Using a program to report memory and heap allocations

In this article, I’ll be using a simple Java program to report on the platform memory and allocated heap sizes. It will get this information from JMX MBeans. I won’t present the source code here – it’s very simple and, as always, it’s available from my GitHub repository.
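For reference, the core of such a program is only a few lines of JMX. This is a minimal sketch of the idea – the class name and formatting are mine, not the repository’s actual source, and getTotalMemorySize() needs Java 14 or later:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class MemoryReport {
    public static void main(String[] args) {
        // The com.sun.management subinterface adds getTotalMemorySize()
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        System.out.printf("Native memory size: %.1f GB%n",
            os.getTotalMemorySize() / (1024.0 * 1024 * 1024));
        System.out.printf("Max heap size (Xmx): %.1f GB%n",
            mem.getHeapMemoryUsage().getMax() / (1024.0 * 1024 * 1024));
        System.out.printf("Init heap size (Xms): %.1f MB%n",
            mem.getHeapMemoryUsage().getInit() / (1024.0 * 1024));
    }
}
```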

As well as the Java source, the application bundle contains scripts and a Dockerfile, to build the code into a container. I’ll demonstrate its use in a Podman container (although Docker will work just the same) and also in an OpenShift pod.

When I run the program on my workstation, here is the output:

$ java -jar target/java_docker_test-0.0.1-jar-with-dependencies.jar
Native memory size: 30.7 GB
Max heap size (Xmx): 7.7 GB
Init heap size (Xms): 492.0 MB

The ‘native’ memory size is what JMX reports to the program as the system memory. My workstation has 32GB of RAM, so that value looks about right. The maximum and initial heap values are 1/4 and 1/64 of that figure, more or less. So that’s as expected.

Just to prove that the program works, I can run it with specific heap settings:

$ java -Xmx1000m -Xms100m -jar target/java_docker_test-0.0.1-jar-with-dependencies.jar
Native memory size: 30.7 GB
Max heap size (Xmx): 1000.0 MB
Init heap size (Xms): 100.0 MB

You’ll see that the heap values now reflect the -Xmx and -Xms settings, so I’m reasonably sure that the simple program works properly.

Heap management in a container

Let’s try this application in a container. I’m using Podman, but Docker will behave the same.

In the application bundle is a Dockerfile for building the container. It looks like this:

FROM openjdk:17-alpine
ADD target/java_docker_test-0.0.1-jar-with-dependencies.jar .
ADD ./run.sh .
ENTRYPOINT ["/run.sh"]

The FROM line specifies that my image will be based on the Alpine Java 17 base image. Alpine is a lightweight Linux version, popular in containers.

To the base image I add the Java application’s JAR file, and a script run.sh. This script just runs the Java application, exactly as I did on my workstation. Here’s what it looks like:

#!/bin/sh
java -jar java_docker_test-0.0.1-jar-with-dependencies.jar

Nothing clever there. Later, we’ll change the Java command line, to get particular heap behaviour in the container.

To build the container we do this (replacing podman with docker if you prefer):

podman build -t java_docker_test .

You can expect this command to take a little while the first time, as podman will need to retrieve the base image.

To run the container, do this:

$ podman run -it localhost/java_docker_test
Native memory size: 30.7 GB
Max heap size (Xmx): 7.7 GB
Init heap size (Xms): 492.0 MB

You see that the results are exactly the same as running outside the container. There’s no reason they should be different: podman will not constrain the container’s memory, unless we ask it to.

So let’s do that – let’s fix the values of RAM and swap.

$ podman run --memory 1000m --memory-swap 3000m -it localhost/java_docker_test
Native memory size: 1000.0 MB
Max heap size (Xmx): 241.7 MB
Init heap size (Xms): 16.0 MB

Be aware that ‘--memory-swap’ really means ‘RAM plus swap’. This configuration allocates 1GB of RAM and 2GB of swap.

The program reports its memory size as 1000MB (which matches the --memory argument) and, again, the heap sizes are roughly 1/4 and 1/64, as always.

How does the JVM know how much memory its container has? We can experiment with this by logging into the container, setting a shell as the --entrypoint:

$ podman run --entrypoint /bin/sh --memory 1000m --memory-swap 3000m -it localhost/java_docker_test

Let’s look at the system memory within the container:

/ # free
              total        used        free      shared  buff/cache   available
Mem:       32224352     4529920     2732716     5085020    24961716    22134684
Swap:       8388604      170248     8218356

These Mem and Swap figures match my workstation’s totals, not the container’s. The container only has 1GB available, and somehow the JVM has worked this out.

Linux containers use a technology called ‘cgroups’ (control groups) for managing resources at the per-process level. Somewhat irritatingly for our purposes, there are two different versions of cgroups in circulation: v1 and v2. Most modern Linux kernels support both, but characteristics of the underlying platform dictate which is used. podman running on my workstation uses cgroups v2 but, as we’ll see, containers on my OpenShift installation use v1.

Both versions do much the same thing; the irritation is that the filenames we’ll use to report cgroups metrics are different. Unfortunately, if you do a lot of work with cgroups, you really have to be familiar with both versions, and we’ll see both in use in this article.

There is an article on cgroups (v1) elsewhere on my website, which might be of interest to readers who want to see it in action.

From within the running container, we can get the memory and swap limits from cgroups like this:

/ # cat /sys/fs/cgroup/memory.max
/ # cat /sys/fs/cgroup/memory.swap.max

You’ll note that these agree with the limits I applied to the podman command.

It’s worth noting at this point that, so far as the JVM is concerned, the swap value is irrelevant. By default, when the JVM queries its memory allocation to calculate heap sizes, it uses ‘RAM’, not swap. Of course, it’s not real RAM, it’s a container simulation of RAM.
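Incidentally, it’s easy to mimic this detection in a few lines of Java. This is just a sketch of the approach (my own code, not the JVM’s): check the cgroups v2 file first, then fall back to v1:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupLimit {
    // Returns the raw cgroup memory limit, or null outside a container
    static String readLimit() {
        Path[] candidates = {
            Path.of("/sys/fs/cgroup/memory.max"),                  // cgroups v2
            Path.of("/sys/fs/cgroup/memory/memory.limit_in_bytes") // cgroups v1
        };
        for (Path p : candidates) {
            if (Files.exists(p)) {
                try {
                    return Files.readString(p).trim();
                } catch (IOException e) {
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String limit = readLimit();
        // cgroups v2 reports the literal string "max" when no limit is set
        System.out.println(limit == null ? "no cgroup limit file found"
                                         : "cgroup memory limit: " + limit);
    }
}
```

The JVM’s real detection code is considerably more involved – it handles cgroup hierarchies and mount points as well as both versions – but the principle is the same.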

So that’s how the heap allocation works in a Podman container: the JVM uses cgroups to determine the amount of allocated RAM, and sets the heap limits to 1/4 and 1/64 of that figure.

It’s worth bearing in mind that it is cgroups, not the JVM, that enforces the limits; the JVM only reads the cgroups metrics to set its heap sizes appropriately. You could set -Xmx2000m in a container with only 1GB allocated, and the JVM would try to create a heap that large – but it wouldn’t be allowed.

Heap management with OpenShift resource requests and limits

Now let’s try the same thing on OpenShift.

There are many ways to deploy a Java application in an OpenShift pod, but I want to make the test conditions as close as possible to the Podman/Docker case. To that end, I’ll do a Docker deployment on OpenShift, using the same Dockerfile and Java code as we used with Podman. Here’s how to do that.

First, create a Docker build configuration:

$ oc new-build --strategy=docker --binary --docker-image=openjdk:17-alpine --name=java-docker-test

I’ve named my build configuration java-docker-test.

Now run the build, using the files in the current directory (the directory containing the source bundle) as input to the build:

$ oc start-build java-docker-test --from-dir . --follow

This will take a little while, particularly the first time. All being well, you’ll see the same Docker build steps in the output as we saw when building for a local Podman installation.

This step, oc start-build, is what you’ll need to repeat, if you change the Java program or scripts.

All being well, you’ll see that a build pod was created, and it completed. It should have created a new OpenShift container image called java-docker-test.

To get this image into a running pod, we can create a deployment from the image.

$ oc new-app java-docker-test

Use oc get pods to see what new pods have been created; you should see a pod with a name of the form java-docker-test-XXXXXXXXXX-XXXXX.

To see the application’s output, have a look at the logs from this pod:

$ oc logs java-docker-test-78c474dd96-sl87g
Native memory size: 6.6 GB
Max heap size (Xmx): 1.6 GB
Init heap size (Xms): 108.0 MB

The native memory size is reported as 6.6GB. The heap sizes are, as always, 1/4 and 1/64 of this. But why 6.6GB? Almost certainly the OpenShift node I’m running this pod on has much more memory than this. I didn’t apply any limits to the pod (yet). So why this figure?

It’s not just a platform default – this is a calculated value. But I’ll come back to how the calculation is done later, as it’s a bit of a distraction.

Let’s apply a resource request and limit to the deployment. The pod will restart, and we can look at the logs again.

$ oc set resources deployment java-docker-test --limits=memory=1Gi --requests=memory=1Gi

$ oc logs java-docker-test-5556df889b-w2nzf
Native memory size: 1.0 GB
Max heap size (Xmx): 247.5 MB
Init heap size (Xms): 16.0 MB

Again, this should make sense: we applied a 1GiB memory limit, and that’s the figure the JVM uses to calculate its maximum and initial heap sizes.

Let’s log into the pod, and see where the JVM is getting its limit from.

$ oc rsh java-docker-test-5556df889b-w2nzf

$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes

Because we’re using cgroups v1 here, the name of the file containing the limit is not memory.max, as it was earlier – it’s memory.limit_in_bytes. The value is 1GiB, as we set on the command line.

The mystery I deferred is how the JVM arrived at a memory size of 6.6GB when we didn’t set any limit. According to the OpenShift documentation, a pod with no resource limit has unbounded resources. That is, the JVM should be able to allocate memory until the OpenShift node runs out of RAM. Clearly, though, the JVM has arrived at some heap sizing – but how?

Log into the pod again, and look at the cgroups limit in this scenario:

$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
9223372036854771712

That’s a truly colossal number – scarcely a limit at all. It’s as close to ‘unbounded’ as makes no practical difference. To see where the JVM’s memory limit comes from, let’s ask it. Log into the running pod and run:

$ java -Xlog:os+container=trace -version
[0.031s][trace][os,container] Memory Limit is: 9223372036854771712
[0.031s][trace][os,container] Non-Hierarchical Memory Limit is: Unlimited
[0.031s][trace][os,container] Path to /memory.stat is /sys/fs/cgroup/memory/memory.stat
[0.031s][trace][os,container] Hierarchical Memory Limit is: 7125319680

You can see that the JVM has determined that its memory allocation is unlimited. So it’s parsed the file memory.stat, and found the value of the “hierarchical memory limit”.

The what now? This isn’t a concept that I really have space to explain in detail here. In essence, OpenShift containers run as processes in a hierarchy of resource constraints. The ‘hierarchical limit’ is obtained by taking the limit on the container’s parent process, and subtracting the memory used by other child processes (other pods) with the same parent.

Doing this math gives us about 7.1 billion bytes, or 6.6GB.
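The arithmetic is easy to check against the figure in the trace log:

```java
public class LimitCheck {
    public static void main(String[] args) {
        // Hierarchical memory limit reported in the os+container trace above
        long hierarchicalLimit = 7_125_319_680L;
        System.out.printf("%.1f GB%n", hierarchicalLimit / (1024.0 * 1024 * 1024));
    }
}
```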

If you think this is an odd way to work out a memory limit, I’d be inclined to agree. But what else can the JVM do? Java has no notion of an unconstrained heap: it has to have some maximum. The JVM could just use the platform RAM as its base figure, as it does outside a container. But it isn’t safe for a JVM to assume that it has that much RAM available – it could be enormous, and there are too many other containers competing for it. I guess the ‘hierarchical limit’ is the best compromise that the JVM maintainers could come up with.

In practice, I think we should assume that a JVM running in a pod with no limit will get what amounts to a random limit. In almost all circumstances it will make sense to apply a limit.

Resource limits and requests

In the previous command I set a resource limit and a resource request, both to the same value. My experience is that Java developers working on OpenShift often don’t understand the difference between these values.

I’ve noticed a vague expectation that the ‘limit’ figure should constrain the maximum heap, and the ‘request’ figure the minimum. Plausible as that seems, it isn’t what happens. In the previous example I set both request and limit to 1GiB, but the initial heap value remained at 16MB – the usual 1/64 of the total memory.

In fact, the JVM only sees the ‘limit’ figure. If there is a way for a JVM to find the ‘request’ value from the platform, I don’t know of one. The request figure has no effect on the JVM heap size, maximum or minimum.

However, it does have implications. The request figure is used by the Kubernetes scheduler to decide which node to run the pod on. If we set a low memory request, then the scheduler will allow the pod to run on a node with low memory availability. The JVM will still get the heap allocation determined by the limit figure, however large that is. But if the pod is running on a node with low memory availability, it’s possible that the pod will fail at runtime, because there just isn’t enough memory to satisfy the limit.

Setting a low resource request is a ‘neighbourly’ thing for an application installer to do. It allows for fair distribution of pods between nodes. However, it’s probably not in the application’s best interests to have ‘limit’ and ‘request’ values that are hugely different. If we set a limit of, say, 1GB, we’re doing so in the expectation that the pod might, sooner or later, use 1GB. If the request value is, say, 128MB, we’re saying that the pod can get by with 128MB; if we’ve set a limit of 1GB, that’s likely not to be the case.

It takes care and experience to determine good resource requests and limits for an application, and probably some testing. I usually recommend that installers set the same values for limit and request if they can. In practice, though, we often can’t, because if every application component does that, resources could be badly over-committed.

Percentage heap allocations

My experience is that the larger the memory allocated to a pod, the less sense it makes to allow the JVM to assign only a quarter of it for its heap. To be sure, most Java applications will use memory outside the heap. A common consumer of out-of-heap memory is thread stacks, and that can be very significant on heavily-loaded, concurrent applications. Many Java libraries allocate memory outside the heap – Netty is a good example of this. The Java classes that expand zipfiles also use non-heap memory, although this might not be obvious.

All things considered, though, if you’re allocating 32GB of RAM to a Java pod, you’re probably expecting at least, say, 28GB of it to be available to the JVM heap. Of course, we could give effect to that expectation by using the command-line argument -Xmx28000m, but that’s something we discourage, as I described earlier.

We would like the administrator to be able to tune the amount of memory available to a Java application by tuning the pod’s allocation, rather than by rebuilding the image with new code (even if it’s only a new start script).

In scenarios like this, it can make sense to allocate a specific fraction of the memory to the heap, rather than an amount. For example, if I want the heap to use between 20% and 70% of the total memory, I can run Java like this:

$ java -XX:MaxRAMPercentage=70 -XX:InitialRAMPercentage=20 ... 
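As a worked example (a sketch assuming a hypothetical 1GiB container limit; the JVM’s ergonomics will round the real figures slightly):

```java
public class PercentageHeap {
    public static void main(String[] args) {
        long limit = 1024L * 1024 * 1024;   // hypothetical 1GiB pod limit
        double maxPct = 70.0, initPct = 20.0;
        // 70% and 20% of the container memory, in MiB
        System.out.printf("Max heap:  ~%.0f MB%n", limit * maxPct / 100 / (1024.0 * 1024));
        System.out.printf("Init heap: ~%.0f MB%n", limit * initPct / 100 / (1024.0 * 1024));
    }
}
```

The same container, resized by an administrator to 2GiB, would then get a proportionally larger heap without any change to the image.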

Recent Java versions have additional command-line arguments (such as -XX:MinRAMPercentage, which is used instead of MaxRAMPercentage on systems with very little memory) for more subtle control of the heap in a container environment.


So what have we learned from all this?

This has been a long article, to explain something which is actually quite simple: when deploying a Java application on OpenShift, you should nearly always set resource request and limit values, and you should consider sizing the heap using fractions of the available memory.