Compiling a Java program to native code using GraalVM, from the ground up

GraalVM is a relatively recent compiler and runtime package that can support multiple programming languages. It is open-source, but largely maintained and supported commercially by Oracle. An interesting feature, albeit one that is still subject to ongoing development, is its ability to produce true native-code executables that are reasonably self-contained. At the time of writing, the native-code generation feature is not available for all platforms.

This article describes step-by-step how to install GraalVM for Linux, add the 'Native Image' plug-in, and use it to compile some trivial Java programs. We shall see that even very simple programs can present some problems for the technology.

The GraalVM maintainers describe compiling to native code as "ahead-of-time compilation". In my day, we just called that "compilation", and what a JVM did was called "interpretation". Oh, well.

In principle, compiled Java is what we've all been waiting for, at least in the micro-services world. It's difficult to think of anything that runs under a JVM as a micro-anything -- not without a hollow laugh, anyway. GraalVM's compiled Java offers the possibility -- in principle -- of millisecond program load times, and a memory footprint more credible in an environment where there might be hundreds of concurrent micro-services in the same physical host. Because Java is a VM-based language, it's impossible to eliminate the VM completely, but the 'SubstrateVM' runtime used by Native Image is claimed to be relatively lightweight. The picture isn't all as rosy as that, as we shall see.

As with all my software development articles, all steps are carried out on the command line, using no development tools more complicated than a text editor, so all the individual steps are (I hope) clear.

Obtain GraalVM

GraalVM exists in a 'community edition' (CE) and a commercially-supported 'enterprise edition'. Both can be obtained from the GraalVM download page. The community edition is all that is required to follow the examples in this article.

GraalVM CE is installed on Linux by the simple expedient of unpacking the download bundle into any convenient directory. In the examples that follow, I will be using an installation in /home/kevin/lib/graalvm. Of course, you can install GraalVM in a system-wide location if you prefer and if you have access rights.

Install the Native Image plug-in

With the community edition, this should be very straightforward:

$ /path/to/graalvm/bin/gu install native-image

The installation provides the utility

/path/to/graalvm/bin/native-image

Hello, World

Let's start with the simplest possible Java program, to check that everything is installed and set up properly. In any convenient directory, create the file HelloWorld.java, like this:

public class HelloWorld
  {
  public static void main (String[] args)
    {
    System.out.println ("Hello, World");
    }
  }

GraalVM provides a javac compiler, but native-image works on compiled byte-code, not source code. Therefore there's no particular reason to use GraalVM's own javac. Whichever javac you prefer to use, compile the Java (running javac this way will produce a compiled .class file in the same directory as the source, as there is no package definition).

$ javac HelloWorld.java

Now feed the compiled class file into native-image:

$ JAVA_HOME=/path/to/graalvm /path/to/graalvm/bin/native-image \
   -classpath . HelloWorld

native-image can handle collections of class files, and also JAR files. In the latter case, it's still necessary to specify the class that provides the main() method -- the utility won't read it from a JAR manifest, even if there is one.

It's at this point that you might get your first disagreeable surprise: compiling Hello World to a binary will take about a minute. In a sense, that's not surprising -- Java is not a language designed to work this way, and the native-image utility has a great deal of work to do. Still, debugging a program when each change takes a minute to test soon gets old. It's not so bad if running native-image is just the last step in a process where most of the earlier steps are performed by a conventional compiler -- but we aren't quite at that stage yet.

Be that as it may, native-image should have produced an executable called "helloworld" -- in this case, the binary name is derived from the main class name -- which you can run at the prompt:

$ ./helloworld
Hello, World

Note that the binary is about 2.2 MB in size in the default, unstripped format. Stripping symbols from it doesn't make it much smaller. It's large compared to a C program that does the same thing, but it's an awful lot smaller than a full-sized JVM and all its runtime dependencies.

This is interesting, also:

$ time java HelloWorld
real	0m0.054s
user	0m0.045s
sys	0m0.017s

$ time ./helloworld 
real	0m0.004s
user	0m0.002s
sys	0m0.005s

These timing figures are pretty consistent and, although I don't think the time utility really provides millisecond precision, it's clear that the native code version starts and finishes in about a tenth the time the traditional JVM needs -- and since the program itself takes little to no time to execute, I think that's a fair estimate of the difference in overheads.
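
Since a single run of time is a blunt instrument at this scale, one way to get a steadier figure is to launch each program many times and average. What follows is only a sketch: StartupTimer is a hypothetical helper of my own naming, not part of GraalVM, and the figure it reports includes the cost of spawning a process from Java, which is the same for both programs being compared.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of GraalVM: runs a command repeatedly and
// reports the mean wall-clock time, which includes process start-up.
class StartupTimer
  {
  // Run the command once, discard its output, and return elapsed nanoseconds.
  static long timeOnce (List<String> command)
      throws IOException, InterruptedException
    {
    ProcessBuilder pb = new ProcessBuilder (command);
    pb.redirectErrorStream (true); // Merge stderr into stdout
    long start = System.nanoTime();
    Process p = pb.start();
    try (InputStream in = p.getInputStream())
      {
      byte[] buf = new byte[4096];
      while (in.read (buf) != -1) { /* drain so the child never blocks */ }
      }
    p.waitFor();
    return System.nanoTime() - start;
    }

  // Convert a set of nanosecond timings to a mean in milliseconds.
  static double meanMillis (long[] nanos)
    {
    return Arrays.stream (nanos).average().orElse (0.0) / 1_000_000.0;
    }

  public static void main (String[] args)
      throws Exception
    {
    if (args.length < 2)
      {
      System.err.println ("Usage: java StartupTimer {runs} {command...}");
      System.exit (1);
      }
    int runs = Integer.parseInt (args[0]);
    List<String> command = Arrays.asList (args).subList (1, args.length);
    long[] nanos = new long[runs];
    for (int i = 0; i < runs; i++)
      {
      nanos[i] = timeOnce (command);
      }
    System.out.printf ("mean over %d runs: %.3f ms%n", runs, meanMillis (nanos));
    }
  }
```

It could be used like this, once for each version of the program:

$ javac StartupTimer.java
$ java StartupTimer 50 ./helloworld
$ java StartupTimer 50 java HelloWorld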

So far, so good.

An HTTP(S) request

We've seen that it's easy enough to compile the Hello World example to native code; time to look at something a bit more complicated. The following code snippet is a program that takes a specified URL, and fetches the content to standard out.

import java.net.*;
import java.io.*;

public class GetUrl 
  {
  public static void main (String[] args)
    {
    if (args.length != 1)
      {
      System.err.println ("Usage: java GetUrl {URL}");
      System.exit (1); // Non-zero status: the program was invoked wrongly
      }
    try
      {
      URL url = new URL (args[0]);
      URLConnection c = url.openConnection();
      InputStream is = c.getInputStream();
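      int chr;
      // read() returns -1 at end of stream; a zero byte is valid data
      while ((chr = is.read()) != -1)
        {
        System.out.write (chr);
        }
      System.out.flush(); // Push out any bytes still buffered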
      }
    catch (Throwable e)
      {
      e.printStackTrace();
      }
    }
  }

This program will compile just fine, using the method described for Hello World earlier. But it won't actually run, either with HTTP or HTTPS URLs. Instead, you'll see an error message like this:

Exception in thread "main" com.oracle.svm.core.jdk.UnsupportedFeatureError:
Accessing an URL protocol that was not enabled. The URL protocol http is
supported but not enabled by default. It must be enabled by adding the
-H:EnableURLProtocols=http option to the native-image command.

To be fair, it is at least clear what needs to be done -- for this specific feature, anyway. We need to compile like this:

JAVA_HOME=... /native-image -H:EnableURLProtocols=https \
   -H:EnableURLProtocols=http -classpath . GetUrl

The problem is that I have not been able to find a complete list of features that need to be enabled this way, and I'm not sure whether such a list even exists. Since problems like this do not become apparent until runtime, and they can't be spotted by testing with an ordinary JVM, that significantly increases the testing burden.
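
Because these failures only show up when the compiled binary actually runs, one modest precaution is to run the binary itself as part of testing and look for the tell-tale error. The sketch below does that. NativeSmokeTest is a hypothetical name, and the only thing it checks for is the UnsupportedFeatureError class name that appears in the message shown above; other kinds of failure would need their own checks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical smoke test: runs a compiled binary and looks for the
// error class name seen in the message above.
class NativeSmokeTest
  {
  // True if the captured output mentions UnsupportedFeatureError.
  static boolean reportsUnsupportedFeature (String output)
    {
    return output.contains ("UnsupportedFeatureError");
    }

  // Run the command and return everything it wrote to stdout and stderr.
  static String runAndCapture (String... command)
      throws IOException, InterruptedException
    {
    ProcessBuilder pb = new ProcessBuilder (command);
    pb.redirectErrorStream (true); // Merge stderr into stdout
    Process p = pb.start();
    ByteArrayOutputStream captured = new ByteArrayOutputStream();
    try (InputStream in = p.getInputStream())
      {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read (buf)) != -1)
        {
        captured.write (buf, 0, n);
        }
      }
    p.waitFor();
    return captured.toString();
    }

  public static void main (String[] args)
      throws Exception
    {
    if (args.length < 1)
      {
      System.err.println ("Usage: java NativeSmokeTest {command...}");
      System.exit (2);
      }
    String output = runAndCapture (args);
    if (reportsUnsupportedFeature (output))
      {
      System.err.println ("Binary hit a feature that was not enabled:");
      System.err.println (output);
      System.exit (1);
      }
    System.out.println ("No UnsupportedFeatureError seen");
    }
  }
```

The smoke test itself runs under an ordinary JVM; only the program under test needs to be the native binary, for example:

$ java NativeSmokeTest ./geturl http://example.com/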

Anyway, having fixed this problem it's on to the next one. Although GetUrl now works with HTTP URLs, it fails with HTTPS, like this:

WARNING: The sunec native library, required by the SunEC provider, could not be
loaded. This library is usually shipped as part of the JDK and can be found
under <JAVA_HOME>/jre/lib//libsunec.so. It is loaded at run time via
System.loadLibrary("sunec"), the first time services from SunEC are accessed. 

Again, it's clear what needs to be done, but it's less clear why. libsunec.so is provided with GraalVM, so why is it not just linked with the compiled executable? For testing purposes, we can just copy the library from the GraalVM installation to the current directory, but the application is no longer self-contained. It's also not entirely clear to me whether I can legally distribute this library with my application, or whether I would have to give instructions to end users on how to obtain a copy themselves. Handling SSL is hardly a niche activity in a Java application, and it's surprising that doing so requires these additional steps. I've also heard that the list of SSL certificates that gets built into the executable is truncated, and that it might be necessary to supply a full list; but I've not noticed this problem myself.

Reflection

This is the last example in this article and, in many ways, the most troublesome. Consider this code sample, which converts a number to a String in an extremely convoluted way. I'm not suggesting this is a practical method of coding -- just a way to illustrate a problem.

import java.lang.reflect.*;

public class Reflection 
  {
  public static void main (String[] args)
      throws Exception
    {
    Class c = Class.forName ("java.lang.String");
    Object o = c.newInstance();
    Method m = c.getMethod ("valueOf", int.class);
    Object result = m.invoke (o, 42);
    System.out.println ("result=" + result);
    }
  }

What this rather tortuous code does is to load the String class by name, instantiate it, locate its valueOf method, then call that method on the instance. I've gone to some trouble here never to refer to the String class directly at any point. While this is a pointless exercise here, reflection of this sort is absolutely ubiquitous in Java programming -- many would consider it one of the most powerful features of the language. With reflection we can, for example, delay decisions about which specific classes to use for particular functions until runtime. Of course, compiling to a native executable is all about making compilation decisions in advance -- it is inimical to reflection.

Consequently, if we compile this class in the same way we compiled HelloWorld earlier, we will run into a problem at runtime, if the program even compiles:

Exception in thread "main" java.lang.InstantiationException: Type
`java.lang.String` can not be instantiated reflectively as it does not have a
no-parameter constructor or the no-parameter constructor has not been added
explicitly to the native image.

What's happened here is that, as the class String was loaded by name only, and never used directly in the program, no code was compiled for it. The native compiler is smart enough to recognize the problem in this case, because it has built-in logic for handling Class.forName(); so we get a somewhat helpful message, rather than a core dump. Still, the onus is on the developer to configure the build to add classes that are only loaded reflectively. There is some information on doing this on a GitHub page.

Essentially, we must create a JSON file that lists the various classes and methods that must be included, and pass it to the native-image command line using -H:ReflectionConfigurationFiles=.... In the present case, it's reasonably clear what needs to be added -- a constructor for the String class. A suitable file is this:

[
  {
    "name" : "java.lang.String",
    "methods" : [
      {
        "name" : "<init>", "parameterTypes" : []
      }
    ]
  }
]
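
As an aside, String.valueOf is a static method, so on an ordinary JVM the reflective call does not need an instance at all, and the constructor need never be invoked. The sketch below shows that variation. StaticReflection is a name made up for this illustration, and it is an assumption on my part that this sidesteps the constructor error under native-image; the method lookup may well need a configuration entry of its own.

```java
import java.lang.reflect.Method;

// Hypothetical variation on the Reflection example that never
// instantiates String.
class StaticReflection
  {
  // Convert an int to a String reflectively, without creating an instance.
  static Object convert (int value)
    {
    try
      {
      Class<?> c = Class.forName ("java.lang.String");
      Method m = c.getMethod ("valueOf", int.class);
      // valueOf is static, so the target object is ignored and may be null
      return m.invoke (null, value);
      }
    catch (ReflectiveOperationException e)
      {
      throw new IllegalStateException ("Reflective call failed", e);
      }
    }

  public static void main (String[] args)
    {
    System.out.println ("result=" + convert (42));
    }
  }
```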

Despite the documentation, it's not always clear how to specify the reflection properties, and I find that I need a certain amount of trial-and-error. In fact, some popular libraries, like Log4J2, use reflection so extensively that it's difficult -- perhaps impossible -- to get reflection to work correctly with native-image.
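
In code you control, one way to keep the "choose a class by name at runtime" behaviour without reflection is an explicit table that maps names to constructors. Every class is then referred to directly in the source, which is exactly what was missing in the Reflection example. This is only a sketch: SequenceFactory and the short names in it are invented for illustration, and I am assuming, not asserting, that direct references like these are enough for native-image in every case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical factory: picks an implementation by name at runtime
// without using Class.forName().
class SequenceFactory
  {
  // Made-up short names mapped to constructors. Each class is referenced
  // directly, so an ahead-of-time compiler can see it.
  private static final Map<String, Supplier<CharSequence>> MAKERS
    = new HashMap<>();

  static
    {
    MAKERS.put ("string", String::new);
    MAKERS.put ("builder", StringBuilder::new);
    MAKERS.put ("buffer", StringBuffer::new);
    }

  // Create a new, empty instance of the implementation with this name.
  static CharSequence create (String name)
    {
    Supplier<CharSequence> maker = MAKERS.get (name);
    if (maker == null)
      {
      throw new IllegalArgumentException ("Unknown type: " + name);
      }
    return maker.get();
    }

  public static void main (String[] args)
    {
    String name = args.length > 0 ? args[0] : "builder";
    System.out.println (create (name).getClass().getName());
    }
  }
```

The cost is that the set of choices is fixed at compile time, which is the very flexibility reflection was being used to avoid giving up. It is no help with third-party libraries that use reflection internally.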

Notes

Command line parameters. The binary produced by native-image will tacitly process a number of traditional JVM parameters. In particular, it will set system properties if the command line has -Dname=value, and you can adjust the heap size using the familiar -Xmx... switch. You don't need to take account of these switches in your code if you don't want to -- they are silently removed from the command line, leaving all the unrecognized parameters in the args[] argument to main. What this means is that the compiled program will take whatever command-line arguments the application handles (if any), in addition to some of the common java command-line switches.
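
A small program makes this behaviour easy to check for yourself. In the sketch below, ShowArgs and the property name "greeting" are made up for the illustration; the program simply reports the property and whatever reaches args[].

```java
// Hypothetical test program: reports one system property and the
// arguments that reach main().
class ShowArgs
  {
  // "greeting" is a made-up property name used only for this illustration.
  static String describe (String[] args)
    {
    String greeting = System.getProperty ("greeting", "(not set)");
    return "greeting=" + greeting + " args=" + String.join (",", args);
    }

  public static void main (String[] args)
    {
    System.out.println (describe (args));
    }
  }
```

Under an ordinary JVM the switch goes before the class name:

$ java -Dgreeting=hello ShowArgs one two
greeting=hello args=one,two

If the behaviour described above holds, the compiled binary should give the same result when the switch is mixed in with its own arguments, as in ./showargs -Dgreeting=hello one two.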

Thread/heap dumps. There's no obvious way to get a traditional Java thread or heap dump from a compiled program. You can get native thread and heap dumps using pstack and gcore, but relating these to Java code is not at all straightforward. For serviceability, an application probably needs to generate a lot more of its own diagnostics than it would when running under a traditional JVM.
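
As an illustration of the kind of home-made diagnostic this implies, the sketch below builds a simple thread listing using only the standard Thread API. ThreadReport is a hypothetical name, and I have not verified how completely Thread.getAllStackTraces() behaves in a native-image binary, so treat it as a starting point to test, not a known-good solution.

```java
import java.util.Map;

// Hypothetical do-it-yourself thread dump using only java.lang.Thread.
class ThreadReport
  {
  // Build a plain-text listing of every live thread and its stack.
  static String dumpThreads()
    {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e
           : Thread.getAllStackTraces().entrySet())
      {
      Thread t = e.getKey();
      sb.append ('"').append (t.getName()).append ("\" state=")
        .append (t.getState()).append ('\n');
      for (StackTraceElement frame : e.getValue())
        {
        sb.append ("    at ").append (frame).append ('\n');
        }
      }
    return sb.toString();
    }

  public static void main (String[] args)
    {
    System.out.print (dumpThreads());
    }
  }
```

In a real application, something like dumpThreads() would be triggered by a signal, a timer, or an administrative request, and written to a log.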

Although I haven't demonstrated it in this article, one of the potential strengths of native compilation is that it can run static initializers at compile time, and store just the results in the image. This could be very effective for applications that do a lot of one-time initialization, but there are side-effects that developers need to be careful about.
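
To make that concrete, here is the sort of class that stands to benefit: it fills a lookup table in a static initializer, and afterwards only reads it. This is a sketch; PrimeTable is an invented example, and whether native-image actually runs this initializer at build time depends on the version and options in use, which I am not specifying here.

```java
// Hypothetical example of one-time initialization in a static block.
class PrimeTable
  {
  private static final int LIMIT = 1_000_000;
  private static final boolean[] COMPOSITE = new boolean[LIMIT + 1];

  // Caution: a value like this is fixed at whatever moment the initializer
  // runs. If that is build time, it records the build, not program start.
  static final long INITIALIZED_AT = System.currentTimeMillis();

  static
    {
    // Sieve of Eratosthenes: mark every multiple of each prime
    for (int i = 2; (long) i * i <= LIMIT; i++)
      {
      if (!COMPOSITE[i])
        {
        for (int j = i * i; j <= LIMIT; j += i)
          {
          COMPOSITE[j] = true;
          }
        }
      }
    }

  // Look up primality in the precomputed table.
  static boolean isPrime (int n)
    {
    if (n > LIMIT)
      {
      throw new IllegalArgumentException ("Out of range: " + n);
      }
    return n >= 2 && !COMPOSITE[n];
    }

  public static void main (String[] args)
    {
    System.out.println ("97 prime? " + isPrime (97));
    System.out.println ("table built at " + INITIALIZED_AT);
    }
  }
```

The INITIALIZED_AT field shows the kind of side-effect to watch for: anything that captures the time, the environment, or an open resource during static initialization may end up reflecting the build machine instead of the machine the program runs on.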

Summary

Using native-image is straightforward, and clearly effective, with some kinds of Java program. At present, configuration can be a bit hit-and-miss, although this might improve in time. The biggest problem seems to be the widespread use of reflection in Java. It's easy enough to accommodate this when you're compiling your own code, but it's a much bigger problem if it's a library, with unfamiliar internal operation.