Compiling a Java program to native code using GraalVM, from the ground up
GraalVM is a relatively recent compiler and runtime package that can support multiple programming languages. It is open-source, but largely maintained and supported commercially by Oracle. An interesting feature, albeit one that is still under development, is its ability to produce true native-code executables that are reasonably self-contained. At the time of writing, the native-code generation feature is not available for all platforms.
This article describes step-by-step how to install GraalVM for Linux, add the 'Native Image' plug-in, and use it to compile some trivial Java programs. We shall see that even very simple programs can present some problems for the technology.
The GraalVM maintainers describe compiling to native code as "ahead-of-time compilation". In my day, we just called that "compilation", and what a JVM did was called "interpretation". Oh, well.
In principle, compiled Java is what we've all been waiting for, at least in the micro-services world. It's difficult to think of anything that runs under a JVM as a micro-anything, not without a hollow laugh, anyway. GraalVM's compiled Java offers the possibility (in principle) of millisecond program load times, and a memory footprint more credible in an environment where there might be hundreds of concurrent micro-services on the same physical host. Because Java is a VM-based language, it's impossible to eliminate the VM completely, but the 'SubstrateVM' runtime used by Native Image is claimed to be relatively lightweight. The picture isn't quite as rosy as that, as we shall see.
As with all my software development articles, all steps are carried out on the command line, using no development tools more complicated than a text editor, so all the individual steps are (I hope) clear.
Obtain GraalVM
GraalVM exists in a 'community edition' (CE) and a commercially-supported 'enterprise edition'. Both can be obtained from the GraalVM download page. The community edition is all that is required to follow the examples in this article.
GraalVM CE is installed on Linux by the simple expedient of unpacking the download bundle into any convenient directory. In the examples that follow, I will be using an installation in /home/kevin/lib/graalvm. Of course, you can install GraalVM in a system-wide location if you prefer and if you have access rights.
Install the Native Image plug-in
With the community edition, this should be very straightforward:
$ /path/to/graalvm/bin/gu install native-image
The installation provides the utility /path/to/graalvm/bin/native-image.
Hello, World
Let's start with the simplest possible Java program, to check that everything is installed and set up properly. In any convenient directory, create the file HelloWorld.java, like this:
public class HelloWorld {
  public static void main (String[] args) {
    System.out.println ("Hello, World");
  }
}
GraalVM provides a javac compiler, but native-image works on compiled byte-code, not source code. Therefore there's no particular reason to use GraalVM's own javac, provided the compiler you use doesn't produce class files for a newer Java version than GraalVM supports. Whichever javac you prefer to use, compile the Java (running javac this way will produce a compiled .class file in the same directory as the source, as there is no package definition).
$ javac HelloWorld.java
Now feed the compiled class file into native-image:
$ JAVA_HOME=/path/to/graalvm /path/to/graalvm/bin/native-image \
    -classpath . HelloWorld
native-image can handle collections of class files, and also JAR files. In the latter case, it's still necessary to specify the class that provides the main() method; the utility won't read it from a JAR manifest, even if there is one.
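Because native-image needs the main class named explicitly, one way to avoid copying it out of a JAR by hand is to read the manifest yourself. The following is a minimal sketch using the standard java.util.jar API; the class name MainClassFinder is hypothetical, not part of GraalVM.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class MainClassFinder {
  // Returns the Main-Class attribute from a JAR's manifest, or null if the
  // JAR has no manifest or the manifest names no main class
  public static String findMainClass (String jarPath) throws IOException {
    try (JarFile jar = new JarFile (jarPath)) {
      Manifest manifest = jar.getManifest();
      if (manifest == null) return null;
      return manifest.getMainAttributes().getValue (Attributes.Name.MAIN_CLASS);
    }
  }

  public static void main (String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println ("Usage: java MainClassFinder {JAR}");
      System.exit (1);
    }
    String mainClass = findMainClass (args[0]);
    if (mainClass == null) {
      System.err.println ("No Main-Class in manifest");
      System.exit (1);
    }
    System.out.println (mainClass);
  }
}
```

Its output could be substituted into the native-image command line shown earlier in place of the class name, for example with the shell's $(...) syntax.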
It's at this point that you might get your first disagreeable surprise: compiling Hello World to a binary will take about a minute. In a sense, that's not surprising. Java is not a language designed to work this way, and the native-image utility has a great deal of work to do. Still, debugging a program when each change takes a minute to test soon gets old. It's not so bad if running native-image is just the last step in a process where most of the earlier steps are performed by a conventional compiler, but we aren't quite at that stage yet.
Be that as it may, native-image should have produced an executable called "helloworld" (the binary name is derived from the main class name), which you can run at the prompt:
$ ./helloworld
Hello, World
Note that the binary is about 2.2 MB in size in the default, unstripped format. Stripping symbols from it doesn't make it much smaller. It's large compared to a C program that does the same thing, but it's an awful lot smaller than a full-sized JVM and all its runtime dependencies.
This is also interesting:
$ time java HelloWorld
real 0m0.054s
user 0m0.045s
sys 0m0.017s
$ time ./helloworld
real 0m0.004s
user 0m0.002s
sys 0m0.005s
These timing figures are pretty consistent and, although I don't think the time utility really provides millisecond precision, it's clear that the native code version starts and finishes in about a tenth of the time the traditional JVM needs. Since the program itself takes little to no time to execute, I think that's a fair estimate of the difference in overheads.
So far, so good.
An HTTP(S) request
We've seen that it's easy enough to compile the Hello World example to native code; time to look at something a bit more complicated. The following program takes a specified URL and fetches its content to standard out.
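import java.net.*;
import java.io.*;

public class GetUrl {
  public static void main (String[] args) {
    if (args.length != 1) {
      System.err.println ("Usage: java GetUrl {URL}");
      System.exit (1); // non-zero status signals a usage error
    }
    try {
      URL url = new URL (args[0]);
      URLConnection c = url.openConnection();
      // try-with-resources closes the stream when we're done
      try (InputStream is = c.getInputStream()) {
        int chr;
        // read() returns -1 at end of stream; 0 is a legitimate byte value
        while ((chr = is.read()) != -1) {
          System.out.write (chr);
        }
      }
      // write(int) doesn't necessarily flush, so flush before exiting
      System.out.flush();
    } catch (Throwable e) {
      e.printStackTrace();
    }
  }
}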
This program will compile just fine, using the method described for Hello World earlier. But it won't actually run, with either HTTP or HTTPS URLs. Instead, you'll see an error message like this:
Exception in thread "main" com.oracle.svm.core.jdk.UnsupportedFeatureError: Accessing an URL protocol that was not enabled. The URL protocol http is supported but not enabled by default. It must be enabled by adding the -H:EnableURLProtocols=http option to the native-image command.
To be fair, it is at least clear what needs to be done -- for this specific feature, anyway. We need to compile like this:
JAVA_HOME=... /native-image -H:EnableURLProtocols=https \
    -H:EnableURLProtocols=http -classpath . GetUrl
The problem is that I have not been able to find a complete list of features that need to be enabled this way, and I'm not sure whether such a list even exists. Since problems like this do not become apparent until runtime, and they can't be spotted by testing with an ordinary JVM, that significantly increases the testing burden.
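One way to reduce this testing burden, at least for URL protocols, is a small start-up check that is run against the compiled binary, since it will always pass under an ordinary JVM. This is a sketch under the assumption, suggested by the error message above, that an unenabled protocol fails when a URL for it is constructed or a connection object is created; ProtocolCheck and unsupportedProtocols are hypothetical names.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ProtocolCheck {
  // Returns a description of each protocol, from those given, that this
  // runtime can't handle. openConnection() only creates a connection
  // object; it does not connect, so no network traffic is generated.
  public static List<String> unsupportedProtocols (String... protocols) {
    List<String> failed = new ArrayList<>();
    for (String p : protocols) {
      try {
        new URL (p + "://example.com/").openConnection();
      } catch (Throwable t) { // UnsupportedFeatureError is an Error, not an Exception
        failed.add (p + ": " + t);
      }
    }
    return failed;
  }

  public static void main (String[] args) {
    List<String> failed = unsupportedProtocols ("http", "https");
    failed.forEach (System.err::println);
    System.exit (failed.isEmpty() ? 0 : 1);
  }
}
```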
Anyway, having fixed this problem, it's on to the next one. Although GetUrl now works with HTTP URLs, it fails with HTTPS, like this:
WARNING: The sunec native library, required by the SunEC provider, could not be loaded. This library is usually shipped as part of the JDK and can be found under <JAVA_HOME>/jre/lib//libsunec.so. It is loaded at run time via System.loadLibrary("sunec"), the first time services from SunEC are accessed.
Again, it's clear what needs to be done, but it's less clear why. libsunec.so is provided with GraalVM, so why is it not just linked with the compiled executable? For testing purposes, we can just copy the library from the GraalVM installation to the current directory, but the application is no longer self-contained. It's also not entirely clear to me whether I can legally distribute this library with my application, or whether I would have to give instructions to end users on how to obtain a copy themselves. Handling SSL is hardly a niche activity in a Java application, and it's surprising that doing so requires these additional steps. I've also heard that the list of SSL certificates that gets built into the executable is truncated, and that it might be necessary to supply a full list; but I've not noticed this problem myself.
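To check whether a compiled program will be able to find libsunec.so, it can help to look where System.loadLibrary() looks, which is the directories listed in the java.library.path system property. A minimal diagnostic sketch; the class SunecCheck is hypothetical, and treating an empty path entry as the current directory is an assumption.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SunecCheck {
  // Returns the first directory on java.library.path containing the
  // platform-specific file for the named native library, or null if none does
  public static Path findLibrary (String libName) {
    String fileName = System.mapLibraryName (libName); // "libsunec.so" on Linux
    String libPath = System.getProperty ("java.library.path", "");
    for (String dir : libPath.split (File.pathSeparator)) {
      // Assumption: an empty entry means the current directory
      Path candidate = Paths.get (dir.isEmpty() ? "." : dir, fileName);
      if (Files.isRegularFile (candidate)) return candidate;
    }
    return null;
  }

  public static void main (String[] args) {
    Path p = findLibrary ("sunec");
    System.out.println (p == null ? "sunec not found on java.library.path" : "found " + p);
  }
}
```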
Reflection
This is the last example in this article and, in many ways, the most troublesome. Consider this code sample, which converts a number to a String in an extremely convoluted way. I'm not suggesting this is a practical method of coding, just a way to illustrate a problem.
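import java.lang.reflect.*;

public class Reflection {
  public static void main (String[] args) throws Exception {
    // Load String by name only; the class is never referenced directly
    Class<?> c = Class.forName ("java.lang.String");
    // Instantiate via the no-argument constructor (Class.newInstance() is
    // deprecated since Java 9, but it is the call that fails below)
    Object o = c.newInstance();
    // Look up String.valueOf(int)
    Method m = c.getMethod ("valueOf", int.class);
    // valueOf() is static, so the instance passed here is ignored
    Object result = m.invoke (o, 42);
    System.out.println ("result=" + result);
  }
}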
What this rather tortuous code does is to load the String class by name, instantiate it, locate its valueOf method, then invoke that method (since valueOf is static, the instance passed to invoke() is actually ignored). I've gone to some trouble here never to refer to the String class directly at any point. While this is a pointless exercise here, reflection of this sort is absolutely ubiquitous in Java programming; many would consider it one of the most powerful features of the language. With reflection we can, for example, delay decisions about which specific classes to use for particular functions until runtime. Of course, compiling to a native executable is all about making compilation decisions in advance, so it is inimical to reflection.
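To make the "decide at runtime" point concrete, here is a sketch of the common plug-in pattern, in which the implementation class is named by a system property. Greeter, EnglishGreeter, FrenchGreeter and the greeter property are all hypothetical. Nothing in the code refers to FrenchGreeter directly, so a compiler working ahead of time has no way to know it will be needed.

```java
public class PluginLoader {
  // A hypothetical interface whose implementation is chosen at runtime
  public interface Greeter { String greet (String name); }

  public static class EnglishGreeter implements Greeter {
    public String greet (String name) { return "Hello, " + name; }
  }

  public static class FrenchGreeter implements Greeter {
    public String greet (String name) { return "Bonjour, " + name; }
  }

  // The implementation is named only by a string, so static analysis
  // of the code can't tell which class will be instantiated
  public static Greeter load (String className) throws ReflectiveOperationException {
    Class<?> c = Class.forName (className);
    return (Greeter) c.getDeclaredConstructor().newInstance();
  }

  public static void main (String[] args) throws ReflectiveOperationException {
    String impl = System.getProperty ("greeter", EnglishGreeter.class.getName());
    System.out.println (load (impl).greet ("World"));
  }
}
```

Under a JVM, `java -Dgreeter='PluginLoader$FrenchGreeter' PluginLoader` prints "Bonjour, World". A native build would presumably need FrenchGreeter's constructor registered for reflection, in the way described below for String.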
Consequently, if we compile this class in the same way we compiled HelloWorld earlier, we will run into a problem at runtime, if the program even compiles:
Exception in thread "main" java.lang.InstantiationException: Type `java.lang.String` can not be instantiated reflectively as it does not have a no-parameter constructor or the no-parameter constructor has not been added explicitly to the native image.
What's happened here is that, as the class String was loaded by name only, and its no-argument constructor was never used directly in the program, no code was compiled for that constructor. The native compiler is smart enough to recognize the problem in this case, because it has built-in logic for handling Class.forName(); so we get a somewhat helpful message, rather than a core dump. Still, the onus is on the developer to configure the build to add classes that are only loaded reflectively. There is some information on doing this on a GitHub page.
Essentially, we must create a JSON file that lists the various classes and methods that must be included, and pass it to native-image on the command line using -H:ReflectionConfigurationFiles=... . In the present case, it's reasonably clear what needs to be added: a constructor for the String class. A suitable file is this:
[
  {
    "name" : "java.lang.String",
    "methods" : [
      { "name" : "<init>", "parameterTypes" : [] }
    ]
  }
]
Despite the documentation, it's not always clear how to specify the reflection properties, and I find that I need a certain amount of trial-and-error. In fact, some popular libraries, like Log4J2, use reflection so extensively that it's difficult (perhaps impossible) to get reflection to work correctly with native-image.
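Trial-and-error can be made a little less painful by having reflective code report exactly what it failed to find. The sketch below (ReflectHelper is a hypothetical name) wraps instantiation by name and, on failure, prints a configuration entry in the same format as the JSON file above. It only covers no-argument constructors, and it assumes the failure surfaces as a ReflectiveOperationException, as it did in the String example; other failure modes would not be caught.

```java
public class ReflectHelper {
  // Instantiates a class by name using its no-argument constructor. On
  // failure, prints a reflection-configuration entry to stderr as a hint
  // about what may need adding, then rethrows the original exception.
  public static Object newInstance (String className) throws ReflectiveOperationException {
    try {
      return Class.forName (className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      System.err.println ("Reflective instantiation of " + className + " failed: " + e);
      System.err.println ("Possible configuration entry: " + configEntry (className));
      throw e;
    }
  }

  // Builds a JSON entry registering the no-argument constructor of a class
  public static String configEntry (String className) {
    return "{ \"name\" : \"" + className + "\", \"methods\" : "
        + "[ { \"name\" : \"<init>\", \"parameterTypes\" : [] } ] }";
  }
}
```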
Notes
Command line parameters. The binary produced by native-image will tacitly process a number of traditional JVM parameters. In particular, it will set system properties if the command line has -Dname=value, and you can adjust the heap size using the familiar -Xmx... switch. You don't need to take account of these switches in your code if you don't want to; they are silently removed from the command line, leaving all the unrecognized parameters in the args[] argument to main. What this means is that the compiled program will take whatever command-line arguments the application handles (if any), in addition to some of the common java command-line switches.
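A small program makes this behaviour easy to check. This sketch reads a hypothetical greeting property and echoes the arguments it actually receives; ArgsDemo is a made-up name.

```java
import java.util.Arrays;

public class ArgsDemo {
  // Builds the report separately from main() so it can be tested
  public static String describe (String[] args) {
    // Set with -Dgreeting=... on the command line; "Hello" if absent
    String greeting = System.getProperty ("greeting", "Hello");
    // Only the application's own arguments should arrive in args[]
    return greeting + ", args=" + Arrays.toString (args);
  }

  public static void main (String[] args) {
    System.out.println (describe (args));
    // Roughly reflects any -Xmx setting (the reported value may differ slightly)
    System.out.println ("max heap=" + Runtime.getRuntime().maxMemory() + " bytes");
  }
}
```

Under a JVM, `java -Dgreeting=Hi ArgsDemo one two` prints "Hi, args=[one, two]" followed by the heap limit. Going by the behaviour described above, running the compiled binary as `./argsdemo -Dgreeting=Hi one two` should print the same first line, though that's worth confirming with your GraalVM version.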
Thread/heap dumps. There's no obvious way to get a traditional Java thread or heap dump from a compiled program. You can get native thread stacks and a core dump of the process using pstack and gcore, but relating these to Java code is not at all straightforward. For serviceability, an application probably needs to generate a lot more of its own diagnostics than would be the case when running under a traditional JVM.
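As one example of application-generated diagnostics, the standard Thread.getAllStackTraces() method can produce a rough Java-level thread dump. This is a sketch only: ThreadDump is a hypothetical name, and whether this method returns full stack information in a native image, as it does under a JVM, is an assumption to verify.

```java
import java.util.Map;

public class ThreadDump {
  // Formats the name, state and stack of every live thread
  public static String dump() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append ('"').append (t.getName()).append ("\" state=")
        .append (t.getState()).append ('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append ("    at ").append (frame).append ('\n');
      }
      sb.append ('\n');
    }
    return sb.toString();
  }

  public static void main (String[] args) {
    System.out.print (dump());
  }
}
```

An application might write the output of dump() to its log on request, from an administrative endpoint, for example.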
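Although I haven't demonstrated it in this article, one of the potential strengths of native compilation is that it can run static initializers at compile time, and store just the results in the image. This could be very effective for applications that do a lot of one-time initialization, but there are side-effects that developers need to be careful about: anything a static initializer reads from its environment, such as the clock, environment variables, or files, would be captured when the image is built rather than when the program runs.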
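This kind of side-effect is easy to illustrate. In this sketch (BuildTimeInit is a hypothetical name), a static initializer records the clock. Under a JVM the value is taken when the program runs. If the class were initialized when the image is built (native-image has an --initialize-at-build-time option for requesting this, though its exact behaviour varies between versions), the build machine's time would presumably be frozen into the executable, and the reported age would grow with every run.

```java
public class BuildTimeInit {
  // Computed once, when the class is initialized. Under a JVM that happens
  // at run time; with build-time initialization, the value recorded on the
  // build machine would be stored in the executable instead.
  static final long INIT_TIME = System.currentTimeMillis();

  public static void main (String[] args) {
    long now = System.currentTimeMillis();
    System.out.println ("initialized " + (now - INIT_TIME) + " ms ago");
  }
}
```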
Summary
Using native-image is straightforward, and clearly effective, with some kinds of Java program. At present, configuration can be a bit hit-and-miss, although this might improve in time. The biggest problem seems to be the widespread use of reflection in Java. It's easy enough to accommodate this when you're compiling your own code, but it's a much bigger problem if it's a library, with unfamiliar internal operation.