Java as a scripting language: new auto-compilation features in Java 11

Introduction

Java is traditionally a compiled language, although the output of the compiler is not "machine code" in the conventional sense. Separate tools are needed to compile the Java source and to execute the compiled output. Languages like Perl and Python, on the other hand, are traditionally interpreted -- executing the program is a single-step operation that takes source code as its input. Many Linux/Unix shells, like bash, offer fairly sophisticated programming features in their own right: the choice whether to use bash or Perl for a task can sometimes be a coin-tossing one.

Programming languages that support one-step execution are often referred to as "scripting languages", although that's a pretty vague term.

Developers and system administrators have typically turned to scripting languages for implementing quick (and sometimes dirty) utilities for simple, highly-specific tasks. The lack of a separate compilation operation makes it easy to debug and revise script code, and scripting languages usually have an accessible syntax.

Since Java 11, it has become possible to use Java in a very similar way to scripting languages. This new feature is particularly interesting if you want to use Java to write command-line utilities. So, while the basic principle I'm describing here does apply, to some extent, to Microsoft Windows, I would expect it to be of most interest to Linux/Unix developers and administrators.

Note:
When I use the term "Java script" -- which I'm reluctant to do -- I'm referring to Java scripting using the new auto-compile features in Java 11. Nothing in this article relates in any way to the JavaScript programming language. Sorry about this -- I don't choose these names.

Scripting in Java

Consider the old chestnut "Hello, World" program in Java:

public class Test
  {
  public static void main (String[] args)
    {
    System.out.println ("Hello, World");
    }
  }

Traditionally the compilation and execution would be separate steps, like this:

$  javac Test.java
$  java Test
Hello, World

In this mode of operation, javac produces one or more .class files, which contain the compiled code. java executes the compiled code, starting at a method called main() in the class Test. The java command does not take a filename as its input -- it takes a class name, and finds the corresponding .class file using class searching rules.

Since the implementation of JEP330 in JDK 11, it has become possible to run this simple Java program in a single step, like this:

$ java Test.java
Hello, World

In this mode of operation, no .class files are produced, and the compilation and execution steps are quietly combined. Running a Java program this way is very similar to running a Perl script, by entering:

$  perl my_script.pl 

In fact, there are more similarities than this one-step execution, as we shall see.

What else can we do?

One of the most striking things about this new feature is that regular Java class naming conventions are bypassed. What I mean by this is that, while traditional Java usage requires a public class to have the same name as the file that contains it, auto-compilation bypasses that requirement. In my previous example, I defined the class

public class Test

but I could have called it Fred or myclass, or anything I liked; I can still execute it using the filename Test.java. I can even define the class to be in a particular package, and execute it without using the package name (although I'm not sure why I would).

The rules for auto-compilation state that execution begins at the main method of the first top-level class defined in the file.

Since the implementation refers to the "first" class, you might wonder if that means you can use multiple classes in the same file and, as it turns out, you can. So you can implement a full, object-oriented program, so long as it fits into one source file. This ability to handle multiple classes opens up the possibility of doing "real" scripting in Java, of the same kind we might otherwise do with Perl or Python.
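To illustrate, here is a minimal sketch of a two-class "script". The class names Report and Shape, and the describe() method, are made up for this example. Neither class is public, so the file can have any name; if it were saved as, say, shapes.java, it should run with "java shapes.java".

```java
import java.util.List;

// The first top-level class in the file is the one the launcher runs.
class Report
  {
  public static void main (String[] args)
    {
    List<Shape> shapes = List.of (new Shape ("square", 4), new Shape ("triangle", 3));
    for (Shape s : shapes)
      System.out.println (describe (s));
    }

  // Build a one-line description of a shape
  static String describe (Shape s)
    {
    return s.name + " has " + s.sides + " sides";
    }
  }

// A second class in the same source file; no separate Shape.java is needed.
class Shape
  {
  final String name;
  final int sides;

  Shape (String name, int sides)
    {
    this.name = name;
    this.sides = sides;
    }
  }
```

Because Report comes first in the file, its main() is the entry point; swapping the order of the two classes would leave the launcher looking for a main method in Shape.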

The "script" we execute need not have a name that ends in .java; in fact, there are good reasons not to name files this way. If it doesn't, you can use the --source option to force the java utility to interpret the file as source code. So, for example, if I name my file test.jsh, I can execute it as

$  java --source 11 test.jsh 

If you're used to writing Perl or Python scripts for Linux, you might be familiar with invoking the script at the prompt just by its filename, without specifying the particular language interpreter. Can we do this with Java? Amazingly, yes we can.

Shebangs

Unix-like shells (and sometimes kernels) provide ways to have an interpreter launched, based on a specification in the file. For example, I could write a Perl script like this, in a file called my_script:

#!/usr/bin/perl
print ("Hello, World\n");

Then I can run it at a shell prompt like this:

$ ./my_script 

without needing to specify the interpreter perl in the command. Nearly always, I'll need to set the execute permission on the file first:

$ chmod 755 my_script 

This mode of operation works because of a collaboration between the shell, perl, and the kernel's program loader. Essentially, perl has to know to avoid interpreting the first line of the script, which is

#!/usr/bin/perl

and the program loader needs to read this line (and only this line) and invoke perl. This first line is colloquially known as a "shebang". Since many scripting languages use "#" to introduce a comment, the first requirement, ignoring the shebang line, is easy to implement -- at least in bash and Perl. It's more of a problem for Java -- but not an insurmountable one.

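Here's a self-executing Java script. We could call the file test, or test.jsh or, in fact, anything that does not end in .java. The shebang line generally needs the full path to the java launcher; I'm assuming /usr/bin/java here, but that depends on the installation.

#!/usr/bin/java --source 11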
public class Test
  {
  public static void main (String[] args)
    {
    System.out.println ("Hello, World");
    }
  }

Then we can run it like this:

$ ./test
Hello, World

Note that the (invisible) Java compilation has ignored the shebang line, even though Java does not use a "#" to introduce comments. It is for this reason, I think, that the compiler won't allow a shebang line in a file whose name ends in .java -- it completely violates the Java compiler's regular syntax rules about comments. This is a highly specific feature, introduced into Java to support scripting operation.

So what's going on?

It's important to understand that the new auto-compile feature does not change how Java works -- it's still a two-step process. Code is still compiled, following all the usual rules (apart from those applying to comments, as explained above). When the source has been compiled, the JVM's run-time engine is turned loose on the compiled code, exactly as before. All JVM subsystems, including the garbage collector, work as they always have.

There is therefore no gain in speed or memory efficiency from using Java in script mode. In fact, memory usage might be slightly increased, because the compiled code has to be held in memory for the whole duration of the program. Compared to the JVM itself, this contribution to memory usage is likely to be nugatory. However, repeatedly running the "script" incurs the compilation time overhead on every execution.

On my desktop system, the compilation process takes about half a second, for a "script" of a few hundred lines of Java. That's not long if I only run it once. If I run it repeatedly -- perhaps in another script -- those half-second delays soon add up. In comparison, a similar script in Perl takes about 50 msec to start execution.

This isn't a surprise -- Perl is a language that was always designed to operate in a scripting, interpreted mode; Java is not. And, once compilation is over, the Java implementation may well out-perform the Perl version -- it really depends on the specific operations.

So is this real scripting?

That the new auto-compilation feature supports shebang lines -- and thus the ability for programs to be executed easily at the prompt -- does suggest that the implementers of the new feature were aiming at full-scale scripting. However, JEP330 expressly distances itself from this kind of speculation:

"...it is not a goal to evolve the Java language into a general purpose scripting language."

It is right to take this stance because, for better or worse, Java does not really have any of the features that make Perl and bash so successful for scripting.

Both these utilities (and Python to a lesser extent) were designed to form a kind of plumbing around command-line utilities. Consider, for example, the following Perl script:

my @df = `df -h`;
chomp (@df); # Remove end-of-line marks

foreach my $line (@df)
  {
  if ($line !~ /^Filesystem/) # Remove header line
    {
    print ("$line\n");
    #...
    }
 }

This script is intended to do something (doesn't matter what) to mounted filesystems, based on their size. It starts by executing the command-line utility df -h, and assigning its output line-by-line to an array of strings. Then it removes all end-of-line marks from the array using chomp(). Then it iterates the array, using a regular expression match to ignore the header line in the output from df.

The purpose of this script is unimportant -- what I'm trying to illustrate is how difficult it would be to implement these ten lines of Perl in Java. You'd have to use Runtime.exec() to execute df, and set up multiple threads to consume the stdout and stderr streams from its execution. Then you'd have to parse the output, removing end-of-line markers in your own code. You'd have to create a Vector<String> or similar to hold the specific lines as you parse them. You'd need to use the Java regular expression support to remove the unwanted lines. Oh, and you'd need to deal with character set conversion in some way, because the platform's character set probably won't match the JVM's internal string format.
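For comparison, here is a rough sketch of what a Java version might look like. It is not a line-for-line port: I'm assuming ProcessBuilder with stderr merged into stdout, which avoids the separate reader threads that Runtime.exec() would call for, and an ArrayList rather than a Vector. The class name DfLines and the methods dropHeader() and run() are invented for this example. Even with those shortcuts, it comes out several times longer than the Perl.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DfLines
  {
  // Same test as the Perl script: a line starting with "Filesystem" is the header
  static final Pattern HEADER = Pattern.compile ("^Filesystem");

  // Drop the header line, keeping everything else
  static List<String> dropHeader (List<String> lines)
    {
    List<String> kept = new ArrayList<> ();
    for (String line : lines)
      if (!HEADER.matcher (line).find ())
        kept.add (line);
    return kept;
    }

  // Run a command and collect its output, one list entry per line
  static List<String> run (String... command) throws IOException, InterruptedException
    {
    ProcessBuilder pb = new ProcessBuilder (command);
    pb.redirectErrorStream (true); // Merge stderr into stdout, so one reader suffices
    Process p = pb.start ();
    List<String> lines = new ArrayList<> ();
    // Decode the output using the platform's default character set
    try (BufferedReader in = new BufferedReader (
           new InputStreamReader (p.getInputStream (), Charset.defaultCharset ())))
      {
      String line;
      while ((line = in.readLine ()) != null) // readLine() strips the end-of-line mark
        lines.add (line);
      }
    p.waitFor ();
    return lines;
    }

  public static void main (String[] args) throws InterruptedException
    {
    try
      {
      for (String line : dropHeader (run ("df", "-h")))
        System.out.println (line);
      }
    catch (IOException e)
      {
      System.err.println ("Could not run df: " + e.getMessage ());
      }
    }
  }
```

Merging the two output streams is a simplification: if the error output needed to be handled separately from the regular output, something like the multi-threaded approach described above would still be needed.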

In Perl or Python I can define modules -- separate program files -- that are themselves defined in Perl or Python. I don't need to compile or link them. There's no comparable way for a self-compiling Java source file to run another Java source file -- except by invoking the compiler explicitly.

The fact is that real scripting languages are good at this kind of thing -- they're good at (a) working with the platform, and (b) assembling a complex program from modules written in the same scripting language. A self-compiling Java script can run additional Java modules -- but they have to be compiled first. If the application allows for compiling some Java, there seems to be little to gain by not compiling all Java.

It's no accident that Perl is regularly voted the most hated programming language by developers, but there's no doubt that it's very good at the kinds of things it was designed to do.

In principle, where the new auto-compilation feature could be useful is in education. Auto-compilation makes Java quite accessible to experiment with -- but not as accessible as Python, because the student still has to contend with the Java boilerplate.

Moreover, beyond primary education, is it too much to ask that a potential programmer should know what a compiler does? Is typing javac followed by java really all that much less comprehensible than just typing java?

For all that, I can see a role for auto-compilation in an educational setting. I don't really see Java replacing Perl for one-off system administration tasks, and that doesn't seem to be the focus of the new features. These features are interesting, and time will tell whether they prove to be useful.