The nuts and bolts of anonymous inner classes in Java

Note:
Although I've tried to keep it basically up to date, this article is about concepts that don't have so much application to modern Java programming. Please note also that I use the term 'closure' a lot, even in circumstances where a lambda expression is not strictly a closure. This is just the terminology that was used in the Java community when I wrote this article. Modernizing the article properly would take more time than I can really spare at present. Sorry.

Introduction

This article describes the use of anonymous inner classes in Java programming, and some of the problems that developers commonly experience in their use. With reference to the decompiled output of the Java compiler, it attempts to explain that these problems are a consequence of the way that Java had to implement inner classes without breaking backward compatibility.

Inner classes in Java

Java has supported a notion of inner classes since version 1.1. An inner class is a class whose definition is nested inside that of another class. Named inner classes are commonly used to encapsulate subsidiary class-based logic inside a class that uses it. For example, here is the skeleton of a class for parsing XML documents:

import java.util.ArrayList;
import java.util.List;

public class XMLDocument
  {
  public static class Node
    {
    // Child nodes of this node
    protected List<Node> nodes = new ArrayList<Node>();
    // Methods for manipulating document nodes
    }
    }


  public void parse (String s)
    {
    // Parse document into nodes
    }
  }

The class Node is defined within XMLDocument because it is subsidiary to the document itself. Because this Node class is defined as static and public, it is accessible to other classes, and access does not require a specific instance of XMLDocument. For example, in another class I could say:

  XMLDocument.Node n = new XMLDocument.Node();

Had I not declared the Node class as static, I could still manipulate instances of XMLDocument.Node from other classes, but I could not instantiate Node instances independently of an enclosing XMLDocument.
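To make the distinction concrete, here is a minimal sketch. The names Outer, StaticNested, Inner and NestedDemo are hypothetical and are not part of the XMLDocument example; the point is only to show that a non-static inner class has to be created through an instance of its enclosing class.

```java
// Hypothetical names, for illustration only.
class Outer {
    private final String label;

    Outer(String label) {
        this.label = label;
    }

    // Static nested class: no enclosing instance is needed to create one.
    static class StaticNested {
        String describe() {
            return "nested";
        }
    }

    // Non-static inner class: every instance is tied to an Outer instance,
    // so it can read that instance's fields directly.
    class Inner {
        String describe() {
            return "inner of " + label;
        }
    }
}

class NestedDemo {
    public static void main(String[] args) {
        // Created independently of any Outer instance.
        Outer.StaticNested n = new Outer.StaticNested();

        // The qualified form 'outer.new Inner()' supplies the enclosing instance.
        Outer outer = new Outer("doc1");
        Outer.Inner i = outer.new Inner();

        System.out.println(n.describe());
        System.out.println(i.describe());
    }
}
```

Writing new Outer.Inner() from a static context such as main() is rejected by the compiler, because there is no enclosing Outer instance to attach the new object to.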

It should be clear that defining an inner class as public static is almost the same as defining a global class. What we're really doing here is introducing a new level of application packaging, rather than encapsulating logic -- creating an inner class of this sort is a stylistic decision rather than a logical one. Had I declared the Node class non-static and private, then I would have created a class which is owned and managed entirely by its enclosing class; but it would still be a named class with its own identity, of a sort.

Anonymous inner classes

Anonymous inner classes look quite different from named classes. They have no (programmer-defined) name, only a limited independent identity, and typically are defined entirely within specific programming statements. The following code example shows a typical use of an anonymous inner class, which defines and instantiates a subclass of Thread to carry out some background operation.

public class Test 
{
public static void main (String[] args)
  {
  final int[] ticks = new int[1];
  new Thread()
    {
    public void run()
      {
      while (true)
        {
        try {Thread.sleep(1000);} catch (InterruptedException e){}
        System.out.println ("ticks=" + (ticks[0]++));
        }
      }
    }.start ();
  System.out.println ("Thread started; carrying on...");
  }
}

The syntax is somewhat unlike any other class definition in Java. In outline we have:

  Something o = new Something() 
    {
    // Definition of the methods of Something
    };
   o.someMethod();

In the previous example, since we were only calling one method on the Thread (start()), we did not even need to assign the new instance a variable name. We just had:

  new Something() 
    {
    // Definition of the methods of Something
    }.someMethod();

Something may be a class name or an interface name; definition of an anonymous inner class is one of the few circumstances in which we can legitimately say new [Interface] in Java. We can't instantiate an interface, of course, but with anonymous inner classes we provide the implementation of the interface right in the instantiation statement itself. Similarly, we can instantiate a fully abstract class this way, provided that the definition of the inner class provides definitions of all the required abstract methods.
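As a small sketch of both cases, the following uses a hypothetical interface Greeter and a hypothetical abstract class Shape (neither comes from the examples in this article). Each is 'instantiated' only in the sense that an anonymous subclass supplies the missing methods on the spot.

```java
// Hypothetical types, for illustration only.
interface Greeter {
    String greet(String name);
}

abstract class Shape {
    abstract double area();

    // Concrete method that relies on the abstract one.
    String describe() {
        return "area=" + area();
    }
}

class AnonymousDemo {
    static Greeter politeGreeter() {
        // 'new Greeter()' is legal only because the body supplies the implementation.
        return new Greeter() {
            @Override
            public String greet(String name) {
                return "Hello, " + name;
            }
        };
    }

    static Shape unitSquare() {
        // The anonymous subclass must define every abstract method of Shape.
        return new Shape() {
            @Override
            double area() {
                return 1.0;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(politeGreeter().greet("world"));
        System.out.println(unitSquare().describe());
    }
}
```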

Whether you like the anonymous inner class syntax or not, it is undeniable that this is idiomatic Java. Partly this is because Java relies so heavily on interfaces, and it is often much more compact simply to provide the implementation of the interface in line with the code that uses it.

Limitations of the use of anonymous inner classes

There are two related problems that Java developers frequently come up against when coding with anonymous inner classes. Both concern identifier scope, but in different ways.

Consider the following code, which is a very slight variation on the previous example -- but this one will not compile:

public class Test 
{
public static void main (String[] args)
  {
  int ticks = 0;
  new Thread()
    {
    public void run()
      {
      while (true)
        {
        try {Thread.sleep(1000);} catch (InterruptedException e){}
        System.out.println ("ticks=" + (ticks++));
        }
      }
    }.start ();
  System.out.println ("Thread started; carrying on...");
  }
}

If you try to compile this, you'll get a spiteful message from the compiler:

Test.java:13: local variable ticks is accessed from within inner class; needs to be declared final
        System.out.println ("ticks=" + (ticks++));

Of course, declaring the variable final is not at all what we want in this case -- we want the thread to be able to update the value of ticks.

The usual (wrong) explanation that is offered for this problem is that the variable ticks is out of scope when the method main() ends, and so is not available to the inner class. However, the same could be said for the variable ticks[] in the first example, and that compiles just fine. In fact, declaring a final array containing one element is an ugly, but commonplace, workaround for the problem described here.

The other common problem concerns scope resolution within the methods of the inner class. In the example above, the closest enclosing scope of the method run() is the class Thread and not the method main(), even though the code layout would suggest otherwise. This can lead to subtle problems with unexpected methods being called when there are multiple methods with the same name in different scopes.
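The following sketch shows the name-resolution problem in isolation. The class names ScopeDemo and Task, and the method describe(), are hypothetical; I've used a small abstract class in place of Thread so that the result does not depend on thread timing.

```java
class ScopeDemo {
    // Method in the enclosing class.
    static String describe() {
        return "from ScopeDemo";
    }

    // Hypothetical base class that happens to define a method with the same name.
    abstract static class Task {
        String describe() {
            return "from Task";
        }

        abstract String run();
    }

    static String unqualifiedCall() {
        Task t = new Task() {
            @Override
            String run() {
                // Resolves to the inherited Task.describe(), not ScopeDemo.describe().
                return describe();
            }
        };
        return t.run();
    }

    static String qualifiedCall() {
        Task t = new Task() {
            @Override
            String run() {
                // Qualifying the call reaches the enclosing class's method.
                return ScopeDemo.describe();
            }
        };
        return t.run();
    }

    public static void main(String[] args) {
        System.out.println(unqualifiedCall());
        System.out.println(qualifiedCall());
    }
}
```

The layout of unqualifiedCall() suggests that describe() belongs to ScopeDemo, but the inherited method wins because the anonymous class body is the nearer scope.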

Both these problems are hard to understand until we see how anonymous inner classes are actually implemented.

Anonymous inner classes under the hood

To understand what's going on here, we need to look at the code generated by the compiler. Because bytecode is not particularly easy to read, my approach will be to compile the classes, then convert them back to Java with a decompiler tool.

The first point to note is that the Java runtime has no understanding of inner classes at all. Whether the inner class is named or anonymous, a smoke-and-mirrors procedure is used to convert the inner class to a global class. If the class has a name, then the compiler generates class files whose names have the format [outer]$[inner] -- $ is a legal character in a Java identifier. For anonymous inner classes, the generated class files are simply numbered. So when the Thread example at the start of this article is compiled, we end up with a class file called Test$1.class. The number '1' indicates that this is the first anonymous class defined within the class Test.
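You can see these generated names without a decompiler, by asking the classes for their runtime names. This is only a sketch, and NamingDemo and Named are hypothetical names; with javac and no package declaration I would expect it to print NamingDemo$Named and NamingDemo$1, although the language specification only promises that the anonymous class name ends in $ followed by digits.

```java
class NamingDemo {
    // A named (static) nested class.
    static class Named {
    }

    static String namedClassName() {
        return Named.class.getName();
    }

    static String anonymousClassName() {
        // An anonymous subclass of Object with an empty body.
        Object o = new Object() {
        };
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(namedClassName());
        System.out.println(anonymousClassName());
    }
}
```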

Here is the code generated by the compiler for the public class called Test.

public class Test {
   public static void main(String[] var0) {
      int[] var1 = new int[1];
      (new 1(var1)).start();
      System.out.println("Thread started; carrying on...");
   }
}

You'll notice that the entire inner class definition is missing, and the instantiation of the inner class and the call to the start() method is replaced by:

      int[] var1 = new int[1];
      (new 1(var1)).start();

The class called 1 (not normally a legal class name, of course), is the anonymous inner class, whose implementation in the class file Test$1.class we'll get to in a minute.

Because the decompiler loses local variable names, it takes a bit of detective work to realize that var1 is actually the final array ticks we declared in the main() method:

      int[] ticks = new int[1];

When the anonymous inner class is instantiated, it gets passed the array ticks in its constructor. We did not tell the compiler to do that -- it had to do it, because there's really no other way for the local variable ticks to be made accessible to the anonymous inner class which, as we can see, is not really inner at all.

Now the inner class itself:

final class Test$1 extends Thread {
   final int[] val$ticks;
   Test$1(int[] var1) {
      this.val$ticks = var1;
   }

   public void run() {
      while(true) {
         try {
            Thread.sleep(1000L);
         } catch (InterruptedException var2) {
            ;
         }

         PrintStream var10000 = System.out;
         StringBuilder var10001 = (new StringBuilder()).append("ticks=");
         int var10005 = this.val$ticks[0];
         int var10002 = this.val$ticks[0];
         this.val$ticks[0] = var10005 + 1;
         var10000.println(var10001.append(var10002).toString());
      }
   }
}

Some of this rather tortuous code arises from the way that string concatenation is implemented in Java -- as a bunch of StringBuilder operations. That code isn't really relevant here.

The first thing to note is that the class Test$1 extends Thread -- it has to, because that's part of the definition in the original public outer class:

 new Thread()
    {
    // etc
    }.start();

Now look at the next few lines of this class:

   final int[] val$ticks;
   Test$1(int[] var1) {
      this.val$ticks = var1;
   }

The array val$ticks is simply the counterpart in this inner class of the array ticks that we declared in the main() method of Test. The constructor initializes this array from the value of ticks passed from the enclosing class.

Thereafter, the run() method references the elements of val$ticks, and any modifications made in this method are reflected back in the main() method, since ticks and val$ticks refer to the same array.
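The same arrangement can be written out by hand, which makes the sharing easier to see. In this sketch the class Ticker is a hypothetical stand-in for the generated class, and it implements Runnable rather than extending Thread so that the result is deterministic.

```java
// Hand-written stand-in for the generated class; the name Ticker is hypothetical.
class Ticker implements Runnable {
    private final int[] ticks; // plays the role of val$ticks

    Ticker(int[] ticks) {
        this.ticks = ticks; // copies the reference, not the array itself
    }

    @Override
    public void run() {
        ticks[0]++;
    }
}

class SharedArrayDemo {
    static int tickThreeTimes() {
        final int[] ticks = new int[1];
        Runnable r = new Ticker(ticks);
        r.run();
        r.run();
        r.run();
        // The increments made inside Ticker are visible through the local variable.
        return ticks[0];
    }

    public static void main(String[] args) {
        System.out.println("ticks=" + tickThreeTimes());
    }
}
```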

Had the method main() introduced more local variables that the inner class used, then the compiler would simply have extended the constructor of the anonymous class to include more parameters.

Why the implementation leads to problems

The Java runtime has no built-in notion of inner classes. We have seen how anonymous inner class usage is cleverly transformed into global class operations, with a bunch of synthetic variables and constructors forming the bridge between the inner and outer classes.

But, in the end, we are dealing with separate classes here. They have the same scope and lifetime arrangements as any other Java classes. It's easy to see why the run() method in the anonymous class 'sees' members of the Thread class before members of the Test class -- at runtime the inner class is nothing more nor less than a global class that extends Thread.

The problem in which local variables need to be declared as final is also easily explained, when we know how the implementation works. If I had defined ticks as a plain integer, then it would have been passed to the constructor of the inner class by value, and the inner class would have its own version of the variable, completely independent of the value in the main() method. This has the potential to be deeply confusing and error-prone, and so the compiler rejects any attempt to create such a situation.
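To see why that would be confusing, here is a sketch of what capturing a plain int would amount to if the compiler allowed it. The class CopiedCounter is hypothetical, written by hand to mimic a generated class that received ticks by value.

```java
// Hypothetical illustration of capturing a plain int by value.
class CopiedCounter implements Runnable {
    private int ticks; // a private copy of the caller's variable

    CopiedCounter(int ticks) {
        this.ticks = ticks;
    }

    @Override
    public void run() {
        ticks++; // changes only the copy
    }

    int value() {
        return ticks;
    }
}

class CopyDemo {
    // Returns { value seen by the caller, value seen by the counter }.
    static int[] runDemo() {
        int ticks = 0;
        CopiedCounter c = new CopiedCounter(ticks);
        c.run();
        c.run();
        return new int[] { ticks, c.value() };
    }

    public static void main(String[] args) {
        int[] result = runDemo();
        System.out.println("caller sees " + result[0] + ", counter sees " + result[1]);
    }
}
```

The two values drift apart after the first increment, which is exactly the situation the compiler refuses to set up silently.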

When we refer to an array in the run() method it still has to be declared final; but all this means in Java is that the variable that represents the array cannot be changed to take on the value of a different array. It does not mean that the array contents cannot be changed. Arguably, this is an odd definition of 'final', but it's useful here.

All these limitations could be overcome by changing the way that the JVM deals with inner classes at runtime. So far, no such change has been made, presumably because it would be difficult to keep the JVM backward-compatible with earlier compiled code. Moreover, it's possible that any plans in that direction could be overtaken by the current work on closures.

Where closures fit into all this

It seems very likely that the way in which anonymous inner classes are predominantly used reflects the fact that Java had for a long time no support for closures as first-class language elements. Many of the things that we do, in a rather ungainly way, with inner classes can be done in a more elegant way with closures. In this context, a closure is a code block that can be manipulated as an independent language element. With closure support, our original threading example code could be re-written something like this:

public class Test 
{
public static void main (String[] args)
  {
  int ticks = 0;
  new Thread
    (
      { () -> while (true) { System.out.println (ticks++); } }
    ).start();
  System.out.println ("Thread started; carrying on...");
  }
}

In this example (which may, or may not, ever work in Java), I've passed an anonymous block of code to the constructor of Thread, which stores it, and invokes it when its start() method is called. It's not hugely more elegant than the anonymous inner class example but, to be fair, this isn't really the kind of situation that closures are intended to simplify.
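For comparison, here is a sketch using the lambda syntax that Java 8 eventually adopted. The class name LambdaDemo and the method countTicks() are hypothetical, and I've bounded the loop and joined the thread so that the example terminates. Note that the restriction discussed earlier has not gone away: a lambda can only capture local variables that are final or effectively final, so the one-element array is still needed.

```java
class LambdaDemo {
    static int countTicks(int n) {
        // Still a one-element array: lambdas can only capture (effectively) final locals.
        int[] ticks = new int[1];
        Thread t = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                ticks[0]++;
            }
        });
        t.start();
        try {
            t.join(); // wait for the thread, so that its updates are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ticks[0];
    }

    public static void main(String[] args) {
        System.out.println("ticks=" + countTicks(5));
    }
}
```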

Closing remarks

The limitations of anonymous inner classes can readily be understood, not as the result of decisions in programming language theory, but as expediencies that follow from the implementation strategy. Whether the introduction of closures into the language will eventually change any of these limitations remains to be seen.