A collection of Java curiosities and oddities
There is no denying the success of Java as a programming language and runtime environment. However, Java has a number of decidedly odd features. Some of these are potentially troublesome; many are little more than peculiarities that are unlikely to be noticed in day-to-day work. Investigating these curiosities is interesting, because it can lead to a deeper understanding of how Java works, and in particular of the trade-offs that have had to be made to allow Java to evolve as a language while still remaining backward-compatible with older versions. Some of the observations that follow are, however, frankly inexplicable -- at least by me. If you are able to offer an explanation, please feel free to send it to me at the usual place. In fact, comments of all sorts are welcome, as always. Please note that what follows is in no particular order.

• Java has a NullPointerException, but no pointers.
•
In many senses, a Java array is an object of a class. You can
call the methods of java.lang.Object
on it, determine
its class, pass it by reference to a method that takes
an Object
parameter, serialize it, etc.
However, an array is an odd kind of class. It has a read-only
attribute length
-- despite the fact that ordinary classes
cannot be defined to have read-only attributes. As a class, an array
has no methods specific to array handling. For example, you can't
sort it like this:
int[] array = new int[5];
//...
array.sort(); // No -- will not compile

All the methods for sorting arrays, regardless of type, are in
java.util.Arrays
. An array could override toString()
to produce useful output, but does not. In short, an array is an odd hybrid
of a class and a primitive.
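The sorting workaround described above can be sketched as follows; this is a minimal example using only the standard library (the class and method names are mine, for illustration):

```java
import java.util.Arrays;

public class ArrayOddities {
    // Sorting lives in the utility class java.util.Arrays,
    // not on the array itself.
    static int[] sorted(int[] a) {
        int[] copy = a.clone();   // arrays do support clone()
        Arrays.sort(copy);        // in-place sort of the copy
        return copy;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        System.out.println(Arrays.toString(sorted(a))); // [1, 2, 3]
        // The toString() inherited from Object is unhelpful:
        // prints something like [I@1b6d3586
        System.out.println(a.toString());
    }
}
```

Note that `Arrays.toString()` also has to stand in for the useful `toString()` override that arrays lack.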
•
This construct is legal in Java:
switch (5)
  {
  case 1:
    // Do something
    break;
  }

Using a literal constant as a switch expression is utterly useless, but it is legal, nonetheless. It's legal in C as well (but what would you expect?) Oddly, this allowance of a constant for the switch expression fails if the constant is the value of an
enum
. This fails, for
example:
enum T {test1, test2};
switch (test1) { ... }

It's not a problem that it fails, because it's useless; it's only odd that other uses of a constant as a switch expression don't fail.
•
A method can have the same name as a constructor, so long as it is distinguished by having a declared return type. E.g., in class Test, it is legal to say
public void Test()
  {
  // Do something
  }

This is not reported as a botched attempt at defining a constructor. Of course, this method will not function as a constructor, so it's easy to think you've implemented a constructor when, in fact, you haven't. The solution is simple -- don't use the class name as a method name unless you intend a constructor.
•
In general, the == operator, when applied to two object references, tests whether they refer to the same object. Even if the objects have a natural notion of equality (e.g., the same attributes), the test is still for equality of reference.
(new String("cat") == new String("cat"))
evaluates
to false
just as
(new String("cat") == new String("dog"))
does, because they are
different objects, whether or not their contents are identical.
However, the notion of
equality is weakened in
various places. For example, if you append the string "cat"
to an ArrayList al
, then
al.contains(new String ("cat"))
is true, even though
the instance "cat"
is not in the list. In these
situations, the notion of equality is tacitly converted to one of the
application of
the equals()
method (and hashCode()
--
see below). This is almost certainly what the programmer wants, but
Java does not do what the programmer almost certainly
wants when comparing
two Strings for equality (C++ does the 'right thing' here).
To compare the contents of two strings for equality, you can use
the equals()
method. Bizarrely, however, even this
won't work for StringBuilder
or StringBuffer
objects, since these classes don't implement equals()
. Worse,
they inherit the equals()
method from Object
which,
again, does what is almost certainly the wrong thing. To compare two
StringBuffer
objects for equality of contents, you'd need to
do something like:
s1.toString().equals (s2.toString())

In fact, the == operator does not always test for equality of reference; that is a separate oddity, which is discussed here.
•
On the subject of equality and the collections framework: most methods in the standard Java API -- in particular in the collections framework -- which call
equals()
on any object will also call hashCode()
.
The general assumption is that, for efficiency, it's quicker to
compare two objects' hash codes than their contents. In practice that isn't
always true, but the collections framework assumes that it is.
What this means is that if you implement equals()
in a class,
it's almost certainly necessary to implement hashCode()
as well,
or risk getting unpredictable results. If for some reason you can't
provide a reasonable implementation of hashCode()
, it's
nearly always safer simply to implement the method to return zero than
to ignore it. To ignore it means to inherit the implementation from
Object
, which will probably be unsuitable.
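A minimal sketch of the equals()/hashCode() pairing described above -- the Point class and its fields are hypothetical examples, not something from the Java API:

```java
import java.util.ArrayList;
import java.util.List;

public class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Must be consistent with equals(): objects that compare
    // equal must yield equal hash codes.
    @Override public int hashCode() { return 31 * x + y; }

    public static void main(String[] args) {
        List<Point> list = new ArrayList<Point>();
        list.add(new Point(1, 2));
        // true, even though this exact instance is not in the list
        System.out.println(list.contains(new Point(1, 2)));
    }
}
```

Hash-based containers such as HashSet and HashMap locate the bucket via hashCode() first, which is why omitting it gives unpredictable results even when equals() is correct.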
•
There is never a need to call super()
in a constructor,
although this is often seen
(and recommended). If the constructor does not
call super()
or super(args)
, then the base class
constructor is automatically called.
If you do explicitly call super()
, you are merely writing out
the call that the compiler would have inserted anyway.
•
It is legitimate to define an abstract class that has no abstract methods.
Such a class could be realized in a completely empty subclass. The purpose of
such a construct is unclear. In C++, an abstract class is indicated by the
presence of abstract methods; there is no separate 'abstract' modifier
for the class. Java does use a separate abstract modifier, and it can be
applied to a class that is not abstract in any meaningful sense.
It could be argued that declaring a class abstract is a useful way to
ensure that a class need never be instantiated -- it has only static
methods. A good example of such a class is java.lang.Math
.
However, the Java designers have not followed this approach --
correctly, in my view, because Math
(and similar examples
such as java.lang.System
) are not abstract
classes -- they are not classes at all in any meaningful
sense. Rather they are artefacts of Java's insistence that all code
has to be in some class or other. Math
is more a namespace
than a class. In fact, later versions of Java support
static imports, so we can even hide the fact that math operations are
in a class:
import static java.lang.Math.sin;
//...
double x = sin (1);

The
import static
here serves an almost identical
purpose to using namespace
in C++.
•
Despite Java's good record of platform-neutrality, floating-point operations
are not guaranteed to behave exactly equivalently on all platforms, unless
exact compatibility is demanded by marking code with the strictfp
keyword.
•
The operations ++ and -- are not atomic, even when applied to a volatile
variable. If use_count
is a volatile integer, then
a different thread can interrupt the operation of use_count++
between reading the value of use_count
and writing it
back.
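When an atomic increment really is needed, the java.util.concurrent.atomic classes provide one. A brief sketch (class and method names are mine, for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Counter {
    // A plain volatile int would not make ++ atomic; AtomicInteger does.
    static final AtomicInteger useCount = new AtomicInteger();

    // Runs two threads that each perform 10000 atomic increments.
    static int runTwoThreads() {
        useCount.set(0);
        Runnable r = new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++)
                    useCount.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return useCount.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // reliably 20000
    }
}
```

With a volatile int and the ++ operator instead, the final count can come out below 20000 whenever the two threads interleave between the read and the write.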
•
Java does not have a sizeof()
operator. Although Java primitives
are of documented range, the actual amount of memory used by a primitive
cannot be determined with precision.
You'll often read statements to the effect that
a Java char
is 16 bits, but that's not strictly true. It has
the range of a 16-bit value, but the actual storage is platform-dependent.
Where large amounts of data are involved, this
uncertainty can be significant.
•
That Java does not support unsigned integer data types is well known.
However, Java's char
type is unsigned, in that its minimum
value is zero. However, arithmetic on chars is highly inconsistent -- at
least until we consider in detail the compiler's type widening rules.
For example, this is legal:
char a = 1;
char b = ++a;

while the following, although superficially similar, is illegal:

char a = 1;
char b = a + 1;

That is, you can increment and decrement chars, although you can't add and subtract them.... but... this is legal:
char a = 1;
char b = 2;
int c = a + b;

That's because when you add (or subtract) chars, they are widened to integers, so the result is an integer. But the increment/decrement operations, because they modify the variable itself, do not widen. That explains why the result of
++a
can be assigned to char b
, but
a+1
cannot.
Incidentally, a char variable can be decremented below zero, in which case it
wraps around to 65,535, just as an unsigned 16-bit value would.
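The wrap-around can be demonstrated directly (the class name is mine, for illustration):

```java
public class CharWrap {
    static int decrementBelowZero() {
        char c = 0;
        c--;          // wraps: char behaves like an unsigned 16-bit value
        return c;     // widened to int for the return
    }

    public static void main(String[] args) {
        System.out.println(decrementBelowZero()); // 65535
    }
}
```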
•
A related problem with number widening is this one:
long microseconds_per_day = 24 * 60 * 60 * 1000 * 1000;

Although the calculation result will fit into a
long
, it
won't fit into an int
, and the compiler will do the
calculation as an int
, whereupon it will overflow, and leave
an unexpectedly wrong value in microseconds_per_day
.
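The usual fix is to force the first operand to long, so that the whole calculation is carried out in 64 bits. A small sketch (method names are mine):

```java
public class Overflow {
    // int arithmetic overflows before the widening assignment to long
    static long wrong() { return 24 * 60 * 60 * 1000 * 1000; }

    // the L suffix widens the whole expression to long
    static long right() { return 24L * 60 * 60 * 1000 * 1000; }

    public static void main(String[] args) {
        System.out.println(wrong()); // 500654080 -- wrapped around
        System.out.println(right()); // 86400000000
    }
}
```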
•
Java goes to some lengths to ensure that a local variable cannot be used
before initialization, but it isn't smart enough to spot cases where
a variable is clearly initialized. For example, this code works fine:

int a = 5, b;
if (a > 10) b = 1; else b = 1;
System.out.println ("b= " + b);

But the following will not compile, because
b
'might not
have been initialized'.
int a = 5, b;
if (a > 10) b = 1;
if (a <= 10) b = 1;
System.out.println ("b= " + b);

However, the two code snippets are exactly equivalent. In C++, issues of this sort are generally considered to be warnings rather than errors, and compiler pragmas are typically provided to suppress them on occasions where the developer is smarter than the compiler. In Java, we often have to modify our code to accommodate the compiler's lack of smarts.
•
A related issue to the previous one -- the compiler not being smart enough to work out whether a local variable has been initialized -- is the problem of detecting whether a
final
variable has been
initialized more than once. Consider this example:
class Test
  {
  final int test;

  int getTest() throws Exception
    {
    throw new Exception();
    }

  Test ()
    {
    try
      {
      test = getTest();
      }
    catch (Exception e)
      {
      test = -1;
      }
    }
  }
int test
is a blank final -- a final that can be assigned
to exactly once in the life of the object. This code does not compile
because the compiler thinks there is the possibility of test
being multiply initialized, but close inspection of the code reveals that
this is not the case. In the try
block, test
will only be assigned if no exception is thrown, and in that case
the catch
will be skipped.
As in the previous example, it might sometimes be necessary to rearrange
code to allow for the compiler's lack of insight.
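One rearrangement that satisfies the compiler is to compute the value into an ordinary local variable, and then assign the blank final exactly once. A sketch, assuming the same structure as the article's example (the class name Test2 is mine, to avoid a clash):

```java
public class Test2 {
    final int test;

    int getTest() throws Exception { throw new Exception(); }

    Test2() {
        int t;                       // ordinary local -- may be assigned on
        try { t = getTest(); }       // either path without complaint
        catch (Exception e) { t = -1; }
        test = t;                    // single, unconditional assignment
    }

    public static void main(String[] args) {
        System.out.println(new Test2().test); // -1
    }
}
```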
•
Despite Java's vigour in preventing local variables being used
uninitialized, it is perfectly legal to use instance variables
uninitialized (in which case, default values are used).
•
There is no straightforward way for a Java application to determine the
filesystem path of its executable. That is, there is no equivalent
of argv[0]
in the arguments passed to main()
in a C program. Since this information is often necessary, Java
developers have to resort to platform-specific tricks, like reading
the value of /proc/self/exe
on a Linux system.
•
Java provides signed right-shift (>>) and zero-fill right-shift
(>>>) operators. However, as Java has no unsigned integer types,
a zero-fill right shift (zeroing the sign bit) makes no sense
whatsoever. If the integer is positive, then the signed and zero-fill
right shifts are equivalent; if the integer is negative, then the
zero-fill right shift will not halve the value -- it replaces the sign
bit with zero, producing a large positive number. That is, the negative
number will become positive.
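The effect on a negative number can be seen directly (helper method names are mine):

```java
public class Shifts {
    static int signed(int n)   { return n >> 1; }   // sign bit preserved
    static int zeroFill(int n) { return n >>> 1; }  // sign bit replaced by 0

    public static void main(String[] args) {
        System.out.println(signed(-8));   // -4: halved, as expected
        System.out.println(zeroFill(-8)); // 2147483644: large and positive
    }
}
```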
•
Another oddity of the bit shift operators is that the shift distance
is reduced modulo the variable width. That is, if you try to shift
an integer 34 places, it will actually shift it 2 places (34 % 32).
In ANSI C, shifting by at least the width of the type is formally
undefined behaviour; in practice, such a shift to the left typically
results in zero, as all the data is shifted out and replaced with zeros,
while to the right the result will depend on the sign bit and the
signedness of the variable. Typically
the compiler will produce a warning in such situations, if it can.
There's no right or wrong approach to handling
a situation like this; but arguably
Java's approach is the least helpful -- it does something that the
developer almost certainly did not intend or expect.
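The modulo-reduction of the shift distance is easy to demonstrate (method names are mine):

```java
public class ShiftDistance {
    // For int, the shift distance is taken modulo 32; 34 % 32 == 2.
    static int intShift34()   { return 1 << 34; }
    // For long, it is taken modulo 64, so 34 really means 34.
    static long longShift34() { return 1L << 34; }

    public static void main(String[] args) {
        System.out.println(intShift34());  // 4
        System.out.println(longShift34()); // 17179869184
    }
}
```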
•
In the collections framework, the List
interface defines a collection
of indexed, non-unique elements, while Set
defines a collection of unindexed, unique elements (sorted, in the
case of SortedSet implementations such as TreeSet); but there
is no built-in collection of indexed, unique elements, or of sorted,
non-unique elements (indexed or otherwise). For some reason, only
unique collections (sets) can be defined as sorted, and only non-unique
collections (lists) can be defined as indexed. There are many other possible
arrangements of data that have no representation in the collections framework.
It's interesting that there is no concrete class implementing only the
Collection
interface itself. Of course, the developer can just use
ArrayList
and ignore the ordering capabilities.
•
If class Child
extends class Parent
, and Child
and Parent
are in
different packages, then a method in Parent
that is not specifically
tagged as public
or protected
is not overridden
by a method with the same name in Child
. However,
if the classes are in the same package, the method in Parent
is
overridden. In fact, this is all perfectly logical when you consider the
rules about method access: a method without modifiers is effectively private when considered
by classes in a different package, but accessible in the same package; and
you can't override a private method (the subclass might not even be aware that
the method exists). The problem, of course, is that the rules on overriding
and polymorphism become completely different when the classes are in different
packages, compared to when they are in the same package. The solution is to
tag a method as protected
when it is intended to be overridden,
and/or to make careful use of the @Override annotation whenever you expect
a method call to override a base class method.
This issue is discussed in more detail in this article.
•
Java allows a class to have members with the same name, so long as they
are unambiguous. However, there are cases where names can be ambiguous,
and the compiler does not complain. Consider the following example:
class Ambiguous
  {
  static class AmbiguousName
    {
    static String text = "Inner class";
    }
  static TopLevel AmbiguousName = new TopLevel();
  }

class TopLevel
  {
  String text = "Top-level class";
  }

public class Test
  {
  public static void main (String[] args)
    {
    System.out.println (Ambiguous.AmbiguousName.text);
    }
  }

In the class Ambiguous
, the name AmbiguousName
identifies both a nested class and a variable. The variable references a class
which has an instance variable text
, and the nested
class has a static variable of the same name. So to what does
Ambiguous.AmbiguousName.text
refer?
The answer is that the instance variable takes precedence over the inner
class with the same name. No compiler error or warning is generated.
This is a somewhat contrived example -- and in most cases the use of
consistent naming conventions ought to prevent this kind of situation arising.
•
You can use the unicode escape notation (\uXXXX) not only as part
of a string or character literal, and not only in identifiers, but
even as part of a Java keyword. For example, this is legal:

cha\u0072 c = '\u0072';

\u0072 is the unicode code point for the letter 'r'. Of course, such notation for Java keywords is unreadable. It's not particularly helpful for identifiers, either, given that the Java compiler will happily read unicode source files, so the actual characters can be used. In fact, it's possible to code an entire Java application as a string of \uXXXX values -- an extreme example of source obfuscation. More practically, the way in which the compiler treats unicode escapes can lead to odd results. For example, the compiler will reject this comment:
/** A test for this class may be found at c:\unit_tests\something.java */
\unit_tests
starts like a unicode escape, but is not one, and
the compiler treats it as an error. The compiler will also reject this:
/** // Note: \u000A is the unicode value for a line feed */

This fails because the unicode escape \u000A is parsed before the comment, so the line is split into two, because \u000A is, as the comment itself says, a line feed. The second part of the line does not begin with //, and so is no longer a comment.
•
Despite its name,
Math.abs()
is capable of returning
a negative value when applied to integers. Specifically, it returns the same
value as its argument
when the argument is the largest negative number that the variable can
hold. This peculiar result follows from the way two's-complement
arithmetic works. Since Integer.MIN_VALUE -- the largest negative integer --
cannot be negated (zero occupies one of the bit patterns on the positive
side, so there is one fewer positive value available in the range
than there are negative values), the argument is returned unchanged.
Conceivably abs()
should throw an exception in this case; but
when we consider all the other places where integer arithmetic can fail
without an exception, it hardly seems worth it for this unusual edge case.
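The edge case is easy to verify:

```java
public class AbsEdge {
    public static void main(String[] args) {
        // Still negative: -2147483648 has no positive counterpart in int
        System.out.println(Math.abs(Integer.MIN_VALUE));
        // One step away from the edge, abs() behaves as expected
        System.out.println(Math.abs(Integer.MIN_VALUE + 1)); // 2147483647
    }
}
```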
•
char variables are treated as integers when added to other number
types, and characters when added to Strings. When two chars are
added, they are both treated as numbers, not as Strings. These rules lead
to rather odd results:

"Hell" + 'o' = "Hello" (String)
104 + 'o' = 215 (integer)
'h' + 'o' = 215 (integer)

This oddity follows from the way that char can be used as both a number and a character in Java. Since Java already has a 16-bit signed integer type, and does not really need a 16-bit unsigned integer type (since Java eschews unsigned integers in general), there really is little need to allow char variables to be treated as numbers at all. In C++, the Standard Template Library defines the addition of strings to characters, characters to integers, and all manner of other things, and that's all well and good. The problem with Java is that the oddities are not part of a library implementation, but embedded into the very syntax of the programming language. The ability of Java to treat
char
s as numbers leads to
common programming errors like this:
StringBuffer sb = new StringBuffer ('?');

What the programmer probably intended was a
StringBuffer
initialized to the string "?"; the actual result is an empty string
whose initial capacity is the number of characters indicated by the
unicode value of the ? character.
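Both oddities can be checked directly (helper method names are mine):

```java
public class CharAdd {
    static String hello() { return "Hell" + 'o'; } // String concatenation
    static int sum()      { return 'h' + 'o'; }    // numeric addition: 104 + 111

    // '?' (code point 63) widens to int, selecting the capacity constructor
    static StringBuffer oops() { return new StringBuffer('?'); }

    public static void main(String[] args) {
        System.out.println(hello());                 // Hello
        System.out.println(sum());                   // 215
        StringBuffer sb = oops();
        System.out.println(sb.length() + " / " + sb.capacity()); // 0 / 63
    }
}
```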
•
Number classes define constructors on Strings, but not equality
with Strings. Consider this example:
Integer x = new Integer("2");
if (x.equals(2)) System.out.println ("equals1");
if (x.equals("2")) System.out.println ("equals2");

You might think that if you can initialize an
Integer
from
a String, you could compare an Integer
with a String
.
Sadly, no. x.equals(2)
is true, x.equals("2")
is
false, even though x
was initialized from "2".
•
An array of chars is not a kind of String (unlike in C)... except when it
is. Consider this example:
System.out.println (new int[]{1,2});
System.out.println (new char[]{'a','b'});

The
println()
call will happily concatenate the elements of
the character array into a single text string. It won't do the same
with an integer array -- it doesn't even display the contents of the
array, just an object handle.
To add to the confusion, the following example also prints the character
array as an object handle:
System.out.println (new int[]{1,2});
System.out.println ("Array is: " + new char[]{'a','b'});

This is because, unlike the
println()
method, the + operator
does not format the character array as a string.
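One workaround is to convert the array explicitly with String.valueOf(char[]), which builds a String from the characters (the class and method names here are mine):

```java
public class CharArrayPrint {
    static String viaValueOf() {
        char[] chars = {'a', 'b'};
        // Explicit conversion, so the + operator sees a String
        return "Array is: " + String.valueOf(chars);
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b'};
        System.out.println(chars);        // ab -- the println(char[]) overload
        System.out.println(viaValueOf()); // Array is: ab
        // Without the conversion: Array is: [C@... (object handle)
        System.out.println("Array is: " + chars);
    }
}
```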
•
In Java, an anonymous code block (that is, statements enclosed in
braces and not part of a specific method) in a class is considered to be
part of the instance initializer, and is copied into the start of
each constructor in the class. This is a rather confusing and ugly way
to relieve the developer of the need to implement an additional method
containing
code shared between multiple constructors. Conceivably this construct
exists because there needs to be some way to initialize
anonymous inner classes (which can't have programmer-defined constructors).
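The copying behaviour can be observed by logging from the initializer block; every constructor runs it first. A sketch (all names are mine, for illustration):

```java
public class InitBlock {
    static final StringBuilder log = new StringBuilder();

    // Instance initializer: copied into the start of every constructor.
    {
        log.append("init;");
    }

    InitBlock()      { log.append("noarg;"); }
    InitBlock(int n) { log.append("int;"); }

    // Constructs one instance with each constructor and returns the trace.
    static String trace() {
        log.setLength(0);
        new InitBlock();
        new InitBlock(1);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace()); // init;noarg;init;int;
    }
}
```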
•
Java 1.7 allows the digits in a number to be separated by underscore characters
to aid clarity of expression. So we can write one thousand as
"1_000".
The problem is that _ is a valid character in an identifier, so there are
complicated rules about how it can be used in a number. It is not
difficult to see why this statement does not define a number literal --
_1000 is a valid identifier name.

double x = _1000;

It's less easy to see why this number definition is invalid:
double x = 1000_;

Arguably, it does not reflect an everyday use of the digit separator -- we would not write "1000," in the UK, for example. On the other hand, we would not write "10,00" either, but

double x = 10_00;

is legal Java.
•
It is impossible to define a multi-line string literal in Java, except by using the addition operator, with the run-time overhead this entails.
•
In general, the order of definition of members of a class is not important. If method
a()
calls method b()
, which
refers to variable c
, there is no requirement that the
members are defined in the top-to-bottom order c
, b()
,
a()
. However, the declaration of static initializers is
inconsistent in this regard: variables referred to in a
static {...}
block must appear before that block in the source
code. Similarly, a static initializer that instantiates its own
class must appear after static initializers it depends on. Both these
issues arise from the fact that static initializers are executed
strictly from top to bottom; the compiler does not try to determine
what dependencies there might be between them.
•
The methods that mutate a StringBuffer
or
StringBuilder
are implemented somewhat inconsistently. If
sb
is an instance of StringBuilder
,
the call sb.subSequence(start, end)
returns a new
object containing a substring of sb
. It does not
modify sb
itself. On the other hand, the methods
insert()
and delete()
do modify the
string itself, even though they return a value. The value returned is,
in fact, a reference to the original string. The way these methods are
declared gives the impression that they will all return a new object, leaving
the original unchanged, but this is not the case.
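The inconsistency can be demonstrated with a few lines (helper method names are mine):

```java
public class Mutators {
    // subSequence() returns a new object; the builder is unchanged.
    static boolean subSequenceLeavesOriginal() {
        StringBuilder sb = new StringBuilder("abcdef");
        CharSequence sub = sb.subSequence(0, 3); // "abc"
        return sub.toString().equals("abc") && sb.toString().equals("abcdef");
    }

    // delete() mutates the builder -- and returns a reference to it.
    static boolean deleteMutatesAndReturnsSame() {
        StringBuilder sb = new StringBuilder("abcdef");
        StringBuilder ret = sb.delete(0, 3);
        return ret == sb && sb.toString().equals("def");
    }

    public static void main(String[] args) {
        System.out.println(subSequenceLeavesOriginal());   // true
        System.out.println(deleteMutatesAndReturnsSame()); // true
    }
}
```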
•
The compound arithmetic operators +=
, etc., implicitly cast
their results to the type of the left-hand-side of the assignment,
even if this would cause an overflow.
So the following code compiles perfectly well:
long y = 1000000000000L;
int x = 0;
x += y;
System.out.println ("x=" + x);

When run, the value of
x
turns out to be -727379968
.
It is therefore never really safe to use +=
on integers of
mixed range. The following code (properly) fails to compile:
long y = 1000000000000L;
int x = 0;
x = x + y;

•
The process of auto-boxing and auto-unboxing, introduced in later Java versions, can make it easier to read and write code that uses classes which wrap primitives (
Integer
, etc). Auto-boxing is the
implicit conversion between
primitive number types
(int
, float
) and class number types
(Integer
, Float
).
To some extent this
implicit conversion avoids one of the features that caused most
complaint among Java programmers -- that there were essentially two
separate programming conventions for primitive number types and
class number types. However, auto-boxing
can lead to surprising, even shocking, results.
Integer x = new Integer (1);
Integer y = new Integer (1);
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

The output is:

x <= y: true
x == y: false

The problem is that the arithmetic comparison (<=) causes auto-unboxing of the
Integer
s into primitives, while the equality operator
does not. So the equality test is for reference equivalence, and
x
and y
are not the same instance.
Actually, it isn't quite true to say that the equality operator doesn't
cause auto-unboxing. It does if the comparison is between a number
class and a numeric literal or a primitive number variable. Consider this
code:

Integer a = new Integer (0);
Integer b = new Integer (0);
int c = 0;
System.out.println ("a == b: " + (a == b));
System.out.println ("a == 0: " + (a == 0));
System.out.println ("a == c: " + (a == c));

The output is:

a == b: false
a == 0: true
a == c: true

So if we compare two
Integer
objects containing zero
for equality, we find that they are not equal (because we're testing the
references, not the contents). However, if we test an Integer containing
zero against a literal zero or an int
with value zero,
then they are equal, because of the auto-unboxing.
If that wasn't tricksy enough, a truly bizarre result comes from this code:
Integer x = 1;
Integer y = 1;
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

In this case,
x == y
is true, but for entirely the
wrong reasons. It's true because after the assignments, x
and
y
refer to the same instance. This behaviour is
consistent with the way that String
objects are assigned from
string literals: if s1 = "hello"
and s2 = "hello"
then s1 == s2
, but because s1
and s2
are the same object, not because their contents are the same.
And as if that wasn't enough, consider this code, which is
the same as the previous apart from the actual numbers:

Integer x = 11112332;
Integer y = 11112332;
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

Now, amazingly,
(x == y)
is false. It appears that
Java's sharing of boxed objects only applies to specific values. Specifically,
it applies (by default) to values between -128 and 127. Anything outside
the range of a signed byte gets a separate storage allocation.
Since String
doesn't support arithmetic comparison operators
like less-than, the inconsistencies in the way instances are created by
assignment are not all that apparent. But with number classes, we have
the truly frightening result that arithmetic comparisons depend
on how the instance was initialized, and the actual value assigned,
in addition to the
inconsistent behaviour between == and all the other comparison operators.
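The cache boundary can be probed directly (helper method names are mine; the -128..127 range is the default behaviour):

```java
public class BoxCache {
    // Boxing via assignment uses Integer.valueOf(), which caches
    // values in the range -128..127 by default.
    static boolean sameInstanceAt127() {
        Integer a = 127, b = 127;
        return a == b;              // cached: same instance
    }

    static boolean sameInstanceAt128() {
        Integer c = 128, d = 128;
        return c == d;              // outside the cache: separate objects
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAt127()); // true
        System.out.println(sameInstanceAt128()); // false
    }
}
```

The safe comparison, of course, remains equals(), which gives true in both cases.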
•
Unlike C++, Java lacks a way to specify that a class method does
not modify the instance. In C++ we can say, for example:
int getCount() const { return count; }

and it is clear to the user of the class that
obj.getCount()
cannot modify obj
. As well as improving clarity, it
prevents the developer carelessly introducing mutating code into a method
which is specified to be non-mutating. The lack of such a construct in
Java is a significant omission, which has to be overcome in awkward ways.
•
To divide a floating point number by zero is not an error in Java.
The floating point representation used is able to represent the quantity
that arises from dividing by zero; you can even do arithmetic (to some extent)
on these quantities, which are misleadingly (from a mathematical standpoint)
referred to as 'infinities'. Similarly, no exception is thrown when
attempting to take the square root of a negative number, even though
Java has no built-in support for complex numbers. Instead, the result
is the value 'NaN' (not-a-number). NaN is an odd kind of thing; it is
the only numeric value in Java that is not equal to itself.
That is, (NaN == NaN)
evaluates to false.
Whatever the mathematical merit of this kind of number handling, it
is inconsistent with Java's treatment of integer math.
•
A switch
expression can be used with an enum
variable and, in fact, this is one of the most powerful and natural uses
of a switch
. However, the Java compiler won't warn you if
you base a switch
on an enum
and then fail to
handle one of the enum's values in a case
(which is almost always
a programming error). Most C++ compilers
can do this so, presumably, it isn't rocket surgery.
•
Java famously has no 'goto' construct or direct equivalent. Nevertheless
, goto
is a reserved word in the Java language. You can't,
for example, name a variable 'goto'. Equally oddly, you can apply
a label (e.g., something:
) to any statement, or
even a comment, although only loops and switch cases can be usefully
labelled.
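The one genuinely useful application of labels -- breaking out of nested loops -- can be sketched as follows (the class, method, and label names are mine):

```java
public class Labels {
    // Returns the first outer index i for which some j satisfies i + j == 3.
    static int firstIndexOfPair() {
        int found = -1;
        outer:                       // label on the outer loop
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                if (i + j == 3) {
                    found = i;
                    break outer;     // exits both loops at once --
                }                    // the closest Java gets to goto
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(firstIndexOfPair()); // 1
    }
}
```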
•
Similarly, Java has reserved the keyword const
, but
it has no function. Instead, Java uses the keyword final
for three different roles, one of which broadly aligns with const
variables in C/C++. The other two uses of final
are on a
class (cannot be extended) and a method (cannot be overridden). It is
not clear that these different semantic modifiers really benefit from
being given the same keyword.
•
Java provides no way to specify that a class member is to be
accessible to the class and its subclasses only. You can declare
members as private
, which will hide them from the rest of the
package; but it will also hide them from subclasses, whether in the same
package or not. protected
will make the member available to
subclasses, but it will also make it available to other classes in the package,
whether they are subclasses or not.
The only complete solution to this problem is to declare each class in its own
package. Since that would be very ugly, the next best thing is to structure
packages so that each consists of relatively few classes, written
in close collaboration.
Java did support, for a short time, an access modifier 'private protected',
which was pretty much the same as protected
in C++.
However, it was removed, for reasons I've never fully understood. I suspect
that the Java designers thought that the access control mechanism was
already complicated enough, and that 'private protected' did not fit
neatly into the regular progression of increasing access -- private, default,
protected, public.
•
Java supports abstract methods and static methods, but not
abstract static methods. Similarly, you can't declare an interface
method as static.
It's often claimed that 'abstract' and 'static' are logically incompatible.
You'll sometimes see this claim backed up by the (philosophically
dubious) statement
that abstract methods cannot be overridden. Abstract methods are
inherited by subclasses, and subclasses can define their own methods
with the same name and arguments. That the Java language specification does
not consider this a complete override is a matter of how the specification uses
the word 'overrides', not a logical limitation of the concept of overriding.
In Java terminology, a method only 'overrides' another if the choice of
which method to call can be made at run-time, based on the run-time type.
The overriding of static methods is not a true override (in these terms)
because the call decision is based on compile-time information only.
Whatever the rights or wrongs of the way Java uses the term 'overrides',
that 'abstract static' methods
are not logically incoherent is indicated by the fact that some languages,
e.g., Smalltalk, do provide such a construct.
•
It is syntactically legal in Java to cast an object to an interface
even if the object's class does not implement that interface.
The same is not true for casting an object to another class type. Of course,
the cast will fail at run-time; but it's odd that it is not rejected by
the compiler even in cases where it is perfectly clear that the class in
question does not implement the interface.
•
Java interfaces can optionally be declared 'abstract', as can their
methods. The 'abstract' modifier in such cases has no meaning and is
ignored. Similarly, interface methods can be declared 'public', but
such a declaration has no effect.
•
Java allows an abstract class to be defined with a public constructor.
Because the class can not (by definition) be instantiated, the use of a
public constructor is meaningless. In fact, any access specifier except
'protected' in the C++ sense is meaningless -- the Java version
of protected makes no sense either, because this would normally extend
access not only to subclasses, but to other classes in the same package.
But these other classes won't be able to instantiate the abstract
class either, because it is abstract.
•
All the classes in the collections framework are parameterised, so
for example, Collection.add() is defined as
boolean add(E e)What this means is that if I create an
ArrayList
like
this:
ArrayList<Integer> list = new ArrayList<Integer>();Then I'll get an error from the compiler if I try to add something to the list that cannot safely be converted to an
Integer
.
This, for example, fails:
list.add (new String ("Hello"));The
remove()
method, however, is not parameterised; it
takes an Object
argument. So this is legal:
String s = "hello"; list.remove (s);Since one of the main reasons for using parameterised classes is to increase compile-time type security, declaring
remove
this way appears to be a missed opportunity.
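The type hole can be demonstrated directly (the helper method name is invented for illustration): removing an object of an unrelated type compiles cleanly and silently does nothing.

```java
import java.util.ArrayList;

// Sketch of the type hole in Collection.remove(Object): removing an
// object of an unrelated type compiles and silently does nothing.
public class RemoveDemo {
    public static int sizeAfterBogusRemove() {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(Integer.valueOf(1));
        // add() is checked: list.add("hello") would not compile.
        // remove() takes a plain Object, so this compiles, finds no
        // match, and returns false.
        list.remove("hello");
        return list.size();   // still 1; the call was a silent no-op
    }
    public static void main(String[] args) {
        System.out.println(sizeAfterBogusRemove());
    }
}
```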
•
The toArray()
method on Collection
, like
the remove()
method, also has type-security problems.
There is a variant of toArray
that returns an
Object[]
-- the problem there should be clear enough.
There is also a toArray()
method that takes an array
and returns an array of the same type:
<T>T[] toArray (T[] a)But all that does is apply a downcast to each element in the collection; at compile time there is a check that the left-hand-side of the assignment from
toArray()
matches the type of the
argument a
, but there is no compile-time check that
T
is the proper type for the contents of the collection.
So this code compiles perfectly well:
ArrayList<Integer> list = new ArrayList<Integer>(); list.add (new Integer(1)); String[] s = list.toArray(new String[0]);It fails at runtime with an
ArrayStoreException
.
What is needed -- and is not provided -- is the method
interface Collection<E> { E[] toArrayOfCollectionType(); }In fact, this method cannot easily be provided because of the internal implementation of generic classes -- see here for a further discussion of this point. • That Java lacks an equivalent of C's
typedef
construct is
well known and, in most cases, it is not missed.
However, consider the following situation. We need to define a variable
to store an integer quantity, and its range is not well known at the outset.
It's easy enough to change an instance variable from
short total_lines;to
int total_lines;What's less straightforward is fixing all the casts that Java requires us to use when doing arithmetic with anything other than
double
or int
.
For example:
short total_pages = 10_000; short total_sheets = total_pages / 2; // Error short total_sheets = (short)(total_pages / 2); // That's betterThe second line won't compile because the result of
total_pages / 2
is an int
, not a short
, even though dividing
a short
by two (or indeed by any integer other than -1)
must produce a result that fits into a short
.
The problem arises when lines 1 and 3 in our previous example are,
in reality, a thousand lines apart, or in separate source files.
Suppose we decide at some point in the future that total_pages
(whatever that represents) needs to be an int
, and not
a short
. How do we find all the other lines that might be
affected by such a change? Worse, what happens if we don't find them?
The problem is that it's not an error to write code like this:
int total_pages = 10_000; // 1000 lines... short total_sheets = (short)(total_pages / 2); // Compiles without complaintHere, the
int
total_pages
very possibly won't
fit into
a short
any more, even when divided by two;
and the cast -- which was previously
essential to make the code compile -- now stops the compiler warning us
that we've made a mistake.
This is, to be fair, a general problem of change control: it's hard to
keep track of all the subtle dependencies in a complex program.
In C/C++, the conventional
way to deal with problems of this type is to typedef
an
application-specific data type, and define such variables as are necessary,
and their associated casts, to be of this custom type. Thereafter,
to change the
variable range, all we have to change is one typedef
. This
approach is far from perfect, as well -- no approach is ideal. However,
Java lacks any efficient strategy for dealing with number type changes, even
an imperfect one.
There is, of course, an inefficient strategy, which is
to define your number variables in terms of custom wrapper classes,
not primitives.
We could create a class called MyInt
to hold an int
,
then define a subclass that extends MyInt
for a specific
kind of number that the application uses.
If an int
later proves to be inappropriate, we
could create a new wrapper class -- let's say MyLong
--
and change the specific number class to derive from that instead.
This task would be less inefficient -- at development time anyway --
if we could use the built-in classes Integer
, etc.
Unfortunately, these are all defined as final
, so we
can't. In any case, this kind of approach is likely to be inefficient
at run-time, if the application does a lot of arithmetic.
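A minimal sketch of this wrapper-class strategy follows. The names MyInt and PageCount are invented for illustration; the point is that a later change of the underlying primitive touches only the wrapper class.

```java
// Sketch of the wrapper-class strategy: an application-specific number
// type built on a custom wrapper, so the underlying primitive can be
// changed in one place.
public class WrapperDemo {
    static class MyInt {
        private final int value;   // change this to long here, and only here
        MyInt(int value) { this.value = value; }
        int get() { return value; }
        MyInt div(int divisor) { return new MyInt(value / divisor); }
    }
    // An application-specific number type, as the text suggests
    static class PageCount extends MyInt {
        PageCount(int value) { super(value); }
    }
    public static int sheetsFor(int pages) {
        PageCount totalPages = new PageCount(pages);
        return totalPages.div(2).get();   // no casts anywhere
    }
    public static void main(String[] args) {
        System.out.println(sheetsFor(10_000));
    }
}
```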
•
Java has no destructors. In most object-oriented programming languages,
a class can have a destructor which is the logical counterpart of the
constructor. Typically the destructor is called when the object goes
out of scope, or is specifically deleted. Java has no object delete
operation, and objects that go out of scope are not necessarily
considered inaccessible.
The lack of a destructor makes it impossible to develop Java applications
according to the 'resource acquisition is initialisation' (RAII) model, in
which resources are allocated in the constructor and freed in the destructor.
However, the new 'try-with-resources' construct provides a feature
which is almost a destructor.
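The destructor-like behaviour can be sketched like this (the Resource class is invented for illustration): close() runs when the block exits, whether normally or via an exception.

```java
// Sketch: try-with-resources gives destructor-like cleanup for anything
// implementing AutoCloseable.
public class ResourceDemo {
    public static String use() {
        StringBuilder log = new StringBuilder();
        class Resource implements AutoCloseable {
            Resource() { log.append("open;"); }           // constructor acquires
            public void close() { log.append("close;"); } // the near-destructor
        }
        try (Resource r = new Resource()) {
            log.append("use;");
        } // close() is invoked here automatically
        return log.toString();
    }
    public static void main(String[] args) {
        System.out.println(use()); // open;use;close;
    }
}
```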
•
The method Class.newInstance()
can potentially throw any
exception, and attempting to handle exceptions elegantly at compile time is
extremely difficult. You might know, for example, that the class you
want to instantiate has a constructor that throws SQLException
,
but you won't be able to declare a catch
block for this
exception if you instantiate the class using Class.newInstance
.
This deficiency can be used to create a utility class that can throw
any kind of exception from any point in the code, with no compile-time
checking at all (should you ever need to do such a thing).
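A sketch of the mechanism (class and method names invented for illustration): Class.newInstance() propagates any exception thrown by the constructor, including checked ones, without the caller having to declare or catch them. (newInstance() was deprecated in Java 9 for exactly this reason; Constructor.newInstance() wraps the exception in InvocationTargetException instead.)

```java
import java.io.IOException;

// Class.newInstance() propagates the constructor's checked exception
// without any compile-time checking in the caller.
public class SneakyDemo {
    public static class Thrower {
        public Thrower() throws IOException {
            throw new IOException("smuggled out");
        }
    }
    public static String provoke() {
        try {
            Thrower.class.newInstance();
            return "nothing thrown";
        } catch (ReflectiveOperationException e) {
            return "declared exception";   // the only exceptions newInstance declares
        } catch (Exception e) {
            // We cannot write 'catch (IOException e)' here: the compiler
            // rejects a catch block for a checked exception that is not
            // declared to be thrown. Only a supertype catch will do.
            return e.getClass().getSimpleName();
        }
    }
    public static void main(String[] args) {
        System.out.println(provoke());
    }
}
```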
•
Java does not support operator overloading... except where it does.
The + operator is overloaded for binary operations between a
String
and most other
built-in types, as is the += operator. The = operator is overloaded
for String
and number classes (it does a value
assignment, not the usual reference assignment). All number
comparisons are overloaded for the number classes, except ==.
The not (!) operator is overloaded for the Boolean
class.
•
Constant expressions are evaluated at compile time,
even when they refer to different classes. This is in contrast
to almost all other Java operations between classes, which are
dynamically linked and evaluated at runtime. Consider this
definition:
public class Lib { public static final int answer = 42; }When another class refers to
Lib.answer
, the compiler
silently replaces the expression with '42'. This is fine, until we
modify the value of answer
. Even if we recompile the
Lib
class, users of that class continue to use the old value until they,
too, are recompiled.
This problem usually bites when upgrading or patching libraries. You
might think that, Java being what it is, you could just change the libraries,
and the rest of the code could remain unchanged. However,
the answer
example shows that this is not so, in general.
This finding does not usually surprise C/C++ developers, who are used to
working in an environment where modules are, on the whole, statically
linked. In Java, however, we've got used to almost everything
being dynamically linked; the fact that references to constants are
not dynamically linked can come as a bit of a surprise.
•
Contrary to widespread belief, the Java compiler does
support conditional compilation. In C/C++, it's common to see
constructs like this:
#ifdef DEBUG //... lots of debug code #endif //DEBUGThe value of DEBUG is optionally set at compilation time and, if it is not defined, the debug code is not only never executed, it is never even included in the compiled output. The purpose here is to reduce both the overhead of checking the value of DEBUG at runtime, and the size of the compiled program. Java has no preprocessor that can exclude code at compile time, but it does have a rudimentary conditional compilation mechanism all the same.
static final boolean debug = false; if (debug) { // Lots of debug code }As in the C example above, the code in the
if() {...}
will not only never be executed, it will be eliminated from the
compiled output. Setting debug=true
will include the
code, and execute it at runtime.
That this technique is not well known is evidenced by the number of
published mechanisms there are for simulating conditional compilation
in Java -- some quite complicated. I confess that I only understood
it myself when I was experimenting to find out why the compiler did
not complain about unreachable statements in the block
if(false){ ... }
. It's only the if
block
that is optimized in this way; the following code won't even compile:
static final boolean debug = false; while (debug) { // Compiler complains about unreachable code here }
if (false) {...}
is one of the few constructs where
code is unreachable and the compiler does not complain; another is
in catch
blocks that can never be entered (since
Java 7; see below).
•
To support (presumably) the new exception re-throwing rules in Java 7,
if an Exception
or a subclass of RuntimeException
is declared to be caught, but then re-thrown intact, the method
from which it is re-thrown does not have to declare it. Consider
the code below, in which an exception (of class Exception
)
is explicitly thrown from the method fail()
. The
method does not have to expose the exception in its own signature,
even though it is ostensibly of checked type.
public class Test { public void fail() { try { int i = 1/0; } catch (Exception e) { System.out.println ("Caught: " + e); throw e; } } public static void main (String[] args) { new Test().fail(); } }Similarly, the
main()
method does not have to handle
the exception, even though it is -- apparently -- of a checked
class.
In Java 1.6 it would be necessary to declare the method
fail()
with throws Exception
.
So what's going on here? Since it's now possible to re-throw an
exception which is ostensibly a superclass of the exception which the
method is declared to throw, the Java compiler has to be much more
thorough about working out which exceptions can really
be thrown in a code block, rather than relying on what the developer
says. In the example above, the compiler knows that an integer
division by zero is possible, but that is an unchecked exception. So
although the developer has said throw e
, where
e
is of (checked) type Exception
, the
compiler knows that no checked exception can be thrown from
this method, despite what the developer has written. And since the
exception that can be thrown is unchecked, the compiler quietly ignores
it.
Moreover, if the try
block contains no code that can
raise any exception at all, then the catch
block is
never even compiled. If there is code in it, then the compiler does
not warn about unreachable statements -- this is another of the very
rare places where you can write an unreachable statement without the
compiler complaining.
•
The return
statement in Java does not necessarily cause an
immediate exit from a method -- it is overridden by a finally
.
For example, this method returns 1, not 0:
int go() { try { return 0; } finally { return 1; } }The complication is clear enough when the
try
and the
finally
are only a few lines apart. In practice, however,
it could be quite difficult to understand why the return 0
was never apparently executed. It can be argued that the use of
finally
is intrinsically likely to lead to a flow of control
that is hard to follow. That finally
exists at all is really
a consequence of the fact that Java does not have destructors (see here).
Lacking destructors, we need some way to tidy up resource allocation
that happened in the try { ... }
, and which is independent
of whether an exception is thrown or not. With luck, the new
'try-with-resources' construct in Java 7 will make the use of
finally
less necessary.
•
At last, Java has support for binary literals. It's odd, however,
that you can't initialize a short
or a byte
from
a negative binary literal, while you can from a positive one.
It's even odder that the same restriction does not apply to an
int
.
short short1 = -32768; // Yes short short2 = 0b1000_0000_0000_0000; // No byte byte1 = -128; // Yes byte byte2 = 0b1000_0000; //NoThe lines marked 'no' fail in the compiler for a possible loss of precision. The lines marked 'yes' pass, even though they represent exactly the same number. • Java allows arrays to be cast to different types, in some circumstances. I'm talking about the array itself, here -- not the individual elements of the array (which can, of course, be cast according to their own rules). However, there are some oddities in which kinds of cast are allowed and which are not. An array of objects of a particular class can be cast to an array of objects of a superclass. This is always legal, both at compile time and at runtime. For example, if
Vehicle
is the superclass of
Car
this succeeds:
Car[] c = new Car[1]; Vehicle[] v = (Vehicle[]) c;If we immediately follow this with the following statement:
v[0] = new Vehicle(0); // FailsIt fails at runtime, as it should: despite the cast,
v
is
known at runtime to be an array of Car
.
An attempt to cast an array of one type to an array of an unrelated
type will not compile:
Car[] c = new Car[0]; Bicycle[] b = (Bicycle[])c; // "Inconvertible types"However, an attempt to cast to a subtype is also legal at compile time:
Vehicle[] v = new Vehicle[1]; v[0] = new Car(); Car[] c = (Car[]) v; // FailsBut it fails at runtime regardless of the array contents. In this example,
v
contains no element that is not a
Car
, and yet we can't cast it to a Car[]
,
even though it compiles correctly. Since the JVM takes no account of
the contents before rejecting the cast, one has to wonder why the compiler
cannot reject the construct itself.
Although the compiler is smart enough to block casts to incompatible,
unrelated array types, the preceding rules mean that we can fool
the compiler by casting through a common supertype, like this:
Bicycle[] b = new Bicycle[0]; Object[] o = (Object[])b; Car[] c = (Car[])o; // We have cast Bicycle[] to Car[]The code will fail at runtime because, as we've seen, you can't cast an array to a subtype at runtime, regardless of the contents. Arguably, if you're prepared to go to these lengths to fool the compiler into allowing something it would ordinarily choke on, you get what you deserve. Intuitively, the rules about casting between array types make some kind of sense. An array of cars is also an array of vehicles, and it makes sense to be able to treat it as one, at least until we try to put a vehicle in it that is not a car. On the other hand, an array of vehicles is conceptually different from an array of cars, even if all the vehicles happen to be cars at some specific point in time. Maybe. The problem is that, even though the rules make some kind of sense, they are not followed by the collections framework. • Casts between
ArrayList
types do not
obey the same rules as casts between arrays.
Broadly speaking, an ArrayList<Integer>
is
comparable to an Integer[]
, and the same broad
equivalence exists for other
ArrayList
s of objects. We've already seen that
Java allows an array to be cast
to an array of a supertype. What about ArrayList
? It
turns out that this fails at compile time:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList<Vehicle> v = (ArrayList<Vehicle>) c; // "Inconvertible types"Remember that the equivalent cast from
Car[]
to Vehicle[]
was valid, both at compile time and
-- to some extent -- at runtime. It turns out that an
ArrayList<Car>
does not cast the same way as
a Car[]
.
What's worse, this is legal:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList v = (ArrayList) c; //LegalIt gives a warning at compile time, because we're casting a typed
ArrayList
to an untyped one, but it does compile and
run.
Having cast the class, we find that all type safety is lost:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList v = (ArrayList) c; //Legal v.add (new Bicycle()); // LegalI suspect that the reason that
ArrayList<T>
casts fail
at compile time, even in circumstances where the equivalent cast for
arrays would be legal, is because type safety cannot be guaranteed
at runtime. In the earlier array example, of trying to insert a Vehicle
into an array whose real type was Car[]
, the code correctly failed at runtime,
because the JVM knows what kind of data the array is specified to
carry. However, Java generic types (ArrayList
and all
the rest of the collections framework), exhibit type erasure.
That is, all the type checking is done at compile time; at runtime
an ArrayList<Car>
is nothing more than an
ArrayList<Object>
. Consequently, compile-time
checks must be more rigorous for collections than they are
for arrays.
•
String.replace()
and String.replaceAll()
both replace all instances of one character sequence with another.
The difference is that replaceAll()
operates on regular
expressions, while replace()
works on fixed strings.
This is not at all obvious from the names, and it is a common mistake
to write something like:
String windowsFilename = unixFile.replaceAll ("/", "\\");This fails because
\
has a special meaning as a regular
expression replacement token.
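The fix can be sketched directly (the helper name is invented for illustration): replace() treats both arguments as literal text, so it is the right tool for converting path separators.

```java
// replace() works on fixed strings -- no regular expressions involved --
// so a literal backslash in the replacement needs no extra escaping.
public class ReplaceDemo {
    public static String toWindows(String unixPath) {
        return unixPath.replace("/", "\\");
    }
    public static void main(String[] args) {
        System.out.println(toWindows("usr/local/bin"));  // usr\local\bin
    }
}
```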
•
Unlike many other object-oriented languages, methods called from
constructors in Java are called polymorphically. What this means is
that if class Child extends Parent
, and
the constructor of Parent
calls a method which
Child
overrides, it is the overriding method
in Child
that gets called
even though it is Parent
that is being initialized
at that time. This is perfectly consistent with the way method
calls generally work in Java, but it means that methods in the
subclass can get called before the base class has been initialized,
and that is generally a bad, and confusing, thing.
Here is a trivial example:
class Base { final int x; void test() {}; Base () { test(); x = 1; } } public class Test extends Base { void test() { System.out.println ("x = " + x); } public static void main (String[] args) { new Test(); } }This code prints 'x=0', because at the time
Test.test()
is called,
the constructor for Base
has not yet completed.
•
By forbidding multiple inheritance, Java avoids most of the
problems that arise when a method overrides another method that
is supplied by multiple base classes. But not all of them.
Consider this code:
public interface I1 { void log () throws IOException; } public interface I2 { void log () throws SQLException; } public class Test implements I1, I2 { public void log () throws ??? { } }
Test.log()
implements I1.log
and I2.log
.
What exceptions can it be declared to throw? Common sense should suggest
that it should be able to throw either SQLException
or
IOException
; or perhaps it should be able to throw only the
common base class of these two exceptions (Exception
). In
fact, it cannot throw any exception. In a situation like this,
where a method is specified in multiple interfaces, the implementing
method can throw only an exception that is specified in
all base types.
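A compilable sketch of the situation (names chosen to match the fragment above): since neither IOException nor SQLException appears in both interfaces, the implementing method can declare no checked exception at all.

```java
import java.io.IOException;
import java.sql.SQLException;

// The implementing method may declare only exceptions permitted by
// every interface it implements -- here, none.
public class MultiDemo {
    interface I1 { void log() throws IOException; }
    interface I2 { void log() throws SQLException; }

    static class Test implements I1, I2 {
        // 'throws IOException' or 'throws SQLException' would not compile:
        // each would violate the other interface's declaration.
        public void log() { System.out.println("logging"); }
    }

    public static String tryIt() {
        new Test().log();   // no checked exception to handle
        return "ok";
    }
    public static void main(String[] args) {
        System.out.println(tryIt());
    }
}
```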
•
null
can be assigned to a variable of any class type,
without a cast. And yet, null
is not an instance of any
class -- (null instanceof Something)
is false, whatever
the Something
is.
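The asymmetry can be sketched in a few lines (the helper name is invented for illustration):

```java
// null is assignable to any reference type, yet instanceof is false
// for every type when the left-hand operand is null.
public class NullDemo {
    public static boolean check() {
        String s = null;   // no cast needed, for any class type
        Object o = null;
        return (s instanceof String) || (o instanceof Object);  // both false
    }
    public static void main(String[] args) {
        System.out.println(check());
    }
}
```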
•
The defects in the Calendar
and Date
classes
are so many, and so well-known, that it seems cruel to give them further
exposure. But just in case there can possibly be any Java programmer who
has not
fought bitterly with these classes, here are just a few.
- Calendar.set()
does not check its arguments for sanity.
It's legal, for example, to ask for the 13th month in the year.- The
Calendar
and Date
APIs interpret
numbers differently. For example, Calendar
, on the whole,
counts years from year zero, while Date
counts from 1900.- Some methods are unhelpfully named. For example,
Date.getDay()
returns the day of the week, and it is getDate() that returns the day of the month.- Quantities like month are conceptually not integers, they are
enum
s.
Unfortunately, the date/calendar API predates Java's enum support by a
decade, and the API uses plain integers for almost everything.
- Month numbers are zero-based
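The missing sanity check can be demonstrated directly (the helper name is invented for illustration): Calendar is 'lenient' by default, so the 13th, zero-based month silently rolls over into the following year.

```java
import java.util.Calendar;

// Calendar.set() accepts a non-existent month without complaint and
// normalizes it into the next year.
public class CalendarDemo {
    public static int yearAfterMonthThirteen() {
        Calendar cal = Calendar.getInstance();
        cal.clear();
        cal.set(2020, 12, 1);           // month 12 does not exist (months are 0-11)
        return cal.get(Calendar.YEAR);  // 2021, not an error
    }
    public static void main(String[] args) {
        System.out.println(yearAfterMonthThirteen());
    }
}
```

Calling `cal.setLenient(false)` before `get()` would instead make the bogus field value throw an IllegalArgumentException.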
•
The use of anonymous inner classes gives rise to non-intuitive scope
resolution rules (which have changed between Java 6 and 7). For
example, this code does not compile with Java 6, and does with Java 7:
class Something { public void doIt (int n) { } public void doIt (int n, int m) { } } public class Test { public void doIt () { } public void go() { new Something() { public void run() { doIt(); } }.run(); } }With Java 6, the compiler says:
Test.java:23: cannot find symbol symbol: method doIt() doIt(); ^If we change the name
Test.doIt()
to anything else, and
the corresponding call to doIt
in the run()
method, the code compiles correctly; so it's not that there's anything
wrong with the declaration itself, and the method is in scope.
The problem is that two other doIt
methods are also in
scope in the run()
method. run()
is a method
added to an anonymous inner class which subclasses Something
so, in the run()
method, class Something
is
the enclosing scope.
Although Java happily supports method overloading, it won't look for
matching method signatures that are in different scopes. So
once the compiler has seen doIt(int)
and
doIt(int,int)
, it will stop looking. Neither of these
method signatures match the call (no arguments), so the compilation
fails.
The problem is easy to spot in an example like this, where all the
methods are within a few lines of each other; but it can be a real
stumper when it arises in a real application.
Although this behaviour is counter-intuitive, it's actually perfectly
logical and in compliance with the Java language specification.
The really odd thing is that the code above compiles perfectly well
in Java 7. That suggests that the scope resolution rules have been
subtly altered, but the change does not appear to be documented.
•
The method java.io.OutputStream.write(int) writes a byte
writes a byte
to the stream, not an int
. The high 24 bits are
ignored completely.
•
Unlike in C, the short-circuit behaviour of logical
AND and OR operators is well-defined in Java. Because it is well-defined,
curiosities are exposed which are hidden in C. One such curiosity
is that the bitwise operators &
and | do not
short-circuit. In
if (a() | b()) { //... }Both methods are called, even if
a()
evaluates to true.
If the || operator had been used, b()
would not have
been called.
This is all perfectly sensible -- in most cases you wouldn't want to
use the bitwise operator for a logical comparison. The complication is
that if the types being compared are boolean
then
| and || (and & and &&) are equivalent -- except for
short-circuiting. So for boolean
comparison, we have
available both short-circuiting and a non-shortcircuiting logical
operators -- something that is not true for other data types.
It is arguable whether Java needs bitwise operators at all -- I
suspect they only exist to make Java easier for C programmers to
learn. But the bitwise operators in Java are emasculated.
You can't, for example say:
if (flags & MASK) { //... }It would perhaps have been better if bitwise operators had been relegated to a library class. Arguably, it would have been better had the logical operators not short-circuited, because this merely saves a few keystrokes at the expense of making code harder to troubleshoot. • Java defines an interface
CharSequence
to represent
a sequence of characters. It is implemented by String
,
StringBuilder
, and StringBuffer
. However,
this interface appears to be of very little practical use. Hardly any
methods in the Java API take a CharSequence
as an
argument. For example, there is no
indexOf
method that takes a CharSequence
in
any of the standard string-handling classes, so comparing strings of
different type requires a bunch of ugly toString()
calls.
•
Java uses 16-bit unsigned values (or, at least values with a 16-bit unsigned
range) to represent unicode characters. Unfortunately, the defined
unicode code point set no longer fits into 16 bits. The UTF-16 specification
defines a mechanism called surrogate pairs for storing
larger values. In essence, certain 16-bit values that are not used for
normal characters are used to indicate that the following value
is from a subsidiary table of characters.
Java does little to hide these implementation details. In the String
class, characters beyond the 16-bit range are passed around as int
s,
not as a specific character type or class, and developers are left to
handle the details. Of course, you can still call the char
-based
methods, such as:
char charAt(int index);But these will give misleading results if the string contains surrogate pairs. Consider this example:
// Define three strings, each with a single character String sLatin = "A"; // Unicode code point < 256 String sChinese = "東"; // Unicode code point < 65536 String sLinearB = new String (Character.toChars (0x10400)); // code point >= 65536 // Make a single, three-character string out of these three characters String s = sLatin + sChinese + sLinearB; // How long is it? System.out.println ("Broken length=" + s.length()); System.out.println ("Real length=" + s.codePointCount(0, s.length()));This produces the following output:
Broken length=4 Real length=3Notice that even the very fundamental
String.length()
fails
here. The length of the string in characters is clearly 3, because we
made the string s
from three characters. The problem is that
length()
does not return the number of characters, but the
number of 16-bit units of storage. The Linear B character whose code point
is 0x10400 will not fit into 16 bits -- it requires a UTF-16 "surrogate
pair" of 16-bit code units.
To be fair, this oddity is documented, and so long as the programmer
understands the subtle difference between "Unicode code point"
and "UTF-16 code unit" then there should be no problem, right?
The reality is that, if there is any possibility whatever that textual
input might contain Unicode code points beyond 65535, then the programmer
must abandon the traditional ways of working with strings in Java, in
favour of the full Unicode API. This means, for example, using
codePointCount()
rather than the simple length()
.
Most fundamentally, we can't assume that a Unicode value will actually
fit into a char
, even though this data type was specifically
designed to hold a Unicode value. Any code that manipulates individual
characters should store them in an int
, and use the specific
methods on classes like Character
to manipulate them.
This puts Java back where C was twenty years ago, when it became increasingly
obvious that all the world's characters would not fit into a character set
represented by a single byte. The numbers are much bigger now, of course,
but the problem is the same. The way we tackled this in C/C++ was,
essentially, to ignore it. We left the core language unchanged, and provided
a variety of libraries that handled unicode using the basic integer data
types we already had.
Unfortunately, Java does have a core language that is supposed
to make unicode support reasonably natural and transparent. When we work
with unicode in C++ (and especially in C) we know we're going to be
in a world of hurt; with Java, developers expect to be protected from this
nastiness and, increasingly, they aren't.
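The int-based way of working described above can be sketched as follows; countLetters is an invented helper that walks a string one code point at a time rather than one char at a time.

```java
// Iterating a string by code point, so that characters outside the
// Basic Multilingual Plane are handled correctly.
public class CodePointDemo {
    public static int countLetters(String s) {
        int letters = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);        // an int, not a char
            if (Character.isLetter(cp)) {     // Character has int overloads
                letters++;
            }
            i += Character.charCount(cp);     // advance by 1 or 2 code units
        }
        return letters;
    }
    public static void main(String[] args) {
        // "A" plus a Deseret letter (U+10400), which needs a surrogate pair
        String s = "A" + new String(Character.toChars(0x10400));
        System.out.println(s.length());       // 3 code units
        System.out.println(countLetters(s));  // 2 characters
    }
}
```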
•
The method InetAddress.isReachable()
is supposed to test whether
one host is reachable from another. The documentation says that it will
use ICMP if the user has the appropriate permissions, or a TCP echo if not.
On Linux, however, the JVM only uses ICMP if the user is root,
even though an unprivileged user can make ICMP requests on Linux. Admittedly,
this isn't allowed by default on Linux, but it's readily configured. The
result is that the JVM refuses even to try something that might be allowed.