A collection of Java curiosities and oddities
There is no denying the success of Java as a programming language and runtime environment. However, Java has a number of decidedly odd features. Some of these are potentially troublesome; many are little more than peculiarities that are unlikely to be noticed in day-to-day work. Investigating these curiosities is interesting, because it can lead to a deeper understanding of how Java works, and in particular of the trade-offs that have had to be made to allow Java to evolve as a language while still remaining backward-compatible with older versions. Some of the observations that follow are, however, frankly inexplicable -- at least by me. If you are able to offer an explanation, please feel free to send it to me at the usual place. In fact, comments of all sorts are welcome, as always. Please note that what follows is in no particular order.

• Java has a NullPointerException, but no pointers.
•
In many senses, a Java array is an object of a class. You can
call the methods of java.lang.Object
on it, determine
its class, pass it by reference to a method that takes
an Object
parameter, serialize it, etc.
However, an array is an odd kind of class. It has a read-only
attribute length
-- despite the fact that ordinary classes
cannot be defined to have read-only attributes. As a class, an array
has no methods specific to array handling. For example, you can't
sort it like this:
int[] array = new int[5];
//...
array.sort(); // No -- will not compile

All the methods for sorting arrays, regardless of type, are in
java.util.Arrays
. An array could override toString()
to produce useful output, but does not. In short, an array is an odd hybrid
of a class and a primitive.
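The sorting workaround described above can be sketched as follows; this is a minimal example using only the standard library (the class and method names are mine, for illustration):

```java
import java.util.Arrays;

public class ArrayOddities {
    // Sorting lives in the utility class java.util.Arrays,
    // not on the array itself.
    static int[] sorted(int[] a) {
        int[] copy = a.clone();   // arrays do support clone()
        Arrays.sort(copy);        // in-place sort of the copy
        return copy;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 2};
        System.out.println(Arrays.toString(sorted(a))); // [1, 2, 3]
        // The toString() inherited from Object is unhelpful:
        // prints something like [I@1b6d3586
        System.out.println(a.toString());
    }
}
```

Note that `Arrays.toString()` also has to stand in for the useful `toString()` override that arrays lack.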
•
This construct is legal in Java:
switch (5)
  {
  case 1:
    // Do something
    break;
  }

Using a literal constant as a switch expression is utterly useless, but it is legal, nonetheless. It's legal in C as well (but what would you expect?) Oddly, this allowance of a constant for the switch expression fails if the constant is the value of an
enum
. This fails, for
example:
enum T {test1, test2};
switch (test1) { ... }

It's not a problem that it fails, because it's useless; it's only odd that other uses of a constant as a switch expression don't fail.
•
A method can have the same name as a constructor, so long as it is distinguished by having a declared return type. E.g., in class Test, it is legal to say
public void Test()
  {
  // Do something
  }

This is not reported as a botched attempt at defining a constructor. Of course, this method will not function as a constructor, so it's easy to think you've implemented a constructor when, in fact, you haven't. The solution is simple -- don't use the class name as a method name unless you intend a constructor.
•
In general, the == operator, when applied to two object references, tests whether they refer to the same object. Even if the objects have a natural notion of equality (e.g., the same attributes), the test is still for equality of reference.
(new String("cat") == new String("cat"))
evaluates
to false
just as
(new String("cat") == new String("dog"))
does, because they are
different objects, whether or not their contents are identical.
However, the notion of
equality is weakened in
various places. For example, if you append the string "cat"
to an ArrayList al
, then
al.contains(new String ("cat"))
is true, even though
the instance "cat"
is not in the list. In these
situations, the notion of equality is tacitly converted to one of the
application of
the equals()
method (and hashCode()
--
see below). This is almost certainly what the programmer wants, but
Java does not do what the programmer almost certainly
wants when comparing
two Strings for equality (C++ does the 'right thing' here).
To compare the contents of two strings for equality, you can use
the equals()
method. Bizarrely, however, even this
won't work for StringBuilder
or StringBuffer
objects, since these classes don't implement equals()
. Worse,
they inherit the equals()
method from Object
which,
again, does what is almost certainly the wrong thing. To compare two
StringBuffer
objects for equality of contents, you'd need to
do something like:
s1.toString().equals (s2.toString())

In fact, the == operator does not always test for equality of reference; that is a separate oddity, which is discussed here.
•
On the subject of equality and the collections framework: most methods in the standard Java API -- in particular in the collections framework -- which call
equals()
on any object will also call hashCode()
.
The general assumption is that, for efficiency, it's quicker to
compare two objects' hash codes than their contents. In practice that isn't
always true, but the collections framework assumes that it is.
What this means is that if you implement equals()
in a class,
it's almost certainly necessary to implement hashCode()
as well,
or risk getting unpredictable results. If for some reason you can't
provide a reasonable implementation of hashCode()
, it's
nearly always safer simply to implement the method to return zero than
to ignore it. To ignore it means to inherit the implementation from
Object
, which will probably be unsuitable.
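A minimal sketch of the equals()/hashCode() pairing described above -- the Point class and its fields are hypothetical examples, not something from the Java API:

```java
import java.util.ArrayList;
import java.util.List;

public class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    // Must be consistent with equals(): objects that compare
    // equal must yield equal hash codes.
    @Override public int hashCode() { return 31 * x + y; }

    public static void main(String[] args) {
        List<Point> list = new ArrayList<Point>();
        list.add(new Point(1, 2));
        // true, even though this exact instance is not in the list
        System.out.println(list.contains(new Point(1, 2)));
    }
}
```

Hash-based containers such as HashSet and HashMap locate the bucket via hashCode() first, which is why omitting it gives unpredictable results even when equals() is correct.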
•
There is never a need to call super()
in a constructor,
although this is often seen
(and recommended). If the constructor does not
call super()
or super(args)
, then the base class
constructor is automatically called.
If you do explicitly call super()
, you are merely writing out
the call that the compiler would have inserted anyway.
•
It is legitimate to define an abstract class that has no abstract methods.
Such a class could be realized in a completely empty subclass. The purpose of
such a construct is unclear. In C++, an abstract class is indicated by the
presence of abstract methods; there is no separate 'abstract' modifier
for the class. Java does use a separate abstract modifier, and it can be
applied to a class that is not abstract in any meaningful sense.
It could be argued that declaring a class abstract is a useful way to
ensure that a class need never be instantiated -- it has only static
methods. A good example of such a class is java.lang.Math
.
However, the Java designers have not followed this approach --
correctly, in my view, because Math
(and similar examples
such as java.lang.System
) are not abstract
classes -- they are not classes at all in any meaningful
sense. Rather they are artefacts of Java's insistence that all code
has to be in some class or other. Math
is more a namespace
than a class. In fact, later versions of Java support
static imports, so we can even hide the fact that math operations are
in a class:
import static java.lang.Math.sin;
//...
double x = sin (1);

The
import static
here serves an almost identical
purpose to using namespace
in C++.
•
Despite Java's good record of platform-neutrality, floating-point operations
are not guaranteed to behave exactly equivalently on all platforms, unless
exact compatibility is demanded by marking code with the strictfp
keyword.
•
The operations ++ and -- are not atomic, even when applied to a volatile
variable. If use_count
is a volatile integer, then
a different thread can interrupt the operation of use_count++
between reading the value of use_count
and writing it
back.
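When an atomic increment really is needed, the java.util.concurrent.atomic classes provide one. A brief sketch (class and method names are mine, for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Counter {
    // A plain volatile int would not make ++ atomic; AtomicInteger does.
    static final AtomicInteger useCount = new AtomicInteger();

    // Runs two threads that each perform 10000 atomic increments.
    static int runTwoThreads() {
        useCount.set(0);
        Runnable r = new Runnable() {
            public void run() {
                for (int i = 0; i < 10000; i++)
                    useCount.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return useCount.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads()); // reliably 20000
    }
}
```

With a volatile int and the ++ operator instead, the final count can come out below 20000 whenever the two threads interleave between the read and the write.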
•
Java does not have a sizeof()
operator. Although Java primitives
are of documented range, the actual amount of memory used by a primitive
cannot be determined with precision.
You'll often read statements to the effect that
a Java char
is 16 bits, but that's not strictly true. It has
the range of a 16-bit value, but the actual storage is platform-dependent.
Where large amounts of data are involved, this
uncertainty can be significant.
•
That Java does not support unsigned integer data types is well known.
However, Java's char
type is unsigned, in that its minimum
value is zero. However, arithmetic on chars is highly inconsistent -- at
least until we consider in detail the compiler's type widening rules.
For example, this is legal:
char a = 1;
char b = ++a;

while the following, although superficially similar, is illegal:

char a = 1;
char b = a + 1;

That is, you can increment and decrement chars, although you can't add and subtract them.... but... this is legal:
char a = 1;
char b = 2;
int c = a + b;

That's because when you add (or subtract) chars, they are widened to integers, so the result is an integer. But the increment/decrement operations, because they modify the variable itself, do not widen. That explains why the result of
++a
can be assigned to char b
, but
a+1
cannot.
Incidentally, a char variable can be decremented below zero, in which case it
wraps around to 65,535, just as an unsigned 16-bit value would.
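The wrap-around can be demonstrated directly (the class name is mine, for illustration):

```java
public class CharWrap {
    static int decrementBelowZero() {
        char c = 0;
        c--;          // wraps: char behaves like an unsigned 16-bit value
        return c;     // widened to int for the return
    }

    public static void main(String[] args) {
        System.out.println(decrementBelowZero()); // 65535
    }
}
```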
•
A related problem with number widening is this one:
long microseconds_per_day = 24 * 60 * 60 * 1000 * 1000;

Although the calculation result will fit into a
long
, it
won't fit into an int
, and the compiler will do the
calculation as an int
, whereupon it will overflow, and leave
an unexpectedly wrong value in microseconds_per_day
.
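The usual fix is to force the first operand to long, so that the whole calculation is carried out in 64 bits. A small sketch (method names are mine):

```java
public class Overflow {
    // int arithmetic overflows before the widening assignment to long
    static long wrong() { return 24 * 60 * 60 * 1000 * 1000; }

    // the L suffix widens the whole expression to long
    static long right() { return 24L * 60 * 60 * 1000 * 1000; }

    public static void main(String[] args) {
        System.out.println(wrong()); // 500654080 -- wrapped around
        System.out.println(right()); // 86400000000
    }
}
```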
•
Java goes to some lengths to ensure that a local variable cannot be used
before initialization, but it isn't smart enough to spot cases where
a variable is clearly initialized. For example, this code works fine:

int a = 5, b;
if (a > 10) b = 1; else b = 1;
System.out.println ("b= " + b);

But the following will not compile, because
b
'might not
have been initialized'.
int a = 5, b;
if (a > 10) b = 1;
if (a <= 10) b = 1;
System.out.println ("b= " + b);

However, the two code snippets are exactly equivalent. In C++, issues of this sort are generally considered to be warnings rather than errors, and compiler pragmas are typically provided to suppress them on occasions where the developer is smarter than the compiler. In Java, we often have to modify our code to accommodate the compiler's lack of smarts.
•
A related issue to the previous one -- the compiler not being smart enough to work out whether a local variable has been initialized -- is the problem of detecting whether a
final
variable has been
initialized more than once. Consider this example:
class Test
  {
  final int test;

  int getTest() throws Exception
    {
    throw new Exception();
    }

  Test ()
    {
    try
      {
      test = getTest();
      }
    catch (Exception e)
      {
      test = -1;
      }
    }
  }
int test
is a blank final -- a final that can be assigned
to exactly once in the life of the object. This code does not compile
because the compiler thinks there is the possibility of test
being multiply initialized, but close inspection of the code reveals that
this is not the case. In the try
block, test
will only be assigned if no exception is thrown, and in that case
the catch
will be skipped.
As in the previous example, it might sometimes be necessary to rearrange
code to allow for the compiler's lack of insight.
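One rearrangement that satisfies the compiler is to compute the value into an ordinary local variable, and then assign the blank final exactly once. A sketch, assuming the same structure as the article's example (the class name Test2 is mine, to avoid a clash):

```java
public class Test2 {
    final int test;

    int getTest() throws Exception { throw new Exception(); }

    Test2() {
        int t;                       // ordinary local -- may be assigned on
        try { t = getTest(); }       // either path without complaint
        catch (Exception e) { t = -1; }
        test = t;                    // single, unconditional assignment
    }

    public static void main(String[] args) {
        System.out.println(new Test2().test); // -1
    }
}
```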
•
Despite Java's vigour in preventing local variables being used
uninitialized, it is perfectly legal to use instance variables
uninitialized (in which case, default values are used).
•
There is no straightforward way for a Java application to determine the
filesystem path of its executable. That is, there is no equivalent
of argv[0]
in the arguments passed to main()
in a C program. Since this information is often necessary, Java
developers have to resort to platform-specific tricks, like reading
the value of /proc/self/exe
on a Linux system.
•
Java provides signed right-shift (>>) and zero-fill right-shift
(>>>) operators. However, as Java has no unsigned integer types,
a zero-fill right shift (zeroing the sign bit) makes no sense
whatsoever. If the integer is positive, then the signed and zero-fill
right shifts are equivalent; if the integer is negative, then the
zero-fill right shift will not halve the value -- it replaces the sign
bit with zero, producing a large positive number. That is, the negative
number will become positive.
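The effect on a negative number can be seen directly (helper method names are mine):

```java
public class Shifts {
    static int signed(int n)   { return n >> 1; }   // sign bit preserved
    static int zeroFill(int n) { return n >>> 1; }  // sign bit replaced by 0

    public static void main(String[] args) {
        System.out.println(signed(-8));   // -4: halved, as expected
        System.out.println(zeroFill(-8)); // 2147483644: large and positive
    }
}
```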
•
Another oddity of the bit shift operators is that the shift distance
is reduced modulo the variable width. That is, if you try to shift
an integer 34 places, it will actually shift it 2 places (34 % 32).
In ANSI C, shifting by at least the width of the type is formally
undefined behaviour; in practice, such a shift to the left typically
results in zero, as all the data is shifted out and replaced with zeros,
while to the right the result will depend on the sign bit and the
signedness of the variable. Typically
the compiler will produce a warning in such situations, if it can.
There's no right or wrong approach to handling
a situation like this; but arguably
Java's approach is the least helpful -- it does something that the
developer almost certainly did not intend or expect.
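The modulo-reduction of the shift distance is easy to demonstrate (method names are mine):

```java
public class ShiftDistance {
    // For int, the shift distance is taken modulo 32; 34 % 32 == 2.
    static int intShift34()   { return 1 << 34; }
    // For long, it is taken modulo 64, so 34 really means 34.
    static long longShift34() { return 1L << 34; }

    public static void main(String[] args) {
        System.out.println(intShift34());  // 4
        System.out.println(longShift34()); // 17179869184
    }
}
```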
•
In the collections framework, the List
interface defines a collection
of indexed, non-unique elements, while Set
defines a collection of unindexed, unique elements (sorted, in the
case of SortedSet implementations such as TreeSet); but there
is no built-in collection of indexed, unique elements, or of sorted,
non-unique elements (indexed or otherwise). For some reason, only
unique collections (sets) can be defined as sorted, and only non-unique
collections (lists) can be defined as indexed. There are many other possible
arrangements of data that have no representation in the collections framework.
It's interesting that there is no concrete class implementing only the
Collection
interface itself. Of course, the developer can just use
ArrayList
and ignore the ordering capabilities.
•
If class Child
extends class Parent
, and Child
and Parent
are in
different packages, then a method in Parent
that is not specifically
tagged as public
or protected
is not overridden
by a method with the same name in Child
. However,
if the classes are in the same package, the method in Parent
is
overridden. In fact, this is all perfectly logical when you consider the
rules about method access: a method without modifiers is effectively private when considered
by classes in a different package, but accessible in the same package; and
you can't override a private method (the subclass might not even be aware that
the method exists). The problem, of course, is that the rules on overriding
and polymorphism become completely different when the classes are in different
packages, compared to when they are in the same package. The solution is to
tag a method as protected
when it is intended to be overridden,
and/or to make careful use of the @Override annotation whenever you expect
a method call to override a base class method.
This issue is discussed in more detail in this article.
•
Java allows a class to have members with the same name, so long as they
are unambiguous. However, there are cases where names can be ambiguous,
and the compiler does not complain. Consider the following example:
class Ambiguous
  {
  static class AmbiguousName
    {
    static String text = "Inner class";
    }
  static TopLevel AmbiguousName = new TopLevel();
  }

class TopLevel
  {
  String text = "Top-level class";
  }

public class Test
  {
  public static void main (String[] args)
    {
    System.out.println (Ambiguous.AmbiguousName.text);
    }
  }

In the class Ambiguous
, the name AmbiguousName
identifies both a nested class and a variable. The variable references a class
which has an instance variable text
, and the nested
class has a static variable of the same name. So to what does
Ambiguous.AmbiguousName.text
refer?
The answer is that the instance variable takes precedence over the inner
class with the same name. No compiler error or warning is generated.
This is a somewhat contrived example -- and in most cases the use of
consistent naming conventions ought to prevent this kind of situation arising.
•
You can use the unicode escape notation (\uXXXX) not only as part
of a string or character literal, and not only in identifiers, but
even as part of a Java keyword. For example, this is legal:

cha\u0072 c = '\u0072';

\u0072 is the unicode code point for the letter 'r'. Of course, such notation for Java keywords is unreadable. It's not particularly helpful for identifiers, either, given that the Java compiler will happily read unicode source files, so the actual characters can be used. In fact, it's possible to code an entire Java application as a string of \uXXXX values -- an extreme example of source obfuscation. More practically, the way in which the compiler treats unicode escapes can lead to odd results. For example, the compiler will reject this comment:
/** A test for this class may be found at c:\unit_tests\something.java */
\unit_tests
starts like a unicode escape, but is not one, and
the compiler treats it as an error. The compiler will also reject this:
/** // Note: \u000A is the unicode value for a line feed */

This fails because the unicode escape \u000A is parsed before the comment, so the line is split into two, because \u000A is, as the comment itself says, a line feed. The second part of the line does not begin with //, and so is no longer a comment.
•
Despite its name,
Math.abs()
is capable of returning
a negative value when applied to integers. Specifically, it returns the same
value as its argument
when the argument is the largest negative number that the variable can
hold. This peculiar result follows from the way two's-complement
arithmetic works. Since Integer.MIN_VALUE -- the largest negative integer --
cannot be negated (zero occupies one of the bit patterns on the positive
side, so there is one fewer positive value available in the range
than there are negative values), the argument is returned unchanged.
Conceivably abs()
should throw an exception in this case; but
when we consider all the other places where integer arithmetic can fail
without an exception, it hardly seems worth it for this unusual edge case.
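The edge case is easy to verify:

```java
public class AbsEdge {
    public static void main(String[] args) {
        // Still negative: -2147483648 has no positive counterpart in int
        System.out.println(Math.abs(Integer.MIN_VALUE));
        // One step away from the edge, abs() behaves as expected
        System.out.println(Math.abs(Integer.MIN_VALUE + 1)); // 2147483647
    }
}
```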
•
char variables are treated as integers when added to other number
types, and characters when added to Strings. When two chars are
added, they are both treated as numbers, not as Strings. These rules lead
to rather odd results:

"Hell" + 'o' = "Hello" (String)
104 + 'o' = 215 (integer)
'h' + 'o' = 215 (integer)

This oddity follows from the way that char can be used as both a number and a character in Java. Since Java already has a 16-bit signed integer type, and does not really need a 16-bit unsigned integer type (since Java eschews unsigned integers in general), there really is little need to allow char variables to be treated as numbers at all. In C++, the Standard Template Library defines the addition of strings to characters, characters to integers, and all manner of other things, and that's all well and good. The problem with Java is that the oddities are not part of a library implementation, but embedded into the very syntax of the programming language. The ability of Java to treat
char
s as numbers leads to
common programming errors like this:
StringBuffer sb = new StringBuffer ('?');

What the programmer probably intended was a
StringBuffer
initialized to the string "?"; the actual result is an empty string
whose initial capacity is the number of characters indicated by the
unicode value of the ? character.
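Both oddities can be checked directly (helper method names are mine):

```java
public class CharAdd {
    static String hello() { return "Hell" + 'o'; } // String concatenation
    static int sum()      { return 'h' + 'o'; }    // numeric addition: 104 + 111

    // '?' (code point 63) widens to int, selecting the capacity constructor
    static StringBuffer oops() { return new StringBuffer('?'); }

    public static void main(String[] args) {
        System.out.println(hello());                 // Hello
        System.out.println(sum());                   // 215
        StringBuffer sb = oops();
        System.out.println(sb.length() + " / " + sb.capacity()); // 0 / 63
    }
}
```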
•
Number classes define constructors on Strings, but not equality
with Strings. Consider this example:
Integer x = new Integer("2");
if (x.equals(2)) System.out.println ("equals1");
if (x.equals("2")) System.out.println ("equals2");

You might think that if you can initialize an
Integer
from
a String, you could compare an Integer
with a String
.
Sadly, no. x.equals(2)
is true, x.equals("2")
is
false, even though x
was initialized from "2".
•
An array of chars is not a kind of String (unlike in C)... except when it
is. Consider this example:
System.out.println (new int[]{1,2});
System.out.println (new char[]{'a','b'});

The
println()
call will happily concatenate the elements of
the character array into a single text string. It won't do the same
with an integer array -- it doesn't even display the contents of the
array, just an object handle.
To add to the confusion, the following example also prints the character
array as an object handle:
System.out.println (new int[]{1,2});
System.out.println ("Array is: " + new char[]{'a','b'});

This is because, unlike the
println()
method, the + operator
does not format the character array as a string.
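One workaround is to convert the array explicitly with String.valueOf(char[]), which builds a String from the characters (the class and method names here are mine):

```java
public class CharArrayPrint {
    static String viaValueOf() {
        char[] chars = {'a', 'b'};
        // Explicit conversion, so the + operator sees a String
        return "Array is: " + String.valueOf(chars);
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b'};
        System.out.println(chars);        // ab -- the println(char[]) overload
        System.out.println(viaValueOf()); // Array is: ab
        // Without the conversion: Array is: [C@... (object handle)
        System.out.println("Array is: " + chars);
    }
}
```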
•
In Java, an anonymous code block (that is, statements enclosed in
braces and not part of a specific method) in a class is considered to be
part of the instance initializer, and is copied into the start of
each constructor in the class. This is a rather confusing and ugly way
to relieve the developer of the need to implement an additional method
containing
code shared between multiple constructors. Conceivably this construct
exists because there needs to be some way to initialize
anonymous inner classes (which can't have programmer-defined constructors).
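The copying behaviour can be observed by logging from the initializer block; every constructor runs it first. A sketch (all names are mine, for illustration):

```java
public class InitBlock {
    static final StringBuilder log = new StringBuilder();

    // Instance initializer: copied into the start of every constructor.
    {
        log.append("init;");
    }

    InitBlock()      { log.append("noarg;"); }
    InitBlock(int n) { log.append("int;"); }

    // Constructs one instance with each constructor and returns the trace.
    static String trace() {
        log.setLength(0);
        new InitBlock();
        new InitBlock(1);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace()); // init;noarg;init;int;
    }
}
```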
•
Java 1.7 allows the digits in a number to be separated by underscore characters
to aid clarity of expression. So we can write one thousand as
"1_000".
The problem is that _ is a valid character in an identifier, so there are
complicated rules about how it can be used in a number. It is not
difficult to see why this statement does not define a number literal --
_1000 is a valid identifier name.

double x = _1000;

It's less easy to see why this number definition is invalid:
double x = 1000_;

Arguably, it does not reflect an everyday use of the digit separator -- we would not write "1000," in the UK, for example. On the other hand, we would not write "10,00" either, but

double x = 10_00;

is legal Java.
•
It is impossible to define a multi-line string literal in Java, except by using the addition operator, with the run-time overhead this entails.
•
In general, the order of definition of members of a class is not important. If method
a()
calls method b()
, which
refers to variable c
, there is no requirement that the
members are defined in the top-to-bottom order c
, b()
,
a()
. However, the declaration of static initializers is
inconsistent in this regard: variables referred to in a
static {...}
block must appear before that block in the source
code. Similarly, a static initializer that instantiates its own
class must appear after static initializers it depends on. Both these
issues arise from the fact that static initializers are executed
strictly from top to bottom; the compiler does not try to determine
what dependencies there might be between them.
•
The methods that mutate a StringBuffer
or
StringBuilder
are implemented somewhat inconsistently. If
sb
is an instance of StringBuilder
,
the call sb.subSequence(start, end)
returns a new
object containing a substring of sb
. It does not
modify sb
itself. On the other hand, the methods
insert()
and delete()
do modify the
string itself, even though they return a value. The value returned is,
in fact, a reference to the original string. The way these methods are
declared gives the impression that they will all return a new object, leaving
the original unchanged, but this is not the case.
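The inconsistency can be demonstrated with a few lines (helper method names are mine):

```java
public class Mutators {
    // subSequence() returns a new object; the builder is unchanged.
    static boolean subSequenceLeavesOriginal() {
        StringBuilder sb = new StringBuilder("abcdef");
        CharSequence sub = sb.subSequence(0, 3); // "abc"
        return sub.toString().equals("abc") && sb.toString().equals("abcdef");
    }

    // delete() mutates the builder -- and returns a reference to it.
    static boolean deleteMutatesAndReturnsSame() {
        StringBuilder sb = new StringBuilder("abcdef");
        StringBuilder ret = sb.delete(0, 3);
        return ret == sb && sb.toString().equals("def");
    }

    public static void main(String[] args) {
        System.out.println(subSequenceLeavesOriginal());   // true
        System.out.println(deleteMutatesAndReturnsSame()); // true
    }
}
```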
•
The compound arithmetic operators +=
, etc., implicitly cast
their results to the type of the left-hand-side of the assignment,
even if this would cause an overflow.
So the following code compiles perfectly well:
long y = 1000000000000L;
int x = 0;
x += y;
System.out.println ("x=" + x);

When run, the value of
x
turns out to be -727379968
.
It is therefore never really safe to use +=
on integers of
mixed range. The following code (properly) fails to compile:
long y = 1000000000000L;
int x = 0;
x = x + y;

•
The process of auto-boxing and auto-unboxing, introduced in later Java versions, can make it easier to read and write code that uses classes which wrap primitives (
Integer
, etc). Auto-boxing is the
implicit conversion between
primitive number types
(int
, float
) and class number types
(Integer
, Float
).
To some extent this
implicit conversion avoids one of the features that caused most
complaint among Java programmers -- that there were essentially two
separate programming conventions for primitive number types and
class number types. However, auto-boxing
can lead to surprising, even shocking, results.
Integer x = new Integer (1);
Integer y = new Integer (1);
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

The output is:

x <= y: true
x == y: false

The problem is that the arithmetic comparison (<=) causes auto-unboxing of the
Integer
s into primitives, while the equality operator
does not. So the equality test is for reference equivalence, and
x
and y
are not the same instance.
Actually, it isn't quite true to say that the equality operator doesn't
cause auto-unboxing. It does if the comparison is between a number
class and a numeric literal or a primitive number variable. Consider this
code:

Integer a = new Integer (0);
Integer b = new Integer (0);
int c = 0;
System.out.println ("a == b: " + (a == b));
System.out.println ("a == 0: " + (a == 0));
System.out.println ("a == c: " + (a == c));

The output is:

a == b: false
a == 0: true
a == c: true

So if we compare two
Integer
objects containing zero
for equality, we find that they are not equal (because we're testing the
references, not the contents). However, if we test an Integer containing
zero against a literal zero or an int
with value zero,
then they are equal, because of the auto-unboxing.
If that wasn't tricksy enough, a truly bizarre result comes from this code:
Integer x = 1;
Integer y = 1;
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

In this case,
x == y
is true, but for entirely the
wrong reasons. It's true because after the assignments, x
and
y
refer to the same instance. This behaviour is
consistent with the way that String
objects are assigned from
string literals: if s1 = "hello"
and s2 = "hello"
then s1 == s2
, but because s1
and s2
are the same object, not because their contents are the same.
And as if that wasn't enough, consider this code, which is
the same as the previous apart from the actual numbers:

Integer x = 11112332;
Integer y = 11112332;
System.out.println ("x <= y: " + (x <= y));
System.out.println ("x == y: " + (x == y));

Now, amazingly,
(x == y)
is false. It appears that
Java's sharing of boxed objects only applies to specific values. Specifically,
it applies (by default) to values between -128 and 127. Anything outside
the range of a signed byte gets a separate storage allocation.
Since String
doesn't support arithmetic comparison operators
like less-than, the inconsistencies in the way instances are created by
assignment are not all that apparent. But with number classes, we have
the truly frightening result that arithmetic comparisons depend
on how the instance was initialized, and the actual value assigned,
in addition to the
inconsistent behaviour between == and all the other comparison operators.
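The cache boundary can be probed directly (helper method names are mine; the -128..127 range is the default behaviour):

```java
public class BoxCache {
    // Boxing via assignment uses Integer.valueOf(), which caches
    // values in the range -128..127 by default.
    static boolean sameInstanceAt127() {
        Integer a = 127, b = 127;
        return a == b;              // cached: same instance
    }

    static boolean sameInstanceAt128() {
        Integer c = 128, d = 128;
        return c == d;              // outside the cache: separate objects
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAt127()); // true
        System.out.println(sameInstanceAt128()); // false
    }
}
```

The safe comparison, of course, remains equals(), which gives true in both cases.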
•
Unlike C++, Java lacks a way to specify that a class method does
not modify the instance. In C++ we can say, for example:
int getCount() const { return count; }

and it is clear to the user of the class that
obj.getCount()
cannot modify obj
. As well as improving clarity, it
prevents the developer carelessly introducing mutating code into a method
which is specified to be non-mutating. The lack of such a construct in
Java is a significant omission, which has to be overcome in awkward ways.
•
To divide a floating point number by zero is not an error in Java.
The floating point representation used is able to represent the quantity
that arises from dividing by zero; you can even do arithmetic (to some extent)
on these quantities, which are misleadingly (from a mathematical standpoint)
referred to as 'infinities'. Similarly, no exception is thrown when
attempting to take the square root of a negative number, even though
Java has no built-in support for complex numbers. Instead, the result
is the value 'NaN' (not-a-number). NaN is an odd kind of thing; it is
the only numeric value in Java that is not equal to itself.
That is, (NaN == NaN)
evaluates to false.
Whatever the mathematical merit of this kind of number handling, it
is inconsistent with Java's treatment of integer math.
•
A switch
expression can be used with an enum
variable and, in fact, this is one of the most powerful and natural uses
of a switch
. However, the Java compiler won't warn you if
you base a switch
on an enum
and then fail to
handle one of the enum's values in a case
(which is almost always
a programming error). Most C++ compilers
can do this so, presumably, it isn't rocket surgery.
•
Java famously has no 'goto' construct or direct equivalent. Nevertheless
, goto
is a reserved word in the Java language. You can't,
for example, name a variable 'goto'. Equally oddly, you can apply
a label (e.g., something:
) to any statement, or
even a comment, although only loops and switch cases can be usefully
labelled.
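The one genuinely useful application of labels -- breaking out of nested loops -- can be sketched as follows (the class, method, and label names are mine):

```java
public class Labels {
    // Returns the first outer index i for which some j satisfies i + j == 3.
    static int firstIndexOfPair() {
        int found = -1;
        outer:                       // label on the outer loop
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                if (i + j == 3) {
                    found = i;
                    break outer;     // exits both loops at once --
                }                    // the closest Java gets to goto
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(firstIndexOfPair()); // 1
    }
}
```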
•
Similarly, Java has reserved the keyword const
, but
it has no function. Instead, Java uses the keyword final
for three different roles, one of which broadly aligns with const
variables in C/C++. The other two uses of final
are on a
class (cannot be extended) and a method (cannot be overridden). It is
not clear that these different semantic modifiers really benefit from
being given the same keyword.
•
Java provides no way to specify that a class member is to be
accessible to the class and its subclasses only. You can declare
members as private
, which will hide them from the rest of the
package; but it will also hide them from subclasses, whether in the same
package or not. protected
will make the member available to
subclasses, but it will also make it available to other classes in the package,
whether they are subclasses or not.
The only complete solution to this problem is to declare each class in its own
package. Since that would be very ugly, the next best thing is to structure
packages so that each consists of relatively few classes, written
in close collaboration.
Java did support, for a short time, an access modifier 'private protected',
which was pretty much the same as protected
in C++.
However, it was removed, for reasons I've never fully understood. I suspect
that the Java designers thought that the access control mechanism was
already complicated enough, and that 'private protected' did not fit
neatly into the regular progression of increasing access -- private, default,
protected, public.
•
Java supports abstract methods and static methods, but not
abstract static methods. Similarly, you can't declare an interface
method as static.
It's often claimed that 'abstract' and 'static' are logically incompatible.
You'll sometimes see this claim backed up by the (philosophically
dubious) statement
that abstract methods cannot be overridden. Abstract methods are
inherited by subclasses, and subclasses can define their own methods
with the same name and arguments. That the Java language specification does
not consider this a complete override is a matter of how the specification uses
the word 'overrides', not a logical limitation of the concept of overriding.
In Java terminology, a method only 'overrides' another if the choice of
which method to call can be made at run-time, based on the run-time type.
The overriding of static methods is not a true override (in these terms)
because the call decision is based on compile-time information only.
Whatever the rights or wrongs of the way Java uses the term 'overrides',
that 'abstract static' methods
are not logically incoherent is indicated by the fact that some languages,
e.g., Smalltalk, do provide such a construct.
•
It is syntactically legal in Java to cast an object to an interface
even if the object's class does not implement that interface.
The same is not true for casting an object to another class type. Of course,
the cast will fail at run-time; but it's odd that it is not rejected by
the compiler even in cases where it is perfectly clear that the class in
question does not implement the interface.
•
Java interfaces can optionally be declared 'abstract', as can their
methods. The 'abstract' modifier in such cases has no meaning and is
ignored. Similarly, interface methods can be declared 'public', but
such a declaration has no effect.
•
Java allows an abstract class to be defined with a public constructor.
Because the class can not (by definition) be instantiated, the use of a
public constructor is meaningless. In fact, any access specifier except
'protected' in the C++ sense is meaningless -- the Java version
of protected makes no sense either, because this would normally extend
access not only to subclasses, but to other classes in the same package.
But these other classes won't be able to instantiate the abstract
class either, because it is abstract.
•
All the classes in the collections framework are parameterised, so
for example, Collection.add() is defined as
boolean add(E e)What this means is that if I create an
ArrayList
like
this:
ArrayList<Integer> list = new ArrayList<Integer>();Then I'll get an error from the compiler if I try to add something to the list that cannot safely be converted to an
Integer
.
This, for example, fails:
list.add (new String ("Hello"));The
remove()
method, however, is not parameterised; it
takes an Object
argument. So this is legal:
String s = "hello"; list.remove (s);Since one of the main reasons for using parameterised classes is to increase compile-time type security, declaring
remove
this way appears to be a missed opportunity.
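The type hole can be demonstrated directly (the helper method name is invented for illustration): removing an object of an unrelated type compiles cleanly and silently does nothing.

```java
import java.util.ArrayList;

// Sketch of the type hole in Collection.remove(Object): removing an
// object of an unrelated type compiles and silently does nothing.
public class RemoveDemo {
    public static int sizeAfterBogusRemove() {
        ArrayList<Integer> list = new ArrayList<Integer>();
        list.add(Integer.valueOf(1));
        // add() is checked: list.add("hello") would not compile.
        // remove() takes a plain Object, so this compiles, finds no
        // match, and returns false.
        list.remove("hello");
        return list.size();   // still 1; the call was a silent no-op
    }
    public static void main(String[] args) {
        System.out.println(sizeAfterBogusRemove());
    }
}
```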
•
The toArray()
method on Collection
, like
the remove()
method, also has type-security problems.
There is a variant of toArray
that returns an
Object[]
-- the problem there should be clear enough.
There is also a toArray()
method that takes an array
and returns an array of the same type:
<T>T[] toArray (T[] a)But all that does is apply a downcast to each element in the collection; at compile time there is a check that the left-hand-side of the assignment from
toArray()
matches the type of the
argument a
, but there is no compile-time check that
T
is the proper type for the contents of the collection.
So this code compiles perfectly well:
ArrayList<Integer> list = new ArrayList<Integer>(); list.add (new Integer(1)); String[] s = list.toArray(new String[0]);It fails at runtime with an
ArrayStoreException
.
What is needed -- and is not provided -- is the method
interface Collection<E> { E[] toArrayOfCollectionType(); }In fact, this method cannot easily be provided because of the internal implementation of generic classes -- see here for a further discussion of this point. • That Java lacks an equivalent of C's
typedef
construct is
well known and, in most cases, it is not missed.
However, consider the following situation. We need to define a variable
to store an integer quantity, and its range is not well known at the outset.
It's easy enough to change an instance variable from
short total_lines;to
int total_lines;What's less straightforward is fixing all the casts that Java requires us to use when doing arithmetic with anything other than
double
or int
.
For example:
short total_pages = 10_000; short total_sheets = total_pages / 2; // Error short total_sheets = (short)(total_pages / 2); // That's betterThe second line won't compile because the result of
total_pages / 2
is an int
, not a short
, even though dividing
a short
by two (or indeed by any integer other than -1)
must produce a result that fits into a short
.
The problem arises when lines 1 and 3 in our previous example are,
in reality, a thousand lines apart, or in separate source files.
Suppose we decide at some point in the future that total_pages
(whatever that represents) needs to be an int
, and not
a short
. How do we find all the other lines that might be
affected by such a change? Worse, what happens if we don't find them?
The problem is that it's not an error to write code like this:
int total_pages = 10_000; // 1000 lines... short total_sheets = (short)(total_pages / 2); // Compiles without complaintHere, the
int
total_pages
very possibly won't
fit into
a short
any more, even when divided by two;
and the cast -- which was previously
essential to make the code compile -- now stops the compiler warning us
that we've made a mistake.
This is, to be fair, a general problem of change control: it's hard to
keep track of all the subtle dependencies in a complex program.
In C/C++, the conventional
way to deal with problems of this type is to typedef
an
application-specific data type, and define such variables as are necessary,
and their associated casts, to be of this custom type. Thereafter,
to change the
variable range, all we have to change is one typedef
. This
approach is far from perfect, as well -- no approach is ideal. However,
Java lacks any efficient strategy for dealing with number type changes, even
an imperfect one.
There is, of course, an inefficient strategy, which is
to define your number variables in terms of custom wrapper classes,
not primitives.
We could create a class called MyInt
to hold an int
,
then define a subclass that extends MyInt
for a specific
kind of number that the application uses.
If an int
later proves to be inappropriate, we
could create a new wrapper class -- let's say MyLong
--
and change the specific number class to derive from that instead.
This task would be less inefficient -- at development time anyway --
if we could use the built-in classes Integer
, etc.
Unfortunately, these are all defined as final
, so we
can't. In any case, this kind of approach is likely to be inefficient
at run-time, if the application does a lot of arithmetic.
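A minimal sketch of this wrapper-class strategy follows. The names MyInt and PageCount are invented for illustration; the point is that a later change of the underlying primitive touches only the wrapper class.

```java
// Sketch of the wrapper-class strategy: an application-specific number
// type built on a custom wrapper, so the underlying primitive can be
// changed in one place.
public class WrapperDemo {
    static class MyInt {
        private final int value;   // change this to long here, and only here
        MyInt(int value) { this.value = value; }
        int get() { return value; }
        MyInt div(int divisor) { return new MyInt(value / divisor); }
    }
    // An application-specific number type, as the text suggests
    static class PageCount extends MyInt {
        PageCount(int value) { super(value); }
    }
    public static int sheetsFor(int pages) {
        PageCount totalPages = new PageCount(pages);
        return totalPages.div(2).get();   // no casts anywhere
    }
    public static void main(String[] args) {
        System.out.println(sheetsFor(10_000));
    }
}
```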
•
Java has no destructors. In most object-oriented programming languages,
a class can have a destructor which is the logical counterpart of the
constructor. Typically the destructor is called when the object goes
out of scope, or is specifically deleted. Java has no object delete
operation, and objects that go out of scope are not necessarily
considered inaccessible.
The lack of a destructor makes it impossible to develop Java applications
according to the 'resource acquisition is initialisation' (RAII) model, in
which resources are allocated in the constructor and freed in the destructor.
However, the new 'try-with-resources' construct provides a feature
which is almost a destructor.
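The destructor-like behaviour can be sketched like this (the Resource class is invented for illustration): close() runs when the block exits, whether normally or via an exception.

```java
// Sketch: try-with-resources gives destructor-like cleanup for anything
// implementing AutoCloseable.
public class ResourceDemo {
    public static String use() {
        StringBuilder log = new StringBuilder();
        class Resource implements AutoCloseable {
            Resource() { log.append("open;"); }           // constructor acquires
            public void close() { log.append("close;"); } // the near-destructor
        }
        try (Resource r = new Resource()) {
            log.append("use;");
        } // close() is invoked here automatically
        return log.toString();
    }
    public static void main(String[] args) {
        System.out.println(use()); // open;use;close;
    }
}
```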
•
The method Class.newInstance()
can potentially throw any
exception, and attempting to handle exceptions elegantly at compile time is
extremely difficult. You might know, for example, that the class you
want to instantiate has a constructor that throws SQLException
,
but you won't be able to declare a catch
block for this
exception if you instantiate the class using Class.newInstance
.
This deficiency can be used to create a utility class that can throw
any kind of exception from any point in the code, with no compile-time
checking at all (should you ever need to do such a thing).
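A sketch of the mechanism (class and method names invented for illustration): Class.newInstance() propagates any exception thrown by the constructor, including checked ones, without the caller having to declare or catch them. (newInstance() was deprecated in Java 9 for exactly this reason; Constructor.newInstance() wraps the exception in InvocationTargetException instead.)

```java
import java.io.IOException;

// Class.newInstance() propagates the constructor's checked exception
// without any compile-time checking in the caller.
public class SneakyDemo {
    public static class Thrower {
        public Thrower() throws IOException {
            throw new IOException("smuggled out");
        }
    }
    public static String provoke() {
        try {
            Thrower.class.newInstance();
            return "nothing thrown";
        } catch (ReflectiveOperationException e) {
            return "declared exception";   // the only exceptions newInstance declares
        } catch (Exception e) {
            // We cannot write 'catch (IOException e)' here: the compiler
            // rejects a catch block for a checked exception that is not
            // declared to be thrown. Only a supertype catch will do.
            return e.getClass().getSimpleName();
        }
    }
    public static void main(String[] args) {
        System.out.println(provoke());
    }
}
```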
•
Java does not support operator overloading... except where it does.
The + operator is overloaded for binary operations between a
String
and most other
built-in types, as is the += operator. The = operator is overloaded
for String
and number classes (it does a value
assignment, not the usual reference assignment). All number
comparisons are overloaded for the number classes, except ==.
The not (!) operator is overloaded for the Boolean
class.
•
Constant expressions are evaluated at compile time,
even when they refer to different classes. This is in contrast
to almost all other Java operations between classes, which are
dynamically linked and evaluated at runtime. Consider this
definition:
public class Lib { public static final int answer = 42; }When another class refers to
Lib.answer
, the compiler
silently replaces the expression with '42'. This is fine, until we
modify the value of answer
. Even if we recompile the
Lib
class, users of that class continue to use the old value until they,
too, are recompiled.
This problem usually bites when upgrading or patching libraries. You
might think that, Java being what it is, you could just change the libraries,
and the rest of the code could remain unchanged. However,
the answer
example shows that this is not so, in general.
This finding does not usually surprise C/C++ developers, who are used to
working in an environment where modules are, on the whole, statically
linked. In Java, however, we've got used to almost everything
being dynamically linked; the fact that references to constants are
not dynamically linked can come as a bit of a surprise.
•
Contrary to widespread belief, the Java compiler does
support conditional compilation. In C/C++, it's common to see
constructs like this:
#ifdef DEBUG //... lots of debug code #endif //DEBUGThe value of DEBUG is optionally set at compilation time and, if it is not defined, the debug code is not only never executed, it is never even included in the compiled output. The purpose here is to reduce both the overhead of checking the value of DEBUG at runtime, and the size of the compiled program. Java has no preprocessor that can exclude code at compile time, but it does have a rudimentary conditional compilation mechanism all the same.
static final boolean debug = false; if (debug) { // Lots of debug code }As in the C example above, the code in the
if() {...}
will not only never be executed, it will be eliminated from the
compiled output. Setting debug=true
will include the
code, and execute it at runtime.
That this technique is not well known is evidenced by the number of
published mechanisms there are for simulating conditional compilation
in Java -- some quite complicated. I confess that I only understood
it myself when I was experimenting to find out why the compiler did
not complain about unreachable statements in the block
if(false){ ... }
. It's only the if
block
that is optimized in this way; the following code won't even compile:
static final boolean debug = false; while (debug) { // Compiler complains about unreachable code here }
if (false) {...}
is one of the few constructs where
code is unreachable and the compiler does not complain; another is
in catch
blocks that can never be entered (since
Java 7; see below).
•
To support (presumably) the new exception re-throwing rules in Java 7,
if an Exception
or a subclass of RuntimeException
is declared to be caught, but then re-thrown intact, the method
from which it is re-thrown does not have to declare it. Consider
the code below, in which an exception (of class Exception
)
is explicitly thrown from the method fail()
. The
method does not have to expose the exception in its own signature,
even though it is ostensibly of checked type.
public class Test { public void fail() { try { int i = 1/0; } catch (Exception e) { System.out.println ("Caught: " + e); throw e; } } public static void main (String[] args) { new Test().fail(); } }Similarly, the
main()
method does not have to handle
the exception, even though it is -- apparently -- of a checked
class.
In Java 1.6 it would be necessary to declare the method
fail()
with throws Exception
.
So what's going on here? Since it's now possible to re-throw an
exception which is ostensibly a superclass of the exception which the
method is declared to throw, the Java compiler has to be much more
thorough about working out which exceptions can really
be thrown in a code block, rather than relying on what the developer
says. In the example above, the compiler knows that an integer
division by zero is possible, but that is an unchecked exception. So
although the developer has said throw e
, where
e
is of (checked) type Exception
, the
compiler knows that no checked exception can be thrown from
this method, despite what the developer has written. And since the
exception that can be thrown is unchecked, the compiler quietly ignores
it.
Moreover, if the try
block contains no code that can
raise any exception at all, then the catch
block is
never even compiled. If there is code in it, then the compiler does
not warn about unreachable statements -- this is another of the very
rare places where you can write an unreachable statement without the
compiler complaining.
•
The return
statement in Java does not necessarily cause an
immediate exit from a method -- it is overridden by a finally
.
For example, this method returns 1, not 0:
int go() { try { return 0; } finally { return 1; } }The complication is clear enough when the
try
and the
finally
are only a few lines apart. In practice, however,
it could be quite difficult to understand why the return 0
was never apparently executed. It can be argued that the use of
finally
is intrinsically likely to lead to a flow of control
that is hard to follow. That finally
exists at all is really
a consequence of the fact that Java does not have destructors (see here).
Lacking destructors, we need some way to tidy up resource allocation
that happened in the try { ... }
, and which is independent
of whether an exception is thrown or not. With luck, the new
'try-with-resources' construct in Java 7 will make the use of
finally
less necessary.
•
At last, Java has support for binary literals. It's odd, however,
that you can't initialize a short
or a byte
from
a negative binary literal, while you can from a positive one.
It's even odder that the same restriction does not apply to an
int
.
short short1 = -32768; // Yes short short2 = 0b1000_0000_0000_0000; // No byte byte1 = -128; // Yes byte byte2 = 0b1000_0000; //NoThe lines marked 'no' fail in the compiler for a possible loss of precision. The lines marked 'yes' pass, even though they represent exactly the same number. • Java allows arrays to be cast to different types, in some circumstances. I'm talking about the array itself, here -- not the individual elements of the array (which can, of course, be cast according to their own rules). However, there are some oddities in which kinds of cast are allowed and which are not. An array of objects of a particular class can be cast to an array of objects of a superclass. This is always legal, both at compile time and at runtime. For example, if
Vehicle
is the superclass of
Car
this succeeds:
Car[] c = new Car[1]; Vehicle[] v = (Vehicle[]) c;If we immediately follow this with the following statement:
v[0] = new Vehicle(0); // FailsIt fails at runtime, as it should: despite the cast,
v
is
known at runtime to be an array of Car
.
An attempt to cast an array of one type to an array of an unrelated
type will not compile:
Car[] c = new Car[0]; Bicycle[] b = (Bicycle[])c; // "Inconvertible types"However, an attempt to cast to a subtype is also legal at compile time:
Vehicle[] v = new Vehicle[1]; v[0] = new Car(); Car[] c = (Car[]) v; // FailsBut it fails at runtime regardless of the array contents. In this example,
v
contains no element that is not a
Car
, and yet we can't cast it to a Car[]
,
even though it compiles correctly. Since the JVM takes no account of
the contents before rejecting the cast, one has to wonder why the compiler
cannot reject the construct itself.
Although the compiler is smart enough to block casts to incompatible,
unrelated array types, the preceding rules mean that we can fool
the compiler by casting through a common supertype, like this:
Bicycle[] b = new Bicycle[0]; Object[] o = (Object[])b; Car[] c = (Car[])o; // We have cast Bicycle[] to Car[]The code will fail at runtime because, as we've seen, you can't cast an array to a subtype at runtime, regardless of the contents. Arguably, if you're prepared to go to these lengths to fool the compiler into allowing something it would ordinarily choke on, you get what you deserve. Intuitively, the rules about casting between array types make some kind of sense. An array of cars is also an array of vehicles, and it makes sense to be able to treat it as one, at least until we try to put a vehicle in it that is not a car. On the other hand, an array of vehicles is conceptually different from an array of cars, even if all the vehicles happen to be cars at some specific point in time. Maybe. The problem is that, even though the rules make some kind of sense, they are not followed by the collections framework. • Casts between
ArrayList
types do not
obey the same rules as casts between arrays.
Broadly speaking, an ArrayList<Integer>
is
comparable to an Integer[]
, and the same broad
equivalence exists for other
ArrayList
s of objects. We've already seen that
Java allows an array to be cast
to an array of a supertype. What about ArrayList
? It
turns out that this fails at compile time:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList<Vehicle> v = (ArrayList<Vehicle>) c; // "Inconvertible types"Remember that the equivalent cast from
Car[]
to Vehicle[]
was valid, both at compile time and
-- to some extent -- at runtime. It turns out that an
ArrayList<Car>
does not cast the same way as
a Car[]
.
What's worse, this is legal:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList v = (ArrayList) c; //LegalIt gives a warning at compile time, because we're casting a typed
ArrayList
to an untyped one, but it does compile and
run.
Having cast the class, we find that all type safety is lost:
ArrayList<Car> c = new ArrayList<Car>(); ArrayList v = (ArrayList) c; //Legal v.add (new Bicycle()); // LegalI suspect that the reason that
ArrayList<T>
casts fail
at compile time, even in circumstances where the equivalent cast for
arrays would be legal, is because type safety cannot be guaranteed
at runtime. In the earlier array example, of trying to insert a Vehicle
into an array whose real type was Car[]
, the code correctly failed at runtime,
because the JVM knows what kind of data the array is specified to
carry. However, Java generic types (ArrayList
and all
the rest of the collections framework), exhibit type erasure.
That is, all the type checking is done at compile time; at runtime
an ArrayList<Car>
is nothing more than an
ArrayList<Object>
. Consequently, compile-time
checks must be more rigorous for collections than they are
for arrays.
•
String.replace()
and String.replaceAll()
both replace all instances of one character sequence with another.
The difference is that replaceAll()
operates on regular
expressions, while replace()
works on fixed strings.
This is not at all obvious from the names, and it is a common mistake
to write something like:
String windowsFilename = unixFile.replaceAll ("/", "\\");This fails because
\
has a special meaning as a regular
expression replacement token.
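The fix can be sketched directly (the helper name is invented for illustration): replace() treats both arguments as literal text, so it is the right tool for converting path separators.

```java
// replace() works on fixed strings -- no regular expressions involved --
// so a literal backslash in the replacement needs no extra escaping.
public class ReplaceDemo {
    public static String toWindows(String unixPath) {
        return unixPath.replace("/", "\\");
    }
    public static void main(String[] args) {
        System.out.println(toWindows("usr/local/bin"));  // usr\local\bin
    }
}
```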
•
Unlike many other object-oriented languages, methods called from
constructors in Java are called polymorphically. What this means is
that if class Child extends Parent
, and
the constructor of Parent
calls a method which
Child
overrides, it is the overriding method
in Child
that gets called
even though it is Parent
that is being initialized
at that time. This is perfectly consistent with the way method
calls generally work in Java, but it means that methods in the
subclass can get called before the base class has been initialized,
and that is generally a bad, and confusing, thing.
Here is a trivial example:
class Base { final int x; void test() {}; Base () { test(); x = 1; } } public class Test extends Base { void test() { System.out.println ("x = " + x); } public static void main (String[] args) { new Test(); } }This code prints 'x=0', because at the time
Test.test()
is called,
the constructor for Base
has not yet completed.
•
By forbidding multiple inheritance, Java avoids most of the
problems that arise when a method overrides another method that
is supplied by multiple base classes. But not all of them.
Consider this code:
public interface I1 { void log () throws IOException; } public interface I2 { void log () throws SQLException; } public class Test implements I1, I2 { public void log () throws ??? { } }
Test.log()
implements I1.log
and I2.log
.
What exceptions can it be declared to throw? Common sense should suggest
that it should be able to throw either SQLException
or
IOException
; or perhaps it should be able to throw only the
common base class of these two exceptions (Exception
). In
fact, it cannot throw any exception. In a situation like this,
where a method is specified in multiple interfaces, the implementing
method can throw only an exception that is specified in
all base types.
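A compilable sketch of the situation (names chosen to match the fragment above): since neither IOException nor SQLException appears in both interfaces, the implementing method can declare no checked exception at all.

```java
import java.io.IOException;
import java.sql.SQLException;

// The implementing method may declare only exceptions permitted by
// every interface it implements -- here, none.
public class MultiDemo {
    interface I1 { void log() throws IOException; }
    interface I2 { void log() throws SQLException; }

    static class Test implements I1, I2 {
        // 'throws IOException' or 'throws SQLException' would not compile:
        // each would violate the other interface's declaration.
        public void log() { System.out.println("logging"); }
    }

    public static String tryIt() {
        new Test().log();   // no checked exception to handle
        return "ok";
    }
    public static void main(String[] args) {
        System.out.println(tryIt());
    }
}
```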
•
null
can be assigned to a variable of any class type,
without a cast. And yet, null
is not an instance of any
class -- (null instanceof Something)
is false, whatever
the Something
is.
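The asymmetry can be sketched in a few lines (the helper name is invented for illustration):

```java
// null is assignable to any reference type, yet instanceof is false
// for every type when the left-hand operand is null.
public class NullDemo {
    public static boolean check() {
        String s = null;   // no cast needed, for any class type
        Object o = null;
        return (s instanceof String) || (o instanceof Object);  // both false
    }
    public static void main(String[] args) {
        System.out.println(check());
    }
}
```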
•
The defects in the Calendar
and Date
classes
are so many, and so well-known, that it seems cruel to give them further
exposure. But just in case there can possibly be any Java programmer who
has not
fought bitterly with these classes, here are just a few.
- Calendar.set()
does not check its arguments for sanity.
It's legal, for example, to ask for the 13th month in the year.- The
Calendar
and Date
APIs interpret
numbers differently. For example, Calendar
, on the whole,
counts years from year zero, while Date
counts from 1900.- Some methods are unhelpfully named. For example,
Date.getDay()
returns the day of the week, and it is getDate() that returns the day of the month.- Quantities like month are conceptually not integers, they are
enum
s.
Unfortunately, the date/calendar API predates Java's enum support by a
decade, and the API uses plain integers for almost everything.
- Month numbers are zero-based
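The missing sanity check can be demonstrated directly (the helper name is invented for illustration): Calendar is 'lenient' by default, so the 13th, zero-based month silently rolls over into the following year.

```java
import java.util.Calendar;

// Calendar.set() accepts a non-existent month without complaint and
// normalizes it into the next year.
public class CalendarDemo {
    public static int yearAfterMonthThirteen() {
        Calendar cal = Calendar.getInstance();
        cal.clear();
        cal.set(2020, 12, 1);           // month 12 does not exist (months are 0-11)
        return cal.get(Calendar.YEAR);  // 2021, not an error
    }
    public static void main(String[] args) {
        System.out.println(yearAfterMonthThirteen());
    }
}
```

Calling `cal.setLenient(false)` before `get()` would instead make the bogus field value throw an IllegalArgumentException.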
•
The use of anonymous inner classes gives rise to non-intuitive scope
resolution rules (which have changed between Java 6 and 7). For
example, this code does not compile with Java 6, and does with Java 7:
class Something { public void doIt (int n) { } public void doIt (int n, int m) { } } public class Test { public void doIt () { } public void go() { new Something() { public void run() { doIt(); } }.run(); } }With Java 6, the compiler says:
Test.java:23: cannot find symbol symbol: method doIt() doIt(); ^If we change the name
Test.doIt()
to anything else, and
the corresponding call to doIt
in the run()
method, the code compiles correctly; so it's not that there's anything
wrong with the declaration itself, and the method is in scope.
The problem is that two other doIt
methods are also in
scope in the run()
method. run()
is a method
added to an anonymous inner class which subclasses Something
so, in the run()
method, class Something
is
the enclosing scope.
Although Java happily supports method overloading, it won't look for
matching method signatures that are in different scopes. So
once the compiler has seen doIt(int)
and
doIt(int,int)
, it will stop looking. Neither of these
method signatures match the call (no arguments), so the compilation
fails.
The problem is easy to spot in an example like this, where all the
methods are within a few lines of each other; but it can be a real
stumper when it arises in a real application.
Although this behaviour is counter-intuitive, it's actually perfectly
logical and in compliance with the Java language specification.
The really odd thing is that the code above compiles perfectly well
in Java 7. That suggests that the scope resolution rules have been
subtly altered, but the change does not appear to be documented.
•
The method java.io.OutputStream.write(int) writes a byte
writes a byte
to the stream, not an int
. The high 24 bits are
ignored completely.
•
Unlike in C, the short-circuit behaviour of logical
AND and OR operators is well-defined in Java. Because it is well-defined,
curiosities are exposed which are hidden in C. One such curiosity
is that the bitwise operators &
and | do not
short-circuit. In
if (a() | b()) { //... }Both methods are called, even if
a()
evaluates to true.
If the || operator had been used, b()
would not have
been called.
This is all perfectly sensible -- in most cases you wouldn't want to
use the bitwise operator for a logical comparison. The complication is
that if the types being compared are boolean
then
| and || (and & and &&) are equivalent -- except for
short-circuiting. So for boolean
comparison, we have
available both short-circuiting and a non-shortcircuiting logical
operators -- something that is not true for other data types.
It is arguable whether Java needs bitwise operators at all -- I
suspect they only exist to make Java easier for C programmers to
learn. But the bitwise operators in Java are emasculated.
You can't, for example say:
if (flags & MASK) { //... }It would perhaps have been better if bitwise operators had been relegated to a library class. Arguably, it would have been better had the logical operators not short-circuited, because this merely saves a few keystrokes at the expense of making code harder to troubleshoot. • Java defines an interface
CharSequence
to represent
a sequence of characters. It is implemented by String
,
StringBuilder
, and StringBuffer
. However,
this interface appears to be of very little practical use. Hardly any
methods in the Java API take a CharSequence
as an
argument. For example, there is no
indexOf
method that takes a CharSequence
in
any of the standard string-handling classes, so comparing strings of
different type requires a bunch of ugly toString()
calls.
•
Java uses 16-bit unsigned values (or, at least values with a 16-bit unsigned
range) to represent unicode characters. Unfortunately, the defined
unicode code point set no longer fits into 16 bits. The UTF-16 specification
defines a mechanism called surrogate pairs for storing
larger values. In essence, certain 16-bit values that are not used for
normal characters are used to indicate that the following value
is from a subsidiary table of characters.
Java does little to hide these implementation details. In the String
class, characters beyond the 16-bit range are passed around as int
s,
not as a specific character type or class, and developers are left to
handle the details. Of course, you can still call the char
-based
methods, such as:
char charAt(int index);But these will give misleading results if the string contains surrogate pairs. Consider this example:
// Define three strings, each with a single character String sLatin = "A"; // Unicode code point < 256 String sChinese = "東"; // Unicode code point < 65536 String sLinearB = new String (Character.toChars (0x10400)); // code point >= 65536 // Make a single, three-character string out of these three characters String s = sLatin + sChinese + sLinearB; // How long is it? System.out.println ("Broken length=" + s.length()); System.out.println ("Real length=" + s.codePointCount(0, s.length()));This produces the following output:
Broken length=4 Real length=3Notice that even the very fundamental
String.length()
fails
here. The length of the string in characters is clearly 3, because we
made the string s
from three characters. The problem is that
length()
does not return the number of characters, but the
number of 16-bit units of storage. The Linear B character whose code point
is 0x10400 will not fit into 16 bits -- it requires a UTF-16 "surrogate
pair" of 16-bit code units.
To be fair, this oddity is documented, and so long as the programmer
understands the subtle difference between "Unicode code point"
and "UTF-16 code unit" then there should be no problem, right?
The reality is that, if there is any possibility whatever that textual
input might contain Unicode code points beyond 65535, then the programmer
must abandon the traditional ways of working with strings in Java, in
favour of the full Unicode API. This means, for example, using
codePointCount()
rather than the simple length()
.
Most fundamentally, we can't assume that a Unicode value will actually
fit into a char
, even though this data type was specifically
designed to hold a Unicode value. Any code that manipulates individual
characters should store them in an int
, and use the specific
methods on classes like Character
to manipulate them.
This puts Java back where C was twenty years ago, when it became increasingly
obvious that all the world's characters would not fit into a character set
represented by a single byte. The numbers are much bigger now, of course,
but the problem is the same. The way we tackled this in C/C++ was,
essentially, to ignore it. We left the core language unchanged, and provided
a variety of libraries that handled unicode using the basic integer data
types we already had.
Unfortunately, Java does have a core language that is supposed
to make unicode support reasonably natural and transparent. When we work
with unicode in C++ (and especially in C) we know we're going to be
in a world of hurt; with Java, developers expect to be protected from this
nastiness and, increasingly, they aren't.
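The int-based way of working described above can be sketched as follows; countLetters is an invented helper that walks a string one code point at a time rather than one char at a time.

```java
// Iterating a string by code point, so that characters outside the
// Basic Multilingual Plane are handled correctly.
public class CodePointDemo {
    public static int countLetters(String s) {
        int letters = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);        // an int, not a char
            if (Character.isLetter(cp)) {     // Character has int overloads
                letters++;
            }
            i += Character.charCount(cp);     // advance by 1 or 2 code units
        }
        return letters;
    }
    public static void main(String[] args) {
        // "A" plus a Deseret letter (U+10400), which needs a surrogate pair
        String s = "A" + new String(Character.toChars(0x10400));
        System.out.println(s.length());       // 3 code units
        System.out.println(countLetters(s));  // 2 characters
    }
}
```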
•
The method InetAddress.isReachable()
is supposed to test whether
one host is reachable from another. The documentation says that it will
use ICMP if the user has the appropriate permissions, or a TCP echo if not.
On Linux, however, the JVM only uses ICMP if the user is root,
even though an unprivileged user can make ICMP requests on Linux. Admittedly,
this isn't allowed by default on Linux, but it's readily configured. The
result is that the JVM refuses even to try something that might be allowed.