Lua quick-start for Java programmers

This article provides the barest essentials of Lua programming, for developers who are familiar with Java (or, perhaps, C/C++). Despite its simplicity, Lua is a fully-featured programming language with an extensive run-time library, and it's impossible to do it justice in a short article. However, I think it's possible to give enough of the basics to give an experienced developer a running start.

Note:
Lua is often found embedded in other applications. There are several different versions in widespread use, all built in slightly different ways. It's better, I think, to build Lua from source if practicable, because then you'll know exactly what you're getting. It's not like building gcc or Perl from source -- Lua is implemented in only 20 000 lines of C, and builds in a few seconds on a modern desktop system.

Fundamental implementation

Lua is traditionally an interpreted scripting language -- there is no separate "compile" phase of development. Lua doesn't really have anything that corresponds to Java's notion of a class file. There is a compiler (luac) that can produce Lua byte-code, and the standard interpreter compiles source to the same byte-code internally and executes it on a register-based virtual machine. However, the byte-code format is an implementation detail that is not portable between Lua versions, so it does not play the role of a distribution format in the way that Java's class files do.

There is a spin-off of Lua called LuaJIT, which implements a just-in-time compiler for the Lua language. This mode of operation is more similar to the way Java works except that, again, there is no separate compilation phase.

Like Java, and like most scripting languages, Lua uses garbage collection (GC) for memory management. Lua's GC scheme is an incremental mark-and-sweep collector (recent versions also offer a generational mode), and is much simpler than Java's. As in Java, it's possible to break the garbage collector by overloading it, but it seems to be quite difficult. That's not necessarily because Lua's GC scheme is more robust than Java's -- rather, I don't think that Lua is called on to carry out the same kind of heavyweight tasks that Java often is.

The Lua language syntax is block-structured, as most modern procedural languages are. It's a verbose language compared to Java -- more like BASIC or Algol. However, most symbols -- parentheses, brackets, arithmetic operators, etc -- have meanings that will be completely familiar.

Lua is dynamically typed -- values have types, but variables do not, so it is neither necessary nor possible to stipulate that a variable holds only a specific type.

Unlike Java, a Lua program does not have to start in a specific class -- Lua does have a concept of class, but it's not as central as it is in Java. Instead, execution begins with the first statement that can actually be executed.

Like Java, C++, and Perl, and unlike Python, Lua has no regard for program layout and indentation. You can format your source files however you like.

Hello, World

This is the canonical "Hello, World" for Lua.

print ("Hello, World")

The following are equivalent

print ('Hello, World')
print 'Hello, World'

You won't always get away with writing a function call without parentheses, so it might be better to avoid this particular idiom.

The standard Lua print() is broadly similar to Java's System.out.println() -- there's no way to suppress the terminating end-of-line character. So another alternative is:

io.write ("Hello, World\n");

io.write() is a more flexible function than print, but print is convenient for debugging, because it can print things other than text strings without explicit conversion.

Namespace and packaging

Broadly, Lua has a flat namespace. However, the practice of grouping functions into modules gives an illusion of separate namespaces. For example:

foo=require ("foo");
foo.some_function ("bar");

foo is not a namespace here, despite appearances: it's a table. There's no particular reason to be concerned about the specific implementation details, unless you're working with object-oriented programming -- more on this later -- but, essentially, [table_name].[item_index]() is a valid function call in Lua.

The require() function can load various types of module -- Lua source code, Lua bytecode, native code -- each with its own search path. When a module is loaded, its code is run, and whatever value it returns becomes the result of require(). By convention, that value is a table that associates each function name with its implementation.

This module.function naming convention applies not only to modules loaded explicitly, but also to many built-in functions. So we have, for example, math.sin and io.read. Again, there's not really a "math" or an "io" namespace, in the sense of Java or C++.

As modules are the only built-in facility for organizing functions, they are used a lot.
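
To make the "a module is just a table" point concrete, here is a minimal sketch of what a module might look like from the inside. The module name foo and the function some_function are taken from the example above; the function body is invented for illustration. Normally the module code would live in its own file, foo.lua; registering it in package.preload is just a way to keep the sketch in a single file.

```lua
-- Stand-in for a separate file foo.lua: require() looks in package.preload
-- before it searches the file system.
package.preload["foo"] = function ()
  local M = {}                     -- the table that will play the "namespace" role

  function M.some_function (s)     -- stored under the key "some_function"
    return "foo says " .. s
  end

  return M                         -- require() hands this value to the caller
end

local foo = require ("foo")
print (type (foo))                 -- table
print (foo.some_function ("bar"))  -- foo says bar
print (foo == require ("foo"))     -- true: the result is cached in package.loaded
```

Because foo is an ordinary table, nothing stops a caller adding to it or replacing its entries, which is one practical difference from a Java package.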

There is a package manager with dependency management and online repository called LuaRocks. It has a similar function in Lua to that of Maven in Java. I can't comment on it because I don't use it (I don't use Maven either, if I can avoid it) -- I don't like automated dependency management. I'd rather have the pain of figuring out all the dependencies myself, and know exactly what I'm using, and where it comes from. Still, I appreciate that I am a minority voice in this area, and LuaRocks is widely used.

Data types

Lua data types are nil, boolean, number, string, function, userdata, thread, and table. As in Java, data types can be extended using classes, but the fundamental types are fixed.

'nil' is both the type and the value of an unassigned variable. In Java, 'null' is a value, but not a type.

'boolean' has exactly the same meaning in Lua as it does in Java.

'number' is potentially confusing. Historically, Lua supported only floating-point number representations. Lua 5.3 added support for integers, but integer and floating-point are two representations (subtypes) of the single 'number' type, not separate types. A numeric literal with no fractional part or exponent is stored as an integer. However, arithmetic operations that could conceivably produce a non-integer result, such as division with /, always produce a floating-point value. This scheme attempts to provide the simplicity of a single number representation, with the speed of integer arithmetic, but it can be confusing, particularly to developers who are used to statically-typed languages. In Lua 5.3 and later, the math.type() function reports whether a particular number is being represented as an integer or in floating-point; earlier versions have only the floating-point representation.
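
The integer/floating-point behaviour is easier to see than to describe. The following is a small sketch that assumes Lua 5.3 or later, where the standard math.type() function and the // (floor division) operator are available; the variable names are purely illustrative.

```lua
-- Requires Lua 5.3 or later: math.type and // do not exist in 5.1/5.2
local a = 7        -- integer literal
local b = 7 / 2    -- "/" always produces a float
local c = 7 // 2   -- floor division of two integers stays an integer
local d = 6 / 2    -- still a float, even though the value is whole

print (math.type (a))   -- integer
print (math.type (b))   -- float
print (math.type (c))   -- integer
print (math.type (d))   -- float
print (a == 7.0)        -- true: comparison is by mathematical value
```

Note that type() itself returns "number" in every case; only math.type() distinguishes the two representations.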

Although the Lua documentation describes 'string' as a sequence of characters, that's only true in encoding schemes that use one byte per character -- more on this later. It's safer to think of a string as a sequence of bytes.

Strings can be concatenated using the .. operator. In Lua, the + operator applies only to numbers (as it should).

'function' is a first-class data type in Lua, making lambda functions easy to implement. Java has only recently had this facility, but it's been part of Lua for as long as I can remember.

'userdata' is an unstructured data block. It is mostly used for passing data through Lua between extensions implemented in C.

'thread' is actually a co-routine; Lua has no built-in support for true operating system threads, and embedding Lua in multi-threaded contexts is highly precarious.

A 'table' is an associative array -- a list of key/value mappings -- a bit like a Hashtable in Java. However, tables are used to implement other data types, particularly lists. A list is a table where the key/value pairs have numeric keys. Whether it is used as an associative array, or a simple list, a table can be indexed like a Java array. The syntax is the same as Java's -- array[index]. However, the first item in a Lua list has index 1, not 0.
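
As a brief illustration of the two roles a table can play, here is a sketch using made-up data. The names ages and dogs are hypothetical; the functions used (ipairs and the # length operator) are standard.

```lua
-- A table used as an associative array (string keys)
local ages = { rover = 3, fido = 7 }
ages["rex"] = 5                 -- add a new key/value pair
print (ages.rover)              -- 3
print (ages["rex"])             -- 5

-- A table used as a list: keys 1, 2, 3 are assigned automatically
local dogs = { "rover", "fido", "rex" }
print (dogs[1])                 -- rover (the first element)
print (dogs[0])                 -- nil (there is no element zero)
print (#dogs)                   -- 3

dogs[#dogs + 1] = "spot"        -- append to the end of the list
for i, name in ipairs (dogs) do -- visit elements 1..n in order
  print (i, name)
end
```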

Note:
Unlike Java, Lua data types have platform-specific ranges. When built on a modern Linux system, most likely both integer and floating-point numbers use 64-bit representations. However, if portability is required, this should not be relied on.

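Like Java, Lua has no unsigned number types. Lua 5.3 and later provide math.ult() for comparing two integers as if they were unsigned, but that is about the extent of the support. In practice, because integers and floating-point values are mixed so freely, whether a number is signed or not is likely to be moot.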

Lua has no specific character data type -- you can store a character's numeric value in a number. Nor is there a character literal -- the double quote "foo" and single-quote 'f' both specify a string.

String values are immutable in Lua, as they are in Java -- there is no way to change the contents of a string once it has been created.

Variable scope

A hazard for Java/C programmers using Lua is that variables by default have global scope, even if they are defined in a specific block. You need to use the local keyword to define a block-scoped variable. In practice, this means that most variable definitions should be prefixed with local.

One important exception to this principle is the loop control variable in a for loop:

> for i=1,3 do print (i) end; print (i)
1
2
3
nil

The final print shows nil because the i it refers to is a different variable -- a global that has never been assigned. The loop's i is confined to the loop.

Note:
The semicolon is essentially white-space in Lua. It often improves readability to use semicolons to separate statements, but they have no semantic effect.
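
The default-global behaviour described at the start of this section can be sketched as follows. The function and variable names are invented for the example.

```lua
function set_values ()
  leaked = "visible everywhere"   -- no "local": this creates a global
  local hidden = "confined"       -- exists only inside this function
end

set_values ()
print (leaked)    -- visible everywhere
print (hidden)    -- nil: this "hidden" is a different, unassigned global

-- "local" is block-scoped, so an inner block can shadow an outer variable
local x = 1
do
  local x = 2
  print (x)       -- 2
end
print (x)         -- 1
```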

Control structures

The "if..then..else" test has the following form:

if test then statements elseif test2 then statements else statements end

The comparison operators that can be used for tests are essentially the same as in Java, with the exception that "not equal" is ~=, rather than !=.

Lua strings are compared by their contents, not by their references as in Java. So:

> if "rover" == "rover" then print ("true") end
true

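In Java, == on strings compares references, so the equivalent test can evaluate to false for two distinct String objects with the same contents (it happens to work for two literals only because literals are interned), as many novice (and not-so-novice) Java developers have discovered to their cost.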

Because Lua is dynamically typed, you can put more-or-less any kind of expression into a test -- Java is much stricter. Unfortunately, unlike other languages that allow this kind of flexibility -- such as C and Perl -- the behaviour of Lua is not what you might expect:

> if "false" then print ("true") end
true
> if 0 then print ("true") end
true

The only values that count as false in a test are the boolean false and nil; everything else, including 0 and the empty string, counts as true. This is sometimes useful, but it's advisable to be careful about putting non-booleans into a test, even though it is syntactically allowable.

The for loop has already been introduced; its structure is pretty conventional. The same is true for the while loop:

while test do statements end

There is no do...while, but there is a repeat:

repeat statements until test
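
For completeness, here is a short sketch of the loop forms in use. The variable names are illustrative only. Note that repeat...until loops until the test becomes true, which is the opposite sense to Java's do...while.

```lua
-- Numeric for: start, limit, and an optional step (here, counting down)
local countdown = {}
for i = 10, 2, -2 do
  countdown[#countdown + 1] = i
end
print (table.concat (countdown, " "))   -- 10 8 6 4 2

-- Generic for: iterate over the elements of a list
for index, value in ipairs ({ "a", "b" }) do
  print (index, value)
end

-- while: the test is checked before each pass
local n = 3
while n > 0 do
  n = n - 1
end

-- repeat: the body runs at least once, and stops when the test is true
local count = 0
repeat
  count = count + 1
until count >= 3

print (n)       -- 0
print (count)   -- 3
```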

There is no equivalent of the switch control statement. It can be simulated in various ways, but none is very elegant. The method that executes fastest is probably to pack the case actions as functions into a table, like this:

choice =
  {
  [1] = function() print "one" end,
  [2] = function() print "two" end
  }

choice[1]();

This works because functions are first-class data elements in Lua, and because table indexing is comparatively quick. The above method works for case values of any type, not just numbers; Java's switch, by contrast, accepted only integral and enum values until strings were added in Java 7.
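
A slightly fuller sketch of the same idea, using string keys and a fall-back that plays the role of Java's default: label. The names actions and dispatch, and the command strings, are invented for the example.

```lua
-- Case actions stored in a table, keyed by string
local actions = {
  start = function () return "starting" end,
  stop  = function () return "stopping" end,
}

-- Look up the action; a missing key yields nil, which acts as "default"
local function dispatch (command)
  local action = actions[command]
  if action then
    return action ()
  end
  return "unknown command: " .. tostring (command)
end

print (dispatch ("start"))   -- starting
print (dispatch ("pause"))   -- unknown command: pause
```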

Unicode support

Unlike Java, Lua is not Unicode-aware in any meaningful way. There are functions for manipulating UTF8 data, but the basic string data type and library functions are essentially for ASCII. So, for example:

> s="x±y"
> print (#(s))
4

The two bytes needed to represent the "+/-" Unicode character in UTF8 are counted as two separate characters. This makes Lua much like C in its string handling, and not at all like Java, which is Unicode-aware from the ground up.

There's no safe way to manipulate multi-byte strings using the basic string functions, which all work in bytes. To some extent, you can get away with treating UTF8 strings like ASCII -- provided you don't have to deal with the individual characters. C programmers will, I presume, be aware of this fundamental limitation, and know how to get around it. Java developers might never have had to.
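
The UTF8 functions mentioned at the start of this section live in the utf8 library, which is part of the standard distribution from Lua 5.3 onwards. The following sketch assumes that version or later. It writes the "+/-" character as the escape \u{B1} so that the result does not depend on how the source file is encoded.

```lua
-- Requires Lua 5.3 or later (utf8 library and \u{...} escapes)
local s = "x\u{B1}y"      -- "x", PLUS-MINUS SIGN (U+00B1), "y", encoded as UTF8

print (#s)                -- 4: the length operator counts bytes
print (utf8.len (s))      -- 3: utf8.len counts code points

-- utf8.codes yields the byte position and code point of each character
for position, codepoint in utf8.codes (s) do
  print (position, codepoint, utf8.char (codepoint))
end
```

The utf8 library only counts, encodes, and decodes; functions such as string.sub and string.upper still operate on bytes, so the caution above continues to apply.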

Parameter passing semantics

Parameter passing semantics in Lua are essentially the same as in Java. Primitive data types are passed by value, that is, they are copied onto the stack of the called function. Tables, like Java objects, are also passed by value, but only the identifier of the table is passed. So a called function can modify the contents of the table, just as a called function can modify the contents of a passed object in Java, provided the object has methods that allow this. So, for example:

function foo (something)
  something["foo"]="bar";
end

local mytable = {}
foo (mytable);
print ("foo = " .. mytable.foo);

On exit from foo(), the foo field of mytable has been set. So passing a table as an argument to a function has the feel of passing by reference, even though neither Lua nor Java supports references in the strict sense.

Incidentally, this code snippet shows two of the three syntactic ways of indexing a Lua table by key: table["key"] and table.key. We will look at the third method later.
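
The difference between "the table identifier is passed by value" and true pass-by-reference shows up when the called function assigns to its parameter. A minimal sketch, with hypothetical function names:

```lua
-- Assigning to the parameter rebinds the local name only
local function replace (t)
  t = { foo = "new" }
end

-- Assigning through the parameter changes the caller's table
local function modify (t)
  t.foo = "changed"
end

local mytable = { foo = "original" }

replace (mytable)
print (mytable.foo)   -- original: the caller's table is untouched

modify (mytable)
print (mytable.foo)   -- changed
```

This is the same behaviour a Java developer would see when assigning to an object parameter inside a method.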

Co-routines

Lua does not support operating-system threads, or any form of pre-emptive or concurrent scheduling. Extensions written in C that aim to add this support are associated with numerous complications. I've learned not to try to do any kind of platform-level multi-threading in Lua -- it's just too fragile.

Of course, Lua is not alone among scripting languages in having this limitation. JavaScript is notoriously deficient in threading support as well.

Lua does offer support for co-routines. Using co-routines is not entirely unknown in the Java world but, because true threading is built into the language, developers tend to use that instead. However, sometimes co-routines are safer, and no less efficient, and this technique deserves to be better-known.

Co-routines are functions that explicitly yield control to one another at specific points. When a function has yielded, some sort of scheduler -- which has to be provided by the developer -- can select another "thread" to resume.

Broadly, Lua's co-routine support is encapsulated in four functions in the coroutine module: create(), yield(), resume(), and status(). Once a function has executed yield(), Lua will suspend it until some other function calls resume(). You can use status() to work out which co-routines are suspended (that is, yielded or not yet started) and capable of running further, and which are dead (the function has returned, or failed with an error).

What makes Lua's co-routine support particularly interesting is that yield() and resume() can pass parameters to one another. This makes it possible to pass data between "threads" in a completely safe way, without the risk of race conditions.

In the end, though, co-routines are just a framework on which a developer can build an application that uses co-operative scheduling. Most of the work has to be done by the developer, and it's a lot harder than simply saying new Thread()... in Java.
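
As a rough sketch of how the four functions fit together, the following code runs one co-routine to completion from a simple loop, passing a value in each direction on every switch. The names worker, inbox, and outbox are invented for the example, and the loop stands in for the developer-supplied scheduler.

```lua
local inbox = {}    -- values the co-routine receives from the main code
local outbox = {}   -- values the main code receives from the co-routine

local worker = coroutine.create (function (first)
  inbox[#inbox + 1] = first        -- argument of the first resume()
  for i = 1, 3 do
    -- yield() hands i * 10 to the resumer, then returns whatever
    -- the next resume() call passes in
    inbox[#inbox + 1] = coroutine.yield (i * 10)
  end
  return "done"                    -- a return value also reaches resume()
end)

local message = "start"
while coroutine.status (worker) ~= "dead" do
  -- resume() returns true plus the yielded (or returned) values
  local ok, value = coroutine.resume (worker, message)
  outbox[#outbox + 1] = tostring (value)
  message = "got " .. tostring (value)
end

print (table.concat (outbox, " "))    -- 10 20 30 done
print (table.concat (inbox, ", "))    -- start, got 10, got 20, got 30
print (coroutine.status (worker))     -- dead
```

Only one of the two pieces of code is ever running at a time, which is why no locking is needed around inbox and outbox.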

Closures, etc

Because a function is a first-class language element in Lua, functions can be passed as arguments to other functions, or stored in tables. This is valid Lua, for example:

function do_three (something)
  for i=1,3 do
    something (i);
  end
end

do_three (function(x) print ("hello " .. x) end);

Until the introduction of lambda functions in Java 8, we would have had to get this effect in Java using interfaces implemented by anonymous inner classes, which was very verbose.
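
A function defined inside another function can also capture the enclosing function's local variables, which is what makes it a closure. A minimal sketch, using a hypothetical make_counter function:

```lua
-- Each call creates a fresh "count" variable, captured by the inner function
local function make_counter ()
  local count = 0
  return function ()
    count = count + 1   -- updates the captured variable, not a global
    return count
  end
end

local c1 = make_counter ()
local c2 = make_counter ()

print (c1 ())   -- 1
print (c1 ())   -- 2
print (c2 ())   -- 1: c2 has its own, independent count
```

Unlike a Java lambda, which can capture only effectively-final locals, the Lua closure is free to modify the variable it has captured.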

Object-oriented programming

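Lua has rudimentary object-oriented constructs. In essence, a class is a table, and an instance is a table with a metatable that links it back to the class, so that method look-ups on the instance fall through to the class. The 'self'/'this' value is simply the instance, passed to each method as a hidden first argument. The following code snippet shows the barest outline of a class for handling complex numbers.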

Complex = { } -- Empty table - no default attributes

function Complex:new (re, im)
  -- Boilerplate set-up code
  local o = {}
  setmetatable (o, self)
  self.__index = self;
  -- Initialize the object (the new instance o, not the class table self)
  o.re = re;
  o.im = im;
  return o;
end

function Complex:getReal()
  return self.re;
end

function Complex:toString()
  return self.re .. "+" .. self.im .. "i";
end

function Complex:add (o)
  self.re = self.re + o.re;
  self.im = self.im + o.im;
  return self;
end

I use names with an initial capital letter to indicate a table that is playing the role of a class -- this matches the usual Java naming convention.

The 'attributes' of the object are the key/value pairs of the table, with the key as the attribute name. Because we can reference table values using the table.key syntax, manipulating object attributes in Lua looks a lot like it does in Java.

However, despite this superficial similarity, classes are not first-class language constructs in Lua, and you'll have to help the language out by careful syntactic usage. So I would have to use my Complex class like this:

c1 = Complex:new (1, 2);
c2 = Complex:new (3, 4);

sum = c1:add (c2);
print ("sum = " .. sum:toString());

using a colon in the method calls to indicate to Lua that it should insert the self reference that makes the table method behave like a class method; and it is 'self', not 'this' as in Java.

It's possible to implement 'real' object-oriented techniques like polymorphism and virtual methods and, to some extent, to control the visibility of a class's attributes. However, these techniques are a little fiddly, and beyond the scope of this simple overview.
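
Just to give a flavour of what is involved, here is one common way to arrange for a "subclass" to override a method. This is a sketch of one idiom among several, not the only approach; the Shape and Square classes are invented for the example, and they use a slightly different constructor style from the Complex class above.

```lua
local Shape = {}
Shape.__index = Shape          -- instances look up missing keys in Shape

function Shape.new (name)
  return setmetatable ({ name = name }, Shape)
end

function Shape:area ()
  return 0
end

function Shape:describe ()
  -- self:area() is resolved at run time, so a subclass can override it
  return self.name .. " with area " .. self:area ()
end

-- Square "extends" Shape: keys missing from Square are looked up in Shape
local Square = setmetatable ({}, { __index = Shape })
Square.__index = Square

function Square.new (side)
  local o = Shape.new ("square")
  o.side = side
  return setmetatable (o, Square)   -- re-tag the instance as a Square
end

function Square:area ()             -- overrides Shape:area
  return self.side * self.side
end

print (Shape.new ("blob"):describe ())   -- blob with area 0
print (Square.new (3):describe ())       -- square with area 9
```

The describe() method is defined only in Shape, but it picks up the overridden area() when called on a Square, which is the essence of a virtual method.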

The standard library

The basic Lua distribution has a minimal standard library, although it's easy to extend, in Lua or in C. The basic library provides a subset of the facilities in the java.io package, and the Math and System classes. If you want database support, or a graphical user interface, you'll need to seek out a module or build one yourself.

One module that deserves a specific mention is luaposix. This implements many of the POSIX features that are part of Linux/Unix, but not included in ANSI-standard C. Even a basic feature like sub-second wall-clock timing needs this module, or an equivalent one.

Because Lua has been in widespread use for such a long time, it's relatively easy to find modules to do most common tasks. However, as with code in Maven repositories, the quality is somewhat variable.