Lua quick-start for Java programmers
This article provides the barest essentials of Lua programming, for developers who are familiar with Java (or, perhaps, C/C++). Despite its simplicity, Lua is a fully-featured programming language with an extensive run-time library, and it's impossible to do it justice in a short article. However, I think it's possible to provide enough of the basics to give an experienced developer a running start.
Note:
Lua is often found embedded in other applications. There are several different versions in widespread use, all built in slightly different ways. It's better, I think, to build Lua from source if practicable, because then you'll know exactly what you're getting. It's not like building gcc or Perl from source -- Lua is implemented in only 20 000 lines of C, and builds in a few seconds on a modern desktop system.
Fundamental implementation
Lua is traditionally used as an interpreted scripting language -- there is no well-defined "compile" phase of development, and Lua doesn't really have anything that corresponds to Java's notion of a class file. Under the hood, Lua compiles source code to byte-code for a register-based virtual machine every time a script is loaded, but this happens transparently. The luac compiler can save that byte-code to a file, but the format is specific to a particular Lua version and build, and is not a documented, portable format in the way that Java's class files are.
There is a spin-off of Lua called LuaJIT, which implements a just-in-time compiler for the Lua language. This mode of operation is more similar to the way Java works except that, again, there is no separate compilation phase.
Like Java, and like most scripting languages, Lua uses garbage collection (GC) for memory management. Lua's collector is an incremental mark-and-sweep design (Lua 5.4 also offers a generational mode), and is much simpler than the collectors in a typical JVM. As in Java, it's possible to break the garbage collector by overloading it, but it seems to be quite difficult. That's not necessarily because Lua's GC scheme is more robust than Java's -- rather, I don't think that Lua is called on to carry out the same kind of heavyweight tasks that Java often is.
The Lua language syntax is block-structured, as most modern procedural languages are. It is wordier than Java -- more like BASIC or Algol -- because blocks are delimited by keywords such as then, do and end rather than by braces. However, most symbols -- parentheses, brackets, arithmetic operators, etc -- have meanings that will be completely familiar.
Lua is dynamically typed: values have types, but variables do not, so it is neither necessary nor possible to stipulate that a variable holds only a specific type.
Unlike Java, a Lua program does not have to start in a specific class -- Lua can be used to build class-like structures (more on this later), but they are not as central as classes are in Java. Instead, a script simply executes from top to bottom, beginning with its first statement.
Like Java, C++, and Perl, and unlike Python, Lua has no regard for program layout and indentation. You can format your source files however you like.
Hello, World
This is the canonical "Hello, World" for Lua.
print ("Hello, World")
The following are equivalent:
print ('Hello, World')
print 'Hello, World'
Parentheses can be omitted only when a function is called with a single argument that is a literal string or a table constructor, so it might be better to avoid this particular idiom.
The standard Lua print() is broadly similar to Java's System.out.println() -- there's no way to suppress the terminating end-of-line character.
So another alternative is:
io.write ("Hello, World\n");
io.write() is a more flexible function than print(), but print() is convenient for debugging, because it converts each of its arguments with tostring(), so it can print things other than text strings and numbers without explicit conversion.
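A short sketch of the difference between the two functions. It also uses string.format(), which isn't mentioned above, and which gives printf-style formatting much like Java's String.format(). Note that print() separates multiple values with tabs.

```lua
-- print() converts every argument with tostring(), separates the
-- arguments with tabs, and always ends the line.
print("answer:", 42, true, nil)   -- answer:, 42, true and nil, tab-separated

-- io.write() accepts only strings and numbers, adds no separators
-- and no newline, so you supply your own.
io.write("answer: ", 42, "\n")

-- Anything else must be converted explicitly; passing a boolean
-- directly raises an error, which pcall() catches here.
local ok = pcall(io.write, true)
print(ok)                          -- false
io.write(tostring(true), "\n")     -- true

-- string.format() gives printf-style control over the text.
io.write(string.format("%-6s|%6.2f\n", "pi", math.pi))
```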
Namespace and packaging
Broadly, Lua has a flat namespace. However, the practice of grouping functions into modules gives an illusion of separate namespaces. For example:
foo = require ("foo");
foo.some_function ("bar");
foo is not a namespace here, despite appearances: it's a table. There's no particular reason to be concerned about the specific implementation details, unless you're working with object-oriented programming -- more on this later -- but, essentially, [table_name].[item_index]() is a valid function call in Lua.
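The require() function can load various types of module -- Lua source code, pre-compiled Lua byte-code, and native code. Lua modules are searched for along the path in package.path, and native modules along package.cpath. A module typically creates a table, associates each function name with its implementation, and returns that table; require() caches the result in package.loaded, so requiring the same module twice gives you the same table.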
This module.function naming convention applies not only to modules loaded explicitly, but also to many built-in functions. So we have, for example, math.sin and io.read. Again, there's not really a "math" or an "io" namespace, in the sense of Java or C++: math and io are ordinary tables.
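To make this concrete, here is a sketch of a tiny module. To keep it self-contained, it registers a loader function in package.preload rather than reading a file; the module name greeter and its hello() function are hypothetical. A real module would normally live in greeter.lua, somewhere on package.path, and end with return M.

```lua
-- Register a loader for the hypothetical module "greeter", so that
-- require() finds it without a greeter.lua file on disk.
package.preload["greeter"] = function()
  local M = {}              -- the module is just a table...
  function M.hello(name)    -- ...whose fields happen to be functions
    return "Hello, " .. name
  end
  return M                  -- require() hands this table back
end

local greeter = require("greeter")
print(greeter.hello("Lua"))                    -- Hello, Lua

-- require() caches the table in package.loaded, so asking again
-- returns the very same table instead of reloading the module.
print(require("greeter") == greeter)           -- true
print(package.loaded["greeter"] == greeter)    -- true

-- Built-in libraries are ordinary tables too.
print(type(math))                              -- table
print(type(math.sin))                          -- function
```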
As modules are the only built-in facility for organizing functions, they are used a lot.
There is a package manager, LuaRocks, with dependency management and an online repository. It plays a similar role in Lua to that of Maven in Java. I can't comment on it because I don't use it (I don't use Maven either, if I can avoid it) -- I don't like automated dependency management. I'd rather have the pain of figuring out all the dependencies myself, and know exactly what I'm using, and where it comes from. Still, I appreciate that I am a minority voice in this area, and LuaRocks is widely used.
Data types
Lua data types are nil, boolean, number, string, function, userdata, thread, and table. As in Java, you can build new abstractions on top of these -- in Lua, using tables and metatables -- but the set of fundamental types is fixed.
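The built-in type() function returns the name of a value's type as a string, so a quick sketch can show all eight types in turn:

```lua
print(type(nil))                              -- nil
print(type(true))                             -- boolean
print(type(42))                               -- number
print(type(4.2))                              -- number (floats too)
print(type("text"))                           -- string
print(type(print))                            -- function
print(type({}))                               -- table
print(type(io.stdout))                        -- userdata (a file handle)
print(type(coroutine.create(function() end))) -- thread
```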
'nil' is both the name of a type and the only value of that type; a variable that has never been assigned holds nil. In Java, 'null' is a value, but not a type.
'boolean' has exactly the same meaning in Lua as it does in Java.
'number' is potentially confusing. Historically, Lua supported only floating-point number representations. Lua 5.3 added integers, but not as a separate type: a 'number' has two internal representations, integer and float. Integer literals such as 42 are integers, and +, -, * and // (floor division) applied to two integers produce an integer. However, / and ^ always produce a float, even when the result is a whole number, so 6/2 is 3.0; and mixing an integer with a float also produces a float. This scheme attempts to provide the simplicity of a single number type with the speed of integer arithmetic, but it can be confusing, particularly to developers who are used to strongly-typed languages. If you need to know which representation a particular number is using, math.type() returns "integer" or "float".
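A sketch of these rules, assuming Lua 5.3 or later (math.type() and the // operator don't exist in earlier versions):

```lua
print(math.type(3))       -- integer
print(math.type(3.0))     -- float
print(6 / 2)              -- 3.0: "/" always produces a float
print(7 // 2)             -- 3: floor division of two integers is an integer
print(2 ^ 2)              -- 4.0: "^" always produces a float
print(1 + 2.5)            -- 3.5: mixing an integer and a float gives a float
print(3 == 3.0)           -- true: comparison is by mathematical value
print(math.type("3"))     -- nil: a string is not a number, even if it looks like one
```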
Older Lua documentation describes 'string' as a sequence of characters, but that's only true in encoding schemes that use one byte per character -- more on this later. It's safer to think of a string as a sequence of bytes, which is how the current reference manual describes it.
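Strings can be concatenated using the .. operator. In Lua, the + operator applies only to numbers (as it should): a string operand that looks like a number is converted to one, so "10" + 1 is 11, and any other string raises an error.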
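A few lines that show the two operators side by side. This is a sketch: the automatic string-to-number conversion is standard Lua behaviour, but relying on it is usually poor style.

```lua
local version = "Lua " .. 5 .. "." .. 4
print(version)           -- Lua 5.4: ".." converts numbers to strings
print(10 .. 1)           -- 101: ".." always concatenates (note the spaces)
print("10" + 1)          -- 11: "+" always adds; the numeric string is converted

-- A string that doesn't look like a number cannot be added.
local ok, err = pcall(function() return "abc" + 1 end)
print(ok)                -- false
print(err)               -- an "attempt to perform arithmetic..." error message
```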
'function' is a first-class data type in Lua, making lambda functions easy to implement. Java has had lambda expressions only since Java 8, but first-class functions have been part of Lua for as long as I can remember.
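'userdata' is an unstructured data block that Lua code cannot look inside. It is mostly used by extensions implemented in C, to expose their own data to Lua or to pass it through Lua to other extensions; the file handles returned by io.open(), for example, are userdata.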
'thread' is actually a co-routine; Lua has no built-in support for true operating system threads, and embedding Lua in multi-threaded contexts is highly precarious.
A 'table' is an associative array -- a list of key/value mappings -- a bit like a Hashtable in Java. However, tables are also used to implement other data types, particularly lists. A list is a table whose keys are the consecutive integers 1, 2, 3, and so on. Whether it is used as an associative array, or a simple list, a table can be indexed like a Java array. The syntax is the same as Java's -- array[index]. However, the first item in a Lua list has index 1, not 0.
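A short sketch of both uses of a table; the names are purely illustrative. It also uses ipairs() and pairs(), which aren't mentioned above: they are the standard iterators for lists and for general tables respectively.

```lua
-- A table used as an associative array, like a Java HashMap.
local ages = { alice = 31, bob = 27 }
ages["carol"] = 45              -- add or replace an entry
print(ages.alice)               -- 31: ages.alice is shorthand for ages["alice"]
print(ages.dave)                -- nil: a missing key is not an error

-- A table used as a list: the keys 1, 2, 3 are implicit.
local fruits = { "apple", "banana", "cherry" }
print(fruits[1])                -- apple: the first index is 1, not 0
print(#fruits)                  -- 3: "#" gives the length of a list
fruits[#fruits + 1] = "damson"  -- append, like list.add() in Java

-- ipairs() visits a list in order, from index 1 up to the first nil.
for i, name in ipairs(fruits) do
  print(i, name)
end

-- pairs() visits every key/value pair, in no particular order.
for key, value in pairs(ages) do
  print(key, value)
end
```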
Note:
Unlike Java, Lua's number types have ranges that depend on how Lua was built. On a modern desktop system, integer and floating-point numbers will most likely both use 64-bit representations. However, if portability is required, this should not be relied on.
Like Java, Lua has no unsigned integer types, although Lua 5.3 and later provide math.ult() to compare two integers as if they were unsigned.
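Assuming Lua 5.3 or later, the limits of the integer representation can be inspected at run time. The printed limits depend on how Lua was built; the values in the comments are what a default 64-bit build gives.

```lua
print(math.maxinteger)    -- 9223372036854775807 on a default 64-bit build
print(math.mininteger)    -- -9223372036854775808 on the same build

-- Integer arithmetic wraps around silently, much as Java's long does.
print(math.maxinteger + 1 == math.mininteger)   -- true

-- math.ult() compares integers as if they were unsigned, so -1
-- (all bits set) is the largest possible value.
print(math.ult(1, -1))    -- true
print(1 < -1)             -- false: the ordinary comparison is signed
```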
Lua has no specific character data type -- you can store a character's numeric value in a number. Nor is there a character literal -- the double-quoted "foo" and the single-quoted 'f' both specify a string.
String values are immutable in Lua, as they are in Java -- there is no way to change the contents of a string once it has been created.
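A sketch of both points: the standard string.byte() and string.char() functions convert between strings and numeric byte values, and string methods such as upper() return a new string rather than modifying the original.

```lua
local code = string.byte("A")   -- the numeric value of the first byte
print(code)                     -- 65
print(string.char(code + 1))    -- B: a one-character string, not a char
print(#'f')                     -- 1: 'f' is a string of length 1
print(type('f'))                -- string

local s = "java"
local t = s:upper()             -- returns a new string...
print(s)                        -- java: ...and s itself is unchanged
print(t)                        -- JAVA
```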
Variable scope
A hazard for Java/C programmers using Lua is that variables by default have global scope, even if they are first assigned inside a function or block. You need to use the local keyword to define a block-scoped variable. In practice, this means that most variable definitions should be local.
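A sketch of the hazard (the function and variable names are purely illustrative):

```lua
local function setup()
  counter = 0      -- no "local": this creates (or overwrites) a global
  local temp = 42  -- local: visible only inside setup()
end

setup()
print(counter)     -- 0: the global has escaped from the function
print(temp)        -- nil: this is an unrelated, never-assigned global
```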
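One important exception to this principle is the loop control variable in a for loop, which is local to the loop even without the local keyword:
> for i=1,3 do print (i) end; print (i)
1
2
3
nil
The final print (i) shows nil because it refers to a global variable i, which has never been assigned -- the i in the loop is confined to the loop.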
Note:
The semicolon is essentially white-space in Lua: it is an optional statement separator. It often improves readability to use semicolons to separate statements, but they almost never have any semantic effect.
Control structures
The "if..then..else" test has the following form:
if test then
  statements
elseif test2 then
  statements
else
  statements
end
The comparison operators that can be used for tests are essentially the same as in Java, with the exception that "not equal" is ~=, rather than !=.
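A sketch of a chained test; classify() is a hypothetical helper. Note that Lua also spells the logical operators as words (and, or, not) where Java uses &&, || and !.

```lua
local function classify(x)
  if x < 0 or x > 100 then            -- Java: x < 0 || x > 100
    return "out of range"
  elseif x ~= 0 and x % 2 == 0 then   -- Java: x != 0 && x % 2 == 0
    return "even"
  elseif not (x == 0) then            -- Java: !(x == 0)
    return "odd"
  else
    return "zero"
  end
end

print(classify(150))   -- out of range
print(classify(8))     -- even
print(classify(7))     -- odd
print(classify(0))     -- zero
```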
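Lua strings are compared by their contents, not by their references as in Java. So:
> if "rover" == "rover" then print ("true") end
true
In Java, == compares string references, so the equivalent comparison can evaluate to false for strings that are built at run time (two identical literals only compare equal because the compiler interns them), as many novice (and not-so-novice) Java developers have discovered to their cost.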
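Tables, by contrast, are compared by identity, just like Java objects. A sketch of both behaviours; note also that < and > can order strings (according to the current locale).

```lua
local a = "ro" .. "ver"      -- built at run time
print(a == "rover")          -- true: strings compare by content

local t1, t2 = { 1 }, { 1 }
print(t1 == t2)              -- false: two distinct tables
print(t1 == t1)              -- true: the same table

print("apple" < "banana")    -- true: strings can also be ordered
```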
Because Lua is dynamically typed, you can put more-or-less any kind of expression into a test -- Java is much stricter. However, unlike other languages that allow this kind of flexibility -- such as C and Perl -- Lua does not treat zero or an empty string as false, which can be surprising:
> if "false" then print ("true") end
true
> if 0 then print ("true") end
true
The only values that count as false in a test are false and nil; everything else, including 0 and the empty string, counts as true. This is sometimes useful, but it's advisable to be careful about putting non-booleans into a test, even though it is syntactically allowable.
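A sketch of these truth rules, using a hypothetical helper truthy(). The or operator, which returns its first operand if that counts as true and its second operand otherwise, relies on the same rules, and gives a common idiom for default values.

```lua
local function truthy(v)
  if v then return true else return false end
end

print(truthy(0))        -- true: unlike C, zero is not false
print(truthy(""))       -- true: unlike Perl, the empty string is not false
print(truthy("false"))  -- true: it's just a string
print(truthy(false))    -- false
print(truthy(nil))      -- false

-- Default-value idiom: use name if it is non-nil, otherwise "world".
local function greet(name)
  name = name or "world"
  return "hello " .. name
end
print(greet())          -- hello world
print(greet("Lua"))     -- hello Lua
```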
The for loop has already been introduced; its structure is pretty conventional. The same is true for the while loop:
while test do
  statements
end
There is no do...while, but there is a repeat, whose body always runs at least once, and which stops as soon as the test is true:
repeat
  statements
until test
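A sketch of the three loops side by side. The optional third value in the numeric for loop, the step, isn't mentioned above; it plays the role of the increment expression in a Java for loop.

```lua
-- while: the test is checked before each pass.
local n = 3
while n > 0 do
  io.write(n, " ")
  n = n - 1
end
print()                 -- the loop printed: 3 2 1

-- repeat...until: Java's do...while, with the test inverted.
local tries = 0
repeat
  tries = tries + 1
until tries >= 3
print(tries)            -- 3

-- Numeric for with a step, like: for (int i = 10; i >= 0; i -= 5)
for i = 10, 0, -5 do
  io.write(i, " ")
end
print()                 -- the loop printed: 10 5 0
```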
There is no equivalent of the switch control statement. It can be simulated in various ways, but none is very elegant. The method that executes fastest is probably to pack the case actions as functions into a table, like this:
choice = {
  [1] = function() print "one" end,
  [2] = function() print "two" end
}
choice[1]();
This works because functions are first-class data elements in Lua, and because table indexing is comparatively quick. The above method works for any value that can be used as a table key, such as strings, whereas a traditional Java switch accepts only integral types, enums and strings.
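A sketch of the same technique with string case values, plus a fallback that plays the role of Java's default: branch; the command names and handlers are hypothetical.

```lua
local handlers = {
  start = function() return "starting" end,
  stop  = function() return "stopping" end,
}

local function dispatch(cmd)
  -- Look up the case; fall back to a default handler if it's missing.
  local handler = handlers[cmd] or function()
    return "unknown command: " .. tostring(cmd)
  end
  return handler()
end

print(dispatch("start"))   -- starting
print(dispatch("pause"))   -- unknown command: pause
```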
Unicode support
Unlike Java, Lua is not Unicode-aware in any meaningful way. Lua 5.3 added a small utf8 library, but the basic string data type and the rest of the string library are byte-oriented. So, for example:
> s="x±y"
> print (#(s))
4
The two bytes needed to represent the "±" character in UTF-8 are read as two separate characters. This makes Lua much like C in its string handling, and not at all like Java, which is Unicode-aware from the ground up.
Apart from the basic operations in the utf8 library, such as counting and iterating over code points, there's no safe way to manipulate multi-byte strings using built-in Lua functions. To some extent, you can get away with treating UTF-8 strings like ASCII -- provided you don't have to deal with the individual characters. C programmers will, I presume, be aware of this fundamental limitation, and know how to get around it. Java developers might never have had to.
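A sketch of what the utf8 library can and can't help with, assuming Lua 5.3 or later. The \u{B1} escape (also 5.3 or later) produces the UTF-8 bytes for "±", so the source file's own encoding doesn't matter.

```lua
local s = "x\u{B1}y"              -- the same bytes as "x±y" in UTF-8
print(#s)                         -- 4: the length in bytes
print(utf8.len(s))                -- 3: the length in code points

-- utf8.codes() iterates over code points, not bytes.
for _, cp in utf8.codes(s) do
  io.write(string.format("U+%04X ", cp))
end
print()                           -- printed: U+0078 U+00B1 U+0079

-- The ordinary string functions still work on bytes, so this takes
-- only the first of the two bytes of the "±" character.
print(s:sub(2, 2) == "\u{B1}")    -- false
```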
Parameter passing semantics
Parameter passing semantics in Lua are essentially the same as in Java. Primitive data types are passed by value, that is, they are copied onto the stack of the called function. Tables, like Java objects, are also passed by value, but the value that is copied is a reference to the table, not the table itself. So a called function can modify the contents of the table, just as a called function can modify the contents of a passed object in Java, provided the object has methods that allow this. So, for example:
function foo (something)
  something["foo"] = "bar";
end
local mytable = {}
foo (mytable);
print ("foo = " .. mytable.foo);
On exit from foo(), the foo field of mytable has been set. So passing a table as an argument to a function has the feel of passing by reference, even though neither Lua nor Java supports pass-by-reference in the strict sense.
Incidentally, this code snippet shows two of the three syntactic ways of indexing a Lua table by key: table["key"] and table.key. We will look at the third method later.
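Passing a table only has the feel of passing by reference: if a function assigns a new table to its parameter, that rebinds the function's own copy of the reference, exactly as reassigning a parameter does in Java. A sketch, with illustrative function names:

```lua
local function modify(t)
  t.name = "changed"      -- changes the table the caller passed in
end

local function replace(t)
  t = { name = "new" }    -- rebinds the local parameter only
end

local mytable = { name = "original" }
replace(mytable)
print(mytable.name)       -- original: the caller's variable is untouched
modify(mytable)
print(mytable.name)       -- changed
```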
Co-routines
Lua does not support operating-system threads, or any form of pre-emptive or concurrent scheduling. Extensions written in C that aim to add this support are associated with numerous complications. I've learned not to try to do any kind of platform-level multi-threading in Lua -- it's just too fragile.
Of course, Lua is not alone among scripting languages in having this limitation. JavaScript is notoriously deficient in threading support as well.
Lua does offer support for co-routines. Using co-routines is not entirely unknown in the Java world but, because true threading is built into the language, developers tend to use that instead. However, sometimes co-routines are safer, and no less efficient, and this technique deserves to be better-known.
Co-routines are functions that explicitly yield control to one another at specific points. When a function has yielded, some sort of scheduler -- which has to be provided by the developer -- can select another "thread" to resume.
Broadly, Lua's co-routine support is encapsulated in four functions in the coroutine module: create(), yield(), resume(), and status().
Once a function has executed yield(), Lua will suspend it until some other function calls resume(). You can use status() to work out which co-routines are alive and capable of running further (status "suspended": not yet started, or yielded), and which are dead (status "dead": the function has returned, or raised an error).
What makes Lua's co-routine support particularly interesting is that yield() and resume() can pass parameters to one another. This makes it possible to pass data between "threads" in a completely safe way, without the risk of race conditions.
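A sketch of the data flow between a co-routine and its caller, using only the four functions named above; the squares generator itself is purely illustrative. The values passed to yield() come back from resume() in the caller, and the values passed to the next resume() come back from yield() inside the co-routine. (print() separates multiple values with tabs.)

```lua
-- A generator-style co-routine that produces squares on demand.
local producer = coroutine.create(function(limit)
  for i = 1, limit do
    -- Hand a value to the caller, then wait; yield() returns
    -- whatever the caller passes to the next resume().
    local reply = coroutine.yield(i * i)
    print("producer got: " .. tostring(reply))
  end
  return "done"
end)

print(coroutine.status(producer))          -- suspended (not started yet)
print(coroutine.resume(producer, 3))       -- true, 1 (3 becomes 'limit')
print(coroutine.resume(producer, "ack1"))  -- producer got: ack1, then true, 4
print(coroutine.resume(producer, "ack2"))  -- producer got: ack2, then true, 9
print(coroutine.resume(producer, "ack3"))  -- producer got: ack3, then true, done
print(coroutine.status(producer))          -- dead
print(coroutine.resume(producer))          -- false, plus an error message
```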
In the end, though, co-routines are just a framework on which a developer can build an application that uses co-operative scheduling. Most of the work has to be done by the developer, and it's a lot harder than simply saying new Thread()... in Java.
Closures, etc
Because a function is a first-class language element in Lua, functions can be passed as arguments to other functions, or stored in tables. This is valid Lua, for example:
function do_three (something)
  for i=1,3 do
    something (i);
  end
end
do_three (function(x) print ("hello " .. x) end);
Until the introduction of lambda expressions in Java 8, we would have had to get this effect in Java using interfaces implemented by anonymous inner classes, which was very verbose.
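The example above passes a function around, but doesn't show a closure as such. In this sketch, make_counter() (a hypothetical name) returns a function that captures the local variable count, and each counter keeps its own copy of that state. Unlike a Java lambda, which can only capture effectively-final local variables, a Lua closure can update the variables it captures.

```lua
local function make_counter()
  local count = 0
  return function()
    count = count + 1   -- updates the captured variable
    return count
  end
end

local c1 = make_counter()
local c2 = make_counter()
print(c1())   -- 1
print(c1())   -- 2
print(c2())   -- 1: c2 has its own count
```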
Object-oriented programming
Lua has rudimentary object-oriented constructs. In essence, a class is a table of methods, and an instance is a table whose metatable redirects any lookup that the instance itself can't satisfy to the class table; each method receives the instance as its 'self' ('this' in Java) parameter. The following code snippet shows the barest outline of a class for handling complex numbers.
Complex = { }  -- Empty table - no default attributes
function Complex:new (re, im)
  -- Boilerplate set-up code
  local o = {}
  setmetatable (o, self);
  self.__index = self;
  -- Initialize the new object (self is the class table here)
  o.re = re;
  o.im = im;
  return o;
end
function Complex:getReal()
  return self.re;
end
function Complex:toString()
  return self.re .. "+" .. self.im .. "i";
end
function Complex:add (o)
  -- Return a new object, leaving self unchanged
  return Complex:new (self.re + o.re, self.im + o.im);
end
I use names with an initial capital letter to indicate a table that is playing the role of a class -- this matches the usual Java naming convention.
The 'attributes' of the object are the key/value pairs of the table, with the key as the attribute name. Because we can reference table values using the table.key syntax, manipulating object attributes in Lua looks a lot like it does in Java.
However, despite this superficial similarity, classes are not first-class language constructs in Lua, and you'll have to help the language out by careful syntactic usage. So I would have to use my Complex class like this:
c1 = Complex:new (1, 2);
c2 = Complex:new (3, 4);
sum = c1:add (c2);
print ("sum = " .. sum:toString());
using a colon in the method calls to tell Lua to pass the table itself as a hidden first parameter, self, which is what makes the table's function behave like an instance method; and it is 'self', not 'this' as in Java.
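To see exactly what the colon does, here is a sketch using a different, equally common constructor style: a plain function called with a dot, with the class's __index set once, outside the constructor. The Point class is hypothetical.

```lua
local Point = {}
Point.__index = Point      -- instances look up missing keys in Point

function Point.new(x, y)   -- a dot: no self parameter is needed here
  return setmetatable({ x = x, y = y }, Point)
end

function Point:norm2()     -- shorthand for: function Point.norm2(self)
  return self.x * self.x + self.y * self.y
end

local p = Point.new(3, 4)
print(p:norm2())           -- 25
print(p.norm2(p))          -- 25: p:norm2() is shorthand for this call
print(Point.norm2(p))      -- 25: the method really lives in Point
print(rawget(p, "norm2"))  -- nil: the instance holds only x and y
```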
It's possible to implement 'real' object-oriented techniques like polymorphism and virtual methods and, to some extent, to control the visibility of a class's attributes. However, these techniques are a little fiddly, and beyond the scope of this simple overview.
The standard library
The basic Lua distribution has a minimal standard library, although it's easy to extend, in Lua or in C. The basic library provides a subset of the facilities in the java.io package, the java.lang.Math class, and the System class. If you want database support, or a graphical user interface, you'll need to seek out a module or build one yourself.
One module that deserves a specific mention is luaposix. This implements many of the POSIX features that are part of Linux/Unix, but not included in ANSI-standard C. Even a basic feature like sub-second wall-clock timing needs this module, or an equivalent one.
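For comparison, this sketch shows what the standard os library offers on its own. os.clock() measures the processor time used by the program, with a fractional part, while os.time() counts wall-clock time in whole seconds (since the epoch, on POSIX systems). Neither gives a fine-grained wall clock.

```lua
-- CPU time used by this program, in seconds, as a float.
local t0 = os.clock()
local sum = 0
for i = 1, 1000000 do sum = sum + i end
print(string.format("loop used %.3f s of CPU time", os.clock() - t0))

-- Wall-clock time, in whole seconds.
print(os.time())
print(os.date("%Y-%m-%d %H:%M:%S"))   -- current local date and time
```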
Because Lua has been in widespread use for such a long time, it's relatively easy to find modules to do most common tasks. However, as with code in Maven repositories, the quality is somewhat variable.