Container from scratch: Networking
In the last-but-one article
I demonstrated how to use
chroot
to provide a container with a private filesystem.
That filesystem was a complete, but minimal, version of Alpine Linux.
Then, in the previous article
I showed how to improve the
isolation of the container by providing it with private namespaces for
processes, mounts, and network identity. That article also touched on the use
of private user namespaces and "rootless" containers, although these
are not techniques I use in these demonstrations.
This article completes the creation of a basically-workable container, by providing it with its own network namespace. This allows the container to have its own network interface, distinct from that of the host. This interface can be used to communicate with the host and other services. Since the container can only use a specific interface, it can't see general network traffic in the host, or between other containers, unless the host allows it.
I haven't put this subject off until now merely because it's more complicated
than the preceding demonstrations -- although it is. I needed first to
explain how unshare
works, so I could explain why I'm not using unshare
in
this section.
This article builds on the work of the previous ones in the series. I assume that, if you want to follow the steps yourself, you've set up the container filesystem and scripts as I described before. None of what follows will make sense, or even work, without that preparation.
Overview
At this point, if you've followed the previous articles, you should have a primitive container. We can run a shell in the container like this:
# unshare -mpfu chroot container /bin/start.sh
The script start.sh
sets up the container's environment, and then
runs a shell as an unprivileged user. I explained in the previous article
why the script needs to change to an unprivileged user account,
as soon as possible, despite the
sandboxing provided by the container.
The "proto-container" has its own process list, its own root filesystem, its own list of mounts, and its own hostname. What it doesn't have, yet, is a private network interface. We could define an interface for the container easily enough, but there's every chance it will clash with the host. In any case, in a containerized environment we want to control the channels of communication between containers, both for efficiency and for security.
Linux has a bewildering array of network bridging and tunneling technologies, each with its own strengths and weaknesses. There's a useful summary in this article on Red Hat Developer.
The approach we will follow is to provide a private network namespace, which is
initially empty. Then we'll create a 'veth (virtual ethernet) tunnel', placing one end of the
tunnel in the container, and the other in the host. This will create a point-to-point
network link between container and host. Then we'll set up routes and iptables
rules to allow wider communication.
I should point out from the outset that this is only one of many different methods that might be used to create a private network for containers. It probably isn't the most efficient, or the most scalable -- I chose it because I think it's the easiest to understand.
Why we're not using unshare
here
unshare -n
will provide a child process with a private network
namespace. We're already using unshare
for all the other
private namespaces, but I don't think it will work here.
The problem is that, as far as I can tell, unshare
can only
create an anonymous network namespace. In some circumstances that
would be fine -- we could position the ends of the tunnel by process ID
rather than by namespace name. But we're providing a private process
namespace as well, so it's hard to figure out the relevant process IDs.
This problem is probably not insurmountable, but it's just easier to
use a named network namespace in this demonstration.
We'll use ip netns exec
to run a process in a named
network namespace.
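Just to illustrate the general form of ip netns exec -- the namespace name 'test-netns' below is a throwaway example, not part of the demonstration that follows:
# ip netns add test-netns
# ip netns exec test-netns ip addr show      (lists only the interfaces in that namespace)
# ip netns del test-netns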
Veth interfaces and masquerading
This demonstration makes use of "virtual ethernet" ("veth") interfaces. A veth tunnel always has two 'ends'. Each end is an interface in its own right, and can have its own network properties. In particular, the two ends will have different IP addresses in the same address range.
What's particularly important about veth tunnels for our present purposes is that the two ends can be placed in different network namespaces. This provides the namespaces with a point-to-point, private communications channel. It's possible to use veth interfaces to provide communications links from container to container, but it's more common to use the container host as a backbone for communication.
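If you want to see the idea in miniature before building the real thing, here is a sketch using two throwaway namespaces. The names ns-a, ns-b, veth-a, veth-b and the 192.168.50.x addresses are purely illustrative, and nothing here is needed for the main demonstration:
# ip netns add ns-a
# ip netns add ns-b
# ip link add veth-a type veth peer name veth-b
# ip link set veth-a netns ns-a
# ip link set veth-b netns ns-b
# ip netns exec ns-a ip addr add 192.168.50.1/24 dev veth-a
# ip netns exec ns-b ip addr add 192.168.50.2/24 dev veth-b
# ip netns exec ns-a ip link set veth-a up
# ip netns exec ns-b ip link set veth-b up
# ip netns exec ns-a ping -c 1 192.168.50.2
# ip netns del ns-a
# ip netns del ns-b          (deleting the namespaces also destroys the veth pair)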
There are probably applications in which containers only communicate with their host, but more commonly containers will need wider communication, and perhaps even Internet access.
The approach I've adopted for extended communication is the use of iptables rules with packet forwarding and IP masquerading. What this means is that network traffic from the container will appear to originate from the host's primary network interface, whatever its destination. iptables keeps track of network packets in flight, and routes packets intended for the container to the proper interface, even though the destination address in the IP header is that of the host. This is essentially the same technique that DSL routers use, to allow multiple computers in a home or organization to share a public IP number.
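Once the masquerading is in place (we'll get to that below), you can watch the kernel's connection tracking at work on the host. The conntrack utility is a separate package and may not be installed, and the /proc interface is absent on some kernels, so treat this as an optional diagnostic:
# conntrack -L
# cat /proc/net/nf_conntrack      (an alternative, if the proc interface is available)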
Demonstration
Start by creating a named network namespace. If you're already using containers or virtual machines, you might find that some namespaces exist. You'll need to change the name I'm using -- 'container-netns' -- in the unlikely event this name is already in use.
The following steps should be carried out on the host, not inside the container.
# ip netns list           (check that the name is not already in use)
# ip netns add container-netns
Now it should be possible to run the container's shell just as before,
but with the added step of imposing the new network namespace.
We'll use ip netns exec
to do this:
# ip netns exec container-netns unshare -mpfu chroot container /bin/start.sh
$ /sbin/ifconfig
(nothing)
You'll see that there are no configured network interfaces. The loopback (lo) interface exists, but is down. It won't be needed for this demonstration, but probably will be in a real application.
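If a real application in the container does need loopback -- many programs expect to be able to connect to 127.0.0.1 -- it can be brought up from a root shell in the container, or from start.sh, with:
# ip link set lo up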
Now create the veth interface pair. I'm using the names 'veth-host' and 'veth-container' to denote the two ends of the link. Again, you might need to check that these names are not already in use.
The following steps should be carried out in the host -- quit the container shell or use a different session.
# ip link show            (check that the names are not already in use)
# ip link add veth-host type veth peer name veth-container
# ip link list | grep veth
34: veth-container@veth-host: <BROADCAST,MULTICAST,M-DOWN>...
35: veth-host@veth-container: <BROADCAST,MULTICAST,M-DOWN>...
The veth-host interface will remain in the host's (global) namespace. The veth-container interface -- the other end of the link -- needs to be moved into the container's network namespace. This namespace, you should recall, is 'container-netns' (unless you had to pick a different name).
# ip link set veth-container netns container-netns
Now assign an IP number to the host end of the veth pair. I'm using 10.0.3.X addresses below -- yet again, you'll need to choose different addresses (typically in the 10.x.x.x range) if these IP numbers are already in use. I will use 10.0.3.1 for the host end of the veth pair, and 10.0.3.2 (later) for the container end.
# ip addr add 10.0.3.1/24 dev veth-host
# ip link set veth-host up
That's all the configuration needed for the host. All that is needed
for the container, for now, is to assign an IP number to its end
of the veth tunnel. The interface 'veth-container' should now
be available in the
container, although you won't see it with ifconfig
because
it's currently down.
To configure the container, you can modify start.sh
so
that it leaves you in a root shell, or just add the following lines
to the script itself, before it invokes the shell.
# ip addr add 10.0.3.2/24 dev veth-container
# ip link set veth-container up
If you run ifconfig
in the container now, you
should see the veth-container interface, with its IP number.
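Before going further, it's worth a quick sanity check that the link really works. From the host, which can reach the container end of the pair directly, try:
# ping -c 1 10.0.3.2       (one ping to the container end of the veth pair)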
Now we can use the network testing utility nc
to verify that there is free, bidirectional communication between the host and
the container. In the container we'll run nc
in listening
mode, so that it will wait for incoming connections. In the host, in
a different session, we run nc
in client mode.
In the container shell, start nc
as a listener, on port
8080. It doesn't matter if this port is in use in the host -- we're
in a private network namespace now.
$ nc -l -p 8080
Now, from the host, connect to the container on port 8080:
$ nc 10.0.3.2 8080
Any text you type in one nc
session should be relayed
to the other and printed.
Press ctrl+d in the host's nc
session to end
it. The container-side session should end automatically.
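If you prefer a non-interactive check -- with the container-side listener still running -- something like the following, run on the host, should work with most versions of nc, although the exact options vary between implementations:
$ echo hello | nc -w 2 10.0.3.2 8080     (-w sets a timeout so nc doesn't hang)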
Note that what we've created here is a point-to-point link -- it exists for communication between the host and the container, and for no other purpose.
Now it's time to extend the reach of the container, by routing packets from its interface to other hosts (or other containers). To do that we must first enable packet forwarding in the kernel. However, if you already use VMs or containers, this may have been done by some other set-up. No harm will be done by repeating the step, so:
# echo 1 > /proc/sys/net/ipv4/ip_forward
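The echo above only lasts until the next reboot. If you want forwarding enabled permanently, most distributions read sysctl configuration files at boot; the file name below is arbitrary -- any name ending in .conf in /etc/sysctl.d will do:
# sysctl -w net.ipv4.ip_forward=1                  (equivalent to the echo above)
# echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/90-ip-forward.conf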
The following steps can be very complicated. What makes them complicated is an existing iptables firewall configuration that conflicts with the packet routing we need to create. I can't predict what settings will work on any system but my own, and I'd strongly recommend that, for testing purposes, you disable any software firewall completely. In most cases you can just flush the rules like this:
# iptables -F
If you can't do this -- and in some environments it might be risky -- you'll need to ensure that you don't have rules in place that will block IP traffic from the container. Unfortunately, I can't advise on how to do that -- there's just too much variation in software firewall configuration.
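One thing that is safe on almost any system is simply to look at what rules are already in place before changing anything:
# iptables -L -n -v              (the filter table, including any FORWARD rules)
# iptables -t nat -L -n -v       (the nat table, including any masquerading)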
The following steps -- which should be carried out on the host --
assume that the host's primary interface is called eth0
.
As ever, you'll need to substitute the name that is appropriate
on your system.
First, add FORWARD rules between the primary interface and the host end of the veth interface pair, in both directions. These rules will allow network traffic that arrives from the container on veth-host to be forwarded to the host's primary interface, and vice versa.
# iptables -A FORWARD -o eth0 -i veth-host -j ACCEPT
# iptables -A FORWARD -i eth0 -o veth-host -j ACCEPT
Now set up a rule for masquerading traffic whose source IP is that of the container-end of the veth pair.
# iptables -t nat -A POSTROUTING -s 10.0.3.2/24 -o eth0 -j MASQUERADE
That completes the setup in the host. In the container, we'll need
to create a route that will send all traffic, other than that within
the container itself, to the host end of the veth pair. So add this
line to start.sh
:
ip route add default via 10.0.3.1
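With the route in place, it's worth a rough end-to-end check from a shell in the container. ping may need root privileges inside the container, depending on how it's installed, and your network must allow outbound ICMP, so don't read too much into a failure here:
# ip route                        (should show the default route via 10.0.3.1)
# ping -c 1 8.8.8.8               (8.8.8.8 is Google's public DNS server; no DNS lookup is needed, so this works before resolv.conf is set up)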
Finally, to get Internet access in the container, you'll probably also need
to configure a DNS resolver in /etc/resolv.conf
. What
is appropriate will depend on where your system is located, and how it is set
up. In my tests I'm adding code like this to the script, to add
a resolver to /etc/resolv.conf
if the file does not exist:
if [ ! -f /etc/resolv.conf ]; then
  echo "nameserver 8.8.4.4" > /etc/resolv.conf
fi
"8.8.4.4" is Google's DNS server, but you probably have a more local one. And now, at last, we should be able to run the container shell, and carry out operations that require Internet access, like:
# ip netns exec container-netns unshare -mpfu chroot container /bin/start.sh
mycontainer:~$ wget http://google.com
index.html saved.
There are many other significant, network-related activities that need to be carried out, to create a viable container infrastructure. Since we only have one container so far, it's difficult to demonstrate those steps.