Extracting software from the Raspbian repository, for assembling a custom Linux image for the Raspberry Pi

Most of the Raspberry Pi-based equipment I build uses a custom Linux installation -- all the official Linux distributions are far too desktop-focused for an appliance. I explain in my article on booting a Raspberry Pi to a shell how to begin the process of assembling a custom Linux using the official Raspbian Linux kernel, which seems to me to be a reasonable approach unless you're looking for sub-second boot times. I think also that, when it comes to general-purpose utilities (grep, cp, sshd, etc), there's something to be said for using pre-built versions of these things, rather than compiling them all from scratch. After all, the problem with Raspbian is not the software itself, but the way it is assembled into a full distribution.

It's perfectly possible to pick individual packages out of the repository, and unpack them to get the necessary files. In some cases there may be no alternative. In most cases, though, it's easier if you can set things up so you can just download a package, and all its dependencies, and have the contents unpacked into a staging directory, ready to include in your build. To be fair, there is a certain amount of 'dependency sprawl' in the Raspbian (and most other) repositories, so you do have to exercise a bit of caution, and probably script some post-download pruning.

If you're running a Debian-based Linux, or even Raspbian itself, then you can use the apt-get utility to fetch files from the repository appropriate for your distribution. However, this is probably not a good way to try to assemble your own Linux distribution, for a number of reasons.

In my view it's better, and not all that difficult, to script the download of files from a specified repository for a specified target architecture and a specified version, rather than to try to subvert existing tools to do the job (even if they are available).

How a Debian-based repository is organized

Like other Debian-based Linux distributions, Raspbian versions are identified by name, rather than by number. At the time of writing, the latest version is 'Buster'. The main Raspbian repository is at http://archive.raspbian.org/raspbian/. Within that directory, the actual packages -- for all platforms and architectures -- are in http://archive.raspbian.org/raspbian/pool/main. The pool/main/ directory is organized into (approximate) alphanumeric order by package name but, with all the architectures and versions jumbled together, it's difficult to locate the required package.

To find the specific download you need for a particular Raspbian version, you need to look in the package index which, for Buster on ARM, is the file Packages in dists/buster/main/binary-armhf. There are compressed and uncompressed versions, both of which are substantial, and probably only need to be downloaded once in the life of a project.
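As a rough illustration of how the index location is put together, here is a minimal Python sketch. The function names are hypothetical, and the compressed file name Packages.gz is an assumption; check the directory listing to see which compressed variants the repository actually offers.

```python
import gzip
import urllib.request

# Top of the repository, as given in the text.
REPO = "http://archive.raspbian.org/raspbian"

def packages_url(release="buster", component="main", arch="armhf",
                 compressed=True):
    """Build the URL of the package index for one release and architecture."""
    # Assumption: the compressed index is called Packages.gz.
    name = "Packages.gz" if compressed else "Packages"
    return f"{REPO}/dists/{release}/{component}/binary-{arch}/{name}"

def fetch_packages(dest="Packages", **kwargs):
    """Download the compressed index and store it uncompressed in `dest`."""
    with urllib.request.urlopen(packages_url(**kwargs)) as response:
        text = gzip.decompress(response.read())
    with open(dest, "wb") as out:
        out.write(text)
    return dest
```

Calling fetch_packages() once and keeping the resulting file alongside the build scripts avoids downloading the index again on every run.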

Structure of the Packages file

The Packages file is a plain text file consisting of blocks of lines, one block per package, separated by blank lines. In a script, you can easily find the block corresponding to a particular package by searching for a line of the form

Package: {name}

and reading to the next blank line.
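That search might be sketched in Python as follows. The function name find_block is hypothetical; it accepts any iterable of lines, so an open Packages file can be passed in directly.

```python
def find_block(lines, name):
    """Return the lines of the block for package `name`, or None if absent."""
    target = f"Package: {name}"
    block = None
    for line in lines:
        line = line.rstrip("\n")
        if block is None:
            # Still looking for the start of the block we want.
            if line == target:
                block = [line]
        elif line == "":
            # A blank line ends the block.
            return block
        else:
            block.append(line)
    # Either the package was not found (None), or its block was the
    # last one in the file and had no trailing blank line.
    return block
```

For example, `with open("Packages", encoding="utf-8") as index: block = find_block(index, "findutils")`. Matching the whole line, rather than a prefix, avoids picking up findutils-something when looking for findutils.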

Each block consists of name-value pairs. For illustration, here is the entry for the findutils package.

Package: findutils
Version: 4.6.0+git+20190209-2
Architecture: armhf
Essential: yes
Maintainer: Andreas Metzler 
Installed-Size: 1855
Pre-Depends: libc6 (>= 2.28), libselinux1 (>= 1.32)
Suggests: mlocate | locate
Breaks: binstats (<< 1.08-8.1),... 
Multi-Arch: foreign
Homepage: https://savannah.gnu.org/projects/findutils/
Priority: required
Section: utils
Filename: pool/main/f/findutils/findutils_4.6.0+git+20190209-2_armhf.deb
Size: 652248
SHA256: 08b612...
SHA1: e49792b1...
MD5sum: 504604...
Description: utilities for finding files--find, xargs
 GNU findutils provides utilities to find files meeting specified
 criteria and perform various actions on the files which are found.
 This package contains 'find' and 'xargs'; however, 'locate' has
 been split off into a separate package.

For our purposes, the relevant entries are Depends and Pre-Depends (which amount to the same thing here) and Filename. The first two list the package's dependencies; the last gives the location of the .deb file that contains the code.
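To make this concrete, here is a hedged Python sketch that turns one block into a dictionary and extracts the dependency names. The function names are hypothetical, and the handling of alternatives (a | b) is a deliberate simplification: it always takes the first option, which will not suit every build.

```python
import re

def parse_block(lines):
    """Turn the lines of one block into a dict of field name -> value."""
    fields = {}
    key = None
    for line in lines:
        if line[:1] in (" ", "\t") and key is not None:
            # Indented lines continue the previous field (e.g. Description).
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value.strip()
    return fields

def dependency_names(fields):
    """Collect bare package names from Pre-Depends and Depends."""
    names = []
    for key in ("Pre-Depends", "Depends"):
        for clause in fields.get(key, "").split(","):
            # "a | b" offers alternatives; this sketch takes the first.
            first = clause.split("|")[0].strip()
            # Keep only the name: this drops version constraints such as
            # "(>= 2.28)" and qualifiers such as ":any".
            match = re.match(r"[a-z0-9][a-z0-9+.-]+", first)
            if match and match.group(0) not in names:
                names.append(match.group(0))
    return names
```

Applied to the findutils entry above, dependency_names() yields libc6 and libselinux1, and the Filename field gives the path of the .deb relative to the top of the repository.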

Even this simple example does illustrate a potential problem with using a repository for the purposes of building a custom Linux. find has a dependency on libc, which is unproblematic -- almost everything does. It also has a dependency on libselinux1, which may or may not be relevant. If you're building an appliance with an embedded Linux system, you may have no need for SELinux. However, find won't work without the libselinux.so library, whether you're using SELinux or not. There's no easy way to break that dependency other than to build find yourself from source, with SELinux support disabled. If you download the libselinux1 package -- and you'll have to, if you're using repository builds -- you'll find it has a dependency on libpcre3 which has a dependency on... and so it goes.

In short, working with a repository is much quicker than building everything from source, but it's difficult to avoid including a heap of software that you might not ever need, and which is not optimized for embedded use. What you gain on the swings... etc.

To download findutils, then, you'll need to download its .deb package, and then follow all its dependencies recursively, downloading every package that is referenced. If you're scripting the process, you'll need to keep track of what you've downloaded, not just to avoid downloading common dependencies over and over again, but to avoid the inevitable 'dependency loops' that the repository has. The packages themselves are located by the Filename: line in each package block, and this is relative to the top of the repository.
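The bookkeeping can be sketched in a few lines of Python. This is an illustration only: the function name resolve and the shape of the index argument (a dictionary built beforehand from the Packages file) are assumptions made for the example.

```python
def resolve(root, index):
    """Find every package needed by `root`, including `root` itself.

    `index` maps a package name to a (filename, [dependency names]) pair.
    Returns (filenames, missing), where `missing` lists dependency names
    that have no block of their own in the index.
    """
    seen = set()
    filenames = []
    missing = []
    pending = [root]
    while pending:
        name = pending.pop()
        if name in seen:
            # Already handled: this is what breaks dependency loops and
            # avoids fetching common dependencies more than once.
            continue
        seen.add(name)
        if name not in index:
            missing.append(name)
            continue
        filename, depends = index[name]
        filenames.append(filename)
        pending.extend(depends)
    return filenames, missing
```

Each returned filename is relative to the top of the repository, so the download URL is the repository URL, a slash, and the filename. Names that end up in the missing list need attention by hand; they may, for example, be names that another package provides rather than packages in their own right.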

Structure of a .deb file

The .deb package file is just an ar archive, and you can use ar x to unpack it. Inside the archive are further compressed archives, of which the only one relevant here is data.xxx (in practice a tar archive with a name such as data.tar.xz or data.tar.gz). This file may be in any of several compression formats so, if you're scripting the download, you need to be prepared to examine the file extension and use the appropriate decompressor.

The contents of the data.xxx are the actual files, relative to the root directory of the installation. You can unpack this file into whichever directory you are using to stage your root filesystem.
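The unpacking step can also be done without external tools. The following Python sketch reads the ar container directly and hands the data archive to the standard tarfile module; the function names are hypothetical, and it only copes with the compression formats that tarfile can detect in the Python version in use (gzip, bzip2 and xz are long-standing).

```python
import io
import tarfile

def ar_members(path):
    """Yield (name, data) for each member of an ar archive."""
    with open(path, "rb") as archive:
        if archive.read(8) != b"!<arch>\n":
            raise ValueError(f"{path} is not an ar archive")
        while True:
            header = archive.read(60)      # fixed-size member header
            if len(header) < 60:
                return
            name = header[:16].decode("ascii").rstrip(" /")
            size = int(header[48:58].decode("ascii"))
            data = archive.read(size)
            if size % 2:
                archive.read(1)            # members are padded to even length
            yield name, data

def unpack_deb(path, staging_dir):
    """Extract the data archive of a .deb into `staging_dir`."""
    for name, data in ar_members(path):
        if name.startswith("data.tar"):
            # Mode "r:*" lets tarfile work out the compression itself.
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
                tar.extractall(staging_dir)
            return
    raise ValueError(f"no data archive found in {path}")
```

This holds each archive member in memory, which is acceptable for a sketch but may not be for very large packages. Recent Python versions apply stricter extraction filters by default, which may reject some links found in real packages, so treat this as a starting point rather than a finished tool.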

Many packages contain post-installation scripts and, while these might be appropriate if you're installing a package on a running Linux system, they will be of little use in a custom build. In practice, you'll need to script these post-installation steps, if there are any, manually.

An implementation

A simple Perl script that does all the above may be found on GitHub. By all means use it, but check it for suitability first.

Summary

Building the root filesystem of a custom Linux using a repository is not difficult, and it's more convenient than building everything from source. However, the files you get may not be optimal for an embedded installation, and you'll almost certainly get a lot more files than you ideally want. There's more to installing software than simply copying files into the proper locations, and using a repository won't relieve you of the need to do the post-installation work, whatever it turns out to be.

One final note: there is software in the Raspbian repository that is not, or is no longer, used by the mainstream distribution. There is an implementation of SysV init, for example, which might be more appropriate in an embedded Linux than systemd would be.