MBook – a proposal for a new, simple e-book format based on Markdown

This article proposes a new, simple format for storing e-books, particularly novels. I’m calling this format “MBook” for now, because its text format is a subset of Markdown. While MBook lacks (by design) the sophistication of other formats, its intentional simplicity provides some advantages.
Why e-book formats are problematic
Do we really need a new document format?
There are many ways we can encode a book for viewing on a computing device. In the simplest case we might just use plain ASCII or Unicode text. The problem with this approach is that many books, even novels, benefit from non-text material like cover art, and the ability to apply simple emphasis, like bold and italic.
We might instead use word processor formats, like Microsoft Word, RTF, or ODF. These are all capable of representing books, but they are complex, and may be proprietary. It isn’t easy to write software to handle such formats, so they haven’t been widely adopted for e-books.
Then there are file formats specifically for e-books, such as EPUB, MOBI, and AZW. These formats are complex and, again, some are proprietary. All offer far more features than we require to encode most novels.
A more subtle problem is that the sophistication of these formats leads to their being abused. The EPUB format, for example, being based on HTML, allows the author very fine control over text appearance and layout. While there are circumstances in which this level of control might be necessary, it’s rarely appropriate for novels. When I read an EPUB book on my Kobo reader, for example, I invariably have to start with a bunch of configuration changes, to make the text look the way I want, not the way the author wanted. It isn’t uncommon for authors to insist on specific typefaces or specific text sizes, knowing nothing of the reader’s preferences, or the device on which the book will be read. This can be a nuisance.
If you tend to be a bit paranoid – and we all should be, the Internet being what it is – you might also worry about how an e-book reader might be used to compromise your privacy or security. A format as complex as EPUB requires correspondingly complicated software to read, and it’s plausible that a rogue EPUB might exploit defects in this software. Unlikely, to be sure; but not impossible.
Frankly, with these factors in mind, plain text looks increasingly like a good idea. The problem lies in adding just enough additional material to plain text, without introducing any of the problems of EPUB and the like.
Gempub – a step in the right direction
Any format for representing e-books needs some fundamental features.
- There must be a way to divide a long text into chapters. There should be a way to tell the reader the order in which these chapters appear within the book, if the chapters are in separate files
- Ideally, the format should support images: at least a cover image, but perhaps illustrations within the text itself
- There should be a way to store metadata, like the author, publication date, and title
- There should be provision for rudimentary text formatting, like headings and sub-headings
Additionally, for our present purposes, the format should be easy to author and to display using software.
Gempub is an e-book format that uses Gemtext as its text format. Gempub comes out of the Gemini Protocol project – Gemtext is the default document format with this protocol. Gemtext looks like a highly simplified Markdown.
A Gempub document is a zipfile that contains Gemtext files, one for each chapter, along with whatever images are required, and some metadata.
It is trivially easy to hack up a viewer for Gempub books, particularly if you regard the display of images as optional. In practice, support for Gempub has been added to applications that already support Gemtext – typically browsers for small-net protocols like Gemini.
In addition, it’s easy to create Gempub books, either from scratch or by converting from other formats. You don’t need any specific tooling – once you understand the format, all you need is a text editor and a zip utility.
A great many novels can be delivered perfectly well in Gempub format, particularly the classics. Dickens, Melville, and Jane Austen need nothing more. Gempub alleviates all the complexity-related problems of EPUB and the like, and allows the reader complete control over formatting and layout.
Unfortunately, Gemtext is just too simple for many books. Not only does it not support even the most rudimentary text formatting – not even bold or italic – it does not distinguish clearly between line breaks and paragraph breaks. A number of conventions have grown up among Gemtext users that mitigate these problems but, unfortunately, they aren’t universally applied.
The ‘MBook’ format
The MBook format is intended to overcome the limitations of Gempub, while being no more difficult to author, and little more difficult to display using software. The main difference between Gempub and MBook is the text format: rather than Gemtext, MBook uses CommonMark Markdown.
Markdown is a document format that allows simple formatting, and a small amount of control over layout. Markdown is relatively easy to parse in software but, in practice, there’s little need to: there are already many open-source libraries to do this.
Like Gempub, and also like EPUB, MBook documents are zip archives, containing text, images, and metadata. A typical novel might be structured internally like this:
mbook_metadata.txt
index.txt
description.md
text/
    chapter-01.md
    chapter-02.md
    ...
images/
    cover.jpg
    ...
Putting the text and images into separate directories is for ease of maintenance – the format itself does not care how the archive is structured. The file index.txt contains the chapter index, which lists the document files in reading order. This is a plain text file, structured like this:
text/chapter-01.md Chapter 1: In the beginning
text/chapter-02.md Chapter 2: What the butler saw
...
That is, the index gives the pathnames of the chapters, relative to the top directory of the archive and, optionally, the chapter headings. Reader software can use this information to provide a table of contents, although the document can contain its own contents page if the author prefers.
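The chapter index described above is simple enough to parse in a few lines. The following is a minimal sketch, not taken from the specification: it assumes that chapter pathnames contain no spaces (so the first run of whitespace separates the path from the optional heading) and that the file is UTF-8 encoded. The function names parse_index and read_index are hypothetical.

```python
import zipfile
from typing import List, Optional, Tuple


def parse_index(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse index.txt content into (path, heading) pairs, in reading order."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the path contains no spaces, so the first run of
        # whitespace separates it from the optional heading.
        parts = line.split(None, 1)
        path = parts[0]
        heading = parts[1] if len(parts) > 1 else None
        entries.append((path, heading))
    return entries


def read_index(archive_path: str) -> List[Tuple[str, Optional[str]]]:
    """Read and parse index.txt from the top level of an MBook zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        raw = archive.read("index.txt").decode("utf-8")  # UTF-8 is assumed
    return parse_index(raw)
```

A reader could use the returned list directly as a table of contents, falling back to the pathname when an entry has no heading.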
The file mbook_metadata.txt contains metadata about the book – author, title, etc.
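Since an MBook document is just a zip archive, packaging one needs nothing beyond a standard zip library. The sketch below, with the hypothetical function name pack_mbook, zips a directory that already contains index.txt, mbook_metadata.txt, and the chapter and image files. It does not validate the contents against the specification, and it assumes the output file is written somewhere outside the source directory.

```python
import os
import zipfile


def pack_mbook(source_dir: str, output_path: str) -> list:
    """Zip every file under source_dir, storing paths relative to its top."""
    stored = []
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Archive names use forward slashes and no leading directory,
                # so they match the relative paths used in index.txt.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                archive.write(full_path, arcname)
                stored.append(arcname)
    return stored
```

The returned list of archive names can be compared with the entries in index.txt to catch a chapter that is listed but missing.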
If the text includes illustrations, they are referenced using ordinary Markdown image syntax: an exclamation mark, the alt-text in square brackets, then the pathname of the image in parentheses.

A reader application that can’t render images is expected to display the alt-text instead. As in the chapter index, the pathnames of images are relative to the top-level directory of the zip archive.
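As an illustration of the alt-text fallback, here is a rough sketch of how a text-only reader might handle an image written as ![The cover](images/cover.jpg). The regular expression is a simplification of CommonMark's image rules: it ignores reference-style images and nested brackets, and a real reader would more likely rely on a Markdown library. The names find_images and images_to_alt_text are hypothetical.

```python
import re

# Matches inline images of the form ![alt text](path), optionally with a
# quoted title. Nested brackets and reference-style images are not handled.
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def find_images(markdown: str) -> list:
    """Return (alt_text, path) pairs for each inline image."""
    return [(m.group(1), m.group(2)) for m in IMAGE_PATTERN.finditer(markdown)]


def images_to_alt_text(markdown: str) -> str:
    """Replace each inline image with its alt-text in square brackets."""
    return IMAGE_PATTERN.sub(lambda m: "[" + m.group(1) + "]", markdown)
```

The paths returned by find_images are relative to the top of the archive, so they can be passed unchanged to a zip library to load the image data.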
Documents for which MBook might, and might not, be suitable
MBook is a simple format, intended mostly for novels and poetry. It is far simpler than EPUB.
The text of an MBook e-book can contain illustrations, but the format provides no means for the book’s creator to specify the size or location of images. It is expected that illustrations will be displayed using the full width of the page. Documents can have links embedded in the text, but only to chapters, not to specific lines.
MBook is unlikely to be appropriate for textbooks, or documents containing complicated mathematical expressions. Since there is no way to specify a typeface, it’s also unsuitable for any work that uses font changes for artistic effect.
The MBook specification
The MBook format has a detailed specification and explanatory notes on GitHub. Prospective book authors or packagers should at least glance at the specification, because MBook does not allow all possible Markdown features. In particular, it forbids the use of HTML embedded in Markdown. This deliberate lack of HTML support is both to make it easier to write software for displaying MBook, and to prevent authors from using HTML to format text in ways that Markdown does not allow.
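A packager might want to check chapter files for embedded HTML before building an archive. The following is only a heuristic sketch, not part of the specification: it flags anything that looks like an opening, closing, or self-closing tag. It will also flag tags that appear inside Markdown code spans or code blocks, where they would be harmless literal text. The name find_html_tags is hypothetical.

```python
import re

# Heuristic: an opening, closing, or self-closing tag such as <b>, </i>, <br/>.
# Requiring whitespace, "/" or ">" right after the tag name avoids flagging
# Markdown autolinks such as <https://example.org>.
HTML_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def find_html_tags(markdown: str) -> list:
    """Return (line_number, tag) for each apparent HTML tag, 1-indexed."""
    found = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        for match in HTML_TAG.finditer(line):
            found.append((number, match.group(0)))
    return found
```

An empty result suggests, but does not prove, that a chapter is free of embedded HTML.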
At present, I don’t regard the specification as fixed, and I welcome corrections or feature requests, through GitHub.
MBook reader software
At the time of writing, the following software supports version 1.0.1 of the MBook specification.
MBook samples
There are some sample documents in MBook format. All of these have been converted from EPUB, and are in the public domain. It’s relatively easy to convert an EPUB document with simple formatting to MBook. For example, basic HTML markup like <b> for bold and <i> for italic can easily be converted to Markdown.
Unfortunately, many EPUB documents use non-standard formatting based on CSS style-sheets, and these are less easy to convert.
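For the simple case, converting <b> and <i> to Markdown emphasis can be sketched with two regular expressions. This is an illustration only, and not necessarily how the samples were converted: it assumes tags without attributes, and it does not trim whitespace just inside a tag, which Markdown emphasis does not tolerate. The name simple_html_to_markdown is hypothetical.

```python
import re


def simple_html_to_markdown(html: str) -> str:
    """Convert attribute-free <b> and <i> pairs to Markdown emphasis."""
    flags = re.DOTALL | re.IGNORECASE
    # Bold becomes double asterisks, italic becomes single asterisks.
    text = re.sub(r"<b>(.*?)</b>", r"**\1**", html, flags=flags)
    text = re.sub(r"<i>(.*?)</i>", r"*\1*", text, flags=flags)
    return text
```

Styling applied through CSS classes leaves no such tags to match, which is why those documents need more work.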
The future
I do not know how much interest there will be in MBook. There’s been some small interest in the small-net community, but it needs more than this for it to be worth investing significant effort.
In the longer term, I’d like to write an MBook reader for Android or, at least, add MBook support to an existing e-book app. I’d also like to add support to stand-alone readers like those from Kobo. I don’t know if it would be possible to add support to Kindle devices, even if there were any interest in doing so.
If anybody is interested in collaborating on any of this, do please let me know.


