If You Could Write HTML, You Could Make Ebooks
You might have heard of the EPUB format. It is the standard e-book format on most e-book platforms (Amazon Kindle, which uses its own formats, is the notable exception).
Believe it or not, it is just a bunch of HTML files zipped together 😂
Unzipping an EPUB
This may sound dumb, but try it for yourself:
- First, download a DRM-free EPUB, like Metamorphosis by Kafka.
- Change the file extension to .zip (e.g. rename it to pg5200-images-3.zip).
- Unzip it.
And voilà! You just got yourself the source files of an EPUB!
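If you'd rather skip the renaming step, the sketch below uses Python's standard zipfile module. That module reads the archive itself and doesn't care about the file extension. The helper name unzip_epub and the paths are illustrative assumptions.

```python
import zipfile


def unzip_epub(epub_path: str, out_dir: str) -> list[str]:
    """Extract an EPUB into out_dir and return the names of its entries."""
    # zipfile inspects the archive contents, so the .epub extension is no obstacle.
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

For example, unzip_epub("pg5200-images-3.epub", "pg5200-source") would extract the Metamorphosis file into a pg5200-source folder and return its list of entries.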
Tools You Need to Edit an EPUB
I’ll briefly explain the EPUB file structure later. For now, let’s look at these tools that I use every day:
1. Visual Studio Code / Sublime Text / Any IDE
Since you are most likely an experienced developer, just use your favorite IDE. As usual, you can initialize a Git repository (optional), open the root folder in a window and save it as a workspace/project before you start.
My usual work involves cleaning up messy EPUB exports from Adobe InDesign. So even though I mostly code in VS Code, I use Sublime Text for EPUB editing, because its UI makes regex find-and-replace much easier.
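To give a flavour of that cleanup, here is a small Python sketch of two regex passes. The patterns are hypothetical examples. They assume the export left behind CharOverride-N / ParaOverride-N classes and bare <span> wrappers. Check what your own files actually contain before running anything like this in bulk.

```python
import re

# Hypothetical leftovers from an InDesign export; adjust to your own files.
OVERRIDE_CLASS = re.compile(r'\s+class="(?:Char|Para)Override-\d+"')
# A <span> with no attributes and only text inside adds nothing.
BARE_SPAN = re.compile(r"<span>([^<]*)</span>")


def clean_xhtml(markup: str) -> str:
    """Strip override classes, then unwrap spans left without attributes."""
    markup = OVERRIDE_CLASS.sub("", markup)
    return BARE_SPAN.sub(r"\1", markup)
```

The same patterns should also work in an editor's regex find-and-replace. Running them from a script is handy when the same cleanup repeats across many chapters.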
2. Calibre
Calibre is a popular free program for managing and editing e-books. It has a built-in editor designed for EPUB editing.
What I like about it is not that editor, though, since I prefer my usual IDE.
I like that Calibre can open a folder for editing directly, unlike Sigil, which can only open zipped EPUB files.
Its coolest feature is font subsetting: it removes the characters your content never uses from the embedded font files, cutting down the EPUB file size.
This is especially important for East Asian (Chinese / Japanese) fonts, which contain tens of thousands of glyphs and are therefore huge.
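Subsetting starts by finding out which characters the book actually uses. The sketch below uses Python's standard library only, and the function name used_characters is hypothetical. It collects the text of every .xhtml file in an unzipped EPUB. Calibre does all of this for you; the sketch only shows the idea.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collects every character that appears in text nodes."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &#169; into ©
        self.chars: set[str] = set()

    def handle_data(self, data: str) -> None:
        self.chars.update(data)


def used_characters(book_dir: str) -> str:
    """Return the sorted characters used across all .xhtml files under book_dir."""
    collector = _TextCollector()
    for path in sorted(Path(book_dir).rglob("*.xhtml")):
        collector.feed(path.read_text(encoding="utf-8"))
    collector.close()
    return "".join(sorted(collector.chars))
```

If you ever need to subset a font outside Calibre, you could save this string to a file, such as chars.txt. fontTools' pyftsubset command accepts such a file through its --text-file option.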
That’s why I always run my book through Calibre as the final touch after completing all the content.
3. eCanCrusher
eCanCrusher is just a zipping tool for EPUB. Why use it instead of 7-Zip, though? Because EPUB has a special requirement for zipping: the mimetype file must be the first file in the archive and must not be compressed.
If you’re comfortable with the terminal, you can do this instead (from Stack Overflow):
cd "folder of epub content"
# add mimetype 1st without compression
zip -0 -X ../file.epub mimetype
# add the rest
zip -9 -X -r -u ../file.epub *
…but eCanCrusher makes it super easy, as you just need to drag and drop. My only complaint is that its icon isn’t very visually appealing. Luckily, you can easily replace an app icon on a Mac.
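The mimetype requirement can also be met with Python's zipfile module. This minimal sketch writes mimetype as the first entry, stored without compression, and deflates everything else. It assumes your unpacked book lives in one folder containing mimetype, META-INF/ and the rest; pack_epub is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def pack_epub(src_dir: str, epub_path: str) -> None:
    """Zip an unpacked EPUB folder, storing mimetype first and uncompressed."""
    root = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # 1. mimetype must be the first entry and must not be compressed.
        zf.write(root / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # 2. Everything else can be deflated; skip folders and macOS metadata.
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_dir() or rel == "mimetype" or path.name == ".DS_Store":
                continue
            zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Write the .epub outside the source folder. Otherwise the sketch would pick up the half-written archive while walking the tree.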
4. Pagina EPUB-Checker
When you upload your EPUB to Google Play Books, it runs your file through a validator before your book can go live. That validator, EPUBCheck, is an open-source program maintained under the W3C.
Pagina EPUB-Checker is a GUI tool built on the same validator engine. All you need to do is drag your root folder or the compiled EPUB into it, and it will list all the errors.
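Pagina and EPUBCheck are the real tools to use. Still, the basic packaging rules are easy to check yourself. The Python sketch below (quick_check is a hypothetical name) only looks at the mimetype entry and the presence of container.xml. It is no substitute for a full validation.

```python
import zipfile


def quick_check(epub_path: str) -> list[str]:
    """Run a few basic packaging checks; this is NOT a substitute for EPUBCheck."""
    problems: list[str] = []
    with zipfile.ZipFile(epub_path) as zf:
        infos = zf.infolist()
        first = infos[0] if infos else None
        if first is None or first.filename != "mimetype":
            problems.append("mimetype is not the first entry")
        else:
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            # The file must hold exactly this string: no newline, no BOM.
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype content is not exactly application/epub+zip")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems
```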
5. Any EPUB Reader
You need at least one EPUB reader to view your compiled EPUB file. I keep several just to be sure, such as Calibre, Adobe Digital Editions and Thorium Reader, because each of them behaves quite differently.
In fact, it’s quite a challenge to get your EPUB to display reliably across readers, especially when your pages use special CSS layouts (just like email clients, ughh). So it’s important to keep your HTML/CSS as simple as possible.
EPUB 3 File Structure
Before we start: you can just download any public-domain EPUB, whether from Project Gutenberg or GitHub, and use it as your template.
The latest version of the format is EPUB 3.3. EPUB 3 does allow JavaScript, but support for it is optional for readers, and many don’t run it, partly due to security concerns.
Below are the must-have files that you’d find in any EPUB file. Apart from these, you’re free to organise your files however you like.
mimetype
First, you need a file called mimetype in the root folder, without any file extension. It must contain only this exact string, with no trailing newline:
application/epub+zip
META-INF/container.xml
Next, you need a container.xml file in a folder named META-INF. It points to the location of the .opf file (an XML file that defines the book details), relative to the root folder, like this:
<?xml version='1.0' encoding='utf-8'?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
<rootfiles>
<rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
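To see how a reader uses this file, here is a minimal Python sketch that reads container.xml and returns the path of the package file. The function name opf_path is an assumption. The container elements live in the urn:oasis:names:tc:opendocument:xmlns:container namespace, so the lookup has to be namespace-aware.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def opf_path(container_xml: str) -> str:
    """Return the full-path of the first rootfile declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None:
        raise ValueError("container.xml declares no rootfile")
    return rootfile.attrib["full-path"]
```

For the container.xml shown above, this returns "content.opf".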
content.opf
You can name this file anything you want, as long as container.xml points to it. It is an XML file that lists all the metadata, the files (manifest) and the reading order (spine) of your e-book.
Here’s an example:
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="BookId" prefix="calibre: https://calibre-ebook.com">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title id="BookTitle">一万个你也比不上这个你</dc:title>
<dc:creator id="BookAuthor">许书芹</dc:creator>
<dc:identifier id="BookId">isbn:9789672466017</dc:identifier>
<dc:language>zh</dc:language>
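<dc:date>2019-06-18T16:00:00+00:00</dc:date>
<!-- required in EPUB 3: when the book was last modified, in UTC -->
<meta property="dcterms:modified">2019-06-18T16:00:00Z</meta>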
<dc:publisher>Odonata Publishing Sdn Bhd</dc:publisher>
<meta name="cover" content="cover.jpg"/>
<meta property="belongs-to-collection" id="BookSeries">恋习</meta>
<meta refines="#BookSeries" property="collection-type">series</meta>
<meta refines="#BookSeries" property="group-position">3</meta>
</metadata>
<manifest>
<item id="cover.xhtml" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>
<item id="titlepage" href="Text/titlepage.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter2" href="Text/chapter2.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter3" href="Text/chapter3.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter4" href="Text/chapter4.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter5" href="Text/chapter5.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter6" href="Text/chapter6.xhtml" media-type="application/xhtml+xml"/>
<item id="chapter7" href="Text/chapter7.xhtml" media-type="application/xhtml+xml"/>
<item id="copyright" href="Text/copyright.xhtml" media-type="application/xhtml+xml"/>
<item id="nav.xhtml" href="Text/nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
<item id="style.css" href="Styles/style.css" media-type="text/css"/>
<item id="cover.jpg" href="Images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
</manifest>
<spine>
<itemref idref="cover.xhtml" linear="no"/>
<itemref idref="titlepage"/>
<itemref idref="chapter1"/>
<itemref idref="chapter2"/>
<itemref idref="chapter3"/>
<itemref idref="chapter4"/>
<itemref idref="chapter5"/>
<itemref idref="chapter6"/>
<itemref idref="chapter7"/>
<itemref idref="copyright"/>
</spine>
</package>
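The spine only holds idrefs, so a reader has to look each one up in the manifest. This Python sketch (reading_order is a hypothetical helper) resolves the spine to file paths. It assumes hrefs are relative to the .opf file's folder, and it leaves out items marked linear="no", such as the cover above.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str, opf_path: str = "content.opf") -> list[str]:
    """Resolve spine itemrefs to paths from the EPUB root, skipping linear="no"."""
    package = ET.fromstring(opf_xml)
    # Manifest hrefs are relative to the folder that holds the .opf file.
    base = posixpath.dirname(opf_path)
    hrefs = {
        item.get("id"): posixpath.join(base, item.get("href"))
        for item in package.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    return [
        hrefs[ref.get("idref")]
        for ref in package.iterfind("opf:spine/opf:itemref", OPF_NS)
        if ref.get("linear", "yes") != "no"
    ]
```

With the example above, the result would start with Text/titlepage.xhtml and end with Text/copyright.xhtml.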
toc.ncx
This file is required in EPUB 2, the older version of the format. If you find one in an EPUB 3 book, it’s there for backward compatibility.
nav.xhtml or toc.xhtml
If you see this file, the book is EPUB 3. The nav.xhtml file can be named anything, as long as it is declared with properties="nav" in the manifest section of content.opf:
<item id="nav.xhtml" href="Text/nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
<!-- The item with properties="nav" will be used as the Table of Contents -->
This file must contain a <nav epub:type="toc"> element, and inside that <nav> there must be an <ol>.
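Writing the navigation document by hand gets tedious as chapters pile up. Here is a minimal Python sketch that generates a nav.xhtml meeting those two rules from a list of (href, label) pairs. The function build_nav and its arguments are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav(chapters: list[tuple[str, str]], title: str = "Contents") -> str:
    """Build a minimal EPUB 3 navigation document from (href, label) pairs."""
    # quoteattr adds the surrounding quotes; escape handles &, < and > in labels.
    items = "\n".join(
        f"      <li><a href={quoteattr(href)}>{escape(label)}</a></li>"
        for href, label in chapters
    )
    return f"""<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>{escape(title)}</title>
  </head>
  <body>
    <nav epub:type="toc" id="toc">
      <h1>{escape(title)}</h1>
      <ol>
{items}
      </ol>
    </nav>
  </body>
</html>
"""
```

The hrefs are relative to nav.xhtml itself. With nav.xhtml in Text/, you would pass "chapter1.xhtml" rather than "Text/chapter1.xhtml".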
Difference Between EPUB 2 and 3
This is not very important. TL;DR: just ignore EPUB 2 and create EPUB 3 whenever you can.
In EPUB 3, it is possible to have a fixed layout just like PDF, but I won’t go into details, as it’s not very useful except for comics. Some platforms, like Google Play, accept PDFs directly for fixed-layout books.
Technically, the most noticeable difference is, as mentioned above, that EPUB 2 uses the toc.ncx file for navigation, while EPUB 3 uses whichever XHTML file is declared with the properties="nav" attribute in content.opf.
What about EPUB 1?
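TL;DR: you don’t need to know about it, because it practically never existed. The first version to carry the EPUB name was EPUB 2.0 (2007), which succeeded the older Open eBook (OEB) format.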
Difference Between XHTML and HTML
doctype
XHTML in EPUB should use this exact format (notice the XML declaration, the doctype and the namespace attributes in the <html> tag):
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<!-- headers here -->
</head>
<body>
<!-- content here -->
</body>
</html>
Single Tags
In XHTML, you need to close every single (void) tag with a slash at the end of the tag:
<meta name="viewport" content="width=device-width" />
<br />
<hr />
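EPUB readers parse XHTML as XML, so an unclosed <br> that a browser would forgive can break a page. A quick way to catch this before zipping is to run each file through an XML parser. Here is a minimal Python sketch (is_well_formed is a hypothetical name):

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> tuple[bool, str]:
    """Check whether an XHTML document parses as XML; return (ok, error message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # e.g. "mismatched tag" for an unclosed <br>
        return False, str(err)
    return True, ""
```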
HTML Entities
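Most named HTML entities are unusable in EPUB. XHTML is parsed as XML, which only predefines &amp;, &lt;, &gt;, &quot; and &apos;. For example, the &copy; entity for the copyright symbol will cause a parsing error.
Instead, you can use its numeric character reference, such as &#169; for the copyright symbol, or simply type © directly, since the file is UTF-8.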
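If you inherit files full of named entities, you can convert them in bulk. Here is a minimal Python sketch using the standard html.entities.name2codepoint table, which covers the HTML 4 entity names. The helper name to_numeric_entities is hypothetical.

```python
import re
from html.entities import name2codepoint

# The five entities XML predefines; these can stay as they are.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(markup: str) -> str:
    """Replace named HTML entities (e.g. &copy;) with numeric references (&#169;)."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_ENTITIES or name not in name2codepoint:
            return match.group(0)  # keep XML built-ins and unknown names untouched
        return f"&#{name2codepoint[name]};"

    return ENTITY.sub(repl, markup)
```

For example, "&copy; 2019 &amp; &nbsp;x" becomes "&#169; 2019 &amp; &#160;x".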
That’s All, Folks
If you’re writing your own e-book, or have accepted a job converting a PDF to EPUB, I hope this simple introduction helps you kick-start your journey.