KIST to eBook Conversion Notes
KIST Markup by Richard Rathe, v0.17, May 2024
This is a work in progress and subject to change!
Converting KIST to EPUB
Several steps are required to prepare KIST for publication as an eBook. Here is the basic process I've come up with so far.
- Split the source file into separate files for each major section.
- page00.txt, page01.txt, page02.txt, etc…
- Convert these source files to XHTML.
- page00.xhtml, page01.xhtml, page02.xhtml, etc…
- Build metadata, manifest, and spine lists to store in the OPF file.
- Build the Table of Contents (NCX) file.
- Zip the contents into a compressed archive and rename to .epub.
This assumes a flat organization with all content files in the same directory.
Jargon Watch
- XHTML = An XML-compatible version of HTML
- OPF = Open Packaging Format (XML) (see below)
- NCX = Navigation Control file for XML (table of contents)
- OEBPS = Open eBook Publication Structure (legacy name?)
EPUB Formatting Notes
I've never liked XML much, but EPUB is an XML-based format, so I'll just muscle through regardless. Here is my cheat sheet for EPUB version 2.0.
- Basic directory/file framework…
- META-INF (directory)†
- container.xml†
- mimetype (file)†
- OEBPS (directory)†
- content.opf (file)‡
- metadata
- manifest
- spine
- Cover.jpg (cover art)
- epub.css (styles as needed)
- toc.ncx (table of contents)‡
- content files…
†Set and forget incantations. ‡Files where we really need to do some work!
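The framework above can be scaffolded with a few commands. This is only a sketch: the root directory name mybook is a placeholder of mine, and the two ‡ files are created empty so they can be filled in later.

```shell
#!/bin/sh
# Hypothetical project root; any directory name works.
book_dir="mybook"

# The two fixed directories of the framework.
mkdir -p "$book_dir/META-INF" "$book_dir/OEBPS"

# Empty placeholders for the files that need real work later.
: > "$book_dir/OEBPS/content.opf"
: > "$book_dir/OEBPS/toc.ncx"

# List what was created.
find "$book_dir" | sort
```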
The META-INF directory usually contains a single XML file with one bit of data in it—the path to the content.opf file. Set it and forget it!
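For reference, here is a sketch that writes a typical container.xml, assuming it is run from the book's root directory. The full-path value assumes content.opf sits directly inside OEBPS, as in the framework above. Adjust it if your layout differs.

```shell
#!/bin/sh
# Run from the book's root directory (the one that holds mimetype).
mkdir -p META-INF

# The quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > META-INF/container.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <!-- The one bit of data that matters: the path to the OPF file. -->
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
EOF
```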
The mimetype file contains the phrase application/epub+zip with no trailing newline. It is an identifier for eBook reader software (see below).
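One way to create the file without a stray newline is printf, which writes exactly the characters it is given. A minimal sketch, assuming it is run from the book's root directory:

```shell
#!/bin/sh
# printf adds no trailing newline (a plain echo would).
printf 'application/epub+zip' > mimetype

# Sanity check: the phrase is 20 characters, so the file should be 20 bytes.
wc -c < mimetype
```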
The OEBPS directory is where you actually build your book. All of your files live here, plus a few special bits.
The content.opf XML file contains several parts with distinct functions.
- Metadata like title, author, date of publication, etc.
- A Manifest of every file in the OEBPS directory
- The Spine (page order) of the book
- (The optional Guide section is not used here.)
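The three parts can be seen together in a skeleton content.opf for EPUB 2.0. This is a sketch, not the file used for KIST: the title, author, date, UUID, and all id values (BookId, ncx, css, cover, page00, page01) are placeholders I made up for illustration. It assumes the flat layout described earlier and is run from the book's root directory.

```shell
#!/bin/sh
mkdir -p OEBPS

cat > OEBPS/content.opf <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <!-- Metadata: placeholder values throughout. -->
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Title</dc:title>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2024-05-01</dc:date>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <!-- Manifest: one item per file in OEBPS (except content.opf itself). -->
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="epub.css" media-type="text/css"/>
    <item id="cover" href="Cover.jpg" media-type="image/jpeg"/>
    <item id="page00" href="page00.xhtml" media-type="application/xhtml+xml"/>
    <item id="page01" href="page01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <!-- Spine: reading order, referring to manifest ids. -->
  <spine toc="ncx">
    <itemref idref="page00"/>
    <itemref idref="page01"/>
  </spine>
</package>
EOF
```

Every idref in the spine has to match an id in the manifest, and the toc attribute points at the manifest entry for toc.ncx.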
The Cover.jpg is just what it sounds like. Apparently it must be capitalized and must be a JPEG.
The CSS file (if used) can have any name. I standardized on epub.css for this project.
The toc.ncx file uses a specialized XML format to define the Table of Contents.
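Here is a sketch of a small toc.ncx with two entries, written from the book's root directory. The title, section labels, navPoint ids, and UUID are placeholders of mine. The dtb:uid value should match the identifier used in content.opf.

```shell
#!/bin/sh
mkdir -p OEBPS

cat > OEBPS/toc.ncx <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle><text>Example Title</text></docTitle>
  <!-- One navPoint per table-of-contents entry, in reading order. -->
  <navMap>
    <navPoint id="nav01" playOrder="1">
      <navLabel><text>Section One</text></navLabel>
      <content src="page00.xhtml"/>
    </navPoint>
    <navPoint id="nav02" playOrder="2">
      <navLabel><text>Section Two</text></navLabel>
      <content src="page01.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
EOF
```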
The rest of the files are content (XHTML), images (JPEG), and possibly other media.
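A content file can be as simple as the XHTML 1.1 page sketched below, written from the book's root directory. The heading and paragraph text are placeholders. The actual markup produced from KIST will differ.

```shell
#!/bin/sh
mkdir -p OEBPS

cat > OEBPS/page00.xhtml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>Section One</title>
  <!-- Flat layout: the stylesheet sits next to the page. -->
  <link rel="stylesheet" type="text/css" href="epub.css"/>
</head>
<body>
  <h1>Section One</h1>
  <p>Converted content goes here.</p>
</body>
</html>
EOF
```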
Zip It All Up!
There is one caveat when creating the final zip archive.
The mimetype file must be stored uncompressed and positioned first in the archive.
This lets eBook readers and other software confirm the format (application/epub+zip) by looking at the start of the file.
Two zip commands are required. The first creates the archive and stores the mimetype file uncompressed. The second compresses and adds the rest of the eBook.
zip -X0 bookname mimetype
zip -rDX9 bookname * -x mimetype
On macOS you may need to add another exclusion (-x) for hidden directory information files (.DS_Store) that might be present.
zip -rDX9 bookname * -x mimetype -x "*.DS_Store"