18 June 2017
EPUB 3.1 is structurally identical to EPUB 3.0.1 and EPUB 3. Let's take a quick look at the package internals as a refresher, or as an introduction for anyone new to the EPUB game.
Every EPUB file is a ZIP archive containing a collection of files and directories. The EPUB 3.1 ZIP package is the same as in prior versions. (In the listing, capitals are directories and lowercase names are files.)
```
mimetype
META-INF/
    container.xml
OPS/
    package.opf
    nav.xhtml
    chapter01.xhtml
    chapter02.xhtml
    (etc)
    CSS/
    IMAGES/
    OTHER-FOLDERS/
```
We are using chapter as a placeholder for any book section type.
The folder names can be anything you want, as long as your internal references to files use the correct file paths. Infogrid Pacific uses OPS (Open Publication Structure) because we were making EPUB files long before the IDPF started making them. The IDPF sample files use a directory named EPUB and place the section files in nested directories. IGP puts the book section files in the same directory as the package.opf file to keep pathnames short. There are pros and cons to this approach, which we will discuss another time.
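The mimetype file is a simple text statement, the single string application/epub+zip with no trailing spaces or line break, encoded as UTF-8 (NOTE: in EPUB every text-based file, such as XML, XHTML and CSS, must be encoded as UTF-8 or UTF-16). It must be the first file in the ZIP package. This is identical to previous EPUB versions.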
If you are assembling an EPUB by hand, zip the mimetype file first, stored without compression, and then add all the other files. That ensures it is the first file in the package.
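As a rough illustration of the ordering rule, here is a minimal Python sketch (Python 3.9+, standard library only) that packages a set of files with the mimetype entry written first and stored uncompressed. The function name build_epub and its dict-of-files input are our own assumptions for the example, not part of any EPUB tool.

```python
import io
import zipfile

MIMETYPE = "application/epub+zip"

def build_epub(files: dict[str, bytes]) -> bytes:
    """Package files into an EPUB-style ZIP (hypothetical helper).

    `files` maps archive paths such as "OPS/package.opf" to their contents.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must be first and stored, not deflated.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed normally.
        for path, data in files.items():
            if path == "mimetype":
                continue  # already written above
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

A quick way to confirm the ordering is to look at bytes 30 to 58 of the result: the first local file header places the entry name there, followed directly by its uncompressed content, so they should read mimetypeapplication/epub+zip.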
The container.xml file lives in the META-INF directory and tells the reading system where to find the OPF file. The path to the OPF file (the full-path attribute) is relative to the root of the ZIP container, not to META-INF. This is identical to previous EPUB versions.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
```
Reading systems look at META-INF/container.xml first to find the path of the .opf file. The OPF filename doesn't have to be package.opf, but it is sensible to keep it standardized: some reading systems may look for this exact filename.
If an EPUB is encrypted or digitally signed, the encryption.xml or signatures.xml file also goes here.
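To make that first lookup concrete, here is a minimal Python sketch of what a reading system does before anything else: parse META-INF/container.xml and pull out the full-path of the OPF rootfile. The function name find_opf_path is hypothetical, and the sketch assumes container.xml declares the standard OCF namespace urn:oasis:names:tc:opendocument:xmlns:container.

```python
import xml.etree.ElementTree as ET

# Namespace OCF defines for META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_MEDIA_TYPE = "application/oebps-package+xml"

def find_opf_path(container_xml: bytes) -> str:
    """Return the full-path of the first OPF rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    for rootfile in root.iterfind("c:rootfiles/c:rootfile", CONTAINER_NS):
        if rootfile.get("media-type") == OPF_MEDIA_TYPE:
            # full-path is relative to the root of the ZIP, not to META-INF.
            return rootfile.get("full-path")
    raise ValueError("container.xml lists no OPF rootfile")
```

In practice the bytes would come straight out of the archive, for example zipfile.ZipFile("book.epub").read("META-INF/container.xml"), where book.epub is a placeholder name.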
The package.opf is an XML file with UTF-8 encoding that is "the heart and soul" of an EPUB and defines the package. There are four major structures in the OPF: the opening <package> element, <metadata>, <manifest> and <spine>.
There are a lot of attributes in the opening <package> element but these are all "cut-and-paste" and pretty much identical for every EPUB file. The exciting difference is version="3.1".
```xml
<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" prefix="dc: http://purl.org/dc/elements/1.1/" version="3.1" xml:lang="en" unique-identifier="isbn">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    ......
  </metadata>
  <manifest>
    ......
  </manifest>
  <spine>
    ......
  </spine>
</package>
```
You must set the xml:lang="en" attribute to the primary language of the book. The unique-identifier attribute is an ID reference to the <dc:identifier> element. It can be any valid XML ID (it must start with a letter or an underscore) as long as the unique-identifier value and the dc:identifier id are identical.
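A quick sanity check for the unique-identifier rule can be sketched in Python: read the unique-identifier attribute from <package>, then look for a <dc:identifier> whose id matches. The function name is ours, and the sketch assumes a well-formed OPF that uses the Dublin Core namespace http://purl.org/dc/elements/1.1/ for its dc: elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def package_identifier(opf_xml: bytes) -> str:
    """Return the text of the <dc:identifier> that <package unique-identifier> points to."""
    package = ET.fromstring(opf_xml)
    target_id = package.get("unique-identifier")
    # Search every dc:identifier in the document for the matching id.
    for ident in package.iter(f"{{{DC_NS}}}identifier"):
        if ident.get("id") == target_id:
            return (ident.text or "").strip()
    raise ValueError(f"unique-identifier {target_id!r} matches no dc:identifier id")
```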
Here is an example package.opf for a three-chapter book with a cover, a title page and chapter documents (Content Documents). The item that is new in EPUB 3.1 is the version="3.1" attribute.
```xml
<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.1" xml:lang="en" unique-identifier="isbn">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="isbn">9876543210123</dc:identifier>
    <dc:title>Book Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2017-06-16T14:00:00Z</meta>
    <dc:creator>Author Name</dc:creator>
    <dc:publisher>Publisher Name</dc:publisher>
    <dc:rights>Copyright © 2010-2017 Publisher</dc:rights>
  </metadata>
  <manifest>
    <item id="toc" href="TOC.xhtml" properties="nav" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
    <item id="cover-image" properties="cover-image" href="cover.jpg" media-type="image/jpeg"/>
    <item id="s01" href="DIRECTORYPATH/title-page.html" media-type="application/xhtml+xml"/>
    <item id="s02" href="DIRECTORYPATH/chapter1.html" media-type="application/xhtml+xml"/>
    <item id="s03" href="DIRECTORYPATH/chapter2.html" media-type="application/xhtml+xml"/>
    <item id="s04" href="DIRECTORYPATH/chapter3.html" media-type="application/xhtml+xml"/>
    <item id="s05" href="DIRECTORYPATH/exercise1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="s01"/>
    <itemref idref="s02"/>
    <itemref idref="s03"/>
    <itemref idref="s04"/>
    <itemref idref="s05" linear="no"/>
  </spine>
</package>
```
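The spine is what turns the manifest into a reading order: each <itemref idref> points at a manifest <item id>. Here is a minimal Python sketch of that resolution, assuming the OPF namespace http://www.idpf.org/2007/opf; the function reading_order and its include_nonlinear switch are hypothetical. Items marked linear="no", such as the exercise file in the example, are skipped by default since they sit outside the main reading flow.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

def reading_order(opf_xml: bytes, include_nonlinear: bool = False) -> list[str]:
    """Resolve <spine> itemrefs to manifest hrefs, in spine order."""
    package = ET.fromstring(opf_xml)
    # Map manifest item ids to their hrefs.
    hrefs = {item.get("id"): item.get("href")
             for item in package.iterfind("opf:manifest/opf:item", NS)}
    order = []
    for itemref in package.iterfind("opf:spine/opf:itemref", NS):
        # linear defaults to "yes"; "no" marks auxiliary content.
        if itemref.get("linear", "yes") == "no" and not include_nonlinear:
            continue
        # A KeyError here means the spine references an id missing from the manifest.
        order.append(hrefs[itemref.get("idref")])
    return order
```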
Reading systems use the metadata and the cover image to build the reader's book selection library.
One of the most important files in the manifest is the Table of Contents, or Navigation Document. It is mandatory in EPUB 3.1, and reading systems identify it by the properties="nav" attribute on its manifest <item>.
The reading system can also find the cover image with the properties="cover-image" attribute.
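Both lookups work the same way, so one hypothetical helper covers them. Note that properties is a space-separated list (an item can carry, say, properties="nav scripted"), so this minimal Python sketch splits the value rather than comparing the whole string. It assumes the OPF namespace http://www.idpf.org/2007/opf.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

def items_with_property(opf_xml: bytes, prop: str) -> list[str]:
    """Return the hrefs of manifest items whose properties list contains `prop`."""
    package = ET.fromstring(opf_xml)
    return [
        item.get("href")
        for item in package.iterfind("opf:manifest/opf:item", NS)
        # properties is space-separated, so test membership token by token.
        if prop in item.get("properties", "").split()
    ]
```

For a conforming EPUB 3.1 package, items_with_property(opf, "nav") should return exactly one href.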
The Table of Contents file is formally known as the EPUB Navigation Document, and it is the primary book navigation file used by the reading system. It can have any filename, because it is located through the <manifest> <item> that has properties="nav" applied. It is still good to be consistent with this filename: the IDPF examples use nav.xhtml, and we have used TOC.xhtml since 2007.
```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <title>Contents</title>
    <link rel="stylesheet" href="css/stylesheet.css" type="text/css"/>
  </head>
  <body>
    <nav id="toc" role="navigation" epub:type="toc">
      <h2 id="toc-title">Table of Contents</h2>
      <ol aria-labelledby="toc-title" epub:type="list">
        <li><a href="DirectoryPath/title-page.html">Title</a></li>
        <li><a href="DirectoryPath/chapter1.html">Chapter 1</a></li>
        <li><a href="DirectoryPath/chapter2.html">Chapter 2</a></li>
      </ol>
    </nav>
  </body>
</html>
```
The Navigation Document is primarily a file to be used and presented by the reading system. EPUB 3.1 strongly recommends the inclusion of accessibility tagging. Here the aria-labelledby attribute ties the list to its heading, so screen-reading software will speak "Table of Contents" before reading through the list.
The Navigation Document can also contain page numbers (a page list) and landmarks, both of which are optional. We will leave those for a separate discussion.
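Since the Navigation Document is ordinary XHTML, a reading system (or a QA script) can extract the table of contents with a plain XML parser. This is a minimal Python sketch, assuming the XHTML namespace is declared on <html> and the epub prefix maps to http://www.idpf.org/2007/ops; toc_entries is a hypothetical name, and any nested sub-lists are flattened into document order.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"

def toc_entries(nav_xhtml: bytes) -> list[tuple[str, str]]:
    """Return (label, href) pairs from the <nav epub:type="toc"> element."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        # epub:type can hold several space-separated values.
        if "toc" in nav.get(f"{{{EPUB}}}type", "").split():
            return [("".join(a.itertext()).strip(), a.get("href"))
                    for a in nav.iter(f"{{{XHTML}}}a")]
    raise ValueError("no nav element with epub:type=\"toc\" found")
```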
The "Content Documents" are the stuff readers want to enjoy and engage with. All the above is technical packaging stuff for reading systems.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta charset="utf-8"/>
    <title>Section Title</title>
    <link rel="stylesheet" type="text/css" href="filename.css"/>
    <meta name="generator" content="IGP:Digital Publisher 7.4"/>
    <meta name="description" content="Description text"/>
  </head>
  <body>
    <div class="galley-rw chapter-rw" role="doc-chapter">
      <div class="title-block-rw">
        <p class="title-num-rw">ONE</p>
        <h1>The story of Digital Content</h1>
      </div>
      <p>The story begins in Chapter One....</p>
      <p>Chapter One ends.</p>
    </div>
  </body>
</html>
```
The meta name="generator" element is optional but does let you state how the content was produced.
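<meta charset="utf-8"> is new in HTML5. In an XHTML Content Document the XML declaration is what actually sets the file's encoding, and utf-8 is the value to use in the meta element. UTF-8 handles every script, including Chinese, so UTF-16 is not required for Chinese content.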
The generator meta element is inserted automatically when files are produced with IGP:Digital Publisher. Its version number tells you which features are supported in the package definition.
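For anyone generating Content Documents from a script rather than a publishing tool, here is a minimal Python sketch of a chapter template. The function chapter_document and its default generator string are hypothetical placeholders. The points it illustrates are the XHTML namespace on <html>, matching xml:lang and lang attributes, escaping of text, and a meta charset of utf-8 alongside a UTF-8 XML declaration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def chapter_document(title: str, paragraphs: list[str], lang: str = "en",
                     generator: str = "Example Generator 1.0") -> str:
    """Build a minimal XHTML Content Document (hypothetical template)."""
    # escape() handles &, < and > in text; quoteattr() also wraps values in quotes.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang={quoteattr(lang)} lang={quoteattr(lang)}>
  <head>
    <meta charset="utf-8"/>
    <title>{escape(title)}</title>
    <meta name="generator" content={quoteattr(generator)}/>
  </head>
  <body>
{body}
  </body>
</html>
"""

def is_well_formed(document: str) -> bool:
    """True if the document parses as XML once encoded as UTF-8."""
    try:
        ET.fromstring(document.encode("utf-8"))
    except ET.ParseError:
        return False
    return True
```

Encoding the returned string with .encode("utf-8") before writing it into the package keeps the bytes consistent with the XML declaration.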
EPUB 3.1 packaging is pretty much identical to earlier EPUB 3 versions, with no significant changes. All of this technical information is provided so that publishers new to EPUB 3.1 can understand what makes an e-Book different from a website or other package.
All of the packaging details are handled fully automatically by IGP:Digital Publisher through the Formats On Demand module. There is nothing to do except click the EPUB 3.1 Format button.
The important thing is to take accessibility seriously when producing EPUB 3.1. This is especially important, and challenging, with interactive education content.
IGP:FoundationXHTML was designed with accessibility at the forefront, using powerful semantics in all tagging down to the finest detail. This can then be processed and packaged as EPUB, other e-Book formats and websites. Good accessibility strategies need to be built in from the start, not added as an afterthought. Just one more strength of IGP:Digital Publisher.
Article updated: 7 November 2017
Posted by Richard Pipe