Learn the Basic Structure of EPUB 3.1

18 June 2017

AZARDI, ePub, ePub3.1, IDPF, e-Books, EPUB3.1 Samples.

EPUB 3.1 is not structurally different from ePub 3.0.1 or ePub 3. Let's take a quick look at a few details.

Building basics of an EPUB 3.1 book

EPUB 3.1 is structurally identical to ePub 3.0.1 or ePub 3. Let's take a quick look at the package internals as a refresher, or as an introduction for anyone new to the EPUB game.

[Image: promotional graphic of a horizontal blue gift box wrapped with green ribbon, with the text "Packaging EPUB 3.1. It's easy folks, when you use IGP:Digital Publisher."]

The EPUB Package

The Package Structure

Every EPUB file is a ZIP archive containing files and directories. The EPUB 3.1 ZIP package remains the same as in prior versions. (In the listing below, names in capitals are directories and names in lowercase are files.)

mimetype
META-INF
    container.xml
OPS
    package.opf
    nav.xhtml
    section01.xhtml
    section02.xhtml (etc)
    CSS
    IMAGES
    OTHER-FOLDERS

The folder names can be anything you want as long as your internal references to files use the correct file paths. Infogrid Pacific uses OPS (Open Publication Structure) because we were making ePub files long before the IDPF started to make them. The IDPF uses the directory EPUB and packages the section files in nested directories. IGP puts the book section files in the same directory as the package.opf file to keep pathnames short. There are pros and cons to this approach, which we will discuss another time.

mimetype

This file contains a single line of plain text (NOTE: in EPUB every XML, XHTML and CSS file must be encoded as UTF-8 or UTF-16). It must be the first file in the ZIP package and must be stored uncompressed. This is identical to previous ePub versions.

application/epub+zip

If you are assembling an EPUB by hand, zip the mimetype file first, without compression, and then add all the other files. That makes sure it is the first file in the package.
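The hand-assembly step can also be scripted. The following is a minimal sketch using Python's standard zipfile module, not a description of how IGP:Digital Publisher builds packages. The function name build_epub and the directory layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir: str, epub_path: str) -> None:
    """Zip source_dir into epub_path with mimetype as the first entry."""
    root = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype file is added first and stored without compression.
        zf.write(root / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Everything else follows, compressed, with forward-slash paths.
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Write the output file outside the source directory, otherwise the half-written archive would be picked up as one of its own entries.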

Container

This file is in the META-INF directory and tells the reading system where to find the OPF file. The path to the OPF file is relative to the root of the ZIP package. This is identical to previous ePub versions.

<?xml version="1.0" encoding="UTF-8"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
    version="1.0">
    <rootfiles>
        <rootfile full-path="OPS/package.opf"
            media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>

Reading systems look at this directory and file first to find the filename of the *.opf file. The filename doesn't have to be package.opf but it is sensible to keep this standardized. Some reading systems may look for this exact filename.

If an EPUB is encrypted or digitally signed, the encryption.xml and signatures.xml files also go in the META-INF directory.
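To make the lookup concrete, here is a rough sketch of how a script might read container.xml and pull out the OPF path. It assumes the file declares the standard container namespace urn:oasis:names:tc:opendocument:xmlns:container. The function name find_opf_path is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the container.xml elements (assumed to be declared in the file).
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml: bytes) -> str:
    """Return the full-path of the first <rootfile> in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.attrib["full-path"]
```

For the container.xml shown above this would return OPS/package.opf, which is then read from the same ZIP package.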

package.opf

The package.opf is an XML file with UTF-8 encoding that is "the heart and soul" of an EPUB and defines the package. There are four major structures in the OPF. These are: 

  1. <package> This is the container element but it contains essential attribute declarations that effectively define everything else as part of an EPUB.
  2. <metadata> This is the mandatory metadata that must be included in the package. This has rules that must be obeyed.
  3. <manifest> This is a list of all the files in the package.
  4. <spine> This is the list of files in the next-previous sequence that can be navigated through from a reading system.

There are a lot of attributes in the opening <package> element but these are all "cut-and-paste" and pretty much identical for every EPUB file. The exciting difference is version="3.1".

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf"
        prefix="dc: http://purl.org/dc/elements/1.1/"
        version="3.1"
        xml:lang="en"
        unique-identifier="isbn">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        ......
    </metadata>
    <manifest>
        ......
    </manifest>
    <spine>
        ......
    </spine>
</package>

You must change the xml:lang="en" attribute to reflect the primary language of the book. The unique-identifier attribute is an ID reference to the <dc:identifier> element. It can be any valid XML ID (it must start with a letter or underscore) as long as the unique-identifier value and the id on <dc:identifier> are identical.
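The ID reference is easy to get wrong when editing by hand, so a quick check can help. This is a small sketch only, assuming the OPF declares the usual namespaces http://www.idpf.org/2007/opf and http://purl.org/dc/elements/1.1/. The function name unique_identifier_value is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed to be declared in the package.opf file.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def unique_identifier_value(opf_xml: bytes) -> str:
    """Return the text of the dc:identifier that unique-identifier points to."""
    package = ET.fromstring(opf_xml)
    ref = package.get("unique-identifier")
    for ident in package.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == ref:
            return (ident.text or "").strip()
    raise ValueError(f"no dc:identifier with id={ref!r}")
```

A ValueError here means the two values have drifted apart and the package will not validate.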

Here is an example package.opf for a three-chapter book with a cover, title page and chapter documents (Content Documents). The only item specific to EPUB 3.1 is the version="3.1" attribute.

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf"
        version="3.1"
        xml:lang="en"
        unique-identifier="isbn">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:identifier id="isbn">9876543210123</dc:identifier>
        <dc:title>Book Title</dc:title>
        <dc:language>en</dc:language>
        <meta property="dcterms:modified">2017-06-16T14:00:00Z</meta>
        <dc:creator>Author Name</dc:creator>
        <dc:publisher>Publisher Name</dc:publisher>
        <dc:rights>Copyright © 2010-2017 Publisher</dc:rights>
    </metadata>
    <manifest>
        <item id="toc" href="TOC.xhtml" properties="nav"
            media-type="application/xhtml+xml" />
        <item id="cover"  href="cover.xhtml" 
            media-type="application/xhtml+xml" />
        <item id="cover-image" properties="cover-image" 
            href="cover.jpg" media-type="image/jpeg" />
        <item id="s01" href="PAGEFILES/title-page.html" 
            media-type="application/xhtml+xml" />
        <item id="s02" href="PAGEFILES/chapter1.html" 
            media-type="application/xhtml+xml" />
        <item id="s03" href="PAGEFILES/chapter2.html" 
            media-type="application/xhtml+xml" />
        <item id="s04" href="PAGEFILES/chapter3.html"
            media-type="application/xhtml+xml" />
        <item id="s05" href="PAGEFILES/exercise1.html" 
            media-type="application/xhtml+xml" />
    </manifest>
    <spine>
        <itemref idref="s01"/>
        <itemref idref="s02"/>
        <itemref idref="s03"/>
        <itemref idref="s04"/>
        <itemref idref="s05" linear="no"/>
    </spine>
</package>

Reading systems use the metadata and cover file to build the reader's book selection library.
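The <spine> in the sample refers to manifest items by idref rather than by filename, so a reading system has to join the two lists to get the next-previous sequence. The sketch below shows one way that join could be done. It assumes the OPF namespace http://www.idpf.org/2007/opf is declared, and reading_order is a hypothetical name.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: bytes) -> list:
    """Return (href, is_linear) pairs in spine order."""
    package = ET.fromstring(opf_xml)
    # Map each manifest id to its href.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    order = []
    for ref in package.findall("opf:spine/opf:itemref", OPF_NS):
        # linear defaults to "yes" when the attribute is absent.
        is_linear = ref.get("linear", "yes") != "no"
        order.append((hrefs[ref.get("idref")], is_linear))
    return order
```

Items marked linear="no", such as the exercise file in the sample, stay in the list but are flagged so they can be kept out of the main reading flow.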

One of the most important files in the manifest is the Table of Contents, or book navigation file. It is mandatory in EPUB 3.1. A reading system identifies it by the properties="nav" attribute.

The reading system can also find the cover image with the properties="cover-image" attribute.
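As an illustration of how these lookups might work, the following sketch scans the manifest for items carrying a given property. The properties attribute is a space-separated list, so the value is split before matching. The OPF namespace is assumed to be declared, and items_with_property is a name invented for this example.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def items_with_property(opf_xml: bytes, prop: str) -> list:
    """Return the href of every manifest item whose properties include prop."""
    package = ET.fromstring(opf_xml)
    hrefs = []
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        # properties may hold several space-separated values.
        if prop in item.get("properties", "").split():
            hrefs.append(item.get("href"))
    return hrefs
```

Calling it with "nav" and then "cover-image" gives the navigation file and the cover image for a manifest like the one above.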

TOC.xhtml

This is formally known as the EPUB Navigation Document. This file is the primary book navigation file used by the reading system. This can be any filename as it is referenced from the <manifest> <item> that has the properties="nav" applied. It is good to be consistent with this filename. For example the IDPF examples use nav.xhtml. We have used TOC.xhtml since 2007.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    <head>
        <title>Contents</title>
        <link rel="stylesheet" href="css/stylesheet.css" type="text/css"/>
    </head>
   
    <body>
        <nav id="toc" role="navigation" epub:type="toc">
            <h2 id="toc-title">Table of Contents</h2>
            <ol aria-labelledby="toc-title" epub:type="list">
                <li><a href="PAGEFILES/title-page.html">Title</a></li>
                <li><a href="PAGEFILES/chapter1.html">Chapter 1</a></li>
                <li><a href="PAGEFILES/chapter2.html">Chapter 2</a></li>
             </ol>
        </nav>
    </body>
</html>

This is primarily a file to be used and presented by the reading system. EPUB 3.1 strongly recommends the inclusion of accessibility tagging. Here, screen-reading software will announce "Table of Contents" before reading through the list.

This can also contain page numbers and landmarks both of which are optional. We will leave those for a separate discussion.
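For a feel of what a reading system does with this file, here is a minimal sketch that pulls the link text and targets out of the toc nav. It assumes the document declares the XHTML namespace http://www.w3.org/1999/xhtml and the epub namespace http://www.idpf.org/2007/ops, and that the file contains no named HTML entities, which this parser would reject. The function name toc_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"


def toc_entries(nav_xhtml: bytes) -> list:
    """Return (label, href) pairs from the nav element with epub:type="toc"."""
    root = ET.fromstring(nav_xhtml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        # epub:type is a space-separated list of values.
        if "toc" in nav.get(f"{{{EPUB}}}type", "").split():
            return [
                ("".join(a.itertext()).strip(), a.get("href", ""))
                for a in nav.iter(f"{{{XHTML}}}a")
            ]
    return []
```

The result is flat. A nested table of contents would need the ol/li hierarchy walked instead, which is left out here to keep the sketch short.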

Book content files

The "Content Documents" are the stuff readers want to enjoy and engage with. All the above is technical packaging stuff for reading systems.

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
    xml:lang="en"
    lang="en">
   <head>
      <title>Section Title</title>
      <link rel="stylesheet" type="text/css" href="filename.css"/>
      <meta name="generator" content="IGP:Digital Publisher 7.4"/>
      <meta name="description" content="Description text"/>
      <meta charset="utf-8"/>
   </head>
   <body>
     <div class="galley-rw chapter-rw" role="doc-chapter">
        <div class="title-block-rw">
           <p class="title-num-rw">ONE</p>
           <h1>The story of Digital Content</h1>
        </div>
        <p>The story begins in Chapter One....</p>
        <p>Chapter One ends.</p>
      </div>
   </body>
</html>

The meta name="generator" element is optional but does let you state how the content was produced. It is inserted automatically when files are produced with IGP:Digital Publisher, and the version number lets you understand the features supported in the package definition.

<meta charset="utf-8"> is new in HTML5 and declares the character encoding of the document. UTF-8 covers every script, including Chinese, so the value stays utf-8 whatever the language of the content.

In Summary

EPUB 3.1 packaging is pretty much identical to ePub 3.x.x, with no significant changes. All of this technical information is provided so that publishers new to EPUB 3.1 can understand what makes an e-Book different from a website or other package.

All of the packaging details are handled automatically by IGP:Digital Publisher through the Formats On Demand module. There is nothing to do except click the EPUB 3.1 Format button.

The important thing is to take accessibility seriously when producing EPUB 3.1. This is especially important with interactive education content.

IGP:FoundationXHTML was designed with accessibility at the forefront, using powerful semantics in all tagging down to the finest detail. The content can then be processed and packaged to EPUB, e-Books and Websites. Good accessibility strategies need to be built in from the start, not added as an afterthought. Just one more strength of IGP:Digital Publisher.

 

Posted by Richard Pipe
