EPub3, Accessibility and Pagebreaks

22 September 2013

Accessibility, ePub3, IGP:Digital Publisher, AZARDI, Production Challenges

A major outcome of the recent AAP ePub3 conference was ePub pagebreaks that match the original source print-book from which a digital book was created. This is what we did.

Two criteria were identified:

  1. The Source book ISBN should be in the metadata.
  2. The epub:type pagebreak property positions should match the source book pagination.
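As a rough illustration of the first criterion, the source ISBN can be carried in the package metadata. This is a minimal sketch, not necessarily what IGP:Digital Publisher emits: it assumes the dc:source element together with the source-of property from the later EPUB 3.0.1 metadata vocabulary, and both the ISBN and the id value are placeholders.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Placeholder ISBN of the print edition the page numbers come from -->
  <dc:source id="print-source">urn:isbn:9780000000000</dc:source>
  <!-- States that the source above is the origin of the pagination -->
  <meta refines="#print-source" property="source-of">pagination</meta>
</metadata>
```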

Although there is a lobby that doesn't particularly like seeing print book page numbers in eBooks, they are as good a linear reading reference as anything. In particular, they add a needed positional syntax for tag and linked indexes.

We have used the pagebreak property since 2008 (long before ePub3) for both ePub2 index linking and for textbook WebApps. In addition to accessibility, textbook navigation in a blended learning environment (print and eBooks) becomes easy. When the teacher says "Turn to page 87" everyone gets to the correct place instantly.

Pagebreak Tools and Uses

In all our retrodigitization we always capture pagebreaks and have done since 1999, both in earlier XML production systems and in the current XHTML5 system. The pagebreak selector can be used in several ways inside IGP:Digital Publisher.

  1. Generation of Line-by-Line PDF reproductions. We use this for the Faber Finds reprint series (Case Study).
  2. Creation of Fixed Layout ePub3s from reflowable source content so the print and ePub have matching pagination. This has proved very useful for English Language Teaching books (Case Study) and has been used for significant tradebooks as well. (Reflowable to Fixed Layout Article.)
  3. Tracking and enforcing "Fair Use" copying in various delivery platforms.
  4. Production metric tracking and reporting.

MOD51 to the Pagebreak Rescue

IGP:Digital Publisher "Insert Pagebreaks" is brought to you courtesy of MOD51, the same technology that delivers awesome typography reporting AND importing to XHTML from PDF.

IGP:Digital Publisher produces both a print/rgb PDF and various eBook packages from the same XHTML source. Inserting page-breaks back into the XHTML after the final PDF has been generated was a tedious manual process until MOD51 matured. Here is an earlier article on the marvel that MOD51 is and the digital content problems it solves.

With IGP:Digital Publisher, once you have created your perfectly kerned and tracked PDF for a print edition, you can create pagebreaks in your IGP:FoundationXHTML (FX), and hence in your ePub, whatever the version number.

In FX the pagebreak is always, without exception, a span within the content. It is a map to a specific print book pagination and must not affect the value of the content. This is what it looks like:

<span class="pagebreak-rw">87</span>

The epub:type="pagebreak" attribute is added to these spans when an ePub3 is generated. That makes them a little more verbose in the final ePub3 package, where they look like this. Each one is also a link target.

<span class="pagebreak-rw" epub:type="pagebreak">87</span>
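To show how a pagebreak can act as a link target, here is a minimal sketch of an ePub3 page list pointing at one. It assumes the span carries an id attribute, which the example above does not show; the id value and the file name chapter05.xhtml are hypothetical. The epub prefix must be declared as xmlns:epub="http://www.idpf.org/2007/ops" on the root element of each file.

```xml
<!-- In the content document: the pagebreak span, with an assumed id -->
<span class="pagebreak-rw" epub:type="pagebreak" id="page87">87</span>

<!-- In the ePub3 Navigation Document: a page list linking to that span -->
<nav epub:type="page-list" hidden="hidden">
  <ol>
    <li><a href="chapter05.xhtml#page87">87</a></li>
  </ol>
</nav>
```

A reading system that supports page lists can then resolve "Turn to page 87" to the exact position of the span.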

It works like this:

The complexity of pagebreak insertion. There are just two options, put 'em in, or take 'em out!

  1. Select Editing Tools and then Content Tools. Scroll past all the other exciting options until you find the item Page Break Insertion.
  2. Click Insert Pagebreak.
  3. Wait a few moments and the pagebreaks will be generated and inserted into the XHTML. If for any reason a pagebreak is missed the report will let you know. You have the opportunity to insert a missing pagebreak by hand.

Why would there be missing pagebreaks? Content is infinitely complex, so there will always be conditions where a process fails on real-world digital content. It doesn't happen often, but it is a reality. The Page Break Insertion processor maintains an inventory of the page boundary matches it has inserted and verified, so any missing pagebreak numbers can be easily reported.

We have handled problem areas such as columns, footnotes and floating blocks such as Figures, Illustrations and Tables. Pagebreaks are inserted relative to the galley text flow rather than to any media content block items.

But There Is So Much More

Because IGP:FoundationXHTML is semantically driven, when an ePub3 file is created a very rich set of epub:type properties is generated and included.

It is probably fair to say that an ePub3 generated by IGP:Digital Publisher is among the most specification-compliant packages available from any production method or system.

Check out this article on the epub:type mapping richness in an ePub3 generated by IGP:Digital Publisher.

Pagebreaks in AZARDI

So pagebreaks need to be used in Reading Systems. Those that support epub:type="pagebreak" all use them differently.

AZARDI uses them very explicitly and "in your face". This is for academic and education content where referencing a page number can be very important.

AZARDI pagebreaks packaged from IGP:Digital Publisher are somewhat unusual in that the page numbers are grouped by book section. Regrettably, limitations in the ePub3 spec don't allow the section titles to be inserted in a page navigation structure. This is discussed in detail here.

On the AZARDI Interface there is a View Page Numbers button that toggles through four states starting with no visible page break:

The pagebreak can be viewed as a bar clearly marking the actual textbreak position. Or...

The pagebreak can be viewed as a less intrusive inline dot at the actual text break position. Or...

The pagebreak can be viewed as an inline margin dot. This is our default presentation for textbooks.

Finally you can click the button one more time and there are no page breaks.

To see this in action check out any of the Guy de Maupassant ePub3 sample books available here. The ePub3 page numbers relate to the A5 Print PDF.

Pagebreaks are Done!

Our pagebreak automation journey appears to be over for the time being.

IGP:Digital Publisher automatically inserts pagebreaks into IGP:FoundationXHTML documents for many purposes, one of which is pagebreaks in ePub3 for accessibility and page navigation. We also added the Source ISBN field to the standard packaging metadata fields.

In many ways this demonstrates how a content-centric production approach provides flexibility, adaptability and productivity not available with a design-tool-driven or XML-first approach.

Posted by Richard Pipe