IDPF Index Spec. Deconstructed

01 April 2013

Indexing and eIndexing, IDPF, ePub3, Specification, Analysis, Opinion

The IDPF ePub3 Indexing Specification is as abstract, incomplete and pretty much useless as the CFI Specification. It should be low priority for any reading system developer or publisher. It is not a total fail, but it is significantly incomplete.

The IDPF ePub3 Indexing Specification is as abstract, incomplete and pretty much as pointless as the CFI Specification. It should be low priority for any reading system developer or publisher producing digital content.

It is not a total fail (like CFI with its arcane XLink vocabulary expression), but it is incomplete and unnecessary. It does nothing except increase production costs and the long-term cost of digital content ownership.

Flavour du Jour

It is probably not the place of any specification to give vision statements on what could be done with it. But that is what happens in the voluminous Appendices to this specification. It's rather like the "Les Misérables" song "I Dreamed a Dream", except this version is going to end with the line "My spec. has killed the dream I dreamed".

The IDPF Index thing is thankfully less than (<) the IDPF Canonical Fragment Identifier (CFI) interpretation of the very dead XLink specification. It brings no value now, doubtful future value, and actually does nothing except provide wrappers for HTML <a href> links and targets.

For those who have a pathological aversion to reading these types of specification documents, I have kindly provided a neutral analysis here.

Read the Specification Luke!

There are probably very few people who have read or tried to digest this draft specification (which is about to become real), or actually applied it to real content for use in a reading system. We have done both in our evaluation.

There are even fewer who have seriously tagged Indexes for future value and variation processing. Here are the challenges presented by the specification:

  1. Reading Systems are meant to be able to interpret and use this "stuff"... as the IDPF specification typically says... somehow.
  2. Publishers and their service providers are meant to be able to actually produce content that implements the tagging structures so reading systems can use it.
  3. The IDPF continues to think that because they have written words in a specification it matters to publishers.
  4. The specification is an epub:type encapsulation of a decade-old failed XML tagging strategy for indexes. Do we want to step back that far in digital-content time?

Index Spec Discussion History. EPub3 Supporter, Critic and Troll

I was recently called an IDPF troll for comments on earlier versions of this "Index" specification. I was painfully and personally hurt (NOT!) by these accusations.

This was because I was silly enough to tweet on the use of <ul> instead of <ol> for indexes. While this may not affect reading system madness, it does affect digital content ownership, processing and reuse. However, the IDPF voices don't give a teddy about the cost of content ownership because they somehow see ePub3 as "a magic solution" to the digital content ownership challenge. EPub3 is of course nothing more than one delivery format. They simply don't understand this.

Using the wrong structural type is simply stupid. I continue to reinforce this point. No matter how you process and reorganize an Index statically or dynamically, it has no value if it is unordered. In long-term digital content ownership and management, consistent and correct use of structure and semantics matters. IDPF. Fail with unfathomably stupid arguments.

We have a significant HTML5 digital content management system that relies on the fact that ordered means just what it says and unordered means who cares what the sequence of "items" is (like a grocery shopping list).
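To make the ordered/unordered point concrete, here is a minimal Python sketch. The `same_list` helper and the sample markup are my own illustrative assumptions, not code from our content management system. It compares two lists the way their element type says they should be compared.

```python
import xml.etree.ElementTree as ET

def items(markup):
    """Return the list element's tag and the text of each <li> child."""
    root = ET.fromstring(markup)
    return root.tag, [li.text.strip() for li in root.findall("li")]

def same_list(a, b):
    """Compare two lists according to the semantics of their element type."""
    tag_a, items_a = items(a)
    tag_b, items_b = items(b)
    if tag_a != tag_b:
        return False
    if tag_a == "ol":
        # Ordered: the sequence is part of the meaning.
        return items_a == items_b
    # Unordered: the sequence carries no meaning (the grocery list case).
    return sorted(items_a) == sorted(items_b)

shopping_a = "<ul><li>milk</li><li>eggs</li></ul>"
shopping_b = "<ul><li>eggs</li><li>milk</li></ul>"
index_a = "<ol><li>apples</li><li>zebras</li></ol>"
index_b = "<ol><li>zebras</li><li>apples</li></ol>"
```

Under these assumptions the two shopping lists are the same list, while the two index fragments are not. Tag an index as <ul> and any processor is entitled to treat a shuffled index as equivalent.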

Indexes and the art of Indexing are complex. No one is denying or belittling this. The IGP:Digital Publisher eIndexer reflects the complexity of the mental process an Indexer (human) has to go through. IGP:eIndexer is instantly ready for every aspect of this specification and can be upgraded quickly and easily when required.

The conundrum is this: should we support this specification pre-emptively (as we have supported ePub3 in production tools and reading systems for two years), or wait for some publisher to say "we just have to have this IDPF index thing!"? Other than the production and processing testing we have done so far, we will wait.

Fortunately we have a module attached to IGP:Digital Publisher called IGP:Formats On Demand. Its job is to take high-value tagged content and dumb it down for ePub3 reading systems, the Kindle drone format and possibly now the ePub3 Index specification... if a publisher ever asks for it.

Complexity Without Benefit

So let's get into a bit of practical, real-world analysis (even if the only reading system that will even try to support this is Readium).

More new rules and epub:type properties have been unleashed by the Index specification than you can throw a caveman's club at. Here they are in all their excitement.

For Documents That Are Only Indexes

If your document should happen to be only an Index, make sure you use this declaration in your metadata. (I hate to think what the LOC, ONIX, METS, PRISM people will say about the discriminating use of the deadly DC).

<dc:type>index</dc:type>

Your Manifest

Of course your Manifest must be intelligent. Don't forget to add this.

properties="index"

Index is now right up there with MathML, SVG and Scripts. Very exciting!
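For anyone who wants to see where those two declarations sit, here is a minimal Python sketch that looks for both in a package document. The sample OPF is trimmed for illustration (it is not a complete valid package), and the file name `index.xhtml` and the function name `index_declarations` are my assumptions.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, hypothetical package document for an index-only publication.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:type>index</dc:type>
  </metadata>
  <manifest>
    <item id="idx" href="index.xhtml"
          media-type="application/xhtml+xml" properties="index"/>
  </manifest>
</package>"""

def index_declarations(opf_xml):
    """Return (has dc:type index, hrefs of manifest items flagged as index)."""
    root = ET.fromstring(opf_xml)
    types = [t.text for t in root.findall("opf:metadata/dc:type", NS)]
    index_items = [
        item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
        # properties is a space-separated list, so split before testing.
        if "index" in item.get("properties", "").split()
    ]
    return "index" in types, index_items
```

Calling `index_declarations(SAMPLE_OPF)` on this sample reports that the publication declares itself an index and lists `index.xhtml` as the flagged manifest item.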

And the epub:type Vocabulary

But even more excitingly, the specification creates a radically extended epub:type vocabulary with lots of glorious new property values. Here they are for the digital content production aficionados. I am not explaining them here, just making the list. The vocabulary is mostly self-explanatory. How they are to be used is not!

  1. index-editor-note
  2. index-entry
  3. index-entry-list
  4. index-group
  5. index-headnotes
  6. index
  7. index-legend
  8. index-locator
  9. index-locator-list
  10. index-locator-range
  11. index-term
  12. index-term-categories
  13. index-term-category
  14. index-xref-preferred
  15. index-xref-related
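As a rough illustration of what a production check against this list might look like, here is a minimal Python sketch. The nesting in the sample markup is my assumption for demonstration purposes and is not lifted from the specification; the misspelt value is deliberate.

```python
import xml.etree.ElementTree as ET

# ElementTree key for the epub:type attribute.
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

# The fifteen values listed above.
INDEX_VOCAB = {
    "index-editor-note", "index-entry", "index-entry-list", "index-group",
    "index-headnotes", "index", "index-legend", "index-locator",
    "index-locator-list", "index-locator-range", "index-term",
    "index-term-categories", "index-term-category",
    "index-xref-preferred", "index-xref-related",
}

# Hypothetical index fragment; "index-trem" is a deliberate tagging error.
SAMPLE = """<section xmlns="http://www.w3.org/1999/xhtml"
         xmlns:epub="http://www.idpf.org/2007/ops" epub:type="index">
  <ul epub:type="index-entry-list">
    <li epub:type="index-entry">
      <span epub:type="index-term">apples</span>
      <a epub:type="index-locator" href="chapter1.xhtml#apples">12</a>
    </li>
    <li epub:type="index-entry">
      <span epub:type="index-trem">pears</span>
    </li>
  </ul>
</section>"""

def unknown_index_types(xhtml):
    """Return index-like epub:type values that are not in the list above."""
    unknown = []
    for el in ET.fromstring(xhtml).iter():
        for value in el.get(EPUB_TYPE, "").split():
            if value.startswith("index") and value not in INDEX_VOCAB:
                unknown.append(value)
    return unknown
```

A check like this catches spelling mistakes in the property values. It says nothing about whether the properties are nested or used correctly, which is exactly the part the vocabulary leaves unexplained.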

And Now Down to Nav

Of course indexes must be referenced from the nav.XHTML file. Add in something crazy like a meaningless DocBook-style "role" statement:

collection role="index"

collection role="index-group"

Strange Manual Linking

The specification goes into detail on tagging index terms in an index with all those exciting epub:type properties. It even introduces the property inheritance concept, which means you don't have to use the properties they have defined. Wow! How awesomely original. We have been using sub-group inheritance in IGP:FoundationXHTML since 2006! But admittedly it is a step up for the IDPF. They must have been so excited with this innovation.

What is of considerable interest is the index link targets. In the cute online Index sample they have used a very manually applied href value and #href which doesn't adequately illustrate the complexities of real-world digital-content target index linking.

As far as I can see the targets are naive link #hrefs. They have no semantic for the targets. Now I may be wrong here, but nothing I could discover in the specification gave any indication this was otherwise.

A target can be an item, a start or an end point. But there is no epub:type property that can be applied to a <span> for example to deliver different highlighting or other visual and accessibility clues to the content body. This is an amazing omission.

Index Power Now!

In the world of IGP:Digital Publisher, Index links are symmetrical and semantic. Linking symmetry is for the crudest reading systems. The semantics are for more advanced reading systems that support JavaScript. It is possible to get to an Index term of interest from the index reference in the text.

We use HTML5 data- attributes for link types rather than class statements because they are IGP:FoundationXHTML and AZARDI specific. That means a line of JavaScript can create a reference back to the Index term.
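Here is a minimal sketch of the idea, written in Python as a packaging-time step rather than as reading system JavaScript. The attribute names `data-index-term` and `data-index-role`, the id scheme and the file name `index.xhtml` are hypothetical stand-ins for illustration; they are not the actual IGP:FoundationXHTML names.

```python
import xml.etree.ElementTree as ET

# Hypothetical content body: one single-point reference and one range
# marked with explicit start and end targets.
CONTENT = """<div>
  <p>Growing <span id="ref-apples-1" data-index-term="apples"
     data-index-role="item">apples</span> is easy.</p>
  <p><span id="ref-pruning-1" data-index-term="pruning"
     data-index-role="start"/>Pruning takes longer to explain.<span
     id="ref-pruning-2" data-index-term="pruning" data-index-role="end"/></p>
</div>"""

def back_references(content_xml, index_href="index.xhtml"):
    """Map each tagged target id to (role, link back to its index term)."""
    refs = {}
    for el in ET.fromstring(content_xml).iter("span"):
        term = el.get("data-index-term")
        if term is not None:
            role = el.get("data-index-role")
            refs[el.get("id")] = (role, f"{index_href}#{term}")
    return refs
```

Because each target says what it is (item, start or end) and which term it belongs to, the link back to the index falls out of the markup. A naive fragment target carries none of that.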

But there is more, of course. In a full AZARDI ePub3 package, Index terms are block packaged to the index references to empower horizontal Index term navigation. The horizontal navigation packages are processed and created at packaging time so the reader CPU/Memory resource consumption is minimized. The reading system should not be responsible for everything presentational and interactive. This crushes innovation, which is where the ePub3 spec. is failing so miserably.

AZARDI Horizontal Index Linking

Click on the following horizontal index linking examples of a relatively large index-term (the links go nowhere; this is for presentation demonstration only).

Main term example with page numbers. This is suitable for documents that have pagebreaks included and a source reference to the print work the pagebreaks represent. Ideally the links resolve to the actual index reference point on that page, not just the page number.

Main term example with direction indicators. Suitable where no pagebreak references are available and the index reference is tagged in the content. This is built into the IGP:Digital Publisher eIndexer for the production of front-list print and digital-only books. Notice that the arrows of single references show whether the index reference is before or after the current location. This sense of direction is essential in horizontal navigation tools.

Sub-term example with page numbers. Suitable for documents that have pagebreaks included and a source reference to the print work the pagebreaks represent. The abstraction with page numbers in digital content is that the user could have clicked the Page 12 index term reference, the page 47 start index-term reference or the page 52 end index-term reference.

Sub-term example with direction indicators. Suitable where no pagebreak references are available and the index reference is tagged in the content. This is built into the IGP:Digital Publisher eIndexer for the production of front-list print and digital-only books. The sub-term is listed first with the primary term second, and other links referenced appropriately.

The value of this approach is a user can explore all index references and sub-references for any specific index term and easily return to their starting reading point from any of the other index blobs.
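The packaging-time idea can be sketched in a few lines of Python. This is my simplified illustration, not the AZARDI packaging code: it assumes each index reference can be reduced to an integer position in reading order, and `build_navigation_blocks` is a hypothetical name.

```python
def build_navigation_blocks(term, positions):
    """Precompute, for each reference to `term`, links to every other
    reference along with whether it lies before or after that reference."""
    ordered = sorted(positions)
    blocks = {}
    for current in ordered:
        blocks[current] = {
            "term": term,
            "links": [
                (other, "before" if other < current else "after")
                for other in ordered
                if other != current
            ],
        }
    return blocks

# One block per reference, built once when the package is made.
blocks = build_navigation_blocks("apples", [47, 12, 52])
```

Each block is built once when the package is made, so a reader sitting at position 47 is simply handed "12 is before, 52 is after" and the reading system only has to follow the links.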

While a reading system theoretically could assemble this package dynamically:

  1. It doesn't make sense to commit tablets to the memory and CPU resource required.
  2. You would never get multiple reading systems to handle this consistently across all platforms.
  3. Some nutter, with a smirk on their knowledgeable face, would insist it uses the hopelessly general HTML5 <aside> element.

No New Ideas. Production Nightmare.

AZARDI has horizontal Index navigation, Index context comparison, multi-document index referencing and more. This builds pre-processed content and linking. The only job of the reading system is to follow the links.

All the IDPF has done is make index production rigid, complex, incomplete and expensive. Without target properties the system is very limited as to what can actually be created.

It would appear the dynamic and interactive potential of indexes has been turned into a lump of "XML-like" cement, encapsulated in epub:type properties and other strange things.

The good/bad news is: no publisher will be able to produce a consistently tagged index that any reading system will be able to use for years (and probably ever) unless they use an advanced XHTML production system like IGP:Digital Publisher.

Is there a point to specifications of this nature? I certainly understand the goodwill and dreamtime nature of the specification. It reflects the fact that the current IDPF management and specification writers are not involved on a daily basis with real-world digital content. It is not hard. It is not expensive.

It is obvious that there are people involved who are passionate about indexes/ing, don't have a 2013 view on digital content production and are influenced by the apparent "knowledge" of the old-XML school dudes.

Summary

Ultimately all the IDPF has done is add some epub:type properties to standard index terms. Pitifully, they have used unordered lists.

It's nice to see that in their demonstration Index they have even sneaked in an index link to an arcane XLink-type CFI item. A link to the irrelevant arcane!

They give acknowledgement to the DocBook specification. That alone should raise serious alarms with serious digital content practitioners. A decade-old failed XML method should not even influence a 2013 HTML5 specification. There are actually real 2013 new and powerful options.

The outcome is a reading system specification that defines a rigid Index tagging strategy that publishers must create for a reading system to understand. This is just plain wrong.

It is this lock-down approach that continues to put nails into the ePub3 coffin. The inspirational outcomes of the AAP conference and the as yet undocumented abstract outcomes of the recent EDUPUB conference are being ignored by the IDPF "specification at any cost machine".

Yes, it can be made to work. Yes, it will make ePub3 content rigid and more expensive to create. Yes, it is somewhat practical-ish. Yes, it is a decade old with a few flashy elements. Yes, it is rigid in spite of the inspiring Appendices. Yes, it will cause reading system developer headaches.

So for me it's all YES'S.

IGP:Digital Publisher is one of the very few systems that can easily process epub:type properties into anything, even this specification. Currently an ePub3 produced with IGP:Digital Publisher includes around 90% of epub:type properties (where they make sense).

Posted by Richard Pipe