Power XHTML e-Indexing

21 April 2013

Indexing and eIndexing, Multiple Format Production, Static Sites, Digital Publishing

The challenge: interactively indexing continuous XHTML content (tagged in IGP:FoundationXHTML, of course) to create print indexes in multiple print book page-size editions, along with powerful backlinked e-Indexes for digital content.

IGP:Digital Publisher now has the most amazing, easy-to-use indexing application and processors available in the known Universe. (That's hyperbole, for those who don't instinctively recognize such things!)

It has been built on real-world recommendations (a software development euphemism for complaints, whinges, whines, nags, cajoling, begging, wishing, pleading and threatening) from our academic clients, plus our own drive to create something better for the digital content future launch-pad on which we all stand.

Indexing is tough, but done well it adds significant value to print books. What has yet to be seen is the Index as the most significant exploration and navigation tool for relevant digital content in 2013. Stand by, World!

Requirements

Traditional indexing is hard. While there are plenty of guidelines around, they don't actually say how an indexing application should work. We had to understand, as far as possible, what is happening inside an indexer's head. We were not allowed to perform open-brain surgery on any indexers, so we had to gaze deeply into the infinitely deep pools of pained experience they call eyes and extract the pain. But we also have to be honest and say that Indexes are now a new and magical thing that is bigger than Avatar (which is exactly what an Index is in a sequence-distorted dimension).

So here is our rather obvious list of requirements:

  1. Easy to use. This is a rather obvious first requirement for a job as challenging as Indexing. The Indexer wants to read and think about the content, not fight with an interface.
  2. Interactive build. The Index should always be available to view and edit as it is being built. Every change should be seen immediately.
  3. Index while authoring/editing. Rather than limiting indexing to when documents are complete, indexing can be a part of the authoring and editing process. An Indexer can also work on a document while it is being edited and correct the beginner index of an ambitious author or editor.
  4. Navigate and Inspect. There are no page numbers while the index is being built. If you want to check an index item in the copy, just click the button.
  5. Editing fidelity. Must work when sections are reordered, or even if an Index term is moved in an editing cut and paste operation.
  6. Works for all formats. Must work for print and e-books, be processable to other XML Schemas and strange formats like ePub3, and work online.
  7. Support multiple IGP:Digital Publisher Design Profiles. This means the index must generate correct page numbers for any print format such as Paperback, Large Print and RGB PDF, with full index interactivity even with significant repagination. The days of one print edition being the master cited reference may be over.
  8. Production convergence. Handle backlist digitization and front list Index generation in the same manner, with only the index term resolution being the difference. That is all about the quality of the IGP:FoundationXHTML strategy.
  9. Multiple Index Generation. Allow processing of a primary Index to multiple specialist sub-Indexes, e.g. Name/Place/Date indexes. Dream feature!
  10. Index Term to format. Allow index terms to be associated with formats (design profiles) and not be limited by print page count requirements. I.e., you can have 20,000 index references in your e-Book even if the index in your print edition is restricted to 10 pages.
  11. Remixable. Must be usable with the IGP:Digital Publisher REMIX feature, allowing assembly of disparate sections into a new book, with easy reorganization of the newly assembled Index references and items. A feature regarded as impossible by some (but not with the secret way we do it). We haven't actually implemented this yet, but it is on the list of things to do.
  12. e-Index Ready. Instantly work with the AZARDI presentation Interactive Index.
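
Requirement 10 is easier to picture with a small example. The following is only a sketch in Python, not the IGP:Digital Publisher implementation: the `profiles` tag, the anchor IDs and the function name are all hypothetical, and it assumes that a reference with no profile tag applies to every format.

```python
def refs_for_profile(references, profile):
    """Return the anchors of the references that apply to one design profile.

    Assumption: a reference without a "profiles" tag applies to every format.
    """
    return [
        ref["anchor"]
        for ref in references
        if not ref.get("profiles") or profile in ref["profiles"]
    ]


# Hypothetical references for one index term.
references = [
    {"anchor": "a1"},                                    # every format
    {"anchor": "a2", "profiles": {"ebook"}},             # e-book only
    {"anchor": "a3", "profiles": {"ebook", "paperback"}},
]

paperback_refs = refs_for_profile(references, "paperback")
ebook_refs = refs_for_profile(references, "ebook")
```

With data like this the e-book keeps all three references while the paperback drops the e-book-only one, which is the kind of separation the requirement asks for.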

With that "little" list of requirements, the problem was attacked with gusto using nothing more than jQuery, JavaScript, XHTML and CSS, with IGP:FoundationXHTML as the rock on which to build, and a decade or so of experience getting things wrong until we get them right!

The Tools

The Indexer is easy to use and you can watch the index build. The "page numbers" show as * at this stage because no format has been generated yet. The little navigation buttons allow you to go directly to the index item in the IGP:Writer text.

The objective was to keep the tools as simple and direct as possible so the Indexer can focus on the content and not the tools. Because IGP:Digital Publisher is a Web App and not a desktop application, things have to be done a little differently.

Because the indexer gets to see the content and the index being built side by side, there are a number of different strategies available.

Manual Index Terms. Highlight any term and click on the Index list. It is automatically and instantly inserted into the Index list. Click on a root term and it is inserted as a sub-term. Click on a sub-term and it is inserted as a sub-sub-term, etc.
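
As a rough illustration of the click-to-nest behaviour just described, here is a minimal Python sketch. The `IndexTerm` class and its `add` method are names invented for this example; the real tool runs in the browser and its internals are not shown here. The sketch assumes siblings are kept in case-insensitive alphabetical order and that adding an existing term reuses it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexTerm:
    """One node in a hypothetical index tree (illustrative only)."""

    text: str
    children: list[IndexTerm] = field(default_factory=list)

    def add(self, text: str) -> IndexTerm:
        """Insert `text` as a sub-term of this node, keeping siblings sorted."""
        for child in self.children:
            if child.text.casefold() == text.casefold():
                return child  # reuse an existing term instead of duplicating it
        child = IndexTerm(text)
        self.children.append(child)
        self.children.sort(key=lambda term: term.text.casefold())
        return child

    def lines(self, depth: int = 0) -> list[str]:
        """Flatten everything below this node into indented display lines."""
        out = []
        for child in self.children:
            out.append("  " * depth + child.text)
            out.extend(child.lines(depth + 1))
        return out


root = IndexTerm("")               # invisible root holding the top-level terms
indexing = root.add("indexing")    # click on the list: new root term
root.add("XHTML")
manual = indexing.add("manual")    # click on a root term: sub-term
manual.add("range terms")          # click on a sub-term: sub-sub-term
index_lines = root.lines()
```

Because the sort happens on every insert, the list is always in display order, which matches the "watch the index build" idea.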

Manual Index Range Terms. Click in a paragraph to set the term range start point. Click in a lower paragraph to set the term range end point. Click on the index as with the Manual Index Terms and the range is instantly set.

Edit terms. What is in the book is not always what should be in the index. You can edit a term into "Index lingo" and it will immediately re-sort itself.

Italic Style Terms. Italicize where you need to.

Remove a Term Entry. If an entry term is not required, click the delete option.

Remove a Term. If a full term is not required, click the delete option and all of its entries are deleted with it.

Click Save. Your IGP:FoundationXHTML index lists are immediately generated and available for inspection on the IGP:Writer page and ready for print PDF or e-book format generation.

Key-Term generation. Provide a keyword list and the application will process the file and add all occurrences of the Index Keywords to the Index. This gives a robust start for proper names, events, dates and other significant content. It also gives an over-population of terms, so both terms and entries can be inspected and deleted interactively.

Well, that's the big picture. There are of course a million details, which are for user documentation, not a major feature announcement.
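
To give a feel for what keyword-driven generation involves, here is a minimal sketch in Python. It is not the IGP:Digital Publisher processor: the function name, the paragraph IDs and the matching rules (whole-word, case-insensitive, one reference per paragraph) are all assumptions made for illustration.

```python
import re


def find_key_terms(paragraphs: dict[str, str], keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the IDs of the paragraphs that mention it.

    `paragraphs` maps a hypothetical paragraph ID to its plain text.
    Keywords with no match are left out of the result.
    """
    hits: dict[str, list[str]] = {}
    for keyword in keywords:
        # Whole-word match so that "Cook" does not hit "cookbook".
        pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE)
        matched = [pid for pid, text in paragraphs.items() if pattern.search(text)]
        if matched:
            hits[keyword] = matched
    return hits


paragraphs = {
    "p1": "Captain Cook reached Botany Bay in 1770.",
    "p2": "The cookbook was published much later.",
    "p3": "Botany Bay was named by Cook himself.",
}
candidates = find_key_terms(paragraphs, ["Cook", "Botany Bay", "Sydney"])
```

Even with whole-word matching a pass like this finds every mention, relevant or not, which is why the generated terms and entries still need a human to prune them.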

The Outcome

Generate an edition in A5, B5, 6in X 9in or any other size using Design Profiles and your index is automatically generated with the correct page numbers for each index item. No fuss.
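
The idea behind per-edition page numbers can be sketched as follows. This is an illustrative Python example under stated assumptions, not the actual processor: it assumes each index reference points at an anchor in the content rather than at a page, and that paginating a design profile yields an anchor-to-page map. The anchor IDs, page numbers and function name are invented.

```python
def format_locators(refs, page_of):
    """Turn anchor references into sorted, de-duplicated page locators.

    `refs` is a list of (start_anchor, end_anchor) pairs; a single-point
    reference uses the same anchor twice. `page_of` maps anchor ID to page.
    """
    spans = sorted({(page_of[start], page_of[end]) for start, end in refs})
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


# One set of references, stored against anchors in the content.
refs = [("a1", "a1"), ("a2", "a3"), ("a4", "a4"), ("a5", "a5")]

# Hypothetical pagination results for two design profiles of the same content.
paperback = {"a1": 3, "a2": 10, "a3": 12, "a4": 30, "a5": 3}
large_print = {"a1": 5, "a2": 17, "a3": 21, "a4": 52, "a5": 6}

paperback_entry = "indexing, " + format_locators(refs, paperback)
large_print_entry = "indexing, " + format_locators(refs, large_print)
```

Note how two references that share a page in the paperback collapse into one locator there but stay separate in the large print edition. The index itself never changes; only the anchor-to-page map does.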

The e-book edition can have print-page numbers using ePub3 page links, sequence number links or anonymous links. More importantly, DP lets you reverse process your indexes for e-Books.

An upcoming release of AZARDI will allow interactive indexes to be accessed and navigated from anywhere within a document.

Next - Multiple Indexes

Multiple indexes are a reality with some content, but too expensive to produce with print in many cases. You have a great master index, now you want an index of places, people, events, food or any other term group.

To make the Indexer's life that little bit more difficult, we are thinking of adding an "Index type" classification value so there is one master index that is processed into all the other indexes, i.e. classification metadata on index terms.

This makes it easier to create index-rich and valuable content. But that may be breaking the print index methods a little too early in this new digital content frontier we inhabit.
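
Since this feature is still only an idea, the following Python sketch is purely speculative. It assumes each master index term may carry an optional `type` value and that untyped terms fall into a general index; the field name and the sample terms are invented.

```python
from collections import defaultdict


def split_by_type(master):
    """Group master-index terms by their "type" value.

    Assumption: terms without a type belong to the "general" index.
    """
    sub_indexes = defaultdict(list)
    for entry in master:
        sub_indexes[entry.get("type", "general")].append(entry["term"])
    return {kind: sorted(terms, key=str.casefold) for kind, terms in sub_indexes.items()}


master = [
    {"term": "Sydney", "type": "place"},
    {"term": "Cook, James", "type": "name"},
    {"term": "navigation"},
    {"term": "Botany Bay", "type": "place"},
]
sub_indexes = split_by_type(master)
```

The Indexer still maintains a single master list; the specialist indexes fall out of it as a processing step.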

New Interactive Indexes

It needs to be easy to create an Index Term hover (or tap) interaction that allows horizontal exploration through the content of an Index term and its sub-terms. This is coming to an AZARDI reader near you soon. Sadly, it will probably never be seen in any other reading system.

Wrapping it up

Indexes are a print invention and a powerful content engagement tool. However, print index thinking is constrained by things such as page counts in certain types of publishing, plus often the need for economy and a certain vagueness and specialist paper-saving vocabulary construction. Will a digital content transition occur and, if so, how soon? Sigh! A million questions...

Our index master-plan with AZARDI is to be able to combine all Indexes from all books together within the reading system and make them a valid search target. For most appropriate content this will result in more valuable search and discovery results. We are now going to start experimenting with our shiny new e-Index tool for Web Indexes on static sites.
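
As a thought experiment only, combining indexes across books into a search target might look something like the Python sketch below. Nothing here reflects how AZARDI works or will work: the book titles, the locator strings and both function names are hypothetical, and matching is a simple case-insensitive substring test.

```python
def combine_indexes(books):
    """Merge per-book indexes into one lookup: term -> [(book, locator), ...]."""
    combined = {}
    for title, index in books.items():
        for term, locators in index.items():
            combined.setdefault(term.casefold(), []).extend(
                (title, locator) for locator in locators
            )
    return combined


def search(combined, query):
    """Return every (term, book, locator) whose term contains the query text."""
    q = query.casefold()
    return [
        (term, book, locator)
        for term in sorted(combined)
        if q in term
        for book, locator in combined[term]
    ]


# Hypothetical library of two books with small indexes.
books = {
    "Book A": {"Indexing": ["ch1#p4"], "XHTML": ["ch2#p1"]},
    "Book B": {"indexing": ["ch3#p9"], "e-indexing": ["ch1#p2"]},
}
results = search(combine_indexes(books), "index")
```

The appeal over full-text search is that every hit comes from a term an Indexer deliberately chose, so the results are curated rather than merely matched.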

If e-Index creation is easy and delivers the goods, indexes become significant content engagement power-up tools for the future, not just a continued imitation of print indexes.

IGP:Digital Publisher e-Index is ready to go and will be updated on all licensee installations on the next major release update.

It is up to you whether you create yesterday's print indexes or start redefining indexes and indexing as a vital component in significant digital content into the future.

All feedback gratefully received. We are sure this toolset will be a prime candidate for a lot of improvement.

Product feature announcement. Posted by Richard Pipe

