10 February 2013
Getting the structure of digital content right is the core of any real digital content strategy. To start this discussion in the right tone: Publishers! Avoid XML with vehemence and violence. Implement an XHTML strategy with a controlled CSS selector vocabulary.
O'Reilly has always been a strong proponent of "XML First" and has promoted its own cleverness with this approach. Of course, its content is mainly mono-subject with a half-life of only a few months. With the O'Reilly TOC conference starting in a few days, it seems like the right time to bring this up.
Here is one of the saddest and most typical ePub2-3 pain stories, from O'Reilly themselves:
There are a LOT of reasons for selecting XHTML over XML; probably too many for this post, so this fireside chat may roll over into more posts.
Easy. Because it doesn't work, it is not maintainable, it is not sustainable, it is not extensible, it is not flexible or agile, it costs a lot of money, it never delivers what it promises and it is not ready for the type of future content is now facing.
These are of course generalizations. Sometimes it does work. But just having a cupboard full of XML tagged content is not a real 2013 digital content strategy.
The same way that HTML5 has eliminated Flash and redefined what the Internet is becoming, it redefines publisher content management strategies. XML consultants either don't get it, or don't want to get it. From the O'Reilly article:
"Over the past year and a half, O’Reilly has sponsored the DocBook project’s development of open source XSL stylesheets for transforming DocBook XML content to EPUB 3, which we’ve used to update our own toolchain to produce EPUB 3 output. With the release of iBooks 3.0 in late 2012, a critical mass of O’Reilly’s readers had devices that supported EPUB 3 content. We felt it was time to upgrade our content to EPUB 3 to provide people using 3.0-compliant platforms the best quality reading experience."
Yes folks. It took them only a year. IGP:Digital Publisher was creating ePub3 from five-year-old stored content in December 2011, just 15 days after the spec was released. There was no XSL stylesheet to transform DocBook crap to XHTML. All the special handling statements in their article were simply implicit in the content.
It only has to be hard if you use XML first!
Around 18 months ago I started a series of articles on the philosophy and approach of IGP:FoundationXHTML. For various reasons that fizzled to a halt as I went on a blogging sabbatical to focus on the job.
Since then we have put the full IGP:FoundationXHTML Specification documents online, so a set of articles is not really required. Those interested in these deep things can read and absorb them at their leisure. However, it does seem to be time to restart the dialogue now that the resources are available to talk against.
The argument with the "XML crowd" boils down to one word: "semantics". It is a load of nonsense. On our IGP:FoundationXHTML Guiding Principles page we define SEVEN properties of equal importance. These are:
You can read more about the arguments for controlling vocabularies with XML "validation" vs. just controlling them with tools here and throughout the FX specification.
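To make "controlling them with tools" concrete, here is a minimal sketch of what such a tool check could look like. It is an illustration only: the class names in ALLOWED_CLASSES, the sample markup and the function name are all made up for this post and are not taken from the IGP:FoundationXHTML specification or from IGP:Digital Publisher.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary. These class names are illustrative
# only; they are NOT the IGP:FoundationXHTML vocabulary.
ALLOWED_CLASSES = {"chapter", "section", "title", "body-text", "note"}

# Invented sample content with one hand-made class that breaks the rules.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="chapter">
      <h1 class="title">One</h1>
      <p class="body-text">Text.</p>
      <p class="BodyText1 note">Styled by hand.</p>
    </div>
  </body>
</html>"""


def find_unknown_classes(xhtml, allowed):
    """Return sorted (tag, class) pairs whose class is outside the vocabulary."""
    root = ET.fromstring(xhtml)
    problems = set()
    for element in root.iter():
        # Drop the "{namespace}" prefix ElementTree puts on tag names.
        local_tag = element.tag.split("}")[-1]
        for name in element.get("class", "").split():
            if name not in allowed:
                problems.add((local_tag, name))
    return sorted(problems)


if __name__ == "__main__":
    for tag, name in find_unknown_classes(SAMPLE, ALLOWED_CLASSES):
        print(f"<{tag}> uses class '{name}', which is not in the vocabulary")
```

The point of the sketch is that the vocabulary lives in a plain list the tool owns, so tightening or extending it means changing that list, not rewriting a schema and re-validating a repository.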
With the exception of NLM (and a lot of that is tagged really badly) there is not an XML system out there that delivers the goods for any publisher of any content whatsoever. You hear how DocBook has a large vocabulary (it's actually rather weak and misses details) but it does not even get content structure close to correct. It is very sad.
We used the NLM Bibliography semantic tagging patterns in IGP:FoundationXHTML because they are excellent, created by experts and complete, and because bibliographies are one of the areas in content where semantics really is an important property for academic discovery.
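As a rough illustration of why bibliography semantics matter for discovery, the sketch below reads class-tagged citations out of XHTML as structured records. The markup pattern, the class names (borrowed loosely from NLM-style field names such as article-title and source) and the citation itself are assumptions invented for this example; the real FX patterns are in the specification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Invented sample. The class names are hypothetical stand-ins for
# NLM-style bibliography fields, not the actual FX tagging patterns.
SAMPLE = """<ol xmlns="http://www.w3.org/1999/xhtml" class="ref-list">
  <li class="citation">
    <span class="surname">Smith</span>
    <span class="year">2010</span>
    <span class="article-title">On Tagging</span>
    <span class="source">Journal of Examples</span>
  </li>
</ol>"""


def extract_citations(xhtml):
    """Return one dict per citation, keyed by the class of each tagged span."""
    root = ET.fromstring(xhtml)
    records = []
    for item in root.iter(XHTML_NS + "li"):
        if "citation" not in item.get("class", "").split():
            continue
        record = {}
        for span in item.iter(XHTML_NS + "span"):
            text = "".join(span.itertext()).strip()
            for name in span.get("class", "").split():
                record[name] = text
        records.append(record)
    return records


if __name__ == "__main__":
    for record in extract_citations(SAMPLE):
        print(record)
```

Nothing here needs a transformation step: the same XHTML that renders in a reading system is also the machine-readable record.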
You have to respect, admire and use professional excellence in XML. It doesn't happen as much as many people think.
Of course the problem of digital content ownership and production is bigger than atrocious XML strategies.
So you are a small or medium publisher. If you read that article by O'Reilly you should be running screaming for the XML exit door.
As it happens there are worse digital content strategies than worthless XML. Even worse than XML is trying to produce multiple formats in desktop environments such as InDesign, Sigil and Calibre, or their ilk. These can eventually produce an ePub format with massive effort.
Now you need an ePub3. Blast. Gotta go through the same thing again. If you need complex notes, indexes, references, and image positioning it just can't be done sensibly.
People who use these systems think: because the PDF was made in InDesign, I can use the same tool to make an ePub. Adobe will get it right. Yeah!
The fact is Adobe can't get it right. You only have to read the IDML document to see why they can never get it right without a total refactoring of the core software (it's an XML garbage can). From a digital content production perspective, tools such as InDesign (great for PDF) and Apple's iBooks Publish are the digital content gutter: where money runs away and content goes to die. It is incredibly expensive to create that content in the first instance, and it cannot be reused by publishers to make more money.
The XML crowd (those who think DocBook, TEI, NLM, or some custom XML monstrosity is a content strategy) have got all this wrong.
The O'Reilly article highlights the stupidity of an XML repository in mindless DocBook. If your "XML consultant" is recommending any of these approaches, chances are they are going to cost you, the publisher, a lot of money without much in the way of deliverables, except delivering the consultant more money.
Yesterday's XML solutions do not address the business dynamics required by publishers today. What do you as a publisher need from your digital content? Try this list:
OK. If you are writing and publishing a novel, a text editor is fine. If you are a hobbyist production person who enjoys the incompatibilities between Kindle, iBooks, Nook, Kobo, etc., that's fine.
But if you have to deliver education, academic, trade non-fiction, self-help, travel, cooking, magazines and just about every other type of content out there on a schedule and a budget, the tool-set that originated in print and webpage production doesn't cut it.
All the format and channel delivery problems are addressed very easily if you start with your content in the right format. In 2013 that means XHTML5. Don't compromise.