The XML Book Business

October 29, 2003

Kendall Grant Clark

After spending a week of toil and labor in the Semantic Web mines, I've returned to the surface, to the sweetness and light of the XML developer community. And what do I find but a crisis in the XML segment of the technical book publishing industry, as well as a monster thread about character entity names.

I've been beaten up publicly of late for failing to disclose various affiliations and perceived conflicts of interest. Let me say, then, for the record that I work for O'Reilly as an editor of this site, and that I know very little, if anything, about O'Reilly's book publishing business. So now that I've outed myself, let's get to the issue.

The Business ...

Simon St. Laurent -- a book editor for O'Reilly -- started the conversation on the XML-DEV mailing list, or so I suspect, as a result of having attended various editorial meetings leading up to O'Reilly's FOO Camp earlier in the month. I was fortunate enough to attend one of the FOO Camp presentations to which Simon refers. Indeed, the XML book business is fairly grim across the board, not just for O'Reilly.

That's an important point to think clearly about, in my view. After all, O'Reilly as an organization is far from omniscient; in fact, at a FOO Camp presentation Tim O'Reilly seemed to admit that O'Reilly had missed the PHP boat, something others have been saying too. In that case, of course, it was a tricky thing to see, especially for O'Reilly, which has put a ton of effort behind Perl. In particular, it tried to pitch Perl as something other than a language for writing cgi-bin Web applications. That must have made PHP's success as a kind of Perl++ in that particular space hard to see coming. One of the most common ways in which people and organizations err is zigging when a zag would have worked better.

... Isn't Doing Very Well

So the first point is that everyone's XML book business is in the dumper. That's the fact which makes this whole issue of interest to more than just O'Reilly employees. If you are inclined to think that the book business is any sort of predictor of technological uptake in general, then you may be worried that sales of XML books are flat at best. I'm not sure that book sales are much of an indicator, at least not in the current economic climate in which technology capital expenditures are, and have been for nearly three years, trending down sharply.

However, XML book sales are down even relative to the downward trends of other technology books. As St. Laurent put it,

XML book sales have dropped substantially, even relative to the overall decline in technology books. A few books dominate the broad (typically though not necessarily beginner) end of the market, while more focused books struggle to achieve the numbers needed to justify their publication.

And the numbers for web services books are even worse.

Possible Reasons, Murky Answers

St. Laurent offers three candidate causes for the decline. First, that XML was overhyped and that wave of enthusiasm has finally foundered on the rocks of XML's disappointment. Second, that there is some measure of standards exhaustion, with W3C XML Schema being something of a key exhauster of developer confidence. Third, that most use of XML is relatively simple usage which doesn't require more than a few basic books.

St. Laurent thinks about these things as much as anyone I know; hence it might be worth lingering a bit over his candidate reasons. As to the first, I suspect everyone will agree that XML was overhyped. All hype waves eventually exhaust themselves. But St. Laurent adds a claim which many in the XML development community will not agree with -- namely, that the inevitable letdown from the hype was exacerbated by XML's failure to be what developers took it to be. That seems as much an indictment of XML itself as of the fact that XML was overhyped.

As to the second, I tend to agree with St. Laurent that the complexity of specifications is an impediment for many XML developers. Even more to the point, something about the W3C XML Schema (WXS) specification in particular marked a kind of breakpoint in my ability to track the W3C's output carefully. Before WXS I thought I grasped everything of relevance. Since WXS I have had to pick my spots more carefully. Maybe WXS doesn't deserve the blame, but in my view it's as good a place as any to put it.

As to the third, I think St. Laurent is exactly right, and I will only point out that the second and third reasons work together in interesting ways. That some kind of XML core is very useful, and very often used, suggests that the W3C -- and here I mean the prime mover member organizations, most of which are enormous corporations -- may have missed the boat by aggressively pursuing more complex, less used permutations.

To St. Laurent's list of candidate reasons, I add the maturity of the Web as an informal publishing medium. Despite O'Reilly's investment in sites like, I would be very surprised if the Web doesn't still keep folks there up nights worrying about technical book sales dwindling and never recovering. Book publishers will deny this -- and maybe they're right -- but good enough is often enough when it comes to technical material. For any reasonably interesting computing technology, the Web often contains enough good-enough information to get most technical workers over the hump. Even worse, much of that information is published informally, which is to say that it's not really published at all; it's just there, and it's certainly not there for a profit motive. I think there's a future yet in technical book publishing, but it's probably going to be different from the past.

Chris Wilper makes this point in an amusing way:

The truth is, I can almost always find the reference material I need online, with the added bonus of mailing list archives. The tactile experience of books is the only thing I miss.

Oh, and the smell.

Wilper's experience exactly echoes my own. The number of tech books I've bought is inversely proportional to my technical skills. I am, as Wilper hints he may be, a serious book junkie. I love the damn things, to the tune of having more than 3,000 volumes in my library. The Web has meant I can focus my bookish acquisitive desires on stuff other than the technical. I can nearly always find something good enough on the Web about the latest XML or Python or Linux or Semantic Web doodad.

In a followup message, St. Laurent speculated as to why XML developers, particularly the kind of junkies and addicts who haunt XML-DEV, aren't buying more books, again offering three reasons. First, a mismatch between the kinds of XML books published, typically targeted at beginners, and the advanced status of some XML developers. Second, the dispersion of interests as people move from beginner to advanced status. As Simon puts it, "I've tried, but not all niches are sustainable". Which is too bad, of course, because for many of us the really interesting stuff is stuff that has, necessarily and unavoidably, a small audience. Thus, doing something to lower the cost of producing a new book -- or finding a new genre of publication which is (a) cheaper and quicker to produce than a book; (b) more substantial than a long web article; (c) targeted at these interesting niches; and (d) not insanely expensive -- might help the bottom line. Third, technical books sometimes suck. That's a hard truth, but a truth nonetheless. I don't think O'Reilly is worse in this regard than other publishers; in fact, it is better than most. But sometimes, in the early days, as wave after wave of hype crashes over us, we lose our way and become disoriented. That's not so good for quality control.

Exacerbating all these issues is the fog of book publishing. It's not easy for publishers to know why people do or do not buy technical books; in fact, I think it may be harder for technical publishers to figure this out than for, say, publishers of fiction. Taste is more fickle, but less complex, than necessity. It can even be difficult to know how many books end up in the hands of actual readers, given that publishers sell to retailers, not to readers. My impression is that the folks at O'Reilly have worked hard to solve this problem, and that they still have more questions than answers.