XML.com: XML From the Inside Out


Article:
 The Long, Long Arm of SGML
Subject: Basically right, but overreading a bit
Date: 2003-11-10 09:54:05
From: Wendell Piez

I think Kendall has this basically right, but there's also a bit of overreading which gets in the way. What he says about the SGML legacy in XML points to a significant aspect of the problem without actually revealing it -- in fact it's rather masked by the "Anxiety of Influence" analogy ... which also has it only partly right. It's true that every generation of engineers dismisses the challenges and disparages the solutions of the preceding generation (while being indelibly imprinted by them); but there's also more going on here.


The requirement for human-legible representations of non-keyboard (non-displayable) characters exactly straddles the line between XML-document-as-lexical-instance (unparsed) and XML-document-as-model (parsed, probably into a tree) that so disconcerts XMLers. There is only one solution to this, namely to provide the processor with a mapping of external representations to internal structures. All the proposals are variations of this:


Tim Bray - standardize a mapping and build it into the tools
Richard Tobin - declare the mappings in namespaced attributes, not in a DTD
old-fashioned - use a DTD, internally if you want to go standalone (what's the big deal?)
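
To make the "mapping" point concrete, here is a minimal sketch in Python contrasting two of the options above: the old-fashioned internal subset, and a mapping standardized and built into the tool. Everything in it is an assumption for illustration: the sample documents, the function names, and the use of the HTML entity table from `html.entities` as a stand-in for whatever set a standard might actually pick. The regex substitution is naive (it would also rewrite text inside CDATA sections), so treat it as a demonstration of the idea, not a real preprocessor.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Old-fashioned option: the document carries its own mapping in an internal subset.
STANDALONE = '<!DOCTYPE doc [<!ENTITY eacute "&#233;">]><doc>caf&eacute;</doc>'

# Same content with no declaration; a plain XML parser rejects this.
BARE = "<doc>caf&eacute;</doc>"

# The five entities every XML parser already knows; leave them alone.
XML_PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def parse_with_internal_subset(text):
    """Let the parser resolve entities declared in the document itself."""
    return ET.fromstring(text).text


def expand_with_builtin_table(text):
    """Hypothetical 'built into the tool' option: rewrite known named
    references as numeric character references before parsing."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # not ours to touch
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


def parse_with_builtin_table(text):
    """Expand from the fixed table, then parse as ordinary XML."""
    return ET.fromstring(expand_with_builtin_table(text)).text


if __name__ == "__main__":
    print(parse_with_internal_subset(STANDALONE))
    print(parse_with_builtin_table(BARE))
    # The parsed model no longer knows an entity reference was ever there.
    print(ET.tostring(ET.fromstring(STANDALONE), encoding="unicode"))
```

Either way the parser ends up holding the same character in the model; the two routes differ only in where the mapping lives. Reserializing the parsed tree does not bring the reference back, which is the lexical-instance-versus-model gap in miniature.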


The differences amount to differences in (a) required infrastructure, and (b) level of standardization -- but none of them gets to the heart of the matter. Much as XML developers want to reduce the role of XML-as-lexical-instance in favor of the purity of the model, they (we) just can't get away from the fundamental requirement that XML (and SGML before it) addressed in the first place: to represent something as complex as that model (to say nothing of the real-world documents or objects that the model seeks to represent!) in something as lo-fi as a stream of 7-bit ASCII characters.


Many SGML features that are so disparaged or derided by XMLers make much more sense if you look at this requirement in the context of systems with 4 MHz processors and 640 KB of RAM (which, as some of us recall, is as much as we will ever need). In that world, you really want a DTD to configure lexical aspects of markup such as tag omissibility or DATATAG, which XML has decided there's no call for. Consequently, SGML was much more willing than XML is to see the lexical instance as a primary artifact (not just a temporary serialization of the "real thing"), and much more ecumenical with respect to processing models. (Anyone remember the Desperate Perl Hacker? Tree-based namespaces did him in: R.I.P.)


My own prediction is that this particular proposal won't really go places -- DTDs, especially given internal subsets, just aren't that broken -- but that the underlying issue won't disappear either. It's only evident, now, in brushfires like this one; but as long as XML is still growing into the application spaces where the tree model is sufficient -- which is to say, as long as we can manage to forgo our needs for even more complex kinds of representation such as overlapping structures -- there'll be enough to keep us busy, and the worst stresses and strains will remain potential.


In the meantime, general entities, and the kind of bridge-to-the-serializer solution pioneered by Zarella and Tony, will be enough.


SGML is Dead! Long Live SGML!

